Linear Guest PDF
Linear Guest PDF
Linear Guest PDF
2
Contents
3
4
7 Matrices 121
7.1 Linear Transformations and Matrices . . . . . . . . . . . . . . 121
7.1.1 Basis Notation . . . . . . . . . . . . . . . . . . . . . . 121
7.1.2 From Linear Operators to Matrices . . . . . . . . . . . 127
7.2 Review Problems . . . . . . . . . . . . . . . . . . . . . . . . . 129
4
5
8 Determinants 169
8.1 The Determinant Formula . . . . . . . . . . . . . . . . . . . . 169
8.1.1 Simple Examples . . . . . . . . . . . . . . . . . . . . . 169
8.1.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . 170
8.2 Elementary Matrices and Determinants . . . . . . . . . . . . . 174
8.2.1 Row Swap . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.2.2 Row Multiplication . . . . . . . . . . . . . . . . . . . . 176
8.2.3 Row Addition . . . . . . . . . . . . . . . . . . . . . . . 177
8.2.4 Determinant of Products . . . . . . . . . . . . . . . . . 179
8.3 Review Problems . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.4 Properties of the Determinant . . . . . . . . . . . . . . . . . . 186
8.4.1 Determinant of the Inverse . . . . . . . . . . . . . . . . 190
8.4.2 Adjoint of a Matrix . . . . . . . . . . . . . . . . . . . . 190
8.4.3 Application: Volume of a Parallelepiped . . . . . . . . 192
8.5 Review Problems . . . . . . . . . . . . . . . . . . . . . . . . . 193
5
6
13 Diagonalization 241
13.1 Diagonalizability . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.2 Change of Basis . . . . . . . . . . . . . . . . . . . . . . . . . . 242
13.3 Changing to a Basis of Eigenvectors . . . . . . . . . . . . . . . 246
13.4 Review Problems . . . . . . . . . . . . . . . . . . . . . . . . . 248
6
7
B Fields 317
7
8
Index 432
8
What is Linear Algebra?
1
Many difficult problems can be handled easily once relevant information is
organized in a certain way. This text aims to teach you how to organize in-
formation in cases where certain mathematical structures are present. Linear
algebra is, in general, the study of those structures. Namely
In broad terms, vectors are things you can add and linear functions are
functions of vectors that respect vector addition. The goal of this text is to
teach you to organize information about vector spaces in a way that makes
problems involving linear functions of many variables easy. (Or at least
tractable.)
To get a feel for the general idea of organizing information, of vectors,
and of linear functions this chapter has brief sections on each. We start
here in hopes of putting students in the right mindset for the odyssey that
follows; the latter chapters cover the same material at a slower pace. Please
be prepared to change the way you think about some familiar mathematical
objects and keep a pencil and piece of paper handy!
9
10 What is Linear Algebra?
But lets think carefully; what is the left hand side of this equation doing?
Functions and equations are di↵erent mathematical objects so why is the
equal sign necessary?
If someone says
we do not quite have all the information we need to determine the relationship
between inputs and outputs.
Do we multiply the first number of the input by 24 or by 35? No one has specified an
order for the variables, so we do not know how to calculate an output associated with
a particular input.1
A di↵erent notation for V can clear this up; we can denote V itself as an ordered
triple of numbers that reminds us what to do to each number from the input.
1
Of course we would know how to calculate an output if the input is described in
the tedious form such as “1 share of G, 2 shares of N and 3 shares of A”, but that is
unacceptably tedious! We want to use ordered triples of numbers to concisely describe
inputs.
10
1.1 Organizing Information 11
0 1 0 1
1 1
Denote V by 24 80 35 and thus write V @2A = 24 80 35 @2A
3B 3
If we change the order for the variables we should change the notation for V .
0 1 0 1
1 1
@
Denote V by 35 80 24 and thus write V 2 A = 35 80 24 2A
@
3 B0 3
The subscripts B and B 0 on the columns of numbers are just symbols2 reminding us
of how to interpret the column of numbers. But the distinction is critical; as shown
above V assigns completely di↵erent numbers to the same columns of numbers with
di↵erent subscripts.
There are six di↵erent ways to order the three companies. Each way will give
di↵erent notation for the same function V , and a di↵erent way of assigning numbers
to columns of three numbers. Thus, it is critical to make clear which ordering is
used if the reader is to understand what is written. Doing so is a way of organizing
information.
2
We were free to choose any symbol to denote these orders. We chose B and B 0 because
we are hinting at a central idea in the course: choosing a basis.
11
12 What is Linear Algebra?
This example is a hint at a much bigger idea central to the text; our choice of
order is an example of choosing a basis3.
1. uncover aspects of functions that don’t change with the choice (Ch 12)
Unfortunately, because the subject (at least for those learning it) requires
seemingly arcane and tedious computations involving large arrays of numbers
known as matrices, the key concepts and the wide applicability of linear
algebra are easily missed. So we reiterate,
In broad terms, vectors are things you can add and linear functions are
functions of vectors that respect vector addition.
12
1.2 What are Vectors? 13
(C) Polynomials: If p(x) = 1 + x 2x2 + 3x3 and q(x) = x + 3x2 3x3 + x4 then
their sum p(x) + q(x) is the new polynomial 1 + 2x + x2 + x4 .
(D) Power series: If f (x) = 1+x+ 2!1 x2 + 3!1 x3 +· · · and g(x) = 1 x+ 2!1 x2 1 3
3! x +· · ·
1 2 1 4
then f (x) + g(x) = 1 + 2! x + 4! x · · · is also a power series.
(E) Functions: If f (x) = ex and g(x) = e x then their sum f (x) + g(x) is the new
function 2 cosh x.
There are clearly di↵erent kinds of vectors. Stacks of numbers are not the
only things that are vectors, as examples C, D, and E show. Vectors of
di↵erent kinds can not be added; What possible meaning could the following
have? ✓ ◆
9
+ ex
3
In fact, you should think of all five kinds of vectors above as di↵erent
kinds, and that you should not add vectors that are not of the same kind.
On the other hand, any two things of the same kind “can be added”. This is
the reason you should now start thinking of all the above objects as vectors!
In Chapter 5 we will give the precise rules that vector addition must obey.
In the above examples, however, notice that the vector addition rule stems
from the rules for adding numbers.
When adding the same vector over and over, for example
x + x, x + x + x, x + x + x + x, ... ,
we will write
2x , 3x , 4x , . . . ,
respectively. For example
0 1 0 1 0 1 0 1 0 1 0 1
1 1 1 1 1 4
4 1 = 1 + 1 + 1 + 1 = 4A .
@ A @ A @ A @ A @ A @
0 0 0 0 0 0
13
14 What is Linear Algebra?
• numbers
• n-vectors
• 2nd order polynomials
• polynomials
• power series
• functions with a certain domain
14
1.3 What are Linear Functions? 15
In linear algebra, the functions we study will have vectors (of some type)
as both inputs and outputs. We just saw that vectors are objects that can be
added or scalar multiplied—a very general notion—so the functions we are
going to study will look novel at first. So things don’t get too abstract, here
are five questions that can be rephrased in terms of functions of vectors.
d
(D) What power series f (x) satisfies x dx f (x) 2f (x) = 0?
4
The cross product appears in this equation.
15
16 What is Linear Algebra?
x 10x
This is just like a function f from calculus that takes in a number x and
spits out the number 10x. (You might write f (x) = 10x to indicate this).
For part (B), we need something more sophisticated.
0 1 0 1
x z
@yA @ z A,
z y x
The inputs and outputs are both 3-vectors. The output is the cross product
of the input with... how about you complete this sentence to make sure you
understand.
The machine needed for example (C) looks like it has just one input and
two outputs; we input a polynomial and get a 2-vector as output.
0 R1 1
1
p(y)dy
p @R A.
1
1
yp(y)dy
16
1.3 What are Linear Functions? 17
While this sounds complicated, linear algebra is the study of simple func-
tions of vectors; its time to describe the essential characteristics of linear
functions.
Let’s use the letter L to denote an arbitrary linear function and think
again about vector addition and scalar multiplication. Also, suppose that v
and u are vectors and c is a number. Since L is a function from vectors to
vectors, if we input u into L, the output L(u) will also be some sort of vector.
The same goes for L(v). (And remember, our input and output vectors might
be something other than stacks of numbers!) Because vectors are things that
can be added and scalar multiplied, u + v and cu are also vectors, and so
they can be used as inputs. The essential characteristic of linear functions is
what can be said about L(u + v) and L(cu) in terms of L(u) and L(v).
Before we tell you this essential characteristic, ruminate on this picture.
The “blob” on the left represents all the vectors that you are allowed to
input into the function L, the blob on the right denotes the possible outputs,
and the lines tell you which inputs are turned into which outputs.6 A full
pictorial description of the functions would require all inputs and outputs
6
The domain, codomain, and rule of correspondence of the function are represented by
the left blog, right blob, and arrows, respectively.
17
18 What is Linear Algebra?
The key to the whole class, from which everything else follows:
1. Additivity:
L(u + v) = L(u) + L(v) .
2. Homogeneity:
L(cu) = cL(u) .
Most functions of vectors do not obey this requirement.7 At its heart, linear
algebra is the study of functions that do.
Notice that the additivity requirement says that the function L respects
vector addition: it does not matter if you first add u and v and then input
their sum into L, or first input u and v into L separately and then add the
outputs. The same holds for scalar multiplication–try writing out the scalar
multiplication version of the italicized sentence. When a function of vectors
obeys the additivity and homogeneity properties we say that it is linear (this
is the “linear” of linear algebra). Together, additivity and homogeneity are
called linearity. Are there other, equivalent, names for linear functions? yes.
7
E.g.: If f (x) = x2 then f (1 + 1) = 4 6= f (1) + f (1) = 2. Try any other function you
can think of!
18
1.3 What are Linear Functions? 19
And now for a hint at the power of linear algebra. The questions in
examples (A-D) can all be restated as
Lv = w
If we view functions as vectors with addition given by addition of functions and with
scalar multiplication given by multiplication of functions by constants, then these
familiar properties of derivatives are just the linearity property of linear maps.
Before introducing matrices, notice that for linear maps L we will often
write simply Lu instead of L(u). This is because the linearity property of a
19
20 What is Linear Algebra?
which feels a lot like the regular rules of algebra for numbers. Notice though,
that “uL” makes no sense here.
This idea will take some time to develop, but we provided an elementary
example in Section 1.1. A good starting place to learn about matrices is by
studying systems of linear equations.
20
1.4 So, What is a Matrix? 21
Each bag contains 2 apples and 4 bananas and each box contains 6 apples and 8
bananas. There are 20 apples and 28 bananas in the room. Find x and y.
The values are the numbers x and y that simultaneously make both of the following
equations true:
2 x + 6 y = 20
4 x + 8 y = 28 .
Here we have an example of a System of Linear Equations.8 It’s a collection
of equations in which variables are multiplied by constants and summed, and
no variables are multiplied together: There are no powers of variables (like x2
or y 5 ), non-integer or negative powers of variables (like y 1/7 or x 3 ), and no
places where variables are multiplied together (like xy).
21
22 What is Linear Algebra?
Writing our fruity equations as an equality between 2-vectors and then using
these rules we have:
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
2 x + 6 y = 20 2x + 6y 20 2 6 20
() = () x +y = .
4 x + 8 y = 28 4x + 8y 28 4 8 28
Now we introduce a function which takes in 2-vectors9 and gives out 2-vectors.
We denote it by an array of numbers called a matrix .
✓ ◆ ✓ ◆✓ ◆ ✓ ◆ ✓ ◆
2 6 2 6 x 2 6
The function is defined by := x +y .
4 8 4 8 y 4 8
✓ ◆ ✓ ◆
x 2x + 6y
.
y 4x + 8y
22
1.4 So, What is a Matrix? 23
Indeed this is an example of the general rule that you have probably seen
before ✓ ◆✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
p q x px + qy p q
:= =x +y .
r s y rx + sy r s
Notice, that the second way of writing the output on the right hand side of
this equation is very useful because it tells us what all possible outputs a
matrix times a vector look like – they are just sums of the columns of the
matrix multiplied by scalars. The set of all possible outputs of a matrix
times a vector is called the column space (it is also the image of the linear
function defined by the matrix).
Matrices in Space!
Thus matrices can be viewed as linear functions. The statement of this for
the matrix in our fruity example is as follows.
✓ ◆ ✓ ◆ ✓ ◆✓ ◆
2 6 x 2 6 x
1. = and
4 8 y 4 8 y
23
24 What is Linear Algebra?
✓ ◆ ✓ ◆ ✓ 0 ◆ ✓ ◆✓ ◆ ✓ ◆ ✓ 0◆
2 6 x x 2 6 x 2 6 x
2. + = + .
4 8 y y0 4 8 y 4 8 y0
These equalities can be verified using the rules we introduced so far.
✓ ◆
2 6
Example 8 Verify that is a linear operator.
4 8
The matrix-function is homogeneous if the expressions on the left hand side and right
hand side of the first equation are indeed equal.
✓ ◆ ✓ ◆ ✓ ◆✓ ◆ ✓ ◆ ✓ ◆
2 6 a 2 6 a 2 6
= = a + b
4 8 b 4 8 b 4 8
✓ ◆ ✓ ◆ ✓ ◆
2 a 6bc 2 a+6 b
= + =
4 a 8bc 4 a+8 b
while
✓ ◆✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
2 6 a 2 6 2a 6b
=c a +b = +
4 8 b 4 8 4a 8b
✓ ◆ ✓ ◆
2a + 6b 2 a+6 b
= = .
4a + 8b 4 a+8 b
The underlined expressions are identical, so the matrix is homogeneous.
The matrix-function is additive if the left and right side of the second equation are
indeed equal.
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆✓ ◆ ✓ ◆ ✓ ◆
2 6 a c 2 6 a+c 2 6
+ = = (a + c) + (b + d)
4 8 b d 4 8 b+d 4 8
✓ ◆ ✓ ◆ ✓ ◆
2(a + c) 6(b + d) 2a + 2c + 6b + 6d
= + =
4(a + c) 8(b + d) 4a + 4c + 8b + 8d
which we need to compare to
✓ ◆✓ ◆ ✓ ◆✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
2 6 a 2 6 c 2 6 2 6
+ =a +b +c +d
4 8 b 4 8 d 4 8 4 8
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
2a 6b 2c 6d 2a + 2c + 6b + 6d
= + + + = .
4a 8b 4c 8d 4a + 4c + 8b + 8d
Thus multiplication by a matrix is additive and homogeneous, and so it is, by definition,
linear.
24
1.4 So, What is a Matrix? 25
We have come full circle; matrices are just examples of the kinds of linear
operators that appear in algebra problems like those in section 1.3. Any
equation of the form M v = w with M a matrix, and v, w n-vectors is called
a matrix equation. Chapter 2 is about efficiently solving systems of linear
equations, or equivalently matrix equations.
The output of the first machine would be fed into the second.
✓ ◆ ✓ ◆ ✓ ◆
x 2x + 6y 1.(2x + 6y) + 2.(4x + 8y)
y 4x + 8y 0.(2x + 6y) + 1.(4x + 8y)
✓ ◆
10x + 22y
=
4x + 8y
Notice that the same final result could be achieved with a single machine:
✓ ◆ ✓ ◆
x 10x + 22y
.
y 4x + 8y
25
26 What is Linear Algebra?
g f :U !W
where
(g f )(u) = g(f (u)) .
This is called the composition of functions. Matrix multiplication is the tool
required for computing the composition of linear functions.
26
1.4 So, What is a Matrix? 27
0 1 20 1 0 13
2a 2 0 0 a
@
= 2a + 2b A = 4 @ 2 2 0 A @ b A5 .
b + 2c B 0 1 2 c B
That is, our notational convention for quadratic functions has induced a notation for
d
the di↵erential operator dx + 2 as a matrix. We can use this notation to change the
way that the following two equations say exactly the same thing.
20 1 0 13 0 1
✓ ◆ 2 0 0 a 0
d 4 @ A @ A5
+2 f =x+1, 2 2 0 b = 1A .
@
dx
0 1 2 c B
1 B
Our notational convention has served as an organizing principle to yield the system of
equations
2a =0
2a + 2b = 1
b + 2c = 1
0 1
0
1
with solution 2 A , where the subscript B is used to remind us that this stack of
@
1
4 B
numbers encodes the vector 12 x+ 14 , which is indeed the solution to our equation since,
d
substituting for f yields the true statement dx + 2 ( 12 x + 14 ) = x + 1.
27
28 What is Linear Algebra?
d
A simple example with the knowns (L and V are dx and 3, respectively) is
shown below, although the detour is unnecessary in this case since you know
how to anti-di↵erentiate.
To drive home the point that we are not studying matrices but rather lin-
ear functions, and that those linear functions can be represented as matrices
under certain notational conventions, consider how changeable the notational
conventions are.
28
1.4 So, What is a Matrix? 29
Example 10 of how a di↵erent matrix comes into the same linear algebra problem.
Notice that we have obtained a di↵erent matrix for the same linear function. The
equation we started with
20 1 0 13 0 1
✓ ◆ 2 1 0 a 1
d
+ 2 f = x + 1 , 4@0 2 2A @ bA5 = @1A
dx
0 0 2 c B0
0 B0
2a + b = 1
, 2b + 2c = 1
2c = 0
011
4
B C
has the solution @ 12 A. Notice that we have obtained a di↵erent 3-vector for the
0
same vector, since in the notational convention B 0 this 3-vector represents 14 + 12 x.
29
30 What is Linear Algebra?
Probably you will spend most of your time on the following review questions:
1. Problems A, B, and C of example 3 can all be written as Lv = w where
L:V !W,
(read this as L maps the set of vectors V to the set of vectors W ). For
each case write down the sets V and W where the vectors v and w
come from.
2. Torque is a measure of “rotational force”. It is a vector whose direction
is the (preferred) axis of rotation. Upon applying a force F on an object
at point r the torque ⌧ is the cross product r ⇥ F = ⌧ :
30
1.5 Review Problems 31
3. The function P (t) gives gas prices (in units of dollars per gallon) as a
function of t the year (in A.D. or C.E.), and g(t) is the gas consumption
rate measured in gallons per year by a driver as a function of their age.
The function g is certainly di↵erent for di↵erent people. Assuming a
lifetime is 100 years, what function gives the total amount spent on gas
during the lifetime of an individual born in an arbitrary year t? Is the
operator that maps g to this function linear?
f (t) = f (0)e2t
satisfies the DE for any number f (0). The number 2 in the DE is called
the constant of proportionality. A similar DE
d 2
f= f
dt t
has a time-dependent “constant of proportionality”.
(s, f )
sugar
Hint
32
1.5 Review Problems 33
33
34 What is Linear Algebra?
Are these matrices diagonal and why? Use the rule you found in prob-
lem 6 to compute the matrix products DD0 and D0 D. What do you
observe? Do you think the same property holds for arbitrary matrices?
What about products where only one of the matrices is diagonal?
(ii) Choose an ordering on {⇤, ?, #}, and then use it to write your
function from part (i) as a triple of numbers.
(iii) Choose a new ordering on {⇤, ?, #} and then write your function
from part (i) as a triple of numbers.
34
1.5 Review Problems 35
(iv) Your answers for parts (ii) and (iii) are di↵erent yet represent the
same function – explain!
35
36 What is Linear Algebra?
36
Systems of Linear Equations
2
2.1 Gaussian Elimination
Systems of linear equations can be written as matrix equations. Now you
will learn an efficient algorithm for (maximally) simplifying a system of linear
equations (or a matrix equation) – Gaussian elimination.
37
38 Systems of Linear Equations
38
2.1 Gaussian Elimination 39
Entries left of the divide carry two indices; subscripts denote column number
and superscripts row number. We emphasize, the superscripts here do not
denote exponents. Make sure you can write out the system of equations and
the associated matrix equation for any augmented matrix.
We now have three ways of writing the same question. Let’s put them
side by side as we solve the system by strategically adding and subtracting
equations. We will not tell you the motivation for this particular series of
steps yet, but let you develop some intuition first.
Example 11 (How matrix equations and augmented matrices change in elimination)
✓ ◆✓ ◆ ✓ ◆ ✓ ◆
x + y = 27 1 1 x 27 1 1 27
, = , .
2x y = 0 2 1 y 0 2 1 0
With the first equation replaced by the sum of the two equations this becomes
✓ ◆✓ ◆ ✓ ◆ ✓ ◆
3x + 0 = 27 3 0 x 27 3 0 27
, = , .
2x y = 0 2 1 y 0 2 1 0
Let the new first equation be the old first equation divided by 3:
✓ ◆✓ ◆ ✓ ◆ ✓ ◆
x + 0 = 9 1 0 x 9 1 0 9
, = , .
2x y = 0 2 1 y 0 2 1 0
Replace the second equation by the second equation minus two times the first equation:
✓ ◆✓ ◆ ✓ ◆ ✓ ◆
x + 0 = 9 1 0 x 9 1 0 9
, = , .
0 y = 18 0 1 y 18 0 1 18
Let the new second equation be the old second equation divided by -1:
✓ ◆✓ ◆ ✓ ◆ ✓ ◆
x + 0 = 9 1 0 x 9 1 0 9
, = , .
0 + y = 18 0 1 y 18 0 1 18
Did you see what the strategy was? To eliminate y from the first equation
and then eliminate x from the second. The result was the solution to the
system.
Here is the big idea: Everywhere in the instructions above we can replace
the word “equation” with the word “row” and interpret them as telling us
what to do with the augmented matrix instead of the system of equations.
Performed systemically, the result is the Gaussian elimination algorithm.
39
40 Systems of Linear Equations
Equivalence Example
Note that in going from the first to second augmented matrix, we used the top left 1
to make the bottom left entry zero. For this reason we call the top left entry a pivot.
Similarly, to get from the second to third augmented matrix, the bottom right entry
(before the divide) was used to make the top right one vanish; so the bottom right
entry is also called a pivot.
This name pivot is used to indicate the matrix entry used to “zero out”
the other entries in its column; the pivot is the number used to eliminate
another number in its column.
40
2.1 Gaussian Elimination 41
called the Identity Matrix , since this would give the simple statement of a
solution x = a, y = b. The same goes for larger systems of equations for
which the identity matrix I has 1’s along its diagonal and all o↵-diagonal
entries vanish:
0 1
1 0 ··· 0
B 0 1 0 C
B C
I=B .. .. .. C
@ . . . A
0 0 ··· 1
For many systems, it is not possible to reach the identity in the augmented
matrix via Gaussian elimination. In any case, a certain version of the matrix
that has the maximum number of components eliminated is said to be the
Row Reduced Echelon Form (RREF).
This example demonstrates if one equation is a multiple of the other the identity
matrix can not be a reached. This is because the first step in elimination will make
the second row a row of zeros. Notice that solutions still exists (1, 1) is a solution.
The last augmented matrix here is in RREF; no more than two components can be
eliminated.
This system of equation has a solution if there exists two numbers x, and y such that
0 + 0 = 1. That is a tricky way of saying there are no solutions. The last form of the
augmented matrix here is the RREF.
41
42 Systems of Linear Equations
and then give up because the the upper left slot can not function as a pivot since the 0
that lives there can not be used to eliminate the zero below it. Of course, the right
thing to do is to change the order of the equations before starting
) ! ! (
x + y = 7 1 1 7 1 0 9 x + 0 = 9
, ⇠ ,
0x + y = 2 0 1 2 0 1 2 0 + y = 2.
The third augmented matrix above is the RREF of the first and second. That is to
say, you can swap rows on your way to RREF.
For larger systems of equations redundancy and inconsistency are the ob-
structions to obtaining the identity matrix, and hence to a simple statement
of a solution in the form x = a, y = b, . . . . What can we do to maximally
simplify a system of equations in general? We need to perform operations
that simplify our system without changing its solutions. Because, exchanging
the order of equations, multiplying one equation by a non-zero constant or
adding equations does not change the system’s solutions, we are lead to three
operations:
These are called Elementary Row Operations, or EROs for short, and are
studied in detail in section 2.3. Suppose now we have a general augmented
matrix for which the first entry in the first row does not vanish. Then, using
just the three EROs, we could1 then perform the following.
1
This is a “brute force” algorithm; there will often be more efficient ways to get to
RREF.
42
2.1 Gaussian Elimination 43
Beginner Elimination
This algorithm and its variations is known as Gaussian elimination. The
endpoint of the algorithm is an augmented matrix of the form
0 1
1 ⇤ 0 ⇤ 0 · · · 0 ⇤ b1
B 0 0 1 ⇤ 0 · · · 0 ⇤ b2 C
B C
B 0 0 0 0 1 · · · 0 ⇤ b3 C
B C
B . . . . . . C
. .
B . . . . .
. .
. .
. C
B C
B C.
B 0 0 0 0 0 ··· 1 ⇤ b C k
B C
B 0 0 0 0 0 · · · 0 0 bk+1 C
B C
B . . . . . . . . C
@ .. .. .. .. .. .. .. .. A
0 0 0 0 0 · · · 0 0 br
This is called Reduced Row Echelon Form (RREF). The asterisks denote
the possibility of arbitrary numbers (e.g., the second 1 in the top line of
example 13).
Learning to perform this algorithm by hand is the first step to learning
linear algebra; it will be the primary means of computation for this course.
You need to learn it well. So start practicing as soon as you can, and practice
often.
43
44 Systems of Linear Equations
The reason we need the asterisks in the general form of RREF is that
not every column need have a pivot, as demonstrated in examples 13 and 16.
Here is an example where multiple columns have no pivot:
Example 18 (Consecutive columns with no pivot in RREF)
✓ ◆ ✓ ◆
x + y + z + 0w = 2 1 1 1 0 2 1 1 1 0 2
, ⇠
2x + 2y + 2z + 2w = 4 2 2 2 1 4 0 0 0 1 0
⇢
x + y + z = 2
,
w = 0.
Note that there was no hope of reaching the identity matrix, because of the shape of
the augmented matrix we started with.
44
2.1 Gaussian Elimination 45
Advanced Elimination
It is important that you are able to convert RREF back into a system
of equations. The first thing you might notice is that if any of the numbers
bk+1 , . . . , br in 2.1.3 are non-zero then the system of equations is inconsistent
and has no solutions. Our next task is to extract all possible solutions from
an RREF augmented matrix.
45
46 Systems of Linear Equations
There are always exactly enough non-pivot variables to index your solutions.
In any approach, the variables which are not expressed in terms of the other
variables are called free variables. The standard approach is to use the non-
pivot variables as free variables.
Non-standard approach: solve for w in terms of z and substitute into the
other equations. You now have an expression for each component in terms
of z. But why pick z instead of y or x? (or x + y?) The standard approach
not only feels natural, but is canonical, meaning that everyone will get the
same RREF and hence choose the same variables to be free. However, it is
important to remember that so long as their set of solutions is the same, any
two choices of free variables is fine. (You might think of this as the di↵erence
between using Google MapsTM or MapquestTM ; although their maps may
look di↵erent, the place hhome sici they are describing is the same!)
When you see an RREF augmented matrix with two columns that have
no pivot, you know there will be two free variables.
46
2.1 Gaussian Elimination 47
0 1
1 00 47 ⇢
B0 14 1C
3
B C, x + 7z =4
@0 00 00A y + 3z+4w = 1
0 00 00
8 9
0 1 0 1 0 1 0 1
>
> x = 4 7z >
> x 4 7 0
< =
B y C B1C B 3C B 4C
y = 1 3z 4w
, ,B C B C B C B C
@ z A = @0A + z @ 1A + w @ 0A
>
> z = z >
>
: ;
w = w w 0 0 1
You can imagine having three, four, or fifty-six non-pivot columns and
the same number of free variables indexing your solutions set. In general a
solution set to a system of equations with n free variables will be of the form
1 + µ2 x2 + · · · + µn xn : µ1 , . . . , µn 2 R}.
{xP + µ1 xH H H
The parts of these solutions play special roles in the associated matrix
equation. This will come up again and again long after we complete this
discussion of basic calculation methods, so we will use the general language
of linear algebra to give names to these parts now.
47
48 Systems of Linear Equations
Check now that the parts of the solutions with free variables as coefficients
from the previous examples are homogeneous solutions, and that by adding
a homogeneous solution to a particular solution one obtains a solution to the
matrix equation. This will come up over and over again. As an example
d2
without matrices, consider the di↵erential equation dx 2 f = 3. A particular
48
2.2 Review Problems 49
0 1
1 1 0 1 0 1 0 1
B0 0 1 2 0 2 0 1C
B C
B0 0 0 0 1 3 0 1C
B C.
@0 0 0 0 0 2 0 2A
0 0 0 0 0 0 1 1
2. Solve the following linear system:
2x1 + 5x2 8x3 + 2x4 + 2x5 = 0
6x1 + 2x2 10x3 + 6x4 + 8x5 = 6
3x1 + 6x2 + 2x3 + 3x4 + 5x5 = 6
3x1 + 1x2 5x3 + 3x4 + 4x5 = 3
6x1 + 7x2 3x3 + 6x4 + 9x5 = 9
Be sure to set your work out carefully with equivalence signs ⇠ between
each step, labeled by the row operations you performed.
3. Check that the following two matrices are row-equivalent:
✓ ◆ ✓ ◆
1 4 7 10 0 1 8 20
and .
2 9 6 0 4 18 12 0
Now remove the third column from each matrix, and show that the
resulting two matrices (shown below) are row-equivalent:
✓ ◆ ✓ ◆
1 4 10 0 1 20
and .
2 9 0 4 18 0
Now remove the fourth column from each of the original two matri-
ces, and show that the resulting two matrices, viewed as augmented
matrices (shown below) are row-equivalent:
✓ ◆ ✓ ◆
1 4 7 0 1 8
and .
2 9 6 4 18 12
Explain why row-equivalence is never a↵ected by removing columns.
4. Check that the system of equations corresponding to the augmented
matrix 0 1
1 4 10
@3 13 9A
4 17 20
49
50 Systems of Linear Equations
has no solutions. If you remove one of the rows of this matrix, does
the new matrix have any solutions? In general, can row equivalence be
a↵ected by removing rows? Explain why or why not.
x 3y = 6
x + 3z = 3
2x + ky + (3 k)z = 1
Hint
50
2.2 Review Problems 51
8. Show that this pair of augmented matrices are row equivalent, assuming
ad bc 6= 0: ! !
a b e 1 0 de bf
ad bc
⇠
c d f 0 1 af ce
ad bc
Hint
11. Equivalence of augmented matrices does not come from equality of their
solution sets. Rather, we define two matrices to be equivalent if one
can be obtained from the other by elementary row operations.
Find a pair of augmented matrices that are not row equivalent but do
have the same solution set.
51
52 Systems of Linear Equations
On the right, we have listed the relations between old and new rows in matrix notation.
52
2.3 Elementary Row Operations 53
0 10 1 0 1
0 1 0 0 1 1 7 2 0 0 4
@1 0 0A @2 0 0 4A = @0 1 1 7A
0 0 1 0 0 1 4 0 0 1 4
⇠
01 10 1 0 1
0 0
2 2 0 0 4 1 0 0 2
@ 0 1 0A @0 1 1 7A = @0 1 1 7A
0 0 1 0 0 1 4 0 0 1 4
⇠
0 10 1 0 1
1 0 0 1 0 0 2 1 0 0 2
@0 1 1A @0 1 1 7A = @0 1 0 3A
0 0 1 0 0 1 4 0 0 1 4
Here we have multiplied the augmented matrix with the matrices that acted on rows
listed on the right of example 21.
6x = 12
, 3 1 6x = 3 1 12
, 2x = 4
, 2 1 2x = 2 1 4
, 1x = 2
53
54 Systems of Linear Equations
This is another way of thinking about Gaussian elimination which feels more
like elementary algebra in the sense that you “do something to both sides of
an equation” until you have a solution.
54
2.3 Elementary Row Operations 55
As we changed the left side from the matrix M to the identity matrix, the
right side changed from the identity matrix to the matrix which undoes M .
Example 26 (Checking that one matrix undoes another)
0 10 1 0 1
0 12 0 0 1 1 1 0 0
@ 1 0 1 A@ 2 0 0 A=@ 0 1 0 A.
0 0 1 0 0 1 0 0 1
If the matrices are composed in the opposite order, the result is the same.
0 10 1 0 1
0 1 1 0 12 0 1 0 0
@ 2 0 0 A@ 1 0 1 A = @ 0 1 0 A.
0 0 1 0 0 1 0 0 1
· · · E2 E1 M = I .
55
56 Systems of Linear Equations
1
How to find M
1
(M |I) ⇠ (I|M )
Much use is made of the fact that invertible matrices can be undone with
EROs. To begin with, since each elementary row operation has an inverse,
M = E1 1 E2 1 · · · ,
• Row Sum: The identity matrix with one o↵-diagonal entry not 0.
56
2.3 Elementary Row Operations 57
• The row swap matrix that swaps the 2nd and 4th row is the identity matrix with
the 2nd and 4th row swapped:
0 1
1 0 0 0 0
B0 0 0 1 0C
B C
B0 0 1 0 0C .
B C
@0 1 0 0 0A
0 0 0 0 1
• The scalar multiplication matrix that replaces the 3rd row with 7 times the 3rd
row is the identity matrix with 7 in the 3rd row instead of 1:
0 1
1 0 0 0
B0 1 0 0C
B C
@0 0 7 0A .
0 0 0 1
• The row sum matrix that replaces the 4th row with the 4th row plus 9 times
the 2nd row is the identity matrix with a 9 in the 4th row, 2nd column:
0 1
1 0 0 0 0 0 0
B0 1 0 0 0 0 0C
B C
B0 0 1 0 0 0 0C
B C
B0 9 0 1 0 0 0C .
B C
B0 0 0 0 1 0 0C
B C
@0 0 0 0 0 1 0A
0 0 0 0 0 0 1
57
58 Systems of Linear Equations
The inverse of the ERO matrices (corresponding to the description of the reverse row
maniplulations)
0 1 0 1 0 1
0 1 0 2 0 0 1 0 0
E1 1 = @ 1 0 0 A , E2 1 = @ 0 1 0 A , E3 1 = @ 0 1 1 A .
0 0 1 0 0 1 0 0 1
58
2.3 Elementary Row Operations 59
59
60 Systems of Linear Equations
We calculated the product of the first three factors in the previous example; it was
named L there, and we will reuse that name here. The product of the next three
factors is diagonal and we wil name it D. The last factor we named U (the name means
something di↵erent in this example than the last example.) The LDU factorization
of our matrix is
0 1 0 10 10 1
2 0 3 1 1 0 0 0 2 0 0 0 1 0 32 12
B 0 1 2 2C B 1 0 0C B 0C B C
B C=B 0 C B0 1 0 C B0 1 2 24 C .
@ 4 0 9 2 A @ 2 0 1 0 A @ 0 0 3 0 A @0 0 1 3A
0 1 1 1 0 1 1 1 0 0 0 3 0 0 0 1
60
2.4 Review Problems 61
0 1
0 1 0 0
B1 0 0 0C
P =B
@0
C=P 1
0 1 0A
0 0 0 1
0 1 0 10 10 10 3 11
0 1 2 2 1 0 0 0 2 0 0 0 0 1 0 0 1 0 2 2
B 2 0 3 1C B 0 1 0 0CB 0CB 0CB 2C
B C=B CB0 1 0 CB1 0 0 CB0 1 2 C
@ 4 0 9 2 A @ 2 0 1 0 @0
A 0 3 0 @0
A 0 1 0 @0
A 0 1 4A
3
0 1 1 1 0 1 1 1 0 0 1 3 0 0 0 1 0 0 0 1
61
62 Systems of Linear Equations
0 10 1 0 1
3 6 2 x 3
@5 9 4 A @ y A = @ 1 A
2 4 2 z 0
3. Solve this vector equation by finding the inverse of the matrix through
(M |I) ⇠ (I|M 1 ) and then applying M 1 to both sides of the equation.
0 10 1 0 1
2 1 1 x 9
@1 1 1A @ y A = @6A
1 1 2 z 7
5. Multiple matrix equations with the same matrix can be solved simul-
taneously.
62
2.5 Solution Sets for Systems of Linear Equations 63
6. How can you convince your fellow students to never make this mistake?
R10 =R1 +R2
0 1 R20 =R1 R2
0 1
1 0 2 3 1 1 4 6
R30 =R1 +2R2
@0 1 2 3 A ⇠ @1 1 0 0A
2 0 1 4 1 2 6 9
1. One solution
63
64 Systems of Linear Equations
2. No solutions
In each case the linear operator is a 1 ⇥ 1 matrix. In the first case, the linear
operator is invertible. In the other two cases it is not. In the first case, the
solution set is a point on the number line, in case 2b the solution set is the
whole number line.
Lets examine similar situations with larger matrices: 2 ⇥ 2 matrices.
✓ ◆✓ ◆ ✓ ◆ ✓ ◆
6 0 x 12 2
1. = has one solution: .
0 2 y 6 3
✓ ◆✓ ◆ ✓ ◆
1 3 x 4
2a. = has no solutions.
0 0 y 1
✓ ◆✓ ◆ ✓ ◆ ⇢✓ ◆ ✓ ◆
1 3 x 4 4 3
2bi. = has solution set +y :y2R .
0 0 y 0 0 1
✓ ◆✓ ◆ ✓ ◆ ⇢✓ ◆
0 0 x 0 x
2bii. = has solution set : x, y 2 R .
0 0 y 0 y
Again, in the first case the linear operator is invertible while in the other
cases it is not. When a 2 ⇥ 2 matrix from a matrix equation is not invertible
the solution set can be empty, a line in the plane, or the plane itself.
For a system of equations with r equations and k veriables, one can have a
number of di↵erent outcomes. For example, consider the case of r equations
in three variables. Each of these equations is the equation of a plane in three-
dimensional space. To find solutions to the system of equations, we look for
the common intersection of the planes (if an intersection exists). Here we
have five di↵erent possibilities:
64
2.5 Solution Sets for Systems of Linear Equations 65
2bi. Line. The planes intersect in a common line; any point on that line
then gives a solution to the system of equations.
2bii. Plane. Perhaps you only had one equation to begin with, or else all
of the equations coincide geometrically. In this case, you have a plane
of solutions, with two free parameters.
Planes
65
66 Systems of Linear Equations
Following the standard approach, express the pivot variables in terms of the non-pivot
variables and add “empty equations”. Here x3 and x4 are non-pivot variables.
9 0 1 0 1 0 1 0 1
x1 = 1 x3 + x4 > > x1 1 1 1
= B C B C B C B
x2 = 1 + x3 x4 x2 C B1C 1C 1C
,B @ A = @ A + x3 B @ A + x4 B
@
C
x3 = x3 >
> x3 0 1 0A
;
x4 = x4 x4 0 0 1
Notice that the first two components of the second two terms come from the non-pivot
columns. Another way to write the solution set is
S = xP + µ1 xH H
1 + µ 2 x 2 : µ 1 , µ2 2 R ,
where 0 1 0 1 01
1 1 1
B 1C B 1C B 1C
xP = B C
@0A , xH
1 =B C
@ 1A , xH
2 =B C
@ 0A .
0 0 1
Here xP is a particular solution while xH H
1 and x2 are called homogeneous solutions.
The solution set forms a plane.
operators. Thus
M (xP + µ1 xH H P H H
1 + µ2 x 2 ) = M x + µ1 M x 1 + µ2 M x 2 = v ,
66
2.5 Solution Sets for Systems of Linear Equations 67
M xP = v .
M xH
1 = 0.
M xH
2 = 0.
Here xH H
1 and x2 are examples of what are called homogeneous solutions to
the system. They do not solve the original equation M x = v, but instead its
associated homogeneous equation M y = 0.
We have just learnt a fundamental lesson of linear algebra: the solution
set to Ax = b, where A is a linear operator, consists of a particular solution
plus homogeneous solutions.
Example 33 Consider the matrix equation of example 32. It has solution set
80 1 0 1 0 1 9
>
> 1 1 1 >
>
<B C B 1C B 1C =
B 1 C B C B C
S = @ A + µ 1 @ A + µ 2 @ A : µ 1 , µ2 2 R .
>
> 0 1 0 >
>
: ;
0 0 1
0 1
1
B 1C
Then M xP = v says that B C
@0A is a solution to the original matrix equation, which is
0
certainly true, but this is not the only solution.
0 1
1
B C
M xH B 1C
1 = 0 says that @ 1A is a solution to the homogeneous equation.
67
68 Systems of Linear Equations
0 1
1
B 1C
M xH
2 = 0 says that B C
@ 0A is a solution to the homogeneous equation.
1
Notice how adding any multiple of a homogeneous solution to the particular solution
yields another particular solution.
2. Invent a simple linear system that has multiple solutions. Use the stan-
dard approach for solving linear systems and a non-standard approach
to obtain di↵erent descriptions of the solution set. Is the solution set
di↵erent with di↵erent approaches?
3. Let 0 1 0 1 1
a11 a12 · · · a1k x
B 2 2 2C B 2 C
B a1 a2 · · · ak C Bx C
M =B
B .. .. .. C
C and x = B
B ..
C.
C
@. . .A @ . A
r r r
a1 a2 · · · ak xk
Note: x2 does not denote the square of the column vector x. Instead
x1 , x2 , x3 , etc..., denote di↵erent variables (the components of x);
the superscript is an index. Although confusing at first, this nota-
tion was invented by AlbertPEinstein who noticed that quantities like
k
a21 x1 + a22 x2 · · · + a2k xk =: 2 j
j=1 aj x , can be written unambiguously
as a2j xj . This is called Einstein summation notation. The most im-
portant thing to remember is that the index j is a dummy variable,
68
2.6 Review Problems 69
Show that your rule for multiplying a matrix by a vector obeys the
linearity property.
4. The standard basis vector ei is a column vector with a one in the ith
row, and zeroes everywhere else. Using the rule for multiplying a matrix
times a vector in problem 3, find a simple rule for multiplying M ei ,
where M is the general matrix defined there.
69
70 Systems of Linear Equations
70
The Simplex Method
3
In Chapter 2, you learned how to handle systems of linear equations. However
there are many situations in which inequalities appear instead of equalities.
In such cases we are often interested in an optimal solution extremizing a
particular quantity of interest. Questions like this are a focus of fields such as
mathematical optimization and operations research. For the case where the
functions involved are linear, these problems go under the title linear pro-
gramming. Originally these ideas were driven by military applications, but
by now are ubiquitous in science and industry. Gigantic computers are dedi-
cated to implementing linear programming methods such as George Dantzig’s
simplex algorithm–the topic of this chapter.
71
72 The Simplex Method
Finally Pablo knows that oranges have twice as much sugar as apples and that apples
have 5 grams of sugar each. Too much sugar is unhealthy, so Pablo wants to keep the
children’s sugar intake as low as possible. How many oranges and apples should Pablo
suggest that the school board put on the menu?
x 5 and y 7,
to fulfill the school board’s politically motivated wishes. The teacher’s and parent’s
fruit requirement means that
x + y 15 ,
but to keep the canteen tidy
x + y 25 .
Now let
s = 5x + 10y .
This linear function of (x, y) represents the grams of sugar in x apples and y oranges.
The problem is asking us to minimize s subject to the four linear inequalities listed
above.
72
3.2 Graphical Solutions 73
x 5
y 7
15 x + y 25 .
You might be able to see the solution to Pablo’s problem already. Oranges
are very sugary, so they should be kept low, thus y = 7. Also, the less fruit
the better, so the answer had better lie on the line x + y = 15. Hence,
the answer must be at the vertex (8, 7). Actually this is a general feature
73
74 The Simplex Method
The plot of a linear function of two variables is a plane through the origin.
Restricting the variables to the feasible region gives some lamina in 3-space.
Since the function we want to optimize is linear (and assumedly non-zero), if
we pick a point in the middle of this lamina, we can always increase/decrease
the function by moving out to an edge and, in turn, along that edge to a
corner. Applying this to the above picture, we see that Pablo’s best option
is 110 grams of sugar a week, in the form of 8 apples and 7 oranges.
It is worthwhile to contrast the optimization problem for a linear function
with the non-linear case you may have seen in calculus courses:
74
3.3 Dantzig’s Algorithm 75
Here we have plotted the curve f (x) = d in the case where the function f is
linear and non-linear. To optimize f in the interval [a, b], for the linear case
we just need to compute and compare the values f (a) and f (b). In contrast,
for non-linear functions it is necessary to also compute the derivative df /dx
to study whether there are extrema inside the interval.
75
76 The Simplex Method
c1 := x+y+z+w = 5
c2 := x + 2y + 3z + 2w = 6 ,
where x 0, y 0, z 0 and w 0.
3x + 3y + z 4w + f = 0 .
Keep in mind that the first four columns correspond to the positive variables (x, y, z, w)
and that the last row has the information of the function f . The general case is depicted
in figure 3.1.
Now the system is written as an augmented matrix where the last row
encodes the objective function and the other rows the constraints. Clearly we
can perform row operations on the constraint rows since this will not change
the solutions to the constraints. Moreover, we can add any amount of the
constraint rows to the last row, since this just amounts to adding a constant
to the function we want to extremize.
76
3.3 Dantzig’s Algorithm 77
77
78 The Simplex Method
Precisely because we chose the second row to perform our row operations, all entries
in the last column remain positive. This allows us to continue the algorithm.
We now repeat the above procedure: There is a 1 in the first column of the last
row. We want to zero it out while adding as little to f as possible. This is achieved
by adding twice the first row to the last row:
0 1
1 1
2 0 2 0 0 2 c1 12 c2 = 2
B C
@ 1 2 3 2 0 6 A c2 = 6
0 7 6 0 1 16 f = 16 7y 6z .
The Dantzig algorithm terminates if all the coefficients in the last row (save perhaps
for the last entry which encodes the value of the objective) are positive. To see why
we are done, lets write out what our row operations have done in terms of the function
f and the constraints (c1 , c2 ). First we have
f = 16 7y 6z
f = 16 .
Finally, we check that the constraints can be solved with y = 0 = z and positive
(x, w). Indeed, they can by taking x = 4, w = 1.
x1 + x2 3, x1 + x2 13 ,
so are not of the form M x = v. To achieve this we introduce two new positive
variables x3 0, x4 4 and write
c1 := x1 + x2 x3 = 3 , c2 := x1 + x2 + x4 = 13 .
78
3.4 Pablo Meets Dantzig 79
These are called slack variables because they take up the “slack” required to convert
inequality to equality. This pair of equations can now be written as M x = v,
0 1
✓ ◆ x1 ✓ ◆
1 1 1 0 B C
Bx2 C = 3 .
1 1 0 1 @x3 A 13
x4
Finally, Pablo wants to minimize sugar s = 5x + 10y, but the standard problem
maximizes f . Thus the so-called objective function f = s + 95 = 5x1 10x2 .
(Notice that it makes no di↵erence whether we maximize s or s + 95, we choose
the latter since it is a linear function of (x1 , x2 ).) Now we can build an augmented
matrix whose last row reflects the objective function equation 5x1 + 10x2 + f = 0:
0 1
1 1 1 0 0 3
@ 1 1 0 1 0 13 A .
5 10 0 0 1 0
Here it seems that the simplex algorithm already terminates because the last row only
has positive coefficients, so that setting x1 = 0 = x2 would be optimal. However, this
does not solve the constraints (for positive values of the slack variables x3 and x4 ).
Thus one more (very dirty) trick is needed. We add two more, positive, (so-called)
artificial variables x5 and x6 to the problem which we use to shift each constraint
c1 ! c1 x5 , c2 ! c 2 x6 .
The idea being that for large positive ↵, the modified objective function
f ↵x5 ↵x6
is only maximal when the artificial variables vanish so the underlying problem is un-
changed. Lets take ↵ = 10 (our solution will not depend on this choice) so that our
augmented matrix reads
0 1
1 1 1 0 1 0 0 3
@ 1 1 0 1 0 1 0 13 A
5 10 0 0 10 10 1 0
0 1
1 1 1 0 1 0 0 3
R30 =R3 10R1 10R2
⇠ @ 1 1 0 1 0 1 0 13 A .
15 10 10 10 0 0 1 160
Here we performed one row operation to zero out the coefficients of the artificial
variables. Now we are ready to run the simplex algorithm exactly as in section 3.3.
79
80 The Simplex Method
The first row operation uses the 1 in the top of the first column to zero out the most
negative entry in the last row:
0 1
1 1 1 0 1 0 0 3
@ 1 1 0 1 0 1 0 13 A
0 5 5 10 15 0 1 115
0 1
0
1 1 1 0 1 0 0 3
R2 =R2 R1
⇠ @ 0 0 1 1 1 1 0 10 A
0 5 5 10 15 0 1 115
0 1
1 1 1 0 1 0 0 3
R30 =R3 +10R2
⇠ @ 0 0 1 1 1 1 0 10 A .
0 5 5 0 5 10 1 15
Now the variables (x2 , x3 , x5 , x6 ) have zero coefficients so must be set to zero to
maximize f . The optimum value is f = 15 so s = f + 95 = 110 exactly as before.
Finally, to solve the constraints x1 = 3 and x4 = 10 so that x = 8 and y = 7 which
also agrees with our previous result.
Clearly, performed by hand, the simplex algorithm was slow and complex
for Pablo’s problem. However, the key point is that it is an algorithm that
can be fed to a computer. For problems with many variables, this method is
much faster than simply checking all vertices as we did in section 3.2.
x 0, y 0, x + 2y 2 , 2x + y 2 ,
by
80
3.5 Review Problems 81
their profit (all of which goes to shareholders, not operating costs). The
quality of oil from well A is better than from well B, so is worth 50%
more per barrel. The Greasy government cares about the environment
and will not allow Conoil to pump in total more than 6 million barrels
per year. Well A costs twice as much as well B to operate. Conoil’s
yearly operating budget is only sufficient to pump at most 10 million
barrels from well B per year. Using both a graphical method and then
(as a double check) Dantzig’s algorithm, determine how many barrels
Conoil should pump from each well to maximize their profits.
81
82 The Simplex Method
82
Vectors in Space, n-Vectors
4
To continue our linear algebra journey, we must discuss n-vectors with an
arbitrarily large number of components. The simplest way to think about
these is as ordered lists of numbers,
0 1
a1
B C
a = @ ... A .
an
83
84 Vectors in Space, n-Vectors
84
4.2 Hyperplanes 85
4.2 Hyperplanes
Vectors in Rn are impossible to visualize unless n is 1,2, or 3. However,
familiar objects like lines and planes still make sense for any value of n: The
line L along the direction defined by a vector v and through a point P labeled
by a vector u can be written as
L = {u + tv | t 2 R} .
80 1 0 1? 9
>
> 1 1 ?? >
>
<B C B0C? =
B2 C B C ?
Example 45 @ A + t @ A?t 2 R describes a line in R4 parallel to the x1 -axis.
>
> 3 0 ? >
>
: ;
4 0 ?
unless both vectors are in the same line, in which case, one of the vectors
is a scalar multiple of the other. The sum of u and v corresponds to laying
the two vectors head-to-tail and drawing the connecting vector. If u and v
determine a plane, then their sum lies in the plane determined by u and v.
85
86 Vectors in Space, n-Vectors
{P + su + tv | s, t 2 R} .
Parametric Notation
We can generalize the notion of a plane with the following recursive def-
inition. (That is, infinitely many things are defined in the following line.)
86
4.2 Hyperplanes 87
80 1 0 1 0 1? 9
> 3 1 0 ? >
>
> ? >
>
>
> B1C B0C B1C? >
>
> B
<B C C B C B C ? >
=
4 B0 C B 0C ?
B C B C B C ?
= B C + a B C + b B C?a, b 2 R
>
>
> B1C B0C B0C? >
>
>
>
> @ 5A @0 A @0A? >
>
>
: ? >
;
9 0 0 ?
87
88 Vectors in Space, n-Vectors
You might sometimes encounter the word “hyperplane” without the qual-
ifier “k-dimensional. When the dimension k is not specified, one usually as-
sumes that k = n 1 for a hyperplane inside Rn . This is the kind of object
that is specified by one algebraic equation in n variables.
is
80 1 0 1 0 1 0 1 0 1? 9
>
> 1 1 1 1 1 ?? >
>
>
> >
<BB0C
C B
B 1C
C
B
B 0CC
B
B 0CC
B
B 1C
C?
? >
=
B0C + s2 B 0C B 1C B 0C B 0C ? s 2 , s3 , s4 , s5 2 R ,
> B C B C + s3 B C + s4 B C + s5 B C? >
>
> @0A @ 0A @ 0 A @ 1 A @ 0A? >
>
>
: ? >
;
0 0 0 0 1 ?
a 4-dimensional hyperplane in R5 .
Using the Law of Cosines, we can then figure out the angle between two
vectors. Given two vectors v and u that span a plane in Rn , we can then
connect the ends of v and u with the vector v u.
88
4.3 Directions and Magnitudes 89
Thus,
kuk kvk cos ✓ = u1 v 1 + · · · + un v n .
Note that in the above discussion, we have assumed (correctly) that Eu-
clidean lengths in Rn give the usual notion of lengths of vectors for any plane
in Rn . This now motivates the definition of the dot product.
1 00 1
u1 v1
B .. C B .. C
Definition The dot product of u = @ . A and v = @ . A is
un vn
u v := u1 v 1 + · · · + un v n .
89
90 Vectors in Space, n-Vectors
The sum above is the one Gauß, according to legend, could do in kindergarten.
u v = kukkvk cos ✓ .
90
4.3 Directions and Magnitudes 91
1. symmetric:
u v = v u,
2. Distributive:
u (v + w) = u v + u w ,
u (cv + dw) = c u v + d u w ,
and
(cu + dw) v = c u v + d w v .
4. Positive Definite:
u u 0,
and u u = 0 only when u itself is the 0-vector.
There are, in fact, many di↵erent useful ways to define lengths of vectors.
Notice in the definition above that we first defined the dot product, and then
defined everything else in terms of the dot product. So if we change our idea
of the dot product, we change our notion of length and angle as well. The
dot product determines the Euclidean length and angle between two vectors.
Other definitions of length and angle arise from inner products, which
have all of the properties listed above (except that in some contexts the
positive definite requirement is relaxed). Instead of writing for other inner
products, we usually write hu, vi to avoid confusion.
91
92 Vectors in Space, n-Vectors
p
• separated by a time hX1 , X2 i if hX1 , X2 i 0.
In particular, the di↵erence in time coordinates t2 t1 is not the time between the
two points! (Compare this to using polar coordinates for which the distance between
two points (r, ✓1 ) and (r, ✓2 ) is not ✓2 ✓1 ; coordinate di↵erences are not necessarily
distances.)
92
4.3 Directions and Magnitudes 93
ku + vk kuk + kvk.
Proof.
ku + vk2 = (u + v) (u + v)
= u u + 2u v + v v
= kuk2 + kvk2 + 2 kuk kvk cos ✓
= (kuk + kvk)2 + 2 kuk kvk(cos ✓ 1)
(kuk + kvk)2 .
That is, the square of the left-hand side of the triangle inequality is the
square of the right-hand side. Since both the things being squared are posi-
tive, the inequality holds without the square;
ku + vk kuk + kvk
Example 54 Let 0 1 0 1
1 4
B2C B3C
a=B C B C
@3A and b = @2A ,
4 1
so that
a a = b b = 1 + 22 + 32 + 42 = 30
93
94 Vectors in Space, n-Vectors
p 2 p
) kak = 30 = kbk and kak + kbk = (2 30)2 = 120 .
Since 0 1
5
B5C
a+b=B C
@5A ,
5
we have
2
ka + bk2 = 52 + 52 + 52 + 52 = 100 < 120 = kak + kbk
as predicted by the triangle inequality. p p
Notice also that a b = 1.4 + 2.3 + 3.2 + 4.1 = 20 < 30. 30 = 30 = kak kbk in
accordance with the Cauchy–Schwarz inequality.
94
4.4 Vectors, Lists and Functions: RS 95
What you have really done here is assign a number to each element of the
set S. In other words, the second list is a function
f : S ! R.
Given two lists like the second one above, we could easily add them – if you
plan to buy 5 apples and I am buying 3 apples, together we will buy 8 apples!
In fact, the second list is really a 5-vector in disguise.
In general it is helpful to think of an n-vector as a function whose domain
is the set {1, . . . , n}. This is equivalent to thinking of an n-vector as an
ordered list of n numbers. These two ideas give us two equivalent notions for
the set of all n-vectors:
80 1 9
1
>
< a >
=
B .. C 1
R := @ . A a , . . . , a 2 R = {a : {1, . . . , n} ! R} =: R{1,··· ,n}
n n
>
: an >
;
The notation R{1,··· ,n} is used to denote the set of all functions from {1, . . . , n}
to R.
Similarly, for any set S the notation RS denotes the set of functions from
S to R:
RS := {f : S ! R} .
When S is an ordered set like {1, . . . , n}, it is natural to write the components
in order. When the elements of S do not have a natural ordering, doing so
might cause confusion.
95
96 Vectors in Space, n-Vectors
a? = 3, a# = 5, a⇤ = 2.
because the elements of S do not have an ordering, since as sets {⇤, ?, #} = {?, #, ⇤}.
a? = 3, a# = 5, a⇤ = 2
and
b? = 2, b# = 4, b⇤ = 13
then a + b 2 RS is the function such that
96
4.5 Review Problems 97
A = (200, 300, 50, 50, 100, 100, 200, 500, 1000, 100) .
He also listed the number of times he mowed each lawn in a given year,
for the year 1988 that ordered list was
f = (20, 1, 2, 4, 1, 5, 2, 1, 10, 6) .
2. (2) Find the angle between the diagonal of the unit square in R2 and
one of the coordinate axes.
(3) Find the angle between the diagonal of the unit cube in R3 and
one of the coordinate axes.
(n) Find the angle between the diagonal of the unit (hyper)-cube in
Rn and one of the coordinate axes.
97
98 Vectors in Space, n-Vectors
98
4.5 Review Problems 99
where the vector p labels a given point on the plane and n is a vector
normal to the plane. Let N and P be vectors in R101 and
0 1
x1
B x2 C
B C
X = B .. C .
@ . A
x101
7. Let 0 1 0 1
1 1
B1C B 2 C
B C B C
B1C B 3 C
u = B C and v = B C
B .. C B .. C
@.A @ . A
1 101
Find the projection of v onto u and the projection of u onto v. (Hint:
Remember that two vectors u and v define a plane, so first work out
how to project one vector onto another in a plane. The picture from
Section 14.4 could help.)
8. If the solution set to the equation A(x) = b is the set of vectors whose
tips lie on the paraboloid z = x2 + y 2 , then what can you say about
the function A?
10. If A is a linear operator and both v and cv (for any real number c) are
solutions to Ax = b, then what can you say about b?
99
100 Vectors in Space, n-Vectors
100
Vector Spaces
5
As suggested at the end of chapter 4, the vector spaces Rn are not the only
vector spaces. We now give a general definition that includes Rn for all
values of n, and RS for all sets S, and more. This mathematical structure is
applicable to a wide range of real-world problems and allows for tremendous
economy of thought; the idea of a basis for a vector space will drive home
the main idea of vector spaces; they are sets with very simple structure.
The two key properties of vectors are that they can be added together
and multiplied by scalars. Thus, before giving a rigorous definition of vector
spaces, we restate the main idea.
101
102 Vector Spaces
Remark Rather than writing (V, +, . , R), we will often say “let V be a vector space
over R”. If it is obvious that the numbers used are real numbers, then “let V be a
vector space” suffices. Also, don’t confuse the scalar product · with the dot product .
The scalar product is a function that takes as its two inputs one number and one
vector and returns a vector as its output. This can be written
·: R ⇥ V ! V .
Similarly
+:V ⇥V !V .
On the other hand, the dot product takes two vectors and returns a number. Suc-
cinctly: : V ⇥ V ! R. Once the properties of a vector space have been verified,
we’ll just write scalar multiplication with juxtaposition cv = c · v, though, to keep our
notation efficient.
102
5.1 Examples of Vector Spaces 103
Example 58
RN = {f | f : N ! R}
Here the vector space is the set of functions that take in a natural number n and return
a real number. The addition is just addition of functions: (f1 +f2 )(n) = f1 (n)+f2 (n).
Scalar multiplication is just as simple: c · f (n) = cf (n).
We can think of these functions as infinitely large ordered lists of numbers: f (1) =
13 = 1 is the first component, f (2) = 23 = 8 is the second, and so on. Then for
example the function f (n) = n3 would look like this:
0 1
1
B 8 C
B C
B 27 C
B C
f =B .
B . C.
.
C
B C
B n3 C
@ A
..
.
Thinking this way, RN is the space of all infinite sequences. Because we can not write
a list infinitely long (without infinite time and ink), one can not define an element of
this space explicitly; definitions that are implicit, as above, or algebraic as in f (n) = n3
(for all n 2 N) suffice.
Let’s check some axioms.
(+iv) (Zero) We need to propose a zero vector. The constant zero function g(n) = 0
works because then f (n) + g(n) = f (n) + 0 = f (n).
The other axioms should also be checked. This can be done using properties of the
real numbers.
RR = {f | f : R ! R}
103
104 Vector Spaces
You can probably figure out how to show that RS is vector space for any
set S. This might lead you to guess that all vector spaces are of the form RS
for some set S. The following is a counterexample.
Example 61 Another very important example of a vector space is the space of all
di↵erentiable functions: ⇢
d
f: R!R f exists .
dx
From calculus, we know that the sum of any two di↵erentiable functions is dif-
ferentiable, since the derivative distributes over addition. A scalar multiple of a func-
tion is also di↵erentiable, since the derivative commutes with scalar multiplication
d d
( dx (cf ) = c dx f ). The zero function is just the function such that 0(x) = 0 for ev-
ery x. The rest of the vector space properties are inherited from addition and scalar
multiplication in R.
104
5.1 Examples of Vector Spaces 105
This example is called a subspace because it gives a vector space inside another vector
space. See chapter 9 for details. Indeed, because it is determined by the linear map
given by the matrix M , it is called ker M , or in words, the kernel of M , for this see
chapter 16.
105
106 Vector Spaces
A hyperplane which does not contain the origin cannot be a vector space
because it fails condition (+iv).
It is also possible to build new vector spaces from old ones using the
product of sets. Remember that if V and W are sets, then their product is
the new set
V ⇥ W = {(v, w)|v 2 V, w 2 W } ,
or in words, all ordered pairs of elements from V and W . In fact V ⇥ W is a
vector space if V and W are. We have actually been using this fact already:
Example 64 The real numbers R form a vector space (over R). The new vector space
R ⇥ R = {(x, y)|x 2 R, y 2 R}
5.1.1 Non-Examples
The solution set to a linear non-homogeneous equation is not a vector space
because it does not contain the zero vector and therefore fails (iv).
Do notice that if just one of the vector space rules is broken, the example is
not a vector space.
Most sets of n-vectors are not vector spaces.
106
5.2 Other Fields 107
⇢✓ ◆
a
Example 66 P := a, b 0 is not a vector space because the set fails (·i)
b
✓ ◆ ✓ ◆ ✓ ◆
1 1 2
since 2 P but 2 = 2
/ P.
1 1 2
does not form a vector space because it does not satisfy (+i). The functions f (x) =
x2 +1 and g(x) = 5 are in the set, but their sum (f +g)(x) = x2 4 = (x+2)(x 2)
is not since (f + g)(2) = 0.
C = x + iy | i2 = 1, x, y 2 R .
Example 68 In quantum physics, vector spaces over C describe all possible states a
physical system can have. For example,
⇢✓ ◆
V = | ,µ 2 C
µ
✓ ◆ ✓ ◆
1 0
is the set of possible states for an electron’s spin. The vectors and describe,
0 1
respectively, ✓an electron
◆ with spin “up” and “down” along a given direction. Other
i
vectors, like are permissible, since the base field is the complex numbers. Such
i
states represent a mixture of spin up and spin down for the given direction (a rather
counterintuitive yet experimentally verifiable concept), but a given spin in some other
direction.
107
108 Vector Spaces
Complex numbers are very useful because of a special property that they
enjoy: every polynomial over the complex numbers factors into a product of
linear polynomials. For example, the polynomial
x2 + 1
doesn’t factor over real numbers, but over complex numbers it factors into
(x + i)(x i) .
x2 = 1,
B2 = Z2 = {0, 1} ,
+ 0 1 ⇥ 0 1
0 0 1 0 0 0
1 1 0 1 0 1
108
5.3 Review Problems 109
(b) What would happen if you used R as the base field (try comparing
to problem 1).
3. (a) Consider the set of convergent sequences, with the same addi-
tion and scalar multiplication that we defined for the space of
sequences:
n o
V = f | f : N ! R, lim f (n) 2 R ⇢ RN .
n!1
109
110 Vector Spaces
5. Let P3R be the set of polynomials with real coefficients of degree three
or less.
(a) Propose a definition of addition and scalar multiplication to make
P3R a vector space.
(b) Identify the zero vector, and find the additive inverse for the vector
3 2x + x2 .
(c) Show that P3R is not a vector space over C. Propose a small
change to the definition of P3R to make it a vector space over C.
(Hint: Every little symbol in the the instructions for par (c) is
important!)
Hint
9. Let V be a vector space and S any set. Show that the set V S of all
functions S ! V is a vector space. Hint: first decide upon a rule for
adding functions whose outputs are vectors.
110
Linear Transformations
6
The main objects of study in any course in linear algebra are linear functions:
Remark We will often refer to linear functions by names like “linear map”, “linear
operator” or “linear transformation”. In some contexts you will also see the name
“homomorphism” which generally is applied to functions from one kind of set to the
same kind of set while respecting any structures on the sets; linear maps are from
vector spaces to vector spaces that respect scalar multiplication and addition, the two
structures on vector spaces. It is common to denote a linear function by capital L
as a reminder of its linearity, but sometimes we will use just f , after all we are just
studying very special functions.
The definition above coincides with the two part description in Chapter 1;
the case r = 1, s = 1 describes additivity, while s = 0 describes homogeneity.
We are now ready to learn the powerful consequences of linearity.
111
112 Linear Transformations
because by homogeneity
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
5 1 1 5 25
L =L 5 = 5L =5 = .
0 0 0 3 15
because by additivity
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
1 1 0 1 0 5 2 7
L =L + =L +L = + = .
1 0 1 0 1 3 2 5
112
6.1 The Consequence of Linearity 113
we know how L acts on every vector from R2 by linearity based on just two pieces of
information;
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
x 1 0 1 0 5 2 5x + 2y
L =L x +y = xL +yL =x +y = .
y 0 1 0 1 3 2 3x + 2y
Thus, the value of L at infinitely many inputs is completely specified by its value at
just two inputs. (We can see now that L acts in exactly the way the matrix
✓ ◆
5 2
3 2
This is the reason that linear functions are so nice; they are secretly very
simple functions by virtue of two characteristics:
113
114 Linear Transformations
Example 71 Let 8 9
0 1 0 1
< 1 0 =
@ A @ A
V = c 1 1 + c 2 1 c 1 , c2 2 R
: ;
0 1
The domain of L is a plane and its range is the line through the origin in the x2
direction.
It is not clear how to formulate L as a matrix; since
0 1 0 10 1 0 1
c1 0 0 0 c1 0
L @c1 + c2 A = @1 0 1A @c1 + c2 A = (c1 + c2 ) @1A ,
c2 0 0 0 c2 0
or
0 1 0 10 1 0 1
c1 0 0 0 c1 0
L @c1 + c2 A = @0 1 0A @c1 + c2 A = (c1 + c2 ) @1A ,
c2 0 0 0 c2 0
114
6.3 Linear Di↵erential Operators 115
line is 1 dimensional, but the careful definition of “dimension” takes some work; this
is tackled in Chapter 11.) This leads us to write
2 0 1 0 13 0 1 0 1 0 1
1 0 0 0 0 0 ✓ ◆
c
L 4c1 @1A + c2 @1A5 = c1 @1A + c2 @1A = @1 1A 1 .
c2
0 1 0 0 0 0
0 1
0 0
This makes sense, but requires a warning: The matrix @1 1A specifies L so long
0 0
as you also provide the information that you are labeling points in the plane V by the
two numbers (c1 , c2 ).
115
116 Linear Transformations
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
1 2 1 6
L = , and L = .
1 4 1 8
✓ ◆ ✓ ◆ ✓ ◆
x 1 1
This is because any vector in R is a sum of multiples of
2 and which
y 1 1
can be calculated via a linear systems problem as follows:
✓ ◆ ✓ ◆ ✓ ◆
x 1 1
=a +b
y 1 1
✓ ◆✓ ◆ ✓ ◆
1 1 a x
, =
1 1 b y
✓ ◆ ✓ ◆
1 1 x 1 0 x+y
2
, ⇠
1 1 y 0 1 x2y
⇢
a = x+y
2
,
b = x2y .
Thus
! ! !
x x+y 1 x y 1
= + .
y 2 1 2 1
We can then calculate how L acts on any vector by first expressing the vector as a
116
6.4 Bases (Take 1) 117
It should not surprise you to learn there are infinitely many pairs of
vectors from R2 with the property that any vector can be expressed as a
linear combination of them; any pair that when used as columns of a matrix
gives an invertible matrix works. Such a pair is called a basis for R2 .
Similarly, there are infinitely many triples of vectors with the property
that any vector from R3 can be expressed as a linear combination of them:
these are the triples that used as columns of a matrix give an invertible
matrix. Such a triple is called a basis for R3 .
In a similar spirit, there are infinitely many pairs of vectors with the
property that every vector in
8 0 1 0 1 9
< 1 0 =
@ A @ A
V = c 1 1 + c 2 1 c 1 , c2 2 R
: ;
0 1
117
118 Linear Transformations
(valid for all vectors u, v and any scalar c) is equivalent to the single
condition:
L(ru + sv) = rL(u) + sL(v) , (2)
(for all vectors u, v and any scalars r and s). Your answer should have
two parts. Show that (1) ) (2), and then show that (2) ) (1).
118
6.5 Review Problems 119
Hint
119
120 Linear Transformations
120
Matrices
7
Matrices are a powerful tool for calculations involving linear transformations.
It is important to understand how to find the matrix of a linear transforma-
tion and the properties of matrices.
Example 74 Let
⇢✓ ◆
a b
V = a, b, c, d 2 R
c d
be the vector space of 2 ⇥ 2 real matrices, with addition and scalar multiplication
defined componentwise. One choice of basis is the ordered set (or list) of matrices
✓✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆◆
1 0 0 1 0 0 0 0
B= , , , =: (e11 , e12 , e21 , e22 ) .
0 0 0 0 1 0 0 1
121
122 Matrices
Given a particular vector and a basis, your job is to write that vector as a sum of
multiples of basis elements. Here an arbitrary vector v 2 V is just a matrix, so we
write
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
a b a 0 0 b 0 0 0 0
v = = + + +
c d 0 0 0 0 c 0 0 d
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
1 0 0 1 0 0 0 0
= a +b +c +d
0 0 0 0 1 0 0 1
= a e11 + b e12 + c e21 + d e22 .
The coefficients (a, b, c, d) of the basis vectors (e11 , e12 , e21 , e22 ) encode the information
of which matrix the vector v is. We store them in column vector by writing
0 1 0 1
a a
B b C B C
v = a e11 + b e12 + c e21 + d e22 =: (e11 , e12 , e21 , e22 ) B C B bC
@ cA =: @ cA .
d d B
0 1
a ✓ ◆
B bC a b
B C
The 4-vector @ A 2 R encodes the vector
4 2 V but is NOT equal to it!
c c d
d
(After all, v is a matrix so could not equal a column vector.) Both notations on the
right hand side of the above equation really stand for the vector obtained by multiplying
the coefficients stored in the column vector by the corresponding basis element and
then summing over them.
are called the standard basis vectors of R2 = R{1,2} . Their description as functions
of {1, 2} are
⇢ ⇢
1 if k = 1 0 if k = 1
e1 (k) = , e2 (k) =
0 if k = 2 1 if k = 2 .
122
7.1 Linear Transformations and Matrices 123
It is natural to assign these the order: e1 is first and e2 is second. An arbitrary vector v
of R2 can be written as ✓ ◆
x
v= = xe1 + ye2 .
y
To emphasize that we are using the standard basis we define the list (or ordered set)
E = (e1 , e2 ) ,
and write ✓ ◆ ✓ ◆
x x
:= (e1 , e2 ) := xe1 + ye2 = v.
y E y
You should read this equation by saying:
✓ ◆
x
“The column vector of the vector v in the basis E is .”
y
Again, the first notation of a column vector with a subscript E refers to the vector
obtained by multiplying each basis vector by the corresponding scalar listed in the
column and then summing these, i.e. xe1 + ye2 . The second notation denotes exactly
the same thing but we first list the basis elements and then the column vector; a
useful trick because this can be read in the same way as matrix multiplication of a row
vector times a column vector–except that the entries of the row vector are themselves
vectors!
You should already try to write down the standard basis vectors for Rn
for other values of n and express an arbitrary vector in Rn in terms of them.
The last example probably seems pedantic because column vectors are al-
ready just ordered lists of numbers and the basis notation has simply allowed
us to “re-express” these as lists of numbers. Of course, this objection does
not apply to more complicated vector spaces like our first matrix example.
Moreover, as we saw earlier, there are infinitely many other pairs of vectors
in R2 that form a basis.
123
124 Matrices
B = (b, )
be the ordered basis. Note that for an unordered set we use the {} parentheses while
for lists or ordered sets we use ().
As before we define
✓ ◆ ✓ ◆
x x
:= (b, ) := xb + y .
y B y
You might think that the numbers x and y denote exactly the same vector as in the
previous example. However, they do not. Inserting the actual vectors that b and
represent we have
✓ ◆ ✓ ◆ ✓ ◆
1 1 x+y
xb + y = x +y = .
1 1 x y
Thus, to contrast, we have
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
x x+y x x
= and =
y B x y y E y
Only in the standard basis E does the column vector of v agree with the column vector
that v actually is!
Based on the above example, you might think that our aim would be to
find the “standard basis” for any problem. In fact, this is far from the truth.
Notice, for example that the vector
✓ ◆
1
v= = e1 + e2 = b
1
written in the standard basis E is just
✓ ◆
1
v= ,
1 E
which was easy to calculate. But in the basis B we find
✓ ◆
1
v= ,
0 B
124
7.1 Linear Transformations and Matrices 125
which is actually a simpler column vector! The fact that there are many
bases for any given vector space allows us to choose a basis in which our
computation is easiest. In any case, the standard basis only makes sense
for Rn . Suppose your vector space was the set of solutions to a di↵erential
equation–what would a standard basis then be?
B=( x, y, z) ,
125
126 Matrices
where ✓ ◆ ✓ ◆ ✓ ◆
0 1 0 i 1 0
x = , y = , z = .
1 0 i 0 0 1
These three matrices are the famous Pauli matrices; they are used to describe electrons
in quantum theory, or qubits in quantum computation. Let
✓ ◆
2+i 1+i
v= .
3 i 2 i
This gives four equations, i.e. a linear systems problem, for the ↵’s
8 x
> ↵x
> i↵y = 1+i
<
↵ + i↵y = 3 i
>
> ↵z = 2+i
:
↵z = 2 i
with solution
↵x = 2 , ↵y = 2 2i , ↵z = 2 + i.
Thus 0 1
✓ ◆ 2
2+i 1+i
v= =@ 2 iA .
3 i 2 i
2+i B
126
7.1 Linear Transformations and Matrices 127
The numbers (↵1 , ↵2 , . . . , ↵n ) are called the components of the vector v. Two
useful shorthand notations for this are
0 1 0 1
↵1 ↵1
B ↵2 C B ↵2 C
B C B C
v = B .. C = (b1 , b2 , . . . , bn ) B .. C .
@ . A @ . A
n
↵ B
↵n
0 1 1
m1 m12 · · · m1i · · ·
Bm2 m2 · · · m2 · · ·C
B 1 2 i C
B .. .. .. C
B
= ( 1 , 2 , . . . , j , . . .) B . . . C
C
Bmj mj · · · mj · · ·C
@ 1 2 i A
.. .. ..
. . .
Example 79 Consider L : V ! R3 (as in example 71) defined by
0 1 0 1 0 1 0 1
1 0 0 0
L 1 = 1 , L 1 = 1A .
@ A @ A @ A @
0 0 1 0
127
128 Matrices
We had trouble expressing this linear operator as a matrix. Lets take input basis
00 1 0 11
1 0
B = @@1A , @1AA =: (b1 , b2 ) ,
0 1
The matrix on the right is the matrix of L in these bases. More succinctly we could
write 0 1
✓ ◆ 0
x
L = (x + y) @1A
y B
0 E
0 1
0 0
and thus see that L acts like the matrix 1 1A.
@
0 0
Hence 00 1 1
✓ ◆ 0 0 ✓ ◆
x x A
L = @@1 1A ;
y B y
0 0 E
given input and output bases, the linear operator is now encoded by a matrix.
128
7.2 Review Problems 129
Example 80 Lets compute a matrix for the derivative operator acting on the vector
space of polynomials of degree 2 or less:
V = {a0 1 + a1 x + a2 x2 | a0 , a1 , a2 2 R} .
and 0 1 0 1
a b
d @ A
b = b · 1 + 2cx + 0x2 = @ 2c A
dx
c B 0 B
In the ordered basis B for both domain and range
0 1
0 1 0
d B @
7! 0 0 2A
dx
0 0 0
Notice this last line makes no sense without explaining which bases we are using!
1. A door factory can buy supplies in two kinds of packages, f and g. The
package f contains 3 slabs of wood, 4 fasteners, and 6 brackets. The
package g contains 5 fasteners, 3 brackets, and 7 slabs of wood.
(a) Explain how to view the packages f and g as functions and list
their inputs and outputs.
129
130 Matrices
(b) Choose an ordering for the 3 kinds of supplies and use this to
rewrite f and g as elements of R3 .
2. You are designing a simple keyboard synthesizer with two keys. If you
push the first key with intensity a then the speaker moves in time as
a sin(t). If you push the second key with intensity b then the speaker
moves in time as b sin(2t). If the keys are pressed simultaneously,
(a) describe the set of all sounds that come out of your synthesizer.
(Hint: Sounds can be “added”.)
✓ ◆
3
(b) Graph the function 2 R{1,2} .
5
✓ ◆
3
(c) Let B = (sin(t), sin(2t)). Explain why is not in R{1,2} but
5 B
is still a function.
✓ ◆
3
(d) Graph the function .
5 B
d
3. (a) Find the matrix for dx acting on the vector space V of polynomi-
als of degree 2 or less in the ordered basis B = (x2 , x, 1)
(b) Use the matrix from part (a) to rewrite the di↵erential equation
d
dx
p(x) = x as a matrix equation. Find all solutions of the matrix
equation. Translate them into elements of V .
130
7.2 Review Problems 131
d
(c) Find the matrix for dx acting on the vector space V in the ordered
0 2 2
basis B = (x + x, x x, 1).
(d) Use the matrix from part (c) to rewrite the di↵erential equation
d
dx
p(x) = x as a matrix equation. Find all solutions of the matrix
equation. Translate them into elements of V .
(e) Compare and contrast your results from parts (b) and (d).
d
4. Find the “matrix” for dx acting on the vector space of all power series
in the ordered basis (1, x, x2 , x3 , ...). Use this matrix to find all power
d
series solutions to the di↵erential equation dx f (x) = x. Hint: your
“matrix” may not have finite size.
2
2 acting on {c1 cos(x) + c2 sin(x) | c1 , c2 2 R} in
d
5. Find the matrix for dx
the ordered basis (cos(x), sin(x)).
131
132 Matrices
Find the
◆ ✓✓ F :◆R ! ◆ R whose
2
(k) first
✓✓ order◆ polynomial
◆ ✓✓ ◆ function ✓✓ graph
◆ contains
◆
0 0 1 1
,1 , ,2 , , 3 , and ,4 .
0 1 0 1
(l) homogeneous first
✓✓order
◆ polynomial
◆ ✓✓ ◆function
◆ : R2 !
H✓✓ ◆ R ◆whose
0 1 1
graph contains ,2 , , 3 , and ,4 .
1 0 1
132
7.3 Properties of Matrices 133
(m) second✓✓
order◆polynomial
◆ ✓✓ function
◆ ◆ ✓✓ J : R◆2
! ◆R whose graph con-
0 0 0
tains ,0 , ,2 , ,5 ,
0 1 2
✓✓ ◆ ◆ ✓✓ ◆ ◆ ✓✓ ◆ ◆
1 2 1
,3 , , 6 , and ,4 .
0 0 1
(o) How many points in the graph of a q-th order polynomial function
Rn ! Rn would completely determine the function?
(p) In particular, how many points of the graph of linear function
Rn ! Rn would completely determine the function? How does a
matrix (in the standard basis) encode this information?
(q) Propose a way to store the information required in 8g above in an
array of numbers.
(r) Propose a way to store the information required in 8o above in an
array of numbers.
133
134 Matrices
The numbers mij are called entries. The superscript indexes the row of
the matrix and the subscript indexes the column of the matrix in which mij
appears.
v = v1 v2 · · · vk .
The transpose of a column vector is the corresponding row vector and vice
versa:
Example 81 Let
0 1
1
v = @2A .
3
Then
vT = 1 2 3 ,
and (v T )T = v. This is an example of an involution, namely an operation which when
performed twice does nothing.
Example 82 In computer graphics, you may have encountered image files with a .gif
extension. These files are actually just matrices: at the start of the file the size of the
matrix is given, after which each number is a matrix entry indicating the color of a
particular pixel in the image.
This matrix then has its rows shu✏ed a bit: by listing, say, every eighth row, a web
browser downloading the file can start displaying an incomplete version of the picture
before the download is complete.
Finally, a compression algorithm is applied to the matrix to reduce the file size.
134
7.3 Properties of Matrices 135
For example, the graph pictured above would have the following matrix, where mij
indicates the number of edges between the vertices labeled i and j:
0 1
1 2 1 1
B2 0 1 0C
M =B
@1
C
1 0 1A
1 0 1 3
135
136 Matrices
Then
0 1 0 1
| | | | | |
M N = M @N1 N2 · · · Ns A = @M N1 M N2 · · · M Ns A
| | | | | |
136
7.3 Properties of Matrices 137
⇣ ⌘ ⇣ ⌘ ⇣ ⌘
r ⇥ k times k ⇥ m is r ⇥ m
The entries of M N are made from the dot products of the rows of
M with the columns of N .
Example 85 Let
0 1 0 T 1
1 3 u ✓ ◆
2 3 1
M = @3 5A =: @ v T A and N = =: a b c
T 0 1 0
2 6 w
where
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
1 3 2 2 3 1
u= , v= , w= , a= , b= , c= .
3 5 6 0 1 0
Then 0 1 0 1
u·a u·b u·c 2 6 1
M N = @ v · a v · b v · c A = @6 14 3A .
w·a w·b w·c 4 12 2
137
138 Matrices
Mx = 0
Remark Remember that the set of all vectors that can be obtained by adding up
scalar multiples of the columns of a matrix is called its column space . Similarly the
row space is the set of all row vectors obtained by adding up multiples of the rows
of a matrix. The above theorem says that if M x = 0, then the vector x is orthogonal
to every vector in the row space of M .
This is the same as the rule we use to multiply matrices. In other words,
L(M ) = N M is a linear transformation.
Matrix Terminology Let M = (mij ) be a matrix. The entries mii are called
diagonal, and the set {m11 , m22 , . . .} is called the diagonal of the matrix.
Any r ⇥ r matrix is called a square matrix. A square matrix that is
zero for all non-diagonal entries is called a diagonal matrix. An example
of a square diagonal matrix is
0 1
2 0 0
@0 3 0A .
0 0 0
138
7.3 Properties of Matrices 139
Ir M = M I k = M
M T = (m̂ij )
A matrix M is symmetric if M = M T .
Example 86 0 1
✓ ◆T 2 1
2 5 6
= @5 3A ,
1 3 4
6 4
and ✓ ◆✓ ◆T ✓ ◆
2 5 6 2 5 6 65 43
= ,
1 3 4 1 3 4 43 26
is symmetric.
Observations
139
140 Matrices
(M N )T = N T M T .
So first we compute
⇣Xr hXn i ⌘ ⇣X n h
r X i ⌘ ⇣X r X
n ⌘
(M N )R = mij njk rlk = mij njk rlk = mij njk rlk .
k=1 j=1 k=1 j=1 k=1 j=1
In the first step we just wrote out the definition for matrix multiplication, in the second
step we moved summation symbol outside the bracket (this is just the distributive
property x(y +z) = xy +xz for numbers) and in the last step we used the associativity
property for real numbers to remove the square brackets. Exactly the same reasoning
shows that
⇣X n hXr i⌘ ⇣ X r X n h i⌘ ⇣ X r X n ⌘
M (N R) = mij njk rlk = mij njk rlk = mij njk rlk .
j=1 k=1 k=1 j=1 k=1 j=1
1
As a fun remark, note that Einstein would simply have written
(M N )R = (mij njk )rlk = mij njk rlk = mij (njk rlk ) = M (N R).
140
7.3 Properties of Matrices 141
M N 6= N M .
Do Matrices Commute?
rotates vectors in the plane by an angle ✓. We can generalize this, using block matrices,
to three dimensions. In fact the following matrices built from a 2 ⇥ 2 rotation matrix,
a 1 ⇥ 1 identity matrix and zeroes everywhere else
0 1 0 1
cos ✓ sin ✓ 0 1 0 0
M = @ sin ✓ cos ✓ 0A and N = @0 cos ✓ sin ✓A ,
0 0 1 0 sin ✓ cos ✓
141
142 Matrices
• The
✓ blocks
◆ of a block matrix✓ must ◆
fit together to form a rectangle. So
B A C B
makes sense, but does not.
D C D A
142
7.3 Properties of Matrices 143
This is exactly M 2 .
143
144 Matrices
to define
M0 = I ,
the identity matrix, just like x0 = 1 for numbers.
As a result, any polynomial can be have square matrices in it’s domain.
Then ✓ ◆ ✓ ◆
2 1 2t 3 1 3t
M = , M = , ...
0 1 0 1
and so
✓ ◆ ✓ ◆ ✓ ◆
1 t 1 2t 1 3t
f (M ) = 2 +3
0 1 0 1 0 1
✓ ◆
2 6t
= .
0 2
144
7.3 Properties of Matrices 145
7.3.4 Trace
A large matrix contains a great deal of information, some of which often re-
flects the fact that you have not set up your problem efficiently. For example,
a clever choice of basis can often make the matrix of a linear transformation
very simple. Therefore, finding ways to extract the essential information of
a matrix is useful. Here we need to assume that n < 1 otherwise there are
subtleties with convergence that we’d have to address.
Definition The trace of a square matrix M = (mij ) is the sum of its diag-
onal entries: n
X
tr M = mii .
i=1
Example 91 0 1
2 7 6
tr @9 5 1A = 2 + 5 + 8 = 15 .
4 3 8
X
tr(M N ) = tr( Mli Njl )
l
XX
= Mli Nil
i l
XX
= Nil Mli
l i
X
= tr( Nil Mli )
i
= tr(N M ).
Proof Explanation
Thus we have a Theorem:
Theorem 7.3.3. For any square matrices M and N
tr(M N ) = tr(N M ).
145
146 Matrices
tr M = tr M T
This is true because the trace only uses the diagonal entries, which are fixed
by the transpose. For example,
✓ ◆ ✓ ◆ ✓ ◆T
1 1 1 2 1 2
tr = 4 = tr = tr .
2 3 1 3 1 3
Finally, trace is a linear transformation from matrices to the real numbers.
This is easy to check.
146
7.4 Review Problems 147
0 10 1
2 1 2 1 2 1 2 1 2 1
0 10 1 B0 C B
2 1 1 x B 2 1 2 1 C B0 1 2 1 2CC
B C B CB C
x y z @1 2 1A @ y A , B0 1 2 1 2 C B0 2 1 2 1C ,
B CB C
1 1 2 z @0 2 1 2 1 A @0 1 2 1 2A
0 0 0 0 2 0 0 0 0 1
0 4 1
10 2 2
10 1
2 3 3
4 3 3
1 2 1
B 5 2C B 5 2C B C
@ 2 3 3A @
6 3 3A @
4 5 2A .
16 10
1 2 1 12 3 3
7 8 2
(a) Let M = (mij ) and let N = (nij ). Write out a few of the entries of
each matrix in the form given at the beginning of section 7.3.
(b) Multiply out M N and write out a few of its entries in the same
form as in part (a). In terms of the entries of M and the entries
of N , what is the entry in row i and column j of M N ?
(c) Take the transpose (M N )T and write out a few of its entries in
the same form as in part (a). In terms of the entries of M and the
entries of N , what is the entry in row i and column j of (M N )T ?
(d) Take the transposes N T and M T and write out a few of their
entries in the same form as in part (a).
(e) Multiply out N T M T and write out a few of its entries in the same
form as in part a. In terms of the entries of M and the entries of
N , what is the entry in row i and column j of N T M T ?
(f) Show that the answers you got in parts (c) and (e) are the same.
✓ ◆
1 2 0
3. (a) Let A = . Find AAT and AT A and their traces.
3 1 4
147
148 Matrices
0 1 0 1
x1 y1
B .. C B .. C
4. Let x = @ . A and y = @ . A be column vectors. Show that the
xn yn
T
dot product x y = x I y.
Hint
Hint
148
7.4 Review Problems 149
Hint
0 1
1 0 0 0 0 0 0 1
B0 1 0 0 0 0 1 0C
B C
B0 0 1 0 0 1 0 0C
B C
B0 0 0 1 1 0 0 0C
9. Let M = B B0
C. Divide M into named blocks,
B 0 0 0 2 1 0 0CC
B0 0 0 0 0 2 0 0C
B C
@0 0 0 0 0 0 3 1A
0 0 0 0 0 0 0 3
with one block the 4 ⇥ 4 identity matrix, and then multiply blocks to
compute M 2 .
149
150 Matrices
(A 1 ) 1
= A.
1
(AB) = B 1A 1
150
7.5 Inverse Matrix 151
Figure 7.1: The formula for the inverse of a 2⇥2 matrix is worth memorizing!
Thus, much like the transpose, taking the inverse of a product reverses
the order of the product.
(A 1 )T = (AT ) 1
2 ⇥ 2 Example
151
152 Matrices
M X = V1 , M X = V2
we can consider augmented matrices with many columns on the right and
then apply Gaussian row reduction to the left side of the matrix. Once the
identity matrix is on the left side of the augmented matrix, then the solution
of each of the individual linear systems is on the right.
1 1
M V1 V2 ⇠ I M V1 M V2
1 1
M I ⇠ I M I = I M
0 1 1
1 2 3
Example 93 Find @ 2 1 0A .
4 2 5
We start by writing the augmented matrix, then apply row reduction to the left side.
0 1 0 1
1 2 3 1 0 0 1 2 3 1 0 0
B 2 1 C B
0 0 1 0A ⇠ @0 5 6 2 1 0C
@ A
4 2 5 0 0 1 0 6 7 4 0 1
0 3 1 2 1
1 0 5 4 5 0
B0 1 6 2 1
0C
⇠ @ 5 5 5 A
1 4 6
0 0 5 5 5 1
0 1
1 0 0 5 4 3
B0 1 0 10 7 6C
⇠ @ A
0 0 1 8 6 5
152
7.5 Inverse Matrix 153
At this point, we know M 1 assuming we didn’t goof up. However, row reduction is a
lengthy and involved process with lots of room for arithmetic errors, so we should check
our answer, by confirming that M M 1 = I (or if you prefer M 1 M = I):
0 10 1 0 1
1 2 3 5 4 3 1 0 0
MM 1 = @ 2 1 0A @ 10 7 6A = @0 1 0A
4 2 5 8 6 5 0 0 1
The product of the two matrices is indeed the identity matrix, so we’re done.
x +2y 3z = 1
2x + y =2
4x
2y +5z = 0
0 1
1
The associated matrix equation is M X = 2A , where M is the same as in the
@
0
previous section, so the system above is equivalent to the matrix equation
0 1 0 1 10 1 0 10 1 0 1
x 1 2 3 1 5 4 3 1 3
@yA = @ 2 1 0A @2A = @ 10 7 6A @2A = @ 4A .
z 4 2 5 0 8 6 5 0 4
0 1 0 1
x 3
That is, the system is equivalent to the equation @ y A = @ 4A, and it is easy to
z 4
see what the solution(s) to this equation are.
1
In summary, when M exists
1
Mx = v , x = M v.
153
154 Matrices
154
7.6 Review Problems 155
155
156 Matrices
1. Find formulas for the inverses of the following matrices, when they are
not singular:
0 1
1 a b
(a) @0 1 cA
0 0 1
0 1
a b c
(b) @0 d eA
0 0 f
When are these matrices singular?
2. Write down all 2⇥2 bit matrices and decide which of them are singular.
For those which are not singular, pair them with their inverse.
3. Let M be a square matrix. Explain why the following statements are
equivalent:
(a) M X = V has a unique solution for every column vector V .
(b) M is non-singular.
Hint: In general for problems like this, think about the key words:
First, suppose that there is some column vector V such that the equa-
tion M X = V has two distinct solutions. Show that M must be sin-
gular; that is, show that M can have no inverse.
Next, suppose that there is some column vector V such that the equa-
tion M X = V has no solutions. Show that M must be singular.
Finally, suppose that M is non-singular. Show that no matter what
the column vector V is, there is a unique solution to M X = V.
Hint
4. Left and Right Inverses: So far we have only talked about inverses of
square matrices. This problem will explore the notion of a left and
right inverse for a matrix that is not square. Let
✓ ◆
0 1 1
A=
1 1 0
156
7.6 Review Problems 157
(a) Compute:
i. AAT ,
1
ii. AAT ,
T 1
iii. B := A AAT
(b) Show that the matrix B above is a right inverse for A, i.e., verify
that
AB = I .
(f) True or false: Left and right inverses are unique. If false give a
counterexample.
Hint
5. Show that if the range (remember that the range of a function is the
set of all its outputs, not the codomain) of a 3 ⇥ 3 matrix M (viewed
as a function R3 ! R3 ) is a plane then one of the columns is a sum of
multiples of the other columns. Show that this relationship is preserved
under EROs. Show, further, that the solutions to M x = 0 describe this
relationship between the columns.
1
6. If M and N are square matrices of the same size such that M exists
and N 1 does not exist, does (M N ) 1 exist?
157
158 Matrices
158
7.7 LU Redux 159
7.7 LU Redux
Certain matrices are easier to work with than others. In this section, we
will see how to write any square2 matrix M as the product of two simpler
matrices. We will write
M = LU ,
where:
• L is lower triangular . This means that all entries above the main
diagonal are zero. In notation, L = (lji ) with lji = 0 for all j > i.
0 1
l11 0 0 · · ·
Bl 2 l 2 0 · · · C
B1 2 C
L = Bl 3 l 3 l 3 · · · C
@1 2 3 A
.. .. .. . .
. . . .
• U is upper triangular . This means that all entries below the main
diagonal are zero. In notation, U = (uij ) with uij = 0 for all j < i.
0 1
u11 u12 u13 · · ·
B 0 u2 u2 · · · C
B 2 3 C
U = B 0 0 u3 · · · C
@ 3 A
.. .. .. . .
. . . .
M = LU is called an LU decomposition of M .
This is a useful trick for computational reasons; it is much easier to com-
pute the inverse of an upper or lower triangular matrix than general matrices.
Since inverses are useful for solving linear systems, this makes solving any lin-
ear system associated to the matrix much faster as well. The determinant—a
very important quantity associated with any square matrix—is very easy to
compute for triangular matrices.
Example 96 Linear systems associated to upper triangular matrices are very easy to
solve by back substitution.
✓ ◆ ✓ ◆
a b 1 e 1 be
) y= , x= 1
0 c e c a c
2
The case where M is not square is dealt with at the end of the section.
159
160 Matrices
0 1 8 9 8
1 0 0 d < x=d = < x=d
@a 1 0 eA ) y=e ax ) y=e ad .
: ; :
b c 1 f z=f bx cy z=f bd c(e ad)
For lower triangular matrices, forward substitution gives a quick solution; for upper
triangular matrices, back substitution gives the solution.
160
7.7 LU Redux 161
0 1
u
• Step 1: Set W = v A = U X.
@
w
0 10 1 0 1
3 0 0 u 3
@1 6 0A @ v A = @19A
2 3 1 w 0
Using an LU decomposition
161
162 Matrices
We would like to use the first row of M to zero out the first entry of every
row below it. For our running example,
0 1
6 18 3
M = @2 12 1A ,
4 15 3
162
7.7 LU Redux 163
163
164 Matrices
Moreover it is obtained by recording minus the constants used for all our
row operations in the appropriate columns (this always works this way).
Moreover, U2 is upper triangular and M = L2 U2 , we are done! Putting this
all together we have
0 1 0 10 1
6 18 3 1 0 0 6 18 3
B CB C
M = @2 12 1A = @ 13 1 0A @0 6 0A .
4 15 3 2 1
1 0 0 1
3 2
If the matrix you’re working with has more than three rows, just continue
this process by zeroing out the next column below the diagonal, and repeat
until there’s nothing left to do.
The fractions in the L matrix are admittedly ugly. For two matrices
LU , we can multiply one entire column of L by a constant and divide the
corresponding row of U by the same constant without changing the product
of the two matrices. Then:
0 1 0 1
1 0 0 6 18 3
B C B C
LU = @ 13 1 0A I @0 6 0A
2 1
1 0 0 1
03 2 1 0 1 01 10 1
1 0 0 3 0 0 3
0 0 6 18 3
B C B C
= @ 13 1 0A @0 6 0A @ 0 16 0A @0 6 0A
2 1
1 0 0 1 0 0 1 0 0 1
0 2 10
3
1
3 0 0 2 6 1
= @ 1 6 0 A @ 0 1 0A .
2 3 1 0 0 1
The resulting matrix looks nicer, but isn’t in standard (lower unit triangular
matrix) form.
164
7.7 LU Redux 165
For matrices that are not square, LU decomposition still makes sense.
Given an m ⇥ n matrix M , for example we could write M = LU with L
a square lower unit triangular matrix, and U a rectangular matrix. Then
L will be an m ⇥ m matrix, and U will be an m ⇥ n matrix (of the same
shape as M ). From here, the process is exactly the same as for a square
matrix. We create a sequence of matrices Li and Ui that is eventually the
LU decomposition. Again, we start with L0 = I and U0 = M .
✓ ◆
2 1 3
Example 99 Let’s find the LU decomposition of M = U0 = . Since M
4 4 1
is a 2 ⇥ 3 matrix, our decomposition
✓ ◆ consist of a 2 ⇥ 2 matrix and a 2 ⇥ 3 matrix.
will
1 0
Then we start with L0 = I2 = .
0 1
The next step is to zero-out the first column of M below the diagonal. There is
only one row to cancel, then, and it can be removed by subtracting 2 times the first
row of M to the second row of M . Then:
✓ ◆ ✓ ◆
1 0 2 1 3
L1 = , U1 =
2 1 0 2 5
Since U1 is upper triangular, we’re done. With a larger matrix, we would just continue
the process.
Then:
✓ ◆✓ ◆✓ ◆
I 0 X 0 I X 1Y
M= 1 1
.
ZX I 0 W ZX Y 0 I
165
166 Matrices
By multiplying the diagonal matrix by the upper triangular matrix, we get the standard
LU decomposition of the matrix.
x1 = v1
l12 x1 +x2 = v2
.. ..
. .
l1 x +l2 x + · · · + x = v n
n 1 n 2 n
(i) Find x1 .
(ii) Find x2 .
(iii) Find x3 .
166
7.8 Review Problems 167
4. Describe what upper and lower triangular matrices do to the unit hy-
percube in their domain.
7. If M is invertible then what are the LU, LDU, and LDP U decompo-
sitions of M T in terms of the decompositions for M ? Can you do the
same for M 1 ?
167
168 Matrices
168
Determinants
8
Given a square matrix, is there an easy way to know when it is invertible?
Answering this fundamental question is the goal of this chapter.
then ✓ ◆
1 1 m22 m12
M = .
m11 m22 m12 m21 m21 m11
Thus M is invertible if and only if
169
170 Determinants
det M = m11 m22 m33 m11 m23 m32 + m12 m23 m31 m12 m21 m33 + m13 m21 m32 m13 m22 m31 6= 0.
Notice that in the subscripts, each ordering of the numbers 1, 2, and 3 occurs exactly
once. Each of these is a permutation of the set {1, 2, 3}.
8.1.2 Permutations
Consider n objects labeled 1 through n and shu✏e them. Each possible shuf-
fle is called a permutation. For example, here is an example of a permutation
of 1–5:
1 2 3 4 5
=
4 2 5 1 3
170
8.1 The Determinant Formula 171
but since the top line of any permutation is always the same, we can omit it
and just write: ⇥ ⇤
= (1) (2) (3) (4) (5)
and so our example becomes simply = [4 2 5 1 3].
The mathematics of permutations is extensive; there are a few key prop-
erties of permutations that we’ll need:
171
172 Determinants
Permutation Example
P
det M = sgn( ) m1 (1) m2 (2) · · · mn(n) .
The sum is over all permutations of n objects; a sum over the all elements
of { : {1, . . . , n} ! {1, . . . , n}}. Each summand is a product of n entries
from the matrix with each factor from a di↵erent row. In di↵erent terms of
the sum the column numbers are shu✏ed by di↵erent permutations .
The last statement about the summands yields a nice property of the
determinant:
Example 102 Because there are many permutations of n, writing the determinant
this way for a general matrix gives a very long sum. For n = 4, there are 24 = 4!
permutations, and for n = 5, there are already 120 = 5! permutations.
0 1 1
m1 m12 m13 m14
Bm2 m2 m2 m2 C
B 4C
For a 4 ⇥ 4 matrix, M = B 13 2 3
C, then det M is:
@m1 m2 m3 m34 A
3 3
det M = m11 m22 m33 m44 m11 m23 m32 m44 m11 m22 m34 m43
m12 m21 m33 m44 + m11 m23 m34 m42 + m11 m24 m32 m43
+ m12 m23 m31 m44 + m12 m21 m34 m43 ± 16 more terms.
172
8.1 The Determinant Formula 173
Since the identity matrix is diagonal with all diagonal entries equal to one,
we have
det I = 1.
We would like to use the determinant to decide whether a matrix is in-
vertible. Previously, we computed the inverse of a matrix by applying row
operations. Therefore we ask what happens to the determinant when row
operations are applied to a matrix.
Swapping rows Lets swap rows i and j of a matrix M and then compute its determi-
nant. For the permutation , let ˆ be the permutation obtained by swapping positions
i and j. Clearly
sgn(ˆ ) = sgn( ) .
Let M 0 be the matrix M with rows i and j swapped. Then (assuming i < j):
X
det M 0 = sgn( ) m1 (1) · · · mj (i) · · · mi (j) · · · mn(n)
X
= sgn( ) m1 (1) · · · mi (j) · · · mj (i) · · · mn(n)
X
= ( sgn(ˆ )) m1ˆ (1) · · · miˆ (i) · · · mjˆ (j) · · · mnˆ (n)
X
= sgn(ˆ ) m1ˆ (1) · · · miˆ (i) · · · mjˆ (j) · · · mnˆ (n)
ˆ
= det M.
P P
The step replacing by ˆ often causes confusion; it holds since we sum over all
permutations (see review problem 3). Thus we see that swapping rows changes the
sign of the determinant. I.e.,
det M 0 = det M .
173
174 Determinants
det Eji = 1,
where the matrix Eji is the identity matrix with rows i and j swapped. It is a row swap
elementary matrix.
This implies another nice property of the determinant. If two rows of the matrix
are identical, then swapping the rows changes the sign of the matrix, but leaves the
matrix unchanged. Then we see the following:
Elementary Matrices
174
8.2 Elementary Matrices and Determinants 175
175
176 Determinants
Now we know that swapping a pair of rows flips the sign of the determi-
nant so det M 0 = detM . But det Eji = 1 and M 0 = Eji M so
det Eji M = det Eji det M .
This result hints at a general rule for determinants of products of matrices.
X
det M 0 = sgn( )m1 (1) · · · mi (i) · · · mn(n)
X
= sgn( )m1 (1) · · · mi (i) · · · mn(n)
= det M
176
8.2 Elementary Matrices and Determinants 177
det Ri ( )M = det M .
0 1
1
B ... C
B C
B C
det Ri ( ) = det B C= ,
B .. C
@ . A
1
177
178 Determinants
0 1
1
B .. C
B . C
B C
B 1 µ C
B . C
Sj (µ) = B
i
B .. C.
C
B 1 C
B C
B .. C
@ . A
1
Then multiplying M by Sji (µ) performs a row addition;
0 10 1 0 1
1
B .. C B .. C B .. C
B . CB . C B . C
B CB i C B i C
B 1 µ C B R C B R + µRj C
B .. CB . C B .. C
B . C B .. C = B . C.
B CB C B C
B 1 C B j C B Rj C
B CB R C B C
B .. C B . C B .. C
@ . A @ .. A @ . A
1
What is the e↵ect of multiplying by Sji (µ) on the determinant? Let M 0 =
Sji (µ)M , and let M 00 be the matrix M but with Ri replaced by Rj Then
X
det M 0 = sgn( )m1 (1) · · · (mi (i) + µmj (i) ) · · · mn(n)
X
= sgn( )m1 (1) · · · mi (i) · · · mn(n)
X
+ sgn( )m1 (1) · · · µmj (j) · · · mj (j) · · · mn(n)
= det M + µ det M 00
det M 0 = det M,
178
8.2 Elementary Matrices and Determinants 179
Figure 8.4: Adding one row to another leaves the determinant unchanged.
Elementary Determinants
179
180 Determinants
We have seen that any matrix M can be put into reduced row echelon form
via a sequence of row operations, and we have seen that any row operation can
be achieved via left matrix multiplication by an elementary matrix. Suppose
that RREF(M ) is the reduced row echelon form of M . Then
RREF(M ) = E1 E2 · · · Ek M ,
180
8.2 Elementary Matrices and Determinants 181
Corollary 8.2.3. Any elementary matrix Eji , Ri ( ), Sji (µ) is invertible, ex-
cept for Ri (0). In fact, the inverse of an elementary matrix is another ele-
mentary matrix.
To obtain one last important result, suppose that M and N are square
n ⇥ n matrices, with reduced row echelon forms such that, for elementary
matrices Ei and Fi ,
M = E1 E2 · · · Ek RREF(M ) ,
and
N = F1 F2 · · · Fl RREF(N ) .
If RREF(M ) is the identity matrix (i.e., M is invertible), then:
181
182 Determinants
Alternative proof
1. Let 0 1
m11 m12 m13
B C
M = @m21 m22 m23 A .
m31 m32 m33
182
8.3 Review Problems 183
Use row operations to put M into row echelon form. For simplicity,
assume that m11 6= 0 6= m11 m22 m21 m12 .
Prove that M is non-singular if and only if:
m11 m22 m33 m11 m23 m32 + m12 m23 m31
m12 m21 m33 + m13 m21 m32 m13 m22 m31 6= 0
✓ ◆ ✓ ◆
1 0 1 a b
2. (a) What does the matrix E2 = do to M = under
1 0 d c
left multiplication? What about right multiplication?
(b) Find elementary matrices R1 ( ) and R2 ( ) that respectively mul-
tiply rows 1 and 2 of M by but otherwise leave M the same
under left multiplication.
(c) Find a matrix S21 ( ) that adds a multiple of row 2 to row 1
under left multiplication.
3. Let ˆ denote the permutation obtained from by transposing the first
two outputs, i.e. ˆ (1) = (2) and ˆ (2) = (1). Suppose the function
f : {1, 2, 3, 4} ! R. Write out explicitly the following two sums:
X X
f (s) and f ˆ (s) .
What do you observe? Now write a brief explanation why the following
equality holds X X
F( ) = F (ˆ ) ,
183
184 Determinants
(a) det M .
(b) det N .
(c) det(M N ).
(d) det M det N .
1
(e) det(M ) assuming ad bc 6= 0.
T
(f) det(M )
(g) det(M + N ) (det M + det N ). Is the determinant a linear trans-
formation from square matrices to real numbers? Explain.
✓ ◆
a b
10. Suppose M = is invertible. Write M as a product of elemen-
c d
tary row matrices times RREF(M ).
11. Find the inverses of each of the elementary matrices, Eji , Ri ( ), Sji ( ).
Make sure to show that the elementary matrix times its inverse is ac-
tually the identity.
12. Let eij denote the matrix with a 1 in the i-th row and j-th column
and 0’s everywhere else, and let A be an arbitrary 2 ⇥ 2 matrix. Com-
pute det(A + tI2 ). What is the first order term (the t1 term)? Can you
184
8.3 Review Problems 185
express your results in terms of tr(A)? What about the first order term
in det(A + tIn ) for any arbitrary n ⇥ n matrix A in terms of tr(A)?
Note that the result of det(A + tI2 ) is a polynomial in the variable t
known as the characteristic polynomial.
(a)
det(I2 + teij ) det(I2 )
lim
t!0 t
(b)
det(I3 + teij ) det(I3 )
lim
t!0 t
(c)
det(In + teij ) det(In )
lim
t!0 t
(d)
det(In + At) det(In )
lim
t!0 t
Note, these are the directional derivative in the eij and A directions.
185
186 Determinants
X
det M = sgn( ) m1 (1) m2 (2) · · · mn(n)
X
= m11 sgn(/ 1 ) m2/ 1 (2) · · · mn/1 (n)
/1
X
+ m12 sgn(/ 2 ) m2/ 2 (1) m3/ 2 (3) · · · mn/2 (n)
/2
X
+ m13 sgn(/ 3 ) m2/ 3 (1) m3/ 3 (2) m4/ 3 (4) · · · mn/3 (n)
/3
+ ···
Here the symbols / k refers to the permutation with the input k removed.
The summand on the j’th line of the above formula looks like the determinant
of the minor obtained by removing the first and j’th column of M . However
we still need to replace sum of / j by a sum over permutations of column
numbers of the matrix entries of this minor. This costs a minus sign whenever
j 1 is odd. In other words, to expand by minors we pick an entry m1j of the
first row, then add ( 1)j 1 times the determinant of the matrix with row i
and column j deleted. An example will probably help:
186
8.4 Properties of the Determinant 187
✓ ◆ ✓ ◆ ✓ ◆
5 6 4 6 4 5
det M = 1 det 2 det + 3 det
8 9 7 9 7 8
= 1(5 · 9 8 · 6) 2(4 · 9 7 · 6) + 3(4 · 8 7 · 5)
= 0
0 1 0 1
1 2 3 4 0 0
det @4 0 0A = det @1 2 3A
7 8 9 7 8 9
✓ ◆
2 3
= 4 det
8 9
= 24
Example
Since we know how the determinant of a matrix changes when you perform
row operations, it is often very beneficial to perform row operations before
computing the determinant by brute force.
1
A fun exercise is to compute the determinant of a 4 ⇥ 4 matrix filled in order, from
left to right, with the numbers 1, 2, 3, . . . , 16. What do you observe? Try the same for a
5 ⇥ 5 matrix with 1, 2, 3, . . . , 25. Is there a pattern? Can you explain it?
187
188 Determinants
Example 105
0 1 0 1 0 1
1 2 3 1 2 3 1 2 3
det @4 5 6A = det @3 3 3A = det @3 3 3A = 0 .
7 8 9 6 6 6 0 0 0
Try to determine which row operations we made at each step of this computation.
You might suspect that determinants have similar properties with respect
to columns as what applies to rows:
Proof. By definition,
X
det M = sgn( )m1 (1) m2 (2) · · · mn(n) .
1
For any permutation , there is a unique inverse permutation that
1
undoes . If sends i ! j, then sends j ! i. In the two-line notation
for a permutation,
this corresponds to just flipping the permutation over. For
1 2 3
example, if = , then we can find 1 by flipping the permutation
2 3 1
and then putting the columns in order:
1 2 3 1 1 2 3
= = .
1 2 3 3 1 2
Since any permutation can be built up by transpositions, one can also find
the inverse of a permutation by undoing each of the transpositions used to
build up ; this shows that one can use the same number of transpositions
to build and 1 . In particular, sgn = sgn 1 .
188
8.4 Properties of the Determinant 189
= det M T .
Example 106 Because of this, we see that expansion by minors also works over
columns. Let 0 1
1 2 3
M = @0 5 6A .
0 8 9
Then ✓ ◆
5 8
det M = det M T = 1 det = 3.
6 9
189
190 Determinants
190
8.4 Properties of the Determinant 191
Example 107
0 ✓ ◆ ✓ ◆ ✓ ◆ 1T
2 0 1 0 1 2
B det det det C
0 1 B 1 1 0 1 0 1 C
3 1 1 B ✓ ◆ ✓ ◆ ✓ ◆C
B 1 1 3 1 3 1 C
adj @1 2 0 =B
A
B det det det C
B 1 1 0 1 0 1 C
C
0 1 1 B ✓ ◆ ✓ ◆ ✓ ◆C
@ 1 1 3 1 3 1 A
det det det
2 0 1 0 1 2
Let’s compute the product M adj M . For any matrix N , the i, j entry
of M N is given by taking the dot product of the ith row of M and the jth
column of N . Notice that the dot product of the ith row of M and the ith
column of adj M is just the expansion by minors of det M in the ith row.
Further, notice that the dot product of the ith row of M and the jth column
of adj M with j 6= i is the same as expanding M by minors, but with the
jth row replaced by the ith row. Since the determinant of any matrix with
a row repeated is zero, then these dot products are zero as well.
We know that the i, j entry of the product of two matrices is the dot
product of the ith row of the first by the jth column of the second. Then:
M adj M = (det M )I
1
Thus, when det M 6= 0, the adjoint gives an explicit formula for M .
191
192 Determinants
0 1 0 1
3 1 1 2 0 2
@
adj 1 2 A
0 = @ 1 3 1A .
0 1 1 1 3 7
Now, multiply:
0 10 1 0 1
3 1 1 2 0 2 6 0 0
@1 2 0A @ 1 3 1A = @0 6 0A
0 1 1 1 3 7 0 0 6
0 1 1 0 1
3 1 1 2 0 2
1@
) @1 2 0A = 1 3 1A
6
0 1 1 1 3 7
This process for finding the inverse matrix is sometimes called Cramer’s Rule .
Volume = det u v w
192
8.5 Review Problems 193
193
194 Determinants
1
3. Let denote the inverse permutation of . Suppose the function
f : {1, 2, 3, 4} ! R. Write out explicitly the following two sums:
X X
1
f (s) and f (s) .
What do you observe? Now write a brief explanation why the following
equality holds X X
F( ) = F ( 1) ,
Hint
194
Subspaces and Spanning Sets
9
It is time to study vector spaces more carefully and return to some funda-
mental questions:
9.1 Subspaces
Definition We say that a subset U of a vector space V is a subspace of V
if U is a vector space under the inherited addition and scalar multiplication
operations of V .
195
196 Subspaces and Spanning Sets
0 1
x
This equation can be expressed as the homogeneous system a b c @ y A = 0, or
z
M X = 0 with M the matrix a b c . If X1 and X2 are both solutions to M X = 0,
then, by linearity of matrix multiplication, so is µX1 + ⌫X2 :
M (µX1 + ⌫X2 ) = µM X1 + ⌫M X2 = 0.
So P is closed under addition and scalar multiplication. Additionally, P contains the
origin (which can be derived from the above by setting µ = ⌫ = 0). All other vector
space requirements hold for P because they hold for all vectors in R3 .
196
9.2 Building Subspaces 197
Note that the requirements of the subspace theorem are often referred to as
“closure”.
We can use this theorem to check if a set is a vector space. That is, if we
have some set U of vectors that come from some bigger vector space V , to
check if U itself forms a smaller vector space we need check only two things:
197
198 Subspaces and Spanning Sets
In this case, the vectors in U define the xy-plane in R3 . We can view the
xy-plane as the set of all vectors that arise as a linear combination of the two
vectors in U . We call this set of all linear combinations the span of U :
8 0 1 0 1 9
< 1 0 =
@ A @
span(U ) = x 0 + y 1 x, y 2 R .A
: ;
0 0
Notice that any vector in the xy-plane is of the form
0 1 0 1 0 1
x 1 0
@ y A = x @0A + y @1A 2 span(U ).
0 0 0
That is, the span of S is the set of all finite linear combinations1 of
elements of S. Any finite sum of the form “a constant times s1 plus a constant
times s2 plus a constant times s3 and so on” is in the span of S.2 .
0 1
0
Example 110 Let V = R3 and X ⇢ V be the x-axis. Let P = @1A, and set
0
S = X [ {P } .
0 1 0 1 0 1 0 1
2 2 2 0
The vector @3A is in span(S), because @3A = @0A + 3 @1A . Similarly, the vector
0 0 0 0
0 1 0 1 0 1 0 1
12 12 12 0
@17.5A is in span(S), because @17.5A = @ 0A +17.5 @1A . Similarly, any vector
0 0 0 0
1
Usually our vector spaces are defined over R, but in general we can have vector spaces
defined over di↵erent base fields such as C or Z2 . The coefficients ri should come from
whatever our base field is (usually R).
2
It is important that we only allow finitely many terms in our linear combinations; in
the definition above, N must be a finite number. It can be any finite number, but it must
be finite. We can relax the requirement that S = {s1 , s2 , . . .} and just let S be any set of
vectors. Then we shall write span(S) := {r1 s1 +r2 s2 +· · ·+rN sN | ri 2 R, si 2 S, N 2 N, }
198
9.2 Building Subspaces 199
of the form 0 1 0 1 0 1
x 0 x
@ 0A + y @1A = @ y A
0 0 0
is in span(S). On the other hand, any vector in span(S) must have a zero in the
z-coordinate. (Why?) So span(S) is the xy-plane, which is a vector space. (Try
drawing a picture to verify this!)
u = c 1 s1 + c 2 s2 + · · ·
v = d 1 s1 + d 2 s2 + · · ·
) u + µv = (c1 s1 + c2 s2 + · · · ) + µ(d1 s1 + d2 s2 + · · · )
= ( c1 + µd1 )s1 + ( c2 + µd2 )s2 + · · ·
Note that this proof, like many proofs, consisted of little more than just
writing out the definitions.
199
200 Subspaces and Spanning Sets
0 1 0 1 0 1 0 1
1 1 a x
r1 @0A + r2 @ 2A + r3 @1A = @ y A .
a 3 0 z
We can write this as a linear system in the unknowns r1 , r2 , r3 as follows:
0 1 0 11 0 1
1 1 a r x
@0 2 1A @r2 A = @ y A .
a 3 0 r3 z
0 1
1 1 a
If the matrix M = @0 2 1A is invertible, then we can find a solution
a 3 0
0 1 0 11
x r
M 1 @ y A = @r 2 A
z r3
0 1
x
for any vector y A 2 R3 .
@
z
Therefore we should choose a so that M is invertible:
Some other very important ways of building subspaces are given in the
following examples.
L(u) = 0 = L(u0 ) ,
200
9.2 Building Subspaces 201
Hence, thanks to the subspace theorem, the set of all vectors in U that are mapped
to the zero vector is a subspace of V . It is called the kernel of L:
kerL := {u 2 U |L(u) = 0} ⇢ U.
Note that finding a kernel means finding a solution to a homogeneous linear equation.
Hence, calling once again on the subspace theorem, the set of all vectors in V that
are obtained as outputs of the map L is a subspace. It is called the image of L:
imL := {L(u) | u 2 U } ⇢ V.
Hence, again by subspace theorem, the set of all vectors in V that obey the eigenvector
equation L(v) = v is a subspace of V . It is called an eigenspace
V := {v 2 V |L(v) = v}.
For most scalars , the only solution to L(v) = v will be v = 0, which yields the
trivial subspace {0}. When there are nontrivial solutions to L(v) = v, the number
is called an eigenvalue, and carries essential information about the map L.
201
202 Subspaces and Spanning Sets
1. Determine if x x3 2 span{x2 , 2x + x2 , x + x3 }.
(a) U [ W
(b) U \ W
Hint
3. Let L : R3 ! R3 where
L(x, y, z) = (x + 2y + z, 2x + y + z, 0) .
Find kerL, imL and the eigenspaces R3 1 , R33 . Your answers should be
subsets of R3 . Express them using span notation.
202
Linear Independence
10
Consider a plane P that includes the origin in R3 and non-zero vectors
{u, v, w} in P .
If no two of u, v and w are parallel, then P = span{u, v, w}. But any two
vectors determines a plane, so we should be able to span the plane using
only two of the vectors u, v, w. Then we could choose two of the vectors in
{u, v, w} whose span is P , and express the other as a linear combination of
those two. Suppose u and v span P . Then there exist constants d1 , d2 (not
both zero) such that w = d1 u + d2 v. Since w can be expressed in terms of u
and v we say that it is not independent. More generally, the relationship
c1 u + c2 v + c3 w = 0 ci 2 R, some ci 6= 0
203
204 Linear Independence
c1 v1 + c2 v2 + · · · + cn vn = 0.
Remark The zero vector 0V can never be on a list of independent vectors because
↵0V = 0V for any scalar ↵.
Worked Example
c1 v1 + c2 v2 + c3 v3 = 0
1
Usually our vector spaces are defined over R, but in general we can have vector spaces
defined over di↵erent base fields such as C or Z2 . The coefficients ci should come from
whatever our base field is (usually R).
204
10.1 Showing Linear Dependence 205
Therefore nontrivial solutions exist. At this point we know that the vectors are
linearly dependent. If we need to, we can find coefficients that demonstrate linear
dependence by solving
0 1 0 1 0 1
0 1 1 0 1 1 3 0 1 0 2 0
@0 2 2 0A ⇠ @0 1 1 0A ⇠ @0 1 1 0A .
1 1 3 0 0 0 0 0 0 0 0 0
The solution set {µ( 2, 1, 1) | µ 2 R} encodes the linear combinations equal to zero;
any choice of µ will produce coefficients c1 , c2 , c3 that satisfy the linear homogeneous
equation. In particular, µ = 1 corresponds to the equation
c1 v1 + c2 v2 + c3 v3 = 0 ) 2v1 v2 + v3 = 0.
Proof. The theorem is an if and only if statement, so there are two things to
show.
205
206 Linear Independence
c1 v1 + · · · + ck 1 vk 1 vk + 0vk+1 + · · · + 0vn = 0.
ii. Now we show that linear dependence implies that there exists k for
which vk is a linear combination of the vectors {v1 , . . . , vk 1 }.
The assumption says that
c1 v1 + c2 v2 + · · · + cn vn = 0.
Take k to be the largest number for which ck is not equal to zero. So:
c1 v1 + c2 v2 + · · · + ck 1 vk 1 + ck vk = 0.
c1 v1 + c2 v2 + · · · + ck 1 vk 1 = ck vk
c1 c2 ck 1
) v 1 v 2 · · · vk 1 = vk .
ck ck ck
Worked proof
206
10.2 Showing Linear Independence 207
Example 117 Consider the vector space P2 (t) of polynomials of degree less than or
equal to 2. Set:
v1 = 1 + t
v 2 = 1 + t2
v 3 = t + t2
v 4 = 2 + t + t2
v 5 = 1 + t + t2 .
c1 v1 + c2 v2 + c3 v3 = 0
207
208 Linear Independence
Since the matrix M has non-zero determinant, the only solution to the system of
equations 0 11
c
v1 v2 v3 @c2 A = 0
c3
is c1 = c2 = c3 = 0. So the vectors v1 , v2 , v3 are linearly independent.
Example 119 Let Z32 be the space of 3 ⇥ 1 bit-valued matrices (i.e., column vectors).
Is the following subset linearly independent?
8 0 1 0 1 0 19
< 1 1 0 =
@1A , @0A , @1A
: ;
0 1 1
If the set is linearly dependent, then we can find non-zero solutions to the system:
0 1 0 1 0 1
1 1 0
1@ A 2@ A 3@ A
c 1 + c 0 + c 1 = 0,
0 1 1
Solutions exist if and only if the determinant of the matrix is non-zero. But:
0 1
1 1 0 ✓ ◆ ✓ ◆
0 1 1 1
det @1 A
0 1 = 1 det 1 det = 1 1=1+1=0
1 1 0 1
0 1 1
Therefore non-trivial solutions exist, and the set is not linearly independent.
208
10.3 From Dependent Independent 209
209
210 Linear Independence
Hint
2. Let ei be the vector in Rn with a 1 in the ith position and 0’s in every
other position. Let v be an arbitrary vector in Rn .
210
10.4 Review Problems 211
First give an explicit example for each case, state whether the col-
umn vectors you use are linearly independent or spanning in each case.
Then, in general, determine whether (v1 , v2 , . . . , vm ) are linearly inde-
pendent and/or spanning Rn in each of the three cases. If they are
linearly dependent, does RREF(M ) tell you which vectors could be
removed to yield an independent set of vectors?
11  Basis and Dimension
In chapter 10 we established the notions of a linearly independent set of vectors in a vector space V and of a set of vectors that spans V. Any set of vectors that spans V can be reduced to a minimal collection of linearly independent vectors; such a minimal set is called a basis of V.
Definition Let V be a vector space. Then a set S is a basis for V if S is
linearly independent and V = span S.
If S is a basis of V and S has only finitely many elements, then we say
that V is finite-dimensional. The number of vectors in S is the dimension
of V .
Suppose V is a finite-dimensional vector space, and S and T are two dif-
ferent bases for V . One might worry that S and T have a di↵erent number of
vectors; then we would have to talk about the dimension of V in terms of the
basis S or in terms of the basis T . Luckily this isn’t what happens. Later in
this chapter, we will show that S and T must have the same number of vec-
tors. This means that the dimension of a vector space is basis-independent.
In fact, dimension is a very important characteristic of a vector space.
Example 121 Pn(t) (polynomials in t of degree n or less) has a basis {1, t, . . . , tⁿ}, since every vector in this space is a sum

a0 · 1 + a1 t + · · · + an tⁿ ,   ai ∈ R ,

so Pn(t) = span{1, t, . . . , tⁿ}. This set of vectors is linearly independent: if the polynomial p(t) = c0 · 1 + c1 t + · · · + cn tⁿ is the zero polynomial, then c0 = c1 = · · · = cn = 0. Thus Pn(t) is finite dimensional, and dim Pn(t) = n + 1.
Proof. Since S is a basis for V , then span S = V , and so there exist con-
stants ci such that w = c1 v1 + · · · + cn vn .
Suppose there exists a second set of constants di such that
w = d 1 v1 + · · · + d n vn .
Then
0V = w − w
   = c1 v1 + · · · + cn vn − d1 v1 − · · · − dn vn
   = (c1 − d1) v1 + · · · + (cn − dn) vn .
Remark This theorem is the one that makes bases so useful: they allow us to convert abstract vectors into column vectors. By ordering the set S we obtain B = (v1 , . . . , vn) and can write

w = (v1 , . . . , vn) ( c1 , . . . , cn )ᵀ = ( c1 , . . . , cn )ᵀ_B .

Remember that in general it makes no sense to drop the subscript B on the column vector on the right; most vector spaces are not made from columns of numbers!
vk = c0 w1 + c1 v1 + · · · + ci−1 vi−1 + ci+1 vi+1 + · · · + cn vn .
Then replacing w1 with its expression in terms of the collection S gives a way
to express the vector vk as a linear combination of the vectors in S, which
contradicts the linear independence of S. On the other hand, we cannot
express w1 as a linear combination of the vectors in {vj |j 6= i}, since the
expression of w1 in terms of S was unique, and had a non-zero coefficient for
the vector vi . Then no vector in S1 can be expressed as a combination of
other vectors in S1 , which demonstrates that S1 is linearly independent.
The set S1 spans V : For any u 2 V , we can express u as a linear com-
bination of vectors in S. But we can express vi as a linear combination of
Proof. Let S and T be two bases for V . Then both are linearly independent
sets that span V . Suppose S has n vectors and T has m vectors. Then by
the previous lemma, we have that m ≥ n. But (exchanging the roles of S and T in application of the lemma) we also see that n ≥ m. Then m = n,
as desired.
11.1 Bases in Rⁿ

From a review problem in chapter 10 we know that Rⁿ = span{e1 , . . . , en} and that this set of vectors is linearly independent. (If you didn't do that problem, check this before reading any further!) So this set of vectors is
a basis for Rn , and dim Rn = n. This basis is often called the standard
or canonical basis for Rn . The vector with a one in the ith position and
zeros everywhere else is written ei. (You could also view it as the function {1, 2, . . . , n} → R where ei(j) = 1 if i = j and 0 if i ≠ j.) It points in the
direction of the ith coordinate axis, and has unit length. In multivariable
calculus classes, this basis is often written {î, ĵ, k̂} for R3 .
Note that it is often convenient to order basis elements, so rather than
writing a set of vectors, we would write a list. This is called an ordered
basis. For example, the canonical ordered basis for Rn is (e1 , e2 , . . . , en ). The
possibility to reorder basis vectors is not the only way in which bases are
non-unique.
Bases are not unique. While there exists a unique way to express a vector in terms
of any particular basis, bases themselves are far from unique. For example, both of
the sets

{ (1, 0)ᵀ , (0, 1)ᵀ }   and   { (1, 1)ᵀ , (1, −1)ᵀ }
are bases for R2 . Rescaling any vector in one of these sets is already enough to show
that R2 has infinitely many bases. But even if we require that all of the basis vectors
have unit length, it turns out that there are still infinitely many bases for R2 (see
review question 3).
0 = x1 v1 + · · · + xn vn .
Let M be a matrix whose columns are the vectors vi and X the column
vector with entries xi . Then the above equation is equivalent to requiring
that there is a unique solution to
MX = 0 .
in the unknowns xi . For this, we need to find a unique solution for the linear
system M X = w.
Thus, we need to show that M⁻¹ exists, so that

X = M⁻¹ w

is the unique solution we desire. Then we see that S is a basis for Rⁿ if and only if det M ≠ 0.
Theorem 11.1.1. Let S = {v1 , . . . , vm } be a collection of vectors in Rn .
Let M be the matrix whose columns are the vectors in S. Then S is a basis
for V if and only if m is the dimension of V and
det M ≠ 0.
Remark Also observe that S is a basis if and only if RREF(M ) = I.
Example 122 Let
S = { (1, 0)ᵀ , (0, 1)ᵀ }   and   T = { (1, 1)ᵀ , (1, −1)ᵀ } .

Then set

MS = ( 1 0 ; 0 1 ) .

Since det MS = 1 ≠ 0, S is a basis for R². Likewise, set

MT = ( 1 1 ; 1 −1 ) .

Since det MT = −2 ≠ 0, T is a basis for R².
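A sketch of the same determinant criterion checked numerically:

    import numpy as np

    MS = np.array([[1, 0], [0, 1]])
    MT = np.array([[1, 1], [1, -1]])

    # A non-zero determinant certifies that the columns form a basis of R^2.
    print(np.linalg.det(MS))   # 1.0
    print(np.linalg.det(MT))   # -2.0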
11.2 Matrix of a Linear Transformation (Redux)
The number mij is the ith component of L(ej) in the basis F, while the fi are vectors (note that if α is a scalar and v a vector, αv = vα; we have used the latter, rather uncommon, notation in the above formula). The numbers mij naturally form a matrix whose jth column is the column vector displayed above. Indeed, if
v = e1 v1 + · · · + en vn ,

then

L(v) = L(v1 e1 + v2 e2 + · · · + vn en)
     = v1 L(e1) + v2 L(e2) + · · · + vn L(en) = Σj L(ej) vj
     = Σj (f1 m1j + · · · + fm mmj) vj = Σi fi [ Σj mij vj ]

     = ( f1 f2 · · · fm ) ( m11 m12 · · · m1n ) ( v1 )
                          ( m21 m22 · · · m2n ) ( v2 )
                          (  ⋮            ⋮  ) (  ⋮ )
                          ( mm1 mm2 · · · mmn ) ( vn )
The array of numbers M = (mij ) is called the matrix of L in the input and
output bases E and F for V and W , respectively. This matrix will change
if we change either of the bases. Also observe that the columns of M are
computed by examining L acting on each basis vector in V expanded in the
basis vectors of W .
Example 123 Let L : P1(t) → P1(t) such that L(a + bt) = (a + b)t.
When the vector space is Rn and the standard basis is used, the problem
of finding the matrix of a linear transformation will seem almost trivial. It
is worthwhile working through it once in the above language though.
Example 124 Any vector in Rn can be written as a linear combination of the standard
(ordered) basis (e1 , . . . en ). The vector ei has a one in the ith position, and zeros
everywhere else. I.e.
e1 = (1, 0, . . . , 0)ᵀ ,   e2 = (0, 1, . . . , 0)ᵀ ,   . . . ,   en = (0, 0, . . . , 1)ᵀ .
11.3 Review Problems
5. Vectors are objects that you can add together; show that the set of all linear transformations mapping R³ → R is itself a vector space. Find a basis for this vector space. Do you think your proof could be modified to work for linear transformations Rⁿ → R? For Rᴺ → Rᵐ? For R^R?

Hint: Represent R³ as column vectors, and argue that a linear transformation T : R³ → R is just a row vector.
Sn := { M : Rⁿ → Rⁿ | M = Mᵀ } ,
An := { M : Rⁿ → Rⁿ | M = −Mᵀ } .
7. Give the matrix of the linear transformation L with respect to the input and output bases B and B′ listed below:
12  Eigenvalues and Eigenvectors
In a vector space with no structure other than the vector space rules, no
vector other than the zero vector is any more important than any other.
Once one also has a linear transformation the situation changes dramatically.
We begin with a fun example, of a type bound to reappear in your future
scientific studies:
The set of all displacement functions for the string can be modeled by a vector space
V = { y : R² → R | all partial derivatives ∂^(k+m) y(x,t) / ∂x^k ∂t^m exist } .

The concavity and the acceleration of the string at the point (x, t) are ∂²y/∂x²(x, t) and ∂²y/∂t²(x, t) respectively. Since these quantities must exist at each point on the string for the wave equation to make sense, we required that all partial derivatives of y(x, t) exist.

Note also that the function y(x, t) = 0 is, so far, the only special vector in the vector space V.
We now add some extra information. The string’s behavior in time and space can
be modeled by a wave equation
∂²y/∂t² = ∂²y/∂x² ,

which says that the acceleration of a point on the string is equal to its concavity at that
point. For example, if the string were made of stretched rubber, it would prefer to be
in a straight line, so this equation makes good intuitive sense. Not all of the functions
in V are solutions to the wave equation; not all of the functions in the vector space V
describe the way a string would really vibrate. The ways a string would really vibrate
are (at least approximately) solutions to the wave equation above, which can be rewritten
as a linear function
Wy = 0
where

W = ( −∂²/∂t² + ∂²/∂x² ) : V → V .

Some examples of solutions are

y1(x, t) = sin(t) sin(x) ,   y2(x, t) = sin(2t) sin(2x) ,

and

y3(x, t) = sin(t) sin(x) + 3 sin(2t) sin(2x) .

Since W y = 0 is a homogeneous linear equation, linear combinations of solutions are solutions; in other words the kernel ker(W) is a vector space. Given the linear function W, some vectors are now more special than others.
We can use musical intuition to do more! If the ends of the string were held
fixed, we suspect that it would prefer to vibrate at certain frequencies corresponding
to musical notes. This is modeled by looking at solutions of the form
y(x, t) = sin(ωt) v(x) .
Here the periodic sine function accounts for the string’s vibratory motion, while the
function v(x) gives the shape of the string at any fixed instant of time. Observe that

W( sin(ωt) v(x) ) = sin(ωt) ( d²v/dx² + ω² v ) .

This suggests we introduce a new vector space

U = { v : R → R | all derivatives d^k v / dx^k exist } ,

as well as the operator

L := d²/dx² : U → U .
The number ω is called an angular frequency in many contexts; let's call its square λ := −ω² to match notations we will use later (notice that for this particular problem λ must then be negative). Then, because we want W(y) = 0, which implies d²v/dx² = −ω²v, it follows that the vector v(x) ∈ U determining the vibrating string's shape obeys

L(v) = λv .

This is perhaps one of the most important equations in all of linear algebra! It is the eigenvalue-eigenvector equation. In this problem we have to solve it both for λ, to determine which frequencies (or musical notes) our string likes to sing, and the vector v determining the string's shape. The vector v is called an eigenvector and λ its corresponding eigenvalue. The solution set for each λ is called Vλ. For any λ the set Vλ is a vector space, since elements of this set are solutions to the homogeneous equation (L − λ)v = 0.

We began this chapter by stating "In a vector space, with no other structure, no vector is more important than any other." Our aim is to show you that when a linear operator L acts on a vector space, vectors that solve the equation L(v) = λv play a central role.
12.1 Invariant Directions

Consider the linear transformation L : R² → R² whose matrix in the standard basis is

( −4   3 )
( −10  7 ) .

Then L fixes the direction (and actually also the magnitude) of the vector v1 = (3, 5)ᵀ.
Now, notice that any vector with the same direction as v1 can be written as cv1
for some constant c. Then L(cv1 ) = cL(v1 ) = cv1 , so L fixes every vector pointing
in the same direction as v1 .
Also notice that

L( (1, 2)ᵀ ) = ( −4·1 + 3·2 , −10·1 + 7·2 )ᵀ = (2, 4)ᵀ = 2 (1, 2)ᵀ ,

so L fixes the direction of the vector v2 = (1, 2)ᵀ but stretches v2 by a factor of 2. Now notice that for any constant c, L(cv2) = cL(v2) = 2cv2. Then L stretches every vector pointing in the same direction as v2 by a factor of 2.
direction also satisfies this equation because L(cv) = cL(v) = λcv. More generally, any non-zero vector v that solves

L(v) = λv
we have seen that L enjoys the property of having two invariant directions,
represented by eigenvectors v1 and v2 with eigenvalues 1 and 2, respectively.
It would be very convenient if we could write any vector w as a linear
combination of v1 and v2 . Suppose w = rv1 + sv2 for some constants r and s.
Then
L(w) = L(rv1 + sv2 ) = rL(v1 ) + sL(v2 ) = rv1 + 2sv2 .
Now L just multiplies the number r by 1 and the number s by 2. If we could write this as a matrix, it would look like:

( 1 0 ) ( r )
( 0 2 ) ( s )
Now that we’ve seen what eigenvalues and eigenvectors are, there are a
number of questions that need to be answered.
Example 127 Let L : R² → R² such that L(x, y) = (2x + 2y, 16x + 6y). First, we find the matrix of L; this is quickest in the standard basis:

( x )      (  2  2 ) ( x )
( y )  ⟼  ( 16  6 ) ( y ) .

We want to find an invariant direction v = (x, y)ᵀ such that

Lv = λv ,

or equivalently (L − λI)v = 0. This is a homogeneous system, so it only has non-zero solutions when the matrix

( 2 − λ    2    )
(  16    6 − λ  )

is singular. In other words,

det ( 2 − λ , 2 ; 16 , 6 − λ ) = 0
⟺ (2 − λ)(6 − λ) − 32 = 0
⟺ λ² − 8λ − 20 = 0
⟺ (λ − 10)(λ + 2) = 0 .

This polynomial is called the characteristic polynomial of M, and its roots are the eigenvalues of M. In this case, we see that L has two eigenvalues, λ1 = 10 and λ2 = −2. To find the eigenvectors, we need to deal with these two cases separately. To do so, we solve the linear system

( 2 − λ    2    ) ( x )   ( 0 )
(  16    6 − λ  ) ( y ) = ( 0 )

with the particular eigenvalue λ plugged in to the matrix.
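A sketch of the same eigenvalue computation in NumPy, assuming the matrix found above:

    import numpy as np

    M = np.array([[2., 2.],
                  [16., 6.]])

    # The eigenvalues are the roots of lambda^2 - 8 lambda - 20.
    evals, evecs = np.linalg.eig(M)
    print(evals)   # 10 and -2 (the ordering may vary)
    # Each column of evecs is an eigenvector; the one for eigenvalue 10
    # is proportional to (1, 4).
    print(evecs)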
12.2 The Eigenvalue–Eigenvector Equation

1. Find the characteristic polynomial of the matrix M for L, given by det(λI − M).¹
2. Find the roots of the characteristic polynomial; these are the eigenvalues of L.

Lv = λv .

(M − λI) v = 0.

This system has non-zero solutions if and only if the matrix

M − λI

is singular, and so we require that

¹ To save writing many minus signs, compute det(M − λI); this is equivalent if you only need the roots.
Figure 12.2: Don’t forget the characteristic polynomial; you will need it to
compute eigenvalues.
det(λI − M) = 0.

The left hand side of this equation is a polynomial in the variable λ, called the characteristic polynomial PM(λ) of M. For an n × n matrix, the characteristic polynomial has degree n. Then

PM(λ) = λⁿ + c1 λⁿ⁻¹ + · · · + cn .

If λ1, λ2, . . . , λn are the roots of the characteristic polynomial, then

PM(λ) = (λ − λ1)(λ − λ2) · · · (λ − λn)   ⟹   PM(λi) = 0.
In the standard basis the matrix M representing L has columns Lei for each i, so:

( x )      ( 2 1 1 ) ( x )
( y )  ⟼  ( 1 2 1 ) ( y ) .
( z )      ( 1 1 2 ) ( z )

Then²

det(λI − M) = (λ − 2)[(λ − 2)² − 1] + [−(λ − 2) − 1] + [−(λ − 2) − 1]
            = (λ − 1)²(λ − 4) .

Any vector of the form t (1, 1, 1)ᵀ is then an eigenvector with eigenvalue 4; thus L leaves a line through the origin invariant.

² It is often easier (and equivalent) to solve det(M − λI) = 0.
Example 129 Let V be the vector space of smooth (i.e., infinitely differentiable) functions f : R → R. Then the derivative is a linear operator d/dx : V → V. What are the eigenvectors of the derivative? In this case, we don't have a matrix to work with, so we have to make do.

A function f is an eigenvector of d/dx if there exists some number λ such that

(d/dx) f = λ f .

An obvious candidate is the exponential function, e^(λx); indeed, (d/dx) e^(λx) = λ e^(λx). The operator d/dx has an eigenvector e^(λx) for every λ ∈ R.
12.3 Eigenspaces
In the previous example, we found two eigenvectors

( −1 )        (  1 )
(  1 )  and   (  0 )
(  0 )        ( −1 )

for L, both with eigenvalue 1. Notice that

( −1 )   (  1 )   (  0 )
(  1 ) + (  0 ) = (  1 )
(  0 )   ( −1 )   ( −1 )

is also an eigenvector of L with eigenvalue 1.
12.4 Review Problems
What equation must f (x) obey? Can you write this as an eigenvector
equation? Suppose that the string has length L and f (0) = f (L) = 0.
Can you find any solutions for f (x)?
2. Let M = ( 2 1 ; 0 2 ). Find all eigenvalues of M. Does M have two linearly independent eigenvectors? Is there a basis in which the matrix of M is diagonal? (I.e., can M be diagonalized?)

3. Consider L : R² → R² with

L( (x, y)ᵀ ) = ( x cos θ + y sin θ , −x sin θ + y cos θ )ᵀ .

(a) Write the matrix of L in the basis ( (1, 0)ᵀ , (0, 1)ᵀ ).
(b) When θ ≠ 0, explain how L acts on the plane. Draw a picture.
(c) Do you expect L to have invariant directions? (Consider also special values of θ.)
(d) Try to find real eigenvalues for L by solving the equation L(v) = λv.
(e) Are there complex eigenvalues for L, assuming that i = √−1 exists?
What do you find from this computation? Does something similar hold
for 3 × 3 matrices? (Try assuming that the matrix of M is diagonal to answer this.)
13  Diagonalization
13.1 Diagonalizability
Suppose we are lucky, and we have L : V → V, and the ordered basis B = (v1 , . . . , vn) is a set of eigenvectors for L, with eigenvalues λ1 , . . . , λn. Then:

L(v1) = λ1 v1
L(v2) = λ2 v2
⋮
L(vn) = λn vn
13.2 Change of Basis
Here, the p^i_k are constants, which we can regard as entries of a square matrix P = (p^i_k). The matrix P must have an inverse, since we can also write each vj uniquely as a linear combination of the v′k:

vj = Σk v′k q^k_j .

Example 130 Suppose S′ = (v′1 , v′2) is an ordered basis for a vector space V and that with respect to some other ordered basis S = (v1 , v2) for V

v′1 = ( 1/√2 , 1/√2 )ᵀ_S   and   v′2 = ( 1/√3 , −1/√3 )ᵀ_S .

This means

v′1 = (v1 , v2) ( 1/√2 , 1/√2 )ᵀ = (v1 + v2)/√2   and   v′2 = (v1 , v2) ( 1/√3 , −1/√3 )ᵀ = (v1 − v2)/√3 .

The change of basis matrix has as its columns just the components of v′1 and v′2:

P = ( 1/√2   1/√3 )
    ( 1/√2  −1/√3 ) .
Let P = (p^i_j) be the change of basis matrix from input basis S to the basis S′ and Q = (q^k_j) be the change of basis matrix from output basis T to the basis T′. Then:

L(v′j) = L( Σi vi p^i_j ) = Σi L(vi) p^i_j = Σi Σk wk m^k_i p^i_j .

Meanwhile, we have:

L(v′j) = Σk w′k m′^k_j = Σk Σi wi q^i_k m′^k_j .

Since the expression for a vector in a basis is unique, we see that the entries of MP are the same as the entries of QM′. In other words,

MP = QM′   or   M′ = Q⁻¹MP.
Example 131 Let V be the space of polynomials in t of degree 2 or less and L : V → R², where

L(1) = (1, 2)ᵀ ,   L(t) = (2, 1)ᵀ ,   L(t²) = (3, 3)ᵀ .

From this information we can immediately read off the matrix M of L in the bases S = (1, t, t²) and T = (e1, e2), the standard basis for R², because

( L(1), L(t), L(t²) ) = ( e1 + 2e2 , 2e1 + e2 , 3e1 + 3e2 )
                      = ( e1 , e2 ) ( 1 2 3 ; 2 1 3 )   ⟹   M = ( 1 2 3 ; 2 1 3 ) .

Now suppose we are more interested in the bases

S′ = (1 + t, t + t², 1 + t²) ,   T′ = ( (1, 2)ᵀ , (2, 1)ᵀ ) =: (w′1 , w′2) .

To compute the new matrix M′ of L we could simply calculate what L does to the new input basis vectors in terms of the new output basis vectors:

( L(1 + t), L(t + t²), L(1 + t²) ) = ( (1,2)ᵀ + (2,1)ᵀ , (2,1)ᵀ + (3,3)ᵀ , (1,2)ᵀ + (3,3)ᵀ )

and

(w′1 , w′2) = (e1 + 2e2 , 2e1 + e2) = (e1 , e2) ( 1 2 ; 2 1 )   ⟹   Q = ( 1 2 ; 2 1 ) .

Hence

M′ = Q⁻¹MP = −(1/3) ( 1 −2 ; −2 1 ) ( 1 2 3 ; 2 1 3 ) ( 1 0 1 ; 1 1 0 ; 0 1 1 ) = ( 1 1 2 ; 1 2 1 ) .

Notice that the change of basis matrices P and Q are both square and invertible. Also, since we really wanted Q⁻¹, it is more efficient to try and write (e1 , e2) in terms of (w′1 , w′2), which would yield Q⁻¹ directly. Alternatively, one can check that MP = QM′.
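A minimal numerical check of this change of basis computation:

    import numpy as np

    M = np.array([[1., 2., 3.],
                  [2., 1., 3.]])
    P = np.array([[1., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 1.]])
    Q = np.array([[1., 2.],
                  [2., 1.]])

    M_prime = np.linalg.inv(Q) @ M @ P
    print(M_prime)   # [[1. 1. 2.], [1. 2. 1.]]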
D = P⁻¹ M P

13.4 Review Problems
write the matrix M for L in the standard basis, and two reorder-
ings of the standard basis. How are these matrices related?
3. Let X = {♥, ♦, ♠} and Y = {⋆, ?}. Write down two different ordered bases, S, S′ and T, T′ respectively, for each of the vector spaces R^X and R^Y. Find the change of basis matrices P and Q that map these bases to one another. Now consider the map

ℓ : Y → X ,

where ℓ(⋆) = ♥ and ℓ(?) = ♠. Show that ℓ can be used to define a linear transformation L : R^X → R^Y. Compute the matrices M and M′ of L in the bases S, T and then S′, T′. Use your change of basis matrices P and Q to check that M′ = Q⁻¹MP.
5. When is the 2 × 2 matrix ( a b ; c d ) diagonalizable? Include examples in your answer.
6. Show that similarity of matrices is an equivalence relation. (The defi-
nition of an equivalence relation is given in the background WeBWorK
set.)
7. Jordan form

• Can the matrix ( λ 1 ; 0 λ ) be diagonalized? Either diagonalize it or explain why this is impossible.

• Can the matrix ( λ 1 0 ; 0 λ 1 ; 0 0 λ ) be diagonalized? Either diagonalize it or explain why this is impossible.

• Can the n × n matrix

( λ 1 0 · · · 0 0 )
( 0 λ 1 · · · 0 0 )
( 0 0 λ · · · 0 0 )
( ⋮ ⋮ ⋮  ⋱  ⋮ ⋮ )
( 0 0 0 · · · λ 1 )
( 0 0 0 · · · 0 λ )

be diagonalized? Either diagonalize it or explain why this is impossible.

Note: It turns out that every matrix is similar to a block matrix whose diagonal blocks look like diagonal matrices or the ones above and whose off-diagonal blocks are all zero. This is called the Jordan form of the matrix, and a (maximal) block that looks like

( λ 1 0 · · · 0 )
( 0 λ 1       0 )
(    ⋱  ⋱      )
(         λ  1 )
( 0 0 0 · · · λ )

is called a Jordan n-cell or a Jordan block, where n is the size of the block.

8. Let A and B be commuting matrices (i.e., AB = BA) and suppose that A has an eigenvector v with eigenvalue λ.
14  Orthonormal Bases and Complements
You may have noticed that we have only rarely used the dot product. That is because many of the results we have obtained do not require a preferred notion of lengths of vectors. Once a dot or inner product is available, lengths of and angles between vectors can be measured; very powerful machinery and results become available in this case.

14.1 Properties of the Standard Basis

The standard basis (e1 , . . . , en) of Rⁿ has many useful properties with respect to the dot product and lengths.
• The standard basis vectors are orthogonal (in other words, at right angles or perpendicular):

ei · ej = eiᵀ ej = 0 when i ≠ j .

This is summarized by

eiᵀ ej = δij = { 1 if i = j ; 0 if i ≠ j } ,

where δij is the Kronecker delta. Notice that the Kronecker delta gives the entries of the identity matrix.

Given column vectors v and w, we have seen that the dot product v · w is the same as the matrix multiplication vᵀw. This is an inner product on Rⁿ. We can also form the outer product vwᵀ, which gives a square matrix. The outer product on the standard basis vectors is interesting. Set

Π1 = e1 e1ᵀ = (1, 0, . . . , 0)ᵀ (1 0 · · · 0) = ( 1 0 · · · 0 )
                                               ( 0 0 · · · 0 )
                                               ( ⋮        ⋮ )
                                               ( 0 0 · · · 0 )
⋮
Πn = en enᵀ = (0, 0, . . . , 1)ᵀ (0 0 · · · 1) = ( 0 0 · · · 0 )
                                               ( 0 0 · · · 0 )
                                               ( ⋮        ⋮ )
                                               ( 0 0 · · · 1 )
In short, Πi is the diagonal square matrix with a 1 in the ith diagonal position and zeros everywhere else.

Notice that Πi Πj = ei eiᵀ ej ejᵀ = ei δij ejᵀ. Then:

Πi Πj = { Πi if i = j ; 0 if i ≠ j } .

14.2 Orthogonal and Orthonormal Bases

A basis (v1 , . . . , vn) is orthogonal if the basis vectors are mutually perpendicular:

vi · vj = 0 if i ≠ j .

It is orthonormal if, in addition, each vector has unit length:

ui · uj = δij .

Given an orthonormal basis T = (u1 , . . . , un) and writing v = c1 u1 + · · · + cn un, then taking the dot product of v with each ui in T, we get

v · ui = c1 u1 · ui + · · · + ci ui · ui + · · · + cn un · ui
       = c1 · 0 + · · · + ci · 1 + · · · + cn · 0
       = ci ,

⟹ ci = v · ui
⟹ v = (v · u1) u1 + · · · + (v · un) un = Σi (v · ui) ui .
When dealing with more general vector spaces the dot product makes no
sense, and one must instead choose an appropriate inner product. By “ap-
propriate”, we mean an inner product well-suited to the problem one is try-
ing to solve. If the vector space V under study has an orthonormal basis
O = (u1 , . . . , un), meaning

⟨ui , uj⟩ = δij ,
where h·, ·i is the inner product, you might ask whether this can be related
to a dot product? The answer to this question is yes and rather easy to
understand:
Given an orthonormal basis, the information of two vectors v and v′ in V can be encoded in column vectors:

v = ⟨v, u1⟩u1 + · · · + ⟨v, un⟩un = (u1 , . . . , un) ( ⟨v, u1⟩, . . . , ⟨v, un⟩ )ᵀ = ( ⟨v, u1⟩, . . . , ⟨v, un⟩ )ᵀ_O ,

v′ = ⟨v′, u1⟩u1 + · · · + ⟨v′, un⟩un = (u1 , . . . , un) ( ⟨v′, u1⟩, . . . , ⟨v′, un⟩ )ᵀ = ( ⟨v′, u1⟩, . . . , ⟨v′, un⟩ )ᵀ_O .

The dot product of these two column vectors is

( ⟨v, u1⟩, . . . , ⟨v, un⟩ )ᵀ · ( ⟨v′, u1⟩, . . . , ⟨v′, un⟩ )ᵀ = ⟨v, u1⟩⟨v′, u1⟩ + · · · + ⟨v, un⟩⟨v′, un⟩ .

This agrees exactly with the inner product of v and v′ because

⟨v, v′⟩ = ⟨ ⟨v, u1⟩u1 + · · · + ⟨v, un⟩un , ⟨v′, u1⟩u1 + · · · + ⟨v′, un⟩un ⟩
        = ⟨v, u1⟩⟨v′, u1⟩⟨u1, u1⟩ + ⟨v, u2⟩⟨v′, u1⟩⟨u2, u1⟩ + · · ·
          · · · + ⟨v, un−1⟩⟨v′, un⟩⟨un−1, un⟩ + ⟨v, un⟩⟨v′, un⟩⟨un, un⟩
        = ⟨v, u1⟩⟨v′, u1⟩ + · · · + ⟨v, un⟩⟨v′, un⟩ .
The above computation looks a little daunting, but only the linearity prop-
erty of inner products and the fact that hui , uj i can equal either zero or
one was used. Because inner products become dot products once one uses
an orthonormal basis, we will quite often use the dot product notation in
situations where one really should write an inner product. Conversely, dot
product computations can always be rewritten in terms of an inner product,
if needed.
Example 133 Consider the space of polynomials given by V = span{1, x} with inner product ⟨p, p′⟩ = ∫₀¹ p(x)p′(x) dx. An obvious basis to use is B = (1, x), but it is not hard to check that this is not orthonormal; instead we take

O = ( 1 , 2√3 (x − ½) ) .

Hence we can predict the inner product of a + bx and a′ + b′x using the dot product:

( a + b/2  )   ( a′ + b′/2  )
( b/(2√3) ) · ( b′/(2√3) )  =  ( a + b/2 )( a′ + b′/2 ) + bb′/12  =  aa′ + ½(ab′ + a′b) + ⅓ bb′ .

Indeed,

⟨a + bx, a′ + b′x⟩ = ∫₀¹ (a + bx)(a′ + b′x) dx = aa′ + ½(ab′ + a′b) + ⅓ bb′ .
14.3 Relating Orthonormal Bases

Suppose O = (u1 , . . . , un) and O′ = (w1 , . . . , wn) are two orthonormal bases. The change of basis matrix between them has entries

P = (p^j_i) = (uj · wi) .

The object vwᵀ is the square matrix made from the outer product of v and w. Now we are ready to compute the components of the matrix product P Pᵀ:

Σi (uj · wi)(wi · uk) = Σi (ujᵀ wi)(wiᵀ uk)
                     = ujᵀ [ Σi wi wiᵀ ] uk
                     = ujᵀ In uk        (∗)
                     = ujᵀ uk = δjk .

The equality (∗) is explained below. Assuming (∗) holds, we have shown that P Pᵀ = In, which implies that

Pᵀ = P⁻¹ .

The equality in the line (∗) says that Σi wi wiᵀ = In. To see this, we examine ( Σi wi wiᵀ ) v for an arbitrary vector v. We can find constants c^j such that v = Σj c^j wj, so that

( Σi wi wiᵀ ) v = ( Σi wi wiᵀ ) ( Σj c^j wj )
               = Σj c^j Σi wi wiᵀ wj
               = Σj c^j Σi wi δij
               = Σj c^j wj        (since all terms with i ≠ j vanish)
               = v .

Thus, as a linear transformation, Σi wi wiᵀ fixes every vector, and so must be the identity In.

Definition A matrix P is orthogonal if P⁻¹ = Pᵀ.

To summarize: the change of basis matrix between two orthonormal bases is an orthogonal matrix,

P⁻¹ = Pᵀ .
Let E be the standard basis (e1 , e2 , e3). Since we are changing from the standard basis to a new basis, the columns of the change of basis matrix are exactly the new basis vectors. Then the change of basis matrix from E to S is given by

P = (Pij) = (ej · ui) = ( e1·u1  e1·u2  e1·u3 )
                        ( e2·u1  e2·u2  e2·u3 )
                        ( e3·u1  e3·u2  e3·u3 )

  = ( u1 u2 u3 ) = (  2/√6    0    1/√3 )
                   ( −1/√6  1/√2   1/√3 )
                   (  1/√6  1/√2  −1/√3 ) .
Above we are using orthonormality of the ui and the fact that matrix multiplication
amounts to taking dot products between rows and columns. It is also very important
to realize that the columns of an orthogonal matrix are made from an orthonormal
set of vectors.
M = P D P⁻¹ = P D Pᵀ ,

so that

Mᵀ = (P D Pᵀ)ᵀ = (Pᵀ)ᵀ Dᵀ Pᵀ = P D Pᵀ = M .

14.4 Gram-Schmidt & Orthogonal Complements

Given a vector v and another vector u not in span{v}, we can construct a new vector orthogonal to u:

v⊥ := v − (u·v)/(u·u) u .
261
262 Orthonormal Bases and Complements
u
v
u·v
u·u
u = vk
v?
262
14.4 Gram-Schmidt & Orthogonal Complements 263
Given a third vector w, we should first check that w does not lie in span{u, v}, i.e., check that u, v and w are linearly independent. If it does not, we can then define

w⊥ := w − (u·w)/(u·u) u − (v⊥·w)/(v⊥·v⊥) v⊥ .

We can check that u · w⊥ and v⊥ · w⊥ are both zero:

u · w⊥ = u · ( w − (u·w)/(u·u) u − (v⊥·w)/(v⊥·v⊥) v⊥ )
       = u·w − (u·w)/(u·u) u·u − (v⊥·w)/(v⊥·v⊥) u·v⊥
       = u·w − u·w − (v⊥·w)/(v⊥·v⊥) u·v⊥ = 0 ,

since u is orthogonal to v⊥, and

v⊥ · w⊥ = v⊥ · ( w − (u·w)/(u·u) u − (v⊥·w)/(v⊥·v⊥) v⊥ )
        = v⊥·w − (u·w)/(u·u) v⊥·u − (v⊥·w)/(v⊥·v⊥) v⊥·v⊥
        = v⊥·w − (u·w)/(u·u) v⊥·u − v⊥·w = 0 ,

again since u is orthogonal to v⊥.
Notice that each vi⊥ here depends on vj⊥ for every j < i. This allows us to inductively/algorithmically build up a linearly independent, orthogonal set of vectors {v1⊥, v2⊥, . . .} such that span{v1⊥, v2⊥, . . .} = span{v1, v2, . . .}; that is, an orthogonal basis for the latter vector space.

Note that the set of vectors you start out with needs to be ordered to uniquely specify the algorithm; changing the order of the vectors will give a different orthogonal basis. You might need to be the one to put an order on the initial set of vectors.

This algorithm is called the Gram–Schmidt orthogonalization procedure. Gram worked at a Danish insurance company over one hundred years ago; Schmidt was a student of Hilbert (the famous German mathematician).
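For concreteness, here is a minimal sketch of the procedure in NumPy; the function name gram_schmidt is ours, not a library routine, and no normalization is performed (the output is orthogonal, not orthonormal).

    import numpy as np

    def gram_schmidt(vectors):
        # vectors: an ordered list of linearly independent 1-D arrays.
        basis = []
        for v in vectors:
            w = v.astype(float)
            # Subtract the projection of v onto each earlier output vector.
            for u in basis:
                w = w - (u @ w) / (u @ u) * u
            basis.append(w)
        return basis

    # The ordered set (v1, v2, v3) from Example 135 below:
    vs = [np.array([1, 1, 0]), np.array([1, 1, 1]), np.array([3, 1, 1])]
    for u in gram_schmidt(vs):
        print(u)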
Example 135 We'll obtain an orthogonal basis for R³ by applying Gram-Schmidt to the linearly independent set

{ (1, 1, 1)ᵀ , (1, 1, 0)ᵀ , (3, 1, 1)ᵀ } .

Because the Gram-Schmidt algorithm uses the first vector from the ordered set the largest number of times, we will choose the vector with the most zeros to be the first in hopes of simplifying computations; we choose to order the set as

(v1 , v2 , v3) := ( (1, 1, 0)ᵀ , (1, 1, 1)ᵀ , (3, 1, 1)ᵀ ) .
14.5 QR Decomposition
In Chapter 7, Section 7.7 teaches you how to solve linear systems by decomposing a matrix M into a product of lower and upper triangular matrices,

M = LU .

The Gram–Schmidt procedure suggests another matrix decomposition,

M = QR ,

where Q is an orthogonal matrix and R is an upper triangular matrix.
For example, take

M = ( 2 −1  1 )
    ( 1  3 −2 ) .
    ( 0  1 −2 )

What we will do is to think of the columns of M as three 3-vectors and use Gram–Schmidt to build an orthonormal basis from these that will become the columns of the orthogonal matrix Q. We will use the matrix R to record the steps of the Gram–Schmidt procedure in such a way that the product QR equals M.

To begin with we write

M = ( 2  −7/5  −1 ) ( 1  1/5  0 )
    ( 1  14/5  −2 ) ( 0   1   0 ) .
    ( 0    1   −2 ) ( 0   0   1 )
In the first matrix the first two columns are orthogonal, because we simply replaced the second column of M by the vector that the Gram–Schmidt procedure produces from the first two columns of M, namely

( −7/5 )   ( −1 )         ( 2 )
( 14/5 ) = (  3 ) − (1/5) ( 1 ) .
(   1  )   (  1 )         ( 0 )

The matrix on the right is almost the identity matrix, save the +1/5 in the second entry of the first row, whose effect upon multiplying the two matrices precisely undoes what we did to the second column of the first matrix.

For the third column of M we use Gram–Schmidt to deduce the third orthogonal vector

( −1/6 )   (  1 )       ( 2 )                ( −7/5 )
(  1/3 ) = ( −2 ) − 0 · ( 1 ) − ( −9/(54/5) ) ( 14/5 ) ,
( −7/6 )   ( −2 )       ( 0 )                (   1  )

and therefore, using exactly the same procedure, write

M = ( 2  −7/5  −1/6 ) ( 1  1/5    0  )
    ( 1  14/5   1/3 ) ( 0   1   −5/6 ) .
    ( 0    1   −7/6 ) ( 0   0    1  )

This is not quite the answer, because the first matrix is now made of mutually orthogonal column vectors, but a bona fide orthogonal matrix is comprised of orthonormal vectors. To achieve that we divide each column of the first matrix by its length and multiply the corresponding row of the second matrix by the same amount:

M = ( 2√5/5  −7√30/90  −√6/18  ) ( √5   √5/5     0    )
    (  √5/5   7√30/45    √6/9  ) (  0  3√30/5  −√30/2 )  =  QR .
    (   0      √30/18  −7√6/18 ) (  0     0     √6/2  )
Geometrically what has happened here is easy to see. We started with three vectors given by the columns of M and rotated them such that the first lies along the x-axis, the second in the xy-plane, and the third in some other generic direction (here it happens to be in the yz-plane).
A nice check of the above result is to verify that entry (i, j) of the matrix R equals
the dot product of the i-th column of Q with the j-th column of M . (Some people
memorize this fact and use it as a recipe for computing QR decompositions.) A good
test of your own understanding is to work out why this is true!
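A numerical cross-check is easy; note that np.linalg.qr may return Q and R with some signs flipped relative to a hand computation, but the product is the same. The matrix M is the one from the example above.

    import numpy as np

    M = np.array([[2., -1., 1.],
                  [1., 3., -2.],
                  [0., 1., -2.]])

    Q, R = np.linalg.qr(M)
    print(np.allclose(Q @ R, M))             # True
    print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal
    # Entry (i, j) of R is the dot product of column i of Q and column j of M.
    print(np.allclose(R, Q.T @ M))           # True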
14.6 Orthogonal Complements

Given two subspaces U and V of a vector space W, we call

U + V := span(U ∪ V) = { u + v | u ∈ U, v ∈ V }

the sum of U and V. Here, we are not adding vectors, but vector spaces, to produce a new vector space.

Example 137

span{ (1,1,0,0)ᵀ , (0,1,1,0)ᵀ } + span{ (0,1,1,0)ᵀ , (0,0,1,1)ᵀ } = span{ (1,1,0,0)ᵀ , (0,1,1,0)ᵀ , (0,0,1,1)ᵀ } .

Notice that the addends have elements in common; (0, 1, 1, 0)ᵀ is in both addends. Even though both of the addends are 2-dimensional, their sum is not 4-dimensional.
In the special case that U and V do not have any non-zero vectors in
common, their sum is a vector space with dimension dim U + dim V .
U ⊕ V := span(U ∪ V) = { u + v | u ∈ U, v ∈ V }

• When U ∩ V ≠ {0W}, U + V ≠ U ⊕ V.

This distinction is important because the direct sum has a very nice property:

Theorem 14.6.1. If w ∈ U ⊕ V then there is only one way to write w as the sum of a vector in U and a vector in V.

Proof. Suppose that u + v = u′ + v′, with u, u′ ∈ U and v, v′ ∈ V. Then we could express 0 = (u − u′) + (v − v′), so (u − u′) = −(v − v′). Since U and V are subspaces, we have (u − u′) ∈ U and −(v − v′) ∈ V. But since these elements are equal, we also have (u − u′) ∈ V. Since U ∩ V = {0}, then (u − u′) = 0. Similarly, (v − v′) = 0. Therefore u = u′ and v = v′, proving the theorem.
U ⊕ V = W .

That is, how can we write W as the direct sum of U and something?

There is not a unique answer to this question, as can be seen by picturing different complementary subspaces in W = R³. However, using the inner product, there is a natural candidate U⊥ for this second subspace, as shown below:

U⊥ := { w ∈ W | w · u = 0 for all u ∈ U }

Remark The symbol "U⊥" is often read as "U-perp". This is the set of all vectors in W orthogonal to every vector in U. Notice also that in the above definition we have implicitly assumed that the inner product is the dot product. For a general inner product, the above definition would read U⊥ := { w ∈ W | ⟨w, u⟩ = 0 for all u ∈ U }.
Example 138 Consider any plane P through the origin in R³. Then P is a subspace, and P⊥ is the line through the origin orthogonal to P. For example, if P is the xy-plane, then P⊥ is the z-axis, P⊥ = span{ (0, 0, 1)ᵀ }.

The set U⊥ is itself a subspace of W: if v, w ∈ U⊥, then

v · u = 0 = w · u   (∀u ∈ U) .

Hence, for any scalar α,

u · (αv + w) = α u·v + u·w = 0   (∀u ∈ U) ,

and so αv + w ∈ U⊥.

Next, to form a direct sum between U and U⊥ we need to show that U ∩ U⊥ = {0}. This holds, because if u ∈ U and u ∈ U⊥ it follows that

u · u = 0  ⟺  u = 0.

Finally, given any w ∈ W and an orthonormal basis (e1, . . . , en) of U, the vector

u = (w · e1) e1 + · · · + (w · en) en ∈ U ,

and u⊥ := w − u is orthogonal to every vector in U, so that w = u + u⊥ with u ∈ U and u⊥ ∈ U⊥.
Example 139 Consider any line L through the origin in R⁴. Then L is a subspace, and L⊥ is a 3-dimensional subspace orthogonal to L. For example, let

L = span{ (1, 1, 1, 1)ᵀ }

be a line in R⁴. Then

L⊥ = { (x, y, z, w)ᵀ ∈ R⁴ | (x, y, z, w) · (1, 1, 1, 1) = 0 }
   = { (x, y, z, w)ᵀ ∈ R⁴ | x + y + z + w = 0 } .
Using the Gram-Schmidt procedure one may find an orthogonal basis for L⊥. The set

{ (1, −1, 0, 0)ᵀ , (1, 0, −1, 0)ᵀ , (1, 0, 0, −1)ᵀ }

is a (non-orthogonal) basis for L⊥ to which we can apply Gram-Schmidt.
So the set

{ (1, −1, 0, 0)ᵀ , (1/2, 1/2, −1, 0)ᵀ , (1/3, 1/3, 1/3, −1)ᵀ }

is an orthogonal basis for L⊥. Dividing each basis vector by its length yields

{ (1/√2, −1/√2, 0, 0)ᵀ , (1/√6, 1/√6, −2/√6, 0)ᵀ , (√3/6, √3/6, √3/6, −√3/2)ᵀ } ,

an orthonormal basis for L⊥.
14.7 Review Problems

u = (1, 2, 0)ᵀ and v = (0, 1, 1)ᵀ .

u · v = (Qu) · (Qv) ,
8. Carefully write out the Gram-Schmidt procedure for the set of vectors

{ (1, 1, 1)ᵀ , (1, −1, 1)ᵀ , (1, 1, −1)ᵀ } .

(b) Repeat the previous problem, but with three independent vectors u, v, w, where v⊥ and w⊥ are as defined by the Gram-Schmidt procedure.
M · N = tr(MN) .

Is A⊥n = Sn? Is Mn = Sn ⊕ An?

14. The vector space V = span{sin(t), sin(2t), sin(3t), sin(4t)} has an inner product:

f · g := ∫₀^(2π) f(t) g(t) dt .
15  Diagonalizing Symmetric Matrices
Symmetric matrices have many applications. For example, if we consider the
shortest distance between pairs of important cities, we might get a table like
the following.
M = (   0   2000   80  )
    ( 2000    0   2010 )  =  Mᵀ .
    (  80   2010    0  )
One very nice property of symmetric matrices is that they always have
real eigenvalues. Review exercise 1 guides you through the general proof, but
below is an example for 2 × 2 matrices.
Notice that the discriminant 4b² + (a − d)² is always positive, so that the eigenvalues must be real.

Now, suppose a symmetric matrix M has two distinct eigenvalues λ ≠ µ with eigenvectors x and y:

Mx = λx,   My = µy.

Consider the dot product x · y = xᵀy = yᵀx, and compute

xᵀMy = xᵀµy = µ x·y ,   and   xᵀMy = (Mᵀx)ᵀy = (Mx)ᵀy = λ x·y .

Subtracting these,

0 = xᵀMy − xᵀMy = (µ − λ) x·y .

Since µ − λ ≠ 0, it follows that x · y = 0: eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal.
Example 141 The matrix M = ( 2 1 ; 1 2 ) has eigenvalues determined by

det(M − λI) = (2 − λ)² − 1 = 0.

So the eigenvalues of M are 3 and 1, and the associated eigenvectors turn out to be (1, 1)ᵀ and (1, −1)ᵀ. It is easily seen that these eigenvectors are orthogonal:

( 1 )   (  1 )
( 1 ) · ( −1 ) = 0 .
In chapter 14 we saw that the matrix P built from any orthonormal basis (v1, . . . , vn) for Rⁿ as its columns,

P = ( v1 · · · vn ) ,

is an orthogonal matrix:

P⁻¹ = Pᵀ , or P Pᵀ = I = Pᵀ P.
Moreover, given any (unit) vector x1 , one can always find vectors x2 , . . . , xn
such that (x1 , . . . , xn ) is an orthonormal basis. (Such a basis can be obtained
using the Gram-Schmidt procedure.)
Now suppose M is a symmetric n × n matrix and λ1 is an eigenvalue with eigenvector x1 (this is always the case, because every matrix has at least one eigenvalue; see Review Problem 3). Let P be the square matrix of orthonormal column vectors

P = ( x1 x2 · · · xn ) ,
But P is an orthogonal matrix, so P⁻¹ = Pᵀ. Then:

P⁻¹ = Pᵀ = ( x1ᵀ ; ⋮ ; xnᵀ )

⟹ Pᵀ M P = ( x1ᵀ λ1 x1  ∗ · · · ∗ )
            ( x2ᵀ λ1 x1  ∗ · · · ∗ )
            (     ⋮            ⋮ )
            ( xnᵀ λ1 x1  ∗ · · · ∗ )

          = ( λ1  ∗ · · · ∗ )
            (  0  ∗ · · · ∗ )
            (  ⋮  ∗       ⋮ )
            (  0  ∗ · · · ∗ )

          = ( λ1  0 · · · 0 )
            (  0           )
            (  ⋮     M̂     ) ,
            (  0           )

where the last equality uses the symmetry of Pᵀ M P and M̂ is an (n − 1) × (n − 1) matrix.
M = Mᵀ   ⟺   M = P D Pᵀ
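A sketch of this factorization computed numerically; np.linalg.eigh is the routine designed for symmetric matrices and returns orthonormal eigenvectors.

    import numpy as np

    M = np.array([[2., 1.],
                  [1., 2.]])

    evals, P = np.linalg.eigh(M)   # columns of P: orthonormal eigenvectors
    D = np.diag(evals)
    print(np.allclose(M, P @ D @ P.T))   # True: M = P D P^T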
15.1 Review Problems
(c) Let x = (z1, . . . , zn)ᵀ ∈ Cⁿ. Let x† = ( z̄1 · · · z̄n ) (a 1 × n complex matrix, or a row vector). Compute x†x. Using the result of part 1a, what can you say about the number x†x? (E.g., is it real, imaginary, positive, negative, etc.)

(d) Suppose M = Mᵀ is an n × n symmetric matrix with real entries. Let λ be an eigenvalue of M with eigenvector x, so Mx = λx. Compute:

x†Mx / x†x

(e) Suppose λ is a 1 × 1 matrix. What is λᵀ?

(f) What is the size of the matrix x†Mx?

(g) For any matrix (or vector) N, we can compute conj(N) by applying complex conjugation to each entry of N. Compute conj((x†)ᵀ). Then compute conj((x†Mx)ᵀ). Note that for matrices, conj(AB + C) = conj(A) conj(B) + conj(C).

(h) Show that λ = conj(λ). Using the result of a previous part of this problem, what does this say about λ?
2. Let

x1 = (a, b, c)ᵀ ,

where a² + b² + c² = 1. Find vectors x2 and x3 such that {x1, x2, x3} is an orthonormal basis for R³. What can you say about the matrix P whose columns are the vectors x1, x2 and x3 that you found?

3. Let V ∋ v ≠ 0 be a vector space, dim V = n, and L : V → V linear.

(b) Explain why there exist scalars αi, not all zero, such that

α0 v + α1 Lv + α2 L²v + · · · + αn Lⁿv = 0 .

(c) Let m be the largest integer such that αm ≠ 0, and set

p(z) = α0 + α1 z + α2 z² + · · · + αm z^m .

By the fundamental theorem of algebra, p(z) factors:

p(z) = αm (z − λ1)(z − λ2) · · · (z − λm) .

What can you say about

(L − λ1)(L − λ2) · · · (L − λm) v = 0 ?
4. (Dimensions of Eigenspaces)

(a) Let

A = ( 4 0 0 )
    ( 0 2 2 ) .
    ( 0 2 2 )

Find all eigenvalues of A.
(b) Find a basis for each eigenspace of A. What is the sum of the
dimensions of the eigenspaces of A?
(c) Based on your answer to the previous part, guess a formula for the sum of the dimensions of the eigenspaces of a real n × n symmetric matrix. Explain why your formula must work for any real n × n symmetric matrix.
16  Kernel, Range, Nullity, Rank
Given a linear transformation

L : V → W ,

we often want to know if it has an inverse, i.e., if there exists a linear transformation

M : W → V

such that for any vector v ∈ V

M L v = v ,

and for any vector w ∈ W

L M w = w .
16.1 Range
Definition The range of a function f : S → T is the set

ran(f) := { f(s) | s ∈ S } ⊂ T .

It might occur to you that the range of the 3 × 4 matrix from the last example can be expressed as the range of a 3 × 2 matrix:
ran ( 1 2 0 1 )       ( 1 0 )
    ( 1 2 1 2 ) = ran ( 1 1 ) .
    ( 0 0 1 1 )       ( 0 1 )
Indeed, because the span of a set of vectors does not change when we replace the vectors with another set through an invertible process, we can calculate ranges through strings of equalities of ranges of matrices that differ by Elementary Column Operations (ECOs), ending with the range of a matrix in Column Reduced Echelon Form (CREF) with its zero columns deleted. For example, first subtracting multiples of the first column and deleting the resulting zero column, and then using c′3 = c3 − c2:

ran ( 1 2 0 1 )       ( 1 0 0 )       ( 1 0 0 )       ( 1 0 )
    ( 1 2 1 2 ) = ran ( 1 1 1 ) = ran ( 1 1 0 ) = ran ( 1 1 ) .
    ( 0 0 1 1 )       ( 0 1 1 )       ( 0 1 0 )       ( 0 1 )
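The same calculation can be sketched in SymPy, whose columnspace method returns a basis for the range (the pivot columns of the matrix):

    from sympy import Matrix

    M = Matrix([[1, 2, 0, 1],
                [1, 2, 1, 2],
                [0, 0, 1, 1]])

    # A basis for ran(M); compare with the CREF computation above.
    print(M.columnspace())   # [Matrix([1, 1, 0]), Matrix([0, 1, 1])]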
16.2 Image
Definition For any subset U of the domain S of a function f : S → T, the image of U is

f(U) = Img U := { f(x) | x ∈ U } .

is the parallelepiped

Img U = { a (1, 1, 0)ᵀ + b (0, 1, 0)ᵀ + c (0, 1, 1)ᵀ | a, b, c ∈ [0, 1] } .
Note that for most subsets U of the domain S of a function f, the image of U is not a vector space. The range of a function is the particular case of the image where the subset of the domain is the entire domain; ran f = Img S. For this reason, the range of f is also sometimes called the image of f and is sometimes denoted im(f) or f(S). We have seen that the range of a matrix is always a span of vectors, and hence a vector space.

Note that we prefer the phrase "range of f" to the phrase "image of f" because we wish to avoid confusion between homophones; the word "image" is also used to describe a single element of the codomain assigned to a single element of the domain. For example, one might say of the function A : R → R with rule of correspondence A(x) = 2x − 1 for all x in R that the image of 2 is 3, with this second meaning of the word "image" in mind. By contrast, one would never say that the range of 2 is 3, since the former is not a function and the latter is not a set.
For thinking about inverses of functions, we want to think in the opposite direction, in a sense.

Definition The pre-image of any subset U ⊂ T is

f⁻¹(U) := { s ∈ S | f(s) ∈ U } ⊂ S.
The pre-image of a set U is the set of all elements of S which map to U .
Example 146 The pre-image of the set U = { a (2, 1, 1)ᵀ | a ∈ [0, 1] } (a line segment) under the matrix

M = ( 1 0 1 )
    ( 0 1 1 ) : R³ → R³
    ( 0 1 1 )

is the set

M⁻¹U = { x | Mx = v for some v ∈ U }
     = { (x, y, z)ᵀ | ( 1 0 1 ; 0 1 1 ; 0 1 1 ) (x, y, z)ᵀ = a (2, 1, 1)ᵀ for some a ∈ [0, 1] } .
Since

RREF ( 1 0 1 | 2a )   ( 1 0 1 | 2a )
     ( 0 1 1 |  a ) = ( 0 1 1 |  a )
     ( 0 1 1 |  a )   ( 0 0 0 |  0 )

we have

M⁻¹U = { a (2, 1, 0)ᵀ + b (−1, −1, 1)ᵀ | a ∈ [0, 1], b ∈ R } .
Proof. This is an “if and only if” statement so the proof has two parts.
Now let us specialize to functions f that are linear maps between two
vector spaces. Everything we said above for arbitrary functions is exactly
the same for linear functions. However, the structure of vector spaces lets
us say much more about one-to-one and onto functions whose domains are
vector spaces than we can say about functions on general sets. For example,
we know that a linear function always sends 0V to 0W , i.e.,
f (0V ) = 0W
16.2.2 Kernel
Let L : V → W be a linear transformation. Suppose L is not injective. Then we can find v1 ≠ v2 such that Lv1 = Lv2. So v1 − v2 ≠ 0, but

L(v1 − v2) = 0.

Definition The set of all vectors v ∈ V with Lv = 0W is called the kernel of L:

ker L = { v ∈ V | Lv = 0W } ⊂ V .

Notice that if L has matrix M in some basis, then finding the kernel of L is equivalent to solving the homogeneous system

M X = 0.

Then all solutions of M X = 0 are of the form x = y = 0. In other words, ker L = {0}, and so L is injective.

ker L = {0V} .

cw + dw′ = L(cv + dv′) .
Now the subspace theorem strikes again, and we have the following theorem:
Theorem 16.2.4. If L : V ! W is linear then the range L(V ) is a subspace
of W .
Example 151 Let L(x, y) = (x + y, x + 2y, y). The range of L is a plane through the origin and thus a subspace of R³. Indeed the matrix of L in the standard basis is

( 1 1 )
( 1 2 ) .
( 0 1 )

The columns of this matrix encode the possible outputs of the function L, because

L(x, y) = ( 1 1 ; 1 2 ; 0 1 ) ( x , y )ᵀ = x (1, 1, 0)ᵀ + y (1, 2, 1)ᵀ .
Thus

L(R²) = span{ (1, 1, 0)ᵀ , (1, 2, 1)ᵀ } .

Hence, when bases and a linear transformation are given, people often refer to its range as the column space of the corresponding matrix.
Thus
L(V ) = span L(S) = span{Lv1 , . . . , Lvn } .
However, the set {Lv1 , . . . , Lvn } may not be linearly independent; we must
solve
c1 Lv1 + · · · + cn Lvn = 0 ,
to determine whether it is. By finding relations amongst the elements of
L(S) = {Lv1 , . . . , Lvn }, we can discard vectors until a basis is arrived at.
The size of this basis is the dimension of the range of L, which is known as
the rank of L.
{v1 , . . . , vp , u1 , . . . , uq },
where v1 , . . . , vp is also a basis for ker L. This can always be done, for exam-
ple, by finding a basis for the kernel of L and then extending to a basis for V .
Then p = null L and p + q = dim V . Then we need to show that q = rank L.
To accomplish this, we show that {L(u1 ), . . . , L(uq )} is a basis for L(V ).
To see that {L(u1 ), . . . , L(uq )} spans L(V ), consider any vector w in L(V ).
Then we can find constants ci, dj such that:

w = L(c1 v1 + · · · + cp vp + d1 u1 + · · · + dq uq)
  = c1 L(v1) + · · · + cp L(vp) + d1 L(u1) + · · · + dq L(uq)
  = d1 L(u1) + · · · + dq L(uq)   since L(vi) = 0,

⟹ L(V) = span{L(u1), . . . , L(uq)}.

Now we show that {L(u1), . . . , L(uq)} is linearly independent. Suppose

0 = d1 L(u1) + · · · + dq L(uq)
  = L(d1 u1 + · · · + dq uq).
Here we only know that dim V = n and dim W = m. The rank of the map L is the dimension of its image and also the number of linearly independent columns of M. Hence, this is sometimes called the column rank of M. The dimension formula predicts the dimension of the kernel, i.e. the nullity: null L = dim V − rank L = n − r. To compute the kernel we would study the linear system

Mx = 0 ,

which gives m equations for the n-vector x. The row rank of a matrix is the number of linearly independent rows (viewed as vectors). Each linearly independent row of M gives an independent equation satisfied by the n-vector x. Every independent equation on x reduces the size of the kernel by one, so if the row rank is s, then null L + s = n. Thus we have two equations:

null L + r = n   and   null L + s = n .

From these we conclude that r = s. In other words, the row rank of M equals its column rank.
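A quick numerical illustration of this count, sketched with the matrix from section 16.1; matrix_rank estimates the column rank, which, as just argued, equals the row rank.

    import numpy as np

    M = np.array([[1., 2., 0., 1.],
                  [1., 2., 1., 2.],
                  [0., 0., 1., 1.]])

    r = np.linalg.matrix_rank(M)   # column rank = row rank
    n = M.shape[1]                 # dimension of the domain
    print(r, n - r)                # rank 2, nullity 2; they sum to n = 4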
16.3 Summary
We have seen that a linear transformation has an inverse if and only if it is
bijective (i.e., one-to-one and onto). We also know that linear transforma-
tions can be represented by matrices, and we have seen many ways to tell
whether a matrix is invertible. Here is a list of them:
Invertibility Conditions

16.4 Review Problems
The equations in the last two parts describe how a linear transformation M : Rᵐ → Rⁿ determines orthogonal decompositions of both its domain and target. This result sometimes goes by the humble name The Fundamental Theorem of Linear Algebra.
(a) Explain why the first three columns of the original matrix M form a basis for L(R⁴).

(b) Find and describe an algorithm (i.e., a general procedure) for computing a basis for L(Rⁿ) when L : Rⁿ → Rᵐ.

(c) Use your algorithm to find a basis for L(R⁴) when L : R⁴ → R³ is the linear transformation whose matrix M in the standard basis is

( 2 1 1 4 )
( 0 1 0 5 ) .
( 4 1 1 6 )
5. Claim:

d/dx : Pn(x) → Pn(x) .
Find the dimension of the kernel and image of this operator. What happens if the target space is changed to Pn−1(x) or Pn+1(x)?

Now consider P2(x, y), the space of polynomials of degree two or less in x and y. (Recall how degree is counted: xy is degree two, y is degree one, and x²y is degree three, for example.) Let

L := ∂/∂x + ∂/∂y : P2(x, y) → P2(x, y).

(For example, L(xy) = ∂/∂x (xy) + ∂/∂y (xy) = y + x.) Find a basis for the kernel of L. Verify the dimension formula in this case.
7. Let's demonstrate some ways the dimension formula can break down if a vector space is infinite dimensional.

(a) Let R[x] be the vector space of all polynomials in the variable x with real coefficients. Let D = d/dx be the usual derivative operator. Show that the range of D is R[x]. What is ker D?

Hint: Use the basis {xⁿ | n ∈ N}.

(b) Let L : R[x] → R[x] be the linear map

L(p(x)) = x p(x) .
8. This question will answer the question, “If I choose a bit vector at
random, what is the probability that it lies in the span of some other
vectors?”
ii. Give some method for choosing a random bit vector v in B³. Suppose S is a collection of 2 linearly independent bit vectors in B³. How can we tell whether S ∪ {v} is linearly independent? Do you think it is likely or unlikely that S ∪ {v} is linearly independent? Explain your reasoning.

iii. If P is the characteristic polynomial of a 3 × 3 bit matrix, what must the degree of P be? Given that each coefficient must be either 0 or 1, how many possibilities are there for P? How many of these possible characteristic polynomials have 0 as a root? If M is a 3 × 3 bit matrix chosen at random, what is the probability that it has 0 as an eigenvalue? (Assume that you are choosing a random matrix M in such a way as to make each characteristic polynomial equally likely.) What is the probability that the columns of M form a basis for B³? (Hint: what is the relationship between the kernel of M and its eigenvalues?)
Note: We could ask the same question for real vectors: If I choose a real vector at random, what is the probability that it lies in the span of some other vectors? In fact, once we write down a reasonable way of choosing a random real vector, if I choose a real vector in Rⁿ at random, the probability that it lies in the span of n − 1 other real vectors is zero!
17  Least squares and Singular Values
Consider the linear algebraic equation L(x) = v, where L : U → W is linear and v ∈ W is known while x is unknown. As we have seen, this system may have one solution, no solutions, or infinitely many solutions. But if v is not in the range of L, there will never be any solutions for L(x) = v.
“My work always tried to unite the Truth with the Beautiful,
but when I had to choose one or the other, I usually chose the
Beautiful.”
– Hermann Weyl.
This method has many applications, such as when trying to fit a (perhaps
linear) function to a “noisy” set of observations. For example, suppose we
measured the position of a bicycle on a racetrack once every five seconds.
Our observations won’t be exact, but so long as the observations are right on
average, we can figure out a best-possible linear function of position of the
bicycle in terms of time.
Suppose M is the matrix for the linear function L : U ! W in some
bases for U and W . The vectors v and x are represented by column vectors
V and X in these bases. Then we need to approximate

MX − V ≈ 0 .

Multiplying both sides of MX = V by Mᵀ, any solution X of MX = V also obeys

MᵀMX = MᵀV.
However, the converse is often false. In fact, the equation MX = V may have no solutions at all, but still have least squares solutions to MᵀMX = MᵀV.

Observe that since M is an m × n matrix, then Mᵀ is an n × m matrix. Then MᵀM is an n × n matrix, and is symmetric, since (MᵀM)ᵀ = MᵀM.
Then, for any vector X, we can evaluate XᵀMᵀMX to obtain a number. This is a very nice number, though! It is just the length |MX|² = (MX)ᵀ(MX) = XᵀMᵀMX.

If the matrix MᵀM is invertible, the least squares solution is

X = (MᵀM)⁻¹MᵀV.

In general, to find the least squares solution:

• Compute MᵀM and MᵀV.
• Solve (MᵀM)X = MᵀV by Gaussian elimination.
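Both routes can be sketched in NumPy; np.linalg.lstsq performs the same minimization directly. The data here anticipate Example 153 below.

    import numpy as np

    M = np.array([[1., 1.],
                  [2., 1.],
                  [3., 1.]])
    V = np.array([11., 19., 31.])

    # Normal equation route:
    X = np.linalg.solve(M.T @ M, M.T @ V)
    print(X)   # [10.0, 0.333...]: slope a = 10, intercept b = 1/3

    # Built-in least squares route:
    X2, *_ = np.linalg.lstsq(M, V, rcond=None)
    print(np.allclose(X, X2))   # True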
Example 153 Captain Conundrum falls off of the leaning tower of Pisa and makes three (rather shaky) measurements of his velocity at three different times.

t (s)   v (m/s)
  1        11
  2        19
  3        31

Having taken some calculus¹, he believes that his data are best approximated by a straight line

v = at + b.

¹ In fact, he is a Calculus Superhero.
Then the data should obey

11 = a · 1 + b
19 = a · 2 + b
31 = a · 3 + b.

This system has no exact solution, but its least squares solution is a = 10, b = 1/3, i.e., the line

v = 10 t + 1/3 .
Notice that this equation implies that Captain Conundrum accelerates towards Italian soil at 10 m/s² (which is an excellent approximation to reality) and that he started at a downward velocity of 1/3 m/s (perhaps somebody gave him a shove...)!
17.1 Projection Matrices
MX = Vr ⟹ MᵀMX = MᵀVr ⟹ MᵀMX = Mᵀ(Vr + 0)
⟹ MᵀMX = Mᵀ(Vr + Vk) ⟹ MᵀMX = MᵀV ⟹ X = (MᵀM)⁻¹MᵀV ,

so that

M (MᵀM)⁻¹ Mᵀ V = Vr .

That is, the matrix which projects V onto its ran M part is M (MᵀM)⁻¹ Mᵀ.
Example 154 To project (1, 1, 1)ᵀ onto

span{ (1, 1, 0)ᵀ , (1, −1, 0)ᵀ } = ran ( 1  1 )
                                      ( 1 −1 )
                                      ( 0  0 )

multiply by the matrix M (MᵀM)⁻¹ Mᵀ:

( 1  1 ) [ ( 1  1 0 ) ( 1  1 ) ]⁻¹ ( 1  1 0 )
( 1 −1 ) [ ( 1 −1 0 ) ( 1 −1 ) ]   ( 1 −1 0 )
( 0  0 ) [            ( 0  0 ) ]

= ( 1  1 ) ( 2 0 )⁻¹ ( 1  1 0 )     ( 1  1 )            ( 2 0 0 )
  ( 1 −1 ) ( 0 2 )   ( 1 −1 0 ) = ½ ( 1 −1 ) ( 1  1 0 ) = ½ ( 0 2 0 ) .
  ( 0  0 )                          ( 0  0 ) ( 1 −1 0 )     ( 0 0 0 )

This gives

½ ( 2 0 0 ) ( 1 )   ( 1 )
  ( 0 2 0 ) ( 1 ) = ( 1 ) .
  ( 0 0 0 ) ( 1 )   ( 0 )
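A sketch of the projection in NumPy:

    import numpy as np

    M = np.array([[1., 1.],
                  [1., -1.],
                  [0., 0.]])
    v = np.array([1., 1., 1.])

    # The matrix projecting onto ran(M) is M (M^T M)^{-1} M^T.
    P = M @ np.linalg.inv(M.T @ M) @ M.T
    print(P @ v)   # [1., 1., 0.]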
17.2 Singular Value Decomposition
L∗L ui = λi ui .

Then LL∗(Lui) = L(L∗L ui) = λi Lui; i.e., Lui is an eigenvector of LL∗. The vectors (Lu1, . . . , Lun) are linearly independent, because ker L = {0} (this is where we use our simplifying assumption, but you can try and extend our analysis to the case where it no longer holds).
Let's compute the angles between, and lengths of, these vectors. For that we express the vectors ui in the bases used to compute the matrix M of L. Denoting these column vectors by Ui, we then compute

(MUi) · (MUj) = Uiᵀ Mᵀ M Uj = λj Uiᵀ Uj = λj Ui · Uj = λj δij .

We see that the vectors (Lu1, . . . , Lun) are orthogonal but not orthonormal. Moreover, the length of Lui is √λi. Normalizing gives the orthonormal and linearly independent ordered set

( Lu1/√λ1 , . . . , Lun/√λn ) .
In general, this cannot be a basis for W since ker L = {0} implies dim L(V) = dim V, and in turn dim V ≤ dim W, so n ≤ m.

However, it is a subset of the eigenvectors of LL∗, so there is an orthonormal basis of eigenvectors of LL∗ of the form

O′ = ( Lu1/√λ1 , . . . , Lun/√λn , vn+1 , . . . , vm ) =: (v1 , . . . , vm) .
Now let's compute the matrix of L with respect to the orthonormal basis O = (u1 , . . . , un) for V and the orthonormal basis O′ = (v1 , . . . , vm) for W.
As usual, our starting point is the computation of L acting on the input basis vectors:

LO = ( Lu1 , . . . , Lun ) = ( √λ1 v1 , . . . , √λn vn )

   = ( v1 , . . . , vm ) ( √λ1   0  · · ·  0  )
                         (  0  √λ2 · · ·  0  )
                         (  ⋮   ⋮         ⋮  )
                         (  0   0  · · · √λn )
                         (  0   0  · · ·  0  )
                         (  ⋮   ⋮         ⋮  )
                         (  0   0  · · ·  0  ) .

The result is very close to diagonalization; the numbers √λi along the leading diagonal are called the singular values of L.
Example 155 Let the matrix of a linear transformation be

M = (  1/2  −1/2 )
    (   1     1  ) .
    ( −1/2   1/2 )

The eigenvectors of

MᵀM = ( 3/2  1/2 )
      ( 1/2  3/2 )

are u1 = (1/√2, −1/√2)ᵀ and u2 = (1/√2, 1/√2)ᵀ, with eigenvalues λ1 = 1 and λ2 = 2, and the vectors Mu1 and Mu2 are eigenvectors of

MMᵀ = (  1/2  0  −1/2 )
      (   0   2    0  ) .
      ( −1/2  0   1/2 )
The new matrix M′ of the linear transformation given by M with respect to the bases O and O′ is

M′ = ( 1   0 )
     ( 0  √2 ) ,
     ( 0   0 )

so the singular values are 1, √2.
Finally note that arranging the column vectors of O and O′ into change of basis matrices

P = (  1/√2  1/√2 )        (  1/√2  0  1/√2 )
    ( −1/√2  1/√2 ) ,  Q = (   0    1    0  ) ,
                           ( −1/√2  0  1/√2 )

we have, as usual,

M′ = Q⁻¹ M P .
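A sketch with NumPy's SVD, using the matrix from the example above; signs and ordering of the singular vectors may differ from the hand computation, and singular values are returned in decreasing order.

    import numpy as np

    M = np.array([[0.5, -0.5],
                  [1.0, 1.0],
                  [-0.5, 0.5]])

    U, s, Vt = np.linalg.svd(M)
    print(s)   # [1.414..., 1.0]: the singular values sqrt(2) and 1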
• uᵀ Mᵀ M v = vᵀ Mᵀ M u .
A
List of Symbols
315
316 List of Symbols
316
B
Fields
Definition A field F is a set with two operations + and ·, such that for all
a, b, c 2 F the following axioms are satisfied:
Roughly, all of the above mean that you have notions of +, , ⇥ and ÷ just
as for regular real numbers.
Fields are a very beautiful structure; some examples are rational num-
bers Q, real numbers R, and complex numbers C. These examples are in-
finite, however this does not necessarily have to be the case. The smallest
317
318 Fields
example of a field has just two elements, Z2 = {0, 1} or bits. The rules for
addition and multiplication are the usual ones save that
1 + 1 = 0.
318
Online Resources
C
Here are some internet places to get linear algebra help:
http://www.youtube.com/user/numericalmethodsguy
http://anothermathgeek.hubpages.com/hub/What-the-Heck-are-Eigenvalues-and-Eigenvectors
320
Sample First Midterm
D
Here are some worked problems typical for what you might expect on a first
midterm examination.
1. Solve the following linear system. Write the solution set in vector form.
Check your solution. Write one particular solution and one homogeneous
solution, if they exist. What does the solution set look like geometrically?
x + 3y =4
x 2y + z = 1
2x + y + z =5
321
322 Sample First Midterm
(d) What are the vectors X0 and Yi called and which matrix equations do
they solve?
(e) Check separately that X0 and each Yi solve the matrix systems you
claimed they solved in part (d).
322
323
8. What are the four main things we need to define for a vector space? Which
of the following is a vector space over R? For those that are not vector
spaces, modify one part of the definition to make it into a vector space.
(a) V =✓ { 2 ⇥◆
2 matrices
✓ ◆ entries in R}, usual matrix addition, and
with
a b ka b
k· = for k 2 R.
c d kc d
(b) V = {polynomials with complex coefficients of degree 3}, with usual
addition and scalar multiplication of polynomials.
(c) V = {vectors in R3 with at least one entry containing a 1}, with usual
addition and scalar multiplication.
323
324 Sample First Midterm
Solutions
1. As an additional exercise, write out the row operations above the ⇠ signs
below.
0 1 0 1 0 3 11
1
1 3 0 4 1 3 0 4 1 0 5 5
B C B C B 1 3 C
@ 1 2 1 1 A⇠@ 0 5 1 3 A⇠@ 0 1 5 5 A.
2 1 1 5 0 5 1 3 0 0 0 0
Solution set is
80 1 0 11 1 0 31 9
< x 5 5 =
@yA = @ 3 A + µ @ 1 A : µ 2 R .
: 5 5 ;
z 0 1
0 11 1
5
Geometrically this represents a line in R3 through the point @ 35 A running
0
0 31
5
B 1C
parallel to the vector @ 5 A.
1
0 11 1 0 3
1
5 5
The vector @ 3
5
A is a particular solution and @ 1A
5 is a homogeneous
0 1
solution.
As a double check note that
0 1 0 11 1 0 1 0 1 0 31 0 1
1 3 0 5 4 1 3 0 5 0
@ 1 2 1 A @ 3A @ A
= 1 and @ 1 2 1 A @ 1A
= 0A .
@
5 5
2 1 1 0 5 2 1 1 1 0
324
325
325
326 Sample First Midterm
MX V, M Y1 , and M Y2
3. As usual, be sure to write out the row operations above the ⇠’s so your work
can be easily checked.
0 1
1 2 3 4 1 0 0 0
B 2 4 7 11 0 1 0 0 C
B C
@ 3 7 14 25 0 0 1 0 A
4 11 25 50 0 0 0 1
0 1
1 2 3 4 1 0 0 0
B 0 0 1 3 2 1 0 0 C
⇠B@ 0 1 5
C
13 3 0 1 0 A
0 3 13 34 4 0 0 1
0 1
1 0 7 22 7 0 2 0
B 0 1 5 13 3 0 1 0 C
⇠B
@ 0 0
C
1 3 2 1 0 0 A
0 0 2 5 5 0 3 1
0 1
1 0 0 1 7 7 2 0
B 0 1 0 2 7 5 1 0 C
⇠B
@ 0 0 1
C
3 2 1 0 0 A
0 0 0 1 1 2 3 1
0 1
1 0 0 0 6 9 5 1
B 0 1 0 0 9 1 5 2 C
⇠B
@ 0 0 1 0
C.
5 5 9 3 A
0 0 0 1 1 2 3 1
Check
0 10 1 0 1
1 2 3 4 6 9 5 1 1 0 0 0
B2 4 7 11C B 9 1 5 2C B0 1 0 0C
B CB C=B C.
@3 7 14 25A @ 5 5 9 3A @0 0 1 0A
4 11 25 50 1 2 3 1 0 0 0 1
4. ✓ ◆ ! ✓ ◆
1 1 11 4
T 1 2 3 5 5 5 5
M M = 3 2 = 2 3 .
1 1 5 5 5 5
326
327
= (2 · 2 + 1 · 3) + (3 · 1 + ( 1) · ( 1)) 2 = 9.
5. First ✓ ◆✓ ◆
T cos ✓ sin ✓ x
X (M X) = X M X = x y
sin ✓ cos ✓ y
✓ ◆
x cos ✓ + y sin ✓
= x y = (x2 + y 2 ) cos ✓ .
x sin ✓ + y cos ✓
p p
Now ||X|| = X X = x2 + y 2 and (M X) (M X) = XM T M X. But
✓ ◆✓ ◆
T cos ✓ sin ✓ cos ✓ sin ✓
M M=
sin ✓ cos ✓ sin ✓ cos ✓
✓ ◆
cos2 ✓ + sin2 ✓ 0
= =I.
0 cos2 ✓ + sin2 ✓
p
Hence ||M X|| = ||X|| = x2 + y 2 . Thus the cosine of the angle between X
and M X is given by
X (M X) (x2 + y 2 ) cos ✓
=p p = cos ✓ .
||X|| ||M X|| x2 + y 2 x2 + y 2
In other words, the angle is ✓ OR ✓. You should draw two pictures, one
where the angle between X and M X is ✓, the other where it is ✓.
|X (M X)|
For Cauchy–Schwartz, ||X|| ||M X|| = | cos ✓| = 1 when ✓ = 0, ⇡. For the
triangle equality M X = X achieves ||X + M X|| = ||X|| + ||M X||, which
requires ✓ = 0.
6. This is
✓ a block
◆ matrix problem. Notice the that matrix M is really just
I I
M= , where I and 0 are the 3⇥3 identity zero matrices, respectively.
0 I
But ✓ ◆✓ ◆ ✓ ◆
I I I I I 2I
M2 = =
0 I 0 I 0 I
and ✓ ◆✓ ◆ ✓ ◆
I I I 2I I 3I
M3 = =
0 I 0 I 0 I
327
328 Sample First Midterm
✓ ◆
I kI
so, Mk = , or explicitly
0 I
0 1
1 0 0 k 0 0
B0 1 0 0 k 0C
B C
B0 0 1 0 0 kC
Mk = B
B0
C.
B 0 0 1 0 0CC
@0 0 0 0 1 0A
0 0 0 0 0 1
8. The four main ingredients are (i) a set V of vectors, (ii) a number field K
(usually K = R), (iii) a rule for adding vectors (vector addition) and (iv)
a way to multiply vectors by a number to produce a new vector (scalar
multiplication). There are, of course, ten rules that these four ingredients
must obey.
(a) This is not a vector space. Notice that distributivity of scalar multi-
plication requires 2u = (1 + 1)u = u + u for any vector u but
✓ ◆ ✓ ◆
a b 2a b
2· =
c d 2c d
328
329
(b) This is a vector space. Although, the question does not ask you to, it is
a useful exercise to verify that all ten vector space rules are satisfied.
(c) This is not a vector space for many reasons. An easy one is that
(1, 1, 0) and ( 1, 1, 0) are both in the space, but their sum (0, 0, 0) is
not (i.e., additive closure fails). The easiest way to repair this would
be to drop the requirement that there be at least one entry equaling 1.
10. 0 1 0 10 1
1 1 1 2 1 0 0 0 1 1 1 2
B 1 3 2 2C B 1 1 0 0C B0 2 3 0C
B C=B CB C
@ 1 3 4 6 A @ 1 0 1 0A @0 2 5 8A
0 4 7 2 0 0 0 1 0 4 7 2
329
330 Sample First Midterm
0 10 1
1 0 0 0 1 1 1 2
B 1 1 0 C B
0C B0 2 3 0C
=B
@ 1
C
1 1 0A @0 0 2 8A
0 2 0 1 0 0 1 2
0 10 1
1 0 0 0 1 1 1 2
B 1 1 0 0C B0 2 3 0C
=B
@ 1
CB C.
1 1 0A @0 0 2 8A
1
0 2 2 1 0 0 0 2
To solve M X = V using M = LU we first solve LW = V whose augmented
matrix reads
0 1 0 1
1 0 0 0 7 1 0 0 0 7
B 1 1 0 0 6 C B 0 1 0 0 1 C
B C⇠B C
@ 1 1 1 0 12 A @ 0 0 1 0 18 A
1 1
0 2 2 1 7 0 2 2 1 7
0 1
1 0 0 0 7
B 0 1 0 0 1 C
⇠B
@ 0
C,
0 1 0 18 A
0 0 0 1 4
from which we can read o↵ W . Now we compute X by solving U X = W
with the augmented matrix
0 1 0 1
1 1 1 2 7 1 1 1 2 7
B 0 2 3 0 1 C B 0 2 3 0 1 C
B C⇠B C
@ 0 0 2 8 18 A @ 0 0 2 0 2 A
0 0 0 2 4 0 0 0 1 2
0 1 0 1
1 1 1 2 7 1 0 0 0 1
B 0 2 0 0 2 C B 0 1 0 0 1 C
⇠B
@ 0
C⇠B C.
0 1 0 1 A @ 0 0 1 0 1 A
0 0 0 1 2 0 0 0 1 2
So x = 1, y = 1, z = 1 and w = 2.
330
Sample Second Midterm
E
Here are some worked problems typical for what you might expect on a second
midterm examination.
✓ ◆
a b
1. Determinants: The determinant det M of a 2 ⇥ 2 matrix M = is
c d
defined by
det M = ad bc .
2. Let 0 1
1 1 1
B C
A=B C
@ 2 2 3 A.
4 5 6
0 1
1
Compute det A. Find all solutions to (i) AX = 0 and (ii) AX = @ 2 A for
3
331
332 Sample Second Midterm
1 1
det M = trM 2 + (trM )2 .
2 2
For example
✓ ◆
a b
perm = ad + bc .
c d
Calculate 0 1
1 2 3
perm @4 5 6A .
7 8 9
What do you think would happen to the permanent of an n ⇥ n matrix M
if (include a brief explanation with each answer):
X T X = (1) ,
and define
H=I 2XX T ,
H = HT = H 1
.
332
333
333
334 Sample Second Midterm
Solutions
1. (a) Whenever detM = ad bc 6= 0.
(b) Unit determinant bit matrices:
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
1 0 1 1 1 0 0 1 1 1 0 1
, , , , , .
0 1 0 1 1 1 1 0 1 0 1 1
So we have found a pair of matrices that are not row equivalent but
do have the same determinant. It follows that the statement is false.
2.
detA = 1.(2.6 3.5) 1.(2.6 3.4) + 1.(2.5 2.4) = 1.
(i) Since detA 6= 0, the homogeneous system AX = 0 only has the solution
X = 0. (ii) It is efficient to compute the adjoint
0 1T 0 1
3 0 2 3 1 1
adj A = @ 1 2 1A = @ 0 2 1A
1 1 0 2 1 0
Hence 0 1
3 1 1
A 1
= @ 0 2 1A .
2 1 0
334
335
Thus 0 10 1 0 1
3 1 1 1 2
X=@ 0 2 1A @2A = @ 1A .
2 1 0 3 0
Finally, 0 1
1 1 1
PA ( ) = det @ 2 2 3A
4 5 6
h i
= (1 )[(2 )(6 ) 15] [2.(6 ) 12] + [10 4.(2 )]
3 2
= 9 + 1.
✓ ◆
a b
3. Call M = . Then detM = ad bc, yet
c d
✓ 2 ◆
1 1 1 a + bc ⇤ 1
tr M 2 + (tr M )2 = tr (a + d)2
2 2 2 ⇤ bc + d2 2
1 2 1
=(a + 2bc + d2 ) + (a2 + 2ad + d2 ) = ad bc ,
2 2
which is what we were asked to show.
4.
0 1
1 2 3
perm @4 5 6A = 1 · (5 · 9 + 6 · 8) + 2 · (4 · 9 + 6 · 7) + 3 · (4 · 8 + 5 · 7) = 450 .
7 8 9
(b) Multiplying the ith row by replaces M i (j) in the formula for the
permanent by M i (j) . Therefore the permanent is multiplied by an
overall factor .
(c) The permanent of a matrix transposed equals the permanent of the
original matrix, because in the formula for the permanent this amounts
to summing over permutations of rows rather than columns. But we
(1) (2) (n)
could then sort the product M1 M2 . . . Mn back into its original
order using the inverse permutation 1 . But summing over permuta-
335
336 Sample Second Midterm
(d) Swapping two rows also leaves the permanent unchanged. The argu-
ment is almost the same as in the previous part, except that we need
only reshu✏e two matrix elements M j (i) and M i (j) (in the case where
rows i and j were swapped). Then we use the fact that summing over
all permutations or over all permutations e obtained by swapping a
pair in are equivalent operations.
6. We know M v = v. Hence
M 2v = M M v = M v = M v = 2
v,
and similarly
M kv = M k 1
v = ... = k
v.
So v is an eigenvector of M k with eigenvalue k.
N k v = 0.
336
337
337
338 Sample Second Midterm
1 1
PA ( ) = det( I A) = det( P IP P DP ) = detP.det( I D).detP
0 1
1 0 ··· 0
B 0 0 C
B 2 C
= det( I D) = det B .. .. .. C
@ . . . A
0 0 ··· n
=( 1 )( 2) . . . ( n) .
PA (D) = (D 1 )(D 2 ) . . . (D n)
0 10 1 0 1
0 0 ··· 0 1 0 ··· 0 1 0 ··· 0
B0 0C B0 0 0C B0 0C
B 2 CB C B 2 C
= B. .. . CB . .. .. C . . . B .. . . .. C = 0.
@ .. . .. A @ .. . . A @. . .A
0 0 ··· n 0 0 ··· n 0 0 ··· 0
We conclude the PM (M ) = 0.
338
339
10. (i) We say that the vectors {v1 , v2 , . . . vn } are linearly independent if there
exist no constants c1 , c2 , . . . cn (not all vanishing) such that c1 v1 + c2 v2 +
· · · + cn vn = 0. Alternatively, we can require that there is no non-trivial
solution for scalars c1 , c2 , . . . , cn to the linear system c1 v1 + c2 v2 + · · · +
cn vn = 0. (ii) We say that these vectors span a vector space V if the set
span{v1 , v2 , . . . vn } = {c1 v1 + c2 v2 + · · · + cn vn : c1 , c2 , . . . cn 2 R} = V . (iii)
We call {v1 , v2 , . . . vn } a basis for V if {v1 , v2 , . . . vn } are linearly independent
and span{v1 , v2 , . . . vn } = V .
x
that any vector @ y A can be written as a linear combination of u, v and w
z
0 1 0 1 0 1 0 1
1 4 10 x
c1 @ 4A + c2 @5A + c3 @ 7 A = @ y A .
3 0 h+3 z
Both requirements mean that the matrix on the left hand side must be
invertible, so we examine its determinant
0 1
1 4 10
det @ 4 5 7 A = 4 · ( 4 · (h + 3) 7 · 3) + 5 · ( 1 · (h + 3) 10 · 3)
3 0 h+3
= 11(h 3) ·
Hence we obtain a basis whenever h 6= 3.
339
340 Sample Second Midterm
340
Sample Final Exam
F
Here are some worked problems typical for what you might expect on a final
examination.
341
342 Sample Final Exam
2. Kircho↵ ’s laws: Electrical circuits are easy to analyze using systems of equa-
tions. The change in voltage (measured in Volts) around any loop due to
batteries | and resistors /\/\/\/\ (given by the product of the current mea-
sured in Amps and resistance measured in Ohms) equals zero. Also, the sum
of currents entering any junction vanishes. Consider the circuit
1 Ohm 2 Ohms
I Amps 13 Amps J Amps
60 Volts 80 Volts V Volts
3 Ohms 3 Ohms
Find all possible equations for the unknowns I, J and V and then solve for
I, J and V . Give your answers with correct units.
L:U !V
dim U = n , dim V = m ,
and
m 6= n .
Also assume
kerL = {0U } .
342
343
(h) Is M T M diagonalizable?
(i) Does M T M have a zero eigenvalue?
(j) Suppose U = V and ker L 6= {0U }. Find an eigenvalue of M .
(k) Suppose U = V and ker L 6= {0U }. Find det M .
x + y + z + w = 1
x + 2y + 2z + 2w = 1
x + 2y + 3z + 3w = 1
Express this system as a matrix equation M X = V and then find the solution
set by computing an LU decomposition for the matrix M (be sure to use
back and forward substitution).
Make sure to jot down a few brief notes explaining any clever tricks you use.
343
344 Sample Final Exam
8 0 1 0 1 0 1 0 1 0 19
< 1 3 1 0 0 =
8. (a) Do the vectors @ 2 , 2 , 0 , 1 , 0A form a basis for R3 ?
A @ A @ A @ A @
: ;
3 1 0 0 1
Be sure to justify your answer.
0 1 0 1
1 4
B 2C B C
(b) Find a basis for R4 that includes the vectors B C B3C
@3A and @2A.
4 1
(c) Explain in words how to generalize your computation in part (b) to
obtain a basis for Rn that includes a given pair of (linearly independent)
vectors u and v.
344
345
after one orbit the satellite will instead return to some other point Y 2 R3 .
The engineer’s computations show that Y is related to X by a matrix
0 1
0 12 1
B1 1 1C
Y =B @2 2 2A X .
C
1 12 0
Let us assume that the rule found by the engineers applies to all subsequent
orbits. Discuss case by case, what will happen to the satellite if the initial
mistake in its location is in a direction given by an eigenvector.
10. In this problem the scalars in the vector spaces are bits (0, 1 with 1 + 1 = 0).
The space B k is the vector space of bit-valued, k-component column vectors.
L : B3 ! B .
How many di↵erent linear transformations did you find? Compare your
answer to part (c).
(g) Suppose L1 : B 3 ! B and L2 : B 3 ! B are linear transformations,
and ↵ and are bits. Define a new map (↵L1 + L2 ) : B 3 ! B by
345
346 Sample Final Exam
Moreover, having read Newton’s Principiæ, they know that force is propor-
tional to acceleration so that2
d2 X
F = .
dt2
Since the engineers are worried the bridge might start swaying in the heavy
channel winds, they search for an oscillatory solution to this equation of the
form3 0 1
a
X = cos(!t) @ bA .
c
(a) By plugging their proposed solution in the above equations the engi-
neers find an eigenvalue problem
0 1 0 1
a a
M @ bA = ! 2 @ bA .
c c
346
347
12. Conic Sections: The equation for the most general conic section is given by
XT M X + XT C + CT X + f = 0
✓ ◆
x
relating an unknown column vector X = , its transpose X T , a
y
2 ⇥ 2 matrix M , a constant column vector C and the constant f .
(b) Does your matrix M obey any special properties? Find its eigenvalues.
You may call your answers and µ for the rest of the problem to save
writing.
For the rest of this problem we will focus on central conics for
which the matrix M is invertible.
(c) Your equation in part (a) above should be be quadratic in X. Recall
that if m 6= 0, the quadratic equation mx2 + 2cx + f = 0 can be
rewritten by completing the square
⇣ c ⌘2 c 2
m x+ = f.
m m
Being very careful that you are now dealing with matrices, use the
same trick to rewrite your answer to part (a) in the form
Y T M Y = g.
Make sure you give formulas for the new unknown column vector Y
and constant g in terms of X, M , C and f . You need not multiply out
any of the matrix expressions you find.
If all has gone well, you have found a way to shift coordinates
for the original conic equation to a new coordinate system
with its origin at the center of symmetry. Our next aim is
to rotate the coordinate axes to produce a readily recognizable
equation.
347
348 Sample Final Exam
(d) Why is the angle between vectors V and W is not changed when you
replace them by P V and P W for P any orthogonal matrix?
(e) Explain how to choose an orthogonal matrix P such that M P = P D
where D is a diagonal matrix.
(f) For the choice of P above, define our final unknown vector Z by Y =
P Z. Find an expression for Y T M Y in terms of Z and the eigenvalues
of M .
✓ ◆
z
(g) Call Z = . What equation do z and w obey? (Hint, write your
w
answer using , µ and g.)
(h) Central conics are circles, ellipses, hyperbolae or a pair of straight lines.
Give examples of values of ( , µ, g) which produce each of these cases.
14. Captain Conundrum gives Queen Quandary a pair of newborn doves, male
and female for her birthday. After one year, this pair of doves breed and
produce a pair of dove eggs. One year later these eggs hatch yielding a new
pair of doves while the original pair of doves breed again and an additional
pair of eggs are laid. Captain Conundrum is very happy because now he will
never need to buy the Queen a present ever again!
Let us say that in year zero, the Queen has no doves. In year one she has
one pair of doves, in year two she has two pairs of doves etc... Call Fn the
number of pairs of doves in years n. For example, F0 = 0, F1 = 1 and
F2 = 1. Assume no doves die and that the same breeding pattern continues
348
349
well into the future. Then F3 = 2 because the eggs laid by the first pair of
doves in year two hatch. Notice also that in year three, two pairs of eggs are
laid (by the first and second pair of doves). Thus F4 = 3.
F n = Fn 1 + Fn 2.
✓ ◆
Fn
(c) Let us introduce a column vector Xn = . Compute X1 and X2 .
Fn 1
Verify that these vectors obey the relationship
✓ ◆
1 1
X2 = M X1 where M = .
1 0
349
350 Sample Final Exam
Suppose 0 1
1 2 1 3
B2 1 1 2C
M =B
@1
C
0 0 1A
4 1 1 0
is the matrix of L in the bases {v1 , v2 , v3 , v4 } for V and {w1 , w2 , w3 , w4 }
for W . Find bases for kerL and imL. Use the dimension formula to check
your result.
17. Captain Conundrum collects the following data set
y x
5 2
2 1
0 1
3 2
which he believes to be well-approximated by a parabola
y = ax2 + bx + c .
(a) Write down a system of four linear equations for the unknown coeffi-
cients a, b and c.
(b) Write the augmented matrix for this system of equations.
(c) Find the reduced row echelon form for this augmented matrix.
(d) Are there any solutions to this system?
(e) Find the least squares solution to the system.
(f) What value does Captain Conundrum predict for y when x = 2?
18. Suppose you have collected the following data for an experiment
x y
x1 y1
x2 y2
x3 y3
and believe that the result is well modeled by a straight line
y = mx + b .
350
351
(a) Write down a linear system of equations you could use to find the slope
m and constant term b.
(b) Arrange the unknowns (m, b) in a column vector X and write your
answer to (a) as a matrix equation
MX = V .
Be sure to give explicit expressions for the matrix M and column vector
V.
(c) For a generic data set, would you expect your system of equations to
have a solution? Briefly explain your answer.
(d) Calculate M T M and (M T M ) 1 (for the latter computation, state the
condition required for the inverse to exist).
(e) Compute the least squares solution for m and b.
(f) The least squares method determines a vector X that minimizes the
length of the vector V M X. Draw a rough sketch of the three data
points in the (x, y)-plane as well as their least squares fit. Indicate how
the components of V M X could be obtained from your picture.
Solutions
1. You can find the definitions for all these terms by consulting the index of
this book.
I + J + 13 = 0 .
There are three voltage loops (one on the left, one on the right and one going
around the outside of the circuit). Respectively, they give the equations
60 I 80 3I = 0
80 + 2J V + 3J = 0
60 I + 2J V + 3J 3I = 0 . (F.1)
The above equations are easily solved (either using an augmented matrix
and row reducing, or by substitution). The result is I = 5 Amps, J = 8
Amps, V = 40 Volts.
3. (a) m.
351
352 Sample Final Exam
(b) n.
(c) Yes.
(d) n ⇥ n.
(e) m ⇥ m.
(f) Yes. This relies on kerM = 0 because if M T M had a non-trivial kernel,
then there would be a non-zero solution X to M T M X = 0. But then
by multiplying on the left by X T we see that ||M X|| = 0. This in turn
implies M X = 0 which contradicts the triviality of the kernel of M .
T
(g) Yes because M T M = M T (M T )T = M T M .
(h) Yes, all symmetric matrices have a basis of eigenvectors.
(i) No, because otherwise it would not be invertible.
(j) Since the kernel of L is non-trivial, M must have 0 as an eigenvalue.
(k) Since M has a zero eigenvalue in this case, its determinant must vanish.
I.e., det M = 0.
Then 0 1 0 10 1
1 1 1 1 1 0 0 1 1 1 1
B C B CB C
M = @1 2 2 2A = @1 1 0A @0 1 1 1A
1 2 3 3 1 0 1 0 1 2 2
0 10 1
1 0 0 1 1 1 1
B CB C
= @1 1 0A @0 1 1 1A = LU
1 1 1 0 0 1 1
0 1
a
So now M X = V becomes LW = V where W = U X = bA (say). Thus
@
c
we solve LW = V by forward substitution
a = 1, a + b = 1, a + b + c = 1 ) a = 1, b = 0, c = 0 .
352
353
x + y + z + w = 1, y + z + w = 0, z + w = 0
) w = µ (arbitrary), z = µ, y = 0, x = 1 .
80 1 0 1 9
>
> x 1 >
>
<B C B C =
B y C B 0C
The solution set is @ A = @ A : µ 2 R
>
> z µ >
>
: ;
y µ
5. First ✓ ◆
1 2
det = 2.
3 4
All the other determinants vanish because the first three rows of each matrix
are not independent. Indeed, 2R2 R1 = R3 in each case, so we can make
row operations to get a row of zeros and thus a zero determinant.
0 1
x
6. If U spans R3 , then we must be able to express any vector X = @ y A 2 R3
z
as 0 1 0 1 0 1 0 1 0 11
1 1 a 1 1 a c
X = c1 @0A + c2 @ 2A + c3 @1A = @0 2 1A @c2 A ,
1 3 0 1 3 0 c3
for some coefficients c1 , c2 and c3 . This is a linear system. We could solve
for c1 , c2 and c3 using an augmented matrix and row operations. However,
since we know that dim R3 = 3, if U spans R3 , it will also be a basis. Then
the solution for c1 , c2 and c3 would be unique. Hence, the 3⇥3 matrix above
must be invertible, so we examine its determinant
0 1
1 1 a
det @0 2 1A = 1.(2.0 1.( 3)) + 1.(1.1 a.2) = 4 2a .
1 3 0
(You can obtain this result, or an equivalent one by studying the above linear
system with X = 0, i.e., the associated homogeneous system.) The two
353
354 Sample Final Exam
0 1 0 1
1 2
vectors @ 2 and 1A are clearly linearly independent, so this is the
A @
3 0
least number of vectors spanning U for this value of a. Also we see that
0 1be a plane in R though the
dimU = 2 in this case. Your0picture 3
1 should
1 2
origin containing the vectors @ 2A and @1A.
3 0
7. ✓ ◆
1 x
det =y x,
1 y
0 1 0 1
1 x x2 1 x x2
det @1 y y 2 A = det @0 y x y2 x2 A
1 z z2 0 z x z2 x2
0 1 0 1
1 x x2 x 3 1 x x2 x3
B1 y y 2 y C
3 B 2 x2 y 3 x3 C
det B C = det B0 y x y C
@1 z z 2 z 3 A @0 z x z 2 x 2 z 3 x3 A
1 w w w2 3 0 w x w 2 x2 w 3 x3
0 1
1 0 0 0
B0 y x y(y x) y 2 (y x)C
= det B
@0 z x z(z x) z 2 (z x)A
C
0 w x w(w x) w2 (w x)
0 1
1 0 0 0
B0 1 y y 2 C
= (y x)(z x)(w x) det B @0 1 z z 2 A
C
0 1 w w2
0 1
1 y y2
= (y x)(z x)(w x) det @1 z z 2 A
1 w w2
= (y x)(z x)(w x)(z y)(w y)(w z) .
From the 4 ⇥ 4 case above, you can see all the tricks required for a general
Vandermonde matrix. First zero out the first column by subtracting the
first row from all other rows (which leaves the determinant unchanged).
354
355
Now zero out the top row by subtracting x1 times the first column from the
second column, x1 times the second column from the third column et cetra.
Again these column operations do not change the determinant. Now factor
out x2 x1 from the second row, x3 x1 from the third row, etc. This
does change the determinant so we write these factors outside the remaining
determinant, which is just the same problem but for the (n 1) ⇥ (n 1)
case. Iterating the same procedure gives the result
0 1
1 x1 (x1 )2 · · · (x1 )n 1
B1 x2 (x2 )2 · · · (x2 )n 1C
B C Y
B 2 · · · (x3 )n 1C
det B1 x3 (x3 ) C= (xi xj ) .
B .. .. .. .. .. C
@. . . . . A i>j
1 xn (xn )2 · · · (xn )n 1
Q
(Here stands for a multiple product, just like ⌃ stands for a multiple
sum.)
So we study
0 1 0 1
1 4 1 0 0 0 1 4 1 0 0 0
B2 3 0 1 0 C
0 C B0B 5 2 1 0 0C
B ⇠ C
@3 2 0 0 1 0 A @0 10 3 0 1 0A
4 1 0 0 0 1 0 15 4 0 0 1
0 3 1 0 3 1
1 0 5 4 0 0 1 0 0 2 5 0
B0 1 2 1 C B 19 2
0 0 0 1 0 0C
⇠B
@0 0
5 5 C⇠B
A @
5 5 C
1 10 1 0 0 0 1 10 1 0A
5 1
0 0 2 15 0 1 0 0 0 2 10 2
From here we can keep row reducing to achieve RREF, but we can
already see that the non-pivot variables will be " and ⌘. Hence we can
355
356 Sample Final Exam
9. (a)
0 1
1
2 1
B C ⇣ 1⌘ 1 1⇣ 1⌘ ⇣ 1 ⌘
det B
@
1
2
1
2
1C
2A = ( )+ +
2 4 2 2 2 4
1
1 2
1 2 3
3 3
= = ( + 1)( ).
2 2 2
Hence the eigenvalues are 0, 1, 32 .
(b) When = 0 we must solve the homogenous system
0 1 0 1 0 1
0 12 1 0 1 12 0 0 1 0 1 0
B 1 1 1 C B C B C
@ 2 2 2 0 A ⇠ @ 0 14 12 0 A ⇠ @ 0 1 2 0 A .
1 12 0 0 0 12 1 0 0 0 0 0
0 1
s
So we find the eigenvector @ 2sA where s 6= 0 is arbitrary.
s
For = 1
0 1 0 1
1 12 1 0 1 0 1 0
B 1 3 1 C B C
@ 2 2 2 0 A⇠@ 0 1 0 0 A.
1 12 1 0 0 0 0 0
0 1
s
So we find the eigenvector @ 0A where s 6= 0 is arbitrary.
s
356
357
3
Finally, for = 2
0 3 1 1 0 1 3
1 0 1
2 2 1 0 1 2 2 0 1 0 1 0
B 1 1 C B 5 5 C B C
@ 2 1 2 0 A⇠@ 0 4 4 0 A⇠@ 0 1 1 0 A.
1 3
1 2 2 0 0 54 5
4 0 0 0 0 0
0 1
s
So we find the eigenvector @sA where s 6= 0 is arbitrary.
s
0 1
1
If the mistake X is in the direction of the eigenvector @ 2A, then Y = 0.
1
I.e., the satellite returns to the origin O. For all subsequent orbits it will
again return to the origin. NASA would be very pleased in this case.
0 1
1
If the mistake X is in the direction @ 0A, then Y = X. Hence the
1
satellite will move to the point opposite to X. After next orbit will move
back to X. It will continue this wobbling motion indefinitely. Since this is a
stable situation, again, the elite engineers will pat themselves on the back.
0 1
1
Finally, if the mistake X is in the direction @1A , the satellite will move to a
1
point Y = 32 X which is further away from the origin. The same will happen
for all subsequent orbits, with the satellite moving a factor 3/2 further away
from O each orbit (in reality, after several orbits, the approximations used
by the engineers in their calculations probably fail and a new computation
will be needed). In this case, the satellite will be lost in outer space and the
engineers will likely lose their jobs!
8 0 1 0 1 0 19
< 1 0 0 =
10. (a) A basis for B is3 @ 0 , 1 , 0A
A @ A @
: ;
0 0 1
(b) 3.
(c) 23 = 8.
(d) dimB 3 = 3.
(e) Because the vectors {v1 , v2 , v3 } are a basis any element v 2 B 3 0can 1be
b1
written uniquely as v = b1 v1 +b2 v2 +b3 v3 for some triplet of bits @b2 A.
b3
357
358 Sample Final Exam
0 0 0 ,
1 0 0 ,
0 1 0 ,
0 0 1 ,
1 1 0 ,
1 0 1 ,
0 1 1 ,
1 1 1 .
358
359
359
360 Sample Final Exam
0 1
1
so @ 0A is an eigenvector.
1
For = 3
0 1 0 1 0 1
2 1 0 1 1 1 1 0 1
M ( 3).I = @ 1 1 1A ⇠ @0 1 2A ⇠ @0 1 2A ,
0 1 2 0 1 2 0 0 0
0 1
1
so @2A is an eigenvector.
1
p
(c) The characteristic frequencies are 0, 1, 3.
(d) The orthogonal change of basis matrix
0 1
p1 p1 p1
3 2 6
B p1 0 p2 C
P =@ 3 6A
p1 p1 p1
3 2 6
It obeys M P = P D where
0 1
0 0 0
D = @0 1 0A .
0 0 3
1 0
1
(e) Yes, the direction given by the eigenvector @ 1A because its eigen-
1
value is zero. This is probably a bad design for a bridge because it can
be displaced in this direction with no force!
✓ ◆
a b
12. (a) If we call M = , then X T M X = ax2 + 2bxy + dy 2 . Similarly
b d
✓ ◆
c
putting C = yields X T C + C T X = 2X C = 2cx + 2ey. Thus
e
360
361
X T M X + C T X + X T C = (X T + C T M 1
)M (X + M 1
C) C T M 1
C,
so that
(X T + C T M 1
)M (X + M 1
C) = C T M C f.
Hence Y = X + M 1C and g = C T M C f.
(d) The cosine of the angle between vectors V and W is given by
V W V TW
p =p .
V VW W V TV WTW
So replacing V ! P V and W ! P W will always give a factor P T P
inside all the products, but P T P = I for orthogonal matrices. Hence
none of the dot products in the above formula changes, so neither does
the angle between V and W .
(e) If we take the eigenvectors of M , normalize them (i.e. divide them
by their lengths), and put them in a matrix P (as columns) then P
will be an orthogonal matrix. (If it happens that = µ, then we
also need to make sure the eigenvectors spanning the two dimensional
eigenspace corresponding to are orthogonal.) Then, since M times
the eigenvectors yields just the eigenvectors back again multiplied by
their eigenvalues, it follows that M P = P D where D is the diagonal
matrix made from eigenvalues.
(f) If Y = P Z,✓ then◆Y T M Y = Z T P T M P Z = Z T P T P DZ = Z T DZ
0
where D = .
0 µ
(g) Using part (f) and (c) we have
z 2 + µw2 = g .
361
362 Sample Final Exam
Thereby
dim V = rank L = dim W.
ii. Since dim V = dim W , the matrix M is square so we can talk
about its eigenvalues. Since L is injective, its kernel is the zero
vector alone. That is, the only solution to LX = 0 is X = 0V .
But LX is the same as M X, so the only solution to M X = 0 is
X = 0V . So M does not have zero as an eigenvalue.
iii. Since M X = 0 has no non-zero solutions, the matrix M is invert-
ible.
(b) Now we suppose that M is an invertible matrix.
i. Since M is invertible, the system M X = 0 has no non-zero solu-
tions. But LX is the same as M X, so the only solution to LX = 0
is X = 0V . So L does not have zero as an eigenvalue.
ii. Since LX = 0 has no non-zero solutions, the kernel of L is the
zero vector alone. So L is injective.
iii. Since M is invertible, we must have that dim V = dim W . By the
Dimension Formula, we have
dim V = L + rank L
362
363
14. (a) F4 = F2 + F3 = 2 + 3 = 5.
(b) The number of pairs of doves in any given year equals the number of
the previous years plus those that hatch and there are as many of them
as pairs of doves in the year before the previous year.
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
F1 1 F2 1
(c) X1 = = and X2 = = .
F0 0 F1 1
✓ ◆✓ ◆ ✓ ◆
1 1 1 1
M X1 = = = X2 .
1 0 0 1
(d) We just need to use the recursion relationship of part (b) in the top
slot of Xn+1 :
✓ ◆ ✓ ◆ ✓ ◆✓ ◆
Fn+1 Fn + Fn 1 1 1 Fn
Xn+1 = = = = M Xn .
Fn Fn 1 0 Fn 1
(f) M n = (P DP 1 )n = P DP 1 P DP 1 . . . P DP 1 = P Dn P 1.
363
364 Sample Final Exam
(g) Just use the matrix recursion relation of part (d) repeatedly:
Xn+1 = M Xn = M 2 Xn 1 = · · · = M n X1 .
p p
1+ 5 1 5
(h) The eigenvalues are ' = 2 and 1 '= 2 .
(i) ✓ ◆
Fn+1
Xn+1 = = M n Xn = P D n P 1
X1
Fn
✓ ◆n !✓ ◆ ✓ n ◆ !
' 0 p1 ? 1 ' 0 p1
=P 5 =P 5
1 p1
0 1 ' p
5
? 0 0 (1 ')n 5
p p ! 'n ! !
1+ 5 1 5 p ?
2 2 5
= (1 ')n = 'n (1 ')n .
1 1 p p
5 5
Hence
(1 ')n'n
p
Fn = .
5
These are the famous Fibonacci numbers.
and 1 0
1
3 B 0C
u w v? w ? 3 B C
w? = w u v =w u 4 ?
3 v = @ 0A
u u v? v? 4 4
1
Dividing by lengths, an orthonormal basis for span{u, v, w} is
8 0 1 0 p 1 0 p 19
1
>
> 2
3 2 >>
>
> B 1 C B p6 C B 2 C> >
>
<B C B 3C B >
=
B 2 C B p2 C B 0 C C
B1C B, C , .
> B C B 3 C B 0 C>
>
> @ A >
>
>@ 2 A @ p6 A
> p >
: 1 3 2 >;
2 6 2
16. (a) The columns of M span imL in the basis given for W .
364
365
17. (a) 8
>
> 5 = 4a 2b + c
<
2=a b+c
>
> 0=a+b+c
:
3 = 4a + 2b + c .
(b,c,d)
0 1 0 1 0 1
4 2 1 5 1 1 1 0 1 0 1 1
B 1 1 1 C B
2 C B 0 6 3 C B
5 C B 0 1 0 1 C
B ⇠ ⇠ C
@ 1 1 1 0 A @ 0 2 0 2 A @ 0 0 3 11 A
4 2 1 3 0 2 3 3 0 0 3 3
11
The system has no solutions because c = 1 and c = 3 is impossible.
(e) Let 0 1 0 1
4 2 1 5
B1 1 1C B2C
M =B
@1
C and V = B C .
1 1A @0A
4 2 1 3
365
366 Sample Final Exam
Then 0 1 0 1
34 0 10 34
M T M = @ 0 10 0A and M T V = @ 6A .
10 0 4 10
So
0 1 0 2
1 0 1
34 0 10 34 1 0 5 1 1 0 0 1
@ 0 10 0 6A ⇠ @0 10 0 6A ⇠ @0 1 0 3A
5
18
10 0 4 10 0 0 5 0 0 0 1 0
3
The least squares solution is a = 1, b = 5 and c = 0.
3 14
(b) The Captain predicts y(2) = 1.22 5 .2 +0= 5 .
18.
366
G
Movie Scripts
367
368 Movie Scripts
x+y = 27
2x y = 0?
Well the augmented matrix is just a new notation for the matrix equation
✓ ◆✓ ◆ ✓ ◆
1 1 x 27
=
2 1 y 0
x+y = 27
2x y = 0
and notice that the solution is x = 9 and y = 18. The other augmented matrix
represents the system
x +0·y = 9
0·x + y = 18
This clearly has the same solution. The first and second system are related
in the sense that their solutions are the same. Notice that it is really
nice to have the augmented matrix in the second form, because the matrix
multiplication can be done in your head.
368
G.2 Systems of Linear Equations 369
• Symmetric: If the first person, Bob (say) has the same hair color as a
second person Betty(say), then Bob has the same hair color as Betty, so
this holds too.
369
370 Movie Scripts
• Transitive: If Bob has the same hair color as Betty (say) and Betty has
the same color as Brenda (say), then it follows that Bob and Brenda have
the same hair color, so the transitive property holds too and we are
done.
370
G.2 Systems of Linear Equations 371
1 · x1 + 3x3 = 2
1 · x2 = 1
Notice that
✓ ◆ the system is written this way the copy of the 2 ⇥ 2 identity
when
1 0
matrix makes it easy to write a solution in terms of the variables
0 1
✓ ◆
3
x1 and x2 . We will call x1 and x2 the pivot variables. The third column
0
does not look like part of an identity matrix, and there is no 3 ⇥ 3 identity
in the augmented matrix. Notice there are more variables than equations and
that this means we will have to write the solutions for the system in terms of
the variable x3 . We’ll call x3 the free variable.
Let x3 = µ. (We could also just add a ‘‘dummy’’ equation x3 = x3 .) Then we
can rewrite the first equation in our system
x1 + 3x3 = 2
x1 + 3µ = 2
x1 = 2 3µ.
Then since the second equation doesn’t depend on µ we can keep the equation
x2 = 1,
x3 = µ
371
372 Movie Scripts
Any value of µ will give a solution of the system, and any system can be written
in this form for some value of µ. Since there are multiple solutions, we can
also express them as a set:
80 1 0 1 0 1 9
< x1 2 3 =
@x2 A = @1A + µ @ 0A µ 2 R .
: ;
x3 0 1
372
G.2 Systems of Linear Equations 373
We scale the second and third rows appropriately in order to avoid fractions,
then subtract the corresponding rows as before. Finally scale the first row
and hence we have x = 1 and y = 2 as a unique solution.
373
374 Movie Scripts
We will call two people equivalent if they have the same hair color. There are
three properties to check:
• Reflexive: This just requires that you have the same hair color as
yourself so obviously holds.
• Symmetric: If the first person, Bob (say) has the same hair color as a
second person Betty(say), then Bob has the same hair color as Betty, so
this holds too.
• Transitive: If Bob has the same hair color as Betty (say) and Betty has
the same color as Brenda (say), then it follows that Bob and Brenda have
the same hair color, so the transitive property holds too and we are
done.
374
G.2 Systems of Linear Equations 375
Planes
Here we want to describe the mathematics of planes in space. The video is
summarised by the following picture:
375
376 Movie Scripts
N V = d.
Remember that when vectors are perpendicular their dot products vanish. I.e.
U V = 0 , U ? V . This means that if a vector V0 solves our equation N V = d,
then so too does V0 + C whenever C is perpendicular to N . This is because
N (V0 + C) = N V0 + N C = d + 0 = d .
• For two equations, we must look at two planes. These usually intersect
along a line, so the solution set will also (usually) be a line:
376
G.3 Vectors in Space n-Vectors 377
ax + by + cz = d
x + 2y + 5z = 3 .
377
378 Movie Scripts
In fact this is a system of linear equations whose solutions form a plane with
normal vector (1, 2, 5). As an augmented matrix the system is simply
⇣ ⌘
1 2 5 3 .
or in vector notation
0 1 0 1 0 1 01
x 3 2 5
@ y A = @0A + 1
@ 1A + 2
@ 0A .
z 0 0 1
This describes a plane parametric equation. Planes are ‘‘two-dimensional’’
because they are described by two free variables. Here’s a picture of the
resulting plane:
378
G.4 Vector Spaces 379
idea is to plot the story of your life on a plane with coordinates (x, t). The
coordinate x encodes where an event happened (for real life situations, we
must replace x ! (x, y, z) 2 R3 ). The coordinate t says when events happened.
Therefore you can plot your life history as a worldline as shown:
Each point on the worldline corresponds to a place and time of an event in your
life. The slope of the worldline has to do with your speed. Or to be precise,
the inverse slope is your velocity. Einstein realized that the maximum speed
possible was that of light, often called c. In the diagram above c = 1 and
corresponds to the lines x = ±t ) x2 t2 = 0. This should get you started in
your search for vectors with zero length.
(+ii) Additive commutativity: We want to check that when we add any two vectors
we can do so in either order, i.e.
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
x1 y ? y1 x1
+ 1 = + .
x2 y2 y2 x2
379
380 Movie Scripts
This again relies on the underlying real numbers which for any x, y 2 R
obey
x + y = y + x.
This fact underlies the middle step of the following computation
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
x1 y1 x 1 + y1 y1 + x 1 y1 x1
+ = = = + ,
x2 y2 x 2 + y2 y2 + x 2 y2 x2
which demonstrates what we wished to show.
(+iii) Additive Associativity: This shows that we needn’t specify with paren-
theses which order we intend to add triples of vectors because their
sums will agree for either choice. What we have to check is
✓✓ ◆ ✓ ◆◆ ✓ ◆ ✓ ◆ ✓✓ ◆ ✓ ◆◆
x1 y z ? x1 y1 z
+ 1 + 1 = + + 1 .
x2 y2 z2 x2 y2 z2
Again this relies on the underlying associativity of real numbers:
(x + y) + z = x + (y + z) .
(iv) Zero: There needs to exist a vector ~0 that works the way we would expect
zero to behave, i.e. ✓ ◆ ✓ ◆
x1 x1
+ ~0 = .
y1 y1
It is easy to find, the answer is
✓ ◆
~0 = 0
.
0
You can easily check that when this vector is added to any vector, the
result is unchanged.
✓ ◆
x1
(+v) Additive Inverse: We need to check that when we have , there is
x2
another vector that can be added to it so the sum is ~0. (Note that it
~
is important to first
✓ figure
◆ ✓out ◆what 0 is here!) The answer for the
x1 x1
additive inverse of is because
x2 x2
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
x1 x1 x1 x1 0
+ = = = ~0 .
x2 x2 x2 x2 0
380
G.4 Vector Spaces 381
We are half-way done, now we need to consider the rules for scalar multipli-
cation. Notice, that we multiply vectors by scalars (i.e. numbers) but do NOT
multiply a vectors by vectors.
Since products of real numbers ax1 and ax2 are again real numbers we see
this is indeed inside R2 .
Once again this is a simple LHS=RHS proof using properties of the real
numbers. Starting on the left we have
✓ ◆ ✓ ◆ ✓ ◆
x (a + b)x1 ax1 + bx1
(a + b) 1 = =
x2 (a + b)x2 ax2 + bx2
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
ax1 bx1 x x
= + =a 1 +b 1 ,
ax2 bx2 x2 x2
as required.
(·iii) Additive distributivity: This time we need to check the equation The
equation we need to check is
✓✓ ◆ ✓ ◆◆ ✓ ◆ ✓ ◆
x1 y1 ? x y
a + =a 1 +a 1 ,
x2 y2 x2 y2
i.e., one scalar but two different vectors. The method is by now becoming
familiar
✓✓ ◆ ✓ ◆◆ ✓✓ ◆◆ ✓ ◆
x1 y1 x 1 + y1 a(x1 + y1 )
a + =a =
x2 y2 x 2 + y2 a(x2 + y2 )
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
ax1 + ay1 ax1 ay1 x y
= = + =a 1 +a 1 ,
ax2 + ay2 ax2 ay2 x2 y2
again as required.
381
382 Movie Scripts
Now we are done---we have really proven the R2 is a vector space so lets write
a little square ⇤ to celebrate.
382
G.5 Linear Transformations 383
You can also model the new vector 2J obtained by scalar multiplication by
2 by thinking about Jenny hitting the puck twice (or a world with two Jenny
Potters....). Now ask yourself questions like whether the multiplicative
distributive law
2J + 2N = 2(J + N )
make sense in this context.
a 0 + a 1 t + a 2 t2 + . . . + a n tn
a 0 + a 1 t + a 2 t2
and the output will have degree three and look like
b 0 + b 1 t + b 2 t 2 + b 3 t3
Just this should be really helpful for the first two parts of the problem. The
third part of the problem is asking us to think about this as a linear algebra
problem, so lets think about how we could write this in the vector notation we
use in the class. We could write
383
384 Movie Scripts
0 1
a0
a0 + a1 t + a2 t2 as @a1 A
a2
And think for a second about how you add polynomials, you match up terms of
the same degree and add the constants component-wise. So it makes some sense
to think about polynomials this way, since vector addition is also component-
wise.
We could also write the output
0 1
b0
b0 + b1 t + b2 t2 + b3 t3 as @b1 A b3
b2
Then lets look at the information given in the problem and think about it
in terms of column vectors
• L(1) = 4 but we can think of the input 1 = 10+10t + 0t2 and the output
0 1 4
1 B0C
4 = 4 + 0t + 0t2 0t3 and write this as L(@0A) = B
@0A
C
0
0
0 1
0 1 0
0 B0C
• L(t) = t3 This can be written as L(@1A) = B
@0A
C
0
1
384
G.6 Matrices 385
G.6 Matrices
Adjacency Matrix Example
Lets think about a graph as a mini-facebook. In this tiny facebook there are
only four people, Alice, Bob, Carl, and David.
Suppose we have the following relationships
• Alice and Bob are friends.
• Alice and Carl are friends.
• Carl and Bob are friends.
• David and Bob are friends.
Now draw a picture where each person is a dot, and then draw a line between
the dots of people who are friends. This is an example of a graph if you think
of the people as nodes, and the friendships as edges.
Now lets make a 4 ⇥ 4 matrix, which is an adjacency matrix for the graph.
Make a column and a row for each of the four people. It will look a lot like a
table. When two people are friends put a 1 the the row of one and the column
of the other. For example Alice and Carl are friends so we can label the table
below.
A B C D
A 1
B
C 1
D
We can continue to label the entries for each friendship. Here lets assume
that people are friends with themselves, so the diagonal will be all ones.
385
386 Movie Scripts
A B C D
A 1 1 1 0
B 1 1 1 1
C 1 1 1 0
D 0 1 0 1
Notice that this table is symmetric across the diagonal, the same way a
multiplication table would be symmetric. This is because on facebook friend-
ship is symmetric in the sense that you can’t be friends with someone if they
aren’t friends with you too. This is an example of a symmetric matrix.
You could think about what you would have to do differently to draw a graph
for something like twitter where you don’t have to follow everyone who follows
you. The adjacency matrix might not be symmetric then.
Do Matrices Commute?
This video shows you a funny property of matrices. Some matrix properties
look just like those for numbers. For example numbers obey
a(bc) = (ab)c
and so do matrices:
A(BC) = (AB)C.
This says the order of bracketing does not matter and is called associativity.
Now we ask ourselves whether the basic property of numbers
ab = ba ,
386
G.6 Matrices 387
387
388 Movie Scripts
Proof Explanation
In this video we will talk through the steps required to prove
tr M N = tr N M .
where the upper index labels rows and the lower one columns. Then
X
MN = mil nlj ,
l
where the ‘‘open’’ indices i and j label rows and columns, but the index l is
a ‘‘dummy’’ index because it is summed over. (We could have given it any name
we liked!).
Finally the trace is the sum over diagonal entries for which the row and
column numbers must coincide
X
tr M = mii .
i
Hence starting from the left of the statement we want to prove, we have
XX
LHS = tr M N = mil nli .
i l
Next we do something obvious, just change the order of the entries mil and nli
(they are just numbers) so
XX XX
mil nli = nli mil .
i l i l
Finally, since we have finite sums it is legal to change the order of summa-
tions XX XX
nil mli = nil mli .
l i i l
This expression is the same as the one on the line above where we started
except the m and n have been swapped so
XX
mil nli = tr N M = RHS .
i l
388
G.6 Matrices 389
fR (M + N ) = (M + N )R = M R + N R = fR (M ) + fR (N ),
v (M · N ) = (v M ) N
where we treat v M as M (v) as before, and note the order in which the
matrices are applied. People will often omit the left or right because they
are essentially the same, and just say that M acts on V .
X1
xn
ex = .
n=0
n!
389
390 Movie Scripts
This means we are going to have an idea of what An looks like for any n. Lets
look at the example of one of the matrices in the problem. Let
✓ ◆
1
A= .
0 1
then we can think about the first few terms of the sequence
X1
An 1 1
eA = = A0 + A + A2 + A3 + . . . .
n=0
n! 2! 3!
Looking at the entries when we add this we get that the upper left-most entry
looks like this:
1
X
1 1 1
1 + 1 + + + ... = = e1 .
2 3! n=0
n!
Continue this process with each of the entries using what you know about Taylor
series expansions to find the sum of each entry.
2 ⇥ 2 Example
Lets go though and show how this 2⇥2 example satisfies all of these properties.
Lets look at ✓ ◆
7 3
M=
11 5
We have a rule to compute the inverse
✓ ◆ 1 ✓ ◆
a b 1 d b
=
c d ad bc c a
390
G.6 Matrices 391
You can compute M M 1 , this should work the other way too.
Now lets think about products of matrices
✓ ◆ ✓ ◆
1 3 1 0
Let A = and B =
1 5 2 1
1 1 1
Notice that M = AB. We have a rule which says that (AB) = B A .
Lets check to see if this works
✓ ◆ ✓ ◆
1 1 5 3 1 1 0
A = and B =
2 1 1 2 1
and
✓ ◆✓ ◆ ✓ ◆
1 1 1 0 5 3 1 2 0
B A = =
2 1 1 1 2 0 2
391
392 Movie Scripts
Z32 or Z22
L : Z32 ! Z22 .
Since we have bits, we can work out what L does to every vector, this is listed
below
L
(0, 0, 0) 7! (0, 0)
L
(0, 0, 1) 7! (1, 0)
L
(1, 1, 0) 7! (1, 0)
L
(1, 0, 0) 7! (0, 1)
L
(0, 1, 1) 7! (0, 1)
L
(0, 1, 0) 7! (1, 1)
L
(1, 0, 1) 7! (1, 1)
L
(1, 1, 1) 7! (1, 1)
Now lets think about left and right inverses. A left inverse B to the matrix
A would obey
BA = I
and since the identity matrix is square, B must be 2 ⇥ 3. It would have to
undo the action of A and return vectors in Z32 to where they started from. But
above, we see that different vectors in Z32 are mapped to the same vector in Z22
by the linear transformation L with matrix A. So B cannot exist. However a
right inverse C obeying
AC = I
can. It would be 2 ⇥ 2. Its job is to take a vector in Z22 back to one in Z32 in a
way that gets undone by the action of A. This can be done, but not uniquely.
392
G.6 Matrices 393
Using an LU Decomposition
Lets go through how to use a LU decomposition to speed up solving a system of
equations. Suppose you want to solve for x in the equation M x = b
0 1 0 1
1 0 5 6
@ 3 1 14 A x = @19A
1 0 3 4
where you are given the decomposition of M into the product of L and U which
are lower and upper and lower triangular matrices respectively.
0 1 0 10 1
1 0 5 1 0 0 1 0 5
M =@ 3 1 14 A = @ 3 1 0 A @ 0 1 1 A = LU
1 0 3 1 0 2 0 0 1
First you should solve L(U x) = b for U x. The augmented matrix you would use
looks like this 0 1
1 0 0 6
@ 3 1 0 19 A
1 0 2 4
This is an easy augmented matrix to solve because it is upper triangular. If
you were to write out the three equations using variables, you would find that
the first equation has already been solved, and is ready to be plugged into
the second equation. This backward substitution makes solving the system much
faster. Try it and in a few steps you should be able to get
0 1
1 0 0 6
@ 0 1 0 1 A
0 0 1 1
0 1
6
This tells us that U x = @ 1A. Now the second part of the problem is to solve
1
for x. The augmented matrix you get is
0 1
1 0 5 6
@ 0 1 1 1 A
0 0 1 1
It should take only a few step to transform it into
0 1
1 0 0 1
@ 0 1 0 2 A,
0 0 1 1
0 1
1
which gives us the answer x = @ 2A.
1
393
394 Movie Scripts
394
G.7 Determinants 395
Now lets specialize to the case where the square matrix X has an inverse.
Then we can multiply out the following triple product of a lower triangular,
a block diagonal and an upper triangular matrix:
✓ ◆✓ ◆✓ ◆
I 0 X 0 I X 1Y
ZX 1 I 0 W ZX 1 Y 0 I
✓ ◆✓ 1
◆
X 0 I X Y
= 1
Z W ZX Y 0 I
✓ ◆
X Y
=
ZX 1 Y + Z W ZX 1
Y
✓ ◆
X Y
= =M.
Z W
This shows that the LDU decomposition given in Section 7.7 is correct.
G.7 Determinants
Permutation Example
Lets try to get the hang of permutations. A permutation is a function which
scrambles things. Suppose we had
395
396 Movie Scripts
This is an even permutation, since the number of swaps we used is two (an even
number).
Elementary Matrices
This video will explain some of the ideas behind elementary matrices. First
think back to linear systems, for example n equations in n unknowns:
8 1 1
>
> a1 x + a12 x2 + · · · + a1n xn = v 1
>
>
>
< a21 x1 + a22 x2 + · · · + a2n xn = v 2
.
.
>
> .
>
>
>
: n 1
a1 x + an2 x2 + · · · + ann xn = v n .
We know it is helpful to store the above information with matrices and vectors
0 1 1 0 11 0 11
a1 a12 · · · a1n x v
B a21 a22 · · · a2n C B x2 C B v2 C
B C B C B C
M := B . . . C, X := B .C , V := B .C .
@ .. .
. .
.A @ . .A @ ..A
an1 n
a2 · · · an n
x n
vn
Here we will focus on the case the M is square because we are interested in
its inverse M 1 (if it exists) and its determinant (whose job it will be to
determine the existence of M 1 ).
We know at least three ways of handling this linear system problem:
1. As an augmented matrix
M V .
Here our plan would be to perform row operations until the system looks
like
I M 1V ,
1
(assuming that M exists).
396
G.7 Determinants 397
2. As a matrix equation
MX = V ,
1
which we would solve by finding M (again, if it exists), so that
1
X=M V.
3. As a linear transformation
L : Rn ! R n
via
Rn 3 X 7 ! M X 2 Rn .
In this case we have to study the equation L(X) = V because V 2 Rn .
Lets focus on the first two methods. In particular we want to think about
how the augmented matrix method can give information about finding M 1 . In
particular, how it can be used for handling determinants.
The main idea is that the row operations changed the augmented matrices,
but we also know how to change a matrix M by multiplying it by some other
matrix E, so that M ! EM . In particular can we find ‘‘elementary matrices’’
the perform row operations?
Once we find these elementary matrices is is very important to ask how they
effect the determinant, but you can think about that for your own self right
now.
Lets tabulate our names for the matrices that perform the various row
operations:
To finish off the video, here is how all these elementary matrices work
for a 2 ⇥ 2 example. Lets take
✓ ◆
a b
M= .
c d
397
398 Movie Scripts
• Scalar multiplying:
✓ ◆ ✓ ◆✓ ◆ ✓ ◆
1 0 0 a b a b
R ( )= , E21 M = = .
0 1 0 1 c d c d
• Row sum:
✓ ◆ ✓ ◆✓ ◆ ✓ ◆
1 1 a b a+ c b+ d
S21 ( ) = , S21 ( )M = = .
0 1 0 1 c d c d
Elementary Determinants
This video will show you how to calculate determinants of elementary matrices.
First remember that the job of an elementary row matrix is to perform row
operations, so that if E is an elementary row matrix and M some given matrix,
EM
398
G.7 Determinants 399
0 1
1
B .. C
B C
B . C
B 1 C
B C
B .. C
Sj ( ) = B
i
B .
C
C
B C
B 1 C
B C
B .. C
@ . A
1
So to calculate their determinants, we just have to apply the above list
of what happens to the determinant of a matrix under row operations to the
determinant of the identity. This yields
x+y =1
x+y =1
0=0
1 1
The matrix for this would be M = and det(M ) = 0. When we compute the
0 0
determinant, this row of all zeros gets multiplied in every term. If instead
we were given redundant equations
x+y =1
2x + 2y = 2
1 1
The matrix for this would be M = and det(M ) = 0. But we know that
2 2
with an elementary row operation, we could replace the second row with a row
399
400 Movie Scripts
of all zeros. Somehow the determinant is able to detect that there is only one
equation here. Even if we had a set of contradictory set of equations such as
x+y =1
2x + 2y = 0,
where it is not possible for both of these equations to be true, the matrix M
is still the same, and still has a determinant zero.
Lets look at a three by three example, where the third equation is the sum
of the first two equations.
x+y+z =1
y+z =1
x + 2y + 2z = 2
Alternative Proof
Here we will prove more directly that the determinant of a product of matrices
is the product of their determinants. First we reference that for a matrix
M with rows ri , if M 0 is the matrix with rows rj0 = rj + ri for j 6= i and
ri0 = ri , then det(M ) = det(M 0 ) Essentially we have M 0 as M multiplied by the
elementary row sum matrices Sji ( ). Hence we can create an upper-triangular
matrix U such that det(M ) = det(U ) by first using the first row to set m1i 7! 0
for all i > 1, then iteratively (increasing k by 1 each time) for fixed k using
the k-th row to set mki 7! 0 for all i > k.
400
G.7 Determinants 401
Now note that for two upper-triangular matrices U = (uji ) and U 0 = (u0j
i ),
by matrix multiplication we have X = U U 0 = (xji ) is upper-triangular and
xii = uii u0i
i . Also since every permutation
Q would contain a lower diagonal entry
(which is 0) have det(U ) = i uii . Let A and A0 have corresponding upper-
triangular matrices U and U 0 respectively (i.e. det(A) = det(U )), we note
that AA0 has a corresponding upper-triangular matrix U U 0 , and hence we have
Y
det(AA0 ) = det(U U 0 ) = uii u0i
i
i
! !
Y Y
= uii u0i
i
i i
= det(U ) det(U 0 ) = det(A) det(A0 ).
This formula might be easier to remember if you think about this picture.
Now we can look at three by three matrices and see a few ways to compute
the determinant. We have a similar pattern for 3 ⇥ 3 matrices. Consider the
example
0 1
1 2 3
det @3 1 2A = ((1 · 1 · 1) + (2 · 2 · 0) + (3 · 3 · 0)) ((3 · 1 · 0) + (1 · 2 · 0) + (3 · 2 · 1)) = 5
0 0 1
We can draw a picture with similar diagonals to find the terms that will be
positive and the terms that will be negative.
401
402 Movie Scripts
0 1
1 2 3
1 2 3 2 3 1
det @3 1 2A = 1 2 +3 = 1(1 0) 2(3 0) + 3(0 0) = 5
0 1 0 1 0 0
0 0 1
Decide which way you prefer and get good at taking determinants, you’ll need
to compute them in a lot of problems.
det(A) = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 a11 a23 a32 a12 a21 a33 a13 a22 a31
and so the complexity is 5a + 12m. Now note that in general, the complexity
cn of the expansion minors formula of an arbitrary n ⇥ n matrix should be
cn = (n 1)a + ncn 1m
Pn i 1 1 1
since det(A) = i=1 ( 1) ai cofactor(ai ) and cofactor(ai ) is an (n 1) ⇥ (n 1)
matrix. This is one way to prove part (c).
402
G.8 Subspaces and Spanning Sets 403
X
lij xi = v j
i
where lij is the coefficient of the variable xi in the equation lj . However, this
is also stating that V is in the span of the vectors {Li }i where Li = (lij )j . For
example, consider the set of equations
2x + 3y z=5
x + 3y + z = 1
x+y 2z = 3
403
404 Movie Scripts
Here we have taken the subspace W to be a plane through the origin and U to
be a line through the origin. The hint now is to think about what happens when
you add a vector u 2 U to a vector w 2 W . Does this live in the union U [ W ?
For the second part, we take a more theoretical approach. Lets suppose
that v 2 U \ W and v 0 2 U \ W . This implies
v2U and v0 2 U .
So, since U is a subspace and all subspaces are vector spaces, we know that
the linear combination
↵v + v 0 2 U .
Now repeat the same logic for W and you will be nearly done.
The example asks whether they are linearly independent, and the answer is
immediate: NO, four vectors can never be linearly independent in R3 . This
vector space is simply not big enough for that, but you need to understand the
404
G.9 Linear Independence 405
notion of the dimension of a vector space to see why. So we think the vectors
v1 , v2 , v3 and v4 are linearly dependent, which means we need to show that there
is a solution to
↵1 v1 + ↵2 v2 + ↵3 v3 + ↵4 v4 = 0
for the numbers ↵1 , ↵2 , ↵3 and ↵4 not all vanishing.
To find this solution we need to set up a linear system. Writing out the
above linear combination gives
405
406 Movie Scripts
Worked Proof
Here we will work through a quick version of the proof P of Theorem 10.1.1. Let
i
{vi } denote a set of linearly dependent vectors, so i c vi = 0 where there
k
exists some c 6= 0. Now without loss of generality we order our vectors such
that c1 6= 0, and we can do so since addition is commutative (i.e. a + b = b + a).
Therefore we have
n
X
c1 v1 = ci vi
i=2
Xn
ci
v1 = vi
i=2
c1
406
G.10 Basis and Dimension 407
w = c1 v1 + · · · + cn vn .
w = d1 v 1 + · · · + dn v n .
0V = w w
= (c v1 + · · · + cn vn )
1
(d1 v1 + · · · + dn vn )
1 1 n
= (c d )v1 + · · · + (c dn )vn .
Since the vi ’s are linearly independent this implies that ci di = 0 for all i,
this means that we cannot have ci 6= di , which is a contradiction.
Worked Example
In this video we will work through an example of how to extend a set of linearly
independent vectors to a basis. For fun, we will take the vector space
V = {(x, y, z, w)|x, y, z, w 2 Z5 } .
This is like four dimensional space R4 except that the numbers can only be
{0, 1, 2, 3, 4}. This is like bits, but now the rule is
0 = 5.
407
408 Movie Scripts
The way to proceed is to add a known (and preferably simple) basis to the
vectors given, thus we consider
0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 0 0
B2C B3C B0C B1C B0C B0C
v1 = B C B C B C B C B C B C
@3A , v2 = @2A , e1 = @0A , e2 = @0A , e3 = @1A , e4 = @0A .
4 1 0 0 0 1
The last four vectors are clearly a basis (make sure you understand this....)
and are called the canonical basis. We want to keep v1 and v2 but find a way to
turf out two of the vectors in the canonical basis leaving us a basis of four
vectors. To do that, we have to study linear independence, or in other words
a linear system problem defined by
0 = ↵1 e1 + ↵2 e2 + ↵3 v 1 + ↵ 4 v 2 + ↵5 e3 + ↵6 e4 .
We want to find solutions for the ↵0 s which allow us to determine two of the
e0 s. For that we use an augmented matrix
0 1
1 0 1 0 0 0 0
B 2 3 0 1 0 0 0 C
B C
@ 3 2 0 0 1 0 0 A.
4 1 0 0 0 1 0
Next comes a bunch of row operations, all done mod 5. Note that we have dropped the last column of zeros since it has no information--you can fill in the row operations used above the ∼'s as an exercise:

    ( 1 0 1 0 0 0 )     ( 1 0 1 0 0 0 )
    ( 2 3 0 1 0 0 )  ∼  ( 0 3 3 1 0 0 )
    ( 3 2 0 0 1 0 )     ( 0 2 2 0 1 0 )
    ( 4 1 0 0 0 1 )     ( 0 1 1 0 0 1 )

    ( 1 0 1 0 0 0 )     ( 1 0 1 0 0 0 )
 ∼  ( 0 1 1 2 0 0 )  ∼  ( 0 1 1 2 0 0 )
    ( 0 2 2 0 1 0 )     ( 0 0 0 1 1 0 )
    ( 0 1 1 0 0 1 )     ( 0 0 0 3 0 1 )

    ( 1 0 1 0 0 0 )     ( 1 0 1 0 0 0 )
 ∼  ( 0 1 1 0 3 0 )  ∼  ( 0 1 1 0 3 0 )
    ( 0 0 0 1 1 0 )     ( 0 0 0 1 1 0 )
    ( 0 0 0 0 2 1 )     ( 0 0 0 0 1 3 )

    ( 1 0 1 0 0 0 )
 ∼  ( 0 1 1 0 0 1 )
    ( 0 0 0 1 0 2 )
    ( 0 0 0 0 1 3 )
The pivots are in the first, second, fourth and fifth columns. The columns corresponding to non-pivot variables are the ones that can be eliminated--their coefficients (the α's) will be arbitrary, so set them all to zero save for the one next to the vector you are solving for, which can be taken to be unity. Thus that vector can certainly be expressed in terms of previous ones. Hence, altogether, our basis is

    { (1, 2, 3, 4)ᵀ , (0, 3, 2, 1)ᵀ , (0, 1, 0, 0)ᵀ , (0, 0, 1, 0)ᵀ } .
and so choosing any two non-zero vectors will form a basis. Now in general we note that we can build up a basis {ei} by arbitrarily (independently) choosing the first i − 1 entries, then setting the i-th entry to 1 and all higher entries to 0.
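Because all of the arithmetic here is mod 5, it is easy to double check by machine. A minimal Python sketch (not part of the original script) that row reduces the 4 × 6 matrix above over Z5 and reports the pivot columns:

    p = 5
    A = [[1, 0, 1, 0, 0, 0],
         [2, 3, 0, 1, 0, 0],
         [3, 2, 0, 0, 1, 0],
         [4, 1, 0, 0, 0, 1]]

    pivots, row = [], 0
    for col in range(6):
        # find a row at position `row` or below with a nonzero entry in this column
        r0 = next((r for r in range(row, 4) if A[r][col] % p), None)
        if r0 is None:
            continue
        A[row], A[r0] = A[r0], A[row]
        inv = pow(A[row][col], -1, p)              # modular inverse (Python 3.8+)
        A[row] = [inv * a % p for a in A[row]]     # scale to get a leading 1
        for r in range(4):
            if r != row and A[r][col]:
                A[r] = [(a - A[r][col] * b) % p for a, b in zip(A[r], A[row])]
        pivots.append(col)
        row += 1

    print(pivots)   # -> [0, 1, 3, 4], i.e. the columns of v1, v2, e2 and e3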
G.11 Eigenvalues and Eigenvectors
This means that λ = 2 and λ = 5 are solutions. Now if we want to find the eigenvectors that correspond to these values we look at vectors v such that

    ( 4 − λ     2    )
    (   1     3 − λ  ) v = ~0 .

For λ = 5:

    ( 4 − 5     2    ) ( x )     ( −1     2 ) ( x )
    (   1     3 − 5  ) ( y )  =  (  1    −2 ) ( y )  =  ~0 .

This gives us the equalities −x + 2y = 0 and x − 2y = 0, which both give the line y = ½x. Any point on this line, so for example (2, 1)ᵀ, is an eigenvector with eigenvalue λ = 5.
Now let's find the eigenvector for λ = 2:

    ( 4 − 2     2    ) ( x )     ( 2    2 ) ( x )
    (   1     3 − 2  ) ( y )  =  ( 1    1 ) ( y )  =  ~0 ,

and both rows give the line x + y = 0, so for example (1, −1)ᵀ is an eigenvector with eigenvalue λ = 2.
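These eigenvalues and eigenvector directions are easy to confirm numerically. A minimal sketch (not part of the original script):

    import numpy as np

    M = np.array([[4.0, 2.0],
                  [1.0, 3.0]])
    evals, evecs = np.linalg.eig(M)
    print(evals)   # -> [5. 2.]  (the order may vary)

    # columns of evecs are unit eigenvectors; rescaling recovers the
    # directions found by hand: (2, 1) for lambda = 5, (1, -1) for lambda = 2
    for lam, col in zip(evals, evecs.T):
        print(lam, col / np.abs(col).max())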
For the Jordan 2-cell

    J2 = ( λ   1 )
         ( 0   λ ) ,

we note that we can just read off the eigenvector e1 with eigenvalue λ. However the characteristic polynomial of J2 is PJ2(µ) = (µ − λ)², so the only possible eigenvalue is λ, but we claim it does not have a second eigenvector v. To see this, we require that

    λ v¹ + v² = λ v¹
          λ v² = λ v² ,

which clearly implies that v² = 0. This is known as a Jordan 2-cell, and in general, a Jordan n-cell with eigenvalue λ is (similar to) the n × n matrix

    Jn = ( λ  1  0  ···  0 )
         ( 0  λ  1  ···  0 )
         ( ⋮     ⋱  ⋱    ⋮ )
         ( 0  ···  0  λ  1 )
         ( 0  ···  0  0  λ )

which has a single eigenvector e1.
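The single eigenvector can also be seen by a rank count: Jn − λI has rank n − 1, so the eigenspace is one dimensional. A minimal numerical sketch (not from the original text):

    import numpy as np

    lam, n = 3.0, 4
    # a Jordan n-cell with eigenvalue lam: lam on the diagonal, 1's above it
    J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)
    # dimension of the eigenspace = n - rank(J - lam I)
    print(n - np.linalg.matrix_rank(J - lam * np.eye(n)))   # -> 1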
Now consider the following matrix

    M = ( 3  1  0 )
        ( 0  3  1 )
        ( 0  0  2 )

and we see that PM(λ) = (λ − 3)²(λ − 2). Therefore for λ = 3 we need to find the solutions to (M − 3I3)v = 0, or in equation form:

      v² = 0
      v³ = 0
    − v³ = 0 ,

and we immediately see that we must have v = e1. Next for λ = 2, we need to solve (M − 2I3)v = 0, or

    v¹ + v² = 0
    v² + v³ = 0
          0 = 0 ,

and thus we choose v¹ = 1, which implies v² = −1 and v³ = 1. Hence this is the only other eigenvector for M.
This is a specific case of Problem 13.7.
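A quick numerical check of the two eigenvectors just found (a sketch, not part of the original script):

    import numpy as np

    M = np.array([[3.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 0.0, 2.0]])
    e1 = np.array([1.0, 0.0, 0.0])
    v = np.array([1.0, -1.0, 1.0])
    print(M @ e1)   # -> [3. 0. 0.], i.e. 3 e1
    print(M @ v)    # -> [2. -2. 2.], i.e. 2 v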
Eigenvalues
Eigenvalues and eigenvectors are extremely important. In this video we review the theory of eigenvalues. Consider a linear transformation

    L : V → V .

In a basis for V, the eigenvalue equation

    Lv = λv

becomes

    Mv = λv ,

where v is now a column vector and M is the matrix of L in that basis. Rewriting this as (M − λI)v = 0, we need the key fact that, for any square matrix N,

    Nu = 0 and u ≠ 0  ⟺  det N = 0 .

I.e., a square matrix can have an eigenvector with vanishing eigenvalue if and only if its determinant vanishes! Hence

    det(M − λI) = 0 .

The quantity on the left (up to a possible minus sign) equals the so-called characteristic polynomial

    PM(λ) := det(λI − M) .
For example, the polynomial

    z² + 1

has no real roots, but over the complex numbers, where

    i² = −1 ,

we have

    z² + 1 = (z − i)(z + i) .

Returning to our characteristic polynomial, we call on the fundamental theorem of algebra to write

    PM(λ) = (λ − λ1)(λ − λ2) · · · (λ − λn) .

The roots λ1, λ2, ..., λn are the eigenvalues of M (or its underlying linear transformation L).
Eigenspaces
Consider the linear map

    L = ( −4   6   6 )
        (  0   2   0 )
        ( −3   3   5 ) .

Direct computation will show that we have

    L = Q ( −1  0  0 ) Q⁻¹
          (  0  2  0 )
          (  0  0  2 )

where

    Q = ( 2  1  1 )
        ( 0  0  1 )
        ( 1  1  0 ) .

Therefore the vectors

    v1⁽²⁾ = (1, 0, 1)ᵀ      v2⁽²⁾ = (1, 1, 0)ᵀ

span the eigenspace E⁽²⁾ of the eigenvalue 2, and for an explicit example, if we take

    v = 2 v1⁽²⁾ − v2⁽²⁾ = (1, −1, 2)ᵀ

we have

    Lv = (2, −2, 4)ᵀ = 2v ,

so v ∈ E⁽²⁾. In general, we note the linearly independent vectors vi⁽λ⁾ with the same eigenvalue λ span an eigenspace since for any v = Σᵢ cⁱ vi⁽λ⁾, we have

    Lv = Σᵢ cⁱ L vi⁽λ⁾ = Σᵢ cⁱ λ vi⁽λ⁾ = λ Σᵢ cⁱ vi⁽λ⁾ = λv .
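This decomposition is easy to verify numerically. A minimal sketch (not part of the original script), with the signs of L as reconstructed above:

    import numpy as np

    L = np.array([[-4.0, 6.0, 6.0],
                  [ 0.0, 2.0, 0.0],
                  [-3.0, 3.0, 5.0]])
    Q = np.array([[2.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
    D = np.diag([-1.0, 2.0, 2.0])
    print(np.allclose(L, Q @ D @ np.linalg.inv(Q)))   # -> True

    v = np.array([1.0, -1.0, 2.0])
    print(L @ v)                                      # -> [ 2. -2.  4.] = 2v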
To find the eigenvalues λ, we solve

    det(M − λI) = 0

for λ. Here this means

    det ( 3 − λ     2    )
        (   2     3 − λ  )  =  0 .

By computing the determinant and solving for λ we can find the eigenvalues λ = 1 and 5, and the corresponding eigenvectors. You should do the computations to find these for yourself.
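If you would rather check your answer by machine afterwards, a minimal Python sketch (not part of the original script):

    import numpy as np

    M = np.array([[3.0, 2.0],
                  [2.0, 3.0]])
    # np.poly returns the characteristic polynomial's coefficients:
    # lambda^2 - 6 lambda + 5
    print(np.poly(M))             # -> [ 1. -6.  5.]
    print(np.roots(np.poly(M)))   # -> [5. 1.]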
When we think about the question in part (b) which asks to find a vector
v(0) such that v(0) = v(1) = v(2) . . ., we must look for a vector that satisfies
v = M v. What eigenvalue does this correspond to? If you found a v(0) with
this property would cv(0) for a scalar c also work? Remember that eigenvectors
have to be nonzero, so what if c = 0?
For part (c) if we tried an eigenvector would we have restrictions on what
the eigenvalue should be? Think about what it means to be pointed in the same
direction.
G.12 Diagonalization
Non-Diagonalizable Example
First recall that the derivative operator is linear and that we can write it as the matrix

    d/dx = ( 0  1  0  0  ··· )
           ( 0  0  2  0  ··· )
           ( 0  0  0  3  ··· )
           ( ⋮  ⋮  ⋮  ⋮  ⋱ ) .

We note that this transforms into an infinite Jordan cell with eigenvalue 0, or

    ( 0  1  0  0  ··· )
    ( 0  0  1  0  ··· )
    ( 0  0  0  1  ··· )
    ( ⋮  ⋮  ⋮  ⋮  ⋱ ) .
You can easily check that the only eigenvector is the constant polynomial 1, with eigenvalue 0, since D always lowers the degree of a polynomial by 1 each time it is applied. Note that, restricted to polynomials of degree at most 3, this is a nilpotent matrix since D⁴ = 0, but the only nilpotent matrix that is "diagonalizable" is the 0 matrix.
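Restricting to polynomials of degree at most 3 makes D a finite 4 × 4 matrix, and the nilpotency is easy to check. A minimal numerical sketch (not from the original text):

    import numpy as np

    # matrix of d/dx on polynomials of degree <= 3, in the basis 1, x, x^2, x^3
    D = np.diag([1.0, 2.0, 3.0], k=1)
    print(np.linalg.matrix_power(D, 4))   # -> the 4 x 4 zero matrix
    print(np.linalg.matrix_rank(D))       # -> 3, so ker D is spanned by
                                          #    (1,0,0,0), the constant polynomial 1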
[Figure: the barrel's contents plotted as a point (x, y), with Apples on one axis and Oranges on the other.]
Calling the basis vectors ~e1 := (1, 0) and ~e2 := (0, 1), this representation would label what's in the barrel by a vector

    ~x := x ~e1 + y ~e2 = ( ~e1  ~e2 ) (x, y)ᵀ .

Since this is the method ordinary people would use, we will call this the "engineer's" method!
But this is not the approach nutritionists would use. They would note the amount of sugar and total number of fruit (s, f):
[Figure: the same barrel plotted as a point (s, f), with sugar on one axis and fruit on the other.]
WARNING: To make sense of what comes next you need to allow for the possibility of a negative amount of fruit or sugar. This would be just like a bank, where if money is owed to somebody else, we can use a minus sign.
The vector ~x says what is in the barrel and does not depend on which mathematical description is employed. The way nutritionists label ~x is in terms of a pair of basis vectors ~f1 and ~f2:

    ~x = s ~f1 + f ~f2 = ( ~f1  ~f2 ) (s, f)ᵀ .
Thus our vector space now has a bunch of interesting vectors:
The vector ~x labels generally the contents of the barrel. The vector ~e1 corresponds to one apple and no oranges. The vector ~e2 is one orange and no apples. The vector ~f1 means one unit of sugar and zero total fruit (to achieve this you could lend out some apples and keep a few oranges). Finally the vector ~f2 represents a total of one piece of fruit and no sugar.
You might remember that the amount of sugar in an apple is called λ while oranges have twice as much sugar as apples. Thus

    s = λ (x + 2y)
    f = x + y .
Essentially, this is already our change of basis formula, but let's play around and put it in our notation. First we can write this as a matrix

    ( s )   =   ( λ   2λ ) ( x )
    ( f )       ( 1    1 ) ( y ) .

Comparing to the nutritionist's formula for the same object ~x we learn that

    ~f1 = −(1/λ) ~e1 + (1/λ) ~e2    and    ~f2 = 2 ~e1 − ~e2 .

Rearranging these equations we find the change of base matrix P from the engineer's basis to the nutritionist's basis:

    ( ~f1  ~f2 ) = ( ~e1  ~e2 ) ( −1/λ     2 )
                                (  1/λ    −1 )   =:  ( ~e1  ~e2 ) P .

We can also go the other direction, changing from the nutritionist's basis to the engineer's basis:

    ( ~e1  ~e2 ) = ( ~f1  ~f2 ) ( λ   2λ )
                                ( 1    1 )   =:  ( ~f1  ~f2 ) Q .
Now recall the linear system from earlier,

    x + y = 27    and    2x − y = 0 ,

which can be written as a matrix equation

    MX = V

where

    M := ( 1    1 ) ,    X := ( x ) ,    V := ( 27 ) .
         ( 2   −1 )           ( y )           (  0 )
Note that

    ~x = ( ~e1  ~e2 ) X .

Also let's call

    ~v := ( ~e1  ~e2 ) V .

Now the matrix M is the matrix of some linear transformation L in the basis of the engineers. Let's convert it to the basis of the nutritionists:

    L~x = L ( ~f1  ~f2 ) (s, f)ᵀ = L ( ~e1  ~e2 ) P (s, f)ᵀ = ( ~e1  ~e2 ) MP (s, f)ᵀ .

Note here that the linear transformation L acts on vectors -- these are the objects we have written with a ~ sign on top of them. It does not act on columns of numbers!
We can easily compute MP and find

    MP = ( 1    1 ) ( −1/λ     2 )   =   (   0     1 )
         ( 2   −1 ) (  1/λ    −1 )       ( −3/λ    5 ) .

In the nutritionist's basis the system MX = V therefore reads MP (s, f)ᵀ = V, whose rows are f = 27 and −(3/λ)s + 5f = 0, so that

    f = 27    and    s = 45λ .
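These manipulations can be checked symbolically. A minimal Python sketch (not part of the original script) treating λ as a symbol:

    from sympy import Matrix, eye, simplify, symbols

    lam = symbols('lambda', positive=True)
    P = Matrix([[-1/lam, 2], [1/lam, -1]])   # engineer's basis -> nutritionist's
    Q = Matrix([[lam, 2*lam], [1, 1]])       # nutritionist's basis -> engineer's
    M = Matrix([[1, 1], [2, -1]])

    print(simplify(P * Q) == eye(2))         # -> True: P and Q are inverses
    print(simplify(M * P))                   # -> Matrix([[0, 1], [-3/lambda, 5]])

    # solve MP (s, f)^T = (27, 0)^T for the nutritionist's coordinates
    print(simplify((M * P).solve(Matrix([27, 0]))))
    # -> Matrix([[45*lambda], [27]])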
2 × 2 Example
Let's diagonalize the matrix M from a previous example:

    M = ( 4   2 )
        ( 1   3 ) .

We found the eigenvalues and eigenvectors of M; our solution was

    λ1 = 5 , v1 = (2, 1)ᵀ    and    λ2 = 2 , v2 = (1, −1)ᵀ .

So we can diagonalize this matrix using the formula D = P⁻¹MP where P = (v1, v2). This means

    P = ( 2    1 )      and      P⁻¹ = (1/3) ( 1    1 )
        ( 1   −1 )                           ( 1   −2 ) .
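Carrying out the product D = P⁻¹MP numerically (a sketch, not part of the original script):

    import numpy as np

    M = np.array([[4.0, 2.0],
                  [1.0, 3.0]])
    P = np.array([[2.0, 1.0],
                  [1.0, -1.0]])
    D = np.linalg.inv(P) @ M @ P
    print(np.allclose(D, np.diag([5.0, 2.0])))   # -> True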
G.13 Orthonormal Bases and Complements
Consider the vectors

    e1^θ = (cos θ, sin θ)ᵀ    and    e2^θ = (−sin θ, cos θ)ᵀ

for some θ ∈ [0, 2π). Now first we need to show that for a fixed θ the pair is orthogonal:

    e1^θ · e2^θ = −cos θ sin θ + sin θ cos θ = 0 .

Also we have

    e1^θ · e1^θ = cos²θ + sin²θ = 1 = e2^θ · e2^θ ,

and hence {e1^θ, e2^θ} is an orthonormal basis. To show that every orthonormal basis of R² is {e1^θ, e2^θ} for some θ, consider an orthonormal basis {b1, b2} and note that b1 forms an angle θ with the vector e1 (which is e1^0). Thus b1 = e1^θ and if b2 = e2^θ, we are done, otherwise b2 = −e2^θ and it is the reflected version. However we can do the same thing except starting with b2 and get b2 = e1^θ′ and b1 = e2^θ′, since we have just interchanged two basis vectors, which corresponds to a reflection which picks up a minus sign as in the determinant.
[Figure: the vectors e1^θ = (cos θ, sin θ) and e2^θ = (−sin θ, cos θ) drawn in the plane.]
Given the vectors

    v1 = (0, 1, 0, 0)ᵀ , v2 = (0, 1, 1, 0)ᵀ , v3 = (3, 0, 1, 0)ᵀ , and v4 = (1, 1, 0, 2)ᵀ ,

we start with v1:

    v1⊥ = v1 = (0, 1, 0, 0)ᵀ .

Then

    v2⊥ = v2 − ((v1⊥ · v2)/‖v1⊥‖²) v1⊥
        = (0, 1, 1, 0)ᵀ − (1/1)(0, 1, 0, 0)ᵀ
        = (0, 0, 1, 0)ᵀ ,

and

    v3⊥ = v3 − ((v1⊥ · v3)/‖v1⊥‖²) v1⊥ − ((v2⊥ · v3)/‖v2⊥‖²) v2⊥
        = (3, 0, 1, 0)ᵀ − (0/1)(0, 1, 0, 0)ᵀ − (1/1)(0, 0, 1, 0)ᵀ
        = (3, 0, 0, 0)ᵀ .

This last step requires subtracting off the term of the form ((u · v)/(u · u)) u for each of the previously defined basis vectors.
Now v1⊥, v2⊥, v3⊥, and v4⊥ are an orthogonal basis. Notice that even with very, very nice looking vectors we end up having to do quite a bit of arithmetic. This is a good reason to use programs like matlab to check your work.
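In that spirit, here is a minimal Python sketch (offered as an alternative to matlab; it is not part of the original script) that runs the same Gram-Schmidt procedure:

    import numpy as np

    vs = [np.array([0.0, 1.0, 0.0, 0.0]),
          np.array([0.0, 1.0, 1.0, 0.0]),
          np.array([3.0, 0.0, 1.0, 0.0]),
          np.array([1.0, 1.0, 0.0, 2.0])]

    ortho = []
    for v in vs:
        for u in ortho:
            v = v - (u @ v) / (u @ u) * u   # subtract the component along u
        ortho.append(v)

    for u in ortho:
        print(u)   # -> (0,1,0,0), (0,0,1,0), (3,0,0,0), (0,0,0,2)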
Next we find

    t2 = m2 − (m′1 · m2) m′1 = m2 − r12 m′1 = m2 − 0 m′1 ,

noting that

    m′1 · m′1 = ‖m′1‖² = 1 ,

and ‖t2‖ = r22 = √3, and so we get m′2 = t2/‖t2‖ with the decomposition

    Q2 = ( 1/√2    1/√3   −1 )          R2 = ( √2    0    0 )
         (   0     1/√3    2 )   ,           (  0   √3    0 )
         ( 1/√2   −1/√3   −1 )               (  0    0    1 ) .
Finally we calculate

    t3 = m3 − (m′1 · m3) m′1 − (m′2 · m3) m′2
       = m3 − r13 m′1 − r23 m′2 = m3 + √2 m′1 − (2/√3) m′2 ,

again noting m′2 · m′2 = ‖m′2‖² = 1, and let m′3 = t3/‖t3‖ where ‖t3‖ = r33 = 2√(2/3). Thus we get our final M = QR decomposition as

    Q = ( 1/√2    1/√3   −1/√6 )          R = ( √2    0    −√2     )
        (   0     1/√3    2/√6 )   ,          (  0   √3    2/√3    )
        ( 1/√2   −1/√3    1/√6 )              (  0    0    2√(2/3) ) .
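numpy's built-in QR routine gives a further check. A sketch, assuming the columns m1 = (1, 0, 1)ᵀ, m2 = (1, 1, −1)ᵀ, m3 = (−1, 2, −1)ᵀ reconstructed from the computation above (numpy may negate some columns of Q, with the matching rows of R negated too):

    import numpy as np

    M = np.array([[1.0,  1.0, -1.0],
                  [0.0,  1.0,  2.0],
                  [1.0, -1.0, -1.0]])
    Q, R = np.linalg.qr(M)
    print(Q)
    print(R)
    print(np.allclose(Q @ R, M))   # -> True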
Overview
This video depicts the ideas of a subspace sum, a direct sum and an orthogonal complement in R³. Firstly, let's start with the subspace sum. Remember that even if U and V are subspaces, their union U ∪ V is usually not a subspace. However, the span of their union certainly is and is called the subspace sum

    U + V = span(U ∪ V) .

You need to be aware that this is a sum of vector spaces (not vectors). A picture of this is a pair of planes in R³:
Here U + V = R³.
Next let's consider a direct sum. This is just the subspace sum for the case when U ∩ V = {0}. For that we can keep the plane U but must replace V by a line:
Notice, we can apply the same operation to U⊥ and just get U back again, i.e.

    (U⊥)⊥ = U .
However, the basis is not orthonormal so we know nothing about the lengths of the basis vectors (save that they cannot vanish).
To complete the hint, let's use the dot product to compute a formula for c¹ in terms of the basis vectors and v. Consider

    v1 · v = c¹ v1 · v1 + c² v1 · v2 + · · · + cⁿ v1 · vn = c¹ v1 · v1 .

Recall that for two vectors u and v,

    u · v = ‖u‖ ‖v‖ cos θ ,

and in particular if they are perpendicular, θ = π/2 and cos(π/2) = 0, so you will get u · v = 0.
Now try to compute the dot product of u and v⊥ to find ‖u‖ ‖v⊥‖ cos θ:

    u · v⊥ = u · ( v − ((u · v)/(u · u)) u )
           = u · v − ((u · v)/(u · u)) u · u .

Now you finish simplifying and see if you can figure out what θ has to be.
(c) Given your solution to the above, how can you find a third vector perpendicular to both u and v⊥?
Remember what other things you learned in multivariable calculus? This might be a good time to remind yourself what the cross product does.
    u = (1, 2, 0)    and    v = (0, 1, 1) .

Try it out, and if you get stuck try drawing a sketch of the vectors you have.
as a set of 3 vectors

    v1 = (0, −1, 1)ᵀ ,   v2 = (0, 2, 2)ᵀ ,   v3 = (2, 0, 2)ᵀ ,

and then find the decomposition

    M = QR .
You can work out the last column using the same ideas. Thus it only remains to compute Q from

    Q = M R⁻¹ .
G.14 Diagonalizing Symmetric Matrices
because if P = [v1, v2, v3] is created from orthonormal vectors then P⁻¹ = Pᵀ, which means computing P⁻¹ should be easy. So let's say

    v1 = (1/√2, −1/√2, 0)ᵀ ,    v2 = (1/√2, 1/√2, 0)ᵀ ,    and    v3 = (0, 0, 1)ᵀ ,

so we get

    P = (  1/√2   1/√2   0 )              P⁻¹ = ( 1/√2   −1/√2   0 )
        ( −1/√2   1/√2   0 )    and             ( 1/√2    1/√2   0 ) .
        (   0      0     1 )                    (  0       0     1 )

So when we compute D = P⁻¹MP we'll get

    ( 1/√2   −1/√2   0 ) ( 1  2  0 ) (  1/√2   1/√2   0 )     ( −1   0   0 )
    ( 1/√2    1/√2   0 ) ( 2  1  0 ) ( −1/√2   1/√2   0 )  =  (  0   3   0 ) .
    (  0       0     1 ) ( 0  0  5 ) (   0      0     1 )     (  0   0   5 )
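A quick numerical check (a sketch, with M as reconstructed above):

    import numpy as np

    s = 1.0 / np.sqrt(2.0)
    P = np.array([[  s,   s, 0.0],
                  [ -s,   s, 0.0],
                  [0.0, 0.0, 1.0]])
    M = np.array([[1.0, 2.0, 0.0],
                  [2.0, 1.0, 0.0],
                  [0.0, 0.0, 5.0]])
    print(np.allclose(np.linalg.inv(P), P.T))   # -> True: P is orthogonal
    print(np.allclose(P.T @ M @ P, np.diag([-1.0, 3.0, 5.0])))   # -> True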
G.15 Kernel, Range, Nullity, Rank
2. Now, suppose that L is one-to-one. Show that ker L = {0V}. That is, show that 0V is in ker L, and then show that there are no other vectors in ker L.
What would happen if we had a nonzero kernel? If we had some vector v with L(v) = 0 and v ≠ 0, we could try to show that this would contradict the assumption that L is one-to-one. If we found x and y with L(x) = L(y), then we know x = y. But if L(v) = 0 then L(x) + L(v) = L(y). Does this cause a problem?