Linear Algebra with Applications
Seventh Edition
W. Keith Nicholson
Copyright © 2013, 2009, 2006, 2003 by McGraw-Hill Ryerson Limited, a Subsidiary of The
McGraw-Hill Companies. Copyright © 1995 by PWS Publishing Company. Copyright © 1990
by PWS-KENT Publishing Company. Copyright © 1986 by PWS Publishers. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, or
stored in a data base or retrieval system, without the prior written permission of McGraw-Hill
Ryerson Limited, or in the case of photocopying or other reprographic copying, a licence from
The Canadian Copyright Licensing Agency (Access Copyright). For an Access Copyright licence,
visit www.accesscopyright.ca or call toll-free to 1-800-893-5777.
The Internet addresses listed in the text were accurate at the time of publication. The inclusion
of a website does not indicate an endorsement by the authors or McGraw-Hill Ryerson, and
McGraw-Hill Ryerson does not guarantee the accuracy of information presented at these sites.
ISBN-13: 978-0-07-040109-9
ISBN-10: 0-07-040109-8
1 2 3 4 5 6 7 8 9 0 DOW 1 9 8 7 6 5 4 3
Care has been taken to trace ownership of copyright material contained in this text; however, the publisher will welcome any information that enables it to rectify any reference or credit for subsequent editions.
Nicholson, W. Keith
Linear algebra with applications / W. Keith Nicholson. -- 7th ed.
Includes index.
ISBN 978-0-07-040109-9
Chapter 1 Systems of Linear Equations
1.1 Solutions and Elementary Operations
1.2 Gaussian Elimination
1.3 Homogeneous Equations
1.4 An Application to Network Flow
1.5 An Application to Electrical Networks
1.6 An Application to Chemical Reactions
Supplementary Exercises for Chapter 1

Chapter 2 Matrix Algebra
2.1 Matrix Addition, Scalar Multiplication, and Transposition
2.2 Equations, Matrices, and Transformations
2.3 Matrix Multiplication
2.4 Matrix Inverses
2.5 Elementary Matrices
2.6 Linear Transformations
2.7 LU-Factorization
2.8 An Application to Input-Output Economic Models
2.9 An Application to Markov Chains
Supplementary Exercises for Chapter 2

Chapter 3 Determinants and Diagonalization
3.1 The Cofactor Expansion
3.2 Determinants and Matrix Inverses
3.3 Diagonalization and Eigenvalues
3.4 An Application to Linear Recurrences
3.5 An Application to Systems of Differential Equations
3.6 Proof of the Cofactor Expansion Theorem
Supplementary Exercises for Chapter 3

Chapter 4 Vector Geometry
4.1 Vectors and Lines
4.2 Projections and Planes
4.3 More on the Cross Product
4.4 Linear Operations on ℝ³
4.5 An Application to Computer Graphics
Supplementary Exercises for Chapter 4

Chapter 5 The Vector Space ℝⁿ
5.1 Subspaces and Spanning
5.2 Independence and Dimension
5.3 Orthogonality
5.4 Rank of a Matrix
5.5 Similarity and Diagonalization
5.6 Best Approximation and Least Squares
5.7 An Application to Correlation and Variance
Supplementary Exercises for Chapter 5

Chapter 6 Vector Spaces
6.1 Examples and Basic Properties
6.2 Subspaces and Spanning Sets
6.3 Linear Independence and Dimension
6.4 Finite Dimensional Spaces
6.5 An Application to Polynomials
6.6 An Application to Differential Equations
Supplementary Exercises for Chapter 6

Chapter 7 Linear Transformations
7.1 Examples and Elementary Properties
7.2 Kernel and Image of a Linear Transformation
7.3 Isomorphisms and Composition
7.4 A Theorem about Differential Equations
7.5 More on Linear Recurrences

Chapter 8 Orthogonality
8.1 Orthogonal Complements and Projections
8.2 Orthogonal Diagonalization
8.3 Positive Definite Matrices
8.4 QR-Factorization
8.5 Computing Eigenvalues
8.6 Complex Matrices
8.7 An Application to Linear Codes over Finite Fields
8.8 An Application to Quadratic Forms
8.9 An Application to Constrained Optimization
8.10 An Application to Statistical Principal Component Analysis
This textbook is an introduction to the ideas and techniques of linear algebra for first- or second-year students with a working knowledge of high school algebra. The contents have enough flexibility to present a traditional introduction to the subject, or to allow for a more applied course. Chapters 1–4 contain a one-semester course for beginners whereas Chapters 5–9 contain a second semester course (see the Suggested Course Outlines below). The text is primarily about real linear algebra with complex numbers being mentioned when appropriate (reviewed in Appendix A). Overall, the aim of the text is to achieve a balance among computational skills, theory, and applications of linear algebra. Calculus is not a prerequisite; places where it is mentioned may be omitted.

As a rule, students of linear algebra learn by studying examples and solving problems. Accordingly, the book contains a variety of exercises (over 1200, many with multiple parts), ordered as to their difficulty. In addition, more than 375 solved examples are included in the text, many of which are computational in nature. The examples are also used to motivate (and illustrate) concepts and theorems, carrying the student from concrete to abstract. While the treatment is rigorous, proofs are presented at a level appropriate to the student and may be omitted with no loss of continuity. As a result, the book can be used to give a course that emphasizes computation and examples, or to give a more theoretical treatment (some longer proofs are deferred to the end of the section).

Linear algebra has application to the natural sciences, engineering, management, and the social sciences as well as mathematics. Consequently, 18 optional “applications” sections are included in the text, introducing topics as diverse as electrical networks, economic models, Markov chains, linear recurrences, systems of differential equations, and linear codes over finite fields. Additionally, some applications (for example linear dynamical systems, and directed graphs) are introduced in context. The applications sections appear at the end of the relevant chapters to encourage students to browse.
This text includes the basis for a two-semester course in linear algebra.

• Chapters 1–4 provide a standard one-semester course of 35 lectures, including linear equations, matrix algebra, determinants, diagonalization, and geometric vectors, with applications as time permits. At Calgary, we cover Sections 1.1–1.3, 2.1–2.6, 3.1–3.3 and 4.1–4.4, and the course is taken by all science and engineering students in their first semester. Prerequisites include a working knowledge of high school algebra (algebraic manipulations and some familiarity with polynomials); calculus is not required.

• Chapters 5–9 contain a second semester course including ℝⁿ, abstract vector spaces, linear transformations (and their matrices), orthogonality, complex matrices (up to the spectral theorem) and applications. There is more material here than can be covered in one semester, and at Calgary we cover Sections 5.1–5.5, 6.1–6.4, 7.1–7.3, 8.1–8.6, and 9.1–9.3, with a couple of applications as time permits.

• Chapter 5 is a “bridging” chapter that introduces concepts like spanning, independence, and basis in the concrete setting of ℝⁿ, before venturing into the abstract in Chapter 6. The duplication is balanced by the value of reviewing these notions, and it enables the student to focus in Chapter 6 on the new idea of an abstract system. Moreover, Chapter 5 completes the discussion of rank and diagonalization from earlier chapters, and includes a brief introduction to orthogonality in ℝⁿ, which creates the possibility of a one-semester, matrix-oriented course covering Chapters 1–5 for students not wanting to study the abstract theory.
CHAPTER DEPENDENCIES
The following chart suggests how the material introduced in each chapter draws on concepts covered in certain earlier
chapters. A solid arrow means that ready assimilation of ideas and techniques presented in the later chapter depends
on familiarity with the earlier chapter. A broken arrow indicates that some reference to the earlier chapter is made but
the chapter need not be covered.
• Vector notation. Based on feedback from reviewers and current users, all vectors are denoted by boldface letters (used only in abstract spaces in earlier editions). Thus x becomes x (in boldface) in ℝ² and ℝ³ (Chapter 4), and in ℝⁿ the column X becomes x. Furthermore, the notation [x1 x2 ⋯ xn]ᵀ for vectors in ℝⁿ has been eliminated; instead we write vectors as n-tuples (x1, x2, …, xn) or as columns with entries x1, x2, …, xn. The result is a uniform notation for vectors throughout the text.

• Definitions. Important ideas and concepts are identified in their given context for the student’s understanding. These are highlighted in the text when they are first discussed, identified in the left margin, and listed on the inside back cover for reference.

• Exposition. Several new margin diagrams have been included to clarify concepts, and the exposition has been improved to simplify and streamline discussion and proofs.
OTHER CHANGES
• Several new examples and exercises have been added.

• The motivation for the matrix inversion algorithm has been rewritten in Section 2.4.

• For geometric vectors in ℝ², addition (parallelogram law) and scalar multiplication now appear earlier (Section 2.2). The discussion of reflections in Section 2.6 has been simplified, and projections are now included.

• The example in Section 3.3, which illustrates that x in ℝ² is an eigenvector of A if, and only if, the line ℝx is A-invariant, has been completely rewritten.

• The first part of Section 4.1 on vector geometry in ℝ² and ℝ³ has also been rewritten and shortened.

• In Section 6.4 there are three improvements: Theorem 1 now shows that an independent set can be extended to a basis by adding vectors from any prescribed basis; the proof that a spanning set can be cut down to a basis has been simplified (in Theorem 3); and in Theorem 4, the argument that independence is equivalent to spanning for a set S ⊆ V with |S| = dim V has been streamlined and a new example added.

• In Section 8.1, the definition of projections has been clarified, as has the discussion of the nature of quadratic forms in ℝ².
CHAPTER SUMMARIES
Chapter 1: Systems of Linear Equations.
A standard treatment of gaussian elimination is given. The rank of a matrix is introduced via the row-echelon form, and solutions to a homogeneous system are presented as linear combinations of basic solutions. Applications to network flows, electrical networks, and chemical reactions are provided.

Chapter 2: Matrix Algebra.
After a traditional look at matrix addition, scalar multiplication, and transposition in Section 2.1, matrix-vector multiplication is introduced in Section 2.2 by viewing the left side of a system of linear equations as the product Ax of the coefficient matrix A with the column x of variables. The usual dot-product definition of matrix-vector multiplication follows. Section 2.2 ends by viewing an m × n matrix A as a transformation ℝⁿ → ℝᵐ. This is illustrated for ℝ² → ℝ² by describing reflection in the x axis, rotation of ℝ² through π/2, shears, and so on.

In Section 2.3, the product of matrices A and B is defined by AB = [Ab1 Ab2 ⋯ Abn], where the bi are the columns of B. A routine computation shows that this is the matrix of the transformation B followed by A. This observation is used frequently throughout the book, and leads to simple, conceptual proofs of the basic axioms of matrix algebra. Note that linearity is not required—all that is needed is some basic properties of matrix-vector multiplication developed in Section 2.2. Thus the usual arcane definition of matrix multiplication is split into two well-motivated parts, each an important aspect of matrix algebra. Of course, this has the pedagogical advantage that the conceptual power of geometry can be invoked to illuminate and clarify algebraic techniques and definitions.

In Sections 2.4 and 2.5 matrix inverses are characterized, their geometrical meaning is explored, and block multiplication is introduced, emphasizing those cases needed later in the book. Elementary matrices are discussed, and the Smith normal form is derived. Then in Section 2.6, linear transformations ℝⁿ → ℝᵐ are defined and shown to be matrix transformations. The matrices of reflections, rotations, and projections in the plane are determined. Finally, matrix multiplication is related to directed graphs, matrix LU-factorization is introduced, and applications to economic models and Markov chains are presented.

Chapter 3: Determinants and Diagonalization.
The cofactor expansion is stated (proved by induction later) and used to define determinants inductively and to deduce the basic rules. The product and adjugate theorems are proved. Then the diagonalization algorithm is presented (motivated by an example about the possible extinction of a species of birds). As requested by our Engineering Faculty, this is done earlier than in most texts because it requires only determinants and matrix inverses, avoiding any need for subspaces, independence and dimension. Eigenvectors of a 2 × 2 matrix A are described geometrically (using the A-invariance of lines through the origin). Diagonalization is then used to study discrete linear dynamical systems and to discuss applications to linear recurrences and systems of differential equations. A brief discussion of Google PageRank is included.

Chapter 4: Vector Geometry.
Vectors are presented intrinsically in terms of length and direction, and are related to matrices via coordinates. Then vector operations are defined using matrices and shown to be the same as the corresponding intrinsic definitions. Next, dot products and projections are introduced to solve problems about lines and planes. This leads to the cross product. Then matrix transformations are introduced in ℝ³, matrices of projections and reflections are derived, and areas and volumes are computed using determinants. The chapter closes with an application to computer graphics.

Chapter 5: The Vector Space ℝⁿ.
Subspaces, spanning, independence, and dimensions are introduced in the context of ℝⁿ in the first two sections. Orthogonal bases are introduced and used to derive the expansion theorem. The basic properties of rank are presented and used to justify the definition given in Section 1.2. Then, after a rigorous study of diagonalization, best approximation and least squares are discussed. The chapter closes with an application to correlation and variance.

As in the sixth edition, this is a “bridging” chapter, easing the transition to abstract spaces. Concern about duplication with Chapter 6 is mitigated by the fact that this is the most difficult part of the course and many students welcome a repeat discussion of concepts like independence and spanning, albeit in the abstract setting. In a different direction, Chapters 1–5 could serve as a solid introduction to linear algebra for students not requiring abstract theory.

Chapter 6: Vector Spaces.
Building on the work on ℝⁿ in Chapter 5, the basic theory of abstract finite dimensional vector spaces is developed, emphasizing new examples like matrices, polynomials and functions. This is the first acquaintance most students have had with an abstract system, so not having to deal with spanning, independence and dimension in the general context eases the transition to abstract thinking. Applications to polynomials and to differential equations are included.

Chapter 7: Linear Transformations.
General linear transformations are introduced, motivated by many examples from geometry, matrix theory, and calculus. Then kernels and images are defined, the dimension theorem is proved, and isomorphisms are discussed. The chapter ends with an application to linear recurrences. A proof is included that the order of a differential equation (with constant coefficients) equals the dimension of the space of solutions.

Chapter 8: Orthogonality.
The study of orthogonality in ℝⁿ, begun in Chapter 5, is continued. Orthogonal complements and projections are defined and used to study orthogonal diagonalization. This leads to the principal axis theorem, the Cholesky factorization of a positive definite matrix, and QR-factorization. The theory is extended to ℂⁿ in Section 8.6 where hermitian and unitary matrices are discussed, culminating in Schur’s theorem and the spectral theorem. A short proof of the Cayley-Hamilton theorem is also presented. In Section 8.7 the field ℤp of integers modulo p is constructed informally for any prime p, and codes are discussed over any finite field. The chapter concludes with applications to quadratic forms, constrained optimization, and statistical principal component analysis.

Chapter 9: Change of Basis.
The matrix of a general linear transformation is defined and studied. In the case of an operator, the relationship between basis changes and similarity is revealed. This is illustrated by computing the matrix of a rotation about a line through the origin in ℝ³. Finally, invariant subspaces and direct sums are introduced, related to similarity, and (as an example) used to show that every involution is similar to a diagonal matrix with diagonal entries ±1.

Chapter 10: Inner Product Spaces.
General inner products are introduced and distance, norms, and the Cauchy-Schwarz inequality are discussed. The Gram-Schmidt algorithm is presented, projections are defined and the approximation theorem is proved (with an application to Fourier approximation). Finally, isometries are characterized, and distance-preserving operators are shown to be composites of translations and isometries.

Chapter 11: Canonical Forms.
The work in Chapter 9 is continued. Invariant subspaces and direct sums are used to derive the block triangular form. That, in turn, is used to give a compact proof of the Jordan canonical form. Of course the level is higher.

Appendices
In Appendix A, complex arithmetic is developed far enough to find nth roots. In Appendix B, methods of proof are discussed, while Appendix C presents mathematical induction. Finally, Appendix D describes the properties of polynomials in elementary terms.
ACKNOWLEDGMENTS
Comments and suggestions that have been invaluable to the development of this edition were provided by a variety
of reviewers, and I thank the following instructors:
Robert Andre, University of Waterloo
Dietrich Burbulla, University of Toronto
Dzung M. Ha, Ryerson University
Mark Solomonovich, Grant MacEwan
Fred Szabo, Concordia University
Edward Wang, Wilfrid Laurier
Petr Zizler, Mount Royal University
It is a pleasure to recognize the contributions of several people to this book. First, I would like to thank a
number of colleagues who have made suggestions that have improved this edition. Discussions with Thi Dinh
and Jean Springer have been invaluable and many of their suggestions have been incorporated. Thanks are also
due to Kristine Bauer and Clifton Cunningham for several conversations about the new way to look at matrix
multiplication. I also wish to extend my thanks to Joanne Canape for being there when I have technical questions.
Of course, thanks go to James Booty, Senior Sponsoring Editor, for his enthusiasm and effort in getting this project
underway, to Sarah Fulton and Erin Catto, Developmental Editors, for their work on the editorial background
of the book, and to Cathy Biribauer and Robert Templeton (First Folio Resource Group Inc.) and the rest of the
production staff at McGraw-Hill Ryerson for their parts in the project. Thanks also go to Jason Nicholson for his
help in various aspects of the book, particularly the Solutions Manual. Finally, I want to thank my wife Kathleen,
without whose understanding and cooperation, this book would not exist.
W. Keith Nicholson
University of Calgary
Chapter 1
Systems of Linear Equations
SECTION 1.1 Solutions and Elementary Operations
Practical problems in many fields of study—such as biology, business, chemistry,
computer science, economics, electronics, engineering, physics and the social
sciences—can often be reduced to solving a system of linear equations. Linear
algebra arose from attempts to find systematic methods for solving these systems,
so it is natural to begin this book by studying linear equations.
If a, b, and c are real numbers, the graph of an equation of the form
ax + by = c
is a straight line (if a and b are not both zero), so such an equation is called a linear
equation in the variables x and y. However, it is often convenient to write the
variables as x1, x2, …, xn, particularly when more than two variables are involved.
An equation of the form
a1x1 + a2x2 + ⋯ + anxn = b
is called a linear equation in the n variables x1, x2, …, xn. Here a1, a2, …, an denote
real numbers (called the coefficients of x1, x2, …, xn, respectively) and b is also a
number (called the constant term of the equation). A finite collection of linear
equations in the variables x1, x2, …, xn is called a system of linear equations in
these variables. Hence,
2x1 - 3x2 + 5x3 = 7
is a linear equation; the coefficients of x1, x2, and x3 are 2, -3, and 5, and the
constant term is 7. Note that each variable in a linear equation occurs to the first
power only.
Given a linear equation a1x1 + a2x2 + ⋯ + anxn = b, a sequence s1, s2, …, sn of n
numbers is called a solution to the equation if

a1s1 + a2s2 + ⋯ + ansn = b
that is, if the equation is satisfied when the substitutions x1 = s1, x2 = s2, …, xn = sn
are made. A sequence of numbers is called a solution to a system of equations if it
is a solution to every equation in the system.
For example, x = -2, y = 5, z = 0 and x = 0, y = 4, z = -1 are both solutions
to the system
x+y+ z=3
2x + y + 3z = 1
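The claim can be confirmed by direct substitution. The short Python check below is an illustration of ours, not part of the text; it tests both triples against each equation.

```python
# Check that (x, y, z) = (-2, 5, 0) and (0, 4, -1) both satisfy
# the system  x + y + z = 3  and  2x + y + 3z = 1.
def satisfies(x, y, z):
    return x + y + z == 3 and 2*x + y + 3*z == 1

print(satisfies(-2, 5, 0))   # True
print(satisfies(0, 4, -1))   # True
```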
A system may have no solution at all, or it may have a unique solution, or it may
have an infinite family of solutions. For instance, the system x + y = 2, x + y = 3
has no solution because the sum of two numbers cannot be 2 and 3 simultaneously.
A system that has no solution is called inconsistent; a system with at least one
solution is called consistent. The system in the following example has infinitely
many solutions.
EXAMPLE 1
Show that, for arbitrary values of s and t,
x1 = t - s + 1
x2 = t + s + 2
x3 = s
x4 = t
is a solution to the system
x1 - 2x2 + 3x3 + x4 = -3
2x1 - x2 + 3x3 - x4 = 0
Solution ► Simply substitute these values of x1, x2, x3, and x4 in each equation.
x1 - 2x2 + 3x3 + x4 = (t - s + 1) - 2(t + s + 2) + 3s + t = -3
2x1 - x2 + 3x3 - x4 = 2(t - s + 1) - (t + s + 2) + 3s - t = 0
Because both equations are satisfied, it is a solution for all choices of s and t.
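Example 1 can also be machine-checked. The Python sketch below (ours, not the text's) substitutes the parametric formulas for a grid of integer values of s and t; the algebraic argument above, of course, covers all real values.

```python
# Spot-check the parametric solution of Example 1 for several (s, t) pairs.
def check(s, t):
    x1, x2, x3, x4 = t - s + 1, t + s + 2, s, t
    return (x1 - 2*x2 + 3*x3 + x4 == -3) and (2*x1 - x2 + 3*x3 - x4 == 0)

print(all(check(s, t) for s in range(-3, 4) for t in range(-3, 4)))  # True
```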
The quantities s and t in Example 1 are called parameters, and the set of
solutions, described in this way, is said to be given in parametric form and
is called the general solution to the system. It turns out that the solutions to
every system of equations (if there are solutions) can be given in parametric
form (that is, the variables x1, x2, … are given in terms of new independent
variables s, t, etc.). The following example shows how this happens in the
simplest systems where only one equation is present.
EXAMPLE 2
Describe all solutions to 3x - y + 2z = 6 in parametric form.

Solution ► Solving the equation for y gives y = 3x + 2z - 6, where x and z are arbitrary. Hence, setting x = s and z = t, the solutions are x = s, y = 3s + 2t - 6, z = t, where s and t are arbitrary parameters.
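One valid parametrization of this equation solves for y, leaving x and z as parameters s and t. The Python sketch below is an illustration of ours (not taken from the text) that spot-checks this family of solutions.

```python
# Parametric form: choose x = s and z = t freely; then y = 3s + 2t - 6.
def solution(s, t):
    return s, 3*s + 2*t - 6, t

# Every choice of parameters satisfies 3x - y + 2z = 6.
print(all(3*x - y + 2*z == 6
          for s in range(-2, 3) for t in range(-2, 3)
          for (x, y, z) in [solution(s, t)]))  # True
```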
When only two variables are involved, the solutions to systems of linear equations can be described geometrically because the graph of a linear equation ax + by = c is a straight line if a and b are not both zero. Moreover, a point P(s, t) with coordinates s and t lies on the line if and only if as + bt = c—that is, when x = s, y = t is a solution to the equation. Hence the solutions to a system of linear equations correspond to the points P(s, t) that lie on all the lines in question. In particular, if the system consists of just one equation, there must be infinitely many solutions because there are infinitely many points on a line. If the system has two equations, there are three possibilities for the corresponding straight lines:

1. The lines intersect at a single point. Then the system has a unique solution corresponding to that point.
2. The lines are parallel (and distinct) and so do not intersect. Then the system has no solution.
3. The lines are identical. Then the system has infinitely many solutions—one for each point on the (common) line.

These three situations are illustrated in Figure 1. In each case the graphs of two specific lines are plotted and the corresponding equations are indicated. In the last case, the equations are 3x - y = 4 and -6x + 2y = -8, which have identical graphs.

[Figure 1: (a) Unique solution (x = 2, y = 1): the lines x - y = 1 and x + y = 3 intersect at P(2, 1). (b) No solution: the lines x + y = 4 and x + y = 2 are parallel. (c) Infinitely many solutions (x = t, y = 3t - 4): the lines 3x - y = 4 and -6x + 2y = -8 coincide.]

When three variables are present, the graph of an equation ax + by + cz = d can be shown to be a plane (see Section 4.2) and so again provides a “picture” of the set of solutions. However, this graphical method has its limitations: When more than three variables are involved, no physical image of the graphs (called hyperplanes) is possible. It is necessary to turn to a more “algebraic” method of solution.

Before describing the method, we introduce a concept that simplifies the computations involved. Consider the following system

    3x1 + 2x2 -  x3 +  x4 = -1
    2x1       -  x3 + 2x4 =  0
    3x1 +  x2 + 2x3 + 5x4 =  2

of three equations in four variables. The array of numbers¹

    [ 3  2  -1  1 | -1 ]
    [ 2  0  -1  2 |  0 ]
    [ 3  1   2  5 |  2 ]

occurring in the system is called the augmented matrix of the system. Each
row of the matrix consists of the coefficients of the variables (in order) from the
corresponding equation, together with the constant term. For clarity, the constants
are separated by a vertical line. The augmented matrix is just a different way of
describing the system of equations. The array of coefficients of the variables

    [ 3  2  -1  1 ]
    [ 2  0  -1  2 ]
    [ 3  1   2  5 ]

is called the coefficient matrix of the system.
¹ A rectangular array of numbers is called a matrix. Matrices will be discussed in more detail in Chapter 2.
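As an aside, the augmented matrix above is easy to build computationally. The NumPy sketch below is our illustration (the text itself uses no software): it appends the constant column b to the coefficient matrix A.

```python
import numpy as np

# Coefficient matrix A and constant column b for the system
#   3x1 + 2x2 -  x3 +  x4 = -1
#   2x1       -  x3 + 2x4 =  0
#   3x1 +  x2 + 2x3 + 5x4 =  2
A = np.array([[3, 2, -1, 1],
              [2, 0, -1, 2],
              [3, 1,  2, 5]])
b = np.array([[-1], [0], [2]])

# The augmented matrix is A with b appended as an extra column.
augmented = np.hstack([A, b])
print(augmented.shape)  # (3, 5)
```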
Elementary Operations
The algebraic method for solving systems of linear equations is described as follows.
Two such systems are said to be equivalent if they have the same set of solutions. A
system is solved by writing a series of systems, one after the other, each equivalent
to the previous system. Each of these systems has the same set of solutions as the
original one; the aim is to end up with a system that is easy to solve. Each system in
the series is obtained from the preceding system by a simple manipulation chosen so
that it does not change the set of solutions.
As an illustration, we solve the system x + 2y = -2, 2x + y = 7 in this manner.
At each stage, the corresponding augmented matrix is displayed. The original
system is

    x + 2y = -2        [ 1  2 | -2 ]
    2x +  y =  7       [ 2  1 |  7 ]

First, subtract twice the first equation from the second. The resulting system is

    x + 2y = -2        [ 1   2 | -2 ]
       -3y = 11        [ 0  -3 | 11 ]

which is equivalent to the original (see Theorem 1). At this stage we obtain
y = -11/3 by multiplying the second equation by -1/3. The result is the
equivalent system

    x + 2y = -2        [ 1  2 |    -2 ]
         y = -11/3     [ 0  1 | -11/3 ]

Finally, we subtract twice the second equation from the first to get another
equivalent system.

    x =  16/3          [ 1  0 |  16/3 ]
    y = -11/3          [ 0  1 | -11/3 ]

Now this system is easy to solve! And because it is equivalent to the original system,
it provides the solution to that system.
Observe that, at each stage, a certain operation is performed on the system (and
thus on the augmented matrix) to produce an equivalent system.
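The same sequence of operations can be carried out in exact arithmetic. The following Python sketch is ours (the standard-library Fraction class is an addition, not something the text uses); it reproduces the three steps and recovers x = 16/3, y = -11/3.

```python
from fractions import Fraction

# Augmented matrix of  x + 2y = -2,  2x + y = 7  (exact arithmetic).
M = [[Fraction(1), Fraction(2), Fraction(-2)],
     [Fraction(2), Fraction(1), Fraction(7)]]

# Subtract twice row 0 from row 1.
M[1] = [a - 2*b for a, b in zip(M[1], M[0])]
# Multiply row 1 by -1/3.
M[1] = [Fraction(-1, 3) * a for a in M[1]]
# Subtract twice row 1 from row 0.
M[0] = [a - 2*b for a, b in zip(M[0], M[1])]

print(M[0][2], M[1][2])  # 16/3 -11/3
```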
Definition 1.1 The following operations, called elementary operations, can routinely be performed
on systems of linear equations to produce equivalent systems.
(I) Interchange two equations.
(II) Multiply one equation by a nonzero number.
(III) Add a multiple of one equation to a different equation.
Theorem 1
Suppose that a sequence of elementary operations is performed on a system of linear equations. Then the resulting system has the same set of solutions as the original.
Definition 1.2 The following are called elementary row operations on a matrix.
(I) Interchange two rows.
(II) Multiply one row by a nonzero number.
(III) Add a multiple of one row to a different row.
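For readers who want to experiment, the three elementary row operations can be written as small functions. The NumPy sketch below is illustrative only, and the function names are our own.

```python
import numpy as np

# The three elementary row operations, each acting on a copy of M.
def swap(M, i, j):             # (I) interchange rows i and j
    M = M.copy(); M[[i, j]] = M[[j, i]]; return M

def scale(M, i, k):            # (II) multiply row i by a nonzero number k
    M = M.copy(); M[i] = k * M[i]; return M

def add_multiple(M, i, j, k):  # (III) add k times row j to row i
    M = M.copy(); M[i] = M[i] + k * M[j]; return M

M = np.array([[1.0, 2.0, -2.0],
              [2.0, 1.0,  7.0]])
print(add_multiple(M, 1, 0, -2))  # second row becomes [0, -3, 11]
```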
In the illustration above, a series of such operations led to a matrix of the form
1 0 ∗
0 1 ∗
where the asterisks represent arbitrary numbers. In the case of three equations in
three variables, the goal is to produce a matrix of the form
1 0 0 ∗
0 1 0 ∗
0 0 1 ∗
This does not always happen, as we will see in the next section. Here is an example
in which it does happen.
EXAMPLE 3
Find all solutions to the following system of equations.

    3x + 4y + z =  1
    2x + 3y     =  0
    4x + 3y - z = -2

Solution ► The augmented matrix of the system is

    [ 3  4   1 |  1 ]
    [ 2  3   0 |  0 ]
    [ 4  3  -1 | -2 ]

To create a 1 in the upper left corner, subtract row 2 from row 1 to get

    [ 1  1   1 |  1 ]
    [ 2  3   0 |  0 ]
    [ 4  3  -1 | -2 ]

The upper left 1 is now used to “clean up” the first column, that is create zeros
in the other positions in that column. First subtract 2 times row 1 from row 2
to obtain

    [ 1  1   1 |  1 ]
    [ 0  1  -2 | -2 ]
    [ 4  3  -1 | -2 ]

Then subtract 4 times row 1 from row 3 to complete the first column, and use
row 2 in the same way to clean up the second column (subtract row 2 from
row 1, and add row 2 to row 3):

    [ 1  0   3 |  3 ]
    [ 0  1  -2 | -2 ]
    [ 0  0  -7 | -8 ]

Multiplying row 3 by -1/7 gives

    [ 1  0   3 |   3 ]
    [ 0  1  -2 |  -2 ]
    [ 0  0   1 | 8/7 ]

Now subtract 3 times row 3 from row 1, and add 2 times row 3 to row 2 to get

    [ 1  0  0 | -3/7 ]
    [ 0  1  0 |  2/7 ]
    [ 0  0  1 |  8/7 ]

The corresponding equations are x = -3/7, y = 2/7, and z = 8/7, which give the
(unique) solution.
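As a sanity check on Example 3 (ours, not part of the text), a linear-system solver should return the same unique solution.

```python
import numpy as np

# Coefficient matrix and constants of Example 3.
A = np.array([[3.0, 4.0,  1.0],
              [2.0, 3.0,  0.0],
              [4.0, 3.0, -1.0]])
b = np.array([1.0, 0.0, -2.0])

x = np.linalg.solve(A, b)
print(np.allclose(x, [-3/7, 2/7, 8/7]))  # True
```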
    [ R1 ]    [ R1       ]    [ R1               ]    [ R1 ]
    [ R2 ] →  [ R2       ] →  [ R2               ] =  [ R2 ]
    [ R3 ]    [ R3 + kR2 ]    [ (R3 + kR2) - kR2 ]    [ R3 ]
    [ R4 ]    [ R4       ]    [ R4               ]    [ R4 ]
The existence of inverses for elementary row operations, and hence for
elementary operations on a system of equations, gives:
PROOF OF THEOREM 1
Suppose that a system of linear equations is transformed into a new system by a
sequence of elementary operations. Then every solution of the original system
is automatically a solution of the new system because adding equations, or
multiplying an equation by a nonzero number, always results in a valid equation.
In the same way, each solution of the new system must be a solution to the
original system because the original system can be obtained from the new one
by another series of elementary operations (the inverses of the originals). It
follows that the original and new systems have the same solutions. This proves
Theorem 1.
EXERCISES 1.1
1. In each case verify that the following are solutions for all values of s and t.

(a) x = 19t - 35, y = 25 - 13t, z = t
    is a solution of
        2x + 3y +  z = 5
        5x + 7y - 4z = 0

(b) x1 = 2s + 12t + 13, x2 = s, x3 = -s - 3t - 3, x4 = t
    is a solution of
        2x1 + 5x2 + 9x3 + 3x4 = -1
         x1 + 2x2 + 4x3       =  1

2. Find all solutions to the following in parametric form in two ways.

(a) 3x + y = 2           (b) 2x + 3y = 1
(c) 3x - y + 2z = 5      (d) x - 2y + 5z = 1

3. Regarding 2x = 5 as the equation 2x + 0y = 5 in two variables, find all solutions in parametric form.

4. Regarding 4x - 2y = 3 as the equation 4x - 2y + 0z = 3 in three variables, find all solutions in parametric form.

5. Find all solutions to the general system ax = b of one equation in one variable (a) when a = 0 and (b) when a ≠ 0.

6. Show that a system consisting of exactly one linear equation can have no solution, one solution, or infinitely many solutions. Give examples.

7. Write the augmented matrix for each of the following systems of linear equations.

(a) x - 3y = 5           (b) x + 2y = 0
    2x + y = 1                   y = 1

(c) x - y + z = 2        (d) x + y = 1
    x     - z = 1                y + z = 0
    y + 2x    = 0                z - x = 2
2 A indicates that the exercise has an answer at the end of the book.
8 Chapter 1 Systems of Linear Equations
8. Write a system of linear equations that has each of the following augmented matrices.

(a)  1 -1  6  0        (b)  2 -1  0 -1
     0  1  0  3            -3  2  1  0
     2 -1  0  1             0  1  1  3

9. Find the solution of each of the following systems of linear equations using augmented matrices.

(a)  x - 3y = 1        (b)  x + 2y =  1
    2x - 7y = 3            3x + 4y = -1

(c) 2x + 3y = -1       (d) 3x + 4y =  1
    3x + 4y =  2           4x + 5y = -3

10. Find the solution of each of the following systems of linear equations using augmented matrices.

(a)  x +  y + 2z = -1  (b) 2x +  y +  z = -1
    2x +  y + 3z =  0       x + 2y +  z =  0
       - 2y +  z =  2      3x      - 2z =  5

11. Find all solutions (if any) of the following systems of linear equations.

(a)   3x - 2y =   5    (b)   3x - 2y =  5
    -12x + 8y = -20        -12x + 8y = 16

12. Show that the system

     x + 2y -  z = a
    2x +  y + 3z = b
     x - 4y + 9z = c

is inconsistent unless c = 2b - 3a.

13. By examining the possible positions of lines in the plane, show that two equations in two variables can have zero, one, or infinitely many solutions.

14. In each case either show that the statement is true, or give an example³ showing it is false.

(a) If a linear system has n variables and m equations, then the augmented matrix has n rows.
(b) A consistent linear system must have infinitely many solutions.
(c) If a row operation is done to a consistent linear system, the resulting system must be consistent.
(d) If a series of row operations on a linear system results in an inconsistent system, the original system is inconsistent.

15. Find a quadratic a + bx + cx2 such that the graph of y = a + bx + cx2 contains each of the points (-1, 6), (2, 0), and (3, 2).

16. Solve the system
        3x + 2y = 5
        7x + 5y = 1
    by changing variables
        x =  5x' - 2y'
        y = -7x' + 3y'
    and solving the resulting equations for x' and y'.

17. Find a, b, and c such that

      x2 - x + 3        ax + b       c
    ---------------  =  ------  +  ------
    (x2 + 2)(2x - 1)    x2 + 2     2x - 1

    [Hint: Multiply through by (x2 + 2)(2x - 1) and equate coefficients of powers of x.]

18. A zookeeper wants to give an animal 42 mg of vitamin A and 65 mg of vitamin D per day. He has two supplements: the first contains 10% vitamin A and 25% vitamin D; the second contains 20% vitamin A and 25% vitamin D. How much of each supplement should he give the animal each day?

19. Workmen John and Joe earn a total of $24.60 when John works 2 hours and Joe works 3 hours. If John works 3 hours and Joe works 2 hours, they get $23.90. Find their hourly rates.

20. A biologist wants to create a diet from fish and meal containing 183 grams of protein and 93 grams of carbohydrate per day. If fish contains 70% protein and 10% carbohydrate, and meal contains 30% protein and 60% carbohydrate, how much of each food is required each day?
³ Such an example is called a counterexample. For example, if the statement is that "all philosophers have beards", the existence of a non-bearded philosopher would be a counterexample proving that the statement is false. This is discussed again in Appendix B.
SECTION 1.2 Gaussian Elimination 9
Definition 1.3 A matrix is said to be in row-echelon form (and will be called a row-echelon
matrix) if it satisfies the following three conditions:
1. All zero rows (consisting entirely of zeros) are at the bottom.
2. The first nonzero entry from the left in each nonzero row is a 1, called the
leading 1 for that row.
3. Each leading 1 is to the right of all leading 1s in the rows above it.
A row-echelon matrix is said to be in reduced row-echelon form (and will be called
a reduced row-echelon matrix) if, in addition, it satisfies the following condition:
4. Each leading 1 is the only nonzero entry in its column.
EXAMPLE 1
The following matrices are in row-echelon form (for any choice of numbers in
∗-positions).
    0 1 * *      1 * *      1 * * *      1 * *
    0 0 1 *      0 0 1      0 1 * *      0 1 *
    0 0 0 0                 0 0 0 1      0 0 1
Theorem 1

Every matrix can be brought to (reduced) row-echelon form by a sequence of elementary row operations.

The following procedure shows how.

Gaussian⁴ Algorithm⁵
Step 1. If the matrix consists entirely of zeros, stop—it is already in row-echelon form.
Step 2. Otherwise, find the first column from the left containing a nonzero entry (call
it a), and move the row containing that entry to the top position.
Step 3. Now multiply the new top row by 1/a to create a leading 1.
Step 4. By subtracting multiples of that row from rows below it, make each entry
below the leading 1 zero.
This completes the first row, and all further row operations are carried out on the
remaining rows.
Step 5. Repeat steps 1–4 on the matrix consisting of the remaining rows.
The process stops when either no rows remain at step 5 or the remaining rows consist
entirely of zeros.
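The steps above translate almost line for line into code. Here is a Python sketch of the algorithm in exact fractions, applied to the augmented matrix of the system 3x + y - 4z = -1, x + 10z = 5, 4x + y + 6z = 1 (the system of Example 2 below):

```python
from fractions import Fraction

def row_echelon(M):
    """Carry M to row-echelon form by Steps 1-4, repeated on the
    remaining rows (exact rational arithmetic)."""
    M = [[Fraction(e) for e in row] for row in M]
    rows, cols = len(M), len(M[0])
    top = 0                                   # first "remaining" row
    for j in range(cols):                     # Step 2: scan columns left to right
        pivot = next((r for r in range(top, rows) if M[r][j] != 0), None)
        if pivot is None:
            continue                          # column of zeros: move on
        M[top], M[pivot] = M[pivot], M[top]   # move that row to the top
        M[top] = [e / M[top][j] for e in M[top]]      # Step 3: leading 1
        for r in range(top + 1, rows):                # Step 4: zeros below
            M[r] = [a - M[r][j] * b for a, b in zip(M[r], M[top])]
        top += 1
        if top == rows:
            break                             # Step 5: no rows remain
    return M

R = row_echelon([[3, 1, -4, -1], [1, 0, 10, 5], [4, 1, 6, 1]])
```

The last row of the result is [0 0 0 1], corresponding to the impossible equation 0 = 1, so this particular system has no solution.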
Carl Friedrich Gauss. Photo © Corbis.
Observe that the gaussian algorithm is recursive: When the first leading 1 has
been obtained, the procedure is repeated on the remaining rows of the matrix. This
makes the algorithm easy to use on a computer. Note that the solution to Example 3
in Section 1.1 did not use the gaussian algorithm as written, because the first
leading 1 was not created by dividing row 1 by 3; this was done to avoid fractions.
However, the general pattern is clear: Create the leading 1s from left to right, using
each of them in turn to create zeros below it. Here are two more examples.
4 Carl Friedrich Gauss (1777–1855) ranks with Archimedes and Newton as one of the three greatest mathematicians of all time. He
was a child prodigy and, at the age of 21, he gave the first proof that every polynomial has a complex root. In 1801 he published
a timeless masterpiece, Disquisitiones Arithmeticae, in which he founded modern number theory. He went on to make ground-
breaking contributions to nearly every branch of mathematics, often well before others rediscovered and published the results.
5 The algorithm was known to the ancient Chinese.
EXAMPLE 2
Solve the following system of equations.
3x + y - 4z = -1
x + 10z = 5
4x + y + 6z = 1
EXAMPLE 3
Solve the following system of equations.
x1 - 2x2 - x3 + 3x4 = 1
2x1 - 4x2 + x3 =5
x1 - 2x2 + 2x3 - 3x4 = 4
and t. Moreover, every choice of these parameters leads to a solution to the system,
and every solution arises in this way. This procedure works in general, and has come
to be called
Gaussian Elimination
EXAMPLE 4
Find a condition on the numbers a, b, and c such that the following system of
equations is consistent. When that condition is satisfied, find all solutions (in
terms of a, b, and c).
x1 + 3x2 + x3 = a
-x1 - 2x2 + x3 = b
3x1 + 7x2 - x3 = c
Solution ► We use gaussian elimination except that now the augmented matrix
1 3 1 a
−1 −2 1 b
3 7 −1 c
has entries a, b, and c as well as known numbers. The first leading one is in
place, so we create zeros below it in column 1:
1 3 1 a
0 1 2 a+b
0 −2 − 4 c − 3a
6 With n equations where n is large, gaussian elimination requires roughly n³/2 multiplications and divisions, whereas this number is roughly n³/3 if back substitution is used.
The second leading 1 has appeared, so use it to create zeros in the rest of
column 2:
1 0 −5 −2a − 3b
0 1 2 a+b
0 0 0 c − a + 2b
Now the whole solution depends on the number c - a + 2b = c - (a - 2b).
The last row corresponds to an equation 0 = c - (a - 2b). If c ≠ a - 2b, there
is no solution (just as in Example 2). Hence:
The system is consistent if and only if c = a - 2b.
In this case the last matrix becomes
1 0 −5 −2a − 3b
0 1 2 a+b
0 0 0 0
Thus, if c = a - 2b, taking x3 = t where t is a parameter gives the solutions
x1 = 5t - (2a + 3b) x2 = (a + b) - 2t x3 = t.
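The condition c = a - 2b and the solution formulas can be spot-checked numerically. In this Python sketch, random integers a, b, and t are chosen, c is set to a - 2b, and the formulas are substituted into all three equations:

```python
from random import randint

# Spot check: with c = a - 2b, the formulas
#   x1 = 5t - (2a + 3b),  x2 = (a + b) - 2t,  x3 = t
# satisfy all three equations for every choice of t.
for _ in range(100):
    a, b, t = randint(-9, 9), randint(-9, 9), randint(-9, 9)
    c = a - 2 * b
    x1, x2, x3 = 5 * t - (2 * a + 3 * b), (a + b) - 2 * t, t
    assert x1 + 3 * x2 + x3 == a
    assert -x1 - 2 * x2 + x3 == b
    assert 3 * x1 + 7 * x2 - x3 == c
```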
Rank
It can be proven that the reduced row-echelon form of a matrix A is uniquely
determined by A. That is, no matter which series of row operations is used to
carry A to a reduced row-echelon matrix, the result will always be the same matrix.
(A proof is given at the end of Section 2.5.) By contrast, this is not true for row-echelon matrices: Different series of row operations can carry the same matrix A to different row-echelon matrices. Indeed, the matrix

    A = 1 -1 4
        2 -1 2

can be carried (by one row operation) to the row-echelon matrix

    1 -1  4
    0  1 -6

and then by another row operation to the (reduced) row-echelon matrix

    1 0 -2
    0 1 -6

However, it is true that the number r of leading 1s must be the same in each of these row-echelon matrices (this will be proved in Chapter 5). Hence, the number r depends only on A and not on the way in which A is carried to row-echelon form.
Definition 1.4 The rank of a matrix A is the number of leading 1s in any row-echelon matrix to which A can be carried by row operations.
EXAMPLE 5
Compute the rank of

        1 1 -1 4
    A = 2 1  3 0
        0 1 -5 8
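The rank can also be computed numerically. NumPy's matrix_rank uses the singular value decomposition rather than row reduction, but since the count of leading 1s depends only on A, the two answers must agree; for this matrix the rank is 2 (the value used again in Example 6 below):

```python
import numpy as np

# The matrix of Example 5.
A = np.array([[1, 1, -1, 4],
              [2, 1, 3, 0],
              [0, 1, -5, 8]])

# matrix_rank computes the rank via the SVD, not row reduction,
# but it must agree with the leading-1 count of any row-echelon form.
r = np.linalg.matrix_rank(A)
print(r)  # 2
```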
Suppose that rank A = r, where A is a matrix with m rows and n columns. Then
r ≤ m because the leading 1s lie in different rows, and r ≤ n because the leading 1s
lie in different columns. Moreover, the rank has a useful application to equations.
Recall that a system of linear equations is called consistent if it has at least one
solution.
Theorem 2
Suppose a system of m equations in n variables is consistent, and that the rank of the
augmented matrix is r.
(1) The set of solutions involves exactly n - r parameters.
(2) If r < n, the system has infinitely many solutions.
(3) If r = n, the system has a unique solution.
PROOF
The fact that the rank of the augmented matrix is r means there are exactly
r leading variables, and hence exactly n - r nonleading variables. These
nonleading variables are all assigned as parameters in the gaussian algorithm, so
the set of solutions involves exactly n - r parameters. Hence if r < n, there is
at least one parameter, and so infinitely many solutions. If r = n, there are no
parameters and so a unique solution.
Theorem 2 shows that, for any system of linear equations, exactly three
possibilities exist:
1. No solution. This occurs when a row [0 0 ⋯ 0 1] occurs in the row-echelon
form. This is the case where the system is inconsistent.
2. Unique solution. This occurs when every variable is a leading variable.
3. Infinitely many solutions. This occurs when the system is consistent and there
is at least one nonleading variable, so at least one parameter is involved.
EXAMPLE 6
Suppose the matrix A in Example 5 is the augmented matrix of a system
of m = 3 linear equations in n = 3 variables. As rank A = r = 2, the set of
solutions will have n - r = 1 parameter. The reader can verify this fact directly.
EXERCISES 1.2
(h) The rank of A is at most 3.
(i) If rank A = 3, the system is consistent.
(j) If rank C = 3, the system is consistent.

13. Find a sequence of row operations carrying

    b1 + c1  b2 + c2  b3 + c3         a1 a2 a3
    c1 + a1  c2 + a2  c3 + a3   to    b1 b2 b3
    a1 + b1  a2 + b2  a3 + b3         c1 c2 c3

14. In each case, show that the reduced row-echelon form is as given.

(a) p 0 a                       1 0 0
    b 0 0   with abc ≠ 0;       0 1 0
    q c r                       0 0 1

(b) 1 a b+c                            1 0 *
    1 b c+a   where c ≠ a or b ≠ a;    0 1 *
    1 c a+b                            0 0 0

15. Show that
         ax +  by +  cz = 0
        a1x + b1y + c1z = 0
    always has a solution other than x = 0, y = 0, z = 0.

16. Find the circle x2 + y2 + ax + by + c = 0 passing through the following points.

(a) (-2, 1), (5, 0), and (4, 1)
(b) (1, 1), (5, -3), and (-3, -3)

17. Three Nissans, two Fords, and four Chevrolets can be rented for $106 per day. At the same rates two Nissans, four Fords, and three Chevrolets cost $107 per day, whereas four Nissans, three Fords, and two Chevrolets cost $102 per day. Find the rental rates for all three kinds of cars.

18. A school has three clubs and each student is required to belong to exactly one club. One year the students switched club membership as follows:

    Club A: 4/10 remain in A, 1/10 switch to B, 5/10 switch to C.
    Club B: 7/10 remain in B, 2/10 switch to A, 1/10 switch to C.
    Club C: 6/10 remain in C, 2/10 switch to A, 2/10 switch to B.

If the fraction of the student population in each club is unchanged, find each of these fractions.

19. Given points (p1, q1), (p2, q2), and (p3, q3) in the plane with p1, p2, and p3 distinct, show that they lie on some curve with equation y = a + bx + cx2. [Hint: Solve for a, b, and c.]

20. The scores of three players in a tournament have been lost. The only information available is the total of the scores for players 1 and 2, the total for players 2 and 3, and the total for players 3 and 1.

(a) Show that the individual scores can be rediscovered.
(b) Is this possible with four players (knowing the totals for players 1 and 2, 2 and 3, 3 and 4, and 4 and 1)?

21. A boy finds $1.05 in dimes, nickels, and pennies. If there are 17 coins in all, how many coins of each type can he have?

22. If a consistent system has more variables than equations, show that it has infinitely many solutions. [Hint: Use Theorem 2.]
EXAMPLE 1
Show that the following homogeneous system has nontrivial solutions.
x1 - x2 + 2x3 - x4 = 0
2x1 + 2x2 + x4 = 0
3x1 + x2 + 2x3 - x4 = 0
Theorem 1
If a homogeneous system of linear equations has more variables than equations, then it
has a nontrivial solution (in fact, infinitely many).
PROOF
Suppose there are m equations in n variables where n > m, and let R denote
the reduced row-echelon form of the augmented matrix. If there are r leading
variables, there are n - r nonleading variables, and so n - r parameters. Hence,
it suffices to show that r < n. But r ≤ m because R has r leading 1s and m rows,
and m < n by hypothesis. So r ≤ m < n, which gives r < n.
Note that the converse of Theorem 1 is not true: if a homogeneous system has
nontrivial solutions, it need not have more variables than equations (the system
x1 + x2 = 0, 2x1 + 2x2 = 0 has nontrivial solutions but m = 2 = n.)
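Theorem 1 is easy to illustrate on a computer. A random homogeneous system with m = 3 equations in n = 5 variables must have nontrivial solutions; the sketch below finds a basis of the whole solution set using NumPy's singular value decomposition (the rows of V associated with zero singular values span the solutions):

```python
import numpy as np

# A random 3 x 5 integer coefficient matrix: 3 equations, 5 variables.
rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(3, 5)).astype(float)

# The rows of Vt beyond the rank span the null space of A,
# i.e. the solution set of the homogeneous system A x = 0.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:]          # at least 5 - 3 = 2 rows, all nonzero

x = null_basis[0]               # one nontrivial solution
assert np.allclose(A @ x, 0) and np.linalg.norm(x) > 0
```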
Theorem 1 is very useful in applications. The next example provides an
illustration from geometry.
EXAMPLE 2
We call the graph of an equation ax2 + bxy + cy2 + dx + ey + f = 0 a conic if
the numbers a, b, and c are not all zero. Show that there is at least one conic
through any five points in the plane that are not all on a line.
Solution ► Let the coordinates of the five points be (p1, q1), (p2, q2), (p3, q3),
(p4, q4), and (p5, q5). The graph of ax2 + bxy + cy2 + dx + ey + f = 0 passes
through (pi, qi) if
api2 + bpiqi + cqi2 + dpi + eqi + f = 0
This gives five equations, one for each i, linear in the six variables a, b, c, d, e,
and f. Hence, there is a nontrivial solution by Theorem 1. If a = b = c = 0, the
five points all lie on the line dx + ey + f = 0, contrary to assumption. Hence,
one of a, b, c is nonzero.
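Here is a computational sketch of the argument. The five points are chosen for illustration (they happen to lie on the unit circle, so the conic found is that circle); the 5 × 6 homogeneous system has a nontrivial solution, exactly as Theorem 1 predicts:

```python
import numpy as np

# Five illustrative points (they lie on the unit circle).
pts = [(1, 0), (0, 1), (-1, 0), (0, -1), (0.6, 0.8)]

# One row per point: coefficients of (a, b, c, d, e, f) in
# a p^2 + b p q + c q^2 + d p + e q + f = 0.
M = np.array([[p * p, p * q, q * q, p, q, 1.0] for p, q in pts])

# Five equations in six unknowns: a nontrivial solution exists.
# The last row of Vt is a null vector of M.
_, s, Vt = np.linalg.svd(M)
coef = Vt[-1]
a, b, c, d, e, f = coef
assert np.linalg.norm(M @ coef) < 1e-10
```

For these points the computed coefficients are proportional to (1, 0, 1, 0, 0, -1), i.e. the circle x2 + y2 - 1 = 0.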
EXAMPLE 3

If

    x =  3      and      y = -1
        -2                    1

then

    2x + 5y =  6   +  -5   =  1
              -4       5      1
EXAMPLE 4
ST ST ST S T ST
1 2 3 0 1
Let x = 0 , y = 1 and z = 1 . If v = -1 and w = 1 ,
1 0 1 2 1
determine whether v and w are linear combinations of x and y.
Solution ► For v, we must determine whether numbers r, s, and t exist such that
v = rx + sy + tz, that is, whether
SECTION 1.3 Homogeneous Equations 21
     0         1        2        3     r + 2s + 3t
    -1  =  r   0  + s   1  + t   1  =      s + t
     2         1        0        1         r + t

Equating corresponding entries gives a system of linear equations
r + 2s + 3t = 0, s + t = -1, and r + t = 2 for r, s, and t. By gaussian
elimination, the solution is r = 2 - k, s = -1 - k, and t = k where k is a
parameter. Taking k = 0, we see that v = 2x - y is indeed a linear combination
of x, y, and z.

Turning to w, we again look for r, s, and t such that w = rx + sy + tz; that is,

    1         1        2        3     r + 2s + 3t
    1  =  r   0  + s   1  + t   1  =      s + t
    1         1        0        1         r + t

leading to equations r + 2s + 3t = 1, s + t = 1, and r + t = 1 for real numbers
r, s, and t. But this time there is no solution as the reader can verify, so w is not
a linear combination of x, y, and z.
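Consistency questions like these can also be settled by comparing ranks: w = rx + sy + tz has a solution exactly when adjoining w as an extra column does not raise the rank of the coefficient matrix. A NumPy sketch, using the columns of this example:

```python
import numpy as np

# Columns of M are x, y, z from the example.
M = np.array([[1, 2, 3],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
v = np.array([0.0, -1.0, 2.0])
w = np.array([1.0, 1.0, 1.0])

def is_combination(M, b):
    # The system M [r, s, t]^T = b is consistent exactly when adjoining
    # b as a new column does not raise the rank.
    return np.linalg.matrix_rank(np.column_stack([M, b])) == np.linalg.matrix_rank(M)

print(is_combination(M, v), is_combination(M, w))  # True False
```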
Our interest in linear combinations comes from the fact that they provide one
of the best ways to describe the general solution of a homogeneous system of linear
equations. When solving such a system with n variables x1, x2, …, xn, write the
variables as a column matrix

        x1
    x = x2
        ⋮
        xn

and denote the trivial solution by 0, the column of n zeros. As an illustration, the
general solution in Example 1 is x1 = -t, x2 = t, x3 = t, and x4 = 0, where t is a
parameter, and we would now express this by saying that the general solution is

        -t
    x =  t
         t
         0

where t is arbitrary.
Now let x and y be two solutions to a homogeneous system with n variables.
Then any linear combination sx + ty of these solutions turns out to be again a
solution to the system. More generally:
Any linear combination of solutions to a homogeneous system is again a solution. (∗)
In fact, suppose that a typical equation in the system is
a1x1 + a2x2 + ⋯ + anxn = 0, and suppose that

        x1              y1
    x = x2    and   y = y2
        ⋮               ⋮
        xn              yn

are solutions. Then a1x1 + a2x2 + ⋯ + anxn = 0 and a1y1 + a2y2 + ⋯ + anyn = 0.
Hence

              sx1 + ty1
    sx + ty = sx2 + ty2
              ⋮
              sxn + tyn

is also a solution because

    a1(sx1 + ty1) + a2(sx2 + ty2) + ⋯ + an(sxn + tyn)
        = s(a1x1 + a2x2 + ⋯ + anxn) + t(a1y1 + a2y2 + ⋯ + anyn)
        = s·0 + t·0
        = 0.
EXAMPLE 5

Solve the homogeneous system with coefficient matrix

         1 -2 3 -2
    A = -3  6 1  0
        -2  4 4 -2

Solution ► The reduction of the augmented matrix to reduced form is

     1 -2 3 -2 0        1 -2 0 -1/5 0
    -3  6 1  0 0   →    0  0 1 -3/5 0
    -2  4 4 -2 0        0  0 0  0   0

so the solutions are x1 = 2s + (1/5)t, x2 = s, x3 = (3/5)t, and x4 = t by gaussian
elimination. Hence we can write the general solution x in the matrix form

        x1     2s + (1/5)t         2        1/5
    x = x2  =  s             = s   1  + t   0    = s·x1 + t·x2
        x3     (3/5)t              0        3/5
        x4     t                   0        1

where

         2             1/5
    x1 = 1    and x2 = 0
         0             3/5
         0             1

are particular solutions determined by the gaussian algorithm.
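The two solutions x1 and x2 found here, and the principle (∗) above, can be verified in exact arithmetic:

```python
from fractions import Fraction

# Coefficient matrix and the two solutions from Example 5.
A = [[1, -2, 3, -2],
     [-3, 6, 1, 0],
     [-2, 4, 4, -2]]
x1 = [Fraction(2), Fraction(1), Fraction(0), Fraction(0)]
x2 = [Fraction(1, 5), Fraction(0), Fraction(3, 5), Fraction(1)]

# Each solution satisfies every equation of the homogeneous system.
for sol in (x1, x2):
    assert all(sum(a * u for a, u in zip(row, sol)) == 0 for row in A)

# By (*), any linear combination s*x1 + t*x2 is again a solution.
s, t = Fraction(7), Fraction(-5)
combo = [s * u + t * v for u, v in zip(x1, x2)]
assert all(sum(a * u for a, u in zip(row, combo)) == 0 for row in A)
```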
Definition 1.5 The gaussian algorithm systematically produces solutions to any homogeneous linear
system, called basic solutions, one for every parameter.
          2        1/5           2               1
    x = s 1  +  t  0     =  s    1   + (1/5)t    0
          0        3/5           0               3
          0        1             0               5

Hence by introducing a new parameter r = t/5 we can multiply the original basic
solution x2 by 5 and so eliminate fractions. For this reason:
Any nonzero scalar multiple of a basic solution will still be called a basic solution.
In the same way, the gaussian algorithm produces basic solutions to every
homogeneous system, one for each parameter (there are no basic solutions if the
system has only the trivial solution). Moreover every solution is given by the
algorithm as a linear combination of these basic solutions (as in Example 5). If A
has rank r, Theorem 2 Section 1.2 shows that there are exactly n - r parameters,
and so n - r basic solutions. This proves:
Theorem 2

Let A be an m × n matrix of rank r, and consider the homogeneous system in n variables with A as coefficient matrix. Then:
1. The system has exactly n - r basic solutions, one for each parameter.
2. Every solution is a linear combination of these basic solutions.
EXAMPLE 6
Find basic solutions of the homogeneous system with coefficient matrix A, and
express every solution as a linear combination of the basic solutions, where
         1 -3  0 2  2
    A = -2  6  1 2 -5
         3 -9 -1 0  7
        -3  9  2 6 -8

Solution ► The reduction of the augmented matrix to reduced row-echelon
form is

     1 -3  0 2  2 0        1 -3 0 2  2 0
    -2  6  1 2 -5 0   →    0  0 1 6 -1 0
     3 -9 -1 0  7 0        0  0 0 0  0 0
    -3  9  2 6 -8 0        0  0 0 0  0 0
so the general solution is x1 = 3r - 2s - 2t, x2 = r, x3 = -6s + t, x4 = s, and
x5 = t where r, s, and t are parameters. In matrix form this is

        x1     3r - 2s - 2t        3        -2        -2
        x2     r                   1         0         0
    x = x3  =  -6s + t       = r   0  + s   -6  + t    1
        x4     s                   0         1         0
        x5     t                   0         0         1

Hence the basic solutions are

          3          -2          -2
          1           0           0
    x1 =  0    x2 =  -6    x3 =   1
          0           1           0
          0           0           1
EXERCISES 1.3
1. Consider the following statements about a system of linear equations with augmented matrix A. In each case either prove the statement or give an example for which it is false.

(a) If the system is homogeneous, every solution is trivial.
(b) If the system has a nontrivial solution, it cannot be homogeneous.
(c) If there exists a trivial solution, the system is homogeneous.
(d) If the system is consistent, it must be homogeneous.

Now assume that the system is homogeneous.

(e) If there exists a nontrivial solution, there is no trivial solution.
(f) If there exists a solution, there are infinitely many solutions.
(g) If there exist nontrivial solutions, the row-echelon form of A has a row of zeros.
(h) If the row-echelon form of A has a row of zeros, there exist nontrivial solutions.
(i) If a row operation is applied to the system, the new system is also homogeneous.

2. In each of the following, find all values of a for which the system has nontrivial solutions, and determine all solutions in each case.

(a)  x - 2y +  z = 0    (b)  x + 2y +  z = 0
     x + ay - 3z = 0         x + 3y + 6z = 0
    -x + 6y - 5z = 0        2x + 3y + az = 0

(c)  x + y -  z = 0     (d) ax + y +  z = 0
         ay -  z = 0         x + y -  z = 0
     x + y + az = 0          x + y + az = 0

3. Let

    x =  2     y = 1     z =  1
         1         0          1
        -1         1         -2

In each case, either write v as a linear combination of x, y, and z, or show that it is not such a linear combination.

(a) v =  0    (b) v =  4    (c) v = 3    (d) v = 3
         1             3            1            0
        -3            -4            0            3

4. In each case, either express y as a linear combination of a1, a2, and a3, or show that it is not such a linear combination. Here:

    a1 = -1    a2 = 3    a3 = 1
          3         1         1
          0         2         1
          1         0         1

(a) y = 1     (b) y = -1
        2              9
        4              2
        0              6

5. For each of the following homogeneous systems, find a set of basic solutions and express the general solution as a linear combination of these basic solutions.

(a)  x1 + 2x2 -  x3 + 2x4 + x5 = 0
     x1 + 2x2 + 2x3       + x5 = 0
    2x1 + 4x2 - 2x3 + 3x4 + x5 = 0

(b)  x1 + 2x2 -  x3 + x4 +  x5 = 0
    -x1 - 2x2 + 2x3      +  x5 = 0
    -x1 - 2x2 + 3x3 + x4 + 3x5 = 0
SECTION 1.4 An Application to Network Flow 25
Junction Rule
At each of the junctions in the network, the total flow into that junction must equal the
total flow out.
This requirement gives a linear equation relating the flows in conductors emanating
from the junction.
EXAMPLE 1
A network of one-way streets is shown in the accompanying diagram. The rate of flow of cars into intersection A is 500 cars per hour, and 400 and 100 cars per hour emerge from B and C, respectively. Find the possible flows along each street.

[Diagram: one-way street network with intersections A, B, C, D and flows f1, …, f6; 500 cars/h enter at A, 400 leave at B, and 100 leave at C.]

Solution ► Suppose the flows along the streets are f1, f2, f3, f4, f5, and f6 cars per hour in the directions shown. Then, equating the flow in with the flow out at each intersection, we get

    Intersection A    500 = f1 + f2 + f3
    Intersection B    f1 + f4 + f6 = 400
    Intersection C    f3 + f5 = f6 + 100
    Intersection D    f2 = f4 + f5

These give four equations in the six variables f1, f2, …, f6.

    f1 + f2 + f3           = 500
    f1      + f4      + f6 = 400
         f3      + f5 - f6 = 100
         f2 - f4 - f5      =   0

The reduction of the augmented matrix is

    1 1 1  0  0  0 500        1 0 0  1  0  1 400
    1 0 0  1  0  1 400   →    0 1 0 -1 -1  0   0
    0 0 1  0  1 -1 100        0 0 1  0  1 -1 100
    0 1 0 -1 -1  0   0        0 0 0  0  0  0   0

Hence, when we use f4, f5, and f6 as parameters, the general solution is

    f1 = 400 - f4 - f6    f2 = f4 + f5    f3 = 100 - f5 + f6

This gives all solutions to the system of equations and hence all the possible
flows.
Of course, not all these solutions may be acceptable in the real situation. For
example, the flows f1, f2, …, f6 must all be positive in the present context (if one
came out negative, it would mean traffic flowed in the opposite direction).
This imposes constraints on the flows: f1 ≥ 0 and f3 ≥ 0 become

    f4 + f6 ≤ 400        f5 - f6 ≤ 100
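The general solution can be spot-checked by machine: whatever values the parameters f4, f5, and f6 take, the formulas satisfy all four intersection equations.

```python
from random import randint

# For any parameter values f4, f5, f6, the general solution
# satisfies the flow equation at every intersection.
for _ in range(100):
    f4, f5, f6 = randint(0, 50), randint(0, 50), randint(0, 50)
    f1 = 400 - f4 - f6
    f2 = f4 + f5
    f3 = 100 - f5 + f6
    assert f1 + f2 + f3 == 500        # intersection A
    assert f1 + f4 + f6 == 400        # intersection B
    assert f3 + f5 == f6 + 100        # intersection C
    assert f2 == f4 + f5              # intersection D
```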
EXERCISES 1.4
1. Find the possible flows in each of the following networks of pipes.

(a) [network diagram with flows f1, …, f4]
(b) [network diagram with flows f1, …, f7]

2. A proposed network of irrigation canals is described in the accompanying diagram. At peak demand, the flows at interchanges A, B, C, and D are as shown.

[Diagram: canal network on interchanges A, B, C, D]

(a) Find the possible flows.
(b) If canal BC is closed, what range of flow on AD must be maintained so that no canal carries a flow of more than 30?

3. A traffic circle has five one-way streets, and vehicles enter and leave as shown in the accompanying diagram.

[Diagram: traffic circle with flows f1, …, f5 through junctions A, B, C, D, E]
Ohm’s Law
The current I and the voltage drop V across a resistance R are related by the equation
V = RI.
Kirchhoff’s Laws
1. (Junction Rule) The current flow into a junction equals the current flow out of
that junction.
2. (Circuit Rule) The algebraic sum of the voltage drops (due to resistances) around
any closed circuit of the network must equal the sum of the voltage increases
around the circuit.
EXAMPLE 1
Find the various currents in the circuit shown.

[Circuit diagram: six currents I1, …, I6 through junctions A, B, C, D, with resistances of 5, 10, and 20 ohms and sources of 5 V, 10 V, and 20 V.]

Solution ► First apply the junction rule at each of the four junctions:

    Junction A    I1 = I2 + I3
    Junction B    I6 = I1 + I5
    Junction C    I2 + I4 = I6
    Junction D    I3 + I5 = I4

Note that these equations are not independent (in fact, the third is an easy consequence of the other three).

Next, the circuit rule insists that the sum of the voltage increases (due to the sources) around a closed circuit must equal the sum of the voltage drops (due to resistances). By Ohm's law, the voltage loss across a resistance R (in the direction of the current I) is RI. Going counterclockwise around three closed circuits yields

    Upper left     10 + 5 = 20I1
    Upper right    -5 + 20 = 10I3 + 5I4
    Lower          -10 = -20I5 - 5I4

Hence, disregarding the redundant equation obtained at junction C, we have six equations in the six unknowns I1, …, I6. The solution is

    I1 = 15/20        I4 = 28/20
    I2 = -1/20        I5 = 12/20
    I3 = 16/20        I6 = 27/20

The fact that I2 is negative means, of course, that this current is in the opposite direction, with a magnitude of 1/20 amperes.
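The printed currents are easily verified against the junction rule and the two upper circuit equations (exact fractions):

```python
from fractions import Fraction

# The currents found in the example, in amperes.
I1, I2, I3 = Fraction(15, 20), Fraction(-1, 20), Fraction(16, 20)
I4, I5, I6 = Fraction(28, 20), Fraction(12, 20), Fraction(27, 20)

# Junction rule at each junction.
assert I1 == I2 + I3                 # junction A
assert I6 == I1 + I5                 # junction B
assert I2 + I4 == I6                 # junction C
assert I3 + I5 == I4                 # junction D

# Two of the circuit equations as a further check.
assert 20 * I1 == 10 + 5             # upper left loop
assert 10 * I3 + 5 * I4 == -5 + 20   # upper right loop
```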
SECTION 1.6 An Application to Chemical Reactions 29
EXERCISES 1.5
In Exercises 1–4, find the currents in the circuits.

1. [circuit diagram]

2. [circuit diagram]

3. [circuit diagram]

4. All resistances are 10 Ω. [circuit diagram]

5. Find the voltage x such that the current I1 = 0. [circuit diagram]
EXAMPLE 1
Balance the following reaction for burning octane C8H18 in oxygen O2:
C8H18 + O2 → CO2 + H2O
where CO2 represents carbon dioxide. We must find positive integers x, y, z,
and w such that
xC8H18 + yO2 → zCO2 + wH2O
Equating the number of carbon, hydrogen, and oxygen atoms on each side
gives 8x = z, 18x = 2w and 2y = 2z + w, respectively. These can be written
as a homogeneous linear system
8x - z =0
18x - 2w = 0
2y - 2z - w = 0
which can be solved by gaussian elimination. In larger systems this is necessary
but, in such a simple situation, it is easier to solve directly. Set w = t, so that
x = (1/9)t and z = (8/9)t, whence 2y = (16/9)t + t = (25/9)t. But x, y, z, and w must be positive integers,
so the smallest value of t that eliminates fractions is 18. Hence, x = 2, y = 25,
z = 16, and w = 18, and the balanced reaction is
2C8H18 + 25O2 → 16CO2 + 18H2O
The reader can verify that this is indeed balanced.
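This verification, and the clearing of fractions, can be scripted. The sketch below sets w = 1, solves for x, y, and z as exact fractions, and then multiplies through by the least common multiple of the denominators:

```python
from fractions import Fraction
from math import lcm

# Solve the balancing equations directly, with w = 1.
w = Fraction(1)
x = Fraction(1, 9) * w          # from 18x = 2w
z = 8 * x                       # from 8x = z
y = (2 * z + w) / 2             # from 2y = 2z + w

# Clear denominators to get the smallest positive integer solution.
m = lcm(*(q.denominator for q in (x, y, z, w)))
x, y, z, w = (int(q * m) for q in (x, y, z, w))
print(x, y, z, w)  # 2 25 16 18
```

The atom counts balance: 8x = z (carbon), 18x = 2w (hydrogen), and 2y = 2z + w (oxygen).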
It is worth noting that this problem introduces a new element into the theory of
linear equations: the insistence that the solution must consist of positive integers.
EXERCISES 1.6
In each case balance the chemical reaction.

1. CH4 + O2 → CO2 + H2O. This is the burning of methane CH4.

2. NH3 + CuO → N2 + Cu + H2O. Here NH3 is ammonia, CuO is copper oxide, Cu is copper, and N2 is nitrogen.

3. CO2 + H2O → C6H12O6 + O2. This is called the photosynthesis reaction; C6H12O6 is glucose.

4. Pb(N3)2 + Cr(MnO4)2 → Cr2O3 + MnO2 + Pb3O4 + NO.
1. We show in Chapter 4 that the graph of an equation ax + by + cz = d is a plane in space when not all of a, b, and c are zero.

(a) By examining the possible positions of planes in space, show that three equations in three variables can have zero, one, or infinitely many solutions.
(b) Can two equations in three variables have a unique solution? Give reasons for your answer.

2. Find all solutions to the following systems of linear equations.

(a)   x1 +  x2 +  x3 -  x4 =  3
     3x1 + 5x2 - 2x3 +  x4 =  1
    -3x1 - 7x2 + 7x3 - 5x4 =  7
      x1 + 3x2 - 4x3 + 3x4 = -5
Chapter 2 Matrix Algebra
In the study of systems of linear equations in Chapter 1, we found it convenient
to manipulate the augmented matrix of the system. Our aim was to reduce it to
row-echelon form (using elementary row operations) and hence to write down all
solutions to the system. In the present chapter we consider matrices for their own
sake. While some of the motivation comes from linear equations, it turns out that
matrices can be multiplied and added and so form an algebraic system somewhat
analogous to the real numbers. This “matrix algebra” is useful in ways that are
quite different from the study of linear equations. For example, the geometrical
transformations obtained by rotating the euclidean plane about the origin can be
viewed as multiplications by certain 2 × 2 matrices. These "matrix transformations" are an important tool in geometry and, in turn, the geometry provides a "picture" of the matrices. Furthermore, matrix algebra has many other applications, some of which will be explored in this chapter. This subject is quite old and was first studied systematically in 1858 by Arthur Cayley.¹

[Arthur Cayley. Photo © Corbis.]
Thus

    A = 1 2 -1      B = 1 -1      C = 1
        0 5  6          0  2          3
                                      2
are matrices. Clearly matrices come in various shapes depending on the number of
rows and columns. For example, the matrix A shown has 2 rows and 3 columns. In
general, a matrix with m rows and n columns is referred to as an m × n matrix or
as having size m × n. Thus matrices A, B, and C above have sizes 2 × 3, 2 × 2, and
3 × 1, respectively. A matrix of size 1 × n is called a row matrix, whereas one of
1 Arthur Cayley (1821–1895) showed his mathematical talent early and graduated from Cambridge in 1842 as senior wrangler. With no
employment in mathematics in view, he took legal training and worked as a lawyer while continuing to do mathematics, publishing
nearly 300 papers in fourteen years. Finally, in 1863, he accepted the Sadlerian professorship at Cambridge and remained there for
the rest of his life, valued for his administrative and teaching skills as well as for his scholarship. His mathematical achievements
were of the first rank. In addition to originating matrix theory and the theory of determinants, he did fundamental work in group
theory, in higher-dimensional geometry, and in the theory of invariants. He was one of the most prolific mathematicians of all time
and produced 966 papers.
SECTION 2.1 Matrix Addition, Scalar Multiplication, and Transposition 33
size m × 1 is called a column matrix. Matrices of size n × n for some n are called
square matrices.
Each entry of a matrix is identified by the row and column in which it lies. The
rows are numbered from the top down, and the columns are numbered from left to
right. Then the (i, j)-entry of a matrix is the number lying simultaneously in row i
and column j. For example,
The (1, 2)-entry of  1 -1  is -1.
                     0  1

The (2, 3)-entry of  1 2 -1  is 6.
                     0 5  6
A special notation is commonly used for the entries of a matrix. If A is an m × n
matrix, and if the (i, j)-entry of A is denoted as aij, then A is displayed as follows:
        a11 a12 a13 ⋯ a1n
    A = a21 a22 a23 ⋯ a2n
         ⋮   ⋮   ⋮      ⋮
        am1 am2 am3 ⋯ amn
EXAMPLE 1
2 If p and q are statements, we say that p implies q if q is true whenever p is true. Then “p if and only if q” means that both p implies
q and q implies p. See Appendix B for more on this.
34 Chapter 2 Matrix Algebra
Matrix Addition
Definition 2.1 If A and B are matrices of the same size, their sum A + B is the matrix formed by
adding corresponding entries.
EXAMPLE 2
If $A = \begin{bmatrix}2&1&3\\-1&2&0\end{bmatrix}$ and $B = \begin{bmatrix}1&1&-1\\2&0&6\end{bmatrix}$, compute A + B.

Solution ► $A + B = \begin{bmatrix}2+1&1+1&3-1\\-1+2&2+0&0+6\end{bmatrix} = \begin{bmatrix}3&2&2\\1&2&6\end{bmatrix}$.
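Definition 2.1 is easy to check numerically. The following pure-Python sketch (ours, not part of the text; matrices are stored as lists of rows) reproduces the computation of Example 2:

```python
def mat_add(A, B):
    # A and B must have the same size; the sum adds corresponding entries.
    assert len(A) == len(B) and all(len(r) == len(s) for r, s in zip(A, B))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, 1, 3], [-1, 2, 0]]
B = [[1, 1, -1], [2, 0, 6]]
print(mat_add(A, B))  # [[3, 2, 2], [1, 2, 6]]
```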
EXAMPLE 3
Find a, b, and c if [a b c] + [c a b] = [3 2 -1].
EXAMPLE 4
Let $A = \begin{bmatrix}3&-1&0\\1&2&-4\end{bmatrix}$, $B = \begin{bmatrix}1&-1&1\\-2&0&6\end{bmatrix}$, and $C = \begin{bmatrix}1&0&-2\\3&1&1\end{bmatrix}$. Compute -A, A - B, and A + B - C.

Solution ►
$-A = \begin{bmatrix}-3&1&0\\-1&-2&4\end{bmatrix}$
$A - B = \begin{bmatrix}3-1&-1-(-1)&0-1\\1-(-2)&2-0&-4-6\end{bmatrix} = \begin{bmatrix}2&0&-1\\3&2&-10\end{bmatrix}$
$A + B - C = \begin{bmatrix}3+1-1&-1-1-0&0+1-(-2)\\1-2-3&2+0-1&-4+6-1\end{bmatrix} = \begin{bmatrix}3&-2&3\\-4&1&1\end{bmatrix}$
EXAMPLE 5
It is important to note that the sizes of matrices involved in some calculations are
often determined by the context. For example, if
$A + C = \begin{bmatrix}1&3&-1\\2&0&1\end{bmatrix}$
then A and C must be the same size (so that A + C makes sense), and that size must
be 2 × 3 (so that the sum is 2 × 3). For simplicity we shall often omit reference to
such facts when they are clear from the context.
Scalar Multiplication
In gaussian elimination, multiplying a row of a matrix by a number k means
multiplying every entry of that row by k.
Definition 2.2 More generally, if A is any matrix and k is any number, the scalar multiple kA is the
matrix obtained from A by multiplying each entry of A by k.
If A = [aij], this is
kA = [kaij]
Thus 1A = A and (-1)A = -A for any matrix A.
The term scalar arises here because the set of numbers from which the entries are
drawn is usually referred to as the set of scalars. We have been using real numbers
as scalars, but we could equally well have been using complex numbers.
EXAMPLE 6
If $A = \begin{bmatrix}3&-1&4\\2&0&6\end{bmatrix}$ and $B = \begin{bmatrix}1&2&-1\\0&3&2\end{bmatrix}$, compute 5A, $\frac{1}{2}B$, and 3A - 2B.

Solution ►
$5A = \begin{bmatrix}15&-5&20\\10&0&30\end{bmatrix}$, $\tfrac{1}{2}B = \begin{bmatrix}\frac{1}{2}&1&-\frac{1}{2}\\0&\frac{3}{2}&1\end{bmatrix}$,
$3A - 2B = \begin{bmatrix}9&-3&12\\6&0&18\end{bmatrix} - \begin{bmatrix}2&4&-2\\0&6&4\end{bmatrix} = \begin{bmatrix}7&-7&14\\6&-6&14\end{bmatrix}$
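Scalar multiples combine with sums in code just as in Definition 2.2. A short pure-Python sketch (ours, not the text's) reproducing the computation 3A - 2B of Example 6:

```python
def scalar_mul(k, A):
    # kA multiplies every entry of A by the scalar k (Definition 2.2).
    return [[k * a for a in row] for row in A]

def mat_add(A, B):
    # Entrywise sum of two same-size matrices.
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[3, -1, 4], [2, 0, 6]]
B = [[1, 2, -1], [0, 3, 2]]
# 3A - 2B, computed as 3A + (-2)B
print(mat_add(scalar_mul(3, A), scalar_mul(-2, B)))  # [[7, -7, 14], [6, -6, 14]]
```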
If A is any matrix, note that kA is the same size as A for all scalars k. We also have
0A = 0 and k0 = 0
because the zero matrix has every entry zero. In other words, kA = 0 if either k = 0
or A = 0. The converse of this statement is also true, as Example 7 shows.
EXAMPLE 7
If kA = 0, show that either k = 0 or A = 0.
For future reference, the basic properties of matrix addition and scalar
multiplication are listed in Theorem 1.
Theorem 1
Let A, B, and C denote arbitrary m × n matrices where m and n are fixed. Let k and p
denote arbitrary real numbers. Then
1. A + B = B + A.
2. A + (B + C) = (A + B) + C.
3. There is an m × n matrix 0, such that 0 + A = A for each A.
4. For each A there is an m × n matrix, -A, such that A + (-A) = 0.
5. k(A + B) = kA + kB.
6. (k + p)A = kA + pA.
7. (kp)A = k(pA).
8. 1A = A.
PROOF
Properties 1–4 were given previously. To check property 5, let A = [aij] and
B = [bij] denote matrices of the same size. Then A + B = [aij + bij], as before,
so the (i, j)-entry of k(A + B) is
k(aij + bij) = kaij + kbij
But this is just the (i, j)-entry of kA + kB, and it follows that
k(A + B) = kA + kB. The other properties can be similarly verified; the details
are left to the reader.
EXAMPLE 8
Simplify 2(A + 3C) - 3(2C - B) - 3[2(2A + B - 4C) - 4(A - 2C)] where A,
B, and C are all matrices of the same size.
Transpose of a Matrix
Many results about a matrix A involve the rows of A, and the corresponding result
for columns is derived in an analogous way, essentially by replacing the word
row by the word column throughout. The following definition is made with such
applications in mind.
Definition 2.3 If A is an m × n matrix, the transpose of A, written AT, is the n × m matrix whose
rows are just the columns of A in the same order.
In other words, the first row of AT is the first column of A (that is, it consists of the
entries of column 1 in order). Similarly the second row of AT is the second column
of A, and so on.
EXAMPLE 9
Write down the transpose of each of the following matrices.
$A = \begin{bmatrix}1\\3\\2\end{bmatrix}$  $B = \begin{bmatrix}5&2&6\end{bmatrix}$  $C = \begin{bmatrix}1&2\\3&4\\5&6\end{bmatrix}$  $D = \begin{bmatrix}3&1&-1\\1&3&2\\-1&2&1\end{bmatrix}$

Solution ► $A^T = \begin{bmatrix}1&3&2\end{bmatrix}$, $B^T = \begin{bmatrix}5\\2\\6\end{bmatrix}$, $C^T = \begin{bmatrix}1&3&5\\2&4&6\end{bmatrix}$, and $D^T = D$.
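Transposition is a one-line operation on a list-of-rows matrix, because `zip` pairs up the columns. A pure-Python sketch (ours, not from the text), using C from Example 9 and the symmetric matrix D as reconstructed there:

```python
def transpose(A):
    # Row i of the transpose is column i of A (Definition 2.3).
    return [list(col) for col in zip(*A)]

C = [[1, 2], [3, 4], [5, 6]]
print(transpose(C))  # [[1, 3, 5], [2, 4, 6]]

D = [[3, 1, -1], [1, 3, 2], [-1, 2, 1]]
print(transpose(D) == D)  # True: D equals its own transpose
```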
If A = [aij] is a matrix, write AT = [bij]. Then bij is the jth element of the ith row
of AT and so is the jth element of the ith column of A. This means bij = aji, so the
definition of AT can be stated as follows:
If A = [aij], then AT = [aji] (∗)
This is useful in verifying the following properties of transposition.
Theorem 2
Let A and B denote matrices of the same size, and let k denote a scalar.
1. If A is an m × n matrix, then AT is an n × m matrix.
2. (AT)T = A.
3. (kA)T = kAT.
4. (A + B)T = AT + BT.
PROOF
Property 1 is part of the definition of AT, and property 2 follows from (∗). As to
property 3: If A = [aij], then kA = [kaij], so (∗) gives
(kA)T = [kaji] = k[aji] = kAT
Finally, if B = [bij], then A + B = [cij] where cij = aij + bij. Then (∗) gives property 4:
(A + B)T = [cij]T = [cji] = [aji + bji] = [aji] + [bji] = AT + BT
Thus forming the transpose of a matrix A can be viewed as “flipping” A about its
main diagonal, or as “rotating” A through 180° about the line containing the main
diagonal. This makes property 2 in Theorem 2 transparent.
EXAMPLE 10
Solve for the matrix A if $\left(2A^T - 3\begin{bmatrix}1&2\\-1&1\end{bmatrix}\right)^T = \begin{bmatrix}2&3\\-1&2\end{bmatrix}$.

Solution ► Using Theorem 2, the transpose of the left side is
$\left(2A^T - 3\begin{bmatrix}1&2\\-1&1\end{bmatrix}\right)^T = 2(A^T)^T - 3\begin{bmatrix}1&2\\-1&1\end{bmatrix}^T = 2A - 3\begin{bmatrix}1&-1\\2&1\end{bmatrix}$
Hence the equation becomes
$2A - 3\begin{bmatrix}1&-1\\2&1\end{bmatrix} = \begin{bmatrix}2&3\\-1&2\end{bmatrix}$
so $2A = \begin{bmatrix}2&3\\-1&2\end{bmatrix} + \begin{bmatrix}3&-3\\6&3\end{bmatrix} = \begin{bmatrix}5&0\\5&5\end{bmatrix}$, whence $A = \tfrac{1}{2}\begin{bmatrix}5&0\\5&5\end{bmatrix}$.
Note that Example 10 can also be solved by first transposing both sides, then
solving for AT, and so obtaining A = (AT)T. The reader should do this.
The matrix D in Example 9 has the property that D = DT. Such matrices are
important; a matrix A is called symmetric if A = AT. A symmetric matrix A is
necessarily square (if A is m × n, then AT is n × m, so A = AT forces n = m). The
name comes from the fact that these matrices exhibit a symmetry about the main
diagonal. That is, entries that are directly across the main diagonal from each other
are equal.
For example, $\begin{bmatrix}a&b&c\\b&d&e\\c&e&f\end{bmatrix}$ is symmetric for any choice of a, b, c, d, e, and f: the (1, 2)- and (2, 1)-entries are both b, the (1, 3)- and (3, 1)-entries are both c, and the (2, 3)- and (3, 2)-entries are both e.
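The symmetry condition A[i][j] = A[j][i] is directly checkable in code. A minimal pure-Python sketch (ours, not part of the text):

```python
def is_symmetric(A):
    # A square matrix is symmetric exactly when A[i][j] == A[j][i] for all i, j.
    n = len(A)
    if any(len(row) != n for row in A):
        return False  # not square, so not symmetric
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

print(is_symmetric([[1, 7], [7, 3]]))  # True
print(is_symmetric([[1, 7], [5, 3]]))  # False
```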
EXAMPLE 11
If A and B are symmetric n × n matrices, show that A + B is symmetric.
EXAMPLE 12
Suppose a square matrix A satisfies A = 2AT. Show that necessarily A = 0.
EXERCISES 2.1
1. Find a, b, c, and d if:
(a) $\begin{bmatrix}a&b\\c&d\end{bmatrix} = \begin{bmatrix}c-3d&-d\\2a+d&a+b\end{bmatrix}$
(b) $\begin{bmatrix}a-b&b-c\\c-d&d-a\end{bmatrix} = 2\begin{bmatrix}1&1\\-3&1\end{bmatrix}$
(c) $3\begin{bmatrix}a\\b\end{bmatrix} + 2\begin{bmatrix}b\\a\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix}$
(d) $\begin{bmatrix}a&b\\c&d\end{bmatrix} = \begin{bmatrix}b&c\\d&a\end{bmatrix}$

2. Compute the following:
(a) $\begin{bmatrix}3&2&1\\5&1&0\end{bmatrix} - 5\begin{bmatrix}1&-1&2\\3&0&-2\end{bmatrix}$
(b) $3\begin{bmatrix}3\\-1\end{bmatrix} - 5\begin{bmatrix}6\\2\end{bmatrix} + 7\begin{bmatrix}1\\-1\end{bmatrix}$
(c) $\begin{bmatrix}-2&1\\3&2\end{bmatrix} - 4\begin{bmatrix}1&-2\\0&-1\end{bmatrix} + 3\begin{bmatrix}2&-3\\-1&-2\end{bmatrix}$
(d) $\begin{bmatrix}3&-1&2\end{bmatrix} - 2\begin{bmatrix}9&3&4\end{bmatrix} + \begin{bmatrix}3&11&-6\end{bmatrix}$
(e) $\begin{bmatrix}1&-5&4&0\\2&1&0&6\end{bmatrix}^T$
(f) $\begin{bmatrix}0&-1&2\\1&0&-4\\-2&4&0\end{bmatrix}^T$
(g) $\begin{bmatrix}3&-1\\2&1\end{bmatrix} - 2\begin{bmatrix}1&-2\\1&1\end{bmatrix}^T$
(h) $3\begin{bmatrix}2&1\\-1&0\end{bmatrix}^T - 2\begin{bmatrix}1&-1\\2&3\end{bmatrix}$

3. Let $A = \begin{bmatrix}2&1\\0&-1\end{bmatrix}$, $B = \begin{bmatrix}3&-1&2\\0&1&4\end{bmatrix}$, $C = \begin{bmatrix}3&-1\\2&0\end{bmatrix}$, $D = \begin{bmatrix}1&3\\-1&0\\1&4\end{bmatrix}$, and $E = \begin{bmatrix}1&0&1\\0&1&0\end{bmatrix}$.
Compute the following (where possible).
(a) 3A - 2B (b) 5C (c) 3E (d) B + D (e) 4A - 3C (f) (A + C)ᵀ (g) 2B - 3E (h) A - D (i) (B - 2E)ᵀ
4. Find A if:
(a) $5A - \begin{bmatrix}1&0\\2&3\end{bmatrix} = 3A - \begin{bmatrix}5&2\\6&1\end{bmatrix}$
(b) $3A + \begin{bmatrix}2\\1\end{bmatrix} = 5A - 2\begin{bmatrix}3\\0\end{bmatrix}$

5. Find A in terms of B if:
(a) A + B = 3A + 2B
(b) 2A - B = 5(A + 2B)

6. If X, Y, A, and B are matrices of the same size, solve the following systems of equations to obtain X and Y in terms of A and B.
(a) 5X + 3Y = A
2X + Y = B
(b) 4X + 3Y = A
5X + 4Y = B

7. Find all matrices X and Y such that:
(a) 3X - 2Y = [3 -1]
(b) 2X - 5Y = [1 2]

8. Simplify the following expressions where A, B, and C are matrices.
(a) 2[9(A - B) + 7(2B - A)] - 2[3(2B + A) - 2(A + 3B) - 5(A + B)]
(b) 5[3(A - B + 2C) - 2(3C - B) - A] + 2[3(3A - B + C) + 2(B - 2A) - 2C]

9. If A is any 2 × 2 matrix, show that:
(a) $A = a\begin{bmatrix}1&0\\0&0\end{bmatrix} + b\begin{bmatrix}0&1\\0&0\end{bmatrix} + c\begin{bmatrix}0&0\\1&0\end{bmatrix} + d\begin{bmatrix}0&0\\0&1\end{bmatrix}$ for some numbers a, b, c, and d.
(b) $A = p\begin{bmatrix}1&0\\0&1\end{bmatrix} + q\begin{bmatrix}1&1\\0&0\end{bmatrix} + r\begin{bmatrix}1&0\\1&0\end{bmatrix} + s\begin{bmatrix}0&1\\1&0\end{bmatrix}$ for some numbers p, q, r, and s.

10. Let A = [1 1 -1], B = [0 1 2], and C = [3 0 1]. If rA + sB + tC = 0 for some scalars r, s, and t, show that necessarily r = s = t = 0.

13. If A and B are diagonal matrices (of the same size), show that the following matrices are also diagonal.
(a) A + B (b) A - B (c) kA for any number k

14. In each case determine all s and t such that the given matrix is symmetric:
(a) $\begin{bmatrix}1&s\\-2&t\end{bmatrix}$ (b) $\begin{bmatrix}s&t\\st&1\end{bmatrix}$ (c) $\begin{bmatrix}s^2&s&st\\t&-1&s\\t&s^2&s\end{bmatrix}$ (d) $\begin{bmatrix}2&s&t\\2s&0&s+t\\3&3&t\end{bmatrix}$

15. In each case find the matrix A.
(a) $\left(A + 3\begin{bmatrix}1&-1&0\\1&2&4\end{bmatrix}\right)^T = \begin{bmatrix}2&1\\0&5\\3&8\end{bmatrix}$
(b) $\left(3A + 2\begin{bmatrix}1&0\\0&2\end{bmatrix}\right)^T = \begin{bmatrix}8&0\\3&1\end{bmatrix}$
(c) $\left(2A - 3\begin{bmatrix}1&2&0\end{bmatrix}\right)^T = 3A^T + \begin{bmatrix}2&1&-1\end{bmatrix}^T$
(d) $\left(2A^T - 5\begin{bmatrix}1&0\\-1&2\end{bmatrix}\right)^T = 4A - 9\begin{bmatrix}1&1\\-1&0\end{bmatrix}$

16. Let A and B be symmetric (of the same size). Show that each of the following is symmetric.
(a) A - B
(b) kA for any scalar k

17. Show that A + Aᵀ is symmetric for any square matrix A.

18. If A is a square matrix and A = kAᵀ where k ≠ ±1, show that A = 0.

19. In each case either show that the statement is true or give an example showing it is false.
(a) If A + B = A + C, then B and C have the same size.

20. A square matrix W is called skew-symmetric if Wᵀ = -W. Let A be any square matrix.
(a) Show that A - Aᵀ is skew-symmetric.
(b) Find a symmetric matrix S and a skew-symmetric matrix W such that A = S + W.
(c) Show that S and W in part (b) are uniquely determined by A.

21. If W is skew-symmetric (Exercise 20), show that the entries on the main diagonal are zero.

22. Prove the following parts of Theorem 1.
(a) (k + p)A = kA + pA
(b) (kp)A = k(pA)

23. Let A, A1, A2, …, An denote matrices of the same size. Use induction on n to verify the following extensions of properties 5 and 6 of Theorem 1.
(a) k(A1 + A2 + ⋯ + An) = kA1 + kA2 + ⋯ + kAn for any number k
(b) (k1 + k2 + ⋯ + kn)A = k1A + k2A + ⋯ + knA for any numbers k1, k2, …, kn

24. Let A be a square matrix. If A = pBᵀ and B = qAᵀ for some matrix B and numbers p and q, show that either A = 0 = B or pq = 1. [Hint: Example 7.]
Vectors
It is a well-known fact in analytic geometry that two points in the plane with
coordinates (a1, a2) and (b1, b2) are equal if and only if a1 = b1 and a2 = b2.
Moreover, a similar condition applies to points (a1, a2, a3) in space. We extend
this idea as follows.
An ordered sequence (a1, a2, …, an) of real numbers is called an ordered n-tuple.
The word “ordered” here reflects our insistence that two ordered n-tuples are equal
if and only if corresponding entries are the same. In other words,
(a1, a2, …, an) = (b1, b2, …, bn) if and only if a1 = b1, a2 = b2, …, and an = bn.
Thus the ordered 2-tuples and 3-tuples are just the ordered pairs and triples familiar
from geometry.
Definition 2.4 Let ℝ denote the set of all real numbers. The set of all ordered n-tuples from ℝ has a special notation:
ℝⁿ denotes the set of all ordered n-tuples of real numbers.
There are two commonly used ways to denote the n-tuples in ℝⁿ: as rows (r1, r2, …, rn) or as columns $\begin{bmatrix}r_1\\r_2\\\vdots\\r_n\end{bmatrix}$; the notation we use depends on the context. In any event they are called vectors or n-vectors and will be denoted using bold type such as x or v. For example, an m × n matrix A will be written as a row of columns:
A = [a1 a2 ⋯ an] where aj denotes column j of A for each j.
SECTION 2.2 Equations, Matrices, and Transformations 43
If x and y are two n-vectors in ℝⁿ, it is clear that their matrix sum x + y is also in ℝⁿ, as is the scalar multiple kx for any real number k. We express this observation by saying that ℝⁿ is closed under addition and scalar multiplication. In particular, all the basic properties in Theorem 1 Section 2.1 are true of these n-vectors. These properties are fundamental and will be used frequently below without comment. As for matrices in general, the n × 1 zero matrix is called the zero n-vector in ℝⁿ and, if x is an n-vector, the n-vector -x is called the negative of x.
Of course, we have already encountered these n-vectors in Section 1.3 as the
solutions to systems of linear equations with n variables. In particular we defined the
notion of a linear combination of vectors and showed that a linear combination of
solutions to a homogeneous system is again a solution. Clearly, a linear combination
of n-vectors in ℝⁿ is again in ℝⁿ, a fact that we will be using.
Matrix-Vector Multiplication
Given a system of linear equations, the left sides of the equations depend only on
the coefficient matrix A and the column x of variables, and not on the constants.
This observation leads to a fundamental idea in linear algebra: We view the left
sides of the equations as the “product” Ax of the matrix A and the vector x. This
simple change of perspective leads to a completely new way of viewing linear
systems—one that is very useful and will occupy our attention throughout this book.
To motivate the definition of the “product” Ax, consider first the following
system of two equations in three variables:
ax1 + bx2 + cx3 = b1
a′x1 + b′x2 + c′x3 = b2   (∗)
and let $A = \begin{bmatrix}a&b&c\\a'&b'&c'\end{bmatrix}$, $x = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$, and $b = \begin{bmatrix}b_1\\b_2\end{bmatrix}$ denote the coefficient matrix, the variable matrix, and the constant matrix, respectively. The system (∗) can be expressed as a single vector equation
$x_1\begin{bmatrix}a\\a'\end{bmatrix} + x_2\begin{bmatrix}b\\b'\end{bmatrix} + x_3\begin{bmatrix}c\\c'\end{bmatrix} = \begin{bmatrix}b_1\\b_2\end{bmatrix}$.
Now observe that the vectors appearing on the left side are just the columns
$a_1 = \begin{bmatrix}a\\a'\end{bmatrix}$, $a_2 = \begin{bmatrix}b\\b'\end{bmatrix}$, and $a_3 = \begin{bmatrix}c\\c'\end{bmatrix}$
of the coefficient matrix A. Hence the system (∗) takes the form
x1a1 + x2a2 + x3a3 = b. (∗∗)
This shows that the system (∗) has a solution if and only if the constant matrix b
is a linear combination3 of the columns of A, and that in this case the entries of
the solution are the coefficients x1, x2, and x3 in this linear combination.
Moreover, this holds in general. If A is any m × n matrix, it is often convenient to
view A as a row of columns. That is, if a1, a2, …, an are the columns of A, we write
A = [a1 a2 ⋯ an]
and say that A = [a1 a2 ⋯ an] is given in terms of its columns.
3 Linear combinations were introduced in Section 1.3 to describe the solutions of homogeneous systems of linear equations. They will
be used extensively in what follows.
If A = [a1 a2 ⋯ an] is an m × n matrix with columns a1, a2, …, an, if b is the constant matrix of the system, and if $x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$ is the matrix of variables then, exactly as above, the system can be written as a single vector equation
x1a1 + x2a2 + ⋯ + xnan = b. (∗∗∗)
EXAMPLE 1
Write the system
3x1 + 2x2 - 4x3 = 0
x1 - 3x2 + x3 = 3
x2 - 5x3 = -1
in the form given in (∗∗∗).

Solution ► $x_1\begin{bmatrix}3\\1\\0\end{bmatrix} + x_2\begin{bmatrix}2\\-3\\1\end{bmatrix} + x_3\begin{bmatrix}-4\\1\\-5\end{bmatrix} = \begin{bmatrix}0\\3\\-1\end{bmatrix}$.
As mentioned above, we view the left side of (∗∗∗) as the product of the matrix A
and the vector x. This basic idea is formalized in the following definition:
Definition 2.5 Let A = [a1 a2 ⋯ an] be an m × n matrix, written in terms of its columns a1, a2, …, an. If $x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$ is any n-vector, the product Ax is defined to be the m-vector given by:
Ax = x1a1 + x2a2 + ⋯ + xnan.
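Definition 2.5 translates directly into code: Ax is accumulated column by column, with the entries of x as coefficients. A pure-Python sketch (ours, not the text's), applied to the coefficient matrix of Example 1 with x1 = x2 = x3 = 1:

```python
def mat_vec(A, x):
    # Definition 2.5: Ax = x1*a1 + x2*a2 + ... + xn*an, a linear
    # combination of the columns of A weighted by the entries of x.
    m, n = len(A), len(A[0])
    assert len(x) == n
    result = [0] * m
    for j in range(n):       # add x[j] times column j of A
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

A = [[3, 2, -4], [1, -3, 1], [0, 1, -5]]
print(mat_vec(A, [1, 1, 1]))  # [1, -1, -4]
```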
Theorem 1
(1) Every system of linear equations has the form Ax = b where A is the coefficient matrix, b is the constant matrix, and x is the matrix of variables.
(2) The system Ax = b is consistent if and only if b is a linear combination of the columns of A.
(3) If a1, a2, …, an are the columns of A and if $x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$, then x is a solution to the linear system Ax = b if and only if x1, x2, …, xn are a solution of the vector equation x1a1 + x2a2 + ⋯ + xnan = b.
EXAMPLE 2
If $A = \begin{bmatrix}2&-1&3&5\\0&2&-3&1\\-3&4&1&2\end{bmatrix}$ and $x = \begin{bmatrix}2\\1\\0\\-2\end{bmatrix}$, compute Ax.

Solution ► By Definition 2.5: $Ax = 2\begin{bmatrix}2\\0\\-3\end{bmatrix} + 1\begin{bmatrix}-1\\2\\4\end{bmatrix} + 0\begin{bmatrix}3\\-3\\1\end{bmatrix} - 2\begin{bmatrix}5\\1\\2\end{bmatrix} = \begin{bmatrix}-7\\0\\-6\end{bmatrix}$.
EXAMPLE 3
Given columns a1, a2, a3, and a4 in ℝ³, write 2a1 - 3a2 + 5a3 + a4 in the form Ax where A is a matrix and x is a vector.

Solution ► Here the column of coefficients is $x = \begin{bmatrix}2\\-3\\5\\1\end{bmatrix}$. Hence Definition 2.5 gives
Ax = 2a1 - 3a2 + 5a3 + a4
where A = [a1 a2 a3 a4] is the matrix with a1, a2, a3, and a4 as its columns.
EXAMPLE 4
Let A = [a1 a2 a3 a4] be the 3 × 4 matrix given in terms of its columns
$a_1 = \begin{bmatrix}2\\0\\-1\end{bmatrix}$, $a_2 = \begin{bmatrix}1\\1\\1\end{bmatrix}$, $a_3 = \begin{bmatrix}3\\-1\\-3\end{bmatrix}$, and $a_4 = \begin{bmatrix}3\\1\\0\end{bmatrix}$. In each case below, either express b as a linear combination of a1, a2, a3, and a4, or show that it is not such a linear combination. Explain what your answer means for the corresponding system Ax = b of linear equations.
(a) $b = \begin{bmatrix}1\\2\\3\end{bmatrix}$ (b) $b = \begin{bmatrix}4\\2\\1\end{bmatrix}$.
Solution ► By Theorem 1, b is a linear combination of a1, a2, a3, and a4 if and
only if the system Ax = b is consistent (that is, it has a solution). So in each
case we carry the augmented matrix [A|b] of the system Ax = b to reduced
form.
(a) Here $\left[\begin{array}{cccc|c}2&1&3&3&1\\0&1&-1&1&2\\-1&1&-3&0&3\end{array}\right] \to \left[\begin{array}{cccc|c}1&0&2&1&0\\0&1&-1&1&0\\0&0&0&0&1\end{array}\right]$, so the system Ax = b has no solution in this case. Hence b is not a linear combination of a1, a2, a3, and a4.

(b) Now $\left[\begin{array}{cccc|c}2&1&3&3&4\\0&1&-1&1&2\\-1&1&-3&0&1\end{array}\right] \to \left[\begin{array}{cccc|c}1&0&2&1&1\\0&1&-1&1&2\\0&0&0&0&0\end{array}\right]$, so the system Ax = b is consistent.

Thus b is a linear combination of a1, a2, a3, and a4 in this case. In fact the general solution is x1 = 1 - 2s - t, x2 = 2 + s - t, x3 = s, and x4 = t where s and t are arbitrary parameters. Hence $x_1a_1 + x_2a_2 + x_3a_3 + x_4a_4 = b = \begin{bmatrix}4\\2\\1\end{bmatrix}$ for any choice of s and t. If we take s = 0 and t = 0, this becomes a1 + 2a2 = b, whereas taking s = 1 = t gives -2a1 + 2a2 + a3 + a4 = b.
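The two linear combinations found in Example 4(b) can be verified numerically. A short pure-Python check (ours, not part of the text), using the columns of A as given in Example 4:

```python
def lin_comb(coeffs, vectors):
    # Computes c1*v1 + ... + cn*vn for column vectors stored as lists.
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

a1, a2, a3, a4 = [2, 0, -1], [1, 1, 1], [3, -1, -3], [3, 1, 0]
b = [4, 2, 1]
print(lin_comb([1, 2, 0, 0], [a1, a2, a3, a4]) == b)   # True  (s = 0, t = 0)
print(lin_comb([-2, 2, 1, 1], [a1, a2, a3, a4]) == b)  # True  (s = 1 = t)
```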
EXAMPLE 5
Taking A to be the zero matrix, we have 0x = 0 for all vectors x by
Definition 2.5 because every column of the zero matrix is zero. Similarly,
A0 = 0 for all matrices A because every entry of the zero vector is zero.
EXAMPLE 6
If $I = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$, show that Ix = x for any vector x in ℝ³.

Solution ► If $x = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$ then Definition 2.5 gives
$Ix = x_1\begin{bmatrix}1\\0\\0\end{bmatrix} + x_2\begin{bmatrix}0\\1\\0\end{bmatrix} + x_3\begin{bmatrix}0\\0\\1\end{bmatrix} = \begin{bmatrix}x_1\\0\\0\end{bmatrix} + \begin{bmatrix}0\\x_2\\0\end{bmatrix} + \begin{bmatrix}0\\0\\x_3\end{bmatrix} = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = x$.
The matrix I in Example 6 is called the 3 × 3 identity matrix, and we will
encounter such matrices again in Example 11 below.
Theorem 2
Let A and B be m × n matrices, let x and y be n-vectors in ℝⁿ, and let k be a scalar. Then:
(1) A(x + y) = Ax + Ay.
(2) A(kx) = k(Ax).
(3) (A + B)x = Ax + Bx.
PROOF
We prove (3); the other verifications are similar and are left as exercises. Let A = [a1 a2 ⋯ an] and B = [b1 b2 ⋯ bn] be given in terms of their columns. Since adding two matrices is the same as adding their columns, we have
A + B = [a1 + b1  a2 + b2  ⋯  an + bn]
If we write $x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$, Definition 2.5 gives
(A + B)x = x1(a1 + b1) + x2(a2 + b2) + ⋯ + xn(an + bn)
= (x1a1 + x2a2 + ⋯ + xnan) + (x1b1 + x2b2 + ⋯ + xnbn)
= Ax + Bx.
Theorem 3
Suppose x1 is any particular solution to the system Ax = b of linear equations. Then every solution x2 to Ax = b has the form x2 = x0 + x1 where x0 is a solution to the associated homogeneous system Ax = 0.
PROOF
Suppose x2 is also a solution to Ax = b, so that Ax2 = b. Write x0 = x2 - x1.
Then x2 = x0 + x1 and, using Theorem 2, we compute
Ax0 = A(x2 - x1) = Ax2 - Ax1 = b - b = 0.
Hence x0 is a solution to the associated homogeneous system Ax = 0.
EXAMPLE 7
Express every solution to the following system as the sum of a specific solution
plus a solution to the associated homogeneous system.
x1 - x2 - x3 + 3x4 = 2
2x1 - x2 - 3x3 + 4x4 = 6
x1 - 2x3 + x4 = 4
Solution ► Gaussian elimination gives
$x = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}4+2s-t\\2+s+2t\\s\\t\end{bmatrix} = \begin{bmatrix}4\\2\\0\\0\end{bmatrix} + s\begin{bmatrix}2\\1\\1\\0\end{bmatrix} + t\begin{bmatrix}-1\\2\\0\\1\end{bmatrix}$.
Thus $x = \begin{bmatrix}4\\2\\0\\0\end{bmatrix}$ is a particular solution (where s = 0 = t), and $x_0 = s\begin{bmatrix}2\\1\\1\\0\end{bmatrix} + t\begin{bmatrix}-1\\2\\0\\1\end{bmatrix}$
gives all solutions to the associated homogeneous system. (To see why this is
so, carry out the gaussian elimination again but with all the constants set equal
to zero.)
Definition 2.6 If (a1, a2, …, an) and (b1, b2, …, bn) are two ordered n-tuples, their dot product is
defined to be the number
a1b1 + a2b2 + ⋯ + anbn
obtained by multiplying corresponding entries and adding the results.
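Definition 2.6 is a one-line computation. A minimal pure-Python sketch (ours, not the text's), using row 1 of the matrix in Example 8 below:

```python
def dot(u, v):
    # Dot product: multiply corresponding entries and add (Definition 2.6).
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))

print(dot([2, -1, 3, 5], [2, 1, 0, -2]))  # -7
```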
To see how this relates to matrix products, let A denote a 3 × 4 matrix and let x be
a 4-vector. Writing
$x = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}$ and $A = \begin{bmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\end{bmatrix}$
we compute
$Ax = x_1\begin{bmatrix}a_{11}\\a_{21}\\a_{31}\end{bmatrix} + x_2\begin{bmatrix}a_{12}\\a_{22}\\a_{32}\end{bmatrix} + x_3\begin{bmatrix}a_{13}\\a_{23}\\a_{33}\end{bmatrix} + x_4\begin{bmatrix}a_{14}\\a_{24}\\a_{34}\end{bmatrix} = \begin{bmatrix}a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4\\a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4\\a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4\end{bmatrix}$.
From this we see that each entry of Ax is the dot product of the corresponding row
of A with x. This computation goes through in general, and we record the result in
Theorem 4.
Theorem 4 (The Dot Product Rule)
Let A be an m × n matrix and let x be an n-vector. Then entry i of the vector Ax is the dot product of row i of A with x, for each i = 1, 2, …, m.

As an illustration, we rework Example 2 using the dot product rule instead of Definition 2.5.
EXAMPLE 8
If $A = \begin{bmatrix}2&-1&3&5\\0&2&-3&1\\-3&4&1&2\end{bmatrix}$ and $x = \begin{bmatrix}2\\1\\0\\-2\end{bmatrix}$, compute Ax.

Solution ► The entries of Ax are the dot products of the rows of A with x:
$Ax = \begin{bmatrix}2&-1&3&5\\0&2&-3&1\\-3&4&1&2\end{bmatrix}\begin{bmatrix}2\\1\\0\\-2\end{bmatrix} = \begin{bmatrix}2\cdot 2 + (-1)1 + 3\cdot 0 + 5(-2)\\0\cdot 2 + 2\cdot 1 + (-3)0 + 1(-2)\\(-3)2 + 4\cdot 1 + 1\cdot 0 + 2(-2)\end{bmatrix} = \begin{bmatrix}-7\\0\\-6\end{bmatrix}$.
Of course, this agrees with the outcome in Example 2.
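The dot product rule is usually how Ax is computed in practice: one dot product per row. A pure-Python sketch (ours, not the text's) reproducing Example 8:

```python
def mat_vec_dot(A, x):
    # Theorem 4: entry i of Ax is the dot product of row i of A with x.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[2, -1, 3, 5], [0, 2, -3, 1], [-3, 4, 1, 2]]
x = [2, 1, 0, -2]
print(mat_vec_dot(A, x))  # [-7, 0, -6]
```

This agrees entry by entry with the column-combination computation of Example 2, illustrating that Definition 2.5 and Theorem 4 describe the same vector.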
EXAMPLE 9
Write the following system of linear equations in the form Ax = b.
5x1 - x2 + 2x3 + x4 - 3x5 = 8
x1 + x2 + 3x3 - 5x4 + 2x5 = -2
-x1 + x2 - 2x3 - 3x5 = 0

Solution ► Write $A = \begin{bmatrix}5&-1&2&1&-3\\1&1&3&-5&2\\-1&1&-2&0&-3\end{bmatrix}$, $b = \begin{bmatrix}8\\-2\\0\end{bmatrix}$, and $x = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\end{bmatrix}$. Then the system takes the form Ax = b.
EXAMPLE 10
If A is the zero m × n matrix, then Ax = 0 for each n-vector x.
Solution ► For each k, entry k of Ax is the dot product of row k of A with x, and
this is zero because row k of A consists of zeros.
Definition 2.7 For each n ≥ 2, the identity matrix In is the n × n matrix with 1s on the main
diagonal (upper left to lower right), and zeros elsewhere.
EXAMPLE 11
For each n ≥ 2 we have Inx = x for each n-vector x in ℝⁿ.

Solution ► We verify the case n = 4. Given the 4-vector $x = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}$, the dot product rule gives
$I_4x = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}x_1+0+0+0\\0+x_2+0+0\\0+0+x_3+0\\0+0+0+x_4\end{bmatrix} = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = x$.
In general, Inx = x because entry k of Inx is the dot product of row k of In with x, and row k of In has 1 in position k and zeros elsewhere.
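Building In and checking Inx = x takes only a few lines. A pure-Python sketch (ours, not part of the text):

```python
def identity(n):
    # I_n has 1s on the main diagonal and 0s elsewhere (Definition 2.7).
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_vec_dot(A, x):
    # Entry i of Ax is the dot product of row i of A with x (Theorem 4).
    return [sum(a * b for a, b in zip(row, x)) for row in A]

x = [7, -2, 5, 1]
print(mat_vec_dot(identity(4), x) == x)  # True: I4 x = x
```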
EXAMPLE 12
Let A = [a1 a2 ⋯ an] be any m × n matrix with columns a1, a2, …, an. If ej
denotes column j of the n × n identity matrix In, then Aej = aj for each
j = 1, 2, …, n.
Solution ► Write $e_j = \begin{bmatrix}t_1\\t_2\\\vdots\\t_n\end{bmatrix}$ where tj = 1, but ti = 0 for all i ≠ j. Then Theorem 4 gives
Aej = t1a1 + ⋯ + tjaj + ⋯ + tnan = 0 + ⋯ + aj + ⋯ + 0 = aj.
Theorem 5
Let A and B be m × n matrices. If Ax = Bx for every x in ℝⁿ, then A = B.
PROOF
Write A = [a1 a2 ⋯ an] and B = [b1 b2 ⋯ bn] in terms of their columns. It is enough to show that ak = bk holds for all k. But we are assuming that Aek = Bek, which gives ak = bk by Example 12.
Transformations
The set ℝ² has a geometrical interpretation as the euclidean plane where a vector $\begin{bmatrix}a_1\\a_2\end{bmatrix}$ in ℝ² represents the point (a1, a2) in the plane (see Figure 1). In this way we regard ℝ² as the set of all points in the plane. Accordingly, we will refer to vectors in ℝ² as points, and denote their coordinates as a column rather than a row. To enhance this geometrical interpretation of the vector $\begin{bmatrix}a_1\\a_2\end{bmatrix}$, it is denoted graphically by an arrow from the origin $\begin{bmatrix}0\\0\end{bmatrix}$ to the vector as in Figure 1.

FIGURE 1

Similarly we identify ℝ³ with 3-dimensional space by writing a point (a1, a2, a3) as the vector $\begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix}$ in ℝ³, again represented by an arrow from the origin to the point as in Figure 2. In this way the terms "point" and "vector" mean the same thing in the plane or in space.

FIGURE 2

We begin by describing a particular geometrical transformation of the plane ℝ².
EXAMPLE 13
FIGURE 3
If we write $A = \begin{bmatrix}1&0\\0&-1\end{bmatrix}$, Example 13 shows that reflection in the x axis carries each vector x in ℝ² to the vector Ax in ℝ². It is thus an example of a function T : ℝ² → ℝ² where T(x) = Ax for all x in ℝ². As such it is a generalization of the familiar functions f : ℝ → ℝ that carry a number x to another real number f(x).
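In code, the matrix transformation induced by a matrix A is just "multiply by A". A pure-Python sketch (ours, not the text's) of the reflection in the x axis:

```python
def mat_vec_dot(A, x):
    # Entry i of Ax is the dot product of row i of A with x.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def reflect_x_axis(v):
    # T(x) = Ax with A = [[1, 0], [0, -1]]: the second coordinate is negated.
    return mat_vec_dot([[1, 0], [0, -1]], v)

print(reflect_x_axis([3, 2]))  # [3, -2]
```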
More generally, functions T : ℝⁿ → ℝᵐ are called transformations from ℝⁿ to ℝᵐ. Such a transformation T is a rule that assigns to every vector x in ℝⁿ a uniquely determined vector T(x) in ℝᵐ called the image of x under T. We denote this state of affairs by writing
T : ℝⁿ → ℝᵐ
The transformation T can be visualized as in Figure 4.

FIGURE 4
To describe a transformation T : ℝⁿ → ℝᵐ we must specify the vector T(x) in ℝᵐ for every x in ℝⁿ. This is referred to as defining T, or as specifying the action of T. Saying that the action defines the transformation means that we regard two transformations S : ℝⁿ → ℝᵐ and T : ℝⁿ → ℝᵐ as equal if they have the same action; more formally
S = T if and only if S(x) = T(x) for all x in ℝⁿ.
Again, this is what we mean by f = g where f, g : ℝ → ℝ are ordinary functions. Functions f : ℝ → ℝ are often described by a formula, examples being f(x) = x² + 1 and f(x) = sin x. The same is true of transformations; here is an example.
EXAMPLE 14
The formula $T\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}x_1+x_2\\x_2+x_3\\x_3+x_4\end{bmatrix}$ defines a transformation ℝ⁴ → ℝ³.
Thus Example 13 shows that reflection in the x axis is the matrix transformation ℝ² → ℝ² induced by the matrix $\begin{bmatrix}1&0\\0&-1\end{bmatrix}$. Also, the transformation R : ℝ⁴ → ℝ³ in Example 14 is the matrix transformation induced by the matrix
$A = \begin{bmatrix}1&1&0&0\\0&1&1&0\\0&0&1&1\end{bmatrix}$ because $\begin{bmatrix}1&1&0&0\\0&1&1&0\\0&0&1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}x_1+x_2\\x_2+x_3\\x_3+x_4\end{bmatrix}$.
EXAMPLE 15
EXAMPLE 16
If a > 0, the matrix transformation $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}ax\\y\end{bmatrix}$ induced by the matrix $A = \begin{bmatrix}a&0\\0&1\end{bmatrix}$ is called an x-expansion of ℝ² if a > 1, and an x-compression if 0 < a < 1. The reason for the names is clear in the diagram below. Similarly, if b > 0 the matrix $\begin{bmatrix}1&0\\0&b\end{bmatrix}$ gives rise to y-expansions and y-compressions.
5 Radian measure for angles is based on the fact that 360° equals 2π radians. Hence π radians = 180° and π/2 radians = 90°.
[Diagram: the effect of an x-compression (a = 1/2) and an x-expansion (a = 3/2) on a region of the plane.]
EXAMPLE 17
If a is a number, the matrix transformation $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x+ay\\y\end{bmatrix}$ induced by the matrix $A = \begin{bmatrix}1&a\\0&1\end{bmatrix}$ is called an x-shear of ℝ² (positive if a > 0 and negative if a < 0). Its effect is illustrated below when a = 1/4 and a = -1/4.
[Diagram: the positive x-shear (a = 1/4) carries $\begin{bmatrix}x\\y\end{bmatrix}$ to $\begin{bmatrix}x+\frac{1}{4}y\\y\end{bmatrix}$; the negative x-shear (a = -1/4) carries it to $\begin{bmatrix}x-\frac{1}{4}y\\y\end{bmatrix}$.]
We hasten to note that there are important geometric transformations that are not matrix transformations. For example, if w is a fixed column in ℝⁿ, define the transformation Tw : ℝⁿ → ℝⁿ by
Tw(x) = x + w for all x in ℝⁿ.
Then Tw is called translation by w. In particular, if $w = \begin{bmatrix}2\\1\end{bmatrix}$ in ℝ², the effect of Tw on $\begin{bmatrix}x\\y\end{bmatrix}$ is to translate it two units to the right and one unit up (see Figure 6).

FIGURE 6

The translation Tw is not a matrix transformation unless w = 0. Indeed, if Tw were induced by a matrix A, then Ax = Tw(x) = x + w would hold for every x in ℝⁿ. In particular, taking x = 0 gives w = A0 = 0.
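The argument above hinges on the fact that every matrix transformation fixes the origin. A small pure-Python sketch (ours, not the text's) makes the failure visible for a nonzero translation:

```python
def translate(w):
    # T_w(x) = x + w, translation by the fixed vector w.
    return lambda x: [a + b for a, b in zip(x, w)]

T = translate([2, 1])
print(T([3, 4]))  # [5, 5]: two units right, one unit up
# Any matrix transformation sends 0 to 0 (A0 = 0), but:
print(T([0, 0]))  # [2, 1] -- nonzero, so T is not induced by any matrix
```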
EXERCISES 2.2
1. In each case find a system of equations that is equivalent to the given vector equation. (Do not solve the system.)
(a) $x_1\begin{bmatrix}2\\-3\\0\end{bmatrix} + x_2\begin{bmatrix}1\\1\\4\end{bmatrix} + x_3\begin{bmatrix}2\\0\\-1\end{bmatrix} = \begin{bmatrix}5\\6\\-3\end{bmatrix}$
(b) $x_1\begin{bmatrix}1\\0\\1\\0\end{bmatrix} + x_2\begin{bmatrix}-3\\8\\2\\1\end{bmatrix} + x_3\begin{bmatrix}-3\\0\\2\\2\end{bmatrix} + x_4\begin{bmatrix}3\\2\\0\\-2\end{bmatrix} = \begin{bmatrix}5\\1\\2\\0\end{bmatrix}$

2. In each case find a vector equation that is equivalent to the given system of equations. (Do not solve the equation.)
(a) x1 - x2 + 3x3 = 5
-3x1 + x2 + x3 = -6
5x1 - 8x2 = 9
(b) x1 - 2x2 - x3 + x4 = 5
-x1 + x3 - 2x4 = -3
2x1 - 2x2 + 7x3 = 8
3x1 - 4x2 + 9x3 - 2x4 = 12

3. In each case compute Ax using: (i) Definition 2.5. (ii) Theorem 4.
(a) $A = \begin{bmatrix}3&-2&0\\5&-4&1\end{bmatrix}$ and $x = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$.
(b) $A = \begin{bmatrix}1&2&3\\0&-4&5\end{bmatrix}$ and $x = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$.
(c) $A = \begin{bmatrix}-2&0&5&4\\1&2&0&3\\-5&6&-7&8\end{bmatrix}$ and $x = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}$.
(d) $A = \begin{bmatrix}3&-4&1&6\\0&2&1&5\\-8&7&-3&0\end{bmatrix}$ and $x = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}$.

4. Let A = [a1 a2 a3 a4] be the 3 × 4 matrix given in terms of its columns $a_1 = \begin{bmatrix}1\\1\\-1\end{bmatrix}$, $a_2 = \begin{bmatrix}3\\0\\2\end{bmatrix}$, $a_3 = \begin{bmatrix}2\\-1\\3\end{bmatrix}$, and $a_4 = \begin{bmatrix}0\\-3\\5\end{bmatrix}$. In each case either express b as a linear combination of a1, a2, a3, and a4, or show that it is not such a linear combination. Explain what your answer means for the corresponding system Ax = b of linear equations.
(a) $b = \begin{bmatrix}0\\3\\5\end{bmatrix}$ (b) $b = \begin{bmatrix}4\\1\\1\end{bmatrix}$

5. In each case, express every solution of the system as a sum of a specific solution plus a solution of the associated homogeneous system.
(a) x + y + z = 2
2x + y = 3
x - y - 3z = 0
(b) x - y - 4z = -4
x + 2y + 5z = 2
x + y + 2z = 0
(c) x1 + x2 - x3 - 5x5 = 2
x2 + x3 - 4x5 = -1
x2 + x3 + x4 - x5 = -1
2x1 - 4x3 + x4 + x5 = 6
(d) 2x1 + x2 - x3 - x4 = -1
3x1 + x2 + x3 - 2x4 = -2
-x1 - x2 + 2x3 + x4 = 2
-2x1 - x2 + 2x4 = 3

6. If x0 and x1 are solutions to the homogeneous system of equations Ax = 0, use Theorem 2 to show that sx0 + tx1 is also a solution for any scalars s and t (called a linear combination of x0 and x1).

7. Assume that $A\begin{bmatrix}1\\-1\\2\end{bmatrix} = 0 = A\begin{bmatrix}2\\0\\3\end{bmatrix}$ and that $x_0 = \begin{bmatrix}2\\-1\\3\end{bmatrix}$ is a solution to Ax = b. Find a two-parameter family of solutions to Ax = b.

8. In each case write the system in the form Ax = b, use the gaussian algorithm to solve the system, and express the solution as a particular solution plus a linear combination of basic solutions to the associated homogeneous system Ax = 0.
(a) x1 - 2x2 + x3 + 4x4 - x5 = 8
-2x1 + 4x2 + x3 - 2x4 - 4x5 = -1
3x1 - 6x2 + 8x3 + 4x4 - 13x5 = 1
8x1 - 16x2 + 7x3 + 12x4 - 6x5 = 11
(b) x1 - 2x2 + x3 + 2x4 + 3x5 = -4
-3x1 + 6x2 - 2x3 - 3x4 - 11x5 = 11
-2x1 + 4x2 - x3 + x4 - 8x5 = 7
-x1 + 2x2 + 3x4 - 5x5 = 3

9. Given vectors $a_1 = \begin{bmatrix}1\\0\\1\end{bmatrix}$, $a_2 = \begin{bmatrix}1\\1\\0\end{bmatrix}$, and $a_3 = \begin{bmatrix}0\\-1\\1\end{bmatrix}$, find a vector b that is not a linear combination of a1, a2, and a3. Justify your answer. [Hint: Part (2) of Theorem 1.]

10. In each case either show that the statement is true, or give an example showing that it is false.
(a) $\begin{bmatrix}3\\2\end{bmatrix}$ is a linear combination of $\begin{bmatrix}1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\1\end{bmatrix}$.
(b) If Ax has a zero entry, then A has a row of zeros.
(c) If Ax = 0 where x ≠ 0, then A = 0.
(d) Every linear combination of vectors in ℝⁿ can be written in the form Ax.
(e) If A = [a1 a2 a3] in terms of its columns, and if b = 3a1 - 2a2, then the system Ax = b has a solution.
(f) If A = [a1 a2 a3] in terms of its columns, and if the system Ax = b has a solution, then b = sa1 + ta2 for some s, t.
(g) If A is m × n and m < n, then Ax = b has a solution for every column b.
(h) If Ax = b has a solution for some column b, then it has a solution for every column b.
(i) If x1 and x2 are solutions to Ax = b, then x1 - x2 is a solution to Ax = 0.
(j) Let A = [a1 a2 a3] in terms of its columns. …
16. If a vector b is a linear combination of the …
11. Let T : ℝ² → ℝ² be a transformation. In each case show that T is induced by a matrix and find the matrix.
(a) T is a reflection in the y axis.
(b) T is a reflection in the line y = x.
(c) T is a reflection in the line y = -x.
(d) T is a clockwise rotation through π/2.

12. The projection P : ℝ³ → ℝ² is defined by $P\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$ for all $\begin{bmatrix}x\\y\\z\end{bmatrix}$ in ℝ³. Show that P is induced by a matrix and find the matrix.

13. Let T : ℝ³ → ℝ³ be a transformation. In each case show that T is induced by a matrix and find the matrix.
(a) T is a reflection in the x-y plane.
(b) T is a reflection in the y-z plane.

14. Fix a > 0 in ℝ, and define Ta : ℝ⁴ → ℝ⁴ by Ta(x) = ax for all x in ℝ⁴. Show that Ta is induced by a matrix and find the matrix. [Ta is called a dilation if a > 1 and a contraction if a < 1.]

15. Let A be m × n and let x be in ℝⁿ. If A has a row of zeros, show that Ax has a zero entry.

18. Let x1 and x2 be solutions to the homogeneous system Ax = 0.
(a) Show that x1 + x2 is a solution to Ax = 0.
(b) Show that tx1 is a solution to Ax = 0 for any scalar t.

19. Suppose x1 is a solution to the system Ax = b. If x0 is any nontrivial solution to the associated homogeneous system Ax = 0, show that x1 + tx0, t a scalar, is an infinite one-parameter family of solutions to Ax = b. [Hint: Example 7 Section 2.1.]

20. Let A and B be matrices of the same size. If x is a solution to both the system Ax = 0 and the system Bx = 0, show that x is a solution to the system (A + B)x = 0.

21. If A is m × n and Ax = 0 for every x in ℝⁿ, show that A = 0 is the zero matrix. [Hint: Consider Aej where ej is the jth column of In; that is, ej is the vector in ℝⁿ with 1 as entry j and every other entry 0.]

22. Prove part (1) of Theorem 2.

23. Prove part (2) of Theorem 2.
If A = [a1 a2 ⋯ an] is given in terms of its columns aj, and if $x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$, Definition 2.5 reads
Ax = x1a1 + x2a2 + ⋯ + xnan (∗)
This was motivated as a way of describing systems of linear equations with
coefficient matrix A. Indeed every such system has the form Ax = b where b is
the column of constants.
In this section we extend this matrix-vector multiplication to a way of multiplying
matrices in general, and then investigate matrix algebra for its own sake. While
it shares several properties of ordinary arithmetic, it will soon become clear that
matrix arithmetic is different in a number of ways.
Matrix multiplication is closely related to composition of transformations.
SECTION 2.3 Matrix Multiplication 57
with columns Ab1, Ab2, …, Abk. Now compute (TA ◦ TB)(x) for any x = [x1; x2; ⋯; xk] in ℝᵏ:
(TA ◦ TB)(x) = TA[TB(x)]                          Definition of TA ◦ TB
             = A(Bx)                              A and B induce TA and TB
             = A(x1b1 + x2b2 + ⋯ + xkbk)          Equation (∗) above
             = A(x1b1) + A(x2b2) + ⋯ + A(xkbk)    Theorem 2, Section 2.2
             = x1(Ab1) + x2(Ab2) + ⋯ + xk(Abk)    Theorem 2, Section 2.2
             = [Ab1 Ab2 ⋯ Abk]x.                  Equation (∗) above

Because x was an arbitrary vector in ℝᵏ, this shows that TA ◦ TB is the matrix transformation induced by the matrix [Ab1 Ab2 ⋯ Abk]. This motivates the following definition.
6 When reading the notation S ◦ T, we read S first and then T even though the action is “first T then S ”. This annoying state of affairs
results because we write T(x) for the effect of the transformation T on x, with T on the left. If we wrote this instead as (x)T, the
confusion would not occur. However the notation T(x) is well established.
58 Chapter 2 Matrix Algebra
Thus the product matrix AB is given in terms of its columns Ab1, Ab2, …, Abk: Column j of AB is the matrix-vector product Abj of A and the corresponding column bj of B. Note that each such product Abj makes sense by Definition 2.5 because A is m × n and each bj is in ℝⁿ (since B has n rows). Note also that if B is a column matrix, this definition reduces to Definition 2.5 for matrix-vector multiplication.
Given matrices A and B, Definition 2.9 and the above computation give
A(Bx) = [Ab1 Ab2 ⋯ Abk]x = (AB)x

for all x in ℝᵏ. We record this for reference.
Theorem 1

Let A be an m × n matrix and let B be an n × k matrix. Then A(Bx) = (AB)x for all columns x in ℝᵏ.
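Theorem 1 can be checked numerically. The following sketch uses the 3 × 3 and 3 × 2 matrices of Example 1 below together with an arbitrary column x (the choice of x is ours), and confirms that applying B then A agrees with applying the product AB:

```python
import numpy as np

# Numerical check of Theorem 1: A(Bx) = (AB)x.
A = np.array([[2, 3, 5],
              [1, 4, 7],
              [0, 1, 8]])
B = np.array([[8, 9],
              [7, 2],
              [6, 1]])
x = np.array([3, -1])    # arbitrary column in R^2

left = A @ (B @ x)       # A(Bx): apply B first, then A
right = (A @ B) @ x      # (AB)x: form the product matrix first
assert np.array_equal(left, right)
print(left)              # [172 210 155]
```

The `@` operator performs matrix multiplication in NumPy, so the two sides are computed exactly as the theorem states them.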
EXAMPLE 1

Compute AB if A = [2 3 5; 1 4 7; 0 1 8] and B = [8 9; 7 2; 6 1].

Solution ► The columns of B are b1 = [8; 7; 6] and b2 = [9; 2; 1], so Definition 2.5 gives

Ab1 = [2 3 5; 1 4 7; 0 1 8][8; 7; 6] = [67; 78; 55]  and  Ab2 = [2 3 5; 1 4 7; 0 1 8][9; 2; 1] = [29; 24; 10].

Hence Definition 2.9 above gives AB = [Ab1 Ab2] = [67 29; 78 24; 55 10].
While Definition 2.9 is important, there is another way to compute the matrix
product AB that gives a way to calculate each individual entry. In Section 2.2
we defined the dot product of two n-tuples to be the sum of the products of
corresponding entries. We went on to show (Theorem 4 Section 2.2) that if A
is an m × n matrix and x is an n-vector, then entry j of the product Ax is the dot
product of row j of A with x. This observation was called the “dot product rule” for
matrix-vector multiplication, and the next theorem shows that it extends to matrix
multiplication in general.
Theorem 2

Dot Product Rule

Let A be an m × n matrix and B an n × k matrix. Then the (i, j)-entry of the product AB is the dot product of row i of A with column j of B.
PROOF
Write B = [b1 b2 ⋯ bk] in terms of its columns. Then Abj is column j of AB
for each j. Hence the (i, j)-entry of AB is entry i of Abj, which is the dot product
of row i of A with bj. This proves the theorem.
Thus to compute the (i, j)-entry of AB, proceed as follows: go across row i of A, and down column j of B, multiply corresponding entries, and add the results. Note that this requires that the rows of A be the same length as the columns of B. The following rule is useful for remembering this and for deciding the size of the product matrix AB.
Compatibility Rule

Let A and B denote matrices. If A is m × n and B is n′ × k, the product AB can be formed if and only if n = n′. In this case the size of the product matrix AB is m × k, and we say that AB is defined, or that A and B are compatible for multiplication.

A        B
m × n    n′ × k

The diagram provides a useful mnemonic for remembering this. We adopt the following convention:
Convention
Whenever a product of matrices is written, it is tacitly assumed that the sizes of the
factors are such that the product is defined.
To illustrate the dot product rule, we recompute the matrix product in Example 1.
EXAMPLE 2

Compute AB if A = [2 3 5; 1 4 7; 0 1 8] and B = [8 9; 7 2; 6 1].

Solution ► Here A is 3 × 3 and B is 3 × 2, so the product matrix AB is defined and will be of size 3 × 2. Theorem 2 gives each entry of AB as the dot product of the corresponding row of A with the corresponding column of B; that is,

AB = [2·8+3·7+5·6  2·9+3·2+5·1; 1·8+4·7+7·6  1·9+4·2+7·1; 0·8+1·7+8·6  0·9+1·2+8·1] = [67 29; 78 24; 55 10].

Of course, this agrees with Example 1.
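The dot product rule lends itself to a direct implementation. A minimal sketch: build each entry of AB as a dot product of a row of A with a column of B, then check the result against NumPy's built-in product.

```python
import numpy as np

# Dot product rule: (i, j)-entry of AB = (row i of A) . (column j of B).
# A and B are the matrices of Example 2.
A = np.array([[2, 3, 5],
              [1, 4, 7],
              [0, 1, 8]])
B = np.array([[8, 9],
              [7, 2],
              [6, 1]])

m, n = A.shape
n2, k = B.shape
assert n == n2, "rows of A must be the same length as columns of B"

# Assemble AB entry by entry from dot products.
AB = np.array([[A[i, :] @ B[:, j] for j in range(k)] for i in range(m)])
assert np.array_equal(AB, A @ B)
print(AB)   # [[67 29] [78 24] [55 10]]
```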
EXAMPLE 3

Compute the (1, 3)- and (2, 4)-entries of AB where

A = [3 -1 2; 0 1 4]  and  B = [2 1 6 0; 0 2 3 4; -1 0 5 8].

Then compute AB.

Solution ► The (1, 3)-entry of AB is the dot product of row 1 of A and column 3 of B, computed by multiplying corresponding entries and adding the results:

(1, 3)-entry = 3·6 + (-1)·3 + 2·5 = 25

Similarly, the (2, 4)-entry of AB involves row 2 of A and column 4 of B:

(2, 4)-entry = 0·0 + 1·4 + 4·8 = 36

Since A is 2 × 3 and B is 3 × 4, the product is 2 × 4.

AB = [3 -1 2; 0 1 4][2 1 6 0; 0 2 3 4; -1 0 5 8] = [4 1 25 12; -4 2 23 36]
EXAMPLE 4

If A = [1 3 2] and B = [5; 6; 4], compute A2, AB, BA, and B2 when they are defined.7

Solution ► Here, A is a 1 × 3 matrix and B is a 3 × 1 matrix, so A2 and B2 are not defined. However, the rule reads

A      B           B      A
1 × 3  3 × 1  and  3 × 1  1 × 3

so both AB and BA can be formed and these are 1 × 1 and 3 × 3 matrices, respectively.

AB = [1 3 2][5; 6; 4] = [1·5 + 3·6 + 2·4] = [31]

BA = [5; 6; 4][1 3 2] = [5·1 5·3 5·2; 6·1 6·3 6·2; 4·1 4·3 4·2] = [5 15 10; 6 18 12; 4 12 8]

7 As for numbers, we write A2 = A · A, A3 = A · A · A, etc. Note that A2 is defined if and only if A is of size n × n for some n.
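The size bookkeeping in Example 4 can be confirmed in code: a row times a column is 1 × 1, while the reverse order gives a 3 × 3 "outer product".

```python
import numpy as np

# Example 4 revisited: AB is 1 x 1 but BA is 3 x 3.
A = np.array([[1, 3, 2]])      # 1 x 3
B = np.array([[5], [6], [4]])  # 3 x 1

AB = A @ B   # 1 x 1
BA = B @ A   # 3 x 3
assert AB.shape == (1, 1) and BA.shape == (3, 3)
print(AB)    # [[31]]
print(BA)    # rows are 5, 6, 4 times [1 3 2]
```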
EXAMPLE 5

EXAMPLE 6

If A is any matrix, then IA = A and AI = A, where I denotes an identity matrix of a size so that the multiplications are defined.

Solution ► These both follow from the dot product rule as the reader should verify. For a more formal proof, write A = [a1 a2 ⋯ an] where aj is column j of A. Then Definition 2.9 and Example 11 Section 2.2 give

IA = [Ia1 Ia2 ⋯ Ian] = [a1 a2 ⋯ an] = A

If ej denotes column j of I, then Aej = aj for each j by Example 12 Section 2.2. Hence Definition 2.9 gives:

AI = A[e1 e2 ⋯ en] = [Ae1 Ae2 ⋯ Aen] = [a1 a2 ⋯ an] = A
The following theorem collects several results about matrix multiplication that
are used everywhere in linear algebra.
Theorem 3
Assume that a is any scalar, and that A, B, and C are matrices of sizes such that the
indicated matrix products are defined. Then:
1. IA = A and AI = A where I denotes an identity matrix.
2. A(BC) = (AB)C.
3. A(B + C) = AB + AC.
4. (B + C)A = BA + CA.
5. a(AB) = (aA)B = A(aB).
6. (AB)T = BTAT.
PROOF

(1) is Example 6; we prove (2), (4), and (6) and leave (3) and (5) as exercises.

(2) If C = [c1 c2 ⋯ ck] in terms of its columns, then BC = [Bc1 Bc2 ⋯ Bck] by Definition 2.9, so

A(BC) = [A(Bc1) A(Bc2) ⋯ A(Bck)]    Definition 2.9
      = [(AB)c1 (AB)c2 ⋯ (AB)ck]    Theorem 1
      = (AB)C                       Definition 2.9

(4) We know (Theorem 2 Section 2.2) that (B + C)x = Bx + Cx holds for every column x. If we write A = [a1 a2 ⋯ an] in terms of its columns, we get

(B + C)A = [(B + C)a1 (B + C)a2 ⋯ (B + C)an]    Definition 2.9
         = [Ba1 + Ca1 Ba2 + Ca2 ⋯ Ban + Can]    Theorem 2 Section 2.2
         = [Ba1 Ba2 ⋯ Ban] + [Ca1 Ca2 ⋯ Can]    Adding Columns
         = BA + CA                               Definition 2.9

(6) As in Section 2.1, write A = [aij] and B = [bij], so that AT = [a′ij] and BT = [b′ij] where a′ij = aji and b′ij = bji for all i and j. If cij denotes the (i, j)-entry of BTAT, then cij is the dot product of row i of BT with column j of AT. Since row i of BT is [b′i1 b′i2 ⋯ b′im] and column j of AT is [a′1j a′2j ⋯ a′mj]T, we obtain

cij = b′i1a′1j + b′i2a′2j + ⋯ + b′ima′mj
    = b1iaj1 + b2iaj2 + ⋯ + bmiajm
    = aj1b1i + aj2b2i + ⋯ + ajmbmi.

But this is the dot product of row j of A with column i of B; that is, the (j, i)-entry of AB; that is, the (i, j)-entry of (AB)T. This proves (6).
Warning
If the order of the factors in a product of matrices is changed, the product matrix
may change (or may not be defined). Ignoring this warning is a source of many
errors by students of linear algebra!
Properties 3 and 4 in Theorem 3 are called distributive laws. They assert that
A(B + C) = AB + AC and (B + C)A = BA + CA hold whenever the sums and
products are defined. These rules extend to more than two terms and, together
with Property 5, ensure that many manipulations familiar from ordinary algebra
extend to matrices. For example
A(2B - 3C + D - 5E) = 2AB - 3AC + AD - 5AE
(A + 3C - 2D)B = AB + 3CB - 2DB
Note again that the warning is in effect: For example A(B - C) need not equal
AB - CA. These rules make possible a lot of simplification of matrix expressions.
EXAMPLE 7

Simplify the expression A(BC - CD) + A(C - B)D - AB(C - D).

Solution ► Multiply out using the distributive laws, keeping the order of the factors in each product:

A(BC - CD) + A(C - B)D - AB(C - D) = ABC - ACD + ACD - ABD - ABC + ABD = 0
Examples 8 and 9 below show how we can use the properties in Theorem 3 to deduce other facts about matrix multiplication. Matrices A and B are said to commute if AB = BA.
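A quick numerical illustration of non-commutativity (the 2 × 2 matrices here are arbitrary sample values, not taken from the text):

```python
import numpy as np

# Matrix multiplication is not commutative in general.
A = np.array([[1, 1],
              [0, 1]])
B = np.array([[1, 0],
              [1, 1]])

print(A @ B)   # [[2 1] [1 1]]
print(B @ A)   # [[1 1] [1 2]]
assert not np.array_equal(A @ B, B @ A)   # A and B do not commute

# The identity matrix commutes with everything (Example 6).
I = np.eye(2, dtype=int)
assert np.array_equal(A @ I, I @ A)
```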
EXAMPLE 8
Suppose that A, B, and C are n × n matrices and that both A and B commute
with C; that is, AC = CA and BC = CB. Show that AB commutes with C.
EXAMPLE 9
Show that AB = BA if and only if (A - B)(A + B) = A2 - B2.
In Section 2.2 we saw (in Theorem 1) that every system of linear equations has
the form
Ax = b
where A is the coefficient matrix, x is the column of variables, and b is the constant
matrix. Thus the system of linear equations becomes a single matrix equation. Matrix
multiplication can yield information about such a system.
EXAMPLE 10
Consider a system Ax = b of linear equations where A is an m × n matrix.
Assume that a matrix C exists such that CA = In. If the system Ax = b has a
solution, show that this solution must be Cb. Give a condition guaranteeing
that Cb is in fact a solution.
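A sketch of the situation in Example 10 with illustrative matrices: C and A below are our own sample values chosen so that CA = I2 (they are assumptions for the demonstration, not taken from the example).

```python
import numpy as np

# C is a left inverse of A (CA = I2), so if Ax = b has a solution,
# that solution must be x = Cb.
A = np.array([[2, -3],
              [1, -2],
              [6, -10]])
C = np.array([[0, -5, 1],
              [3, 0, -1]])
assert np.array_equal(C @ A, np.eye(2, dtype=int))   # CA = I2

b = A @ np.array([4, 1])          # consistent by construction
x = C @ b                         # the only possible solution
assert np.array_equal(A @ x, b)   # and here Cb is in fact a solution
print(x)                          # [4 1]
```

When Ax = b has no solution, Cb still exists but fails to satisfy the system, which is why Example 10 asks for a condition guaranteeing that Cb is a solution.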
The ideas in Example 10 lead to important information about matrices; this will
be pursued in the next section.
Block Multiplication
Definition 2.10 It is often useful to consider matrices whose entries are themselves matrices (called
blocks). A matrix viewed in this way is said to be partitioned into blocks.
Theorem 4
Block Multiplication
If matrices A and B are partitioned compatibly into blocks, the product AB can be
computed by matrix multiplication using blocks as entries.
Bx = [b1 b2 ⋯ bk][x1; x2; ⋯; xk] = x1b1 + x2b2 + ⋯ + xkbk

where x is any k × 1 column matrix (this is Definition 2.5).
It is not our intention to pursue block multiplication in detail here. However, we
give one more example because it will be used below.
Theorem 5

Suppose matrices A = [B X; 0 C] and A1 = [B1 X1; 0 C1] are partitioned as shown, where B and B1 are square matrices of the same size and C and C1 are square matrices of the same size. Then

AA1 = [B X; 0 C][B1 X1; 0 C1] = [BB1  BX1 + XC1; 0  CC1].
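Theorem 5 can be verified numerically with NumPy's `np.block`, which assembles a matrix from blocks; the individual 2 × 2 blocks below are arbitrary sample values.

```python
import numpy as np

# Block multiplication for block upper triangular matrices:
# [B X; 0 C][B1 X1; 0 C1] = [B@B1  B@X1 + X@C1; 0  C@C1].
B  = np.array([[1, 2], [0, 1]])
X  = np.array([[3, 0], [1, 1]])
C  = np.array([[2, 1], [1, 1]])
B1 = np.array([[1, 1], [2, 0]])
X1 = np.array([[0, 1], [1, 0]])
C1 = np.array([[1, 0], [3, 1]])
Z  = np.zeros((2, 2), dtype=int)

A  = np.block([[B, X], [Z, C]])      # 4 x 4
A1 = np.block([[B1, X1], [Z, C1]])   # 4 x 4

# Multiply blockwise, treating the blocks as entries.
blockwise = np.block([[B @ B1, B @ X1 + X @ C1],
                      [Z,      C @ C1]])
assert np.array_equal(A @ A1, blockwise)
```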
EXAMPLE 11
Block multiplication has theoretical uses as we shall see. However, it is also useful
in computing products of matrices in a computer with limited memory capacity.
The matrices are partitioned into blocks in such a way that each product of blocks
can be handled. Then the blocks are stored in auxiliary memory and their products
are computed one by one.
Directed Graphs
The study of directed graphs illustrates how matrix multiplication arises in ways
other than the study of linear equations or matrix transformations.
A directed graph consists of a set of points (called vertices) connected by arrows (called edges). For example, the vertices could represent cities and the edges available flights. If the graph has n vertices v1, v2, …, vn, the adjacency matrix A = [aij] is the n × n matrix whose (i, j)-entry aij is 1 if there is an edge from vj to vi (note the order), and zero otherwise. For example, the adjacency matrix of the directed graph shown (with vertices v1, v2, v3) is

A = [1 1 0; 1 0 1; 1 0 0].

A path of length r (or an r-path) from vertex j to vertex i is a sequence of r edges leading from vj to vi.
Thus v1 → v2 → v1 → v1 → v3 is a 4-path from v1 to v3 in the given graph. The
edges are just the paths of length 1, so the (i, j)-entry aij of the adjacency matrix
A is the number of 1-paths from vj to vi. This observation has an important
extension:
Theorem 6
If A is the adjacency matrix of a directed graph with n vertices, then the (i, j)-entry of Ar
is the number of r-paths vj → vi.
A = [1 1 0; 1 0 1; 1 0 0],   A2 = [2 1 1; 2 1 0; 1 1 0],   and   A3 = [4 2 1; 3 2 1; 2 1 1].
Hence, since the (2, 1)-entry of A2 is 2, there are two 2-paths v1 → v2 (in fact
v1 → v1 → v2 and v1 → v3 → v2). Similarly, the (2, 3)-entry of A2 is zero, so there
are no 2-paths v3 → v2, as the reader can verify. The fact that no entry of A3 is zero
shows that it is possible to go from any vertex to any other vertex in exactly three
steps.
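The computation of A2 and A3 above, and the path counts they encode, can be reproduced as follows:

```python
import numpy as np

# Path counting via adjacency-matrix powers (Theorem 6): the (i, j)-entry
# of A^r counts the r-paths from vertex j to vertex i.
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])

A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)
print(A2)   # [[2 1 1] [2 1 0] [1 1 0]]
print(A3)   # [[4 2 1] [3 2 1] [2 1 1]]

assert A2[1, 0] == 2    # two 2-paths v1 -> v2
assert A2[1, 2] == 0    # no 2-paths v3 -> v2
assert (A3 > 0).all()   # every vertex reachable from every other in 3 steps
```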
To see why Theorem 6 is true, observe that it asserts that
the (i, j)-entry of Ar equals the number of r-paths vj → vi (∗)
holds for each r ≥ 1. We proceed by induction on r (see Appendix C). The case
r = 1 is the definition of the adjacency matrix. So assume inductively that (∗) is true
for some r ≥ 1; we must prove that (∗) also holds for r + 1. But every (r + 1)-path
vj → vi is the result of an r-path vj → vk for some k, followed by a 1-path vk → vi.
Writing A = [aij] and Ar = [bij], there are bkj paths of the former type (by induction)
and aik of the latter type, and so there are aikbkj such paths in all. Summing over k,
this shows that there are
ai1b1j + ai2b2j + ⋯ + ainbnj   (r + 1)-paths vj → vi.

But this sum is the dot product of the ith row [ai1 ai2 ⋯ ain] of A with the jth column [b1j b2j ⋯ bnj]T of Ar. As such, it is the (i, j)-entry of the matrix product
ArA = Ar+1. This shows that (∗) holds for r + 1, as required.
EXERCISES 2.3
1. Compute the following matrix products.

(a) [1 3; 0 -2][2 -1; 0 1]

(b) [1 -1 2; 2 0 4][2 3 1; 1 9 7; -1 0 2]

(e) [1 0 0; 0 1 0; 0 0 1][3 -2; 5 -7; 9 7]

(f) [1 -1 3][2; 1; -8]

(g) [2; 1; -7][1 -1 3]

(h) [3 1; 5 2][2 -1; -5 3]

(i) [2 3 1; 5 7 4][a 0 0; 0 b 0; 0 0 c]

(j) [a 0 0; 0 b 0; 0 0 c][a 0 0; 0 b 0; 0 0 c]

2. In each of the following cases, find all possible products A2, AB, AC, and so on.

(a) A = [1 2 3; -1 0 0], B = [1 -2; 1/2 3], C = [-1 0; 2 5; 0 5]

(b) A = [1 2 4; 0 1 -1], B = [-1 6; 1 0], C = [2 0; -1 1; 1 2]

3. Find a, b, a1, and b1 if:

(a) [a b; a1 b1][3 -5; -1 2] = [1 -1; 2 0]

(b) [2 1; -1 2][a b; a1 b1] = [7 2; -1 4]

4. Verify that A2 - A - 6I = 0 if:

(a) A = [3 -1; 0 -2]   (b) A = [2 2; 2 -1]

5. Given A = [1 -1; 0 1], B = [1 0 -2; 3 1 0], C = [1 0; 2 1; 5 8], and D = [3 -1 2; 1 0 5], verify the following facts from Theorem 3.

(a) A(B - D) = AB - AD   (b) A(BC) = (AB)C   (c) (CD)T = DTCT

6. … A = [a b; 0 a] for some a and b.

(c) Show that A commutes with every 2 × 2 matrix if and only if A = [a 0; 0 a] for some a.

7. (a) If A2 can be formed, what can be said about the size of A?

(b) If AB and BA can both be formed, describe the sizes of A and B.

(c) If ABC can be formed, A is 3 × 3, and C is 5 × 5, what size is B?

8. (a) Find two 2 × 2 matrices A such that A2 = 0.

(b) Find three 2 × 2 matrices A such that (i) A2 = I; (ii) A2 = A.

(c) Find 2 × 2 matrices A and B such that AB = 0 but BA ≠ 0.

9. Write P = [1 0 0; 0 0 1; 0 1 0], and let A be 3 × n and B be m × 3.

(a) Describe PA in terms of the rows of A.

(b) Describe BP in terms of the columns of B.

10. Let A, B, and C be as in Exercise 5. Find the (3, 1)-entry of CAB using exactly six numerical multiplications.

11. Compute AB, using the indicated block partitioning.

A = [2 -1 3 1; 1 0 1 2; 0 0 1 0; 0 0 0 1]   B = [1 2 0; -1 0 0; 0 5 1; 1 -1 0]

12. In each case give formulas for all powers A, A2, A3, … of A using the block decomposition indicated.
(b) Repeat part (a) for the case where A and B are n × n.

29. Let A and B be n × n matrices for which the systems of equations Ax = 0 and Bx = 0 each have only the trivial solution x = 0. Show that the system (AB)x = 0 has only the trivial solution.

30. The trace of a square matrix A, denoted tr A, is the sum of the elements on the main diagonal of A. Show that, if A and B are n × n matrices:

(a) tr(A + B) = tr A + tr B.

(b) tr(kA) = k tr(A) for any number k.

(c) tr(AT) = tr(A).

(d) tr(AB) = tr(BA).

(e) tr(AAT) is the sum of the squares of all entries of A.

31. Show that AB - BA = I is impossible. [Hint: See the preceding exercise.]

32. A square matrix P is called an idempotent if P2 = P. Show that:

(f) If A is n × m and B is m × n, and if AB = In, then BA is an idempotent.

33. Let A and B be n × n diagonal matrices (all entries off the main diagonal are zero).

(a) Show that AB is diagonal and AB = BA.

(b) Formulate a rule for calculating XA if X is m × n.

(c) Formulate a rule for calculating AY if Y is n × k.

34. If A and B are n × n matrices, show that:

(a) AB = BA if and only if (A + B)2 = A2 + 2AB + B2.

(b) AB = BA if and only if (A + B)(A - B) = (A - B)(A + B).

35. In Theorem 3, prove (a) part 3; (b) part 5.

36. (V. Camillo) Show that the product of two reduced row-echelon matrices is also reduced row-echelon.
Definition 2.11 This suggests the following definition. If A is a square matrix, a matrix B is called an
inverse of A if and only if
AB = I and BA = I
A matrix A that has an inverse is called an invertible matrix.8
EXAMPLE 1
Show that B = [-1 1; 1 0] is an inverse of A = [0 1; 1 1].

Solution ► Compute AB and BA.

AB = [0 1; 1 1][-1 1; 1 0] = [1 0; 0 1]   BA = [-1 1; 1 0][0 1; 1 1] = [1 0; 0 1]

Hence AB = I = BA, so B is indeed an inverse of A.
EXAMPLE 2
The argument in Example 2 shows that no zero matrix has an inverse. But
Example 2 also shows that, unlike arithmetic, it is possible for a nonzero matrix to
have no inverse. However, if a matrix does have an inverse, it has only one.
Theorem 1

If B and C are both inverses of A, then B = C.
PROOF
Since B and C are both inverses of A, we have CA = I = AB.
Hence B = IB = (CA)B = C(AB) = CI = C.
8 Only square matrices have inverses. Even though it is plausible that nonsquare matrices A and B could exist such that AB = Im and
BA = In, where A is m × n and B is n × m, we claim that this forces n = m. Indeed, if m < n there exists a nonzero column x
such that Ax = 0 (by Theorem 1 Section 1.3), so x = In x = (BA)x = B(Ax) = B(0) = 0, a contradiction. Hence m ≥ n. Similarly, the
condition AB = Im implies that n ≥ m. Hence m = n so A is square.
SECTION 2.4 Matrix Inverses 71
These equations characterize A-1 in the following sense: If somehow a matrix B can
be found such that AB = I = BA, then A is invertible and B is the inverse of A; in
symbols, B = A-1. This gives us a way of verifying that the inverse of a matrix exists.
Examples 3 and 4 offer illustrations.
EXAMPLE 3

The next example presents a useful formula for the inverse of a 2 × 2 matrix A = [a b; c d]. To state it, we define the determinant det A and the adjugate adj A of the matrix A as follows:

det [a b; c d] = ad - bc,  and  adj [a b; c d] = [d -b; -c a]
EXAMPLE 4

If A = [a b; c d], show that A has an inverse if and only if det A ≠ 0, and in this case

A-1 = (1/det A) adj A

Solution ► For convenience, write e = det A = ad - bc and B = adj A = [d -b; -c a]. Then AB = eI = BA as the reader can verify. So if e ≠ 0, scalar multiplication by 1/e gives A((1/e)B) = I = ((1/e)B)A. Hence A is invertible and A-1 = (1/e)B. Thus it remains only to show that if A-1 exists, then e ≠ 0.

We prove this by showing that assuming e = 0 leads to a contradiction. In fact, if e = 0, then AB = eI = 0, so left multiplication by A-1 gives A-1AB = A-1·0; that is, IB = 0, so B = 0. But this implies that a, b, c, and d are all zero, so A = 0, contrary to the assumption that A-1 exists.
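The formula of Example 4 translates directly into code. A minimal sketch (the helper name `inverse2x2` is ours), checked against `numpy.linalg.inv`:

```python
import numpy as np

def inverse2x2(A):
    """Invert a 2x2 matrix via det/adjugate: A^(-1) = (1/det A) adj A."""
    (a, b), (c, d) = A
    det = a * d - b * c          # det A = ad - bc
    if det == 0:
        raise ValueError("matrix is not invertible")
    adj = np.array([[d, -b],     # adj [a b; c d] = [d -b; -c a]
                    [-c, a]])
    return adj / det

A = np.array([[5, -3],
              [7, 4]])           # det A = 20 + 21 = 41
Ainv = inverse2x2(A)
assert np.allclose(A @ Ainv, np.eye(2))
assert np.allclose(Ainv, np.linalg.inv(A))
```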
Theorem 2

Suppose a system of n equations in n variables is written in matrix form as Ax = b. If the n × n coefficient matrix A is invertible, the system has the unique solution x = A-1b.
EXAMPLE 5

Use Example 4 to solve the system 5x1 - 3x2 = -4, 7x1 + 4x2 = 8.

Solution ► In matrix form this is Ax = b where A = [5 -3; 7 4] and b = [-4; 8]. Then det A = 5·4 - (-3)·7 = 41, so A is invertible and A-1 = (1/41)[4 3; -7 5] by Example 4. Thus Theorem 2 gives

x = A-1b = (1/41)[4 3; -7 5][-4; 8] = (1/41)[8; 68],

so the solution is x1 = 8/41 and x2 = 68/41.
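The same computation can be carried out numerically; this sketch assumes (consistent with the inverse (1/41)[4 3; -7 5] used above) that the system has coefficient matrix A = [5 -3; 7 4] and constant column b = [-4; 8]:

```python
import numpy as np

# Solve Ax = b as x = A^(-1) b (Theorem 2).
A = np.array([[5, -3],
              [7, 4]])
b = np.array([-4, 8])

x = np.linalg.inv(A) @ b
assert np.allclose(A @ x, b)
print(x * 41)   # numerators of x = [8/41, 68/41]
```

In practice `np.linalg.solve(A, b)` is preferred over forming the inverse explicitly, but the inverse makes the connection to Theorem 2 visible.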
An Inversion Method
If a matrix A is n × n and invertible, it is desirable to have an efficient technique for finding the inverse matrix A-1. In fact, we can determine A-1 from the equation

AA-1 = In

Write A-1 in terms of its columns as A-1 = [x1 x2 ⋯ xn], where the columns xj are to be determined. Similarly, write In = [e1 e2 ⋯ en] in terms of its columns. Then (using Definition 2.9) the condition AA-1 = I becomes

[Ax1 Ax2 ⋯ Axn] = [e1 e2 ⋯ en]
EXAMPLE 6

Use the inversion algorithm to find the inverse of the matrix

A = [2 7 1; 1 4 -1; 1 3 0]

Solution ► Apply elementary row operations to the double matrix

[A I] = [2 7 1 | 1 0 0; 1 4 -1 | 0 1 0; 1 3 0 | 0 0 1]

so as to carry A to I. First interchange rows 1 and 2.

[1 4 -1 | 0 1 0; 2 7 1 | 1 0 0; 1 3 0 | 0 0 1]

Next subtract 2 times row 1 from row 2, and subtract row 1 from row 3.

[1 4 -1 | 0 1 0; 0 -1 3 | 1 -2 0; 0 -1 1 | 0 -1 1]
Continuing in this way, the reduction carries A to I; hence A-1 = (1/2)[-3 -3 11; 1 1 -3; 1 -1 -1], as is readily verified.
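The inversion algorithm itself is short to implement. A minimal Gauss-Jordan sketch: row-reduce the double matrix [A | I], and read A-1 off the right half (partial pivoting is added here for numerical safety; that refinement is not part of the text's algorithm).

```python
import numpy as np

def invert(A):
    """Invert A by row-reducing [A | I] to [I | A^(-1)]."""
    n = len(A)
    M = np.hstack([np.array(A, dtype=float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is not invertible")
        M[[col, pivot]] = M[[pivot, col]]   # interchange rows
        M[col] /= M[col, col]               # make the pivot entry 1
        for r in range(n):                  # clear the rest of the column
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]

A = [[2, 7, 1],
     [1, 4, -1],
     [1, 3, 0]]        # the matrix of Example 6
Ainv = invert(A)
assert np.allclose(Ainv, np.linalg.inv(A))
print(Ainv * 2)        # 2 * A^(-1) = [-3 -3 11; 1 1 -3; 1 -1 -1]
```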
Given any n × n matrix A, Theorem 1 Section 1.2 shows that A can be carried
by elementary row operations to a matrix R in reduced row-echelon form. If R = I,
the matrix A is invertible (this will be proved in the next section), so the algorithm
produces A-1. If R ≠ I, then R has a row of zeros (it is square), so no system of
linear equations Ax = b can have a unique solution. But then A is not invertible by
Theorem 2. Hence, the algorithm is effective in the sense conveyed in Theorem 3.
Theorem 3

If A is an n × n matrix, either A can be carried to the identity matrix I by elementary row operations (in which case the algorithm produces A-1), or it cannot, in which case A has no inverse.
Properties of Inverses
The following properties of an invertible matrix are used everywhere.
EXAMPLE 7
Cancellation Laws Let A be an invertible matrix. Show that:
(1) If AB = AC, then B = C.
(2) If BA = CA, then B = C.
Solution ► Given the equation AB = AC, left multiply both sides by A-1 to
obtain A-1 AB = A-1 AC. This is IB = IC, that is B = C. This proves (1) and
the proof of (2) is left to the reader.
Properties (1) and (2) in Example 7 are described by saying that an invertible matrix
can be “left cancelled” and “right cancelled”, respectively. Note however that
“mixed” cancellation does not hold in general: If A is invertible and AB = CA, then
B and C may not be equal, even if both are 2 × 2. Here is a specific example:
A = [1 1; 0 1],  B = [0 0; 1 2],  and  C = [1 1; 1 1].
EXAMPLE 8
If A is an invertible matrix, show that the transpose AT is also invertible.
Show further that the inverse of AT is just the transpose of A-1; in symbols,
(AT)-1 = (A-1)T.
Solution ► A-1 exists (by assumption). Its transpose (A-1)T is the candidate
proposed for the inverse of AT. Using Theorem 3 Section 2.3, we test it
as follows:
AT(A-1)T = (A-1A)T = IT = I
(A-1)TAT = (AA-1)T = IT = I
Hence (A-1)T is indeed the inverse of AT; that is, (AT)-1 = (A-1)T.
EXAMPLE 9
If A and B are invertible n × n matrices, show that their product AB is also
invertible and (AB)-1 = B-1A-1.
Solution ► We are given a candidate for the inverse of AB, namely B-1A-1. We test it as follows:

(B-1A-1)(AB) = B-1(A-1A)B = B-1IB = B-1B = I
(AB)(B-1A-1) = A(BB-1)A-1 = AIA-1 = AA-1 = I

Hence B-1A-1 is indeed the inverse of AB; in symbols, (AB)-1 = B-1A-1.
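A numerical spot check of Example 9, with arbitrary invertible sample matrices:

```python
import numpy as np

# (AB)^(-1) = B^(-1) A^(-1): note the reversal of the factors.
A = np.array([[0., 1.],
              [1., 1.]])
B = np.array([[5., -3.],
              [7., 4.]])

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
assert np.allclose(lhs, rhs)

# The un-reversed order inv(A) @ inv(B) is generally different.
assert not np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B))
```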
Theorem 4
All the following matrices are square matrices of the same size.
1. I is invertible and I -1 = I.
2. If A is invertible, so is A-1, and (A-1)-1 = A.
3. If A and B are invertible, so is AB, and (AB)-1 = B-1A-1.
4. If A₁, A₂, …, Aₖ are all invertible, so is their product A₁A₂⋯Aₖ, and (A₁A₂⋯Aₖ)⁻¹ = Aₖ⁻¹⋯A₂⁻¹A₁⁻¹.
5. If A is invertible, so is Ak for any k ≥ 1, and (Ak)-1 = (A-1)k.
6. If A is invertible and a ≠ 0 is a number, then aA is invertible and (aA)-1 = (1/a)A-1.
7. If A is invertible, so is its transpose AT, and (AT)-1 = (A-1)T.
PROOF
1. This is an immediate consequence of the fact that I2 = I.
2. The equations AA-1 = I = A-1A show that A is the inverse of A-1; in
symbols, (A-1)-1 = A.
3. This is Example 9.
4. Use induction on k. If k = 1, there is nothing to prove, and if k = 2, the result is property 3. If k > 2, assume inductively that (A₁A₂⋯Aₖ₋₁)⁻¹ = Aₖ₋₁⁻¹⋯A₂⁻¹A₁⁻¹. We apply this fact together with property 3 as follows:

[A₁A₂⋯Aₖ₋₁Aₖ]⁻¹ = [(A₁A₂⋯Aₖ₋₁)Aₖ]⁻¹
                = Aₖ⁻¹(A₁A₂⋯Aₖ₋₁)⁻¹
                = Aₖ⁻¹(Aₖ₋₁⁻¹⋯A₂⁻¹A₁⁻¹)

So the proof by induction is complete.

5. This is property 4 with A₁ = A₂ = ⋯ = Aₖ = A.
6. This is left as Exercise 29.
7. This is Example 8.
Corollary 1
EXAMPLE 10
Find A if (AT - 2I)-1 = [2 1; -1 0].
Solution ► By Theorem 4(2) and Example 4, we have
Theorem 5
Inverse Theorem
The following conditions are equivalent for an n × n matrix A:
1. A is invertible.
2. The homogeneous system Ax = 0 has only the trivial solution x = 0.
3. A can be carried to the identity matrix In by elementary row operations.
4. The system Ax = b has at least one solution x for every choice of column b.
5. There exists an n × n matrix C such that AC = In.
PROOF
We show that each of these conditions implies the next, and that (5) implies (1).
(1) ⇒ (2). If A-1 exists, then Ax = 0 gives x = Inx = A-1Ax = A-10 = 0.
(2) ⇒ (3). Assume that (2) is true. Certainly A → R by row operations where R
is a reduced row-echelon matrix. It suffices to show that R = In. Suppose that
this is not the case. Then R has a row of zeros (being square). Now consider the
augmented matrix [A | 0] of the system Ax = 0. Then [A | 0] → [R | 0] is the
reduced form, and [R | 0] also has a row of zeros. Since R is square there must
be at least one nonleading variable, and hence at least one parameter. Hence the
system Ax = 0 has infinitely many solutions, contrary to (2). So R = In after all.
(3) ⇒ (4). Consider the augmented matrix [A | b] of the system Ax = b. Using
(3), let A → In by a sequence of row operations. Then these same operations carry
[A | b] → [In | c] for some column c. Hence the system Ax = b has a solution (in
fact unique) by gaussian elimination. This proves (4).
(4) ⇒ (5). Write In = [e1 e2 en] where e1, e2, …, en are the columns of In.
For each j = 1, 2, …, n, the system Ax = ej has a solution cj by (4), so Acj = ej.
Now let C = [c1 c2 cn] be the n × n matrix with these matrices cj as its
columns. Then Definition 2.9 gives (5):
AC = A[c1 c2 cn] = [Ac1 Ac2 Acn] = [e1 e2 en] = In
(5) ⇒ (1). Assume that (5) is true so that AC = In for some matrix C. Then
Cx = 0 implies x = 0 (because x = Inx = ACx = A0 = 0). Thus condition (2)
holds for the matrix C rather than A. Hence the argument above that (2) ⇒
(3) ⇒ (4) ⇒ (5) (with A replaced by C) shows that a matrix C exists such that
CC = In. But then
A = AIn = A(CC) = (AC)C = InC = C
Thus CA = CC = In which, together with AC = In, shows that C is the inverse of
A. This proves (1).
9 If p and q are statements, we say that p implies q (written p ⇒ q) if q is true whenever p is true. The statements are called
equivalent if both p ⇒ q and q ⇒ p (written p ⇔ q, spoken “p if and only if q”). See Appendix B.
The proof of (5) ⇒ (1) in Theorem 5 shows that if AC = I for square matrices,
then necessarily CA = I, and hence that C and A are inverses of each other. We
record this important fact for reference.
Corollary 1
If A and C are square matrices such that AC = I, then also CA = I. In particular, both
A and C are invertible, C = A-1, and A = C-1.
Observe that Corollary 1 is false if A and C are not square matrices. For example, we have

[1 2 1; 1 1 1][-1 1; 1 -1; 0 1] = I2  but  [-1 1; 1 -1; 0 1][1 2 1; 1 1 1] ≠ I3

In fact, it is verified in the footnote on page 70 that if AB = Im and BA = In, where A is m × n and B is n × m, then m = n and A and B are (square) inverses of each other.
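The non-square matrices in the display above (a 2 × 3 matrix and a 3 × 2 matrix) can be checked directly:

```python
import numpy as np

# AC = I2 for non-square A and C, yet CA is not I3.
A = np.array([[1, 2, 1],
              [1, 1, 1]])
C = np.array([[-1, 1],
              [1, -1],
              [0, 1]])

assert np.array_equal(A @ C, np.eye(2, dtype=int))       # AC = I2
assert not np.array_equal(C @ A, np.eye(3, dtype=int))   # CA != I3
print(C @ A)
```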
An n × n matrix A has rank n if and only if (3) of Theorem 5 holds. Hence

Corollary 2

An n × n matrix A is invertible if and only if rank A = n.
EXAMPLE 11
[A X; 0 B][C V; W D] = [AC + XW  AV + XD; BW  BD] = Im+n = [Im 0; 0 In]

using block notation. Equating corresponding blocks, we find

AC + XW = Im, BW = 0, and BD = In
Theorem 6
EXAMPLE 12
EXERCISES 2.4
1. In each case, show that the matrices are inverses of each other.

(a) [3 5; 1 2], [2 -5; -1 3]

(b) [3 0; 1 -4], (1/12)[4 0; 1 -3]

(c) [1 2 0; 0 2 3; 1 3 1], [7 2 -6; -3 -1 3; 2 1 -2]

(d) [3 0; 0 5], [1/3 0; 0 1/5]

2. Find the inverse of each of the following matrices.

(a) [1 -1; -1 3]

(b) [4 1; 3 2]

(c) [1 0 -1; 3 2 0; -1 -1 0]

(d) [1 -1 2; -5 7 -11; -2 3 -5]

(e) [3 5 0; 3 7 1; 1 2 1]

(f) [3 1 -1; 2 1 0; 1 5 -1]

(g) [2 4 1; 3 3 2; 4 1 4]

(h) [3 1 -1; 5 2 0; 1 1 -1]

(i) [3 1 2; 1 -1 3; 1 2 4]

(k) [1 0 7 5; 0 1 3 6; 1 -1 5 2; 1 -1 5 1]

(l) [1 2 0 0 0; 0 1 3 0 0; 0 0 1 5 0; 0 0 0 1 7; 0 0 0 0 1]

3. In each case, solve the systems of equations by finding the inverse of the coefficient matrix.

(a) 3x - y = 5
    2x + 2y = 1

(b) 2x - 3y = 0
    x - 4y = 1
4. Given A-1 = [1 -1 3; 2 0 5; -1 1 0]:

(a) Solve the system of equations Ax = [1; -1; 3].

(b) Find a matrix B such that AB = [1 -1 2; 0 1 1; 1 0 0].

(c) Find a matrix C such that CA = [1 2 -1; 3 1 1].

5. Find A when

(a) (A[1 -1; 0 1])-1 = [1 1; 2 3]

(b) ([1 0; 2 1]A)-1 = [1 0; 2 2]

6. Find A when

(a) A-1 = [1 -1 3; 2 1 1; 0 2 -2]

(b) A-1 = [0 1 -1; 1 2 1; 1 0 1]

7. Given [x1; x2; x3] = [3 -1 2; 1 0 4; 2 1 0][y1; y2; y3] and [z1; z2; z3] = [1 -1 1; 2 -3 0; -1 1 -2][y1; y2; y3], express the variables x1, x2, and x3 in terms of z1, z2, and z3.

8. (a) In the system 3x + 4y = 7, 4x + 5y = 1, substitute the new variables x′ and y′ given by x = -5x′ + 4y′, y = 4x′ - 3y′. Then find x and y.

9. In each case either prove the assertion or give an example showing that it is false.

(a) If A ≠ 0 is a square matrix, then A is invertible.

(b) If A and B are both invertible, then A + B is invertible.

(c) If A and B are both invertible, then (A-1B)T is invertible.

(d) If A4 = 3I, then A is invertible.

(e) If A2 = A and A ≠ 0, then A is invertible.

10. (a) If A, B, and C are square matrices and AB = I = CA, show that A is invertible and B = C = A-1.

(b) If C = [0 -5 1; 3 0 -1] and A = [2 -3; 1 -2; 6 -10], find x (if it exists) satisfying Ax = b when (i) b = [1; 0; 3]; and (ii) b = [7; 4; 22].

12. Verify that A = [1 -1; 0 2] satisfies A2 - 3A + 2I = 0, and use this fact to show that A-1 = (1/2)(3I - A).

13. Let Q = [a -b -c -d; b a -d c; c d a -b; d -c b a]. Compute QQT and so find Q-1 if Q ≠ 0.
15. Consider A = [1 1; -1 0], B = [0 -1; 1 0], C = [0 1 0; 0 0 1; 5 0 0]. Find the inverses by computing (a) A6; (b) B4; and (c) C3.

16. Find the inverse of [1 0 1; c 1 c; 3 c 2] in terms of c.

17. If c ≠ 0, find the inverse of [1 -1 1; 2 -1 2; 0 2 c] in terms of c.

18. Show that A has no inverse when:

(a) A has a row of zeros.

(b) A has a column of zeros.

(c) each row of A sums to 0. [Hint: Theorem 5(2).]

(d) each column of A sums to 0. [Hint: Corollary 2, Theorem 4.]

19. Let A denote a square matrix.

(a) Let YA = 0 for some matrix Y ≠ 0. Show that A has no inverse. [Hint: Corollary 2, Theorem 4.]

(b) Use part (a) to show that (i) [1 -1 1; 0 1 1; 1 0 2] and (ii) [2 1 -1; 1 1 0; 1 0 -1] have no inverse. [Hint: For part (ii) compare row 3 with the difference between row 1 and row 2.]

20. If A is invertible, show that

(a) A2 ≠ 0. (b) Ak ≠ 0 for all k = 1, 2, ….

21. Suppose AB = 0, where A and B are square matrices. Show that:

(a) If one of A and B has an inverse, the other is zero.

(b) It is impossible for both A and B to have inverses.

(c) (BA)2 = 0.

22. Find the inverse of the X-expansion in Example 16 Section 2.2 and describe it geometrically.

24. … that satisfies the given condition. Show that A is invertible and find a formula for A-1 in terms of A.

(a) A3 - 3A + 2I = 0.

(b) A4 + 2A3 - A - 4I = 0.

25. Let A and B denote n × n matrices.

(a) If A and AB are invertible, show that B is invertible using only (2) and (3) of Theorem 4.

(b) If AB is invertible, show that both A and B are invertible using Theorem 5.

26. In each case find the inverse of the matrix A using Example 11.

(a) A = [-1 1 2; 0 2 -1; 0 1 -1]

(b) A = [3 1 0; 5 2 0; 1 3 -1]

(c) A = [3 4 0 0; 2 3 0 0; 1 -1 1 3; 3 1 1 4]

(d) A = [2 1 5 2; 1 1 -1 0; 0 0 1 -1; 0 0 1 -2]

27. If A and B are invertible symmetric matrices such that AB = BA, show that A-1, AB, AB-1, and A-1B-1 are also invertible and symmetric.

28. Let A be an n × n matrix and let I be the n × n identity matrix.

(a) If A2 = 0, verify that (I - A)-1 = I + A.

(b) If A3 = 0, verify that (I - A)-1 = I + A + A2.

(c) Find the inverse of [1 2 -1; 0 1 3; 0 0 1].

(d) If An = 0, find the formula for (I - A)-1.

29. Prove property 6 of Theorem 4: If A is invertible and a ≠ 0, then aA is invertible and (aA)-1 = (1/a)A-1.
SECTION 2.5 Elementary Matrices 83
30. Let A, B, and C denote n × n matrices. Using only Theorem 4, show that:
(a) If A, C, and ABC are all invertible, B is invertible.
(b) If AB and BA are both invertible, A and B are both invertible.

31. Let A and B denote invertible n × n matrices.
(a) If A⁻¹ = B⁻¹, does it mean that A = B? Explain.
(b) Show that A = B if and only if A⁻¹B = I.

32. Let A, B, and C be n × n matrices, with A and B invertible. Show that
(a) If A commutes with C, then A⁻¹ commutes with C.
(b) If A commutes with B, then A⁻¹ commutes with B⁻¹.

33. Let A and B be square matrices of the same size.
(a) Show that (AB)² = A²B² if AB = BA.
(b) If A and B are invertible and (AB)² = A²B², show that AB = BA.
(c) If A = [1 0; 0 0] and B = [1 1; 0 0], show that (AB)² = A²B² but AB ≠ BA.

34. Let A and B be n × n matrices for which AB is invertible. Show that A and B are both invertible.

35. Consider A = [1 3 -1; 2 1 5; 1 -7 13], B = [1 1 2; 3 0 -3; -2 5 17].
(a) Show that A is not invertible by finding a nonzero 1 × 3 matrix Y such that YA = 0. [Hint: Row 3 of A equals 2(row 2) - 3(row 1).]

36. Show that a square matrix A is invertible if and only if it can be left-cancelled: AB = AC implies B = C.

37. If U² = I, show that I + U is not invertible unless U = I.

38. (a) If J is the 4 × 4 matrix with every entry 1, show that I - (1/2)J is self-inverse and symmetric.
(b) If X is n × m and satisfies XᵀX = Iₘ, show that Iₙ - 2XXᵀ is self-inverse and symmetric.

39. An n × n matrix P is called an idempotent if P² = P. Show that:
(a) I is the only invertible idempotent.
(b) P is an idempotent if and only if I - 2P is self-inverse.
(c) U is self-inverse if and only if U = I - 2P for some idempotent P.
(d) I - aP is invertible for any a ≠ 1, and (I - aP)⁻¹ = I + (a/(1 - a))P.

40. If A² = kA, where k ≠ 0, show that A is invertible if and only if A = kI.

41. Let A and B denote n × n invertible matrices.
(a) Show that A⁻¹ + B⁻¹ = A⁻¹(A + B)B⁻¹.
(b) If A + B is also invertible, show that A⁻¹ + B⁻¹ is invertible and find a formula for (A⁻¹ + B⁻¹)⁻¹.

42. Let A and B be n × n matrices, and let I be the n × n identity matrix.
(a) Verify that A(I + BA) = (I + AB)A and that (I + BA)B = B(I + AB).
SECTION 2.5 Elementary Matrices

Definition 2.12 An n × n matrix E is called an elementary matrix if it can be obtained from the identity matrix Iₙ by a single elementary row operation (called the operation corresponding to E). We say that E is of type I, II, or III if the operation is of that type (see page 6).
Hence
    E1 = [0 1; 1 0],  E2 = [1 0; 0 9],  and  E3 = [1 5; 0 1]
are elementary of types I, II, and III, respectively, obtained from the 2 × 2 identity matrix by interchanging rows 1 and 2, multiplying row 2 by 9, and adding 5 times row 2 to row 1.
Suppose now that the matrix A = [a b c; p q r] is left multiplied by each of the above elementary matrices. The results are:
    E1A = [0 1; 1 0][a b c; p q r] = [p q r; a b c]
    E2A = [1 0; 0 9][a b c; p q r] = [a b c; 9p 9q 9r]
    E3A = [1 5; 0 1][a b c; p q r] = [a+5p  b+5q  c+5r; p q r]
In each case, left multiplying A by the elementary matrix has the same effect as
doing the corresponding row operation to A. This works in general.
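This row-operation effect of left multiplication can be confirmed numerically. A minimal Python sketch, using the 2 × 2 matrices E1, E2, E3 above; the numeric values standing in for a, b, c, p, q, r are assumptions made for illustration:

```python
import numpy as np

# Left multiplication by an elementary matrix performs the corresponding
# row operation on A.
A = np.array([[1., 2., 3.],    # row [a b c] (sample values)
              [4., 5., 6.]])   # row [p q r] (sample values)

E1 = np.array([[0., 1.], [1., 0.]])   # type I: interchange rows 1 and 2
E2 = np.array([[1., 0.], [0., 9.]])   # type II: multiply row 2 by 9
E3 = np.array([[1., 5.], [0., 1.]])   # type III: add 5 times row 2 to row 1

assert np.array_equal(E1 @ A, A[[1, 0]])              # rows swapped
assert np.array_equal((E2 @ A)[1], 9 * A[1])          # row 2 scaled by 9
assert np.array_equal((E3 @ A)[0], A[0] + 5 * A[1])   # 5*(row 2) added to row 1
```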
Lemma 1
If an elementary row operation is performed on an m × n matrix A, the result is EA, where E is the elementary matrix obtained by performing the same operation on the m × m identity matrix.
PROOF
We prove it for operations of type III; the proofs for types I and II are left as
exercises. Let E be the elementary matrix corresponding to the operation that
adds k times row p to row q ≠ p. The proof depends on the fact that each row of
EA is equal to the corresponding row of E times A. Let K1, K2, …, Km denote the
rows of Im. Then row i of E is Ki if i ≠ q, while row q of E is Kq + kKp. Hence:
If i ≠ q then row i of EA = KiA = (row i of A).
Row q of EA = (Kq + kKp)A = KqA + k(KpA)
= (row q of A) plus k (row p of A).
Thus EA is the result of adding k times row p of A to row q, as required.
Lemma 2
Every elementary matrix E is invertible, and E⁻¹ is also an elementary matrix (of the same type). Moreover, E⁻¹ corresponds to the inverse of the row operation that produces E. The following table gives the inverse of each type of elementary row operation:

Type | Operation                              | Inverse operation
I    | Interchange rows p and q               | Interchange rows p and q
II   | Multiply row p by k ≠ 0                | Multiply row p by 1/k
III  | Add k times row p to row q (q ≠ p)     | Subtract k times row p from row q
EXAMPLE 1
Find the inverse of each of the elementary matrices
    E1 = [0 1 0; 1 0 0; 0 0 1],  E2 = [1 0 0; 0 1 0; 0 0 9],  and  E3 = [1 0 5; 0 1 0; 0 0 1].

Solution ► E1, E2, and E3 are of types I, II, and III respectively, so the table gives
    E1⁻¹ = [0 1 0; 1 0 0; 0 0 1] = E1,  E2⁻¹ = [1 0 0; 0 1 0; 0 0 1/9],  and  E3⁻¹ = [1 0 -5; 0 1 0; 0 0 1].
Theorem 1
Suppose a matrix B is obtained from an m × n matrix A by a sequence of elementary row operations. Then:
1. B = UA, where U is an invertible m × m matrix.
2. U can be computed by applying the same sequence of row operations to [A Iₘ]; that is, [A Iₘ] → [B U].
3. U = Eₖ Eₖ₋₁ ⋯ E2 E1, where E1, …, Eₖ are the elementary matrices corresponding (in order) to the row operations used.
EXAMPLE 2
If A = [2 3 1; 1 2 1], express the reduced row-echelon form R of A as R = UA where U is invertible.
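Assuming SymPy is available, Example 2 can be carried out by row-reducing the augmented matrix [A I2]; the right-hand block then records U. This is a computational sketch of the [A I] → [R U] procedure, not the text's worked solution:

```python
from sympy import Matrix, eye

# Row-reduce the augmented matrix [A  I2]; the same row operations act on
# both blocks, so the right block accumulates an invertible U with R = UA.
A = Matrix([[2, 3, 1],
            [1, 2, 1]])
aug = A.row_join(eye(2))
R_aug, _ = aug.rref()
R, U = R_aug[:, :3], R_aug[:, 3:]

assert U * A == R          # R = UA
assert U.det() != 0        # U is invertible
print(R)   # Matrix([[1, 0, -1], [0, 1, 1]])
print(U)   # Matrix([[2, -3], [-1, 2]])
```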
Theorem 2
A square matrix is invertible if and only if it is a product of elementary matrices.
EXAMPLE 3
    A = (E3E2E1)⁻¹ = E1⁻¹E2⁻¹E3⁻¹ = [0 1; 1 0][1 0; -2 1][1 0; 0 3].
UAV = RV = RU1ᵀ = (U1Rᵀ)ᵀ = ([Ir 0; 0 0]_(n×m))ᵀ = [Ir 0; 0 0]_(m×n).
Moreover, the matrix U1 = Vᵀ can be computed by [Rᵀ Iₙ] → [[Ir 0; 0 0]_(n×m)  Vᵀ].
This proves
Theorem 3
Let A be an m × n matrix of rank r. Then there exist invertible matrices U (m × m) and V (n × n) such that
    UAV = [Ir 0; 0 0]_(m×n).
Moreover, if R is the reduced row-echelon form of A, then:
1. U can be computed by [A Iₘ] → [R U];
2. V can be computed by [Rᵀ Iₙ] → [[Ir 0; 0 0]_(n×m)  Vᵀ].
If A is an m × n matrix of rank r, the matrix [Ir 0; 0 0] is called the Smith normal form of A. Whereas the reduced row-echelon form of A is the "nicest" matrix to which A can be carried by row operations, the Smith canonical form is the "nicest" matrix to which A can be carried by row and column operations. This is because doing row operations to Rᵀ amounts to doing column operations to R and then transposing.
EXAMPLE 4
Given A = [1 -1 1 2; 2 -2 1 -1; -1 1 0 3], find invertible matrices U and V such that UAV = [Ir 0; 0 0], where r = rank A.

Solution ► Row-reduce [A I3] → [R U]:

    [ 1 -1 1  2 | 1 0 0 ]      [ 1 -1 0 -3 | -1  1 0 ]
    [ 2 -2 1 -1 | 0 1 0 ]  →   [ 0  0 1  5 |  2 -1 0 ]
    [-1  1 0  3 | 0 0 1 ]      [ 0  0 0  0 | -1  1 1 ]

Hence
    R = [1 -1 0 -3; 0 0 1 5; 0 0 0 0]  and  U = [-1 1 0; 2 -1 0; -1 1 1].
In particular, r = rank R = 2. Now row-reduce [Rᵀ I4] → [[Ir 0; 0 0]  Vᵀ]:

    [ 1 0 0 | 1 0 0 0 ]      [ 1 0 0 | 1 0  0 0 ]
    [-1 0 0 | 0 1 0 0 ]  →   [ 0 1 0 | 0 0  1 0 ]
    [ 0 1 0 | 0 0 1 0 ]      [ 0 0 0 | 1 1  0 0 ]
    [-3 5 0 | 0 0 0 1 ]      [ 0 0 0 | 3 0 -5 1 ]

whence
    Vᵀ = [1 0 0 0; 0 0 1 0; 1 1 0 0; 3 0 -5 1]  so  V = [1 0 1 3; 0 0 1 0; 0 1 0 -5; 0 0 0 1].
Then UAV = [I2 0; 0 0], as is easily verified.
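The claim at the end of Example 4 can be verified with a short computation. A Python sketch using the matrices found above:

```python
import numpy as np

# U records the row operations, V the column operations; UAV is the
# Smith normal form of A (rank 2).
A = np.array([[ 1, -1, 1,  2],
              [ 2, -2, 1, -1],
              [-1,  1, 0,  3]])
U = np.array([[-1,  1, 0],
              [ 2, -1, 0],
              [-1,  1, 1]])
V = np.array([[1, 0, 1,  3],
              [0, 0, 1,  0],
              [0, 1, 0, -5],
              [0, 0, 0,  1]])

smith = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
assert np.array_equal(U @ A @ V, smith)
assert np.linalg.matrix_rank(A) == 2
```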
Theorem 4
If a matrix A is carried to reduced row-echelon matrices R and S by row operations, then R = S.
PROOF
Observe first that UR = S for some invertible matrix U (by Theorem 1 there
exist invertible matrices P and Q such that R = PA and S = QA; take U = QP-1).
We show that R = S by induction on the number m of rows of R and S. The case
m = 1 is left to the reader. If Rj and Sj denote column j in R and S respectively,
the fact that UR = S gives
URj = Sj for each j. (∗)
Since U is invertible, this shows that R and S have the same zero columns.
Hence, by passing to the matrices obtained by deleting the zero columns from R
and S, we may assume that R and S have no zero columns.
But then the first column of R and S is the first column of Iₘ because R and S are row-echelon, so (∗) shows that the first column of U is column 1 of Iₘ. Now write U, R, and S in block form as follows:
    U = [Ir M; 0 W],  R = [R1 R2; 0 0],  and  S = [S1 S2; 0 0],
where R1 and S1 are r × r. Then block multiplication gives UR = R; that is, S = R. This completes the proof.
EXERCISES 2.5

1. For each of the following elementary matrices, describe the corresponding elementary row operation and write the inverse.
(a) E = [1 0 3; 0 1 0; 0 0 1]
(b) E = [0 0 1; 0 1 0; 1 0 0]
(c) E = [1 0 0; 0 1/2 0; 0 0 1]
(d) E = [1 0 0; -2 1 0; 0 0 1]
(e) E = [0 1 0; 1 0 0; 0 0 1]
(f) E = [1 0 0; 0 1 0; 0 0 5]

2. In each case find an elementary matrix E such that B = EA.
(c) A = [1 1; -1 2], B = [-1 2; 1 1]
(d) A = [4 1; 3 2], B = [1 -1; 3 2]
(e) A = [-1 1; 1 -1], B = [-1 1; -1 1]
(f) A = [2 1; -1 3], B = [-1 3; 2 1]

3. (b) Show that there is no elementary matrix E such that C = EA.

4. If E is elementary, show that A and EA differ in at most two rows.

5. (a) Is I an elementary matrix? Explain.
(b) Is 0 an elementary matrix? Explain.

6. In each case find an invertible matrix U such that UA = R is in reduced row-echelon form, and express U as a product of elementary matrices.
1 −3 3 2   1 −2 3 1

7. In each case find an invertible matrix U such that UA = B, and express U as a product of elementary matrices.
(a) A = [2 1 3; -1 1 2], B = [1 -1 -2; 3 0 1]
(b) A = [2 -1 0; 1 1 1], B = [3 0 1; 2 -1 0]
8. In each case factor A as a product of elementary matrices.
(a) A = [1 1; 2 1]
(b) A = [2 3; 1 2]
(c) A = [1 0 2; 0 1 1; 2 1 6]
(d) A = [1 0 -3; 0 1 4; -2 2 15]

9. Let E be an elementary matrix.
(a) Show that Eᵀ is also elementary of the same type.
(b) Show that Eᵀ = E if E is of type I or II.

10. Show that every matrix A can be factored as A = UR where U is invertible and R is in reduced row-echelon form.

11. If A = [1 2; 1 -3] and B = [5 2; -5 -3], find an elementary matrix F such that AF = B. [Hint: See Exercise 9.]

12. In each case find invertible U and V such that UAV = [Ir 0; 0 0], where r = rank A.
(a) A = [1 1 -1; -2 -2 4]
(b) A = [3 2; 2 1]
(c) A = [1 -1 2 1; 2 -1 0 3; 0 1 -4 1]
(d) A = [1 1 0 -1; 3 2 1 1; 1 0 1 3]

13. Prove Lemma 1 for elementary matrices of: (a) type I; (b) type II.

14. While trying to invert A, [A I] is carried to [P Q] by row operations. Show that P = QA.

15. If A and B are n × n matrices and AB is a product of elementary matrices, show that the same is true of A.

16. If U is invertible, show that the reduced row-echelon form of a matrix [U A] is [I U⁻¹A].

17. Two matrices A and B are called row-equivalent (written A ~r B) if there is a sequence of elementary row operations carrying A to B.
(a) Show that A ~r B if and only if A = UB for some invertible matrix U.
(b) Show that:
    (i) A ~r A for all matrices A.
    (ii) If A ~r B, then B ~r A.
    (iii) If A ~r B and B ~r C, then A ~r C.
(c) Show that, if A and B are both row-equivalent to some third matrix, then A ~r B.
(d) Show that [1 -1 3 2; 0 1 4 1; 1 0 8 6] and [1 -1 4 5; -2 1 -11 -8; -1 2 2 2] are row-equivalent. [Hint: Consider (c) and Theorem 1 Section 1.2.]

18. If U and V are invertible n × n matrices, show that U ~r V. (See Exercise 17.)

19. (See Exercise 17.) Find all matrices that are row-equivalent to:
(a) [0 0 0; 0 0 0]
(b) [0 0 0; 0 0 1]
(c) [1 0 0; 0 1 0]
(d) [1 2 0; 0 0 1]

20. Let A and B be m × n and n × m matrices, respectively. If m > n, show that AB is not invertible. [Hint: Use Theorem 1 Section 1.3 to find x ≠ 0 with Bx = 0.]

21. Define an elementary column operation on a matrix to be one of the following: (I) Interchange two columns. (II) Multiply a column by a nonzero scalar. (III) Add a multiple of a column to another column. Show that:
(a) If an elementary column operation is done to an m × n matrix A, the result is AF, where F is an n × n elementary matrix.
(b) Given any m × n matrix A, there exist m × m elementary matrices E1, …, Eₖ and n × n elementary matrices F1, …, Fₚ such that, in block form,
    Eₖ ⋯ E1 A F1 ⋯ Fₚ = [Ir 0; 0 0].

22. Suppose B is obtained from A by:
(a) interchanging rows i and j;
(b) multiplying row i by k ≠ 0;
(c) adding k times row i to row j (i ≠ j).
In each case describe how to obtain B⁻¹ from A⁻¹. [Hint: See part (a) of the preceding exercise.]

23. Two m × n matrices A and B are called equivalent (written A ~e B) if there exist invertible matrices U and V (sizes m × m and n × n) such that A = UBV.
(a) Prove the following properties of equivalence.
    (i) A ~e A for all m × n matrices A.
    (ii) If A ~e B, then B ~e A.
    (iii) If A ~e B and B ~e C, then A ~e C.
(b) Prove that two m × n matrices are equivalent if and only if they have the same rank. [Hint: Use part (a) and Theorem 3.]
Linear Transformations

A transformation T : ℝⁿ → ℝᵐ is called a linear transformation if it satisfies the following two conditions for all vectors x and y in ℝⁿ and all scalars a:
    T1. T(x + y) = T(x) + T(y)
    T2. T(ax) = aT(x)
Of course, x + y and ax here are computed in ℝⁿ, while T(x) + T(y) and aT(x) are in ℝᵐ. We say that T preserves addition if T1 holds, and that T preserves scalar multiplication if T2 holds. Moreover, taking a = 0 and a = -1 in T2 gives
    T(0) = 0  and  T(-x) = -T(x).
Hence T preserves the zero vector and the negative of a vector. Even more is true. Recall that a vector y in ℝⁿ is called a linear combination of vectors x1, x2, …, xk if y has the form
    y = a1x1 + a2x2 + ⋯ + akxk
for some scalars a1, a2, …, ak. Conditions T1 and T2 combine to show that every linear transformation T preserves linear combinations in the sense of the following theorem.
Theorem 1
If T : ℝⁿ → ℝᵐ is a linear transformation, then for all vectors x1, x2, …, xk in ℝⁿ and all scalars a1, a2, …, ak:
    T(a1x1 + a2x2 + ⋯ + akxk) = a1T(x1) + a2T(x2) + ⋯ + akT(xk)

PROOF
If k = 1, it reads T(a1x1) = a1T(x1), which is Condition T2. If k = 2, we have
T(a1x1 + a2x2) = T(a1x1) + T(a2x2) by Condition T1
= a1T(x1) + a2T(x2) by Condition T2
If k = 3, we use the case k = 2 to obtain
T(a1x1 + a2x2 + a3x3) = T [(a1x1 + a2x2) + a3x3] collect terms
= T(a1x1 + a2x2) + T(a3x3) by Condition T1
= [a1T(x1) + a2T(x2)] + T(a3x3) by the case k = 2
= [a1T(x1) + a2T(x2)] + a3T(x3) by Condition T2
The proof for any k is similar, using the previous case k - 1 and Conditions T1
and T2.
EXAMPLE 1
If T : ℝ² → ℝ² is a linear transformation, T[1; 1] = [2; -3] and T[1; -2] = [5; 1], find T[4; 3].

Solution ► Write z = [4; 3], x = [1; 1], and y = [1; -2] for convenience. Then we know T(x) and T(y) and we want T(z), so it is enough by Theorem 1 to express z as a linear combination of x and y. That is, we want to find numbers a and b such that z = ax + by. Equating entries gives two equations: 4 = a + b and 3 = a - 2b. The solution is a = 11/3 and b = 1/3, so z = (11/3)x + (1/3)y. Thus Theorem 1 gives
    T(z) = (11/3)T(x) + (1/3)T(y) = (11/3)[2; -3] + (1/3)[5; 1] = (1/3)[27; -32].
This is what we wanted.
EXAMPLE 2
If A is m × n, the matrix transformation TA : ℝⁿ → ℝᵐ induced by A is a linear transformation.

Solution ► We have TA(x) = Ax for all x in ℝⁿ, so Theorem 2 Section 2.2 gives
    TA(x + y) = A(x + y) = Ax + Ay = TA(x) + TA(y)
and
    TA(ax) = A(ax) = a(Ax) = aTA(x)
hold for all x and y in ℝⁿ and all scalars a. Hence TA satisfies T1 and T2, and so is linear.
The remarkable thing is that the converse of Example 2 is true: every linear transformation T : ℝⁿ → ℝᵐ is actually a matrix transformation. To see why, we define the standard basis of ℝⁿ to be the set of columns
    {e1, e2, …, en}
of the identity matrix Iₙ. Then each ei is in ℝⁿ and every vector x = [x1; x2; …; xn] in ℝⁿ is a linear combination of the ei. In fact:
    x = x1e1 + x2e2 + ⋯ + xnen
as the reader can verify. Hence Theorem 1 shows that
    T(x) = T(x1e1 + x2e2 + ⋯ + xnen) = x1T(e1) + x2T(e2) + ⋯ + xnT(en)
Now observe that each T(ei) is a column in ℝᵐ, so
    A = [T(e1) T(e2) ⋯ T(en)]
is an m × n matrix. Hence we can apply Definition 2.5 to get
    T(x) = x1T(e1) + x2T(e2) + ⋯ + xnT(en) = [T(e1) T(e2) ⋯ T(en)][x1; x2; …; xn] = Ax.
Since this holds for every x in ℝⁿ, it shows that T is the matrix transformation induced by A, and so proves most of the following theorem.
Theorem 2
Let T : ℝⁿ → ℝᵐ be a transformation.
1. T is linear if and only if it is a matrix transformation.
2. In this case T = TA is the matrix transformation induced by a unique m × n matrix A, given in terms of its columns by
    A = [T(e1) T(e2) ⋯ T(en)]
where {e1, e2, …, en} is the standard basis of ℝⁿ.

PROOF
It remains to verify that the matrix A is unique. Suppose that T is induced by another matrix B. Then T(x) = Bx for all x in ℝⁿ. But T(x) = Ax for each x, so Bx = Ax for every x. Hence A = B by Theorem 5 Section 2.2.
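The construction in Theorem 2 is easy to carry out numerically: evaluate T at the standard basis and use the results as columns. The map T below is a hypothetical example, not one from the text:

```python
import numpy as np

# Recover the matrix of a linear T by evaluating it at the standard basis.
def T(x):
    x1, x2, x3 = x
    return np.array([x1 + 2 * x2, x2 - x3])   # a sample linear map R^3 -> R^2

E = np.eye(3)
A = np.column_stack([T(E[:, i]) for i in range(3)])  # A = [T(e1) T(e2) T(e3)]

x = np.array([3., -1., 2.])
assert np.allclose(T(x), A @ x)   # T(x) = Ax, as Theorem 2 promises
```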
EXAMPLE 3
Define T : ℝ³ → ℝ² by T[x1; x2; x3] = [x1; x2] for all [x1; x2; x3] in ℝ³. Show that T is a linear transformation and use Theorem 2 to find its matrix.
Solution ► Write x = [x1; x2; x3] and y = [y1; y2; y3], so that x + y = [x1+y1; x2+y2; x3+y3]. Hence
    T(x + y) = [x1 + y1; x2 + y2] = [x1; x2] + [y1; y2] = T(x) + T(y)
Similarly, the reader can verify that T(ax) = aT(x) for all a in ℝ, so T is a linear transformation. Now the standard basis of ℝ³ is
    e1 = [1; 0; 0],  e2 = [0; 1; 0],  and  e3 = [0; 0; 1]
so, by Theorem 2, the matrix of T is
    A = [T(e1) T(e2) T(e3)] = [1 0 0; 0 1 0].
Of course, the fact that T[x1; x2; x3] = [x1; x2] = [1 0 0; 0 1 0][x1; x2; x3] shows directly that T is a matrix transformation (hence linear) and reveals the matrix.
EXAMPLE 4
Let Q0 : ℝ² → ℝ² denote reflection in the x axis (as in Example 13 Section 2.2) and let Rπ/2 : ℝ² → ℝ² denote counterclockwise rotation through π/2 about the origin (as in Example 15 Section 2.2). Use Theorem 2 to find the matrices of Q0 and Rπ/2.

Solution ► Observe that Q0 and Rπ/2 are linear by Example 2 (they are matrix transformations), so Theorem 2 applies to them. The standard basis of ℝ² is {e1, e2}, where e1 = [1; 0] points along the positive x axis and e2 = [0; 1] points along the positive y axis (see Figure 1).
The reflection of e1 in the x axis is e1 itself because e1 points along the x axis, and the reflection of e2 in the x axis is -e2 because e2 is perpendicular to the x axis. In other words, Q0(e1) = e1 and Q0(e2) = -e2. Hence Theorem 2 shows that the matrix of Q0 is
    [Q0(e1) Q0(e2)] = [1 0; 0 -1].

FIGURE 1
EXAMPLE 5
Let Q1 : ℝ² → ℝ² denote reflection in the line y = x. Show that Q1 is a matrix transformation, find its matrix, and use it to illustrate Theorem 2.
Theorem 3
Let T : ℝᵏ → ℝⁿ and S : ℝⁿ → ℝᵐ be linear transformations, and let A and B be the matrices of S and T respectively. Then S ◦ T is linear with matrix AB.

PROOF
(S ◦ T)(x) = S[T(x)] = A[Bx] = (AB)x for all x in ℝᵏ.
EXAMPLE 6
Show that reflection in the x axis followed by rotation through π/2 is reflection in the line y = x.
This conclusion can also be seen geometrically. Let x be a typical point in ℝ², and assume that x makes an angle α with the positive x axis. The effect of first applying Q0 and then applying Rπ/2 is shown in Figure 3. The fact that Rπ/2[Q0(x)] makes the angle α with the positive y axis shows that Rπ/2[Q0(x)] is the reflection of x in the line y = x.
FIGURE 3
Some Geometry
As we have seen, it is convenient to view a vector x in ℝ² as an arrow from the origin to the point x (see Section 2.2). This enables us to visualize what sums and scalar multiples mean geometrically. For example, consider x = [1; 2] in ℝ². Then 2x = [2; 4], (1/2)x = [1/2; 1], and -(1/2)x = [-1/2; -1], and these are shown as arrows in Figure 4.
Observe that the arrow for 2x is twice as long as the arrow for x and in the same direction, and that the arrow for (1/2)x is also in the same direction as the arrow for x, but only half as long. On the other hand, the arrow for -(1/2)x is half as long as the arrow for x, but in the opposite direction. More generally, we have the following geometrical description of scalar multiplication in ℝ²:

FIGURE 4
Rotations
We can now describe rotations in the plane. Given an angle θ, let
    Rθ : ℝ² → ℝ²
denote counterclockwise rotation of ℝ² about the origin through the angle θ. The action of Rθ is depicted in Figure 8. We have already looked at Rπ/2 (in Example 15 Section 2.2) and found it to be a matrix transformation. It turns out that Rθ is a matrix transformation for every angle θ (with a simple formula for the matrix), but it is not clear how to find the matrix. Our approach is to first establish the (somewhat surprising) fact that Rθ is linear, and then obtain the matrix from Theorem 2.

FIGURE 8
Let x and y be two vectors in ℝ². Then x + y is the diagonal of the parallelogram determined by x and y as in Figure 9. The effect of Rθ is to rotate the entire parallelogram to obtain the new parallelogram determined by Rθ(x) and Rθ(y), with diagonal Rθ(x + y). But this diagonal is Rθ(x) + Rθ(y) by the parallelogram law (applied to the new parallelogram). It follows that
    Rθ(x + y) = Rθ(x) + Rθ(y).

FIGURE 9
12 If k is a real number, |k| denotes the absolute value of k; that is, |k| = k if k ≥ 0 and |k| = -k if k < 0.
A similar argument shows that Rθ(ax) = aRθ(x) for any scalar a, so Rθ : ℝ² → ℝ² is indeed a linear transformation.
With linearity established we can find the matrix of Rθ. Let e1 = [1; 0] and e2 = [0; 1] denote the standard basis of ℝ². By Figure 10 we see that
    Rθ(e1) = [cos θ; sin θ]  and  Rθ(e2) = [-sin θ; cos θ].

FIGURE 10

Hence Theorem 2 shows that Rθ is induced by the matrix
    [Rθ(e1) Rθ(e2)] = [cos θ  -sin θ; sin θ  cos θ].
We record this as
Theorem 4
The rotation Rθ : ℝ² → ℝ² is a linear transformation with matrix
    [ cos θ  -sin θ ]
    [ sin θ   cos θ ]
EXAMPLE 7
Let θ and ϕ be angles. By finding the matrix of the composite Rθ ◦ Rϕ, obtain expressions for cos(θ + ϕ) and sin(θ + ϕ).

Solution ► Consider the transformations Rϕ : ℝ² → ℝ² followed by Rθ : ℝ² → ℝ². Their composite Rθ ◦ Rϕ is the transformation that first rotates the plane through ϕ and then rotates it through θ, and so is the rotation through the angle θ + ϕ (see Figure 11). In other words, Rθ ◦ Rϕ = Rθ+ϕ, so by Theorem 3 the matrix of Rθ+ϕ is the product of the matrices of Rθ and Rϕ:
    [cos(θ+ϕ)  -sin(θ+ϕ); sin(θ+ϕ)  cos(θ+ϕ)] = [cos θ  -sin θ; sin θ  cos θ][cos ϕ  -sin ϕ; sin ϕ  cos ϕ].
Comparing first columns gives
    cos(θ + ϕ) = cos θ cos ϕ - sin θ sin ϕ  and  sin(θ + ϕ) = sin θ cos ϕ + cos θ sin ϕ.
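These identities can be spot-checked numerically for sample angles (the values 0.7 and 0.4 below are arbitrary choices):

```python
import numpy as np

def R(theta):
    """Matrix of counterclockwise rotation through theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

theta, phi = 0.7, 0.4   # sample angles

# Composing two rotations gives the rotation through the summed angle,
# so the matrix product equals the matrix of R(theta + phi).
assert np.allclose(R(theta) @ R(phi), R(theta + phi))

# Comparing entries reproduces the addition formulas.
assert np.isclose(np.cos(theta + phi),
                  np.cos(theta) * np.cos(phi) - np.sin(theta) * np.sin(phi))
assert np.isclose(np.sin(theta + phi),
                  np.sin(theta) * np.cos(phi) + np.cos(theta) * np.sin(phi))
```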
Reflections
The line through the origin with slope m has equation y = mx, and we let Qm : ℝ² → ℝ² denote reflection in the line y = mx.
This transformation is described geometrically in Figure 12. In words, Qm(x) is the "mirror image" of x in the line y = mx. If m = 0 then Q0 is reflection in the x axis, so we already know Q0 is linear. While we could show directly that Qm is linear (with an argument like that for Rθ), we prefer to do it another way that is instructive and derives the matrix of Qm directly without using Theorem 2.

FIGURE 12
Let θ denote the angle between the positive x axis and the line y = mx. The key observation is that the transformation Qm can be accomplished in three steps: First rotate through -θ (so our line coincides with the x axis), then reflect in the x axis, and finally rotate back through θ. In other words:
    Qm = Rθ ◦ Q0 ◦ R−θ
Since R−θ, Q0, and Rθ are all linear, this (with Theorem 3) shows that Qm is linear and that its matrix is the product of the matrices of Rθ, Q0, and R−θ. If we write c = cos θ and s = sin θ for simplicity, then the matrices of Rθ, R−θ, and Q0 are
    [c  -s; s  c],  [c  s; -s  c],  and  [1  0; 0  -1], respectively.¹³
Hence, by Theorem 3, the matrix of Qm = Rθ ◦ Q0 ◦ R−θ is
    [c  -s; s  c][1  0; 0  -1][c  s; -s  c] = [c² - s²   2sc; 2sc   s² - c²].
Theorem 5
Let Qm denote reflection in the line y = mx. Then Qm is a linear transformation with matrix
    (1/(1 + m²)) [ 1 - m²   2m     ]
                 [ 2m       m² - 1 ]
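The three-step factorization Qm = Rθ ◦ Q0 ◦ R−θ and the resulting formula can be compared numerically. A sketch for the sample slope m = 2 (an arbitrary choice):

```python
import numpy as np

# "Rotate the line down, reflect in the x axis, rotate back" should equal
# the closed-form reflection matrix in the slope m.
m = 2.0
theta = np.arctan(m)
c, s = np.cos(theta), np.sin(theta)

R_theta = np.array([[c, -s], [s,  c]])
R_minus = np.array([[c,  s], [-s, c]])
Q0      = np.array([[1., 0.], [0., -1.]])

Qm = R_theta @ Q0 @ R_minus                        # three-step construction
formula = np.array([[1 - m * m, 2 * m],
                    [2 * m, m * m - 1]]) / (1 + m * m)

assert np.allclose(Qm, formula)
assert np.allclose(Qm @ Qm, np.eye(2))   # a reflection is self-inverse
```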
EXAMPLE 8
Let T : ℝ² → ℝ² be rotation through -π/2 followed by reflection in the y axis. Show that T is a reflection in a line through the origin and find the line.

Solution ► The matrix of R−π/2 is
    [cos(-π/2)  -sin(-π/2); sin(-π/2)  cos(-π/2)] = [0  1; -1  0]
and the matrix of reflection in the y axis is [-1  0; 0  1]. Hence the matrix of T is
    [-1  0; 0  1][0  1; -1  0] = [0  -1; -1  0].
This is the matrix of Qm with m = -1, so T is reflection in the line y = -x.

13 The matrix of R−θ comes from the matrix of Rθ using the fact that, for all angles θ, cos(-θ) = cos θ and sin(-θ) = -sin θ.
Projections
The method in the proof of Theorem 5 works more generally. Let Pm : ℝ² → ℝ² denote projection on the line y = mx. This transformation is described geometrically in Figure 14. If m = 0, then P0[x; y] = [x; 0] for all [x; y] in ℝ², so P0 is linear with matrix [1  0; 0  0]. Hence the argument above for Qm goes through for Pm. First observe that
    Pm = Rθ ◦ P0 ◦ R−θ
as before. So, Pm is linear with matrix
    [c  -s; s  c][1  0; 0  0][c  s; -s  c] = [c²   sc; sc   s²]
where c = cos θ = 1/√(1 + m²) and s = sin θ = m/√(1 + m²). This gives:

FIGURE 14
Theorem 6
Let Pm : ℝ² → ℝ² be projection on the line y = mx. Then Pm is a linear transformation with matrix
    (1/(1 + m²)) [ 1   m  ]
                 [ m   m² ]
EXAMPLE 9
Given x in ℝ², write y = Pm(x). The fact that y lies on the line y = mx means that Pm(y) = y. But then
    (Pm ◦ Pm)(x) = Pm(y) = y = Pm(x)  for all x in ℝ²,  that is,  Pm ◦ Pm = Pm.
In particular, if we write the matrix of Pm as A = (1/(1 + m²))[1  m; m  m²], then A² = A. The reader should verify this directly.
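The conclusion A² = A can indeed be checked directly. A sketch for the sample slope m = 3 (an arbitrary choice):

```python
import numpy as np

# The projection matrix for the line y = mx is idempotent: projecting
# twice changes nothing.
m = 3.0
A = np.array([[1.0, m],
              [m, m * m]]) / (1 + m * m)

assert np.allclose(A @ A, A)          # A^2 = A

x = np.array([2.0, -1.0])
y = A @ x
assert np.isclose(y[1], m * y[0])     # the image lies on the line y = mx
```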
EXERCISES 2.6

1. (a) Find T[8; 3; 7] if T[1; 0; -1] = [2; 3] and T[2; 1; 3] = [-1; 0].
(b) Find T[5; 6; -13] if T[3; 2; -1] = [3; 5] and T[2; 0; 5] = [-1; 2].

2. Let T : ℝ⁴ → ℝ³ be a linear transformation.
(a) Find T[1; 3; -2; -3] if T[1; 1; 0; -1] = [2; 3; -1] and T[0; -1; 1; 1] = [5; 0; 1].
(b) Find T[5; -1; 2; -4] if T[1; 1; 1; 1] = [5; 1; -3] and T[-1; 1; 0; 2] = [0; 0; 1].

3. In each case assume that the transformation T is linear, and use Theorem 2 to obtain the matrix A of T.
(a) T : ℝ² → ℝ² is reflection in the line y = -x.
(b) T : ℝ² → ℝ² is given by T(x) = -x for each x in ℝ².
(c) T : ℝ² → ℝ² is clockwise rotation through π/4.
(d) T : ℝ² → ℝ² is counterclockwise rotation through π/4.

4. In each case use Theorem 2 to obtain the matrix A of the transformation T. You may assume that T is linear in each case.
(a) T : ℝ³ → ℝ³ is reflection in the x-z plane.
(b) T : ℝ³ → ℝ³ is reflection in the y-z plane.

5. Let T : ℝⁿ → ℝᵐ be a linear transformation.
(a) If x is in ℝⁿ, we say that x is in the kernel of T if T(x) = 0. If x1 and x2 are both in the kernel of T, show that ax1 + bx2 is also in the kernel of T for all scalars a and b.
(b) If y is in ℝᵐ, we say that y is in the image of T if y = T(x) for some x in ℝⁿ. If y1 and y2 are in the image of T, show that ay1 + by2 is also in the image of T for all scalars a and b.

6. Use Theorem 2 to find the matrix of the identity transformation 1ₙ : ℝⁿ → ℝⁿ defined by 1ₙ(x) = x for each x in ℝⁿ.

7. In each case show that T : ℝ² → ℝ² is not a linear transformation.
(a) T[x; y] = [xy; 0]
(b) T[x; y] = [0; y²]

8. In each case show that T is either reflection in a line or rotation through an angle, and find the line or angle.
(a) T[x; y] = (1/5)[-3x + 4y; 4x + 3y]
(b) T[x; y] = (1/√2)[x + y; -x + y]
(c) T[x; y] = (1/2)[x - √3·y; √3·x + y]
(d) T[x; y] = -(1/10)[6x - 8y; 8x + 6y]

9. Express reflection in the line y = -x as the composition of a rotation followed by reflection in the line y = x.

10. In each case find the matrix of T : ℝ³ → ℝ³:
(a) T is rotation through θ about the x axis (from the y axis to the z axis).
(b) T is rotation through θ about the y axis (from the x axis to the z axis).

11. Let Tθ : ℝ² → ℝ² denote reflection in the line making an angle θ with the positive x axis.
(a) Show that the matrix of Tθ is [cos 2θ  sin 2θ; sin 2θ  -cos 2θ] for all θ.
(b) Show that Tθ ◦ R2ϕ = Tθ−ϕ for all θ and ϕ.

12. In each case find a rotation or reflection that equals the given transformation.
(a) Reflection in the y axis followed by rotation through π/2.
(b) Rotation through π followed by reflection in the x axis.
(c) Rotation through π/2 followed by reflection in the line y = x.
(d) Reflection in the x axis followed by rotation through π/2.
(e) Reflection in the line y = x followed by reflection in the x axis.
(f) Reflection in the x axis followed by reflection in the line y = x.

13. Let R and S be matrix transformations ℝⁿ → ℝᵐ induced by matrices A and B respectively. In each case, show that T is a matrix transformation and describe its matrix in terms of A and B.
(a) T(x) = R(x) + S(x) for all x in ℝⁿ.
(b) T(x) = aR(x) for all x in ℝⁿ (where a is a fixed real number).

14. Show that the following hold for all linear transformations T : ℝⁿ → ℝᵐ:
(a) T(0) = 0.  (b) T(-x) = -T(x) for all x in ℝⁿ.

15. The transformation T : ℝⁿ → ℝᵐ defined by T(x) = 0 for all x in ℝⁿ is called the zero transformation.
(a) Show that the zero transformation is linear and find its matrix.
(b) Let e1, e2, …, en denote the columns of the n × n identity matrix. If T : ℝⁿ → ℝᵐ is linear and T(ei) = 0 for each i, show that T is the zero transformation. [Hint: Theorem 1.]

16. Write the elements of ℝⁿ and ℝᵐ as rows. If A is an m × n matrix, define T : ℝᵐ → ℝⁿ by T(y) = yA for all rows y in ℝᵐ. Show that:
(a) T is a linear transformation.
(b) the rows of A are T(f1), T(f2), …, T(fm) where fi denotes row i of Iₘ. [Hint: Show that fiA is row i of A.]

17. Let S : ℝⁿ → ℝⁿ and T : ℝⁿ → ℝⁿ be linear transformations with matrices A and B respectively.
(a) Show that B² = B if and only if T² = T (where T² means T ◦ T).
(b) Show that B² = I if and only if T² = 1ₙ.
(c) Show that AB = BA if and only if S ◦ T = T ◦ S. [Hint: Theorem 3.]

18. Let Q0 : ℝ² → ℝ² be reflection in the x axis, let Q1 : ℝ² → ℝ² be reflection in the line y = x, let Q−1 : ℝ² → ℝ² be reflection in the line y = -x, and let Rπ/2 : ℝ² → ℝ² be counterclockwise rotation through π/2.
(a) Show that Q1 ◦ Rπ/2 = Q0.
(b) Show that Q1 ◦ Q0 = Rπ/2.
(c) Show that Rπ/2 ◦ Q0 = Q1.
(d) Show that Q0 ◦ Rπ/2 = Q−1.

19. For any slope m, show that:
(a) Qm ◦ Pm = Pm  (b) Pm ◦ Qm = Pm

20. Define T : ℝⁿ → ℝ by T(x1, x2, …, xn) = x1 + x2 + ⋯ + xn. Show that T is a linear transformation and find its matrix.

21. Given c in ℝ, define Tc : ℝⁿ → ℝⁿ by Tc(x) = cx for all x in ℝⁿ. Show that Tc is a linear transformation and find its matrix.

22. Given vectors w and x in ℝⁿ, denote their dot product by w · x.
(a) Given w in ℝⁿ, define Tw : ℝⁿ → ℝ by Tw(x) = w · x for all x in ℝⁿ. Show that Tw is a linear transformation.
(b) Show that every linear transformation T : ℝⁿ → ℝ is given as in (a); that is, T = Tw for some w in ℝⁿ.

23. If x ≠ 0 and y are vectors in ℝⁿ, show that there is a linear transformation T : ℝⁿ → ℝⁿ such that T(x) = y. [Hint: By Definition 2.5, find a matrix A such that Ax = y.]

24. Let T : ℝⁿ → ℝᵐ and S : ℝᵐ → ℝᵏ be two linear transformations. Show directly that S ◦ T is linear. That is:
(a) Show that (S ◦ T)(x + y) = (S ◦ T)x + (S ◦ T)y for all x, y in ℝⁿ.
(b) Show that (S ◦ T)(ax) = a[(S ◦ T)x] for all x in ℝⁿ and all a in ℝ.

25. Let T : ℝⁿ → ℝᵐ, S : ℝᵐ → ℝᵏ, and R : ℝᵏ → ℝᵏ be linear transformations. Show that R ◦ (S ◦ T) = (R ◦ S) ◦ T by showing directly that [R ◦ (S ◦ T)](x) = [(R ◦ S) ◦ T](x) holds for each vector x in ℝⁿ.
SECTION 2.7 LU-Factorization 103
Triangular Matrices
As for square matrices, if A = [aij] is an m × n matrix, the elements a11, a22, a33, … form the main diagonal of A. Then A is called upper triangular if every entry below and to the left of the main diagonal is zero. Every row-echelon matrix is upper triangular, as are the matrices

    [ 1 -1  0 3 ]   [ 0 2 1 0 5 ]   [ 1  1 1 ]
    [ 0  2  1 1 ]   [ 0 0 0 3 1 ]   [ 0 -1 1 ]
    [ 0  0 -3 0 ]   [ 0 0 1 0 1 ]   [ 0  0 0 ]
                                    [ 0  0 0 ]

By analogy, a matrix A is called lower triangular if its transpose is upper triangular, that is if each entry above and to the right of the main diagonal is zero. A matrix is called triangular if it is upper or lower triangular.
EXAMPLE 1
Solve the system
x1 + 2x2 - 3x3 - x4 + 5x5 = 3
5x3 + x4 + x5 = 8
2x5 = 6
where the coefficient matrix is upper triangular.
15 This section is not used later and so may be omitted with no loss of continuity.
Lemma 1
Let A and B denote matrices.
1. If A and B are both lower (upper) triangular, the same is true of AB.
2. If A is n × n and lower (upper) triangular, then A is invertible if and only if every main diagonal entry is nonzero. In this case A⁻¹ is also lower (upper) triangular.
LU-Factorization
Let A be an m × n matrix. Then A can be carried to a row-echelon matrix U (that is, upper triangular). As in Section 2.5, the reduction is
    A → E1A → E2E1A → E3E2E1A → ⋯ → Eₖ Eₖ₋₁ ⋯ E2E1A = U
where E1, E2, …, Eₖ are elementary matrices corresponding to the row operations used. Hence
    A = LU
where L = (Eₖ Eₖ₋₁ ⋯ E2E1)⁻¹ = E1⁻¹ E2⁻¹ ⋯ Eₖ₋₁⁻¹ Eₖ⁻¹. If we do not insist that U is reduced then, except for row interchanges, none of these row operations involve adding a row to a row above it. Thus, if no row interchanges are used, all the Ei are lower triangular, and so L is lower triangular (and invertible) by Lemma 1. This proves the following theorem. For convenience, let us say that A can be lower reduced if it can be carried to row-echelon form using no row interchanges.
Theorem 1
If A can be lower reduced to a row-echelon matrix U, then
    A = LU
where L is an invertible lower triangular matrix and U is row-echelon.
Example 2 provides an illustration. For convenience, the first nonzero column from
the left in a matrix A is called the leading column of A.
EXAMPLE 2
Find an LU-factorization of A = [0 2 -6 -2 4; 0 -1 3 3 2; 0 -1 3 7 10].

Solution ► We lower reduce A to row-echelon form as follows:

    [ 0  2 -6 -2  4 ]      [ 0 1 -3 -1  2 ]      [ 0 1 -3 -1 2 ]
A = [ 0 -1  3  3  2 ]  →   [ 0 0  0  2  4 ]  →   [ 0 0  0  1 2 ] = U
    [ 0 -1  3  7 10 ]      [ 0 0  0  6 12 ]      [ 0 0  0  0 0 ]

The circled columns are determined as follows: The first is the leading column of A, and is used (by lower reduction) to create the first leading 1 and create zeros below it. This completes the work on row 1, and we repeat the procedure on the matrix consisting of the remaining rows. Thus the second circled column is the leading column of this smaller matrix, which we use to create the second leading 1 and the zeros below it. As the remaining row is zero here, we are finished. Then A = LU where

    [  2 0 0 ]
L = [ -1 2 0 ]
    [ -1 6 1 ]

This matrix L is obtained from I3 by replacing the bottom of the first two columns by the circled columns in the reduction. Note that the rank of A is 2 here, and this is the number of circled columns.
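The factorization found in Example 2 can be confirmed by multiplying back:

```python
import numpy as np

# L assembled from the reduction and the row-echelon U multiply back to A.
A = np.array([[0,  2, -6, -2,  4],
              [0, -1,  3,  3,  2],
              [0, -1,  3,  7, 10]])
L = np.array([[ 2, 0, 0],
              [-1, 2, 0],
              [-1, 6, 1]])
U = np.array([[0, 1, -3, -1, 2],
              [0, 0,  0,  1, 2],
              [0, 0,  0,  0, 0]])

assert np.array_equal(L @ U, A)
assert np.linalg.matrix_rank(A) == 2   # rank = number of nonzero rows of U
```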
LU-Algorithm
EXAMPLE 3
Find an LU-factorization for A = [5 -5 10 0 5; -3 3 2 2 1; -2 2 0 -1 0; 1 -1 10 2 5].

Solution ► The reduction to row-echelon form is

    [  5 -5 10  0 5 ]      [ 1 -1 2  0 1 ]      [ 1 -1 2   0   1  ]      [ 1 -1 2   0   1  ]
    [ -3  3  2  2 1 ]  →   [ 0  0 8  2 4 ]  →   [ 0  0 1 1/4 1/2 ]  →   [ 0  0 1 1/4 1/2 ] = U
    [ -2  2  0 -1 0 ]      [ 0  0 4 -1 2 ]      [ 0  0 0  -2   0  ]      [ 0  0 0   1   0  ]
    [  1 -1 10  2 5 ]      [ 0  0 8  2 4 ]      [ 0  0 0   0   0  ]      [ 0  0 0   0   0  ]

If U denotes this row-echelon matrix, then A = LU, where

    [  5 0  0 0 ]
L = [ -3 8  0 0 ]
    [ -2 4 -2 0 ]
    [  1 8  0 1 ]
The next example deals with a case where no row of zeros is present in U (in fact, A is invertible).

EXAMPLE 4

Find an LU-factorization for A = [  2 4 2 ]
                                 [  1 1 2 ]
                                 [ −1 0 2 ]

Solution ► The reduction to row-echelon form is

[  2 4 2 ]   [ 1  2 1 ]   [ 1 2  1 ]   [ 1 2  1 ]
[  1 1 2 ] → [ 0 −1 1 ] → [ 0 1 −1 ] → [ 0 1 −1 ] = U
[ −1 0 2 ]   [ 0  2 3 ]   [ 0 0  5 ]   [ 0 0  1 ]

SECTION 2.7 LU-Factorization 107

Hence A = LU where

L = [  2  0 0 ]
    [  1 −1 0 ]
    [ −1  2 5 ]
Theorem 2

Suppose an m × n matrix A is carried to a row-echelon matrix U via Gaussian elimination. Let P1, P2, …, Ps be the elementary matrices corresponding (in order) to the row interchanges used, and write P = Ps ⋯ P2P1 (if no interchanges are used, take P = Im). Then:
1. PA is the matrix obtained from A by doing these interchanges (in order) to A.
2. PA has an LU-factorization.
EXAMPLE 5

If A = [  0  0 −1 2 ]
       [ −1 −1  1 2 ]
       [  2  1 −3 6 ]
       [  0  1 −1 4 ]

find a permutation matrix P such that PA has an LU-factorization, and then find the factorization.

Solution ► Two row interchanges are needed to carry A to row-echelon form: first rows 1 and 2, and then rows 2 and 3. Hence, as in Theorem 2,

P = [ 1 0 0 0 ] [ 0 1 0 0 ]   [ 0 1 0 0 ]
    [ 0 0 1 0 ] [ 1 0 0 0 ] = [ 0 0 1 0 ]
    [ 0 1 0 0 ] [ 0 0 1 0 ]   [ 1 0 0 0 ]
    [ 0 0 0 1 ] [ 0 0 0 1 ]   [ 0 0 0 1 ]

If we do these interchanges (in order) to A, the result is PA. Now apply the LU-algorithm to PA:

PA = [ −1 −1  1  2 ]   [ 1  1 −1  −2 ]   [ 1 1 −1  −2 ]   [ 1 1 −1  −2 ]   [ 1 1 −1  −2 ]
     [  2  1 −3  6 ] → [ 0 −1 −1  10 ] → [ 0 1  1 −10 ] → [ 0 1  1 −10 ] → [ 0 1  1 −10 ] = U
     [  0  0 −1  2 ]   [ 0  0 −1   2 ]   [ 0 0 −1   2 ]   [ 0 0  1  −2 ]   [ 0 0  1  −2 ]
     [  0  1 −1  4 ]   [ 0  1 −1   4 ]   [ 0 0 −2  14 ]   [ 0 0  0  10 ]   [ 0 0  0   1 ]

Hence, PA = LU, where

L = [ −1  0  0  0 ]        U = [ 1 1 −1  −2 ]
    [  2 −1  0  0 ]            [ 0 1  1 −10 ]
    [  0  0 −1  0 ]            [ 0 0  1  −2 ]
    [  0  1 −2 10 ]            [ 0 0  0   1 ]
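The factorization PA = LU in Example 5 can be confirmed by direct multiplication. The following Python snippet is our own check (the `matmul` helper is not from the text); it multiplies out both sides with plain integer arithmetic:

```python
def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 0, -1, 2], [-1, -1, 1, 2], [2, 1, -3, 6], [0, 1, -1, 4]]
P = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
L = [[-1, 0, 0, 0], [2, -1, 0, 0], [0, 0, -1, 0], [0, 1, -2, 10]]
U = [[1, 1, -1, -2], [0, 1, 1, -10], [0, 0, 1, -2], [0, 0, 0, 1]]

PA = matmul(P, A)
# P permutes the rows of A: rows 1 and 2 swapped, then rows 2 and 3
assert PA == [[-1, -1, 1, 2], [2, 1, -3, 6], [0, 0, -1, 2], [0, 1, -1, 4]]
assert PA == matmul(L, U)     # PA = LU
```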
LU-factorizations need not be unique. For example,

[ 1 0 ] [ 1 −2 3 ]   [ 1 0 ] [ 1 −2 3 ]
[ 3 2 ] [ 0  0 0 ] = [ 3 1 ] [ 0  0 0 ]
However, it is necessary here that the row-echelon matrix has a row of zeros. Recall
that the rank of a matrix A is the number of nonzero rows in any row-echelon
matrix U to which A can be carried by row operations. Thus, if A is m × n, the
matrix U has no row of zeros if and only if A has rank m.
Theorem 3

Let A be an m × n matrix that has an LU-factorization A = LU. If A has rank m (that is, U has no row of zeros), then L and U are uniquely determined by A.
PROOF
Suppose A = MV is another LU-factorization of A, so M is lower triangular
and invertible and V is row-echelon. Hence LU = MV, and we must show
that L = M and U = V. We write N = M-1L. Then N is lower triangular and
invertible (Lemma 1) and NU = V, so it suffices to prove that N = I. If N is
m × m, we use induction on m. The case m = 1 is left to the reader. If m > 1,
observe first that column 1 of V is N times column 1 of U. Thus if either column
is zero, so is the other (N is invertible). Hence, we can assume (by deleting zero
columns) that the (1, 1)-entry is 1 in both U and V.
Now we write

N = [ a 0  ],   U = [ 1 Y  ],   and   V = [ 1 Z  ]
    [ X N1 ]        [ 0 U1 ]              [ 0 V1 ]

in block form. Then NU = V becomes

[ a aY        ]   [ 1 Z  ]
[ X XY + N1U1 ] = [ 0 V1 ]

Hence a = 1, Y = Z, X = 0, and N1U1 = V1. Since N1 is lower triangular and invertible, N1 = I by induction, and hence N = I, as required.
Corollary 1

If A is an m × m invertible matrix, then A has a unique LU-factorization A = LU.

Of course, in this case U is an upper triangular matrix with 1s along the main diagonal.
Proofs of Theorems
PROOF OF THE LU-ALGORITHM
If c1, c2, …, cr are columns of lengths m, m − 1, …, m − r + 1, respectively, write L(m)[c1, c2, …, cr] for the lower triangular m × m matrix obtained from Im by placing c1, c2, …, cr at the bottom of the first r columns of Im.
Proceed by induction on n. If A = 0 or n = 1, it is left to the reader. If n > 1, let
c1 denote the leading column of A and let k1 denote the first column of the m × m
identity matrix. There exist elementary matrices E1, …, Ek such that, in block form,
(Ek ⋯ E2E1)A = [ 0 1 X1 ]     where (Ek ⋯ E2E1)c1 = k1.
               [ 0 0 A1 ]
Write G = (Ek ⋯ E2E1)^(-1) = E1^(-1)E2^(-1) ⋯ Ek^(-1). Then G is lower triangular, and Gk1 = c1. Also, each Ej (and so each Ej^(-1)) is the result of either multiplying row 1 of Im by a constant or adding a multiple of row 1 to another row. Hence,

G = (E1^(-1)E2^(-1) ⋯ Ek^(-1))Im = [ c1   0    ]
                                   [      Im−1 ]

in block form, the first column of G being c1.
110 Chapter 2 Matrix Algebra
By induction, A1 has an LU-factorization A1 = L1U1 where U1 is row-echelon and L1 = L(m−1)[c2, …, cr]. Hence A = LU, where

U = [ 0 1 X1 ]
    [ 0 0 U1 ]

is row-echelon and

L = [ c1   0    ] [ 1  0  ]   [ c1   0  ]
    [      Im−1 ] [ 0  L1 ] = [      L1 ] = L(m)[c1, c2, …, cr].
This completes the proof.
PROOF OF THEOREM 2
Let A be a nonzero m × n matrix and let kj denote column j of Im. There is a
permutation matrix P1 (where either P1 is elementary or P1 = Im) such that the
first nonzero column c1 of P1A has a nonzero entry on top. Hence, as in the
LU-algorithm,
L(m)[c1]^(-1) · P1 · A = [ 0 1 X1 ]
                         [ 0 0 A1 ]
in block form. Then let P2 be a permutation matrix (either elementary or Im)
such that
P2 · L(m)[c1]^(-1) · P1 · A = [ 0 1 X1  ]
                              [ 0 0 A1′ ]

and the first nonzero column c2 of A1′ has a nonzero entry on top. Thus,
L(m)[k1, c2]^(-1) · P2 · L(m)[c1]^(-1) · P1 · A = [ 0 1 X1      ]
                                                  [ 0 0 0 1 X2 ]
                                                  [ 0 0 0 0 A2 ]
in block form. Continue to obtain elementary permutation matrices P1, P2, …, Pr
and columns c1, c2, …, cr of lengths m, m - 1, …, such that
(LrPrLr−1Pr−1 ⋯ L2P2L1P1)A = U

where U is a row-echelon matrix and Lj = L(m)[k1, …, kj−1, cj]^(-1) for each j, where the notation means the first j − 1 columns are those of Im. It is not hard to verify that each Lj has the form Lj = L(m)[k1, …, kj−1, cj′] where cj′ is a column of length m − j + 1. We now claim that each permutation matrix Pk can be “moved past” each matrix Lj to the right of it, in the sense that

PkLj = Lj′Pk

where Lj′ = L(m)[k1, …, kj−1, cj″] for some column cj″ of length m − j + 1. Given that this is true, we obtain a factorization of the form

(Lr′Lr−1′ ⋯ L2′L1′)(PrPr−1 ⋯ P2P1)A = U

If we write P = PrPr−1 ⋯ P2P1, this shows that PA has an LU-factorization because Lr′Lr−1′ ⋯ L2′L1′ is lower triangular and invertible. All that remains is to prove the following rather technical result.
Lemma 2
Let Pk result from interchanging row k of Im with a row below it. If j < k, let cj be a column of length m − j + 1. Then there is another column cj′ of length m − j + 1 such that

Pk · L(m)[k1, …, kj−1, cj] = L(m)[k1, …, kj−1, cj′] · Pk
EXERCISES 2.7
(b) Show that the factorization in (a) is unique.

10. Let c1, c2, …, cr be columns of lengths m, m − 1, …, m − r + 1. If kj denotes column j of Im, show that

L(m)[c1, c2, …, cr] = L(m)[c1] L(m)[k1, c2] L(m)[k1, k2, c3] ⋯ L(m)[k1, k2, …, kr−1, cr]

The notation is as in the proof of Theorem 2. [Hint: Use induction on m and block multiplication.]

11. Prove Lemma 2. [Hint: Pk^(-1) = Pk. Write Pk = [ Ik 0 ; 0 P0 ] in block form, where P0 is an (m − k) × (m − k) permutation matrix.]
EXAMPLE 1
A primitive society has three basic needs: food, shelter, and clothing. There are thus
three industries in the society—the farming, housing, and garment industries—that
produce these commodities. Each of these industries consumes a certain proportion
of the total output of each commodity according to the following table.
OUTPUT
Farming Housing Garment
Farming 0.4 0.2 0.3
CONSUMPTION Housing 0.2 0.6 0.4
Garment 0.4 0.2 0.3
Find the annual prices that each industry must charge for its income to equal its
expenditures.
Solution ► Let p1, p2, and p3 be the prices charged per year by the farming,
housing, and garment industries, respectively, for their total output. To see
how these prices are determined, consider the farming industry. It receives p1
for its production in any year. But it consumes products from all these industries
in the following amounts (from row 1 of the table): 40% of the food, 20% of
the housing, and 30% of the clothing. Hence, the expenditures of the farming
industry are 0.4p1 + 0.2p2 + 0.3p3, so
0.4p1 + 0.2p2 + 0.3p3 = p1
A similar analysis of the other two industries leads to the following system of
equations.
0.4p1 + 0.2p2 + 0.3p3 = p1
0.2p1 + 0.6p2 + 0.4p3 = p2
0.4p1 + 0.2p2 + 0.3p3 = p3
This has the matrix form Ep = p, where

E = [ 0.4 0.2 0.3 ]               [ p1 ]
    [ 0.2 0.6 0.4 ]   and   p =   [ p2 ]
    [ 0.4 0.2 0.3 ]               [ p3 ]

16 The applications in this section and the next are independent and may be taken in any order.
17 See W. W. Leontief, “The world economy of the year 2000,” Scientific American, Sept. 1980.

SECTION 2.8 An Application to Input-Output Economic Models 113
The equations can be written as the homogeneous system
(I - E)p = 0
where I is the 3 × 3 identity matrix, and the solutions are
p = [ 2t ]
    [ 3t ]
    [ 2t ]

where t is a parameter. Thus, the pricing must be such that the total output of the farming industry has the same value as the total output of the garment industry, whereas the total value of the housing industry must be 3/2 as much.
If we write

p = [ p1 ]
    [ p2 ]
    [ ⋮  ]
    [ pn ]

these equations can be written as the matrix equation

Ep = p
This is called the equilibrium condition, and the solutions p are called
equilibrium price structures. The equilibrium condition can be written as
(I - E)p = 0
which is a system of homogeneous equations for p. Moreover, there is always a
nontrivial solution p. Indeed, the column sums of I - E are all 0 (because E is
stochastic), so the row-echelon form of I - E has a row of zeros. In fact, more
is true:
Theorem 1

Every input-output matrix E has a nonzero equilibrium price structure p whose entries are all nonnegative.
EXAMPLE 2
Find the equilibrium price structures for four industries if the input-output
matrix is
E = [ .6 .2 .1 .1 ]
    [ .3 .4 .2  0 ]
    [ .1 .3 .5 .2 ]
    [  0 .1 .2 .7 ]
Find the prices if the total value of business is $1000.
Solution ► If p = (p1, p2, p3, p4)^T is the equilibrium price structure, then the equilibrium
condition is Ep = p. When we write this as (I - E)p = 0, the methods of
Chapter 1 yield the following family of solutions:
p = [ 44t ]
    [ 39t ]
    [ 51t ]
    [ 47t ]
where t is a parameter. If we insist that p1 + p2 + p3 + p4 = 1000, then
t = 5.525 (to four figures). Hence
p = [ 243.09 ]
    [ 215.47 ]
    [ 281.76 ]
    [ 259.67 ]
to five figures.
18 The interested reader is referred to P. Lancaster’s Theory of Matrices (New York: Academic Press, 1969) or to E. Seneta’s
Non-negative Matrices (New York: Wiley, 1973).
The column

d = [ d1 ]
    [ ⋮  ]
    [ dn ]

is called the demand matrix, and this gives a matrix equation
p = Ep + d
or
(I - E)p = d (∗)
This is a system of linear equations for p, and we ask for a solution p with every
entry nonnegative. Note that every entry of E is between 0 and 1, but the column
sums of E need not equal 1 as in the closed model.
Before proceeding, it is convenient to introduce a useful notation. If A = [aij] and
B = [bij] are matrices of the same size, we write A > B if aij > bij for all i and j, and
we write A ≥ B if aij ≥ bij for all i and j. Thus P ≥ 0 means that every entry of P is
nonnegative. Note that A ≥ 0 and B ≥ 0 implies that AB ≥ 0.
Now, given a demand matrix d ≥ 0, we look for a production matrix p ≥ 0
satisfying equation (∗). This certainly exists if I - E is invertible and (I - E)-1 ≥ 0.
On the other hand, the fact that d ≥ 0 means any solution p to equation (∗) satisfies
p ≥ Ep. Hence, the following theorem is not too surprising.
Theorem 2

Let E be an n × n matrix with nonnegative entries. Then I − E is invertible and (I − E)^(-1) ≥ 0 if and only if there exists a column p > 0 such that p > Ep.
EXAMPLE 3

If E = [ 0.6 0.2 0.3 ]
       [ 0.1 0.4 0.2 ]
       [ 0.2 0.5 0.1 ]

show that I − E is invertible and (I − E)^(-1) ≥ 0.
Solution ► Use p = (3, 2, 2)T in Theorem 2.
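The hypothesis of Theorem 2 is easy to check numerically. This Python snippet is our own illustration; it verifies that p = (3, 2, 2)^T satisfies p > Ep for the matrix E of Example 3:

```python
from fractions import Fraction as F

E = [[F('0.6'), F('0.2'), F('0.3')],
     [F('0.1'), F('0.4'), F('0.2')],
     [F('0.2'), F('0.5'), F('0.1')]]
p = [F(3), F(2), F(2)]

# Ep works out to (2.8, 1.5, 1.8), which is entrywise below p
Ep = [sum(E[i][j] * p[j] for j in range(3)) for i in range(3)]
assert all(Ep[i] < p[i] for i in range(3))   # p > Ep, so Theorem 2 applies
```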
If p0 = (1, 1, 1)T, the entries of Ep0 are the row sums of E. Hence p0 > Ep0 holds
if the row sums of E are all less than 1. This proves the first of the following useful
facts (the second is Exercise 10).
Corollary 1

Let E be an n × n matrix with nonnegative entries. Then I − E is invertible and (I − E)^(-1) ≥ 0 if either
1. all row sums of E are less than 1, or
2. all column sums of E are less than 1.
EXERCISES 2.8

1. Find the possible equilibrium price structures when the input-output matrices are:

(a) [ 0.1 0.2 0.3 ]    (b) [ 0.5  0  0.5 ]
    [ 0.6 0.2 0.3 ]        [ 0.1 0.9 0.2 ]
    [ 0.3 0.6 0.4 ]        [ 0.4 0.1 0.3 ]

(c) [ .3 .1 .1 .2 ]    (d) [ .5  0 .1 .1 ]
    [ .2 .3 .1  0 ]        [ .2 .7  0 .1 ]
    [ .3 .3 .2 .3 ]        [ .1 .2 .8 .2 ]
    [ .2 .3 .6 .5 ]        [ .2 .1 .1 .6 ]

2. Three industries A, B, and C are such that all the output of A is used by B, all the output of B is used by C, and all the output of C is used by A. Find the possible equilibrium price structures.

3. Find the possible equilibrium price structures for three industries where the input-output matrix is

[ 1 0 0 ]
[ 0 0 1 ]
[ 0 1 0 ]

Discuss why there are two parameters here.

4. Prove Theorem 1 for a 2 × 2 stochastic matrix E by first writing it in the form E = [ a  b ; 1−a  1−b ], where 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1.

5. If E is an n × n stochastic matrix and c is an n × 1 matrix, show that the sum of the entries of c equals the sum of the entries of the n × 1 matrix Ec.

6. Let W = [1 1 1 ⋯ 1]. Let E and F denote n × n matrices with nonnegative entries.
(a) Show that E is a stochastic matrix if and only if WE = W.
(b) Use part (a) to deduce that, if E and F are both stochastic matrices, then EF is also stochastic.

7. Find a 2 × 2 matrix E with entries between 0 and 1 such that:
(a) I − E has no inverse.
(b) I − E has an inverse but not all entries of (I − E)^(-1) are nonnegative.

8. If E is a 2 × 2 matrix with entries between 0 and 1, show that I − E is invertible and (I − E)^(-1) ≥ 0 if and only if tr E < 1 + det E. Here, if E = [ a b ; c d ], then tr E = a + d and det E = ad − bc.

9. In each case show that I − E is invertible and (I − E)^(-1) ≥ 0.

(a) [ 0.6 0.5 0.1 ]    (b) [ 0.7 0.1 0.3 ]
    [ 0.1 0.3 0.3 ]        [ 0.2 0.5 0.2 ]
    [  ⋯   ⋯   ⋯ ]        [  ⋯   ⋯   ⋯ ]

(c) [ 0.6 0.2 0.1 ]    (d) [ 0.8 0.1 0.1 ]
    [ 0.3 0.4 0.2 ]        [ 0.3 0.1 0.2 ]
    [ 0.2 0.5 0.1 ]        [ 0.3 0.3 0.2 ]

10. Prove that (1) implies (2) in the Corollary to Theorem 2.

11. If (I − E)^(-1) ≥ 0, find p > 0 such that p > Ep.

12. If Ep < p where E ≥ 0 and p > 0, find a number μ such that Ep < μp and 0 < μ < 1. [Hint: If Ep = (q1, …, qn)^T and p = (p1, …, pn)^T, take any number μ with max{q1/p1, …, qn/pn} < μ < 1.]
SECTION 2.9 An Application to Markov Chains 117
Definition 2.15 A Markov chain is such an evolving system wherein the state to which it will go next depends only on its present state and does not depend on the earlier history of the system.19
Even in the case of a Markov chain, the state the system will occupy at any stage
is determined only in terms of probabilities. In other words, chance plays a role. For
example, if a football team wins a particular game, we do not know whether it will
win, draw, or lose the next game. On the other hand, we may know that the team
tends to persist in winning streaks; for example, if it wins one game it may win the next game 1/2 of the time, lose 4/10 of the time, and draw 1/10 of the time. These fractions are called the probabilities of these various possibilities. Similarly, if the team loses, it may lose the next game with probability 1/2 (that is, half the time), win with probability 1/4, and draw with probability 1/4. The probabilities of the various outcomes after a drawn game will also be known.
We shall treat probabilities informally here: The probability that a given event will
occur is the long-run proportion of the time that the event does indeed occur. Hence, all
probabilities are numbers between 0 and 1. A probability of 0 means the event is
impossible and never occurs; events with probability 1 are certain to occur.
If a Markov chain is in a particular state, the probabilities that it goes to
the various states at the next stage of its evolution are called the transition
probabilities for the chain, and they are assumed to be known quantities. To
motivate the general conditions that follow, consider the following simple example.
Here the system is a man, the stages are his successive lunches, and the states are the
two restaurants he chooses.
EXAMPLE 1
A man always eats lunch at one of two restaurants, A and B. He never eats at A
twice in a row. However, if he eats at B, he is three times as likely to eat at B
next time as at A. Initially, he is equally likely to eat at either restaurant.
(a) What is the probability that he eats at A on the third day after the initial one?
(b) What proportion of his lunches does he eat at A?
19 The name honours Andrei Andreyevich Markov (1856–1922) who was a professor at the university in St. Petersburg, Russia.
               Present Lunch
                 A      B
Next     A       0     0.25
Lunch    B       1     0.75

The B column shows that, if he eats at B on one day, he will eat there on the next day 3/4 of the time and switches to A only 1/4 of the time.

The restaurant he visits on a given day is not determined. The most that we can expect is to know the probability that he will visit A or B on that day.

Let sm = [ s1(m) ]
         [ s2(m) ]

denote the state vector for day m. Here s1(m) denotes the probability that he eats at A on day m, and s2(m) is the probability that he eats at B on day m. It is convenient to let s0 correspond to the initial day. Because he is equally likely to eat at A or B on that initial day, s1(0) = 0.5 and s2(0) = 0.5, so s0 = [ 0.5 ; 0.5 ]. If P denotes the matrix of transition probabilities above, the successive state vectors are s1 = Ps0, s2 = Ps1, and s3 = Ps2; computing them gives

s1 = [ 0.125 ]    s2 = [ 0.21875 ]    s3 = [ 0.1953125 ]
     [ 0.875 ]         [ 0.78125 ]         [ 0.8046875 ]
Hence, the probability that his third lunch (after the initial one) is at A is
approximately 0.195, whereas the probability that it is at B is 0.805.
If we carry these calculations on, the next state vectors are (to five figures):

s4 = [ 0.20117 ]    s5 = [ 0.19971 ]    s6 = [ 0.20007 ]    s7 = [ 0.19998 ]
     [ 0.79883 ]         [ 0.80029 ]         [ 0.79993 ]         [ 0.80002 ]

Moreover, as m increases the entries of sm get closer and closer to the corresponding entries of [ 0.2 ; 0.8 ]. Hence, in the long run, he eats 20% of his lunches at A and 80% at B.
Example 1 incorporates most of the essential features of all Markov chains. The
general model is as follows: The system evolves through various stages and at each
stage can be in exactly one of n distinct states. It progresses through a sequence
of states as time goes on. If a Markov chain is in state j at a particular stage of its
development, the probability pij that it goes to state i at the next stage is called the transition probability. The n × n matrix P = [pij] is called the transition matrix for the Markov chain. Note that column j of P is

[ p1j ]
[ p2j ]
[  ⋮  ]
[ pnj ]

If the system is in state j at some stage of its evolution, the transition probabilities p1j, p2j, …, pnj represent the fraction of the time that the system will move to state 1, state 2, …, state n, respectively, at the next stage. We assume that it has to go to some state at each transition, so the sum of these probabilities equals 1:

p1j + p2j + ⋯ + pnj = 1   for each j

Thus, the columns of P all sum to 1 and the entries of P lie between 0 and 1. Hence P is called a stochastic matrix.
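The stochastic condition is easy to test in code. The following small Python function is our own sketch (not from the text); it checks that every entry lies in [0, 1] and that every column sums to 1, allowing a small tolerance for floating-point entries:

```python
def is_stochastic(P):
    """Check that P has entries in [0, 1] and every column sums to 1."""
    n = len(P)
    entries_ok = all(0 <= P[i][j] <= 1 for i in range(n) for j in range(n))
    cols_ok = all(abs(sum(P[i][j] for i in range(n)) - 1.0) < 1e-9
                  for j in range(n))
    return entries_ok and cols_ok

assert is_stochastic([[0.0, 0.25], [1.0, 0.75]])      # the matrix of Example 1
assert not is_stochastic([[0.5, 0.6], [0.6, 0.4]])    # first column sums to 1.1
```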
As in Example 1, we introduce the following notation: Let si(m) denote the probability that the system is in state i after m transitions. The n × 1 matrices

sm = [ s1(m) ]
     [ s2(m) ]
     [   ⋮   ]     m = 0, 1, 2, …
     [ sn(m) ]

are called the state vectors for the Markov chain. Note that the sum of the entries of sm must equal 1 because the system must be in some state after m transitions. The matrix s0 is called the initial state vector for the Markov chain and is given as part of the data of the particular chain. For example, if the chain has only two states, then an initial vector s0 = [ 1 ; 0 ] means that it started in state 1. If it started in state 2, the initial vector would be s0 = [ 0 ; 1 ]. If s0 = [ 0.5 ; 0.5 ], it is equally likely that the system started in state 1 or in state 2.
Theorem 1
Let P be the transition matrix for an n-state Markov chain. If sm is the state vector at
stage m, then
sm+1 = Psm
for each m = 0, 1, 2, ….
HEURISTIC PROOF
Suppose that the Markov chain has been run N times, each time starting with the
same initial state vector. Recall that pij is the proportion of the time the system
goes from state j at some stage to state i at the next stage, whereas si(m) is the
proportion of the time it is in state i at stage m. Hence
si(m+1)N is (approximately) the number of times the system is in state i at stage m + 1.
We are going to calculate this number another way. The system got to state i
at stage m + 1 through some other state (say state j) at stage m. The number of
times it was in state j at that stage is (approximately) sj(m)N, so the number of
times it got to state i via state j is pij (sj(m)N). Summing over j gives the number of
times the system is in state i (at stage m + 1). This is the number we calculated
before, so
si(m+1)N = pi1s1(m)N + pi2s2(m)N + ⋯ + pinsn(m)N

Cancelling N gives si(m+1) = pi1s1(m) + pi2s2(m) + ⋯ + pinsn(m) for each i, and this is exactly entry i of the matrix equation sm+1 = Psm.
If the initial probability vector s0 and the transition matrix P are given,
Theorem 1 gives s1, s2, s3, …, one after the other, as follows:
s1 = Ps0
s2 = Ps1
s3 = Ps2
Hence, the state vector sm is completely determined for each m = 0, 1, 2, … by P
and s0.
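The iteration s1 = Ps0, s2 = Ps1, … is easily carried out in code. The following Python sketch is our own (the `step` helper is not from the text); it uses the transition matrix of Example 1 and runs three transitions from s0 = (0.5, 0.5)^T:

```python
def step(P, s):
    """One transition: return Ps for a state vector s (stored as a list)."""
    return [sum(P[i][j] * s[j] for j in range(len(s))) for i in range(len(s))]

P = [[0.0, 0.25],      # column j lists the probabilities of the next state
     [1.0, 0.75]]
s = [0.5, 0.5]         # s0: equally likely to start at A or B
for _ in range(3):
    s = step(P, s)     # computes s1, s2, s3 in turn

assert abs(s[0] - 0.1953125) < 1e-12   # P(eats at A on the third day)
```

Running more iterations shows the entries approaching 0.2 and 0.8, in line with the long-run behaviour found in Example 1.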
EXAMPLE 3
A wolf pack always hunts in one of three regions R1, R2, and R3. Its hunting habits are as follows:

1. If it hunts in some region one day, it is as likely as not to hunt there again the next day.

2. If it hunts in R1, it never hunts in R2 the next day.

3. If it hunts in R2 or R3, it is equally likely to hunt in each of the other regions the next day.

If the pack hunts in R1 on Monday, find the probability that it hunts there on Thursday.
Solution ► The stages of this process are the successive days; the states are the three regions. The transition matrix P is determined as follows (see the table):

        R1    R2    R3
R1     1/2   1/4   1/4
R2      0    1/2   1/4
R3     1/2   1/4   1/2

The first habit asserts that p11 = p22 = p33 = 1/2. Now column 1 displays what happens when the pack starts in R1: It never goes to state 2, so p21 = 0 and, because the column must sum to 1, p31 = 1/2. Column 2 describes what happens if it starts in R2: p22 = 1/2, and p12 and p32 are equal (by habit 3), so p12 = p32 = 1/4 because the column sum must equal 1. Column 3 is filled in a similar way.

Now let Monday be the initial stage. Then

s0 = [ 1 ]
     [ 0 ]
     [ 0 ]

because the pack hunts in R1 on that day. Then s1, s2, and s3 describe Tuesday, Wednesday, and Thursday, respectively, and we compute them using Theorem 1:

s1 = Ps0 = [ 1/2 ]    s2 = Ps1 = [ 3/8 ]    s3 = Ps2 = [ 11/32 ]
           [  0  ]               [ 1/8 ]               [  6/32 ]
           [ 1/2 ]               [ 4/8 ]               [ 15/32 ]

Hence, the probability that the pack hunts in Region R1 on Thursday is 11/32.
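The state vectors of Example 3 can be reproduced exactly with fraction arithmetic. The following Python check is ours (not from the text):

```python
from fractions import Fraction as F

P = [[F(1, 2), F(1, 4), F(1, 4)],   # columns sum to 1
     [F(0),    F(1, 2), F(1, 4)],
     [F(1, 2), F(1, 4), F(1, 2)]]
s = [F(1), F(0), F(0)]              # Monday: the pack is in R1

for _ in range(3):                  # Tuesday, Wednesday, Thursday
    s = [sum(P[i][j] * s[j] for j in range(3)) for i in range(3)]

assert s == [F(11, 32), F(6, 32), F(15, 32)]   # hunts in R1 with probability 11/32
```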
A stochastic matrix P is called regular if some power P^m of P has every entry greater than 0.

Theorem 2

Let P be the transition matrix for an n-state Markov chain and assume that P is regular. Then there is a unique column matrix s satisfying the following conditions:
1. Ps = s.
2. The entries of s are positive and sum to 1.
Moreover, condition 1 can be written as

(I - P)s = 0

and so gives a homogeneous system of linear equations for s. Finally, the sequence of state vectors s0, s1, s2, … converges to s in the sense that if m is large enough, each entry of sm is closely approximated by the corresponding entry of s.

The matrix s in Theorem 2 is called the steady-state vector for the Markov chain.
EXAMPLE 4
A man eats one of three soups—beef, chicken, and vegetable—each day. He
never eats the same soup two days in a row. If he eats beef soup on a certain
day, he is equally likely to eat each of the others the next day; if he does not eat
beef soup, he is twice as likely to eat it the next day as the alternative.
(a) If he has beef soup one day, what is the probability that he has it again
two days later?
(b) What are the long-run probabilities that he eats each of the three soups?
Solution ► The states here are B, C, and V, the three soups. The transition
matrix P is given in the table. (Recall that, for each state, the corresponding
column lists the probabilities for the next state.) If he has beef soup initially,
then the initial state vector is
20 The interested reader can find an elementary proof in J. Kemeny, H. Mirkil, J. Snell, and G. Thompson, Finite Mathematical
Structures (Englewood Cliffs, N.J.: Prentice-Hall, 1958).
SECTION 2.9 An Application to Markov Chains 123
s0 = [ 1 ]
     [ 0 ]
     [ 0 ]

Then two days later the state vector is s2. If P is the transition matrix (from the table),

       B    C    V
B      0   2/3  2/3
C     1/2   0   1/3
V     1/2  1/3   0

then

s1 = Ps0 = 1/2 [ 0 ]      s2 = Ps1 = 1/6 [ 4 ]
               [ 1 ]                     [ 1 ]
               [ 1 ]                     [ 1 ]

so he eats beef soup two days later with probability 2/3. This answers (a) and also shows that he eats chicken and vegetable soup each with probability 1/6.

To find the long-run probabilities, we must find the steady-state vector s. Theorem 2 applies because P is regular (P2 has positive entries), so s satisfies Ps = s. That is, (I - P)s = 0 where

I - P = 1/6 [  6 -4 -4 ]
            [ -3  6 -2 ]
            [ -3 -2  6 ]

The solution is

s = [ 4t ]
    [ 3t ]
    [ 3t ]

where t is a parameter, and we use s = [ 0.4 ; 0.3 ; 0.3 ] because the entries of s must sum to 1. Hence, in the long run, he eats beef soup 40% of the time and eats chicken soup and vegetable soup each 30% of the time.
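Theorem 2 asserts that the state vectors converge to the steady-state vector no matter where the chain starts. A quick numerical check (our own sketch, not from the text) for the soup chain of Example 4:

```python
from fractions import Fraction as F

P = [[F(0),    F(2, 3), F(2, 3)],
     [F(1, 2), F(0),    F(1, 3)],
     [F(1, 2), F(1, 3), F(0)]]
s = [F(1), F(0), F(0)]              # start with beef soup

for _ in range(60):                 # iterate s <- Ps many times
    s = [sum(P[i][j] * s[j] for j in range(3)) for i in range(3)]

steady = [F(2, 5), F(3, 10), F(3, 10)]      # s = (0.4, 0.3, 0.3)
assert all(abs(s[i] - steady[i]) < F(1, 10**9) for i in range(3))
```

The design choice here is deliberate: because P is regular, the powers of P drive any initial vector toward the steady-state vector, so repeated multiplication is a practical way to approximate s without solving (I - P)s = 0.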
EXERCISES 2.9
S T S T
0 0 _12 1
_ 0 _13 other territory.
2
(a) 1 0 1
_ (b)
1
_ 1 1
_ (a) What proportion of his time does he spend
2 4 3
in A, in B, and in C?
0 1 0 1
_ 0 1
_
4 3
(b) If he hunts in A on Monday (C on Monday),
2. In each case find the steady-state vector and, what is the probability that he will hunt in B
assuming that it starts in state 1, find the on Thursday?
probability that it is in state 2 after 3 transitions.
S 0T
1
_ 4. Assume that there are three social classes—
S 0.5 0.7 T
0.5 0.3 2
1
(a) (b) upper, middle, and lower—and that social
1
_
2 mobility behaves as follows:
S T
1 _
_ 1
0
S T
2 4 1. Of the children of upper-class parents, 70%
0.4 0.1 0.5
remain upper-class, whereas 10% become
(c) 1 0 _14 (d) 0.2 0.6 0.2
middle-class and 20% become lower-class.
0 1 _
_ 1 0.4 0.3 0.3
2 2
2. Of the children of middle-class parents, 80%
S T S T
0.8 0.0 0.2 0.1 0.3 0.3 remain middle-class, whereas the others are
(e) 0.1 0.6 0.1 (f ) 0.3 0.1 0.6 evenly split between the upper class and the
0.1 0.4 0.7 0.6 0.6 0.1 lower class.
3. A fox hunts in three territories A, B, and C. 3. For the children of lower-class parents, 60%
He never hunts in the same territory on two remain lower-class, whereas 30% become
successive days. If he hunts in A, then he hunts middle-class and 10% upper-class.
124 Chapter 2 Matrix Algebra
(a) Find the probability that the grandchild of (a) If he starts in compartment 1, find the
lower-class parents becomes upper-class. probability that he is in compartment 1 again
after 3 moves.
(b) Find the long-term breakdown of society
into classes. (b) Find the compartment in which he spends
most of his time if he is left for a long time.
5. The prime minister says she will call an election.
This gossip is passed from person to person with 9. If a stochastic matrix has a 1 on its main
a probability p ≠ 0 that the information is passed diagonal, show that it cannot be regular. Assume
incorrectly at any stage. Assume that when a it is not 1 × 1.
person hears the gossip he or she passes it to one
person who does not know. Find the long-term 10. If sm is the stage-m state vector for a Markov
probability that a person will hear that there is chain, show that sm+k = Pksm holds for all m ≥ 1
going to be an election. and k ≥ 1 (where P is the transition matrix).
6. John makes it to work on time one Monday out 11. A stochastic matrix is doubly stochastic if all
of four. On other work days his behaviour is as the row sums also equal 1. Find the steady-state
follows: If he is late one day, he is twice as likely vector for a doubly stochastic matrix.
to come to work on time the next day as to be
12. Consider the 2 × 2 stochastic matrix
late. If he is on time one day, he is as likely to
P=S
1-qT
be late as not the next day. Find the probability 1-p q
, where 0 < p < 1 and 0 < q < 1.
of his being late and that of his being on time p
p + q SpT
Wednesdays. 1 q is the steady-state vector
(a) Show that _____
7. Suppose you have 1¢ and match coins with a for P.
friend. At each match you either win or lose
1¢ with equal probability. If you go broke (b) Show that Pm converges to the matrix
p + q S p pT
or ever get 4¢, you quit. Assume your friend 1 q q by first verifying inductively that
_____
never quits. If the states are 0, 1, 2, 3, and
S -p q T
- p - q)m p -q
S T
4 representing your wealth, show that the 1 q q + (1
Pm = _____ ___________
corresponding transition matrix P is not regular. p+q p p p+q
Find the probability that you will go broke after for m = 1, 2, …. (It can be shown that
3 matches. the sequence of powers P, P2, P3, …of any
8. A mouse is put into a maze regular transition matrix converges to the
3 matrix each of whose columns equals the
of compartments, as in the 1
diagram. Assume that he steady-state vector for P.)
always leaves any 2
compartment he enters and
that he is equally likely to 4
take any tunnel entry.
5
SUPPLEMENTARY EXERCISES FOR CHAPTER 2

1. Solve for the matrix X if:

P = [ 1  0 ]    Q = [ 1 1 −1 ]    R = [ −1  1 −4 ]    S = [ 1 6 ]
    [ 2 −1 ]        [ 2 0  3 ]        [ −4  0 −6 ]        [ 3 1 ]
    [ 0  3 ]                          [  6  6 −6 ]

2. Consider p(X) = X3 − 5X2 + 11X − 4I. … terms of U.

3. Show that, if a (possibly nonhomogeneous) system of equations is consistent and has more
3
Determinants and Diagonalization
With each square matrix we can calculate a number, called the determinant of the
matrix, which tells us whether or not the matrix is invertible. In fact, determinants
can be used to give a formula for the inverse of a matrix. They also arise in
calculating certain numbers (called eigenvalues) associated with a matrix. These
eigenvalues are essential to a technique called diagonalization that is used in many
applications where it is desired to predict the future behaviour of a system. For
example, we use it to predict whether a species will become extinct.
Determinants were first studied by Leibnitz in 1696, and the term “determinant”
was first used in 1801 by Gauss in his Disquisitiones Arithmeticae. Determinants are
much older than matrices (which were introduced by Cayley in 1858) and were
used extensively in the eighteenth and nineteenth centuries, primarily because of
their significance in geometry (see Section 4.4). Although they are somewhat less
important today, determinants still play a role in the theory and application of
matrix algebra.
1 Determinants are commonly written |A| = det A using vertical bars. We will use both notations.
SECTION 3.1 The Cofactor Expansion 127
A → [ a b c       ]    [ a b c            ]    [ a b c       ]
    [ 0 u af − cd ] →  [ 0 u  af − cd     ] →  [ 0 u af − cd ]
    [ 0 v ai − cg ]    [ 0 uv u(ai − cg)  ]    [ 0 0 w       ]

where w = u(ai - cg) - v(af - cd) = a(aei + bfg + cdh - ceg - afh - bdi). We define

det A = aei + bfg + cdh - ceg - afh - bdi   (∗)

and observe that det A ≠ 0 because a det A = w ≠ 0 (as A is invertible).
To motivate the definition below, collect the terms in (∗) involving the entries a,
b, and c in row 1 of A:
a b c
det A = d e f = aei + bfg + cdh - ceg - afh - bdi
g h i
= a(ei - fh) - b(di - fg) + c(dh - eg)
= a | e f |  −  b | d f |  +  c | d e |
    | h i |       | g i |       | g h |
This last expression can be described as follows: To compute the determinant of a
3 × 3 matrix A, multiply each entry in row 1 by a sign times the determinant of the
2 × 2 matrix obtained by deleting the row and column of that entry, and add the
results. The signs alternate down row 1, starting with +. It is this observation that
we generalize below.
EXAMPLE 1

det [  2 3 7 ]
    [ −4 0 6 ]  =  2 | 0 6 |  −  3 | −4 6 |  +  7 | −4 0 |
    [  1 5 0 ]       | 5 0 |      |  1 0 |       |  1 5 |

                =  2(−30) − 3(−6) + 7(−20)
                =  −182.
Definition 3.1 Assume that determinants of (n - 1) × (n - 1) matrices have been defined. Given the
n × n matrix A, let
Aij denote the (n - 1) × (n - 1) matrix
obtained from A by deleting row i and column j.
Then the (i, j)-cofactor cij(A) is the scalar defined by
cij(A) = (-1)i+j det(Aij).
Here (-1)i+j is called the sign of the (i, j)-position.
The sign of a position is clearly 1 or -1, and the following diagram is useful for remembering the sign of a position:
128 Chapter 3 Determinants and Diagonalization
+ − + −
− + − +
+ − + −
− + − +
Note that the signs alternate along each row and column with + in the upper
left corner.
EXAMPLE 2
Find the cofactors of positions (1, 2), (3, 1), and (2, 3) in the following matrix.
3 −1 6
A= 5 2 7
8 9 4
Solution ► Here A12 = [ 5 7 ; 8 4 ] is the matrix that remains when row 1 and column 2 are deleted. The sign of position (1, 2) is (-1)^(1+2) = -1 (this is also the (1, 2)-entry in the sign diagram), so the (1, 2)-cofactor is

c12(A) = (-1)^(1+2) | 5 7 |  =  (-1)(5 · 4 - 7 · 8)  =  (-1)(-36)  =  36
                    | 8 4 |
Turning to position (3, 1), we find
Definition 3.2 Assume that determinants of (n - 1) × (n - 1) matrices have been defined. If A = [aij]
is n × n define
det A = a11c11(A) + a12c12(A) + ⋯ + a1nc1n(A)
This is called the cofactor expansion of det A along row 1.
It asserts that det A can be computed by multiplying the entries of row 1 by the
corresponding cofactors, and adding the results. The astonishing thing is that det A
can be computed by taking the cofactor expansion along any row or column: Simply
multiply each entry of that row or column by the corresponding cofactor and add.
Theorem 1

The determinant of an n × n matrix A can be computed using the cofactor expansion along any row or column of A.
EXAMPLE 3

Compute the determinant of A = [ 3 4  5 ]
                               [ 1 7  2 ]
                               [ 9 8 −6 ]

Solution ► The cofactor expansion along the first row is as follows:

det A = 3c11(A) + 4c12(A) + 5c13(A)

      = 3 | 7  2 |  −  4 | 1  2 |  +  5 | 1 7 |
          | 8 −6 |       | 9 −6 |       | 9 8 |

      = 3(−58) − 4(−24) + 5(−55)
      = −353

Note that the signs alternate along the row (indeed along any row or column). Now we compute det A by expanding along the first column:

det A = 3c11(A) + 1c21(A) + 9c31(A)

      = 3 | 7  2 |  −  1 | 4  5 |  +  9 | 4 5 |
          | 8 −6 |       | 8 −6 |       | 7 2 |

      = 3(−58) − 1(−64) + 9(−27)
      = −353
The fact that the cofactor expansion along any row or column of a matrix A always
gives the same result (the determinant of A) is remarkable, to say the least. The
choice of a particular row or column can simplify the calculation.
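Definition 3.2, applied recursively, gives a direct (if inefficient) way to compute determinants. The following Python sketch is our own illustration, not the text's method of choice: it expands along row 1, skipping zero entries exactly because they contribute nothing.

```python
def det(A):
    """Determinant by cofactor expansion along row 1 (Definition 3.2)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue                       # zero entries contribute nothing
        # minor: delete row 1 and column j + 1 (0-indexed: row 0, column j)
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)   # sign of position (1, j+1)
    return total

assert det([[2, 3, 7], [-4, 0, 6], [1, 5, 0]]) == -182     # Example 1
assert det([[3, 4, 5], [1, 7, 2], [9, 8, -6]]) == -353     # Example 3
```

The recursion does roughly n! work, which is why larger determinants are computed by row reduction instead; but for the small matrices in these examples it reproduces the answers found by hand.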
EXAMPLE 4

Compute det A where A = [  3 0 0  0 ]
                        [  5 1 2  0 ]
                        [  2 6 0 −1 ]
                        [ −6 3 1  0 ]
2 The cofactor expansion is due to Pierre Simon de Laplace (1749–1827), who discovered it in 1772 as part of a study of linear
differential equations. Laplace is primarily remembered for his work in astronomy and applied mathematics.
130 Chapter 3 Determinants and Diagonalization
Solution ► The first choice we must make is which row or column to use in the
cofactor expansion. The expansion involves multiplying entries by cofactors, so
the work is minimized when the row or column contains as many zero entries
as possible. Row 1 is a best choice in this matrix (column 4 would do as well),
and the expansion is
det A = 3c11(A) + 0c12(A) + 0c13(A) + 0c14(A)
1 2 0
= 3 6 0 −1
3 1 0
This is the first stage of the calculation, and we have succeeded in expressing the
determinant of the 4 × 4 matrix A in terms of the determinant of a 3 × 3 matrix.
The next stage involves this 3 × 3 matrix. Again, we can use any row or column for
the cofactor expansion. The third column is preferred (with two zeros), so

det A = 3[0 · c13 + (−1) · c23 + 0 · c33]
      = 3(−1)(−1)^(2+3) | 1 2 |
                        | 3 1 |
      = 3(1 · 1 − 2 · 3)
      = −15
Theorem 2
Let A denote an n × n matrix.
1. If A has a row or column of zeros, det A = 0.
2. If two distinct rows (or columns) of A are interchanged, the determinant of the
resulting matrix is −det A.
3. If a row (or column) of A is multiplied by a constant u, the determinant of the
resulting matrix is u(det A).
4. If two distinct rows (or columns) of A are identical, det A = 0.
5. If a multiple of one row of A is added to a different row (or if a multiple of a
column is added to a different column), the determinant of the resulting matrix
is det A.

There is also a useful shortcut for 3 × 3 determinants. If

    [ a b c ]                                         a b c a b
A = [ d e f ] , we can calculate det A by considering d e f d e
    [ g h i ]                                         g h i g h

obtained from A by adjoining columns 1 and 2 on the right. Then
det A = aei + bfg + cdh − ceg − afh − bdi, where the positive terms aei, bfg, and
cdh are the products down and to the right starting at a, b, and c, and the negative
terms ceg, afh, and bdi are the products down and to the left starting at c, a, and b.
Warning: This rule does not apply to n × n matrices where n > 3 or n = 2.
PROOF
We prove properties 2, 4, and 5 and leave the rest as exercises.
Property 2. If A is n × n, this follows by induction on n. If n = 2, the verification
is left to the reader. If n > 2 and two rows are interchanged, let B denote the
resulting matrix. Expand det A and det B along a row other than the two that
were interchanged. The entries in this row are the same for both A and B, but
the cofactors in B are the negatives of those in A (by induction) because the
corresponding (n - 1) × (n - 1) matrices have two rows interchanged. Hence,
det B = -det A, as required. A similar argument works if two columns are
interchanged.
Property 4. If two rows of A are equal, let B be the matrix obtained by
interchanging them. Then B = A, so det B = det A. But det B = -det A by
property 2, so det A = det B = 0. Again, the same argument works for columns.
Property 5. Let B be obtained from A = [aij] by adding u times row p to row
q. Then row q of B is (aq1 + uap1, aq2 + uap2, …, aqn + uapn). The cofactors of
these elements in B are the same as in A (they do not involve row q): in symbols,
cqj(B) = cqj(A) for each j. Hence, expanding B along row q gives
det B = (aq1 + uap1)cq1(A) + (aq2 + uap2)cq2(A) + ⋯ + (aqn + uapn)cqn(A)
= [aq1cq1(A) + aq2cq2(A) + ⋯ + aqncqn(A)]
+ u[ap1cq1(A) + ap2cq2(A) + ⋯ + apncqn(A)]
= det A + u det C
where C is the matrix obtained from A by replacing row q by row p (and both
expansions are along row q). Because rows p and q of C are equal, det C = 0 by
property 4. Hence, det B = det A, as required. As before, a similar proof holds
for columns.
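As a quick cross-check (our own illustration, not part of the text), the 3 × 3 shortcut of adjoining columns can be compared with a cofactor expansion along row 1; the two must always agree:

```python
# A quick cross-check (ours): the 3 x 3 "adjoined columns" shortcut
# versus a cofactor expansion along row 1.

def det3_shortcut(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    # products down-right minus products down-left
    return a*e*i + b*f*g + c*d*h - c*e*g - a*f*h - b*d*i

def det3_cofactor(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    # expansion along row 1
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

A = [[3, -1, 6], [5, 2, 7], [8, 9, 4]]  # the matrix of Example 2
print(det3_shortcut(A), det3_cofactor(A))  # both give -27
```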
For example,

| 2  5 2 |   | 0  9 20 |
| −1 2 9 | = | −1 2  9 |   (because twice the second row of the matrix on the
| 3  1 1 |   | 3  1  1 |    left was added to the first row)
The following four examples illustrate how Theorem 2 is used to evaluate determinants.
EXAMPLE 5
1 −1 3
Evaluate det A when A = 1 0 −1 .
2 1 6
Solution ► The matrix does have zero entries, so expansion along (say) the
second row would involve somewhat less work. However, a column operation
can be used to get a zero in position (2, 3)—namely, add column 1 to column 3.
Because this does not change the value of the determinant, we obtain
        | 1 −1 3 |   | 1 −1 4 |
det A = | 1  0 −1 | = | 1  0 0 | = − | −1 4 | = −(−8 − 4) = 12
        | 2  1 6 |   | 2  1 8 |     |  1 8 |
where we expanded the second 3 × 3 matrix along row 2.
EXAMPLE 6
a b c a+ x b + y c + z
If det p q r = 6, evaluate det A where A = 3x 3 y 3z .
x y z − p −q −r
Solution ► First take common factors out of rows 2 and 3.
a+x b+ y c+z
det A = 3(-1) det x y z
p q r
Now subtract the second row from the first and interchange the last two rows.
a b c a b c
det A = -3 det x y z = 3 det p q r = 3 · 6 = 18
p q r x y z
EXAMPLE 7
1 x x
Find the values of x for which det A = 0, where A = x 1 x .
x x 1
Solution ► To evaluate det A, first subtract x times row 1 from rows 2 and 3, and
then expand along column 1:

        | 1 x x |   | 1    x       x    |
det A = | x 1 x | = | 0  1 − x²  x − x² | = | 1 − x²  x − x² |
        | x x 1 |   | 0  x − x²  1 − x² |   | x − x²  1 − x² |

Since 1 − x² = (1 + x)(1 − x) and x − x² = x(1 − x), we can take the common factor
(1 − x) out of each row to get

det A = (1 − x)² | 1 + x    x   | = (1 − x)²[(1 + x)² − x²] = (1 − x)²(1 + 2x)
                 |   x    1 + x |

Hence det A = 0 if and only if x = 1 or x = −1/2.
EXAMPLE 8
If a1, a2, and a3 are given, show that

    | 1  a1  a1² |
det | 1  a2  a2² | = (a3 − a1)(a3 − a2)(a2 − a1)
    | 1  a3  a3² |
Solution ► Begin by subtracting row 1 from rows 2 and 3, and then expand
along column 1:
    | 1  a1  a1² |       | 1    a1       a1²      |
det | 1  a2  a2² | = det | 0  a2 − a1  a2² − a1²  | = det | a2 − a1  a2² − a1² |
    | 1  a3  a3² |       | 0  a3 − a1  a3² − a1²  |       | a3 − a1  a3² − a1² |

Now (a2 − a1) and (a3 − a1) are common factors in rows 1 and 2, respectively, so

    | 1  a1  a1² |
det | 1  a2  a2² | = (a2 − a1)(a3 − a1) det | 1  a2 + a1 |
    | 1  a3  a3² |                          | 1  a3 + a1 |

                  = (a2 − a1)(a3 − a1)(a3 − a2)
The matrix in Example 8 is called a Vandermonde matrix, and the formula for its
determinant can be generalized to the n × n case (see Theorem 7 Section 3.2).
If A is an n × n matrix, forming uA means multiplying every row of A by u.
Applying property 3 of Theorem 2, we can take the common factor u out of each
row and so obtain the following useful result.
Theorem 3
If A is an n × n matrix, then det(uA) = u^n det A for any number u.
The next example displays a type of matrix whose determinant is easy to compute.
EXAMPLE 9
a 0 0 0
u b 0 0
Evaluate det A if A = .
v w c 0
x y z d
Solution ► Expand along row 1 to get

          | b 0 0 |
det A = a | w c 0 | .
          | y z d |

Now expand this along the top row to get

           | c 0 |
det A = ab | z d | = ab(cd) = abcd,

the product of the main diagonal entries.
A square matrix is called a lower triangular matrix if all entries above the main
diagonal are zero (as in Example 9). Similarly, an upper triangular matrix is one
for which all entries below the main diagonal are zero. A triangular matrix is one
that is either upper or lower triangular. Theorem 4 gives an easy rule for calculating
the determinant of any triangular matrix. The proof is like the solution to Example 9.
Theorem 4
If A is a square triangular matrix, then det A is the product of the entries on the
main diagonal.
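Theorem 4 is easy to check in code. The sketch below is ours, not from the text, and the helper name `det_triangular` is our own; it simply multiplies the main diagonal entries of a triangular matrix:

```python
# Sketch (ours): by Theorem 4, the determinant of a triangular matrix
# is the product of its main diagonal entries.

def det_triangular(A):
    p = 1
    for i in range(len(A)):
        p *= A[i][i]  # accumulate the diagonal product
    return p

# a lower triangular matrix as in Example 9, with a = 2, b = 3, c = 4, d = 5
A = [[2, 0, 0, 0], [7, 3, 0, 0], [1, 8, 4, 0], [6, 9, 2, 5]]
print(det_triangular(A))  # 2 * 3 * 4 * 5 = 120
```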
Theorem 5
If a square matrix has the block form [ A X ] , where A and B are square matrices
                                      [ 0 B ]
and 0 denotes a block of zeros, then

det [ A X ] = det A det B
    [ 0 B ]
EXAMPLE 10
    | 2  3  1 3 |     | 2  1  3 3 |
det | 1 −2 −1 1 | = − | 1 −1 −2 1 | = − det | 2  1 | det | 1 1 | = −(−3)(−3) = −9
    | 0  1  0 1 |     | 0  0  1 1 |         | 1 −1 |     | 4 1 |
    | 0  4  0 1 |     | 0  0  4 1 |

where the first equality results from interchanging columns 2 and 3 (which changes
the sign of the determinant), and the second follows from Theorem 5 because the
second matrix is in block form.
The next result shows that det A is a linear transformation when regarded as a
function of a fixed column of A. The proof is Exercise 21.
Theorem 6
Given columns c1, …, cj−1, cj+1, …, cn in ℝ^n, define T : ℝ^n → ℝ by
T(x) = det [c1 ⋯ cj−1 x cj+1 ⋯ cn] for each column x in ℝ^n. Then
T(rx + sy) = rT(x) + sT(y) for all x and y in ℝ^n and all scalars r and s.
EXERCISES 3.1
4. Show that det I = 1 for any identity matrix I.
Theorem 1
Product Theorem
If A and B are n × n matrices, then det(AB) = det A det B.
EXAMPLE 1
       [ a  b ]          [ c  d ]             [  ac − bd    ad + bc ]
If A = [ −b a ]  and B = [ −d c ] , then AB = [ −(ad + bc)  ac − bd ] .
Hence det A det B = det(AB) gives the identity
(a2 + b2)(c2 + d2) = (ac - bd)2 + (ad + bc)2
Theorem 1 extends easily to det(ABC) = det A det B det C. In fact, induction gives
det(A1A2⋯Ak−1Ak) = det A1 det A2 ⋯ det Ak−1 det Ak
for any square matrices A1, …, Ak of the same size. In particular, if each Ai = A,
we obtain
det(Ak) = (det A)k for any k ≥ 1
We can now give the invertibility condition.
Theorem 2
A square matrix A is invertible if and only if det A ≠ 0.
PROOF
If A is invertible, then AA-1 = I; so the product theorem gives
1 = det I = det(AA-1) = det A det A-1
Hence, det A ≠ 0 and also det A⁻¹ = 1/det A.
Conversely, if det A ≠ 0, we show that A can be carried to I by elementary
row operations (and invoke Theorem 5 Section 2.4). Certainly, A can be
carried to its reduced row-echelon form R, so R = Ek⋯E2E1A where the Ei are
elementary matrices (Theorem 1 Section 2.5). Hence the product theorem gives
det R = det Ek ⋯ det E2 det E1 det A
Since det E ≠ 0 for all elementary matrices E, this shows det R ≠ 0. In
particular, R has no row of zeros, so R = I because R is square and reduced
row-echelon. This is what we wanted.
SECTION 3.2 Determinants and Matrix Inverses 139
EXAMPLE 2
1 0 −c
For which values of c does A = −1 3 1 have an inverse?
0 2c − 4
Solution ► Compute det A by first adding c times column 1 to column 3 and
then expanding along row 1.
        | 1  0  −c |       | 1  0    0   |
det A = | −1 3   1 | = det | −1 3  1 − c | = 1 · | 3   1 − c | = 2(c + 2)(c − 3).
        | 0  2c −4 |       | 0  2c  −4   |       | 2c   −4   |
Hence, det A = 0 if c = -2 or c = 3, and A has an inverse if c ≠ -2 and c ≠ 3.
EXAMPLE 3
If a product A1A2⋯Ak of square matrices is invertible, show that each Ai is
invertible.

Solution ► By the product theorem, det A1 det A2 ⋯ det Ak = det(A1A2⋯Ak), and
this is nonzero by Theorem 2 because A1A2⋯Ak is invertible. Hence det Ai ≠ 0 for
each i, so each Ai is invertible, again by Theorem 2.
Theorem 3
If A is any square matrix, then det AT = det A.
PROOF
Consider first the case of an elementary matrix E. If E is of type I or II, then
ET = E; so certainly det ET = det E. If E is of type III, then ET is also of type
III; so det ET = 1 = det E by Theorem 2 Section 3.1. Hence, det ET = det E
for every elementary matrix E.
Now let A be any square matrix. If A is not invertible, then neither is AT; so
det AT = 0 = det A by Theorem 2. On the other hand, if A is invertible, then
A = Ek⋯E2E1, where the Ei are elementary matrices (Theorem 2 Section 2.5).
Hence, AT = E1T E2T ⋯ EkT, so the product theorem gives
det AT = det E1T det E2T ⋯ det EkT = det E1 det E2 ⋯ det Ek
       = det Ek ⋯ det E2 det E1
       = det A
This completes the proof.
EXAMPLE 4
If det A = 2 and det B = 5, calculate det(A³B⁻¹ATB²).

Solution ► Using the product theorem and Theorem 3 (det AT = det A), we get
det(A³B⁻¹ATB²) = (det A)³(det B)⁻¹(det A)(det B)² = (det A)⁴(det B) = 2⁴ · 5 = 80.
EXAMPLE 5
A square matrix is called orthogonal if A⁻¹ = AT. What are the possible values
of det A if A is orthogonal?

Solution ► If A is orthogonal, then AAT = I, so the product theorem and Theorem 3
give 1 = det I = det A det AT = (det A)². Hence det A = 1 or det A = −1.
Hence Theorems 4 and 5 of Section 2.6 imply that rotation about the origin
and reflection about a line through the origin in ℝ² have orthogonal matrices with
determinants 1 and −1 respectively. In fact they are the only such transformations
of ℝ². We have more to say about this in Section 8.2.
Adjugates
In Section 2.4 we defined the adjugate of a 2 × 2 matrix A = [ a b ] to be
                                                             [ c d ]
adj(A) = [  d −b ] . Then we verified that A(adj A) = (det A)I = (adj A)A and hence
         [ −c  a ]
that, if det A ≠ 0, A⁻¹ = (1/det A) adj A. We are now able to define the adjugate of an
arbitrary square matrix and to show that this formula for the inverse remains
valid (when the inverse exists).
Recall that the (i, j)-cofactor cij(A) of a square matrix A is a number defined for
each position (i, j) in the matrix. If A is a square matrix, the cofactor matrix of A
is defined to be the matrix [cij(A)] whose (i, j)-entry is the (i, j)-cofactor of A.
Definition 3.3 The adjugate4 of A, denoted adj(A), is the transpose of this cofactor matrix; in symbols,
adj(A) = [cij(A)]T
This agrees with the earlier definition for a 2 × 2 matrix A, as the reader can verify.
4 This is also called the classical adjoint of A, but the term “adjoint” has another meaning.
EXAMPLE 6
1 3 −2
Compute the adjugate of A = 0 1 5 and calculate A(adj A) and (adj A)A.
− 2 −6 7
Solution ► We first find the cofactor matrix. The entries are computed as
2 × 2 determinants with the appropriate signs:

c11(A) =  | 1  5 | = 37    c12(A) = −| 0  5 | = −10    c13(A) =  | 0  1  | = 2
          | −6 7 |                   | −2 7 |                    | −2 −6 |

c21(A) = −| 3 −2 | = −9    c22(A) =  | 1 −2 | = 3      c23(A) = −| 1  3  | = 0
          | −6 7 |                   | −2 7 |                    | −2 −6 |

c31(A) =  | 3 −2 | = 17    c32(A) = −| 1 −2 | = −5     c33(A) =  | 1 3 | = 1
          | 1  5 |                   | 0  5 |                    | 0 1 |

Hence the cofactor matrix is

[ 37 −10  2 ]
[ −9   3  0 ]
[ 17  −5  1 ]
Then the adjugate of A is the transpose of this cofactor matrix.
        [ 37 −10 2 ]T   [ 37  −9 17 ]
adj A = [ −9  3  0 ]  = [ −10  3 −5 ]
        [ 17 −5  1 ]    [  2   0  1 ]
The computation of A(adj A) gives
1 3 −2 37 − 9 17 3 0 0
A(adj A) = 0 1 5 − 10 3 −5 = 0 3 0 = 3I
−2 −6 7 2 0 1 0 0 3
and the reader can verify that also (adj A)A = 3I. Hence, analogy with the
2 × 2 case would indicate that det A = 3; this is, in fact, the case.
The relationship A(adj A) = (det A)I holds for any square matrix A. To see why
this is so, consider the general 3 × 3 case. Writing cij(A) = cij for short, we have
        [ c11 c12 c13 ]T   [ c11 c21 c31 ]
adj A = [ c21 c22 c23 ]  = [ c12 c22 c32 ]
        [ c31 c32 c33 ]    [ c13 c23 c33 ]
If A = [aij] in the usual notation, we are to verify that A(adj A) = (det A)I. That is,
a11 a12 a13 c11 c21 c31 det A 0 0
A(adj A) = a21 a22 a23 c12 c22 c32 = 0 det A 0
a31 a32 a33 c13 c23 c33 0 0 det A
Consider the (1, 1)-entry in the product. It is given by a11c11 + a12c12 + a13c13, and
this is just the cofactor expansion of det A along the first row of A. Similarly, the
(2, 2)-entry and the (3, 3)-entry are the cofactor expansions of det A along rows 2
and 3, respectively.
So it remains to be seen why the off-diagonal elements in the matrix product
A(adj A) are all zero. Consider the (1, 2)-entry of the product. It is given by
a11c21 + a12c22 + a13c23. This looks like the cofactor expansion of the determinant of
some matrix. To see which, observe that c21, c22, and c23 are all computed by deleting
row 2 of A (and one of the columns), so they remain the same if row 2 of A is
changed. In particular, if row 2 of A is replaced by row 1, we obtain
                               | a11 a12 a13 |
a11c21 + a12c22 + a13c23 = det | a11 a12 a13 | = 0
                               | a31 a32 a33 |
where the expansion is along row 2 and where the determinant is zero because two rows
are identical. A similar argument shows that the other off-diagonal entries are zero.
This argument works in general and yields the first part of Theorem 4. The
second assertion follows from the first by multiplying through by the scalar 1/det A.
Theorem 4
Adjugate Formula
If A is any square matrix, then
A(adj A) = (det A)I = (adj A)A
In particular, if det A ≠ 0, the inverse of A is given by
A⁻¹ = (1/det A) adj A
It is important to note that this theorem is not an efficient way to find the inverse
of the matrix A. For example, if A were 10 × 10, the calculation of adj A would
require computing 102 = 100 determinants of 9 × 9 matrices! On the other hand,
the matrix inversion algorithm would find A-1 with about the same effort as finding
det A. Clearly, Theorem 4 is not a practical result: its virtue is that it gives a
formula for A-1 that is useful for theoretical purposes.
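Even though the adjugate formula is not a practical way to invert large matrices, it is easy to program for small ones. The following sketch is ours, not from the text (the names `det`, `minor`, and `adj` are our own); it builds adj A from the cofactor matrix and checks the identity A(adj A) = (det A)I of Theorem 4, using the matrix of Example 6:

```python
# Sketch (ours, not from the text): adj A as the transpose of the
# cofactor matrix, checked against A(adj A) = (det A)I (Theorem 4).

def minor(A, i, j):
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j))
               for j in range(len(A)))

def adj(A):
    n = len(A)
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]  # transpose

A = [[1, 3, -2], [0, 1, 5], [-2, -6, 7]]  # the matrix of Example 6
B = adj(A)
print(B)  # [[37, -9, 17], [-10, 3, -5], [2, 0, 1]]
prod = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(prod)  # equals (det A)I = 3I
```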
EXAMPLE 7
                                    [ 2  1  3 ]
Find the (2, 3)-entry of A⁻¹ if A = [ 5 −7  1 ] .
                                    [ 3  0 −6 ]

Solution ► First compute

        | 2  1  3 |   | 2  1  7 |       | 1   7 |
det A = | 5 −7  1 | = | 5 −7 11 | = 3 · | −7 11 | = 3(11 + 49) = 180
        | 3  0 −6 |   | 3  0  0 |

where we added twice column 1 to column 3 and then expanded along row 3. Since
A⁻¹ = (1/det A) adj A = (1/180)[cij(A)]T, the (2, 3)-entry of A⁻¹ is
(1/180)c32(A). Here c32(A) = −(2 · 1 − 3 · 5) = 13, so the (2, 3)-entry of A⁻¹
is 13/180.
EXAMPLE 8
If A is n × n, n ≥ 2, show that det(adj A) = (det A)^(n−1).

Solution ► Write d = det A. Taking determinants in A(adj A) = dI gives
d det(adj A) = det(dI) = d^n, so det(adj A) = d^(n−1) whenever d ≠ 0. If d = 0 we
must show det(adj A) = 0. Indeed, if adj A were invertible, then
A = [A(adj A)](adj A)⁻¹ = 0; but then every cofactor of A is zero (because n ≥ 2),
so adj A = 0, a contradiction. Hence adj A is not invertible and
det(adj A) = 0 = d^(n−1) in this case as well.
Cramer’s Rule
Theorem 4 has a nice application to linear equations. Suppose
Ax = b
is a system of n equations in n variables x1, x2, …, xn. Here A is the n × n coefficient
matrix, and x and b are the columns
    [ x1 ]           [ b1 ]
x = [ x2 ]  and  b = [ b2 ]
    [ ⋮  ]           [ ⋮  ]
    [ xn ]           [ bn ]
of variables and constants, respectively. If det A ≠ 0, we left multiply by A-1 to
obtain the solution x = A-1b. When we use the adjugate formula, this becomes
[ x1 ]
[ x2 ]
[ ⋮  ] = (1/det A)(adj A)b
[ xn ]

                     [ c11(A)  c21(A)  ⋯  cn1(A) ] [ b1 ]
         = (1/det A) [ c12(A)  c22(A)  ⋯  cn2(A) ] [ b2 ]
                     [    ⋮       ⋮          ⋮    ] [ ⋮  ]
                     [ c1n(A)  c2n(A)  ⋯  cnn(A) ] [ bn ]
Hence, the variables x1, x2, …, xn are given by
x1 = (1/det A)[b1c11(A) + b2c21(A) + ⋯ + bncn1(A)]
x2 = (1/det A)[b1c12(A) + b2c22(A) + ⋯ + bncn2(A)]
  ⋮
xn = (1/det A)[b1c1n(A) + b2c2n(A) + ⋯ + bncnn(A)]

For each k, the sum b1c1k(A) + b2c2k(A) + ⋯ + bncnk(A) is the cofactor expansion
along column k of det Ak, where Ak is the matrix obtained from A by replacing
column k by b (the cofactors along column k do not involve the entries of that
column, so they are the same for A and Ak). Hence xk = det Ak/det A, which proves
Theorem 5
Cramer’s Rule5
If A is an invertible n × n matrix, the solution to the system
Ax = b
of n equations in the variables x1, x2, …, xn is given by
x1 = det A1/det A,  x2 = det A2/det A,  …,  xn = det An/det A
where, for each k, Ak is the matrix obtained from A by replacing column k by b.
5 Gabriel Cramer (1704–1752) was a Swiss mathematician who wrote an introductory work on algebraic curves. He popularized the
rule that bears his name, but the idea was known earlier.
EXAMPLE 9
Find x1, given the following system of equations.
5x1 + x2 - x3 = 4
9x1 + x2 - x3 = 1
x1 - x2 + 5x3 = 2
Solution ► Compute the determinants of the coefficient matrix A and the matrix
A1 obtained from it by replacing the first column by the column of constants.
5 1 −1
det A = det 9 1 −1 = -16
1 −1 5
4 1 −1
det A1 = det 1 1 −1 = 12
2 −1 5
Hence, x1 = det A1/det A = 12/(−16) = −3/4 by Cramer’s rule.
Cramer’s rule is not an efficient way to solve linear systems or invert matrices.
True, it enabled us to calculate x1 here without computing x2 or x3. Although
this might seem an advantage, the truth of the matter is that, for large systems
of equations, the number of computations needed to find all the variables by
the gaussian algorithm is comparable to the number required to find one of the
determinants involved in Cramer’s rule. Furthermore, the algorithm works when
the matrix of the system is not invertible and even when the coefficient matrix
is not square. Like the adjugate formula, then, Cramer’s rule is not a practical
numerical technique; its virtue is theoretical.
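For small systems, Cramer’s rule is nevertheless easy to program. The sketch below is ours, not from the text (the function name `cramer` is our own); it uses exact `Fraction` arithmetic and reproduces x1 = −3/4 from Example 9:

```python
# Sketch (ours; `cramer` is our own name): x_k = det A_k / det A, where
# A_k is A with column k replaced by b. Exact Fraction arithmetic.

from fractions import Fraction

def minor(A, i, j):
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j))
               for j in range(len(A)))

def cramer(A, b):
    d = det(A)
    sol = []
    for k in range(len(A)):
        Ak = [row[:] for row in A]   # copy A ...
        for i in range(len(A)):
            Ak[i][k] = b[i]          # ... and replace column k by b
        sol.append(Fraction(det(Ak), d))
    return sol

A = [[5, 1, -1], [9, 1, -1], [1, -1, 5]]  # the system of Example 9
b = [4, 1, 2]
sol = cramer(A, b)
print(sol[0])  # -3/4, as in Example 9
```

Note that this computes all the variables; as the text observes, gaussian elimination would do the same job with far less work for large n.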
Polynomial Interpolation
EXAMPLE 10
A forester wants to estimate the age (in years) of a tree by measuring the
diameter of the trunk (in cm). She obtains the following data:

                 Tree 1   Tree 2   Tree 3
Trunk Diameter      5       10       15
Age                 3        5        6

[Graph: age plotted against trunk diameter, through the data points (5, 3),
(10, 5), and (15, 6).]

Estimate the age of a tree with a trunk diameter of 12 cm.
Solution ► We fit the quadratic polynomial p(x) = r0 + r1x + r2x² to the data; the
conditions p(5) = 3, p(10) = 5, and p(15) = 6 give three linear equations in r0, r1,
and r2. The (unique) solution is r0 = 0, r1 = 7/10, and r2 = −1/50, so

p(x) = (7/10)x − (1/50)x² = (1/50)x(35 − x).
Hence the estimate is p(12) = 5.52.
As in Example 10, it often happens that two variables x and y are related but
the actual functional form y = f (x) of the relationship is unknown. Suppose that
for certain values x1, x2, …, xn of x the corresponding values y1, y2, …, yn are
known (say from experimental measurements). One way to estimate the value of y
corresponding to some other value a of x is to find a polynomial6
p(x) = r0 + r1x + r2x² + ⋯ + rn−1x^(n−1)
that “fits” the data, that is p(xi) = yi holds for each i = 1, 2, …, n. Then the estimate
for y is p(a). As we will see, such a polynomial always exists if the xi are distinct.
The conditions that p(xi) = yi are
r0 + r1x1 + r2x1² + ⋯ + rn−1x1^(n−1) = y1
r0 + r1x2 + r2x2² + ⋯ + rn−1x2^(n−1) = y2
  ⋮
r0 + r1xn + r2xn² + ⋯ + rn−1xn^(n−1) = yn

In matrix form, this is

[ 1  x1  x1²  ⋯  x1^(n−1) ] [ r0   ]   [ y1 ]
[ 1  x2  x2²  ⋯  x2^(n−1) ] [ r1   ]   [ y2 ]
[ ⋮   ⋮    ⋮         ⋮    ] [  ⋮   ] = [ ⋮  ]    (∗)
[ 1  xn  xn²  ⋯  xn^(n−1) ] [ rn−1 ]   [ yn ]
It can be shown (see Theorem 7) that the determinant of the coefficient matrix
equals the product of all terms (xi - xj) with i > j and so is nonzero (because the xi
are distinct). Hence the equations have a unique solution r0, r1, …, rn-1. This proves
Theorem 6
Let n data pairs (x1, y1), (x2, y2), …, (xn, yn) be given, and assume that the xi are distinct.
Then there exists a unique polynomial
p(x) = r0 + r1x + r2x² + ⋯ + rn−1x^(n−1)
such that p(xi) = yi holds for each i = 1, 2, …, n.
The polynomial in Theorem 6 is called the interpolating polynomial for the data.
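A short program, ours and not from the text, can find the interpolating polynomial by solving the system (∗) with Gauss–Jordan elimination over exact fractions; it reproduces the fit of Example 10. The function name `interpolate` is our own:

```python
# Sketch (ours, not from the text): solve the Vandermonde system (*)
# by Gauss-Jordan elimination over exact Fractions.

from fractions import Fraction

def interpolate(xs, ys):
    n = len(xs)
    # augmented matrix [V | y] with V[i][j] = xs[i] ** j
    M = [[Fraction(xs[i]) ** j for j in range(n)] + [Fraction(ys[i])]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]                 # bring pivot up
        M[col] = [v / M[col][col] for v in M[col]]      # make pivot 1
        for r in range(n):
            if r != col and M[r][col] != 0:             # clear the column
                M[r] = [a - M[r][col] * q for a, q in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]  # coefficients r0, r1, ..., r_{n-1}

r = interpolate([5, 10, 15], [3, 5, 6])  # the data of Example 10
print(r)                                 # r0 = 0, r1 = 7/10, r2 = -1/50
p = lambda x: sum(c * x ** k for k, c in enumerate(r))
print(p(12))                             # 138/25 = 5.52
```

The pivot search never fails here because, by Theorem 7 below, the Vandermonde coefficient matrix is invertible when the xi are distinct.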
We conclude by evaluating the determinant of the coefficient matrix in (∗).
If a1, a2, …, an are numbers, the determinant
6 A polynomial is an expression of the form a0 + a1x + a2x2 + + anxn where the ai are numbers and x is a variable. If an ≠ 0,
the integer n is called the degree of the polynomial, and an is called the leading coefficient. See Appendix D.
| 1  a1  a1²  ⋯  a1^(n−1) |
| 1  a2  a2²  ⋯  a2^(n−1) |
| 1  a3  a3²  ⋯  a3^(n−1) |
| ⋮   ⋮    ⋮         ⋮    |
| 1  an  an²  ⋯  an^(n−1) |

is called a Vandermonde determinant.7

Theorem 7
Let a1, a2, …, an be numbers where n ≥ 2. Then the corresponding Vandermonde
determinant is given by the product of all factors (ai − aj) where 1 ≤ j < i ≤ n:

| 1  a1  a1²  ⋯  a1^(n−1) |
| 1  a2  a2²  ⋯  a2^(n−1) |
| ⋮   ⋮    ⋮         ⋮    | = ∏ (ai − aj)   (product over all 1 ≤ j < i ≤ n)
| 1  an  an²  ⋯  an^(n−1) |
PROOF
We may assume that the ai are distinct; otherwise both sides are zero. We
proceed by induction on n ≥ 2; we have it for n = 2, 3. So assume it holds for
n - 1. The trick is to replace an by a variable x, and consider the determinant
           | 1  a1    a1²    ⋯  a1^(n−1)   |
           | 1  a2    a2²    ⋯  a2^(n−1)   |
p(x) = det |  ⋮   ⋮     ⋮           ⋮      | .
           | 1  an−1  an−1²  ⋯  an−1^(n−1) |
           | 1  x     x²     ⋯  x^(n−1)    |
Then p(x) is a polynomial of degree at most n - 1 (expand along the last row),
and p(ai) = 0 for i = 1, 2, …, n - 1 because in each case there are two identical
rows in the determinant. In particular, p(a1) = 0, so we have p(x) = (x - a1)p1(x)
by the factor theorem (see Appendix D). Since a2 ≠ a1, we obtain p1(a2) = 0, and
so p1(x) = (x - a2)p2(x). Thus p(x) = (x - a1)(x - a2)p2(x). As the ai are distinct,
this process continues to obtain
p(x) = (x − a1)(x − a2)⋯(x − an−1)d (∗∗)
7 Alexandre Théophile Vandermonde (1735–1796) was a French mathematician who made contributions to the theory of equations.
where d is the coefficient of xn-1 in p(x). By the cofactor expansion of p(x) along
the last row we get
                   | 1  a1    a1²    ⋯  a1^(n−2)   |
d = (−1)^(n+n) det | 1  a2    a2²    ⋯  a2^(n−2)   | .
                   |  ⋮   ⋮     ⋮           ⋮      |
                   | 1  an−1  an−1²  ⋯  an−1^(n−2) |
Because (-1)n+n = 1, the induction hypothesis shows that d is the product of all
factors (ai - aj) where 1 ≤ j < i ≤ n - 1. The result now follows from (∗∗) by
substituting an for x in p(x).
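Theorem 7 can be spot-checked numerically. The sketch below is ours (the sample numbers are arbitrary); it compares a Vandermonde determinant, computed by cofactor expansion, with the product of the factors (ai − aj):

```python
# Numerical spot-check (ours; sample numbers arbitrary) of Theorem 7:
# the Vandermonde determinant equals the product of all (a_i - a_j), i > j.

def minor(A, i, j):
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j))
               for j in range(len(A)))

a = [2, 5, 7, 11]
V = [[ai ** j for j in range(len(a))] for ai in a]  # Vandermonde matrix
lhs = det(V)
rhs = 1
for i in range(len(a)):
    for j in range(i):
        rhs *= a[i] - a[j]
print(lhs, rhs)  # the two values agree
```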
PROOF OF THEOREM 1
If A and B are n × n matrices we must show that
det(AB) = det A det B. (∗)
Recall that if E is an elementary matrix obtained by doing one row operation
to In, then doing that operation to a matrix C (Lemma 1 Section 2.5) results in
EC. By looking at the three types of elementary matrices separately, Theorem 2
Section 3.1 shows that
det(EC) = det E det C for any matrix C. (∗∗)
Thus if E1, E2, …, Ek are all elementary matrices, it follows by induction that
det(Ek⋯E2E1C) = det Ek ⋯ det E2 det E1 det C for any matrix C. (∗∗∗)
Lemma. If A has no inverse, then det A = 0.
Proof. Let A → R where R is reduced row-echelon, say En⋯E2E1A = R. Then
R has a row of zeros by Theorem 5(4) Section 2.4, and hence det R = 0. But
then (∗∗∗) gives det A = 0 because det E ≠ 0 for any elementary matrix E. This
proves the Lemma.
Now we can prove (∗) by considering two cases.
Case 1. A has no inverse. Then AB also has no inverse (otherwise A[B(AB)-1] = I
so A is invertible by the Corollary 2 to Theorem 5 Section 2.4). Hence the above
Lemma (twice) gives
det(AB) = 0 = 0 det B = det A det B.
proving (∗) in this case.
Case 2. A has an inverse. Then A is a product of elementary matrices by Theorem
2 Section 2.5, say A = E1E2…Ek. Then (∗∗∗) with C = I gives
det A = det(E1E2⋯Ek) = det E1 det E2 ⋯ det Ek.
But then (∗∗∗) with C = B gives
det(AB) = det[(E1E2⋯Ek)B] = det E1 det E2 ⋯ det Ek det B = det A det B,
and (∗) holds in this case too.
EXERCISES 3.2
16. Show that no 3 × 3 matrix A exists such that A² + I = 0. Find a 2 × 2 matrix A
with this property.

17. Show that det(A + BT) = det(AT + B) for any n × n matrices A and B.

18. Let A and B be invertible n × n matrices. Show that det A = det B if and only
if A = UB where U is a matrix with det U = 1.

19. For each of the matrices in Exercise 2, find the inverse for those values of c
for which it exists.

20. In each case either prove the statement or give an example showing that it
is false:
(a) If adj A exists, then A is invertible.
(b) If A is invertible and adj A = A⁻¹, then det A = 1.
(c) det(AB) = det(BTA).
(d) If det A ≠ 0 and AB = AC, then B = C.
(e) If AT = −A, then det A = −1.
(f) If adj A = 0, then A = 0.
(g) If A is invertible, then adj A is invertible.
(h) If A has a row of zeros, so also does adj A.

In each case, find the polynomial p(x) of degree 3 satisfying the given conditions:
(a) p(0) = p(1) = 1, p(−1) = 4, p(2) = −5
(b) p(0) = p(1) = 1, p(−1) = 2, p(−2) = −3

24. Given the following data pairs, find the interpolating polynomial of degree 3
and estimate the value of y corresponding to x = 1.5.
(a) (0, 1), (1, 2), (2, 5), (3, 10)
(b) (0, 1), (1, 1.49), (2, −0.42), (3, −11.33)
(c) (0, 2), (1, 2.03), (2, −0.40), (−1, 0.89)

        [ 1  a b ]
25. If A = [ −a 1 c ] show that det A = 1 + a² + b² + c². Hence, find A⁻¹ for any
        [ −b −c 1 ]
a, b, and c.

                       [ a p q ]
26. (a) Show that A = [ 0 b r ] has an inverse if and only if abc ≠ 0, and find
                       [ 0 0 c ]
A⁻¹ in that case.
(b) Show that if an upper triangular matrix is invertible, the inverse is also
upper triangular.

27. Let A be a matrix each of whose entries are integers. Show that each of the
following conditions implies the other:
(1) A is invertible and A⁻¹ has integer entries.
(2) det A = 1 or −1.

           [ 3 0  1 ]
28. If A⁻¹ = [ 0 2  3 ] , find adj A.
           [ 3 1 −1 ]

29. If A is 3 × 3 and det A = 2, find det(A⁻¹ + 4 adj A).

Show that:
(a) adj(adj A) = (det A)^(n−2)A (here n ≥ 2) [Hint: See Example 8.]
(b) adj(A⁻¹) = (adj A)⁻¹
(c) adj(AT) = (adj A)T
(d) adj(AB) = (adj B)(adj A) [Hint: Show that AB adj(AB) = AB adj B adj A.]
EXAMPLE 1
Consider the evolution of the population of a species of birds. Because the
number of males and females are nearly equal, we count only females. We
assume that each female remains a juvenile for one year and then becomes an
adult, and that only adults have offspring. We make three assumptions about
reproduction and survival rates:
1. The number of juvenile females hatched in any year is twice
the number of adult females alive the year before (we say the
reproduction rate is 2).
2. Half of the adult females in any year survive to the next year (the adult
survival rate is _12 ).
3. One quarter of the juvenile females in any year survive into adulthood
(the juvenile survival rate is _14 ).
If there were 100 adult females and 40 juvenile females alive initially, compute
the population of females k years later.
Solution ► Let ak and jk denote, respectively, the number of adult and juvenile
females after k years, so that the total female population is the sum ak + jk.
Assumption 1 shows that jk+1 = 2ak, while assumptions 2 and 3 show that
ak+1 = _12 ak + _14 jk. Hence the numbers ak and jk in successive years are related
by the following equations:
ak+1 = _12 ak + _14 jk
jk+1 = 2ak
                 [ ak ]         [ 1/2  1/4 ]
If we write vk = [ jk ] and A = [  2    0  ] , these equations take the matrix form
vk+1 = Avk, for each k = 0, 1, 2, …
Taking k = 0 gives v1 = Av0, then taking k = 1 gives v2 = Av1 = A2v0, and
taking k = 2 gives v3 = Av2 = A3v0. Continuing in this way, we get
vk = Akv0 for each k = 0, 1, 2, ….
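The recurrence vk+1 = Avk of Example 1 is easy to iterate directly. The sketch below is ours, not from the text; it uses exact fractions and prints the total female population for the first few years:

```python
# Sketch (ours): iterate v_{k+1} = A v_k for the bird population of
# Example 1, with A = [[1/2, 1/4], [2, 0]] and v0 = (100, 40).

from fractions import Fraction

half, quarter = Fraction(1, 2), Fraction(1, 4)
v = [Fraction(100), Fraction(40)]       # (adults a_k, juveniles j_k)
for year in range(1, 6):
    v = [half * v[0] + quarter * v[1],  # a_{k+1} = (1/2)a_k + (1/4)j_k
         2 * v[0]]                      # j_{k+1} = 2 a_k
    print(year, v[0] + v[1])            # total female population
```

Both components of the update use the old value of v because the new list is built before it is assigned.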
Theorem 1
EXAMPLE 2
       [ 3  5 ]         [ 5 ]
If A = [ 1 −1 ] and x = [ 1 ] , then Ax = 4x so λ = 4 is an eigenvalue of A with
corresponding eigenvector x.
8 More precisely, this is a linear discrete dynamical system. Many models regard vt as a continuous function of the time t, and replace
our condition between vk+1 and Avk with a differential relationship viewed as functions of time.
Note that cA(x) is indeed a polynomial in the variable x, and it has degree n when A
is an n × n matrix (this is illustrated in the examples below). The above discussion
shows that a number λ is an eigenvalue of A if and only if cA(λ) = 0, that is if
and only if λ is a root of the characteristic polynomial cA(x). We record these
observations in
Theorem 2
Let A be an n × n matrix.
1. The eigenvalues λ of A are the roots of the characteristic polynomial cA(x) of A.
2. The λ-eigenvectors x are the nonzero solutions to the homogeneous system
(λI - A)x = 0
of linear equations with λI - A as coefficient matrix.
EXAMPLE 3
                                                     [ 3  5 ]
Find the characteristic polynomial of the matrix A = [ 1 −1 ] discussed in
Example 2, and then find all the eigenvalues and their eigenvectors.
Solution ► Since xI − A = [ x − 3   −5   ] , we get
                          [  −1    x + 1 ]

cA(x) = det(xI − A) = (x − 3)(x + 1) − (−5)(−1) = x² − 2x − 8 = (x − 4)(x + 2)

Hence, the roots of cA(x) are λ1 = 4 and λ2 = −2, so these are the eigenvalues
of A. Note that λ1 = 4 was the eigenvalue mentioned in Example 2, but we
have found a new one: λ2 = -2.
To find the eigenvectors corresponding to λ2 = −2, observe that in this case

λ2I − A = [ λ2 − 3    −5   ] = [ −5 −5 ]
          [  −1     λ2 + 1 ]   [ −1 −1 ]

so the solutions to (λ2I − A)x = 0 are the multiples of x2 = [ −1 ] .
                                                             [  1 ]
Note that a square matrix A has many eigenvectors associated with any given
eigenvalue λ. In fact every nonzero solution x of (λI - A)x = 0 is an eigenvector.
Recall that these solutions are all linear combinations of certain basic solutions
determined by the gaussian algorithm (see Theorem 2 Section 1.3). Observe that
any nonzero multiple of an eigenvector is again an eigenvector,9 and such multiples
are often more convenient.10 Any set of nonzero multiples of the basic solutions of
(λI - A)x = 0 will be called a set of basic eigenvectors corresponding to λ.
EXAMPLE 4
Find the characteristic polynomial, eigenvalues, and basic eigenvectors for
2 0 0
A = 1 2 −1 .
1 3 −2
Solution ► Here the characteristic polynomial is given by
x−2 0 0
cA(x) = det −1 x − 2 1 = (x - 2)(x - 1)(x + 1)
−1 −3 x + 2
so the eigenvalues are λ1 = 2, λ2 = 1, and λ3 = -1. To find all eigenvectors for
λ1 = 2, compute
λ1 − 2 0 0 0 0 0
λ1I - A = −1 λ1 − 2 1 = −1 0 1
−1 −3 λ1 + 2 −1 − 3 4
The gaussian algorithm gives the general solution

      [ 1 ]                                            [ 1 ]
x = t [ 1 ] , where t is arbitrary, so we can use x1 = [ 1 ] as the basic
      [ 1 ]                                            [ 1 ]

eigenvector corresponding to λ1 = 2. As the reader can verify, the gaussian

                                        [ 0 ]          [ 0   ]
algorithm gives basic eigenvectors x2 = [ 1 ] and x3 = [ 1/3 ] corresponding to
                                        [ 1 ]          [ 1   ]

λ2 = 1 and λ3 = −1, respectively. Note that to eliminate fractions, we could

                  [ 0 ]
instead use 3x3 = [ 1 ] as the basic λ3-eigenvector.
                  [ 3 ]
EXAMPLE 5
If A is a square matrix, show that A and AT have the same characteristic
polynomial, and hence the same eigenvalues.

Solution ► Since (xI − A)T = xI − AT, Theorem 3 Section 3.2 gives
cAT(x) = det(xI − AT) = det[(xI − A)T] = det(xI − A) = cA(x).
Hence A and AT have the same characteristic polynomial, and so the same eigenvalues.
A-Invariance
If A is a 2 × 2 matrix, we can describe the eigenvectors of A geometrically using the
following concept. A line L through the origin in ℝ² is called A-invariant if Ax is in
L whenever x is in L. If we think of A as a linear transformation ℝ² → ℝ², this asks
that A carries L into itself, that is the image Ax of each vector x in L is again in L.
EXAMPLE 6
The x axis L = { [ x ] : x in ℝ } is A-invariant for any matrix of the form
                 [ 0 ]

A = [ a b ]  because  [ a b ] [ x ] = [ ax ]  is in L for all x = [ x ] in L.
    [ 0 c ]           [ 0 c ] [ 0 ]   [ 0  ]                      [ 0 ]
SECTION 3.3 Diagonalization and Eigenvalues 155
Theorem 3
Let A be a 2 × 2 matrix, let x ≠ 0 be a vector in 2, and let Lx be the line through the
origin in 2 containing x. Then
x is an eigenvector of A if and only if Lx is A-invariant.
EXAMPLE 7
                                              [ cos θ  −sin θ ]
1. If θ is not a multiple of π, show that A = [ sin θ   cos θ ] has no real
eigenvalue.

                                            [ 1 − m²    2m   ]
2. If m is real, show that B = (1/(1 + m²)) [  2m     m² − 1 ] has 1 as an
eigenvalue.
Solution ►
(1) A induces rotation about the origin through the angle θ (Theorem 4
Section 2.6). Since θ is not a multiple of π, this shows that no line
through the origin is A-invariant. Hence A has no eigenvector by
Theorem 3, and so has no eigenvalue.
(2) B induces reflection Qm in the line through the origin with slope m by
Theorem 5 Section 2.6. If x is any nonzero point on this line then it is
clear that Qmx = x, that is Qmx = 1x. Hence 1 is an eigenvalue (with
eigenvector x).
11 This is called the Fundamental Theorem of Algebra and was first proved by Gauss in his doctoral dissertation.
we have done (gaussian elimination, matrix algebra, determinants, etc.) works if all
the scalars are complex.
Diagonalization
An n × n matrix D is called a diagonal matrix if all its entries off the main diagonal
are zero, that is if D has the form
    [ λ1  0  ⋯  0  ]
    [ 0   λ2 ⋯  0  ]
D = [ ⋮   ⋮      ⋮  ] = diag(λ1, λ2, …, λn)
    [ 0   0  ⋯  λn ]
where λ1, λ2, …, λn are numbers. Calculations with diagonal matrices are very easy.
Indeed, if D = diag(λ1, λ2, …, λn) and E = diag(μ1, μ2, …, μn) are two diagonal
matrices, their product DE and sum D + E are again diagonal, and are obtained by
doing the same operations to corresponding diagonal elements:
DE = diag(λ1μ1, λ2μ2, …, λnμn)
D + E = diag(λ1 + μ1, λ2 + μ2, …, λn + μn)
Because of the simplicity of these formulas, and with an eye on Theorem 1 and the
discussion preceding it, we make another definition:

Definition 3.6 An n × n matrix A is called diagonalizable if P⁻¹AP is diagonal for
some invertible n × n matrix P; in this case the invertible matrix P is called a
diagonalizing matrix for A.
To discover when such a matrix P exists, we let x1, x2, …, xn denote the columns
of P and look for ways to determine when such xi exist and how to compute them.
To this end, write P in terms of its columns as follows:
P = [x1, x2, …, xn]
Observe that P-1AP = D for some diagonal matrix D holds if and only if
AP = PD
If we write D = diag(λ1, λ2, …, λn), where the λi are numbers to be determined, the
equation AP = PD becomes
                                   [ λ1  0  ⋯  0  ]
A[x1, x2, …, xn] = [x1, x2, …, xn] [ 0   λ2 ⋯  0  ]
                                   [ ⋮   ⋮      ⋮  ]
                                   [ 0   0  ⋯  λn ]

By block multiplication, this is equivalent to [Ax1, Ax2, …, Axn] = [λ1x1, λ2x2, …, λnxn],
that is, Axi = λixi for each i. So the columns xi of P must be eigenvectors of A, and the
diagonal entries λi the corresponding eigenvalues. This proves:
Theorem 4
Let A be an n × n matrix.
1. A is diagonalizable if and only if it has eigenvectors x1, x2, …, xn such that the
matrix P = [x1 x2 ⋯ xn] is invertible.
2. When this is the case, P-1AP = diag(λ1, λ2, …, λn) where, for each i, λi is the
eigenvalue of A corresponding to xi.
EXAMPLE 8
2 0 0
Diagonalize the matrix A = 1 2 −1 in Example 4.
1 3 −2
Solution ► By Example 4, the eigenvalues of A are λ1 = 2, λ2 = 1, and λ3 = −1,

                                           [ 1 ]       [ 0 ]           [ 0 ]
with corresponding basic eigenvectors x1 = [ 1 ] , x2 = [ 1 ] , and x3 = [ 1 ] ,
                                           [ 1 ]       [ 1 ]           [ 3 ]

                                                [ 1 0 0 ]
respectively. Since the matrix P = [x1 x2 x3] = [ 1 1 1 ] is
                                                [ 1 1 3 ]

                                              [ λ1 0  0  ]   [ 2 0  0 ]
invertible, Theorem 4 guarantees that P⁻¹AP = [ 0  λ2 0  ] = [ 0 1  0 ] = D.
                                              [ 0  0  λ3 ]   [ 0 0 −1 ]

The reader can verify this directly; it is much easier to check that AP = PD.
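Checking AP = PD, as suggested, takes only a few lines. This sketch is ours, not from the text, and `matmul` is our own helper:

```python
# Sketch (ours; `matmul` is our own helper): verifying the
# diagonalization of Example 8 by checking AP = PD.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 0, 0], [1, 2, -1], [1, 3, -2]]
P = [[1, 0, 0], [1, 1, 1], [1, 1, 3]]   # columns x1, x2, x3
D = [[2, 0, 0], [0, 1, 0], [0, 0, -1]]  # diag(2, 1, -1)
print(matmul(A, P) == matmul(P, D))  # True
```

This avoids computing P⁻¹ entirely: AP = PD with P invertible already forces P⁻¹AP = D.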
In Example 8, suppose we let Q = [x2 x1 x3] be the matrix formed from the
eigenvectors x1, x2, and x3 of A, but in a different order than that used to form P.
Then Q-1AQ = diag(λ2, λ1, λ3) is diagonal by Theorem 4, but the eigenvalues
are in the new order. Hence we can choose the diagonalizing matrix P so that the
eigenvalues λi appear in any order we want along the main diagonal of D.
In every example above each eigenvalue has had only one basic eigenvector. Here
is a diagonalizable matrix where this is not the case.
EXAMPLE 9
0 1 1
Diagonalize the matrix A = 1 0 1 .
1 1 0
Solution ► To compute the characteristic polynomial of A first add rows 2 and 3
of xI - A to row 1:
158 Chapter 3 Determinants and Diagonalization
The system of equations (λ1I − A)x = 0 has general solution x = t[1 1 1]T, as the reader can verify, so a basic λ1-eigenvector is x1 = [1 1 1]T.

Turning to the repeated eigenvalue λ2 = −1, we must solve (λ2I − A)x = 0. By gaussian elimination, the general solution is x = s[−1 1 0]T + t[−1 0 1]T, where s and t are arbitrary. Hence the gaussian algorithm produces two basic λ2-eigenvectors x2 = [−1 1 0]T and y2 = [−1 0 1]T. If we take P = [x1 x2 y2] = [1 −1 −1; 1 1 0; 1 0 1], we find that P is invertible. Hence P−1AP = diag(2, −1, −1) by Theorem 4.
Definition 3.7 An eigenvalue λ of a square matrix A is said to have multiplicity m if it occurs m times
as a root of the characteristic polynomial cA(x).
Theorem 5
Theorem 6
The proofs of Theorems 5 and 6 require more advanced techniques and are given in
Chapter 5. The following procedure summarizes the method.
Diagonalization Algorithm
To diagonalize an n × n matrix A:
Step 1. Find the distinct eigenvalues λ of A.
Step 2. Compute the basic eigenvectors corresponding to each of these eigenvalues λ as
basic solutions of the homogeneous system (λI - A)x = 0.
Step 3. The matrix A is diagonalizable if and only if there are n basic eigenvectors
in all.
Step 4. If A is diagonalizable, the n × n matrix P with these basic eigenvectors as its
columns is a diagonalizing matrix for A, that is, P is invertible and P-1AP is
diagonal.
The diagonalization algorithm is valid even if the eigenvalues are nonreal complex
numbers. In this case the eigenvectors will also have complex entries, but we will
not pursue this here.
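For 2 × 2 matrices the algorithm can be carried out in a few lines, because the characteristic polynomial is x² − tr(A)x + det(A). A sketch (our own helper, handling only distinct real eigenvalues; it returns the eigenvalues and a diagonalizing matrix P):

```python
import math

def diagonalize_2x2(A):
    """Steps 1-4 of the diagonalization algorithm for A = [[a, b], [c, d]]
    with distinct real eigenvalues: find the roots of
    c_A(x) = x^2 - tr(A)x + det(A), then one basic eigenvector for each."""
    (a, b), (c, d) = A
    tr, det = a + d, a*d - b*c
    disc = tr*tr - 4*det
    if disc <= 0:
        raise ValueError("repeated or non-real eigenvalues: not handled here")
    lam1 = (tr + math.sqrt(disc)) / 2
    lam2 = (tr - math.sqrt(disc)) / 2

    def eigenvector(lam):
        # One nonzero solution of (lam*I - A)x = 0.
        if b != 0:
            return [b, lam - a]
        if c != 0:
            return [lam - d, c]
        return [1, 0] if lam == a else [0, 1]

    x1, x2 = eigenvector(lam1), eigenvector(lam2)
    P = [[x1[0], x2[0]], [x1[1], x2[1]]]  # basic eigenvectors as columns
    return (lam1, lam2), P

lams, P = diagonalize_2x2([[1, 3], [2, 2]])
```

When b ≠ 0 the vector [b, λ − a] satisfies (λI − A)x = 0 precisely because λ² − tr(A)λ + det(A) = 0, so Step 2 needs no elimination in this small case.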
EXAMPLE 10

Show that A = [1 1; 0 1] is not diagonalizable.
Solution 1 ► The characteristic polynomial is cA(x) = (x − 1)2, so A has only one eigenvalue λ1 = 1 of multiplicity 2. But the system of equations (λ1I − A)x = 0 has general solution t[1 0]T, so there is only one parameter, and so only one basic eigenvector [1 0]T. Hence A is not diagonalizable.
EXAMPLE 11
If λ3 = 5λ for every eigenvalue of the diagonalizable matrix A, show that
A3 = 5A.
If p(x) is any polynomial and p(λ) = 0 for every eigenvalue of the diagonalizable
matrix A, an argument similar to that in Example 11 shows that p(A) = 0.
Thus Example 11 deals with the case p(x) = x3 - 5x. In general, p(A) is
called the evaluation of the polynomial p(x) at the matrix A. For example, if
p(x) = 2x3 - 3x + 5, then p(A) = 2A3 - 3A + 5I—note the use of the identity
matrix.
In particular, if cA(x) denotes the characteristic polynomial of A, we certainly
have cA(λ) = 0 for each eigenvalue λ of A (Theorem 2). Hence cA(A) = 0 for every
diagonalizable matrix A. This is, in fact, true for any square matrix, diagonalizable
or not, and the general result is called the Cayley-Hamilton theorem. It is proved
in Section 8.6 and again in Section 9.4.
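For a 2 × 2 matrix the Cayley–Hamilton theorem says A² − tr(A)A + det(A)I = 0, which is easy to confirm directly. A quick check on a sample matrix (the matrix is our own choice):

```python
def matmul2(X, Y):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 3], [0, 2]]          # sample matrix; c_A(x) = (x - 1)(x - 2)
tr = A[0][0] + A[1][1]
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]

A2 = matmul2(A, A)
I = [[1, 0], [0, 1]]
# c_A(A) = A^2 - tr(A)*A + det(A)*I, computed entry by entry
cA_of_A = [[A2[i][j] - tr*A[i][j] + det*I[i][j] for j in range(2)]
           for i in range(2)]
```

Note the constant term det(A) multiplies the identity matrix, exactly as in the evaluation p(A) described above.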
EXAMPLE 12
Assuming that the initial values were a0 = 100 adult females and j0 = 40
juvenile females, compute ak and jk for k = 1, 2, ….
SECTION 3.3 Diagonalization and Eigenvalues 161
Solution ► The characteristic polynomial of the matrix A = [1/2 1/4; 2 0] is cA(x) = x2 − (1/2)x − 1/2 = (x − 1)(x + 1/2), so the eigenvalues are λ1 = 1 and λ2 = −1/2, and gaussian elimination gives corresponding basic eigenvectors x1 = [1 2]T and x2 = [−1 4]T. Hence P = [x1 x2] = [1 −1; 2 4] is invertible and P−1AP = D, where D = [1 0; 0 −1/2].
Definition 3.8 If A is an n × n matrix, a sequence v0, v1, v2, … of columns in ℝn is called a linear
dynamical system if v0 is specified and v1, v2, … are given by the matrix recurrence
vk+1 = Avk for each k ≥ 0.
As before, we obtain
vk = Akv0 for each k = 1, 2, … (∗)
Hence the columns vk are determined by the powers Ak of the matrix A and, as we
have seen, these powers can be efficiently computed if A is diagonalizable. In fact (∗)
can be used to give a nice “formula” for the columns vk in this case.
Assume that A is diagonalizable with eigenvalues λ1, λ2, …, λn and corresponding
basic eigenvectors x1, x2, …, xn. If P = [x1 x2 ⋯ xn] is a diagonalizing matrix with
the xi as columns, then P is invertible and P−1AP = diag(λ1, λ2, …, λn) = D. Define the column b by
b = P−1v0 = [b1 b2 ⋯ bn]T
Then matrix multiplication gives
vk = Akv0 = (PDkP−1)v0 = PDk(P−1v0)
   = [x1 x2 ⋯ xn] diag(λ1k, λ2k, …, λnk) [b1 b2 ⋯ bn]T
   = [x1 x2 ⋯ xn] [b1λ1k b2λ2k ⋯ bnλnk]T
   = b1λ1kx1 + b2λ2kx2 + ⋯ + bnλnkxn (∗∗)
for each k ≥ 0. This is a useful exact formula for the columns vk. Note that, in particular, v0 = b1x1 + b2x2 + ⋯ + bnxn.
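Formula (∗∗) can be tested against direct iteration. Using the matrix of Example 12 with eigenvalues 1 and −1/2, eigenvectors [1 2]T and [−1 4]T, and coefficients b1 = 220/3, b2 = −80/3 (our reading of P−1v0 in Example 13), a sketch with exact arithmetic:

```python
from fractions import Fraction as F

A = [[F(1, 2), F(1, 4)], [F(2), F(0)]]
v = [F(100), F(40)]                      # v0 = [a0, j0]

lam1, lam2 = F(1), F(-1, 2)
x1, x2 = [F(1), F(2)], [F(-1), F(4)]
b1, b2 = F(220, 3), F(-80, 3)

ok = True
for k in range(12):
    # formula (**): v_k = b1*lam1^k*x1 + b2*lam2^k*x2
    formula = [b1 * lam1**k * x1[i] + b2 * lam2**k * x2[i] for i in range(2)]
    ok = ok and (formula == v)
    # advance the dynamical system: v_{k+1} = A v_k
    v = [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]
```

Exact `Fraction` arithmetic makes the comparison an equality rather than a floating-point approximation.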
However, such an exact formula for vk is often not required in practice; all that is
needed is to estimate vk for large values of k (as was done in Example 12). This can
be easily done if A has a largest eigenvalue. An eigenvalue λ of a matrix A is called a
dominant eigenvalue of A if it has multiplicity 1 and
|λ| > |μ| for all eigenvalues μ ≠ λ
where |λ| denotes the absolute value of the number λ. For example, λ1 = 1 is
dominant in Example 12.
Returning to the above discussion, suppose that A has a dominant eigenvalue. By
choosing the order in which the columns xi are placed in P, we may assume that λ1
is dominant among the eigenvalues λ1, λ2, …, λn of A (see the discussion following
Example 8). Now recall the exact expression for vk in (∗∗) above:
vk = b1λ1kx1 + b2λ2kx2 + ⋯ + bnλnkxn
Take λ1k out as a common factor in this equation to get
vk = λ1k [b1x1 + b2(λ2/λ1)kx2 + ⋯ + bn(λn/λ1)kxn]
Since |λi/λ1| < 1 for each i ≥ 2, the terms after the first approach zero as k increases. This observation is the basis of the approximation in the following theorem.
Theorem 7

Consider the dynamical system v0, v1, v2, … with matrix recurrence
vk+1 = Avk for k ≥ 0
where A and v0 are given. Assume that A is a diagonalizable n × n matrix with
eigenvalues λ1, λ2, …, λn and corresponding basic eigenvectors x1, x2, …, xn, and
let P = [x1 x2 ⋯ xn] be the diagonalizing matrix. Then an exact formula for vk is
vk = b1λ1kx1 + b2λ2kx2 + ⋯ + bnλnkxn for each k ≥ 0
where the coefficients bi come from
b = P−1v0 = [b1 b2 ⋯ bn]T.
Moreover, if A has dominant eigenvalue λ1,12 then vk is approximated by
vk ≈ b1λ1kx1 for sufficiently large k.
EXAMPLE 13

Returning to Example 12, we see that λ1 = 1 is the dominant eigenvalue, with eigenvector x1 = [1 2]T. Here P = [1 −1; 2 4] and v0 = [100 40]T, so P−1v0 = (1/3)[220 −80]T. Hence b1 = 220/3 in the notation of Theorem 7, so
[ak jk]T = vk ≈ b1λ1kx1 = (220/3) · 1k · [1 2]T
where k is large. Hence ak ≈ 220/3 and jk ≈ 440/3, as in Example 12.
This next example uses Theorem 7 to solve a “linear recurrence.” See also
Section 3.4.
EXAMPLE 14
Suppose a sequence x0, x1, x2, … is determined by insisting that
x0 = 1, x1 = -1, and xk+2 = 2xk - xk+1 for every k ≥ 0.
Find a formula for xk in terms of k.
Solution ► Using the linear recurrence xk+2 = 2xk - xk+1 repeatedly gives
x2 = 2x0 - x1 = 3, x3 = 2x1 - x2 = -5, x4 = 11, x5 = -21, …
so the xi are determined, but no pattern is apparent. The idea is to find vk = [xk xk+1]T for each k instead, and then retrieve xk as the top component of vk. The reason this works is that the linear recurrence guarantees that these vk are a dynamical system:
12 Similar results can be found in other situations. If for example, eigenvalues λ1 and λ2 (possibly equal) satisfy |λ1| = |λ2| > |λi| for
all i > 2, then we obtain vk ≈ b1λ k1x1 + b2λ k2x2 for large k.
vk+1 = [xk+1 xk+2]T = [xk+1, 2xk − xk+1]T = Avk, where A = [0 1; 2 −1].
The eigenvalues of A are λ1 = −2 and λ2 = 1, with eigenvectors x1 = [1 −2]T and x2 = [1 1]T, so the diagonalizing matrix is P = [x1 x2] = [1 1; −2 1].
Moreover, b = P−1v0 = (1/3)[2 1]T, so the exact formula for vk is
[xk xk+1]T = vk = b1λ1kx1 + b2λ2kx2 = (2/3)(−2)k[1 −2]T + (1/3)(1)k[1 1]T.
Equating top entries gives the desired formula for xk:
xk = (1/3)[2(−2)k + 1] for all k = 0, 1, 2, ….
The reader should check this for the first few values of k.
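That check is quickly done by machine; a sketch comparing the closed form of Example 14 with the recurrence itself:

```python
def x_recurrence(n):
    """x0 = 1, x1 = -1, x_{k+2} = 2x_k - x_{k+1}."""
    xs = [1, -1]
    while len(xs) <= n:
        xs.append(2*xs[-2] - xs[-1])
    return xs[n]

def x_formula(k):
    """The closed form from Example 14: x_k = (1/3)[2(-2)^k + 1]."""
    return (2 * (-2)**k + 1) // 3   # the numerator is always divisible by 3

agree = all(x_recurrence(k) == x_formula(k) for k in range(20))
```

Integer division is exact here because 2(−2)k + 1 ≡ 0 (mod 3) for every k.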
EXAMPLE 15

EXAMPLE 16

vk = b1(3/2)k[1 0]T + b2(4/3)k[0 1]T
for k = 0, 1, 2, …. Since both eigenvalues are greater than 1 in absolute value, the trajectories diverge away from the origin for every choice of initial point v0. For this reason, the origin is called a repellor for the system.13
EXAMPLE 17

vk = b1(3/2)k[−1 1]T + b2(1/2)k[1 1]T
for k = 0, 1, 2, …. In this case 3/2 is the dominant eigenvalue so, if b1 ≠ 0, we have vk ≈ b1(3/2)k[−1 1]T for large k, and vk approaches the line y = −x. However, if b1 = 0, then vk = b2(1/2)k[1 1]T and so approaches the origin along the line y = x. In general the trajectories appear as in the diagram, and the origin is called a saddle point for the dynamical system in this case.
EXAMPLE 18

The eigenvalues are the complex numbers (1/2)i and −(1/2)i, where i2 = −1. Hence A is not diagonalizable as a real matrix. However, the trajectories are not difficult to describe. If we start with v0 = [1 1]T, then the trajectory begins as
v1 = [1/2 −1/2]T, v2 = [−1/4 −1/4]T, v3 = [−1/8 1/8]T, v4 = [1/16 1/16]T, v5 = [1/32 −1/32]T, v6 = [−1/64 −1/64]T, …
Five of these points are plotted in the diagram. Here each trajectory spirals in toward the origin, so the origin is an attractor. Note that the two (complex) eigenvalues have absolute value less than 1 here. If they had absolute value greater than 1, the trajectories would spiral out from the origin.
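The matrix A itself is not shown in this excerpt, but the listed trajectory is reproduced exactly by A = [0 1/2; −1/2 0], an assumption on our part that is consistent with the eigenvalues ±(1/2)i. Iterating it recovers the points above:

```python
from fractions import Fraction as F

A = [[F(0), F(1, 2)], [F(-1, 2), F(0)]]   # assumed matrix, eigenvalues +-i/2
v = [F(1), F(1)]                          # v0
traj = [tuple(v)]
for _ in range(6):
    # v_{k+1} = A v_k
    v = [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]
    traj.append(tuple(v))
```

Each step rotates the point by a quarter turn and halves its length, which is exactly the spiral toward the origin described in the example.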
Google PageRank
Dominant eigenvalues are useful to the Google search engine for finding
information on the Web. If an information query comes in from a client, Google
has a sophisticated method of establishing the “relevance” of each site to that
query. When the relevant sites have been determined, they are placed in order of
importance using a ranking of all sites called the PageRank. The relevant sites with
the highest PageRank are the ones presented to the client. It is the construction of
the PageRank that is our interest here.
13 In fact, P = I here, so v0 = [b1 b2]T.
The Web contains many links from one site to another. Google interprets a link
from site j to site i as a “vote” for the importance of site i. Hence if site i has more
links to it than does site j, then i is regarded as more “important” and assigned a
higher PageRank. One way to look at this is to view the sites as vertices in a huge
directed graph (see Section 2.2). Then if site j links to site i there is an edge from j
to i, and hence the (i, j)-entry is a 1 in the associated adjacency matrix (called the
connectivity matrix in this context). Thus a large number of 1s in row i of this matrix
is a measure of the PageRank of site i.14
However this does not take into account the PageRank of the sites that link to
i. Intuitively, the higher the rank of these sites, the higher the rank of site i. One
approach is to compute a dominant eigenvector x for the connectivity matrix.
In most cases the entries of x can be chosen to be positive with sum 1. Each site
corresponds to an entry of x, so the sum of the entries of sites linking to a given site
i is a measure of the rank of site i. In fact, Google chooses the PageRank of a site so
that it is proportional to this sum.15
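A dominant eigenvector of a connectivity matrix can be approximated by power iteration: repeatedly apply the matrix to a positive starting vector and rescale so the entries sum to 1. A toy sketch for a hypothetical three-site web (the graph is our own invention, and this ignores refinements such as the damping factor used in practice):

```python
def power_iteration(M, steps=100):
    """Approximate a dominant eigenvector of M, normalized to sum 1."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(x)
        x = [xi / s for xi in x]
    return x

# Hypothetical connectivity matrix: entry (i, j) = 1 if site j links to site i.
# Here site 0 is linked to by sites 1 and 2, site 1 by sites 0 and 2,
# and site 2 by site 1 only.
M = [[0, 1, 1],
     [1, 0, 1],
     [0, 1, 0]]

rank = power_iteration(M)
```

Sites 0 and 1, each with two in-links from well-ranked sites, end up tied at the top of the ranking, while site 2 ranks lower; the entries are positive and sum to 1, as described above.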
EXERCISES 3.3
(a) A = [6 −5; 2 −1], P = [1 5; 1 2]

(b) A = [−7 −12; 6 10], P = [−3 4; 2 −3]

9. (a) If A = [1 3; 0 2] and B = [2 0; 0 1], verify that A and B are diagonalizable, but AB is not.
   (b) If D = [1 0; 0 −1], find a diagonalizable matrix A such that D + A is not diagonalizable.

10. If A is an n × n matrix, show that A is diagonalizable if and only if AT is diagonalizable.

11. If A is diagonalizable, show that each of the following is also diagonalizable.
    (a) An, n ≥ 1 [Hint: (PDP−1)n = PDnP−1 for each n = 1, 2, ….]
    (b) kA, k any scalar.
    (c) p(A), p(x) any polynomial (Theorem 1)
    (d) U−1AU for any invertible matrix U.
    (e) kI + A for any scalar k.

12. Give an example of two diagonalizable matrices A and B whose sum A + B is not diagonalizable.

13. If A is diagonalizable and 1 and −1 are the only eigenvalues, show that A−1 = A.

14. If A is diagonalizable and 0 and 1 are the only eigenvalues, show that A2 = A.

15. If A is diagonalizable and λ ≥ 0 for each eigenvalue of A, show that A = B2 for some matrix B.

16. If P−1AP and P−1BP are both diagonal, show that AB = BA. [Hint: Diagonal matrices commute.]

17. A square matrix A is called nilpotent if An = 0 for some n ≥ 1. Find all nilpotent diagonalizable matrices. [Hint: Theorem 1.]

18. Let A be any n × n matrix and r ≠ 0 a real number.
    (a) Show that the eigenvalues of rA are precisely the numbers rλ, where λ is an eigenvalue of A.

(b) If all columns of A have the same sum s, show that s is an eigenvalue.

20. Let A be an invertible n × n matrix.
    (a) Show that the eigenvalues of A are nonzero.
    (b) Show that the eigenvalues of A−1 are precisely the numbers 1/λ, where λ is an eigenvalue of A.
    (c) Show that cA−1(x) = ((−x)n/det A) cA(1/x).

21. Suppose λ is an eigenvalue of a square matrix A with eigenvector x ≠ 0.
    (a) Show that λ2 is an eigenvalue of A2 (with the same x).
    (b) Show that λ3 − 2λ + 3 is an eigenvalue of A3 − 2A + 3I.
    (c) Show that p(λ) is an eigenvalue of p(A) for any nonzero polynomial p(x).

22. If A is an n × n matrix, show that cA2(x2) = (−1)ncA(x)cA(−x).

23. An n × n matrix A is called nilpotent if Am = 0 for some m ≥ 1.
    (a) Show that every triangular matrix with zeros on the main diagonal is nilpotent.
    (b) If A is nilpotent, show that λ = 0 is the only eigenvalue (even complex) of A.
    (c) Deduce that cA(x) = xn, if A is n × n and nilpotent.

24. Let A be diagonalizable with real eigenvalues and assume that Am = I for some m ≥ 1.
    (a) Show that A2 = I.
    (b) If m is odd, show that A = I. [Hint: Theorem 3 Appendix A.]

25. Let A2 = I, and assume that A ≠ I and A ≠ −I.
    (a) Show that the only eigenvalues of A are λ = 1 and λ = −1.
    (b) Show that A is diagonalizable. [Hint: Verify that A(A + I) = A + I and …]
C = [7 −1; −1 7].

34. In Example 1, let the juvenile survival rate be 2/5, and let the reproduction rate be 2. What values of the adult survival rate α will ensure that the population stabilizes?
EXAMPLE 1
An urban planner wants to determine the number xk of ways that a row of k
parking spaces can be filled with cars and trucks if trucks take up two spaces
each. Find the first few values of xk.
SECTION 3.4 An Application to Linear Recurrences 169
Indeed, every way to fill k + 2 spaces falls into one of two categories: Either a
car is parked in the first space (and the remaining k + 1 spaces are filled in xk+1
ways), or a truck is parked in the first two spaces (with the other k spaces filled
in xk ways). Hence, there are xk+1 + xk ways to fill the k + 2 spaces. This is (∗).
The recurrence (∗) determines xk for every k ≥ 2 since x0 and x1 are given.
In fact, the first few values are
x0 = 1
x1 = 1
x2 = x0 + x1 = 2
x3 = x1 + x2 = 3
x4 = x2 + x3 = 5
x5 = x3 + x4 = 8
Clearly, we can find xk for any value of k, but one wishes for a “formula”
for xk as a function of k. It turns out that such a formula can be found using
diagonalization. We will return to this example later.
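The values above follow from the recurrence xk+2 = xk+1 + xk with x0 = x1 = 1, and the computation is a one-liner to automate; a sketch:

```python
def parking_ways(k):
    """Number of ways to fill k spaces with cars (1 space each) and
    trucks (2 spaces each): x0 = 1, x1 = 1, x_{k+2} = x_{k+1} + x_k."""
    ways = [1, 1]
    while len(ways) <= k:
        ways.append(ways[-1] + ways[-2])
    return ways[k]
```

This reproduces x0, …, x5 = 1, 1, 2, 3, 5, 8 as computed above, and extends the table as far as desired.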
EXAMPLE 2
Suppose the numbers x0, x1, x2, … are given by the linear recurrence relation
xk+2 = xk+1 + 6xk for k ≥ 0
If x0 = 1 and x1 = 3, computing successive terms gives xk = 3k for k = 0, 1, 2, 3, and 4. This formula holds for all k because it is true for k = 0 and k = 1, and it satisfies the recurrence xk+2 = xk+1 + 6xk for each k, as is readily checked.
However, if we begin instead with x0 = 1 and x1 = 1, the sequence
continues x2 = 7, x3 = 13, x4 = 55, x5 = 133, … . In this case, the sequence is
uniquely determined but no formula is apparent. Nonetheless, a simple device
transforms the recurrence into a matrix recurrence to which our diagonalization
techniques apply.
The idea is to compute the sequence v0, v1, v2, … of columns instead of the
numbers x0, x1, x2, …, where
vk = [xk xk+1]T for each k ≥ 0
EXAMPLE 3
In Example 1, an urban planner wants to determine xk, the number of ways that
a row of k parking spaces can be filled with cars and trucks if trucks take up two
spaces each. Find a formula for xk and estimate it for large k.
If we write vk = [xk xk+1]T as before, this recurrence becomes a matrix recurrence for the vk:
vk+1 = [xk+1 xk+2]T = [xk+1, xk + xk+1]T = [0 1; 1 1][xk xk+1]T = Avk
The characteristic polynomial of A is cA(x) = x2 − x − 1, so the eigenvalues are λ1 = (1 + √5)/2 and λ2 = (1 − √5)/2, with corresponding eigenvectors x1 = [1 λ1]T and x2 = [1 λ2]T; thus P = [x1 x2] = [1 1; λ1 λ2]. Hence
[b1 b2]T = P−1v0 = (1/√5)[λ1 −λ2]T
where we used the fact that λ1 + λ2 = 1. Thus Theorem 7 Section 3.3 gives
[xk xk+1]T = vk = b1λ1kx1 + b2λ2kx2 = (λ1/√5)λ1k[1 λ1]T − (λ2/√5)λ2k[1 λ2]T
Comparing top entries gives an exact formula for the numbers xk:
xk = (1/√5)(λ1k+1 − λ2k+1) for k ≥ 0
Finally, observe that λ1 is dominant here (in fact, λ1 = 1.618 and λ2 = −0.618 to three decimal places), so λ2k+1 is negligible compared with λ1k+1 if k is large. Thus,
xk ≈ (1/√5)λ1k+1 for each k ≥ 0
This is a good approximation, even for as small a value as k = 12. Indeed, repeated use of the recurrence xk+2 = xk + xk+1 gives the exact value x12 = 233, while the approximation is x12 ≈ (1.618)13/√5 = 232.94.
The sequence x0, x1, x2, … in Example 3 was first discussed in 1202 by Leonardo
Pisano of Pisa, also known as Fibonacci,16 and is now called the Fibonacci
sequence. It is completely determined by the conditions x0 = 1, x1 = 1 and the
recurrence xk+2 = xk + xk+1 for each k ≥ 0. These numbers have been studied for
centuries and have many interesting properties (there is even a journal, the Fibonacci
Quarterly, devoted exclusively to them). For example, biologists have discovered
that the arrangement of leaves around the stems of some plants follows a Fibonacci
pattern. The formula xk = (1/√5)(λ1k+1 − λ2k+1) in Example 3 is called the Binet
16 The problem Fibonacci discussed was: “How many pairs of rabbits will be produced in a year, beginning with a single pair, if in every
month each pair brings forth a new pair that becomes productive from the second month on? Assume no pairs die.” The number of
pairs satisfies the Fibonacci recurrence.
formula. It is remarkable in that the xk are integers but λ1 and λ2 are not. This
phenomenon can occur even if the eigenvalues λi are nonreal complex numbers.
We conclude with an example showing that nonlinear recurrences can be very
complicated.
EXAMPLE 4
Suppose a sequence x0, x1, x2, … satisfies the following recurrence:
xk+1 = (1/2)xk if xk is even, and xk+1 = 3xk + 1 if xk is odd.
If x0 = 1, the sequence is 1, 4, 2, 1, 4, 2, 1, … and so continues to cycle
indefinitely. The same thing happens if x0 = 7. Then the sequence is
7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, …
and it again cycles. However, it is not known whether every choice of x0 will
lead eventually to 1. It is quite possible that, for some x0, the sequence will
continue to produce different values indefinitely, or will repeat a value and
cycle without reaching 1. No one knows for sure.
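This iteration (the Collatz, or "3x + 1", problem) is simple to program; a sketch that reproduces the sequence starting at x0 = 7:

```python
def collatz_sequence(x0, limit=1000):
    """Iterate x -> x/2 (x even) or 3x + 1 (x odd) until 1 is reached,
    capping the loop at `limit` steps since the conjecture is open."""
    seq = [x0]
    while seq[-1] != 1 and len(seq) < limit:
        x = seq[-1]
        seq.append(x // 2 if x % 2 == 0 else 3*x + 1)
    return seq

seq7 = collatz_sequence(7)
```

The cap on the number of steps is a practical necessity: no proof guarantees the loop terminates for every x0.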
EXERCISES 3.4
1. Solve the following linear recurrences.
   (a) xk+2 = 3xk + 2xk+1, where x0 = 1 and x1 = 1.
   (b) xk+2 = 2xk − xk+1, where x0 = 1 and x1 = 2.
   (c) xk+2 = 2xk + xk+1, where x0 = 0 and x1 = 1.
   (d) xk+2 = 6xk − xk+1, where x0 = 1 and x1 = 1.

2. Solve the following linear recurrences.
   (a) xk+3 = 6xk+2 − 11xk+1 + 6xk, where x0 = 1, x1 = 0, and x2 = 1.
   (b) xk+3 = −2xk+2 + xk+1 + 2xk, where x0 = 1, x1 = 0, and x2 = 1.
   [Hint: Use vk = [xk xk+1 xk+2]T.]

3. In Example 1 suppose busses are also allowed to park, and let xk denote the number of ways a row of k parking spaces can be filled with cars, trucks, and busses.
   (a) If trucks and busses take up 2 and 3 spaces respectively, show that xk+3 = xk + xk+1 + xk+2 for each k, and use this recurrence to compute x10. [Hint: The eigenvalues are of little use.]
   (b) If busses take up 4 spaces, find a recurrence for the xk and compute x10.

4. A man must climb a flight of k steps. He always takes one or two steps at a time. Thus he can climb 3 steps in the following ways: 1, 1, 1; 1, 2; or 2, 1. Find sk, the number of ways he can climb the flight of k steps. [Hint: Fibonacci.]

5. How many "words" of k letters can be made from the letters {a, b} if there are no adjacent a's?

6. How many sequences of k flips of a coin are there with no HH?

7. Find xk, the number of ways to make a stack of k poker chips if only red, blue, and gold chips are used and no two gold chips are adjacent. [Hint: Show that xk+2 = 2xk+1 + 2xk by considering how many stacks have a red, blue, or gold chip on top.]

8. A nuclear reactor contains α- and β-particles. In every second each α-particle splits into three β-particles, and each β-particle splits into an α-particle and two β-particles. If there is a single α-particle in the reactor at time t = 0, how many α-particles are there at t = 20 seconds? [Hint: Let xk and yk denote the number of α- and β-particles at time t = k seconds. Find xk+1 and yk+1 in terms of xk and yk.]
SECTION 3.5 An Application to Systems of Differential Equations 173
9. The annual yield of wheat in a certain country has been found to equal the average of the yield in the previous two years. If the yields in 1990 and 1991 were 10 and 12 million tons respectively, find a formula for the yield k years after 1990. What is the long-term average yield?

10. Find the general solution to the recurrence xk+1 = rxk + c where r and c are constants. [Hint: Consider the cases r = 1 and r ≠ 1 separately. If r ≠ 1, you will need the identity 1 + r + r2 + ⋯ + rn−1 = (1 − rn)/(1 − r) for n ≥ 1.]

(a) If vk = [xk xk+1 xk+2]T and A = [0 1 0; 0 0 1; a b c], show that vk+1 = Avk.
(b) If λ is any eigenvalue of A, show that x = [1 λ λ2]T is a λ-eigenvector. [Hint: Show directly that Ax = λx.]
(c) Generalize (a) and (b) to a recurrence xk+4 = axk + bxk+1 + cxk+2 + dxk+3 of length 4.

12. Consider the recurrence xk+2 = axk+1 + bxk + c where c may not be zero.
    (a) If a + b ≠ 1, show that p can be found such that, if we set yk = xk + p, then yk+2 = ayk+1 + byk. [Hence, the sequence xk can be found provided yk can be found by the methods of this section (or otherwise).]
    (b) Use (a) to solve the recurrence xk+2 = xk+1 + 6xk + 5, where x0 = 1 and x1 = 1.

Consider the recurrence
xk+2 = axk+1 + bxk + c(k) (∗)
where c(k) is a function of k, and consider the related recurrence
xk+2 = axk+1 + bxk (∗∗)
Suppose that xk = pk is a particular solution of (∗).
(a) If qk is any solution of (∗∗), show that qk + pk is a solution of (∗).
(b) Show that every solution of (∗) arises as in (a) as the sum of a solution of (∗∗) plus the particular solution pk of (∗).
f′(x) = af(x) for all x. Consider the new function g given by g(x) = f(x)e−ax. Then the product rule of differentiation gives
g′(x) = f(x)[−ae−ax] + f′(x)e−ax
     = −af(x)e−ax + [af(x)]e−ax
     = 0
for all x. Hence the function g(x) has zero derivative and so must be a constant, say g(x) = c. Thus c = g(x) = f(x)e−ax, that is,
f(x) = ceax.
In other words, every solution f(x) of (∗) is just a scalar multiple of eax. Since every such scalar multiple is easily seen to be a solution of (∗), we have proved
Theorem 1
EXAMPLE 1
Assume that the number n(t) of bacteria in a culture at time t has the property
that the rate of change of n is proportional to n itself. If there are n0 bacteria
present when t = 0, find the number at time t.
n(t) = n0ekt
gives the number at time t. Of course the constant k depends on the strain
of bacteria.
where the aij are constants. This is called a linear system of differential equations
or simply a differential system. The first step is to put it in matrix form. Write
f = [f1 f2 ⋯ fn]T   f′ = [f1′ f2′ ⋯ fn′]T   A = [a11 a12 ⋯ a1n; a21 a22 ⋯ a2n; …; an1 an2 ⋯ ann]
Then the system can be written compactly using matrix multiplication:
f′ = Af
Hence, given the matrix A, the problem is to find a column f of differentiable
functions that satisfies this condition. This can be done if A is diagonalizable.
Here is an example.
EXAMPLE 2

Find a solution to the system
f1′ = f1 + 3f2
f2′ = 2f1 + 2f2
that satisfies f1(0) = 0, f2(0) = 5.
Solution ► This is f′ = Af, where f = [f1 f2]T and A = [1 3; 2 2]. The reader can verify that cA(x) = (x − 4)(x + 1), and that x1 = [1 1]T and x2 = [3 −2]T are eigenvectors corresponding to the eigenvalues 4 and −1, respectively. Hence the diagonalization algorithm gives P−1AP = [4 0; 0 −1], where P = [x1 x2] = [1 3; 1 −2]. Now consider new functions g1 and g2 given by f = Pg (equivalently, g = P−1f), where g = [g1 g2]T. Then
[f1 f2]T = [1 3; 1 −2][g1 g2]T, that is, f1 = g1 + 3g2 and f2 = g1 − 2g2.
Hence f1′ = g1′ + 3g2′ and f2′ = g1′ − 2g2′, so that
[f1′ f2′]T = [1 3; 1 −2][g1′ g2′]T, that is, f′ = Pg′.
Substituting f = Pg and f′ = Pg′ into f′ = Af gives Pg′ = APg, so g′ = P−1APg; in matrix form,
[g1′ g2′]T = [4 0; 0 −1][g1 g2]T, so g1′ = 4g1 and g2′ = −g2.
Hence Theorem 1 gives g1(x) = ce4x and g2(x) = de−x, where c and d are constants. Finally, then,
[f1(x) f2(x)]T = P[g1(x) g2(x)]T = [1 3; 1 −2][ce4x de−x]T = [ce4x + 3de−x, ce4x − 2de−x]T
so the general solution is
f1(x) = ce4x + 3de−x
f2(x) = ce4x − 2de−x, c and d constants.
It is worth observing that this can be written in matrix form as
[f1(x) f2(x)]T = c[1 1]Te4x + d[3 −2]Te−x
That is,
f(x) = cx1e4x + dx2e−x
This form of the solution works more generally, as will be shown.
Finally, the requirement that f1(0) = 0 and f2(0) = 5 in this example
determines the constants c and d:
0 = f1(0) = ce0 + 3de0 = c + 3d
5 = f2(0) = ce0 - 2de0 = c - 2d
These equations give c = 3 and d = -1, so
f1(x) = 3e4x - 3e-x
f2(x) = 3e4x + 2e-x
satisfy all the requirements.
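The specific solution of Example 2 can be verified numerically: its derivatives, differentiated by hand from f1(x) = 3e4x − 3e−x and f2(x) = 3e4x + 2e−x, must equal f1 + 3f2 and 2f1 + 2f2 at every x. A sketch checking a few sample points:

```python
import math

def f1(x): return 3*math.exp(4*x) - 3*math.exp(-x)
def f2(x): return 3*math.exp(4*x) + 2*math.exp(-x)

# derivatives, differentiated by hand from the formulas above
def df1(x): return 12*math.exp(4*x) + 3*math.exp(-x)
def df2(x): return 12*math.exp(4*x) - 2*math.exp(-x)

checks = []
for x in [0.0, 0.3, 1.0]:
    # f1' = f1 + 3 f2 and f2' = 2 f1 + 2 f2, up to floating-point roundoff
    checks.append(abs(df1(x) - (f1(x) + 3*f2(x))) < 1e-9 * (1 + abs(df1(x))))
    checks.append(abs(df2(x) - (2*f1(x) + 2*f2(x))) < 1e-9 * (1 + abs(df2(x))))

initial_ok = abs(f1(0)) < 1e-12 and abs(f2(0) - 5) < 1e-12
```

Agreement at several points is of course only a spot check, but the identities hold exactly when expanded symbolically.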
Theorem 2
PROOF
By Theorem 4 Section 3.3, the matrix P = [x1, x2, …, xn] is invertible and
P−1AP = diag(λ1, λ2, …, λn).
As in Example 2, write f = [f1 f2 ⋯ fn]T and define g = [g1 g2 ⋯ gn]T by g = P−1f; equivalently, f = Pg. If P = [pij], this gives
fi = pi1g1 + pi2g2 + ⋯ + pingn.
Since the pij are constants, differentiation preserves this relationship:
fi′ = pi1g1′ + pi2g2′ + ⋯ + pingn′
so f′ = Pg′. Substituting this into f′ = Af gives Pg′ = APg. But then multiplication by P−1 gives g′ = P−1APg, so the original system of equations f′ = Af for f becomes much simpler in terms of g:
[g1′ g2′ ⋯ gn′]T = diag(λ1, λ2, …, λn)[g1 g2 ⋯ gn]T
Hence gi′ = λigi holds for each i, and Theorem 1 implies that the only solutions are gi(x) = cieλix, where the ci are constants. Then f = Pg gives
f(x) = [x1, x2, …, xn][c1eλ1x c2eλ2x ⋯ cneλnx]T = c1x1eλ1x + c2x2eλ2x + ⋯ + cnxneλnx
This is what we wanted.
EXAMPLE 3

Find the general solution to the system
f1′ = 5f1 + 8f2 + 16f3
f2′ = 4f1 + f2 + 8f3
f3′ = −4f1 − 4f2 − 11f3
Then find a solution satisfying the boundary conditions f1(0) = f2(0) = f3(0) = 1.
Solution ► The system has the form f′ = Af, where A = [5 8 16; 4 1 8; −4 −4 −11]. Then cA(x) = (x + 3)2(x − 1), and eigenvectors corresponding to the eigenvalues −3, −3, and 1 are, respectively,
x1 = [−1 1 0]T   x2 = [−2 0 1]T   x3 = [2 1 −1]T
Hence, by Theorem 2, the general solution is
f(x) = c1[−1 1 0]Te−3x + c2[−2 0 1]Te−3x + c3[2 1 −1]Tex, ci constants.
The boundary conditions f1(0) = f2(0) = f3(0) = 1 determine the constants ci.
[1 1 1]T = f(0) = c1x1 + c2x2 + c3x3 = [−1 −2 2; 1 0 1; 0 1 −1][c1 c2 c3]T
The solution is c1 = −3, c2 = 5, c3 = 4, so the required specific solution is f(x) = −3x1e−3x + 5x2e−3x + 4x3ex.
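Expanding −3x1e−3x + 5x2e−3x + 4x3ex componentwise gives f1(x) = −7e−3x + 8ex, f2(x) = −3e−3x + 4ex, and f3(x) = 5e−3x − 4ex (our own expansion of the specific solution). A numerical check against the system and the boundary conditions:

```python
import math

def f(x):
    e1, e2 = math.exp(-3*x), math.exp(x)
    return [-7*e1 + 8*e2, -3*e1 + 4*e2, 5*e1 - 4*e2]

def df(x):
    # derivatives of the components above, differentiated by hand
    e1, e2 = math.exp(-3*x), math.exp(x)
    return [21*e1 + 8*e2, 9*e1 + 4*e2, -15*e1 - 4*e2]

A = [[5, 8, 16], [4, 1, 8], [-4, -4, -11]]

def residual(x):
    """Largest component of f'(x) - A f(x); should be roundoff-sized."""
    fx, dfx = f(x), df(x)
    Afx = [sum(A[i][j]*fx[j] for j in range(3)) for i in range(3)]
    return max(abs(dfx[i] - Afx[i]) for i in range(3))

boundary_ok = all(abs(v - 1) < 1e-12 for v in f(0))
max_res = max(residual(x) for x in [0.0, 0.25, 0.5])
```

Symbolically the residual is identically zero; the float comparison simply confirms the hand expansion.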
EXERCISES 3.5
4. The population N(t) of a region at time t increases at a rate proportional to the population. If the population doubles every 5 years and is 3 million initially, find N(t).

5. Let A be an invertible diagonalizable n × n matrix and let b be an n-column of constant functions. We can solve the system f′ = Af + b as follows:
   (a) If g satisfies g′ = Ag (using Theorem 2), show that f = g − A−1b is a solution to f′ = Af + b.
   (b) Show that every solution to f′ = Af + b arises as in (a) for some solution g to g′ = Ag.

6. Denote the second derivative of f by f″ = (f′)′. Consider the second order differential equation
   f″ − a1f′ − a2f = 0, a1 and a2 real numbers. (∗)
   (a) If f is a solution to (∗), let f1 = f and f2 = f′ − a1f. Show that
   f1′ = a1f1 + f2 and f2′ = a2f1, that is, [f1′ f2′]T = [a1 1; a2 0][f1 f2]T.
   (b) Conversely, if [f1 f2]T is a solution to the system in (a), show that f1 is a solution to (∗).

7. Writing f‴ = (f″)′, consider the third order differential equation
   f‴ − a1f″ − a2f′ − a3f = 0 (∗)
   where a1, a2, and a3 are real numbers. Let f1 = f, f2 = f′ − a1f, and f3 = f″ − a1f′ − a2f.
   (a) Show that [f1 f2 f3]T is a solution to the system
   f1′ = a1f1 + f2, f2′ = a2f1 + f3, f3′ = a3f1; that is, [f1′ f2′ f3′]T = [a1 1 0; a2 0 1; a3 0 0][f1 f2 f3]T.
   (b) Show further that if [f1 f2 f3]T is any solution to this system, then f = f1 is a solution to (∗).
   Remark. A similar construction casts every linear differential equation of order n (with constant coefficients) as an n × n linear system of first order equations. However, the matrix need not be diagonalizable, so other methods have been developed.
17 Summation notation is a convenient shorthand way to write sums of similar expressions. For example, a1 + a2 + a3 + a4 = ∑_{i=1}^{4} ai, a5b5 + a6b6 + a7b7 + a8b8 = ∑_{k=5}^{8} akbk, and 12 + 22 + 32 + 42 + 52 = ∑_{j=1}^{5} j2.
det [a11 a12; a21 a22] = a11 det[a22] − a21 det[a12] = a11a22 − a21a12
Lemma 1
Let A, B, and C be n × n matrices that are identical except that the pth row of A is the
sum of the pth rows of B and C. Then
det A = det B + det C
PROOF
We proceed by induction on n, the cases n = 1 and n = 2 being easily checked.
Consider ai1 and Ai1:
Case 1: If i ≠ p,
ai1 = bi1 = ci1 and det Ai1 = det Bi1 = det Ci1
by induction because Ai1, Bi1, Ci1 are identical except that one row of Ai1 is the
sum of the corresponding rows of Bi1 and Ci1.
Case 2: If i = p,
ap1 = bp1 + cp1 and Ap1 = Bp1 = Cp1
Now write out the defining sum for det A, splitting off the pth term for special attention:
det A = ∑_{i≠p} ai1(−1)^{i+1} det Ai1 + ap1(−1)^{p+1} det Ap1
where det Ai1 = det Bi1 + det Ci1 by induction. But the terms here involving Bi1
and bp1 add up to det B because ai1 = bi1 if i ≠ p and Ap1 = Bp1. Similarly, the
terms involving Ci1 and cp1 add up to det C. Hence det A = det B + det C, as
required.
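Lemma 1 is easy to test on random integer matrices, using the column-1 cofactor expansion as the definition of det (helper names are ours):

```python
import random

def det(M):
    """Determinant via cofactor expansion along column 1, the definition
    used in this section."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for j, row in enumerate(M) if j != i]
        total += M[i][0] * (-1)**i * det(minor)
    return total

random.seed(1)
ok = True
for _ in range(20):
    n, p = 3, 1   # A, B, C agree except in row p
    B = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    C = [row[:] for row in B]
    C[p] = [random.randint(-5, 5) for _ in range(n)]
    A = [row[:] for row in B]
    A[p] = [B[p][j] + C[p][j] for j in range(n)]
    ok = ok and (det(A) == det(B) + det(C))
```

Random trials are no substitute for the induction above, but they catch sign errors in an implementation of the expansion immediately.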
18 Note that we used the expansion along row 1 at the beginning of Section 3.1. The column 1 expansion definition is more
convenient here.
SECTION 3.6 Proof of the Cofactor Expansion Theorem 181
Lemma 2
PROOF
For later reference the defining sums for det A and det B are as follows:
det A = ∑_{i=1}^{n} ai1(−1)^{i+1} det Ai1 (∗)
det B = ∑_{i=1}^{n} bi1(−1)^{i+1} det Bi1 (∗∗)
Property 1. The proof is by induction on n, the cases n = 1 and n = 2 being easily
verified. Consider the ith term in the sum (∗∗) for det B where B is the result of
multiplying row p of A by u.
(a) If i ≠ p, then bi1 = ai1 and det Bi1 = u det Ai1 by induction because Bi1 comes
from Ai1 by multiplying a row by u.
(b) If i = p, then bp1 = uap1 and Bp1 = Ap1.
In either case, each term in equation (∗∗) is u times the corresponding term in
equation (∗), so it is clear that det B = u det A.
Property 2. This is clear by property 1 because the row of zeros has a common
factor u = 0.
Property 3. Observe first that it suffices to prove property 3 for interchanges of
adjacent rows. (Rows p and q (q > p) can be interchanged by carrying out
2(q - p) - 1 adjacent changes, which results in an odd number of sign changes
in the determinant.) So suppose that rows p and p + 1 of A are interchanged to
obtain B. Again consider the ith term in (∗∗).
(a) If i ≠ p and i ≠ p + 1, then bi1 = ai1 and det Bi1 = -det Ai1 by induction
because Bi1 results from interchanging adjacent rows in Ai1. Hence the ith
term in (∗∗) is the negative of the ith term in (∗).
(b) If i = p or i = p + 1, then bp1 = ap+1,1 and Bp1 = Ap+1,1, whereas bp+1,1 = ap1
and Bp+1,1 = Ap1. Hence terms p and p + 1 in (∗∗) are
bp1(-1)p+1 det Bp1 = -ap+1,1(-1)(p+1)+1 det(Ap+1,1)
bp+1,1(-1)(p+1)+1 det(Bp+1,1) = -ap1(-1)p+1 det Ap1
This means that terms p and p + 1 in (∗∗) are the same as these terms in (∗),
except that the order is reversed and the signs are changed. Thus the sum (∗∗) is
the negative of the sum (∗); that is, det B = -det A.
These facts are enough to enable us to prove Theorem 1 Section 3.1. For
convenience, it is restated here in the notation of the foregoing lemmas. The only
difference between the notations is that the (i, j)-cofactor of an n × n matrix A was
denoted earlier by
cij(A) = (-1)^(i+j) det Aij

Theorem 1
If A = [aij] is an n × n matrix, then

det A = ∑(i=1 to n) aij cij(A)   (expansion along column j)
det A = ∑(j=1 to n) aij cij(A)   (expansion along row i)

Here Aij denotes the matrix obtained from A by deleting row i and column j.
PROOF
Lemma 2 establishes the truth of Theorem 2 Section 3.1 for rows. With this
information, the arguments in Section 3.2 proceed exactly as written to establish
that det A = det AT holds for any n × n matrix A. Now suppose B is obtained
from A by interchanging two columns. Then BT is obtained from AT by
interchanging two rows so, by property 3 of Lemma 2, det B = det B^T = -det A^T = -det A.
Finally, to prove the row expansion, write B = A^T. Then Bij = (Aji)^T and bij = aji for all i and j. Expanding det B along column j gives

det A = det A^T = det B = ∑(i=1 to n) bij(-1)^(i+j) det Bij
      = ∑(i=1 to n) aji(-1)^(j+i) det((Aji)^T) = ∑(i=1 to n) aji(-1)^(j+i) det Aji
This is the required expansion of det A along row j.
EXERCISES 3.6
Chapter 4 Vector Geometry
SECTION 4.1 Vectors and Lines
In this chapter we study the geometry of 3-dimensional space. We view a point in
3-space as an arrow from the origin to that point. Doing so provides a “picture” of
the point that is truly worth a thousand words. We used this idea earlier, in Section
2.6, to describe rotations, reflections, and projections of the plane ℝ². We now apply the same techniques to 3-space to examine similar transformations of ℝ³. Moreover,
the method enables us to completely describe all lines and planes in space.
Vectors in ℝ³
Introduce a coordinate system in 3-dimensional space in the usual way. First choose
a point O called the origin, then choose three mutually perpendicular lines through
O, called the x, y, and z axes, and establish a number scale on each axis with zero at
the origin. Given a point P in 3-space we associate three numbers x, y, and z with
P, as described in Figure 1. These numbers are called the coordinates of P, and we
denote the point as (x, y, z), or P(x, y, z) to emphasize the label P. The result is
called a cartesian1 coordinate system for 3-space, and the resulting description of
3-space is called cartesian geometry.
As in the plane, we introduce vectors by identifying each point P(x, y, z) with the vector v = [x, y, z]^T, represented by the arrow from the origin to P as in Figure 1; this gives a vector description of the geometry. Note that the origin is 0 = [0, 0, 0]^T. The length ‖v‖ of a vector v is defined to be the distance from the origin to P, that is, the length of the arrow representing v. The following properties of length will be used frequently.

FIGURE 1
Theorem 1
Let v = [x, y, z]^T be a vector. Then:
(1) ‖v‖ = √(x² + y² + z²).3
(2) ‖v‖ = 0 if and only if v = 0.
(3) ‖av‖ = |a| ‖v‖ for all scalars a.4
PROOF
Let v have point P = (x, y, z).
(1) In Figure 2, ‖v‖ is the hypotenuse of the right triangle OQP, and so ‖v‖² = h² + z² by Pythagoras' theorem.5 But h is the hypotenuse of the right triangle ORQ, so h² = x² + y². Now (1) follows by eliminating h² and taking positive square roots.
(2) If ‖v‖ = 0, then x² + y² + z² = 0 by (1). Because squares of real numbers are nonnegative, it follows that x = y = z = 0, and hence that v = 0. The converse is because ‖0‖ = 0.
(3) We have av = (ax, ay, az), so (1) gives ‖av‖² = (ax)² + (ay)² + (az)² = a²‖v‖². Hence ‖av‖ = √(a²) ‖v‖, and we are done because √(a²) = |a| for any real number a.

FIGURE 2
EXAMPLE 1
If v = [2, -1, 3]^T then ‖v‖ = √(4 + 1 + 9) = √14. Similarly, if v = [3, -4]^T in 2-space then ‖v‖ = √(9 + 16) = 5.
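The length computations above are easy to check numerically. Here is a small illustrative Python sketch (the function name `length` is ours, not the text's):

```python
import math

def length(v):
    """Length of a vector given by its components (Theorem 1, part (1))."""
    return math.sqrt(sum(x * x for x in v))

print(length((2, -1, 3)))  # sqrt(14), about 3.7417
print(length((3, -4)))     # 5.0
```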
When we view two nonzero vectors as arrows emanating from the origin, it is
clear geometrically what we mean by saying that they have the same or opposite
direction. This leads to a fundamental new description of vectors.
__
3 When we write √p we mean the positive square root of p.
4 Recall that the absolute value |a| of a real number a is defined by |a| = a if a ≥ 0 and |a| = -a if a < 0.
5 Pythagoras' theorem states that if a and b are sides of a right triangle with hypotenuse c, then a² + b² = c². A proof is given at the end of this section.
186 Chapter 4 Vector Geometry
Theorem 2
Two nonzero vectors v and w in ℝ³ are equal if and only if they have the same length and direction.6

PROOF
If v = w, they clearly have the same direction and length. Conversely, let v and w be vectors with points P(x, y, z) and Q(x1, y1, z1) respectively. If v and w have the same length and direction then, geometrically, P and Q must be the same point (see Figure 3). Hence x = x1, y = y1, and z = z1, that is, v = [x, y, z]^T = [x1, y1, z1]^T = w.

FIGURE 3
A characterization of a vector in terms of its length and direction only is called
an intrinsic description of the vector. The point to note is that such a description
does not depend on the choice of coordinate system in ℝ³. Such descriptions are
important in applications because physical laws are often stated in terms of vectors,
and these laws cannot depend on the particular coordinate system used to describe
the situation.
Geometric Vectors
If A and B are distinct points in space, the arrow from A to B has length and
direction. Hence:
Definition 4.1
Suppose that A and B are any two points in ℝ³. In Figure 4 the line segment from A to B is denoted AB and is called the geometric vector from A to B. Point A is called the tail of AB, B is called the tip of AB, and the length of AB is denoted ‖AB‖.

FIGURE 4

Note that if v is any vector in ℝ³ with point P, then v = OP is itself a geometric vector, where O is the origin. Referring to AB as a "vector" seems justified by Theorem 2 because it has a direction (from A to B) and a length ‖AB‖. However, there appears to be a problem because two geometric vectors can have the same length and direction even if the tips and tails are different. For example, AB and PQ in Figure 5 have the same length √5 and the same direction (1 unit left and 2 units up) so, by Theorem 2, they are the same vector! The best way to understand this apparent paradox is to see AB and PQ as different representations of the same underlying vector [-1, 2]^T.7 Once this is clarified, the phenomenon is a great benefit because, thanks to Theorem 2, it means that the same geometric vector can be positioned anywhere in space; what is important is the length and direction, not the location of the tip and tail. This ability to move geometric vectors about is very useful, as we shall soon see.

FIGURE 5 (A(3, 1), B(2, 3), P(1, 0), Q(0, 2))
6 It is Theorem 2 that gives vectors their power in science and engineering because many physical quantities are determined by
their length and magnitude (and are called vector quantities). For example, saying that an airplane is flying at 200 km/h does not
describe where it is going; the direction must also be specified. The speed and direction comprise the velocity of the airplane, a
vector quantity.
7 Fractions provide another example of quantities that can be the same but look different. For example, 6/9 and 14/21 certainly appear different, but they are equal fractions: both equal 2/3 in "lowest terms".
SECTION 4.1 Vectors and Lines 187
Because a vector can be positioned with its tail at any point, the parallelogram law leads to another way to view vector addition. In Figure 7(a) the sum v + w of two vectors v and w is shown as given by the parallelogram law. If w is moved so its tail coincides with the tip of v (Figure 7(b)), then the sum v + w is seen as "first v and then w." Similarly, moving the tail of v to the tip of w shows in Figure 7(c) that v + w is "first w and then v." This will be referred to as the tip-to-tail rule, and it gives a graphic illustration of why v + w = w + v.

Since AB denotes the vector from a point A to a point B, the tip-to-tail rule takes the easily remembered form

AB + BC = AC

for any points A, B, and C. The next example uses this to derive a theorem in geometry without using coordinates.

FIGURE 7

EXAMPLE 2
Show that the diagonals of a parallelogram bisect each other.
8 Recall that a parallelogram is a four-sided figure whose opposite sides are parallel and of equal length.
One reason for the importance of the tip-to-tail rule is that it means two or more vectors can be added by placing them tip-to-tail in sequence. This gives a useful "picture" of the sum of several vectors, and is illustrated for three vectors in Figure 8, where u + v + w is viewed as first u, then v, then w.

There is a simple geometrical way to visualize the (matrix) difference v - w of two vectors. If v and w are positioned so that they have a common tail A (see Figure 9), and if B and C are their respective tips, then the tip-to-tail rule gives w + CB = v. Hence v - w = CB is the vector from the tip of w to the tip of v. Thus both v - w and v + w appear as diagonals in the parallelogram determined by v and w (see Figure 9). We record this for reference.

FIGURE 8
Theorem 3
If v and w have a common tail, then v - w is the vector from the tip of w to the tip of v.
One of the most useful applications of vector subtraction is that it gives a simple formula for the vector from one point to another, and for the distance between the points.

FIGURE 9

Theorem 4
Let P1(x1, y1, z1) and P2(x2, y2, z2) be two points. Then:
1. P1P2 = [x2 - x1, y2 - y1, z2 - z1]^T.
2. The distance between P1 and P2 is √((x2 - x1)² + (y2 - y1)² + (z2 - z1)²).
PROOF
If O is the origin, write v1 = OP1 = [x1, y1, z1]^T and v2 = OP2 = [x2, y2, z2]^T, as in Figure 10. Then Theorem 3 gives P1P2 = v2 - v1, and (1) follows. But the distance between P1 and P2 is ‖P1P2‖, so (2) follows from (1) and Theorem 1.

FIGURE 10
Of course the ℝ² version of Theorem 4 is also valid: if P1(x1, y1) and P2(x2, y2) are points in ℝ², then P1P2 = [x2 - x1, y2 - y1]^T, and the distance between P1 and P2 is √((x2 - x1)² + (y2 - y1)²).
EXAMPLE 3
The distance between P1(2, -1, 3) and P2(1, 1, 4) is √((-1)² + (2)² + (1)²) = √6, and the vector from P1 to P2 is P1P2 = [-1, 2, 1]^T.
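Theorem 4 translates directly into code. The following Python sketch (helper names are ours) reproduces the computation of Example 3:

```python
import math

def vector_between(p1, p2):
    """The vector P1P2 from point p1 to point p2 (Theorem 4, part 1)."""
    return tuple(b - a for a, b in zip(p1, p2))

def distance(p1, p2):
    """The distance between points p1 and p2 (Theorem 4, part 2)."""
    return math.sqrt(sum(d * d for d in vector_between(p1, p2)))

print(vector_between((2, -1, 3), (1, 1, 4)))  # (-1, 2, 1)
print(distance((2, -1, 3), (1, 1, 4)))        # sqrt(6), about 2.4495
```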
Scalar Multiplication

As for the parallelogram law, the intrinsic rule for finding the length and direction of a scalar multiple of a vector in ℝ³ follows easily from the same situation in ℝ².

Scalar Multiple Law
If a is a real number and v ≠ 0 is a vector, then:
(1) ‖av‖ = |a| ‖v‖.
(2) If av ≠ 0,9 then av and v have the same direction if a > 0, and opposite direction if a < 0.

PROOF
(1) This is part (3) of Theorem 1.
(2) Let O denote the origin in ℝ³, let v have point P, and choose any plane containing O and P. If we set up a coordinate system in this plane with O as origin, then v = OP, so the result in (2) follows from the scalar multiple law in the plane (Section 2.6).
A vector u is called a unit vector if ‖u‖ = 1. Then i = [1, 0, 0]^T, j = [0, 1, 0]^T, and k = [0, 0, 1]^T are unit vectors, called the coordinate vectors. We discuss them in more detail in Section 4.2.

FIGURE 12

EXAMPLE 4
If v ≠ 0, show that (1/‖v‖)v is the unique unit vector in the same direction as v.

Solution ► The vectors in the same direction as v are the scalar multiples av where a > 0. But ‖av‖ = |a| ‖v‖ = a‖v‖ when a > 0, so av is a unit vector if and only if a = 1/‖v‖.
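The normalization in Example 4 can be sketched in Python (illustrative only; `unit` is our name):

```python
import math

def unit(v):
    """The unit vector (1/||v||)v in the same direction as a nonzero v (Example 4)."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

u = unit((2, -1, 3))   # components of (1/sqrt(14))(2, -1, 3)
print(u)
```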
The next example shows how to find the coordinates of a point on the line
segment between two given points. The technique is important and will be used
again below.
9 Since the zero vector has no direction, we deal only with the case av ≠ 0.
EXAMPLE 5
Let p1 and p2 be the vectors of two points P1 and P2. If M is the point one third the way from P1 to P2, show that the vector m of M is given by

m = (2/3)p1 + (1/3)p2

Conclude that if P1 = P1(x1, y1, z1) and P2 = P2(x2, y2, z2), then M has coordinates

M = M((2/3)x1 + (1/3)x2, (2/3)y1 + (1/3)y2, (2/3)z1 + (1/3)z2).

Solution ► The vectors p1, p2, and m are shown in the diagram. We have P1M = (1/3)P1P2 because P1M is in the same direction as P1P2 and (1/3) as long. By Theorem 3 we have P1P2 = p2 - p1, so tip-to-tail addition gives

m = p1 + P1M = p1 + (1/3)(p2 - p1) = (2/3)p1 + (1/3)p2

as required. For the coordinates, we have p1 = [x1, y1, z1]^T and p2 = [x2, y2, z2]^T, so

m = (2/3)[x1, y1, z1]^T + (1/3)[x2, y2, z2]^T = [(2/3)x1 + (1/3)x2, (2/3)y1 + (1/3)y2, (2/3)z1 + (1/3)z2]^T

by matrix addition. The last statement follows.
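The same computation works for any fraction t of the way from P1 to P2, with Example 5 being the case t = 1/3. A hedged Python sketch (the function `point_along` is ours):

```python
def point_along(p1, p2, t):
    """The point a fraction t of the way from p1 to p2, with vector (1 - t)p1 + t p2.
    For t = 1/3 this is Example 5: m = (2/3)p1 + (1/3)p2."""
    return tuple((1 - t) * a + t * b for a, b in zip(p1, p2))

m = point_along((2, 3, 5), (8, -6, 2), 1 / 3)
print(m)  # close to (4, 0, 4)
```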
EXAMPLE 6
Show that the midpoints of the four sides of any quadrilateral are the vertices
of a parallelogram. Here a quadrilateral is any figure with four vertices and
straight sides.
Solution ► Suppose that the vertices of the quadrilateral are A, B, C, and D (in that order) and that E, F, G, and H are the midpoints of the sides as shown in the diagram. It suffices to show EF = HG (because then sides EF and HG are parallel and of equal length). Now the fact that E is the midpoint of AB means that EB = (1/2)AB. Similarly, BF = (1/2)BC, so

EF = EB + BF = (1/2)AB + (1/2)BC = (1/2)(AB + BC) = (1/2)AC

A similar argument shows that HG = (1/2)AC too, so EF = HG as required.
Definition 4.2 Two nonzero vectors are called parallel if they have the same or opposite direction.
Many geometrical propositions involve this notion, so the following theorem will
be referred to repeatedly.
Theorem 5
Two nonzero vectors v and w are parallel if and only if one is a scalar multiple of the other.
PROOF
If one of them is a scalar multiple of the other, they are parallel by the scalar multiple law. Conversely, assume that v and w are parallel and write d = ‖v‖/‖w‖ for convenience. Then v and w have the same or opposite direction. If they have the same direction, we show that v = dw by showing that v and dw have the same length and direction. In fact, ‖dw‖ = |d| ‖w‖ = ‖v‖ by Theorem 1; as to the direction, dw and w have the same direction because d > 0, and this is the direction of v by assumption. Hence v = dw in this case by Theorem 2. In the other case, v and w have opposite direction and a similar argument shows that v = -dw. We leave the details to the reader.
EXAMPLE 7
Given points P(2, -1, 4), Q(3, -1, 3), A(0, 2, 1), and B(1, 3, 0), determine if
PQ and AB are parallel.
Solution ► By Theorem 3, PQ = (1, 0, -1) and AB = (1, 1, -1). If PQ = t AB
then (1, 0, -1) = (t, t, -t), so 1 = t and 0 = t, which is impossible. Hence PQ
is not a scalar multiple of AB , so these vectors are not parallel by Theorem 5.
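The scalar-multiple test of Theorem 5, as used in Example 7, can be automated for integer components by cross-multiplying instead of dividing (a sketch; `is_parallel` is our name):

```python
def is_parallel(v, w):
    """Whether nonzero v and w are parallel (Theorem 5): one is a scalar multiple
    of the other exactly when v[i]*w[j] == v[j]*w[i] for all component pairs."""
    n = len(v)
    return all(v[i] * w[j] == v[j] * w[i] for i in range(n) for j in range(i + 1, n))

print(is_parallel((1, 0, -1), (1, 1, -1)))    # False, as in Example 7
print(is_parallel((-3, -6, 3), (5, 10, -5)))  # True: (5, 10, -5) = -(5/3)(-3, -6, 3)
```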
Lines in Space
These vector techniques can be used to give a very simple way of describing straight
lines in space. In order to do this, we first need a way to specify the orientation of
such a line, much as the slope does in the plane.
Definition 4.3 With this in mind, we call a nonzero vector d a direction vector for the line if it is parallel to AB for some pair of distinct points A and B on the line.
Of course it is then parallel to CD for any distinct points C and D on the line. In particular, any nonzero scalar multiple of d will also serve as a direction vector of the line.
p We use the fact that there is exactly one line that passes through a particular
point P0(x0, y0, z0) and has a given direction vector d = S b T. We want to describe
a
Origin c
FIGURE 13 this line by giving a condition on x, y, and z that the point P(x, y, z) lies on
192 Chapter 4 Vector Geometry
S T
this line. Let p0 = y0 and p = S y T denote the vectors of P0 and P, respectively
x0 x
z0 z
(see Figure 13). Then
p = p0 + P0P
Hence P lies on the line if and only if P0P is parallel to d—that is, if and only if
P0P = td for some scalar t by Theorem 5. Thus p is the vector of a point on the
line if and only if p = p0 + td for some scalar t. This discussion is summed up
as follows.
Vector Equation of a Line
The line through P0(x0, y0, z0) with direction vector d ≠ 0 is given by
p = p0 + td   (t any scalar)
In component form this equation reads

[x, y, z]^T = [x0, y0, z0]^T + t[a, b, c]^T

Equating components gives a different description of the line.

Parametric Equations of a Line
The line through P0(x0, y0, z0) with direction vector d = [a, b, c]^T ≠ 0 is given by
x = x0 + ta
y = y0 + tb   (t any scalar)
z = z0 + tc
In other words, the point P(x, y, z) is on this line if and only if a real number t exists
such that x = x0 + ta, y = y0 + tb, and z = z0 + tc.
EXAMPLE 8
Find the equations of the line through the points P0(2, 0, 1) and P1(4, -1, 1).
Solution ► Let d = P0P1 = [2, -1, 0]^T denote the vector from P0 to P1. Then d is
parallel to the line (P0 and P1 are on the line), so d serves as a direction vector
for the line. Using P0 as the point on the line leads to the parametric equations
x = 2 + 2t
y = -t t a parameter
z=1
Note that if P1 is used (rather than P0), the equations are
x = 4 + 2s
y = -1 - s s a parameter
z=1
These are different from the preceding equations, but this is merely the result
of a change of parameter. In fact, s = t - 1.
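The parametric equations are easy to evaluate numerically; this Python sketch (names ours) traces the line of Example 8:

```python
def line_point(p0, d, t):
    """Point on the line through p0 with direction vector d, at parameter t:
    componentwise x = x0 + t*a, y = y0 + t*b, z = z0 + t*c."""
    return tuple(p + t * a for p, a in zip(p0, d))

# Example 8: the line through P0(2, 0, 1) with direction vector d = (2, -1, 0)
print(line_point((2, 0, 1), (2, -1, 0), 0))  # (2, 0, 1), the point P0
print(line_point((2, 0, 1), (2, -1, 0), 1))  # (4, -1, 1), the point P1
```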
EXAMPLE 9
Find the equations of the line through P0(3, -1, 2) parallel to the line with
equations
x = -1 + 2t
y=1+t
z = -3 + 4t
Solution ► The coefficients of t give a direction vector d = [2, 1, 4]^T of the given
line. Because the line we seek is parallel to this line, d also serves as a direction
vector for the new line. It passes through P0, so the parametric equations are
x = 3 + 2t
y = -1 + t
z = 2 + 4t
EXAMPLE 10
Determine whether the following lines intersect and, if so, find the point of
intersection.
x = 1 - 3t x = -1 + s
y = 2 + 5t y = 3 - 4s
z=1+t z=1-s
Solution ► The point P(x, y, z) lies on both lines if and only if

[1 - 3t, 2 + 5t, 1 + t]^T = [x, y, z]^T = [-1 + s, 3 - 4s, 1 - s]^T for some t and s,

where the first (second) equation is because P lies on the first (second) line.
Hence the lines intersect if and only if the three equations
1 - 3t = -1 + s
2 + 5t = 3 - 4s
1+t =1-s
have a solution. In this case, t = 1 and s = -1 satisfy all three equations, so the
lines do intersect and the point of intersection is
p = [1 - 3t, 2 + 5t, 1 + t]^T = [-2, 7, 2]^T
using t = 1. Of course, this point can also be found from p = [-1 + s, 3 - 4s, 1 - s]^T using s = -1.
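The procedure of Example 10 (solve two of the three component equations for t and s, then check the third) can be sketched in Python; `intersect_lines` and its argument layout are ours:

```python
def intersect_lines(p, d, q, e):
    """Intersection of the lines p + t*d and q + s*e in 3-space, or None.
    Solves the first two component equations t*d[i] - s*e[i] = q[i] - p[i]
    by Cramer's rule, then checks the remaining equation."""
    det = d[0] * (-e[1]) - (-e[0]) * d[1]
    if det == 0:
        return None  # the first two equations don't determine t and s
    r0, r1 = q[0] - p[0], q[1] - p[1]
    t = (r0 * (-e[1]) - (-e[0]) * r1) / det
    s = (d[0] * r1 - r0 * d[1]) / det
    pt = tuple(pi + t * di for pi, di in zip(p, d))
    qt = tuple(qi + s * ei for qi, ei in zip(q, e))
    return pt if all(abs(x - y) < 1e-9 for x, y in zip(pt, qt)) else None

# Example 10: first line (1, 2, 1) + t(-3, 5, 1), second line (-1, 3, 1) + s(1, -4, -1)
print(intersect_lines((1, 2, 1), (-3, 5, 1), (-1, 3, 1), (1, -4, -1)))  # (-2.0, 7.0, 2.0)
```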
EXAMPLE 11
Show that the line through P0(x0, y0) with slope m has direction vector d = [1, m]^T and equation y - y0 = m(x - x0). This equation is called the point-slope formula.

Solution ► Let P1(x1, y1) be the point on the line one unit to the right of P0 (see the diagram). Hence x1 = x0 + 1. Then d = P0P1 serves as direction vector of the line, and d = [x1 - x0, y1 - y0]^T = [1, y1 - y0]^T. But the slope m can be computed as follows:

m = (y1 - y0)/(x1 - x0) = (y1 - y0)/1 = y1 - y0

Hence d = [1, m]^T and the parametric equations are x = x0 + t, y = y0 + mt. Eliminating t gives y - y0 = mt = m(x - x0), as asserted.

Note that the vertical line through P0(x0, y0) has direction vector d = [0, 1]^T, which is not of the form [1, m]^T for any m. This confirms that the notion of slope makes no sense in this case. However, the vector method gives parametric equations for the line:

x = x0
y = y0 + t

Because y is arbitrary here (t is arbitrary), this is usually written simply as x = x0.
Pythagoras’ Theorem
The Pythagorean theorem was known long before Pythagoras (c. 550 B.C.); he is credited with giving the first rigorous, logical, deductive proof of the result. The proof we give depends on a basic property of similar triangles: ratios of corresponding sides are equal.
Theorem 6
Pythagoras’ Theorem
Given a right-angled triangle with hypotenuse c and sides a and b, then a² + b² = c².

PROOF
Let A, B, and C be the vertices of the triangle as in Figure 14. Draw a perpendicular from C to the point D on the hypotenuse, and let p and q be the lengths of BD and DA respectively. Then DBC and CBA are similar triangles, so p/a = a/c. This means a² = pc. In the same way, the similarity of DCA and CBA gives q/b = b/c, whence b² = qc. But then

a² + b² = pc + qc = (p + q)c = c²

because p + q = c. This proves Pythagoras' theorem.

FIGURE 14
EXERCISES 4.1
1. Compute ‖v‖ if v equals:
   (a) [2, -1, 2]^T
   (b) [1, -1, 2]^T
   (c) [1, 0, -1]^T
   (d) [-1, 0, 2]^T
   (e) 2[1, -1, 2]^T
   (f) -3[1, 1, 2]^T

2. Find a unit vector in the direction of:
   (a) [7, -1, 5]^T
   (b) [-2, -1, 2]^T

3. (a) Find a unit vector in the direction from [3, -1, 4]^T to [1, 3, 5]^T.
   (b) If u ≠ 0, for which values of a is au a unit vector?

4. Find the distance between the following pairs of points.
   (a) [3, -1, 0]^T and [2, -1, 1]^T
   (b) [2, -1, 2]^T and [2, 0, 1]^T
   (c) [-3, 5, 2]^T and [1, 3, 3]^T
   (d) [4, 0, -2]^T and [3, 2, 0]^T

5. Use vectors to show that the line joining the midpoints of two sides of a triangle is parallel to the third side and half as long.

6. Let A, B, and C denote the three vertices of a triangle.
   (a) If E is the midpoint of side BC, show that AE = (1/2)(AB + AC).
   (b) If F is the midpoint of side AC, show that FE = (1/2)AB.

7. Determine whether u and v are parallel in each of the following cases.
   (a) u = [-3, -6, 3]^T; v = [5, 10, -5]^T
   (b) u = [3, -6, 3]^T; v = [-1, 2, -1]^T
   (c) u = [1, 0, 1]^T; v = [-1, 0, 1]^T
   (d) u = [2, 0, -1]^T; v = [-8, 0, 4]^T

8. Let p and q be the vectors of points P and Q, respectively, and let R be the point whose vector is p + q. Express the following in terms of p and q.
   (a) QP
   (b) QR
   (c) RP
   (d) RO where O is the origin

9. In each case, find PQ and ‖PQ‖.
   (a) P(1, -1, 3), Q(3, 1, 0)
   (b) P(2, 0, 1), Q(1, -1, 6)
   (c) P(1, 0, 1), Q(1, 0, -3)
   (d) P(1, -1, 2), Q(1, -1, 2)
   (e) P(1, 0, -3), Q(-1, 0, 3)
   (f) P(3, -1, 6), Q(1, 1, 4)

10. In each case, find a point Q such that PQ has (i) the same direction as v; (ii) the opposite direction to v.
   (a) P(-1, 2, 2), v = [1, 3, 1]^T
   (b) P(3, 0, -1), v = [2, -1, 3]^T
11. Let u = [3, -1, 0]^T, v = [4, 0, 1]^T, and w = [-1, 1, 5]^T. In each case, find x such that:
   (a) 3(2u + x) + w = 2x - v
   (b) 2(3v - x) = 5w + u - 3x

12. Let u = [1, 1, 2]^T, v = [0, 1, 2]^T, and w = [1, 0, -1]^T. In each case, find numbers a, b, and c such that x = au + bv + cw.
   (a) x = [2, -1, 6]^T
   (b) x = [1, 3, 0]^T

13. Let u = [3, -1, 0]^T, v = [4, 0, 1]^T, and z = [1, 1, 1]^T. In each case, show that there are no numbers a, b, and c such that:
   (a) au + bv + cz = [1, 2, 1]^T
   (b) au + bv + cz = [5, 6, -1]^T

14. Let P1 = P1(2, 1, -2) and P2 = P2(1, -2, 0). Find the coordinates of the point P:
   (a) 1/5 the way from P1 to P2
   (b) 1/4 the way from P2 to P1

15. Find the two points trisecting the segment between P(2, 3, 5) and Q(8, -6, 2).

16. Let P1 = P1(x1, y1, z1) and P2 = P2(x2, y2, z2) be two points with vectors p1 and p2, respectively. If r and s are positive integers, show that the point P lying r/(r + s) the way from P1 to P2 has vector
   p = (s/(r + s))p1 + (r/(r + s))p2.

17. In each case, find the point Q:
   (a) PQ = [2, 0, -3]^T and P = P(2, -3, 1)
   (b) PQ = [-1, 4, 7]^T and P = P(1, 3, -4)

18. Let u = [2, 0, -4]^T and v = [2, 1, -2]^T. In each case find x:
   (a) 2u - ‖v‖v = (3/2)(u - 2x)
   (b) 3u + 7v = ‖u‖²(2x + v)

19. Find all vectors u that are parallel to v = [3, -2, 1]^T and satisfy ‖u‖ = 3‖v‖.

20. Let P, Q, and R be the vertices of a parallelogram with adjacent sides PQ and PR. In each case, find the other vertex S.
   (a) P(3, -1, -1), Q(1, -2, 0), R(1, -1, 2)
   (b) P(2, 0, -1), Q(-2, 4, 1), R(3, -1, 0)

21. In each case either prove the statement or give an example showing that it is false.
   (a) The zero vector 0 is the only vector of length 0.
   (b) If ‖v - w‖ = 0, then v = w.
   (c) If v = -v, then v = 0.
   (d) If ‖v‖ = ‖w‖, then v = w.
   (e) If ‖v‖ = ‖w‖, then v = ±w.
   (f) If v = tw for some scalar t, then v and w have the same direction.
   (g) If v, w, and v + w are nonzero, and v and v + w are parallel, then v and w are parallel.
   (h) ‖-5v‖ = -5‖v‖, for all v.
   (i) If v = 2v, then v = 0.
   (j) ‖v + w‖ = ‖v‖ + ‖w‖, for all v and w.

22. Find the vector and parametric equations of the following lines.
   (a) The line parallel to [2, -1, 0]^T and passing through P(1, -1, 3).
   (b) The line passing through P(3, -1, 4) and Q(1, 0, -1).
   (c) The line passing through P(3, -1, 4) and Q(3, -1, 5).
   (d) The line parallel to [1, 1, 1]^T and passing through P(1, 1, 1).
Find the points on the line with vector equation p = [1, 2, 0]^T + t[2, -1, 2]^T that are at distance 3 from P0(1, 2, 0).

28. A parallelogram has sides AB, BC, CD, and DA. Given A(1, -1, 2), C(2, 1, 0), and the midpoint M(1, 0, -3) of AB, find BD.
23. In each case, verify that the points P and Q lie on the line.
   (a) x = 3 - 4t, y = 2 + t, z = 1 - t;  P(-1, 3, 0), Q(11, 0, 3)
   (b) x = 4 - t, y = 3, z = 1 - 2t;  P(2, 3, -3), Q(-1, 3, -9)

24. Find the point of intersection (if any) of the following pairs of lines.
   (a) x = 3 + t, y = 1 - 2t, z = 3 + 3t;  x = 4 + 2s, y = 6 + 3s, z = 1 + s
   (b) x = 1 - t, y = 2 + 2t, z = -1 + 3t;  x = 2s, y = 1 + s, z = 3
   (c) [x, y, z]^T = [3, -1, 2]^T + t[1, 1, -1]^T;  [x, y, z]^T = [1, 1, -2]^T + s[2, 0, 3]^T
   (d) [x, y, z]^T = [4, -1, 5]^T + t[1, 0, 1]^T;  [x, y, z]^T = [2, -7, 12]^T + s[0, -2, 3]^T

25. Show that if a line passes through the origin, the vectors of points on the line are all scalar multiples of some fixed nonzero vector.

29. Find all points C on the line through A(1, -1, 2) and B = (2, 0, 1) such that ‖AC‖ = 2‖BC‖.

30. Let A, B, C, D, E, and F be the vertices of a regular hexagon, taken in order. Show that AB + AC + AD + AE + AF = 3AD.

31. (a) Let P1, P2, P3, P4, P5, and P6 be six points equally spaced on a circle with centre C. Show that CP1 + CP2 + CP3 + CP4 + CP5 + CP6 = 0.
   (b) Show that the conclusion in part (a) holds for any even set of points evenly spaced on the circle.
   (c) Show that the conclusion in part (a) holds for three points.
   (d) Do you think it works for any finite set of points evenly spaced around the circle?

32. Consider a quadrilateral with vertices A, B, C, and D in order (as shown in the diagram). If the diagonals AC and BD bisect each other, show that the quadrilateral is a parallelogram. (This is the converse of Example 2.) [Hint: Let E be the intersection of the diagonals. Show that AB = DC by writing AB = AE + EB.]
33. Consider the parallelogram ABCD (see diagram), and let E be the midpoint of side AD. Show that BE and AC trisect each other; that is, show that the intersection point is one-third of the way from E to B and from A to C. [Hint: If F is one-third of the way from A to C, show that 2EF = FB and argue as in Example 2.]

34. The line from a vertex of a triangle to the midpoint of the opposite side is called a median of the triangle. If the vertices of a triangle have vectors u, v, and w, show that the point on each median that is 1/3 the way from the midpoint to the vertex has vector (1/3)(u + v + w). Conclude that the point C with vector (1/3)(u + v + w) lies on all three medians. This point C is called the centroid of the triangle.

35. Given four noncoplanar points in space, the figure with these points as vertices is called a tetrahedron. The line from a vertex through the centroid (see previous exercise) of the triangle formed by the remaining vertices is called a median of the tetrahedron. If u, v, w, and x are the vectors of the four vertices, show that the point on a median one-fourth the way from the centroid to the vertex has vector (1/4)(u + v + w + x). Conclude that the four medians are concurrent.
SECTION 4.2 Projections and Planes

Definition 4.4
Given vectors v = [x1, y1, z1]^T and w = [x2, y2, z2]^T, their dot product v · w is a number defined by

v · w = x1x2 + y1y2 + z1z2 = v^T w
EXAMPLE 1
If v = [2, -1, 3]^T and w = [1, 4, -1]^T, then v · w = 2 · 1 + (-1) · 4 + 3 · (-1) = -5.
The next theorem lists several basic properties of the dot product.
Theorem 1
Let u, v, and w denote vectors in ℝ³ (or ℝ²). Then:
(1) v · w is a real number.
(2) v · w = w · v.
(3) v · 0 = 0 = 0 · v.
(4) v · v = ‖v‖².
(5) (kv) · w = k(w · v) = v · (kw) for all scalars k.
(6) u · (v ± w) = u · v ± u · w.
PROOF
(1), (2), and (3) are easily verified, and (4) comes from Theorem 1 Section 4.1. The rest are properties of matrix arithmetic (because w · v = v^T w), and are left to the reader.
EXAMPLE 2
Verify that ‖v - 3w‖ = 1 when ‖v‖ = 2, ‖w‖ = 1, and v · w = 2.

Solution ► We compute ‖v - 3w‖² = (v - 3w) · (v - 3w) = ‖v‖² - 6(v · w) + 9‖w‖² = 4 - 12 + 9 = 1, so ‖v - 3w‖ = 1.

There is an intrinsic description of the dot product of two nonzero vectors in ℝ³.
To understand it we require the following result from trigonometry.
Law of Cosines
If a triangle has sides a, b, and c, and if θ is the interior angle opposite c then
c2 = a2 + b2 - 2ab cos θ.
PROOF
We prove it when θ is acute, that is, 0 ≤ θ < π/2; the obtuse case is similar. In Figure 2 we have p = a sin θ and q = a cos θ. Hence Pythagoras' theorem gives

c² = p² + (b - q)² = a² sin² θ + (b - a cos θ)² = a²(sin² θ + cos² θ) + b² - 2ab cos θ

The law of cosines follows because sin² θ + cos² θ = 1 for any angle θ.

FIGURE 2
Note that the law of cosines reduces to Pythagoras' theorem if θ is a right angle (because cos(π/2) = 0).
Now let v and w be nonzero vectors positioned with a common tail as in Figure 3. Then they determine a unique angle θ in the range

0 ≤ θ ≤ π

This angle θ will be called the angle between v and w. Figure 3 illustrates when θ is acute (less than π/2) and obtuse (greater than π/2). Clearly v and w are parallel if θ is either 0 or π. Note that we do not define the angle between v and w if one of these vectors is 0.

The next result gives an easy way to compute the angle between two nonzero vectors using the dot product.

FIGURE 3

Theorem 2
Let v and w be nonzero vectors. If θ is the angle between v and w, then

v · w = ‖v‖ ‖w‖ cos θ   (∗)
PROOF
We calculate ‖v - w‖² in two ways. First apply the law of cosines to the triangle in Figure 4 to obtain:

‖v - w‖² = ‖v‖² + ‖w‖² - 2‖v‖ ‖w‖ cos θ

On the other hand, we use Theorem 1:

‖v - w‖² = (v - w) · (v - w) = v · v - v · w - w · v + w · w = ‖v‖² - 2(v · w) + ‖w‖²

Comparing these we see that -2‖v‖ ‖w‖ cos θ = -2(v · w), and the result follows.

FIGURE 4
EXAMPLE 3
Compute the angle between u = [-1, 1, 2]^T and v = [2, 1, -1]^T.

Solution ► Compute cos θ = (v · u)/(‖v‖ ‖u‖) = (-2 + 1 - 2)/(√6 √6) = -1/2. Now recall that cos θ and sin θ are defined so that (cos θ, sin θ) is the point on the unit circle determined by the angle θ (drawn counterclockwise, starting from the positive x axis). In the present case, we know that cos θ = -1/2 and that 0 ≤ θ ≤ π. Because cos(π/3) = 1/2, it follows that θ = 2π/3 (see the diagram).
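Both the dot product and the angle formula of Theorem 2 are one-liners in Python (a sketch; the names are ours):

```python
import math

def dot(v, w):
    """Dot product v . w = x1*x2 + y1*y2 + z1*z2 (Definition 4.4)."""
    return sum(a * b for a, b in zip(v, w))

def angle(v, w):
    """Angle between nonzero v and w, from cos(theta) = v.w / (||v|| ||w||)."""
    return math.acos(dot(v, w) / math.sqrt(dot(v, v) * dot(w, w)))

print(dot((2, -1, 3), (1, 4, -1)))    # -5, as in Example 1
print(angle((-1, 1, 2), (2, 1, -1)))  # 2*pi/3, as in Example 3
```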
If v and w are nonzero, (∗) shows that cos θ has the same sign as v · w, so
v · w > 0 if and only if θ is acute (0 ≤ θ < π/2)
v · w < 0 if and only if θ is obtuse (π/2 < θ ≤ π)
v · w = 0 if and only if θ = π/2
In this last case, the (nonzero) vectors are perpendicular. The following terminology
is used in linear algebra:
Definition 4.5 Two vectors v and w are said to be orthogonal if v = 0 or w = 0 or the angle between them is π/2.
Theorem 3
Two vectors v and w are orthogonal if and only if v · w = 0.

EXAMPLE 4
Show that the points P(3, -1, 1), Q(4, 1, 4), and R(6, 0, 4) are the vertices of a right triangle.

Solution ► We have

PQ = [1, 2, 3]^T, PR = [3, 1, 3]^T, and QR = [2, -1, 0]^T
Evidently PQ · QR = 2 - 2 + 0 = 0, so PQ and QR are orthogonal vectors.
This means sides PQ and QR are perpendicular—that is, the angle at Q is a
right angle.
Example 5 demonstrates how the dot product can be used to verify geometrical
theorems involving perpendicular lines.
EXAMPLE 5
A parallelogram with sides of equal length is called a rhombus. Show that the
diagonals of a rhombus are perpendicular.
Solution ► Let u and v denote vectors along two adjacent sides of a rhombus, as shown in the diagram. Then the diagonals are u - v and u + v, and we compute

(u - v) · (u + v) = u · (u + v) - v · (u + v) = u · u + u · v - v · u - v · v = ‖u‖² - ‖v‖² = 0

because ‖u‖ = ‖v‖ (it is a rhombus). Hence u - v and u + v are orthogonal.
Projections
In applications of vectors, it is frequently useful to write a vector as the sum of two
orthogonal vectors. Here is an example.
EXAMPLE 6
Suppose a ten-kilogram block is placed on a flat surface inclined 30° to the horizontal as in the diagram. Neglecting friction, how much force is required to keep the block from sliding down the surface?

Solution ► Let w denote the weight (force due to gravity) exerted on the block. Then ‖w‖ = 10 kilograms and the direction of w is vertically down as in the diagram. The idea is to write w as a sum w = w1 + w2 where w1 is parallel to the inclined surface and w2 is perpendicular to the surface. Since there is no friction, the force required is -w1 because the force w2 has no effect parallel to the surface. As the angle between w and w2 is 30° in the diagram, we have ‖w1‖/‖w‖ = sin 30° = 1/2. Hence ‖w1‖ = (1/2)‖w‖ = (1/2)(10) = 5. Thus the required force has a magnitude of 5 kilograms weight directed up the surface.
If a nonzero vector d is specified, the key idea in Example 6 is to be able to write an arbitrary vector u as a sum of two vectors,

u = u1 + u2

where u1 is parallel to d and u2 = u - u1 is orthogonal to d. Suppose that u and d ≠ 0 emanate from a common tail Q (see Figure 5). Let P be the tip of u, and let P1 denote the foot of the perpendicular from P to the line through Q parallel to d. Then u1 = QP1 has the required properties:
1. u1 is parallel to d.
2. u2 = u - u1 is orthogonal to d.
3. u = u1 + u2.

FIGURE 5
Definition 4.6 The vector u1 = QP1 in Figure 5 is called the projection of u on d. It is denoted

u1 = proj_d u
SECTION 4.2 Projections and Planes 203
In Figure 5(a) the vector u1 = projd u has the same direction as d; however, u1
and d have opposite directions if the angle between u and d is greater than π/2
(Figure 5(b)). Note that the projection u1 = projd u is zero if and only if u and d
are orthogonal.
Calculating the projection of u on d ≠ 0 is remarkably easy.
Theorem 4

Let u and d ≠ 0 be vectors. Then

    projd u = (u · d / ‖d‖²) d
PROOF

The vector u1 = projd u is parallel to d and so has the form u1 = td for some
scalar t. The requirement that u − u1 and d are orthogonal determines t. In fact,
it means that (u − u1) · d = 0 by Theorem 3. If u1 = td is substituted here, the
condition is

    0 = (u − td) · d = u · d − t(d · d) = u · d − t‖d‖²

It follows that t = (u · d)/‖d‖², where the assumption that d ≠ 0 guarantees that
‖d‖² ≠ 0.
EXAMPLE 7

Find the projection of u = [2, −3, 1]^T on d = [1, −1, 3]^T and express u = u1 + u2 where
u1 is parallel to d and u2 is orthogonal to d.

Solution ► The projection of u on d is

    u1 = projd u = (u · d / ‖d‖²) d = [(2 + 3 + 3)/(1² + (−1)² + 3²)] [1, −1, 3]^T = (8/11) [1, −1, 3]^T

Hence u2 = u − u1 = (1/11) [14, −25, −13]^T, and this is orthogonal to d by Theorem 4
(alternatively, observe that d · u2 = 0). Since u = u1 + u2, we are done.
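The projection formula of Theorem 4 is easy to check by machine. The following minimal Python sketch (not part of the text; the helper names are ours) uses exact rational arithmetic to reproduce the numbers of Example 7:

```python
# Sketch: projection of u on d via Theorem 4, checked on Example 7.
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(u, d):
    # proj_d u = (u . d / ||d||^2) d
    t = dot(u, d) / dot(d, d)
    return [t * a for a in d]

u = [F(2), F(-3), F(1)]
d = [F(1), F(-1), F(3)]
u1 = proj(u, d)                       # parallel to d: (8/11)[1, -1, 3]
u2 = [a - b for a, b in zip(u, u1)]   # orthogonal to d: (1/11)[14, -25, -13]
print(u1, u2, dot(d, u2))
```

Using `Fraction` rather than floating point keeps the check exact, so `dot(d, u2)` comes out as exactly 0.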
EXAMP L E 8
P(1, 3, −2)
Find the shortest distance (see diagram) from the point P(1, 3, -2) to the line
u − u1
S T
1
u
u1 d through P0(2, 0, -1) with direction vector d = -1 . Also find the point Q that
Q
0
lies on the line and is closest to P.
P0(2, 0, −1)
204 Chapter 4 Vector Geometry
Solution ► Let u = [1, 3, −2]^T − [2, 0, −1]^T = [−1, 3, −1]^T denote the vector from P0 to P, and let
u1 denote the projection of u on d. Thus

    u1 = (u · d / ‖d‖²) d = [(−1 − 3 + 0)/(1² + (−1)² + 0²)] d = −2d = [−2, 2, 0]^T

by Theorem 4. We see geometrically that the point Q on the line is closest to
P, so the distance is

    ‖QP‖ = ‖u − u1‖ = ‖[1, 1, −1]^T‖ = √3

To find the coordinates of Q, let p0 and q denote the vectors of P0 and Q,
respectively. Then p0 = [2, 0, −1]^T and q = p0 + u1 = [0, 2, −1]^T.
Hence Q(0, 2, −1) is the required point. It can be checked that the distance
from Q to P is √3, as expected.
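The method of Example 8 can be sketched in a few lines of Python (not part of the text; the function name is ours): the point on the line closest to P is Q = P0 + proj_d(P0P).

```python
# Sketch: closest point on a line, checked on Example 8.
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def closest_point_on_line(p, p0, d):
    # Q = p0 + (u . d / ||d||^2) d, where u runs from P0 to P
    u = [a - b for a, b in zip(p, p0)]
    t = dot(u, d) / dot(d, d)
    return [a + t * b for a, b in zip(p0, d)]

P  = [F(1), F(3), F(-2)]
P0 = [F(2), F(0), F(-1)]
d  = [F(1), F(-1), F(0)]
Q  = closest_point_on_line(P, P0, d)       # the point Q(0, 2, -1)
PQ = [a - b for a, b in zip(P, Q)]
dist2 = dot(PQ, PQ)                        # squared distance; 3, so distance sqrt(3)
print(Q, dist2)
```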
Planes
It is evident geometrically that among all planes that are perpendicular to a given
straight line there is exactly one containing any given point. This fact can be used to
give a very simple description of a plane. To do this, it is necessary to introduce the
following notion:
Definition 4.7 A nonzero vector n is called a normal for a plane if it is orthogonal to every vector in
the plane.
For example, the coordinate vector k is a normal for the x-y plane.
Given a point P0 = P0(x0, y0, z0) and a nonzero vector n, there is a unique plane
through P0 with normal n, shaded in Figure 6. A point P = P(x, y, z) lies on this
plane if and only if the vector P0P is orthogonal to n—that is, if and only if
n · P0P = 0. Because P0P = [x − x0, y − y0, z − z0]^T, this gives the following result:

FIGURE 6
Scalar Equation of a Plane

The plane through P0(x0, y0, z0) with n = [a, b, c]^T ≠ 0 as a normal vector is given by

    a(x − x0) + b(y − y0) + c(z − z0) = 0

In other words, a point P(x, y, z) is on this plane if and only if x, y, and z satisfy this
equation.
EXAMPLE 9

Find an equation of the plane through P0(1, −1, 3) with n = [3, −1, 2]^T as normal.

Solution ► Here the general scalar equation becomes

    3(x − 1) − (y + 1) + 2(z − 3) = 0

This simplifies to 3x − y + 2z = 10.
If we write d = ax0 + by0 + cz0, the scalar equation shows that every plane with
normal n = [a, b, c]^T has a linear equation of the form

    ax + by + cz = d                                           (∗)

for some constant d. Conversely, the graph of this equation is a plane with n = [a, b, c]^T
as a normal vector (assuming that a, b, and c are not all zero).
EXAMPLE 10

Find an equation of the plane through P0(3, −1, 2) that is parallel to the plane
with equation 2x − 3y = 6.

Solution ► The plane with equation 2x − 3y = 6 has normal n = [2, −3, 0]^T. Because
the two planes are parallel, n serves as a normal for the plane we seek, so the
equation is 2x − 3y = d for some d by equation (∗). Insisting that P0(3, −1, 2)
lies on the plane determines d; that is, d = 2 · 3 − 3(−1) = 9. Hence, the
equation is 2x − 3y = 9.
Consider points P0(x0, y0, z0) and P(x, y, z) with vectors p0 = [x0, y0, z0]^T and p = [x, y, z]^T.
Given a nonzero vector n, the scalar equation of the plane through P0(x0, y0, z0) with
normal n takes the following vector form:

Vector Equation of a Plane

The plane with normal n ≠ 0 through the point with vector p0 is given by

    n · (p − p0) = 0

In other words, the point with vector p is on the plane if and only if p satisfies this condition.
EXAMPLE 11

Find the shortest distance from the point P(2, 1, −3) to the plane with equation
3x − y + 4z = 1. Also find the point Q on this plane closest to P.

Solution 1 ► The plane in question has normal n = [3, −1, 4]^T. Choose any point
P0 on the plane—say P0(0, −1, 0)—and let Q(x, y, z) be the point on the plane
closest to P (see the diagram). The vector from P0 to P is u = [2, 2, −3]^T. Now erect
n with its tail at P0. Then QP = u1 and u1 is the projection of u on n:

    u1 = (n · u / ‖n‖²) n = (−8/26) [3, −1, 4]^T = (−4/13) [3, −1, 4]^T

Hence the distance is ‖QP‖ = ‖u1‖ = 4√26/13. To calculate the point Q, let
q = [x, y, z]^T and p0 = [0, −1, 0]^T be the vectors of Q and P0. Then

    q = p0 + u − u1 = [0, −1, 0]^T + [2, 2, −3]^T + (4/13) [3, −1, 4]^T = (1/13) [38, 9, −23]^T

This gives the coordinates of Q(38/13, 9/13, −23/13).

Solution 2 ► Let q = [x, y, z]^T and p = [2, 1, −3]^T be the vectors of Q and P. Then Q is on
the line through P with direction vector n, so q = p + tn for some scalar t. In
addition, Q lies on the plane, so n · q = 1. This determines t:

    1 = n · q = n · (p + tn) = n · p + t‖n‖² = −7 + t(26)

This gives t = 8/26 = 4/13, so

    [x, y, z]^T = q = p + tn = [2, 1, −3]^T + (4/13) [3, −1, 4]^T = (1/13) [38, 9, −23]^T

as before. This determines Q (in the diagram), and the reader can verify that
the required distance is ‖QP‖ = 4√26/13, as before.
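Solution 2 of Example 11 is short enough to sketch directly in Python (not part of the text; variable names are ours): the closest point is q = p + tn with t chosen so that n · q equals the plane constant.

```python
# Sketch: closest point on a plane, following Solution 2 of Example 11.
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

n = [F(3), F(-1), F(4)]      # normal of the plane 3x - y + 4z = 1
d = F(1)                     # right-hand side of the plane equation
p = [F(2), F(1), F(-3)]      # the point P

t = (d - dot(n, p)) / dot(n, n)          # from 1 = n.(p + t n)
q = [a + t * b for a, b in zip(p, n)]    # closest point Q
print(t)                                  # 4/13
print([str(x) for x in q])                # ['38/13', '9/13', '-23/13']
```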
Definition 4.8  Given vectors v1 = [x1, y1, z1]^T and v2 = [x2, y2, z2]^T, define the cross product v1 × v2 by

    v1 × v2 = [y1z2 − z1y2, −(x1z2 − z1x2), x1y2 − y1x2]^T.
To describe the cross product more compactly, introduce the coordinate vectors

    i = [1, 0, 0]^T,  j = [0, 1, 0]^T,  and  k = [0, 0, 1]^T

They are vectors of length 1 pointing along the positive x, y, and z axes,
respectively, as in Figure 7. The reason for the name is that any vector can be
written as

    [x, y, z]^T = xi + yj + zk.

FIGURE 7
With this, the cross product can be described as follows:

If v1 = [x1, y1, z1]^T and v2 = [x2, y2, z2]^T are two vectors, then

                  [ i  x1  x2 ]
    v1 × v2 = det [ j  y1  y2 ] = (y1z2 − z1y2)i − (x1z2 − z1x2)j + (x1y2 − y1x2)k
                  [ k  z1  z2 ]

where the determinant is expanded along the first column.
EXAMPLE 12

If v = [2, −1, 4]^T and w = [1, 3, 7]^T, then

                [ i   2  1 ]
    v × w = det [ j  −1  3 ] = ((−1)·7 − 4·3)i − (2·7 − 4·1)j + (2·3 − (−1)·1)k
                [ k   4  7 ]
          = −19i − 10j + 7k
          = [−19, −10, 7]^T
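Definition 4.8 translates directly into code. This minimal Python sketch (not part of the text; helper names are ours) implements the cross product and checks the computation of Example 12, along with the orthogonality assertion of Theorem 5:

```python
# Sketch: the cross product of Definition 4.8, checked on Example 12.

def cross(v, w):
    (x1, y1, z1), (x2, y2, z2) = v, w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

v = [2, -1, 4]
w = [1, 3, 7]
n = cross(v, w)
print(n)                     # [-19, -10, 7]
print(dot(n, v), dot(n, w))  # 0 0 -- orthogonal to both v and w
```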
Observe that v × w is orthogonal to both v and w in Example 12. This is part of a
general result which, together with the second part, will be proved in Section 4.3
where a more detailed study of the cross product will be undertaken.

Theorem 5

Let v and w be vectors in ℝ³.
1. The vector v × w is orthogonal to both v and w.
2. v × w = 0 if and only if v and w are parallel.

It is interesting to contrast Theorem 5(2) with the assertion (in Theorem 3) that
v · w = 0 if and only if v and w are orthogonal.
EXAMPLE 13

Find the equation of the plane through P(1, 3, −2), Q(1, 1, 5), and R(2, −2, 3).

Solution ► The vectors PQ = [0, −2, 7]^T and PR = [1, −5, 5]^T lie in the plane, so

                  [ i   0   1 ]
    PQ × PR = det [ j  −2  −5 ] = 25i + 7j + 2k = [25, 7, 2]^T
                  [ k   7   5 ]

is a normal for the plane (being orthogonal to both PQ and PR). Hence the
plane has equation

    25x + 7y + 2z = d  for some number d.

Since P(1, 3, −2) lies in the plane we have 25 · 1 + 7 · 3 + 2(−2) = d. Hence
d = 42 and the equation is 25x + 7y + 2z = 42. Incidentally, the same
equation is obtained (verify) if QP and QR, or RP and RQ, are used as the
vectors in the plane.
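The recipe of Example 13 can be sketched as a small Python program (not part of the text; function names are ours): a normal for the plane through P, Q, R is PQ × PR, and the constant d is fixed by substituting P.

```python
# Sketch: equation of a plane through three points, checked on Example 13.

def cross(v, w):
    (x1, y1, z1), (x2, y2, z2) = v, w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

P, Q, R = [1, 3, -2], [1, 1, 5], [2, -2, 3]
n = cross(sub(Q, P), sub(R, P))   # normal vector [25, 7, 2]
d = dot(n, P)                     # 42, so the plane is 25x + 7y + 2z = 42
print(n, d)
```

As a consistency check, Q and R must satisfy the same equation, which the test below verifies.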
EXAMPLE 14

Find the shortest distance between the nonparallel lines

    [x, y, z]^T = [1, 0, −1]^T + t [2, 0, 1]^T  and  [x, y, z]^T = [3, 1, 0]^T + s [1, 1, −1]^T

Then find the points A and B on the lines that are closest together.

Solution ► Direction vectors for the two lines are d1 = [2, 0, 1]^T and d2 = [1, 1, −1]^T, so

                    [ i  2   1 ]
    n = d1 × d2 = det [ j  0   1 ] = [−1, 3, 2]^T
                    [ k  1  −1 ]

is perpendicular to both lines, so the two lines lie in parallel planes with normal n.
The vector u from P1(1, 0, −1) to P2(3, 1, 0) is u = P1P2 = [2, 1, 1]^T and so, as in
Example 11, the distance is the length of the projection of u on n:

    distance = ‖(u · n / ‖n‖²) n‖ = |u · n| / ‖n‖ = 3/√14 = 3√14/14

To find the points A and B, note that A lies on the first line, so A(1 + 2t, 0, −1 + t),
and B lies on the second line, so B(3 + s, 1 + s, −s) for some s and t. Hence
AB = [2 + s − 2t, 1 + s, 1 − s − t]^T. This vector is
orthogonal to both d1 and d2, and the conditions AB · d1 = 0 and AB · d2 = 0
give equations 5t − s = 5 and t − 3s = 2. The solution is s = −5/14 and t = 13/14, so
the points are A(40/14, 0, −1/14) and B(37/14, 9/14, 5/14). We have ‖AB‖ = 3√14/14,
as before.
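The closest-points calculation in Example 14 is a 2 × 2 linear system, so it is easy to check by machine. This Python sketch (not part of the text; variable names are ours) sets up the two orthogonality conditions and solves them by Cramer's rule:

```python
# Sketch: closest points on two skew lines, checked on Example 14.
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

p1, d1 = [F(1), F(0), F(-1)], [F(2), F(0), F(1)]
p2, d2 = [F(3), F(1), F(0)], [F(1), F(1), F(-1)]

# AB = (p2 + s d2) - (p1 + t d1); AB.d1 = 0 and AB.d2 = 0 give:
#   (d1.d1) t - (d1.d2) s = (p2 - p1).d1
#   (d1.d2) t - (d2.d2) s = (p2 - p1).d2
u = [a - b for a, b in zip(p2, p1)]
a11, a12, b1 = dot(d1, d1), -dot(d1, d2), dot(u, d1)
a21, a22, b2 = dot(d1, d2), -dot(d2, d2), dot(u, d2)
det = a11 * a22 - a12 * a21
t = (b1 * a22 - a12 * b2) / det          # Cramer's rule
s = (a11 * b2 - b1 * a21) / det
A = [p + t * d for p, d in zip(p1, d1)]
B = [p + s * d for p, d in zip(p2, d2)]
print(t, s)   # 13/14 -5/14
print(A, B)
```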
EXERCISES 4.2

1. Compute u · v where:
   (a) u = [2, −1, 3]^T, v = [−1, 1, 1]^T
   (b) u = [1, 2, −1]^T, v = u
   (c) u = [1, 1, −3]^T, v = [2, −1, 1]^T
   (d) u = [3, −1, 5]^T, v = [6, −7, −5]^T
   (e) u = [x, y, z]^T, v = [a, b, c]^T
   (f) u = [a, b, c]^T, v = 0

2. Find the angle between the following pairs of vectors.
   (a) u = [1, 0, 3]^T, v = [2, 0, 1]^T
   (b) u = [3, −1, 0]^T, v = [−6, 2, 0]^T
   (c) u = [7, −1, 3]^T, v = [1, 4, −1]^T
   (d) u = [2, 1, −1]^T, v = [3, 6, 3]^T
   (e) u = [1, −1, 0]^T, v = [0, 1, 1]^T
   (f) u = [0, 3, 4]^T, v = [5√2, −7, −1]^T

3. Find all real numbers x such that:
   (a) [2, −1, 3]^T and [x, −2, 1]^T are orthogonal.
   (b) [2, −1, 1]^T and [1, x, 2]^T are at an angle of π/3.
4. …
   (a) u1 = [−1, −3, 2]^T, u2 = [0, 1, 1]^T
   (b) u1 = [3, −1, 2]^T, u2 = [2, 0, 1]^T
   (c) u1 = [2, 0, −1]^T, u2 = [−4, 0, 2]^T
   (d) u1 = [2, −1, 3]^T, u2 = [0, 0, 0]^T

5. Find two orthogonal vectors that are both orthogonal to v = [1, 2, 0]^T.

6. Consider the triangle with vertices P(2, 0, −3), Q(5, −2, 1), and R(7, 5, 3).
   (a) Show that it is a right-angled triangle.
   (b) Find the lengths of the three sides and verify the Pythagorean theorem.

7. Show that the triangle with vertices A(4, −7, 9), B(6, 4, 4), and C(7, 10, −6) is not a right-angled triangle.

8. Find the three internal angles of the triangle with vertices:
   (a) A(3, 1, −2), B(3, 0, −1), and C(5, 2, −1)
   (b) A(3, 1, −2), B(5, 2, −1), and C(4, 3, −3)

9. Show that the line through P0(3, 1, 4) and P1(2, 1, 3) is perpendicular to the line through P2(1, −1, 2) and P3(0, 5, 3).

10. In each case, compute the projection of u on v.
    (a) u = [5, 7, 1]^T, v = [2, −1, 3]^T
    (b) u = [3, −2, 1]^T, v = [4, 1, 3]^T
    (c) u = [1, −1, 2]^T, v = [3, −1, 1]^T
    (d) u = [3, −2, −1]^T, v = [−6, 4, 2]^T

11. …
    (a) u = [2, −1, 1]^T, v = [1, −1, 3]^T
    (b) u = [3, 1, 0]^T, v = [−2, 1, 4]^T
    (c) u = [2, −1, 0]^T, v = [3, 1, −1]^T
    (d) u = [3, −2, 1]^T, v = [−6, 4, −1]^T

12. Calculate the distance from the point P to the line in each case and find the point Q on the line closest to P.
    (a) P(3, 2, −1);  line: [x, y, z]^T = [2, 1, 3]^T + t[3, −1, −2]^T
    (b) P(1, −1, 3);  line: [x, y, z]^T = [1, 0, −1]^T + t[3, 1, 4]^T

13. Compute u × v where:
    (a) u = [1, 2, 3]^T, v = [1, 1, 2]^T
    (b) u = [3, −1, 0]^T, v = [−6, 2, 0]^T
    (c) u = [3, −2, 1]^T, v = [1, 1, −1]^T
    (d) u = [2, 0, −1]^T, v = [1, 4, 7]^T

14. Find an equation of each of the following planes.
    (a) Passing through A(2, 1, 3), B(3, −1, 5), and C(1, 2, −3).
    (b) Passing through A(1, −1, 6), B(0, 0, 1), and C(4, 7, −11).
    (c) Passing through P(…) and parallel to the plane with equation 3x − 2y − z = 0.
    (d) Passing through P(3, 0, −1) and parallel to the plane with equation 2x − y + z = 3.
    (e) Containing P(3, 0, −1) and the line [x, y, z]^T = [0, 0, 2]^T + t[1, 0, 1]^T.
    (f) Containing P(2, 1, 0) and the line [x, y, z]^T = [3, −1, 2]^T + t[1, 0, −1]^T.
    (g) Containing the lines [x, y, z]^T = [1, −1, 2]^T + t[1, 1, 1]^T and [x, y, z]^T = [0, 0, 2]^T + t[1, −1, 0]^T.
    (h) Containing the lines [x, y, z]^T = [3, 1, 0]^T + t[1, −1, 3]^T and [x, y, z]^T = [0, −2, …]^T + t[2, 1, …]^T.

15. …
    (a) Passing through P(…) and perpendicular to the plane 3x − 2y − z = 0.
    (b) Passing through P(2, −1, 3) and perpendicular to the plane 2x + y = 1.
    (c) Passing through P(0, 0, 0) and perpendicular to the lines [x, y, z]^T = [1, 1, 0]^T + t[2, 0, −1]^T and [x, y, z]^T = [2, 1, −3]^T + t[1, −1, 5]^T.
    (d) Passing through P(1, 1, −1), and perpendicular to the lines [x, y, z]^T = [2, 0, 1]^T + t[1, 1, −2]^T and [x, y, z]^T = [5, 5, −2]^T + t[1, 2, −3]^T.
    (e) Passing through P(2, 1, −1), intersecting the line [x, y, z]^T = [1, 2, −1]^T + t[3, 0, 1]^T, and perpendicular to that line.
    (f) Passing through P(1, 1, 2), intersecting the line [x, y, z]^T = [2, 1, 0]^T + t[1, 1, 1]^T, and perpendicular to that line.

16. In each case, find the shortest distance from the point P to the plane and find the point Q on the plane closest to P.
    (a) P(2, 3, 0); plane with equation 5x + y + z = 1.
    (b) P(3, 1, −1); plane with equation 2x + y − z = 6.

17. (a) Does the line through P(1, 2, −3) with direction vector d = [1, 2, −3]^T lie in the plane 2x − y − z = 3? Explain.
    (b) Does the plane through P(4, 0, 5), Q(2, 2, 1), and R(1, −1, 2) pass through the origin? Explain.

18. In each case, find the point of intersection of the given plane and the line [x, y, z]^T = [1, −2, 3]^T + t[2, 5, −1]^T.
    (a) x − 3y + 2z = 4    (b) 2x − y − z = 5
    (c) 3x − y + z = 8     (d) −x − 4y − 3z = 6

21. Find the equation of all planes:
    (a) Perpendicular to the line [x, y, z]^T = [2, −1, 3]^T + t[2, 1, 3]^T.
    (b) Perpendicular to the line [x, y, z]^T = [1, 0, −1]^T + t[3, 0, 2]^T.
    (c) Containing the origin.
    (d) Containing P(3, 2, −4).
    (e) Containing P(1, 1, −1) and Q(0, 1, 1).
    (f) Containing P(2, −1, 1) and Q(1, 0, 0).
    (g) Containing the line [x, y, z]^T = [2, 1, 0]^T + t[1, −1, 0]^T.
    (h) Containing the line [x, y, z]^T = [3, 0, 2]^T + t[1, −2, −1]^T.
22. If a plane contains two distinct points P1 and P2, show that it contains every point on the line through P1 and P2.

23. Find the shortest distance between the following pairs of parallel lines.
    (a) [x, y, z]^T = [2, −1, 3]^T + t[1, −1, 4]^T;  [x, y, z]^T = [1, 0, 1]^T + t[1, −1, 4]^T
    (b) [x, y, z]^T = [3, 0, 2]^T + t[3, 1, 0]^T;  [x, y, z]^T = [−1, 2, 2]^T + t[3, 1, 0]^T

24. Find the shortest distance between the following pairs of nonparallel lines and find the points on the lines that are closest together.
    (a) [x, y, z]^T = [3, 0, …]^T + s[2, 1, …]^T;  [x, y, z]^T = [1, 1, …]^T + t[1, 0, …]^T
    (b) [x, y, z]^T = [1, −1, …]^T + s[1, 1, …]^T;  [x, y, z]^T = [2, −1, …]^T + t[3, 1, …]^T
    (c) [x, y, z]^T = [3, 1, −1]^T + s[1, 1, −1]^T;  [x, y, z]^T = [1, 2, 0]^T + t[1, 0, 2]^T
    (d) [x, y, z]^T = [1, 2, 3]^T + s[2, −1, −1]^T;  [x, y, z]^T = [3, 0, 0]^T + t[1, 1, 0]^T

25. Show that two lines in the plane with slopes m1 and m2 are perpendicular if and only if m1m2 = −1. [Hint: Example 11 Section 4.1.]

26. (a) Show that, of the four diagonals of a cube, no pair is perpendicular.
    (b) Show that each diagonal is perpendicular to the face diagonals it does not meet.

27. Given a rectangular solid with sides of lengths 1, 1, and √2, find the angle between a diagonal and one of the longest sides.

29. Let A, B, and C(2, −1, 1) be the vertices of a triangle where AB is parallel to [1, −1, 1]^T, AC is parallel to [2, 0, −1]^T, and angle C = 90°. Find the equation of the line through B and C.

30. If the diagonals of a parallelogram have equal length, show that the parallelogram is a rectangle.

31. Given v = [x, y, z]^T, show that the projections of v on i, j, and k are xi, yj, and zk, respectively.

32. (a) Can u · v = −7 if ‖u‖ = 3 and ‖v‖ = 2? Defend your answer.
    (b) Find u · v if u = [2, −1, 2]^T, ‖v‖ = 6, and the angle between u and v is 2π/3.

33. Show that … for any vectors u and v.

34. (a) Show that ‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²) for any vectors u and v.
    (b) What does this say about parallelograms?

35. Show that if the diagonals of a parallelogram are perpendicular, it is necessarily a rhombus. [Hint: Example 5.]

36. Let A and B be the end points of a diameter of a circle (see the diagram). If C is any point on the circle, show that AC and BC are perpendicular. [Hint: Express AC and BC in terms of u = OA and v = OC, where O is the centre.]

38. Let u, v, and w be pairwise orthogonal vectors.
    (a) Show that ‖u + v + w‖² = ‖u‖² + ‖v‖² + ‖w‖².
    (b) If u, v, and w are all the same length, show that they all make the same angle with u + v + w.
SECTION 4.3 More on the Cross Product 213
The cross product v × w of two 3-vectors v = [x1, y1, z1]^T and w = [x2, y2, z2]^T was defined in
Section 4.2 where we observed that it can be best remembered using a determinant:

                [ i  x1  x2 ]
    v × w = det [ j  y1  y2 ] = (y1z2 − z1y2)i − (x1z2 − z1x2)j + (x1y2 − y1x2)k     (∗)
                [ k  z1  z2 ]

Here i = [1, 0, 0]^T, j = [0, 1, 0]^T, and k = [0, 0, 1]^T are the coordinate vectors, and the determinant
is expanded along the first column. We observed (but did not prove) in Theorem 5
Section 4.2 that v × w is orthogonal to both v and w. This follows easily from the
next result.
Theorem 1

If u = [x0, y0, z0]^T, v = [x1, y1, z1]^T, and w = [x2, y2, z2]^T, then

                      [ x0  x1  x2 ]
    u · (v × w) = det [ y0  y1  y2 ]
                      [ z0  z1  z2 ]
PROOF

Recall that u · (v × w) is computed by multiplying corresponding components of
u and v × w and then adding. Using (∗), the result is:

                                                                        [ x0  x1  x2 ]
    u · (v × w) = x0(y1z2 − z1y2) + y0[−(x1z2 − z1x2)] + z0(x1y2 − y1x2) = det [ y0  y1  y2 ]
                                                                        [ z0  z1  z2 ]

where the last determinant is expanded along column 1.
The result in Theorem 1 can be succinctly stated as follows: If u, v, and w are three
vectors in ℝ³, then

    u · (v × w) = det[u v w]

where [u v w] denotes the matrix with u, v, and w as its columns. Now it is clear
that v × w is orthogonal to both v and w because the determinant of a matrix is
zero if two columns are identical.
Because of (∗) and Theorem 1, several of the following properties of the cross
product follow from properties of determinants (they can also be verified directly).
Theorem 2

Let u, v, and w denote arbitrary vectors in ℝ³.
1. u × v is a vector.
2. u × v is orthogonal to both u and v.
3. u × 0 = 0 = 0 × u.
4. u × u = 0.
5. u × v = −(v × u).
6. (ku) × v = k(u × v) = u × (kv) for any scalar k.
7. u × (v + w) = (u × v) + (u × w).
8. (v + w) × u = (v × u) + (w × u).
PROOF
(1) is clear; (2) follows from Theorem 1; and (3) and (4) follow because the
determinant of a matrix is zero if one column is zero or if two columns are
identical. If two columns are interchanged, the determinant changes sign, and
this proves (5). The proofs of (6), (7), and (8) are left as Exercise 15.
We now come to a fundamental relationship between the dot and cross products.

Theorem 3

Lagrange Identity¹¹

If u and v are any two vectors in ℝ³, then

    ‖u × v‖² = ‖u‖²‖v‖² − (u · v)²

PROOF

Given u and v, introduce a coordinate system and write u = [x1, y1, z1]^T and v = [x2, y2, z2]^T in
component form. Then all the terms in the identity can be computed in terms of
the components. The detailed proof is left as Exercise 14.

Joseph Louis Lagrange. Photo © Corbis.
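Because the Lagrange identity is a polynomial identity in the components, it can be spot-checked numerically. The following Python sketch (not part of the text) verifies it on a randomly chosen pair of integer vectors:

```python
# Sketch: numerical check of the Lagrange identity
#   ||u x v||^2 = ||u||^2 ||v||^2 - (u.v)^2
import random

def cross(v, w):
    (x1, y1, z1), (x2, y2, z2) = v, w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
u = [random.randint(-9, 9) for _ in range(3)]
v = [random.randint(-9, 9) for _ in range(3)]
lhs = dot(cross(u, v), cross(u, v))
rhs = dot(u, u) * dot(v, v) - dot(u, v) ** 2
print(lhs == rhs)   # True
```

A spot check is not a proof, of course; the point is only that the identity survives on arbitrary inputs, which is what Exercise 14 asks the reader to establish in general.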
Theorem 4

If u and v are two nonzero vectors and θ is the angle between u and v, then

1. ‖u × v‖ = ‖u‖‖v‖ sin θ = area of the parallelogram determined by u and v.
2. u and v are parallel if and only if u × v = 0.
11 Joseph Louis Lagrange (1736–1813) was born in Italy and spent his early years in Turin. At the age of 19 he solved a famous
problem by inventing an entirely new method, known today as the calculus of variations, and went on to become one of the greatest
mathematicians of all time. His work brought a new level of rigour to analysis and his Mécanique Analytique is a masterpiece in
which he introduced methods still in use. In 1766 he was appointed to the Berlin Academy by Frederick the Great who asserted
that the "greatest mathematician in Europe" should be at the court of the "greatest king in Europe." After the death of Frederick,
Lagrange went to Paris at the invitation of Louis XVI. He remained there throughout the revolution and was made a count
by Napoleon.
PROOF OF (2)
By (1), u × v = 0 if and only if the area of the parallelogram is zero. By Figure 1
the area vanishes if and only if u and v have the same or opposite direction—that
is, if and only if they are parallel.
EXAMPLE 1

Find the area of the triangle with vertices P(2, 1, 0), Q(3, −1, 1), and R(1, 0, 1).

Solution ► We have RP = [1, 1, −1]^T and RQ = [2, −1, 0]^T. The area of the triangle is half
the area of the parallelogram (see the diagram), and so equals ½‖RP × RQ‖. We
have

                  [ i   1   2 ]
    RP × RQ = det [ j   1  −1 ] = [−1, −2, −3]^T
                  [ k  −1   0 ]

so the area of the triangle is ½‖RP × RQ‖ = ½√(1 + 4 + 9) = ½√14.
If three vectors u, v, and w are given, they determine a "squashed" rectangular
solid called a parallelepiped (Figure 2), and it is often useful to be able to find the
volume of such a solid. The base of the solid is the parallelogram determined by
u and v, so it has area A = ‖u × v‖ by Theorem 4. The height of the solid is the
length h of the projection of w on u × v. Hence

    h = ‖(w · (u × v) / ‖u × v‖²)(u × v)‖ = |w · (u × v)| / ‖u × v‖ = |w · (u × v)| / A

Thus the volume of the parallelepiped is hA = |w · (u × v)|. This proves

FIGURE 2

Theorem 5

The volume of the parallelepiped determined by three vectors u, v, and w is |w · (u × v)|.
EXAMPLE 2

Find the volume of the parallelepiped determined by the vectors
w = [1, 2, −1]^T, u = [1, 1, 0]^T, and v = [−2, 0, 1]^T.

                                           [  1  1  −2 ]
Solution ► By Theorem 1, w · (u × v) = det [  2  1   0 ] = −3.
                                           [ −1  0   1 ]

Hence the volume is |w · (u × v)| = |−3| = 3 by Theorem 5.
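The volume computation of Example 2 can be sketched directly from the triple product (not part of the text; function names are ours):

```python
# Sketch: volume of a parallelepiped as |w . (u x v)|, checked on Example 2.

def cross(v, w):
    (x1, y1, z1), (x2, y2, z2) = v, w
    return [y1 * z2 - z1 * y2, -(x1 * z2 - z1 * x2), x1 * y2 - y1 * x2]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w, u, v = [1, 2, -1], [1, 1, 0], [-2, 0, 1]
triple = dot(w, cross(u, v))     # the 3 x 3 determinant of Theorem 1
print(triple, abs(triple))       # -3 3
```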
Right-hand Rule

If the vector u × v is grasped in the right hand and the fingers curl around from u to v
through the angle θ, the thumb points in the direction of u × v.

To indicate why the right-hand rule holds, introduce coordinates so that u and
v have component form u = [a, 0, 0]^T and v = [b, c, 0]^T where a > 0 and c > 0. The situation
is depicted in Figure 4. The right-hand rule asserts that u × v should point in the
positive z direction. But our definition of u × v gives

                [ i  a  b ]
    u × v = det [ j  0  c ] = [0, 0, ac]^T = (ac)k
                [ k  0  0 ]

and (ac)k has the positive z direction because ac > 0.
EXERCISES 4.3

1. If i, j, and k are the coordinate vectors, verify that i × j = k, j × k = i, and k × i = j.

2. Show that u × (v × w) need not equal (u × v) × w by calculating both when
   u = [1, 1, 1]^T, v = [1, 1, 0]^T, and w = [0, 0, 1]^T.

3. Find two unit vectors orthogonal to both u and v if:
   (a) u = [1, 2, 2]^T, v = [2, −1, 2]^T
   (b) u = [1, 2, −1]^T, v = [3, 1, 2]^T
4. Find the area of the triangle with the following vertices.
   (a) A(3, −1, 2), B(1, 1, 0), and C(1, 2, −1)
   (b) A(3, 0, 1), B(5, 1, 0), and C(7, 2, −1)
   (c) A(1, 1, −1), B(2, 0, 1), and C(1, −1, 3)
   (d) A(3, −1, 1), B(4, 1, 0), and C(2, −3, 0)

5. Find the volume of the parallelepiped determined by w, u, and v when:
   (a) w = [2, 1, 1]^T, v = [1, 0, 2]^T, and u = [2, 1, −1]^T
   (b) w = [1, 0, 3]^T, v = [2, 1, −3]^T, and u = [1, 1, 1]^T

6. Let P0 be a point with vector p0, and let ax + by + cz = d be the equation of a plane with normal n = [a, b, c]^T.
   (a) Show that the point on the plane closest to P0 has vector p given by

       p = p0 + [(d − (p0 · n))/‖n‖²] n.

       [Hint: p = p0 + tn for some t, and p · n = d.]
   (b) Show that the shortest distance from P0 to the plane is |d − (p0 · n)| / ‖n‖.
   (c) Let P0′ denote the reflection of P0 in the plane—that is, the point on the opposite side
       of the plane such that the line through P0 and P0′ is perpendicular to the plane.
       Show that p0 + 2[(d − (p0 · n))/‖n‖²] n is the vector of P0′.

7. Simplify (au + bv) × (cu + dv).

8. Show that the shortest distance from a point P to the line through P0 with direction vector d is ‖P0P × d‖ / ‖d‖.

9. Let u and v be nonzero, nonorthogonal vectors. If θ is the angle between them, show that tan θ = ‖u × v‖ / (u · v).

10. Show that points A, B, and C are all on one line if and only if AB × AC = 0.

11. Show that points A, B, C, and D are all on one plane if and only if AD · (AB × AC) = 0.

12. Use Theorem 5 to confirm that, if u, v, and w are mutually perpendicular, the (rectangular) parallelepiped they determine has volume ‖u‖‖v‖‖w‖.

13. Show that the volume of the parallelepiped determined by u, v, and u × v is ‖u × v‖².

14. Complete the proof of Theorem 3.

15. Prove the following properties in Theorem 2.
    (a) Property 6    (b) Property 7    (c) Property 8

16. (a) Show that w · (u × v) = u · (v × w) = v · (w × u) holds for all vectors w, u, and v.
    (b) Show that v − w and (u × v) + (v × w) + (w × u) are orthogonal.

17. Show that u × (v × w) = (u · w)v − (u · v)w. [Hint: First do it for u = i, j, and k; then write u = xi + yj + zk and use Theorem 2.]

18. Prove the Jacobi identity: u × (v × w) + v × (w × u) + w × (u × v) = 0. [Hint: The preceding exercise.]

19. Show that

    (u × v) · (w × z) = det [ u · w   u · z ]
                            [ v · w   v · z ].

    [Hint: Exercises 16 and 17.]

20. Let P, Q, R, and S be four points, not all on one plane, as in the diagram. Show that the volume of the pyramid they determine is (1/6)|PQ · (PR × PS)|.
    [Hint: The volume of a cone with base area A and height h as in the diagram below right is (1/3)Ah.]
SECTION 4.4 Linear Operators on ℝ³ 219
21. Consider a triangle with vertices A, B, and C, as in the diagram below. Let α, β, and γ
    denote the angles at A, B, and C, respectively, and let a, b, and c denote the lengths of the
    sides opposite them. …

23. Let A and B be points other than the origin, and let a and b be their vectors. If a and b
    are not parallel, show that the plane through A, B, and the origin is given by …
Theorem 1

If a transformation T : ℝ³ → ℝ³ satisfies T(0) = 0 and

    ‖T(v) − T(w)‖ = ‖v − w‖  for all v and w in ℝ³,             (∗)

that is, if T preserves distance, then T is linear.

PROOF

Since T(0) = 0, taking w = 0 in (∗) shows that ‖T(v)‖ = ‖v‖ for all v in ℝ³,
that is, T preserves length. Also, ‖T(v) − T(w)‖² = ‖v − w‖² by (∗). Since
‖v − w‖² = ‖v‖² − 2v · w + ‖w‖² always holds, it follows that
T(v) · T(w) = v · w for all v and w. Hence (by Theorem 2 Section 4.2) the
angle between T(v) and T(w) is the same as the angle between v and w for
all (nonzero) vectors v and w in ℝ³.

With this we can show that T is linear. Given nonzero vectors v and w in ℝ³,
the vector v + w is the diagonal of the parallelogram determined by v and w. By
the preceding paragraph, the effect of T is to carry this entire parallelogram to the
parallelogram determined by T(v) and T(w), with diagonal T(v + w). But this
diagonal is T(v) + T(w) by the parallelogram law (see Figure 1).
In other words, T(v + w) = T(v) + T(w). A similar argument shows that
T(av) = aT(v) for all scalars a, proving that T is indeed linear.

FIGURE 1
Recall that reflection Qm in the line y = mx and projection Pm on that line are
both linear, and

                       1     [ 1 − m²    2m    ]                        1     [ 1   m  ]
    Qm has matrix  -------- [  2m     m² − 1 ]   and  Pm has matrix  -------- [ m   m² ].
                    1 + m²                                            1 + m²

We now look at the analogues in ℝ³.
Let L denote a line through the origin in ℝ³. Given a vector v in ℝ³, the
reflection QL(v) of v in L and the projection PL(v) of v on L are defined in Figure 2.
In the same figure, we see that

    PL(v) = v + ½[QL(v) − v] = ½[QL(v) + v]                      (∗∗)

so the fact that QL is linear (by Theorem 1) shows that PL is also linear.¹²
However, if d = [a, b, c]^T ≠ 0 is a direction vector for L and v = [x, y, z]^T, Theorem 4
Section 4.2 gives

                                  ax + by + cz  [ a ]         1        [ a²  ab  ac ] [ x ]
    PL(v) = (v · d / ‖d‖²) d  =  -------------- [ b ]  =  ------------ [ ab  b²  bc ] [ y ]
                                  a² + b² + c²  [ c ]     a² + b² + c² [ ac  bc  c² ] [ z ]

as the reader can verify. Note that this shows directly that PL is a matrix
transformation and so gives another proof that it is linear.

12 Note that Theorem 1 does not apply to PL since it does not preserve distance.
Theorem 2

Let L denote the line through the origin in ℝ³ with direction vector d = [a, b, c]^T ≠ 0. Then
PL and QL are both linear and

                         1        [ a²  ab  ac ]
    PL has matrix  ------------ [ ab  b²  bc ] ,
                   a² + b² + c² [ ac  bc  c² ]

                         1        [ a² − b² − c²    2ab             2ac           ]
    QL has matrix  ------------ [ 2ab             b² − a² − c²    2bc           ] .
                   a² + b² + c² [ 2ac             2bc             c² − a² − b²  ]

PROOF

It remains to find the matrix of QL. But (∗∗) implies that QL(v) = 2PL(v) − v for
each vector v = [x, y, z]^T in ℝ³, so

            (      2        [ a²  ab  ac ]   [ 1  0  0 ] ) [ x ]
    QL(v) = ( ------------ [ ab  b²  bc ] − [ 0  1  0 ] ) [ y ]
            ( a² + b² + c² [ ac  bc  c² ]   [ 0  0  1 ] ) [ z ]

                  1        [ a² − b² − c²    2ab             2ac           ] [ x ]
          = ------------ [ 2ab             b² − a² − c²    2bc           ] [ y ]
            a² + b² + c² [ 2ac             2bc             c² − a² − b²  ] [ z ]

as required.
In the same way, let M denote a plane through the origin in ℝ³ with normal
n = [a, b, c]^T ≠ 0, and write PM(v) for the projection of v = [x, y, z]^T on M. Since
v − PM(v) is the projection of v on the normal n, we obtain

                 ax + by + cz  [ a ]         1        [ b² + c²   −ab        −ac      ] [ x ]
    PM(v) = v − -------------- [ b ]  =  ------------ [ −ab        a² + c²   −bc      ] [ y ]
                 a² + b² + c²  [ c ]     a² + b² + c² [ −ac        −bc        a² + b² ] [ z ]
Theorem 3

Let M denote the plane through the origin in ℝ³ with normal n = [a, b, c]^T ≠ 0. Then PM
and QM are both linear and

                         1        [ b² + c²   −ab        −ac      ]
    PM has matrix  ------------ [ −ab        a² + c²   −bc      ] ,
                   a² + b² + c² [ −ac        −bc        a² + b² ]

                         1        [ b² + c² − a²    −2ab             −2ac           ]
    QM has matrix  ------------ [ −2ab             a² + c² − b²    −2bc           ] .
                   a² + b² + c² [ −2ac             −2bc             a² + b² − c²  ]

PROOF

It remains to compute the matrix of QM. Since QM(v) = 2PM(v) − v for each
v in ℝ³, the computation is similar to the above and is left as an exercise for
the reader.
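The matrices in Theorem 3 are easy to verify computationally. The following Python sketch (not part of the text; the sample normal is ours) builds PM and QM for n = [1, 2, −2]^T and checks the defining properties: PM kills the normal, QM reverses it, and reflecting twice returns every vector.

```python
# Sketch: the projection and reflection matrices of Theorem 3 for a sample normal.
from fractions import Fraction as F

def matvec(M, v):
    return [sum(r * x for r, x in zip(row, v)) for row in M]

a, b, c = F(1), F(2), F(-2)
s = a * a + b * b + c * c
PM = [[(b * b + c * c) / s, -a * b / s,          -a * c / s],
      [-a * b / s,          (a * a + c * c) / s, -b * c / s],
      [-a * c / s,          -b * c / s,          (a * a + b * b) / s]]
QM = [[2 * p - (1 if i == j else 0) for j, p in enumerate(row)]
      for i, row in enumerate(PM)]     # QM = 2 PM - I

n = [a, b, c]
print(matvec(PM, n))   # the zero vector: the normal projects to 0
print(matvec(QM, n))   # -n: reflection reverses the normal
```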
Rotations

In Section 2.6 we studied the rotation Rθ : ℝ² → ℝ² counterclockwise about the
origin through the angle θ. Moreover, we showed in Theorem 4 Section 2.6 that
Rθ is linear and has matrix

    [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

One extension of this is given in the following example.

EXAMPLE 1

Let Rz,θ : ℝ³ → ℝ³ denote rotation of ℝ³ about the z axis through an angle θ
from the positive x axis toward the positive y axis. Show that Rz,θ is linear and
find its matrix.
Solution ► Let i = [1, 0, 0]^T, j = [0, 1, 0]^T, and k = [0, 0, 1]^T denote the standard basis of ℝ³; we must find
Rz,θ(i), Rz,θ(j), and Rz,θ(k). Clearly Rz,θ(k) = k. The effect of Rz,θ on the x-y plane
is to rotate it counterclockwise through the angle θ. Hence Figure 4 gives

    Rz,θ(i) = [cos θ, sin θ, 0]^T,   Rz,θ(j) = [−sin θ, cos θ, 0]^T

Since Rz,θ preserves distance and fixes the origin, it is linear by Theorem 1, and its
matrix has these vectors as columns:

    [ cos θ   −sin θ   0 ]
    [ sin θ    cos θ   0 ]
    [ 0        0       1 ]

FIGURE 4
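The matrix found in Example 1 can be sketched and sanity-checked in Python (not part of the text; the function name is ours):

```python
# Sketch: rotation about the z axis through theta, as in Example 1.
import math

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c,  -s,  0.0],
            [s,   c,  0.0],
            [0.0, 0.0, 1.0]]

def matvec(M, v):
    return [sum(r * x for r, x in zip(row, v)) for row in M]

R = rot_z(math.pi / 2)                 # quarter turn about the z axis
print(matvec(R, [1.0, 0.0, 0.0]))      # approximately [0, 1, 0]: i goes to j
print(matvec(R, [0.0, 0.0, 1.0]))      # exactly [0, 0, 1]: k is fixed
```

The first output is only approximately [0, 1, 0] because cos(π/2) is not exactly zero in floating point.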
Example 1 begs to be generalized. Given a line L through the origin in 3, every
rotation about L through a fixed angle is clearly distance preserving, and so is a
linear operator by Theorem 1. However, giving a precise description of the matrix
of this rotation is not easy and will have to wait until more techniques are available.
Theorem 4
This result is illustrated in Figure 7, and was used in Examples 15 and 16 Section
2.2 to reveal the effect of expansion and shear transformations.

Now we are interested in the effect of a linear transformation T : ℝ³ → ℝ³ on
the parallelepiped determined by three vectors u, v, and w in ℝ³ (see the discussion
preceding Theorem 5 Section 4.3). If T has matrix A, Theorem 4 shows that this
parallelepiped is carried to the parallelepiped determined by T(u) = Au, T(v) = Av,
and T(w) = Aw. In particular, we want to discover how the volume changes, and it
turns out to be closely related to the determinant of the matrix A.

FIGURE 7

Theorem 5

Let vol(u, v, w) denote the volume of the parallelepiped determined by three vectors u, v,
and w in ℝ³, and let area(p, q) denote the area of the parallelogram determined by two
vectors p and q in ℝ². Then:

1. If A is a 3 × 3 matrix, then vol(Au, Av, Aw) = |det(A)| · vol(u, v, w).
2. If A is a 2 × 2 matrix, then area(Ap, Aq) = |det(A)| · area(p, q).
PROOF

1. Let [u v w] denote the 3 × 3 matrix with columns u, v, and w. Then

       vol(Au, Av, Aw) = |Au · (Av × Aw)|

   by Theorem 5 Section 4.3. Now apply Theorem 1 Section 4.3 twice to get

       Au · (Av × Aw) = det[Au Av Aw] = det{A[u v w]}
                      = det(A) det[u v w]
                      = det(A)(u · (v × w))

   where we used Definition 2.9 and the product theorem for determinants.
   Finally (1) follows from Theorem 5 Section 4.3 by taking absolute values.

2. Given p = [x, y]^T in ℝ², write p1 = [x, y, 0]^T in ℝ³. By the diagram,
   area(p, q) = vol(p1, q1, k) where k is the (length 1) coordinate vector
   along the z axis. If A is a 2 × 2 matrix, write A1 = [ A 0 ; 0 1 ] in block form,
   and observe that (Av)1 = A1v1 for all v in ℝ² and A1k = k. Hence
   part (1) of this theorem shows

       area(Ap, Aq) = vol(A1p1, A1q1, A1k)
                    = |det(A1)| vol(p1, q1, k)
                    = |det(A)| area(p, q)

   as required.
Define the unit square and unit cube to be the square and cube corresponding
to the coordinate vectors in ℝ² and ℝ³, respectively. Then Theorem 5 gives a
geometrical meaning to the determinant of a matrix A:
• If A is a 2 × 2 matrix, then |det(A)| is the area of the image of the unit square
under multiplication by A;
• If A is a 3 × 3 matrix, then |det(A)| is the volume of the image of the unit cube
under multiplication by A.
These results, together with the importance of areas and volumes in geometry, were
among the reasons for the initial development of determinants.
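Theorem 5(2) can be illustrated with a one-line computation. The following Python sketch (not part of the text; the sample matrix is ours) carries the unit square to a parallelogram and compares its area with |det(A)|:

```python
# Sketch: |det(A)| as the area of the image of the unit square (Theorem 5(2)).

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def matvec(A, v):
    return [sum(r * x for r, x in zip(row, v)) for row in A]

def area(p, q):
    # area of the parallelogram spanned by p and q in the plane
    return abs(p[0] * q[1] - p[1] * q[0])

A = [[2, 1],
     [1, 3]]                 # det(A) = 5
p, q = [1, 0], [0, 1]        # the unit square has area 1
print(area(matvec(A, p), matvec(A, q)))   # 5
```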
EXERCISES 4.4

1. In each case show that T is either projection on a line, reflection in a line, or rotation through
   an angle, and find the line or angle.
   (a) T[x, y]^T = (1/5)[x + 2y, 2x + 4y]^T
   (b) T[x, y]^T = (1/2)[x − y, y − x]^T
   (c) T[x, y]^T = (1/√2)[x − y, −x − y]^T
   (d) T[x, y]^T = (1/5)[−3x + 4y, 4x + 3y]^T
   (e) T[x, y]^T = [−y, −x]^T
   (f) T[x, y]^T = (1/2)[x − √3 y, √3 x + y]^T

2. Determine the effect of the following transformations.
   (a) Rotation through π/2, followed by projection on the y axis, followed by reflection in the line y = x.
   (b) Projection on the line y = x followed by projection on the line y = −x.
   (c) Projection on the x axis followed by reflection in the line y = x.
3. In each case solve the problem by finding the matrix of the operator.
(a) Find the projection of v = (1, -2, 3) on the plane with equation 3x - 5y + 2z = 0.
(b) Find the projection of v = (0, 1, -3) on the plane with equation 2x - y + 4z = 0.
(c) Find the reflection of v = (1, -2, 3) in the plane with equation x - y + 3z = 0.
(d) Find the reflection of v = (0, 1, -3) in the plane with equation 2x + y - 5z = 0.
(e) Find the reflection of v = (2, 5, -1) in the line with equation (x, y, z) = t(1, 1, -2).
(f) Find the projection of v = (1, -1, 7) on the line with equation (x, y, z) = t(3, 0, 4).
(g) Find the projection of v = (1, 1, -3) on the line with equation (x, y, z) = t(2, 0, -3).
(h) Find the reflection of v = (2, -5, 0) in the line with equation (x, y, z) = t(1, 1, -3).

4. (a) Find the rotation of v = (2, 3, -1) about the z axis through θ = π/4.
(b) Find the rotation of v = (1, 0, 3) about the z axis through θ = π/6.

5. Find the matrix of the rotation in ℝ³ about the x axis through the angle θ (from the positive y axis to the positive z axis).

6. Find the matrix of the rotation about the y axis through the angle θ (from the positive x axis to the positive z axis).

7. If A is 3 × 3, show that the image of the line in ℝ³ through p0 with direction vector d is the line through Ap0 with direction vector Ad, assuming that Ad ≠ 0. What happens if Ad = 0?

8. If A is 3 × 3 and invertible, show that the image of the plane through the origin with normal n is the plane through the origin with normal n1 = Bn where B = (A⁻¹)ᵀ. [Hint: Use the fact that v · w = vᵀw to show that n1 · (Ap) = n · p for each p in ℝ³.]

9. Let L be the line through the origin in ℝ² with direction vector d = (a, b) ≠ 0.
(a) If PL denotes projection on L, show that PL has matrix (1/(a² + b²)) [a² ab; ab b²].
(b) If QL denotes reflection in L, show that QL has matrix (1/(a² + b²)) [a² - b² 2ab; 2ab b² - a²].

10. Let n be a nonzero vector in ℝ³, let L be the line through the origin with direction vector n, and let M be the plane through the origin with normal n. Show that PL(v) = QL(v) + PM(v) for all v in ℝ³. [In this case, we say that PL = QL + PM.]

11. If M is the plane through the origin in ℝ³ with normal n = (a, b, c), show that QM has matrix
(1/(a² + b² + c²)) [b² + c² - a²  -2ab  -2ac; -2ab  a² + c² - b²  -2bc; -2ac  -2bc  a² + b² - c²].
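The reflection matrix in Exercise 11 lends itself to a numerical sanity check: a reflection in a plane reverses the normal, fixes vectors in the plane, and squares to the identity. A sketch using numpy, with an arbitrarily chosen normal (not a value from the text):

```python
import numpy as np

a, b, c = 1.0, 2.0, 2.0          # arbitrary nonzero normal n = (a, b, c)
n = np.array([a, b, c])

# Reflection in the plane through the origin with normal n (Exercise 11).
QM = (1.0 / (a*a + b*b + c*c)) * np.array([
    [b*b + c*c - a*a, -2*a*b,           -2*a*c],
    [-2*a*b,           a*a + c*c - b*b, -2*b*c],
    [-2*a*c,          -2*b*c,            a*a + b*b - c*c],
])

assert np.allclose(QM @ n, -n)           # the normal is reversed
assert np.allclose(QM @ QM, np.eye(3))   # reflecting twice is the identity
v = np.array([2.0, -1.0, 0.0])           # a vector in the plane (n · v = 0)
assert np.allclose(QM @ v, v)            # vectors in the plane are fixed
```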
FIGURE 4
by multiplying by the matrix
R_π/6 = [cos(π/6) -sin(π/6); sin(π/6) cos(π/6)] = [0.866 -0.5; 0.5 0.866].
This gives
R_π/6 D = [0.866 -0.5; 0.5 0.866] [0 6 5 1 3; 0 0 3 3 9] = [0 5.196 2.83 -0.634 -1.902; 0 3 5.098 3.098 9.294]
and is plotted in Figure 5.
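Rotating every vertex at once by multiplying the data matrix is one line in code. A sketch using numpy (the data matrix is the letter's vertex list from the text):

```python
import numpy as np

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Data matrix: each column is a vertex of the letter.
D = np.array([[0, 6, 5, 1, 3],
              [0, 0, 3, 3, 9]], dtype=float)

rotated = R @ D   # rotates every vertex at once
print(np.round(rotated, 3))
```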
This poses a problem: How do we rotate at a point other than the origin? It
turns out that we can do this when we have solved another more basic problem. It
is clearly important to be able to translate a screen image by a fixed vector w, that is,
to apply the transformation Tw : ℝ² → ℝ² given by Tw(v) = v + w for all v in ℝ². The
problem is that these translations are not matrix transformations ℝ² → ℝ² because
they do not carry 0 to 0 (unless w = 0). However, there is a clever way around this.
FIGURE 5

13 If v0 and v1 are vectors, the vector from v0 to v1 is d = v1 - v0. So a vector v lies on the line segment between v0 and v1 if and only
if v = v0 + td for some number t in the range 0 ≤ t ≤ 1. Thus the image of this segment is the set of vectors Av = Av0 + tAd with
0 ≤ t ≤ 1; that is, the image is the segment between Av0 and Av1.
SECTION 4.5 An Application to Computer Graphics
The idea is to represent a point v = (x, y) as a 3 × 1 column [x; y; 1], called the
homogeneous coordinates of v. Then translation by w = (p, q) can be achieved
by multiplying by a 3 × 3 matrix:
[1 0 p; 0 1 q; 0 0 1] [x; y; 1] = [x + p; y + q; 1] = [Tw(v); 1]
Thus, by using homogeneous coordinates we can implement the translation Tw in
the top two coordinates. On the other hand, the matrix transformation induced by
A = [a b; c d] is also given by a 3 × 3 matrix:
[a b 0; c d 0; 0 0 1] [x; y; 1] = [ax + by; cx + dy; 1] = [Av; 1]
So everything can be accomplished at the expense of using 3 × 3 matrices and
homogeneous coordinates.
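Both displayed 3 × 3 matrices can be checked directly. A small numpy sketch (the translation vector and the matrix A are arbitrary choices, not values from the text):

```python
import numpy as np

p, q = 3.0, -2.0   # translation vector w = (p, q), arbitrary values

# Translation as a 3x3 matrix acting on homogeneous coordinates (x, y, 1).
T = np.array([[1, 0, p],
              [0, 1, q],
              [0, 0, 1]])

v = np.array([5.0, 7.0, 1.0])   # the point (5, 7) in homogeneous coordinates
assert np.allclose(T @ v, [5 + p, 7 + q, 1])

# An ordinary 2x2 matrix transformation A embeds the same way.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
A1 = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 2)), np.ones((1, 1))]])
assert np.allclose((A1 @ v)[:2], A @ v[:2])
```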
EXAMPLE 1

Rotate the letter A in Figure 2 through π/6 about the point (4, 5).

Solution ► Using homogeneous coordinates for the vertices of the letter results in
a data matrix with three rows:
Kd = [0 6 5 1 3; 0 0 3 3 9; 1 1 1 1 1]
If we write w = (4, 5), the idea is to use a composite of transformations: first
translate the letter by -w so that the point w moves to the origin, then rotate
this translated letter, and then translate it by w back to its original position.
The matrix arithmetic is as follows (remember the order of composition!):
[1 0 4; 0 1 5; 0 0 1] [0.866 -0.5 0; 0.5 0.866 0; 0 0 1] [1 0 -4; 0 1 -5; 0 0 1] [0 6 5 1 3; 0 0 3 3 9; 1 1 1 1 1]
= [3.036 8.232 5.866 2.402 1.134; -1.33 1.67 3.768 1.768 7.964; 1 1 1 1 1]

FIGURE 6
This is plotted in Figure 6.
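The translate-rotate-translate composite in Example 1 can be packaged as functions; a numpy sketch (the function names are mine, not the book's):

```python
import numpy as np

def translate(p, q):
    """3x3 homogeneous translation by (p, q)."""
    return np.array([[1, 0, p], [0, 1, q], [0, 0, 1]], dtype=float)

def rotate(theta):
    """3x3 homogeneous rotation by theta about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

w = (4.0, 5.0)
# Rotate about w: translate w to the origin, rotate, translate back.
M = translate(*w) @ rotate(np.pi / 6) @ translate(-w[0], -w[1])

# The centre of rotation is a fixed point.
assert np.allclose(M @ [4, 5, 1], [4, 5, 1])

Kd = np.array([[0, 6, 5, 1, 3],
               [0, 0, 3, 3, 9],
               [1, 1, 1, 1, 1]], dtype=float)
print(np.round(M @ Kd, 3))  # first column ≈ (3.036, -1.33, 1)
```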
This discussion merely touches the surface of computer graphics, and the
reader is referred to specialized books on the subject. Realistic graphic rendering
requires an enormous number of matrix calculations. In fact, matrix multiplication
algorithms are now embedded in microchip circuits, and can perform over 100
million matrix multiplications per second. This is particularly important in the
field of three-dimensional graphics where the homogeneous coordinates have four
components and 4 × 4 matrices are required.
EXERCISES 4.5

1. Consider the letter A described in Figure 2. Find the data matrix for the letter obtained by:
(a) Rotating the letter through π/4 about the origin.
(b) Rotating the letter through π/4 about the point (1, 2).

2. Find the matrix for turning the letter A in Figure 2 upside-down in place.

4. Find the 3 × 3 matrix for rotating through the angle θ about the point P(a, b).

5. Find the reflection of the point P in the line y = 1 + 2x in ℝ² if:
(a) P = P(1, 1)
(b) P = P(1, 4)
(c) What about P = P(1, 3)? Explain. [Hint: Example 1 and Section 4.4.]
SUPPLEMENTARY EXERCISES FOR CHAPTER 4

1. Suppose that u and v are nonzero vectors. If u and v are not parallel, and au + bv = a1u + b1v, show that a = a1 and b = b1.

2. Consider a triangle with vertices A, B, and C. Let E and F be the midpoints of sides AB and AC, respectively, and let the medians EC and FB meet at O. Write EO = s EC and FO = t FB, where s and t are scalars. Show that s = t = 1/3 by expressing AO two ways in the form a EO + b AC, and applying Exercise 1. Conclude that the medians of a triangle meet at the point on each that is one-third of the way from the midpoint to the vertex (and so are concurrent).

3. A river flows at 1 km/h and a swimmer moves at 2 km/h (relative to the water). At what angle must he swim to go straight across? What is his resulting speed?

4. A wind is blowing from the south at 75 knots, and an airplane flies heading east at 100 knots. Find the resulting velocity of the airplane.

5. An airplane pilot flies at 300 km/h in a direction 30° south of east. The wind is blowing from the south at 150 km/h.
(a) Find the resulting direction and speed of the airplane.
(b) Find the speed of the airplane if the wind is from the west (at 150 km/h).

6. A rescue boat has a top speed of 13 knots. The captain wants to go due east as fast as possible in water with a current of 5 knots due south. Find the velocity vector v = (x, y) that she must achieve, assuming the x and y axes point east and north, respectively, and find her resulting speed.

7. A boat goes 12 knots heading north. The current is 5 knots from the west. In what direction does the boat actually move and at what speed?

8. Show that the distance from a point A (with vector a) to the plane with vector equation n · p = d is (1/‖n‖)|n · a - d|.

9. If two distinct points lie in a plane, show that the line through these points is contained in the plane.

10. The line through a vertex of a triangle, perpendicular to the opposite side, is called an altitude of the triangle. Show that the three altitudes of any triangle are concurrent. (The intersection of the altitudes is called the orthocentre of the triangle.) [Hint: If P is the intersection of two of the altitudes, show that the line through P and the remaining vertex is perpendicular to the remaining side.]
Chapter 5  The Vector Space ℝⁿ
SECTION 5.1 Subspaces and Spanning
In Section 2.2 we introduced the set ℝⁿ of all n-tuples (called vectors), and began our
investigation of the matrix transformations ℝⁿ → ℝᵐ given by matrix multiplication
by an m × n matrix. Particular attention was paid to the euclidean plane ℝ², where
certain simple geometric transformations were seen to be matrix transformations.
Then in Section 2.6 we introduced linear transformations, showed that they are all
matrix transformations, and found the matrices of rotations and reflections in ℝ².
We returned to this in Section 4.4, where we showed that projections, reflections,
and rotations of ℝ² and ℝ³ are all linear, and where we related areas and volumes
to determinants.

In this chapter we investigate ℝⁿ in full generality, and introduce some of the
most important concepts and methods in linear algebra. The n-tuples in ℝⁿ will
continue to be denoted x, y, and so on, and will be written as rows or columns
depending on the context.
Subspaces of ℝⁿ

Definition 5.1  A set¹ U of vectors in ℝⁿ is called a subspace of ℝⁿ if it satisfies the following properties:
S1. The zero vector 0 is in U.
S2. If x and y are in U, then x + y is also in U.
S3. If x is in U, then ax is in U for every real number a.

We say that the subset U is closed under addition if S2 holds, and that U is closed
under scalar multiplication if S3 holds.
Clearly ℝⁿ is a subspace of itself. The set U = {0}, consisting of only the zero
vector, is also a subspace because 0 + 0 = 0 and a0 = 0 for each a in ℝ; it is called
the zero subspace. Any subspace of ℝⁿ other than {0} or ℝⁿ is called a proper
subspace.
1 We use the language of sets. Informally, a set X is a collection of objects, called the elements of the set. The fact that x is an
element of X is denoted x ∈ X. Two sets X and Y are called equal (written X = Y) if they have the same elements. If every element of
X is in the set Y, we say that X is a subset of Y, and write X ⊆ Y. Hence X ⊆ Y and Y ⊆ X both hold if and only if X = Y.
We saw in Section 4.2 that every plane M through the origin in ℝ³ has equation
ax + by + cz = 0 where a, b, and c are not all zero. Here n = (a, b, c) is a normal for the
plane and
M = {v in ℝ³ | n · v = 0}
where v = (x, y, z) and n · v denotes the dot product introduced in Section 2.2 (see the
diagram).² Then M is a subspace of ℝ³. Indeed we show that M satisfies S1, S2, and
S3 as follows:
S1. 0 is in M because n · 0 = 0;
S2. If v and v1 are in M, then n · (v + v1) = n · v + n · v1 = 0 + 0 = 0, so v + v1 is in M;
S3. If v is in M, then n · (av) = a(n · v) = a(0) = 0, so av is in M.
This proves the first part of

EXAMPLE 1

Planes and lines through the origin in ℝ³ are all subspaces of ℝ³.

Solution ► We dealt with planes above. If L is a line through the origin with
direction vector d, then L = {td | t in ℝ} (see the diagram). We leave it as an
exercise to verify that L satisfies S1, S2, and S3.

Example 1 shows that lines through the origin in ℝ² are subspaces; in fact, they are the
only proper subspaces of ℝ² (Exercise 24). Indeed, we shall see in Example 14 Section
5.2 that lines and planes through the origin in ℝ³ are the only proper subspaces of ℝ³.
Thus the geometry of lines and planes through the origin is captured by the subspace
concept. (Note that every line or plane is just a translation of one of these.)
Subspaces can also be used to describe important features of an m × n matrix A.
The null space of A, denoted null A, and the image space of A, denoted im A, are
defined by
null A = {x in ℝⁿ | Ax = 0}  and  im A = {Ax | x in ℝⁿ}.
In the language of Chapter 2, null A consists of all solutions x in ℝⁿ of the
homogeneous system Ax = 0, and im A is the set of all vectors y in ℝᵐ such that
Ax = y has a solution x. Note that x is in null A if it satisfies the condition Ax = 0,
while im A consists of vectors of the form Ax for some x in ℝⁿ. These two ways to
describe subsets occur frequently.
EXAMPLE 2

If A is an m × n matrix, then:
1. null A is a subspace of ℝⁿ.
2. im A is a subspace of ℝᵐ.
2 We are using set notation here. In general {q | p } means the set of all objects q with property p.
Solution ►
1. The zero vector 0 in ℝⁿ lies in null A because A0 = 0.³ If x and x1
are in null A, then x + x1 and ax are in null A because they satisfy the
required condition:
A(x + x1) = Ax + Ax1 = 0 + 0 = 0 and A(ax) = a(Ax) = a0 = 0.
Hence null A satisfies S1, S2, and S3, and so is a subspace of ℝⁿ.
2. The zero vector 0 in ℝᵐ lies in im A because 0 = A0. Suppose that y and
y1 are in im A, say y = Ax and y1 = Ax1 where x and x1 are in ℝⁿ. Then
y + y1 = Ax + Ax1 = A(x + x1) and ay = a(Ax) = A(ax)
show that y + y1 and ay are again of the form Az, so both are in im A. Hence
im A satisfies S1, S2, and S3, and so is a subspace of ℝᵐ.
There are other important subspaces associated with a matrix A that clarify basic
properties of A. If A is an n × n matrix and λ is any number, let
Eλ(A) = {x in ℝⁿ | Ax = λx}.
A vector x is in Eλ(A) if and only if (λI - A)x = 0, so Example 2 gives:

EXAMPLE 3

Eλ(A) = null(λI - A) is a subspace of ℝⁿ for each n × n matrix A and
number λ.
Eλ(A) is called the eigenspace of A corresponding to λ. The reason for the name
is that, in the terminology of Section 3.3, λ is an eigenvalue of A if Eλ(A) ≠ {0}.
In this case the nonzero vectors in Eλ(A) are called the eigenvectors of A
corresponding to λ.
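The equality Eλ(A) = null(λI - A) is easy to test on a concrete matrix; a numpy sketch with an arbitrarily chosen triangular matrix (not an example from the text):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

lam = 3.0                    # an eigenvalue of A (A is triangular)
x = np.array([1.0, 1.0])     # A x = (3, 3) = 3 x, so x is an eigenvector

assert np.allclose(A @ x, lam * x)                  # x is in E_lam(A)
assert np.allclose((lam * np.eye(2) - A) @ x, 0)    # equivalently (lam I - A) x = 0
```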
The reader should not get the impression that every subset of ℝⁿ is a subspace.
For example:
U1 = {(x, y) | x ≥ 0} satisfies S1 and S2, but not S3;
U2 = {(x, y) | x² = y²} satisfies S1 and S3, but not S2.
Hence neither U1 nor U2 is a subspace of ℝ². (However, see Exercise 20.)
Spanning Sets
Let v and w be two nonzero, nonparallel vectors in ℝ³ with their tails at the
origin. The plane M through the origin containing these vectors is described in
Section 4.2 by saying that n = v × w is a normal for M, and that M consists of all
vectors p such that n · p = 0.⁴ While this is a very useful way to look at planes,
there is another approach that is at least as useful in ℝ³ and, more importantly,
works for all subspaces of ℝⁿ for any n ≥ 1.
3 We are using 0 to represent the zero vector in both m and n. This abuse of notation is common and causes no confusion once
everybody knows what is going on.
4 The vector n = v × w is nonzero because v and w are not parallel.
Definition 5.2  Given vectors x1, x2, …, xk in ℝⁿ, a vector of the form t1x1 + t2x2 + ⋯ + tkxk,
where the ti are scalars, is called a linear combination of the xi. The set of all such
linear combinations is called the span of the xi and is denoted
span{x1, x2, …, xk} = {t1x1 + t2x2 + ⋯ + tkxk | ti in ℝ}.
If V = span{x1, x2, …, xk}, we say that V is spanned by the vectors x1, x2, …, xk, and
that the vectors x1, x2, …, xk span the space V.

Two examples:
span{x} = {tx | t in ℝ},
which we write as span{x} = ℝx for simplicity.
span{x, y} = {rx + sy | r, s in ℝ}.
In particular, the above discussion shows that, if v and w are two nonzero,
nonparallel vectors in ℝ³, then
M = span{v, w}
is the plane in ℝ³ containing v and w. Moreover, if d is any nonzero vector in ℝ³
(or ℝ²), then
L = span{d} = {td | t in ℝ} = ℝd
is the line with direction vector d (see also Lemma 1 Section 3.3). Hence lines and
planes can both be described in terms of spanning sets.
EXAMPLE 4

Let x = (2, -1, 2, 1) and y = (3, 4, -1, 1) in ℝ⁴. Determine whether
p = (0, -11, 8, 1) or q = (2, 3, 1, 2) are in U = span{x, y}.
5 In particular, this implies that any vector p orthogonal to v × w must be a linear combination p = av + bw of v and w for some a
and b. Can you prove this directly?
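Membership questions like the one in Example 4 reduce to solving a linear system for the coefficients. A numpy sketch (using least squares rather than hand elimination; the helper name is mine):

```python
import numpy as np

x = np.array([2.0, -1.0, 2.0, 1.0])
y = np.array([3.0, 4.0, -1.0, 1.0])
M = np.column_stack([x, y])     # columns span U = span{x, y}

def in_span(v, M, tol=1e-10):
    """v lies in the column space of M iff the least-squares residual is zero."""
    coeffs, *_ = np.linalg.lstsq(M, v, rcond=None)
    return np.allclose(M @ coeffs, v, atol=tol), coeffs

p = np.array([0.0, -11.0, 8.0, 1.0])
q = np.array([2.0, 3.0, 1.0, 2.0])

ok_p, c = in_span(p, M)
ok_q, _ = in_span(q, M)
assert ok_p and np.allclose(c, [3, -2])   # p = 3x - 2y is in U
assert not ok_q                           # q is not in U
```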
Theorem 1

Let U = span{x1, x2, …, xk} where each xi is in ℝⁿ. Then:
1. U is a subspace of ℝⁿ containing each of x1, x2, …, xk.
2. If W is a subspace of ℝⁿ containing each xi, then U ⊆ W.

PROOF

Write U = span{x1, x2, …, xk} for convenience.
1. The zero vector 0 is in U because 0 = 0x1 + 0x2 + ⋯ + 0xk is a linear
combination of the xi. If x = t1x1 + t2x2 + ⋯ + tkxk and
y = s1x1 + s2x2 + ⋯ + skxk are in U, then x + y and ax are in U because
x + y = (t1 + s1)x1 + (t2 + s2)x2 + ⋯ + (tk + sk)xk, and
ax = (at1)x1 + (at2)x2 + ⋯ + (atk)xk.
Hence S1, S2, and S3 are satisfied for U, proving (1).
2. Let x = t1x1 + t2x2 + ⋯ + tkxk where the ti are scalars and each xi is in W.
Then each tixi is in W because W satisfies S3. But then x is in W because W
satisfies S2 (verify). This proves (2).
Condition (2) in Theorem 1 can be expressed by saying that span{x1, x2, …, xk} is
the smallest subspace of ℝⁿ that contains each xi. This is useful for showing that two
subspaces U and W are equal, since this amounts to showing that both U ⊆ W and
W ⊆ U. Here is an example of how it is used.
EXAMPLE 5

If x and y are in ℝⁿ, show that span{x, y} = span{x + y, x - y}.
It turns out that many important subspaces are best described by giving a
spanning set. Here are three examples, beginning with an important spanning set
for ℝⁿ itself. Column j of the n × n identity matrix In is denoted ej and called the jth
coordinate vector in ℝⁿ, and the set {e1, e2, …, en} is called the standard basis of
ℝⁿ. If x = (x1, x2, …, xn) is any vector in ℝⁿ, then x = x1e1 + x2e2 + ⋯ + xnen, as the reader
can verify. This proves:
EXAMPLE 6
ℝⁿ = span{e1, e2, …, en} where e1, e2, …, en are the columns of In.
EXAMPLE 7
Given an m × n matrix A, let x1, x2, …, xk denote the basic solutions to the
system Ax = 0 given by the gaussian algorithm. Then
null A = span{x1, x2, …, xk}.
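A numerical analogue of Example 7: instead of the gaussian-algorithm basic solutions, one can extract a null-space basis from the SVD and verify that it satisfies Ax = 0. A sketch (the matrix A is an arbitrary rank-1 example, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0]])   # rank 1, so null A has dimension 2

# Null-space basis from the SVD: right singular vectors for zero singular values.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T                    # columns form an orthonormal basis of null A

assert N.shape[1] == 2             # dim(null A) = n - rank = 3 - 1
assert np.allclose(A @ N, 0)       # every basis vector satisfies Ax = 0
```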
EXAMPLE 8
Let c1, c2, …, cn denote the columns of the m × n matrix A. Then
im A = span{c1, c2, …, cn}.
Solution ► If {e1, e2, …, en} is the standard basis of ℝⁿ, observe that
[Ae1 Ae2 ⋯ Aen] = A[e1 e2 ⋯ en] = AIn = A = [c1 c2 ⋯ cn].
Hence ci = Aei is in im A for each i, so span{c1, c2, …, cn} ⊆ im A.
Conversely, let y be in im A, say y = Ax for some x in ℝⁿ. If x = (x1, x2, …, xn), then
Definition 2.5 gives
y = Ax = x1c1 + x2c2 + ⋯ + xncn, which is in span{c1, c2, …, cn}.
This shows that im A ⊆ span{c1, c2, …, cn}, and the result follows.
EXERCISES 5.1

2. In each case determine if x lies in U = span{y, z}. If x is in U, write it as a linear combination of y and z; if x is not in U, show why not.
(a) x = (2, -1, 0, 1), y = (1, 0, 0, 1), and z = (0, 1, 0, 1).
(b) x = (1, 2, 15, 11), y = (2, -1, 0, 2), and z = (1, -1, -3, 1).
(c) x = (8, 3, -13, 20), y = (2, 1, -3, 5), and z = (-1, 0, 2, -3).
(d) x = (2, 5, 8, 3), y = (2, -1, 0, 5), and z = (-1, 2, 2, -3).

3. In each case determine if the given vectors span ℝ⁴. Support your answer.
(a) {(1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)}.
(b) {(1, 3, -5, 0), (-2, 1, 0, 0), (0, 2, 1, -1), (1, -4, 5, 0)}.

4. Is it possible that {(1, 2, 0), (2, 0, 3)} can span the subspace U = {(r, s, 0) | r and s in ℝ}? Defend your answer.

5. Give a spanning set for the zero subspace {0} of ℝⁿ.

6. Is ℝ² a subspace of ℝ³? Defend your answer.

7. If U = span{x, y, z} in ℝⁿ, show that U = span{x + tz, y, z} for every t in ℝ.

8. If U = span{x, y, z} in ℝⁿ, show that U = span{x + y, y + z, z + x}.

9. If a ≠ 0 is a scalar, show that span{ax} = span{x} for every vector x in ℝⁿ.

10. If a1, a2, …, ak are nonzero scalars, show that span{a1x1, a2x2, …, akxk} = span{x1, x2, …, xk} for any vectors xi in ℝⁿ.

11. If x ≠ 0 in ℝⁿ, determine all subspaces of span{x}.

12. Suppose that U = span{x1, x2, …, xk} where each xi is in ℝⁿ. If A is an m × n matrix and Axi = 0 for each i, show that Ay = 0 for every vector y in U.

13. If A is an m × n matrix, show that, for each invertible m × m matrix U, null(A) = null(UA).

14. If A is an m × n matrix, show that, for each invertible n × n matrix V, im(A) = im(AV).

15. Let U be a subspace of ℝⁿ, and let x be a vector in ℝⁿ.
(a) If ax is in U where a ≠ 0 is a number, show that x is in U.
(b) If y and x + y are in U where y is a vector in ℝⁿ, show that x is in U.

16. In each case either show that the statement is true or give an example showing that it is false.
(a) If U ≠ ℝⁿ is a subspace of ℝⁿ and x + y is in U, then x and y are both in U.
(b) If U is a subspace of ℝⁿ and rx is in U for all r in ℝ, then x is in U.
(c) If U is a subspace of ℝⁿ and x is in U, then -x is also in U.
(d) If x is in U and U = span{y, z}, then U = span{x, y, z}.
(e) The empty set of vectors in ℝⁿ is a subspace of ℝⁿ.
(f) (0, 1) is in span{(1, 0), (2, 0)}.

17. (a) If A and B are m × n matrices, show that U = {x in ℝⁿ | Ax = Bx} is a subspace of ℝⁿ.
(b) What if A is m × n, B is k × n, and m ≠ k?

18. Suppose that x1, x2, …, xk are vectors in ℝⁿ. If y = a1x1 + a2x2 + ⋯ + akxk where a1 ≠ 0, show that span{x1, x2, …, xk} = span{y, x2, …, xk}.

19. If U ≠ {0} is a subspace of ℝ, show that U = ℝ.

20. Let U be a nonempty subset of ℝⁿ. Show that U is a subspace if and only if S2 and S3 hold.

21. If S and T are nonempty sets of vectors in ℝⁿ, and if S ⊆ T, show that span{S} ⊆ span{T}.

22. Let U and W be subspaces of ℝⁿ. Define their intersection U ∩ W and their sum U + W as follows:
U ∩ W = {x in ℝⁿ | x belongs to both U and W}.
U + W = {x in ℝⁿ | x is a sum of a vector in U and a vector in W}.
(a) Show that U ∩ W is a subspace of ℝⁿ.
(b) Show that U + W is a subspace of ℝⁿ.

23. Let P denote an invertible n × n matrix. If λ is a number, show that Eλ(PAP⁻¹) = {Px | x is in Eλ(A)} for each n × n matrix A.

24. Show that every proper subspace U of ℝ² is a line through the origin. [Hint: If d is a nonzero vector in U, let L = ℝd = {rd | r in ℝ} denote the line with direction vector d. If u is in U but not in L, argue geometrically that every vector v in ℝ² is a linear combination of u and d.]
SECTION 5.2 Independence and Dimension

Linear Independence

Given x1, x2, …, xk in ℝⁿ, suppose that two linear combinations are equal:
r1x1 + r2x2 + ⋯ + rkxk = s1x1 + s2x2 + ⋯ + skxk.
We are looking for a condition on the set {x1, x2, …, xk} of vectors that guarantees
that this representation is unique; that is, ri = si for each i. Taking all terms to the
left side gives
(r1 - s1)x1 + (r2 - s2)x2 + ⋯ + (rk - sk)xk = 0,
so the required condition is that this equation forces all the coefficients ri - si to be zero.
Definition 5.3 With this in mind, we call a set {x1, x2, …, xk} of vectors linearly independent (or
simply independent) if it satisfies the following condition:
If t1x1 + t2x2 + ⋯ + tkxk = 0 then t1 = t2 = ⋯ = tk = 0.
Theorem 1
If {x1, x2, …, xk} is an independent set of vectors in ℝⁿ, then every vector in
span{x1, x2, …, xk} has a unique representation as a linear combination of the xi.
Independence Test
To verify that a set {x1, x2, …, xk} of vectors in ℝⁿ is independent, proceed as follows:
1. Set a linear combination equal to zero: t1x1 + t2x2 + ⋯ + tkxk = 0.
2. Show that ti = 0 for each i (that is, the linear combination is trivial).
Of course, if some nontrivial linear combination vanishes, the vectors are not independent.
EXAMPLE 1

Determine whether {(1, 0, -2, 5), (2, 1, 0, -1), (1, 1, 2, 1)} is independent
in ℝ⁴.
EXAMPLE 2

Show that the standard basis {e1, e2, …, en} of ℝⁿ is independent.

Solution ► The components of t1e1 + t2e2 + ⋯ + tnen are t1, t2, …, tn (see the
discussion preceding Example 6 Section 5.1). So the linear combination vanishes
if and only if each ti = 0. Hence the independence test applies.
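The independence test is mechanical: place the vectors as the columns of a matrix M; then t1x1 + ⋯ + tkxk = 0 means Mt = 0, so the set is independent exactly when rank M = k. A numpy sketch using the vectors of Example 1:

```python
import numpy as np

# Vectors from Example 1, placed as the columns of a matrix.
vectors = [(1, 0, -2, 5), (2, 1, 0, -1), (1, 1, 2, 1)]
M = np.array(vectors, dtype=float).T

# {x1, ..., xk} is independent iff M t = 0 has only the trivial
# solution, i.e. iff rank M = k.
assert np.linalg.matrix_rank(M) == len(vectors)   # the set is independent
```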
EXAMPLE 3

If {x, y} is independent, show that {2x + 3y, x - 5y} is also independent.

EXAMPLE 4

Show that the zero vector in ℝⁿ does not belong to any independent set.

Solution ► No set {0, x1, x2, …, xk} of vectors is independent because we have a
vanishing, nontrivial linear combination 1 · 0 + 0x1 + 0x2 + ⋯ + 0xk = 0.
EXAMPLE 5

Given x in ℝⁿ, show that {x} is independent if and only if x ≠ 0.

EXAMPLE 6

Show that the nonzero rows of a row-echelon matrix R are independent.

Solution ► We illustrate the case with 3 leading 1s; the general case is
analogous. Suppose R has the form
R = [0 1 ∗ ∗ ∗ ∗; 0 0 0 1 ∗ ∗; 0 0 0 0 1 ∗; 0 0 0 0 0 0]
where ∗ indicates a nonspecified number. Let R1, R2, and R3 denote the
nonzero rows of R. If t1R1 + t2R2 + t3R3 = 0 we show that t1 = 0, then t2 = 0,
and finally t3 = 0. Comparing the entries of t1R1 + t2R2 + t3R3 = 0 in the
columns of the three leading 1s gives, in turn, t1 = 0, then t2 = 0 (using
t1 = 0), and finally t3 = 0 (using t1 = t2 = 0).
EXAMPLE 7

If v and w are nonzero vectors in ℝ³, show that {v, w} is dependent if and only
if v and w are parallel.

Solution ► If v and w are parallel, then one is a scalar multiple of the other
(Theorem 4 Section 4.1), say v = aw for some scalar a. Then the nontrivial
linear combination v - aw = 0 vanishes, so {v, w} is dependent.
Conversely, if {v, w} is dependent, let sv + tw = 0 be nontrivial, say s ≠ 0.
Then v = -(t/s)w, so v and w are parallel (by Theorem 4 Section 4.1). A similar
argument works if t ≠ 0.
With this we can give a geometric description of what it means for a set {u, v, w}
in ℝ³ to be independent. Note that this requirement means that {v, w} is also
independent (av + bw = 0 means that 0u + av + bw = 0), so M = span{v, w} is the
plane containing v, w, and 0 (see the discussion preceding Example 4 Section 5.1).
So we assume that {v, w} is independent in the following example.

EXAMPLE 8

Let u, v, and w be nonzero vectors in ℝ³ where {v, w} is independent. Show that
{u, v, w} is independent if and only if u is not in the plane M = span{v, w}.

This is illustrated in the diagrams (one showing {u, v, w} independent, the
other showing {u, v, w} not independent).
Indeed, if c1, c2, …, cn are the columns of A, and if we write x = (x1, x2, …, xn), then
Ax = x1c1 + x2c2 + ⋯ + xncn
by Definition 2.5. Hence the definitions of independence and spanning show,
respectively, that condition 2 is equivalent to the independence of {c1, c2, …, cn}
and condition 3 is equivalent to the requirement that span{c1, c2, …, cn} = ℝᵐ.
This discussion is summarized in the following theorem:

Theorem 2

If A is an m × n matrix with columns c1, c2, …, cn, then:
1. {c1, c2, …, cn} is independent in ℝᵐ if and only if Ax = 0, x in ℝⁿ, implies x = 0.
2. ℝᵐ = span{c1, c2, …, cn} if and only if Ax = b has a solution for every b in ℝᵐ.

For a square matrix A this leads to the following characterization of invertibility:

Theorem 3

The following conditions are equivalent for an n × n matrix A:
1. A is invertible.
2. The columns of A are linearly independent.
3. The columns of A span ℝⁿ.
4. The rows of A are linearly independent.
5. The rows of A span ℝⁿ.
6 It is best to view columns and rows as just two different notations for ordered n-tuples. This discussion will become redundant in
Chapter 6 where we define the general notion of a vector space.
PROOF
Let c1, c2, …, cn denote the columns of A.
(1) ⇔ (2). By Theorem 5 Section 2.4, A is invertible if and only if Ax = 0
implies x = 0; this holds if and only if {c1, c2, …, cn} is independent by
Theorem 2.
(1) ⇔ (3). Again by Theorem 5 Section 2.4, A is invertible if and only if
Ax = b has a solution for every column b in ℝⁿ; this holds if and only if
span{c1, c2, …, cn} = ℝⁿ by Theorem 2.
(1) ⇔ (4). The matrix A is invertible if and only if AT is invertible (by the
Corollary to Theorem 4 Section 2.4); this in turn holds if and only if AT has
independent columns (by (1) ⇔ (2)); finally, this last statement holds if and
only if A has independent rows (because the rows of A are the transposes of
the columns of AT).
(1) ⇔ (5). The proof is similar to (1) ⇔ (4).
EXAMPLE 9

Show that S = {(2, -2, 5), (-3, 1, 1), (2, 7, -4)} is independent in ℝ³.

Solution ► Consider the matrix A = [2 -2 5; -3 1 1; 2 7 -4] with the vectors in S as its
rows. A routine computation shows that det A = -117 ≠ 0, so A is invertible.
Hence S is independent by Theorem 3. Note that Theorem 3 also shows that
ℝ³ = span S.
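The determinant computation in Example 9 is easy to confirm; a one-assertion numpy sketch:

```python
import numpy as np

A = np.array([[2.0, -2.0, 5.0],
              [-3.0, 1.0, 1.0],
              [2.0, 7.0, -4.0]])   # rows are the vectors of the set S

# det A != 0, so A is invertible and its rows are independent (Theorem 3).
assert np.isclose(np.linalg.det(A), -117.0)
```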
Dimension
It is common geometrical language to say that ℝ³ is 3-dimensional, that planes are
2-dimensional, and that lines are 1-dimensional. The next theorem is a basic tool
for clarifying this idea of “dimension”. Its importance is difficult to exaggerate.
Theorem 4
Fundamental Theorem
Let U be a subspace of ℝⁿ. If U is spanned by m vectors, and if U contains k linearly
independent vectors, then k ≤ m.
Definition 5.4 If U is a subspace of ℝⁿ, a set {x1, x2, …, xm} of vectors in U is called a basis of U if it
satisfies the following two conditions:
1. {x1, x2, …, xm} is linearly independent.
2. U = span{x1, x2, …, xm}.
Theorem 5
Invariance Theorem
If {x1, x2, …, xm} and {y1, y2, …, yk} are bases of a subspace U of ℝⁿ, then m = k.
PROOF
We have k ≤ m by the fundamental theorem because {x1, x2, …, xm} spans U,
and {y1, y2, …, yk} is independent. Similarly, by interchanging xs and ys we get
m ≤ k. Hence m = k.
Definition 5.5 If U is a subspace of ℝⁿ and {x1, x2, …, xm} is any basis of U, the number, m, of vectors
in the basis is called the dimension of U, denoted
dim U = m.
EXAMPLE 10
dim(ℝⁿ) = n and {e1, e2, …, en} is a basis.
EXAMPLE 11

Show that U = {(r, s, r) | r and s in ℝ} is a subspace of ℝ³, and find dim U.

Solution ► Clearly, (r, s, r) = ru + sv where u = (1, 0, 1) and v = (0, 1, 0). It follows that
U = span{u, v}, and hence that U is a subspace of ℝ³. Moreover, if
ru + sv = 0, then (r, s, r) = (0, 0, 0), so r = s = 0. Hence {u, v} is independent, and so a
basis of U. This means dim U = 2.
EXAMPLE 12
Let B = {x1, x2, …, xn} be a basis of ℝⁿ. If A is an invertible n × n matrix, then
D = {Ax1, Ax2, …, Axn} is also a basis of ℝⁿ.
While we have found bases in many subspaces of ℝⁿ, we have not yet shown that
every subspace has a basis. This is part of the next theorem, the proof of which is
deferred to Section 6.4 where it will be proved in more generality.
Theorem 6

Let U ≠ {0} be a subspace of ℝⁿ. Then:
1. U has a basis and dim U ≤ n.
2. Any independent set in U can be enlarged to a basis of U.
3. Any spanning set of U can be cut down (by deleting vectors) to a basis of U.
EXAMPLE 13

Find a basis of ℝ⁴ containing S = {u, v} where u = (0, 1, 2, 3) and v = (2, -1, 0, 1).

Solution ► By Theorem 6 we can find such a basis by adding vectors from the
standard basis of ℝ⁴ to S. If we try e1 = (1, 0, 0, 0), we find easily that {e1, u, v}
is independent. Now add another vector from the standard basis, say e2. Then
{e1, e2, u, v} is again independent (verify), and it is a basis of ℝ⁴ because it is an
independent set of 4 = dim ℝ⁴ vectors.
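The enlargement procedure of Example 13 can be automated: try each standard coordinate vector in turn and keep it whenever it preserves independence (checked via rank). A numpy sketch (the helper name is mine, not the book's):

```python
import numpy as np

def extend_to_basis(vecs, n):
    """Greedily enlarge an independent list to a basis of R^n using
    standard coordinate vectors (a sketch of the idea in Example 13)."""
    basis = [np.asarray(v, dtype=float) for v in vecs]
    for j in range(n):
        e = np.eye(n)[j]
        candidate = np.column_stack(basis + [e])
        if np.linalg.matrix_rank(candidate) == len(basis) + 1:
            basis.append(e)       # e_j keeps the set independent, so add it
        if len(basis) == n:
            break
    return basis

u = (0, 1, 2, 3)
v = (2, -1, 0, 1)
B = extend_to_basis([u, v], 4)
assert len(B) == 4
assert np.linalg.matrix_rank(np.column_stack(B)) == 4   # B is a basis of R^4
```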
Theorem 7
Let U be a subspace of ℝⁿ where dim U = m and let B = {x1, x2, …, xm} be a set of m
vectors in U. Then B is independent if and only if B spans U.
PROOF
Suppose B is independent. If B does not span U then, by Theorem 6, B can be
enlarged to a basis of U containing more than m vectors. This contradicts the
invariance theorem because dim U = m, so B spans U. Conversely, if B spans U
but is not independent, then B can be cut down to a basis of U containing fewer
than m vectors, again a contradiction. So B is independent, as required.
Theorem 8

Let U ⊆ W be subspaces of ℝⁿ. Then:
1. dim U ≤ dim W.
2. If dim U = dim W, then U = W.
PROOF
Write dim W = k, and let B be a basis of U.
1. If dim U > k, then B is an independent set in W containing more than k
vectors, contradicting the fundamental theorem. So dim U ≤ k = dim W.
2. If dim U = k, then B is an independent set in W containing k = dim W
vectors, so B spans W by Theorem 7. Hence W = span B = U, proving (2).
It follows from Theorem 8 that if U is a subspace of ℝⁿ, then dim U is one of the
integers 0, 1, 2, …, n, and that:
dim U = 0 if and only if U = {0},
dim U = n if and only if U = ℝⁿ.
The other subspaces are called proper. The following example uses Theorem 8
to show that the proper subspaces of ℝ² are the lines through the origin, while the
proper subspaces of ℝ³ are the lines and planes through the origin.
EXAMPLE 14
1. If U is a subspace of ℝ² or ℝ³, then dim U = 1 if and only if U is a line
through the origin.
2. If U is a subspace of ℝ³, then dim U = 2 if and only if U is a plane
through the origin.
PROOF

1. Since dim U = 1, let {u} be a basis of U. Then U = span{u} = {tu | t in ℝ}, so
U is the line through the origin with direction vector u. Conversely each line
L with direction vector d ≠ 0 has the form L = {td | t in ℝ}. Hence {d} is a
basis of U, so U has dimension 1.
2. If U ⊆ ℝ³ has dimension 2, let {v, w} be a basis of U. Then v and w are not
parallel (by Example 7) so n = v × w ≠ 0. Let P = {x in ℝ³ | n · x = 0}
denote the plane through the origin with normal n. Then P is a subspace of
ℝ³ (Example 1 Section 5.1) and both v and w lie in P (they are orthogonal to
n), so U = span{v, w} ⊆ P by Theorem 1 Section 5.1. Hence
U ⊆ P ⊆ ℝ³.
Since dim U = 2 and dim(ℝ³) = 3, it follows from Theorem 8 that dim P = 2
or 3, whence P = U or ℝ³. But P ≠ ℝ³ (for example, n is not in P) and so
U = P is a plane through the origin.
Conversely, if U is a plane through the origin, then dim U = 0, 1, 2, or 3
by Theorem 8. But dim U ≠ 0 or 3 because U ≠ {0} and U ≠ ℝ³, and dim
U ≠ 1 by (1). So dim U = 2.
Note that this proof shows that if v and w are nonzero, nonparallel vectors in
ℝ³, then span{v, w} is the plane with normal n = v × w. We gave a geometrical
verification of this fact in Section 5.1.
EXERCISES 5.2
4. Find a basis and calculate the dimension of the following subspaces of ℝ⁴.
(a) U = {(a, a + b, a − b, b)ᵀ | a and b in ℝ}
(b) U = {(a + b, a − b, b, a)ᵀ | a and b in ℝ}
(c) U = {(a, b, c + a, c)ᵀ | a, b, and c in ℝ}
(d) U = {(a − b, b + c, a, b + c)ᵀ | a, b, and c in ℝ}
(e) U = {(a, b, c, d)ᵀ | a + b − c + d = 0 in ℝ}
(f) U = {(a, b, c, d)ᵀ | a + b = c + d in ℝ}

5. Suppose that {x, y, z, w} is a basis of ℝ⁴. Show that:
(a) {x + aw, y, z, w} is also a basis of ℝ⁴ for any choice of the scalar a.
(b) {x + w, y + w, z + w, w} is also a basis of ℝ⁴.
(c) {x, x + y, x + y + z, x + y + z + w} is also a basis of ℝ⁴.

6. Use Theorem 3 to determine if the following sets of vectors are a basis of the indicated space.
(a) {(3, −1), (2, 2)} in ℝ².
(b) {(1, 1, −1), (1, −1, 1), (0, 0, 1)} in ℝ³.
(c) {(−1, 1, −1), (1, −1, 2), (0, 0, 1)} in ℝ³.
(d) {(5, 2, −1), (1, 0, 1), (3, −1, 0)} in ℝ³.
(e) {(2, 1, −1, 3), (1, 1, 0, 2), (0, 1, 0, −3), (−1, 2, 3, 1)} in ℝ⁴.
(f) {(1, 0, −2, 5), (4, 4, −3, 2), (0, 1, 0, −3), (1, 3, 3, −10)} in ℝ⁴.

7. In each case show that the statement is true or give an example showing that it is false.
(a) If {x, y} is independent, then {x, y, x + y} is independent.
(b) If {x, y, z} is independent, then {y, z} is independent.
(c) If {y, z} is dependent, then {x, y, z} is dependent for any x.
(d) If all of x1, x2, …, xk are nonzero, then {x1, x2, …, xk} is independent.
(e) If one of x1, x2, …, xk is zero, then {x1, x2, …, xk} is dependent.
(f) If ax + by + cz = 0, then {x, y, z} is independent.
(g) If {x, y, z} is independent, then ax + by + cz = 0 for some a, b, and c in ℝ.
(h) If {x1, x2, …, xk} is dependent, then t1x1 + t2x2 + ⋯ + tkxk = 0 for some numbers ti in ℝ not all zero.
(i) If {x1, x2, …, xk} is independent, then t1x1 + t2x2 + ⋯ + tkxk = 0 for some ti in ℝ.

8. If A is an n × n matrix, show that det A = 0 if and only if some column of A is a linear combination of the other columns.

9. Let {x, y, z} be a linearly independent set in ℝ⁴. Show that {x, y, z, ek} is a basis of ℝ⁴ for some ek in the standard basis {e1, e2, e3, e4}.

10. If {x1, x2, x3, x4, x5, x6} is an independent set of vectors, show that the subset {x2, x3, x5} is also independent.

11. Let A be any m × n matrix, and let b1, b2, b3, …, bk be columns in ℝᵐ such that the system Ax = bi has a solution xi for each i. If {b1, b2, b3, …, bk} is independent in ℝᵐ, show that {x1, x2, x3, …, xk} is independent in ℝⁿ.

12. If {x1, x2, x3, …, xk} is independent, show that {x1, x1 + x2, x1 + x2 + x3, …, x1 + x2 + ⋯ + xk} is also independent.

13. If {y, x1, x2, x3, …, xk} is independent, show that {y + x1, y + x2, y + x3, …, y + xk} is also independent.

14. If {x1, x2, …, xk} is independent in ℝⁿ, and if y is not in span{x1, x2, …, xk}, show that {x1, x2, …, xk, y} is independent.

15. If A and B are matrices and the columns of AB are independent, show that the columns of B are independent.
16. Suppose that {x, y} is a basis of ℝ², and let
A = [a b]
    [c d]
(a) If A is invertible, show that {ax + by, cx + dy} is a basis of ℝ².
(b) If {ax + by, cx + dy} is a basis of ℝ², show that A is invertible.

17. Let A denote an m × n matrix.
(a) Show that null A = null(UA) for every invertible m × m matrix U.
(b) Show that dim(null A) = dim(null(AV)) for every invertible n × n matrix V. [Hint: If {x1, x2, …, xk} is a basis of null A, show that {V⁻¹x1, V⁻¹x2, …, V⁻¹xk} is a basis of null(AV).]

18. Let A denote an m × n matrix.
(a) Show that im A = im(AV) for every invertible n × n matrix V.
(b) Show that dim(im A) = dim(im(UA)) for every invertible m × m matrix U. [Hint: If {y1, y2, …, yk} is a basis of im(UA), show that {U⁻¹y1, U⁻¹y2, …, U⁻¹yk} is a basis of im A.]

19. Let U and W denote subspaces of ℝⁿ, and assume that U ⊆ W. If dim U = n − 1, show that either W = U or W = ℝⁿ.

20. Let U and W denote subspaces of ℝⁿ, and assume that U ⊆ W. If dim W = 1, show that either U = {0} or U = W.
EXAMPLE 1
If x = (1, −1, −3, 1) and y = (2, 1, 1, 0) in ℝ⁴, then x · y = 2 − 1 − 3 + 0 = −2 and ‖x‖ = √(1 + 1 + 9 + 1) = √12 = 2√3. Hence (1/(2√3))x is a unit vector; similarly, (1/√6)y is a unit vector.
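For readers who want to experiment, the computation in Example 1 can be reproduced with NumPy. This is an illustration only, not part of the text's development.

```python
import numpy as np

x = np.array([1.0, -1.0, -3.0, 1.0])
y = np.array([2.0, 1.0, 1.0, 0.0])

print(np.dot(x, y))            # -2.0
print(np.dot(x, x))            # 12.0, so ||x|| = sqrt(12) = 2*sqrt(3)
u = x / np.linalg.norm(x)      # the unit vector (1/(2*sqrt(3)))x
print(np.linalg.norm(u))       # has length 1 (up to rounding)
```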
SECTION 5.3 Orthogonality 247
These definitions agree with those in ℝ² and ℝ³, and many properties carry over to ℝⁿ:

Theorem 1
Let x, y, and z denote vectors in ℝⁿ. Then:
1. x · y = y · x.
2. x · (y + z) = x · y + x · z.
3. (ax) · y = a(x · y) = x · (ay) for all scalars a.
4. ‖x‖² = x · x.
5. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
6. ‖ax‖ = |a|‖x‖ for all scalars a.
PROOF
(1), (2), and (3) follow from matrix arithmetic because x · y = xᵀy; (4) is clear from the definition; and (6) is a routine verification since |a| = √(a²). If x = (x1, x2, …, xn), then ‖x‖ = √(x1² + x2² + ⋯ + xn²), so ‖x‖ = 0 if and only if x1² + x2² + ⋯ + xn² = 0. Since each xi is a real number, this happens if and only if xi = 0 for each i; that is, if and only if x = 0. This proves (5).
EXAMPLE 2
Show that ‖x + y‖² = ‖x‖² + 2(x · y) + ‖y‖² for any x and y in ℝⁿ.
EXAMPLE 3
Suppose that ℝⁿ = span{f1, f2, …, fk} for some vectors fi. If x · fi = 0 for each i where x is in ℝⁿ, show that x = 0.
We saw in Section 4.2 that if u and v are nonzero vectors in ℝ³, then
(u · v)/(‖u‖‖v‖) = cos θ,
where θ is the angle between u and v. Since |cos θ| ≤ 1 for any angle θ, this shows that |u · v| ≤ ‖u‖‖v‖. In this form the result holds in ℝⁿ.

Theorem 2
Cauchy Inequality⁹
If x and y are vectors in ℝⁿ, then
|x · y| ≤ ‖x‖‖y‖.
Moreover |x · y| = ‖x‖‖y‖ if and only if one of x and y is a multiple of the other.
[Margin photo: Augustin Louis Cauchy. Photo © Corbis.]

PROOF
The inequality holds if x = 0 or y = 0 (in fact it is equality). Otherwise, write ‖x‖ = a > 0 and ‖y‖ = b > 0 for convenience. A computation like that preceding Example 2 gives
‖bx − ay‖² = 2ab(ab − x · y) and ‖bx + ay‖² = 2ab(ab + x · y). (∗)
It follows that ab − x · y ≥ 0 and ab + x · y ≥ 0, and hence that −ab ≤ x · y ≤ ab. Hence |x · y| ≤ ab = ‖x‖‖y‖, proving the Cauchy inequality.
If equality holds, then |x · y| = ab, so x · y = ab or x · y = −ab. Hence (∗) shows that bx − ay = 0 or bx + ay = 0, so one of x and y is a multiple of the other (even if a = 0 or b = 0).
9 Augustin Louis Cauchy (1789–1857) was born in Paris and became a professor at the École Polytechnique at the age of 26. He was
one of the great mathematicians, producing more than 700 papers, and is best remembered for his work in analysis in which he
established new standards of rigour and founded the theory of functions of a complex variable. He was a devout Catholic with a long-
term interest in charitable work, and he was a royalist, following King Charles X into exile in Prague after he was deposed in 1830.
Theorem 2 first appeared in his 1812 memoir on determinants.
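A quick numerical sanity check of the Cauchy inequality, with random trials plus the equality case when y is a multiple of x. This NumPy sketch is an illustration, not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # |x . y| <= ||x|| ||y||, with a tiny tolerance for floating point
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

# Equality holds when one vector is a multiple of the other.
x = np.array([1.0, 2.0, -1.0])
y = -3.0 * x
print(abs(x @ y), np.linalg.norm(x) * np.linalg.norm(y))  # both approximately 18
```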
Corollary 1
Triangle Inequality
If x and y are vectors in ℝⁿ, then ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The reason for the name comes from the observation that in ℝ³ the inequality asserts that the sum of the lengths of two sides of a triangle is not less than the length of the third side. This is illustrated in the first diagram.
Definition 5.7 If x and y are two vectors in ℝⁿ, we define the distance d(x, y) between x and y by
d(x, y) = ‖x − y‖.
The motivation again comes from ℝ³, as is clear in the second diagram. This distance function has all the intuitive properties of distance in ℝ³, including another version of the triangle inequality.
Theorem 3
If x, y, and z are vectors in ℝⁿ, then:
1. d(x, y) ≥ 0 for all x and y.
2. d(x, y) = 0 if and only if x = y.
3. d(x, y) = d(y, x) for all x and y.
4. d(x, z) ≤ d(x, y) + d(y, z) for all x, y, and z.

PROOF
(1) and (2) restate part (5) of Theorem 1 because d(x, y) = ‖x − y‖, and (3) follows because ‖u‖ = ‖−u‖ for every vector u in ℝⁿ. To prove (4), use the Corollary to Theorem 2:
d(x, z) = ‖x − z‖ = ‖(x − y) + (y − z)‖ ≤ ‖x − y‖ + ‖y − z‖ = d(x, y) + d(y, z)
Definition 5.8 We say that two vectors x and y in ℝⁿ are orthogonal if x · y = 0, extending the terminology in ℝ³ (see Theorem 3 Section 4.2). More generally, a set {x1, x2, …, xk} of vectors in ℝⁿ is called an orthogonal set if
xi · xj = 0 for all i ≠ j and xi ≠ 0 for all i.¹⁰
Note that {x} is an orthogonal set if x ≠ 0. A set {x1, x2, …, xk} of vectors in ℝⁿ is called orthonormal if it is orthogonal and, in addition, each xi is a unit vector:
‖xi‖ = 1 for each i.

10 The reason for insisting that orthogonal sets consist of nonzero vectors is that we will be primarily concerned with orthogonal bases.
EXAMPLE 4
The standard basis {e1, e2, …, en} is an orthonormal set in ℝⁿ.
EXAMPLE 5
If {x1, x2, …, xk} is orthogonal, so also is {a1x1, a2x2, …, akxk} for any nonzero scalars ai.

If x ≠ 0, it follows from item (6) of Theorem 1 that (1/‖x‖)x is a unit vector, that is, it has length 1.
EXAMPLE 6
If f1 = (1, 1, 1, −1)ᵀ, f2 = (1, 0, 1, 2)ᵀ, f3 = (−1, 0, 1, 0)ᵀ, and f4 = (−1, 3, −1, 1)ᵀ, then {f1, f2, f3, f4} is an orthogonal set in ℝ⁴, as is easily verified. After normalizing, the corresponding orthonormal set is
{(1/2)f1, (1/√6)f2, (1/√2)f3, (1/(2√3))f4}.
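The orthogonality claims in Example 6 are easy to verify by machine. The following NumPy sketch (an illustration only) stores f1, …, f4 as the columns of a matrix F, so the Gram matrix FᵀF displays every dot product at once.

```python
import numpy as np

# Columns are f1, f2, f3, f4 from Example 6.
F = np.array([[ 1.0,  1.0, -1.0, -1.0],
              [ 1.0,  0.0,  0.0,  3.0],
              [ 1.0,  1.0,  1.0, -1.0],
              [-1.0,  2.0,  0.0,  1.0]])

G = F.T @ F                     # Gram matrix of dot products fi . fj
print(np.diag(G))               # [ 4.  6.  2. 12.]: the values ||fi||^2
print(np.allclose(G, np.diag(np.diag(G))))   # True: off-diagonals are 0

Q = F / np.linalg.norm(F, axis=0)            # normalize each column
print(np.allclose(Q.T @ Q, np.eye(4)))       # True: an orthonormal set
```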
Theorem 4
Pythagoras’ Theorem
If {x1, x2, …, xk} is an orthogonal set in ℝⁿ, then
‖x1 + x2 + ⋯ + xk‖² = ‖x1‖² + ‖x2‖² + ⋯ + ‖xk‖².

PROOF
The fact that xi · xj = 0 whenever i ≠ j gives
‖x1 + x2 + ⋯ + xk‖² = (x1 + x2 + ⋯ + xk) · (x1 + x2 + ⋯ + xk)
= (x1 · x1 + x2 · x2 + ⋯ + xk · xk) + Σ_{i≠j} xi · xj
= ‖x1‖² + ‖x2‖² + ⋯ + ‖xk‖² + 0.
This is what we wanted.
If v and w are orthogonal, nonzero vectors in ℝ³, then they are certainly not parallel, and so are linearly independent by Example 7 Section 5.2. The next theorem gives a far-reaching extension of this observation.

Theorem 5
Every orthogonal set in ℝⁿ is linearly independent.
PROOF
Let {x1, x2, …, xk} be an orthogonal set in ℝⁿ and suppose a linear combination vanishes: t1x1 + t2x2 + ⋯ + tkxk = 0. Then
0 = x1 · 0 = x1 · (t1x1 + t2x2 + ⋯ + tkxk)
= t1(x1 · x1) + t2(x1 · x2) + ⋯ + tk(x1 · xk)
= t1‖x1‖² + t2(0) + ⋯ + tk(0)
= t1‖x1‖².
Since ‖x1‖² ≠ 0, this implies that t1 = 0. Similarly ti = 0 for each i.
Theorem 5 suggests considering orthogonal bases for ℝⁿ, that is, orthogonal sets that span ℝⁿ. These turn out to be the best bases in the sense that, when expanding a vector as a linear combination of the basis vectors, there are explicit formulas for the coefficients.
Theorem 6
Expansion Theorem
Let {f1, f2, …, fm} be an orthogonal basis of a subspace U of ℝⁿ. If x is any vector in U, we have
x = ((x · f1)/‖f1‖²)f1 + ((x · f2)/‖f2‖²)f2 + ⋯ + ((x · fm)/‖fm‖²)fm.
PROOF
Since {f1, f2, …, fm} spans U, we have x = t1f1 + t2f2 + ⋯ + tmfm where the ti are scalars. To find t1 we take the dot product of both sides with f1:
x · f1 = t1(f1 · f1) + t2(f2 · f1) + ⋯ + tm(fm · f1) = t1‖f1‖²
because the basis is orthogonal. Since f1 ≠ 0, this gives t1 = (x · f1)/‖f1‖²; the other coefficients are found in the same way. Note that if the basis is orthonormal, then ti = x · fi for each i. We will have a great deal more to say about this in Section 10.5.
EXAMPLE 7
Expand x = (a, b, c, d) as a linear combination of the orthogonal basis {f1, f2, f3, f4} of ℝ⁴ given in Example 6.
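The Expansion Theorem is easy to apply by machine. The sketch below (NumPy, with an arbitrarily chosen x) computes the coefficients (x · fi)/‖fi‖² for the orthogonal basis of Example 6 and confirms that they rebuild x; it is an illustration only.

```python
import numpy as np

# The orthogonal basis of R^4 from Example 6.
f = [np.array([1.0, 1.0, 1.0, -1.0]),
     np.array([1.0, 0.0, 1.0, 2.0]),
     np.array([-1.0, 0.0, 1.0, 0.0]),
     np.array([-1.0, 3.0, -1.0, 1.0])]

x = np.array([2.0, -1.0, 0.0, 3.0])   # any (a, b, c, d) would do

# Expansion Theorem: x = sum of ((x . fi) / ||fi||^2) fi
coeffs = [np.dot(x, fi) / np.dot(fi, fi) for fi in f]
recon = sum(c * fi for c, fi in zip(coeffs, f))
print(np.allclose(recon, x))          # True
```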
EXERCISES 5.3

3. In each case, show that B is an orthogonal basis of ℝ³ and use Theorem 6 to expand x = (a, b, c) as a linear combination of the basis vectors.
(a) B = {(1, −1, 3), (−2, 1, 1), (4, 7, 1)}
(b) B = {(1, 0, −1), (1, 4, 1), (2, −1, 2)}
(c) B = {(1, 2, 3), (−1, −1, 1), (5, −4, 1)}
(d) B = {(1, 1, 1), (1, −1, 0), (1, 1, −2)}

4. In each case, write x as a linear combination of the orthogonal basis of the subspace U.

6. If ‖x‖ = 3, ‖y‖ = 1, and x · y = −2, compute:
(a) ‖3x − 5y‖
(b) ‖2x + 7y‖
(c) (3x − y) · (2y − x)
(d) (x − 2y) · (3x + 5y)

7. In each case either show that the statement is true or give an example showing that it is false.
(a) Every independent set in ℝⁿ is orthogonal.
(b) If {x, y} is an orthogonal set in ℝⁿ, then {x, x + y} is also orthogonal.
SECTION 5.4 Rank of a Matrix 253
(c) If {x, y} and {z, w} are both orthogonal in ℝⁿ, then {x, y, z, w} is also orthogonal.
(d) If {x1, x2} and {y1, y2, y3} are both orthogonal and xi · yj = 0 for all i and j, then {x1, x2, y1, y2, y3} is orthogonal.
(e) If {x1, x2, …, xn} is orthogonal in ℝⁿ, then ℝⁿ = span{x1, x2, …, xn}.
(f) If x ≠ 0 in ℝⁿ, then {x} is an orthogonal set.

8. Let v denote a nonzero vector in ℝⁿ.
(a) Show that P = {x in ℝⁿ | x · v = 0} is a subspace of ℝⁿ.
(b) Show that ℝv = {tv | t in ℝ} is a subspace of ℝⁿ.
(c) Describe P and ℝv geometrically when n = 3.

9. If A is an m × n matrix with orthonormal columns, show that AᵀA = In. [Hint: If c1, c2, …, cn are the columns of A, show that column j of AᵀA has entries c1 · cj, c2 · cj, …, cn · cj.]

10. Use the Cauchy inequality to show that √(xy) ≤ ½(x + y) for all x ≥ 0 and y ≥ 0. Here √(xy) and ½(x + y) are called, respectively, the geometric mean and arithmetic mean of x and y. [Hint: Use x = (√x, √y) and y = (√y, √x).]

11. Use the Cauchy inequality to prove that:
(a) (r1 + r2 + ⋯ + rn)² ≤ n(r1² + r2² + ⋯ + rn²) for all ri in ℝ and all n ≥ 1.
(b) r1r2 + r1r3 + r2r3 ≤ r1² + r2² + r3² for all r1, r2, and r3 in ℝ. [Hint: See part (a).]

12. (a) Show that x and y are orthogonal in ℝⁿ if and only if ‖x + y‖ = ‖x − y‖.
(b) Show that x + y and x − y are orthogonal in ℝⁿ if and only if ‖x‖ = ‖y‖.

13. (a) Show that ‖x + y‖² = ‖x‖² + ‖y‖² if and only if x is orthogonal to y.
(b) If x = (1, 1)ᵀ, y = (1, 0)ᵀ, and z = (−2, 3)ᵀ, show that ‖x + y + z‖² = ‖x‖² + ‖y‖² + ‖z‖² but x · y ≠ 0, x · z ≠ 0, and y · z ≠ 0.

14. (a) Show that x · y = ¼[‖x + y‖² − ‖x − y‖²] for all x, y in ℝⁿ.
(b) Show that ‖x‖² + ‖y‖² = ½[‖x + y‖² + ‖x − y‖²] for all x, y in ℝⁿ.

15. If A is n × n, show that every eigenvalue of AᵀA is nonnegative. [Hint: Compute ‖Ax‖² where x is an eigenvector.]

16. If ℝⁿ = span{x1, …, xm} and x · xi = 0 for all i, show that x = 0. [Hint: Show ‖x‖ = 0.]

17. If ℝⁿ = span{x1, …, xm} and x · xi = y · xi for all i, show that x = y. [Hint: Preceding exercise.]

18. Let {e1, …, en} be an orthogonal basis of ℝⁿ. Given x and y in ℝⁿ, show that
x · y = (x · e1)(y · e1)/‖e1‖² + ⋯ + (x · en)(y · en)/‖en‖².
Definition 5.10 The column space, col A, of A is the subspace of ℝᵐ spanned by the columns of A. The row space, row A, of A is the subspace of ℝⁿ spanned by the rows of A.
Lemma 1
Let A and B denote m × n matrices.
1. If A → B by elementary row operations, then row A = row B.
2. If A → B by elementary column operations, then col A = col B.
PROOF
We prove (1); the proof of (2) is analogous. It is enough to do it in the case
when A → B by a single row operation. Let R1, R2, …, Rm denote the rows of A.
The row operation A → B either interchanges two rows, multiplies a row by a
nonzero constant, or adds a multiple of a row to a different row. We leave the
first two cases to the reader. In the last case, suppose that a times row p is added
to row q where p < q. Then the rows of B are R1, …, Rp, …, Rq + aRp, …, Rm,
and Theorem 1 Section 5.1 shows that
span{R1, …, Rp, …, Rq, …, Rm} = span{R1, …, Rp, …, Rq + aRp, …, Rm}.
That is, row A = row B.
Lemma 2
Let R denote an m × n row-echelon matrix. Then:
1. The nonzero rows of R are a basis of row R.
2. The columns of R containing leading ones are a basis of col R.
PROOF
The rows of R are independent by Example 6 Section 5.2, and they span row R by definition. This proves (1).
Let cj1, cj2, …, cjr denote the columns of R containing leading 1s. Then {cj1, cj2, …, cjr} is independent because the leading 1s are in different rows (and have zeros below and to the left of them). Let U denote the subspace of all columns in ℝᵐ in which the last m − r entries are zero. Then dim U = r (it is just ℝʳ with extra zeros). Hence the independent set {cj1, cj2, …, cjr} is a basis of U by Theorem 7 Section 5.2. Since each cji is in col R, it follows that col R = U, proving (2).
With Lemma 2 we can fill a gap in the definition of the rank of a matrix given
in Chapter 1. Let A be any matrix and suppose A is carried to some row-echelon
matrix R by row operations. Note that R is not unique. In Section 1.2 we defined
the rank of A, denoted rank A, to be the number of leading 1s in R, that is the
number of nonzero rows of R. The fact that this number does not depend on the
choice of R was not proved in Section 1.2. However part 1 of Lemma 2 shows that
rank A = dim(row A)
and hence that rank A is independent of R.
Lemma 2 can be used to find bases of subspaces of ℝⁿ (written as rows). Here is an example.
EXAMPLE 1
Find a basis of U = span{(1, 1, 2, 3), (2, 4, 1, 0), (1, 5, −4, −9)}.

Solution ► U is the row space of the matrix
[1 1  2  3]
[2 4  1  0]
[1 5 −4 −9]
This matrix has row-echelon form
[1 1    2  3]
[0 1 −3/2 −3]
[0 0    0  0]
so {(1, 1, 2, 3), (0, 1, −3/2, −3)} is a basis of U by Lemma 2. Note that {(1, 1, 2, 3), (0, 2, −3, −6)} is another basis that avoids fractions.
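The row reduction in Example 1 can be reproduced exactly (in rational arithmetic) with SymPy; `rref` returns the reduced row-echelon form, whose nonzero rows again span U. This is an illustration only.

```python
from sympy import Matrix

A = Matrix([[1, 1, 2, 3],
            [2, 4, 1, 0],
            [1, 5, -4, -9]])

R, pivots = A.rref()          # reduced row-echelon form, pivot columns
print(A.rank())               # 2, so dim U = 2
# The nonzero rows of any row-echelon form of A are a basis of row A.
basis = [list(R.row(i)) for i in range(R.rows) if any(R.row(i))]
print(basis)                  # two nonzero rows: a basis of U
```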
Theorem 1
Let A denote any m × n matrix of rank r. Then
dim(col A) = dim(row A) = r.
Moreover, if A is carried to a row-echelon matrix R by row operations, then:
1. The r nonzero rows of R are a basis of row A.
2. If the leading 1s lie in columns j1, j2, …, jr of R, then columns j1, j2, …, jr of A are a basis of col A.
PROOF
We have row A = row R by Lemma 1, so (1) follows from Lemma 2. Moreover, R = UA for some invertible matrix U by Theorem 1 Section 2.5. Now write A = [c1 c2 ⋯ cn] where c1, c2, …, cn are the columns of A. Then
R = UA = [Uc1 Uc2 ⋯ Ucn].
Thus, in the notation of (2), the set B = {Ucj1, Ucj2, …, Ucjr} is a basis of col R by Lemma 2. So, to prove (2) and the fact that dim(col A) = r, it is enough to show that D = {cj1, cj2, …, cjr} is a basis of col A. First, D is linearly independent because U is invertible (verify), so we show that, for each j, column cj is a linear combination of the cji. But Ucj is column j of R, and so is a linear combination of the Ucji, say Ucj = a1Ucj1 + a2Ucj2 + ⋯ + arUcjr where each ai is a real number. Since U is invertible, it follows that cj = a1cj1 + a2cj2 + ⋯ + arcjr, and the proof is complete.
EXAMPLE 2
Compute the rank of
A = [1 2 2 −1]
    [3 6 5  0]
    [1 2 1  2]
and find bases for row A and col A.

Solution ► The reduction of A to row-echelon form is as follows:
[1 2  2 −1]      [1 2  2 −1]      [1 2 2 −1]
[3 6  5  0]  →   [0 0 −1  3]  →   [0 0 1 −3]
[1 2  1  2]      [0 0 −1  3]      [0 0 0  0]
Hence rank A = 2, and {[1 2 2 −1], [0 0 1 −3]} is a basis of row A by Lemma 2. Since the leading 1s are in columns 1 and 3 of the row-echelon matrix, Theorem 1 shows that columns 1 and 3 of A are a basis {(1, 3, 1)ᵀ, (2, 5, 1)ᵀ} of col A.
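Example 2's computation can be checked with SymPy: the pivot columns reported by `rref` are exactly the columns of A that form a basis of col A (Theorem 1). Illustration only.

```python
from sympy import Matrix

A = Matrix([[1, 2, 2, -1],
            [3, 6, 5,  0],
            [1, 2, 1,  2]])

R, pivots = A.rref()
print(A.rank())        # 2
print(pivots)          # (0, 2): leading 1s in columns 1 and 3
col_basis = [A.col(j) for j in pivots]   # columns 1 and 3 of A itself
```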
Corollary 1
If A is any matrix, then rank A = rank Aᵀ.
Corollary 2
If A is an m × n matrix, then rank A ≤ m and rank A ≤ n.
Corollary 3
If A is an m × n matrix, then rank(UA) = rank A = rank(AV) whenever U (m × m) and V (n × n) are invertible.
PROOF
Lemma 1 gives rank A = rank(UA). Using this and Corollary 1, we get
rank(AV) = rank[(AV)ᵀ] = rank(VᵀAᵀ) = rank(Aᵀ) = rank A.
Lemma 3
Let A denote an m × n matrix.
1. col(AV) ⊆ col A for every n × q matrix V, with equality if V is n × n and invertible.
2. row(UA) ⊆ row A for every p × m matrix U, with equality if U is m × m and invertible.
PROOF
For (1), write V = [v1, v2, …, vq] where vj is column j of V. Then we
have AV = [Av1, Av2, …, Avq], and each Avj is in col A by Definition 1
Section 2.2. It follows that col(AV ) ⊆ col A. If V is invertible, we obtain
col A = col[(AV )V-1] ⊆ col(AV ) in the same way. This proves (1).
As to (2), we have col[(UA)T] = col(ATUT) ⊆ col(AT) by (1), from which
row(UA) ⊆ row A. If U is invertible, this is equality as in the proof of (1).
Corollary 4
If A and B are matrices (so that the products are defined), then rank(AB) ≤ rank A and rank(BA) ≤ rank A.
PROOF
By Lemma 3, col(AB) ⊆ col A and row(BA) ⊆ row A, so Theorem 1 applies.
Theorem 2
Let A denote an m × n matrix of rank r. Then:
1. The n − r basic solutions of Ax = 0 provided by the gaussian algorithm are a basis of null A, so dim(null A) = n − r.
2. dim(im A) = r.
PROOF
It remains to prove (1). We already know (Theorem 1 Section 2.2) that null(A) is spanned by the n − r basic solutions of Ax = 0. Hence, using Theorem 7 Section 5.2, it suffices to show that dim[null(A)] = n − r. So let {x1, …, xk} be a basis of null(A), and extend it to a basis {x1, …, xk, xk+1, …, xn} of ℝⁿ (by Theorem 6 Section 5.2). It is enough to show that {Axk+1, …, Axn} is a basis of im(A); then n − k = r by the above and so k = n − r as required.
EXAMPLE 3
If
A = [ 1 −2 1 1]
    [−1  2 0 1]
    [ 2 −4 1 0]
find bases of null(A) and im(A), and so find their dimensions.

Solution ► If x is in null(A), then Ax = 0, so x is given by solving the system Ax = 0. The reduction of the augmented matrix to reduced form is
[ 1 −2 1 1 | 0]      [1 −2 0 −1 | 0]
[−1  2 0 1 | 0]  →   [0  0 1  2 | 0]
[ 2 −4 1 0 | 0]      [0  0 0  0 | 0]
Hence r = rank(A) = 2. Here, im(A) = col(A) has basis {(1, −1, 2)ᵀ, (1, 0, 1)ᵀ} by Theorem 1 because the leading 1s are in columns 1 and 3. In particular, dim[im(A)] = 2 = r as in Theorem 2.
Turning to null(A), we use gaussian elimination. The leading variables are x1 and x3, so the nonleading variables become parameters: x2 = s and x4 = t. It follows from the reduced matrix that x1 = 2s + t and x3 = −2t, so the general solution is
x = (x1, x2, x3, x4)ᵀ = (2s + t, s, −2t, t)ᵀ = sx1 + tx2, where x1 = (2, 1, 0, 0)ᵀ and x2 = (1, 0, −2, 1)ᵀ.
Hence null(A) = span{x1, x2}. But x1 and x2 are the basic solutions, and Theorem 2 asserts that {x1, x2} is a basis of null(A). (In fact it is easy to verify directly that {x1, x2} is independent in this case.) In particular,
dim[null(A)] = 2 = n − r, as Theorem 2 asserts.
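SymPy's `nullspace` returns exactly the basic solutions used in Example 3, so the equality dim[null(A)] = n − r can be checked directly (illustration only).

```python
from sympy import Matrix, zeros

A = Matrix([[ 1, -2, 1, 1],
            [-1,  2, 0, 1],
            [ 2, -4, 1, 0]])

basis = A.nullspace()            # basic solutions of Ax = 0
print(A.rank(), len(basis))      # rank r = 2 and dim null A = n - r = 2
for v in basis:
    assert A * v == zeros(3, 1)  # each basis vector really solves Ax = 0
```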
Theorem 3
The following are equivalent for an m × n matrix A:
1. rank A = n.
2. row A = ℝⁿ.
3. The columns of A are linearly independent in ℝᵐ.
4. The n × n matrix AᵀA is invertible.
5. CA = In for some n × m matrix C.
6. If Ax = 0, x in ℝⁿ, then x = 0.
PROOF
(1) ⇒ (2). We have row A ⊆ ℝⁿ, and dim(row A) = n by (1), so row A = ℝⁿ by Theorem 8 Section 5.2. This is (2).
(2) ⇒ (3). By (2), row A = ℝⁿ, so rank A = n. This means dim(col A) = n. Since the n columns of A span col A, they are independent by Theorem 7 Section 5.2.
(3) ⇒ (4). If (AᵀA)x = 0, x in ℝⁿ, we show that x = 0 (Theorem 5 Section 2.4). We have
‖Ax‖² = (Ax)ᵀAx = xᵀAᵀAx = xᵀ0 = 0.
Hence Ax = 0, so x = 0 by (3) and Theorem 2 Section 5.2.
(4) ⇒ (5). Given (4), take C = (AᵀA)⁻¹Aᵀ.
(5) ⇒ (6). If Ax = 0, then left multiplication by C (from (5)) gives x = 0.
(6) ⇒ (1). Given (6), the columns of A are independent by Theorem 2 Section 5.2. Hence dim(col A) = n, and (1) follows.
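The implication (4) ⇒ (5) is constructive: C = (AᵀA)⁻¹Aᵀ is a left inverse of A whenever the columns of A are independent. A small NumPy sketch with one hypothetical matrix:

```python
import numpy as np

# A 3 x 2 matrix with independent columns (a hypothetical example).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

C = np.linalg.inv(A.T @ A) @ A.T       # the left inverse from (4) => (5)
print(np.allclose(C @ A, np.eye(2)))   # True
```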
Theorem 4
The following are equivalent for an m × n matrix A:
1. rank A = m.
2. col A = ℝᵐ.
3. The rows of A are linearly independent in ℝⁿ.
4. The m × m matrix AAᵀ is invertible.
5. AC = Im for some n × m matrix C.
6. The system Ax = b is consistent for every b in ℝᵐ.
PROOF
(1) ⇒ (2). By (1), dim(col A) = m, so col A = ℝᵐ by Theorem 8 Section 5.2.
(2) ⇒ (3). By (2), col A = ℝᵐ, so rank A = m. This means dim(row A) = m. Since the m rows of A span row A, they are independent by Theorem 7 Section 5.2.
EXAMPLE 4
Show that
[3          x + y + z    ]
[x + y + z  x² + y² + z²]
is invertible if x, y, and z are not all equal.

Solution ► The given matrix has the form AᵀA, where
A = [1 x]
    [1 y]
    [1 z]
has independent columns because x, y, and z are not all equal (verify). Hence Theorem 3 applies.
EXERCISES 5.4

1. In each case find bases for the row and column spaces of A and determine the rank of A.
(a) A = [2 −4 6  8]
        [2 −1 3  2]
        [4 −5 9 10]
        [0 −1 1  2]
(b) A = [ 2 −1 1]
        [−2  1 1]
        [ 4 −2 3]
        [−6  3 0]
(c) A = [ 1 −1   5 −2  2]
        [ 2 −2  −2  5  1]
        [ 0  0 −12  9 −3]
        [−1  1   7 −7  1]
(d) A = [ 1  2 −1  3]
        [−3 −6  3 −2]

2. In each case find a basis of the subspace U.
(a) U = span{(1, −1, 0, 3), (2, 1, 5, 1), (4, −2, 5, 7)}
(b) U = span{(1, −1, 2, 5, 1), (3, 1, 4, 2, 7), (1, 1, 0, 0, 0), (5, 1, 6, 7, 8)}
(c) U = span{(1, 1, 0, 0)ᵀ, (0, 0, 1, 1)ᵀ, (1, 0, 1, 0)ᵀ, (0, 1, 0, 1)ᵀ}
(d) U = span{(1, 5, −6)ᵀ, (2, 6, −8)ᵀ, (3, 7, −10)ᵀ, (4, 8, 12)ᵀ}

3. (a) Can a 3 × 4 matrix have independent columns? Independent rows? Explain.
(b) If A is 4 × 3 and rank A = 2, can A have independent columns? Independent rows? Explain.
(c) If A is an m × n matrix and rank A = m, show that m ≤ n.
(d) Can a nonsquare matrix have its rows independent and its columns independent? Explain.
(e) Can the null space of a 3 × 6 matrix have dimension 2? Explain.
(f) Suppose that A is 5 × 4 and null(A) = ℝx for some column x ≠ 0. Can dim(im A) = 2?

4. If A is m × n, show that col(A) = {Ax | x in ℝⁿ}.

(b) Conclude that if A² = 0, then rank A ≤ n/2.
(c) Find a matrix A for which col A = null A.

11. Let B be m × n and let AB be k × n. If rank B = rank(AB), show that null B = null(AB). [Hint: Theorem 1.]

12. Give a careful argument why rank(Aᵀ) = rank A.

13. Let A be an m × n matrix with columns c1, c2, …, cn. If rank A = n, show that {Aᵀc1, Aᵀc2, …, Aᵀcn} is a basis of ℝⁿ.
Similar Matrices
Definition 5.11 If A and B are n × n matrices, we say that A and B are similar, and write A ∼ B, if
B = P-1AP for some invertible matrix P.
Note that A ∼ B if and only if B = QAQ⁻¹ where Q is invertible (write P⁻¹ = Q).
The language of similarity is used throughout linear algebra. For example, a matrix
A is diagonalizable if and only if it is similar to a diagonal matrix.
If A ∼ B, then necessarily B ∼ A. To see why, suppose that B = P⁻¹AP. Then A = PBP⁻¹ = Q⁻¹BQ where Q = P⁻¹ is invertible. This proves the second of the following properties of similarity (the others are left as an exercise):
1. A ∼ A for all square matrices A.
2. If A ∼ B, then B ∼ A. (∗)
3. If A ∼ B and B ∼ C, then A ∼ C.
These properties are often expressed by saying that the similarity relation ∼ is an
equivalence relation on the set of n × n matrices. Here is an example showing how
these properties are used.
EXAMPLE 1
If A is similar to B and either A or B is diagonalizable, show that the other is
also diagonalizable.
Definition 5.12 The trace tr A of an n × n matrix A is defined to be the sum of the main diagonal
elements of A.
In other words:
tr A = a11 + a22 + ⋯ + ann.

Lemma 1
If A and B are n × n matrices, then tr(AB) = tr(BA).
PROOF
Write A = [aij] and B = [bij]. For each i, the (i, i)-entry di of the matrix AB is di = ai1b1i + ai2b2i + ⋯ + ainbni = Σj aijbji. Hence
tr(AB) = d1 + d2 + ⋯ + dn = Σi di = Σi (Σj aijbji).
Similarly, tr(BA) = Σi (Σj bijaji). Since these two double sums are the same, Lemma 1 is proved.
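Lemma 1 is easy to spot-check numerically with random matrices (NumPy; illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(3, 3)).astype(float)
B = rng.integers(-5, 6, size=(3, 3)).astype(float)

# tr(AB) = tr(BA) even though AB != BA in general.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```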
As the name indicates, similar matrices share many properties, some of which are
collected in the next theorem for reference.
Theorem 1
If A and B are similar n × n matrices, then A and B have the same determinant, rank,
trace, characteristic polynomial, and eigenvalues.
PROOF
Let B = P⁻¹AP for some invertible matrix P. Then we have
det B = det(P⁻¹) det A det P = det A because det(P⁻¹) = 1/det P.
Similarly, rank B = rank(P⁻¹AP) = rank A by Corollary 3 of Theorem 1 Section 5.4. Next, Lemma 1 gives
tr(P⁻¹AP) = tr[P⁻¹(AP)] = tr[(AP)P⁻¹] = tr A.
As to the characteristic polynomial,
cB(x) = det(xI − B) = det{x(P⁻¹IP) − P⁻¹AP}
= det{P⁻¹(xI − A)P}
= det(xI − A)
= cA(x).
Finally, this shows that A and B have the same eigenvalues because the eigenvalues of a matrix are the roots of its characteristic polynomial.
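The invariants in Theorem 1 can be observed numerically; here B = P⁻¹AP for an arbitrarily chosen invertible P (a sketch, not a proof):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])            # any invertible matrix will do
B = np.linalg.inv(P) @ A @ P         # B is similar to A

print(np.isclose(np.linalg.det(B), np.linalg.det(A)))  # True
print(np.isclose(np.trace(B), np.trace(A)))            # True
print(np.allclose(np.sort(np.linalg.eigvals(B)),
                  np.sort(np.linalg.eigvals(A))))      # True
```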
EXAMPLE 2
Sharing the five properties in Theorem 1 does not guarantee that two matrices are similar. The matrices
A = [1 1]    and    I = [1 0]
    [0 1]               [0 1]
have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar because P⁻¹IP = I for any invertible matrix P.
Diagonalization Revisited
Recall that a square matrix A is diagonalizable if there exists an invertible matrix P such that P⁻¹AP = D is a diagonal matrix, that is, if A is similar to a diagonal matrix D. Unfortunately, not all matrices are diagonalizable; for example
[1 1]
[0 1]
is not (see Example 10 Section 3.3). Determining whether A is diagonalizable is closely related to the eigenvalues and eigenvectors of A. Recall that a number λ is called an eigenvalue of A if Ax = λx for some nonzero column x in ℝⁿ, and any such nonzero vector x is called an eigenvector of A corresponding to λ (or simply a λ-eigenvector of A). The eigenvalues and eigenvectors of A are closely related to the characteristic polynomial cA(x) of A, defined by
cA(x) = det(xI − A).
If A is n × n, this is a polynomial of degree n, and its relationship to the eigenvalues is given in the following theorem (a repeat of Theorem 2 Section 3.3).
Theorem 2
Let A be an n × n matrix.
1. The eigenvalues λ of A are the roots of the characteristic polynomial cA(x) of A.
2. The λ-eigenvectors x are the nonzero solutions to the homogeneous system
(λI - A)x = 0
of linear equations with λI - A as coefficient matrix.
EXAMPLE 3
Show that the eigenvalues of a triangular matrix are the main diagonal entries.
Theorem 3
Let A be an n × n matrix.
1. A is diagonalizable if and only if ℝⁿ has a basis {x1, x2, …, xn} consisting of eigenvectors of A.
2. When this is the case, the matrix P = [x1 x2 ⋯ xn] is invertible and P⁻¹AP = diag(λ1, λ2, …, λn) where, for each i, λi is the eigenvalue of A corresponding to xi.
The next result is a basic tool for determining when a matrix is diagonalizable.
It reveals an important connection between eigenvalues and linear independence:
Eigenvectors corresponding to distinct eigenvalues are necessarily linearly
independent.
Theorem 4
Let x1, x2, …, xk be eigenvectors of an n × n matrix A corresponding to distinct eigenvalues λ1, λ2, …, λk. Then {x1, x2, …, xk} is linearly independent.
PROOF
We use induction on k. If k = 1, then {x1} is independent because x1 ≠ 0. In general, suppose the theorem is true for some k ≥ 1. Given eigenvectors {x1, x2, …, xk+1}, suppose a linear combination vanishes:
t1x1 + t2x2 + ⋯ + tk+1xk+1 = 0. (∗)
We must show that each ti = 0. Left multiply (∗) by A and use the fact that Axi = λixi to get
t1λ1x1 + t2λ2x2 + ⋯ + tk+1λk+1xk+1 = 0. (∗∗)
If we multiply (∗) by λ1 and subtract the result from (∗∗), the first terms cancel and we obtain
t2(λ2 − λ1)x2 + t3(λ3 − λ1)x3 + ⋯ + tk+1(λk+1 − λ1)xk+1 = 0.
Since x2, x3, …, xk+1 correspond to distinct eigenvalues λ2, λ3, …, λk+1, the set {x2, x3, …, xk+1} is independent by the induction hypothesis. Hence
t2(λ2 − λ1) = 0, t3(λ3 − λ1) = 0, …, tk+1(λk+1 − λ1) = 0,
and so t2 = t3 = ⋯ = tk+1 = 0 because the λi are distinct. Hence (∗) becomes t1x1 = 0, which implies that t1 = 0 because x1 ≠ 0. This is what we wanted.
Theorem 5
If A is an n × n matrix with n distinct eigenvalues, then A is diagonalizable.
PROOF
Choose one eigenvector for each of the n distinct eigenvalues. Then these eigenvectors are independent by Theorem 4, and so are a basis of ℝⁿ by Theorem 7 Section 5.2. Now use Theorem 3.
EXAMPLE 4
Show that
A = [ 1 0 0]
    [ 1 2 3]
    [−1 1 0]
is diagonalizable.

Solution ► A routine computation shows that cA(x) = (x − 1)(x − 3)(x + 1), and so A has distinct eigenvalues 1, 3, and −1. Hence Theorem 5 applies.
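Example 4 can be carried out numerically with `np.linalg.eig`, whose output is exactly the data in Theorem 3: the eigenvalues, and a matrix P of eigenvector columns with P⁻¹AP diagonal. Illustration only.

```python
import numpy as np

A = np.array([[ 1.0, 0.0, 0.0],
              [ 1.0, 2.0, 3.0],
              [-1.0, 1.0, 0.0]])

evals, P = np.linalg.eig(A)        # columns of P are eigenvectors
print(np.sort(np.round(evals.real, 6)))   # [-1.  1.  3.]: three distinct eigenvalues
D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, np.diag(evals)))     # True: P^{-1}AP is diagonal
```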
Lemma 2
Let {x1, x2, …, xk} be a linearly independent set of eigenvectors of an n × n matrix A, with Axi = λixi for each i. Extend this set to a basis {x1, x2, …, xn} of ℝⁿ, and let P = [x1 x2 ⋯ xn]. Then P is invertible, and the first k columns of P⁻¹AP are λ1e1, λ2e2, …, λkek; that is,
P⁻¹AP = [diag(λ1, …, λk)  B ]
        [0                A1]
in block form, where B is k × (n − k) and A1 is (n − k) × (n − k).
PROOF
If {e1, e2, …, en} is the standard basis of ℝⁿ, then
[e1 e2 ⋯ en] = In = P⁻¹P = P⁻¹[x1 x2 ⋯ xn] = [P⁻¹x1 P⁻¹x2 ⋯ P⁻¹xn].
Comparing columns, we have P⁻¹xi = ei for each 1 ≤ i ≤ n. On the other hand, observe that
P⁻¹AP = P⁻¹A[x1 x2 ⋯ xn] = [(P⁻¹A)x1 (P⁻¹A)x2 ⋯ (P⁻¹A)xn].
Hence, if 1 ≤ i ≤ k, column i of P⁻¹AP is
(P⁻¹A)xi = P⁻¹(λixi) = λi(P⁻¹xi) = λiei.
This describes the first k columns of P⁻¹AP, and Lemma 2 follows.
Lemma 3
Let λ be an eigenvalue of an n × n matrix A, and let m denote the multiplicity of λ as a root of cA(x). Then
dim[Eλ(A)] ≤ m,
where Eλ(A) = {x in ℝⁿ | Ax = λx} denotes the eigenspace of A corresponding to λ.
PROOF
Write dim[Eλ(A)] = d. It suffices to show that cA(x) = (x − λ)ᵈ g(x) for some polynomial g(x), because m is the highest power of (x − λ) that divides cA(x). To this end, let {x1, x2, …, xd} be a basis of Eλ(A). Then Lemma 2 shows that an invertible n × n matrix P exists such that
P⁻¹AP = [λId  B ]
        [0    A1]
in block form, where Id denotes the d × d identity matrix. Now write A′ = P⁻¹AP and observe that cA′(x) = cA(x) by Theorem 1. But Theorem 5 Section 3.1 gives
cA(x) = cA′(x) = det(xIn − A′) = det [(x − λ)Id   −B         ]
                                     [0           xIn−d − A1]
= det[(x − λ)Id] det[(xIn−d − A1)] = (x − λ)ᵈ g(x),
where g(x) = cA1(x). This is what we wanted.
distinct); in other words, every eigenvalue of A is real. This need not happen. Consider
A = [0 −1]
    [1  0],
and we investigate the general case below.
Theorem 6
The following are equivalent for a square matrix A for which cA(x) factors completely.
1. A is diagonalizable.
2. dim[Eλ(A)] equals the multiplicity of λ for every eigenvalue λ of the matrix A.
PROOF
Let A be n × n and let λ1, λ2, …, λk be the distinct eigenvalues of A. For each i, let mi denote the multiplicity of λi and write di = dim[Eλi(A)]. Then
cA(x) = (x − λ1)^m1 (x − λ2)^m2 ⋯ (x − λk)^mk,
so m1 + ⋯ + mk = n because cA(x) has degree n. Moreover, di ≤ mi for each i by Lemma 3.
(1) ⇒ (2). By (1), ℝⁿ has a basis of n eigenvectors of A, so let ti of them lie in Eλi(A) for each i. Since the subspace spanned by these ti eigenvectors has dimension ti, we have ti ≤ di for each i by Theorem 4 Section 5.2. Hence
n = t1 + ⋯ + tk ≤ d1 + ⋯ + dk ≤ m1 + ⋯ + mk = n.
It follows that d1 + ⋯ + dk = m1 + ⋯ + mk so, since di ≤ mi for each i, we must have di = mi. This is (2).
(2) ⇒ (1). Let Bi denote a basis of Eλi(A) for each i, and let B = B1 ∪ ⋯ ∪ Bk. Since each Bi contains mi vectors by (2), and since the Bi are pairwise disjoint (the λi are distinct), it follows that B contains n vectors. So it suffices to show that B is linearly independent (then B is a basis of ℝⁿ). Suppose a linear combination of the vectors in B vanishes, and let yi denote the sum of all terms that come from Bi. Then yi lies in Eλi(A) for each i, so the nonzero yi are independent by Theorem 4 (as the λi are distinct). Since the sum of the yi is zero, it follows that yi = 0 for each i. Hence all coefficients of terms in yi are zero (because Bi is independent). Since this holds for each i, it shows that B is independent.
EXAMPLE 5
If
A = [ 5  8  16]    and    B = [ 2 1  1]
    [ 4  1   8]               [ 2 1 −2]
    [−4 −4 −11]               [−1 0 −2]
show that A is diagonalizable but B is not.

x1 = (−1, 1, 0)ᵀ, x2 = (−2, 0, 1)ᵀ, x3 = (2, 1, −1)ᵀ
SECTION 5.5 Similarity and Diagonalization 269
y1 = (−1, 2, 1)ᵀ, y2 = (5, 6, −1)ᵀ.
Here dim(Eλ1(B)) = 1 is smaller than the multiplicity of λ1, so the matrix B is not diagonalizable, again by Theorem 6. The fact that dim(Eλ1(B)) = 1 means that there is no possibility of finding three linearly independent eigenvectors.
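The diagonalizability test of Theorem 6 is implemented in SymPy; for the matrices of Example 5 it confirms the conclusion (illustration only).

```python
from sympy import Matrix

A = Matrix([[ 5,  8,  16],
            [ 4,  1,   8],
            [-4, -4, -11]])
B = Matrix([[ 2, 1,  1],
            [ 2, 1, -2],
            [-1, 0, -2]])

print(A.is_diagonalizable())   # True: each eigenspace is as large as the multiplicity
print(B.is_diagonalizable())   # False: one eigenspace is smaller than the multiplicity
```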
Complex Eigenvalues
All the matrices we have considered have had real eigenvalues. But this need not be the case: the matrix
A = [0 −1]
    [1  0]
has characteristic polynomial cA(x) = x² + 1, which has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is that we must use a larger set of scalars, the complex numbers. The basic properties of these numbers are outlined in Appendix A.
Indeed, nearly everything we have done for real matrices can be done for
complex matrices. The methods are the same; the only difference is that the
arithmetic is carried out with complex numbers rather than real ones. For example,
the gaussian algorithm works in exactly the same way to solve systems of linear
equations with complex coefficients, matrix multiplication is defined the same way,
and the matrix inversion algorithm works in the same way.
But the complex numbers are better than the real numbers in one respect: While
there are polynomials like x2 + 1 with real coefficients that have no real root, this
problem does not arise with the complex numbers: Every nonconstant polynomial
with complex coefficients has a complex root, and hence factors completely as a
product of linear factors. This fact is known as the fundamental theorem of algebra.12
EXAMPLE 6
Diagonalize the matrix A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.

Solution ► The characteristic polynomial of A is
cA(x) = det(xI − A) = x² + 1 = (x − i)(x + i)
where i² = −1. Hence the eigenvalues are λ1 = i and λ2 = −i, with corresponding eigenvectors
x1 = \begin{bmatrix} 1 \\ -i \end{bmatrix} and x2 = \begin{bmatrix} 1 \\ i \end{bmatrix}. Hence A is diagonalizable
by the complex version of Theorem 5, and the complex version of Theorem 3
shows that P = [x1 x2] = \begin{bmatrix} 1 & 1 \\ -i & i \end{bmatrix} is invertible and P⁻¹AP = \begin{bmatrix} λ1 & 0 \\ 0 & λ2 \end{bmatrix} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}. Of
course, this can be checked directly.
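The direct check is easy to automate. As an illustration (a numpy sketch, not part of the text), the following confirms that P⁻¹AP is the diagonal matrix diag(i, −i):

```python
import numpy as np

# A and P = [x1 x2] from Example 6, where x1, x2 are the complex eigenvectors
A = np.array([[0.0, -1.0],
              [1.0, 0.0]])
P = np.array([[1, 1],
              [-1j, 1j]])

D = np.linalg.inv(P) @ A @ P  # should equal diag(i, -i)
print(np.round(D, 10))
```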
12 This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.
270 Chapter 5 The Vector Space ℝⁿ
Symmetric Matrices13
On the other hand, many of the applications of linear algebra involve a real matrix A
and, while A will have complex eigenvalues by the fundamental theorem of algebra,
it is always of interest to know when the eigenvalues are, in fact, real. While this
can happen in a variety of ways, it turns out to hold whenever A is symmetric. This
important theorem will be used extensively later. Surprisingly, the theory of complex
eigenvalues can be used to prove this useful result about real eigenvalues.
Let z̄ denote the conjugate of a complex number z. If A is a complex matrix, the
conjugate matrix Ā is defined to be the matrix obtained from A by conjugating
every entry. Thus, if A = [zij], then Ā = [z̄ij]. For example,
if A = \begin{bmatrix} -i+2 & 5 \\ i & 3+4i \end{bmatrix} then Ā = \begin{bmatrix} i+2 & 5 \\ -i & 3-4i \end{bmatrix}
Recall that \overline{z + w} = z̄ + w̄ and \overline{zw} = z̄ w̄ hold for all complex numbers z and w. It
follows that if A and B are two complex matrices, then
\overline{A + B} = Ā + B̄,  \overline{AB} = Ā B̄,  and  \overline{λA} = λ̄ Ā
hold for all complex scalars λ. These facts are used in the proof of the following
theorem.
Theorem 7
The eigenvalues of every real symmetric matrix are real.14
PROOF
Observe that Ā = A because A is real. If λ is an eigenvalue of A, we show that
λ is real by showing that λ̄ = λ. Let x be a (possibly complex) eigenvector
corresponding to λ, so that x ≠ 0 and Ax = λx. Define c = xᵀx̄.
If we write x = (z1, z2, …, zn) where the zi are complex numbers, we have
c = xᵀx̄ = z1z̄1 + z2z̄2 + ⋯ + znz̄n = |z1|² + |z2|² + ⋯ + |zn|².
Thus c is a real number, and c > 0 because at least one of the zi ≠ 0 (as x ≠ 0).
We show that λ̄ = λ by verifying that λc = λ̄c. We have
λc = λ(xᵀx̄) = (λx)ᵀx̄ = (Ax)ᵀx̄ = xᵀAᵀx̄.
At this point we use the hypothesis that A is symmetric and real. This means
Aᵀ = A = Ā, so we continue the calculation:
λc = xᵀAᵀx̄ = xᵀ(Āx̄) = xᵀ(\overline{Ax}) = xᵀ(\overline{λx})
   = xᵀ(λ̄x̄)
   = λ̄ xᵀx̄
   = λ̄c
as required.
The technique in the proof of Theorem 7 will be used again when we return to
complex linear algebra in Section 8.6.
13 This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix A.
14 This theorem was first proved in 1829 by the great French mathematician Augustin Louis Cauchy (1789–1857).
EXAMPLE 7
Verify Theorem 7 for every real, symmetric 2 × 2 matrix A.

Solution ► Write A = \begin{bmatrix} a & b \\ b & c \end{bmatrix} where a, b, and c are real. The characteristic polynomial is cA(x) = x² − (a + c)x + (ac − b²), and its discriminant is
(a + c)² − 4(ac − b²) = (a − c)² + 4b² ≥ 0
for any choice of a, b, and c. Hence, the eigenvalues are real numbers.
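Theorem 7 is also easy to illustrate numerically. The following sketch (an aside, not from the text) builds a random real symmetric matrix and checks that its eigenvalues have no imaginary part:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T                      # A is real and symmetric

eigenvalues = np.linalg.eigvals(A)
print(np.sort(eigenvalues.real))
# By Theorem 7 the imaginary parts are all zero:
print(np.allclose(eigenvalues.imag, 0.0))  # prints True
```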
EXERCISES 5.5
11. Given a polynomial p(x) = r0 + r1x + ⋯ + rnxⁿ and a square matrix A, the matrix p(A) = r0I + r1A + ⋯ + rnAⁿ is called the evaluation of p(x) at A. Let B = P⁻¹AP. Show that p(B) = P⁻¹p(A)P for all polynomials p(x).

12. Let P be an invertible n × n matrix. If A is any n × n matrix, write TP(A) = P⁻¹AP. Verify that:
(a) TP(I) = I
(b) TP(AB) = TP(A)TP(B)
(c) TP(A + B) = TP(A) + TP(B)
(d) TP(rA) = rTP(A)
(e) TP(Aᵏ) = [TP(A)]ᵏ for k ≥ 1
(f) If A is invertible, TP(A⁻¹) = [TP(A)]⁻¹.
(g) If Q is invertible, TQ[TP(A)] = TPQ(A).

13. (a) Show that two diagonalizable matrices are similar if and only if they have the same eigenvalues with the same multiplicities.
(b) If A is diagonalizable, show that A ∼ Aᵀ.
(c) Show that A ∼ Aᵀ if A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.

14. If A is 2 × 2 and diagonalizable, show that C(A) = {X | XA = AX} has dimension 2 or 4. [Hint: If P⁻¹AP = D, show that X is in C(A) if and only if P⁻¹XP is in C(D).]

15. If A is diagonalizable and p(x) is a polynomial such that p(λ) = 0 for all eigenvalues λ of A, show that p(A) = 0 (see Example 9 Section 3.3). In particular, show cA(A) = 0. [Remark: cA(A) = 0 for all square matrices A—this is the Cayley-Hamilton theorem (see Theorem 2 Section 9.4).]

19. Show that A is similar to Aᵀ for all 2 × 2 matrices A. [Hint: Let A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. If c = 0, treat the cases b = 0 and b ≠ 0 separately. If c ≠ 0, reduce to the case c = 1 using Exercise 12(d).]

20. Refer to Section 3.4 on linear recurrences. Assume that the sequence x0, x1, x2, … satisfies
xn+k = r0xn + r1xn+1 + ⋯ + rk−1xn+k−1
for all n ≥ 0. Define
A = \begin{bmatrix} 0 & 1 & 0 & ⋯ & 0 \\ 0 & 0 & 1 & ⋯ & 0 \\ ⋮ & ⋮ & ⋮ & & ⋮ \\ 0 & 0 & 0 & ⋯ & 1 \\ r0 & r1 & r2 & ⋯ & rk−1 \end{bmatrix},  Vn = \begin{bmatrix} xn \\ xn+1 \\ ⋮ \\ xn+k−1 \end{bmatrix}.
Then show that:
(a) Vn = AⁿV0 for all n.
(b) cA(x) = xᵏ − rk−1xᵏ⁻¹ − ⋯ − r1x − r0.
(c) If λ is an eigenvalue of A, the eigenspace Eλ has dimension 1, and x = (1, λ, λ², …, λᵏ⁻¹)ᵀ is an eigenvector. [Hint: Use cA(λ) = 0 to show that Eλ = ℝx.]
(d) A is diagonalizable if and only if the eigenvalues of A are distinct. [Hint: See part (c) and Theorem 4.]
(e) If λ1, λ2, …, λk are distinct real eigenvalues, there exist constants t1, t2, …, tk such that xn = t1λ1ⁿ + ⋯ + tkλkⁿ holds for all n. [Hint: If D is diagonal with λ1, λ2, …, λk as the main diagonal entries, show that Aⁿ = PDⁿP⁻¹ has entries that are linear combinations of λ1ⁿ, λ2ⁿ, …, λkⁿ.]
(AᵀA)z = Aᵀb.
Definition 5.14 This is a system of linear equations called the normal equations for z.
Note that this system can have more than one solution (see Exercise 5). However,
the n × n matrix AᵀA is invertible if (and only if) the columns of A are linearly
independent (Theorem 3 Section 5.4); so, in this case, z is uniquely determined and
is given explicitly by z = (AᵀA)⁻¹Aᵀb. However, the most efficient way to find z is
to apply gaussian elimination to the normal equations.
This discussion is summarized in the following theorem.
Theorem 1
Let A be an m × n matrix, let b be any column in ℝᵐ, and consider the system Ax = b of m equations in n variables.
1. Any solution z to the normal equations (AᵀA)z = Aᵀb is a best approximation to a solution of Ax = b in the sense that ‖b − Az‖ is as small as possible; that is, ‖b − Az‖ ≤ ‖b − Ax‖ for all columns x in ℝⁿ.
2. If the columns of A are linearly independent, then AᵀA is invertible, and z is given uniquely by z = (AᵀA)⁻¹Aᵀb.
EXAMPLE 1
The system of linear equations
3x − y = 4
x + 2y = 0
2x + y = 1
has no solution. Find the vector z = \begin{bmatrix} x0 \\ y0 \end{bmatrix} that best approximates a solution.

Solution ► In this case,
A = \begin{bmatrix} 3 & -1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix}, so AᵀA = \begin{bmatrix} 3 & 1 & 2 \\ -1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 1 \\ 1 & 6 \end{bmatrix}
is invertible. The normal equations (AᵀA)z = Aᵀb are
\begin{bmatrix} 14 & 1 \\ 1 & 6 \end{bmatrix} z = \begin{bmatrix} 14 \\ -3 \end{bmatrix}, so z = (1/83) \begin{bmatrix} 87 \\ -56 \end{bmatrix}.
Thus x0 = 87/83 and y0 = −56/83. With these values of x and y, the left sides of the equations are, approximately,
3x0 − y0 = 317/83 = 3.82
x0 + 2y0 = −25/83 = −0.30
2x0 + y0 = 118/83 = 1.42
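The arithmetic in Example 1 can be reproduced in a few lines. This sketch (a numpy aside, not part of the text) solves the normal equations for the same system:

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [1.0, 2.0],
              [2.0, 1.0]])
b = np.array([4.0, 0.0, 1.0])

# Normal equations (A^T A)z = A^T b
z = np.linalg.solve(A.T @ A, A.T @ b)
print(z)       # ≈ [1.0482, -0.6747], i.e. [87/83, -56/83]
print(A @ z)   # ≈ [3.82, -0.30, 1.42]
```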
EXAMPLE 2
The average number g of goals per game scored by a hockey player seems to
be related linearly to two factors: the number x1 of years of experience and the
number x2 of goals in the preceding 10 games. The data on the following page
were collected on four players. Find the linear function g = a0 + a1x1 + a2x2
that best fits these data.
SECTION 5.6 Best Approximation and Least Squares 275
g    x1   x2
0.8   5    3
0.8   3    4
0.6   1    5
0.4   2    1

Solution ► If the relationship is given by g = r0 + r1x1 + r2x2, then the data can
be described as follows:
\begin{bmatrix} 1 & 5 & 3 \\ 1 & 3 & 4 \\ 1 & 1 & 5 \\ 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} r0 \\ r1 \\ r2 \end{bmatrix} = \begin{bmatrix} 0.8 \\ 0.8 \\ 0.6 \\ 0.4 \end{bmatrix}
Using the notation in Theorem 1, we get
z = (AᵀA)⁻¹Aᵀb
  = (1/42) \begin{bmatrix} 119 & -17 & -19 \\ -17 & 5 & 1 \\ -19 & 1 & 5 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 5 & 3 & 1 & 2 \\ 3 & 4 & 5 & 1 \end{bmatrix} \begin{bmatrix} 0.8 \\ 0.8 \\ 0.6 \\ 0.4 \end{bmatrix} = \begin{bmatrix} 0.14 \\ 0.09 \\ 0.08 \end{bmatrix}
Hence the best-fitting function of this form is g = 0.14 + 0.09x1 + 0.08x2.
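As a cross-check of Example 2 (a numpy aside, not part of the text), solving the normal equations for these data reproduces z:

```python
import numpy as np

# Data from Example 2: columns are 1, x1 (years of experience), x2 (recent goals)
A = np.array([[1, 5, 3],
              [1, 3, 4],
              [1, 1, 5],
              [1, 2, 1]], dtype=float)
b = np.array([0.8, 0.8, 0.6, 0.4])

z = np.linalg.solve(A.T @ A, A.T @ b)
print(np.round(z, 2))  # [0.14 0.09 0.08]
```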
as the measure of error, and the line y = f (x) is to be chosen so as to make this sum
as small as possible. This line is said to be the least squares approximating line for
the data points (x1, y1), (x2, y2), …, (xn, yn).
The square of the error di is given by di² = [yi − f(xi)]² for each i, so the quantity
S to be minimized is the sum:
S = [y1 − f(x1)]² + [y2 − f(x2)]² + ⋯ + [yn − f(xn)]².
Note that all the numbers xi and yi are given here; what is required is that the
function f be chosen in such a way as to minimize S. Because f (x) = r0 + r1x, this
amounts to choosing r0 and r1 to minimize S. This problem can be solved using
Theorem 1. The following notation is convenient.
x = \begin{bmatrix} x1 \\ x2 \\ ⋮ \\ xn \end{bmatrix}  y = \begin{bmatrix} y1 \\ y2 \\ ⋮ \\ yn \end{bmatrix}  and  f(x) = \begin{bmatrix} f(x1) \\ f(x2) \\ ⋮ \\ f(xn) \end{bmatrix} = \begin{bmatrix} r0 + r1x1 \\ r0 + r1x2 \\ ⋮ \\ r0 + r1xn \end{bmatrix}
Then the problem takes the following form: Choose r0 and r1 such that
S = [y1 − f(x1)]² + [y2 − f(x2)]² + ⋯ + [yn − f(xn)]² = ‖y − f(x)‖²
is as small as possible. Now write
M = \begin{bmatrix} 1 & x1 \\ 1 & x2 \\ ⋮ & ⋮ \\ 1 & xn \end{bmatrix}  and  r = \begin{bmatrix} r0 \\ r1 \end{bmatrix}.
Then Mr = f(x), so we are looking for a column r = \begin{bmatrix} r0 \\ r1 \end{bmatrix} such that ‖y − Mr‖² is as
small as possible. In other words, we are looking for a best approximation z to the
system Mr = y. Hence Theorem 1 applies directly, and we have
Theorem 2
Suppose that n data points (x1, y1), (x2, y2), …, (xn, yn) are given, where at least two of
x1, x2, …, xn are distinct. Put
y = \begin{bmatrix} y1 \\ y2 \\ ⋮ \\ yn \end{bmatrix}  M = \begin{bmatrix} 1 & x1 \\ 1 & x2 \\ ⋮ & ⋮ \\ 1 & xn \end{bmatrix}
Then the least squares approximating line for these data points has equation
y = z0 + z1x
where z = \begin{bmatrix} z0 \\ z1 \end{bmatrix} is found by gaussian elimination from the normal equations
(MᵀM)z = Mᵀy.
The condition that at least two of x1, x2, …, xn are distinct ensures that MᵀM is an
invertible matrix, so z is unique:
z = (MᵀM)⁻¹Mᵀy.
EXAMPLE 3
Let data points (x1, y1), (x2, y2), …, (x5, y5) be given as in the accompanying table. Find the least squares approximating line for these data.

x  1  3  4  6  7
y  1  2  3  4  5

Solution ► In this case,
MᵀM = \begin{bmatrix} 1 & 1 & ⋯ & 1 \\ x1 & x2 & ⋯ & x5 \end{bmatrix} \begin{bmatrix} 1 & x1 \\ 1 & x2 \\ ⋮ & ⋮ \\ 1 & x5 \end{bmatrix} = \begin{bmatrix} 5 & x1 + ⋯ + x5 \\ x1 + ⋯ + x5 & x1² + ⋯ + x5² \end{bmatrix} = \begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix},
and Mᵀy = \begin{bmatrix} 1 & 1 & ⋯ & 1 \\ x1 & x2 & ⋯ & x5 \end{bmatrix} \begin{bmatrix} y1 \\ y2 \\ ⋮ \\ y5 \end{bmatrix} = \begin{bmatrix} y1 + y2 + ⋯ + y5 \\ x1y1 + x2y2 + ⋯ + x5y5 \end{bmatrix} = \begin{bmatrix} 15 \\ 78 \end{bmatrix}.
Hence
z = (MᵀM)⁻¹Mᵀy = (1/114) \begin{bmatrix} 111 & -21 \\ -21 & 5 \end{bmatrix} \begin{bmatrix} 15 \\ 78 \end{bmatrix} = (1/114) \begin{bmatrix} 27 \\ 75 \end{bmatrix} = (1/38) \begin{bmatrix} 9 \\ 25 \end{bmatrix},
so the least squares approximating line for these data is y = 9/38 + (25/38)x.
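The same computation, done numerically (a numpy aside, not part of the text), confirms z = (9/38, 25/38)ᵀ:

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

M = np.column_stack([np.ones_like(x), x])
z = np.linalg.solve(M.T @ M, M.T @ y)   # normal equations (M^T M)z = M^T y
print(z)                                # ≈ [0.2368, 0.6579], i.e. [9/38, 25/38]

# np.polyfit fits the same line (coefficients come highest degree first)
print(np.polyfit(x, y, 1))
```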
As before, write
x = \begin{bmatrix} x1 \\ x2 \\ ⋮ \\ xn \end{bmatrix}  y = \begin{bmatrix} y1 \\ y2 \\ ⋮ \\ yn \end{bmatrix}  and  f(x) = \begin{bmatrix} f(x1) \\ f(x2) \\ ⋮ \\ f(xn) \end{bmatrix}
For each xi we have two values of the variable y, the observed value yi, and the
computed value f (xi). The problem is to choose f (x)—that is, choose r0, r1, …, rm
—such that the f (xi) are as close as possible to the yi. Again we define “as close as
possible” by the least squares condition: We choose the ri such that
‖y − f(x)‖² = [y1 − f(x1)]² + [y2 − f(x2)]² + ⋯ + [yn − f(xn)]²
is as small as possible.
Definition 5.15 A polynomial f (x) satisfying this condition is called a least squares approximating
polynomial of degree m for the given data pairs.
If we write
M = \begin{bmatrix} 1 & x1 & x1² & ⋯ & x1ᵐ \\ 1 & x2 & x2² & ⋯ & x2ᵐ \\ ⋮ & ⋮ & ⋮ & & ⋮ \\ 1 & xn & xn² & ⋯ & xnᵐ \end{bmatrix}  and  r = \begin{bmatrix} r0 \\ r1 \\ ⋮ \\ rm \end{bmatrix}
we see that f(x) = Mr. Hence we want to find r such that ‖y − Mr‖² is as small as
possible; that is, we want a best approximation z to the system Mr = y. Theorem 1
gives the first part of Theorem 3.
Theorem 3
Let n data pairs (x1, y1), (x2, y2), …, (xn, yn) be given, and write
y = \begin{bmatrix} y1 \\ y2 \\ ⋮ \\ yn \end{bmatrix}  M = \begin{bmatrix} 1 & x1 & x1² & ⋯ & x1ᵐ \\ 1 & x2 & x2² & ⋯ & x2ᵐ \\ ⋮ & ⋮ & ⋮ & & ⋮ \\ 1 & xn & xn² & ⋯ & xnᵐ \end{bmatrix}  z = \begin{bmatrix} z0 \\ z1 \\ ⋮ \\ zm \end{bmatrix}
1. If z is any solution to the normal equations
(MᵀM)z = Mᵀy
then the polynomial
z0 + z1x + z2x² + ⋯ + zmxᵐ
is a least squares approximating polynomial of degree m for the given data pairs.
2. If at least m + 1 of the numbers x1, x2, …, xn are distinct (so n ≥ m + 1), the
matrix MᵀM is invertible and z is uniquely determined by
z = (MᵀM)⁻¹Mᵀy
PROOF
It remains to prove (2), and for that we show that the columns of M are linearly
independent (Theorem 3 Section 5.4). Suppose a linear combination of the
columns vanishes:
r0 \begin{bmatrix} 1 \\ 1 \\ ⋮ \\ 1 \end{bmatrix} + r1 \begin{bmatrix} x1 \\ x2 \\ ⋮ \\ xn \end{bmatrix} + ⋯ + rm \begin{bmatrix} x1ᵐ \\ x2ᵐ \\ ⋮ \\ xnᵐ \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ ⋮ \\ 0 \end{bmatrix}
If we write q(x) = r0 + r1x + ⋯ + rmxᵐ, comparing entries shows that
q(x1) = q(x2) = ⋯ = q(xn) = 0. Hence q(x) is a polynomial of degree at most m with at
least m + 1 distinct roots, so q(x) must be the zero polynomial (see Appendix D
or Theorem 4 Section 6.5). Thus r0 = r1 = ⋯ = rm = 0 as required.
EXAMPLE 4
Find the least squares approximating quadratic y = z0 + z1x + z2x² for the following data points.
(−3, 3), (−1, 1), (0, 1), (1, 2), (3, 4)

Solution ► Here
Mᵀy = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}
The normal equations for z are
\begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix} z = \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}  whence  z = \begin{bmatrix} 1.15 \\ 0.20 \\ 0.26 \end{bmatrix}
This means that the least squares approximating quadratic for these data is
y = 1.15 + 0.20x + 0.26x².
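A quick numerical check of Example 4 (a numpy aside, not part of the text):

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
y = np.array([3.0, 1.0, 1.0, 2.0, 4.0])

# Columns of M are 1, x, x^2 evaluated at the data points
M = np.column_stack([np.ones_like(x), x, x**2])
z = np.linalg.solve(M.T @ M, M.T @ y)
print(np.round(z, 2))  # [1.15 0.2  0.26]
```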
Other Functions
There is an extension of Theorem 3 that should be mentioned. Given data pairs
(x1, y1), (x2, y2), …, (xn, yn), that theorem shows how to find a polynomial
f(x) = r0 + r1x + ⋯ + rmxᵐ
such that ‖y − f(x)‖² is as small as possible, where x and f(x) are as before.
Choosing the appropriate polynomial f(x) amounts to choosing the coefficients
r0, r1, …, rm, and Theorem 3 gives a formula for the optimal choices. Here f(x) is a
linear combination of the functions 1, x, x², …, xᵐ where the ri are the coefficients,
and this suggests applying the method to other functions. If f0(x), f1(x), …, fm(x) are
given functions, write
f(x) = r0 f0(x) + r1 f1(x) + ⋯ + rm fm(x)
where the ri are real numbers. Then the more general question is whether
r0, r1, …, rm can be found such that ‖y − f(x)‖² is as small as possible, where
f(x) = \begin{bmatrix} f(x1) \\ f(x2) \\ ⋮ \\ f(xn) \end{bmatrix}
Such a function f(x) is called a least squares best approximation for these data
pairs of the form r0 f0(x) + r1 f1(x) + ⋯ + rm fm(x), ri in ℝ. The proof of Theorem 3
goes through to prove
Theorem 4
Let n data pairs (x1, y1), (x2, y2), …, (xn, yn) be given, and suppose that m + 1 functions
f0(x), f1(x), …, fm(x) are specified. Write
y = \begin{bmatrix} y1 \\ y2 \\ ⋮ \\ yn \end{bmatrix}  M = \begin{bmatrix} f0(x1) & f1(x1) & ⋯ & fm(x1) \\ f0(x2) & f1(x2) & ⋯ & fm(x2) \\ ⋮ & ⋮ & & ⋮ \\ f0(xn) & f1(xn) & ⋯ & fm(xn) \end{bmatrix}  z = \begin{bmatrix} z0 \\ z1 \\ ⋮ \\ zm \end{bmatrix}
(1) If z is any solution to the normal equations
(MᵀM)z = Mᵀy,
then z0 f0(x) + z1 f1(x) + ⋯ + zm fm(x) is the best approximation for these data among all functions of the form
r0 f0(x) + r1 f1(x) + ⋯ + rm fm(x) where the ri are in ℝ.
(2) If MᵀM is invertible (that is, if rank(M) = m + 1), then z is uniquely
determined; in fact, z = (MᵀM)⁻¹(Mᵀy).
Clearly Theorem 4 contains Theorem 3 as a special case, but there is no simple test
in general for whether MTM is invertible. Conditions for this to hold depend on the
choice of the functions f0(x), f1(x), …, fm(x).
EXAMPLE 5
Given the data pairs (−1, 0), (0, 1), and (1, 4), find the least squares approximating function of the form r0x + r1·2ˣ.

Solution ► The functions are f0(x) = x and f1(x) = 2ˣ, so the matrix M is
M = \begin{bmatrix} f0(x1) & f1(x1) \\ f0(x2) & f1(x2) \\ f0(x3) & f1(x3) \end{bmatrix} = \begin{bmatrix} -1 & 2⁻¹ \\ 0 & 2⁰ \\ 1 & 2¹ \end{bmatrix} = (1/2) \begin{bmatrix} -2 & 1 \\ 0 & 2 \\ 2 & 4 \end{bmatrix}
In this case MᵀM = (1/4) \begin{bmatrix} 8 & 6 \\ 6 & 21 \end{bmatrix} is invertible, so the normal equations
(MᵀM)z = Mᵀy have the unique solution z = (MᵀM)⁻¹Mᵀy = (1/11) \begin{bmatrix} 10 \\ 16 \end{bmatrix}. Hence the best
approximating function of this form is (10/11)x + (16/11)2ˣ.
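Example 5 can be checked the same way. This numpy aside (not part of the text) forms M from the basis functions f0(x) = x and f1(x) = 2ˣ and solves the normal equations; the solution works out to z = (10/11, 16/11)ᵀ:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])
y = np.array([0.0, 1.0, 4.0])

# Columns of M are the basis functions evaluated at the data: f0(x) = x, f1(x) = 2^x
M = np.column_stack([x, 2.0**x])
z = np.linalg.solve(M.T @ M, M.T @ y)
print(z)  # ≈ [0.9091, 1.4545], i.e. [10/11, 16/11]
```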
EXERCISES 5.6
1. Find the best approximation to a solution of each of the following systems of equations.
(a) x + y − z = 5
    2x − y + 6z = 1
    3x + 2y − z = 6
    −x + 4y + z = 0
(b) 3x + y + z = 6
    2x + 3y − z = 1
    2x − y + z = 0
    3x − 3y + 3z = 8

2. Find the least squares approximating line y = z0 + z1x for each of the following sets of data points.
(a) (1, 1), (3, 2), (4, 3), (6, 4)
(b) (2, 4), (4, 3), (7, 2), (8, 1)
(c) (−1, −1), (0, 1), (1, 2), (2, 4), (3, 6)
(d) (−2, 3), (−1, 1), (0, 0), (1, −2), (2, −4)

3. Find the least squares approximating quadratic y = z0 + z1x + z2x² for each of the following sets of data points.
(a) (0, 1), (2, 2), (3, 3), (4, 5)
(b) (−2, 1), (0, 0), (3, 2), (4, 3)

4. Find a least squares approximating function of the form r0x + r1x² + r2·2ˣ for each of the following sets of data pairs.
(a) (−1, 1), (0, 3), (1, 1), (2, 0)
(b) (0, 1), (1, 1), (2, 5), (3, 10)

5. Find the least squares approximating function of the form r0 + r1x² + r2 sin(πx/2) for each of the following sets of data pairs.
(a) (0, 3), (1, 0), (1, −1), (−1, 2)
(b) (−1, 1/2), (0, 1), (2, 5), (3, 9)

6. If M is a square invertible matrix, show that z = M⁻¹y (in the notation of Theorem 3).

7. Newton's laws of motion imply that an object dropped from rest at a height of 100 metres will be at a height s = 100 − (1/2)gt² metres t seconds later, where g is a constant called the acceleration due to gravity. The values of s and t given in the table are observed. Write x = t², find the least squares approximating line s = a + bx for these data, and use b to estimate g.
Then find the least squares approximating quadratic s = a0 + a1t + a2t² and use the value of a2 to estimate g.

t  1  2  3
s  95 80 56

8. A naturalist measured the heights yi (in metres) of several spruce trees with trunk diameters xi (in centimetres). The data are as given in the table. Find the least squares approximating line for these data and use it to estimate the height of a spruce tree with a trunk of diameter 10 cm.

xi  5  7  8  12  13  16
yi  2  3.3  4  7.3  7.9  10.1

9. The yield y of wheat in bushels per acre appears to be a linear function of the number of days x1 of sunshine, the number of inches x2 of rain, and the number of pounds x3 of fertilizer applied per acre. Find the best fit to the data in the table by an equation of the form y = r0 + r1x1 + r2x2 + r3x3. [Hint: If a calculator for inverting AᵀA is not available, the inverse is given in the answer.]
related to it. The most widely known statistic for describing a data set is the sample
mean x̄, defined by16
x̄ = (1/n)(x1 + x2 + ⋯ + xn) = (1/n) ∑ᵢ₌₁ⁿ xi.
The mean x̄ is “typical” of the sample values xi, but may not itself be one of them.
The number xi − x̄ is called the deviation of xi from the mean x̄. The deviation is
positive if xi > x̄ and it is negative if xi < x̄. Moreover, the sum of these deviations
is zero:
∑ᵢ₌₁ⁿ (xi − x̄) = (∑ᵢ₌₁ⁿ xi) − nx̄ = nx̄ − nx̄ = 0.   (∗)
This is described by saying that the sample mean x̄ is central to the sample values xi.
If the mean x̄ is subtracted from each data value xi, the resulting data xi − x̄ are
said to be centred. The corresponding data vector is
xc = [x1 − x̄  x2 − x̄  ⋯  xn − x̄]
and (∗) shows that the mean x̄c = 0. For example, the sample x = [−1 0 1 4 6]
is plotted in the first diagram. The mean is x̄ = 2, and the centred sample
xc = [−3 −2 −1 2 4] is also plotted. Thus, the effect of centring is to shift the
data by an amount x̄ (to the left if x̄ is positive) so that the mean moves to 0.

[Number-line plots: the sample x, with points at −1, 0, 1, 4, 6, and the centred sample xc, with points at −3, −2, −1, 2, 4]

Another question that arises about samples is how much variability there is in the
sample x = [x1 x2 ⋯ xn]; that is, how widely are the data “spread out” around the
sample mean x̄. A natural measure of variability would be the sum of the deviations
of the xi about the mean, but this sum is zero by (∗); these deviations cancel out.
To avoid this cancellation, statisticians use the squares (xi − x̄)² of the deviations as
a measure of variability. More precisely, they compute a statistic called the sample
variance s²x, defined17 as follows:
s²x = (1/(n − 1))[(x1 − x̄)² + (x2 − x̄)² + ⋯ + (xn − x̄)²] = (1/(n − 1)) ∑ᵢ₌₁ⁿ (xi − x̄)².
The sample variance will be large if there are many xi at a large distance from the
mean x̄, and it will be small if all the xi are tightly clustered about the mean. The
variance is clearly nonnegative (hence the notation s²x), and the square root sx of the
variance is called the sample standard deviation.
The sample mean and variance can be conveniently described using the dot
product. Let
1 = [1 1 ⋯ 1]
denote the row with every entry equal to 1. If x = [x1 x2 ⋯ xn], then
x · 1 = x1 + x2 + ⋯ + xn, so the sample mean is given by the formula
x̄ = (1/n)(x · 1).
Moreover, remembering that x̄ is a scalar, we have x̄1 = [x̄ x̄ ⋯ x̄], so the centred
sample vector xc is given by
xc = x − x̄1 = [x1 − x̄  x2 − x̄  ⋯  xn − x̄].
Thus we obtain a formula for the sample variance:
s²x = (1/(n − 1))‖xc‖² = (1/(n − 1))‖x − x̄1‖².
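These dot-product formulas translate directly into code. The following sketch (a numpy aside, not from the text) computes the mean, the centred sample, and the variance for the sample x = [−1 0 1 4 6] used above:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 4.0, 6.0])
n = len(x)
ones = np.ones(n)

mean = (x @ ones) / n            # sample mean = (1/n)(x · 1)
xc = x - mean * ones             # centred sample
variance = (xc @ xc) / (n - 1)   # s_x^2 = ||xc||^2 / (n - 1)

print(mean, xc, variance)        # 2.0 [-3. -2. -1.  2.  4.] 8.5
```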
Linear algebra is also useful for comparing two different samples. To illustrate
how, consider two examples.
16 The mean is often called the “average” of the sample values xi, but statisticians use the term “mean”.
17 Since there are n sample values, it seems more natural to divide by n here, rather than by n - 1. The reason for using n - 1 is that
then the sample variance sx2 provides a better estimate of the variance of the entire population from which the sample was drawn.
The following table represents the number of sick days at work per year and the
yearly number of visits to a physician for 10 individuals.

Individual      1  2  3  4  5  6   7  8  9  10
Doctor visits   2  6  8  1  5  10  3  9  7  4
Sick days       2  4  8  3  5  9   4  7  7  2

[Scatter plot: Sick Days vs. Doctor Visits]

The data are plotted in the scatter diagram where it is evident that, roughly
speaking, the more visits to the doctor the more sick days. This is an example
of a positive correlation between sick days and doctor visits.
Now consider the following table representing the daily doses of vitamin C
and the number of sick days.

Individual   1  2  3  4  5  6  7  8  9  10
Vitamin C    1  5  7  0  4  9  2  8  6  3
Sick days    5  2  2  6  2  1  4  3  2  5

[Scatter plot: Sick Days vs. Vitamin C Doses]

The scatter diagram is plotted as shown and it appears that the more vitamin C
taken, the fewer sick days. In this case there is a negative correlation between daily
vitamin C and sick days.
In both these situations, we have paired samples, that is, observations of two
variables are made for ten individuals: doctor visits and sick days in the first case;
daily vitamin C and sick days in the second case. The scatter diagrams point to
a relationship between these variables, and there is a way to use the sample to
compute a number, called the correlation coefficient,18 that measures the degree to
which the variables are associated.
To motivate the definition of the correlation coefficient, suppose two paired
samples x = [x1 x2 ⋯ xn] and y = [y1 y2 ⋯ yn] are given and consider the
centred samples
xc = [x1 − x̄  x2 − x̄  ⋯  xn − x̄]  and  yc = [y1 − ȳ  y2 − ȳ  ⋯  yn − ȳ]
If xk is large among the xi's, then the deviation xk − x̄ will be positive; and xk − x̄
will be negative if xk is small among the xi's. The situation is similar for y, and the
following table displays the sign of the quantity (xi − x̄)(yi − ȳ) in all four cases:

Sign of (xi − x̄)(yi − ȳ):
           xi large    xi small
yi large   positive    negative
yi small   negative    positive
Intuitively, if x and y are positively correlated, then two things happen:
1. Large values of the xi tend to be associated with large values of the yi, and
2. Small values of the xi tend to be associated with small values of the yi.
It follows from the table that, if x and y are positively correlated, then the dot
product
xc · yc = ∑ᵢ₌₁ⁿ (xi − x̄)(yi − ȳ)
18 The idea of using a single number to measure the degree of relationship between different variables was pioneered by Francis Galton
(1822–1911). He was studying the degree to which characteristics of an offspring relate to those of its parents. The idea was refined
by Karl Pearson (1857–1936) and r is often referred to as the Pearson correlation coefficient.
SECTION 5.7 An Application to Correlation and Variance 285
will consist mostly of positive terms. This suggests the following definition: the correlation coefficient of the paired samples x and y is
r = r(x, y) = (xc · yc) / (‖xc‖ ‖yc‖).
Bearing the situation in ℝ³ in mind, r is the cosine of the “angle” between the
vectors xc and yc, and so we would expect it to lie between -1 and 1. Moreover,
we would expect r to be near 1 (or -1) if these vectors were pointing in the same
(opposite) direction, that is the “angle” is near zero (or π).
This is confirmed by Theorem 1 below, and it is also borne out in the examples
above. If we compute the correlation between sick days and visits to the physician
(in the first scatter diagram above) the result is r = 0.90 as expected. On the other
hand, the correlation between daily vitamin C doses and sick days (second scatter
diagram) is r = -0.84.
However, a word of caution is in order here. We cannot conclude from the
second example that taking more vitamin C will reduce the number of sick days
at work. The (negative) correlation may arise because of some third factor that is
related to both variables. For example, it may be that less healthy people are
inclined to take more vitamin C. Correlation does not imply causation. Similarly,
the correlation between sick days and visits to the doctor does not mean that having
many sick days causes more visits to the doctor. A correlation between two variables
may point to the existence of other underlying factors, but it does not necessarily
mean that there is a causality relationship between the variables.
Our discussion of the dot product in ℝⁿ provides the basic properties of the
correlation coefficient:
Theorem 1
Let x = [x1 x2 ⋯ xn] and y = [y1 y2 ⋯ yn] be (nonzero) paired samples, and let
r = r(x, y) denote the correlation coefficient. Then:
1. -1 ≤ r ≤ 1.
2. r = 1 if and only if there exist a and b > 0 such that yi = a + bxi for each i.
3. r = -1 if and only if there exist a and b < 0 such that yi = a + bxi for each i.
PROOF
The Cauchy inequality (Theorem 2 Section 5.3) proves (1), and also shows that
r = ±1 if and only if one of xc and yc is a scalar multiple of the other. This in
turn holds if and only if yc = bxc for some b ≠ 0, and it is easy to verify that r = 1
when b > 0 and r = -1 when b < 0.
Finally, yc = bxc means yi − ȳ = b(xi − x̄) for each i; that is, yi = a + bxi
where a = ȳ − bx̄. Conversely, if yi = a + bxi, then ȳ = a + bx̄ (verify), so
yi − ȳ = (a + bxi) − (a + bx̄) = b(xi − x̄) for each i. In other words, yc = bxc.
This completes the proof.
Properties (2) and (3) in Theorem 1 show that r(x, y) = 1 means that there is
a linear relation with positive slope between the paired data (so large x values are
paired with large y values). Similarly, r(x, y) = -1 means that there is a linear
relation with negative slope between the paired data (so small x values are paired
with small y values). This is borne out in the two scatter diagrams above.
We conclude by using the dot product to derive some useful formulas for
computing variances and correlation coefficients. Given samples x = [x1 x2 ⋯ xn]
and y = [y1 y2 ⋯ yn], the key observation is the following formula:
xc · yc = x · y − nx̄ȳ.   (∗∗)
Indeed, remembering that x̄ and ȳ are scalars:
xc · yc = (x − x̄1) · (y − ȳ1)
        = x · y − x · (ȳ1) − (x̄1) · y + (x̄1) · (ȳ1)
        = x · y − ȳ(x · 1) − x̄(1 · y) + x̄ȳ(1 · 1)
        = x · y − ȳ(nx̄) − x̄(nȳ) + x̄ȳ(n)
        = x · y − nx̄ȳ.
Taking y = x in (∗∗) gives a formula for the variance s²x = (1/(n − 1))‖xc‖² of x.
Variance Formula
If x is a sample vector, then s²x = (1/(n − 1))(‖x‖² − nx̄²).
Correlation Formula
If x and y are paired sample vectors, then r(x, y) = (x · y − nx̄ȳ) / ((n − 1) sx sy).
Data Scaling
Let x = [x1 x2 ⋯ xn] and y = [y1 y2 ⋯ yn] be sample vectors. Given constants a, b, c,
and d, consider new samples z = [z1 z2 ⋯ zn] and w = [w1 w2 ⋯ wn] where
zi = a + bxi for each i and wi = c + dyi for each i. Then:
(a) z̄ = a + bx̄.
(b) s²z = b²s²x, so sz = |b|sx.
(c) If b and d have the same sign, then r(x, y) = r(z, w).
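The definition of r and the data scaling property are easy to check numerically. This sketch (a numpy aside, not from the text) computes r for the doctor-visits data above and verifies that an affine change of scale with b, d > 0 leaves r unchanged:

```python
import numpy as np

def corr(x, y):
    # r(x, y) = (xc · yc) / (||xc|| ||yc||)
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Doctor visits vs. sick days from the first table
visits = np.array([2, 6, 8, 1, 5, 10, 3, 9, 7, 4], dtype=float)
sick = np.array([2, 4, 8, 3, 5, 9, 4, 7, 7, 2], dtype=float)
print(round(corr(visits, sick), 2))                  # 0.9

# Data scaling: r is unchanged by zi = a + b*xi, wi = c + d*yi with b, d > 0
print(round(corr(3 + 2 * visits, 1 + 5 * sick), 2))  # 0.9
```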
EXERCISES 5.7
1. The following table gives the IQ scores of 10 fathers and their eldest sons. Calculate the means, the variances, and the correlation coefficient. (The data scaling formula is useful.)

Individual    1   2   3   4   5   6   7   8   9  10
Father's IQ  140 131 120 115 110 106 100  95  91  86
Son's IQ     130 138 110  99 109 120 105  99 100  94

2. The following table gives the number of years of education and the annual income (in thousands) of 10 individuals. Find the means, the variances, and the correlation coefficient. (Again the data scaling formula is useful.)

3. If x is a sample vector, and xc is the centred sample, show that x̄c = 0 and the standard deviation of xc is sx.

4. Prove the data scaling formulas found on page 292: (a), (b), and (c).
Supplementary Exercises for Chapter 5

1. In each case either show that the statement is true or give an example showing that it is false. Throughout, x, y, z, x1, x2, …, xn denote vectors in ℝⁿ.
(a) If U is a subspace of ℝⁿ and x + y is in U, then x and y are both in U.
(b) If U is a subspace of ℝⁿ and rx is in U, then x is in U.
(c) If U is a nonempty set and sx + ty is in U for any s and t whenever x and y are in U, then U is a subspace.
(d) If U is a subspace of ℝⁿ and x is in U, then −x is in U.
(e) If {x, y} is independent, then {x, y, x + y} is independent.
(f) If {x, y, z} is independent, then {x, y} is independent.
(g) If {x, y} is not independent, then {x, y, z} is not independent.
(h) If all of x1, x2, …, xn are nonzero, then {x1, x2, …, xn} is independent.
(i) If one of x1, x2, …, xn is zero, then {x1, x2, …, xn} is not independent.
(j) If ax + by + cz = 0 where a, b, and c are in ℝ, then {x, y, z} is independent.
(k) If {x, y, z} is independent, then ax + by + cz = 0 for some a, b, and c in ℝ.
(l) If {x1, x2, …, xn} is not independent, then t1x1 + t2x2 + ⋯ + tnxn = 0 for ti in ℝ not all zero.
(m) If {x1, x2, …, xn} is independent, then t1x1 + t2x2 + ⋯ + tnxn = 0 for some ti in ℝ.
(n) Every set of four non-zero vectors in ℝ⁴ is a basis.
(o) No basis of ℝ³ can contain a vector with a component 0.
(p) ℝ³ has a basis of the form {x, x + y, y} where x and y are vectors.
(q) Every basis of ℝ⁵ contains one column of I5.
(r) Every nonempty subset of a basis of ℝ³ is again a basis of ℝ³.
(s) If {x1, x2, x3, x4} and {y1, y2, y3, y4} are bases of ℝ⁴, then {x1 + y1, x2 + y2, x3 + y3, x4 + y4} is also a basis of ℝ⁴.
Chapter 6  Vector Spaces
In this chapter we introduce vector spaces in full generality. The reader will notice
some similarity with the discussion of the space ℝⁿ in Chapter 5. In fact much of the
present material has been developed in that context, and there is some repetition.
However, Chapter 6 deals with the notion of an abstract vector space, a concept
that will be new to most readers. It turns out that there are many systems in which
a natural addition and scalar multiplication are defined and satisfy the usual rules
familiar from ℝⁿ. The study of abstract vector spaces is a way to deal with all these
examples simultaneously. The new aspect is that we are dealing with an abstract
system in which all we know about the vectors is that they are objects that can be
added and multiplied by a scalar and satisfy rules familiar from ℝⁿ.
The novel thing is the abstraction. Getting used to this new conceptual level
is facilitated by the work done in Chapter 5: First, the vector manipulations are
familiar, giving the reader more time to become accustomed to the abstract setting;
and, second, the mental images developed in the concrete setting of ℝⁿ serve as an
aid to doing many of the exercises in Chapter 6.
The concept of a vector space was first introduced in 1844 by the German
mathematician Hermann Grassmann (1809–1877), but his work did not receive the
attention it deserved. It was not until 1888 that the Italian mathematician Giuseppe
Peano (1858–1932) clarified Grassmann's work in his book Calcolo Geometrico
and gave the vector space axioms in their present form. Vector spaces became
established with the work of the Polish mathematician Stefan Banach (1892–1945),
and the idea was finally accepted in 1918 when Hermann Weyl (1885–1955) used it
in his widely read book Raum-Zeit-Materie (“Space-Time-Matter”), an introduction
to the general theory of relativity.
Definition 6.1 A vector space consists of a nonempty set V of objects (called vectors) that can be
added, that can be multiplied by a real number (called a scalar in this context), and for
which certain axioms hold.1 If v and w are two vectors in V, their sum is expressed as
v + w, and the scalar product of v by a real number a is denoted as av. These operations
are called vector addition and scalar multiplication, respectively, and the following
axioms are assumed to hold.
Axioms for vector addition:
A1. If u and v are in V, then u + v is in V.
A2. u + v = v + u for all u and v in V.
A3. u + (v + w) = (u + v) + w for all u, v, and w in V.
A4. An element 0 in V exists such that v + 0 = v = 0 + v for every v in V.
A5. For every v in V, an element −v in V exists such that v + (−v) = 0 = (−v) + v.
Axioms for scalar multiplication:
S1. If v is in V, then av is in V for every a in ℝ.
S2. a(v + w) = av + aw for all v and w in V and all a in ℝ.
S3. (a + b)v = av + bv for all v in V and all a and b in ℝ.
S4. a(bv) = (ab)v for all v in V and all a and b in ℝ.
S5. 1v = v for all v in V.
EXAMPLE 1
ℝⁿ is a vector space using matrix addition and scalar multiplication.2
It is important to realize that, in a general vector space, the vectors need not
be n-tuples as in ℝⁿ. They can be any kind of objects at all as long as the addition
and scalar multiplication are defined and the axioms are satisfied. The following
examples illustrate the diversity of the concept.
The space ℝⁿ consists of special types of matrices. More generally, let Mmn denote
the set of all m × n matrices with real entries. Then Theorem 1 Section 2.1 gives:
EXAMPLE 2
The set Mmn of all m × n matrices is a vector space using matrix addition and
scalar multiplication. The zero element in this vector space is the zero matrix of
size m × n, and the vector space negative of a matrix (required by axiom A5) is
the usual matrix negative discussed in Section 2.1. Note that Mmn is just ℝᵐⁿ in
different notation.
1 The scalars will usually be real numbers, but they could be complex numbers, or elements of an algebraic system called a field.
Another example is the field of rational numbers. We will look briefly at finite fields in Section 8.7.
2 We will usually write the vectors in ℝⁿ as n-tuples. However, if it is convenient, we will sometimes denote them as rows or columns.
290 Chapter 6 Vector Spaces
EXAMPLE 3
Show that every subspace of ℝⁿ is a vector space in its own right using the
addition and scalar multiplication of ℝⁿ.
Solution ► Axioms A1 and S1 are two of the defining conditions for a subspace
U of ℝⁿ (see Section 5.1). The other eight axioms for a vector space are
inherited from ℝⁿ. For example, if x and y are in U and a is a scalar, then
a(x + y) = ax + ay because x and y are in ℝⁿ. This shows that axiom S2 holds
for U; similarly, the other axioms also hold for U.
EXAMPLE 4
Let V denote the set of all ordered pairs (x, y) and define addition in V as in ℝ².
However, define a new scalar multiplication in V by
a(x, y) = (ay, ax)
Determine if V is a vector space with these operations.
Solution ► Axioms A1 to A5 are valid for V because they hold for matrices.
Also a(x, y) = (ay, ax) is again in V, so axiom S1 holds. To verify axiom S2,
let v = (x, y) and w = (x1, y1) be typical elements in V and compute
a(v + w) = a(x + x1, y + y1) = (a( y + y1), a(x + x1))
av + aw = (ay, ax) + (ay1, ax1) = (ay + ay1, ax + ax1)
Because these are equal, axiom S2 holds. Similarly, the reader can verify that
axiom S3 holds. However, axiom S4 fails because
a(b(x, y)) = a(by, bx) = (abx, aby)
need not equal ab(x, y) = (aby, abx). Hence, V is not a vector space. (In fact,
axiom S5 also fails.)
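For readers who wish to experiment, the failure of axioms S4 and S5 in Example 4 can be confirmed numerically. This is a sketch in Python; the helper name smul is our own, not the text's.

```python
# A sketch checking that axiom S4 fails for the scalar multiplication
# a(x, y) = (ay, ax) of Example 4.

def smul(a, v):
    """The modified scalar multiplication of Example 4: a(x, y) = (ay, ax)."""
    x, y = v
    return (a * y, a * x)

v = (1.0, 2.0)
a, b = 2.0, 3.0

lhs = smul(a, smul(b, v))   # a(b v)
rhs = smul(a * b, v)        # (ab) v
assert lhs != rhs           # S4 fails: a(bv) need not equal (ab)v

# Axiom S5 (1v = v) also fails, since 1(x, y) = (y, x):
assert smul(1.0, v) != v
```

The same pattern can be used to test any proposed scalar multiplication against the axioms.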
Let P denote the set of all polynomials p(x) = a0 + a1x + a2x² + ⋯ with real
coefficients a0, a1, a2, …, where x is an indeterminate. Suppose p(x) as above and
q(x) = b0 + b1x + b2x² + ⋯ are two polynomials in P (possibly of different degrees). Then p(x) and q(x)
are called equal [written p(x) = q(x)] if and only if all the corresponding
coefficients are equal—that is, a0 = b0, a1 = b1, a2 = b2, and so on. In particular,
a0 + a1x + a2x² + ⋯ = 0 means that a0 = 0, a1 = 0, a2 = 0, …, and this is the
reason for calling x an indeterminate. The set P has an addition and scalar
multiplication defined on it as follows: if p(x) and q(x) are as before and a is a
real number,
p(x) + q(x) = (a0 + b0) + (a1 + b1)x + (a2 + b2)x² + ⋯
ap(x) = (aa0) + (aa1)x + (aa2)x² + ⋯
Evidently, these are again polynomials, so P is closed under these operations, called
pointwise addition and scalar multiplication. The other vector space axioms are
easily verified, and we have
EXAMPLE 5
The set P of all polynomials is a vector space with the foregoing addition and
scalar multiplication. The zero vector is the zero polynomial, and the negative
of a polynomial p(x) = a0 + a1x + a2x² + ⋯ is the polynomial
-p(x) = -a0 - a1x - a2x² - ⋯ obtained by negating all the coefficients.
EXAMPLE 6
Given n ≥ 1, let Pn denote the set of all polynomials of degree at most n,
together with the zero polynomial. That is
Pn = {a0 + a1x + a2x² + ⋯ + anx^n | a0, a1, a2, …, an in ℝ}.
Then Pn is a vector space. Indeed, sums and scalar multiples of polynomials in
Pn are again in Pn, and the other vector space axioms are inherited from P. In
particular, the zero vector and the negative of a polynomial in Pn are the same
as those in P.
If a and b are real numbers and a < b, the interval [a, b] is defined to be the set
of all real numbers x such that a ≤ x ≤ b. A (real-valued) function f on [a, b] is a
rule that associates to every number x in [a, b] a real number denoted f (x). The
rule is frequently specified by giving a formula for f (x) in terms of x. For example,
f(x) = 2x, f(x) = sin x, and f(x) = x² + 1 are familiar functions. In fact, every
polynomial p(x) can be regarded as the formula for a function p.
The set of all functions on [a, b] is denoted F[a, b]. Two functions f and g in
F[a, b] are equal if f (x) = g(x) for every x in [a, b], and we describe this by saying
that f and g have the same action. Note that two polynomials are equal in P
(defined prior to Example 5) if and only if they are equal as functions.
If f and g are two functions in F[a, b], and if r is a real number, define the sum
f + g and the scalar product rf by
( f + g)(x) = f (x) + g(x) for each x in [a, b]
(rf )(x) = rf (x) for each x in [a, b]
In other words, the action of f + g upon x is to associate x with the number
f(x) + g(x), and rf associates x with rf(x). The sum of f(x) = x² and g(x) = -x is
shown in the diagram [figure: the graphs y = x² = f(x), y = -x = g(x), and
y = f(x) + g(x) = x² - x]. These operations on F[a, b] are called pointwise addition
and scalar multiplication of functions, and they are the usual operations familiar
from elementary algebra and calculus.
EXAMPLE 7
The set F[a, b] of all functions on the interval [a, b] is a vector space using
pointwise addition and scalar multiplication. The zero function (in axiom A4),
denoted 0, is the constant function defined by
0(x) = 0 for each x in [a, b]
The negative of a function f is denoted -f and has action defined by
(-f )(x) = -f (x) for each x in [a, b]
Axioms A1 and S1 are clearly satisfied because, if f and g are functions on [a, b],
then f + g and rf are again such functions. The verification of the remaining
axioms is left as Exercise 14.
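The pointwise operations of Example 7 can be modelled directly in Python, with functions represented as callables (the helper names add and scalar_mul are our own, not the text's):

```python
# A sketch of the pointwise operations on F[a, b] using Python callables.

def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scalar_mul(r, f):
    """Pointwise scalar product: (rf)(x) = r f(x)."""
    return lambda x: r * f(x)

f = lambda x: x ** 2      # f(x) = x^2
g = lambda x: -x          # g(x) = -x

h = add(f, g)             # (f + g)(x) = x^2 - x
assert h(3.0) == 6.0
assert scalar_mul(2.0, f)(3.0) == 18.0

# The zero function of axiom A4 and the negative of axiom A5:
zero = lambda x: 0.0
neg_f = scalar_mul(-1.0, f)
assert add(f, neg_f)(5.0) == zero(5.0)
```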
Other examples of vector spaces will appear later, but these are sufficiently varied
to indicate the scope of the concept and to illustrate the properties of vector spaces
to be discussed. With such a variety of examples, it may come as a surprise that a
well-developed theory of vector spaces exists. That is, many properties can be shown
to hold for all vector spaces and hence hold in every example. Such properties are
called theorems and can be deduced from the axioms. Here is an important example.
Theorem 1
Cancellation
Let u, v, and w be vectors in a vector space V. If v + u = v + w, then u = w.
PROOF
We are given v + u = v + w. If these were numbers instead of vectors, we
would simply subtract v from both sides of the equation to obtain u = w. This
can be accomplished with vectors by adding -v to both sides of the equation.
The steps (using only the axioms) are as follows:
v+u=v+w
-v + (v + u) = -v + (v + w) (axiom A5)
(-v + v) + u = (-v + v) + w (axiom A3)
0+u=0+w (axiom A5)
u=w (axiom A4)
This is the desired conclusion.3
3 Observe that none of the scalar multiplication axioms are needed here.
SECTION 6.1 Examples and Basic Properties 293
Theorem 2
Let v and u be vectors in a vector space V. Then the equation x + v = u has exactly
one solution x in V, given by the difference x = u - v = u + (-v).
PROOF
The difference x = u - v is indeed a solution to the equation because (using
several axioms)
x + v = (u - v) + v = [u + (-v)] + v = u + (-v + v) = u + 0 = u.
To see that this is the only solution, suppose x1 is another solution so that
x1 + v = u. Then x + v = x1 + v (they both equal u), so x = x1 by cancellation.
Similarly, cancellation shows that there is only one zero vector in any vector
space and only one negative of each vector (Exercises 10 and 11). Hence we speak
of the zero vector and the negative of a vector.
The next theorem derives some basic properties of scalar multiplication that
hold in every vector space, and will be used extensively.
Theorem 3
Let v denote a vector in a vector space V and let a denote a real number.
1. 0v = 0.
2. a0 = 0.
3. If av = 0, then either a = 0 or v = 0.
4. (-1)v = -v.
5. (-a)v = -(av) = a(-v).
PROOF
1. Observe that 0v + 0v = (0 + 0)v = 0v = 0v + 0 where the first equality is by
axiom S3. It follows that 0v = 0 by cancellation.
The properties in Theorem 3 are familiar for matrices; the point here is that they
hold in every vector space.
Axiom A3 ensures that the sum u + (v + w) = (u + v) + w is the same however
it is formed, and we write it simply as u + v + w. Similarly, there are different
ways to form any sum v1 + v2 + ⋯ + vn, and Axiom A3 guarantees that they are all
equal. Moreover, Axiom A2 shows that the order in which the vectors are written
does not matter (for example: u + v + w + z = z + u + w + v).
Similarly, Axioms S2 and S3 extend. For example a(u + v + w) = au + av + aw
and (a + b + c)v = av + bv + cv hold for all values of the scalars and vectors
involved (verify). More generally,
a(v1 + v2 + ⋯ + vn) = av1 + av2 + ⋯ + avn
(a1 + a2 + ⋯ + an)v = a1v + a2v + ⋯ + anv
hold for all n ≥ 1, all numbers a, a1, …, an, and all vectors, v, v1, …, vn. The verifications
are by induction and are left to the reader (Exercise 13). These facts—together with the
axioms, Theorem 3, and the definition of subtraction—enable us to simplify expressions
involving sums of scalar multiples of vectors by collecting like terms, expanding, and
taking out common factors. This has been discussed for the vector space of matrices
in Section 2.1 (and for geometric vectors in Section 4.1); the manipulations in an
arbitrary vector space are carried out in the same way. Here is an illustration.
EXAMPLE 8
If u, v, and w are vectors in a vector space V, simplify the expression
2(u + 3w) - 3(2w - v) - 3[2(2u + v - 4w) - 4(u - 2w)].
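Collecting like terms as described gives 2(u + 3w) - 3(2w - v) - 3[2(2u + v - 4w) - 4(u - 2w)] = 2u + 6w - 6w + 3v - 3(2v) = 2u - 3v. A numerical spot check in ℝ³ (a Python sketch; the sample vectors and the helper lin are our own choices):

```python
# Spot check of the simplification 2(u + 3w) - 3(2w - v)
#   - 3[2(2u + v - 4w) - 4(u - 2w)] = 2u - 3v, with vectors in R^3.
u = (1.0, -2.0, 0.5)
v = (3.0, 1.0, -1.0)
w = (0.0, 4.0, 2.0)

def lin(*terms):
    """Sum of scalar multiples of 3-tuples: lin((a, x), (b, y), ...)."""
    return tuple(sum(a * vec[i] for a, vec in terms) for i in range(3))

lhs = lin((2.0, lin((1.0, u), (3.0, w))),           # 2(u + 3w)
          (-3.0, lin((2.0, w), (-1.0, v))),         # -3(2w - v)
          (-3.0, lin((2.0, lin((2.0, u), (1.0, v), (-4.0, w))),
                     (-4.0, lin((1.0, u), (-2.0, w))))))
rhs = lin((2.0, u), (-3.0, v))                      # 2u - 3v
assert lhs == rhs
```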
EXAMPLE 9
A set {0} with one element becomes a vector space if we define
0+0=0 and a0 = 0 for all scalars a.
The resulting space is called the zero vector space and is denoted {0}.
The vector space axioms are easily verified for {0}. In any vector space V, Theorem
3(2) shows that the zero subspace (consisting of the zero vector of V alone) is a copy
of the zero vector space.
EXERCISES 6.1
1. Let V denote the set of ordered triples (x, y, z) and define addition in V as in ℝ³. For each of the following definitions of scalar multiplication, decide whether V is a vector space.
(a) a(x, y, z) = (ax, y, az)
(b) a(x, y, z) = (ax, 0, az)
(c) a(x, y, z) = (0, 0, 0)
(d) a(x, y, z) = (2ax, 2ay, 2az)
2. Are the following sets vector spaces with the indicated operations? If not, why not?
(a) The set V of nonnegative real numbers; ordinary addition and scalar multiplication.
(b) The set V of all polynomials of degree ≥ 3, together with 0; operations of P.
(c) The set of all polynomials of degree ≤ 3; operations of P.
(d) The set {1, x, x², …}; operations of P.
(e) The set V of all 2 × 2 matrices of the form [a b; 0 c]; operations of M22.
(f) The set V of 2 × 2 matrices with equal column sums; operations of M22.
(g) The set V of 2 × 2 matrices with zero determinant; usual matrix operations.
(h) The set V of real numbers; usual operations.
(i) The set V of complex numbers; usual addition and multiplication by a real number.
(j) The set V of all ordered pairs (x, y) with the addition of ℝ², but scalar multiplication a(x, y) = (ax, -ay).
(k) The set V of all ordered pairs (x, y) with the addition of ℝ², but scalar multiplication a(x, y) = (x, y) for all a in ℝ.
(l) The set V of all functions f: ℝ → ℝ with pointwise addition, but scalar multiplication defined by (af)(x) = f(ax).
(m) The set V of all 2 × 2 matrices whose entries sum to 0; operations of M22.
(n) The set V of all 2 × 2 matrices with the addition of M22 but scalar multiplication ∗ defined by a ∗ X = aX^T.
3. Let V be the set of positive real numbers with vector addition being ordinary multiplication, and scalar multiplication being a · v = v^a. Show that V is a vector space.
4. If V is the set of ordered pairs (x, y) of real numbers, show that it is a vector space if (x, y) + (x1, y1) = (x + x1, y + y1 + 1) and a(x, y) = (ax, ay + a - 1). What is the zero vector in V?
5. Find x and y (in terms of u and v) such that:
(a) 2x + y = u and 5x + 3y = v
(b) 3x - 2y = u and 4x - 5y = v
6. In each case show that the condition au + bv + cw = 0 in V implies that a = b = c = 0.
(a) V = ℝ⁴; u = (2, 1, 0, 2), v = (1, 1, -1, 0), w = (0, 1, 2, 1)
(b) V = M22; u = [1 0; 0 1], v = [0 1; 1 0], w = [1 1; 1 -1]
(c) V = P; u = x³ + x, v = x² + 1, w = x³ - x² + x + 1
8. Show that x = v is the only solution to the equation x + x = 2v in a vector space V. Cite all axioms used.
9. Show that -0 = 0 in any vector space. Cite all axioms used.
10. Show that the zero vector 0 is uniquely determined by the property in axiom A4.
11. Given a vector v, show that its negative -v is uniquely determined by the property in axiom A5.
12. (a) Prove (2) of Theorem 3. [Hint: Axiom S2.]
(b) Prove that (-a)v = -(av) in Theorem 3 by first computing (-a)v + av. Then do it using (4) of Theorem 3 and axiom S4.
(c) Prove that a(-v) = -(av) in Theorem 3 in two ways, as in part (b).
13. Let v, v1, …, vn denote vectors in a vector space V and let a, a1, …, an denote numbers. Use induction on n to prove each of the following.
(a) a(v1 + v2 + ⋯ + vn) = av1 + av2 + ⋯ + avn
(b) (a1 + a2 + ⋯ + an)v = a1v + a2v + ⋯ + anv
14. Verify axioms A2–A5 and S2–S5 for the space F[a, b] of functions on [a, b] (Example 7).
15. Prove each of the following for vectors u and v and scalars a and b.
17. Let V be a vector space, and define V^n to be the set of all n-tuples (v1, v2, …, vn) of n vectors vi, each belonging to V. Define addition and scalar multiplication in V^n as follows:
(u1, u2, …, un) + (v1, v2, …, vn) = (u1 + v1, u2 + v2, …, un + vn)
a(v1, v2, …, vn) = (av1, av2, …, avn)
Show that V^n is a vector space.
18. Let V^n be the vector space of n-tuples from the preceding exercise, written as columns. If A is an m × n matrix, and X is in V^n, define AX in V^m by matrix multiplication. More precisely, if A = [aij] and X is the column with entries v1, v2, …, vn, let AX be the column with entries u1, u2, …, um, where ui = ai1v1 + ai2v2 + ⋯ + ainvn for each i. Prove that:
(a) B(AX) = (BA)X
(b) (A + A1)X = AX + A1X
(c) A(X + X1) = AX + AX1
(d) (kA)X = k(AX) = A(kX) if k is any number
(e) IX = X if I is the n × n identity matrix
(f) Let E be an elementary matrix obtained by performing a row operation on the rows of In (see Section 2.5). Show that EX is the column resulting from performing that same row operation on the vectors (call them rows) of X. [Hint: Lemma 1 Section 2.5.]
Subspaces of ℝⁿ (as defined in Section 5.1) are subspaces in the present sense by
Example 3 Section 6.1. Moreover, the defining properties for a subspace of ℝⁿ
actually characterize subspaces in general.
SECTION 6.2 Subspaces and Spanning Sets 297
Theorem 1
Subspace Test
A subset U of a vector space V is a subspace of V if and only if it satisfies the following
three conditions:
1. 0 lies in U where 0 is the zero vector of V.
2. If u1 and u2 are in U, then u1 + u2 is also in U.
3. If u is in U, then au is also in U for each scalar a.
PROOF
If U is a subspace of V, then (2) and (3) hold by axioms A1 and S1 respectively,
applied to the vector space U. Since U is nonempty (it is a vector space), choose
u in U. Then (1) holds because 0 = 0u is in U by (3) and Theorem 3 Section 6.1.
Conversely, if (1), (2), and (3) hold, then axioms A1 and S1 hold because of
(2) and (3), and axioms A2, A3, S2, S3, S4, and S5 hold in U because they hold in
V. Axiom A4 holds because the zero vector 0 of V is actually in U by (1), and so
serves as the zero of U. Finally, given u in U, then its negative -u in V is again
in U by (3) because -u = (-1)u (again using Theorem 3 Section 6.1). Hence
-u serves as the negative of u in U.
Note that the proof of Theorem 1 shows that if U is a subspace of V, then U and V
share the same zero vector, and that the negative of a vector in the space U is the
same as its negative in V.
EXAMPLE 1
If V is any vector space, show that {0} and V are subspaces of V.
EXAMPLE 2
Let v be a vector in a vector space V. Show that the set
ℝv = {av | a in ℝ}
of all scalar multiples of v is a subspace of V.
Solution ► Because 0 = 0v, it is clear that 0 lies in ℝv. Given two vectors av
and a1v in ℝv, their sum av + a1v = (a + a1)v is also a scalar multiple of v and
so lies in ℝv. Hence ℝv is closed under addition. Finally, given av, r(av) = (ra)v
lies in ℝv for all r in ℝ, so ℝv is closed under scalar multiplication. Hence the
subspace test applies.
In particular, given d ≠ 0 in ℝ³, ℝd is the line through the origin with direction
vector d.
The space ℝv in Example 2 is described by giving the form of each vector in ℝv.
The next example describes a subset U of the space Mnn by giving a condition that
each matrix of U must satisfy.
EXAMPLE 3
Let A be a fixed matrix in Mnn. Show that U = {X in Mnn | AX = XA} is a
subspace of Mnn.
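The subspace conditions of Example 3 can be spot-checked numerically. A Python sketch with 2 × 2 matrices as nested lists; the sample matrices A, X, Y and the helper names are our own, not the text's:

```python
# Checking that U = {X | AX = XA} passes the subspace test for one sample A.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def smul(a, X):
    return [[a * X[i][j] for j in range(2)] for i in range(2)]

def in_U(A, X):
    """Membership test: X is in U exactly when X commutes with A."""
    return mat_mul(A, X) == mat_mul(X, A)

A = [[1, 1], [0, 0]]
X = [[2, 1], [0, 1]]
Y = [[3, 2], [0, 1]]

assert in_U(A, X) and in_U(A, Y)
assert in_U(A, mat_add(X, Y))        # closed under addition
assert in_U(A, smul(5, X))           # closed under scalar multiples
assert in_U(A, [[0, 0], [0, 0]])     # contains the zero matrix
```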
Suppose p(x) is a polynomial and a is a number. Then the number p(a) obtained
by replacing x by a in the expression for p(x) is called the evaluation of p(x) at a.
For example, if p(x) = 5 - 6x + 2x², then the evaluation of p(x) at a = 2 is
p(2) = 5 - 12 + 8 = 1. If p(a) = 0, the number a is called a root of p(x).
EXAMPLE 4
Consider the set U = {p(x) in P | p(3) = 0} of all polynomials in P that have 3 as a root. Show that U is a subspace of P.
Solution ► Clearly, the zero polynomial lies in U. Now let p(x) and q(x) lie
in U so p(3) = 0 and q(3) = 0. We have (p + q)(x) = p(x) + q(x) for all x, so
(p + q)(3) = p(3) + q(3) = 0 + 0 = 0, and U is closed under addition. The
verification that U is closed under scalar multiplication is similar.
EXAMPLE 5
Pn is a subspace of P for each n ≥ 0.
The next example involves the notion of the derivative f′ of a function f. (If the
reader is not familiar with calculus, this example may be omitted.) A function f
defined on the interval [a, b] is called differentiable if the derivative f′(r) exists at
every r in [a, b].
EXAMPLE 6
Show that the subset D[a, b] of all differentiable functions on [a, b] is a
subspace of the vector space F[a, b] of all functions on [a, b].
Definition 6.3 Let {v1, v2, …, vn} be a set of vectors in a vector space V. As in ℝⁿ, a vector v is called a
linear combination of the vectors v1, v2, …, vn if it can be expressed in the form
v = a1v1 + a2v2 + ⋯ + anvn
where a1, a2, …, an are scalars, called the coefficients of v1, v2, …, vn. The set of all
linear combinations of these vectors is called their span, and is denoted by
span{v1, v2, …, vn} = {a1v1 + a2v2 + ⋯ + anvn | ai in ℝ}.
If it happens that V = span{v1, v2, …, vn}, these vectors are called a spanning set
for V. For example, the span of two vectors v and w is the set
span{v, w} = {sv + tw | s and t in ℝ}
of all sums of scalar multiples of these vectors.
EXAMPLE 7
Consider the vectors p1 = 1 + x + 4x² and p2 = 1 + 5x + x² in P2. Determine
whether p1 and p2 lie in span{1 + 2x - x², 3 + 5x + 2x²}.
We saw in Example 6 Section 5.1 that ℝᵐ = span{e1, e2, …, em} where the
vectors e1, e2, …, em are the columns of the m × m identity matrix. Of course
ℝᵐ = Mm1 is the set of all m × 1 matrices, and there is an analogous spanning
set for each space Mmn. For example, each 2 × 2 matrix has the form
[a b; c d] = a[1 0; 0 0] + b[0 1; 0 0] + c[0 0; 1 0] + d[0 0; 0 1].
EXAMPLE 8
Mmn is the span of the set of all m × n matrices with exactly one entry equal to
1, and all other entries zero.
The fact that every polynomial in Pn has the form a0 + a1x + a2x² + ⋯ + anx^n
where each ai is in ℝ shows that
EXAMPLE 9
Pn = span{1, x, x², …, x^n}.
Theorem 2
Let U = span{v1, v2, …, vn} in a vector space V. Then:
1. U is a subspace of V containing each of v1, v2, …, vn.
2. If W is a subspace of V containing each vi, then U ⊆ W.
That is, span{v1, v2, …, vn} is the smallest subspace of V containing every vi.
EXAMPLE 10
Show that P3 = span{x² + x³, x, 2x² + 1, 3}.
Solution ► Write U = span{x² + x³, x, 2x² + 1, 3}. Then U ⊆ P3, and we use
the fact that P3 = span{1, x, x², x³} to show that P3 ⊆ U. In fact, x and 1 = (1/3)·3
clearly lie in U. But then successively,
x² = (1/2)[(2x² + 1) - 1] and x³ = (x² + x³) - x²
also lie in U. Hence P3 ⊆ U by Theorem 2.
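The manipulations in Example 10 can be verified with coefficient vectors; representing a polynomial a0 + a1x + a2x² + a3x³ as the list [a0, a1, a2, a3] is our own encoding, not the text's:

```python
# Coefficient-vector check that 1, x^2, x^3 are combinations of the spanning set.

def comb(*terms):
    """Linear combination of coefficient lists: comb((a, p), (b, q), ...)."""
    return [sum(a * p[i] for a, p in terms) for i in range(4)]

p1 = [0, 0, 1, 1]      # x^2 + x^3
p3 = [1, 0, 2, 0]      # 2x^2 + 1
three = [3, 0, 0, 0]   # the constant polynomial 3

one = comb((1/3, three))                 # 1 = (1/3) * 3
x2 = comb((0.5, p3), (-0.5, one))        # x^2 = (1/2)[(2x^2 + 1) - 1]
x3 = comb((1.0, p1), (-1.0, x2))         # x^3 = (x^2 + x^3) - x^2
assert one == [1, 0, 0, 0]
assert x2 == [0, 0, 1, 0]
assert x3 == [0, 0, 0, 1]
```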
EXAMPLE 11
Let u and v be two vectors in a vector space V. Show that
span{u, v} = span{u + 2v, u - v}.
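Example 11 can be spot-checked numerically: u and v are recovered from u + 2v and u - v via u = (1/3)(u + 2v) + (2/3)(u - v) and v = (1/3)(u + 2v) - (1/3)(u - v), so each span contains the other's generators. A Python sketch (the sample vectors are our own choice):

```python
# Recovering u and v from u + 2v and u - v, numerically in R^3.
u = (1.0, 0.0, 2.0)
v = (0.0, 1.0, -1.0)

def comb(a, x, b, y):
    """The combination a*x + b*y of 3-tuples."""
    return tuple(a * x[i] + b * y[i] for i in range(3))

def approx(x, y):
    """Componentwise comparison with a small tolerance for rounding."""
    return all(abs(s - t) < 1e-12 for s, t in zip(x, y))

p = comb(1.0, u, 2.0, v)    # u + 2v
q = comb(1.0, u, -1.0, v)   # u - v

assert approx(comb(1/3, p, 2/3, q), u)    # u recovered
assert approx(comb(1/3, p, -1/3, q), v)   # v recovered
```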
EXERCISES 6.2
1. Which of the following are subspaces of P3? Support your answer.
(a) U = {f(x) | f(x) in P3, f(2) = 1}
(b) U = {xg(x) | g(x) in P2}
(c) U = {xg(x) | g(x) in P3}
(d) U = {xg(x) + (1 - x)h(x) | g(x) and h(x) in P2}
(e) U = The set of all polynomials in P3 with constant term 0
2. Which of the following are subspaces of M22? Support your answer.
(a) U = {[a b; 0 c] | a, b, and c in ℝ}
(b) U = {[a b; c d] | a + b = c + d; a, b, c, and d in ℝ}
(c) U = {A | A in M22, A = A^T}
(d) U = {A | A in M22, AB = 0}, B a fixed 2 × 2 matrix
3. Which of the following are subspaces of F[0, 1]? Support your answer.
(b) U = {f | f(0) = 1}
(c) U = {f | f(0) = f(1)}
(d) U = {f | f(x) ≥ 0 for all x in [0, 1]}
(e) U = {f | f(x) = f(y) for all x and y in [0, 1]}
(f) U = {f | f(x + y) = f(x) + f(y) for all x and y in [0, 1]}
(g) U = {f | f is integrable and ∫₀¹ f(x)dx = 0}
4. Let A be an m × n matrix. For which columns b in ℝᵐ is U = {x | x in ℝⁿ, Ax = b} a subspace of ℝⁿ? Support your answer.
5. (b) Show that U = ℝᵐ if x ≠ 0.
6. Write each of the following as a linear combination of x + 1, x² + x, and x² + 2.
(a) x² + 3x + 2
(b) 2x² - 3x + 1
(c) x² + 1
(d) v = [1 -4; 5 3]; u = [2 1; 1 -1], w = [1 0; 2 1]
8. Which of the following functions lie in span{cos² x, sin² x}? (Work in F[0, π].)
(a) cos 2x
(b) 1
(c) x²
(d) 1 + x²
9. (a) Show that ℝ³ is spanned by {(1, 0, 1), (1, 1, 0), (0, 1, 1)}.
(b) Show that P2 is spanned by {1 + 2x², 3x, 1 + x}.
(c) Show that M22 is spanned by {[0 0; 1 0], [0 1; 1 0], [1 0; 0 1], [0 1; 1 1]}.
13. If X and Y are nonempty subsets of a vector space V such that span X = span Y = V, must there be a vector common to both X and Y? Justify your answer.
14. Is it possible that {(1, 2, 0), (1, 1, 1)} can span the subspace U = {(a, b, 0) | a and b in ℝ}?
15. Describe span{0}.
16. Let v denote any vector in a vector space V. Show that span{v} = span{av} for any a ≠ 0.
17. Determine all subspaces of ℝv where v ≠ 0 in some vector space V.
18. Suppose V = span{v1, v2, …, vn}. If u = a1v1 + a2v2 + ⋯ + anvn where the ai are in ℝ and a1 ≠ 0, show that V = span{u, v2, …, vn}.
20. If Pn = span{p1(x), p2(x), …, pk(x)} and a is in ℝ, show that pi(a) ≠ 0 for some i.
21. Let U be a subspace of a vector space V.
(a) If au is in U where a ≠ 0, show that u is in U.
(b) If u and u + v are in U, show that v is in U.
22. Let U be a nonempty subset of a vector space V. Show that U is a subspace of V if and only if u1 + au2 lies in U for all u1 and u2 in U and all a in ℝ.
Let X and Y denote the columns with entries v1, v2, …, vn and u1, u2, …, un respectively, as in Exercise 18 Section 6.1.
(a) Show that span{v1, …, vn} ⊆ span{u1, …, un} if and only if AY = X for some n × n matrix A.
(b) If X = AY where A is invertible, show that span{v1, …, vn} = span{u1, …, un}.
26. If U and W are subspaces of a vector space V, let U ∪ W = {v | v is in U or v is in W}. Show that U ∪ W is a subspace if and only if U ⊆ W or W ⊆ U.
27. Show that P cannot be spanned by a finite set of polynomials.
Definition 6.4 As in ℝⁿ, a set of vectors {v1, v2, …, vn} in a vector space V is called linearly
independent (or simply independent) if it satisfies the following condition:
If s1v1 + s2v2 + ⋯ + snvn = 0, then s1 = s2 = ⋯ = sn = 0.
A set of vectors that is not linearly independent is said to be linearly dependent
(or simply dependent).
The trivial linear combination of the vectors v1, v2, …, vn is the one with every
coefficient zero:
0v1 + 0v2 + ⋯ + 0vn.
This is obviously one way of expressing 0 as a linear combination of the vectors
v1, v2, …, vn, and they are linearly independent when it is the only way.
EXAMPLE 1
Show that {1 + x, 3x + x², 2 + x - x²} is independent in P2.
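Example 1 amounts to a homogeneous linear system on the coefficients of 1, x, x²: the combination s1(1 + x) + s2(3x + x²) + s3(2 + x - x²) = 0 forces s1 + 2s3 = 0, s1 + 3s2 + s3 = 0, and s2 - s3 = 0, whose coefficient matrix has nonzero determinant. A Python sketch (det3 is our own cofactor-expansion helper):

```python
# A nonzero determinant means the system has only the trivial solution,
# so the three polynomials are independent.

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Columns: coefficients (constant, x, x^2) of 1 + x, 3x + x^2, 2 + x - x^2.
M = [[1, 0, 2],
     [1, 3, 1],
     [0, 1, -1]]
assert det3(M) != 0   # only s1 = s2 = s3 = 0 solves the system
```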
EXAMPLE 2
Show that {sin x, cos x} is independent in the vector space F[0, 2π] of functions
defined on the interval [0, 2π].
EXAMPLE 3
Suppose that {u, v} is an independent set in a vector space V. Show that
{u + 2v, u - 3v} is also independent.
EXAMPLE 4
Show that any set of polynomials of distinct degrees is independent.
EXAMPLE 5
Suppose that A is an n × n matrix such that A^k = 0 but A^{k-1} ≠ 0. Show that
B = {I, A, A^2, …, A^{k-1}} is independent in Mnn.
Solution ► Suppose a linear combination of the matrices in B vanishes, say
r0I + r1A + r2A^2 + ⋯ + r_{k-1}A^{k-1} = 0.
Multiply by A^{k-1}. Since A^k = 0, all the higher powers are zero, so this becomes r0A^{k-1} = 0.
But A^{k-1} ≠ 0, so r0 = 0, and we have r1A + r2A^2 + ⋯ + r_{k-1}A^{k-1} = 0. Now
multiply by A^{k-2} to conclude that r1 = 0. Continuing, we obtain ri = 0 for
each i, so B is independent.
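A concrete instance of Example 5 can be checked by hand or machine: with k = 2 and A = [0 1; 0 0] (our own sample, not the text's), A² = 0 and {I, A} is independent. A Python sketch:

```python
# Checking independence of {I, A} for the nilpotent matrix A = [[0, 1], [0, 0]].

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, 1], [0, 0]]
I = [[1, 0], [0, 1]]
assert mat_mul(A, A) == [[0, 0], [0, 0]]     # A^2 = 0, A != 0

def comb(r0, r1):
    """The combination r0*I + r1*A, which equals [[r0, r1], [0, r0]]."""
    return [[r0 * I[i][j] + r1 * A[i][j] for j in range(2)] for i in range(2)]

# r0*I + r1*A = 0 forces r0 = r1 = 0 directly:
assert comb(0, 0) == [[0, 0], [0, 0]]
assert comb(1, 0) != [[0, 0], [0, 0]] and comb(0, 1) != [[0, 0], [0, 0]]
```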
EXAMPLE 6
Let V denote a vector space.
1. If v ≠ 0 in V, then {v} is an independent set.
2. No independent set of vectors in V can contain the zero vector.
Solution ►
1. Let tv = 0, t in ℝ. If t ≠ 0, then v = 1v = (1/t)(tv) = (1/t)0 = 0, contrary
to assumption. So t = 0.
2. If {v1, v2, …, vk} is independent and (say) v2 = 0, then
0v1 + 1v2 + ⋯ + 0vk = 0 is a nontrivial linear combination that
vanishes, contrary to the independence of {v1, v2, …, vk}.
Theorem 1
Let {v1, v2, …, vn} be a linearly independent set of vectors in a vector space V. If a vector
v has two (ostensibly different) representations
v = s1v1 + s2v2 + ⋯ + snvn
v = t1v1 + t2v2 + ⋯ + tnvn
as linear combinations of these vectors, then s1 = t1, s2 = t2, …, sn = tn. In other words,
every vector in V can be written in a unique way as a linear combination of the vi.
PROOF
Subtracting the equations given in the theorem gives
(s1 - t1)v1 + (s2 - t2)v2 + + (sn - tn)vn = 0
The independence of {v1, v2, …, vn} gives si - ti = 0 for each i, as required.
The following theorem extends (and proves) Theorem 4 Section 5.2, and is one
of the most useful results in linear algebra.
Theorem 2
Fundamental Theorem
Suppose a vector space V can be spanned by n vectors. If any set of m vectors in V is
linearly independent, then m ≤ n.
PROOF
Let V = span{v1, v2, …, vn}, and suppose that {u1, u2, …, um} is an independent
set in V. Then u1 = a1v1 + a2v2 + ⋯ + anvn where each ai is in ℝ. As u1 ≠ 0
(Example 6), not all of the ai are zero, say a1 ≠ 0 (after relabelling the vi). Then
V = span{u1, v2, v3, …, vn} as the reader can verify. Hence, write
u2 = b1u1 + c2v2 + c3v3 + ⋯ + cnvn. Then some ci ≠ 0 because {u1, u2} is
independent; so, as before, V = span{u1, u2, v3, …, vn}, again after possible
relabelling of the vi. If m > n, this procedure continues until all the vectors vi are
replaced by the vectors u1, u2, …, un. In particular, V = span{u1, u2, …, un}. But
then un+1 is a linear combination of u1, u2, …, un contrary to the independence
of the ui. Hence, the assumption m > n cannot be valid, so m ≤ n and the
theorem is proved.
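Numerically, the Fundamental Theorem predicts, for instance, that any three vectors in ℝ² (a space spanned by two vectors) must be dependent. For the sample vectors below (our own choice) an explicit nontrivial vanishing combination is 2u1 - 3u2 - u3 = 0:

```python
# Three vectors in R^2 are always dependent; here is one explicit relation.
u1, u2, u3 = (1, 0), (0, 1), (2, -3)

# The combination 2*u1 - 3*u2 - 1*u3 has nonzero coefficients yet vanishes:
combo = tuple(2 * u1[i] - 3 * u2[i] - u3[i] for i in range(2))
assert combo == (0, 0)
```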
Definition 6.5 As in ℝⁿ, a set {e1, e2, …, en} of vectors in a vector space V is called a basis of V if it
satisfies the following two conditions:
1. {e1, e2, …, en} is linearly independent
2. V = span{e1, e2, …, en}
Thus if a set of vectors {e1, e2, …, en} is a basis, then every vector in V can be written
as a linear combination of these vectors in a unique way (Theorem 1). But even more
is true: Any two (finite) bases of V contain the same number of vectors.
Theorem 3
Invariance Theorem
Let {e1, e2, …, en} and {f1, f2, …, fm} be two bases of a vector space V. Then n = m.
PROOF
Because V = span{e1, e2, …, en} and {f1, f2, …, fm} is independent, it follows from
Theorem 2 that m ≤ n. Similarly n ≤ m, so n = m, as asserted.
Definition 6.6 If {e1, e2, …, en} is a basis of the nonzero vector space V, the number n of vectors in the
basis is called the dimension of V, and we write
dim V = n.
The zero vector space {0} is defined to have dimension 0:
dim {0} = 0.
In our discussion to this point we have always assumed that a basis is nonempty and
hence that the dimension of the space is at least 1. However, the zero space {0} has
no basis (by Example 6) so our insistence that dim{0} = 0 amounts to saying that the
empty set of vectors is a basis of {0}. Thus the statement that “the dimension of a
vector space is the number of vectors in any basis” holds even for the zero space.
We saw in Example 9 Section 5.2 that dim(ℝⁿ) = n and, if ej denotes column j
of In, that {e1, e2, …, en} is a basis (called the standard basis). In Example 7 below,
similar considerations apply to the space Mmn of all m × n matrices; the verifications
are left to the reader.
EXAMPLE 7
The space Mmn has dimension mn, and one basis consists of all m × n matrices
with exactly one entry equal to 1 and all other entries equal to 0. We call this
the standard basis of Mmn.
EXAMPLE 8
Show that dim Pn = n + 1 and that {1, x, x2, …, xn} is a basis, called the
standard basis of Pn.
EXAMPLE 9
If v ≠ 0 is any nonzero vector in a vector space V, show that span{v} = ℝv
has dimension 1.
EXAMPLE 10
Let A = [1 1; 0 0] and consider the subspace
U = {X in M22 | AX = XA}
of M22. Show that dim U = 2 and find a basis of U.
Solution ► It was shown in Example 3 Section 6.2 that U is a subspace for any
choice of the matrix A. In the present case, if X = [x y; z w] is in U, the condition
AX = XA gives z = 0 and x = y + w. Hence each matrix X in U can be written
X = [y+w y; 0 w] = y[1 1; 0 0] + w[1 0; 0 1]
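The two matrices appearing in the decomposition of Example 10 can be checked to commute with A, confirming that they lie in U; since neither is a scalar multiple of the other, they form a basis and dim U = 2. A Python sketch (the names B1, B2 are our own):

```python
# Both basis candidates commute with A = [[1, 1], [0, 0]].

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 0]]
B1 = [[1, 1], [0, 0]]
B2 = [[1, 0], [0, 1]]
for B in (B1, B2):
    assert mat_mul(A, B) == mat_mul(B, A)

# y*B1 + w*B2 = [[y + w, y], [0, w]] = 0 forces y = w = 0, so {B1, B2}
# is independent and hence a basis of U.
```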
EXAMPLE 11
Show that the set V of all symmetric 2 × 2 matrices is a vector space, and find
the dimension of V.
EXAMPLE 12
Let B = {v1, v2, …, vn} be vectors in a vector space V. Given nonzero scalars
a1, a2, …, an, write D = {a1v1, a2v2, …, anvn}. If B is independent or spans V,
the same is true of D. In particular, if B is a basis of V, so also is D.
SECTION 6.3 Linear Independence and Dimension 309
EXERCISES 6.3
1. Show that each of the following sets of vectors is independent.
(a) {1 + x, 1 - x, x + x²} in P2
(b) {x², x + 1, 1 - x - x²} in P2
(c) {[1 1; 0 0], [1 0; 1 0], [0 0; 1 -1], [0 1; 0 1]} in M22
(d) {[1 1; 1 0], [0 1; 1 1], [1 0; 1 1], [1 1; 0 1]} in M22
2. Which of the following subsets of V are independent?
(a) V = P2; {x² + 1, x + 1, x}
(b) V = P2; {x² - x + 3, 2x² + x + 5, x² + 5x + 1}
(c) V = M22; {[1 1; 0 1], [1 0; 1 1], [1 0; 0 1]}
(d) V = M22; {[0 -1; -1 0], [-1 1; 1 -1], [1 1; 1 1], [-1 0; 0 -1]}
(e) {1/(x² + x - 6), 1/(x² - 5x + 6), 1/(x² - 9)}
3. Which of the following are independent in F[0, 2π]?
(a) {sin² x, cos² x}
(b) {1, sin² x, cos² x}
(c) {x, sin² x, cos² x}
4. Find all values of x such that the following are independent in ℝ³.
(a) {(1, -1, 0), (x, 1, 0), (0, 2, 3)}
(b) {(2, x, 1), (1, 0, 1), (0, 1, 3)}
5. Show that the following are bases of the space V indicated.
(a) {(1, 1, 0), (1, 0, 1), (0, 1, 1)}; V = ℝ³
(b) {(-1, 1, 1), (1, -1, 1), (1, 1, -1)}; V = ℝ³
(c) {[1 0; 0 1], [0 1; 1 0], [1 1; 0 1], [1 0; 0 0]}; V = M22
(d) {1 + x, x + x², x² + x³, x³}; V = P3
6. Exhibit a basis and calculate the dimension of each of the following subspaces of P2.
(a) {a(1 + x) + b(x + x²) | a and b in ℝ}
(b) {a + b(x + x²) | a and b in ℝ}
(c) {p(x) | p(1) = 0}
(d) {p(x) | p(x) = p(-x)}
7. Exhibit a basis and calculate the dimension of each of the following subspaces of M22.
(a) {A | A^T = -A}
(b) {A | A[-1 1; 1 0] = [-1 1; 1 0]A}
(c) {A | A[1 0; -1 0] = [0 0; 0 0]}
(d) {A | A[-1 1; 1 0] = [-1 0; 1 1]A}
8. (a) Find a basis of U containing A.
(b) Find a basis of U not containing A.
9. Show that the set of all complex numbers is a vector space with the usual operations, and find its dimension.
10. (a) Let V denote the set of all 2 × 2 matrices with equal column sums. Show that V is a subspace of M22, and compute dim V.
(b) Repeat part (a) for 3 × 3 matrices.
(c) Repeat part (a) for n × n matrices.
11. (a) Let V = {(x² + x + 1)p(x) | p(x) in P2}. Show that V is a subspace of P4 and find dim V. [Hint: If f(x)g(x) = 0 in P, then f(x) = 0 or g(x) = 0.]
(b) Repeat with V = {(x² - x)p(x) | p(x) in P3}, a subset of P5.
(c) Generalize.
12. In each case, either prove the assertion or give an example showing that it is false.
(a) Every set of four nonzero polynomials in P3 is a basis.
(b) P2 has a basis of polynomials f(x) such that f(0) = 0.
(c) P2 has a basis of polynomials f(x) such that f(0) = 1.
(d) Every basis of M22 contains a noninvertible matrix.
(e) No independent subset of M22 contains a matrix A with A² = 0.
(f) If {u, v, w} is independent, then au + bv + cw = 0 for some a, b, c.
(g) {u, v, w} is independent if au + bv + cw = 0 for some a, b, c.
(h) If {u, v} is independent, so is {u, u + v}.
(i) If {u, v} is independent, so is {u, v, u + v}.
(j) If {u, v, w} is independent, so is {u, v}.
(k) If {u, v, w} is independent, so is {u + w, v + w}.
(l) If {u, v, w} is independent, so is {u + v + w}.
(m) If u ≠ 0 and v ≠ 0 then {u, v} is dependent if and only if one is a scalar multiple of the other.
(n) If dim V = n, then no set of more than n vectors can be independent.
(o) If dim V = n, then no set of fewer than n vectors can span V.
13. Let A ≠ 0 and B ≠ 0 be n × n matrices, and assume that A is symmetric and B is skew-symmetric (that is, B^T = -B). Show that {A, B} is independent.
14. Show that every set of vectors containing a dependent set is again dependent.
15. Show that every nonempty subset of an independent set of vectors is again independent.
16. Let f and g be functions on [a, b], and assume that f(a) = 1 = g(b) and f(b) = 0 = g(a). Show that {f, g} is independent in F[a, b].
17. Let {A1, A2, …, Ak} be independent in Mmn, and suppose that U and V are invertible matrices of size m × m and n × n, respectively. Show that {UA1V, UA2V, …, UAkV} is independent.
18. Show that {v, w} is independent if and only if neither v nor w is a scalar multiple of the other.
19. Assume that {u, v} is independent in a vector space V. Write u1 = au + bv and v1 = cu + dv, where a, b, c, and d are numbers. Show that {u1, v1} is independent if and only if the matrix [a c; b d] is invertible. [Hint: Theorem 5 Section 2.4.]
20. If {v1, v2, …, vk} is independent and w is not in span{v1, v2, …, vk}, show that:
(a) {w, v1, v2, …, vk} is independent.
(b) {v1 + w, v2 + w, …, vk + w} is independent.
21. If {v1, v2, …, vk} is independent, show that {v1, v1 + v2, …, v1 + v2 + ⋯ + vk} is also independent.
22. Prove Example 12.
23. Let {u, v, w, z} be independent. Which of the following are dependent?
(a) {u - v, v - w, w - u}
(b) {u + v, v + w, w + u}
(c) {u - v, v - w, w - z, z - u}
(d) {u + v, v + w, w + z, z + u}
24. Let U and W be subspaces of V with bases {u1, u2, u3} and {w1, w2} respectively. If U and W have only the zero vector in common, show that {u1, u2, u3, w1, w2} is independent.
25. Let {p, q} be independent polynomials. Show that {p, q, pq} is independent if and only if deg p ≥ 1 and deg q ≥ 1.
26. If z is a complex number, show that {z, z²} is independent if and only if z is not real.
27. Let B = {A1, A2, …, An} ⊆ Mmn, and write B^T = {A1^T, A2^T, …, An^T} ⊆ Mnm. Show that:
(a) B is independent if and only if B^T is independent.
(b) B spans Mmn if and only if B^T spans Mnm.
SECTION 6.4 Finite Dimensional Spaces 311
28. If V = F[a, b] as in Example 7 Section 6.1, show that the set of constant functions is a subspace of dimension 1 (f is constant if there is a number c such that f (x) = c for all x).

29. (a) If U is an invertible n × n matrix and {A1, A2, …, Amn} is a basis of Mmn, show that {A1U, A2U, …, AmnU} is also a basis.
(b) Show that part (a) fails if U is not invertible. [Hint: Theorem 5 Section 2.4.]

30. Show that {(a, b), (a1, b1)} is a basis of ℝ² if and only if {a + bx, a1 + b1x} is a basis of P1.

31. Find the dimension of the subspace span{1, sin² θ, cos 2θ} of F[0, 2π].

32. Show that F[0, 1] is not finite dimensional.

33. If U and W are subspaces of V, define their intersection U ∩ W as follows:
U ∩ W = {v | v is in both U and W}
(a) Show that U ∩ W is a subspace contained in U and W.
(b) Show that U ∩ W = {0} if and only if {u, w} is independent for any nonzero vectors u in U and w in W.
(c) If B and D are bases of U and W, and if U ∩ W = {0}, show that B ∪ D = {v | v is in B or D} is independent.

34. If U and W are vector spaces, let V = {(u, w) | u in U and w in W}.
(a) Show that V is a vector space if (u, w) + (u1, w1) = (u + u1, w + w1) and a(u, w) = (au, aw).
(b) If dim U = m and dim W = n, show that dim V = m + n.
(c) If V1, …, Vm are vector spaces, let V = V1 × ⋯ × Vm = {(v1, …, vm) | vi in Vi for each i} denote the space of m-tuples from the Vi with componentwise operations (see Exercise 17 Section 6.1). If dim Vi = ni for each i, show that dim V = n1 + ⋯ + nm.

35. Let Dn denote the set of all functions f from the set {1, 2, …, n} to ℝ.
(a) Show that Dn is a vector space with pointwise addition and scalar multiplication.
(b) Show that {S1, S2, …, Sn} is a basis of Dn where, for each k = 1, 2, …, n, the function Sk is defined by Sk(k) = 1, whereas Sk(j) = 0 if j ≠ k.

36. A polynomial p(x) is even if p(-x) = p(x) and odd if p(-x) = -p(x). Let En and On denote the sets of even and odd polynomials in Pn.
(a) Show that En is a subspace of Pn and find dim En.
(b) Show that On is a subspace of Pn and find dim On.

37. Let {v1, …, vn} be independent in a vector space V, and let A be an n × n matrix. Define u1, …, un as the entries of the column [u1; …; un] = A[v1; …; vn]. (See Exercise 18 Section 6.1.) Show that {u1, …, un} is independent if and only if A is invertible.
Lemma 1
Independent Lemma
Let {v1, v2, …, vk} be an independent set of vectors in a vector space V. If u ∈ V but u ∉ span{v1, v2, …, vk},⁴ then {u, v1, v2, …, vk} is also independent.
PROOF
Let tu + t1v1 + t2v2 + ⋯ + tkvk = 0; we must show that all the coefficients are zero. First, t = 0 because, otherwise, u = -(t1/t)v1 - (t2/t)v2 - ⋯ - (tk/t)vk is in span{v1, v2, …, vk}, contrary to our assumption. Hence t = 0. But then t1v1 + t2v2 + ⋯ + tkvk = 0, so the rest of the ti are zero by the independence of {v1, v2, …, vk}. This is what we wanted.

[Figure: the plane span{v1, v2} through the origin in ℝ³, with u outside the plane.]

Note that the converse of Lemma 1 is also true: if {u, v1, v2, …, vk} is independent, then u is not in span{v1, v2, …, vk}.

As an illustration, suppose that {v1, v2} is independent in ℝ³. Then v1 and v2 are not parallel, so span{v1, v2} is a plane through the origin (shaded in the diagram). By Lemma 1, u is not in this plane if and only if {u, v1, v2} is independent.
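The illustration can be checked numerically: independence of a set of columns is equivalent to the matrix they form having full column rank. The following sketch is not part of the text and assumes NumPy is available; the specific vectors are hypothetical choices.

```python
import numpy as np

# v1, v2 span the xy-plane; u has a nonzero third coordinate, so u is not
# in span{v1, v2}, and Lemma 1 predicts that {u, v1, v2} is independent.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
u = np.array([1.0, 2.0, 3.0])

assert np.linalg.matrix_rank(np.column_stack([v1, v2])) == 2      # {v1, v2} independent
assert np.linalg.matrix_rank(np.column_stack([v1, v2, u])) == 3   # {u, v1, v2} independent

# A vector inside the plane does not enlarge the independent set:
w = 2 * v1 - 5 * v2
assert np.linalg.matrix_rank(np.column_stack([v1, v2, w])) == 2
```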
Definition 6.7 A vector space V is called finite dimensional if it is spanned by a finite set of vectors.
Otherwise, V is called infinite dimensional.
Thus the zero vector space {0} is finite dimensional because {0} is a spanning set.
Lemma 2

Let V be a finite dimensional vector space, and let U be a subspace of V. Then any independent subset of U can be enlarged to a finite basis of U.
PROOF
Suppose that I is an independent subset of U. If span I = U then I is already
a basis of U. If span I ≠ U, choose u1 ∈ U such that u1 ∉ span I. Hence the
set I {u1} is independent by Lemma 1. If span{I {u1}} = U we are done;
otherwise choose u2 ∈ U such that u2 ∉ span{I {u1}}. Hence {I {u1, u2}}
is independent, and the process continues. We claim that a basis of U will
be reached eventually. Indeed, if no basis of U is ever reached, the process
creates arbitrarily large independent sets in V. But this is impossible by the
fundamental theorem because V is finite dimensional and so is spanned by a
finite set of vectors.
4 If X is a set, we write a ∈ X to indicate that a is an element of the set X. If a is not an element of X, we write a ∉ X.
SECTION 6.4 Finite Dimensional Spaces 313
Theorem 1

Let V be a finite dimensional vector space spanned by m vectors.
1. V has a finite basis, and dim V ≤ m.
2. Every independent set of vectors in V can be enlarged to a basis of V by adding vectors from any fixed basis of V.
3. If U is a subspace of V, then:
(a) U is finite dimensional and dim U ≤ dim V.
(b) Every basis of U can be enlarged to a basis of V.
PROOF
(1) If V = {0}, then V has an empty basis and dim V = 0 ≤ m. Otherwise, let
v ≠ 0 be a vector in V. Then {v} is independent, so (1) follows from Lemma 2
with U = V.
(2) We refine the proof of Lemma 2. Fix a basis B of V and let I be an
independent subset of V. If span I = V then I is already a basis of V. If
span I ≠ V, then B is not contained in I (because B spans V ). Hence choose
b1 ∈ B such that b1 ∉ span I. Hence the set I {b1} is independent by
Lemma 1. If span{I {b1}} = V we are done; otherwise a similar argument
shows that {I {b1, b2}} is independent for some b2 ∈ B. Continue this
process. As in the proof of Lemma 2, a basis of V will be reached eventually.
(3a) This is clear if U = {0}. Otherwise, let u ≠ 0 in U. Then {u} can be
enlarged to a finite basis B of U by Lemma 2, proving that U is finite
dimensional. But B is independent in V, so dim U ≤ dim V by the
fundamental theorem.
(3b) This is clear if U = {0} because V has a basis; otherwise, it follows from (2).
Theorem 1 shows that a vector space V is finite dimensional if and only if it has a
finite basis (possibly empty), and that every subspace of a finite dimensional space
is again finite dimensional.
EXAMPLE 1

Enlarge the independent set D = {[1 1; 1 0], [0 1; 1 1], [1 0; 1 1]} to a basis of M22.
EXAMPLE 2
Find a basis of P3 containing the independent set {1 + x, 1 + x2}.
Solution ► The standard basis of P3 is {1, x, x2, x3}, so including two of these
vectors will do. If we use 1 and x3, the result is {1, 1 + x, 1 + x2, x3}. This is
independent because the polynomials have distinct degrees (Example 4 Section
6.3), and so is a basis by Theorem 1. Of course, including {1, x} or {1, x2} would
not work!
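Example 2 can be verified by machine: writing each polynomial in P3 as its coefficient vector (a0, a1, a2, a3) turns the basis question into a rank computation. This quick check is not part of the text and assumes NumPy is available.

```python
import numpy as np

# Rows are coefficient vectors (constant, x, x^2, x^3 coefficients).
B = np.array([
    [1, 0, 0, 0],   # 1
    [1, 1, 0, 0],   # 1 + x
    [1, 0, 1, 0],   # 1 + x^2
    [0, 0, 0, 1],   # x^3
])
assert np.linalg.matrix_rank(B) == 4   # independent, and dim P3 = 4, so a basis

# Including {1, x} instead fails: 1 + x is a combination of 1 and x.
C = np.array([
    [1, 0, 0, 0],   # 1
    [0, 1, 0, 0],   # x
    [1, 1, 0, 0],   # 1 + x
    [1, 0, 1, 0],   # 1 + x^2
])
assert np.linalg.matrix_rank(C) == 3   # dependent
```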
EXAMPLE 3
Show that the space P of all polynomials is infinite dimensional.
EXAMPLE 4
If c1, c2, …, ck are independent columns in ℝⁿ, show that they are the first k
columns in some invertible n × n matrix.
Theorem 2

Let W be a finite dimensional vector space, and let U be a subspace of W. Then:
1. dim U ≤ dim W.
2. If dim U = dim W, then U = W.
PROOF
Since W is finite dimensional, (1) follows by taking V = W in part (3) of
Theorem 1. Now assume dim U = dim W = n, and let B be a basis of U. Then
B is an independent set in W. If U ≠ W, then span B ≠ W, so B can be extended
to an independent set of n + 1 vectors in W by Lemma 1. This contradicts
the fundamental theorem (Theorem 2 Section 6.3) because W is spanned by
dim W = n vectors. Hence U = W, proving (2).
SECTION 6.4 Finite Dimensional Spaces 315
Theorem 2 is very useful. This was illustrated in Example 13 Section 5.2 for ℝ² and ℝ³; here is another example.
EXAMPLE 5
If a is a number, let W denote the subspace of all polynomials in Pn that have a
as a root:
W = {p(x) | p(x) is in Pn and p(a) = 0}.
Show that {(x - a), (x - a)2, …, (x - a)n} is a basis of W.
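As an informal check of Example 5 (not in the text), take a = 2 and n = 3 and represent polynomials by coefficient vectors; NumPy is assumed. The three polynomials (x - 2)^k are independent, and each vanishes at x = 2, so all three lie in W.

```python
import numpy as np

# Coefficient vectors (a0, a1, a2, a3) of (x-2), (x-2)^2, (x-2)^3 in P3.
basis = np.array([
    [-2, 1, 0, 0],    # x - 2
    [4, -4, 1, 0],    # x^2 - 4x + 4
    [-8, 12, -6, 1],  # x^3 - 6x^2 + 12x - 8
])
assert np.linalg.matrix_rank(basis) == 3   # independent

# Every one of them vanishes at x = 2 (np.polyval wants highest power first).
for coeffs in basis:
    assert abs(np.polyval(coeffs[::-1], 2.0)) < 1e-12
```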
Lemma 3
Dependent Lemma
A set D = {v1, v2, …, vk} of vectors in a vector space V is dependent if and only if some
vector in D is a linear combination of the others.
PROOF
Let v2 (say) be a linear combination of the rest: v2 = s1v1 + s3v3 + ⋯ + skvk.
Then s1v1 + (-1)v2 + s3v3 + ⋯ + skvk = 0 is a nontrivial linear combination
that vanishes, so D is dependent. Conversely, if D is dependent, let
t1v1 + t2v2 + ⋯ + tkvk = 0 where some coefficient is nonzero. If (say) t2 ≠ 0,
then v2 = -(t1/t2)v1 - (t3/t2)v3 - ⋯ - (tk/t2)vk is a linear combination of the others.
Theorem 3
Let V be a finite dimensional vector space. Any spanning set for V can be cut down (by
deleting vectors) to a basis of V.
316 Chapter 6 Vector Spaces
PROOF
Since V is finite dimensional, it has a finite spanning set S. Among all spanning
sets contained in S, choose S0 containing the smallest number of vectors. It
suffices to show that S0 is independent (then S0 is a basis, proving the theorem).
Suppose, on the contrary, that S0 is not independent. Then, by Lemma 3,
some vector u ∈ S0 is a linear combination of the set S1 = S0 \ {u} of vectors
in S0 other than u. It follows that span S0 = span S1, that is, V = span S1. But
S1 has fewer elements than S0 so this contradicts the choice of S0. Hence S0 is
independent after all.
EXAMPLE 6
Find a basis of P3 in the spanning set S = {1, x + x2, 2x - 3x2, 1 + 3x - 2x2, x3}.
Theorem 4
Let V be a vector space with dim V = n, and suppose S is a set of exactly n vectors in V.
Then S is independent if and only if S spans V.
PROOF
Assume first that S is independent. By Theorem 1, S is contained in a basis B
of V. Hence |S| = n = |B| so, since S ⊆ B, it follows that S = B. In particular S
spans V.
Conversely, assume that S spans V, so S contains a basis B by Theorem 3.
Again |S| = n = |B| so, since S ⊇ B, it follows that S = B. Hence S is
independent.
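Theorem 4 can be spot-checked in ℝ³, where both conditions reduce to invertibility of the matrix whose columns are the three given vectors. The sketch below is not part of the text and assumes NumPy is available.

```python
import numpy as np

S = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # columns: (1, 1, 0), (0, 1, 1), (1, 0, 1)

assert np.linalg.matrix_rank(S) == 3   # the three columns are independent
assert abs(np.linalg.det(S)) > 1e-12   # so S is invertible...

# ...hence the columns also span R^3: any b is a (unique) combination.
b = np.array([2.0, -1.0, 5.0])
x = np.linalg.solve(S, b)
assert np.allclose(S @ x, b)
```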
One of independence or spanning is often easier to establish than the other when
showing that a set of vectors is a basis. For example if V = n it is easy to check
whether a subset S of n is orthogonal (hence independent) but checking spanning
can be tedious. Here are three more examples.
SECTION 6.4 Finite Dimensional Spaces 317
EXAMPLE 7
Consider the set S = {p0(x), p1(x), …, pn(x)} of polynomials in Pn. If
deg pk(x) = k for each k, show that S is a basis of Pn.
EXAMPLE 8
Let V denote the space of all symmetric 2 × 2 matrices. Find a basis of V
consisting of invertible matrices.
EXAMPLE 9

Let A be any n × n matrix. Show that there exist n2 + 1 scalars a0, a1, a2, …, an2, not all zero, such that
a0I + a1A + a2A2 + ⋯ + an2An2 = 0
where I denotes the n × n identity matrix.
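Example 9 can be illustrated numerically for n = 2 (a sketch, not the book's solution; NumPy is assumed, and the matrix A below is an arbitrary choice). The five matrices I, A, …, A⁴ sit in the 4-dimensional space M22, so the fundamental theorem forces a dependency.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
powers = [np.linalg.matrix_power(A, k) for k in range(5)]   # I, A, A^2, A^3, A^4

# Flatten each 2x2 matrix to a vector in R^4; five such vectors must be dependent.
M = np.column_stack([P.reshape(-1) for P in powers])
assert np.linalg.matrix_rank(M) < 5

# A unit vector in the null space supplies the scalars a0, ..., a4.
a = np.linalg.svd(M)[2][-1]
assert np.allclose(sum(c * P for c, P in zip(a, powers)), 0.0, atol=1e-8)
```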
Theorem 5
Suppose that U and W are finite dimensional subspaces of a vector space V. Then
U + W is finite dimensional and
dim(U + W) = dim U + dim W - dim(U ∩ W).
PROOF
Since U ∩ W ⊆ U, it has a finite basis, say {x1, …, xd}. Extend it to a basis
{x1, …, xd, u1, …, um} of U by Theorem 1. Similarly extend {x1, …, xd} to
a basis {x1, …, xd, w1, …, wp} of W. Then
U + W = span{x1, …, xd, u1, …, um, w1, …, wp}
as the reader can verify, so U + W is finite dimensional. For the rest, it suffices
to show that {x1, …, xd, u1, …, um, w1, …, wp} is independent (verify). Suppose
that
r1x1 + ⋯ + rdxd + s1u1 + ⋯ + smum + t1w1 + ⋯ + tpwp = 0 (∗)
where the ri, sj, and tk are scalars. Then
r1x1 + ⋯ + rdxd + s1u1 + ⋯ + smum = -(t1w1 + ⋯ + tpwp)
is in U (left side) and also in W (right side), and so is in U ∩ W. Hence
(t1w1 + ⋯ + tpwp) is a linear combination of {x1, …, xd}, so t1 = ⋯ = tp = 0,
because {x1, …, xd, w1, …, wp} is independent. Similarly, s1 = ⋯ = sm = 0, so (∗)
becomes r1x1 + ⋯ + rdxd = 0. It follows that r1 = ⋯ = rd = 0, as required.
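Theorem 5 can be illustrated with column spaces in ℝ⁴ (a numerical sketch, not part of the text; NumPy is assumed): dim(U + W) is the rank of the block matrix [A B], and the formula then pins down dim(U ∩ W).

```python
import numpy as np

A = np.array([[1, 0], [0, 1], [0, 0], [0, 0]])   # U = col A = span{e1, e2}
B = np.array([[0, 0], [1, 0], [0, 1], [0, 0]])   # W = col B = span{e2, e3}

dim_U = np.linalg.matrix_rank(A)                     # 2
dim_W = np.linalg.matrix_rank(B)                     # 2
dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))   # dim(U + W) = 3

# Theorem 5: dim(U ∩ W) = dim U + dim W - dim(U + W) = 2 + 2 - 3 = 1,
# matching the fact that U ∩ W = span{e2} here.
assert dim_U + dim_W - dim_sum == 1
```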
EXERCISES 6.4
1. In each case, find a basis for V that includes the vector v.
(a) V = ℝ³, v = (1, -1, 1)
(b) V = ℝ³, v = (0, 1, 1)
(c) V = M22, v = [1 1; 1 1]
(d) V = P2, v = x2 - x + 1

2. In each case, find a basis for V among the given vectors.
(a) V = ℝ³, {(1, 1, -1), (2, 0, 1), (-1, 1, -2), (1, 2, 1)}
(b) V = P2, {x2 + 3, x + 2, x2 - 2x - 1, x2 + x}

3. In each case, find a basis of V containing v and w.
(a) V = ℝ⁴, v = (1, -1, 1, -1), w = (0, 1, 0, 1)
(b) V = ℝ⁴, v = (0, 0, 1, 1), w = (1, 1, 1, 1)
(c) V = M22, v = [1 0; 0 1], w = [0 1; 1 0]
(d) V = P3, v = x2 + 1, w = x2 + x

4. (a) If z is not a real number, show that {z, z2} is a basis of the real vector space ℂ of all complex numbers.
(b) If z is neither real nor pure imaginary, show that {z, z̄} is a basis of ℂ.

5. In each case use Theorem 4 to decide if S is a basis of V.
(a) V = M22; S = {[1 1; 1 1], [0 1; 1 1], [0 0; 1 1], [0 0; 0 1]}.
(b) V = P3; S = {2x2, 1 + x, 3, 1 + x + x2 + x3}.

6. (a) Find a basis of M22 consisting of matrices with the property that A2 = A.
(b) Find a basis of P3 consisting of polynomials whose coefficients sum to 4. What if they sum to 0?

7. If {u, v, w} is a basis of V, determine which of the following are bases.
(a) {u + v, u + w, v + w}
(b) {2u + v + 3w, 3u + v - w, u - 4w}
(c) {u, u + v + w}
(d) {u, u + w, u - w, v + w}

8. (a) Can two vectors span ℝ³? Can they be linearly independent? Explain.
(b) Can four vectors span ℝ³? Can they be linearly independent? Explain.

9. Show that any nonzero vector in a finite dimensional vector space is part of a basis.

10. If A is a square matrix, show that det A = 0 if and only if some row is a linear combination of the others.

11. Let D, I, and X denote finite, nonempty sets of vectors in a vector space V. Assume that D is dependent and I is independent. In each case answer yes or no, and defend your answer.
(a) If X ⊇ D, must X be dependent?
(b) If X ⊆ D, must X be dependent?
(c) If X ⊇ I, must X be independent?
(d) If X ⊆ I, must X be independent?

12. If U and W are subspaces of V and dim U = 2, show that either U ⊆ W or dim(U ∩ W) ≤ 1.

13. Let A be a nonzero 2 × 2 matrix and write U = {X in M22 | XA = AX}. Show that dim U ≥ 2. [Hint: I and A are in U.]

14. If U ⊆ ℝ² is a subspace, show that U = {0}, U = ℝ², or U is a line through the origin.

15. Given v1, v2, v3, …, vk, and v, let U = span{v1, v2, …, vk} and W = span{v1, v2, …, vk, v}. Show that either dim W = dim U or dim W = 1 + dim U.

16. Suppose U is a subspace of P1, U ≠ {0}, and U ≠ P1. Show that either U = ℝ or U = ℝ(a + x) for some a in ℝ.

17. Let U be a subspace of V and assume dim V = 4 and dim U = 2. Does every basis of V result from adding (two) vectors to some basis of U? Defend your answer.

18. Let U and W be subspaces of a vector space V.
(a) If dim V = 3, dim U = dim W = 2, and U ≠ W, show that dim(U ∩ W) = 1.
(b) Interpret (a) geometrically if V = ℝ³.

19. Let U ⊆ W be subspaces of V with dim U = k and dim W = m, where k < m. If k < l < m, show that a subspace X exists where U ⊆ X ⊆ W and dim X = l.

20. Let B = {v1, …, vn} be a maximal independent set in a vector space V. That is, no set of more than n vectors S is independent. Show that B is a basis of V.

21. Let B = {v1, …, vn} be a minimal spanning set for a vector space V. That is, V cannot be spanned by fewer than n vectors. Show that B is a basis of V.

22. (a) Let p(x) and q(x) lie in P1 and suppose that p(1) ≠ 0, q(2) ≠ 0, and p(2) = 0 = q(1). Show that {p(x), q(x)} is a basis of P1. [Hint: If rp(x) + sq(x) = 0, evaluate at x = 1, x = 2.]
(b) Let B = {p0(x), p1(x), …, pn(x)} be a set of polynomials in Pn. Assume that there exist numbers a0, a1, …, an such that pi(ai) ≠ 0 for each i but pi(aj) = 0 if i is different from j. Show that B is a basis of Pn.
23. Let V be the set of all infinite sequences (a0, a1, a2, …) of real numbers. Define addition and scalar multiplication by
(a0, a1, …) + (b0, b1, …) = (a0 + b0, a1 + b1, …)
and r(a0, a1, …) = (ra0, ra1, …).
(a) Show that V is a vector space.
(b) Show that V is not finite dimensional.
(c) [For those with some calculus.] Show that the set of convergent sequences (that is, lim n→∞ an exists) is a subspace, also of infinite dimension.

24. Let A be an n × n matrix of rank r. If U = {X in Mnn | AX = 0}, show that dim U = n(n - r). [Hint: Exercise 34 Section 6.3.]

25. Let U and W be subspaces of V.
(a) Show that U + W is a subspace of V containing U and W.
(b) Show that span{u, w} = ℝu + ℝw for any vectors u and w.
(c) Show that span{u1, …, um, w1, …, wn} = span{u1, …, um} + span{w1, …, wn} for any vectors ui in U and wj in W.

26. If A and B are m × n matrices, show that rank(A + B) ≤ rank A + rank B. [Hint: If U and V are the column spaces of A and B, respectively, show that the column space of A + B is contained in U + V and that dim(U + V ) ≤ dim U + dim V. (See Theorem 5.)]
Theorem 1

If a is any number, the set {1, x - a, (x - a)2, …, (x - a)n} is a basis of Pn.
Corollary 1
If a is any number, every polynomial f (x) of degree at most n has an expansion in powers
of (x - a):
f (x) = a0 + a1(x - a) + a2(x - a)2 + + an(x - a)n. (∗)
If f (a) = 0, then it is clear that f (x) has the form f (x) = (x - a)g(x). Conversely, every
such polynomial certainly satisfies f (a) = 0, and we obtain:
Corollary 2

Let f (x) be a polynomial and let a be a number. Then f (a) = 0 if and only if f (x) = (x - a)g(x) for some polynomial g(x).
The polynomial g(x) can be computed easily by using “long division” to divide f (x)
by (x - a)—see Appendix D.
All the coefficients in the expansion (∗) of f (x) in powers of (x - a) can be
determined in terms of the derivatives of f (x).5 These will be familiar to students
of calculus. Let f (n)(x) denote the nth derivative of the polynomial f (x), and write
f (0)(x) = f (x). Then, if
f (x) = a0 + a1(x - a) + a2(x - a)2 + ⋯ + an(x - a)n,
it is clear that a0 = f (a) = f (0)(a). Differentiation gives
f (1)(x) = a1 + 2a2(x - a) + 3a3(x - a)2 + ⋯ + nan(x - a)n-1
and substituting x = a yields a1 = f (1)(a). This process continues to give
a2 = f (2)(a)/2!, a3 = f (3)(a)/3!, …, ak = f (k)(a)/k!, where k! is defined as k! = k(k - 1)⋯2 · 1.
Hence we obtain the following:
Corollary 3
Taylor’s Theorem
If f (x) is a polynomial of degree n, then
f (x) = f (a) + (f (1)(a)/1!)(x - a) + (f (2)(a)/2!)(x - a)2 + ⋯ + (f (n)(a)/n!)(x - a)n.
EXAMPLE 1
Expand f (x) = 5x3 + 10x + 2 as a polynomial in powers of x - 1.
Solution ► The derivatives are f (1)(x) = 15x2 + 10, f (2)(x) = 30x, and f (3)(x) = 30. Hence the Taylor expansion is
f (x) = f (1) + f (1)(1)(x - 1) + (f (2)(1)/2!)(x - 1)2 + (f (3)(1)/3!)(x - 1)3
= 17 + 25(x - 1) + 15(x - 1)2 + 5(x - 1)3.
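The expansion in Example 1 can be reproduced symbolically (a check, not the book's worked solution; SymPy is assumed):

```python
import sympy as sp

x = sp.symbols('x')
f = 5*x**3 + 10*x + 2
a = 1

# Taylor's theorem: the coefficient of (x - a)^k is f^(k)(a) / k!.
coeffs = [f.diff(x, k).subs(x, a) / sp.factorial(k) for k in range(4)]
assert coeffs == [17, 25, 15, 5]

expansion = sum(c * (x - a)**k for k, c in enumerate(coeffs))
assert sp.expand(expansion - f) == 0   # 17 + 25(x-1) + 15(x-1)^2 + 5(x-1)^3 equals f
```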
Theorem 2
Let f0(x), f1(x), …, fn(x) be nonzero polynomials in Pn. Assume that numbers a0, a1, …,
an exist such that
fi(ai) ≠ 0 for each i
fi(aj) = 0 if i ≠ j
Then
1. { f0(x), …, fn(x)} is a basis of Pn.
2. If f (x) is any polynomial in Pn, its expansion as a linear combination of these
basis vectors is
f (x) = (f (a0)/f0(a0)) f0(x) + (f (a1)/f1(a1)) f1(x) + ⋯ + (f (an)/fn(an)) fn(x).
PROOF
1. It suffices (by Theorem 4 Section 6.4) to show that {f0(x), …, fn(x)} is linearly
independent (because dim Pn = n + 1). Suppose that
r0 f0(x) + r1 f1(x) + ⋯ + rn fn(x) = 0, ri ∈ ℝ.
Because fi(a0) = 0 for all i > 0, taking x = a0 gives r0f0(a0) = 0. But then
r0 = 0 because f0(a0) ≠ 0. The proof that ri = 0 for i > 0 is analogous.
2. By (1), f (x) = r0 f0(x) + ⋯ + rn fn(x) for some numbers ri. Again, evaluating at
a0 gives f (a0) = r0 f0(a0), so r0 = f (a0)/f0(a0). Similarly, ri = f (ai)/fi(ai) for each i.
EXAMPLE 2
Show that {x2 - x, x2 - 2x, x2 - 3x + 2} is a basis of P2.
In fact, this can be generalized with no extra effort. If a0, a1, …, an are distinct
numbers, define the Lagrange polynomials δ0(x), δ1(x), …, δn(x) relative to these
numbers as follows:
δk(x) = ∏i≠k(x - ai) / ∏i≠k(ak - ai),  k = 0, 1, 2, …, n
Here the numerator is the product of all the terms (x - a0), (x - a1), …, (x - an)
with (x - ak) omitted, and a similar remark applies to the denominator. If n = 2,
these are just the polynomials in the preceding paragraph. For another example, if
n = 3, the polynomial δ1(x) takes the form
δ1(x) = (x - a0)(x - a2)(x - a3) / [(a1 - a0)(a1 - a2)(a1 - a3)]
In the general case, it is clear that δi(ai) = 1 for each i and that δi(aj) = 0 if i ≠ j.
Hence Theorem 2 specializes as Theorem 3.
Theorem 3

Let a0, a1, …, an be distinct numbers, and let δ0(x), δ1(x), …, δn(x) denote the Lagrange polynomials relative to these numbers. Then:
1. {δ0(x), δ1(x), …, δn(x)} is a basis of Pn.
2. Every polynomial f (x) in Pn has Lagrange interpolation expansion
f (x) = f (a0)δ0(x) + f (a1)δ1(x) + ⋯ + f (an)δn(x).
EXAMPLE 3
Find the Lagrange interpolation expansion for f (x) = x2 - 2x + 1 relative to
a0 = -1, a1 = 0, and a2 = 1.
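A symbolic sketch of Example 3 (not the text's solution; SymPy is assumed). The Lagrange polynomials are built exactly as defined above, and the expansion uses the values f(-1) = 4, f(0) = 1, f(1) = 0.

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 - 2*x + 1
pts = [-1, 0, 1]   # a0, a1, a2

def delta(k):
    # Lagrange polynomial relative to pts: omit the factor for index k.
    num = sp.prod([x - a for i, a in enumerate(pts) if i != k])
    den = sp.prod([pts[k] - a for i, a in enumerate(pts) if i != k])
    return num / den

values = [f.subs(x, a) for a in pts]
assert values == [4, 1, 0]

expansion = sum(v * delta(k) for k, v in enumerate(values))
assert sp.expand(expansion - f) == 0   # f = 4*delta0 + 1*delta1 + 0*delta2
```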
Theorem 4
Let f (x) be a polynomial in Pn, and let a0, a1, …, an denote distinct numbers. If f (ai) = 0
for all i, then f (x) is the zero polynomial (that is, all coefficients are zero).
PROOF
All the coefficients in the Lagrange expansion of f (x) are zero.
EXERCISES 6.5
1. If polynomials f (x) and g(x) satisfy f (a) = g(a), show that f (x) - g(x) = (x - a)h(x) for some polynomial h(x).

Exercises 2, 3, 4, and 5 require polynomial differentiation.

2. Expand each of the following as a polynomial in powers of x - 1.
(a) f (x) = x3 - 2x2 + x - 1
(b) f (x) = x3 + x + 1
(c) f (x) = x4
(d) f (x) = x3 - 3x2 + 3x

3. Prove Taylor's theorem for polynomials.

4. Use Taylor's theorem to derive the binomial theorem:
(1 + x)n = C(n, 0) + C(n, 1)x + C(n, 2)x2 + ⋯ + C(n, n)xn
Here the binomial coefficients C(n, r) are defined by C(n, r) = n!/(r!(n - r)!), where n! = n(n - 1)⋯2 · 1 if n ≥ 1 and 0! = 1.

5. Let f (x) be a polynomial of degree n. Show that, given any polynomial g(x) in Pn, there exist numbers b0, b1, …, bn such that
g(x) = b0 f (x) + b1 f (1)(x) + ⋯ + bn f (n)(x)
where f (k)(x) denotes the kth derivative of f (x).

6. Use Theorem 2 to show that the following are bases of P2.
(a) {x2 - 2x, x2 + 2x, x2 - 4}
(b) {x2 - 3x + 2, x2 - 4x + 3, x2 - 5x + 6}

7. Find the Lagrange interpolation expansion of f (x) relative to a0 = 1, a1 = 2, and a2 = 3 if:
(a) f (x) = x2 + 1
(b) f (x) = x2 + x + 1

8. Let a0, a1, …, an be distinct numbers. If f (x) and g(x) in Pn satisfy f (ai) = g(ai) for all i, show that f (x) = g(x). [Hint: See Theorem 4.]

9. Let a0, a1, …, an be distinct numbers. If f (x) in Pn+1 satisfies f (ai) = 0 for each i = 0, 1, …, n, show that f (x) = r(x - a0)(x - a1)⋯(x - an) for some r in ℝ. [Hint: r is the coefficient of xn+1 in f (x). Consider f (x) - r(x - a0)⋯(x - an) and use Theorem 4.]

10. Let a and b denote distinct numbers.
(a) Show that {(x - a), (x - b)} is a basis of P1.
(b) Show that {(x - a)2, (x - a)(x - b), (x - b)2} is a basis of P2.
(c) Show that {(x - a)n, (x - a)n-1(x - b), …, (x - a)(x - b)n-1, (x - b)n} is a basis of Pn. [Hint: If a linear combination vanishes, evaluate at x = a and x = b. Then reduce to the case n - 2 by using the fact that if p(x)q(x) = 0 in P, then either p(x) = 0 or q(x) = 0.]

11. Let a and b be two distinct numbers. Assume that n ≥ 2 and let
Un = {f (x) in Pn | f (a) = 0 = f (b)}.
(a) Show that Un = {(x - a)(x - b)p(x) | p(x) in Pn-2}.
(b) Show that dim Un = n - 1. [Hint: If p(x)q(x) = 0 in P, then either p(x) = 0, or q(x) = 0.]
(c) Show that {(x - a)n-1(x - b), (x - a)n-2(x - b)2, …, (x - a)2(x - b)n-2, (x - a)(x - b)n-1} is a basis of Un. [Hint: Exercise 10.]
SECTION 6.6 An Application to Differential Equations 325
Theorem 1
Theorem 2
The set of solutions to the nth order equation (∗) has dimension n.
Remark Every differential equation of order n can be converted into a system of n linear
first-order equations (see Exercises 6 and 7 in Section 3.5). In the case that the
matrix of this system is diagonalizable, this approach provides a proof of Theorem
2. But if the matrix is not diagonalizable, Theorem 1 Section 7.4 is required.
Theorem 1 suggests that we look for solutions to (∗) of the form eλx for some
number λ. This is a good idea. If we write f (x) = eλx, it is easy to verify that
f (k)(x) = λkeλx for each k ≥ 0, so substituting f in (∗) gives
(λn + an-1λn-1 + an-2λn-2 + ⋯ + a2λ2 + a1λ + a0)eλx = 0.
Since eλx ≠ 0 for all x, this shows that eλx is a solution of (∗) if and only if λ is a root
of the characteristic polynomial c(x), defined to be
c(x) = xn + an-1xn-1 + an-2xn-2 + ⋯ + a2x2 + a1x + a0.
This proves Theorem 3.
Theorem 3
If λ is real, the function eλx is a solution of (∗) if and only if λ is a root of the
characteristic polynomial c(x).
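Theorem 3 is easy to check symbolically for a hypothetical second-order example (not from the text; SymPy is assumed): the equation f″ - 3f′ + 2f = 0 has characteristic polynomial c(x) = x² - 3x + 2, with roots 1 and 2.

```python
import sympy as sp

x, lam = sp.symbols('x lam')
roots = sp.solve(lam**2 - 3*lam + 2, lam)
assert sorted(roots) == [1, 2]

for r in roots:
    f = sp.exp(r * x)
    # e^{rx} is a solution exactly when r is a root of c(x):
    assert sp.simplify(f.diff(x, 2) - 3*f.diff(x) + 2*f) == 0

# A non-root of c(x) does not give a solution:
g = sp.exp(3 * x)
assert sp.simplify(g.diff(x, 2) - 3*g.diff(x) + 2*g) != 0
```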
EXAMPLE 1
Find a basis of the space U of solutions of f ‴ - 2f ″ - f ′ + 2f = 0.
Lemma 1
If λ1, λ2, …, λk are distinct, then {eλ1x, eλ2x, …, eλkx} is linearly independent.
PROOF
If r1eλ1x + r2eλ2x + ⋯ + rkeλkx = 0 for all x, then
r1 + r2e(λ2-λ1)x + ⋯ + rke(λk-λ1)x = 0; that is, r2e(λ2-λ1)x + ⋯ + rke(λk-λ1)x
is a constant. Since the λi are distinct, this forces r2 = ⋯ = rk = 0,
whence r1 = 0 also. This is what we wanted.
Theorem 4

Let U denote the space of solutions of the second order equation f ″ + af ′ + bf = 0, where a and b are real, and assume that the characteristic polynomial x2 + ax + b has real roots.
1. If the roots λ1 and λ2 are distinct, then {eλ1x, eλ2x} is a basis of U.
2. If there is a double root λ, then {eλx, xeλx} is a basis of U.
PROOF
Since dim(U ) = 2 by Theorem 2, (1) follows by Lemma 1, and (2) follows
because the set {eλx, xeλx} is independent (Exercise 3).
EXAMPLE 2
Find the solution of f ″ + 4f ′ + 4f = 0 that satisfies the boundary conditions
f (0) = 1, f (1) = -1.
One other question remains: What happens if the roots of the characteristic
polynomial are not real? To answer this, we must first state precisely what eλx
means when λ is not real. If q is a real number, define
eiq = cos q + i sin q
where i2 = -1. Then the relationship eiqeiq1 = ei(q+q1) holds for all real q and q1, as
is easily verified. If λ = p + iq, where p and q are real numbers, we define
eλx = epx(cos(qx) + i sin(qx)).
For convenience, denote the real and imaginary parts of f (x) = eλx as u(x) = epxcos(qx) and
v(x) = epxsin(qx). Then the fact that f (x) satisfies the differential equation gives
0 = f ″ + af ′ + bf = (u″ + au′ + bu) + i(v″ + av′ + bv).
Equating real and imaginary parts shows that u(x) and v(x) are both solutions to the
differential equation. This proves part of Theorem 5.
Theorem 5

Let U denote the space of solutions of f ″ + af ′ + bf = 0, where a and b are real, and assume that the characteristic polynomial x2 + ax + b has nonreal roots λ = p + iq and λ̄ = p - iq, where q ≠ 0. Then {epxcos(qx), epxsin(qx)} is a basis of U.
PROOF
The foregoing discussion shows that these functions lie in U. Because dim U = 2
by Theorem 2, it suffices to show that they are linearly independent. But if
repxcos(qx) + sepxsin(qx) = 0
for all x, then r cos(qx) + s sin(qx) = 0 for all x (because epx ≠ 0). Taking x = 0
gives r = 0, and taking x = π/(2q) gives s = 0 (q ≠ 0 because λ is not real). This is
what we wanted.
EXAMPLE 3
Find the solution f (x) to f ″ - 2f ′ + 2f = 0 that satisfies f (0) = 2 and f (π/2) = 0.
Theorem 6

If q ≠ 0 is a real number, then the space of solutions of the differential equation f ″ + q2f = 0 has basis {cos(qx), sin(qx)}. Hence every solution has the form f (x) = c sin(qx) + d cos(qx) for constants c and d.
PROOF
The characteristic polynomial x2 + q2 has roots qi and -qi, so Theorem 5 applies
with p = 0.
In many situations, the displacement s(t) of some object at time t turns out
to have an oscillating form s(t) = c sin(at) + d cos(at). These are called simple
harmonic motions. An example follows.
EXAMPLE 4
A weight is attached to an extension spring (see diagram). If it is pulled from
the equilibrium position and released, it is observed to oscillate up and down.
Let d(t) denote the distance of the weight below the equilibrium position t
seconds later. It is known (Hooke's law) that the acceleration d″(t) of the
weight is proportional to the displacement d(t) and in the opposite direction.
That is,
d″(t) = -kd(t)
where k > 0 is called the spring constant. Find d(t) if the maximum extension
is 10 cm below the equilibrium position and find the period of the oscillation
(time taken for the weight to make a full oscillation).
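Since the weight starts from rest at the maximum extension, the solution of Example 4 has the form d(t) = 10 cos(√k t) with period 2π/√k. The following numeric sanity check is not the book's solution; it assumes NumPy is available and uses k = 4 as an arbitrary test value.

```python
import numpy as np

k = 4.0
w = np.sqrt(k)
d = lambda t: 10 * np.cos(w * t)   # d(0) = 10, d'(0) = 0

t = np.linspace(0.1, 10.0, 2001)
h = 1e-4
# Central-difference second derivative: d'' should equal -k d.
d2 = (d(t + h) - 2 * d(t) + d(t - h)) / h**2
assert np.allclose(d2, -k * d(t), atol=1e-3)

period = 2 * np.pi / w
assert np.isclose(d(0.0), 10.0)
assert np.isclose(d(period), 10.0)   # back to maximum extension after one period
```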
EXERCISES 6.6
1. Find a solution f to each of the following differential equations satisfying the given boundary conditions.
(a) f ′ - 3f = 0; f (1) = 2
(b) f ′ + f = 0; f (1) = 1
(c) f ″ + 2f ′ - 15f = 0; f (1) = f (0) = 0
(d) f ″ + f ′ - 6f = 0; f (0) = 0, f (1) = 1
(e) f ″ - 2f ′ + f = 0; f (1) = f (0) = 1
(f) f ″ - 4f ′ + 4f = 0; f (0) = 2, f (-1) = 0
(g) f ″ - 3af ′ + 2a2f = 0; a ≠ 0; f (0) = 0, f (1) = 1 - ea
(h) f ″ - a2f = 0, a ≠ 0; f (0) = 1, f (1) = 0
(i) f ″ - 2f ′ + 5f = 0; f (0) = 1, f (π/4) = 0
(j) f ″ + 4f ′ + 5f = 0; f (0) = 0, f (π/2) = 1

2. If the characteristic polynomial of f ″ + af ′ + bf = 0 has real roots, show that f = 0 is the only solution satisfying f (0) = 0 = f (1).

3. Complete the proof of Theorem 2. [Hint: If λ is a double root of x2 + ax + b, show that a = -2λ and b = λ2. Hence xeλx is a solution.]

4. (a) Given the equation f ′ + af = b, (a ≠ 0), make the substitution f (x) = g(x) + b/a and obtain a differential equation for g. Then derive the general solution for f ′ + af = b.
(b) Find the general solution to f ′ + f = 2.

5. Consider the differential equation f ″ + af ′ + bf = g, where g is some fixed function. Assume that f0 is one solution of this equation.
(a) Show that the general solution is cf1 + df2 + f0, where c and d are constants and {f1, f2} is any basis for the solutions to f ″ + af ′ + bf = 0.
(b) Find a solution to f ″ + f ′ - 6f = 2x3 - x2 - 2x. [Hint: Try f (x) = -(1/3)x3.]

6. A radioactive element decays at a rate proportional to the amount present. Suppose an initial mass of 10 grams decays to 8 grams in 3 hours.
(a) Find the mass t hours later.
(b) Find the half-life of the element—the time it takes to decay to half its mass.
7. The population N(t) of a region at time t increases at a rate proportional to the population. If the population doubles in 5 years and is 3 million initially, find N(t).

8. Consider a spring, as in Example 4. If the period of the oscillation is 30 seconds, find the spring constant k.

9. As a pendulum swings (see the diagram), let t measure the time since it was vertical. The angle θ = θ(t) from the vertical can be shown to satisfy the equation θ″ + kθ = 0, provided that θ is small. If the maximal angle is θ = 0.05 radians, find θ(t) in terms of k. If the period is 0.5 seconds, find k. [Assume that θ = 0 when t = 0.]
1. (Requires calculus) Let V denote the space of all functions f : ℝ → ℝ for which the derivatives f ′ and f ″ exist. Show that f1, f2, and f3 in V are linearly independent provided that their wronskian w(x) is nonzero for some x, where
w(x) = det [f1(x) f2(x) f3(x); f1′(x) f2′(x) f3′(x); f1″(x) f2″(x) f3″(x)]

2. Let {v1, v2, …, vn} be a basis of ℝⁿ (written as columns), and let A be an n × n matrix.
(a) If A is invertible, show that {Av1, Av2, …, Avn} is a basis of ℝⁿ.
(b) If {Av1, Av2, …, Avn} is a basis of ℝⁿ, show that A is invertible.

3. If A is an m × n matrix, show that A has rank m if and only if col A contains every column of Im.

4. Show that null A = null(ATA) for any real matrix A.

5. Let A be an m × n matrix of rank r. Show that dim(null A) = n - r (Theorem 3 Section 5.4) as follows. Choose a basis {x1, …, xk} of null A and extend it to a basis {x1, …, xk, z1, …, zm} of ℝⁿ. Show that {Az1, …, Azm} is a basis of col A.
Chapter 7 Linear Transformations
If V and W are vector spaces, a function T : V → W is a rule that assigns to each
vector v in V a uniquely determined vector T(v) in W. As mentioned in Section 2.2,
two functions S : V → W and T : V → W are equal if S(v) = T(v) for every v in V.
A function T : V → W is called a linear transformation if T(v + v1) = T(v) + T(v1)
for all v, v1 in V and T(rv) = rT(v) for all v in V and all scalars r. T(v) is called the
image of v under T. We have already studied linear transformations T : ℝⁿ → ℝᵐ
and shown (in Section 2.6) that they all are given by multiplication by a uniquely
determined m × n matrix A; that is, T(x) = Ax for all x in ℝⁿ. In the case of linear
operators ℝ² → ℝ², this yields an important way to describe geometric functions
such as rotations about the origin and reflections in a line through the origin.
In the present chapter we will describe linear transformations in general,
introduce the kernel and image of a linear transformation, and prove a useful result
(called the dimension theorem) that relates the dimensions of the kernel and image,
and unifies and extends several earlier results. Finally we study the notion of
isomorphic vector spaces, that is, spaces that are identical except for notation, and
relate this to composition of transformations that was introduced in Section 2.3.
Definition 7.1 If V and W are two vector spaces, a function T : V → W is called a linear
transformation if it satisfies the following axioms.
T1. T(v + v1) = T(v) + T(v1) for all v and v1 in V.
T2. T(rv) = rT(v) for all v in V and r in ℝ.
A linear transformation T : V → V is called a linear operator on V. The situation can
be visualized as in the diagram.
Axiom T1 is just the requirement that T preserves vector addition. It asserts that
the result T(v + v1) of adding v and v1 first and then applying T is the same as
applying T first to get T(v) and T(v1) and then adding. Similarly, axiom T2 means
that T preserves scalar multiplication. Note that, even though the additions in axiom
T1 are both denoted by the same symbol +, the addition on the left forming v + v1
is carried out in V, whereas the addition T(v) + T(v1) is done in W. Similarly,
the scalar multiplications rv and rT(v) in axiom T2 refer to the spaces V and W,
respectively.
332 Chapter 7 Linear Transformations
EXAMPLE 1
If V and W are vector spaces, the following are linear transformations:
Identity operator V → V 1V : V → V where 1V(v) = v for all v in V
Zero transformation V → W 0 : V → W where 0(v) = 0 for all v in V
Scalar operator V → V a:V→V where a(v) = av for all v in V
(Here a is any real number.)
The symbol 0 will be used to denote the zero transformation from V to W for
any spaces V and W. It was also used earlier to denote the zero function [a, b] → ℝ.
The next example gives two important transformations of matrices. Recall that
the trace tr A of an n × n matrix A is the sum of the entries on the main diagonal.
EXAMPLE 2
Show that transposition and trace are linear transformations. More
precisely,
R : Mmn → Mnm where R(A) = Aᵀ for all A in Mmn
S : Mnn → ℝ where S(A) = tr A for all A in Mnn
are both linear transformations.
EXAMPLE 3
If a is a scalar, define Ea : Pn → ℝ by Ea(p) = p(a) for each polynomial p in Pn.
Show that Ea is a linear transformation (called evaluation at a).
Solution ► If p and q are polynomials and r is in ℝ, we use the fact that the sum
p + q and scalar product rp are defined as for functions:
(p + q)(x) = p(x) + q(x) and (rp)(x) = rp(x)
for all x. Hence, for all p and q in Pn and all r in ℝ:
Ea(p + q) = (p + q)(a) = p(a) + q(a) = Ea(p) + Ea(q)
Ea(rp) = (rp)(a) = rp(a) = rEa(p)
so Ea is a linear transformation.
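The defining conditions can also be checked numerically by representing a polynomial in Pn by its vector of coefficients. The following sketch is ours, not the text's; the helper `ev` and the sample polynomials are arbitrary:

```python
import numpy as np

def ev(coeffs, a):
    """Evaluate the polynomial with the given coefficients (constant term first) at a."""
    return sum(c * a**k for k, c in enumerate(coeffs))

p = np.array([1.0, 2.0, 3.0])    # p(x) = 1 + 2x + 3x^2
q = np.array([4.0, 0.0, -1.0])   # q(x) = 4 - x^2
a, r = 2.0, 5.0

# E_a(p + q) = E_a(p) + E_a(q)  and  E_a(rp) = r * E_a(p)
assert ev(p + q, a) == ev(p, a) + ev(q, a)
assert ev(r * p, a) == r * ev(p, a)
```

Because addition of coefficient vectors matches addition of polynomials, the linearity of Ea reduces to ordinary arithmetic.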
SECTION 7.1 Examples and Elementary Properties 333
EXAMPLE 4
Show that the differentiation and integration operations on Pn are linear
transformations. More precisely,
D : Pn → Pn−1 where D[p(x)] = p′(x) for all p(x) in Pn
I : Pn → Pn+1 where I[p(x)] = ∫₀ˣ p(t)dt for all p(x) in Pn
are linear transformations.
The linearity of D follows from the rules (p + q)′ = p′ + q′ and (rp)′ = rp′, and
the linearity of I follows from the familiar integration rules
∫₀ˣ [p(t) + q(t)]dt = ∫₀ˣ p(t)dt + ∫₀ˣ q(t)dt and ∫₀ˣ rp(t)dt = r∫₀ˣ p(t)dt
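Since D and I are linear, each is represented by a matrix once the monomial basis {1, x, …, xⁿ} is fixed. A sketch in numpy (the helper names are ours, and n = 3 is an arbitrary choice):

```python
import numpy as np

def deriv_matrix(n):
    """Matrix of D : P_n -> P_{n-1} in the basis 1, x, ..., x^n (sends x^k to k*x^(k-1))."""
    D = np.zeros((n, n + 1))
    for k in range(1, n + 1):
        D[k - 1, k] = k
    return D

def integ_matrix(n):
    """Matrix of I : P_n -> P_{n+1}, I[p] = integral of p from 0 to x (sends x^k to x^(k+1)/(k+1))."""
    M = np.zeros((n + 2, n + 1))
    for k in range(n + 1):
        M[k + 1, k] = 1.0 / (k + 1)
    return M

p = np.array([5.0, 1.0, 0.0, 2.0])            # 5 + x + 2x^3 in P3
dp = deriv_matrix(3) @ p                       # derivative: 1 + 6x^2
assert np.allclose(dp, [1.0, 0.0, 6.0])

# D(I(p)) = p: differentiating the integral recovers p
recovered = deriv_matrix(4) @ (integ_matrix(3) @ p)
assert np.allclose(recovered, p)
```

The final check is the fundamental theorem of calculus in matrix form: the composite of the two matrices is the identity on P3.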
The next theorem collects three useful properties of all linear transformations.
They can be described by saying that, in addition to preserving addition and scalar
multiplication (these are the axioms), linear transformations preserve the zero
vector, negatives, and linear combinations.
Theorem 1
Let T : V → W be a linear transformation.
1. T(0) = 0.
2. T(−v) = −T(v) for all v in V.
3. T(r1v1 + r2v2 + ⋯ + rkvk) = r1T(v1) + r2T(v2) + ⋯ + rkT(vk) for all vi in V and all ri in ℝ.
PROOF
1. T(0) = T(0v) = 0T(v) = 0 for any v in V.
2. T(-v) = T [(-1)v] = (-1)T(v) = -T(v) for any v in V.
3. The proof of Theorem 1 Section 2.6 goes through.
The ability to use the last part of Theorem 1 effectively is vital to obtaining the
benefits of linear transformations. Example 5 and Theorem 2 provide illustrations.
EXAMPLE 5
Let T : V → W be a linear transformation. If T(v - 3v1) = w and
T(2v - v1) = w1, find T(v) and T(v1) in terms of w and w1.
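By linearity the two given equations read T(v) − 3T(v1) = w and 2T(v) − T(v1) = w1, a 2 × 2 linear system in the unknowns T(v) and T(v1). A numerical sketch (the vectors w and w1 are made-up sample data, and placing the images in ℝ² is our own assumption):

```python
import numpy as np

# Coefficient matrix of  T(v) - 3T(v1) = w,  2T(v) - T(v1) = w1
M = np.array([[1.0, -3.0],
              [2.0, -1.0]])

w = np.array([1.0, 4.0])    # hypothetical sample data
w1 = np.array([0.0, 2.0])

# Solve M @ X = [w; w1]; the rows of X are T(v) and T(v1)
X = np.linalg.solve(M, np.vstack([w, w1]))
Tv, Tv1 = X[0], X[1]

# The general answer: T(v) = (3*w1 - w)/5 and T(v1) = (w1 - 2*w)/5
assert np.allclose(Tv, (3 * w1 - w) / 5)
assert np.allclose(Tv1, (w1 - 2 * w) / 5)
```

Substituting these formulas back into the two equations confirms them: T(v) − 3T(v1) = (3w1 − w − 3w1 + 6w)/5 = w.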
Theorem 2
Let T : V → W and S : V → W be two linear transformations, and suppose that
V = span{v1, v2, …, vn}. If T(vi) = S(vi) for each i, then T = S.
PROOF
If v is any vector in V = span{v1, v2, …, vn}, write v = a1v1 + a2v2 + ⋯ + anvn
where each ai is in ℝ. Since T(vi) = S(vi) for each i, Theorem 1 gives
T(v) = T(a1v1 + a2v2 + ⋯ + anvn)
     = a1T(v1) + a2T(v2) + ⋯ + anT(vn)
     = a1S(v1) + a2S(v2) + ⋯ + anS(vn)
     = S(a1v1 + a2v2 + ⋯ + anvn)
     = S(v).
Since v was arbitrary in V, this shows that T = S.
EXAMPLE 6
Let V = span{v1, …, vn}. Let T : V → W be a linear transformation. If
T(v1) = = T(vn) = 0, show that T = 0, the zero transformation from
V to W.
In other words, Theorem 2 shows that the action of a linear transformation on a spanning set of V determines its action on every vector in V. If the spanning set is a basis, we can say much more.
Theorem 3
Let V and W be vector spaces and let {b1, b2, …, bn} be a basis of V. Given any
vectors w1, w2, …, wn in W (they need not be distinct), there exists a unique linear
transformation T : V → W satisfying T(bi) = wi for each i = 1, 2, …, n. In fact, the
action of T is as follows:
Given v = v1b1 + v2b2 + ⋯ + vnbn in V, vi in ℝ, then
T(v) = T(v1b1 + v2b2 + ⋯ + vnbn) = v1w1 + v2w2 + ⋯ + vnwn.
PROOF
If a transformation T does exist with T(bi) = wi for each i, and if S is any other
such transformation, then T(bi) = wi = S(bi) holds for each i, so S = T by
Theorem 2. Hence T is unique if it exists, and it remains to show that there
really is such a linear transformation. Given v in V, we must specify T(v) in
W. Because {b1, …, bn} is a basis of V, we have v = v1b1 + ⋯ + vnbn, where
v1, …, vn are uniquely determined by v (this is Theorem 1 Section 6.3). Hence
we may define T : V → W by
T(v) = T(v1b1 + v2b2 + ⋯ + vnbn) = v1w1 + v2w2 + ⋯ + vnwn
for all v = v1b1 + ⋯ + vnbn in V. This satisfies T(bi) = wi for each i; the
verification that T is linear is left to the reader.
This theorem shows that linear transformations can be defined almost at will:
Simply specify where the basis vectors go, and the rest of the action is dictated
by the linearity. Moreover, Theorem 2 shows that deciding whether two linear
transformations are equal comes down to determining whether they have the same
effect on the basis vectors. So, given a basis {b1, …, bn} of a vector space V, there is a
different linear transformation V → W for every ordered selection w1, w2, …, wn of
vectors in W (not necessarily distinct).
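For V = W = ℝⁿ, Theorem 3 is concrete: put the basis vectors bi as the columns of a matrix B and the chosen images wi as the columns of a matrix W; then T is multiplication by A = W B⁻¹, since A bi = wi. The basis and images below are arbitrary sample data of our own:

```python
import numpy as np

# A basis of R^2 (as columns) and arbitrarily chosen images (as columns)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
W = np.array([[2.0, 0.0],
              [3.0, -1.0]])

A = W @ np.linalg.inv(B)   # the unique linear T with T(b_i) = w_i

# Check T(b_i) = w_i for every basis vector at once
assert np.allclose(A @ B, W)

# And T(v1*b1 + v2*b2) = v1*w1 + v2*w2, as Theorem 3 prescribes
v1, v2 = 2.0, -1.0
v = v1 * B[:, 0] + v2 * B[:, 1]
assert np.allclose(A @ v, v1 * W[:, 0] + v2 * W[:, 1])
```

Uniqueness shows up here too: once B is fixed, A = W B⁻¹ is the only matrix satisfying A B = W.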
EXAMPLE 7
Find a linear transformation T : P2 → M22 taking prescribed values on the basis {1, x, x²} of P2. [The matrix displays in this example could not be recovered.]

EXERCISES 7.1

[The matrix data of the opening exercises could not be recovered. One of them asks: if T : ℝ² → ℝ² is linear and its values on two given vectors are known, find its value on a third given vector.]

10. Let {e1, …, en} be a basis of ℝⁿ. Given k, 1 ≤ k ≤ n, define Pk : ℝⁿ → ℝⁿ by Pk(r1e1 + ⋯ + rnen) = rkek. Show that Pk is a linear transformation for each k.
11. Let S : V → W and T : V → W be linear transformations. Given a in ℝ, define functions (S + T) : V → W and (aT) : V → W by
 (S + T)(v) = S(v) + T(v) and (aT)(v) = aT(v)
 for all v in V. Show that S + T and aT are linear transformations.

12. Describe all linear transformations T : ℝ → V.

13. Let V and W be vector spaces, let V be finite dimensional, and let v ≠ 0 in V. Given any w in W, show that there exists a linear transformation T : V → W with T(v) = w. [Hint: Theorem 1(2) Section 6.4 and Theorem 3.]

14. Given y in ℝⁿ, define Sy : ℝⁿ → ℝ by Sy(x) = x · y for all x in ℝⁿ (where · is the dot product introduced in Section 5.3).
 (a) Show that Sy : ℝⁿ → ℝ is a linear transformation for any y in ℝⁿ.
 (b) Show that every linear transformation T : ℝⁿ → ℝ arises in this way; that is, T = Sy for some y in ℝⁿ. [Hint: If {e1, …, en} is the standard basis of ℝⁿ, write Sy(ei) = yi for each i. Use Theorem 1.]

15. Let T : V → W be a linear transformation.
 (a) If U is a subspace of V, show that T(U) = {T(u) | u in U} is a subspace of W (called the image of U under T).

19. If a and b are real numbers, define Ta,b : ℂ → ℂ by Ta,b(r + si) = ra + sbi for all r + si in ℂ.
 (a) Show that Ta,b is linear and that Ta,b(z̄) is the conjugate of Ta,b(z) for all z in ℂ. (Here z̄ denotes the conjugate of z.)
 (b) If T : ℂ → ℂ is linear and T(z̄) is the conjugate of T(z) for all z in ℂ, show that T = Ta,b for some real a and b.

20. Show that the following conditions are equivalent for a linear transformation T : M22 → M22.
 (1) tr[T(A)] = tr A for all A in M22.
 (2) T([r11 r12; r21 r22]) = r11B11 + r12B12 + r21B21 + r22B22 for matrices Bij such that tr B11 = 1 = tr B22 and tr B12 = 0 = tr B21.

21. Given a in ℝ, consider the evaluation map Ea : Pn → ℝ defined in Example 3.
 (a) Show that Ea is a linear transformation satisfying the additional condition that Ea(xᵏ) = [Ea(x)]ᵏ holds for all k = 0, 1, 2, …. [Note: x⁰ = 1.]
 (b) If T : Pn → ℝ is a linear transformation satisfying T(xᵏ) = [T(x)]ᵏ for all k = 0, 1, 2, …, show that T = Ea for some a in ℝ.
Definition 7.2 If T : V → W is a linear transformation, the kernel of T (denoted ker T) and the
image of T (denoted im T or T(V)) are defined by
ker T = {v in V | T(v) = 0}
im T = {T(v) | v in V} = T(V)
EXAMPLE 1
Given an m × n matrix A, the matrix transformation TA : ℝⁿ → ℝᵐ is defined by
TA(x) = Ax for all columns x in ℝⁿ (see Section 2.6).
Theorem 1
Let T : V → W be a linear transformation.
1. ker T is a subspace of V.
2. im T is a subspace of W.
PROOF
The fact that T(0) = 0 shows that ker T and im T contain the zero vector of V
and W respectively.
1. If v and v1 lie in ker T, then T(v) = 0 = T(v1), so
T(v + v1) = T(v) + T(v1) = 0 + 0 = 0
T(rv) = rT(v) = r0 = 0 for all r in ℝ
Hence v + v1 and rv lie in ker T (they satisfy the required condition), so
ker T is a subspace of V by the subspace test (Theorem 1 Section 6.2).
2. If w and w1 lie in im T, write w = T(v) and w1 = T(v1) where v, v1 ∈ V. Then
w + w1 = T(v) + T(v1) = T(v + v1)
rw = rT(v) = T(rv) for all r in ℝ
Hence w + w1 and rw both lie in im T (they have the required form), so
im T is a subspace of W.
SECTION 7.2 Kernel and Image of a Linear Transformation 339
The rank of a matrix A was defined earlier to be the dimension of col A, the column
space of A. The two usages of the word rank are consistent in the following sense.
Recall the definition of TA in Example 1.
EXAMPLE 2
Given an m × n matrix A, show that im TA = col A, so rank TA = rank A.
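Example 2 can be checked numerically: since im TA = col A, the rank of the transformation TA is just the matrix rank of A. The matrix below is our own sample:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0]])

# rank T_A = rank A, computed as the matrix rank
r = np.linalg.matrix_rank(A)
assert r == 2   # column 2 = 2 * column 1, so only two columns are independent

# every image vector A @ x is a linear combination of the columns of A
x = np.array([3.0, -1.0, 2.0])
combo = 3.0 * A[:, 0] - 1.0 * A[:, 1] + 2.0 * A[:, 2]
assert np.allclose(A @ x, combo)
```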
EXAMPLE 3
Define a transformation P : Mnn → Mnn by P(A) = A − Aᵀ for all A in Mnn.
Show that P is linear and that:
(a) ker P consists of all symmetric matrices.
(b) im P consists of all skew-symmetric matrices.
Solution ► The verification that P is linear is left to the reader. To prove part
(a), note that a matrix A lies in ker P just when 0 = P(A) = A − Aᵀ, and this
occurs if and only if A = Aᵀ; that is, A is symmetric. Turning to part (b),
the space im P consists of all matrices P(A), A in Mnn. Every such matrix is
skew-symmetric because
P(A)ᵀ = (A − Aᵀ)ᵀ = Aᵀ − A = −P(A)
On the other hand, if S is skew-symmetric (that is, Sᵀ = −S), then S lies in
im P. In fact,
P[½S] = ½S − [½S]ᵀ = ½(S − Sᵀ) = ½(S + S) = S.
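Both parts of this example are easy to test numerically; the random matrices below are our own sample data:

```python
import numpy as np

rng = np.random.default_rng(0)

def P(A):
    """P(A) = A - A^T."""
    return A - A.T

# (a) every symmetric matrix lies in ker P
S = rng.random((3, 3))
S = S + S.T                      # make S symmetric
assert np.allclose(P(S), 0)

# (b) every value of P is skew-symmetric ...
A = rng.random((3, 3))
assert np.allclose(P(A).T, -P(A))

# ... and every skew-symmetric K is attained: P(K/2) = K
K = A - A.T
assert np.allclose(P(K / 2), K)
```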
Theorem 2
Let T : V → W be a linear transformation. Then T is one-to-one if and only if
ker T = {0}.
PROOF
If T is one-to-one, let v be any vector in ker T. Then T(v) = 0, so T(v) = T(0).
Hence v = 0 because T is one-to-one. Hence ker T = {0}.
Conversely, assume that ker T = {0} and let T(v) = T(v1) with v and v1 in V.
Then T(v - v1) = T(v) - T(v1) = 0, so v - v1 lies in ker T = {0}. This means
that v - v1 = 0, so v = v1, proving that T is one-to-one.
EXAMPLE 4
The identity transformation 1V : V → V is both one-to-one and onto for any
vector space V.
EXAMPLE 5
Consider the linear transformations
S : ℝ³ → ℝ² given by S(x, y, z) = (x + y, x − y)
T : ℝ² → ℝ³ given by T(x, y) = (x + y, x − y, x)
Show that T is one-to-one but not onto, whereas S is onto but not one-to-one.
EXAMPLE 6
Let U be an invertible m × m matrix and define
T : Mmn → Mmn by T(X) = UX for all X in Mmn
Show that T is a linear transformation that is both one-to-one and onto.
Solution ► The verification that T is linear is left to the reader. To see that T
is one-to-one, let T(X) = 0. Then UX = 0, so left-multiplication by U⁻¹ gives
X = 0. Hence ker T = {0}, so T is one-to-one. Finally, if Y is any member of
Mmn, then U⁻¹Y lies in Mmn too, and T(U⁻¹Y) = U(U⁻¹Y) = Y. This shows
that T is onto.
Theorem 3
Let A be an m × n matrix, and let TA : ℝⁿ → ℝᵐ be the matrix transformation given by
TA(x) = Ax for all x in ℝⁿ.
1. TA is onto if and only if rank A = m.
2. TA is one-to-one if and only if rank A = n.
PROOF
1. We have that im TA is the column space of A (see Example 2), so TA is onto
if and only if the column space of A is ℝᵐ. Because the rank of A is the
dimension of the column space, this holds if and only if rank A = m.
2. ker TA = {x in ℝⁿ | Ax = 0}, so (using Theorem 2) TA is one-to-one if and
only if Ax = 0 implies x = 0. This is equivalent to rank A = n by Theorem 3
Section 5.4.
Theorem 4
Dimension Theorem
Let T : V → W be any linear transformation and assume that ker T and im T are both
finite dimensional. Then V is also finite dimensional and
dim V = dim(ker T ) + dim(im T )
In other words, dim V = nullity(T ) + rank(T ).
PROOF
Every vector in im T = T(V ) has the form T(v) for some v in V. Hence let
{T(e1), T(e2), …, T(er)} be a basis of im T, where the ei lie in V. Let {f1, f2, …, fk}
be any basis of ker T. Then dim(im T ) = r and dim(ker T ) = k, so it suffices to
show that B = {e1, …, er, f1, …, fk} is a basis of V.
1. B spans V. If v lies in V, then T(v) lies in im T, so
T(v) = t1T(e1) + t2T(e2) + ⋯ + trT(er), ti in ℝ
This implies that v − t1e1 − t2e2 − ⋯ − trer lies in ker T and so is a linear
combination of f1, …, fk. Hence v is a linear combination of the vectors in B.
2. B is linearly independent. Suppose that ti and sj in ℝ satisfy
t1e1 + ⋯ + trer + s1f1 + ⋯ + skfk = 0 (∗)
Applying T and using T(fj) = 0 for each j gives t1T(e1) + ⋯ + trT(er) = 0,
so each ti = 0 because {T(e1), …, T(er)} is independent. Then (∗) becomes
s1f1 + ⋯ + skfk = 0, so each sj = 0 because {f1, …, fk} is independent.
Hence B is linearly independent.
Note that the vector space V is not assumed to be finite dimensional in Theorem 4.
In fact, verifying that ker T and im T are both finite dimensional is often an
important way to prove that V is finite dimensional.
Note further that r + k = n in the proof so, after relabelling, we end up with
a basis
B = {e1, e2, …, er, er+1, …, en}
of V with the property that {er+1, …, en} is a basis of ker T and {T(e1), …, T(er)} is
a basis of im T. In fact, if V is known in advance to be finite dimensional, then any
basis {er+1, …, en} of ker T can be extended to a basis {e1, e2, …, er, er+1, …, en}
of V by Theorem 1 Section 6.4. Moreover, it turns out that, no matter how this is
done, the vectors {T(e1), …, T(er)} will be a basis of im T. This result is useful, and
we record it for reference. The proof is much like that of Theorem 4 and is left as
Exercise 26.
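For a matrix transformation TA : ℝⁿ → ℝᵐ the dimension theorem says n = rank A + dim(null A), so the nullity can be read off once the rank is known. A sketch with a sample matrix of our own:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [0.0, 1.0, 1.0, 0.0]])
m, n = A.shape

rank = np.linalg.matrix_rank(A)   # dim(im T_A)
nullity = n - rank                # dim(ker T_A), forced by the dimension theorem

# Independent check: the rank is the number of nonzero singular values of A
s = np.linalg.svd(A, compute_uv=False)
rank_from_svd = int(np.sum(s > 1e-10))
assert rank == rank_from_svd == 2   # row 2 = 2 * row 1, rows 1 and 3 independent
assert nullity == 2                 # 4 = dim R^4 = 2 + 2
```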
Theorem 5
Let T : V → W be a linear transformation, and let {e1, …, er, er+1, …, en} be a basis of
V such that {er+1, …, en} is a basis of ker T. Then {T(e1), …, T(er)} is a basis of im T,
and hence r = rank T.
The dimension theorem is one of the most useful results in all of linear algebra.
It shows that if either dim(ker T ) or dim(im T ) can be found, then the other is
automatically known. In many cases it is easier to compute one than the other, so
the theorem is a real asset. The rest of this section is devoted to illustrations of this
fact. The next example uses the dimension theorem to give a different proof of the
first part of Theorem 2 Section 5.4.
EXAMPLE 7
Let A be an m × n matrix of rank r. Show that the space null A of all solutions
of the system Ax = 0 of m homogeneous equations in n variables has dimension
n - r.
EXAMPLE 8
If T : V → W is a linear transformation where V is finite dimensional, then
dim(ker T ) ≤ dim V and dim(im T ) ≤ dim V
Indeed, dim V = dim(ker T ) + dim(im T ) by Theorem 4. Of course, the first
inequality also follows because ker T is a subspace of V.
EXAMPLE 9
Let D : Pn → Pn−1 be the differentiation map defined by D[p(x)] = p′(x).
Compute ker D and hence conclude that D is onto.
Of course it is not difficult to verify directly that each polynomial q(x) in Pn-1 is
the derivative of some polynomial in Pn (simply integrate q(x)!), so the dimension
theorem is not needed in this case. However, in some situations it is difficult to see
directly that a linear transformation is onto, and the method used in Example 9 may
be by far the easiest way to prove it. Here is another illustration.
EXAMPLE 10
Given a in ℝ, the evaluation map Ea : Pn → ℝ is given by Ea[p(x)] = p(a).
Show that Ea is linear and onto, and hence conclude that
{(x − a), (x − a)², …, (x − a)ⁿ} is a basis of ker Ea, the subspace of all
polynomials p(x) for which p(a) = 0.
EXAMPLE 11
If A is any m × n matrix, show that rank A = rank AᵀA = rank AAᵀ.
Solution ► It suffices to show that rank A = rank AᵀA (the rest follows by
replacing A with Aᵀ). Write B = AᵀA, and consider the associated matrix
transformations
TA : ℝⁿ → ℝᵐ and TB : ℝⁿ → ℝⁿ
The dimension theorem and Example 2 give
rank A = rank TA = dim(im TA) = n − dim(ker TA)
rank B = rank TB = dim(im TB) = n − dim(ker TB)
so it suffices to show that ker TA = ker TB. Now Ax = 0 implies that
Bx = AᵀAx = 0, so ker TA is contained in ker TB. On the other hand,
if Bx = 0, then AᵀAx = 0, so
‖Ax‖² = (Ax)ᵀ(Ax) = xᵀAᵀAx = xᵀ0 = 0
This implies that Ax = 0, so ker TB is contained in ker TA.
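Example 11 is easy to confirm numerically for a random matrix (the shapes below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 3)) @ rng.random((3, 7))   # a 5 x 7 matrix of rank at most 3

r = np.linalg.matrix_rank(A)
# rank A = rank A^T A = rank A A^T
assert r == np.linalg.matrix_rank(A.T @ A)
assert r == np.linalg.matrix_rank(A @ A.T)
```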
EXERCISES 7.2

1. For each matrix A, find a basis for the kernel and image of TA, and find the rank and nullity of TA.
 (a) A = [1 2 −1 1; 3 1 0 2; 1 −3 2 0]
 (b) A = [2 1 −1 3; 1 0 3 1; 1 1 −4 2]
 (c) A = [1 2 −1; 3 1 2; 4 −1 5; 0 2 −2]
 (d) A = [2 1 0; 1 −1 3; 1 2 −3; 0 3 −6]

2. In each case, (i) find a basis of ker T, and (ii) find a basis of im T. You may assume that T is linear.
 (a) T : P2 → ℝ²; T(a + bx + cx²) = (a, b)
 (b) T : P2 → ℝ²; T(p(x)) = (p(0), p(1))
 (c) T : ℝ³ → ℝ³; T(x, y, z) = (x + y, x + y, 0)
 (d) T : ℝ³ → ℝ⁴; T(x, y, z) = (x, x, y, y)
 (e) T : M22 → M22; T([a b; c d]) = [a+b b+c; c+d d+a]
 (f) T : M22 → ℝ; T([a b; c d]) = a + d
 (g) T : Pn → ℝ; T(r0 + r1x + ⋯ + rnxⁿ) = rn
 (h) T : ℝⁿ → ℝ; T(r1, r2, …, rn) = r1 + r2 + ⋯ + rn
 (i) T : M22 → M22; T(X) = XA − AX, where A = [0 1; 1 0]
 (j) T : M22 → M22; T(X) = XA, where A = [1 1; 0 0]

3. Let P : V → ℝ and Q : V → ℝ be linear transformations, where V is a vector space. Define T : V → ℝ² by T(v) = (P(v), Q(v)).
 (a) Show that T is a linear transformation.
 (b) Show that ker T = ker P ∩ ker Q, the set of vectors in both ker P and ker Q.

4. In each case, find a basis B = {e1, …, er, er+1, …, en} of V such that {er+1, …, en} is a basis of ker T, and verify Theorem 5.
 (a) T : ℝ³ → ℝ⁴; T(x, y, z) = (x − y + 2z, x + y − z, 2x + z, 2y − 3z)
 (b) T : ℝ³ → ℝ⁴; T(x, y, z) = (x + y + z, 2x − y + 3z, z − 3y, 3x + 4z)

5. Show that every matrix X in Mnn has the form X = Aᵀ − 2A for some matrix A in Mnn. [Hint: The dimension theorem.]

6. In each case either prove the statement or give an example in which it is false. Throughout, let T : V → W be a linear transformation where V and W are finite dimensional.
 (a) If V = W, then ker T ⊆ im T.
 (b) If dim V = 5, dim W = 3, and dim(ker T) = 2, then T is onto.
 (c) If dim V = 5 and dim W = 4, then ker T ≠ {0}.
 (d) If ker T = V, then W = {0}.
 (e) If W = {0}, then ker T = V.
 (f) If W = V, and im T ⊆ ker T, then T = 0.
 (g) If {e1, e2, e3} is a basis of V and T(e1) = 0 = T(e2), then dim(im T) ≤ 1.
 (h) If dim(ker T) ≤ dim W, then dim W ≥ ½ dim V.
 (i) If T is one-to-one, then dim V ≤ dim W.
 (j) If dim V ≤ dim W, then T is one-to-one.
 (k) If T is onto, then dim V ≥ dim W.
 (l) If dim V ≥ dim W, then T is onto.
 (m) If {T(v1), …, T(vk)} is independent, then {v1, …, vk} is independent.
 (n) If {v1, …, vk} spans V, then {T(v1), …, T(vk)} spans W.

7. Show that linear independence is preserved by one-to-one transformations and that spanning sets are preserved by onto transformations. More precisely, if T : V → W is a linear transformation, show that:
 (a) If T is one-to-one and {v1, …, vn} is independent in V, then {T(v1), …, T(vn)} is independent in W.
 (b) If T is onto and V = span{v1, …, vn}, then W = span{T(v1), …, T(vn)}.

8. Given {v1, …, vn} in a vector space V, define T : ℝⁿ → V by T(r1, …, rn) = r1v1 + ⋯ + rnvn. Show that T is linear, and that:
 (a) T is one-to-one if and only if {v1, …, vn} is independent.
 (b) T is onto if and only if V = span{v1, …, vn}.

9. Let T : V → V be a linear transformation where V is finite dimensional. Show that exactly one of (i) and (ii) holds: (i) T(v) = 0 for some v ≠ 0 in V; (ii) T(x) = v has a solution x in V for every v in V.

10. Let T : Mnn → ℝ denote the trace map: T(A) = tr A for all A in Mnn. Show that dim(ker T) = n² − 1.

11. Show that the following are equivalent for a linear transformation T : V → W.
 (a) ker T = V (b) im T = {0} (c) T = 0

12. Let A and B be m × n and k × n matrices, respectively. Assume that Ax = 0 implies Bx = 0 for every n-column x. Show that rank A ≥ rank B. [Hint: Theorem 4.]
13. Let A be an m × n matrix of rank r. Thinking of ℝⁿ as rows, define V = {x in ℝᵐ | xA = 0}. Show that dim V = m − r.

14. Consider V = { [a b; c d] | a + c = b + d }.
 (a) Consider S : M22 → ℝ with S([a b; c d]) = a + c − b − d. Show that S is linear and onto and that V is a subspace of M22. Compute dim V.
 (b) Consider T : V → ℝ with T([a b; c d]) = a + c. Show that T is linear and onto, and use this information to compute dim(ker T).

15. Define T : Pn → ℝ by T[p(x)] = the sum of all the coefficients of p(x).
 (a) Use the dimension theorem to show that dim(ker T) = n.
 (b) Conclude that {x − 1, x² − 1, …, xⁿ − 1} is a basis of ker T.

16. Use the dimension theorem to prove Theorem 1 Section 1.3: If A is an m × n matrix with m < n, the system Ax = 0 of m homogeneous equations in n variables always has a nontrivial solution.

17. Let B be an n × n matrix, and consider the subspaces U = {A | A in Mmn, AB = 0} and V = {AB | A in Mmn}. Show that dim U + dim V = mn.

18. Let U and V denote, respectively, the spaces of even and odd polynomials in Pn. Show that dim U + dim V = n + 1. [Hint: Consider T : Pn → Pn where T[p(x)] = p(x) − p(−x).]

19. Show that every polynomial f(x) in Pn−1 can be written as f(x) = p(x + 1) − p(x) for some polynomial p(x) in Pn. [Hint: Define T : Pn → Pn−1 by T[p(x)] = p(x + 1) − p(x).]

20. Let U and V denote the spaces of symmetric and skew-symmetric n × n matrices. Show that dim U + dim V = n².

21. Assume that B in Mnn satisfies Bᵏ = 0 for some k ≥ 1. Show that every matrix in Mnn has the form BA − A for some A in Mnn. [Hint: Show that T : Mnn → Mnn is linear and one-to-one where T(A) = BA − A for each A.]

22. Fix a column y ≠ 0 in ℝⁿ and let U = {A in Mnn | Ay = 0}. Show that dim U = n(n − 1).

23. If B in Mmn has rank r, let U = {A in Mnn | BA = 0} and W = {BA | A in Mnn}. Show that dim U = n(n − r) and dim W = nr. [Hint: Show that U consists of all matrices A whose columns are in the null space of B. Use Example 7.]

24. Let T : V → V be a linear transformation where dim V = n. If ker T ∩ im T = {0}, show that every vector v in V can be written v = u + w for some u in ker T and w in im T. [Hint: Choose bases B ⊆ ker T and D ⊆ im T, and use Exercise 33 Section 6.3.]

25. Let T : ℝⁿ → ℝⁿ be a linear operator of rank 1, where ℝⁿ is written as rows. Show that there exist numbers a1, a2, …, an and b1, b2, …, bn such that T(X) = XA for all rows X in ℝⁿ, where
 A = [a1b1 a1b2 ⋯ a1bn; a2b1 a2b2 ⋯ a2bn; ⋯ ; anb1 anb2 ⋯ anbn]
 [Hint: im T = ℝw for w = (b1, …, bn) in ℝⁿ.]

26. Prove Theorem 5.

27. Let T : V → ℝ be a nonzero linear transformation, where dim V = n. Show that there is a basis {e1, …, en} of V such that T(r1e1 + r2e2 + ⋯ + rnen) = r1.

28. Let f ≠ 0 be a fixed polynomial of degree m ≥ 1. If p is any polynomial, recall that (p ∘ f)(x) = p[f(x)]. Define Tf : Pn → Pn+m by Tf(p) = p ∘ f.
 (a) Show that Tf is linear.
 (b) Show that Tf is one-to-one.

29. Let U be a subspace of a finite dimensional vector space V.
 (a) Show that U = ker T for some linear operator T : V → V.
 (b) Show that U = im S for some linear operator S : V → V. [Hint: Theorems 1 Section 6.4 and 3 Section 7.1.]
SECTION 7.3 Isomorphisms and Composition 347
30. Let V and W be finite dimensional vector spaces.
 (a) Show that dim W ≤ dim V if and only if there exists an onto linear transformation T : V → W. [Hint: Theorems 1 Section 6.4 and 3 Section 7.1.]
 (b) Show that dim W ≥ dim V if and only if there exists a one-to-one linear transformation T : V → W. [Hint: Theorems 1 Section 6.4 and 3 Section 7.1.]
EXAMPLE 1
The identity transformation 1V : V → V is an isomorphism for any vector
space V.
EXAMPLE 2
If T : Mmn → Mnm is defined by T(A) = Aᵀ for all A in Mmn, then T is an
isomorphism (verify). Hence Mmn ≅ Mnm.

EXAMPLE 3
Isomorphic spaces can “look” quite different. For example, M22 ≅ P3 because
the map T : M22 → P3 given by T([a b; c d]) = a + bx + cx² + dx³ is an isomorphism
(verify).
The word isomorphism comes from two Greek roots: iso, meaning “same,” and
morphos, meaning “form.” An isomorphism T : V → W induces a pairing
v ↔ T(v)
between vectors v in V and vectors T(v) in W that preserves vector addition and
scalar multiplication. Hence, as far as their vector space properties are concerned, the
spaces V and W are identical except for notation. Because addition and scalar
multiplication in either space are completely determined by the same operations in
the other space, all vector space properties of either space are completely determined
by those of the other.
One of the most important examples of isomorphic spaces was considered in
Chapter 4. Let A denote the set of all “arrows” with tail at the origin in space, and
make A into a vector space using the parallelogram law and the scalar multiple law
(see Section 4.1). Then define a transformation T : ℝ³ → A by taking T[x y z]ᵀ to be
the arrow from the origin to the point P(x, y, z).
In Section 4.1 matrix addition and scalar multiplication were shown to correspond
to the parallelogram law and the scalar multiplication law for these arrows, so the
map T is a linear transformation. Moreover T is an isomorphism: it is one-to-one
by Theorem 2 Section 4.1, and it is onto because, given an arrow v in A with tip
P(x, y, z), we have v = T[x y z]ᵀ. Hence the isomorphism T identifies
the geometric arrows with the algebraic matrices. This identification is very useful.
The arrows give a “picture” of the matrices and so bring geometric intuition into
ℝ³; the matrices are useful for detailed calculations and so bring analytic precision
into geometry. This is one of the best examples of the power of an isomorphism to
shed light on both spaces being considered.
The following theorem gives a very useful characterization of isomorphisms:
They are the linear transformations that preserve bases.
Theorem 1
If V and W are finite dimensional spaces, the following conditions are equivalent for a
linear transformation T : V → W.
1. T is an isomorphism.
2. If {e1, e2, …, en} is any basis of V, then {T(e1), T(e2), …, T(en)} is a basis of W.
3. There exists a basis {e1, e2, …, en} of V such that {T(e1), T(e2), …, T(en)} is a
basis of W.
PROOF
(1) ⇒ (2). Let {e1, …, en} be a basis of V. If t1T(e1) + ⋯ + tnT(en) = 0 with ti in
ℝ, then T(t1e1 + ⋯ + tnen) = 0, so t1e1 + ⋯ + tnen = 0 (because ker T = {0}).
But then each ti = 0 by the independence of the ei, so {T(e1), …, T(en)} is
independent. To show that it spans W, choose w in W. Because T is onto,
w = T(v) for some v in V, so write v = t1e1 + ⋯ + tnen. Then
w = T(v) = t1T(e1) + ⋯ + tnT(en), proving that {T(e1), …, T(en)} spans W.
Theorem 1 dovetails nicely with Theorem 3 Section 7.1 as follows. Let V and W
be vector spaces of dimension n, and suppose that {e1, e2, …, en} and {f1, f2, …, fn}
are bases of V and W, respectively. Theorem 3 Section 7.1 asserts that there exists a
linear transformation T : V → W such that
T(ei) = fi for each i = 1, 2, …, n
Then {T(e1), …, T(en)} is evidently a basis of W, so T is an isomorphism by
Theorem 1. Furthermore, the action of T is prescribed by
T(r1e1 + ⋯ + rnen) = r1f1 + ⋯ + rnfn
so isomorphisms between spaces of equal dimension can be easily defined as soon
as bases are known. In particular, this shows that if two vector spaces V and W have
the same dimension then they are isomorphic, that is, V ≅ W. This is half of the
following theorem.
Theorem 2
If V and W are finite dimensional vector spaces, then V ≅ W if and only if
dim V = dim W.
PROOF
It remains to show that if V ≅ W then dim V = dim W. But if V ≅ W, then
there exists an isomorphism T : V → W. Since V is finite dimensional, let
{e1, …, en} be a basis of V. Then {T(e1), …, T(en)} is a basis of W by Theorem 1,
so dim W = n = dim V.
Corollary 1
The relation ≅ is reflexive, symmetric, and transitive: V ≅ V for every vector space V;
if V ≅ W then W ≅ V; and if U ≅ V and V ≅ W then U ≅ W.

The proof is left to the reader. By virtue of these properties, the relation ≅ is
called an equivalence relation on the class of finite dimensional vector spaces. Since
dim(ℝⁿ) = n it follows that
Corollary 2
Every vector space V of dimension n ≥ 1 is isomorphic to ℝⁿ.

In fact, if B = {b1, b2, …, bn} is any ordered basis of V, define CB : V → ℝⁿ by
CB(v1b1 + v2b2 + ⋯ + vnbn) = (v1, v2, …, vn)ᵀ
where each vi is in ℝ. Moreover, CB(bi) = ei for each i so CB is an isomorphism
by Theorem 1, called the coordinate isomorphism corresponding to the basis B.
These isomorphisms will play a central role in Chapter 9.
The conclusion in the above corollary can be phrased as follows: As far as vector
space properties are concerned, every n-dimensional vector space V is essentially
the same as ℝⁿ; they are the “same” vector space except for a change of symbols.
This appears to make the process of abstraction seem less important: just study ℝⁿ
and be done with it! But consider the different “feel” of the spaces P8 and M33 even
though they are both the “same” as ℝ⁹: For example, vectors in P8 can have roots,
while vectors in M33 can be multiplied. So the merit in the abstraction process lies
in identifying common properties of the vector spaces in the various examples. This is
in identifying common properties of the vector spaces in the various examples. This is
important even for finite dimensional spaces. However, the payoff from abstraction
is much greater in the infinite dimensional case, particularly for spaces of functions.
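The coordinate isomorphism of Corollary 2 is easy to illustrate for P2 ≅ ℝ³: with the basis B = {1, x, x²}, the operations of P2 become ordinary vector arithmetic. The helper `CB` below is ours:

```python
import numpy as np

# C_B : P2 -> R^3 for B = {1, x, x^2}: a + bx + cx^2 corresponds to (a, b, c)
def CB(a, b, c):
    return np.array([a, b, c], dtype=float)

p = CB(1, 2, 3)     # 1 + 2x + 3x^2
q = CB(0, -1, 4)    # -x + 4x^2

# addition and scalar multiplication in P2 match those in R^3
assert np.array_equal(p + q, CB(1, 1, 7))   # (1+2x+3x^2) + (-x+4x^2) = 1 + x + 7x^2
assert np.array_equal(2 * p, CB(2, 4, 6))
```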
EXAMPLE 4
Let V denote the space of all 2 × 2 symmetric matrices. Find an isomorphism
T : P2 → V such that T(1) = I, where I is the 2 × 2 identity matrix.
The dimension theorem (Theorem 4 Section 7.2) gives the following useful
fact about isomorphisms.
Theorem 3
Let T : V → W be a linear transformation where dim V = dim W = n is finite. Then
T is an isomorphism if it is either one-to-one or onto.
PROOF
The dimension theorem asserts that dim(ker T ) + dim(im T ) = n, so
dim(ker T ) = 0 if and only if dim(im T ) = n. Thus T is one-to-one if
and only if T is onto, and the result follows.
Composition
Suppose that T : V → W and S : W → U are linear transformations. They link
together as in the diagram so, as in Section 2.3, it is possible to define a new
function V → U by first applying T and then S.

Definition 7.5 Given linear transformations V --T--> W --S--> U, the composite ST : V → U of T and S is
defined by
ST(v) = S[T(v)] for all v in V.
The operation of forming the new function ST is called composition.¹
EXAMPLE 5
Define S : M22 → M22 and T : M22 → M22 by S([a b; c d]) = [c d; a b] and
T(A) = Aᵀ for A ∈ M22. Describe the action of ST and TS, and show
that ST ≠ TS.
Solution ► ST([a b; c d]) = S([a c; b d]) = [b d; a c], whereas
TS([a b; c d]) = T([c d; a b]) = [c a; d b].
It is clear that TS([a b; c d]) need not equal ST([a b; c d]), so TS ≠ ST.
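The computation in Example 5 can be replayed in numpy, with S as a row swap and T as transposition (the sample matrix is ours):

```python
import numpy as np

def S(A):
    """Swap the rows of a 2x2 matrix: S([a b; c d]) = [c d; a b]."""
    return A[::-1, :]

def T(A):
    """Transpose: T(A) = A^T."""
    return A.T

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # a, b, c, d = 1, 2, 3, 4

ST = S(T(A))   # [b d; a c] = [[2, 4], [1, 3]]
TS = T(S(A))   # [c a; d b] = [[3, 1], [4, 2]]
assert np.array_equal(ST, np.array([[2.0, 4.0], [1.0, 3.0]]))
assert np.array_equal(TS, np.array([[3.0, 1.0], [4.0, 2.0]]))
assert not np.array_equal(ST, TS)
```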
The next theorem collects some basic properties³ of the composition operation.
1 In Section 2.3 we denoted the composite as S ◦ T. However, it is more convenient to use the simpler notation ST.
2 Actually, all that is required is U ⊆ V.
3 Theorem 4 can be expressed by saying that vector spaces and linear transformations are an example of a category. In general
a category consists of certain objects and, for any two objects X and Y, a set mor(X, Y). The elements α of mor(X, Y) are called
morphisms from X to Y and are written α : X → Y. It is assumed that identity morphisms and composition are defined in such a way
that Theorem 4 holds. Hence, in the category of vector spaces the objects are the vector spaces themselves and the morphisms are
the linear transformations. Another example is the category of metric spaces, in which the objects are sets equipped with a distance
function (called a metric), and the morphisms are continuous functions (with respect to the metric). The category of sets and
functions is a very basic example.
Theorem 4
Let V --T--> W --S--> U --R--> Z be linear transformations.
1. The composite ST is again a linear transformation.
2. T1V = T and 1WT = T.
3. (RS)T = R(ST).
PROOF
The proofs of (1) and (2) are left as Exercise 25. To prove (3), observe that, for
all v in V:
{(RS)T}(v) = (RS)[T(v)] = R{S[T(v)]} = R{(ST)(v)} = {R(ST)}(v)
Theorem 5
Let V and W be finite dimensional vector spaces. The following conditions are equivalent
for a linear transformation T : V → W.
1. T is an isomorphism.
2. There exists a linear transformation S : W → V such that ST = 1V and
TS = 1W.
PROOF
(1) ⇒ (2). If B = {e1, …, en} is a basis of V, then D = {T(e1), …, T(en)} is a basis
of W by Theorem 1. Hence (using Theorem 3 Section 7.1), define a linear
transformation S : W → V by
EXAMPLE 6
Define T : P1 → P1 by T(a + bx) = (a - b) + ax. Show that T has an inverse,
and find the action of T -1.
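One way to carry this out is to compute the matrix of T in the basis {1, x} of P1 and invert it; the inverse matrix then gives the action of T⁻¹ on coefficient vectors. A sketch:

```python
import numpy as np

# Matrix of T : P1 -> P1, T(a + bx) = (a - b) + ax, in the basis {1, x}:
# the coefficient vector (a, b) is sent to (a - b, a).
M = np.array([[1.0, -1.0],
              [1.0,  0.0]])

Minv = np.linalg.inv(M)
# M^{-1} = [[0, 1], [-1, 1]], i.e. T^{-1}(c + dx) = d + (d - c)x
assert np.allclose(Minv, np.array([[0.0, 1.0], [-1.0, 1.0]]))

# Round trip on a sample polynomial 3 + 5x
coeffs = np.array([3.0, 5.0])
assert np.allclose(Minv @ (M @ coeffs), coeffs)
```

Reading off the inverse matrix gives the action T⁻¹(c + dx) = d + (d − c)x.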
EXAMPLE 7
If B = {b1, b2, …, bn} is a basis of a vector space V, the coordinate
transformation CB : V → ℝⁿ is an isomorphism defined by
CB(v1b1 + v2b2 + ⋯ + vnbn) = (v1, v2, …, vn)ᵀ.
The way to reverse the action of CB is clear: CB⁻¹ : ℝⁿ → V is given by
CB⁻¹(v1, v2, …, vn) = v1b1 + v2b2 + ⋯ + vnbn for all vi in ℝ.
EXAMP L E 8
Define T : 3 → 3 by T(x, y, z) = (z, x, y). Show that T 3 = 13, and hence
find T -1.
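For Example 8, a quick computational check (a Python sketch, not part of the text) confirms that T cycles the coordinates, that T 3 is the identity, and hence that T -1 = T 2:

```python
def T(v):
    x, y, z = v
    return (z, x, y)      # the cyclic shift of Example 8

def T_inv(v):
    # Since T^3 = 1, the inverse is T^2.
    return T(T(v))

v = (1, 2, 3)
assert T(T(T(v))) == v                      # T^3 = identity
assert T_inv(T(v)) == v and T(T_inv(v)) == v
# Explicitly, T^2(x, y, z) = (y, z, x), so T^{-1}(x, y, z) = (y, z, x):
assert T_inv((1, 2, 3)) == (2, 3, 1)
```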
EXAMPLE 9
Define T : Pn → ℝn+1 by T(p) = (p(0), p(1), …, p(n)) for all p in Pn. Show that
T -1 exists.

Solution ► The verification that T is linear is left to the reader. If T(p) = 0, then
p(k) = 0 for k = 0, 1, …, n, so p has n + 1 distinct roots. Because p has degree
at most n, this implies that p = 0 is the zero polynomial (Theorem 4 Section
6.5) and hence that T is one-to-one. But dim Pn = n + 1 = dim ℝn+1, so this
means that T is also onto and hence is an isomorphism. Thus T -1 exists by
Theorem 5. Note that we have not given a description of the action of T -1; we
have merely shown that such a description exists. To give it explicitly requires
some ingenuity; one method involves the Lagrange interpolation expansion
(Theorem 3 Section 6.5).
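The Lagrange interpolation method mentioned at the end of Example 9 can be sketched computationally. The following Python snippet (illustrative, not from the text; the sample polynomial is an arbitrary choice) recovers the values of a polynomial of degree at most n from the vector (p(0), p(1), …, p(n)), which is exactly the action of T -1:

```python
from fractions import Fraction

def lagrange_eval(values, x):
    # values = [p(0), p(1), ..., p(n)] for the unique polynomial p of degree <= n;
    # evaluate that p at x via the Lagrange interpolation expansion.
    n = len(values) - 1
    total = Fraction(0)
    for k, yk in enumerate(values):
        term = Fraction(yk)
        for j in range(n + 1):
            if j != k:
                term *= Fraction(x - j, k - j)   # the basis factor (x - j)/(k - j)
        total += term
    return total

# Round trip: start from p(x) = 1 + 2x + 3x^2 (an illustrative choice),
# apply T to get (p(0), p(1), p(2)), then recover p's values anywhere.
p = lambda x: 1 + 2 * x + 3 * x * x
values = [p(0), p(1), p(2)]                  # T(p)
assert values == [1, 6, 17]
assert all(lagrange_eval(values, x) == p(x) for x in range(-3, 4))
```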
EXERCISES 7.3
1. Verify that each of the following is an isomorphism (Theorem 3 is useful).

(a) T : ℝ3 → ℝ3; T(x, y, z) = (x + y, y + z, z + x)
(b) T : ℝ3 → ℝ3; T(x, y, z) = (x, x + y, x + y + z)
(c) T : ℂ → ℂ; T(z) = z̄
(d) T : Mmn → Mmn; T(X) = UXV, U and V invertible
(e) T : P1 → ℝ2; T[p(x)] = [p(0), p(1)]
(f) T : V → V; T(v) = kv, k ≠ 0 a fixed number, V any vector space
(g) T : M22 → ℝ4; T[a b; c d] = (a + b, d, c, a - b)
(h) T : Mmn → Mnm; T(A) = AT

2. Show that {a + bx + cx2, a1 + b1x + c1x2, a2 + b2x + c2x2} is a basis of P2 if and only if {(a, b, c), (a1, b1, c1), (a2, b2, c2)} is a basis of ℝ3.

3. If V is any vector space, let V n denote the space of all n-tuples (v1, v2, …, vn), where each vi lies in V. (This is a vector space with component-wise operations; see Exercise 17 Section 6.1.) If Cj(A) denotes the jth column of the m × n matrix A, show that T : Mmn → (ℝm)n is an isomorphism if T(A) = [C1(A) C2(A) ⋯ Cn(A)]. (Here ℝm consists of columns.)

4. In each case, compute the action of ST and TS, and show that ST ≠ TS.

(a) S : ℝ2 → ℝ2 with S(x, y) = (y, x); T : ℝ2 → ℝ2 with T(x, y) = (x, 0)
(b) S : ℝ3 → ℝ3 with S(x, y, z) = (x, 0, z); T : ℝ3 → ℝ3 with T(x, y, z) = (x + y, 0, y + z)
(c) S : P2 → P2 with S(p) = p(0) + p(1)x + p(2)x2; T : P2 → P2 with T(a + bx + cx2) = b + cx + ax2
(d) S : M22 → M22 with S[a b; c d] = [a 0; 0 d]; T : M22 → M22 with T[a b; c d] = [c a; d b]

5. In each case, show that the linear transformation T satisfies T 2 = T.

(a) T : ℝ4 → ℝ4; T(x, y, z, w) = (x, 0, z, 0)
(b) T : ℝ2 → ℝ2; T(x, y) = (x + y, 0)
(c) T : P2 → P2; T(a + bx + cx2) = (a + b - c) + cx + cx2
(d) T : M22 → M22; T[a b; c d] = (1/2)[a+c b+d; a+c b+d]

6. Determine whether each of the following transformations T has an inverse and, if so, determine the action of T -1.

(a) T : ℝ3 → ℝ3; T(x, y, z) = (x + y, y + z, z + x)
SECTION 7.3 Isomorphisms and Composition 355
(a) T : Pn → Pn is given by T[p(x)] = p(x + 1).
(b) T : Mnn → Mnn is given by T(A) = UA where U is invertible in Mnn.

10. Given linear transformations T : V → W and S : W → U:

(a) If S and T are both one-to-one, show that ST is one-to-one.
(b) If S and T are both onto, show that ST is onto.

11. Let T : V → W be a linear transformation.

(a) If T is one-to-one and TR = TR1 for transformations R and R1 : U → V, show that R = R1.
(b) If T is onto and ST = S1T for transformations S and S1 : W → U, show that S = S1.

18. Let Dn denote the space of all functions f from {1, 2, …, n} to ℝ (see Exercise 35 Section 6.3). If T : Dn → ℝn is defined by T(f) = (f(1), f(2), …, f(n)), show that T is an isomorphism.

19. (a) Let V be the vector space of Exercise 3 Section 6.1. Find an isomorphism T : V → ℝ1.
(b) Let V be the vector space of Exercise 4 Section 6.1. Find an isomorphism T : V → ℝ2.

20. Let T : V → W and S : W → V be linear transformations such that ST = 1V. If dim V = dim W = n, show that S = T -1 and T = S-1. [Hint: Exercise 13 and Theorems 3, 4, and 5.]

21. Let T : V → W and S : W → V be functions such that TS = 1W and ST = 1V. If T is linear, show that S is also linear.
22. Let A and B be matrices of size p × m and n × q. Assume that mn = pq. Define R : Mmn → Mpq by R(X) = AXB.

(a) Show that Mmn ≅ Mpq by comparing dimensions.
(b) Show that R is a linear transformation.
(c) Show that if R is an isomorphism, then m = p and n = q. [Hint: Show that T : Mmn → Mpn given by T(X) = AX and S : Mmn → Mmq given by S(X) = XB are both one-to-one, and use the dimension theorem.]

23. Let T : V → V be a linear transformation such that T 2 = 0 is the zero transformation.

(a) If V ≠ {0}, show that T cannot be invertible.
(b) If R : V → V is defined by R(v) = v + T(v) for all v in V, show that R is linear and invertible.

24. Let V consist of all sequences [x0, x1, x2, …) of numbers, and define vector operations

[x0, x1, …) + [y0, y1, …) = [x0 + y0, x1 + y1, …)
r[x0, x1, …) = [rx0, rx1, …)

(a) Show that V is a vector space of infinite dimension.
(b) Define T : V → V and S : V → V by T[x0, x1, …) = [x1, x2, …) and S[x0, x1, …) = [0, x0, x1, …). Show that TS = 1V, so TS is one-to-one and onto, but that T is not one-to-one and S is not onto.

25. Prove (1) and (2) of Theorem 4.

26. Define T : Pn → Pn by T(p) = p(x) + xp′(x) for all p in Pn.

(a) Show that T is linear.
(b) Show that ker T = {0} and conclude that T is an isomorphism. [Hint: Write p(x) = a0 + a1x + ⋯ + anxn and compare coefficients if p(x) = -xp′(x).]
(c) Conclude that each q(x) in Pn has the form q(x) = p(x) + xp′(x) for some unique polynomial p(x).
(d) Does this remain valid if T is defined by T[p(x)] = p(x) - xp′(x)? Explain.

27. Let T : V → W be a linear transformation, where V and W are finite dimensional.

(a) Show that T is one-to-one if and only if there exists a linear transformation S : W → V with ST = 1V. [Hint: If {e1, …, en} is a basis of V and T is one-to-one, show that W has a basis {T(e1), …, T(en), fn+1, …, fn+k} and use Theorems 2 and 3, Section 7.1.]
(b) Show that T is onto if and only if there exists a linear transformation S : W → V with TS = 1W. [Hint: Let {e1, …, er, …, en} be a basis of V such that {er+1, …, en} is a basis of ker T. Use Theorem 5 Section 7.2 and Theorems 2 and 3, Section 7.1.]

28. Let S and T be linear transformations V → W, where dim V = n and dim W = m.

(a) Show that ker S = ker T if and only if T = RS for some isomorphism R : W → W. [Hint: Let {e1, …, er, …, en} be a basis of V such that {er+1, …, en} is a basis of ker S = ker T. Use Theorem 5 Section 7.2 to extend {S(e1), …, S(er)} and {T(e1), …, T(er)} to bases of W.]
(b) Show that im S = im T if and only if T = SR for some isomorphism R : V → V. [Hint: Show that dim(ker S) = dim(ker T) and choose bases {e1, …, er, …, en} and {f1, …, fr, …, fn} of V where {er+1, …, en} and {fr+1, …, fn} are bases of ker S and ker T, respectively. If 1 ≤ i ≤ r, show that S(ei) = T(gi) for some gi in V, and prove that {g1, …, gr, fr+1, …, fn} is a basis of V.]

29. If T : V → V is a linear transformation where dim V = n, show that TST = T for some isomorphism S : V → V. [Hint: Let {e1, …, er, er+1, …, en} be as in Theorem 5 Section 7.2. Extend {T(e1), …, T(er)} to a basis of V, and use Theorem 1 and Theorems 2 and 3, Section 7.1.]

30. Let A and B denote m × n matrices. In each case show that (1) and (2) are equivalent.

(a) (1) A and B have the same null space. (2) B = PA for some invertible m × m matrix P.
(b) (1) A and B have the same range. (2) B = AQ for some invertible n × n matrix Q.

[Hint: Use Exercise 28.]
SECTION 7.4 A Theorem about Differential Equations 357
Theorem 1
Lemma 1
PROOF
Observe first that if dimℂ(D∗n) = n, then dimℝ(D∗n) = 2n. [In fact, if {g1, …, gn} is
a ℂ-basis of D∗n, then {g1, …, gn, ig1, …, ign} is an ℝ-basis of D∗n.] Now observe that
the set Dn × Dn of all ordered pairs (f, g) with f and g in Dn is a real vector space
with componentwise operations. Define

θ : D∗n → Dn × Dn given by θ(f) = (fr, fi) for f in D∗n.

One verifies that θ is onto and one-to-one, and it is ℝ-linear because f ↦ fr and
f ↦ fi are both ℝ-linear. Hence D∗n ≅ Dn × Dn as ℝ-spaces. Since dimℝ(D∗n) is
finite, it follows that dim(Dn) is finite, and we have

2 dim(Dn) = dim(Dn × Dn) = dimℝ(D∗n) = 2n.
4 Write |w| for the absolute value of any complex number w. As for functions ℝ → ℝ, we say that lim_{t→0} f(t) = w if, for all ε > 0 there exists δ > 0 such that |f(t) - w| < ε whenever |t| < δ. (Note that t represents a real number here.) In particular, given a real number x, we define the derivative f′ of a function f : ℝ → ℂ by f′(x) = lim_{t→0} {(1/t)[f(x + t) - f(x)]}, and we say that f is differentiable if f′(x) exists for all x in ℝ. Then we can prove that f is differentiable if and only if both fr and fi are differentiable, and that f′ = fr′ + i fi′ in this case.
Lemma 2
PROOF
Lemma 3
Kernel Lemma
Let V be a vector space, and let S and T be linear operators V → V. If S is onto and both
ker(S) and ker(T ) are finite dimensional, then ker(TS) is also finite dimensional and
dim[ker(TS)] = dim[ker(T )] + dim[ker(S)].
PROOF
Let {u1, u2, …, um} be a basis of ker(T) and let {v1, v2, …, vn} be a basis of ker(S).
Since S is onto, let ui = S(wi) for some wi in V. It suffices to show that

B = {w1, …, wm, v1, …, vn}

is a basis of ker(TS). Note that B ⊆ ker(TS) because TS(wi) = T(ui) = 0 for each
i and TS(vj) = T(0) = 0 for each j.
Spanning. If v is in ker(TS), then S(v) is in ker(T), say
S(v) = ∑riui = ∑riS(wi) = S(∑riwi). It follows that v - ∑riwi is in
ker(S) = span{v1, v2, …, vn}, proving that v is in span(B).
Independence. Let ∑riwi + ∑tjvj = 0. Applying S, and noting that S(vj) = 0 for
each j, yields 0 = ∑riS(wi) = ∑riui. Hence ri = 0 for each i, and so ∑tjvj = 0.
This implies that each tj = 0, and so proves the independence of B.
PROOF OF THEOREM 1

By Lemma 1, it suffices to prove that dimℂ(D∗n) = n. This holds for n = 1
because the proof of Theorem 1 Section 3.5 goes through to show that
D∗1 = ℂe^{a0x}. Hence we proceed by induction on n. With an eye on (∗),
consider the polynomial

Since D∗n = ker[p(D)], this completes the induction, and so proves Theorem 1.
5 This is the reason for allowing our solutions to (∗) to be complex valued.
6 This section requires only Sections 7.1–7.3.
SECTION 7.5 More on Linear Recurrences 361
EXAMPLE 1

[n) is the sequence 0, 1, 2, 3, …
[n + 1) is the sequence 1, 2, 3, 4, …
[2^n) is the sequence 1, 2, 2^2, 2^3, …
[(-1)^n) is the sequence 1, -1, 1, -1, …
[5) is the sequence 5, 5, 5, 5, …
Sequences of the form [c) for a fixed number c will be referred to as constant
sequences, and those of the form [λn), λ some number, are power sequences.
Two sequences are regarded as equal when they are identical:
[xn) = [yn) means xn = yn for all n = 0, 1, 2, …
Addition and scalar multiplication of sequences are defined by
[xn) + [yn) = [xn + yn)
r[xn) = [rxn)
These operations are analogous to the addition and scalar multiplication in ℝn, and
it is easy to check that the vector-space axioms are satisfied. The zero vector is the
constant sequence [0), and the negative of a sequence [xn) is given by -[xn) = [-xn).
Now suppose k real numbers r0, r1, …, rk-1 are given, and consider the linear
recurrence relation determined by these numbers.
xn+k = r0xn + r1xn+1 + ⋯ + rk-1xn+k-1 (∗)

When r0 ≠ 0, we say this recurrence has length k. For example, the relation
xn+2 = 2xn + xn+1 is of length 2.
A sequence [xn) is said to satisfy the relation (∗) if (∗) holds for all n ≥ 0. Let V
denote the set of all sequences that satisfy the relation. In symbols,
V = {[xn) | xn+k = r0xn + r1xn+1 + ⋯ + rk-1xn+k-1 holds for all n ≥ 0}

It is easy to see that the constant sequence [0) lies in V and that V is closed under
addition and scalar multiplication of sequences. Hence V is a vector space (being a
subspace of the space of all sequences). The following important observation about
V is needed (it was used implicitly earlier): If the first k terms of two sequences
agree, then the sequences are identical. More formally,
Lemma 1
PROOF
If [xn) = [yn) then xn = yn for all n = 0, 1, 2, … . Conversely, if xi = yi for all
i = 0, 1, …, k - 1, use the recurrence (∗) for n = 0.
xk = r0x0 + r1x1 + ⋯ + rk-1xk-1 = r0y0 + r1y1 + ⋯ + rk-1yk-1 = yk
Next the recurrence for n = 1 establishes xk+1 = yk+1. The process continues to
show that xn+k = yn+k holds for all n ≥ 0 by induction on n. Hence [xn) = [yn).
7 We shall usually assume that r0 ≠ 0; otherwise, we are essentially dealing with a recurrence of shorter length than k.
Theorem 1
PROOF
(1) and (2) will follow from Theorems 1 and 2, Section 7.3 as soon as we show
that T is an isomorphism. Given v and w in ℝk, write v = (v0, v1, …, vk-1) and
w = (w0, w1, …, wk-1). The first k terms of T(v) and T(w) are v0, v1, …, vk-1 and
w0, w1, …, wk-1, respectively, so the first k terms of T(v) + T(w) are v0 + w0,
v1 + w1, …, vk-1 + wk-1. Because these terms agree with the first k terms of
T(v + w), Lemma 1 implies that T(v + w) = T(v) + T(w). The proof that
T(rv) = rT(v) is similar, so T is linear.
Now let [xn) be any sequence in V, and let v = (x0, x1, …, xk-1). Then the
first k terms of [xn) and T(v) agree, so T(v) = [xn). Hence T is onto. Finally,
if T(v) = [0) is the zero sequence, then the first k terms of T(v) are all zero
(all terms of T(v) are zero!) so v = 0. This means that ker T = {0}, so T is
one-to-one.
EXAMPLE 2
Show that the sequences [1), [n), and [(-1)n) are a basis of the space V of all
solutions of the recurrence
xn+3 = -xn + xn+1 + xn+2.
Then find the solution satisfying x0 = 1, x1 = 2, x2 = 5.
Solution ► The verifications that these sequences satisfy the recurrence (and
hence lie in V) are left to the reader. They are a basis because [1) = T(1, 1, 1),
[n) = T(0, 1, 2), and [(-1)^n) = T(1, -1, 1); and {(1, 1, 1), (0, 1, 2), (1, -1, 1)} is
a basis of ℝ3. Hence the sequence [xn) in V satisfying x0 = 1, x1 = 2, x2 = 5 is a
linear combination of this basis:
1 = x0 = t1 + 0 + t3
2 = x1 = t1 + t2 - t3
5 = x2 = t1 + 2t2 + t3
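The system above can be solved directly. A Python sketch (not from the text) finds t1 = 1/2, t2 = 2, t3 = 1/2, so xn = 1/2 + 2n + (1/2)(-1)^n, and verifies both the initial conditions and the recurrence:

```python
from fractions import Fraction

def solve(A, b):
    # Gaussian elimination with exact Fractions (fine for a small 3x3 system)
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for i in range(n):
        piv = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[piv] = M[piv], M[i]
        M[i] = [x / M[i][i] for x in M[i]]
        for r in range(n):
            if r != i:
                M[r] = [a - M[r][i] * c for a, c in zip(M[r], M[i])]
    return [row[n] for row in M]

# x_n = t1*1 + t2*n + t3*(-1)^n must match x0 = 1, x1 = 2, x2 = 5:
A = [[1, 0, 1], [1, 1, -1], [1, 2, 1]]
t1, t2, t3 = solve(A, [1, 2, 5])
assert (t1, t2, t3) == (Fraction(1, 2), 2, Fraction(1, 2))

x = lambda n: t1 + t2 * n + t3 * (-1) ** n
assert [x(0), x(1), x(2)] == [1, 2, 5]
# the recurrence x_{n+3} = -x_n + x_{n+1} + x_{n+2} holds:
assert all(x(n + 3) == -x(n) + x(n + 1) + x(n + 2) for n in range(10))
```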
This technique clearly works for any linear recurrence of length k: Simply take
your favourite basis {v1, …, vk} of ℝk—perhaps the standard basis—and compute
T(v1), …, T(vk). This is a basis of V all right, but the nth term of T(vi) is not usually
given as an explicit function of n. (The basis in Example 2 was carefully chosen so
that the nth terms of the three sequences were 1, n, and (-1)n, respectively, each a
simple function of n.)
However, it turns out that an explicit basis of V can be given in the general
situation. Given the recurrence (∗) again:
xn+k = r0xn + r1xn+1 + ⋯ + rk-1xn+k-1

the idea is to look for numbers λ such that the power sequence [λ^n) satisfies (∗).
This happens if and only if

λ^(n+k) = r0λ^n + r1λ^(n+1) + ⋯ + rk-1λ^(n+k-1)

holds for all n ≥ 0. This is true just when the case n = 0 holds; that is,

λ^k = r0 + r1λ + ⋯ + rk-1λ^(k-1)

The polynomial

p(x) = x^k - rk-1x^(k-1) - ⋯ - r1x - r0

is called the polynomial associated with the linear recurrence (∗). Thus every root λ
of p(x) provides a sequence [λ^n) satisfying (∗). If there are k distinct roots, the power
sequences provide a basis. Incidentally, if λ = 0, the sequence [λ^n) is 1, 0, 0, …; that
is, we accept the convention that 0^0 = 1.
Theorem 2
denote the vector space of all sequences satisfying the linear recurrence relation
determined by r0, r1, …, rk-1; and let
PROOF
It remains to prove (2). But [λi^n) = T(vi) where vi = (1, λi, λi^2, …, λi^(k-1)), so (2)
follows by Theorem 1, provided that {v1, v2, …, vk} is a basis of ℝk. This is true
provided that the matrix with the vi as its rows

[1  λ1  λ1^2  ⋯  λ1^(k-1)]
[1  λ2  λ2^2  ⋯  λ2^(k-1)]
[⋮  ⋮   ⋮        ⋮      ]
[1  λk  λk^2  ⋯  λk^(k-1)]

is invertible; this is a Vandermonde matrix, and it is invertible because the λi are distinct.
EXAMPLE 3
Find the solution of xn+2 = 2xn + xn+1 that satisfies x0 = a, x1 = b.
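For Example 3, the associated polynomial is p(x) = x^2 - x - 2 = (x - 2)(x + 1), with distinct roots 2 and -1, so by Theorem 2 the general solution is xn = t1·2^n + t2·(-1)^n; matching x0 = a and x1 = b gives t1 = (a + b)/3 and t2 = (2a - b)/3. A Python sketch (not from the text; the initial values chosen below are illustrative) verifies this:

```python
from fractions import Fraction

def solution(a, b, n):
    # x_n = t1*2^n + t2*(-1)^n, where t1, t2 solve
    # t1 + t2 = a  and  2*t1 - t2 = b.
    t1 = Fraction(a + b, 3)
    t2 = Fraction(2 * a - b, 3)
    return t1 * 2 ** n + t2 * (-1) ** n

a, b = 5, 7                     # illustrative initial values
seq = [solution(a, b, n) for n in range(12)]
assert seq[0] == a and seq[1] == b
# the recurrence x_{n+2} = 2x_n + x_{n+1} holds:
assert all(seq[n + 2] == 2 * seq[n] + seq[n + 1] for n in range(10))
```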
Theorem 3
PROOF
(Sketch) It remains to prove (2). If (n choose k) = n(n - 1)⋯(n - k + 1)/k! denotes the
binomial coefficient, the idea is to use (1) to show that the sequence
sk = [(n choose k)λ^n) is a solution for each k = 0, 1, …, m - 1. Then (2) of
Theorem 1 can be applied to show that {s0, s1, …, sm-1} is linearly independent.
Finally, the sequences tk = [n^kλ^n), k = 0, 1, …, m - 1, in the present theorem
can be given by tk = ∑_{j=0}^{m-1} akjsj, where A = [aij] is an invertible matrix.
Then (2) follows. We omit the details.
This theorem combines with Theorem 2 to give a basis for V when p(x) has k real
roots (not necessarily distinct) none of which is zero. This last requirement means
r0 ≠ 0, a condition that is unimportant in practice (see Remark 1 below).
Theorem 4
PROOF
There are m1 + m2 + ⋯ + mp = k sequences in all so, because dim V = k,
it suffices to show that they are linearly independent. The assumption that
r0 ≠ 0 implies that 0 is not a root of p(x). Hence each λi ≠ 0, so
{[λi^n), [nλi^n), …, [n^(mi-1)λi^n)} is linearly independent by Theorem 3.
The proof that the whole set of sequences is linearly independent is omitted.
EXAMPLE 4
Find a basis for the space V of all sequences [xn) satisfying
xn+3 = -9xn - 3xn+1 + 5xn+2.
Remark 1 If r0 = 0 [so p(x) has 0 as a root], the recurrence reduces to one of shorter length.
For example, consider
xn+4 = 0xn + 0xn+1 + 3xn+2 + 2xn+3 (∗∗∗)
If we set yn = xn+2, this recurrence becomes yn+2 = 3yn + 2yn+1, which has solutions
[3^n) and [(-1)^n). These give the following solutions to (∗∗∗):

[0, 0, 1, 3, 3^2, …)
[0, 0, 1, -1, (-1)^2, …)
In addition, it is easy to verify that
[1, 0, 0, 0, 0, …)
[0, 1, 0, 0, 0, …)
are also solutions to (∗∗∗). The space of all solutions of (∗∗∗) has dimension 4
(Theorem 1), so these sequences are a basis. This technique works whenever r0 = 0.
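The four sequences above can be checked mechanically. A short Python sketch (not from the text) verifies that each satisfies the recurrence (∗∗∗):

```python
N = 12
# the four proposed basis solutions of x_{n+4} = 3x_{n+2} + 2x_{n+3}:
s1 = [0, 0] + [3 ** n for n in range(N - 2)]       # [0, 0, 1, 3, 3^2, ...)
s2 = [0, 0] + [(-1) ** n for n in range(N - 2)]    # [0, 0, 1, -1, 1, ...)
s3 = [1] + [0] * (N - 1)                           # [1, 0, 0, 0, ...)
s4 = [0, 1] + [0] * (N - 2)                        # [0, 1, 0, 0, ...)

for s in (s1, s2, s3, s4):
    assert all(s[n + 4] == 3 * s[n + 2] + 2 * s[n + 3] for n in range(N - 4))
```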
Remark 2 Theorem 4 completely describes the space V of sequences that satisfy a linear
recurrence relation for which the associated polynomial p(x) has all real roots.
However, in many cases of interest, p(x) has complex roots that are not real.
If p(λ) = 0, λ complex, then p(λ̄) = 0 too (λ̄ the conjugate), and the main
observation is that [λ^n + λ̄^n) and [i(λ^n - λ̄^n)) are real solutions. Analogs of the
preceding theorems can then be proved.
EXERCISES 7.5

1. Find a basis for the space V of sequences [xn) satisfying the following recurrences, and use it to find the sequence satisfying x0 = 1, x1 = 2, x2 = 1.

(a) xn+3 = -2xn + xn+1 + 2xn+2
(b) xn+3 = -6xn + 7xn+1
(c) xn+3 = -36xn + 7xn+2

2. In each case, find a basis for the space V of all sequences [xn) satisfying the recurrence, and use it to find xn if x0 = 1, x1 = -1, and x2 = 1.

(a) xn+3 = xn + xn+1 - xn+2
(b) xn+3 = -2xn + 3xn+1
(c) xn+3 = -4xn + 3xn+2
(d) xn+3 = xn - 3xn+1 + 3xn+2
(e) xn+3 = 8xn - 12xn+1 + 6xn+2

3. Find a basis for the space V of sequences [xn) satisfying each of the following recurrences.

(a) xn+2 = -a^2 xn + 2axn+1, a ≠ 0
(b) xn+2 = -abxn + (a + b)xn+1, (a ≠ b)

4. In each case, find a basis of V.

(a) V = {[xn) | xn+4 = 2xn+2 - xn+3, for n ≥ 0}
(b) V = {[xn) | xn+4 = -xn+2 + 2xn+3, for n ≥ 0}

5. Suppose that [xn) satisfies a linear recurrence relation of length k. If {e0 = (1, 0, …, 0), e1 = (0, 1, …, 0), …, ek-1 = (0, 0, …, 1)} is the standard basis of ℝk, show that

xn = x0T(e0) + x1T(e1) + ⋯ + xk-1T(ek-1)

holds for all n ≥ k. (Here T is as in Theorem 1.)

6. Show that the shift operator S is onto but not one-to-one. Find ker S.

7. Find a basis for the space V of all sequences [xn) satisfying xn+2 = -xn.
Orthogonality

In Section 5.3 we introduced the dot product in ℝn and extended the basic
geometric notions of length and distance. A set {f1, f2, …, fm} of nonzero vectors
in ℝn was called an orthogonal set if fi · fj = 0 for all i ≠ j, and it was proved
that every orthogonal set is independent. In particular, it was observed that the
expansion of a vector as a linear combination of orthogonal basis vectors is easy to
obtain because formulas exist for the coefficients. Hence the orthogonal bases are
the “nice” bases, and much of this chapter is devoted to extending results about
bases to orthogonal bases. This leads to some very powerful methods and theorems.
Our first task is to show that every subspace of ℝn has an orthogonal basis.
Lemma 1
Orthogonal Lemma
Let {f1, f2, …, fm} be an orthogonal set in ℝn. Given x in ℝn, write

fm+1 = x - [(x·f1)/‖f1‖^2]f1 - [(x·f2)/‖f2‖^2]f2 - ⋯ - [(x·fm)/‖fm‖^2]fm
Then:
1. fm+1 · fk = 0 for k = 1, 2, …, m.
2. If x is not in span{f1, …, fm}, then fm+1 ≠ 0 and {f1, …, fm, fm+1} is an
orthogonal set.
PROOF
For convenience, write ti = (x·fi)/‖fi‖^2 for each i. Given 1 ≤ k ≤ m:

fm+1 · fk = (x - t1f1 - ⋯ - tkfk - ⋯ - tmfm) · fk
          = x·fk - t1(f1·fk) - ⋯ - tk(fk·fk) - ⋯ - tm(fm·fk)
          = x·fk - tk‖fk‖^2
          = 0
This proves (1), and (2) follows because fm+1 ≠ 0 if x is not in span{f1, …, fm}.
SECTION 8.1 Orthogonal Complements and Projections 369
The orthogonal lemma has three important consequences for ℝn. The first is an
extension for orthogonal sets of the fundamental fact that any independent set is
part of a basis (Theorem 1 Section 6.4).
Theorem 1
PROOF
1. If span{f1, …, fm} = U, it is already a basis. Otherwise, there exists x in U
outside span{f1, …, fm}. If fm+1 is as given in the orthogonal lemma, then fm+1
is in U and {f1, …, fm, fm+1} is orthogonal. If span{f1, …, fm, fm+1} = U, we are
done. Otherwise, the process continues to create larger and larger orthogonal
subsets of U. They are all independent by Theorem 5 Section 5.3, so we have
a basis when we reach a subset containing dim U vectors.
2. If U = {0}, the empty basis is orthogonal. Otherwise, if f ≠ 0 is in U, then {f}
is orthogonal, so (2) follows from (1).
We can improve upon (2) of Theorem 1. In fact, the second consequence of the
orthogonal lemma is a procedure by which any basis {x1, …, xm} of a subspace U of
ℝn can be systematically modified to yield an orthogonal basis {f1, …, fm} of U. The
fi are constructed one at a time from the xi.
To start the process, take f1 = x1. Then x2 is not in span{f1} because {x1, x2} is
independent, so take

f2 = x2 - [(x2·f1)/‖f1‖^2]f1

Thus {f1, f2} is orthogonal by Lemma 1. Moreover, span{f1, f2} = span{x1, x2}
(verify), so x3 is not in span{f1, f2}. Hence {f1, f2, f3} is orthogonal where

f3 = x3 - [(x3·f1)/‖f1‖^2]f1 - [(x3·f2)/‖f2‖^2]f2

Again, span{f1, f2, f3} = span{x1, x2, x3}, so x4 is not in span{f1, f2, f3} and the process
continues. At the mth iteration we construct an orthogonal set {f1, …, fm} such that

span{f1, f2, …, fm} = span{x1, x2, …, xm} = U
Hence {f1, f2, …, fm} is the desired orthogonal basis of U. The procedure can be
summarized as follows.
Theorem 2
1 Erhardt Schmidt (1876–1959) was a German mathematician who studied under the great David Hilbert and later developed the theory of
Hilbert spaces. He first described the present algorithm in 1907. Jörgen Pederson Gram (1850–1916) was a Danish actuary.
370 Chapter 8 Orthogonality
[Figures: the Gram-Schmidt process for k = 3, showing x3, f1 = x1, f2, f3, and span{f1, f2}.]

Gram-Schmidt Algorithm

If {x1, x2, …, xm} is any basis of a subspace U of ℝn, construct f1, f2, …, fm in U successively as follows:

f1 = x1
f2 = x2 - [(x2·f1)/‖f1‖^2]f1
f3 = x3 - [(x3·f1)/‖f1‖^2]f1 - [(x3·f2)/‖f2‖^2]f2
⋮
fk = xk - [(xk·f1)/‖f1‖^2]f1 - [(xk·f2)/‖f2‖^2]f2 - ⋯ - [(xk·fk-1)/‖fk-1‖^2]fk-1

for each k = 2, 3, …, m. Then

1. {f1, f2, …, fm} is an orthogonal basis of U.
2. span{f1, f2, …, fk} = span{x1, x2, …, xk} for each k = 1, 2, …, m.
The process (for k = 3) is depicted in the diagrams. Of course, the algorithm
converts any basis of ℝn itself into an orthogonal basis.
EXAMPLE 1
Find an orthogonal basis of the row space of A = [1 1 -1 -1; 3 2 0 1; 1 0 1 0].
Solution ► Let x1, x2, x3 denote the rows of A and observe that {x1, x2, x3} is
linearly independent. Take f1 = x1. The algorithm gives
f2 = x2 - [(x2·f1)/‖f1‖^2]f1 = (3, 2, 0, 1) - (4/4)(1, 1, -1, -1) = (2, 1, 1, 2)

f3 = x3 - [(x3·f1)/‖f1‖^2]f1 - [(x3·f2)/‖f2‖^2]f2 = x3 - (0/4)f1 - (3/10)f2 = (1/10)(4, -3, 7, -6)

Hence {(1, 1, -1, -1), (2, 1, 1, 2), (1/10)(4, -3, 7, -6)} is the orthogonal basis
provided by the algorithm. In hand calculations it may be convenient to
eliminate fractions, so {(1, 1, -1, -1), (2, 1, 1, 2), (4, -3, 7, -6)} is also an
orthogonal basis for row A.
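The Gram-Schmidt algorithm is easy to implement with exact rational arithmetic. The following Python sketch (not from the text) reproduces the computation in Example 1:

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(xs):
    # f_k = x_k - sum over i < k of ((x_k . f_i)/||f_i||^2) f_i;
    # subtracting the components one at a time is equivalent because
    # the f_i already produced are mutually orthogonal.
    fs = []
    for x in xs:
        f = [Fraction(c) for c in x]
        for g in fs:
            t = dot(f, g) / dot(g, g)
            f = [a - t * b for a, b in zip(f, g)]
        fs.append(f)
    return fs

rows = [(1, 1, -1, -1), (3, 2, 0, 1), (1, 0, 1, 0)]   # rows of A in Example 1
f1, f2, f3 = gram_schmidt(rows)
assert f2 == [2, 1, 1, 2]
assert f3 == [Fraction(2, 5), Fraction(-3, 10), Fraction(7, 10), Fraction(-3, 5)]
# pairwise orthogonality:
assert dot(f1, f2) == 0 and dot(f1, f3) == 0 and dot(f2, f3) == 0
```

Note that f3 here is (1/10)(4, -3, 7, -6), exactly as in Example 1.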
Remark  Observe that the vector [(x·fi)/‖fi‖^2]fi is unchanged if a nonzero scalar multiple of fi is used in place of fi. Hence, if a newly constructed fi is multiplied by a nonzero scalar at some stage of the Gram-Schmidt algorithm, the subsequent fs will be unchanged. This is useful in actual calculations.
Projections

Suppose a point x and a plane U through the origin in ℝ3 are given, and we want to
find the point p in the plane that is closest to x. Our geometric intuition assures us
that such a point p exists. In fact (see the diagram), p must be chosen in such a way
that x - p is perpendicular to the plane.

[Figure: x, the point p in the plane U, and x - p perpendicular to U.]
Lemma 2
PROOF
3. Let U = span{x1, x2, …, xk}; we must show that U⊥ = {x | x · xi = 0 for each i}.
If x is in U⊥ then x · xi = 0 for all i because each xi is in U. Conversely,
suppose that x · xi = 0 for all i; we must show that x is in U⊥, that is,
x · y = 0 for each y in U. Write y = r1x1 + r2x2 + ⋯ + rkxk, where each
ri is in ℝ. Then, using Theorem 1 Section 5.3,

x · y = r1(x · x1) + r2(x · x2) + ⋯ + rk(x · xk) = r1·0 + r2·0 + ⋯ + rk·0 = 0,
as required.
EXAMPLE 2

Find U⊥ if U = span{(1, -1, 2, 0), (1, 0, -2, 3)} in ℝ4.
Now consider vectors x and d ≠ 0 in ℝ3. The projection p = projd(x) of x on
d was defined in Section 4.2 as in the diagram. The following formula for p was
derived in Theorem 4 Section 4.2:

p = projd(x) = [(x·d)/‖d‖^2]d,

[Figure: x, d, and the projection p = projd(x) along d.]
Definition 8.2  Let U be a subspace of ℝn with orthogonal basis {f1, f2, …, fm}. If x is in ℝn, the vector

projU(x) = [(x·f1)/‖f1‖^2]f1 + [(x·f2)/‖f2‖^2]f2 + ⋯ + [(x·fm)/‖fm‖^2]fm
is called the orthogonal projection of x on U. For the zero subspace U = {0}, we
define
proj{0}(x) = 0.
Theorem 3
Projection Theorem
If U is a subspace of ℝn and x is in ℝn, write p = projU(x). Then

1. p is in U and x - p is in U⊥.
2. p is the vector in U closest to x in the sense that

‖x - p‖ < ‖x - y‖ for all y ∈ U, y ≠ p
PROOF
1. This is proved in the preceding discussion (it is clear if U = {0}).
2. Write x - y = (x - p) + (p - y). Then p - y is in U and so is orthogonal
to x - p by (1). Hence, the Pythagorean theorem gives

‖x - y‖^2 = ‖x - p‖^2 + ‖p - y‖^2 > ‖x - p‖^2

because p - y ≠ 0. This gives (2).
EXAMPLE 3

Let U = span{x1, x2} in ℝ4 where x1 = (1, 1, 0, 1) and x2 = (0, 1, 1, 2). If
x = (3, -1, 0, 2), find the vector in U closest to x and express x as the sum
of a vector in U and a vector orthogonal to U.
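A computational sketch of Example 3 (Python, not part of the text): since {x1, x2} is not orthogonal (x1 · x2 = 3), first orthogonalize it with Gram-Schmidt, then apply the projection formula of Definition 8.2:

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x  = (3, -1, 0, 2)
x1 = (1, 1, 0, 1)
x2 = (0, 1, 1, 2)

# Gram-Schmidt on {x1, x2}:
f1 = [Fraction(c) for c in x1]
t  = Fraction(dot(x2, x1), dot(x1, x1))           # = 3/3 = 1
f2 = [a - t * b for a, b in zip(x2, f1)]          # f2 = (-1, 0, 1, 1)
assert f2 == [-1, 0, 1, 1] and dot(f1, f2) == 0

# p = proj_U(x) = ((x.f1)/||f1||^2) f1 + ((x.f2)/||f2||^2) f2
c1 = dot(x, f1) / dot(f1, f1)                     # = 4/3
c2 = dot(x, f2) / dot(f2, f2)                     # = -1/3
p  = [c1 * a + c2 * b for a, b in zip(f1, f2)]
assert p == [Fraction(5, 3), Fraction(4, 3), Fraction(-1, 3), 1]

# x - p is orthogonal to U, so x = p + (x - p) is the required decomposition:
q = [a - b for a, b in zip(x, p)]
assert dot(q, f1) == 0 and dot(q, f2) == 0
```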
EXAMPLE 4

Find the point in the plane with equation 2x + y - z = 0 that is closest to the
point (2, -1, -3).
Theorem 4
PROOF
If U = {0}, then U⊥ = ℝn, and so T(x) = proj{0}(x) = 0 for all x. Thus T = 0 is
the zero (linear) operator, so (1), (2), and (3) hold. Hence assume that U ≠ {0}.

1. If {f1, f2, …, fm} is an orthonormal basis of U, then

T(x) = (x·f1)f1 + (x·f2)f2 + ⋯ + (x·fm)fm for all x in ℝn (∗)
by the definition of the projection. Thus T is linear because
(x + y) · fi = x · fi + y · fi and (rx) · fi = r(x · fi) for each i.
2. We have im T ⊆ U by (∗) because each fi is in U. But if x is in U, then
x = T(x) by (∗) and the expansion theorem applied to the space U. This
shows that U ⊆ im T, so im T = U.
Now suppose that x is in U⊥. Then x · fi = 0 for each i (again because
each fi is in U ) so x is in ker T by (∗). Hence U⊥ ⊆ ker T. On the other
hand, Theorem 3 shows that x - T(x) is in U⊥ for all x in ℝn, and it
follows that ker T ⊆ U⊥. Hence ker T = U⊥, proving (2).
3. This follows from (1), (2), and the dimension theorem (Theorem 4 Section 7.2).
EXERCISES 8.1
1. In each case, use the Gram-Schmidt algorithm to convert the given basis B of V into an orthogonal basis.

(a) V = ℝ2, B = {(1, -1), (2, 1)}
(b) V = ℝ2, B = {(2, 1), (1, 2)}
(c) V = ℝ3, B = {(1, -1, 1), (1, 0, 1), (1, 1, 2)}
(d) V = ℝ3, B = {(0, 1, 1), (1, 1, 1), (1, -2, 2)}

2. In each case, write x as the sum of a vector in U and a vector in U⊥.

(a) x = (1, 5, 7), U = span{(1, -2, 3), (-1, 1, 1)}
(b) x = (2, 1, 6), U = span{(3, -1, 2), (2, 0, -3)}
(c) x = (3, 1, 5, 9), U = span{(1, 0, 1, 1), (0, 1, -1, 1), (-2, 0, 1, 1)}
(d) x = (2, 0, 1, 6), U = span{(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1)}
(e) x = (a, b, c, d), U = span{(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)}
(f) x = (a, b, c, d), U = span{(1, -1, 2, 0), (-1, 1, 1, 1)}

3. Let x = (1, -2, 1, 6) in ℝ4, and let U = span{(2, 1, 3, -4), (1, 2, 0, 1)}.

(a) Compute projU(x).
(b) Show that {(1, 0, 2, -3), (4, 7, 1, 2)} is another orthogonal basis of U.
(c) Use the basis in part (b) to compute projU(x).
4. In each case, use the Gram-Schmidt algorithm to find an orthogonal basis of the subspace U, and find the vector in U closest to x.

(a) U = span{(1, 1, 1), (0, 1, 1)}, x = (-1, 2, 1)
(b) U = span{(1, -1, 0), (-1, 0, 1)}, x = (2, 1, 0)
(c) U = span{(1, 0, 1, 0), (1, 1, 1, 0), (1, 1, 0, 0)}, x = (2, 0, -1, 3)
(d) U = span{(1, -1, 0, 1), (1, 1, 0, 0), (1, 1, 0, 1)}, x = (2, 0, 3, 1)

5. Let U = span{v1, v2, …, vk}, vi in ℝn, and let A be the k × n matrix with the vi as rows.

(a) Show that U⊥ = {x | x in ℝn, Ax^T = 0}.
(b) Use part (a) to find U⊥ if U = span{(1, -1, 2, 1), (1, 0, -1, 1)}.

6. Prove part 2 of Lemma 2.

7. Let U be a subspace of ℝn. If x in ℝn can be written in any way at all as x = p + q with p in U and q in U⊥, show that necessarily p = projU(x).

8. Let U be a subspace of ℝn and let x be a vector in ℝn. Using Exercise 7, or otherwise, show that x is in U if and only if x = projU(x).

9. Let U be a subspace of ℝn.

(a) Show that U⊥ = ℝn if and only if U = {0}.
(b) Show that U⊥ = {0} if and only if U = ℝn.

10. If U is a subspace of ℝn, show that projU(x) = x for all x in U.

11. If U is a subspace of ℝn, show that x = projU(x) + projU⊥(x) for all x in ℝn.

12. If {f1, …, fn} is an orthogonal basis of ℝn and U = span{f1, …, fm}, show that U⊥ = span{fm+1, …, fn}.

13. If U is a subspace of ℝn, show that U⊥⊥ = U. [Hint: Show that U ⊆ U⊥⊥, then use Theorem 4(3) twice.]

14. If U is a subspace of ℝn, show how to find an n × n matrix A such that U = {x | Ax = 0}. [Hint: Exercise 13.]

15. Write ℝn as rows. If A is an n × n matrix, write its null space as null A = {x in ℝn | Ax^T = 0}. Show that:

(a) null A = (row A)⊥;
(b) null A^T = (col A)⊥.

16. If U and W are subspaces, show that (U + W)⊥ = U⊥ ∩ W⊥. [See Exercise 22 Section 5.1.]

17. Think of ℝn as consisting of rows.

(a) Let E be an n × n matrix, and let U = {xE | x in ℝn}. Show that the following are equivalent.

(i) E^2 = E = E^T (E is a projection matrix).
(ii) (x - xE) · (yE) = 0 for all x and y in ℝn.
(iii) projU(x) = xE for all x in ℝn.

[Hint: For (ii) implies (iii): Write x = xE + (x - xE) and use the uniqueness argument preceding the definition of projU(x). For (iii) implies (ii): x - xE is in U⊥ for all x in ℝn.]

(b) If E is a projection matrix, show that I - E is also a projection matrix.
(c) If EF = 0 = FE and E and F are projection matrices, show that E + F is also a projection matrix.
(d) If A is m × n and AA^T is invertible, show that E = A^T(AA^T)^{-1}A is a projection matrix.

18. Let A be an n × n matrix of rank r. Show that there is an invertible n × n matrix U such that UA is a row-echelon matrix with the property that the first r rows are orthogonal. [Hint: Let R be the row-echelon form of A, and use the Gram-Schmidt process on the nonzero rows of R from the bottom up. Use Lemma 1 Section 2.4.]

19. Let A be an (n - 1) × n matrix with rows x1, x2, …, xn-1 and let Ai denote the (n - 1) × (n - 1) matrix obtained from A by deleting column i. Define the vector y in ℝn by

y = [det A1  -det A2  det A3  ⋯  (-1)^{n+1} det An]

Show that:

(a) xi · y = 0 for all i = 1, 2, …, n - 1. [Hint: Write Bi = [xi; A], the n × n matrix with row xi on top of A, and show that det Bi = 0.]
(b) y ≠ 0 if and only if {x1, x2, …, xn-1} is linearly independent. [Hint: If some det Ai ≠ 0, the rows of Ai are linearly independent. Conversely, if the xi are independent, consider A = UR where R is in reduced row-echelon form.]
(c) If {x1, x2, …, xn-1} is linearly independent, use Theorem 3(3) to show that all solutions to the system of n - 1 homogeneous equations Ax^T = 0 are given by ty, t a parameter.
Theorem 1
The following conditions are equivalent for an n × n matrix P.
1. P is invertible and P-1 = PT.
2. The rows of P are orthonormal.
3. The columns of P are orthonormal.
PROOF
First recall that condition (1) is equivalent to PPT = I by Corollary 1 of Theorem 5 Section 2.4. Let x1, x2, …, xn denote the rows of P. Then xjT is the jth column of PT, so the (i, j)-entry of PPT is xi · xj. Thus PPT = I means that xi · xj = 0 if i ≠ j and xi · xj = 1 if i = j. Hence condition (1) is equivalent to (2). The proof of the equivalence of (1) and (3) is similar.
Definition 8.3 An n × n matrix P is called an orthogonal matrix2 if it satisfies one (and hence all) of
the conditions in Theorem 1.
2 In view of (2) and (3) of Theorem 1, orthonormal matrix might be a better name. But orthogonal matrix is standard.
SECTION 8.2 Orthogonal Diagonalization 377
EXAMPLE 1
These orthogonal matrices have the virtue that they are easy to invert—simply
take the transpose. But they have many other important properties as well. If
T : ℝn → ℝn is a linear operator, we will prove (Theorem 3 Section 10.4) that T is
distance preserving if and only if its matrix is orthogonal. In particular, the matrices
of rotations and reflections about the origin in ℝ2 and ℝ3 are all orthogonal (see
Example 1).
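For readers following along with software, here is a small numerical check (a NumPy sketch of our own, not part of the text) that a rotation matrix is orthogonal, so inverting it amounts to transposing:

```python
import numpy as np

# The matrix of a rotation of the plane through an angle theta.
theta = 0.7
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Condition (1) of Theorem 1: P P^T = I, so P^{-1} = P^T.
assert np.allclose(P @ P.T, np.eye(2))
assert np.allclose(np.linalg.inv(P), P.T)
```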
It is not enough that the rows of a matrix A are merely orthogonal for A to be an
orthogonal matrix. Here is an example.
EXAMPLE 2
The matrix A = [ 2 1 1 / −1 1 1 / 0 −1 1 ] has orthogonal rows, but the columns are not orthogonal. However, if the rows are normalized, the resulting matrix
[ 2/√6 1/√6 1/√6 / −1/√3 1/√3 1/√3 / 0 −1/√2 1/√2 ]
is orthogonal (so the columns are now orthonormal, as the reader can verify).
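The verification suggested in this example can be done by machine; the following NumPy sketch (ours, for illustration) normalizes the rows and checks both row and column orthonormality:

```python
import numpy as np

A = np.array([[2., 1., 1.],
              [-1., 1., 1.],
              [0., -1., 1.]])

# The rows are orthogonal but not unit length: A A^T is diagonal, not I.
assert np.allclose(A @ A.T, np.diag([6., 3., 2.]))

# Dividing each row by its length produces an orthogonal matrix.
P = A / np.linalg.norm(A, axis=1, keepdims=True)
assert np.allclose(P @ P.T, np.eye(3))   # rows orthonormal
assert np.allclose(P.T @ P, np.eye(3))   # columns orthonormal too
```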
EXAMPLE 3
If P and Q are orthogonal matrices, then PQ is also orthogonal, as is P-1 = PT.
Theorem 2
Principal Axis Theorem
The following conditions are equivalent for an n × n matrix A.
1. A has an orthonormal set of n eigenvectors.
2. A is orthogonally diagonalizable.
3. A is symmetric.
PROOF
(1) ⇔ (2). Given (1), let x1, x2, …, xn be orthonormal eigenvectors of A. Then P = [x1 x2 ⋯ xn] is orthogonal, and P-1AP is diagonal by Theorem 4 Section 3.3. This proves (2). Conversely, given (2) let P-1AP be diagonal where P is orthogonal. If x1, x2, …, xn are the columns of P then {x1, x2, …, xn} is an orthonormal basis of ℝn that consists of eigenvectors of A by Theorem 4 Section 3.3. This proves (1).
(2) ⇒ (3). If PTAP = D is diagonal, where P-1 = PT, then A = PDPT. But
DT = D, so this gives AT = PTTDTPT = PDPT = A.
(3) ⇒ (2). If A is an n × n symmetric matrix, we proceed by induction on n.
If n = 1, A is already diagonal. If n > 1, assume that (3) ⇒ (2) for
(n - 1) × (n - 1) symmetric matrices. By Theorem 7 Section 5.5 let λ1
be a (real) eigenvalue of A, and let Ax1 = λ1x1, where ‖x1‖ = 1. Use the Gram-Schmidt algorithm to find an orthonormal basis {x1, x2, …, xn} for ℝn.
EXAMPLE 4
Find an orthogonal matrix P such that P-1AP is diagonal, where A = [ 1 0 −1 / 0 1 2 / −1 2 5 ].
Solution ► The eigenvalues of A are 0, 1, and 6, with corresponding eigenvectors
x1 = [1, −2, 1]T,  x2 = [2, 1, 0]T,  x3 = [−1, 2, 5]T
respectively. Moreover, by what appears to be remarkably good luck, these eigenvectors are orthogonal. We have ‖x1‖² = 6, ‖x2‖² = 5, and ‖x3‖² = 30, so
P = [ (1/√6)x1  (1/√5)x2  (1/√30)x3 ] = (1/√30) [ √5 2√6 −1 / −2√5 √6 2 / √5 0 5 ]
is an orthogonal matrix. Thus P-1 = PT and
PTAP = [ 0 0 0 / 0 1 0 / 0 0 6 ]
by the diagonalization algorithm.
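In practice this computation is delegated to software. A brief NumPy sketch (illustrative only; `numpy.linalg.eigh` is the standard routine for symmetric matrices and returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[1., 0., -1.],
              [0., 1., 2.],
              [-1., 2., 5.]])

# eigh returns ascending eigenvalues and an orthogonal matrix of
# orthonormal eigenvector columns.
evals, P = np.linalg.eigh(A)

assert np.allclose(evals, [0., 1., 6.])
assert np.allclose(P.T @ P, np.eye(3))            # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(evals))   # P^T A P is diagonal
```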
Theorem 3
If A is an n × n symmetric matrix, then
(Ax) · y = x · (Ay)
for all columns x and y in ℝn.
PROOF
Recall that x · y = xTy for all columns x and y. Because AT = A, we get
(Ax) · y = (Ax)Ty = xTATy = xTAy = x · (Ay).
Theorem 4
If A is a symmetric matrix, then eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

PROOF
Let Ax = λx and Ay = μy, where λ ≠ μ. Using Theorem 3, we compute
λ(x · y) = (λx) · y = (Ax) · y = x · (Ay) = x · (μy) = μ(x · y)
Hence (λ − μ)(x · y) = 0, and so x · y = 0 because λ ≠ μ.
Now the procedure for diagonalizing a symmetric n × n matrix is clear. Find the
distinct eigenvalues (all real by Theorem 7 Section 5.5) and find orthonormal bases
for each eigenspace (the Gram-Schmidt algorithm may be needed). Then the set of
all these basis vectors is orthonormal (by Theorem 4) and contains n vectors. Here
is an example.
EXAMPLE 5
Orthogonally diagonalize the symmetric matrix A = [ 8 −2 2 / −2 5 4 / 2 4 5 ].

Solution ► The characteristic polynomial is
cA(x) = det [ x−8 2 −2 / 2 x−5 −4 / −2 −4 x−5 ] = x(x − 9)².
Hence the distinct eigenvalues are 0 and 9 of multiplicities 1 and 2, respectively, so dim(E0) = 1 and dim(E9) = 2 by Theorem 6 Section 5.5 (A is diagonalizable, being symmetric). Gaussian elimination gives
E0(A) = span{x1}, x1 = [1, 2, −2]T, and E9(A) = span{[−2, 1, 0]T, [2, 0, 1]T}.
The eigenvectors in E9 are both orthogonal to x1 as Theorem 4 guarantees, but not to each other. However, the Gram-Schmidt process yields an orthogonal basis {x2, x3} of E9(A), where x2 = [−2, 1, 0]T and x3 = [2, 4, 5]T.
Normalizing gives orthonormal vectors {(1/3)x1, (1/√5)x2, (1/(3√5))x3}, so
P = [ (1/3)x1  (1/√5)x2  (1/(3√5))x3 ] = (1/(3√5)) [ √5 −6 2 / 2√5 3 4 / −2√5 0 5 ]
is an orthogonal matrix such that P-1AP is diagonal.
It is worth noting that other, more convenient, diagonalizing matrices P exist. For example, y2 = [2, 1, 2]T and y3 = [−2, 2, 1]T lie in E9(A) and they are orthogonal. Moreover, they both have norm 3 (as does x1), so
Q = [ (1/3)x1  (1/3)y2  (1/3)y3 ] = (1/3) [ 1 2 −2 / 2 1 2 / −2 2 1 ]
is a nicer orthogonal matrix with the property that Q-1AQ is diagonal.
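The claims about Q can be confirmed numerically; a short NumPy check (ours, for illustration):

```python
import numpy as np

A = np.array([[8., -2., 2.],
              [-2., 5., 4.],
              [2., 4., 5.]])

# The "nicer" diagonalizing matrix Q from the example.
Q = np.array([[1., 2., -2.],
              [2., 1., 2.],
              [-2., 2., 1.]]) / 3.

assert np.allclose(Q.T @ Q, np.eye(3))                  # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag([0., 9., 9.]))  # diagonal, eigenvalues 0, 9, 9
```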
EXAMPLE 6
Find principal axes for the quadratic form q = x1² − 4x1x2 + x2².

[The accompanying figure shows the graph of y1² − y2² = 1 in the new coordinates.]

Solution ► Here q has matrix A = [ 1 −2 / −2 1 ], with eigenvalues 3 and −1, so
P = (1/√2) [ 1 1 / −1 1 ]
is orthogonal and PTAP = D = [ 3 0 / 0 −1 ].
Theorem 5
Triangulation Theorem
If A is an n × n matrix with n real eigenvalues, an orthogonal matrix P exists such that
PTAP is upper triangular.
PROOF
We modify the proof of Theorem 2. If Ax1 = λ1x1 where ‖x1‖ = 1, let {x1, x2, …, xn} be an orthonormal basis of ℝn, and let P1 = [x1 x2 ⋯ xn]. Then P1 is orthogonal and P1TAP1 = [ λ1 B / 0 A1 ] in block form, where A1 is (n − 1) × (n − 1). Since the eigenvalues of A1 are eigenvalues of A, they are real, so by induction there is an orthogonal matrix Q such that QTA1Q is upper triangular. Then P2 = [ 1 0 / 0 Q ] is orthogonal, so P = P1P2 is also orthogonal, and PTAP is upper triangular.
Corollary 1
If A is an n × n matrix with real eigenvalues λ1, λ2, …, λn (possibly not all distinct),
then det A = λ1λ2⋯λn and tr A = λ1 + λ2 + ⋯ + λn.
This corollary remains true even if the eigenvalues are not real (using Schur’s
theorem).
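This corollary is easy to check numerically on a sample matrix; a NumPy sketch (ours, for illustration):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])          # eigenvalues 1 and 3
evals = np.linalg.eigvals(A)

assert np.isclose(np.prod(evals), np.linalg.det(A))   # det A = product of eigenvalues
assert np.isclose(np.sum(evals), np.trace(A))         # tr A = sum of eigenvalues
```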
EXERCISES 8.2
(a) A = [ 0 1 / 1 0 ]   (b) A = [ 1 −1 / −1 1 ]

(c) A = [ 3 0 0 / 0 2 2 / 0 2 5 ]   (d) A = [ 3 0 7 / 0 5 0 / 7 0 3 ]

(e) A = [ 1 1 0 / 1 1 0 / 0 0 2 ]   (f) A = [ 5 −2 −4 / −2 8 −2 / −4 −2 5 ]

(g) A = [ 5 3 0 0 / 3 5 0 0 / 0 0 7 1 / 0 0 1 7 ]   (h) A = [ 3 5 −1 1 / 5 3 1 −1 / −1 1 3 5 / 1 −1 5 3 ]

6. Consider A = [ 0 a 0 / a 0 c / 0 c 0 ] where one of a, c ≠ 0. Show that cA(x) = x(x − k)(x + k), where k = √(a² + c²), and find an orthogonal matrix P such that P-1AP is diagonal.

… [Hint: For (b) if and only if (c), use Theorem 2.]

12. We call matrices A and B orthogonally similar (and write A ∼° B) if B = PTAP for an orthogonal matrix P.
(a) Show that A ∼° A for all A; A ∼° B ⇒ B ∼° A; and A ∼° B and B ∼° C ⇒ A ∼° C.
(b) Show that the following are equivalent for two symmetric matrices A and B.
(i) A and B are similar.
(ii) A and B are orthogonally similar.
(iii) A and B have the same eigenvalues.

13. Assume that A and B are orthogonally similar (Exercise 12).
(a) If A and B are invertible, show that A-1 and B-1 are orthogonally similar.
(b) Show that A² and B² are orthogonally similar.
(c) Show that, if A is symmetric, so is B.
14. If A is symmetric, show that every eigenvalue of A is nonnegative if and only if A = B² for some symmetric matrix B.

15. Prove the converse of Theorem 3: If (Ax) · y = x · (Ay) for all n-columns x and y, then A is symmetric.

16. Show that every eigenvalue of A is zero if and only if A is nilpotent (Ak = 0 for some k ≥ 1).

17. If A has real eigenvalues, show that A = B + C where B is symmetric and C is nilpotent. [Hint: Theorem 5.]

18. Let P be an orthogonal matrix.
(a) Show that det P = 1 or det P = −1.
(b) Give 2 × 2 examples of P such that det P = 1 and det P = −1.
(c) If det P = −1, show that I + P has no inverse. [Hint: PT(I + P) = (I + P)T.]
(d) If P is n × n and det P ≠ (−1)n, show that I − P has no inverse. [Hint: PT(I − P) = −(I − P)T.]

19. We call a square matrix E a projection matrix if E² = E = ET.
(a) If E is a projection matrix, show that P = I − 2E is orthogonal and symmetric.
(b) If P is orthogonal and symmetric, show that E = ½(I − P) is a projection matrix.
(c) If U is m × n and UTU = I (for example, a unit column in ℝn), show that E = UUT is a projection matrix.

20. A matrix that we obtain from the identity matrix by writing its rows in a different order is called a permutation matrix. Show that every permutation matrix is orthogonal.

21. If the rows r1, …, rn of the n × n matrix A = [aij] are orthogonal, show that the (i, j)-entry of A-1 is aji/‖rj‖².

22. (a) Let A be an m × n matrix. Show that the following are equivalent.
(i) A has orthogonal rows.
(ii) A can be factored as A = DP, where D is invertible and diagonal and P has orthonormal rows.
(iii) AAT is an invertible, diagonal matrix.
(b) Show that an n × n matrix A has orthogonal rows if and only if A can be factored as A = DP, where P is orthogonal and D is diagonal and invertible.

23. Let A be a skew-symmetric matrix; that is, AT = −A. Assume that A is an n × n matrix.
(a) Show that I + A is invertible. [Hint: By Theorem 5 Section 2.4, it suffices to show that (I + A)x = 0, x in ℝn, implies x = 0. Compute x · x = xTx, and use the fact that Ax = −x and A²x = x.]
(b) Show that P = (I − A)(I + A)-1 is orthogonal.
(c) Show that every orthogonal matrix P such that I + P is invertible arises as in part (b) from some skew-symmetric matrix A. [Hint: Solve P = (I − A)(I + A)-1 for A.]

24. Show that the following are equivalent for an n × n matrix P.
(a) P is orthogonal.
(b) ‖Px‖ = ‖x‖ for all columns x in ℝn.
(c) ‖Px − Py‖ = ‖x − y‖ for all columns x and y in ℝn.
(d) (Px) · (Py) = x · y for all columns x and y in ℝn.
[Hints: For (c) ⇒ (d), see Exercise 14(a) Section 5.3. For (d) ⇒ (a), show that column i of P equals Pei, where ei is column i of the identity matrix.]

25. Show that every 2 × 2 orthogonal matrix has the form [ cos θ −sin θ / sin θ cos θ ] or [ cos θ sin θ / sin θ −cos θ ] for some angle θ. [Hint: If a² + b² = 1, then a = cos θ and b = sin θ for some angle θ.]

26. Use Theorem 5 to show that every symmetric matrix is orthogonally diagonalizable.
SECTION 8.3 Positive Definite Matrices 385
Definition 8.5 A square matrix is called positive definite if it is symmetric and all its eigenvalues λ
are positive, that is λ > 0.
Because these matrices are symmetric, the principal axis theorem plays a central
role in the theory.
Theorem 1
If A is positive definite, then det A > 0.
PROOF
If A is n × n and the eigenvalues are λ1, λ2, …, λn, then det A = λ1λ2⋯λn > 0
by the principal axis theorem (or the corollary to Theorem 5 Section 8.2).
Theorem 2
A symmetric matrix A is positive definite if and only if xTAx > 0 for every column
x ≠ 0 in ℝn.
PROOF
A is symmetric so, by the principal axis theorem, let PTAP = D = diag(λ1, λ2, …, λn) where P-1 = PT and the λi are the eigenvalues of A. Given a column x in ℝn, write y = PTx = [y1 y2 ⋯ yn]T. Then
xTAx = xT(PDPT)x = yTDy = λ1y1² + λ2y2² + ⋯ + λnyn²   (∗)
If A is positive definite and x ≠ 0, then xTAx > 0 by (∗) because some yj ≠ 0 and every λi > 0. Conversely, if xTAx > 0 whenever x ≠ 0, let x = Pej ≠ 0 where ej is column j of In. Then y = ej, so (∗) reads λj = xTAx > 0.
Note that Theorem 2 shows that the positive definite matrices are exactly the symmetric
matrices A for which the quadratic form q = xTAx takes only positive values.
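Both characterizations (positive eigenvalues, and positivity of the quadratic form) are easy to test numerically. The following NumPy sketch is our own illustration; the helper name `is_positive_definite` is an assumption, not the book's notation:

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check Definition 8.5: symmetric with all eigenvalues positive."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A = np.array([[10., 5., 2.],
              [5., 3., 2.],
              [2., 2., 3.]])
assert is_positive_definite(A)

# Theorem 2 in action: x^T A x > 0 for random nonzero columns x.
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(3)
    assert x @ A @ x > 0
```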
EXAMPLE 1
If U is any invertible n × n matrix, show that A = UTU is positive definite.
EXAMPLE 2
If A = [ 10 5 2 / 5 3 2 / 2 2 3 ], then (1)A = [10], (2)A = [ 10 5 / 5 3 ], and (3)A = A. (Here (r)A denotes the r × r submatrix in the upper left corner of A.)
Lemma 1
If A is positive definite, then so is each principal submatrix (r)A for r = 1, 2, …, n.
PROOF
If A is positive definite, Lemma 1 and Theorem 1 show that det((r)A) > 0 for
every r. This proves part of the following theorem which contains the converse to
Example 1, and characterizes the positive definite matrices among the symmetric
ones.
5 A similar argument shows that, if B is any matrix obtained from a positive definite matrix A by deleting certain rows and deleting the
same columns, then B is also positive definite.
Theorem 3
The following conditions are equivalent for a symmetric n × n matrix A.
1. A is positive definite.
2. det((r)A) > 0 for each r = 1, 2, …, n.
3. A = UTU where U is an upper triangular matrix with positive entries on the main diagonal.
Moreover, the factorization in (3) is unique (and is called the Cholesky factorization of A).
PROOF
First, (3) ⇒ (1) by Example 1, and (1) ⇒ (2) by Lemma 1 and Theorem 1.
(2) ⇒ (3). Assume (2) and proceed by induction on n. If n = 1, then A = [a]
__
where a > 0 by (2), so take U = [√a ]. If n > 1, write B = (n-1)A. Then B is
symmetric and satisfies (2) so, by induction, we have B = UTU as in (3) where
U is of size (n − 1) × (n − 1). Then, as A is symmetric, it has block form
A = [ B p / pT b ]
where p is a column in ℝn-1 and b is in ℝ. If we write x = (UT)-1p and c = b − xTx, block multiplication gives
A = [ UTU p / pT b ] = [ UT 0 / xT 1 ] [ U x / 0 c ]
as the reader can verify. Taking determinants and applying Theorem 5 Section 3.1 gives det A = det(UT) det(U) · c = c(det U)². Hence c > 0 because det A > 0.
The remarkable thing is that the matrix U in the Cholesky factorization is easy
to obtain from A using row operations. The key is that Step 1 of the following algorithm is possible for any positive definite matrix A. A proof of the algorithm is given following Example 3.

Algorithm for the Cholesky Factorization
If A is a positive definite matrix, the Cholesky factorization A = UTU can be obtained as follows:
Step 1. Carry A to an upper triangular matrix U1 with positive diagonal entries using row operations, each of which adds a multiple of one row to a lower row.
Step 2. Obtain U from U1 by dividing each row of U1 by the square root of the diagonal entry in that row.
6 André-Louis Cholesky (1875–1918) was a French mathematician who died in World War I. His factorization was published in 1924
by a fellow officer.
EXAMPLE 3
Find the Cholesky factorization of A = [ 10 5 2 / 5 3 2 / 2 2 3 ].

Solution ► The matrix A is positive definite by Theorem 3 because det (1)A = 10 > 0, det (2)A = 5 > 0, and det (3)A = det A = 3 > 0. Hence Step 1 of the algorithm is carried out as follows:
A = [ 10 5 2 / 5 3 2 / 2 2 3 ] → [ 10 5 2 / 0 1/2 1 / 0 1 13/5 ] → [ 10 5 2 / 0 1/2 1 / 0 0 3/5 ] = U1
Now carry out Step 2 on U1 to obtain
U = [ √10 5/√10 2/√10 / 0 1/√2 √2 / 0 0 √(3/5) ]
The reader can verify that UTU = A.
Let D1 = diag(e1, …, en) denote the diagonal of U1. Then (∗) gives
L1(U1TD1-1) = U1L1TD1-1
This is both upper triangular (right side) and lower triangular with 1s on the main diagonal (left side), and so must equal In. In particular, U1TD1-1 = L1-1. Now let D2 = diag(√e1, …, √en), so that D2² = D1. If we write U = D2-1U1, we have
UTU = (U1TD2-1)(D2-1U1) = U1T(D2²)-1U1 = (U1TD1-1)U1 = (L1-1)U1 = A
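The two-step algorithm translates directly into code. Here is a NumPy sketch (our own; the function name `cholesky_upper` is an assumption) that carries out Steps 1 and 2 and compares the result with NumPy's built-in routine, which returns the lower factor L with A = L Lᵀ:

```python
import numpy as np

def cholesky_upper(A):
    """Cholesky factor U (upper triangular, positive diagonal) with A = U^T U,
    computed by the row-reduction algorithm described above."""
    U1 = np.array(A, dtype=float)
    n = U1.shape[0]
    # Step 1: reduce to upper triangular form, adding multiples of each
    # row only to the rows below it.
    for j in range(n):
        for i in range(j + 1, n):
            U1[i] -= (U1[i, j] / U1[j, j]) * U1[j]
    # Step 2: divide each row by the square root of its diagonal entry.
    return U1 / np.sqrt(np.diag(U1))[:, None]

A = np.array([[10., 5., 2.],
              [5., 3., 2.],
              [2., 2., 3.]])
U = cholesky_upper(A)
assert np.allclose(U.T @ U, A)
# numpy gives the lower factor L with A = L L^T; by uniqueness, L^T = U.
assert np.allclose(np.linalg.cholesky(A).T, U)
```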
EXERCISES 8.3
1. Find the Cholesky decomposition of each of the following matrices.
(a) [ 4 3 / 3 5 ]   (b) [ 2 −1 / −1 1 ]
(c) [ 12 4 3 / 4 2 −1 / 3 −1 7 ]   (d) [ 20 4 5 / 4 2 3 / 5 3 5 ]

2. (a) If A is positive definite, show that Ak is positive definite for all k ≥ 1.
(b) Prove the converse to (a) when k is odd.
(c) Find a symmetric matrix A such that A² is positive definite but A is not.

3. Let A = [ 1 a / a b ]. If a² < b, show that A is positive definite and find the Cholesky factorization.

4. If A and B are positive definite and r > 0, show that A + B and rA are both positive definite.

9. If A is positive definite, show that A = CCT where C has orthogonal columns.

10. If A is positive definite, show that A = C² where C is positive definite.

11. Let A be a positive definite matrix. If a is a real number, show that aA is positive definite if and only if a > 0.

12. (a) Suppose an invertible matrix A can be factored in Mnn as A = LDU where L is lower triangular with 1s on the diagonal, U is upper triangular with 1s on the diagonal, and D is diagonal with positive diagonal entries. Show that the factorization is unique: If A = L1D1U1 is another such factorization, show that L1 = L, D1 = D, and U1 = U.
(b) Show that a matrix A is positive definite if and only if A is symmetric and admits a factorization A = LDU as in (a).
The importance of the factorization lies in the fact that there are computer
algorithms that accomplish it with good control over round-off error, making it
particularly useful in matrix calculations. The factorization is a matrix version of
the Gram-Schmidt process.
Suppose A = [c1 c2 ⋯ cn] is an m × n matrix with linearly independent columns c1, c2, …, cn. The Gram-Schmidt algorithm can be applied to these columns to provide orthogonal columns f1, f2, …, fn, where f1 = c1 and
fk = ck − (ck · f1/‖f1‖²)f1 − (ck · f2/‖f2‖²)f2 − ⋯ − (ck · fk-1/‖fk-1‖²)fk-1
for each k = 2, 3, …, n. Now write qk = (1/‖fk‖)fk for each k. Then q1, q2, …, qn are orthonormal columns, and the above equation becomes
‖fk‖qk = ck − (ck · q1)q1 − (ck · q2)q2 − ⋯ − (ck · qk-1)qk-1
Using these equations, express each ck as a linear combination of the qi:
c1 = ‖f1‖q1
c2 = (c2 · q1)q1 + ‖f2‖q2
c3 = (c3 · q1)q1 + (c3 · q2)q2 + ‖f3‖q3
⋮
cn = (cn · q1)q1 + (cn · q2)q2 + (cn · q3)q3 + ⋯ + ‖fn‖qn
These equations have a matrix form that gives the required factorization:
A = [c1 c2 c3 ⋯ cn]
  = [q1 q2 q3 ⋯ qn] [ ‖f1‖ c2·q1 c3·q1 ⋯ cn·q1 / 0 ‖f2‖ c3·q2 ⋯ cn·q2 / 0 0 ‖f3‖ ⋯ cn·q3 / ⋮ ⋮ ⋮ ⋱ ⋮ / 0 0 0 ⋯ ‖fn‖ ]   (∗)
Here the first factor Q = [q1 q2 q3 ⋯ qn] has orthonormal columns, and the second factor is an n × n upper triangular matrix R with positive diagonal entries (and so is invertible). We record this in the following theorem.
Theorem 1
QR-Factorization
Every m × n matrix A with linearly independent columns has a QR-factorization A = QR
where Q has orthonormal columns and R is upper triangular with positive diagonal entries.
EXAMPLE 1
Find the QR-factorization of A = [ 1 1 0 / −1 0 1 / 0 1 1 / 0 0 1 ].

Solution ► Denote the columns of A as c1, c2, and c3, and observe that {c1, c2, c3} is independent. If we apply the Gram-Schmidt algorithm to these columns, the result is:
f1 = c1 = [1, −1, 0, 0]T,  f2 = c2 − (1/2)f1 = [1/2, 1/2, 1, 0]T,  and  f3 = c3 + (1/2)f1 − f2 = [0, 0, 0, 1]T.
Write qj = (1/‖fj‖)fj for each j, so {q1, q2, q3} is orthonormal. Then equation (∗) preceding Theorem 1 gives A = QR where
Q = [q1 q2 q3] = (1/√6) [ √3 1 0 / −√3 1 0 / 0 2 0 / 0 0 √6 ]
R = [ ‖f1‖ c2·q1 c3·q1 / 0 ‖f2‖ c3·q2 / 0 0 ‖f3‖ ] = (1/√2) [ 2 1 −1 / 0 √3 √3 / 0 0 √2 ]
The reader can verify that indeed A = QR.
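The construction preceding Theorem 1 can be sketched in code. This NumPy implementation is our own illustration (the name `qr_gram_schmidt` is an assumption), and it verifies the example above:

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR-factorization by the Gram-Schmidt process, assuming the
    columns of A are linearly independent."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        f = A[:, k].copy()
        for i in range(k):
            R[i, k] = A[:, k] @ Q[:, i]   # c_k . q_i
            f -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(f)       # ||f_k||
        Q[:, k] = f / R[k, k]
    return Q, R

A = np.array([[1., 1., 0.],
              [-1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])
Q, R = qr_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal columns
assert np.allclose(R, np.triu(R))        # upper triangular
assert np.all(np.diag(R) > 0)            # positive diagonal
assert np.allclose(R[0], [np.sqrt(2), 1/np.sqrt(2), -1/np.sqrt(2)])
```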
Corollary 1
Theorem 2
Every square, invertible matrix A has a factorization A = QR where Q is orthogonal and R is invertible and upper triangular.
Theorem 3
If A is an m × n matrix with independent columns, the QR-factorization of Theorem 1 is unique. That is, if A = QR = Q1R1 where Q and Q1 have orthonormal columns and R and R1 are upper triangular with positive diagonal entries, then Q1 = Q and R1 = R.
PROOF
Write Q = [c1 c2 ⋯ cn] and Q1 = [d1 d2 ⋯ dn] in terms of their columns, and observe first that QTQ = In = Q1TQ1 because Q and Q1 have orthonormal columns. Hence it suffices to show that Q1 = Q (then R1 = Q1TA = QTA = R). Since Q1TQ1 = In, the equation QR = Q1R1 gives Q1TQ = R1R-1; for convenience we write this matrix as
Q1TQ = R1R-1 = [tij].
This matrix is upper triangular with positive diagonal elements (since this is true for R and R1), so tii > 0 for each i and tij = 0 if i > j. On the other hand, the (i, j)-entry of Q1TQ is diTcj = di · cj, so we have di · cj = tij for all i and j. But each cj is in span{d1, d2, …, dn} because Q = Q1(R1R-1). Hence the expansion theorem gives
cj = (d1 · cj)d1 + (d2 · cj)d2 + ⋯ + (dn · cj)dn = t1jd1 + t2jd2 + ⋯ + tjjdj
because di · cj = tij = 0 if i > j. The first few equations here are
c1 = t11d1
c2 = t12d1 + t22d2
c3 = t13d1 + t23d2 + t33d3
c4 = t14d1 + t24d2 + t34d3 + t44d4
The first of these equations gives 1 = ‖c1‖ = ‖t11d1‖ = t11‖d1‖ = t11, whence
c1 = d1. But then t12 = d1 · c2 = c1 · c2 = 0, so the second equation becomes
c2 = t22d2. Now a similar argument gives c2 = d2, and then t13 = 0 and t23 = 0
follows in the same way. Hence c3 = t33d3 and c3 = d3. Continue in this way to
get ci = di for all i. This means that Q1 = Q, which is what we wanted.
SECTION 8.5 Computing Eigenvalues 393
EXERCISES 8.4
1. In each case find the QR-factorization of A.
(a) A = [ 1 −1 / −1 0 ]   (b) A = [ 2 1 / 1 1 ]
(c) A = [ 1 1 1 / 1 1 0 / 1 0 0 / 0 0 0 ]   (d) A = [ 1 1 0 / −1 0 1 / 0 1 1 / 1 −1 0 ]

2. Let A and B denote matrices.
(a) If A and B have independent columns, show that AB has independent columns. [Hint: Theorem 3 Section 5.4.]
(b) Show that A has a QR-factorization if and only if A has independent columns.
(c) If AB has a QR-factorization, show that the same is true of B but not necessarily A. [Hint: Consider AAT where A = [ 1 0 0 / 1 1 1 ].]

3. If R is upper triangular and invertible, show that there exists a diagonal matrix D with diagonal entries ±1 such that R1 = DR is invertible, upper triangular, and has positive diagonal entries.

4. If A has independent columns, let A = QR where Q has orthonormal columns and R is invertible and upper triangular. [Some authors call this a QR-factorization of A.] Show that there is a diagonal matrix D with diagonal entries ±1 such that A = (QD)(DR) is the QR-factorization of A. [Hint: Preceding exercise.]
Because the vectors x1, x2, …, xk, … approximate dominant eigenvectors, this suggests that we define the Rayleigh quotients as follows:
rk = (xk · xk+1)/‖xk‖²  for k ≥ 1.
Then the numbers rk approximate the dominant eigenvalue λ.
EXAMPLE 1
Use the power method to approximate a dominant eigenvector and eigenvalue of A = [ 1 1 / 2 0 ].

Solution ► The eigenvalues of A are 2 and −1, with eigenvectors [1, 1]T and [1, −2]T. Take x0 = [1, 0]T as the first approximation and compute x1, x2, …, successively, from x1 = Ax0, x2 = Ax1, … . The result is
x1 = [1, 2]T, x2 = [3, 2]T, x3 = [5, 6]T, x4 = [11, 10]T, x5 = [21, 22]T, …
These vectors are approaching scalar multiples of the dominant eigenvector [1, 1]T. Moreover, the Rayleigh quotients are
r1 = 7/5, r2 = 27/13, r3 = 115/61, r4 = 451/221, …
and these are approaching the dominant eigenvalue 2.
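The iteration in this example can be reproduced in a few lines. The following NumPy sketch is ours (the function name `power_method` is an assumption); it regenerates the vectors and Rayleigh quotients above:

```python
import numpy as np

def power_method(A, x0, steps):
    """Power method: x_{k+1} = A x_k, with Rayleigh quotients
    r_k = (x_k . x_{k+1}) / ||x_k||^2."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    rs = [xs[k] @ xs[k + 1] / (xs[k] @ xs[k]) for k in range(1, steps)]
    return xs, rs

A = np.array([[1., 1.],
              [2., 0.]])
xs, rs = power_method(A, [1., 0.], 5)

assert np.allclose(xs[2], [3., 2.]) and np.allclose(xs[5], [21., 22.])
assert np.isclose(rs[0], 7 / 5) and np.isclose(rs[3], 451 / 221)
```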
To see why the power method works, let λ1, λ2, …, λm be eigenvalues of A with λ1 dominant and let y1, y2, …, ym be corresponding eigenvectors. What is required is that the first approximation x0 be a linear combination of these eigenvectors:
x0 = a1y1 + a2y2 + ⋯ + amym  with a1 ≠ 0
If k ≥ 1, the fact that xk = Aᵏx0 and Aᵏyi = λiᵏyi for each i gives
xk = a1λ1ᵏy1 + a2λ2ᵏy2 + ⋯ + amλmᵏym  for k ≥ 1
Hence
(1/λ1ᵏ)xk = a1y1 + a2(λ2/λ1)ᵏy2 + ⋯ + am(λm/λ1)ᵏym
The right side approaches a1y1 as k increases because λ1 is dominant, so |λi/λ1| < 1 for each i > 1. Because a1 ≠ 0, this means that xk approximates the dominant eigenvector a1λ1ᵏy1.
The power method requires that the first approximation x0 be a linear combination of eigenvectors. (In Example 1 the eigenvectors form a basis of ℝ2.) But even in this case the method fails if a1 = 0, where a1 is the coefficient of the dominant eigenvector (try x0 = [−1, 2]T in Example 1). In general, the rate of convergence is quite slow if any of the ratios |λi/λ1| is near 1. Also, because the method requires repeated multiplications by A, it is not recommended unless these multiplications are easy to carry out (for example, if most of the entries of A are zero).
QR-Algorithm
A much better method for approximating the eigenvalues of an invertible matrix A
depends on the factorization (using the Gram-Schmidt algorithm) of A in the form
A = QR
where Q is orthogonal and R is invertible and upper triangular (see Theorem
2 Section 8.4). The QR-algorithm uses this repeatedly to create a sequence of
matrices A1 = A, A2, A3, …, as follows:
1. Define A1 = A and factor it as A1 = Q1R1.
2. Define A2 = R1Q1 and factor it as A2 = Q2R2.
3. Define A3 = R2Q2 and factor it as A3 = Q3R3.
EXAMPLE 2
If A = [ 1 1 / 2 0 ] as in Example 1, use the QR-algorithm to approximate the eigenvalues.

Solution ► The first few matrices in the algorithm are:
A1 = [ 1 1 / 2 0 ] = Q1R1 where Q1 = (1/√5) [ 1 2 / 2 −1 ] and R1 = (1/√5) [ 5 1 / 0 2 ]
A2 = R1Q1 = (1/5) [ 7 9 / 4 −2 ]
A3 = (1/13) [ 27 −5 / 8 −14 ] = [ 2.08 −0.38 / 0.62 −1.08 ]
The iterates are approaching an upper triangular matrix with diagonal entries 2 and −1, the eigenvalues of A.

Convergence is accelerated if, at stage k, a number sk (called a shift) is chosen and Ak − skI is factored as QkRk rather than Ak itself, so we take Ak+1 = RkQk + skI. If the shifts sk are carefully chosen, convergence can be greatly improved.
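The unshifted algorithm is short enough to write out in full. This NumPy sketch is our own illustration (the name `qr_algorithm` is an assumption); applied to the matrix of Example 2, the iterates settle on the eigenvalues 2 and −1:

```python
import numpy as np

def qr_algorithm(A, iterations):
    """Basic (unshifted) QR-algorithm: A_{k+1} = R_k Q_k where A_k = Q_k R_k."""
    Ak = np.asarray(A, dtype=float)
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

A = np.array([[1., 1.],
              [2., 0.]])
Ak = qr_algorithm(A, 30)

# The iterates converge to an upper triangular matrix whose diagonal
# holds the eigenvalues.
assert np.allclose(np.diag(Ak), [2., -1.], atol=1e-6)
assert abs(Ak[1, 0]) < 1e-6
```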
Complex Eigenvalues. If some of the eigenvalues of a real matrix A are not real,
the QR-algorithm converges to a block upper triangular matrix where the diagonal
blocks are either 1 × 1 (the real eigenvalues) or 2 × 2 (each providing a pair of
conjugate complex eigenvalues of A).
EXERCISES 8.5
1. In each case, find the exact eigenvalues and determine corresponding eigenvectors. Then …
(a) A = [ 2 −4 / −3 3 ]   (b) A = [ 5 2 / −3 −2 ]
(c) A = [ 1 2 / 2 1 ]   (d) A = [ 3 1 / 1 0 ]

2. In each case, find the exact eigenvalues and then approximate them using the QR-algorithm.
(a) A = [ 1 1 / 1 0 ]   (b) A = [ 3 1 / 1 0 ]

4. If A is symmetric, show that each matrix Ak in the QR-algorithm is also symmetric. Deduce that the Ak converge to a diagonal matrix.

6. Given a matrix A, let Ak, Qk, and Rk, k ≥ 1, denote the matrices constructed in the QR-algorithm. Show that Ak = (Q1Q2⋯Qk)(Rk⋯R2R1) for each k ≥ 1 and hence that this is a QR-factorization of Ak. [Hint: Show that QkRk = Rk-1Qk-1 for each k ≥ 2, and use this equality to compute (Q1Q2⋯Qk)(Rk⋯R2R1) "from the centre out." Use the fact that (AB)n+1 = A(BA)nB for any square matrices A and B.]
Definition 8.7 Given z = (z1, z2, …, zn) and w = (w1, w2, …, wn) in ℂn, define their standard inner product 〈z, w〉 by
〈z, w〉 = z1w̄1 + z2w̄2 + ⋯ + znw̄n
where w̄ denotes the conjugate of the complex number w.
Clearly, if z and w actually lie in ℝn, then 〈z, w〉 = z · w is the usual dot product.
EXAMPLE 1
If z = (2, 1 - i, 2i, 3 - i) and w = (1 - i, -1, -i, 3 + 2i), then
〈z, w〉 = 2(1 + i) + (1 - i)(-1) + (2i)(i) + (3 - i)(3 - 2i) = 6 - 6i
〈z, z〉 = 2 · 2 + (1 - i)(1 + i) + (2i)(-2i) + (3 - i)(3 + i) = 20
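These computations are easy to check by machine. One caution in the NumPy sketch below (ours, for illustration): `numpy.vdot` conjugates its *first* argument, while the definition above conjugates the second entry, so 〈z, w〉 corresponds to `vdot(w, z)`:

```python
import numpy as np

z = np.array([2, 1 - 1j, 2j, 3 - 1j])
w = np.array([1 - 1j, -1, -1j, 3 + 2j])

# <z, w> = z_1 conj(w_1) + ... + z_n conj(w_n)
inner = np.sum(z * np.conj(w))

assert np.isclose(inner, 6 - 6j)
assert np.isclose(np.vdot(w, z), inner)   # vdot conjugates its first argument
assert np.isclose(np.sum(z * np.conj(z)).real, 20)
```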
Theorem 1
Let z, z1, w, and w1 denote vectors in ℂn, and let λ denote a complex number.
1. 〈z + z1, w〉 = 〈z, w〉 + 〈z1, w〉 and 〈z, w + w1〉 = 〈z, w〉 + 〈z, w1〉.
2. 〈λz, w〉 = λ〈z, w〉 and 〈z, λw〉 = λ̄〈z, w〉.
3. 〈z, w〉 and 〈w, z〉 are conjugates of each other.
4. 〈z, z〉 ≥ 0, and 〈z, z〉 = 0 if and only if z = 0.
PROOF
We leave (1) and (2) to the reader (Exercise 10), and (4) has already been proved.
To prove (3), write z = (z1, z2, …, zn) and w = (w1, w2, …, wn). Then 〈w, z〉 = w1z̄1 + ⋯ + wnz̄n, so taking conjugates term by term gives the conjugate of 〈w, z〉 as
w̄1z1 + ⋯ + w̄nzn = z1w̄1 + ⋯ + znw̄n = 〈z, w〉
Definition 8.8 As for the dot product on ℝn, property (4) enables us to define the norm or length ‖z‖ of a vector z = (z1, z2, …, zn) in ℂn:
‖z‖ = √〈z, z〉 = √(|z1|² + |z2|² + ⋯ + |zn|²)
The only properties of the norm function we will need are the following (the proofs
are left to the reader):
Theorem 2
If z is a vector in ℂn and λ is a complex number, then:
1. ‖z‖ ≥ 0, and ‖z‖ = 0 if and only if z = 0.
2. ‖λz‖ = |λ|‖z‖.
EXAMPLE 2
In ℂ4, find a unit vector u that is a positive real multiple of z = (1 − i, i, 2, 3 + 4i).

Solution ► ‖z‖ = √(2 + 1 + 4 + 25) = √32 = 4√2, so take u = (1/(4√2))z.
EXAMPLE 3
[ 3 1−i 2+i / 2i 5+2i −i ]H = [ 3 −2i / 1+i 5−2i / 2−i i ]
The following properties of AH follow easily from the rules for transposition of
real matrices and extend these rules to complex matrices. Note the conjugate in
property (3).
Theorem 3
Let A and B denote complex matrices, and let λ be a complex number.
1. (AH)H = A.
2. (A + B)H = AH + BH and (AB)H = BHAH.
3. (λA)H = λ̄AH.
Hermitian matrices are easy to recognize because the entries on the main diagonal
must be real, and the “reflection’’ of each nondiagonal entry in the main diagonal
must be the conjugate of that entry.
EXAMPLE 4
[ 3 i 2+i / −i −2 −7 / 2−i −7 1 ] is hermitian, whereas [ 1 i / i −2 ] and [ 1 i / −i i ] are not.
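The "reflection" test is exactly what a one-line check implements; a NumPy sketch of our own (the name `is_hermitian` is an assumption) applied to the matrices of Example 4:

```python
import numpy as np

def is_hermitian(A):
    """A equals its conjugate transpose A^H."""
    A = np.asarray(A)
    return np.allclose(A, A.conj().T)

A = np.array([[3, 1j, 2 + 1j],
              [-1j, -2, -7],
              [2 - 1j, -7, 1]])
assert is_hermitian(A)
assert not is_hermitian(np.array([[1, 1j], [1j, -2]]))
assert not is_hermitian(np.array([[1, 1j], [-1j, 1j]]))  # diagonal not real
```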
The following theorem extends Theorem 3 Section 8.2, and gives a very useful characterization of hermitian matrices in terms of the standard inner product in ℂn.
Theorem 4
An n × n complex matrix A is hermitian if and only if
〈Az, w〉 = 〈z, Aw〉
for all columns z and w in ℂn.

PROOF
If A is hermitian, we have AT = Ā. If z and w are columns in ℂn, then 〈z, w〉 = zTw̄, so
〈Az, w〉 = (Az)Tw̄ = zTATw̄ = zTĀw̄ = 〈z, Aw〉
because Āw̄ is the conjugate of Aw. To prove the converse, let ej denote column j of the identity matrix. If A = [aij], the condition gives
āij = 〈ei, Aej〉 = 〈Aei, ej〉 = aji.
Hence AT = Ā, so A is hermitian.
9 The name hermitian honours Charles Hermite (1822–1901), a French mathematician who worked primarily in analysis and is
remembered as the first to show that the number e from calculus is transcendental—that is, e is not a root of any polynomial with
integer coefficients.
SECTION 8.6 Complex Matrices 401
This polynomial has complex coefficients (possibly nonreal). However, the proof of
Theorem 2 Section 3.3 goes through to show that the eigenvalues of A are the roots
(possibly complex) of cA(x).
It is at this point that the advantage of working with complex numbers becomes
apparent. The real numbers are incomplete in the sense that the characteristic
polynomial of a real matrix may fail to have all its roots real. However, this difficulty
does not occur for the complex numbers. The so-called fundamental theorem of
algebra ensures that every polynomial of positive degree with complex coefficients
has a complex root. Hence every square complex matrix A has a (complex)
eigenvalue. Indeed (Appendix A), cA(x) factors completely as follows:
cA(x) = (x − λ1)(x − λ2)⋯(x − λn)
where λ1, λ2, …, λn are the eigenvalues of A (with possible repetitions due to
multiple roots).
The next result shows that, for hermitian matrices, the eigenvalues are actually
real. Because symmetric real matrices are hermitian, this re-proves Theorem 7
Section 5.5. It also extends Theorem 4 Section 8.2, which asserts that eigenvectors
of a symmetric real matrix corresponding to distinct eigenvalues are actually
orthogonal. In the complex context, two n-tuples z and w in ℂn are said to be orthogonal if 〈z, w〉 = 0.
Theorem 5
Let A be a hermitian matrix.
1. The eigenvalues of A are real.
2. Eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

PROOF
Let λ and μ be eigenvalues of A with (nonzero) eigenvectors z and w. Then Az = λz and Aw = μw, so Theorem 4 gives
λ〈z, w〉 = 〈λz, w〉 = 〈Az, w〉 = 〈z, Aw〉 = 〈z, μw〉 = μ̄〈z, w〉   (∗)
If μ = λ and w = z, this becomes λ〈z, z〉 = λ̄〈z, z〉. Because 〈z, z〉 = ‖z‖² ≠ 0, this implies λ = λ̄. Thus λ is real, proving (1). Similarly, μ is real, so equation (∗) gives λ〈z, w〉 = μ〈z, w〉. If λ ≠ μ, this implies 〈z, w〉 = 0, proving (2).
The principal axis theorem (Theorem 2 Section 8.2) asserts that every real
symmetric matrix A is orthogonally diagonalizable—that is PTAP is diagonal where
P is an orthogonal matrix (P-1 = PT). The next theorem identifies the complex
analogs of these orthogonal real matrices.
Definition 8.11 As in the real case, a set of nonzero vectors {z1, z2, …, zm} in ℂn is called orthogonal if 〈zi, zj〉 = 0 whenever i ≠ j, and it is orthonormal if, in addition, ‖zi‖ = 1 for each i.
Theorem 6
The following conditions are equivalent for an n × n complex matrix A.
1. A is invertible and A-1 = AH.
2. The rows of A are an orthonormal set in ℂn.
3. The columns of A are an orthonormal set in ℂn.

PROOF
If A = [c1 c2 ⋯ cn] is a complex matrix with jth column cj, then ATĀ = [〈ci, cj〉], as in Theorem 1 Section 8.2. Now (1) ⇔ (2) follows, and (1) ⇔ (3) is proved in the same way.
EXAMPLE 5
1 - i √2 i
EXAMPLE 6
Unitary Diagonalization
An n × n complex matrix A is called unitarily diagonalizable if U HAU is diagonal
for some unitary matrix U. As Example 6 suggests, we are going to prove that every
hermitian matrix is unitarily diagonalizable. However, with only a little extra effort,
we can get a very important theorem that has this result as an easy consequence.
A complex matrix is called upper triangular if every entry below the main
diagonal is zero. We owe the following theorem to Issai Schur.10
Theorem 7
Schur’s Theorem
If A is any n × n complex matrix, there exists a unitary matrix U such that
U HAU = T
is upper triangular. Moreover, the entries on the main diagonal of T are the eigenvalues
λ1, λ2, …, λn of A (including multiplicities).
PROOF
We use induction on n. If n = 1, A is already upper triangular. If n > 1,
assume the theorem is valid for (n - 1) × (n - 1) complex matrices. Let
λ1 be an eigenvalue of A, and let y1 be an eigenvector with ‖y1‖ = 1. Then y1 is part of a basis of ℂn (by the analog of Theorem 1 Section 6.4), so the (complex analog of the) Gram-Schmidt process provides y2, …, yn such that {y1, y2, …, yn} is an orthonormal basis of ℂn. If U1 = [y1 y2 ⋯ yn] is the matrix with these vectors as its columns, then (see Lemma 3)
U1HAU1 = [ λ1 X1 / 0 A1 ]
in block form. Now apply induction to find a unitary (n − 1) × (n − 1) matrix W1 such that W1HA1W1 = T1 is upper triangular. Then
U2 = [ 1 0 / 0 W1 ]
is a unitary n × n matrix. Hence U = U1U2 is unitary (using Theorem 6), and
UHAU = U2H(U1HAU1)U2 = [ 1 0 / 0 W1H ] [ λ1 X1 / 0 A1 ] [ 1 0 / 0 W1 ] = [ λ1 X1W1 / 0 T1 ]
is upper triangular.
is upper triangular. Finally, A and U HAU = T have the same eigenvalues by (the
complex version of ) Theorem 1 Section 5.5, and they are the diagonal entries of
T because T is upper triangular.
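The Schur form is available in standard software. The sketch below assumes SciPy is installed (`scipy.linalg.schur` with `output='complex'` returns T and a unitary U with A = U T Uᴴ); the example matrix is our own choice, with eigenvalues 1, i, and −i:

```python
import numpy as np
from scipy.linalg import schur

# Companion matrix of x^3 - x^2 + x - 1 = (x - 1)(x^2 + 1),
# so the eigenvalues are 1, i, and -i.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., -1., 1.]])

T, U = schur(A, output='complex')

assert np.allclose(U @ T @ U.conj().T, A)        # A = U T U^H
assert np.allclose(U.conj().T @ U, np.eye(3))    # U is unitary
assert np.allclose(np.tril(T, -1), 0)            # T is upper triangular
# The diagonal of T carries the eigenvalues: their sum is tr A = 1
# and their product is det A = 1.
assert np.isclose(np.diag(T).sum(), 1)
assert np.isclose(np.prod(np.diag(T)), 1)
```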
The fact that similar matrices have the same traces and determinants gives the
following consequence of Schur’s theorem.
10 Issai Schur (1875–1941) was a German mathematician who did fundamental work in the theory of representations of groups as
matrices.
Corollary 1
Let A be an n × n complex matrix, and let λ1, λ2, …, λn denote the eigenvalues of A,
including multiplicities. Then
det A = λ1λ2⋯λn and tr A = λ1 + λ2 + ⋯ + λn
Theorem 8
Spectral Theorem
If A is hermitian, there is a unitary matrix U such that U HAU is diagonal.
PROOF
By Schur’s theorem, let U HAU = T be upper triangular where U is unitary.
Since A is hermitian, this gives
TH = (UHAU)H = UHAH(UH)H = UHAU = T
This means that T is both upper and lower triangular. Hence T is actually
diagonal.
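The spectral theorem is easy to confirm numerically. The sketch below (assuming NumPy is available; the matrix is a sample of mine, not one from the text) uses `numpy.linalg.eigh` to unitarily diagonalize a hermitian matrix:

```python
import numpy as np

# A sample hermitian matrix: A equals its conjugate transpose.
A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])
assert np.allclose(A, A.conj().T)

# eigh returns real eigenvalues and a unitary matrix U of eigenvectors.
w, U = np.linalg.eigh(A)
assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary
D = U.conj().T @ A @ U                          # U^H A U ...
assert np.allclose(D, np.diag(w))               # ... is diagonal
print(np.round(w, 6))                           # eigenvalues 1 and 4, both real
```

Note that `eigh` also confirms Schur's theorem in this case, since a diagonal matrix is in particular upper triangular.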
The principal axis theorem asserts that a real matrix A is symmetric if and only if
it is orthogonally diagonalizable (that is, PTAP is diagonal for some real orthogonal
matrix P). Theorem 8 is the complex analog of half of this result. However, the
converse is false for complex matrices: There exist unitarily diagonalizable matrices
that are not hermitian.
EXAMPLE 7
Show that the non-hermitian matrix A = [0 1; -1 0] is unitarily diagonalizable.
Solution ► The characteristic polynomial is cA(x) = x2 + 1. Hence the
eigenvalues are i and -i, and it is easy to verify that [i; -1] and [-1; i] are
corresponding eigenvectors. Moreover, these eigenvectors are orthogonal and
each has norm √2, so U = (1/√2)[i -1; -1 i] is a unitary matrix such that
U^H AU = [i 0; 0 -i] is diagonal.
There is a very simple way to characterize those complex matrices that are
unitarily diagonalizable. To this end, an n × n complex matrix N is called normal
if NNH = NHN. It is clear that every hermitian or unitary matrix is normal, as is
the matrix [0 1; -1 0] in Example 7. In fact we have the following result.
Theorem 9
An n × n complex matrix A is unitarily diagonalizable if and only if A is normal.
PROOF
Assume first that U HAU = D, where U is unitary and D is diagonal.
Then DDH = DHD as is easily verified. Because DDH = U H(AAH)U and
DHD = U H(AHA)U, it follows by cancellation that AAH = AHA.
Conversely, assume A is normal—that is, AAH = AHA. By Schur’s theorem,
let U HAU = T, where T is upper triangular and U is unitary. Then T is
normal too:
TT H = U H(AAH)U = U H(AHA)U = T HT
Hence it suffices to show that a normal n × n upper triangular matrix T must be
diagonal. We induct on n; it is clear if n = 1. If n > 1 and T = [tij], then equating
(1, 1)-entries in TT H and T HT gives
|t11|² + |t12|² + ⋯ + |t1n|² = |t11|²
It follows that t12 = t13 = ⋯ = t1n = 0, so
T = [t11 0; 0 T1]
in block form. Then TT^H = T^HT implies T1T1^H = T1^H T1, so T1 is normal.
Thus T1 is diagonal by induction, and the proof is complete.
Theorem 10
Cayley-Hamilton Theorem11
If A is an n × n complex matrix, then cA(A) = 0; that is, A is a root of its characteristic
polynomial.
11 Named after the English mathematician Arthur Cayley (1821–1895), see page 32, and William Rowan Hamilton (1805–1865), an Irish
mathematician famous for his work on physical dynamics.
PROOF
If p(x) is any polynomial with complex coefficients, then p(P-1AP) = P-1p(A)P
for any invertible complex matrix P. Hence, by Schur’s theorem, we may assume
that A is upper triangular. Then the eigenvalues λ1, λ2, …, λn of A appear along
the main diagonal, so cA(x) = (x - λ1)(x - λ2)(x - λ3)⋯(x - λn). Thus
cA(A) = (A - λ1I)(A - λ2I)(A - λ3I)⋯(A - λnI)
Note that each matrix A - λiI is upper triangular. Now observe:
1. A - λ1I has zero first column because column 1 of A is (λ1, 0, 0, …, 0)T.
2. Then (A - λ1I)(A - λ2I) has the first two columns zero because column 2 of
(A - λ2I) is (b, 0, 0, …, 0)T for some constant b.
3. Next (A - λ1I)(A - λ2I)(A - λ3I) has the first three columns zero because
column 3 of (A - λ3I) is (c, d, 0, …, 0)T for some constants c and d.
Continuing in this way we see that (A - λ1I)(A - λ2I)(A - λ3I)⋯(A - λnI) has
all n columns zero; that is, cA(A) = 0.
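The conclusion cA(A) = 0 is easy to check directly for any particular matrix. Here is a short NumPy sketch (the matrix is an arbitrary sample of mine) that multiplies out the factors (A - λiI):

```python
import numpy as np

# Multiply out c_A(A) = (A - λ1·I)(A - λ2·I) for a sample 2x2 complex matrix.
A = np.array([[1, 2 + 1j],
              [0, 3]], dtype=complex)
lams = np.linalg.eigvals(A)        # eigenvalues of A (here 1 and 3)

n = A.shape[0]
cA_of_A = np.eye(n, dtype=complex)
for lam in lams:
    cA_of_A = cA_of_A @ (A - lam * np.eye(n))

assert np.allclose(cA_of_A, 0)     # Cayley-Hamilton: c_A(A) = 0
```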
EXERCISES 8.6
1. In each case, compute the norm of the complex vector.
(a) (1, 1 - i, -2, i)
(b) (1 - i, 1 + i, 1, -1)
(c) (2 + i, 1 - i, 2, 0, -i)
(d) (-2, -i, 1 + i, 1 - i, 2i)

2. In each case, determine whether the two vectors are orthogonal.
(a) (4, -3i, 2 + i), (i, 2, 2 - 4i)
(b) (i, -i, 2 + i), (i, i, 2 - i)
(c) (1, 1, i, i), (1, i, -i, 1)
(d) (4 + 4i, 2 + i, 2i), (-1 + i, 2, 3 - 2i)

3. A subset U of ℂⁿ is called a complex subspace of ℂⁿ if it contains 0 and if,
given v and w in U, both v + w and zv lie in U (z any complex number). In each
case, determine whether U is a complex subspace of ℂ³.
(a) U = {(w, w̄, 0) | w in ℂ}
(b) U = {(w, 2w, a) | w in ℂ, a in ℝ}
(c) U = ℝ³
(d) U = {(v + w, v - 2w, v) | v, w in ℂ}

4. In each case, find a basis over ℂ, and determine the dimension of the complex
subspace U of ℂ³ (see the previous exercise).
(a) U = {(w, v + w, v - iw) | v, w in ℂ}
(b) U = {(iv + w, 0, 2v - w) | v, w in ℂ}
(c) U = {(u, v, w) | iu - 3v + (1 - i)w = 0; u, v, w in ℂ}
(d) U = {(u, v, w) | 2u + (1 + i)v - iw = 0; u, v, w in ℂ}

5. In each case, determine whether the given matrix is hermitian, unitary, or normal.
(a) [1 -i; i i]
(b) [2 3; -3 2]
(c) [1 i; -i 2]
(d) [1 -i; i -1]
(e) (1/√2)[1 -1; 1 1]
(f) [1 1+i; 1+i i]
(g) [1+i 1; -i -1+i]
(h) (1/(√2·|z|))[z -z; z̄ z̄], z ≠ 0

6. Show that a matrix N is normal if and only if N̄ Nᵀ = Nᵀ N̄.
SECTION 8.6 Complex Matrices 407
7. Let A = [v̄ w; z v] where v, w, and z are complex numbers. Characterize in
terms of v, w, and z when A is
(a) hermitian (b) unitary

8. In each case, find a unitary matrix U such that U^HAU is diagonal:
(c) A = [a b; -b a], a, b real
(d) A = [2 1+i; 1-i 3]
(e) A = [1 0 1+i; 0 2 0; 1-i 0 0]
(f) A = [1 0 0; 0 1 1+i; 0 1-i 2]

9. Show that 〈Ax, y〉 = 〈x, A^Hy〉 holds for all n × n matrices A and for all
n-tuples x and y in ℂⁿ.

10. (a) Prove (1) and (2) of Theorem 1.
(b) Prove Theorem 2.
(c) Prove Theorem 3.

11. (a) Show that A is hermitian if and only if Ā = Aᵀ.
(b) Show that the diagonal entries of any hermitian matrix are real.

12. (a) Show that every complex matrix Z can be written uniquely in the form
Z = A + iB, where A and B are real matrices.
(d) Show that every n × n complex matrix Z can be written uniquely as
Z = A + B, where A is hermitian and B is skew-hermitian.

15. Let U be a unitary matrix. Show that:
(a) …
(b) … is again unitary.
(c) U^H is again unitary.

17. Let Z be an m × n matrix such that Z^HZ = I_n (for example, Z is a unit
column in ℂⁿ).
(a) Show that V = ZZ^H is hermitian and satisfies V² = V.
(b) Show that U = I - 2ZZ^H is both unitary and hermitian (so U⁻¹ = U^H = U).

18. (a) If N is normal, show that zN is also normal for all complex numbers z.
(b) Show that (a) fails if normal is replaced by hermitian.

19. Show that a real 2 × 2 normal matrix is either symmetric or has the form
[a b; -b a].

20. If A is hermitian, show that all the coefficients of cA(x) are real numbers.
Modular Arithmetic
We work in the set ℤ = {0, ±1, ±2, ±3, …} of integers, that is, the set of whole
numbers. Everyone is familiar with the process of “long division” from arithmetic.
For example, we can divide an integer a by 5 and leave a remainder “modulo 5” in
the set {0, 1, 2, 3, 4}. As an illustration
19 = 3 · 5 + 4,
so the remainder of 19 modulo 5 is 4. Similarly, the remainder of 137 modulo 5 is 2
because 137 = 27 · 5 + 2. This works even for negative integers: For example,
-17 = (-4) · 5 + 3,
so the remainder of -17 modulo 5 is 3.
This process is called the division algorithm. More formally, let n ≥ 2 denote an
integer. Then every integer a can be written uniquely in the form
a = qn + r where q and r are integers and 0 ≤ r ≤ n - 1.
Here q is called the quotient of a modulo n, and r is called the remainder
of a modulo n. We refer to n as the modulus. Thus, if n = 6, the fact that
134 = 22 · 6 + 2 means that 134 has quotient 22 and remainder 2 modulo 6.
Our interest here is in the set of all possible remainders modulo n. This set is
denoted
ℤn = {0, 1, 2, 3, …, n - 1}
and is called the set of integers modulo n. Thus every integer is uniquely
represented in ℤn by its remainder modulo n.
We are going to show how to do arithmetic in ℤn by adding and multiplying
modulo n. That is, we add or multiply two numbers in ℤn by calculating the usual
sum or product in ℤ and taking the remainder modulo n. It is proved in books on
abstract algebra that the usual laws of arithmetic hold in ℤn for any modulus n ≥ 2.
This seems remarkable until we remember that these laws are true for ordinary
addition and multiplication and all we are doing is reducing modulo n.
To illustrate, consider the case n = 6, so that ℤ6 = {0, 1, 2, 3, 4, 5}. Then
2 + 5 = 1 in ℤ6 because 7 leaves a remainder of 1 when divided by 6. Similarly,
2 · 5 = 4 in ℤ6, while 3 + 5 = 2, and 3 + 3 = 0. In this way we can fill in the
addition and multiplication tables for ℤ6; the result is:
SECTION 8.7 An Application to Linear Codes over Finite Fields 409
Tables for ℤ6
+ 0 1 2 3 4 5 × 0 1 2 3 4 5
0 0 1 2 3 4 5 0 0 0 0 0 0 0
1 1 2 3 4 5 0 1 0 1 2 3 4 5
2 2 3 4 5 0 1 2 0 2 4 0 2 4
3 3 4 5 0 1 2 3 0 3 0 3 0 3
4 4 5 0 1 2 3 4 0 4 2 0 4 2
5 5 0 1 2 3 4 5 0 5 4 3 2 1
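The tables above can be generated mechanically. A small Python sketch (writing the elements of ℤ6 as plain integers):

```python
# Build the addition and multiplication tables for Z_6 by reducing mod 6.
n = 6
add = [[(a + b) % n for b in range(n)] for a in range(n)]
mul = [[(a * b) % n for b in range(n)] for a in range(n)]

for row in add:
    print(row)

# Spot checks from the text: 2 + 5 = 1, 2 * 5 = 4, and 3 * 4 = 0 in Z_6.
assert add[2][5] == 1
assert mul[2][5] == 4
assert mul[3][4] == 0
```

The same two comprehensions produce the tables for any modulus n ≥ 2.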
Calculations in ℤ6 are carried out much as in ℤ. As an illustration, consider the
“distributive law” a(b + c) = ab + ac familiar from ordinary arithmetic. This holds
for all a, b, and c in ℤ6; we verify a particular case:
3(5 + 4) = 3 · 5 + 3 · 4 in ℤ6
In fact, the left side is 3(5 + 4) = 3 · 3 = 3, and the right side is (3 · 5) + (3 · 4) =
3 + 0 = 3 too. Hence doing arithmetic in ℤ6 is familiar. However, there are
differences. For example, 3 · 4 = 0 in ℤ6, in contrast to the fact that a · b = 0 in ℤ
can only happen when either a = 0 or b = 0. Similarly, 3² = 3 in ℤ6, unlike ℤ.
Note that we will make statements like -30 = 19 in ℤ7; it means that -30 and
19 leave the same remainder 5 when divided by 7, and so are equal in ℤ7 because
they both equal 5. In general, if n ≥ 2 is any modulus, the operative fact is that
a = b in ℤn if and only if a - b is a multiple of n.
Arithmetic in ℤn is, in a sense, simpler than that for the integers. For
example, consider negatives. Given the element 8 in ℤ17, what is -8? The
answer lies in the observation that 8 + 9 = 0 in ℤ17, so -8 = 9 (and -9 = 8).
In the same way, finding negatives is not difficult in ℤn for any modulus n.
Finite Fields
In our study of linear algebra so far the scalars have been real (possibly complex)
numbers. The set ℝ of real numbers has the property that it is closed under addition
and multiplication, that the usual laws of arithmetic hold, and that every nonzero real
number has an inverse in ℝ. Such a system is called a field. Hence the real numbers ℝ
form a field, as does the set ℂ of complex numbers. Another example is the set ℚ of all
rational numbers (fractions); however the set ℤ of integers is not a field—for example,
2 has no inverse in ℤ because 2 · x = 1 has no solution x in ℤ.
Our motivation for isolating the concept of a field is that nearly everything we
have done remains valid if the scalars are restricted to some field: The gaussian
algorithm can be used to solve systems of linear equations with coefficients in
the field; a square matrix with entries from the field is invertible if and only if its
determinant is nonzero; the matrix inversion algorithm works in the same way; and
so on. The reason is that the field has all the properties used in the proofs of these
results for the field ℝ, so all the theorems remain valid.
It turns out that there are finite fields—that is, finite sets that satisfy the usual
laws of arithmetic and in which every nonzero element a has an inverse, that
is an element b in the field such that ab = 1. If n ≥ 2 is an integer, the modular
system ℤn certainly satisfies the basic laws of arithmetic, but it need not be a field.
For example we have 2 · 3 = 0 in ℤ6 so 3 has no inverse in ℤ6 (if 3a = 1 then
2 = 2 · 1 = 2(3a) = 0a = 0 in ℤ6, a contradiction). The problem is that 6 = 2 · 3
can be properly factored in ℤ.
410 Chapter 8 Orthogonality
Theorem 1
If p is a prime number, then ℤp is a field using addition and multiplication modulo p.
The proof can be found in books on abstract algebra.12 If p is a prime, the field ℤp is
called the field of integers modulo p.
For example, consider the case n = 5. Then ℤ5 = {0, 1, 2, 3, 4} and the addition
and multiplication tables are:
+ 0 1 2 3 4 × 0 1 2 3 4
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 1 0 1 2 3 4
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
Hence 1 and 4 are self-inverse in ℤ5, and 2 and 3 are inverses of each other, so ℤ5 is
indeed a field. Here is another important example.
EXAMPLE 1
If p = 2, then ℤ2 = {0, 1} is a field with addition and multiplication modulo 2
given by the tables
+ 0 1 × 0 1
0 0 1 and 0 0 0
1 1 0 1 0 1
This is binary arithmetic, the basic algebra of computers.
12 See, for example, W. K. Nicholson, Introduction to Abstract Algebra, 4th ed., (New York: Wiley, 2012).
EXAMPLE 2
Find the inverse of 14 in ℤ17.
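One way to find such an inverse is a brute-force search; Python 3.8+ can also compute it directly with the three-argument `pow`:

```python
# The inverse of 14 in Z_17: pow(14, -1, 17) solves 14*b = 1 (mod 17).
inv = pow(14, -1, 17)
print(inv)                      # 11, since 14 * 11 = 154 = 9*17 + 1
assert (14 * inv) % 17 == 1

# Brute-force search over Z_17 gives the same (unique) answer.
assert [b for b in range(17) if (14 * b) % 17 == 1] == [inv]
```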
As mentioned above, nearly everything we have done with matrices over the field
of real numbers can be done in the same way for matrices with entries from ℤp. We
illustrate this with one example. Again the reader is referred to books on abstract
algebra.
EXAMPLE 3
While we shall not use them, there are finite fields other than ℤp for the various
primes p. Surprisingly, for every prime p and every integer n ≥ 1, there exists a field
with exactly p^n elements, and this field is unique.13 It is called the Galois field of
order p^n, and is denoted GF(p^n).
13 See, for example, W. K. Nicholson, Introduction to Abstract Algebra, 4th ed., (New York: Wiley, 2012).
Theorem 2
Let u, v, and w denote words in F n. Then:
1. d(v, w) ≥ 0.
2. d(v, w) = 0 if and only if v = w.
3. d(v, w) = d(w, v).
4. d(v, w) ≤ d(v, u) + d(u, w).
PROOF
(1) and (3) are clear, and (2) follows because wt(v) = 0 if and only if v = 0. To prove
(4), write x = v - u and y = u - w. Then (4) reads wt(x + y) ≤ wt(x) + wt(y).
If x = a1a2⋯an and y = b1b2⋯bn, this follows because ai + bi ≠ 0 implies that
either ai ≠ 0 or bi ≠ 0.
Given a word w in F n and a real number r > 0, define the ball Br(w) of radius r
(or simply the r-ball) about w as follows:
Br(w) = {x ∈ F n | d(w, x) ≤ r}.
Using this we can describe one of the most useful decoding methods.
Using this method, we can describe how to construct a code C that can detect (or
correct) t errors. Suppose a code word c is transmitted and a word w is received with
s errors where 1 ≤ s ≤ t. Then s is the number of places at which the c- and w-digits
differ, that is, s = d(c, w). Hence Bt(c) consists of all possible received words where
at most t errors have occurred.
Assume first that C has the property that no code word lies in the t-ball of
another code word. Because w is in Bt(c) and w ≠ c, this means that w is not a code
word and the error has been detected. If we strengthen the assumption on C to
require that the t-balls about code words are pairwise disjoint, then w belongs to a
unique ball (the one about c), and so w will be correctly decoded as c.
To describe when this happens, let C be an n-code. The minimum distance d of C
is defined to be the smallest distance between two distinct code words in C; that is,
d = min{d(v, w) | v and w in C; v ≠ w}.
Theorem 3
Let C be an n-code with minimum distance d. Assume that nearest neighbour decoding
is used. Then:
1. If t < d, then C can detect t errors.14
2. If 2t < d, then C can correct t errors.
PROOF
1. Let c be a code word in C. If w ∈ Bt(c), then d(w, c) ≤ t < d by hypothesis.
Thus the t-ball Bt(c) contains no other code word, so C can detect t errors
by the preceding discussion.
2. If 2t < d, it suffices (again by the preceding discussion) to show that the
t-balls about distinct code words are pairwise disjoint. But if c ≠ c′ are
code words in C and w is in Bt(c) ∩ Bt(c′), then Theorem 2 gives
d(c, c′) ≤ d(c, w) + d(w, c′) ≤ t + t = 2t < d
by hypothesis, contradicting the minimality of d.
14 We say that C detects (corrects) t errors if C can detect (or correct) t or fewer errors.
EXAMPLE 4
If F = ℤ3 = {0, 1, 2}, the 6-code {111111, 111222, 222111} has minimum
distance 3 and so can detect 2 errors and correct 1 error.
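The minimum distance is easy to verify by brute force. A short Python sketch (writing the words as digit strings over ℤ3):

```python
from itertools import combinations

def d(v, w):
    """Hamming distance: the number of positions where v and w differ."""
    return sum(a != b for a, b in zip(v, w))

C = ["111111", "111222", "222111"]           # the 6-code of Example 4
min_dist = min(d(v, w) for v, w in combinations(C, 2))
print(min_dist)                              # 3
assert min_dist == 3                         # detects 2 errors, corrects 1
```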
Theorem 4
Hamming Bound
Let C be an n-code over a field F that can correct t errors using nearest neighbour
decoding. If |F| = q, then
|C| ≤ q^n / ∑_{i=0}^{t} (n choose i)(q - 1)^i.
PROOF
Write k = ∑_{i=0}^{t} (n choose i)(q - 1)^i. The t-balls centred at distinct code words each contain
k words, and there are |C| of them. Moreover they are pairwise disjoint because
the code corrects t errors (see the discussion preceding Theorem 3). Hence they
contain k · |C| distinct words, and so k · |C| ≤ |F n| = q^n, proving the theorem.
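The bound is simple to evaluate. The sketch below (function name mine) computes it for binary codes of length 7 correcting one error, where it gives 16 — the size achieved by the classical Hamming (7, 4)-code:

```python
from math import comb

def hamming_bound(q, n, t):
    """Upper bound on |C| for a t-error-correcting n-code over a field of size q."""
    return q ** n / sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))

print(hamming_bound(2, 7, 1))    # 16.0: at most 16 code words
assert hamming_bound(2, 7, 1) == 16.0
```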
Linear Codes
Up to this point we have been regarding any nonempty subset of the F-vector space
F n as a code. However many important codes are actually subspaces. A subspace
C ⊆ F n of dimension k ≥ 1 over F is called an (n, k)-linear code, or simply an
(n, k)-code. We do not regard the zero subspace (that is, k = 0) as a code.
EXAMPLE 5
If F = ℤ2 and n ≥ 2, the n-parity-check code is constructed as follows: An
extra digit is added to each word in F n-1 to make the number of 1s in the
resulting word even (we say such words have even parity). The resulting
(n, n-1)-code is linear because the sum of two words of even parity again
has even parity.
Many of the properties of general codes take a simpler form for linear codes.
The following result gives a much easier way to find the minimal distance of a
linear code, and sharpens the results in Theorem 3.
Theorem 5
Let C be an (n, k)-code with minimum distance d over a finite field F, and use nearest
neighbour decoding.
1. d = min{wt(w) | 0 ≠ w in C}.
2. C can detect t ≥ 1 errors if and only if t < d.
3. C can correct t ≥ 1 errors if and only if 2t < d.
4. If C can correct t ≥ 1 errors and |F| = q, then q^k ≤ q^n / ∑_{i=0}^{t} (n choose i)(q - 1)^i.
PROOF
1. Write d′ = min{wt(w) | 0 ≠ w in C}. If v ≠ w are words in C, then
d(v, w) = wt(v - w) ≥ d′ because v - w is in the subspace C. Hence
d ≥ d′. Conversely, given w ≠ 0 in C then, since 0 is in C, we have
wt(w) = d(w, 0) ≥ d by the definition of d. Hence d′ ≥ d and (1) is proved.
2. Assume that C can detect t errors. Given w ≠ 0 in C, the t-ball Bt(w) about
w contains no other code word (see the discussion preceding Theorem 3).
In particular, it does not contain the code word 0, so t < d(w, 0) = wt(w).
Hence t < d by (1). The converse is part of Theorem 3.
3. We require a result of interest in itself.
Claim. Suppose c in C has wt(c) ≤ 2t. Then Bt(0) ∩ Bt(c) is nonempty.
Proof. If wt(c) ≤ t, then c itself is in Bt(0) ∩ Bt(c). So assume t < wt(c) ≤ 2t.
Then c has more than t nonzero digits, so we can form a new word w by
changing exactly t of these nonzero digits to zero. Then d(w, c) = t, so w is
in Bt(c). But wt(w) = wt(c) - t ≤ t, so w is also in Bt(0). Hence w is in
Bt(0) ∩ Bt(c), proving the Claim.
If C corrects t errors, the t-balls about code words are pairwise disjoint (see
the discussion preceding Theorem 3). Hence the claim shows that wt(c) > 2t
for all c ≠ 0 in C, from which d > 2t by (1). The other inequality comes from
Theorem 3.
4. We have |C| = qk because dimF C = k, so this assertion restates Theorem 4.
EXAMPLE 6
If F = ℤ2, then
C = {0000000, 0101010, 1010101, 1110000,
1011010, 0100101, 0001111, 1111111}
is a (7, 3)-code; in fact C = span{0101010, 1010101, 1110000}. The minimum
distance for C is 3, the minimum weight of a nonzero word in C.
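The claim that these eight words are spanned by the three listed generators can be checked by machine; a Python sketch over ℤ2:

```python
from itertools import product

gens = ["0101010", "1010101", "1110000"]     # the spanning set in Example 6

def add_words(v, w):
    """Digit-wise sum of two binary words, mod 2."""
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(v, w))

C = set()
for coeffs in product([0, 1], repeat=3):     # all 2^3 linear combinations
    word = "0000000"
    for c, g in zip(coeffs, gens):
        if c:
            word = add_words(word, g)
    C.add(word)

print(sorted(C))                             # the 8 words listed above
assert len(C) == 8
# Minimum distance = minimum weight of a nonzero word (Theorem 5).
assert min(w.count("1") for w in C if w != "0000000") == 3
```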
Matrix Generators
Given a linear n-code C over a finite field F, the way encoding works in practice
is as follows. A message stream is blocked off into segments of length k ≤ n called
messages. Each message u in F k is encoded as a code word, the code word is
transmitted, the receiver decodes the received word as the nearest code word, and
then re-creates the original message. A fast and convenient method is needed to
encode the incoming messages, to decode the received word after transmission (with
or without error), and finally to retrieve messages from code words. All this can be
achieved for any linear code using matrix multiplication.
Let G denote a k × n matrix over a finite field F, and encode each message u in
F k as the word uG in F n using matrix multiplication (thinking of words as rows).
This amounts to saying that the set of code words is the subspace C = {uG | u in F k}
of F n. This subspace need not have dimension k for every k × n matrix G. But,
if {e1, e2, …, ek} is the standard basis of F k, then eiG is row i of G for each i, and
{e1G, e2G, …, ekG} spans C. Hence dim C = k if and only if the rows of G are
independent in F n, and these matrices turn out to be exactly the ones we need.
For reference, we state their main properties in Lemma 1 below (see Theorem 4
Section 5.4).
Lemma 1
Let G denote a k × n matrix over a finite field F. The following conditions are equivalent:
1. rank G = k.
2. The columns of G span F k.
3. The rows of G are independent in F n.
4. The system Gx = b is consistent for every column b in F k.
5. GK = Ik for some n × k matrix K.
PROOF
(1) ⇒ (2). This is because dim(col G) = k by (1).
(2) ⇒ (4). G[x1 ⋯ xn]ᵀ = x1c1 + ⋯ + xncn where cj is column j of G.
(4) ⇒ (5). G[k1 ⋯ kk] = [Gk1 ⋯ Gkk] for columns kj.
(5) ⇒ (3). If a1R1 + ⋯ + akRk = 0 where Ri is row i of G, then [a1 ⋯ ak]G = 0,
so [a1 ⋯ ak] = 0, by (5). Hence each ai = 0, proving (3).
(3) ⇒ (1). rank G = dim(row G) = k by (3).
Note that Theorem 4 Section 5.4 asserts that, over the real field ℝ, the properties in
Lemma 1 hold if and only if GGᵀ is invertible. But this need not be true in general.
For example, if F = ℤ2 and G = [1 0 1 0; 0 1 0 1], then GGᵀ = 0. The reason is that the
dot product w · w can be zero for w in F n even if w ≠ 0. However, even though
GGᵀ is not invertible, we do have GK = I2 for some 4 × 2 matrix K over F as
Lemma 1 asserts (in fact, K = [1 0 0 0; 0 1 0 0]ᵀ is one such matrix).
Let C ⊆ F n be an (n, k)-code over a finite field F. If {w1, …, wk} is a basis
of C, let G be the k × n matrix with the wi as its rows. Let {e1, …, ek}
be the standard basis of F k regarded as rows. Then wi = eiG for each i, so
C = span{w1, …, wk} = span{e1G, …, ekG}. It follows (verify) that
C = {uG | u in F k}.
Because of this, the k × n matrix G is called a generator of the code C, and G has
rank k by Lemma 1 because its rows wi are independent.
In fact, every linear code C in F n has a generator of a simple, convenient form.
If G is a generator matrix for C, let R be the reduced row-echelon form of G. We
claim that C is also generated by R. Since G → R by row operations, Theorem 1
Section 2.5 shows that these same row operations [G Ik] → [R W], performed on
[G Ik], produce an invertible k × k matrix W such that R = WG. This shows that
C = {uR | u in F k}. [In fact, if u is in F k, then uG = u1R where u1 = uW -1 is in
F k, and uR = u2G where u2 = uW is in F k]. Thus R is a generator of C, so we may
assume that G is in reduced row-echelon form.
In that case, G has no row of zeros (since rank G = k) and so contains all the
columns of Ik. Hence a series of column interchanges will carry G to the block form
G′ = [Ik A] for some k × (n - k) matrix A. Hence the code C′ = {uG′ | u in F k} is
essentially the same as C; the code words in C′ are obtained from those in C by a
series of column interchanges. Hence if C is a linear (n, k)-code, we may (and shall)
assume that the generator matrix G has the form
G = [Ik A] for some k × (n - k) matrix A.
Such a matrix is called a standard generator, or a systematic generator, for the
code C. In this case, if u is a message word in F k, the first k digits of the encoded
word uG are just the first k digits of u, so retrieval of u from uG is very simple
indeed. The last n - k digits of uG are called parity digits.
Parity-Check Matrices
We begin with an important theorem about matrices over a finite field.
Theorem 6
2. HGT = 0.
3. C = {w in F n | wHT = 0}.
4. D = {w in F n | wGT = 0}.
PROOF
First, (1) ⇔ (2) holds because HGT and GHT are transposes of each other.
(1) ⇒ (3). Consider the linear transformation T : F n → F n-k defined by
T(w) = wHT for all w in F n. To prove (3) we must show that C = ker T.
We have C ⊆ ker T by (1) because T(uG) = uGHT = 0 for all u in F k. Since
dim C = rank G = k, it is enough (by Theorem 2 Section 6.4) to show that
dim(ker T ) = k. However the dimension theorem (Theorem 4 Section 7.2)
shows that dim(ker T ) = n - dim(im T ), so it is enough to show that
dim(im T ) = n - k. But if R1, …, Rn are the rows of HT, then block
multiplication gives
im T = {wHT | w in F n} = span{R1, …, Rn} = row(HT).
Hence dim(im T ) = rank(HT) = rank H = n - k, as required. This proves (3).
(3) ⇒ (1). If u is in F k, then uG is in C so, by (3), u(GHT) = (uG)HT = 0.
Since u is arbitrary in F k, it follows that GHT = 0.
(2) ⇔ (4). The proof is analogous to (1) ⇔ (3).
GHᵀ = [Ik A][-A; In-k] = -A + A = 0
is called the syndrome. The receiver knows v and s = HvT, and wants to recover c.
Since c = v - z, it is enough to find z. But the possibilities for z are the solutions of
the linear system
HzT = s
where s is known. Now recall that Theorem 3 Section 2.2 shows that these
solutions have the form z = x + s where x is any solution of the homogeneous
system HxT = 0, that is, x is any word in C (by Lemma 1). In other words, the
errors z are the elements of the set
C + s = {c + s | c in C}.
The set C + s is called a coset of C. Let |F| = q. Since |C + s| = |C| = q^(n-k), the
search for z is reduced from q^n possibilities in F n to q^(n-k) possibilities in C + s.
This is called syndrome decoding, and various methods for improving efficiency
and accuracy have been devised. The reader is referred to books on coding for
more details.15
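As a concrete sketch (the (5, 2)-code here is a toy example of mine, not one from the text), the following NumPy code builds a standard generator and parity-check pair over ℤ2, then computes the syndrome of a corrupted word:

```python
import numpy as np

# A standard generator G = [I_2 A] over Z_2 with a hypothetical choice of A;
# the parity-check matrix is H = [A^T I_3] (the minus signs vanish mod 2).
A = np.array([[1, 1, 0],
              [0, 1, 1]])
G = np.hstack([np.eye(2, dtype=int), A])
H = np.hstack([A.T, np.eye(3, dtype=int)])
assert not (G @ H.T % 2).any()          # GH^T = 0, as in Theorem 6

c = np.array([1, 0, 1, 1, 0])           # transmitted code word (message u = 10)
z = np.array([0, 1, 0, 0, 0])           # error in the second digit
v = (c + z) % 2                         # received word

s = H @ v % 2                           # the syndrome s = Hv^T
print(s)                                # equals Hz^T: column 2 of H
assert (s == H @ z % 2).all()
```

The receiver then searches the coset determined by s for the most plausible (lowest-weight) error z.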
Orthogonal Codes
Let F be a finite field. Given two words v = a1a2⋯an and w = b1b2⋯bn in F n, the
dot product v · w is defined (as in ℝⁿ) by
v · w = a1b1 + a2b2 + ⋯ + anbn.
Note that v · w is an element of F, and it can be computed as a matrix product:
v · w = vwT.
If C ⊆ F n is an (n, k)-code, the orthogonal complement C⊥ is defined as in ℝⁿ:
C⊥ = {v in F n | v · c = 0 for all c in C}.
This is easily seen to be a subspace of F n, and it turns out to be an (n, n-k)-code.
This follows when F = ℝ because we showed (in the projection theorem) that
n = dim U⊥ + dim U for any subspace U of ℝⁿ. However the proofs break down
for a finite field F because the dot product in F n has the property that w · w = 0
can happen even if w ≠ 0. Nonetheless, the result remains valid.
Theorem 7
Let C be an (n, k)-code over a finite field F, let G = [Ik A] be a standard generator for C
where A is k × (n - k), and write H = [-AT In-k] for the parity-check matrix. Then:
1. H is a generator of C⊥.
2. dim(C⊥) = n - k = rank H.
3. C⊥⊥ = C and dim(C⊥) + dim C = n.
PROOF
As in Theorem 6, let D = {vH | v in F n-k} denote the code generated by H.
Observe first that, for all w in F n and all u in F k, we have
w · (uG) = w(uG)T = w(GTuT) = (wGT) · u.
15 For an elementary introduction, see V. Pless, Introduction to the Theory of Error-Correcting Codes, 3rd ed., (New York: Wiley, 1998).
For a more detailed treatment, see A. A. Bruen and M. A. Forcinito, Cryptography, Information Theory, and Error-Correction, (New
York: Wiley, 2005).
EXERCISES 8.7
5. Show that the entries of the last column of the multiplication table of ℤn are
0, n - 1, n - 2, …, 2, 1 in that order.

6. In each case show that the matrix A is invertible over the given field, and find A⁻¹.
(a) A = [1 4; 2 1] over ℤ5.
(b) A = [5 6; 4 3] over ℤ7.

7. Consider the linear system
3x + y + 4z = 3
4x + 3y + z = 1
In each case solve the system by reducing the augmented matrix to reduced
row-echelon form over the given field:
(a) ℤ5. (b) ℤ7.

8. Let K be a vector space over ℤ2 with basis {1, t}, so K = {a + bt | a, b in ℤ2}.
It is known that K becomes a field of four elements if we define t² = 1 + t.
Write down the multiplication table of K.

9. Let K be a vector space over ℤ3 with basis {1, t}, so K = {a + bt | a, b in ℤ3}.
It is known that K becomes a field of nine elements if we define t² = -1 in ℤ3.
In each case find the inverse of the element x of K: …

12. … that the binary (10, 3)-code generated by G corrects two errors, where one
row of G is 0011010111. [It can be shown that no binary (9, 3)-code corrects
two errors.]

13. (a) Show that no binary linear (4, 2)-code can correct single errors.
(b) Find a binary linear (5, 2)-code that can correct one error.

14. Find the standard generator matrix G and the parity-check matrix H for each
of the following systematic codes:
(a) {00000, 11111} over ℤ2.
(b) Any systematic (n, 1)-code where n ≥ 2.
(c) The code in Exercise 10(a).
(d) The code in Exercise 10(b).

15. Let c be a word in F n. Show that Bt(c) = c + Bt(0), where we write
c + Bt(0) = {c + v | v in Bt(0)}.

16. If an (n, k)-code has two standard generator matrices G and G1, show that G = G1.

17. Let C be a binary linear n-code (over ℤ2). Show that either each word in C has
even weight, or half the words in C have even weight and half have odd weight.
[Hint: The dimension theorem.]
Definition 8.13 A quadratic form q in the n variables x1, x2, …, xn is a linear combination of terms
x1², x2², …, xn², and cross terms x1x2, x1x3, x2x3, … .
EXAMPLE 1
Write q = x1² + 3x3² + 2x1x2 - x1x3 in the form q(x) = xᵀAx, where A is a
symmetric 3 × 3 matrix.
Solution ► The cross terms are 2x1x2 = x1x2 + x2x1 and -x1x3 = -½x1x3 - ½x3x1.
Of course, x2x3 and x3x2 both have coefficient zero, as does x2². Hence
q(x) = [x1 x2 x3] [1 1 -½; 1 0 0; -½ 0 3] [x1; x2; x3]
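A quick numerical check (NumPy; the sample point is mine) confirms that this symmetric matrix reproduces q:

```python
import numpy as np

# The symmetric matrix of q = x1^2 + 3*x3^2 + 2*x1*x2 - x1*x3 from Example 1.
A = np.array([[ 1.0, 1.0, -0.5],
              [ 1.0, 0.0,  0.0],
              [-0.5, 0.0,  3.0]])

def q(x1, x2, x3):
    return x1**2 + 3*x3**2 + 2*x1*x2 - x1*x3

x = np.array([2.0, -1.0, 3.0])
print(x @ A @ x)                     # 21.0
assert np.isclose(x @ A @ x, q(*x))  # x^T A x agrees with q at the sample point
```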
We shall assume from now on that all quadratic forms are given by
q(x) = xTAx
where A is symmetric. Given such a form, the problem is to find new variables
y1, y2, …, yn, related to x1, x2, …, xn, with the property that when q is expressed
in terms of y1, y2, …, yn, there are no cross terms. If we write
y = (y1, y2, …, yn)T
this amounts to asking that q = yTDy where D is diagonal. It turns out that this
can always be accomplished and, not surprisingly, that D is the matrix obtained
when the symmetric matrix A is orthogonally diagonalized. In fact, as Theorem 2
Section 8.2 shows, a matrix P can be found that is orthogonal (that is, P-1 = PT)
and diagonalizes A:
PᵀAP = D = diag(λ1, λ2, …, λn)
The diagonal entries λ1, λ2, …, λn are the (not necessarily distinct) eigenvalues
of A, repeated according to their multiplicities in cA(x), and the columns of P are
corresponding (orthonormal) eigenvectors of A. As A is symmetric, the λi are real
by Theorem 7 Section 5.5.
Now define new variables y by the equations
x = Py equivalently y = PTx
Then substitution in q(x) = xᵀAx gives
q = (Py)ᵀA(Py) = yᵀ(PᵀAP)y = yᵀDy = λ1y1² + λ2y2² + ⋯ + λnyn²
so there are no cross terms, as desired. This discussion is summarized in:
Theorem 1
Diagonalization Theorem
Let q = xTAx be a quadratic form in the variables x1, x2, …, xn, where
x = (x1, x2, …, xn)T and A is a symmetric n × n matrix. Let P be an orthogonal
matrix such that PTAP is diagonal, and define new variables y = ( y1, y2, …, yn)T by
x = Py equivalently y = PTx
If q is expressed in terms of these new variables y1, y2, …, yn, the result is
q = λ1y1² + λ2y2² + ⋯ + λnyn²
where λ1, λ2, …, λn are the eigenvalues of A repeated according to their multiplicities.
x = Py = [f1 f2 ⋯ fn][y1 y2 ⋯ yn]ᵀ = y1f1 + y2f2 + ⋯ + ynfn.
Thus the new variables yi are the coefficients when x is expanded in terms of the
orthonormal basis {f1, …, fn} of n. In particular, the coefficients yi are given by
yi = x · fi by the expansion theorem (Theorem 6 Section 5.3). Hence q itself is
easily computed from the eigenvalues λi and the principal axes fi:
q = q(x) = λ1(x · f1)² + ⋯ + λn(x · fn)².
EXAMPLE 2
Find new variables y1, y2, y3, and y4 such that
q = 3(x1² + x2² + x3² + x4²) + 2x1x2 - 10x1x3 + 10x1x4 + 10x2x3 - 10x2x4 + 2x3x4
has diagonal form, and find the corresponding principal axes.
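The solution proceeds by orthogonally diagonalizing the symmetric matrix of q. Numerically (a NumPy sketch, not the text's worked solution), the eigenvalues come out as 12, 4, 4, and -8:

```python
import numpy as np

# Symmetric matrix of the form in Example 2 (diagonal 3's, halved cross terms).
A = np.array([[ 3.0,  1.0, -5.0,  5.0],
              [ 1.0,  3.0,  5.0, -5.0],
              [-5.0,  5.0,  3.0,  1.0],
              [ 5.0, -5.0,  1.0,  3.0]])

lams, P = np.linalg.eigh(A)                  # P orthogonal, eigenvalues ascending
assert np.allclose(P.T @ A @ P, np.diag(lams))
print(np.round(lams))                        # [-8.  4.  4. 12.]
# In the variables y = P^T x the form is q = -8*y1^2 + 4*y2^2 + 4*y3^2 + 12*y4^2
# (up to the ordering of the eigenvalues); the columns of P are principal axes.
```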
f1 = Rθ(e1) = [cos θ; sin θ] and f2 = Rθ(e2) = [-sin θ; cos θ]   (∗)
Given a point p = [x1; x2] = x1e1 + x2e2 in the original system, let y1 and y2 be the
coordinates of p in the new system (see the diagram). That is,
[x1; x2] = p = y1f1 + y2f2 = [cos θ -sin θ; sin θ cos θ][y1; y2]   (∗∗)
Theorem 2
Consider the quadratic form q = ax1² + bx1x2 + cx2² where a, c, and b² - 4ac are all
nonzero.
1. There is a counterclockwise rotation of the coordinate axes about the origin such
that, in the new coordinate system, q has no cross term.
2. The graph of the equation
ax1² + bx1x2 + cx2² = 1
is an ellipse if b² - 4ac < 0 and an hyperbola if b² - 4ac > 0.
PROOF
If b = 0, q already has no cross term and (1) and (2) are clear. So assume b ≠ 0.
S T
a _12 b
The matrix A = of q has characteristic polynomial
1
_
2
b c
___________
cA(x) = x2 - (a + c)x - _14 (b2 - 4ac). If we write d = √b2 + (a - c)2 for convenience;
then the quadratic formula gives the eigenvalues
λ1 = _12 [a + c - d] and λ2 = _12 [a + c + d]
with corresponding principal axes
f1 = (1/√(b² + (a - c - d)²)) [a - c - d, b]ᵀ and
f2 = (1/√(b² + (a - c - d)²)) [-b, a - c - d]ᵀ
as the reader can verify. These agree with equation (∗) above if θ is an angle
such that
cos θ = (a - c - d)/√(b² + (a - c - d)²) and sin θ = b/√(b² + (a - c - d)²)
Then P = [f1 f2] = [cos θ, -sin θ; sin θ, cos θ] diagonalizes A and equation (∗∗) becomes the
formula x = Py in Theorem 1. This proves (1).
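The formulas in the proof can be checked numerically. A sketch with illustrative coefficients a, b, c (any values with b ≠ 0 would serve):

```python
import numpy as np

# Illustrative coefficients for q = a x1^2 + b x1 x2 + c x2^2, with b != 0.
a, b, c = 1.0, 4.0, 2.0
d = np.sqrt(b**2 + (a - c)**2)

# Principal axes from the proof: columns of the rotation P = [f1 f2].
n = np.sqrt(b**2 + (a - c - d)**2)
cos_t, sin_t = (a - c - d) / n, b / n
P = np.array([[cos_t, -sin_t],
              [sin_t,  cos_t]])

# The symmetric matrix of q, and its diagonalization by the rotation P.
A = np.array([[a, b / 2],
              [b / 2, c]])
D = P.T @ A @ P   # should be diag(lam1, lam2): no cross term remains

lam1, lam2 = (a + c - d) / 2, (a + c + d) / 2
assert np.allclose(D, np.diag([lam1, lam2]))
```

The off-diagonal entries of D vanish, confirming that the rotation through θ eliminates the cross term.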
EXAMPLE 3
the ellipse (see the diagram). The eigenvectors f1 = (1/√2)[1, -1]ᵀ and
f2 = (1/√2)[-1, -1]ᵀ point along these axes of symmetry, and this is the reason
for the name principal axes.
Congruence
We return to the study of quadratic forms in general.
Theorem 3
PROOF
Let q(x) = xTBx for all x where BT = B. If C = A - B, then CT = C and
xTCx = 0 for all x. We must show that C = 0. Given y in ℝⁿ,
0 = (x + y)TC(x + y) = xTCx + xTCy + yTCx + yTCy
= xTCy + yTCx
But yTCx = (xTCy)T = xTCy (it is 1 × 1). Hence xTCy = 0 for all x and y in ℝⁿ.
If ej is column j of In, then the (i, j)-entry of C is eiTCej = 0. Thus C = 0.
1. A ∼ᶜ A for all A.
2. If A ∼ᶜ B, then B ∼ᶜ A.
3. If A ∼ᶜ B and B ∼ᶜ C, then A ∼ᶜ C.
4. If A ∼ᶜ B, then A is symmetric if and only if B is symmetric.
5. If A ∼ᶜ B, then rank A = rank B.
The converse to (5) can fail even for symmetric matrices.
EXAMPLE 4
The key distinction between A and B in Example 4 is that A has two positive
eigenvalues (counting multiplicities) whereas B has only one.
Theorem 4
Then D0TDD0 = Dn(k, r), so if new variables z are given by x = (P0D0)z, we obtain
q(z) = zTDn(k, r)z = z1² + ⋯ + zk² - zk+1² - ⋯ - zr²
as required. Note that the change-of-variables matrix P0D0 from z to x has
orthogonal columns (in fact, scalar multiples of the columns of P0).
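The construction U = P0D0 can be sketched numerically. The matrix A below is an illustrative choice with index k = 1 and rank r = 2:

```python
import numpy as np

# Symmetric A with eigenvalues 3 and -1: index k = 1, rank r = 2.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

lam, P0 = np.linalg.eigh(A)     # P0 orthogonal, P0^T A P0 = diag(lam)
# Sort so positive eigenvalues come first, as in the theorem.
idx = np.argsort(-lam)
lam, P0 = lam[idx], P0[:, idx]

# D0 scales each principal axis by 1/sqrt(|lam_i|) (all lam_i != 0 here).
D0 = np.diag(1.0 / np.sqrt(np.abs(lam)))
U = P0 @ D0                     # change of variables x = Uz

# U^T A U = D_n(k, r): k ones followed by r - k minus ones on the diagonal.
assert np.allclose(U.T @ A @ U, np.diag([1.0, -1.0]))
```

Note that the columns of U are scalar multiples of the orthonormal columns of P0, as the text observes.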
EXAMPLE 5
Completely diagonalize the quadratic form q in Example 2 and find the index
and rank.
Theorem 5
PROOF
1. If A has index k and rank r, take U = P0D0 where P0 and D0 are as described
prior to Example 5. Then UTAU = Dn(k, r). The converse is true because
Dn(k, r) has index k and rank r (using Theorem 4).
2. If A and B both have index k and rank r, then A ∼ᶜ Dn(k, r) ∼ᶜ B by (1). The
converse was given earlier.
PROOF OF THEOREM 4
c c
By Theorem 1, A ∼ D1 and B ∼ D2 where D1 and D2 are diagonal and have the
c c
same eigenvalues as A and B, respectively. We have D1 ∼ D2, (because A ∼ B),
so we may assume that A and B are both diagonal. Consider the quadratic form
q(x) = xTAx. If A has k positive eigenvalues, q has the form
q(x) = a1x1² + ⋯ + akxk² - ak+1xk+1² - ⋯ - arxr², ai > 0
where r = rank A = rank B. The subspace W1 = {x | xk+1 = ⋯ = xr = 0} of ℝⁿ
has dimension n - r + k and satisfies q(x) > 0 for all x ≠ 0 in W1.
On the other hand, if B = UTAU, define new variables y by x = Uy. If B has
k′ positive eigenvalues, q has the form
q(x) = b1y1² + ⋯ + bk′yk′² - bk′+1yk′+1² - ⋯ - bryr², bi > 0
Let f1, …, fn denote the columns of U. They are a basis of ℝⁿ and
x = Uy = [f1 ⋯ fn][y1, …, yn]ᵀ = y1f1 + ⋯ + ynfn
Hence the subspace W2 = span{fk′+1, …, fr} satisfies q(x) < 0 for all x ≠ 0
in W2. Note that dim W2 = r - k′. It follows that W1 and W2 have only
the zero vector in common. Hence, if B1 and B2 are bases of W1 and W2,
respectively, then (Exercise 33 Section 6.3) B1 ∪ B2 is an independent set of
(n - r + k) + (r - k′) = n + k - k′ vectors in ℝⁿ. This implies that k ≤ k′,
and a similar argument shows k′ ≤ k.
EXERCISES 8.8
1. In each case, find a symmetric matrix A such that q = xTBx takes the form q = xTAx.
(a) [1 1; 0 1]
(b) [1 1; -1 2]
(c) [1 0 1; 1 1 0; 0 1 1]
(d) [1 2 -1; 4 1 0; 5 -2 3]
2. In each case, find a change of variables that will diagonalize the quadratic form q. Determine the index and rank of q.
(a) q = x1² + 2x1x2 + x2²
(b) q = x1² + 4x1x2 + x2²
(c) q = x1² + x2² + x3² - 4(x1x2 + x1x3 + x2x3)
(d) q = 7x1² + x2² + x3² + 8x1x2 + 8x1x3 - 16x2x3
3. For each of the following, write the equation in terms of new variables so that it is in standard position, and identify the curve.
(a) xy = 1
(b) 3x² - 4xy = 2
(c) 6x² + 6xy - 2y² = 5
(d) 2x² + 4xy + 5y² = 1
4. Consider the equation ax² + bxy + cy² = d, where b ≠ 0. Introduce new variables x1 and y1 by rotating the axes counterclockwise through an angle θ. Show that the resulting equation has no x1y1-term if θ is given by
cos 2θ = (a - c)/√(b² + (a - c)²).
5. Prove properties (1)–(5) preceding Example 4.
6. If A ∼ᶜ B, show that A is invertible if and only if B is invertible.
7. If x = (x1, …, xn)T is a column of variables, A = AT is n × n, B is 1 × n, and c is a constant, xTAx + Bx = c is called a quadratic equation in the variables xi.
(a) Show that new variables y1, …, yn can be found such that the equation takes the form
λ1y1² + ⋯ + λryr² + k1y1 + ⋯ + knyn = c.
(b) Write x1² + 3x2² + 3x3² + 4x1x2 - 4x1x3 + 5x1 - 6x3 = 7 in this form and find variables y1, y2, y3 as in (a).
8. Given a symmetric matrix A, define qA(x) = xTAx. Show that B ∼ᶜ A if and only if B is symmetric and there is an invertible matrix U such that qB(x) = qA(Ux) for all x. [Hint: Theorem 3.]
9. Let q(x) = xTAx be a quadratic form, A = AT.
(a) Show that q(x) > 0 for all x ≠ 0, if and only if A is positive definite (all eigenvalues are positive). In this case, q is called positive definite.
(b) Show that new variables y can be found such that q = ‖y‖² and y = Ux where U is upper triangular with positive diagonal entries. [Hint: Theorem 3 Section 8.3.]
10. A bilinear form β on ℝⁿ is a function that assigns to every pair x, y of columns in ℝⁿ a number β(x, y) in ℝ in such a way that
β(rx + sy, z) = rβ(x, z) + sβ(y, z)
β(x, ry + sz) = rβ(x, y) + sβ(x, z)
for all x, y, z in ℝⁿ and r, s in ℝ. If β(x, y) = β(y, x) for all x, y, β is called symmetric.
(a) If β is a bilinear form, show that an n × n matrix A exists such that β(x, y) = xTAy for all x, y.
(b) Show that A is uniquely determined by β.
(c) Show that β is symmetric if and only if A = AT.
EXAMPLE 1
Hence the goal is to find the largest value of c for which the graph of
q(x1, x2) = c contains a feasible point.
The choice of the function q depends upon many factors; we will show
how to solve the problem for any quadratic form q (even with more than two
variables). In the diagram the function q is given by
q(x1, x2) = x1x2,
and the graphs of q(x1, x2) = c are shown for c = 1 and c = 2. As c increases the
graph of q(x1, x2) = c moves up and to the right. From this it is clear that there
will be a solution for some value of c between 1 and 2 (in fact the largest value
is c = ½√15 = 1.94 to two decimal places).
Theorem 1
PROOF
Since A is symmetric, let the (real) eigenvalues λi of A be ordered as to size as
follows: λ1 ≥ λ2 ≥ ⋯ ≥ λn. By the principal axis theorem, let P be an orthogonal
matrix such that PTAP = D = diag(λ1, λ2, …, λn). Define y = PTx, equivalently
x = Py, and note that ‖y‖ = ‖x‖ because ‖y‖² = yTy = xT(PPT)x = xTx = ‖x‖².
If we write y = (y1, y2, …, yn)T, then
q(x) = q(Py) = (Py)TA(Py)
= yT(PTAP)y = yTDy
= λ1y1² + λ2y2² + ⋯ + λnyn². (∗)
SECTION 8.9 An Application to Constrained Optimization 433
The set of all vectors x in ℝⁿ such that ‖x‖ ≤ 1 is called the unit ball. If n = 2,
it is often called the unit disk and consists of the unit circle and its interior; if n = 3,
it is the unit sphere and its interior. It is worth noting that the maximum value of a
quadratic form q(x) as x ranges throughout the unit ball is (by Theorem 1) actually
attained for a unit vector x on the boundary of the unit ball.
Theorem 1 is important for applications involving vibrations in areas as diverse
as aerodynamics and particle physics, and the maximum and minimum values in the
theorem are often found using advanced calculus to minimize the quadratic form
on the unit ball. The algebraic approach using the principal axis theorem gives a
geometrical interpretation of the optimal values because they are eigenvalues.
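For the form of Example 2 below, Theorem 1's conclusion can be checked numerically. A sketch (the random sampling is only illustrative):

```python
import numpy as np

# q(x) = 3x1^2 + 14x1x2 + 3x2^2 has symmetric matrix A below; by Theorem 1
# its max and min over the unit ball are the extreme eigenvalues of A,
# attained at unit eigenvectors on the boundary.
A = np.array([[3.0, 7.0],
              [7.0, 3.0]])
lam, F = np.linalg.eigh(A)           # eigenvalues in ascending order

# Sample many unit vectors and compare with the eigenvalue bounds.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', X, A, X)   # q evaluated at each sample

assert vals.max() <= lam[-1] + 1e-9 and vals.min() >= lam[0] - 1e-9
# The bounds are attained at the eigenvectors themselves.
assert np.isclose(F[:, -1] @ A @ F[:, -1], lam[-1])
assert np.isclose(F[:, 0] @ A @ F[:, 0], lam[0])
```

Here the eigenvalues are 10 and -4, so q ranges over [-4, 10] on the unit ball.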
EXAMPLE 2
Maximize and minimize the form q(x) = 3x1² + 14x1x2 + 3x2² subject to ‖x‖ ≤ 1.
EXAMPLE 3
A manufacturer makes x1 units of product 1, and x2 units of product 2, at a
profit of $70 and $50 per unit respectively, and wants to choose x1 and x2 to
maximize the total profit p(x1, x2) = 70x1 + 50x2. However x1 and x2 are not
arbitrary; for example, x1 ≥ 0 and x2 ≥ 0. Other conditions also come into play.
Each unit of product 1 costs $1200 to produce and requires 2000 square feet of
warehouse space; each unit of product 2 costs $1300 to produce and requires
1100 square feet of space. If the total warehouse space is 11 300 square feet,
and if the total production budget is $8700, x1 and x2 must also satisfy the
conditions
1200x1 + 1300x2 = 8700
2000x1 + 1100x2 = 11300
[Diagram: the region in the x1x2-plane bounded by these constraint lines, with corner point (4, 3) and level lines of the profit p.]
18 A good introduction can be found at http://www.mcgrawhill.ca/olc/nicholson, and more information is available in “Linear
Programming and Extensions” by N. Wu and R. Coppins, McGraw-Hill, 1981.
SECTION 8.10 An Application to Statistical Principal Component Analysis 435
19 Hence E( ) is a linear transformation from the vector space of all random variables to the space of real numbers.
20 If X and Y are independent in the sense of probability theory, then they are uncorrelated; however, the converse is not true in
general.
Chapter 9
Change of Basis
If A is an m × n matrix, the corresponding matrix transformation TA : ℝⁿ → ℝᵐ
is defined by
TA(x) = Ax for all columns x in ℝⁿ.
It was shown in Theorem 2 Section 2.6 that every linear transformation
T : ℝⁿ → ℝᵐ is a matrix transformation; that is, T = TA for some m × n
matrix A. Furthermore, the matrix A is uniquely determined by T. In fact A
is given in terms of its columns by
A = [T(e1) T(e2) ⋯ T(en)]
where {e1, e2, …, en} is the standard basis of ℝⁿ.
In this chapter we show how to associate a matrix with any linear transformation
T : V → W where V and W are finite-dimensional vector spaces, and we describe
how the matrix can be used to compute T(v) for any v in V. The matrix depends on
the choice of a basis B in V and a basis D in W, and is denoted MDB (T ). The case
when W = V is particularly important. If B and D are two bases of V, we show that
the matrices MBB(T ) and MDD(T ) are similar, that is MDD(T ) = P-1MBB(T )P for
some invertible matrix P. Moreover, we give an explicit method for constructing
P depending only on the bases B and D. This leads to some of the most important
theorems in linear algebra, as we shall see in Chapter 11.
CB(v) = CB(v1b1 + v2b2 + ⋯ + vnbn) = [v1, v2, …, vn]ᵀ
The reason for writing CB(v) as a column instead of a row will become clear later.
Note that CB(bi) = ei is column i of In.
EXAMPLE 1
The coordinate vector for v = (2, 1, 3) with respect to the ordered basis
B = {(1, 1, 0), (1, 0, 1), (0, 1, 1)} of ℝ³ is CB(v) = [0, 2, 1]ᵀ because
v = (2, 1, 3) = 0(1, 1, 0) + 2(1, 0, 1) + 1(0, 1, 1).
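Finding CB(v) amounts to solving a linear system whose coefficient matrix has the basis vectors as columns. A NumPy sketch using Example 1's data:

```python
import numpy as np

# Columns of B are the basis vectors (1,1,0), (1,0,1), (0,1,1).
B = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]]).T
v = np.array([2.0, 1.0, 3.0])

# C_B(v) solves B @ coords = v.
coords = np.linalg.solve(B, v)
assert np.allclose(coords, [0.0, 2.0, 1.0])
```

This works because the basis vectors are independent, so the matrix B is invertible.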
Theorem 1
If V has dimension n and B = {b1, b2, …, bn} is any ordered basis of V, the coordinate
transformation CB : V → ℝⁿ is an isomorphism. In fact, CB⁻¹ : ℝⁿ → V is given by
CB⁻¹[v1, v2, …, vn]ᵀ = v1b1 + v2b2 + ⋯ + vnbn for all [v1, v2, …, vn]ᵀ in ℝⁿ.
PROOF
The verification that CB is linear is Exercise 13. If T : ℝⁿ → V is the map denoted
CB⁻¹ in the theorem, one verifies (Exercise 13) that TCB = 1V and CBT = 1ℝⁿ.
Note that CB(bj) is column j of the identity matrix, so CB carries the basis B to
the standard basis of ℝⁿ, proving again that it is an isomorphism (Theorem 1
Section 7.3).
so Theorem 2 Section 2.6 shows that a unique m × n matrix A exists such that
CDTCB⁻¹ = TA, equivalently CDT = TACB
TA acts by left multiplication by A, so this latter condition is
CD[T(v)] = ACB(v) for all v in V
This requirement completely determines A. Indeed, the fact that CB(bj) is column j
of the identity matrix gives
column j of A = ACB(bj) = CD[T(bj)]
Definition 9.2 This is called the matrix of T corresponding to the ordered bases B and D, and
we use the following notation:
MDB(T ) = [CD[T(b1)] CD[T(b2)] ⋯ CD[T(bn)]]
Theorem 2
EXAMPLE 2
Define T : P2 → ℝ² by T(a + bx + cx2) = (a + c, b - a - c) for all polynomials
a + bx + cx2. If B = {b1, b2, b3} and D = {d1, d2} where
b1 = 1, b2 = x, b3 = x2 and d1 = (1, 0), d2 = (0, 1)
compute MDB(T ) and verify Theorem 2.
Solution ► We have T(b1) = d1 - d2, T(b2) = d2, and T(b3) = d1 - d2. Hence
MDB(T ) = [CD[T(b1)] CD[T(b2)] CD[T(b3)]] = [1 0 1; -1 1 -1].
If v = a + bx + cx2 = ab1 + bb2 + cb3, then T(v) = (a + c)d1 + (b - a - c)d2, so
CD[T(v)] = [a + c, b - a - c]ᵀ = [1 0 1; -1 1 -1][a, b, c]ᵀ = MDB(T )CB(v)
as Theorem 2 asserts.
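The column-by-column construction of MDB(T) in Example 2 can be sketched numerically (the test polynomial v is an illustrative choice):

```python
import numpy as np

# Example 2: T(a + bx + cx^2) = (a + c, b - a - c), expressed on coordinates
# in B = {1, x, x^2}; D is the standard basis of R^2.
def T(p):
    a, b, c = p
    return np.array([a + c, b - a - c])

# Column j of M_DB(T) is C_D[T(b_j)]; here C_B(1), C_B(x), C_B(x^2) = I_3.
basis_B = np.eye(3)
M = np.column_stack([T(bj) for bj in basis_B])
assert np.allclose(M, [[1, 0, 1], [-1, 1, -1]])

# Verify C_D[T(v)] = M C_B(v) for v = 2 + 3x - x^2.
v = np.array([2.0, 3.0, -1.0])
assert np.allclose(T(v), M @ v)
```

The matrix agrees with the one computed in Example 2, and multiplying by it reproduces the action of T on coordinates.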
SECTION 9.1 The Matrix of a Linear Transformation 439
The next example shows how to determine the action of a transformation from
its matrix.
EXAMPLE 3
Suppose T : M22(ℝ) → ℝ³ is linear with matrix
MDB(T) = [1 -1 0 0; 0 1 -1 0; 0 0 1 -1]
where B and D are ordered bases of M22(ℝ) and ℝ³. Then
CD[T(v)] = MDB(T )CB(v) = [1 -1 0 0; 0 1 -1 0; 0 0 1 -1][a, b, c, d]ᵀ = [a - b, b - c, c - d]ᵀ
EXAMPLE 4
Let A be an m × n matrix, and let TA : ℝⁿ → ℝᵐ be the matrix transformation
induced by A: TA(x) = Ax for all columns x in ℝⁿ. If B and D are the standard
bases of ℝⁿ and ℝᵐ, respectively (ordered as usual), then
MDB(TA) = A
In other words, the matrix of TA corresponding to the standard bases is A itself.
EXAMPLE 5
Let V and W have ordered bases B and D, respectively. Let dim V = n.
1. The identity transformation 1V : V → V has matrix MBB(1V) = In.
2. The zero transformation 0 : V → W has matrix MDB(0) = 0.
The first result in Example 5 is false if the two bases of V are not equal. In fact, if
B is the standard basis of ℝⁿ, then the basis D of ℝⁿ can be chosen so that MBD(1ℝⁿ)
turns out to be any invertible matrix we wish (Exercise 14).
The next two theorems show that composition of linear transformations is
compatible with multiplication of the corresponding matrices.
Theorem 3
Let T : V → W and S : W → U be linear transformations, and let B, D, and E be finite
ordered bases of V, W, and U, respectively. Then
MEB(ST ) = MED(S) · MDB(T )
PROOF
We use the property in Theorem 2 three times. If v is in V,
MED(S)MDB(T )CB(v) = MED(S)CD[T(v)] = CE[ST(v)] = MEB(ST )CB(v)
If B = {e1, …, en}, then CB(ej) is column j of In. Hence taking v = ej shows that
MED(S)MDB(T ) and MEB(ST ) have equal jth columns. The theorem follows.
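Theorem 3 can be checked numerically; the sketch below uses the maps of Exercise 5(a) with standard bases:

```python
import numpy as np

# T : R^3 -> R^2, T(a, b, c) = (a + b, b - c)
# S : R^2 -> R^4, S(a, b) = (a, b - 2a, 3b, a + b)
MT = np.array([[1, 1, 0],
               [0, 1, -1]])     # M_DB(T), standard bases
MS = np.array([[1, 0],
               [-2, 1],
               [0, 3],
               [1, 1]])         # M_ED(S)

def ST(v):
    # The composite S(T(v)) computed directly from the formulas.
    a, b, c = v
    t = (a + b, b - c)
    return np.array([t[0], t[1] - 2 * t[0], 3 * t[1], t[0] + t[1]])

v = np.array([2, 5, -3])
# M_EB(ST) = M_ED(S) M_DB(T): the product matrix acts like the composite.
assert np.allclose(ST(v), (MS @ MT) @ v)
```

Composition of linear transformations thus corresponds to multiplication of their matrices, as the theorem states.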
Theorem 4
PROOF
(1) ⇒ (2). Applying Theorem 3 and Example 5 to T : V → W and T-1 : W → V gives
MBD(T-1)MDB(T ) = MBB(T-1T ) = MBB(1V) = In
Similarly, MDB(T )MBD(T-1) = In, proving (2) (and the last statement in the
theorem).
(2) ⇒ (3). This is clear.
(3) ⇒ (1). Suppose that MDB(T ) is invertible for some bases B and D and, for
convenience, write A = MDB(T ). Then we have CDT = TACB by Theorem 2, so
T = (CD)–1TACB
by Theorem 1 where (CD)–1 and CB are isomorphisms. Hence (1) follows if we
can show that TA : n → n is also an isomorphism. But A is invertible by (3)
and one verifies that TATA–1 = 1n = TA–1TA. So TA is indeed invertible (and
(TA)–1 = TA–1).
Theorem 5
PROOF
Write A = MDB(T ) for convenience. The column space of A
is U = {Ax | x in ℝⁿ}. Hence rank A = dim U and so, because
rank T = dim(im T ), it suffices to find an isomorphism S : im T → U.
Now every vector in im T has the form T(v), v in V. By Theorem 2,
CD[T(v)] = ACB(v) lies in U. So define S : im T → U by
S[T(v)] = CD[T(v)] for all vectors T(v) in im T
EXAMPLE 6
Define T : P2 → ℝ³ by T(a + bx + cx2) = (a - 2b, 3c - 2a, 3c - 4b) for
a, b, c ∈ ℝ. Compute rank T.
Solution ► Since rank T = rank [MDB(T )] for any bases B ⊆ P2 and D ⊆ ℝ³, we
choose the most convenient ones: B = {1, x, x2} and D = {(1, 0, 0), (0, 1, 0),
(0, 0, 1)}. Then MDB(T ) = [CD[T(1)] CD[T(x)] CD[T(x2)]] = A where
A = [1 -2 0; -2 0 3; 0 -4 3]. Since
A → [1 -2 0; 0 -4 3; 0 -4 3] → [1 -2 0; 0 1 -3/4; 0 0 0]
we have rank A = 2. Hence rank T = 2 as well.
EXAMPLE 7
Let T : V → W be a linear transformation where dim V = n and dim W = m.
Choose an ordered basis B = {b1, …, br, br+1, …, bn} of V in which
{br+1, …, bn} is a basis of ker T, possibly empty. Then {T(b1), …, T(br)} is a
basis of im T by Theorem 5 Section 7.2, so extend it to an ordered basis
D = {T(b1), …, T(br), fr+1, …, fm} of W. Because T(br+1) = ⋯ = T(bn) = 0,
we have
MDB(T ) = [CD[T(b1)] ⋯ CD[T(br)] CD[T(br+1)] ⋯ CD[T(bn)]] = [Ir 0; 0 0].
Incidentally, this shows that rank T = r by Theorem 5.
EXERCISES 9.1
(b) MDB(T) = [2 1 3; -1 0 -2]
3. In each case, find the matrix of T : V → W corresponding to the bases B and D of V and W, respectively.
(a) T : M22 → ℝ, T(A) = tr A; B = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}, D = {1}
(d) T : P2 → ℝ², T(a + bx + cx²) = (a + b, c); B = {1, x, x²}, D = {(1, -1), (1, 1)}; v = a + bx + cx²
(e) T : M22 → ℝ, T([a b; c d]) = a + b + c + d; B = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}, D = {1}; v = [a b; c d]
(f) T : M22 → M22, T([a b; c d]) = [a, b + c; b + c, d]; B = D = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}; v = [a b; c d]
(a) T : P2 → ℝ³, T(a + bx + cx²) = (a + c, c, b - c); B = {1, x, x²}, D = standard
(b) T : M22 → ℝ⁴, T([a b; c d]) = (a + b + c, b + c, c, d); B = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}, D = standard
5. In each case, verify Theorem 3. Use the standard basis in ℝⁿ and {1, x, x²} in P2.
(a) ℝ³ → ℝ² → ℝ⁴; T(a, b, c) = (a + b, b - c), S(a, b) = (a, b - 2a, 3b, a + b)
(b) ℝ³ → ℝ⁴ → ℝ²; T(a, b, c) = (a + b, c + b, a + c, b - a), S(a, b, c, d) = (a + b, c - d)
(c) P2 → ℝ³ → P2; T(a + bx + cx²) = (a, b - c, c - a), S(a, b, c) = b + cx + (a - c)x²
(d) ℝ³ → P2 → ℝ²; T(a, b, c) = (a - b) + (c - a)x + bx², S(a + bx + cx²) = (a - b, c)
6. Verify Theorem 3 for M22 → M22 → P2 (first T, then S) where
9. Let D : P3 → P2 be the differentiation map given by D[p(x)] = p′(x). Find the matrix of D corresponding to the bases B = {1, x, x², x³} and E = {1, x, x²}, and use it to compute D(a + bx + cx² + dx³).
10. Use Theorem 4 to show that T : V → V is not an isomorphism if ker T ≠ 0 (assume dim V = n). [Hint: Choose any ordered basis B containing a vector in ker T.]
11. Let T : V → ℝ be a linear transformation, and let D = {1} be the basis of ℝ. Given any
(b) Generalize to T : Pn → ℝⁿ⁺¹ where T(p) = (p(a0), p(a1), …, p(an)) and a0, a1, …, an are distinct real numbers. [Hint: Theorem 7 Section 3.2.]
17. Let T : Pn → Pn be defined by T[p(x)] = p(x) + xp′(x), where p′(x) denotes the derivative. Show that T is an isomorphism by finding MBB(T) when B = {1, x, x², …, xⁿ}.
18. If k is any number, define Tk : M22 → M22 by Tk(A) = A + kAT.
(a) If B = {[1 0; 0 0], [0 0; 0 1], [0 1; 1 0], [0 1; -1 0]}, find MBB(Tk), and conclude that Tk is invertible if k ≠ 1 and k ≠ -1.
(b) Repeat for Tk : M33 → M33. Can you generalize?
The remaining exercises require the following definitions. If V and W are vector spaces, the set of all linear transformations from V to W will be denoted by
L(V, W) = {T | T : V → W is a linear transformation}
Given S and T in L(V, W) and a in ℝ, define S + T : V → W and aT : V → W by
(S + T )(v) = S(v) + T(v) for all v in V
(aT )(v) = aT(v) for all v in V
19. Show that L(V, W) is a vector space.
20. Show that the following properties hold provided that the transformations link together in such a way that all the operations are defined.
(a) R(ST ) = (RS)T
(b) 1WT = T = T1V
(c) R(S + T ) = RS + RT
(d) (S + T )R = SR + TR
(e) (aS)T = a(ST ) = S(aT )
21. Given S and T in L(V, W), show that:
(a) ker S ∩ ker T ⊆ ker(S + T )
(b) im(S + T ) ⊆ im S + im T
22. Let V and W be vector spaces. If X is a subset of V, define
X⁰ = {T in L(V, W) | T(v) = 0 for all v in X}
(a) Show that X⁰ is a subspace of L(V, W).
(b) If X ⊆ X1, show that X1⁰ ⊆ X⁰.
(c) If U and U1 are subspaces of V, show that (U + U1)⁰ = U⁰ ∩ U1⁰.
23. Define R : Mmn → L(ℝⁿ, ℝᵐ) by R(A) = TA for each m × n matrix A, where TA : ℝⁿ → ℝᵐ is given by TA(x) = Ax for all x in ℝⁿ. Show that R is an isomorphism.
24. Let V be any vector space (we do not assume it is finite dimensional). Given v in V, define Sv : ℝ → V by Sv(r) = rv for all r in ℝ.
(a) Show that Sv lies in L(ℝ, V ) for each v in V.
(b) Show that the map R : V → L(ℝ, V ) given by R(v) = Sv is an isomorphism. [Hint: To show that R is onto, if T lies in L(ℝ, V ), show that T = Sv where v = T(1).]
25. Let V be a vector space with ordered basis B = {b1, b2, …, bn}. For each i = 1, 2, …, n, define Si : ℝ → V by Si(r) = rbi for all r in ℝ.
(a) Show that each Si lies in L(ℝ, V ) and Si(1) = bi.
(b) Given T in L(ℝ, V ), let T(1) = a1b1 + a2b2 + ⋯ + anbn, ai in ℝ. Show that T = a1S1 + a2S2 + ⋯ + anSn.
(c) Show that {S1, S2, …, Sn} is a basis of L(ℝ, V ).
26. Let dim V = n, dim W = m, and let B and D be ordered bases of V and W, respectively. Show that MDB : L(V, W) → Mmn is an isomorphism of vector spaces. [Hint: Let B = {b1, …, bn} and D = {d1, …, dm}. Given A = [aij] in Mmn, show that A = MDB(T) where T : V → W is defined by T(bj) = a1jd1 + a2jd2 + ⋯ + amjdm for each j.]
27. If V is a vector space, the space V∗ = L(V, ℝ) is called the dual of V. Given a basis B = {b1, b2, …, bn} of V, let Ei : V → ℝ for each i = 1, 2, …, n be the linear transformation satisfying
Ei(bj) = 0 if i ≠ j, and Ei(bj) = 1 if i = j
(each Ei exists by Theorem 3 Section 7.1). Prove the following:
(a) Ei(r1b1 + ⋯ + rnbn) = ri for each i = 1, 2, …, n
(b) v = E1(v)b1 + E2(v)b2 + ⋯ + En(v)bn for all v in V
SECTION 9.2 Operators and Similarity 445
Recall that if T : ℝⁿ → ℝⁿ is a linear operator and E = {e1, e2, …, en} is the standard
basis of ℝⁿ, then CE(x) = x for every x ∈ ℝⁿ, so ME(T ) = [T(e1), T(e2), …, T(en)] is
the matrix obtained in Theorem 2 Section 2.6. Hence ME(T ) will be called the
standard matrix of the operator T.
For reference the following theorem collects some results from Theorems 2,
3, and 4 in Section 9.1, specialized for operators. As before, CB(v) denotes the
coordinate vector of v with respect to the basis B.
Theorem 1
For a fixed operator T on a vector space V, we are going to study how the matrix
MB(T ) changes when the basis B changes. This turns out to be closely related to
how the coordinates CB(v) change for a vector v in V. If B and D are two ordered
bases of V, and if we take T = 1V in Theorem 2 Section 9.1, we obtain
CD(v) = MDB(1V)CB(v) for all v in V.
Definition 9.4 With this in mind, define the change matrix PD←B by
PD←B = MDB(1V) for any ordered bases B and D of V.
Theorem 2
Let B = {b1, b2, …, bn} and D denote ordered bases of a vector space V. Then the change
matrix PD←B is given in terms of its columns by
PD←B = [CD(b1) CD(b2) ⋯ CD(bn)] (∗)
and has the property that
CD(v) = PD←BCB(v) for all v in V. (∗∗)
Moreover, if E is another ordered basis of V, we have
1. PB←B = In.
2. PD←B is invertible and (PD←B)-1 = PB←D.
3. PE←DPD←B = PE←B.
PROOF
The formula (∗∗) is derived above, and (∗) is immediate from the definition of
PD←B and the formula for MDB(T ) in Theorem 2 Section 9.1.
1. PB←B = MBB(1V) = In as is easily verified.
2. This follows from (1) and (3).
3. Let T : V → W and S : W → U be linear transformations, and let B, D, and E
be ordered bases of V, W, and U respectively. We have MEB(ST ) = MED(S)MDB(T )
by Theorem 3 Section 9.1. Now (3) is the result of specializing V = W = U and
T = S = 1V.
EXAMPLE 1
In P2 find PD←B if B = {1, x, x2} and D = {1, (1 - x), (1 - x)2}. Then use this to
express p = p(x) = a + bx + cx2 as a polynomial in powers of (1 - x).
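A numerical sketch of this change of basis (the sample coefficients a, b, c and the evaluation point are illustrative choices):

```python
import numpy as np

# P_{D<-B} for B = {1, x, x^2}, D = {1, 1-x, (1-x)^2}: column j is C_D(b_j).
# 1 = 1;  x = 1 - (1-x);  x^2 = 1 - 2(1-x) + (1-x)^2.
P = np.array([[1, 1, 1],
              [0, -1, -2],
              [0, 0, 1]])

# Express p = a + bx + cx^2 in powers of (1 - x), e.g. with a, b, c = 2, -1, 3.
a, b, c = 2.0, -1.0, 3.0
coeffs = P @ np.array([a, b, c])   # C_D(p) = P_{D<-B} C_B(p)

# Check by direct evaluation at a sample point.
t = 0.7
p_direct = a + b * t + c * t**2
p_powers = coeffs[0] + coeffs[1] * (1 - t) + coeffs[2] * (1 - t)**2
assert np.isclose(p_direct, p_powers)
```

In other words, a + bx + cx² = (a + b + c) - (b + 2c)(1 - x) + c(1 - x)², which the evaluation confirms.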
Now let B = {b1, b2, …, bn} and B0 be two ordered bases of a vector space V. An
operator T : V → V has different matrices MB(T ) and MB0(T ) with respect to B and
B0. We can now determine how these matrices are related. Theorem 2 asserts that
CB0(v) = PB0←BCB(v) for all v in V.
On the other hand, Theorem 1 gives
CB[T(v)] = MB(T )CB(v) for all v in V.
Combining these (and writing P = PB0←B for convenience) gives
PMB(T )CB(v) = PCB[T(v)]
= CB0[T(v)]
= MB0(T)CB0(v)
= MB0(T)PCB(v)
This holds for all v in V. Because CB(bj) is the jth column of the identity matrix,
it follows that
PMB(T ) = MB0(T )P
Moreover P is invertible (in fact, P-1 = PB←B0 by Theorem 2), so this gives
MB(T ) = P-1MB0(T )P
This asserts that MB0(T ) and MB(T ) are similar matrices, and proves Theorem 3.
Theorem 3
1 This also follows from Taylor’s Theorem (Corollary 3 of Theorem 1 Section 6.5 with a = 1).
EXAMPLE 2
Let T : ℝ³ → ℝ³ be defined by T(a, b, c) = (2a - b, b + c, c - 3a). If B0 denotes
the standard basis of ℝ³ and B = {(1, 1, 0), (1, 0, 1), (0, 1, 0)}, find an invertible
matrix P such that P-1MB0(T )P = MB(T ).
Solution ► We have
MB0(T ) = [CB0(2, 0, -3) CB0(-1, 1, 0) CB0(0, 1, 1)] = [2 -1 0; 0 1 1; -3 0 1]
MB(T ) = [CB(1, 1, -3) CB(2, 1, -2) CB(-1, 1, 0)] = [4 4 -1; -3 -2 0; -3 -3 2]
P = PB0←B = [CB0(1, 1, 0) CB0(1, 0, 1) CB0(0, 1, 0)] = [1 1 0; 1 0 1; 0 1 0]
The reader can verify that P-1MB0(T )P = MB(T ); equivalently that
MB0(T )P = PMB(T ).
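The verification suggested at the end of Example 2 can be done numerically:

```python
import numpy as np

# The matrices of Example 2: check M_B0(T) P = P M_B(T), i.e. similarity.
MB0 = np.array([[2, -1, 0],
                [0, 1, 1],
                [-3, 0, 1]])
MB = np.array([[4, 4, -1],
               [-3, -2, 0],
               [-3, -3, 2]])
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

assert np.allclose(MB0 @ P, P @ MB)
assert np.allclose(np.linalg.inv(P) @ MB0 @ P, MB)
```

Both forms of the check agree, so MB0(T) and MB(T) are indeed similar via the change matrix P.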
Theorem 4
2. If B is any ordered basis of ℝⁿ, let P be the (invertible) matrix whose columns are
the vectors in B in order. Then
MB(TA) = P-1AP
PROOF
EXAMPLE 3
EXAMPLE 4
1. If T : V → V is an operator where V is finite dimensional, show that
TST = T for some invertible operator S : V → V.
2. If A is an n × n matrix, show that AUA = A for some invertible matrix U.
Solution ►
1. Let B = {b1, …, br, br+1, …, bn} be a basis of V chosen so that
ker T = span{br+1, …, bn}. Then {T(b1), …, T(br)} is independent
(Theorem 5 Section 7.2), so complete it to a basis
{T(b1), …, T(br), fr+1, …, fn} of V.
The reader will appreciate the power of these methods if he/she tries to find U
directly in part 2 of Example 4, even if A is 2 × 2.
A property of n × n matrices is called a similarity invariant if, whenever a given
n × n matrix A has the property, every matrix similar to A also has the property.
Theorem 1 Section 5.5 shows that rank, determinant, trace, and characteristic
polynomial are all similarity invariants.
To illustrate how such similarity invariants are related to linear operators, consider
the case of rank. If T : V → V is a linear operator, the matrices of T with respect to
various bases of V all have the same rank (being similar), so it is natural to regard the
common rank of all these matrices as a property of T itself and not of the particular
matrix used to describe T. Hence the rank of T could be defined to be the rank of A,
where A is any matrix of T. This would be unambiguous because rank is a similarity
invariant. Of course, this is unnecessary in the case of rank because rank T was
defined earlier to be the dimension of im T, and this was proved to equal the rank
of every matrix representing T (Theorem 5 Section 9.1). This definition of rank T
is said to be intrinsic because it makes no reference to the matrices representing T.
However, the technique serves to identify an intrinsic property of T with every
similarity invariant, and some of these properties are not so easily defined directly.
In particular, if T : V → V is a linear operator on a finite dimensional space V,
define the determinant of T (denoted det T ) by
det T = det MB(T ), B any basis of V
This is independent of the choice of basis B because, if D is any other basis of V,
the matrices MB(T ) and MD(T ) are similar and so have the same determinant. In
the same way, the trace of T (denoted tr T ) can be defined by
tr T = tr MB(T ), B any basis of V
This is unambiguous for the same reason.
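The basis independence of det T and tr T can be checked numerically; the sketch below reuses the matrices of Example 2 (the standard matrix as A and the change matrix as P):

```python
import numpy as np

# det and tr are similarity invariants, so det T and tr T do not depend
# on the basis used to represent T.
A = np.array([[2.0, -1.0, 0.0],
              [0.0, 1.0, 1.0],
              [-3.0, 0.0, 1.0]])
P = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])     # an invertible change matrix

B = np.linalg.inv(P) @ A @ P         # the same operator in another basis
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.isclose(np.trace(A), np.trace(B))
```

Any invertible P would do here; the invariance is what makes det T and tr T well defined.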
Theorems about matrices can often be translated to theorems about linear
operators. Here is an example.
EXAMPLE 5
Let S and T denote linear operators on the finite dimensional space V. Show that
det(ST ) = det S det T
EXAMPLE 6
Compute the characteristic polynomial cT(x) of the operator T : P2 → P2 given
by T(a + bx + cx2) = (b + c) + (a + c)x + (a + b)x2.
EXAMPLE 7
Let L be the line in ℝ³ through the origin with (unit) direction vector
d = (1/3)[2 1 2]ᵀ. Compute the matrix of the rotation about L through an
angle θ measured counterclockwise when viewed in the direction of d.
Solution ► Let R : ℝ³ → ℝ³ be the rotation. The idea is to first find a basis
B0 for which the matrix MB0(R) of R is easy to compute, and then use
Theorem 3 to compute the “standard” matrix ME(R) with respect to the
standard basis E = {e1, e2, e3} of ℝ³.
[Diagram: the line L with direction d = R(d), the plane K through the origin with normal d, and vectors f and g in K carried to R(f) and R(g) by the rotation through θ.]
To construct the basis B0, let K denote the plane through the origin with d
as normal, shaded in the diagram. Then the vectors f = (1/3)[-2 2 1]ᵀ and
g = (1/3)[1 2 -2]ᵀ are both in K (they are orthogonal to d) and are independent
(they are orthogonal to each other).
Note that in Example 7 not much motivation was given to the choices of the
(orthonormal) vectors f and g in the basis B0, which is the key to the solution.
However, if we begin with any basis containing d the Gram-Schmidt algorithm
will produce an orthogonal basis containing d, and the other two vectors will
automatically be in L⊥ = K.
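Example 7's strategy can be sketched numerically: build MB0(R) in the orthonormal basis B0 = {f, g, d}, then change to the standard basis with the orthogonal matrix Q = [f g d] (the angle θ is an illustrative choice):

```python
import numpy as np

theta = 0.8
d = np.array([2.0, 1.0, 2.0]) / 3
f = np.array([-2.0, 2.0, 1.0]) / 3
g = np.array([1.0, 2.0, -2.0]) / 3

# In B0 = {f, g, d} the rotation fixes d and rotates the plane K, so its
# matrix is the basic 2-D rotation block plus a 1.
Q = np.column_stack([f, g, d])
MB0 = np.array([[np.cos(theta), -np.sin(theta), 0],
                [np.sin(theta),  np.cos(theta), 0],
                [0, 0, 1]])
ME = Q @ MB0 @ Q.T       # Q is orthogonal, so Q^{-1} = Q^T

# R is a rotation fixing d: orthogonal, determinant 1, R(d) = d.
assert np.allclose(ME @ ME.T, np.eye(3))
assert np.isclose(np.linalg.det(ME), 1.0)
assert np.allclose(ME @ d, d)
```

The same recipe, with d, f, g replaced by any orthonormal triple adapted to the axis, gives the rotation matrix asked for in Exercise 18.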
EXERCISES 9.2
1. In each case find PD←B, where B and D are ordered bases of V. Then verify that CD(v) = PD←BCB(v).
(a) V = ℝ², B = {(0, -1), (2, 1)}, D = {(0, 1), (1, 1)}, v = (3, -5)
(b) V = P2, B = {x, 1 + x, x²}, D = {2, x + 3, x² - 1}, v = 1 + x + x²
(c) V = M22, B = {[1 0; 0 0], [0 1; 0 0], [0 0; 0 1], [0 0; 1 0]}, D = {[1 1; 0 0], [1 0; 1 0], [1 0; 0 1], [0 1; 1 0]}, v = [3 -1; 1 4]
2. In ℝ³ find PD←B, where B = {(1, 0, 0), (1, 1, 0), (1, 1, 1)} and D = {(1, 0, 1), (1, 0, -1), (0, 1, 0)}. If v = (a, b, c), show that
CD(v) = ½[a + c, a - c, 2b]ᵀ and CB(v) = [a - b, b - c, c]ᵀ,
and verify that CD(v) = PD←BCB(v).
3. In P3 find PD←B if B = {1, x, x², x³} and D = {1, (1 - x), (1 - x)², (1 - x)³}. Then express p = a + bx + cx² + dx³ as a polynomial in powers of (1 - x).
4. In each case verify that PD←B is the inverse of PB←D and that PE←DPD←B = PE←B, where B, D, and E are ordered bases of V.
(a) V = ℝ³, B = {(1, 1, 1), (1, -2, 1), (1, 0, -1)}, D = standard basis, E = {(1, 1, 1), (1, -1, 0), (-1, 0, 1)}
(b) V = P2, B = {1, x, x²}, D = {1 + x + x², 1 - x, -1 + x²}, E = {x², x, 1}
5. Use property (2) of Theorem 2, with D the standard basis of ℝⁿ, to find the inverse of:
(a) A = [1 1 0; 1 0 1; 0 1 1]
(b) A = [1 2 1; 2 3 0; -1 0 2]
6. Find PD←B if B = {b1, b2, b3, b4} and D = {b2, b3, b1, b4}. Change matrices arising when the bases differ only in the order of the vectors are called permutation matrices.
7. In each case, find P = PB0←B and verify that P-1MB0(T)P = MB(T) for the given operator T.
(a) T : ℝ³ → ℝ³, T(a, b, c) = (2a - b, b + c, c - 3a); B0 = {(1, 1, 0), (1, 0, 1), (0, 1, 0)} and B is the standard basis.
(b) T : P2 → P2, T(a + bx + cx²) = (a + b) + (b + c)x + (c + a)x²; B0 = {1, x, x²} and B = {1 - x², 1 + x, 2x + x²}.
(c) T : M22 → M22, T([a b; c d]) = [a + d, b + c; a + c, b + d]; B0 = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]} and B = {[1 1; 0 0], [0 0; 1 1], [1 0; 0 1], [0 1; 1 1]}
8. In each case, verify that P-1AP = D and find a basis B of ℝ² such that MB(TA) = D.
(a) A = [11 -6; 12 -6], P = [2 3; 3 4], D = [2 0; 0 3]
(b) A = [29 -12; 70 -29], P = [3 2; 7 5], D = [1 0; 0 -1]
9. In each case, compute the characteristic polynomial cT(x).
(a) T : ℝ² → ℝ², T(a, b) = (a - b, 2b - a)
(b) T : ℝ² → ℝ², T(a, b) = (3a + 5b, 2a + 3b)
(c) T : P2 → P2, T(a + bx + cx²) = (a - 2c) + (2a + b + c)x + (c - a)x²
(d) T : P2 → P2, T(a + bx + cx²) = (a + b - 2c) + (a - 2b + c)x + (b - 2a)x²
(e) T : ℝ³ → ℝ³, T(a, b, c) = (b, c, a)
(f) T : M22 → M22, T([a b; c d]) = [a - c, b - d; a - c, b - d]
10. If V is finite dimensional, show that a linear operator T on V has an inverse if and only if det T ≠ 0.
11. Let S and T be linear operators on V where V is finite dimensional.
(a) Show that tr(ST) = tr(TS). [Hint: Lemma 1 Section 5.5.]
(b) [See Exercise 19 Section 9.1.] For a in ℝ, show that tr(S + T) = tr S + tr T, and tr(aT) = a tr(T).
12. If A and B are n × n matrices, show that they have the same null space if and only if A = UB for some invertible matrix U. [Hint: Exercise 28 Section 7.3.]
13. If A and B are n × n matrices, show that they have the same column space if and only if A = BU for some invertible matrix U. [Hint: Exercise 28 Section 7.3.]
14. Let E = {e1, …, en} be the standard ordered basis of ℝⁿ, written as columns. If D = {d1, …, dn} is any ordered basis, show that PE←D = [d1 ⋯ dn].
15. Let B = {b1, b2, …, bn} be any ordered basis of ℝⁿ, written as columns. If Q = [b1 b2 ⋯ bn] is the matrix with the bi as columns, show that QCB(v) = v for all v in ℝⁿ.
16. Given a complex number w, define Tw : ℂ → ℂ by Tw(z) = wz for all z in ℂ.
(a) Show that Tw is a linear operator for each w in ℂ, viewing ℂ as a real vector space.
(b) If B is any ordered basis of ℂ, define S : ℂ → M22 by S(w) = MB(Tw) for all w in ℂ. Show that S is a one-to-one linear transformation with the additional property that S(wv) = S(w)S(v) holds for all w and v in ℂ.
(c) Taking B = {1, i} show that S(a + bi) = [a -b; b a] for all complex numbers a + bi. This is called the regular representation of the complex numbers as 2 × 2 matrices. If θ is any angle, describe S(e^(iθ)) geometrically. Show that S(w̄) = S(w)T for all w in ℂ; that is, that conjugation corresponds to transposition.
17. Let B = {b1, b2, …, bn} and D = {d1, d2, …, dn} be two ordered bases of a vector space V. Prove that CD(v) = PD←BCB(v) holds for all v in V as follows: Express each bj in the form
bj = p1jd1 + p2jd2 + ⋯ + pnjdn
and write P = [pij]. Show that P = [CD(b1) CD(b2) ⋯ CD(bn)] and that CD(v) = PCB(v) for all v in V.
18. Find the standard matrix of the rotation R about the line through the origin with direction vector d = [2 3 6]ᵀ. [Hint: Consider f = [6 2 -3]ᵀ and g = [3 -6 2]ᵀ.]
U T U EXAMPLE 1
Let T : V → V be any linear operator. Then:
1. {0} and V are T-invariant subspaces.
2. Both ker T and im T = T(V ) are T-invariant subspaces.
3. If U and W are T-invariant subspaces, so are T(U ), U ∩ W, and U + W.
EXAMPLE 2
Define T : ℝ³ → ℝ³ by T(a, b, c) = (3a + 2b, b - c, 4a + 2b - c). Then
U = {(a, b, a) | a, b in ℝ} is T-invariant because
T(a, b, a) = (3a + 2b, b - a, 3a + 2b)
is in U for all a and b (the first and last entries are equal).
EXAMPLE 3
Let T : V → V be a linear operator, and suppose that U = span{u1, u2, …, uk}
is a subspace of V. Show that U is T-invariant if and only if T(ui) lies in U for
each i = 1, 2, …, k.
EXAMPLE 4
Define T : ℝ² → ℝ² by T(a, b) = (b, -a). Show that ℝ² contains no T-invariant subspace except 0 and ℝ².
This is the reason for the importance of T-invariant subspaces and is the first step
toward finding a basis that simplifies the matrix of T.
Theorem 1
Let T : V → V be a linear operator where dim V = n, and suppose that U is a T-invariant subspace of V. If B1 = {b1, …, bk} is any basis of U, extend it to a basis B = {b1, …, bk, bk+1, …, bn} of V. Then
MB(T) = [MB1(T)  Y]
        [   0    Z]
where Z is (n - k) × (n - k) and MB1(T ) is the matrix of the restriction of T to U.
PROOF
The matrix of (the restriction) T : U → U with respect to the basis B1 is the
k × k matrix
MB1(T ) = [CB1[T(b1)] CB1[T(b2)] CB1[T(bk)]]
Now compare the first column CB1[T(b1)] here with the first column CB[T(b1)] of
MB(T ). The fact that T(b1) lies in U (because U is T-invariant) means that T(b1)
has the form
T(b1) = t1b1 + t2b2 + ⋯ + tkbk + 0bk+1 + ⋯ + 0bn
Consequently,
CB1[T(b1)] = [t1 t2 ⋯ tk]ᵀ in ℝᵏ, whereas CB[T(b1)] = [t1 t2 ⋯ tk 0 ⋯ 0]ᵀ in ℝⁿ
The block upper triangular form for the matrix MB(T ) in Theorem 1 is
very useful because the determinant of such a matrix equals the product of the
determinants of each of the diagonal blocks. This is recorded in Theorem 2 for
reference, together with an important application to characteristic polynomials.
Theorem 2
Consider a block upper triangular matrix
A = [A11 A12 A13 ⋯ A1n]
    [ 0  A22 A23 ⋯ A2n]
    [ 0   0  A33 ⋯ A3n]
    [ ⋮   ⋮   ⋮  ⋱  ⋮ ]
    [ 0   0   0  ⋯ Ann]
where the diagonal blocks are square. Then:
1. det A = (det A11)(det A22)(det A33)⋯(det Ann).
2. cA(x) = cA11(x)cA22(x)cA33(x)⋯cAnn(x).
SECTION 9.3 Invariant Subspaces and Direct Sums 457
PROOF
If n = 2, (1) is Theorem 5 Section 3.1; the general case (by induction on n) is left
to the reader. Then (2) follows from (1) because
xI - A = [xI - A11   -A12     -A13   ⋯   -A1n  ]
         [   0     xI - A22   -A23   ⋯   -A2n  ]
         [   0        0     xI - A33 ⋯   -A3n  ]
         [   ⋮        ⋮        ⋮     ⋱    ⋮    ]
         [   0        0        0     ⋯ xI - Ann]
where, in each diagonal block, the symbol I stands for the identity matrix of the appropriate size.
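Theorem 2 is easy to check numerically. The following sketch (assuming NumPy is available; the block entries are hypothetical data chosen only for illustration) verifies both conclusions for a 4 × 4 block upper triangular matrix built from 2 × 2 blocks:

```python
import numpy as np

# A block upper triangular matrix built from 2x2 blocks (example data).
A11 = np.array([[2.0, 1.0], [0.0, 3.0]])
A22 = np.array([[1.0, 4.0], [2.0, 1.0]])
Y = np.array([[5.0, 6.0], [7.0, 8.0]])          # the off-diagonal block
A = np.block([[A11, Y], [np.zeros((2, 2)), A22]])

# Part 1: det A = (det A11)(det A22)
print(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(A22))

# Part 2: the characteristic polynomial factors the same way.
# np.poly returns the coefficients of det(xI - M).
cA = np.poly(A)
cProd = np.polymul(np.poly(A11), np.poly(A22))
print(np.allclose(cA, cProd))  # True
```

Note that the off-diagonal block Y has no effect on either the determinant or the characteristic polynomial, exactly as the theorem asserts.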
EXAMPLE 5
Consider the linear operator T : P2 → P2 given by
T(a + bx + cx2) = (-2a - b + 2c) + (a + b)x + (-6a - 2b + 5c)x2
Show that U = span{x, 1 + 2x2} is T-invariant, use it to find a block upper
triangular matrix for T, and use that to compute cT(x).
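A numerical sketch of the computation asked for in Example 5 (coordinates are taken with respect to the standard basis {1, x, x²} of P2; the ordered basis {x, 1 + 2x², 1} adapted to U is an assumption made here for illustration — any completion of a basis of U works):

```python
import numpy as np

# T(a + bx + cx^2) = (-2a - b + 2c) + (a + b)x + (-6a - 2b + 5c)x^2,
# acting on coordinate columns [a, b, c] relative to {1, x, x^2}.
A = np.array([[-2.0, -1.0, 2.0],
              [ 1.0,  1.0, 0.0],
              [-6.0, -2.0, 5.0]])

# Columns: coordinates of x, 1 + 2x^2 (a basis of U), and 1 (a completing vector).
P = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])

MB = np.linalg.inv(P) @ A @ P
print(np.round(MB, 10))
# The bottom-left 1x2 block is zero, so MB is block upper triangular,
# and c_T(x) is unchanged by the change of basis:
print(np.allclose(np.poly(MB), np.poly(A)))  # True
```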
Eigenvalues
Let T : V → V be a linear operator. A one-dimensional subspace v, v ≠ 0, is
T-invariant if and only if T(rv) = rT(v) lies in v for all r in . This holds if and
only if T(v) lies in v; that is, T(v) = λv for some λ in . A real number λ is called
an eigenvalue of an operator T : V → V if
T(v) = λv
holds for some nonzero vector v in V. In this case, v is called an eigenvector of T
corresponding to λ. The subspace
Theorem 3
Let T : V → V be a linear operator where dim V = n, let B denote any ordered basis of V, and let CB : V → ℝⁿ denote the coordinate isomorphism. Then:
1. The eigenvalues λ of T are precisely the eigenvalues of the matrix MB(T ) and thus are the roots of the characteristic polynomial cT(x).
2. In this case the eigenspaces Eλ(T ) and Eλ[MB(T )] are isomorphic via the restriction CB : Eλ(T ) → Eλ[MB(T )].
PROOF
Write A = MB(T ) for convenience. If T(v) = λv, then applying CB gives
λCB(v) = CB[T(v)] = ACB(v) because CB is linear. Hence CB(v) lies in Eλ(A),
so we do indeed have a function CB : Eλ(T ) → Eλ(A). It is clearly linear and
one-to-one; we claim it is onto. If x is in Eλ(A), write x = CB(v) for some v
in V (CB is onto). This v actually lies in Eλ(T ). To see why, observe that
CB[T(v)] = ACB(v) = Ax = λx = λCB(v) = CB(λv)
Hence T(v) = λv because CB is one-to-one, and this proves (2). As to (1), we
have already shown that eigenvalues of T are eigenvalues of A. The converse
follows, as in the foregoing proof that CB is onto.
Theorem 3 shows how to pass back and forth between the eigenvectors of an
operator T and the eigenvectors of any matrix MB(T ) of T :
v lies in Eλ(T ) if and only if CB(v) lies in Eλ[MB(T )]
EXAMPLE 6
Find the eigenvalues and eigenspaces for T : P2 → P2 given by
T(a + bx + cx²) = (2a + b + c) + (2a + b - 2c)x - (a + 2c)x²

Solution ► If B = {1, x, x²} is the standard basis of P2, then
MB(T) = [ 2 1  1]
        [ 2 1 -2]
        [-1 0 -2]
so the eigenvalues of T are λ = -1 and λ = 3.
Moreover, E-1[MB(T )] = ℝ[-1 2 1]ᵀ and E3[MB(T )] = ℝ[5 6 -1]ᵀ, so Theorem 3 gives
E-1(T ) = ℝ(-1 + 2x + x²) and E3(T ) = ℝ(5 + 6x - x²).
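The passage between T and MB(T ) in Theorem 3 can be checked numerically. Below, M is the matrix of the operator of Example 6 with respect to the standard basis {1, x, x²} (computed here from the formula for T, not quoted from the text):

```python
import numpy as np

# Matrix of T(a + bx + cx^2) = (2a + b + c) + (2a + b - 2c)x - (a + 2c)x^2
# on coordinate columns [a, b, c] relative to {1, x, x^2}.
M = np.array([[ 2.0, 1.0,  1.0],
              [ 2.0, 1.0, -2.0],
              [-1.0, 0.0, -2.0]])

# Verify the eigenvectors directly: coordinates of -1 + 2x + x^2 and 5 + 6x - x^2.
v1 = np.array([-1.0, 2.0, 1.0])
v3 = np.array([ 5.0, 6.0, -1.0])
print(np.allclose(M @ v1, -1 * v1))  # True: eigenvalue -1
print(np.allclose(M @ v3,  3 * v3))  # True: eigenvalue 3

# The eigenvalues of M are the roots of c_T(x).
print(np.sort(np.linalg.eigvals(M).real))
```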
Theorem 4
If λ is an eigenvalue of a linear operator T : V → V, then the eigenspace Eλ(T ) is a T-invariant subspace of V.
PROOF
If v lies in the eigenspace Eλ(T ), then T(v) = λv, so T [T(v)] = T(λv) = λT(v).
This shows that T(v) lies in Eλ(T ) too.
Direct Sums
Sometimes vectors in a space V can be written naturally as a sum of vectors in two
subspaces. For example, in the space Mnn of all n × n matrices, we have subspaces
U = {P in Mnn | P is symmetric} and W = {Q in Mnn | Q is skew-symmetric}
where a matrix Q is called skew-symmetric if Qᵀ = -Q. Then every matrix A in Mnn can be written as the sum of a matrix in U and a matrix in W; indeed,
A = ½(A + Aᵀ) + ½(A - Aᵀ)
where ½(A + Aᵀ) is symmetric and ½(A - Aᵀ) is skew-symmetric. Remarkably, this representation is unique: If A = P + Q where Pᵀ = P and Qᵀ = -Q, then Aᵀ = Pᵀ + Qᵀ = P - Q; adding this to A = P + Q gives P = ½(A + Aᵀ), and subtracting gives Q = ½(A - Aᵀ). In addition, this uniqueness turns out to be closely
related to the fact that the only matrix in both U and W is 0. This is a useful way to
view matrices, and the idea generalizes to the important notion of a direct sum of
subspaces.
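The symmetric/skew-symmetric decomposition just described can be sketched in a few lines (the matrix A below is hypothetical example data):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

P = (A + A.T) / 2   # symmetric part, in U
Q = (A - A.T) / 2   # skew-symmetric part, in W

print(np.allclose(P, P.T))    # True: P is symmetric
print(np.allclose(Q, -Q.T))   # True: Q is skew-symmetric
print(np.allclose(P + Q, A))  # True: A = P + Q
```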
If U and W are subspaces of V, their sum U + W and their intersection U ∩ W
were defined in Section 6.4 as follows:
U + W = {u + w | u in U and w in W}
U ∩ W = {v | v lies in both U and W}
These are subspaces of V, the sum containing both U and W and the intersection
contained in both U and W. It turns out that the most interesting pairs U and W
are those for which U ∩ W is as small as possible and U + W is as large as possible.
Definition 9.7 A vector space V is said to be the direct sum of subspaces U and W if
U ∩ W = {0} and U + W = V
In this case we write V = U ⊕ W. Given a subspace U, any subspace W such that
V = U ⊕ W is called a complement of U in V.
EXAMPLE 7
In the space ℝ⁵, consider the subspaces U = {(a, b, c, 0, 0) | a, b, and c in ℝ} and W = {(0, 0, 0, d, e) | d and e in ℝ}. Show that ℝ⁵ = U ⊕ W.
EXAMPLE 8
If U is a subspace of ℝⁿ, show that ℝⁿ = U ⊕ U⊥.
EXAMPLE 9
Let {e1, e2, …, en} be a basis of a vector space V, and partition it into two parts:
{e1, …, ek} and {ek+1, …, en}. If U = span{e1, …, ek} and W = span{ek+1, …, en},
show that V = U ⊕ W.
Theorem 5
Let U and W be subspaces of a finite dimensional vector space V. The following three
conditions are equivalent:
1. V = U ⊕ W.
2. Each vector v in V can be written uniquely in the form
v = u + w, u in U, w in W
3. If {u1, …, uk} and {w1, …, wm} are bases of U and W, respectively, then
B = {u1, …, uk, w1, …, wm} is a basis of V.
PROOF
Example 9 shows that (3) ⇒ (1).
(1) ⇒ (2). Given v in V, we have v = u + w, u in U, w in W, because V = U + W.
If also v = u1 + w1, then u - u1 = w1 - w lies in U ∩ W = {0}, so u = u1
and w = w1.
(2) ⇒ (3). Given v in V, we have v = u + w, u in U, w in W. Hence v lies
in span B; that is, V = span B. To see that B is independent, let
a1u1 + ⋯ + akuk + b1w1 + ⋯ + bmwm = 0. Write u = a1u1 + ⋯ + akuk and w = b1w1 + ⋯ + bmwm. Then u + w = 0, and so u = 0 and w = 0
by the uniqueness in (2). Hence ai = 0 for all i and bj = 0 for all j.
Theorem 6
If V = U ⊕ W, then dim V = dim U + dim W.

The block upper triangular representation
MB(T) = [MB1(T)  Y]     (∗)
        [   0    Z]
in Theorem 1 is achieved by choosing any basis B1 = {b1, …, bk} of U1 and
completing it to a basis B = {b1, …, bk, bk+1, …, bn} of V in any way at all. The
fact that U1 is T-invariant ensures that the first k columns of MB(T ) have the form
in (∗) (that is, the last n - k entries are zero), and the question arises whether the
additional basis vectors bk+1, …, bn can be chosen such that
U2 = span{bk+1, …, bn}
is also T-invariant. In other words, does each T-invariant subspace of V have a
T-invariant complement? Unfortunately the answer in general is no (see Example 11
below); but when it is possible, the matrix MB(T ) simplifies further. The assumption
that the complement U2 = span{bk+1, …, bn} is T-invariant too means that Y = 0
in equation (∗) above, and that Z = MB2(T ) is the matrix of the restriction of T
to U2 (where B2 = {bk+1, …, bn}). The verification is the same as in the proof of
Theorem 1.
Theorem 7
Let T : V → V be a linear operator where V = U1 ⊕ U2 with both U1 and U2 T-invariant. If B1 and B2 are bases of U1 and U2 respectively, and B = B1 ∪ B2, then
MB(T) = [MB1(T)    0   ]
        [   0    MB2(T)]
where MB1(T ) and MB2(T ) are the matrices of the restrictions of T to U1 and to U2 respectively.
Definition 9.8 The linear operator T : V → V is said to be reducible if nonzero T-invariant subspaces
U1 and U2 can be found such that V = U1 ⊕ U2.
Then T has a matrix in block diagonal form as in Theorem 7, and the study of T
is reduced to studying its restrictions to the lower-dimensional spaces U1 and U2. If
these can be determined, so can T. Here is an example in which the action of T on
the invariant subspaces U1 and U2 is very simple indeed. The result for operators is
used to derive the corresponding similarity theorem for matrices.
EXAMPLE 10
Let T : V → V be a linear operator satisfying T² = 1V (such operators are called involutions). Define
U1 = {v | T(v) = v} and U2 = {v | T(v) = -v}
MB(TA) = [Ir    0   ]
         [0  -In-r]
But Theorem 4 Section 9.2 shows that MB(TA) = P-1AP for some
invertible matrix P, and this proves part (c).
Note that the passage from the result for operators to the analogous result for
matrices is routine and can be carried out in any situation, as in the verification
of part (c) of Example 10. The key is the analysis of the operators. In this case,
the involutions are just the operators satisfying T 2 = 1V, and the simplicity of this
condition means that the invariant subspaces U1 and U2 are easy to find.
Unfortunately, not every linear operator T : V → V is reducible. In fact, the
linear operator in Example 4 has no invariant subspaces except 0 and V. On the
other hand, one might expect that this is the only type of nonreducible operator;
that is, if the operator has an invariant subspace that is not 0 or V, then some
invariant complement must exist. The next example shows that even this is not valid.
EXAMPLE 11
Consider the operator T : ℝ² → ℝ² given by T[a; b] = [a + b; b]. Show that U1 = ℝ[1; 0] is T-invariant but that U1 has no T-invariant complement in ℝ².

Solution ► Because U1 = span{[1; 0]} and T[1; 0] = [1; 0], it follows (by Example 3) that U1 is T-invariant. Now assume, if possible, that U1 has a T-invariant complement U2 in ℝ². Then U1 ⊕ U2 = ℝ² and T(U2) ⊆ U2. Theorem 6 gives
2 = dim ℝ² = dim U1 + dim U2 = 1 + dim U2
so dim U2 = 1, say U2 = ℝ[p; q]. Thus
[p + q; q] = T[p; q] = λ[p; q] where λ ∈ ℝ.
Hence q = λq and p + q = λp. If λ ≠ 1, the first equation gives q = 0; if λ = 1, the second gives q = 0. Either way U2 = ℝ[p; 0] = U1, contradicting U1 ∩ U2 = {0}. So no T-invariant complement of U1 exists.
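The obstruction in Example 11 is visible numerically: the matrix of T has 1 as its only eigenvalue, but its space of eigenvectors is one-dimensional, so no second invariant line exists. A quick check with NumPy:

```python
import numpy as np

# Matrix of the operator in Example 11.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Both eigenvalues equal 1...
print(np.linalg.eigvals(A))
# ...but the eigenspace E_1 = null(A - I) has dimension 1,
# so the only T_A-invariant line is U1 = R[1; 0].
print(np.linalg.matrix_rank(A - np.eye(2)))  # rank 1, hence nullity 1
```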
This is as far as we take the theory here, but in Chapter 11 the techniques
introduced in this section will be refined to show that every matrix is similar to a
very nice matrix indeed—its Jordan canonical form.
EXERCISES 9.3
1. If T : V → V is any linear operator, show that ker T and im T are T-invariant subspaces.

2. Let T be a linear operator on V. If U and W are T-invariant, show that
(a) U ∩ W and U + W are also T-invariant.
(b) T(U ) is T-invariant.

3. Let S and T be linear operators on V and assume that ST = TS.
(a) Show that im S and ker S are T-invariant.
(b) If U is T-invariant, show that S(U ) is T-invariant.

4. Let T : V → V be a linear operator. Given v in V, let U denote the set of vectors in V that lie in every T-invariant subspace that contains v.
(a) Show that U is a T-invariant subspace of V containing v.
(b) Show that U is contained in every T-invariant subspace of V that contains v.

5. (a) If T is a scalar operator (see Example 1 Section 7.1) show that every subspace is T-invariant.
(b) Conversely, if every subspace is T-invariant, show that T is scalar.

6. Show that the only subspaces of V that are T-invariant for every operator T : V → V are 0 and V. Assume that V is finite dimensional. [Hint: Theorem 3 Section 7.1.]

7. Suppose that T : V → V is a linear operator and that U is a T-invariant subspace of V. If S is an invertible operator, put T′ = STS⁻¹. Show that S(U ) is a T′-invariant subspace.

8. In each case, show that U is T-invariant, use it to find a block upper triangular matrix for T, and use that to compute cT(x).
(a) T : P2 → P2, T(a + bx + cx²) = (-a + 2b + c) + (a + 3b + c)x + (a + 4b)x², U = span{1, x + x²}
(b) T : P2 → P2, T(a + bx + cx²) = (5a - 2b + c) + (5a - b + c)x + (a + 2c)x², U = span{1 - 2x², x + x²}

9. In each case, show that TA : ℝ² → ℝ² has no invariant subspaces except 0 and ℝ².
(a) A = [1 2; -1 -1]
(b) A = [cos θ -sin θ; sin θ cos θ], 0 < θ < π

10. In each case, show that V = U ⊕ W.
(a) V = ℝ⁴, U = span{(1, 1, 0, 0), (0, 1, 1, 0)}, W = span{(0, 1, 0, 1), (0, 0, 1, 1)}
(b) V = ℝ⁴, U = {(a, a, b, b) | a, b in ℝ}, W = {(c, d, c, -d) | c, d in ℝ}
(c) V = P3, U = {a + bx | a, b in ℝ}, W = {ax² + bx³ | a, b in ℝ}
(d) V = M22, U = { [a a; b b] | a, b in ℝ }, W = { [a b; -a b] | a, b in ℝ }

11. Let U = span{(1, 0, 0, 0), (0, 1, 0, 0)} in ℝ⁴. Show that ℝ⁴ = U ⊕ W1 and ℝ⁴ = U ⊕ W2, where W1 = span{(0, 0, 1, 0), (0, 0, 0, 1)} and W2 = span{(1, 1, 1, 1), (1, 1, 1, -1)}.

12. Let U be a subspace of V, and suppose that V = U ⊕ W1 and V = U ⊕ W2 hold for subspaces W1 and W2. Show that dim W1 = dim W2.

13. If U and W denote the subspaces of even and odd polynomials in Pn, respectively, show that Pn = U ⊕ W. (See Exercise 36 Section 6.3.) [Hint: f (x) + f (-x) is even.]

Let U and W be subspaces of V. Show that U ∩ W = {0} if and only if {u, w} is independent for all u ≠ 0 in U and all w ≠ 0 in W.

16. Let V --T--> W --S--> V be linear transformations, and assume that dim V and dim W are finite.
(a) If ST = 1V, show that W = im T ⊕ ker S. [Hint: Given w in W, show that w - TS(w) lies in ker S.]
(b) Illustrate with ℝ² --T--> ℝ³ --S--> ℝ² where T(x, y) = (x, y, 0) and S(x, y, z) = (x, y).

17. Let U and W be subspaces of V, let dim V = n, and assume that dim U + dim W = n.

19. If A =
[2 -5  0  0]
[1 -2  0  0]
[0  0 -1 -2]
[0  0  1  1]
show that TA : ℝ⁴ → ℝ⁴ has two-dimensional T-invariant subspaces U and W such that ℝ⁴ = U ⊕ W, but A has no real eigenvalue.

20. Let T : V → V be a linear operator where dim V = n. If U is a T-invariant subspace of V, let T1 : U → U denote the restriction of T to U (so T1(u) = T(u) for all u in U ). Show that cT(x) = cT1(x) · q(x) for some polynomial q(x). [Hint: Theorem 1.]

21. Let T : V → V be a linear operator where dim V = n. Show that V has a basis of eigenvectors if and only if V has a basis B such that MB(T ) is diagonal.

22. In each case, show that T² = 1V and find (as in Example 10) an ordered basis B such that MB(T ) has the given block form.
(a) MB(T ) = [I3 0; 0 -1]
(b) T : P3 → P3 where T [p(x)] = p(-x), MB(T ) = [I2 0; 0 -I2]
(c) T : ℂ → ℂ where T(a + bi) = a - bi, MB(T ) = [1 0; 0 -1]
(d) T : ℝ³ → ℝ³ where T(a, b, c) = (-a + 2b + c, b + c, -c), MB(T ) = [1 0; 0 -I2]

(b) Conversely, if T : V → V is a linear transformation such that T² = T, show that V = ker T ⊕ im T. [Hint: v - T(v) lies in ker T for all v in V.]

24. Let T : V → V be a linear operator satisfying T² = T (such operators are called idempotents). Define U1 = {v | T(v) = v} and U2 = ker T = {v | T(v) = 0}.
(a) Show that V = U1 ⊕ U2.
(b) If dim V = n, find a basis B of V such that MB(T ) = [Ir 0; 0 0] for some r.

27. Let T : V → V be an operator such that T² = c², c ≠ 0.
(a) Show that V = U1 ⊕ U2, where U1 = {v | T(v) = cv} and U2 = {v | T(v) = -cv}. [Hint: v = (1/2c){[T(v) + cv] - [T(v) - cv]}.]
(b) If dim V = n, show that V has a basis B such that MB(T ) = [cIk 0; 0 -cIn-k] for some k.
(c) If A is an n × n matrix such that A² = c²I, c ≠ 0, show that A is similar to [cIk 0; 0 -cIn-k] for some k.
SECTION 10.1 Inner Products and Norms

The dot product was introduced in ℝⁿ to provide a natural generalization of the geometrical notions of length and orthogonality that were so important in Chapter 4. The plan in this chapter is to define an inner product on an arbitrary real vector space V (of which the dot product is an example in ℝⁿ) and use it to introduce these concepts in V.
Definition 10.1 An inner product on a real vector space V is a function that assigns a real number 〈v, w〉 to every pair v, w of vectors in V in such a way that the following axioms are satisfied.
P1. 〈v, w〉 is a real number.
P2. 〈v, w〉 = 〈w, v〉 for all v and w in V.
P3. 〈v + w, u〉 = 〈v, u〉 + 〈w, u〉 for all u, v, and w in V.
P4. 〈rv, w〉 = r〈v, w〉 for all v and w in V and all r in ℝ.
P5. 〈v, v〉 > 0 for all v ≠ 0 in V.
A vector space V with an inner product 〈 , 〉 will be called an inner product space.
EXAMPLE 1
ℝⁿ is an inner product space with the dot product as inner product:
〈x, y〉 = x · y for all x, y in ℝⁿ
See Theorem 1 Section 5.3. This is also called the euclidean inner product, and ℝⁿ, equipped with the dot product, is called euclidean n-space.
1 If we regard ℂⁿ as a vector space over the field of complex numbers, then the "standard inner product" on ℂⁿ defined in Section 8.6 does not satisfy Axiom P4 (see Theorem 1(3) Section 8.6).
EXAMPLE 2
If A and B are m × n matrices, define 〈A, B〉 = tr(ABᵀ) where tr(X) is the trace of the square matrix X. Show that 〈 , 〉 is an inner product on Mmn.

Solution ► P1 is clear. Since tr(P) = tr(Pᵀ) for every square matrix P, we have P2:
〈A, B〉 = tr(ABᵀ) = tr[(ABᵀ)ᵀ] = tr(BAᵀ) = 〈B, A〉
Next, P3 and P4 follow because trace is a linear transformation Mmm → ℝ (Exercise 19). Turning to P5, let r1, r2, …, rm denote the rows of the matrix A. Then the (i, j)-entry of AAᵀ is ri · rj, so
〈A, A〉 = tr(AAᵀ) = r1 · r1 + r2 · r2 + ⋯ + rm · rm
But rj · rj is the sum of the squares of the entries of rj, so this shows that 〈A, A〉 is the sum of the squares of all mn entries of A. Axiom P5 follows.
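The observation that 〈A, A〉 = tr(AAᵀ) is the sum of the squares of all the entries of A generalizes: tr(ABᵀ) is the sum of the products of corresponding entries of A and B. A quick numerical check (the matrices below are hypothetical example data):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[7.0, 8.0, 9.0],
              [1.0, 0.0, 2.0]])

# <A, B> = tr(A B^T) equals the sum of products of corresponding entries,
inner = np.trace(A @ B.T)
print(inner, np.sum(A * B))
# and <A, A> is the sum of the squares of all mn entries (axiom P5).
print(np.trace(A @ A.T), np.sum(A ** 2))
```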
EXAMPLE 3²
Let C[a, b] denote the vector space of continuous functions from [a, b] to ℝ, a subspace of F[a, b]. Show that
〈f, g〉 = ∫_a^b f (x)g(x) dx
defines an inner product on C[a, b].
Theorem 1
Let 〈 , 〉 be an inner product on a space V; let v, u, and w denote vectors in V; and let r denote a real number.
1. 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉.
2. 〈v, rw〉 = r〈v, w〉 = 〈rv, w〉.
3. 〈v, 0〉 = 0 = 〈0, v〉.
4. 〈v, v〉 = 0 if and only if v = 0.
2 This example (and others later that refer to it) can be omitted with no loss of continuity by students with no calculus background.
470 Chapter 10 Inner Product Spaces
Theorem 2
If 〈 , 〉 is an inner product on ℝⁿ, then there exists a positive definite n × n matrix A such that
〈x, y〉 = xᵀAy for all columns x, y in ℝⁿ.

PROOF
Given an inner product 〈 , 〉 on ℝⁿ, let {e1, e2, …, en} be the standard basis of ℝⁿ. If x = ∑i xiei and y = ∑j yjej are two vectors in ℝⁿ, compute 〈x, y〉 by adding the inner product of each term xiei to each term yjej. The result is a double sum.
〈x, y〉 = ∑i ∑j 〈xiei, yjej〉 = ∑i ∑j xi〈ei, ej〉yj
In matrix form,
〈x, y〉 = [x1 x2 ⋯ xn] [〈e1, e1〉 〈e1, e2〉 ⋯ 〈e1, en〉] [y1]
                       [〈e2, e1〉 〈e2, e2〉 ⋯ 〈e2, en〉] [y2]
                       [    ⋮         ⋮            ⋮  ] [⋮ ]
                       [〈en, e1〉 〈en, e2〉 ⋯ 〈en, en〉] [yn]
Hence 〈x, y〉 = xᵀAy, where A is the n × n matrix whose (i, j)-entry is 〈ei, ej〉. The fact that 〈ei, ej〉 = 〈ej, ei〉 shows that A is symmetric. Finally, A is positive definite by Theorem 2 Section 8.3.
Remark If we refer to the inner product space ℝⁿ without specifying the inner product, we mean that the dot product is to be used.
EXAMPLE 4
Let the inner product 〈 , 〉 be defined on ℝ² by
〈[v1; v2], [w1; w2]〉 = 2v1w1 - v1w2 - v2w1 + v2w2
Find a matrix A such that 〈v, w〉 = vᵀAw, and show that A is positive definite.

Solution ► The (i, j)-entry of the matrix A is the coefficient of viwj in the expression, so A = [2 -1; -1 1]. Incidentally, if x = [x; y], then
〈x, x〉 = 2x² - 2xy + y² = x² + (x - y)² ≥ 0
for all x, so 〈x, x〉 = 0 implies x = 0. Hence 〈 , 〉 is indeed an inner product, so A is positive definite.
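The completion of squares in Example 4 can be cross-checked numerically: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive, which is easy to test (assuming NumPy):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])

# A symmetric matrix is positive definite iff its eigenvalues are positive.
print(np.all(np.linalg.eigvalsh(A) > 0))  # True

# Check the identity 2x^2 - 2xy + y^2 = x^2 + (x - y)^2 at sample points.
for x, y in [(1.0, 2.0), (-3.0, 0.5), (0.2, -0.7)]:
    v = np.array([x, y])
    assert np.isclose(v @ A @ v, x**2 + (x - y)**2)
```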
Definition 10.2 As in ℝⁿ, if 〈 , 〉 is an inner product on a space V, the norm³ ‖v‖ of a vector v in V is defined by
‖v‖ = √〈v, v〉
We define the distance between vectors v and w in an inner product space V to be
d(v, w) = ‖v - w‖

3 If the dot product is used in ℝⁿ, the norm ‖x‖ of a vector x is usually called the length of x.
EXAMPLE 5
The norm of a continuous function f = f (x) in C[a, b] (with the inner product from Example 3) is given by
‖f‖ = √(∫_a^b f (x)² dx)
Hence ‖f‖² is the area beneath the graph of y = f (x)² between x = a and x = b (see the diagram).
EXAMPLE 6
Show that 〈u + v, u - v〉 = ‖u‖² - ‖v‖² in any inner product space.
A vector v in an inner product space V is called a unit vector if ‖v‖ = 1. The set of all unit vectors in V is called the unit ball in V. For example, if V = ℝ² (with the dot product) and v = (x, y), then
‖v‖ = 1 if and only if x² + y² = 1
Hence the unit ball in ℝ² is the unit circle x² + y² = 1 with centre at the origin and radius 1. However, the shape of the unit ball varies with the choice of inner product.
EXAMPLE 7
Let a > 0 and b > 0. If v = (x, y) and w = (x1, y1), define an inner product on ℝ² by
〈v, w〉 = xx1/a² + yy1/b²
The reader can verify (Exercise 5) that this is indeed an inner product. In this case
‖v‖ = 1 if and only if x²/a² + y²/b² = 1
so the unit ball is the ellipse through (±a, 0) and (0, ±b) shown in the diagram.
Example 7 graphically illustrates the fact that norms and distances in an inner
product space V vary with the choice of inner product in V.
Theorem 3
The next theorem reveals an important and useful fact about the relationship
between norms and inner products, extending the Cauchy inequality for n
(Theorem 2 Section 5.3).
Theorem 4
Cauchy-Schwarz Inequality⁴
If v and w are two vectors in an inner product space V, then
〈v, w〉² ≤ ‖v‖²‖w‖²
Moreover, equality occurs if and only if one of v and w is a scalar multiple of the other.
PROOF
Write ‖v‖ = a and ‖w‖ = b. Using Theorem 1 we compute:
‖bv - aw‖² = b²‖v‖² - 2ab〈v, w〉 + a²‖w‖² = 2ab(ab - 〈v, w〉)
‖bv + aw‖² = b²‖v‖² + 2ab〈v, w〉 + a²‖w‖² = 2ab(ab + 〈v, w〉)     (∗)
It follows that ab - 〈v, w〉 ≥ 0 and ab + 〈v, w〉 ≥ 0, and hence that -ab ≤ 〈v, w〉 ≤ ab. But then |〈v, w〉| ≤ ab = ‖v‖‖w‖, as desired.
Conversely, if |〈v, w〉| = ‖v‖‖w‖ = ab then 〈v, w〉 = ±ab. Hence (∗) shows that bv - aw = 0 or bv + aw = 0. It follows that one of v and w is a scalar multiple of the other, even if a = 0 or b = 0.
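The inequality can be stress-tested numerically with any inner product. The sketch below reuses the positive definite matrix of Example 4 (an assumption made here for illustration) and checks random vector pairs, plus the equality case for scalar multiples:

```python
import numpy as np

# Inner product <v, w> = v^T A w, with the matrix from Example 4.
A = np.array([[2.0, -1.0], [-1.0, 1.0]])

def ip(v, w):
    return v @ A @ w

def norm(v):
    return np.sqrt(ip(v, v))

rng = np.random.default_rng(0)
for _ in range(1000):
    v, w = rng.normal(size=2), rng.normal(size=2)
    assert ip(v, w) ** 2 <= norm(v) ** 2 * norm(w) ** 2 + 1e-12

# Equality holds when one vector is a scalar multiple of the other:
v = np.array([1.0, 2.0])
print(np.isclose(ip(v, 3 * v) ** 2, norm(v) ** 2 * norm(3 * v) ** 2))  # True
```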
EXAMPLE 8
If f and g are continuous functions on the interval [a, b], then (see Example 3)
[∫_a^b f (x)g(x) dx]² ≤ [∫_a^b f (x)² dx][∫_a^b g(x)² dx]
Another famous inequality, the so-called triangle inequality, also comes from the
Cauchy-Schwarz inequality. It is included in the following list of basic properties of
the norm of a vector.
Theorem 5
If V is an inner product space, the norm ‖·‖ has the following properties.
1. ‖v‖ ≥ 0 for every vector v in V.
2. ‖v‖ = 0 if and only if v = 0.
3. ‖rv‖ = |r|‖v‖ for every v in V and every r in ℝ.
4. ‖v + w‖ ≤ ‖v‖ + ‖w‖ for all v and w in V (triangle inequality).
4 Hermann Amandus Schwarz (1843–1921) was a German mathematician at the University of Berlin. He had strong geometric
intuition, which he applied with great ingenuity to particular problems. A version of the inequality appeared in 1885.
PROOF
Because ‖v‖ = √〈v, v〉, properties (1) and (2) follow immediately from (3) and (4) of Theorem 1. As to (3), compute
‖rv‖² = 〈rv, rv〉 = r²〈v, v〉 = r²‖v‖²
Hence (3) follows by taking positive square roots. Finally, the fact that 〈v, w〉 ≤ ‖v‖‖w‖ by the Cauchy-Schwarz inequality gives
‖v + w‖² = 〈v + w, v + w〉 = ‖v‖² + 2〈v, w〉 + ‖w‖²
         ≤ ‖v‖² + 2‖v‖‖w‖ + ‖w‖²
         = (‖v‖ + ‖w‖)²
Hence (4) follows by taking positive square roots.
It is worth noting that the usual triangle inequality for absolute values,
|r + s| ≤ |r| + |s| for all real numbers r and s,
is a special case of (4) where V = ℝ = ℝ¹ and the dot product 〈r, s〉 = rs is used.
In many calculations in an inner product space, it is required to show that some vector v is zero. This is often accomplished most easily by showing that its norm ‖v‖ is zero. Here is an example.
EXAMPLE 9
Let {v1, …, vn} be a spanning set for an inner product space V. If v in V satisfies
〈v, vi〉 = 0 for each i = 1, 2, …, n, show that v = 0.
Theorem 6
EXERCISES 10.1
4. In each case, find the distance between u and v.
(a) u = (3, -1, 2, 0), v = (1, 1, 1, 3); 〈u, v〉 = u · v
(b) u = (1, 2, -1, 2), v = (2, 1, -1, 3); 〈u, v〉 = u · v
(c) u = f, v = g in C[0, 1] where f (x) = x² and g(x) = 1 - x; 〈f, g〉 = ∫_0^1 f (x)g(x) dx
(d) u = f, v = g in C[-π, π] where f (x) = 1 and g(x) = cos x; 〈f, g〉 = ∫_{-π}^{π} f (x)g(x) dx

12. In each case, show that 〈v, w〉 = vᵀAw defines an inner product on ℝ² and hence show that A is positive definite.
(a) A = [2 1; 1 1]   (b) A = [5 -3; -3 2]   (c) A = [3 2; 2 3]   (d) A = [3 4; 4 6]

13. In each case, find a symmetric matrix A such that 〈v, w〉 = vᵀAw.
(a) 〈[v1; v2], [w1; w2]〉 = v1w1 + 2v1w2 + 2v2w1 + 5v2w2
(d) 〈[v1; v2; v3], [w1; w2; w3]〉 = v1w1 + 2v2w2 + 5v3w3 - 2v1w3 - 2v3w1

w = w1b1 + ⋯ + wnbn are vectors in V, define

14. If A is symmetric and xᵀAx = 0 for all columns x in ℝⁿ, show that A = 0. [Hint: Consider 〈x + y, x + y〉 where 〈x, y〉 = xᵀAy.]

15. Show that the sum of two inner products on V is again an inner product.

16. Let ‖u‖ = 1, ‖v‖ = 2, ‖w‖ = √3, 〈u, v〉 = -1, 〈u, w〉 = 0 and 〈v, w〉 = 3. Compute:

20. Prove Theorem 1.

21. Prove Theorem 6.

22. Let u and v be vectors in an inner product space V.
(a) Expand 〈2u - 7v, 3u + 5v〉.
(b) Expand 〈3u - 4v, 5u + v〉.
(c) Show that ‖u + v‖² = ‖u‖² + 2〈u, v〉 + ‖v‖².
(d) Show that ‖u - v‖² = ‖u‖² - 2〈u, v〉 + ‖v‖².

23. Show that
‖v‖² + ‖w‖² = ½{‖v + w‖² + ‖v - w‖²}
for any v and w in an inner product space.

24. Let 〈 , 〉 be an inner product on a vector space V. Show that the corresponding distance function is translation invariant. That is, show that d(v, w) = d(v + u, w + u) for all v, w, and u in V.

25. (a) Show that 〈u, v〉 = ¼[‖u + v‖² - ‖u - v‖²] for all u, v in an inner product space V.
(b) If 〈 , 〉 and 〈 , 〉′ are two inner products on V that have equal associated norm functions, show that 〈u, v〉 = 〈u, v〉′ holds for all u and v.

26. Let v denote a vector in an inner product space V.
(a) Show that W = {w | w in V, 〈v, w〉 = 0} is a subspace of V.
(b) If V = ℝ³ with the dot product, and if v = (1, -1, 2), find a basis for W (W as in (a)).

27. Given vectors w1, w2, …, wn and v, assume that 〈v, wi〉 = 0 for each i. Show that 〈v, w〉 = 0 for all w in span{w1, w2, …, wn}.

28. If V = span{v1, v2, …, vn} and 〈v, vi〉 = 〈w, vi〉 holds for each i, show that v = w.

(a) Show that AAᵀ = [‖u‖² u·v; u·v ‖v‖²].
(b) Show that det(AAᵀ) ≥ 0.

31. (a) If v and w are nonzero vectors in an inner product space V, show that
-1 ≤ 〈v, w〉/(‖v‖‖w‖) ≤ 1
and hence that a unique angle θ exists such that cos θ = 〈v, w〉/(‖v‖‖w‖) and 0 ≤ θ ≤ π. This angle θ is called the angle between v and w.
(b) Find the angle between v = (1, 2, -1, 1, 3) and w = (2, 1, 0, 2, 0) in ℝ⁵ with the dot product.
(c) If θ is the angle between v and w, show that the law of cosines is valid:
‖v - w‖² = ‖v‖² + ‖w‖² - 2‖v‖‖w‖ cos θ.

32. If V = ℝ², define ‖(x, y)‖ = |x| + |y|.
(a) Show that ‖·‖ satisfies the conditions in Theorem 5.
(b) Show that ‖·‖ does not arise from an inner product on ℝ² given by a matrix A. [Hint: If it did, use Theorem 2 to find numbers a, b, and c such that ‖(x, y)‖² = ax² + bxy + cy² for all x and y.]
SECTION 10.2 Orthogonal Sets of Vectors 477
EXAMPLE 1
{sin x, cos x} is orthogonal in C[-π, π] because
∫_{-π}^{π} sin x cos x dx = [-¼ cos 2x]_{-π}^{π} = 0
Theorem 1
Pythagoras’ Theorem
If {f1, f2, …, fn} is an orthogonal set of vectors, then
‖f1 + f2 + ⋯ + fn‖² = ‖f1‖² + ‖f2‖² + ⋯ + ‖fn‖²
Theorem 2
Theorem 3
Every orthogonal set of vectors in an inner product space is linearly independent.
EXAMPLE 2
Show that { [2; -1; 0], [0; 1; 1], [0; -1; 2] } is an orthogonal basis of ℝ³ with inner product
〈v, w〉 = vᵀAw, where A = [1 1 0; 1 2 0; 0 0 1].

Solution ► We have
〈[2; -1; 0], [0; 1; 1]〉 = [2 -1 0] [1 1 0; 1 2 0; 0 0 1] [0; 1; 1] = [1 0 0] [0; 1; 1] = 0
and the reader can verify that the other pairs are orthogonal too. Hence the set is orthogonal, so it is linearly independent by Theorem 3. Because dim ℝ³ = 3, it is a basis.
Theorem 4
Expansion Theorem
Let {f1, f2, …, fn} be an orthogonal basis of an inner product space V. If v is any vector in V, then
v = (〈v, f1〉/‖f1‖²)f1 + (〈v, f2〉/‖f2‖²)f2 + ⋯ + (〈v, fn〉/‖fn‖²)fn
is the expansion of v as a linear combination of the basis vectors.
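The Expansion Theorem can be checked numerically. Below it is applied to the orthogonal basis of Example 2 with its matrix inner product (the test vector v is hypothetical data chosen for illustration):

```python
import numpy as np

# Orthogonal basis of R^3 from Example 2 and its inner product <v, w> = v^T A w.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
F = [np.array([2.0, -1.0, 0.0]),
     np.array([0.0, 1.0, 1.0]),
     np.array([0.0, -1.0, 2.0])]

def ip(v, w):
    return v @ A @ w

v = np.array([1.0, 2.0, 3.0])
# Expansion Theorem: v = sum of <v, f_i>/||f_i||^2 * f_i over the basis.
expansion = sum(ip(v, f) / ip(f, f) * f for f in F)
print(np.allclose(expansion, v))  # True
```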
EXAMPLE 3
If a0, a1, …, an are distinct numbers and p(x) and q(x) are in Pn, define
〈p(x), q(x)〉 = p(a0)q(a0) + p(a1)q(a1) + ⋯ + p(an)q(an)
This is an inner product on Pn. (Axioms P1–P4 are routinely verified, and P5 holds because 0 is the only polynomial of degree at most n with n + 1 distinct roots. See Theorem 4 Section 6.5 or Appendix D.)
Recall that the Lagrange polynomials δ0(x), δ1(x), …, δn(x) relative to the numbers a0, a1, …, an are defined as follows (see Section 6.5):
δk(x) = ∏i≠k(x - ai) / ∏i≠k(ak - ai),  k = 0, 1, 2, …, n
where ∏i≠k(x - ai) means the product of all the terms
(x - a0), (x - a1), (x - a2), …, (x - an)
except that the kth term is omitted. Then {δ0(x), δ1(x), …, δn(x)} is orthonormal with respect to 〈 , 〉 because δk(ai) = 0 if i ≠ k and δk(ak) = 1. These facts also show that 〈p(x), δk(x)〉 = p(ak) so the expansion theorem gives
p(x) = p(a0)δ0(x) + p(a1)δ1(x) + ⋯ + p(an)δn(x)
for each p(x) in Pn. This is the Lagrange interpolation expansion of p(x), Theorem 3 Section 6.5, which is important in numerical integration.
Lemma 1
Orthogonal Lemma
Let {f1, f2, …, fm} be an orthogonal set of vectors in an inner product space V, and let v be any vector not in span{f1, f2, …, fm}. Define
fm+1 = v - (〈v, f1〉/‖f1‖²)f1 - (〈v, f2〉/‖f2‖²)f2 - ⋯ - (〈v, fm〉/‖fm‖²)fm
Then {f1, f2, …, fm, fm+1} is an orthogonal set of vectors.
The proof of this result (and the next) is the same as for the dot product in ℝⁿ (Lemma 1 and Theorem 2 in Section 8.1).
Theorem 5
Gram-Schmidt Orthogonalization Algorithm
Let V be a finite dimensional inner product space. If {v1, v2, …, vn} is any basis of V, construct f1, f2, …, fn successively by
f1 = v1
fk = vk - (〈vk, f1〉/‖f1‖²)f1 - ⋯ - (〈vk, fk-1〉/‖fk-1‖²)fk-1 for k = 2, 3, …, n
Then {f1, f2, …, fn} is an orthogonal basis of V.
EXAMPLE 4
Consider V = P3 with the inner product 〈p, q〉 = ∫_{-1}^{1} p(x)q(x) dx. If the Gram-Schmidt algorithm is applied to the basis {1, x, x², x³}, show that the result is the orthogonal basis
{1, x, ⅓(3x² - 1), ⅕(5x³ - 3x)}
The polynomials in Example 4 are such that the leading coefficient is 1 in each case. In other contexts (the study of differential equations, for example) it is customary to take multiples p(x) of these polynomials such that p(1) = 1. The resulting orthogonal basis of P3 is
{1, x, ½(3x² - 1), ½(5x³ - 3x)}
and these are the first four Legendre polynomials, so called to honour the French mathematician A. M. Legendre (1752–1833). They are important in the study of differential equations.
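The Gram-Schmidt computation of Example 4 can be reproduced in a few lines using NumPy's polynomial class, with the integral inner product evaluated exactly via the antiderivative (a sketch under the assumptions of Example 4):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def ip(p, q):
    """Inner product <p, q> = integral of p(x)q(x) over [-1, 1]."""
    antider = (p * q).integ()
    return antider(1.0) - antider(-1.0)

def gram_schmidt(vs):
    fs = []
    for v in vs:
        # Subtract the projection of v onto each earlier f (Orthogonal Lemma).
        f = v - sum((ip(v, g) / ip(g, g)) * g for g in fs)
        fs.append(f)
    return fs

basis = [P([1.0]), P([0, 1.0]), P([0, 0, 1.0]), P([0, 0, 0, 1.0])]  # 1, x, x^2, x^3
fs = gram_schmidt(basis)
for f in fs:
    print(f.coef)
# The coefficient arrays match 1, x, x^2 - 1/3, x^3 - (3/5)x,
# i.e., the monic multiples of the first four Legendre polynomials.
```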
If V is an inner product space of dimension n, let E = {f1, f2, …, fn} be an orthonormal basis of V (by Theorem 5). If v = v1f1 + v2f2 + ⋯ + vnfn and w = w1f1 + w2f2 + ⋯ + wnfn are two vectors in V, we have CE(v) = [v1 v2 ⋯ vn]ᵀ and CE(w) = [w1 w2 ⋯ wn]ᵀ. Hence
〈v, w〉 = 〈∑ivifi, ∑jwjfj〉 = ∑i,jviwj〈fi, fj〉 = ∑iviwi = CE(v) · CE(w).
This shows that the coordinate isomorphism CE : V → ℝⁿ preserves inner products, and so proves
Corollary 1
If E is an orthonormal basis of an n-dimensional inner product space V, then the coordinate isomorphism CE : V → ℝⁿ satisfies 〈v, w〉 = CE(v) · CE(w) for all v and w in V.
Theorem 6
Let U be a finite dimensional subspace of an inner product space V, and write n = dim V. Then:
1. V = U ⊕ U⊥.
2. dim U⊥ = n - dim U.
3. U⊥⊥ = U.
PROOF
1. U⊥ is a subspace by Theorem 1 Section 10.1. If v is in U ∩ U⊥, then
〈v, v〉 = 0, so v = 0 again by Theorem 1 Section 10.1. Hence
U ∩ U⊥ = {0}, and it remains to show that U + U⊥ = V. Given v in V,
we must show that v is in U + U⊥, and this is clear if v is in U. If v is not
in U, let {f1, f2, …, fm} be an orthogonal basis of U. Then the orthogonal lemma shows that
v - [ (〈v, f1〉/‖f1‖²)f1 + (〈v, f2〉/‖f2‖²)f2 + ⋯ + (〈v, fm〉/‖fm‖²)fm ]
is in U⊥, so v is in U + U⊥ as required.
2. This follows from Theorem 6 Section 9.3.
3. We have dim U⊥⊥ = n - dim U⊥ = n - (n - dim U ) = dim U, using
(2) twice. As U ⊆ U⊥⊥ always holds (verify), (3) follows by Theorem 2
Section 6.4.
Definition 10.3 The projection on U with kernel U⊥ is called the orthogonal projection on U (or
simply the projection on U) and is denoted projU : V → V.
Theorem 7
Projection Theorem
Let U be a finite dimensional subspace of an inner product space V and let v be a vector
in V.
1. projU : V → V is a linear operator with image U and kernel U⊥.
2. projU(v) is in U and v - projU(v) is in U⊥.
3. If {f1, f2, …, fm} is any orthogonal basis of U, then
projU(v) = (〈v, f1〉/‖f1‖²)f1 + (〈v, f2〉/‖f2‖²)f2 + ⋯ + (〈v, fm〉/‖fm‖²)fm.
PROOF
Only (3) remains to be proved. But since {f1, f2, …, fm} is an orthogonal basis
of U and since projU(v) is in U, the result follows from the expansion theorem
(Theorem 4) applied to the finite dimensional space U.
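The formula in Theorem 7(3) translates directly into a short routine. This is our own sketch (names and the choice of ℝ³ with the dot product are assumptions, not from the text):

```python
import numpy as np

def proj(v, fs, ip):
    """Orthogonal projection onto U = span(fs), where fs is an ORTHOGONAL
    basis of U: proj_U(v) = sum_i (<v, f_i>/||f_i||^2) f_i (Theorem 7(3))."""
    p = np.zeros_like(v, dtype=float)
    for f in fs:
        p = p + (ip(v, f) / ip(f, f)) * f
    return p

dot = lambda x, y: float(x @ y)
U = [np.array([1.0, 1.0, 0.0]), np.array([1.0, -1.0, 0.0])]  # orthogonal basis
v = np.array([2.0, 3.0, 4.0])
p = proj(v, U, dot)
print(p)                                    # [2. 3. 0.]
# Theorem 7(2): p lies in U and v - p is orthogonal to U
assert all(abs(dot(v - p, f)) < 1e-12 for f in U)
```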
EXAMPLE 5
Let U be a subspace of the finite dimensional inner product space V. Show that
projU⊥(v) = v - projU(v) for all v in V.
The vectors v, projU(v), and v - projU(v) in Theorem 7 can be visualized
geometrically as in the diagram (where U is shaded and dim U = 2): the figure
shows v, its projection projU(v) lying in U, and the perpendicular component
v - projU(v). This suggests that projU(v) is the vector in U closest to v.
This is, in fact, the case.

Theorem 8
Approximation Theorem
Let U be a finite dimensional subspace of an inner product space V. If v is any vector in
V, then projU(v) is the vector in U that is closest to v. Here closest means that
‖v - projU(v)‖ < ‖v - u‖ for all u in U with u ≠ projU(v).
PROOF
Write p = projU(v), and consider v - u = (v - p) + (p - u). Because v - p is in
U⊥ and p - u is in U, Pythagoras' theorem gives
‖v - u‖² = ‖v - p‖² + ‖p - u‖² > ‖v - p‖²
because p - u ≠ 0. The result follows.
EXAMPLE 6
Consider the space C[-1, 1] of real-valued continuous functions on the interval
[-1, 1] with inner product 〈f, g〉 = ∫_{-1}^{1} f(x)g(x) dx. Find the polynomial
p = p(x) of degree at most 2 that best approximates the absolute-value function
f given by f(x) = |x|.
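By Theorem 8, the answer to Example 6 is projU(f) with U = P2. The sketch below (ours; the midpoint-rule discretization is an assumption, not the book's method) computes the expansion coefficients against the orthogonal basis {1, x, x² - 1/3} of P2 and recovers the projection.

```python
import numpy as np

N = 20000
dx = 2.0 / N
x = -1.0 + (np.arange(N) + 0.5) * dx            # midpoint grid on [-1, 1]
ip = lambda f, g: float(np.sum(f * g) * dx)     # midpoint rule for the integral

f = np.abs(x)
basis = [np.ones_like(x), x, x**2 - 1/3]        # orthogonal basis of P2 here
coeffs = [ip(f, b) / ip(b, b) for b in basis]   # expansion coefficients
p = sum(c * b for c, b in zip(coeffs, basis))
print(np.round(coeffs, 4))                      # ≈ [0.5, 0, 0.9375]
# so p(x) = 1/2 + (15/16)(x^2 - 1/3) = (3 + 15x^2)/16
assert np.allclose(p, (3 + 15 * x**2) / 16, atol=1e-4)
```

The best quadratic approximation to |x| on [-1, 1] is therefore p(x) = (3 + 15x²)/16.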
EXERCISES 10.2

Use the dot product in ℝⁿ unless otherwise instructed.

1. In each case, verify that B is an orthogonal basis of V with the given inner product and use the expansion theorem to express v as a linear combination of the basis vectors.

(a) v = [a b]ᵀ, B = {[1 -1]ᵀ, [2 0]ᵀ}, V = ℝ², 〈v, w〉 = vᵀAw where A = [2 2; 2 5]

(b) v = [a b c]ᵀ, B = {[1 1 1]ᵀ, [-1 0 1]ᵀ, [1 -6 1]ᵀ}, V = ℝ³, 〈v, w〉 = vᵀAw where A = [2 0 1; 0 1 0; 1 0 2]

(c) v = a + bx + cx², B = {1, x, 2 - 3x²}, V = P2, 〈p, q〉 = p(0)q(0) + p(1)q(1) + p(-1)q(-1)

(d) v = [a b; c d], B = {[1 0; 0 1], [1 0; 0 -1], [0 1; 1 0], [0 1; -1 0]}, V = M22, 〈X, Y〉 = tr(XYᵀ)

2. Let ℝ³ have the inner product 〈(x, y, z), (x′, y′, z′)〉 = 2xx′ + yy′ + 3zz′. In each case, use the Gram-Schmidt algorithm to transform B into an orthogonal basis.

(a) B = {(1, 1, 0), (1, 0, 1), (0, 1, 1)}
(b) B = {(1, 1, 1), (1, -1, 1), (1, 1, 0)}

3. Let M22 have the inner product 〈X, Y〉 = tr(XYᵀ). In each case, use the Gram-Schmidt algorithm to transform B into an orthogonal basis.

(a) B = {[1 1; 0 0], [1 0; 1 0], [0 1; 0 1], [1 0; 0 1]}
(b) B = {[1 1; 0 1], [1 0; 1 1], [1 0; 0 1], [1 0; 0 0]}

5. Show that {1, x - ½, x² - x + ⅙} is an orthogonal basis of P2 with the inner product 〈p, q〉 = ∫_0^1 p(x)q(x) dx, and find the corresponding orthonormal basis.

6. In each case find U⊥ and compute dim U and dim U⊥.

(a) U = span{(1, 1, 2, 0), (3, -1, 2, 1), (1, -3, -2, 1)} in ℝ⁴
(b) U = span{(1, 1, 0, 0)} in ℝ⁴
(c) U = span{1, x} in P2 with 〈p, q〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
(d) U = span{x} in P2 with 〈p, q〉 = ∫_0^1 p(x)q(x) dx
(e) U = span{[1 0; 0 1], [1 1; 0 0]} in M22 with 〈X, Y〉 = tr(XYᵀ)
(f) U = span{[1 1; 0 0], [1 0; 1 0], [1 0; 1 1]} in M22 with 〈X, Y〉 = tr(XYᵀ)

7. Let 〈X, Y〉 = tr(XYᵀ) in M22. In each case find the matrix in U closest to A.

(a) U = span{[1 0; 0 1], [1 1; 1 1]}, A = [1 -1; 2 3]
(b) U = span{[1 0; 0 1], [1 1; 1 -1], [1 1; 0 0]}, A = [2 1; 3 2]

8. Let 〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2) in P2. In each case find the polynomial in U closest to f(x).

(a) U = span{1 + x, x²}, f(x) = 1 + x²
(b) U = span{1, 1 + x²}; f(x) = x

9. Using the inner product 〈p, q〉 = ∫_0^1 p(x)q(x) dx on P2, write v as the sum of a vector in U and a vector in U⊥.

10. (a) Show that {u, v} is orthogonal if and only if ‖u + v‖² = ‖u‖² + ‖v‖².
(b) If u = v = (1, 1) and w = (-1, 0), show that ‖u + v + w‖² = ‖u‖² + ‖v‖² + ‖w‖² but {u, v, w} is not orthogonal. Hence the converse to Pythagoras' theorem need not hold for more than two vectors.

11. Let v and w be vectors in an inner product space V. Show that:
(b) v + w and v - w are orthogonal if and only if ‖v‖ = ‖w‖.

12. Let U and W be subspaces of an n-dimensional inner product space V. If dim U + dim W = n and 〈u, w〉 = 0 for all u in U and w in W, show that U⊥ = W.

13. If U and W are subspaces of an inner product space, show that (U + W)⊥ = U⊥ ∩ W⊥.

14. If X is any set of vectors in an inner product space V, define X⊥ = {v | v in V, 〈v, x〉 = 0 for all x in X}.
(a) Show that X⊥ is a subspace of V.
(b) If U = span{u1, u2, …, um}, show that U⊥ = {u1, …, um}⊥.
(c) If X ⊆ Y, show that Y⊥ ⊆ X⊥.
(d) Show that X⊥ ∩ Y⊥ = (X ∪ Y)⊥.

15. If dim V = n and w ≠ 0 in V, show that dim{v | v in V, 〈v, w〉 = 0} = n - 1.

16. If the Gram-Schmidt process is used on an orthogonal basis {v1, …, vn} of V, show that fk = vk holds for each k = 1, 2, …, n. That is, show that the algorithm reproduces the same basis.

17. If {f1, f2, …, fn-1} is orthonormal in an inner product space of dimension n, prove that there are exactly two vectors fn such that {f1, f2, …, fn-1, fn} is an orthonormal basis.

18. Let U be a finite dimensional subspace of an inner product space V, and let v be a vector in V.
(a) Show that v lies in U if and only if v = projU(v).
(b) If V = ℝ³, show that (-5, 4, -3) lies in span{(3, -2, 5), (-1, 1, 1)} but that (-1, 0, 2) does not.

19. Let n ≠ 0 and w ≠ 0 be nonparallel vectors in ℝ³ (as in Chapter 4).
(a) Show that {n, n × w, w - (n·w/‖n‖²)n} is an orthogonal basis of ℝ³.
(b) Show that span{n × w, w - (n·w/‖n‖²)n} is the plane through the origin with normal n.

20. Let E = {f1, f2, …, fn} be an orthonormal basis of V.
(a) Show that 〈v, w〉 = CE(v) · CE(w) for all v, w in V.
(b) If P = [pij] is an n × n matrix, define bi = pi1f1 + ⋯ + pinfn for each i. Show that B = {b1, b2, …, bn} is an orthonormal basis if and only if P is an orthogonal matrix.

21. Let {f1, …, fn} be an orthogonal basis of V. If v and w are in V, show that
〈v, w〉 = 〈v, f1〉〈w, f1〉/‖f1‖² + ⋯ + 〈v, fn〉〈w, fn〉/‖fn‖².

22. Let {f1, …, fn} be an orthonormal basis of V, and let v = v1f1 + ⋯ + vnfn and w = w1f1 + ⋯ + wnfn. Show that 〈v, w〉 = v1w1 + ⋯ + vnwn and ‖v‖² = v1² + ⋯ + vn² (Parseval's formula).

23. Let v be a vector in an inner product space V.
(a) Show that ‖v‖ ≥ ‖projU(v)‖ holds for all finite dimensional subspaces U. [Hint: Pythagoras' theorem.]
(b) If {f1, f2, …, fm} is any orthogonal set in V, prove Bessel's inequality:
〈v, f1〉²/‖f1‖² + ⋯ + 〈v, fm〉²/‖fm‖² ≤ ‖v‖²

24. Let B = {f1, f2, …, fn} be an orthogonal basis of an inner product space V. Given v ∈ V, let θi be the angle between v and fi for each i (see Exercise 31 Section 10.1). Show that cos² θ1 + cos² θ2 + ⋯ + cos² θn = 1. [The cos θi are called direction cosines for v corresponding to B.]

486 Chapter 10 Inner Product Spaces

25. (a) Let S denote a set of vectors in a finite dimensional inner product space V, and suppose that 〈u, v〉 = 0 for all u in S implies v = 0. Show that V = span S. [Hint: Write U = span S and use Theorem 6.]
(b) Let A1, A2, …, Ak be n × n matrices. Show that the following are equivalent.
(i) If Aib = 0 for all i (where b is a column in ℝⁿ), then b = 0.
(ii) The set of all rows of the matrices Ai spans ℝⁿ.

26. Let [xi) = (x1, x2, …) denote a sequence of real numbers xi, and let V = {[xi) | only finitely many xi ≠ 0}. Define componentwise addition and scalar multiplication on V as follows: [xi) + [yi) = [xi + yi), and a[xi) = [axi) for a in ℝ. Given [xi) and [yi) in V, define 〈[xi), [yi)〉 = ∑ᵢ xiyi. (Note that this makes sense since only finitely many xi and yi are nonzero.) Finally define U = {[xi) in V | ∑ᵢ xi = 0}.
(a) Show that V is a vector space and that U is a subspace.
(b) Show that 〈 , 〉 is an inner product on V.
(c) Show that U⊥ = {0}.
(d) Hence show that U ⊕ U⊥ ≠ V and U ≠ U⊥⊥.
Theorem 1
PROOF
We have MB(T) = [CB[T(b1)] CB[T(b2)] ⋯ CB[T(bn)]] where B = {b1, b2, …, bn}
is any basis of V. By comparing columns:
MB(T) = diag(λ1, λ2, …, λn) if and only if T(bi) = λibi for each i.
Theorem 1 follows.
Definition 10.4 A linear operator T on a finite dimensional space V is called diagonalizable if V has a
basis consisting of eigenvectors of T.
SECTION 10.3 Orthogonal Diagonalization 487
EXAMPLE 1
Let T : P2 → P2 be given by
T(a + bx + cx2) = (a + 4c) - 2bx + (3a + 2c)x2
Find the eigenspaces of T and hence find a basis of eigenvectors.
Solution ► One sees that {[0 1 0]ᵀ, [4 0 -3]ᵀ, [1 0 1]ᵀ} is a basis of
eigenvectors of MB0(T), where B0 = {1, x, x²}, so
B = {x, 4 - 3x², 1 + x²} is a basis of P2 consisting of eigenvectors of T.
If V is an inner product space, the expansion theorem gives a simple formula for
the matrix of a linear operator with respect to an orthogonal basis.
Theorem 2
PROOF
Write MB(T) = [aij]. The jth column of MB(T) is CB[T(bj)], so
T(bj) = a1jb1 + ⋯ + aijbi + ⋯ + anjbn
On the other hand, the expansion theorem (Theorem 4 Section 10.2) gives
v = (〈b1, v〉/‖b1‖²)b1 + ⋯ + (〈bi, v〉/‖bi‖²)bi + ⋯ + (〈bn, v〉/‖bn‖²)bn
for any v in V. The result follows by taking v = T(bj).
EXAMPLE 2
Let T : ℝ³ → ℝ³ be given by
T(a, b, c) = (a + 2b - c, 2a + 3c, -a + 3b + 2c)
If the dot product in ℝ³ is used, find the matrix of T with respect to the
standard basis B = {e1, e2, e3} where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1).
Theorem 3
Let V be a finite dimensional inner product space. The following conditions are
equivalent for a linear operator T : V → V.
1. 〈v, T(w)〉 = 〈T(v), w〉 for all v and w in V.
2. The matrix of T is symmetric with respect to every orthonormal basis of V.
3. The matrix of T is symmetric with respect to some orthonormal basis of V.
4. There is an orthonormal basis B = {f1, f2, …, fn} of V such that
〈fi, T(fj)〉 = 〈T(fi), fj〉 holds for all i and j.
PROOF
(1) ⇒ (2). Let B = {f1, …, fn} be an orthonormal basis of V, and write
MB(T ) = [aij]. Then aij = 〈fi, T(fj)〉 by Theorem 2. Hence (1) and axiom P2 give
aij = 〈fi, T(fj)〉 = 〈T(fi), fj〉 = 〈fj, T(fi)〉 = aji
for all i and j. This shows that MB(T ) is symmetric.
(2) ⇒ (3). This is clear.
(3) ⇒ (4). Let B = {f1, …, fn} be an orthonormal basis of V such that MB(T ) is
symmetric. By (3) and Theorem 2, 〈fi, T(fj)〉 = 〈fj, T(fi)〉 for all i and j, so (4)
follows from axiom P2.
(4) ⇒ (1). Let v and w be vectors in V and write them as v = ∑_{i=1}^n vifi and
w = ∑_{j=1}^n wjfj. Then
〈v, T(w)〉 = 〈∑ᵢvifi, ∑ⱼwjT(fj)〉 = ∑ᵢ∑ⱼ viwj〈fi, T(fj)〉
= ∑ᵢ∑ⱼ viwj〈T(fi), fj〉
= 〈∑ᵢviT(fi), ∑ⱼwjfj〉
= 〈T(v), w〉
where we used (4) at the third stage. This proves (1).
EXAMPLE 3
If A is an n × n matrix, let TA : ℝⁿ → ℝⁿ be the matrix operator given by
TA(v) = Av for all columns v. If the dot product is used in ℝⁿ, then TA is a
symmetric operator if and only if A is a symmetric matrix.
Solution ► If E is the standard basis of ℝⁿ, then E is orthonormal when the dot
product is used. We have ME(TA) = A (by Example 4 Section 9.1), so the result
follows immediately from part (3) of Theorem 3.
Theorem 4
A symmetric linear operator on a finite dimensional inner product space has real
eigenvalues.
Theorem 5
PROOF
1. U is itself an inner product space using the same inner product, and condition
1 in Theorem 3 that T is symmetric is clearly preserved.
2. If v is in U⊥, our task is to show that T(v) is also in U⊥; that is, 〈T(v), u〉 = 0 for
all u in U. But if u is in U, then T(u) also lies in U because U is T-invariant, so
〈T(v), u〉 = 〈v, T(u)〉 = 0
using the symmetry of T and the definition of U⊥.
The principal axis theorem (Theorem 2 Section 8.2) asserts that an n × n matrix
A is symmetric if and only if ℝⁿ has an orthogonal basis of eigenvectors of A. The
following result not only extends this theorem to an arbitrary n-dimensional inner
product space, but the proof is much more intuitive.
Theorem 6
PROOF
(1) ⇒ (2). Assume that T is symmetric and proceed by induction on n = dim V.
If n = 1, every nonzero vector in V is an eigenvector of T, so there is nothing
to prove. If n ≥ 2, assume inductively that the theorem holds for spaces of
dimension less than n. Let λ1 be a real eigenvalue of T (by Theorem 4) and
choose an eigenvector f1 corresponding to λ1. Then U = ℝf1 is T-invariant, so
U⊥ is also T-invariant by Theorem 5 (T is symmetric). Because dim U⊥ = n - 1
(Theorem 6 Section 10.2), and because the restriction of T to U⊥ is a symmetric
operator (Theorem 5), it follows by induction that U⊥ has an orthogonal basis
{f2, …, fn} of eigenvectors of T. Hence B = {f1, f2, …, fn} is an orthogonal basis of
V, which proves (2).
(2) ⇒ (1). If B = {f1, …, fn} is a basis as in (2), then MB(T ) is symmetric (indeed
diagonal), so T is symmetric by Theorem 3.
EXAMPLE 4
Let T : P2 → P2 be given by
T(a + bx + cx2) = (8a - 2b + 2c) + (-2a + 5b + 4c)x + (2a + 4b + 5c)x2
Using the inner product 〈a + bx + cx², a′ + b′x + c′x²〉 = aa′ + bb′ + cc′,
show that T is symmetric and find an orthonormal basis of P2 consisting
of eigenvectors.
Solution ► If B0 = {1, x, x²}, then
MB0(T) = [8 -2 2; -2 5 4; 2 4 5]
is symmetric, so T is symmetric. This matrix was analyzed in Example 5
Section 8.2, where it was found that an orthonormal basis of eigenvectors is
{⅓[1 2 -2]ᵀ, ⅓[2 1 2]ᵀ, ⅓[-2 2 1]ᵀ}. Because B0 is orthonormal, the
corresponding basis B = {⅓(1 + 2x - 2x²), ⅓(2 + x + 2x²), ⅓(-2 + 2x + x²)}
of P2 is orthonormal and consists of eigenvectors of T.
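The diagonalization in Example 4 can be confirmed numerically; `numpy.linalg.eigh` is designed for symmetric matrices and returns real eigenvalues with orthonormal eigenvector columns, exactly as Theorem 6 promises. A sketch of ours, not from the text:

```python
import numpy as np

# M_{B0}(T) from Example 4
A = np.array([[8.0, -2.0, 2.0],
              [-2.0, 5.0, 4.0],
              [2.0, 4.0, 5.0]])
vals, P = np.linalg.eigh(A)
print(np.round(vals, 6))                    # eigenvalues 0, 9, 9 up to roundoff
assert np.allclose(P.T @ P, np.eye(3))      # eigenvector columns are orthonormal
assert np.allclose(P @ np.diag(vals) @ P.T, A)
```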
EXERCISES 10.3

(a) Show that TA is symmetric if and only if PA = AᵀP.
(b) Use part (a) to deduce Example 3.

7. Let T : M22 → M22 be given by T(X) = AX, where A is a fixed 2 × 2 matrix.
(a) Compute MB(T), where B = {[1 0; 0 0], [0 0; 1 0], [0 1; 0 0], [0 0; 0 1]}. Note the order!
(b) Show that cT(x) = [cA(x)]².
(c) If the inner product on M22 is 〈X, Y〉 = tr(XYᵀ), show that T is symmetric if and only if A is a symmetric matrix.

8. Let T : ℝ² → ℝ² be given by T(a, b) = (b - a, a + 2b). Show that T is symmetric if the dot product is used in ℝ² but that it is not symmetric if the following inner product is used: 〈x, y〉 = xAyᵀ, A = [1 -1; -1 2].

9. If T : V → V is symmetric, write T⁻¹(W) = {v | T(v) is in W}. Show that T(U)⊥ = T⁻¹(U⊥) holds for every subspace U of V.

10. Let T : M22 → M22 be defined by T(X) = PXQ, where P and Q are nonzero 2 × 2 matrices. Use the inner product 〈X, Y〉 = tr(XYᵀ). Show that T is symmetric if and only if either P and Q are both symmetric or both are scalar multiples of [0 1; -1 0]. [Hint: If B is as in part (a) of Exercise 7, then MB(T) = [aP cP; bP dP] in block form, where Q = [a b; c d]. If B0 = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}, …]

11. Let B = {b1, …, bn} and D = {d1, …, dn} be orthogonal bases of V. If T : V → V is a linear operator, show that MDB(T) = [〈di, T(bj)〉/‖di‖²]. This is a generalization of Theorem 2.

12. Let T : V → V be a linear operator on an inner product space V of finite dimension. Show that the following are equivalent.
(1) 〈v, T(w)〉 = -〈T(v), w〉 for all v and w in V.
(2) MB(T) is skew-symmetric for every orthonormal basis B.
(3) MB(T) is skew-symmetric for some orthonormal basis B.
Such operators T are called skew-symmetric operators.

13. Let T : V → V be a linear operator on an n-dimensional inner product space V.
(a) Show that T is symmetric if and only if it satisfies the following two conditions.
(i) cT(x) factors completely over ℝ.
(ii) If U is a T-invariant subspace of V, then U⊥ is also T-invariant.
(b) Using the standard inner product on ℝ², show that T : ℝ² → ℝ² with T(a, b) = (a, a + b) satisfies condition (i) and that S : ℝ² → ℝ² with S(a, b) = (b, -a) satisfies condition (ii), but that neither is symmetric. (Example 4 Section 9.3 is useful for S.)
[Hint for part (a): If conditions (i) and (ii) hold, proceed by induction on n. By condition (i), let e1 be an eigenvector of T. If U = ℝe1, then U⊥ is T-invariant by condition (ii), so show that the restriction of T to U⊥ satisfies conditions (i) and (ii). (Theorem 1 Section 9.3 is helpful for part (i).) Then apply induction to show that V has an orthogonal basis of eigenvectors (as in Theorem 6).]

14. Let B = {f1, f2, …, fn} be an orthonormal basis of an inner product space V. Given T : V → V, define T′ : V → V by
T′(v) = 〈v, T(f1)〉f1 + 〈v, T(f2)〉f2 + ⋯ + 〈v, T(fn)〉fn

15. Let V be a finite dimensional inner product space. Show that the following conditions are equivalent for a linear operator T : V → V.
(1) T is symmetric and T² = T.

SECTION 10.4 Isometries 493
Lemma 1
PROOF
We have ‖S(v) - S(w)‖² = ‖v - w‖² for all v and w in V by (∗), which gives
〈S(v), S(w)〉 = 〈v, w〉 for all v and w in V. (∗∗)
Now let {f1, f2, …, fn} be an orthonormal basis of V. Then {S(f1), S(f2), …, S(fn)}
is orthonormal by (∗∗) and so is a basis because dim V = n. Now compute:
〈S(v + w) - S(v) - S(w), S(fi)〉 = 〈S(v + w), S(fi)〉 - 〈S(v), S(fi)〉 - 〈S(w), S(fi)〉
= 〈v + w, fi〉 - 〈v, fi〉 - 〈w, fi〉
=0
for each i. It follows from the expansion theorem (Theorem 4 Section 10.2)
that S(v + w) - S(v) - S(w) = 0; that is, S(v + w) = S(v) + S(w). A similar
argument shows that S(av) = aS(v) holds for all a in ℝ and v in V, so S is
linear after all.
Theorem 1
PROOF
If S : V → V is distance preserving, write S(0) = u and define T : V → V by
T(v) = S(v) - u for all v in V. Then ‖T(v) - T(w)‖ = ‖v - w‖ for all vectors
v and w in V as the reader can verify; that is, T is distance preserving. Clearly,
T(0) = 0, so it is an isometry by Lemma 1. Since S(v) = u + T(v) = (Su ◦ T )(v)
for all v in V, we have S = Su ◦ T, and the theorem is proved.
Theorem 2
PROOF
(1) ⇒ (2). Take w = 0 in (∗).
(2) ⇒ (3). Since T is linear, (2) gives ‖T(v) - T(w)‖² = ‖T(v - w)‖² = ‖v - w‖².
Now (3) follows.
(3) ⇒ (4). By (3), {T(f1), T(f2), …, T(fn)} is orthogonal and ‖T(fi)‖² = ‖fi‖² = 1.
Hence it is a basis because dim V = n.
(4) ⇒ (5). This needs no proof.
(5) ⇒ (1). By (5), let {f1, …, fn} be an orthonormal basis of V such that
{T(f1), …, T(fn)} is also orthonormal. Given v = v1f1 + ⋯ + vnfn in V, we have
T(v) = v1T(f1) + ⋯ + vnT(fn), so Pythagoras' theorem gives
‖T(v)‖² = v1² + ⋯ + vn² = ‖v‖².
Hence ‖T(v)‖ = ‖v‖ for all v, and (1) follows by replacing v by v - w.
Corollary 1
PROOF
(1) is by (4) of Theorem 2 and Theorem 1 Section 7.3. (2a) is clear, and (2b) is
left to the reader. If T : V → V is an isometry and {f1, …, fn} is an orthonormal
basis of V, then (2c) follows because T -1 carries the orthonormal basis
{T(f1), …, T(fn)} back to {f1, …, fn}.
The conditions in part (2) of the corollary assert that the set of isometries
of a finite dimensional inner product space forms an algebraic system called a
group. The theory of groups is well developed, and groups of operators are
important in geometry. In fact, geometry itself can be fruitfully viewed as the
study of those properties that are preserved by suitable groups of transformations.
EXAMPLE 1
Rotations of ℝ² about the origin are isometries, as are reflections in lines
through the origin: They clearly preserve distance and so are linear by Lemma
1. Similarly, rotations about lines through the origin and reflections in planes
through the origin are isometries of ℝ³.
EXAMPLE 2
Let T : Mnn → Mnn be the transposition operator: T(A) = AT. Then T is an
isometry if the inner product is 〈A, B〉 = tr(ABᵀ) = ∑_{i,j} aijbij. In fact, T permutes
the basis consisting of all matrices with one entry 1 and the other entries 0.
The proof of the next result requires the fact (see Theorem 2) that, if B is an
orthonormal basis, then 〈v, w〉 = CB(v) · CB(w) for all vectors v and w.
Theorem 3
PROOF
(1) ⇒ (2). Let B = {e1, …, en} be an orthonormal basis. Then the jth column of
MB(T ) is CB[T(ej)], and we have
CB[T(ej)] · CB[T(ek)] = 〈T(ej), T(ek)〉 = 〈ej, ek〉
using (1). Hence the columns of MB(T ) are orthonormal in n, which proves (2).
(2) ⇒ (3). This is clear.
(3) ⇒ (1). Let B = {e1, …, en} be as in (3). Then, as before,
〈T(ej), T(ek)〉 = CB[T(ej)] · CB[T(ek)]
so {T(e1), …, T(en)} is orthonormal by (3). Hence Theorem 2 gives (1).
Corollary 2
EXAMP L E 3
If A is any n × n matrix, the matrix operator TA : ℝⁿ → ℝⁿ is an isometry if and
only if A is orthogonal, using the dot product in ℝⁿ. Indeed, if E is the standard
basis of ℝⁿ, then ME(TA) = A by Theorem 4 Section 9.2.
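This criterion, that TA preserves the dot product exactly when AᵀA = I, is trivial to test numerically. A sketch of ours (function name and test matrices are our choices):

```python
import numpy as np

def is_isometry(A):
    """With the dot product, T_A is an isometry exactly when A^T A = I."""
    A = np.asarray(A, dtype=float)
    return bool(np.allclose(A.T @ A, np.eye(A.shape[0])))

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation
S = np.array([[1.0, 1.0], [0.0, 1.0]])            # a shear
print(is_isometry(R), is_isometry(S))             # True False
```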
Rotations and reflections that fix the origin are isometries in ℝ² and ℝ³
(Example 1); we are going to show that these isometries (and compositions of
them in ℝ³) are the only possibilities. In fact, this will follow from a general
structure theorem for isometries. Surprisingly enough, much of the work
involves the two-dimensional case.
Theorem 4
MB(T) = [1 0; 0 -1]
Furthermore, type (1) occurs if and only if det T = 1, and type (2) occurs if and only if
det T = -1.
PROOF
The final statement follows from the rest because det T = det[MB(T )] for any
basis B. Let B0 = {e1, e2} be any ordered orthonormal basis of V and write
A = MB0(T) = [a b; c d]; that is, T(e1) = ae1 + ce2 and T(e2) = be1 + de2.
Then A is orthogonal by Theorem 3, so its columns (and rows) are orthonormal.
Hence a2 + c2 = 1 = b2 + d2, so (a, c) and (d, b) lie on the unit circle. Thus
angles θ and φ exist such that
a = cos θ, c = sin θ 0 ≤ θ < 2π
d = cos φ, b = sin φ 0 ≤ φ < 2π
A = [cos θ, (-1)^{k+1} sin θ; sin θ, (-1)^k cos θ]
If k is even we are in type (1) with B = B0, so assume k is odd. Then
A = [a c; c -a]. If a = -1 and c = 0, we are in type (2) with B = {e2, e1}.
Otherwise A has eigenvalues λ1 = 1 and λ2 = -1 with corresponding eigenvectors
x1 = [1 + a; c] and x2 = [-c; 1 + a], as the reader can verify. Write
Corollary 3
In fact, if E is the standard basis of ℝ², then the counterclockwise rotation Rθ
about the origin through an angle θ has matrix
ME(Rθ) = [cos θ -sin θ; sin θ cos θ]
(see Theorem 4 Section 2.6). On the other hand, if S : ℝ² → ℝ² is the reflection in a
line through the origin (called the fixed line of the reflection), let f1 be a unit vector
pointing along the fixed line and let f2 be a unit vector perpendicular to the fixed
line. Then B = {f1, f2} is an orthonormal basis, S(f1) = f1 and S(f2) = -f2, so
MB(S) = [1 0; 0 -1]
Thus S is of type (2). Note that, in this case, 1 is an eigenvalue of S, and any
eigenvector corresponding to 1 is a direction vector for the fixed line.
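The det test in Theorem 4 gives a complete recipe for classifying a 2 × 2 orthogonal matrix, which the following sketch (ours) implements: det = 1 yields the rotation angle, det = -1 yields the fixed line as an eigenvector for eigenvalue 1.

```python
import numpy as np

def classify(A):
    """Theorem 4 for a 2x2 orthogonal matrix A: det A = 1 gives a rotation
    (return its angle); det A = -1 gives a reflection (return a direction
    vector of the fixed line, i.e. an eigenvector for eigenvalue 1)."""
    A = np.asarray(A, dtype=float)
    if np.isclose(np.linalg.det(A), 1.0):
        return "rotation", float(np.arctan2(A[1, 0], A[0, 0]))
    # when det A = -1, A is symmetric of the form [a c; c -a]
    vals, vecs = np.linalg.eigh(A)
    return "reflection", vecs[:, np.argmax(vals)]

print(classify([[0.0, -1.0], [1.0, 0.0]]))   # rotation through pi/2
print(classify([[1.0, 0.0], [0.0, -1.0]]))   # reflection, fixed line along e1
```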
EXAMPLE 4
In each case, determine whether TA : ℝ² → ℝ² is a rotation or a reflection, and
then find the angle or fixed line:
We now give a structure theorem for isometries. The proof requires three
preliminary results, each of interest in its own right.
Lemma 2
PROOF
Let w lie in U⊥. We are to prove that T(w) is also in U⊥; that is, 〈T(w), u〉 = 0
for all u in U. At this point, observe that the restriction of T to U is an isometry
U → U and so is an isomorphism by the corollary to Theorem 2. In particular,
each u in U can be written in the form u = T(u1) for some u1 in U, so
〈T(w), u〉 = 〈T(w), T(u1)〉 = 〈w, u1〉 = 0
Lemma 3
PROOF
Choose an orthonormal basis B of V, and let A = MB(T). Then A is a real
orthogonal matrix so, using the standard inner product 〈x, y〉 = xᵀȳ in ℂⁿ,
we get
‖Ax‖² = (Ax)ᵀ(Ax̄) = xᵀ(AᵀA)x̄ = xᵀIx̄ = ‖x‖²
for all x in ℂⁿ. But Ax = λx for some x ≠ 0, whence ‖x‖² = ‖λx‖² = |λ|²‖x‖².
This gives |λ| = 1, as required.
Lemma 4
PROOF
Let B be an orthonormal basis of V, let A = MB(T), and (using Lemma 3) let
λ = e^{iα} be a nonreal eigenvalue of A, say Ax = λx where x ≠ 0 in ℂⁿ. Because
A is real, complex conjugation gives Ax̄ = λ̄x̄, so λ̄ is also an eigenvalue.
Moreover λ ≠ λ̄ (λ is nonreal), so {x, x̄} is linearly independent in ℂⁿ (the
argument in the proof of Theorem 4 Section 5.5 works). Now define
z1 = x + x̄ and z2 = i(x - x̄)
Then z1 and z2 lie in ℝⁿ, and {z1, z2} is linearly independent over ℝ because
{x, x̄} is linearly independent over ℂ. Moreover
x = ½(z1 - iz2) and x̄ = ½(z1 + iz2)
Now λ + λ̄ = 2 cos α and λ - λ̄ = 2i sin α, and a routine computation gives
Az1 = z1 cos α + z2 sin α
Az2 = -z1 sin α + z2 cos α
Finally, let e1 and e2 in V be such that z1 = CB(e1) and z2 = CB(e2). Then
CB[T(e1)] = ACB(e1) = Az1 = CB(e1 cos α + e2 sin α)
using Theorem 2 Section 9.1. Because CB is one-to-one, this gives the first
of the following equations (the other is similar):
T(e1) = e1 cos α + e2 sin α
T(e2) = -e1 sin α + e2 cos α
Thus U = span{e1, e2} is T-invariant and two-dimensional.
Theorem 5
Let T : V → V be an isometry of the n-dimensional inner product space V, and
write R(θ) = [cos θ -sin θ; sin θ cos θ]. Then there is an orthonormal basis B of
V such that MB(T) has one of the following block diagonal forms:
n = 2k: MB(T) = diag(R(θ1), R(θ2), …, R(θk)) or diag(-1, 1, R(θ1), …, R(θk-1))
n = 2k + 1: MB(T) = diag(1, R(θ1), …, R(θk)) or diag(-1, R(θ1), …, R(θk))
PROOF
We show first, by induction on n, that an orthonormal basis B of V can be found
such that MB(T ) is a block diagonal matrix of the following form:
MB(T) = diag(Ir, -Is, R(θ1), …, R(θt))
where the identity matrix Ir, the matrix -Is, or the matrices R(θi) may be missing.
If n = 1 and V = ℝv, this holds because T(v) = λv and λ = ±1 by Lemma 3.
If n = 2, this follows from Theorem 4. If n ≥ 3, either T has a real eigenvalue
and therefore has a one-dimensional T-invariant subspace U = ℝu for any
eigenvector u, or T has no real eigenvalue and therefore has a two-dimensional
T-invariant subspace U by Lemma 4. In either case U⊥ is T-invariant (Lemma
2) and dim U⊥ = n - dim U < n. Hence, by induction, let B1 and B2 be
orthonormal bases of U and U⊥ such that MB1(T ) and MB2(T ) have the form
given. Then B = B1 ∪ B2 is an orthonormal basis of V, and MB(T ) has the
desired form with a suitable ordering of the vectors in B.
Suppose that Q : ℝ³ → ℝ³ is any reflection in a plane through the origin (called
the fixed plane of the reflection). Take {f2, f3} to be any orthonormal basis of
the fixed plane and take f1 to be a unit vector perpendicular to the fixed plane.
Then Q(f1) = -f1, whereas Q(f2) = f2 and Q(f3) = f3. Hence B = {f1, f2, f3} is an
orthonormal basis such that
MB(Q) = diag(-1, 1, 1)
Similarly, suppose that R : ℝ³ → ℝ³ is any rotation about a line through the
origin (called the axis of the rotation), and let f1 be a unit vector pointing along
the axis, so R(f1) = f1. Now the plane through the origin perpendicular to the
axis is an R-invariant subspace of ℝ³ of dimension 2, and the restriction of R to
this plane is a rotation. Hence, by Theorem 4, there is an orthonormal basis
B1 = {f2, f3} of this plane such that MB1(R) = [cos θ -sin θ; sin θ cos θ]. But then
B = {f1, f2, f3} is an orthonormal basis of ℝ³ such that the matrix of R is
MB(R) = [1 0 0; 0 cos θ -sin θ; 0 sin θ cos θ]
However, Theorem 5 shows that there are isometries T of ℝ³ of a third type:
those with a matrix of the form
MB(T) = [-1 0 0; 0 cos θ -sin θ; 0 sin θ cos θ]
If B = {f1, f2, f3}, let Q be the reflection in the plane spanned by f2 and f3, and let R
be the rotation corresponding to θ about the line spanned by f1. Then MB(Q) and
MB(R) are as above, and MB(Q) MB(R) = MB(T ) as the reader can verify. This means
that MB(QR) = MB(T ) by Theorem 1 Section 9.2, and this in turn implies that
QR = T because MB is one-to-one (see Exercise 26 Section 9.1). A similar argument
shows that RQ = T, and we have Theorem 6.
Theorem 6
PROOF
It remains only to verify the final observation that T is a rotation if and only if
det T = 1. But clearly det T = -1 in parts (b) and (c).
TABLE 1
Eigenvalues of T | Action of T
(1) 1, no other real eigenvalues | Rotation about the line ℝf where f is an eigenvector corresponding to 1. [Case (a) of Theorem 6.]
(2) -1, no other real eigenvalues | Rotation about the line ℝf followed by reflection in the plane (ℝf)⊥ where f is an eigenvector corresponding to -1. [Case (c) of Theorem 6.]
(3) -1, 1, 1 | Reflection in the plane (ℝf)⊥ where f is an eigenvector corresponding to -1. [Case (b) of Theorem 6.]
(4) 1, -1, -1 | This is as in (1) with a rotation of π.
(5) -1, -1, -1 | Here T(x) = -x for all x. This is (2) with a rotation of π.
(6) 1, 1, 1 | Here T is the identity isometry.
EXAMPLE 5
Analyze the isometry T : ℝ³ → ℝ³ given by T(x, y, z) = (y, z, -x).
Solution ► If B0 is the standard basis of ℝ³, then
MB0(T) = [0 1 0; 0 0 1; -1 0 0]
so cT(x) = x³ + 1 = (x + 1)(x² - x + 1). This is (2) in Table 1. Write:
f1 = (1/√3)[1 -1 1]ᵀ,  f2 = (1/√6)[1 2 1]ᵀ,  f3 = (1/√2)[1 0 -1]ᵀ
Here f1 is a unit eigenvector corresponding to λ1 = -1, so T is a rotation
(through an angle θ) about the line L = ℝf1, followed by reflection in the
plane U through the origin perpendicular to f1 (with equation x - y + z = 0).
Then {f2, f3} is chosen as an orthonormal basis of U, so B = {f1, f2, f3} is an
orthonormal basis of ℝ³ and
MB(T) = [-1 0 0; 0 1/2 -√3/2; 0 √3/2 1/2]
Hence θ is given by cos θ = ½, sin θ = √3/2, so θ = π/3.
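The eigenvalue analysis in Example 5 can be reproduced numerically: the matrix is orthogonal, its only real eigenvalue is -1, and the argument of the complex pair recovers the rotation angle π/3. A sketch of ours:

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0],    # T(x, y, z) = (y, z, -x) in the standard basis
              [0.0, 0.0, 1.0],
              [-1.0, 0.0, 0.0]])
assert np.allclose(A.T @ A, np.eye(3))          # T is an isometry

vals = np.linalg.eigvals(A)                     # roots of x^3 + 1
is_real = np.isclose(vals.imag, 0.0)
assert np.isclose(vals[is_real].real[0], -1.0)  # the real eigenvalue is -1
theta = abs(np.angle(vals[~is_real][0]))        # argument of e^{±iπ/3}
print(theta, np.pi / 3)                         # both ≈ 1.0472
```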
for some orthonormal basis B = {f1, f2, …, fn}. Then Q(f1) = -f1 whereas Q(u) = u
for each u in U = span{f2, …, fn}. Hence U is called the fixed hyperplane of Q, and
Q is called reflection in U. Note that each hyperplane in V is the fixed hyperplane
of a (unique) reflection of V. Clearly, reflections in 2 and 3 are reflections in this
more general sense.
Continuing the analogy with 2 and 3, an isometry T : V → V is called a
rotation if there exists an orthonormal basis B = {f1, …, fn} such that
MB(T) = [Ir 0 0; 0 R(θ) 0; 0 0 Is]
in block form, where R(θ) = [cos θ -sin θ; sin θ cos θ], and where either Ir or Is
(or both) may be missing. If R(θ) occupies columns i and i + 1 of MB(T), and if
W = span{fi, fi+1}, then W is T-invariant and the matrix of T : W → W with
respect to {fi, fi+1} is R(θ). Clearly, if W is viewed as a copy of ℝ², then T is a
rotation in W. Moreover, T(u) = u holds for all vectors u in the
(n - 2)-dimensional subspace U = span{f1, …, fi-1, fi+2, …, fn}, and U is called
the fixed axis of the rotation T. In ℝ³, the axis of any rotation is a line
(one-dimensional), whereas in ℝ² the axis is U = {0}.
With these definitions, the following theorem is an immediate consequence of
Theorem 5 (the details are left to the reader).
Theorem 7
EXERCISES 10.4
2. In each case, show that T is an isometry of 2, 6. If T is an isometry, show that aT is an isometry if
determine whether it is a rotation or a reflection, and only if a = ±1.
and find the angle or the fixed line. Use the dot product.

   (a) T(a, b) = (-a, b)        (b) T(a, b) = (-a, -b)
   (c) T(a, b) = (b, -a)        (d) T(a, b) = (-b, -a)
   (e) T(a, b) = (1/√2)(a + b, b - a)
   (f) T(a, b) = (1/√2)(a - b, a + b)

3. In each case, show that T is an isometry of ℝ³, determine the type (Theorem 6), and find the axis of any rotations and the fixed plane of any reflections involved.
   (a) T(a, b, c) = (a, -b, c)
   (b) T(a, b, c) = (1/2)(√3 a + c, √3 c - a, 2b)
   (c) T(a, b, c) = (b, c, a)
   (d) T(a, b, c) = (a, -b, -c)
   (e) T(a, b, c) = (1/2)(a + √3 b, b - √3 a, 2c)
   (f) T(a, b, c) = (1/√2)(a + c, -√2 b, c - a)

4. Let T : ℝ² → ℝ² be an isometry. A vector x in ℝ² is said to be fixed by T if T(x) = x. Let E1 denote the set of all vectors in ℝ² fixed by T. Show that:
   (a) E1 is a subspace of ℝ².
   (b) E1 = ℝ² if and only if T = 1 is the identity map.
   (c) dim E1 = 1 if and only if T is a reflection (about the line E1).
   (d) E1 = {0} if and only if T is a rotation (T ≠ 1).

5. Let T : ℝ³ → ℝ³ be an isometry, and let E1 be the subspace of all fixed vectors in ℝ³ (see Exercise 4). Show that:
   (a) E1 = ℝ³ if and only if T = 1.
   (b) dim E1 = 2 if and only if T is a reflection (about the plane E1).
   (c) dim E1 = 1 if and only if T is a rotation (T ≠ 1) (about the line E1).
   (d) dim E1 = 0 if and only if T is a reflection followed by a (nonidentity) rotation.

7. Show that every isometry preserves the angle between any pair of nonzero vectors (see Exercise 31 Section 10.1). Must an angle-preserving isomorphism be an isometry? Support your answer.

8. If T : V → V is an isometry, show that T² = 1V if and only if the only complex eigenvalues of T are 1 and -1.

9. Let T : V → V be a linear operator. Show that any two of the following conditions implies the third:
   (1) T is symmetric.
   (2) T is an involution (T² = 1V).
   (3) T is an isometry.
   [Hint: In all cases, use the definition 〈v, T(w)〉 = 〈T(v), w〉 of a symmetric operator. For (1) and (3) ⇒ (2), use the fact that, if 〈T²(v) - v, w〉 = 0 for all w, then T²(v) = v.]

10. If B and D are any orthonormal bases of V, show that there is an isometry T : V → V that carries B to D.

11. Show that the following are equivalent for a linear transformation S : V → V where V is finite dimensional and S ≠ 0:
   (1) 〈S(v), S(w)〉 = 0 whenever 〈v, w〉 = 0;
   (2) S = aT for some isometry T : V → V and some a ≠ 0 in ℝ;
   (3) S is an isomorphism and preserves angles between nonzero vectors.
   [Hint: Given (1), show that ‖S(e)‖ = ‖S(f)‖ for all unit vectors e and f in V.]

12. Let S : V → V be a distance preserving transformation where V is finite dimensional.
   (a) Show that the factorization in the proof of Theorem 1 is unique. That is, if S = S_u ∘ T and S = S_u′ ∘ T′ where u, u′ ∈ V and T, T′ : V → V are isometries, show that u = u′ and T = T′.
   (b) If S = S_u ∘ T, u ∈ V, T an isometry, show that w ∈ V exists such that S = T ∘ S_w.
506 Chapter 10 Inner Product Spaces
13. Define T : P → P by T(f) = xf(x) for all f ∈ P, and define an inner product on P as follows: if f = a0 + a1x + a2x² + ⋯ and g = b0 + b1x + b2x² + ⋯ are in P, define
    〈f, g〉 = a0b0 + a1b1 + a2b2 + ⋯
   (a) Show that 〈 , 〉 is an inner product on P.
   (b) Show that T is an isometry of P.
   (c) Show that T is one-to-one but not onto.
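Exercises such as 2 and 3 above are easy to sanity-check numerically. The sketch below is our own (not part of the text) and assumes the matrices correctly transcribe the maps of Exercise 3; an isometry of ℝ³ has an orthogonal matrix Q (QᵀQ = I), and det Q = +1 indicates a rotation while det Q = -1 indicates a reflection (possibly composed with a rotation).

```python
import numpy as np

# Matrices of the maps (a)-(f) in Exercise 3 (transcription assumed correct).
s3, s2 = np.sqrt(3), np.sqrt(2)
maps = {
    "(a)": np.array([[1.0, 0, 0], [0, -1, 0], [0, 0, 1]]),
    "(b)": np.array([[s3, 0, 1], [-1, 0, s3], [0, 2, 0]]) / 2,
    "(c)": np.array([[0.0, 1, 0], [0, 0, 1], [1, 0, 0]]),
    "(d)": np.array([[1.0, 0, 0], [0, -1, 0], [0, 0, -1]]),
    "(e)": np.array([[1, s3, 0], [-s3, 1, 0], [0, 0, 2]]) / 2,
    "(f)": np.array([[1, 0, 1], [0, -s2, 0], [-1, 0, 1]]) / s2,
}
for label, Q in maps.items():
    assert np.allclose(Q.T @ Q, np.eye(3))            # T is an isometry
    print(label, "det =", round(np.linalg.det(Q)))    # +1 rotation, -1 reflection
```

The determinant test only classifies the type; finding the axis or fixed plane still requires the eigenvector computations the exercise asks for.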
    ‖cos(kx)‖² = ∫_{-π}^{π} cos²(kx) dx = π  for any k = 1, 2, 3, …
We leave the verifications to the reader, together with the task of showing that
these functions are orthogonal:
〈sin(kx), sin(mx)〉 = 0 = 〈cos(kx), cos(mx)〉 if k ≠ m
and
〈sin(kx), cos(mx)〉 = 0 for all k ≥ 0 and m ≥ 0
(Note that 1 = cos(0x), so the constant function 1 is included.)
Now define the following subspace of C[-π, π]:
Fn = span{1, sin x, cos x, sin(2x), cos(2x), …, sin(nx), cos(nx)}
The aim is to use the approximation theorem (Theorem 8 Section 10.2); so, given a function f in C[-π, π], define the Fourier coefficients of f by

    a0 = 〈f(x), 1〉 / ‖1‖² = (1/2π) ∫_{-π}^{π} f(x) dx

    ak = 〈f(x), cos(kx)〉 / ‖cos(kx)‖² = (1/π) ∫_{-π}^{π} f(x) cos(kx) dx    k = 1, 2, …

    bk = 〈f(x), sin(kx)〉 / ‖sin(kx)‖² = (1/π) ∫_{-π}^{π} f(x) sin(kx) dx    k = 1, 2, …
Then the approximation theorem (Theorem 8 Section 10.2) gives Theorem 1.
6 The name honours the French mathematician J.B.J. Fourier (1768–1830) who used these techniques in 1822 to investigate heat
conduction in solids.
SECTION 10.5 An Application to Fourier Approximation 507
Theorem 1
Let f be any continuous real-valued function defined on the interval [-π, π]. If a0, a1, a2, … and b1, b2, … are the Fourier coefficients of f, then given n ≥ 0,

    fn(x) = a0 + a1 cos x + b1 sin x + a2 cos(2x) + b2 sin(2x) + ⋯ + an cos(nx) + bn sin(nx)

is a function in Fn that is closest to f in the sense that

    ‖f - fn‖ ≤ ‖f - g‖

holds for all functions g in Fn.
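The coefficient formulas are easy to evaluate numerically. The sketch below is ours (not from the text): the helper `fourier_coeffs` and its trapezoid-rule integrator are our own, checked against Example 1's function f(x) = π - |x|, for which a0 = π/2, a1 = 4/π, a2 = 0, and every bk = 0.

```python
import math

def fourier_coeffs(f, n, m=20000):
    """Approximate a0 and ak, bk (k = 1..n) on [-pi, pi] by the trapezoid rule."""
    h = 2 * math.pi / m
    xs = [-math.pi + h * j for j in range(m + 1)]

    def integral(g):
        ys = [g(x) for x in xs]
        return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

    a0 = integral(f) / (2 * math.pi)
    a = [integral(lambda x, k=k: f(x) * math.cos(k * x)) / math.pi for k in range(1, n + 1)]
    b = [integral(lambda x, k=k: f(x) * math.sin(k * x)) / math.pi for k in range(1, n + 1)]
    return a0, a, b

# Check against the function of Example 1: f(x) = pi - |x|.
a0, a, b = fourier_coeffs(lambda x: math.pi - abs(x), 5)
assert abs(a0 - math.pi / 2) < 1e-5
assert abs(a[0] - 4 / math.pi) < 1e-5          # a1 = 4/pi
assert abs(a[1]) < 1e-5                        # a2 = 0 (k even)
assert all(abs(bk) < 1e-6 for bk in b)         # every sine coefficient vanishes
```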
EXAMPLE 1

Find the fifth Fourier approximation to the function f(x) defined on [-π, π] as follows:

    f(x) = { π + x  if -π ≤ x < 0
           { π - x  if 0 ≤ x ≤ π

Solution ► The graph of y = f(x) appears in the top diagram. The Fourier coefficients are computed as follows. The details of the integrations (usually by parts) are omitted.

    a0 = (1/2π) ∫_{-π}^{π} f(x) dx = π/2

    ak = (1/π) ∫_{-π}^{π} f(x) cos(kx) dx = (2/(πk²))[1 - cos(kπ)] = { 0         if k is even
                                                                     { 4/(πk²)   if k is odd

    bk = (1/π) ∫_{-π}^{π} f(x) sin(kx) dx = 0  for all k = 1, 2, …

Hence the fifth Fourier approximation is

    f5(x) = π/2 + (4/π)[cos x + (1/3²) cos(3x) + (1/5²) cos(5x)]

This is plotted in the middle diagram and is already a reasonable approximation to f(x). By comparison, f13(x) is also plotted in the bottom diagram.

[Margin figures: the graphs of y = f(x), y = f5(x), and y = f13(x) on [-π, π].]

We say that a function f is an even function if f(x) = f(-x) holds for all x; f is called an odd function if f(-x) = -f(x) holds for all x. Examples of even functions are constant functions, the even powers x², x⁴, …, and cos(kx); these functions are characterized by the fact that the graph of y = f(x) is symmetric about the y axis. Examples of odd functions are the odd powers x, x³, …, and sin(kx) where k > 0, and the graph of y = f(x) is symmetric about the origin if f is odd. The usefulness of these functions stems from the fact that

    ∫_{-π}^{π} f(x) dx = 0                     if f is odd
    ∫_{-π}^{π} f(x) dx = 2 ∫_{0}^{π} f(x) dx   if f is even
These facts often simplify the computations of the Fourier coefficients. For
example:
1. The Fourier sine coefficients bk all vanish if f is even.
2. The Fourier cosine coefficients ak all vanish if f is odd.
This is because f (x) sin(kx) is odd in the first case and f (x) cos(kx) is odd in the
second case.
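This parity argument can be seen numerically; the following sketch (ours, not from the text) checks that the sine coefficients of the even function f(x) = |x| all vanish, because each f(x) sin(kx) is odd and the symmetric quadrature nodes cancel in pairs.

```python
import math

def bk(f, k, m=4000):
    """Trapezoid-rule estimate of the Fourier sine coefficient b_k of f."""
    h = 2 * math.pi / m
    xs = [-math.pi + h * j for j in range(m + 1)]
    ys = [f(x) * math.sin(k * x) for x in xs]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1])) / math.pi

for k in range(1, 6):
    # f(x) = |x| is even, so f(x)sin(kx) is odd and b_k = 0 (to rounding error)
    assert abs(bk(abs, k)) < 1e-9
```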
The functions 1, cos(kx), and sin(kx) that occur in the Fourier approximation for
f (x) are all easy to generate as an electrical voltage (when x is time). By summing
these signals (with the amplitudes given by the Fourier coefficients), it is possible to
produce an electrical signal with (the approximation to) f (x) as the voltage. Hence
these Fourier approximations play a fundamental role in electronics.
Finally, the Fourier approximations f1, f2, … of a function f get better and better
as n increases. The reason is that the subspaces Fn increase:
    F1 ⊆ F2 ⊆ F3 ⊆ ⋯ ⊆ Fn ⊆ ⋯

So, because fn = proj_{Fn}(f), we get (see the discussion following Example 6 Section 10.2)

    ‖f - f1‖ ≥ ‖f - f2‖ ≥ ⋯ ≥ ‖f - fn‖ ≥ ⋯

These numbers ‖f - fn‖ approach zero; in fact, we have the following fundamental
theorem.7
Theorem 2

Let f be any continuous function in C[-π, π]. Then fn(x) approaches f(x) for every x in the interval (-π, π); that is, lim_{n→∞} fn(x) = f(x) whenever -π < x < π.⁸

It shows that f has a representation as an infinite series, called the Fourier series
of f:
f (x) = a0 + a1 cos x + b1 sin x + a2 cos(2x) + b2 sin(2x) +
whenever -π < x < π. A full discussion of Theorem 2 is beyond the scope of this
book. This subject had great historical impact on the development of mathematics,
and has become one of the standard tools in science and engineering.
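The improvement of the approximations can be watched directly. The sketch below is ours (not from the text): it computes ‖f - fn‖ for Example 1's function f(x) = π - |x| and confirms the distances decrease as n grows, as the inclusions F1 ⊆ F2 ⊆ ⋯ predict.

```python
import math

def f(x):
    return math.pi - abs(x)

def fn(x, n):
    # Fourier approximation from Example 1: only odd cosine terms survive
    return math.pi / 2 + (4 / math.pi) * sum(
        math.cos(k * x) / k ** 2 for k in range(1, n + 1, 2))

def dist(n, m=2000):
    # ||f - f_n|| via the trapezoid rule applied to the integral of (f - f_n)^2
    h = 2 * math.pi / m
    xs = [-math.pi + h * j for j in range(m + 1)]
    ys = [(f(x) - fn(x, n)) ** 2 for x in xs]
    return math.sqrt(h * (sum(ys) - 0.5 * (ys[0] + ys[-1])))

errs = [dist(n) for n in (1, 3, 5, 7, 9)]
assert all(e1 > e2 for e1, e2 in zip(errs, errs[1:]))  # strictly decreasing
```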
Thus the Fourier series for the function f in Example 1 is

    f(x) = π/2 + (4/π)[cos x + (1/3²) cos(3x) + (1/5²) cos(5x) + (1/7²) cos(7x) + ⋯]
7 See, for example, J. W. Brown and R. V. Churchill, Fourier Series and Boundary Value Problems, 7th ed., (New York: McGraw-Hill,
2008).
8 We have to be careful at the end points x = π or x = -π because sin(kπ) = sin(-kπ) and cos(kπ) = cos(-kπ).
EXAMPLE 2

Expand f(x) = x on the interval [-π, π] in a Fourier series, and so obtain a series expansion of π/4.

Solution ► Here f is an odd function so all the Fourier cosine coefficients ak are zero. As to the sine coefficients:

    bk = (1/π) ∫_{-π}^{π} x sin(kx) dx = (2/k)(-1)^{k+1}  for k ≥ 1

where we omit the details of the integration by parts. Hence the Fourier series for x is

    x = 2[sin x - (1/2) sin(2x) + (1/3) sin(3x) - (1/4) sin(4x) + ⋯]

for -π < x < π. In particular, taking x = π/2 gives an infinite series for π/4:

    π/4 = 1 - 1/3 + 1/5 - 1/7 + 1/9 - ⋯
Many other such formulas can be proved using Theorem 2.
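The series for π/4 converges slowly, which is easy to confirm; a quick check (ours, not from the text) using the standard alternating-series error bound:

```python
import math

def partial(n):
    """Partial sum of 1 - 1/3 + 1/5 - 1/7 + ... with n terms."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n))

# For an alternating series the error is at most the first omitted term, 1/(2n+1).
assert abs(partial(10000) - math.pi / 4) < 1 / 20001
```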
EXERCISES 10.5

1. In each case, find the Fourier approximation f5 of the given function in C[-π, π].
   (a) f(x) = π - x
   (b) f(x) = |x| = { x if 0 ≤ x ≤ π; -x if -π ≤ x < 0 }
   (c) f(x) = x²
   (d) f(x) = { 0 if -π ≤ x < 0; x if 0 ≤ x ≤ π }

2. (a) Find f5 for the even function f on [-π, π] satisfying f(x) = x for 0 ≤ x ≤ π.
   (b) Find f6 for the even function f on [-π, π] satisfying f(x) = sin x for 0 ≤ x ≤ π.
       [Hint: If k > 1, ∫ sin x cos(kx) dx = (1/2)[cos[(k - 1)x]/(k - 1) - cos[(k + 1)x]/(k + 1)].]

3. (a) Prove that ∫_{-π}^{π} f(x) dx = 0 if f is odd and that ∫_{-π}^{π} f(x) dx = 2 ∫_{0}^{π} f(x) dx if f is even.
   (b) Prove that (1/2)[f(x) + f(-x)] is even and that (1/2)[f(x) - f(-x)] is odd for any function f. Note that they sum to f(x).

4. Show that {1, cos x, cos(2x), cos(3x), …} is an orthogonal set in C[0, π] with respect to the inner product 〈f, g〉 = ∫_{0}^{π} f(x)g(x) dx.

5. (a) Show that π²/8 = 1 + 1/3² + 1/5² + ⋯ using Exercise 1(b).
   (b) Show that π²/12 = 1 - 1/2² + 1/3² - 1/4² + ⋯ using Exercise 1(c).
Chapter 11  Canonical Forms
Given a matrix A, the effect of a sequence of row-operations on A is to produce
UA where U is invertible. Under this “row-equivalence” operation the best that
can be achieved is the reduced row-echelon form for A. If column operations are
also allowed, the result is UAV where both U and V are invertible, and the best
outcome under this “equivalence” operation is called the Smith canonical form of
A (Theorem 3 Section 2.5). There are other kinds of operations on a matrix and,
in many cases, there is a “canonical” best possible result.
If A is square, the most important operation of this sort is arguably “similarity”
wherein A is carried to U -1AU where U is invertible. In this case we say that
matrices A and B are similar, and write A ∼ B, when B = U -1AU for some invertible
matrix U. Under similarity the canonical matrices, called Jordan canonical matrices,
are block triangular with upper triangular “Jordan” blocks on the main diagonal. In
this short chapter we are going to define these Jordan blocks and prove that every
matrix is similar to a Jordan canonical matrix.
Here is the key to the method. Let T : V → V be an operator on an
n-dimensional vector space V, and suppose that we can find an ordered basis B
of V so that the matrix MB(T ) is as simple as possible. Then, if B0 is any ordered
basis of V, the matrices MB(T ) and MB0(T ) are similar; that is,
MB(T ) = P-1MB0(T )P for some invertible matrix P.
Moreover, P = P_{B0←B} is easily computed from the bases B0 and B (Theorem 3
Section 9.2). This, combined with the invariant subspaces and direct sums studied in
Section 9.3, enables us to calculate the Jordan canonical form of any square matrix
A. Along the way we derive an explicit construction of an invertible matrix P such
that P-1AP is block triangular.
This technique is important in many ways. For example, if we want to diagonalize
an n × n matrix A, let TA : ℝⁿ → ℝⁿ be the operator given by TA(x) = Ax for all x in ℝⁿ, and look for a basis B of ℝⁿ such that MB(TA) is diagonal. If B0 = E is the standard basis of ℝⁿ, then ME(TA) = A, so
P-1AP = P-1ME(TA)P = MB(TA),
and we have diagonalized A. Thus the “algebraic” problem of finding an invertible
matrix P such that P-1AP is diagonal is converted into the “geometric” problem of
finding a basis B such that MB(TA) is diagonal. This change of perspective is one of
the most important techniques in linear algebra.
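This change of perspective can be sketched in code. The matrix A below is our own illustration (not from the text): the columns of P form a basis B of eigenvectors of TA, and then P⁻¹AP = MB(TA) is diagonal.

```python
import numpy as np

# Diagonalizing a sample 2x2 matrix via a basis of eigenvectors.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, P = np.linalg.eig(A)       # columns of P are eigenvectors of A
D = np.linalg.inv(P) @ A @ P        # the matrix of T_A in the eigenvector basis
assert np.allclose(D, np.diag(eigvals))
```

When no basis of eigenvectors exists, this fails, and the block triangular and Jordan forms of this chapter describe the best that can be achieved instead.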
SECTION 11.1 Block Triangular Form 511
Theorem 1

Let A be an n × n matrix with every eigenvalue real, and let cA(x) = (x - λ1)^{m1}(x - λ2)^{m2} ⋯ (x - λk)^{mk}, where λ1, λ2, …, λk are the distinct eigenvalues of A. Then an invertible matrix P exists such that P⁻¹AP = diag(U1, U2, …, Uk) is block diagonal, where, for each i, Ui is an mi × mi upper triangular matrix with every entry on the main diagonal equal to λi.
The proof is given at the end of this section. For now, we focus on a method for
finding the matrix P. The key concept is as follows.
Lemma 1

Let A be as in Theorem 1, and for each i write Gλi(A) = null[(λiI - A)^{mi}]. Then dim[Gλi(A)] = mi.

PROOF

Write Ai = (λiI - A)^{mi} for convenience and let P be as in Theorem 1. The spaces Gλi(A) = null(Ai) and null(P⁻¹AiP) are isomorphic via x ↔ P⁻¹x, so we show that dim[null(P⁻¹AiP)] = mi. Now P⁻¹AiP = (λiI - P⁻¹AP)^{mi}. If we use the block form in Theorem 1, this becomes

    P⁻¹AiP = [diag(λiI - U1, λiI - U2, …, λiI - Uk)]^{mi}
           = diag[(λiI - U1)^{mi}, (λiI - U2)^{mi}, …, (λiI - Uk)^{mi}]

Here λiI - Ui is an mi × mi upper triangular matrix with zeros on the main diagonal, so (λiI - Ui)^{mi} = 0; and if j ≠ i then λiI - Uj is invertible because its main diagonal entries λi - λj are all nonzero. Hence null(P⁻¹AiP) has dimension mi, as required.
Lemma 2

With the notation of Theorem 1, write P = [p11 ⋯ p1m1 p21 ⋯ p2m2 ⋯ pk1 ⋯ pkmk], where the columns pij of P are grouped according to the blocks Ui. Then Gλi(A) = span{pi1, pi2, …, pimi} for each i.

PROOF

It suffices by Lemma 1 to show that each pij is in Gλi(A). Write the matrix in Theorem 1 as P⁻¹AP = diag(U1, U2, …, Uk). Then

    AP = P diag(U1, U2, …, Uk)

Comparing columns gives, successively, that each Apij is a linear combination of pi1, …, pij in which pij has coefficient λi (because Ui is upper triangular with λi on the main diagonal). Hence (λiI - A)pij lies in span{pi1, …, pi,j-1}, so induction on j gives (λiI - A)^{mi} pij = 0; that is, pij is in Gλi(A).
Lemma 3

If Bi is any basis of Gλi(A) for each i, then B = B1 ∪ B2 ∪ ⋯ ∪ Bk is a basis of ℝⁿ.

PROOF

It suffices by Lemma 1 to show that B is independent (B contains m1 + ⋯ + mk = n vectors). If a linear combination from B vanishes, let xi be the sum of the terms from Bi. Then x1 + ⋯ + xk = 0. But xi = Σ_j rij pij by Lemma 2, so Σ_{i,j} rij pij = 0. The pij are the columns of the invertible matrix P, so they are independent. Hence each xi = 0, so each coefficient in xi is zero.
Triangulation Algorithm

Let A be an n × n matrix with every eigenvalue real, and let cA(x) = (x - λ1)^{m1}(x - λ2)^{m2} ⋯ (x - λk)^{mk} where the λi are distinct. For each i, find a basis of null(λiI - A), extend it to a basis of null[(λiI - A)²], and continue in this way to obtain a basis {pi1, pi2, …, pimi} of Gλi(A) = null[(λiI - A)^{mi}]. If P is the matrix with columns p11, …, p1m1, p21, …, pkmk in order, then P is invertible and P⁻¹AP = diag(U1, U2, …, Uk) as in Theorem 1.

PROOF

Lemma 3 guarantees that B = {p11, …, pkmk} is a basis of ℝⁿ, and Theorem 4 Section 9.2 shows that P⁻¹AP = MB(TA). Now Gλi(A) is TA-invariant for each i because

    (λiI - A)^{mi} x = 0 implies (λiI - A)^{mi}(Ax) = A(λiI - A)^{mi} x = 0

By Theorem 7 Section 9.3 (and induction), we have

    P⁻¹AP = MB(TA) = diag(U1, U2, …, Uk)

where Ui is the matrix of the restriction of TA to Gλi(A), and it remains to show that Ui has the desired upper triangular form. Given s, let pij be a basis vector in null[(λiI - A)^{s+1}]. Then (λiI - A)pij is in null[(λiI - A)^{s}], and therefore is a linear combination of the basis vectors pit coming before pij. Hence

    TA(pij) = Apij = λi pij - (λiI - A)pij

shows that the column of Ui corresponding to pij has λi on the main diagonal and zeros below the main diagonal. This is what we wanted.
EXAMPLE 1

If
         2  0  0  1
         0  2  0 -1
    A =  -1 1  2  0
         0  0  0  2

find P such that P⁻¹AP is block triangular.

Solution ► cA(x) = det[xI - A] = (x - 2)⁴, so λ1 = 2 is the only eigenvalue and we are in the case k = 1 of Theorem 1. Compute:

              0  0  0 -1               0  0  0  0
              0  0  0  1               0  0  0  0
    2I - A =  1 -1  0  0   (2I - A)² = 0  0  0 -2   (2I - A)³ = 0
              0  0  0  0               0  0  0  0

By gaussian elimination find a basis {p11, p12} of null(2I - A); then extend in any way to a basis {p11, p12, p13} of null[(2I - A)²]; and finally get a basis {p11, p12, p13, p14} of null[(2I - A)³] = ℝ⁴. One choice is

          1        0        0        0
    p11 = 1  p12 = 0  p13 = 1  p14 = 0
          0        1        0        0
          0        0        0        1

Hence

                              1 0 0 0                    2 0 0  1
                              1 0 1 0                    0 2 1  0
    P = [p11 p12 p13 p14] =   0 1 0 0    gives  P⁻¹AP =  0 0 2 -2
                              0 0 0 1                    0 0 0  2
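Example 1 can be verified numerically; a quick check (ours, not part of the text) with the matrices transcribed from the example:

```python
import numpy as np

A = np.array([[2, 0, 0, 1],
              [0, 2, 0, -1],
              [-1, 1, 2, 0],
              [0, 0, 0, 2]], dtype=float)
P = np.array([[1, 0, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
B = np.linalg.inv(P) @ A @ P
assert np.allclose(np.tril(B, -1), 0)   # upper triangular
assert np.allclose(np.diag(B), 2)       # lambda = 2 repeated on the diagonal
# (2I - A)^3 = 0, so null[(2I - A)^3] really is all of R^4
assert np.allclose(np.linalg.matrix_power(2 * np.eye(4) - A, 3), 0)
```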
EXAMPLE 2

If
         2  0  1  1
         3  5  4  1
    A = -4 -3 -3 -1
         1  0  1  2

find P such that P⁻¹AP is block triangular.

Solution ► The characteristic polynomial is

    cA(x) = det[xI - A] = (x - 1)(x - 2) · det [ x - 5    -4  ]
                                               [   3    x + 2 ]
          = (x - 1)²(x - 2)²

so the distinct eigenvalues are λ1 = 1 and λ2 = 2. For λ1 = 1, gaussian elimination gives null(I - A) = span{p11} and null[(I - A)²] = span{p11, p12} where

            1         0
    p11 =   1  p12 =  3
           -2        -4
            1         1

Since λ1 = 1 has multiplicity 2 as a root of cA(x), dim Gλ1(A) = 2 by Lemma 1. Since p11 and p12 both lie in Gλ1(A), we have Gλ1(A) = span{p11, p12}. Turning to λ2 = 2, we find that null(2I - A) = span{p21} and null[(2I - A)²] = span{p21, p22} where

            1          0
    p21 =   0   p22 = -4
           -1          3
            1          0

Again, dim Gλ2(A) = 2 as λ2 has multiplicity 2, so Gλ2(A) = span{p21, p22}. Hence

          1  0  1  0                    1 -3  0  0
          1  3  0 -4                    0  1  0  0
    P =  -2 -4 -1  3    gives  P⁻¹AP =  0  0  2  3
          1  1  1  0                    0  0  0  2
Theorem 2
Cayley-Hamilton Theorem
If A is a square matrix with every eigenvalue real, then cA(A) = 0.
PROOF
As in Theorem 1, write cA(x) = (x - λ1)^{m1} ⋯ (x - λk)^{mk} = ∏_{i=1}^{k} (x - λi)^{mi}, and write P⁻¹AP = D = diag(U1, …, Uk). Hence

    cA(Ui) = ∏_{j=1}^{k} (Ui - λj I_{mi})^{mj} = 0  for each i

because the factor (Ui - λi I_{mi})^{mi} = 0. In fact Ui - λi I_{mi} is mi × mi, upper triangular, and has zeros on the main diagonal, so its mi-th power is zero. But then
P-1cA(A)P = cA(D) = cA[diag(U1, …, Uk)]
= diag[cA(U1), …, cA(Uk)]
=0
It follows that cA(A) = 0.
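The Cayley-Hamilton theorem is easy to test numerically; the sketch below (ours, not from the text) substitutes the matrix A of Example 2 into its own characteristic polynomial.

```python
import numpy as np

A = np.array([[2.0, 0, 1, 1],
              [3, 5, 4, 1],
              [-4, -3, -3, -1],
              [1, 0, 1, 2]])
coeffs = np.poly(A)     # coefficients of cA(x), leading coefficient first
n = A.shape[0]
# Evaluate cA(A) = A^n + c1 A^(n-1) + ... + cn I
C = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
assert np.allclose(C, np.zeros((n, n)), atol=1e-8)
```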
Proof of Theorem 1
The proof of Theorem 1 requires the following simple fact about bases, the proof
of which we leave to the reader.
Lemma 4
If {v1, v2, …, vn} is a basis of a vector space V, so also is {v1 + sv2, v2, …, vn} for any
scalar s.
PROOF OF THEOREM 1
Let A be as in Theorem 1, and let T = TA : ℝⁿ → ℝⁿ be the matrix
transformation induced by A. For convenience, call a matrix a λ-m-ut matrix if it
is an m × m upper triangular matrix and every diagonal entry equals λ. Then we
must find a basis B of ℝⁿ such that MB(T ) = diag(U1, U2, …, Uk) where Ui is a
λi-mi-ut matrix for each i. We proceed by induction on n. If n = 1, take B = {v}
where v is any eigenvector of T.
If n > 1, let v1 be a λ1-eigenvector of T, and let B0 = {v1, w1, …, w_{n-1}} be any basis of ℝⁿ containing v1. Then (see Lemma 2 Section 5.5)

    MB0(T) = [ λ1  X  ]
             [ 0   A1 ]

in block form where A1 is (n - 1) × (n - 1). Moreover, A and MB0(T) are similar, so

    cA(x) = c_{MB0(T)}(x) = (x - λ1) c_{A1}(x)

Hence c_{A1}(x) = (x - λ1)^{m1-1}(x - λ2)^{m2} ⋯ (x - λk)^{mk}, so (by induction) let

    Q⁻¹A1Q = diag(Z1, U2, …, Uk)

where Z1 is a λ1-(m1 - 1)-ut matrix and Ui is a λi-mi-ut matrix for each i > 1. If

    P = [ 1  0 ]    then    P⁻¹MB0(T)P = [ λ1     XQ    ] = A′, say.
        [ 0  Q ]                         [ 0   Q⁻¹A1Q   ]

Hence A ∼ MB0(T) ∼ A′, so by Theorem 4(2) Section 9.2 there is a basis B1 of ℝⁿ such that MB1(TA) = A′, that is MB1(T) = A′. Hence MB1(T) takes the block form
    MB1(T) = [ λ1          XQ         ]  =  [ λ1  X1         Y          ]      (∗)
             [ 0   diag(Z1, U2, …, Uk)]     [ 0   Z1         0          ]
                                            [ 0   0   diag(U2, …, Uk)  ]

where the row matrix XQ = [X1  Y] is split into blocks matching Z1 and diag(U2, …, Uk). The upper left block here is a λ1-m1-ut matrix, so it remains to arrange that Y = 0. Denote by y1, …, y_{m2} the entries of Y in the columns corresponding to U2, and let w1, w2, …, w_{m2} be the corresponding vectors in the basis B1; thus (∗) gives T(w1) = y1v1 + λ2w1. We first replace w1 by w1′ = w1 + sv1 where s is to be determined. Then (∗) gives

    T(w1′) = T(w1) + sT(v1)
           = (y1v1 + λ2w1) + sλ1v1
           = y1v1 + λ2(w1′ - sv1) + sλ1v1
           = λ2w1′ + [y1 - s(λ2 - λ1)]v1

Since λ2 ≠ λ1, taking s = y1/(λ2 - λ1) gives T(w1′) = λ2w1′, and replacing w1 by w1′ leaves a basis by Lemma 4. Again, t can be chosen so that T(w2′) = u12w1′ + λ2w2′ where w2′ = w2 + tv1. Continue in this way to eliminate y1, …, y_{m2}. This procedure also works for λ3, λ4, … and so produces a new basis B such that MB(T) is as in (∗) but with Y = 0.
EXERCISES 11.1
2. Show that the following conditions are equivalent for a linear operator T on a finite dimensional space V.
   (1) MB(T) is upper triangular for some ordered basis B of V.
   (2) A basis {b1, …, bn} of V exists such that, for each i, T(bi) is a linear combination of b1, …, bi.
   (3) There exist T-invariant subspaces V1 ⊆ V2 ⊆ ⋯ ⊆ Vn = V such that dim Vi = i for each i.

3. If A is an n × n invertible matrix, show that A⁻¹ = r0I + r1A + ⋯ + r_{n-1}A^{n-1} for some scalars r0, r1, …, r_{n-1}. [Hint: Cayley-Hamilton theorem.]

4. If T : V → V is a linear operator where V is finite dimensional, show that cT(T) = 0. [Hint: Exercise 26 Section 9.1.]

5. Define T : P → P by T[p(x)] = xp(x). Show that:
   (a) T is linear and f(T)[p(x)] = f(x)p(x) for all polynomials f(x).
   (b) Conclude that f(T) ≠ 0 for all nonzero polynomials f(x). [See Exercise 4.]
column operations as well, then A → UAV = [ Ir 0 ; 0 0 ] for invertible U and V, and the canonical forms are the matrices [ Ir 0 ; 0 0 ] where r is the rank (this is the Smith normal form and is discussed in Theorem 3 Section 2.5). In this section, we discover the canonical forms for square matrices under similarity: A → P⁻¹AP.
If A is an n × n matrix with distinct real eigenvalues λ1, λ2, …, λk, we saw
in Theorem 1 Section 11.1 that A is similar to a block triangular matrix; more
precisely, an invertible matrix P exists such that
    P⁻¹AP = diag(U1, U2, …, Uk)      (∗)
where, for each i, Ui is upper triangular with λi repeated on the main diagonal. The
Jordan canonical form is a refinement of this theorem. The proof we gave of (∗) is
matrix theoretic because we wanted to give an algorithm for actually finding the
matrix P. However, we are going to employ abstract methods here. Consequently,
we reformulate Theorem 1 Section 11.1 as follows:
Theorem 1
Let T : V → V be a linear operator where dim V = n. Assume that λ1, λ2, …, λk are
the distinct eigenvalues of T, and that the λi are all real. Then there exists a basis F
of V such that MF (T ) = diag(U1, U2, …, Uk) where, for each i, Ui is square, upper
triangular, with λi repeated on the main diagonal.
SECTION 11.2 The Jordan Canonical Form 519
PROOF
Choose any basis B = {b1, b2, …, bn} of V and write A = MB(T ). Since A has
the same eigenvalues as T, Theorem 1 Section 11.1 shows that an invertible
matrix P exists such that P-1AP = diag(U1, U2, …, Uk) where the Ui are as in the
statement of the Theorem. If pj denotes column j of P and CB : V → ℝⁿ is the coordinate isomorphism, let fj = CB⁻¹(pj) for each j. Then F = {f1, f2, …, fn} is a
basis of V and CB(fj) = pj for each j. This means that PB←F = [CB(fj)] = [pj] = P,
and hence (by Theorem 2 Section 9.2) that PF←B = P-1. With this, column j of
MF (T ) is
CF (T(fj)) = PF←BCB(T(fj)) = P-1MB(T )CB(fj) = P-1Apj
for all j. Hence
MF (T ) = [CF (T(fj))] = [P-1Apj] = P-1A[pj] = P-1AP = diag(U1, U2, …, Uk)
as required.
Definition 11.2 If n ≥ 1, define the Jordan block Jn(λ) to be the n × n matrix with λs on the main
diagonal, 1s on the diagonal above, and 0s elsewhere. We take J1(λ) = [λ].
Hence

    J1(λ) = [λ],   J2(λ) = [ λ 1 ],   J3(λ) = [ λ 1 0 ],   J4(λ) = [ λ 1 0 0 ],  …
                           [ 0 λ ]            [ 0 λ 1 ]            [ 0 λ 1 0 ]
                                              [ 0 0 λ ]            [ 0 0 λ 1 ]
                                                                   [ 0 0 0 λ ]
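In code, a Jordan block is one line to build; the helper below (ours, not from the text) mirrors the definition directly.

```python
import numpy as np

def jordan_block(n, lam):
    """J_n(lam): lam down the main diagonal, 1s on the superdiagonal, 0s elsewhere."""
    return lam * np.eye(n) + np.eye(n, k=1)

assert np.array_equal(jordan_block(2, 7.0), np.array([[7.0, 1.0], [0.0, 7.0]]))
print(jordan_block(3, 5.0))
```

Note that `jordan_block(1, lam)` correctly returns the 1 × 1 matrix [lam], since the superdiagonal term vanishes.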
We are going to show that Theorem 1 holds with each block Ui replaced by Jordan
blocks corresponding to eigenvalues. It turns out that the whole thing hinges on the
case λ = 0. An operator T is called nilpotent if T m = 0 for some m ≥ 1, and in this
case λ = 0 for every eigenvalue λ of T. Moreover, the converse holds by Theorem 1
Section 11.1. Hence the following lemma is crucial.
Lemma 1

Let T : V → V be a nilpotent operator where dim V = n. Then V has a basis B such that MB(T) = diag(J1, J2, …, Jk) where each Ji is a Jordan block corresponding to the eigenvalue λ = 0.¹
1 The converse is true too: If MB(T ) has this form for some basis B of V, then T is nilpotent.
Theorem 2

Let T : V → V be a linear operator where dim V = n, and assume that the eigenvalues of T are all real. Then V has a basis B such that MB(T) = diag(J1, J2, …, Jm) where each Ji is a Jordan block corresponding to some eigenvalue of T.
PROOF
Let E = {e1, e2, …, en} be a basis of V as in Theorem 1, and assume that Ui is an ni × ni matrix for each i. Let

    E1 = {e1, …, e_{n1}},  E2 = {e_{n1+1}, …, e_{n1+n2}},  …,  Ek = {e_{n1+⋯+n_{k-1}+1}, …, en}

and define Vi = span{Ei} for each i. Because the matrix ME(T) = diag(U1, U2, …, Uk) is block diagonal, it follows that each Vi is T-invariant and MEi(T) = Ui for each i. Let Ui have λi repeated along the main diagonal, and consider the restriction T : Vi → Vi. Then MEi(T - λi I_{ni}) is a nilpotent matrix, and hence (T - λi I_{ni}) is a nilpotent operator on Vi. But then Lemma 1 shows that Vi has a basis Bi such that MBi(T - λi I_{ni}) = diag(K1, K2, …, K_{ti}) where each Kj is a Jordan block corresponding to λ = 0. Hence

    MBi(T) = MBi(λi I_{ni}) + MBi(T - λi I_{ni}) = λi I_{ni} + diag(K1, K2, …, K_{ti}) = diag(J1, J2, …, J_{ti})

where Jj = λi I_{fj} + Kj is a Jordan block corresponding to λi (here Kj is fj × fj). Finally, B = B1 ∪ B2 ∪ ⋯ ∪ Bk is a basis of V with respect to which T has the desired matrix.
Corollary 1

If A is an n × n matrix with every eigenvalue real, then an invertible matrix P exists such that P⁻¹AP = diag(J1, J2, …, Jm) where each Ji is a Jordan block.
PROOF
Apply Theorem 2 to the matrix transformation TA : ℝⁿ → ℝⁿ to find a basis B of ℝⁿ such that MB(TA) has the desired form. If P is the (invertible) n × n matrix
with the vectors of B as its columns, then P-1AP = MB(TA) by Theorem 4
Section 9.2.
Of course if we work over the field of complex numbers rather than ℝ, the
characteristic polynomial of a (complex) matrix A splits completely as a product
of linear factors. The proof of Theorem 2 goes through to give
Theorem 3

If A is an n × n complex matrix, then an invertible complex matrix P exists such that P⁻¹AP = diag(J1, J2, …, Jm) where each Ji is a (complex) Jordan block.
Proof of Lemma 1
Lemma 1

Let T : V → V be a nilpotent operator where dim V = n. Then V has a basis B such that MB(T) = diag(J1, J2, …, Jk) where each Ji is a Jordan block corresponding to λ = 0.
PROOF
The proof proceeds by induction on n. If n = 1, then T is a scalar operator, and so T = 0 and the lemma holds. If n > 1, we may assume that T ≠ 0, so m ≥ 1 and we may assume that m is chosen such that T^m = 0, but T^{m-1} ≠ 0. Suppose T^{m-1}u ≠ 0 for some u in V.³

Claim. {u, Tu, T²u, …, T^{m-1}u} is independent.

Proof. Suppose a0u + a1Tu + a2T²u + ⋯ + a_{m-1}T^{m-1}u = 0 where each ai is in ℝ. Since T^m = 0, applying T^{m-1} gives 0 = T^{m-1}0 = a0T^{m-1}u, whence a0 = 0.
2 This was first proved in 1870 by the French mathematician Camille Jordan (1838–1922) in his monumental Traité des substitutions
et des équations algébriques.
3 If S : V→V is an operator, we abbreviate S(u) by Su for simplicity.
EXERCISES 11.2
1. By direct computation, show that there is no invertible complex matrix C such that

        [ 1 1 0 ]       [ 1 1 0 ]
    C⁻¹ [ 0 1 1 ] C  =  [ 0 1 0 ]
        [ 0 0 1 ]       [ 0 0 1 ]

2. Show that
       [ a 1 0 ]                 [ b 0 0 ]
       [ 0 a 0 ]  is similar to  [ 0 a 1 ]
       [ 0 0 b ]                 [ 0 0 a ]

3. (a) Show that every complex matrix is similar to its transpose.
   (b) Show every real matrix is similar to its transpose. [Hint: Show that Jk(0)Q = Q[Jk(0)]^T where Q is the k × k matrix with 1s down the "counter diagonal", that is from the (1, k)-position to the (k, 1)-position.]
The fact that the square of every real number is nonnegative shows that the
equation x2 + 1 = 0 has no real root; in other words, there is no real number u such
that u2 = -1. So the set of real numbers is inadequate for finding all roots of all
polynomials. This kind of problem arises with other number systems as well. The
set of integers contains no solution of the equation 3x + 2 = 0, and the rational
numbers had to be invented to solve such equations. But the set of rational numbers
is also incomplete because, for example, it contains no root of the polynomial
x2 - 2. Hence the real numbers were invented. In the same way, the set of complex
numbers was invented, which contains all real numbers together with a root of the
equation x2 + 1 = 0. However, the process ends here: the complex numbers have
the property that every polynomial with complex coefficients has a (complex) root.
This fact is known as the fundamental theorem of algebra.
One pleasant aspect of the complex numbers is that, whereas describing the real
numbers in terms of the rationals is a rather complicated business, the complex
numbers are quite easy to describe in terms of real numbers. Every complex
number has the form
a + bi
where a and b are real numbers, and i is a root of the polynomial x2 + 1. Here a
and b are called the real part and the imaginary part of the complex number,
respectively. The real numbers are now regarded as special complex numbers of
the form a + 0i = a, with zero imaginary part. The complex numbers of the form
0 + bi = bi with zero real part are called pure imaginary numbers. The complex
number i itself is called the imaginary unit and is distinguished by the fact that
i2 = -1
As the terms complex and imaginary suggest, these numbers met with some resistance
when they were first used. This has changed; now they are essential in science
and engineering as well as mathematics, and they are used extensively. The names
persist, however, and continue to be a bit misleading: These numbers are no more
“complex” than the real numbers, and the number i is no more “imaginary” than -1.
Much as for polynomials, two complex numbers are declared to be equal if and
only if they have the same real parts and the same imaginary parts. In symbols,
a + bi = a′ + b′i if and only if a = a′ and b = b′
The addition and subtraction of complex numbers is accomplished by adding and
subtracting real and imaginary parts:
524 Appendix A: Complex Numbers
EXAMPLE 1
If z = 2 - 3i and w = -1 + i, write each of the following in the form a + bi: z + w, z - w, zw, (1/3)z, and z².
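Python's built-in complex type can carry out Example 1's arithmetic directly (a check of ours, not part of the text; Python writes the imaginary unit as j rather than i).

```python
# Example 1's arithmetic with Python's built-in complex numbers.
z, w = 2 - 3j, -1 + 1j
print(z + w)       # (1-2j)
print(z - w)       # (3-4j)
print(z * w)       # (1+5j)
print(z / 3)       # one third of z
print(z ** 2)      # (-5-12j)
```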
EXAMPLE 2
Find all complex numbers z such that z² = i.
In actual calculations, the work is facilitated by two useful notions: the conjugate
and the absolute value of a complex number. The next example illustrates the
technique.
EXAMPLE 3

Write (3 + 2i)/(2 + 5i) in the form a + bi.

Solution ► Multiply top and bottom by the complex number 2 - 5i (obtained from the denominator by negating the imaginary part). The result is

    (3 + 2i)/(2 + 5i) = (2 - 5i)(3 + 2i) / [(2 - 5i)(2 + 5i)]
                      = [(6 + 10) + (4 - 15)i] / [2² - (5i)²]
                      = 16/29 - (11/29)i

Hence the simplified form is 16/29 - (11/29)i, as required.
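A quick check of Example 3 (ours, not part of the text), both with Python's built-in complex division and by repeating the multiply-by-the-conjugate technique by hand:

```python
# Built-in complex division agrees with the hand computation in Example 3.
q = (3 + 2j) / (2 + 5j)
assert abs(q - (16 / 29 - (11 / 29) * 1j)) < 1e-12

# The same technique by hand: multiply top and bottom by the conjugate.
num = (3 + 2j) * (2 + 5j).conjugate()
den = ((2 + 5j) * (2 + 5j).conjugate()).real   # = 29, a real number
assert abs(num / den - q) < 1e-12
```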
The key to this technique is that the product (2 - 5i)(2 + 5i) = 29 in the denominator turned out to be a real number. The situation in general leads to the following notation: If z = a + bi is a complex number, the conjugate of z is the complex number, denoted z̄, given by

    z̄ = a - bi  where z = a + bi

Hence z̄ is obtained from z by negating the imaginary part. For example, (2 + 3i)‾ = 2 - 3i and (1 - i)‾ = 1 + i. If we multiply z = a + bi by z̄, we obtain

    z z̄ = a² + b²  where z = a + bi

The real number a² + b² is always nonnegative, so we can state the following definition: The absolute value or modulus of a complex number z = a + bi, denoted by |z|, is the positive square root √(a² + b²); that is,

    |z| = √(a² + b²)  where z = a + bi

For example, |2 - 3i| = √(2² + (-3)²) = √13 and |1 + i| = √(1² + 1²) = √2.

Note that if a real number a is viewed as the complex number a + 0i, its absolute value (as a complex number) is |a| = √(a²), which agrees with its absolute value as a real number.

With these notions in hand, we can describe the technique applied in Example 3 as follows: When converting a quotient z/w of complex numbers to the form a + bi, multiply top and bottom by the conjugate w̄ of the denominator.
The following list contains the most important properties of conjugates and absolute values. Throughout, z and w denote complex numbers.

    C1. (z ± w)‾ = z̄ ± w̄                     C7.  1/z = (1/|z|²) z̄ for z ≠ 0
    C2. (zw)‾ = z̄ w̄                          C8.  |z| ≥ 0 for all complex numbers z
    C3. (z/w)‾ = z̄ / w̄                       C9.  |z| = 0 if and only if z = 0
    C4. (z̄)‾ = z                             C10. |zw| = |z||w|
    C5. z is real if and only if z̄ = z       C11. |z/w| = |z|/|w|
    C6. z z̄ = |z|²                           C12. |z + w| ≤ |z| + |w|  (triangle inequality)

All these properties (except property C12) can (and should) be verified by the reader for arbitrary complex numbers z = a + bi and w = c + di. They are not independent; for example, property C10 follows from properties C2 and C6.
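Such identities can also be spot-checked on random complex numbers; the following sketch is ours (not from the text) and exercises a few of the properties with Python's complex type.

```python
import cmath
import random

random.seed(1)
for _ in range(100):
    z = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    w = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    assert (z + w).conjugate() == z.conjugate() + w.conjugate()              # C1
    assert cmath.isclose((z * w).conjugate(), z.conjugate() * w.conjugate()) # C2
    assert cmath.isclose(z * z.conjugate(), abs(z) ** 2)                     # C6
    assert abs(abs(z * w) - abs(z) * abs(w)) < 1e-9                          # C10
    assert abs(z + w) <= abs(z) + abs(w) + 1e-12                             # C12
```

Random testing is, of course, no substitute for the algebraic verification the text asks the reader to carry out.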
The triangle inequality, as its name suggests, comes from a geometric representation of the complex numbers analogous to the identification of the real numbers with the points of a line.
Polar Form
The geometric description of what happens when two complex numbers are multiplied is at least as elegant as the parallelogram law of addition, but it requires that the complex numbers be represented in polar form. Before discussing this, we pause to recall the general definition of the trigonometric functions sine and cosine. An angle θ in the complex plane is in standard position if it is measured counterclockwise from the positive real axis as indicated in Figure A.4. Rather than using degrees to measure angles, it is more natural to use radian measure. This is defined as follows: The circle with its centre at the origin and radius 1 (called the unit circle) is drawn in Figure A.4. It has circumference 2π, and the radian measure of θ is the length of the arc on the unit circle counterclockwise from 1 to the point P on the unit circle determined by θ. Hence 90° = π/2, 45° = π/4, 180° = π, and a full circle has the angle 360° = 2π. Angles measured clockwise from 1 are negative; for example, -i corresponds to -π/2 (or to 3π/2).

[Figure A.4: the unit circle, with an angle θ in standard position determining a point P, and the points 1, i, -1, -i marked on the circle.]

Consider an angle θ in the range 0 ≤ θ ≤ π/2. If θ is plotted in standard position as in Figure A.4, it determines a unique point P on the unit circle, and P has coordinates (cos θ, sin θ) by elementary trigonometry. However, any angle θ (acute or not) determines a unique point on the unit circle, so we define the cosine and sine of θ (written cos θ and sin θ) to be the x and y coordinates of this point. For example, the points

    1 = (1, 0)   i = (0, 1)   -1 = (-1, 0)   -i = (0, -1)

plotted in Figure A.4 are determined by the angles 0, π/2, π, 3π/2, respectively. Hence

    cos 0 = 1    cos(π/2) = 0    cos π = -1    cos(3π/2) = 0
    sin 0 = 0    sin(π/2) = 1    sin π = 0     sin(3π/2) = -1
Now we can describe the polar form of a complex number. Let z = a + bi be a complex number, and write the absolute value of z as

    r = |z| = √(a² + b²)

If z ≠ 0, the angle θ shown in Figure A.5 is called an argument of z and is denoted

    θ = arg z

[Figure A.5: the point z = (a, b) in the complex plane, with r = |z| and the angle θ between the positive x axis and the line from 0 to z.]

This angle is not unique (θ + 2πk would do as well for any k = 0, ±1, ±2, …). However, there is only one argument θ in the range -π < θ ≤ π, and this is sometimes called the principal argument of z. Returning to Figure A.5, we find that the real and imaginary parts a and b of z are related to r and θ by

    a = r cos θ
    b = r sin θ

Hence the complex number z = a + bi has the form

    z = r(cos θ + i sin θ)    r = |z|, θ = arg(z)

The combination cos θ + i sin θ is so important that a special notation is used:

    e^{iθ} = cos θ + i sin θ

is called Euler's formula after the great Swiss mathematician Leonhard Euler (1707–1783). With this notation, z is written

    z = re^{iθ}    r = |z|, θ = arg(z)

This is a polar form of the complex number z. Of course it is not unique, because the argument can be changed by adding a multiple of 2π.
EXAMPLE 4

Write z1 = -2 + 2i and z2 = -i in polar form.

Solution ► The two numbers are plotted in the complex plane in Figure A.6. The absolute values are

    r1 = |-2 + 2i| = √((-2)² + 2²) = 2√2
    r2 = |-i| = √(0² + (-1)²) = 1

By inspection of Figure A.6, arguments of z1 and z2 are

    θ1 = arg(-2 + 2i) = 3π/4
    θ2 = arg(-i) = 3π/2

The corresponding polar forms are z1 = -2 + 2i = 2√2 e^{3πi/4} and z2 = -i = e^{3πi/2}. Of course, we could have taken the argument -π/2 for z2 and obtained the polar form z2 = e^{-πi/2}.
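Example 4's conversions can be checked with the standard-library cmath module (a sketch of ours, not part of the text); note that cmath.polar always returns the principal argument in (-π, π], so it reports -π/2 rather than 3π/2 for -i.

```python
import cmath
import math

r1, t1 = cmath.polar(-2 + 2j)            # (r, principal argument)
assert math.isclose(r1, 2 * math.sqrt(2))
assert math.isclose(t1, 3 * math.pi / 4)

r2, t2 = cmath.polar(-1j)
assert math.isclose(r2, 1.0)
assert math.isclose(t2, -math.pi / 2)    # cmath picks -pi/2, not 3pi/2
assert cmath.isclose(cmath.rect(r2, t2), -1j)   # converting back recovers z
```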
In Euler’s formula e^(iθ) = cos θ + i sin θ, the number e is the familiar constant
e = 2.71828… from calculus. The reason for using e will not be given here; the
reason why cos θ + i sin θ is written as an exponential function of θ is that the
law of exponents holds:
e^(iθ) · e^(iϕ) = e^(i(θ+ϕ))
where θ and ϕ are any two angles. In fact, this is an immediate consequence of
the addition identities for sin(θ + ϕ) and cos(θ + ϕ):
e^(iθ)e^(iϕ) = (cos θ + i sin θ)(cos ϕ + i sin ϕ)
= (cos θ cos ϕ - sin θ sin ϕ) + i(cos θ sin ϕ + sin θ cos ϕ)
= cos(θ + ϕ) + i sin(θ + ϕ)
= e^(i(θ+ϕ))
This is analogous to the rule e^a e^b = e^(a+b), which holds for real numbers a and b, so it
is not unnatural to use the exponential notation e^(iθ) for the expression cos θ + i sin θ.
In fact, a whole theory exists wherein functions such as ez, sin z, and cos z are
studied, where z is a complex variable. Many deep and beautiful theorems can be
proved in this theory, one of which is the so-called fundamental theorem of algebra
mentioned later (Theorem 4). We shall not pursue this here.
The geometric description of the multiplication of two complex numbers follows
from the law of exponents.
Theorem 1
Multiplication Rule
If z1 = r1e^(iθ1) and z2 = r2e^(iθ2) are complex numbers in polar form, then
z1z2 = r1r2e^(i(θ1+θ2))
In other words, to multiply two complex numbers, simply multiply the absolute
values and add the arguments. This simplifies calculations considerably, particularly
when we observe that it is valid for any arguments θ1 and θ2.
EXAMPLE 5
Multiply (1 - i)(1 + √3 i) in two ways.

Solution ► We have |1 - i| = √2 and |1 + √3 i| = 2 so, from Figure A.7,
1 - i = √2 e^(-iπ/4)
1 + √3 i = 2e^(iπ/3)
Hence, by the multiplication rule,
(1 - i)(1 + √3 i) = (√2 e^(-iπ/4))(2e^(iπ/3))
= 2√2 e^(i(-π/4+π/3))
= 2√2 e^(iπ/12)

FIGURE A.7

This gives the required product in polar form. Of course, direct multiplication
gives (1 - i)(1 + √3 i) = (√3 + 1) + (√3 - 1)i. Hence, equating real and
imaginary parts gives the formulas cos(π/12) = (√3 + 1)/(2√2) and
sin(π/12) = (√3 - 1)/(2√2).
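The two half-angle formulas produced in Example 5 are easy to confirm numerically; this added sketch (not part of the text) uses Python's standard `math` module.

```python
import math

# From (1 - i)(1 + sqrt(3) i) = 2*sqrt(2) e^{i pi/12}, equating parts gives:
cos12 = (math.sqrt(3) + 1) / (2 * math.sqrt(2))
sin12 = (math.sqrt(3) - 1) / (2 * math.sqrt(2))

print(abs(math.cos(math.pi / 12) - cos12) < 1e-12)
print(abs(math.sin(math.pi / 12) - sin12) < 1e-12)
```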
Roots of Unity
If a complex number z = re^(iθ) is given in polar form, the powers assume a particularly
simple form. In fact, z² = (re^(iθ))(re^(iθ)) = r²e^(2iθ), z³ = z²z = (r²e^(2iθ))(re^(iθ)) = r³e^(3iθ),
and so on. Continuing in this way, it follows by induction that the following theorem
holds for any positive integer n. The name honours Abraham De Moivre (1667–1754).
Theorem 2
De Moivre’s Theorem
If θ is any angle, then (e^(iθ))^n = e^(inθ) holds for all integers n.
PROOF
The case n > 0 has been discussed, and the reader can verify the result for n = 0.
To derive it for n < 0, first observe that
if z = re^(iθ) ≠ 0 then z⁻¹ = (1/r)e^(-iθ)
In fact, (re^(iθ))((1/r)e^(-iθ)) = 1e^(i0) = 1 by the multiplication rule. Now assume that n is
negative and write it as n = -m, m > 0. Then
(re^(iθ))^n = [(re^(iθ))⁻¹]^m = ((1/r)e^(-iθ))^m = r^(-m)e^(i(-mθ)) = r^n e^(inθ)
If r = 1, this is De Moivre’s theorem for negative n.
EXAMPLE 6
Verify that (-1 + √3 i)³ = 8.

Solution ► We have |-1 + √3 i| = √(1 + 3) = 2, so -1 + √3 i = 2e^(2πi/3)
(see Figure A.8). Hence De Moivre’s theorem gives
(-1 + √3 i)³ = (2e^(2πi/3))³ = 2³e^(3(2πi/3)) = 8e^(2πi) = 8

FIGURE A.8

De Moivre’s theorem can be used to find nth roots of complex numbers where n
is positive. The next example illustrates this technique.
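A quick numerical companion to Example 6, added here as an illustration (floating-point check, not a proof):

```python
import math

# z = -1 + sqrt(3) i has |z| = 2 and arg z = 2*pi/3, so De Moivre's
# theorem gives z^3 = 2^3 e^{2 pi i} = 8.
z = complex(-1, math.sqrt(3))
print(z ** 3)        # approximately 8 + 0i, up to rounding error
```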
EXAMPLE 7
Find the cube roots of unity; that is, find all complex numbers z such that z³ = 1.

Solution ► First write z = re^(iθ) and 1 = 1e^(i0) in polar form. We must use the
condition z³ = 1 to determine r and θ. Because z³ = r³e^(3iθ) by De Moivre’s
theorem, this requirement becomes
r³e^(3iθ) = 1e^(0i)
These two complex numbers are equal, so their absolute values must be equal
and the arguments must either be equal or differ by an integral multiple of 2π:
r³ = 1 and 3θ = 0 + 2kπ, k some integer
Because r is a positive real number, this gives r = 1 and θ = 2kπ/3. Taking
k = 0, 1, 2 yields the three distinct cube roots 1, e^(2πi/3), and e^(4πi/3); every other
choice of k repeats one of these arguments up to a multiple of 2π.
The same type of calculation gives all complex nth roots of unity; that is, all
complex numbers z such that z^n = 1. As before, write 1 = 1e^(0i) and
z = re^(iθ)
in polar form. Then z^n = 1 takes the form
r^n e^(nθi) = 1e^(0i)
using De Moivre’s theorem. Comparing absolute values and arguments yields
r^n = 1
nθ = 0 + 2kπ, k some integer
Hence r = 1, and the n values
θ = 2kπ/n, k = 0, 1, 2, …, n - 1
of θ all lie in the range 0 ≤ θ < 2π. As in Example 7, every choice of k yields a value
of θ that differs from one of these by a multiple of 2π, so these give the arguments
of all the possible roots.
Theorem 3
nth Roots of Unity
If n ≥ 1 is an integer, the nth roots of unity (the solutions of z^n = 1) are given by
z = e^(2kπi/n), k = 0, 1, 2, …, n - 1
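The roots of unity can be generated and checked numerically; `roots_of_unity` below is a helper name introduced only for this added sketch, not from the text.

```python
import cmath

def roots_of_unity(n):
    # the n numbers e^{2k pi i/n}, k = 0, 1, ..., n - 1
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

cube_roots = roots_of_unity(3)
# each root satisfies z^3 = 1 up to floating-point error
print(all(abs(z ** 3 - 1) < 1e-9 for z in cube_roots))
```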
EXAMPLE 8
Find the fourth roots of √2 + √2 i.

Solution ► First write √2 + √2 i = 2e^(πi/4) in polar form. If z = re^(iθ) satisfies
z⁴ = √2 + √2 i, then De Moivre’s theorem gives
r⁴e^(i(4θ)) = 2e^(πi/4)
Hence r⁴ = 2 and 4θ = π/4 + 2kπ, k an integer. We obtain four distinct roots
(and hence all) by
r = 2^(1/4), θ = π/16 + kπ/2, k = 0, 1, 2, 3
Thus the four roots are
2^(1/4)e^(πi/16)  2^(1/4)e^(9πi/16)  2^(1/4)e^(17πi/16)  2^(1/4)e^(25πi/16)
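Example 8 can likewise be confirmed with floating-point arithmetic; this is an added sketch, not part of the text.

```python
import cmath

# Fourth roots of w = sqrt(2) + sqrt(2) i = 2 e^{pi i/4}:
# r = 2**(1/4) and theta = pi/16 + k*pi/2 for k = 0, 1, 2, 3.
w = cmath.rect(2, cmath.pi / 4)
roots = [cmath.rect(2 ** 0.25, cmath.pi / 16 + k * cmath.pi / 2)
         for k in range(4)]
print(all(abs(z ** 4 - w) < 1e-9 for z in roots))
```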
EXAMPLE 9
Find a real irreducible quadratic with u = 3 - 4i as a root.

Solution ► We have u + ū = 6 and |u|² = 25, so x² - 6x + 25 is irreducible with
u and ū = 3 + 4i as roots.
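The construction in Example 9 is mechanical enough to script; the names `b` and `c` below are introduced only for this added sketch.

```python
# For u = 3 - 4i, the quadratic x^2 - (u + u_bar) x + |u|^2 has real
# coefficients and u as a root.
u = 3 - 4j
b = (u + u.conjugate()).real        # u + u_bar = 6
c = (u * u.conjugate()).real        # |u|^2 = 25

print(b, c)                          # coefficients give x^2 - 6x + 25
print(u * u - b * u + c)             # 0j: u really is a root
```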
Theorem 4
Fundamental Theorem of Algebra
Every polynomial of positive degree with complex coefficients has a complex root.
If f (x) is a polynomial with complex coefficients, and if u1 is a root, then the factor
theorem (Section 6.5) asserts that
f (x) = (x - u1)g(x)
where g(x) is a polynomial with complex coefficients and with degree one less
than the degree of f (x). Suppose that u2 is a root of g(x), again by the fundamental
theorem. Then g(x) = (x - u2)h(x), so
f (x) = (x - u1)(x - u2)h(x)
This process continues until the last polynomial to appear is linear. Thus f (x) has
been expressed as a product of linear factors. The last of these factors can be written
in the form u(x - un), where u and un are complex (verify this), so the fundamental
theorem takes the following form.
Theorem 5
Every polynomial f(x) of positive degree n with complex coefficients can be written as a
product f(x) = u(x - u1)(x - u2)⋯(x - un) of linear factors, where u, u1, u2, …, un are
complex numbers and u ≠ 0. The numbers u1, u2, …, un are the roots of f(x).
This form of the fundamental theorem, when applied to a polynomial f (x) with real
coefficients, can be used to deduce the following result.
Theorem 6
Every polynomial f (x) of positive degree with real coefficients can be factored as a
product of linear and irreducible quadratic factors.
EXERCISES A
1. Solve each of the following for the real number x.
(a) x - 4i = (2 - i)²
(b) (2 + xi)(3 - 2i) = 12 + 5i
(c) (2 + xi)² = 4
(d) (2 + xi)(2 - xi) = 5

2. Convert each of the following to the form a + bi.
(a) (2 - 3i) - 2(2 - 3i) + 9
(b) (3 - 2i)(1 + i) + |3 + 4i|
(c) (1 + i)/(2 - 3i) + (1 - i)/(-2 + 3i)
(d) (3 - 2i)/(1 - i) + (3 - 7i)/(2 - 3i)
(e) i¹³¹
(f) (2 - i)³
(g) (1 + i)⁴
(h) (1 - i)²(2 + i)²
(i) (3√3 - i)/(√3 + i) + (√3 + 7i)/(√3 - i)

3. In each case, find the complex number z.
(a) iz - (1 + i)² = 3 - i
(b) (i + z) - 3i(2 - z) = iz + 1
(c) z² = -i
(d) z² = 3 - 4i
(e) z(1 + i) = z̄ + (3 + 2i)
(f) z(2 - i) = (z̄ + 1)(1 + i)
4. In each case, find the roots of the real quadratic equation.
(a) x² - 2x + 3 = 0
(b) x² - x + 1 = 0
(c) 3x² - 4x + 2 = 0
(d) 2x² - 5x + 2 = 0

5. Find all numbers x in each case.
(a) x³ = 8
(b) x³ = -8
(c) x⁴ = 16
(d) x⁴ = 64

6. In each case, find a real quadratic with u as a root, and find the other root.
(a) u = 1 + i
(b) u = 2 - 3i
(c) u = -i
(d) u = 3 - 4i

7. Find the roots of x² - 2 cos θ x + 1 = 0, θ any angle.

8. Find a real polynomial of degree 4 with 2 - i and 3 - 2i as roots.

9. Let re z and im z denote, respectively, the real and imaginary parts of z. Show that:
(a) im(iz) = re z
(b) re(iz) = -im z
(c) z + z̄ = 2 re z
(d) z - z̄ = 2i im z
(e) re(z + w) = re z + re w, and re(tz) = t · re z if t is real
(f) im(z + w) = im z + im w, and im(tz) = t · im z if t is real

10. In each case, show that u is a root of the quadratic equation, and find the other root.
(a) x² - 3ix + (-3 + i) = 0; u = 1 + i
(b) x² + ix - (4 - 2i) = 0; u = -2
(c) x² - (3 - 2i)x + (5 - i) = 0; u = 2 - 3i
(d) x² + 3(1 - i)x - 5i = 0; u = -2 + i

11. Find the roots of each of the following complex quadratic equations.
(a) x² + 2x + (1 + i) = 0
(b) x² - x + (1 - i) = 0
(c) x² - (2 - i)x + (3 - i) = 0
(d) x² - 3(1 - i)x - 5i = 0

12. In each case, describe the graph of the equation (where z denotes a complex number).
(a) |z| = 1
(b) |z - 1| = 2
(c) z̄ = iz
(d) z̄ = -z
(e) z = |z|
(f) im z = m · re z, m a real number

13. (a) Verify |zw| = |z||w| directly for z = a + bi and w = c + di.
(b) Deduce (a) from properties C2 and C6.

14. Prove that |z + w|² = |z|² + |w|² + wz̄ + w̄z for all complex numbers w and z.

15. If z̄w is real and z ≠ 0, show that w = az for some real number a.

16. If zw = z̄v and z ≠ 0, show that w = uv for some u in ℂ with |u| = 1.

17. Use property C5 to show that (1 + i)^n + (1 - i)^n is real for all n.

18. Express each of the following in polar form (use the principal argument).
(a) 3 - 3i
(b) -4i
(c) -√3 + i
(d) -4 + 4√3 i
(e) -7i
(f) -6 + 6i

19. Express each of the following in the form a + bi.
(a) 3e^(πi)
(b) e^(7πi/3)
(c) 2e^(3πi/4)
(d) √2 e^(-πi/4)
(e) e^(5πi/4)
(f) 2√3 e^(-2πi/6)

20. Express each of the following in the form a + bi.
(a) (-1 + √3 i)²
(b) (1 + √3 i)⁻⁴
(c) (1 + i)⁸
(d) (1 - i)¹⁰
(e) (1 - i)⁶(√3 + i)³
(f) (√3 - i)⁹(2 - 2i)⁵

21. Use De Moivre’s theorem to show that:
(a) cos 2θ = cos²θ - sin²θ; sin 2θ = 2 cos θ sin θ
(b) cos 3θ = cos³θ - 3 cos θ sin²θ; sin 3θ = 3 cos²θ sin θ - sin³θ

22. (a) Find the fourth roots of unity.
(b) Find the sixth roots of unity.

23. Find all complex numbers z such that:
(a) z⁴ = -1
(b) z⁴ = 2(√3 i - 1)

(c) If |w| = 1, show that the sum of the roots of z^n = w is zero.

27. If z^n is real, n ≥ 1, show that (z̄)^n is real.

28. If z̄² = z², show that z is real or pure imaginary.
Appendix B:
Proofs

Logic plays a basic role in human affairs. Scientists use logic to draw conclusions
from experiments, judges use it to deduce consequences of the law, and
mathematicians use it to prove theorems. Logic arises in ordinary speech with
assertions such as “If John studies hard, he will pass the course,” or “If an integer n
is divisible by 6, then n is divisible by 3.” 1 In each case, the aim is to assert that if a
certain statement is true, then another statement must also be true. In fact, if p and
q denote statements, most theorems take the form of an implication: “If p is true,
then q is true.” We write this in symbols as
p ⇒ q
and read it as “p implies q.” Here p is the hypothesis and q the conclusion of
the implication. The verification that p ⇒ q is valid is called the proof of the
implication. In this section we examine the most common methods of proof 2
and illustrate each technique with some examples.
EXAMPLE 1
If n is an odd integer, show that n² is odd.

Solution ► If n is odd, then n = 2k + 1 for some integer k. Hence
n² = 4k² + 4k + 1 = 2(2k² + 2k) + 1 is also odd.
1 By an integer we mean a “whole number”; that is, a number in the set 0, ±1, ±2, ±3, … .
2 For a more detailed look at proof techniques see D. Solow, How to Read and Do Proofs, 2nd ed. (New York: Wiley, 1990);
or J. F. Lucas, Introduction to Abstract Mathematics, Chapter 2 (Belmont, CA: Wadsworth, 1986).
Appendix B: Proofs 537
EXAMPLE 2
In a right triangle, show that the sum of the two acute angles is 90 degrees.

Solution ► The right triangle is shown in the diagram. Construct a rectangle
with sides of the same length as the short sides of the original triangle, and
draw a diagonal as shown. The original triangle appears on the bottom of the
rectangle, and the top triangle is identical to the original (but rotated). Now it
is clear that α + β is a right angle.

Geometry was one of the first subjects in which formal proofs were used—
Euclid’s Elements was published about 300 b.c. The Elements is the most successful
textbook ever written, and contains many of the basic geometrical theorems that are
taught in school today. In particular, Euclid included a proof of an earlier theorem
(about 500 b.c.) due to Pythagoras. Recall that, in a right triangle, the side opposite
the right angle is called the hypotenuse of the triangle.
EXAMPLE 3
Pythagoras’ Theorem
In a right-angled triangle, show that the square of the length of the hypotenuse
equals the sum of the squares of the lengths of the other two sides.

Solution ► Let the sides of the right triangle have lengths a, b, and c as shown.
Consider two squares with sides of length a + b, and place four copies of the
triangle in these squares as in the diagram. The central rectangle in the second
square shown is itself a square because the angles α and β add to 90 degrees
(using Example 2), so its area is c² as shown. Comparing areas shows that
a² + b² and c² each equal the area of the large square minus four times the area
of the original triangle, and hence are equal.

Sometimes it is convenient (or even necessary) to break a proof into parts, and
deal with each case separately. We formulate the general method as follows:

Method of Reduction to Cases
To prove that p ⇒ q, show that p implies at least one of a list p1, p2, …, pn of
statements (the cases) and then show that pi ⇒ q for each i.
EXAMPLE 4
Show that n² ≥ 0 for every integer n.

Solution ► The cases are n ≥ 0 and n < 0. If n ≥ 0, then n² = n · n is the
product of two nonnegative integers, so n² ≥ 0. If n < 0, then -n > 0 and
n² = (-n)(-n) ≥ 0 by the first case.

EXAMPLE 5
If n is an integer, show that n² - n is even.

Solution ► Write n² - n = n(n - 1). The cases are n even and n odd. If n is
even, then n(n - 1) is even; if n is odd, then n - 1 is even, so n(n - 1) is again
even. Hence n² - n is even in every case.
The statements used in mathematics are required to be either true or false. This
leads to a proof technique which causes consternation in many beginning students.
The method is a formal version of a debating strategy whereby the debater assumes
the truth of an opponent’s position and shows that it leads to an absurd conclusion.
EXAMPLE 6
If r is a rational number (fraction), show that r² ≠ 2.

Solution ► Assume, on the contrary, that r² = 2, and write r = m/n in lowest
terms, so that m and n have no common factor. Then m² = 2n², so m² is even,
whence m is even (see Example 10 below), say m = 2k. But then 4k² = 2n², so
n² = 2k² is even and n is even too. Thus m and n have 2 as a common factor,
contrary to assumption. Hence r² = 2 is impossible.
EXAMPLE 7
Pigeonhole Principle
If n + 1 pigeons are placed in n holes, then some hole contains at least 2 pigeons.
Solution ► Assume the conclusion is false. Then each hole contains at most
one pigeon and so, since there are n holes, there must be at most n pigeons,
contrary to assumption.
The next example involves the notion of a prime number, that is, an integer
greater than 1 that cannot be factored as the product of two smaller positive
integers. The first few primes are 2, 3, 5, 7, 11, … .
EXAMPLE 8
If 2^n - 1 is a prime number, show that n is a prime number.

Solution ► We must show that p ⇒ q, where p is the statement “2^n - 1 is a
prime”, and q is the statement “n is a prime.” Suppose that p is true but q is false,
so that n is not a prime, say n = ab where a ≥ 2 and b ≥ 2 are integers. If we
write 2^a = x, then 2^n = 2^(ab) = (2^a)^b = x^b. Hence 2^n - 1 factors:
2^n - 1 = x^b - 1 = (x - 1)(x^(b-1) + x^(b-2) + ⋯ + x² + x + 1)
As x ≥ 4, this expression is a factorization of 2^n - 1 into smaller positive
integers, contradicting the assumption that 2^n - 1 is prime.
The next example exhibits one way to show that an implication is not valid.
EXAMPLE 9
Show that the implication “n is a prime ⇒ 2n - 1 is a prime” is false.
Solution ► The first four primes are 2, 3, 5, and 7, and the corresponding values
for 2n - 1 are 3, 7, 31, 127 (when n = 2, 3, 5, 7). These are all prime as the
reader can verify. This result seems to be evidence that the implication is true.
However, the next prime is 11 and 211 - 1 = 2047 = 23 · 89, which is clearly
not a prime.
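Examples 8 and 9 can be explored with a short script; `is_prime` below is a naive trial-division helper introduced only for this added sketch, not from the text.

```python
def is_prime(n):
    # trial division; fine for the small values used here
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# n prime does NOT force 2^n - 1 prime: n = 11 is the first failure.
for n in [2, 3, 5, 7, 11]:
    print(n, 2 ** n - 1, is_prime(2 ** n - 1))
```

The loop reports that 2047 = 2¹¹ - 1 is composite, matching the factorization 23 · 89 given in Example 9.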
EXAMPLE 10
If n is an integer, show that “n is odd ⇔ n² is odd.”

Solution ► Example 1 gives the forward implication: n odd ⇒ n² odd. For the
converse, we prove the contrapositive: if n is even, then n² is even. Indeed, if
n = 2k, then n² = 4k² = 2(2k²) is even. Hence n² odd ⇒ n odd, proving the
equivalence.
Many more examples of proofs can be found in this book and, although they are
often more complex, most are based on one of these methods. In fact, linear algebra is
one of the best topics on which the reader can sharpen his or her skill at constructing
proofs. Part of the reason for this is that much of linear algebra is developed using the
axiomatic method. That is, in the course of studying various examples it is observed
that they all have certain properties in common. Then a general, abstract system is
studied in which these basic properties are assumed to hold (and are called axioms).
In this system, statements (called theorems) are deduced from the axioms using
the methods presented in this appendix. These theorems will then be true in all the
concrete examples, because the axioms hold in each case. But this procedure is more
than just an efficient method for finding theorems in the examples. By reducing the
proof to its essentials, we gain a better understanding of why the theorem is true and
how it relates to analogous theorems in other abstract systems.
The axiomatic method is not new. Euclid first used it in about 300 b.c. to
derive all the propositions of (euclidean) geometry from a list of 10 axioms. The
method lends itself well to linear algebra. The axioms are simple and easy to
understand, and there are only a few of them. For example, the theory of vector
spaces contains a large number of theorems derived from only ten simple axioms.
EXERCISES B
1. In each case prove the result and either prove the converse or give a counterexample.
(a) If n is an even integer, then n² is a multiple of 4.
(b) If m is an even integer and n is an odd integer, then m + n is odd.
(c) If x = 2 or x = 3, then x³ - 6x² + 11x - 6 = 0.

3. In each case prove the result by contradiction and either prove the converse or
give a counterexample.
(a) If n > 2 is a prime integer, then n is odd.
(b) If n + m = 25 where n and m are integers, then one of n and m is greater than 12.
(c) If a and b are positive numbers and a ≤ b, then √a ≤ √b.
(d) If m and n are integers and mn is even, then m is even or n is even.

4. Prove each implication by contradiction.
(a) If x and y are positive numbers, then √(x + y) ≠ √x + √y.
(b) If x is irrational and y is rational, then x + y is irrational.
(c) If 13 people are selected, at least 2 have birthdays in the same month.

5. Disprove each statement by giving a counterexample.
n = 2   n = 3   n = 4

6. The number e from calculus has a series expansion
e = 1 + 1/1! + 1/2! + 1/3! + ⋯
where n! = n(n - 1)⋯3 · 2 · 1 for each integer n ≥ 1. Prove that e is irrational
by contradiction. [Hint: If e = m/n, consider
k = n!(e - 1 - 1/1! - 1/2! - 1/3! - ⋯ - 1/n!).
Show that k is a positive integer and that
k = 1/(n + 1) + 1/((n + 1)(n + 2)) + ⋯ < 1/n.]
Appendix C:
Mathematical Induction
Suppose Sn is a statement about the integer n for each n ≥ 1. The principle of
mathematical induction asserts that Sn is true for every n ≥ 1 provided that:
1. S1 is true.
2. Sn ⇒ Sn+1 holds for each n ≥ 1.
This is one of the most useful techniques in all of mathematics. It applies in a wide
variety of situations, as the following examples illustrate.
EXAMPLE 1
Show that 1 + 2 + ⋯ + n = (1/2)n(n + 1) for n ≥ 1.

Solution ► Let Sn denote the statement: 1 + 2 + ⋯ + n = (1/2)n(n + 1).
1. S1 is true, because 1 = (1/2) · 1 · 2.
2. Sn ⇒ Sn+1. If Sn is true, then
1 + 2 + ⋯ + n + (n + 1) = (1/2)n(n + 1) + (n + 1) = (1/2)(n + 1)(n + 2)
which is Sn+1. Hence Sn is true for every n ≥ 1 by induction.
In the verification that Sn ⇒ Sn+1, we assume that Sn is true and use it to deduce
that Sn+1 is true. The assumption that Sn is true is sometimes called the induction
hypothesis.
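Induction proves the formula for every n at once; a loop can only spot-check finitely many cases, but such a check (added here, not part of the text) is a useful sanity test.

```python
# Verify 1 + 2 + ... + n = n(n + 1)/2 for the first hundred values of n.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("formula verified for n = 1, ..., 100")
```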
EXAMPLE 2
If x is any number such that x ≠ 1, show that
1 + x + x² + ⋯ + x^n = (x^(n+1) - 1)/(x - 1) for n ≥ 1.

Solution ► Let Sn be the statement: 1 + x + x² + ⋯ + x^n = (x^(n+1) - 1)/(x - 1).
1. S1 is true. S1 reads 1 + x = (x² - 1)/(x - 1), which is true because
x² - 1 = (x - 1)(x + 1).
2. Sn ⇒ Sn+1. Assume the truth of Sn: 1 + x + x² + ⋯ + x^n = (x^(n+1) - 1)/(x - 1).
We must deduce from this the truth of Sn+1:
1 + x + x² + ⋯ + x^(n+1) = (x^(n+2) - 1)/(x - 1).
Starting with the left side of Sn+1 and using the induction hypothesis, we find
1 + x + x² + ⋯ + x^(n+1) = (1 + x + x² + ⋯ + x^n) + x^(n+1)
= (x^(n+1) - 1)/(x - 1) + x^(n+1)
= (x^(n+1) - 1 + x^(n+1)(x - 1))/(x - 1)
= (x^(n+2) - 1)/(x - 1)
This shows that Sn+1 is true and so completes the induction.
Both of these examples involve formulas for a certain sum, and it is often
convenient to use summation notation. For example, ∑_{k=1}^n (2k - 1) means that in
the expression (2k - 1), k is to be given the values k = 1, k = 2, k = 3, …, k = n,
and then the resulting n numbers are to be added. The same thing applies to other
expressions involving k. For example,
∑_{k=1}^n k³ = 1³ + 2³ + ⋯ + n³
∑_{k=1}^5 (3k - 1) = (3 · 1 - 1) + (3 · 2 - 1) + (3 · 3 - 1) + (3 · 4 - 1) + (3 · 5 - 1)
The next example involves this notation.
EXAMPLE 3
Show that ∑_{k=1}^n (3k² - k) = n²(n + 1) for each n ≥ 1.

Solution ► Let Sn be the statement: ∑_{k=1}^n (3k² - k) = n²(n + 1).
1. S1 is true. S1 reads (3 · 1² - 1) = 1²(1 + 1), which is true.
2. Sn ⇒ Sn+1. Assume that Sn is true. We must prove Sn+1:
∑_{k=1}^{n+1} (3k² - k) = ∑_{k=1}^n (3k² - k) + [3(n + 1)² - (n + 1)]
= n²(n + 1) + (n + 1)[3(n + 1) - 1]    using Sn
= (n + 1)[n² + 3n + 2]
= (n + 1)[(n + 1)(n + 2)]
= (n + 1)²(n + 2)
This is Sn+1, so the induction is complete.
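A similar spot-check (added here, not part of the text) for the identity just proved:

```python
# Verify sum_{k=1}^{n} (3k^2 - k) = n^2 (n + 1) for the first fifty values of n.
for n in range(1, 51):
    assert sum(3 * k * k - k for k in range(1, n + 1)) == n * n * (n + 1)
print("identity verified for n = 1, ..., 50")
```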
EXAMPLE 4
Show that 7n + 2 is a multiple of 3 for all n ≥ 1.
Solution ►
1. S1 is true: 71 + 2 = 9 is a multiple of 3.
2. Sn ⇒ Sn+1. Assume that 7^n + 2 is a multiple of 3 for some n ≥ 1; say,
7^n + 2 = 3m for some integer m. Then
7^(n+1) + 2 = 7(7^n) + 2 = 7(3m - 2) + 2 = 21m - 12 = 3(7m - 4)
so 7^(n+1) + 2 is also a multiple of 3, proving Sn+1.
In all the foregoing examples, we have used the principle of induction starting
at 1; that is, we have verified that S1 is true and that Sn ⇒ Sn+1 for each n ≥ 1, and
then we have concluded that Sn is true for every n ≥ 1. But there is nothing special
about 1 here. If m is some fixed integer and we verify that
1. Sm is true.
2. Sn ⇒ Sn+1 for every n ≥ m.
then it follows that Sn is true for every n ≥ m. This “extended” induction principle is
just as plausible as the induction principle and can, in fact, be proved by induction.
The next example will illustrate it. Recall that if n is a positive integer, the number
n! (which is read “n-factorial”) is the product
n! = n(n - 1)(n - 2)⋯3 · 2 · 1
of all the numbers from n to 1. Thus 2! = 2, 3! = 6, and so on.
EXAMPLE 5
Show that 2^n < n! for all n ≥ 4.

Solution ► Here the induction starts at m = 4.
1. S4 is true because 2⁴ = 16 < 24 = 4!.
2. Sn ⇒ Sn+1 for n ≥ 4. Assume that 2^n < n!. Then
2^(n+1) = 2 · 2^n < 2 · n! ≤ (n + 1) · n! = (n + 1)!
because 2 ≤ n + 1. Hence 2^n < n! for all n ≥ 4.
EXERCISES C
6. 1/(1·2) + 1/(2·3) + ⋯ + 1/(n(n + 1)) = n/(n + 1)

7. 1² + 3² + ⋯ + (2n - 1)² = (n/3)(4n² - 1)

8. 1/(1·2·3) + 1/(2·3·4) + ⋯ + 1/(n(n + 1)(n + 2)) = n(n + 3)/(4(n + 1)(n + 2))

9. 1 + 2 + 2² + ⋯ + 2^(n-1) = 2^n - 1

10. 3 + 3³ + 3⁵ + ⋯ + 3^(2n-1) = (3/8)(9^n - 1)

11. 1/1² + 1/2² + ⋯ + 1/n² ≤ 2 - 1/n

12. n < 2^n

13. For any integer m > 0, m!n! < (m + n)!

14. 1/√1 + 1/√2 + ⋯ + 1/√n ≤ 2√n - 1

15. 1/√1 + 1/√2 + ⋯ + 1/√n ≥ √n

16. n³ + (n + 1)³ + (n + 2)³ is a multiple of 9.

17. 5^n + 3 is a multiple of 4.

18. n³ - n is a multiple of 3.

19. 3^(2n+1) + 2^(n+2) is a multiple of 7.

(c) Sn ⇒ Sn+1 for each n ≥ 10.
(d) Both Sn and Sn+1 ⇒ Sn+2 for each n ≥ 1.

23. If Sn is a statement for each n ≥ 1, argue that Sn is true for all n ≥ 1 if it is
known that the following two conditions hold:
(1) Sn ⇒ Sn-1 for each n ≥ 2.
(2) Sn is true for infinitely many values of n.

24. Suppose a sequence a1, a2, … of numbers is given that satisfies:
(1) a1 = 2.
(2) an+1 = 2an for each n ≥ 1.
Formulate a theorem giving an in terms of n, and prove your result by induction.

25. Suppose a sequence a1, a2, … of numbers is given that satisfies:
(1) a1 = b.
(2) an+1 = can + b for n = 1, 2, 3, ….
Formulate a theorem giving an in terms of n, and prove your result by induction.

26. (a) Show that n² ≤ 2^n for all n ≥ 4.
(b) Show that n³ ≤ 2^n for all n ≥ 10.
Appendix D:
Polynomials
Two polynomials f(x) and g(x) are called equal if every coefficient of f(x) is the
same as the corresponding coefficient of g(x). More precisely, if
f(x) = a0 + a1x + a2x² + ⋯ and g(x) = b0 + b1x + b2x² + ⋯
are polynomials, then
f(x) = g(x) if and only if a0 = b0, a1 = b1, a2 = b2, ….
In particular, this means that
f(x) = 0 is the zero polynomial if and only if a0 = 0, a1 = 0, a2 = 0, ….
This is the reason for calling x an indeterminate.
Let f(x) and g(x) denote nonzero polynomials of degrees n and m respectively, say
f(x) = a0 + a1x + a2x² + ⋯ + an x^n and g(x) = b0 + b1x + b2x² + ⋯ + bm x^m
where an ≠ 0 and bm ≠ 0. If these expressions are multiplied, the result is
f(x)g(x) = a0b0 + (a0b1 + a1b0)x + (a0b2 + a1b1 + a2b0)x² + ⋯ + anbm x^(n+m).
Since an and bm are nonzero numbers, their product anbm ≠ 0 and we have
Theorem 1
If f (x) and g(x) are nonzero polynomials of degrees n and m respectively, their product
f (x) g(x) is also nonzero and
deg[ f (x)g(x)] = n + m.
EXAMPLE 1
(2 - x + 3x²)(3 + x² - 5x³) = 6 - 3x + 11x² - 11x³ + 8x⁴ - 15x⁵.
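The coefficient arithmetic behind products like Example 1 is a convolution; `poly_mul` below is a hypothetical helper written for this added sketch (coefficient lists in ascending powers of x), not part of the text.

```python
def poly_mul(f, g):
    # multiply two polynomials given as ascending coefficient lists
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

# (2 - x + 3x^2)(3 + x^2 - 5x^3) from Example 1:
print(poly_mul([2, -1, 3], [3, 0, 1, -5]))  # [6, -3, 11, -11, 8, -15]
```

The output length len(f) + len(g) - 1 reflects Theorem 1: the degree of the product is n + m.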
If f (x) is any polynomial, the next theorem shows that f (x) - f (a) is a multiple of
the polynomial x - a. In fact we have
Theorem 2
Remainder Theorem
If f (x) is a polynomial of degree n ≥ 1 and a is any number, then there exists a
polynomial q(x) such that
f (x) = (x - a)q(x) + f (a)
where deg(q(x)) = n - 1.
PROOF
Write f(x) = a0 + a1x + a2x² + ⋯ + an x^n where the ai are numbers, so that
f(a) = a0 + a1a + a2a² + ⋯ + an a^n. If these expressions are subtracted, the
constant terms cancel and we obtain
f(x) - f(a) = a1(x - a) + a2(x² - a²) + ⋯ + an(x^n - a^n).
Hence it suffices to show that, for each k ≥ 1, x^k - a^k = (x - a)p(x) for some
polynomial p(x) of degree k - 1. This is clear if k = 1. If it holds for some value
k, the fact that
x^(k+1) - a^(k+1) = (x - a)x^k + a(x^k - a^k)
shows that it holds for k + 1. Hence the proof is complete by induction.
For example, the division of f(x) = x³ - 3x² + x - 1 by x - 2 can be carried out by
long division:

          x² - x - 1
        __________________
x - 2 ) x³ - 3x² + x - 1
        x³ - 2x²
            -x² + x - 1
            -x² + 2x
                -x - 1
                -x + 2
                    -3

Hence x³ - 3x² + x - 1 = (x - 2)(x² - x - 1) + (-3). The final remainder is
-3 = f(2) as is easily verified. This procedure is called the division algorithm.1
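For divisors of the special form x - a, the division algorithm reduces to synthetic division (Horner's rule); `divide_by_linear` below is a helper introduced only for this added sketch, with coefficients in descending powers.

```python
def divide_by_linear(coeffs, a):
    # divide f(x) (descending coefficients) by x - a;
    # returns (quotient coefficients, remainder), where remainder = f(a)
    acc = [coeffs[0]]
    for c in coeffs[1:]:
        acc.append(c + a * acc[-1])
    return acc[:-1], acc[-1]

# x^3 - 3x^2 + x - 1 divided by x - 2:
q, r = divide_by_linear([1, -3, 1, -1], 2)
print(q, r)   # quotient x^2 - x - 1, remainder -3 = f(2)
```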
A real number a is called a root of the polynomial f (x) if f (a) = 0. Hence for
example, 1 is a root of f (x) = 2 - x + 3x2 - 4x3, but -1 is not a root because
f (-1) = 10 ≠ 0. If f (x) is a multiple of x - a, we say that x - a is a factor of f (x).
Hence the remainder theorem shows immediately that if a is a root of f(x), then
x - a is a factor of f(x). But the converse is also true: If x - a is a factor of f(x), say
f(x) = (x - a)q(x), then f(a) = (a - a)q(a) = 0. This proves the following theorem.
Theorem 3
Factor Theorem
If f (x) is a polynomial and a is a number, then x - a is a factor of f (x) if and only if a is
a root of f (x).
EXAMPLE 2
If f (x) = x3 - 2x2 - 6x + 4, then f (-2) = 0, so x - (-2) = x + 2 is a factor of
f (x). In fact, the division algorithm gives f (x) = (x + 2)(x2 - 4x + 2).
1 This procedure can be used to divide f (x) by any nonzero polynomial d(x) in place of x - a; the remainder then is a polynomial that is
either zero or of degree less than the degree of d(x).
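A numeric check of Example 2, added here as an illustration (not part of the text):

```python
# f(-2) = 0, so x + 2 is a factor of f(x) = x^3 - 2x^2 - 6x + 4, and the
# stated factorization agrees with f at integer sample points.
def f(x):
    return x ** 3 - 2 * x ** 2 - 6 * x + 4

print(f(-2))   # 0
print(all(f(x) == (x + 2) * (x * x - 4 * x + 2) for x in range(-5, 6)))
```

Two cubics that agree at eleven points must be equal, so the sampled agreement already pins down the factorization.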
Theorem 4
If f (x) is a nonzero polynomial of degree n, then f (x) has at most n roots counting
multiplicities.
PROOF
If n = 0, then f (x) is a constant and has no roots. So the theorem is true if n = 0.
(It also holds for n = 1 because, if f(x) = a + bx where b ≠ 0, then the only root
is -a/b.) In general, suppose inductively that the theorem holds for some value of
n ≥ 0, and let f (x) have degree n + 1. We must show that f (x) has at most n + 1
roots counting multiplicities. This is certainly true if f (x) has no root. On the
other hand, if a is a root of f (x), the factor theorem shows that f (x) = (x - a) q(x)
for some polynomial q(x), and q(x) has degree n by Theorem 1. By induction,
q(x) has at most n roots. But if b is any root of f (x), then
(b - a)q(b) = f (b) = 0
so either b = a or b is a root of q(x). It follows that f (x) has at most n roots. This
completes the induction and so proves Theorem 4.
Exercises 1.1 Solutions and Elementary Operations (Page 7)

1. (b) 2(2s + 12t + 13) + 5s + 9(-s - 3t - 3) + 3t = -1;
   (2s + 12t + 13) + 2s + 4(-s - 3t - 3) = 1
2. (b) x = t, y = (1/3)(1 - 2t) or x = (1/2)(1 - 3s), y = s
   (d) x = 1 + 2s - 5t, y = s, z = t or x = s, y = t, z = (1/5)(1 - s + 2t)
4. x = (1/4)(3 + 2s), y = s, z = t
5. (a) No solution if b ≠ 0. If b = 0, any x is a solution.
   (b) x = b/a
7. (b) [1 1 0 1; 0 1 1 0]  (d) [1 2 0 2; 0 1 1; -1 0 1 2]
8. (b) 2x - y = -1          2x1 - x2 = -1
      -3x + 2y + z = 0  or  -3x1 + 2x2 + x3 = 0
       y + z = 3             x2 + x3 = 3
9. (b) x = -3, y = 2  (d) x = -17, y = 13
10. (b) x = 1/9, y = 10/9, z = -7/3
11. (b) No solution
14. (b) F. x + y = 0, x - y = 0 has a unique solution.
    (d) T. Theorem 1.
16. x = 5, y = 1, so x = 23, y = -32
17. a = -1/9, b = -5/9, c = 11/9
19. $4.50, $5.20

Exercises 1.2 Gaussian Elimination (Page 16)

1. (b) No, no  (d) No, yes  (f) No, no
2. (b) [0 1 -3 0 0 0 0; 0 0 0 1 0 0 -1; 0 0 0 0 1 0 0; 0 0 0 0 0 1 1]
3. (b) x1 = 2r - 2s - t + 1, x2 = r, x3 = -5s + 3t - 1, x4 = s, x5 = -6t + 1, x6 = t
   (d) x1 = -4s - 5t - 4, x2 = -2s + t - 2, x3 = s, x4 = 1, x5 = t
4. (b) x = -1/7, y = -3/7  (d) x = (1/3)(t + 2), y = t  (f) No solution
5. (b) x = -15t - 21, y = -11t - 17, z = t  (d) No solution
   (f) x = -7, y = -9, z = 1  (h) x = 4, y = 3 + 2t, z = t
6. (b) Denote the equations as E1, E2, and E3. Apply gaussian elimination to
   column 1 of the augmented matrix, and observe that E3 - E1 = -4(E2 - E1).
   Hence E3 = 5E1 - 4E2.
7. (b) x1 = 0, x2 = -t, x3 = 0, x4 = t  (d) x1 = 1, x2 = 1 - t, x3 = 1 + t, x4 = t
8. (b) If ab ≠ 2, unique solution x = (-2 - 5b)/(2 - ab), y = (a + 5)/(2 - ab).
   If ab = 2: no solution if a ≠ -5; if a = -5, the solutions are x = -1 + (2/5)t, y = t.
   (d) If a ≠ 2, unique solution x = (1 - b)/(a - 2), y = (ab - 2)/(a - 2). If a = 2,
   no solution if b ≠ 1; if b = 1, the solutions are x = (1/2)(1 - t), y = t.
9. (b) Unique solution x = -2a + b + 5c, y = 3a - b - 6c, z = -2a + b + 4c,
   for any a, b, c.
   (d) If abc ≠ -1, unique solution x = y = z = 0; if abc = -1 the solutions are
   x = abt, y = -bt, z = t.
   (f) If a = 1, solutions x = -t, y = t, z = -1. If a = 0, there is no solution.
   If a ≠ 1 and a ≠ 0, unique solution x = (a - 1)/a, y = 0, z = -1/a.
10. (b) 1  (d) 3  (f) 1
11. (b) 2  (d) 3  (f) 2 if a = 0 or a = 2; 3, otherwise.
12. (b) False. A = [1 0 1; 0 1 1; 0 0 0]  (d) False. A = [1 0 1; 0 1 0; 0 0 0]
    (f) False. 2x - y = 0, -4x + 2y = 0 is consistent but 2x - y = 1, -4x + 2y = 1 is not.
    (h) True, A has 3 rows, so there are at most 3 leading 1’s.
Selected Answers 551
14. (b) Since one of b – a and c – a is nonzero, then Exercises 1.6 An Application to Chemical
S T S T
1 a b+c 1 a b+c Reactions (Page 30)
1 b c+a → 0 b-a a-b →
0 c-a a-c 2. 2NH3 + 3CuO → N2 + 3Cu + 3H2O
1 b c+a
S T S T
1 a b+c 1 0 b+c+a 4. 15Pb(N3)2 + 44Cr(MnO4)2 →
0 1 -1 → 0 1 -1 22Cr2O3 + 88MnO2 + 5Pb3O4 + 90NO
0 0 0 0 0 0
Supplementary Exercises for Chapter 1
16. (b) x2 + y2 - 2x + 6y - 6 = 0 (Page 30)
5 7 8
18. __
20
in A, __
20
in B, __
20
in C
1. (b) No. If the corresponding planes are parallel and
Exercises 1.3 Homogeneous Equations distinct, there is no solution. Otherwise they either
(Page 24) coincide or have a whole common line of solutions,
that is, at least one parameter.
1 0 1 0 1 0 1 1 1 1
1. (b) False. A = (d) False. A = 2. (b) x1 = __ (-6s - 6t + 16), x2 = __ (4s - t + 1), x3 = s,
0 1 1 0 0 1 1 0 10 10
1 0 0 x4 = t
1 0 0 3. (b) If a = 1, no solution. If a = 2, x = 2 - 2t, y = -t,
(f) False. A = (h) False. A = 0 1 0
0 1 0 z = t. If a ≠ 1 and a ≠ 2, the unique solution is
0 0 0
2. (b) a = -3, x = 9t, y = -5t, z = t (d) a = 1, x = -t, 8 - 5a , y = ________ a+2
-2 - a , z = _____
x = ________
y = t, z = 0; or a = -1, x = t, y = 0, z = t 3(a - 1) 3(a - 1) 3
4. S T → S T → S -R T → S -R T → S R T
R1 R1 + R2 R1 + R2 R2 R2
3. (b) Not a linear combination. (d) v = x + 2y - z
R2 R2 1 1 1
4. (b) y = 2a1 - a2 + 4a3.
S T S T S T ST S T
-2 -2 -3 0 -1 6. a = 1, b = 2, c = -1
1 0 0 2 3 9. (b) 5 of brand 1, 0 of brand 2, 3 of brand 3
5. (b) r 0 + s -1 + t -2 (d) s 1 + t 0 8. The (real) solution is x = 2, y = 3 – t, z = t where t is
0 1 0 0 1 a parameter. The given complex solution occurs when
0 0 1 0 0
6. (b) The system in (a) has nontrivial solutions. t = 3 – i is complex. If the real system has a unique
7. (b) By Theorem 2 Section 1.2, there are solution, that solution is real because the coefficients
n - r = 6 - 1 = 5 parameters and thus infinitely and constants are all real.
many solutions.
Exercises 2.1 Matrix Addition, Scalar
(d) If R is the row-echelon form of A, then R
Multiplication, and Transposition (Page 40)
has a row of zeros and 4 rows in all. Hence R
has r = rank A = 1, 2, or 3. Thus there are 1. (b) (a b c d) = (-2, -4, -6, 0) + t(1, 1, 1, 1),
n - r = 6 - r = 5, 4, or 3 parameters and thus t arbitrary (d) a = b = c = d = t, t arbitrary
2. (b) S T (d) (-12, 4, -12)
infinitely many solutions. -14
9. (b) That the graph of ax + by + cz = d contains three -20
S T
0 1 -2
(f) -1 0 4 (h) S T
points leads to 3 linear equations homogeneous in 4 -1
variables a, b, c, and d. Apply Theorem 1. -1 -6
2 -4 0
11. There are n - r parameters (Theorem 2 Section 1.2),
3. (b) S T (d) Impossible (f) S T
15 -5 5 2
so there are nontrivial solutions if and only if n - r > 0.
10 0 0 -1
Exercises 1.4 An Application to Network Flow (h) Impossible
S _1 T
(Page 27) 4
4. (b)
2
1. (b) f1 = 85 - f4 - f7 2. (b) f5 = 15 11
5. (b) A = -__ B
f2 = 60 - f4 - f7 25 ≤ f4 ≤ 30 3
6. (b) X = 4A - 3B, Y = 4B - 5A
f3 = -75 + f4 + f6
7. (b) Y = (s, t), X = _12 (1 + 5s, 2 + 5t); s and t arbitrary
f5 = 40 - f6 - f7 f4, f6, f7 parameters
8. (b) 20A - 7B + 2C
3. (b) CD
9. (b) If A = [a b; c d], then (p, q, r, s) = (1/2)(2d, a + b - c - d, a - b + c - d, -a + b + c + d).
11. (b) If A + A′ = 0 then -A = -A + 0 = -A + (A + A′) = (-A + A) + A′ = 0 + A′ = A′.

Exercises 1.5 An Application to Electrical Networks (Page 29)

2. I1 = -1/5, I2 = 3/5, I3 = 4/5
4. I1 = 2, I2 = 1, I3 = 1/2, I4 = 3/2, I5 = 3/2, I6 = 1/2
552 Selected Answers
S T S T S T S T
13. (b) Write A = diag(a1, …, an), where a1, …, an are -2 1 3 -1
the main diagonal entries. If B = diag(b1, …, bn) then 5. (b) 2 + t -3 (d) -9 +t 4
-2 1
kA = diag(ka1, …, kan). 0 1 0 1
14. (b) s = 1 or t = 0 (d) s = 0, and t = 3 6. We have Ax0 = 0 and Ax1 = 0 and so
T (d) S _9
- 2 -5 T
15. (b) S
2 0 2 7 A(sx0 + tx1) = s(Ax0) + t(Ax1) = s 0 + t 0 = 0.
S T Q S T S TR
1 -1 -3 2 -5
16. (b) A = AT, so using Theorem 2 Section 2.1, 0 1 0
8. (b) x = -1 + s 0 + t 2 .
(kA)T = kAT = kA. 0 0 0
19. (b) False. Take B = -A for any A ≠ 0. 0 0 1
10. (b) False. S T S T = S T.
(d) True. Transposing fixes the main diagonal. 1 2 2 0
(f) True. (kA + mB)T = (kA)T + (mB)T = kAT + mBT 2 4 -1 0
= kA + mB (d) True. The linear combination x1a1 + + xnan
20. (c) Suppose A = S + W, where S = ST and W = -WT. equals Ax where A = [a1 an] by Theorem 1.
Then AT = ST + WT = S - W, so A + AT = 2S and
ST
2
(f) False. If A = S T and x = 0 , then
A - AT = 2W. Hence S = _12 (A + AT) and 1 1 -1
W = _12 (A - AT) are uniquely determined by A. 2 2 0
1
Ax = S T ≠ s S T + t S T for any s and t.
22. (b) If A = [aij] then (kp)A = [(kp)aij] = [k(paij)] = k[paij] 1 1 1
= k(pA). 4 2 2
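The decomposition in 20(c) is straightforward to verify numerically. The sketch below (with an arbitrary illustrative matrix, not one from the exercises) forms S = (1/2)(A + AT) and W = (1/2)(A - AT) and checks the claimed properties:

```python
import numpy as np

# Numerical sketch of Exercise 20(c): every square matrix A splits as
# A = S + W with S symmetric and W skew-symmetric.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])   # arbitrary illustration
S = (A + A.T) / 2                  # symmetric part
W = (A - A.T) / 2                  # skew-symmetric part
assert np.allclose(S, S.T)         # S^T = S
assert np.allclose(W, -W.T)        # W^T = -W
assert np.allclose(S + W, A)       # A = S + W
```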
(h) False. If A = S T, there is a solution for
1 -1 1
Exercises 2.2 Equations, Matrices, and -1 1 -1
Transformations (Page 54)
b = S T but not for b = S T.
0 1
0 0
1. (b) x1 − 3x2 − 3x3 + 3x4 = 5
8x2 + 2x 4 = 1 11. (b) Here T S x T = S y T = S 0 1 T S x T
y x 1 0 y
x1 + 2x2 + 2x3 = 2
x2 + 2x3 − 5x4 = 0 (d) Here T S x T = S y T = S 0 1 T S x T.
y -x -1 0 y
S T S T S T S T S T S T
(b) Here T S y T = S y T = 0 1 0 S y T, so the matrix
1 -2 -1 1 5 x -x -1 0 0 x
2. (b) x1 -1 + x2 0 + x3 1 + x4 -2 = -3 13.
2 -2 7 0 8 z z 0 0 1 z
S T
3 -4 9 -2 12 -1 0 0
S T
x1 is 0 1 0 .
3. (b) Ax = S T 2 = x1 S 0 T + x2 S T + x3 S 5 T
1 2 3 x 1 2 3
0 0 1
0 -4 5 x3 -4
16. Write A = [a1 a2 an] in terms of its columns. If
S T
x1 + 2x2 + 3x3 b = x1a1 + x2a2 + + xnan where the xi are scalars,
=
-4x2 + 5x3 then Ax = b by Theorem 1 where x = [x1 x2 xn]T.
S TS T S T S T
3 -4 1 6 x1 3 -4 That is, x is a solution to the system Ax = b.
x 18. (b) By Theorem 3, A(tx1) = t(Ax1) = t 0 = 0; that is,
(d) Ax = 0 2 1 5 x23 = x1 0 + x2 2 +
-8 7 -3 0 x4 -8 7 tx1 is a solution to Ax = 0.
ST S T
22. If A is m × n and x and y are n-vectors, we
S T
1 6 3x1 - 4x2 + x3 + 6x4
must show that A(x + y) = Ax + Ay. Denote
x3 1 + x4 5 = 2x2 + x3 + 5x4
the columns of A by a1, a2, …, an, and write
-3 0 -8x1 + 7x2 - 3x3
x = [x1 x2 xn]T and y = [y1 y2 yn]T. Then
4. (b) To solve Ax = b the reduction is x + y = [x1 + y1 x2 + y2 xn + yn]T, so
1 3 2 0 4 1 0 −1 −3 1 Definition 1 and Theorem 1 §2.1 give
1 0 −1 −3 1 → 0 1 1 1 1 , so the general A(x + y) = (x1 + y1)a1 + (x2 + y2)a2 + + (xn + yn)an
−1 2 3 5 1 0 0 0 0 0 = (x1a1 + x2a2 + + xnan) +
solution is x = (1 + s + 3t, 1 - s - t, s, t).
Hence (1 + s + 3t)a1 + (1 - s - t)a2 + sa3 + ta4 = b
for any choice of s and t. If s = t = 0, we get
a1 + a2 = b; if s = 1 and t = 0, we have 2a1 + a3 = b.
Exercises 2.3 Matrix Multiplication (Page 67)

(d) True. Since AT = A, we have (I + A)T = IT + AT = I + A. (f) False. If A = [0 1; 0 0], then A ≠ 0 but A2 = 0.
S T
aa 0 0 A2 + AB = A2 + BA. Subtracting A2 gives AB = BA.
(h) S T (j) 0 bb 0
1 0
(j) False. A = S T, B = S 1 2 T (l) False. See (j).
0 1 1 -2 2 4
0 0 cc
S T
-2 12 2 4
2. (b) BA = S TB =S T CB = 2 -6
-1 4 -10 2 7 -6
28. (b) If A = [aij] and B = [bij] and ∑j aij = 1 = ∑jbij,
1 2 4 -1 6
1 6 then the (i, j)-entry of AB is cij = ∑k aik bkj, whence
S T
2 4 8
AC = S T CA = -1 -1 -5
4 10 ∑j cij = ∑j∑k aik bkj = ∑k aik(∑j bkj) = ∑k aik = 1.
-2 -1 Alternatively: If e = (1, 1, …, 1), then the rows of A
1 4 2
sum to 1 if and only if Ae = e. If also Be = e then
3. (b) (a, b, a1, b1) = (3, 0, 1, 2)
(AB)e = A(Be) = Ae = e.
4. (b) A2 - A - 6I = [8 2; 2 5] - [2 2; 2 -1] - [6 0; 0 6] = [0 0; 0 0]
30. (b) If A = [aij], then tr(kA) = tr[kaij] = Σi kaii = k Σi aii = k tr(A).
5. (b) A(BC) = S TS T=S T=
1 -1 -9 -16 -14 -17
0 1 5 1 5 1 (e) Write AT = [aij], where aij = aji. Then AAT =
S 3 1 0 T 2 1 = (AB)C
-2 -1 -2
k=1 i=1 k=1 i=1k=1
5 8 32. (e) Observe that PQ = P + PAP - P2AP = P, so
2
S T S T
2 -1 3 1 4 -1
2. (b) _15 S T
0 0 0 1 2 -1 1
(d) 3 1 -1 (f) __
10 -2 2 2
13. (b) S T = I2k (d) 0k (f) S T if n = 2m;
m -3 4
I 0 X 0 1 1 -2 -9 14 -1
S T
0 I 0 Xm
S T
2 0 -2 0 0 1 -2
S m T if n = 2m + 1.
m+1
0 X (h) _14 (j) -1 -2 -1 -3
-5 2 5
X 0 1 2 1 2
-3 2 -1 0 -1 0 0
14. (b) If Y is row i of the identity matrix I, then YA is
1 − 2 6 − 30 210
row i of IA = A.
0 1 − 3 15 − 105
(l)
16. (b) AB - BA (d) 0 0 0 1 −5 35
18. (b) (kA)C = k(AC) = k(CA) = C(kA) 0 0 0 1 −7
20. We have AT = A and BT = B, so (AB)T = BTAT = BA. 0 0 0 0 1
3. (b) S T = 5 S T S T = _15 S T
1 4 -3 0
Hence AB is symmetric if and only if AB = BA. x _ -3
22. (b) A = 0 y 1 -2 1 -2
(d) S y T = _15 S TS T S T
24. If BC = I, then AB = 0 gives 0 = 0C = (AB)C = A(BC) x 9 -14 6 1 23
1
= AI = A, contrary to the assumption that A ≠ 0. 4 -4 1 -1 = _5 8
z -10 15 -5 0 -25
26. 3 paths v1 → v4, 0 paths v2 → v3
S T
4 -2 1
27. (b) False. If A = S T = J, then AJ = A but J ≠ I.
1 0
4. (b) B = A-1AB = 7 -2 4
0 0
-1 2 -1
S T
1 0 0
10. (b) (CT)-1 = (C-1)T = AT because C-1 = (A-1)-1 = A.
(d) Add (-2) times row 1 of I to row 2. E-1 = 2 1 0
(b) (i) Inconsistent. (ii) S x1 T = S T
2
11. 0 0 1
x2 -1
S T
1 0 0
(b) B4 = I, so B-1 = B3 = S T
0 1
15. (f) Multiply row 3 of I by 5. E-1 = 0 1 0
-1 0
S T
2 0 0 _15
c - 2 -c 1
2. (b) S T (d) S T (f) S T
16. -c 1 0 -1 0 1 -1 0 1
3 - c2 c -1 0 1 0 1 1 0
3. (b) The only possibilities for E are S T, S T,
0 1 k 0
18. (b) If column j of A is zero, Ay = 0 where y is column
1 0 0 1
S T, S 0 1 T, and S T. In each case, EA has a row
j of the identity matrix. Use Theorem 5. 1 0 1 k 1 0
(d) If each column of A sums to 0, XA = 0 where X is 0 k k 1
the row of 1s. Hence ATXT = 0 so A has no inverse by different from C.
Theorem 5 (XT ≠ 0). 5. (b) No, 0 is not invertible.
0 1 S 0 _1 T -5 1
6. (b) S T S T A = S 0 1 -3 T.
19. (b) (ii) (-1, 1, 1)A = 0 1 -2 1 0 1 0 1 0 7
20. (b) Each power Ak is invertible by Theorem 4
2
(because A is invertible). Hence Ak cannot be 0.
S 0 _1 T S 0
TS T A=S T.
1 0 1 -1 1 0 1 0 7
21. (b) By (a), if one has an inverse the other is zero and Alternatively,
1 -5 1 0 1 -3
so has no inverse. 2
T, a > 1, then A = S 0 1 T is an
S TS TS TS TS T
1
If A = S
2 0 1 0 0 1 0 0
_
a 0 -1 a 0 1 1 0 0 1 0 0
22. 1
0 1 (d) 0 1 0 0 _5 0 0 1 0 0 1 0 -3 1 0
x-compression because _1a < 1. 0 0 1 0 0 1 0 -1 1 -2 0 1 0 0 1
S T
24. (b) A-1 = _14 (A3 + 2A2 - I) _1
1 0 5 5 _1
S T
0 0 1
25. (b) If Bx = 0, then (AB)x = (A)Bx = 0, so x = 0 7
0 1 0 A = 0 1 -_5 -_25
because AB is invertible. Hence B is invertible by
1 0 0 0 0 0 0
Theorem 5. But then A = (AB)B–1 is invertible by
7. (b) U = S T= S TS T
1 1 1 1 0 1
Theorem 4.
1 0 0 1 1 0
1 -1 -14 8
8. (b) A = S TS TS TS T
2 -1 0 0 1 1 0 1 0 1 2
26. (b) -5 3 (d) -1 2 16 -9 1 0 2 1 0 -1 0 1
0
0 0 2 -1
S TS TS TS T
-13 8 -1 1 0 0 1 0 0 1 0 -3 1 0 0
0 0 1 -1
(d) A = 0 1 0 0 1 0 0 1 0 0 1 4
28. (d) If An = 0, (I - A)-1 = I + A + + An-1. -2 0 1 0 2 1 0 0 1 0 0 1
30. (b) A[B(AB)-1] = I = [(BA)-1B]A, so A is invertible by 10. UA = R by Theorem 1, so A = U -1R.
Exercise 10. 12. (b) U = A-1, V = I2; rank A = 2
S T
32. (a) Have AC = CA. Left-multiply by A-1 to get
S T
-2 1 0 1 0 -1 -3
C = A-1CA. Then right-multiply by A-1 to get (d) U = 3 -1 0 , V = 0 1 1 4 ; rank A = 2
CA-1 = A-1C. 2 -1 1
0 0 1 0
0 0 0 1
33. (b) Given ABAB = AABB. Left multiply by A-1, then
16. Write U-1 = EkEk-1E2E1, Ei elementary. Then
right multiply by B-1.
[I U-1A] = [U-1U U-1A] = U-1[U A] = EkEk-
34. If Bx = 0 where x is n × 1, then ABx = 0 so x = 0 as -1
1E2E1[U A]. So [U A] [I U A] by row
AB is invertible. Hence B is invertible by Theorem 5,
operations (Lemma 1).
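The reduction [U A] → [I U⁻¹A] in Exercise 16 can be imitated numerically; in practice U⁻¹A is computed with a linear solver rather than by forming the inverse. The matrices below are illustrative, not from the exercises:

```python
import numpy as np

# Sketch of Exercise 16: row operations carry [U  A] to [I  U^{-1}A]
# when U is invertible. Here X = U^{-1}A is found with np.linalg.solve.
U = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # invertible (nonzero diagonal)
A = np.array([[1.0, 4.0],
              [6.0, 0.0]])
X = np.linalg.solve(U, A)           # X satisfies U X = A, i.e. X = U^{-1}A
assert np.allclose(U @ X, A)
assert np.allclose(np.linalg.inv(U) @ A, X)
```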
so A = (AB)B-1 is invertible.
S T
Section 2.3. x1
22. (b) Multiply column i by 1/k. x
20. We have T(x) = x1 + x2 + + xn = [1 1 1] 2 , so
xn
Exercises 2.6 Linear Transformations
(Page 101) T is the matrix transformation induced by the matrix
A = [1 1 1]. In particular, T is linear. On the other
S T S T ST
5 3 2 hand, we can use Theorem 2 to get A, but to do this
1. (b) 6 = 3 2 - 2 0 , so we must first show directly that T is linear.
S T S T
-13 -1 5 x1 y1
S T S T ST
5 3 2 x2 y
6 = 3T 2 - 2T 0 = 3 S T - 2 S T S T
3 -1 = 11 If we write x = and y = 2 . Then
T
5 2 11 xn yn
-13 -1 5
S T S T S T
5 4 x1 + y1
2. (b) As in 1(b), T -1 = 2 . T(x + y) = T 2
x + y2
2
-4 -9
xn + yn
3. (b) T(e1) = -e2 and T(e2) = -e1. = (x1 + y1) + (x2 + y2) + + (xn + yn)
So A[T(e1) T(e2)] = [-e2 -e1] = S T.
-1 0
= (x1 + x2 + + xn) + ( y1 + y2 + + yn)
0 -1
S T S T
__ __
√
__ 2 √ 2 = T(x) + T(y)
2
-__
2
(d) T(e1) = __ and T(e2) = __ . Similarly, T(ax) = aT(x) for any scalar a,
√2
__ √2
__
2 2 so T is linear. By Theorem 2, T has matrix
2 S T.
√2
__
1 -1 A = [T(e1) T(e2) T(en)] = [1 1 1], as before.
So A = [T(e1) T(e2)] = __
1 1 22. (b) If T : n → is linear, write T(ej) = wj for each
4. (b) T(e1) = -e1, T(e2) = e2 and T(e3) = e3. j = 1, 2, …, n where {e1, e2, …, en} is the standard
Hence Theorem 2 gives A[T(e1) T(e2) T(e3)] = basis of n. Since x = x1e1 + x2e2 + + xnen,
S T
-1 0 0 Theorem 1 gives
[-e1 e2 e3] = 0 1 0 . T(x) = T(x1e1 + x2e2 + + xnen)
0 0 1 = x1T(e1) + x2T(e2) + + xnT(en)
5. (b) We have y1 = T(x1) for some x1 in n, and = x1w1 + x2w2 + + xnwn
y2 = T(x2) for some x2 in n. So ay1 + by2 = = w · x = Tw(x)
S T
w1
aT(x1) + bT(x2) = T(ax1 + bx2). Hence ay1 + by2 is w
also in the image of T. where w = 2 . Since this holds for all x in n, it
(b) T Q 2 S T R ≠ 2 S T.
0 0 wn
7.
1 -1 shows that T = Tw. This also follows from Theorem
√2 S T, rotation through θ = -__4 .
1__ 1 1 π 2, but we have first to verify that T is linear. (This
8. (b) A = __
-1 1 comes to showing that
10 S T, reflection in the line y = -3x.
w · (x + y) = w · x + w · y and w · (ax) = a(w · x)
(d) A = __
-6 8 for all x and y in n and all a in .) Then T has matrix
S T
cos θ 0 -sin θ
A = [T(e1) T(e2) T(en)] = [w1 w2 wn] by
S T
10. (b) 0 1 0 x1
sin θ 0 cos θ x
Theorem 2. Hence if x = 2 in , then
12. (b) Reflection in the y axis (d) Reflection in y = x
(f) Rotation through __π2 xn
13. (b) T(x) = aR(x) = a(Ax) = (aA)x for all x in . Hence T(x) = Ax = w · x, as required.
T is induced by aA.
23. (b) Given x in ℝn and a in ℝ, we have
(S ∘ T)(ax) = S[T(ax)]   (definition of S ∘ T)
= S[aT(x)]   (because T is linear)
= a[S[T(x)]]   (because S is linear)
= a[(S ∘ T)(x)]   (definition of S ∘ T)
9. (b) Let A = LU = L1U1 be two such factorizations. Then UU1-1 = L-1L1; write this matrix as D = UU1-1 = L-1L1. Then D is lower triangular (apply Lemma 1 to D = L-1L1); and D is also upper triangular (consider UU1-1). Hence D is diagonal, and D = I because L-1 and L1 are unit triangular. Since A = LU, this completes the proof.

Exercises 2.7 LU-Factorization (Page 111)
S TS T
2 0 0 1 2 1 Exercises 2.8 An Application to Input-Output
2
1. (b) 1 -3 0 0 1 -_3 Economic Models (Page 116)
S T
-1 9 1 0 0 0
S T
14t
2. S t T
TS T
t
S
t
-1 0 0 0 1 3 -1 0 1
1. (b) 3t (d) 17t
(d) 1 1 0 0 0 1 2 1 0
t 47t t
1 -1 1 0 0 0 0 0 0 23t
4. P = S
(1 - a)t T
0 -2 0 1 0 0 0 0 0 bt
TS T
is nonzero (for some t) unless b = 0 and
S
1 1 -1 2 1
2 0 0 0
a = 1. In that case, S T is a solution. If the entries of E
1 -2 0 0 0 1 -_12 0 0 1
(f)
3 -2 1 0 0 0 0 0 0 1
are positive, then P = S T has positive entries.
0 2 0 1 0 0 0 0 0 b
S T
0 0 1 1-a
7. (b) S T
2. (b) P = 1 0 0 0.4 0.8
0 1 0 0.7 0.2
8. If E = S T, then I - E = S T, so
S T =S TS T
-1 2 1 -1 0 0 1 -2 -1 a b 1-a -b
PA = 0 -1 2 0 -1 0 0 1 2 c d -c 1-d
0 0 4 0 0 4 0 0 1 det(I – E) = (1 – a)(1 – d) – bc = 1 – tr E + det E.
S T S T
1 0 0 0 -1 -2 3 0 If det(I – E) ≠ 0, then
(d) P = 0 0 1 PA = 1 1 -1 3 = S T, so (I – E) ≥ 0
0 1 1-d b
0 0 0 1 2 5 -10 1 (I - E)-1 = _________ –1
det(I - E) c 1-a
0 1 0 0 2 4 -6 5
S TS T
if det(I - E) > 0, that is, tr E < 1 + det E. The
-1 0 0 0 1 2 -3 0
1 -1 0 0 0 1 -2 -3 converse is now clear.
ST
2 1 -2 0 0 0 1 -2 3
2 0 0 5 0 0 0 1 9. (b) Use p = 2 in Theorem 2.
S T S T
1
ST
-1 -1 + 2t 3
3. (b) y = 0 x = -t s and t arbitrary
s (d) p = 2 in Theorem 2.
0 t 2
S T S T
2 8 - 2t
(d) y = 8 x = 6 - t t arbitrary Exercises 2.9 An Application to Markov Chains
-1 -1 - t (Page 123)
0 t
5. S T → S T →S T →S T →S T
R1 R + R R 1 + R2 R2 R2 1. (b) Not regular
ST
1 2
ST
1 5
2. (b) _13 S T , _38
R2 R2 -R1 -R1 R1 2
(d) _13 1 , 0.312 (f) __
1
20 7
, 0.306
6. (b) Let A = LU = L1U1 be LU-factorizations of the 1
1 8
invertible matrix A. Then U and U1 have no row of 4. (b) 50% middle, 25% upper, 25% lower
zeros and so (being row-echelon) are upper triangular 7 __
6. __ , 9
16 16
with 1’s on the main diagonal. Thus, using (a), the 7
diagonal matrix D = UU1-1 has 1’s on the main 8. (a) __(b) He spends most of his time in
ST
75
3
diagonal. Thus D = I, U = U1, and L = L1. 2
and B = S
Y B1 T
7. If A = S
X A1 T
1
a 0 b 0 compartment 3; steady state __
16 5
.
in block form, then
4
AB = S
Xb + A1Y A1B1 T
ab 0 2
, and A1B1 is lower triangular 12. (a) Direct verification.
(b) Since 0 < p < 1 and 0 < q < 1 we get
by induction. 0 < p + q < 2 whence -1 < p + q - 1 < 1. Finally,
-1 < 1 - p - q < 1, so (1 - p - q)m converges to
zero as m increases.
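The steady-state vectors asked for in Exercises 2.9 can be found by power iteration, which converges for any regular stochastic matrix precisely because the non-unit eigenvalue factors die out, as in 12(b). The transition matrix below is a hypothetical regular example (columns sum to 1), not one from the exercises:

```python
import numpy as np

# Power iteration to the steady state s of a regular stochastic matrix P:
# P s = s, and P^m s0 -> s for any initial probability vector s0.
P = np.array([[0.6, 0.3],
              [0.4, 0.7]])          # hypothetical; columns sum to 1
s = np.array([1.0, 0.0])            # any starting probability vector
for _ in range(100):
    s = P @ s                       # s_{m+1} = P s_m
assert np.allclose(P @ s, s)        # steady state reached: Ps = s
assert np.isclose(s.sum(), 1.0)     # still a probability vector
```

For this P the limit is (3/7, 4/7); the second eigenvalue is 0.3, so convergence is geometric.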
Supplementary Exercises for Chapter 2 24. If A is n × n, then det B = (-1)k det A where n = 2k
(Page 124) or n = 2k + 1.
S T S T
1 -1 -2 -1 2 2
y and z are distinct), so k - m = 0 by Example 7
1. (b) -3 1 6 (d) _13 2 -1 2 = A
Section 2.1. n n -3 1 4 2 2 -1
6. (d) Using parts (c) and (b) gives Ipq AIrs = ∑∑aijIpqIijIrs.
i=1 j=1 2. (b) c ≠ 0 (d) any c (f) c ≠ -1
The only nonzero term occurs when i = q 3. (b) -2 4. (b) 1 6. (b) _49 7. (b) 16
S T
and j = r, so Ipq AIrs = aqrIps. 12
11 S T (d) __
1 5 1 4
7. (b) If A = [aij] = ∑i,jaijIij, then IpqAIrs = aqrIps by 8. (b) __ 79 -37
9. (b) __
51
21
6(d). But then aqrIps = AIpqIrs = 0 if q ≠ r, so aqr = 0 -2
if q ≠ r. If q = r, then aqqIps = AIpqIrs = AIps is 10. (b) det A = 1, -1 (d) det A = 1
independent of q. Thus aqq = a11 for all q. (f) det A = 0 if n is odd; nothing can be said if n is even
15. dA where d = det A
Exercises 3.1 The Cofactor Expansion
S T
8−c −c c −6
2 2
(Page 135) 1 0 1
1 −c
1
19. (b) _
c 0 c 1,c≠0 (d) _12 c
1. (b) 0 (d) -1 (f) -39 (h) 0 (j) 2abc (l) 0 -1 c 1 c − 10 c 8 − c
2 2
S T
2a + p 2b + q 2c + r c3 + 1 c
−c 1 c −1
2
8. (b) det 2p + x 2q + y 2r + z =
2x + a 2y + b 2z + c 20. (b) T. det AB = det A det B = det B det A = det BA.
(d) T. det A ≠ 0 means A-1 exists, so AB = AC
a+ p+x b+q+y c +r +z
S T
3det 2p + x 1 1 1
2q + y 2r + z =
implies that B = C. (f) F. If A = 1 1 1 , then
2x + a 2y + b 2z + c
1 1 1
a+ p+x b+q+y c +r +z
adj A = 0. (h) F. If A = S ,T then adj A = S T.
1 1 0 -1
3det p − a q−b r −c = 0 0 0 1
x−p y−q z−r
(j) F. If A = S T, then det(I + A) = -1 but
-1 1
3x 3y 3z 1 -1
1 + det A = 1. (l) F. If A = S T, then det A = 1
3det p − a q − b r − c 1 1
x−p y −q z− r 0 1
but adj A = S T ≠ A.
1 -1
9. (b) F, A = S T (d) F, A = S T → R = S T
1 1 2 0 1 0
2 2 0 1 0 1 0 1
(f) F, A = S T (h) F, A = S T and B = S T
1 1 1 1 1 0 22. (b) 5 - 4x + 2x2.
0 1 0 1 1 1 23. (b) 1 - _53 x + _12 x2 + _76 x3.
10. (b) 35 24. (b) 1 - 0.51x + 2.1x2 - 1.1x3; 1.25, so y = 1.25
11. (b) -6 (d) -6 26. (b) Use induction on n where A is n × n. It is clear if
14. (b) -(x - 2)(x2 + 2x - 12)
n = 1. If n > 1, write A = S T in block form where
a X
15. (b) -7__ 0 B
B is (n - 1) × (n - 1). Then A-1 = S T,
√6
16. (b) ±__ (d) x = ±y a-1 -a-1XB-1
S T
2
S T
x1 y1
x2 y 0 B-1
21. Let x = , y = 2 , and A = [c1 x + y cn] and this is upper triangular because B is upper
xn yn triangular by induction.
S T
where x + y is in column j. Expanding det A along 3 0 1
1
column j (the one containing x + y): 28. -__
21 0 2 3
n 3 1 -1
T(x + y) = det A = ∑(xi + yi)cij(A)
i=1
n n
= ∑xicij(A) + ∑yicij(A)
i=1 i=1
= T(x) + T(y)
Similarly for T(ax) = aT(x).
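The column-linearity of the determinant proved in Exercise 21 is easy to check numerically: fix all columns but one and vary that column. The columns below are arbitrary illustrations:

```python
import numpy as np

# Check of Exercise 21: with c1, c3 fixed, T(x) = det[c1 x c3] is linear
# in the middle column: T(x + y) = T(x) + T(y) and T(ax) = aT(x).
c1 = np.array([1.0, 0.0, 2.0])
c3 = np.array([3.0, 1.0, -1.0])
T = lambda x: np.linalg.det(np.column_stack([c1, x, c3]))
x = np.array([2.0, 5.0, 1.0])
y = np.array([-1.0, 4.0, 0.0])
assert np.isclose(T(x + y), T(x) + T(y))   # additivity in column j
assert np.isclose(T(3.0 * x), 3.0 * T(x))  # homogeneity in column j
```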
34. (b) Have (adj A)A = (det A)I; so taking inverses, A-1 · (adj A)-1 = (1/det A)I. On the other hand, A-1 adj(A-1) = det(A-1)I = (1/det A)I. Comparison yields A-1(adj A)-1 = A-1 adj(A-1), and part (b) follows.
20. (b) If λ ≠ 0, Ax = λx if and only if A-1x = (1/λ)x. The result follows.
21. (b) (A3 - 2A - 3I)x = A3x - 2Ax - 3x = λ3x - 2λx - 3x = (λ3 - 2λ - 3)x.
23. (b) If Am = 0 and Ax = λx, x ≠ 0, then A2x = A(λx)
(d) Write det A = d, det B = e. By the adjugate = λAx = λ2x. In general, Akx = λkx for all k ≥ 1.
formula AB adj(AB) = deI, and AB adj B adj A = Hence, λmx = Amx = 0x = 0, so λ = 0 (because x ≠ 0).
A[eI] adj A = (eI)(dI) = deI. Done as AB is invertible. 24. (a) If Ax = λx, then Akx = λkx for each k. Hence
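The product rule for adjugates in 34(d) can be sanity-checked numerically. For invertible M the adjugate satisfies adj M = (det M)M⁻¹, which the hypothetical helper below uses (it is not a numpy function); the matrices A and B are arbitrary illustrations:

```python
import numpy as np

# Numerical check of Exercise 34(d): adj(AB) = adj(B) adj(A).
def adj(M):
    # adjugate of an invertible matrix via adj(M) = det(M) * M^{-1}
    return np.linalg.det(M) * np.linalg.inv(M)

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 4.0], [2.0, 1.0]])
assert np.allclose(adj(A @ B), adj(B) @ adj(A))  # order reverses
```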
λmx = Amx = x, so λm = 1. As λ is real, λ = ±1 by
Exercises 3.3 Diagonalization and Eigenvalues the Hint. So if P-1AP = D is diagonal, then D2 = I
(Page 166) by Theorem 4. Hence A2 = PD2P = I.
27. (a) We have P-1AP = λI by the diagonalization
1. (b) (x - 3)(x + 2); 3, -2; S T , S T;
4 1
-1 1 algorithm, so A = P(λI)P-1 = λPP-1 = λI.
P=S T; P AP = S T.
4 1 -1 3 0 (b) No. λ = 1 is the only eigenvalue. ___
1
-1 1 0 -2 31. (b) λ1 = 1, stabilizes. (d) λ1 = __ 24
(3 + √69 ) = 1.13,
STS T
1 -3 diverges.
(d) (x - 2)3; 2; 1 , 0 ; No such P; Not diagonalizable. 34. Extinct if α < _15 , stable if α = _15 , diverges if α > _15 .
0 1
S TST
-1 1 Exercises 3.4 An Application to Linear
2
(f) (x + 1) (x - 2); -1, 2; 1 , 2 ; No such P; Recurrences (Page 172)
2 1
Not diagonalizable. Note that this matrix and the 1. (b) xk = _13 [4 - (-2)k] (d) xk = _15 [2k+2 + (-3)k].
matrix in Example 9 have the same characteristic 2. (b) xk = _12 [(-1)k + 1]
polynomial, but that matrix is diagonalizable. 3. (b) xk+4 =__xk + xk+2 + xk+3__; x10 = 169 __
S TST
-1 1 1 __
5. ___
√
[3 + √5 ]λ k1 + __ (-3 + √5 )λ k2 where λ1 = _12 (1 + √5 )
2 5
(h) (x - 1)2(x - 3); 1, 3; 0 , 0 ; No such P; Not and λ2 = _12__(1 - √5 ). __ __
1 1 1 __
7. ___ [2 + √3 ]λ k1 + (-2 + √3 )λ k2 where λ1 = 1 + √3
√
2 3 __
diagonalizable.
ST
1 and λ2 = 1 - √3 .
2. (b) Vk = _73 2k S 2 T. (d) Vk = _32 3k 0 . - _43 Q -_12 Rk. Long term 11_13 million tons.
34
9. __
3
S T S T S T S T
1
1 1 λ λ 1
4. Ax = λx if and only if (A - αI)x = (λ - α)x. 11. (b) A λ = = λ2 = λ λ
λ2
Same eigenvectors. λ2
λ2 a + bλ + cλ2 λ3
8. (b) P-1AP = S T, so A = P S n T P =
1 0 n 1 0 -1
11 k
0 2 0 2 12. (b) xk = __
10
11
3 + __
15
(-2)k - _56
S 6(2n - 1) 9 · 2n - 8 T.
9 - 8 · 2n 12(1 - 2n) 13. (a) pk+2 + qk+2 = [apk+1 + bpk + c(k)] + [aqk+1 + bqk] =
a( pk+1 + qk+1) + b( pk + qk) + c(k)
9. (b) A = S T
0 1 Section 3.5 An Application to Systems of
0 2 Differential Equations (Page 178)
11. (b) and (d) If P-1AP = D is diagonal, then (b) P-1(kA)P = kD is diagonal, and (d) Q-1(U-1AU)Q = D where Q = U-1P.
1. (b) c1(1, 1)e4x + c2(5, -1)e-2x; c1 = -2/3, c2 = 1/3
(d) c1(-8, 10, 7)e-x + c2(1, -2, 1)e2x + c3(1, 0, 1)e4x; c1 = 0, c2 = -1/2, c3 = 3/2
12. [1 1; 0 1] is not diagonalizable by Example 8. But [1 1; 0 1] = [2 1; 0 -1] + [-1 0; 0 2], where [2 1; 0 -1] has diagonalizing matrix P = [1 -1; 0 3] and [-1 0; 0 2] is already diagonal.
3. (b) The solution to (a) is m(t) = 10(4/5)t/3. Hence we want t such that 10(4/5)t/3 = 5. We solve for t by taking natural logarithms: t = 3 ln(1/2) / ln(4/5) = 9.32 hours.
14. We have λ2 = λ for every eigenvalue λ (as λ = 0, 1) so D2 = D, and so A2 = A as in Example 9.
18. (b) crA(x) = det[xI - rA] = rn det[(x/r)I - A] = rn cA(x/r)
5. (a) If g′ = Ag, put f = g - A-1b. Then f′ = g′ and Af = Ag - b, so f′ = g′ = Ag = Af + b, as required.
6. (b) Assume that f1′ = a1f1 + f2 and f2′ = a2f1. Differentiating gives f1″ = a1f1′ + f2′ = a1f1′ + a2f1, proving that f1 satisfies (∗).
S -5 T. S T
Exercises 3.6 Proof of the Cofactor Expansion 5 -20
1
_ 1
Theorem (Page 182) 14. (b) 4
17. (b) Q(0, 7, 3). 18. (b) x = __
40 -13
.
-2 14
2. Consider the rows Rp, Rp+1, …, Rq-1, Rq. In q - p 20. (b) S(-1, 3, 2).
adjacent interchanges they can be put in the order 21. (b) T. v - w = 0 implies that v - w = 0. (d) F.
Rp+1, …, Rq-1, Rq, Rp. Then in q - p - 1 adjacent v = -v for all v but v = -v only holds if v = 0.
interchanges we can obtain the order Rq, Rp+1, …, (f) F. If t < 0 they have the opposite direction.
Rq-1, Rp. This uses 2(q - p) -1 adjacent interchanges (h) F. -5v = 5v for all v, so it fails if v ≠ 0.
in all. (j) F. Take w = -v where v ≠ 0.
S T S T
3 2
Supplementary Exercises for Chapter 3 22. (b) -1 + t -1 ; x = 3 + 2t, y = -1 - t, z = 4 + 5t
(Page 182) 4 5
ST ST
2. (b) If A is 1 × 1, then AT = A. In general, 1 1
(d) 1 + t 1 ; x = y = z = 1 + t
det[Aij] = det[(Aij)T] = det[(AT)ji] by (a) and induction.
1 1
Write AT = [aij] where aij = aji, and expand det AT
S T S T
2 -1
along column 1. (f) -1 + t 0 ; x = 2 - t, y = -1, z = 1 + t
n
det AT = ∑aj1(-1) j+1det[(AT)j1] 1 1
j=1 23. (b) P corresponds to t = 2; Q corresponds to t = 5.
n
= ∑a1j(-1) 1+j
det[A1j] = det A 24. (b) No intersection
j=1 (d) P (2, -1, 3); t = -2, s = -3
where the last equality is the expansion of det A along
29. P(3, 1, 0) or PQ _53 , __, 4R
-1 _
3 3
row 1. → →
31. (b) CPk = -CPn+k if 1 ≤ k ≤ n, where there are
2n points.
Exercises 4.1 Vectors and Lines (Page 195)
1. (b) √6 (d) √5 (f) 3√6
2. (b) (1/3)(-2, -1, 2)
4. (b) √2 (d) 3.
33. DA = 2EA and 2AF = FC, so 2EF = 2(EA + AF) = DA + FC = CB + FC = FC + CB = FB. Hence EF = (1/2)FB. So F is the trisection point of both AC and EB.
6. (b) FE = FC + CE = (1/2)AC + (1/2)CB = (1/2)(AC + CB) = (1/2)AB.

Exercises 4.2 Projections and Planes (Page 209)
7. (b) Yes (d) Yes 1. (b) 6 (d) 0 (f) 0
8. (b) p (d) -(p + q). 2. (b) π or 180° (d) __π3 or 60° (f) __ 2π
3
or 120°
S T ST S T
3. (b) 1 or -17
-1 0 -2
S T ST ST
___ ___
-1 1 0
9. (b) -1 , √27 (d) 0 , 0 (f) 2 , √12
4. (b) t 1 (d) s 2 + t 3
5 0 2
2 0 1
10. (b) (i) Q(5, -1, 2) (ii) Q(1, 1, -4).
S T
-26 6. (b) 29 + 57 = 86
11. (b) x = u - 6v + 5w = 4. 8. (b) A = B = C = __π3 or 60°
11
v (d) -_12 v
S T
19 10. (b) __
(b) S b T = 8
a -5 18
S T S T S T S T
12. 2 53 6 -3
5 1 27 1
c 6 11. (b) __
21 -1 + __
21 26 (d) __
53 -4 + __
53 2
S T S T
3a + 4b + c x1 -4 20 1 26
_____
13. (b) If it holds then -a + c = x2 . 1√
12. (b) __ 71 __
5642 , Q(__ , 15 , __
34
)
26 26 26 26
x3
ST S T
b+c 4
0
S -10 01 11 xx T → S -10 01 11 T
3 4 1 x1 0 4 4 x1 + 3x2 13. (b) 0 (b) -15
2 x2 0 8
3 x3 14. (b) -23x + 32y + 11z = 11 (d) 2x - y + z = 5
If there is to be a solution then x1 + 3x2 = 4x3 must (f) 2x + 3y + 2z = 7 (h) 2x - 7y - 3z = -1
hold. This is not satisfied. (j) x - y - z = 3
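Many of the answers in Exercises 4.2 rest on the projection formula proj_d(u) = (u · d / d · d)d. The sketch below checks it, and the orthogonality of the residual, on arbitrary illustrative vectors (not ones from the exercises):

```python
import numpy as np

# Projection of u on d != 0, and orthogonality of the residual u - proj.
u = np.array([3.0, -1.0, 2.0])
d = np.array([1.0, 1.0, 1.0])
proj = (np.dot(u, d) / np.dot(d, d)) * d       # (u.d / d.d) d
assert np.allclose(proj, np.array([4.0 / 3, 4.0 / 3, 4.0 / 3]))
assert np.isclose(np.dot(u - proj, d), 0.0)    # residual is orthogonal to d
```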
S T ST
15. (b) S y T = -1 + t 1 (d) S y T = 1 + t 1 S T ST
x 2 2 x 1 1 12. Because u and v × w are parallel, the angle θ between
them is 0 or π. Hence cos(θ) = ±1, so the volume is
z 3 0 z -1 1 |u · (v × w)| = uv × wcos(θ) = u(v × w).
ST S T
(f) S y T = 1 + t 1
x 1 4
But the angle between v and w is __π2 so v × w =
z__ vwcos(__π2 ) = vw. The result follows.
2 -5
S T S T S T
√6
16. (b) __3
, Q(_73 , __23 , __
-2
3
) u1 v1 w1
17. (b) Yes. The equation is 5x - 3y - 4z = 0. 15. (b) If u = u2 , v = v2 , and w = w2 , then
19. (b) (-2, 7, 0) + t(3, -5, 2) u3 v3 w3
S T
20. (b) None (d) P(__ , -78 , __
13 ___ 65
) i u1 v1 + w1
19 19 19
21. (b) 3x + 2z = d, d arbitrary (d) a(x - 3) + b( y - 2) u × (v + w) = det j u2 v2 + w2 =
+ c(z + 4) = 0; a, b, and c not all zero k u3 v3 + w3
S T S T
(f) ax + by + (b - a)z = a; a and b not both zero i u1 v1 i u1 w1
(h) ax___ + by + (a - 2b)z = 5a - 4b; a and b not both zero det j u2 v2 + det j u2 w2 = (u × v) + (u × w)
23. (b) √___10 k u3 v3 k u3 w3
√14
24. (b) ___2
, A(3, 1, 2), B(_72 , -_12 , 3) where we used Exercise 21 Section 3.1.
__
√
(d) __6 19
, A(__ 37 __
, 2, _13 ), B(__ , 13 , 0) 16. (b) (v - w) · [(u × v) + (v × w) + (w × u)]
S TS T
√3 17 2 -8 0
apply to the other angles. 3. (b) __1
2 20 4 1
21
39. (b) Let p0, p1 be the vectors of P0, P1, so u = p0 - p1. -8 4 5 -3
S TS T
Then u · n = p0 · n – p1 · n = (ax0 + by0) - (ax1 + by1) 22 -4 20 0
1
= ax0 + by0 + c. Hence the distance is (d) __30 -4 28 10 1
|u · n|
Q n2 Rn = n , as required.
u·n 20 10 -20 -3
_____
______
S TS T
9 0 12 1
1
41. (b) This follows from (a) because v2 = a2 + b2 + c2. (f) __
25 0 0 0 -1
S T S T
12 0 16 7
44. (d) Take y1 = S y T and y2 = S z T in (c).
x1 x x2 y
S TS T
-9 2 -6 2
1
z1 z z2 x (h) __11 2 -9 -6 -5
-6 -6 7 0
S TS T
__
Exercises 4.3 More on the Cross Product √3 -1 0 1
__
(Page 217) 4. (b) _12 1 √3 0 0
S T
1 0 0 1 3
S T
__ __
√3
3. (b) ±__ -1 . 4. (b) 0 (d) √5 5. (b) 7 cos θ 0 -sin θ
3
-1 6. 0 1 0
6. (b) The distance is p - p0; use (a). sin θ 0 cos θ
10.
AB × AC is the area of the parallelogram
determined by A, B, and C.
S T
(Page 244)
2 S TS T
2
1 a x + aby 1 a2 ab x
= _______ = ________
ST ST ST ST
2 2
a + b abx + b y 2
a + b ab b2 y
2 1 1 0 0
1. (b) Yes. If r 1 + s 1 + t 0 = 0 , then r + s = 0,
Exercises 4.5 An Application to Computer 1 1 1 0
Graphics (page 228) r - s = 0, and r + s + t = 0. These equations give
r = s = t = 0.
ST ST ST ST ST
2 + 2 7 2 + 2 3 2 + 2 − 2 + 2 −5 2 + 2 1 1 0 0 0
1. (b) _12 −
3 2+4 3 2+4 5 2+4 2+4 9 2+4 (d) No. Indeed: 1 - 0 + 0 - 1 = 0 .
0 1 1 0 0
2 2 2 2 2 0 0 1 1 0
5. (b) P(_95 , __
18
) 2. (b) Yes. If r(x + y) + s(y + z) + t(z + x) = 0, then
5
(r + t)x + (r + s)y + (s + t)z = 0. Since {x, y, z} is
Supplementary Exercises for Chapter 4 independent, this implies that r + t = 0, r + s = 0,
(Page 228) and s + t = 0. The only solution is r = s = t = 0.
4. 125 knots in a direction θ degrees east of north, where (d) No. In fact, (x + y) - (y + z) + (z + w) - (w + x) = 0.
U S T S TV
cos θ = 0.6 (θ = 53° or 0.93 radians). 2 -1
3. (b) 1 , 1 ; dimension 2.
6. (12, 5). Actual speed 12 knots. 0 1
-1 1
U S T S TV
Exercises 5.1 Subspaces and Spanning -2 1
(Page 234) (d) 0 , 2 ; dimension 2.
3 -1
1. (b) Yes (d) No (f) No. 1 0
U S T S TV U S T S T S TV
2. (b) No (d) Yes, x = 3y + 4z. 1 1 1 -1 0
3. (b) No 4. (b) 1 , -1 ; dimension 2. (d) 0 , 1 , 1 ;
0 1 1 0 0
10. span{a1x1, a2x2, …, akxk} ⊆ span{x1, x2, …, xk} 1 0 0 1 1
U S T S T S TV
by Theorem 1 because, for each i, aixi is in -1 1 1
span{x1, x2, …, xk}. Similarly, the fact that dimension 3. (f) 1 , 0 , 0 ; dimension 3.
xi = ai-1(aixi) is in span{a1x1, a2x2, …, akxk} for each 0 1 0
0 0 1
i shows that span{x1, x2, …, xk} ⊆ span{a1x1, a2x2, …,
5. (b) If r(x + w) + s(y + w) + t(z + w) + u(w) = 0,
akxk}, again by Theorem 1.
then rx + sy + tz + (r + s + t + u)w = 0, so r = 0,
12. If y = r1x1 + + rkxk then
s = 0, t = 0, and r + s + t + u = 0. The only solution
Ay = r1(Ax1) + + rk(Axk) = 0.
is r = s = t = u = 0, so the set is independent. Since
15. (b) x = (x + y) - y = (x + y) + (-y) is in U because
dim 4 = 4, the set is a basis by Theorem 7.
U is a subspace and both x + y and -y = (-1)y are in U.
6. (b) Yes (d) Yes (f) No.
16. (b) True. x = 1x is in U. (d) True. Always span{y, z}
7. (b) T. If ry + sz = 0, then 0x + ry + sz = 0 so
⊆ span{x, y, z} by Theorem 1. Since x is in span{x, y}
r = s = 0 because {x, y, z} is independent.
we have span{x, y, z} ⊆ span{y, z}, again by Theorem 1.
(d) F. If x ≠ 0, take k = 2, x1 = x and x2 = -x.
(f) False. a(1, 0) + b(2, 0) = (a + 2b, 0) cannot equal (0, 1).
(f) F. If y = -x and z = 0, then 1x + 1y + 1z = 0.
20. If U is a subspace, then S2 and S3 certainly hold. (h) T. This is a nontrivial, vanishing linear
Conversely, assume that S2 and S3 hold for U. Since combination, so the xi cannot be independent.
U is nonempty, choose x in U. Then 0 = 0x is in U by 10. If rx2 + sx3 + tx5 = 0 then 0x1 + rx2 + sx3 + 0x4 +
S3, so S1 also holds. This means that U is a subspace. tx5 + 0x6 = 0 so r = s = t = 0.
22. (b) The zero vector 0 is in U + W because 0 = 0 + 0. 12. If t1x1 + t2(x1 + x2) + + tk(x1 + x2 + + xk) = 0,
Let p and q be vectors in U + W, say p = x1 + y1 then (t1 + t2 + + tk)x1 + (t2 + + tk)x2 + +
and q = x2 + y2 where x1 and x2 are in U, and y1 and (tk-1 + tk)xk-1 + (tk)xk = 0. Hence all these
y2 are in W. Then p + q = (x1 + x2) + (y1 + y2) is coefficients are zero, so we obtain successively
in U + W because x1 + x2 is in U and y1 + y2 is in tk = 0, tk-1 = 0, …, t2 = 0, t1 = 0.
W. Similarly, a(p + q) = ap + aq is in U + W for 16. (b) We show AT is invertible (then A is invertible). Let
any scalar a because ap is in U and aq is in W. Hence ATx = 0 where x = [s t]T. This means as + ct = 0 and
U + W is indeed a subspace of n. bs + dt = 0, so s(ax + by) + t(cx + dy) = (sa + tc)x +
(sb + td)y = 0. Hence s = t = 0 by hypothesis.
15. If ATAx = λx, then ∥Ax∥2 = (Ax) · (Ax) = xTATAx = xT(λx) = λ∥x∥2.
17. (b) Each V-1xi is in null(AV) because AV(V-1xi) = Axi = 0. The set {V-1x1, …, V-1xk} is independent
as V-1 is invertible. If y is in null(AV ), then Vy is in
null(A) so let Vy = t1x1 + + tkxk where each tk is in Exercises 5.4 Rank of a Matrix (Page 260)
. Thus y = t1V-1x1 + + tkV-1xk is in
U S T S TV U S T S TV
span{V-1x1, …, V-1xk}. 2 0 2 1
20. We have {0} ⊆ U ⊆ W where dim{0} = 0 and 1. (b) -1 0 , ; -2 , 1 ;2
4 3
dim W = 1. Hence dim U = 0 or dim U = 1 by 1 1 -6 0
U S T S TV
Theorem 8, that is U = 0 or U = W, again by 1 0
-1 0 U S -3 T S -2 T V
2, 0 ; 1 3
Theorem 8. (d) , ;2
3 1
U S T S T S TV
1 0 0
U S T S T S TV
Exercises 5.3 Orthogonality (Page 252) 1 0 0
1 -2 0
U ST S TV
2. (b) 0 , 2 , 2 (d) 5, 1, 0
S T
1 4 2
1__ 1___ 1___ 0 5 -3 -6 -1 1
1. (b) __ √3
1 , ___
√42
1 , ___
√14
-3 . 0 1 6
1 -5 1 3. (b) No; no (d) No
S T
(b) S b T = _12 (a - c) 0 + __ ST
a 1 1 (f) Otherwise, if A is m × n, we have
1
3. (a + 4b + c) 4 +
c
18 m = dim(row A) = rank A = dim(col A) = n
-1 1
S T
2 4. Let A = [c1 cn]. Then col A = span{c1, …, cn} =
1
_ (2a - b + 2c) -1 . {x1c1 + + xncn | xi in } = {Ax | x in n}.
U S T S TV
9
2 6 5
ST S T
(d) S b T = _13 (a + b + c) 1 + _12 (a - b) -1 +
a 1 1 0 0
7. (b) The basis is -4 , -3 , so the dimension is 2.
c 1 0 1 0
S T
1 0 1
1
_ (a + b - 2c) 1 . Have rank A = 3 and n - 3 = 2.
6
-2 8. (b) n - 1
(b) If r1c1 + + rncn = 0, let x = [r1, …, rn]T. Then
S T S T S T
14 9.
2 2
Cx = r1c1 + + rncn = 0, so x is in null A = 0.
4. (b) 1 = 3 -1 + 4 1 .
-8 0 -2 Hence each ri = 0.
5 3 -1 10. (b) Write r = rank A. Then (a) gives
S T
-1 r = dim(col A) ≤ dim(null A) = n - r.
5. (b) t 3 , t in . 12. We have rank(A) = dim[col(A)] and
10
11
___ rank (AT) = dim[row(AT)]. Let {c1, c2, …, ck}
6. (b) √29 (d) 19 be a basis of col(A); it suffices to show that
(b) F. x = S T and y = S T.
1 0 {c T1 , c T2 , …, c Tk } is a basis of row(AT). But if
7.
0 1 t1c T1 + t2c T2 + + tkc Tk = 0, tj in , then (taking
(d) T. Every xi · yj = 0 by assumption, every transposes) t1c1 + t2c2 + + tkck = 0 so each
xi · xj = 0 if i ≠ j because the xi are orthogonal, and tj = 0. Hence {c T1 , c T2 , …, c Tk } is independent.
every yi · yj = 0 if i ≠ j because the yi are orthogonal. Given v in row(AT) then vT is in col(A); say
As all the vectors are nonzero, this does it. vT = s1c1 + s2c2 + + skck, sj in : Hence
(f) T. Every pair of distinct vectors in the set {x} has v = s1c T1 + s2c T2 + + skc Tk , so {c T1 , c T2 , …, c Tk } spans
dot product zero (there are no such pairs). row(AT), as required.
9. Let c1, …, cn be the columns of A. Then row i of AT 15. (b) Let {u1, …, ur} be a basis of col(A). Then b is not
is c Ti , so the (i, j)-entry of ATA is c Ti cj = ci · cj = 0, 1 in col(A), so {u1, …, ur, b} is linearly independent.
according as i ≠ j, i = j. So ATA = I. Show that col[A b] = span{u1, …, ur, b}.
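The identity ATA = I for matrices with orthonormal columns, proved in Exercise 9, is easy to check numerically. A convenient source of orthonormal columns is the Q factor of a QR factorization of an arbitrary matrix (an illustration, not part of the exercise):

```python
import numpy as np

# If the columns of Q are orthonormal, the (i, j)-entry of Q^T Q is
# c_i . c_j, which is 0 or 1 according as i != j or i = j, so Q^T Q = I.
M = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, -1.0]])         # arbitrary 3 x 2 matrix of rank 2
Q, _ = np.linalg.qr(M)              # columns of Q are orthonormal
assert np.allclose(Q.T @ Q, np.eye(2))
```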
11. (b) Take n = 3 in (a), expand, and simplify.
12. (b) We have (x + y) · (x - y) = ∥x∥2 - ∥y∥2. Hence (x + y) · (x - y) = 0 if and only if ∥x∥2 = ∥y∥2; if and only if ∥x∥ = ∥y∥, where we used the fact that ∥x∥ ≥ 0 and ∥y∥ ≥ 0.

Exercises 5.5 Similarity and Diagonalization (Page 271)

1. (b) traces = 2, ranks = 2, but det A = -5, det B = -1
(d) ranks = 2, determinants = 7, but tr A = 5, tr B = 4
(f) traces = -5, determinants = 0, but rank A = 2,
rank B = 1
Selected Answers 563
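The similarity invariants compared in Exercise 1 of Section 5.5 (trace, determinant, rank) can be spot-checked numerically. The matrices A and P below are illustrative choices, not taken from any exercise.

```python
import numpy as np

# Similar matrices B = P^-1 A P share trace, determinant, and rank,
# the invariants compared in Exercise 1 of Section 5.5.
A = np.array([[2.0, 1.0], [0.0, -1.0]])
P = np.array([[1.0, 2.0], [1.0, 3.0]])   # invertible: det P = 1
B = np.linalg.inv(P) @ A @ P

trace_match = np.isclose(np.trace(A), np.trace(B))
det_match = np.isclose(np.linalg.det(A), np.linalg.det(B))
rank_match = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)
print(trace_match, det_match, rank_match)
```

Matching invariants do not prove similarity; as the exercise shows, two matrices can agree on some invariants and still fail to be similar.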
3. (b) If B = P⁻¹AP, then B⁻¹ = P⁻¹A⁻¹(P⁻¹)⁻¹ = P⁻¹A⁻¹P.

4. (b) Yes, P = [−1 0 6; 0 1 0; 1 0 5], P⁻¹AP = [−3 0 0; 0 −3 0; 0 0 8]
(d) No, cA(x) = (x + 1)(x − 4)² so λ = 4 has multiplicity 2. But dim(E4) = 1, so Theorem 6 applies.

8. (b) If B = P⁻¹AP and Aᵏ = 0, then Bᵏ = (P⁻¹AP)ᵏ = P⁻¹AᵏP = P⁻¹0P = 0.

9. (b) The eigenvalues of A are all equal (they are the diagonal elements), so if P⁻¹AP = D is diagonal, then D = λI. Hence A = P(λI)P⁻¹ = λI.

10. (b) A is similar to D = diag(λ1, λ2, …, λn) so (Theorem 1) tr A = tr D = λ1 + λ2 + ⋯ + λn.

12. (b) TP(A)TP(B) = (P⁻¹AP)(P⁻¹BP) = P⁻¹(AB)P = TP(AB).

13. (b) If A is diagonalizable, so is Aᵀ, and they have the same eigenvalues. Use (a).

17. (b) cB(x) = [x − (a + b + c)][x² − k] where k = a² + b² + c² − [ab + ac + bc]. Use Theorem 7.

Exercises 5.6 Best Approximation and Least Squares (Page 281)

1. (b) (1/12)(−20, 46, 95)ᵀ, (AᵀA)⁻¹ = (1/12)[8 −10 −18; −10 14 24; −18 24 43]

2. (b) 64/13 − (6/13)x (d) −4/10 − (17/10)x

3. (b) y = 0.127 − 0.024x + 0.194x², (MᵀM)⁻¹ = (1/4248)[3348 642 −426; 642 571 −187; −426 −187 91]

4. (b) (1/92)(−46x + 66x² + 60 · 2ˣ), (MᵀM)⁻¹ = (1/46)[115 0 −46; 0 17 −18; −46 −18 38]

5. (b) (1/20)[18 + 21x + 28 sin(πx/2)], (MᵀM)⁻¹ = (1/40)[24 −2 14; −2 1 3; 14 3 49]

7. s = 99.71 − 4.87x; the estimate of g is 9.74. [The true value of g is 9.81.] If a quadratic in s is fit, the result is s = 101 − (3/2)t − (9/2)t², giving g = 9; (MᵀM)⁻¹ = (1/2)[38 −42 10; −42 49 −12; 10 −12 3].

9. y = −5.19 + 0.34x1 + 0.51x2 + 0.71x3, (AᵀA)⁻¹ = (1/25080)[517860 −8016 5040 −22650; −8016 208 −316 400; 5040 −316 1300 −1090; −22650 400 −1090 1975]

10. (b) f(x) = a0 here, so the sum of squares is S = ∑(yi − a0)² = na0² − 2a0∑yi + ∑yi². Completing the square gives S = n[a0 − (1/n)∑yi]² + [∑yi² − (1/n)(∑yi)²]. This is minimal when a0 = (1/n)∑yi.

13. (b) Here f(x) = r0 + r1e^x. If f(x1) = 0 = f(x2) where x1 ≠ x2, then r0 + r1e^(x1) = 0 = r0 + r1e^(x2), so r1(e^(x1) − e^(x2)) = 0. Hence r1 = 0 = r0.

Exercises 5.7 An Application to Correlation and Variance (Page 287)

2. Let X denote the number of years of education, and let Y denote the yearly income (in 1000's). Then x̄ = 15.3, sx² = 9.12 and sx = 3.02, while ȳ = 40.3, sy² = 114.23 and sy = 10.69. The correlation is r(X, Y) = 0.599.

4. (b) Given the sample vector x = (x1, x2, …, xn)ᵀ, let z = (z1, z2, …, zn)ᵀ where zi = a + bxi for each i. By (a) we have z̄ = a + bx̄, so sz² = (1/(n−1))∑(zi − z̄)² = (1/(n−1))∑[(a + bxi) − (a + bx̄)]² = (1/(n−1))∑b²(xi − x̄)² = b²sx². Now (b) follows because √(b²) = |b|.

Supplementary Exercises for Chapter 5 (Page 287)

(b) F (d) T (f) T (h) F (j) F (l) T (n) F (p) F (r) F

Exercises 6.1 Examples and Basic Properties (Page 295)

1. (b) No; S5 fails. (d) No; S4 and S5 fail.

2. (b) No; only A1 fails. (d) No
(f) Yes (h) Yes (j) No
(l) No; only S3 fails.
(n) No; only S4 and S5 fail.

4. The zero vector is (0, −1); the negative of (x, y) is (−x, −2 − y).

5. (b) x = (1/7)(5u − 2v), y = (1/7)(4u − 3v)

6. (b) Equating entries gives a + c = 0, b + c = 0, b + c = 0, a − c = 0. The solution is a = b = c = 0.
(d) If a sin x + b cos x + c = 0 in F[0, π], then this must hold for every x in [0, π]. Taking x = 0, π/2, and π, respectively, gives b + c = 0, a + c = 0, −b + c = 0, whence a = b = c = 0.

7. (b) 4w

10. If z + v = v for all v, then z + v = 0 + v, so z = 0 by cancellation.

12. (b) (−a)v + av = (−a + a)v = 0v = 0 by Theorem 3. Because also −(av) + av = 0 (by the definition of −(av) in axiom A5), this means that (−a)v = −(av) by cancellation. Alternatively, use Theorem 3(4) to give (−a)v = [(−1)a]v = (−1)(av) = −(av).
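The least-squares answers in Section 5.6 above all come from the normal equations z = (MᵀM)⁻¹Mᵀy. A minimal sketch of that computation, using a small illustrative data set (not the data of any specific exercise):

```python
import numpy as np

# Least-squares quadratic fit via the normal equations z = (M^T M)^-1 M^T y,
# the method behind the Section 5.6 answers. The data are illustrative.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 1.9, 4.2, 8.8, 17.1])
M = np.column_stack([np.ones_like(xs), xs, xs**2])  # columns 1, x, x^2
z = np.linalg.solve(M.T @ M, M.T @ ys)              # best-fit coefficients

# At a least-squares solution the residual is orthogonal to col M.
residual = ys - M @ z
orthogonal = np.allclose(M.T @ residual, 0.0)
print(z, orthogonal)
```

Solving the system directly is numerically preferable to forming (MᵀM)⁻¹ explicitly, though the book's answers report the inverse for checking by hand.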
13. (b) The case n = 1 is clear, and n = 2 is axiom S3. If n > 2, then (a1 + a2 + ⋯ + an)v = [a1 + (a2 + ⋯ + an)]v = a1v + (a2 + ⋯ + an)v = a1v + (a2v + ⋯ + anv) using the induction hypothesis; so it holds for all n.

15. (c) If av = aw, then v = 1v = (a⁻¹a)v = a⁻¹(av) = a⁻¹(aw) = (a⁻¹a)w = 1w = w.

Exercises 6.2 Subspaces and Spanning Sets (Page 301)

1. (b) Yes (d) Yes (f) No; not closed under addition or scalar multiplication, and 0 is not in the set.

2. (b) Yes (d) Yes (f) No; not closed under addition.

3. (b) No; not closed under addition. (d) No; not closed under scalar multiplication. (f) Yes

5. (b) If entry k of x is xk ≠ 0, and if y is in ℝⁿ, then y = Ax where column k of A is xk⁻¹y, and the other columns are zero.

6. (b) −3(x + 1) + 0(x² + x) + 2(x² + 2)
(d) (2/3)(x + 1) + (1/3)(x² + x) − (1/3)(x² + 2)

7. (b) No (d) Yes; v = 3u − w.

8. (b) Yes; 1 = cos²x + sin²x
(d) No. If 1 + x² = a cos²x + b sin²x, then taking x = 0 and x = π gives a = 1 and a = 1 + π².

9. (b) Because P2 = span{1, x, x²}, it suffices to show that {1, x, x²} ⊆ span{1 + 2x², 3x, 1 + x}. But x = (1/3)(3x); 1 = (1 + x) − x and x² = (1/2)[(1 + 2x²) − 1].

11. (b) u = (u + w) − w, v = −(u − v) + (u + w) − w, and w = w

14. No

17. (b) Yes.

18. v1 = (1/a1)u − (a2/a1)v2 − ⋯ − (an/a1)vn, so V ⊆ span{u, v2, …, vn}.

21. (b) v = (u + v) − u is in U.

22. Given the condition and u ∈ U, 0 = u + (−1)u ∈ U. The converse holds by the subspace test.

Exercises 6.3 Linear Independence and Dimension (Page 309)

1. (b) If ax² + b(x + 1) + c(1 − x − x²) = 0, then a − c = 0, b − c = 0, b + c = 0, so a = b = c = 0.
(d) If a[1 1; 1 0] + b[0 1; 1 1] + c[1 0; 1 1] + d[1 1; 0 1] = [0 0; 0 0], then a + c + d = 0, a + b + d = 0, a + b + c = 0, and b + c + d = 0, so a = b = c = d = 0.

2. (b) 3(x² − x + 3) − 2(2x² + x + 5) + (x² + 5x + 1) = 0
(d) 2[−1 0; 0 −1] + [1 −1; −1 1] + [1 1; 1 1] = [0 0; 0 0]
(f) 5/(x² + x − 6) + 1/(x² − 5x + 6) − 6/(x² − 9) = 0

3. (b) Dependent: 1 − sin²x − cos²x = 0

4. (b) x ≠ −1/3

5. (b) If r(−1, 1, 1) + s(1, −1, 1) + t(1, 1, −1) = (0, 0, 0), then −r + s + t = 0, r − s + t = 0, and r + s − t = 0, and this implies that r = s = t = 0. This proves independence. To prove that they span ℝ³, observe that (0, 0, 1) = (1/2)[(−1, 1, 1) + (1, −1, 1)], so (0, 0, 1) lies in span{(−1, 1, 1), (1, −1, 1), (1, 1, −1)}. The proof is similar for (0, 1, 0) and (1, 0, 0).
(d) If r(1 + x) + s(x + x²) + t(x² + x³) + ux³ = 0, then r = 0, r + s = 0, s + t = 0, and t + u = 0, so r = s = t = u = 0. This proves independence. To show that they span P3, observe that x² = (x² + x³) − x³, x = (x + x²) − x², and 1 = (1 + x) − x, so {1, x, x², x³} ⊆ span{1 + x, x + x², x² + x³, x³}.

6. (b) {1, x + x²}; dimension = 2
(d) {1, x²}; dimension = 2

7. (b) {[1 1; −1 0], [1 0; 0 1]}; dimension = 2
(d) {[1 0; 1 1], [0 1; −1 0]}; dimension = 2

8. (b) {[1 0; 0 0], [0 1; 0 0]}

10. (b) dim V = 7

11. (b) {x² − x, x(x² − x), x²(x² − x), x³(x² − x)}; dim V = 4

12. (b) No. Any linear combination f of such polynomials has f(0) = 0.
(d) No. {[1 0; 0 1], [1 1; 0 1], [1 0; 1 1], [0 1; 1 1]} consists of invertible matrices.
(f) Yes. 0u + 0v + 0w = 0 for every set {u, v, w}.
(h) Yes. su + t(u + v) = 0 gives (s + t)u + tv = 0, whence s + t = 0 = t.
(j) Yes. If ru + sv = 0, then ru + sv + 0w = 0, so r = 0 = s.
(l) Yes. u + v + w ≠ 0 because {u, v, w} is independent.
(n) Yes. If I is independent, then |I| ≤ n by the fundamental theorem because any basis spans V.

15. If a linear combination of the subset vanishes, it is a linear combination of the vectors in the larger set (coefficients outside the subset are zero), so it is trivial.

19. Because {u, v} is linearly independent, su + tv = 0 is equivalent to [a c; b d][s; t] = [0; 0]. Now apply Theorem 5 Section 2.4.

23. (b) Independent (d) Dependent. For example, (u + v) − (v + w) + (w + z) − (z + u) = 0.

26. If z is not real and az + bz² = 0, then a + bz = 0 (z ≠ 0). Hence if b ≠ 0, then z = −ab⁻¹ is real. So b = 0, and so a = 0. Conversely, if z is real, say z = a, then (−a)z + 1z² = 0, contrary to the independence of {z, z²}.

29. (b) If Ux = 0, x ≠ 0 in ℝⁿ, then Rx = 0 where R ≠ 0 is row 1 of U. If B ∈ Mmn has each row equal to R, then Bx ≠ 0. But if B = ∑riAiU, then Bx = ∑riAiUx = 0. So {AiU} cannot span Mmn.
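Independence claims like the one in Exercise 5(b) of Section 6.3 reduce to a determinant (or rank) computation on the matrix whose columns are the given vectors:

```python
import numpy as np

# Exercise 5(b), Section 6.3: (-1,1,1), (1,-1,1), (1,1,-1) are independent
# exactly when the matrix built from them is invertible (nonzero determinant).
V = np.array([[-1.0, 1.0, 1.0],
              [ 1.0, -1.0, 1.0],
              [ 1.0, 1.0, -1.0]])
d = np.linalg.det(V)
independent = not np.isclose(d, 0.0)
print(d, independent)
```

Since there are three independent vectors in ℝ³, they automatically span, which is the second half of the exercise's argument.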
33. (b) If U ∩ W = 0 and ru + sw = 0, then ru = −sw is in U ∩ W, so ru = 0 = sw. Hence r = 0 = s because u ≠ 0 ≠ w. Conversely, if v ≠ 0 lies in U ∩ W, then 1v + (−1)v = 0, contrary to hypothesis.

36. (b) dim On = n/2 if n is even and dim On = (n + 1)/2 if n is odd.

Exercises 6.4 Finite Dimensional Spaces (Page 318)

1. (b) {(0, 1, 1), (1, 0, 0), (0, 1, 0)} (d) {x² − x + 1, 1, x}

2. (b) Any three except {x² + 3, x + 2, x² − 2x − 1}

3. (b) Add (0, 1, 0, 0) and (0, 0, 1, 0). (d) Add 1 and x³.

4. (b) If z = a + bi, then a ≠ 0 and b ≠ 0. If rz + s z̄ = 0, then (r + s)a = 0 and (r − s)b = 0. This means that r + s = 0 = r − s, so r = s = 0. Thus {z, z̄} is independent; it is a basis because dim ℂ = 2.

5. (b) The polynomials in S have distinct degrees.

6. (b) {4, 4x, 4x², 4x³} is one such basis of P3. However, there is no basis of P3 consisting of polynomials that have the property that their coefficients sum to zero. For if such a basis exists, then every polynomial in P3 would have this property (because sums and scalar multiples of such polynomials have the same property).

7. (b) Not a basis (d) Not a basis

8. (b) Yes; no

10. det A = 0 if and only if A is not invertible; if and only if the rows of A are dependent (Theorem 3 Section 5.2); if and only if some row is a linear combination of the others (Lemma 2).

11. (b) No. {(0, 1), (1, 0)} ⊆ {(0, 1), (1, 0), (1, 1)}.
(d) Yes. See Exercise 15 Section 6.3.

15. If v ∈ U then W = U; if v ∉ U then {v1, v2, …, vk, v} is a basis of W by the independent lemma.

18. (b) Two distinct planes through the origin (U and W) meet in a line through the origin (U ∩ W).

23. (b) The set {(1, 0, 0, 0, …), (0, 1, 0, 0, 0, …), (0, 0, 1, 0, 0, …), …} contains independent subsets of arbitrary size.

25. (b) ℝu + ℝw = {ru + sw | r, s in ℝ} = span{u, w}

Exercises 6.5 An Application to Polynomials (Page 324)

2. (b) 3 + 4(x − 1) + 3(x − 1)² + (x − 1)³ (d) 1 + (x − 1)³

6. (b) The polynomials are (x − 1)(x − 2), (x − 1)(x − 3), (x − 2)(x − 3). Use a0 = 3, a1 = 2, and a2 = 1.

7. (b) f(x) = (3/2)(x − 2)(x − 3) − 7(x − 1)(x − 3) + (13/2)(x − 1)(x − 2).

10. (b) If r(x − a)² + s(x − a)(x − b) + t(x − b)² = 0, then evaluation at x = a (x = b) gives t = 0 (r = 0). Thus s(x − a)(x − b) = 0, so s = 0. Use Theorem 4 Section 6.4.

11. (b) Suppose {p0(x), p1(x), …, pn−2(x)} is a basis of Pn−2. We show that {(x − a)(x − b)p0(x), (x − a)(x − b)p1(x), …, (x − a)(x − b)pn−2(x)} is a basis of Un. It is a spanning set by part (a), so assume that a linear combination vanishes with coefficients r0, r1, …, rn−2. Then (x − a)(x − b)[r0p0(x) + ⋯ + rn−2pn−2(x)] = 0, so r0p0(x) + ⋯ + rn−2pn−2(x) = 0 by the Hint. This implies that r0 = ⋯ = rn−2 = 0.

Exercises 6.6 An Application to Differential Equations (Page 329)

1. (b) e^(1−x) (d) (e^(2x) − e^(−3x))/(e² − e^(−3)) (f) 2e^(2x)(1 + x)
(h) (e^(ax) − e^(a(2−x)))/(1 − e^(2a)) (j) e^(π−2x) sin x

4. (b) ce^(−x) + 2, c a constant

5. (b) ce^(−3x) + de^(2x) − x³/3

6. (b) t = 3 ln(1/2)/ln(4/5) = 9.32 hours

8. k = (π/15)² = 0.044

Supplementary Exercises for Chapter 6 (Page 330)

2. (b) If YA = 0, Y a row, we show that Y = 0; thus Aᵀ (and hence A) is invertible. Given a column c in ℝⁿ, write c = ∑ri(Avi) where each ri is in ℝ. Then Yc = ∑riYAvi = 0, so Y = YIn = Y[e1 e2 ⋯ en] = [Ye1 Ye2 ⋯ Yen] = [0 0 ⋯ 0] = 0, as required.

4. We have null A ⊆ null(AᵀA) because Ax = 0 implies (AᵀA)x = 0. Conversely, if (AᵀA)x = 0, then ‖Ax‖² = (Ax)ᵀ(Ax) = xᵀAᵀAx = 0. Thus Ax = 0.

Exercises 7.1 Examples and Elementary Properties (Page 336)

1. (b) T(v) = vA where A = [1 0 0; 0 1 0; 0 0 −1]
(d) T(A + B) = P(A + B)Q = PAQ + PBQ = T(A) + T(B); T(rA) = P(rA)Q = rPAQ = rT(A)
(f) T[(p + q)(x)] = (p + q)(0) = p(0) + q(0) = T[p(x)] + T[q(x)]; T[(rp)(x)] = (rp)(0) = r(p(0)) = rT[p(x)]
(h) T(X + Y) = (X + Y) · Z = X · Z + Y · Z = T(X) + T(Y), and T(rX) = (rX) · Z = r(X · Z) = rT(X)
(j) If v = (v1, …, vn) and w = (w1, …, wn), then T(v + w) = (v1 + w1)e1 + ⋯ + (vn + wn)en = (v1e1 + ⋯ + vnen) + (w1e1 + ⋯ + wnen) = T(v) + T(w); T(av) = (av1)e1 + ⋯ + (avn)en = a(v1e1 + ⋯ + vnen) = aT(v)
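The two numerical answers in Section 6.6 above are quick to confirm on a machine:

```python
import math

# Numerical check of the closed-form answers in Section 6.6:
# Exercise 6(b): t = 3 ln(1/2) / ln(4/5), and Exercise 8: k = (pi/15)^2.
t = 3 * math.log(0.5) / math.log(0.8)
k = (math.pi / 15) ** 2
print(round(t, 2), round(k, 3))   # 9.32 and 0.044, as stated
```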
2. (b) rank(A + B) ≠ rank A + rank B in general. For example, A = [1 0; 0 1] and B = [1 0; 0 −1].
(d) T(0) = 0 + u = u ≠ 0, so T is not linear by Theorem 1.

3. (b) T(3v1 + 2v2) = 0 (d) T[(1, −7)ᵀ] = (4, −3)ᵀ
(f) T(2 − x + 3x²) = 46

4. (b) T(x, y) = (1/3)(x − y, 3y, x − y); T(−1, 2) = (−1, 2, −1)
(d) T[a b; c d] = 3a − 3c + 2b

5. (b) T(v) = (1/3)(7v − 9w), T(w) = (1/3)(v + 3w)

8. (b) T(v) = (−1)v for all v in V, so T is the scalar operator −1

12. If T(1) = v, then T(r) = T(r·1) = rT(1) = rv for all r in ℝ.

15. (b) 0 is in U = {v ∈ V | T(v) ∈ P} because T(0) = 0 is in P. If v and w are in U, then T(v) and T(w) are in P. Hence T(v + w) = T(v) + T(w) is in P and T(rv) = rT(v) is in P, so v + w and rv are in U.

18. Suppose rv + sT(v) = 0. If s = 0, then r = 0 (because v ≠ 0). If s ≠ 0, then T(v) = av where a = −s⁻¹r. Thus v = T²(v) = T(av) = a²v, so a² = 1, again because v ≠ 0. Hence a = ±1. Conversely, if T(v) = ±v, then {v, T(v)} is certainly not independent.

21. (b) Given such a T, write T(x) = a. If p = p(x) = ∑aixⁱ, then T(p) = ∑aiT(xⁱ) = ∑ai[T(x)]ⁱ = ∑aiaⁱ = p(a) = Ea(p). Hence T = Ea.

Exercises 7.2 Kernel and Image of a Linear Transformation (Page 344)

1. (b) {(−3, 7, 0, 1), (1, 1, 1, 0)}; {…}; 2, 2
(d) {…}; {…}; 2, 1

2. (b) {x² − x}; {(1, 0), (0, 1)}
(d) {(0, 0, 1)}; {(1, 1, 0, 0), (0, 0, 1, 1)}
(f) {[1 0; 0 −1], [0 1; 0 0], [0 0; 1 0]}; {1}
(h) {(1, 0, 0, …, 0, −1), (0, 1, 0, …, 0, −1), …, (0, 0, 0, …, 1, −1)}; {1}
(j) {[0 1; 0 0], [0 0; 0 1]}; {[1 1; 0 0], [0 0; 1 1]}

3. (b) T(v) = 0 = (0, 0) if and only if P(v) = 0 and Q(v) = 0; that is, if and only if v is in ker P ∩ ker Q.

4. (b) ker T = span{(−4, 1, 3)}; B = {(1, 0, 0), (0, 1, 0), (−4, 1, 3)}, im T = span{(1, 2, 0, 3), (1, −1, −3, 0)}

6. (b) Yes. dim(im T) = 5 − dim(ker T) = 3, so im T = W as dim W = 3. (d) No. T = 0 : ℝ² → ℝ²
(f) No. T : ℝ² → ℝ², T(x, y) = (y, 0). Then ker T = im T
(h) Yes. dim V = dim(ker T) + dim(im T) ≤ dim W + dim W = 2 dim W
(j) No. Consider T : ℝ² → ℝ² with T(x, y) = (y, 0). (l) No. Same example as (j).
(n) No. Define T : ℝ² → ℝ² by T(x, y) = (x, 0). If v1 = (1, 0) and v2 = (0, 1), then ℝ² = span{v1, v2} but ℝ² ≠ span{T(v1), T(v2)}.

7. (b) Given w in W, let w = T(v), v in V, and write v = r1v1 + ⋯ + rnvn. Then w = T(v) = r1T(v1) + ⋯ + rnT(vn).

8. (b) im T = {∑rivi | ri in ℝ} = span{vi}.

10. T is linear and onto. Hence 1 = dim ℝ = dim(im T) = dim(Mnn) − dim(ker T) = n² − dim(ker T).

12. The condition means ker(TA) ⊆ ker(TB), so dim[ker(TA)] ≤ dim[ker(TB)]. Then Theorem 4 gives dim[im(TA)] ≥ dim[im(TB)]; that is, rank A ≥ rank B.

15. (b) B = {x − 1, …, xⁿ − 1} is independent (distinct degrees) and contained in ker T. Hence B is a basis of ker T by (a).

20. Define T : Mnn → Mnn by T(A) = A − Aᵀ for all A in Mnn. Then ker T = U and im T = V by Example 3, so the dimension theorem gives n² = dim Mnn = dim(U) + dim(V).

22. Define T : Mnn → ℝⁿ by T(A) = Ay for all A in Mnn. Then T is linear with ker T = U, so it is enough to show that T is onto (then dim U = n² − dim(im T) = n² − n). We have T(0) = 0. Let y = (y1, y2, …, yn)ᵀ ≠ 0 in ℝⁿ. If yk ≠ 0, let ck = yk⁻¹y, and let cj = 0 if j ≠ k. If A = [c1 c2 ⋯ cn], then T(A) = Ay = y1c1 + ⋯ + ykck + ⋯ + yncn = y. This shows that T is onto, as required.

29. (b) By Lemma 2 Section 6.4, let {u1, …, um, …, un} be a basis of V where {u1, …, um} is a basis of U. By Theorem 3 Section 7.1 there is a linear transformation S : V → V such that S(ui) = ui for 1 ≤ i ≤ m, and S(ui) = 0 if i > m. Because each ui is in im S, U ⊆ im S. But if S(v) is in im S, write v = r1u1 + ⋯ + rmum + ⋯ + rnun. Then S(v) = r1S(u1) + ⋯ + rmS(um) = r1u1 + ⋯ + rmum is in U. So im S ⊆ U.

Exercises 7.3 Isomorphisms and Composition (Page 354)

1. (b) T is onto because T(1, −1, 0) = (1, 0, 0), T(0, 1, −1) = (0, 1, 0), and T(0, 0, 1) = (0, 0, 1). Use Theorem 3.
(d) T is one-to-one because 0 = T(X) = UXV implies that X = 0 (U and V are invertible). Use Theorem 3.
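The dimension theorem used repeatedly in Section 7.2 (nullity + rank = number of columns, for the matrix transformation TA(x) = Ax) is easy to illustrate numerically; the matrix below is an arbitrary example with a nontrivial kernel.

```python
import numpy as np

# Dimension theorem for T_A(x) = Ax:
# dim(ker T_A) + dim(im T_A) = n, the number of columns of A.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1: second row is twice the first
rank = np.linalg.matrix_rank(A)   # dim(im T_A)
nullity = A.shape[1] - rank       # dim(ker T_A)
print(rank, nullity)
```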
(f) T is one-to-one because 0 = T(v) = kv implies that v = 0 (because k ≠ 0). T is onto because T((1/k)v) = v for all v. [Here Theorem 3 does not apply if dim V is not finite.]
(h) T is one-to-one because T(A) = 0 implies Aᵀ = 0, whence A = 0. Use Theorem 3.

4. (b) ST(x, y, z) = (x + y, 0, y + z), TS(x, y, z) = (x, 0, z)
(d) ST[a b; c d] = [c 0; 0 d], TS[a b; c d] = [0 a; d 0]

5. (b) T²(x, y) = T(x + y, 0) = (x + y, 0) = T(x, y). Hence T² = T.
(d) T²[a b; c d] = (1/2)T[a+c b+d; a+c b+d] = (1/2)[a+c b+d; a+c b+d] = T[a b; c d]

6. (b) No inverse; (1, −1, 1, −1) is in ker T.
(d) T⁻¹[a b; c d] = (1/5)[3a − 2c  3b − 2d; a + c  b + d]
(f) T⁻¹(a, b, c) = (1/2)[2a + (b − c)x − (2a − b − c)x²]

7. (b) T²(x, y) = T(ky − x, y) = (ky − (ky − x), y) = (x, y)
(d) T²(X) = A²X = IX = X

8. (b) T³(x, y, z, w) = (x, y, z, −w) so T⁶(x, y, z, w) = T³[T³(x, y, z, w)] = (x, y, z, w). Hence T⁻¹ = T⁵. So T⁻¹(x, y, z, w) = (y − x, −x, z, −w).

9. (b) T⁻¹(A) = U⁻¹A.

10. (b) Given u in U, write u = S(w), w in W (because S is onto). Then write w = T(v), v in V (T is onto). Hence u = ST(v), so ST is onto.

12. (b) For all v in V, (RT)(v) = R[T(v)] is in im(R).

13. (b) Given w in W, write w = ST(v), v in V (ST is onto). Then w = S[T(v)], T(v) in U, so S is onto. But then im S = W, so dim U = dim(ker S) + dim(im S) ≥ dim(im S) = dim W.

16. {T(e1), T(e2), …, T(er)} is a basis of im T by Theorem 5 Section 7.2. So T : span{e1, …, er} → im T is an isomorphism by Theorem 1.

19. (b) T(x, y) = (x, y + 1)

24. (b) TS[x0, x1, …) = T[0, x0, x1, …) = [x0, x1, …), so TS = 1V. Hence TS is both onto and one-to-one, so T is onto and S is one-to-one by Exercise 13. But [1, 0, 0, …) is in ker T while [1, 0, 0, …) is not in im S.

26. (b) If T(p) = 0, then p(x) = −xp′(x). We write p(x) = a0 + a1x + a2x² + ⋯ + anxⁿ, and this becomes a0 + a1x + a2x² + ⋯ + anxⁿ = −a1x − 2a2x² − ⋯ − nanxⁿ. Equating coefficients yields a0 = 0, 2a1 = 0, 3a2 = 0, …, (n + 1)an = 0, whence p(x) = 0. This means that ker T = 0, so T is one-to-one. But then T is an isomorphism by Theorem 3.

27. (b) If ST = 1V for some S, then T is onto by Exercise 13. If T is onto, let {e1, …, er, …, en} be a basis of V such that {er+1, …, en} is a basis of ker T. Since T is onto, {T(e1), …, T(er)} is a basis of im T = W by Theorem 5 Section 7.2. Thus S : W → V is an isomorphism where S[T(ei)] = ei for i = 1, 2, …, r. Hence TS[T(ei)] = T(ei) for each i, that is TS[T(ei)] = 1W[T(ei)]. This means that TS = 1W because they agree on the basis {T(e1), …, T(er)} of W.

28. (b) If T = SR, then every vector T(v) in im T has the form T(v) = S[R(v)], whence im T ⊆ im S. Since R is invertible, S = TR⁻¹ implies im S ⊆ im T. Conversely, assume that im S = im T. Then dim(ker S) = dim(ker T) by the dimension theorem. Let {e1, …, er, er+1, …, en} and {f1, …, fr, fr+1, …, fn} be bases of V such that {er+1, …, en} and {fr+1, …, fn} are bases of ker S and ker T, respectively. By Theorem 5, Section 7.2, {S(e1), …, S(er)} and {T(f1), …, T(fr)} are both bases of im S = im T. So let g1, …, gr in V be such that S(ei) = T(gi) for each i = 1, 2, …, r. Show that B = {g1, …, gr, fr+1, …, fn} is a basis of V. Then define R : V → V by R(gi) = ei for i = 1, 2, …, r, and R(fj) = ej for j = r + 1, …, n. Then R is an isomorphism by Theorem 1, Section 7.3. Finally SR = T since they have the same effect on the basis B.

29. Let B = {e1, …, er, er+1, …, en} be a basis of V with {er+1, …, en} a basis of ker T. If {T(e1), …, T(er), wr+1, …, wn} is a basis of V, define S by S[T(ei)] = ei for 1 ≤ i ≤ r, and S(wj) = ej for r + 1 ≤ j ≤ n. Then S is an isomorphism by Theorem 1, and TST(ei) = T(ei) clearly holds for 1 ≤ i ≤ r. But if i ≥ r + 1, then T(ei) = 0 = TST(ei), so T = TST by Theorem 2 Section 7.1.

Exercises 7.5 More on Linear Recurrences (Page 367)

1. (b) {[1), [2ⁿ), [(−3)ⁿ)}; xn = (1/20)(15 + 2ⁿ⁺³ + (−3)ⁿ⁺¹)

2. (b) {[1), [n), [(−2)ⁿ)}; xn = (1/9)(5 − 6n + (−2)ⁿ⁺²)
(d) {[1), [n), [n²)}; xn = 2(n − 1)² − 1

3. (b) {[aⁿ), [bⁿ)}

4. (b) [1, 0, 0, 0, 0, …), [0, 1, 0, 0, 0, …), [0, 0, 1, 1, 1, …), [0, 0, 1, 2, 3, …)

7. By Remark 2,
[iⁿ + (−i)ⁿ) = [2, 0, −2, 0, 2, 0, −2, 0, …)
[i(iⁿ − (−i)ⁿ)) = [0, −2, 0, 2, 0, −2, 0, 2, …)
are solutions. They are linearly independent and so are a basis.
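The closed form in Exercise 1(b) of Section 7.5 can be checked mechanically. Its basis [1), [2ⁿ), [(−3)ⁿ) corresponds to characteristic roots 1, 2, −3, i.e. (this recurrence is inferred from the basis, not stated in the answer) x(n+3) = 7x(n+1) − 6x(n):

```python
# Check of the closed form x_n = (15 + 2^(n+3) + (-3)^(n+1)) / 20 from
# Exercise 1(b), Section 7.5, against the recurrence with roots 1, 2, -3:
# x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3), i.e. x_{n+3} = 7 x_{n+1} - 6 x_n.
def x(n):
    return (15 + 2 ** (n + 3) + (-3) ** (n + 1)) / 20

ok = all(x(n + 3) == 7 * x(n + 1) - 6 * x(n) for n in range(10))
print([x(n) for n in range(4)], ok)
```

Every term is an integer (the numerator is always divisible by 20), so the floating-point comparison above is exact for small n.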
Exercises 8.1 Orthogonal Complements and Projections (Page 374)

1. (b) {(2, 1), (3/5)(−1, 2)} (d) {(0, 1, 1), (1, 0, 0), (0, −2, 2)}

2. (b) x = (1/182)(271, −221, 1030) + (1/182)(93, 403, 62)
(d) x = (1/4)(1, 7, 11, 17) + (1/4)(7, −7, −7, 7)
(f) x = (1/12)(5a − 5b + c − 3d, −5a + 5b − c + 3d, a − b + 11c + 3d, −3a + 3b + 3c + 3d) + (1/12)(7a + 5b − c + 3d, 5a + 7b + c − 3d, −a + b + c − 3d, 3a − 3b − 3c + 9d)

3. (a) (1/10)(−9, 3, −21, 33) = (3/10)(−3, 1, −7, 11)
(c) (1/70)(−63, 21, −147, 231) = (3/10)(−3, 1, −7, 11)

4. (b) {(1, −1, 0), (1/2)(−1, −1, 2)}; projU(x) = (1, 0, −1)
(d) {(1, −1, 0, 1), (1, 1, 0, 0), (1/3)(−1, 1, 0, 2)}; projU(x) = (2, 0, 0, 1)

5. (b) U⊥ = span{(1, 3, 1, 0), (−1, 0, 0, 1)}

8. Write p = projU(x). Then p is in U by definition. If x is in U, then x − p is in U. But x − p is also in U⊥ by Theorem 3, so x − p is in U ∩ U⊥ = {0}. Thus x = p.

10. Let {f1, f2, …, fm} be an orthonormal basis of U. If x is in U the expansion theorem gives x = (x · f1)f1 + (x · f2)f2 + ⋯ + (x · fm)fm = projU(x).

14. Let {y1, y2, …, ym} be a basis of U⊥, and let A be the n × n matrix with rows y1ᵀ, y2ᵀ, …, ymᵀ, 0, …, 0. Then Ax = 0 if and only if yi · x = 0 for each i = 1, 2, …, m; if and only if x is in U⊥⊥ = U.

17. (d) Eᵀ = Aᵀ[(AAᵀ)⁻¹]ᵀ(Aᵀ)ᵀ = Aᵀ[(AAᵀ)ᵀ]⁻¹A = Aᵀ[AAᵀ]⁻¹A = E
E² = Aᵀ(AAᵀ)⁻¹AAᵀ(AAᵀ)⁻¹A = Aᵀ(AAᵀ)⁻¹A = E

Exercises 8.2 Orthogonal Diagonalization (Page 383)

1. (b) (1/5)[3 −4; 4 3] (d) (1/√(a² + b²))[a b; −b a]
(f) … (h) (1/7)[3 2 6; −6 3 2; 2 6 −3]

2. We have Pᵀ = P⁻¹; this matrix is lower triangular (left side) and also upper triangular (right side; see Lemma 1 Section 2.7), and so is diagonal. But then P = Pᵀ = P⁻¹, so P² = I. This implies that the diagonal entries of P are all ±1.

5. (b) (1/√2)[1 −1; 1 1] (d) (1/√2)[√2 0 0; 0 1 −1; 0 1 1]
(f) (1/(3√2))[2√2 3 1; √2 0 −4; 2√2 −3 1] or (1/3)[2 −2 1; 1 2 2; 2 1 −2]
(h) (1/2)[1 −1 √2 0; 1 −1 −√2 0; −1 −1 0 √2; 1 1 0 √2]

6. P = (1/(√2 k))[√2c a a; 0 k −k; −√2a c c]

10. (b) y1 = (1/√5)(−x1 + 2x2) and y2 = (1/√5)(2x1 + x2); q = −3y1² + 2y2².

11. (c)⇒(a) By Theorem 1 let P⁻¹AP = D = diag(λ1, …, λn) where the λi are the eigenvalues of A. By (c) we have λi = ±1 for each i, whence D² = I. But then A² = (PDP⁻¹)² = PD²P⁻¹ = I. Since A is symmetric this is AAᵀ = I, proving (a).

13. (b) If B = PᵀAP where Pᵀ = P⁻¹, then B² = PᵀAPPᵀAP = PᵀA²P.

15. If x and y are respectively columns i and j of In, then xᵀAᵀy = xᵀAy shows that the (i, j)-entries of Aᵀ and A are equal.

18. (b) det[cos θ −sin θ; sin θ cos θ] = 1 and det[cos θ sin θ; sin θ −cos θ] = −1 [Remark: These are the only 2 × 2 examples.]
(d) Use the fact that P⁻¹ = Pᵀ to show that Pᵀ(I − P) = −(I − P)ᵀ. Now take determinants and use the hypothesis that det P ≠ (−1)ⁿ.

21. We have AAᵀ = D, where D is diagonal with main diagonal entries ‖R1‖², …, ‖Rn‖². Hence A⁻¹ = AᵀD⁻¹, and the result follows because D⁻¹ has diagonal entries 1/‖R1‖², …, 1/‖Rn‖².

23. (b) Because I − A and I + A commute, PPᵀ = (I − A)(I + A)⁻¹[(I + A)⁻¹]ᵀ(I − A)ᵀ = (I − A)(I + A)⁻¹(I − A)⁻¹(I + A) = I.

Exercises 8.3 Positive Definite Matrices (Page 389)

1. (b) U = (1/√2)[2 1; 0 1] (d) U = (1/30)[60√5 12√5 15√5; 0 6√30 10√30; 0 0 5√15]

2. (b) If λᵏ > 0, k odd, then λ > 0.

4. If x ≠ 0, then xᵀAx > 0 and xᵀBx > 0. Hence xᵀ(A + B)x = xᵀAx + xᵀBx > 0 and xᵀ(rA)x = r(xᵀAx) > 0, as r > 0.

6. Let x ≠ 0 in ℝⁿ. Then xᵀ(UᵀAU)x = (Ux)ᵀA(Ux) > 0 provided Ux ≠ 0. But if U = [c1 c2 ⋯ cn] and x = (x1, x2, …, xn), then Ux = x1c1 + x2c2 + ⋯ + xncn ≠ 0 because x ≠ 0 and the ci are independent.

10. Let PᵀAP = D = diag(λ1, …, λn) where P⁻¹ = Pᵀ. Since A is positive definite, each eigenvalue λi > 0. If B = diag(√λ1, …, √λn) then B² = D, so A = PB²Pᵀ = (PBPᵀ)². Take C = PBPᵀ. Since C has eigenvalues √λi > 0, it is positive definite.
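Orthogonal diagonalization, the operation behind the Section 8.2 answers, is available directly for symmetric matrices: numpy's eigh returns an orthogonal P with PᵀAP diagonal. The matrix A below is an illustrative symmetric example, not one from the exercises.

```python
import numpy as np

# Orthogonal diagonalization of a symmetric matrix, as in Section 8.2.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
evals, P = np.linalg.eigh(A)   # columns of P are orthonormal eigenvectors
D = P.T @ A @ P

orthogonal = np.allclose(P.T @ P, np.eye(2))
diagonal = np.allclose(D, np.diag(evals))
print(evals, orthogonal, diagonal)
```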
12. (b) If A is positive definite, use Theorem 1 to write A = UᵀU where U is upper triangular with positive diagonal D. Then A = (D⁻¹U)ᵀD²(D⁻¹U), so A = L1D1U1 is such a factorization if U1 = D⁻¹U, D1 = D², and L1 = U1ᵀ. Conversely, let Aᵀ = A = LDU be such a factorization. Then UᵀDᵀLᵀ = Aᵀ = A = LDU, so L = Uᵀ by (a). Hence A = LDLᵀ = VᵀV where V = D0Lᵀ and D0 is diagonal with D0² = D (the matrix D0 exists because D has positive diagonal entries). Hence A is symmetric, and it is positive definite by Example 1.

Exercises 8.4 QR-Factorization (Page 393)

1. (b) Q = (1/√5)[2 −1; 1 2], R = (1/√5)[5 3; 0 1]
(d) Q = (1/√3)[1 1 0; −1 0 1; 0 1 1; 1 −1 1], R = (1/√3)[3 0 2; 0 3 1; 0 0 2]

2. If A has a QR-factorization, use (a). For the converse use Theorem 1.

Exercises 8.5 Computing Eigenvalues (Page 396)

1. (b) Eigenvalues 4, −1; eigenvectors (2, −1)ᵀ, (1, −3)ᵀ; x4 = (409, −203)ᵀ; r3 = 3.94
(d) Eigenvalues λ1 = (1/2)(3 + √13), λ2 = (1/2)(3 − √13); eigenvectors (λ1, 1)ᵀ, (λ2, 1)ᵀ; x4 = (142, 43)ᵀ; r3 = 3.3027750 (The true value is λ1 = 3.3027756, to seven decimal places.)

2. (b) Eigenvalues λ1 = (1/2)(3 + √13) = 3.302776, λ2 = (1/2)(3 − √13) = −0.302776
A1 = [3 1; 1 0], Q1 = (1/√10)[3 −1; 1 3], R1 = (1/√10)[10 3; 0 −1]
A2 = (1/10)[33 −1; −1 −3], Q2 = (1/√1090)[33 1; −1 33], R2 = (1/√1090)[109 −3; 0 −10]
A3 = (1/109)[360 1; 1 −33] = [3.302775 0.009174; 0.009174 −0.302775]

4. Use induction on k. If k = 1, A1 = A. In general Ak+1 = Qk⁻¹AkQk = QkᵀAkQk, so the fact that Akᵀ = Ak implies Ak+1ᵀ = Ak+1. The eigenvalues of A are all real (Theorem 7 Section 5.5), so the Ak converge to an upper triangular matrix T. But T must also be symmetric (it is the limit of symmetric matrices), so it is diagonal.

Exercises 8.6 Complex Matrices (Page 406)

1. (b) √6 (d) √13

2. (b) Not orthogonal (d) Orthogonal

3. (b) Not a subspace. For example, i(0, 0, 1) = (0, 0, i) is not in U. (d) This is a subspace.

4. (b) Basis {(i, 0, 2), (1, 0, −1)}; dimension 2 (d) Basis {(1, 0, −2i), (0, 1, 1 − i)}; dimension 2

5. (b) Normal only (d) Hermitian (and normal), not unitary (f) None (h) Unitary (and normal); hermitian if and only if z is real

8. (b) U = (1/√14)[−2 3 − i; 3 + i 2], UᴴAU = [−1 0; 0 6]
(d) U = (1/√3)[1 + i 1; −1 1 − i], UᴴAU = [1 0; 0 4]
(f) U = (1/√3)[√3 0 0; 0 1 + i 1; 0 −1 1 − i], UᴴAU = [1 0 0; 0 0 0; 0 0 3]

10. (b) ‖λZ‖² = 〈λZ, λZ〉 = λ λ̄ 〈Z, Z〉 = |λ|²‖Z‖²

11. (b) If the (k, k)-entry of A is akk, then the (k, k)-entry of Ā is ākk, so the (k, k)-entry of (Ā)ᵀ = Aᴴ is ākk. This equals akk, so akk is real.

14. (b) Show that (B²)ᴴ = BᴴBᴴ = (−B)(−B) = B²; (iB)ᴴ = ī Bᴴ = (−i)(−B) = iB. (d) If Z = A + B, as given, first show that Zᴴ = A − B, and hence that A = (1/2)(Z + Zᴴ) and B = (1/2)(Z − Zᴴ).

16. (b) If U is unitary, (U⁻¹)⁻¹ = (Uᴴ)⁻¹ = (U⁻¹)ᴴ, so U⁻¹ is unitary.

18. (b) H = [1 i; −i 0] is hermitian but iH = [i −1; 1 0] is not.

21. (b) Let U = [a b; c d] be real and invertible, and assume that U⁻¹AU = [λ μ; 0 ν]. Then AU = U[λ μ; 0 ν], and first column entries are c = aλ and −a = cλ. Hence λ is real (c and a are both real and are not both 0), and (1 + λ²)a = 0. Thus a = 0, c = aλ = 0, a contradiction.

Section 8.7 An Application to Linear Codes over Finite Fields (Page 421)

1. (b) 1⁻¹ = 1, 9⁻¹ = 9, 3⁻¹ = 7, 7⁻¹ = 3.
(d) 2¹ = 2, 2² = 4, 2³ = 8, 2⁴ = 16 = 6, 2⁵ = 12 = 2, 2⁶ = 2², … so a = 2ᵏ if and only if a = 2, 4, 6, 8.

2. (b) If 2a = 0 in ℤ10, then 2a = 10k for some integer k. Thus a = 5k.

3. (b) 11⁻¹ = 7 in ℤ19.

6. (b) det A = 15 − 24 = 1 + 4 = 5 ≠ 0 in ℤ7, so A⁻¹ exists. Since 5⁻¹ = 3 in ℤ7, we have A⁻¹ = 3[3 −6; −4 5] = 3[3 1; 3 5] = [2 3; 2 1].
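The modular arithmetic used throughout Section 8.7 is easy to verify in code; the checks below use the ℤ7 facts from these answers, including the reduced system behind the solution x = 3 + 2t, y = 1 + 4t, z = t of Exercise 7(b).

```python
# Arithmetic in Z_7 as used in Section 8.7: 5 is the inverse of 3, and the
# parametric solution x = 3 + 2t, y = 1 + 4t, z = t satisfies the reduced
# system x + 5z = 3, y + 3z = 1 for every t in Z_7.
p = 7
inverse_ok = (3 * 5) % p == 1

solutions_ok = all(
    ((3 + 2 * t) + 5 * t) % p == 3 and ((1 + 4 * t) + 3 * t) % p == 1
    for t in range(p)
)
print(inverse_ok, solutions_ok)
```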
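The power method of Section 8.5 can be run directly on the matrix A1 = [3 1; 1 0] from Exercise 2(b); the dominant eigenvalue is (3 + √13)/2 = 3.302776, matching the value the QR-algorithm answers converge to.

```python
import numpy as np

# Power method (Section 8.5) on A = [[3, 1], [1, 0]] from Exercise 2(b).
A = np.array([[3.0, 1.0], [1.0, 0.0]])
x = np.array([1.0, 1.0])
for _ in range(25):
    x = A @ x
    x = x / np.linalg.norm(x)   # normalize to avoid overflow
rayleigh = x @ A @ x            # Rayleigh-quotient estimate of lambda_1
expected = (3 + np.sqrt(13)) / 2
print(round(rayleigh, 6), round(expected, 6))
```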
7. (b) We have 5 · 3 = 1 in ℤ7 so the reduction of the augmented matrix is:
[3 1 4 3; 4 3 1 1] → [1 5 6 1; 4 3 1 1] → [1 5 6 1; 0 4 5 4] → [1 5 6 1; 0 1 3 1] → [1 0 5 3; 0 1 3 1].
Hence x = 3 + 2t, y = 1 + 4t, z = t; t in ℤ7.

9. (b) (1 + t)⁻¹ = 2 + t.

10. (b) The minimum weight of C is 5, so it detects 4 errors and corrects 2 errors.

11. (b) {00000, 01110, 10011, 11101}.

12. (b) The code is {0000000000, 1001111000, 0101100110, 0011010111, 1100011110, 1010101111, 0110110001, 1111001001}. This has minimum distance 5 and so corrects 2 errors.

13. (b) {00000, 10110, 01101, 11011} is a (5,2)-code of minimal weight 3, so it corrects single errors.

14. (b) G = [1 u] where u is any nonzero vector in the code. H = [u; In−1].

Exercises 8.8 An Application to Quadratic Forms (Page 430)

1. (b) A = [1 0; 0 2] (d) A = [3 1 −1; 1 1 1; −1 1 1]

2. (b) P = (1/√2)[1 1; 1 −1]; y = (1/√2)[x1 + x2; x1 − x2]; q = 3y1² − y2²; 1, 2
(d) P = (1/3)[2 2 −1; 2 −1 2; −1 2 2]; y = (1/3)[2x1 + 2x2 − x3; 2x1 − x2 + 2x3; −x1 + 2x2 + 2x3]; q = 9y1² + 9y2² − 9y3²; 2, 3
(f) P = (1/3)[−2 1 2; 2 2 1; 1 −2 2]; y = (1/3)[−2x1 + 2x2 + x3; x1 + 2x2 − 2x3; 2x1 + x2 + 2x3]; q = 9y1² + 9y2²; 2, 2
(h) P = (1/√6)[−√2 √3 1; √2 0 2; √2 √3 −1]; y = (1/√6)[−√2x1 + √2x2 + √2x3; √3x1 + √3x3; x1 + 2x2 − x3]; q = 2y1² + y2² − y3²; 2, 3

3. (b) x1 = (1/√5)(2x − y), y1 = (1/√5)(x + 2y); 4x1² − y1² = 2; hyperbola (d) x1 = (1/√5)(x + 2y), y1 = (1/√5)(2x − y); 6x1² + y1² = 1; ellipse

4. (b) Basis {(i, 0, i), (1, 0, −1)}, dimension 2
(d) Basis {(1, 0, −2i), (0, 1, 1 − i)}, dimension 2

7. (b) 3y1² + 5y2² − y3² − 3√2 y1 + (11/3)√3 y2 + (2/3)√6 y3 = 7,
y1 = (1/√2)(x2 + x3), y2 = (1/√3)(x1 + x2 − x3), y3 = (1/√6)(2x1 − x2 + x3)

9. (b) By Theorem 3 Section 8.3 let A = UᵀU where U is upper triangular with positive diagonal entries. Then q = xᵀ(UᵀU)x = (Ux)ᵀUx = ‖Ux‖².

Exercises 9.1 The Matrix of a Linear Transformation (Page 442)

1. (b) [a; 2b − c; c − b] (d) (1/2)[a − b; a + b; −a + 3b + 2c]

2. (b) Let v = a + bx + cx². Then CD[T(v)] = MDB(T)CB(v) = [2 1 3; −1 0 −2][a; b; c] = [2a + b + 3c; −a − 2c]. Hence T(v) = (2a + b + 3c)(1, 1) + (−a − 2c)(0, 1) = (2a + b + 3c, a + b + c).

3. (b) [1 0 0 0; 0 0 1 0; 0 1 0 0; 0 0 0 1] (d) [1 1 1; 0 1 2; 0 0 1]

4. (b) MDB(T) = [1 2; 5 3; 4 0]; CD[T(a, b)] = [1 2; 5 3; 4 0][b; a − b] = [2a − b; 3a + 2b; 4b]
(d) MDB(T) = (1/2)[1 1 −1; 1 1 1]; CD[T(a + bx + cx²)] = (1/2)[1 1 −1; 1 1 1][a; b; c] = (1/2)[a + b − c; a + b + c]
(f) MDB(T) = [1 0 0 0; 0 1 1 0; 0 1 1 0; 0 0 0 1]; CD[T[a b; c d]] = [1 0 0 0; 0 1 1 0; 0 1 1 0; 0 0 0 1][a; b; c; d] = [a; b + c; b + c; d]

5. (b) MED(S)MDB(T) = MEB(ST) (d) MED(S)MDB(T) = MEB(ST)

7. (b) T⁻¹(a, b, c) = (1/2)(b + c − a, a + c − b, a + b − c);
MDB(T) = [0 1 1; 1 0 1; 1 1 0]; MBD(T⁻¹) = (1/2)[−1 1 1; 1 −1 1; 1 1 −1]
(d) T⁻¹(a, b, c) = (a − b) + (b − c)x + cx²;
MDB(T) = [1 1 1; 0 1 1; 0 0 1]; MBD(T⁻¹) = [1 −1 0; 0 1 −1; 0 0 1]
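The pair of matrices in Exercise 7(b) of Section 9.1 can be verified to be inverses of one another, confirming that MBD(T⁻¹) = MDB(T)⁻¹:

```python
import numpy as np

# Exercise 7(b), Section 9.1: the matrix of T^-1 inverts the matrix of T.
M = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])              # MDB(T)
M_inv = 0.5 * np.array([[-1.0, 1.0, 1.0],
                        [ 1.0, -1.0, 1.0],
                        [ 1.0, 1.0, -1.0]])  # MBD(T^-1)
is_inverse = np.allclose(M @ M_inv, np.eye(3))
print(is_inverse)
```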
[1 1 1 0; 0 1 1 0; 0 0 1 0; 0 0 0 1]⁻¹ = [1 -1 0 0; 0 1 -1 0; 0 0 1 0; 0 0 0 1]. Hence CB[T⁻¹(a, b, c, d)] = MBD(T⁻¹)CD(a, b, c, d) = [1 -1 0 0; 0 1 -1 0; 0 0 1 0; 0 0 0 1][a; b; c; d] = [a - b; b - c; c; d], so T⁻¹(a, b, c, d) = (a - b, b - c, c, d).
12. Have CD[T(ej)] = column j of In. Hence MDB(T) = [CD[T(e1)] CD[T(e2)] ⋯ CD[T(en)]] = In.
16. (b) If D is the standard basis of ℝⁿ⁺¹ and B = {1, x, x², …, xⁿ}, then MDB(T) = [CD[T(1)] CD[T(x)] ⋯ CD[T(xⁿ)]] = [1 a0 a0² ⋯ a0ⁿ; 1 a1 a1² ⋯ a1ⁿ; 1 a2 a2² ⋯ a2ⁿ; ⋮; 1 an an² ⋯ anⁿ]. This matrix has nonzero determinant by Theorem 7 Section 3.2 (since the ai are distinct), so T is an isomorphism.
20. (d) [(S + T)R](v) = (S + T)(R(v)) = S[R(v)] + T[R(v)] = SR(v) + TR(v) = [SR + TR](v) holds for all v in V. Hence (S + T)R = SR + TR.
21. (b) If w lies in im(S + T), then w = (S + T)(v) for some v in V. But then w = S(v) + T(v), so w lies in im S + im T.
22. (b) If X ⊆ X1, let T lie in X1⁰. Then T(v) = 0 for all v in X1, whence T(v) = 0 for all v in X. Thus T is in X⁰ and we have shown that X1⁰ ⊆ X⁰.
24. (b) R is linear means Sv+w = Sv + Sw and Sav = aSv. These are proved as follows: Sv+w(r) = r(v + w) = rv + rw = Sv(r) + Sw(r) = (Sv + Sw)(r), and Sav(r) = r(av) = a(rv) = (aSv)(r) for all r in ℝ. To show R is one-to-one, let R(v) = 0. This means Sv = 0, so 0 = Sv(r) = rv for all r. Hence v = 0 (take r = 1). Finally, to show R is onto, let T lie in L(ℝ, V). We must find v such that R(v) = T, that is, Sv = T. In fact, v = T(1) works since then T(r) = T(r·1) = rT(1) = rv = Sv(r) holds for all r, so T = Sv.
25. (b) Given T : ℝ → V, let T(1) = a1b1 + ⋯ + anbn, ai in ℝ. For all r in ℝ, we have (a1S1 + ⋯ + anSn)(r) = a1S1(r) + ⋯ + anSn(r) = a1rb1 + ⋯ + anrbn = rT(1) = T(r). This shows that a1S1 + ⋯ + anSn = T.
27. (b) Write v = v1b1 + ⋯ + vnbn, vj in ℝ. Apply Ei to get Ei(v) = v1Ei(b1) + ⋯ + vnEi(bn) = vi by the definition of the Ei.

Exercises 9.2 Operators and Similarity (Page 452)

1. (b) (1/2)[-3 -2 1; 2 2 0; 0 0 2]
4. (b) PB←D = [1 1 -1; 1 -1 0; 1 0 1], PD←B = (1/3)[1 1 1; 1 -2 1; -1 -1 2], PE←D = [1 0 1; 1 -1 0; 1 1 -1], PE←B = [0 0 1; 0 1 0; 1 0 0]
5. (b) A = PD←B, where B = {(1, 2, -1), (2, 3, 0), (1, 0, 2)}. Hence A⁻¹ = PB←D = [6 -4 -3; -4 3 2; 3 -2 -1]
7. (b) P = [1 1 0; 0 1 2; -1 0 1]
8. (b) B = {(3, 7), (2, 5)}
9. (b) cT(x) = x² - 6x - 1 (d) cT(x) = x³ + x² - 8x - 3 (f) cT(x) = x⁴
12. Define TA : ℝⁿ → ℝⁿ by TA(x) = Ax for all x in ℝⁿ. If null A = null B, then ker(TA) = null A = null B = ker(TB) so, by Exercise 28 Section 7.3, TA = STB for some isomorphism S : ℝⁿ → ℝⁿ. If B0 is the standard basis of ℝⁿ, we have A = MB0(TA) = MB0(STB) = MB0(S)MB0(TB) = UB where U = MB0(S) is invertible by Theorem 1. Conversely, if A = UB with U invertible, then Ax = 0 if and only if Bx = 0, so null A = null B.
16. (b) Showing S(w + v) = S(w) + S(v) means MB(Tw+v) = MB(Tw) + MB(Tv). If B = {b1, b2}, then column j of MB(Tw+v) is CB[(w + v)bj] = CB(wbj + vbj) = CB(wbj) + CB(vbj) because CB is linear. This is column j of MB(Tw) + MB(Tv). Similarly MB(Taw) = aMB(Tw), so S(aw) = aS(w). Finally, TwTv = Twv, so S(wv) = MB(TwTv) = MB(Tw)MB(Tv) = S(w)S(v) by Theorem 1.

Exercises 9.3 Invariant Subspaces and Direct Sums (Page 464)

2. (b) T(U) ⊆ U, so T[T(U)] ⊆ T(U).
3. (b) If v is in S(U), write v = S(u), u in U. Then T(v) = T[S(u)] = (TS)(u) = (ST)(u) = S[T(u)], and this lies in S(U) because T(u) lies in U (U is T-invariant).
6. Suppose U is T-invariant for every T. If U ≠ 0, choose u ≠ 0 in U. Choose a basis B = {u, u2, …, un} of V containing u. Given any v in V, there is (by Theorem 3 Section 7.1) a linear transformation T : V → V such that T(u) = v, T(u2) = ⋯ = T(un) = 0. Then v = T(u) lies in U because U is T-invariant. This shows that V = U.
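Change-of-basis matrices such as those in 4(b) satisfy two mechanical identities: PB←D and PD←B are mutually inverse, and composing PE←B with PB←D gives PE←D. A short sketch checking both with exact rational arithmetic (the entries are those recovered for 4(b) above, so treat them as an assumption):

```python
from fractions import Fraction as F

def matmul(A, B):
    # Plain row-by-column product for small dense matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P_BD = [[1, 1, -1], [1, -1, 0], [1, 0, 1]]        # P_{B<-D}
P_DB = [[F(1, 3), F(1, 3), F(1, 3)],              # P_{D<-B} = (1/3)[...]
        [F(1, 3), F(-2, 3), F(1, 3)],
        [F(-1, 3), F(-1, 3), F(2, 3)]]
P_EB = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]          # P_{E<-B}
P_ED = [[1, 0, 1], [1, -1, 0], [1, 1, -1]]        # P_{E<-D}

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(P_BD, P_DB) == I3      # inverse change-of-basis matrices
assert matmul(P_EB, P_BD) == P_ED    # composing changes of basis
```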
8. (b) T(1 - 2x²) = 3 + 3x - 3x² = 3(1 - 2x²) + 3(x + x²) and T(x + x²) = -(1 - 2x²), so both are in U. Hence U is T-invariant by Example 3. If B = {1 - 2x², x + x², x²} then MB(T) = [3 -1 1; 3 0 1; 0 0 3], so cT(x) = det[x-3 1 -1; -3 x -1; 0 0 x-3] = (x - 3)·det[x-3 1; -3 x] = (x - 3)(x² - 3x + 3)
9. (b) Suppose U = ℝu is TA-invariant where u ≠ 0. Then TA(u) = ru for some r in ℝ, so (rI - A)u = 0. But det(rI - A) = (r - cos θ)² + sin² θ ≠ 0 because 0 < θ < π. Hence u = 0, a contradiction.
10. (b) U = span{(1, 1, 0, 0), (0, 0, 1, 1)} and W = span{(1, 0, 1, 0), (0, 1, 0, -1)}, and these four vectors form a basis of ℝ⁴. Use Example 9.
… [-1 0; 0 1] … of M22. Use Example 9.
14. The fact that U and W are subspaces is easily verified using the subspace test. If A lies in U ∩ W, then A = AE = 0; that is, U ∩ W = 0. To show that M22 = U + W, choose any A in M22. Then A = AE + (A - AE), and AE lies in U [because (AE)E = AE² = AE], and A - AE lies in W [because (A - AE)E = AE - AE² = 0].
17. (b) By (a) it remains to show U + W = V; we show that dim(U + W) = n and invoke Theorem 2 Section 6.4. But U + W = U ⊕ W because U ∩ W = 0, so dim(U + W) = dim U + dim W = n.
18. (b) First, ker(TA) is TA-invariant. Let U = ℝp be TA-invariant. Then TA(p) is in U, say TA(p) = λp. Hence Ap = λp, so λ is an eigenvalue of A. This means that λ = 0 by (a), so p is in ker(TA). Thus U ⊆ ker(TA). But dim[ker(TA)] ≠ 2 because TA ≠ 0, so dim[ker(TA)] = 1 = dim(U). Hence U = ker(TA).
20. Let B1 be a basis of U and extend it to a basis B of V. Then MB(T) = [MB1(T) Y; 0 Z], so cT(x) = det[xI - MB(T)] = det[xI - MB1(T)]·det[xI - Z] = cT1(x)q(x).
22. (b) T²[p(x)] = p[-(-x)] = p(x), so T² = 1; B = {1, x²; x, x³}
(d) T²(a, b, c) = T(-a + 2b + c, b + c, -c) = (a, b, c), so T² = 1; B = {(1, 1, 0); (1, 0, 0), (0, -1, 2)}
23. (b) Use the Hint and Exercise 2.
25. (b) T²(a, b, c) = T(a + 2b, 0, 4b + c) = (a + 2b, 0, 4b + c) = T(a, b, c), so T² = T; B = {(1, 0, 0), (0, 0, 1); (2, -1, 4)}
29. (b) Tf,z[Tf,z(v)] = Tf,z[f(v)z] = f[f(v)z]z = f(v)f(z)z. This equals Tf,z(v) = f(v)z for all v if and only if f(v)f(z) = f(v) for all v. Since f ≠ 0, this holds if and only if f(z) = 1.
30. (b) If A = [p1 p2 ⋯ pn] where Upi = λpi for each i, then UA = λA. Conversely, UA = λA means that Up = λp for every column p of A.

Exercises 10.1 Inner Products and Norms (Page 475)

1. (b) P5 fails. (d) P5 fails. (f) P5 fails.
2. Axioms P1–P5 hold in U because they hold in V.
3. (b) (1/√π)f (d) (1/√17)[3; -1]
4. (b) √3 (d) √3π
8. P1 and P2 are clear since f(i) and g(i) are real numbers. P3: 〈f + g, h〉 = ∑[f(i) + g(i)]h(i) = ∑[f(i)h(i) + g(i)h(i)] = ∑f(i)h(i) + ∑g(i)h(i) = 〈f, h〉 + 〈g, h〉. P4: 〈rf, g〉 = ∑(rf)(i)·g(i) = ∑rf(i)·g(i) = r∑f(i)·g(i) = r〈f, g〉. P5: If f ≠ 0, then 〈f, f〉 = ∑f(i)² > 0 because some f(i) ≠ 0.
12. (b) 〈v, v〉 = 5v1² - 6v1v2 + 2v2² = (1/5)[(5v1 - 3v2)² + v2²] (d) 〈v, v〉 = 3v1² + 8v1v2 + 6v2² = (1/3)[(3v1 + 4v2)² + 2v2²]
13. (b) [1 -2; -2 1] (d) [1 0 -2; 0 2 0; -2 0 5]
14. By the condition, 〈x, y〉 = (1/2)〈x + y, x + y〉 = 0 for all x, y. Let ei denote column i of I. If A = [aij], then aij = eiᵀAej = 〈ei, ej〉 = 0 for all i and j.
16. (b) -15
20. 1. Using P2: 〈u, v + w〉 = 〈v + w, u〉 = 〈v, u〉 + 〈w, u〉 = 〈u, v〉 + 〈u, w〉. 2. Using P2 and P4: 〈v, rw〉 = 〈rw, v〉 = r〈w, v〉 = r〈v, w〉. 3. Using P3: 〈0, v〉 = 〈0 + 0, v〉 = 〈0, v〉 + 〈0, v〉, so 〈0, v〉 = 0. The rest is P2. 4. Assume that 〈v, v〉 = 0. If v ≠ 0 this contradicts P5, so v = 0. Conversely, if v = 0, then 〈v, v〉 = 0 by Part 3 of this theorem.
22. (b) 15‖u‖² - 17〈u, v〉 - 4‖v‖² (d) ‖u + v‖² = 〈u + v, u + v〉 = ‖u‖² + 2〈u, v〉 + ‖v‖²
26. (b) {(1, 1, 0), (0, 2, 1)}
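The idempotent claim in 25(b) is mechanical to confirm: applying the operator twice must change nothing. A minimal sketch of T(a, b, c) = (a + 2b, 0, 4b + c) applied twice:

```python
def T(v):
    # The operator from 25(b): T(a, b, c) = (a + 2b, 0, 4b + c).
    a, b, c = v
    return (a + 2 * b, 0, 4 * b + c)

# T is a projection: T o T = T on every vector.
for v in [(1, 2, 3), (-4, 0, 7), (5, -1, 2)]:
    assert T(T(v)) == T(v)
```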
28. 〈v - w, vi〉 = 〈v, vi〉 - 〈w, vi〉 = 0 for each i, so v = w by Exercise 27.
29. (b) If u = (cos θ, sin θ) in ℝ² (with the dot product) then ‖u‖ = 1. Use (a) with v = (x, y).

Exercises 10.2 Orthogonal Sets of Vectors (Page 484)

(b) (1/6)(a - 2b + c)(1, -2, 1)
(d) ((a + d)/2)[1 0; 0 1] + ((a - d)/2)[1 0; 0 -1] + ((b + c)/2)[0 1; 1 0] + ((b - c)/2)[0 1; -1 0]
2. (b) {(1, 1, 1), (1, -5, 1), (3, 0, -2)}
3. (b) {[1 1; 0 1], [1 -2; 3 1], [1 -2; -2 1], [1 0; 0 -1]}
4. (b) {1, x - 1, x² - 2x + 2/3}
6. (b) U⊥ = span{[1 -1 0 0], [0 0 1 0], [0 0 0 1]}, dim U⊥ = 3, dim U = 1 (d) U⊥ = span{2 - 3x, 1 - 2x²}, dim U⊥ = 2, dim U = 1 (f) U⊥ = span{[1 -1; -1 0]}, dim U⊥ = 1, dim U = 3
7. (b) U = span{[1 0; 0 1], [1 1; 1 -1], [0 1; -1 0]}

Exercises 10.3 Orthogonal Diagonalization (Page 491)

1. (b) B = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}; MB(T) = [-1 0 1 0; 0 -1 0 1; 1 0 2 0; 0 1 0 2]
(d) Given v and w, write T⁻¹(v) = v1 and T⁻¹(w) = w1. Then 〈T⁻¹(v), w〉 = 〈v1, T(w1)〉 = 〈T(v1), w1〉 = 〈v, T⁻¹(w)〉.
5. (b) If B0 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, then MB0(T) = [7 -1 0; -1 7 0; 0 0 2] has an orthonormal basis of eigenvectors {(1/√2)[1; 1; 0], (1/√2)[1; -1; 0], [0; 0; 1]}. Hence an orthonormal basis of eigenvectors of T is {(1/√2)(1, 1, 0), (1/√2)(1, -1, 0), (0, 0, 1)}.
(d) If B0 = {1, x, x²}, then MB0(T) = [-1 0 1; 0 3 0; 1 0 -1] has an orthonormal basis of eigenvectors {[0; 1; 0], (1/√2)[1; 0; 1], (1/√2)[1; 0; -1]}. Hence an orthonormal basis of eigenvectors of T is {x, (1/√2)(1 + x²), (1/√2)(1 - x²)}.
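Orthogonality claims like the spanning set in 7(b) can be checked with the entrywise (trace) inner product on M22; a minimal sketch (matrix entries as recovered above, so treat them as an assumption):

```python
def inner(A, B):
    # <A, B> = sum of entrywise products (the trace inner product on M22).
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# The spanning set from 7(b).
U = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, -1]],
    [[0, 1], [-1, 0]],
]

# The three matrices are pairwise orthogonal, hence independent.
for i in range(len(U)):
    for j in range(i + 1, len(U)):
        assert inner(U[i], U[j]) == 0
```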
… if v ≠ 0, then |a| = 1, so a = ±1.
2. [a 1 0; 0 a 0; 0 0 b][0 1 0; 0 0 1; 1 0 0] = [0 1 0; 0 0 1; 1 0 0][b 0 0; 0 a 1; 0 0 a]
12. (b) Assume that S = Su ◦ T, u ∈ V, T an isometry of V. Since T is onto (by Theorem 2), let u = T(w) where w ∈ V. Then for any v ∈ V, we have (T ◦ Sw)(v) = T(w + v) = T(w) + T(v) = ST(w)(T(v)) = (ST(w) ◦ T)(v), and it follows that T ◦ Sw = ST(w) ◦ T.

Exercises 10.5 An Application to Fourier Approximation (Page 509)

1. (b) π/2 - (4/π)[cos x + (cos 3x)/3² + (cos 5x)/5²]
(d) π/4 + sin x - (sin 2x)/2 + (sin 3x)/3 - (sin 4x)/4 + (sin 5x)/5
2. (b) 2/π - (8/π)[(cos 2x)/(2² - 1) + (cos 4x)/(4² - 1) + (cos 6x)/(6² - 1)]

Exercises 11.1 Block Triangular Form

1. (b) cA(x) = (x + 1)³; P = [1 0 0; 1 1 0; 1 -3 1]; P⁻¹AP = [-1 0 1; 0 -1 0; 0 0 -1]
(d) cA(x) = (x - 1)²(x + 2); P = [-1 0 -1; 4 1 1; 4 2 1]; P⁻¹AP = [1 1 0; 0 1 0; 0 0 -2]
(f) cA(x) = (x + 1)²(x - 1)²; P = [1 1 5 1; 0 0 2 -1; 0 1 2 0; 1 0 1 1]; P⁻¹AP = [-1 1 0 0; 0 -1 1 0; 0 0 1 -2; 0 0 0 1]
4. If B is any ordered basis of V, write A = MB(T). Then cT(x) = cA(x) = a0 + a1x + ⋯ + anxⁿ for scalars ai in ℝ. Since MB is linear and MB(Tᵏ) = MB(T)ᵏ, we have MB[cT(T)] = MB[a0 + a1T + ⋯ + anTⁿ] = a0I + a1A + ⋯ + anAⁿ = cA(A) = 0 by the Cayley-Hamilton theorem. Hence cT(T) = 0 because MB is one-to-one.

Appendix A Complex Numbers (Page 533)

1. (b) x = 3 (d) x = ±1
2. (b) 10 + i (d) 11/26 + (23/26)i (f) 2 - 11i (h) 8 - 6i
3. (b) 11/5 + (3/5)i (d) ±(2 - i) (f) 1 + i
4. (b) 1/2 ± (√3/2)i (d) 2, 1
5. (b) -2, 1 ± √3 i (d) ±2√2, ±2√2 i
6. (b) x² - 4x + 13; 2 + 3i (d) x² - 6x + 25; 3 + 4i
8. x⁴ - 10x³ + 42x² - 82x + 65
10. (b) (-2)² + i(-2) - (4 - 2i) = 0; 2 - i
19. (b) 1/2 + (√3/2)i (d) 1 - i (f) √3 - 3i
20. (b) -1/32 + (√3/32)i (d) -32i (f) -216(1 + i)
23. (b) ±(√2/2)(√3 + i), ±(√2/2)(-1 + √3 i) (d) ±2i, ±(√3 + i), ±(√3 - i)
26. (b) The argument in (a) applies using β = 2π/n. Then 1 + z + ⋯ + zⁿ⁻¹ = (1 - zⁿ)/(1 - z) = 0.

Appendix B Proofs (Page 540)

1. (b) If m = 2p and n = 2q + 1 where p and q are integers, then m + n = 2(p + q) + 1 is odd. The converse is false: m = 1 and n = 2 is a counterexample.
(d) x² - 5x + 6 = (x - 2)(x - 3) so, if this is zero, then x = 2 or x = 3. The converse is true: each of 2 and 3 satisfies x² - 5x + 6 = 0.
2. (b) This implication is true. If n = 2t + 1 where t is an integer, then n² = 4t² + 4t + 1 = 4t(t + 1) + 1. Now t is either even or odd, say t = 2m or t = 2m + 1. If t = 2m, then n² = 8m(2m + 1) + 1; if t = 2m + 1, then n² = 8(2m + 1)(m + 1) + 1. Either way, n² has the form n² = 8k + 1 for some integer k.
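The case analysis in 2(b) can also be confirmed exhaustively over a range of odd integers; a short sketch:

```python
# Every odd n satisfies n^2 = 8k + 1 for some integer k,
# matching the two cases t = 2m and t = 2m + 1 above.
for n in range(-99, 100, 2):  # the odd integers from -99 to 99
    assert n * n % 8 == 1
```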
sets, 229n odd function, 507 hermitian matrix, 400–402 input-output economic models,
transformation, 52, 334 pointwise addition, 292 higher-dimensional geometry, 32n 112–116
equal modulo, 409 real-valued function, 291 Hilbert, David, 369n input-output matrix, 113
equilibrium, 113 scalar multiplication, 292 Hilbert spaces, 369n integers, 408, 410, 536n
equilibrium condition, 113 fundamental identities, 80, 353 hit, 340 integers modulo, 408
equilibrium price structures, 113, fundamental theorem, 240, 243, homogenous coordinates, 227 integration, 333
114 305–306 homogenous equations, 18–24 interpolating polynomial, 145
equivalence relation, 262 fundamental theorem of algebra, associated homogenous system, 47 Interpolation and Approximation
equivalent 155n, 269, 401, 532–533 basic solutions, 22–23 (Davis), 483
matrices, 90–91 defined, 18 intersection, 235, 317–318
statements, 77n general solution, 21–22 interval, 291
systems of linear equations, 4 G linear combinations, 20–24 intrinsic descriptions, 186
error, 275, 418 Galois field, 411 nontrivial solutions, 18, 152 Introduction to Abstract Algebra
error correcting codes, 412–414 Galton, Francis, 284n trivial solution, 18 (Nicholson), 410n, 411n
Euclid, 537, 540 Gauss, Carl Friedrich, 10f, 10n, 126, homogenous system, 23–24 Introduction to Abstract Mathematics
euclidean algorithm, 410 155n, 269n, 532 Hooke’s law, 328 (Lucas), 536n
euclidean geometry, 246 gaussian algorithm, 10, 22–23, 267, Householder matrices, 396 Introduction to Matrix Computations
euclidean inner product, 468 409 How to Read and Do Proofs (Solow),
gaussian elimination (Stewart), 395
euclidean n-space, 468 536n
defined, 13 Introduction to the Theory of Error-
Euler, Leonhard, 527 hyperbola, 425
example, 13–14 Correcting Codes (Pless), 419n
Euler’s formula, 527 hyperplanes, 3, 504
LU-factorization, 103 invariance theorem, 241
evaluation, 160, 271, 298, 332, 337, hypotenuse, 537
365 normal equations, 273 hypothesis, 536 invariant subspaces, 454–464
even function, 507 scalar multiple, 36 invariants, 52n, 154–155
even parity, 415 systems of linear equations and, inverse theorem, 77
even polynomial, 311 9–16 I inverses
exact formula, 162 general differential systems, 174–178 (i, j)-entry, 33 see also invertible matrix
expansion theorem, 251, 478, 487 general solution, 2, 13, 177 idempotents, 69, 466 adjugate, 140–142
expectation, 434 general theory of relativity, 288 identity matrix, 46, 50, 107 cancellation laws, 74
exponential function, 173–174, 358 generalized eigenspace, 511 identity operator, 332 complex numbers, 524
generalized inverse, 274, 282 identity transformation, 53, 101 Cramer’s rule, 137, 143–144
generator, 417 “if and only if,” 33n defined, 70
F geometric vectors image determinants, 126, 137–147
factor, 267, 548 see also vector geometry of linear transformations, 52, 337, and elementary matrices, 85–86
factor theorem, 321, 548 defined, 186 338–344 elementary row operations, 6,
feasible region, 431 described, 186 of the parallelogram, 223 83–85
Fibonacci sequence, 171, 171n difference, 188 image space, 230 finite fields, 409
field, 289n, 409 intrinsic descriptions, 187 imaginary axis, 526 generalized inverse, 274, 282
field of integers modulo, 410 midpoint, 190 imaginary parts, 357, 523 inverse theorem, 77
finite dimensional spaces, 311–318 parallelogram law, 187–191 imaginary unit, 523 inversion algorithm, 73–74, 86
finite fields, 409–411, 416 Pythagoras’ theorem, 194–195 implication, 536 and linear systems, 72
finite sets, 409 scalar multiple law, 189, 191 implies, 77n linear transformation, 79–80, 353
fixed axis, 504 scalar multiplication, 189–191 inconsistent system, 2 matrix transformations, 79–80
fixed hyperplane, 504 scalar product, 189 independence, 236–240, 303–308 Moore-Penrose inverse, 282
fixed line, 498 standard position, 188 independence test, 236–237 nonzero matrices, 70
fixed plane, 501 sum, 188 independent, 236, 303, 312 properties of inverses, 74–79
fixed vectors, 505 tip-to-tail rule, 187 independent lemma, 312 square matrices, application to,
Forcinito, M.A., 419n unit vector, 189 indeterminate, 291, 546 70n, 72–74, 138
formal proofs, 537 vector subtraction, 188 index, 428, 429 and zero matrices, 70
forward substitution, 103 geometry, 32 induction inversion algorithm, 73–74
Fourier, J.B.J., 478, 506n Google PageRank, 165–166, 166n cofactor expansion theorem,
invertibility condition, 138
Fourier approximation, 506–509 Gram, Jörgen Pederson, 369n 180–181
invertible matrix
Fourier coefficients, 251–252, 478, Gram-Schmidt orthogonalization determinant, determination of, 127
see also inverses
506, 507 algorithm, 252, 369–370, mathematical induction, 92,
defined, 70
Fourier expansion, 251–252 378, 380, 390, 452, 479 541–544
Fourier series, 508 on path of length r, 66 determinants, 126
graphs
Fourier Series and Boundary Value induction hypothesis, 542 left cancelled, 74
attractor, 164
Problems (Brown and infinite dimensional, 312 LU-factorization, 108
conic, 20
Churchill), 508n initial condition, 174 “mixed” cancellation, 74
directed graphs, 66
fractions ellipse, 425 initial state vector, 119 orthogonal matrices, 391–392
eigenvectors, 153n hyperbola, 425 inner product product of elementary matrix,
equal fractions, 186n linear dynamical systems, 164–165 coordinate isomorphism, 350 85–86
field, 409 saddle point, 165 defined, 468 right cancelled, 74
probabilities, 117 trajectory, 164 euclidean inner product, 468 involutions, 462
free variables, 12 Grassman, Hermann, 288 and norms, 471 irreducible, 531
function groups, 495 positive definite n × n matrix, isometries, 220, 493–504
of a complex variable, 248n group theory, 32n 470–471 isomorphic, 347
composition, 351 properties of, 469–470 isomorphism, 347–351, 437,
continuous functions, 469 inner product space 440–441, 495
defined, 291 H defined, 468
derivative, 357, 358n Hamilton, William Rowan, 405n distance, 471
differentiable function, 173, 299, Hamming, Richard, 412 dot product, use of, 471 J
325, 357 Hamming (7,4)-code, 420 Fourier approximation, 506–509 Jacobi identity, 218
equal, 291 Hamming bound, 414 isometries, 493–504 Jordan, Camille, 521f, 521n
even function, 507 Hamming distance, 412 norms, 471–474 Jordan blocks, 519
exponential function, 173–174, Hamming, Richard, 412 orthogonal diagonalization, Jordan canonical form, 518–522
358 Hamming weight, 412 486–491 Jordan canonical matrices, 510
inner product. See inner product heat conduction in solids, 506n orthogonal sets of vectors, 477–483 junction rule, 25, 28
objective function, 431, 433 Hermite, Charles, 400n unit vector, 472 juvenile survival rate, 150
Index 579
matrix multiplication, 56–66 matrix-vector products, 44 orthogonal set of vectors, 401, perpendicular lines, 198–202
matrix subtraction, 35 scalar multiplication, 36–38, 289, 477–483 physical dynamics, 405n
matrix transformations. See linear 292 orthogonal sets, 249–252, 249n, pigeonhole principle, 538
transformations multiplication rule, 528–529 368–369 Pisano, Leonardo, 171
matrix-vector multiplication, multiplicity, 158, 267, 364, 548 orthogonal vectors, 201, 202, 249, planes, 204–206, 230
43–48 multivariate analysis, 434 401–402, 477 Pless, V., 419n
numerical division, 69–80 orthogonality PLU-factorization, 108
scalar multiplication, 36–38 complex matrices, 397–406 point-slope formula, 194
size of matrices, 36 N constrained optimization, 431–434 pointwise addition, 291, 292
transformations, 51–54 n-code. See code dot product, 246–249 polar form, 527–529
transpose of a matrix, 38–40 (n, k)-code. See code eigenvalues, computation of, polynomials
usefulness of, 32 n-parity-check code, 415 393–396 associated with the linear
matrix form n-tuples, 229, 253, 289n expansion theorem, 251, 478 recurrence, 363
defined, 45 see also set of all ordered (n) finite fields, 409–411 characteristic polynomial. See
reduced row-echelon form, 9, 10, n-vectors. See vectors Fourier expansion, 251 characteristic polynomial
12, 14, 88–89 n-words, 412 Gram-Schmidt orthogonalization coefficients, 279–280, 290, 545
row-echelon form, 9 n × 1 matrix. See zero n-vector algorithm, 369–370, 378, 380, companion matrix, 137
upper Hessenberg form, 396 n × n matrix. See square matrix 390, 452, 479 complex roots, 155, 401, 549
matrix generators, 416–417 (n × n matrix) normalizing the orthogonal set, constant, 546
matrix inverses. See inverses nearest neighbour decoding, 413 250 defined, 145n, 290, 545
matrix inversion algorithm, 73–74, 86 negative orthogonal codes, 419–420 degree of the polynomial, 145n,
matrix multiplication correlation, 284, 285 orthogonal complement, 371–372, 290, 546
associative law, 62 of m × n matrix, 35 419 distinct degrees, 304
block, 64–65 vector, 43, 289 orthogonal diagonalization, division algorithm, 548
commute, 60, 63 negative x, 43 376–382 equal, 291, 546
compatibility rule, 59 negative x-shear, 54 orthogonal projection, 372–374 evaluation, 160, 298
and composition of network flow, 25–26 orthogonal sets, 249–252, 249n, even, 311
transformations, 57–64 Newton, Sir Isaac, 10n 368–369 factor theorem, 548
definition, 57, 59 Nicholson, W.K., 410n, 411n orthogonally similar, 383 form, 546
directed graphs, 66 nilpotent, 167, 519 positive definite matrix, 385–389 indeterminate, 546
distributive laws, 62–64 noise, 408 principal axis theorem, 377–378 interpolating the polynomial,
dot product rule, 49, 59 nonleading variable, 19 projection theorem, 273 144–147
left-multiplication, 76 nonlinear recurrences, 172 Pythagoras’ theorem, 250 Lagrange polynomials, 323, 479
matrix of composite of two linear nontrivial solution, 18, 152 QR-algorithm, 395–396 leading coefficient, 145n, 290, 546
transformations, 95 nonzero scalar multiple of a basic QR-factorization, 390–392 least squares approximating
matrix products, 58–62 solution, 23 quadratic forms, 381, 422–430 polynomials, 277–279
non-commutative, 76 nonzero vectors, 191, 231n, 248, real spectral theorem, 378 Legendre polynomials, 480
order of the factors, 62 249n statistical principal component as matrix entries, and
results of, 61–62 norm, 398, 471–474 analysis, 434–435 determinants, 132
right-multiplication, 76 normal, 204, 405 triangulation theorem, 382 with no root, 548, 549
matrix of a linear transformation, normal equations, 273 orthogonally diagonalizable, 377 nonconstant polynomial with
436–441 normalizing the orthogonal set, orthogonally similar, 383 complex coefficients, 269
matrix of T corresponding to the 250, 477 orthonormal basis, 480, 488 odd, 311
ordered bases B and D, 438 nth roots of unity, 530–531 orthonormal matrix, 376n remainder theorem, 547
matrix recurrence, 151 null space, 231 orthonormal set, 401, 477 root, 155, 298, 393, 523
matrix theory, 32n nullity, 339 orthonormal vector, 249 root of characteristic polynomial,
matrix transformation induced, 53, nullspace, 338 152, 325
91, 219 Taylor’s theorem, 321–322, 321n
matrix transformations. See linear P vector space of, 290
transformations O PageRank, 165–166, 166n vector spaces, 320–323
matrix-vector products, 44 objective function, 431, 433 paired samples, 284 zero polynomial, 290, 546
mean objects, 351n parabola, 425 position vector, 187
“average” of the sample values, 283n odd function, 507 parallel, 191 positive correlation, 284
calculation, 434 odd polynomial, 311 parallelepiped, 216, 223–224 positive definite, 385, 431
sample mean, 283 Ohm’s law, 27 parallelogram positive definite matrix, 385–389,
Mécanique Analytique (Legrange), one-to-one transformations, 339–341 area equal to zero, 215–216 470
215n onto transformations, 339–341 defined, 97, 187n positive semidefinite, 435
median open model of the economy, determined by geometric vectors, positive x-shear, 54
tetrahedron, 198 114–116 187 power method, 393–394
triangle, 198 open sector, 114 image, 223 power sequences, 361
messages, 416 ordered basis, 436, 437–438 law, 97, 187–191, 526 practical problems, 1
methods of proof. See proof ordered n-tuple, 42, 239n rhombus, 202 preimage, 337
metric, 351n see also set of all ordered n-tuples parameters, 2, 12 prime, 410, 538
midpoint, 190 (n) parametric equations of a line, 192 principal argument, 527
minimum distance, 413 origin, 184 parametric form, 2 principal axes, 378, 381–382, 423
Mirkil, H., 122n orthocentre, 228 parity-check code, 415 principal axis theorem, 377–378,
“mixed” cancellation, 74 orthogonal basis, 249n, 251–252, parity-check matrixes, 417–419 401, 404, 432, 490
modular arithmetic, 408–409 369, 479–480 parity digits, 417 principal components, 435
modulo, 408 orthogonal codes, 419–420 Parseval’s formula, 485 principal submatrices, 386
modulus, 408, 525 orthogonal complement, 370–371, particle physics, 433 probabilities, 117
Moler, Cleve, 166n 419, 481 partitioned into blocks, 64 probability law, 434
Moore-Penrose inverse, 282 orthogonal diagonalization, 376–382, path of length, 66 probability theory, 435
morphisms, 351n 486–491 Peano, Giuseppe, 288 product
multiplication orthogonal hermitian matrix, Pearson, Karl, 284n complex number, 526
block multiplication, 64–65 401–402 Pearson correlation coefficient, 284n cross product. See cross product
compatible, 59 orthogonal lemma, 368, 479 perfect code, 414 determinant of product of
matrix multiplication, 56–66 orthogonal matrix, 140, 376–377, period, 329 matrices, 137–138
matrix-vector multiplication, 376n, 383 permutation matrix, 107–110, 384, dot product. See dot product
43–48 orthogonal projection, 372–374, 482 453 inner product. See inner product
matrix products, 59–62 regular stochastic matrix, 122 multiplication, 91 solution to a system, 1–3, 13
matrix-vector products, 44 remainder, 408 vectors, 289 trivial solution, 18
scalar product. See scalar product remainder theorem, 321, 547 scalar operator, 332 solution to a system, 1
standard inner product, 397–399 repellor, 165 scalar product span, 232, 299
theorem, 138, 147 reproduction rate, 150 defined, 198 spanning sets, 231–234, 299–301
product rule, 358 restriction, 455 elementary row operations, 20 spectral theorem, 40
projection reversed, 6 geometric vectors, 186 spectrum, 378
linear operator, 493 rhombus, 202 scatter diagram, 284 sphere, 433
linear operators, 220–222 right cancelled invertible matrix, 74 Schmidt, Erhardt, 369n spring constant, 329
orthogonal projection, 372–374, right-hand coordinate systems, 217 Schur, Issai, 403, 403n square matrix (n × n matrix)
482 right-hand rule, 217 Schur’s theorem, 403–404, 405 characteristic polynomial, 152,
projection matrix, 56, 374, 384 root Schwarz, Hermann Amandus, 473n 405
projection on U with kernel W, 481 of characteristics polynomial, 152, second-order differential equation, cofactor matrix, 140
projection theorem, 273, 372–373, 325, 326 325, 357 complex matrix. See complex
482 of polynomials, 155, 298, 393, 523 Seneta, E., 114n matrix
projections, 100, 202–204, 370–374 of the quadratic, 531 sequences defined, 33
proof roots of unity, 530–531 of column vectors, 151 determinants, 51, 126, 139
by contradiction, 538–540 rotations constant sequences, 361 diagonal matrices, 69, 151, 156
defined, 536 about a line through the origin, equal, 361 diagonalizable matrix, 156, 448
direct proof, 536–537 451 Fibonacci sequence, 171, 171n diagonalizing matrix, 151
formal proofs, 537 about the origin, and orthogonal linear recurrences, 168–172 elementary matrix, 83–89
reduction to cases, 537–538 matrices, 140 notation, 360–361 hermitian matrix, 400–402
proper subspace, 229, 243 axis, 502 ordered sequence of real numbers, idempotent, 69
pure imaginary numbers, 523 describing rotations, 97 42 identity matrix, 46, 50–51
Pythagoras, 194, 537 fixed axis, 504 power sequences, 361 invariants, 154–156
Pythagoras’ theorem, 185, 185n, isometries, 497–498, 501–502, 504 recursive sequence, 169 inverse. See inverses
194–195, 200, 250, 477, 537 linear operators, 222–223 satisfy the relation, 361 lower triangular matrix, 134
linear transformations, 97–98 set, 229n matrix of an operator, 448
round-off error, 153n set notation, 230n nilpotent matrix, 167
Q row-echelon form, 9 set of all ordered n-tuples (n) orthogonal matrix, 140
QR-algorithm, 395–396 row-echelon matrix, 9, 10, 13, 254 closed under addition and scalar positive definite matrix, 385–389,
QR-factorization, 390–392 row-equivalent matrices, 90 multiplication, 43 470
quadratic equation, 431 row matrix, 32 complex eigenvalues, 269 regular representation of complex
quadratic form, 381, 422–430, 471 row space, 253 dimension, 240–244 numbers, 453
quadratic formula, 531 rows dot product, 246–249, 397 scalar matrix, 125
quotient, 408 convention, 32 expansion theorem, 251–252 similarity invariant, 450
elementary row operations, 5–7 as inner product space, 468 skew-symmetric, 42
(i, j)-entry, 33 linear independence, 236–240 trace, 69, 263
R leading 1, 9 linear operators, 219–224 triangular matrix, 134
r-ball, 413 as notations for ordered n-tuples, n-tuples, 253, 289n unitary matrix, 402
radian measure, 53n, 97, 527 239n notation, 42 upper triangular matrix, 134
random variable, 434, 435 shape of matrix, 32 orthogonal sets, 249–252 staircase form, 9
range, 338 Smith normal form, 87–88 projection on, 374 standard basis, 93, 233, 241, 242,
rank zero rows, 9 rank of a matrix, 253–260 250, 307, 397
linear transformation, 339, 441 rules of matrix arithmetic, 289 standard deviation, 434
matrix, 14–16, 253–260, 339, 428 similar matrices, 262–264 standard generator, 417
quadratic form, 428 S spanning sets, 231–234 standard inner product, 397–399
similarity invariant, 450 saddle point, 165 special types of matrices, 289 standard matrix, 445
symmetry matrix, 428 same action, 52, 291, 334 standard basis, 93 standard position, 97, 188, 527
theorem, 255–258, 259 sample subspaces, 231–234, 290 state vector, 118, 119–123
rational numbers, 289n analysis of, 282 symmetric matrix, 270–271 statistical principal component
Raum-Zeit-Materie (“Space-Time- comparison of two samples, Shannon, Claude, 412 analysis, 434–435
Matter”) (Weyl), 288 283–284 shift operator, 364–367 steady-state vector, 122
Rayleigh quotients, 394 defined, 282 shifting, 396 Steinitz Exchange Lemma, 306
real axis, 526 paired samples, 284 sign, 127 Stewart, G.W., 395
real Jordan canonical form, 520 sample correlation coefficient, similar matrices, 262–264 stochastic matrices, 113, 119, 122
real numbers, 1, 42, 289n, 291, 397, 284–285 similarity invariant, 450 structure theorem, 499–501
401, 409, 523 sample mean, 283 simple harmonic motions, 328 submatrix, 261
real parts, 357, 523 sample standard deviation, 283, 283n simplex algorithm, 16, 434 subset, 229n
real quadratic, 531 sample variance, 283 sine, 97, 219, 527 subspace test, 297
real spectral theorem, 378 sample vector, 282 single vector equation, 43 subspaces
recurrence, 169 satisfy the relation, 361 size m × n matrix, 32 basis, 241
recursive algorithm, 10 scalar, 36, 289, 289n, 409 skew-hermitian, 407 closed under addition, 229
recursive sequence, 169 scalar equation of a plane, 204 skew-symmetric, 42, 459, 492 closed under scalar multiplication,
reduced row-echelon form, 9, 10, 12, scalar matrix, 125 Smith, Henry John Stephen, 87n 229
14, 88–89 scalar multiple law, 97, 189, 191 Smith normal form, 87–88 complex subspace, 406
reduced row-echelon matrix, 9, 13 scalar multiples, 20, 36, 96 Snell, J., 122n defined, 229, 296
reducible, 462 scalar multiplication Solow, D., 536n dimension, 240–244
reduction to cases, 537–538 axioms, 289, 292n solution eigenspace, 267
reflections basic properties, 293–294 algebraic method, 4, 9 fundamental theorem, 240
about a line through the origin, closed under, 43 basic solutions, 22, 23 image, 337, 338–339
140 closed under scalar multiplication, best approximation to, 273–277 intersection, 235, 317–318
fixed hyperplane, 504 229 consistent system, 2, 15 invariance theorem, 241
fixed line, 498 described, 36–38 general solution, 2, 13 invariant subspaces, 454–464
fixed plane, 501 distributive laws, 37 geometric description, 3 kernel, 338–339
isometries, 497–498 of functions, 292 inconsistent system, 2 m × n matrix, 230–231
linear operators, 220–222 geometric vectors, 189–191 to linear equation, 1–3 planes and lines through the
linear transformations, 99–100 geometrical description, 96 nontrivial solution, 18, 152 origin, 230
regular representation, 453 transformations, preserving scalar in parametric form, 2 projection, 371–372
proper subspace, 229, 243 solutions, 1–3, 13 vector equation of a line, 192 position vector, 201
spanning set, 231–234 trivial solution, 18 vector equation of a plane, 205 sample vector, 282
subspace test, 297 unique solution, 3, 3f vector geometry scalar multiplication, 289
sum, 235, 317–318 systematic generator, 417 angle between two vectors, 200 set of all ordered n-tuples (n).
vector spaces, 296–299 computer graphics, 225–227 See set of all ordered n-tuples
zero subspace, 229, 297 cross product, 206–209 (n)
subtraction T defined, 184 single vector equation, 43
complex number, 523–524 T-invariant, 454 direction vector, 191–192 state vector, 118, 119–123
matrix, 35 tail, 186 geometric vectors. See geometric steady-state vector, 122
vector, 188, 293 Taylor’s theorem, 321–322, 321n, vectors subtraction, 293
sum 447n line perpendicular to plane, sum of two vectors, 289
see also addition tetrahedron, 198 198–202 unit vector, 189, 246, 398–399
algebraic sum, 28 theorems, 540 linear operators, 219–224 zero n-vector, 43
complex number, 526 theory of Hilbert spaces, 369n lines in space, 191–194 zero vector, 46, 189n, 229, 236,
direct sum, 318, 459–464, 467 third-order differential equation, planes, 204–206 289
elementary row operations, 20 325, 357 projections, 202–204 velocity, 186n
geometric vectors, 188 Thompson, G., 122n symmetric form, 197 vertices, 66
geometrical description, 96 3-dimensional space, 51, 184 vector equation of a line, 192 vibrations, 433
matrices of the same size, 34 time, functions of, 151n vector product, 207 volume
matrix addition, 34–36 tip, 186 see also cross product linear transformations of, 223–224
of product of matrix entries, 132 tip-to-tail rule, 187 vector quantities, 186n of parallelepiped, 216, 223–224
of scalar multiples, 20 total variance, 435 vector spaces
subspaces, 235, 317–318 trace, 69, 166, 263, 332, 450 abstract vector space, 288
subspaces of a vector space, trajectory, 164 axioms, 289, 292, 294 W
317–318 transformations basic properties, 288–295 Weyl, Hermann, 288
of two vectors, 293 see also linear transformations basis, 306 whole number, 536n
variances of set of random action, 52, 436, 438–439 cancellation, 292 Wilf, Herbert S., 166n
variables, 435 composite, 57, 95 as category, 351n Wilkinson, J.M., 395
of vectors in two subspaces, 459 defining, 52 continuous functions, 469 words, 412
summation notation, 179, 179n described, 32, 52 defined, 289 wronskian, 330
Sylvester’s law of inertia, 428 equal, 52 differential equations, 325–329 Wu, N., 434n
symmetric bilinear form, 431 identity transformation, 53, 101 dimension, 307–308
symmetric form, 196, 422, 431 matrix transformation, 53, 426 direct sum, 459–460
symmetric linear operator, 489 zero transformation, 53, 102 examples, 288–295 X
symmetric matrix transition matrix, 118, 119 finite dimensional spaces, 311–318 x-axis, 184
absolute value, 270n transition probabilities, 117, 119 infinite dimensional, 312 x-compression, 53
congruence, 427–430 translation, 54, 336, 493 introduction of concept, 288 x-expansion, 53
defined, 40 transpose of a matrix, 38–40 isomorphic, 347–348 x-shear, 54
index, 428 transposition, 38–39, 332 linear independence, 303–308
orthogonal eigenvectors, 377–381 triangle linear recurrences, 360–367
positive definite, 385–389 altitude, 228 linear transformations, 331
Y
rank and index, 428 centroid, 198 polynomials, 290–291, 320–323 y-axis, 184
real eigenvalues, 270–271 hypotenuse, 537 scalar multiplication, basic y-compression, 53
syndrome, 419 inequality, 213, 249, 473–474, 525 properties of, 293–294 y-expansion, 53
syndrome decoding, 419 median, 198 set of all ordered n-tuples (n).
system of first order differential orthocentre, 228 See set of all ordered n-tuples Z
equations. See differential triangle inequality, 213, 249, (n)
473–474, 525 z-axis, 184
equations spanning sets, 299–301
triangular matrices, 103–104, 111, zero matrix
system of linear equations subspaces, 296–299, 317–318
134 described, 34
algebraic method, 4 theory of vector spaces, 292–293
triangulation algorithm, 513 no inverse, 70
associated homogeneous system, 3-dimensional space, 184
triangulation theorem, 382 scalar multiplication, 36
47 zero vector space, 295
trigonometric functions, 97 zero n-vector, 43
augmented matrix, 3 vectors
trivial linear combinations, 236, 303 zero polynomial, 290, 546
chemical reactions, application addition, 289
trivial solution, 18 zero rows, 9
to, 29–30 arrow representation, 51n
zero subspace, 229, 297
coefficient matrix, 3 column vectors, 151
zero transformation, 53, 102, 332
consistent system, 2, 15 complex matrices, 397
constant matrix, 3 U coordinate vectors, 189, 207, 224,
zero vector, 46, 189n, 229, 236, 289
uncorrelated, 435 zero vector space, 295
defined, 1 233, 437
electrical networks, application unit ball, 433, 472 defined, 42, 288, 397
to, 27–28 unit circle, 97, 472, 527 difference of, 293
elementary operations, 4–7 unit cube, 224 direction of, 185
equivalent systems, 4 unit square, 224 direction vector, 191–194
gaussian elimination, 13, 14–16 unit triangular, 111 fixed vectors, 505
general solution, 2 unit vector, 189, 246, 398–399, 472 geometric vectors. See geometric
homogeneous equations, 18–23 unitarily diagonalizable, 403 vectors; vector geometry
inconsistent system, 2 unitary matrix, 402 initial state vector, 119
infinitely many solutions, 3, 3f upper Hessenberg form, 396 intrinsic descriptions, 186
inverses and, 72 upper triangular matrix, 103, 134, length, 184–185, 246, 398
with m × n coefficient matrix, 403 matrix recurrence, 151
44–45 matrix-vector multiplication,
matrix algebra. See matrix algebra 43–48
matrix multiplication, 61–62
V matrix-vector products, 44
network flow application, 25–26 Vandermonde determinant, 146 negative, 43, 289
no solution, 2, 3f Vandermonde matrix, 133, 364 nonzero vectors, 191, 231n, 248,
nontrivial solution, 18–19 variance, 282–286, 434 249n
normal equations, 273 variance formula, 286 orthogonal vectors, 201, 202, 249,
positive integers, 30 variation, 434 401–402, 477
rank of a matrix, 14–16 vector addition, 289, 526 orthonormal vector, 249, 401