Mathematical Fundamentals
Mathematical Fundamentals
Mathematical Fundamentals
Mathematical Fundamentals
A.1 Matrices and Their Operation
A.1.1 Denition of a Matrix
A matrix is simply a rectangular array of numbers arranged in m rows and n columns. For example,
a 2 3 matrix has a form:
_
3 2 4
5 1 3
_
(A.1)
Along with being seen as a way of arranging numbers in rows and columns (arrays), matrices normally
have much deeper meaning which depends on the applications. One such an application is the set of
coecients of the system of linear equations, as explained in Appendix A.3. In general, the matrix
elements are indexed by subscripts as a
ij
with index i denoting the row number and the second denotes
the column number:
_
_
_
_
_
a
11
a
12
a
1n
a
21
a
22
a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
m1
a
m2
a
mn
_
_
_
_
_
(A.2)
Here, i = 1, 2, , m and j = 1, 2, , n.
A.2 Properties of Matrices
Two matrices are said to be equal if and only if they have the same number of rows and columns
and, at the same time every element of one is equal to the corresponding element of the other. For
example, the matrix A = (a
ij
) is equal to matrix B = (b
ij
) if and only if
(a
ij
) = (b
ij
), i = 1, 2, , m; j = 1, 2, , n
The matrix transpose can be dened as follows. For a mn matrix A = (a
ij
), the transpose
matrix is
A
T
= (a
ji
), i = 1, 2, , m; j = 1, 2, , n
In other word, every row of matrix A becomes the column of transpose matrix A
T
, hence the dimension
of the transpose matrix is nm. Obviously, the transpose of transpose matrix is equal to its original:
(A
T
)
T
= A
123
124 APPENDIX A. MATHEMATICAL FUNDAMENTALS
A.2.1 Operations with Matrices
Multiplication by a Scalar
Matrix A = (a
ij
) is multiplied by a scalar in the following way:
A =
_
_
_
_
_
a
11
a
12
a
1n
a
21
a
22
a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
m1
a
m2
a
mn
_
_
_
_
_
(A.3)
which means that every element of a matrix is multiplied by the given scalar .
Addition of Two or More Matrices
Two matrices could be add together only if they have the same number of rows and columns and then
corresponding elements are added together. For example, for two matrices A = (a
ij
) and B = (b
ij
),
the sum is obtained as follows:
A + B =
_
_
_
_
_
a
11
a
1n
a
21
a
2n
.
.
.
.
.
.
.
.
.
a
m1
a
mn
_
_
_
_
_
+
_
_
_
_
_
b
11
b
1n
b
21
b
2n
.
.
.
.
.
.
.
.
.
b
m1
b
mn
_
_
_
_
_
=
_
_
_
_
_
a
11
+ b
11
a
1n
+ b
1n
a
21
+ b
21
a
2n
+ b
2n
.
.
.
.
.
.
.
.
.
a
m1
+ b
m1
a
mn
+ b
mn
_
_
_
_
_
(A.4)
Matrix Multiplication
For two matrices to be multiplied the number of columns of the rst one must be the same as the
number of rows of the second one. Then, the product of two matrices can be dened as follows. Given
the matrices A = (a
ik
) and B = (b
kj
) where i = 1, 2, , m, k = 1, 2, , n and j = 1, 2, , p, then
their product C = (c
ij
) is given as
C = AB = (a
ik
)(b
kj
) = (c
ij
) = (
n
k=1
a
ik
b
kj
) (A.5)
which has dimension mn. For example, given the two matrices
A =
_
a
11
a
12
a
13
a
21
a
22
a
23
_
, B =
_
_
b
11
b
12
b
21
b
22
b
31
b
32
_
_
their product can be calculated as
AB =
_
a
11
a
12
a
13
a
21
a
22
a
23
_
_
_
b
11
b
12
b
21
b
22
b
31
b
32
_
_
=
=
_
a
11
b
11
+ a
12
b
21
+ a
13
b
31
a
11
b
12
+ a
12
b
22
+ a
13
b
32
a
21
b
11
+ a
22
b
21
+ a
23
b
31
a
21
b
12
+ a
22
b
22
+ a
23
b
32
_
=
_
c
11
c
12
c
21
c
22
_
Note that the resultant matrix has dimension mn or in this case 2 2.
The matrix multiplication is not commutative, that is
AB = BA
A.2. PROPERTIES OF MATRICES 125
A.2.2 Matrices of Special Form
Diagonal Matrices
If a matrix has equal number of rows and columns, that is if m = n, it is called a square matrix. A
square matrix with elements a
ij
= b
ij
when i = j, where b
ij
represents any real number and a
ij
= 0
when i = j is called a diagonal matrix and has the form such as:
_
_
b
11
0 0
0 b
22
0
0 0 b
33
_
_
(A.6)
Scalar Matrix
A diagonal matrix with common principal diagonal element is called a scalar matrix. For example
B =
_
_
b 0 0
0 b 0
0 0 b
_
_
=
_
_
2 0 0
0 2 0
0 0 2
_
_
(A.7)
is a scalar matrix.
Unit or Identity Matrix
A diagonal matrix with all principal elements being one is called a unit or an identity matrix:
I =
_
_
1 0 0
0 1 0
0 0 1
_
_
(A.8)
A.2.3 Determinants
For every matrix we can dene a unique number which is called the determinant of that matrix. For
a 2 2 matrix, the determinant is:
_
a
11
a
12
a
21
a
22
_
= a
11
a
22
a
12
a
21
and for a 3 3 matrix the determinant is:
_
_
a
11
a
12
a
13
a
21
a
22
a
23
a
31
a
32
a
33
_
_
= a
11
a
22
a
33
+ a
12
a
23
a
31
+ a
13
a
21
a
32
a
13
a
22
a
31
a
12
a
21
a
33
a
11
a
23
a
32
There are many ways of easy remembering how to derive the determinant of a matrix and they are
widely explained in the literature. However, mathematically it is important to realise that for a matrix
of n n ordernthere are n! terms to be summed for the determinant since all possible permutations
of thenindexes 1, 2, , n must be found among the second subscript of the matrix elements. We can
now summarise the preceding discussion into a formal denition of a determinant. Given a matrix of
order n n,
A =
_
_
_
_
_
a
11
a
1n
a
21
a
2n
.
.
.
.
.
.
.
.
.
a
n1
a
nn
_
_
_
_
_
(A.9)
126 APPENDIX A. MATHEMATICAL FUNDAMENTALS
we can select one element from each column and row of A in such a way that when the product of
these elements is expressed as
a
1i
a
2j
a
3k
, ans
the set of subscripts {i, j, k , s} for the second subscript representsna permutation of the integers
1, 2, , n. The determinant of an nth order matrix A, denoted as |A|, may be dened as
|A| =
a
1i
a
2j
a
3k
a
ns
(A.10)
the sum bein taken over all permutations of the second subscripts where a + sign is attached to the
terms exhibiting even permutations and a sign to the terms exhibiting odd permutations.
A.2.4 Eigenvalues and Eigenvectors
A square matrix A may be regarded as a linear operator which maps a vector x into another vector y:
y = Ax
Any vector x which is mapped into x, where is a scalar, by A is called an eigenvector of the matrix
A:
x Ax = (I A)x = 0
This has nontrivial solution for if and only if:
det(I A) = 0
where I is the unity matrix with diagonal elements 1 and all other equal to zero. This equation is
also called the characteristic equation of A and solution for are eigenvalues of A. The vectors x
corresponding to those eigenvalues are called eigenvectors.
Example: For a matrix
A =
_
1 2
1 4
_
we have
det(I A) = det
_
1 2
1 4
_
= 0
The characteristic equation is
2
5 + 6 = 0
hence the eigenvalues of A are = 2 and = 3.
A.2.5 Positive-denite Matrices
A symmetric matrix A is positive denite if and only if
x
T
Ax > 0
for all nonzero vectors x. This is dicult to verify and hence more practical denition is that a matrix
A is positive-denite if all eigenvalues are positive. In the similar way, we can dene that matrix A is
1. positive semidenite if x
T
Ax 0 for all x, or equivalently all the eigenvalues of A are nonnega-
tive;
2. negative denite if x
T
Ax < 0 for all x = 0, or equivalently all the eigenvalues of A are negative;
3. negative semidenite if x
T
Ax 0 for all x, or equivalently all the eigenvalues of A are nonposi-
tive;
4. indenite if x
T
Ax takes on both positive and negative values, or equivalently the eigenvalues of
A are both positive and negative;
A.2. PROPERTIES OF MATRICES 127
A.2.6 Null and Range Spaces
If A is an mn with m n, then the null space (A) of A is
(A) = {p
n
: Ap = 0}
That is the null space of a matrix is the set of vectors orthogonal to the rows of the matrix. From the
representation point of view, Z is a null space of matrix A if any vector in (A) can be expressed as
linear combination of the column of Z. Hence, the representation of a null space is not unique, but
for a matrix Z to be a null space of A, it must satisfy A Z = 0.
The range space of a matrix is the set of all linear combinations of the column of that matrix.
For the matrix A
T
, the range space is dened as:
(A
T
) =
_
q
n
: q = A
T
for some
m
_
Example: Consider the following matrix:
A =
_
1 1 0 0
0 0 1 1
_
The null space of A is the set of vectors p such that
Ap =
_
1 1 0 0
0 0 1 1
_
_
_
_
_
p
1
p
2
p
3
p
4
_
_
_
_
=
_
p
1
p
2
p
3
+ p
4
_
=
_
0
0
_
which then must satisfy p
1
= p
2
and p
3
= p
4
. So, the null space vector must have the form
p =
_
_
_
_
v
1
v
1
v
2
v
2
_
_
_
_
for some scalars v
1
and v
2
. A possible basis matrix for the null space of A is
Z =
_
_
_
_
1 0
1 0
0 1
0 1
_
_
_
_
To verify the solution. we have
A Z =
_
1 1 0 0
0 0 1 1
_
_
_
_
_
1 0
1 0
0 1
0 1
_
_
_
_
=
_
0 0
0 0
_
which indicates that Z is indeed the null space matrix of A.
Another view of the null space, rather another explanation is as follows:
Let A be an mn matrix with m n
1
. We denote the null space of A as
(A) = {p
n
: Ap = 0} (A.11)
The matrix equation Ap = 0 is equivalent to a homogeneous system of linear equations.
1
With matrices m denotes number of rows and n denotes number of columns.
128 APPENDIX A. MATHEMATICAL FUNDAMENTALS
Example: Consider the matrix
A =
2 3 5
4 2 3
2 3 5
4 2 3
p
1
p
2
p
3
0
0
The null space represents the set of feasible directions for the constraints Ap = b and it is a set of
vectors orthogonal to the rows of the matrix. It is easy to see that any linear combination of two
vectors in (A) is also in (A), and thus the null space is subspace of
n
. It can be shown that
the dimension of this space is n rank(A). When A has full row rank (i.e. its rows are linearly
independent) this is just n m.
A.3 Solving System of Linear Equations
The system of linear equations such as
a
11
x
1
+ a
12
x
2
a
1n
x
n
= b
1
a
21
x
1
+ a
22
x
2
a
2n
x
n
= b
2
.
.
.
a
m1
x
1
+ a
m2
x
2
a
mn
x
n
= b
m
where a
jk
are given numbers which are called the coecients of the system. The b
i
are also given
numbers. If b
i
are all zero, then the system mis called the homogeneous system. if at least one b
i
is
not zero, the system is called nonhomogenous system. A solution of the system is a set of numbers
z
1
, x
n
that satises all m equations.
Now, we see that the system of linear equations can be represented in a single vector form as
Ax = b (A.12)
where coecient matrix A is the mn matrix together with vectors x and b as follows:
A =
a
11
a
12
a
1n
a
21
a
22
a
2n
.
.
.
.
.
.
.
.
.
a
m1
a
m2
a
mn
, x =
x
1
x
2
.
.
.
x
n
, and b =
b
1
b
2
.
.
.
b
m
2
f(x)
x
2
1
2
f(x)
x
1
x
2
2
f(x)
x
1
x
n
2
f(x)
x
2
x
1
2
f(x)
x
2
2
2
f(x)
x
2
x
n
.
.
.
.
.
.
.
.
.
.
.
.
2
f(x)
x
n
x
1
2
f(x)
x
n
x
2
2
f(x)
x
2
n
(A.14)
Using the gradient notation, we can write that the Hessian matrix is of the form
H(f(x)) =
_
2
f(x)
ij
2
f(x)
x
j
x
i
(A.15)
The functions with continuous second derivatives, the corresponding Hessian is always a symetric
matrix:
2
f(x)
x
i
x
j
=
2
f(x)
x
j
x
i
(A.16)
Example: Consider function F(x
1
, x
2
) = 2x
4
1
+3x
2
1
x
2
+2x
1
x
3
2
+4x
2
2
. The gradient of this function
is:
f(x) =
8x
3
1
+ 6x
1
x
2
+ 2x
3
2
3x
2
1
+ 6x
1
x
2
2
+ 8x
2
2
f(x) =
24x
2
1
+ 6x
2
6x
1
+ 6x
2
2
6x
1
+ 6x
2
2
12x
1
x
2
+ 8
At the point x
0
= (2, 3) these become f(x
0
) =
46
72
and
2
f(x
0
) =
114 42
42 64
.
A.6 The Chain Rule
The rule of obtaining the derivative of a function of a function is called the chain rule. Take the function
g(x) = g(f
1
, f
2
, , f
n
), and suppose that each f
i
is in turn a function of variables x
1
, , x
m
so that
f
i
= f
i
(x
1
, , x
m
) for i = 1, n. So, we will examine the composite function
h(t) = g(f(x))
The chain rule states that if g is continuously dierentiable in
n
and f are continuously dierentiable
in
m
, then h is continuously dierentiable in
m
and
h(x) = f(x)g(x)
where
h(x) = (f
1
(x), f
n
(x))
Example: Take the set of functions h(x) = g(f(x)) where
g
1
(f) = f
2
1
f
1
f
2
g
2
(f) = f
4
1
+ 2f
2
2
where
f
1
(x) = x
1
+ 2x
2
2x
3
f
2
(x) = x
2
1
+ 2x
2
2
A.7. TAYLOR SERIES 131
Then
f(x) =
_
_
1 2x
1
2 1
3 0
_
_
and
h(f(x)) =
_
2f
1
(x) f
2
(x) 4f
3
1
(x)
f
1
(x) 4f
2
(x)
_
=
=
_
2(x
1
+ 2x
2
3x
3
) (x
2
1
+ x
2
) 4(x
1
+ 2x
2
3x
3
)
3
(x
1
+ 2x
2
3x
3
) 4(x
2
1
x
2
)
_
hence
h(x) =
_
_
1 2x
1
2 1
3 0
_
_
_
2(x
1
+ 2x
2
3x
3
) (x
2
1
+ x
2
) 4(x
1
+ 2x
2
3x
3
)
3
(x
1
+ 2x
2
3x
3
) 4(x
2
1
x
2
)
_
A.7 Taylor Series
The Taylor series is a tool for approximating a function f near a specied point x
0
. The obtained
approximation is polynomial, i.e., a function that is easy to manipulate. It is a general tool and can
be applied whenever the function has derivatives.
Consider the case of one-dimensional function f with n continuous derivatives. Let x
0
be the
specied point, then the n-th order Taylor series approximation is:
f(x
0
+ p) f(x
0
) + pf
(x
0
) +
1
2
p
2
f
(x
0
) + +
p
n
n!
f
(n)
(x
0
)
Here, p is a variable: we will see later what values p will take. Normally, the approximation will only
be accurate for small values of p.
Example: Find the Taylor series for the function f(x) =
x at point x
0
= 1. Then:
f(x
0
) =
x
0
=
1 = 1
f
(x
0
) =
1
2
x
1
2
0
=
1
2
1
1
2
=
1
2
f
(x
0
) =
1
4
x
3
2
0
=
1
4
1
3
2
=
1
4
f
(x
0
) =
3
8
x
5
2
0
=
8
1
5
2
=
3
8
Substituting these into the formula for the Taylor series
1 + p = f(x
0
+ p) f(x
0
) + pf
(x
0
) =
1
2
p
2
f
(x
0
) +
1
6
p
3
f
(x
0
)
1 + p
_
1
2
_
+
1
2
p
2
_
1
4
_
+
1
6
p
3
_
1
6
_
1 +
1
2
1
8
p
2
+
1
16
p
3
So, how do we use this? Suppose we want to approximate f(1.6) then x
0
+ p = 1 + p = 1.6 and so
p = 0.6:
1.6 =
1 + 0.6 1 +
1
2
(0.6)
1
8
(0.6)
2
+
1
16
(0.6)
3
1.2685
The true value is 1.264911 . . .; the approximation is accurate to three digits.
The rst two terms of the Taylor series give us the formula for the tangent line for the function
f at the point x
0
. We commonly de4ne tangent line in terms of general point x, and not in terms of
p. Since x
0
+ p = x, we get p = x x
0
. Substituting this into the rst two terms of the Taylor series
we get the tangent line as:
y = f(x
0
) + (x x
0
)f
(x
0
)
For our example above we get:
132 APPENDIX A. MATHEMATICAL FUNDAMENTALS
Figure A.2: Taylor series approximation
y = 1 + (x 1)
1
2
or y =
1
2
(x + 1)
This is illustrated in Figure A.2. The Taylor series can also be derived for real-valued function of more
than one variable. In the matrix notation the obvious analogy between the two cases is:
1 variable : f(x
0
+ p) f(x
0
) + pf
(x
0
) +
1
2
p
2
f
(x
0
) +
n variables : f(x
0
+ p) f(x
0
) + p
T
f(x
0
) +
1
2
p
T
2
f(x
0
)p +
In the case of more than one variable, x
0
and p are both vectors. The notion f(x
0
) refers to the
gradient of the function f at the point x = x
0
. The notion
2
f(x
0
) represents the Hessian of f at the
point x = x
0
.
Example: Consider the function with more than one variable:
f(x
1
, x
2
) = x
3
1
+ 5x
2
1
x
2
+ 7x
1
x
2
2
+ 2x
3
2
at the point
x
0
= (2, 3)
T
The gradient of this function:
f(x) =
_
3x
2
1
+ 10x
1
x
2
+ 7x
2
2
5x
2
1
+ 14x
1
x
2
+ 6x
2
2
_
and the Hessian is:
2
f(x) =
_
6x
1
+ 10x
2
10x
1
+ 14x
2
10x
1
+ 14x
2
14x
1
12x
2
_
At the point x
0
= (2, 3)
T
these are
f(x
0
) =
_
15
10
_
and
2
f(x
0
) =
_
18 22
22 8
_
If p = (p
1
, p
2
)
T
= (0.1, 0.2)
T
then
f(1.9, 3.2) = f(2 + 0.1, 3 + 0.2)
= f(x
0
+ p)
f(x
0
) + p
T
f(x
0
) +
1
2
p
T
2
f(x
0
)p
= 20 + (0.1 0.2)
_
15
10
_
+
1
2
(0.1 0.2)
_
18 22
22 8
__
0.1
0.2
_
= 20 0.5 + 0.69 = 19.81
A.8. NEWTONS METHOD FOR NONLINEAR EQUATIONS 133
The true value is f(1.9, 3.2) = 19.755, so the approximation is accurate to tree digits.
The Taylor series for multi-dimensional problems can also derived using summations (dierent
notation only):
f(x
0
+ p) = f(x
0
) +
n
i=1
p
i
f(x)
x
i
x=x
0
+
1
2
n
i=1
n
j=1
p
1
p
j
2
f(x)
x
i
x
j
x=x
0
+
There is alternate form of the Taylor series called remainderform. For three terms it looks like:
1 variable : f(x
0
+ p) = f(x
0
) + pf
(x
0
) +
1
2
p
2
f
()
n variables : f(x
0
+ p) = f(x
0
) + p
T
f(x
0
) +
1
2
p
T
2
f()p
The point is an unknown between x
0
and x
)
+p. In this form the series is exact, but since it involves
unknown point it can not be evaluated. If the remainder form is used but only with the two terms,
then we have
1 variable : f(x
0
+ p) = f(x
0
) + pf
()
n variables : f(x
0
+ p) = f(x
0
) + p
T
f()
which is known as the mean value theorem.
A.8 Newtons Method for Nonlinear Equations
In numerical analysis, Newtons method (also known as the NewtonRaphson method), named after
Isaac Newton and Joseph Raphson, is perhaps the best known method for nding successively better
approximations to the zeros (or roots) of a real-valued function. Newtons method can often converge
remarkably quickly, especially if the iteration begins suciently near the desired root. We will
explain the method using a one-dimensional case where x is a scalar and f is a real-valued function:
f(x) = 0
The whole algorithm works as follows. Given an estimate of the solution x
k
, the function
is approximated by the linear function consisting of the rst two terms of the Taylor series for the
function f at the point x
k
. The resulting linear system is solved to obtain new estimate of the solution
x
k+1
. The Taylor series of the function f at the point x
k
is:
f(x
k
+ p) f(x
k
) + pf
(x
k
)
If f
(x
k
) = 0 the we cal solve the equation
f(x
) f(x
k
) + pf
(x
k
) = 0
for p to obtain
p =
f(x
k
)
f
(x
k
)
The new estimate of the solution then is x
k+1
= p + x
k
or
x
k+1
= x
k
f(x
k
)
f
(x
k
)
The graphical interpretation of the whole process, starting from the initial guess x
0
is shown in Figure
A.3.
134 APPENDIX A. MATHEMATICAL FUNDAMENTALS
Figure A.3: Geometric interpretation of Newtons method
Example: As an example, consider the one-dimensional problem
f(x) = 7x
4
+ 3x
3
+ 2x
2
+ 9x + 4 = 0
Then
f
(x) = 28x
3
+ 9x
2
+ 4x + 9
and the Newtons method is
x
k+1
= x
k
7x
4
k
+ 3x
3
k
+ 2x
2
k
+ 9x
k
+ 4
28x
3
k
+ 9x
2
k
+ 4x
k
+ 9
which gives the solution as in Table A.1.
k x
k
f(x
k
)
0 0 4.00E+00
1 -0.444444444 4.0E-01
2 -0.506325575 2.6E-02
3 -0.511009243 1.8E-04
4 -0.511041786 8.8E-09
5 -0.511041788 0.0E+00
Table A.1: Newtons method for one dimensional problem
Much of the discussion in the one-dimensional case can be transfered with only minor changes
to the n-dimensional case. Again, suppose that we are solving
f(x) = 0
where this represents
f
1
(x
1
, , x
n
) = 0
f
2
(x
1
, , x
n
) = 0
.
.
.
f
n
(x
1
, , x
n
) = 0
Dene the matrix f(x) with columns f
1
(x), , f
n
(x). This is the transpose of the Jacobian
2
of
f at the point x. As before, we will write the Taylors series approximation for the function f at the
point x
k
:
f(x
k
+ p) f(x
k
) +f(x
k
)P
2
The Jacobian matrix is the matrix of all rst-order partial derivatives of a vector-valued function. The partial
A.8. NEWTONS METHOD FOR NONLINEAR EQUATIONS 135
where p is now a vector. Now we solve the equation:
f(x
) f(x
k
) +f(x
k
)
T
p = 0
for p to obtain
p = f(x
k
)
T
f(x
k
)
The new estimate of the solution then is:
x
k+1
= x
k
f(x
k
)
T
f(x
k
)
This is now formula for Newtons method in -n-dimensional case.
derivatives of all these functions (if they exist) can be organized in an mn matrix, the Jacobian matrix J:
J =
f
1
x
1
f
1
x
n
.
.
.
.
.
.
.
.
.
f
m
x
1
f
m
x
n
The i-th row (i = 1, , m) of this matrix is the gradient of the i-th component function f
i
(f
i
)