Appendix: Useful Mathematical Techniques


This appendix contains concise reviews of various mathematical techniques
used in this book. We start with formulae useful for solving triangles, both
planar and spherical. Next, some aspects of the manipulation of vectors
are summarized. These include solution methods for vector equations and
conventions for differentiation with respect to a vector and a matrix. Least-
squares methods for linear systems come next. These are followed by a
review of optimization methods, both unconstrained and constrained. The
appendix ends with a look at the calculus of variations.
A.1 Solving Triangles
Suppose a planar triangle has sides a, b, and c with opposite angles A, B,
and C (figure A-1a). The diameter of the circumscribed circle equals the
length of one side divided by the sine of the opposite angle. Since this is
true for all three choices of sides, we have
$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}.$$
This is the well-known law of sines. The law of cosines is given by
$$a^2 = b^2 + c^2 - 2bc\cos A.$$
Two similar formulae can be obtained by simultaneous cyclical permutation
of a, b, c and A, B, C. The projection theorem states that
c = a cos B + b cos A.
(There are many other useful relationships, but we can usually manage
with just these three; others can be derived from them if necessary.) The
area of the triangle can be written
$$S = \tfrac{1}{2}\,ab\sin C.$$
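For concreteness, here is a minimal Python sketch (using only the standard math module; the function name is our own) that solves a planar triangle from two sides and the included angle with the formulae above.

```python
import math

def solve_planar_triangle(b, c, A):
    """Given sides b, c and the included angle A (radians),
    return the third side a, the angles B and C, and the area S."""
    # Law of cosines for the side opposite the known angle.
    a = math.sqrt(b * b + c * c - 2.0 * b * c * math.cos(A))
    # Law of cosines again for B avoids the quadrant ambiguity of arcsin.
    B = math.acos((a * a + c * c - b * b) / (2.0 * a * c))
    C = math.pi - A - B
    # Area from two sides and the included angle.
    S = 0.5 * b * c * math.sin(A)
    return a, B, C, S

# Example: b = 3, c = 4, A = 90 degrees gives a = 5 and area 6.
# The law of sines, a/sin A = b/sin B = c/sin C, can be used as a check.
print(solve_planar_triangle(3.0, 4.0, math.pi / 2))
```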
A spherical triangle is a figure on the sphere whose three sides are segments of great circles (figure A-1b). Suppose that the angles of intersection
at the three corners are A, B, and C. On the unit sphere, the length of a
side is equal to the angle it subtends at the center of the sphere. Let the
lengths of the three sides be a, b, and c. In the case of a spherical triangle,
the law of sines is
$$\frac{\sin a}{\sin A} = \frac{\sin b}{\sin B} = \frac{\sin c}{\sin C}.$$
The law of cosines for the sides is
$$\cos a = \cos b \cos c + \sin b \sin c \cos A,$$
while the law of cosines for the angles is
$$\cos A = -\cos B \cos C + \sin B \sin C \cos a.$$
There are two more formulae in each case obtained by simultaneous cyclical
permutation of a, b, c and A, B, C. (Other relations can be derived from
these if necessary.)
Sometimes it is difficult to determine which quadrant an angle lies in.
In this case, the rule of quadrants comes to the rescue:
$\frac{1}{2}(A + B)$ is in the same quadrant as $\frac{1}{2}(a + b)$. Finally, we note that the area of a spherical triangle is
$$S = R^2\,\varepsilon.$$
Here $R$ is the radius of the sphere, while $\varepsilon = A + B + C - \pi$ is called the spherical excess (measured in radians).
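These relations are easy to check numerically. The sketch below (assuming NumPy is available; the helper names are ours) builds a random spherical triangle on the unit sphere and verifies the law of sines, the law of cosines for the sides, and the area formula.

```python
import numpy as np

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)

# Three random vertices on the unit sphere.
P = [unit(rng.standard_normal(3)) for _ in range(3)]

def arc(u, v):
    """Angle subtended at the centre by two unit vectors (side length)."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

a, b, c = arc(P[1], P[2]), arc(P[2], P[0]), arc(P[0], P[1])

def corner(i, j, k):
    """Angle at vertex i between the great-circle arcs towards j and k."""
    tj = unit(P[j] - np.dot(P[i], P[j]) * P[i])
    tk = unit(P[k] - np.dot(P[i], P[k]) * P[i])
    return arc(tj, tk)

A, B, C = corner(0, 1, 2), corner(1, 2, 0), corner(2, 0, 1)

# Spherical law of sines: the three ratios agree.
print(np.sin(a) / np.sin(A), np.sin(b) / np.sin(B), np.sin(c) / np.sin(C))
# Law of cosines for the sides.
print(np.isclose(np.cos(a), np.cos(b) * np.cos(c) + np.sin(b) * np.sin(c) * np.cos(A)))
# Area on the unit sphere equals the spherical excess.
print("area =", A + B + C - np.pi)
```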
A.2 Manipulation of Vectors
We shall assume that the reader is familiar with the properties of vector
addition, scalar multiplication, and dot- and cross-products. Vectors will
be denoted by boldface letters. We commonly deal with column vectors
and therefore have to take the transpose, indicated by the superscript $T$, when we want to write them in terms of the equivalent row vectors.
A.2.1 More Complex Products of Vectors
The vector triple product is defined as follows:
$$[a\,b\,c] = a \cdot (b \times c) = (a \times b) \cdot c.$$
The magnitude of the result is independent of the order of the vectors, since
it is the volume of the parallelepiped defined by the three vectors. The sign
of the result is the same for all triple products with the same cyclical order.
If the three vectors lie in a plane, they are linearly dependent, and in this
case the triple product is zero.
The following identities apply to other complicated products of vectors:
$$a \times (b \times c) = (c \cdot a)\,b - (a \cdot b)\,c,$$
$$(a \times b) \times c = (c \cdot a)\,b - (b \cdot c)\,a.$$
Thus we have
$$\bigl((a \times b) \times c\bigr) \cdot d + \bigl((b \times c) \times a\bigr) \cdot d + \bigl((c \times a) \times b\bigr) \cdot d = 0,$$
$$a \cdot \bigl(b \times (c \times d)\bigr) = (b \cdot d)(c \cdot a) - (b \cdot c)(d \cdot a),$$
$$(a \times b) \cdot (c \times d) = (c \cdot a)(b \cdot d) - (d \cdot a)(b \cdot c),$$
and from these we can derive
$$(a \times b) \cdot (c \times d) + (b \times c) \cdot (a \times d) + (c \times a) \cdot (b \times d) = 0,$$
$$|a \times b|^2 = |a|^2\,|b|^2 - (a \cdot b)^2.$$
Note that this last quantity cannot be negative. Also,
$$(a \times b) \times (c \times d) = [d\,a\,b]\,c - [a\,b\,c]\,d,$$
$$(a \times b) \times (c \times d) = [c\,d\,a]\,b - [b\,c\,d]\,a.$$
From this it follows that
$$\bigl((a \times b) \times (c \times d)\bigr) \cdot (e \times f) = [d\,a\,b]\,[c\,e\,f] - [a\,b\,c]\,[d\,e\,f],$$
and so
$$\bigl((a \times b) \times (b \times c)\bigr) \cdot (c \times a) = [a\,b\,c]^2.$$
We can express any given vector d in terms of any three independent
vectors a, b, and c:
$$[a\,b\,c]\,d = [b\,c\,d]\,a + [d\,c\,a]\,b + [d\,a\,b]\,c.$$
This identity can be used to solve linear vector equations.
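A quick numerical check of a few of these identities, as a minimal NumPy sketch (our own illustration) with randomly chosen vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = (rng.standard_normal(3) for _ in range(4))

def triple(u, v, w):
    """Triple product [u v w] = u . (v x w)."""
    return np.dot(u, np.cross(v, w))

# a x (b x c) = (c . a) b - (a . b) c
print(np.allclose(np.cross(a, np.cross(b, c)),
                  np.dot(c, a) * b - np.dot(a, b) * c))

# |a x b|^2 = |a|^2 |b|^2 - (a . b)^2
axb = np.cross(a, b)
print(np.isclose(np.dot(axb, axb),
                 np.dot(a, a) * np.dot(b, b) - np.dot(a, b) ** 2))

# [a b c] d = [b c d] a + [d c a] b + [d a b] c
print(np.allclose(triple(a, b, c) * d,
                  triple(b, c, d) * a + triple(d, c, a) * b + triple(d, a, b) * c))
```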
A.2.2 Solving Vector Equations
Suppose we are to find a vector $x$ given its dot-products with three known linearly independent vectors, $a$, $b$, and $c$. We have
$$x \cdot a = \alpha, \qquad x \cdot b = \beta, \qquad x \cdot c = \gamma.$$
The unknown $x$ can be expressed in terms of any three independent vectors. Rather than use $a$, $b$, and $c$ for this purpose, consider a linear combination of their pairwise cross-products:
$$x = u\,(b \times c) + v\,(c \times a) + w\,(a \times b).$$
It remains for us to determine the three scalars $u$, $v$, and $w$. Taking the dot-product of the above expression with the three vectors $a$, $b$, and $c$, we obtain
$$u\,[a\,b\,c] = \alpha, \qquad v\,[a\,b\,c] = \beta, \qquad w\,[a\,b\,c] = \gamma.$$
Thus we have
$$x = \frac{1}{[a\,b\,c]}\,\bigl( \alpha\,(b \times c) + \beta\,(c \times a) + \gamma\,(a \times b) \bigr).$$
The same result could have been obtained by noting that
$$[a\,b\,c]\,x = (b \times c)(a \cdot x) + (c \times a)(b \cdot x) + (a \times b)(c \cdot x).$$
The above method amounts to solving three equations in three unknowns. The result can therefore be used in inverting a $3 \times 3$ matrix $M$ that has the vectors $a^T$, $b^T$, and $c^T$ as rows:
$$M = \begin{pmatrix} a^T \\ b^T \\ c^T \end{pmatrix}.$$
The determinant of this matrix is just $[a\,b\,c]$. According to the previous result, the inverse of this matrix is just the matrix of column vectors obtained by taking the pairwise cross-products, divided by the value of the determinant:
$$M^{-1} = \frac{1}{[a\,b\,c]}\,\bigl( (b \times c) \quad (c \times a) \quad (a \times b) \bigr).$$
The result is easily checked by matrix multiplication. There is a symmetric
form of this result in which the columns rather than the rows of the original
matrix are considered as vectors.
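A small NumPy sketch of the cross-product form of the inverse (variable names are ours), compared against a library inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)

M = np.vstack([a, b, c])                    # rows a^T, b^T, c^T
abc = np.dot(a, np.cross(b, c))             # determinant [a b c]

# Columns are the pairwise cross-products, divided by the determinant.
M_inv = np.column_stack([np.cross(b, c), np.cross(c, a), np.cross(a, b)]) / abc

print(np.allclose(M_inv, np.linalg.inv(M)))
print(np.allclose(M @ M_inv, np.eye(3)))
```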
We turn now to other vector equations. Given an equation
$$\lambda x + x \times a = b,$$
we want to find the unknown $x$. It can be shown that
$$\lambda\,(b - \lambda x) + \frac{1}{\lambda}\,(b \cdot a)\,a - |a|^2\,x = b \times a,$$
so that
$$x = \frac{\lambda b + \frac{1}{\lambda}\,(b \cdot a)\,a - b \times a}{\lambda^2 + |a|^2},$$
provided $\lambda \neq 0$.
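The closed-form solution is easy to verify numerically; a minimal NumPy sketch, with an arbitrarily chosen nonzero $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = rng.standard_normal(3), rng.standard_normal(3)
lam = 0.7                                  # any nonzero lambda

# Closed-form solution of  lam * x + x cross a = b.
x = (lam * b + (np.dot(b, a) / lam) * a - np.cross(b, a)) / (lam**2 + np.dot(a, a))
print(np.allclose(lam * x + np.cross(x, a), b))
```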
Given another vector equation,
$$\lambda x + (x \cdot b)\,a = c,$$
we again want to find $x$. Taking the dot-product with $b$, we get
$$(x \cdot b)\,(\lambda + a \cdot b) = b \cdot c,$$
so that
$$x = \frac{1}{\lambda} \left( c - a\,\frac{b \cdot c}{\lambda + a \cdot b} \right),$$
provided $\lambda + a \cdot b \neq 0$.
Next, consider finding a vector $x$, given its size and its dot-products with two test vectors $a$ and $b$. Thus
$$a \cdot x = \alpha, \qquad b \cdot x = \beta, \qquad x \cdot x = \gamma.$$
Unless $a$ and $b$ are parallel, we know that $a$, $b$, and $a \times b$ are linearly independent. Thus the unknown vector can be expressed as
$$x = u\,a + v\,b + w\,(a \times b).$$
We can find $u$ and $v$ from
$$|a \times b|^2\,u = \alpha\,(b \cdot b) - \beta\,(a \cdot b),$$
$$|a \times b|^2\,v = \beta\,(a \cdot a) - \alpha\,(a \cdot b).$$
Moreover,
$$|a \times b|^2\,(u\,a + v\,b) = \alpha\,\bigl((b \cdot b)\,a - (a \cdot b)\,b\bigr) - \beta\,\bigl((a \cdot b)\,a - (a \cdot a)\,b\bigr),$$
and so
$$|a \times b|^2\,|u\,a + v\,b|^2 = |a \times b|^2\,(u\,a + v\,b) \cdot x = |\alpha\,b - \beta\,a|^2.$$
Now
$$x \cdot x = |u\,a + v\,b|^2 + w^2\,|a \times b|^2,$$
or
$$|a \times b|^4\,w^2 = \gamma\,|a \times b|^2 - |\alpha\,b - \beta\,a|^2.$$
We thus obtain the two solutions
$$w = \pm\,\frac{\sqrt{\gamma\,|a \times b|^2 - |\alpha\,b - \beta\,a|^2}}{|a \times b|^2}.$$
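The following NumPy sketch (our own illustration) recovers a vector from $\alpha$, $\beta$, and $\gamma$ by this recipe and checks both solutions:

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = rng.standard_normal(3), rng.standard_normal(3)
x_true = rng.standard_normal(3)
alpha, beta, gamma = np.dot(a, x_true), np.dot(b, x_true), np.dot(x_true, x_true)

axb = np.cross(a, b)
n2 = np.dot(axb, axb)                                   # |a x b|^2
u = (alpha * np.dot(b, b) - beta * np.dot(a, b)) / n2
v = (beta * np.dot(a, a) - alpha * np.dot(a, b)) / n2
w = np.sqrt(gamma * n2 - np.dot(alpha * b - beta * a, alpha * b - beta * a)) / n2

for s in (+1.0, -1.0):                                  # the two mirror-image solutions
    x = u * a + v * b + s * w * axb
    print(np.dot(a, x) - alpha, np.dot(b, x) - beta, np.dot(x, x) - gamma)
```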
A.3 Vector and Matrix Differentiation
Often a set of equations can be written more compactly in vector notation.
The advantage of this may evaporate when it becomes necessary to look
at the derivatives of a scalar or vector with respect to the components of
a vector. It is, however, possible to use a consistent, compact notation in
this case also.
A.3.1 Differentiation of a Scalar with Respect to a Vector
The derivative of a scalar with respect to a vector is the vector whose
components are the derivatives of the scalar with respect to each of the
components of the vector. If $r = (x, y, z)^T$, then
$$\frac{df}{dr} = (f_x, f_y, f_z)^T.$$
Consequently,
$$\frac{d}{da}\,(a \cdot b) = b \quad\text{and}\quad \frac{d}{db}\,(a \cdot b) = a.$$
The length of a vector is the square root of the sum of the squares of its
elements,
$$|a| = \sqrt{a \cdot a}.$$
We see that
$$\frac{d}{da}\,|a|^2 = 2a,$$
so that
$$\frac{d}{da}\,|a| = \hat{a},$$
where $\hat{a} = a/|a|$; also,
$$\frac{d}{da}\,[a\,b\,c] = b \times c,$$
where $[a\,b\,c] = a \cdot (b \times c)$ as before. Furthermore,
$$\frac{d}{da}\,a^T M b = M b \quad\text{and}\quad \frac{d}{db}\,a^T M b = M^T a.$$
In particular,
$$\frac{d}{dx}\,x^T M x = (M + M^T)\,x.$$
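A minimal NumPy sketch that checks the last formula against central differences (names and tolerances are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ M @ v
grad = (M + M.T) @ x                        # claimed derivative of x^T M x

eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])
print(np.allclose(grad, numeric, atol=1e-5))
```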
The derivative of a scalar with respect to a matrix is the matrix whose
components are the derivatives of the scalar with respect to the elements
of the matrix. Thus if
$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
then
$$\frac{df}{dM} = \begin{pmatrix} df/da & df/db \\ df/dc & df/dd \end{pmatrix}.$$
Consequently,
$$\frac{d}{dM}\,\mathrm{Trace}(M) = I,$$
where the trace of a matrix is the sum of its diagonal elements and $I$ is the identity matrix. Also,
$$\frac{d}{dM}\,a^T M b = a\,b^T.$$
In particular,
$$\frac{d}{dM}\,x^T M x = x\,x^T.$$
Note that $a\,b^T$ is not the scalar $a \cdot b$. The latter equals $a^T b$. If $a = (a_x, a_y, a_z)^T$ and $b = (b_x, b_y, b_z)^T$, the dyadic product is
$$a\,b^T = \begin{pmatrix} a_x b_x & a_x b_y & a_x b_z \\ a_y b_x & a_y b_y & a_y b_z \\ a_z b_x & a_z b_y & a_z b_z \end{pmatrix}.$$
Another interesting matrix derivative is
$$\frac{d}{dM}\,\mathrm{Det}(M) = \mathrm{Det}(M)\,\bigl(M^{-1}\bigr)^T.$$
That this is the matrix of cofactors can be shown as follows. Consider a particular element $m_{ij}$ of the matrix $M$. We can express the determinant as the sum
$$\mathrm{Det}(M) = \sum_{k=1}^{n} m_{ik}\,c_{ik}$$
by expanding along the row containing this element, where $c_{ij}$ is the cofactor of $m_{ij}$. Thus the derivative of the determinant with respect to $m_{ij}$ is just $c_{ij}$. (The cofactor $c_{ij}$ is $(-1)^{i+j}$ times the determinant of the matrix obtained by deleting the $i$th row and the $j$th column.) The result follows from the fact that the inverse of a matrix equals the transpose of the matrix of cofactors divided by the value of the determinant.
Next, if A and B are compatible matrices (that is, if A has as many
columns as B has rows), then
$$\frac{d}{dA}\,\mathrm{Trace}(AB) = B^T \quad\text{and}\quad \frac{d}{dB}\,\mathrm{Trace}(AB) = A^T.$$
Also, in general,
$$\frac{d}{dM}\,\mathrm{Trace}(AMB) = A^T B^T.$$
One norm of a matrix is the square root of the sum of squares of its
elements:
$$\|M\|^2 = \mathrm{Trace}(M^T M).$$
We see that
$$\frac{d}{dM}\,\|M\|^2 = 2\,M,$$
and it follows from
$$\|A - B\|^2 = \mathrm{Trace}(A^T A - B^T A - A^T B + B^T B)$$
or
$$\|A - B\|^2 = \|A\|^2 - 2\,\mathrm{Trace}(A^T B) + \|B\|^2$$
that
$$\frac{d}{dA}\,\|A - B\|^2 = 2\,(A - B) \quad\text{and}\quad \frac{d}{dB}\,\|A - B\|^2 = 2\,(B - A).$$
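These matrix derivatives can also be checked by finite differences; a minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
eps = 1e-6

def num_grad(fun, X):
    """Central-difference derivative of a scalar function with respect to a matrix."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (fun(X + E) - fun(X - E)) / (2 * eps)
    return G

# d/dM Det(M) = Det(M) (M^-1)^T
print(np.allclose(num_grad(np.linalg.det, M),
                  np.linalg.det(M) * np.linalg.inv(M).T, atol=1e-5))

# d/dA ||A - B||^2 = 2 (A - B), with ||A||^2 = Trace(A^T A)
dist2 = lambda A: np.trace((A - B).T @ (A - B))
print(np.allclose(num_grad(dist2, M), 2 * (M - B), atol=1e-5))
```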
A.3.2 Differentiation of a Vector with Respect to a Vector
Occasionally it is also useful to define a matrix that contains as elements the
derivatives of the components of one vector with respect to the components
of another:
$$\frac{db}{da} = \begin{pmatrix} db_x/da_x & db_x/da_y & db_x/da_z \\ db_y/da_x & db_y/da_y & db_y/da_z \\ db_z/da_x & db_z/da_y & db_z/da_z \end{pmatrix}.$$
This matrix is just the Jacobian $J$ of the coordinate transformation from $a$ to $b$. Clearly,
$$\frac{d}{da}\,M a = M$$
for any matrix M. We also see that
$$\frac{d}{db}\,(a \times b) = \begin{pmatrix} 0 & -a_z & +a_y \\ +a_z & 0 & -a_x \\ -a_y & +a_x & 0 \end{pmatrix},$$
and so conclude that
$$a \times b = \begin{pmatrix} 0 & -a_z & +a_y \\ +a_z & 0 & -a_x \\ -a_y & +a_x & 0 \end{pmatrix} b.$$
This defines an isomorphism between vectors and antisymmetric matrices
that can be useful when we are dealing with cross-products.
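A short NumPy sketch of this isomorphism (the function name skew is ours):

```python
import numpy as np

def skew(a):
    """Antisymmetric matrix [a]_x such that [a]_x b = a x b."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

rng = np.random.default_rng(7)
a, b = rng.standard_normal(3), rng.standard_normal(3)
print(np.allclose(skew(a) @ b, np.cross(a, b)))
```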
A.4 Least-Squares Solutions of Linear Equations
Let $a = M b$, where $M$ is an $m \times n$ matrix, $a$ is a vector with $m$ components, and $b$ is a vector with $n$ components. Suppose that we have $n$ measurements $\{a_i\}$ and $\{b_i\}$ and wish to calculate the matrix $M$. We can form the matrices $A$ and $B$ by adjoining the vectors $\{a_i\}$ and $\{b_i\}$, respectively. (That is, the $i$th column of the matrix $A$ is $a_i$.) Then
$$A = M B.$$
Now $B$ is a square matrix. If it has an inverse, then
$$M = A B^{-1}.$$
Suppose that we have more measurements. The problem is then overdeter-
mined, with more equations than unknowns. We can define an error vector $e$ with $m$ components
$$e_i = a_i - M b_i.$$
Adjoining these k vectors, we obtain
$$E = A - M B.$$
The sum of the squares of the errors is
$$\sum_i |e_i|^2 = \sum_i e_i \cdot e_i = \sum_i e_i^T e_i,$$
or
$$\mathrm{Trace}(E^T E) = \mathrm{Trace}\bigl((A - M B)^T (A - M B)\bigr),$$
or
$$\mathrm{Trace}(E^T E) = \mathrm{Trace}(A^T A - B^T M^T A - A^T M B + B^T M^T M B).$$
Thus
$$\frac{d}{dM}\,\mathrm{Trace}(E^T E) = -A B^T - A B^T + \bigl(B^T M^T\bigr)^T B^T + \bigl(B^T M^T\bigr)^T B^T.$$
If this is to equal zero, then $A B^T = M B B^T$, that is,
$$M = A B^T \bigl(B B^T\bigr)^{-1}.$$
The term $B^T \bigl(B B^T\bigr)^{-1}$ is called the pseudoinverse of the nonsquare matrix $B$.
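A minimal NumPy sketch of this least-squares estimate on synthetic data (the dimensions, noise level, and variable names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, k = 4, 3, 50                                       # k measurements, k > n
M_true = rng.standard_normal((m, n))

B = rng.standard_normal((n, k))                          # columns are the b_i
A = M_true @ B + 0.01 * rng.standard_normal((m, k))      # noisy a_i = M b_i

M_est = A @ B.T @ np.linalg.inv(B @ B.T)                 # M = A B^T (B B^T)^-1
print("max error:", np.max(np.abs(M_est - M_true)))
# The same estimate via the pseudoinverse of B.
print(np.allclose(M_est, A @ np.linalg.pinv(B)))
```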
The problem is underdetermined, on the other hand, if there are fewer
equations than unknowns. There are then infinitely many solutions. In this case the pseudoinverse provides the solution with least norm, but it has to be computed differently. The pseudoinverse of a matrix $B$ can be defined as the limit
$$B^{+} = \lim_{\epsilon \to 0}\,\bigl(B^T B + \epsilon^2 I\bigr)^{-1} B^T.$$
Alternatively, it can be defined using the conditions of Penrose (see Albert [1972]), which state that the matrix $B^{+}$ is the pseudoinverse of the matrix $B$ if and only if
$$B B^{+} \text{ and } B^{+} B \text{ are symmetric}, \qquad B^{+} B B^{+} = B^{+}, \qquad B B^{+} B = B.$$
The pseudoinverse can also be found using spectral decomposition. The
eigenvectors of the pseudoinverse are the same as those of the original
matrix, while the corresponding nonzero eigenvalues are the inverses of the
nonzero eigenvalues of the original matrix.
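The limit definition and the Penrose conditions are easy to check numerically; a minimal NumPy sketch with a small but finite $\epsilon$ (the value is our own choice):

```python
import numpy as np

rng = np.random.default_rng(9)
B = rng.standard_normal((3, 5))            # fewer rows than columns (underdetermined)

eps = 1e-4
B_plus = np.linalg.inv(B.T @ B + eps**2 * np.eye(5)) @ B.T
print(np.allclose(B_plus, np.linalg.pinv(B), atol=1e-5))

# Penrose conditions.
print(np.allclose(B @ B_plus @ B, B, atol=1e-5),
      np.allclose(B_plus @ B @ B_plus, B_plus, atol=1e-5),
      np.allclose((B @ B_plus).T, B @ B_plus, atol=1e-5),
      np.allclose((B_plus @ B).T, B_plus @ B, atol=1e-5))
```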
A.5 Lagrange Multipliers
The method of Lagrange multipliers provides powerful techniques for solv-
ing extremal problems with constraints. We first consider the case of a
single constraint equation, then generalize to several constraints.
A.5.1 One Constraint
Suppose we want to find an extremum of a function $f(x, y)$ subject to the constraint $g(x, y) = 0$. If we can solve the latter equation for $y$,
$$y = \varphi(x),$$
we can eliminate $y$ by substitution and find the extrema of
$$f\bigl(x, \varphi(x)\bigr)$$
by differentiating with respect to $x$. Using the chain rule for differentiation, we obtain
$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{d\varphi}{dx} = 0.$$
Often, however, it is impossible or impractical to find a closed-form solution
for y in terms of x. In this situation we use the method of Lagrange mul-
tipliers. We shall not prove that the method provides necessary conditions
for extrema, just indicate why it works.
Consider the curve defined by $g(x, y) = 0$. Let $s$ be a parameter that varies as we move along the curve. Then
$$\frac{\partial g}{\partial x}\,\frac{dx}{ds} + \frac{\partial g}{\partial y}\,\frac{dy}{ds} = 0,$$
and the slope of this curve is
$$\frac{dy}{dx} = -\,\frac{\partial g/\partial x}{\partial g/\partial y}.$$
Substituting this for $d\varphi/dx$ in the equation for the extrema derived earlier, we obtain
$$\frac{\partial f}{\partial x}\,\frac{\partial g}{\partial y} = \frac{\partial f}{\partial y}\,\frac{\partial g}{\partial x}.$$
This equation applies even when the constraint equation g(x, y) = 0 cannot
be solved explicitly for y.
Now consider instead the extrema of the function
$$F(x, y, \lambda) \equiv f(x, y) + \lambda\,g(x, y).$$
Differentiating with respect to $x$, $y$, and $\lambda$, we have
$$\frac{\partial f}{\partial x} + \lambda\,\frac{\partial g}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} + \lambda\,\frac{\partial g}{\partial y} = 0.$$
If we eliminate $\lambda$, we obtain again
$$\frac{\partial f}{\partial x}\,\frac{\partial g}{\partial y} = \frac{\partial f}{\partial y}\,\frac{\partial g}{\partial x}.$$
Thus the extrema of $f(x, y)$ subject to the constraint $g(x, y) = 0$ can be found by finding the extrema of $F(x, y, \lambda)$.
To make this seem plausible, consider moving along the curve dened
by g(x, y) = 0, searching for an extremum of f(x, y). There can be no
extremum where the contours of constant f(x, y) cross the curve we are
following, since we can move a small distance along the curve and nd a
slightly larger or slightly smaller value of f(x, y), as needed. The extrema
are where the contours of constant f(x, y) are parallel to the constraint
curve. Note also that the constraint curve in turn is a curve of constant
g(x, y). Contours of constant f(x, y) are perpendicular to the gradient of
f(x, y),
$$\nabla f = \left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y} \right)^T,$$
while contours of constant $g(x, y)$ are perpendicular to the gradient of $g(x, y)$,
$$\nabla g = \left( \frac{\partial g}{\partial x},\; \frac{\partial g}{\partial y} \right)^T.$$
These two gradients must be parallel.
Consider now finding an extremum of $F(x, y, \lambda)$. Differentiation gives us
$$\frac{\partial f}{\partial x} + \lambda\,\frac{\partial g}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} + \lambda\,\frac{\partial g}{\partial y} = 0,$$
or
$$\left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y} \right)^T = -\lambda \left( \frac{\partial g}{\partial x},\; \frac{\partial g}{\partial y} \right)^T,$$
which is a statement of the condition that the two gradients must be parallel. The factor $-\lambda$ is simply the ratio of the magnitudes of the two gradients.
As an example, let us find the point on the line
$$x\sin\theta - y\cos\theta + \rho = 0$$
that is closest to the origin. Here we have to minimize $x^2 + y^2$ subject to the given constraint. Instead, we can minimize
$$(x^2 + y^2) + \lambda\,(x\sin\theta - y\cos\theta + \rho).$$
Differentiating with respect to $x$ and $y$, we find
$$2x + \lambda\sin\theta = 0 \quad\text{and}\quad 2y - \lambda\cos\theta = 0.$$
Thus $x\cos\theta + y\sin\theta = 0$. Substituting in the constraint equation, we obtain
$$x\sin\theta + x\cos^2\theta/\sin\theta + \rho = 0, \quad\text{or}\quad x = -\rho\sin\theta,$$
and
$$-y\sin^2\theta/\cos\theta - y\cos\theta + \rho = 0, \quad\text{or}\quad y = +\rho\cos\theta.$$
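The same answer can be reached numerically; a minimal sketch assuming SciPy is available (the starting point and parameter values are our own):

```python
import numpy as np
from scipy.optimize import minimize

theta, rho = 0.6, 2.0

# Minimize x^2 + y^2 subject to x sin(theta) - y cos(theta) + rho = 0.
cons = {"type": "eq",
        "fun": lambda p: p[0] * np.sin(theta) - p[1] * np.cos(theta) + rho}
res = minimize(lambda p: p[0] ** 2 + p[1] ** 2, x0=[1.0, 1.0], constraints=[cons])

# Closed-form answer from the Lagrange-multiplier calculation above.
print(res.x, (-rho * np.sin(theta), rho * np.cos(theta)))
```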
The same method applies if we have more than two independent
variables. To find an extremum of $f(x, y, z)$ subject to the constraint
g(x, y, z) = 0, we look for places where the surfaces of constant f(x, y, z)
are tangent to the surfaces of constant g(x, y, z). Because the gradient is
perpendicular to the tangent plane, we can also look for places where the
gradient of f(x, y, z) is parallel to the gradient of g(x, y, z). We thus look
for extrema of
$$f(x, y, z) + \lambda\,g(x, y, z),$$
since this leads to the equations
$$\frac{\partial f}{\partial x} + \lambda\,\frac{\partial g}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} + \lambda\,\frac{\partial g}{\partial y} = 0, \qquad \frac{\partial f}{\partial z} + \lambda\,\frac{\partial g}{\partial z} = 0,$$
or
$$\left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y},\; \frac{\partial f}{\partial z} \right)^T = -\lambda \left( \frac{\partial g}{\partial x},\; \frac{\partial g}{\partial y},\; \frac{\partial g}{\partial z} \right)^T.$$
A.5.2 More than One Constraint
With three independent variables, we can add a second constraint. The
extrema of $f(x, y, z)$ subject to the constraints $g(x, y, z) = 0$ and $h(x, y, z) = 0$ are the same as the extrema of
$$f(x, y, z) + \lambda\,g(x, y, z) + \mu\,h(x, y, z)$$
subject to those constraints. Differentiating with respect to $x$, $y$, and $z$
yields
$$\frac{\partial f}{\partial x} + \lambda\,\frac{\partial g}{\partial x} + \mu\,\frac{\partial h}{\partial x} = 0,$$
$$\frac{\partial f}{\partial y} + \lambda\,\frac{\partial g}{\partial y} + \mu\,\frac{\partial h}{\partial y} = 0,$$
$$\frac{\partial f}{\partial z} + \lambda\,\frac{\partial g}{\partial z} + \mu\,\frac{\partial h}{\partial z} = 0.$$
These equations state that the gradient of $f$ must be a linear combination of the gradients of $g$ and $h$:
$$\nabla f = -\lambda\,\nabla g - \mu\,\nabla h.$$
The constraints g(x, y, z) = 0 and h(x, y, z) = 0 each define a surface. Their
intersection is the curve along which we search for extrema. The gradient
of a surface is perpendicular to the tangent plane. Thus curves on a surface
are perpendicular to the gradient. In particular, the intersection of the two
surfaces is perpendicular to both gradients. The curve of intersection is
thus parallel to the cross-product of the two gradients. At an extremum,
the gradient of f(x, y, z) should not have any component in this direction;
otherwise we can increase or decrease the value of f(x, y, z) by moving a
little along the curve. The gradient of f(x, y, z) will satisfy this condition
if and only if it can be expressed as a linear combination of the gradients
of g(x, y, z) and h(x, y, z).
As an example, let us find the box with largest volume subject to the constraints that one face of the box has unit area and that the sum of the width, height, and depth of the box is four. Let the dimensions of the box be $a$, $b$, and $c$. We look for the extrema of
$$abc + \lambda\,(ab - 1) + \mu\,(a + b + c - 4).$$
Differentiating with respect to $a$, $b$, and $c$ yields
$$bc + \lambda b + \mu = 0,$$
$$ac + \lambda a + \mu = 0,$$
$$ab + \mu = 0.$$
Eliminating $\lambda$ and $\mu$ from these equations, we obtain $a = b$. From the first constraint it follows that $a = b = 1$. The second constraint gives $c = 2$.
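A numerical version of the box example, as a minimal sketch assuming SciPy is available (the starting point is our own choice):

```python
from scipy.optimize import minimize

volume = lambda d: -d[0] * d[1] * d[2]                               # negate to maximize abc
cons = [{"type": "eq", "fun": lambda d: d[0] * d[1] - 1.0},          # unit-area face
        {"type": "eq", "fun": lambda d: d[0] + d[1] + d[2] - 4.0}]   # a + b + c = 4

res = minimize(volume, x0=[0.8, 1.2, 2.0], constraints=cons)
print(res.x)   # should be close to a = b = 1, c = 2
```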
A.5.3 The General Case
Let $x = (x_1, x_2, \ldots, x_n)^T$ be a vector in an $n$-dimensional space. The set
of values x that satisfy the constraint g(x) = 0 form a subspace. Consider
some curve lying entirely in this subspace. Let s be a parameter that varies
as we move along this curve. The direction of the curve at a point is defined
by the tangent at that point,
$$\frac{dx}{ds} = \left( \frac{dx_1}{ds},\; \frac{dx_2}{ds},\; \ldots,\; \frac{dx_n}{ds} \right)^T.$$
The rate of change of g(x) with s is given by
$$\frac{dg}{ds} = \frac{\partial g}{\partial x_1}\,\frac{dx_1}{ds} + \frac{\partial g}{\partial x_2}\,\frac{dx_2}{ds} + \cdots + \frac{\partial g}{\partial x_n}\,\frac{dx_n}{ds} = \nabla g \cdot \frac{dx}{ds},$$
where $\nabla g$ is the gradient of $g$,
$$\nabla g = \left( \frac{\partial g}{\partial x_1},\; \frac{\partial g}{\partial x_2},\; \ldots,\; \frac{\partial g}{\partial x_n} \right)^T.$$
Because the curve remains in the subspace where $g(x) = 0$, we have
$$\nabla g \cdot \frac{dx}{ds} = 0.$$
That is, the curve must at each point be perpendicular to the gradient of
g(x). The allowed tangent directions at a particular point form an $(n-1)$-
dimensional subspace as long as the gradient is nonzero. We can see this by
noting that only $n - 1$ components of the tangent vector are independent,
the remaining component being constrained by the last equation.
Now suppose that there are $m$ constraints $g_i(x) = 0$ for $i = 1, 2, \ldots, m$. The intersection of the subspaces defined by each of the constraints individually is also a subspace. A curve lying in this common subspace must be perpendicular to all of the gradients of the $g_i$; thus
$$\nabla g_i \cdot \frac{dx}{ds} = 0 \quad\text{for } i = 1, 2, \ldots, m.$$
If the $m$ gradients are linearly independent, the common subspace has dimension $n - m$, since only $n - m$ components of the tangent vector can be freely chosen, the rest being constrained by the $m$ equations above.
If $f(x)$ is to have an extremum at a point in the subspace defined by the constraints, then the first derivative of $f$ along any curve lying in the subspace must be zero. Now
$$\frac{df}{ds} = \nabla f \cdot \frac{dx}{ds},$$
so that we want
$$\nabla f \cdot \frac{dx}{ds} = 0$$
for any tangent direction that satisfies
$$\nabla g_i \cdot \frac{dx}{ds} = 0 \quad\text{for } i = 1, 2, \ldots, m.$$
That is, at the extremum, the tangent must be perpendicular to the gra-
dient of f as well. This certainly is the case if the gradient of f happens
to be a linear combination of the gradients of the $g_i$ at the point. What we want to show is that $\nabla f$ must be a linear combination of the $\nabla g_i$ at an extremum.
It is easy to show that the constraints
$$\nabla g_i \cdot \frac{dx}{ds} = 0$$
define a vector space. Any vector can be uniquely decomposed into a component that lies in this subspace and one that is orthogonal to it. The vector $\nabla f$ can be decomposed into a component $g$ that is a linear combination of the $\nabla g_i$ and a component $c$ that is orthogonal to each of the $\nabla g_i$. Suppose that $c$ is nonzero. Then
$$\nabla f \cdot \frac{dx}{ds} = g \cdot \frac{dx}{ds} + c \cdot \frac{dx}{ds} = c \cdot \frac{dx}{ds},$$
since $g$ is a linear combination of the $\nabla g_i$. We can choose a curve for which
$$\frac{dx}{ds} = c,$$
since $\nabla g_i \cdot c = 0$ for $i = 1, 2, \ldots, m$. In this case
$$\nabla f \cdot \frac{dx}{ds} = c \cdot c \neq 0.$$
This contradicts the condition for an extremum, and $c$ must therefore be zero after all. That is, $\nabla f$ must be a linear combination of the gradients $\nabla g_i$. We can write this condition as
$$\nabla f = -\sum_{i=1}^{m} \lambda_i\,\nabla g_i$$
for some set of coefficients $\lambda_i$.
Consider now the function
$$f(x) + \sum_{i=1}^{m} \lambda_i\,g_i(x).$$
If we try to find an extremum by differentiating with respect to $x$, we obtain
$$\nabla f + \sum_{i=1}^{m} \lambda_i\,\nabla g_i = 0,$$
which is just the equation shown to be satisfied at an extremum of $f$.
To summarize, then, the extrema of f(x), subject to the m constraints
$g_i(x) = 0$, can be found by locating the extrema of
$$f(x) + \sum_{i=1}^{m} \lambda_i\,g_i(x)$$
subject to the same constraints. Here x is a vector with n components and
n > m.
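For a concrete instance of the general recipe, the following SymPy sketch (our own example, with one quadratic objective and two linear constraints) solves the stationarity conditions together with the constraints:

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols("x y z lambda1 lambda2", real=True)

f = x**2 + y**2 + z**2          # function to extremize
g1 = x + y + z - 1              # first constraint  g1 = 0
g2 = x - y + 2*z - 4            # second constraint g2 = 0

L = f + l1 * g1 + l2 * g2
eqs = [sp.diff(L, v) for v in (x, y, z)] + [g1, g2]
sol = sp.solve(eqs, [x, y, z, l1, l2], dict=True)
print(sol)                      # expect x = 4/7, y = -6/7, z = 9/7
```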
A.6 The Calculus of Variations
Calculus teaches us how to find extrema of functions. We are allowed to
vary one or more parameters of some function. A solution is a set of param-
eters that corresponds to an extremum of the function. Differentiation of
the function leads to a set of (algebraic) equations that represent necessary
conditions for an extremum.
In the calculus of variations we look for extrema of expressions that
depend on functions rather than parameters. Such expressions are called
functionals. Now we obtain differential equations rather than ordinary
equations to represent the necessary conditions for an extremum.
A.6.1 Problems without Constraints
As an example, consider the simple integral
$$I = \int_{x_1}^{x_2} F(x, f, f')\,dx.$$
Here $F$ depends on the unknown function $f$ and its derivative $f'$. Let us assume that the curve to be found must pass through the points $f(x_1) = f_1$ and $f(x_2) = f_2$. Suppose that the function $f(x)$ is a solution of the extremum problem. Then we expect that small variations in $f(x)$ should not change the integral significantly.

Let $\eta(x)$ be a test function. If we add $\epsilon\eta(x)$ to $f(x)$, we expect that the integral will change by an amount proportional to $\epsilon^2$ for small values of $\epsilon$. If, instead, it varied linearly with $\epsilon$, we could increase or decrease the integral as desired and would therefore not be at an extremum. To be precise, we want
$$\left. \frac{dI}{d\epsilon} \right|_{\epsilon = 0} = 0.$$
This must be true for all test functions $\eta(x)$.

In our specific problem, we must have $\eta(x_1) = 0$ and $\eta(x_2) = 0$ to satisfy the boundary conditions. Also, if $f(x)$ is replaced by $f(x) + \epsilon\eta(x)$, then $f'(x)$ is replaced by $f'(x) + \epsilon\eta'(x)$. The integral then becomes
$$I = \int_{x_1}^{x_2} F(x, f + \epsilon\eta, f' + \epsilon\eta')\,dx.$$
If $F$ is suitably differentiable, we can expand the integrand in a Taylor series,
$$F(x, f + \epsilon\eta, f' + \epsilon\eta') = F(x, f, f') + \frac{\partial F}{\partial f}\,\epsilon\eta(x) + \frac{\partial F}{\partial f'}\,\epsilon\eta'(x) + e,$$
where $e$ consists of terms in higher powers of $\epsilon$. Thus
$$I = \int_{x_1}^{x_2} \bigl( F + \epsilon\eta(x)\,F_f + \epsilon\eta'(x)\,F_{f'} + e \bigr)\,dx,$$
and differentiating with respect to $\epsilon$ and setting $\epsilon$ equal to zero yields
$$0 = \int_{x_1}^{x_2} \bigl( \eta(x)\,F_f + \eta'(x)\,F_{f'} \bigr)\,dx.$$
Using integration by parts, we see that
$$\int_{x_1}^{x_2} \eta'(x)\,F_{f'}\,dx = \Bigl[ \eta(x)\,F_{f'} \Bigr]_{x_1}^{x_2} - \int_{x_1}^{x_2} \eta(x)\,\frac{d}{dx} F_{f'}\,dx,$$
where the first term is zero because of the boundary conditions. We must therefore have
$$0 = \int_{x_1}^{x_2} \eta(x) \left( F_f - \frac{d}{dx} F_{f'} \right) dx.$$
If this is to be true for all test functions $\eta(x)$, then
$$F_f - \frac{d}{dx} F_{f'} = 0.$$
This is called the Euler equation for this problem.
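As an illustration, a minimal SymPy sketch (our own example) forms the Euler equation for the arc-length integrand $F = \sqrt{1 + f'^2}$; the result vanishes only when $f'' = 0$, so the extremals are straight lines.

```python
import sympy as sp

x = sp.Symbol("x")
f = sp.Function("f")
u, up = sp.symbols("u u_prime")           # stand-ins for f and f'

F = sp.sqrt(1 + up**2)                    # arc-length integrand F(x, f, f')

# F_f and F_f' with the stand-ins replaced by f(x) and f'(x).
F_f = sp.diff(F, u).subs({u: f(x), up: f(x).diff(x)})
F_fp = sp.diff(F, up).subs({u: f(x), up: f(x).diff(x)})

euler = sp.simplify(F_f - sp.diff(F_fp, x))
print(euler)   # proportional to f''(x); zero only for straight lines
```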
The method can be generalized in a number of ways. First, suppose
that the boundary conditions $f(x_1) = f_1$ and $f(x_2) = f_2$ are not given. Then in order for the term
$$\Bigl[ \eta(x)\,F_{f'} \Bigr]_{x_1}^{x_2}$$
to be zero for all possible test functions $\eta(x)$, we must introduce the natural boundary conditions
$$F_{f'} = 0 \quad\text{at } x = x_1 \text{ and } x = x_2.$$
Next, the integrand might contain higher derivatives,
$$I = \int_{x_1}^{x_2} F(x, f, f', f'', \ldots)\,dx.$$
The Euler equation in this case becomes
$$F_f - \frac{d}{dx} F_{f'} + \frac{d^2}{dx^2} F_{f''} - \cdots = 0.$$
In this case we must specify the boundary values of all but the highest
derivatives in order to pose the problem properly.
We can also treat the case in which the integrand depends on several
functions $f_1(x), f_2(x), \ldots$ instead of just one. That is,
$$I = \int_{x_1}^{x_2} F(x, f_1, f_2, \ldots, f_1', f_2', \ldots)\,dx.$$
In this case there are as many Euler equations as there are unknown functions:
$$F_{f_i} - \frac{d}{dx} F_{f_i'} = 0.$$
Consider next a case in which there are two independent variables x
and $y$ and we are to find a function $f(x, y)$ that yields an extremum of the integral
$$I = \iint_D F(x, y, f, f_x, f_y)\,dx\,dy.$$
Here $f_x$ and $f_y$ are the partial derivatives of $f$ with respect to $x$ and $y$, respectively, and the integral is over some simply-connected closed region $D$. We introduce a test function $\eta(x, y)$ and add $\epsilon\eta(x, y)$ to $f(x, y)$. We are given the values of $f(x, y)$ on the boundary $\partial D$ of the region, so the test function must be zero on the boundary. Taylor series expansion yields
$$F(x, y, f + \epsilon\eta, f_x + \epsilon\eta_x, f_y + \epsilon\eta_y) = F(x, y, f, f_x, f_y) + \frac{\partial F}{\partial f}\,\epsilon\eta + \frac{\partial F}{\partial f_x}\,\epsilon\eta_x + \frac{\partial F}{\partial f_y}\,\epsilon\eta_y + e,$$
where $e$ consists of terms in higher powers of $\epsilon$. Thus
$$I = \iint_D \bigl( F + \epsilon\eta\,F_f + \epsilon\eta_x\,F_{f_x} + \epsilon\eta_y\,F_{f_y} \bigr)\,dx\,dy,$$
and differentiating with respect to $\epsilon$ and setting $\epsilon$ equal to zero yields
$$0 = \iint_D \bigl( \eta\,F_f + \eta_x\,F_{f_x} + \eta_y\,F_{f_y} \bigr)\,dx\,dy.$$
Now by Gauss's integral theorem,
$$\iint_D \left( \frac{\partial Q}{\partial x} + \frac{\partial P}{\partial y} \right) dx\,dy = \oint_{\partial D} \bigl( Q\,dy - P\,dx \bigr),$$
so that
$$\iint_D \left( \frac{\partial}{\partial x}\bigl(\eta\,F_{f_x}\bigr) + \frac{\partial}{\partial y}\bigl(\eta\,F_{f_y}\bigr) \right) dx\,dy = \oint_{\partial D} \eta\,\bigl( F_{f_x}\,dy - F_{f_y}\,dx \bigr).$$
Given the boundary conditions, the term on the right must be zero, so that
$$\iint_D \bigl( \eta_x\,F_{f_x} + \eta_y\,F_{f_y} \bigr)\,dx\,dy = -\iint_D \eta \left( \frac{\partial}{\partial x} F_{f_x} + \frac{\partial}{\partial y} F_{f_y} \right) dx\,dy.$$
Consequently,
$$0 = \iint_D \eta \left( F_f - \frac{\partial}{\partial x} F_{f_x} - \frac{\partial}{\partial y} F_{f_y} \right) dx\,dy$$
for all test functions $\eta$. We must have, then, that
$$F_f - \frac{\partial}{\partial x} F_{f_x} - \frac{\partial}{\partial y} F_{f_y} = 0.$$
Here the Euler equation is a partial differential equation. An immediate extension is to the case in which the value of $f$ on the boundary $\partial D$ is not specified. For the integral
$$\oint_{\partial D} \eta\,\bigl( F_{f_x}\,dy - F_{f_y}\,dx \bigr)$$
to be zero for all test functions $\eta$, we must have
$$F_{f_x}\,\frac{dy}{ds} = F_{f_y}\,\frac{dx}{ds},$$
where s is a parameter that varies along the boundary.
The extension to more than two independent variables is also immedi-
ate.
A.6.2 Variational Problems with Constraints
A problem in the calculus of variations can also have constraints. Suppose,
for example, that we want to find an extremum of
$$I = \int_{x_1}^{x_2} F(x, f_1, f_2, \ldots, f_n, f_1', f_2', \ldots, f_n')\,dx$$
subject to the constraints $g_i(x, f_1, f_2, \ldots, f_n) = 0$ for $i = 1, 2, \ldots, m$, with $m < n$. We can solve the modified Euler equations
$$\frac{\partial \Phi}{\partial f_i} - \frac{d}{dx}\,\frac{\partial \Phi}{\partial f_i'} = 0 \quad\text{for } i = 1, 2, \ldots, n$$
subject to the constraints. Here
$$\Phi \equiv F + \sum_{i=1}^{m} \lambda_i(x)\,g_i(x, f_1, f_2, \ldots, f_n).$$
The unknown functions $\lambda_i(x)$ are again called Lagrange multipliers.
Constraints in the form of integrals are treated similarly. Suppose we
want to find an extremum of the integral
$$I = \int_{x_1}^{x_2} F(x, f_1, f_2, \ldots, f_n, f_1', f_2', \ldots, f_n')\,dx$$
subject to the constraints
$$\int_{x_1}^{x_2} g_i(x, f_1, f_2, \ldots, f_n, f_1', f_2', \ldots, f_n')\,dx = c_i \quad\text{for } i = 1, 2, \ldots, m,$$
where the $c_i$ are given constants. We now solve the modified Euler equations
$$\frac{\partial \Phi}{\partial f_i} - \frac{d}{dx}\,\frac{\partial \Phi}{\partial f_i'} = 0 \quad\text{for } i = 1, 2, \ldots, n$$
subject to the constraints. Here
$$\Phi \equiv F + \sum_{i=1}^{m} \lambda_i\,g_i(x, f_1, f_2, \ldots, f_n, f_1', f_2', \ldots, f_n').$$
The unknown constants $\lambda_i$ are still called Lagrange multipliers.
A.7 References
A valuable mathematical encyclopedia is the Mathematical Handbook for
Engineers and Scientists by Korn & Korn [1968]. Difficult integrals can be
looked up in the Table of Integrals, Series, and Products by Gradshteyn
& Ryzhik [1980]. All elementary and many special functions can be found
in the Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables edited by Abramowitz & Stegun [1964]. Smaller collections of this kind of information may be found in Burington's Handbook
of Mathematical Tables and Formulas [1973] and Dwight's Tables of Inte-
grals and Other Mathematical Data [1964].
Chisholm provides an elementary treatment of vectors in Vectors in
Three-Dimensional Space [1978], while Schey deals with differential opera-
tors applied to vectors in Div, Grad, Curl, and All That: An Informal Text
on Vector Calculus [1973]. Generally useful mathematical tools are cov-
ered in many books, such as Hildebrand's Methods of Applied Mathematics
[1952].
There is no shortage of books on probability and statistics. One such
is Drake's Fundamentals of Applied Probability Theory [1967].
The calculus of variations is explained in Methods of Mathematical
Physics, volume I, by Courant & Hilbert [1953] and Calculus of Variations:
With Applications to Physics & Engineering by Weinstock [1974].
Least-squares methods often lead to problems that can be tackled using
the pseudoinverse. Here one can recommend Albert's Regression and the Moore-Penrose Pseudoinverse [1972]. Gill, Murray, & Wright cover prac-
tical aspects of optimization problems in Practical Optimization [1981].
Deeper coverage of the theoretical issues can be found in Luenberger's In-
troduction to Linear and Nonlinear Programming [1973].
Numerical methods are covered in such books as Elementary Numerical Analysis: An Algorithmic Approach by Conte & de Boor [1972] and Hamming's Numerical Methods for Scientists and Engineers [1962].