
USyd MATH1011 Full Course Notes


Applications of Calculus
Lecture notes for MATH1011

School of Mathematics and Statistics


Semester 1 2020

© 2020 School of Mathematics and Statistics, University of Sydney



Authors: G. R. Ball and S. Britton.
Revised by E. Carberry and R. Howlett
Table of Contents

Chapter 1: Curve Fitting
1.1 Introduction
1.2 Periodicity
    Periodic Functions
    Modification of Sinusoidal Functions
    Combining sinusoidal functions having the same period
    Exercises Set 1.1
1.3 Scaling Data
    Proportionality
    Linear Functions
    Linearization
    Logarithmic scales
    Exercises Set 1.2
1.4 Finite Differences
    Difference Tables
    Constructing polynomials from difference tables
    Exercises Set 1.3

Chapter 2: Optimisation
2.1 One Variable Problems
    The Derivative
    Global and Local Maxima and Minima
    Maxima and minima when the derivative is undefined
    Concavity and points of inflection
    Absolute and relative growth rates
    Exercises Set 2.1
2.2 Functions with two or more variables
    Coordinates in 3 dimensions
    Partial derivatives
    Geometric interpretation of partial derivatives
    Maxima and minima of functions of two variables
    Tests for maxima and minima of functions of two variables
    Exercises Set 2.2
2.3 Least Squares
    The line of best fit
    Polynomials of best fit
    Exercises Set 2.3

Chapter 3: Summation
3.1 Finite sums
    Geometric series
    Collapsing series
    Exercises Set 3.1
3.2 The definite integral
    Exercises Set 3.2
3.3 The indefinite integral
    Integral curves
    Finding an amount given its rate of change
    Exercises Set 3.3
3.4 Applications of the definite integral
    Area under a curve
    Area between two curves
    Average value of a function
    Some properties of definite integrals
    Exercises Set 3.4
3.5 Extending integration
    Improper integrals
    Infinite integrals
    Exercises Set 3.5

Appendix 1 – Algebra
Appendix 2 – Geometry
Appendix 3 – Trigonometry
Appendix 4 – Exponents and logarithms
Appendix 5 – Differentiation
Appendix 6 – Integrals
Answers to the Exercises

Chapter 1
Curve Fitting

§1.1 Introduction

This chapter is concerned with finding mathematical functions to describe various
types of data. Scientific experiments frequently involve measuring one variable
against another, with the objective of determining the relationship between the vari-
ables. There are many circumstances in experimental sciences where the relationship
between the variables is not at all clear. It is generally possible, however, to find a
function which fits the data reasonably well.
In the next three sections we consider some elementary ways of fitting functions
to given data. You should keep in mind the fact that a particular set of data may
well satisfy more than one formula. The art of curve fitting is not purely one of doing
the right calculations; there may be evidence to suggest that a particular relationship
is involved, or you may make assumptions which result in one formula rather than
another.

§1.2 Periodicity

1.2.1 Periodic Functions


This section deals with the type of functions that are required to model such biological
rhythms as seasonal variations, daily cycles and breathing. In such variations, almost
the same pattern occurs in each cycle. Functions to model this behaviour exhibit the
property
f (t) = f (t + a) for all t (1)

for some positive number a. Such functions are called periodic, and the period is the
smallest positive a satisfying Eq. (1).
The following is the graph of a function with period equal to 1.

[Figure: the display of an electrocardiogram, a periodic trace with the horizontal axis marked at 1, 2, 3, 4.]

Example 1.1
Yeast cells increase in a two stage cyclic process of mitosis. Firstly, each cell duplicates
its DNA and then it divides to form new cells. The cell’s length, ℓ, grows at a
constant rate from the time of its creation to that of nuclear division; the length
remains stationary during the time taken to create new cell walls and divide.


The following graph illustrates the typical growth of a cell.

[Figure: typical growth of a cell, showing length ℓ against time t over one cycle.]

The process is then repeated in a sequence of cycles. The resulting periodic
function ℓ = f (t) is shown in the following graph.

[Figure: graph of the periodic function ℓ = f(t); t-axis marked at 0.7, 1, 1.7, 2, 2.7, 3, 3.7, 4.]

The black circles and white circles in this diagram indicate (for example) that
we take the function value at t = 2 to be 2, rather than 1.3.
The equation of this function has to accommodate both the cyclic nature of the
process and the fact that a different rule applies in the two stages of each cycle (the
cell growth stage and the cell division stage). For t in the interval 0 < t ≤ 1 the
equation is
\[
f(t) = \begin{cases} t + 1.3 & \text{if } 0 < t < 0.7, \\ 2.0 & \text{if } 0.7 \le t \le 1, \end{cases}
\]
and for t outside this interval the value of f (t) is determined by the rule that

f (t + 1) = f (t) for all t.

The idea is, for example, that f (1.4) = f (0.4) (by the rule that f (t + 1) = f (t) for
all t), and since 0.4 is in the interval 0 < t ≤ 1 the value of f (0.4) is determined by
the previous rule. We find that

f (0.4) = 0.4 + 1.3 = 1.7,

and thus f (1.4) = 1.7 also.


Note that
(i) The equation f (t + 1) = f (t) says that the function is periodic: each shift in the
value of t by one unit results in the same value of f (t). In this case the period
is 1, since f (t + a) = f (t) does not hold for any positive a less than 1.
(ii) In the interval from t = 0 to t = 1 the rule that defines the function for
0 < t < 0.7 is different from the one that applies for 0.7 ≤ t ≤ 1.
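
The two rules above (the formula on the base interval and the periodicity rule) can be combined into a short computation. The following Python sketch is only an illustration; the function name `yeast_length` is our own choice and does not appear elsewhere in these notes.

```python
import math

def yeast_length(t):
    """Cell length at time t for the periodic function of Example 1.1."""
    # Use f(t + 1) = f(t) to replace t by a value r with 0 < r <= 1.
    r = t - math.floor(t)
    if r == 0:
        r = 1.0  # whole numbers correspond to the right-hand end of the interval
    # Growth stage for 0 < r < 0.7, stationary stage for 0.7 <= r <= 1.
    return r + 1.3 if r < 0.7 else 2.0

print(yeast_length(0.4))  # approximately 1.7
print(yeast_length(1.4))  # same value as at t = 0.4, up to rounding
print(yeast_length(2))    # value at the end of a cycle
```

Because floating-point arithmetic is inexact, values of t very close to the boundary 0.7 may fall on either side of it; the sketch is intended for checking values such as f(1.4), not for precise work at the boundary.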

Example 1.2
Sketch the following function, and state its period:

2
 for 0 ≤ x < 1,
f (x) = 3−x for 1 ≤ x < 3,


f (x + 3) for all x.

Solution: The period of the function is 3, and its graph is as follows.


[Figure: graph of f(x) against x, for x from −4 to 9.]

Note that, since the function values repeat every 3 units, we have

. . . f (x − 6) = f (x − 3) = f (x) = f (x + 3) = f (x + 6) = . . . for all x.

For example, f (92) = f (89) = · · · = f (2) = 1, and f (−11) = f (−8) = · · · = f (1) = 2.
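
The reduction used in these two calculations can be checked mechanically. The sketch below assumes Python, whose `%` operator returns a remainder in the range 0 ≤ r < 3 for a positive modulus, even when x is negative; the name `f` simply mirrors the function of Example 1.2.

```python
def f(x):
    """The period-3 function of Example 1.2."""
    # Reduce x to the base interval 0 <= r < 3 using f(x + 3) = f(x).
    r = x % 3
    # Constant piece on 0 <= r < 1, linear piece on 1 <= r < 3.
    return 2 if r < 1 else 3 - r

print(f(92), f(-11))  # → 1 2
```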

Example 1.3
In some instances, the periodicity can be unexpected. In a study of asthma in
N.S.W. from 2003 to 2010, the Department of Public Health at the University of
Sydney obtained the data shown below. The data points show the number of cases
treated in public hospitals each month. Approximate periodic behaviour has been
emphasized by drawing a curve that provides a reasonable fit to the data points.

[Figure: number of asthma cases treated each month (vertical axis from 0 to 2000) from Jan 03 to Jan 10, with a curve fitted to the data points.]

The most famous periodic functions are the sinusoidal functions such as f (t) = sin t
and g(t) = cos t. Both of these functions have period 2π; that is,

sin(t + 2π) = sin(t) for all t


and also
cos(t + 2π) = cos(t) for all t.

Their graphs are as follows.

[Figure: graphs of sin t and cos t for t from −2π to 2π; vertical axis from −1 to 1.]

Observe that the sine function is simply the cosine function shifted π/2 units to the
right. (Note that in mathematical writing sin t and cos t always mean the sine and
cosine of the number t, or (equivalently) the sine and cosine of an angle of t radians.
To convert from degrees to radians you divide by 180 and multiply by π.)
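
Both remarks can be checked numerically. The following Python sketch is an illustration only: the helper `degrees_to_radians` is a name chosen here, and the sample values of t are arbitrary.

```python
import math

def degrees_to_radians(deg):
    # Divide by 180 and multiply by pi, as described above.
    return deg / 180 * math.pi

# sin t agrees with cos(t - pi/2): the cosine graph shifted pi/2 units to the right.
for t in [0.0, 0.5, 1.0, 2.0, degrees_to_radians(135)]:
    assert math.isclose(math.sin(t), math.cos(t - math.pi / 2), abs_tol=1e-12)
```

The tolerance `abs_tol` is needed because the two sides are computed in floating-point arithmetic and agree only up to rounding error.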
As most periodic phenomena do not have period 2π, and since, further, many
are not sinusoidal, it is necessary to either modify the sine (or cosine) function or
find some other functions to model rhythmic behaviour. For instance, the function
f (t) = sin 3t is sinusoidal with period 2π/3. Furthermore, as the next example shows,
the sum of two sinusoidal functions may be periodic and not sinusoidal.

Example 1.4
The diagram below shows the graph of the function f (t) = sin 2t + sin 3t over the
interval from t = −2π to t = 2π.
[Figure: graph of f(t) = sin 2t + sin 3t for t from −2π to 2π; vertical axis from −2 to 2.]

The periodicity of this function is evident from the graph, and is confirmed by
the following calculation:

f (t + 2π) = sin 2(t + 2π) + sin 3(t + 2π)


= sin(2t + 4π) + sin(3t + 6π)
= sin 2t + sin 3t
= f (t).

Adding extra terms of the form A sin(nt + α), for suitably chosen constants A, n
and α, one could produce a periodic function with a graph to roughly match the
asthma data in Example 1.3 above.
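
The calculation in Example 1.4 can also be checked numerically at a few points. This Python sketch is only a spot check, not a proof; the names `two_sines` and `same_after_shift` and the sample points are our own choices.

```python
import math

def two_sines(t):
    """The function f(t) = sin 2t + sin 3t of Example 1.4."""
    return math.sin(2 * t) + math.sin(3 * t)

def same_after_shift(shift, samples, tol=1e-12):
    """True if two_sines(t + shift) matches two_sines(t) at every sample point."""
    return all(abs(two_sines(t + shift) - two_sines(t)) <= tol for t in samples)

samples = [-3.0, -1.2, 0.0, 0.7, 2.5]          # arbitrary sample points
print(same_after_shift(2 * math.pi, samples))  # → True
print(same_after_shift(math.pi, samples))      # → False
```

The second line shows that a shift by π, which is the period of sin 2t alone, does not preserve the sum.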

There are four ways in which one might modify the graph of a periodic function to
produce another graph with the same general shape. These are moving the graph
vertically, moving it horizontally, stretching it vertically and stretching it horizon-
tally. These transformations, which can all be accomplished by simple modifications
of the given function, correspond to changing four quantities that we shall discuss
in a moment: the mean level, the phase, the amplitude and the period. Of special
importance are the functions that can be obtained from the sine function f (t) = sin t
by a combination of transformations of the above kinds; these are precisely the si-
nusoidal functions. They are essential for modelling periodic phenomena, since it
turns out that any continuous periodic function can be approximated to an arbitrary
degree of accuracy by a sum of sinusoidal functions.†
Features of periodic functions
(a) Period : This is the smallest positive number a such that f (t) = f (t+a). As there
is no positive number a smaller than 2π such that sin(t + a) = sin t, the period
of f (t) = sin t is 2π. By contrast, the functions g(t) = sin 2t and h(t) = sin(t/2)
are also periodic, but have different periods: g has period π and h has period 4π.
[Figure: graphs of f(t) = sin t, g(t) = sin 2t and h(t) = sin(t/2) for t from −2π to 2π; vertical axis from −1 to 1.]

(b) Amplitude: This is the quantity ½(M − m), where M and m are respectively the
maximum and minimum values of the function. As can be seen from the graphs
above, the functions f (t) = sin t, g(t) = sin 2t and h(t) = sin(t/2) all have the
same amplitude, namely 1.
(c) Mean Level : The value halfway between the maximum and minimum values
of the function is sometimes called the mean level. That is, the mean level is
½(M + m), where M is the maximum and m the minimum. Thus M is the mean
level plus the amplitude, while m is the mean level minus the amplitude.
(d) Phase: Values of t that differ by one or more periods are said to represent the
same phase of the rhythm. Technically, the phase of a point t0 is the set of all t
values that differ from t0 by a multiple of the period. Points that differ by one
or more periods are said to be in equal phase; similarly, one refers to the phase
of a peak (turning point).

Note that for sinusoidal functions, but not for all periodic functions, the mean
level coincides with the average value (also called the mean value) of the function.
The formal definition of the average value of a function involves integration, a topic
that we shall study later in this course. But an intuitive understanding of the concept
of average value is easily obtained with the aid of a diagram such as the following.
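[Figure: graph of f(x) for x from 0 to 4, together with the line y = 0.6; the regions between the graph and this line are labelled a, b, c, d (below the graph) and e (above the graph).]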

† The general theory of approximation of periodic functions by sums of sinusoidal
functions is known as Fourier analysis. It is beyond the scope of this course, but is
studied in second year mathematics.

The diagram shows the graph of a certain function f over the interval from x = 0
to x = 4. The total area of the regions below this graph and above the line y = 0.6
is equal to the area of the region above the graph and below y = 0.6. That is, the
sum of the areas marked with the letters a, b, c and d equals that marked with the
letter e (including the part below the x-axis). So we call 0.6 the average value of f .
Note that there is a unique function f that satisfies f (x + 4) = f (x) and has the
graph shown above on the interval from x = 0 to x = 4. This function is periodic,
with period 4. Since the maximum and minimum values this function achieves are 1
and −1 respectively, we see that its amplitude is 1 and its mean level is 0.
1.2.2 Modification of Sinusoidal Functions
The most general form for the equation of a sinusoidal function is

f (t) = a cos b(t − c) + d,

where t is the variable and a, b, c and d are constants. The values of a, b and d
determine the amplitude, period and mean level, while c determines the phase of the
point t = 0. To be precise, the amplitude is |a|, the mean level is d, and the period
is 2π/|b|. We present some examples to illustrate this.
(a) Consider first the graph of y = a cos t.

[Figure: graphs of y = a cos t (solid) and y = −a cos t (dotted) for t from −2π to 2π.]

In this diagram the value of a is positive, and the graph of y = a cos t is drawn
as a solid line. Replacing a by its negative would give the dotted graph.†
We find that y takes the value a at t = 0 and the value −a at t = π. Since cos t
can never be greater than 1 or less than −1 it follows that all values taken by
a cos t lie between a and −a. Thus the absolute value of a is the amplitude of the
function and the mean level is 0. Note also that the period of this function is
always 2π, irrespective of the value of a.
(b) Consider next y = cos bt, where b ≠ 0. Note that since cos bt = cos(−bt) for all t,
there is no loss of generality in assuming that b is positive. (The case b = 0 is
excluded since it corresponds to the horizontal line y = 1.)

[Figure: graph of y = cos bt, with the t-axis marked at −2π/b and 2π/b; vertical axis from −1 to 1.]

Evaluating y = cos bt at t = n(2π/b), where n is any integer, gives

y = cos bn(2π/b) = cos 2πn = 1.

† In fact there is no need to ever use negative values of a, since − cos t = cos(t + π).

These integer multiples of 2π/b are the only values of t for which cos bt equals 1.
It follows readily that cos bt has period 2π/b, since for any value of t we have

cos b(t + (2π/b)) = cos(bt + 2π) = cos bt.

Note in particular that the period decreases as b increases.


If k denotes the period of our sinusoidal function, then our calculations above
give the formula k = 2π/b. Equivalently, b = 2π/k. This observation enables us
to easily write down formulas for sinusoidal functions with specified period. For
example, if we seek a sinusoidal function with period 3, then y = cos((2π/3)t) will
suffice. More generally, y = a cos((2π/3)(t − c)) + d has period 3, irrespective of the
values of a, c and d.
Here are some more examples.
formula                 period
cos t                   2π
cos 2(t − π)            π
cos(3t + 100)           2π/3
5 + cos(½t)             4π
−1 + cos π(t − ½)       2

Observe that the coefficient b is frequently not an integer.
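
The formulas k = 2π/b and b = 2π/k are easy to apply mechanically. The following Python sketch, with function names of our own choosing, uses |b| so that negative values of b are also handled.

```python
import math

def period(b):
    """Period 2*pi/|b| of a*cos(b*(t - c)) + d, for b != 0."""
    if b == 0:
        raise ValueError("b = 0 gives a constant function, which has no period")
    return 2 * math.pi / abs(b)

def coefficient_for_period(k):
    """Coefficient b = 2*pi/k that gives a prescribed period k > 0."""
    return 2 * math.pi / k

# The entry cos(3t + 100) = cos 3(t + 100/3) has b = 3.
print(math.isclose(period(3), 2 * math.pi / 3))  # → True
```

For instance, `coefficient_for_period(3)` returns the value 2π/3 used in the example with period 3.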


(c) y = cos(t − c):
[Figure: graphs of y = cos t and y = cos(t − c), with the t-axis marked at −2π, c, 2π and 2π + c; vertical axis from −1 to 1.]

These graphs show that replacing t by t − c corresponds to moving the whole
graph c units to the right. Our diagram shows the result of moving the graph
of y = cos t to the right by an amount of 0.22 units (approximately). That is,
we have chosen c = 0.22. If we had instead used c = −0.22 then the graph
would have been moved 0.22 units to the left, since moving right by a negative
amount is the same as moving left by a positive amount. The quantity c is called
the phase shift. It is not uniquely determined, since values of c that differ by a
multiple of the period correspond to the same graph. We can choose c to be any
of the values of t at which the function takes its maximum value.
(d) y = d + cos t:
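[Figure: graphs of y = d + cos t (solid) and y = cos t (broken) with the mean level d marked, together with the lines y = d + 1 and y = d − 1.]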

Our diagram shows the graphs of y = d + cos t (solid) and y = cos t (broken).
We have used the value d = 0.7. For each t, the difference between d + cos t and
cos t is d; hence the vertical distance between the two graphs is d units, at all
points t. The graph of d + cos t is obtained by shifting the graph of cos t upwards
by d units. This increases the mean level by d, without changing the amplitude,
period or phase.
Knowing the amplitude, period, phase shift and mean level of a sinusoidal function
makes it a trivial matter to sketch the graph. The next example illustrates the
process.

Example 1.5
We show how to sketch the graph of y = 3 cos(2t − 2) + 4. The key to doing this is to
rewrite the formula in the standard form a cos b(t − c) + d; the values of a, b, c and
d then provide all the information needed.
Since clearly 3 cos(2t − 2) + 4 = 3 cos 2(t − 1) + 4, we see that a = 3, b = 2, c = 1
and d = 4. Thus the amplitude is 3 (the value of a), the mean level is 4 (the value
of d) and the period is π (the value of 2π/b). At t = 1 (the value of c) the function
attains its maximum. This maximum is 7, the sum of the amplitude and the mean
level. Knowing that the graph is sinusoidal with period π, amplitude 3 and mean
level 4, the information that it takes the value 7 at t = 1 determines it uniquely, and
sketching it becomes a trivial task.
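
The bookkeeping in this example can be summarised in a few lines of Python. This is a sketch under the assumption that a > 0 and b > 0; the function name `sinusoid_features` and the dictionary keys are our own.

```python
import math

def sinusoid_features(a, b, c, d):
    """Features of y = a*cos(b*(t - c)) + d, assuming a > 0 and b > 0."""
    return {
        "amplitude": a,
        "period": 2 * math.pi / b,
        "mean_level": d,
        "maximum": d + a,  # attained at t = c
        "minimum": d - a,  # attained half a period after t = c
    }

def y(t):
    # The function of Example 1.5, written in its original form.
    return 3 * math.cos(2 * t - 2) + 4

features = sinusoid_features(3, 2, 1, 4)
print(features["maximum"], y(1))  # → 7 7.0
```

Evaluating `y` half a period after t = 1 gives a value close to the minimum 1, which is a useful check on a sketch.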

[Figure: graph of y = 3 cos(2t − 2) + 4 for t from −3π to 3π; vertical axis marked up to 7.]

Example 1.6
Find a formula for a sinusoidal function with amplitude 2 and period π/2.
Solution: Recall that the general sinusoidal function y = a cos b(t − c) + d has
amplitude a and period 2π/b. So we require a = 2 and 2π/b = π/2. This gives b = 4.
The values of c and d do not affect the amplitude and period; so they can be chosen
arbitrarily. Thus for any real numbers c and d the formula

y = 2 cos 4(t − c) + d

determines a sinusoidal function with amplitude 2 and period π/2. There are in-
finitely many functions satisfying the requirements.
If we choose c = 0 and d = 0 then we obtain the simple formula y = 2 cos 4t. An-
other solution is y = 2 cos(4t−π/2); this is given by putting d = 0 and c = π/8. Since
cos(θ − π/2) = sin θ, this second solution can be written more simply as y = 2 sin 4t.

The fact that cos(t − π/2) = sin t for all t corresponds to a property of the graphs of
y = cos t and y = sin t that we have already observed, namely that the graph of
sin t is the graph of cos t shifted π/2 units to the right. It also shows us that any
expression involving cos can be easily reformulated in terms of sin. One consequence
of this is that y = a sin b(t − c) + d would serve just as well as y = a cos b(t − c) + d
as a formula for the general sinusoidal function. Indeed, y = a sin b(t − c) + d still
has amplitude a, period 2π/b and mean level d; the conversion from one form to the
other is simply a matter of changing the value of c. For y = a sin b(t − c) + d the
value the function takes at t = c is the mean level d (rather than the maximum value
a + d, as in the other case).†
1.2.3 Combining sinusoidal functions having the same period
If f and g are sinusoidal functions that have the same period then their sum is also
sinusoidal. So, for example, any expression of the form a sin x + b cos x (where a and b
are constants) can be expressed more simply as a modified sine function, R sin(x + α)
(where R and α are constants). In order to do this, we would want to determine R
and α, given a and b.
By a standard trigonometric formula (see Appendix 3),

R sin(x + α) = R sin x cos α + R cos x sin α


= (R cos α) sin x + (R sin α) cos x.

If this is to equal a sin x + b cos x, then we must have

a = R cos α (2)
and
b = R sin α. (3)

Squaring both equations and adding gives

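\[
a^2 + b^2 = (R\cos\alpha)^2 + (R\sin\alpha)^2 = R^2(\cos^2\alpha + \sin^2\alpha) = R^2,
\]
so that
\[
R = \sqrt{a^2 + b^2}.
\]
The equations (2) and (3) above now give
\[
\cos\alpha = \frac{a}{\sqrt{a^2 + b^2}}
\quad\text{and}\quad
\sin\alpha = \frac{b}{\sqrt{a^2 + b^2}},
\]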

and these equations determine a unique α in the range 0 ≤ α < 2π.
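
The determination of R and α from a and b can be sketched in Python as follows. The function name `to_amplitude_phase` is our own; the sketch assumes that a and b are not both zero, and it relies on the standard library function `math.atan2(b, a)`, which returns the angle whose cosine is a/R and whose sine is b/R.

```python
import math

def to_amplitude_phase(a, b):
    """Return (R, alpha) with a*sin(x) + b*cos(x) = R*sin(x + alpha)."""
    R = math.hypot(a, b)  # sqrt(a^2 + b^2)
    # atan2 gives an angle between -pi and pi; the remainder on division
    # by 2*pi moves it into the range 0 <= alpha < 2*pi.
    alpha = math.atan2(b, a) % (2 * math.pi)
    return R, alpha

R, alpha = to_amplitude_phase(1, -1)
print(math.isclose(alpha, 7 * math.pi / 4))  # → True
```

Here sin x − cos x is rewritten as √2 sin(x + 7π/4); the angle lies in the fourth quadrant because cos α is positive and sin α is negative.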

Example 1.7
Suppose we wish to write sin x + cos x in the form R sin(x + α).
We have R = √(1² + 1²) = √2 and sin α = cos α = 1/√2. Since sin α and cos α
are both positive, α lies between 0 and π/2 (the first quadrant). In this interval the
unique solution of sin α = 1/√2 is α = π/4. Hence
\[
\sin x + \cos x = \sqrt{2}\,\sin(x + \pi/4).
\]

† Since we always insist that a > 0, values of t that are slightly greater than c yield
function values slightly greater than d. In other words, the function is increasing at t = c.

We can use the steps outlined in Example 1.5 to sketch the graph of y = sin x + cos x.

[Figure: graph of y = sin x + cos x for x from −2π to 2π; vertical axis marked at √2 and −√2.]

Note that the sum of two periodic functions that have the same period always
gives another periodic function; this is quite easy to see.† It is not at all obvious
that if the two given functions are sinusoidal then so is their sum, but this is what
we have shown above. It is important to realize, however, that if the given sinusoidal
functions do not have the same period then their sum will not be sinusoidal.
Functions of the form

\[
f(t) = a_0 + a_1\cos t + a_2\cos 2t + \cdots + a_n\cos nt + b_1\sin t + b_2\sin 2t + \cdots + b_n\sin nt
\]

are called trigonometric polynomials. Since we have sin k(t + 2π) = sin kt and
cos k(t + 2π) = cos kt whenever k is an integer, it follows that f (t + 2π) = f (t)
for all t. By choosing the coefficients a_k and b_k appropriately, it is possible to find
a trigonometric polynomial that closely approximates any given continuous periodic
function of period 2π. The process of fitting a trigonometric polynomial to given
periodic data is called Fourier analysis, a topic that is beyond the scope of these
notes.
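
Although fitting the coefficients is beyond our scope, evaluating a trigonometric polynomial with given coefficients is straightforward. In the Python sketch below the function name `trig_poly` and the convention of storing the coefficients in two lists of equal length (with the first entry of the second list unused) are our own assumptions.

```python
import math

def trig_poly(t, a, b):
    """Evaluate a[0] + sum of a[k]*cos(k*t) + b[k]*sin(k*t) for k = 1, ..., n.

    Both lists have length n + 1; the entry b[0] is not used.
    """
    total = a[0]
    for k in range(1, len(a)):
        total += a[k] * math.cos(k * t) + b[k] * math.sin(k * t)
    return total

# sin 2t + sin 3t from Example 1.4, written as a trigonometric polynomial.
a = [0, 0, 0, 0]
b = [0, 0, 1, 1]
print(math.isclose(trig_poly(0.7, a, b), trig_poly(0.7 + 2 * math.pi, a, b)))  # → True
```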

Exercises Set 1.1

1. Find formulas for a sinusoidal function with


(i ) period π and amplitude 3,
(ii ) period 3 and amplitude 5,
(iii ) period 3π and amplitude 3, satisfying f (0) = 3.

2. Sketch the following curves.


(i ) y = sin 2x (ii ) y = 3 sin 2x
(iii ) y = 3 sin 2x + 2 (iv ) y = 3 sin(2x + π/2) + 2

3. Sketch the following curves.


(i ) y = cos πx (ii ) y = 2 cos πx (iii ) y = 2 cos(πx + π/2)

† To be more exact, the common period is a multiple of the period of the sum. For
example sin 2x + sin x and sin 2x − sin x both have period 2π; their sum has period π.

§1.3 Scaling Data

1.3.1 Proportionality
The notion of proportionality pervades the life sciences. For instance, the amount A
of muscle action of an animal is roughly proportional to its body mass M . We write

A ∝ M.

This means that there is a constant k, called the constant of proportionality, such
that the equation
A = kM
holds (closely enough for practical purposes) for all animals of a given species. Sim-
ilarly, the body heat H generated by the animal is proportional to muscular action.
That is,
H = aA,
for some constant of proportionality a. Notice that as H ∝ A and A ∝ M , we
have H ∝ M (since H = aA = a(kM ) = (ak)M ). Throughout this course we
shall encounter situations where variables are in proportion. It is also common to
find situations where one variable is proportional to some simple function of another
variable, such as its square or its logarithm. For instance, the surface area of a guinea
pig is proportional to the square of its length:

S ∝ L².

1.3.2 Linear Functions


A typical aim of an experiment is to determine how the values of one quantity affect
the values of another. For instance, Table 1 below gives the results of a (hypothetical)
experiment in which a certain length, which is varying with time, is measured at
hourly intervals.

t (hours)   0     1     2      3      4
y (cm)      1.5   6.1   10.7   15.3   19.9
Table 1

Such a table defines a function, which we call a tabulated function,
giving the values of the dependent variable (y in this case) in terms of the independent
variable (here t).
The aim is usually to find a formula that fits the tabulated values, the hope being
that the formula can be used to reliably forecast subsequent values of the dependent
variable. In this section we look at two possible procedures for finding such formulas.

Example 1.8
Find an explicit formula to fit the experimental data in the following table.

t 1 2 3 4 5 6
y 0.7 5.6 9.6 19.4 31 42.1
Table 2

Observe that for every value of t except t = 1 the corresponding value of y is slightly
greater than t². This might lead one to guess that y is proportional to t², with a constant
of proportionality k that is slightly greater than 1. This theory can be tested easily by
plotting y against T = t². If the theory is correct the data points will approximately lie
on a straight line through the origin with slope k.
Expressed in terms of T our data is as follows.

T 1 4 9 16 25 36
y 0.7 5.6 9.6 19.4 31 42.1
Table 3

When this is plotted on graph paper, all the points approximately trace out a
straight line, passing through (0, 0). This tells us that we should approximate the
data with a straight line, which we call the line of best fit. Due to experimental error,
the best fitting straight line may not pass exactly through all the points (or indeed
pass exactly through any of them). However it must be close to all the points.

[Figure: plot of y against T with the line of best fit; both axes run from 0 to 45.]

The slope of this line can be determined by applying the formula (y2 − y1 )/(x2 − x1 ),
where (x1 , y1 ) and (x2 , y2 ) are any two points on the line of best fit. Using the points
(20, 24) and (10, 12) gives
\[
\text{slope} \cong \frac{12 - 24}{10 - 20} = 1.2.
\]
So the equation of the line of best fit is y = 1.2T, and the original data satisfies
\[
y \cong 1.2\,t^2.
\]

If we calculate the slope using actual data points, for example the points (1, 0.7)
and (16, 19.4), we may get a different result: using these two points the slope is
\[
\frac{19.4 - 0.7}{16 - 1} \approx 1.25.
\]
Notice that to calculate the slope of the line accurately it was necessary to use points
on the line of best fit rather than taking points from the data table. Estimating the
line of best fit visually is not an exact process, and you and a friend may have slightly
differing answers for the same data. It is important to plot the data carefully, so that
any error is small. It is also a good strategy to use points that are reasonably far
apart to compute the slope.
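
The two slope calculations of Example 1.8 can be reproduced with a few lines of Python. This is a sketch only: the function name `slope` is our own, and the points (10, 12) and (20, 24) are the ones read from the line of best fit above, not values computed from the data.

```python
def slope(p, q):
    """Slope of the straight line through the points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)

t = [1, 2, 3, 4, 5, 6]
y = [0.7, 5.6, 9.6, 19.4, 31, 42.1]
T = [ti ** 2 for ti in t]  # the new variable T = t^2

# Two points read from the line of best fit.
print(slope((20, 24), (10, 12)))                    # → 1.2
# Two actual data points, (T, y) = (1, 0.7) and (16, 19.4).
print(round(slope((T[0], y[0]), (T[3], y[3])), 2))  # → 1.25
```

Choosing a different pair of data points generally changes the second result, which is why the points should be taken from the line of best fit.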

1.3.3 Linearization
In Example 1.8 above we chose to plot one of the variables against the square of the
other in the hope that doing so would produce a straight line through the origin,
and, luckily, it did. If we had obtained a straight line that did not pass through the
origin we would still have been able to determine the relationship between T and y,
and hence the relationship between t and y, since it is always straightforward to
determine the equation of a straight line graph. On the other hand, if we plot some
experimentally obtained data points and find that they do not lie on a straight line
graph, the relationship between the variables generally remains unclear. To illustrate
this point, consider the result of plotting
the data points from Table 1 of Example 1.8.
[Figure: the data of Table 1 plotted with y on the vertical axis and t on the horizontal axis.]
The points clearly do not lie on a straight line, and hence we cannot immediately say
what the relationship is between y and t.
If the graph of a dependent variable y against an independent variable x turns
out to be a straight line then we say that x and y satisfy a linear relationship. It
means that y = mx + c for some constants m and c that are easily obtained from the
graph. If x and y do not satisfy a linear relationship then we can try introducing new
variables X and Y , given by some simple functions of x and y respectively, in the
hope that X and Y will satisfy a linear relationship. This procedure, if successful, is
known as linearization of the data. The all-important question, of course, is how to
define the new variables.
There is no general rule, guaranteed to work in all cases, for deciding what the
new variables should be. It depends on the data in question. Generally speaking, one
needs some extra information that gives one some idea of what kind of relationship
might hold.
If one is able to find a transformation that linearizes a set of data, there is a
reasonable procedure for finding an equation to fit the data. It is as follows.
(i) Transform the data from the old variables, say x and y, to new variables, say
X and Y . Here the values of X should be determined by some function of x
and the values of Y by some function of y. (It may be adequate to use one new
variable in conjunction with one of the old variables. In other words, we may
find it convenient to take X = x or Y = y.) We assume that X and Y have a
linear relationship.
(ii) From the graph of the data for X and Y , determine the linear equation in X
and Y . That is, find values for a and b such that Y = a + bX.
(iii) Transform the equation in X and Y to an equivalent equation in x and y.
The only difficulty is the first step; that is, finding a suitable transformation. Fortu-
nately, there are two frequently occurring classes of relationships for which standard
linearizing transformations are available. These are as follows.
(a) Power laws. These are relationships of the form

\[ y = Ax^b, \]

where A and b are constants that depend on the context. For example, the sizes
of two different parts of a living organism are often related by a power law.
(b) Exponential laws. These are relationships of the form

\[ y = Ab^x = Ae^{kx} \quad (\text{where } k = \ln b). \]

For example, if the rate at which y increases or decreases is proportional to its
current value then y will be related to time via an exponential law. Examples
are population growth (so long as there are sufficient resources to sustain the
increase) and radioactive decay.
Exponential laws are so-called since the independent variable x occurs as an exponent
in the equation $y = Ab^x$. Power laws are intrinsically very different from exponential
laws, since they involve $x^b$ rather than $b^x$.
It turns out that in both of these cases linearization can be achieved by use of
logarithmic transformations. The difference between the two is that for exponential
laws we need only introduce the logarithm of one of the variables, while for power
laws we must use the logarithms of both variables. These linearization procedures
are illustrated in examples below.
Log-log plots
If it is suspected that a function, y = f (x), determined by a set of experimental data
is in fact a power law, so that
\[ y = Ax^b, \tag{1} \]
then, taking logarithms of both sides and using some standard properties of
logarithms (see Appendix 2), we find that

\begin{align*}
\ln y &= \ln(Ax^b) \\
      &= \ln A + \ln x^b \\
      &= \ln A + b \ln x.
\end{align*}

So if we let Y = ln y and X = ln x then Eq. (1) has been transformed into

Y = ln A + bX.

Here ln A and b are both constants.


The relationship between Y (= ln y) and X (= ln x) is a linear relationship, since
writing a = ln A gives Y = a + bX. So if we plot Y against X, a straight line graph
will result. If a straight line is not obtained then we will know that the relationship
between x and y was not, after all, a power law. If we do obtain a straight line then
its equation can easily be obtained, giving us the values of a and b. Specifically, b is
the slope of the line and a the Y -intercept (the value of Y at X = 0). Since a = ln A
gives $A = e^a$ (see Appendix 2), both the parameters A and b in Eq. (1) will then be
known, giving an explicit formula for y in terms of x.

Example 1.9

Find a formula y = f (x) for the function defined by the following data.

x    1.5    2      2.5    3      3.5    4
y    1.5    3.6    4.7    7.3    9.4    12.9
Table 4

We apply logarithmic transformations to both variables:

X = ln x    0.4     0.69    0.92    1.10    1.25    1.39
Y = ln y    0.59    1.28    1.55    1.99    2.24    2.56
Table 5

Now plot Y against X:


[Figure: the data of Table 5 plotted with Y on the vertical axis and X on the horizontal axis, together with the line of best fit.]

Since the points (1.6, 3) and (0.1, 0) lie on the best fitting straight line as drawn
in our diagram, we can use these points to find the slope. The result is that
\[ \text{slope} = b = \frac{3 - 0}{1.6 - 0.1} = \frac{3}{1.5} = 2, \]
so that Y = a + 2X. Now since (0.1, 0) lies on the line, we find that
\[ 0 = a + 0.2, \]
giving a = −0.2. Therefore,
\[ Y = 2X - 0.2. \]
We must now express this relationship in terms of the original variables x and y.
Since Y = ln y and X = ln x,
ln y = 2 ln x − 0.2,
and using the formulas in Appendix 2 once more,
\begin{align*}
y &= e^{2 \ln x - 0.2} \\
  &= e^{2 \ln x} \times e^{-0.2} \\
  &= e^{\ln x^2} \times e^{-0.2} \\
  &= x^2 \times e^{-0.2} \\
  &\approx 0.8x^2.
\end{align*}
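
Steps (ii) and (iii) of the linearization procedure can also be carried out numerically. The sketch below is illustrative only: it assumes a least-squares line in place of the line drawn by eye, the helper `fit_line` is a hypothetical name, and the input values are the transformed values exactly as listed in Table 5.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b of Y = a + b*X (stand-in for a fit by eye)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Transformed values X = ln x and Y = ln y, as listed in Table 5.
X = [0.4, 0.69, 0.92, 1.10, 1.25, 1.39]
Y = [0.59, 1.28, 1.55, 1.99, 2.24, 2.56]

a, b = fit_line(X, Y)  # step (ii): the linear equation Y = a + b*X
A = math.exp(a)        # step (iii): a = ln A, so A = e^a

print(f"Y = {a:.2f} + {b:.2f} X, so y is roughly {A:.2f} * x^{b:.2f}")
```

The computed exponent and coefficient come out close to, though not identical with, the values b = 2 and A ≈ 0.8 read from the graph, since a computed fit and a line drawn by eye need not agree exactly.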

Semi-log plots
If it is suspected that the relationship y = f (x) is exponential, so that

\[ y = Ae^{kx} \tag{2} \]

for some constants A and k, then, again taking logarithms, we find that

\begin{align*}
\ln y &= \ln A + \ln e^{kx} \\
      &= \ln A + kx.
\end{align*}

In this case, it is only necessary to transform the y values. Letting Y = ln y, our
calculations have transformed Eq. (2) into

\[ Y = \ln A + kx. \]

Writing a = ln A, we see that the linear equation Y = a + kx relates the variables Y
and x. (Contrast this with the previous example, where we found a linear relationship
between Y = ln y and X = ln x rather than Y = ln y and x itself.) Our task will be
to find the values of a and k by plotting Y against x, and then use $A = e^a$ to find
the equation $y = Ae^{kx}$ explicitly.

Example 1.10
The alcoholic content, y mg/ml, of a person’s blood t hours after drinking whiskey
rose to 0.22 mg/ml and then slowly decreased, as shown in Table 6.

t    0       0.5     0.75    1.0     1.5     2.0     2.5     3.0
y    0.20    0.19    0.16    0.12    0.11    0.09    0.06    0.04
Table 6

If we assume that y = f (t) is likely to be exponential then we will want to plot
Y = ln y against t. The points to be plotted are given in the following table.

t    0        0.5      0.75     1.0      1.5      2.0      2.5      3.0
Y    −1.61    −1.66    −1.83    −2.12    −2.21    −2.41    −2.81    −3.22
Table 7

It turns out that the points do lie close to a straight line.


[Figure: the data of Table 7 plotted with Y on the vertical axis and t on the horizontal axis, together with the line of best fit.]

Since the line of best fit passes through the points (0, −1.5) and (3, −3), its slope
k is given by
\[ k = \frac{-3 - (-1.5)}{3 - 0} = -0.5. \]
Furthermore, the Y -intercept is −1.5; so the equation relating Y and t is
\[ Y = -0.5t - 1.5. \]
Thus ln y = −0.5t − 1.5, and it follows that
\begin{align*}
y &= e^{-0.5t - 1.5} \\
  &= e^{-0.5t} \times e^{-1.5} \\
  &\approx 0.22e^{-0.5t}.
\end{align*}
Note that $e^{-0.5t} = (e^{-0.5})^t \approx (0.61)^t$; so the above formula can also be written as
$y \approx (0.22) \times (0.61)^t$. However, it is usual to leave it in the form $y = 0.22e^{-0.5t}$.
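
The semi-log procedure of this example can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of the notes: a least-squares line replaces the line of best fit drawn by eye, and the names `k`, `a`, `A` and `model` simply mirror the symbols used above.

```python
import math

# Data from Table 6: time t in hours, blood alcohol content y in mg/ml.
t = [0, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.20, 0.19, 0.16, 0.12, 0.11, 0.09, 0.06, 0.04]

# Only the y values are transformed: Y = ln y (compare Table 7).
Y = [math.log(value) for value in y]

# Least-squares line Y = a + k*t (assumption: replaces the line drawn by eye).
n = len(t)
mean_t = sum(t) / n
mean_Y = sum(Y) / n
k = sum((ti - mean_t) * (Yi - mean_Y) for ti, Yi in zip(t, Y)) / sum(
    (ti - mean_t) ** 2 for ti in t
)
a = mean_Y - k * mean_t
A = math.exp(a)  # a = ln A, so A = e^a


def model(time):
    """Fitted exponential law y = A * e^(k * time)."""
    return A * math.exp(k * time)


print(f"k = {k:.2f}, A = {A:.2f}")
```

The fitted constants are close to the values k = −0.5 and A ≈ 0.22 obtained from the graph.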

1.3.4 Logarithmic scales


There are many scientific contexts in which the logarithms of various quantities as-
sociated with the situation provide convenient and natural measuring scales. For
quantities that can only take positive values, it is often the case that the ratio of
two values is a more natural basis of comparison than their difference. In such cases
logarithmic scales are commonly used, since taking logarithms converts ratios to
differences (by virtue of the formula log(a/b) = log a − log b).
(a) Sound level
The intensity of a sound is defined as the amount of energy transmitted per unit
time through a unit area perpendicular to the direction of travel of the sound waves.
Accordingly, sound intensity can be measured in units such as joules per second per
square centimetre (a joule being a measure of energy). Energy per unit time is called
power; so sound intensity can also be measured in units of power per unit area. The
standard unit of power is the watt; one watt is one joule per second.
Define $I_0$ to be a sound intensity of $10^{-16}$ watt/cm$^2$; this is the intensity of
the faintest sound that can be heard. By definition, the sound level of a sound of
intensity I is b decibels, where b is given by the formula

\[ b = 10 \log(I/I_0). \]

Here log means logarithm to the base 10 (or common logarithm).

The formula implies that a sound level of zero decibels corresponds to the case
$I = I_0$, sounds that are barely audible. The threshold of pain occurs at approximately
120 decibels, corresponding to sounds of intensity $10^{12}$ times greater than $I_0$. Since
$I_0 = 10^{-16}$ watt/cm$^2$, it follows that $10^{12} I_0 = 10^{-4}$ watt/cm$^2$.
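
The decibel formula and its inverse are easy to check numerically. The following short Python sketch uses the reference intensity given above; the function names are illustrative and not part of the notes.

```python
import math

I0 = 1e-16  # reference intensity in watt/cm^2, as defined in the text


def sound_level(intensity):
    """Sound level in decibels of a sound whose intensity is given in watt/cm^2."""
    return 10 * math.log10(intensity / I0)


def intensity_from_level(decibels):
    """Inverse of sound_level: the intensity in watt/cm^2 for a given sound level."""
    return I0 * 10 ** (decibels / 10)


print(sound_level(I0))            # a barely audible sound
print(intensity_from_level(120))  # the threshold of pain
```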
(b) Richter Scale
The Richter scale used for measuring the magnitude of an earthquake is a logarithmic
scale, again using logarithms to the base 10. The intensity of an earthquake is roughly
analogous to the intensity of a sound, and it is proportional to the amplitude of the
vibrations of the earth at a fixed distance from the epicentre of the earthquake. The
magnitude on the Richter scale of an earthquake of intensity I is defined to be the
logarithm of $I/I_0$, where $I_0$ is a certain fixed quantity, corresponding to an earthquake
of very low intensity. Writing M (I) for the magnitude, the formula is

\[ M(I) = \log_{10}(I/I_0). \]

Thus the difference in magnitude of two earthquakes is the logarithm of the ratio of
their intensities. Equivalently, it is the logarithm of the ratio of the amplitudes that
a seismograph would record at the same distance from the epicentres.
For example, an earthquake that measures 5 on the scale is $10^5$ times as intense
as one with intensity $I_0$ (since $5 = \log_{10}(I/I_0)$ if and only if $I/I_0 = 10^5$). Similarly,
an earthquake that measures 6 on the scale has intensity $10^6 I_0$, and so is 10 times as
intense as one with a Richter reading of 5. Indeed, an increase of 1 on the Richter
scale always corresponds to an increase in intensity by a factor of 10. Thus, an
increase of 2 on the scale corresponds to a 100-fold increase in intensity, and so on.
All logarithmic scales have this property that an increase of one unit on the scale
corresponds to multiplying the quantity being measured by a fixed factor.
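
Because differences in magnitude correspond to ratios of intensity, comparing two earthquakes needs only the difference of their Richter readings. A minimal Python sketch, with an illustrative function name:

```python
def intensity_ratio(magnitude_1, magnitude_2):
    """How many times as intense the first earthquake is compared with the second.

    Follows from M(I) = log10(I/I0): I1/I2 = 10 ** (M1 - M2).
    """
    return 10 ** (magnitude_1 - magnitude_2)


print(intensity_ratio(6, 5))  # an increase of 1 on the scale
print(intensity_ratio(7, 5))  # an increase of 2 on the scale
```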

(c) pH factor
Of fundamental importance to any living organism is the level of hydrogen ion
concentration, $[\mathrm{H}^+]$, of its environment. The value of $[\mathrm{H}^+]$ determines whether a substance
is classified as alkaline, neutral or acidic, and it can range from a minimum of $10^{-12}$
(for the most alkaline solutions) to a maximum of $10^{-2}$ (for the most acidic). Distilled
water (neutral) has an $[\mathrm{H}^+]$ value of $10^{-7}$.
When comparing the $[\mathrm{H}^+]$ values of two substances, it is not the difference of
the values that is important, but their ratio. To see this, let A, B and C denote,
respectively, the $[\mathrm{H}^+]$ values for the strongest acid, for distilled water and for the
strongest alkali. Thus $A = 10^{-2}$, while $B = 10^{-7}$ and $C = 10^{-12}$. Observe that

\[ \frac{B}{C} = 10^5 = \frac{A}{B}. \]

By contrast, $B - C \approx 0.0000001$ is negligible in comparison with $A - B \approx 0.01$.
This shows us that use of a logarithmic scale is appropriate.
The so-called pH factor is defined to be the negative of the logarithm of $[\mathrm{H}^+]$:

\[ \mathrm{pH} = -\log_{10} [\mathrm{H}^+]. \]

In consequence of this definition, the pH of a solution determines whether it is acidic,
alkaline or neutral:
pH < 7 ⇒ the solution is acidic;
pH = 7 ⇒ the solution is neutral;
pH > 7 ⇒ the solution is alkaline.
Note that on this scale, an increase of 1 corresponds to multiplying $[\mathrm{H}^+]$ by a factor
of $10^{-1}$ (that is, $\frac{1}{10}$).

Example 1.11
Suppose that two substances, $S_1$ and $S_2$, have pH factors 5.5 and 8.7 respectively.
How does the $[\mathrm{H}^+]$ concentration in $S_1$ compare with that in $S_2$?
In $S_1$ we have $[\mathrm{H}^+] = 10^{-5.5}$, while in $S_2$ we have $[\mathrm{H}^+] = 10^{-8.7}$. Since

\[ \frac{10^{-5.5}}{10^{-8.7}} = 10^{3.2} \approx 1600, \]

the hydrogen ion concentration of $S_1$ is about 1600 times that of $S_2$.
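
The calculation in this example, together with the classification rule stated earlier, can be reproduced with a few lines of Python. This is a sketch only; the function names are illustrative.

```python
def hydrogen_ion_concentration(ph):
    """Invert pH = -log10[H+] to recover [H+]."""
    return 10 ** (-ph)


def classify(ph):
    """Classify a solution as acidic, neutral or alkaline from its pH."""
    if ph < 7:
        return "acidic"
    if ph > 7:
        return "alkaline"
    return "neutral"


ratio = hydrogen_ion_concentration(5.5) / hydrogen_ion_concentration(8.7)
print(classify(5.5), classify(8.7), round(ratio))
```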

Exercises Set 1.2

1. Solve the equations


(i) log x = log 12 − 2 log 5        (ii) log x + log 2x = 2
(iii) $e^{2 \ln x} = 16$              (iv) ln(x + 2) = 1
(v) log(5x) = 12                    (vi) $e^{2 - x} = 4$

2. How much more intense is an earthquake that measures 6 on the Richter scale
than one that measures 5.5?

3. For each of the following sets of data, plot ln y against ln x, and, if necessary, ln y
against x, to determine whether an equation of the form $y = ax^b$ or $y = ae^{kx}$
satisfies the data. Find the equation in each case.

(i ) x 1 2 3 4 5
y 3 4.1 5.4 5.9 6.7

(ii ) x 0 0.5 1 1.5 2 2.5


y 5 15 35 102 278 735

(iii ) x 10 20 30 40 50
y 4.9 8.0 10.4 11.3 13.9

(iv ) x 0 1 2 3 4
y 19 14 8 7 4


4. (i ) Show that the transformation X = x linearizes the following data.

x 2 2.5 3 3.5 4 4.5


y 5.2 5.6 7 7.7 8.2 8.6

(ii ) Show that the transformation $X = \frac{1}{x}$ linearizes the following data.

x 1 2 3 4 5 6
y 1 6.1 7.6 8.3 8.9 9.6

§1.4 Finite Differences

1.4.1 Difference Tables


A third procedure for fitting a curve to a set of data utilizes the method of finite
differences. Although this is quite a general method, we shall concentrate on its use
for fitting a polynomial to a set of data. The method is particularly suitable if
(i) the data function is tabulated at equal intervals of the independent variable, and
(ii) the data function displays regularity in the values of the difference functions
constructed from it,
as illustrated in our next example.

Example 1.12
Suppose that an experiment records the following data.

x    0.5    1.0    1.5    2.0    2.5    3.0
y    1.9    2.5    3.1    3.7    4.3    4.9
Table 8
This function is tabulated at equal intervals of 0.5 of the independent variable x. We
say that the interval of tabulation is 0.5. Furthermore, note that for each increase of
0.5 in x, there is a corresponding constant increase of 0.6 in the dependent variable y.
As x changes from a to a + h, where h is the interval of tabulation, the change in
the value of y = f (x) is f (a + h) − f (a). We write this quantity as ∆f (a) or ∆y.
The function obtained by calculating ∆f (x) for each tabulated x is called the first
difference function, ∆f , of the function f . We shall use difference functions to help
describe the functions from which they are derived.
For the given data,
∆f (0.5) = f (1) − f (0.5) = 0.6
∆f (1) = f (1.5) − f (1) = 0.6
∆f (1.5) = f (2) − f (1.5) = 0.6
∆f (2) = f (2.5) − f (2) = 0.6
∆f (2.5) = f (3) − f (2.5) = 0.6
Observe that in this case the first difference function is a constant function. Its values
can be tabulated as follows.
x 0.5 1.0 1.5 2.0 2.5 3.0
∆y 0.6 0.6 0.6 0.6 0.6
Table 9
However, for the purpose of displaying the values of f and ∆f on the same table, we
shall usually employ the layout shown in Table 10 below (called a difference table).
There are two points to note.
(i) The table displays two functions: the data function y = f (x) and its first differ-
ence function y = ∆f (x);
(ii) The domains of these functions are similar but not identical. The function f has
domain { 0.5, 1.0, 1.5, 2.0, 2.5, 3.0}, while ∆f has domain { 0.5, 1.0, 1.5, 2.0, 2.5}.†

x      f(x)    ∆f(x)
0.5    1.9
               0.6
1.0    2.5
               0.6
1.5    3.1
               0.6
2.0    3.7
               0.6
2.5    4.3
               0.6
3.0    4.9
Table 10
† The domain of a function is the set of points at which it is defined.

In general, a difference table takes the following form.

x         f(x)         ∆f(x)
a         f(a)
                       ∆f(a)
a + h     f(a + h)
                       ∆f(a + h)
a + 2h    f(a + 2h)
                       ∆f(a + 2h)
a + 3h    f(a + 3h)
Table 11
where ∆f (x) = f (x + h) − f (x).
Geometrically, if ∆f (x) is constant, then, since the values of x are a constant
distance h apart, the data points will lie on a straight line when they are plotted.
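[Figure: the points (a, f(a)), (a+h, f(a+h)) and (a+2h, f(a+2h)) plotted with f(x) on the vertical axis and x on the horizontal axis; the vertical steps ∆f(a) and ∆f(a+h) are marked.]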
Thus we should be able to find a first degree polynomial, f (x) = a0 + a1 x, that fits
the data. We shall see how to do this in the next section.
The regularity in the differences of the dependent variable values may not always
be as obvious as in our previous example.

Example 1.13
Suppose that an experiment records the following data.

x 0 1 2 3 4 5 6
f (x) 1.4 0.6 2.2 6.2 12.6 21.4 32.6
Table 12

Again the data is tabulated with constant increments of the independent variable
x, but this time the corresponding changes in the dependent variable are not constant.
Indeed, the first difference function ∆f takes the values shown in the next table.

x 0 1 2 3 4 5
∆f (x) −0.8 1.6 4.0 6.4 8.8 11.2

Table 13

If we now construct ∆(∆f ), the difference function for ∆f , we obtain the following
table.
x 0 1 2 3 4
∆ (∆f (x)) 2.4 2.4 2.4 2.4 2.4

Table 14

We call the function ∆(∆f ) the second difference function of f , and we use the
notation ∆²f . Analogous functions ∆³f , and so on, can also be constructed.
The given data in this example exhibits a new kind of regularity: constant second
differences. The fact that ∆(∆f ) is constant means that a first degree polynomial
can be found that fits the values of ∆f . We shall see that this enables us to find a
polynomial of degree 2 that fits the values of f . The procedure will be explained in
the next section.
The values of f , ∆f and ∆²f are generally more conveniently displayed in a
difference table as follows.

x    f(x)    ∆f(x)    ∆²f(x)
0    1.4
             −0.8
1    0.6              2.4
             1.6
2    2.2              2.4
             4.0
3    6.2              2.4
             6.4
4    12.6             2.4
             8.8
5    21.4             2.4
             11.2
6    32.6
Table 15
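
The columns of a difference table can be generated mechanically, since each column consists of the successive differences of the column before it. The following Python sketch (the function name `difference_columns` is illustrative) rebuilds the difference columns for the data of Tables 8 and 12.

```python
def difference_columns(values, order):
    """Return [f values, first differences, ..., differences of the given order]."""
    columns = [list(values)]
    for _ in range(order):
        previous = columns[-1]
        # Each entry is f(x + h) - f(x) for neighbouring entries of the previous column.
        columns.append([later - earlier for earlier, later in zip(previous, previous[1:])])
    return columns


table_8 = [1.9, 2.5, 3.1, 3.7, 4.3, 4.9]
table_12 = [1.4, 0.6, 2.2, 6.2, 12.6, 21.4, 32.6]

first_8 = difference_columns(table_8, 1)[1]
columns_12 = difference_columns(table_12, 2)

# Rounding hides the tiny floating-point errors in the subtractions.
print([round(d, 6) for d in first_8])        # constant first differences
print([round(d, 6) for d in columns_12[1]])  # first differences, not constant
print([round(d, 6) for d in columns_12[2]])  # constant second differences
```

Note that each column is one entry shorter than the one before it, which matches the remark above about the domains of f and ∆f.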

Table 16 illustrates a difference table with third differences.

x         f(x)         ∆f(x)         ∆²f(x)        ∆³f(x)
a         f(a)
                       ∆f(a)
a + h     f(a + h)                   ∆²f(a)
                       ∆f(a + h)                   ∆³f(a)
a + 2h    f(a + 2h)                  ∆²f(a + h)
                       ∆f(a + 2h)
a + 3h    f(a + 3h)
Table 16

1.4.2 Constructing polynomials from difference tables


First Degree Polynomials
If a tabulated function f has a constant first difference function ∆f , then a first
degree polynomial formula can be constructed to fit the given values of f . As a first
step, recall that ∆f (a) = f (a + h) − f (a); so f (a + h) = f (a) + ∆f (a). This simple
observation can be reformulated as follows: each successive function value is the sum
of two preceding terms in the difference table, namely the function value directly
above it and the difference value diagonally above it.

a        f(a)
                     ∆f(a)
a + h    f(a + h)
...      ...         ...

The same principle with a + h in place of a yields f (a + 2h) = f (a + h) + ∆f (a + h).

a + h     f(a + h)
                       ∆f(a + h)
a + 2h    f(a + 2h)

Now if ∆f (x) takes the constant value D, so that

D = ∆f (a) = ∆f (a + h) = ∆f (a + 2h) = · · · ,

then we deduce that

f (a + 2h) = f (a + h) + ∆f (a + h)
= f (a + h) + D
= (f (a) + ∆f (a)) + D
= f (a) + 2D,

and similarly,

f (a + 3h) = f (a + 2h) + ∆f (a + 2h)


= (f (a) + 2D) + D
= f (a) + 3D.

More generally, for all integers k ≥ 0,

f (a + kh) = f (a) + kD. (1)

If we put x = a + kh (and hence k = (x − a)/h) then this equation becomes

\[ f(x) = f(a) + \frac{D}{h}(x - a). \tag{2} \]

Moreover, we know that Eq. (2) holds for all values of x in the table, since these x
values all have the form a + kh for nonnegative integers k. Since the right hand side
of Eq. (2) is clearly a linear function (since f (a), a, D and h are constants) we have
found a linear equation that is satisfied by the data.
For the data in Example 1.12 above, the constant value of the first differences
is 0.6, and the other constants appearing in Eq. (2) are a = 0.5, h = 0.5 and
f (a) = 1.9. So a first degree polynomial formula satisfied by the data is

\begin{align*}
f(x) &= 1.9 + \frac{0.6}{0.5}(x - 0.5) \\
     &= 1.9 + 1.2(x - 0.5) \\
     &= 1.2x + 1.3.
\end{align*}

We could check directly that the data does indeed satisfy this first degree (linear)
polynomial. To do this, substitute in all the values of x and check that the answer
always agrees with the table.
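
The check described above can be carried out by a short program. The sketch below builds the linear formula from the constants a, h, f(a) and D and compares it with every entry of Table 8; the function name is illustrative.

```python
def linear_from_differences(a, h, f_a, D):
    """Return the function f(x) = f(a) + (D/h)(x - a) for constant first difference D."""
    def f(x):
        return f_a + (D / h) * (x - a)
    return f


f = linear_from_differences(a=0.5, h=0.5, f_a=1.9, D=0.6)

# Values of Table 8, used to check the formula at every tabulated x.
table_8 = {0.5: 1.9, 1.0: 2.5, 1.5: 3.1, 2.0: 3.7, 2.5: 4.3, 3.0: 4.9}
for x, tabulated in table_8.items():
    print(x, round(f(x), 6), tabulated)
```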

Thus, with the first difference function constant, we have found a linear polyno-
mial satisfied by the data. This may appear to be a rather formal procedure for such
a simple case, but the method extends easily to higher degree polynomials.
Second Degree Polynomials
If the second differences of a tabulated function f are constant, one can find a second
degree polynomial that fits the data. The algebra is a little harder than in the previous
case, but still routine.
We start with the observation that since the first difference function has constant
first differences, the theory developed above applies to it. In particular, by Eq. (1)
with ∆f in place of f ,
∆f (a + kh) = ∆f (a) + kD (3)
for all integers k ≥ 0, where D is the constant value of the function ∆²f .
Now, looking at the difference table as before, we find that

f (a + h) = f (a) + ∆f (a), (4)

and similarly
f (a + 2h) = f (a + h) + ∆f (a + h).
Putting k = 1 in Eq. (3) provides a formula for ∆f (a + h), and combining this with
the formula for f (a + h) in Eq. (4) we deduce that

f (a + 2h) = (f (a) + ∆f (a)) + (∆f (a) + D)


= f (a) + 2∆f (a) + D. (5)

Similarly,

f (a + 3h) = f (a + 2h) + ∆f (a + 2h)


= (f (a) + 2∆f (a) + D) + (∆f (a) + 2D)

by Eq. (5) and Eq. (3) applied with case k = 2. So

f (a + 3h) = f (a) + 3∆f (a) + 3D. (6)

Applying the same method once more,

f (a + 4h) = f (a + 3h) + ∆f (a + 3h)


= (f (a) + 3∆f (a) + 3D) + (∆f (a) + 3D)

by Eq. (6) and Eq. (3) with k = 3. So

f (a + 4h) = f (a) + 4∆f (a) + 6D.

You may check, by similar calculations, that

f (a + 5h) = f (a) + 5∆f (a) + 10D,


f (a + 6h) = f (a) + 6∆f (a) + 15D,
and
f (a + 7h) = f (a) + 7∆f (a) + 21D.

Thus it seems that

\[ f(a + kh) = f(a) + k\Delta f(a) + n_k D, \]

for some nonnegative integer $n_k$ depending on k. Furthermore, the sequence of values of
$n_k$ as k goes from 1 to 7 is as follows: 0, 1, 3, 6, 10, 15, 21. In fact, $n_k$ is simply the
sum of the numbers from 0 to k − 1, and it is given by the formula

\[ n_k = \tfrac{1}{2} k(k - 1). \]

So
\[ f(a + kh) = f(a) + k\Delta f(a) + \tfrac{1}{2} k(k - 1)\Delta^2 f(a). \]
Defining x = a + kh (and hence k = (x − a)/h), this equation becomes

\begin{align*}
f(x) &= f(a) + \frac{(x - a)}{h}\Delta f(a) + \frac{1}{2}\,\frac{(x - a)}{h}\left(\frac{(x - a)}{h} - 1\right) D \\
     &= f(a) + \frac{1}{h}\Delta f(a)(x - a) + \frac{1}{2h^2} D(x - a)(x - a - h).
\end{align*}
This is a quadratic (or second degree) polynomial that is satisfied by the data (since
a, h, D, f (a) and ∆f (a) are all constants).
For the data in Example 1.13 above, the constant value of the second differences
is D = 2.4, and the other constants appearing in this formula are a = 0, h = 1,
f (a) = f (0) = 1.4 and ∆f (a) = ∆f (0) = −0.8. So a second degree polynomial
formula satisfied by the data is
\begin{align*}
f(x) &= 1.4 + (-0.8)x + \tfrac{1}{2}(2.4)\,x(x - 1) \\
     &= 1.4 - 2x + 1.2x^2.
\end{align*}
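
As with the linear case, the quadratic formula can be checked against the table by direct substitution. The following Python sketch does this for the data of Example 1.13 and also confirms the claim that the coefficient of D is the sum of the numbers from 0 to k − 1; the function name is illustrative.

```python
def quadratic_from_differences(a, h, f_a, df_a, D):
    """Return f(x) = f(a) + (1/h) df(a) (x-a) + (1/(2h^2)) D (x-a)(x-a-h)."""
    def f(x):
        return f_a + df_a * (x - a) / h + D * (x - a) * (x - a - h) / (2 * h ** 2)
    return f


f = quadratic_from_differences(a=0, h=1, f_a=1.4, df_a=-0.8, D=2.4)

table_12 = [1.4, 0.6, 2.2, 6.2, 12.6, 21.4, 32.6]  # f(0), f(1), ..., f(6)
for x, tabulated in enumerate(table_12):
    print(x, round(f(x), 6), tabulated)

# n_k = k(k-1)/2 is the sum 0 + 1 + ... + (k-1), for k = 1, ..., 7.
coefficients = [sum(range(k)) for k in range(1, 8)]
print(coefficients)
```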

Example 1.14
Problem: find a polynomial of minimal degree that fits the data below.

x       0.0       0.5       1.0      1.5      2.0      2.5
f(x)    −0.500    −0.125    1.000    2.875    5.500    8.875
Table 17

First, construct the difference table.

x      f(x)      ∆f(x)    ∆²f(x)
0.0    −0.500
                 0.375
0.5    −0.125             0.750
                 1.125
1.0    1.000              0.750
                 1.875
1.5    2.875              0.750
                 2.625
2.0    5.500              0.750
                 3.375
2.5    8.875

As the first differences are not constant there is no suitable first degree polynomial.
However, the second differences are constant, and so it is possible to find a suitable
second degree polynomial. As was shown above, the required polynomial is given by
the formula
\[ f(x) = f(a) + \frac{1}{h}\Delta f(a)(x - a) + \frac{1}{2h^2}\Delta^2 f(a)(x - a)(x - a - h), \]
where h is the interval of tabulation. It is usual to take a to be the first of the x
values in the table; however, so long as the values of f (a) and ∆f (a) appear in the
table, we could take a to be any of the x values. (They all give the same answer.)
Here h = 0.5, and if we take a = 0 then f (a) = −0.5, ∆f (a) = 0.375 and
∆²f (a) = 0.750. Thus the polynomial we seek is

\begin{align*}
f(x) &= -0.5 + \frac{0.375}{0.5}x + \frac{0.750}{2(0.5)^2}x(x - 0.5) \\
     &= 1.5x^2 - 0.5.
\end{align*}

For interest’s sake, let us check that taking a = 1 leads to the same answer. This
time we have f (a) = 1 and ∆f (a) = 1.875; so the formula gives

\begin{align*}
f(x) &= 1 + \frac{1.875}{0.5}(x - 1) + \frac{0.750}{2(0.5)^2}(x - 1)(x - 1.5) \\
     &= 1 + (3.75x - 3.75) + 1.5(x^2 - 2.5x + 1.5) \\
     &= (1 - 3.75 + 2.25) + (3.75 - 3.75)x + 1.5x^2 \\
     &= 1.5x^2 - 0.5,
\end{align*}

the same answer as before.

The procedure we used to derive the formula for the second degree polynomial
when the second differences are constant extends naturally to the case of constant
third differences, in which case a third degree polynomial is obtained, and to constant
fourth differences, giving a degree 4 polynomial, and so on.
The formulae for the first three cases are as follows.
First degree:
\[ f(x) = f(a) + \frac{(x - a)}{h}\Delta f(a) \]
Second degree:
\[ f(x) = f(a) + \frac{(x - a)}{h}\Delta f(a) + \frac{(x - a)(x - a - h)}{2h^2}\Delta^2 f(a) \]
Third degree:
\[ f(x) = f(a) + \frac{(x - a)}{h}\Delta f(a) + \frac{(x - a)(x - a - h)}{2h^2}\Delta^2 f(a) + \frac{(x - a)(x - a - h)(x - a - 2h)}{6h^3}\Delta^3 f(a) \]
This generalizes to higher degrees in a natural way. Indeed, for each degree n,
the formula for the polynomial of degree n is obtained from the one of degree n − 1
by adding the extra term

\[ \frac{(x - a)(x - a - h)(x - a - 2h) \cdots (x - a - (n - 1)h)}{n!\, h^n}\Delta^n f(a), \]
where n! is an abbreviation for the product of the numbers from 1 to n.
This degree n polynomial is known as Newton’s Polynomial.
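
The general formula translates directly into a short program. The Python sketch below is one possible arrangement, not a definitive implementation: the function name `newton_polynomial` is illustrative, and for simplicity it uses every difference column the table allows rather than stopping at the first constant column (the higher differences are then zero, so the extra terms contribute nothing). It is tested on the data of Example 1.14.

```python
from math import factorial


def newton_polynomial(xs, ys):
    """Build Newton's polynomial from equally spaced xs and tabulated values ys."""
    a = xs[0]
    h = xs[1] - xs[0]  # interval of tabulation

    # Top entry of each difference column: f(a), Δf(a), Δ²f(a), ...
    leading = []
    column = list(ys)
    while column:
        leading.append(column[0])
        column = [later - earlier for earlier, later in zip(column, column[1:])]

    def f(x):
        total = 0.0
        product = 1.0  # (x - a)(x - a - h)...(x - a - (n-1)h), empty product for n = 0
        for n, difference in enumerate(leading):
            total += product * difference / (factorial(n) * h ** n)
            product *= x - a - n * h
        return total

    return f


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [-0.500, -0.125, 1.000, 2.875, 5.500, 8.875]  # Table 17
f = newton_polynomial(xs, ys)

print(f(0.25), 1.5 * 0.25 ** 2 - 0.5)  # compare with 1.5x^2 - 0.5
```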

Exercises Set 1.3

1. Construct difference tables and find polynomials to fit the following sets of data.
(i )
x 0 2 4 6 8
f (x) 17 13 9 5 1

(ii )
x 1 1.2 1.4 1.6 1.8 2.0
f (x) 3 3.8 4.6 5.4 6.2 7

(iii )
x −3 −2 −1 0 1 2 3
f (x) 4 −5 −10 −11 −8 −1 10

(iv )
x 2 2.1 2.2 2.3 2.4 2.5
f (x) 7 7.41 7.84 8.29 8.76 9.25

(v )
x 0 10 20 30 40
f (x) 3 1053 8303 27753 65403

(vi )
x 1 2 3 4 5 6
f (x) 0.25 2 6.75 16 31.25 54
Chapter 2
Optimisation

§2.1 One Variable Problems

In this section we will be concerned with finding maximum and minimum values
of various functions. Problems of this type arise quite frequently. For example,
how much fertilizer applied to a particular crop will produce the greatest yield?
Which cylindrical container, made from a given amount of material, has the greatest
volume? How should the wings of an aeroplane be shaped to maximize lift? These
are all optimisation problems.
In mathematical terminology, solving an optimisation problem means finding
the maximum or minimum value of some quantity f (x) defined for given values of x.
We call the values of x for which the function f (x) is defined the domain of f .
We will for example consider problems where f (x) is defined whenever a ≤ x ≤ b
(where the numbers a and b depend on the context). The set of all x satisfying
a ≤ x ≤ b is called the closed interval [a, b]. Note that the set [a, b] contains the
points a, b (which are called the endpoints of the interval) and it is possible that the
maximum or minimum value of f (x) on [a, b] occurs at these endpoints. We will also
consider problems where f (x) is defined whenever a < x < b, e.g. 1 < x < 6 or
−∞ < x < ∞. The set of all x satisfying a < x < b is called the open interval (a, b).
Since the interval (a, b) does not contain the points a, b we do not need to consider
these numbers when searching for maximum and minimum value of f (x) on (a, b).
For example on the interval −1 ≤ x ≤ 6, or [−1, 6], the function f (x) = 2x has
minimum value f (−1) = −2 and maximum value f (6) = 12.

[Figure: graph of f(x) = 2x for −1 ≤ x ≤ 6, with the minimum marked at x = −1 and the maximum marked at x = 6.]

On the interval −1 < x < 6, or (−1, 6), the function f (x) = 2x does not attain
a minimum or maximum value.

[Figure: graph of f(x) = 2x for −1 < x < 6.]

On the interval −1 < x ≤ 6, which we also write as (−1, 6], the function
f (x) = 2x does not attain a minimum value but has the maximum value f (6) = 12.

[Figure: graph of f(x) = 2x for −1 < x ≤ 6, with the maximum marked at x = 6.]

On the interval −∞ < x < ∞, or (−∞, ∞), the function f (x) = 2x does not
attain a maximum or minimum value.

[Figure: graph of f(x) = 2x for −∞ < x < ∞.]

We shall always assume that the function is differentiable at all points in [a, b],
with (possibly) a finite number of exceptions, and continuous everywhere. The points
at which f (x) is not differentiable are those at which the graph of f (x) either has no
tangent, or has a vertical tangent. When searching for minimum or maximum values
of a function y = f (x), we must check the values of the function at any endpoints
which are included in its domain. To find the candidate points at which the maximum
or minimum value may occur, we need to consider the derivative of the function.
2.1.1 The Derivative
Suppose that we have an equation y = f (x) that describes the relationship between
x and y, and we wish to know the maximum and minimum values of y, if such exist.
In order for y to reach a maximum value at a point other than an endpoint, it must
increase to that value and then decrease. When a function is increasing its gradient
is positive, and when it is decreasing its gradient is negative. The gradient is given by
the first derivative, $\frac{dy}{dx} = f'(x)$. We see that the first derivative must move from
positive to negative as we pass through a maximum value of the function. As $\frac{dy}{dx}$ changes
from positive to negative it must either equal zero or become undefined. Similarly,
as we pass through a minimum at a point other than an endpoint the derivative
changes from negative to positive; so it must either equal zero or be undefined at the
minimum point.
Values of x for which the first derivative of y = f (x) equals zero or fails to exist
are called critical numbers of the function. We can find all the numbers in the domain
of f (x) at which the function might have a maximum or a minimum value by finding
these critical numbers together with any endpoints contained in the domain. If x0
is a critical number then the corresponding point (x0 , y0 ) on the graph is called a
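critical point. The diagram illustrates maximum and minimum critical points with
$\frac{dy}{dx} = 0$.
[Figure: a curve with a maximum point, where dy/dx > 0 on the left and dy/dx < 0 on the right, and a minimum point, where dy/dx < 0 on the left and dy/dx > 0 on the right.]
We shall consider the case when $\frac{dy}{dx}$ is undefined later.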
Once a critical number has been found, one can determine whether it corresponds
to a maximum or minimum, or neither, by investigating the sign of $\frac{dy}{dx}$ at x values
close to the critical number. Suppose that $x = x_0$ is a critical number. If it is
found that $\frac{dy}{dx} < 0$ for all x values a little less than $x_0$ and $\frac{dy}{dx} > 0$ for all x values a
little greater than $x_0$ then the critical point is a local minimum (meaning that it is a
minimum if we consider only x values close enough to $x_0$). If it is found that $\frac{dy}{dx} > 0$
for all x values a little less than $x_0$ and $\frac{dy}{dx} < 0$ for all x values a little greater than
$x_0$ then the critical point is a local maximum. If neither of these circumstances arises
– which is quite possible – then the critical point is neither a local maximum nor a
local minimum.
Example 2.1
Let $f(x) = 2x^2 - 4x - 1$. (When a function is defined for all values of x then unless
specified otherwise we take its domain to be −∞ < x < ∞.) Then $f'(x) = 4x - 4$.
Solving 4x − 4 = 0 gives x = 1; so the only critical number is x = 1. Since f (1) = −3
the corresponding critical point is (1, −3). Now since 4x − 4 < 0 when x < 1 and
4x − 4 > 0 when x > 1 it follows that the critical point (1, −3) is a local minimum.
Since f (x) has no other critical points, this is also a global minimum. There are no
endpoints to consider.
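
The sign test used in this example can be imitated numerically by sampling the derivative a little to the left and a little to the right of the critical number. The following Python sketch is only a rough illustration: the function name and the step size are assumptions, and sampling a single point on each side is no substitute for knowing the sign of the derivative on a whole interval.

```python
def classify_critical_number(derivative, x0, step=1e-3):
    """Classify x0 by the sign of the derivative just to its left and right."""
    left = derivative(x0 - step)
    right = derivative(x0 + step)
    if left < 0 < right:
        return "local minimum"   # derivative changes from negative to positive
    if left > 0 > right:
        return "local maximum"   # derivative changes from positive to negative
    return "neither"


def f_prime(x):
    """Derivative of f(x) = 2x^2 - 4x - 1."""
    return 4 * x - 4


print(classify_critical_number(f_prime, 1))
```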

At an endpoint we can also use the derivative of the function to test whether
the endpoint is a local maximum or local minimum of the function. It works exactly
the same as for a critical number, except that one only has to consider values of x
that actually lie in the domain of the function. This is best demonstrated by an
example.

Example 2.2
Let $f(x) = 2x^2 - 4x - 1$, for −2 ≤ x ≤ 2. As above, $f'(x) = 4x - 4$ and x = 1
is the only critical number of the function with corresponding critical point (1, −3).
Consider the endpoint x = −2. When x is slightly larger than −2, the derivative
f ′ (x) = 4x − 4 is negative, so the corresponding point (−2, f (−2)) = (−2, 15) is a
local maximum of f (x) on the domain [−2, 2]. We do not have to consider here what
happens when x is slightly less than −2 because such points do not lie in the given
domain [−2, 2].
Consider the end point x = 2. When x is slightly less than 2, the derivative

f (x) = 4x − 4 is positive, so the corresponding point (2, f (2)) = (2, −1) is a local
maximum of f (x) on the domain −2 ≤ x ≤ 2.
It is a very useful fact that a continuous function defined on a closed
interval [a, b] will always achieve a global maximum value and a global
minimum value. These can occur only at critical points, points where the derivative
is not defined, or endpoints. A global maximum is in particular a local maximum, so
since we have just two local maximum points the global maximum point is the one
of these where f (x) is largest, namely (−2, 15). We have only one local minimum
point, so this is also the global minimum point.
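As a rough check of this reasoning, one can evaluate f at each candidate (the critical number and the two endpoints) and compare the values. The helper name `extrema_on_closed_interval` is an assumption introduced for this sketch; the candidate list is taken from Example 2.2.

```python
def extrema_on_closed_interval(f, candidates):
    """Return ((x_max, f_max), (x_min, f_min)) over the candidate x values.

    The candidates must already include every critical number in the
    interval and both endpoints; this helper only compares values.
    """
    values = [(x, f(x)) for x in candidates]
    global_max = max(values, key=lambda pair: pair[1])
    global_min = min(values, key=lambda pair: pair[1])
    return global_max, global_min


def f(x):
    return 2 * x**2 - 4 * x - 1


# Endpoints -2 and 2, and the critical number 1, on the domain [-2, 2].
global_max, global_min = extrema_on_closed_interval(f, [-2, 1, 2])
print(global_max, global_min)
```

The comparison is valid only because the function is continuous on a closed interval; on an open or infinite interval the largest candidate value need not be a global maximum.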

[Figure: the graph of \(y = 2x^2 - 4x - 1\) for −2 ≤ x ≤ 2, showing the global maximum at (−2, 15), the local maximum at (2, −1) and the global minimum at (1, −3).]

There is an alternative test one can use as a means of determining whether a
point where \(dy/dx = 0\) is a local maximum or a local minimum. This test involves the
second derivative, \(d^2y/dx^2\) (the derivative of \(dy/dx\)).† As we noted above, if the function has
a local maximum at \(x = x_0\) then the derivative is positive when x is just less than \(x_0\)
and negative when x is just greater than \(x_0\). Thus the derivative is decreasing as we
pass through \(x = x_0\). This suggests that the derivative of the derivative should be
negative at \(x = x_0\). Similarly, at a local minimum the derivative must be increasing,
and the second derivative should be positive.
The second derivative test has some drawbacks. It is true that if the second
derivative is negative at a place where the first derivative vanishes then the critical
point is a local maximum, and it is true that if the second derivative is positive at
a place where the first derivative vanishes then the critical point is a local minimum.

† If \(y = f(x)\) then we also use the notation \(f''(x)\) for the second derivative \(d^2y/dx^2\).

However it is not true that the second derivative must be negative at a local maximum
and positive at a local minimum. It could turn out that the second derivative is zero,
and in this case the test is inconclusive. We could have a local maximum, e.g.
\(f(x) = -x^4\) at (0, 0); we could have a local minimum, e.g. \(f(x) = x^4\) at (0, 0); or
we could have a critical point that is neither a local maximum nor a local minimum,
e.g. \(f(x) = x^3\) at (0, 0).

[Figures: the graphs of \(y = -x^4\) (local maximum at the origin), \(y = x^4\) (local minimum at the origin) and \(y = x^3\) (neither).]

Similarly, if the second derivative fails to exist at the critical number then there
is no information to be had from the second derivative test.
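The three outcomes of the second derivative test can be illustrated with a short sketch. The function name `second_derivative_test` is an assumption made here, and the second derivatives are supplied by hand rather than computed.

```python
def second_derivative_test(d2f, x0):
    """Apply the second derivative test at a point x0 where f'(x0) = 0.

    d2f is the second derivative as a Python function. The test says
    nothing when the second derivative is zero at x0.
    """
    value = d2f(x0)
    if value < 0:
        return "local maximum"
    if value > 0:
        return "local minimum"
    return "inconclusive"


# f(x) = 2x^2 - 4x - 1: f''(x) = 4, so the critical point at x = 1 is a minimum.
print(second_derivative_test(lambda x: 4, 1))
# f(x) = -x^4, x^4 and x^3 all have f''(0) = 0, so the test cannot decide.
print(second_derivative_test(lambda x: -12 * x**2, 0))
print(second_derivative_test(lambda x: 12 * x**2, 0))
print(second_derivative_test(lambda x: 6 * x, 0))
```

In the inconclusive cases one falls back on the sign of the first derivative on either side of the critical number.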
Another drawback with the second derivative test is simply that calculating the
value of the second derivative at the critical number is frequently much harder than
calculating the first derivative for values of x near the critical number. However,
whether or not this is really a problem varies from case to case, and it is undeniable
that the second derivative test is often simple and successful. In the example we
considered above (namely \(f(x) = 2x^2 - 4x - 1\)) we found that \(f'(x) = 4x - 4\), and it
follows at once that \(f''(1) = 4 > 0\). So the critical point (1, −3) is a local minimum.
If we are given the formula for the function f that we are trying to maximize or
minimize, then finding the points at which the derivative is zero or undefined should
be a straightforward task. When we are faced with a practical problem, however,
the necessary equations will not generally be provided for us. In such cases we have
to construct the equations for ourselves. That is, we try to find a “mathematical
model” that accurately represents the situation.

Example 2.3
The manager of a hotel in San Juan with 200 rooms finds that if the cost of a room
is $50 or less per night, then the hotel will generally be full. She also knows that, on
average, for every dollar the cost is increased above $50, another two rooms remain
empty. The daily cost of running the hotel (for staff etc.) is $5000, independent of
the number of rooms occupied. How much should the manager charge in order to
maximize profit?
There are four steps involved in the solution of this problem, and others like it.
Step 1: Identify the variables involved. We have the cost of the room (let us call it
$ x), the number of rooms occupied (call it N ), and the profit (call it $ P ). (The $5000
daily cost of running the hotel and the number of rooms in the hotel are constants
and do not need to be given names.)
Step 2: Determine the independent variables (those whose values we are free to
choose), and determine the variable to be optimised. In the present example the
independent variable is x, and the aim is to find the value of x that maximizes P .
Note that the value of the variable N depends upon the value of x, and so it will be
possible to eliminate N from the formula for P .
Step 3: Write down equations that describe all the known relationships between the
variables.
In the present example, we have first of all that

Profit = No. rooms occupied × cost of a room − $5000.

That is,
P = N x − 5000.

Next, we know that a relationship exists between N , the number of rooms occupied,
and x, the cost of a room. Since the hotel is generally full if x ≤ 50, the manager
may as well charge at least $50 per room. So we shall restrict our attention to values
of x greater than or equal to 50. Since N is decreased by two for every dollar by
which x exceeds 50,
N = 200 − 2 × (x − 50)
= 300 − 2x.

(Note that N = 0 when x = 150. The manager will certainly not want to charge as
much as $150 for a room.)
Now
\[
\begin{aligned}
P &= Nx - 5000 \\
  &= (300 - 2x)x - 5000 \\
  &= -2x^2 + 300x - 5000.
\end{aligned}
\]
Step 4: Once we have an expression for the variable to be optimised in terms of
the independent variables, we can differentiate the expression and find the critical
numbers. In the present problem there is only one independent variable, x. Problems
with two or more independent variables will be considered in the next section.
The formula \(P = -2x^2 + 300x - 5000\) gives
\[
\frac{dP}{dx} = -4x + 300.
\]
So \(dP/dx = 0\) when 4x = 300. So x = 75 is the only critical number, and since
\(d^2P/dx^2 = -4 < 0\) for all x, the critical number gives a maximum value for P. Thus the
manager should charge $75 per night for a room, thereby making a daily profit of

$((300 − 2 × 75) × 75 − 5000) = $6250.
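The model built in Steps 1 to 4 can be checked by brute force. The sketch below assumes whole-dollar prices between $50 and $150 (an assumption made only for this check) and uses the hypothetical helper names `rooms_occupied` and `profit`.

```python
def rooms_occupied(price):
    """N = 300 - 2x: two rooms fewer for every dollar charged above $50."""
    return 300 - 2 * price


def profit(price):
    """P = N x - 5000: nightly takings minus the fixed daily running cost."""
    return rooms_occupied(price) * price - 5000


# Try every whole-dollar price in the sensible range and keep the best one.
best_price = max(range(50, 151), key=profit)
print(best_price, profit(best_price))
```

The search agrees with the calculus: the profit is largest at the critical number x = 75, and the endpoint prices $50 and $150 both do worse.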

2.1.2 Global and Local Maxima and Minima


In the above example it would make no sense to consider values for x other than
those between 50 and 150. We call this set of values, 50 ≤ x ≤ 150, the domain
of the function. Since the function is not defined for x values outside the domain,
the graph of the function does not extend beyond x = 50 and x = 150. The graph
is as shown in the diagram.

[Figure: The graph of \(P = -2x^2 + 300x - 5000\) for 50 ≤ x ≤ 150.]

It is clear from the graph that the value of P at the
turning point is a global maximum for this function: there is no point in the domain
where the function achieves a greater value. It is not always the case, however, that a
turning point at which the second derivative is negative must be a global maximum,
rather than merely a local maximum. To demonstrate this we consider the function
\(f(t) = t^2 e^t\).
Differentiating \(y = t^2 e^t\) gives
\[
\begin{aligned}
\frac{dy}{dt} &= t^2 e^t + e^t \cdot 2t \\
              &= t e^t (t + 2),
\end{aligned}
\]
and this takes the value zero at t = 0 and at t = −2. (It does not take the value
zero anywhere else, since \(e^t\) never equals zero. And since the derivative exists for all
values of t, it follows that 0 and −2 are the only critical numbers.) Observe that
y = 0 at t = 0, while at t = −2 we find that \(y = 4e^{-2} \approx 0.5\). Now to determine the
nature of the critical points (0, 0) and \((-2, 4e^{-2})\) we look at the sign of \(dy/dt\) on either
side of t = 0 and t = −2. It often helps to display the results in a sign diagram for
\(dy/dt\), as follows.

          t < −2    t = −2    −2 < t < 0    t = 0    t > 0
dy/dt     +ve       0         −ve           0        +ve
slope     ↗         →         ↘             →        ↗

Table 1

Thus the graph slopes up to a local maximum at \((-2, 4e^{-2})\), then down to a local
minimum at t = 0, then up again after that.
We note also that \(y = t^2 e^t \ge 0\) for all t. For negative values of t with |t| large,
\(e^t\) is so close to zero that \(t^2 e^t\) is also close to zero. For t large and positive, y is also
large and positive. We now have enough information at our disposal to be able to
sketch the graph.

[Figure: the graph of \(y = t^2 e^t\).]

It is immediately clear that for large positive t the function takes
on values much bigger than \(4e^{-2}\); so \((-2, 4e^{-2})\) is not a global maximum, but only a
local maximum. The function value at t = −2 is bigger than the values immediately
around, but by no means the biggest value the function attains. On the other hand,
the point (0, 0) is a global minimum, since \(t^2 e^t \ge 0\) for all t. (This function does not
have a global maximum.)
Let us now modify the above example by restricting the domain of the function
to the interval [−3, 1]. The graph now stops at t = −3 and t = 1.
As before, the function has a local maximum of \(4e^{-2}\) at t = −2 and a global
minimum of 0 at t = 0. But it now has a global maximum of e at t = 1, and a
local minimum of \(9e^{-3} \approx 0.45\) at t = −3. A continuous function whose domain
is a closed interval [a, b] will always have global maximum and minimum
values. These values occur either at critical numbers in the interior of the interval
or at the end-points (a and b).
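A quick numerical check of these values: the sketch below evaluates \(y = t^2 e^t\) at the two critical numbers and the two endpoints of [−3, 1] and picks out the largest and smallest. The variable names are illustrative only.

```python
import math


def y(t):
    """The function y = t^2 e^t studied in the text."""
    return t**2 * math.exp(t)


# Endpoints -3 and 1, together with the critical numbers -2 and 0.
candidates = [-3, -2, 0, 1]
values = {t: y(t) for t in candidates}

t_max = max(values, key=values.get)  # where the global maximum occurs
t_min = min(values, key=values.get)  # where the global minimum occurs

for t in candidates:
    print(t, round(values[t], 3))
print("global max at t =", t_max, "; global min at t =", t_min)
```

Comparing only these four values is justified because the function is continuous on the closed interval [−3, 1].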

[Figure: the graph of \(y = t^2 e^t\) on the interval [−3, 1], with the value e marked on the y-axis.]

It is also possible to consider functions defined on intervals of the form a < x < b,
called open intervals, where the end-points are excluded. Such functions need not
have global maxima and minima.
2.1.3 Maxima and minima when the derivative is undefined
The maximum and minimum points we have encountered so far have all occurred
at points where the first derivative is zero (that is, at points where the curve has a
horizontal tangent). We now turn to cases in which the first derivative is undefined.
As a first example, consider the absolute value function, y = |x|, defined by the
formula
\[
|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}
\]
(This formula confuses some students, but it is correct. For example, when x = −2
it gives |x| = −(−2) = +2, just as it should.) The graph is easy to draw, and shows
that the function has a minimum value of 0 at x = 0. The gradient is −1 for x < 0
and +1 for x > 0, but is undefined at x = 0.† (The function itself is certainly defined
at x = 0; indeed, |0| = 0.)

[Figure: the graph of y = |x|.]

Turning now to the general situation, assume (as always) that f is a function
whose derivative exists at all but possibly a finite number of points in its domain.
Suppose that x0 is a point in the domain (so that f (x0 ) is defined) for which f ′ (x0 )
is undefined. Then the graph of the function will come to a sharp point (technically
called a cusp) at (x0 , f (x0 )), and frequently this will be a local maximum or minimum.
As for critical numbers with f ′ (x) = 0, the determining feature of a maximum is that
f ′ (x) changes from positive to negative as x increases through the value x0 . Similarly,
a minimum is characterized by f ′ (x) changing from negative to positive at the critical
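number. The diagrams illustrate a cusp that is a local maximum, a cusp that is a
local minimum, and a cusp that is neither a local maximum nor a local minimum.

[Figure: three cusps, each with \(dy/dx\) undefined at the cusp: a local maximum (\(dy/dx > 0\) on the left, \(dy/dx < 0\) on the right), a local minimum (\(dy/dx < 0\) on the left, \(dy/dx > 0\) on the right), and a cusp that is neither (\(dy/dx > 0\) on both sides).]
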
Note that it makes no sense to even ask whether there is a local maximum or
minimum at an x value for which f (x) is undefined. Such a value of x is not even in
the domain of the function. If f (x0 ) does not exist, f cannot have a maximum or a
minimum at x = x0 : it does not have any value at x0 . For example x = 0 is not in
the domain of the function f (x) = 1/x.

Example 2.4
The first derivative of \(y = (x - 1)^{2/5}\) is
\[
\begin{aligned}
\frac{dy}{dx} &= \frac{2}{5}(x - 1)^{-3/5} \\
              &= \frac{2}{5(x - 1)^{3/5}},
\end{aligned}
\]
which is undefined at x = 1. There are no values of x that make \(dy/dx\) zero; so 1 is the
only critical number. The corresponding critical point is (1, 0). The sign diagram for
the slope is as shown.

          x < 1    x = 1        x > 1
dy/dx     −ve      undefined    +ve
slope     ↘        undefined    ↗

Table 2

† The graph is not smooth at the origin; so there is no tangent at that point.

We can now draw the graph.

[Figure: the graph of \(y = (x - 1)^{2/5}\), with a cusp at the critical point (1, 0).]

A Summary
(i) Local maximum and minimum points of f(x) can occur only at critical points
(that is, at points where \(f'(x)\) is zero or is undefined) or at endpoints of the
domain of f. Note that a critical point or an endpoint need not be either a local
maximum or a local minimum.
(ii) At a local maximum that is not an endpoint \(f'(x)\) changes from positive to
negative as x increases, and \(f''(x) \le 0\) (if it exists).
(iii) At a local minimum that is not an endpoint \(f'(x)\) changes from negative to
positive as x increases, and \(f''(x) \ge 0\) (if it exists).
(iv) The global maximum and minimum values, if they exist, are included among
the local maxima and minima. A function need not have a global maximum or
minimum if its domain is an infinite interval or if one of the endpoints of the
interval is not part of the domain.†
2.1.4 Concavity and points of inflection
Let x = c be a point in the domain of a function f at which the derivative f ′ (c) is
defined. The tangent to the graph at the point (c, f (c)) can then be drawn. If the
graph of f lies above the tangent line for all points close to (c, f (c)) then we say
that the graph is concave upward at this point. Similarly, if the graph lies below
the tangent line at all points close to (c, f (c)) then the graph is said to be concave
downward at this point. The diagrams illustrate these concepts.

[Figure: two graphs of f(x) with the tangent drawn at (c, f(c)): one concave upward (graph above the tangent) and one concave downward (graph below the tangent).]


The requirement that the graph near (c, f (c)) remains on one side of the tangent
means that it must bend away from the tangent.

† For example, if the domain of f consists of all x with 0 < x ≤ 1, with x = 0 excluded,
then f need not have a global maximum. Indeed, f (x) = 1/x does not have a maximum
value on this domain. Nor does f (x) = 1 − x.
If we imagine moving along a portion of the graph for which the concavity is
upward, we will find that the slope of the graph increases as x increases. That is, the
first derivative of f is an increasing function when f is concave upward. Conversely,
f is concave upward when f ′ is increasing. Similarly, f is concave downward on
intervals where f ′ is a decreasing function. Now recall that a function must be
increasing at points where its derivative is positive. So at points where the second
derivative of f is positive the first derivative of f must be an increasing function,
and the graph of f must be concave upward. Similarly, when the second derivative
is negative the graph is concave downward.
A point at which the concavity changes from upward to downward (or vice versa)
is called a point of inflection. Since the sign of the second derivative changes as we
pass through a point of inflection, at the point of inflection itself the second derivative
must either be zero or undefined. One can see (roughly) where the concavity of a
curve changes, but it is not easy to tell visually whether \(d^2y/dx^2\) is zero or undefined.

[Figure: two graphs with points of inflection. Left: \(f(x) = x^3 - 3x^2 + 2x + 0.5\), concave downward where \(d^2y/dx^2 < 0\) and concave upward where \(d^2y/dx^2 > 0\), with \(d^2y/dx^2 = 0\) at the point of inflection. Right: \(f(x) = 0.5 - x^{5/3}\), concave upward to the left and concave downward to the right of the point of inflection \((c, f(c)) = (0, 0.5)\), where the tangent is horizontal and \(d^2y/dx^2\) is undefined.]

Observe that although the graph and its tangent line at the point (c, f (c)) have
the same slope, as they must by the definition of “tangent”, the fact that (c, f (c)) is
a point of inflection means that the graph actually crosses the tangent at (c, f (c)).
As we move away from this point the slope of the graph changes, and it bends away from
the line.
[Figure: the graphs of \(y = x^3\) and \(y = x^4\). Both satisfy \(dy/dx = d^2y/dx^2 = 0\) at x = 0. In both cases the x-axis is the tangent to the graph at the origin. For \(y = x^3\) the graph crosses its tangent at the origin, indicating a point of inflection. The graph of \(y = x^4\) is concave upward at the origin, indicating a local minimum.]

It is helpful to locate points of inflection when sketching graphs, since changes
in concavity are significant aspects of a curve's shape.

Critical points and concavity


Note that if f ′ (c) = 0 and f is concave upward at (c, f (c)) then the critical point
(c, f (c)) is a local minimum of f . Similarly, if f is concave downward at a critical
point then the point is a local maximum. However, if f ′ (c) = 0 and (c, f (c)) is a point
of inflection (as well as being a critical point) then it is neither a local maximum nor
a local minimum.
This is because the tangent at (c, f(c)) must be horizontal (since \(f'(c) = 0\)) and
the graph must cross the tangent at (c, f(c)) (since we have a point of inflection);
so f(x) > f(c) on one side of x = c and f(x) < f(c) on the other side.
The simplest example of this is the graph of \(y = x^3\) at the origin. Both \(dy/dx\) and
\(d^2y/dx^2\) are zero at x = 0.
Since y > 0 when x > 0 and y < 0 when x < 0 we see that (0, 0) is neither a
local maximum nor a local minimum. Since \(d^2y/dx^2 = 6x\) changes sign at x = 0, the
origin is a point of inflection as well as a critical point. By contrast, \(y = x^4\) also has
the property that \(dy/dx\) and \(d^2y/dx^2\) both vanish at x = 0; however, in this case
(0, 0) is a local minimum. This example shows
that the condition \(d^2y/dx^2 = 0\) is no guarantee that we have a point of inflection.
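The contrast between \(y = x^3\) and \(y = x^4\) can be sketched by testing whether the second derivative changes sign across the point. The name `changes_concavity` and the offset `h` are assumptions made for this illustration, and a single sample on each side is a heuristic only.

```python
def changes_concavity(d2f, c, h=1e-3):
    """Return True if the second derivative has opposite signs either side of c.

    d2f is the second derivative as a Python function; h is a hypothetical
    small offset. Opposite signs mean the concavity changes at c.
    """
    return d2f(c - h) * d2f(c + h) < 0


# y = x^3 has second derivative 6x, which changes sign at x = 0.
print(changes_concavity(lambda x: 6 * x, 0))
# y = x^4 has second derivative 12x^2, which is zero at 0 but never negative.
print(changes_concavity(lambda x: 12 * x**2, 0))
```

Both second derivatives vanish at x = 0, yet only the first one changes sign, which is what distinguishes a point of inflection from the local minimum of \(y = x^4\).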

Example 2.5
We sketch the graph of \(y = x^{2/3}(x - 5)\), and determine the maximum and minimum
values of y on the interval [−1, 8].
As a first step, we find the x- and y-intercepts. Note that \(x^{2/3}\) is the cube root
of \(x^2\), which is zero at x = 0 and positive elsewhere. Thus \(x^{2/3}(x - 5)\) has the same
sign as x − 5, for all nonzero values of x. So y < 0 for x < 0 and for 0 < x < 5, while
y > 0 for x > 5. And y = 0 at x = 0 and at x = 5.
Observe also that y has extremely large magnitude when x has extremely large
magnitude, y having the same sign as x. In mathematical notation, y → ∞ as x → ∞
and y → −∞ as x → −∞.
We now differentiate y, so that we can find the critical points.

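\[
\begin{aligned}
\frac{dy}{dx} &= x^{2/3} + (x - 5)\,\frac{2}{3}x^{-1/3} \\
              &= x^{2/3} + \frac{2(x - 5)}{3x^{1/3}} \\
              &= \frac{3x + 2(x - 5)}{3x^{1/3}} \\
              &= \frac{5x - 10}{3x^{1/3}} \\
              &= \frac{5}{3}\cdot\frac{x - 2}{x^{1/3}}.
\end{aligned}
\]
So \(dy/dx\) is 0 when x = 2 and is undefined when x = 0. There are therefore two critical
points: (0, 0) and \((2, -3\sqrt[3]{4}) \approx (2, -4.8)\).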
We had already found that y = 0 at x = 0 and that y < 0 for nearby values of
x on either side of x = 0. This shows that (0, 0) is a local maximum, and it might
have led us to expect that the derivative would be zero at x = 0. In fact, there is
a cusp at x = 0, as can be seen by inspecting the values of the derivative for points
close to zero. If x is small and positive then \(x^{1/3}\) (the cube root of x) is also small
and positive, whereas x − 2 is close to −2. So \((x - 2)/x^{1/3}\) is a negative number of large
magnitude. On the other hand, when x is negative and close to zero, \(x^{1/3}\) is also
negative and close to zero, while x − 2 is still close to −2; so in this case \((x - 2)/x^{1/3}\) is
positive and of large magnitude. So we have an extremely sharp cusp at the origin,
the graph being nearly vertical on either side of the cusp.
Since y = 0 at x = 0 and x = 5, and y < 0 for 0 < x < 5, it is clear that the
critical point \((2, -3\sqrt[3]{4})\) must be a local minimum. At this stage we know enough
to be able to draw the graph reasonably well. It only remains to determine the
concavity of the various sections of the graph, so that we can really represent its
shape accurately. To determine the concavity and the points of inflection, we need
to differentiate again. The following calculation only applies for x ≠ 0, since \(dy/dx\) is
not defined at x = 0.
\[
\begin{aligned}
\frac{d^2y}{dx^2} &= \frac{5}{3}\left(\frac{x^{1/3} - (x - 2)\,\frac{1}{3}x^{-2/3}}{x^{2/3}}\right) \\
                  &= \frac{5}{3}\cdot\frac{3x^{1/3} - (x - 2)x^{-2/3}}{3x^{2/3}} \\
                  &= \frac{5}{3}\cdot\frac{3x - (x - 2)}{3x^{4/3}} \\
                  &= \frac{5}{9}\cdot\frac{2x + 2}{x^{4/3}} \\
                  &= \frac{10}{9}\cdot\frac{x + 1}{x^{4/3}}.
\end{aligned}
\]

(In the first line we applied the quotient rule for differentiation. In the third
line we multiplied both the numerator and the denominator by \(x^{2/3}\).)
So \(d^2y/dx^2 = 0\) when x = −1. (The value of y at this point is \((-1)^{2/3}(-1 - 5) = -6\).)
Since \(x^{4/3}\) is never negative we see that \(d^2y/dx^2 < 0\) when x < −1, and \(d^2y/dx^2 > 0\) for x > −1
(excluding x = 0). So the concavity is downward for x < −1, the point (−1, −6) is a
point of inflection, and the concavity is upward for −1 < x < 0 and for x > 0.

             x < −1     x = −1       −1 < x < 0    x = 0     0 < x < 2       x = 2       x > 2
dy/dx        +ve        +ve          +ve           undef.    −ve             0           +ve
slope        ↗          ↗            ↗             ↑↓        ↘               →           ↗
y            y < −6     y = −6       −6 < y < 0    y = 0     0 > y > −3∛4    y = −3∛4    −3∛4 < y
d²y/dx²      −ve        0            +ve           undef.    +ve             +ve         +ve
concavity    downward   inflection   upward        undef.    upward          upward      upward

Table 3

Finally, after calculating that y ≈ −22.7 at x = −4 and y ≈ 3.3 at x = 6, we are able
to draw the graph quite accurately. (In order to exaggerate the change in concavity
at (−1, −6) we have used different scales for x and y.)
Our other task is to find the maximum and minimum values of y for −1 ≤ x ≤ 8.
From the graph it is clear that the local maxima occur when x = 0 and when
x = 8, and the local minima occur when x = −1 and when x = 2. Furthermore, we
have already calculated y at x = 2 and at x = −1, and we know that the value at −1
is less than the value at 2. So the point (−1, −6) gives the global minimum value for
y when x is restricted to the interval [−1, 8]. Similarly, the global maximum clearly
occurs at x = 8 rather than x = 0. In fact y = 12 at x = 8; so the point (8, 12) gives
the global maximum y value.
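These values can be confirmed numerically. In the sketch below \(x^{2/3}\) is computed as the real cube root of \(x^2\), as in the text, so that negative x values are handled; this choice, and the variable names, are assumptions of the illustration.

```python
def y(x):
    """y = x^(2/3) (x - 5), with x^(2/3) taken as the real cube root of x^2."""
    return (x * x) ** (1 / 3) * (x - 5)


# Endpoints -1 and 8, together with the critical numbers 0 and 2.
candidates = [-1, 0, 2, 8]
values = {x: y(x) for x in candidates}

x_max = max(values, key=values.get)  # location of the global maximum
x_min = min(values, key=values.get)  # location of the global minimum

for x in candidates:
    print(x, round(values[x], 3))
print("global max at x =", x_max, "; global min at x =", x_min)
```

Because the cube root is computed in floating point, the printed values are approximations; for instance the value at x = 8 may differ from 12 in the last decimal places before rounding.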
[Figure: The graph of \(y = x^{2/3}(x - 5)\), with the point of inflection at (−1, −6) marked. The line y = 5x − 1 (dotted) is the tangent at (−1, −6); it crosses the graph at (−1, −6).]

Example 2.6
Suppose that a tank with a rectangular base, straight sides and no top is to have a
volume of 4 cubic metres. The width of the base is to be 1 metre. Suppose also that
material for the base costs $10 per square metre, while material for the sides costs
$5 per square metre. Our task is to find the least possible total cost of materials.
The first step is to draw a diagram (see below) and identify the variables involved.
The volume and width of the base are constants. The variables are the length of the
base (call it ℓ), the height of the tank (call it h) and the cost (call it $C). The aim
is to find the global minimum value of C.

[Figure: an open-topped rectangular tank with width = 1 m.]

Next, write down the relationships between the variables. Since

Volume = length × width × height

and the volume is required to be 4, we have

4 = ℓ × 1 × h.
That is, ℓh = 4. This enables us to eliminate one of these two variables, either ℓ or
h, by expressing it in terms of the other. It does not matter which one we choose.
So let us use
h = 4/ℓ
to eliminate h whenever it arises. (As it happens, it would simplify the calculations
a little to eliminate ℓ instead. We leave it to the student to redo the problem in this
alternative fashion.)
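The cost is given by
\[
\begin{aligned}
C &= 10 \times \text{Area of base} + 5 \times \text{Area of sides} \\
  &= 10 \times \ell \times 1 + 5 \times (\ell \times h + \ell \times h + 1 \times h + 1 \times h) \\
  &= 10\ell + 5\left(4 + 4 + \frac{4}{\ell} + \frac{4}{\ell}\right) \\
  &= 10\ell + \frac{40}{\ell} + 40.
\end{aligned}
\]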
So we now have an expression for C in terms of one variable only (namely ℓ). Note
that the physical constraints of the problem mean that ℓ > 0.
We now differentiate the expression for C, and find the critical points.
\[
\frac{dC}{d\ell} = 10 - \frac{40}{\ell^2}.
\]
So dC/dℓ = 0 when \(10 = 40/\ell^2\); that is, when ℓ = 2 (since ℓ > 0). The derivative is undefined when
ℓ = 0, but this is not in the domain of the function. Since dC/dℓ is negative for
0 < ℓ < 2 and positive for 2 < ℓ we see that C is decreasing on the interval 0 < ℓ < 2
and increasing on the interval 2 < ℓ. Hence it has a global minimum at ℓ = 2. (One
could also check that the graph of C is concave upward for all ℓ > 0, by showing that
d2 C/dℓ2 > 0. This is quite straightforward.)
The minimum cost is found by substituting ℓ = 2 into \(C = 10\ell + 40/\ell + 40\); so we
find that
minimum = 20 + 20 + 40 = 80.
The least expensive possible tank therefore costs $80. The student might find it
useful to evaluate C for some other values of ℓ, such as ℓ = 3 and ℓ = 1.5, and
confirm that the values obtained are greater than 80.
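The suggested check is easy to carry out by machine. The sketch below evaluates the cost formula at a few lengths and also scans a grid of lengths from 0.01 m to 10 m; the grid and the function name `cost` are assumptions made for this illustration.

```python
def cost(length):
    """C = 10*l + 40/l + 40, the cost in dollars for base length l > 0 (metres)."""
    return 10 * length + 40 / length + 40


# Values suggested in the text, alongside the claimed minimiser l = 2.
for length in (1.5, 2, 3):
    print(length, round(cost(length), 2))

# A coarse grid search over 0.01, 0.02, ..., 10.00 metres.
lengths = [k / 100 for k in range(1, 1001)]
best_length = min(lengths, key=cost)
print("cheapest length on the grid:", best_length)
```

A grid search can only suggest where the minimum lies; it is the sign of dC/dℓ on either side of ℓ = 2 that shows the minimum is global for all ℓ > 0.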

2.1.5 Absolute and relative growth rates


If y(t) denotes the size of an organism at time t then we call \(dy/dt\) the absolute growth
rate, or AGR, of y. When comparing the progress of organisms of different sizes, say
an apple tree and a bonsai apple plant, the absolute growth rate may not be the most
appropriate quantity to use as we would expect it to be higher for the regular apple
tree simply because it is bigger. An alternative measure of growth is the relative
growth rate, or RGR:
\[
\text{relative growth rate of } y = \frac{1}{y}\frac{dy}{dt} = \frac{\text{absolute growth rate of } y}{y}.
\]
The relative growth rate is the absolute growth rate per unit size, and by using this
measure of growth we put the apple tree and the bonsai version on an equal footing.
Since \(\frac{d}{dt}\bigl(\ln y(t)\bigr) = \frac{1}{y}\frac{dy}{dt}\), we may write
\[
\text{RGR of } y = \frac{d}{dt}\bigl(\ln y(t)\bigr).
\]

Example 2.7
Assume that for the first 30 days the height of a parsley plant is given (in centimetres)
by \(y(t) = \frac{1}{200}t^2\), where time t is measured in days. Calculate the absolute and relative
growth rates of the parsley plant during the first 30 days and find their maxima during
this time, if they exist.
The absolute growth rate is just the derivative:
\[
\text{AGR} = \frac{dy}{dt} = \frac{1}{100}t.
\]
The AGR is defined for all 0 ≤ t ≤ 30, and it is an increasing function, which can be
seen either by graphing it, or by calculating that its derivative is positive throughout
this interval:
\[
\frac{d(\text{AGR})}{dt} = \frac{d^2y}{dt^2} = \frac{1}{100},
\]
a positive number.
Therefore the absolute growth rate has a maximum at t = 30. Its maximum
value is AGR(30) = 0.3 cm per day.
[Figure: graph of AGR against t for 0 ≤ t ≤ 30, rising to 0.3 at t = 30.]

The relative growth rate is
\[
\text{RGR} = \frac{1}{y}\frac{dy}{dt} = \frac{\frac{1}{100}t}{\frac{1}{200}t^2} = \frac{2}{t}.
\]
Notice that this is not defined when t = 0; as t tends towards 0 from the positive side,
RGR tends towards +∞. It is a decreasing function for 0 < t ≤ 30, which can be
seen either by graphing it, or by calculating that its derivative is negative throughout
this interval:
\[
\frac{d(\text{RGR})}{dt} = \frac{d}{dt}\left(\frac{2}{t}\right) = -\frac{2}{t^2},
\]
which is negative for all 0 < t ≤ 30. Since the relative growth rate is decreasing and
is not defined at the beginning value t = 0, it has no maximum.
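The two growth rates for the parsley plant can be tabulated with a few lines of code. The function names `height`, `agr` and `rgr` are assumptions made for this sketch; the formulas are those of Example 2.7.

```python
def height(t):
    """Height in centimetres after t days: y(t) = t^2 / 200."""
    return t**2 / 200


def agr(t):
    """Absolute growth rate dy/dt = t / 100, in cm per day."""
    return t / 100


def rgr(t):
    """Relative growth rate (1/y) dy/dt; undefined at t = 0, where y = 0."""
    return agr(t) / height(t)


for t in (1, 5, 10, 30):
    print(t, agr(t), round(rgr(t), 4))
```

The AGR column increases with t while the RGR column decreases, and calling `rgr(0)` raises a `ZeroDivisionError`, which mirrors the fact that the relative growth rate is not defined at t = 0.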

[Figure: graph of RGR against t for 0 < t ≤ 30.]

Exercises Set 2.1

1. Find any stationary points, and any points of inflection, and hence sketch the
curves.
(a) \(f(x) = x^2 - 10x + 12\)   (b) \(f(x) = 1 + 3x - x^2\)
(c) \(f(x) = x^3 - 3x^2 + 5\)   (d) \(y = 2x^3 - 9x^2 + 12x - 2\)
(e) \(y = x^4 - 4x^3 - 2\)   (f) \(f(x) = x^{1/5}(x + 6)\)

2. Show that \(f(x) = x^3 - 3x^2 + 3x + 7\) has neither a local maximum nor minimum
at x = 1.

3. Show that \(f(x) = x + \frac{1}{x}\) has a local maximum and a local minimum, but its
value at the local maximum is less than its value at the local minimum.

4. Find the global maximum and global minimum points for each of the following
functions in the indicated intervals.
(i) \(f(x) = x^3 - 75x + 1\); −1 ≤ x ≤ 6.
(ii) \(f(x) = \frac{3x - 1}{x + 1}\); 1 ≤ x ≤ 10.

5. Find the values of x for which \(f(x) = \frac{1}{x^2 + 1}\) is (a) increasing; (b) decreasing;
(c) concave upwards; (d) concave downwards. Also find any points of inflection.

6. When a dose d of a certain drug is administered to a patient his blood pressure
drops by an amount p given by \(p = \frac{1}{3}d^2(c - d)\), where c is a constant. Find the
dosage that provides the greatest drop in blood pressure.

7. During a certain influenza epidemic, the proportion of the population in the
lower mainland who are infected is denoted by y(t), where t is the time in weeks
since the start of the epidemic. It is found that \(y(t) = \frac{t}{4 + t^2}\).
(a) For what value of t is y a maximum?
(b) For what values of t is y increasing, and for what values is it decreasing?

8. The yield of fruit from each tree of an apple orchard decreases as the density
at which the trees are planted increases. When there are n trees per acre, the
average number of apples per tree is known to be equal to 900 − 5n, for a
particular variety of apple (where n lies between 30 and 60). What value of n
gives the maximum total yield of apples per acre?

9. An individual suffering from a disease is administered an amount x of a drug.
His probability of being cured is then \(\frac{\sqrt{x}}{3(1 + x)}\). Find the value of x which gives
the maximum probability of being cured.

10. The reaction as a function of time (measured in hours) to two drugs is given by
\[
R_1(t) = te^{-t}, \qquad R_2(t) = te^{-2t^2}.
\]

Which drug has the larger maximum reaction?

11. Find two numbers whose sum is 14 and whose product is maximum.

12. Find two numbers whose sum is 8 and the sum of their squares is a minimum.
13. A farmer wishes to enclose a rectangular paddock using only 200m of fencing.
What is the largest area he can enclose?
14. Repeat exercise 13 for the case in which one side of the paddock makes use of an
existing fence, and only three new sides need to be constructed using the 200m
of available fencing.
15. An open rectangular box is to be made from a piece of cardboard 8cm wide and
15cm long by cutting a square from each corner and bending up the sides. Find
the dimensions of the box of largest volume.
16. A cistern is to be constructed to hold \(324\ \text{m}^3\) of water. The cistern has a square
base and four vertical sides, all made of concrete, and a square top made of
steel. If the steel costs twice as much per unit area as the concrete, determine
the dimensions of the cistern that minimize the total cost of construction.
17. A eucalyptus tree grows according to a logistic curve, with its height at time
t ≥ 0 years given by
\[
h(t) = \frac{60}{1 + e^{5 - t}} \text{ metres.}
\]
(a) If the absolute growth rate has a maximum, find the time at which it
occurs, and the height of the tree at this time.
(b) Calculate the relative growth rate of the tree.

§2.2 Functions with two or more variables

2.2.1 Coordinates in 3 dimensions


In this section we will extend the ideas of the last section to functions with more
than one independent variable. In reality, of course, many quantities depend on two
or more variables. For example, a person’s reaction to a drug depends not only on
the dosage, but also on the person’s body weight, the presence of other drugs in the
body, and so on. The volume of a cylinder is a function of two variables, its radius
and height. We write \(V = f(r, h) = \pi r^2 h\).
The points (x, y) that satisfy the equation y = f (x) (with one independent
variable only) can be plotted in two dimensions, using two axes. Similarly, the points
(x, y, z) that satisfy the equation z = f (x, y) (with two independent variables) can
be plotted in three dimensions, using three axes at right angles to one another. Since
we only have a 2-dimensional page on which to represent 3-dimensional space you
have to imagine the following diagram as having the y- and z-axes in the plane of
the page, and the x-axis pointing straight out from the page (at right angles to it).
[Figure: the x-, y- and z-axes in three dimensions.]
The broken lines represent the negative parts of the axes.

Any two intersecting straight lines determine a plane. In particular, the y- and
z-axes determine a vertical plane, containing all points (0, y, z), with equation x = 0.
Similarly, the x- and y-axes determine the horizontal plane z = 0 and the x- and
z-axes determine the vertical plane y = 0.
An example of a function in two variables is z = 4 − x − 2y. The following points
satisfy this equation: A (4, 0, 0), B (0, 2, 0), C (0, 0, 4), D (1, 1, 2) and E (−1, 3, −1).
These points and the resulting plane are plotted on the following diagram.

Remember that this diagram represents 3-dimensional space. The point E, for example,
can be plotted by moving 1 unit backwards from 0 along the x-axis, 3 units in the
y direction, and then 1 unit down in the (negative) z direction, so that it is situated
one unit below the horizontal plane formed by the x- and y-axes. Similarly, D can
be plotted by moving from the origin 2 units vertically along the z-axis, then 1 unit
in the positive x direction and then 1 unit in the y direction.
We note that the domain of this function z = f (x, y) = 4 − x − 2y is the set
of all ordered pairs (x, y) of real numbers, and that the range is the set of all real
numbers z. All the points (x, y, z) that satisfy the equation z = 4 − x − 2y lie in
a plane determined by any three points that satisfy the equation. In general, the
sets of points (x, y, z) that satisfy an equation z = f (x, y) will form a surface in
3-dimensions. Sketching such a surface is by no means easy—see p444 of Arya and
Lardner, Mathematics for the Biological Sciences, for some examples.
2.2.2 Partial derivatives
In section 2.1 we were able to determine maximum and minimum values of a function
by considering its derivatives. We now extend this idea to functions of more than
one variable, whose derivatives are called partial derivatives. In this section we will
see, by way of examples, how to differentiate such functions.
If z = f(x, y) is determined by two independent variables, x and y, then it has
two partial derivatives, one with respect to x and one with respect to y. They are
often written as $f_x(x, y)$ and $f_y(x, y)$. We also use the notation
\[ \frac{\partial z}{\partial x} = f_x(x, y) \quad\text{and}\quad \frac{\partial z}{\partial y} = f_y(x, y) \]
for the partial derivatives.

To find $\partial z/\partial x$, pretend that y is a constant and differentiate the expression for z in
the usual way. For example, if $z = 25 - x^2 - y^2$ then
\[ \frac{\partial z}{\partial x} = -2x. \]
Note that the derivative of $y^2$ with respect to x is zero, since we are treating y as
a constant. Similarly, the partial derivative of z with respect to y is calculated by
regarding x as a constant, and so
\[ \frac{\partial z}{\partial y} = f_y(x, y) = -2y. \]

Example 2.8
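Calculate $\partial z/\partial x$ and $\partial z/\partial y$ if $z = \sqrt{x^2 - 3y^2}$.

Solution: Keeping y constant, and using the chain rule:
\[
\begin{aligned}
\frac{\partial z}{\partial x} &= \frac{\partial}{\partial x}\left((x^2 - 3y^2)^{1/2}\right) \\
&= \frac{1}{2}(x^2 - 3y^2)^{-1/2} \cdot \frac{\partial}{\partial x}(x^2 - 3y^2) \\
&= \frac{1}{2}(x^2 - 3y^2)^{-1/2} \cdot 2x \\
&= \frac{x}{\sqrt{x^2 - 3y^2}}.
\end{aligned}
\]

Keeping x constant:
\[
\begin{aligned}
\frac{\partial z}{\partial y} &= \frac{\partial}{\partial y}\left((x^2 - 3y^2)^{1/2}\right) \\
&= \frac{1}{2}(x^2 - 3y^2)^{-1/2} \cdot \frac{\partial}{\partial y}(x^2 - 3y^2) \\
&= \frac{1}{2}(x^2 - 3y^2)^{-1/2} \cdot (-6y) \\
&= \frac{-3y}{\sqrt{x^2 - 3y^2}}.
\end{aligned}
\]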
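The two formulas in Example 2.8 can be sanity-checked numerically. The sketch below is not part of the original notes: it approximates each partial derivative by a central difference (varying one variable while holding the other fixed) and compares the result with the formulas above. The test point (2, 1) and the step size `h` are arbitrary choices made for illustration.

```python
import math

def z(x, y):
    # The function from Example 2.8; only defined where x^2 - 3y^2 >= 0.
    return math.sqrt(x**2 - 3 * y**2)

def partial_x(f, x, y, h=1e-6):
    # Central difference in x, holding y fixed.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Central difference in y, holding x fixed.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 2.0, 1.0                 # illustrative test point; z(2, 1) = 1
exact_zx = x0 / z(x0, y0)         # x / sqrt(x^2 - 3y^2)
exact_zy = -3 * y0 / z(x0, y0)    # -3y / sqrt(x^2 - 3y^2)

approx_zx = partial_x(z, x0, y0)
approx_zy = partial_y(z, x0, y0)

print(approx_zx, exact_zx)
print(approx_zy, exact_zy)
```

The approximate and exact values should agree to several decimal places; a large discrepancy would suggest an error in the differentiation.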

Recall that the second derivative of a function of one variable was useful in
determining maximum and minimum values. We shall find that the second-order
partial derivatives have a similarly important role in determining maximum and
minimum values of functions of two variables.
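For the function z = f(x, y) there would seem to be four second-order partial
derivatives, calculated as follows:
(i) Differentiate $\partial z/\partial x$ with respect to x, to get $\partial^2 z/\partial x^2$.
(ii) Differentiate $\partial z/\partial x$ with respect to y, to get $\partial^2 z/\partial y\,\partial x$.
(iii) Differentiate $\partial z/\partial y$ with respect to x, to get $\partial^2 z/\partial x\,\partial y$.
(iv) Differentiate $\partial z/\partial y$ with respect to y, to get $\partial^2 z/\partial y^2$.
It turns out, however, that for any of the functions that we shall consider, the
derivatives $\partial^2 z/\partial x\,\partial y$ and $\partial^2 z/\partial y\,\partial x$ are always equal. So there are three
second-order partial derivatives,
\[ \frac{\partial^2 z}{\partial x^2}, \quad \frac{\partial^2 z}{\partial y^2} \quad\text{and}\quad \frac{\partial^2 z}{\partial x\,\partial y} \left(= \frac{\partial^2 z}{\partial y\,\partial x}\right). \]
Alternatively, we write $f_{xx}(x, y)$, $f_{yy}(x, y)$ and $f_{xy}(x, y) = f_{yx}(x, y)$.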

Example 2.9
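Calculate the second derivatives of the function $z = x^2 y^5$.
Solution: The first derivatives are:
\[ \frac{\partial z}{\partial x} = 2xy^5 \quad\text{and}\quad \frac{\partial z}{\partial y} = 5x^2 y^4. \]

Differentiating each of these again with respect to x (and keeping y constant), gives
\[ \frac{\partial^2 z}{\partial x^2} = 2y^5 \quad\text{and}\quad \frac{\partial^2 z}{\partial x\,\partial y} = 10xy^4. \]

Differentiating $\partial z/\partial y$ again, with respect to y (keeping x constant), we have:
\[ \frac{\partial^2 z}{\partial y^2} = 20x^2 y^3. \]

You should differentiate $\partial z/\partial x$ again with respect to y, and check that $\partial^2 z/\partial x\,\partial y = \partial^2 z/\partial y\,\partial x$.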
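For reference, the check suggested at the end of Example 2.9 can be written out in one line, using only the first derivatives already found in that example:

```latex
\[
\frac{\partial^2 z}{\partial y\,\partial x}
  = \frac{\partial}{\partial y}\left(2xy^5\right)
  = 10xy^4
  = \frac{\partial}{\partial x}\left(5x^2y^4\right)
  = \frac{\partial^2 z}{\partial x\,\partial y}.
\]
```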

2.2.3 Geometric interpretation of partial derivatives


If f is a differentiable function of one variable then at every point on the graph of
y = f (x) there is a tangent, the slope of which is given by the first derivative of f
at that point. However, if f is a smooth function of two variables then the graph of
z = f (x, y) is a surface in 3-dimensions, and at any point on the graph there will be
an infinite number of tangent lines. In most cases these will have different slopes.
(Imagine a road down a steep mountainside. If it goes straight down then you will
be driving down a steep slope, but if it zig-zags then the slope will be more gentle.)
All the tangent lines at any point lie in one plane, called the tangent plane. In this
section, we shall see how the first partial derivatives of a function f (x, y) give us the
slopes of two particular lines in the tangent plane.
Let z = f(x, y), and suppose that we keep y fixed at some constant value $y_0$.
Then $z = f(x, y_0)$ is now a function of x alone. Geometrically, this equation represents
the 2-dimensional curve which results from cutting the surface z = f(x, y) with
the vertical plane $y = y_0$. The partial derivative $\partial z/\partial x$ gives us the slope of the tangent
line to this curve. Note that this tangent lies in the plane $y = y_0$ (which is parallel
to the xz-plane).
Consider, for example, the function z = 25 − x2 − y 2 . The graph of this function
is a surface that is known as a paraboloid. You can visualize it by first drawing the

parabola $z = 25 - y^2$ in the (z, y)-plane, and then imagining the surface that is
swept out when this curve is rotated about the z-axis. The diagram below shows the
portion of the surface that is above the (x, y)-plane.

Imagine cutting this surface with the vertical plane y = 3. The values of x and z
such that the point (x, 3, z) lies on the surface must satisfy the equation $z = 25 - x^2 - 9$
(since $3^2 = 9$), or $z = 16 - x^2$. This is the equation of a parabola lying in the plane
y = 3 (which is simply a copy of the (x, z)-plane). It describes the cross-section of
the surface that is obtained when the portion corresponding to y > 3 is sliced away
(as shown in the right-hand diagram below).

The slope of this parabola at any point is given by $\partial z/\partial x = -2x$. For example,
when x = 2 we have $z = 25 - x^2 - 9 = 25 - 4 - 9 = 12$, and the tangent at the point
(2, 3, 12) (shown in the diagram) has slope −2x = −4. (The equation of this tangent
is z = 20 − 4x, in the plane y = 3.)
[Figure: left, the surface $z = 25 - x^2 - y^2$ for nonnegative values of z; right, the same surface for z ≥ 0 and y ≤ 3, showing the tangent (4x + z = 20, y = 3).]

Similarly, $\partial z/\partial y$ is the slope of the tangent to the 2-dimensional curve that results
from the intersection of a surface z = f(x, y) and a vertical plane $x = x_0$. If, for
example, the surface $z = 25 - x^2 - y^2$ is cut by the plane x = 2 then the parabola
$z = 21 - y^2$ is obtained, and the partial derivative $\partial z/\partial y = -2y$ gives the slope of the
tangent to this parabola. At the point (2, 3, 12) this slope is −6. The diagram below
illustrates the surface cut by both the planes y = 3 and x = 2 and the two tangents

at (2, 3, 12) described above.


[Figure: the surface $z = 25 - x^2 - y^2$ for z ≥ 0, y ≤ 3 and x ≤ 2, showing the point (2, 3, 12), the x tangent (4x + z = 20, y = 3) and the y tangent (6y + z = 30, x = 2).]

Note that the tangent labelled “x tangent” lies in the plane y = 3, which is
parallel to the xz-plane, while the one labelled “y tangent” lies in the plane x = 2,
parallel to the yz-plane. Note also that the planes y = 3 and x = 2 are perpendicular
to each other. Any one line, of course, lies in many different planes in space. Any
two intersecting lines, however, determine a unique plane. In the example illustrated
above the two intersecting tangents determine the tangent plane to the surface at the
point (2, 3, 12).
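As a sketch of how the two tangents pin down the tangent plane: assume the plane through (2, 3, 12) is written in the form z = c + px + qy (this form is an assumption made here, not one stated in the notes). The slopes −4 and −6 found above then give p and q, and the point itself gives c.

```latex
\[
z = 12 - 4(x - 2) - 6(y - 3) = 38 - 4x - 6y .
\]
% Check: setting y = 3 gives z = 20 - 4x (the x tangent),
% and setting x = 2 gives z = 30 - 6y (the y tangent).
```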
2.2.4 Maxima and minima of functions of two variables
Local maximum and minimum values of a function of two variables are defined sim-
ilarly to those for functions of one variable.
Definition: (a) A local maximum value of z = f (x, y) is a value z0 = f (x0 , y0 ) such
that f (x0 , y0 ) > f (x, y) for all points (x, y) in the domain of f close to (x0 , y0 ).
(b) A local minimum value of z = f (x, y) is a value z0 = f (x0 , y0 ) such that
f (x0 , y0 ) < f (x, y) for all points (x, y) in the domain of f close to (x0 , y0 ).
A local maximum is akin to the top of a mountain, which may or may not be
the highest point in the mountain range. A local minimum is akin to the bottom of
a hole in the ground, which may or may not be deeper than others.
As with the one variable case, at a local maximum/minimum point (x0 , y0 , z0 ) of
z = f (x, y), either (x0 , y0 ) is on the boundary of the domain of f (x, y), or at least one
of the partial derivatives fx (x0 , y0 ), fy (x0 , y0 ) is not defined, or fx (x0 , y0 ) = fy (x0 , y0 ) = 0,
i.e. the tangent plane at (x0 , y0 , z0 ) is horizontal.
For the rest of this section we will assume that the surface z = f (x, y) is smooth
which means that fx and fy are defined throughout the domain of f . Points at which

$f_x = f_y = 0$ are called critical points, and the tangent plane at such a point
is horizontal.

[Figure: a surface with a local maximum and a local minimum marked.]

It is not the case, however, that all such points are local maxima
or minima. This is not surprising, since we know that for functions of one
variable the points where the derivative is zero are not necessarily maxima or minima.
However, for functions of two variables the situation is more complicated still, since
there can exist points that are local minima if y is held constant and x is varied, and
are local maxima if x is held constant and y is varied (or vice versa).
To understand the last sentence above, imagine the middle point on the surface
of a horse’s saddle. Let us say that the y direction is towards the horse’s head and
the x direction is sideways. Moving from the middle point in the y direction or
the negative y direction, the saddle rises; this helps to stop the rider from slipping
forward or backward. So the middle point is a local minimum if y is varied and
x stays constant. On the other hand, moving sideways from the middle point the
saddle descends, showing that we have a local maximum if x is varied and y is held
constant.
Definition: Points of the kind just described are called saddle points.
The surface $z = f(x, y) = y^2 - x^2$ has a saddle point at the origin. To check
this, we first calculate the partial derivatives. We find that
\[ f_x(x, y) = \frac{\partial}{\partial x}(y^2 - x^2) = 0 - 2x = -2x \]
and
\[ f_y(x, y) = \frac{\partial}{\partial y}(y^2 - x^2) = 2y - 0 = 2y. \]
So it is certainly true that $f_x(0, 0) = f_y(0, 0) = 0$, showing that (0, 0) is a critical point.
If we hold y = 0 and allow x to vary then the equation for z becomes $z = -x^2$, and
there is a maximum at x = 0. On the other hand, if we hold x = 0 and vary y then
we find that $z = y^2$, which has a minimum at y = 0. So (0, 0, 0) is a saddle point.
[Figure: the surface $z = y^2 - x^2$ for −1 ≤ y ≤ 1 and z ≥ −1.]
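The saddle behaviour at the origin can also be seen by evaluating the function a short distance either side of (0, 0) along each axis. The following is a minimal sketch; the displacement `step` is an arbitrary illustrative value.

```python
def f(x, y):
    # The saddle surface z = y^2 - x^2.
    return y**2 - x**2

step = 0.1  # small illustrative displacement from the origin

# Vary x with y held at 0: the middle value should be the largest.
along_x = [f(-step, 0.0), f(0.0, 0.0), f(step, 0.0)]

# Vary y with x held at 0: the middle value should be the smallest.
along_y = [f(0.0, -step), f(0.0, 0.0), f(0.0, step)]

print(along_x)
print(along_y)
```

Along the x direction the origin sits above its neighbours, and along the y direction it sits below them, which is exactly the behaviour described for a saddle point.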


All critical points that we shall meet in this course will be either local maxima,
local minima or saddle points. Moreover, by investigating the function values near

the critical point it will be relatively easy to decide, in each case, which kind of
critical point it is.

Example 2.10
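Determine the critical points, if any, for $z = \sqrt{4 - x^2 - y^2}$.
Solution:
\[
\begin{aligned}
\frac{\partial z}{\partial x} &= \frac{1}{2}(4 - x^2 - y^2)^{-1/2} \times (-2x) = \frac{-x}{\sqrt{4 - x^2 - y^2}}, \\
\frac{\partial z}{\partial y} &= \frac{1}{2}(4 - x^2 - y^2)^{-1/2} \times (-2y) = \frac{-y}{\sqrt{4 - x^2 - y^2}}.
\end{aligned}
\]

Thus $\partial z/\partial x = 0$ only when x = 0, and $\partial z/\partial y = 0$ only when y = 0. So the only critical
point is (0, 0, 2). Since $x^2 + y^2$ is always greater than or equal to 0 it follows that
$4 - x^2 - y^2$ can never exceed 4. So $z = \sqrt{4 - x^2 - y^2}$ can never exceed 2, and therefore
(0, 0, 2) is a maximum point. (The surface $z = \sqrt{4 - x^2 - y^2}$ is a hemisphere, with
centre at (0, 0, 0) and radius 2.)

[Figure: the hemisphere $z = \sqrt{4 - x^2 - y^2}$, drawn with x-, y- and z-axes.]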

2.2.5 Tests for maxima and minima of functions of two variables

A test exists to determine the nature of a critical point of a function of two
variables, although as you might expect, it is not as simple as the second derivative
test in the one variable case.
Before we state the test, let’s try to visualise a surface with a local minimum at
(a, b) – think of the bottom of a bowl in space. If we keep y constant (y = b) and
take a vertical cross-section through the surface z = f (x, y) parallel to the x-axis, the
curve of intersection has a local minimum at x = a. To the left of a the slopes of the
tangents fx (x, b) are negative, and to the right of a they are positive. Therefore the
slopes of the tangents are increasing as we move from left to right through a, that is
fxx (a, b) > 0. If we keep x fixed (x = a) and take a vertical cross-section parallel to
the y axis, again the curve of intersection has a local minimum at y = b and a similar
conclusion is drawn, namely fyy (a, b) > 0. In other words, at a point (a, b) where f
has a minimum, both fxx and fyy are positive.
We can repeat this line of reasoning for a maximum point, and deduce that at
such a point both fxx and fyy are negative.
Might this type of condition form our new second derivative test for a mini-
mum/maximum? Almost, but not quite! It turns out that these conditions on fxx
and fyy are not enough to guarantee a minimum/maximum and we need to involve
the mixed partial derivative fxy at (a, b) as well. The test requires the calculation of
an expression known as the discriminant of f .

Definition: The “discriminant” D of f at the point (a, b) is defined to be the expression
\[ D(a, b) = f_{xx}(a, b)\, f_{yy}(a, b) - \left(f_{xy}(a, b)\right)^2. \]

The second derivative test can now be stated:


The Discriminant Test: Let f(x, y) be a function of two variables which has continuous
second order partial derivatives. Suppose (a, b) is a point such that the partial
derivatives satisfy $f_x(a, b) = f_y(a, b) = 0$.

If the discriminant D(a, b) is positive and fxx (a, b) > 0 then there is a local
minimum at (a, b).

If the discriminant D(a, b) is positive and fxx (a, b) < 0 then there is a local
maximum at (a, b).

If the discriminant D(a, b) is negative then there is a saddle point at (a, b).

If the discriminant D(a, b) is zero then we can’t draw any conclusion without
further work.
If you read the wording of this test carefully, you’ll notice that it appears to
give special prominence to the sign of fxx (a, b) while ignoring the sign of fyy (a, b)
completely. However we have seen that at a minimum or maximum point, these
particular second partial derivatives have the same sign and so only one is needed in
the statement of the test.
Note that it is not possible for D(a, b) to be positive and $f_{xx}(a, b) = 0$, for if
$f_{xx}(a, b) = 0$ then $D(a, b) = -\left(f_{xy}(a, b)\right)^2 \le 0$.

Example 2.11

Find the discriminant D of f at each of its critical points for the function
$f(x, y) = x^4 + y^4 - 4xy + 1$, and hence identify the critical points as local maxima,
local minima or saddle points, if possible.
Solution: First we calculate the first and second partial derivatives.
\[
\begin{aligned}
f_x(x, y) &= 4x^3 - 4y, \\
f_y(x, y) &= 4y^3 - 4x, \\
f_{xx}(x, y) &= 12x^2, \\
f_{yy}(x, y) &= 12y^2, \\
f_{xy}(x, y) &= f_{yx}(x, y) = -4.
\end{aligned}
\]

To find critical points we solve $x^3 - y = 0$ and $y^3 - x = 0$ simultaneously.
Substituting $y = x^3$ into the second equation gives
\[ x^9 - x = x(x^8 - 1) = x(x^4 - 1)(x^4 + 1) = x(x^2 - 1)(x^2 + 1)(x^4 + 1) = 0, \]
which implies that x = 0 or x = ±1 (since x is real). Then since $y = x^3$, we must
have y = 0 or y = ±1, respectively. There are three critical points, (0, 0), (1, 1) and

(−1, −1). At the point (0, 0), all the above partial derivatives are 0 except for the
last, $f_{xy}(0, 0) = -4$. Therefore
\[ D(0, 0) = f_{xx}(0, 0)\, f_{yy}(0, 0) - \left(f_{xy}(0, 0)\right)^2 = -(-4)^2 = -16 < 0. \]
The test shows that there is a saddle point at (0, 0).
At the point (1, 1), we have $f_{xx}(1, 1) = f_{yy}(1, 1) = 12$ and $f_{xy}(1, 1) = -4$. Hence
\[ D(1, 1) = f_{xx}(1, 1)\, f_{yy}(1, 1) - \left(f_{xy}(1, 1)\right)^2 = 12 \times 12 - (-4)^2 = 128, \]
and since $f_{xx}(1, 1) = 12 > 0$ there must be a local minimum at (1, 1). Similarly, you
can confirm that there is also a local minimum at (−1, −1).
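The Discriminant Test is mechanical enough to be written as a short function. The sketch below is an illustration rather than part of the notes: the function names `classify` and `second_partials` are invented here, and the second partial derivatives are typed in by hand from Example 2.11 rather than computed symbolically.

```python
def classify(fxx, fyy, fxy):
    """Apply the Discriminant Test at a critical point, given the
    second-order partial derivatives evaluated at that point."""
    d = fxx * fyy - fxy**2
    if d > 0 and fxx > 0:
        return d, "local minimum"
    if d > 0 and fxx < 0:
        return d, "local maximum"
    if d < 0:
        return d, "saddle point"
    return d, "inconclusive"  # D = 0: the test gives no information

def second_partials(x, y):
    # f_xx, f_yy, f_xy for f(x, y) = x^4 + y^4 - 4xy + 1 (Example 2.11).
    return 12 * x**2, 12 * y**2, -4

results = {}
for point in [(0, 0), (1, 1), (-1, -1)]:
    results[point] = classify(*second_partials(*point))
    print(point, results[point])
```

This reproduces the values D(0, 0) = −16 and D(1, 1) = 128 found above, and carries out the check at (−1, −1) that the example leaves to the reader.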

Example 2.12
A courier company will accept rectangular parcels for delivery only if the sum of the
parcel’s length and girth (girth = smallest distance around) does not exceed 200 cm.
Find the dimensions of the parcel with maximum volume accepted by the courier
company.
Solution: Step 1: First we draw a diagram, and identify the variables involved.
[Figure: a rectangular parcel, with the girth and the width x labelled.]

Let ℓ be the length of the parcel, x the width of the base and y the height.
(Presumably, in this context, the largest of the three dimensions is the one referred
to as the length; however, we shall solve the problem without assuming this.) Let V
be the volume and g the girth.
Step 2: The next task is to identify the independent variables and the quantity to be
optimised. Here x, y and ℓ are all independent variables, and the task is to maximize
the volume, V .
Step 3: We now write down all the relationships between the variables. Observe that
g = 2x + 2y (by definition), and V = xyℓ. The courier’s requirement that
girth + length ≤ 200
therefore becomes 2x + 2y + ℓ ≤ 200. A parcel with 2x + 2y + ℓ < 200 will obviously
not have the maximum allowable volume, since keeping x and y unchanged and
increasing ℓ until 2x + 2y + ℓ = 200 will produce an acceptable parcel with a larger
volume. So we can assume that 2x + 2y + ℓ = 200, and thus ℓ = 200 − 2x − 2y. This
observation reduces the problem to one in which there are only two independent
variables, since we can eliminate ℓ.
We now proceed to express V in terms of the independent variables x and y.
\[
\begin{aligned}
V &= xy\ell \\
  &= xy(200 - 2x - 2y) \\
  &= 200xy - 2x^2 y - 2xy^2.
\end{aligned}
\]

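Step 4: Differentiate and find any points (x, y) for which $\partial V/\partial x = \partial V/\partial y = 0$.
\[
\begin{aligned}
\frac{\partial V}{\partial x} &= 200y - 4xy - 2y^2 = 2y(100 - 2x - y), \\
\frac{\partial V}{\partial y} &= 200x - 2x^2 - 4xy = 2x(100 - x - 2y).
\end{aligned}
\]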

Thus we must solve the following simultaneous equations:

2y(100 − 2x − y) = 0 (1)
2x(100 − x − 2y) = 0. (2)

Now Eq. (1) gives y = 0 or 2x + y = 100, and Eq. (2) gives x = 0 or x + 2y = 100.
Clearly solutions with either x = 0 or y = 0 will not produce the maximum, since
they give zero for the volume. Hence

2x + y = 100 (3)
x + 2y = 100. (4)

Twice Eq. (3) minus Eq.(4) gives x = 100/3, and substituting back gives y = 100/3
also. Therefore ℓ = 200 − 200/3 − 200/3 = 200/3.
We still have to show that the values of x and y that we have found do genuinely
give a maximum value of V = xy(200 − 2x − 2y). Observe that the physical
interpretation of the variables forces ℓ ≥ 0, as well as x ≥ 0 and y ≥ 0. Now
ℓ = 200 − 2x − 2y ≥ 0 gives x + y ≤ 100, which means that the point (x, y) lies on or
below the line x + y = 100. The points in the (x, y)-plane with x and y positive and
x + y < 100 form the interior of the triangular region shown below, and this is therefore
the domain on which V is defined.

[Figure: the triangular region in the (x, y)-plane bounded by the x-axis, the y-axis and the line x + y = 100.]

The boundaries of this region correspond to
x = 0, y = 0 and ℓ = 0, and so V = 0 at all points on the boundary. Inside the region
V takes positive values. The fact that V is a continuous function of x and y means
that it must have a maximum value somewhere in the region, and the point that we
have found is the only possibility, since it is the only critical point in the region. So
the dimensions of the parcel with maximum volume are
$\tfrac{100}{3}$ cm × $\tfrac{100}{3}$ cm × $\tfrac{200}{3}$ cm.
Note that our reasoning in the preceding paragraph merely consisted of analysing
something that was intuitively clear: there had to be some maximum, and it clearly
does not lie on the boundary of the domain since V (x, y) = 0 there. So the only
possible critical point had to give that maximum.
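The conclusion of Example 2.12 can be checked by brute force. The sketch below is an illustration only (it is not how the notes solve the problem): it evaluates V at every point with whole-number coordinates inside the triangular region and compares the best of these with the value of V at the critical point (100/3, 100/3).

```python
def volume(x, y):
    # V = xy(200 - 2x - 2y), with the length eliminated via l = 200 - 2x - 2y.
    return x * y * (200 - 2 * x - 2 * y)

# Search all whole-number points strictly inside the region
# x > 0, y > 0, x + y < 100.
best = max(
    ((volume(x, y), x, y)
     for x in range(1, 100)
     for y in range(1, 100 - x)),
    key=lambda t: t[0],
)

critical = volume(100 / 3, 100 / 3)  # value of V at the critical point

print(best)      # largest volume on the grid, and where it occurs
print(critical)  # approximately 74074.07
```

The best grid point lies next to (100/3, 100/3), and its volume is slightly smaller than the volume at the critical point, which is consistent with the critical point giving the maximum.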

Exercises Set 2.2

1. Evaluate f(2, 1) for each of the following functions f.
(i) $f(x, y) = x^2 + y$ (ii) $f(x, y) = x^2 - y^2$ (iii) $f(x, y) = xy$
(iv) $f(x, y) = \sqrt{1 - x^2 y^2}$ (v) $f(x, y) = e^{x+y}$

2. Describe the sets of points (x, y, z) that satisfy the following conditions.
(i ) y=3 (ii ) z = −3 (iii ) x = 1 and y = 0
3. Find $\partial z/\partial x$ and $\partial z/\partial y$ if (i) $z = x^3 + 2x^2 y + y^2$, (ii) $z = x \ln y + y e^x$.
4. Find $f_x$, $f_y$, $f_{xx}$, $f_{xy}$ and $f_{yy}$ in each of the following cases.
(i) $f(x, y) = x e^y + y e^x + x$ (ii) $f(x, y) = \dfrac{x}{y}$ (iii) $f(x, y) = \ln(x^2 + y^2)$
5. Show that $f_{xy} = f_{yx}$ for: (i) $f(x, y) = 3x^4 y^2 + 7x^2 y$ (ii) $f(x, y) = \dfrac{y}{x^2}$
6. Find the slope of the tangent line to the curve of intersection of the cylinder
$4z = 5\sqrt{16 - x^2}$ and the plane y = 3 at the point $\left(2, 3, \tfrac{5\sqrt{3}}{2}\right)$.

7. Demand for butter in Sydney (in kg) is given by z = 2400 − 50x + 90y where x
is the price per kg of butter and y is the price per kg of margarine.
Find zx and zy and interpret your answer.

8. (i) If $p(x, y) = \ln(x^2 + y^2)$ find $\dfrac{\partial p}{\partial x}$ and $\dfrac{\partial^2 p}{\partial y\,\partial x}$
(ii) Show $y_{xx} = y_t$ for $y(x, t) = e^{-t} \sin x$
(iii) Find all the local minima and maxima of $z = x^2 - 2x + y^2 + 2y + 3$
(iv) Find the greatest and least value of z in Part (iii) if −2 ≤ x ≤ 2, and
y = −3
(v) Find the greatest and least value of z in Part (iii) if −2 ≤ x ≤ 2, and
y = 2x
(vi) Find the greatest and least value of z in Part (iii) if 0 ≤ x ≤ 1, and
0 ≤ y ≤ 3

9. (i) If $z = e^x \sin y$ show that $\dfrac{\partial^2 z}{\partial x^2} + \dfrac{\partial^2 z}{\partial y^2} = 0$
(ii) If $f(x, y) = x^3 - 3x^2 y^2 + 2x$, find $f_x(1, 1)$ and $f_y(1, 1)$.

§2.3 Least Squares

2.3.1 The line of best fit


Often we need to fit a straight line to data which, due to statistical fluctuations or
experimental errors, is only approximately linear. The question then is this: among a

variety of lines y = a + bx, which fits the given data best? The least squares method
of finding the best fitting straight line is a further application of partial derivatives.
Suppose we graph the following data:

x 1 2 3 4
y 1 2 2 2

In the left hand diagram below, lines 2 and 3 clearly fit the data better than line 1.
But which of 2 and 3 is better?
The answer to this question, of course, is that it depends what you mean by
better. We need some sensible way of measuring how well a line fits the given data
points.
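[Figure: left-hand diagram, the four data points with three candidate lines labelled line 1, line 2 and line 3, for 0 ≤ x ≤ 4; right-hand diagram, a line ℓ with vertical dotted lines joining each data point $(x_i, y_i)$ to the point $(x_i, f(x_i))$ on ℓ.]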
A convenient measure of how well an arbitrary function f fits a given tabulated
function is the sum of the squares of the differences between f (xi ) and yi , taken over
all pairs of tabulated values (xi , yi ). In other words, we define

\[ S = (f(x_1) - y_1)^2 + (f(x_2) - y_2)^2 + \cdots + (f(x_n) - y_n)^2 \]

and consider the function f to be a good fit to the tabulated points if S is small: the
smaller S is, the better the fit.
In the right-hand diagram above, the question is to find how well the line marked
ℓ fits the given data. That is, we are assuming that ℓ is the graph of y = f (x), and
we want to measure how close the graph is to the data points. So for each data point
(xi , yi ) we find the point on ℓ that is directly above or below it, and we determine
the distance between these points. In fact, the point on the graph of y = f (x) that is
directly above or below (xi , yi ) is the point (xi , f (xi )), and its distance from (xi , yi )
is |f (xi ) − yi |. In the diagram these distances are the lengths of the four vertical
dotted lines, and the quantity S is the sum of the squares of these four lengths.
Assume now that f (x) = a + bx for some constants a and b, so that (as in the
diagram) the graph of f is a straight line. The formula for S then becomes

\[ S = (a + bx_1 - y_1)^2 + (a + bx_2 - y_2)^2 + \cdots + (a + bx_n - y_n)^2. \]

We can now think of S as a function of the two independent variables a and b. The
problem of finding the line that gives the minimum value of S is now a calculus
problem: find the values of the variables a, b that minimize S. The minimum will
occur at a critical point; so we must find the values of a and b for which
$\partial S/\partial a = \partial S/\partial b = 0$.
Note that since S is a sum of squares it is never negative; moreover, making either
a or b extremely large (positive or negative) will inevitably make S extremely large.
Hence it is clear that there must be some point at which a minimum occurs, and this
must be a point such that $\partial S/\partial a = \partial S/\partial b = 0$.

Differentiating S with respect to a and equating to zero gives
\[ 2(a + bx_1 - y_1) + 2(a + bx_2 - y_2) + \cdots + 2(a + bx_n - y_n) = 0. \]

Dividing through by 2, then collecting like terms and moving the constants to the
right hand side, this becomes
\[ na + (x_1 + x_2 + \cdots + x_n)\,b = y_1 + y_2 + \cdots + y_n. \]

Similarly, differentiating S with respect to b and equating to zero gives
\[ 2x_1(a + bx_1 - y_1) + 2x_2(a + bx_2 - y_2) + \cdots + 2x_n(a + bx_n - y_n) = 0, \]

which rearranges to give
\[ (x_1 + x_2 + \cdots + x_n)\,a + (x_1^2 + x_2^2 + \cdots + x_n^2)\,b = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n. \]

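We introduce the following convenient abbreviations:
\[
\begin{aligned}
\sum x_i &= x_1 + x_2 + \cdots + x_n, \\
\sum y_i &= y_1 + y_2 + \cdots + y_n, \\
\sum x_i^2 &= x_1^2 + x_2^2 + \cdots + x_n^2, \\
\sum x_i y_i &= x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.
\end{aligned}
\]

With this notation our equations above become
\[
\begin{aligned}
na + \Big(\sum x_i\Big) b &= \sum y_i, \\
\Big(\sum x_i\Big) a + \Big(\sum x_i^2\Big) b &= \sum x_i y_i.
\end{aligned}
\]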

These equations have a unique solution (a, b), provided that the $x_i$ are not all equal,
and this solution must minimize the sum of the squares S. The resulting straight line
y = a + bx is called the least squares best fitting straight line for the given data.

Example 2.13
For the data considered above we have four points; so n = 4. The following table
gives the values of $x_i$, $y_i$, $x_i^2$ and $x_i y_i$ in all 4 cases, enabling the quantities
$\sum x_i$, $\sum y_i$, $\sum x_i^2$ and $\sum x_i y_i$ to be readily calculated.

  x_i    y_i    x_i^2    x_i y_i
   1      1       1         1
   2      2       4         4
   3      2       9         6
   4      2      16         8

$\sum x_i = 10$, $\sum y_i = 7$, $\sum x_i^2 = 30$, $\sum x_i y_i = 19$

So the equations for a and b are
\[
\begin{aligned}
4a + 10b &= 7, \\
10a + 30b &= 19.
\end{aligned}
\]

These are easily solved, giving a = 1 and b = 0.3. So the least squares best fitting
linear function is
y = 1 + 0.3x.

This is in fact the line marked as line 3 in the left hand diagram above.
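The calculation in Example 2.13 can be sketched in a few lines of code. This is an illustration under stated assumptions, not part of the notes: the function names `least_squares_line` and `sum_of_squares` are invented here, and the 2 × 2 system is solved by Cramer's rule, which is one of several equivalent ways to solve it. The last step nudges a and b slightly (by an arbitrary 0.1) to confirm that S gets larger away from the solution.

```python
def least_squares_line(xs, ys):
    """Solve the two normal equations
         n*a       + (sum x)*b   = sum y
         (sum x)*a + (sum x^2)*b = sum xy
    for the intercept a and slope b of the line y = a + b*x."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only when all the x values are equal
    if det == 0:
        raise ValueError("all x values are equal; no unique line")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

def sum_of_squares(a, b, xs, ys):
    # S = sum of (a + b*x_i - y_i)^2 over all data points.
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [1, 2, 2, 2]
a, b = least_squares_line(xs, ys)
s_best = sum_of_squares(a, b, xs, ys)

# Nudging a or b away from the solution should only increase S.
s_nudged = [sum_of_squares(a + da, b + db, xs, ys)
            for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]

print(a, b)      # intercept and slope
print(s_best)    # S for the fitted line
print(s_nudged)  # S for four nearby lines
```

For the fitted line y = 1 + 0.3x the four vertical differences are 0.3, −0.4, −0.1 and 0.2, so S = 0.09 + 0.16 + 0.01 + 0.04 = 0.30.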

2.3.2 Polynomials of best fit


The calculations above can be generalized to give a method of finding the least squares
best fitting polynomial function of degree m, for any specified positive integer m. For
example, to find the least squares best fitting quadratic, y = a + bx + cx2 , we need
to find the values of a, b and c that minimize the function

\[ S(a, b, c) = (a + bx_1 + cx_1^2 - y_1)^2 + (a + bx_2 + cx_2^2 - y_2)^2 + \cdots + (a + bx_n + cx_n^2 - y_n)^2. \]

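Here there are three independent variables a, b and c, but the procedure remains the
same. The critical points are the points at which $\partial S/\partial a = \partial S/\partial b = \partial S/\partial c = 0$. We get the
following equations:
\[
\begin{aligned}
na + \Big(\sum x_i\Big) b + \Big(\sum x_i^2\Big) c &= \sum y_i, \\
\Big(\sum x_i\Big) a + \Big(\sum x_i^2\Big) b + \Big(\sum x_i^3\Big) c &= \sum x_i y_i, \\
\Big(\sum x_i^2\Big) a + \Big(\sum x_i^3\Big) b + \Big(\sum x_i^4\Big) c &= \sum x_i^2 y_i.
\end{aligned}
\]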

Example 2.14
Using the same four data points as above, we get the following table.

  x_i   x_i^2   x_i^3   x_i^4   y_i   x_i y_i   x_i^2 y_i
   1      1       1       1      1       1          1
   2      4       8      16      2       4          8
   3      9      27      81      2       6         18
   4     16      64     256      2       8         32
  10     30     100     354      7      19         59    (column totals)

The equations for a, b, c are, therefore,

4a + 10b + 30c = 7,
10a + 30b + 100c = 19,
30a + 100b + 354c = 59.

The solution is
a = −0.25, b = 1.55, c = −0.25.

So the least squares best fitting quadratic is given by the equation

\[ y = -0.25 + 1.55x - 0.25x^2. \]
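The three equations in Example 2.14 can be built and solved mechanically. The sketch below is an assumption-laden illustration rather than the method of the notes: it uses Gauss-Jordan elimination with exact fractions (the helper `solve_linear_system` is written here for the purpose), so the answer comes out as −1/4, 31/20 and −1/4, which are the decimals −0.25, 1.55 and −0.25 quoted above.

```python
from fractions import Fraction

def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic.
    Assumes the system has a unique solution."""
    n = len(rhs)
    # Build the augmented matrix [A | b] with Fraction entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot entry is 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

xs = [1, 2, 3, 4]
ys = [1, 2, 2, 2]

# s[k] = sum of x_i^k (so s[0] = n); t[k] = sum of x_i^k * y_i.
s = [sum(x**k for x in xs) for k in range(5)]
t = [sum(x**k * y for x, y in zip(xs, ys)) for k in range(3)]

# Row i of the normal equations has coefficients s[i], s[i+1], s[i+2].
normal_matrix = [[s[i + j] for j in range(3)] for i in range(3)]
a, b, c = solve_linear_system(normal_matrix, t)

print(a, b, c)
```

The lists `s` and `t` reproduce the column totals 4, 10, 30, 100, 354 and 7, 19, 59 from the table in the example.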



Exercises Set 2.3

1. The following points were obtained in an experiment:

x 1 2 3 4 5 6
y 3 5 6 6 7 9

Plot the points on a diagram, and use the method of least squares to fit a straight
line.

2. Fit a straight line by the least squares method to each of the following sets of
data:
(i ) toughness (x) and percentage of nickel (y) in eight specimens of alloy steel.

toughness (x) 36 41 42 43 44 45 47 50
% nickel (y) 2.5 2.7 2.8 2.9 3.0 3.2 3.3 3.5

(ii ) aptitude test mark (x) given to six trainee salesmen, and their first year
sales (y) in hundreds of dollars.

aptitude test (x) 25 29 33 36 42 54
first-year sales (y) 42 45 50 48 73 90

For both sets of data, plot the points and draw the least squares line on a graph.
Use the lines to predict
(a) the % nickel of a specimen of steel whose toughness is 38, and
(b) the likely first-year sales of a trainee salesman who obtains a mark of 48
on his aptitude test.
Chapter 3
Summation
§3.1 Finite sums

One of the first mathematical experiences one has is learning to add. There was
a time when adding long columns of numbers (“long tots”) occupied many hours
of primary school. When faced with such “sums”, one quickly learns to recognise
patterns, especially 10-combinations such as 7 + 3, 8 + 2, and so on. Other patterns
are not so easily recognised. The famous mathematician C. F. Gauss is credited with
recognising, when only a young child, that the sum 1 + 2 + 3 + · · · + 100 must be 5050,
since it is half of (1 + 100) + (2 + 99) + (3 + 98) + · · · + (100 + 1) = 100 × 101 = 10100.
Gauss’s technique can be applied to any arithmetic series; that is, a series in
which each term differs from the preceding one by the same amount.† An arithmetic
series of n terms, with first term a and common difference d, has the form,
a + (a + d) + (a + 2d) + · · · + (a + (n − 2)d) + (a + (n − 1)d), (1)
in which there are n terms, each differing from its predecessor by d. Writing S for
this sum and applying Gauss’s idea, we see that 2S is the sum of (1) above and
(a + (n − 1)d) + (a + (n − 2)d) + (a + (n − 3)d) + · · · + (a + d) + a, (2)
and so it follows that
2S = n × (2a + (n − 1)d) (3)
since the sum of the first term of (1) and the first term of (2) is 2a + (n − 1)d, and
the sum of the second term of (1) and the second term of (2) is the same, and the sum
of the two third terms is again the same, and so on. Since the series have n terms
each, Eq. (3) follows.

The sum of the general arithmetic series (1) above (with first
term a, common difference d and n terms altogether) is given by
\[ S = \frac{n}{2}\,(2a + (n - 1)d) = an + \tfrac{1}{2}\,n(n - 1)d. \tag{4} \]

For example,
\[
\begin{aligned}
3 + 8 + 13 + \cdots + 98 &= \tfrac{1}{2}\big((3 + 98) + (8 + 93) + \cdots + (98 + 3)\big) \\
&= \tfrac{1}{2}(20 \times 101) \\
&= 1010.
\end{aligned}
\]
(Note that you need to be able to work out the number of terms in the series; it
was 20 in this example. The way to find this is to use the fact that the nth term is

† In mathematical literature the word series means a collection of successive terms that
are to be added. By contrast, if the successive terms are not added but remain separate,
then the word sequence is used instead.


a + (n − 1)d, where a is the first term and d the common difference. Here a = 3 and
d = 5, and solving 3 + 5(n − 1) = 98 gives n = 20, showing that 98 is the 20th term.)
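Formula (4) and the term-counting step can be checked against a direct sum. This is a minimal sketch; the function names are invented for illustration, and whole-number values of a, d and n are assumed so that integer arithmetic is exact.

```python
def arithmetic_sum(a, d, n):
    # Formula (4): S = (n/2) * (2a + (n - 1)d).
    # For whole numbers a, d, n the product n * (2a + (n - 1)d) is even.
    return n * (2 * a + (n - 1) * d) // 2

def number_of_terms(a, d, last):
    # Solve a + (n - 1)d = last for n (assumes `last` really is a term).
    return (last - a) // d + 1

n = number_of_terms(3, 5, 98)      # the series 3 + 8 + 13 + ... + 98
total = arithmetic_sum(3, 5, n)
direct = sum(range(3, 99, 5))      # add the terms one by one

print(n, total, direct)
```

Both methods give 1010 for the 20-term series above, and the same function applied to Gauss's sum, `arithmetic_sum(1, 1, 100)`, gives 5050.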
Sigma notation
In mathematics we use the Greek letter Σ (called “Sigma”) as a shorthand notation
for summation. For example,
\[ \sum_{l=1}^{n} f(l) = f(1) + f(2) + \cdots + f(n - 1) + f(n). \]

The notation means that f(l) is to be evaluated for all values of l from 1 to n, and the
resulting terms are to be summed. The letter l here is called a dummy variable, since
it does not stand for any particular number. One can replace l by any other letter
that has not been used, and the meaning of the expression is unchanged:
$\sum_{l=1}^{n} f(l)$ and $\sum_{i=1}^{n} f(i)$ are the same as each other.
Here are two examples of Sigma notation:

\[ \sum_{k=1}^{10} k = 1 + 2 + \cdots + 9 + 10 \]

and
\[ \sum_{r=2}^{n} r^2 = 2^2 + 3^2 + \cdots + (n - 1)^2 + n^2. \]
An example that often confuses people at first is $\sum_{j=1}^{n} 1$. It means this: evaluate 1
for all values of j from 1 to n, and add the resulting n terms. The fact that the
expression to be evaluated does not in fact depend on j (being always equal to 1)
does not alter the procedure. All n terms are equal to 1, and so the sum is n:
\[ \sum_{j=1}^{n} 1 = 1 + 1 + 1 + \cdots + 1 = n. \]

The principal advantage of Sigma notation is that it is very compact. For example,
in Sigma notation the general arithmetic series ((1) above) is
$$\sum_{i=1}^{n} \bigl(a + (i-1)d\bigr),$$

which is much shorter to write than (1).
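Sigma notation translates almost word for word into a loop. The following is a minimal sketch; the helper name `sigma` is an assumption made for illustration and does not come from these notes.

```python
def sigma(f, lower, upper):
    """Sum f(i) for every integer i from lower to upper inclusive."""
    return sum(f(i) for i in range(lower, upper + 1))


print(sigma(lambda k: k, 1, 10))     # 1 + 2 + ... + 10 = 55
print(sigma(lambda r: r**2, 2, 5))   # 4 + 9 + 16 + 25 = 54
print(sigma(lambda j: 1, 1, 7))      # seven terms, each equal to 1, so 7

# The general arithmetic series with a = 3, d = 5 and n = 20 terms.
print(sigma(lambda i: 3 + (i - 1) * 5, 1, 20))  # 1010
```

The name of the lambda's argument plays the role of the dummy variable: renaming it changes nothing.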


3.1.1 Geometric series
The series $a + ar + ar^2 + \cdots + ar^{n-1}$ is called a geometric series. The number r is
the common ratio of successive terms, and we have chosen the notation so that n is
the number of terms of the series. Using Sigma notation,
$$a + ar + ar^2 + \cdots + ar^{n-1} = \sum_{x=1}^{n} ar^{x-1}.$$
3.1 Finite sums 67

Notice that it would not be correct to write instead


$$\sum_{n=1}^{n} ar^{n-1},$$

since the variable n has already been used—it is the number of terms—and therefore
cannot be used as the dummy variable in the summation. The choice of x as the
dummy variable was quite arbitrary: anything but n, r or a would do.
We could equally well have written the series as $\sum_{x=0}^{n-1} ar^{x}$.
To find the sum of a geometric series, we can make the following observation.
Let $S = a + ar + ar^2 + \cdots + ar^{n-1}$, and assume that $r \neq 1$. Then
$$Sr = ar + ar^2 + ar^3 + \cdots + ar^n,$$
and therefore
$$Sr - S = ar^n - a$$
since the terms $ar + ar^2 + \cdots + ar^{n-1}$ all cancel out. Since $r \neq 1$, we can divide both
sides by $r - 1$, yielding the formula
$$S = a\left(\frac{r^n - 1}{r - 1}\right).$$

The formula for the sum of a geometric series of n terms, with
first term a and common ratio $r \neq 1$, is as follows:
$$\sum_{x=1}^{n} ar^{x-1} = a\left(\frac{r^n - 1}{r - 1}\right). \tag{5}$$
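As a quick check of formula (5), the closed form can be compared with a term-by-term sum. The sketch below uses exact rational arithmetic so that the comparison is not affected by rounding; the function names are illustrative only.

```python
from fractions import Fraction


def geometric_sum(a, r, n):
    """Closed form a(r^n - 1)/(r - 1) from formula (5); needs r != 1."""
    if r == 1:
        raise ValueError("formula (5) assumes r != 1")
    return a * (r**n - 1) / (r - 1)


def geometric_sum_direct(a, r, n):
    """Add the n terms a, ar, ..., ar^(n-1) one at a time."""
    return sum(a * r**(x - 1) for x in range(1, n + 1))


a, r, n = Fraction(3), Fraction(1, 2), 10
print(geometric_sum(a, r, n) == geometric_sum_direct(a, r, n))  # True
```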

Example 3.1
Suppose we start an annuity with $1000, to which we add $100 each year, and that
the fixed interest rate is 12%.
At the end of year 1, we have an amount

$$A_1 = 1000 + 0.12 \times 1000 = 1000 \times 1.12.$$

At the beginning of year 2 we add $100, so that at the end of year 2 we have

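$$\begin{aligned}
A_2 &= \text{amount at beginning of year 2} + \text{interest for year 2} \\
&= (A_1 + 100) + 0.12 \times (A_1 + 100) \\
&= (A_1 + 100) \times 1.12 \\
&= (1000 \times 1.12 + 100) \times 1.12 \\
&= 1000 \times 1.12^2 + 100 \times 1.12.
\end{aligned}$$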

Similarly, at the end of year 3 we have
$$\begin{aligned}
A_3 &= (A_2 + 100) \times 1.12 \\
&= (1000 \times 1.12^2 + 100 \times 1.12 + 100) \times 1.12 \\
&= 1000 \times 1.12^3 + 100(1.12 + 1.12^2).
\end{aligned}$$

Extending this pattern to n years we have
$$A_n = 1000 \times 1.12^n + 100(1.12 + 1.12^2 + \cdots + 1.12^{n-1}).$$

The sum in the brackets in this expression is a geometric series with first term equal
to 1.12, common ratio equal to 1.12, and with n − 1 terms. Thus to evaluate it we
can use (5) with a and r both replaced by 1.12 and n replaced by n − 1. Hence,
$$A_n = 1000 \times 1.12^n + 100 \times 1.12\left(\frac{1.12^{n-1} - 1}{0.12}\right).$$
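The closed form for the annuity can be checked against a year-by-year simulation of Example 3.1. The sketch below is an illustration under the example's assumptions (a deposit at the start of every year after the first, interest added at the end of each year); the function names and default arguments are hypothetical.

```python
import math


def annuity_by_years(n, start=1000.0, deposit=100.0, rate=0.12):
    """Simulate the account one year at a time."""
    amount = start
    for year in range(1, n + 1):
        if year > 1:
            amount += deposit      # deposit at the beginning of the year
        amount *= 1 + rate         # interest added at the end of the year
    return amount


def annuity_closed_form(n, start=1000.0, deposit=100.0, rate=0.12):
    """A_n = start * g^n + deposit * g * (g^(n-1) - 1) / rate, with g = 1 + rate."""
    g = 1 + rate
    return start * g**n + deposit * g * (g**(n - 1) - 1) / rate


for n in (1, 2, 3, 10):
    print(n, round(annuity_by_years(n), 2), round(annuity_closed_form(n), 2))
```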

3.1.2 Collapsing series

A more general method for summing series is the so-called collapsing technique.

Example 3.2
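Observe that
$$\begin{aligned}
\frac{1}{1\times 2} + \frac{1}{2\times 3} &= \frac{2}{3}, \\
\frac{1}{1\times 2} + \frac{1}{2\times 3} + \frac{1}{3\times 4} &= \frac{3}{4}, \\
\frac{1}{1\times 2} + \frac{1}{2\times 3} + \frac{1}{3\times 4} + \frac{1}{4\times 5} &= \frac{4}{5}.
\end{aligned}$$
It might be conjectured that
$$S = \frac{1}{1\times 2} + \frac{1}{2\times 3} + \frac{1}{3\times 4} + \cdots + \frac{1}{n(n+1)} = \frac{n}{n+1}.$$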

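That this is indeed the case becomes apparent when we express each term as a
difference, as follows.
$$\begin{aligned}
\frac{1}{1\times 2} &= 1 - \frac{1}{2}, \\
\frac{1}{2\times 3} &= \frac{1}{2} - \frac{1}{3}, \\
&\;\;\vdots \\
\frac{1}{n(n+1)} &= \frac{1}{n} - \frac{1}{n+1}.
\end{aligned}$$
Therefore
$$S = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right).$$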

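All the inside terms cancel, so that the whole sum collapses. After cancelling the
interior terms, the only terms that remain in S are the first and the last. That is,
$$\begin{aligned}
S &= 1 + \left(-\frac{1}{2} + \frac{1}{2}\right) + \left(-\frac{1}{3} + \frac{1}{3}\right) + \cdots + \left(-\frac{1}{n} + \frac{1}{n}\right) - \frac{1}{n+1} \\
&= 1 + (0 + 0 + \cdots + 0) - \frac{1}{n+1} \\
&= 1 - \frac{1}{n+1} \\
&= \frac{n}{n+1}.
\end{aligned}$$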
Some series $\sum_{j=a}^{b} f(j)$ can be evaluated by expressing each of the terms $f(j)$ as
a difference,
$$f(n) = -g(n) + g(n+1) \overset{\text{def}}{=} \Delta g(n),$$
for some function g to be determined. Then
$$\begin{aligned}
\sum_{j=a}^{b} f(j) &= f(a) + f(a+1) + \cdots + f(b) \\
&= (-g(a) + g(a+1)) + (-g(a+1) + g(a+2)) + \cdots + (-g(b) + g(b+1)) \\
&= -g(a) + (g(a+1) - g(a+1)) + \cdots + (g(b) - g(b)) + g(b+1) \\
&= -g(a) + g(b+1).
\end{aligned} \tag{6}$$
We call such series collapsing series. That is, if $f(n) = \Delta g(n) = g(n+1) - g(n)$,
then $\sum_{j=a}^{b} f(j) = g(b+1) - g(a)$. In other words, the sum of $b - a + 1$ terms of the function
f has been reduced to the difference between two terms of the function g. The hard
part, of course, is to find a function g satisfying $\Delta g(n) = f(n)$ for all n.
It is usually advisable to write out several of the terms of a series explicitly,
rather than use Sigma notation.
For future reference, let us state clearly the formula that we have established.

If $f(j) = \Delta g(j) = g(j+1) - g(j)$ for all j, then
$$\sum_{j=a}^{b} f(j) = g(b+1) - g(a). \tag{7}$$

In Example 3.2 above we had $f(j) = \frac{1}{j(j+1)}$, and putting $g(j) = -\frac{1}{j}$ we find that
for all j,
$$\Delta g(j) = g(j+1) - g(j) = -\frac{1}{j+1} + \frac{1}{j} = \frac{-j + (j+1)}{j(j+1)} = f(j).$$
So by the formula (7)
$$\sum_{j=1}^{n} f(j) = g(n+1) - g(1) = -\frac{1}{n+1} + 1 = \frac{n}{n+1}.$$
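Formula (7) is easy to test numerically: add up the terms f(j) directly and compare with g(b + 1) − g(a). The sketch below does this for Example 3.2 with exact fractions; the helper name `collapsing_sum` is hypothetical.

```python
from fractions import Fraction


def collapsing_sum(g, a, b):
    """Value of sum_{j=a}^{b} f(j) when f(j) = g(j + 1) - g(j), by formula (7)."""
    return g(b + 1) - g(a)


def f(j):
    return Fraction(1, j * (j + 1))


def g(j):
    return Fraction(-1, j)


n = 50
direct = sum(f(j) for j in range(1, n + 1))   # add all n terms
collapsed = collapsing_sum(g, 1, n)           # only two evaluations of g
print(direct == collapsed == Fraction(n, n + 1))  # True
```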

In order to use this technique of reducing a sum of n terms to the difference
between two terms, we need to be able to find a new function whose first
difference is the function to be summed. This is usually no easy matter. We shall
not investigate this problem at all in this course; instead, we just give some examples.

Example 3.3
We can compute a geometric series as a collapsing sum.
Observe that if $h(j) = 7^j$ then $\Delta h(j) = 7^{j+1} - 7^j = 7^j(7 - 1) = 6 \times 7^j$. More
generally, if $h(j) = r^j$, where r is some constant, then $\Delta h(j) = r^{j+1} - r^j = r^j(r - 1)$.
The thing to notice here is that $r - 1$ is a constant that does not involve j, and this
means that we can divide by it† and get a function whose first difference is $r^j$. That
is, if we define
$$g(j) = \frac{r^j}{r-1}$$
(for some $r \neq 1$) then we obtain
$$\Delta g(j) = \frac{r^{j+1}}{r-1} - \frac{r^j}{r-1} = \frac{r^{j+1} - r^j}{r-1} = \frac{r^j(r-1)}{r-1} = r^j.$$

So by the formula (7) it follows, for example, that
$$\sum_{j=0}^{100} r^j = g(101) - g(0) = \frac{r^{101}}{r-1} - \frac{r^0}{r-1} = \frac{r^{101} - 1}{r-1}.$$

(Remember that $r^0 = 1$, by definition, for all r.)


As we have seen, a series of the form $a + ar + ar^2 + \cdots + ar^{n-1}$ is a geometric
series.
Just as the first difference function for $g(j) = r^j/(r-1)$ is $\Delta g(j) = r^j$, if we put
$q(j) = ar^j/(r-1)$ then we obtain
$$\Delta q(j) = \frac{ar^{j+1}}{r-1} - \frac{ar^j}{r-1} = \frac{ar^{j+1} - ar^j}{r-1} = \frac{ar^j(r-1)}{r-1} = ar^j.$$

So
$$\begin{aligned}
\sum_{j=0}^{n-1} ar^j &= q(n) - q(0) \\
&= \frac{ar^n}{r-1} - \frac{ar^0}{r-1} \\
&= a\left(\frac{r^n - 1}{r-1}\right).
\end{aligned}$$

This agrees with formula (5), which we found before.

Example 3.4
Let’s find a formula for the sum of the first n positive integers, i.e. let’s compute the
series
$$\sum_{j=1}^{n} j.$$

† (provided that it is nonzero, of course)



We have already seen how to compute this series using Gauss’ method; now let’s use
a collapsing sum to get the same answer. If we define $g(j) = j^2$, then
$$\Delta g(j) = (j+1)^2 - j^2 = (j^2 + 2j + 1) - j^2 = 2j + 1.$$

It follows then that
$$\sum_{j=1}^{n} (2j+1) = \sum_{j=1}^{n} \Delta g(j) = g(n+1) - g(1) = (n+1)^2 - 1 = [n^2 + 2n + 1] - 1 = n^2 + 2n.$$

Therefore we see that $2\sum_{j=1}^{n} j + \sum_{j=1}^{n} 1 = n^2 + 2n$. But $\sum_{j=1}^{n} 1 = n$, so rearranging
we see
$$\sum_{j=1}^{n} j = \tfrac{1}{2}(n^2 + 2n - n) = \frac{n^2 + n}{2} = \tfrac{1}{2}n(n+1),$$

which agrees with what we found earlier.

Example 3.5
If we define $g(k) = k^3$ then
$$\Delta g(k) = (k+1)^3 - k^3 = 3k^2 + 3k + 1.$$

It follows that
$$\sum_{k=1}^{n} (3k^2 + 3k + 1) = \sum_{k=1}^{n} \Delta g(k) = g(n+1) - g(1) = (n+1)^3 - 1 = n^3 + 3n^2 + 3n.$$

We therefore see that
$$3\sum_{k=1}^{n} k^2 + 3\sum_{k=1}^{n} k + \sum_{k=1}^{n} 1 = n^3 + 3n^2 + 3n.$$

But $\sum_{k=1}^{n} 1 = n$, and the formula (4) tells us that $\sum_{k=1}^{n} k = \tfrac{1}{2}n(n+1)$. Therefore
$$3\sum_{k=1}^{n} k^2 = (n^3 + 3n^2 + 3n) - \tfrac{3}{2}n(n+1) - n,$$

giving the formula
$$\sum_{k=1}^{n} k^2 = \tfrac{1}{6}(2n^3 + 3n^2 + n) = \tfrac{1}{6}n(n+1)(2n+1).$$

Similarly, if we define $h(k) = k^4$ then we find that $\Delta h(k) = 4k^3 + 6k^2 + 4k + 1$,
and therefore
$$4\sum_{k=1}^{n} k^3 + 6\sum_{k=1}^{n} k^2 + 4\sum_{k=1}^{n} k + \sum_{k=1}^{n} 1 = (n+1)^4 - 1.$$
This can be used to obtain a formula for $\sum_{k=1}^{n} k^3$. (It turns out that the answer is
$\tfrac{1}{4}n^2(n+1)^2$.)
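The last step can be carried out mechanically: solve the collapsed identity for the sum of cubes, substituting the formulas already known for the lower powers. The sketch below does this with integer arithmetic and compares the result with a direct sum; the function name is an assumption made for illustration.

```python
def sum_cubes_via_collapsing(n):
    """Solve 4*S3 + 6*S2 + 4*S1 + n = (n + 1)^4 - 1 for S3 = sum of k^3."""
    sum_k = n * (n + 1) // 2                  # formula (4) with a = d = 1
    sum_k2 = n * (n + 1) * (2 * n + 1) // 6   # formula from Example 3.5
    four_sum_k3 = (n + 1)**4 - 1 - 6 * sum_k2 - 4 * sum_k - n
    return four_sum_k3 // 4


for n in (1, 2, 5, 10):
    direct = sum(k**3 for k in range(1, n + 1))
    closed = n**2 * (n + 1)**2 // 4           # the stated answer
    print(n, direct, sum_cubes_via_collapsing(n), closed)
```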

Exercises Set 3.1

1. Write the following in sigma notation:


(i ) 2 + 4 + 6 + · · · + 2n
(ii ) $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k-1}$
(iii ) 12 + 22 + 32 + . . . to n terms
(iv ) a + (a + d) + (a + 2d) + · · · + (a + (n − 1)d)

2. Evaluate the following:


P10 P20
(i ) k=1 (2k − 1) (ii ) r=1 4r
P8 k
P 5 
1 k−1
(iii ) k=0 3 (iv ) k=1 2 × 3

3. (i ) Evaluate $\sum_{k=2}^{20} \left(\frac{1}{k} - \frac{1}{k+1}\right)$.
Pn
(ii ) Simplify the sum r=0 (br+1 − br ).

4. An amount of $5000 is deposited in an account which pays 12% p.a., compounded


every 3 months. How much is in the account after 3 years?

5. The lease of a shop costs $10,000 per year payable half-yearly (in advance).
However, for the first three years the shopkeeper does not pay, but the debt
accumulates together with interest at the rate of 16% p.a., added half-yearly.
How much does the shopkeeper owe the landlord at the start of the 4th year?

§3.2 The definite integral

Suppose that f is a continuous function that is defined for all x in the interval [a, b].
In this section we shall introduce the concept of the integral of f(x) from x = a to
x = b, a quantity that is written as
$$\int_a^b f(x)\,dx.$$

The reason that this topic appears in a chapter entitled “Summation” is that
$\int_a^b f(x)\,dx$ is obtained from f by a kind of continuous summation process. The
precise meaning of this rather cryptic statement will become clearer in due course.
We have seen that the problem of evaluating a finite sum
$$\sum_{i=a}^{a+n-1} f(i)$$
reduces to evaluating $g(a+n) - g(a)$, provided we can find a function g(x) satisfying
$\Delta g(x) = f(x)$. The principal result of this section is an analogous result for integrals,
known as the Fundamental Theorem of Calculus. It says that
$$\int_a^b f(x)\,dx = F(b) - F(a),$$
where F is any function whose derivative is equal to f.
Our first task on the road towards defining the integral of a function is to investigate
sums of series with an infinite number of terms. The simplest infinite series
that has a meaningful sum is
$$1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{k-1}} + \cdots$$
(an infinite geometric series). Using the formula (5) from §3.1 gives
$$\sum_{k=1}^{n} \frac{1}{2^{k-1}} = \frac{1 - (\frac{1}{2})^n}{1 - \frac{1}{2}} = 2\bigl(1 - (\tfrac{1}{2})^n\bigr).$$

But $(\frac{1}{2})^n \to 0$ as $n \to \infty$; so $2\bigl(1 - (\frac{1}{2})^n\bigr) \to 2$ as $n \to \infty$. This makes it natural to
define
$$\sum_{k=1}^{\infty} \frac{1}{2^{k-1}} = 2.$$

In other words, the so-called infinite sum is really the limit of a sequence of finite
sums. The integral of a function will similarly be defined as the limiting value of a
sequence of finite sums.
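The idea that an infinite sum is the limit of finite sums can be seen by tabulating a few partial sums of the series above. The following sketch uses exact fractions; the function name `partial_sum` is illustrative.

```python
from fractions import Fraction


def partial_sum(n):
    """Sum of the first n terms 1 + 1/2 + 1/4 + ... + 1/2^(n-1)."""
    return sum(Fraction(1, 2**(k - 1)) for k in range(1, n + 1))


for n in (1, 2, 5, 10, 20):
    s = partial_sum(n)
    # The gap 2 - s equals 2 * (1/2)^n, which halves with each extra term.
    print(n, float(s), float(2 - s))
```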

Example 3.6

The rate of growth, F′(t), of a microbiological population (where F(t) = number of
cells present at time t) will usually vary with time. That is, it is a function of t,
usually not constant.
If F ′ (t) were constant—say F ′ (t) = k cells per minute—then in any interval of 60
minutes the number of cells formed would be 60k. In other words, F (t0 + 60) − F (t0 )
would be equal to 60k (for any value of t0 ).
The problem we wish to address now is this: if F ′ (t) is not constant, but we
know how it varies with time, can we still determine how many cells will be formed
in a given time interval?
For simplicity, let us assume that we are dealing with an interval of 60 minutes,
from t = 0 to t = 60. We want a method for finding F (60) − F (0); let us call this
quantity F.
If F′(t) does not vary too much then we could choose some representative value
of F′(t), say F′(30), and take this as an estimate for the value of F′(t) over the whole
interval. This gives the rough estimate

F ≈ 60F′(30).

We can improve this by breaking the 60 minute interval into smaller subintervals,
and using the same technique on each of these. For example, suppose that we use
three subintervals of length 20 minutes, and let $F_1$, $F_2$ and $F_3$ be the numbers of cells
produced in the first, second and third subintervals.

t (minutes)   number produced
0 – 20        $F_1$
20 – 40       $F_2$
40 – 60       $F_3$

Obviously, $F = F_1 + F_2 + F_3 = \sum_{i=1}^{3} F_i$. Now we can choose some time $t_1$ in the
first subinterval (perhaps $t_1 = 10$) and use $20F'(t_1)$ as an estimate for $F_1$. Similarly,
choosing some $t_2$ and $t_3$ in the second and third subintervals gives estimates $20F'(t_2)$
and $20F'(t_3)$ for $F_2$ and $F_3$, and an overall estimate
$$F \approx 20F'(t_1) + 20F'(t_2) + 20F'(t_3)$$
for the quantity of interest. This estimate will presumably be better than the previous
estimate of $60F'(30)$, since the amount of variation of $F'(t)$ on the subintervals is
certainly no more than it is over the whole interval.
Better and better estimates can be obtained by taking smaller and smaller subintervals.
If we took ten subintervals of 6 minutes each, and if we let $F_i$ be the number
of cells produced in the i-th subinterval, then
$$F = F_1 + F_2 + F_3 + F_4 + F_5 + F_6 + F_7 + F_8 + F_9 + F_{10}$$
and $F_i \approx 6F'(t_i)$, where $t_i$ is some point in the i-th subinterval. This gives
$$F \approx \sum_{i=1}^{10} 6F'(t_i).$$

Similarly, 120 subintervals of half minute length would give the estimate
$$F \approx \sum_{i=1}^{120} \tfrac{1}{2}F'(t_i),$$
which by now, hopefully, would be fairly accurate.

Clearly, the assumption in Example 3.6 that F(t) represented a microbiological
population had nothing whatever to do with the procedure used to analyse the problem.
In general, if F(t) is any time-dependent quantity for which F′(t) (the rate of
change of F(t)) is known, then we can estimate the change in F(t) between t = a
and t = b by the same method used in Example 3.6.
Let us work out the formula carefully. Suppose that we divide the interval [a, b]
into n subintervals of equal length. We use the notation ∆t for the common length
of the subintervals.† Thus
$$\Delta t = \frac{1}{n}(b - a).$$
† This usage of the symbol ∆ is somewhat different from the one in the section on finite
differences, although both are traditional.

Let us also write
$$\begin{aligned}
a_0 &= a = a + 0\,\Delta t, \\
a_1 &= a + 1\,\Delta t, \\
a_2 &= a + 2\,\Delta t, \\
&\;\;\vdots \\
a_n &= a + n\,\Delta t = b
\end{aligned}$$
so that the i-th subinterval is $[a_{i-1}, a_i]$. If $t_i$ is chosen as a point in $[a_{i-1}, a_i]$ then we
have the estimate
$$F(a_i) - F(a_{i-1}) \approx F'(t_i)\,\Delta t$$
for the change in value of F(t) over the i-th subinterval. Now
$$\begin{aligned}
F(b) - F(a) &= F(a_n) - F(a_0) \\
&= \sum_{i=1}^{n} \bigl(F(a_i) - F(a_{i-1})\bigr) \\
&\approx \sum_{i=1}^{n} F'(t_i)\,\Delta t.
\end{aligned}$$

This approximation tends to the true value as the number n increases, and so we can
write
$$F(b) - F(a) = \lim_{n\to\infty} \sum_{i=1}^{n} F'(t_i)\,\Delta t.$$

(Note that here ∆t and the numbers $t_i$ all depend on n.)


More generally still, suppose that f is any continuous function on the interval
[a, b]. We can perform the same calculation as above with f(t) replacing F′(t). The
value of the limit
$$\lim_{n\to\infty} \sum_{i=1}^{n} f(t_i)\,\Delta t$$
is called the definite integral of f(t) with respect to t, from t = a to t = b. It is
written as $\int_a^b f(t)\,dt$. (Read this as “the integral from a to b of f(t) dee t”.)
$$\int_a^b f(t)\,dt \overset{\text{def}}{=} \lim_{n\to\infty} \sum_{i=1}^{n} f(t_i)\,\Delta t \tag{1}$$
where $\Delta t = (b - a)/n$ and $t_i$ lies in the interval $[a + (i-1)\Delta t,\ a + i\Delta t]$.

The key point to note is that if f(t) is the derivative of some function F(t), then,
as we found previously,
$$\int_a^b f(t)\,dt = F(b) - F(a).$$

Example 3.7
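Let us work through an example with $F'(t) = t + 1$ (so that at any time t the
instantaneous rate of change of F(t) is given by t + 1), and estimate the increase
in F(t) for the period from t = 0 to t = 2. That is, we will find an estimate for
F(2) − F(0).
First divide [0, 2] into four equal subintervals, each of length $\Delta t = \frac{1}{2}$. The
subintervals are
$$[0, \tfrac{1}{2}],\quad [\tfrac{1}{2}, 1],\quad [1, \tfrac{3}{2}],\quad [\tfrac{3}{2}, 2].$$
Next select a point $t_i$ from each subinterval:
$$t_1 = 0,\quad t_2 = \tfrac{1}{2},\quad t_3 = 1,\quad t_4 = \tfrac{3}{2}$$
will do. Since $F'(t) = t + 1$, we have the following table of values.
$$\begin{array}{c|cccc}
t_i & 0 & \tfrac{1}{2} & 1 & \tfrac{3}{2} \\ \hline
F'(t_i) & 1 & \tfrac{3}{2} & 2 & \tfrac{5}{2}
\end{array}$$
The estimate for F(2) − F(0) is given by
$$\sum_{i=1}^{4} F'(t_i)\,\Delta t = 1 \times \tfrac{1}{2} + \tfrac{3}{2} \times \tfrac{1}{2} + 2 \times \tfrac{1}{2} + \tfrac{5}{2} \times \tfrac{1}{2} = \tfrac{7}{2}.$$

Suppose we keep the intervals the same, but choose instead $t_1 = \frac{1}{2}$, $t_2 = 1$,
$t_3 = \frac{3}{2}$ and $t_4 = 2$. We have
$$\begin{array}{c|cccc}
t_i & \tfrac{1}{2} & 1 & \tfrac{3}{2} & 2 \\ \hline
F'(t_i) & \tfrac{3}{2} & 2 & \tfrac{5}{2} & 3
\end{array}$$
and the estimate is
$$F(2) - F(0) \approx \tfrac{3}{4} + 1 + \tfrac{5}{4} + \tfrac{3}{2} = \tfrac{9}{2}.$$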

The first estimate was clearly too small. Since t + 1 is an increasing function
its value on each subinterval is minimum at the left hand end of the interval. Since
we took each ti to be the left hand end-point of its subinterval, we were consistently
underestimating the rate of increase of F on the subintervals. Similarly, the second
estimate was too large, since by taking the right hand end-points we were consistently
overestimating. So the true value of F (2)−F (0) lies somewhere between 7/2 and 9/2.
Now let us be more ambitious, and use 200 subintervals of length ∆t = 1/100.
Let $t_i$ be the left hand end-point of the i-th subinterval; this gives $t_i = (i-1)/100$,
and
$$F'(t_i) = 1 + \frac{i-1}{100}.$$
The estimate for F(2) − F(0) is
$$\sum_{i=1}^{200} F'(t_i)\,\Delta t = \sum_{i=1}^{200} \left(1 + \frac{i-1}{100}\right)\frac{1}{100} = \sum_{i=1}^{200} \left(\frac{1}{100} + \frac{i-1}{10000}\right).$$

This is an arithmetic series with first term 1/100, common difference 1/10000 and
200 terms. Using the formula
$$S = an + \tfrac{1}{2}n(n-1)d$$
(see (4) of §3.1) we obtain
$$\begin{aligned}
F(2) - F(0) &\approx \frac{200}{100} + \frac{200 \times 199}{2 \times 10000} \\
&= 2 + 1.99 \\
&= 3.99.
\end{aligned}$$

If we had chosen ti as the right hand end-point we would have obtained a similar
arithmetic series, the only difference being that the first term would be 0.0101 instead
of 0.01, making the an term in the formula equal to 2.02 instead of 2. So the estimate
for F (2) − F (0) would be 4.01.
Using 2000 intervals instead of 200 gives lower and upper estimates of 3.999 and
4.001, and using 20000 intervals gives 3.9999 and 4.0001. It looks as though the
true value is 4. One can in fact work out a general formula for an estimate with n
subintervals and take the limit as n → ∞, and confirm that 4 is correct.
There is, however, an easier way. We could just find a formula for F! In this
case, it is not hard to do this just by trial and error. In fact, if
$$F(t) = \tfrac{1}{2}t^2 + t$$
then $F'(t) = t + 1$. The formula $F(t) = \frac{1}{2}t^2 + t$ gives F(0) = 0 and F(2) = 4,
confirming that F(2) − F(0) = 4.
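The lower and upper estimates in Example 3.7 can be reproduced with a short program. This is a minimal sketch, assuming equal subintervals and end-point sampling; the function name `riemann_sum` and its `endpoint` parameter are illustrative choices, and exact fractions are used so that the printed values match the hand calculation.

```python
from fractions import Fraction


def riemann_sum(f, a, b, n, endpoint="left"):
    """Sum f(t_i) * dt over n equal subintervals of [a, b].

    t_i is the left or right end-point of the i-th subinterval.
    """
    dt = Fraction(b - a, n)
    offset = 0 if endpoint == "left" else 1
    return sum(f(a + (i - 1 + offset) * dt) * dt for i in range(1, n + 1))


def rate(t):
    return t + 1   # F'(t) from Example 3.7


for n in (4, 200, 2000):
    lower = riemann_sum(rate, 0, 2, n, "left")
    upper = riemann_sum(rate, 0, 2, n, "right")
    print(n, float(lower), float(upper))
```

For this particular integrand the two estimates work out to 4 − 2/n and 4 + 2/n, which makes the limiting value 4 plain.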

The moral to be drawn from the above example is this: one can calculate
$\int_a^b f(t)\,dt$ by a lengthy calculation involving sums and limits, or by a trivial
calculation if a function F can be found whose derivative is f. The catch is that the
problem of finding a function with a given derivative is not always easy.
A function F whose derivative is f is traditionally called a primitive of f , but
we shall prefer the more modern and intuitive term antiderivative.

Example 3.8
Imagine a journey in Ben’s VW from the Carslaw building (CB) to Bankstown Public
Library (BPL). Ben’s odometer (which measures distance travelled) is broken, but
his speedometer is working. We could estimate the length of the journey from CB to
BPL by noting the speed every few minutes and forming the sum
speed × bit of time + speed × bit of time + speed × bit of time + · · · .
That is, our estimate of distance is
$$\sum_i F'(t_i) \cdot \Delta t_i$$
where $\Delta t_i$ is the time between successive measurements. Note that it is not necessary
for the subintervals to all have the same length, as we assumed (for simplicity) in
our previous examples. So long as all of the subintervals are small enough a good
estimate can be obtained.
Note that speed is the derivative of distance travelled. Here F(t) is the distance
that has been travelled at time t, and F′(t) is the speed at time t. If $t_{CB}$ is the time
at the start of the journey and $t_{BPL}$ the time at the end, then $F(t_{CB}) = 0$ and the
length of the journey is
$$\begin{aligned}
F(t_{BPL}) &= F(t_{BPL}) - F(t_{CB}) \\
&= \int_{t_{CB}}^{t_{BPL}} F'(t)\,dt \\
&= \lim_{n\to\infty} \sum_i F'(t_i) \cdot \Delta t_i.
\end{aligned}$$

Of course, evaluating the sum for some finite number of terms n will only give
an estimate of the integral. All in all it would be better if Ben got his F (t) fixed, so
that the length of the journey would just be the difference of two odometer readings.
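The "speed × bit of time" sum with unequal time steps can be sketched as follows. The readings are made-up illustrative data, not measurements from the notes, and the sketch assumes that the speed noted at the start of each subinterval is used for that whole subinterval.

```python
def distance_estimate(readings):
    """Estimate distance from (time in hours, speed in km/h) readings.

    Readings must be sorted by time. Each speed is multiplied by the
    length of the subinterval that starts at its time.
    """
    total = 0.0
    for (t0, speed), (t1, _) in zip(readings, readings[1:]):
        total += speed * (t1 - t0)   # speed x bit of time
    return total


# Hypothetical speedometer readings at irregular times.
readings = [(0.0, 40.0), (0.1, 60.0), (0.25, 50.0), (0.5, 0.0)]
print(distance_estimate(readings))  # about 25.5 km
```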

Exercises Set 3.2

1. Evaluate the definite integrals:


R2 3 R1
(i ) −2
(x − x + 1) dx (ii ) 0
sin(2x + 1) dx
R2 R1 2
(iii ) 0 x2x+5 dx (iv ) 0
xe−x dx

2. A micro-organism is producing cells at a rate of $50e^{0.06t}$ cells per minute.

(i ) Estimate the number of cells, N(t), produced in the first three minutes by
dividing the time interval into six equal subintervals and evaluating the
sum $\sum_{i=1}^{6} N'(t_i) \times \frac{(3-0)}{6}$ (where each $t_i$ is in the ith subinterval).
(ii ) Find the number of cells produced in three minutes using an appropriate
definite integral.

3. A liquid is entering a collection chamber at the rate of $4t$ ml/sec, and leaving at
the rate of $t^2$ ml/sec.
(i ) Estimate the volume of liquid, V(t), which will accumulate in the first 4
seconds by dividing the time interval into 4 equal subintervals and evaluating
the sum $\sum_{i=1}^{4} V'(t_i) \times \frac{(4-0)}{4}$ (where each $t_i$ is in the ith subinterval).
(ii ) Find the volume which accumulates in the first 4 seconds using an appropriate
definite integral.

4. The volume, V , of a balloon is increasing at a rate of t + 1+ 23 t (t in seconds).
Find the volume after 8 seconds.

§3.3 The indefinite integral

Our discussion in §3.2 naturally leads us to consider the following problem: how can
we find a function F whose derivative is a given function f ? Doing this is called
integrating f .
There are many rules and techniques for integration that have been discovered,
and we shall meet several of them in this course. In general, integration is somewhat
harder than differentiation; so whenever you think that you have found an antideriva-
tive for a function f , you should always differentiate it and check that the answer
really is f .
Knowing how to differentiate makes it possible to integrate some functions by a
combination of guesswork and trial and error. For example, it is not too hard to find
a function F(x) satisfying $F'(x) = x^2$. Indeed,
$$\frac{d}{dx}\left(\frac{x^3}{3}\right) = \frac{3x^2}{3} = x^2.$$
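The advice to differentiate a candidate antiderivative and check the answer can also be followed numerically. The sketch below approximates the derivative with a central difference; it is only a sanity check under the stated tolerance, not a proof, and the function name, step size h and tolerance are illustrative assumptions.

```python
import math


def looks_like_antiderivative(F, f, xs, h=1e-5, tol=1e-6):
    """Check that the numerical derivative of F matches f at each x in xs."""
    for x in xs:
        slope = (F(x + h) - F(x - h)) / (2 * h)   # central difference
        if abs(slope - f(x)) > tol:
            return False
    return True


xs = [-2.0, -1.0, 0.0, 0.5, 2.0]
print(looks_like_antiderivative(lambda x: x**3 / 3, lambda x: x**2, xs))  # True
print(looks_like_antiderivative(lambda x: x**3, lambda x: x**2, xs))      # False
print(looks_like_antiderivative(math.sin, math.cos, xs))                  # True
```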

These days there are powerful computer algebra packages that are able to find
antiderivatives of many functions, and it can be very useful to employ such a program
when faced with a particularly complicated integration problem. Nevertheless, it is
still important to learn the techniques of integration to understand what the computer
is doing. This not only aids in spotting any errors, but also aids in constructing
a greater picture of what is going on. This means a better understanding of the
problem in general, but can also lead to, for example, being able to pick pathways
to solutions of entire classes of problems, or even finding solutions to problems that
remain unreachable by current computer packages.
Notation and terminology
If f is a given function, then a function F satisfying F′(x) = f(x) is called a primitive
or antiderivative of f. It is also called the indefinite integral of f, and the notation
$$F(x) = \int f(x)\,dx$$
is commonly used. The function f is called the integrand (meaning “thing to be
integrated”).
Notice that since the derivative of any constant function is zero, the zero function
has infinitely many antiderivatives. It follows that in fact every function that has an
antiderivative has infinitely many. For example, if

F1 (x) = 1 + sin x
F2 (x) = 15 + sin x
F3 (x) = −8 + sin x

then $F_1'(x) = F_2'(x) = F_3'(x) = \cos x$; so $F_1$, $F_2$ and $F_3$ are all antiderivatives of the
cosine function. Thus the expression
$$\int \cos x\,dx$$
does not uniquely determine any one function. Although it is legitimate to write
$F_1(x) = \int \cos x\,dx$, it is equally legitimate to write $F_2(x) = \int \cos x\,dx$ and to write
$F_3(x) = \int \cos x\,dx$, even though $F_1$, $F_2$ and $F_3$ are not equal. It is best not to regard
the indefinite integral $\int f(x)\,dx$ as a function, but to think of the equation
$$F(x) = \int f(x)\,dx$$
as just an alternative notation for the equation
$$f(x) = F'(x).$$

Because of this non-uniqueness property of the indefinite integral, some people
take the view that it is incorrect to write equations such as
$$\int (2x + 1)\,dx = x^2 + x, \tag{1}$$
and that one should always write instead
$$\int (2x + 1)\,dx = x^2 + x + C \quad \text{(where } C \text{ is an arbitrary constant).}$$

The constant C is sometimes called the “constant of integration”.


In this course we shall not consider Eq.(1) to be wrong. However, you should
always remember that the right hand side can be changed by the addition of any
constant, and all of the alternatives are equally correct.

Example 3.9
The best way to learn how to find antiderivatives is to find the derivatives of a
large number of functions. Altering your point of view, you have then found the
antiderivatives of a large number of functions, since a table of antiderivatives is just
a table of derivatives in reverse. So let us differentiate some functions, and reformulate
the results as indefinite integrals.
• Since $\frac{d}{dx}\sin x = \cos x$, it follows that $\int \cos x\,dx = \sin x + C$ (for any constant C).
• Since $\frac{d}{dx}x^n = nx^{n-1}$, it follows that $\int nx^{n-1}\,dx = x^n + C$.
Of course, it would be nicer to have a formula for $\int x^n\,dx$, and with a small amount
of ingenuity we can achieve this. Firstly, since the formula we obtained above
is valid for all values of n, we can put n + 1 in place of n. This shows that
$\int (n+1)x^n\,dx = x^{n+1} + C$, corresponding to the fact that $\frac{d}{dx}x^{n+1} = (n+1)x^n$.
Provided that $n \neq -1$, we can now just divide through by n + 1.
• Since $\frac{d}{dx}\left(\frac{x^{n+1}}{n+1}\right) = x^n$, it follows that $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$, for $n \neq -1$.
• Since $\frac{d}{dx}\ln x = \frac{1}{x}$, it follows that $\int \frac{1}{x}\,dx = \ln x + C$, provided that x > 0.
The proviso that x > 0 has to be inserted, since the function ln x is only defined for
x > 0. But if x < 0 then ln(−x) is defined, and differentiating we find that
$$\frac{d}{dx}\ln(-x) = \frac{1}{-x} \times (-1) = \frac{1}{x}.$$
So if we are concerned with negative x values rather than positive ones, the following
rule applies:
• $\int \frac{1}{x}\,dx = \ln(-x) + C$, provided that x < 0.
Differentiating $e^x$, $\cos x$ and $\tan x$ immediately gives us the following three formulas:
• $\int e^x\,dx = e^x + C$;
• $\int \sin x\,dx = -\cos x + C$;
• $\int \sec^2 x\,dx = \tan x + C$.

Finally, the general rules that
$$\frac{d}{dx}\bigl(f(x) + g(x)\bigr) = \frac{d}{dx}f(x) + \frac{d}{dx}g(x)$$
and
$$\frac{d}{dx}\bigl(kf(x)\bigr) = k\frac{d}{dx}f(x)$$
for all differentiable functions f and g and all constants k give us the following general
rules for integration.
• $\int \bigl(f(x) + g(x)\bigr)\,dx = \int f(x)\,dx + \int g(x)\,dx$;
• $\int kf(x)\,dx = k\int f(x)\,dx$.

The above integration rules, and a few others, are listed in Appendix 4. In the
next example we illustrate how these rules can be used to integrate various functions.

Example 3.10
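(i) $\displaystyle \int x^6\,dx = \frac{x^7}{7} + C.$

(ii) $\displaystyle \int x^{-4}\,dx = \frac{x^{-3}}{-3} + C.$

(iii) $\displaystyle \int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{x^{3/2}}{3/2} + C = \frac{2x\sqrt{x}}{3} + C.$

(iv) $\displaystyle \int \left(x^3 + 2x^2 + 3 - \frac{4}{x^6}\right) dx = \int x^3\,dx + 2\int x^2\,dx + \int 3\,dx - 4\int x^{-6}\,dx = \frac{x^4}{4} + \frac{2x^3}{3} + 3x + \frac{4x^{-5}}{5} + C.$

(v) $\displaystyle \int y^4(1 - y)^2\,dy = \int (y^4 - 2y^5 + y^6)\,dy = \frac{y^5}{5} - \frac{2y^6}{6} + \frac{y^7}{7} + C.$

(vi) $\displaystyle \int \frac{1 - u}{\sqrt{u}}\,du = \int \left(u^{-1/2} - u^{1/2}\right) du = \frac{2\sqrt{u}}{3}(3 - u) + C.$

(vii) $\displaystyle \int 5e^x\,dx = 5e^x + C.$

(viii) $\displaystyle \int \frac{x^2 + 1}{x}\,dx = \int (x + x^{-1})\,dx = \begin{cases} \dfrac{x^2}{2} + \ln x + C & \text{on intervals where } x > 0, \\[2mm] \dfrac{x^2}{2} + \ln(-x) + C & \text{on intervals where } x < 0. \end{cases}$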

Integration using the chain rule (or “function of a function” rule) in reverse
Recall the formula for the derivative of a composite function f(g(x)):
$$\frac{d}{dx}\bigl(f(g(x))\bigr) = f'(g(x))\,g'(x).$$

Integrating both sides with respect to x gives
$$\int f'(g(x))\,g'(x)\,dx = f(g(x)) + C.$$

Example 3.11
(i) $\displaystyle \frac{d}{dx}\bigl(\sin(x^2 + 3)\bigr) = \cos(x^2 + 3)\,\frac{d}{dx}(x^2 + 3) = 2x\cos(x^2 + 3).$
Therefore, $\displaystyle \int 2x\cos(x^2 + 3)\,dx = \sin(x^2 + 3) + C.$

(ii) $\displaystyle \frac{d}{dx}(3x^3 + 5)^9 = 9(3x^3 + 5)^8(9x^2) = 81x^2(3x^3 + 5)^8.$
Therefore, $\displaystyle \int 81x^2(3x^3 + 5)^8\,dx = \int 9(3x^3 + 5)^8(9x^2)\,dx = (3x^3 + 5)^9 + C.$

Many integration problems involve recognising that the integrand is of the form
f (g(x)) g ′ (x): the product of a composite function and the derivative of the “inside

function”. We give several examples.

Example 3.12
(i) With $g(x) = x^2$,
$$\int e^{x^2}\,2x\,dx = \int e^{g(x)}g'(x)\,dx = e^{g(x)} + c = e^{x^2} + c.$$

(ii) With $g(x) = 1 + x^2$,
$$\int x\sqrt{1 + x^2}\,dx = \frac{1}{2}\int 2x\sqrt{1 + x^2}\,dx = \frac{1}{2}\int g'(x)\,(g(x))^{1/2}\,dx = \frac{1}{2}\,\frac{(g(x))^{3/2}}{3/2} + c = \frac{(1 + x^2)^{3/2}}{3} + c.$$

(iii) With $g(x) = \sin x$,
$$\int \sin x\cos x\,dx = \int g(x)g'(x)\,dx = \frac{(g(x))^2}{2} + c = \frac{\sin^2 x}{2} + c.$$

(iv) With $g(x) = e^x - 2$,
$$\int e^x(e^x - 2)^4\,dx = \int g'(x)(g(x))^4\,dx = \frac{(g(x))^5}{5} + c = \frac{(e^x - 2)^5}{5} + c.$$

(v) With $g(x) = x^2 + 1$,
$$\int \frac{x}{x^2 + 1}\,dx = \frac{1}{2}\int 2x(x^2 + 1)^{-1}\,dx = \frac{1}{2}\int g'(x)(g(x))^{-1}\,dx = \frac{1}{2}\ln|g(x)| + c = \frac{1}{2}\ln(x^2 + 1) + c.$$

(vi) With $g(x) = \sqrt{x}$,
$$\int \frac{\cos\sqrt{x}}{\sqrt{x}}\,dx = 2\int \cos\sqrt{x}\,\frac{1}{2\sqrt{x}}\,dx = 2\int \cos(g(x))\,g'(x)\,dx = 2\sin g(x) + c = 2\sin\sqrt{x} + c.$$

(vii) With $g(x) = \cos x$,
$$\int \cos^2 x\sin x\,dx = -\int \cos^2 x\,(-\sin x)\,dx = -\int (g(x))^2 g'(x)\,dx = -\frac{(g(x))^3}{3} + c = -\frac{\cos^3 x}{3} + c.$$

(viii) With $g(t) = 2 + \sin^2 t$,
$$\int \frac{\sin t\cos t}{2 + \sin^2 t}\,dt = \frac{1}{2}\int \frac{2\sin t\cos t}{2 + \sin^2 t}\,dt = \frac{1}{2}\int g'(t)(g(t))^{-1}\,dt = \frac{1}{2}\ln|g(t)| + c = \frac{1}{2}\ln(2 + \sin^2 t) + c.$$

(ix) With $g(x) = \ln x$,
$$\int \frac{\ln x}{x}\,dx = \int g(x)g'(x)\,dx = \frac{(g(x))^2}{2} + c = \frac{(\ln x)^2}{2} + c.$$

3.3.1 Integral curves

One can perhaps regard the indefinite integral

Z
f (x) dx = F (x) + c

as really being a family of functions: one function for each different choice of the
constant of integration c. The graphs of the different functions in this family give a
family of curves, called integral curves.

The diagram below illustrates the family of curves obtained from
the indefinite integral $\int (3x^2 - 1)\,dx = x^3 - x + c$. The graphs of $y = x^3 - x + c$
corresponding to c = −3, −2, −1, 0, 1 and 2 have been plotted over the interval
[−1.7, 1.7].

Consider the integral curves y = F (x) + c1 and y = F (x) + c2 , corresponding


to two different values of c. For any value of x, the vertical distance between these
two graphs is the difference between c1 and c2 (since the vertical line x = x0 cuts the
first graph at (x0 , F (x0 ) + c1 ) and the second at (x0 , F (x0 ) + c2 ), and the distance
between these two points is |c2 − c1 |). Thus, in some sense, the graphs are parallel
to each other: moving the first c2 − c1 units vertically will place it exactly on the
second. Note, however, that (x0 , F (x0 ) + c2 ) is not usually the point on the second
graph that is closest to (x0 , F (x0 )+c1 ), and, unlike the vertical distance, the shortest
distance between the two graphs does vary as you move along the graphs. (They get
closer together as they get steeper.)

Notice that any given point $(x_0, y_0)$ will lie on one of the integral curves, since
you can take any one of the curves and move it vertically (up or down) until it passes
through $(x_0, y_0)$. For example, it is easy to find the value of c for which $y = x^3 - x + c$
passes through the point (2, 3): since $(2, 2^3 - 2 + c)$ lies on the curve, you just need
to solve $2^3 - 2 + c = 3$. (The answer is c = −3.)

[Figure: the integral curves $y = x^3 - x + c$ for c = −3, −2, −1, 0, 1 and 2, with the point (2, 3) marked.]
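Picking out the integral curve through a given point amounts to solving $F(x_0) + c = y_0$ for c. A minimal sketch follows; the function name `constant_through_point` is hypothetical.

```python
def constant_through_point(F, x0, y0):
    """Return the c for which y = F(x) + c passes through (x0, y0)."""
    return y0 - F(x0)


def F(x):
    return x**3 - x   # one antiderivative of 3x^2 - 1


c = constant_through_point(F, 2, 3)
print(c)              # -3
print(F(2) + c)       # 3, so the curve y = x^3 - x - 3 does pass through (2, 3)
```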

Example 3.13

Sketch the integral curves of ex , and find the particular curve that contains the point
(2, 1).

Solution: We have that $\int e^x\,dx = e^x + C$. The curve $y = e^x + C$ passes through
the point $(2, e^2 + C)$, and so we want to choose C so that $e^2 + C = 1$. That is,
$C = 1 - e^2$, and the particular curve required is $y = e^x + 1 - e^2$. (Note that $1 - e^2 \approx -6.39$.)

To sketch these curves, remember that $e^x > 0$ for all x and $e^x \to 0$ as $x \to -\infty$.
Thus the graph of $y = e^x + C$ approaches the horizontal line y = C as we move along
it in the negative x-direction. Also, $e^x + C$ takes the value 1 + C at x = 0, since
$e^0 = 1$. For x > 0 the slope of the graph increases rapidly as x increases. To get a
reasonably accurate picture it is necessary to plot a few points.

[Figure: the integral curves $y = e^x + C$ for C = 0, −1, −2, −3, −4, −5 and −6.39, with the point (2, 1) marked.]

The diagram shows the curves corresponding to C = 0, −1 and −2 plotted from
x = −9 to x = 2, and the curves corresponding to C = −3, C = −4 and C = −6.39
plotted from x = −9 to x = 2.3.

Direction fields
Sometimes it is very difficult, or even impossible, to find a formula for an antiderivative
of a given function f. Try as you might, you will never be able to find a formula,
in terms of the standard functions, for an F(x) such that $F'(x) = e^{x^2}$, for example.
However, even in such cases it is still possible to
make a sketch of the integral curves for $\int f(x)\,dx$.
We have seen that every point (x0 , y0 ) in the plane lies on one of these integral
curves y = F (x) + c. Observe that the slope of y = F (x) + c at the point (x0 , y0 )
is F ′ (x0 ) = f (x0 ), which we can compute just by knowing the function f . So, for
every (x0 , y0 ) in the plane, we know the slope at (x0 , y0 ) of the integral curve that
(x0 , y0 ) lies on. In other words, we know the direction in which the curve is going as
it passes through (x0 , y0 ).
This collection of directions, one direction for each point in the plane, constitutes
what is known as a direction field. One can imagine the entire plane covered with
short line segments, one at each point, with slopes as given by the direction field. If
enough of these line segments are drawn, it becomes possible to draw integral curves
by simply joining them up.
In the diagram we have used the function $f(x) = 3x^2 - 1$ (the integral curves of
which were sketched previously). Calculating f(x) at intervals of length 0.25, from
x = −1.5 to x = 1.5, gives the following table of values (rounded to two decimals).

x      −1.5   −1.25  −1     −0.75  −0.5   −0.25  0      0.25   0.5    0.75   1      1.25   1.5
f(x)   5.75   3.69   2      0.69   −0.25  −0.81  −1     −0.81  −0.25  0.69   2      3.69   5.75

For each of these x values we have drawn line segments of slope f(x) at a number of
points (x, y), with y ranging from −1.5 to 1.5. The diagram enables one to see roughly
where the integral curves must go. (One has been drawn in, as a dotted line.)

[Figure: direction field for $f(x) = 3x^2 - 1$, with one integral curve drawn in as a dotted line.]
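The table of slopes for the direction field can be reproduced with a few lines of Python. This is only a sketch: the names `slope`, `xs` and `slopes` are our own choices, and the sampling step of 0.25 is taken from the text.

```python
def slope(x):
    """Slope f(x) = 3x^2 - 1 of every integral curve at horizontal position x."""
    return 3 * x**2 - 1

# Sample x from -1.5 to 1.5 in steps of 0.25, as in the table of values.
xs = [-1.5 + 0.25 * i for i in range(13)]
slopes = [slope(x) for x in xs]

# One row per sample point: the short line segments drawn above and below
# this x all share the same slope, because f does not depend on y.
for x, m in zip(xs, slopes):
    print(f"x = {x:5.2f}   slope = {m:5.2f}")
```

Because the slope depends only on x, every segment in a given vertical column of the direction field is parallel to the others in that column.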

3.3.2 Finding an amount given its rate of change

If we know how a quantity Q(t) varies with time t, differentiating gives the rate
of change, Q′ (t). If we are instead given the rate of change, as a function q(t), we
need to integrate it to find Q(t).
Similarly, when graphing the function y = f (x), the derivative gives the slope
of the tangent at each point. If we are instead given a formula for the slope, as a
function of x, we need to integrate this to find f (x).
The problem of finding a formula for a quantity given its rate of change is equivalent
to finding the equation of a function given the slope of its graph.
Example 3.14
(1) Find the equation of a curve that has a slope of $3x^2$ at each point.
Solution: Let the curve be y = f(x). Then $f'(x) = 3x^2$, and so
\[ f(x) = \int 3x^2\,dx = x^3 + C. \]
Therefore, the equation of a suitable curve is $y = x^3 + C$, for any choice of C.
Notice that without some additional requirement, the answer is not unique. But
if, for example, we were also told that the curve must pass through (0, 1), then we
could use this to compute C. Indeed, f(0) = 1 would give $0^3 + C = 1$, and so the
curve required would be $y = x^3 + 1$.

(2) Find the equation of the curve that passes through the point (2, 5) and satisfies
\[ \frac{dy}{dx} = 3x^2 - 8x + 7. \]
Solution: Let the equation be y = f(x), so that $f'(x) = 3x^2 - 8x + 7$. Then
\[ f(x) = \int (3x^2 - 8x + 7)\,dx = x^3 - 4x^2 + 7x + C, \]
which is the general solution before imposing the requirement that the graph must
pass through (2, 5). The formula gives
\[ f(2) = 2^3 - 4 \times 2^2 + 7 \times 2 + C = 8 - 16 + 14 + C = 6 + C, \]
and so f(2) = 5 forces C = −1. Therefore $f(x) = x^3 - 4x^2 + 7x - 1$ is the particular
solution sought.

(3) A migrating goose has a speed of $12 - \frac{t}{5}$ kph, where t is the time in hours.
Starting at dawn (t = 0) it flies for 12 hours. How many kilometres are flown in the
first 6 hours? How many in the next 6 hours?
Solution: The distance travelled between t = a and t = b is given by
\[ \int_a^b f(t)\,dt = F(b) - F(a) \]
where f(t) is the speed at time t and F is an antiderivative of f. (We are applying the
Fundamental Theorem of Calculus to evaluate a definite integral.) In this problem
\[ F(t) = \int \Bigl(12 - \frac{t}{5}\Bigr)\,dt = 12t - \frac{t^2}{10} + c, \]
and so the distance travelled between t = 0 and t = 6 is
\[ F(6) - F(0) = \Bigl(12 \times 6 - \frac{6^2}{10} + c\Bigr) - (0 - 0 + c) = 68.4 \text{ kilometres.} \]
Notice that the constant c cancels out; so in this kind of problem there is no
need to include it in the first place.
We remark that it is usual practice in these problems to use the notation
\[ \bigl[F(t)\bigr]_a^b \overset{\text{def}}{=} F(b) - F(a). \]
Using this, the distance the goose travels between t = 6 and t = 12 is
\[ \int_6^{12} \Bigl(12 - \frac{t}{5}\Bigr)\,dt = \Bigl[12t - \frac{t^2}{10}\Bigr]_6^{12} = (144 - 14.4) - (72 - 3.6) = 61.2 \text{ kilometres.} \]
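As a check on the two distances, the definite integrals can be estimated numerically by adding up speed × time over many short time steps. The sketch below uses the midpoint rule; the helper names `speed` and `distance` and the choice of 1000 steps are our own assumptions, not part of the example.

```python
def speed(t):
    """Speed of the goose in km/h, t hours after dawn."""
    return 12 - t / 5

def distance(a, b, n=1000):
    """Midpoint-rule estimate of the integral of speed(t) from t = a to t = b."""
    dt = (b - a) / n
    return sum(speed(a + (i + 0.5) * dt) for i in range(n)) * dt

first_leg = distance(0, 6)     # first 6 hours
second_leg = distance(6, 12)   # next 6 hours
print(round(first_leg, 1), round(second_leg, 1))
```

Since the speed is a linear function of t, the midpoint rule agrees with the antiderivative calculation up to rounding error.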

(4) A population of size N(t) has a rate of change $R(t) = -t(t - 100) = N'(t)$.
(i) Represent N(t) as an integral.
(ii) Assuming that N(0) = 50, find the function N(t).
(iii) Continuing on from (ii), find N(50).
Solution: (i) We are given that $N'(t) = -t(t - 100) = 100t - t^2$. Therefore
\[ N(t) = \int (100t - t^2)\,dt = 50t^2 - \tfrac{1}{3}t^3 + c. \]
(ii) The formula just derived gives N(0) = 0 − 0 + c = c. Since we require N(0) = 50
it follows that c = 50. So
\[ N(t) = 50t^2 - \tfrac{1}{3}t^3 + 50. \]
(iii) Substituting in t = 50 gives
\[ N(50) = 50 \times 50^2 - \frac{50^3}{3} + 50 \approx 83{,}400 \]
(to the nearest hundred).

(5) Suppose that a population has a rate of growth given by three times the popu-
lation size. In other words, if N (t) is the number of individuals in the population at
time t, then at time t the rate of growth of the population is 3N (t) individuals per
unit time. (Since we have not been told the unit of time employed, we do not really
know how fast this population is growing.)
(i) Find N(t) if $N(0) = 3.2 \times 10^4$.
(ii) Calculate N (10).
Solution: (i) We are given that $N'(t) = 3N(t)$. Dividing by N(t) we obtain
\[ \frac{N'(t)}{N(t)} = 3. \]
The point of this trick is that we can now integrate, using the chain rule in reverse:
\[ \int \frac{1}{N(t)}\,N'(t)\,dt = \ln N(t) + c, \tag{2} \]
where c is a constant, as yet unknown. But the left hand side of Eq. (2) has to equal
$\int 3\,dt = 3t + c'$, for some $c'$, and so we obtain
\[ \ln N(t) + c = 3t + c'. \]
Of course, there is certainly no need to include two constants of integration when
integrating a single equation, and it would be more usual to proceed straight from
Eq. (2) to
\[ \ln N(t) = 3t + C \tag{3} \]
(where this C is related to c and c′ via C = c′ − c). Now Eq. (3) gives
\[ N(t) = e^{3t + C} = e^{3t} e^{C} = K e^{3t}, \]
where $K = e^C$.
This is an example of the so called exponential growth model. Note that in
practice exponential growth cannot be sustained for very long, since the exponen-
tial function increases so rapidly that some previously ignored physical restriction is
bound to manifest itself, slowing the rate of increase.
By the formula, $N(0) = Ke^0 = K$. So $N(0) = 3.2 \times 10^4$ gives $K = 3.2 \times 10^4$, and
\[ N(t) = 32000\,e^{3t}. \]
(ii) The formula above gives $N(10) = 32000\,e^{30} \approx 3 \times 10^{17}$. (It is hard to believe
that such an astronomic population size could ever be achieved in any real situation.)
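The solution of part (5) can be sanity-checked numerically: the function $N(t) = 32000\,e^{3t}$ should satisfy $N'(t) = 3N(t)$. The sketch below estimates the derivative with a central difference; the step size `h`, the test time `t` and the variable names are our own assumptions.

```python
import math

K = 3.2e4  # initial population N(0)

def N(t):
    """Exponential growth model N(t) = K e^(3t)."""
    return K * math.exp(3 * t)

# Central-difference estimate of N'(t) at an arbitrarily chosen time t = 1.
h = 1e-6
t = 1.0
derivative = (N(t + h) - N(t - h)) / (2 * h)

# Relative gap between the estimated N'(t) and 3 N(t); it should be tiny.
relative_gap = abs(derivative - 3 * N(t)) / (3 * N(t))
print(relative_gap, N(10))
```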

Exercises Set 3.3


1. Using $\int f'(g(x))\,g'(x)\,dx = f(g(x)) + c$, find the following integrals:
(1) $\int 2x(1 + x^2)^2\,dx$        (2) $\int p^{1/2}(1 + p^{3/2})^3\,dp$
(3) $\int x^2 \sin x^3\,dx$        (4) $\int \frac{\cos\sqrt{x}}{\sqrt{x}}\,dx$
(5) $\int \sin^2 t \cos t\,dt$        (6) $\int \frac{x\,dx}{\sqrt{5 + x^2}}$
(7) $\int \frac{3r\,dr}{\sqrt{1 - r^2}}$        (8) $\int t^2 (1 + 2t^3)^{-2/3}\,dt$
(9) $\int \sqrt{2 + 5y}\,dy$        (10) $\int x^2 \sqrt{x^3 + 4}\,dx$
(11) $\int \frac{y\,dy}{\sqrt{2y^2 + 1}}$        (12) $\int \frac{(z + 1)\,dz}{\sqrt[3]{z^2 + 2z + 2}}$
(13) $\int x \sin(2x^2)\,dx$        (14) $\int \cos(2x + 4)\,dx$
(15) $\int \frac{\sin 2t\,dt}{\sqrt{2 - \cos 2t}}$        (16) $\int 2 \sin z \cos z\,dz$
(17) $\int \frac{\cos 2t\,dt}{\sin^2 2t}$        (18) $\int x^2 (2 + 3x^3)^{1/2}\,dx$
(19) $\int \frac{x\,dx}{x^2 + 1}$        (20) $\int \cos^2 2x \sin 2x\,dx$
(21) $\int \sin^3\frac{y}{2} \cos\frac{y}{2}\,dy$        (22) $\int \frac{(t^2 + 2t)\,dt}{(t^3 + 3t^2 + 1)^3}$
(23) $\int x^2 e^{x^3}\,dx$        (24) $\int \frac{\ln x}{x}\,dx$

2. For each of the integrands,


(a) f (x) = −6 (b) f (x) = −2x + 2 (c) f (x) = x3 − 4x
perform the following:
(i ) Determine the associated one-parameter family of integral curves.
(ii ) Sketch the graph of the integral curves associated with
C = −3, C = 0, C = 3
(iii ) On the same graph, sketch the particular integral curve passing through
(1, 3). Give the equation of this curve.

§3.4 Applications of the definite integral

3.4.1 Area under a curve


Suppose that f(x) ≥ 0 on the interval [a, b], and let A be the area of the region
bounded by the graph of y = f(x), the vertical lines x = a and x = b, and the x-axis
(y = 0). The area A is shaded in the following diagram.

[Figure: the region under the graph of y = f(x) between x = a and x = b, shaded.]

An estimate for A can be found as follows. Divide the area into n vertical strips,
each of width $\Delta x = \frac{b-a}{n}$, and draw a horizontal line across the top of each strip,
crossing the curve at some point within the strip. The area A is then approximated
by the sum of the areas of n narrow rectangles, each of width $\Delta x$ and height $f(x_i)$,
where $x_i$ is some point in the interval $[a + (i-1)\Delta x,\ a + i\Delta x]$. The next diagram
illustrates a typical narrow rectangle.

[Figure: a typical narrow rectangle of width $\Delta x$ and height $f(x_i)$ under the graph of y = f(x), between x = a and x = b.]

Each of the n rectangles has area $f(x_i)\Delta x$ (for some $x_i$), and hence
\[ A \approx \sum_{i=1}^{n} f(x_i)\,\Delta x. \]
Clearly, as n increases (that is, as the number of rectangles increases and their width
$\Delta x$ is made smaller) we obtain better estimates for A. As n increases indefinitely we
have
\[ A = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = \int_a^b f(x)\,dx. \]
Here we have used the definition of the definite integral, as given in §3.2.

Example 3.15

(1) Find the area under the curve f(x) = x/2 on the interval [0, 2].

[Figure: graph of y = x/2 for 0 ≤ x ≤ 2.]

Solution:
\[ A = \int_0^2 \frac{x}{2}\,dx = \Bigl[\frac{x^2}{4}\Bigr]_0^2 = 1. \]

(2) Find the area enclosed between y = (x − 1)(4 − x) and y = 0.

[Figure: graph of y = (x − 1)(4 − x) for 1 ≤ x ≤ 4.]

Solution:
\[ A = \int_1^4 (-x^2 + 5x - 4)\,dx = \Bigl[-\frac{x^3}{3} + \frac{5x^2}{2} - 4x\Bigr]_1^4 = 4.5. \]
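The limit of rectangle sums described in §3.4.1 can be watched converging to the two areas just found. In this sketch each $x_i$ is taken to be the right-hand end of its strip (one permissible choice among many); the function names and the values of n are our own assumptions.

```python
def riemann_sum(f, a, b, n):
    """Sum of n rectangle areas f(x_i) * dx, with x_i the right end of strip i."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) * dx

def line(x):
    return x / 2                # integrand of part (1)

def parabola(x):
    return (x - 1) * (4 - x)    # integrand of part (2)

# The estimates approach the areas 1 and 4.5 as n grows.
for n in (10, 100, 1000):
    print(n, riemann_sum(line, 0, 2, n), riemann_sum(parabola, 1, 4, n))
```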

3.4.2 Area between two curves


Suppose that g(x) ≥ f(x) on [a, b].

[Figure: the region A between the curves y = g(x) (above) and y = f(x) (below) from x = a to x = b, with a typical strip of width $\Delta x$ and height $g(x_i) - f(x_i)$ at $x_i$.]

The area A bounded by the two curves y = g(x) and y = f(x) and the vertical lines
x = a and x = b can be found by the same method as used in §3.4.1 to find the area
under a curve. That is, divide the area into n narrow strips, each of width $\Delta x = \frac{b-a}{n}$,
and find the limit of the sum of the areas of these n strips as n → ∞.
Note that as long as g(x) ≥ f(x), the height of the i-th strip is $g(x_i) - f(x_i)$,
for some $x_i$ in the subinterval. (Note that this is true regardless of whether f(x) (or,
indeed, g(x)) is above or below the x-axis; all that matters is that g(x) ≥ f(x).) So
\[ A = \lim_{n \to \infty} \sum_{i=1}^{n} \bigl(g(x_i) - f(x_i)\bigr)\,\Delta x = \int_a^b \bigl(g(x) - f(x)\bigr)\,dx. \]

Example 3.16
(1) Find the area between the curves y = g(x) and y = f(x) for 0 ≤ x ≤ 5, where
g(x) = 2 and $f(x) = 2 - 2e^{-x}$.

[Figure: the line y = 2 and the curve $y = 2 - 2e^{-x}$, plotted for 0 ≤ x ≤ 6.]

Note that $e^{-x} > 0$ for all x; so $2 - 2e^{-x} < 2$. (As the diagram shows, at x = 5 the
difference between $2 - 2e^{-x}$ and 2 is very small.) So the formula $A = \int_a^b (g(x) - f(x))\,dx$
applies, and it gives
\[ A = \int_0^5 \bigl(2 - (2 - 2e^{-x})\bigr)\,dx = \int_0^5 2e^{-x}\,dx = \bigl[-2e^{-x}\bigr]_0^5 = 2 - \frac{2}{e^5} \approx 1.98652. \]

(2) To find the area between two intersecting curves, we must first find the x-values
of all points of intersection. For example, let us find the area between $y = 5x - x^2$
and $y = x^2 - 5x + 4.5$. The first step is to solve for the intersection(s) as follows:
\[ 5x - x^2 = x^2 - 5x + 4.5, \]
\[ 2x^2 - 10x + 4.5 = 0, \]
\[ (2x - 1)(x - 4.5) = 0, \]
giving x = 0.5 or x = 4.5. Thus we find that there are two points of intersection,
one at (0.5, 2.25) and the other at (4.5, 2.25).

[Figure: the parabolas $y = 5x - x^2$ and $y = x^2 - 5x + 4.5$, intersecting at x = 0.5 and x = 4.5.]

For x in the interval [0.5, 4.5] we have
$5x - x^2 \ge x^2 - 5x + 4.5$, and the area sought is
\[ A = \int_{0.5}^{4.5} \bigl((5x - x^2) - (x^2 - 5x + 4.5)\bigr)\,dx = \int_{0.5}^{4.5} (10x - 2x^2 - 4.5)\,dx = \Bigl[5x^2 - \tfrac{2}{3}x^3 - 4.5x\Bigr]_{0.5}^{4.5} = 21\tfrac{1}{3}. \]
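The two steps of part (2), finding the intersections and then integrating the difference of the curves, can be mirrored numerically. This is a sketch under our own naming (`upper`, `lower`, `area_between`); the intersections come from the quadratic formula and the integral from a midpoint rule with an arbitrarily chosen 2000 strips.

```python
import math

def upper(x):
    return 5 * x - x**2          # the curve on top for 0.5 <= x <= 4.5

def lower(x):
    return x**2 - 5 * x + 4.5    # the curve underneath on the same interval

# Step 1: intersections are the roots of 2x^2 - 10x + 4.5 = 0.
a, b, c = 2.0, -10.0, 4.5
root = math.sqrt(b * b - 4 * a * c)
left, right = (-b - root) / (2 * a), (-b + root) / (2 * a)

# Step 2: add up strip areas (g(x_i) - f(x_i)) * dx between the intersections.
def area_between(g, f, lo, hi, n=2000):
    dx = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * dx) - f(lo + (i + 0.5) * dx) for i in range(n)) * dx

area = area_between(upper, lower, left, right)
print(left, right, area)
```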

You should be careful not to fall into the trap of thinking that the definite
integral $\int_a^b f(x)\,dx$ is always the area between the graph and the x-axis. This is only
valid if f(x) ≥ 0 on the interval [a, b]. If f(x) < 0 on the interval [a, b] then the area
between the graph and the x-axis is $\int_a^b (0 - f(x))\,dx = -\int_a^b f(x)\,dx$, and if the graph
crosses the x-axis in the interval [a, b] then it is necessary to find all the points where
it crosses, and evaluate all the different pieces of area separately.
Consider, for example, the area between the graph of y = sin x and the x-axis
for 0 ≤ x ≤ 2π.

[Figure: graph of y = sin x for 0 ≤ x ≤ 2π, with the regions between the graph and the x-axis shaded.]

The graph crosses the axis once inside the interval [0, 2π], namely at x = π. Of course
\[ \int_0^{2\pi} \sin x\,dx = \bigl[-\cos x\bigr]_0^{2\pi} = (-\cos 2\pi) - (-\cos 0) = -1 + 1 = 0, \]
which is clearly not equal to the area between y = sin x and the x-axis on [0, 2π].
Instead, we should calculate the two shaded areas separately, and add them.
On the interval [0, π] we have sin x ≥ 0, and so the first area is
\[ \int_0^{\pi} \sin x\,dx = \bigl[-\cos x\bigr]_0^{\pi} = (-\cos \pi) - (-\cos 0) = 1 + 1 = 2. \]
It is clear by the symmetry of the diagram that the other part will have the same
area. To confirm this, observe that since sin x ≤ 0 for π ≤ x ≤ 2π, the area between
the graphs of y = 0 (the x-axis) and y = sin x on [π, 2π] is
\[ \int_{\pi}^{2\pi} (0 - \sin x)\,dx = \bigl[\cos x\bigr]_{\pi}^{2\pi} = \cos 2\pi - \cos \pi = 1 + 1 = 2. \]
So the total area is 4.
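The difference between the signed integral and the area can be seen numerically. The sketch below uses a midpoint rule (the helper `midpoint_integral` and its 2000 strips are our own assumptions) and splits the interval at the crossing point x = π, exactly as in the calculation above.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Integrating straight across [0, 2*pi]: the two lobes cancel.
signed = midpoint_integral(math.sin, 0, 2 * math.pi)

# Splitting at the crossing x = pi and integrating (0 - sin x) on the second
# piece gives the actual area.
area = (midpoint_integral(math.sin, 0, math.pi)
        + midpoint_integral(lambda x: -math.sin(x), math.pi, 2 * math.pi))
print(signed, area)
```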


3.4.3 Average value of a function
We all know what is meant by the average of a finite number of numbers: it is the
sum of the numbers divided by the number of numbers. But can one sensibly define
the average value of a function on an interval? Any interval [a, b] (where a < b)
contains infinitely many numbers, and so there are infinitely many function values to
be averaged. You cannot form the sum of these infinitely many numbers, and even
if you could then dividing by the total number of numbers would make no sense. So
what can we do?
One natural idea is to take just a finite number of x-values in [a, b], thinking of
the function values at these points as a representative sample of the function values
on the whole interval. Their average should then be at least a reasonable estimate of
the average value of the function on [a, b]. The larger the sample you take, the better
the estimate should be.
So that we do not favour one part of the interval over another, we should take
our sample points to be spaced evenly over the entire interval. So if we want n sample
points then we should divide [a, b] into n sub-intervals, each of width $\Delta x = \frac{b-a}{n}$,
and choose a sample point $x_i$ in each sub-interval. Averaging the function values
$f(x_1), f(x_2), \ldots, f(x_n)$ gives
\[ \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} = \sum_{i=1}^{n} \frac{f(x_i)}{n} = \sum_{i=1}^{n} f(x_i)\,\frac{\Delta x}{b-a} = \frac{1}{b-a} \sum_{i=1}^{n} f(x_i)\,\Delta x \]
since $\Delta x = \frac{b-a}{n}$, so that $\frac{1}{n} = \frac{\Delta x}{b-a}$. If we believe that we get better estimates by taking more points,
the average value of f on [a, b] should be defined as the limit of the above expression
as n tends to ∞. Since we have already defined the limit as n → ∞ of the expression
$\sum_{i=1}^{n} f(x_i)\,\Delta x$ to be the definite integral of f(x) from a to b, our definition of the
average value of f on [a, b] becomes
\[ \text{average value} = \frac{1}{b-a} \int_a^b f(x)\,dx. \]

The above formula is intuitively reasonable, since if we write k for the average
value then we find that
\[ \int_a^b f(x)\,dx = k(b - a) = kb - ka = \bigl[kx\bigr]_a^b = \int_a^b k\,dx. \]
In other words, the integral of f(x) from a to b equals the integral of its average value
over the same interval.
Example 3.17
Consider the function f(x) = x(2 − x) over the interval [0, 2]. The function values on
this interval are all nonnegative, and so the area between the graph and the x-axis
over this interval is given by $\int_0^2 (2x - x^2)\,dx = \frac{4}{3}$. The average value of f(x) over
[0, 2] is $\frac{1}{2-0} \int_0^2 (2x - x^2)\,dx = \frac{2}{3}$.

[Figure: two diagrams for 0 ≤ x ≤ 2, one with the area under y = x(2 − x) shaded and one with the rectangle of height 2/3 shaded.]

Now observe that a rectangle on the base [0, 2] with
height equal to this average value has area $\frac{4}{3}$, the same as the area under the graph.
That is, the shaded areas in the above diagrams are equal.
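The sampling idea behind the definition can be tried directly: average f at n evenly spaced sample points and watch the result approach 2/3. In this sketch the sample points are the midpoints of the sub-intervals, and the names `f` and `sample_average` are our own assumptions.

```python
def f(x):
    return x * (2 - x)

def sample_average(f, a, b, n):
    """Plain average of f at the midpoints of n equal sub-intervals of [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) / n

# Larger samples give estimates closer to the average value 2/3.
for n in (4, 40, 400):
    print(n, sample_average(f, 0, 2, n))
```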

3.4.4 Some properties of definite integrals


If f(x) and g(x) are integrable functions, then

(1) $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.

(2) $\int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx$, for any constant k.

(3) $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$.

(4) $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$.

(5) $\int_a^a f(x)\,dx = 0$.

The next example illustrates the use of some of these properties.

Example 3.18
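(1)
\[ \begin{aligned}
\int_2^5 (t^2 - 2t)\,dt &= \int_2^5 t^2\,dt + \int_2^5 (-2)t\,dt \\
&= \int_2^5 t^2\,dt - 2 \int_2^5 t\,dt \\
&= \bigl[\tfrac{1}{3}t^3\bigr]_2^5 - 2\bigl[\tfrac{1}{2}t^2\bigr]_2^5 \\
&= \frac{117}{3} - 2 \times \frac{21}{2} \\
&= 18.
\end{aligned} \]

(2)
\[ \int_3^0 t^2\,dt = -\int_0^3 t^2\,dt = -\bigl[\tfrac{1}{3}t^3\bigr]_0^3 = -9. \]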

(3) If f(x) is a continuous function on [a, b], but is defined differently over different
sections of the interval, then property (3) above allows us to calculate the integral
from a to b by splitting it into appropriate pieces. We illustrate this by finding the
average value of g(t) over its interval of definition, where
\[ g(t) = \begin{cases} t(10 - t) & 0 \le t \le 5, \\ 25e^{5-t} & 5 \le t \le 60. \end{cases} \]
By definition the average value is
\[ \begin{aligned}
\frac{1}{60 - 0} \int_0^{60} g(t)\,dt &= \frac{1}{60} \Bigl( \int_0^5 t(10 - t)\,dt + 25 \int_5^{60} e^{5-t}\,dt \Bigr) \\
&= \frac{1}{60} \Bigl[ 5t^2 - \frac{t^3}{3} \Bigr]_0^5 + \frac{25}{60} \Bigl[ -e^{5-t} \Bigr]_5^{60} \\
&\approx 1.8.
\end{aligned} \]
(Since g(t) was defined differently on [0, 5] and [5, 60], we were forced to calculate the
integral from 0 to 60 in two pieces, namely 0 to 5 and 5 to 60.)
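The piecewise calculation in part (3) can be checked by integrating each piece numerically over its own sub-interval and adding the results, which is property (3) in action. The helper names and the strip count below are our own assumptions.

```python
import math

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def first_piece(t):
    return t * (10 - t)            # formula for g(t) when 0 <= t <= 5

def second_piece(t):
    return 25 * math.exp(5 - t)    # formula for g(t) when 5 <= t <= 60

# Integrate each piece on its own sub-interval, then divide by the total length.
total = midpoint_integral(first_piece, 0, 5) + midpoint_integral(second_piece, 5, 60)
average = total / (60 - 0)

# Value obtained from the antiderivatives used in the worked solution.
exact = (250 / 3 + 25 * (1 - math.exp(-55))) / 60
print(average, exact)
```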

Odd and Even Integrands


An odd function is a function f such that f(−x) = −f(x) for all x in [−a, a], where
[−a, a] is the domain of f. An even function is a function f satisfying f(−x) = f(x)
for all x in [−a, a], where [−a, a] is its domain.

If f is an odd function then $\int_{-a}^{a} f(x)\,dx = 0$.

If f is an even function then $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$.

Some examples of odd functions are
\[ x,\ x^3,\ x^5,\ \ldots,\ \sin x,\ \tan x,\ \frac{e^x - e^{-x}}{2},\ x^2 \sin x. \]
Some examples of even functions are
\[ 17,\ x^2,\ x^4,\ \ldots,\ \cos x,\ \frac{e^x + e^{-x}}{2},\ x^3 \sin x. \]
Some examples of functions that are neither even nor odd are
\[ x + x^2,\ e^x,\ 1 + \sin x. \]

Example 3.19
(1) Since sin x is odd, $\int_{-a}^{a} \sin x\,dx = 0$.
Of course we can confirm this by integrating:
\[ \int_{-a}^{a} \sin x\,dx = \bigl[-\cos x\bigr]_{-a}^{a} = -\cos a + \cos(-a) = 0. \]
The diagram illustrates what is happening in the case that 0 < a < π. The integral
of sin x from −a to 0 is the negative of the area marked A in the diagram, since the
graph is below the axis here. By symmetry the area A equals the area B, and the
integral from −a to a is −A + B = 0.

[Figure: graph of y = sin x for −π ≤ x ≤ π, with the area A between x = −a and x = 0 (below the axis) and the area B between x = 0 and x = a (above the axis).]

(2) Since $f(x) = -x^2 + 4$ is an even function,
\[ \int_{-3}^{3} (-x^2 + 4)\,dx = 2 \int_0^3 (-x^2 + 4)\,dx = 2 \times 3 = 6. \]
Note that the integral of $-x^2 + 4$ from 0 to 3 is the difference A − B, where A and
B are the areas shown in the diagram. This is because $B = -\int_2^3 (-x^2 + 4)\,dx$, the
graph being below the axis on [2, 3]. That $\int_{-3}^{0} (-x^2 + 4)\,dx = \int_0^3 (-x^2 + 4)\,dx$ is clear
by symmetry.

[Figure: graph of $y = 4 - x^2$ for −3 ≤ x ≤ 3, with the area A above the x-axis on [0, 2] and the area B below the x-axis on [2, 3].]

(In fact the area A is $5\tfrac{1}{3}$ and the area B is $2\tfrac{1}{3}$.)


(3) $\int_{-1}^{1} (x + x^2)\,dx = \int_{-1}^{1} x\,dx + \int_{-1}^{1} x^2\,dx = 0 + 2\int_0^1 x^2\,dx = \frac{2}{3}$ (since x is an odd
function and $x^2$ an even function).
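The odd and even shortcuts used in part (3) can be confirmed numerically. This sketch reuses a midpoint rule (our own helper, with an arbitrary 2000 strips) to integrate the odd part, the even part, and the whole integrand separately.

```python
def midpoint_integral(f, a, b, n=2000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

odd_part = midpoint_integral(lambda x: x, -1, 1)        # x is odd: expect 0
even_part = midpoint_integral(lambda x: x**2, -1, 1)    # x^2 is even
half_even = midpoint_integral(lambda x: x**2, 0, 1)     # half of the even part
total = midpoint_integral(lambda x: x + x**2, -1, 1)    # expect about 2/3
print(odd_part, even_part, 2 * half_even, total)
```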

Exercises Set 3.4

1. If a patient’s blood pressure changes at a rate r(t) = 40 cos 4t sin 4t mm Hg/hr
in response to a test drug, what would be the patient’s blood pressure in two
hours if it was 96 mm Hg when the drug was administered? When will it return
to normal?
2. The flow rate of a stream over the spillway of a dam is given by the formula
\[ V'(t) = e^{-Kt}, \]
where the constant K is related to the size of the spillway.
Express the flow volume of the stream as an integral and then integrate to
establish the explicit function V (t).
3. A drug is excreted in a patient’s urine, and the urine is monitored continuously
using a catheter, allowing the rate at which the drug is excreted to be measured.
Denote this rate by f (t) (a function of t = time), so that the amount of drug
excreted by the patient over an interval of time is given by the integral of f (t)
over that interval.
Assume that a patient at time t = 0 is administered 12 mg of a drug, which is
excreted at a rate of $3t^{-1/2}$ mg/hr. What will be the amount of drug in the
patient at time t > 0? When will the patient be drug free?

4. Find the law connecting time t and distance s when time and velocity v = ds/dt
are related by
\[ v = 2 - \frac{1}{t^2}, \]
and s = 3 when t = 1.

5. Find the area bounded by the parabola $y = 3 + 4x + 3x^2$, the axis of x, and
(i ) the lines x = 1 and x = 2
(ii ) the lines x = 1 and x = 5

6. Find the area enclosed by the following curves:


(i ) y = x and $y = x^2 - 2$.
(ii ) y = sin x, y = cos x, for $0 \le x \le \frac{\pi}{4}$.
(iii ) $y = x^3$ and x + y = 4.

7. Find the average values of the following functions over the indicated intervals:
(i ) $y = 6x^2 + 4x + 3$ for −2 ≤ x ≤ 2.
(ii ) $y = 5\sin\frac{x}{2}$ for 0 ≤ x ≤ 2π.
8. Find the area under the curve $y = xe^{x^2}$ for 0 ≤ x ≤ 2. Find the average value
of y over the same interval. What is the relationship between the area and the
average value?

§3.5 Extending integration

3.5.1 Improper integrals

In this section, we shall consider some ways in which the notion of a definite
integral can be extended. Specifically, we shall see how in some circumstances one
can define the integral of f over an infinite interval. Such improper integrals are
analogous to the infinite series considered at the start of §3.2.
3.5.2 Infinite integrals

It is frequently important to consider integrals for which the domain is, at least
theoretically, the whole real number system. Such integrals are important in math-
ematical statistics, for example. So let us start by investigating a simple statistical
example.
Probability density functions
If a coin is tossed three times one may consider a variety of different types of outcomes.
For instance, one might be interested in the number of times a head was displayed.
Then there are four possible outcomes: 0 heads, 1 head, 2 heads or 3 heads. If we
let P (x) be the probability of there being x heads, then

\[ P(0) = \frac{1}{8}, \quad P(1) = \frac{3}{8}, \quad P(2) = \frac{3}{8}, \quad P(3) = \frac{1}{8}. \]

A function such as P is called a probability density function. The two crucial prop-
erties of P are that P (x) ≥ 0 for all x, and the sum of P (x) over all possible values
of x must be 1.
In the coin tossing example the range of possible values of the random variable
x was the finite set of numbers {0, 1, 2, 3}. But it is important to be able to use
statistical methods to analyse more complex situations. Specifically, we need to
consider random variables for which the range of possible values is an interval [a, b].
For example, imagine a bee that flies from its hive in search of pollen. Let d(t)
denote the distance that the bee is from its hive at time t, and let x denote the
maximum value that d(t) achieves on a given day. Then x will certainly lie in some
interval [0, D] (where D is the maximum distance the bee can ever travel from its
hive). It seems intuitively reasonable to expect that x is less likely to be very close
to D or very close to 0 than it is to be somewhere near the middle of the interval.
Whatever way, there ought to be some probability density function that describes
the relative likelihoods of the various possibilities.
Notice that the probability that x will take any one given value k in [0, D] is
infinitesimal. Real numbers correspond to infinite decimal expansions, and as such
they can never be measured exactly. (Which is more likely: that x = 1.03, or that
x = 1.030001, or that x = 1.0300000276?) It does not really make sense to ask about
the probability that x = k; what you should really ask about is the probability that
x lies in some range such as k − 0.001 < x < k + 0.001. The probability density
function f is defined by the requirement that
\[ \mathrm{Prob}(h \le x \le k) = \int_h^k f(x)\,dx \]
for all h and k in $[x_{\min}, x_{\max}]$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum
possible values for x. (In our bee example, $x_{\min} = 0$ and $x_{\max} = D$.)
In the bee example, the graph of the probability density function on the interval
[0, D] might look something like that shown in the diagram below (in which D = 3).
The important features are that f(x) ≥ 0 for all x in [0, D], and $\int_0^D f(x)\,dx = 1$.
These two properties ensure that $0 \le \int_h^k f(x)\,dx \le 1$ whenever 0 ≤ h ≤ k ≤ D.
This is what we want, since $\int_h^k f(x)\,dx$ is supposed to be the probability that x lies
between h and k, and probabilities have to lie between 0 and 1. Since [0, D] is the
entire range of possible outcomes, the probability that x lies in [0, D] is 1; hence our
requirement that $\int_0^D f(x)\,dx = 1$.

[Figure: a possible probability density function on the interval [0, 3].]

There are many instances when a random variable x could conceivably take any
nonnegative real value. In this case the domain of the probability density function
has to be the infinite interval [0, ∞) (the set of all nonnegative real numbers), and
the p.d.f. would have to have the property that
\[ \int_0^{\infty} f(x)\,dx = 1. \]
We have not yet defined what is meant by such an expression as this. But by
analogy with the case of infinite series, it is natural to define
\[ \int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx \]
whenever the limit exists.


Example 3.20
The p.d.f. for many biological phenomena has the form $f(t) = \lambda e^{-\lambda t}$ for some constant λ > 0. This
is called the exponential probability density function. (It is often the case for this
p.d.f. that the random variable represents time; hence we have decided to write t in
place of x.)
Observe that
\[ \mathrm{Prob}(0 \le t \le T) = \int_0^T \lambda e^{-\lambda t}\,dt = \bigl[-e^{-\lambda t}\bigr]_0^T = -(e^{-\lambda T} - e^0) = 1 - e^{-\lambda T}. \]
Note that $e^{-\lambda T} = 1/e^{\lambda T}$, and since λ > 0 it follows that $e^{-\lambda T} \to 0$ as T → ∞. So
the integral $\int_0^T \lambda e^{-\lambda t}\,dt$ approaches 1 as T tends to ∞. Thus
\[ \int_0^{\infty} \lambda e^{-\lambda t}\,dt = 1 \]
as required for a probability density function.
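To see the limit in action, we can evaluate $1 - e^{-\lambda T}$ for larger and larger T. The value λ = 0.5 below is a hypothetical choice made only for illustration, as are the names `lam` and `prob_up_to`.

```python
import math

lam = 0.5  # hypothetical rate parameter (any positive value behaves similarly)

def prob_up_to(T):
    """Prob(0 <= t <= T) = 1 - e^(-lam * T) for the exponential p.d.f."""
    return 1 - math.exp(-lam * T)

# The probabilities increase towards 1 as the upper limit T grows.
for T in (1, 10, 100):
    print(T, prob_up_to(T))
```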

If $\int_a^b f(t)\,dt$ approaches a finite limit L as b → ∞ then we define $\int_a^{\infty} f(t)\,dt = L$.

Similarly, we define $\int_{-\infty}^{a} f(t)\,dt = \lim_{b \to -\infty} \int_b^a f(t)\,dt$, provided this limit exists,
and $\int_{-\infty}^{\infty} f(t)\,dt = \int_{-\infty}^{0} f(t)\,dt + \int_0^{\infty} f(t)\,dt$, provided both exist.

Example 3.21
\[ \int_2^b e^{-3x}\,dx = \Bigl[-\frac{1}{3}e^{-3x}\Bigr]_2^b = -\frac{1}{3}\bigl[e^{-3b} - e^{-6}\bigr] = \frac{1}{3e^6} - \frac{1}{3e^{3b}}. \]
Now since $\frac{1}{3e^{3b}} \to 0$ as b → ∞, it follows that
\[ \int_2^{\infty} e^{-3x}\,dx = \frac{1}{3e^6}. \]

Example 3.22
\[ \int_0^b \frac{1}{2x + 5}\,dx = \Bigl[\frac{1}{2}\ln(2x + 5)\Bigr]_0^b = \frac{1}{2}\ln(2b + 5) - \frac{1}{2}\ln(5). \]
Observe that $\frac{1}{2}\ln(2b + 5) \to \infty$ as b → ∞; so $\int_0^{\infty} \frac{1}{2x + 5}\,dx$ is not defined.
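The contrast between Examples 3.21 and 3.22 shows up clearly if we tabulate the two integrals from the lower limit to b for growing b, using the closed forms found above. The function names `convergent` and `divergent` are our own labels.

```python
import math

def convergent(b):
    """Integral of e^(-3x) from 2 to b, from Example 3.21."""
    return 1 / (3 * math.exp(6)) - 1 / (3 * math.exp(3 * b))

def divergent(b):
    """Integral of 1/(2x + 5) from 0 to b, from Example 3.22."""
    return 0.5 * math.log(2 * b + 5) - 0.5 * math.log(5)

# The first column settles down to 1/(3e^6); the second keeps growing.
for b in (10, 1000, 1000000):
    print(b, convergent(min(b, 200)), divergent(b))
```

The call `convergent(min(b, 200))` caps the argument only so that `math.exp(3 * b)` does not overflow for very large b; by b = 200 the value already agrees with the limit to machine precision.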

Example 3.23
It can be shown that the improper integral
\[ \Gamma(x) \overset{\text{def}}{=} \int_0^{\infty} t^{x-1} e^{-t}\,dt \tag{1} \]
exists for all real numbers x ≥ 1. This is easy to check in the case x = 1, since
\[ \Gamma(1) = \lim_{T \to \infty} \int_0^T e^{-t}\,dt = \lim_{T \to \infty} (1 - e^{-T}) = 1 \]
(as we have already seen in the context of exponential probability density functions).
The function Γ(x) defined by Eq. (1) is called the Gamma function, and is important
in statistics and applied mathematics.
The following calculation establishes a surprising fact about the values that the
Gamma function takes for positive integer values of x. Observe first that since
\[ \frac{d}{dt}\bigl(t^x e^{-t}\bigr) = x t^{x-1} e^{-t} - t^x e^{-t}, \]
it follows that
\[ \int_0^T x t^{x-1} e^{-t}\,dt - \int_0^T t^x e^{-t}\,dt = \bigl[t^x e^{-t}\bigr]_0^T = T^x e^{-T} \]
since $0^x = 0$ (given that x > 0). Taking limits as T → ∞ we deduce that
\[ x\Gamma(x) - \Gamma(x + 1) = \lim_{T \to \infty} T^x e^{-T} = 0. \]
(We omit the proof that this limit really is 0, but the student is invited to use a
calculator to work out the values of $T^x e^{-T}$ for (say) x = 5 and large values of T.)
Thus we see that Γ(x + 1) = xΓ(x) for all x ≥ 1.
Thus Γ(4) = 3Γ(3) = 3 × 2Γ(2) = 3 × 2 × 1Γ(1) = 3 × 2 × 1. More generally, we
see that
\[ \Gamma(n + 1) = n(n - 1)(n - 2) \cdots \times 3 \times 2 \times 1 \]
for all nonnegative integers n.
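Python's standard library provides the Gamma function as `math.gamma`, so both the recurrence Γ(x + 1) = xΓ(x) and the factorial formula can be checked directly. The test value x = 2.5 and the range of n below are arbitrary choices of ours.

```python
import math

# Recurrence Gamma(x + 1) = x * Gamma(x), tried at a non-integer value of x.
x = 2.5
recurrence_holds = math.isclose(math.gamma(x + 1), x * math.gamma(x))

# Gamma(n + 1) = n(n - 1)...3 * 2 * 1 for small positive integers n.
factorial_matches = all(
    math.isclose(math.gamma(n + 1), math.factorial(n)) for n in range(1, 11)
)

# Values of T^x e^(-T) for x = 5, as suggested in the text: they shrink rapidly.
tail_values = [T**5 * math.exp(-T) for T in (10, 50, 100)]
print(recurrence_holds, factorial_matches, tail_values)
```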

Exercises Set 3.5

1. Calculate, if possible, the following improper integrals:


(i ) $\int_0^{\infty} 5e^{-5x}\,dx$        (ii ) $\int_1^{\infty} \frac{dx}{3x + 4}$        (iii ) $\int_0^{\infty} \frac{x^3}{x^4 + 1}\,dx$
Z ∞ Z ∞
dx dx
(iv ) (v )
0 (4x + 9) 2
5
1 x4
2. (i ) Show that $e^{5-b} \to 0$ as b → ∞. Hence evaluate $\int_5^{\infty} e^{5-x}\,dx$.
(ii ) Sketch the graph of $f(t) = 3e^{-3t}$. Find $\int_0^{\infty} 3e^{-3t}\,dt$.
Appendix 1 — Algebra
1. Basic identities

$(x + y)^2 = x^2 + 2xy + y^2$   (NOT $x^2 + y^2$)
$(x - y)^2 = x^2 - 2xy + y^2$   (NOT $x^2 - y^2$)
$(x + y)(x - y) = x^2 - y^2$

2. Simplifying fractions

To simplify fractions, you may multiply numerator and denominator by the same
expression. You may also divide numerator and denominator by the same expression.
Don’t try to cancel terms any other way than this! Some examples:

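\[ \frac{x^2 + xy}{y^2 + xy} = \frac{x(x + y)}{y(x + y)} = \frac{x}{y} \]
\[ \frac{1 + \frac{1}{x}}{2 + \frac{3}{x}} = \frac{x(1 + \frac{1}{x})}{x(2 + \frac{3}{x})} = \frac{x + 1}{2x + 3} \]
Cannot cancel like this: $\frac{3x + 1}{3x + 4} \ne \frac{x + 1}{x + 4}$.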

3. Manipulating powers

Many arithmetic operations can be written in terms of powers:


\[ \frac{1}{x} = x^{-1}, \qquad \frac{1}{x^k} = x^{-k}, \qquad \sqrt{x} = x^{1/2}, \qquad \sqrt[k]{x} = x^{1/k}, \qquad \sqrt[q]{x^p} = x^{p/q}. \]

Powers can be distributed over multiplication and division: for all indices n,
\[ (xy)^n = x^n y^n \quad\text{and}\quad \Bigl(\frac{x}{y}\Bigr)^n = \frac{x^n}{y^n}. \]
Be careful not to distribute powers over addition! That is, $(x + y)^n \ne x^n + y^n$ in
general.

The solutions of the quadratic equation $ax^2 + bx + c = 0$ are given by the
quadratic formula,
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
Appendix 2 — Geometry
We revise the properties of some geometric shapes. First, we revise four-sided
polygons, or quadrilaterals.

A square is a quadrilateral with four sides, all the same length (s), and four right
angles.
Area = $s^2$

A rectangle is a quadrilateral with four right angles. Opposite sides have the same
length (a and b).
Area = ab

A parallelogram is a quadrilateral in which opposite sides are parallel, and have the
same length (base b, perpendicular height h).
Area = bh

Area of a triangle (base b and perpendicular height h, or sides a and b enclosing an
angle θ):
Area = $\frac{1}{2}bh$
Area = $\frac{1}{2}ab\sin\theta$
See also the sine and cosine rules, in trigonometry.

A circle is the set of all points in 2D space a fixed distance r from a point O.
Area = $\pi r^2$
Circumference = $2\pi r$

A sector is the part of a circle swept out by radii traversing an angle θ at the centre.
Area of sector = $\frac{1}{2}r^2\theta$
Length of arc = $r\theta$

A sphere is the set of all points in 3D space a fixed distance r from a point O.
Volume = $\frac{4}{3}\pi r^3$
Surface area = $4\pi r^2$

A cylinder is the solid obtained by extruding a base of area A along a perpendicular
distance h.
Volume = Ah

A cone is the solid obtained by shrinking a base of area A to a point along a
perpendicular distance h.
Volume = $\frac{1}{3}Ah$
Appendix 3 — Trigonometry
You will need to recall various facts about trigonometry from high school. This
appendix is intended to be a helpful reminder of the most important ideas, but it is
not at all comprehensive.

All angles are measured in radians, which are related to degrees by the equation
π radians = 180◦ .
Angles in radian measure are generally written as pure numbers, without any units.

By convention, angles in the (x, y)-plane are measured as the anticlockwise angle
of rotation from the positive x-axis to a particular ray. Some common angles are
marked on the following diagram.
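[Figure: the common angles $0$, $\frac{\pi}{4}$, $\frac{\pi}{2}$, $\frac{3\pi}{4}$, $\pi$, $\frac{5\pi}{4}$, $\frac{3\pi}{2}$ and $\frac{7\pi}{4}$ marked anticlockwise around a circle.]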

The sine and cosine are defined as follows: given an angle θ, let P (x, y) be the
point lying on the circle of radius 1 centred at the origin O such that the angle from
the positive x-axis to the ray OP (measured anticlockwise) is θ. Then we define
cos θ = x and sin θ = y.
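In the special case where $0 < \theta < \frac{\pi}{2}$, the sine and cosine of an angle can also be
expressed in terms of the corresponding right-angled triangle:
\[ \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}} \quad\text{and}\quad \sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}. \]

[Figure: the point P(x, y) on the unit circle at angle θ, with horizontal side cos θ and vertical side sin θ; alongside, a right-angled triangle with angle θ and sides labelled adjacent, opposite and hypotenuse.]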

From the definition of sin and cos in terms of the unit circle, it is easy to see the
following symmetry properties:
sin(−θ) = − sin θ and cos(−θ) = cos θ.
In other words, sin is an odd function and cos is an even function.

Since P(x, y) lies on the unit circle, we have $x^2 + y^2 = 1$ by Pythagoras’ theorem.
This shows that
\[ \sin^2\theta + \cos^2\theta = 1, \]
which is the trigonometric form of Pythagoras’ theorem.

The graphs of y = sin x and y = cos x are known as sinusoidal waves:

[Figure: graphs of y = sin x and y = cos x for −2π ≤ x ≤ 2π, each oscillating between −1 and 1.]

The remaining trigonometric functions are constructed straightforwardly from
sin and cos:
\[ \tan\theta = \frac{\sin\theta}{\cos\theta}, \quad \cot\theta = \frac{\cos\theta}{\sin\theta}, \quad \sec\theta = \frac{1}{\cos\theta} \quad\text{and}\quad \operatorname{cosec}\theta = \frac{1}{\sin\theta}. \]
Dividing Pythagoras’ theorem by $\cos^2\theta$ and by $\sin^2\theta$ respectively yields
\[ 1 + \tan^2\theta = \sec^2\theta \quad\text{and}\quad 1 + \cot^2\theta = \operatorname{cosec}^2\theta. \]
Questions involving tan, cot, sec and cosec can often be solved by rewriting those
functions in terms of sin and cos. It is nevertheless worth knowing the graph of
y = tan x:

[Figure: graph of y = tan x for −2π ≤ x ≤ 2π.]
Very often we need the addition laws for sin and cos:
\[ \sin(x + y) = \sin x \cos y + \cos x \sin y \]
\[ \sin(x - y) = \sin x \cos y - \cos x \sin y \]
\[ \cos(x + y) = \cos x \cos y - \sin x \sin y \]
\[ \cos(x - y) = \cos x \cos y + \sin x \sin y \]
As special cases we obtain the complementary angle relations,
\[ \sin\Bigl(\frac{\pi}{2} - x\Bigr) = \cos x \quad\text{and}\quad \cos\Bigl(\frac{\pi}{2} - x\Bigr) = \sin x, \]
and the double angle formulae,
\[ \sin 2x = 2\sin x \cos x \quad\text{and}\quad \cos 2x = \cos^2 x - \sin^2 x. \]

Deft rearrangement of the double angle formulae gives the formulae
\[ \cos^2 x = \tfrac{1}{2}(1 + \cos 2x) \quad\text{and}\quad \sin^2 x = \tfrac{1}{2}(1 - \cos 2x). \]
More generally, the addition laws can be rearranged to yield the so-called “products
to sums” formulae:
\[ \sin x\cos y = \tfrac{1}{2}\sin(x - y) + \tfrac{1}{2}\sin(x + y) \]
\[ \cos x\cos y = \tfrac{1}{2}\cos(x - y) + \tfrac{1}{2}\cos(x + y) \]
\[ \sin x\sin y = \tfrac{1}{2}\cos(x - y) - \tfrac{1}{2}\cos(x + y) \]
Finally, the t-formulae provide a parametrisation of sin and cos entirely in terms of
rational functions. They say that, if \(t = \tan\frac{x}{2}\), then
\[ \sin x = \frac{2t}{1 + t^2} \quad\text{and}\quad \cos x = \frac{1 - t^2}{1 + t^2}. \]
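
As a rough illustration of the t-formulae, the Python sketch below recovers sin x and cos x from t = tan(x/2). It is not from the original notes; the function name `sin_cos_from_t` and the sample angle are assumptions made for the example.

```python
import math

def sin_cos_from_t(t):
    """Return (sin x, cos x) given t = tan(x/2), using the t-formulae."""
    denom = 1 + t * t
    return 2 * t / denom, (1 - t * t) / denom

x = 1.2  # arbitrary sample angle in radians (hypothetical value)
s, c = sin_cos_from_t(math.tan(x / 2))
# s and c should agree with math.sin(x) and math.cos(x) up to rounding.
```

Both outputs are rational functions of t, which is what makes this substitution useful.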

The sine and cosine rules relate the sides and angles in any triangle.
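[Figure: a triangle with vertices A, B and C, in which the sides opposite A, B and C are labelled a, b and c respectively.]
If the triangle is labelled as above, then
\[ \frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} \]
and
\[ c^2 = a^2 + b^2 - 2ab\cos C. \]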
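
The two rules can be combined to solve a triangle from two sides and the included angle. The Python sketch below is illustrative only and not part of the original notes; the function names and the 3-4-5 sample triangle are assumptions.

```python
import math

def third_side(a, b, C):
    """Side c opposite angle C, from the cosine rule."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(C))

def angle_from_sine_rule(a, c, C):
    """Angle A opposite side a, from the sine rule (assumes A is acute)."""
    return math.asin(a * math.sin(C) / c)

# Sample data: sides 3 and 4 enclosing a right angle.
c = third_side(3.0, 4.0, math.pi / 2)
A = angle_from_sine_rule(3.0, c, math.pi / 2)
```

The sine rule determines sin A only, so the acute-angle assumption in the second function matters when the triangle could be obtuse at A.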
Appendix 4 — Exponents and logarithms

1. Index laws

• \(x^0 = 1\)

• \(x^a \times x^b = x^{a+b}\)

• \(x^a / x^b = x^{a-b}\)

• \((x^a)^b = x^{ab}\)

2. Definition of logarithm

• Let a be a fixed positive number with a ≠ 1. For all real numbers x, y with x > 0,

\[ \log_a x = y \quad\text{if and only if}\quad x = a^y. \]

In particular,

\[ \log_{10} x = y \quad\text{if and only if}\quad x = 10^y. \]

• The function \(\log_a x\) is defined only for x > 0.

3. Logarithm laws

• log 1 = 0

• \(\log(xy) = \log x + \log y\) for all x > 0 and y > 0

• \(\log\frac{x}{y} = \log x - \log y\) for all x > 0 and y > 0

• \(\log(x^y) = y\log x\) for all x > 0 and all real numbers y
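
The logarithm laws can be checked on sample values. The Python sketch below is not part of the original notes; it uses base-10 logarithms and arbitrarily chosen numbers x, y and base, all of which are assumptions for illustration.

```python
import math

x, y = 8.0, 2.5  # arbitrary positive sample values

# Product, quotient and power laws, checked with base-10 logarithms.
product_law = math.isclose(math.log10(x * y), math.log10(x) + math.log10(y))
quotient_law = math.isclose(math.log10(x / y), math.log10(x) - math.log10(y))
power_law = math.isclose(math.log10(x ** y), y * math.log10(x))

# Definition of the logarithm: log_a x = y if and only if x = a**y.
base = 2.0
definition_ok = math.isclose(base ** math.log(x, base), x)
```

`math.isclose` is used rather than `==` because floating-point logarithms are only accurate up to rounding.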

4. Natural exponentials and logarithms

There is a positive number called e, having the value e = 2.71828 . . . , such that
\(\frac{d}{dx} e^x = e^x\). In fact, e is the only such number. Because of this property,
exponential functions and logarithms with the base e are called “natural exponentials” and
“natural logarithms”. We use the special notation \(\ln x\) (\(= \log_e x\)) for the natural
logarithm. In particular,

• \(e^{\ln x} = x\) for all x > 0,

• \(\ln(e^x) = x\) for all x, and

• \(\ln x = y\) if and only if \(x = e^y\).
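
The special property of e can be seen numerically by comparing the slope of e^x with the slope of another exponential such as 2^x. The Python sketch below is not from the original notes; the central-difference helper `derivative`, the step size and the sample point are assumptions. It relies on the standard fact that the derivative of a^x is (ln a) a^x.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.5  # arbitrary sample point
slope_e = derivative(math.exp, x0)            # close to e**x0
slope_2 = derivative(lambda x: 2.0 ** x, x0)  # close to ln(2) * 2**x0, not 2**x0
```

Only for base e does the slope equal the function value itself; for base 2 the extra factor ln 2 appears.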

Appendix 5 — Differentiation

1. Derivatives of some basic functions

• \(\frac{d}{dx}(x^n) = nx^{n-1}\)
• \(\frac{d}{dx}(e^x) = e^x\)
• \(\frac{d}{dx}(\ln x) = \frac{1}{x}\)

• \(\frac{d}{dx}(\sin x) = \cos x\)
• \(\frac{d}{dx}(\cos x) = -\sin x\)
• \(\frac{d}{dx}(\tan x) = \sec^2 x\)

2. Rules for differentiating

• \(\frac{d}{dx}\bigl(k f(x)\bigr) = k\,\frac{d}{dx}\bigl(f(x)\bigr)\) (k a constant)
• \(\frac{d}{dx}\bigl(f(x) + g(x)\bigr) = \frac{d}{dx}f(x) + \frac{d}{dx}g(x)\)

• If u = f(x) and v = g(x), then

\[ \frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx} \quad\text{(product rule)} \]
and
\[ \frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2} \quad\text{(quotient rule)} \]

• If y = g(u), where u = f(x), then

\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} \quad\text{(chain rule)} \]

• If h(x) = g(f(x)) then

\[ h'(x) = g'(f(x))\,f'(x) \quad\text{(chain rule)} \]

(Observe that if we write u = f(x) and y = h(x) then the equation h(x) = g(f(x))
can be reformulated as y = g(u), and \(g'(f(x))f'(x)\) can be reformulated as \(\frac{dy}{du}\frac{du}{dx}\). So
the two versions of the chain rule are equivalent.)
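
The product and chain rules can be sanity-checked against a numerical derivative. The Python sketch below is not part of the original notes; the helper `derivative`, the sample functions x² sin x and sin(x²), and the sample point are all assumptions chosen for illustration.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.8  # arbitrary sample point

# Product rule with u = x**2 and v = sin x: (uv)' = u v' + v u'.
product_numeric = derivative(lambda x: x ** 2 * math.sin(x), x0)
product_exact = x0 ** 2 * math.cos(x0) + math.sin(x0) * 2 * x0

# Chain rule with h(x) = sin(x**2): h'(x) = cos(x**2) * 2x.
chain_numeric = derivative(lambda x: math.sin(x ** 2), x0)
chain_exact = math.cos(x0 ** 2) * 2 * x0
```

Agreement to several decimal places is what one expects from a central difference with this step size; it is a check on the algebra, not a proof.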

Appendix 6 — Integrals

1. Rules for integrating


• \(\int k f(x)\,dx = k\int f(x)\,dx\)
• \(\int \bigl(f(x) \pm g(x)\bigr)\,dx = \int f(x)\,dx \pm \int g(x)\,dx\)

2. A brief table of integrals


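• \(\int dx = x + C\)
• \(\int x^n\,dx = \frac{x^{n+1}}{n+1} + C\) \((n \neq -1)\)
• \(\int \frac{1}{x}\,dx = \ln|x| + C\)
• \(\int e^x\,dx = e^x + C\)
• \(\int \sin x\,dx = -\cos x + C\)
• \(\int \cos x\,dx = \sin x + C\)
• \(\int \sec^2 x\,dx = \tan x + C\)
• \(\int \operatorname{cosec}^2 x\,dx = -\cot x + C\)
• \(\int \sec x\tan x\,dx = \sec x + C\)
• \(\int \operatorname{cosec} x\cot x\,dx = -\operatorname{cosec} x + C\)
• \(\int \sin^2 x\,dx = \frac{x}{2} - \frac{\sin 2x}{4} + C\)
• \(\int \cos^2 x\,dx = \frac{x}{2} + \frac{\sin 2x}{4} + C\)
• \(\int \tan x\,dx = -\ln|\cos x| + C\)
• \(\int \cot x\,dx = \ln|\sin x| + C\)
• \(\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin x + C\)
• \(\int \frac{1}{1+x^2}\,dx = \arctan x + C\)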
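
Any entry in a table of integrals can be checked by differentiating the right-hand side and comparing with the integrand. The Python sketch below does this numerically for the sin² x, cos² x, arcsin and arctan entries. It is not part of the original notes; the helper `derivative` and the sample point are assumptions.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Pairs (antiderivative, integrand) taken from the table.
entries = [
    (lambda x: x / 2 - math.sin(2 * x) / 4, lambda x: math.sin(x) ** 2),
    (lambda x: x / 2 + math.sin(2 * x) / 4, lambda x: math.cos(x) ** 2),
    (math.asin, lambda x: 1 / math.sqrt(1 - x * x)),
    (math.atan, lambda x: 1 / (1 + x * x)),
]

x0 = 0.3  # arbitrary sample point inside (-1, 1)
errors = [abs(derivative(F, x0) - f(x0)) for F, f in entries]
```

Each error should be tiny; a large value would indicate a sign or coefficient slip in the corresponding entry.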

Answers to the Exercises

Answers to Exercises Set 1.1

1. (i) \(3\sin 2x\) (ii) \(5\sin\frac{2\pi}{3}x\)

(iii) \(3\sin\frac{2}{3}\left(x + \frac{3\pi}{4}\right)\) or \(3\cos\frac{2}{3}x\)
2. (i) to (iv) [Figure: four graphs drawn for −π ≤ x ≤ π.]

3. (i) to (iii) [Figure: three graphs drawn for −2 ≤ x ≤ 2.]

Answers to Exercises Set 1.2


12

1. (i ) x= 25 (ii ) x = 5 2 (iii ) x = 4

10
(iv ) x = e − 2 (v ) x= 5 (vi ) x = 2 − ln 4


2. × 10

3. (i) \(y = 3x^{1/2}\) (ii) \(y = 5e^{2x}\) (iii) \(y = 1.3x^{0.6}\) (iv) \(y = 20e^{-0.4x}\)


Answers to Exercises Set 1.3

1. (i) \(f(x) = 17 - 2x\) (ii) \(f(x) = 4x - 1\)

(iii) \(f(x) = 2x^2 + x - 11\) (iv) \(f(x) = 3 + x^2\)

(v) \(f(x) = x^3 + x^2 - 5x + 3\) (vi) \(f(x) = \frac{x^3}{4}\)

Answers to Exercises Set 2.1

1. (a) to (f) [Figure: six graphs, one for each part, with key points labelled.]

2. Inflection when x = 1.

3. Local maximum of −2 when x = −1.


Local minimum of 2 when x = 1.

4. (i) Minimum = −249; maximum = 75.

(ii) Minimum = 1; maximum = \(\frac{29}{11}\).

5. (a) x<0 (b) x>0 (c) x < −1, x > 1 (d) −1 < x < 1

Inflections at x = −1, 1.

6. \(d = \frac{2c}{3}\)

7. (a) t=2

(b) increasing for 0 ≤ t < 2; decreasing for t > 2.

8. n = 30

9. x=1

10. Maximum for R1 = 0.36, maximum for R2 = 0.30.

11. 7 and 7

12. 4 and 4

13. 250 m²

14. 500 m²

15. \(\frac{5}{3}\) cm × \(\frac{35}{3}\) cm × \(\frac{14}{3}\) cm

16. 6 m × 6 m × 9 m

17. (a) Maximum AGR at 5 years, when the tree is 30 metres high.

(b) \(\mathrm{RGR} = \dfrac{e^{(5-t)}}{1 + e^{(5-t)}}\)

Answers to Exercises Set 2.2



1. (i) 5 (ii) 3 (iii) 2 (iv) no real value (v) \(e^3\)

2. (i ) vertical plane through (0, 3, 0) parallel to xz−plane.

(ii ) horizontal plane through (0, 0, −3).

(iii ) straight line x = 1 in xz−plane.

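3. (i) \(\frac{\partial z}{\partial x} = 3x^2 + 4xy\), \(\frac{\partial z}{\partial y} = 2x^2 + 2y\)

(ii) \(\frac{\partial z}{\partial x} = \ln y + ye^x\), \(\frac{\partial z}{\partial y} = \frac{x}{y} + e^x\)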

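4. (i) \(f_x = e^y + ye^x + 1\), \(f_y = xe^y + e^x\), \(f_{xx} = ye^x\), \(f_{xy} = e^y + e^x\), \(f_{yy} = xe^y\)

(ii) \(f_x = \frac{1}{y}\), \(f_y = \frac{-x}{y^2}\), \(f_{xx} = 0\), \(f_{xy} = \frac{-1}{y^2}\), \(f_{yy} = \frac{2x}{y^3}\)

(iii) \(f_x = \frac{2x}{x^2 + y^2}\), \(f_y = \frac{2y}{x^2 + y^2}\), \(f_{xx} = \frac{2(y^2 - x^2)}{(x^2 + y^2)^2}\), \(f_{xy} = \frac{-4xy}{(x^2 + y^2)^2}\),

\(f_{yy} = \frac{2(x^2 - y^2)}{(x^2 + y^2)^2}\)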

5. (i ) fxy = fyx = 24x3 y + 14xy (ii ) fxy = fyx = − x23

6. \(\frac{\partial z}{\partial x} = \frac{-5x}{4\sqrt{16 - x^2}} = \frac{-5}{4\sqrt{3}}\) when x = 2.

7. zx = −50, zy = 90.
Since zx < 0, demand for butter (z) decreases as the price of butter (x) increases,
and since zy > 0 demand for butter increases as the price of margarine (y)
increases.

8. (i) \(\frac{2x}{x^2 + y^2}\), (ii) \(\frac{-4xy}{(x^2 + y^2)^2}\)

(iii ) Min. of +1 at (1, −1).

(iv ) Least value 5 at (1, −3), greatest value 14 at (−2, −3).

(v) Least value \(2\tfrac{4}{5}\) at \((-\tfrac{1}{5}, -\tfrac{2}{5})\), greatest value 27 at (2, 4).

(vi ) Least value of 2 at (1, 0), greatest value 18 at (0, 3).

9. (ii ) fx (1, 1) = −1, fy (1, 1) = −6

Answers to Exercises Set 2.3


1. \(y = \frac{12}{5} + \frac{36}{35}x \approx 2.4 + x\)

2. (i ) y = −0.38 + 0.077x (ii ) y = −6.99 + 1.78x

(a) 2.55 (to 2 dec. pl.) (b) 78.45(×$100)

Answers to Exercises Set 3.1


1. (i) \(\sum_{k=1}^{n} 2k\) (ii) \(\sum_{i=2}^{k} \frac{1}{i-1}\) (iii) \(\sum_{k=1}^{n} k^2\)

(iv) \(\sum_{k=1}^{n} a + (k-1)d\)

2. (i) 100 (ii) 840 (iii) 9841 (iv) \(2\tfrac{80}{81}\)

3. (i) \(\frac{1}{2} - \frac{1}{21} = \frac{19}{42}\) (ii) \(b_{n+1} - b_0\)

4. \(\$5000 \times (1.03)^{12} = \$7128.80\)

5. \(\$5000(1 + 1.08 + \cdots + 1.08^6) = \$44614\)
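
The arithmetic in answers 4 and 5 can be reproduced directly. The Python sketch below is not part of the original notes; it evaluates only the two expressions as printed, and the variable names are assumptions.

```python
principal = 5000.0

# Answer 4: a single amount compounded at 3% over 12 periods.
compound = principal * 1.03 ** 12

# Answer 5: the geometric sum 1 + 1.08 + ... + 1.08**6, scaled by $5000.
deposits = principal * sum(1.08 ** k for k in range(7))

# The same sum via the closed form (r**7 - 1) / (r - 1) with r = 1.08.
closed_form = principal * (1.08 ** 7 - 1) / 0.08
```

The closed form is the usual geometric-series formula, so `deposits` and `closed_form` should agree up to rounding.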

Answers to Exercises Set 3.2


1
1. (i ) 4 (ii ) 0.77 (2 dec. pl.) (iii ) 2 ln 59 (iv ) 1
2 − 1
2e

2. (ii) \(\int_0^3 50e^{0.06t}\,dt \approx 164\)

3. (ii) \(\int_0^4 (4t - t^2)\,dt = \frac{32}{3}\)

4. \(\frac{118}{3}\)

Answers to Exercises Set 3.3


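1. (1) \(\frac{(1+x^2)^3}{3} + c\) (2) \(\frac{(1+p^{3/2})^4}{6} + c\)

(3) \(-\frac{1}{3}\cos(x^3) + c\) (4) \(2\sin x + c\)

(5) \(\frac{\sin^3 t}{3} + c\) (6) \(5 + x^2 + c\)

(7) \(-3\sqrt{1 - r^2} + c\) (8) \(\frac{1}{2}(1 + 2t^3)^{1/3} + c\)

(9) \(\frac{2}{15}(2 + 5y)^{3/2} + c\) (10) \(\frac{2}{9}(x^3 + 4)^{3/2} + c\)

(11) \(\frac{1}{2}\sqrt{2y^2 + 1} + c\) (12) \(\frac{3}{4}(z^2 + 2z + 2)^{2/3} + c\)

(13) \(-\frac{1}{4}\cos(2x^2) + c\) (14) \(\frac{1}{2}\sin(2x + 4) + c\)

(15) \(2 - \cos 2t + c\) (16) \(\sin^2 z + c\)

(17) \(\frac{-1}{2\sin 2t} + c\) (18) \(\frac{2}{27}(2 + 3x^3)^{3/2} + c\)

(19) \(\frac{1}{2}\ln(x^2 + 1) + c\) (20) \(-\frac{\cos^3 2x}{6} + c\)

(21) \(\frac{\sin^4\frac{y}{2}}{2} + c\) (22) \(\frac{-1}{6(t^3 + 3t^2 + 1)^2} + c\)

(23) \(\frac{1}{3}e^{x^3} + c\) (24) \(\frac{1}{2}(\ln x)^2 + c\)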

2. (a) (i) y = −6x + c (iii) y = −6x + 9

(b) (i) \(y = -x^2 + 2x + c\) (iii) \(y = -x^2 + 2x + 2\)

(c) (i) \(y = \frac{x^4}{4} - 2x^2 + c\) (iii) \(y = \frac{x^4}{4} - 2x^2 + \frac{19}{4}\)

Answers to Exercises Set 3.4

1. p(2) = 114, t = 4π, 8π, . . .

2. \(V(t) = -\frac{e^{-Kt}}{K} + C\)

3. Amount of drug = \(-6t^{1/2} + 12\). Patient will be drug free after 4 hours.

4. \(s = 2t + \frac{1}{t}\)

5. (i ) 16 (ii ) 184

6. (i) \(4\tfrac{1}{2}\) units² (ii) \(\left(\frac{2}{\sqrt{2}} - 1\right)\) units² (iii) \((4 - 3\ln 3)\) units²

10
7. (i ) 11 (ii ) π

8. Area = \(\frac{1}{2}(e^4 - 1)\) units²; mean value = \(\frac{1}{4}(e^4 - 1)\).

Answers to Exercises Set 3.5


1. (i) 1 (ii) not defined (iii) not defined (iv) \(\frac{1}{162}\) (v) \(\frac{1}{3}\)

2. (i ) 1 (ii ) 1
