USyd MATH1011 Full Course Notes
of
Calculus
Lecture notes for MATH1011
Chapter 2: Optimisation
2.1 One Variable Problems
    The Derivative
    Global and Local Maxima and Minima
    Maxima and minima when the derivative is undefined
    Concavity and points of inflection
    Absolute and relative growth rates
    Exercises Set 2.1
2.2 Functions with two or more variables
    Coordinates in 3 dimensions
    Partial derivatives
    Geometric interpretation of partial derivatives
    Maxima and minima of functions of two variables
    Tests for maxima and minima of functions of two variables
    Exercises Set 2.2
2.3 Least Squares
    The line of best fit
    Polynomials of best fit
    Exercises Set 2.3
Chapter 3: Summation
3.1 Finite sums
    Geometric series
    Collapsing series
    Exercises Set 3.1
3.2 The definite integral
    Exercises Set 3.2
3.3 The indefinite integral
    Integral curves
    Finding an amount given its rate of change
    Exercises Set 3.3
3.4 Applications of the definite integral
    Area under a curve
    Area between two curves
    Average value of a function
    Some properties of definite integrals
    Exercises Set 3.4
3.5 Extending integration
    Improper integrals
    Infinite integrals
    Exercises Set 3.5
Appendix 1 – Algebra
Appendix 2 – Geometry
Appendix 3 – Trigonometry
Appendix 4 – Exponents and logarithms
Appendix 5 – Differentiation
Appendix 6 – Integrals
Answers to the Exercises
Chapter 1
Curve Fitting
§1.1 Introduction
§1.2 Periodicity
This section concerns functions f that satisfy
$$f(t + a) = f(t) \quad \text{for all } t \tag{1}$$
for some positive number a. Such functions are called periodic, and the period is the
smallest positive a satisfying Eq. (1).
The following is the graph of a function with period equal to 1.
[Figure: graph of a function with period 1; the horizontal axis is marked at 1, 2, 3 and 4.]
Example 1.1
Yeast cells increase in a two stage cyclic process of mitosis. Firstly, each cell duplicates
its DNA and then it divides to form new cells. The cell’s length, ℓ, grows at a
constant rate from the time of its creation to that of nuclear division; the length
remains stationary during the time taken to create new cell walls and divide.
[Figure: graph of the cell length f(t) against time t, with black and white circles marking the function values at the jumps.]
The black circles and white circles in this diagram indicate (for example) that
we take the function value at t = 2 to be 2, rather than 1.3.
The equation of this function has to accommodate both the cyclic nature of the
process and the fact that a different rule applies in the two stages of each cycle (the
cell growth stage and the cell division stage). For t in the interval 0 < t ≤ 1 the
equation is
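$$f(t) = \begin{cases} t + 1.3 & \text{if } 0 < t < 0.7, \\ 2.0 & \text{if } 0.7 \le t \le 1, \end{cases}$$
and for t outside this interval the value of f (t) is determined by the rule that
$$f(t + 1) = f(t) \quad \text{for all } t.$$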
The idea is, for example, that f (1.4) = f (0.4) (by the rule that f (t + 1) = f (t) for
all t), and since 0.4 is in the interval 0 < t ≤ 1 the value of f (0.4) is determined by
the previous rule. We find that f (1.4) = f (0.4) = 0.4 + 1.3 = 1.7.
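The reduction used in Example 1.1 can be sketched in Python. This is only an illustration: the function name `cell_length` is our own, and the code assumes the two-stage rule on 0 < t ≤ 1 together with f(t + 1) = f(t).

```python
import math

def cell_length(t):
    """Cell length f(t) from Example 1.1, extended to all t by f(t + 1) = f(t)."""
    # Remove whole periods so that r lies in the base interval 0 < r <= 1.
    r = t - math.floor(t)
    if r == 0:
        r = 1.0  # integer times sit at the right-hand end of the base interval
    # Growth stage for r < 0.7, division stage otherwise.
    return r + 1.3 if r < 0.7 else 2.0

print(cell_length(1.4))  # same value as cell_length(0.4)
print(cell_length(2))    # 2.0, not 1.3
```

Treating integer values of t as the right-hand end of the base interval is what makes the value at t = 2 equal to 2 rather than 1.3.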
Example 1.2
Sketch the following function, and state its period:
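$$f(x) = \begin{cases} 2 & \text{for } 0 \le x < 1, \\ 3 - x & \text{for } 1 \le x < 3, \end{cases} \qquad f(x + 3) = f(x) \text{ for all } x.$$
[Figure: graph of f; the x-axis is marked from −4 to 9.]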
Note that, since the function values repeat every 3 units, we have
Example 1.3
In some instances, the periodicity can be unexpected. In a study of asthma in
N.S.W. from 2003 to 2010, the Department of Public Health at the University of
Sydney obtained the data shown below. The data points show the number of cases
treated in public hospitals each month. Approximate periodic behaviour has been
emphasized by drawing a curve that provides a reasonable fit to the data points.
[Figure: number of asthma cases treated each month (vertical axis, 0 to 2000) from Jan 03 to Jan 10, with a fitted periodic curve.]
The most famous periodic functions are the sinusoidal functions such as f (t) = sin t
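and g(t) = cos t. Both of these functions have period 2π; that is,
$$\sin(t + 2\pi) = \sin t \quad \text{and} \quad \cos(t + 2\pi) = \cos t \quad \text{for all } t.$$
[Figure: graphs of f (t) = sin t and g(t) = cos t between t = −2π and t = 2π.]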
Observe that the sine function is simply the cosine function shifted π/2 units to the
right. (Note that in mathematical writing sin t and cos t always mean the sine and
cosine of the number t, or (equivalently) the sine and cosine of an angle of t radians.
To convert from degrees to radians you divide by 180 and multiply by π.)
As most periodic phenomena do not have period 2π, and since, further, many
are not sinusoidal, it is necessary to either modify the sine (or cosine) function or
find some other functions to model rhythmic behaviour. For instance, the function
f (t) = sin 3t is sinusoidal with period 2π/3. Furthermore, as the next example shows,
the sum of two sinusoidal functions may be periodic and not sinusoidal.
Example 1.4
The diagram below shows the graph of the function f (t) = sin 2t + sin 3t over the
interval from t = −2π to t = 2π.
[Figure: graph of f (t) = sin 2t + sin 3t between t = −2π and t = 2π; the vertical axis runs from −2 to 2.]
The periodicity of this function is evident from the graph, and is confirmed by
the following calculation:
$$f(t + 2\pi) = \sin(2t + 4\pi) + \sin(3t + 6\pi) = \sin 2t + \sin 3t = f(t).$$
Adding extra terms of the form A sin(nt + α), for suitably chosen constants A, n
and α, one could produce a periodic function with a graph to roughly match the
asthma data in Example 1.3 above.
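The periodicity claimed in Example 1.4 can also be checked numerically. The sketch below is not a proof; the sample points and the helper name `max_shift_error` are our own choices. It compares f (t + a) with f (t) at many values of t for two candidate shifts a.

```python
import math

def f(t):
    """The function of Example 1.4."""
    return math.sin(2 * t) + math.sin(3 * t)

# Sample points spread over the interval from -2*pi to 2*pi.
samples = [-2 * math.pi + 0.1 * k for k in range(126)]

def max_shift_error(shift):
    """Largest change in f over the samples when t is shifted by `shift`."""
    return max(abs(f(t + shift) - f(t)) for t in samples)

print(max_shift_error(2 * math.pi))  # close to zero: 2*pi is a period
print(max_shift_error(math.pi))      # clearly non-zero: pi is not a period
```

A shift of π fails because sin 3(t + π) = −sin 3t, so only the sin 2t term is unchanged.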
There are four ways in which one might modify the graph of a periodic function to
produce another graph with the same general shape. These are moving the graph
vertically, moving it horizontally, stretching it vertically and stretching it horizon-
tally. These transformations, which can all be accomplished by simple modifications
of the given function, correspond to changing four quantities that we shall discuss
in a moment: the mean level, the phase, the amplitude and the period. Of special
importance are the functions that can be obtained from the sine function f (t) = sin t
by a combination of transformations of the above kinds; these are precisely the si-
nusoidal functions. They are essential for modelling periodic phenomena, since it
turns out that any continuous periodic function can be approximated to an arbitrary
degree of accuracy by a sum of sinusoidal functions.†
Features of periodic functions
(a) Period : This is the smallest positive number a such that f (t) = f (t+a). As there
is no positive number a smaller than 2π such that sin(t + a) = sin t, the period
of f (t) = sin t is 2π. By contrast, the functions g(t) = sin 2t and h(t) = sin(t/2)
are also periodic, but have different periods: g has period π and h has period 4π.
[Figure: graphs of f (t) = sin t, g(t) = sin 2t and h(t) = sin(t/2) between t = −2π and t = 2π.]
(b) Amplitude: This is the quantity ½(M − m), where M and m are respectively the
maximum and minimum values of the function. As can be seen from the graphs
above, the functions f (t) = sin t, g(t) = sin 2t and h(t) = sin(t/2) all have the
same amplitude, namely 1.
(c) Mean Level : The value halfway between the maximum and minimum values
of the function is sometimes called the mean level. That is, the mean level is
½(M + m), where M is the maximum and m the minimum. Thus M is the mean
level plus the amplitude, while m is the mean level minus the amplitude.
(d) Phase: Values of t that differ by one or more periods are said to represent the
same phase of the rhythm. Technically, the phase of a point t0 is the set of all t
values that differ from t0 by a multiple of the period. Points that differ by one
or more periods are said to be in equal phase; similarly, one refers to the phase
of a peak (turning point).
Note that for sinusoidal functions, but not for all periodic functions, the mean
level coincides with the average value (also called the mean value) of the function.
The formal definition of the average value of a function involves integration, a topic
that we shall study later in this course. But an intuitive understanding of the concept
of average value is easily obtained with the aid of a diagram such as the following.
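[Figure: graph of f (x) for x from 0 to 4, together with the horizontal line y = 0.6; the regions between the graph and this line are labelled a, b, c, d and e.]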
The diagram shows the graph of a certain function f over the interval from x = 0
to x = 4. The total area of the regions below this graph and above the line y = 0.6
is equal to the area of the region above the graph and below y = 0.6. That is, the
sum of the areas marked with the letters a, b, c and d equals that marked with the
letter e (including the part below the x-axis). So we call 0.6 the average value of f .
Note that there is a unique function f that satisfies f (x + 4) = f (x) and has the
graph shown above on the interval from x = 0 to x = 4. This function is periodic,
with period 4. Since the maximum and minimum values this function achieves are 1
and −1 respectively, we see that its amplitude is 1 and its mean level is 0.
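As a rough illustration of these definitions, the following Python sketch estimates the amplitude and mean level of a function from sampled values. The helper name and the test function d + a sin t (with a = 2 and d = 0.5) are assumptions made for the example, and the plain arithmetic mean of the samples stands in for the average value, whose formal definition needs integration.

```python
import math

def amplitude_and_mean_level(values):
    """Return (amplitude, mean level) computed from sampled function values."""
    M, m = max(values), min(values)
    return (M - m) / 2, (M + m) / 2

# Sample d + a*sin(t) at n equally spaced points over one period.
a, d, n = 2.0, 0.5, 1000
values = [d + a * math.sin(2 * math.pi * k / n) for k in range(n)]

amplitude, mean_level = amplitude_and_mean_level(values)
average = sum(values) / n  # numerical stand-in for the average value

print(amplitude, mean_level, average)
```

For this sinusoidal example the mean level and the average agree, as noted above; for a function like the one in the diagram they would differ.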
1.2.2 Modification of Sinusoidal Functions
The most general form for the equation of a sinusoidal function is
$$y = a \cos b(t - c) + d,$$
where t is the variable and a, b, c and d are constants. The values of a, b and d
determine the amplitude, period and mean level, while c determines the phase of the
point t = 0. To be precise, the amplitude is |a|, the mean level is d, and the period
is 2π/|b|. We present some examples to illustrate this.
(a) Consider first the graph of y = a cos t.
[Figure: graphs of y = a cos t (solid) and y = −a cos t (dotted) between t = −2π and t = 2π.]
In this diagram the value of a is positive, and the graph of y = a cos t is drawn
as a solid line. Replacing a by its negative would give the dotted graph.†
We find that y takes the value a at t = 0 and the value −a at t = π. Since cos t
can never be greater than 1 or less than −1 it follows that all values taken by
a cos t lie between a and −a. Thus the absolute value of a is the amplitude of the
function and the mean level is 0. Note also that the period of this function is
always 2π, irrespective of the value of a.
(b) Consider next y = cos bt, where b ≠ 0. Note that since cos bt = cos(−bt) for all t,
there is no loss of generality in assuming that b is positive. (The case b = 0 is
excluded since it corresponds to the horizontal line y = 1.)
[Figure: graph of y = cos bt between t = −2π/b and t = 2π/b.]
† In fact there is no need to ever use negative values of a, since − cos t = cos(t + π).
These integer multiples of 2π/b are the only values of t for which cos bt equals 1.
It follows readily that cos bt has period 2π/b, since for any value of t we have
$$\cos b\left(t + \frac{2\pi}{b}\right) = \cos(bt + 2\pi) = \cos bt.$$
[Figure: graphs of y = d + cos t and y = cos t, with the mean level d and the line y = d − 1 marked.]
Our diagram shows the graphs of y = d + cos t (solid) and y = cos t (broken).
We have used the value d = 0.7. For each t, the difference between d + cos t and
cos t is d; hence the vertical distance between the two graphs is d units, at all
points t. The graph of d + cos t is obtained by shifting the graph of cos t upwards
by d units. This increases the mean level by d, without changing the amplitude,
period or phase.
Knowing the amplitude, period, phase shift and mean level of a sinusoidal function
makes it a trivial matter to sketch the graph. The next example illustrates the
process.
Example 1.5
We show how to sketch the graph of y = 3 cos(2t − 2) + 4. The key to doing this is to
rewrite the formula in the standard form a cos b(t − c) + d; the values of a, b, c and
d then provide all the information needed.
Since clearly 3 cos(2t − 2) + 4 = 3 cos 2(t − 1) + 4, we see that a = 3, b = 2, c = 1
and d = 4. Thus the amplitude is 3 (the value of a), the mean level is 4 (the value
of d) and the period is π (the value of 2π/b). At t = 1 (the value of c) the function
attains its maximum. This maximum is 7, the sum of the amplitude and the mean
level. Knowing that the graph is sinusoidal with period π, amplitude 3 and mean
level 4, the information that it takes the value 7 at t = 1 determines it uniquely, and
sketching it becomes a trivial task.
[Figure: graph of y = 3 cos(2t − 2) + 4 for t from −3π to 3π; the vertical axis runs from −1 to 7.]
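The bookkeeping in Example 1.5 is easy to mechanise. The Python sketch below reads off the features of y = a cos b(t − c) + d from the four constants; the function name and the dictionary keys are our own, and the code assumes a > 0 and b ≠ 0.

```python
import math

def sinusoid_features(a, b, c, d):
    """Features of y = a*cos(b*(t - c)) + d, assuming a > 0 and b != 0."""
    return {
        "amplitude": abs(a),
        "period": 2 * math.pi / abs(b),
        "mean_level": d,
        "maximum": d + abs(a),
        "minimum": d - abs(a),
        "peak_at": c,  # valid because a > 0: the cosine is largest when t = c
    }

features = sinusoid_features(3, 2, 1, 4)
y_at_peak = 3 * math.cos(2 * 1 - 2) + 4  # the original formula evaluated at t = 1

print(features)
print(y_at_peak)
```

Evaluating the original formula at t = 1 gives a quick check that the rewritten form a cos b(t − c) + d describes the same function.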
Example 1.6
Find a formula for a sinusoidal function with amplitude 2 and period π/2.
Solution: Recall that the general sinusoidal function y = a cos b(t − c) + d has
amplitude a and period 2π/b. So we require a = 2 and 2π/b = π/2. This gives b = 4.
The values of c and d do not affect the amplitude and period; so they can be chosen
arbitrarily. Thus for any real numbers c and d the formula
y = 2 cos 4(t − c) + d
determines a sinusoidal function with amplitude 2 and period π/2. There are in-
finitely many functions satisfying the requirements.
If we choose c = 0 and d = 0 then we obtain the simple formula y = 2 cos 4t. An-
other solution is y = 2 cos(4t−π/2); this is given by putting d = 0 and c = π/8. Since
cos(θ − π/2) = sin θ, this second solution can be written more simply as y = 2 sin 4t.
The fact that cos(t − π/2) = sin t for all t corresponds to a property of the graphs of
y = cos t and y = sin t that we have already observed, namely that the graph of
sin t is the graph of cos t shifted π/2 units to the right. It also shows us that any
expression involving cos can be easily reformulated in terms of sin. One consequence
of this is that y = a sin b(t − c) + d would serve just as well as y = a cos b(t − c) + d
as a formula for the general sinusoidal function. Indeed, y = a sin b(t − c) + d still
has amplitude a, period 2π/b and mean level d; the conversion from one form to the
other is simply a matter of changing the value of c. For y = a sin b(t − c) + d the
value the function takes at t = c is the mean level d (rather than the maximum value
a + d, as in the other case).†
1.2.3 Combining sinusoidal functions having the same period
If f and g are sinusoidal functions that have the same period then their sum is also
sinusoidal. So, for example, any expression of the form a sin x + b cos x (where a and b
are constants) can be expressed more simply as a modified sine function, R sin(x + α)
(where R and α are constants). In order to do this, we would want to determine R
and α, given a and b.
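By a standard trigonometric formula (see Appendix 3),
$$R \sin(x + \alpha) = R \cos\alpha \, \sin x + R \sin\alpha \, \cos x.$$
Comparing this with a sin x + b cos x, we require
$$a = R \cos\alpha \tag{2}$$
and
$$b = R \sin\alpha. \tag{3}$$
Squaring and adding these equations gives a² + b² = R²(cos² α + sin² α) = R²,
so that
$$R = \sqrt{a^2 + b^2}.$$
The equations (2) and (3) above now give
$$\cos\alpha = \frac{a}{\sqrt{a^2 + b^2}}$$
and
$$\sin\alpha = \frac{b}{\sqrt{a^2 + b^2}}.$$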
Example 1.7
Suppose we wish to write sin x + cos x in the form R sin(x + α).
We have R = √(1² + 1²) = √2 and sin α = cos α = 1/√2. Since sin α and cos α
are both positive, α lies between 0 and π/2 (the first quadrant). In this interval the
unique solution of sin α = 1/√2 is α = π/4. Hence
$$\sin x + \cos x = \sqrt{2} \, \sin(x + \pi/4).$$
† Since we always insist that a > 0, values of t that are slightly greater than c yield
function values slightly greater than d. In other words, the function is increasing at t = c.
We can use the steps outlined in Example 1.5 to sketch the graph of y = sin x + cos x.
[Figure: graph of y = sin x + cos x for x from −2π to 2π, oscillating between −√2 and √2.]
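The conversion of a sin x + b cos x to R sin(x + α) can be sketched in Python as follows. Note one substitution: instead of locating α by inspecting the signs of sin α and cos α, as in Example 1.7, the code uses the standard library function `math.atan2`, which returns the angle in the correct quadrant directly. The function name `to_single_sine` is our own.

```python
import math

def to_single_sine(a, b):
    """Return (R, alpha) such that a*sin(x) + b*cos(x) = R*sin(x + alpha)."""
    R = math.hypot(a, b)      # sqrt(a^2 + b^2)
    alpha = math.atan2(b, a)  # angle with cos(alpha) = a/R and sin(alpha) = b/R
    return R, alpha

R, alpha = to_single_sine(1, 1)

# Spot check of the identity at an arbitrary point.
x = 0.3
lhs = math.sin(x) + math.cos(x)
rhs = R * math.sin(x + alpha)

print(R, alpha)
print(lhs, rhs)
```

For a = b = 1 this reproduces R = √2 and α = π/4.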
Note that the sum of two periodic functions that have the same period always
gives another periodic function; this is quite easy to see.† It is not at all obvious
that if the two given functions are sinusoidal then so is their sum, but this is what
we have shown above. It is important to realize, however, that if the given sinusoidal
functions do not have the same period then their sum will not be sinusoidal.
Functions of the form
$$f(t) = a_0 + \sum_{k=1}^{n} \left( a_k \cos kt + b_k \sin kt \right)$$
are called trigonometric polynomials. Since we have sin k(t + 2π) = sin kt and
cos k(t + 2π) = cos kt whenever k is an integer, it follows that f (t + 2π) = f (t)
for all t. By choosing the coefficients ak and bk appropriately, it is possible to find
a trigonometric polynomial that closely approximates any given continuous periodic
function of period 2π. The process of fitting a trigonometric polynomial to given
periodic data is called Fourier analysis, a topic that is beyond the scope of these
notes.
† To be more exact, the common period is a multiple of the period of the sum. For
example sin 2x + sin x and sin 2x − sin x both have period 2π; their sum has period π.
§1.3 Scaling Data
1.3.1 Proportionality
The notion of proportionality pervades the life sciences. For instance, the amount A
of muscle action of an animal is roughly proportional to its body mass M . We write
A ∝ M.
This means that there is a constant k, called the constant of proportionality, such
that the equation
A = kM
holds (closely enough for practical purposes) for all animals of a given species. Sim-
ilarly, the body heat H generated by the animal is proportional to muscular action.
That is,
H = aA,
for some constant of proportionality a. Notice that as H ∝ A and A ∝ M , we
have H ∝ M (since H = aA = a(kM ) = (ak)M ). Throughout this course we
shall encounter situations where variables are in proportion. It is also common to
find situations where one variable is proportional to some simple function of another
variable, such as its square or its logarithm. For instance, the surface area of a guinea
pig is proportional to the square of its length:
S ∝ L².
t (hours) 0 1 2 3 4
y (cm) 1.5 6.1 10.7 15.3 19.9
Table 1
hourly intervals. Such a table defines a function, which we call a tabulated function,
giving the values of the dependent variable (y in this case) in terms of the independent
variable (here t).
The aim is usually to find a formula that fits the tabulated values, the hope being
that the formula can be used to reliably forecast subsequent values of the dependent
variable. In this section we look at two possible procedures for finding such formulas.
Example 1.8
Find an explicit formula to fit the experimental data in the following table.
t 1 2 3 4 5 6
y 0.7 5.6 9.6 19.4 31 42.1
Table 2
Observe that for most values of t the corresponding value of y is slightly greater than t².
This might lead one to guess that y is proportional to t², with a constant of
proportionality k that is slightly greater than 1. This theory can be tested easily by plotting
y against T = t². If the theory is correct the data points will approximately lie on a
straight line through the origin with slope k.
Expressed in terms of T our data is as follows.
T 1 4 9 16 25 36
y 0.7 5.6 9.6 19.4 31 42.1
Table 3
When this is plotted on graph paper, all the points approximately trace out a
straight line, passing through (0, 0). This tells us that we should approximate the
data with a straight line, which we call the line of best fit. Due to experimental error,
the best fitting straight line may not pass exactly through all the points (or indeed
pass exactly through any of them). However it must be close to all the points.
[Figure: the data of Table 3 plotted with y against T, together with the line of best fit through the origin.]
The slope of this line can be determined by applying the formula (y2 − y1 )/(x2 − x1 ),
where (x1 , y1 ) and (x2 , y2 ) are any two points on the line of best fit. Using the points
(20, 24) and (10, 12) gives
$$\text{slope} \cong \frac{12 - 24}{10 - 20} = 1.2.$$
So the equation of the line of best fit is y = 1.2T , and the original data satisfies
$$y \cong 1.2t^2.$$
If we calculate the slope using actual data points, for example the points (1, 0.7)
and (16, 19.4), we may get a different result, i.e. using these two points we get that
the slope is
$$\frac{19.4 - 0.7}{16 - 1} \approx 1.25.$$
Notice that to calculate the slope of the line accurately it was necessary to use points
on the line of best fit rather than taking points from the data table. Estimating the
line of best fit visually is not an exact process, and you and a friend may have slightly
differing answers for the same data. It is important to plot the data carefully, so that
any error is small. It is also a good strategy to use points that are reasonably far
apart to compute the slope.
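As a cross-check on the slope read from the graph, one can compute a slope numerically. The Python sketch below uses a least-squares fit of a line through the origin, which is a different technique from the visual estimate described above; it is included only to show that the data of Table 3 are consistent with a slope near 1.2.

```python
# Data from Table 3: T = t^2 and the measured y.
T = [1, 4, 9, 16, 25, 36]
y = [0.7, 5.6, 9.6, 19.4, 31, 42.1]

# Least-squares slope of a line through the origin, y = k*T:
# minimising sum((y_i - k*T_i)^2) gives k = sum(T_i*y_i) / sum(T_i^2).
k = sum(Ti * yi for Ti, yi in zip(T, y)) / sum(Ti ** 2 for Ti in T)

print(round(k, 2))  # 1.19
```

Because every data point contributes, this estimate is less sensitive to the choice of two particular points than the slope formula used above.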
1.3.3 Linearization
In Example 1.8 above we chose to plot one of the variables against the square of the
other in the hope that doing so would produce a straight line through the origin,
and, luckily, it did. If we had obtained a straight line that did not pass through the
origin we would still have been able to determine the relationship between T and y,
and hence the relationship between t and y, since it is always straightforward to
determine the equation of a straight line graph. On the other hand, if we plot some
experimentally obtained data points and find that they do not lie on a straight line
graph, the relationship between the variables generally remains unclear. To illustrate
this point, consider the result of plotting the data points from Table 2 of Example 1.8.
[Figure: the data of Table 2 plotted with y against t.]
The points clearly do not lie on a straight line, and hence we cannot immediately say
what the relationship is between y and t.
If the graph of a dependent variable y against an independent variable x turns
out to be a straight line then we say that x and y satisfy a linear relationship. It
means that y = mx + c for some constants m and c that are easily obtained from the
graph. If x and y do not satisfy a linear relationship then we can try introducing new
variables X and Y , given by some simple functions of x and y respectively, in the
hope that X and Y will satisfy a linear relationship. This procedure, if successful, is
known as linearization of the data. The all important question, of course, is how to
define the new variables.
There is no general rule, guaranteed to work in all cases, for deciding what the
new variables should be. It depends on the data in question. Generally speaking, one
needs some extra information that gives one some idea of what kind of relationship
might hold.
If one is able to find a transformation that linearizes a set of data, there is a
reasonable procedure for finding an equation to fit the data. It is as follows.
(i) Transform the data from the old variables, say x and y, to new variables, say
X and Y . Here the values of X should be determined by some function of x
and the values of Y by some function of y. (It may be adequate to use one new
variable in conjunction with one of the old variables. In other words, we may
find it convenient to take X = x or Y = y.) We assume that X and Y have a
linear relationship.
(ii) From the graph of the data for X and Y , determine the linear equation in X
and Y . That is, find values for a and b such that Y = a + bX.
(iii) Transform the equation in X and Y to an equivalent equation in x and y.
The only difficulty is the first step; that is, finding a suitable transformation. Fortu-
nately, there are two frequently occurring classes of relationships for which standard
linearizing transformations are available. These are as follows.
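(a) Power laws. These are relationships of the form
$$y = Ax^b,$$
where A and b are constants that depend on the context. For example, the sizes
of two different parts of a living organism are often related by a power law.
(b) Exponential laws. These are relationships of the form
$$y = Ae^{kx},$$
where A and k are constants.
If it is suspected that the relationship y = f (x) is a power law, so that y = Ax^b,
then, taking logarithms of both sides and using some standard properties of
logarithms (see Appendix 2), we find that
$$\begin{aligned} \ln y &= \ln(Ax^b) \\ &= \ln A + \ln x^b \\ &= \ln A + b \ln x. \end{aligned}$$
Writing Y = ln y and X = ln x, this becomes the linear relationship
$$Y = \ln A + bX.$$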
Example 1.9
Find a formula y = f (x) for the function defined by the following data.
[Figure: plot of Y against X, with the line of best fit; X runs from 0 to 2 and Y from 0 to 3.]
Since the points (1.6, 3) and (0.1, 0) lie on the best fitting straight line as drawn
in our diagram, we can use these points to find the slope. The result is that
$$\text{slope} = b = \frac{3}{1.5} = 2,$$
so that Y = a + 2X. Now since (0.1, 0) lies on the line, we find that
$$0 = a + 0.2,$$
giving a = −0.2. Therefore,
$$Y = 2X - 0.2.$$
We must now express this relationship in terms of the original variables x and y.
Since Y = ln y and X = ln x,
$$\ln y = 2 \ln x - 0.2,$$
and using the formulas in Appendix 2 once more,
$$\begin{aligned} y &= e^{2 \ln x - 0.2} \\ &= e^{2 \ln x} \times e^{-0.2} \\ &= e^{\ln x^2} \times e^{-0.2} \\ &= x^2 \times e^{-0.2} \\ &\approx 0.8x^2. \end{aligned}$$
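The last two steps of Example 1.9 (finding the line, then transforming back) can be sketched in Python. The function name `power_law_from_line` is our own; the two points are the ones read from the line of best fit above.

```python
import math

def power_law_from_line(p1, p2):
    """Given two points (X, Y) on the line Y = ln A + b*X, return (A, b)."""
    (X1, Y1), (X2, Y2) = p1, p2
    b = (Y2 - Y1) / (X2 - X1)  # slope of the log-log line
    ln_A = Y1 - b * X1         # Y-intercept of the line
    return math.exp(ln_A), b   # undo the logarithm to recover A

A, b = power_law_from_line((0.1, 0), (1.6, 3))
print(round(A, 2), round(b, 2))
```

The returned pair corresponds to the power law y = Ax^b, here approximately y = 0.8x².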
Semi-log plots
If it is suspected that the relationship y = f (x) is exponential, so that
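$$y = Ae^{kx} \tag{2}$$
for some constants A and k, then, again taking logarithms, we find that
$$\begin{aligned} \ln y &= \ln A + \ln e^{kx} \\ &= \ln A + kx. \end{aligned}$$
Writing Y = ln y, this becomes the linear relationship
$$Y = \ln A + kx.$$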
Example 1.10
The alcoholic content, y mg/ml, of a person’s blood t hours after drinking whiskey
rose to 0.22 mg/ml and then slowly decreased, as shown in Table 6.
[Figure: plot of Y = ln y against t, with the line of best fit; the vertical axis runs from −0.5 down to −3.]
Since the line of best fit passes through the points (0, −1.5) and (3, −3), its slope
k is given by
$$k = \frac{-3 - (-1.5)}{3 - 0} = -0.5.$$
Furthermore, the Y -intercept is −1.5; so the equation relating Y and t is
Y = −0.5t − 1.5.
Thus ln y = −0.5t − 1.5, and it follows that
$$\begin{aligned} y &= e^{-0.5t - 1.5} \\ &= e^{-0.5t} \times e^{-1.5} \\ &\approx 0.22e^{-0.5t}. \end{aligned}$$
Note that $e^{-0.5t} = (e^{-0.5})^t \approx (0.61)^t$; so the above formula can also be written as
$y \approx (0.22) \times (0.61)^t$. However, it is usual to leave it in the form $y = 0.22e^{-0.5t}$.
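A corresponding sketch for the semi-log case of Example 1.10 is given below. Again the function name is our own, and the two points are those read from the line of best fit. The final line illustrates the remark that the model can be written either with base e or with base e^k.

```python
import math

def exponential_from_line(p1, p2):
    """Given two points (t, Y) on the line Y = ln A + k*t, return (A, k)."""
    (t1, Y1), (t2, Y2) = p1, p2
    k = (Y2 - Y1) / (t2 - t1)  # slope of the semi-log line
    ln_A = Y1 - k * t1         # Y-intercept of the line
    return math.exp(ln_A), k

A, k = exponential_from_line((0, -1.5), (3, -3))
base = math.exp(k)  # y = A * base**t is the same model as y = A * e^(k*t)

print(round(A, 2), k, round(base, 2))
```

Only Y is a logarithm here; the horizontal variable t is used untransformed, which is what distinguishes a semi-log plot from a log-log plot.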
b = 10 log(I/I₀).
Thus the difference in magnitude of two earthquakes is the logarithm of the ratio of
their intensities. Equivalently, it is the logarithm of the ratio of the amplitudes that
a seismograph would record at the same distance from the epicentres.
For example, an earthquake that measures 5 on the scale is 10⁵ times as intense
as one with intensity I₀ (since 5 = log₁₀(I/I₀) if and only if I/I₀ = 10⁵). Similarly,
an earthquake that measures 6 on the scale has intensity 10⁶I₀, and so is 10 times as
intense as one with a Richter reading of 5. Indeed, an increase of 1 on the Richter
scale always corresponds to an increase in intensity by a factor of 10. Thus, an
increase of 2 on the scale corresponds to a 100-fold increase in intensity, and so on.
All logarithmic scales have this property that an increase of one unit on the scale
corresponds to multiplying the quantity being measured by a fixed factor.
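The fixed-factor property of a logarithmic scale can be illustrated with a short Python sketch. The function name `intensity_ratio` is our own, and the code assumes the relationship used above, in which a Richter reading is log₁₀(I/I₀).

```python
def intensity_ratio(magnitude_1, magnitude_2):
    """How many times more intense quake 1 is than quake 2 (Richter readings)."""
    # Each reading is log10(I / I0), so I1 / I2 = 10 ** (magnitude_1 - magnitude_2).
    return 10 ** (magnitude_1 - magnitude_2)

print(intensity_ratio(6, 5))  # 10
print(intensity_ratio(7, 5))  # 100
```

Only the difference of the two readings matters; the reference intensity I₀ cancels.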
(c) pH factor
Of fundamental importance to any living organism is the level of hydrogen ion
concentration, [H⁺], of its environment. The value of [H⁺] determines whether a substance
is classified as alkaline, neutral or acidic, and it can range from a minimum of 10⁻¹²
(for the most alkaline solutions) to a maximum of 10⁻² (for the most acidic). Distilled
water (neutral) has an [H⁺] value of 10⁻⁷.
When comparing the [H⁺] values of two substances, it is not the difference of
the values that is important, but their ratio. To see this, let A, B and C denote,
respectively, the [H⁺] values for the strongest acid, for distilled water and for the
strongest alkali. Thus A = 10⁻², while B = 10⁻⁷ and C = 10⁻¹². Observe that
$$\frac{B}{C} = 10^5 = \frac{A}{B}.$$
By contrast, B − C ≈ 0.0000001 is infinitesimal in comparison with A − B ≈ 0.01.
This shows us that use of a logarithmic scale is appropriate.
The so-called pH factor is defined to be the negative of the logarithm of [H⁺]:
$$\text{pH} = -\log_{10} [\text{H}^+].$$
Example 1.11
Suppose that two substances, S₁ and S₂, have pH factors 5.5 and 8.7 respectively.
How does the [H⁺] concentration in S₁ compare with that in S₂?
In S₁ we have $[\text{H}^+] = 10^{-5.5}$, while in S₂ we have $[\text{H}^+] = 10^{-8.7}$. Since
$$\frac{10^{-5.5}}{10^{-8.7}} = 10^{3.2} \approx 1600,$$
the hydrogen ion concentration of S₁ is about 1600 times that of S₂.
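The calculation in Example 1.11 can be reproduced with a few lines of Python. The two function names are our own; they simply apply the definition pH = −log₁₀[H⁺] in each direction.

```python
import math

def hydrogen_ion_concentration(pH):
    """Invert pH = -log10([H+]) to recover [H+]."""
    return 10 ** (-pH)

def pH_factor(concentration):
    """pH factor of a solution with the given [H+] value."""
    return -math.log10(concentration)

ratio = hydrogen_ion_concentration(5.5) / hydrogen_ion_concentration(8.7)
print(round(ratio))           # about 1600
print(pH_factor(10 ** -7))    # distilled water
```

As with the Richter scale, the ratio of the two concentrations depends only on the difference of the pH values.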
2. How much more intense is an earthquake that measures 6 on the Richter scale
than one that measures 5.5?
1.4 Finite Differences 21
3. For each of the following sets of data, plot ln y against ln x, and, if necessary, ln y
against x, to determine whether an equation of the form $y = ax^b$ or $y = ae^{kx}$
satisfies the data. Find the equation in each case.
(i ) x 1 2 3 4 5
y 3 4.1 5.4 5.9 6.7
(iii ) x 10 20 30 40 50
y 4.9 8.0 10.4 11.3 13.9
(iv ) x 0 1 2 3 4
y 19 14 8 7 4
4. (i ) Show that the transformation X = √x linearizes the following data.
(ii ) Show that the transformation X = 1/x linearizes the following data.
x 1 2 3 4 5 6
y 1 6.1 7.6 8.3 8.9 9.6
Example 1.12
Suppose that an experiment records the following data.
x      f (x)    ∆f (x)
0.5    1.9
                0.6
1.0    2.5
                0.6
1.5    3.1
                0.6
2.0    3.7
                0.6
2.5    4.3
                0.6
3.0    4.9
Table 10
† The domain of a function is the set of points at which it is defined.
x         f (x)         ∆f (x)
a         f (a)
                        ∆f (a)
a + h     f (a + h)
                        ∆f (a + h)
a + 2h    f (a + 2h)
                        ∆f (a + 2h)
a + 3h    f (a + 3h)
Table 11
where ∆f (x) = f (x + h) − f (x).
Geometrically, if ∆f (x) is constant, then, since the values of x are a constant
distance h apart, the data points will lie on a straight line when they are plotted.
[Figure: the data points (a, f(a)), (a + h, f(a + h)) and (a + 2h, f(a + 2h)) lying on a straight line, with the equal rises ∆f(a) and ∆f(a + h) marked.]
Thus we should be able to find a first degree polynomial, f (x) = a0 + a1 x, that fits
the data. We shall see how to do this in the next section.
The regularity in the differences of the dependent variable values may not always
be as obvious as in our previous example.
Example 1.13
Suppose that an experiment records the following data.
x 0 1 2 3 4 5 6
f (x) 1.4 0.6 2.2 6.2 12.6 21.4 32.6
Table 12
Again the data is tabulated with constant increments of the independent variable
x, but this time the corresponding changes in the dependent variable are not constant.
Indeed, the first difference function ∆f takes the values shown in the next table.
x 0 1 2 3 4 5
∆f (x) −0.8 1.6 4.0 6.4 8.8 11.2
Table 13
If we now construct ∆(∆f ), the difference function for ∆f , we obtain the following
table.
x 0 1 2 3 4
∆ (∆f (x)) 2.4 2.4 2.4 2.4 2.4
Table 14
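Tables 13 and 14 can be reproduced mechanically from Table 12. The sketch below is illustrative only: the helper name `differences` is an assumption made here, and each difference is rounded to 10 decimal places purely to suppress floating-point noise.

```python
def differences(values):
    """Return the list of successive differences values[i+1] - values[i]."""
    # Rounding hides tiny floating-point errors such as 2.4000000000000004.
    return [round(b - a, 10) for a, b in zip(values, values[1:])]


f_values = [1.4, 0.6, 2.2, 6.2, 12.6, 21.4, 32.6]  # Table 12, x = 0, 1, ..., 6

first = differences(f_values)   # values of the first difference function
second = differences(first)     # values of the difference function of the first differences

print(first)
print(second)
```

Each application of `differences` shortens the list by one entry, which is why Table 13 has six entries and Table 14 has five.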
24 Chapter 1 Curve Fitting
We call the function ∆(∆f ) the second difference function of f , and we use the
notation \(\Delta^{2} f\). Analogous functions \(\Delta^{3} f\), and so on, can also be constructed.
The given data in this example exhibits a new kind of regularity: constant second
differences. The fact that ∆(∆f ) is constant means that a first degree polynomial
can be found that fits the values of ∆f . We shall see that this enables us to find a
polynomial of degree 2 that fits the values of f . The procedure will be explained in
the next section.
The values of f , ∆f and \(\Delta^{2} f\) are generally more conveniently displayed in a
difference table as follows.
a        f(a)
                    ∆f(a)
a + h    f(a + h)
...      ...        ...
a + h     f(a + h)
                      ∆f(a + h)
a + 2h    f(a + 2h)
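\[ D = \Delta f(a) = \Delta f(a + h) = \Delta f(a + 2h) = \cdots, \]
\[ \begin{aligned}
f(a + 2h) &= f(a + h) + \Delta f(a + h) \\
          &= f(a + h) + D \\
          &= (f(a) + \Delta f(a)) + D \\
          &= f(a) + 2D,
\end{aligned} \]
and similarly, for all integers k ≥ 0,
\[ f(a + kh) = f(a) + kD. \qquad (1) \]
Putting x = a + kh, so that k = (x − a)/h, this becomes
\[ f(x) = f(a) + \frac{D}{h}(x - a). \qquad (2) \]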
Moreover, we know that Eq. (2) holds for all values of x in the table, since these x
values all have the form a + kh for nonnegative integers k. Since the right hand side
of Eq. (2) is clearly a linear function (since f (a), a, D and h are constants) we have
found a linear equation that is satisfied by the data.
For the data in Example 1.12 above, the constant value of the first differences
is 0.6, and the other constants appearing in Eq. (2) are a = 0.5, h = 0.5 and
f (a) = 1.9. So a first degree polynomial formula satisfied by the data is
\[ \begin{aligned}
f(x) &= 1.9 + \frac{0.6}{0.5}(x - 0.5) \\
     &= 1.9 + 1.2(x - 0.5) \\
     &= 1.2x + 1.3.
\end{aligned} \]
We could check directly that the data does indeed satisfy this first degree (linear)
polynomial. To do this, substitute in all the values of x and check that the answer
always agrees with the table.
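The direct check described here is easy to automate. The following is a minimal sketch; the names `xs`, `fs`, `linear_fit` and `mismatches` are illustrative, and a small tolerance is used because decimal values are not stored exactly in floating point.

```python
# Data of Example 1.12 (Table 10).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
fs = [1.9, 2.5, 3.1, 3.7, 4.3, 4.9]


def linear_fit(x):
    """First degree polynomial f(x) = 1.2x + 1.3 found in the text."""
    return 1.2 * x + 1.3


# Collect every tabulated point that the formula fails to reproduce.
mismatches = [(x, y) for x, y in zip(xs, fs) if abs(linear_fit(x) - y) > 1e-9]
print(mismatches)  # prints [] because every point agrees
```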
Thus, with the first difference function constant, we have found a linear polyno-
mial satisfied by the data. This may appear to be a rather formal procedure for such
a simple case, but the method extends easily to higher degree polynomials.
Second Degree Polynomials
If the second differences of a tabulated function f are constant, one can find a second
degree polynomial that fits the data. The algebra is a little harder than in the previous
case, but still routine.
We start with the observation that since the first difference function has constant
first differences, the theory developed above applies to it. In particular, by Eq. (1)
with ∆f in place of f ,
∆f (a + kh) = ∆f (a) + kD (3)
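for all integers k ≥ 0, where D is the constant value of the function \(\Delta^{2} f\).
Now, looking at the difference table as before, we find that
\[ f(a + h) = f(a) + \Delta f(a), \qquad (4) \]
and similarly
\[ f(a + 2h) = f(a + h) + \Delta f(a + h). \]
Putting k = 1 in Eq. (3) provides a formula for ∆f (a + h), and combining this with
the formula for f (a + h) in Eq. (4) we deduce that
\[ f(a + 2h) = f(a) + 2\Delta f(a) + D. \]
Similarly,
\[ f(a + kh) = f(a) + k\,\Delta f(a) + n_k D, \quad \text{where} \quad n_k = 0 + 1 + \cdots + (k - 1) = \tfrac{1}{2}k(k - 1). \]
So
\[ f(a + kh) = f(a) + k\,\Delta f(a) + \tfrac{1}{2}k(k - 1)\,\Delta^{2} f(a). \]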
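Defining x = a + kh (and hence k = (x − a)/h), this equation becomes
\[ \begin{aligned}
f(x) &= f(a) + \frac{(x - a)}{h}\,\Delta f(a) + \frac{1}{2}\,\frac{(x - a)}{h}\left(\frac{(x - a)}{h} - 1\right) D \\
     &= f(a) + \frac{1}{h}\,\Delta f(a)(x - a) + \frac{1}{2h^{2}}\,D\,(x - a)(x - a - h).
\end{aligned} \]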
This is a quadratic (or second degree) polynomial that is satisfied by the data (since
a, h, D, f (a) and ∆f (a) are all constants).
For the data in Example 1.13 above, the constant value of the second differences
is D = 2.4, and the other constants appearing in this formula are a = 0, h = 1,
f (a) = f (0) = 1.4 and ∆f (a) = ∆f (0) = −0.8. So a second degree polynomial
formula satisfied by the data is
\[ \begin{aligned}
f(x) &= 1.4 + (-0.8)x + \tfrac{1}{2}(2.4)\,x(x - 1) \\
     &= 1.4 - 2x + 1.2x^{2}.
\end{aligned} \]
Example 1.14
Problem: find a polynomial of minimal degree that fits the data below.
As the first differences are not constant there is no suitable first degree polynomial.
However, the second differences are constant, and so it is possible to find a suitable
second degree polynomial. As was shown above, the required polynomial is given by
the formula
\[ f(x) = f(a) + \frac{1}{h}\,\Delta f(a)(x - a) + \frac{1}{2h^{2}}\,\Delta^{2} f(a)\,(x - a)(x - a - h), \]
where h is the interval of tabulation. It is usual to take a to be the first of the x
values in the table; however, so long as the values of f (a) and ∆f (a) appear in the
table, we could take a to be any of the x values. (They all give the same answer.)
Here h = 0.5, and if we take a = 0 then f (a) = −0.5, ∆f (a) = 0.375 and
\(\Delta^{2} f(a) = 0.750\). Thus the polynomial we seek is
\[ \begin{aligned}
f(x) &= -0.5 + \frac{0.375}{0.5}\,x + \frac{0.750}{2(0.5)^{2}}\,x(x - 0.5) \\
     &= 1.5x^{2} - 0.5.
\end{aligned} \]
For interest's sake, let us check that taking a = 1 leads to the same answer. This
time we have f (a) = 1 and ∆f (a) = 1.875; so the formula gives
\[ \begin{aligned}
f(x) &= 1 + \frac{1.875}{0.5}(x - 1) + \frac{0.750}{2(0.5)^{2}}(x - 1)(x - 1.5) \\
     &= 1 + (3.75x - 3.75) + 1.5(x^{2} - 2.5x + 1.5) \\
     &= (1 - 3.75 + 2.25) + (3.75 - 3.75)x + 1.5x^{2} \\
     &= 1.5x^{2} - 0.5,
\end{aligned} \]
as before.
The procedure we used to derive the formula for the second degree polynomial
when the second differences are constant extends naturally to the case of constant
third differences, in which case a third degree polynomial is obtained, and to constant
fourth differences, giving a degree 4 polynomial, and so on.
The formulae for the first three cases are as follows.
First degree:
\[ f(x) = f(a) + \frac{(x - a)}{h}\,\Delta f(a) \]
Second degree:
\[ f(x) = f(a) + \frac{(x - a)}{h}\,\Delta f(a) + \frac{(x - a)(x - a - h)}{2h^{2}}\,\Delta^{2} f(a) \]
Third degree:
\[ f(x) = f(a) + \frac{(x - a)}{h}\,\Delta f(a) + \frac{(x - a)(x - a - h)}{2h^{2}}\,\Delta^{2} f(a) + \frac{(x - a)(x - a - h)(x - a - 2h)}{6h^{3}}\,\Delta^{3} f(a) \]
This generalizes to higher degrees in a natural way. Indeed, for each degree n,
the formula for the polynomial of degree n is obtained from the one of degree n − 1
by adding the extra term
\[ \frac{(x - a)(x - a - h)\cdots(x - a - (n - 1)h)}{n!\,h^{n}}\,\Delta^{n} f(a). \]
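The pattern in the first, second and third degree formulae can be sketched in code. This is an illustration under stated assumptions rather than a procedure taken from these notes: the names `forward_differences` and `difference_polynomial` are hypothetical, the term of degree n is taken to have denominator n! hⁿ (matching the denominators h, 2h² and 6h³ above), and the caller is assumed to know the degree at which the differences become constant.

```python
from math import factorial


def forward_differences(values):
    """Return [f(a), Δf(a), Δ²f(a), ...] from equally spaced tabulated values."""
    leading = []
    row = list(values)
    while row:
        leading.append(row[0])                        # first entry of each column
        row = [b - a for a, b in zip(row, row[1:])]   # next difference column
    return leading


def difference_polynomial(a, h, values, degree):
    """Build the polynomial of the given degree from the difference formula."""
    leading = forward_differences(values)

    def p(x):
        total = 0.0
        product = 1.0  # running product (x - a)(x - a - h)...; empty for n = 0
        for n in range(degree + 1):
            total += product * leading[n] / (factorial(n) * h ** n)
            product *= x - a - n * h
        return total

    return p


# Example 1.13: constant second differences, so degree 2 is used.
data = [1.4, 0.6, 2.2, 6.2, 12.6, 21.4, 32.6]
quadratic = difference_polynomial(0.0, 1.0, data, 2)
print([round(quadratic(x), 10) for x in range(7)])
```

For the data of Example 1.13 the function `quadratic` agrees with 1.4 − 2x + 1.2x², so the printed values reproduce the tabulated values of f.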
1. Construct difference tables and find polynomials to fit the following sets of data.
(i )
x 0 2 4 6 8
f (x) 17 13 9 5 1
(ii )
x 1 1.2 1.4 1.6 1.8 2.0
f (x) 3 3.8 4.6 5.4 6.2 7
(iii )
x −3 −2 −1 0 1 2 3
f (x) 4 −5 −10 −11 −8 −1 10
(iv )
x 2 2.1 2.2 2.3 2.4 2.5
f (x) 7 7.41 7.84 8.29 8.76 9.25
(v )
x 0 10 20 30 40
f (x) 3 1053 8303 27753 65403
(vi )
x 1 2 3 4 5 6
f (x) 0.25 2 6.75 16 31.25 54
Chapter 2
Optimisation
In this section we will be concerned with finding maximum and minimum values
of various functions. Problems of this type arise quite frequently. For example,
how much fertilizer applied to a particular crop will produce the greatest yield?
Which cylindrical container, made from a given amount of material, has the greatest
volume? How should the wings of an aeroplane be shaped to maximize lift? These
are all optimisation problems.
In mathematical terminology, solving an optimisation problem means finding
the maximum or minimum value of some quantity f (x) defined for given values of x.
We call the values of x for which the function f (x) is defined the domain of f .
We will for example consider problems where f (x) is defined whenever a ≤ x ≤ b
(where the numbers a and b depend on the context). The set of all x satisfying
a ≤ x ≤ b is called the closed interval [a, b]. Note that the set [a, b] contains the
points a, b (which are called the endpoints of the interval) and it is possible that the
maximum or minimum value of f (x) on [a, b] occurs at these endpoints. We will also
consider problems where f (x) is defined whenever a < x < b, e.g. 1 < x < 6 or
−∞ < x < ∞. The set of all x satisfying a < x < b is called the open interval (a, b).
Since the interval (a, b) does not contain the points a, b we do not need to consider
these numbers when searching for maximum and minimum values of f (x) on (a, b).
For example on the interval −1 ≤ x ≤ 6, or [−1, 6], the function f (x) = 2x has
minimum value f (−1) = −2 and maximum value f (6) = 12.
[Figure: graph of f (x) = 2x on [−1, 6], with the minimum at x = −1 and the maximum at x = 6 marked.]
On the interval −1 < x < 6, or (−1, 6), the function f (x) = 2x does not attain
a minimum or maximum value.
2.1 One Variable Problems 31
[Figure: graph of f (x) = 2x on (−1, 6), with both endpoints excluded.]
On the interval −1 < x ≤ 6, which we also write as (−1, 6], the function
f (x) = 2x does not attain a minimum value but has the maximum value f (6) = 12.
[Figure: graph of f (x) = 2x on (−1, 6], with the maximum at x = 6 marked.]
On the interval −∞ < x < ∞, or (−∞, ∞), the function f (x) = 2x does not
attain a maximum or minimum value.
[Figure: graph of f (x) = 2x on (−∞, ∞).]
32 Chapter 2 Optimisation
We shall always assume that the function is differentiable at all points in [a, b],
with (possibly) a finite number of exceptions, and continuous everywhere. The points
at which f (x) is not differentiable are those at which the graph of f (x) either has no
tangent, or has a vertical tangent. When searching for minimum or maximum values
of a function y = f (x), we must check the values of the function at any endpoints
which are included in its domain. To find the candidate points at which the maximum
or minimum value may occur, we need to consider the derivative of the function.
2.1.1 The Derivative
Suppose that we have an equation y = f (x) that describes the relationship between
x and y, and we wish to know the maximum and minimum values of y, if such exist.
In order for y to reach a maximum value at a point other than an endpoint, it must
increase to that value and then decrease. When a function is increasing its gradient
is positive, and when it is decreasing its gradient is negative. The gradient is given by
the first derivative, \(\frac{dy}{dx} = f'(x)\). We see that the first derivative must move from
positive to negative as we pass through a maximum value of the function. As \(\frac{dy}{dx}\) changes
from positive to negative it must either equal zero or become undefined. Similarly,
as we pass through a minimum at a point other than an endpoint the derivative
changes from negative to positive; so it must either equal zero or be undefined at the
minimum point.
Values of x for which the first derivative of y = f (x) equals zero or fails to exist
are called critical numbers of the function. We can find all the numbers in the domain
of f (x) at which the function might have a maximum or a minimum value by finding
these critical numbers together with any endpoints contained in the domain. If x0
is a critical number then the corresponding point (x0 , y0 ) on the graph is called a
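critical point. The diagram illustrates maximum and minimum critical points with
\(\frac{dy}{dx} = 0\).
[Figure: a maximum point, with dy/dx > 0 to its left and dy/dx < 0 to its right, and a minimum point, with dy/dx < 0 to its left and dy/dx > 0 to its right.]
We shall consider the case when \(\frac{dy}{dx}\) is undefined later.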
Once a critical number has been found, one can determine whether it corresponds
to a maximum or minimum, or neither, by investigating the sign of \(\frac{dy}{dx}\) at x values
close to the critical number. Suppose that x = x0 is a critical number. If it is
found that \(\frac{dy}{dx} < 0\) for all x values a little less than x0 and \(\frac{dy}{dx} > 0\) for all x values a
little greater than x0 then the critical point is a local minimum (meaning that it is a
minimum if we consider only x values close enough to x0 ). If it is found that \(\frac{dy}{dx} > 0\)
for all x values a little less than x0 and \(\frac{dy}{dx} < 0\) for all x values a little greater than
x0 then the critical point is a local maximum. If neither of these circumstances arises
(which is quite possible) then the critical point is neither a local maximum nor a
local minimum.
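The sign test just described can be sketched as a small routine. This is only a rough numerical illustration: the function name `classify_critical_number` and the step size are assumptions made here, and sampling the derivative at a single point on each side is a stand-in for checking "all x values a little less than" and "a little greater than" the critical number.

```python
def classify_critical_number(derivative, x0, step=1e-3):
    """Classify x0 from the sign of the derivative just left and right of it."""
    left = derivative(x0 - step)    # slope a little to the left of x0
    right = derivative(x0 + step)   # slope a little to the right of x0
    if left < 0 < right:
        return "local minimum"
    if left > 0 > right:
        return "local maximum"
    return "neither"


print(classify_critical_number(lambda x: 2 * x, 0.0))       # f(x) = x^2
print(classify_critical_number(lambda x: -2 * x, 0.0))      # f(x) = -x^2
print(classify_critical_number(lambda x: 3 * x ** 2, 0.0))  # f(x) = x^3
```

The three calls give a local minimum, a local maximum and neither, respectively. For a function whose derivative changes sign very close to the critical number, a smaller `step` would be needed.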
Example 2.1
Let \(f(x) = 2x^{2} - 4x - 1\). (When a function is defined for all values of x then unless
specified otherwise we take its domain to be −∞ < x < ∞.) Then \(f'(x) = 4x - 4\).
At an end point we can also use the derivative of the function to test whether
this number is a local maximum or local minimum of the function. It works exactly
the same as for a critical number, except that one only has to consider values of x
that actually lie in the domain of the function. This is best demonstrated by an
example.
Example 2.2
Let \(f(x) = 2x^{2} - 4x - 1\), for −2 ≤ x ≤ 2. As above, \(f'(x) = 4x - 4\) and x = 1
is the only critical number of the function with corresponding critical point (1, −3).
Consider the endpoint x = −2. When x is slightly larger than −2, the derivative
f ′ (x) = 4x − 4 is negative, so the corresponding point (−2, f (−2)) = (−2, 15) is a
local maximum of f (x) on the domain [−2, 2]. We do not have to consider here what
happens when x is slightly less than −2 because such points do not lie in the given
domain [−2, 2].
Consider the end point x = 2. When x is slightly less than 2, the derivative
\(f'(x) = 4x - 4\) is positive, so the corresponding point (2, f (2)) = (2, −1) is a local
maximum of f (x) on the domain −2 ≤ x ≤ 2.
It is a very useful fact that a continuous function defined on a closed
interval [a, b] will always achieve a global maximum value and a global
minimum value. These can occur only at critical points, points where the derivative
is not defined, or endpoints. A global maximum is in particular a local maximum, so
since we have just two local maximum points the global maximum point is the one
of these where f (x) is largest, namely (−2, 15). We have only one local minimum
point, so this is also the global minimum point.
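The comparison made in Example 2.2 amounts to evaluating f at the endpoints and at the critical number and picking out the largest and smallest values. A minimal sketch follows; the variable names are illustrative only.

```python
def f(x):
    """The function of Example 2.2."""
    return 2 * x ** 2 - 4 * x - 1


# Candidates on [-2, 2]: the two endpoints and the critical number x = 1.
candidates = [-2, 1, 2]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print((x_max, values[x_max]))  # prints (-2, 15), the global maximum point
print((x_min, values[x_min]))  # prints (1, -3), the global minimum point
```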
[Figure: graph of \(y = 2x^{2} - 4x - 1\) on [−2, 2], showing the global maximum at (−2, 15), the local maximum at (2, −1) and the global minimum at (1, −3).]
a local maximum at x = x0 then the derivative is positive when x is just less than x0
and negative when x is just greater than x0 . Thus the derivative is decreasing as we
pass through x = x0 . This suggests that the derivative of the derivative should be
negative at x = x0 . Similarly, at a local minimum the derivative must be increasing,
and the second derivative should be positive.
The second derivative test has some drawbacks. It is true that if the second
derivative is negative at a place where the first derivative vanishes then the critical
point is a local maximum, and it is true that if the second derivative is positive at
a place where the first derivative vanishes then the critical point is a local minimum.
† If y = f (x) then we also use the notation \(f''(x)\) for the second derivative \(\frac{d^{2}y}{dx^{2}}\).
However it is not true that the second derivative must be negative at a local maximum
and positive at a local minimum. It could turn out that the second derivative is zero,
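and in this case the test is inconclusive. We could have a local maximum, e.g.
\(f(x) = -x^{4}\) and (0, 0); we could have a local minimum, e.g. \(f(x) = x^{4}\) and (0, 0); or
we could have a critical point that is neither a local maximum nor a local minimum,
e.g. \(f(x) = x^{3}\) and (0, 0).
[Figures: graph of \(y = -x^{4}\), with a local maximum at the origin; graph of \(y = x^{4}\), with a local minimum at the origin; graph of \(y = x^{3}\), for which the origin is neither.]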
Similarly, if the second derivative fails to exist at the critical number then there
is no information to be had from the second derivative test.
Another drawback with the second derivative test is simply that calculating the
value of the second derivative at the critical number is frequently much harder than
calculating the first derivative for values of x near the critical number. However,
whether or not this is really a problem varies from case to case, and it is undeniable
that the second derivative test is often simple and successful. In the example we
considered above (namely \(f(x) = 2x^{2} - 4x - 1\)) we found that \(f'(x) = 4x - 4\), and it
follows at once that \(f''(1) = 4 > 0\). So the critical point (1, −3) is a local minimum.
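The second derivative test, including its inconclusive case, can be summarised in a few lines. This is a sketch only: the name `second_derivative_test` is an assumption made here, and the second derivatives are typed in by hand rather than computed.

```python
def second_derivative_test(second_derivative, x0):
    """Apply the second derivative test at a number x0 where f'(x0) = 0."""
    value = second_derivative(x0)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    return "inconclusive"  # zero second derivative: the test says nothing


print(second_derivative_test(lambda x: 4, 1))              # f(x) = 2x^2 - 4x - 1
print(second_derivative_test(lambda x: -12 * x ** 2, 0))   # f(x) = -x^4
print(second_derivative_test(lambda x: 12 * x ** 2, 0))    # f(x) = x^4
print(second_derivative_test(lambda x: 6 * x, 0))          # f(x) = x^3
```

The first call reports a local minimum; the other three are all inconclusive, even though the functions concerned have a local maximum, a local minimum and neither at the origin. In such cases one falls back on the sign of the first derivative on either side of the critical number.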
If we are given the formula for the function f that we are trying to maximize or
minimize, then finding the points at which the derivative is zero or undefined should
be a straightforward task. When we are faced with a practical problem, however,
the necessary equations will not generally be provided for us. In such cases we have
to construct the equations for ourselves. That is, we try to find a “mathematical
model” that accurately represents the situation.
Example 2.3
The manager of a hotel in San Juan with 200 rooms finds that if the cost of a room
is $50 or less per night, then the hotel will generally be full. She also knows that, on
average, for every dollar the cost is increased above $50, another two rooms remain
empty. The daily cost of running the hotel (for staff etc.) is $5000, independent of
the number of rooms occupied. How much should the manager charge in order to
maximize profit?
There are four steps involved in the solution of this problem, and others like it.
Step 1: Identify the variables involved. We have the cost of the room (let us call it
$ x), the number of rooms occupied (call it N ), and the profit (call it $ P ). (The $5000
daily cost of running the hotel and the number of rooms in the hotel are constants
and do not need to be given names.)
Step 2: Determine the independent variables (those whose values we are free to
choose), and determine the variable to be optimised. In the present example the
independent variable is x, and the aim is to find the value of x that maximizes P .
Note that the value of the variable N depends upon the value of x, and so it will be
possible to eliminate N from the formula for P .
Step 3: Write down equations that describe all the known relationships between the
variables.
In the present example, we have first of all that
profit = (number of rooms occupied) × (cost of a room) − (daily running cost).
That is,
P = N x − 5000.
Next, we know that a relationship exists between N , the number of rooms occupied,
and x, the cost of a room. Since the hotel is generally full if x ≤ 50, the manager
may as well charge at least $50 per room. So we shall restrict our attention to values
of x greater than or equal to 50. Since N is decreased by two for every dollar by
which x exceeds 50,
N = 200 − 2 × (x − 50)
= 300 − 2x.
(Note that N = 0 when x = 150. The manager will certainly not want to charge as
much as $150 for a room.)
Now
\[ \begin{aligned}
P &= Nx - 5000 \\
  &= (300 - 2x)x - 5000 \\
  &= -2x^{2} + 300x - 5000.
\end{aligned} \]
Step 4: Once we have an expression for the variable to be optimised in terms of
the independent variables, we can differentiate the expression and find the critical
numbers. In the present problem there is only one independent variable, x. Problems
with two or more independent variables will be considered in the next section.
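As a numerical cross-check on Step 4, the profit model built in Steps 1 to 3 can simply be evaluated over a range of prices. The sketch below is illustrative: the function names `rooms_occupied` and `profit`, and the restriction to whole-dollar prices from $50 to $150, are assumptions made here and are not part of the original solution.

```python
def rooms_occupied(x):
    """N = 200 - 2(x - 50) = 300 - 2x, valid for prices x of at least $50."""
    return 300 - 2 * x


def profit(x):
    """P = Nx - 5000, the daily profit at a room price of $x."""
    return rooms_occupied(x) * x - 5000


prices = range(50, 151)                # whole-dollar prices only (an assumption)
best_price = max(prices, key=profit)   # price giving the largest profit
print(best_price, profit(best_price))  # prints 75 6250
```

A search of this kind only samples the function, so it supports the calculus argument rather than replacing it.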
The formula \(P = -2x^{2} + 300x - 5000\) gives
\[ \frac{dP}{dx} = -4x + 300. \]
So \(\frac{dP}{dx} = 0\) when 4x = 300. So x = 75 is the only critical number, and since
\(\frac{d^{2}P}{dx^{2}} = -4 < 0\) for all x, the critical number gives a maximum value for P . Thus the
manager should charge $75 per night for a room, thereby making a daily profit of
$6250.
[Figure: graph of \(P = -2x^{2} + 300x - 5000\) against x for 0 ≤ x ≤ 150.]
The graph of P is as shown in the diagram. It is clear from the graph that the value of P at the
turning point is a global maximum for this function: there is no point in the domain
where the function achieves a greater value. It is not always the case, however, that a
turning point at which the second derivative is negative must be a global maximum,
rather than merely a local maximum. To demonstrate this we consider the function
\[ f(t) = t^{2} e^{t}. \]
Differentiating \(y = t^{2} e^{t}\) gives
\[ \begin{aligned}
\frac{dy}{dt} &= t^{2} e^{t} + e^{t}\,2t \\
              &= t e^{t}(t + 2),
\end{aligned} \]
and this takes the value zero at t = 0 and at t = −2. (It does not take the value
zero anywhere else, since \(e^{t}\) never equals zero. And since the derivative exists for all
values of t, it follows that 0 and −2 are the only critical numbers.) Observe that
y = 0 at t = 0, while at t = −2 we find that \(y = 4e^{-2} \approx 0.5\). Now to determine the
nature of the critical points (0, 0) and \((-2, 4e^{-2})\) we look at the sign of \(\frac{dy}{dt}\) on either
side of t = 0 and t = −2. It often helps to display the results in a sign diagram for
\(\frac{dy}{dt}\), as follows.
t        t < −2    −2    −2 < t < 0    0    t > 0
dy/dt      +        0        −         0      +
Thus the graph slopes up to a local maximum at \((-2, 4e^{-2})\), then down to a local
minimum at t = 0, then up again after that.
We note also that \(y = t^{2} e^{t} \ge 0\) for all t. For negative values of t with |t| large,
\(e^{t}\) is so close to zero that \(t^{2} e^{t}\) is also close to zero. For t large and positive, y is also
large and positive. We now have enough information at our disposal to be able to
sketch the graph.
[Figure: graph of \(y = t^{2} e^{t}\).]
It is immediately clear that for large positive t the function takes
on values much bigger than \(4e^{-2}\); so \((-2, 4e^{-2})\) is not a global maximum, but only a
local maximum. The function value at t = −2 is bigger than the values immediately
around, but by no means the biggest value the function attains. On the other hand,
the point (0, 0) is a global minimum, since \(t^{2} e^{t} \ge 0\) for all t. (This function does not
have a global maximum.)
Let us now modify the above example by restricting the domain of the function
to the interval [−3, 1]. The graph now stops at t = −3 and t = 1.
As before, the function has a local maximum of \(4e^{-2}\) at t = −2 and a global
minimum of 0 at t = 0. But it now has a global maximum of e at t = 1, and a
local minimum of \(9e^{-3} \approx 0.45\) at t = −3. A continuous function whose domain
is a closed interval [a, b] will always have global maximum and minimum
values. These values occur either at critical numbers in the interior of the interval
or at the end-points (a and b).
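The values quoted for the restricted domain [−3, 1] can be confirmed by evaluating the function at the two end-points and the two critical numbers. A minimal sketch, with illustrative names:

```python
import math


def y(t):
    """The function y = t^2 e^t discussed in the text."""
    return t ** 2 * math.exp(t)


# End-points of [-3, 1] together with the critical numbers -2 and 0.
candidates = [-3.0, -2.0, 0.0, 1.0]
for t in candidates:
    print(t, round(y(t), 3))

t_max = max(candidates, key=y)  # candidate with the largest function value
t_min = min(candidates, key=y)  # candidate with the smallest function value
print(t_max, t_min)
```

The largest of the four values occurs at t = 1 (where y = e) and the smallest at t = 0, in agreement with the discussion above.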
[Figure: graph of \(y = t^{2} e^{t}\) on the interval [−3, 1], with the global maximum value e at t = 1 marked.]
It is also possible to consider functions defined on intervals of the form a < x < b,
called open intervals, where the end-points are excluded. Such functions need not
have global maxima and minima.
2.1.3 Maxima and minima when the derivative is undefined
The maximum and minimum points we have encountered so far have all occurred
at points where the first derivative is zero (that is, at points where the curve has a
horizontal tangent). We now turn to cases in which the first derivative is undefined.
As a first example, consider the absolute value function, y = |x|, defined by the
formula
\[ |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases} \]
(This formula confuses some students, but it is correct. For example, when x = −2
it gives |x| = −(−2) = +2, just as it should.) The graph is easy to draw, and shows
that the function has a minimum value of 0 at x = 0. The gradient is −1 for x < 0
and +1 for x > 0, but is undefined at x = 0.† (The function itself is certainly defined
at x = 0; indeed, |0| = 0.)
[Figure: graph of y = |x|.]
Turning now to the general situation, assume (as always) that f is a function
whose derivative exists at all but possibly a finite number of points in its domain.
Suppose that x0 is a point in the domain (so that f (x0 ) is defined) for which f ′ (x0 )
is undefined. Then the graph of the function will typically come to a sharp point (technically
called a cusp) at (x0 , f (x0 )), or have a vertical tangent there, and frequently this will be a local maximum or minimum.
As for critical numbers with f ′ (x) = 0, the determining feature of a maximum is that
f ′ (x) changes from positive to negative as x increases through the value x0 . Similarly,
a minimum is characterized by f ′ (x) changing from negative to positive at the critical
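number. The diagrams illustrate a cusp that is a local maximum, a cusp that is a
local minimum, and a cusp that is neither a local maximum nor a local minimum.
[Figures: a cusp with dy/dx > 0 on its left and dy/dx < 0 on its right (local maximum); a cusp with dy/dx < 0 on its left and dy/dx > 0 on its right (local minimum); and a point with dy/dx > 0 on both sides (neither). In each case dy/dx is undefined at the point itself.]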
Note that it makes no sense to even ask whether there is a local maximum or
minimum at an x value for which f (x) is undefined. Such a value of x is not even in
the domain of the function. If f (x0 ) does not exist, f cannot have a maximum or a
minimum at x = x0 : it does not have any value at x0 . For example x = 0 is not in
the domain of the function f (x) = 1/x.
Example 2.4
The first derivative of \(y = (x - 1)^{2/5}\) is
\[ \begin{aligned}
\frac{dy}{dx} &= \frac{2}{5}(x - 1)^{-3/5} \\
              &= \frac{2}{5(x - 1)^{3/5}},
\end{aligned} \]
which is undefined at x = 1. There are no values of x that make \(\frac{dy}{dx}\) zero; so 1 is the
only critical number. The corresponding critical point is (1, 0). The sign diagram for
the slope is as shown.
x        x < 1    1            x > 1
dy/dx      −      undefined      +
Table 2
† The graph is not smooth at the origin; so there is no tangent at that point.
[Figure: graph of \(y = (x - 1)^{2/5}\).]
A Summary
(i) Local maximum and minimum points of f (x) can occur only at critical points
(that is, at points where f ′ (x) is zero or is undefined) or at endpoints of the
domain of f . Note that a critical point or an endpoint may be neither a local
maximum nor a local minimum.
(ii) At a local maximum that is not an endpoint f ′ (x) changes from positive to
negative as x increases, and f ′′ (x) ≤ 0 (if it exists).
(iii) At a local minimum that is not an endpoint f ′ (x) changes from negative to
positive as x increases, and f ′′ (x) ≥ 0 (if it exists).
(iv) The global maximum and minimum values, if they exist, are included among
the local maxima and minima. A function need not have a global maximum or
minimum if its domain is an infinite interval or if one of the endpoints of the
interval is not part of the domain.†
2.1.4 Concavity and points of inflection
Let x = c be a point in the domain of a function f at which the derivative f ′ (c) is
defined. The tangent to the graph at the point (c, f (c)) can then be drawn. If the
graph of f lies above the tangent line for all points close to (c, f (c)) then we say
that the graph is concave upward at this point. Similarly, if the graph lies below
the tangent line at all points close to (c, f (c)) then the graph is said to be concave
downward at this point. The diagrams illustrate these concepts.
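[Figures: a graph that is concave upward at (c, f (c)), lying above its tangent line, and a graph that is concave downward at (c, f (c)), lying below its tangent line.]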
† For example, if the domain of f consists of all x with 0 < x ≤ 1, with x = 0 excluded,
then f need not have a global maximum. Indeed, f (x) = 1/x does not have a maximum
value on this domain. Nor does f (x) = 1 − x.
If we imagine moving along a portion of the graph for which the concavity is
upward, we will find that the slope of the graph increases as x increases. That is, the
first derivative of f is an increasing function when f is concave upward. Conversely,
f is concave upward when f ′ is increasing. Similarly, f is concave downward on
intervals where f ′ is a decreasing function. Now recall that a function must be
increasing at points where its derivative is positive. So at points where the second
derivative of f is positive the first derivative of f must be an increasing function,
and the graph of f must be concave upward. Similarly, when the second derivative
is negative the graph is concave downward.
A point at which the concavity changes from upward to downward (or vice versa)
is called a point of inflection. Since the sign of the second derivative changes as we
pass through a point of inflection, at the point of inflection itself the second derivative
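must either be zero or undefined. One can see (roughly) where the concavity of a
curve changes, but it is not easy to tell visually whether \(\frac{d^{2}y}{dx^{2}}\) is zero or undefined.
[Figures: the graph of \(f(x) = x^{3} - 3x^{2} + 2x + 0.5\), which is concave downward (\(\frac{d^{2}y}{dx^{2}} < 0\)) to the left of its point of inflection at x = 1, where \(\frac{d^{2}y}{dx^{2}} = 0\), and concave upward to the right; and the graph of \(f(x) = 0.5 - \sqrt[3]{x^{5}}\), which is concave upward to the left of its point of inflection (c, f (c)) = (0, 0.5), where the tangent is horizontal and \(\frac{d^{2}y}{dx^{2}}\) is undefined, and concave downward to the right.]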
Observe that although the graph and its tangent line at the point (c, f (c)) have
the same slope, as they must by the definition of “tangent”, the fact that (c, f (c)) is
a point of inflection means that the graph actually crosses the tangent at (c, f (c)).
As we move away from this point the slope of graph changes, and it bends away from
the line.
[Figure: graphs of \(y = x^{3}\) and \(y = x^{4}\). The graphs of \(y = x^{3}\) and \(y = x^{4}\) both satisfy \(\frac{dy}{dx} = \frac{d^{2}y}{dx^{2}} = 0\) at x = 0. In
both cases the x-axis is the tangent to the graph at the origin. For \(y = x^{3}\) the
graph crosses its tangent at the origin, indicating a point of inflection. The
graph of \(y = x^{4}\) is concave upward at the origin, indicating a local minimum.]
vanish at x = 0; however, in this case (0, 0) is a local minimum. This example shows
that the condition \(\frac{d^{2}y}{dx^{2}} = 0\) is no guarantee that we have a point of inflection.
Example 2.5
We sketch the graph of \(y = x^{2/3}(x - 5)\), and determine the maximum and minimum
values of y on the interval [−1, 8].
As a first step, we find the x- and y-intercepts. Note that \(x^{2/3}\) is the cube root
of \(x^{2}\), which is zero at x = 0 and positive elsewhere. Thus \(x^{2/3}(x - 5)\) has the same
sign as x − 5, for all nonzero values of x. So y < 0 for x < 0 and for 0 < x < 5, while
y > 0 for x > 5. And y = 0 at x = 0 and at x = 5.
Observe also that y has extremely large magnitude when x has extremely large
magnitude, y having the same sign as x. In mathematical notation, y → ∞ as x → ∞
and y → −∞ as x → −∞.
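We now differentiate y, so that we can find the critical points.
\[ \begin{aligned}
\frac{dy}{dx} &= x^{2/3} + (x - 5)\,\frac{2}{3}\,x^{-1/3} \\
              &= x^{2/3} + \frac{2(x - 5)}{3x^{1/3}} \\
              &= \frac{3x + 2(x - 5)}{3x^{1/3}} \\
              &= \frac{5x - 10}{3x^{1/3}} \\
              &= \frac{5}{3}\,\frac{(x - 2)}{x^{1/3}}
\end{aligned} \]
So \(\frac{dy}{dx}\) is 0 when x = 2 and is undefined when x = 0. There are therefore two critical
points: (0, 0) and \((2, -3\sqrt[3]{4}) \approx (2, -4.8)\).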
We had already found that y = 0 at x = 0 and that y < 0 for nearby values of
x on either side of x = 0. This shows that (0, 0) is a local maximum, and it might
have led us to expect that the derivative would be zero at x = 0. In fact, there is
a cusp at x = 0, as can be seen by inspecting the values of the derivative for points
close to zero. If x is small and positive then \(x^{1/3}\) (the cube root of x) is also small
and positive, whereas x − 2 is close to −2. So \(\frac{x - 2}{x^{1/3}}\) is a negative number of large
magnitude. On the other hand, when x is negative and close to zero, \(x^{1/3}\) is also
negative and close to zero, while x − 2 is still close to −2; so in this case \(\frac{x - 2}{x^{1/3}}\) is
positive and of large magnitude. So we have an extremely sharp cusp at the origin,
the graph being nearly vertical on either side of the cusp.
Since y = 0 at x = 0 and x = 5, and y < 0 for 0 < x < 5, it is clear that the
critical point \((2, -3\sqrt[3]{4})\) must be a local minimum. At this stage we know enough
to be able to draw the graph reasonably well. It only remains to determine the
concavity of the various sections of the graph, so that we can really represent its
shape accurately. To determine the concavity and the points of inflection, we need
to differentiate again. The following calculation only applies for x ≠ 0, since \(\frac{dy}{dx}\) is
not defined at x = 0.
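\[ \begin{aligned}
\frac{d^{2}y}{dx^{2}} &= \frac{5}{3}\left(\frac{x^{1/3} - (x - 2)\,\frac{1}{3}\,x^{-2/3}}{x^{2/3}}\right) \\
                      &= \frac{5}{3}\,\frac{3x^{1/3} - (x - 2)x^{-2/3}}{3x^{2/3}} \\
                      &= \frac{5}{3}\,\frac{3x - (x - 2)}{3x^{4/3}} \\
                      &= \frac{5}{9}\,\frac{(2x + 2)}{x^{4/3}} \\
                      &= \frac{10}{9}\,\frac{(x + 1)}{x^{4/3}}
\end{aligned} \]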
(In the first line we applied the quotient rule for differentiation. In the third
line we multiplied both the numerator and the denominator by \(x^{2/3}\).)
So \(\frac{d^{2}y}{dx^{2}} = 0\) when x = −1. (The value of y at this point is \((-1)^{2/3}(-1 - 5) = -6\).)
Since \(x^{4/3}\) is never negative we see that \(\frac{d^{2}y}{dx^{2}} < 0\) when x < −1, and \(\frac{d^{2}y}{dx^{2}} > 0\) for x > −1
(excluding x = 0). So the concavity is downward for x < −1, the point (−1, −6) is a
point of inflection, and the concavity is upward for −1 < x < 0 and for x > 0.
Table 3
Our other task is to find the maximum and minimum values of y for −1 ≤ x ≤ 8.
From the graph it is clear that the local maxima occur when x = 0 and when
x = 8, and the local minima occur when x = −1 and when x = 2. Furthermore, we
have already calculated y at x = 2 and at x = −1, and we know that the value at −1
is less than the value at 2. So the point (−1, −6) gives the global minimum value for
y when x is restricted to the interval [−1, 8]. Similarly, the global maximum clearly
occurs at x = 8 rather than x = 0. In fact y = 12 at x = 8; so the point (8, 12) gives
the global maximum y value.
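The conclusions of Example 2.5 on the interval [−1, 8] can be checked by evaluating y at the endpoints and the two critical numbers. The sketch below is illustrative; note the assumption, flagged in the code, that \(x^{2/3}\) is computed as the cube root of \(x^{2}\), since raising a negative number directly to the power 2/3 in Python does not give the real value intended here.

```python
def y(x):
    """y = x^(2/3) (x - 5), with x^(2/3) taken as the real cube root of x^2."""
    # (x * x) is never negative, so the fractional power is a real number.
    return (x * x) ** (1.0 / 3.0) * (x - 5)


# Endpoints of [-1, 8] together with the critical numbers 0 and 2.
candidates = [-1.0, 0.0, 2.0, 8.0]
for x in candidates:
    print(x, round(y(x), 3))

x_max = max(candidates, key=y)  # candidate with the largest y value
x_min = min(candidates, key=y)  # candidate with the smallest y value
print(x_max, x_min)
```

The largest value occurs at x = 8 and the smallest at x = −1, matching the global maximum (8, 12) and global minimum (−1, −6) found above.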
[Figure: graph of \(y = x^{2/3}(x - 5)\), with the point of inflection (−1, −6) marked and the line y = 5x − 1 shown dotted.]
Example 2.6
Suppose that a tank with a rectangular base, straight sides and no top is to have a
volume of 4 cubic metres. The width of the base is to be 1 metre. Suppose also that
material for the base costs $10 per square metre, while material for the sides costs
$5 per square metre. Our task is to find the least possible total cost of materials.
The first step is to draw a diagram (see below) and identify the variables involved.
The volume and width of the base are constants. The variables are the length of the
base (call it ℓ), the height of the tank (call it h) and the cost (call it $C). The aim
is to find the global minimum value of C.
[Figure: the tank, with base of length ℓ and width 1 m.]
Next, write down the relationships between the variables. Since the volume is 4 cubic metres,
4 = ℓ × 1 × h.
That is, ℓh = 4. This enables us to eliminate one of these two variables, either ℓ or
h, by expressing it in terms of the other. It does not matter which one we choose.
So let us use
h = 4/ℓ
to eliminate h whenever it arises. (As it happens, it would simplify the calculations
a little to eliminate ℓ instead. We leave it to the student to redo the problem in this
alternative fashion.)
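As a numerical sanity check of this setup, here is a minimal sketch. It assumes the cost model implied by the problem statement: the base (area ℓ × 1) at $10 per square metre, plus four sides (two of area ℓ × h and two of area 1 × h) at $5 per square metre. The function name and the grid search are illustrative, not part of the notes.

```python
def tank_cost(length):
    """Cost in dollars of the open tank (width 1 m, volume 4 m^3), assumed model."""
    height = 4 / length                       # from the constraint l * h = 4
    base_area = length * 1
    side_area = 2 * length * height + 2 * 1 * height
    return 10 * base_area + 5 * side_area


# Crude grid search over lengths 0.01 m, 0.02 m, ..., 20 m.
lengths = [0.01 * k for k in range(1, 2001)]
best_length = min(lengths, key=tank_cost)
print(best_length, tank_cost(best_length))
```

Under these assumptions the search settles near ℓ = 2, which can then be confirmed by differentiating the cost with respect to ℓ.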
The cost is given by
Since $\frac{d}{dt}\ln y(t) = \frac{1}{y}\frac{dy}{dt}$, we may write
\[ \text{RGR of } y = \frac{d}{dt}\ln y(t). \]
Example 2.7
Assume that for the first 30 days the height of a parsley plant is given (in centimetres)
by $y(t) = \frac{1}{200}t^2$, where time t is measured in days. Calculate the absolute and relative
growth rates of the parsley plant during the first 30 days and find their maxima during
this time, if they exist.
The absolute growth rate is just the derivative:
\[ \text{AGR} = \frac{dy}{dt} = \frac{1}{100}t. \]
The AGR is defined for all 0 ≤ t ≤ 30, and it is an increasing function, which can be
seen either by graphing it, or by calculating that its derivative is positive throughout
this interval:
\[ \frac{d(\text{AGR})}{dt} = \frac{d^2y}{dt^2} = \frac{1}{100}, \]
a positive number.
Therefore the absolute growth rate has a maximum at t = 30. Its maximum
value is AGR(30) = 0.3 cm per day.
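Both growth rates can be tabulated directly from their definitions. This is a minimal sketch, assuming the height function y(t) = t²/200 from the example and the definition RGR = (1/y)(dy/dt) given earlier; the function names are illustrative. Note that the RGR is not defined at t = 0, where y = 0.

```python
def height(t):
    """Height of the plant in centimetres after t days."""
    return t ** 2 / 200


def absolute_growth_rate(t):
    """dy/dt = t / 100, in centimetres per day."""
    return t / 100


def relative_growth_rate(t):
    """(1/y) dy/dt, which simplifies to 2/t; undefined at t = 0."""
    if t == 0:
        raise ValueError("RGR is undefined when the height is zero")
    return absolute_growth_rate(t) / height(t)


for t in (1, 5, 10, 20, 30):
    print(t, absolute_growth_rate(t), relative_growth_rate(t))
```

The table should show the AGR increasing with t while the RGR decreases, so on this model the RGR grows without bound as t approaches 0 and has no maximum on the interval.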
[Figures: graphs of AGR and RGR against t for 0 ≤ t ≤ 30.]
1. Find any stationary points, and any points of inflection, and hence sketch the
curves.
(a) $f(x) = x^2 - 10x + 12$ (b) $f(x) = 1 + 3x - x^2$
(c) $f(x) = x^3 - 3x^2 + 5$ (d) $y = 2x^3 - 9x^2 + 12x - 2$
(e) $y = x^4 - 4x^3 - 2$ (f) $f(x) = x^{1/5}(x + 6)$
2. Show that $f(x) = x^3 - 3x^2 + 3x + 7$ has neither a local maximum nor a local minimum
at x = 1.
3. Show that $f(x) = x + \frac{1}{x}$ has a local maximum and a local minimum, but its
value at the local maximum is less than its value at the local minimum.
4. Find the global maximum and global minimum points for each of the following
functions in the indicated intervals.
(i) $f(x) = x^3 - 75x + 1$; −1 ≤ x ≤ 6.
(ii) $f(x) = \frac{3x-1}{x+1}$; 1 ≤ x ≤ 10.
5. Find the values of x for which $f(x) = \frac{1}{x^2+1}$ is (a) increasing; (b) decreasing;
(c) concave upwards; (d) concave downwards. Also find any points of inflection.
8. The yield of fruit from each tree of an apple orchard decreases as the density
at which the trees are planted increases. When there are n trees per acre, the
average number of apples per tree is known to be equal to 900 − 5n, for a
particular variety of apple (where n lies between 30 and 60). What value of n
gives the maximum total yield of apples per acre?
10. The reaction as a function of time (measured in hours) to two drugs is given by
\[ R_1(t) = te^{-t}, \qquad R_2(t) = te^{-2t^2}. \]
11. Find two numbers whose sum is 14 and whose product is maximum.
12. Find two numbers whose sum is 8 and the sum of their squares is a minimum.
13. A farmer wishes to enclose a rectangular paddock using only 200m of fencing.
What is the largest area he can enclose?
14. Repeat exercise 13 for the case in which one side of the paddock makes use of an
existing fence, and only three new sides need to be constructed using the 200m
of available fencing.
15. An open rectangular box is to be made from a piece of cardboard 8cm wide and
15cm long by cutting a square from each corner and bending up the sides. Find
the dimensions of the box of largest volume.
16. A cistern is to be constructed to hold 324 m³ of water. The cistern has a square
base and four vertical sides, all made of concrete, and a square top made of
steel. If the steel costs twice as much per unit area as the concrete, determine
the dimensions of the cistern that minimize the total cost of construction.
17. A eucalyptus tree grows according to a logistic curve, with its height at time
t ≥ 0 years given by
\[ h(t) = \frac{60}{1 + e^{5-t}} \text{ metres.} \]
(a) If the absolute growth rate has a maximum, find the time at which it
occurs, and the height of the tree at this time.
(b) Calculate the relative growth rate of the tree.
The broken lines represent the negative parts of the axes.
Any two intersecting straight lines determine a plane. In particular, the y- and
z-axes determine a vertical plane, containing all points (0, y, z), with equation x = 0.
Similarly, the x- and y-axes determine the horizontal plane z = 0 and the x- and
z-axes determine the vertical plane y = 0.
An example of a function in two variables is z = 4 − x − 2y. The following points
satisfy this equation: A (4, 0, 0), B (0, 2, 0), C (0, 0, 4), D (1, 1, 2) and E (−1, 3, −1).
These points and the resulting plane are plotted on the following diagram.
Remember that this diagram represents 3-dimensional space. The point E, for example,
can be plotted by moving 1 unit backwards from 0 along the x-axis, 3 units in the
y direction, and then 1 unit down in the (negative) z direction, so that it is situated
one unit below the horizontal plane formed by the x- and y-axes. Similarly, D can
be plotted by moving from the origin 2 units vertically along the z-axis, then 1 unit
in the positive x direction and then 1 unit in the y direction.
We note that the domain of this function z = f (x, y) = 4 − x − 2y is the set
of all ordered pairs (x, y) of real numbers, and that the range is the set of all real
numbers z. All the points (x, y, z) that satisfy the equation z = 4 − x − 2y lie in
a plane determined by any three points that satisfy the equation. In general, the
sets of points (x, y, z) that satisfy an equation z = f (x, y) will form a surface in
3-dimensions. Sketching such a surface is by no means easy—see p444 of Arya and
Lardner, Mathematics for the Biological Sciences, for some examples.
2.2.2 Partial derivatives
In section 2.1 we were able to determine maximum and minimum values of a function
by considering its derivatives. We now extend this idea to functions of more than
one variable, whose derivatives are called partial derivatives. In this section we will
see, by way of examples, how to differentiate such functions.
If z = f (x, y) is determined by two independent variables, x and y, then it has
two partial derivatives, one with respect to x and one with respect to y. They are
often written as $f_x(x, y)$ and $f_y(x, y)$. We also use the notation
\[ \frac{\partial z}{\partial x} = f_x(x, y) \quad\text{and}\quad \frac{\partial z}{\partial y} = f_y(x, y) \]
for the partial derivatives.
Example 2.8
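Calculate $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ if $z = \sqrt{x^2 - 3y^2}$.
Keeping x constant:
\[
\begin{aligned}
\frac{\partial z}{\partial y} &= \frac{\partial}{\partial y}(x^2 - 3y^2)^{1/2} \\
&= \frac{1}{2}(x^2 - 3y^2)^{-1/2}\cdot\frac{\partial}{\partial y}(x^2 - 3y^2) \\
&= \frac{1}{2}(x^2 - 3y^2)^{-1/2}\cdot(-6y) \\
&= \frac{-3y}{\sqrt{x^2 - 3y^2}}.
\end{aligned}
\]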
Recall that the second derivative of a function of one variable was useful in
determining maximum and minimum values. We shall find that the second-order
partial derivatives have a similarly important role in determining maximum and
minimum values of functions of two variables.
For the function z = f (x, y) there would seem to be four second-order partial
derivatives, calculated as follows:
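(i) Differentiate $\frac{\partial z}{\partial x}$ with respect to x, to get $\frac{\partial^2 z}{\partial x^2}$.
(ii) Differentiate $\frac{\partial z}{\partial x}$ with respect to y, to get $\frac{\partial^2 z}{\partial y\,\partial x}$.
(iii) Differentiate $\frac{\partial z}{\partial y}$ with respect to x, to get $\frac{\partial^2 z}{\partial x\,\partial y}$.
(iv) Differentiate $\frac{\partial z}{\partial y}$ with respect to y, to get $\frac{\partial^2 z}{\partial y^2}$.
It turns out, however, that for any of the functions that we shall consider, the derivatives
$\frac{\partial^2 z}{\partial x\,\partial y}$ and $\frac{\partial^2 z}{\partial y\,\partial x}$ are always equal. So there are three second-order partial derivatives,
$\frac{\partial^2 z}{\partial x^2}$, $\frac{\partial^2 z}{\partial y^2}$ and $\frac{\partial^2 z}{\partial x\,\partial y} = \frac{\partial^2 z}{\partial y\,\partial x}$. Alternatively, we write $f_{xx}(x, y)$, $f_{yy}(x, y)$ and
$f_{xy}(x, y) = f_{yx}(x, y)$.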
Example 2.9
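Calculate the second derivatives of the function $z = x^2y^5$.
Solution: The first derivatives are:
\[ \frac{\partial z}{\partial x} = 2xy^5 \]
and
\[ \frac{\partial z}{\partial y} = 5x^2y^4. \]
Differentiating each of these again with respect to x (and keeping y constant), gives
\[ \frac{\partial^2 z}{\partial x^2} = 2y^5 \]
and
\[ \frac{\partial^2 z}{\partial x\,\partial y} = 10xy^4. \]
Differentiating $\frac{\partial z}{\partial y}$ again with respect to y (and keeping x constant), gives
\[ \frac{\partial^2 z}{\partial y^2} = 20x^2y^3. \]
You should differentiate $\frac{\partial z}{\partial x}$ again with respect to y, and check that $\frac{\partial^2 z}{\partial x\,\partial y} = \frac{\partial^2 z}{\partial y\,\partial x}$.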
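The equality of the two mixed partial derivatives can also be checked numerically. This is a minimal sketch for z = x²y⁵: it differentiates each first partial derivative with respect to the other variable by a central difference and compares both with 10xy⁴. The helper names and the sample point are illustrative.

```python
def z_x(x, y):
    """First partial derivative with respect to x: 2 x y^5."""
    return 2 * x * y ** 5


def z_y(x, y):
    """First partial derivative with respect to y: 5 x^2 y^4."""
    return 5 * x ** 2 * y ** 4


def central_difference(func, x, y, variable, h=1e-5):
    """Estimate a partial derivative of func by varying one variable only."""
    if variable == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


x0, y0 = 1.5, 2.0
mixed_from_z_x = central_difference(z_x, x0, y0, "y")   # differentiate dz/dx in y
mixed_from_z_y = central_difference(z_y, x0, y0, "x")   # differentiate dz/dy in x
exact_mixed = 10 * x0 * y0 ** 4
print(mixed_from_z_x, mixed_from_z_y, exact_mixed)
```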
parabola $z = 25 - y^2$ in the (z, y)-plane, and then imagining the surface that is
swept out when this curve is rotated about the z-axis. The diagram below shows the
portion of the surface that is above the (x, y)-plane.
Imagine cutting this surface with the vertical plane y = 3. The values of x and z
such that the point (x, 3, z) lies on the surface must satisfy the equation $z = 25 - x^2 - 9$
(since $3^2 = 9$), or $z = 16 - x^2$. This is the equation of a parabola lying in the plane
y = 3 (which is simply a copy of the (x, z)-plane). It describes the cross-section of
the surface that is obtained when the portion corresponding to y > 3 is sliced away
(as shown in the right-hand diagram below).
[Figures: left, the surface $z = 25 - x^2 - y^2$ for nonnegative values of z; right, the same surface for z ≥ 0 and y ≤ 3.]
tangent to this parabola. At the point (2, 3, 12) this slope is −6. The diagram below
illustrates the surface cut by both the planes y = 3 and x = 2 and the two tangents
[Figure: the surface $z = 25 - x^2 - y^2$ for z ≥ 0, y ≤ 3 and x ≤ 2, with the x tangent (4x + z = 20, y = 3) marked.]
Note that the tangent labelled “x tangent” lies in the plane y = 3, which is
parallel to the xz-plane, while the one labelled “y tangent” lies in the plane x = 2,
parallel to the yz-plane. Note also that the planes y = 3 and x = 2 are perpendicular
to each other. Any one line, of course, lies in many different planes in space. Any
two intersecting lines, however, determine a unique plane. In the example illustrated
above the two intersecting tangents determine the tangent plane to the surface at the
point (2, 3, 12).
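The two tangent slopes and the tangent plane they determine can be computed directly. This is a minimal sketch, assuming the surface z = 25 − x² − y² and the standard linear formula for a tangent plane built from the two partial derivatives at the point; the function names are illustrative.

```python
def surface(x, y):
    return 25 - x ** 2 - y ** 2


def slope_x(x, y):
    """Partial derivative with respect to x: slope of the x tangent."""
    return -2 * x


def slope_y(x, y):
    """Partial derivative with respect to y: slope of the y tangent."""
    return -2 * y


def tangent_plane(x, y, x0=2, y0=3):
    """Height of the plane through (x0, y0, z0) with the two tangent slopes."""
    z0 = surface(x0, y0)
    return z0 + slope_x(x0, y0) * (x - x0) + slope_y(x0, y0) * (y - y0)


print(surface(2, 3), slope_x(2, 3), slope_y(2, 3))
# Restricting the plane to y = 3 should recover the x tangent 4x + z = 20.
print([4 * x + tangent_plane(x, 3) for x in (0, 1, 2, 5)])
```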
2.2.4 Maxima and minima of functions of two variables
Local maximum and minimum values of a function of two variables are defined sim-
ilarly to those for functions of one variable.
Definition: (a) A local maximum value of z = f (x, y) is a value z0 = f (x0 , y0 ) such
that f (x0 , y0 ) > f (x, y) for all points (x, y) in the domain of f close to (x0 , y0 ).
(b) A local minimum value of z = f (x, y) is a value z0 = f (x0 , y0 ) such that
f (x0 , y0 ) < f (x, y) for all points (x, y) in the domain of f close to (x0 , y0 ).
A local maximum is akin to the top of a mountain, which may or may not be
the highest point in the mountain range. A local minimum is akin to the bottom of
a hole in the ground, which may or may not be deeper than others.
As with the one variable case, at a local maximum/minimum point (x0 , y0 , z0 ) of
z = f (x, y), either (x0 , y0 ) is on the boundary of the domain of f (x, y), or at least one
of the partial derivatives fx (x0 , y0 ), fy (x0 , y0 ) is not defined, or fx (x0 , y0 ) = fy (x0 , y0 ) = 0,
i.e. the tangent plane at (x0 , y0 , z0 ) is horizontal.
For the rest of this section we will assume that the surface z = f(x, y) is smooth,
which means that $f_x$ and $f_y$ are defined throughout the domain of f. Points at which
$f_x = f_y = 0$ are called critical points, and the tangent plane at such a point
is horizontal. It is not the case, however, that all such points are local maxima
or minima. This is not surprising, since we know that for functions of one
variable the points where the derivative is zero are not necessarily maxima or minima.
However, for functions of two variables the situation is more complicated still, since
there can exist points that are local minima if y is held constant and x is varied, and
are local maxima if x is held constant and y is varied (or vice versa).
[Figure: a surface with a local maximum and a local minimum marked.]
To understand the last sentence above, imagine the middle point on the surface
of a horse’s saddle. Let us say that the y direction is towards the horse’s head and
the x direction is sideways. Moving from the middle point in the y direction or
the negative y direction, the saddle rises; this helps to stop the rider from slipping
forward or backward. So the middle point is a local minimum if y is varied and
x stays constant. On the other hand, moving sideways from the middle point the
saddle descends, showing that we have a local maximum if x is varied and y is held
constant.
Definition: Points of the kind just described are called saddle points.
The surface $z = f(x, y) = y^2 - x^2$ has a saddle point at the origin. To check
this, we first calculate the partial derivatives. We find that
\[ f_x(x, y) = \frac{\partial}{\partial x}(y^2 - x^2) = 0 - 2x = -2x \]
and
\[ f_y(x, y) = \frac{\partial}{\partial y}(y^2 - x^2) = 2y - 0 = 2y. \]
So it is certainly true that $f_x(0, 0) = f_y(0, 0) = 0$, showing that (0, 0) is a critical point.
If we hold y = 0 and allow x to vary then the equation for z becomes $z = -x^2$, and
there is a maximum at x = 0. On the other hand, if we hold x = 0 and vary y then
we find that $z = y^2$, which has a minimum at y = 0. So (0, 0, 0) is a saddle point.
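The saddle behaviour is easy to see by sampling the surface along the two axes through the origin. This is a minimal sketch for z = y² − x²; the sample offsets are arbitrary illustrative values.

```python
def f(x, y):
    return y ** 2 - x ** 2


offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]

# Hold y = 0 and vary x: the values peak at the origin.
along_x = [f(t, 0.0) for t in offsets]
# Hold x = 0 and vary y: the values dip at the origin.
along_y = [f(0.0, t) for t in offsets]

print(along_x)
print(along_y)
```

The first list should have its largest entry in the middle and the second its smallest, which is the maximum-in-one-direction, minimum-in-the-other pattern described above.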
the critical point it will be relatively easy to decide, in each case, which kind of
critical point it is.
Example 2.10
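Determine the critical points, if any, for $z = \sqrt{4 - x^2 - y^2}$.
Solution:
\[ \frac{\partial z}{\partial x} = \frac{1}{2}(4 - x^2 - y^2)^{-1/2}\times(-2x) = \frac{-x}{\sqrt{4 - x^2 - y^2}}, \]
\[ \frac{\partial z}{\partial y} = \frac{1}{2}(4 - x^2 - y^2)^{-1/2}\times(-2y) = \frac{-y}{\sqrt{4 - x^2 - y^2}}. \]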
If the discriminant D(a, b) is positive and fxx (a, b) > 0 then there is a local
minimum at (a, b).
If the discriminant D(a, b) is positive and fxx (a, b) < 0 then there is a local
maximum at (a, b).
If the discriminant D(a, b) is negative then there is a saddle point at (a, b).
If the discriminant D(a, b) is zero then we can’t draw any conclusion without
further work.
If you read the wording of this test carefully, you’ll notice that it appears to
give special prominence to the sign of fxx (a, b) while ignoring the sign of fyy (a, b)
completely. However we have seen that at a minimum or maximum point, these
particular second partial derivatives have the same sign and so only one is needed in
the statement of the test.
Note that it is not possible for D(a, b) to be positive and $f_{xx}(a, b) = 0$, for if
$f_{xx}(a, b) = 0$ then $D(a, b) = -(f_{xy}(a, b))^2 \le 0$.
Example 2.11
Find the discriminant D of f at each of its critical points for the function
$f(x, y) = x^4 + y^4 - 4xy + 1$, and hence identify the critical points as local maxima,
local minima or saddle points, if possible.
Solution: First we calculate the first and second partial derivatives.
(−1, −1). At the point (0, 0), all the above partial derivatives are 0 except for the
last, $f_{xy}(0, 0) = -4$. Therefore
\[ D(0, 0) = f_{xx}(0, 0)f_{yy}(0, 0) - (f_{xy}(0, 0))^2 = -(-4)^2 = -16 < 0. \]
The test shows that there is a saddle point at (0, 0).
At the point (1, 1), we have $f_{xx}(1, 1) = f_{yy}(1, 1) = 12$ and $f_{xy}(1, 1) = -4$. Hence
\[ D(1, 1) = f_{xx}(1, 1)f_{yy}(1, 1) - (f_{xy}(1, 1))^2 = 12 \times 12 - (-4)^2 = 128, \]
and since $f_{xx}(1, 1) = 12 > 0$ there must be a local minimum at (1, 1). Similarly, you
can confirm that there is also a local minimum at (−1, −1).
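The test is mechanical enough to write as a small function. The sketch below assumes the discriminant D = f_xx f_yy − (f_xy)² used in the calculations above, and the second partial derivatives f_xx = 12x², f_yy = 12y², f_xy = −4 of f(x, y) = x⁴ + y⁴ − 4xy + 1; the function names are illustrative.

```python
def second_partials(x, y):
    """Return (f_xx, f_yy, f_xy) for f(x, y) = x^4 + y^4 - 4xy + 1."""
    return 12 * x ** 2, 12 * y ** 2, -4


def classify_critical_point(f_xx, f_yy, f_xy):
    """Apply the second-derivative test; returns (discriminant, verdict)."""
    discriminant = f_xx * f_yy - f_xy ** 2
    if discriminant > 0:
        verdict = "local minimum" if f_xx > 0 else "local maximum"
    elif discriminant < 0:
        verdict = "saddle point"
    else:
        verdict = "inconclusive"
    return discriminant, verdict


for point in [(0, 0), (1, 1), (-1, -1)]:
    print(point, classify_critical_point(*second_partials(*point)))
```

The function only classifies points that are already known to be critical; finding them still requires solving f_x = f_y = 0.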
Example 2.12
A courier company will accept rectangular parcels for delivery only if the sum of the
parcel’s length and girth (girth = smallest distance around) does not exceed 200cm.
Find the dimensions of the parcel with maximum volume accepted by the courier
company.
Solution: Step 1: First we draw a diagram, and identify the variables involved.
[Figure: a rectangular parcel with its length ℓ, width x and girth marked.]
Let ℓ be the length of the parcel, x the width of the base and y the height.
(Presumably, in this context, the largest of the three dimensions is the one referred
to as the length; however, we shall solve the problem without assuming this.) Let V
be the volume and g the girth.
Step 2: The next task is to identify the independent variables and the quantity to be
optimised. Here x, y and ℓ are all independent variables, and the task is to maximize
the volume, V .
Step 3: We now write down all the relationships between the variables. Observe that
g = 2x + 2y (by definition), and V = xyℓ. The courier’s requirement that
girth + length ≤ 200
therefore becomes 2x + 2y + ℓ ≤ 200. A parcel with 2x + 2y + ℓ < 200 will obviously
not have the maximum allowable volume, since keeping x and y unchanged and
increasing ℓ until 2x + 2y + ℓ = 200 will produce an acceptable parcel with a larger
volume. So we can assume that 2x + 2y + ℓ = 200, and thus ℓ = 200 − 2x − 2y. This
observation reduces the problem to one in which there are only two independent
variables, since we can eliminate ℓ.
We now proceed to express V in terms of the independent variables x and y.
\[
\begin{aligned}
V &= xy\ell \\
&= xy(200 - 2x - 2y) \\
&= 200xy - 2x^2y - 2xy^2.
\end{aligned}
\]
The partial derivatives are
\[ \frac{\partial V}{\partial x} = 200y - 4xy - 2y^2 = 2y(100 - 2x - y), \]
\[ \frac{\partial V}{\partial y} = 200x - 2x^2 - 4xy = 2x(100 - x - 2y). \]
At a critical point both partial derivatives are zero, so
\[ 2y(100 - 2x - y) = 0 \tag{1} \]
\[ 2x(100 - x - 2y) = 0. \tag{2} \]
Now Eq. (1) gives y = 0 or 2x + y = 100, and Eq. (2) gives x = 0 or x + 2y = 100.
Clearly solutions with either x = 0 or y = 0 will not produce the maximum, since
they give zero for the volume. Hence
2x + y = 100 (3)
x + 2y = 100. (4)
Twice Eq. (3) minus Eq.(4) gives x = 100/3, and substituting back gives y = 100/3
also. Therefore ℓ = 200 − 200/3 − 200/3 = 200/3.
We still have to show that the values of x and y that we have found do genuinely
give a maximum value of V = xy(200 − 2x − 2y). Observe that the physical
interpretation of the variables forces ℓ ≥ 0, as well as x ≥ 0 and y ≥ 0. Now
ℓ = 200 − 2x − 2y ≥ 0 gives x + y ≤ 100, which means that the point (x, y) lies on or
below the line x + y = 100. The points in the (x, y)-plane with x and y positive and
x + y < 100 form the interior of the triangular region shown below, and this is therefore
the domain on which V is defined.
[Figure: the triangular region in the (x, y)-plane bounded by x = 0, y = 0 and the line x + y = 100.]
The boundaries of this region correspond to
x = 0, y = 0 and ℓ = 0, and so V = 0 at all points on the boundary. Inside the region
V takes positive values. The fact that V is a continuous function of x and y means
that it must have a maximum value somewhere in the region, and the point that we
have found is the only possibility, since it is the only critical point in the region. So
the dimensions of the parcel with maximum volume are 100/3 cm × 100/3 cm × 200/3 cm.
Note that our reasoning in the preceding paragraph merely consisted of analysing
something that was intuitively clear: there had to be some maximum, and it clearly
does not lie on the boundary of the domain since V (x, y) = 0 there. So the only
possible critical point had to give that maximum.
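A brute-force search over the triangular domain gives an independent check on this conclusion. This is a minimal sketch, assuming the volume function V = xy(200 − 2x − 2y) from Step 3; the grid spacing of 1/3 cm is an arbitrary illustrative choice that happens to include the candidate point.

```python
def parcel_volume(x, y):
    """Volume after eliminating the length: V = x * y * (200 - 2x - 2y)."""
    return x * y * (200 - 2 * x - 2 * y)


# Grid points (i/3, j/3) strictly inside the triangle x > 0, y > 0, x + y < 100.
candidates = [
    (i / 3, j / 3)
    for i in range(1, 300)
    for j in range(1, 300 - i)
]

best_x, best_y = max(candidates, key=lambda point: parcel_volume(*point))
best_length = 200 - 2 * best_x - 2 * best_y
print(best_x, best_y, best_length, parcel_volume(best_x, best_y))
```

A grid search cannot prove that a point is the maximum, but it is a quick way to catch an algebra slip in the critical-point calculation.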
2. Describe the sets of points (x, y, z) that satisfy the following conditions.
(i ) y=3 (ii ) z = −3 (iii ) x = 1 and y = 0
3. Find $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ if (i) $z = x^3 + 2x^2y + y^2$, (ii) $z = x\ln y + ye^x$.
4. Find $f_x$, $f_y$, $f_{xx}$, $f_{xy}$ and $f_{yy}$ in each of the following cases.
(i) $f(x, y) = xe^y + ye^x + x$ (ii) $f(x, y) = \frac{x}{y}$ (iii) $f(x, y) = \ln(x^2 + y^2)$
5. Show that $f_{xy} = f_{yx}$ for: (i) $f(x, y) = 3x^4y^2 + 7x^2y$ (ii) $f(x, y) = \frac{y}{x^2}$
6. Find the slope of the tangent line to the curve of intersection of the cylinder
$4z = 5\sqrt{16 - x^2}$ and the plane y = 3 at the point $(2, 3, \frac{5\sqrt{3}}{2})$.
7. Demand for butter in Sydney (in kg) is given by z = 2400 − 50x + 90y where x
is the price per kg of butter and y is the price per kg of margarine.
Find $z_x$ and $z_y$ and interpret your answer.
8. (i) If $p(x, y) = \ln(x^2 + y^2)$ find $\frac{\partial p}{\partial x}$ and $\frac{\partial^2 p}{\partial y\,\partial x}$.
(ii) Show $y_{xx} = y_t$ for $y(x, t) = e^{-t}\sin x$.
(iii) Find all the local minima and maxima of $z = x^2 - 2x + y^2 + 2y + 3$.
(iv ) Find the greatest and least value of z in Part (iii) if −2 ≤ x ≤ 2, and
y = −3
(v ) Find the greatest and least value of z in Part (iii) if −2 ≤ x ≤ 2, and
y = 2x
(vi ) Find the greatest and least value of z in Part (iii) if 0 ≤ x ≤ 1, and
0≤y≤3
9. (i) If $z = e^x \sin y$ show that $\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = 0$.
(ii) If $f(x, y) = x^3 - 3x^2y^2 + 2x$, find $f_x(1, 1)$ and $f_y(1, 1)$.
variety of lines y = a + bx, which fits the given data best? The least squares method
of finding the best fitting straight line is a further application of partial derivatives.
Suppose we graph the following data:
x 1 2 3 4
y 1 2 2 2
In the left hand diagram below, lines 2 and 3 clearly fit the data better than line 1.
But which of 2 and 3 is better?
The answer to this question, of course, is that it depends what you mean by
better. We need some sensible way of measuring how well a line fits the given data
points.
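[Figures: left, the four data points with candidate lines 1, 2 and 3; right, a line ℓ with the data points $(x_i, y_i)$, the points $(x_i, f(x_i))$ on the line, and the vertical dotted segments joining them.]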
A convenient measure of how well an arbitrary function f fits a given tabulated
function is the sum of the squares of the differences between f (xi ) and yi , taken over
all pairs of tabulated values $(x_i, y_i)$. In other words, we define
\[ S = \sum_{i=1}^{n} (f(x_i) - y_i)^2 \]
and consider the function f to be a good fit to the tabulated points if S is small: the
smaller S is, the better the fit.
In the right-hand diagram above, the question is to find how well the line marked
ℓ fits the given data. That is, we are assuming that ℓ is the graph of y = f (x), and
we want to measure how close the graph is to the data points. So for each data point
(xi , yi ) we find the point on ℓ that is directly above or below it, and we determine
the distance between these points. In fact, the point on the graph of y = f (x) that is
directly above or below (xi , yi ) is the point (xi , f (xi )), and its distance from (xi , yi )
is |f (xi ) − yi |. In the diagram these distances are the lengths of the four vertical
dotted lines, and the quantity S is the sum of the squares of these four lengths.
Assume now that f(x) = a + bx for some constants a and b, so that (as in the
diagram) the graph of f is a straight line. The formula for S then becomes
\[ S = \sum_{i=1}^{n} (a + bx_i - y_i)^2. \]
We can now think of S as a function of the two independent variables a and b. The
problem of finding the line that gives the minimum value of S is now a calculus
problem: find the values of the variables a, b that minimize S. The minimum will
occur at a critical point; so we must find the values of a and b for which $\frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = 0$.
Note that since S is a sum of squares it is never negative; moreover, making either
a or b extremely large will inevitably make S extremely large. Hence it is clear that
there must be some point at which a minimum occurs, and this must be a point such
that $\frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = 0$.
Dividing through by 2, then collecting like terms and moving the constants to the
right hand side, this becomes
na + (x1 + x2 + · · · + xn )b = y1 + y2 + · · · + yn .
These equations have a unique solution (a, b), which must minimize the sum of the
squares S. The resulting straight line y = a + bx is called the least squares best fitting
straight line for the given data.
Example 2.13
For the data considered above we have four points; so n = 4. The following table
gives the values of $x_i$, $y_i$, $x_i^2$ and $x_iy_i$ in all 4 cases, enabling the quantities
$\sum x_i$, $\sum y_i$, $\sum x_i^2$ and $\sum x_iy_i$ to be readily calculated.

   x_i        y_i        x_i^2        x_i y_i
    1          1           1            1
    2          2           4            4
    3          2           9            6
    4          2          16            8
 Σx_i = 10  Σy_i = 7  Σx_i^2 = 30  Σx_i y_i = 19

4a + 10b = 7,
10a + 30b = 19.
These are easily solved, giving a = 1 and b = 0.3. So the least squares best fitting
linear function is
y = 1 + 0.3x.
This is in fact the line marked as line 3 in the left hand diagram above.
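The whole calculation can be reproduced in a few lines. This is a minimal sketch, assuming the pair of equations has the pattern visible in the example above, namely na + (Σx)b = Σy and (Σx)a + (Σx²)b = Σxy; the function name `fit_line` is illustrative, and the 2 × 2 system is solved by Cramer's rule.

```python
def fit_line(xs, ys):
    """Least squares line y = a + b*x via the two normal equations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Solve  n*a     + sum_x*b  = sum_y
    #        sum_x*a + sum_xx*b = sum_xy   by Cramer's rule.
    determinant = n * sum_xx - sum_x ** 2
    a = (sum_y * sum_xx - sum_x * sum_xy) / determinant
    b = (n * sum_xy - sum_x * sum_y) / determinant
    return a, b


a, b = fit_line([1, 2, 3, 4], [1, 2, 2, 2])
print(a, b)
```

The determinant is zero only when all the x values coincide, in which case no unique line exists.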
Here there are three independent variables a, b and c, but the procedure remains the
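same. The critical points are the points at which $\frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = \frac{\partial S}{\partial c} = 0$. We get the
following equations:
\[
\begin{aligned}
na + \Big(\sum x_i\Big)b + \Big(\sum x_i^2\Big)c &= \sum y_i, \\
\Big(\sum x_i\Big)a + \Big(\sum x_i^2\Big)b + \Big(\sum x_i^3\Big)c &= \sum x_iy_i, \\
\Big(\sum x_i^2\Big)a + \Big(\sum x_i^3\Big)b + \Big(\sum x_i^4\Big)c &= \sum x_i^2y_i.
\end{aligned}
\]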
Example 2.14
Using the same four data points as above, we get the following table.
4a + 10b + 30c = 7,
10a + 30b + 100c = 19,
30a + 100b + 354c = 59.
The solution is
a = −0.25, b = 1.55, c = −0.25.
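The three equations can be assembled and solved programmatically. This is a minimal sketch, assuming the fitted function is y = a + bx + cx² (which is what the three equations above correspond to); it uses exact fractions and a small Gauss-Jordan elimination, and both function names are illustrative.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a small square system exactly by Gauss-Jordan elimination."""
    size = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(size):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Clear this column from every other row.
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[size] for row in rows]


def fit_quadratic(xs, ys):
    """Least squares coefficients (a, b, c) of y = a + b*x + c*x^2."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moment_sums = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    return solve_linear_system(matrix, moment_sums)


a, b, c = fit_quadratic([1, 2, 3, 4], [1, 2, 2, 2])
print(float(a), float(b), float(c))
```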
x 1 2 3 4 5 6
y 3 5 6 6 7 9
Plot the points on a diagram, and use the method of least squares to fit a straight
line.
2. Fit a straight line by the least squares method to each of the following sets of
data:
(i ) toughness (x) and percentage of nickel (y) in eight specimens of alloy steel.
toughness (x) 36 41 42 43 44 45 47 50
% nickel (y) 2.5 2.7 2.8 2.9 3.0 3.2 3.3 3.5
(ii ) aptitude test mark (x) given to six trainee salesmen, and their first year
sales (y) in hundreds of dollars.
For both sets of data, plot the points and draw the least squares line on a graph.
Use the lines to predict
(a) the % nickel of a specimen of steel whose toughness is 38, and
(b) the likely first-year sales of a trainee salesman who obtains a mark of 48
on his aptitude test.
Chapter 3
Summation
§3.1 Finite sums
One of the first mathematical experiences one has is learning to add. There was
a time when adding long columns of numbers (“long tots”) occupied many hours
of primary school. When faced with such “sums”, one quickly learns to recognise
patterns, especially 10-combinations such as 7 + 3, 8 + 2, and so on. Other patterns
are not so easily recognised. The famous mathematician C. F. Gauss is credited with
recognising, when only a young child, that the sum 1 + 2 + 3 + · · · + 100 must be 5050,
since it is half of (1 + 100) + (2 + 99) + (3 + 98) + · · · + (100 + 1) = 100 × 101 = 10100.
Gauss’s technique can be applied to any arithmetic series; that is, a series in
which each term differs from the preceding one by the same amount.† An arithmetic
series of n terms, with first term a and common difference d, has the form,
a + (a + d) + (a + 2d) + · · · + (a + (n − 2)d) + (a + (n − 1)d), (1)
in which there are n terms, each differing from its predecessor by d. Writing S for
this sum and applying Gauss’s idea, we see that 2S is the sum of (1) above and
(a + (n − 1)d) + (a + (n − 2)d) + (a + (n − 3)d) + · · · + (a + d) + a, (2)
and so it follows that
2S = n × (2a + (n − 1)d) (3)
since the sum of the first term of (1) and the first term of (2) is 2a + (n − 1)d, and
the sum of the second term of (1) and the second term of (2) is the same, and the sum
of the two third terms is again the same, and so on. Since the series have n terms
each, Eq. (3) follows.
The sum of the general arithmetic series (1) above (with first
term a, common difference d and n terms altogether) is given by
\[ S = \frac{n}{2}(2a + (n-1)d) = an + \tfrac{1}{2}n(n-1)d. \tag{4} \]
For example,
\[
\begin{aligned}
3 + 8 + 13 + \cdots + 98 &= \frac{1}{2}\big((3 + 98) + (8 + 93) + \cdots + (98 + 3)\big) \\
&= \frac{1}{2}(20 \times 101) \\
&= 1010.
\end{aligned}
\]
(Note that you need to be able to work out the number of terms in the series; it
was 20 in this example. The way to find this is to use the fact that the nth term is
a + (n − 1)d, where a is the first term and d the common difference. Here a = 3 and
d = 5, and solving 3 + 5(n − 1) = 98 gives n = 20, showing that 98 is the 20th term.)
† In mathematical literature the word series means a collection of successive terms that
are to be added. By contrast, if the successive terms are not added but remain separate,
then the word sequence is used instead.
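Formula (4) and the term-counting step translate directly into code. This is a minimal sketch with illustrative function names; it assumes the last term really does belong to the series, so that the division in `number_of_terms` is exact.

```python
def number_of_terms(first, difference, last):
    """Solve first + (n - 1) * difference = last for n."""
    return (last - first) // difference + 1


def arithmetic_series_sum(first, difference, count):
    """Formula (4): S = (n / 2) * (2a + (n - 1)d)."""
    return count * (2 * first + (count - 1) * difference) / 2


count = number_of_terms(3, 5, 98)
total = arithmetic_series_sum(3, 5, count)
print(count, total, sum(range(3, 99, 5)))
```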
Sigma notation
In mathematics we use the Greek letter Σ (called “Sigma”) as a shorthand notation
for summation. For example,
\[ \sum_{l=1}^{n} f(l) = f(1) + f(2) + \cdots + f(n-1) + f(n). \]
The notation means that f(l) is to be evaluated for all values of l from 1 to n, and the
resulting terms are to be summed. The letter l here is called a dummy variable, since
it does not stand for any particular number. One can replace l by any other letter
that has not been used, and the meaning of the expression is unchanged: $\sum_{l=1}^{n} f(l)$
and $\sum_{i=1}^{n} f(i)$ are the same as each other.
Here are two examples of Sigma notation:
\[ \sum_{k=1}^{10} k = 1 + 2 + \cdots + 9 + 10 \]
and
\[ \sum_{r=2}^{n} r^2 = 2^2 + 3^2 + \cdots + (n-1)^2 + n^2. \]
An example that often confuses people at first is $\sum_{j=1}^{n} 1$. It means this: evaluate 1
for all values of j from 1 to n, and add the resulting n terms. The fact that the
expression to be evaluated does not in fact depend on j (being always equal to 1)
does not alter the procedure. All n terms are equal to 1, and so the sum is n:
\[ \sum_{j=1}^{n} 1 = 1 + 1 + 1 + \cdots + 1 = n. \]
The principal advantage of Sigma notation is that it is very compact. For ex-
ample, in Sigma notation the general arithmetic series ((1) above) is
\[ \sum_{i=1}^{n} (a + (i-1)d), \]
since the variable n has already been used—it is the number of terms—and therefore
cannot be used as the dummy variable in the summation. The choice of x as the
dummy variable was quite arbitrary: anything but n, r or a would do.
We could equally well have written the series as $\sum_{x=0}^{n-1} ar^x$.
To find the sum of a geometric series, we can make the following observation.
Let $S = a + ar + ar^2 + \cdots + ar^{n-1}$, and assume that r ≠ 1. Then
\[ Sr = ar + ar^2 + \cdots + ar^{n-1} + ar^n \]
and therefore
\[ Sr - S = ar^n - a \]
since the terms $ar + ar^2 + \cdots + ar^{n-1}$ all cancel out. Since r ≠ 1, we can divide both
sides by r − 1, yielding the formula
\[ S = a\,\frac{r^n - 1}{r - 1}. \tag{5} \]
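The closed form can be compared with a term-by-term sum. This is a minimal sketch using exact fractions so that the comparison is not affected by rounding; the function names and sample values are illustrative.

```python
from fractions import Fraction


def geometric_sum_direct(a, r, n):
    """Add the n terms a, a*r, ..., a*r^(n-1) one by one."""
    return sum(Fraction(a) * Fraction(r) ** k for k in range(n))


def geometric_sum_formula(a, r, n):
    """Closed form a * (r^n - 1) / (r - 1), valid only for r != 1."""
    r = Fraction(r)
    if r == 1:
        raise ValueError("the formula requires r != 1")
    return Fraction(a) * (r ** n - 1) / (r - 1)


for a, r, n in [(3, 2, 5), (1, Fraction(1, 2), 10), (5, -3, 7)]:
    print(geometric_sum_direct(a, r, n), geometric_sum_formula(a, r, n))
```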
Example 3.1
Suppose we start an annuity with $1000, to which we add $100 each year, and that
the fixed interest rate is 12%.
At the end of year 1, we have an amount
At the beginning of year 2 we add $100, so that at the end of year 2 we have
The sum in the brackets in this expression is a geometric series with first term equal
to 1.12, common ratio equal to 1.12, and with n − 1 terms. Thus to evaluate it we
can use (5) with a and r both replaced by 1.12 and n replaced by n − 1. Hence,
\[ A_n = 1000 \times 1.12^n + 100 \times 1.12 \times \frac{1.12^{n-1} - 1}{0.12}. \]
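The closed form can be checked against a year-by-year simulation. This is a minimal sketch under the following assumed reading of the example: the account starts with $1000, interest of 12% is credited at the end of each year, and $100 is added at the beginning of every year from year 2 onwards. The function names are illustrative.

```python
def annuity_by_simulation(years):
    """Balance at the end of the given year, built up one year at a time."""
    amount = 1000.0
    for year in range(1, years + 1):
        if year > 1:
            amount += 100          # deposit at the beginning of the year
        amount *= 1.12             # interest credited at the end of the year
    return amount


def annuity_by_formula(years):
    """Closed form 1000 * 1.12^n + 100 * 1.12 * (1.12^(n-1) - 1) / 0.12."""
    return 1000 * 1.12 ** years + 100 * 1.12 * (1.12 ** (years - 1) - 1) / 0.12


for years in (1, 2, 5, 10):
    print(years, annuity_by_simulation(years), annuity_by_formula(years))
```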
A more general technique for adding utilizes the so-called collapsing technique.
Example 3.2
Observe that
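\[ \frac{1}{1\times 2} + \frac{1}{2\times 3} = \frac{2}{3}, \]
\[ \frac{1}{1\times 2} + \frac{1}{2\times 3} + \frac{1}{3\times 4} = \frac{3}{4}, \]
\[ \frac{1}{1\times 2} + \frac{1}{2\times 3} + \frac{1}{3\times 4} + \frac{1}{4\times 5} = \frac{4}{5}. \]
It might be conjectured that
\[ S = \frac{1}{1\times 2} + \frac{1}{2\times 3} + \frac{1}{3\times 4} + \cdots + \frac{1}{n(n+1)} = \frac{n}{n+1}. \]
That this is indeed the case becomes apparent when we express each term as a
difference, as follows.
\[
\begin{aligned}
\frac{1}{1\times 2} &= 1 - \frac{1}{2}, \\
\frac{1}{2\times 3} &= \frac{1}{2} - \frac{1}{3}, \\
&\;\;\vdots \\
\frac{1}{n(n+1)} &= \frac{1}{n} - \frac{1}{n+1}.
\end{aligned}
\]
Therefore
\[ S = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right). \]
All the inside terms cancel, so that the whole sum collapses. After cancelling the
interior terms, the only terms that remain in S are the first and the last. That is,
\[
\begin{aligned}
S &= 1 + \left(-\frac{1}{2} + \frac{1}{2}\right) + \left(-\frac{1}{3} + \frac{1}{3}\right) + \cdots + \left(-\frac{1}{n} + \frac{1}{n}\right) - \frac{1}{n+1} \\
&= 1 + (0 + 0 + \cdots + 0) - \frac{1}{n+1} \\
&= 1 - \frac{1}{n+1} \\
&= \frac{n}{n+1}.
\end{aligned}
\]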
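The conjectured pattern can be tested for many values of n with exact rational arithmetic. This is a minimal sketch; the function name is illustrative.

```python
from fractions import Fraction


def partial_sum(n):
    """1/(1*2) + 1/(2*3) + ... + 1/(n(n+1)), added term by term."""
    return sum(Fraction(1, j * (j + 1)) for j in range(1, n + 1))


for n in (2, 3, 4, 10):
    print(n, partial_sum(n), Fraction(n, n + 1))
```

A check of this kind supports the conjecture but does not replace the collapsing argument, which covers every n at once.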
Some series $\sum_{j=a}^{b} f(j)$ can be evaluated by expressing each of the terms f(j) as
a difference,
\[ f(n) = -g(n) + g(n+1) \overset{\text{def}}{=} \Delta g(n), \]
for some function g to be determined. Then
\[
\begin{aligned}
\sum_{j=a}^{b} f(j) &= f(a) + f(a+1) + \cdots + f(b) \\
&= (-g(a) + g(a+1)) + (-g(a+1) + g(a+2)) + \cdots + (-g(b) + g(b+1)) \\
&= -g(a) + (g(a+1) - g(a+1)) + \cdots + (g(b) - g(b)) + g(b+1) \\
&= -g(a) + g(b+1).
\end{aligned}
\tag{6}
\]
We call such series collapsing series. That is, if f(n) = Δg(n) = g(n + 1) − g(n),
then $\sum_{n=a}^{b} f(n) = g(b+1) - g(a)$. In other words, the sum of b − a + 1 terms of the function
f has been reduced to the difference between two terms of the function g. The hard
part, of course, is to find a function g satisfying Δg(n) = f(n) for all n.
It is usually advisable to write out several of the terms of a series explicitly,
rather than use Sigma notation.
For future reference, let us state clearly the formula that we have established.
\[ \sum_{j=a}^{b} f(j) = g(b+1) - g(a). \tag{7} \]
In Example 3.2 above we had $f(j) = \frac{1}{j(j+1)}$, and putting $g(j) = -\frac{1}{j}$ we find that
for all j,
\[ \Delta g(j) = g(j+1) - g(j) = -\frac{1}{j+1} + \frac{1}{j} = \frac{-j + (j+1)}{j(j+1)} = f(j). \]
So by the formula (7)
\[ \sum_{j=1}^{n} f(j) = g(n+1) - g(1) = -\frac{1}{n+1} + 1 = \frac{n}{n+1}. \]
Example 3.3
We can compute a geometric series as a collapsing sum.
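Observe that if $h(j) = 7^j$ then $\Delta h(j) = 7^{j+1} - 7^j = 7^j(7 - 1) = 6 \times 7^j$. More
generally, if $h(j) = r^j$, where $r$ is some constant, then $\Delta h(j) = r^{j+1} - r^j = r^j(r - 1)$.
The thing to notice here is that $r - 1$ is a constant that does not involve $j$, and this
means that we can divide by it† and get a function whose first difference is $r^j$. That
is, if we define
\[ g(j) = \frac{r^j}{r-1} \]
(for some $r \neq 1$) then we obtain
\[ \Delta g(j) = \frac{r^{j+1}}{r-1} - \frac{r^j}{r-1} = \frac{r^{j+1} - r^j}{r-1} = \frac{r^j(r-1)}{r-1} = r^j. \]
For example, by the formula (7),
\[ \sum_{j=0}^{100} r^j = g(101) - g(0) = \frac{r^{101}}{r-1} - \frac{r^0}{r-1} = \frac{r^{101} - 1}{r-1}. \]
More generally, if $a$ is a constant then $q(j) = a\,g(j) = \frac{ar^j}{r-1}$ satisfies $\Delta q(j) = ar^j$. So
\[
\begin{aligned}
\sum_{j=0}^{n-1} ar^j &= q(n) - q(0) \\
 &= \frac{ar^n}{r-1} - \frac{ar^0}{r-1} \\
 &= a\,\frac{r^n - 1}{r-1}.
\end{aligned}
\]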
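As a quick check of the geometric series formula, the sketch below compares the term-by-term sum with the closed form. The function names and the sample values of a, r and n are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

def geometric_direct(a, r, n):
    # a + ar + ar^2 + ... + ar^(n-1), added term by term.
    return sum(a * r**j for j in range(n))

def geometric_closed(a, r, n):
    # Closed form a (r^n - 1) / (r - 1), valid only when r != 1.
    if r == 1:
        raise ValueError("the closed form needs r != 1")
    return a * (r**n - 1) / (r - 1)

a, r, n = Fraction(3), Fraction(7), 6
print(geometric_direct(a, r, n), geometric_closed(a, r, n))
```

Using `Fraction` keeps the comparison exact, including for ratios such as 1/2 where floating point would introduce rounding.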
Example 3.4
Let’s find a formula for the sum of the first $n$ numbers, i.e. let’s compute the
series
\[ \sum_{j=1}^{n} j. \]
We have already seen how to compute this series using Gauss’ method; now let’s use
a collapsing sum to get the same answer. If we define $g(j) = j^2$, then
\[ \Delta g(j) = (j+1)^2 - j^2 = (j^2 + 2j + 1) - j^2 = 2j + 1. \]
By the formula (7), $\sum_{j=1}^{n} (2j+1) = g(n+1) - g(1) = (n+1)^2 - 1 = n^2 + 2n$.
Therefore we see that $2\sum_{j=1}^{n} j + \sum_{j=1}^{n} 1 = n^2 + 2n$. But $\sum_{j=1}^{n} 1 = n$, so rearranging
we see
\[ \sum_{j=1}^{n} j = \frac{1}{2}(n^2 + 2n - n) = \frac{n^2 + n}{2} = \frac{1}{2}n(n+1). \]
Example 3.5
If we define $g(k) = k^3$ then
\[ \Delta g(k) = (k+1)^3 - k^3 = 3k^2 + 3k + 1. \]
It follows that
\[ \sum_{k=1}^{n} (3k^2 + 3k + 1) = \sum_{k=1}^{n} \Delta g(k) = g(n+1) - g(1) = (n+1)^3 - 1 = n^3 + 3n^2 + 3n. \]
But $\sum_{k=1}^{n} 1 = n$, and the formula (4) tells us that $\sum_{k=1}^{n} k = \frac{1}{2}n(n+1)$. Therefore
\[ 3\sum_{k=1}^{n} k^2 = (n^3 + 3n^2 + 3n) - \frac{3}{2}n(n+1) - n, \]
which simplifies to $\sum_{k=1}^{n} k^2 = \frac{1}{6}n(n+1)(2n+1)$.
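The closed forms in Examples 3.4 and 3.5 can be tested against brute-force sums. This is a small sketch with illustrative function names; the formula for the sum of squares is the one obtained by dividing the last display by 3 and simplifying, namely n(n+1)(2n+1)/6.

```python
def sum_first_n(n):
    # Closed form from Example 3.4: 1 + 2 + ... + n = n(n+1)/2.
    return n * (n + 1) // 2

def sum_squares(n):
    # Closed form that follows from Example 3.5: n(n+1)(2n+1)/6.
    return n * (n + 1) * (2 * n + 1) // 6

for n in (1, 5, 100):
    assert sum_first_n(n) == sum(range(1, n + 1))
    assert sum_squares(n) == sum(k * k for k in range(1, n + 1))
    # The collapsing identity itself: the sum of 3k^2 + 3k + 1 is (n+1)^3 - 1.
    assert sum(3 * k * k + 3 * k + 1 for k in range(1, n + 1)) == (n + 1) ** 3 - 1

print(sum_first_n(100), sum_squares(100))  # 5050 338350
```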
3. (i ) Evaluate $\sum_{k=2}^{20} \left( \frac{1}{k} - \frac{1}{k+1} \right)$.
Pn
(ii ) Simplify the sum r=0 (br+1 − br ).
5. The lease of a shop costs $10,000 per year payable half-yearly (in advance).
However, for the first three years the shopkeeper does not pay, but the debt
accumulates together with interest at the rate of 16% p.a., added half-yearly.
How much does the shopkeeper owe the landlord at the start of the 4th year?
Suppose that f is a continuous function that is defined for all x in the interval [a, b].
In this section we shall introduce the concept of the integral of $f(x)$ from $x = a$ to
$x = b$, a quantity that is written as
\[ \int_a^b f(x)\,dx. \]
The reason that this topic appears in a chapter entitled “Summation” is that
$\int_a^b f(x)\,dx$ is obtained from $f$ by a kind of continuous summation process. The
precise meaning of this rather cryptic statement will become clearer in due course.
We have seen that the problem of evaluating a finite sum
\[ \sum_{i=a}^{a+n-1} f(i) \]
3.2 The definite integral 73
reduces to evaluating g(a + n) − g(a), provided we can find a function g(x) satisfying
∆g(x) = f (x). The principal result of this section is an analogous result for integrals,
known as the Fundamental Theorem of Calculus. It says that
\[ \int_a^b f(x)\,dx = F(b) - F(a) \]
for some function F . Indeed, the function F must have the property that its derivative
is equal to f .
Our first task on the road towards defining the integral of a function is to investigate
sums of series with an infinite number of terms. The simplest infinite series
that has a meaningful sum is
\[ 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{k-1}} + \cdots \]
(an infinite geometric series). Using the formula (5) from §3.1 gives
\[ \sum_{k=1}^{n} \frac{1}{2^{k-1}} = \frac{1 - (\frac{1}{2})^n}{1 - \frac{1}{2}} = 2\left(1 - (\tfrac{1}{2})^n\right). \]
But $(\frac{1}{2})^n \to 0$ as $n \to \infty$; so $2\left(1 - (\frac{1}{2})^n\right) \to 2$ as $n \to \infty$. This makes it natural to
define
\[ \sum_{k=1}^{\infty} \frac{1}{2^{k-1}} = 2. \]
In other words, the so-called infinite sum is really the limit of a sequence of finite
sums. The integral of a function will similarly be defined as the limiting value of a
sequence of finite sums.
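To see the limiting process at work, one can tabulate the finite sums for increasing n. The sketch below is illustrative only; `partial_sum` is a name introduced here.

```python
def partial_sum(n):
    # Sum of the first n terms 1 + 1/2 + 1/4 + ... + 1/2^(n-1).
    return sum(1 / 2 ** (k - 1) for k in range(1, n + 1))

for n in (1, 2, 5, 10, 20):
    closed = 2 * (1 - 0.5 ** n)  # the formula 2(1 - (1/2)^n)
    print(n, partial_sum(n), closed)
```

The printed values creep up towards 2 without ever exceeding it, which is the sense in which the infinite sum "equals" 2.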
Example 3.6
The rate of growth, $F'(t)$, of a microbiological population (where $F(t)$ = number of
cells present at time $t$) will usually vary with time. That is, it is a function of $t$,
usually not constant.
If F ′ (t) were constant—say F ′ (t) = k cells per minute—then in any interval of 60
minutes the number of cells formed would be 60k. In other words, F (t0 + 60) − F (t0 )
would be equal to 60k (for any value of t0 ).
The problem we wish to address now is this: if F ′ (t) is not constant, but we
know how it varies with time, can we still determine how many cells will be formed
in a given time interval?
For simplicity, let us assume that we are dealing with an interval of 60 minutes,
from t = 0 to t = 60. We want a method for finding F (60) − F (0); let us call this
quantity F.
If $F'(t)$ does not vary too much then we could choose some representative value
of $F'(t)$, say $F'(30)$, and take this as an estimate for the value of $F'(t)$ over the whole
interval. This gives the estimate
\[ F \approx 60F'(30). \]
74 Chapter 3 Summation
We can improve this by breaking the 60 minute interval into smaller subintervals,
and using the same technique on each of these. For example, suppose that we use
three subintervals of length 20 minutes, and let $F_1$, $F_2$ and $F_3$ be the numbers of cells
produced in the first, second and third subintervals.
t number produced
0 – 20 F1
20 – 40 F2
40 – 60 F3
Obviously, $F = F_1 + F_2 + F_3 = \sum_{i=1}^{3} F_i$. Now we can choose some time $t_1$ in the
first subinterval (perhaps $t_1 = 10$) and use $20F'(t_1)$ as an estimate for $F_1$. Similarly,
choosing some $t_2$ and $t_3$ in the second and third subintervals gives estimates $20F'(t_2)$
and $20F'(t_3)$ for $F_2$ and $F_3$, and an overall estimate
\[ F \approx 20F'(t_1) + 20F'(t_2) + 20F'(t_3) \]
for the quantity of interest. This estimate will presumably be better than the previous
estimate of $60F'(30)$, since the amount of variation of $F'(t)$ on the subintervals is
certainly less than it is over the whole interval.
Better and better estimates can be obtained by taking smaller and smaller subintervals.
If we took ten subintervals of 6 minutes each, and if we let $F_i$ be the number
of cells produced in the $i$-th subinterval, then
\[ F = F_1 + F_2 + F_3 + F_4 + F_5 + F_6 + F_7 + F_8 + F_9 + F_{10} \]
and $F_i \approx 6F'(t_i)$, where $t_i$ is some point in the $i$-th subinterval. This gives
\[ F \approx \sum_{i=1}^{10} 6F'(t_i). \]
Similarly, 120 subintervals of half minute length would give the estimate
\[ F \approx \sum_{i=1}^{120} \tfrac{1}{2} F'(t_i), \]
for the change in value of F (t) over the i-th subinterval. Now
This approximation tends to the true value as the number $n$ increases, and so we can
write
\[ F(b) - F(a) = \lim_{n\to\infty} \sum_{i=1}^{n} F'(t_i)\,\Delta t. \]
\[ \int_a^b f(t)\,dt \overset{\text{def}}{=} \lim_{n\to\infty} \sum_{i=1}^{n} f(t_i)\,\Delta t \tag{1} \]
The key point to note is that if f (t) is the derivative of some function F (t), then,
as we found previously,
\[ \int_a^b f(t)\,dt = F(b) - F(a). \]
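The definition of the integral as a limit of sums translates directly into a short computation. The following is a sketch under stated assumptions: the function name `riemann_sum`, the `position` parameter used to pick the sample point in each subinterval, and the test function f(t) = 3t² with antiderivative F(t) = t³ are all choices made here for illustration.

```python
def riemann_sum(f, a, b, n, position=0.5):
    # Split [a, b] into n subintervals of equal width dt and sample f at
    # t_i = left end + position * dt (0 = left end, 0.5 = midpoint, 1 = right end).
    dt = (b - a) / n
    return sum(f(a + (i + position) * dt) for i in range(n)) * dt

def f(t):
    return 3 * t ** 2  # the derivative of F(t) = t^3

def F(t):
    return t ** 3

for n in (10, 100, 1000):
    print(n, riemann_sum(f, 0, 1, n), F(1) - F(0))
```

As n grows the sums settle down to F(1) − F(0) = 1, whichever sample points are used.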
Example 3.7
Let us work through an example with F ′ (t) = t + 1 (so that at any time t the
instantaneous rate of change of F (t) is given by t + 1), and estimate the increase
in F (t) for the period from t = 0 to t = 2. That is, we will find an estimate for
F (2) − F (0).
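First divide $[0, 2]$ into four equal subintervals, each of length $\Delta t = \frac{1}{2}$. The
subintervals are
\[ [0, \tfrac{1}{2}],\quad [\tfrac{1}{2}, 1],\quad [1, \tfrac{3}{2}],\quad [\tfrac{3}{2}, 2]. \]
Next select a point $t_i$ from each subinterval:
\[ t_1 = 0,\quad t_2 = \tfrac{1}{2},\quad t_3 = 1,\quad t_4 = \tfrac{3}{2}. \]
This gives the estimate
\[ \sum_{i=1}^{4} F'(t_i)\,\Delta t = 1 \times \frac{1}{2} + \frac{3}{2} \times \frac{1}{2} + 2 \times \frac{1}{2} + \frac{5}{2} \times \frac{1}{2} = \frac{7}{2}. \]
Suppose we keep the intervals the same, but choose instead $t_1 = \frac{1}{2}$, $t_2 = 1$,
$t_3 = \frac{3}{2}$ and $t_4 = 2$. We have

t_i        1/2    1    3/2    2
F'(t_i)    3/2    2    5/2    3

and the estimate becomes
\[ \sum_{i=1}^{4} F'(t_i)\,\Delta t = \frac{3}{2} \times \frac{1}{2} + 2 \times \frac{1}{2} + \frac{5}{2} \times \frac{1}{2} + 3 \times \frac{1}{2} = \frac{9}{2}. \]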
The first estimate was clearly too small. Since t + 1 is an increasing function
its value on each subinterval is minimum at the left hand end of the interval. Since
we took each ti to be the left hand end-point of its subinterval, we were consistently
underestimating the rate of increase of F on the subintervals. Similarly, the second
estimate was too large, since by taking the right hand end-points we were consistently
overestimating. So the true value of F (2)−F (0) lies somewhere between 7/2 and 9/2.
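Now let us be more ambitious, and use 200 subintervals of length $\Delta t = 1/100$.
Let $t_i$ be the left hand end-point of the $i$-th subinterval; this gives $t_i = (i-1)/100$,
and
\[ F'(t_i) = 1 + \frac{i-1}{100}. \]
The estimate for $F(2) - F(0)$ is
\[ \sum_{i=1}^{200} \left(1 + \frac{i-1}{100}\right)\frac{1}{100} = \sum_{i=1}^{200} \left(\frac{1}{100} + \frac{i-1}{10000}\right). \]
This is an arithmetic series with first term $a = 1/100$, common difference $d = 1/10000$ and
$n = 200$ terms. Using the formula
\[ S = an + \frac{1}{2}n(n-1)d \]
(see (4) of §3.1) we obtain
\[ S = \frac{200}{100} + \frac{1}{2} \times 200 \times 199 \times \frac{1}{10000} = 2 + 1.99 = 3.99. \]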
If we had chosen ti as the right hand end-point we would have obtained a similar
arithmetic series, the only difference being that the first term would be 0.0101 instead
of 0.01, making the an term in the formula equal to 2.02 instead of 2. So the estimate
for F (2) − F (0) would be 4.01.
Using 2000 intervals instead of 200 gives lower and upper estimates of 3.999 and
4.001, and using 20000 intervals gives 3.9999 and 4.0001. It looks as though the
true value is 4. One can in fact work out a general formula for an estimate with $n$
subintervals and take the limit as $n \to \infty$, and confirm that 4 is correct.
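The left and right end-point estimates quoted in this example are easy to reproduce. This is a minimal sketch, with the function names `rate`, `left_sum` and `right_sum` introduced here for illustration.

```python
def rate(t):
    # F'(t) = t + 1 from Example 3.7.
    return t + 1

def left_sum(n, a=0.0, b=2.0):
    # Sample at the left hand end-point of each of the n subintervals.
    dt = (b - a) / n
    return sum(rate(a + i * dt) for i in range(n)) * dt

def right_sum(n, a=0.0, b=2.0):
    # Sample at the right hand end-point of each of the n subintervals.
    dt = (b - a) / n
    return sum(rate(a + (i + 1) * dt) for i in range(n)) * dt

for n in (4, 200, 2000, 20000):
    print(n, round(left_sum(n), 6), round(right_sum(n), 6))
```

Because t + 1 is increasing, the left sum is always an underestimate and the right sum an overestimate, so the true value is squeezed between them.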
There is, however, an easier way. We could just find a formula for $F$! In this
case, it is not hard to do this just by trial and error. In fact, if
\[ F(t) = \tfrac{1}{2}t^2 + t \]
then $F'(t) = t + 1$. The formula $F(t) = \frac{1}{2}t^2 + t$ gives $F(0) = 0$ and $F(2) = 4$,
confirming that $F(2) - F(0) = 4$.
The moral to be drawn from the above example is this: one can calculate
$\int_a^b f(t)\,dt$ by a lengthy calculation involving sums and limits, or by a trivial
calculation if a function $F$ can be found whose derivative is $f$. The catch is that the
problem of finding a function with a given derivative is not always easy.
A function F whose derivative is f is traditionally called a primitive of f , but
we shall prefer the more modern and intuitive term antiderivative.
Example 3.8
Imagine a journey in Ben’s VW from the Carslaw building (CB) to Bankstown Public
Library (BPL). Ben’s odometer (which measures distance travelled) is broken, but
his speedometer is working. We could estimate the length of the journey from CB to
BPL by noting the speed every few minutes and forming the sum
speed × bit of time + speed × bit of time + speed × bit of time + · · · .
That is, our estimate of distance is
\[ \sum_i F'(t_i) \cdot \Delta t_i \]
where $\Delta t_i$ is the time between successive measurements. Note that it is not necessary
for the subintervals to all have the same length, as we assumed (for simplicity) in
our previous examples. So long as all of the subintervals are small enough a good
estimate can be obtained.
Note that speed is the derivative of distance travelled. Here $F(t)$ is the distance
that has been travelled at time $t$, and $F'(t)$ is the speed at time $t$. If $t_{CB}$ is the time
at the start of the journey and $t_{BPL}$ the time at the end, then $F(t_{CB}) = 0$ and the
length of the journey is
\[ F(t_{BPL}) = F(t_{BPL}) - F(t_{CB}) = \int_{t_{CB}}^{t_{BPL}} F'(t)\,dt. \]
Of course, evaluating the sum for some finite number of terms $n$ will only give
an estimate of the integral. All in all it would be better if Ben got his $F(t)$ fixed, so
that the length of the journey would just be the difference of two odometer readings.
Our discussion in §3.2 naturally leads us to consider the following problem: how can
we find a function F whose derivative is a given function f ? Doing this is called
integrating f .
There are many rules and techniques for integration that have been discovered,
and we shall meet several of them in this course. In general, integration is somewhat
harder than differentiation; so whenever you think that you have found an antideriva-
tive for a function f , you should always differentiate it and check that the answer
really is f .
Knowing how to differentiate makes it possible to integrate some functions by a
combination of guesswork and trial and error. For example, it is not too hard to find
a function F (x) satisfying F ′ (x) = x2 . Indeed,
\[ \frac{d}{dx}\left(\frac{x^3}{3}\right) = \frac{3x^2}{3} = x^2. \]
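The advice to differentiate a proposed antiderivative and compare can also be followed numerically. The sketch below is one possible approach, not a method from the notes: it approximates the derivative by a central difference, and the names `derivative` and `looks_like_antiderivative`, the step size and the tolerance are all assumptions made here. A numerical check of this kind can expose a wrong guess, but it is not a proof that a guess is right.

```python
def derivative(F, x, h=1e-6):
    # Central-difference approximation to F'(x); h is a small step size.
    return (F(x + h) - F(x - h)) / (2 * h)

def looks_like_antiderivative(F, f, points, tol=1e-4):
    # True if F'(x) agrees with f(x) at every test point, up to tol.
    return all(abs(derivative(F, x) - f(x)) < tol for x in points)

points = [-2.0, -0.5, 0.3, 1.0, 2.5]
# Correct guess: F(x) = x^3 / 3 for f(x) = x^2.
print(looks_like_antiderivative(lambda x: x ** 3 / 3, lambda x: x ** 2, points))
# Wrong guess: F(x) = x^3, whose derivative is 3x^2 rather than x^2.
print(looks_like_antiderivative(lambda x: x ** 3, lambda x: x ** 2, points))
```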
These days there are powerful computer algebra packages that are able to find
antiderivatives of many functions, and it can be very useful to employ such a program
when faced with a particularly complicated integration problem. Nevertheless, it is
still important to learn the techniques of integration to understand what the computer
is doing. This not only aids in spotting any errors, but also aids in constructing
a greater picture of what is going on. This means a better understanding of the
problem in general, but can also lead to, for example, being able to pick pathways
to solutions of entire classes of problems, or even finding solutions to problems that
remain unreachable by current computer packages.
Notation and terminology
If f is a given function, then a function F satisfying F ′ (x) = f (x) is called a primitive
or antiderivative of f . It is also called the indefinite integral of f , and the notation
\[ F(x) = \int f(x)\,dx \]
is used. For example, if
\[ F_1(x) = 1 + \sin x, \qquad F_2(x) = 15 + \sin x, \qquad F_3(x) = -8 + \sin x \]
then $F_1'(x) = F_2'(x) = F_3'(x) = \cos x$; so $F_1$, $F_2$ and $F_3$ are all antiderivatives of the
cosine function. Thus the expression
\[ \int \cos x\,dx \]
does not uniquely determine any one function. Although it is legitimate to write
$F_1(x) = \int \cos x\,dx$, it is equally legitimate to write $F_2(x) = \int \cos x\,dx$ and to write
$F_3(x) = \int \cos x\,dx$, even though $F_1$, $F_2$ and $F_3$ are not equal. It is best not to regard
the indefinite integral $\int f(x)\,dx$ as a function, but to think of the equation
\[ F(x) = \int f(x)\,dx \]
as simply another way of writing
\[ f(x) = F'(x). \]
Example 3.9
The best way to learn how to find antiderivatives is to find the derivatives of a
large number of functions. Altering your point of view, you have then found the
antiderivatives of a large number of functions, since a table of antiderivatives is just
a table of derivatives in reverse. So let us differentiate some functions, and reformulate
the results as indefinite integrals.
• Since $\frac{d}{dx} \sin x = \cos x$, it follows that $\int \cos x\,dx = \sin x + C$ (for any constant $C$).

• Since $\frac{d}{dx} x^n = nx^{n-1}$, it follows that $\int nx^{n-1}\,dx = x^n + C$.

Of course, it would be nicer to have a formula for $\int x^n\,dx$, and with a small amount
of ingenuity we can achieve this. Firstly, since the formula we obtained above
is valid for all values of $n$, we can put $n + 1$ in place of $n$. This shows that
$\int (n+1)x^n\,dx = x^{n+1} + C$, corresponding to the fact that $\frac{d}{dx} x^{n+1} = (n+1)x^n$.
Provided that $n \neq -1$, we can now just divide through by $n + 1$.

• Since $\frac{d}{dx} \frac{x^{n+1}}{n+1} = x^n$, it follows that $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$, for $n \neq -1$.

• Since $\frac{d}{dx} \ln x = \frac{1}{x}$, it follows that $\int \frac{1}{x}\,dx = \ln x + C$, provided that $x > 0$.
3.3 The indefinite integral 81
The proviso that x > 0 has to be inserted, since the function ln x is only defined for
x > 0. But if x < 0 then ln(−x) is defined, and differentiating we find that
\[ \frac{d}{dx} \ln(-x) = \frac{1}{-x} \times (-1) = \frac{1}{x}. \]
So if we are concerned with negative $x$ values rather than positive ones, the following
rule applies:

• $\int \frac{1}{x}\,dx = \ln(-x) + C$, provided that $x < 0$.

Differentiating $e^x$, $\cos x$ and $\tan x$ immediately gives us the following three formulas:

• $\int e^x\,dx = e^x + C$;

• $\int \sin x\,dx = -\cos x + C$;

• $\int \sec^2 x\,dx = \tan x + C$.
The differentiation rules
\[ \frac{d}{dx}\bigl(f(x) + g(x)\bigr) = \frac{d}{dx}f(x) + \frac{d}{dx}g(x) \]
and
\[ \frac{d}{dx}\bigl(kf(x)\bigr) = k\,\frac{d}{dx}f(x) \]
for all differentiable functions $f$ and $g$ and all constants $k$ give us the following general
rules for integration.

• $\int (f(x) + g(x))\,dx = \int f(x)\,dx + \int g(x)\,dx$;

• $\int kf(x)\,dx = k \int f(x)\,dx$.
The above integration rules, and a few others, are listed in Appendix 4. In the
next example we illustrate how these rules can be used to integrate various functions.
Example 3.10
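(i) $\int x^6\,dx = \frac{x^7}{7} + C$.

(ii) $\int x^{-4}\,dx = \frac{x^{-3}}{-3} + C$.

(iii) $\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{x^{3/2}}{3/2} + C = \frac{2x\sqrt{x}}{3} + C$.

(iv) \[
\begin{aligned}
\int \left(x^3 + 2x^2 + 3 - \frac{4}{x^6}\right) dx &= \int x^3\,dx + 2\int x^2\,dx + \int 3\,dx - 4\int x^{-6}\,dx \\
 &= \frac{x^4}{4} + \frac{2x^3}{3} + 3x + \frac{4x^{-5}}{5} + C.
\end{aligned}
\]

(v) $\int y^4(1-y)^2\,dy = \int (y^4 - 2y^5 + y^6)\,dy = \frac{y^5}{5} - \frac{2y^6}{6} + \frac{y^7}{7} + C$.

(vi) $\int \frac{1-u}{\sqrt{u}}\,du = \int (u^{-1/2} - u^{1/2})\,du = \frac{2\sqrt{u}}{3}(3 - u) + C$.

(vii) $\int 5e^x\,dx = 5e^x + C$.

(viii) \[
\int \frac{x^2+1}{x}\,dx = \int (x + x^{-1})\,dx =
\begin{cases}
\dfrac{x^2}{2} + \ln x + C & \text{on intervals where } x > 0, \\[1ex]
\dfrac{x^2}{2} + \ln(-x) + C & \text{on intervals where } x < 0.
\end{cases}
\]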
Integration using the chain rule (or “function of a function” rule) in reverse
Recall the formula for the derivative of a composite function $f(g(x))$:
\[ \frac{d}{dx}\bigl(f(g(x))\bigr) = f'(g(x))\,g'(x). \]
Example 3.11
(i) $\frac{d}{dx}\bigl(\sin(x^2+3)\bigr) = \cos(x^2+3)\,\frac{d}{dx}(x^2+3) = 2x\cos(x^2+3)$.
Therefore, $\int 2x\cos(x^2+3)\,dx = \sin(x^2+3) + C$.

(ii) $\frac{d}{dx}(3x^3+5)^9 = 9(3x^3+5)^8(9x^2) = 81x^2(3x^3+5)^8$.
Therefore, $\int 81x^2(3x^3+5)^8\,dx = \int 9(3x^3+5)^8(9x^2)\,dx = (3x^3+5)^9 + C$.
Many integration problems involve recognising that the integrand is of the form
$f(g(x))\,g'(x)$: the product of a composite function and the derivative of the “inside”
function.
Example 3.12
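(i) \[
\begin{aligned}
\int e^{x^2}\,2x\,dx &= \int e^{g(x)} g'(x)\,dx, \quad \text{where } g(x) = x^2, \\
 &= e^{g(x)} + c \\
 &= e^{x^2} + c.
\end{aligned}
\]

(ii) \[
\begin{aligned}
\int x\sqrt{1+x^2}\,dx &= \frac{1}{2}\int 2x\sqrt{1+x^2}\,dx \\
 &= \frac{1}{2}\int g'(x)\,(g(x))^{1/2}\,dx, \quad \text{where } g(x) = 1 + x^2, \\
 &= \frac{1}{2}\,\frac{(g(x))^{3/2}}{3/2} + c \\
 &= \frac{(1+x^2)^{3/2}}{3} + c.
\end{aligned}
\]

(iii) \[
\begin{aligned}
\int \sin x \cos x\,dx &= \int g(x)g'(x)\,dx, \quad \text{where } g(x) = \sin x, \\
 &= \frac{(g(x))^2}{2} + c \\
 &= \frac{\sin^2 x}{2} + c.
\end{aligned}
\]

(iv) \[
\begin{aligned}
\int e^x(e^x - 2)^4\,dx &= \int g'(x)(g(x))^4\,dx, \quad \text{where } g(x) = e^x - 2, \\
 &= \frac{(g(x))^5}{5} + c \\
 &= \frac{(e^x - 2)^5}{5} + c.
\end{aligned}
\]

(v) \[
\begin{aligned}
\int \frac{x}{x^2+1}\,dx &= \frac{1}{2}\int 2x(x^2+1)^{-1}\,dx \\
 &= \frac{1}{2}\int g'(x)(g(x))^{-1}\,dx, \quad \text{where } g(x) = x^2 + 1, \\
 &= \frac{1}{2}\ln|g(x)| + c \\
 &= \frac{1}{2}\ln(x^2+1) + c.
\end{aligned}
\]

(vi) \[
\begin{aligned}
\int \frac{\cos\sqrt{x}}{\sqrt{x}}\,dx &= 2\int \cos\sqrt{x}\,\frac{1}{2\sqrt{x}}\,dx \\
 &= 2\int \cos(g(x))\,g'(x)\,dx, \quad \text{where } g(x) = \sqrt{x}, \\
 &= 2\sin g(x) + c \\
 &= 2\sin\sqrt{x} + c.
\end{aligned}
\]

(vii) \[
\begin{aligned}
\int \cos^2 x \sin x\,dx &= -\int \cos^2 x\,(-\sin x)\,dx \\
 &= -\int (g(x))^2 g'(x)\,dx, \quad \text{where } g(x) = \cos x, \\
 &= -\frac{(g(x))^3}{3} + c \\
 &= -\frac{\cos^3 x}{3} + c.
\end{aligned}
\]

(viii) \[
\begin{aligned}
\int \frac{\sin t \cos t}{2 + \sin^2 t}\,dt &= \frac{1}{2}\int \frac{2\sin t \cos t}{2 + \sin^2 t}\,dt \\
 &= \frac{1}{2}\int g'(t)(g(t))^{-1}\,dt, \quad \text{where } g(t) = 2 + \sin^2 t, \\
 &= \frac{1}{2}\ln|g(t)| + c \\
 &= \frac{1}{2}\ln(2 + \sin^2 t) + c.
\end{aligned}
\]

(ix) \[
\begin{aligned}
\int \frac{\ln x}{x}\,dx &= \int g(x)g'(x)\,dx, \quad \text{where } g(x) = \ln x, \\
 &= \frac{(g(x))^2}{2} + c \\
 &= \frac{(\ln x)^2}{2} + c.
\end{aligned}
\]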
\[ \int f(x)\,dx = F(x) + c \]
as really being a family of functions: one function for each different choice of the
constant of integration c. The graphs of the different functions in this family give a
family of curves, called integral curves.
The diagram below illustrates the family of curves obtained from
the indefinite integral $\int (3x^2 - 1)\,dx = x^3 - x + c$. The graphs of $y = x^3 - x + c$
are shown for $c = 2, 1, 0, -1, -2$ and $-3$.
Notice that any given point $(x_0, y_0)$ will lie on one of the integral curves, since
you can take any one of the curves and move it vertically (up or down) until it passes
through $(x_0, y_0)$. For example, it is easy to find the value of $c$ for which $y = x^3 - x + c$
passes through the point $(2, 3)$: since $(2, 2^3 - 2 + c)$ lies on the curve, you just need
$2^3 - 2 + c = 3$, that is, $c = -3$.

[Figure: integral curves $y = x^3 - x + c$ for $c = 2, 1, 0, -1, -2, -3$, plotted for $-2 \le x \le 2$, with the point $(2, 3)$ marked.]
Example 3.13
Sketch the integral curves of ex , and find the particular curve that contains the point
(2, 1).
Solution: We have that $\int e^x\,dx = e^x + C$. The curve $y = e^x + C$ passes through
the point $(2, e^2 + C)$, and so we want to choose $C$ so that $e^2 + C = 1$. That is, the
particular curve required is $y = e^x + 1 - e^2$. (Note that $1 - e^2 \approx -6.39$.)
To sketch these curves, remember that $e^x > 0$ for all $x$ and $e^x \to 0$ as $x \to -\infty$.
Thus the graph of $y = e^x + C$ approaches the horizontal line $y = C$ as we move along
the curve to the left. The graph crosses the $y$-axis at $y = 1 + C$, since
$e^0 = 1$. For $x > 0$ the slope of the graph increases rapidly as $x$ increases. To get a
reasonably accurate picture it is necessary to plot a few points.

[Figure: integral curves $y = e^x + C$ for $C = 0, -1, -2, -3, -4, -5$ and $-6.39$, plotted for $-10 \le x \le 2$, with the point $(2, 1)$ marked.]

The diagram shows the curves corresponding to $C = 0$, $-1$ and $-2$ plotted from
$x = -9$ to $x = 2$, and the curves corresponding to $C = -3$, $C = -4$ and $C = -6.39$
plotted from $x = -9$ to $x = 2.3$.
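Picking out the integral curve through a given point always comes down to one subtraction. The sketch below illustrates this for the two families of curves discussed above; the helper name `constant_through` is introduced here and is not from the notes.

```python
import math

def constant_through(F, x0, y0):
    # The curve y = F(x) + C passes through (x0, y0) exactly when C = y0 - F(x0).
    return y0 - F(x0)

# Curve y = x^3 - x + C through the point (2, 3).
C_cubic = constant_through(lambda x: x ** 3 - x, 2, 3)
# Curve y = e^x + C through the point (2, 1).
C_exp = constant_through(math.exp, 2, 1)

print(C_cubic, round(C_exp, 2))  # -3 -6.39
```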
Direction fields
Sometimes it is very difficult, or even impossible, to find a formula for an antiderivative
of a given function $f$. Try as you might, you will never be able to find a formula for an $F(x)$
such that $F'(x) = e^{x^2}$, for example. However, even in such cases it is still possible to
make a sketch of the integral curves for $\int f(x)\,dx$.
We have seen that every point (x0 , y0 ) in the plane lies on one of these integral
curves y = F (x) + c. Observe that the slope of y = F (x) + c at the point (x0 , y0 )
is F ′ (x0 ) = f (x0 ), which we can compute just by knowing the function f . So, for
every (x0 , y0 ) in the plane, we know the slope at (x0 , y0 ) of the integral curve that
(x0 , y0 ) lies on. In other words, we know the direction in which the curve is going as
it passes through (x0 , y0 ).
This collection of directions, one direction for each point in the plane, constitutes
what is known as a direction field. One can imagine the entire plane covered with
short line segments, one at each point, with slopes as given by the direction field. If
enough of these line segments are drawn, it becomes possible to draw integral curves
by simply joining them up.
In the diagram we have used the function $f(x) = 3x^2 - 1$ (the integral curves of
which were sketched previously). Calculating $f(x)$ at intervals of length 0.25, from
$x = -1.5$ to $x = 1.5$, gives the following table of values (rounded to two decimals).

x      −1.5   −1.25  −1   −0.75  −0.5   −0.25  0    0.25   0.5    0.75  1   1.25  1.5
f(x)   5.75   3.69   2    0.69   −0.25  −0.81  −1   −0.81  −0.25  0.69  2   3.69  5.75

For each of these $x$ values we have drawn line segments of slope $f(x)$ at a number of
points $(x, y)$, with $y$ ranging from $-1.5$ to $1.5$. The diagram enables one to see roughly
where the integral curves must go. (One has been drawn in, as a dotted line.)

[Figure: direction field for $f(x) = 3x^2 - 1$, with one integral curve drawn as a dotted line.]
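The slopes used to draw a direction field can be generated mechanically. The following sketch recomputes the values of f(x) = 3x² − 1 at steps of 0.25 and shows one way to obtain the end points of a short line segment through a given point; the names `slope` and `segment` and the segment half-width are assumptions made for illustration.

```python
def slope(x):
    # f(x) = 3x^2 - 1 gives the slope of every integral curve at abscissa x.
    return 3 * x ** 2 - 1

xs = [-1.5 + 0.25 * i for i in range(13)]  # -1.5, -1.25, ..., 1.5
for x in xs:
    print(f"{x:6.2f}  {slope(x):6.2f}")

def segment(x0, y0, half_width=0.1):
    # End points of a short line segment through (x0, y0) with slope f(x0).
    m = slope(x0)
    return ((x0 - half_width, y0 - m * half_width),
            (x0 + half_width, y0 + m * half_width))
```

Note that the slope depends only on x, so the segments in any vertical column of the diagram are parallel.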
If we know how a quantity Q(t) varies with time t, differentiating gives the rate
of change, Q′ (t). If we are instead given the rate of change, as a function q(t), we
need to integrate it to find Q(t).
Similarly, when graphing the function y = f (x), the derivative gives the slope
of the tangent at each point. If we are instead given a formula for the slope, as a
function of x, we need to integrate this to find f (x).
The problem of finding a formula for a quantity given its rate of change is equivalent
to finding the equation of a function given the slope of its graph.
Example 3.14
(1) Find the equation of a curve that has a slope of 3x2 at each point.
Solution: Let the curve be $y = f(x)$. Then $f'(x) = 3x^2$, and so
\[ f(x) = \int 3x^2\,dx = x^3 + C. \]
Therefore, the equation of a suitable curve is $y = x^3 + C$, for any choice of $C$.
Notice that without some additional requirement, the answer is not unique. But
if, for example, we were also told that the curve must pass through $(0, 1)$, then we
could use this to compute $C$. Indeed, $f(0) = 1$ would give $0^3 + C = 1$, so that $C = 1$, and the
curve required would be $y = x^3 + 1$.
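(2) Find the equation of the curve that passes through the point $(2, 5)$ and satisfies
\[ \frac{dy}{dx} = 3x^2 - 8x + 7. \]
Solution: Let the curve be $y = f(x)$. Then
\[ f(x) = \int (3x^2 - 8x + 7)\,dx = x^3 - 4x^2 + 7x + C, \]
which is the general solution before imposing the requirement that the graph must
pass through $(2, 5)$. The formula gives
\[ f(2) = 2^3 - 4 \times 2^2 + 7 \times 2 + C = 8 - 16 + 14 + C = 6 + C, \]
so the requirement $f(2) = 5$ gives $C = -1$, and the curve required is $y = x^3 - 4x^2 + 7x - 1$.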
where f (t) is the speed at time t and F is an antiderivative of f . (We are applying the
Fundamental Theorem of Calculus to evaluate a definite integral.) In this problem
\[ F(t) = \int \left(12 - \frac{t}{5}\right) dt = 12t - \frac{t^2}{10} + c, \]
so that
\[ F(6) - F(0) = \left(12 \times 6 - \frac{6^2}{10} + c\right) - (0 - 0 + c) = 68.4 \text{ kilometres}. \]
Notice that the constant c cancels out; so in this kind of problem there is no
need to include it in the first place.
We remark that it is usual practice in these problems to use the notation
\[ \Bigl[ F(t) \Bigr]_a^b \overset{\text{def}}{=} F(b) - F(a). \]
(4) A population of size N (t) has a rate of change R(t) = −t(t − 100) = N ′ (t).
(i) Represent N (t) as an integral.
Solution: (i) We have
\[ N(t) = \int -t(t - 100)\,dt = \int (100t - t^2)\,dt = 50t^2 - \tfrac{1}{3}t^3 + c. \]
(ii) The formula just derived gives $N(0) = 0 - 0 + c = c$. Since we require $N(0) = 50$
it follows that $c = 50$. So
\[ N(t) = 50t^2 - \tfrac{1}{3}t^3 + 50. \]
Hence
\[ N(50) = 50 \times 50^2 - \frac{50^3}{3} + 50 \approx 83{,}400 \]
(to the nearest hundred).
(5) Suppose that a population has a rate of growth given by three times the popu-
lation size. In other words, if N (t) is the number of individuals in the population at
time t, then at time t the rate of growth of the population is 3N (t) individuals per
unit time. (Since we have not been told the unit of time employed, we do not really
know how fast this population is growing.)
(i) Find N (t) if N (0) = 3.2 × 104 .
(ii) Calculate N (10).
Solution: (i) We are given that $N'(t) = 3N(t)$. Dividing by $N(t)$ we obtain
\[ \frac{N'(t)}{N(t)} = 3. \]
The point of this trick is that we can now integrate, using the chain rule in reverse:
\[ \int \frac{1}{N(t)}\,N'(t)\,dt = \ln N(t) + c, \tag{2} \]
where $c$ is a constant, as yet unknown. But the left hand side of Eq. (2) has to equal
$\int 3\,dt = 3t + c'$, for some $c'$, and so we obtain
\[ \ln N(t) + c = 3t + c'. \]
Hence $\ln N(t) = 3t + (c' - c)$, and so
\[ N(t) = Ke^{3t}, \]
where $K = e^{c' - c}$.
This is an example of the so called exponential growth model. Note that in
practice exponential growth cannot be sustained for very long, since the exponen-
tial function increases so rapidly that some previously ignored physical restriction is
bound to manifest itself, slowing the rate of increase.
By the formula, $N(0) = Ke^0 = K$. So $N(0) = 3.2 \times 10^4$ gives $K = 3.2 \times 10^4$, and
\[ N(t) = 3.2 \times 10^4\,e^{3t}. \]
(ii) The formula above gives $N(10) = 32000\,e^{30} \approx 3 \times 10^{17}$. (It is hard to believe
that such an astronomic population size could ever be achieved in any real situation.)
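The size of N(10) and the defining property N′(t) = 3N(t) can both be checked numerically. This is a minimal sketch; the function names and the finite-difference step are choices made here, not part of the notes.

```python
import math

def population(t, n0=3.2e4, rate=3.0):
    # Exponential growth model N(t) = N(0) * e^(rate * t).
    return n0 * math.exp(rate * t)

def growth_rate(t, h=1e-6):
    # Central-difference estimate of N'(t), to compare with 3 N(t).
    return (population(t + h) - population(t - h)) / (2 * h)

print(f"N(10) = {population(10):.2e}")
print(growth_rate(1.0) / population(1.0))  # should be close to 3
```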
[Figure: the area $A$ under the curve $y = f(x)$ between $x = a$ and $x = b$.]

An estimate for $A$ can be found as follows. Divide the area into $n$ vertical strips,
each of width $\Delta x = \frac{b-a}{n}$, and draw a horizontal line across the top of each strip,
crossing the curve at some point within the strip. The area $A$ is then approximated
by the sum of the areas of $n$ narrow rectangles, each of width $\Delta x$ and height $f(x_i)$,
where $x_i$ is some point in the interval $[a + (i-1)\Delta x,\ a + i\Delta x]$. The next diagram
illustrates a typical narrow rectangle.

[Figure: a typical narrow rectangle of width $\Delta x$ and height $f(x_i)$ under the curve $y = f(x)$, between $x = a$ and $x = b$.]

Each of the $n$ rectangles has area $f(x_i)\Delta x$ (for some $x_i$), and hence
\[ A \approx \sum_{i=1}^{n} f(x_i)\,\Delta x. \]
Clearly, as $n$ increases (that is, as the number of rectangles increases and their width
$\Delta x$ is made smaller) we obtain better estimates for $A$. As $n$ increases indefinitely we
have
\[ A = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = \int_a^b f(x)\,dx. \]
Here we have used the definition of the definite integral, as given in §3.2.
Example 3.15
(1) Find the area under the curve f (x) = x/2 on the interval [0, 2].
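[Figure: graph of $y = x/2$ for $0 \le x \le 2$.]

Solution: \[ A = \int_0^2 \frac{x}{2}\,dx = \left[\frac{x^2}{4}\right]_0^2 = 1. \]

(2) Find the area under the curve $f(x) = -x^2 + 5x - 4$ on the interval $[1, 4]$.

[Figure: graph of the curve for $1 \le x \le 4$.]

Solution: \[ A = \int_1^4 (-x^2 + 5x - 4)\,dx = \left[-\frac{x^3}{3} + \frac{5x^2}{2} - 4x\right]_1^4 = 4.5. \]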
3.4 Applications of the definite integral 93
[Figure: the area $A$ between the curves $y = g(x)$ and $y = f(x)$ from $x = a$ to $x = b$, showing a typical strip of width $\Delta x$ and height $g(x_i) - f(x_i)$.]

The area $A$ bounded by the two curves $y = g(x)$ and $y = f(x)$ and the vertical lines
$x = a$ and $x = b$ can be found by the same method as used in 3.15 to find the area
under a curve. That is, divide the area into $n$ narrow strips, each of width $\Delta x = \frac{b-a}{n}$,
and find the limit of the sum of the areas of these $n$ strips as $n \to \infty$.
Note that as long as $g(x) \ge f(x)$, the height of the $i$-th strip is $g(x_i) - f(x_i)$,
for some $x_i$ in the subinterval. (Note that this is true regardless of whether $f(x)$ (or,
indeed, $g(x)$) is above or below the $x$-axis; all that matters is that $g(x) \ge f(x)$.) So
\[
\begin{aligned}
A &= \lim_{n\to\infty} \sum_{i=1}^{n} (g(x_i) - f(x_i))\,\Delta x \\
  &= \int_a^b (g(x) - f(x))\,dx.
\end{aligned}
\]
Example 3.16
(1) Find the area between the curves y = g(x) and y = f (x) for 0 ≤ x ≤ 5, where
g(x) = 2 and f (x) = 2 − 2e−x .
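[Figure: graphs of $y = g(x)$ and $y = f(x)$ for $0 \le x \le 6$.]

Note that $e^{-x} > 0$ for all $x$; so $2 - 2e^{-x} < 2$. (As the diagram shows, at $x = 5$ the difference
between $2 - 2e^{-x}$ and $2$ is very small.) So the formula $A = \int_a^b (g(x) - f(x))\,dx$
applies, and gives
\[ A = \int_0^5 \bigl(2 - (2 - 2e^{-x})\bigr)\,dx = \int_0^5 2e^{-x}\,dx = \Bigl[-2e^{-x}\Bigr]_0^5 = 2 - 2e^{-5} \approx 1.99. \]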
(2) To find the area between two intersecting curves, we must first find the x-values
of all points of intersection. For example, let us find the area between y = 5x − x2
and y = x2 − 5x + 4.5. The first step is to solve for the intersection(s) as follows:
\[
\begin{aligned}
5x - x^2 &= x^2 - 5x + 4.5, \\
2x^2 - 10x + 4.5 &= 0, \\
(2x - 1)(x - 4.5) &= 0,
\end{aligned}
\]
giving $x = 0.5$ or $x = 4.5$. Thus we find that there are two points of intersection,
one at $(0.5, 2.25)$ and the other at $(4.5, 2.25)$.

[Figure: graphs of $y = 5x - x^2$ and $y = x^2 - 5x + 4.5$, intersecting at $x = 0.5$ and $x = 4.5$.]

For $x$ in the interval $[0.5, 4.5]$ we have
$5x - x^2 \ge x^2 - 5x + 4.5$, and the area sought is
\[
\begin{aligned}
A &= \int_{0.5}^{4.5} \bigl((5x - x^2) - (x^2 - 5x + 4.5)\bigr)\,dx = \int_{0.5}^{4.5} (10x - 2x^2 - 4.5)\,dx \\
  &= \Bigl[5x^2 - \tfrac{2}{3}x^3 - 4.5x\Bigr]_{0.5}^{4.5} \\
  &= 21\tfrac{1}{3}.
\end{aligned}
\]
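The arithmetic in the last step is fiddly, so it is worth confirming with exact fractions. The sketch below evaluates the antiderivative at the two intersection points; the variable and function names are illustrative.

```python
from fractions import Fraction

def antiderivative(x):
    # Antiderivative of (5x - x^2) - (x^2 - 5x + 4.5) = 10x - 2x^2 - 4.5.
    return 5 * x ** 2 - Fraction(2, 3) * x ** 3 - Fraction(9, 2) * x

lower, upper = Fraction(1, 2), Fraction(9, 2)  # x-values of the intersections
area = antiderivative(upper) - antiderivative(lower)
print(area)  # 64/3, i.e. 21 1/3
```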
You should be careful not to fall into the trap of thinking that the definite
integral $\int_a^b f(x)\,dx$ is always the area between the graph and the $x$-axis. This is only
valid if $f(x) \ge 0$ on the interval $[a, b]$. If $f(x) < 0$ on the interval $[a, b]$ then the area
between the graph and the $x$-axis is $\int_a^b (0 - f(x))\,dx = -\int_a^b f(x)\,dx$, and if the graph
crosses the $x$-axis in the interval $[a, b]$ then it is necessary to find all the points where
it crosses, and evaluate all the different pieces of area separately.
Consider, for example the area between the graph of y = sin x and the x-axis
for 0 ≤ x ≤ 2π.
[Figure: graph of $y = \sin x$ for $0 \le x \le 2\pi$, with the regions between the graph and the $x$-axis shaded.]

The graph crosses the axis once inside the interval $[0, 2\pi]$, namely at $x = \pi$. Of course
\[ \int_0^{2\pi} \sin x\,dx = \Bigl[-\cos x\Bigr]_0^{2\pi} = (-\cos 2\pi) - (-\cos 0) = -1 + 1 = 0, \]
which is clearly not equal to the area between $y = \sin x$ and the $x$-axis on $[0, 2\pi]$.
Instead, we should calculate the two shaded areas separately, and add them.
On the interval $[0, \pi]$ we have $\sin x \ge 0$, and so the first area is
\[ \int_0^{\pi} \sin x\,dx = \Bigl[-\cos x\Bigr]_0^{\pi} = (-\cos \pi) - (-\cos 0) = 1 + 1 = 2. \]
It is clear by the symmetry of the diagram that the other part will have the same
area. To confirm this, observe that since $\sin x \le 0$ for $\pi \le x \le 2\pi$, the area between
the graphs of $y = 0$ (the $x$-axis) and $y = \sin x$ on $[\pi, 2\pi]$ is
\[ \int_{\pi}^{2\pi} (0 - \sin x)\,dx = \Bigl[\cos x\Bigr]_{\pi}^{2\pi} = \cos 2\pi - \cos \pi = 1 + 1 = 2. \]
So the total area is $2 + 2 = 4$.
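The difference between the signed integral and the actual area can be seen numerically. The following sketch approximates both with a midpoint sum; the helper `midpoint_sum` and the number of subintervals are choices made here, and integrating |sin x| is used as a shortcut for adding the separate pieces.

```python
import math

def midpoint_sum(f, a, b, n=10000):
    # Midpoint Riemann sum approximating the integral of f from a to b.
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Signed integral of sin x over [0, 2*pi]: the two pieces cancel.
signed = midpoint_sum(math.sin, 0, 2 * math.pi)
# Area between the graph and the x-axis: integrate |sin x| instead.
total_area = midpoint_sum(lambda x: abs(math.sin(x)), 0, 2 * math.pi)

print(round(signed, 6), round(total_area, 6))
```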
and choose a sample point xi in each sub-interval. Averaging the function values
f (x1 ), f (x2 ), . . . , f (xn ) gives
\[
\begin{aligned}
\frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} &= \sum_{i=1}^{n} \frac{f(x_i)}{n} \\
 &= \sum_{i=1}^{n} f(x_i)\,\frac{\Delta x}{b-a} \\
 &= \frac{1}{b-a} \sum_{i=1}^{n} f(x_i)\,\Delta x
\end{aligned}
\]
since $\Delta x = \frac{b-a}{n}$. If we believe that we get better estimates by taking more points,
the average value of $f$ on $[a, b]$ should be defined as the limit of the above expression
as $n$ tends to $\infty$. Since we have already defined the limit as $n \to \infty$ of the expression
$\sum_{i=1}^{n} f(x_i)\,\Delta x$ to be the definite integral of $f(x)$ from $a$ to $b$, our definition of the
average value of $f$ on $[a, b]$ becomes
\[ \text{average value} = \frac{1}{b-a} \int_a^b f(x)\,dx. \]
The above formula is intuitively reasonable, since if we write $k$ for the average
value then we find that
\[
\begin{aligned}
\int_a^b f(x)\,dx &= k(b - a) \\
 &= kb - ka \\
 &= \Bigl[kx\Bigr]_a^b \\
 &= \int_a^b k\,dx.
\end{aligned}
\]
In other words, the integral of f (x) from a to b equals the integral of its average value
over the same interval.
Example 3.17
Consider the function $f(x) = x(2 - x)$ over the interval $[0, 2]$. The function values on
this interval are all nonnegative, and so the area between the graph and the $x$-axis
over this interval is given by $\int_0^2 (2x - x^2)\,dx = \frac{4}{3}$. The average value of $f(x)$ over
$[0, 2]$ is $\frac{1}{2-0} \int_0^2 (2x - x^2)\,dx = \frac{2}{3}$.

[Figure: two diagrams for $0 \le x \le 2$: the shaded area under the graph of $y = x(2 - x)$, and the shaded rectangle of height $\frac{2}{3}$ on the base $[0, 2]$.]

Now observe that a rectangle on the base $[0, 2]$ with
height equal to this average value has area $\frac{4}{3}$, the same as the area under the graph.
That is, the shaded areas in the above diagrams are equal.
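The average value formula can be tried out numerically on this example. The sketch below approximates the integral with a midpoint sum; `average_value` and the default number of subintervals are assumptions made for illustration.

```python
def average_value(f, a, b, n=10000):
    # (1/(b-a)) * integral of f over [a, b], with the integral approximated
    # by a midpoint Riemann sum on n equal subintervals.
    dx = (b - a) / n
    integral = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
    return integral / (b - a)

avg = average_value(lambda x: x * (2 - x), 0, 2)
# First number: the average value; second: area of the rectangle of that height on [0, 2].
print(round(avg, 6), round(avg * (2 - 0), 6))
```

The second printed number is the rectangle's area, which should match the area under the graph.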
Example 3.18
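(1) \[
\begin{aligned}
\int_2^5 (t^2 - 2t)\,dt &= \int_2^5 t^2\,dt + \int_2^5 (-2)t\,dt \\
 &= \int_2^5 t^2\,dt - 2\int_2^5 t\,dt \\
 &= \Bigl[\tfrac{1}{3}t^3\Bigr]_2^5 - 2\Bigl[\tfrac{1}{2}t^2\Bigr]_2^5 \\
 &= \frac{117}{3} - 2 \times \frac{21}{2} \\
 &= 18.
\end{aligned}
\]

(2) \[ \int_3^0 t^2\,dt = -\int_0^3 t^2\,dt = -\Bigl[\tfrac{1}{3}t^3\Bigr]_0^3 = -9. \]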
(3) If f (x) is a continuous function on [a, b], but is defined differently over different
sections of the interval, then property (3) above allows us to calculate the integral
from a to b by splitting it into appropriate pieces. We illustrate this by finding the
average value of g(t) over its interval of definition, where
\[
g(t) =
\begin{cases}
t(10 - t) & 0 \le t \le 5, \\
25e^{5-t} & 5 \le t \le 60.
\end{cases}
\]
(Since g(t) was defined differently on [0, 5] and [5, 60], we were forced to calculate the
integral from 0 to 60 in two pieces, namely 0 to 5 and 5 to 60.)
Some examples of odd functions are
\[ x,\ x^3,\ x^5,\ \ldots,\ \sin x,\ \tan x,\ \frac{e^x - e^{-x}}{2},\ x^2 \sin x. \]
Some examples of even functions are
\[ 17,\ x^2,\ x^4,\ \ldots,\ \cos x,\ \frac{e^x + e^{-x}}{2},\ x^3 \sin x. \]
Some examples of functions that are neither even nor odd are
\[ x + x^2,\ e^x,\ 1 + \sin x. \]
Example 3.19
(1) Since $\sin x$ is odd, $\int_{-a}^{a} \sin x\,dx = 0$.
Of course we can confirm this by integrating:
\[ \int_{-a}^{a} \sin x\,dx = \Bigl[-\cos x\Bigr]_{-a}^{a} = -\cos a + \cos(-a) = 0. \]
The diagram illustrates what is happening in the case that $0 < a < \pi$. The integral
of $\sin x$ from $-a$ to $0$ is the negative of the area marked $A$ in the diagram, since the
graph is below the axis here. By symmetry the area $A$ equals the area $B$, and the
integral from $-a$ to $a$ is $-A + B = 0$.

[Figure: graph of $y = \sin x$ for $-\pi \le x \le \pi$, with the area $A$ between the graph and the axis on $[-a, 0]$ and the area $B$ on $[0, a]$.]
Note that the integral of $-x^2 + 4$ from $0$ to $3$ is the difference $A - B$, where $A$ and
$B$ are the areas shown in the diagram. This is because $B = -\int_2^3 (-x^2 + 4)\,dx$, the
graph being below the axis on $[2, 3]$. That $\int_{-3}^{0} (-x^2 + 4)\,dx = \int_0^3 (-x^2 + 4)\,dx$ is clear
by symmetry.

[Figure: graph of $y = 4 - x^2$ for $-3 \le x \le 3$, showing the area $A$ above the axis and the area $B$ below the axis.]
4. Find the law connecting time $t$ and distance $s$ when time and velocity $v = ds/dt$
are related by
\[ v = 2 - \frac{1}{t^2}, \]
and $s = 3$ when $t = 1$.
5. Find the area bounded by the parabola $y = 3 + 4x + 3x^2$, the x-axis, and
(i) the lines x = 1 and x = 2
(ii) the lines x = 1 and x = 5
7. Find the average values of the following functions over the indicated intervals:
(i) $y = 6x^2 + 4x + 3$ for $-2 \le x \le 2$.
(ii) $y = 5 \sin \frac{x}{2}$ for $0 \le x \le 2\pi$.
8. Find the area under the curve $y = xe^{x^2}$ for $0 \le x \le 2$. Find the average value
of y over the same interval. What is the relationship between the area and the
average value?
In this section, we shall consider some ways in which the notion of a definite
integral can be extended. Specifically, we shall see how in some circumstances one
can define the integral of f over an infinite interval. Such improper integrals are
analogous to the infinite series considered at the start of §3.2.
3.5.2 Infinite integrals
It is frequently important to consider integrals for which the domain is, at least
theoretically, the whole real number system. Such integrals arise, for example, in
mathematical statistics. So let us start by investigating a simple statistical
example.
Probability density functions
If a coin is tossed three times one may consider a variety of different types of outcomes.
For instance, one might be interested in the number of times a head was displayed.
Then there are four possible outcomes: 0 heads, 1 head, 2 heads or 3 heads. If we
let P (x) be the probability of there being x heads, then
\[ P(0) = \tfrac18, \quad P(1) = \tfrac38, \quad P(2) = \tfrac38, \quad P(3) = \tfrac18. \]
3.5 Extending integration 101
A function such as P is called a probability density function. The two crucial prop-
erties of P are that P (x) ≥ 0 for all x, and the sum of P (x) over all possible values
of x must be 1.
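The four probabilities and the two crucial properties can be reproduced in a few lines. This sketch assumes a fair coin, so that each of the $2^3 = 8$ sequences of tosses is equally likely and P(x) is the number of sequences with x heads divided by 8.

```python
from fractions import Fraction
from math import comb

n = 3  # number of tosses of a fair coin

# P[x] = (number of sequences with exactly x heads) / (total number of sequences)
P = {x: Fraction(comb(n, x), 2**n) for x in range(n + 1)}

for x, p in P.items():
    print(x, p)

print("sum of P(x):", sum(P.values()))
```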
In the coin tossing example the range of possible values of the random variable
x was the finite set of numbers {0, 1, 2, 3}. But it is important to be able to use
statistical methods to analyse more complex situations. Specifically, we need to
consider random variables for which the range of possible values is an interval [a, b].
For example, imagine a bee that flies from its hive in search of pollen. Let d(t)
denote the distance that the bee is from its hive at time t, and let x denote the
maximum value that d(t) achieves on a given day. Then x will certainly lie in some
interval [0, D] (where D is the maximum distance the bee can ever travel from its
hive). It seems intuitively reasonable to expect that x is less likely to be very close
to D or very close to 0 than it is to be somewhere near the middle of the interval.
Whatever way, there ought to be some probability density function that describes
the relative likelihoods of the various possibilities.
Notice that the probability that x will take any one given value k in [0, D] is
infinitesimal. Real numbers correspond to infinite decimal expansions, and as such
they can never be measured exactly. (Which is more likely: that x = 1.03, or that
x = 1.030001, or that x = 1.0300000276?) It does not really make sense to ask about
the probability that x = k; what you should really ask about is the probability that
x lies in some range such as k − 0.001 < x < k + 0.001. The probability density
function f is defined by the requirement that
\[ \operatorname{Prob}(h \le x \le k) = \int_h^k f(x)\,dx \]
for all h and k in $[x_{\min}, x_{\max}]$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum
possible values for x. (In our bee example, $x_{\min} = 0$ and $x_{\max} = D$.)
In the bee example, the graph of the probability density function on the interval
[0, D] might look something like that shown in the diagram below (in which D = 3).
The important features are that $f(x) \ge 0$ for all x in [0, D], and $\int_0^D f(x)\,dx = 1$.
These two properties ensure that $0 \le \int_h^k f(x)\,dx \le 1$ whenever $0 \le h \le k \le D$.
This is what we want, since $\int_h^k f(x)\,dx$ is supposed to be the probability that x lies
between h and k, and probabilities have to lie between 0 and 1. Since [0, D] is the
entire range of possible outcomes, the probability that x lies in [0, D] is 1; hence our
requirement that $\int_0^D f(x)\,dx = 1$.
[Figure: graph of a possible probability density function y = f(x) on the interval [0, 3].]
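To make these properties concrete, here is a minimal sketch with a hypothetical density on [0, 3]. The formula $f(x) = \tfrac29 x(3 - x)$ is our own assumption, chosen only because it is nonnegative, peaks in the middle of the interval and integrates to 1; it is not claimed to be the curve in the diagram. Probabilities are approximated with a midpoint rule.

```python
def f(x):
    """Hypothetical p.d.f. on [0, 3] (an illustrative assumption)."""
    return (2 / 9) * x * (3 - x) if 0 <= x <= 3 else 0.0

def prob(h, k, n=100_000):
    """Approximate Prob(h <= x <= k) as the integral of f from h to k."""
    width = (k - h) / n
    return width * sum(f(h + (i + 0.5) * width) for i in range(n))

total = prob(0, 3)        # should be very close to 1
middle = prob(1, 2)       # x somewhere near the middle of the interval
near_zero = prob(0, 0.5)  # x very close to 0

print(total, middle, near_zero)
```

With this choice the middle third of the interval carries a probability of 13/27, much more than the strip next to 0, which matches the intuition described for the bee.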
There are many instances when a random variable x could conceivably take any
nonnegative real value. In this case the domain of the probability density function
has to be the infinite interval [0, ∞) (the set of all nonnegative real numbers), and
the p.d.f. would have to have the property that
\[ \int_0^\infty f(x)\,dx = 1. \]
We have not yet defined what is meant by such an expression as this. But by
analogy with the case of infinite series, it is natural to define
\[ \int_a^\infty f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx. \]
If $\int_a^b f(t)\,dt$ approaches a finite limit L as $b \to \infty$ then we define $\int_a^\infty f(t)\,dt = L$.
Similarly, we define
\[ \int_{-\infty}^a f(t)\,dt = \lim_{b \to -\infty} \int_b^a f(t)\,dt, \]
provided this limit exists, and
\[ \int_{-\infty}^\infty f(t)\,dt = \int_{-\infty}^0 f(t)\,dt + \int_0^\infty f(t)\,dt, \]
provided both exist.
Example 3.21
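\[ \int_2^b e^{-3x}\,dx = \left[-\tfrac13 e^{-3x}\right]_2^b = -\tfrac13\left[e^{-3b} - e^{-6}\right] = \frac{1}{3e^6} - \frac{1}{3e^{3b}}. \]
Now since $\dfrac{1}{3e^{3b}} \to 0$ as $b \to \infty$, it follows that
\[ \int_2^\infty e^{-3x}\,dx = \frac{1}{3e^6}. \]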
Example 3.22
\[ \int_0^b \frac{1}{2x+5}\,dx = \left[\tfrac12 \ln(2x+5)\right]_0^b = \tfrac12 \ln(2b+5) - \tfrac12 \ln(5). \]
Observe that $\tfrac12 \ln(2b+5) \to \infty$ as $b \to \infty$; so $\int_0^\infty \frac{1}{2x+5}\,dx$ is not defined.
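The contrast between Examples 3.21 and 3.22 shows up clearly if the two finite integrals are tabulated for increasing b. This is a rough sketch that simply evaluates the closed forms found above; the function names are our own.

```python
import math

def example_3_21(b):
    """Integral of e^(-3x) from 2 to b, from the antiderivative in Example 3.21."""
    return math.exp(-6) / 3 - math.exp(-3 * b) / 3

def example_3_22(b):
    """Integral of 1/(2x + 5) from 0 to b, from the antiderivative in Example 3.22."""
    return 0.5 * math.log(2 * b + 5) - 0.5 * math.log(5)

limit_3_21 = math.exp(-6) / 3  # the value 1/(3e^6)

for b in (10, 100, 10_000, 1_000_000):
    print(b, example_3_21(b), example_3_22(b))
```

The first column settles at 1/(3e^6) almost immediately, while the second keeps growing (slowly, like a logarithm) without approaching any finite limit.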
Example 3.23
It can be shown that the improper integral
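\[ \Gamma(x) \stackrel{\text{def}}{=} \int_0^\infty t^{x-1} e^{-t}\,dt \tag{1} \]
exists for all real numbers x ≥ 1. This is easy to check in the case x = 1, since
\[ \Gamma(1) = \lim_{T \to \infty} \int_0^T e^{-t}\,dt = \lim_{T \to \infty} \left(1 - e^{-T}\right) = 1 \]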
(as we have already seen in the context of exponential probability density functions).
The function Γ(x) defined by Eq. (1) is called the Gamma function, and is important
in statistics and applied mathematics.
The following calculation establishes a surprising fact about the values that the
Gamma function takes for positive integer values of x. Observe first that since
\[ \frac{d}{dt}\left(t^x e^{-t}\right) = x t^{x-1} e^{-t} - t^x e^{-t}, \]
it follows that
\[ \int_0^T x t^{x-1} e^{-t}\,dt - \int_0^T t^x e^{-t}\,dt = \Bigl[t^x e^{-t}\Bigr]_0^T = T^x e^{-T} \]
since $0^x = 0$ (given that x > 0). Taking limits as $T \to \infty$ we deduce that
\[ x\Gamma(x) - \Gamma(x+1) = \lim_{T \to \infty} T^x e^{-T} = 0. \]
(We omit the proof that this limit really is 0, but the student is invited to use a
calculator to work out the values of $T^x e^{-T}$ for (say) x = 5 and large values of T.)
Thus we see that Γ(x + 1) = xΓ(x) for all x ≥ 1.
In particular, Γ(4) = 3Γ(3) = 3 × 2Γ(2) = 3 × 2 × 1Γ(1) = 3 × 2 × 1. More generally, we
see that
\[ \Gamma(n+1) = n(n-1)(n-2) \cdots 3 \times 2 \times 1 = n! \]
for all nonnegative integers n.
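These facts about the Gamma function are easy to test with a computer in place of a calculator. The sketch below uses the `math.gamma` function from Python's standard library; the helper names `recurrence_gap` and `boundary_term` are our own.

```python
import math

# Gamma(n + 1) compared with n! for small nonnegative integers n.
for n in range(6):
    print(n, math.gamma(n + 1), math.factorial(n))

def recurrence_gap(x):
    """Gamma(x + 1) - x * Gamma(x), which should be 0 up to rounding error."""
    return math.gamma(x + 1) - x * math.gamma(x)

def boundary_term(x, T):
    """The term T^x e^(-T), whose limit as T grows is stated to be 0."""
    return T**x * math.exp(-T)

print(recurrence_gap(2.5))
for T in (10, 20, 50, 100):
    print(T, boundary_term(5, T))
```

For x = 5 the boundary term is still larger than 1 at T = 10 but has dropped below $10^{-10}$ by T = 50, which is consistent with the limit being 0.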
2. Simplifying fractions
To simplify fractions, you may multiply numerator and denominator by the same
expression. You may also divide numerator and denominator by the same expression.
Don’t try to cancel terms any other way than this! Some examples:
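\[ \frac{x^2 + xy}{y^2 + xy} = \frac{x(x + y)}{y(x + y)} = \frac{x}{y} \]
\[ \frac{1 + \frac{1}{x}}{2 + \frac{3}{x}} = \frac{x\left(1 + \frac{1}{x}\right)}{x\left(2 + \frac{3}{x}\right)} = \frac{x + 1}{2x + 3} \]
Cannot cancel like this:
\[ \frac{3x + 1}{3x + 4} \ne \frac{x + 1}{x + 4}. \]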
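A quick way to test a proposed cancellation is to substitute numbers into both sides. The sketch below does this with exact fractions; the sample values x = 1 and (x, y) = (2, 5) are arbitrary choices of ours.

```python
from fractions import Fraction

def original(x):
    """The fraction (3x + 1) / (3x + 4)."""
    return Fraction(3 * x + 1, 3 * x + 4)

def wrongly_cancelled(x):
    """The result of illegally 'cancelling' the 3s: (x + 1) / (x + 4)."""
    return Fraction(x + 1, x + 4)

def first_example(x, y):
    """The fraction (x^2 + xy) / (y^2 + xy), which should simplify to x / y."""
    return Fraction(x * x + x * y, y * y + x * y)

print(original(1), wrongly_cancelled(1))     # different values: invalid step
print(first_example(2, 5), Fraction(2, 5))   # equal values: valid step
```

A single value of x at which the two sides disagree is enough to show that a cancellation is invalid; agreement at one point does not prove a simplification, but it is a useful sanity check.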
3. Manipulating powers
Powers can be distributed over multiplication and division: for all indices n,
\[ (xy)^n = x^n y^n \quad \text{and} \quad \left(\frac{x}{y}\right)^n = \frac{x^n}{y^n}. \]
Appendix 2 — Geometry
We revise the properties of some geometric shapes. First, we revise four-sided
polygons, or quadrilaterals.
• A square is a quadrilateral with four sides, all the same length, and four right angles.
• A rectangle is a quadrilateral with four right angles. Opposite sides have the same length.
• A parallelogram is a quadrilateral in which opposite sides are parallel, and have the same length.
• Area of a triangle: Area = $\tfrac12 bh$, or Area = $\tfrac12 ab \sin\theta$. See also the sine and cosine rules, in trigonometry.
• A circle is the set of all points in 2D space a fixed distance r from a point O. Area = $\pi r^2$. Circumference = $2\pi r$.
• A sector is the part of a circle swept out by radii traversing an angle θ at the centre. Area of sector = $\tfrac12 r^2 \theta$. Length of arc = $r\theta$.
• A sphere is the set of all points in 3D space a fixed distance r from a point O. Volume = $\tfrac43 \pi r^3$. Surface area = $4\pi r^2$.
• A cylinder is the solid obtained by extruding a base of area A along a perpendicular distance h. Volume = $Ah$.
• A cone is the solid obtained by shrinking a base of area A to a point along a perpendicular distance h. Volume = $\tfrac13 Ah$.
Appendix 3 — Trigonometry
You will need to recall various facts about trigonometry from high school. This
appendix is intended to be a helpful reminder of the most important ideas, but it is
not at all comprehensive.
All angles are measured in radians, which are related to degrees by the equation
\[ \pi \text{ radians} = 180^\circ. \]
Angles in radian measure are generally written as pure numbers, without any units.
By convention, angles in the (x, y)-plane are measured as the anticlockwise angle
of rotation from the positive x-axis to a particular ray. Some common angles are
marked on the following diagram.
[Figure: common angles marked around a circle: 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, 7π/4.]
The sine and cosine are defined as follows: given an angle θ, let P (x, y) be the
point lying on the circle of radius 1 centred at the origin O such that the angle from
the positive x-axis to the ray OP (measured anticlockwise) is θ. Then we define
cos θ = x and sin θ = y.
In the special case where $0 < \theta < \frac{\pi}{2}$, the sine and cosine of an angle can also be
expressed in terms of the corresponding right-angled triangle:
\[ \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}} \quad \text{and} \quad \sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}. \]
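[Figure: the point P(x, y) on the unit circle at angle θ, with cos θ and sin θ marked, and a right-angled triangle with angle θ and sides labelled adjacent, opposite and hypotenuse.]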
From the definition of sin and cos in terms of the unit circle, it is easy to see the
following symmetry properties:
sin(−θ) = − sin θ and cos(−θ) = cos θ.
In other words, sin is an odd function and cos is an even function.
[Figures: graphs of y = sin x, y = cos x and y = tan x for −2π ≤ x ≤ 2π.]
Very often we need the addition laws for sin and cos:
\[ \cos^2 x = \tfrac12 (1 + \cos 2x) \quad \text{and} \quad \sin^2 x = \tfrac12 (1 - \cos 2x). \]
More generally, the addition laws can be rearranged to yield the so-called “products
to sums” formulae:
\[ \sin x \cos y = \tfrac12 \sin(x - y) + \tfrac12 \sin(x + y) \]
\[ \cos x \cos y = \tfrac12 \cos(x - y) + \tfrac12 \cos(x + y) \]
\[ \sin x \sin y = \tfrac12 \cos(x - y) - \tfrac12 \cos(x + y) \]
Finally, the t-formulae provide a parametrisation of sin and cos entirely in terms of
rational functions. They say that, if $t = \tan\frac{x}{2}$, then
\[ \sin x = \frac{2t}{1 + t^2} \quad \text{and} \quad \cos x = \frac{1 - t^2}{1 + t^2}. \]
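Identities like these are easy to misremember, so it can help to spot-check them numerically. The sketch below evaluates both sides of the products-to-sums formulae and the t-formulae at arbitrary test angles (x = 0.7 and y = 1.9 radians are our own choices).

```python
import math

x, y = 0.7, 1.9  # arbitrary test angles in radians (illustrative values)

# Products-to-sums: each pair holds (left-hand side, right-hand side).
checks = {
    "sin x cos y": (math.sin(x) * math.cos(y),
                    0.5 * math.sin(x - y) + 0.5 * math.sin(x + y)),
    "cos x cos y": (math.cos(x) * math.cos(y),
                    0.5 * math.cos(x - y) + 0.5 * math.cos(x + y)),
    "sin x sin y": (math.sin(x) * math.sin(y),
                    0.5 * math.cos(x - y) - 0.5 * math.cos(x + y)),
}
for name, (lhs, rhs) in checks.items():
    print(name, lhs, rhs)

# t-formulae with t = tan(x/2).
t = math.tan(x / 2)
sin_from_t = 2 * t / (1 + t**2)
cos_from_t = (1 - t**2) / (1 + t**2)
print(sin_from_t, math.sin(x))
print(cos_from_t, math.cos(x))
```

Agreement at a few test angles is not a proof, but a mismatch immediately reveals a sign error or a swapped term.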
The sine and cosine rules relate the sides and angles in any triangle.
[Figure: triangle with vertices A, B, C and opposite sides a, b, c.]
If the triangle is labelled as above, then
\[ \frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} \]
and $c^2 = a^2 + b^2 - 2ab \cos C$.
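As an illustration of how the two rules work together, the following sketch solves a triangle from two sides and the included angle. The values a = 3, b = 4 and C = π/2 are hypothetical, chosen so that the answer (a 3-4-5 right-angled triangle) is easy to recognise.

```python
import math

# Hypothetical triangle: two sides and the angle between them.
a, b = 3.0, 4.0
C = math.pi / 2

# Cosine rule gives the third side.
c = math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(C))

# Sine rule gives angle A (A is acute here, so asin returns the right angle).
A = math.asin(a * math.sin(C) / c)

# The angles of a triangle add up to pi radians.
B = math.pi - A - C

print(c, A, B)
print(a / math.sin(A), b / math.sin(B), c / math.sin(C))
```

The three ratios printed on the last line should coincide, as the sine rule requires. Note that asin only returns angles between −π/2 and π/2, so this approach needs care when the unknown angle might be obtuse.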
Appendix 4 — Exponents and logarithms
1. Index laws
• $x^0 = 1$
• $x^a \times x^b = x^{a+b}$
• $x^a / x^b = x^{a-b}$
• $(x^a)^b = x^{ab}$
2. Definition of logarithm
• Let a be a fixed positive number. For all real numbers x, y with x > 0
In particular,
3. Logarithm laws
• log 1 = 0
There is a positive number called e, having the value e = 2.71828 . . . , such that
$\frac{d}{dx} e^x = e^x$. In fact, e is the only such number. Because of this property, exponential
functions and logarithms with the base e are called “natural exponentials” and
“natural logarithms”. We use the special notation $\ln x$ ($= \log_e x$) for the natural
logarithm. In particular,
• $\ln x = y$ if and only if $x = e^y$.
Appendix 5 — Differentiation
• $\frac{d}{dx}(x^n) = nx^{n-1}$
• $\frac{d}{dx}(e^x) = e^x$
• $\frac{d}{dx}(\ln x) = \frac{1}{x}$
• $\frac{d}{dx}(\sin x) = \cos x$
• $\frac{d}{dx}(\cos x) = -\sin x$
• $\frac{d}{dx}(\tan x) = \sec^2 x$
• $\frac{d}{dx}(k f(x)) = k \frac{d}{dx}(f(x))$ (k a constant)
• $\frac{d}{dx}(f(x) + g(x)) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$
\[ \frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx} \quad \text{(product rule)} \]
and
\[ \frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2} \quad \text{(quotient rule)} \]
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} \quad \text{(chain rule)} \]
(Observe that if we write u = f(x) and y = h(x) then the equation h(x) = g(f(x))
can be reformulated as y = g(u), and $g'(f(x)) f'(x)$ can be reformulated as $\frac{dy}{du}\frac{du}{dx}$. So
the two versions of the chain rule are equivalent.)
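The product, quotient and chain rules can be sanity-checked against a numerical derivative. This is a rough sketch, assuming the illustrative choices $u = x^3$, $v = \sin x$ and the test point x = 1.3 (all our own), with a central-difference approximation standing in for the exact derivative.

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.3  # arbitrary test point (illustrative)

# u = x^3 and v = sin x, together with their known derivatives.
u, du = (lambda x: x**3), (lambda x: 3 * x**2)
v, dv = math.sin, math.cos

# Values predicted by the three rules.
product_rule = u(x0) * dv(x0) + v(x0) * du(x0)
quotient_rule = (v(x0) * du(x0) - u(x0) * dv(x0)) / v(x0) ** 2
chain_rule = math.cos(u(x0)) * du(x0)  # y = sin(u) with u = x^3

# Numerical derivatives of the same combined functions.
product_numeric = derivative(lambda x: u(x) * v(x), x0)
quotient_numeric = derivative(lambda x: u(x) / v(x), x0)
chain_numeric = derivative(lambda x: math.sin(u(x)), x0)

print(product_rule, product_numeric)
print(quotient_rule, quotient_numeric)
print(chain_rule, chain_numeric)
```

Each pair of printed values should agree to several significant figures; the small remaining difference comes from the finite step size h.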
Appendix 6 — Integrals
Answers to the Exercises
(iii) $3 \sin \tfrac23 \left(x + \tfrac{3\pi}{4}\right)$ or $3 \cos \tfrac23 x$
2. (i) to (iv) [Graphs for −π ≤ x ≤ π omitted.]
3. (i) to (iii) [Graphs for −2 ≤ x ≤ 2 omitted.]
√
2. × 10
3. (i) $y = 3x^{1/2}$ (ii) $y = 5e^{2x}$ (iii) $y = 1.3x^{0.6}$ (iv) $y = 20e^{-0.4x}$
[Graphs omitted; labelled points: (0, 0), (2, −18), (−1, −5).]
2. Inflection when x = 1.
5. (a) x<0 (b) x>0 (c) x < −1, x > 1 (d) −1 < x < 1
Inflections at x = −1, 1.
6. $d = \frac{2c}{3}$
7. (a) t=2
8. n = 30
9. x=1
11. 7 and 7
12. 4 and 4
13. 250 m²
14. 500 m²
15. $\tfrac53$ cm × $\tfrac{35}{3}$ cm × $\tfrac{14}{3}$ cm
17. (a) Maximum AGR at 5 years, when the tree is 30 metres high.
(b) RGR = $\dfrac{e^{(5-t)}}{1 + e^{(5-t)}}$
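3. (i) $\frac{\partial z}{\partial x} = 3x^2 + 4xy$, $\frac{\partial z}{\partial y} = 2x^2 + 2y$
(ii) $\frac{\partial z}{\partial x} = \ln y + ye^x$, $\frac{\partial z}{\partial y} = \frac{x}{y} + e^x$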
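(iii) $f_x = \frac{2x}{x^2 + y^2}$, $f_y = \frac{2y}{x^2 + y^2}$, $f_{xx} = \frac{2(y^2 - x^2)}{(x^2 + y^2)^2}$, $f_{xy} = \frac{-4xy}{(x^2 + y^2)^2}$,
$f_{yy} = \frac{2(x^2 - y^2)}{(x^2 + y^2)^2}$
6. $\frac{\partial z}{\partial x} = \frac{-5x}{4\sqrt{16 - x^2}} = \frac{-5}{4\sqrt{3}}$ when x = 2.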
7. zx = −50, zy = 90.
Since zx < 0, demand for butter (z) decreases as the price of butter (x) increases,
and since zy > 0 demand for butter increases as the price of margarine (y)
increases.
8. (i) $\frac{2x}{x^2 + y^2}$, (ii) $\frac{-4xy}{(x^2 + y^2)^2}$
2. (i) 100 (ii) 840 (iii) 9841 (iv) $2\tfrac{80}{81}$
3. (i) $\frac12 - \frac{1}{21} = \frac{19}{42}$ (ii) $b_{n+1} - b_0$
2. (ii) $\int_0^3 50e^{0.06t}\,dt \approx 164$
3. (ii) $\int_0^4 (4t - t^2)\,dt = \frac{32}{3}$
4. 3
(5) $\frac{\sin^3 t}{3} + c$ (6) $\sqrt{5 + x^2} + c$
(7) $-3\sqrt{1 - r^2} + c$ (8) $\frac12 (1 + 2t^3)^{1/3} + c$
(9) $\frac{2}{15} (2 + 5y)^{3/2} + c$ (10) $\frac29 (x^3 + 4)^{3/2} + c$
(11) $\frac12 \sqrt{2y^2 + 1} + c$ (12) $\frac34 (z^2 + 2z + 2)^{2/3} + c$
(23) $\frac13 e^{x^3} + c$ (24) $\frac12 (\ln x)^2 + c$
2. $V(t) = -\frac{e^{-Kt}}{K} + C$
3. Amount of drug = $-6t^{1/2} + 12$. Patient will be drug free after 4 hours.
4. $s = 2t + \frac{1}{t}$
5. (i ) 16 (ii ) 184
7. (i) 11 (ii) $\frac{10}{\pi}$
2. (i ) 1 (ii ) 1