Paul G.bamberg-Convexity and Optimization With Applications
Paul G.bamberg-Convexity and Optimization With Applications
Paul G.bamberg-Convexity and Optimization With Applications
WITH A PPLICATIONS
Paul G. Bamberg
Harvard University
Convexity and Optimization
WITH A PPLICATIONS
Paul G. Bamberg
Copyright
2008
c
Paul G. Bamberg
Harvard University
Cambridge, MA 02138
This text is based on lecture notes by Paul G. Bamberg written for M ATH 116: Convexity and Optimiza-
tion with Applications, a course offered at Harvard University in Fall 2008. The notes were meant to
complement Optimization by Vector Space Methods by David Luenberger (Wiley Interscience, 1969 [1997]).
Front cover: The image was generated by the following M ATHEMATICA code:
GraphicsGrid[
Table[ReliefPlot[
Table[Evaluate[
Sum[RiemannSiegelZ[RandomReal[3, 2].{x, y}], {3}]], {x, 0,
10, .2}, {y, 0, 10, .2}],
ColorFunction -> ColorData["BlueGreenYellow"],
Frame -> False], {3}, {3}]]
C ONTENTS
3 Banach Spaces 41
3.1 lp Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Lebesgue Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Lp Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 Cauchy Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Compactness and Extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6 Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.7 Denseness and Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4 Hilbert Space 59
4.1 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 The Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Orthogonal Complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.4 The Gram-Schmidt Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3
4 U CONTENTS
Bibliography 130
C HAPTER 1
GENERALIZING FROM TWO DIMENSIONS
1.1 Introduction
The general approach in [1], which has made the book a classic, is this:
• Using geometry for inspiration, generalize the solution, typically to infinite-dimensional vector
spaces and non-Euclidean norms, and prove (algebraically) that it is still valid.
All the finite-dimensional problems in this chapter should be familiar, though they may be valuable re-
view for some students. The infinite-dimensional problems are just stated, not solved, and we will take
quite a while to get to them.
In this chapter, we will hold off on defining some important concepts, which for now are just in SMALL
CAPS . These concepts will appear in Chapters 2, 3, and 5 of [1], generally in a context where there is no
mention of optimization, just some challenging mathematics. You will need to learn them before you
tackle optimization. In the process, you will acquire a good background in real analysis and in the branch
of mathematics called “functional analysis”, the theory of normed infinite-dimensional vector spaces.
My hope for these introductory notes is to convince you that optimization problems are fun and rele-
vant, that some of the best ones can only be formulated in infinite-dimensional vector spaces, and that it
is worth your while to learn quite a few new definitions and theorems in order to be able to solve them.
T HEOREM 1.2.1 (extreme value theorem). If f is a continuous real function on a compact metric space X,
M suppPX f ppq, and m inf pPX f ppq, then there exist points q, r P X such that f pqq M and f prq m.
5
6 U 1. G ENERALIZING FROM T WO D IMENSIONS
Example 1.2.2. You are entering a student competition to draw up a business plan for a company with
m scientists and n other employees. Entries with m2 ¡ 2n2 get rejected. You want to have the highest
possible ratio of scientists to other employees. Does this optimization problem have a solution?
Solution (based on [2, Example 1.1, p. 2]): We want to solve max m{n such that m2 ¤ 2n2 for m, n P N.
Equivalently, we want to find the largest rational number p m{n such that p2 2. We cannot apply
Theorem 1.2.1 because the function we are optimizing (our OBJECTIVE FUNCTION) is rational-, not real-
valued. This alone is not enough to rule out the existence of an optimal solution, but we can show that, in
fact, an optimal solution does not exist.
Assume that p is this largest rational number such that p2 2. Define
p pp 22 .
2
q (1.2.1)
Then
2pp2 2q
q2 2
pp 2 q2 . (1.2.2)
Since p2 2, (1.2.2) shows that q 2 2. However, (1.2.1) shows that q ¡ p, which contradicts our initial
assumption that p is the largest rational number such that p2 2. Therefore this optimization problem
has no solution. q
Of course, note that as p2 approaches 2, m and n would become larger and larger. Since the number of
people is finite, we should impose constraints m ¤ mmax and n ¤ nmax , in which case there would be a
solution.
Example 1.2.3. As director of the state lottery, you are designing scratch tickets by assigning probabilities
to the possible payoffs from 0 through $4. Since a ticket sells for $5, you want to be as generous as possible.
Does this optimization problem in R5 have a solution? Change the problem so that any nonnegative
integer payoff is allowed. Does this optimization problem in an infinite-dimensional vector space have a
solution?
Solution: We define “as generous as possible” as “maximizing the expected value of the ticket”. There-
°
fore, in the first problem, we want to solve max 4k0 kpk where pk is the probability of receiving a payoff
°
of $k, such that 0 ¤ pk ¤ 1 for all k 0, . . . , 4 and 4k0 pk 1. Our objective function is continuous
and real-valued, and the constraints on tpk u4k0 define a compact set, so Theorem 1.2.1 guarantees the
existence of a solution. We simply set pk 1 for k 4 and pk 0 for k 4.
°
For the second problem, we want to solve max kPZ kpk , such that 0 ¤ pk ¤ 1 for all k P Z and
°
kPZ pk 1. The notion of compactness is tricky in infinite-dimensional vector spaces so we cannot
apply Theorem 1.2.1 to this problem, but we can show that, in fact, an optimal solution does not exist.
Assume that we have some solution pk πk for all k P Z . We can always increase the expected value of
the ticket and still satisfy our constraints by setting p0 0 and pk πk1 for k P N since
8̧ 8̧ 8̧
kπk1 pk 1qπk ¥ kπk .
k 1
k 0
k 0
Example 1.3.1. Your small bakery can produce only two products: frosted cookies and cakes. A batch
of frosted cookies uses up 1 pound of flour and 3 pounds of sugar. A batch of cake uses up 2 pounds
of flour and 1 pound of sugar. Each day your suppliers bring you 14 pounds of flour and 17 pounds of
sugar. Your optimization problem is to look at the market price of cookies and cakes and decide what to
produce.
Solution: Let x be the number of batches of cookies, y be the number of batches of cake, p1 be the market
price of a batch of cookies, and p2 be the market price of a batch of cake. Then we want to solve:
max p1 x p2 y (revenueq
x,y
Figure 1.3.1 shows the possible production schemes. Observe that if v and w represent possible produc-
tion schemes, then αv p1 αqw with 0 ¤ α ¤ 1, is also possible. This is the definition of a CONVEX SET.
To show this, let
A ,x ,b
1 2 x 14
3 1 y 17
so that we can write the constraints as Ax ¤ b. Then if Av ¤ b and Aw ¤ b, αAv ¤ αb and p1 αqAw ¤
8 U 1. G ENERALIZING FROM T WO D IMENSIONS
• p1 1, p2 7. Again, we see from Figure 1.3.1 that the optimal solution is x 0, y 7. Revenue is
$1 0 $7 7 $47.
• p1 6, p2 2. Notice in Figure 1.3.1 that the revenue function overlaps with the sugar constraint.
Therefore, we maximize revenue by choosing any point along this constraint. Revenue is $6 17 3
$2 0 $34. q
The revenue function is an example of a ( LINEAR ) FUNCTIONAL on R2 that we are trying to optimize.
It is an element of the DUAL SPACE. The straight line that we “slide” to solve the problem is an example
of a HYPERPLANE. The fact that this approach works for any convex set is a simple consequence of the
H AHN -B ANACH T HEOREM.
What we have done is to calculate, for any functional in the dual space, the largest value that this
function can achieve subject to the constraint imposed by our budget. This functional on the dual space
is called the SUPPORT FUNCIONAL.
Example 1.3.2. What is the support functional for the (closed) unit disk D̄1 ? Can you reconstruct the unit
disk (or any other convex set) from its support functional?
Figure 1.3.2: As shown in Example 1.3.2, on the left we see that the envelope of all functionals for which
? the closed
the support functional returns a constant value (i. e. , 1) is the boundary of our convex set,
unit disk. On the right we see that the contour plots of the support functional cpm, nq m2 n2 are
concentric circles centered at the origin; the contour cpm, nq 1 is the boundary of our convex set.
Solution: Let the functionals be given by mx ny. The support functional is then
(
cpm, nq inf c : mx ny ¤ c for all px, yq P D̄2
1.4. Finite- vs. Infinite-Dimensional Vector Spaces U 9
(D̄2 is the closed unit disk). We can see from the left side of Figure 1.3.2 that cpm, nq c0 , where mx0
ny0 c0 is the tangent to the unit disk at the point px0 , y0 q. The slope of the tangent is m{n, so the
slope of the normal is n{m? ?
and the angle between the normal and the positive x-axis is θ? tan1 pn{mq.
Therefore, x0 cos θ m{ m2 n2 and y0 sin θ n{ m2 n2 . Then cpm, nq c0 m2 n2 .
Observe that the value of the support functional evaluated for a given functional was just the given
functional evaluated at some point on the boundary of the convex set. Therefore, intuitively, it seems
that we can reconstruct the convex set from the support functional. We can see this in two ways. First,
the boundary of the convex set can be given by a contour of the support functional, as shown in Figure
1.3.2. Equivalently, the boundary of the convex set is the envelope of all functionals for which the support
functional returns some constant value, as shown in Figure 1.3.2. q
Example 1.2.3 illustrated some of the added complexities of moving from finite- to infinite-dimensional
vector spaces. The following examples further explore the differences between solving optimization prob-
lems in finite- versus infinite-dimensional vector spaces.
Example 1.4.1 (infinite-dimensional: equivalent to [1, Example 8.7.3, p. 231–234]). You are playing a real-
time strategy computer game in which you have to build up a civilization from scratch. The first phase of
the game, the “Age of Agriculture”, lasts from time t 0 to time t T . In this phase, you have farms that
produce at a rate f ptq. Your farms disappear when you move to the next age, the “Age of Mining”. What
makes this game interesting is that you can allocate production between reinvestment rptq and storage
sptq. Your farm production increases at a rate proportional to your reinvestment:
Solution: Note that if we set sptq 0, then f ptq rptq and f9ptq krptq kf ptq, so f ptq f0 exppktq. In
any case, we can solve (1.4.1) with the boundary condition f p0q f0 :
»t
f pt q krpτ q dτ f0 .
0
If functions r1 ptq and r2 ptq satisfy these inequalities, so does αr1 ptq p1 αqr2 ptq, where 0 ¤ α ¤ 1. The
set of acceptable solutions to the problem is a CONVEX SET in an INFINITE DIMENSIONAL vector space and
10 U 1. G ENERALIZING FROM T WO D IMENSIONS
Example 1.4.2 (finite-dimensional). You are growing a new genetically engineered crop as part of a 2-year
biofuels experiment. Your contract requires you to deliver 5 tons at the end of each year. The government
?
pays all costs except the cost of fencing your plot of land, so the cost of producing x tons can be modeled
as cpxq 6 x. The inventory cost of storing your excess crop is hx.
Solution: Let yi be the number of tons grown in year i for i 1, 2. We have the constraints y1 ¥ 5 and
y1 y2 ¥ 10 since we must grow at least 5 tons in order to be able to deliver the required amount by the
end of year 1, and we must grow at least 10 tons in order to be ale to deliver the required amount by the
end of year 2. The region corresponding to possible production schemes is shown in Figure 1.4.1. Clearly,
1.4. Finite- vs. Infinite-Dimensional Vector Spaces U 11
? ?
our cost function 6 y1 6 y2 hpy1 5q is strictly increasing in both y1 and y2 within the feasible region,
so the minimum cost will be found somewhere along the line y1 y2 10 between y1 5 and y1 10.
We substitute y2 10 y1 into the cost function: now we want to minimize
?
6 y1
a
6 10 y1 hpy 1 5 q
which is strictly positive for y1 P p5, 10s and zero for y1 5. Therefore, we reach a corner solution by
?
setting py1 , y2 q p5, 5q, at which the minimum cost is 12 5 26.83. q
Example 1.4.3 (infinite-dimensional: equivalent to [1, Example 1.2.2, p. 3]). Your new contract requires
to deliver at a known rate dptq during the interval t P p0, T s. You produce at a rate rptq. Youra
rate of
production cost is cprptqq (if all you have to pay for is maintaining the fence, this might equal rptq. If
you have xptq tons on hand, you pay inventory costs at a rate hxptq. You start with an inventory xp0q.
Solution: First, note that x9 ptq rptq dptq. Both inventory and production must be nonnegative, so we
have the constraints
»t
xptq xp0q prpτ q dpτ qq dτ ¥ 0
0
rptq ¥ 0.
The optimal solution r ptq lies in an infinite-dimensional vector space. Note that r ptq is not necessarily
continuous: for example, we might want to produce some constant positive quantity r pτ q r for τ P
p0, ts and then produce rpτ q 0 for τ P pt, T³s. rptq is not necessarily bounded either: if h is very small,
then the optimal solution will be to produce 0 dptq dt, the total amount required over the interval p0, T s,
T
as rapidly as possible. q
See [1, Example 8.7.4, p. 234] for a related problem.
Example 1.4.4 (finite-dimensional). You are operating a simple frictionless rocket-propelled car. You ex-
pend fuel instantaneously to increase your speed to v miles/second, coast 1 mile, and use your brakes to
stop instantaneously. You then expend fuel to increase your speed to w miles/second, coast 4 miles, and
stop instantaneously. The challenge is to minimize the travel time. The constraint imposed by your fuel
tank is that v w 3.
12 U 1. G ENERALIZING FROM T WO D IMENSIONS
Solution: The total time spent is 1{v 4{w. We use a Lagrange multiplier to solve this problem:
Lpv, w, λq λ pv w 3q
1 4
v w
BL 1 λ 0
Bv v2
BL
B w w2 λ 0
4
Combining the two partial derivatives gives 4v 2 w2 . Combining this with the constraint gives 4v 2
p3 vq2, so v 1 and w 2 (we discard the solution v 3 since v, w ¥ 0 as we are discussing speed,
not velocity). The travel time is 1.5 seconds. q
If this were a physics course, we might make this problem more realistic by assuming some nonzero
coefficient of friction between the car and the surface on which it travels and some function that gives the
rate of expenditure of fuel, among many other things.
Example 1.4.5 (infinite-dimensional: [1, Example 1.2.5, p. 4]). Now your rocket-propelled vehicle goes
straight up and is subject to gravity. You expend fuel at a rate uptq. Your goal is for the vehicle to reach
height h at time T while expending minimum fuel.
Solution (full solution in [1, Example 5.9.3, p. 125]): Assuming unit mass, massless fuel, and the absence
of aerodynamic forces, the equation of the rocket is governed by the second-order differential equation
: ptq uptq g. We cannot expend negative fuel and we want to reach height h at time T , so we have the
h
constraints
uptq ¥ 0
» T » t
hpT q pupτ q gq dτ dt ¥ h,
0 0
where we assume in the second constraint that hp0q h9 p0q 0; that is, we start at ground level with zero
³T
velocity. We want to minimize the total fuel expended: 0 uptq dt.
The optimal solution u ptq is not necessarily continuous or bounded: for example, it might (and ac-
tually does) consist of an impulse at time t 0. In that case, in order to work with a function that is at
³t
least bounded, it might be a better idea to work with v ptq 0 upτ q dτ , the total amount of fuel expended
during the interval p0, ts. This insight is closely related to the R IESZ REPRESENTATION THEOREM. q
Example 1.4.6 (finite-dimensional). Your job is to invent the most exciting scratch ticket. If an outcome
has probability p of occurring, the excitement that results when that outcome occurs is proportional to
log p. For example, the excitement of rolling 6s on two dice log 361 2 log 61 is precisely twice the
excitement of rolling 6 on one die. The ticket can pay $k for k 0, 1, 2. You must assign probabilities pk
for k 0, 1, 2 to to these three outcomes. So that the state can make a nice profit, the expected payoff must
be 4{7 of a dollar.
1.4. Finite- vs. Infinite-Dimensional Vector Spaces U 13
(H pX q, the “excitement function”, is the information entropy or Shannon entropy of the random variable X,
the payoff of the lottery ticket.) We can substitute the constraints into (1.4.1) to turn this into a single-
variable problem, where we maximize
3
7
p2 log
3
7
p2
4
7
2p2 log
4
7
2p2 p2 log p2
k
Therefore, pk e1 λ1 eλ2 . Some algebra tells us λ1 log 7 1 and λ2 log 2, so pk 2k {7. q
Example 1.4.7 (infinite-dimensional). Your job is to again to invent the most exciting scratch ticket. The
ticket can have any nonnegative integer payoff. You need to assign probabilities pk for k P Z to each of
these possible outcomes. Make the expected payoff be $1, since the tickets will sell for $1.50.
Solution: Now we are trying to maximize over an infinite number of variables. Obviously we cannot use
substitution as we could in the previous example since we still have an infinite number of variables after
taking into account the two constraints
8̧ 8̧
pk 1 kpk 1.
k 0 k 0
There are two LINEAR FUNCTIONALS in the dual space acting as constraints, so the CODIMENSION of this
problem is 2.
We can still solve the problem using Lagrange multipliers:
8̧ 8̧ 2̧
Lpp0 , p1 , . . . , λ1 , λ2 q pk log pk λ1 pk 1 λ2 k pk 1
k 0
k 0 k 0
BL 1 log pk λ1 kλ2 0, ¥0
B pk k
k
Therefore, pk e1 λ1 e λ2 . We recognize this as in the form of a geometric distribution pk pp1 pqk
14 U 1. G ENERALIZING FROM T WO D IMENSIONS
(note this is a geometric distribution with support r0, 8q, not r1, 8q). Since the mean of a geometric
distribution is p1 pq{p, we have p 1{2, so λ1 log 2 1, λ2 log 2, and pk 1{2k 1 . q
Minimum norm problems arise in several instances, as shown by the following examples.
Example 1.5.1. You are visiting a friend who lives at the point px, y q p2, 1q. A bus drives through town
along the main highway, a subspace whose equation is x 2y 0.
• Where do you get off the bus in order to minimize your walking distance to the friend’s house?
• Suppose you take a taxicab, which can travel only north-south or east-west. The cab driver charges
for the driving distance. Where do you get out of the cab in order to minimize your cost?
Figure 1.5.1: minimizing the distance between the highway and the house for the various norms in Ex-
ample 1.5.1
Solution: In the first problem, we are minimizing the 2-NORM or E UCLIDEAN NORM. We can solve the
problem geometrically by expanding a circle centered at the house until it is tangent to the road, as shown
in Figure 1.5.1. The point of tangency is the solution to the system
#
x 2y 0 highway
,
y 1 2 px 2 q normal to the highway through the point p2, 1q
a
or px, y q p6{5, 3{5q. The minimum cost (assuming unit cost) is then p4{5q2 p8{5q2 4{sqrt5. Note
that we were implicitly invoking the PROJECTION THEOREM ([1], Theorem 2, p. 51) in knowing that we
minimize the distance by finding the point on the highway through which the normal through p2, 1q
passes.
1.5. Minimum Norm Problems U 15
In the second problem, we are minimizing the 1-norm or TAXICAB NORM. We can solve the problem
geometrically by expanding a diamond centered at the house until it touches the road, as shown in Figure
1.5.1. The point where the square touches the road is the solution to the system x 2y 0, x 2, or
px, yq p2, 1q. The minimum cost is then 4{3 4{3 2. In this case, the projection theorem with the
ordinary concept of “perpendicular” does not apply, but we could have used [1, Theorem 5.8.1, p. 119].
In the third problem, we are minimizing a type of M INKOWSKI FUNCTIONAL (see [1, p. 131]). We can
solve the problem geometrically by expanding a square around the house until it touches the road, as we
did before. The factor by which you must expand the unit square in order for it make it touch a point, or
4{3, is the minimum cost. q
Example 1.5.2 (minimum norms and convex sets). Assume that the Island of Sodor is a convex set (it
actually is not – see Figure 1.5.2) whose shore is a smooth curve. Reverend Awdry is offshore and want
to swim to the closest point on the island.
Figure 1.5.2: The Island of Sodor is actually not a convex set (Example 1.5.2)!
Solution: We solve the problem geometrically by expanding a circle centered at Reverend Awdry’s loca-
tion until it just touches the island. Let the radius of the circle be d and the point of tangency between the
circle and the convex set be pt .
Consider any other tangent to the circle, where the point of tangency is very close to pt : this tangent
will cut across the interior of the island, otherwise either it would not be a tangent or the set would not
be convex. Therefore, d is the minimum distance from Reverend Awdry’s location to the shore. Now con-
sider all hyperplanes (lines) that separate Reverend Awdry from every point on the island. The reverend
is farthest away from the hyperplane tangent to the island that we just constructed. q
This is a DUALITY theorem: the distance d solves a minimization problem over points on the island
and also solves a maximization problem over elements of the dual space. The theorem is proved in [1]
(Theorem 1, p. 136) with no restrictions except convexity. That is:
An important class of minimum norm problems is finding a polynomial that best approximates a
function in some interval. Before taking a brief look at approximation problems, we first define a LINEAR
FUNCTIONAL :
Definition 1.5.3 (linear functional). Let V be a vector space over a field F. A linear functional is a map
f : V Ñ F such that f pv wq f pvq f pwq for all v, w P V and f pavq af pvq for all v P V and all a P F.
Example 1.5.4. Consider the infinite-dimensional vector space V of all continuous functions f on r1, 1s.
Let W V be the 4-dimensional subspace of polynomials p for which deg p ¤ 3. Which of the following
are linear functionals?
• L : f pxq Ñ f p1q
• L : f pxq Ñ f p0q
• L : f pxq Ñ f p1q
³1
• L : f px q Ñ 1 f pxq dx
f p1q a0 a1 a2 a3
f p0q a0
f p1q a0 a1 a2 a3
»1 a3 4 1
f pxq dx a0 x 2a0
a1 2 a2 3 2
x x x a2 .
1 2 3 4 1 3
1.5. Minimum Norm Problems U 17
Since
1 1 1 1
1
det
1
0
1
0
1
0
1
0,
2
2 0 3 0
the functionals are linearly dependent. More elegantly, we might have remembered that Simpson’s rule
is exact for polynomials p for which deg p ¤ 3, so we can write
»1
f pxq dx pf p1q 4f p0q f p1qq .
1
q
1 3
Example 1.5.5. You have lost your scientific calculator and can only evaluate polynomials ppxq. You need
to compute approximate values of f pxq sin x for randomly chosen values x P r0, 1s. Here are three
norms that might be used to choose the best approximating polynomial:
³1
• L1 -norm: 0 |f pxq ppxq| dx
³1
• L2 -norm: 0 pf pxq ppxqq2 dx
For which of these norms is the Taylor polynomial the right choice for ppxq? Under what circumstances
might one or another of these norms be the appropriate choice?
Solution: The Taylor polynomial is actually not the correct choice for any of these norms. It turns out that
interpolating polynomials arise in minimizing the 1-norm, Fourier series in minimizing the 2-norm, and
C HEBYSHEV POLYNOMIALS in minimizing the 8-norm. See [3] for a complete textbook on approximation
theory. q
The following theorem provides reassurance that under reasonably broad conditions, functions can
be approximated by polynomials to arbitrary accuracy.
T HEOREM 1.5.6 (Weierstraß Approximation Theorem). If f is a continuous real-valued function on r0, 1s, then
there exists a polynomial p on ra, bs such that |f pxq ppxq| ¤ for all x P ras and any ¡ 0.
is called a Bernstein polynomial, and β1 , . . . βn are called the Bernstein or Bézier coefficients. The proof
of the Weierstraß approximation theorem is constructive: we will construct a series of Bernstein polyno-
mials that converges uniformly to f .
18 U 1. G ENERALIZING FROM T WO D IMENSIONS
f is continuous and r0, 1s is compact, so f is uniformly continuous ([2, Theorem 4.19]) on r0, 1s. Therefore,
given any ¡ 0, there exists a δ ¡ 0 such that |x y | δ implies |f pxq f py q| {2. Because f
is continuous and r0, 1s is compact, we can also conclude that f is bounded ([2, Theorem 4.15]); say
supxPr0,1s |f pxq| ¤ M . Now
ņ
so,
ņ k n k nk
|f pxq Bnpf qpxq| ¤ p q
f x f
n k x 1 x p q
k 0
¸ k n k ¸ k n k
nk nk
p q
f x f
n k x 1 x p q p q
f x f
n k x 1 x p q ,
k PS
loooooooooooooooooooooooomoooooooooooooooooooooooon P
loooooooooooooooooooooooomoooooooooooooooooooooooon
k T
A B
where S tj : |x j {n| ¥ n1{4 , j P r0, nsu and T tj : |x j {n| n1{4 , j P r0, nsu r0, nszS.
Let X Binompn, pq, so that P pX k q nk xk p1 xqnk and σ 2 var X {n pp1 pq{n ¤ 1{4n. By
Chebyshev’s inequality,
Equivalently, A {2 if n ¡ M 2 {42 . Furthermore, if |X k {n| n1{4 and n1{4 δ, then |X k {n| δ
and |f pxq Bn pf qpxq| {2 since f is uniformly continuous on r0, 1s. Therefore, B is trivially less than
{2. We conclude that if n ¡ maxpM 2 {42 , δ 4 q, then |f pxq Bn pf qpxq| A B ¤ for all x P r0, 1s; that
is, the Bernstein polynomials Bn pf q converge uniformly to f . q
Example 1.5.7 (Gibbs Phenomenon). Find a discontinuous real-valued function for which such a polyno-
mial does not exist.
Solution (based on [4]): Consider the function f pxq sgn sin x, where
$
'
'
&1 x¡0
sgn x 0 x0.
'
'
% 1 x 0
Figure 1.5.3: Notice how limN Ñ8 SN f pxq f pxq at x 0. This discrepancy is known as Gibbs phe-
nomenon or ringing artifacts (Example 1.5.7).
By the Weierstraß approximation theorem, we have limN Ñ8 |f pxq SN f pxq| ¤ for all x where x is in
some closed interval not containing a multiple of π.
However, we notice in Figure 1.5.3 that SN f pxq exhibits a bump whenever x kπ. In particular,
SN f pxq reaches a critical point whenever
Ņ
pSN f q1pxq π4 cospp2k 1qxq 0.
k 1
To find the zeros of pSN f q1 pxq, we first show that
Ņ
2 sin x cospp2k 1qxq sinp2N xq. (1.5.1)
k 1
The proof is by induction. For N 1, Equation (1.5.1) reduces to the identity sinp2xq 2 sin x cos x. Now
assume Equation (1.5.1) is true for general N and show that this implies it is true for N 1. We have
N¸1 Ņ
2 sin x cospp2k 1qxq 2 sin x cospp2k 1qxq cospp2N 1qxq
k 1
k 1
paq
sinp2N xq 2 sin x cospp2N 1qxq
sinp2N xq 2 sin x cosp2N xq cos x 2 sin x sinp2N xq sin x
pbq
p1 2 sin2 xq sinp2N xq sinp2xq cosp2N xq
p cq
cosp2xq sinp2N xq sinp2xq cosp2N xq pdq sinpp2N 1qxq.
where (a) follows from the inductive step, (b) follows from the identity sinp2xq 2 sin x cos x, (c) follows
from the identity cosp2xq cos2 x sin2 x p1 sin2 xq sin2 x 1 2 sin2 x, and (d) follows from the
identity sinpa bq cos a sin b sin a cos b.
Using this fact, we see that SN f reaches a critical point whenever π sin xpSN f q1 pxq 2 sinp2N xq 0,
or whenever x is a multiple of π {2N . Consider the closest critical points to x 0: x π {2N . Then we
20 U 1. G ENERALIZING FROM T WO D IMENSIONS
have
π Ņ 1 π
sin 2k2N Ņ
2k 1
SN f
2N
4
π 2k 1
2
π
sin
2N π
2k 1
π
N
.
k1
k 1 2N π
! )
2k 2pk 1q
The last sum is a Riemann sum taken over midpoints of the partition 2N , 2N ,k 0, . . . , N 1 .
Therefore, »π
π
lim SN f
N Ñ8 2N
lim f pxq
xÑ 0
sin x
x
dx 1.1790.
0
A similar analysis limxÑ0 f pxq 1.1790. Therefore, the sequence of polynomials tSN f u does not
converge pointwise to f . However, we do see that the sequence tSN f u does converge to f for the L1 - and
L2 -norms, though it does not for the L8 -norm. q
C HAPTER 2
PRELIMINARIES IN ALGEBRA , TOPOLOGY, AND ANALYSIS
Reading: [1, Sections 2.1 – 2.9]. Also see [5, Chapters 1 – 2] for more on the linear algebra covered in this
chapter, [6, Chapter 2] for more on the topology, and [2, Chapters 2 and 4] for more on the topology and
analysis.
Definition 2.1.1 (vector space). A vector space V over a field F is a set V along with two operations:
addition, which associates with any two vectors u, v P V a vector u v P V , and scalar multiplication,
which associates with any vector v P V and any scalar a P F a vector av P V . The following axioms hold:
3. existence of an additive identity: There exists a null vector 0 P V such that v 0 v for all v P V .
4. distributive law for vector addition: apu vq au av for all u, v P V and all a P F.
5. distributive law for scalar addition: pa bqv av bv for all v P V and all a, b P F.
6. associative law for scalar multiplication: pabqv apbvq for all v P V and all a, b P F.
Unless specified otherwise, we will assume we are working with real vector spaces (F R).
Example 2.1.2. Note that [1] replaces a standard axiom, the existence of an additive inverse, with the
axiom 0v 0 for all v P V . Prove the existence of an additive inverse from [1]’s axioms.
Solution: We have
paq
0 0v p1 p1qqv pbq 1v p1qv pcq v p1vq,
where (a) follows from [1]’s axiom 0v 0 for all v P V , (b) follows from the distributive law for scalar
addition, and (c) follows from the definition of a multiplicative identity. Therefore, we define the additive
inverse v 1v. q
21
22 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
Example 2.1.3. Which axiom on the list can be proved from the others?
Solution: We can prove the commutative law from the other axioms. We have
paq
0 0 pu v q
pp1q 1qpu vq
pbq
p1qpu vq 1pu vq
pcq
p1qu p1qv 1u 1v
pdq
u v u v
where (a) follows from the fact that 0v 0, (b) follows from the distributive law for scalar addition,
(c) follows from the distributive law for vector addition, and (d) follows from the existence of additive
inverses, as shown above, and the definition of the multiplicative identity. Then we add v to the left of
both sides to get u u 0 u u v u v v u v; repeating with v gives v u u v. q
How do we know that we cannot prove another axiom from the remaining ones? We can try defining
vector addition and scalar multiplication in different ways and seeing if the resulting operators still obey
the axioms for a vector space. If not, then the axioms that are not satisfied are independent of the axioms
that are.
Example 2.1.4. Let V R2 . Define addition as usual, but define scalar multiplication by apx, y q pax, 0q
for px, y q P V and a P R. Is V a vector space? If not, what axiom is not satisfied?
Solution: A multiplicative identity does not exist in this vector space since there exists no a such that
apx, y q px, y q for all px, y q. Therefore, the existence of a multiplicative identity is independent of the
other axioms. q
Example 2.1.5. For the NSA budget, let V R Ytsu. The symbol S denotes a secret amount. For elements
in V other than s, addition and multiplication are defined as usual. However, s v v s s for all
v P V and as s for all a P R. Is V a vector space? If not, what axiom is not satisfied?
All 2 3 matrices with real entries: This is equivalent to R6 , which is a 6-dimensional vector space
All bounded infinite sequences: infinite-dimensional vector space (since the sum of two bounded
sequences is bounded and a bounded sequence times a constant is still bounded)
All infinite sequences that converge to zero: infinite-dimensional vector space
2.1. Vector Spaces U 23
All infinite sequences with only finitely many nonzero terms: infinite-dimensional vec-
tor space
All infinite sequences for which the terms form a convergent series: infinite-dimensional
vector space
All polynomials p with deg p 3: not a vector space (consider f pxq x3 and gpxq x3:
degpf g q 3)
All linear functions f : R3 Ñ R: 3-dimensional vector space (since we have the basis
tf p1, 0, 0q, f p0, 1, 0q, f p0, 0, 1qu
because f is linear)
Definition 2.1.6 (vector subspace). A nonempty subset U of a vector space V over F is called a subspace
of V if for any u, v P U and a, b P F, au bv P U . That is, U is a subspace of V if U is closed under vector
addition and scalar multiplication.
A subspace must be nonempty, so it must contain at least one element. By the axiom guaranteeing the
existence of an additive identity, that one element must be 0. Therefore, every subspace contains 0.
Definition 2.1.8 (sum of vector spaces). The sum of two subsets U1 and U2 of a vector space is the set of
all sums u1 u2 , where u1 P U1 and u2 P U2 .
Example 2.1.9. What are the sum and difference of two squares, one centered at a and with side length 2r
and the other centered at b and with side length 2s?
24 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
Solution: We skip ahead a little and assume knowledge of NORMED VECTOR SPACES. Let the first square
be the set A tx : }x a}8 ¤ ru and B ty : }y b}8 ¤ su. Consider any x P A and y P B. We have
where the first inequality follows from the TRIANGLE INEQUALITY. Therefore, A B is a square centered
at a b with side length 2pr sq. Furthermore, we have
where again we use the triangle inequality, as well as the property that }ax} |a|}x} for any scalar a P F
and any vector x P X, where X is a normed vector space. Therefore, A B is a square centered at a b
with side length 2pr sq. q
Note that A B does not depend on our choice of the origin o since pa oq pb oq a b. This
construction will be crucial to the proof of the E IDELHEIT SEPARATION THEOREM ([1], Theorem 3, p. 133).
Proposition 2.1.10 ([1, Proposition 2.3.2, p. 15]). If U1 and U2 are subspaces of V , then U1 U2 is a subspace of
V.
Definition 2.1.11 (subspace generated by a subset). Suppose S is a subset of a vector space V . The set rS s,
called the subspace generated by S, consists of all vectors in V which are linear combinations of vectors
in S.
If subspace U contains S, it contains rS s. Equivalently, rS s is the intersection of all subspaces that contain
S.
The usual way to construct rS s is to form all linear combinations of vectors in S. This is clearly a
subspace since it is closed.
Solution: We have
• rS s tpx, 2xq : x P Ru since the line segment is part of the line y 2x. q
S UBSPACE generalizes “line or plane through the origin”. The generalization of “any line or plane” is
LINEAR VARIETY , a subspace plus a constant vector:
Definition 2.1.13 (linear variety). The translation of a subspace is a linear variety or affine subspace.
Definition 2.1.14 (linear variety generated by a subset). Suppose S is a nonempty subset of a vector space
V . The set ν pS q, called the linear variety generated by S, is the intersection of all linear varieties in V that
contain S.
Example 2.1.15. If S P R3 consists of the vectors tp0, 0, 1q, p0, 1, 1qu, give examples of subspaces of two
different dimensions that contain S. Which subspace is the “smallest”? What is ν pS q?
Solution: The subspaces tp0, x, y q : x, y P Ru and tpx, y, z q : x, y, z P Ru are subspaces of dimensions 2 and
3, respectively, that contain S. The first subspace is the smallest. ν pS q is p0, 0, 1q tp0, x, 0q : x P Ru. q
Solution: ν pS q is the plane containing the circle. If the origin lies in the plane containing the circle, then
ν pS q rS s. q
Definition 2.2.1 (convex set). A set K in a linear vector space is said to be convex if for any x1 , x2 P K, all
elements in the set tλx1 p1 λqx2 : λ P r0, 1su are in K.
26 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
Proof: To show that ∅ is not convex, we must find two vectors x1 , x2 P ∅ such that there exist vectors in
the set tλx1 p1 λqx2 : 0 ¤ λ ¤ 1u not in ∅. Since ∅ contains no vectors at all, it is convex. q
Proposition 2.2.3 ([1, Proposition 2.4.1, p. 18]). For any convex sets K, L and any scalars a, b, aK bL is
convex.
Proof: Consider the vectors z1 , z2 P aK bL. There exist vectors x1 , x2 P K and y1, y2 P L such that
z1 ax1 by1 and z2 ax2 by2 . Since K and L are convex, we have
In general, the union of two convex sets is not convex, as shown in Figure 2.2.1. However, we do have
the following:
Proposition 2.2.4 ([1, Proposition 2.4.2, p. 18]). Let C be an arbitrary collection of convex sets. Then XK PC K is
convex.
Proof: Let C XK PC . If C is empty, then the proof reduces to that in Example 2.2.2. Assume we have
x1 , x2 P C and choose any λ P r0, 1s. Then x1 , x2 P K for all K P C and since each K is convex, λx1 p1
λqx2 P K for all K P C. Therefore, λx1 p1 λqx2 P C and C is convex. q
Definition 2.2.5 (convex hull). Given an arbitrary set S in a linear vector space, the convex hull or convex
cover, denoted copS q, is the smallest convex set containing S.
We can express copS q as the intersection of all convex sets containing S. Alternatively, we could express
copS q as the set of all convex combinations of vectors in S, where a convex combination is a linear combi-
°| S | °|S |
nation k1 αk xk , where αk ¥ 0 for k 1, . . . , |S | and k1 αk 1.
2.2. Convex Sets U 27
Proposition 2.2.6. Let K be the set of vectors consisting of all convex combinations of vectors in S. Show that
K copS q.
°
Proof: First we show that K copS q. Let Km be the set of all convex combinations of the form m i1 αi xji ,
°m n
where i1 αi 1, ai ¥ 0 for i 1, . . . , m, and m 1, . . . , n. Clearly K m1 Km . Then in order to
show K copS q, we need to show that Km copS q for m 1, . . . , n. The proof is by induction. This
clearly holds for m 1 since in that case α1 1 and the convex combinations are therefore just the
elements of S copS q. Now we assume that the result is true for general m n and show that this
° °
implies it is true for m 1. Say we are given a convex combination p im1 1 αi xji , where ik11 αi 1
and ai ¥ 0 for i 1, . . . , m 1. At least one of α1 , . . . , αm 1 must be strictly positive; without loss of
generality, assume that α1 ¡ 0. Then
¸1
m ¸1
m ¸1
m ¸1
m
p α1 xj α1 looxmo αi
α i x ji 1 α i x ji j on 1 αi °m 1 x ji ,
i 1
i 2
i 2 i2 αi
looooooooomooooooooon
i 2
q
r
r is a convex combination of m elements of S and therefore, by the inductive hypothesis, is also in copS q.
Then p is on the line segment connecting q and r with q, r P copS q, so p P copS q, as we wanted to show.
Remember that copS q is the smallest convex set containing S. We know that S K1 K copS q, so
we just need to show that K is actually convex in order to show that K copS q. Say we are given two
° ° ° °
elements of K, q ni1 αi xi and r ni1 βi xi , with ni1 αi ni1 βi 1 and αi , βi ¥ 0 for i 1, . . . , n.
° °
Then any point on the line segment connecting q and r can be written as p λ ni1 αi xi p1 λq ni1 βi xi
°
with λ P r0, 1s. We can rewrite p as ni1 pλαi p1 λqβi qxi , with
ņ ņ ņ
pλαi p1 λqβiq λ αi p1 λq βi λ p1 λq 1,
i 1
i 1
i 1
Definition 2.2.7 (cone). A set C in a linear vector space is said to be a cone with vertex at the origin if
x P C implies αx P C for all α ¥ 0.
Example 2.2.8. Make a cone from a line segment, from a circle, and from a disk.
Solution: Figure 2.2.2 shows a cone made from the line segment connecting p5, 5q and p5, 10q and a cone
made from the circle of radius 1{2 centered at p0, 0, 1{2q and parallel to the xy-plane. A cone made from
the corresponding disk would be the same as the cone made from the circle, except that the cone made
from the disk would be filled in. q
28 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
Figure 2.2.2: cones made from the line segment and the circle in Example 2.2.8
Solution: Consider the convex cone C generated from the line segment L in Figure 2.2.2. If we replace
L with an arc A that has the same endpoints as L and that is entirely contained within C, then we still
generate C. However, A is a non-convex set. q
Proof: Let C be a cone generated from a convex set K. Pick any two vectors y1 , y2 P C. Then there exist
vectors x1 , x2 P K and scalars α1 , α2 ¥ 0 such that y1 α1 x1 and y2 α2 x2 . Furthermore, we can set
α1 βγ and α2 β p1 γ q where β ¥ 0 and 0 ¤ γ ¤ 1: specifically, β α1 α2 and γ α1 {pα1 α2 q.
Then we can write
yc λy1 p1 λqy2 β pγx1 p1 γ qx2 q βxc ;
xc P K since K is convex and yc P C by definition of a cone. Therefore, C is convex. q
Definition 2.3.1 (linear independence). A vector x is said to be linearly dependent upon a set of vectors S
if x can be expressed as a linear combination of vectors in S. Equivalently, x is linearly dependent upon S
if x P rS s. Conversely, x is said to be linearly independent of the set S if it is not linearly dependent on S,
and a set of vectors is said to be a linearly independent set if each vector in the set is linearly independent
of the remainder of the set.
Note that this definition works even for infinite-dimensional vector spaces.
Example 2.3.2. Let S t1, t2 , t4 , . . . u. Is f ptq pt8 4t2 qpt6 t2 7q dependent upon S? Is g ptq t5
dependent upon S? Is S a linearly independent set?
2.3. Linear Independence and Dimension U 29
Solution: f ptq is dependent upon S since each term of f ptq has an even exponent. g ptq is independent of
S since t5 has an odd exponent. S is a linearly independent set. q
T HEOREM 2.3.3 ([1, Theorem 2.5.1, p. 20]). The set tx1 , . . . , xn u is linearly independent if and only if
°n
k1 αk xk 0 implies αk 0 for k 1, . . . , n.
°n
T HEOREM 2.3.4 ([1, Corollary 2.5.1, p. 20]). If tx1 , . . . , xn u is a linearly independent set and
k 1 αk x k
°n
k1 βk xk , then αk βk for k 1, . . . , n.
°n
Proof: If
k 1 α k xk °nk1 βk xk , then °nk1pαk βk qxk 0 and αk βk for k 1, . . . , n by Theorem
2.3.3. q
Definition 2.3.5 (basis). A finite set B of linearly independent vectors is said to be a basis for the space V
if rB s V . The number of vectors |B | in B is called the dimension of V .
Example 2.3.6. What is the dimension of the space spanned by t1, cos2 x, sin2 x, cos 2xu?
Solution: This space has dimension 2, since we can write 1 cos2 x sin2 x and cos 2x cos2 x
sin2 x. q
Q UIZ T HEOREM 1 (from [1, Theorem 2.5.2, p. 21]). If a vector space V is generated by the set of k vectors
Sk tv1 , . . . , vk u, then any set of k 1 vectors Tk 1 tw1 , . . . , wk 1 u in V must be linearly dependent.
Proof: The proof is by induction. For the base case k 1, S1 tv1 u and T2 tw1 , w2 u. There exist
scalars α1 , α2 0 such that w1 α1 v1 and w2 α2 v1 . Then we have α2 v1 α1 v2 0, so T is a linearly
dependent set.
Our inductive hypothesis is that if V rSk1 s, then Tk is a linearly dependent set. We assume this is
°
true for arbitrary k and show that it holds for k 1. First, we can write wk 1 ki1 αi vi since tv1 , . . . , vk u
is a basis. Choose some j such that αj 0. If this cannot be done, then w1 0 and Tk 1 is a linearly
dependent set. Then let Sk1 Sk tvj u. Now
vj α1 wk 1 v
j
°k
for some v P rSk1 s, with v bi vi . Substituting this into the expression below, we have
i 1
i j
ķ ķ
wm β i vi pβ i bi qvi wk 1
i 1
i 1
i j
30 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
for m 1, . . . , k. By the inductive hypothesis, these k vectors w1 , . . . , wk are linearly dependent; that is,
°
there exists some linear combination ki1 γi wi 0 with not all of γ1 , . . . , γk equal to 0. But then we can
°
set γk 1 0 so that the linear combination kk1 γi wi also equals 0; again not all of γ1 , . . . , γk equal 0 so
the set Tk 1 is linearly dependent, thus proving the theorem. q
We conclude that if Sk is a basis for V , then any set of more than k vectors in V is linearly dependent
and cannot be a basis. Therefore, any two bases for a finite-dimensional vector space contain the same
number of elements.
Example 2.3.7. Show that the result does not extend to infinite-dimensional vector spaces.
Solution: The vector space V of convergent sequences is generated by the countably infinite set of linearly
independent elements S tp1, 0, . . . q, p1, 1, 0, . . . q, p1, 1, 1, 0, . . . q, . . . u. If we remove the first element of
this set, we still have a countably infinite set S 1 S tp1, 0, . . . qu, but this does not generate V since not
all convergent sequences tak u8 k1 have a1 a2 . q
Definition 2.4.1 (normed vector space). A normed vector space, or normed linear space or normed linear
vector space, is a vector space X on which there is defined a real-valued function which maps every x P X
into a real number }x} called the norm of x. The norm satisfies the following axioms:
A real-valued function p : X Ñ R that satisfies just the first two axioms is called a SEMINORM.
Note the following about each axiom:
positive homogeneity
• If you take a cab from your destination back to your starting point, you should pay the same
amount as when you went from your starting point to your destination.
• You cannot save money by breaking a cab ride along a single straight road into two pieces.
• The cab driver cannot offer bargain fares for extra-long rides.
• In order to convert x41 x42 into a norm, we must take the fourth root (this is the l4 -norm in 2
1
dimensions: }x}4 x41 x42 4
).
• If you draw a convex set that contains the origin and decree that the cab fare from the origin to
any point on the boundary of this set is $1 and that fare scales linearly with distance, you do not
necessarily have a norm. In fact, you only have a norm if the boundary is the set tx : }x} cu
for some constant c.
2.4. Normed Vector Spaces U 31
triangle inequality
• You cannot save money by breaking a cab ride into two pieces.
• A cab driver can offer a 50% discount on the shorter dimension (}x pu, v q} 12 minp|u|, |v |q
maxp|u|, |v |q). Say we have x pa, bq and y pc, dq. Assume without loss of generality that
minp|a|, |b|q |a|. If minp|c|, |d|q |c|, then clearly }x y} ¤ }x} }y}. If minp|c|, |d|q |d|,
assume without loss of generality that minp|a c|, |b d|q |a c|. Then
}x} }y} }x y}
1
2
|a| |b| |c| 1
2
|d| 1
2
|a c| |b d|
• The “if” part of positive definiteness is not an independent requirement: positive homogeneity
implies that }x} 0 if x 0 since we have }0} }00} |0|}0} 0.
• Positivity is not an independent requirement: we have
paq pbq
0 }0} }x x} ¤ }x} } x} pcq }x} | 1|}x} 2}x},
where (a) follows from positive homogeneity as shown above, (b) follows from the triangle in-
equality, and (c) follows from positive homogeneity. Dividing both sides by 2 proves positivity.
• If the cab driver makes the longer dimension free, cab fare is still a norm (the l8 -norm).
1 1
In R2 , we have seen the l1 -, l2 -, l4 -, and l8 -norms so far, defined by |x1 |
|x2|, x21 x22 , x41 x42 , 2 4
and maxp|x1 |, |x2 |q, respectively. Before we are tempted to conclude that }x}p pxp1 xp2 q is a norm
1
p
for all p ¥ 0, note that it is actually not a norm for p P r0, 1q. The “l0 -norm” does not satisfy positive
homogeneity, and the “lp -norm” for p P p0, 1q does not satisfy the triangle inequality (the unit disks for
these norms are not convex).
Example 2.4.2. Consider the set of infinite sequences of real numbers that have only finitely many nonzero
terms. Is this a vector space? If so, what is its dimension? Generalize the norm examples to this case. If
you remove the restriction to finitely many nonzero terms but insist that the norm be finite, which example
gives the largest vector space? The smallest?
Solution: Yes, this is a vector space since the sum of an infinite sequence with M ¤ 8 nonzero terms and
an infinite sequence with N ¤ 8 nonzero terms will have at most M N ¤ 8 nonzero terms, and any
multiple of an infinite sequence with M nonzero terms will still have only M nonzero terms. The vector
32 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
° 1
8 |ξ |p p . The
space is infinite-dimensional. The lp -norm (p P r1, 8q) for a sequence tξ1 , ξ2 , . . . u is i1 i
l8 -norm is maxi t|ξi |u8
i1 . If we remove the restriction to finitely many nonzero terms but insist that the
norm be finite, then the l8 -norm gives the largest vector space, while the l1 norm gives the smallest. q
Example 2.4.3. C ra, bs is the space of continuous functions on ra, bs, with norm }x} maxtPra,bs |xptq|.
Confirm that this norm satisfies all the axioms.
Solution: We have
1. positive homogeneity:
}αx} amax
¤t¤b
|αxptq| |α| amax
¤t¤b
|xptq| |α|}x}.
2. triangle inequality:
}x y } max |xptq
¤¤
a t b
y ptq| ¤ max p|xptq|
¤¤
a t b
|yptq|q ¤ amax
¤t¤b
|x| max |y ptq| }x}
¤¤
a t b
}y}.
3. positivity and positive definiteness: Clearly }x} 0 if only if xptq 0 for all t P ra, bs. }x} ¥ 0 since
|x| ¥ 0. q
Example 2.4.4. Give examples of functions, not in C r0, 1s, that can be included in the space if you choose
³1
the norm }x} 0 |xptq| dt.
Figure 2.4.1: Thomae’s function (Example 2.4.4). Image taken from Wikipedia.
Solution: The function xptq sgn px 0.5q is discontinuous at the point x 0.5, but }x} 1. A more
interesting choice is Thomae’s function, also known as the popcorn function or the Riemann function:
#
1
t PQ.
p
xptq q q
0 tPRQ
We show that }x} 0 (see Figure 2.4.1). Given any ¡ 0, choose n such that 1{n . In the interval
r0, 1s, there are only finitely many rational numbers with denominator at most n. Say there are dn of these
numbers. Surround each of these dn points with an interval of length {dn , and use the endpoints of these
intervals to form a dissection of the interval r0, 1s. The upper Darboux sum for this dissection is less than
2: within each of the dn intervals, whose combined length is , the function has an upper bound of 1, and
2.4. Normed Vector Spaces U 33
within each of the remaining intervals, whose combined length is less than 1, the function has an upper
bound of 1{n . The lower Darboux sum is obviously 0 since every interval in the dissection contains
an irrational number. Since we have 0 ¤ }x} ¤ 2 for all ¡ 0, }x} 0.
The norm }x} does not exist for the Dirichlet function, defined as
#
tPQ
x pt q 1 Q pt q
1
,
0 tPRQ
is not Darboux-integrable (and therefore not Riemann-integrable) since every interval in any dissection
of r0, 1s will always contain at least one rational and at least on irrational number; therefore, the upper
and lower Darboux sums will not converge. If we use the Lebesgue integral, however, then }x} exists and
equals 0 since Q is countable. q
Definition 2.4.5 (total variation). A function x on ra, bs is said to be of bounded variation if there is a
constant K so that for any partition a t0 t1 tn b of ra, bs,
ņ
|xptiq xpti1q| ¤ K.
i 1
where the supremum is taken with respect to all partitions of ra, bs.
Example 2.4.6. The space BVra, bs is defined as the space of all functions of bounded variation on ra, bs
together with the norm }x} |xpaq| TVpxq. Why is this norm the appropriate choice? Is the function
xptq sin 1
1
t in BVr0, 1s?
Figure 2.4.2: The function xptq sin 1t does not have bounded variation (Example 2.4.6).
Solution: We check that }x} satisfies the three axioms for a norm:
34 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
1. positive homogeneity:
ņ ņ
}αx} |αxpaq| sup |αxptiq αxpti1q| |α||xpaq| |α| sup |xptiq xpti1q| |α|}x}.
i 1
i 1
2. triangle inequality:
ņ
}x y } |xpaq y paq| sup |pxptiq y pti qq pxpti1 q y pti1 qq|
i 1
ņ ņ
¤ |xpaq| |ypaq| sup |xptiq xpti1q| sup |yptiq ypti1q|,
i 1
i 1
where we use the triangle inequality for the absolute value function.
3. positivity and positive definiteness: Clearly }x} 0 if xptq 0 for t P ra, bs. Furthermore, because
we include the term |xpaq| in the norm, }x} 0 only if xptq 0 for t P ra, bs. Finally, }x} ¥ 0 for all
functions x P BVra, bs since every term in the norm involves absolute values.
The function xptq sin 1t is not in BVr0, 1s since there are infinitely many points t P r0, 1s at which
xptq 1. Specifically, as t Ñ 0, xptq oscillates between 1 and 1 infinitely often, as shown in Figure 2.4.2.
Therefore, xptq does not have bounded variation. q
We can think of xptq as the position of a car as a function of time for t P ra, bs, and that the car’s
odometer increases even when the car is backing up. Then the space BVra, bs is the set of functions for
which the change in the odometer reading is finite. This space is important because it will turn out to be
the dual space to C ra, bs.
A norm introduces a topology that may be more general than what the reader is used to.
Definition 2.5.1 (topology). A topology on a set X is a collection T of subsets of X having the following
properties:
1. ∅ and X are in T .
A set X for which a topology T has been specified is called a topological space.
Definition 2.5.2 (open set). If X is a topological space with topology T , we say that a subset U X is an
open set of X if U belongs to the collection T . More generally, a topological space is a set X together with
a collection of subsets of X, called open sets, such that ∅ and X are both open, and such that arbitrary
unions and finite intersections of open sets are open.
2.5. Open and Closed Sets U 35
The most basic topology on a nonempty set X is the collection t∅, X u, so a topology can contain just
two open sets. Furthermore, it is permissible to have every set in a topology be open.
Here are some other definitions that are independent of any model for topology:
Definition 2.5.3 (closed set). A subset A of a topological space X is said to be closed if the set X A is
open.
Definition 2.5.4 (interior and closure). Given a subset A of a topological space X, the interior of A, de-
noted Å, is defined as the union of all open sets contained in A (i. e. the largest open subset of A). The
closure of A, denoted Ā, is defined as the intersection of all closed sets containing A (i. e. the smallest
closed set containing the subset A).
Figure 2.5.1: a Web site model that illustrates a finite topology that does not involve norms
We use a Web site model to illustrate a finite topology that does not involve norms. Figure 2.5.1 shows
the links between the six Web pages in a set X. An “open set” is defined by the property that no page in
the set can be reached from a page from outside the set.
Example 2.5.5. For the Web site example, that both ∅ and X are open.
Solution: We cannot find a page in ∅ that can be reached from a page outside ∅ because there are no
pages in ∅. We cannot find a page outside X that links to a page in X because there are no pages outside
X. Therefore, ∅ and X are open. q
Example 2.5.6. For the Web site example, prove that the union of two open sets is open.
Solution: Let A and B be two open sets. By the definition of an open set, there are no links from pages
in X zA to pages in A, nor are there links from pages in X zB to pages in B. Since X zpA Y B q X zA
and X zpA Y B q X zB, there are no links from pages in X zpA Y B q to either pages in A or pages in B;
therefore, there are no links from pages in X zpA Y B q to pages in A Y B. Thus, A Y B is open. q
36 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
Example 2.5.7. For the Web site example, prove that the intersection of two open sets is open.
Solution: Let A and B be two open sets. Suppose that A X B is not open. Then there must be some page
in X zpA X B q that links to a page in A X B. But X zpA X B q pX zAq Y pX zB q, so we have found either
a page in X zA that links to a page in A or a page in X zB that links to a page in B. This contradicts our
assumption that A and B are open, so A X B is open. q
For the Web site model, we note the following to illustrate some of the topological concepts we have
defined:
• The nine open sets in X are ∅, t2u, t4, 5u, t1, 2, 3u, t2, 4, 5u, t4, 5, 6u, t2, 4, 5, 6u, t1, 2, 3, 4, 5u, and
t1, 2, 3, 4, 5, 6u X.
• The interior of t2, 3u is t2u since t2u is the largest open subset of t2, 3u.
• The nine closed sets are the complements of the nine open sets in X. Note that ∅ and X are both
open and closed sets (sometimes called clopen sets).
• The closure of t1u is t1, 3u since t1, 3u is the smallest closed set containing the subset t1u.
• The closure of t1, 6u is t1, 3, 6u since t1, 3, 6u is the smallest closed set containing the subset t1, 6u.
Definition 2.5.8 (interior point). Let P be a subset of a normed vector space X. The point p P P is an
interior point of P if there exists some ¡ 0 such that the ball B pp, qq tx : }x p} u is a subset of
P.
Definition 2.5.9 (clsoure point). A point x P X is a closure point or limit point of a set P if, given any
¡ 0, there is a point p P P such that }x p} .
In other words, these definitions are saying that a point p is an interior point of P if we can always
surround p with a ball entirely contained within P , while a point x is a closure point of P if every ball
centered at x contains at least one point in P .
Proposition 2.5.10. P̄ is a closed set; that is, that the complement of P̄ is an open set.
(based on [2, Theorem 2.27, p. 35]): If p P X and p R P̄ , then p is neither a point in P nor a closure point
of P . Therefore, there exists some ball centered at p that does not contain any points in P . This shows
that the complement of P̄ in X is open, so P̄ is closed. q
2.6. Convergence, Limits, and Continuity U 37
Q UIZ T HEOREM 2 (first part of [1, Proposition 2.7.4, p. 25]). Let C be a convex set in a normed space. Then C̊
is convex.
Proof: If C̊ is empty, it is convex (see Proposition 2.2.2). Suppose x0 , y0 P C̊. Fix λ P p0, 1q: we must
show that z λx0 p1 λqy0 is in C̊. Since x, y P C̊, there exists some ¡ 0 such that the open
balls B px0 , q, B py0 , q are contained in C; that is, all vectors x0 w and y0 w with }w} are in C.
Since C is convex, all convex combinations λpx0 wq p1 λqpy0 wq are in C. Furthermore, since
λpx0 wq p1 λqpy0 wq z0 w, it follows that all points of the form z0 w are in C; that is, there
exists some ¡ 0 such that the open ball B pz0 , q is contained in C. Therefore, z0 P C̊. q
Q UIZ T HEOREM 3 (second part of [1, Proposition 2.7.4, p. 25]). Let C be a convex set in a normed space. Then
C̄ is convex.
Proof: If C̄ is empty, it is convex (see Proposition 2.2.2). Suppose x0 , y0 P C̄. Fix λ P p0, 1q: we must
show that z0 λx0 p1 λqy0 is in C̄. Given any ¡ 0, select x, y from C such that }x x0 } and
}y y0} . Since C is convex, z λx p1 λqy is in C. Then by the triangle inequality,
}z z0} }λx p1 λqy λx0 p1 λqy0} ¤ λ}x x0} p1 λq}y y0} λ p1 λq ,
so z0 is within a distance of the point z P C. Since this is true for every ¡ 0, z0 is a closure point of C̄
and is therefore in C̄. q
Definition 2.6.1 (convergence (any topology)). The sequence txn u converges to x if, for every open set P
containing x, there exists an N such that for all n ¡ N , xn P P . We write xn Ñ x.
Example 2.6.2. For the Web site topology, show that the sequence 6, 5, 4, 6, 5, 4, 5, 4, 5, 4, . . . converges both
to 4 and to 5.
Solution: The smallest open set containing 4 is t4, 5u, and the smallest open set containing 5 is also t4, 5u.
Therefore, for every open set P containing 4 or 5, there exists some N such that for all n ¡ N , q
If we define open sets by a norm, we can formulate convergence in terms of norms.
Definition 2.6.3 (convergence (normed space)). The sequence txn u converges to x if, for every ¡ 0, there
exists an N such that for all n ¡ N , }xn x} . As before, we write xn Ñ x.
38 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
Q UIZ T HEOREM 4 ([1, Proposition 2.8.1, p. 27]). If a sequence converges, its limit is unique.
Proof: Suppose xn Ñ x and xn Ñ x1 . Then for every {2 ¡ 0, there exist N, N 1 such that for all n ¡N
and all n1 ¡ N 1 , }xn x} {2 and }xn x1 } {2. Then by the triangle inequality,
T HEOREM 2.6.4 ([1, Proposition 2.8.2, p. 27]). A set F is closed if and only if every convergent sequence with
elements in F has its limit in F .
Proof: For the “only if” direction, the limit of a sequence in F is obviously a closure point of F and
therefore must be in F if F is closed. For the “if” direction, suppose that F is not closed. Then there is a
closure point x of F that is not in F . In each of the open balls B px, 1{nq we may select a point xn P F since
x is a closure point. The sequence txn u generated in this way converges to some x R F , which contradicts
our assumption that every convergent sequence with elements in F has its limit in F . Therefore, F must
be closed. q
Definition 2.6.5 (transformation). Let X and Y be vector spaces and let D be a subset of X. A rule which
associates an element y P Y with every element x P D is a transformation from X to Y with domain D.
Definition 2.6.7 (surjective). T is surjective or onto if for every y P Y , there exists at least one x such that
T pxq y. In other words, the image of T equals its codomain.
Definition 2.6.8 (linear). T is linear if T pax byq aT pxq bT pyq for any x, y P X and any scalars a, b P F.
• Both X and Y are the space of continuous functions on ra, bs: T is defined by
»b
T px q k pt, τ qxpτ q dτ.
a
³
Solution: Differentiation and integration are linear operators: Dx paf bg q aDx f bDx g and paf pxq
³ ³ R
bg pxqq dx a R f pxq dx b R g pxq dx. q
Notice that the inverse image is defined even if T is not invertible, and it is not necessarily a connected
set.
For a topology defined by a norm, we can formulate convergence in terms of norms.
Definition 2.6.11 (continuity (normed space)). A transformation T from a normed space X to a normed
space Y is continuous at x0 P X if for every ¡ 0, there exists a δ ¡ 0 such that }x x0 } δ implies
}T pxq T px0q} .
Example 2.6.12 (based on [6, Theorem 18.1, p. 104]). Show that this is the same definition, specialized to
a topology defined by a norm. In other words, show that the definition of continuity for any topology
implies the definition of continuity for a normed space.
T HEOREM 2.6.13 ([1, Proposition 2.9.1, p. 28]). A transformation T mapping a normed space X into a normed
space Y is continuous at the point x0 P X if and only if xn Ñ x0 implies T pxn q Ñ T pbx0 q.
Proof: The “if” portion is obvious since we just set the δ in the definition of continuity to the in the
definition of convergence. For the “only if” portion, let txn u be a sequence such that xn Ñ x0 but T pxn q Û
T px0 q. Then for some ¡ 0 and every N , there is an n ¡ N such that }T pxn q T px0 q} ¥ . Since xn Ñ x0 ,
this implies that for every δ ¡ 0 there is a point xN such that }xn x0 } δ } and }T pxn q T px0 q} ¥ . But
this contradicts our assumption that T is continuous at x0 , so we must have T pxn q Ñ T px0 q. q
Theorem 2.6.13 suggests the following practical test for lack of continuity. If you want to show that T
is not continuous at x0 :
– xn converges to x.
– For all n ¡ N , ||T pxn q T pxq|| ¥ .
1
Having xn scale like n is often convenient.
40 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS
x2 y
f px, y q , f p0, 0q 0
x4 y 2
is discontinuous at px, y q p0, 0q and that the proof is valid for the l2 norm, the l1 norm, or the l8 norm
in R2 .
3.1 lp Space
Definition 3.1.1 (lp space). For p ¥ 1, the space lp consists of all sequences of scalars tξ1, ξ2, . . . u such for
which
8̧
|ξi|ρ 8.
i 1
}x}p |ξi| p
i 1
The space l8 consists of bounded sequences. The norm of an element x tξiu P l8 is defined as }x}8
supi |ξi |.
Proof: First, we show that the geometric mean is less than or equal to the arithmetic mean. By elementary
algebra, we have 0 ¤ px y q2 for any x, y P R. Then xy ¤ 21 x2 2 y . Let a x , b y . Then
1 2 2 2
a 2 b 2 ¤ 12 a 12 b.
1 1
Now consider x tξi u, y tηi u P l2 . We set a |ξi |2 {}x}2 , b |ηi |2 {}y}2 and use the above inequality
for each component i. Summing over all i gives
8̧ |ξiηi| ¤ 8̧ 1 |ξi|2 1 |ηi |2
°8
Then |ξi ηi | ¤ }x}2 }y}2 .
i 1 q
As an aside, while Augustin Cauchy first published the inequality for sums in 1821, the corresponding
inequality for integrals was first stated by Viktor Bunyakovsky in 1859, well before Hermann Schwarz’s
41
42 U 3. B ANACH S PACES
rediscovery of the inequality in 1888. This fact illustrates Stigler’s Law of Eponymy: “No scientific discov-
ery is named after its original discoverer.” The law, proposed by University of Chicago statistics professor
Stephen Stigler in 1980, applies to itself since long before Stigler formulated it, A. N. Whitehead noted
that “Everything of importance has been said before, by someone who did not discover it.” Sigler himself
attributed “his” law to the sociologist Robert K. Merton, famous for coining the phrases “self-fulfilling
prophecy” and “role model” – as well as for being the father of Harvard economist Robert C. Merton,
co-discoverer of the Black-Scholes(-Merton) option-pricing formula.
Before proving the Hölder inequality, a generalization of the Cauchy-Schwarz inequality, we provide
a motivating example.
Example 3.1.3. Consider a specific vector x tξi u P X, where ξi 1{i0.3 . What is the smallest integer p
° °8
for which the norm }x}p is finite? If ηi 1{i0.3 , does 8i1 |ξi ηi | converge? If ηi 1{i , does
0.9
i1 |ξi ηi |
converge?
°
Solution: Since the series 8 1 1{i converges for p ¡ 1, we want to find the smallest integer p such that
p
°i8
0.3p ¡ 1. Therefore, p 4. i1 |ξi ηi | does not converge if ηi 1{i0.3 , but it does if ηi 1{i0.9 . q
We can view ηi as an infinite set of coefficients that define a linear functional on X. If ξi 1{ip and
°
ηi 1{iq , then a sufficient condition for 8 i1 |ξi ηi | to converge is 1{p 1{q ¡ 1.
Cauchy-Schwarz was a special situation, because when we use the l2 -norm in X, we also want the
same l2 norm in the dual space of linear functionals on X. In general, if we use the lp -norm in X and want
°
the linear functional 8 i1 |ξi ηi | to converge, we should use the lq -norm, where p
1
q 1, on the dual
1
space of linear functionals. Therefore we need to prove the Hölder inequality. Before proving the main
inequality, first we derive an auxiliary inequality.
aλ b1λ ¤ λa p1 λqb
Proof: Consider the function f ptq tλ λt λ 1 for t ¥ 0. Then f 1 ptq λptλ1 1q. Since 0 λ 1,
we have f 1 ptq ¡ 0 for 0 t 1 and f 1 ptq 0 for t ¡ 1. Therefore, f ptq ¤ f p1q 0 for t ¥ 0, so
tλ ¤ λt 1λ
for t ¥ 0. If b 0, substituting t a{b and multiplying both sides by b gives the desired inequality. If
b 0, the inequality is obvious. q
Q UIZ T HEOREM 5 (Hölder Inequality ([1, Theorem 2.10.1, p. 29])). If p and q satisfy 1
p
1
q 1, 1 ¤ p ¤ 8,
1 ¤ q ¤ 8, and x tξi u P lp , y tηi u P lq , then
8̧
|ξiηi| ¤ }x}p}y}q .
i 1
3.1. lp Space U 43
1 1
q p
}x}p }y}q
for all i.
The case p 1, q
8 is equivalent.
Now consider the cases 1 p 8, 1 q 8. Set a |ξi |p {}x}p , b |ηi |q {}y}q , and λ 1{p, and use
the above inequality for each component i. Summing over all i gives
8̧ |ξiηi| ¤ 8̧ 1 |ξi|p
°8
Then |ξi ηi | ¤ }x}p }y}q .
i 1 q
We can use the Hölder inequality to generalize the Euclidean triangle inequality to the Minkowski
inequality.
Q UIZ T HEOREM 6 (Minkowski Inequality ([1, Theorem 2.10.2, p. 31])). If x, y P lp with 1 ¤ p ¤ 8, then so
is x y, and }x y}p ¤ }x}p }y}p . For 1 p 8, equality holds if and only if x ky for some positive
constant k.
Proof: For p 1,
8̧ 8̧ 8̧
}x y }1 |ξi ηi | ¤ |ξi| |ηi| }x}1 y}1 ,
i 1
i 1 i 1
where we apply the triangle inequality to each term in the summation. For p 8,
}x y}8 max
i
|ξi ηi | ¤ max |ξi |
i
max |ηi | }x}8
i
}y}8.
Now apply Quiz Theorem 5 to both sums, treating t|ξi ηi |p1 u as an element of lq , with 1
p
1
q 1. Then
1 1 1
ņ ņ ņ ņ
ηi |pp1qq
q p p
°n
since pp 1qq q {q 1. Dividing both sides of the above equation by p |ξi ηi |p q q and remembering
1
i 1
that 1 1q p1 , we have
1 1 1
ņ p ņ p ņ p
|ξi ηi | p
¤ |ξi| p
|ηi| p
.
i 1
i 1
i 1
Note that this holds for all n; therefore, we can also let n Ñ 8 on the left-hand side to obtain the desired
result. The conditions for equality follow from the Hölder inequality. q
Definition 3.2.1 (volume zero). A set S R has volume zero if, for any ¡ 0, S can be contained in the
union of a finite set of intervals with total length less than .
Definition 3.2.2 (measure zero). A set S R has measure zero if, for any ¡ 0, S can be contained in the
union of a countably infinite set of open intervals with total length less than .
Example 3.2.3 (rational numbers). Show that the set Q Xr0, 1s has measure zero but does not have volume
zero.
Solution: Since Q X r0, 1s is countable, we can enumerate the elements of the set as q1 , q2 , . . . . For any
¡ 0, define the open interval Ik pqk 2k , qk 2k q. The sum of the lengths of these intervals is .
Therefore, Q Y r0, 1s has measure zero.
Assume Q X r0, 1s can be contained in the union C Yni1 Ci , where n 8 and maxi |Ci | ¤ δ. Then
|C | ¤ nδ. For nδ 1, there exists an interval r0, 1s C with length 1 nδ ¡ 0; since Q is dense in R, there
must be an element of Q X r0, 1s in this interval. Therefore, Q X r0, 1s does not have volume zero. q
The proof that Q X r0, 1s has measure zero works for any countably infinite set. However, there also
exist sets of measure zero that are uncountably infinite. The Cantor set is one famous example:
3.2. Lebesgue Integration U 45
• Step i: Remove the middle third of every interval in the Cantor set at Step i 1.
Show that the Cantor set has volume, and therefore measure, zero.
Solution: At Step i, the Cantor set contains 2i intervals with total volume p2{3qi . For every , there exists
some i such that p2{3qi . Then we can contain the Cantor set in the 2i 8 intervals with total length
less than . q
Figure 3.2.1: the Cantor set (Example 3.2.4). Image taken from Wikimedia Commons.
As an interesting aside, note that the Cantor set is in one-to-one correspondence with r0, 1s. Consider a
point x in the Cantor set. That point is in r0, 1s, a left or right subinterval of r0, 1s, a left or right subinterval
of that subinterval, and so on (see Figure 3.2.1). We can encode the position of x by the binary expansion
0.b1 b2 . . . , where bi equals 0 if x is in the left subinterval at Step i and 1 if x is in the right subinterval at
Step i.
If f and g are Riemann (or, equivalently, Darboux) integrable, then so is f g, and it follows by
induction that the sum of any finite series of Riemann integrable functions is integrable. The Lebesgue
integral extends this idea to a function that is the sum of an infinite series of functions.
°8
Definition 3.2.5 (Lebesgue integral). Suppose that f pxq
k 1 fk pxq. Further suppose that the sum
8̧ »b
A |fk pxq| dx
k 1 a
°8
converges. Then f pxq is defined, in the sense that
k 1 fk pxq converges, except perhaps on a set of mea-
sure zero. Furthermore,
8̧ »b »b
fk pxq dx f pxq dx
k 1 a a
Note that if f is Riemann integrable, then we can write f1 pxq f pxq and fk pxq 0 for k ¡ 1, so the
Lebesgue integral of f equals the Riemann integral.
The following proposition extends the result that if functions f and g are equal except on a set of
volume zero, their Riemann integrals are equal:
46 U 3. B ANACH S PACES
°8
Proposition 3.2.6. If f pxq
k 1 fk pxq and gpxq °8k1 gk pxq are equal except on a set of measure zero, then
»b »b
f pxq dx g pxq dx.
a a
Example 3.2.7 (Dirichlet function). Show that the Dirichlet function 1xPQXr0,1s is Lebesgue integrable (with
integral zero) but not Riemann integrable.
Solution: We showed in Example 3.2.3 that Q X r0, 1s has measure zero, so the Lebesgue integral of the
Dirichlet function is zero. To show that the Dirichlet function is not Riemann integrable, consider any
dissection of r0, 1s. Each interval in this dissection will contain a rational number and a nonrational
number since Q is dense in R. Therefore, the upper Darboux sum will always be 1, while the lower
Darboux sum will always be zero. Therefore, the Darboux integral, and hence the Riemann integral, does
not exist. q
The improper integral of an unbounded function cannot be defined as a Riemann integral, since the
upper Darboux sum does not exist. However, Lebesgue integration solves the unbounded function prob-
lem:
Solution: Let #
fk pxq
?1x 1
2k
x ¤ 2 1 k 1
.
0 otherwise
Then
k1
»
k
|fk pxq| dx 2 1
1
2 2
Ik ;
R 2 2
°8
k 0 Ikis a telescoping series and evaluates to 2. Since |fk pxq| fk pxq for all k and x, the given integral
also equals 1. q
Lebesgue integration also solves the unbounded interval problem:
Solution: Let #
ex k1¤x k
fk pxq .
0 otherwise
Then »
Ik |fk pxq| dx epk1q ek ;
R
°8
k0 Ik is a telescoping series and evaluates to 1. Since |fk pxq| fk pxq for all k and x, the given integral
also equals 1. q
3.3 Lp Space
Definition 3.3.1 (Lp space). For p ¥ 1, the space Lp ra, bs consists of all real-valued measurable functions x
on ra, bs for which |xptq|p is Lebesgue integrable. The norm of an element x P Lp ra, bs is defined as
» b
p1
}x}p |xptq| p
dt .
a
Of course, }x}p 0 does not necessarily imply that xptq 0 since xptq may be nonzero on a set of measure
zero, as we saw in Example 3.2.7. Therefore, we do not distinguish between functions that are equal
almost everywhere and redefine 0 as the equivalence class of all functions that equal zero except on a set
of measure zero.
Note that we can define the Rp spaces similarly, replacing “Lebesgue integrable” with “Riemann in-
tegrable”. While most of the functions we see in these notes will be Riemann integrable, we prefer to
work with the Lp spaces since they are Banach spaces (as we will see shortly), allowing us to use a strong
theoretical framework.
A tricky special case is L8 ra, bs. The obvious definition is }x}p suptPra,bs |xptq|, but this is ambiguous
since an element of x corresponds not to a single function but to an equivalence class of functions that
differ on a set of measure zero. Therefore, we define the norm as
}x}8 inf
p qxptq
sup |y ptq| ess sup |y ptq|,
y t
almost everywhere
Pr s
t a,b Pr s
t a,b
T HEOREM 3.3.2 (Hölder Inequality ([1, Theorem 2.10.3, p. 32])). If p and q satisfy 1
p
1
q 1, 1 ¤ p ¤ 8,
1 ¤ q ¤ 8, and x P Lp ra, bs, y P Lq ra, bs, then
»b
xptqy ptq dt ¤ }x}p }y }q .
a
48 U 3. B ANACH S PACES
T HEOREM 3.3.3 (Minkowski Inequality ([1, Theorem 2.10.4, p. 33])). If x, y P Lpra, bs with 1 ¤ p ¤ 8, then
so is x y, and }x y }p ¤ }x}p }y }p .
Proof: Consider the function f ptq tp . Clearly this function is convex on r0, 8q for p ¥ 1 since f 1ptq
ptp1 and f 2 ptq ppp 1qtp2 ¥ 0 for p ¥ 0 and t ¥ 0. Define
aptq
|xptq| bptq
|yptq| λ
}x}p .
}x}p }y }p }x}p }y}p
Then
|xptq yptq|p p¤aq |xptq|p |yptq|p pλaptq p1 λqbptqqp p¤bq λpaptqqp p1 λqpbptqqp,
p}x}p }y}pqp p}x}p }y}pqp
where (a) follows from the triangle inequality and (b) follows from the convexity of tp . Then integrating
over ra, bs gives
}x y}p
p ¤ λ p1 λq 1,
}x}p }y}p
yielding the desired result. q
Definition 3.4.1 (Cauchy sequence). A sequence txn u in a normed space is a Cauchy sequence if for any
¡ 0, there exists an integer N such that }xm xn } for all n, m ¡ N .
Example 3.4.2. Show that the partial sums over the vector space Q
ķ
sn p1qi 2i 4 1
i 0
Q UIZ T HEOREM 7 (includes [1, Lemma 2.11.1, p. 35]). Every convergent sequence is a Cauchy sequence. Fur-
thermore, every Cauchy sequence is bounded.
Proof: If a series txn u converges to x, then for every ¡ 0, there exists an N such that for every n ¡ N ,
}xn x} . Then we have
}xm xn} }pxm xq pxn xq} ¤ }xm x} }xn x} ¤ 2,
where the first inequality follows from the triangle inequality. If we set 1 2, we have shown that for
every 1 ¡ 0, there exists an N such that for every m, n ¡ N , }xm xn } 1 . Therefore, txn u is a Cauchy
sequence.
To show that every Cauchy sequence is bounded, pick an N such that }xn xN } 1 for n ¡ N . Then
for any n ¡ N ,
}xn} }xn xN xN } ¤ }xn xN } }xN } 1 }xN },
where the first inequality follows from the triangle inequality. Since }xn } }xN } for any n ¡ N , txnu is
bounded. q
Definition 3.4.3 (Banach space). A normed linear vector space X is complete if every Cauchy sequence in
X has a limit in X. A complete normed linear vector space is a Banach space.
Example 3.4.4 ([1, Example 2.11.2, p. 34]). Let X be the vector space of sequences of real numbers with
only finitely many nonzero components with the l8 norm and define
" *
xn 1, , ,
1
2
1
n1
, 0, 0, .
Show that txn u is a Cauchy sequence. How would you “complete” X to make this Cauchy sequence
convergent in X?
Solution: We have }xn xm } maxp1{n, 1{mq Ñ 0. Clearly txn u does not converge to an element of
X, but we can “complete” X by turning it into the vector space of all sequences of real numbers that
converge to zero. q
50 U 3. B ANACH S PACES
Solution: The function converges to xptq t1{2. Because x is unbounded, x R R1r0, 1s. However, from
Example 3.2.8, we see that x P L1 r0, 1s. q
Proof: First consider the case 1 ¤ p 8. Let txn u be a Cauchy sequence in lp . Then if xn tξinu,
1
8̧
|ξkn ξkm| |ξkn ξkm|p ¤
p
|ξin ξim|p Ñ 0,
1
p
i 1
so for each k, tξkn u is a Cauchy sequence and therefore converges to a limit ξk . We show that x tξk u P lp .
From Quiz Theorem 7, there exists some M such that }xn } ¤ M for all n, so
¸
j
|ξin|p ¤ }xn}p ¤ M p.
i 1
¸
j
|ξi|p ¤ M p.
i 1
so x P lp , and }x} ¤ M . We still must show that xn Ñ x. For any ¡ 0, there exists an N such that
¸
j
|ξin ξim|p ¤ }xn xm}p ¤
i 1
since txn u is a Cauchy sequence. Since the sum on the left is finite, the inequality holds as m Ñ 8:
¸
j
|ξin ξi|p ¤ }xn x|p ¤ .
i 1
3.4. Cauchy Sequences U 51
for n ¡ N , so xn Ñ x.
Now let txn u be a Cauchy sequence in l8 . Then |ξkn ξkm | ¤ }xn xm } Ñ 0, so for each k, as before,
tξknu is a Cauchy sequence and therefore converges to a limit ξk . Furthermore, this convergence is uniform
in k. We show that x tξk u P l8 . Since tξk u is Cauchy, from Quiz Theorem 7 there exists an M such that
for all n, }xn } ¤ M . Then |ξkn | ¤ }xn } ¤ M for all k and n, so x P l8 and }bx} ¤ M . Since ξkn Ñ ξk
uniformly, xn Ñ x. q
Example 3.4.6 ([1, Example 2.11.3, p. 35]). Show that C r0, 1s is a Banach space.
Solution: Let txn u be a Cauchy sequence in C r0, 1s. For each fixed t P r0, 1s, |xn ptq xm ptq| ¤a
}xn xm} Ñ
0, so txn ptqu is a Cauchy sequence of real numbers. Since E (R with the Euclidean norm px y q2
|x y|) is complete, there exists a real number xptq such that xnptq Ñ xptq. Therefore, the functions xn
converge pointwise to x. This pontwise convergence is actually uniform in t P r0, 1s; that is, for any ¡ 0,
there exists an N such that |xn ptq xptq| for all t P r0, 1s and n ¥ N . For ¡ 0, choose N such that
}xn xm} {2 for n, m ¡ N , which we can do since txnu is a Cauchy sequence. Then for n ¡ N ,
|xnptq xptq| ¤ |xnptq xmptq| |xmptq xptq| ¤ loooomoooon
}xn xm} }xm x}.
¤{2
By choosing m sufficiently large, we can make the second term less than {2 since xm Ñ x pointwise, so
|xnptq xptq| for n ¡ N .
We must still show that x is continuous and therefore in C r0, 1s and that xn Ñ x in the norm of C r0, 1s.
To prove continuity, choose ¡ 0. For all t, δ, n,
Example 3.4.7 ([1, Example 2.11.1, p. 35]). Show that the space of continuous functions on r0, 1s, with the
norm » 1
}x} |xptq| dt,
0
is not a Banach space.
52 U 3. B ANACH S PACES
Solution: We must construct a Cauchy sequence of functions that convergence to a discontinuous func-
tion. Consider the sequence txn u, where
$
'
'
&0 0¤t n1 1
2
xn ptq nt n
1 2 n ¤t 2
1 1 1 .
'
'
2
%1 t ¥ 12
which clearly goes to zero as n, m Ñ 8. Therefore, txnu is a Cauchy sequence. However, xn Ñ x, where
x is the discontinuous function #
0¤t 1
xptq
0 2
.
1 1
2 ¤t¤1
Therefore, the given space is not a Banach space. q
T HEOREM 3.4.8 ([1, Theorem 2.12.1, p. 38]). In a Banach space X, a subset S X is complete if and only if it is
closed.
Proof: If S is complete, then every Cauchy (and hence, by Quiz Theorem 7, convergent) sequence in S
converges to a point in S, so S is closed. Furthermore, since X is a Banach space, any Cauchy sequence in
S must converge to a point in x P X, and if S is closed, then x P S. Hence, if S is closed, then S must be
complete. q
T HEOREM 3.4.9 ([1, Theorem 2.12.2, p. 38]). In a normed linear vector space, any finite-dimensional subspace is
complete.
Proof: The proof is by induction on the dimension of the subspace. A 1-dimensional subspace is complete
since all elements in the subspace can be written in the form x αe, where α is a real number and e is
3.5. Compactness and Extrema U 53
a fixed vector. Convergence of a sequence tαn eu is equivalent to convergence of the sequence of real
numbers tαn u; this sequence is convergent since E is complete.
Now assume that the theorem is true for subspaces of dimension N 1. Let X be a normed space and
let M be an N -dimensional subspace of X. We show that M is complete. Let te1 , . . . , eN u be a basis for
M . Define
Ņ
δk inf ek
a1 ,...,aN
αj ej ,
k 1, . . . , N.
j 1
j k
δk is the distance from ek to the subspace Mk generated by the remaining N 1 basis vectors. We must
have δk ¡ 0 otherwise a sequence of vectors in the (N 1)-dimensional subspace Mk could be constructed
converging to ek R Mk , which contradicts the induction hypothesis that Mk is complete.
Define δ mink δk and let txn u be a Cauchy sequence in M . Each xn has a unique representation
Ņ
xn λni ei .
i 1
Definition 3.5.1 (compact). A set K in a normed space X is compact if, given an arbitrary sequence txi u
in K, there is a subsequence txin u that converges to an element x P K.
T HEOREM 3.5.3 (extreme value theorem [1, Theorem 2.13.1, p. 40]). An upper semicontinuous functional on
a compact subset K of a normed linear vector space X achieves a maximum on K.
Proof: Let M supxPK f pxq (we allow the possibility that M 8). There is a sequence txi u in K such
that f pxi q Ñ M . Since K is compact, there is a convergent subsequence xin Ñ x P K. Clearly f pxin q Ñ M
and since f is upper semicontinuous, f pxq ¥ limnÑ8 f pxin q M . But since K is compact, f is bounded,
so M 8 and then f pxq M . q
Example 3.5.4 ([1, Example 2.13.1, p. 40]). Consider the linear functional
» 1 »1
f pxq xptq dt xptq dt,
2
1
0 2
defined for x P C r0, 1s. Show that f is continuous. What function on the unit sphere in C r0, 1s maximizes
f ? Is this function an element of C r0, 1s?
Solution: We have
» 1 »1
1
0 2
Then for every ¡ 0, there exists a δ such that }x x0 } δ implies |f pxq f px0 q| . The unit ball
in C r0, 1s is the set of continuous functions x such that 1 ¤ xptq ¤ 1 for all t P r0, 1s. The supremum of f
over the unit ball in C r0, 1s is clearly 1, but no continuous function x with }x} ¤ 1 achieves this supremum.
(If we instead considered L8 r0, 1s, then the function
#
0¤t 1
x pt q
1 2
1 1
2 ¤t¤1
achieves the supremum.) This shows that the unit ball in C r0, 1s is not compact. q
In fact, the unit ball is not compact in any infinite-dimensional normed linear vector space:
T HEOREM 3.5.5. In any infinite-dimensional normed linear vector space X, the unit ball is not compact.
Proof: We construct an infinite sequence of linearly independent unit vectors zn such that for all n, m,
}zn zm} ¡ 1{2. Let tz1, . . . , zn1u be a set of n 1 such vectors, and let Y be the closed subspace
generated by these vectors. Y is a proper subspace of X since X is not finite-dimensional. Therefore,
we can choose x P X zY and let d inf YPY }x y}. Since Y is finite-dimensional, it is complete by
Theorem 3.4.9. If d were zero, then there would exist a sequence of vectors in Y that converged to x R Y ,
contradicting our hypothesis that Y is finite-dimensional and hence complete. Therefore, d ¡ 0.
Choose some y0 P Y such that d }x y0 } 2d, and let
zn }xx yy0} .
0
3.6. Quotient Spaces U 55
The numerator is greater than d and the denominator is less than 2d, so the fraction is greater than 1{2.
Since the above equation holds for all y P Y , }zn zi } ¡ 1{2 for all i n. By repeating this process for
n ¥ 2, we construct an infinite sequence of vectors in the unit ball (specifically on the unit sphere), such
that }zn zm } ¡ 1{2 for all n, m. Thus, this sequence has no Cauchy subsequence and hence, by Quiz
Theorem 7, no convergent subseqeunce. Therefore, the unit ball is not compact. q
Definition 3.6.1 (equivalence). Let M be a subspace of a vector space X. Two elements x1 , x2 P X are
equivalent modulo M if x1 x2 P M . In this case, we write x1 x2 .
This equivalence relation partitions X into disjoint subsets, called cosets, of equivalent elements. Geo-
metrically, these cosets are linear varieties that are distinct translations of the subspace M . Each x P X
belongs to a unique coset of M , denoted rxs.
Definition 3.6.2 (quotient space). Let M be a subspace of a vector space X. The quotient space X {M
consists of all cosets of M with addition defined by rx1 s rx2 s rx1 x2 s and scalar multiplication
defined by arxs raxs.
Definition 3.6.3 (norm in a quotient space). If X is a Banach space and M is a closed subspace of X, then
the norm of a coset rxs P X {M is defined as
}rxs} minfPM }x m}
Example 3.6.4. Show that this definition satisfies the axioms for a norm.
paq pbq
minfPM }x y 2m} ¤ }rxs} }rys},
where (a) follows for the same reason as before and (b) follows by the fact that the norm } } in X obeys
the triangle inequality.
Positivity follows from the positivity of the norm } } in X. Clearly }r0s} 0. There remains the
possibility that rxs does not include 0 but that there is a sequence in rxs that converges to 0, in which case
}rxs} 0. However, this is impossible since M is closed, so }rxs} ¡ 0 if rxs r0s. Thus, we have shown
positive definiteness. q
Definition 3.7.1 (dense). A set D is dense in a normed space X if for each x P X and each ¡ 0, there
exists a d P D such that }x d} .
Note that if D is dense in X, there are points of D arbitrarily close to each x P X. Therefore, given x P X,
we can construct a sequence of elements in D that converges to x. Thus, an equivalent definition is: D is
dense in X if D̄ X; that is, if the closure of D equals X.
One of the most common examples of a dense set is the set Q in R. By the Weierstraß approximation
theorem (1.5.6), the set of all polynomials with rational coefficients is dense in C ra, bs.
Definition 3.7.2 (separable). A normed space is separable if it contains a countable dense set.
Proof: Let D be the set of all finitely nonzero sequences with rational components. Clearly, D is countable.
°
Let x tξi u P lp and fix ¡ 0. Since x P lp , 8i1 |ξi | 8, so there exists an N such that
p
8̧
|ξi|p 2 .
i N 1
For i 1, . . . , N , let ri be a rational such that |ξi ri|p {2N , and let d tr1, . . . , rN , 0, 0, . . . u. Then
d P D and
Ņ 8̧
}x d}p ¤ |ξi ri|p |ξi|p ¤ ,
i 1
i N 1
so D is dense in lp . q
3.7. Denseness and Separability U 57
Solution: Let D be the set of all polynomials of finite degree with rational coefficients. Clearly, D is
countable since it can be put into correspondence with the set of all finitely nonzero sequences with
rational components. Given x P C ra, bs and ¡ 0, by the Weierstraß approximation theorem (Theorem
1.5.6), there exists a polynomial p such that |xptq pptq| {2 for all t P ra, bs. While p does not necessarily
have rational coefficients, we can construct another polynomial r that does, such that |pptq rptq| {2
N
for all t P ra, bs. We can easily construct r by changing each of the coefficients of p by less than 2N c ,
where N 1 deg p and c maxp|a|, |b|q. Then
1
N¸
|pptq rptq| cN ti ¤ 2 , a ¤ t ¤ b,
2N
i 0
as desired. Thus
}x r} tmax
Pra,bs
|xptq rptq| ¤ tmax
Pra,bs
|xptq pptq| max |pptq rptq| ¤ ,
Pr s
t a,b
Solution: Suppose we have found a countable dense subset txn u of l8 , where xn tξin u. (We can index
the sequence txn u by n since the sequence is countable.) Construct x tξi u by the following rule:
• If ξii 21 , set ξi 1.
• If ξii ¥ 21 , set ξi 0.
Solution: Consider the subset of indicator functions t1t 1xPr0,ts utPr0,1s P L8 . Clearly this set is uncount-
able since the set of real numbers in the interval r0, 1s is uncountable. Furthermore, }Is It } 1 if s t.
Therefore, the open balls tB1{2 pIt qutPr0,1s do not intersect. However, any dense set of L8 must include an
element in each of these balls; since the set of balls is uncountable, any dense set of L8 is uncountable.
Therefore L8 is not separable. q
58 U 3. B ANACH S PACES
C HAPTER 4
HILBERT SPACE
Definition 4.1.1 (pre-Hilbert space). A pre-Hilbert space is a linear vector space X over a field F along
with an inner product defined on X X, which associates with any two vectors x, y P X a scalar px, yq.
The inner product satisfies the following axioms:
2. sesquilinearity (linearity in the first variable): px y, zq px, zq py, zq and pλx, yq λ px, yq
for all x, y, z P X and all λ P F.
1. positive homogeneity:
b b
}αx} pαx, αxq paq pbq p cq
a a
α px, αxq αpαx, xq ααpx, xq |α|}x},
where (a) and (c) follow from sesquilinearity and (b) follows from conjugate symmetry.
2. triangle inequality:
}x y }2 px y, x yq
px, xq px, yq py, xq py, yq
¤ }x}2 2| px, yq | }y}2.
By the Cauchy-Schwarz inequality, (re)proved below, | px, yq | ¤ }x}}y}; substituting this into the
inequality and taking square roots gives }x y} ¤ }x} }y}.
59
60 U 4. H ILBERT S PACE
3. positive definiteness and positivity: This follows from the positivity and positive definiteness of
the inner product.
Proposition 4.1.2 (Cauchy-Schwarz Inequality ([1], Lemma 1, p. 47)). For all x, y in an inner product space,
| px, yq | ¤ }x}}y}. Equality holds if and only if x λy or if y 0.
0 ¤ px, xq
| px, yq |2 ùñ | px, yq | ¤ apx, xq py, yq }x}}y}.
py, yq q
Proposition 4.1.3 (parallelogram law ([1], Lemma 3, p. 48)). For any x, y in a pre-Hilbert space X,
Proof: In the general case, directly expand the norms in terms of the inner product. For X R2 , we
can prove the parallelogram law using only the law of cosines. Define the inner product px | yq as the
dot product x y }x}2 }y}2 cos θ, where θ is the angle between the vectors x and y. Then by the law of
cosines,
Proposition 4.1.4 (continuity of the inner product ([1], Lemma 4, p. 49)). Suppose that xn Ñ x and yn Ñ y
in a pre-Hilbert space. Then pxn , yn q Ñ px, yq.
Proof: Because the sequence txn u is convergent, it is bounded; say }xn } ¤ M . Then
| pxn, ynq px, yq | | pxn, ynq pxn, yq pxn, yq px, yq | ¤ | pxn, yn yq | | pxn x, yq |.
By the Cauchy-Schwarz inequality,
Definition 4.2.1 (orthogonality). In a pre-Hilbert space, two vectors x, y are said to be orthogonal, written
x K y, if px, yq 0. A vector is said to be orthogonal to a set S, written x K S, if x K s for all s P S.
Consider the following optimization problem in a pre-Hilbert space: given a vector x in a pre-Hilbert
space X and a subspace M in X, find arg minmPM }x m}. If x P M , then the solution is trivial, but if
x R M , we must answer three questions:
1. Is there a vector m P M that minimizes }x m}, or is there no m that is at least as good as the
others?
T HEOREM 4.2.2 (Projection Theorem: pre-Hilbert space ([1], Theorem 1, p. 50)). Let X be a pre-Hilbert space
and M be a subspace of X, and let x P X. If there exists a vector m0 P M such that }x m0 } ¤ }x m} for all
m P M , then m0 is unique. Furthermore, m0 arg minmPM }x m} if and only if x m0 K M .
Proof: First we show that if m0 arg minmPM }x m}, then x m0 K M . Suppose for contradiction
that there exists an m P M such that x m0 M m. Without loss of generality, assume that }m} 1 and
px m0 | mq δ 0. Define m1 m0 δm. Then
}x m1}2 }x m0 δm}2
}x m0}2 px m0, δmq pδm, x m0q |δ|2
}x m0}2 |δ|2 }x m0}2.
so if x m0 M M , m0 is not a minimizing vector.
Now we show that if x m0 K M , then m0 arg minmPM }x m} and m0 is unique. For any m P M ,
Example 4.2.3. Let X be the space of infinite sequences, with the l2 -norm and the inner product px | yq
°8
i1 ξi ηi for x tξi u, y tηi u P X. Let x 1, 2 , 3 , . . . , and let M be the subspace of sequences with
1 1
only finitely many nonzero terms such that all even terms equal zero. Show that the minimizing vector
m0 does not exist in M .
Solution: Let m0 tηiu be a candidate for the minimizing vector, where η2k 0 for k P N. Then
8̧
2
}x m0} 2 1
i
ηi ,
i 1
so the best we can do is set η2k1 2k11 for k n 8. Consider the sequence tmnu, where mn tηiu
with η2k 0 and η2k1 2k11 for k P N. Then
8̧
2 8̧
2
1 π2
8̧
2
}x mn} 2 1
2k
1
2k 1
4 6
1
2k 1
.
k 1
k N
k N
Because mn is the best we can do among infinite series with n 1 nonzero terms, a minimizing vector
does not exist since }x mn } ¡ }x mn1 } for all n. The problem is that mn Ñ m0 , but m0 P M since M
is not complete. q
To fix the problem, we have the following definition.
Definition 4.2.4 (Hilbert space). A Hilbert space is a complete pre-Hilbert space. That is, a Hilbert space
is a Banach space with an inner product that induces the norm.
Q UIZ T HEOREM 10 (Projection Theorem: Hilbert space ([1], Theorem 2, p. 51)). Let H be a Hilbert space and
M be a closed subspace of H. For any x P H, there exists a unique vector m0 P M such that }x m0 } ¤ }x m}
for all m P M . Furthermore, m0 arg minmPM }x m} if and only if x m0 K M .
Proof: Uniqueness and orthogonality have already been established in the proof of Theorem 4.2.2, so it
suffices to establish the existence of the minimizing vector m0 .
If x P M , then m0 x and the theorem is trivial. If x R M , define δ inf mPM }x m}. We want to
find an m0 P M such that }x m0 } δ. Let tmi u be a sequence of vectors in M such that }x mi } Ñ δ.
By the parallelogram law (4.1.3),
or
mj 2
}mj mi} 2}mj x}
2 2
2}x mi } 2
4 x mi
2 .
Therefore tmi u is a Cauchy sequence and since M is a closed subspace of a complete space, tmi u has a
limit tm0 u in M . By continuity of the norm, which follows from continuity of the inner product (4.1.4),
}x m0} δ. q
Solution: S K tpx, y, zq : x y z 0u, the plane through the origin perpendicular to the vector p1, 1, 1q.
Then
Proposition 4.3.3 (([1], Proposition 1, p. 52)). Let S and T be subsets of a Hilbert space. Then
1. S K is a closed subspace.
2. S S KK.
3. If S T , then T K S K .
4. S KKK S K .
q
64 U 4. H ILBERT S PACE
We want to show that given any closed subspace S of a Hilbert space H, any vector x can be written
as the sum of of a vector m P M and a vector n P M T . First we need the following definition.
Definition 4.3.4 (direct sum). A vector space X is the direct sum of two vector spaces M and N , written
M ` N , if every vector x P X has a unique representation of the form x m n, where m P M and
n P N.
Example 4.3.5. Show that R3 is the sum of the xy-plane and the xz-plane, but not the direct sum.
T HEOREM 4.3.6 (([1], Theorem 1, p. 53)). If M is a closed linear subspace of a Hilbert space H, then H
M ` M K and M M KK .
Proof: Let x P H. By the projection theorem, there is a unique vector m0 P M such that }xm0 } ¤ }xm}
for all m P M , and we can write n0 bx m0 P M K . Therefore, we can write x m0 n0 with m0 P M
and n0 P M K . To show that this representation is unique, suppose x m1 n1 with m1 P M and
n1 P M K . Then 0 m1 m0 n1 n0 . m1 m0 and n1 n0 are orthogonal, so by the Pythagorean
theorem, 0 }m1 m0 }2 }n1 n0 }2 , so by positive definiteness of the norm, m0 m1 and n0 n1 .
Therefore, H M ` M K .
Now we show that M M KK . By Part (2) of Proposition 4.3.3, M M KK . To show the other
direction, let x P M KK . By the above result, x m n where m P M KK and n P M KKK M K , where
M KKK M K by Part (4) of Proposition 4.3.3. Since x, m P M KK , x m n P M KK . But n P M K as
well, so pn, nq }n}2 0, which by positive definiteness of the inner product implies n 0. Thus,
x m P M , so M KK M and we conclude M M KK . q
Definition 4.4.1 (orthogonal set). A set S of vectors in a pre-Hilbert space is an orthogonal set if x K y for
all x, y P S such that x y. If }x} 1 for all x P S, then the set is orthonormal.
Proposition 4.4.2 (([1], Proposition 1, p. 53)). An orthogonal set of nonzero vectors is a linearly independent set.
Proof: Suppose tx1 , . . . , xn u is a finite subset of the given orthogonal set and that there are scalars α1 , . . . , αn
°
such that ni1 αi xi 0. Then
ņ
α i xi , xj αj pxj , xj q p0, xj q 0, j 1, . . . , n.
i 1
4.4. The Gram-Schmidt Procedure U 65
T HEOREM 4.4.3 (Gram-Schmidt Process). Let txi u be a countable sequence of linearly independent vectors in a
pre-Hilbert space X. Consider the vectors tei u defined by
i¸1
zi xi pxi, ej q ej ei }zzi}
j 1 i
for all i. Then tei u is an orthonormal sequence and for each n, rtx1 , . . . , xn us rte1 , . . . , en us.
Proof: The proof of the first part is by induction. For the case n 1, te1 u is an orthonormal sequence.
For general n, assume that te1 , . . . , en u is an orthonormal sequence and prove that te1 , . . . , en 1 u is also
an orthonormal sequence. Because te1 , . . . , en u is an orthonormal sequence, it suffices to show that en 1
is orthogonal to each of e1 , . . . , en and that }en 1 } 1. Taking inner products,
ņ i¸1
pz n 1 | z i q xn 1 px n 1, ej q ej , xi pxi, ek q ek
j 1 k 1
i¸1 ņ i¸1
pxn 1 , xi q px n 1 , ek q pek , xi q p xn 1 , ej q pej , xi q p xn 1, ek q pek , xi q
k 1
j 1 k 1
px n 1, xi q p x n 1, xi q p xn 1, xi q pxn 1, xi q 0,
for i 1, . . . , n, where the last line follows from Parseval’s equality (??). Since en 1 zn 1{}zn 1},
en 1 K ei for i 1, . . . , n and }en 1} 1. Therefore, teiu is an orthonormal sequence.
To show that rtx1 , . . . , xn us rte1 , . . . , en us for all n, observe that en is a linear combination of
x1 , . . . , xn . Therefore, we can also write xn as a linear combination of e1 , . . . , en ; thus, a vector can be
written as a linear combination of x1 , . . . , xn if and only if it can be written as a linear combination of
e1 , . . . , en . q
Example 4.4.4. Let X be the space of polynomials p such that deg p ¤ 2. Define the inner product
»1
pp, qq pptqq ptq dt
1
for any p, q P X. Starting with the sequence t1, t, tu, construct an orthonormal basis for X.
66 U 4. H ILBERT S PACE
Solution: We have
z1 x1 1
e1
z1
}z1} ? 1
2
z2 x2 px2 , e1 q e1 t
c
e2 z2
}z2} 2
3
t
q
Suppose we have a vector x in a Hilbert space X and want to find the closest vector to x that lies
in the subspace M rty1 , . . . , ym us. We can use Theorem 10 and the Gram-Schmidt process to solve
this problem. If M is one-dimensional, we can set ei yi {}yi } for any i 1, . . . , n to find the solution
x px, ei q ei P M K .
for any v pv1 , v2 q, w pw1, w2q P X. Let M rty1us rtp1, 4qus. Find the vector in M that is closest
to y1 p5{2, 0q.
e1 y1
}y1} ? , ?2
1
.
2 5 5
Then the minimizing vector is
x px, e1 q e1 52 , 0 ?6 ? , ?2
1
4, ?2 . q
5 2 5 5 5
This technique works equally well if M is generated by more than one vector, provided that we start
by converting the set ty1 , . . . , yn u into an orthonormal basis te1 , . . . , em u. (Of course, if y1 , . . . , yn are not
linearly independent, m n.)
Note that we can write this problem as
ņ
max x
α1 ,...,αn
αi yi
i 1
ņ
x α i yi , yj 0, j 1, . . . , n.
i 1
Note that the objective function is quadratic in α1 , . . . , αn and that the constraints are linear in α1 , . . . , αn ,
4.5. Fourier Series U 67
so using Lagrange multipliers reduces this problem to solving a system of linear equations. More specifi-
cally, if we write
py 1 , y 1 q p y 1 , y n q α1
px, y1q
G G py 1 , . . . , y n q α . c
.. .. .. .. .. ,
. . . .
px n , x 1 q p xn , xn q αn px, ynq
then we can write the constraints as GT α c. These are called the normal equations. G is called the
Gram matrix. The normal equations have a unique solution if and only if det G 0.
Proposition 4.4.6 ([1], Proposition 1, p. 56). det G 0 if and only if y1 , . . . , yn are linearly independent.
Proof: Equivalently, we show that det G 0 if and only if y1 , . . . , yn are linearly dependent. If y1 , . . . , yn
°
are linearly dependent, then there exist β1 , . . . , βn not all equal to zero such that ni1 βi yi 0. Taking
inner products,
ņ ņ
β i yi , yj βi pyi , yj q 0, j 1, . . . , n,
i 1
i 1
or equivalently,
ņ
β i Gi 0 GTi p yi , y1 q p yi , yn q .
i 1
Therefore, the rows of the Gram matrix are linearly dependent, so det G 0.
If det G 0, then the rows of the Gram matrix are linearly dependent, so there exist β1 , . . . , βn not all
°
equal to zero such that ni1 βi pyi , yj q 0 for j 1, . . . , n. Therefore,
ņ
β i yi , yj 0, j 1, . . . , n,
i 1
so 2
ņ ņ ņ
βi βi yi , yj 0 ùñ
βi yi
0.
j 1 i 1
i 1
°n
Thus 0, so y1, . . . , yn are linearly dependent.
i 1 β i yi q
Note that if det G 0, solving the normal equations is equivalent to solving fewer than n equations in n
unknowns, so a solution exists, though it is not unique.
Now suppose that X is infinite-dimensional and that M is generated by a countably infinite set of linearly
independent vectors. (Recall from Example 2.3.7 that M does not necessarily equal X.) Assume that we
have constructed an orthonormal basis e1 , e2 , . . . for M using the Gram-Schmidt process. Then
8̧
x̂ px, eiq ei
i 1
68 U 4. H ILBERT S PACE
is the vector in M closest to x. However, this is an infinite series, so we need the following necessary and
sufficient condition for an infinite series of orthogonal vectors to converge in Hilbert space.
T HEOREM 4.5.1 ([1], Theorem 1, p. 59). Let tei u be an orthonormal sequence in a Hilbert space H. A series of
° °n
the form 8i1 ξi ei converges to an element x P H if and only if i1 |ξ | 8. In that case, ξi px, ei q for i ¥ 1.
2
°n °n
|ξ | 8 and define the partial sums sn
2
Proof: Suppose i 1
i 1 ξi ei . Then
2
m̧ m̧
}sn sm} 2
ξi ei
|ξi|2 Ñ 0
i n 1
i n 1
Proposition 4.5.2 (Bessel’s inequality ([1], Lemma 1, p. 59)). Let x be an element in a Hilbert space H and
suppose tei u is an orthonormal sequence in H. Then
8̧
| px, eiq |2 ¤ }x}2.
i 1
Then
ņ
|αi|2 ¤ }x}2
i 1
for all n; letting n Ñ 8 establishes the inequality. q
Example 4.5.3. Show that Bessel’s inequality holds even if tei u does not generate H.
Solution: Let X be the space of polynomials on r1, 1s and define the inner product
»1
pp, qq pptqq ptq dt
1
4.5. Fourier Series U 69
for any p, q P X. Let tei u be an orthonormal set of even polynomials and let xptq t. Clearly tei u are even
polynomials, so px, ei q 0 for i ¥ 0. Then
8̧
0 | px, eiq |2 ¤ }x}2 32 ,
i 1
°
Since Bessel’s inequality guarantees that 8 i1 | px, ei q | ¤ 8, Theorem 4.5.1 guarantees that the series
2
°8
i1 px, ei q ei converges to some element. Since any Hilbert space H is complete by definition, that
element is in H. Let S tei u be an infinite subset S of H; the subspace rS s include all finite linear
combinations of the vectors in S, but since we have an infinite sum, we are interested in the closure rS ¯ s of
rS s. For example, if S t1, t2, t4, . . . u, then cos t is in rS¯ s.
T HEOREM 4.5.4 ([1], Theorem 2, p. 60). Let x be an element in a Hilbert space H and suppose tei u is an or-
thonormal sequence in H. Then the series
8̧
px, eiq ei
i 1
Proof: q
Since any Hilbert space H is complete, the series converges to an element of H. The vectors ei are
an infinite subset S. The subspace rS s includes all finite linear combinations, but since we have an
infinite sum, we need the closure of this subspace.
If the closed subspace M is H itself, then the orthonormal sequence ei is called complete, and
Bessel’s inequality becomes an equality.
A useful criterion (proof left for homework): the orthonormal sequence ei is complete if and only if
the only vector orthogonal to each of the ei is the null vector θ.
2. Completeness: an example
The Hilbert space is H L2 r1, 1s. From the sequence t1, t, t2 , u it is straightforward to use
Gram-Schmidt to construct an orthonormal basis. Luenberger, p. 61, has an explicit formula, but we
do not need it.
To prove completeness, we assume (for a contradiction) the existence of a function f ptq that is or-
thogonal to each tn .
This element f ptq of H may be discontinuous or unbounded, but its antiderivative
»t
F ptq f pτ qdτ
1
is a continuous function of t.
Prove continuity
Prove that F p1q F p1q 0.
³1
1 t F ptqdt 0.
Using integration by parts, prove that n
Thus the continuous function F ptq is the zero function and the Lebesgue integrable function f ptq is
in the same equivalence class as the zero function, i.e. it is θ.
5. A finite-dimensional example
You are doing the term project for your course “Ethnic Sculpture.” You start with a log of length 21
meters. In the first week, you cut off u meters, carve a totem pole, and set it vertically in Harvard
Yard. In the second week, you cut off v meters, carve a totem pole, and set it vertically in Harvard
Yard. In the third week, you carve a totem pole from the remaining w meters, and set it vertically
in Harvard Yard. At the end of the fourth week, you take down the poles, having gained artistic
renown equal to 3u 2v w. You need renown of 46 to get an A in the course.
The energy cost of erecting a totem pole is proportional to the square of its height, so you want to
minimize ||x||2 u2 v 2 w2 .
Express the constraints in the form py |xq c.
The vectors y1 and y2 generate N K . Find the linear combination β1 y1 β2 y2 that satisfies both con-
straints.
What is the solution to this optimization problem?
x9 ptq 0.5xptq.
That will increase your money to
?e trillion in a decade, but it is not enough. You will need funds
from Congress at a rate uptq.
Constraints: xp0q 1, xp1q e 2 .
3
³1
Optimization problem: choose uptq to minimize the Hilbert space norm pu|uq 0 u2 ptqdt. The
solution lies in an infinite-dimensional vector space.
Secret: the codimension is 1. If we can express the constraint in the form pu|y q c, then the solution
is that uptq is in the space generated by y ptq, and we just have to determine one coefficient.
First you will have to learn something about linear differential equations that Luenberger takes for
granted.
What inhomogeneous differential equation does xptq satisfy?
What is a solution to the corresponding homogeneous differential equation?
Substitute xptq aptqx0ptq, where x0ptq is a solution to the homogeneous equation, and solve for
a9 ptq.
Find an expression for ap1q ap0q as an integral.
Express the constraint xp1q e 2 in terms of an inner product pu|y q c.
3
The optimal uptq lies in the space N K generated by y. Use the constraint to find the coefficient, and
the problem is solved!
Now we insist that the vector uptq must lie in the linear variety V of functions that cause the motor
to rotate through an angle of 1 radian in one second, starting at rest and ending at rest.
What small change would make V be a subspace N ?
What is the dimension of N and of V ?
The secret to solving this problem is that the codimension of N is 2. To prove this, we have to find
basis vectors y1 and y2 that are orthogonal to N , so that
³
py1|uq 01 y1ptquptqdt ωp1q 0
³
py2|uq 01 y2ptquptqdt θp1q 1.
We want to find the vector of minimum norm in V , the function uptq that minimizes the energy
³1
consumption 0 uptq2 dt. This will lie in the two-dimensional space spanned by y1 and y2 .
4.5. Fourier Series U 73
δ kinf
PK
||x k||
and construct a sequence such that ||x ki || converges to δ and so is Cauchy.
Show that ki is Cauchy.
Why is ki also convergent? What does this prove?
To prove uniqueness, assume that ||x k0 || ||x k1 || δ. Luenberger just does a special case of
the argument above, but for a little more fun, just repeat the parallelogram-law argument.
Now let’s assume that k0 has the property that
px k0|k k0q ¤ 0 for all k P K.
It is then easy to prove that k0 is the vector that minimizes the distance ||x k ||. Just expand
||x k||2 ||x k0 k0 k||2 in terms of dot products. This proof doesn’t even use the convexity
of k.
74 U 4. H ILBERT S PACE
Definition 5.1.2 (linear functional). A linear functional is a functional f : X Ñ R such that f pax byq
af pxq bf pyq for all x, y P X and a, b P R.
In a Hilbert space H, any vector y determines a linear function by the rule fy pxq py, xq.
• y1 p1, 1, 1q on E 3?
• y2 1, 12 , 13 , . . . on l2 ?
Solution: We have
8̧ 1 »1
fy1 pxq py1 , xq x1 x2 x3 .fy2 pxq py2, xq yi fy3 pxq py3 , xq et1 xptq dt. q
i 1
i 0
Recall that for the case p 2, q 2, the Hölder inequality (Quiz Theorem 5) reduced to the Cauchy-
Schwarz inequality (Propositions 3.1.2, 4.1.2), which in the inner product form was px, yq2 ¤ px, xq py, yq.
The following example will motivate some of the calculations in the duality proofs in this chapter.
°8 1{2 ξi converge?
Example 5.1.4. If x tξi u P l4{3 does
i 1i
8̧ 1 8̧ 1
8̧ 1
4 1{4 2
1{4
ξi ¤ |ξi | ¤ }x}4{3 }x}4{3 8.
π
1{ 2 1{ 2 1{2 6
q
i1 i1 i1
i i i
75
76 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM
Definition 5.1.5 (norm of a linear functional). A linear functional f on a normed space X is bounded if
there is a constant M such that |f pxq| ¤ M }x} for all x P X. The smallest such constant M is the norm of
f , denoted }f }; that is, }f } inf tM : |f pxq| ¤ M }x} for all x P X u.
Q UIZ T HEOREM 11 ([1], Proposition 2, p. 104). A linear functional on a normed space is continuous if and only
if it is bounded.
Proof: First we show that a linear functional f on a normed space X is continuous at a single point, then
it is continuous throughout X. Assume f is continuous at x0 P X, and let txn u Ñ x be a sequence in X.
By the linearity of f ,
|f pxnq f pxq| |f pxn x x0q f px0q|.
Since xn x x0 Ñ x0 and since f is continuous at x0 , f pxn x x0 q Ñ f px0 q. Thus, |f pxn q f pxq| Ñ 0,
so f is continuous at x. Since this holds for any x P X, f is continuous throughout X.
Suppose }f } ¤ M . Then if xn Ñ 0, |f pxn q| ¤ M }xn } Ñ 0, so f is continuous at x 0. By the above
result, f is continuous everywhere in X.
Now suppose that f is continuous anywhere in X, specifically at x 0. Then there exists a δ ¡ 0 such
that |f pxq| 1 for }x} ¤ δ. For any x 0 P X, δx{}x} has norm δ, so by using the linearity of f ,
|f pxq|
f δ }x} x
}x} f δ x
}x} ,
}}
x δ δ x δ
in l2 : clearly this function is discontinuous at x 0 since limxÑ0 f pxq 0 if we let ξk 1{kn and take the
limit as n Ñ 8, but limxÑ0 f pxq is undefined if we let ξk 1{n.
The linear functionals themselves may be regarded as elements of a vector space:
Definition 5.1.6 (algebraic dual). Given a vector space X, not necessarily normed, the linear functionals
on X form a vector space defined as follows:
• Given two linear functionals f1 and f2 on X, we define their sum f1 f2 as the linear functional on
X given by pf1 f2 qpxq f1 pxq f2 pxq.
• Given a linear functional f and a scalar a, we define the scalar product af as the linear functional
on X given by paf qpxq af pxq.
• The null element in the space of linear functionals is the functional f pxq 0 for all x P X.
However, not all linear functionals in the algebraic dual will be bounded, so we restrict our attention
to the subspace of the algebraic dual consisting of all bounded, or equivalently continuous, functionals
on a normed space X. The norm of a functional f in this subspace is still Definition 5.1.5. In fact, we can
write the norm of a functional in several ways:
}f } inf
M
tM : |f pxq| ¤ M }x} for all x P X u
sup |f}pxx}q|
x 0
sup |f pxq|
}x}¤1
sup |f pxq|.
}x}1
This leads to the following definition:
Definition 5.1.7 (normed dual). Let X be a normed linear vector space. The space of all bounded linear
functionals on X is the normed dual of X, denoted X . The norm of an element f P X is
}f } sup |f pxq|.
}x}¤1
If we explicitly treat a functional f as an element of the dual space, we can write f x . Then
f pxq x pxq xx, x y, where the symmetric notation x, y suggests that we are generalizing the inner
product p, q from Hilbert space.
Proof: By definition of the algebraic dual (Definition 5.1.6), X already satisfies the axioms of a vector
space. Therefore, we simply need to show that X is complete. Let tx u be a Cauchy sequence in X , so
}xn xm} Ñ 0 as n, m Ñ 0. Then for any x P X, txnu is a Cauchy sequence since |xnpxq xmpxq| ¤
}xn xm}}x}. Therefore, since R is complete, there exists a scalar xpxq such that xnpxq Ñ xpxq. We
must check that x P X ; that is, the functional x is linear and bounded. x is linear since
x pax byq lim xn pax byq a lim xn pxq b lim xn pyq ax pxq bx pyq.
nÑ8 n Ñ8 nÑ8
Furthermore, since txn u is a Cauchy sequence, for all ¡ 0, there exists an M such that |xn pxq xm pxq|
}x} for n, m ¡ M and all x. Since xn Ñ x , |x pxq xm pxq| }bx} for m ¡ M . Thus,
|xpxq| |xpxq xmpxq xm pxq| ¤ |x pxq x pxq| |xmpxq| ¤ p }xmq}x},
so x is bounded. q
78 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM
Guided by this theorem, we derive representations of the duals of some common dual spaces. These
representations will allow us to apply the theoretical framework of dual spaces to practical problems.
To find the dual of a normed space X, we start with a space whose elements are bounded linear
functionals f on X. We can calculate }f } from Definition 5.1.5. The challenge is to find a “representation”
for each functional by identifying it with an element }x } of a familiar normed space. For example, we
of coefficients pa, bq, combined with the evaluation
can identify any linear functional on E 2 with a vector ?
formula ax by and the Euclidean norm }pa, bq} a2 b2 . By analogy with the inner product, we
use xx, x y to denote application of the evaluation rule. Usually this is a “sum of products” or“integral
of products” rule. The hard part is usually to show that the norm }f } is equal to the norm }x } of the
representation, which was defined independently.
The proofs in general have four steps:
1. evaluation: state the rule for using x to evaluate the bounded linear functional f .
2. construction: state the rule for converting the bounded linear functional f to x .
3. alignment: invent a vector x P X that is aligned with a given representation vector x . “Aligned”
means that evaluating f pxq is essentially the same calculation as evaluating ||y ||||x||. If ||x|| 1, so
much the better. Now we know that ||f || ¥ ||y ||.
4. The “inequality” step: Prove, usually by invoking something like the Hoelder inequality, that |f pxq| ¤
||x||||y||. Applied to vectors with ||x|| 1, this guarantees that ||f || ¤ ||y||, so we conclude }f }
}x}.
1. The dual of E n is E n
This is easy, but it is a good introduction to what needs to be proved.
• Evaluation: show how to use a vector y pη1, η2, ηnq to define a bounded linear functional.
• Construction: give the rule for converting a bounded linear functional f to its representation y.
• Alignment: Show that the choice x y guarantees that ||f || ¥ ||y ||
• Inequality: show that if y represents f , then ||f || ¤ ||y ||
• Evaluation: show how to use a vector y pη1, η2, ηnq P l1 to define a bounded linear func-
tional.
• Construction: give the rule for converting a bounded linear functional f to its representation y.
• Alignment: Choose x P cn with ξi sgn ηi . Show that this choice makes |f pxq| ||x||8 ||y ||1 .
What goes wrong if we try to do this for x P l8 ?
Show that ||f || ¥ ||y ||1 .
• Inequality: show that if y represents f , then f pxq x, y ¡¤ ||x||||y||1, so that ||f || ¤ ||y||1.
Conclusion: ||f || ||y ||1
• Evaluation: any element y P H defines a bounded linear functional by the rule f pxq px|yq.
How do we know that this function is bounded?
• Construction:
First assume a bounded linear functional f . We want to find a vector y such that f pxq px|y q.
Consider the set N of all vectors n such that f pnq 0. Show that this set is a closed subspace.
If N H, then f is the zero functional and we take y θ. Otherwise, write H N ` N K .
Choose a vector z P N K with f pz q 1.
Consider any x P H. Show that x f pxqz P N .
Use the fact that (x f pxqz |z q 0 to construct y.
Show that y is unique.
• Alignment: choose x ||y|| . Show that ||f || ¥ ||y||.
y
³1
• Evaluation: given y ptq P Lq r0, 1s and xptq P Lp r0, 1s, define a functional by f pxq 0 xptqy ptqdt.
How do we know that it is bounded?
• Construction: Define the function us ptq 1 for 0 ¤ t s, 0 otherwise. For future use, evaluate
the norm of v ptq us h ptq us ptq.
Given a bounded linear functional f , define F psq f pusq Prove that F ps hq F psq Mh
1
p
for some M .
We now know that F psq is continuous. We cannot prove that it is differentiable. However, we
can prove that it is “absolutely continuous.” There are theorems, beyond the scope of our grasp
of Lebesgue integration, that say that an absolutely continuous function is differentiable except
on a set of measure zero. Now, wherever F is differentiable, take y ptq F 1 ptq as the function
that represents f . The values of y on the set of measure zero where F is not differentiable are
³1
irrelevant to the value of the Lebesgue integral f pxq 0 xptqy ptqdt. So we have identified the
bounded functional f with an equivalence class of functions.
• Alignment: choose
xptq |y ptq| p sgn y ptq.
q
We are done: we can convert between f and y, and we have proved that ||f || ||y ||q .
8. Minkowski functionals
This topic comes much later in the chapter, but it provides a geometric motivation for the sublinear
functionals that appear in the Hahn-Banach theorem.
Consider a convex set K in a normed space X that has the origin θ in its interior. You can think of
the boundary of K as the set of all points that can be reached from the origin for unit taxicab fare.
Then the Minkowski functional ppxq is the cab fare from the origin to x or equivalently, the factor by
which x must be scaled down to end up on the boundary of K. More formally,
Using the inequality g pm y q ¤ ppm y q, which holds for all m P M given our clever choice of g py q,
show that
g pm αy q ¤ ppm αy q for all α ¡ 0.
Using the inequality g pm y q ¤ ppm yq, which also holds for all m P M , show that gpm βyq ¤
ppm βy q for all β ¡ 0.
What about α 0?
We have now succeeded for one more dimension, since we have chosen g py q in such a way that
g pm αy q ¤ ppm αy q for all α and all m, and any vector in rM y s is of the form m αy.
A really simple example: In E 2 (Euclidean norm), let M be the line u v and define f pt, tq 2t on
M . What is the extension F pu, v q to all of E 2 ?
5.2. Common Dual Spaces U 85
Reading: [1, Sections 5.5 – 5.9, 5.11 – 5.13]. Also see [7, Chapter 2] for more on the Riesz representation
theorem.
Recall that C ra, bs is the Banach space of real-valued (uniformly) continuous functions x on ra, bs with
norm }x} suptPra,bs |xptq|. The challenge is to find a way of representing all linear functionals on C ra, bs
in a unified manner without using Dirac delta functions. This can be done elegantly by using the Stieltjes
integral.
Definition 6.1.1. Let f and g be real-valued, bounded functions on ra, bs that do not have a common point
of discontinuity. Take a partition of the interval a t0 tn b. Then if
n¸1
I maxpt limt qÑ0 f pti1 qpg pti q g pti1 qq
i 1 i
i 0
Example 6.1.2. Suppose that you use a logarithmic t-axis in presenting a graph of function f : i.e. you
graph f plog sq. Now the area under the graph no longer represents the integral. Show that the Stieltjes
³b
integral gives the right formula for approximating the integral a f psq ds and that this formula agrees with
the change-of-variable formula.
where we have the partition a s0 sn b, which agrees with the change-of-variable formula. q
87
88 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM
³1
Example 6.1.3. What is 0 f ptq dg ptq if:
• g ptq t?
• g is differentiable?
Solution: We have
³1 ³1
• 0 f ptq dg ptq 0 f ptq dt
³1 ³1
• 0 f ptq dg ptq 0 f ptqg 1 ptq dt
³1
• 0 f ptq dg ptq f p1{2q
³1 ³1
• 0 f ptq dg ptq 0 f ptq dt f p1{2q
³1
• 0 f ptq dg ptq E rT s
q
The Riesz representation theorem says that any bounded linear functional f on C ra, bs can be ex-
pressed as a Stieltjes integral
»b
f px q xptq dv ptq.
a
However, there is one caveat: the function v must be of bounded variation. Loosely speaking, TVpv q
³b
a |dv ptq| must be finite. For example, we could choose xptq sgn v ptq so that f pxq TVpxq. However,
such an x is clearly not in C ra, bs since it is not continuous, so we must introduce the following normed
linear space.
Definition 6.1.4. Let B ra, bs be the space of bounded functions x on ra, bs with norm }x} suptPra,bs |xptq|.
Note that C ra, bs is a subspace of B ra, bs.
Also recall that we defined the normed linear space BVra, bs of functions on ra, bs with norm TVpxq in
Example 2.4.6. Now we can state the Riesz represetnation theorem.
T HEOREM 6.1.5 (Riesz Representation Theorem ([1], Theorem 1, p. 113)). Let f be a bounded linear functional
on C ra, bs. Then there is a function v P BVra, bs such that for all x P C ra, bs,
»b
f px q xptq dv ptq
a
and }f } TVpv q. Conversely, every function v P BVra, bs defines a bounded linear functional in this way.
6.1. The Dual of C ra, bs U 89
Proof: Since C ra, bs is a subspace of B ra, bs, if f is a bounded linear functional on C ra, bs then, by the
Hahn-Banach theorem, there exists a linear functional F on B ra, bs which is an extension of f and has the
same norm. For any s P ra, bs, define the set of functions tus : s P ra, bsu, where for all t P ra, bs, us ptq 0
if s a and us ptq 1tPra,ss if s P pa, bs. Clearly, each us is in B ra, bs. Also define v psq F pus q, and let
a t0 tn b be a finite partition of ra, bs.
First we show that TVpv q ¤ }f }. Define i sgnpvptiq vpti1qq for i 1, . . . , n. Then
ņ ņ
|vptiq vpti1q| i pv pti q v pti1 qq
i 1 i 1
ņ
i pF puti q F puti1 qq
i 1
ņ
F i puti ut q
i 1 ,
i 1
where the last step follows from the linearity of F . By the Hahn-Banach theorem, }F } }f }. By definition,
}uti uti1 } 1 for only one i P t1, . . . , nu, so
ņ
i puti q 1.
uti1
i 1
Therefore, combining these results with Corollary ??,
ņ ņ
|vptiq vpti1q| ¤ } }
F
i puti
uti1
q }f }.
i 1
i 1
°
Then since TVpv q is the maximum of ni1 |v pti q v pti1 q| over all partitions of ra, bs and the above in-
equality holds for all partitions, TVpv q ¤ }f }.
Now we show that }f } ¤ TVpv q. For any x P C ra, bs, define
ņ
z pτ q xpti1 qputi pτ q uti1 pτ qq.
i 1
}z x} i,τ Prmax ,t s
|xpti1q xpτ q|.
t
i 1 i
Since x is uniformly continuous, this quantity goes to zero as the partition is made arbitrarily fine. Then
since F is continuous, F pz q Ñ F pxq, which in turn equals f pxq by the Hahn-Banach theorem. Finally,
ņ »b
F pz q xpti1 qpv pti q v pti1 qq Ñ xptq dv ptq,
i 1 a
Finally,
» b
p q p q ¤ }x}TVpvq;
x t dv t
a
Definition 6.1.6. Let NBVra, bs be the space of functions v on ra, bs of bounded variation such that v paq 0
and v is right-continuous on pa, bq. The norm is }v } TVpv q. Note that NBVra, bs is a subspace of BVra, bs.
Let the functional g pxq be represented by some element x P X ; we will use the notation g pxq xx, x y.
Conversely, any x P X defines a functional on X by f px q xx, x y.
f pax1 bx2 q xx, ax1 bx2 y a xx, x1 y b xx, x2 y af px1 q bf px2 q.
Also, |f px q| | xx, x y | ¤ }x}}x }, so }f } ¤ }x}. Furthermore, by Corollary ??, there exists a nonzero
x P X such that xx, x y }x}}x }, so in fact }f } }x}. q
Definition 6.2.2 (second dual). The space of all bounded linear functionals on X is denoted X and is
called the second dual of X.
Definition 6.2.3 (natural mapping). The mapping φ : X Ñ X defined by φpxq x such that xx, x y
xx, xy is called the natural mapping of X into X ; that is, φ maps elements of X into the functionals
they generate X .
Definition 6.2.4 (reflexivity). A normed linear space is said to be reflexive if X X ; that is, if the
natural mapping is surjective.
6.3. Alignment and Orthogonal Components U 91
Example 6.2.5. Give examples of spaces that are reflexive and not reflexive.
Solution: The lp spaces, with 1 p 8, are reflexive since lp lq with 1{p 1{q 1, as shown in Quiz
Theorem ??, so lp lq lp . l1 is not reflexive since the dual of l1 is l8 but the dual of l8 is not l1 . By the
Riesz-Fréchet theorem, any Hilbert space is reflexive. q
Recall that in a Hilbert space H, px, yq }x}}y} if and only if y ax for some constant a. We generalize
this idea below.
Definition 6.3.1 (alignment). A vector x P X is said to be aligned with a vector x P X if xx, xy
}x}}x}.
Example 6.3.2. Let x tξi u P lp and x tηiu P lq , with 1 p 8 and 1{p 1{q 1. Under what
condition are x and x aligned?
Solution: The condition for alignment is simply the condition for equality in the Hölder inequality:
|ξi|
|ηi|
1 1
q p
}x}p }x}q . q
Example 6.3.3 ([1], Example 1, p. 117). Let x P Lp ra, bs and x P Lq ra, bs, with 1 p 8 and 1{p 1{q 1.
Under what condition are x and x aligned?
Solution: As before, the condition for alignment is simply the condition for equality in the Hölder in-
equality:
xptq K psgn y ptqq|y ptq|q{p . q
Example 6.3.4 ([1], Example 2, p. 117). Let x P C ra, bs and let Γ tt P ra, bs : |xptq| }x}u. Under what
³b
conditions is a bounded linear functional x pxq a xptq dv ptq aligned with xptq?
Solution: We require that v vary only on Γ and that v is nondecreasing at t if xptq ¡ 0 and nonincreasing
at t if xptq 0. q
Recall that in a Hilbert space H, we defined the orthogonal complement of a set S as S K ty P H :
ps, yq 0 for all s P S u. We generalize this idea below.
Definition 6.3.5 (orthogonal complement). Let S be a subset of a normed linear space X. The orthogonal
complement or annihilator space of S is defined as S K tx P X : xs, x y 0 for all s P S u.
92 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM
Definition 6.3.6 (orthogonal complement of U in X). Let U be a subset of the dual space X . The orthog-
onal complement of U in X is defined as K U U K X X.
T HEOREM 6.3.7 ([1], Theorem 1, p. 118). Let M be a closed subspace of a normed space X. Then K rK M s M .
f p x mq
}f } sup
}x m} inf 1
}x m}
.
m M P P
m M
Example 6.4.1 (equivalent to [1], Example 1, p. 119). Cab drivers are charging only for the larger dimen-
sion traveled in R2 . Show that from a point pu, v q with v 0, there are many points pu1 , 0q that can be
reached for the same fare.
Solution: All the points pu1 , 0q with u1 P ru v, u v s can be reached for fare v. q
Example 6.4.2. Now cab drivers are charging for the total distance driven. Show that from a point pu, v q
with u v, there are many points pu1 , u1 q that can be reached for the same fare.
Solution: Without loss of generality, assume u v. Then all the points pu1 , u1 q with u1 P ru, vs can be
reached for fare v u. q
The following theorem guarantees neither existence nor uniqueness, but it is more general than the
projection theorem in that it specifies a dual problem for any minimum norm problem.
where the maximum is achieved for some x0 P M K. If the infimum is achieved for some m0 P M , then x0 is aligned
with x m0 .
6.4. Minimum Norm Problems U 93
Proof: For ¡ 0, let m P M satisfy }x m} ¤ d . Then for any x P M K such that }x} ¤ 1,
xx, xy xx, xy xloooooomoooooon
m , x y xx m , x y ¤ }x m }}x } ¤ d .
0
Since was arbitrary, we conclude xx, x y ¤ d. To complete the proof of the first part of the theorem, we
must find a x such that xx, x y d. Let N rx M s. We can represent an element n P N uniquely as
0 0
ax m, where m P M . Define the linear functional f on N by f pnq ad. Then
C OROLLARY 6.4.4. Let x be an element of a real normed linear space X and let M be a subspace of X. A vector
m0 P M satisfies }x m0 } ¤ }x m} for all m P M if and only if there is a nonzero vector x P M K aligned
with x m0 .
Proof: If }x m0 } ¤ }x m} for all m P M , then we have achieved the infimum in Theorem 6.4.3,
specifically at m0 , so the given x0 is aligned with x m0 .
If there is a nonzero vector x P M K aligned with x m0 , take }x } 1 without loss of generality. Then
for all m P M , xx, x y xx m, x0 y ¤ }x m} while xx, x y xx m0 , x0 y }x m0 }. Therefore,
}x m0} ¤ }x m} for all m P M . q
Example 6.4.5. Let X R2 with the l1 norm, M tpu, vq : u 2v 0u, and x p0, 3q. Determine d, m0,
and x .
Solution: We have M K rp1, 2qs, so K tx : }x } 1u tp1{2, 1qu. Then x0 arg maxx PK xx, x y
p1{2, 1q and d xx, x0 y 3. Then x m0 p0, 3q since x m0 and x0 must be aligned, so m0
p0, 0q. q
Example 6.4.6. Let X R2 with the l8 norm, M tpu, v q : u 2v 0u, and x p0, 3q. Determine d, m0 ,
and x . Note that the norm in X is the l1 norm since difficulties concerning the dual of l8 occur only
because of infinite dimensionality.
Solution: We have M K rp1, 2qs, so K tx : }x } 1u tp1{3, 2{3qu. Then x0 arg maxx PK xx, x y
p1{3, 2{3q and d xx, x0 y 2. Then x m0 p2, 2q since x m0 and x0 must be aligned, so
m0 p2, 1q. q
94 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM
Example 6.4.7. Let X R2 with the l1 norm, M tpu, vq : 2u v 0u, and x p1, 2q. Determine d, m0,
and x .
Solution: We have M K rp2, 1qs, so K tx : }x } 1u tp1, 1{2qu. Then x0 arg maxx PK xx, x y
p1, 1{2q and d xx, x0 y 2. Then x m0 p2, 0q since x m0 and x0 must be aligned, so
m0 p1, 2q. q
Theorem 6.4.3 simply proved the existence of a solution to the dual problem in X : the theorem below
proves the existence of a solution to the minimum norm problem in X.
where the minimum is achieved for some m0 P M K. If the supremum is achieved for some x0 P M , then x0 m0
is aligned with x0 .
Proof: Define
}x}M sup xx, xy
P } }1
x M x
since it is the norm of the functional x restricted to the subspace M . Then for any m P M K,
}x m} sup xpx m q sup pxx, xy xx, bmyq
}x}¤1 }x}¤1
¥ sup pxx, xy xx, bmyq sup xx, xy ¥ }x}M .
P } }¤1
x M, x P } }¤1
x M, x
Since m was arbitrary, we conclude }x m } ¤ }x }M . To complete the proof of the first part of the
theorem, we must find a m0 such that }x m } }x }M . Consider x restricted to M . The norm of
x so restricted is }x }M . By the Hahn-Banach theorem, extend x restricted on M to y on X. Then
}y} }x}M and y x on M so y x 0 on M . Set m0 x y; then m0 P M K and
}x m0 } }y} }x}M .
To prove the second part of the theorem, assume there exists an x0 P M such that xx0 , x y d. Since
the supremum is achieved, clearly }x0 } 1 and }x m0 } xx0 , x y xx0 , x m0 y, so x0 is aligned
with x m0 . q
Example 6.4.9. Let X R2 with the l1 norm, M tpu, vq : 2u v 0u, and x p4, 1q. Determine d, m0 ,
and x0 .
Solution: We have M rp1, 2qs, so K tx : }x} 1u tp1{3, 2{3qu. Then x0 arg supxPK xx, x y
p1{3, 2{3q and d xx, x0 y 2. Then x m0 p2, 2q since x m0 and x0 must be aligned, so
m0 p2, 1q. q
We can illustrate the duality between Theorems 6.4.3 and 6.4.8 more explicitly as follows: given a
normed linear space X and a subspace M , we have two ways of making a new vector space:
6.5. Applications U 95
• the quotient space X {M . An element of X {M is a coset rxs. Recall that the norm of such a coset is
}rxs} inf mPM }x m}. Motivated by this definition, let }x}M }rxs}. This notation is reasonable
if you think of it as answering the question: if I am allowed to add any element of M to x, how small
can I make the norm?
}x}M K
sup
K
xx, xy
x PM ,}x }1
Using this notation, Theorem 6.4.3 is }x}M }x}M K and Theorem 6.4.8 is }x}M K }x}M .
Example 6.4.10. Let X R3 with the l1 norm, M tpu, v, wq : u 2v 3w 0u, and x p7, 2, 3q. Note
that M rtp2, 1, 0q, p3, 0, 1qus. Determine d, m0 , and x0 .
Solution: We have M K rp1, 2, 3qs, so K tx : }x } 1u tp1{3, 2{3, 1qu. Then x0
arg maxx PK xx, x y p1{3, 2{3, 1q and d xx, bx0 y }x}M K 2. Then x m0 p0, 0, 2q since x m0
and x0 must be aligned, so m0 p7, 2, 1q. q
We will solve minimum norm problems using roughly the following three steps:
1. In characterizing optimal solutions, use the alignment properties of the normed linear space and its
dual. (Refer back to Examples 6.3.2, 6.3.3, and 6.3.4.)
2. Try to guarantee the existence of a solution by formulating the minimum norm problem in a dual
space.
3. Examine the dual problem to see if it is easier – that is, if it is a lower-dimensional problem or is
more transparent – than the original problem.
6.5 Applications
Q UIZ T HEOREM 12 (Tonelli ([1], Example 1, p. 122)). If f is continuous on ra, bs and p0 is the polynomial with
deg p0 ¤ n minimizing maxtPra,bs |f ptq pptq|, then |f ptq p0 ptq| achieves its maximum at at least n 2 points
on ra, bs.
Proof: First, we reformulate the problem as follows: in the Banach space C ra, bs, we want to find p P N
tpolynomials p : deg p ¤ nu such that }f p} is minimized. N is finite-dimensional (dimension n 1), so
an optimal solution p0 must exist.
Suppose d }f p0 } ¡ 0 and let Γ tt P ra, bs : |f ptq p0 ptq| du. By Theorem 6.4.3, the optimal
solution p0 must be such that f p0 is aligned with an element v P N K with }v } 1. Furthermore, by
Theorem 6.1.5, N K is a subspace of NBVra, bs. Therefore, recall from Example 6.3.4 that if f p0 is aligned
with v, then v varies only on Γ; that is, v consists of jump discontinuities at the points in Γ.
Now suppose that Γ contains m n 2 points a ¤ t1 tm ¤ b. Then the polynomial
±
q ptq ik pt ti q with k P t1, . . . , mu has degree m 1 ¤ n and is therefore in N . The Stieltjes integral
96 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM
³b
a q ptq dv ptq is a linear combination of q pt1 q, . . . , q ptm q with no zero coefficients since v consists of jump
discontinuities at the points in Γ. However, the integral cannot equal zero since q pti q 0 for i k and
q pti q 0 for i k, so v R N K . Therefore, f p0 is not aligned with any nonzero element of N K , so our
assumption that Γ contains less than n 2 points was false. q
Example 6.5.1. Let ra, bs r1, 1s, f ptq t2 , and n 1. Find p0 ptq.
Solution: Let p0 ptq a bt. By Theorem 12, |t2 a bt| achieves its maximum at at least 3 points on
r1, 1s. The maximum of |t2 a bt| could occur at t 1 (d |1 a b|), t 1 (d |1 a b|), or
t b{2 (d | b2 {4 a|). Because the coefficient on the t2 term (1) is positive, we know that 1 a b,
1 a b, and b2 {4 a will be positive for optimal p0 ptq. Setting these quantities equal to d gives a 1{2,
b 0, and d 1{2. q
Example 6.5.2. Let ra, bs r1, 1s, f ptq tk 1, and n k. Find p0 ptq.
Solution: Notice that the function cosppn 1qθq achieves its maximum absolute value of 1 precisely n 2
times on the interval r0, π s, specifically at the points tkθ{pn 1q : k 0, . . . , n 1u. Now cos θ P r1, 1s,
so set t cos θ and express cosppn 1qθq as a function of t to find p0 ptq. For example,
number of linear constraints. Treat the solution to the minimum norm problem as a vector x in the dual
space X . Theorem 6.4.8 then guarantees the existence of a solution to the minimum norm problem. We
can express the constraints in the form xy1 , x y c1 , . . . , xyn , x y cn with yi P X for i 1, . . . , n. Then
if M rtyi us and x̄ is a vector satisfying the constraints, we have
d min
xyi , x yci
}x} mmin
PM K
}x̄ m} sup xx, x̄y ,
P } }¤1
x M x
where the second equality follows from Theorem 6.4.8. Then because M is finite-dimensional (dimenson
n), so we can change the supremum to a maximum. Any vector in M can be written as a linear combina-
°
tion x ni1 ai yi or equivalently, if we set Y py1 , , yn q and a1 pa1 , , an q, as x Ya. Then if
we set c1 pc1 , , cn q,
d min
xyi , x yci
}x} }Ya
max xYa, x̄ y max c1 a.
}¤1 }Ya}¤1
°
Then by Theorem 6.4.8, the optimal x is aligned with x ni1 ai yi (compare to the equivalent result in
°
Hilbert space, where the optimal solution was proportional to x ni1 ai yi ).
C OROLLARY 6.5.3 ([1], Corollary 1, p. 123). Let y1 , . . . , yn P X and suppose the system of linear equalities
xyi, xy ci for i 1, . . . , n is consistent; that is
D tx P X : xyi, xy ci for i 1, . . . , nu ∅.
Then
min }x } max c1 a
x D
P }Ya}¤1
and the optimal x is aligned with the optimal Ya.
Example 6.5.4. You are taking the new Gen Ed course “Applied Cubism,” whose goal is to teach you
how to apply art to everyday life. As your term project, you are producing three large concrete cubes.
Initially these will serve as a podium for children who are receiving medals for their athletic prowess, but
they will live on as modern sculpture. The gold-medal cube has side u and provides artistic renown 9u.
The silver-medal cube has side v and provides artistic renown 4v. The gold-medal cube has side w and
provides artistic renown w. You must achieve artistic renown of 72 to earn an A in the course. How do
you accomplish this while minimizing the total volume d3 of concrete?
Solution: The solution x pu, v, wq lies in the space X l3 . We want to minimize }x }3 such that
xy, xy c, where y p9, 4, 1q P X l3{2 and c 72. We have
2
}ay}3{2 a 92
3 3
42
3
12
3
a 64{3 ¤ 1.
Then by Corollary 6.5.3,
d min
xy, x yc
}x} }Ya
max c1 a
}¤1
max 72a 2 62{3
a 64{3
¤
98 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM
and d3 288. Then because x and y must be aligned (there is only one constraint, so Ya in Corollary
6.5.3 reduces to a constant times y), we have x p6, 4, 2q.
We can check this result by setting up a Lagrangian:
Example 6.5.5. First you go to the McCain-Palin staff. They require 14 tons of moose jerky. Since they
favor small government, they want only 1 unit of labor left over to serve the snacks.
°
Solution: Note that at the start of Week k, we have ki1 ξi units of labor, for i 1, . . . , 5 (the start of Week
5 is the inauguration). Then the functional that gives the tons of snacks produced is y1 p4, 3, 2, 1, 0q and
the functional that gives the number of units of labor left for the inauguration is y2 p1, 1, 1, 1, 1q. The
optimal solution x must satisfy xy1 , x y c1 14 and xy2 , x y c2 1.
Since the optimal x is aligned with a vector in l1 , its entries must almost all be d. The exception is
that if the vector in l1 has an entry of zero in any position, then x can have any entry between d and d
in that position without destroying alignment.
The vector Ya can be written p4a1 a2 , 3a1 a2 , 2a1 a2 , a1 a2 , a2 q. Since we want a unit of labor
left over, the first three entries of x will be nonnegative, while the last two will be negative. Therefore,
in the aligned Ya, the first three entries will also be nonnegative, while the last two will be nonpositive.
Therefore, we can discard the absolute values, so }Ya}1 8a1 a2 . From Corollary 6.5.3,
d min
x PD
}x} }Ya
max c1 a 14a1
}¤1
a2
Example 6.5.6. Next you go to the Obama-Biden staff. They require 17 tons of brioche stuffed with brie.
Since they favor job creation, they want 3 units of labor left over to serve the snacks.
6.5. Applications U 99
Solution: Again we have y1 p4, 3, 2, 1, 0q P l1 and y2 p1, 1, 1, 1, 1q P l1 , though the optimal solution
x must now satisfy xy1 , x y c1 17 and xy2 , x y c2 3.
We repeat all of the steps as before, though now since we want 3 units of labor left over, either the
first three entries of x will be nonnegative while the last two will be negative, or the first four entries of
x will be nonnegative while the last one will be negative. Therefore, in the aligned Ya, either the first
three entries will be nonnegative while the last two will be nonpositive, or the first four entries will be
nonnegative while the last one will be nonpositive. In either case, we can discard the absolute values, so
}Ya}1 8a1 a2 or }Ya}1 10a1 3a2. From Corollary 6.5.3,
d min
x PD
}x} }Ya
max c1 a 17a1
}¤1
3a2
Now either η3 2a1 a2 0 or η4 a1 a2 0, which combined with the constraint }Ya}1 1 above,
yields either a1 1{6, a2 1{3, Ya p1{3, 1{6, 0, 1{6, 1{3q, and d 11{6 or a1 1{7, a2 1{7,
Ya p3{7, 2{7, 1{7, 0, 1{7q, and d 2. d is larger in the second case, so that case is optimal. Then since
x must be aligned with Ya, x p2, 2, 2, 1, 2q. q
Example 6.5.7 ([1], Example 2, p. 124). Consider selecting the field current uptq on r0, 1s to drive a motor
governed by θ:ptq θ9ptq uptq with initial conditions θp0q 0, θ9p0q 0, θp1q 1, and θ9p1q 0 such that
we minimize maxtPr0,1s |uptq|.
Proof: We could think of this problem as being formulated in C r0, 1s, but since this is not the dual of
any normed space, we are not guaranteed that a solution exists in C r0, 1s. Therefore, we instead take
X L1 r0, 1s, X L8 r0, 1s, and seek u P X of minimum norm.
As in Example ??, the constraints are
»1
xy1, uy y1 ptquptq dt θ9p1q 0
0
»1
xy2, uy y2 ptquptq dt θp1q 1,
0
This can happen only if uptq is equal to d everywhere and always has the same sign as a1 y1 ptq a2 y2 ptq.
Furthermore, the function a1 y1 ptq a2 y2 ptq is the sum of a constant and an exponential term, so it changes
sign on r0, 1s at most once; the same must be true of uptq.
100 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM
ùñ t 1 log
1
2
1
1
e
,
Example 6.5.8 ([1], Example 3, p. 125). Consider the problem of selecting the thrust program uptq for a
vertically ascending rocket-propelled vehicle, subject only to the forces of gravity and rocket thrust, in
order to reach a given altitude with minimum fuel expenditure.
Solution: Assuming fixed unit mass, unit gravity, amd zero initial conditions, the altitude xptq is gov-
erned by the differential equation
:ptq uptq 1,
x xp0q x9 p0q 0.
Integrate to give
»T
x9 pT q uptq dt T.
0
x pT q x9 psq ds uptq dt s ds
0 0 0
»T » s
» T » T
T2
uptq dt ds
2
uptq ds dt
0 0 0 t
»T 2
pT tquptq dt T2 .
0
(Note this gives xp0q 0.) Then for xpT q 1 we have the constraint
»T
T2
xy, uy y ptquptq dt 1
2
,
0
³T
where y ptq T t, and we want to minimize 0 |uptq| dt.
We could think of this problem as being formulated in L1 r0, T s, but since this is not the dual of any
normed space, we are not guaranteed that a solution exists in L1 r0, T s. Therefore, we instead take X
6.6. Hyperplanes and Linear Functionals U 101
subject to
»T
T2
pT tq dvptq 1 2
.
0
Because we cannot expend negative fuel, the physical meaning of v ptq is the total fuel consumption over
the interval r0, ts, while }v } is the total fuel consumption over the whole interval r0, T s.
From Corollary 6.5.3,
T2
d min }v } max a 1 .
}pT tqa}¤1 2
Then
}pT tqa} tmax
Pr0,T s
|pT tqa| T |a|,
so the optimal choice is a 1{T and d 1{T T {2. The optimal v must be aligned with pT tqa and,
from Example 6.3.4, can vary only at t 0. Thus, v is a step function. Differentiating d with respect to T
? ? ?
gives T 2 and d 2, so v 2 1rtPp0,?2ss . q
Definition 6.6.1 (hyperplane). A hyperplane H in a linear vector space X is a maximal proper linear
variety; that is, a linear variety H such that H X (proper), and if V is any linear variety containing H,
then either V X or V H (maximal).
The following proposition, stated without proof, gives us a more practical way to work with hyperplanes.
Proposition 6.6.2 ([1], Proposition 1, p. 129). Let H be a hyperplane in a linear vector space X. Then there is
a linear functional f on X and a constant c such that H tx : f pxq cu. Conversely, if f is a nonzero linear
functional on X, then the set tx : f pxq cu is a hyperplane in X.
Definition 6.6.3 (half-space). The four half-spaces associated with a hyperplane H are
tx : f pxq cu tx : f pxq ¤ cu
tx : f pxq ¡ cu tx : f pxq ¥ cu
T HEOREM 6.6.5 (Support Theorem ([1], Theorem 2, p. 133)). Given x in the boundary of a convex set K, there
exists a supporting hyperplane containing x.
Proof: If x is in the boundary of K, then there exists a sequence txk u in K̄ C such that xk Ñ x. Now for
any k, since xk R K̄, there exists a hyperplane Hk strictly separating xk from K. Let Hk be defined by the
functional fk ; that is, fk pxk q ¡ fk pyq for all y P K. Now we extract a subsequence so that fk Ñ f ; then in
the limit, f pxq ¡ f pyq for all y P K. f defines the desired separating hyperplane. q
T HEOREM 6.6.6 (Eidelheit Separation Theorem ([1], Theorem 3, p. 133)). Let K1 and K2 be convex sets in X
such that K̊1 ∅ and K2 X K̊1 ∅. Then there is a closed hyperplane H separating K1 and K2 , or equivalently,
there exists an x P X such that supxPK1 xx, x y ¤ inf xPK2 xx, x y. In other words, K1 and K2 lie in opposite
half-spaces determined by H.
Proof: Let K K1 K2 . Then K contains an interior point and 0 is not one of them. Recall from
Proposition 2.2.3 that K1 K2 is convex. Then by Theorem 6.6.5, there exists an x P X with x 0
such that xx, x y ¤ 0 for x P K. Setting x x1 x2 with x1 P K1 and x2 P K2 gives xx1 , x y ¤ xx2 , x y,
so there exists a real number c such that
CALCULUS OF VARIATIONS
7.1 Review
• Calculate B f {B u and B f {B v at 0.
• Introducing the Euclidean norm, show that the derivative does not exist.
Solution: We have
Bf δf p0; p1, 0qq lim 1 pf pα, 0q f p0, 0qq 0
Bu αÑ 0 α
Bf δf p0; p0, 1qq lim 1 pf p0, αq f p0, 0qq 0
Bv αÑ 0 α
Furthermore,
Note that the directional derivative is not a linear functional of h. In fact, the only linear function
δf p0; hq that could possibly work in this case, given that both partial derivatives evaluated at 0 equal
0, is δf p0; hq 0.
We now show that the derivative does not exist; that is, we show that there is no linear function
δf p0; hq of h such that
1
}h}Ñ0 }h}
lim pf phq f p0q δf p0; hqq 0.
First we convert to polar coordinates: h ph1, h2q pr cos θ, r sin θq. Then f phq r cos2 θ sin θ and
103
104 U 7. C ALCULUS OF VARIATIONS
for constants a, b. There exist no constants a, b such that the above quantity is zero for all θ, so the deriva-
tive does not exist. Intuitively, we can see from Figure 7.1.1 that f has a cusp at 0. q
Figure 7.1.1: On the left, we see that the function f in Example 7.1.1 has a cusp at 0. Furthermore, the
partial derivatives B f {B u (middle) and B f {B v (right) are clearly discontinuous at 0, so Proposition 7.1.2
does not apply.
The following proposition will be useful later. Note that it did not apply above since the partial deriva-
tives were not continuous at 0, as shown in Figure 7.1.1.
Proposition 7.1.2 ([1, Example 7.2.4, p. 172]). Let X E n and let f pxq be a functional on E n having continuous
partial derivatives with respect to each xi . Then
δf px; hq
ņ
Bf h .
i1
B xi i
Definition 7.2.1 (Gateaux differential). Let X be a vector space, Y be a normed space, and T : D Ñ Y a
transformation, with D X. Let x be a fixed element of D and h an arbitrary element of X. If the limit
exists, it is called the Gateaux differential of T at x with increment h. If the limit exists for all h P X, T is
said to be Gateaux differentiable at x.
Note that while we do require Y to be normed since we need to take limits in Y , we do not require X to
7.2. Gateaux and Fréchet Differentials U 105
be normed. Furthermore, note that if T is Gateaux differentiable at x, then δT px; hq is linear in h since
δT px; λhq lim p T px αλhq T pxqq λ lim pT p x γhq T pxqq λδT px; hq.
1 1
αÑ 0 α γ Ñ0 γ
Definition 7.2.2 (Fréchet differential). Let X and Y be normed spaces, and T : D Ñ Y a transformation,
with D X. If for fixed x P D and arbitrary h P X there exists a linear, continuous function δT px; hq of
h such that
1
}h}Ñ0 }h}
lim pT px hq T pxq δT px; hqq 0,
then T is said to be Fréchet differentiable at x and δT px; hq is called the Fréchet differential of T at x
with increment h.
In Example 7.1.1, we showed that the given function was not Fréchet differentiable at the origin.
from the definition. Then let T π{2, xptq sin t, and hptq cos t. Use the differential to approximate
» π
Solution: We have
»
1 T
lim pf px αhq f pxqq lim p xptq αhptqq2 xptq dt
1
αÑ 0 α αÑ 0 α 0
» »T
1 T
αlim
Ñ0 α 0
p 2αxptqhptq α2 h2 ptqq dt 2xptqhptq dt.
0
Furthermore,
» π » π » π
2 2
2
sin t dt 0.01 sin 2t dt
0 0
» π »π
2
sin2 t dt 0.005 sin s ds
0 0
π4 0.005 0.7954.
q
106 U 7. C ALCULUS OF VARIATIONS
Example 7.2.4 ([1, Examples 7.2.2, 7.2.5, p. 172 – 174]). Let X C r0, 1s. Calculate the Gateaux differential
of » 1
f p xq g pxptq, tq dt,
0
assuming that the partial derivative gx px, tq exists and is continuous with respect to x and t. Show that
this differential is also a Fréchet differential.
Now we show that the Gateaux differential is also a Fréchet differential. We have
» 1
|f px hq f pxq δf px; hq|
p p pq
g xt hptq, tq g pxptq, tq gx pxptq, tqhptqq dt .
0
Therefore,
1
}h}Ñ0 }h}
lim |f px hq f pxq δf px; hq| 0,
Definition 7.2.5 (relative minimum and maximum). Let f be a real-valued functional defined on a subset
Ω of a normed space X. A point x0 P Ω is a relative or local minimum of f on Ω if there is an open ball
B px0 q such that f px0 q ¤ f pxq for all x P Ω X B px0 q. The relative or local maximum is defined similarly.
Q UIZ T HEOREM 13 ([1, Theorem 7.4.1, p. 178]). Let a real-valued functional f have a Gateaux differential on a
vector space X. A necessary condition for f to have an extremum at x0 P X is that δf px0 , hq 0 for all h P X.
Proof: For every h P X, the function g pαq f px0 αhq must achieve a extremum at α 0. By Fermat’s
7.3. Euler-Lagrange Equations U 107
Definition 7.3.1 (Dra, bs). The normed linear space Dra, bs consists of all continuous and continuously
differentiable functions on ra, bs, with norm
}x} tmax
Pra,bs
|xptq| max |x9 ptq|.
Pr s
t a,b
Q UIZ T HEOREM 14 (Euler-Lagrange ([1, Section 7.5, p. 179 – 180])). Consider finding a function x P Drt1 , t2 s
minimizing a functional of the form
» t2
J f pxptq, x9 ptq, tq dt,
t1
9 and t and has continuous partial derivatives with respect to x and x.9
where f is continuous with respect to x, x,
Assume that xpt1 q c1 and xpt2 q c2 for constants c1 , c2 . Show that x0 arg minx J is a solution to the
Euler-Lagrange equation
fx pxptq, x9 ptq, tq fx9 pxptq, x9 ptq, tq 0.
d
dt
Proof: Our admissible set is the subset of Drt1 , t2 s with xpt1 q and xpt2 q fixed. Given an admissible vector
x, we also consider vectors of the form x h that are admissible. The set of all h such that x h is
admissible is called the class of admissible variations. Clearly, hpt1 q hpt2 q 0 for such h.
The Gateaux differential of J is
» t2
αh, x9 αh9 , tq dt
δJ px; hq f px
d
.
dα t1 α 0
By our assumptions on f , we can interchange the order of integration and differentiation:
» t2 » t2
δJ px; hq 9 tqhptq dt
fx px, x, 9 tqh9 ptq dt.
fx9 px, x,
t1 t1
If we assume that fx9 has a continuous derivative with respect to t, then we can rewrite the second integral
using integration by parts:
» t2 » t2
9 9 tqh9 ptq dt fxpx, x,9 tqhptq|tt
fx px, x, 9
2
1
d
dt
9 tqhptq dt,
fx9 px, x,
t1 t1
108 U 7. C ALCULUS OF VARIATIONS
so » t2
δJ px; hq 9 tq
fx px, x, 9 tq hptq dt
fx9 px, x, 9 tqhptq|tt21 .
fx9 px, x,
d
t1 dt
For all admissible variations h, hpt1 q hpt2 q 0, so the second term equals zero. By Quiz Theorem 13, a
necessary condition on x0 arg minx J is that it must satisfy
» t2
δJ px; hq fx px, x,
9 tq fx9 px, x,
9 tq hptq dt 0
d
t1 dt
fx px, x,
9 tq fx9 px, x,
9 tq 0.
d
dt
³t
We prove that if g ptq is continuous on rt1 , t2 s and t12 g ptqhptq dt 0 for all h P Drt1 , t2 s with hpt1 q
hpt2 q 0, then g ptq 0 on rt1 , t2 s. Assume g ptq is nonzero, say positive, for some t P rt1 , t2 s. Then since g
is continuous, it is positive on some interval rs1 , s2 s rt1 , t2 s. Let
#
hptq
p t s 1 q2 ps 2 t q2 s1 ¤ t ¤ s2 .
0 otherwise
But then » t2
aptqhptq dt ¡ 0,
t1
contradicting our claim that this integral evaluates to zero. Therefore, g ptq 0 on rt1 , t2 s. Noting that
9 tq
fx px, x, 9 tq
fx9 px, x,
d
dt
is continuous, we apply this result to obtain the Euler-Lagrange equation. q
In fact, we do not need to assume that the derivative of fx9 with respect to t is continuous. See [1, Lemmas
7.5.1 – 7.5.3] for an alternative derivation.
Example 7.3.2. For a mass on a spring, if we set all coefficients equal to 1, the difference between kinetic
9 tq 21 px9 2 x2q. Find the function x that minimizes
energy and potential energy (the Lagrangian) is f px, x,
» π » π
9 tq dt
f px, x,
1
px9 ptq2 xptq2q dt
2 2
J
0 0 2
fx px, x,
9 tq fx9 px, x,
9 tq x : 0.
d
x
dt
The general solution to this equation is xptq A cos t B sin t. From the initial conditions, we have A 1
7.3. Euler-Lagrange Equations U 109
Example 7.3.3. The balance in your campaign chest is xptq, and it grows at a rate equal to xptq by ac-
cumulating interest. If you want a different growth rate x9 ptq, you must make contributions at a rate
uptq x9 ptq xptq. Find the function x that minimizes
»1 »1 »1
J puptqq 2
dt 9 tq dt
f px, x, px9 ptq xptqq2 dt
0 0 0
9 tq
fx px, x, 9 tq 2px9 xq 2p:x x9 q 2px x:q 0.
fx9 px, x,
d
dt
The general solution to this equation is xptq Aet Bet . From the initial conditions, we have A
1 e2 {p1 eq and B e2 {p1 eq, so
et .
e2 e2
xptq 1 t
e
1 e 1 e
uptq x9 ptq xptq pAet Bet q pAet Bet q 2Bet 12e e et.
2
q
Example 7.3.4 ([1, Example 7.5.2, p. 182]). You want to determine your lifetime plan of investment and
expenditure that maximizes total enjoyment. Assume you have fixed savings S and know you will die in
T years. Further assume you have no descendants and are not a charitable person, so you plan to have
no savings left at time T . Your total capital xptq at time t grows at rate x9 ptq αxptq rptq, where α is
the interest rate you earn on your investments and rptq is your rate of expenditure. Your enjoyment is a
function U of your rate of expenditure rptq, so you want to maximize
»T »T »T
J eβt U prptqq dt 9 tq dt
f px, x, eβt U pαxptq x9 ptqq dt.
0 0 0
d βt 1
fx px, x,
9 tq fx9 px, x,
9 tq αeβt U 1 pαxptq x9 ptqq e U pαxptq x9 ptqq 0
d
dt dt
ùñ dt U pαxptq x9 ptqq pβ αqU 1pαxptq x9 ptqq.
d 1
?
Consider U prq 2 r. Then U 1 prq r1{2 and pU 1 q1 prq r2 . From the previous equation, we
have rptq rp0q expp2pα β qq (note pU 1 q1 prsq pU 1 q1 prqpU 1 q1 psq). We can now solve for xptq: the
calculations are messy and not worth repeating here. q
In several optimization problems, we require the optimal vector to satisfy some constraints. In the previ-
ous section, we considered functions that vanish on the endpoints of an interval. More generally, we op-
timize a functional f subject to n nonlinear constraints given in the implicit form g1 pxq 0, . . . , gn pxq 0.
We will assume throughout this section that f and g1 , . . . , gn are continuous and Fréchet differentiable on
the normed space X.
Definition 7.4.1 (regular point). A point x0 satisfying the constraints g1 pxq 0, . . . , gn pxq 0 is a regular
point of these constraints if the n linear functionals g11 px0 q, . . . , gn1 px0 q are linearly independent.
Recall that even though g1 , . . . , gn are nonlinear functionals, their Gateaux differentials are linear in the
increment h.
T HEOREM 7.4.2 ([1, Theorem 7.7.1, p. 187]). If x0 is an extremum of the functional f subject to the constraints
gi pxq 0 for i 1, . . . , n and if x0 is a regular point of these constraints, then δf px0 ; hq 0 for all h satisfying
δgi px0 ; hq 0 for i 1, . . . , n.
Proof (single constraint): Choose h P X such that δg px0 ; hq 0. Choose any vector y P X such that
δg px0 ; y q 0 (note that if x0 were not a regular point, we would not be able to do this).
Now set g px0 h φy q 0. This is a nonlinear function of and φ since x0 , H, and y are fixed. The
derivative of this function with respect to φ at , φ 0 is just δg px0 ; y q, which by assumption is nonzero.
Therefore, by the implicit function theorem, we can solve for φ as a function of in some neighborhood
of 0. Then
0 g px 0 h φy q lo
gopmo
x0oqn looomooon
δg px0 ; hq φδg px0 ; y q opq op}φy }q. (7.4.1)
0 0
Since δg px0 ; y q 0, there exist constants constants c1 , c2 ¡ 0 such that c1 |φ| ¤ |δg px0 ; y qφ| ¤ c2 |φ|, and
since y 0, there exist constants d1 , d2 ¡ 0 such that d1 }φy } ¤ |φ| ¤ d2 }φy }. Therefore, c1 d1 }φy } ¤
|δgpx0; yqφ| ¤ c2d2}φy}, so from (7.4.1), }φy} opq.
Now x0 h φy satisfies the constraint and is a function of . By Quiz Theorem 13, we must have
f p x0 φpqy q|0 0,
d
h
d
which implies δf px0 ; hq 0 since }φy } opq. q
7.4. Problems with Constraints U 111
Proposition 7.4.3 ([1, Lemma 7.7.1, p. 188]). Let f0 , f1 , . . . , fn be linear functionals on a vector space X and
suppose that f0 pxq 0 for every x P X satisfying fi pxq 0 for i 1, . . . , n. Then there exist scalars λ1 , . . . , λn
such that
ņ
f0 λi fi 0.
i 1
Proof: In E n , consider the subspace M formed by all points of the form pf1 pxq, . . . , fn pxqq. Define the
linear functional g on rM s by g pmq f0 pxq. By the Hahn-Banach theorem, we can extend g to λ defined
on all of E n . Furthermore, since pE n q E n , we can represent λ as pλ1 , . . . , λn . Thus, we have found
λ1 , . . . , λn such that
ņ
λi f1 pxq f0 pxq;
i 1
T HEOREM 7.4.4 (Lagrange Multipliers ([1, Theorem 7.7.2, p. 188 – 189])). If x0 is an extremum of the func-
tional f subject to the constraints gi pxq 0 for i 1, . . . , n and x0 is a regular point of these constraints, then
there exist scalars λi for i 1, . . . , n such that the functional
ņ
Lpxq f pxq λ i g i px q
i 1
for all h.
Example 7.4.5 (Dido’s Problem ([1, Example 7.7.1, p. 189])). Find the function x that maximizes
»1
xptq dt
1
subject to the arc length constraint
»1 a
x9 2 1 dt L
1
with xp1q xp1q 0.
fx px, x,
9 tq d
9 tq 1 λ dtd ? 2x9
fx9 px, x, 1 λ p1 x:x9 2q3{2 κ 0.
dt x9 1 looooomooooon
Note that κ is the curvature of a plane curve (see [8, Example 4.2.2, p. 90]). the above equation says that κ
is the constant 1{λ, so x must be the arc of a circle
Example 7.4.6. A linear functional on L4 r0, 1s is represented by a function y ptq that is non-negative on
r0, 1s. Find the function xptq that maximizes xx, yy subject to the constraint }x}4{3 1.
4 1{3
fx px, x,
9 tq fx9 px, x,
9 tq y 0,
d
λx
dt 3
e 3 3t dt e4t dt 1
4 1
0 0 4 4e4
and
3{4
xptq e3t .
1 1
q
4 4e4
Example 7.4.7. In many of the problems we solved in Hilbert space, the goal was to minimize the norm of
a vector x P L2 r0, 1s subject to the constraints pyi , xq ci for i 1, . . . , n. Using the calculus of variations,
show that x is a linear combination of the constraints.
7.4. Problems with Constraints U 113
Example 7.4.8. Find the function x P L4 r0, 1s of minimum norm such that
»1
t3 xptq dt 1.
0
fx px, x,
9 tq fx9 px, x,
9 tq 4x3 0,
d
λt3
dt
so xptq 5t4 . q
114 U 7. C ALCULUS OF VARIATIONS
C HAPTER 8
CONVEX FUNCTIONALS
Recall from single variable calculus that if a function(al) f satisfies f 2 pxq ¥ 0 on ra, bs, then any point x
where f 1 pxq 0 is not merely a local minimum but a global minimum. The following theorem helps us
generalize the second derivative test by relating the second derivative to convexity.
T HEOREM 8.1.1. If f pxq is twice differentiable on ra, bs, then it is convex on ra, bs if and only if f 2 pxq ¥ 0 on C¿
Proof: First we show the “if” direction: specifically, we show that if f is a convex function, then f 1 ptq is
increasing or constant for all x P C. Pick x1 , x2 P C such that x1 x2 . Then
f px 2 hq f px 2 q f px 1 hq f px 1 q
f 1 px2 q f 1 px1 q lim ,
hÑ0 h
f pλx1
p1 λqx2q f px1q ¡ p1 λqf px1q p1 λqf px2q f px2q f px1q ,
f 1 pz 1 q
p1 λqpx1 x2q p1 λqpx1 x2q x2 x1
where the inequality follows from our assumption. Similarly, there exists some z2 P pλx1 p1 λqx2 , x2 q
such that
f px2 q f pλx1 p1 λqx2 q λf px1 q λf px2 q f px 2 q f px 1 q
f 1 pz 2 q
λpx1 x2 q λ px 1 x 2 q x2 x1
,
but this is a contradiction since we chose z2 ¡ z1 and f 1 is increasing. Therefore, f must be convex. q
115
116 U 8. C ONVEX F UNCTIONALS
Definition 8.1.2 (convex functional). A real-valued functional f defined on a convex subset C of a linear
vector space is convex if
f pλx1 p1 λqx2 q ¤ λf px1 q p1 λqf px2 q
for all x1 , x2 P C and all λ P p0, 1q.
is convex.
Solution: We have
»1
f pλx1 p1 λqx2q pλx1ptq p1 λqx2ptqq2 dt
0
»1 »1 »1
λ 2
x21 ptq dt p 1 λq 2
x22 ptq dt λ p1 λ q p pq
x1 t p qq
x2 t 2 dt . q
0 0 loooooooooooooooooomoooooooooooooooooon
0
¥0
Q UIZ T HEOREM 15 ([1, Proposition 7.8.1, p. 191]). Let f be a convex functional defined on a convex set C in a
normed space. Let µ inf xPC f pxq. Then
Proof: First, suppose x1 , x2 P Ω. Then for x λx1 p1 λqx2 with λ P p0, 1q, we have f pxq ¤ λf px1 q
p1 λqf px2q µ. Furthermore, for any x P C, f pxq ¥ µ by definition. Therefore, f pxq µ.
Next, suppose N is a neighborhood about x0 in which x0 minimizes f . For any x P C with x x0 ,
there exists some λ P p0, 1q such that x1 λx0 p1 λqx P N . Then
paq pbq
f px 0 q ¤ f px1q ¤ λf px0q p1 λqf pxq,
where (a) follows from the fact that x0 is a local minimum and (b) follows from the convexity of f . Rear-
ranging terms, we have f px0 q ¤ f pxq for all x P C, so x0 is a global minimum. q
Definition 8.1.4 (epigraph). The epigraph of a convex functional f defined on a convex set C in a vector
space X is the set
rf, C s tpr, xq P R X : x P C, f pxq ¤ ru.
8.2. Conjugate Convex Functionals U 117
Solution: Pick px1 , y1 q and px2 , y2 q such that f px1 q ¤ y1 and f px2 q ¤ y2 . Then
Definition 8.2.1 (conjugate convex). Let f be a convex functional defined on a convex set C in a normed
space X. The conjugate set C is defined as
" *
C x P X : suppxx, x y f pxqq 8
xPC
Example 8.2.2. Let C r0, 1s and f pxq x2. Find C , f pxq, C , and f .
Figure 8.2.1: The epigraphs rf, C s and rf , C s in Example 8.2.2. The linear functionals mx for m 0, 1, 2
are shown on rf, C s, and the linear functionals xm for x 0, 1{2, 1 are shown on rf , C s.
Solution: Any element of the dual space can be represented by a real number m in the form xx, x y mx.
Figure 8.2.1 shows the convex sets rf, C s and rf , C s. From the graph and using elementary calculus, we
118 U 8. C ONVEX F UNCTIONALS
see that $
'
'
&0 m¤0
f px q f pmq m2
0 m 2.
' 4
'
%m 1 m¥2
Furthermore, f pmq is the vertical coordinate of the point where the hyperplane (line) tangent to the
graph of f and parallel to the subspace mx intersects the vertical axis. For m ¤ 0, clearly the tangent line
and mx are equal, so 0 f pmq. For m P p0, 2q, the tangent line is given by y 2x0 px x0 q x20 : for
m 2x0 , the vertical intercept of this line is m2 {4 f pmq. For m ¥ 2, the tangent line is given by
y mpx 1q 1: the vertical intercept of this line is pm 1q f pmq.
Is this simple case, X X, and any element of X can be represented by a real number x in the
form xx , x y xm. Clearly f pxq is undefined except on r0, 1s, so C r0, 1s. Using elementary
calculus, we see that f pxq x2 for x P p0, 1q. q
Example 8.2.3. Let C r1, 2s and f pxq 0. Find C , f pxq, C , and f .
Figure 8.2.2: The epigraphs rf, C s and rf , C s in Example 8.2.3. The linear functionals mx for m 1, 0, 1
are shown on rf, C s, and the linear functionals xm for x 1, 3{2, 2 are shown on rf , C s.
Solution: We construct C and f using the “support functional” approach suggested in the previous
example: construct the hyperplane, parallel to the subspace mx, that touches f at one or more points in
C and lies below the graph elsewhere. Then f pmq is the vertical intercept of the hyperplane. If no such
hyperplane exists, then m R C .
Any element of the dual space can be represented by a real number m in the form xx, x y mx.
Figure 8.2.2 shows the convex sets rf, C s and rf , C s. Clearly,
#
m 0
f px q f pmq
m
2m m¥0
and C R. Again, we use the support functional approach to find f and C : from Figure 8.2.2,
clearly f f and C C. q
8.2. Conjugate Convex Functionals U 119
Example 8.2.4. Let C E n and f pxq α p1 }x}pp for 1 p 8 and α ¡ 0. Find C and f .
The supremum exists since the sum inside the supremum is finite. Differentiating with respect to each ξi
gives
ηi α|ξi |p1 sgn ξi for i 1, . . . , n, (8.2.1)
so
ņ ņ
f px q ξi η i α|ξi |p α 1q |ξi|p.
1
i 1
p i1
Taking absolute values, exponentiating both sides of (8.2.1) by q, and using the identity p q pp 1q gives
|ηi|q αq |ξi|p; substituting this above gives
ņ
f px q α1q |ηi|q 1q α1q }x}q .
1
q
q i1
Find C and f .
Then »1
f py q suppxy, uy f puqq sup pyptquptq puptqqpq dt;
u CP u CP 0
fu pu, u,
9 tq d
dt
9 tq y pup1
fu9 pu, u, 0,
so y pup1. We check that y P Lq p0, 1q:
»1 »1 »1
pyptqq q
dt p puptqq p q dt
q q p 1
pq puptqqp dt 8
0 0 0
Q UIZ T HEOREM 17 (first part of [1, Proposition 7.10.1, p. 196]). The conjugate set C and the conjugate
functional f are convex.
Therefore, f is convex. Since C is the set of all x where f px q is defined, C is also convex. q
Proposition 8.2.5 (second part of [1, Proposition 7.10.1, p. 196]). rf , C s is a closed convex subset of R X.
Proof: From Quiz Theorem 17, rf , C s is convex. To show that it is closed, let tpsi , xi qu be a convergent
sequence in rf , C s with psi , xi q Ñ ps, x q. We must show that ps, x q P rf , C s. For every i and every
x P C,
paq
si ¥ f px q ¥ xx, xy f pxq,
where (a) follows by definition of the epigraph and (b) follows by definition of f . Taking the limit as
i Ñ 8 gives s ¥ xx, x y f pxq ¥ f px q for all x P C, so ps, x q P rf , C s. q
The following proposition confirms our intuition about recovering C and f from C and f , as we did
in the previous examples:
Proposition 8.2.6 ([1, Proposition 7.10.2, p. 197]). Let f be a convex functional on a convex set C in a normed
space X. If rf, C s is closed, then rf, C s rrf, C s s.
Example 8.2.7. Let C D̄2 (the closed unit disk in E 2) and let f pxq f pu, vq 0. Find C , f , C , and
f .
8.3. Conjugate Concave Functionals U 121
?
Solution: We actually found C R2 and f pa, bq a2 b2 in Example 1.3.2. By Proposition 8.2.6,
C C and f f since rf, C s D̄2 X R3 is closed. q
Definition 8.3.1 (concave functional). A functional g defined on a convex set D of a linear vector space is
concave if g is convex.
Note that the set D is still convex, not concave. Furthermore, a functional is linear if and only if it is both
convex and concave.
Just as we defined the epigraph for convex functionals, we define the hypograph for concave func-
tionals:
Definition 8.3.2 (hypograph). The hypograph of a concave functional g defined on a convex set D in a
vector space X is the set
rg, Ds tpr, xq P R X : x P D, f pxq ¥ ru.
The hypograph is a convex set: the proof is analogous to that in Example 8.1.5.
We can also define conjugate functionals and sets for concave functionals, just as we did for convex
functionals:
Definition 8.3.3 (conjugate concave). Let g be a concave functional defined on a convex set D in a normed
space X. The conjugate set D is defined as
" *
C x P X : inf pxx, x y f pxqq ¡ 8
xPD
Example 8.3.4. Let D r0, 1s and gpxq x2. Find D and g.
Solution: Any element of the dual space can be represented by a real number m in the form xx, x y mx.
Figure 8.3.1 shows the convex sets rg, Ds and rg , D s. From the graph and using elementary calculus, we
see that $
'
'
&0 m¥0
g px q g pmq m4 2 m 0 .
2
'
'
%m 1 m ¤ 2
and C R. From Figure 8.3.1, clearly g g and D D. Note that gpmq f pmq from Example
8.2.2. q
122 U 8. C ONVEX F UNCTIONALS
Figure 8.3.1: The hypographs rg, Ds and rg , D s in Example 8.3.4. The linear functionals mx for m
0, 1, 2 are shown on rg, Ds, and the linear functionals xm for x 0, 1{2, 1 are shown on rg , D s.
Figure 8.3.2: The hypographs rg, Ds and rg , D s in Example 8.3.5. The linear functionals mx for m
1, 0, 1 are shown on rg, Ds, and the linear functionals xm for x 1, 3{2, 2 are shown on rg, Ds.
Solution: Any element of the dual space can be represented by a real number m in the form xx, x y mx.
Figure 8.3.2 shows the convex sets rg, Ds and rg , D s. From the graph,
#
2m m 0
g px q g pmq
m m¥0
and D R. From Figure 8.3.2, clearly g g and D D. Note that gpmq f pmq from Example
8.2.3. q
8.4. Fenchel Duality U 123
Example 8.3.6. Let D p0, 8sn equipped with the Euclidean norm and
ņ
g px q α
1
ξp
p i1 i
Solution: The proof is nearly the same as in Example 8.2.4. If we represent an element x P D D as
tη1, . . . , ηnu, we have ņ
g px q α 1q 1
ηiq . q
q i1
Suppose that f is convex over convex domain C and g is concave over convex domain D. A standard
minimization problem is to find the minimum vertical distance
While g pxq usually equals zero, introducing g is helpful because it allows us to exploit duality: the mini-
mum vertical separation between the convex sets rf, C s and rg, Ds equals the maximum vertical separa-
tion between two parallel hyperplanes separating rf, C s and rg, Ds, as shown in Figure 8.4.1.
Figure 8.4.1: The minimum vertical separation between the convex sets rf, C s and rg, Ds equals the maxi-
mum vertical separation between two parallel hyperplanes separating rf, C s and rg, Ds.
Q UIZ T HEOREM 18 (Fenchel Duality Theorem ([1, Theorem 7.12.1, p. 201])). Suppose that f is convex over
C and g is concave over D, where C, D are convex sets in a normed space X. Assume that C X D contains points
124 U 8. C ONVEX F UNCTIONALS
in the relative interiors of C and D and that either rf, C s or rg, Ds has nonempty interior. Further assume that
Then
µ max
x PC XD
pgpxq f pxqq,
where the maximum on the right is achieved by some x0 P C X D .
If the infimum on the left is achieved by some x0 P C X D, then
so f pxq g pxq ¥ g px q f px q. Since this holds for all x P C X D and x P C X D ,
µ inf pf pxq gpxqq ¥ sup pg px q f px qq.
P X
x C D x PC XD
and similarly, rf µ, C s lies below the hyperplane but is also arbitrarily close to it, so
c suppxx, x0 y pf pxq µqq f px0 q µ, (8.4.2)
x C P
so µ g px0 q f px0 q. If the infimum µ is achieved by some x0 P C X D, then the sets rf µ, C s and
rg, Ds have the point pgpx0q, x0q in common and this point is in the separating hyperplane, so by (8.4.1)
8.4. Fenchel Duality U 125
and (8.4.2),
µ inf
P X
x C D
f pxq max
x PC XD
pgpxq f pxqq
is the desired minimum of f .
Example 8.4.1 (generalization [1, Example 7.12.1, p. 202]). Suppose you have w to spend on n activities,
each of which costs wi per unit of work for i 1, . . . , n. Assume that the return associated with activity i
is gi pxi q, xi is the number of units of work allocated to the activity, and gi is increasing and gi1 is decreasing
due to diminishing marginal returns (so g is concave).
to capture the budget constraint, and set D r0, 8qn to capture the positivity constraints. Set f pxq 0
since we are maximizing a concave functional g. Then for x px1 , . . . , xn q,
ņ
f px q sup xi xi ,
P
x Ci 1
126 U 8. C ONVEX F UNCTIONALS
which is finite only if x λpw1, . . . , wnq, in which case f pxq λw. Therefore, C tλpw1, . . . , wnqu.
Furthermore,
gi pxi q inf pxixi gipxiqq
xi 0,Pr 8q
is defined for all xi P r0, 8q since gipxiq is concave, so
ņ ņ
g px q inf x i x g p xq gi pxi q;
xPD
i
i 1
i 1
Example 8.4.2. As a concrete application of Example 8.4.1, say you have w 5 to spend on two presents:
? ?
x1 on the first and x2 on the second. Maximize the total gratitude you generate, where total gratitude is
given by g pxq g1 px1 q g2 px2 q, with g1 px1 q 2 x1 and g2 px2 q x2 .
where the infima are achieved at x1 p1{x1 q2 and x2 p1{2x2 q2. The dual problem is
1 1
min λw ;
λ λ 4λ
Example 8.4.3. As another concrete application of Example 8.4.1, say you have $w million to invest in oil
exploration and wind power. The government gives you a 50% subsidy for your wind power investment,
2 x2 w. The total return on your investment is given by g pxq
1
so your budget constraint is x1
8.4. Fenchel Duality U 127
g 1 px 1 q g2 px2 q, where
# #
0 ¤ x1 ¤3 0 ¤ x2 ¤2.
g1 px1 q g 2 px 2 q
4x1 3x2
9 x1 x1 ¡3 4 x2 x2 ¥2
The table below shows values of the optimal x1 and x2 for various values of a, which we can determine
using the graphs in Figure 8.4.3.
a λ x1 x2 x1 x2 µ
1{2 6 6 3 0 1 3
2 4 4 2 1 2 10
5 2 2 1 3 4 20
q
Example 8.4.4 ([1, Example 7.12.2, p. 203]). As yet another concrete application of Example 8.4.1, say you
have $w to bet on a horse race with n horses. Assume that we know that the public has bet si on horse i
128 U 8. C ONVEX F UNCTIONALS
Figure 8.4.3: The function λa g1 pλq g2 1
2λ in Example 8.4.3 for a 1{2, 2, 5.
and that horse i has probability pi of winning for i 1, . . . , n. We bet xi on horse i, and we must bet all of
our money on the n horses. Let P denote the amount the track pays out to bettors on the winning horse
(typically 80% of the total amount bet). Maximize your expected fraction of the total amount P paid out.
Remember that the value of the conjugate functional gi at xi is obtained by finding the lowest line of
slope xi which lies above gi . Therefore, gi pxi q is undefined for xi ¤ 0, and since gi1 p0q pi {si , gi pxi q 0
for xi ¥ pi {si . By elementary calculus, we have for 0 λ pi {si
c
xi xi x 0 ùñ xi si .
d pi xi pi si pi si
p x i s i q2 xi
i (8.4.3)
dxi xi s i
Then #
0 xi ps
px 2
i i
yu . c.
.. ..
.
py n , u q cn
Solution: Let C L2r0, 1s, D tu : yu cu, and gptq 0. To find f puq, we want to maximize
»1
»1
J pu, uq f puq uptqu ptq
1
2
ppuptqq2 dt 9 tq dt,
v pu, u,
0 0
9 tq
vu pu, u,
d
dt
9 tq u u 0.
vu9 pu, u,
Therefore, »1
f pu q supppu, u q f puqq puptqq2 dt
1
uPC 2 0
Therefore, g pu q is defined only if u °ni1 λiyi yT λ for λT pλ1, . . . , λnq, in which case
ņ ņ
g pu q inf u, λ i yi uinf λi ci pλ, cq cT λ
uPD PD
i1
i 1
and D tu : u yλu. By Quiz Theorem 18, the dual maximization problem is then
where by elementary matrix calculus, we achieve the maximum at λ pyyT q1c. Then the solution to
the original minimization problem is u yT λ. q
B IBLIOGRAPHY
[1] D. G. Luenberger, Optimization by Vector Space Methods. Wiley Interscience, 1969 [1997].
[3] M. J. D. Powell, Approximation Theory and Methods. Cambridge University Press, 1981.
[5] S. Axler, Linear Algebra Done Right. Springer, 2nd ed., 2004.
[7] G. Bachmann and L. Narici, Functional Analysis. Dover Publications, 2nd ed., 1998.
131