Nothing Special   »   [go: up one dir, main page]

Paul G.bamberg-Convexity and Optimization With Applications

Download as pdf or txt
Download as pdf or txt
You are on page 1of 131
At a glance
Powered by AI
The document discusses concepts in convexity and optimization with applications to vector spaces and functional analysis. It covers topics such as linear programming, minimum norm problems, Banach spaces, Hilbert spaces, and duality.

The book covers topics such as generalizing optimization from two dimensions, preliminaries in algebra, topology, analysis, Banach spaces, Hilbert space, dual spaces, and the Hahn-Banach theorem.

Mathematical concepts introduced include vector spaces, convex sets, normed vector spaces, open and closed sets, convergence, limits, compactness, inner products, and functional analysis.

Convexity and Optimization

WITH A PPLICATIONS

Paul G. Bamberg
Harvard University
Convexity and Optimization
WITH A PPLICATIONS

Paul G. Bamberg

Copyright 2008
c
Paul G. Bamberg
Harvard University
Cambridge, MA 02138

This text is based on lecture notes by Paul G. Bamberg written for M ATH 116: Convexity and Optimiza-
tion with Applications, a course offered at Harvard University in Fall 2008. The notes were meant to
complement Optimization by Vector Space Methods by David Luenberger (Wiley Interscience, 1969 [1997]).

Front cover: The image was generated by the following M ATHEMATICA code:

GraphicsGrid[
Table[ReliefPlot[
Table[Evaluate[
Sum[RiemannSiegelZ[RandomReal[3, 2].{x, y}], {3}]], {x, 0,
10, .2}, {y, 0, 10, .2}],
ColorFunction -> ColorData["BlueGreenYellow"],
Frame -> False], {3}, {3}]]
C ONTENTS

1 Generalizing from Two Dimensions 5


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Existence of Optimal Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Finite- vs. Infinite-Dimensional Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Minimum Norm Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Preliminaries in Algebra, Topology, and Analysis 21


2.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Linear Independence and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Normed Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5 Open and Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Convergence, Limits, and Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3 Banach Spaces 41
3.1 lp Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Lebesgue Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Lp Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 Cauchy Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Compactness and Extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6 Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.7 Denseness and Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4 Hilbert Space 59
4.1 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 The Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Orthogonal Complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.4 The Gram-Schmidt Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5 Dual Spaces and the Hahn-Banach Theorem 75


5.1 Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 Common Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6 Applications of the Hahn-Banach Theorem 87


6.1 The Dual of C ra, bs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.2 The Second Dual Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.3 Alignment and Orthogonal Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.4 Minimum Norm Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

3
4 U CONTENTS

6.6 Hyperplanes and Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

7 Calculus of Variations 103


7.1 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.2 Gateaux and Fréchet Differentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.3 Euler-Lagrange Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.4 Problems with Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

8 Convex Functionals 115


8.1 Local to Global . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.2 Conjugate Convex Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.3 Conjugate Concave Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.4 Fenchel Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

Bibliography 130
C HAPTER 1
GENERALIZING FROM TWO DIMENSIONS

Reading: [1, Chapter 1]

1.1 Introduction

The general approach in [1], which has made the book a classic, is this:

• Identify techniques from algebra, elementary single-variable calculus, or elementary multivariable


calculus that can be used to solve optimization problems.

• Reformulate the solution geometrically.

• Using geometry for inspiration, generalize the solution, typically to infinite-dimensional vector
spaces and non-Euclidean norms, and prove (algebraically) that it is still valid.

All the finite-dimensional problems in this chapter should be familiar, though they may be valuable re-
view for some students. The infinite-dimensional problems are just stated, not solved, and we will take
quite a while to get to them.
In this chapter, we will hold off on defining some important concepts, which for now are just in SMALL
CAPS . These concepts will appear in Chapters 2, 3, and 5 of [1], generally in a context where there is no
mention of optimization, just some challenging mathematics. You will need to learn them before you
tackle optimization. In the process, you will acquire a good background in real analysis and in the branch
of mathematics called “functional analysis”, the theory of normed infinite-dimensional vector spaces.
My hope for these introductory notes is to convince you that optimization problems are fun and rele-
vant, that some of the best ones can only be formulated in infinite-dimensional vector spaces, and that it
is worth your while to learn quite a few new definitions and theorems in order to be able to solve them.

1.2 Existence of Optimal Solutions

We begin by remembering an EXISTENCE THEOREM from real analysis:

T HEOREM 1.2.1 (extreme value theorem). If f is a continuous real function on a compact metric space X,
M  suppPX f ppq, and m  inf pPX f ppq, then there exist points q, r P X such that f pqq  M and f prq  m.

The following examples illustrate applications of this theorem.

5
6 U 1. G ENERALIZING FROM T WO D IMENSIONS

Example 1.2.2. You are entering a student competition to draw up a business plan for a company with
m scientists and n other employees. Entries with m2 ¡ 2n2 get rejected. You want to have the highest
possible ratio of scientists to other employees. Does this optimization problem have a solution?

Solution (based on [2, Example 1.1, p. 2]): We want to solve max m{n such that m2 ¤ 2n2 for m, n P N.
Equivalently, we want to find the largest rational number p  m{n such that p2   2. We cannot apply
Theorem 1.2.1 because the function we are optimizing (our OBJECTIVE FUNCTION) is rational-, not real-
valued. This alone is not enough to rule out the existence of an optimal solution, but we can show that, in
fact, an optimal solution does not exist.
Assume that p is this largest rational number such that p2   2. Define

 p  pp 22 .
2
q (1.2.1)

Then
2pp2  2q
q2  2 
pp 2 q2 . (1.2.2)

Since p2   2, (1.2.2) shows that q 2   2. However, (1.2.1) shows that q ¡ p, which contradicts our initial
assumption that p is the largest rational number such that p2   2. Therefore this optimization problem
has no solution. q
Of course, note that as p2 approaches 2, m and n would become larger and larger. Since the number of
people is finite, we should impose constraints m ¤ mmax and n ¤ nmax , in which case there would be a
solution.

Example 1.2.3. As director of the state lottery, you are designing scratch tickets by assigning probabilities
to the possible payoffs from 0 through $4. Since a ticket sells for $5, you want to be as generous as possible.
Does this optimization problem in R5 have a solution? Change the problem so that any nonnegative
integer payoff is allowed. Does this optimization problem in an infinite-dimensional vector space have a
solution?

Solution: We define “as generous as possible” as “maximizing the expected value of the ticket”. There-
°
fore, in the first problem, we want to solve max 4k0 kpk where pk is the probability of receiving a payoff
°
of $k, such that 0 ¤ pk ¤ 1 for all k  0, . . . , 4 and 4k0 pk  1. Our objective function is continuous
and real-valued, and the constraints on tpk u4k0 define a compact set, so Theorem 1.2.1 guarantees the
existence of a solution. We simply set pk  1 for k  4 and pk  0 for k  4.
°
For the second problem, we want to solve max kPZ kpk , such that 0 ¤ pk ¤ 1 for all k P Z and
°
kPZ pk  1. The notion of compactness is tricky in infinite-dimensional vector spaces so we cannot
apply Theorem 1.2.1 to this problem, but we can show that, in fact, an optimal solution does not exist.
Assume that we have some solution pk  πk for all k P Z . We can always increase the expected value of
the ticket and still satisfy our constraints by setting p0  0 and pk  πk1 for k P N since
8̧ 8̧ 8̧
kπk1  pk 1qπk ¥ kπk .

k 1 
k 0 
k 0

Because the expected value is unbounded, there is no solution. q


1.3. Linear Programming U 7

We will return to the concepts of compactness and METRIC SPACES later.

1.3 Linear Programming

Example 1.3.1. Your small bakery can produce only two products: frosted cookies and cakes. A batch
of frosted cookies uses up 1 pound of flour and 3 pounds of sugar. A batch of cake uses up 2 pounds
of flour and 1 pound of sugar. Each day your suppliers bring you 14 pounds of flour and 17 pounds of
sugar. Your optimization problem is to look at the market price of cookies and cakes and decide what to
produce.

Figure 1.3.1: possible production schemes in Example 1.3.1

Solution: Let x be the number of batches of cookies, y be the number of batches of cake, p1 be the market
price of a batch of cookies, and p2 be the market price of a batch of cake. Then we want to solve:

max p1 x p2 y (revenueq
x,y

such that x 2y¤ 14 (flour constraint)


3x y ¤ 17 (sugar constraint)
x, y ¥ 0

Figure 1.3.1 shows the possible production schemes. Observe that if v and w represent possible produc-
tion schemes, then αv p1  αqw with 0 ¤ α ¤ 1, is also possible. This is the definition of a CONVEX SET.
To show this, let      
A ,x  ,b 
1 2 x 14
3 1 y 17

so that we can write the constraints as Ax ¤ b. Then if Av ¤ b and Aw ¤ b, αAv ¤ αb and p1  αqAw ¤
8 U 1. G ENERALIZING FROM T WO D IMENSIONS

p1  αqb; adding these inequalities gives the desired result.


Now we can consider what the optimal solutions are for various values of p1 and p2 :

• p1  5, p2  5. We see from Figure 1.3.1 that the optimal solution is x  4, y  5. Revenue is


$5  4 $5  5  $45.

• p1  1, p2  7. Again, we see from Figure 1.3.1 that the optimal solution is x  0, y  7. Revenue is
$1  0 $7  7  $47.

• p1  6, p2  2. Notice in Figure 1.3.1 that the revenue function overlaps with the sugar constraint.
Therefore, we maximize revenue by choosing any point along this constraint. Revenue is $6  17 3
$2  0  $34. q

The revenue function is an example of a ( LINEAR ) FUNCTIONAL on R2 that we are trying to optimize.
It is an element of the DUAL SPACE. The straight line that we “slide” to solve the problem is an example
of a HYPERPLANE. The fact that this approach works for any convex set is a simple consequence of the
H AHN -B ANACH T HEOREM.
What we have done is to calculate, for any functional in the dual space, the largest value that this
function can achieve subject to the constraint imposed by our budget. This functional on the dual space
is called the SUPPORT FUNCIONAL.

Example 1.3.2. What is the support functional for the (closed) unit disk D̄1 ? Can you reconstruct the unit
disk (or any other convex set) from its support functional?

Figure 1.3.2: As shown in Example 1.3.2, on the left we see that the envelope of all functionals for which
? the closed
the support functional returns a constant value (i. e. , 1) is the boundary of our convex set,
unit disk. On the right we see that the contour plots of the support functional cpm, nq  m2 n2 are
concentric circles centered at the origin; the contour cpm, nq  1 is the boundary of our convex set.

Solution: Let the functionals be given by mx ny. The support functional is then
(
cpm, nq  inf c : mx ny ¤ c for all px, yq P D̄2
1.4. Finite- vs. Infinite-Dimensional Vector Spaces U 9

(D̄2 is the closed unit disk). We can see from the left side of Figure 1.3.2 that cpm, nq  c0 , where mx0
ny0  c0 is the tangent to the unit disk at the point px0 , y0 q. The slope of the tangent is m{n, so the
slope of the normal is n{m? ?
and the angle between the normal and the positive x-axis is θ? tan1 pn{mq.
Therefore, x0  cos θ  m{ m2 n2 and y0  sin θ  n{ m2 n2 . Then cpm, nq  c0  m2 n2 .
Observe that the value of the support functional evaluated for a given functional was just the given
functional evaluated at some point on the boundary of the convex set. Therefore, intuitively, it seems
that we can reconstruct the convex set from the support functional. We can see this in two ways. First,
the boundary of the convex set can be given by a contour of the support functional, as shown in Figure
1.3.2. Equivalently, the boundary of the convex set is the envelope of all functionals for which the support
functional returns some constant value, as shown in Figure 1.3.2. q

1.4 Finite- vs. Infinite-Dimensional Vector Spaces

Example 1.2.3 illustrated some of the added complexities of moving from finite- to infinite-dimensional
vector spaces. The following examples further explore the differences between solving optimization prob-
lems in finite- versus infinite-dimensional vector spaces.

Example 1.4.1 (infinite-dimensional: equivalent to [1, Example 8.7.3, p. 231–234]). You are playing a real-
time strategy computer game in which you have to build up a civilization from scratch. The first phase of
the game, the “Age of Agriculture”, lasts from time t  0 to time t  T . In this phase, you have farms that
produce at a rate f ptq. Your farms disappear when you move to the next age, the “Age of Mining”. What
makes this game interesting is that you can allocate production between reinvestment rptq and storage
sptq. Your farm production increases at a rate proportional to your reinvestment:

f9ptq  krptq, k ¡ 0, f9ptq  df .


dt
What system of allocation maximizes the amount of food available at time T ?

Solution: Note that if we set sptq  0, then f ptq  rptq and f9ptq  krptq  kf ptq, so f ptq  f0 exppktq. In
any case, we can solve (1.4.1) with the boundary condition f p0q  f0 :
»t
f pt q  krpτ q dτ f0 .
0

Then the total amount of food stored by time T is


»T »T » T » t

sptq dt  pf ptq  rptqq dt  krpτ q dτ  rptq dt f0 T.


0 0 0 0

Reinvestment and storage must be nonnegative, so we have the constraints


»t
0 ¤ rptq ¤ f ptq ùñ 0 ¤ rptq ¤ krpτ q dτ f0 .
0

If functions r1 ptq and r2 ptq satisfy these inequalities, so does αr1 ptq p1  αqr2 ptq, where 0 ¤ α ¤ 1. The
set of acceptable solutions to the problem is a CONVEX SET in an INFINITE DIMENSIONAL vector space and
10 U 1. G ENERALIZING FROM T WO D IMENSIONS

the quantity to be maximized is a LINEAR FUNCTIONAL on this space.


We guess that the optimal strategy is to reinvest everything during the interval p0, ts and then store
everything during the interval pt, T s:

• τ P p0, ts: f pτ q  rpτ q, so f9pτ q  krpτ q  kf pτ q, f pτ q  f0 exppkτ q, and spτ q  0.


• τ P pt, T s: f pτ q  spτ q, so f9pτ q  krpτ q  0, f pτ q  f0 exppktq, and spτ q  f0 exppktq.

Then we want to maximize »T


spτ q dτ  pT  tqf0 exppktq.
0
Differentiating with respect to t gives

pT  tqkf0 exppktq  f0 exppktq  0 ùñ t  T  k1 .


This solution holds if T  1{k ¥ 0 (if k ¥ 1{T ); if k   1{T , then the derivative of the objective function
with respect to t is always negative and we are forced into a corner solution in which we just set t  0. q

Example 1.4.2 (finite-dimensional). You are growing a new genetically engineered crop as part of a 2-year
biofuels experiment. Your contract requires you to deliver 5 tons at the end of each year. The government
?
pays all costs except the cost of fencing your plot of land, so the cost of producing x tons can be modeled
as cpxq  6 x. The inventory cost of storing your excess crop is hx.

Figure 1.4.1: possible production schemes for Example 1.4.2

Solution: Let yi be the number of tons grown in year i for i  1, 2. We have the constraints y1 ¥ 5 and
y1 y2 ¥ 10 since we must grow at least 5 tons in order to be able to deliver the required amount by the
end of year 1, and we must grow at least 10 tons in order to be ale to deliver the required amount by the
end of year 2. The region corresponding to possible production schemes is shown in Figure 1.4.1. Clearly,
1.4. Finite- vs. Infinite-Dimensional Vector Spaces U 11

? ?
our cost function 6 y1 6 y2 hpy1  5q is strictly increasing in both y1 and y2 within the feasible region,
so the minimum cost will be found somewhere along the line y1 y2  10 between y1  5 and y1  10.
We substitute y2  10  y1 into the cost function: now we want to minimize
?
6 y1
a
6 10  y1 hpy 1  5 q

with respect to y1 . The derivative with respect to y1 is


?10  y  ?y
?3y  10  y h  3 a 1
3 1
h,
1 1 y1 p10  y1 q

which is strictly positive for y1 P p5, 10s and zero for y1  5. Therefore, we reach a corner solution by
?
setting py1 , y2 q  p5, 5q, at which the minimum cost is 12 5  26.83. q

Example 1.4.3 (infinite-dimensional: equivalent to [1, Example 1.2.2, p. 3]). Your new contract requires
to deliver at a known rate dptq during the interval t P p0, T s. You produce at a rate rptq. Youra
rate of
production cost is cprptqq (if all you have to pay for is maintaining the fence, this might equal rptq. If
you have xptq tons on hand, you pay inventory costs at a rate hxptq. You start with an inventory xp0q.

Solution: First, note that x9 ptq  rptq  dptq. Both inventory and production must be nonnegative, so we
have the constraints
»t
xptq  xp0q prpτ q  dpτ qq dτ ¥ 0
0
rptq ¥ 0.

We are trying to minimize total costs:


»T
pcprptqq hxptqq dt.
0

The optimal solution r ptq lies in an infinite-dimensional vector space. Note that r ptq is not necessarily
continuous: for example, we might want to produce some constant positive quantity r pτ q  r for τ P
p0, ts and then produce rpτ q  0 for τ P pt, T³s. rptq is not necessarily bounded either: if h is very small,
then the optimal solution will be to produce 0 dptq dt, the total amount required over the interval p0, T s,
T

as rapidly as possible. q
See [1, Example 8.7.4, p. 234] for a related problem.

Example 1.4.4 (finite-dimensional). You are operating a simple frictionless rocket-propelled car. You ex-
pend fuel instantaneously to increase your speed to v miles/second, coast 1 mile, and use your brakes to
stop instantaneously. You then expend fuel to increase your speed to w miles/second, coast 4 miles, and
stop instantaneously. The challenge is to minimize the travel time. The constraint imposed by your fuel
tank is that v w  3.
12 U 1. G ENERALIZING FROM T WO D IMENSIONS

Solution: The total time spent is 1{v 4{w. We use a Lagrange multiplier to solve this problem:

Lpv, w, λq   λ pv w  3q
1 4
v w
BL   1  λ  0
Bv v2
BL
B w   w2  λ  0
4

Combining the two partial derivatives gives 4v 2  w2 . Combining this with the constraint gives 4v 2 
p3  vq2, so v  1 and w  2 (we discard the solution v  3 since v, w ¥ 0 as we are discussing speed,
not velocity). The travel time is 1.5 seconds. q
If this were a physics course, we might make this problem more realistic by assuming some nonzero
coefficient of friction between the car and the surface on which it travels and some function that gives the
rate of expenditure of fuel, among many other things.

Example 1.4.5 (infinite-dimensional: [1, Example 1.2.5, p. 4]). Now your rocket-propelled vehicle goes
straight up and is subject to gravity. You expend fuel at a rate uptq. Your goal is for the vehicle to reach
height h at time T while expending minimum fuel.

Solution (full solution in [1, Example 5.9.3, p. 125]): Assuming unit mass, massless fuel, and the absence
of aerodynamic forces, the equation of the rocket is governed by the second-order differential equation
: ptq  uptq  g. We cannot expend negative fuel and we want to reach height h at time T , so we have the
h
constraints

uptq ¥ 0
» T » t

hpT q  pupτ q  gq dτ dt ¥ h,
0 0

where we assume in the second constraint that hp0q  h9 p0q  0; that is, we start at ground level with zero
³T
velocity. We want to minimize the total fuel expended: 0 uptq dt.
The optimal solution u ptq is not necessarily continuous or bounded: for example, it might (and ac-
tually does) consist of an impulse at time t  0. In that case, in order to work with a function that is at
³t
least bounded, it might be a better idea to work with v ptq  0 upτ q dτ , the total amount of fuel expended
during the interval p0, ts. This insight is closely related to the R IESZ REPRESENTATION THEOREM. q

Example 1.4.6 (finite-dimensional). Your job is to invent the most exciting scratch ticket. If an outcome
has probability p of occurring, the excitement that results when that outcome occurs is proportional to
 log p. For example, the excitement of rolling 6s on two dice  log 361  2 log 61 is precisely twice the
excitement of rolling 6 on one die. The ticket can pay $k for k  0, 1, 2. You must assign probabilities pk
for k  0, 1, 2 to to these three outcomes. So that the state can make a nice profit, the expected payoff must
be 4{7 of a dollar.
1.4. Finite- vs. Infinite-Dimensional Vector Spaces U 13

Solution: Our two constraints are


2̧ 2̧
pk 1 kpk  74 .
k 0  
k 0

We want to maximize the quantity



H pX q   pk log pk . (1.4.1)

k 0

(H pX q, the “excitement function”, is the information entropy or Shannon entropy of the random variable X,
the payoff of the lottery ticket.) We can substitute the constraints into (1.4.1) to turn this into a single-
variable problem, where we maximize





 3
7
p2 log
3
7
p2
4
7
 2p2 log
4
7
 2p2 p2 log p2

over p2 . However, this would be messy, so we just use Lagrange multipliers:


 
2̧ 2̧ 2̧
Lpp1 , p2 , p3 , λ1 , λ2 q   pk log pk  λ1 pk  1  λ2 k  pk 
4
k 0  
k 0 k 0  7
BL  1 log pk  λ1  kλ2  0,  0, 1, 2
B pk k

k
Therefore, pk  e1 λ1 eλ2 . Some algebra tells us λ1   log 7  1 and λ2  log 2, so pk  2k {7. q

Example 1.4.7 (infinite-dimensional). Your job is to again to invent the most exciting scratch ticket. The
ticket can have any nonnegative integer payoff. You need to assign probabilities pk for k P Z to each of
these possible outcomes. Make the expected payoff be $1, since the tickets will sell for $1.50.

Solution: Now we are trying to maximize over an infinite number of variables. Obviously we cannot use
substitution as we could in the previous example since we still have an infinite number of variables after
taking into account the two constraints
8̧ 8̧
pk 1 kpk  1.
k 0  k 0
There are two LINEAR FUNCTIONALS in the dual space acting as constraints, so the CODIMENSION of this
problem is 2.
We can still solve the problem using Lagrange multipliers:
 
8̧ 8̧ 2̧
Lpp0 , p1 , . . . , λ1 , λ2 q   pk log pk  λ1 pk  1  λ2 k  pk  1

k 0 
k 0 k 0
BL  1 log pk  λ1  kλ2  0, ¥0
B pk k

k
Therefore, pk  e1 λ1 e λ2 . We recognize this as in the form of a geometric distribution pk  pp1  pqk
14 U 1. G ENERALIZING FROM T WO D IMENSIONS

(note this is a geometric distribution with support r0, 8q, not r1, 8q). Since the mean of a geometric
distribution is p1  pq{p, we have p  1{2, so λ1   log 2  1, λ2   log 2, and pk  1{2k 1 . q

1.5 Minimum Norm Problems

Minimum norm problems arise in several instances, as shown by the following examples.

Example 1.5.1. You are visiting a friend who lives at the point px, y q  p2, 1q. A bus drives through town
along the main highway, a subspace whose equation is x 2y  0.

• Where do you get off the bus in order to minimize your walking distance to the friend’s house?

• Suppose you take a taxicab, which can travel only north-south or east-west. The cab driver charges
for the driving distance. Where do you get out of the cab in order to minimize your cost?

• What if the driver only charges for the larger dimension?

Figure 1.5.1: minimizing the distance between the highway and the house for the various norms in Ex-
ample 1.5.1

Solution: In the first problem, we are minimizing the 2-NORM or E UCLIDEAN NORM. We can solve the
problem geometrically by expanding a circle centered at the house until it is tangent to the road, as shown
in Figure 1.5.1. The point of tangency is the solution to the system
#
x 2y 0 highway
,
y  1  2 px  2 q normal to the highway through the point p2, 1q
a
or px, y q  p6{5, 3{5q. The minimum cost (assuming unit cost) is then p4{5q2 p8{5q2  4{sqrt5. Note
that we were implicitly invoking the PROJECTION THEOREM ([1], Theorem 2, p. 51) in knowing that we
minimize the distance by finding the point on the highway through which the normal through p2, 1q
passes.
1.5. Minimum Norm Problems U 15

In the second problem, we are minimizing the 1-norm or TAXICAB NORM. We can solve the problem
geometrically by expanding a diamond centered at the house until it touches the road, as shown in Figure
1.5.1. The point where the square touches the road is the solution to the system x 2y  0, x  2, or
px, yq  p2, 1q. The minimum cost is then 4{3 4{3  2. In this case, the projection theorem with the
ordinary concept of “perpendicular” does not apply, but we could have used [1, Theorem 5.8.1, p. 119].
In the third problem, we are minimizing a type of M INKOWSKI FUNCTIONAL (see [1, p. 131]). We can
solve the problem geometrically by expanding a square around the house until it touches the road, as we
did before. The factor by which you must expand the unit square in order for it make it touch a point, or
4{3, is the minimum cost. q

Example 1.5.2 (minimum norms and convex sets). Assume that the Island of Sodor is a convex set (it
actually is not – see Figure 1.5.2) whose shore is a smooth curve. Reverend Awdry is offshore and want
to swim to the closest point on the island.

Figure 1.5.2: The Island of Sodor is actually not a convex set (Example 1.5.2)!

Solution: We solve the problem geometrically by expanding a circle centered at Reverend Awdry’s loca-
tion until it just touches the island. Let the radius of the circle be d and the point of tangency between the
circle and the convex set be pt .
Consider any other tangent to the circle, where the point of tangency is very close to pt : this tangent
will cut across the interior of the island, otherwise either it would not be a tangent or the set would not
be convex. Therefore, d is the minimum distance from Reverend Awdry’s location to the shore. Now con-
sider all hyperplanes (lines) that separate Reverend Awdry from every point on the island. The reverend
is farthest away from the hyperplane tangent to the island that we just constructed. q
This is a DUALITY theorem: the distance d solves a minimization problem over points on the island
and also solves a maximization problem over elements of the dual space. The theorem is proved in [1]
(Theorem 1, p. 136) with no restrictions except convexity. That is:

• The shore of the island need not be smooth.


16 U 1. G ENERALIZING FROM T WO D IMENSIONS

• The vector space can be infinite-dimensional.

• The norm need not be the Euclidean norm.

An important class of minimum norm problems is finding a polynomial that best approximates a
function in some interval. Before taking a brief look at approximation problems, we first define a LINEAR
FUNCTIONAL :

Definition 1.5.3 (linear functional). Let V be a vector space over a field F. A linear functional is a map
f : V Ñ F such that f pv wq  f pvq f pwq for all v, w P V and f pavq  af pvq for all v P V and all a P F.

Example 1.5.4. Consider the infinite-dimensional vector space V of all continuous functions f on r1, 1s.
Let W € V be the 4-dimensional subspace of polynomials p for which deg p ¤ 3. Which of the following
are linear functionals?

• L : f pxq Ñ f p1q

• L : f pxq Ñ f p0q

• L : f pxq Ñ f p1q
³1
• L : f px q Ñ 1 f pxq dx

Solution: The first three are linear functionals since

L pf g q  pf g qpxq  f pxq g pxq  Lpf q Lpg q


Lpaf q  af pxq  aLpf q

for all x P R. The last is also a linear functional since


»1 »1 »1
L pf gq  pf p x q g pxqq dx  f pxq dx g pxq dx  Lpf q Lpg q
1 1 1
»1 »1
Lpaf q  a  f pxq dx  a f pxq dx  aLpf q.
1 1
Note that these four functionals acting on V are clearly linearly independent, as we cannot write one as a
linear combination of the others for every continuous function f on r1, 1s. If we restrict ourselves to W
°
and write f pxq  3k0 ak xk , then

f p1q  a0  a1 a2  a3
f p0q  a0
f p1q  a0 a1 a2 a3
»1  a3 4 1
f pxq dx  a0 x  2a0
a1 2 a2 3 2
x x x a2 .
1 2 3 4 1 3
1.5. Minimum Norm Problems U 17

Since  
1 1 1 1
 
1
det 
1
0
1
0
1
0
1



 0,
2
2 0 3 0
the functionals are linearly dependent. More elegantly, we might have remembered that Simpson’s rule
is exact for polynomials p for which deg p ¤ 3, so we can write
»1
f pxq dx  pf p1q 4f p0q f p1qq .
1
q
1 3

Example 1.5.5. You have lost your scientific calculator and can only evaluate polynomials ppxq. You need
to compute approximate values of f pxq  sin x for randomly chosen values x P r0, 1s. Here are three
norms that might be used to choose the best approximating polynomial:
³1
• L1 -norm: 0 |f pxq  ppxq| dx
³1
• L2 -norm: 0 pf pxq  ppxqq2 dx

• L8 -norm: max0¤x¤1 |f pxq  ppxq|

For which of these norms is the Taylor polynomial the right choice for ppxq? Under what circumstances
might one or another of these norms be the appropriate choice?

Solution: The Taylor polynomial is actually not the correct choice for any of these norms. It turns out that
interpolating polynomials arise in minimizing the 1-norm, Fourier series in minimizing the 2-norm, and
C HEBYSHEV POLYNOMIALS in minimizing the 8-norm. See [3] for a complete textbook on approximation
theory. q
The following theorem provides reassurance that under reasonably broad conditions, functions can
be approximated by polynomials to arbitrary accuracy.

T HEOREM 1.5.6 (Weierstraß Approximation Theorem). If f is a continuous real-valued function on r0, 1s, then
there exists a polynomial p on ra, bs such that |f pxq  ppxq| ¤  for all x P ras and any  ¡ 0.

Proof: Define the n 1 Bernstein basis polynomials of degree n as




bk,n pxq  x p1  xqnk ,  0, . . . , n.


n k
k
k

Then a linear combination of Bernstein basis polynomials



B pX q  βv bk,n pxq

k 0

is called a Bernstein polynomial, and β1 , . . . βn are called the Bernstein or Bézier coefficients. The proof
of the Weierstraß approximation theorem is constructive: we will construct a series of Bernstein polyno-
mials that converges uniformly to f .
18 U 1. G ENERALIZING FROM T WO D IMENSIONS

Define the Bernstein polynomial


ņ 


Bn pf qpxq  x p1  xqnk , x P r0, 1s.


k n k
f

k 0
n k

f is continuous and r0, 1s is compact, so f is uniformly continuous ([2, Theorem 4.19]) on r0, 1s. Therefore,
given any  ¡ 0, there exists a δ ¡ 0 such that |x  y |   δ implies |f pxq  f py q|   {2. Because f
is continuous and r0, 1s is compact, we can also conclude that f is bounded ([2, Theorem 4.15]); say
supxPr0,1s |f pxq| ¤ M . Now

ņ  

f pxq  Bn pf qpxq  f px q  f x p1  xqnk ,


k n k

k 0
n k

so,
 
 

ņ  k  n k nk
|f pxq  Bnpf qpxq| ¤  p q
f x f
n  k x 1 x p  q
k 0
 
 
 
 

¸ k  n k ¸ k  n k
nk nk
  p q
f x f
n  k x 1 x p  q  p q
f x f
n  k x 1 x p  q ,
k PS
loooooooooooooooooooooooomoooooooooooooooooooooooon P
loooooooooooooooooooooooomoooooooooooooooooooooooon
k T
A B

where S  tj : |x  j {n| ¥ n1{4 , j P r0, nsu and T  tj : |x  j {n|   n1{4 , j P r0, nsu  r0, nszS.

Let X  Binompn, pq, so that P pX  k q  nk xk p1  xqnk and σ 2  var X {n  pp1  pq{n ¤ 1{4n. By
Chebyshev’s inequality,
 

¤ M nσ1{2  M xp1?n xq ¤ M 4?1 n .


 k 
¥ n1{4
2
A M P X  n

Equivalently, A   {2 if n ¡ M 2 {42 . Furthermore, if |X  k {n|   n1{4 and n1{4   δ, then |X  k {n|   δ
and |f pxq  Bn pf qpxq|   {2 since f is uniformly continuous on r0, 1s. Therefore, B is trivially less than
{2. We conclude that if n ¡ maxpM 2 {42 , δ 4 q, then |f pxq  Bn pf qpxq|  A B ¤  for all x P r0, 1s; that
is, the Bernstein polynomials Bn pf q converge uniformly to f . q

Example 1.5.7 (Gibbs Phenomenon). Find a discontinuous real-valued function for which such a polyno-
mial does not exist.

Solution (based on [4]): Consider the function f pxq  sgn sin x, where
$
'
'
&1 x¡0
sgn x  0 x0.
'
'

% 1 x 0

Clearly f is discontinuous at x  kπ, for k P Z. Consider the partial Fourier series:


sinpp2k  1qxq

SN f p xq 
4
2k  1
.
π k 1
1.5. Minimum Norm Problems U 19

Figure 1.5.3: Notice how limN Ñ8 SN f pxq  f pxq at x  0. This discrepancy is known as Gibbs phe-
nomenon or ringing artifacts (Example 1.5.7).

By the Weierstraß approximation theorem, we have limN Ñ8 |f pxq  SN f pxq| ¤  for all x where x is in
some closed interval not containing a multiple of π.

However, we notice in Figure 1.5.3 that SN f pxq exhibits a bump whenever x  kπ. In particular,
SN f pxq reaches a critical point whenever


pSN f q1pxq  π4 cospp2k  1qxq  0.
k 1 
To find the zeros of pSN f q1 pxq, we first show that


2 sin x cospp2k  1qxq  sinp2N xq. (1.5.1)

k 1

The proof is by induction. For N  1, Equation (1.5.1) reduces to the identity sinp2xq  2 sin x cos x. Now
assume Equation (1.5.1) is true for general N and show that this implies it is true for N 1. We have

N¸1 Ņ
2 sin x cospp2k  1qxq  2 sin x cospp2k  1qxq cospp2N 1qxq

k 1 
k 1
paq
 sinp2N xq 2 sin x cospp2N 1qxq
 sinp2N xq 2 sin x cosp2N xq cos x  2 sin x sinp2N xq sin x
pbq
 p1  2 sin2 xq sinp2N xq sinp2xq cosp2N xq
p cq
 cosp2xq sinp2N xq sinp2xq cosp2N xq pdq sinpp2N 1qxq.
where (a) follows from the inductive step, (b) follows from the identity sinp2xq  2 sin x cos x, (c) follows
from the identity cosp2xq  cos2 x  sin2 x  p1  sin2 xq  sin2 x  1  2 sin2 x, and (d) follows from the
identity sinpa bq  cos a sin b sin a cos b.

Using this fact, we see that SN f reaches a critical point whenever π sin xpSN f q1 pxq  2 sinp2N xq  0,
or whenever x is a multiple of π {2N . Consider the closest critical points to x  0: x  π {2N . Then we
20 U 1. G ENERALIZING FROM T WO D IMENSIONS

have  
 π Ņ 1 π
sin 2k2N Ņ 
2k 1
SN f
2N
 4
π 2k  1
 2
π
sin

2N π
2k 1
π
N
.
k1 
k 1 2N π
!  )
2k 2pk 1q
The last sum is a Riemann sum taken over midpoints of the partition 2N , 2N ,k  0, . . . , N  1 .
Therefore, »π
 π
lim SN f
N Ñ8 2N
 lim f pxq 
xÑ 0
sin x
x
dx  1.1790.
0

A similar analysis limxÑ0 f pxq  1.1790. Therefore, the sequence of polynomials tSN f u does not
converge pointwise to f . However, we do see that the sequence tSN f u does converge to f for the L1 - and
L2 -norms, though it does not for the L8 -norm. q
C HAPTER 2
PRELIMINARIES IN ALGEBRA , TOPOLOGY, AND ANALYSIS

Reading: [1, Sections 2.1 – 2.9]. Also see [5, Chapters 1 – 2] for more on the linear algebra covered in this
chapter, [6, Chapter 2] for more on the topology, and [2, Chapters 2 and 4] for more on the topology and
analysis.

2.1 Vector Spaces

Definition 2.1.1 (vector space). A vector space V over a field F is a set V along with two operations:
addition, which associates with any two vectors u, v P V a vector u v P V , and scalar multiplication,
which associates with any vector v P V and any scalar a P F a vector av P V . The following axioms hold:

1. commutative law for vector addition: u vv u for all u, v P V .

2. associative law for vector addition: pu vq wu pv wq for all u, v, w P V .

3. existence of an additive identity: There exists a null vector 0 P V such that v 0  v for all v P V .

4. distributive law for vector addition: apu vq  au av for all u, v P V and all a P F.

5. distributive law for scalar addition: pa bqv  av bv for all v P V and all a, b P F.

6. associative law for scalar multiplication: pabqv  apbvq for all v P V and all a, b P F.

7. existence of a multiplicative identity: 1v  v for all v P V . Also, 0v  0 for all v P V .

Unless specified otherwise, we will assume we are working with real vector spaces (F  R).

Example 2.1.2. Note that [1] replaces a standard axiom, the existence of an additive inverse, with the
axiom 0v  0 for all v P V . Prove the existence of an additive inverse from [1]’s axioms.

Solution: We have
paq
0  0v  p1 p1qqv pbq 1v p1qv pcq v p1vq,
where (a) follows from [1]’s axiom 0v  0 for all v P V , (b) follows from the distributive law for scalar
addition, and (c) follows from the definition of a multiplicative identity. Therefore, we define the additive
inverse v  1v. q

21
22 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Example 2.1.3. Which axiom on the list can be proved from the others?

Solution: We can prove the commutative law from the other axioms. We have

paq
0  0 pu v q
 pp1q 1qpu vq
pbq
 p1qpu vq 1pu vq
pcq
 p1qu p1qv 1u 1v
pdq
 u  v u v
where (a) follows from the fact that 0v  0, (b) follows from the distributive law for scalar addition,
(c) follows from the distributive law for vector addition, and (d) follows from the existence of additive
inverses, as shown above, and the definition of the multiplicative identity. Then we add v to the left of
both sides to get u  u 0  u  u  v u v  v u v; repeating with v gives v u  u v. q
How do we know that we cannot prove another axiom from the remaining ones? We can try defining
vector addition and scalar multiplication in different ways and seeing if the resulting operators still obey
the axioms for a vector space. If not, then the axioms that are not satisfied are independent of the axioms
that are.

Example 2.1.4. Let V  R2 . Define addition as usual, but define scalar multiplication by apx, y q  pax, 0q
for px, y q P V and a P R. Is V a vector space? If not, what axiom is not satisfied?

Solution: A multiplicative identity does not exist in this vector space since there exists no a such that
apx, y q  px, y q for all px, y q. Therefore, the existence of a multiplicative identity is independent of the
other axioms. q

Example 2.1.5. For the NSA budget, let V  R Ytsu. The symbol S denotes a secret amount. For elements
in V other than s, addition and multiplication are defined as usual. However, s v  v s  s for all
v P V and as  s for all a P R. Is V a vector space? If not, what axiom is not satisfied?

Solution: No, because the axiom 0v  0 is not satisfied. Clearly 0s  0  s. q


Below are some potential vector spaces:

™ All 2  3 matrices with real entries: This is equivalent to R6 , which is a 6-dimensional vector space

™ All infinite sequences: infinite-dimensional vector space

€ All bounded infinite sequences: infinite-dimensional vector space (since the sum of two bounded
sequences is bounded and a bounded sequence times a constant is still bounded)
€ All infinite sequences that converge to zero: infinite-dimensional vector space
2.1. Vector Spaces U 23

€ All infinite sequences with only finitely many nonzero terms: infinite-dimensional vec-
tor space
€ All infinite sequences for which the terms form a convergent series: infinite-dimensional
vector space

™ All functions f : tA, B, C u Ñ R: 3-dimensional vector space

™ All polynomials of any (finite) degree: infinite-dimensional vector space

€ All polynomials p with deg p  3: not a vector space (consider f pxq  x3 and gpxq  x3:
degpf g q  3)

™ All power series: infinite-dimensional vector space

™ All continuous functions on r0, 1s: infinite-dimensional vector space

™ All linear functions f : R3 Ñ R: 3-dimensional vector space (since we have the basis
tf p1, 0, 0q, f p0, 1, 0q, f p0, 0, 1qu
because f is linear)

Note that we nested SUBSPACES inside each other:

Definition 2.1.6 (vector subspace). A nonempty subset U of a vector space V over F is called a subspace
of V if for any u, v P U and a, b P F, au bv P U . That is, U is a subspace of V if U is closed under vector
addition and scalar multiplication.

A subspace must be nonempty, so it must contain at least one element. By the axiom guaranteeing the
existence of an additive identity, that one element must be 0. Therefore, every subspace contains 0.

Proposition 2.1.7 ([1, Proposition 2.3.1, p. 15]). If U1 and U2 are subspaces of V , so is U1 X U2 .

Proof: 0 P U1 , U2 since U1 and U2 are subspaces, so 0 P U1 X U2. Therefore, U1 X U2 is nonempty. If


u, v P U1 X U2 , then u, v P U1 and u, v P U2 . Then au bv P U1 , U2 since U1 and U2 are subspaces, so
au bv P U1 X U2 . q
While U1 X U2 is a subspace if U1 and U2 are subspaces, it is not necessarily true that U1 Y U2 is a
subspace. Consider any two (noncolinear) lines in R2 that pass through the origin. These lines are both
subspaces, but the smallest subspace containing both of these subspaces is R2 .

Definition 2.1.8 (sum of vector spaces). The sum of two subsets U1 and U2 of a vector space is the set of
all sums u1 u2 , where u1 P U1 and u2 P U2 .

Example 2.1.9. What are the sum and difference of two squares, one centered at a and with side length 2r
and the other centered at b and with side length 2s?
24 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Solution: We skip ahead a little and assume knowledge of NORMED VECTOR SPACES. Let the first square
be the set A  tx : }x  a}8 ¤ ru and B  ty : }y  b}8 ¤ su. Consider any x P A and y P B. We have

}px y q  pa bq}8 ¤ }x  a}8 }y  b}8 ¤ r s,

where the first inequality follows from the TRIANGLE INEQUALITY. Therefore, A B is a square centered
at a b with side length 2pr sq. Furthermore, we have

}px  yq  pa  bq}8 ¤ }x  a}8 }y  b}8 ¤ r s,

where again we use the triangle inequality, as well as the property that }ax}  |a|}x} for any scalar a P F
and any vector x P X, where X is a normed vector space. Therefore, A  B is a square centered at a  b
with side length 2pr sq. q
Note that A B does not depend on our choice of the origin o since pa  oq  pb  oq  a  b. This
construction will be crucial to the proof of the E IDELHEIT SEPARATION THEOREM ([1], Theorem 3, p. 133).

Proposition 2.1.10 ([1, Proposition 2.3.2, p. 15]). If U1 and U2 are subspaces of V , then U1 U2 is a subspace of
V.

Proof: U1 and U2 contain 0, so U1 U2 contains 0. Suppose u1 , u2 P U1 U2 . Then there exist vectors


v1 , v2 P U1 and w1 , w2 P U2 such that u1  v1 w1 and u2  v2 w2 . For any scalars a, b P F, we have

au1 bu2  looooomooooon


pav1 bv2q loooooomoooooon
paw1 bw2q;
PU1 PU2
since au1 bu2 can be expressed as the sum of a vector in U1 and a vector in U2 , it is in U1 U2 . Therefore,
U1 U2 is a subspace of V . q
Given any set S of vectors in V , there will in general be many subspaces of V that contain S. One is V
itself, but there may be smaller ones. For example, if S  tp1, 1, 0u, then some examples of subspaces that
contain S are tpx, x, 0q : x P Ru, tpx, y, 0q : x, y P Ru, and tpx, y, z q : x, y, z P Ru, which are of dimensions
1, 2, and 3, respectively. The “smallest” such subspace is given a special name:

Definition 2.1.11 (subspace generated by a subset). Suppose S is a subset of a vector space V . The set rS s,
called the subspace generated by S, consists of all vectors in V which are linear combinations of vectors
in S.

If subspace U contains S, it contains rS s. Equivalently, rS s is the intersection of all subspaces that contain
S.
The usual way to construct rS s is to form all linear combinations of vectors in S. This is clearly a
subspace since it is closed.

Example 2.1.12. Construct rS s P R2 if

• S  tp1, 2q, p3, 6qu


2.2. Convex Sets U 25

• S  tp1, 2q, p3, 3qu


• S is the line segment from p1, 2q to p3, 6q.

Solution: We have

• rS s  tpx, 2xq : x P Ru since p3, 6q is a multiple of p1, 2q.

• rS s  R2 since p1, 2q and p3, 3q are linearly independent.

• rS s  tpx, 2xq : x P Ru since the line segment is part of the line y  2x. q

S UBSPACE generalizes “line or plane through the origin”. The generalization of “any line or plane” is
LINEAR VARIETY , a subspace plus a constant vector:

Definition 2.1.13 (linear variety). The translation of a subspace is a linear variety or affine subspace.

Analogous to the subspace generated by a subset, we have the following:

Definition 2.1.14 (linear variety generated by a subset). Suppose S is a nonempty subset of a vector space
V . The set ν pS q, called the linear variety generated by S, is the intersection of all linear varieties in V that
contain S.

Example 2.1.15. If S P R3 consists of the vectors tp0, 0, 1q, p0, 1, 1qu, give examples of subspaces of two
different dimensions that contain S. Which subspace is the “smallest”? What is ν pS q?

Solution: The subspaces tp0, x, y q : x, y P Ru and tpx, y, z q : x, y, z P Ru are subspaces of dimensions 2 and
3, respectively, that contain S. The first subspace is the smallest. ν pS q is p0, 0, 1q tp0, x, 0q : x P Ru. q

Example 2.1.16. In R3 , what is ν pS q if S is a circle? Under what circumstances does ν pS q  rS s?

Solution: ν pS q is the plane containing the circle. If the origin lies in the plane containing the circle, then
ν pS q  rS s. q

2.2 Convex Sets

Definition 2.2.1 (convex set). A set K in a linear vector space is said to be convex if for any x1 , x2 P K, all
elements in the set tλx1 p1  λqx2 : λ P r0, 1su are in K.
26 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Proposition 2.2.2. The empty set is convex.

Proof: To show that ∅ is not convex, we must find two vectors x1 , x2 P ∅ such that there exist vectors in
the set tλx1 p1  λqx2 : 0 ¤ λ ¤ 1u not in ∅. Since ∅ contains no vectors at all, it is convex. q

Proposition 2.2.3 ([1, Proposition 2.4.1, p. 18]). For any convex sets K, L and any scalars a, b, aK bL is
convex.

Proof: Consider the vectors z1 , z2 P aK bL. There exist vectors x1 , x2 P K and y1, y2 P L such that
z1  ax1 by1 and z2  ax2 by2 . Since K and L are convex, we have

λz1 p1  λqz2  λpax1 by1 q p1  λqpax2 by2 q  apλx1 p1  λqx2q bpλy1 p 1  λ qy 2 q.


Then λx1 p1  λqx2 and λy1 p1  λqy2 are in K and L, respectively, since K and L are convex. Therefore,
λz1 p1  λqz2 is in aK bL. q

Figure 2.2.1: The union of the two disks is not convex.

In general, the union of two convex sets is not convex, as shown in Figure 2.2.1. However, we do have
the following:

Proposition 2.2.4 ([1, Proposition 2.4.2, p. 18]). Let C be an arbitrary collection of convex sets. Then XK PC K is
convex.

Proof: Let C  XK PC . If C is empty, then the proof reduces to that in Example 2.2.2. Assume we have
x1 , x2 P C and choose any λ P r0, 1s. Then x1 , x2 P K for all K P C and since each K is convex, λx1 p1 
λqx2 P K for all K P C. Therefore, λx1 p1  λqx2 P C and C is convex. q

Definition 2.2.5 (convex hull). Given an arbitrary set S in a linear vector space, the convex hull or convex
cover, denoted copS q, is the smallest convex set containing S.

We can express copS q as the intersection of all convex sets containing S. Alternatively, we could express
copS q as the set of all convex combinations of vectors in S, where a convex combination is a linear combi-
°| S | °|S |
nation k1 αk xk , where αk ¥ 0 for k  1, . . . , |S | and k1 αk  1.
2.2. Convex Sets U 27

Proposition 2.2.6. Let K be the set of vectors consisting of all convex combinations of vectors in S. Show that
K  copS q.

°
Proof: First we show that K „ copS q. Let Km be the set of all convex combinations of the form m i1 αi xji ,
°m ”n
where i1 αi  1, ai ¥ 0 for i  1, . . . , m, and m  1, . . . , n. Clearly K  m1 Km . Then in order to
show K „ copS q, we need to show that Km „ copS q for m  1, . . . , n. The proof is by induction. This
clearly holds for m  1 since in that case α1  1 and the convex combinations are therefore just the
elements of S „ copS q. Now we assume that the result is true for general m   n and show that this
° °
implies it is true for m 1. Say we are given a convex combination p  im1 1 αi xji , where ik11 αi  1
and ai ¥ 0 for i  1, . . . , m 1. At least one of α1 , . . . , αm 1 must be strictly positive; without loss of
generality, assume that α1 ¡ 0. Then

¸1
m ¸1
m ¸1
m ¸1
m
p  α1 xj  α1 looxmo αi
α i x ji 1 α i x ji j on 1 αi °m 1 x ji ,

i 1 
i 2 
i 2  i2 αi
looooooooomooooooooon
i 2
q
r

Clearly q P copS q since q P S. Since


°m
¸1
m 1
°m
αi
 
i 2 αi
°m 1  1,
1

i 2  αi
i 2 
i 2 αi

r is a convex combination of m elements of S and therefore, by the inductive hypothesis, is also in copS q.
Then p is on the line segment connecting q and r with q, r P copS q, so p P copS q, as we wanted to show.
Remember that copS q is the smallest convex set containing S. We know that S  K1 „ K „ copS q, so
we just need to show that K is actually convex in order to show that K  copS q. Say we are given two
° ° ° °
elements of K, q  ni1 αi xi and r  ni1 βi xi , with ni1 αi  ni1 βi  1 and αi , βi ¥ 0 for i  1, . . . , n.
° °
Then any point on the line segment connecting q and r can be written as p  λ ni1 αi xi p1  λq ni1 βi xi
°
with λ P r0, 1s. We can rewrite p as ni1 pλαi p1  λqβi qxi , with
ņ ņ ņ
pλαi p1  λqβiq  λ αi p1  λq βi  λ p1  λq  1,

i 1 
i 1 
i 1

so p is a convex combination of x1 , . . . , xn and is therefore in K. q

Definition 2.2.7 (cone). A set C in a linear vector space is said to be a cone with vertex at the origin if
x P C implies αx P C for all α ¥ 0.

Example 2.2.8. Make a cone from a line segment, from a circle, and from a disk.

Solution: Figure 2.2.2 shows a cone made from the line segment connecting p5, 5q and p5, 10q and a cone
made from the circle of radius 1{2 centered at p0, 0, 1{2q and parallel to the xy-plane. A cone made from
the corresponding disk would be the same as the cone made from the circle, except that the cone made
from the disk would be filled in. q
28 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Figure 2.2.2: cones made from the line segment and the circle in Example 2.2.8

Example 2.2.9. Make a convex cone from a non-convex set.

Solution: Consider the convex cone C generated from the line segment L in Figure 2.2.2. If we replace
L with an arc A that has the same endpoints as L and that is entirely contained within C, then we still
generate C. However, A is a non-convex set. q

Proposition 2.2.10. It is impossible to make a non-convex cone from a convex set.

Proof: Let C be a cone generated from a convex set K. Pick any two vectors y1 , y2 P C. Then there exist
vectors x1 , x2 P K and scalars α1 , α2 ¥ 0 such that y1  α1 x1 and y2  α2 x2 . Furthermore, we can set
α1  βγ and α2  β p1  γ q where β ¥ 0 and 0 ¤ γ ¤ 1: specifically, β  α1 α2 and γ  α1 {pα1 α2 q.
Then we can write
yc  λy1 p1  λqy2  β pγx1 p1  γ qx2 q  βxc ;
xc P K since K is convex and yc P C by definition of a cone. Therefore, C is convex. q

2.3 Linear Independence and Dimension

Definition 2.3.1 (linear independence). A vector x is said to be linearly dependent upon a set of vectors S
if x can be expressed as a linear combination of vectors in S. Equivalently, x is linearly dependent upon S
if x P rS s. Conversely, x is said to be linearly independent of the set S if it is not linearly dependent on S,
and a set of vectors is said to be a linearly independent set if each vector in the set is linearly independent
of the remainder of the set.

Note that this definition works even for infinite-dimensional vector spaces.

Example 2.3.2. Let S  t1, t2 , t4 , . . . u. Is f ptq  pt8  4t2 qpt6 t2 7q dependent upon S? Is g ptq  t5
dependent upon S? Is S a linearly independent set?
2.3. Linear Independence and Dimension U 29

Solution: f ptq is dependent upon S since each term of f ptq has an even exponent. g ptq is independent of
S since t5 has an odd exponent. S is a linearly independent set. q

T HEOREM 2.3.3 ([1, Theorem 2.5.1, p. 20]). The set tx1 , . . . , xn u is linearly independent if and only if
°n
k1 αk xk  0 implies αk  0 for k  1, . . . , n.

Proof: See the proof in [1]. q

°n
T HEOREM 2.3.4 ([1, Corollary 2.5.1, p. 20]). If tx1 , . . . , xn u is a linearly independent set and 
k 1 αk x k 
°n
k1 βk xk , then αk  βk for k  1, . . . , n.

°n
Proof: If 
k 1 α k xk  °nk1 βk xk , then °nk1pαk  βk qxk  0 and αk  βk for k  1, . . . , n by Theorem
2.3.3. q

Definition 2.3.5 (basis). A finite set B of linearly independent vectors is said to be a basis for the space V
if rB s  V . The number of vectors |B | in B is called the dimension of V .

Example 2.3.6. What is the dimension of the space spanned by t1, cos2 x, sin2 x, cos 2xu?

Solution: This space has dimension 2, since we can write 1  cos2 x sin2 x and cos 2x  cos2 x 
sin2 x. q

Q UIZ T HEOREM 1 (from [1, Theorem 2.5.2, p. 21]). If a vector space V is generated by the set of k vectors
Sk  tv1 , . . . , vk u, then any set of k 1 vectors Tk 1  tw1 , . . . , wk 1 u in V must be linearly dependent.

Proof: The proof is by induction. For the base case k  1, S1  tv1 u and T2  tw1 , w2 u. There exist
scalars α1 , α2  0 such that w1  α1 v1 and w2  α2 v1 . Then we have α2 v1  α1 v2  0, so T is a linearly
dependent set.
Our inductive hypothesis is that if V  rSk1 s, then Tk is a linearly dependent set. We assume this is
°
true for arbitrary k and show that it holds for k 1. First, we can write wk 1  ki1 αi vi since tv1 , . . . , vk u
is a basis. Choose some j such that αj  0. If this cannot be done, then w1  0 and Tk 1 is a linearly
dependent set. Then let Sk1  Sk  tvj u. Now

vj  α1 wk 1 v
j

°k
for some v P rSk1 s, with v   bi vi . Substituting this into the expression below, we have

i 1
i j

ķ ķ
wm  β i vi  pβ i bi qvi wk 1

i 1 

i 1
i j
30 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

for m  1, . . . , k. By the inductive hypothesis, these k vectors w1 , . . . , wk are linearly dependent; that is,
°
there exists some linear combination ki1 γi wi  0 with not all of γ1 , . . . , γk equal to 0. But then we can
°
set γk 1  0 so that the linear combination kk1 γi wi also equals 0; again not all of γ1 , . . . , γk equal 0 so
the set Tk 1 is linearly dependent, thus proving the theorem. q
We conclude that if Sk is a basis for V , then any set of more than k vectors in V is linearly dependent
and cannot be a basis. Therefore, any two bases for a finite-dimensional vector space contain the same
number of elements.

Example 2.3.7. Show that the result does not extend to infinite-dimensional vector spaces.

Solution: The vector space V of convergent sequences is generated by the countably infinite set of linearly
independent elements S  tp1, 0, . . . q, p1, 1, 0, . . . q, p1, 1, 1, 0, . . . q, . . . u. If we remove the first element of
this set, we still have a countably infinite set S 1  S  tp1, 0, . . . qu, but this does not generate V since not
all convergent sequences tak u8 k1 have a1  a2 . q

2.4 Normed Vector Spaces

Definition 2.4.1 (normed vector space). A normed vector space, or normed linear space or normed linear
vector space, is a vector space X on which there is defined a real-valued function which maps every x P X
into a real number }x} called the norm of x. The norm satisfies the following axioms:

1. positive homogeneity: }αx}  |α|}x} for all α P R and all x P X.

2. triangle inequality: }x y} ¤ }x}}y} for all x, y P X.


3. positivity and positive definiteness: }x} ¥ 0 for all x P X and }x}  0 if and only if x  0.

A real-valued function p : X Ñ R that satisfies just the first two axioms is called a SEMINORM.
Note the following about each axiom:

positive homogeneity

• If you take a cab from your destination back to your starting point, you should pay the same
amount as when you went from your starting point to your destination.
• You cannot save money by breaking a cab ride along a single straight road into two pieces.
• The cab driver cannot offer bargain fares for extra-long rides.
• In order to convert x41 x42 into a norm, we must take the fourth root (this is the l4 -norm in 2
1
dimensions: }x}4  x41 x42 4
).
• If you draw a convex set that contains the origin and decree that the cab fare from the origin to
any point on the boundary of this set is $1 and that fare scales linearly with distance, you do not
necessarily have a norm. In fact, you only have a norm if the boundary is the set tx : }x}  cu
for some constant c.
2.4. Normed Vector Spaces U 31

triangle inequality

• You cannot save money by breaking a cab ride into two pieces.
• A cab driver can offer a 50% discount on the shorter dimension (}x  pu, v q}  12 minp|u|, |v |q
maxp|u|, |v |q). Say we have x  pa, bq and y  pc, dq. Assume without loss of generality that
minp|a|, |b|q  |a|. If minp|c|, |d|q  |c|, then clearly }x y} ¤ }x} }y}. If minp|c|, |d|q  |d|,
assume without loss of generality that minp|a c|, |b d|q  |a c|. Then




}x} }y}  }x y} 
1
2
|a| |b| |c| 1
2
|d|  1
2
|a c| |b d|

 12 p|a| |c|  |a c|q p|b| |d|  |b d|q


1
2
p|c|  |d|q ¥ 0,
where the inequality follows since each term in parentheses is nonnegative. Similarly, a cab
driver can also offer a 50% discount on the longer dimension.
• The set of vectors tx : }x} ¤ 1u is convex since

}λx1 p1  λqx2} ¤ λ}x1} p1  λq}x2} ¤ λ p1  λq  1.


positivity and positive definiteness

• The “if” part of positive definiteness is not an independent requirement: positive homogeneity
implies that }x}  0 if x  0 since we have }0}  }00}  |0|}0}  0.
• Positivity is not an independent requirement: we have

paq pbq
0  }0}  }x  x} ¤ }x} }  x} pcq }x} |  1|}x}  2}x},
where (a) follows from positive homogeneity as shown above, (b) follows from the triangle in-
equality, and (c) follows from positive homogeneity. Dividing both sides by 2 proves positivity.
• If the cab driver makes the longer dimension free, cab fare is still a norm (the l8 -norm).
1 1
In R2 , we have seen the l1 -, l2 -, l4 -, and l8 -norms so far, defined by |x1 |
|x2|, x21 x22 , x41 x42 , 2 4

and maxp|x1 |, |x2 |q, respectively. Before we are tempted to conclude that }x}p  pxp1 xp2 q is a norm
1
p

for all p ¥ 0, note that it is actually not a norm for p P r0, 1q. The “l0 -norm” does not satisfy positive
homogeneity, and the “lp -norm” for p P p0, 1q does not satisfy the triangle inequality (the unit disks for
these norms are not convex).

Example 2.4.2. Consider the set of infinite sequences of real numbers that have only finitely many nonzero
terms. Is this a vector space? If so, what is its dimension? Generalize the norm examples to this case. If
you remove the restriction to finitely many nonzero terms but insist that the norm be finite, which example
gives the largest vector space? The smallest?

Solution: Yes, this is a vector space since the sum of an infinite sequence with M ¤ 8 nonzero terms and
an infinite sequence with N ¤ 8 nonzero terms will have at most M N ¤ 8 nonzero terms, and any
multiple of an infinite sequence with M nonzero terms will still have only M nonzero terms. The vector
32 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

° 1
8 |ξ |p p . The
space is infinite-dimensional. The lp -norm (p P r1, 8q) for a sequence tξ1 , ξ2 , . . . u is i1 i
l8 -norm is maxi t|ξi |u8
i1 . If we remove the restriction to finitely many nonzero terms but insist that the
norm be finite, then the l8 -norm gives the largest vector space, while the l1 norm gives the smallest. q

Example 2.4.3. C ra, bs is the space of continuous functions on ra, bs, with norm }x}  maxtPra,bs |xptq|.
Confirm that this norm satisfies all the axioms.

Solution: We have

1. positive homogeneity:

}αx}  amax
¤t¤b
|αxptq|  |α| amax
¤t¤b
|xptq|  |α|}x}.

2. triangle inequality:

}x y }  max |xptq
¤¤
a t b
y ptq| ¤ max p|xptq|
¤¤
a t b
|yptq|q ¤ amax
¤t¤b
|x| max |y ptq|  }x}
¤¤
a t b
}y}.

3. positivity and positive definiteness: Clearly }x}  0 if only if xptq  0 for all t P ra, bs. }x} ¥ 0 since
|x| ¥ 0. q

Example 2.4.4. Give examples of functions, not in C r0, 1s, that can be included in the space if you choose
³1
the norm }x}  0 |xptq| dt.

Figure 2.4.1: Thomae’s function (Example 2.4.4). Image taken from Wikipedia.

Solution: The function xptq  sgn px  0.5q is discontinuous at the point x  0.5, but }x}  1. A more
interesting choice is Thomae’s function, also known as the popcorn function or the Riemann function:
#
1
t PQ.
p
xptq  q q
0 tPRQ

We show that }x}  0 (see Figure 2.4.1). Given any  ¡ 0, choose n such that 1{n   . In the interval
r0, 1s, there are only finitely many rational numbers with denominator at most n. Say there are dn of these
numbers. Surround each of these dn points with an interval of length {dn , and use the endpoints of these
intervals to form a dissection of the interval r0, 1s. The upper Darboux sum for this dissection is less than
2: within each of the dn intervals, whose combined length is , the function has an upper bound of 1, and
2.4. Normed Vector Spaces U 33

within each of the remaining intervals, whose combined length is less than 1, the function has an upper
bound of 1{n   . The lower Darboux sum is obviously 0 since every interval in the dissection contains
an irrational number. Since we have 0 ¤ }x} ¤ 2 for all  ¡ 0, }x}  0.
The norm }x} does not exist for the Dirichlet function, defined as
#
tPQ
x pt q  1 Q pt q 
1
,
0 tPRQ

is not Darboux-integrable (and therefore not Riemann-integrable) since every interval in any dissection
of r0, 1s will always contain at least one rational and at least on irrational number; therefore, the upper
and lower Darboux sums will not converge. If we use the Lebesgue integral, however, then }x} exists and
equals 0 since Q is countable. q

Definition 2.4.5 (total variation). A function x on ra, bs is said to be of bounded variation if there is a
constant K so that for any partition a  t0   t1        tn  b of ra, bs,

|xptiq  xpti1q| ¤ K.

i 1

We then define the total variation of x as


ņ »b
TVpxq  sup |xptiq  xpti1q|  |dxptq|,

i 1 a

where the supremum is taken with respect to all partitions of ra, bs.

Example 2.4.6. The space BVra, bs is defined as the space of all functions of bounded variation on ra, bs
together with the norm }x}  |xpaq| TVpxq. Why is this norm the appropriate choice? Is the function
xptq  sin 1
1
t in BVr0, 1s?

Figure 2.4.2: The function xptq  sin 1t does not have bounded variation (Example 2.4.6).

Solution: We check that }x} satisfies the three axioms for a norm:
34 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

1. positive homogeneity:
ņ ņ
}αx}  |αxpaq| sup |αxptiq  αxpti1q|  |α||xpaq| |α| sup |xptiq  xpti1q|  |α|}x}.

i 1 
i 1

2. triangle inequality:


}x y }  |xpaq y paq| sup |pxptiq y pti qq  pxpti1 q y pti1 qq|

i 1
ņ ņ
¤ |xpaq| |ypaq| sup |xptiq  xpti1q| sup |yptiq  ypti1q|,

i 1 
i 1

where we use the triangle inequality for the absolute value function.

3. positivity and positive definiteness: Clearly }x}  0 if xptq  0 for t P ra, bs. Furthermore, because
we include the term |xpaq| in the norm, }x}  0 only if xptq  0 for t P ra, bs. Finally, }x} ¥ 0 for all
functions x P BVra, bs since every term in the norm involves absolute values.

The function xptq  sin 1t is not in BVr0, 1s since there are infinitely many points t P r0, 1s at which
xptq  1. Specifically, as t Ñ 0, xptq oscillates between 1 and 1 infinitely often, as shown in Figure 2.4.2.
Therefore, xptq does not have bounded variation. q
We can think of xptq as the position of a car as a function of time for t P ra, bs, and that the car’s
odometer increases even when the car is backing up. Then the space BVra, bs is the set of functions for
which the change in the odometer reading is finite. This space is important because it will turn out to be
the dual space to C ra, bs.

2.5 Open and Closed Sets

A norm introduces a topology that may be more general than what the reader is used to.

Definition 2.5.1 (topology). A topology on a set X is a collection T of subsets of X having the following
properties:

1. ∅ and X are in T .

2. The union of the elements of any subcollection of T is in T .

3. The intersection of the elements of any finite subcollection of T is in T .

A set X for which a topology T has been specified is called a topological space.

Definition 2.5.2 (open set). If X is a topological space with topology T , we say that a subset U € X is an
open set of X if U belongs to the collection T . More generally, a topological space is a set X together with
a collection of subsets of X, called open sets, such that ∅ and X are both open, and such that arbitrary
unions and finite intersections of open sets are open.
2.5. Open and Closed Sets U 35

The most basic topology on a nonempty set X is the collection t∅, X u, so a topology can contain just
two open sets. Furthermore, it is permissible to have every set in a topology be open.
Here are some other definitions that are independent of any model for topology:

Definition 2.5.3 (closed set). A subset A of a topological space X is said to be closed if the set X  A is
open.

Definition 2.5.4 (interior and closure). Given a subset A of a topological space X, the interior of A, de-
noted Å, is defined as the union of all open sets contained in A (i. e. the largest open subset of A). The
closure of A, denoted Ā, is defined as the intersection of all closed sets containing A (i. e. the smallest
closed set containing the subset A).

Clearly if A  Å, then A is open, while if A  Ā, then A is closed.

Figure 2.5.1: a Web site model that illustrates a finite topology that does not involve norms

We use a Web site model to illustrate a finite topology that does not involve norms. Figure 2.5.1 shows
the links between the six Web pages in a set X. An “open set” is defined by the property that no page in
the set can be reached from a page from outside the set.

Example 2.5.5. For the Web site example, that both ∅ and X are open.

Solution: We cannot find a page in ∅ that can be reached from a page outside ∅ because there are no
pages in ∅. We cannot find a page outside X that links to a page in X because there are no pages outside
X. Therefore, ∅ and X are open. q

Example 2.5.6. For the Web site example, prove that the union of two open sets is open.

Solution: Let A and B be two open sets. By the definition of an open set, there are no links from pages
in X zA to pages in A, nor are there links from pages in X zB to pages in B. Since X zpA Y B q „ X zA
and X zpA Y B q „ X zB, there are no links from pages in X zpA Y B q to either pages in A or pages in B;
therefore, there are no links from pages in X zpA Y B q to pages in A Y B. Thus, A Y B is open. q
36 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Example 2.5.7. For the Web site example, prove that the intersection of two open sets is open.

Solution: Let A and B be two open sets. Suppose that A X B is not open. Then there must be some page
in X zpA X B q that links to a page in A X B. But X zpA X B q  pX zAq Y pX zB q, so we have found either
a page in X zA that links to a page in A or a page in X zB that links to a page in B. This contradicts our
assumption that A and B are open, so A X B is open. q
For the Web site model, we note the following to illustrate some of the topological concepts we have
defined:

• The nine open sets in X are ∅, t2u, t4, 5u, t1, 2, 3u, t2, 4, 5u, t4, 5, 6u, t2, 4, 5, 6u, t1, 2, 3, 4, 5u, and
t1, 2, 3, 4, 5, 6u  X.
• The interior of t2, 3u is t2u since t2u is the largest open subset of t2, 3u.

• The interior of t1, 2, 4, 5, 6u is t2, 4, 5, 6u.

• The nine closed sets are the complements of the nine open sets in X. Note that ∅ and X are both
open and closed sets (sometimes called clopen sets).

• The closure of t1u is t1, 3u since t1, 3u is the smallest closed set containing the subset t1u.

• The closure of t1, 6u is t1, 3, 6u since t1, 3, 6u is the smallest closed set containing the subset t1, 6u.

Now we define a topology using a norm.

Definition 2.5.8 (interior point). Let P be a subset of a normed vector space X. The point p P P is an
interior point of P if there exists some  ¡ 0 such that the ball B pp, qq  tx : }x  p}   u is a subset of
P.

Definition 2.5.9 (clsoure point). A point x P X is a closure point or limit point of a set P if, given any
 ¡ 0, there is a point p P P such that }x  p}   .

In other words, these definitions are saying that a point p is an interior point of P if we can always
surround p with a ball entirely contained within P , while a point x is a closure point of P if every ball
centered at x contains at least one point in P .

Proposition 2.5.10. P̄ is a closed set; that is, that the complement of P̄ is an open set.

(based on [2, Theorem 2.27, p. 35]): If p P X and p R P̄ , then p is neither a point in P nor a closure point
of P . Therefore, there exists some ball centered at p that does not contain any points in P . This shows
that the complement of P̄ in X is open, so P̄ is closed. q
2.6. Convergence, Limits, and Continuity U 37

Q UIZ T HEOREM 2 (first part of [1, Proposition 2.7.4, p. 25]). Let C be a convex set in a normed space. Then C̊
is convex.

Proof: If C̊ is empty, it is convex (see Proposition 2.2.2). Suppose x0 , y0 P C̊. Fix λ P p0, 1q: we must
show that z  λx0 p1  λqy0 is in C̊. Since x, y P C̊, there exists some  ¡ 0 such that the open
balls B px0 , q, B py0 , q are contained in C; that is, all vectors x0 w and y0 w with }w}    are in C.
Since C is convex, all convex combinations λpx0 wq p1  λqpy0 wq are in C. Furthermore, since
λpx0 wq p1  λqpy0 wq  z0 w, it follows that all points of the form z0 w are in C; that is, there
exists some  ¡ 0 such that the open ball B pz0 , q is contained in C. Therefore, z0 P C̊. q

Q UIZ T HEOREM 3 (second part of [1, Proposition 2.7.4, p. 25]). Let C be a convex set in a normed space. Then
C̄ is convex.

Proof: If C̄ is empty, it is convex (see Proposition 2.2.2). Suppose x0 , y0 P C̄. Fix λ P p0, 1q: we must
show that z0  λx0 p1  λqy0 is in C̄. Given any  ¡ 0, select x, y from C such that }x  x0 }    and
}y  y0}   . Since C is convex, z  λx p1  λqy is in C. Then by the triangle inequality,
}z  z0}  }λx p1  λqy  λx0  p1  λqy0} ¤ λ}x  x0} p1  λq}y  y0}   λ p1  λq  ,
so z0 is within a distance  of the point z P C. Since this is true for every  ¡ 0, z0 is a closure point of C̄
and is therefore in C̄. q

2.6 Convergence, Limits, and Continuity

Any topology is sufficient to define convergence.

Definition 2.6.1 (convergence (any topology)). The sequence txn u converges to x if, for every open set P
containing x, there exists an N such that for all n ¡ N , xn P P . We write xn Ñ x.

Example 2.6.2. For the Web site topology, show that the sequence 6, 5, 4, 6, 5, 4, 5, 4, 5, 4, . . . converges both
to 4 and to 5.

Solution: The smallest open set containing 4 is t4, 5u, and the smallest open set containing 5 is also t4, 5u.
Therefore, for every open set P containing 4 or 5, there exists some N such that for all n ¡ N , q
If we define open sets by a norm, we can formulate convergence in terms of norms.

Definition 2.6.3 (convergence (normed space)). The sequence txn u converges to x if, for every  ¡ 0, there
exists an N such that for all n ¡ N , }xn  x}   . As before, we write xn Ñ x.
38 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Q UIZ T HEOREM 4 ([1, Proposition 2.8.1, p. 27]). If a sequence converges, its limit is unique.

Proof: Suppose xn Ñ x and xn Ñ x1 . Then for every {2 ¡ 0, there exist N, N 1 such that for all n ¡N
and all n1 ¡ N 1 , }xn  x}   {2 and }xn  x1 }   {2. Then by the triangle inequality,

}x  x1}  }px  xmq  px1  xmq} ¤ }x  xm} }x1  xm} ¤ 


for m ¡ maxpN, N 1 q. Since this is true for any  ¡ 0, x  x1 . q

T HEOREM 2.6.4 ([1, Proposition 2.8.2, p. 27]). A set F is closed if and only if every convergent sequence with
elements in F has its limit in F .

Proof: For the “only if” direction, the limit of a sequence in F is obviously a closure point of F and
therefore must be in F if F is closed. For the “if” direction, suppose that F is not closed. Then there is a
closure point x of F that is not in F . In each of the open balls B px, 1{nq we may select a point xn P F since
x is a closure point. The sequence txn u generated in this way converges to some x R F , which contradicts
our assumption that every convergent sequence with elements in F has its limit in F . Therefore, F must
be closed. q

Definition 2.6.5 (transformation). Let X and Y be vector spaces and let D be a subset of X. A rule which
associates an element y P Y with every element x P D is a transformation from X to Y with domain D.

Definition 2.6.6 (injective). T is injective or one-to-one if T pxq  T pyq implies x  y.

Definition 2.6.7 (surjective). T is surjective or onto if for every y P Y , there exists at least one x such that
T pxq  y. In other words, the image of T equals its codomain.

Definition 2.6.8 (linear). T is linear if T pax byq  aT pxq bT pyq for any x, y P X and any scalars a, b P F.

Example 2.6.9. If X or Y is infinite-dimensional, then a linear transformation T cannot be represented by


a matrix. Show that the following transformations still qualify as linear:

• Both X and Y are the space of polynomial functions: T is differentiation.

• Both X and Y are the space of continuous functions on ra, bs: T is defined by
»b
T px q  k pt, τ qxpτ q dτ.
a

where k is a continuous function on ra, bs  ra, bs


2.6. Convergence, Limits, and Continuity U 39

³
Solution: Differentiation and integration are linear operators: Dx paf bg q  aDx f bDx g and paf pxq
³ ³ R
bg pxqq dx  a R f pxq dx b R g pxq dx. q

Definition 2.6.10 (continuity (any topology)). T : X ÑY is continuous if the inverse image U  T 1 pV q


of any open subset V „ Y is an open subset of X.

Notice that the inverse image is defined even if T is not invertible, and it is not necessarily a connected
set.
For a topology defined by a norm, we can formulate convergence in terms of norms.

Definition 2.6.11 (continuity (normed space)). A transformation T from a normed space X to a normed
space Y is continuous at x0 P X if for every  ¡ 0, there exists a δ ¡ 0 such that }x  x0 }   δ implies
}T pxq  T px0q}   .

Example 2.6.12 (based on [6, Theorem 18.1, p. 104]). Show that this is the same definition, specialized to
a topology defined by a norm. In other words, show that the definition of continuity for any topology
implies the definition of continuity for a normed space.

Solution: Suppose T is continuous at x0 and let V  ty : }T px0 q  y}   u be an open subset of Y . Then


there exists an open set U 1  T 1 pV q in X, and clearly, x0 P U . Furthermore, since U 1 is open, there exists
some δ such that U  tx : }x0  x}   δ u is a subset of U 1 . Then x P U implies T pxq P V . q
Note that the norm can be different in X and Y .

T HEOREM 2.6.13 ([1, Proposition 2.9.1, p. 28]). A transformation T mapping a normed space X into a normed
space Y is continuous at the point x0 P X if and only if xn Ñ x0 implies T pxn q Ñ T pbx0 q.

Proof: The “if” portion is obvious since we just set the δ in the definition of continuity to the  in the
definition of convergence. For the “only if” portion, let txn u be a sequence such that xn Ñ x0 but T pxn q Û
T px0 q. Then for some  ¡ 0 and every N , there is an n ¡ N such that }T pxn q  T px0 q} ¥ . Since xn Ñ x0 ,
this implies that for every δ ¡ 0 there is a point xN such that }xn  x0 }   δ } and }T pxn q  T px0 q} ¥ . But
this contradicts our assumption that T is continuous at x0 , so we must have T pxn q Ñ T px0 q. q
Theorem 2.6.13 suggests the following practical test for lack of continuity. If you want to show that T
is not continuous at x0 :

• Choose some suitable fixed  (smaller works better)

• Construct a sequence xn with the following properties:

– xn converges to x.
– For all n ¡ N , ||T pxn q  T pxq|| ¥ .
1
Having xn scale like n is often convenient.
40 U 2. P RELIMINARIES IN A LGEBRA , T OPOLOGY, AND A NALYSIS

Example 2.6.14. Use this test to show that the function

x2 y
f px, y q  , f p0, 0q  0
x4 y 2

is discontinuous at px, y q  p0, 0q and that the proof is valid for the l2 norm, the l1 norm, or the l8 norm
in R2 .

Solution: Let pxn , yn q  p1{n, 1{n2 q. Then


1 
lim f pxn , yn q  lim n4
 12  0  f p0, 0q  f lim xn , lim yn ,
nÑ8 nÑ8 1
n4
1
n4
n Ñ8 nÑ8

so f is discontinuous at px, y q  p0, 0q. q


C HAPTER 3
BANACH SPACES

Reading: [1, Sections 2.10 – 2.15]

3.1 lp Space

Definition 3.1.1 (lp space). For p ¥ 1, the space lp consists of all sequences of scalars tξ1, ξ2, . . . u such for
which

|ξi|ρ   8.

i 1

The norm of an element x  tξi u P lp is defined as


 1
8̧ p

}x}p  |ξi| p


i 1

The space l8 consists of bounded sequences. The norm of an element x  tξiu P l8 is defined as }x}8 
supi |ξi |.

T HEOREM 3.1.2 (Cauchy-Schwarz Inequality). If x  tξi u, y  tηi u P l2 , then


 1  1
8̧ 8̧ 2 8̧ 2

|ξiηi| ¤ ξi2 ηi2  }x}2}y}2.



i 1 i 1  
i 1

Proof: First, we show that the geometric mean is less than or equal to the arithmetic mean. By elementary
algebra, we have 0 ¤ px  y q2 for any x, y P R. Then xy ¤ 21 x2 2 y . Let a  x , b  y . Then
1 2 2 2

a 2 b 2 ¤ 12 a 12 b.
1 1

Now consider x  tξi u, y  tηi u P l2 . We set a  |ξi |2 {}x}2 , b  |ηi |2 {}y}2 and use the above inequality
for each component i. Summing over all i gives
8̧ |ξiηi| ¤ 8̧  1 |ξi|2 1 |ηi |2

}x}2}y}2 i1 2 }x}2 2 }y}2


 21 }}xx}}2 1 }y}2
2 }y}2
 1.
i1 2

°8
Then  |ξi ηi | ¤ }x}2 }y}2 .
i 1 q
As an aside, while Augustin Cauchy first published the inequality for sums in 1821, the corresponding
inequality for integrals was first stated by Viktor Bunyakovsky in 1859, well before Hermann Schwarz’s

41
42 U 3. B ANACH S PACES

rediscovery of the inequality in 1888. This fact illustrates Stigler’s Law of Eponymy: “No scientific discov-
ery is named after its original discoverer.” The law, proposed by University of Chicago statistics professor
Stephen Stigler in 1980, applies to itself since long before Stigler formulated it, A. N. Whitehead noted
that “Everything of importance has been said before, by someone who did not discover it.” Sigler himself
attributed “his” law to the sociologist Robert K. Merton, famous for coining the phrases “self-fulfilling
prophecy” and “role model” – as well as for being the father of Harvard economist Robert C. Merton,
co-discoverer of the Black-Scholes(-Merton) option-pricing formula.
Before proving the Hölder inequality, a generalization of the Cauchy-Schwarz inequality, we provide
a motivating example.

Example 3.1.3. Consider a specific vector x  tξi u P X, where ξi  1{i0.3 . What is the smallest integer p
° °8
for which the norm }x}p is finite? If ηi  1{i0.3 , does 8i1 |ξi ηi | converge? If ηi  1{i , does
0.9
i1 |ξi ηi |
converge?

°
Solution: Since the series 8 1 1{i converges for p ¡ 1, we want to find the smallest integer p such that
p
°i8
0.3p ¡ 1. Therefore, p  4. i1 |ξi ηi | does not converge if ηi  1{i0.3 , but it does if ηi  1{i0.9 . q
We can view ηi as an infinite set of coefficients that define a linear functional on X. If ξi  1{ip and
°
ηi  1{iq , then a sufficient condition for 8 i1 |ξi ηi | to converge is 1{p 1{q ¡ 1.
Cauchy-Schwarz was a special situation, because when we use the l2 -norm in X, we also want the
same l2 norm in the dual space of linear functionals on X. In general, if we use the lp -norm in X and want
°
the linear functional 8 i1 |ξi ηi | to converge, we should use the lq -norm, where p
1
q  1, on the dual
1

space of linear functionals. Therefore we need to prove the Hölder inequality. Before proving the main
inequality, first we derive an auxiliary inequality.

Proposition 3.1.4. For a, b ¥ 0 and 0   λ   1,

aλ b1λ ¤ λa p1  λqb

Proof: Consider the function f ptq  tλ  λt λ  1 for t ¥ 0. Then f 1 ptq  λptλ1  1q. Since 0   λ   1,
we have f 1 ptq ¡ 0 for 0   t   1 and f 1 ptq   0 for t ¡ 1. Therefore, f ptq ¤ f p1q  0 for t ¥ 0, so

tλ ¤ λt 1λ

for t ¥ 0. If b  0, substituting t  a{b and multiplying both sides by b gives the desired inequality. If
b  0, the inequality is obvious. q

Q UIZ T HEOREM 5 (Hölder Inequality ([1, Theorem 2.10.1, p. 29])). If p and q satisfy 1
p
1
q  1, 1 ¤ p ¤ 8,
1 ¤ q ¤ 8, and x  tξi u P lp , y  tηi u P lq , then

|ξiηi| ¤ }x}p}y}q .

i 1
3.1. lp Space U 43

Equality holds if and only if



|ξi|
  |ηi|

1 1
q p

}x}p }y}q
for all i.

Proof: First consider the case p  8, q  1. Then we have


8̧ 8̧ 



 8̧
|ξiηi| ¤  max ξi ηi 
  | |  max |ξi| |ηi|  }x}p}y}q .

i 1 
i 1
i i

i 1

The case p  1, q
 8 is equivalent.
Now consider the cases 1   p   8, 1   q   8. Set a  |ξi |p {}x}p , b  |ηi |q {}y}q , and λ  1{p, and use
the above inequality for each component i. Summing over all i gives
8̧ |ξiηi| ¤ 8̧  1 |ξi|p

}x}p}y}q i1 p }x}p


1 |ηi |q
q }y}q
 p1 }}xx}}p 1 }y}q
q }y}q
 1.
i1 p

°8
Then  |ξi ηi | ¤ }x}p }y}q .
i 1 q

We can use the Hölder inequality to generalize the Euclidean triangle inequality to the Minkowski
inequality.

Q UIZ T HEOREM 6 (Minkowski Inequality ([1, Theorem 2.10.2, p. 31])). If x, y P lp with 1 ¤ p ¤ 8, then so
is x y, and }x y}p ¤ }x}p }y}p . For 1   p   8, equality holds if and only if x  ky for some positive
constant k.

Proof: For p  1,
8̧ 8̧ 8̧
}x y }1  |ξi ηi | ¤ |ξi| |ηi|  }x}1 y}1 ,

i 1 
i 1 i 1 
where we apply the triangle inequality to each term in the summation. For p  8,

}x y}8  max
i
|ξi ηi | ¤ max |ξi |
i
max |ηi |  }x}8
i
}y}8.

For 1   p   8, we first consider finite sums. We have


ņ ņ ņ ņ
p}x y }p q p  |ξi ηi |p  |ξi ηi |p1 |ξi ηi | ¤ |ξi ηi |p1 |ξi | |ξi ηi |p1 |ηi |.

i 1 
i 1 
i 1 
i 1
44 U 3. B ANACH S PACES

Now apply Quiz Theorem 5 to both sums, treating t|ξi ηi |p1 u as an element of lq , with 1
p
1
q  1. Then
 1  1  1
ņ ņ ņ ņ
ηi |pp1qq
q p p

|ξi ηi |p ¤ |ξi  |ξi|p |ηi|p



i 1 i 1  
i 1 
i 1
 1  1  1
ņ q ņ p ņ p

 |ξi ηi |p  |ξi|p |ηi|p ,


i 1  
i 1 
i 1

°n
since pp  1qq  q {q  1. Dividing both sides of the above equation by p  |ξi ηi |p q q and remembering
1
i 1
that 1  1q  p1 , we have
 1  1  1
ņ p ņ p ņ p

|ξi ηi | p
¤ |ξi| p
|ηi| p
.
i 1  
i 1 
i 1

Letting n Ñ 8 on the right-hand side can only increase its value, so


 1
ņ p

|ξi ηi |p ¤ }x}p }y}p.



i 1

Note that this holds for all n; therefore, we can also let n Ñ 8 on the left-hand side to obtain the desired
result. The conditions for equality follow from the Hölder inequality. q

3.2 Lebesgue Integration

Definition 3.2.1 (volume zero). A set S € R has volume zero if, for any  ¡ 0, S can be contained in the
union of a finite set of intervals with total length less than .

Definition 3.2.2 (measure zero). A set S € R has measure zero if, for any  ¡ 0, S can be contained in the
union of a countably infinite set of open intervals with total length less than .

Example 3.2.3 (rational numbers). Show that the set Q Xr0, 1s has measure zero but does not have volume
zero.

Solution: Since Q X r0, 1s is countable, we can enumerate the elements of the set as q1 , q2 , . . . . For any
 ¡ 0, define the open interval Ik  pqk  2k , qk 2k q. The sum of the lengths of these intervals is .
Therefore, Q Y r0, 1s has measure zero.
Assume Q X r0, 1s can be contained in the union C  Yni1 Ci , where n   8 and maxi |Ci | ¤ δ. Then
|C | ¤ nδ. For nδ   1, there exists an interval r0, 1s  C with length 1  nδ ¡ 0; since Q is dense in R, there
must be an element of Q X r0, 1s in this interval. Therefore, Q X r0, 1s does not have volume zero. q
The proof that Q X r0, 1s has measure zero works for any countably infinite set. However, there also
exist sets of measure zero that are uncountably infinite. The Cantor set is one famous example:
3.2. Lebesgue Integration U 45

Example 3.2.4 (Cantor set). Define the Cantor set as follows:

• Step 0: Let the Cantor set be the interval r0, 1s.

• Step i: Remove the middle third of every interval in the Cantor set at Step i  1.

Show that the Cantor set has volume, and therefore measure, zero.

Solution: At Step i, the Cantor set contains 2i intervals with total volume p2{3qi . For every , there exists
some i such that p2{3qi   . Then we can contain the Cantor set in the 2i   8 intervals with total length
less than . q

Figure 3.2.1: the Cantor set (Example 3.2.4). Image taken from Wikimedia Commons.

As an interesting aside, note that the Cantor set is in one-to-one correspondence with r0, 1s. Consider a
point x in the Cantor set. That point is in r0, 1s, a left or right subinterval of r0, 1s, a left or right subinterval
of that subinterval, and so on (see Figure 3.2.1). We can encode the position of x by the binary expansion
0.b1 b2 . . . , where bi equals 0 if x is in the left subinterval at Step i and 1 if x is in the right subinterval at
Step i.
If f and g are Riemann (or, equivalently, Darboux) integrable, then so is f g, and it follows by
induction that the sum of any finite series of Riemann integrable functions is integrable. The Lebesgue
integral extends this idea to a function that is the sum of an infinite series of functions.

°8
Definition 3.2.5 (Lebesgue integral). Suppose that f pxq  
k 1 fk pxq. Further suppose that the sum
8̧ »b
A |fk pxq| dx

k 1 a
°8
converges. Then f pxq is defined, in the sense that 
k 1 fk pxq converges, except perhaps on a set of mea-
sure zero. Furthermore,
8̧ »b »b
fk pxq dx  f pxq dx

k 1 a a

is the Lebesgue integral of f pxq on ra, bs.

Note that if f is Riemann integrable, then we can write f1 pxq  f pxq and fk pxq  0 for k ¡ 1, so the
Lebesgue integral of f equals the Riemann integral.
The following proposition extends the result that if functions f and g are equal except on a set of
volume zero, their Riemann integrals are equal:
46 U 3. B ANACH S PACES

°8
Proposition 3.2.6. If f pxq  
k 1 fk pxq and gpxq  °8k1 gk pxq are equal except on a set of measure zero, then
»b »b
f pxq dx  g pxq dx.
a a

Example 3.2.7 (Dirichlet function). Show that the Dirichlet function 1xPQXr0,1s is Lebesgue integrable (with
integral zero) but not Riemann integrable.

Solution: We showed in Example 3.2.3 that Q X r0, 1s has measure zero, so the Lebesgue integral of the
Dirichlet function is zero. To show that the Dirichlet function is not Riemann integrable, consider any
dissection of r0, 1s. Each interval in this dissection will contain a rational number and a nonrational
number since Q is dense in R. Therefore, the upper Darboux sum will always be 1, while the lower
Darboux sum will always be zero. Therefore, the Darboux integral, and hence the Riemann integral, does
not exist. q
The improper integral of an unbounded function cannot be defined as a Riemann integral, since the
upper Darboux sum does not exist. However, Lebesgue integration solves the unbounded function prob-
lem:

Example 3.2.8 (unbounded function). Evaluate the Lebesgue integral


»1 »1
?1x dx  alim
Ñ0
?1x dx.
0 a

Solution: Let #
fk pxq 
?1x 1
2k
  x ¤ 2 1 k 1
.
0 otherwise
Then 
k1
» 
k
 |fk pxq| dx  2 1
 1
2 2
Ik ;
R 2 2
°8

k 0 Ikis a telescoping series and evaluates to 2. Since |fk pxq|  fk pxq for all k and x, the given integral
also equals 1. q
Lebesgue integration also solves the unbounded interval problem:

Example 3.2.9 (unbounded interval). Evaluate the Lebesgue integral


»8
ex dx.
0
3.3. Lp Space U 47

Solution: Let #
ex k1¤x k
fk pxq  .
0 otherwise
Then »
Ik  |fk pxq| dx  epk1q  ek ;
R
°8
k0 Ik is a telescoping series and evaluates to 1. Since |fk pxq|  fk pxq for all k and x, the given integral
also equals 1. q

3.3 Lp Space

Definition 3.3.1 (Lp space). For p ¥ 1, the space Lp ra, bs consists of all real-valued measurable functions x
on ra, bs for which |xptq|p is Lebesgue integrable. The norm of an element x P Lp ra, bs is defined as
» b
p1
}x}p  |xptq| p
dt .
a

Of course, }x}p  0 does not necessarily imply that xptq  0 since xptq may be nonzero on a set of measure
zero, as we saw in Example 3.2.7. Therefore, we do not distinguish between functions that are equal
almost everywhere and redefine 0 as the equivalence class of all functions that equal zero except on a set
of measure zero.
Note that we can define the Rp spaces similarly, replacing “Lebesgue integrable” with “Riemann in-
tegrable”. While most of the functions we see in these notes will be Riemann integrable, we prefer to
work with the Lp spaces since they are Banach spaces (as we will see shortly), allowing us to use a strong
theoretical framework.
A tricky special case is L8 ra, bs. The obvious definition is }x}p  suptPra,bs |xptq|, but this is ambiguous
since an element of x corresponds not to a single function but to an equivalence class of functions that
differ on a set of measure zero. Therefore, we define the norm as

}x}8  inf
p qxptq
sup |y ptq|  ess sup |y ptq|,
y t
almost everywhere
Pr s
t a,b Pr s
t a,b

where ess sup denotes the essential supremum.


Results analogous to Quiz Theorems 5 and 6 hold for the Lp spaces:

T HEOREM 3.3.2 (Hölder Inequality ([1, Theorem 2.10.3, p. 32])). If p and q satisfy 1
p
1
q  1, 1 ¤ p ¤ 8,
1 ¤ q ¤ 8, and x P Lp ra, bs, y P Lq ra, bs, then
»b
xptqy ptq dt ¤ }x}p }y }q .
a
48 U 3. B ANACH S PACES

Equality holds if and only if 


|xptq|
p   |yptq|
q
}x}p }y}q
almost everywhere on ra, bs.

Proof: The proof is essentially the same as that of Quiz Theorem 5. q

T HEOREM 3.3.3 (Minkowski Inequality ([1, Theorem 2.10.4, p. 33])). If x, y P Lpra, bs with 1 ¤ p ¤ 8, then
so is x y, and }x y }p ¤ }x}p }y }p .

Proof: Consider the function f ptq  tp . Clearly this function is convex on r0, 8q for p ¥ 1 since f 1ptq 
ptp1 and f 2 ptq  ppp  1qtp2 ¥ 0 for p ¥ 0 and t ¥ 0. Define

aptq 
|xptq| bptq 
|yptq| λ
}x}p .
}x}p }y }p }x}p }y}p
Then
|xptq yptq|p p¤aq |xptq|p |yptq|p  pλaptq p1  λqbptqqp p¤bq λpaptqqp p1  λqpbptqqp,
p}x}p }y}pqp p}x}p }y}pqp
where (a) follows from the triangle inequality and (b) follows from the convexity of tp . Then integrating
over ra, bs gives 
}x y}p
p ¤ λ p1  λq  1,
}x}p }y}p
yielding the desired result. q

3.4 Cauchy Sequences

Definition 3.4.1 (Cauchy sequence). A sequence txn u in a normed space is a Cauchy sequence if for any
 ¡ 0, there exists an integer N such that }xm  xn }    for all n, m ¡ N .

Example 3.4.2. Show that the partial sums over the vector space Q


sn  p1qi 2i 4 1

i 0

form a Cauchy sequence.

Solution: We recognize that



tan1 x  p1q2i 1 1
,

i 0
2i 1
3.4. Cauchy Sequences U 49

so sn Ñ 4 tan1 1  π. Without loss of generality, let n ¡ m. Then for n, m ¡ N ,


|sn  sm|  sn  sm  π  sm Ñ 0.
Note that since π R Q, the series does not converge. The obvious fix is to let tsn u be a Cauchy sequence
over the vector space R. q

Q UIZ T HEOREM 7 (includes [1, Lemma 2.11.1, p. 35]). Every convergent sequence is a Cauchy sequence. Fur-
thermore, every Cauchy sequence is bounded.

Proof: If a series txn u converges to x, then for every  ¡ 0, there exists an N such that for every n ¡ N ,
}xn  x}   . Then we have
}xm  xn}  }pxm xq  pxn  xq} ¤ }xm x} }xn  x} ¤ 2,
where the first inequality follows from the triangle inequality. If we set 1  2, we have shown that for
every 1 ¡ 0, there exists an N such that for every m, n ¡ N , }xm  xn }   1 . Therefore, txn u is a Cauchy
sequence.
To show that every Cauchy sequence is bounded, pick an N such that }xn  xN }   1 for n ¡ N . Then
for any n ¡ N ,
}xn}  }xn  xN xN } ¤ }xn  xN } }xN }   1 }xN },
where the first inequality follows from the triangle inequality. Since }xn }   }xN } for any n ¡ N , txnu is
bounded. q

Definition 3.4.3 (Banach space). A normed linear vector space X is complete if every Cauchy sequence in
X has a limit in X. A complete normed linear vector space is a Banach space.

From Example 3.4.2, we see that Q is not complete.

Example 3.4.4 ([1, Example 2.11.2, p. 34]). Let X be the vector space of sequences of real numbers with
only finitely many nonzero components with the l8 norm and define
" *
xn  1, ,    ,
1
2
1
n1
, 0, 0,    .

Show that txn u is a Cauchy sequence. How would you “complete” X to make this Cauchy sequence
convergent in X?

Solution: We have }xn  xm }  maxp1{n, 1{mq Ñ 0. Clearly txn u does not converge to an element of
X, but we can “complete” X by turning it into the vector space of all sequences of real numbers that
converge to zero. q
50 U 3. B ANACH S PACES

Example 3.4.5. Let X  R1r0, 1s and define


#
0 t¤ 1
xn ptq 
0 n
.
?1t 1
n  t 1
Show that txn u is a Cauchy sequence. To what function does this sequence converge? Is it in R1 r0, 1s, and
if not, of what larger space is it an element?

Solution: The function converges to xptq  t1{2. Because x is unbounded, x R R1r0, 1s. However, from
Example 3.2.8, we see that x P L1 r0, 1s. q

Q UIZ T HEOREM 8 ([1, Example 2.11.4, p. 35]). lp , for 1 ¤ p ¤ 8, is a Banach space.

Proof: First consider the case 1 ¤ p   8. Let txn u be a Cauchy sequence in lp . Then if xn  tξinu,
 1

|ξkn  ξkm|  |ξkn  ξkm|p ¤
p

|ξin  ξim|p Ñ 0,
1
p


i 1

so for each k, tξkn u is a Cauchy sequence and therefore converges to a limit ξk . We show that x  tξk u P lp .
From Quiz Theorem 7, there exists some M such that }xn } ¤ M for all n, so

¸
j
|ξin|p ¤ }xn}p ¤ M p.

i 1

Since the sum on the left is finite, the inequality holds as n Ñ 8:

¸
j
|ξi|p ¤ M p.

i 1

Furthermore, since the inequality holds uniformly in j,



|ξi|p ¤ M p,

i 1

so x P lp , and }x} ¤ M . We still must show that xn Ñ x. For any  ¡ 0, there exists an N such that
¸
j
|ξin  ξim|p ¤ }xn  xm}p ¤ 

i 1

since txn u is a Cauchy sequence. Since the sum on the left is finite, the inequality holds as m Ñ 8:

¸
j
|ξin  ξi|p ¤ }xn  x|p ¤ .

i 1
3.4. Cauchy Sequences U 51

Furthermore, since the inequality holds uniformly in j,



|ξinξi|p  }xn  x}p ¤ 

i 1

for n ¡ N , so xn Ñ x.
Now let txn u be a Cauchy sequence in l8 . Then |ξkn  ξkm | ¤ }xn  xm } Ñ 0, so for each k, as before,
tξknu is a Cauchy sequence and therefore converges to a limit ξk . Furthermore, this convergence is uniform
in k. We show that x  tξk u P l8 . Since tξk u is Cauchy, from Quiz Theorem 7 there exists an M such that
for all n, }xn } ¤ M . Then |ξkn | ¤ }xn } ¤ M for all k and n, so x P l8 and }bx} ¤ M . Since ξkn Ñ ξk
uniformly, xn Ñ x. q

Example 3.4.6 ([1, Example 2.11.3, p. 35]). Show that C r0, 1s is a Banach space.

Solution: Let txn u be a Cauchy sequence in C r0, 1s. For each fixed t P r0, 1s, |xn ptq xm ptq| ¤a
}xn  xm} Ñ
0, so txn ptqu is a Cauchy sequence of real numbers. Since E (R with the Euclidean norm px  y q2 
|x  y|) is complete, there exists a real number xptq such that xnptq Ñ xptq. Therefore, the functions xn
converge pointwise to x. This pontwise convergence is actually uniform in t P r0, 1s; that is, for any  ¡ 0,
there exists an N such that |xn ptq  xptq|    for all t P r0, 1s and n ¥ N . For  ¡ 0, choose N such that
}xn  xm}   {2 for n, m ¡ N , which we can do since txnu is a Cauchy sequence. Then for n ¡ N ,
|xnptq  xptq| ¤ |xnptq  xmptq| |xmptq  xptq| ¤ loooomoooon
}xn  xm} }xm  x}.
¤{2
By choosing m sufficiently large, we can make the second term less than {2 since xm Ñ x pointwise, so
|xnptq  xptq|    for n ¡ N .
We must still show that x is continuous and therefore in C r0, 1s and that xn Ñ x in the norm of C r0, 1s.
To prove continuity, choose  ¡ 0. For all t, δ, n,

|xpt δ q  xptq| ¤ |xpt δ q  xn pt δ q| |xnpt δ q  xn ptq| |xnptq  xptq|.


By the uniform convergence of txn u, we can choose n such that the first and last terms are less than {3.
Since xn is continuous, δ can be chosen such that the middle term is less than {3. Thus, x is continuous.
Since xn ptq Ñ xptq uniformly, xn Ñ x in the norm of C r0, 1s. q

Example 3.4.7 ([1, Example 2.11.1, p. 35]). Show that the space of continuous functions on r0, 1s, with the
norm » 1
}x}  |xptq| dt,
0
is not a Banach space.
52 U 3. B ANACH S PACES

Figure 3.4.1: Plots of x2 ptq, x3 ptq, and x4 ptq (Example 3.4.7).

Solution: We must construct a Cauchy sequence of functions that convergence to a discontinuous func-
tion. Consider the sequence txn u, where
$
'
'
&0 0¤t   n1 1
2
xn ptq  nt  n
1 2  n ¤t  2
1 1 1 .
'
'
2
%1 t ¥ 12

Figure 3.4.1 shows plots of x2 ptq, x3 ptq, and x4 ptq. Then


» 1    

  
 1 1  1  1 1 
} x n x m }  
 n
 mt 
m 
 1
p 1q  f rac1n  1
  
2
 nt 1 1  dt ,
1
 n1 2 2 2 2 2 m  2 m n
2

which clearly goes to zero as n, m Ñ 8. Therefore, txnu is a Cauchy sequence. However, xn Ñ x, where
x is the discontinuous function #
0¤t  1
xptq 
0 2
.
1 1
2 ¤t¤1
Therefore, the given space is not a Banach space. q

T HEOREM 3.4.8 ([1, Theorem 2.12.1, p. 38]). In a Banach space X, a subset S „ X is complete if and only if it is
closed.

Proof: If S is complete, then every Cauchy (and hence, by Quiz Theorem 7, convergent) sequence in S
converges to a point in S, so S is closed. Furthermore, since X is a Banach space, any Cauchy sequence in
S must converge to a point in x P X, and if S is closed, then x P S. Hence, if S is closed, then S must be
complete. q

T HEOREM 3.4.9 ([1, Theorem 2.12.2, p. 38]). In a normed linear vector space, any finite-dimensional subspace is
complete.

Proof: The proof is by induction on the dimension of the subspace. A 1-dimensional subspace is complete
since all elements in the subspace can be written in the form x  αe, where α is a real number and e is
3.5. Compactness and Extrema U 53

a fixed vector. Convergence of a sequence tαn eu is equivalent to convergence of the sequence of real
numbers tαn u; this sequence is convergent since E is complete.
Now assume that the theorem is true for subspaces of dimension N  1. Let X be a normed space and
let M be an N -dimensional subspace of X. We show that M is complete. Let te1 , . . . , eN u be a basis for
M . Define  
 
 Ņ 
δk  inf ek

a1 ,...,aN 
 
αj ej  ,

k  1, . . . , N.
 j 1 

j k

δk is the distance from ek to the subspace Mk generated by the remaining N  1 basis vectors. We must
have δk ¡ 0 otherwise a sequence of vectors in the (N  1)-dimensional subspace Mk could be constructed
converging to ek R Mk , which contradicts the induction hypothesis that Mk is complete.
Define δ  mink δk and let txn u be a Cauchy sequence in M . Each xn has a unique representation


xn  λni ei .

i 1

Then for arbitrary n, m,


 
 Ņ 
}xn  xm}  


pλni  λmiq 
ei 

¥ |λnk  λmk|δ, k  1, . . . , N ;
i 1 
since }xn  xm } Ñ 0, |λnk  λmk |Ñ 0 for each k. Then for each k, tλnk u is a Cauchy sequence and therefore
°n
converges to a limit λk . Let x  k1 λk ek P M . We show that xn Ñ x. For all n,
 
 ņ 
}xn  x}  


p  λk q
λnk

ek 

¤ N  1¤max
k¤N
|λni  λk |}ek },
k 1 
but since |λnk  λk | Ñ 0 for all k, }xn  x} Ñ 0. Thus, xn Ñ x. q

3.5 Compactness and Extrema

Definition 3.5.1 (compact). A set K in a normed space X is compact if, given an arbitrary sequence txi u
in K, there is a subsequence txin u that converges to an element x P K.

By the Bolzano-Weierstraß theorem, “compact” is equivalent to “closed and unbounded” in finite-dimensional


vector spaces.
Note that if K is compact, then every Cauchy sequence converges to a limit in K, so K is also complete.
We can now prove the extreme value theorem (see Theorem 1.2.1 for an equivalent statement):

Definition 3.5.2 (upper semicontinuous). A real-valued functional f defined on a normed space X is


upper semicontinuous at x0 if for any  ¡ 0, there exists a δ ¡ 0 such that }x  x0 }   δ implies f pxq 
f px0 q    (note the absence of the usual absolute value signs).
54 U 3. B ANACH S PACES

T HEOREM 3.5.3 (extreme value theorem [1, Theorem 2.13.1, p. 40]). An upper semicontinuous functional on
a compact subset K of a normed linear vector space X achieves a maximum on K.

Proof: Let M  supxPK f pxq (we allow the possibility that M  8). There is a sequence txi u in K such
that f pxi q Ñ M . Since K is compact, there is a convergent subsequence xin Ñ x P K. Clearly f pxin q Ñ M
and since f is upper semicontinuous, f pxq ¥ limnÑ8 f pxin q  M . But since K is compact, f is bounded,
so M   8 and then f pxq  M . q

Example 3.5.4 ([1, Example 2.13.1, p. 40]). Consider the linear functional
» 1 »1
f pxq  xptq dt  xptq dt,
2

1
0 2

defined for x P C r0, 1s. Show that f is continuous. What function on the unit sphere in C r0, 1s maximizes
f ? Is this function an element of C r0, 1s?

Solution: We have
» 1 »1 

|f pxq  f px0q|  pxptq  x0ptqq dt  pxptq  x0ptqq dt ¤ 21 }x  x0}   21 }x  x0}  }x  x0}.


2

1
0 2

Then for every  ¡ 0, there exists a δ   such that }x  x0 }   δ implies |f pxq  f px0 q|   . The unit ball
in C r0, 1s is the set of continuous functions x such that 1 ¤ xptq ¤ 1 for all t P r0, 1s. The supremum of f
over the unit ball in C r0, 1s is clearly 1, but no continuous function x with }x} ¤ 1 achieves this supremum.
(If we instead considered L8 r0, 1s, then the function
#
0¤t  1
x pt q 
1 2
1 1
2 ¤t¤1
achieves the supremum.) This shows that the unit ball in C r0, 1s is not compact. q
In fact, the unit ball is not compact in any infinite-dimensional normed linear vector space:

T HEOREM 3.5.5. In any infinite-dimensional normed linear vector space X, the unit ball is not compact.

Proof: We construct an infinite sequence of linearly independent unit vectors zn such that for all n, m,
}zn  zm} ¡ 1{2. Let tz1, . . . , zn1u be a set of n  1 such vectors, and let Y be the closed subspace
generated by these vectors. Y is a proper subspace of X since X is not finite-dimensional. Therefore,
we can choose x P X zY and let d  inf YPY }x  y}. Since Y is finite-dimensional, it is complete by
Theorem 3.4.9. If d were zero, then there would exist a sequence of vectors in Y that converged to x R Y ,
contradicting our hypothesis that Y is finite-dimensional and hence complete. Therefore, d ¡ 0.
Choose some y0 P Y such that d   }x  y0 }   2d, and let

zn  }xx  yy0} .
0
3.6. Quotient Spaces U 55

Then by construction, }zn }  1. Now for any y P Y ,



}zn  y} 
x
  y0  y}x  y0}  .
 } x  y0 } 

The numerator is greater than d and the denominator is less than 2d, so the fraction is greater than 1{2.
Since the above equation holds for all y P Y , }zn  zi } ¡ 1{2 for all i   n. By repeating this process for
n ¥ 2, we construct an infinite sequence of vectors in the unit ball (specifically on the unit sphere), such
that }zn  zm } ¡ 1{2 for all n, m. Thus, this sequence has no Cauchy subsequence and hence, by Quiz
Theorem 7, no convergent subseqeunce. Therefore, the unit ball is not compact. q

3.6 Quotient Spaces

Definition 3.6.1 (equivalence). Let M be a subspace of a vector space X. Two elements x1 , x2 P X are
equivalent modulo M if x1  x2 P M . In this case, we write x1  x2 .

This equivalence relation partitions X into disjoint subsets, called cosets, of equivalent elements. Geo-
metrically, these cosets are linear varieties that are distinct translations of the subspace M . Each x P X
belongs to a unique coset of M , denoted rxs.

Definition 3.6.2 (quotient space). Let M be a subspace of a vector space X. The quotient space X {M
consists of all cosets of M with addition defined by rx1 s rx2 s  rx1 x2 s and scalar multiplication
defined by arxs  raxs.

Definition 3.6.3 (norm in a quotient space). If X is a Banach space and M is a closed subspace of X, then
the norm of a coset rxs P X {M is defined as

}rxs}  minfPM }x m}

Example 3.6.4. Show that this definition satisfies the axioms for a norm.

Solution: Positive homogeneity follows since

}αrxs}  }rαxs}  minfPM }αx m}  inf


P
αm M
}αx αm}
paq pbq
 minfPM }αx αm}  |α|}rxs},
where (a) follows because M is a linear subspace, so finding the infimum over m P M is the same as
finding the infimum over αm P M and (b) follows by the positive homogeneity of the norm }  } in X.
56 U 3. B ANACH S PACES

The triangle inequality follows since

}rxs rys}  }rx ys}  inf


m M P
}x y m}  inf
P
2m M
}x y 2m}

paq pbq
 minfPM }x y 2m} ¤ }rxs} }rys},
where (a) follows for the same reason as before and (b) follows by the fact that the norm }  } in X obeys
the triangle inequality.
Positivity follows from the positivity of the norm }  } in X. Clearly }r0s}  0. There remains the
possibility that rxs does not include 0 but that there is a sequence in rxs that converges to 0, in which case
}rxs}  0. However, this is impossible since M is closed, so }rxs} ¡ 0 if rxs  r0s. Thus, we have shown
positive definiteness. q

3.7 Denseness and Separability

Definition 3.7.1 (dense). A set D is dense in a normed space X if for each x P X and each  ¡ 0, there
exists a d P D such that }x  d}   .

Note that if D is dense in X, there are points of D arbitrarily close to each x P X. Therefore, given x P X,
we can construct a sequence of elements in D that converges to x. Thus, an equivalent definition is: D is
dense in X if D̄  X; that is, if the closure of D equals X.
One of the most common examples of a dense set is the set Q in R. By the Weierstraß approximation
theorem (1.5.6), the set of all polynomials with rational coefficients is dense in C ra, bs.

Definition 3.7.2 (separable). A normed space is separable if it contains a countable dense set.

Q UIZ T HEOREM 9. lp , for 1 ¤ p   8, is separable.

Proof: Let D be the set of all finitely nonzero sequences with rational components. Clearly, D is countable.
°
Let x  tξi u P lp and fix  ¡ 0. Since x P lp , 8i1 |ξi |   8, so there exists an N such that
p


|ξi|p   2 .

i N 1

For i  1, . . . , N , let ri be a rational such that |ξi  ri|p   {2N , and let d  tr1, . . . , rN , 0, 0, . . . u. Then
d P D and
Ņ 8̧
}x  d}p ¤ |ξi  ri|p |ξi|p ¤ ,

i 1 
i N 1

so D is dense in lp . q
3.7. Denseness and Separability U 57

Example 3.7.3. Show that C ra, bs is separable.

Solution: Let D be the set of all polynomials of finite degree with rational coefficients. Clearly, D is
countable since it can be put into correspondence with the set of all finitely nonzero sequences with
rational components. Given x P C ra, bs and  ¡ 0, by the Weierstraß approximation theorem (Theorem
1.5.6), there exists a polynomial p such that |xptq pptq|   {2 for all t P ra, bs. While p does not necessarily
have rational coefficients, we can construct another polynomial r that does, such that |pptq  rptq|   {2
 N
for all t P ra, bs. We can easily construct r by changing each of the coefficients of p by less than 2N c ,
where N  1  deg p and c  maxp|a|, |b|q. Then
 
1 
N¸ 
|pptq  rptq|  cN ti  ¤ 2 , a ¤ t ¤ b,
 

 2N 

i 0

as desired. Thus

}x  r}  tmax
Pra,bs
|xptq  rptq| ¤ tmax
Pra,bs
|xptq  pptq| max |pptq  rptq| ¤ ,
Pr s
t a,b

so D is dense in C ra, bs. q

Example 3.7.4. Show that l8 is not separable.

Solution: Suppose we have found a countable dense subset txn u of l8 , where xn  tξin u. (We can index
the sequence txn u by n since the sequence is countable.) Construct x  tξi u by the following rule:

• If ξii   21 , set ξi  1.
• If ξii ¥ 21 , set ξi  0.

Then }x  xi } ¥ 21 for all i, since x and xi differ by at least 1


2 in their ith component. Therefore txn u is not
dense, so l8 is not separable. q

Example 3.7.5. Show that L8 is not separable.

Solution: Consider the subset of indicator functions t1t  1xPr0,ts utPr0,1s P L8 . Clearly this set is uncount-
able since the set of real numbers in the interval r0, 1s is uncountable. Furthermore, }Is  It }  1 if s  t.
Therefore, the open balls tB1{2 pIt qutPr0,1s do not intersect. However, any dense set of L8 must include an
element in each of these balls; since the set of balls is uncountable, any dense set of L8 is uncountable.
Therefore L8 is not separable. q
58 U 3. B ANACH S PACES
C HAPTER 4
HILBERT SPACE

Reading: [1, Chapter 3].

4.1 Inner Products

Definition 4.1.1 (pre-Hilbert space). A pre-Hilbert space is a linear vector space X over a field F along
with an inner product defined on X  X, which associates with any two vectors x, y P X a scalar px, yq.
The inner product satisfies the following axioms:

1. conjugate symmetry: px, yq  py, xq for all x, y P X.

2. sesquilinearity (linearity in the first variable): px y, zq  px, zq py, zq and pλx, yq  λ px, yq
for all x, y, z P X and all λ P F.

3. positivity and positive definiteness: px, xq ¥ 0 and px, xq  0 if and only if x  0.

Note that by conjugate symmetry and linearity in the first term,

px, y zq  p y z, xq  py, xq pz, xq  px, yq px, zq .


The property that pw y, x zq  pw, xq py, zq is sometimes called additivity. If F  R, then conjugate
symmetry becomes regular symmetry. Therefore, if F  R, as we will assume for the rest of this text,
sesquilinearity
a
becomes standard bilinearity. a
Let }x}  px | xq. We verify that }  }  p | q is indeed a norm:

1. positive homogeneity:
b b
}αx}  pαx, αxq paq pbq p cq
a a
α px, αxq  αpαx, xq  ααpx, xq  |α|}x},

where (a) and (c) follow from sesquilinearity and (b) follows from conjugate symmetry.

2. triangle inequality:

}x y }2  px y, x yq
 px, xq px, yq py, xq py, yq
¤ }x}2 2| px, yq | }y}2.
By the Cauchy-Schwarz inequality, (re)proved below, | px, yq | ¤ }x}}y}; substituting this into the
inequality and taking square roots gives }x y} ¤ }x} }y}.

59
60 U 4. H ILBERT S PACE

3. positive definiteness and positivity: This follows from the positivity and positive definiteness of
the inner product.

Below we prove a version of the Cauchy-Schwarz inequality equivalent to Proposition 3.1.2.

Proposition 4.1.2 (Cauchy-Schwarz Inequality ([1], Lemma 1, p. 47)). For all x, y in an inner product space,
| px, yq | ¤ }x}}y}. Equality holds if and only if x  λy or if y  0.

Proof: If y  0, the inequality holds trivially. If y  0, then for all scalars λ,

0 ¤ px  λy, x  λyq  px, xq  λ py, xq  λ px, yq |λ2| py, yq .


Then for λ  px, yq { py, yq,

0 ¤ px, xq 
| px, yq |2 ùñ | px, yq | ¤ apx, xq py, yq  }x}}y}.
py, yq q

The following lemma generalizes a useful result in Euclidean geometry:

Proposition 4.1.3 (parallelogram law ([1], Lemma 3, p. 48)). For any x, y in a pre-Hilbert space X,

}x y }2 }x  y}2  2}x}2 2}y}2 .

Proof: In the general case, directly expand the norms in terms of the inner product. For X  R2 , we
can prove the parallelogram law using only the law of cosines. Define the inner product px | yq as the
dot product x  y  }x}2 }y}2 cos θ, where θ is the angle between the vectors x and y. Then by the law of
cosines,

}x  y}22  }x}22 }y}22  2}x}2}y}2 cos θ


}x y}22  }x}22 }y}22  2}x}2}y}2 loooomoooon
cospπ  θq .
 cos θ
Summing these equations gives the desired result. q
The Cauchy-Schwarz inequality allows us to prove the continuity of the inner product.

Proposition 4.1.4 (continuity of the inner product ([1], Lemma 4, p. 49)). Suppose that xn Ñ x and yn Ñ y
in a pre-Hilbert space. Then pxn , yn q Ñ px, yq.

Proof: Because the sequence txn u is convergent, it is bounded; say }xn } ¤ M . Then

| pxn, ynq  px, yq |  | pxn, ynq  pxn, yq pxn, yq  px, yq | ¤ | pxn, yn  yq | | pxn  x, yq |.
By the Cauchy-Schwarz inequality,

| pxn, ynq  px, yq | ¤ }xn}}yn  y} }xn  x}}y}.


4.2. The Projection Theorem U 61

Since }xn } is bounded,

| pxn, ynq  px, yq | ¤ M }yn  y} }xn  x}}y} Ñ 0. q

4.2 The Projection Theorem

Definition 4.2.1 (orthogonality). In a pre-Hilbert space, two vectors x, y are said to be orthogonal, written
x K y, if px, yq  0. A vector is said to be orthogonal to a set S, written x K S, if x K s for all s P S.

Consider the following optimization problem in a pre-Hilbert space: given a vector x in a pre-Hilbert
space X and a subspace M in X, find arg minmPM }x  m}. If x P M , then the solution is trivial, but if
x R M , we must answer three questions:

1. Is there a vector m P M that minimizes }x  m}, or is there no m that is at least as good as the
others?

2. Is the solution unique?

3. What is the solution?

The following version of the projection theorem answers these questions.

T HEOREM 4.2.2 (Projection Theorem: pre-Hilbert space ([1], Theorem 1, p. 50)). Let X be a pre-Hilbert space
and M be a subspace of X, and let x P X. If there exists a vector m0 P M such that }x  m0 } ¤ }x  m} for all
m P M , then m0 is unique. Furthermore, m0  arg minmPM }x  m} if and only if x  m0 K M .

Proof: First we show that if m0  arg minmPM }x  m}, then x  m0 K M . Suppose for contradiction
that there exists an m P M such that x  m0 M m. Without loss of generality, assume that }m}  1 and
px  m0 | mq  δ  0. Define m1  m0 δm. Then
}x  m1}2  }x  m0  δm}2
 }x  m0}2  px  m0, δmq  pδm, x  m0q |δ|2
 }x  m0}2 |δ|2   }x  m0}2.
so if x  m0 M M , m0 is not a minimizing vector.
Now we show that if x  m0 K M , then m0  arg minmPM }x  m} and m0 is unique. For any m P M ,

}x  m}2  }x  m0 m0  m}2  }x  m0}2 }m0  m}2


by the Pythagorean theorem. Then m0  arg minmPM }x  m} since }x  m} ¥ }x  m0 } by positive
definiteness of the norm, and m0 is unique since }m0  m}2  0, and therefore }x  m}  }x  m0 }, if
and only if m0  m. q
The following “counterexample” will suggest a stronger version of the projection theorem.
62 U 4. H ILBERT S PACE

Example 4.2.3. Let X be the space of infinite sequences, with the l2 -norm and the inner product px | yq 
°8 
i1 ξi ηi for x  tξi u, y  tηi u P X. Let x  1, 2 , 3 , . . . , and let M be the subspace of sequences with
1 1

only finitely many nonzero terms such that all even terms equal zero. Show that the minimizing vector
m0 does not exist in M .

Solution: Let m0  tηiu be a candidate for the minimizing vector, where η2k  0 for k P N. Then
8̧ 
2
}x  m0}  2 1
i
 ηi ,

i 1

so the best we can do is set η2k1  2k11 for k   n   8. Consider the sequence tmnu, where mn  tηiu
with η2k  0 and η2k1  2k11 for k P N. Then

8̧ 
2 8̧ 
2
1 π2
8̧ 
2
}x  mn}  2 1
2k
1
2k  1
 4 6
1
2k  1
.

k 1 
k N 
k N

Because mn is the best we can do among infinite series with n  1 nonzero terms, a minimizing vector
does not exist since }x  mn } ¡ }x  mn1 } for all n. The problem is that mn Ñ m0 , but m0 P M since M
is not complete. q
To fix the problem, we have the following definition.

Definition 4.2.4 (Hilbert space). A Hilbert space is a complete pre-Hilbert space. That is, a Hilbert space
is a Banach space with an inner product that induces the norm.

We then prove the following stronger version of the projection theorem.

Q UIZ T HEOREM 10 (Projection Theorem: Hilbert space ([1], Theorem 2, p. 51)). Let H be a Hilbert space and
M be a closed subspace of H. For any x P H, there exists a unique vector m0 P M such that }x  m0 } ¤ }x  m}
for all m P M . Furthermore, m0  arg minmPM }x  m} if and only if x  m0 K M .

Proof: Uniqueness and orthogonality have already been established in the proof of Theorem 4.2.2, so it
suffices to establish the existence of the minimizing vector m0 .
If x P M , then m0  x and the theorem is trivial. If x R M , define δ  inf mPM }x  m}. We want to
find an m0 P M such that }x  m0 }  δ. Let tmi u be a sequence of vectors in M such that }x  mi } Ñ δ.
By the parallelogram law (4.1.3),

}pmj  xq px  miq}2 }pmj  xq  px  miq}2  2}mj  x}2 2}x  mi }2 ,

or  
 mj 2
}mj  mi}  2}mj  x}
2 2
2}x  mi } 2
 4 x  mi
2  .

Then }x  pmi mj q{2} ¥ δ by definition of δ and }mi  x}2 Ñ δ2 as i Ñ 8, so


}mj  mi}2 ¤ 2}mi  x}2 2}x  mj }2  4δ 2 Ñ 2δ2 2δ 2  4δ 2  0.
4.3. Orthogonal Complements U 63

Therefore tmi u is a Cauchy sequence and since M is a closed subspace of a complete space, tmi u has a
limit tm0 u in M . By continuity of the norm, which follows from continuity of the inner product (4.1.4),
}x  m0}  δ. q

4.3 Orthogonal Complements

Definition 4.3.1 (orthogonal complement). Given a subset S of a pre-Hilbert space, we call S K  tt : t K


S u the orthogonal complement of S.

Example 4.3.2. If S  tp1, 1, 1qu, what are S K, S KK, and S KKK?

Solution: S K  tpx, y, zq : x y z  0u, the plane through the origin perpendicular to the vector p1, 1, 1q.
Then

S KK  tpa, b, cq : ax by cz  0, x y z  0u  tpa, b, cq : a  b  cu  rtp1, 1, 1qus  rS s.


Finally, S KKK  tpx, y, zq : λx λy λz  0u  S K. q

Proposition 4.3.3 (([1], Proposition 1, p. 52)). Let S and T be subsets of a Hilbert space. Then

1. S K is a closed subspace.

2. S „ S KK.
3. If S „ T , then T K „ S K .

4. S KKK  S K .

Proof: We prove each statement individually:

1. S K is a subspace because for any x, y P S K , a, b P F, and s P S,

pax by, sq  a px, sq b py, sq  0,

so ax by P S K . S K is closed because if txn u is a convergent sequence in S K , then continuity of the


inner product implies 0  pxn , sq Ñ px, sq for all s P S, so x P S K .

2. If x P S, then x K y for all y P S K ; therefore, x P S KK .

3. If y P T K , then y K x for all x P S since S € T . Therefore, y P S K.


4. From (2), S K „ S KKK and S „ S KK. Combining the latter result with (3), S KKK „ SK. Therefore,
S K  S KKK .

q
64 U 4. H ILBERT S PACE

We want to show that given any closed subspace S of a Hilbert space H, any vector x can be written
as the sum of of a vector m P M and a vector n P M T . First we need the following definition.

Definition 4.3.4 (direct sum). A vector space X is the direct sum of two vector spaces M and N , written
M ` N , if every vector x P X has a unique representation of the form x  m n, where m P M and
n P N.

Example 4.3.5. Show that R3 is the sum of the xy-plane and the xz-plane, but not the direct sum.

Solution: In R3 , the xy-plane is given by M  tpx, y, 0q : x, y P Ru and the xz-plane is given by N 


tpx, 0, zq : x, z P Ru; their sum is tp2x, y, zq : x, y, z P Ru  tpx, y, zq : x, y, z P Ru  R3. However, R3 is not
the direct sum of the two planes since we can write px, y, z q  pw, y, 0q px  w, 0, z q, so no x P R3 can be
written uniquely in the form x  m n with m P M and n P N . q

T HEOREM 4.3.6 (([1], Theorem 1, p. 53)). If M is a closed linear subspace of a Hilbert space H, then H 
M ` M K and M  M KK .

Proof: Let x P H. By the projection theorem, there is a unique vector m0 P M such that }xm0 } ¤ }xm}
for all m P M , and we can write n0  bx  m0 P M K . Therefore, we can write x  m0 n0 with m0 P M
and n0 P M K . To show that this representation is unique, suppose x  m1 n1 with m1 P M and
n1 P M K . Then 0  m1  m0 n1  n0 . m1  m0 and n1  n0 are orthogonal, so by the Pythagorean
theorem, 0  }m1  m0 }2 }n1  n0 }2 , so by positive definiteness of the norm, m0  m1 and n0  n1 .
Therefore, H  M ` M K .
Now we show that M  M KK . By Part (2) of Proposition 4.3.3, M „ M KK . To show the other
direction, let x P M KK . By the above result, x  m n where m P M KK and n P M KKK  M K , where
M KKK  M K by Part (4) of Proposition 4.3.3. Since x, m P M KK , x  m  n P M KK . But n P M K as
well, so pn, nq  }n}2  0, which by positive definiteness of the inner product implies n  0. Thus,
x  m P M , so M KK „ M and we conclude M  M KK . q

4.4 The Gram-Schmidt Procedure

Definition 4.4.1 (orthogonal set). A set S of vectors in a pre-Hilbert space is an orthogonal set if x K y for
all x, y P S such that x  y. If }x}  1 for all x P S, then the set is orthonormal.

Proposition 4.4.2 (([1], Proposition 1, p. 53)). An orthogonal set of nonzero vectors is a linearly independent set.

Proof: Suppose tx1 , . . . , xn u is a finite subset of the given orthogonal set and that there are scalars α1 , . . . , αn
°
such that ni1 αi xi  0. Then


α i xi , xj  αj pxj , xj q  p0, xj q  0, j  1, . . . , n.

i 1
4.4. The Gram-Schmidt Procedure U 65

Since xj  0, αj  0 for j  1, . . . , n. Then by Theorem 2.3.3, x1, . . . , xn are linearly independent. q

T HEOREM 4.4.3 (Gram-Schmidt Process). Let txi u be a countable sequence of linearly independent vectors in a
pre-Hilbert space X. Consider the vectors tei u defined by


i¸1
zi  xi  pxi, ej q ej ei  }zzi}

j 1 i

for all i. Then tei u is an orthonormal sequence and for each n, rtx1 , . . . , xn us  rte1 , . . . , en us.

Proof: The proof of the first part is by induction. For the case n  1, te1 u is an orthonormal sequence.
For general n, assume that te1 , . . . , en u is an orthonormal sequence and prove that te1 , . . . , en 1 u is also
an orthonormal sequence. Because te1 , . . . , en u is an orthonormal sequence, it suffices to show that en 1
is orthogonal to each of e1 , . . . , en and that }en 1 }  1. Taking inner products,

ņ i¸1
pz n 1 | z i q  xn 1  px n 1, ej q ej , xi  pxi, ek q ek

j 1 k 1

i¸1 ņ i¸1 
 pxn 1 , xi q  px n 1 , ek q pek , xi q  p xn 1 , ej q pej , xi q p xn 1, ek q pek , xi q

k 1 
j 1 k 1 
 px n 1, xi q  p x n 1, xi q  p xn 1, xi q pxn 1, xi q  0,

for i  1, . . . , n, where the last line follows from Parseval’s equality (??). Since en 1  zn 1{}zn 1},
en 1 K ei for i  1, . . . , n and }en 1}  1. Therefore, teiu is an orthonormal sequence.
To show that rtx1 , . . . , xn us  rte1 , . . . , en us for all n, observe that en is a linear combination of
x1 , . . . , xn . Therefore, we can also write xn as a linear combination of e1 , . . . , en ; thus, a vector can be
written as a linear combination of x1 , . . . , xn if and only if it can be written as a linear combination of
e1 , . . . , en . q

Example 4.4.4. Let X be the space of polynomials p such that deg p ¤ 2. Define the inner product
»1
pp, qq  pptqq ptq dt
1
for any p, q P X. Starting with the sequence t1, t, tu, construct an orthonormal basis for X.
66 U 4. H ILBERT S PACE

Solution: We have

z1  x1  1
e1 
z1
}z1}  ? 1
2
z2  x2  px2 , e1 q e1  t
c
e2  z2
}z2}  2
3
t

z3  x3  px3, e1q e1  px3, e2q e2  t2  13


c c
e3  z3
}z3}  3
2 2
t 
5 2 1
2
5
2
.

q
Suppose we have a vector x in a Hilbert space X and want to find the closest vector to x that lies
in the subspace M  rty1 , . . . , ym us. We can use Theorem 10 and the Gram-Schmidt process to solve
this problem. If M is one-dimensional, we can set ei  yi {}yi } for any i  1, . . . , n to find the solution
x  px, ei q ei P M K .

Example 4.4.5. Let X  R2 and define the inner product


pv, wq  4v1w1 v 2 w2

for any v  pv1 , v2 q, w  pw1, w2q P X. Let M  rty1us  rtp1, 4qus. Find the vector in M that is closest
to y1  p5{2, 0q.

Solution: First we set 

e1  y1
}y1}  ? ,  ?2
1
.
2 5 5
Then the minimizing vector is




x  px, e1 q e1   52 , 0  ?6 ? ,  ?2
1
 4,  ?2 . q
5 2 5 5 5

This technique works equally well if M is generated by more than one vector, provided that we start
by converting the set ty1 , . . . , yn u into an orthonormal basis te1 , . . . , em u. (Of course, if y1 , . . . , yn are not
linearly independent, m   n.)
Note that we can write this problem as
 
 ņ 
max x

α1 ,...,αn 
 
αi yi 



i 1


x α i yi , yj  0, j  1, . . . , n.

i 1

Note that the objective function is quadratic in α1 , . . . , αn and that the constraints are linear in α1 , . . . , αn ,
4.5. Fourier Series U 67

so using Lagrange multipliers reduces this problem to solving a system of linear equations. More specifi-
cally, if we write

py 1 , y 1 q    p y 1 , y n q  α1
  
px, y1q
G  G py 1 , . . . , y n q   α .  c
 .. .. ..   ..   ..  ,
. . .  . 
px n , x 1 q  p xn , xn q αn px, ynq
then we can write the constraints as GT α  c. These are called the normal equations. G is called the
Gram matrix. The normal equations have a unique solution if and only if det G  0.

Proposition 4.4.6 ([1], Proposition 1, p. 56). det G  0 if and only if y1 , . . . , yn are linearly independent.

Proof: Equivalently, we show that det G  0 if and only if y1 , . . . , yn are linearly dependent. If y1 , . . . , yn
°
are linearly dependent, then there exist β1 , . . . , βn not all equal to zero such that ni1 βi yi  0. Taking
inner products, 
ņ ņ
β i yi , yj  βi pyi , yj q  0, j  1, . . . , n,
i 1  
i 1

or equivalently,
ņ  
β i Gi 0 GTi  p yi , y1 q    p yi , yn q .

i 1

Therefore, the rows of the Gram matrix are linearly dependent, so det G  0.
If det G  0, then the rows of the Gram matrix are linearly dependent, so there exist β1 , . . . , βn not all
°
equal to zero such that ni1 βi pyi , yj q  0 for j  1, . . . , n. Therefore,


β i yi , yj  0, j  1, . . . , n,

i 1

so   2
ņ ņ  ņ 
βi βi yi , yj  0 ùñ 



βi yi 

 0.

j 1 i 1  
i 1
°n
Thus   0, so y1, . . . , yn are linearly dependent.
i 1 β i yi q
Note that if det G  0, solving the normal equations is equivalent to solving fewer than n equations in n
unknowns, so a solution exists, though it is not unique.

4.5 Fourier Series

Now suppose that X is infinite-dimensional and that M is generated by a countably infinite set of linearly
independent vectors. (Recall from Example 2.3.7 that M does not necessarily equal X.) Assume that we
have constructed an orthonormal basis e1 , e2 , . . . for M using the Gram-Schmidt process. Then

x̂  px, eiq ei

i 1
68 U 4. H ILBERT S PACE

is the vector in M closest to x. However, this is an infinite series, so we need the following necessary and
sufficient condition for an infinite series of orthogonal vectors to converge in Hilbert space.

T HEOREM 4.5.1 ([1], Theorem 1, p. 59). Let tei u be an orthonormal sequence in a Hilbert space H. A series of
° °n
the form 8i1 ξi ei converges to an element x P H if and only if i1 |ξ |   8. In that case, ξi  px, ei q for i ¥ 1.
2

°n °n
 |ξ |   8 and define the partial sums sn 
2
Proof: Suppose i 1 
i 1 ξi ei . Then
 2
 m̧  m̧
}sn  sm}  2 



ξi ei 

 |ξi|2 Ñ 0

i n 1 
i n 1

as n, m Ñ 0, where n   m without loss of generality. Therefore, tsn u is a Cauchy sequence; since H is


complete by definition of a Hilbert space, there exists an x P H such that sn Ñ x.
° °8
If s Ñ x P H, then tsn u is a Cauchy sequence so m in 1 |ξi | Ñ 0 as n, m Ñ 0. Thus,
2
in 1 |ξi | Ñ 0,
2
°8 n
so i1 |ξi |2   8.
Clearly psn , ei q  ξi for i ¥ 1. Since sn Ñ x, psn , ei q Ñ px, ei q by Proposition 4.1.4. q
We call tξi u  tpx, ei qu the Fourier coefficients of x with respect to tei u.

Proposition 4.5.2 (Bessel’s inequality ([1], Lemma 1, p. 59)). Let x be an element in a Hilbert space H and
suppose tei u is an orthonormal sequence in H. Then

| px, eiq |2 ¤ }x}2.

i 1

Proof: Let αi  px, eiq for i ¥ 1. For all n,


 2
 ņ  ņ
0¤   }x}2  |αi|2.
 
x α i xi 
 

i 1 
i 1

Then

|αi|2 ¤ }x}2
i 1 
for all n; letting n Ñ 8 establishes the inequality. q

Example 4.5.3. Show that Bessel’s inequality holds even if tei u does not generate H.

Solution: Let X be the space of polynomials on r1, 1s and define the inner product
»1
pp, qq  pptqq ptq dt
1
4.5. Fourier Series U 69

for any p, q P X. Let tei u be an orthonormal set of even polynomials and let xptq  t. Clearly tei u are even
polynomials, so px, ei q  0 for i ¥ 0. Then

0 | px, eiq |2 ¤ }x}2  32 ,

i 1

so Bessel’s inequality holds. q

°
Since Bessel’s inequality guarantees that 8 i1 | px, ei q | ¤ 8, Theorem 4.5.1 guarantees that the series
2
°8
i1 px, ei q ei converges to some element. Since any Hilbert space H is complete by definition, that
element is in H. Let S  tei u be an infinite subset S of H; the subspace rS s include all finite linear
combinations of the vectors in S, but since we have an infinite sum, we are interested in the closure rS ¯ s of
rS s. For example, if S  t1, t2, t4, . . . u, then cos t is in rS¯ s.

T HEOREM 4.5.4 ([1], Theorem 2, p. 60). Let x be an element in a Hilbert space H and suppose tei u is an or-
thonormal sequence in H. Then the series

px, eiq ei

i 1

converges to an element x̂ P M  rteius. Furthermore, x  x̂ P M T .

Proof: q

1. Convergence of Fourier series

Since any Hilbert space H is complete, the series converges to an element of H. The vectors ei are
an infinite subset S. The subspace rS s includes all finite linear combinations, but since we have an
infinite sum, we need the closure of this subspace.

If S  t1, t2, t4,    u, what transcendental function is in the closure of rS s?

Prove (Theorem 2 on page 60) that



x̂  px|eiqei

i 1

converges to an element of the closed subspace M generated by the ei .


70 U 4. H ILBERT S PACE

If the closed subspace M is H itself, then the orthonormal sequence ei is called complete, and
Bessel’s inequality becomes an equality.
A useful criterion (proof left for homework): the orthonormal sequence ei is complete if and only if
the only vector orthogonal to each of the ei is the null vector θ.

2. Completeness: an example
The Hilbert space is H  L2 r1, 1s. From the sequence t1, t, t2 ,    u it is straightforward to use
Gram-Schmidt to construct an orthonormal basis. Luenberger, p. 61, has an explicit formula, but we
do not need it.
To prove completeness, we assume (for a contradiction) the existence of a function f ptq that is or-
thogonal to each tn .
This element f ptq of H may be discontinuous or unbounded, but its antiderivative

»t
F ptq  f pτ qdτ
1
is a continuous function of t.
Prove continuity
Prove that F p1q  F p1q  0.
³1
1 t F ptqdt  0.
Using integration by parts, prove that n

Using Weierstrass, we construct a polynomial Qptq such that


|³F ptq  Qptq|    on [-1,1]. Now prove (this is tricky, see Luenberger p. 62) that
1 |F ptq| dt ¤ 2 .
1 2 2

Thus the continuous function F ptq is the zero function and the Lebesgue integrable function f ptq is
in the same equivalence class as the zero function, i.e. it is θ.

3. When codimension is smaller - a finite example


The Hilbert space H  E 3 . Let a  p6, 0, 0q, let M be the subspace spanned by the vectors z1 
p1, 0, 1q and z2  p1, 2, 1q, which are conveniently orthogonal to one another, and let V be the
linear variety a N .
Find the vector m0 P N that is closest to a.
Find the vector a0 P V of minimum norm.
State and prove the theorem (Theorem 1 on p. 64) that these two problems (or any two like them)
are really the same.
There is an easier way to do this problem, since the dimension of N K is just 1. Since y  p1, 1, 1q is
perpendicular to N , ry s  N K , and V is the set of all vectors that satisfy the single equation
px|yq  pa|yq  6. Of these, the one with minimum norm is the one that lies in N K. So set x  βy
and solve the problem.
Of course, this approach would have been spectacularly better if H and N were infinite-dimensional
while N K was 1-dimensional.
4.5. Fourier Series U 71

4. The dual approximation problem


Theorem 2 on p. 65 generalizes the preceding example.
Let y1 , y2 ,    yn be a set of linearly independent vectors. Luenberger calls the space that they span
M , but it is more consistent with the preceding example to call it N K .
Consider all vectors that satisfy the equations
px|y1q  c1
px|y2q  c2

px|ynq  cn
If all the ci are zero, these vectors are in the subspace N .
Otherwise they are in a linear variety V N a, where
pa|yiq  ci for all i
How do we know that the vector in V with minimum norm is of the form
x0  °ni1 βiyi?
Write down a set of equations that can be solved for the βi .
How could we solve this problem by Gram-Schmidt?

5. A finite-dimensional example
You are doing the term project for your course “Ethnic Sculpture.” You start with a log of length 21
meters. In the first week, you cut off u meters, carve a totem pole, and set it vertically in Harvard
Yard. In the second week, you cut off v meters, carve a totem pole, and set it vertically in Harvard
Yard. In the third week, you carve a totem pole from the remaining w meters, and set it vertically
in Harvard Yard. At the end of the fourth week, you take down the poles, having gained artistic
renown equal to 3u 2v w. You need renown of 46 to get an A in the course.
The energy cost of erecting a totem pole is proportional to the square of its height, so you want to
minimize ||x||2  u2 v 2 w2 .
Express the constraints in the form py |xq  c.
The vectors y1 and y2 generate N K . Find the linear combination β1 y1 β2 y2 that satisfies both con-
straints.
What is the solution to this optimization problem?

6. An easy economic example


To keep the numbers small, measure money in trillions of dollars and time in decades.
You are Treasury secretary in the new Obama administration, and your task is to build up a fund of
3
e 2 trillion dollars over the next decade to deal with the financial disaster that will be caused when
the current crop of Harvard undergraduates gets busy on Wall Street. By selling off formerly toxic
securities you already have a trillion dollars to start with.
By making loans to banks, you could get your fund balance to grow at an annual rate of 5 percent
compounded continuously, in accordance with the differential equation
72 U 4. H ILBERT S PACE

x9 ptq  0.5xptq.
That will increase your money to
?e trillion in a decade, but it is not enough. You will need funds
from Congress at a rate uptq.
Constraints: xp0q  1, xp1q  e 2 .
3

³1
Optimization problem: choose uptq to minimize the Hilbert space norm pu|uq  0 u2 ptqdt. The
solution lies in an infinite-dimensional vector space.
Secret: the codimension is 1. If we can express the constraint in the form pu|y q  c, then the solution
is that uptq is in the space generated by y ptq, and we just have to determine one coefficient.
First you will have to learn something about linear differential equations that Luenberger takes for
granted.
What inhomogeneous differential equation does xptq satisfy?
What is a solution to the corresponding homogeneous differential equation?
Substitute xptq  aptqx0ptq, where x0ptq is a solution to the homogeneous equation, and solve for
a9 ptq.
Find an expression for ap1q  ap0q as an integral.
Express the constraint xp1q  e 2 in terms of an inner product pu|y q  c.
3

The optimal uptq lies in the space N K generated by y. Use the constraint to find the coefficient, and
the problem is solved!

7. An example from optimal control.


Let H  L2 r0, 1s, which is infinite-dimensional. We have a door that is supposed, at the push of a
button, to open through an angle of one radian and come back to rest in one second. The door is
controlled by an electric motor. By supplying a current uptq P H, which produces a proportional
torque, we can overcome a linear damping force and also cause some angular acceleration. With
all physical constants set equal to 1 for simplicity, the angular velocity ω of the door satisfies the
differential equation

ω9 ptq ω ptq  uptq.

Now we insist that the vector uptq must lie in the linear variety V of functions that cause the motor
to rotate through an angle of 1 radian in one second, starting at rest and ending at rest.
What small change would make V be a subspace N ?
What is the dimension of N and of V ?
The secret to solving this problem is that the codimension of N is 2. To prove this, we have to find
basis vectors y1 and y2 that are orthogonal to N , so that
³
py1|uq  01 y1ptquptqdt  ωp1q  0
³
py2|uq  01 y2ptquptqdt  θp1q  1.
We want to find the vector of minimum norm in V , the function uptq that minimizes the energy
³1
consumption 0 uptq2 dt. This will lie in the two-dimensional space spanned by y1 and y2 .
4.5. Fourier Series U 73

8. More fun with differential equations


First solve the homogeneous first-order equation ω9 ptq ω ptq  0.
Now multiply this solution by aptq (“variation of parameters”) to get a trial solution for the inho-
mogeneous equation ω9 ptq ω ptq  uptq. Plug in the trial solution to get a formula for a9 ptq.
Integrate once to express ap1q and ω p1q in terms of uptq.
Integrate the original equation, in the form ω9 ptq θ9ptq  uptq, from 0 to 1 to get a relation among
ω p1q, θp1q, and uptq.
What are the vectors y1 and y2 ?

9. Finishing the solution


By the general theorem for the dual approximation problem, the vector uptq of minimum norm lies
in the space generated by y1 ptq and y2 ptq.
We could find the minimum-norm uptq  β1 y1 ptq β2 y2 ptq by solving two linear equations for β1
and β2 . However, there is a simpler basis for the space: the functions 1 and et . So
uptq  α1 α2 e t .
Use the constraints ω p1q  0 and θp1q  1 to get two equations for α1 and α2, and solve them to
determine uptq.

10. Minimum distance to a convex set


Replace the linear variety V by a closed convex subset K of a Hilbert space H, and let x be an
arbitrary vector outside of K.
We can prove (Theorem 1 of section 3.12) that there exists a unique vector k0 P K of minimum
distance from x. Orthogonality is too much to ask for, but we can also prove that
px  k0|k  k0q ¤ 0 for all k P K.
Draw a diagram, where H is two-dimensional, to show how
px  k0|k  k0q  0 can fail to be achieved.
Existence proof: this is almost the same as in the case where we have a subspace M instead of a
convex set K. We use the parallelogram-law trick. Let

δ  kinf
PK
||x  k||
and construct a sequence such that ||x  ki || converges to δ and so is Cauchy.
Show that ki is Cauchy.
Why is ki also convergent? What does this prove?
To prove uniqueness, assume that ||x  k0 ||  ||x  k1 ||  δ. Luenberger just does a special case of
the argument above, but for a little more fun, just repeat the parallelogram-law argument.
Now let’s assume that k0 has the property that
px  k0|k  k0q ¤ 0 for all k P K.
It is then easy to prove that k0 is the vector that minimizes the distance ||x  k ||. Just expand
||x  k||2  ||x  k0 k0  k||2 in terms of dot products. This proof doesn’t even use the convexity
of k.
74 U 4. H ILBERT S PACE

11. A proof that requires convexity


Suppose that k0 is the unique minimizing vector. The task is to show that px  k0 |k1  k0 q ¤ 0 for all
k1 P K.
Write a formula for the point kα that interpolates between k0 and k1 , and draw a diagram to show
why this point is in K for 0 ¤ α ¤ 1.
Since α  0 minimizes ||x  kα ||, the quantity
||x  kα||2 is a nondecreasing function of α. Its derivative cannot be negative, even for α  0. Use
this fact to complete the proof.

12. Example where the convex set is a cone


This is a three-dimensional version of Luenberger’s example 1 on page 71.
You are piloting a helicopter, and your mission is to deliver a sports reporter to the playing field of
the world outdoor curling championships, which are being held on a large frozen lake in Saskatchewan.
At noon, two snowmobiles set out from the origin along the ice, moving with velocity y1 and y2
respectively. The convex cone α1 y1 α2 y2 between their tracks becomes the playing field.
Sketch the playing field, and describe the four different types of location for the position x̂  α1 y1
α2 y2 on the playing field that is closest to the position x of your helicopter.
We know that for all k on the playing field, px  x̂|k  x̂q ¤ 0.
Whatever the solution, you can always move to a new point in the cone along a snowmobile velocity
vector. What does this say about k  x̂ yi ?
If αi ¡ 0, you can also move to a new point in the cone backwards along a snowmobile velocity
vector. What does this say about k  x̂ yi ?

13. Curling concluded


We now have Luenberger’s general solution to this problem:
x̂  α1 y1 α2 y2 , where for all i
px  x̂|yiq ¤ 0, with equality if αi ¡ 0.
Equivalently
px  x̂|yiq  zi, with αizi  0
Show how to recast this equation, for the two-dimensional case, in terms of the Gram matrix G.
Alas, there are four unknowns even in this simple case!
Give an interpretation of z for each of the four types of solution to the curling-competition problem.
C HAPTER 5
DUAL SPACES AND THE HAHN - BANACH THEOREM

Reading: [1, Sections 5.1 – 5.4, 5.12]

5.1 Linear Functionals

Definition 5.1.1 (functional). A transformation or function f : X ÑY is a functional if Y  R.

Definition 5.1.2 (linear functional). A linear functional is a functional f : X Ñ R such that f pax byq 
af pxq bf pyq for all x, y P X and a, b P R.

In a Hilbert space H, any vector y determines a linear function by the rule fy pxq  py, xq.

Example 5.1.3. What linear functionals are determined by

• y1  p1, 1, 1q on E 3?

• y2  1, 12 , 13 , . . . on l2 ?

• y3 ptq  et1 on L2 r0, 1s?

Solution: We have
8̧ 1 »1
fy1 pxq  py1 , xq  x1 x2 x3 .fy2 pxq  py2, xq  yi fy3 pxq  py3 , xq  et1 xptq dt. q

i 1
i 0

Recall that for the case p  2, q  2, the Hölder inequality (Quiz Theorem 5) reduced to the Cauchy-
Schwarz inequality (Propositions 3.1.2, 4.1.2), which in the inner product form was px, yq2 ¤ px, xq py, yq.
The following example will motivate some of the calculations in the duality proofs in this chapter.

°8 1{2 ξi converge?
Example 5.1.4. If x  tξi u P l4{3 does 
i 1i

Solution: Yes. By the Hölder inequality,

8̧ 1 8̧ 1 
8̧  1
4 1{4  2
1{4
ξi ¤ |ξi | ¤ }x}4{3  }x}4{3   8.
π
1{ 2 1{ 2 1{2 6
q
i1 i1 i1
i i i

75
76 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM

Definition 5.1.5 (norm of a linear functional). A linear functional f on a normed space X is bounded if
there is a constant M such that |f pxq| ¤ M }x} for all x P X. The smallest such constant M is the norm of
f , denoted }f }; that is, }f }  inf tM : |f pxq| ¤ M }x} for all x P X u.

Q UIZ T HEOREM 11 ([1], Proposition 2, p. 104). A linear functional on a normed space is continuous if and only
if it is bounded.

Proof: First we show that a linear functional f on a normed space X is continuous at a single point, then
it is continuous throughout X. Assume f is continuous at x0 P X, and let txn u Ñ x be a sequence in X.
By the linearity of f ,
|f pxnq  f pxq|  |f pxn  x x0q  f px0q|.
Since xn  x x0 Ñ x0 and since f is continuous at x0 , f pxn  x x0 q Ñ f px0 q. Thus, |f pxn q f pxq| Ñ 0,
so f is continuous at x. Since this holds for any x P X, f is continuous throughout X.
Suppose }f } ¤ M . Then if xn Ñ 0, |f pxn q| ¤ M }xn } Ñ 0, so f is continuous at x  0. By the above
result, f is continuous everywhere in X.
Now suppose that f is continuous anywhere in X, specifically at x  0. Then there exists a δ ¡ 0 such
that |f pxq|   1 for }x} ¤ δ. For any x  0 P X, δx{}x} has norm δ, so by using the linearity of f ,
 
|f pxq| 

f δ }x} x
  }x} f  δ x
   }x} ,
 }}
x δ  δ  x  δ

so M  1{δ is a bound for f . q


Note that a linear functional on a finite-dimensional normed space must be bounded and is therefore
continuous. To see why the extension to infinite-dimensional vector spaces poses a problem, consider the
functional

f px q  kξk

k 1

in l2 : clearly this function is discontinuous at x  0 since limxÑ0 f pxq  0 if we let ξk  1{kn and take the
limit as n Ñ 8, but limxÑ0 f pxq is undefined if we let ξk  1{n.
The linear functionals themselves may be regarded as elements of a vector space:

Definition 5.1.6 (algebraic dual). Given a vector space X, not necessarily normed, the linear functionals
on X form a vector space defined as follows:

• Given two linear functionals f1 and f2 on X, we define their sum f1 f2 as the linear functional on
X given by pf1 f2 qpxq  f1 pxq f2 pxq.

• Given a linear functional f and a scalar a, we define the scalar product af as the linear functional
on X given by paf qpxq  af pxq.

• The null element in the space of linear functionals is the functional f pxq  0 for all x P X.

This vector space is the algebraic dual of X.


5.2. Common Dual Spaces U 77

However, not all linear functionals in the algebraic dual will be bounded, so we restrict our attention
to the subspace of the algebraic dual consisting of all bounded, or equivalently continuous, functionals
on a normed space X. The norm of a functional f in this subspace is still Definition 5.1.5. In fact, we can
write the norm of a functional in several ways:

}f }  inf
M
tM : |f pxq| ¤ M }x} for all x P X u
 sup |f}pxx}q|
x 0

 sup |f pxq|
}x}¤1
 sup |f pxq|.
}x}1
This leads to the following definition:

Definition 5.1.7 (normed dual). Let X be a normed linear vector space. The space of all bounded linear
functionals on X is the normed dual of X, denoted X  . The norm of an element f P X  is

}f }  sup |f pxq|.
}x}¤1

If we explicitly treat a functional f as an element of the dual space, we can write f  x . Then
f pxq  x pxq  xx, x y, where the symmetric notation x, y suggests that we are generalizing the inner
product p, q from Hilbert space.

5.2 Common Dual Spaces

T HEOREM 5.2.1 ([1, Theorem 5.2.1, p. 106]). X  is a Banach space.

Proof: By definition of the algebraic dual (Definition 5.1.6), X  already satisfies the axioms of a vector
space. Therefore, we simply need to show that X  is complete. Let tx u be a Cauchy sequence in X  , so
}xn  xm} Ñ 0 as n, m Ñ 0. Then for any x P X, txnu is a Cauchy sequence since |xnpxq  xmpxq| ¤
}xn  xm}}x}. Therefore, since R is complete, there exists a scalar xpxq such that xnpxq Ñ xpxq. We
must check that x P X  ; that is, the functional x is linear and bounded. x is linear since

x pax byq  lim xn pax byq  a lim xn pxq b lim xn pyq  ax pxq bx pyq.
nÑ8 n Ñ8 nÑ8
Furthermore, since txn u is a Cauchy sequence, for all  ¡ 0, there exists an M such that |xn pxq  xm pxq|  
}x} for n, m ¡ M and all x. Since xn Ñ x , |x pxq  xm pxq|   }bx} for m ¡ M . Thus,

|xpxq|  |xpxq  xmpxq xm pxq| ¤ |x pxq  x pxq| |xmpxq| ¤ p }xmq}x},
so x is bounded. q
78 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM

Guided by this theorem, we derive representations of the duals of some common dual spaces. These
representations will allow us to apply the theoretical framework of dual spaces to practical problems.
To find the dual of a normed space X, we start with a space whose elements are bounded linear
functionals f on X. We can calculate }f } from Definition 5.1.5. The challenge is to find a “representation”
for each functional by identifying it with an element }x } of a familiar normed space. For example, we
of coefficients pa, bq, combined with the evaluation
can identify any linear functional on E 2 with a vector ?
formula ax by and the Euclidean norm }pa, bq}  a2 b2 . By analogy with the inner product, we
use xx, x y to denote application of the evaluation rule. Usually this is a “sum of products” or“integral
of products” rule. The hard part is usually to show that the norm }f } is equal to the norm }x } of the
representation, which was defined independently.
The proofs in general have four steps:

1. evaluation: state the rule for using x to evaluate the bounded linear functional f .

2. construction: state the rule for converting the bounded linear functional f to x .

3. alignment: invent a vector x P X that is aligned with a given representation vector x . “Aligned”
means that evaluating f pxq is essentially the same calculation as evaluating ||y ||||x||. If ||x||  1, so
much the better. Now we know that ||f || ¥ ||y ||.

4. The “inequality” step: Prove, usually by invoking something like the Hoelder inequality, that |f pxq| ¤
||x||||y||. Applied to vectors with ||x||  1, this guarantees that ||f || ¤ ||y||, so we conclude }f } 
}x}.
1. The dual of E n is E n
This is easy, but it is a good introduction to what needs to be proved.

• Evaluation: show how to use a vector y  pη1, η2,    ηnq to define a bounded linear functional.

• Construction: give the rule for converting a bounded linear functional f to its representation y.
• Alignment: Show that the choice x  y guarantees that ||f || ¥ ||y ||
• Inequality: show that if y represents f , then ||f || ¤ ||y ||

2. The normed dual of cn is a subspace of l1 .


The notation cn is not standard. It means Rn with the norm that is used in l8 . There are several
spaces that use this maximum absolute value norm.
l8 includes all bounded infinite sequences.
c is the subspace of sequences that converge to some limit.
c0 is the subspace of sequences that converge the limit 0.
cn is the subspace of sequences with n components.
5.2. Common Dual Spaces U 79

• Evaluation: show how to use a vector y  pη1, η2,    ηnq P l1 to define a bounded linear func-
tional.
• Construction: give the rule for converting a bounded linear functional f to its representation y.
• Alignment: Choose x P cn with ξi  sgn ηi . Show that this choice makes |f pxq|  ||x||8 ||y ||1 .
What goes wrong if we try to do this for x P l8 ?
Show that ||f || ¥ ||y ||1 .
• Inequality: show that if y represents f , then f pxq   x, y ¡¤ ||x||||y||1, so that ||f || ¤ ||y||1.
Conclusion: ||f ||  ||y ||1

3. The normed dual of lp is lq (Part of this is proof 2.5)


As you would guess from the Hoelder inequality, 1
p
1
q  1. First we do the proof for 1   p   8.
• Evaluation: If y  tηiu P lq represents f and x  tξiu P lp, what is f pxq? How do we know that
f is bounded?
• Construction: Given that f is linear and bounded, define y by ηi  f pei q. Show that if a vector
xN has only its first N components nonzero, then f pxN q   xN , y ¡
On what basis can we extend this result to any x P lp ?
• Alignment: fix N and construct the vector xN whose first N components are given by
ξi  |ηi | p sgn ηi and whose other components are all 0. Since there are only a finite number of
q

nonzero components, we need not worry about convergence.


Calculate ||xN || and |f pxN q|.
By the definition of the norm, |f pxN q| ¤ ||f ||||xN ||.
Prove that ||f || ¥ ||y ||q .
Given that f is bounded, what can we conclude about y?
 tηiu P lq . What linear functional f does this define?
• Inequality: Start with the vector y
Use the Hoelder inequality to show that |f pxq| ¤ ||x||p ||y ||q .
Why can we conclude that ||f || ¤ ||y ||q ?

Conclusion: ||f ||  ||y ||q ?

4. The special case p  1

• Evaluation: same as for p ¡ 1


• Construction: same as for p ¡ 1
• Alignment: Choose a vector xN with at most one nonzero entry, xN = sgn ηN . What is the norm
||xN ||?
Show that ||f || ¥ ||y ||8 .
Given that f is bounded, what can we conclude about y?
• Inequality: Start with y  tηi u P l8 , define the obvious linear functional f , and use the Hoelder
inequality to prove that ||f || ¤ ||y ||8 .

You can then conclude that ||f ||  ||y ||8 .


80 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM

5. The special case p  8


The obvious guess, that the dual space is l1 , is wrong. The problem is related to the fact that l8 is
not separable. It is OK for a separable space (l1 ) to have a nonseparable dual, but not the other way
around.
If you take the subspace c0 € l8 of infinite sequences that converge to zero, which is separable, then
the dual space is l1 . The proof is left to the homework.
Consider the subspace c € l8 of infinite sequences that converge to some value ξ. This space is also
separable, and its dual space is l1 . The proof is left to the homework. The trick is to notice that once
you evaluate f on the vector tξ, ξ, ξ,    , you can use linearity to reduce this problem to the previous
one.
Here is the problem with l8 .
One element is the sequence p0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1,    q where ξi  1 if i is prime, 0 otherwise.
Another is the sequence p1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,    q. Since f is bounded, it has a finite value when
evaluated on any such sequence. There are uncountably many such sequences.
If the dual space of l8 were l1 , it would be possible to identify a countable subset S of these se-
quences, evaluate f on that subset, and then use those values to evaluate f on any sequence x. But
f is bounded, hence continuous. To find a y P S such that |f pxq  f py q|   , we need to make
||x  y||   δ. If that could be done, then S would be dense, and l8 would be separable, which we
know is not the case.

6. Dual of Hilbert space


The obvious guess, that any Hilbert space H is its own dual, is correct. This is theorem 2 on page
109.

• Evaluation: any element y P H defines a bounded linear functional by the rule f pxq  px|yq.
How do we know that this function is bounded?
• Construction:
First assume a bounded linear functional f . We want to find a vector y such that f pxq  px|y q.
Consider the set N of all vectors n such that f pnq  0. Show that this set is a closed subspace.
If N  H, then f is the zero functional and we take y  θ. Otherwise, write H  N ` N K .
Choose a vector z P N K with f pz q  1.
Consider any x P H. Show that x  f pxqz P N .
Use the fact that (x  f pxqz |z q  0 to construct y.
Show that y is unique.
• Alignment: choose x  ||y|| . Show that ||f || ¥ ||y||.
y

• : Inequality: use Cauchy-Schwarz to show that ||f || ¤ ||y ||.

We conclude, as usual, that ||f ||  ||y ||.

7. Dual of Lp r0, 1s for p ¡ 1


Luenberger just mentions “arguments similar to those in Theorem 1.”
The obvious guess, that the dual space is Lq r0, 1s, is correct. What is not entirely obvious is how to
construct the representation y ptq from the functional.
5.2. Common Dual Spaces U 81

³1
• Evaluation: given y ptq P Lq r0, 1s and xptq P Lp r0, 1s, define a functional by f pxq  0 xptqy ptqdt.
How do we know that it is bounded?
• Construction: Define the function us ptq  1 for 0 ¤ t   s, 0 otherwise. For future use, evaluate
the norm of v ptq  us h ptq  us ptq.
Given a bounded linear functional f , define F psq  f pusq Prove that F ps hq  F psq   Mh
1
p

for some M .
We now know that F psq is continuous. We cannot prove that it is differentiable. However, we
can prove that it is “absolutely continuous.” There are theorems, beyond the scope of our grasp
of Lebesgue integration, that say that an absolutely continuous function is differentiable except
on a set of measure zero. Now, wherever F is differentiable, take y ptq  F 1 ptq as the function
that represents f . The values of y on the set of measure zero where F is not differentiable are
³1
irrelevant to the value of the Lebesgue integral f pxq  0 xptqy ptqdt. So we have identified the
bounded functional f with an equivalence class of functions.
• Alignment: choose
xptq  |y ptq| p sgn y ptq.
q

Calculate f pxq and ||x||p by evaluating integrals. Conclude that


||f || ¥ ||y||q . Since f is bounded, this will establish that yptq P Lq r0, 1s.
• Inequality: use Hoelder to show that, if y ptq represents f , then
f pxq ¤ ||x||p ||y ||q . How does this establish that ||f || ¤ ||y ||q ?

We are done: we can convert between f and y, and we have proved that ||f ||  ||y ||q .

8. Minkowski functionals
This topic comes much later in the chapter, but it provides a geometric motivation for the sublinear
functionals that appear in the Hahn-Banach theorem.
Consider a convex set K in a normed space X that has the origin θ in its interior. You can think of
the boundary of K as the set of all points that can be reached from the origin for unit taxicab fare.
Then the Minkowski functional ppxq is the cab fare from the origin to x or equivalently, the factor by
which x must be scaled down to end up on the boundary of K. More formally,

ppxq  inf tr : P K, r ¡ 0u.


x
r

Draw a diagram showing curves where ppxq  1, 2, 12 .


A Minkowski functional is “sublinear”: it has some but not all properties associated with norms.
The first two are the defining properties of a sublinear functional.

• ppαxq  αppxq for α ¥ 0. Prove this from the definition.


• p px 1 x 2 q ¤ p px 1 q ppx2 q. Use the convexity of K to prove this. (This is proof 2.6)
• ppxq ¥ 0, and ppxq is finite for all x. This additional property gives us a “convex functional.” It
is obvious from the definition.
• ppxq is continuous. Prove this first at θ, then use sublinearity to extend the proof to an arbitrary
x. We will consider only continuous sublinear functionals.
82 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM

9. Statement of the Geometric Hahn-Banach theorem


This looks obvious in two or three dimensions, but we will need the more obscure “extension form”
of the theorem in order to prove it.
Let K be a convex set with nonempty interior in a real normed vector space X. There is no loss of
generality in assuming that θ is in the interior. Suppose that V is a linear variety in X containing no
interior points of K. Note that V can be as small as just a single point, and that it can include points
in the boundary of K.
Then there exists an element x P X  and a constant c such that
  v, x ¡ c for all v P V (this defines a hyperplane H containing V )
  k, x ¡  c for all k in the interior of K (the hyperplane does not intersect the interior of K).
Draw diagrams to illustrate the following:
If K is not convex, the conclusion of the theorem may not hold.
The hyperplane H may or may not be unique.
Also illustrate the feature that if X is 3-dimensional, V may have dimension 0, 1, or 2.

10. The theorem in two dimensions.


Suppose that V is a zero-dimensional variety (just a single point x). Construct the subspace M (a
line through the origin) that contains x. It is easy to construct a linear function f pxq on M such that
f pxq  1. Since x is not in the interior of K, ppxq ¥ 1.
Draw a diagram to illustrate this setup.
Now consider α ¡ 0. Show that f pαxq ¤ ppαxq.
Suppose α   0. Show that f pαxq ¤ ppαxq
At this point we need a way to extend the definition of f pxq to the entire space X while preserving
the property that f pαxq ¤ ppαxq. That is exactly what the extension form of the Hahn-Banach
theorem will do for us.

11. Proof of the Hahn-Banach theorem(extension form) (proof 2.7)


On subspace M , we have a linear functional f pmq and a continuous sublinear functional p (defined
on all of X) satisfying f pmq ¤ ppmq for all m. Choose a vector y that is not in M . Our immediate goal
is to define a linear functional g on the space rM y s such that g pmq  f pmq on M and g pxq ¤ ppxq
on this space that is one dimension larger.
Draw a diagram to illustrate the case where X is the plane, M is a line, and p is the Minkowski
functional of a convex set. Show that setting g py q  0 will not necessarily work, but that it appears
possible to choose a vector m0 P M such that setting g py  m0 q  0 will work. Show an example
where only one value for m0 is possible, and show other examples where many different choices
will do the job.
Now consider arbitrary m1 , m2 P M . For fixed y, we require
g pm1  y q ¤ ppm1  y q and g pm2 y q ¤ ppm2 y q.
Exploit the linearity of g to get a pair of inequalities involving the value g py q that we need to choose.
Starting with f pm1 m2 q ¤ ppm1 m2 q, show that there exists at least one acceptable value for g py q.
Luenberger, on page 111, calls this value c.
5.2. Common Dual Spaces U 83

Using the inequality g pm y q ¤ ppm y q, which holds for all m P M given our clever choice of g py q,
show that
g pm αy q ¤ ppm αy q for all α ¡ 0.
Using the inequality g pm  y q ¤ ppm  yq, which also holds for all m P M , show that gpm  βyq ¤
ppm  βy q for all β ¡ 0.
What about α  0?
We have now succeeded for one more dimension, since we have chosen g py q in such a way that
g pm αy q ¤ ppm αy q for all α and all m, and any vector in rM y s is of the form m αy.

12. Finishing the proof


At this point, if X is finite dimensional, an extension to all of X follows easily by induction, since
given f on a k-dimensional M , we have shown how to construct an extension g on a (k 1)-
dimensional rM y s. However, we want to use this theorem in the infinite-dimensional case, so
there is still work to be done.
Being wimps who are not “familiar with Zorn’s lemma” (Luenberger, p. 111), we assume that X is
separable. From the countable dense subset x1 , x2 ,    , xn ,    we extract a subset y1 , y2 ,    , yn ,   
of linearly independent vectors, none of them in M . Now M and these yi generate S, a dense subset
of X. One y at a time, we extend functional f to a g that is defined on S.
By construction, g ¤ p. Furthermore, p is bounded and continuous. How do we know that g is
continuous?
Now consider an x that is not in the dense subspace S. How do we define F pxq on all of X so that
F px q ¤ p px q ?
If you are keen on Zorn’s lemma, see Bachman and Zarici, Functional Analysis, pp.179-180, for a
proof that does not assume separability.

13. Back to the geometric version (proof 2.8)


Review the situation with the aid of a diagram. We have a convex set K with θ in its interior. V is a
linear variety in X that includes no interior points of K, although it may include boundary points.
We let M be the subspace of X generated by V . Then there is a linear functional f , defined on M ,
which has the value 1 on the variety V . Since V contains no interior points of K, the Minkowski
functional of K, ppxq, satisfies f pxq  1 ¤ ppxq on V .
Prove that f pxq ¤ ppxq for all m P M .
Use Hahn-Banach to extend f pxq to an F pxq defined on all of X. How can we be sure that F is
continuous?
Show that the hyperplane H  tx : F pxq  1u is closed and contains no interior points of K.
Restate the conclusion in terms of an element x P X  .

14. The norm as an all-purpose sublinear function - Corollary 1 on page 112.


Show that any norm ||x|| is a sublinear function.
Given a bounded linear functional f defined on subspace M € X, with norm ||f ||M , prove that it
can be extended to a bounded linear functional F , with the same norm, defined on all of X. The
trick is to choose ppxq  ||f ||M ||x||.
84 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM

A really simple example: In E 2 (Euclidean norm), let M be the line u  v and define f pt, tq  2t on
M . What is the extension F pu, v q to all of E 2 ?
5.2. Common Dual Spaces U 85

15. Achieving alignment


Show (Corollary 2) for any x P X, how to construct a nonzero bounded linear functional F , defined
on all of X, such that |F pxq|  ||F ||||x||.
In R2 , choose the Somali taxicab norm
||x||  ||pu, vq||  12 |u| |v|. Show the set K for which ||x|| ¤ 1.
Geometric form: As V choose the point x  p1, 21 q. Find a hyperplane H that includes V but that
does not intersect the interior of K.
Extension form(corollary 1): Define f on the subspace generated by x as f p2t, tq  2t. Extend f to a
function F pu, v q with the same norm.
Extension form(corollary 2): Find a linear functional, defined on all of R2 , such that |F pxq| 
||F ||||x||.
86 U 5. D UAL S PACES AND THE H AHN -B ANACH T HEOREM
C HAPTER 6
APPLICATIONS OF THE HAHN - BANACH THEOREM

Reading: [1, Sections 5.5 – 5.9, 5.11 – 5.13]. Also see [7, Chapter 2] for more on the Riesz representation
theorem.

6.1 The Dual of C ra, bs

Recall that C ra, bs is the Banach space of real-valued (uniformly) continuous functions x on ra, bs with
norm }x}  suptPra,bs |xptq|. The challenge is to find a way of representing all linear functionals on C ra, bs
in a unified manner without using Dirac delta functions. This can be done elegantly by using the Stieltjes
integral.

Definition 6.1.1. Let f and g be real-valued, bounded functions on ra, bs that do not have a common point
of discontinuity. Take a partition of the interval a  t0        tn  b. Then if


n¸1
I  maxpt limt qÑ0 f pti1 qpg pti q  g pti1 qq
i 1 i

i 0

exists, I is called the Stieltjes integral of f with respect to g, denoted


»b
f ptq dg ptq.
a

Example 6.1.2. Suppose that you use a logarithmic t-axis in presenting a graph of function f : i.e. you
graph f plog sq. Now the area under the graph no longer represents the integral. Show that the Stieltjes
³b
integral gives the right formula for approximating the integral a f psq ds and that this formula agrees with
the change-of-variable formula.

Solution: The area under the graph is


» exp b 
n¸1
f plog sq dplog sq  lim f plog si1 qplog si  log si1 q
exp a max sip 1 si qÑ0 i0
» exp b » exp b
 f plog sq ds 
1
s
f psq ds,
exp a exp a

where we have the partition a  s0        sn  b, which agrees with the change-of-variable formula. q

87
88 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

³1
Example 6.1.3. What is 0 f ptq dg ptq if:

• g ptq  t?

• g is differentiable?

• g ptq  0 for t   1{2 and g ptq  1 for t ¥ 1{2?

• g ptq  t for t   1{2 and g ptq  t 2 for t ¥ 1{2?

• g pt q  P p T ¤ tq for random variable T with support r0, 1s?

Solution: We have
³1 ³1
• 0 f ptq dg ptq  0 f ptq dt
³1 ³1
• 0 f ptq dg ptq  0 f ptqg 1 ptq dt
³1
• 0 f ptq dg ptq  f p1{2q
³1 ³1
• 0 f ptq dg ptq  0 f ptq dt f p1{2q
³1
• 0 f ptq dg ptq  E rT s
q
The Riesz representation theorem says that any bounded linear functional f on C ra, bs can be ex-
pressed as a Stieltjes integral
»b
f px q  xptq dv ptq.
a

However, there is one caveat: the function v must be of bounded variation. Loosely speaking, TVpv q 
³b
a |dv ptq| must be finite. For example, we could choose xptq  sgn v ptq so that f pxq  TVpxq. However,
such an x is clearly not in C ra, bs since it is not continuous, so we must introduce the following normed
linear space.

Definition 6.1.4. Let B ra, bs be the space of bounded functions x on ra, bs with norm }x}  suptPra,bs |xptq|.
Note that C ra, bs is a subspace of B ra, bs.

Also recall that we defined the normed linear space BVra, bs of functions on ra, bs with norm TVpxq in
Example 2.4.6. Now we can state the Riesz represetnation theorem.

T HEOREM 6.1.5 (Riesz Representation Theorem ([1], Theorem 1, p. 113)). Let f be a bounded linear functional
on C ra, bs. Then there is a function v P BVra, bs such that for all x P C ra, bs,
»b
f px q  xptq dv ptq
a

and }f }  TVpv q. Conversely, every function v P BVra, bs defines a bounded linear functional in this way.
6.1. The Dual of C ra, bs U 89

Proof: Since C ra, bs is a subspace of B ra, bs, if f is a bounded linear functional on C ra, bs then, by the
Hahn-Banach theorem, there exists a linear functional F on B ra, bs which is an extension of f and has the
same norm. For any s P ra, bs, define the set of functions tus : s P ra, bsu, where for all t P ra, bs, us ptq  0
if s  a and us ptq  1tPra,ss if s P pa, bs. Clearly, each us is in B ra, bs. Also define v psq  F pus q, and let
a  t0        tn  b be a finite partition of ra, bs.
First we show that TVpv q ¤ }f }. Define i  sgnpvptiq  vpti1qq for i  1, . . . , n. Then
ņ ņ
|vptiq  vpti1q|  i pv pti q  v pti1 qq

i 1 i 1 

 i pF puti q  F puti1 qq
i 1 

F i puti  ut  q
i 1 ,
i 1 
where the last step follows from the linearity of F . By the Hahn-Banach theorem, }F }  }f }. By definition,
}uti  uti1 }  1 for only one i P t1, . . . , nu, so
 
 ņ 
i puti  q  1.
 
 uti1 
 
i 1 
Therefore, combining these results with Corollary ??,
 
ņ  ņ 
|vptiq  vpti1q| ¤ } } 
F 

i puti  
uti1 

q  }f }.

i 1 
i 1
°
Then since TVpv q is the maximum of ni1 |v pti q  v pti1 q| over all partitions of ra, bs and the above in-
equality holds for all partitions, TVpv q ¤ }f }.
Now we show that }f } ¤ TVpv q. For any x P C ra, bs, define

z pτ q  xpti1 qputi pτ q  uti1 pτ qq.

i 1

Then, again since }uti  ut  }  1 for only one i P t1, . . . , nu,


i 1

}z  x}  i,τ Prmax ,t s
|xpti1q  xpτ q|.
t 
i 1 i

Since x is uniformly continuous, this quantity goes to zero as the partition is made arbitrarily fine. Then
since F is continuous, F pz q Ñ F pxq, which in turn equals f pxq by the Hahn-Banach theorem. Finally,
ņ »b
F pz q  xpti1 qpv pti q  v pti1 qq Ñ xptq dv ptq,

i 1 a

so combined with the previous result,


»b
f px q  xptq dv ptq.
a
90 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

Finally,
» b 
 
 p q p q ¤ }x}TVpvq;
 x t dv t 

a

dividing by }x} gives }f } ¤ TVpv q. Therefore, }f }  TVpv q.


Conversely, if v P BVra, bs, then
»b
f px q  xptq dv ptq
a

is linear. Furthermore, f is bounded since |f pxq| ¤ }x}TVpv q. q


Note that the Riesz representation theorem does not claim that the function v representing the func-
tional f is unique. For example, the functional f pxq  xp1{2q could be represented by a function v ptq that
equals zero for t P r0, 1{2q, any value a P r0, 1s for t  1{2, and 1 for t P p1{2, 1s. In fact, we could even shift
v upward or downward by an arbitrary amount. Therefore, we introduce the following normed linear
space, which we can use in applications.

Definition 6.1.6. Let NBVra, bs be the space of functions v on ra, bs of bounded variation such that v paq  0
and v is right-continuous on pa, bq. The norm is }v }  TVpv q. Note that NBVra, bs is a subspace of BVra, bs.

6.2 The Second Dual Space

Let the functional g pxq be represented by some element x P X  ; we will use the notation g pxq  xx, x y.
Conversely, any x P X defines a functional on X  by f px q  xx, x y.

Example 6.2.1. Show that such an f is linear and bounded.

Solution: f is linear because

f pax1 bx2 q  xx, ax1 bx2 y  a xx, x1 y b xx, x2 y  af px1 q bf px2 q.

Also, |f px q|  | xx, x y | ¤ }x}}x }, so }f } ¤ }x}. Furthermore, by Corollary ??, there exists a nonzero
x P X  such that xx, x y  }x}}x }, so in fact }f }  }x}. q

Definition 6.2.2 (second dual). The space of all bounded linear functionals on X  is denoted X  and is
called the second dual of X.

Definition 6.2.3 (natural mapping). The mapping φ : X Ñ X  defined by φpxq  x such that xx, x y 
xx, xy is called the natural mapping of X into X ; that is, φ maps elements of X into the functionals
they generate X  .

Definition 6.2.4 (reflexivity). A normed linear space is said to be reflexive if X  X  ; that is, if the
natural mapping is surjective.
6.3. Alignment and Orthogonal Components U 91

Example 6.2.5. Give examples of spaces that are reflexive and not reflexive.

Solution: The lp spaces, with 1   p   8, are reflexive since lp  lq with 1{p 1{q  1, as shown in Quiz
Theorem ??, so lp  lq  lp . l1 is not reflexive since the dual of l1 is l8 but the dual of l8 is not l1 . By the
Riesz-Fréchet theorem, any Hilbert space is reflexive. q

6.3 Alignment and Orthogonal Components

Recall that in a Hilbert space H, px, yq  }x}}y} if and only if y  ax for some constant a. We generalize
this idea below.

Definition 6.3.1 (alignment). A vector x P X  is said to be aligned with a vector x P X if xx, xy 
}x}}x}.

Example 6.3.2. Let x  tξi u P lp and x  tηiu P lq , with 1   p   8 and 1{p 1{q  1. Under what
condition are x and x aligned?

Solution: The condition for alignment is simply the condition for equality in the Hölder inequality:

|ξi|
  |ηi|

1 1
q p

}x}p }x}q . q

Example 6.3.3 ([1], Example 1, p. 117). Let x P Lp ra, bs and x P Lq ra, bs, with 1   p   8 and 1{p 1{q  1.
Under what condition are x and x aligned?

Solution: As before, the condition for alignment is simply the condition for equality in the Hölder in-
equality:
xptq  K psgn y ptqq|y ptq|q{p . q

Example 6.3.4 ([1], Example 2, p. 117). Let x P C ra, bs and let Γ  tt P ra, bs : |xptq|  }x}u. Under what
³b
conditions is a bounded linear functional x pxq  a xptq dv ptq aligned with xptq?

Solution: We require that v vary only on Γ and that v is nondecreasing at t if xptq ¡ 0 and nonincreasing
at t if xptq   0. q
Recall that in a Hilbert space H, we defined the orthogonal complement of a set S as S K  ty P H :
ps, yq  0 for all s P S u. We generalize this idea below.

Definition 6.3.5 (orthogonal complement). Let S be a subset of a normed linear space X. The orthogonal
complement or annihilator space of S is defined as S K  tx P X  : xs, x y  0 for all s P S u.
92 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

Given a subset U „ X  , U K „ X  . However, U K is not necessarily a subset of X if X is not reflexive,


so we define the more useful concept below.

Definition 6.3.6 (orthogonal complement of U in X). Let U be a subset of the dual space X  . The orthog-
onal complement of U in X is defined as K U  U K X X.

T HEOREM 6.3.7 ([1], Theorem 1, p. 118). Let M be a closed subspace of a normed space X. Then K rK M s  M .

Proof: Clearly, M „K rM K s, or equivalently, if x P M , then x PK rM K s. We show the converse: if x R M ,


then x RK rM K s. On the subspace rx M s, define the linear functional f pax mq  a for m P M . Then

f p x mq
}f }  sup
}x m}  inf 1
}x m}
.
m M P P
m M

Since x R M and M is closed, inf mPM }x m}  0, so }f }   8. Thus, by the Hahn-Banach theorem, we


can extend f to an x P X  . Since f pmq  0 for m P M , x P M K . But xx, x y  1, so x RK rM K s. Then
since M „K rM K s and M C „ pK rM K sqC , M K rM K s. q

6.4 Minimum Norm Problems

In general, the solutions to minimum norm problems are not unique.

Example 6.4.1 (equivalent to [1], Example 1, p. 119). Cab drivers are charging only for the larger dimen-
sion traveled in R2 . Show that from a point pu, v q with v  0, there are many points pu1 , 0q that can be
reached for the same fare.

Solution: All the points pu1 , 0q with u1 P ru  v, u v s can be reached for fare v. q

Example 6.4.2. Now cab drivers are charging for the total distance driven. Show that from a point pu, v q
with u  v, there are many points pu1 , u1 q that can be reached for the same fare.

Solution: Without loss of generality, assume u   v. Then all the points pu1 , u1 q with u1 P ru, vs can be
reached for fare v  u. q
The following theorem guarantees neither existence nor uniqueness, but it is more general than the
projection theorem in that it specifies a dual problem for any minimum norm problem.

T HEOREM 6.4.3. Let x be an element of a real normed linear space X. Then

d  inf }x  m}  }x}¤max xx, x y ,


m M P 
1,x PM K

where the maximum is achieved for some x0 P M K. If the infimum is achieved for some m0 P M , then x0 is aligned
with x  m0 .
6.4. Minimum Norm Problems U 93

Proof: For  ¡ 0, let m P M satisfy }x  m} ¤ d . Then for any x P M K such that }x} ¤ 1,
xx, xy  xx, xy xloooooomoooooon
m  , x y  xx  m , x y ¤ }x  m }}x } ¤ d .
0
Since  was arbitrary, we conclude xx, x y ¤ d. To complete the proof of the first part of the theorem, we
must find a x such that xx, x y  d. Let N  rx M s. We can represent an element n P N uniquely as
0 0
ax m, where m P M . Define the linear functional f on N by f pnq  ad. Then

}f }  sup |f}pnn}q|  sup }ax|a|d n}  inf }


d
m{a}
 1.
nPN nPN nPN x

By the Hahn-Banach theorem, extend f on N to x0 on X. Then }x0 }  }f }  1 and x0  f on N , so by


construction, x0 P M K and xx, x0 y  f px mq  d.
To prove the second part of the theorem, assume there exists an m0 P M such that }x  m0 }  d.
Let x0 be any element such that x0 P M K , }x0 }  1, and xx, x0 y  d (the x0 constructed above is one
possibility). Then
xx  m0, x0 y  xx, x0 y  d  }x  m0}}x0 },
so x0 is aligned with x  m0 . q
We can generalize the projection theorem as a corollary of Theorem 6.4.3.

C OROLLARY 6.4.4. Let x be an element of a real normed linear space X and let M be a subspace of X. A vector
m0 P M satisfies }x  m0 } ¤ }x  m} for all m P M if and only if there is a nonzero vector x P M K aligned
with x  m0 .

Proof: If }x  m0 } ¤ }x  m} for all m P M , then we have achieved the infimum in Theorem 6.4.3,
specifically at m0 , so the given x0 is aligned with x  m0 .
If there is a nonzero vector x P M K aligned with x  m0 , take }x }  1 without loss of generality. Then
for all m P M , xx, x y  xx  m, x0 y ¤ }x  m} while xx, x y  xx  m0 , x0 y  }x  m0 }. Therefore,
}x  m0} ¤ }x  m} for all m P M . q

Example 6.4.5. Let X  R2 with the l1 norm, M  tpu, vq : u  2v  0u, and x  p0, 3q. Determine d, m0,
and x .

Solution: We have M K  rp1, 2qs, so K  tx : }x }  1u  tp1{2, 1qu. Then x0  arg maxx PK xx, x y 
p1{2, 1q and d  xx, x0 y  3. Then x  m0  p0, 3q since x  m0 and x0 must be aligned, so m0 
p0, 0q. q

Example 6.4.6. Let X  R2 with the l8 norm, M  tpu, v q : u  2v  0u, and x  p0, 3q. Determine d, m0 ,
and x . Note that the norm in X  is the l1 norm since difficulties concerning the dual of l8 occur only
because of infinite dimensionality.

Solution: We have M K  rp1, 2qs, so K  tx : }x }  1u  tp1{3, 2{3qu. Then x0  arg maxx PK xx, x y 
p1{3, 2{3q and d  xx, x0 y  2. Then x  m0  p2, 2q since x  m0 and x0 must be aligned, so
m0  p2, 1q. q
94 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

Example 6.4.7. Let X  R2 with the l1 norm, M  tpu, vq : 2u  v  0u, and x  p1, 2q. Determine d, m0,
and x .

Solution: We have M K  rp2, 1qs, so K  tx : }x }  1u  tp1, 1{2qu. Then x0  arg maxx PK xx, x y 
p1, 1{2q and d  xx, x0 y  2. Then x  m0  p2, 0q since x  m0 and x0 must be aligned, so
m0  p1, 2q. q
Theorem 6.4.3 simply proved the existence of a solution to the dual problem in X  : the theorem below
proves the existence of a solution to the minimum norm problem in X.

T HEOREM 6.4.8. Let M be a subspace in a real normed space X. Then

d  min }x  m }  sup xx, x y ,


m PM K xPM,}x}1

where the minimum is achieved for some m0 P M K. If the supremum is achieved for some x0 P M , then x0  m0
is aligned with x0 .

Proof: Define
}x}M  sup xx, xy
P } }1
x M x

since it is the norm of the functional x restricted to the subspace M . Then for any m P M K,
}x  m}  sup xpx  m q  sup pxx, xy  xx, bmyq
}x}¤1 }x}¤1
¥ sup pxx, xy  xx, bmyq  sup xx, xy ¥ }x}M .
P } }¤1
x M, x P } }¤1
x M, x

Since m was arbitrary, we conclude }x  m } ¤ }x }M . To complete the proof of the first part of the
theorem, we must find a m0 such that }x  m }  }x }M . Consider x restricted to M . The norm of
x so restricted is }x }M . By the Hahn-Banach theorem, extend x restricted on M to y on X. Then
}y}  }x}M and y  x on M so y  x  0 on M . Set m0  x  y; then m0 P M K and
}x  m0 }  }y}  }x}M .
To prove the second part of the theorem, assume there exists an x0 P M such that xx0 , x y  d. Since
the supremum is achieved, clearly }x0 }  1 and }x  m0 }  xx0 , x y  xx0 , x  m0 y, so x0 is aligned
with x  m0 . q

Example 6.4.9. Let X  R2 with the l1 norm, M  tpu, vq : 2u  v  0u, and x  p4, 1q. Determine d, m0 ,
and x0 .

Solution: We have M  rp1, 2qs, so K  tx : }x}  1u  tp1{3, 2{3qu. Then x0  arg supxPK xx, x y 
p1{3, 2{3q and d  xx, x0 y  2. Then x  m0  p2, 2q since x  m0 and x0 must be aligned, so
m0  p2, 1q. q
We can illustrate the duality between Theorems 6.4.3 and 6.4.8 more explicitly as follows: given a
normed linear space X and a subspace M , we have two ways of making a new vector space:
6.5. Applications U 95

• the quotient space X {M . An element of X {M is a coset rxs. Recall that the norm of such a coset is
}rxs}  inf mPM }x  m}. Motivated by this definition, let }x}M  }rxs}. This notation is reasonable
if you think of it as answering the question: if I am allowed to add any element of M to x, how small
can I make the norm?

• the annihilator space M K , a subspace of X  . If we view x as an element of X  , it has norm

}x}M K  
sup
K 
xx, xy
x PM ,}x }1

since this is the norm of x considered as a functional on M .

Using this notation, Theorem 6.4.3 is }x}M  }x}M K and Theorem 6.4.8 is }x}M K  }x}M .

Example 6.4.10. Let X  R3 with the l1 norm, M  tpu, v, wq : u  2v  3w  0u, and x  p7, 2, 3q. Note
that M  rtp2, 1, 0q, p3, 0, 1qus. Determine d, m0 , and x0 .

Solution: We have M K  rp1, 2, 3qs, so K  tx : }x }  1u  tp1{3, 2{3, 1qu. Then x0 
arg maxx PK xx, x y  p1{3, 2{3, 1q and d  xx, bx0 y  }x}M K  2. Then x  m0  p0, 0, 2q since x  m0
and x0 must be aligned, so m0  p7, 2, 1q. q
We will solve minimum norm problems using roughly the following three steps:

1. In characterizing optimal solutions, use the alignment properties of the normed linear space and its
dual. (Refer back to Examples 6.3.2, 6.3.3, and 6.3.4.)

2. Try to guarantee the existence of a solution by formulating the minimum norm problem in a dual
space.

3. Examine the dual problem to see if it is easier – that is, if it is a lower-dimensional problem or is
more transparent – than the original problem.

6.5 Applications

Q UIZ T HEOREM 12 (Tonelli ([1], Example 1, p. 122)). If f is continuous on ra, bs and p0 is the polynomial with
deg p0 ¤ n minimizing maxtPra,bs |f ptq  pptq|, then |f ptq  p0 ptq| achieves its maximum at at least n 2 points
on ra, bs.

Proof: First, we reformulate the problem as follows: in the Banach space C ra, bs, we want to find p P N 
tpolynomials p : deg p ¤ nu such that }f  p} is minimized. N is finite-dimensional (dimension n 1), so
an optimal solution p0 must exist.
Suppose d  }f  p0 } ¡ 0 and let Γ  tt P ra, bs : |f ptq  p0 ptq|  du. By Theorem 6.4.3, the optimal
solution p0 must be such that f  p0 is aligned with an element v P N K with }v }  1. Furthermore, by
Theorem 6.1.5, N K is a subspace of NBVra, bs. Therefore, recall from Example 6.3.4 that if f  p0 is aligned
with v, then v varies only on Γ; that is, v consists of jump discontinuities at the points in Γ.
Now suppose that Γ contains m   n 2 points a ¤ t1        tm ¤ b. Then the polynomial
±
q ptq  ik pt  ti q with k P t1, . . . , mu has degree m  1 ¤ n and is therefore in N . The Stieltjes integral
96 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

³b
a q ptq dv ptq is a linear combination of q pt1 q, . . . , q ptm q with no zero coefficients since v consists of jump
discontinuities at the points in Γ. However, the integral cannot equal zero since q pti q  0 for i  k and
q pti q  0 for i  k, so v R N K . Therefore, f  p0 is not aligned with any nonzero element of N K , so our
assumption that Γ contains less than n 2 points was false. q

Example 6.5.1. Let ra, bs  r1, 1s, f ptq  t2 , and n  1. Find p0 ptq.

Solution: Let p0 ptq  a bt. By Theorem 12, |t2  a  bt| achieves its maximum at at least 3 points on
r1, 1s. The maximum of |t2  a  bt| could occur at t  1 (d  |1  a b|), t  1 (d  |1  a  b|), or
t  b{2 (d  |  b2 {4  a|). Because the coefficient on the t2 term (1) is positive, we know that 1  a b,
1  a  b, and b2 {4 a will be positive for optimal p0 ptq. Setting these quantities equal to d gives a  1{2,
b  0, and d  1{2. q

Example 6.5.2. Let ra, bs  r1, 1s, f ptq  tk 1, and n  k. Find p0 ptq.

Solution: Notice that the function cosppn 1qθq achieves its maximum absolute value of 1 precisely n 2
times on the interval r0, π s, specifically at the points tkθ{pn 1q : k  0, . . . , n 1u. Now cos θ P r1, 1s,
so set t  cos θ and express cosppn 1qθq as a function of t to find p0 ptq. For example,

cos 2θ  2 cos2 θ  1  2t2  1,


so the best approximation to f ptq  t2 on r1, 1s is p0 ptq  1{2. As another example,
a
cos 4θ  cos2 2θ  sin2 2θ  p2 cos2 θ  1q2  p2t 1  t2q2
 p2t2  1q2  4t2p1  t2q  8t4  8t2 1,
so the best approximation to f ptq  t4 on r1, 1s is p0 ptq  t2  81 .
Let Tn pcos θq  cos nθ by the Chebyshev polynomials of the first kind. Note that putting t  cos θ
gives T0 ptq  cos 0t  1 and T1 ptq  cos θ  t. We have the recurrence relation

Tn 1 ptq  cosppn 1qθq  cos θ cos nθ  sin θ sin nθ


 cos θ cos nθ  sin θpsin θ cosppn  1qθq cos θ sinppn  1qθqq
 cos θ cos nθ  sin2 θ cosppn  1qθ  cos2 θ cosppn  1qθq
cos2 θ cosppn  1qθq  sin θ cos θ sinppn  1qθq
 cos θ cos nθ  cosppn  1qθq cosθ cos nθ
 2t Tnptq  Tn1ptq.
Then to determine the best approximation p0 ptq to f ptq  tn on r1, 1s, find the Chebyshev polynomial
Tn ptq, divide by the leading coefficient,
! subtract
 the tn term, and multiply
) by 1. Then |f ptq  p0 ptq|
reaches its maximum at the points t  cos k
n 1π :k  0, . . . , n 1 . q
In the previous examples, we wanted to find the vector of minimum norm in a given linear variety
instead of finding the vector closest to a given subspace. Remember that these problems are equivalent by
Theorems 6.4.3 and 6.4.8. A standard problem is to find an element of minimum norm satisfying a finite
6.5. Applications U 97

number of linear constraints. Treat the solution to the minimum norm problem as a vector x in the dual
space X  . Theorem 6.4.8 then guarantees the existence of a solution to the minimum norm problem. We
can express the constraints in the form xy1 , x y  c1 , . . . , xyn , x y  cn with yi P X for i  1, . . . , n. Then
if M  rtyi us and x̄ is a vector satisfying the constraints, we have

d min
xyi , x yci
}x}  mmin
 PM K
}x̄  m}  sup xx, x̄y ,
P } }¤1
x M x

where the second equality follows from Theorem 6.4.8. Then because M is finite-dimensional (dimenson
n), so we can change the supremum to a maximum. Any vector in M can be written as a linear combina-
°
tion x  ni1 ai yi or equivalently, if we set Y  py1 ,    , yn q and a1  pa1 ,    , an q, as x  Ya. Then if
we set c1  pc1 ,    , cn q,

d min
xyi , x yci
}x}  }Ya
max xYa, x̄ y  max c1 a.
}¤1 }Ya}¤1
°
Then by Theorem 6.4.8, the optimal x is aligned with x  ni1 ai yi (compare to the equivalent result in
°
Hilbert space, where the optimal solution was proportional to x  ni1 ai yi ).

C OROLLARY 6.5.3 ([1], Corollary 1, p. 123). Let y1 , . . . , yn P X and suppose the system of linear equalities
xyi, xy  ci for i  1, . . . , n is consistent; that is
D  tx P X  : xyi, xy  ci for i  1, . . . , nu  ∅.
Then
min }x }  max c1 a
x D
P }Ya}¤1
and the optimal x is aligned with the optimal Ya.

Proof: See the discussion above. q

Example 6.5.4. You are taking the new Gen Ed course “Applied Cubism,” whose goal is to teach you
how to apply art to everyday life. As your term project, you are producing three large concrete cubes.
Initially these will serve as a podium for children who are receiving medals for their athletic prowess, but
they will live on as modern sculpture. The gold-medal cube has side u and provides artistic renown 9u.
The silver-medal cube has side v and provides artistic renown 4v. The gold-medal cube has side w and
provides artistic renown w. You must achieve artistic renown of 72 to earn an A in the course. How do
you accomplish this while minimizing the total volume d3 of concrete?

Solution: The solution x  pu, v, wq lies in the space X   l3 . We want to minimize }x }3 such that
xy, xy  c, where y  p9, 4, 1q P X  l3{2 and c  72. We have
 2
}ay}3{2  a 92
3 3
42
3
12
3
 a  64{3 ¤ 1.
Then by Corollary 6.5.3,

d min
xy, x yc
}x}  }Ya
max c1 a 
}¤1
max 72a  2  62{3
a 64{3
¤
98 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

and d3  288. Then because x and y must be aligned (there is only one constraint, so Ya in Corollary
6.5.3 reduces to a constant times y), we have x  p6, 4, 2q.
We can check this result by setting up a Lagrangian:

Lpu, v, w, λq  u3 v 3 w3  λp9u 4v w  72q


dL
du
 3u2  9λ  0
dL
dv
 3v2  4λ  0
dL
dw
 3w2  λ  0
We see that pu, v, w, λq  p6, 4, 2, 12q satisfies the equations above. q
We will use the following scenario for the next two examples: You are setting up a company that will
operate for four weeks to produce snacks for the inaugural festivities. At the start of each week, and at
the end of the fourth week, you will hire or fire staff (1 unit of labor equals 100 hours per week). Each
unit of labor produces a ton of snacks per week. Since you are using overworked government personnel
to do your hiring and firing, your requirement is to minimize the maximum absolute value of the weekly
change in the labor force.
In accordance with standard practice, you look for the optimal solution in the space X   l8 . It has
the form x  tη1 , η2 , η3 , η4 , η5 u, and it will be the vector of minimum norm in a linear variety determined
by whatever constraints the candidates provide. These constraints will be vectors in X  l1 .

Example 6.5.5. First you go to the McCain-Palin staff. They require 14 tons of moose jerky. Since they
favor small government, they want only 1 unit of labor left over to serve the snacks.

°
Solution: Note that at the start of Week k, we have ki1 ξi units of labor, for i  1, . . . , 5 (the start of Week
5 is the inauguration). Then the functional that gives the tons of snacks produced is y1  p4, 3, 2, 1, 0q and
the functional that gives the number of units of labor left for the inauguration is y2  p1, 1, 1, 1, 1q. The
optimal solution x must satisfy xy1 , x y  c1  14 and xy2 , x y  c2  1.
Since the optimal x is aligned with a vector in l1 , its entries must almost all be d. The exception is
that if the vector in l1 has an entry of zero in any position, then x can have any entry between d and d
in that position without destroying alignment.
The vector Ya can be written p4a1 a2 , 3a1 a2 , 2a1 a2 , a1 a2 , a2 q. Since we want a unit of labor
left over, the first three entries of x will be nonnegative, while the last two will be negative. Therefore,
in the aligned Ya, the first three entries will also be nonnegative, while the last two will be nonpositive.
Therefore, we can discard the absolute values, so }Ya}1  8a1 a2 . From Corollary 6.5.3,

d  min
x PD
}x}  }Ya
max c1 a  14a1
}¤1
a2

Now η3  2a1 a2  0 and we require }Ya}  8a1 a2  1, so a1  1{6, a2  1{3, Ya 


p1{3, 1{6, 0, 1{6, 1{3q, and d  2. Then since x must be aligned with Ya, x  p2, 2, 1, 2, 2q.
 q

Example 6.5.6. Next you go to the Obama-Biden staff. They require 17 tons of brioche stuffed with brie.
Since they favor job creation, they want 3 units of labor left over to serve the snacks.
6.5. Applications U 99

Solution: Again we have y1  p4, 3, 2, 1, 0q P l1 and y2  p1, 1, 1, 1, 1q P l1 , though the optimal solution
x must now satisfy xy1 , x y  c1  17 and xy2 , x y  c2  3.
We repeat all of the steps as before, though now since we want 3 units of labor left over, either the
first three entries of x will be nonnegative while the last two will be negative, or the first four entries of
x will be nonnegative while the last one will be negative. Therefore, in the aligned Ya, either the first
three entries will be nonnegative while the last two will be nonpositive, or the first four entries will be
nonnegative while the last one will be nonpositive. In either case, we can discard the absolute values, so
}Ya}1  8a1 a2 or }Ya}1  10a1 3a2. From Corollary 6.5.3,
d  min
x PD
}x}  }Ya
max c1 a  17a1
}¤1
3a2

Now either η3  2a1 a2  0 or η4  a1 a2  0, which combined with the constraint }Ya}1  1 above,
yields either a1  1{6, a2  1{3, Ya  p1{3, 1{6, 0, 1{6, 1{3q, and d  11{6 or a1  1{7, a2  1{7,
Ya  p3{7, 2{7, 1{7, 0, 1{7q, and d  2. d is larger in the second case, so that case is optimal. Then since
x must be aligned with Ya, x  p2, 2, 2, 1, 2q. q

Example 6.5.7 ([1], Example 2, p. 124). Consider selecting the field current uptq on r0, 1s to drive a motor
governed by θ:ptq θ9ptq  uptq with initial conditions θp0q  0, θ9p0q  0, θp1q  1, and θ9p1q  0 such that
we minimize maxtPr0,1s |uptq|.

Proof: We could think of this problem as being formulated in C r0, 1s, but since this is not the dual of
any normed space, we are not guaranteed that a solution exists in C r0, 1s. Therefore, we instead take
X  L1 r0, 1s, X   L8 r0, 1s, and seek u P X  of minimum norm.
As in Example ??, the constraints are
»1
xy1, uy  y1 ptquptq dt  θ9p1q  0
0
»1
xy2, uy  y2 ptquptq dt  θp1q  1,
0

where y1 ptq  et1 and y2 ptq  1  et1 . From Corollary 6.5.3,

min }u}8  }max a2 ,


Ya¤1

so we want to maximize a2 subject to the constraint


»1
 
p  a2qet1
 a1 a2  dt ¤ 1.
0

We require that uptq is aligned with this function:


»1
uptqpa1 y1 ptq a2 y2 ptqq dt  }u}8 }a1 y1 a2 y2 }  d.
0

This can happen only if uptq is equal to d everywhere and always has the same sign as a1 y1 ptq a2 y2 ptq.
Furthermore, the function a1 y1 ptq a2 y2 ptq is the sum of a constant and an exponential term, so it changes
sign on r0, 1s at most once; the same must be true of uptq.
100 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

From the first constraint, we have


» t »1    
xy1, uy  e  dt 
t 1
et1 dt  et 1  e1  1  et 1  0
0 t 
 

ùñ t  1 log
1
2
1
1
e
,

and from the second constraint, we have


» t »1
xy2, uy  dp1  e  q dt 
t 1
dp1  et1 q dt
0 t
       
d t   e t 1  e1  1  e0  t  et 1 1
ùñ d  
1

p1 eq2 .
log 4e

Then uptq  d for t P r0, t s and uptq  d for t P pt , 1s. q

Example 6.5.8 ([1], Example 3, p. 125). Consider the problem of selecting the thrust program uptq for a
vertically ascending rocket-propelled vehicle, subject only to the forces of gravity and rocket thrust, in
order to reach a given altitude with minimum fuel expenditure.

Solution: Assuming fixed unit mass, unit gravity, amd zero initial conditions, the altitude xptq is gov-
erned by the differential equation

:ptq  uptq  1,
x xp0q  x9 p0q  0.

Integrate to give
»T
x9 pT q  uptq dt  T.
0

(Note this gives x9 p0q  0.) Now integrate again:


»T » T » s

x pT q  x9 psq ds  uptq dt  s ds
0 0 0
»T » s
» T » T

T2
 uptq dt ds 
2
 uptq ds dt
0 0 0 t
»T 2
 pT  tquptq dt  T2 .
0

(Note this gives xp0q  0.) Then for xpT q  1 we have the constraint
»T
T2
xy, uy  y ptquptq dt  1
2
,
0
³T
where y ptq  T  t, and we want to minimize 0 |uptq| dt.
We could think of this problem as being formulated in L1 r0, T s, but since this is not the dual of any
normed space, we are not guaranteed that a solution exists in L1 r0, T s. Therefore, we instead take X 
6.6. Hyperplanes and Linear Functionals U 101

C0 r0, T s, X   NBVr0, T s, and seek v P X  minimizing


»T
|dvptq|  TVpvq  }v}
0

subject to
»T
T2
pT  tq dvptq  1 2
.
0

Because we cannot expend negative fuel, the physical meaning of v ptq is the total fuel consumption over
the interval r0, ts, while }v } is the total fuel consumption over the whole interval r0, T s.
From Corollary 6.5.3, 

T2
d  min }v }  max a 1 .
}pT tqa}¤1 2
Then
}pT  tqa}  tmax
Pr0,T s
|pT  tqa|  T |a|,
so the optimal choice is a  1{T and d  1{T T {2. The optimal v must be aligned with pT  tqa and,
from Example 6.3.4, can vary only at t  0. Thus, v is a step function. Differentiating d with respect to T
? ? ?
gives T  2 and d  2, so v  2  1rtPp0,?2ss . q

6.6 Hyperplanes and Linear Functionals

Definition 6.6.1 (hyperplane). A hyperplane H in a linear vector space X is a maximal proper linear
variety; that is, a linear variety H such that H  X (proper), and if V is any linear variety containing H,
then either V  X or V  H (maximal).

The following proposition, stated without proof, gives us a more practical way to work with hyperplanes.

Proposition 6.6.2 ([1], Proposition 1, p. 129). Let H be a hyperplane in a linear vector space X. Then there is
a linear functional f on X and a constant c such that H  tx : f pxq  cu. Conversely, if f is a nonzero linear
functional on X, then the set tx : f pxq  cu is a hyperplane in X.

Definition 6.6.3 (half-space). The four half-spaces associated with a hyperplane H are

tx : f pxq   cu tx : f pxq ¤ cu
tx : f pxq ¡ cu tx : f pxq ¥ cu

Definition 6.6.4 (supporting hyperplane). A closed hyperplane H in a normed space X is said to be a


support or supporting hyperplane for the convex set K if K is contained in one of the closed half-spaces
determined by H and H contains a point in K̄
102 U 6. A PPLICATIONS OF THE H AHN -B ANACH T HEOREM

T HEOREM 6.6.5 (Support Theorem ([1], Theorem 2, p. 133)). Given x in the boundary of a convex set K, there
exists a supporting hyperplane containing x.

Proof: If x is in the boundary of K, then there exists a sequence txk u in K̄ C such that xk Ñ x. Now for
any k, since xk R K̄, there exists a hyperplane Hk strictly separating xk from K. Let Hk be defined by the
functional fk ; that is, fk pxk q ¡ fk pyq for all y P K. Now we extract a subsequence so that fk Ñ f ; then in
the limit, f pxq ¡ f pyq for all y P K. f defines the desired separating hyperplane. q

T HEOREM 6.6.6 (Eidelheit Separation Theorem ([1], Theorem 3, p. 133)). Let K1 and K2 be convex sets in X
such that K̊1  ∅ and K2 X K̊1  ∅. Then there is a closed hyperplane H separating K1 and K2 , or equivalently,
there exists an x P X  such that supxPK1 xx, x y ¤ inf xPK2 xx, x y. In other words, K1 and K2 lie in opposite
half-spaces determined by H.

Proof: Let K  K1  K2 . Then K contains an interior point and 0 is not one of them. Recall from
Proposition 2.2.3 that K1  K2 is convex. Then by Theorem 6.6.5, there exists an x P X  with x  0
such that xx, x y ¤ 0 for x P K. Setting x  x1  x2 with x1 P K1 and x2 P K2 gives xx1 , x y ¤ xx2 , x y,
so there exists a real number c such that

sup xx, x y ¤ c ¤ inf xx, xy


P
x K1 P
x K2

The desired hyperplane is then H  tx : xx, xy  cu. q


C HAPTER 7

CALCULUS OF VARIATIONS

Reading: [1, Sections 7.1 – 7.5, 7.7]

7.1 Review

Example 7.1.1. Let


u2 v
f pu, v q 
u2 v 2
with f p0q  0.

• Calculate B f {B u and B f {B v at 0.

• Calculate the directional derivative at 0 along the vector h  p1, mq.

• Introducing the Euclidean norm, show that the derivative does not exist.

Solution: We have
Bf  δf p0; p1, 0qq  lim 1 pf pα, 0q  f p0, 0qq  0
Bu αÑ 0 α
Bf  δf p0; p0, 1qq  lim 1 pf p0, αq  f p0, 0qq  0
Bv αÑ 0 α

Furthermore,

δf p0; hq  lim pf pα, αmq  f p0, 0qq


1
α Ñ0 α
3
 αlim 1 α m

Ñ0 α α 2 p 1 m 2 q 1
m
m2
.

Note that the directional derivative is not a linear functional of h. In fact, the only linear function
δf p0; hq that could possibly work in this case, given that both partial derivatives evaluated at 0 equal
0, is δf p0; hq  0.
We now show that the derivative does not exist; that is, we show that there is no linear function
δf p0; hq of h such that
1
}h}Ñ0 }h}
lim pf phq  f p0q  δf p0; hqq  0.
First we convert to polar coordinates: h  ph1, h2q  pr cos θ, r sin θq. Then f phq  r cos2 θ sin θ and

103
104 U 7. C ALCULUS OF VARIATIONS

}h}  r. We rewrite the above equation as



r cos2 θ sin θ  par cos θ br sin θq  cos2 θ sin θ  a cos θ  b sin θ.
1
lim
r Ñ0 r

for constants a, b. There exist no constants a, b such that the above quantity is zero for all θ, so the deriva-
tive does not exist. Intuitively, we can see from Figure 7.1.1 that f has a cusp at 0. q

Figure 7.1.1: On the left, we see that the function f in Example 7.1.1 has a cusp at 0. Furthermore, the
partial derivatives B f {B u (middle) and B f {B v (right) are clearly discontinuous at 0, so Proposition 7.1.2
does not apply.

The following proposition will be useful later. Note that it did not apply above since the partial deriva-
tives were not continuous at 0, as shown in Figure 7.1.1.

Proposition 7.1.2 ([1, Example 7.2.4, p. 172]). Let X  E n and let f pxq be a functional on E n having continuous
partial derivatives with respect to each xi . Then

δf px; hq 

Bf h .
i1
B xi i

7.2 Gateaux and Fréchet Differentials

The Gateaux differential generalizes the concept of a directional derivative:

Definition 7.2.1 (Gateaux differential). Let X be a vector space, Y be a normed space, and T : D Ñ Y a
transformation, with D „ X. Let x be a fixed element of D and h an arbitrary element of X. If the limit

δT px; hq  lim pT p x αhq  T pxqq


1
αÑ 0 α

exists, it is called the Gateaux differential of T at x with increment h. If the limit exists for all h P X, T is
said to be Gateaux differentiable at x.

Note that while we do require Y to be normed since we need to take limits in Y , we do not require X to
7.2. Gateaux and Fréchet Differentials U 105

be normed. Furthermore, note that if T is Gateaux differentiable at x, then δT px; hq is linear in h since

δT px; λhq  lim p T px αλhq  T pxqq  λ lim pT p x γhq  T pxqq  λδT px; hq.
1 1
αÑ 0 α γ Ñ0 γ

The Fréchet differential generalizes the concept of a derivative:

Definition 7.2.2 (Fréchet differential). Let X and Y be normed spaces, and T : D Ñ Y a transformation,
with D „ X. If for fixed x P D and arbitrary h P X there exists a linear, continuous function δT px; hq of
h such that
1
}h}Ñ0 }h}
lim pT px hq  T pxq  δT px; hqq  0,
then T is said to be Fréchet differentiable at x and δT px; hq is called the Fréchet differential of T at x
with increment h.

In Example 7.1.1, we showed that the given function was not Fréchet differentiable at the origin.

Example 7.2.3. Calculate the Gateaux differential of


»T
f px q  xptq2 dt
0

from the definition. Then let T  π{2, xptq  sin t, and hptq  cos t. Use the differential to approximate
» π

psin t 0.01 cos tq2 dt.


2

Solution: We have
»
1 T 
lim pf px αhq  f pxqq  lim p xptq αhptqq2  xptq dt
1
αÑ 0 α αÑ 0 α 0
» »T
1 T
 αlim
Ñ0 α 0
p 2αxptqhptq  α2 h2 ptqq dt  2xptqhptq dt.
0

Furthermore,
» π » π » π

psin t 0.01 cos tq dt 


2 2 2
2 2
sin t dt 0.02 sin t cos t dt
0 0 0
» π » π


2 2
2
sin t dt 0.01 sin 2t dt
0 0
» π »π

2
sin2 t dt 0.005 sin s ds
0 0

 π4 0.005  0.7954.

q
106 U 7. C ALCULUS OF VARIATIONS

Example 7.2.4 ([1, Examples 7.2.2, 7.2.5, p. 172 – 174]). Let X  C r0, 1s. Calculate the Gateaux differential
of » 1
f p xq  g pxptq, tq dt,
0

assuming that the partial derivative gx px, tq exists and is continuous with respect to x and t. Show that
this differential is also a Fréchet differential.

Solution: We write the Gateaux differential as


»1 

δf px; hq  g pxptq pq q
d
αh t , t dt .
dα 0 
α 0

By our assumptions on g, we can interchange the order of integration and differentiation:


»1
δf px; hq  gx pxptq, tqhptq dt.
0

Now we show that the Gateaux differential is also a Fréchet differential. We have
» 1 
 
|f px hq  f pxq  δf px; hq|  
 p p pq
g xt hptq, tq  g pxptq, tq  gx pxptq, tqhptqq dt .
0

For fixed t, we have


g px pt q hptq, tq  g pxptq, tq  gx px̄ptq, tqhptq,
where |xptq  x̄ptq| ¤ |hptq|. Because r0, 1s is compact, gx px, tq is not just continuous but also uniformly
continuous with respect to x and t. Therefore, for every  ¡ 0, there exists a δ ¡ 0 such that for }h}   δ,
|gxpx h, tq  gxpx, tq|   . Therefore,
» 1 
 
|f px hq  f pxq  δf px; hq|  
 p p p q q  p p q qq p q
gx x̄ t , t gx x t , t h t dt ¤ }h}.
0

Therefore,
1
}h}Ñ0 }h}
lim |f px hq  f pxq  δf px; hq|  0,

so δf px; hq is indeed a Fréchet differential. q

Definition 7.2.5 (relative minimum and maximum). Let f be a real-valued functional defined on a subset
Ω of a normed space X. A point x0 P Ω is a relative or local minimum of f on Ω if there is an open ball
B px0 q such that f px0 q ¤ f pxq for all x P Ω X B px0 q. The relative or local maximum is defined similarly.

Q UIZ T HEOREM 13 ([1, Theorem 7.4.1, p. 178]). Let a real-valued functional f have a Gateaux differential on a
vector space X. A necessary condition for f to have an extremum at x0 P X is that δf px0 , hq  0 for all h P X.

Proof: For every h P X, the function g pαq  f px0 αhq must achieve a extremum at α  0. By Fermat’s
7.3. Euler-Lagrange Equations U 107

theorem from single-variable calculus, if α  0 is an extremum, then g 1 p0q  0; that is,

g 1 p0q  lim pgpαq  gp0qq  αlim pf px 0 αhq  f px0 qq  δf px0 , hq  0.


1 1
q
α Ñ0 α Ñ0 α
Note that the theorem is necessary, not sufficient. For example, the function f pu, v q  uv has partial deriva-
tives B f {B u  v and B f {B v  u, so the Gateaux differential of f at 0 with increment h  ph1 , h2 q is
0h1 0h2  0. However, 0 is not an extremum, but a saddle point.

7.3 Euler-Lagrange Equations

Definition 7.3.1 (Dra, bs). The normed linear space Dra, bs consists of all continuous and continuously
differentiable functions on ra, bs, with norm

}x}  tmax
Pra,bs
|xptq| max |x9 ptq|.
Pr s
t a,b

Q UIZ T HEOREM 14 (Euler-Lagrange ([1, Section 7.5, p. 179 – 180])). Consider finding a function x P Drt1 , t2 s
minimizing a functional of the form
» t2
J  f pxptq, x9 ptq, tq dt,
t1

9 and t and has continuous partial derivatives with respect to x and x.9
where f is continuous with respect to x, x,
Assume that xpt1 q  c1 and xpt2 q  c2 for constants c1 , c2 . Show that x0  arg minx J is a solution to the
Euler-Lagrange equation
fx pxptq, x9 ptq, tq  fx9 pxptq, x9 ptq, tq  0.
d
dt

Proof: Our admissible set is the subset of Drt1 , t2 s with xpt1 q and xpt2 q fixed. Given an admissible vector
x, we also consider vectors of the form x h that are admissible. The set of all h such that x h is
admissible is called the class of admissible variations. Clearly, hpt1 q  hpt2 q  0 for such h.
The Gateaux differential of J is
» t2 
αh, x9 αh9 , tq dt

δJ px; hq  f px
d
.
dα t1 α 0
By our assumptions on f , we can interchange the order of integration and differentiation:
» t2 » t2
δJ px; hq  9 tqhptq dt
fx px, x, 9 tqh9 ptq dt.
fx9 px, x,
t1 t1

If we assume that fx9 has a continuous derivative with respect to t, then we can rewrite the second integral
using integration by parts:
» t2 » t2
9 9 tqh9 ptq dt  fxpx, x,9 tqhptq|tt 
fx px, x, 9
2
1
d
dt
9 tqhptq dt,
fx9 px, x,
t1 t1
108 U 7. C ALCULUS OF VARIATIONS

so » t2 

δJ px; hq  9 tq 
fx px, x, 9 tq hptq dt
fx9 px, x, 9 tqhptq|tt21 .
fx9 px, x,
d
t1 dt
For all admissible variations h, hpt1 q  hpt2 q  0, so the second term equals zero. By Quiz Theorem 13, a
necessary condition on x0  arg minx J is that it must satisfy
» t2 

δJ px; hq  fx px, x,
9 tq  fx9 px, x,
9 tq hptq dt  0
d
t1 dt

for all h P Drt1 , t2 s with hpt1 q  hpt2 q  0.


Now we want to show that x0 must satisfy the Euler-Lagrange equation

fx px, x,
9 tq  fx9 px, x,
9 tq  0.
d
dt
³t
We prove that if g ptq is continuous on rt1 , t2 s and t12 g ptqhptq dt  0 for all h P Drt1 , t2 s with hpt1 q 
hpt2 q  0, then g ptq  0 on rt1 , t2 s. Assume g ptq is nonzero, say positive, for some t P rt1 , t2 s. Then since g
is continuous, it is positive on some interval rs1 , s2 s € rt1 , t2 s. Let
#
hptq 
p t  s 1 q2 ps 2  t q2 s1 ¤ t ¤ s2 .
0 otherwise

But then » t2
aptqhptq dt ¡ 0,
t1

contradicting our claim that this integral evaluates to zero. Therefore, g ptq  0 on rt1 , t2 s. Noting that

9 tq 
fx px, x, 9 tq
fx9 px, x,
d
dt
is continuous, we apply this result to obtain the Euler-Lagrange equation. q
In fact, we do not need to assume that the derivative of fx9 with respect to t is continuous. See [1, Lemmas
7.5.1 – 7.5.3] for an alternative derivation.

Example 7.3.2. For a mass on a spring, if we set all coefficients equal to 1, the difference between kinetic
9 tq  21 px9 2  x2q. Find the function x that minimizes
energy and potential energy (the Lagrangian) is f px, x,
» π » π

 9 tq dt 
f px, x,
1
px9 ptq2  xptq2q dt
2 2
J
0 0 2

with xp0q  1 and xpπ {2q  0.

Solution: x must satisfy the Euler-Lagrange equation

fx px, x,
9 tq  fx9 px, x,
9 tq  x :  0.
d
x
dt
The general solution to this equation is xptq  A cos t B sin t. From the initial conditions, we have A  1
7.3. Euler-Lagrange Equations U 109

and B  0, so xptq  cos t. q

Example 7.3.3. The balance in your campaign chest is xptq, and it grows at a rate equal to xptq by ac-
cumulating interest. If you want a different growth rate x9 ptq, you must make contributions at a rate
uptq  x9 ptq  xptq. Find the function x that minimizes
»1 »1 »1
J  puptqq 2
dt  9 tq dt 
f px, x, px9 ptq  xptqq2 dt
0 0 0

with xp0q  1 and xp1q  e2 .

Solution: x must satisfy the Euler-Lagrange equation

9 tq 
fx px, x, 9 tq  2px9  xq  2p:x  x9 q  2px  x:q  0.
fx9 px, x,
d
dt
The general solution to this equation is xptq  Aet Bet . From the initial conditions, we have A 
1 e2 {p1 eq and B  e2 {p1 eq, so



et .
e2 e2
xptq  1 t
e 
1 e 1 e

We make contributions at a rate

uptq  x9 ptq  xptq  pAet  Bet q  pAet Bet q  2Bet  12e e et.
2
q

Example 7.3.4 ([1, Example 7.5.2, p. 182]). You want to determine your lifetime plan of investment and
expenditure that maximizes total enjoyment. Assume you have fixed savings S and know you will die in
T years. Further assume you have no descendants and are not a charitable person, so you plan to have
no savings left at time T . Your total capital xptq at time t grows at rate x9 ptq  αxptq  rptq, where α is
the interest rate you earn on your investments and rptq is your rate of expenditure. Your enjoyment is a
function U of your rate of expenditure rptq, so you want to maximize
»T »T »T
J  eβt U prptqq dt  9 tq dt 
f px, x, eβt U pαxptq  x9 ptqq dt.
0 0 0

Find the function x that maximizes J with xp0q  S and xpT q  0.

Solution: x must satisfy the Euler-Lagrange equation

d  βt 1 
fx px, x,
9 tq  fx9 px, x,
9 tq  αeβt U 1 pαxptq  x9 ptqq e U pαxptq  x9 ptqq  0
d
dt dt
ùñ dt U pαxptq  x9 ptqq  pβ  αqU 1pαxptq  x9 ptqq.
d 1

Integrating this equation gives U 1 prptqq  U 1 prp0qq expppβ  αqtq.


110 U 7. C ALCULUS OF VARIATIONS

?
Consider U prq  2 r. Then U 1 prq  r1{2 and pU 1 q1 prq  r2 . From the previous equation, we
have rptq  rp0q expp2pα  β qq (note pU 1 q1 prsq  pU 1 q1 prqpU 1 q1 psq). We can now solve for xptq: the
calculations are messy and not worth repeating here. q

7.4 Problems with Constraints

In several optimization problems, we require the optimal vector to satisfy some constraints. In the previ-
ous section, we considered functions that vanish on the endpoints of an interval. More generally, we op-
timize a functional f subject to n nonlinear constraints given in the implicit form g1 pxq  0, . . . , gn pxq  0.
We will assume throughout this section that f and g1 , . . . , gn are continuous and Fréchet differentiable on
the normed space X.

Definition 7.4.1 (regular point). A point x0 satisfying the constraints g1 pxq  0, . . . , gn pxq  0 is a regular
point of these constraints if the n linear functionals g11 px0 q, . . . , gn1 px0 q are linearly independent.

Recall that even though g1 , . . . , gn are nonlinear functionals, their Gateaux differentials are linear in the
increment h.

T HEOREM 7.4.2 ([1, Theorem 7.7.1, p. 187]). If x0 is an extremum of the functional f subject to the constraints
gi pxq  0 for i  1, . . . , n and if x0 is a regular point of these constraints, then δf px0 ; hq  0 for all h satisfying
δgi px0 ; hq  0 for i  1, . . . , n.

Proof (single constraint): Choose h P X such that δg px0 ; hq  0. Choose any vector y P X such that
δg px0 ; y q  0 (note that if x0 were not a regular point, we would not be able to do this).
Now set g px0 h φy q  0. This is a nonlinear function of  and φ since x0 , H, and y are fixed. The
derivative of this function with respect to φ at , φ  0 is just δg px0 ; y q, which by assumption is nonzero.
Therefore, by the implicit function theorem, we can solve for φ as a function of  in some neighborhood
of   0. Then

0  g px 0 h φy q  lo
gopmo
x0oqn  looomooon
δg px0 ; hq φδg px0 ; y q opq op}φy }q. (7.4.1)
0 0

Since δg px0 ; y q  0, there exist constants constants c1 , c2 ¡ 0 such that c1 |φ| ¤ |δg px0 ; y qφ| ¤ c2 |φ|, and
since y  0, there exist constants d1 , d2 ¡ 0 such that d1 }φy } ¤ |φ| ¤ d2 }φy }. Therefore, c1 d1 }φy } ¤
|δgpx0; yqφ| ¤ c2d2}φy}, so from (7.4.1), }φy}  opq.
Now x0 h φy satisfies the constraint and is a function of . By Quiz Theorem 13, we must have

f p x0 φpqy q|0  0,
d
h
d
which implies δf px0 ; hq  0 since }φy }  opq. q
7.4. Problems with Constraints U 111

Proposition 7.4.3 ([1, Lemma 7.7.1, p. 188]). Let f0 , f1 , . . . , fn be linear functionals on a vector space X and
suppose that f0 pxq  0 for every x P X satisfying fi pxq  0 for i  1, . . . , n. Then there exist scalars λ1 , . . . , λn
such that

f0 λi fi  0.

i 1

Proof: In E n , consider the subspace M formed by all points of the form pf1 pxq, . . . , fn pxqq. Define the
linear functional g on rM s by g pmq  f0 pxq. By the Hahn-Banach theorem, we can extend g to λ defined
on all of E n . Furthermore, since pE n q  E n , we can represent λ as pλ1 , . . . , λn . Thus, we have found
λ1 , . . . , λn such that

λi f1 pxq  f0 pxq;

i 1

adding f0 pxq to both sides gives the desired result. q

T HEOREM 7.4.4 (Lagrange Multipliers ([1, Theorem 7.7.2, p. 188 – 189])). If x0 is an extremum of the func-
tional f subject to the constraints gi pxq  0 for i  1, . . . , n and x0 is a regular point of these constraints, then
there exist scalars λi for i  1, . . . , n such that the functional

Lpxq  f pxq λ i g i px q

i 1

is stationary at x0 ; that is,



δLpx0 ; hq  δf px0 ; hq λi δg px0 ; hq  0

i 1

for all h.

Proof: By Theorem 7.4.2, δf px0 ; hq  0 whenever δgipx0; hq  0 for i  1, . . . , n. Applying Proposition


7.4.3 gives the desired result. q

Example 7.4.5 (Dido’s Problem ([1, Example 7.7.1, p. 189])). Find the function x that maximizes
»1
xptq dt
1
subject to the arc length constraint
»1 a
x9 2 1 dt  L
1
with xp1q  xp1q  0.

Solution: We want to find a stationary point of


» »1  a
J  1 f px, x,9 tq dt 
1
x λ x9 2 1 dt.
1
112 U 7. C ALCULUS OF VARIATIONS

x must satisfy the Euler-Lagrange equation

fx px, x,
9 tq  d
9 tq  1  λ dtd ? 2x9
fx9 px, x,  1  λ p1 x:x9 2q3{2 κ  0.
dt x9 1 looooomooooon

Note that κ is the curvature of a plane curve (see [8, Example 4.2.2, p. 90]). the above equation says that κ
is the constant 1{λ, so x must be the arc of a circle

px  c1q2 pt  c2q2  r2.


Recall that the curvature of a circle is the inverse of its radius, so r  λ is the radius of the circle. From the
boundary conditions, we have

p0  c1q2 p1  c2q2  r2


p0  c1q2 p1  c2q2  r2,
?
so c2  0 and c1  r2  1. Then the arc length of the circle from t  1 to t  1 is 2rφ, where
φ  tan1 p1{c1 q is the angle between the line segments pp0, c1 q, p0, 0qq and pp0, c1 q, p1, 0qq. q
We can also use the calculus of variations to find aligned vectors, as long as absolute values are not an
issue.

Example 7.4.6. A linear functional on L4 r0, 1s is represented by a function y ptq that is non-negative on
r0, 1s. Find the function xptq that maximizes xx, yy subject to the constraint }x}4{3  1.

Solution: Note that the constraint is equivalent to }x}4{3


{
4 3
 1. Therefore, we want to find a stationary
point of
»1 »1
J  9 tq dt 
f px, x, xptqy ptq λpxptqq4{3 dt.
0 0
x must satisfy the Euler-Lagrange equation

4 1{3
fx px, x,
9 tq  fx9 px, x,
9 tq  y  0,
d
λx
dt 3

so xptq  K py ptqq3 , where K is a constant chosen so that }x}4{3


{
4 3
 1. For example, if yptq  et, then
»1 »1 

e 3 3t dt  e4t dt   1
4 1
0 0 4 4e4

and 
3{4
xptq   e3t .
1 1
q
4 4e4

Example 7.4.7. In many of the problems we solved in Hilbert space, the goal was to minimize the norm of
a vector x P L2 r0, 1s subject to the constraints pyi , xq  ci for i  1, . . . , n. Using the calculus of variations,
show that x is a linear combination of the constraints.
7.4. Problems with Constraints U 113

Solution: We want to find a stationary point of


»1 »1 ņ

J  f px, x,
9 tq dt  pxptqq 2
λi yi ptqxptq dt.
0 0 i 1
Then x must satisfy the Euler-Lagrange equation

9 tq 
fx px, x, 9 tq  2x
fx9 px, x,  0,
d
λi yi
dt i 1 
so x is a linear combination of the constraints, as we saw before. q
In Lp r0, 1s, x is still aligned with the constraints, but “aligned” is no longer equivalent to “propor-
tional”.

Example 7.4.8. Find the function x P L4 r0, 1s of minimum norm such that
»1
t3 xptq dt  1.
0

Solution: We want to find a stationary point of


»1 »1
J  9 tq dt 
f px, x, ppxptqq4 λt3 xptqq dt.
0 0

Then x must satisfy the Euler-Lagrange equation

fx px, x,
9 tq  fx9 px, x,
9 tq  4x3  0,
d
λt3
dt

so xptq  5t4 . q
114 U 7. C ALCULUS OF VARIATIONS
C HAPTER 8
CONVEX FUNCTIONALS

Reading: [1, Sections 7.8, 7.10 – 7.12]

8.1 Local to Global

Recall from single variable calculus that if a function(al) f satisfies f 2 pxq ¥ 0 on ra, bs, then any point x
where f 1 pxq  0 is not merely a local minimum but a global minimum. The following theorem helps us
generalize the second derivative test by relating the second derivative to convexity.

T HEOREM 8.1.1. If f pxq is twice differentiable on ra, bs, then it is convex on ra, bs if and only if f 2 pxq ¥ 0 on C¿

Proof: First we show the “if” direction: specifically, we show that if f is a convex function, then f 1 ptq is
increasing or constant for all x P C. Pick x1 , x2 P C such that x1   x2 . Then

f px 2 hq  f px 2 q  f px 1 hq f px 1 q
f 1 px2 q  f 1 px1 q  lim ,
hÑ0 h

so we simply need to show that f px2 hq f p x 1 q ¥ f px 1 hq f px2 q for h ¡ 0 if f is convex. Let


λ  x2x2  x1
x1 h so that x1 h  p1  λqx1 λpx2 hq and x2  λx1 p1  λqpx2 hq. Then
f p x1 hq f px2 q  f pp1  λqx1 λ px 2 hqq f pλx1 p1  λqpx2 hqq ¤ f px1 q f px 2 hq,

where the inequality follows since f is convex.


Next we show the “only if” direction: specifically, we show that if f 2 pxq ¥ 0 for x P C, then f pλx1
p1  λqx2q ¤ λf px1q p1  λqf px2q for any x1, x2 P C. The proof is by contradiction: suppose that there
exists some λ0 P p0, 1q such that the previous inequality was false. Then by the mean value theorem, there
exists some z1 P px1 , λx1 p1  λqx2 q such that

f pλx1
p1  λqx2q  f px1q ¡ p1  λqf px1q  p1  λqf px2q  f px2q  f px1q ,
f 1 pz 1 q 
p1  λqpx1  x2q p1  λqpx1  x2q x2  x1

where the inequality follows from our assumption. Similarly, there exists some z2 P pλx1 p1  λqx2 , x2 q
such that
f px2 q  f pλx1 p1  λqx2 q λf px1 q  λf px2 q f px 2 q  f px 1 q
f 1 pz 2 q    
λpx1  x2 q λ px 1  x 2 q x2  x1
,

but this is a contradiction since we chose z2 ¡ z1 and f 1 is increasing. Therefore, f must be convex. q

115
116 U 8. C ONVEX F UNCTIONALS

Definition 8.1.2 (convex functional). A real-valued functional f defined on a convex subset C of a linear
vector space is convex if
f pλx1 p1  λqx2 q ¤ λf px1 q p1  λqf px2 q
for all x1 , x2 P C and all λ P p0, 1q.

Example 8.1.3. Check that on L2 r0, 1s, the functional


»1
f pxq  x2 ptq dt
0

is convex.

Solution: We have
»1
f pλx1 p1  λqx2q  pλx1ptq p1  λqx2ptqq2 dt
0
»1 »1 »1
λ 2
x21 ptq dt p 1  λq 2
x22 ptq dt λ p1  λ q p pq
x1 t p qq
x2 t 2 dt . q
0 0 loooooooooooooooooomoooooooooooooooooon
0
¥0

Q UIZ T HEOREM 15 ([1, Proposition 7.8.1, p. 191]). Let f be a convex functional defined on a convex set C in a
normed space. Let µ  inf xPC f pxq. Then

• The subset Ω „ C where f pxq  µ is convex.

• If x0 is a local minimum of f , then f px0 q  µ and x0 is therefore a global minimum of f .

Proof: First, suppose x1 , x2 P Ω. Then for x  λx1 p1  λqx2 with λ P p0, 1q, we have f pxq ¤ λf px1 q
p1  λqf px2q  µ. Furthermore, for any x P C, f pxq ¥ µ by definition. Therefore, f pxq  µ.
Next, suppose N is a neighborhood about x0 in which x0 minimizes f . For any x P C with x  x0 ,
there exists some λ P p0, 1q such that x1  λx0 p1  λqx P N . Then

paq pbq
f px 0 q ¤ f px1q ¤ λf px0q p1  λqf pxq,
where (a) follows from the fact that x0 is a local minimum and (b) follows from the convexity of f . Rear-
ranging terms, we have f px0 q ¤ f pxq for all x P C, so x0 is a global minimum. q

Definition 8.1.4 (epigraph). The epigraph of a convex functional f defined on a convex set C in a vector
space X is the set
rf, C s  tpr, xq P R  X : x P C, f pxq ¤ ru.
8.2. Conjugate Convex Functionals U 117

Example 8.1.5. Show that the epigraph is a convex set.

Solution: Pick px1 , y1 q and px2 , y2 q such that f px1 q ¤ y1 and f px2 q ¤ y2 . Then

λy1 p1  λqy2 ¥ λf px1q p1  λqf px2q ¥ f pλx1 p1  λqx2q,


where the last step follows since f is convex. Therefore the point pλx1 p1  λqx2 , λy1 p1  λqy2q is on
or above the graph of f , so the region on or above the graph of f is convex. q

8.2 Conjugate Convex Functionals

Definition 8.2.1 (conjugate convex). Let f be a convex functional defined on a convex set C in a normed
space X. The conjugate set C  is defined as
" *
C   x P X  : suppxx, x y  f pxqq   8
xPC

and the functional f  conjugate to f is defined on C  as

f  px q  suppxx, x y  f pxqq.


P
x C

Example 8.2.2. Let C  r0, 1s and f pxq  x2. Find C , f pxq, C , and f .

Figure 8.2.1: The epigraphs rf, C s and rf  , C  s in Example 8.2.2. The linear functionals mx for m  0, 1, 2
are shown on rf, C s, and the linear functionals xm for x  0, 1{2, 1 are shown on rf  , C  s.

Solution: Any element of the dual space can be represented by a real number m in the form xx, x y  mx.
Figure 8.2.1 shows the convex sets rf, C s and rf  , C  s. From the graph and using elementary calculus, we
118 U 8. C ONVEX F UNCTIONALS

see that $
'
'
&0 m¤0
f  px q  f  pmq  m2
0 m 2.
' 4
'
%m 1 m¥2
Furthermore, f  pmq is the vertical coordinate of the point where the hyperplane (line) tangent to the
graph of f and parallel to the subspace mx intersects the vertical axis. For m ¤ 0, clearly the tangent line
and mx are equal, so 0  f  pmq. For m P p0, 2q, the tangent line is given by y  2x0 px  x0 q x20 : for
m  2x0 , the vertical intercept of this line is m2 {4  f  pmq. For m ¥ 2, the tangent line is given by
y  mpx  1q 1: the vertical intercept of this line is pm  1q  f  pmq.
Is this simple case, X   X, and any element of X  can be represented by a real number x in the
form xx , x y  xm. Clearly f  pxq is undefined except on r0, 1s, so C   r0, 1s. Using elementary
calculus, we see that f  pxq  x2 for x P p0, 1q. q

Example 8.2.3. Let C  r1, 2s and f pxq  0. Find C , f pxq, C , and f .

Figure 8.2.2: The epigraphs rf, C s and rf  , C  s in Example 8.2.3. The linear functionals mx for m  1, 0, 1
are shown on rf, C s, and the linear functionals xm for x  1, 3{2, 2 are shown on rf  , C  s.

Solution: We construct C  and f  using the “support functional” approach suggested in the previous
example: construct the hyperplane, parallel to the subspace mx, that touches f at one or more points in
C and lies below the graph elsewhere. Then f  pmq is the vertical intercept of the hyperplane. If no such
hyperplane exists, then m R C  .
Any element of the dual space can be represented by a real number m in the form xx, x y  mx.
Figure 8.2.2 shows the convex sets rf, C s and rf  , C  s. Clearly,
#
m 0
f  px q  f  pmq 
m
2m m¥0

and C   R. Again, we use the support functional approach to find f  and C  : from Figure 8.2.2,
clearly f   f and C   C. q
8.2. Conjugate Convex Functionals U 119

For the next example, note that if 1


p
1
q  1, then
p  pq
q p1  pqp1  qq  1
p  q p p  1q q  ppq  1q

Example 8.2.4. Let C  E n and f pxq  α p1 }x}pp for 1   p   8 and α ¡ 0. Find C  and f .

Solution: Let x  pξ1 , . . . , ξn q P C and represent an element x P C   C as pη1, . . . , ηnq. Then


ņ 

f  px q  suppxx, bx y  f pxqq  sup ξi ηi  α|ξi |p .


1
xPC xPC i1 p

The supremum exists since the sum inside the supremum is finite. Differentiating with respect to each ξi
gives
ηi  α|ξi |p1 sgn ξi for i  1, . . . , n, (8.2.1)
so 

ņ ņ
f  px  q  ξi η i  α|ξi |p  α 1q |ξi|p.
1

i 1
p i1

Taking absolute values, exponentiating both sides of (8.2.1) by q, and using the identity p  q pp  1q gives
|ηi|q  αq |ξi|p; substituting this above gives

f  px q  α1q |ηi|q  1q α1q }x}q .
1
q
q i1

Q UIZ T HEOREM 16. Let C  Lpp0, 1q for 1   p   8 and


»1
f puq  puptqqp dt.
1
p 0

Find C  and f  .

Proof: We can represent an element y P C   Lq p0, 1q, where p1 1


q  1, as
»1
xy, uy  y ptquptq dt.
0

Then »1
f  py q  suppxy, uy  f puqq  sup pyptquptq  puptqqpq dt;
u CP u CP 0

that is, we want to find a stationary point of


»1 »1
J  f pu, u,
9 tq dt  pyptquptq  puptqqpq dt.
0 0
120 U 8. C ONVEX F UNCTIONALS

Then u must satisfy the Euler-Lagrange equation

fu pu, u,
9 tq  d
dt
9 tq  y  pup1
fu9 pu, u,  0,
so y  pup1. We check that y P Lq p0, 1q:
»1 »1 »1
pyptqq q
dt  p puptqq p  q dt 
q q p 1
pq puptqqp dt   8
0 0 0

where boundedness follows since u P Lp p0, 1q. q

Q UIZ T HEOREM 17 (first part of [1, Proposition 7.10.1, p. 196]). The conjugate set C  and the conjugate
functional f  are convex.

Proof: For any x1 , x2 P C  and any λ P p0, 1q,


f  pλx1 p1  λqx2 q  suppxx, λx1 p1  λqx2 y  f pxqq
xPC
 suppλpxx, x1 y  f pxqq p1  λqpxx, x2 y  f pxqqq
xPC
¤ λ suppxx, x1 y  f pxqq p1  λq suppxx, x2 y  f pxqq
xPC xPC
  
 λf px1 q p1  λqf px2 q.

Therefore, f  is convex. Since C  is the set of all x where f  px q is defined, C  is also convex. q

Proposition 8.2.5 (second part of [1, Proposition 7.10.1, p. 196]). rf  , C  s is a closed convex subset of R  X.

Proof: From Quiz Theorem 17, rf  , C  s is convex. To show that it is closed, let tpsi , xi qu be a convergent
sequence in rf  , C  s with psi , xi q Ñ ps, x q. We must show that ps, x q P rf  , C  s. For every i and every
x P C,
paq  
si ¥ f px q ¥ xx, xy  f pxq,
where (a) follows by definition of the epigraph and (b) follows by definition of f  . Taking the limit as
i Ñ 8 gives s ¥ xx, x y  f pxq ¥ f  px q for all x P C, so ps, x q P rf  , C  s. q
The following proposition confirms our intuition about recovering C and f from C  and f  , as we did
in the previous examples:

Proposition 8.2.6 ([1, Proposition 7.10.2, p. 197]). Let f be a convex functional on a convex set C in a normed
space X. If rf, C s is closed, then rf, C s  rrf, C s s.

Example 8.2.7. Let C  D̄2 (the closed unit disk in E 2) and let f pxq  f pu, vq  0. Find C , f , C , and
f  .
8.3. Conjugate Concave Functionals U 121

?
Solution: We actually found C   R2 and f  pa, bq  a2 b2 in Example 1.3.2. By Proposition 8.2.6,
C   C and f   f since rf, C s  D̄2 X R3 is closed. q

8.3 Conjugate Concave Functionals

Definition 8.3.1 (concave functional). A functional g defined on a convex set D of a linear vector space is
concave if g is convex.

Note that the set D is still convex, not concave. Furthermore, a functional is linear if and only if it is both
convex and concave.
Just as we defined the epigraph for convex functionals, we define the hypograph for concave func-
tionals:

Definition 8.3.2 (hypograph). The hypograph of a concave functional g defined on a convex set D in a
vector space X is the set
rg, Ds  tpr, xq P R  X : x P D, f pxq ¥ ru.

The hypograph is a convex set: the proof is analogous to that in Example 8.1.5.
We can also define conjugate functionals and sets for concave functionals, just as we did for convex
functionals:

Definition 8.3.3 (conjugate concave). Let g be a concave functional defined on a convex set D in a normed
space X. The conjugate set D is defined as
" *
C   x P X  : inf pxx, x y  f pxqq ¡ 8
xPD

and the functional g  conjugate to g is defined on D as

g  px q  suppxx, x y  g pxqq.


P
x D

Example 8.3.4. Let D  r0, 1s and gpxq  x2. Find D and g.
Solution: Any element of the dual space can be represented by a real number m in the form xx, x y  mx.
Figure 8.3.1 shows the convex sets rg, Ds and rg  , D s. From the graph and using elementary calculus, we
see that $
'
'
&0 m¥0
g  px q  g  pmq   m4 2   m   0 .
2

'
'
%m 1 m ¤ 2
and C   R. From Figure 8.3.1, clearly g  g and D  D. Note that gpmq  f pmq from Example
8.2.2. q
122 U 8. C ONVEX F UNCTIONALS

Figure 8.3.1: The hypographs rg, Ds and rg  , D s in Example 8.3.4. The linear functionals mx for m 
0, 1, 2 are shown on rg, Ds, and the linear functionals xm for x  0, 1{2, 1 are shown on rg  , D s.

Example 8.3.5. Let D  r1, 2s and gpxq  0. Find D and g.

Figure 8.3.2: The hypographs rg, Ds and rg  , D s in Example 8.3.5. The linear functionals mx for m 
1, 0, 1 are shown on rg, Ds, and the linear functionals xm for x  1, 3{2, 2 are shown on rg, Ds.

Solution: Any element of the dual space can be represented by a real number m in the form xx, x y  mx.
Figure 8.3.2 shows the convex sets rg, Ds and rg  , D s. From the graph,
#
2m m   0
g  px q  g  pmq 
m m¥0

and D  R. From Figure 8.3.2, clearly g  g and D  D. Note that gpmq  f pmq from Example
8.2.3. q
8.4. Fenchel Duality U 123

Example 8.3.6. Let D  p0, 8sn equipped with the Euclidean norm and

g px q  α
1
ξp
p i1 i

for x  tξ1 , . . . , ξn u, p   1 and α ¡ 0. Find C  and f  .

Solution: The proof is nearly the same as in Example 8.2.4. If we represent an element x P D  D as
tη1, . . . , ηnu, we have ņ
 
g px q  α 1q 1
ηiq . q
q i1

8.4 Fenchel Duality

Suppose that f is convex over convex domain C and g is concave over convex domain D. A standard
minimization problem is to find the minimum vertical distance

µ inf pf pxq  gpxqq.


P X
x C D

While g pxq usually equals zero, introducing g is helpful because it allows us to exploit duality: the mini-
mum vertical separation between the convex sets rf, C s and rg, Ds equals the maximum vertical separa-
tion between two parallel hyperplanes separating rf, C s and rg, Ds, as shown in Figure 8.4.1.

Figure 8.4.1: The minimum vertical separation between the convex sets rf, C s and rg, Ds equals the maxi-
mum vertical separation between two parallel hyperplanes separating rf, C s and rg, Ds.

Q UIZ T HEOREM 18 (Fenchel Duality Theorem ([1, Theorem 7.12.1, p. 201])). Suppose that f is convex over
C and g is concave over D, where C, D are convex sets in a normed space X. Assume that C X D contains points
124 U 8. C ONVEX F UNCTIONALS

in the relative interiors of C and D and that either rf, C s or rg, Ds has nonempty interior. Further assume that

µ inf pf pxq  gpxqq   8.


P X
x C D

Then
µ   max
x PC  XD
pgpxq  f pxqq,
where the maximum on the right is achieved by some x0 P C  X D .
If the infimum on the left is achieved by some x0 P C X D, then

maxpxx, x0 y  f pxqq  xx0 , x0 y  f px0 q


x C P
minpxx, x0 y  g pxqq  xx0 , x0 y  g px0 q.
x D P

Proof: By definition of f  and g  , for all x P C X D and x P C  X D ,


f  px q ¥ xx, x y  f px q
g  px q ¤ xx, x y  g px q,

so f pxq  g pxq ¥ g  px q  f  px q. Since this holds for all x P C X D and x P C  X D ,
µ inf pf pxq  gpxqq ¥ sup pg  px q  f  px qq.
P X
x C D x PC  XD


To prove equality, we must find an x0 P C  X D such that µ  gpx0 q  f px0 q.


By definition of µ, the sets rf  µ, C s and rg, Ds are arbitrarily close but have disjoint relative interiors.
Since one of the sets rf  µ, C s and rg, Ds has nonempty relative interior, there is a closed hyperplane in
R  X separating the sets. The hyperplane cannot be vertical, otherwise its vertical projection onto X
would separate C and D. Therefore, we can represent the hyperplane as

tpr, xq P R  X : xx, x0 y  r  cu


for some x0 P X  and some c P R. Now rg, Ds lies below this hyperplane but is arbitrarily close to it, so
c  inf pxx, x0 y  g pxqq  g  px0 q (8.4.1)
P
x D

and similarly, rf  µ, C s lies below the hyperplane but is also arbitrarily close to it, so
c  suppxx, x0 y  pf pxq  µqq  f  px0 q µ, (8.4.2)
x C P
so µ  g  px0 q  f  px0 q. If the infimum µ is achieved by some x0 P C X D, then the sets rf  µ, C s and
rg, Ds have the point pgpx0q, x0q in common and this point is in the separating hyperplane, so by (8.4.1)
8.4. Fenchel Duality U 125

and (8.4.2),

maxpxx, x0 y  f pxqq  xx0 , x0 y  f px0 q


P
x C
minpxx, x0 y  g pxqq  xx0 , x0 y  g px0 q. q
P
x D

The general strategy is to use the sets C and D to impose constraints.

• If we want to minimize a convex functional f , we choose g pxq  0. Then we want to minimize f  g,


which is equivalent to maximizing g   f  . Since we chose g pxq  0,

µ inf
P X
x C D
f pxq   max
x PC  XD
pgpxq  f pxqq
is the desired minimum of f .

• If we want to maximize a concave functional g, we choose f pxq  0. Then we want to minimize


f  g, which is equivalent to maximizing g   f  . Since we chose f pxq  0,

µ  sup g pxq   min


P  X  pf pxq  gpxqq
xPC XD x C D

is the desired maximum of g.

Example 8.4.1 (generalization [1, Example 7.12.1, p. 202]). Suppose you have w to spend on n activities,
each of which costs wi per unit of work for i  1, . . . , n. Assume that the return associated with activity i
is gi pxi q, xi is the number of units of work allocated to the activity, and gi is increasing and gi1 is decreasing
due to diminishing marginal returns (so g is concave).

Solution: We write the problem formally as



max g px q  g i px i q (returns)
x1 ,...,xn

i 1

s.t. wi x i w (budget constraint)

i 1
x1 , . . . , x n ¥0 (positivity constraints)

where x  px1 , . . . , xn q. To solve the problem using Fenchel duality, set


# +

C  x: wi x i w

i 1

to capture the budget constraint, and set D  r0, 8qn to capture the positivity constraints. Set f pxq 0
since we are maximizing a concave functional g. Then for x  px1 , . . . , xn q,

f  px q  sup xi xi ,
P 
x Ci 1
126 U 8. C ONVEX F UNCTIONALS

which is finite only if x  λpw1, . . . , wnq, in which case f pxq  λw. Therefore, C   tλpw1, . . . , wnqu.
Furthermore,
gi pxi q  inf pxixi  gipxiqq
xi 0,Pr 8q
is defined for all xi P r0, 8q since gipxiq is concave, so

ņ ņ
g  px q  inf x i x   g p xq  gi pxi q;
xPD
i

i 1 
i 1

is defined for all x P D  D. Therefore, the dual problem is




min
x C  D 
P X
pf pxq  gpxqq  min λw  gi pλwi q .
λ
i 1
Given the optimal λ, xi is the value for which the infimum in (8.4) is achieved, with xi  λwi, for i 
1, . . . , n. q

Example 8.4.2. As a concrete application of Example 8.4.1, say you have w  5 to spend on two presents:

? ?
x1 on the first and x2 on the second. Maximize the total gratitude you generate, where total gratitude is
given by g pxq  g1 px1 q g2 px2 q, with g1 px1 q  2 x1 and g2 px2 q  x2 .

Solution: By elementary calculus, we have


d

2 
2
g  px  q   ?
px x  2 x q  1 x   2 1
  x1
x x1
inf
1 1
x1 0,Pr 8q 1 1 1
1
1
1
d

2 
2
g2 px2 q  p ?
x2 x2  x2 q 
1
x2 
1
  4x1 .
2x2 2x2
inf
x Pr0,8q
1 2

where the infima are achieved at x1  p1{x1 q2 and x2  p1{2x2 q2. The dual problem is


1 1
min λw ;
λ λ 4λ

taking the derivative with respect to λ, we have


c
x0   0 ùñ λ   12 .
5 1 1 5
4 λ2 2 w
? ?
Then x1  x2  1{2, so x1  4 and x2  1. The maximum gratitude is gp4, 1q  2 4 1  5. q

Example 8.4.3. As another concrete application of Example 8.4.1, say you have $w million to invest in oil
exploration and wind power. The government gives you a 50% subsidy for your wind power investment,
2 x2  w. The total return on your investment is given by g pxq 
1
so your budget constraint is x1
8.4. Fenchel Duality U 127

g 1 px 1 q g2 px2 q, where
# #
0 ¤ x1 ¤3 0 ¤ x2 ¤2.
g1 px1 q  g 2 px 2 q 
4x1 3x2
9 x1 x1 ¡3 4 x2 x2 ¥2

Figure 8.4.2: The hypographs rg1 , Ds and rg2 , Ds in Example 8.4.3.

Solution: From the graphs in 8.4.2, we see that


# #
3x1  12 1 ¤ x1 ¤4 2x2  6 1 ¤ x2 ¤3,
g1 px1 q  g2 px2 q 
0 x1 ¡4 0 x2 ¡3
where the infima are achieved at x1  3 if 1 ¤ x1 ¤ 4 and x1  0 if x1 ¡ 4, and x2  2 if 1 ¤ x2 ¤ 3 and
x2  0 if x2 ¡ 3. Then D  r1, 8q2 , and C  X D  tλp1, 1{2q : λ ¥ 2u. The dual problem is
 

µ  min λa  g1 pλq  g2


1
λ .
λ¥2 2

The table below shows values of the optimal x1 and x2 for various values of a, which we can determine
using the graphs in Figure 8.4.3.
a λ x1 x2 x1 x2 µ
1{2 6 6 3 0 1 3
2 4 4 2 1 2 10
5 2 2 1 3 4 20
q

Example 8.4.4 ([1, Example 7.12.2, p. 203]). As yet another concrete application of Example 8.4.1, say you
have $w to bet on a horse race with n horses. Assume that we know that the public has bet si on horse i
128 U 8. C ONVEX F UNCTIONALS


Figure 8.4.3: The function λa  g1 pλq  g2 1
2λ in Example 8.4.3 for a  1{2, 2, 5.

and that horse i has probability pi of winning for i  1, . . . , n. We bet xi on horse i, and we must bet all of
our money on the n horses. Let P denote the amount the track pays out to bettors on the winning horse
(typically 80% of the total amount bet). Maximize your expected fraction of the total amount P paid out.

Proof: Our expected net return is




 
 ņ   ņ 
 w.
xi
C
 w si 
 pi 
i1
 loooomoooon i1 xi s i
loomoon
amount won if horse i wins
total amount bet

Maximizing this quantity is equivalent to maximizing



g px q  g i px i q , gi pxi q 
pi xi
.

i 1
xi s i

Remember that the value of the conjugate functional gi at xi is obtained by finding the lowest line of
slope xi which lies above gi . Therefore, gi pxi q is undefined for xi ¤ 0, and since gi1 p0q  pi {si , gi pxi q  0
for xi ¥ pi {si . By elementary calculus, we have for 0   λ   pi {si
  c
xi xi   x   0 ùñ xi   si .
d pi xi pi si pi si
p x i s i q2 xi
i (8.4.3)
dxi xi s i

Then #


 0   xi   ps
px 2
i i

gi pxi q  inf xi xi   ps xq


i
pi xi 2
i i i
, (8.4.4)
xi Pr0,8q xi si 0 xi ¥ psi
i

where xi is determined from (8.4.3).


Now we know from Example 8.4.1 that xi  λ for i  1, . . . , n. Rearrange the indices so that p1 {s1 ¥
   ¥ pn{sn. For the given λ, define m as the largest index for which pi{si ¥ λ. From (8.4.3) and (8.4.4),
our solution is $b
& pi si  s i  1, . . . , m
xi  λ i
. q
%0 i  m 1, . . . , n
8.4. Fenchel Duality U 129

Example 8.4.5 ([1, Example 7.12.3, p. 205]). Find u P L2 r0, 1s minimizing


»1
f puq  puptqq2 dt
1
2 0

subject to the linear constraints



py 1 , u q  
c1


yu      .   c.
 ..   .. 
.
py n , u q cn

Solution: Let C  L2r0, 1s, D  tu : yu  cu, and gptq  0. To find f puq, we want to maximize
»1
»1
J  pu, uq  f puq  uptqu ptq 
1
2
ppuptqq2 dt  9 tq dt,
v pu, u,
0 0

so u must satisfy the Euler-Lagrange equation

9 tq 
vu pu, u,
d
dt
9 tq  u  u  0.
vu9 pu, u,

Therefore, »1
f  pu q  supppu, u q  f puqq  puptqq2 dt
1
uPC 2 0

and C   L2r0, 1s. Furthermore,


g  pu q  inf pu, u q ;
uPD

that is, we want to minimize pu, u q subject to the constraints py1 , uq  c1 , . . . , pyn , uq  cn . Equivalently,
we want to find a stationary point of
»1 ņ
»1
J  uptqu ptq λi yi ptquptq dt  9 tq dt,
v pu, u,
0 
i 1 0

so v must satisfy the Euler-Lagrange equation



vu pu, u,
9 tq  d
dt
9 tq  u
vu9 pu, u, λi yi  0.

i 1

Therefore, g  pu q is defined only if u  °ni1 λiyi  yT λ for λT  pλ1, . . . , λnq, in which case

ņ ņ
g  pu q  inf u, λ i yi  uinf λi ci  pλ, cq  cT λ
uPD PD
i1 
i 1

and D  tu : u  yλu. By Quiz Theorem 18, the dual maximization problem is then


max pg  pu q  f  pu qq  max cλT  p y λ q py λ q ,


1 T T T
u PC XD λ 2
130 U 8. C ONVEX F UNCTIONALS

where by elementary matrix calculus, we achieve the maximum at λ  pyyT q1c. Then the solution to
the original minimization problem is u  yT λ. q
B IBLIOGRAPHY

[1] D. G. Luenberger, Optimization by Vector Space Methods. Wiley Interscience, 1969 [1997].

[2] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill Science/Engineering/Math, third ed.,


1976.

[3] M. J. D. Powell, Approximation Theory and Methods. Cambridge University Press, 1981.

[4] M. A. Khamsi, “Gibbs phenomenon,” S.O.S. MATHematics, 2008.

[5] S. Axler, Linear Algebra Done Right. Springer, 2nd ed., 2004.

[6] J. Munkres, Topology. Prentice Hall, 2nd ed., 2000.

[7] G. Bachmann and L. Narici, Functional Analysis. Dover Publications, 2nd ed., 1998.

[8] B. V. Brunt, Calculus of Variations. Springer, 1st ed., 2003.

131

You might also like