Probability Theory
An Analytic View, Second Edition
This second edition of Daniel W. Stroock's text is suitable for first-year graduate students with a good grasp of introductory undergraduate probability. It provides a reasonably thorough introduction to modern probability theory with an emphasis on the mutually beneficial relationship between probability theory and analysis. It includes more than 750 exercises and offers new material on Lévy processes, large deviations theory, Gaussian measures on a Banach space, and the relationship between Wiener measure and partial differential equations.
The first part of the book deals with independent random variables, Central Limit
phenomena, the general theory of weak convergence and several of its applications, as
well as elements of both the Gaussian and Markovian theories of measures on function
space. The introduction of conditional expectation values is postponed until the
second part of the book, where it is applied to the study of martingales. This part also
explores the connection between martingales and various aspects of classical analysis
and the connections between a Wiener measure and classical potential theory.
Daniel W. Stroock
Massachusetts Institute of Technology
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo, Mexico City
© Daniel W. Stroock 1994, 2011
A catalog record for this publication is available from the British Library.
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not guarantee
that any content on such Web sites is, or will remain, accurate or appropriate.
This book is dedicated to my teachers:
Contents
Preface   xiii
2.4 An Application to Hermite Multipliers   96
2.4.1. Hermite Multipliers   96
2.4.2. Beckner's Theorem   101
2.4.3. Applications of Beckner's Theorem   105
Exercises for § 2.4   110
Notation   517
Index   521
Preface
tell the reader enough so that he could understand the ideas and not so much
that he would become bored by them. In addition, they gave me an introduction
to a host of ideas and techniques (e.g., stopping times and the strong Markov
property), all of which Kac himself consigned to the category of overelaborated
measure theory. In fact, it would be reasonable to say that my thesis was simply
the application of techniques which I picked up from Dynkin to a problem that
I picked up by reading some notes by Kac. Of course, along the way I profited
immeasurably from continued contact with McKean, a large number of courses
at N.Y.U. (particularly ones taught by M. Donsker, F. John, and L. Nirenberg),
and my increasingly animated conversations with S.R.S. Varadhan.
As I trust the preceding description makes clear, my graduate education was
anything but deprived; I had ready access to some of the very best analysts
of the day. On the other hand, I never had a proper introduction to my field,
probability theory. The first time that I ever summed independent random
variables was when I was summing them in front of a class at N.Y.U. Thus,
although I now admire the magnificent body of mathematics created by A.N.
Kolmogorov, P. Lévy, and the other twentieth-century heroes of the field, I
am not a dyed-in-the-wool probabilist (i.e., what Donsker would have called a
true coin-tosser). In particular, I have never been able to develop sufficient
sensitivity to the distinction between a proof and a probabilistic proof. To me,
a proof is clearly probabilistic only if its punch-line comes down to an argument
like P (A) ≤ P (B) because A ⊆ B; and there are breathtaking examples of such
arguments. However, to base an entire book on these examples would require a
level of genius that I do not possess. In fact, I myself enjoy probability theory
best when it is inextricably interwoven with other branches of mathematics and
not when it is presented as an entity unto itself. For this reason, the reader should not be surprised if he finds that some of the material presented in this book does not belong here; but I hope that he will make an effort to figure out why I disagree with him.
Summary
In spite of the realistic assessment contained in the first paragraph of its preface, when I wrote the first edition of this book I harbored the naïve hope that it might become the standard graduate text in probability theory. By the time that I started preparing the second edition, I was significantly older and far less naïve about its prospects. Although the first edition has its admirers, it has
done little to dent the sales record of its competitors. In particular, the first
edition has seldom been adopted as the text for courses in probability, and I
doubt that the second will be either. Nonetheless, I close this preface with a few
suggestions for anyone who does choose to base a course on it.
I am well aware that, except for those who find their way into the poorly
stocked library of some prison camp, few copies of this book will be read from
cover to cover. For this reason, I have attempted to organize it in such a way that,
with the help of the table of dependence that follows, a reader can select a path
which does not require his reading all the sections preceding the information he
is seeking. For example, the contents of §§ 1.1–1.2, § 1.4, § 2.1, § 2.3, and §§ 5.1–5.2 constitute the backbone of a one-semester, graduate-level introduction to
probability theory. What one attaches to this backbone depends on the speed
with which these sections are covered and the content of the courses for which
the course is the introduction. If the goal is to prepare the students for a career
as a “quant” in what is left of the financial industry, an obvious choice is § 4.3
and as much of Chapter 7 as time permits, thereby giving one’s students a
reasonably solid introduction to Brownian motion. On the other hand, if one
wants the students to appreciate that white noise is not the only noise that they
may encounter in life, one might defer the discussion of Brownian motion and
replace it with the material in Chapter 3 and §§ 4.1–4.2.
Alternatively, one might use this book in a more advanced course. An intro-
duction to stochastic processes with an emphasis on their relationship to partial
differential equations can be constructed out of Chapters 6, 7, 10, and 11, and
§ 4.3 combined with Chapter 8 could be used to provide background for a course
on Gaussian processes.
Whatever route one takes through this book, it will be a great help to your
students for you to suggest that they consult other texts. Indeed, it is a familiar
fact that the third book one reads on a subject is always the most lucid, and so
one should suggest at least two other books. Among the many excellent choices
available, I mention Wm. Feller’s An Introduction to Probability Theory and Its
Applications, Vol. II, and M. Loève's classic Probability Theory. In addition, for
background, precision (including accuracy of attribution), and supplementary
material, R. Dudley’s Real Analysis and Probability is superb.
Table of Dependence

[A chart showing how the blocks of sections depend on one another; the blocks involved are §§ 7.1–7.3, §§ 7.4–7.5, §§ 8.1–8.5, §§ 9.1–9.2, § 9.3, §§ 10.1–10.3, and §§ 11.1–11.4.]
Chapter 1
Sums of Independent Random Variables
In one way or another, most probabilistic analysis entails the study of large
families of random variables. The key to such analysis is an understanding
of the relations among the family members; and of all the possible ways in
which members of a family can be related, by far the simplest is when there
is no relationship at all! For this reason, I will begin by looking at families of
independent random variables.
§ 1.1 Independence
In this section I will introduce Kolmogorov’s way of describing independence
and prove a few of its consequences.
§ 1.1.1. Independent σ-Algebras. Let (Ω, F, P) be a probability space (i.e., Ω is a nonempty set, F is a σ-algebra over Ω, and P is a non-negative measure on the measurable space (Ω, F) having total mass 1), and, for each i from the (non-empty) index set I, let F_i be a sub-σ-algebra of F. I will say that the σ-algebras F_i, i ∈ I, are mutually P-independent, or, less precisely, P-independent, if, for every finite subset {i_1, . . . , i_n} of distinct elements of I and every choice of A_{i_m} ∈ F_{i_m}, 1 ≤ m ≤ n,

P(A_{i_1} ∩ · · · ∩ A_{i_n}) = P(A_{i_1}) · · · P(A_{i_n}).
independent σ-algebras tend to fill up space in a sense made precise by the following beautiful thought experiment designed by A.N. Kolmogorov. Let I be any index set, take F_∅ = {∅, Ω}, and, for each non-empty subset Λ ⊆ I, let

F_Λ = ⋁_{i∈Λ} F_i ≡ σ( ⋃_{i∈Λ} F_i )

be the σ-algebra generated by ⋃_{i∈Λ} F_i (i.e., F_Λ is the smallest σ-algebra containing ⋃_{i∈Λ} F_i). Next, define the tail σ-algebra T to be the intersection over all finite Λ ⊆ I of the σ-algebras F_{Λ∁}. When I itself is finite, T = {∅, Ω} and is therefore P-trivial in the sense that P(A) ∈ {0, 1} for every A ∈ T. The interesting remark made by Kolmogorov is that even when I is infinite, T is P-trivial whenever the original F_i's are P-independent. To see this, for a given non-empty Λ ⊆ I, let C_Λ denote the collection of sets of the form A_{i_1} ∩ · · · ∩ A_{i_n}, where {i_1, . . . , i_n} are distinct elements of Λ and A_{i_m} ∈ F_{i_m} for each 1 ≤ m ≤ n. Clearly C_Λ is closed under intersection and F_Λ = σ(C_Λ). In addition, by assumption, P(A ∩ B) = P(A)P(B) for all A ∈ C_Λ and B ∈ C_{Λ∁}. Hence, by Exercise 1.1.12, F_Λ is independent of F_{Λ∁}. But this means that T is independent of F_F for every finite F ⊆ I, and therefore, again by Exercise 1.1.12, T is independent of

F_I = σ( ⋃ {F_F : F a finite subset of I} ).

Since T ⊆ F_I, it follows that T is independent of itself; that is, P(A) = P(A)² for every A ∈ T, and so T is P-trivial. (See part (iii) of Exercise 5.2.40 and Lemma 11.4.14 for generalizations.)
Proof: The first assertion, which is due to E. Borel, is an easy application of countable additivity. Namely, by countable additivity,

P( lim sup_{n→∞} A_n ) = lim_{m→∞} P( ⋃_{n≥m} A_n ) ≤ lim_{m→∞} Σ_{n≥m} P(A_n) = 0

if Σ_{n=1}^∞ P(A_n) < ∞.
To complete the proof of (1.1.5) when the A_n's are independent, note that, by countable additivity, P( lim sup_{n→∞} A_n ) = 1 if and only if

lim_{m→∞} P( ⋂_{n≥m} A_n∁ ) = P( ⋃_{m=1}^∞ ⋂_{n≥m} A_n∁ ) = P( lim inf_{n→∞} A_n∁ ) = 0.

But, because the A_n's are independent, for each m ∈ Z⁺,

P( ⋂_{n=m}^∞ A_n∁ ) = lim_{N→∞} ∏_{n=m}^N (1 − P(A_n)) ≤ lim_{N→∞} exp( −Σ_{n=m}^N P(A_n) ) = 0

if Σ_{n=1}^∞ P(A_n) = ∞. (In the preceding, I have used the trivial inequality 1 − t ≤ e^{−t}, t ∈ [0, ∞).)
A second, and perhaps more transparent, way of dealing with the contents of the preceding is to introduce the non-negative random variable

N(ω) ≡ Σ_{n=1}^∞ 1_{A_n}(ω) ∈ Z⁺ ∪ {+∞},

which counts the number of n ∈ Z⁺ for which ω ∈ A_n.
are P-independent. If B(E; ℝ) = B((E, B); ℝ) denotes the space of bounded measurable ℝ-valued functions on the measurable space (E, B), then it should be clear that P-independence of {X_i : i ∈ I} is equivalent to the statement that

E^P[ f_{i_1} ∘ X_{i_1} · · · f_{i_n} ∘ X_{i_n} ] = E^P[ f_{i_1} ∘ X_{i_1} ] · · · E^P[ f_{i_n} ∘ X_{i_n} ]

for every finite subset {i_1, . . . , i_n} of distinct elements of I and all bounded measurable choices of the f's. Finally, if

1_A(ω) ≡ 1 if ω ∈ A and 0 if ω ∉ A

denotes the indicator function of the set A ⊆ Ω, notice that the family of sets {A_i : i ∈ I} ⊆ F is P-independent if and only if the random variables 1_{A_i}, i ∈ I, are P-independent.
Thus far I have discussed only the abstract notion of independence and have yet to show that the concept is not vacuous. In the modern literature, the standard way to construct lots of independent quantities is to take products of probability spaces. Namely, if (E_i, B_i, µ_i) is a probability space for each i ∈ I, one sets Ω = ∏_{i∈I} E_i; defines π_i : Ω → E_i to be the natural projection map for each i ∈ I; takes F_i = π_i^{−1}(B_i), i ∈ I, and F = ⋁_{i∈I} F_i; and shows that there is a unique probability measure P on (Ω, F) with the properties that

P( π_i^{−1} Γ_i ) = µ_i(Γ_i) for all i ∈ I and Γ_i ∈ B_i

and the σ-algebras F_i, i ∈ I, are P-independent.

¹ Throughout this book, I use E^P[X, A] to denote the expected value under P of X over the set A. That is, E^P[X, A] = ∫_A X dP. Finally, when A = Ω, I will write E^P[X]. Tonelli's Theorem is the version of Fubini's Theorem for non-negative functions. Its virtue is that it applies whether or not the integrand is integrable.
R(t) = −1 if t − ⌊t⌋ ∈ [0, ½) and R(t) = 1 if t − ⌊t⌋ ∈ [½, 1).

I will now show that the Rademacher functions are P-independent. To this end, first note that every real-valued function f on {−1, 1} is of the form α + βx, x ∈ {−1, 1}, for some pair of real numbers α and β. Thus, all that I have to show is that

E^P[ (α_1 + β_1 R_1) · · · (α_n + β_n R_n) ] = α_1 · · · α_n

for any n ∈ Z⁺ and (α_1, β_1), . . . , (α_n, β_n) ∈ ℝ². Since this is obvious when n = 1, I will assume that it holds for n and need only check that it must also hold for n + 1, and clearly this comes down to checking that

E^P[ F(R_1, . . . , R_n) R_{n+1} ] = 0

for any F : {−1, 1}ⁿ → ℝ. But F(R_1, . . . , R_n) is constant on each of the dyadic intervals I_{m,n} ≡ [m2^{−n}, (m + 1)2^{−n}), 0 ≤ m < 2ⁿ, whereas R_{n+1} integrates to 0 on each I_{m,n}. Hence, by writing the integral over Ω as the sum of integrals over the I_{m,n}'s, we get the desired result.
At this point I have produced a countably infinite sequence of independent
Bernoulli random variables (i.e., two-valued random variables whose range is
usually either {−1, 1} or {0, 1}) with mean value 0. In order to get more general
P(U ≤ t) = (t − a)/(b − a) for t ∈ [a, b].
ε_n(ω) ≡ (1 + R_n(ω))/2, n ∈ Z⁺ and ω ∈ [0, 1),

on ([0, 1), B_{[0,1)}, λ_{[0,1)}). But, as is easily checked (cf. part (i) of Exercise 1.1.11), for each ω ∈ [0, 1), ω = Σ_{n=1}^∞ 2^{−n} ε_n(ω). Hence, the desired conclusion is trivial in this case.
Now let (k, ℓ) ∈ Z⁺ × Z⁺ ↦ n(k, ℓ) ∈ Z⁺ be any one-to-one mapping of Z⁺ × Z⁺ onto Z⁺, and set

Y_{k,ℓ} = (1 + R_{n(k,ℓ)})/2, (k, ℓ) ∈ (Z⁺)².

Clearly, each Y_{k,ℓ} is a {0, 1}-valued Bernoulli random variable with mean value ½, and the family {Y_{k,ℓ} : (k, ℓ) ∈ (Z⁺)²} is P-independent. Hence, by Lemma 1.1.6, each of the random variables

U_k ≡ Σ_{ℓ=1}^∞ Y_{k,ℓ}/2^ℓ, k ∈ Z⁺,

is uniformly distributed on [0, 1), and the U_k's, k ∈ Z⁺, are P-independent.
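The construction above is easy to test numerically. The following Python sketch (mine, not the book's; the particular pairing function n(k, ℓ) is one hypothetical choice) realizes ω through its digits ε_n(ω) = (1 + R_n(ω))/2 and assembles two of the U_k's, whose empirical marginals come out uniform and whose correlation vanishes:

import numpy as np

# A numerical sketch (not from the book): realize omega in [0,1) through its
# Rademacher digits and assemble independent uniforms U_k via a pairing n(k,l).
rng = np.random.default_rng(0)
N, depth = 200_000, 24             # samples, digits used per U_k

def pair(k, l):
    # a one-to-one map of Z+ x Z+ into Z+ (Cantor pairing); any injection works
    return (k + l - 2) * (k + l - 1) // 2 + k

nbits = pair(2, depth) + 1         # enough digits of omega for U_1 and U_2
digits = rng.integers(0, 2, size=(N, nbits))   # eps_n(omega) = (1 + R_n)/2

def U(k):
    # U_k = sum_{l>=1} Y_{k,l} / 2^l with Y_{k,l} = eps_{n(k,l)}(omega)
    return sum(digits[:, pair(k, l) - 1] / 2.0**l for l in range(1, depth + 1))

U1, U2 = U(1), U(2)
print("means ~ 0.5:", U1.mean(), U2.mean())
print("corr ~ 0:", np.corrcoef(U1, U2)[0, 1])

Any injection n(k, ℓ) works equally well here; the Cantor pairing is used only because it is easy to write down.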
Exercises for § 1.1

Exercise 1.1.13. In this exercise I discuss two criteria for determining when random variables on the probability space (Ω, F, P) are independent.
(i) Let X_1, . . . , X_n be bounded, real-valued random variables. Using Weierstrass's Approximation Theorem, show that the X_m's are P-independent if and only if

E^P[ X_1^{m_1} · · · X_n^{m_n} ] = E^P[ X_1^{m_1} ] · · · E^P[ X_n^{m_n} ]

for all m_1, . . . , m_n ∈ ℕ.
(ii) Let X : Ω → ℝ^m and Y : Ω → ℝⁿ be random variables. Show that X and Y are P-independent if and only if

E^P[ exp( √−1 ( (α, X)_{ℝ^m} + (β, Y)_{ℝⁿ} ) ) ] = E^P[ exp( √−1 (α, X)_{ℝ^m} ) ] E^P[ exp( √−1 (β, Y)_{ℝⁿ} ) ]

for all α ∈ ℝ^m and β ∈ ℝⁿ.
Hint: First, show that it suffices to check that E^P[f(X) g(Y)] = E^P[f(X)] E^P[g(Y)] for all f ∈ C_c^∞(ℝ^m; ℂ) and g ∈ C_c^∞(ℝⁿ; ℂ). Second, given such f and g, apply Fourier inversion to write them as Fourier integrals with densities ϕ and ψ, where ϕ and ψ are smooth functions with rapidly decreasing (i.e., tending to 0 as |x| → ∞ faster than any power of (1 + |x|)^{−1}) derivatives of all orders. Finally, apply Fubini's Theorem.
Exercise 1.1.14. Given a pair of measurable spaces (E_1, B_1) and (E_2, B_2), recall that their product is the measurable space (E_1 × E_2, B_1 × B_2), where B_1 × B_2 is the σ-algebra over the Cartesian product space E_1 × E_2 generated by the sets Γ_1 × Γ_2, Γ_i ∈ B_i. Further, recall that, for any probability measures µ_i on (E_i, B_i), there is a unique probability measure µ_1 × µ_2 on (E_1 × E_2, B_1 × B_2) such that

(µ_1 × µ_2)(Γ_1 × Γ_2) = µ_1(Γ_1) µ_2(Γ_2) for Γ_i ∈ B_i.

More generally, for any n ≥ 2 and measurable spaces {(E_i, B_i) : 1 ≤ i ≤ n}, one takes ∏_1ⁿ B_i to be the σ-algebra over ∏_1ⁿ E_i generated by the sets ∏_1ⁿ Γ_i, Γ_i ∈ B_i. In particular, since ∏_1^{n+1} E_i and ∏_1^{n+1} B_i can be identified with (∏_1ⁿ E_i) × E_{n+1} and (∏_1ⁿ B_i) × B_{n+1}, respectively, one can use induction to show that, for every choice of probability measures µ_i on (E_i, B_i), there is a unique probability measure ∏_1ⁿ µ_i on (∏_1ⁿ E_i, ∏_1ⁿ B_i) such that

( ∏_1ⁿ µ_i )( ∏_1ⁿ Γ_i ) = ∏_1ⁿ µ_i(Γ_i), Γ_i ∈ B_i.
for every ∅ ≠ F ⊂⊂ I. Not surprisingly, the probability space

( ∏_{i∈I} E_i, ∏_{i∈I} B_i, ∏_{i∈I} µ_i )

is called the product over I of the spaces (E_i, B_i, µ_i); and when all the factors are the same space (E, B, µ), it is customary to denote it by (E^I, B^I, µ^I), and if, in addition, I = {1, . . . , N}, one uses (E^N, B^N, µ^N).
(i) After noting (cf. Exercise 1.1.12) that two probability measures that agree on a π-system agree on the σ-algebra generated by that π-system, show that there is at most one probability measure on (E_I, B_I) that satisfies the condition in (1.1.15). Hence, the problem is purely one of existence.
(ii) Let A be the algebra over E_I generated by C, and show that there is a finitely additive µ : A → [0, 1] with the property that

µ( π_F^{−1} Γ_F ) = ( ∏_{i∈F} µ_i )(Γ_F), Γ_F ∈ B_F,

for all ∅ ≠ F ⊂⊂ I. Hence, all that one has to do is check that µ admits a σ-additive extension to B_I, and, by a standard extension theorem, this comes down to checking that µ(A_n) ↘ 0 whenever {A_n : n ≥ 1} ⊆ A and A_n ↘ ∅. Thus, let {A_n : n ≥ 1} be a non-increasing sequence from A, and assume that µ(A_n) ≥ ε for some ε > 0 and all n ∈ Z⁺. One must show that ⋂_1^∞ A_n ≠ ∅.
(iii) Referring to the last part of (ii), show that there is no loss in generality to assume that A_n = π_{F_n}^{−1}(Γ_{F_n}), where, for each n ∈ Z⁺, ∅ ≠ F_n ⊂⊂ I and Γ_{F_n} ∈ B_{F_n}. In addition, show that one may assume that F_1 = {i_1} and that F_n = F_{n−1} ∪ {i_n}, n ≥ 2, where {i_n : n ≥ 1} is a sequence of distinct elements of I. Now, make these assumptions, and show that it suffices to find a_ℓ ∈ E_{i_ℓ}, ℓ ∈ Z⁺, with the property that, for each m ∈ Z⁺, (a_1, . . . , a_m) ∈ Γ_{F_m}.
(iv) Continuing (iii), for each m, n ∈ Z⁺, define g_{m,n} : E_{F_m} → [0, 1] so that

g_{m,n}(x_{F_m}) = 1_{Γ_{F_n}}(x_{i_1}, . . . , x_{i_n}) if n ≤ m

and

g_{m,n}(x_{F_m}) = ∫_{E_{F_n \ F_m}} 1_{Γ_{F_n}}(x_{F_m}, y_{F_n \ F_m}) ( ∏_{ℓ=m+1}ⁿ µ_{i_ℓ} )(dy_{F_n \ F_m}) if n > m.

= lim_{n→∞} µ(A_n) ≥ ε,

Finally, check that {a_m : m ≥ 1} is a sequence of the sort for which we were looking at the end of part (iii).
Given a non-empty index set I and, for each i ∈ I, a measurable space (E_i, B_i) and an E_i-valued random variable X_i on the probability space (Ω, F, P), define X : Ω → ∏_{i∈I} E_i so that X(ω)_i = X_i(ω) for each i ∈ I and ω ∈ Ω. Show that {X_i : i ∈ I} is a family of P-independent random variables if and only if X_*P = ∏_{i∈I} (X_i)_*P. In particular, given probability measures µ_i on (E_i, B_i), set

Ω = ∏_{i∈I} E_i, F = ∏_{i∈I} B_i, P = ∏_{i∈I} µ_i,

let X_i : Ω → E_i be the natural projection map from Ω onto E_i, and show that {X_i : i ∈ I} is a family of mutually P-independent random variables such that, for each i ∈ I, X_i has distribution µ_i.
Exercise 1.1.17. Although it does not entail infinite product spaces, an inter-
esting example of the way in which the preceding type of construction can be
effectively applied is provided by the following elementary version of a coupling
argument.
(i) Let (Ω, B, P) be a probability space and X and Y a pair of P-square integrable ℝ-valued random variables with the property that

(X(ω_1) − X(ω_2))(Y(ω_1) − Y(ω_2)) ≥ 0 for all (ω_1, ω_2) ∈ Ω².

Show that

E^P[XY] ≥ E^P[X] E^P[Y].

Hint: Define X_i and Y_i on Ω² for i ∈ {1, 2} so that X_i(ω) = X(ω_i) and Y_i(ω) = Y(ω_i) when ω = (ω_1, ω_2), and integrate the inequality

0 ≤ (X(ω_1) − X(ω_2))(Y(ω_1) − Y(ω_2)) = (X_1(ω) − X_2(ω))(Y_1(ω) − Y_2(ω))

with respect to P².
(ii) Suppose that n ∈ Z⁺ and that f and g are ℝ-valued, Borel measurable functions on ℝⁿ that are non-decreasing with respect to each coordinate (separately). Show that if X = (X_1, . . . , X_n) is an ℝⁿ-valued random variable on a probability space (Ω, B, P) whose coordinates are mutually P-independent, then

E^P[ f(X) g(X) ] ≥ E^P[ f(X) ] E^P[ g(X) ].

Hint: First check that the case when n = 1 reduces to an application of (i). Next, describe the general case in terms of a multiple integral, apply Fubini's Theorem, and make repeated use of the case when n = 1.
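For readers who like to see such inequalities in action, here is a small numerical sanity check of part (ii) (a Python sketch of mine, not part of the exercise; f and g below are sample coordinatewise non-decreasing choices):

import numpy as np

# Check (not from the book): for independent coordinates and coordinatewise
# non-decreasing f and g, E[f(X) g(X)] >= E[f(X)] E[g(X)], here with n = 3.
rng = np.random.default_rng(1)
X = rng.standard_normal((500_000, 3))    # independent coordinates

f = np.tanh(X).sum(axis=1)               # non-decreasing in each coordinate
g = np.maximum(X, 0.0).sum(axis=1)       # likewise non-decreasing

lhs = (f * g).mean()
rhs = f.mean() * g.mean()
print(lhs, ">=", rhs, lhs >= rhs)        # holds up to Monte Carlo error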
Exercise 1.1.18. A σ-algebra is said to be countably generated if it contains
a countable collection of sets that generate it. The purpose of this exercise is to
show that just because a σ-algebra is itself countably generated does not mean
that all its sub-σ-algebras are.
Let (Ω, F, P) be a probability space and {A_n : n ∈ Z⁺} ⊆ F a sequence of P-independent sets with the property that α ≤ P(A_n) ≤ 1 − α for some α ∈ (0, 1). Let F_n be the sub-σ-algebra generated by A_n. Show that the tail σ-algebra T determined by {F_n : n ∈ Z⁺} cannot be countably generated.
Hint: Show that C ∈ T is an atom in T (i.e., B = C whenever B ∈ T \ {∅} is contained in C) only if one can write

C = lim sup_{n→∞} C_n ≡ ⋂_{m=1}^∞ ⋃_{n≥m} C_n,

where, for each n, C_n ∈ {A_n, A_n∁}.
Note that, on the one hand, P(C) = 1, while, on the other hand, C is an atom
in T and therefore has probability 0.
Exercise 1.1.19. Here is an interesting application of Kolmogorov’s 0–1 Law
to a property of the real numbers.
(i) Referring to the discussion preceding Lemma 1.1.6 and part (i) of Exercise 1.1.11, define the transformations T_n : [0, 1) → [0, 1) for n ∈ Z⁺ so that

T_n(ω) = ω − R_n(ω)/2ⁿ, ω ∈ [0, 1),

and notice (cf. the proof of Lemma 1.1.6) that T_n(ω) simply flips the nth coefficient in the binary expansion of ω. Next, let Γ ∈ B_{[0,1)}, and show that Γ is measurable with respect to the σ-algebra σ({R_n : n > m}) generated by {R_n : n > m} if and only if T_n(Γ) = Γ for each 1 ≤ n ≤ m. In particular, conclude that λ_{[0,1)}(Γ) ∈ {0, 1} if T_n(Γ) = Γ for every n ∈ Z⁺.
(ii) Let F denote the set of all finite subsets of Z⁺, and for each F ∈ F, define T^F : [0, 1) → [0, 1) so that T^∅ is the identity mapping and

T^{F∪{m}} = T^F ∘ T_m for each F ∈ F and m ∈ Z⁺ \ F.

As an application of (i), show that for every Γ ∈ B_{[0,1)} with λ_{[0,1)}(Γ) > 0,

λ_{[0,1)}( ⋃_{F∈F} T^F(Γ) ) = 1.
In particular, this means that if Γ has positive measure, then almost every
ω ∈ [0, 1) can be moved to Γ by flipping a finite number of the coefficients in the
binary expansion of ω.
§ 1.2 The Weak Law of Large Numbers
Starting with this section, and for the rest of this chapter, I will be studying what
happens when one averages independent, real-valued random variables. The
remarkable fact, which will be confirmed repeatedly, is that the limiting behavior
of such averages depends hardly at all on the variables involved. Intuitively,
one can explain this phenomenon by pretending that the random variables are
building blocks that, in the averaging process, first get homothetically shrunk
and then reassembled according to a regular pattern. Hence, by the time that
one passes to the limit, the peculiarities of the original blocks get lost.
Throughout the discussion, (Ω, F, P) will be a probability space on which there is a sequence {X_n : n ≥ 1} of real-valued random variables. Given n ∈ Z⁺, use S_n to denote the partial sum X_1 + · · · + X_n and S̄_n to denote the average:

S̄_n ≡ S_n/n = (1/n) Σ_{ℓ=1}^n X_ℓ.
§ 1.2.1. Orthogonal Random Variables. My first result is a very general
one; in fact, it even applies to random variables that are not necessarily inde-
pendent and do not necessarily have mean 0.
Lemma 1.2.1. Assume that

E^P[X_n²] < ∞ for n ∈ Z⁺ and E^P[X_k X_ℓ] = 0 if k ≠ ℓ.

Then, for each ε > 0,

(1.2.2) ε² P(|S̄_n| ≥ ε) ≤ E^P[S̄_n²] = (1/n²) Σ_{ℓ=1}^n E^P[X_ℓ²] for n ∈ Z⁺.

In particular, if

M ≡ sup_{n∈Z⁺} E^P[X_n²] < ∞,

then

(1.2.3) ε² P(|S̄_n| ≥ ε) ≤ E^P[S̄_n²] ≤ M/n, n ∈ Z⁺ and ε > 0;

and so S̄_n → 0 in L²(P; ℝ) and therefore also in P-probability.
Proof: By orthogonality,

E^P[S̄_n²] = (1/n²) Σ_{k,ℓ=1}^n E^P[X_k X_ℓ] = (1/n²) Σ_{ℓ=1}^n E^P[X_ℓ²].

The rest is just an application of Chebyshev's inequality, the estimate that results after integrating the inequality

ε² 1_{[ε,∞)}(|Y|) ≤ Y² 1_{[ε,∞)}(|Y|) ≤ Y².
are still P-square integrable, have mean value 0, and therefore are orthogonal.
Hence, the following statement is an immediate consequence of Lemma 1.2.1.
Theorem 1.2.4. Let {X_n : n ∈ Z⁺} be a sequence of P-independent, P-square integrable random variables with mean value m and variance dominated by σ². Then, for every n ∈ Z⁺ and ε > 0,

(1.2.5) ε² P(|S̄_n − m| ≥ ε) ≤ E^P[|S̄_n − m|²] ≤ σ²/n.

In particular, S̄_n → m in L²(P; ℝ) and therefore in P-probability.
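A two-line simulation makes (1.2.5) concrete. The following Python sketch (mine, not the text's) averages uniform random variables, for which m = 1/2 and σ² = 1/12, and compares the empirical second moment of S̄_n − m with σ²/n:

import numpy as np

# Simulation sketch (not from the book) of (1.2.5): E[|S_bar_n - m|^2] <= sigma^2/n.
rng = np.random.default_rng(2)
m, sigma2 = 0.5, 1.0 / 12.0              # uniform on [0,1): mean 1/2, variance 1/12

for n in (10, 100, 1000):
    S_bar = rng.random((10_000, n)).mean(axis=1)
    lhs = ((S_bar - m) ** 2).mean()      # empirical E[|S_bar_n - m|^2]
    print(n, lhs, "<=", sigma2 / n)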
As yet I have made only minimal use of independence: all that I have done
is subtract off the mean of independent random variables and thereby made
them orthogonal. In order to bring the full force of independence into play, one
has to exploit the fact that one can compose independent random variables with
any (measurable) functions without destroying their independence; in particular,
truncating independent random variables does not destroy independence. To see
how such a property can be brought to bear, I will now consider the problem
of extending the last part of Theorem 1.2.4 to Xn ’s that are less than P-square
integrable. In order to understand the statement, recall that a family of random variables {X_i : i ∈ I} is said to be uniformly P-integrable if

lim_{R↗∞} sup_{i∈I} E^P[ |X_i|, |X_i| ≥ R ] = 0.
As the proof of the following theorem illustrates, the importance of this condition
is that it allows one to simultaneously approximate the random variables Xi , i ∈
I, by bounded random variables.
Theorem 1.2.6 (The Weak Law of Large Numbers). Let {X_n : n ∈ Z⁺} be a uniformly P-integrable sequence of P-independent random variables. Then

(1/n) Σ_{m=1}^n ( X_m − E^P[X_m] ) → 0 in L¹(P; ℝ)

and therefore also in P-probability. In particular, if {X_n : n ∈ Z⁺} is a sequence of P-independent, P-integrable random variables that are identically distributed, then S̄_n → E^P[X_1] in L¹(P; ℝ) and P-probability. (Cf. Exercise 1.2.11.)
Proof: Without loss in generality, I will assume that E^P[X_n] = 0 for every n ∈ Z⁺.
For each R ∈ (0, ∞), define f_R(t) = t 1_{[−R,R]}(t), t ∈ ℝ, set m_n^{(R)} = E^P[f_R ∘ X_n], X_n^{(R)} = f_R ∘ X_n − m_n^{(R)}, and Y_n^{(R)} = X_n − X_n^{(R)}, and set

S̄_n^{(R)} = (1/n) Σ_{ℓ=1}^n X_ℓ^{(R)} and T̄_n^{(R)} = (1/n) Σ_{ℓ=1}^n Y_ℓ^{(R)}.

Since E^P[X_n] = 0 implies m_n^{(R)} = −E^P[X_n, |X_n| > R],

E^P[|S̄_n|] ≤ E^P[|S̄_n^{(R)}|] + E^P[|T̄_n^{(R)}|]
≤ E^P[|S̄_n^{(R)}|²]^{1/2} + 2 max_{1≤ℓ≤n} E^P[ |X_ℓ|, |X_ℓ| ≥ R ]
≤ R/√n + 2 sup_{ℓ∈Z⁺} E^P[ |X_ℓ|, |X_ℓ| ≥ R ].

Hence, because the X_ℓ's are uniformly P-integrable, we get the desired convergence in L¹(P; ℝ) by letting R ↗ ∞.
§ 1.2.3. Approximate Identities. The name of Theorem 1.2.6 comes from
a somewhat invidious comparison with the result in Theorem 1.4.9. The reason
why the appellation weak is not entirely fair is that, although The Weak Law
is indeed less refined than the result in Theorem 1.4.9, it is every bit as useful
as the one in Theorem 1.4.9 and maybe even more important when it comes
to applications. What The Weak Law provides is a ubiquitous technique for
constructing an approximate identity (i.e., a sequence of measures that ap-
proximate a point mass) and measuring how fast the approximation is taking
place. To illustrate how clever selections of the random variables entering The Weak Law can lead to interesting applications, I will spend the rest of this section discussing S. Bernstein's approach to Weierstrass's Approximation Theorem.
For a given p ∈ [0, 1], let {X_n : n ∈ Z⁺} be a sequence of P-independent {0, 1}-valued Bernoulli random variables with mean value p. Then

P(S_n = ℓ) = \binom{n}{ℓ} p^ℓ (1 − p)^{n−ℓ} for 0 ≤ ℓ ≤ n.

Hence, for any f ∈ C([0, 1]; ℝ), the nth Bernstein polynomial

(1.2.7) B_n(p; f) ≡ Σ_{ℓ=0}^n \binom{n}{ℓ} f(ℓ/n) p^ℓ (1 − p)^{n−ℓ}

of f at p is equal to E^P[ f ∘ S̄_n ].
In particular,

|f(p) − B_n(p; f)| = |E^P[ f(p) − f ∘ S̄_n ]| ≤ E^P[ |f(p) − f ∘ S̄_n| ] ≤ 2‖f‖_u P(|S̄_n − p| ≥ ε) + ρ(ε; f),

where ‖f‖_u is the uniform norm of f (i.e., the supremum of |f| over the domain of f) and

ρ(ε; f) ≡ sup{ |f(t) − f(s)| : 0 ≤ s < t ≤ 1 with t − s ≤ ε }

is the modulus of continuity of f. Noting that Var(X_n) = p(1 − p) ≤ ¼ and applying (1.2.5), we conclude that, for every ε > 0,
most efficient one,1 as we are about to see, the Bernstein polynomials have a
lot to recommend them. In particular, they have the feature that they provide
non-negative polynomial approximates to non-negative functions. In fact, the
following discussion reveals much deeper non-negativity preservation properties
possessed by the Bernstein approximation scheme.
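Since (1.2.7) is completely explicit, it is easy to watch the uniform convergence numerically. In the Python sketch below (mine, not the book's), f is one sample choice of a continuous function on [0, 1]:

import numpy as np
from math import comb

# Numerical sketch (not from the book's text) of (1.2.7): the Bernstein
# polynomial B_n(.; f) equals E^P[f o S_bar_n] and converges uniformly to f.
def bernstein(f, n, ps):
    l = np.arange(n + 1)
    coeff = np.array([comb(n, k) for k in l], dtype=float)
    terms = coeff * f(l / n) * ps[:, None]**l * (1.0 - ps[:, None])**(n - l)
    return terms.sum(axis=1)

f = lambda t: np.abs(t - 0.5)            # a sample f in C([0,1]; R)
ps = np.linspace(0.0, 1.0, 201)
for n in (10, 100, 1000):
    print(n, np.max(np.abs(f(ps) - bernstein(f, n, ps))))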
In order to bring out the virtues of the Bernstein polynomials, it is important to replace (1.2.7) with an expression in which the coefficients of B_n(·; f) (as polynomials) are clearly displayed. To this end, introduce the difference operator ∆_h for h > 0 given by

∆_h f(t) = (f(t + h) − f(t))/h.

A straightforward inductive argument (using Pascal's Identity for the binomial coefficients) shows that

(−h)^m [∆_h^{(m)} f](t) = Σ_{ℓ=0}^m (−1)^ℓ \binom{m}{ℓ} f(t + ℓh) for m ∈ Z⁺,

where ∆_h^{(m)} denotes the mth iterate of the operator ∆_h. Taking h = 1/n, we now see that
B_n(p; f) = Σ_{ℓ=0}^n Σ_{k=0}^{n−ℓ} \binom{n}{ℓ} \binom{n−ℓ}{k} (−1)^k f(ℓh) p^{ℓ+k}
= Σ_{r=0}^n p^r Σ_{ℓ=0}^r \binom{n}{ℓ} \binom{n−ℓ}{r−ℓ} (−1)^{r−ℓ} f(ℓh)
= Σ_{r=0}^n \binom{n}{r} (−p)^r Σ_{ℓ=0}^r \binom{r}{ℓ} (−1)^ℓ f(ℓh)
= Σ_{r=0}^n \binom{n}{r} (ph)^r [∆_h^{(r)} f](0),
one can exploit the relationship between the Bernstein and Taylor polynomials. Say that a function ϕ ∈ C^∞((a, b); ℝ) is absolutely monotone if its mth derivative D^m ϕ is non-negative for every m ∈ ℕ. Also, say that ϕ ∈ C^∞([0, 1]; [0, 1]) is a probability generating function if there exists a {u_n : n ∈ ℕ} ⊆ [0, 1] such that

Σ_{n=0}^∞ u_n = 1 and ϕ(t) = Σ_{n=0}^∞ u_n tⁿ for t ∈ [0, 1].
Proof: The implication (i) ⟹ (ii) is trivial. To see that (ii) implies (iii), first observe that if ψ is absolutely monotone on (a, b) and h ∈ (0, b − a), then ∆_h ψ is absolutely monotone on (a, b − h). Indeed, because D ∘ ∆_h ψ = ∆_h ∘ Dψ on (a, b − h), we have that

h [D^m ∘ ∆_h ψ](t) = ∫_t^{t+h} D^{m+1} ψ(s) ds ≥ 0, t ∈ (a, b − h).

In particular,

[∆_h^{(m)} ϕ](0) = lim_{t↘0} [∆_h^{(m)} ϕ](t) ≥ 0 if mh < 1,

and so [∆_h^{(m)} ϕ](0) ≥ 0 when h = 1/n and 0 ≤ m < n. Moreover, we also know that [∆_h^{(n)} ϕ](0) ≥ 0 when h = 1/n, and this completes the proof that
Because the un,` ’s are all elements of [0, 1], one can use a diagonalization proce-
dure to choose {nk : k ∈ Z+ } so that
Exercises for § 1.2

Exercise 1.2.11. Although, for historical reasons, The Weak Law is usually thought of as a theorem about convergence in P-probability, the forms in which I have presented it are clearly results about convergence in either P-mean or even P-square mean. Thus, it is interesting to discover that one can replace the uniform integrability assumption made in Theorem 1.2.6 with a weak uniform integrability assumption if one is willing to settle for convergence in P-probability. Namely, let X_1, . . . , X_n, . . . be mutually P-independent random variables, assume that

F(R) ≡ sup_{n∈Z⁺} R P(|X_n| ≥ R) → 0 as R ↗ ∞,
² Wm. Feller, An Introduction to Probability Theory and Its Applications, Vol. II, Wiley Series in Probability and Math. Stat. (1968). Feller provides several other similar applications of The Weak Law, including the ones in the following exercises.
and set

m_n = (1/n) Σ_{ℓ=1}^n E^P[ X_ℓ, |X_ℓ| ≤ n ], n ∈ Z⁺.

Show that, for each ε > 0,

P(|S̄_n − m_n| ≥ ε) ≤ (1/(nε)²) Σ_{ℓ=1}^n E^P[ X_ℓ², |X_ℓ| ≤ n ] + P( max_{1≤ℓ≤n} |X_ℓ| > n )
≤ (2/(nε²)) ∫_0^n F(t) dt + F(n),

and conclude that S̄_n − m_n → 0 in P-probability. (See part (ii) of Exercises 1.4.26 and 1.4.27 for a partial converse to this statement.)
Hint: Use the formula

Var(Y) ≤ E^P[Y²] = 2 ∫_{[0,∞)} t P(|Y| > t) dt.
Exercise 1.2.12. Show that, for each T ∈ [0, ∞) and t ∈ (0, ∞),

lim_{n→∞} e^{−nt} Σ_{0≤k≤nT} (nt)^k/k! = 1 if T > t and 0 if T < t.
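As a check on this dichotomy, note that the quantity inside the limit is P(S_n ≤ nT) for S_n a sum of n independent Poisson(t) random variables, so the limit is exactly The Weak Law applied to S̄_n → t. A small Python computation (mine, not the book's) makes the convergence visible:

import math

# Numeric check (not from the book) of the limit in this exercise.
def lhs(n, t, T):
    lam = n * t
    term = math.exp(-lam)                # k = 0 term; fine for lam well below ~700
    total = term
    for k in range(1, int(n * T) + 1):
        term *= lam / k
        total += term
    return total

t = 1.0
for n in (10, 100, 500):
    print(n, lhs(n, t, T=1.5), lhs(n, t, T=0.5))
# first column -> 1 (T = 1.5 > t), second column -> 0 (T = 0.5 < t)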
§ 1.3 Cramér's Theory of Large Deviations

Obviously, (1.3.4) is more than sufficient to guarantee that the X_n's have moments of all orders. In fact, as an application of Lebesgue's Dominated Convergence Theorem, one sees that ξ ∈ ℝ ↦ M(ξ) ∈ (0, ∞) is infinitely differentiable and that

E^P[X_1ⁿ] = ∫_ℝ xⁿ µ(dx) = (dⁿM/dξⁿ)(0) for all n ∈ ℕ.
In the discussion that follows, I will use m and σ 2 to denote, respectively, the
common mean value and variance of the Xn ’s.
In order to develop some intuition for the considerations that follow, I will
first consider an example, which, for many purposes, is the canonical example in
probability theory. Namely, let g : ℝ → (0, ∞) be the Gauss kernel

(1.3.5) g(y) ≡ (1/√(2π)) exp(−|y|²/2), y ∈ ℝ,

and recall that a random variable X is standard normal if

P(X ∈ Γ) = ∫_Γ g(y) dy, Γ ∈ B_ℝ.
There are two obvious reasons for the honored position held by Gaussian random variables. In the first place, they certainly have finite moment generating functions. In fact, since

∫_ℝ e^{ξy} g(y) dy = exp(ξ²/2), ξ ∈ ℝ,

it is clear that

(1.3.6) M_{γ_{m,σ²}}(ξ) = exp( ξm + σ²ξ²/2 ).

Moreover, when the X_n's are standard normal,

P(S̄_n ∈ Γ) = √(n/(2π)) ∫_Γ exp(−n|y|²/2) dy.
(1.3.7) lim_{n→∞} (1/n) log P(S̄_n ∈ Γ) = −ess inf{ |y|²/2 : y ∈ Γ },

where the "ess" in (1.3.7) stands for essential and means that what follows is taken modulo a set of measure 0. (Hence, apart from a minus sign, the right-hand side of (1.3.7) is the greatest number dominated by |y|²/2 for Lebesgue-almost every y ∈ Γ.) In fact, because

∫_x^∞ g(y) dy ≤ x^{−1} g(x) for all x ∈ (0, ∞),
Of course, in general one cannot hope to know such explicit expressions for the
distribution of S̄_n. Nonetheless, on the basis of the preceding, one can start to
see what is going on. Namely, when the distribution µ falls off rapidly outside of
compacts, averaging n independent random variables with distribution µ has the
effect of building an exponentially deep well in which the mean value m lies at the
bottom. More precisely, if one believes that the Gaussian random variables are
normal in the sense that they are typical, then one should conjecture that, even when the random variables are not normal, the behavior of P(|S̄_n − m| ≥ ε) for large n's should resemble that of Gaussians with the same variance; and it is in the verification of this conjecture that the moment generating function M_µ plays a central role. Namely, although an expression in terms of µ for the distribution of S_n is seldom readily available, the moment generating function for S_n is easily expressed in terms of M_µ. To wit, as a trivial application of independence, we have

E^P[ e^{ξS_n} ] = M_µ(ξ)ⁿ, ξ ∈ ℝ,
where

(1.3.8) Λ_µ(ξ) ≡ log M_µ(ξ).

Notice that (1.3.9) is really very good. For instance, when the X_n's are N(m, σ²)-random variables and σ > 0, then (cf. (1.3.6)) the preceding leads quickly to the estimate

P(|S̄_n − m| ≥ ε) ≤ exp( −nε²/(2σ²) ),

which is essentially the upper bound at which we arrived before.
Taking a hint from the preceding, I now introduce the Legendre transform

(1.3.10) I_µ(x) ≡ sup{ ξx − Λ_µ(ξ) : ξ ∈ ℝ }, x ∈ ℝ,

and

I_µ(x) = sup{ ξx − Λ_µ(ξ) : ξ ≤ 0 } for x ∈ (−∞, m].

Finally, if

α = inf{ x ∈ ℝ : µ((−∞, x]) > 0 } and β = sup{ x ∈ ℝ : µ([x, ∞)) > 0 },

then I_µ is smooth on (α, β) and identically +∞ off of [α, β]. In fact, either µ({m}) = 1 and α = m = β or m ∈ (α, β), in which case Λ_µ′ is a smooth, strictly increasing mapping from ℝ onto (α, β),

I_µ(x) = Ξ_µ(x) x − Λ_µ(Ξ_µ(x)), x ∈ (α, β), where Ξ_µ = (Λ_µ′)^{−1}

is the inverse of Λ_µ′, µ({α}) = e^{−I_µ(α)} if α > −∞, and µ({β}) = e^{−I_µ(β)} if β < ∞.
Proof: For notational convenience, I will drop the subscript “µ” during the
proof. Further, note that the smoothness of Λ follows immediately from the
positivity and smoothness of M , and the identification of Λ0 (ξ) and Λ00 (ξ) with
the mean and variance of νξ is elementary calculus combined with the remark
following (1.3.4). Thus, I will concentrate on the properties of the function I.
As the pointwise supremum of functions that are linear, I is certainly lower
semicontinuous and convex. Also, because Λ(0) = 0, it is obvious that I ≥ 0.
Next, by Jensen's Inequality,

Λ(ξ) ≥ ξ ∫_ℝ x µ(dx) = ξm,

and therefore e^{−I(β)} = inf_{ξ≥0} e^{−ξβ} M(ξ) = µ({β}). Since the same reasoning applies when α > −∞, we are done.
Theorem 1.3.12 (Cramér's Theorem). Let {X_n : n ≥ 1} be a sequence of P-independent random variables with common distribution µ, assume that the associated moment generating function M_µ satisfies (1.3.4), set m = ∫_ℝ x µ(dx), and define I_µ accordingly, as in (1.3.10). Then

P(S̄_n ≥ a) ≤ e^{−nI_µ(a)} for all a ∈ [m, ∞)

and

P(S̄_n ≤ a) ≤ e^{−nI_µ(a)} for all a ∈ (−∞, m].
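To see the upper bound at work, consider P-independent {−1, 1}-valued Bernoulli variables with mean 0, for which Λ_µ(ξ) = log cosh ξ and the Legendre transform can be computed in closed form: I_µ(a) = ½[(1 + a) log(1 + a) + (1 − a) log(1 − a)]. The Python sketch below (mine, not the book's) compares a Monte Carlo estimate of P(S̄_n ≥ a) with e^{−nI_µ(a)}:

import numpy as np

def I(a):
    # Legendre transform of log cosh for mean-zero {-1,1} coin tosses
    return 0.5 * ((1 + a) * np.log1p(a) + (1 - a) * np.log1p(-a))

rng = np.random.default_rng(3)
n, a, trials = 100, 0.2, 100_000
S_bar = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)
print("P(S_bar_n >= a) ~", (S_bar >= a).mean())
print("exp(-n I(a))    =", float(np.exp(-n * I(a))))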
Results like the ones obtained in Theorem 1.3.12 are examples of a class of
results known as large deviations estimates. They are large deviations be-
cause the probability of their occurrence is exponentially small. Although large
deviation estimates are available in a variety of circumstances,1 in general one
has to settle for the cruder sort of information contained in the following.
¹ In fact, some people have written entire books on the subject. See, for example, J.-D. Deuschel and D. Stroock, Large Deviations, now available from the A.M.S. in the Chelsea Series.
(I use Γ° and Γ̄ to denote the interior and closure of a set Γ. Also, recall that I take the infimum over the empty set to be +∞.)
Proof: To prove the upper bound, let Γ be a closed set, and define Γ⁺ = Γ ∩ [m, ∞) and Γ⁻ = Γ ∩ (−∞, m]. Clearly,

P(S̄_n ∈ Γ) ≤ 2 [ P(S̄_n ∈ Γ⁺) ∨ P(S̄_n ∈ Γ⁻) ].

P(S̄_n ∈ Γ) ≥ P(S̄_n = a) ≥ e^{−nI_µ(a)}.
Remark 1.3.14. The upper bound in Theorem 1.3.12 is often called Cher-
noff ’s Inequality. The idea underlying its derivation is rather mundane by
comparison to the subtle idea underlying the proof of the lower bound. Indeed,
it may not be immediately obvious what that idea was! Thus, consider once
again the second part of the proof of Theorem 1.3.12. What I had to do is estimate the probability that S̄_n lies in a neighborhood of a. When a is the mean
value m, such an estimate is provided by the Weak Law. On the other hand,
when a 6= m, the Weak Law for the Xn ’s has very little to contribute. Thus,
what I did is replace the original Xn ’s by random variables Yn , n ∈ Z+ , whose
mean value is a. Furthermore, the transformation from the Xn ’s to the Yn ’s was
sufficiently simple that it was easy to estimate Xn -probabilities in terms of Yn -
probabilities. Finally, the Weak Law applied to the Y_n's gave strong information about the rate of approach of (1/n) Σ_{ℓ=1}^n Y_ℓ to a.
I close this section by verifying the conjecture (cf. the discussion preceding
Lemma 1.3.11) that the Gaussian case is normal. In particular, I want to check
that the well around m in which the distribution of S̄_n becomes concentrated
looks Gaussian, and, in view of Theorem 1.3.12, this comes down to the following.
Theorem 1.3.15. Let everything be as in Lemma 1.3.11, and assume that the variance σ² > 0. There exist a δ ∈ (0, 1] and a K ∈ (0, ∞) such that [m − δ, m + δ] ⊆ (α, β) (cf. Lemma 1.3.11), Λ_µ″(Ξ_µ(x)) ≤ K, |Ξ_µ(x)| ≤ K|x − m|, and

| I_µ(x) − (x − m)²/(2σ²) | ≤ K|x − m|³

for x ∈ [m − δ, m + δ]. Moreover, for all n ∈ Z⁺, ε > 0, and a ∈ [m − δ, m + δ],

P(|S̄_n − a| < ε) ≥ ( 1 − K/(nε²) ) exp( −n[ (a − m)²/(2σ²) + Kε|a − m| + K|a − m|³ ] ).
Proof: Without loss in generality (cf. Exercise 1.3.17), I will assume that m = 0 and σ² = 1. Since, in this case, Λ_µ(0) = Λ_µ′(0) = 0 and Λ_µ″(0) = 1, it follows that Ξ_µ(0) = 0 and Ξ_µ′(0) = 1. Hence, we can find an M ∈ (0, ∞) and a δ ∈ (0, 1] with α < −δ < δ < β for which |Ξ_µ(x) − x| ≤ M|x|² and |Λ_µ(ξ) − ξ²/2| ≤ M|ξ|³ whenever |x| ≤ δ and |ξ| ≤ (M + 1)δ, respectively. In particular, this leads immediately to |Ξ_µ(x)| ≤ (M + 1)|x| for |x| ≤ δ, and the estimate for I_µ comes easily from the preceding combined with the equation

I_µ(x) = Ξ_µ(x) x − Λ_µ(Ξ_µ(x)).
Exercises for § 1.3

Hint: Handle the case µ(E) < ∞ first, and treat the case when f ∈ L¹(µ; ℝ) by considering the measure ν(dx) = f(x) µ(dx).
Exercise 1.3.17. Referring to the notation used in this section, assume that µ is a non-degenerate (i.e., it is not concentrated at a single point) probability measure on ℝ for which (1.3.4) holds. Next, let m and σ² be the mean and variance of µ, use ν to denote the distribution of

x ∈ ℝ ↦ (x − m)/σ ∈ ℝ under µ,

and define Λ_ν, I_ν, and Ξ_ν accordingly. Show that

Λ_µ(ξ) = ξm + Λ_ν(σξ), ξ ∈ ℝ,
I_µ(x) = I_ν((x − m)/σ), x ∈ ℝ,
Image(Λ_µ′) = m + σ Image(Λ_ν′), and
Ξ_µ(x) = (1/σ) Ξ_ν((x − m)/σ), x ∈ Image(Λ_µ′).
distribution of S ≡ Σ_1ⁿ σ_k X_k, and show that I_ν(x) ≥ x²/(2Σ²), where Σ² ≡ Σ_1ⁿ σ_k². In particular, conclude that

P(|S| ≥ a) ≤ 2 exp( −a²/(2Σ²) ), a ∈ [0, ∞).
Exercise 1.3.19. Although it is not exactly the direction in which I have been going, it seems appropriate to include here a derivation of Stirling's formula. Namely, recall Euler's Gamma function:

(1.3.20) Γ(t) ≡ ∫_{[0,∞)} x^{t−1} e^{−x} dx, t ∈ (0, ∞).
This is, of course, far less than we want to know. Nonetheless, it does show that all the action is going to take place near y = 1 and that the principal factor in the asymptotics of Γ(t + 1)/t^{t+1} is e^{−t}. In order to highlight these observations, make the substitution y = z + 1 and obtain

Γ(t + 1)/(t^{t+1} e^{−t}) = ∫_{(−1,∞)} (1 + z)^t e^{−tz} dz.
Before taking the next step, introduce the function R(z) = log(1 + z) − z + z²/2 for z ∈ (−1, 1), and check that R(z) ≤ 0 if z ∈ (−1, 0] and that |R(z)| ≤ |z|³/(3(1 − |z|)) everywhere in (−1, 1). Now let δ ∈ (0, 1) be given, and show that

∫_{−1}^{−δ} (1 + z)^t e^{−tz} dz ≤ (1 − δ) [ (1 − δ)e^δ ]^t ≤ exp(−tδ²/2)

and

∫_δ^∞ (1 + z)^t e^{−tz} dz ≤ [ (1 + δ)e^{−δ} ]^{t−1} ∫_δ^∞ (1 + z)e^{−z} dz ≤ 2 exp( 1 − tδ²/2 + tδ³/(3(1 − δ)) ).
Next, write (1 + z)^t e^{−tz} = e^{−tz²/2} e^{tR(z)}. Then

∫_{|z|≤δ} (1 + z)^t e^{−tz} dz = ∫_{|z|≤δ} e^{−tz²/2} dz + E(t, δ),

where

E(t, δ) = ∫_{|z|≤δ} e^{−tz²/2} ( e^{tR(z)} − 1 ) dz.

Check that

| ∫_{|z|≤δ} e^{−tz²/2} dz − √(2π/t) | = t^{−1/2} ∫_{|z|≥t^{1/2}δ} e^{−z²/2} dz ≤ (2/(tδ)) e^{−tδ²/2}

and that

|E(t, δ)| ≤ t ∫_{|z|≤δ} |R(z)| e^{−tz²/2 + t|R(z)|} dz ≤ (t/(3(1 − δ))) ∫_{|z|≤δ} |z|³ e^{−(tz²/2)(3−5δ)/(3(1−δ))} dz ≤ 12(1 − δ)/((3 − 5δ)² t)

as long as δ < 3/5. Finally, take δ = √(2t^{−1} log t), and combine these to conclude that there is a C < ∞ such that

| Γ(t + 1)/( √(2πt) (t/e)^t ) − 1 | ≤ C/t, t ∈ [1, ∞).
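The final estimate is easy to test. Using lgamma to avoid overflow, the Python check below (mine, not the exercise's) shows the scaled error t·|Γ(t+1)/(√(2πt)(t/e)^t) − 1| staying bounded, near 1/12, in line with the classical expansion:

import math

# Numeric check (not from the book) of Stirling's formula with O(1/t) error.
for t in (1.0, 10.0, 100.0, 1000.0):
    log_ratio = math.lgamma(t + 1) - 0.5 * math.log(2 * math.pi * t) \
                - t * (math.log(t) - 1)
    err = abs(math.exp(log_ratio) - 1)
    print(t, err, err * t)       # err * t stays bounded (roughly 1/12)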
Af(n) = (f(n + 1) + f(n − 1))/2, n ∈ Z,

and show that, for any n ≥ 1, [Aⁿf](m) = E^P[ f(m + S_n) ], where S_n is the sum of n P-independent, {−1, 1}-valued Bernoulli random variables with mean value 0.

² T.H. Carne, "A transformation formula for Markov chains," Bull. Sc. Math., 109, pp. 399–405 (1985). As Carne points out, what he is doing is the discrete analog of Hadamard's representation, via the Weierstrass transform, of solutions to heat equations in terms of solutions to the wave equations.
In particular, this means that |Q(n, x)| ≤ 1 for all x ∈ [−1, 1]. (It also means that Q(n, ·) is the nth Chebychev polynomial.)
(iii) Using induction on n ∈ Z⁺, show that

[Aⁿ Q(·, z)](m) = zⁿ Q(m, z), m ∈ Z and z ∈ ℂ.

In particular, if

p_{m,n}(z) ≡ E[ Q(S_n, z), |S_n| < m ] = 2^{−n} Σ_{|2ℓ−n|<m} \binom{n}{ℓ} Q(2ℓ − n, z),

then

sup_{x∈[−1,1]} | xⁿ − p_{m,n}(x) | ≤ P(|S_n| ≥ m) ≤ 2 exp( −m²/(2n) ) for all 1 ≤ m ≤ n.

|(f, Aⁿg)_H| ≤ 2‖f‖_H ‖g‖_H exp( −m²/(2n) ) for n ≥ m.
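These polynomials are computable, so the estimate can be checked directly. In the Python sketch below (mine, not the exercise's), Q(k, x) = cos(k arccos x) on [−1, 1]:

import numpy as np
from math import comb, exp

# Numeric sketch (not from the book) of the estimate above:
# sup_{[-1,1]} |x^n - p_{m,n}(x)| <= 2 exp(-m^2/(2n)).
def Q(k, x):
    return np.cos(abs(k) * np.arccos(x))

def p(m, n, x):
    return sum(comb(n, l) * Q(2 * l - n, x)
               for l in range(n + 1) if abs(2 * l - n) < m) / 2.0**n

x = np.linspace(-1.0, 1.0, 401)
n = 100
for m in (10, 20, 40):
    print(m, np.max(np.abs(x**n - p(m, n, x))), 2 * exp(-m * m / (2 * n)))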
§ 1.4 The Strong Law of Large Numbers

the set on which

lim_{n→∞} (S_n − a_n)/b_n exists in ℝ

has P-measure either 0 or 1. In fact, if b_n → ∞ as n → ∞, then both

lim sup_{n→∞} (S_n − a_n)/b_n and lim inf_{n→∞} (S_n − a_n)/b_n

are P-almost surely constant.
then

Σ_{n=1}^∞ ( X_n − E^P[X_n] ) converges P-almost surely.
S_N² − S_n² = (S_N − S_n)² + 2(S_N − S_n)S_n ≥ 2(S_N − S_n)S_n;

and therefore, since S_N − S_n has mean value 0 and is independent of the σ-algebra σ({X_1, . . . , X_n}),

(*) E^P[S_N², A_n] ≥ E^P[S_n², A_n] for any A_n ∈ σ({X_1, . . . , X_n}).

In particular, if A_1 = {|S_1| > ε} and

A_{n+1} = { |S_{n+1}| > ε and max_{1≤ℓ≤n} |S_ℓ| ≤ ε }, n ∈ Z⁺,

then, with B_N ≡ ⋃_{n=1}^N A_n,

E^P[S_N², B_N] = Σ_{n=1}^N E^P[S_N², A_n] ≥ Σ_{n=1}^N E^P[S_n², A_n] ≥ ε² Σ_{n=1}^N P(A_n) = ε² P(B_N).
Thus,

ε² P( sup_{n≥1} |S_n| > ε ) = lim_{N→∞} ε² P(B_N) ≤ lim_{N→∞} E^P[S_N²] ≤ Σ_{n=1}^∞ E^P[X_n²],

and so the result follows after one takes left limits with respect to ε > 0.
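A simulation shows how much room the inequality leaves. The Python sketch below (mine, not the text's) uses standard normal X_n's, so that the right-hand side Σ_{n≤N} E^P[X_n²] is exactly N:

import numpy as np

# Simulation sketch (not from the book) of Kolmogorov's inequality:
# eps^2 P(max_{n<=N} |S_n| > eps) <= sum_{n<=N} E[X_n^2].
rng = np.random.default_rng(4)
N, trials, eps = 50, 200_000, 3.0
X = rng.standard_normal((trials, N))     # independent, mean 0, variance 1
S = X.cumsum(axis=1)
lhs = eps**2 * (np.abs(S).max(axis=1) > eps).mean()
print(lhs, "<=", float(N))               # N = sum of the variances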
Proof of Theorem 1.4.2: Again assume that the X_n's have mean value 0. By (1.4.6) applied to {X_{N+n} : n ∈ Z⁺}, we see that (1.4.3) implies

P( sup_{n>N} |S_n − S_N| ≥ ε ) ≤ (1/ε²) Σ_{n=N+1}^∞ E^P[X_n²] → 0 as N → ∞

for every ε > 0, and this is equivalent to the P-almost sure Cauchy convergence of {S_n : n ≥ 1}.
In order to convert the conclusion in Theorem 1.4.2 into a statement about
S n : n ≥ 1 , I will need the following elementary summability fact about
sequences of real numbers.
Lemma 1.4.7 (Kronecker). Let {b_n : n ∈ Z⁺} be a non-decreasing sequence of positive numbers that tend to ∞, and set β_n = b_n − b_{n−1}, where b_0 ≡ 0. If {s_n : n ≥ 1} ⊆ ℝ is a sequence that converges to s ∈ ℝ, then

(1/b_n) Σ_{ℓ=1}^n β_ℓ s_ℓ → s.

Moreover, if {x_n : n ≥ 1} ⊆ ℝ and Σ_{n=1}^∞ x_n/b_n converges in ℝ, then (1/b_n) Σ_{ℓ=1}^n x_ℓ → 0.
Proof: To prove the first part, assume that s = 0, and for given ε > 0 choose N ∈ Z⁺ so that |s_ℓ| < ε for ℓ ≥ N. Then, with M = sup_{n≥1} |s_n|,

(1/b_n) | Σ_{ℓ=1}^n β_ℓ s_ℓ | ≤ (M b_N)/b_n + ε → ε as n → ∞.

Turning to the second part, set y_ℓ = x_ℓ/b_ℓ, s_0 = 0, and s_n = Σ_{ℓ=1}^n y_ℓ. After summation by parts,

(1/b_n) Σ_{ℓ=1}^n x_ℓ = s_n − (1/b_n) Σ_{ℓ=1}^n β_ℓ s_{ℓ−1};

and so, since s_n → s ∈ ℝ as n → ∞, the first part gives the desired conclusion.
After combining Theorem 1.4.2 with Lemma 1.4.7, we arrive at the following
interesting statement.
then

(1/b_n) Σ_{ℓ=1}^n ( X_ℓ − E^P[X_ℓ] ) → 0 P-almost surely.
P( ∃n ∈ Z⁺ ∀N ≥ n: Y_N = X_N ) = 1.

In particular, if T̄_n = (1/n) Σ_{ℓ=1}^n Y_ℓ for n ∈ Z⁺, then, for P-almost every ω ∈ Ω, T̄_n(ω) → 0 if and only if S̄_n(ω) → 0. Finally, to see that T̄_n → 0 P-almost surely, first observe that, because E^P[X_1] = 0, by the first part of Lemma 1.4.7,

lim_{n→∞} (1/n) Σ_{ℓ=1}^n E^P[Y_ℓ] = lim_{n→∞} E^P[ X_1, |X_1| ≤ n ] = 0,
Thus, the P-almost sure convergence is now established, and the L1 (P; R)-conver-
gence result was proved already in Theorem 1.2.6.
Turning to the converse assertion, first note that (by Lemma 1.4.1) if S̄_n converges in ℝ on a set of positive P-measure, then it converges P-almost surely to some m ∈ ℝ. In particular,

lim_{n→∞} |X_n|/n = lim_{n→∞} | S̄_n − ((n−1)/n) S̄_{n−1} | = 0 P-almost surely;

and so, if A_n ≡ {|X_n| > n}, then P( lim sup_{n→∞} A_n ) = 0. But the A_n's are mutually independent, and therefore, by the second part of the Borel–Cantelli Lemma, we now know that Σ_{n=1}^∞ P(A_n) < ∞. Hence,

E^P[|X_1|] = ∫_0^∞ P(|X_1| > t) dt ≤ 1 + Σ_{n=1}^∞ P(|X_n| > n) < ∞.
Remark 1.4.10. A reason for being interested in the converse part of Theorem
1.4.9 is that it provides a reconciliation between the measure theory vs. frequency
schools of probability theory.
Although Theorem 1.4.9 is the centerpiece of this section, I want to give
another approach to the study of the almost sure convergence properties of
{Sn : n ≥ 1}. In fact, following P. Lévy, I am going to show that {Sn : n ≥ 1}
converges P-almost surely if it converges in P-measure. Hence, for example,
Theorem 1.4.2 can be proved as a direct consequence of (1.4.4), without appeal
to Kolmogorov’s Inequality.
The key to Lévy's analysis lies in a version of the reflection principle, whose statement requires the introduction of a new concept. Given an ℝ-valued random variable Y, say that α ∈ ℝ is a median of Y, and write α ∈ med(Y), if

(1.4.11) P(Y ≤ α) ∧ P(Y ≥ α) ≥ ½.
Notice that (as distinguished from a mean value) every Y admits a median; for example, it is easy to check that

α ≡ inf{ t ∈ ℝ : P(Y ≤ t) ≥ ½ }

is a median of Y. On the other hand, the notion of median is flawed by the fact that, in general, a random variable will admit an entire non-degenerate interval of medians. In addition, it is neither easy to compute the medians of a sum in terms of the medians of the summands nor to relate the medians of an integrable random variable to its mean value. Nonetheless, at least if Y ∈ L^p(P; ℝ) for some p ∈ [1, ∞), the following estimate provides some information. Namely, since, for α ∈ med(Y) and β ∈ ℝ,

|α − β|^p/2 ≤ |α − β|^p [ P(Y ≥ α) ∧ P(Y ≤ α) ] ≤ E^P[|Y − β|^p],

it follows that

(1.4.12) |α − β| ≤ ( 2 E^P[|Y − β|^p] )^{1/p}.
Theorem 1.4.13 (Lévy's Reflection Principle). Let {X_n : n ∈ Z⁺} be a sequence of P-independent random variables, and, for k ≤ ℓ, choose α_{ℓ,k} ∈ med(S_ℓ − S_k). Then, for any N ∈ Z⁺ and ε > 0,

(1.4.14) P( max_{1≤n≤N} (S_n + α_{N,n}) ≥ ε ) ≤ 2 P(S_N ≥ ε),

and therefore

(1.4.15) P( max_{1≤n≤N} |S_n + α_{N,n}| ≥ ε ) ≤ 2 P(|S_N| ≥ ε).
Proof: Clearly (1.4.15) follows by applying (1.4.14) to both the sequences {X_n : n ≥ 1} and {−X_n : n ≥ 1} and then adding the two results.
To prove (1.4.14), set A_1 = {S_1 + α_{N,1} ≥ ε} and

A_{n+1} = { max_{1≤ℓ≤n} (S_ℓ + α_{N,ℓ}) < ε and S_{n+1} + α_{N,n+1} ≥ ε }.

Obviously, the A_n's are mutually disjoint and ⋃_{n=1}^N A_n = { max_{1≤n≤N} (S_n + α_{N,n}) ≥ ε }. In addition,

{S_N ≥ ε} ⊇ A_n ∩ {S_N − S_n ≥ α_{N,n}} for each 1 ≤ n ≤ N.

Hence,

P(S_N ≥ ε) ≥ Σ_{n=1}^N P( A_n ∩ {S_N − S_n ≥ α_{N,n}} ) ≥ (1/2) Σ_{n=1}^N P(A_n) = (1/2) P( max_{1≤n≤N} (S_n + α_{N,n}) ≥ ε ),
Remark 1.4.17. The most beautiful and startling feature of Lévy’s line of
reasoning is that it requires no integrability assumptions. Of course, in many
applications of Corollary 1.4.16, integrability considerations enter into the proof
that {Sn : n ≥ 1} converges in P-measure. Finally, a word of caution may be
in order. Namely, the result in Corollary 1.4.16 applies to the quantities S_n themselves; it does not apply to associated quantities like S̄_n. Indeed, suppose that {X_n : n ≥ 1} is a sequence of independent, identically distributed random variables that satisfy

P(X_n ≤ −t) = P(X_n ≥ t) = ( 1 + t² log(e⁴ + t²) )^{−1/2} for all t ≥ 0.

On the one hand, by Exercise 1.2.11, we know that the associated averages S̄_n tend to 0 in probability. On the other hand, by the second part of Theorem 1.4.9, we know that the sequence {S̄_n : n ≥ 1} diverges almost surely.
Exercises for § 1.4
Show that

(1.4.20) E^P[X^p]^{1/p} ≤ (p/(p − 1)) E^P[Y^p]^{1/p}, p ∈ (1, ∞),

and conclude from this that, for each p ∈ (2, ∞), {S_n : n ≥ 1} converges to S in L^p(P) if and only if S ∈ L^p(P).
Exercise 1.4.23. If X ∈ L²(P; ℝ), then it is easy to characterize its mean m as the c ∈ ℝ that minimizes E^P[(X − c)²]. Assuming that X ∈ L¹(P; ℝ), show that α ∈ med(X) if and only if

E^P[|X − α|] = min_{c∈ℝ} E^P[|X − c|].
Note that this can be used in place of (1.4.15) when proving results like the one
in Corollary 1.4.16.
Show that
(ii) Continuing in the same setting, add the assumption that the X_n's are identically distributed, and use part (i) to show that

lim_{n→∞} P(|S̄_n| ≤ C) = 1 for some C ∈ (0, ∞) ⟹ lim_{n→∞} n P(|X_1| ≥ n) = 0,

and that (1 − (1 − x)ⁿ)/x → n as x ↘ 0.
In conjunction with Exercise 1.2.11, this proves that if {X_n : n ≥ 1} is a sequence of independent, identically distributed symmetric random variables, then S̄_n → 0 in P-probability if and only if lim_{n→∞} n P(|X_1| ≥ n) = 0.
The beautiful argument given below is due to Y. Guivarc'h, but its full power cannot be appreciated in the present context (cf. Exercise 6.2.19). Furthermore, a classic result (cf. Exercise 5.2.43) due to K.L. Chung and W.H. Fuchs gives a much better result for the independent random variables. Their result says that lim inf_{n→∞} |S_n| = 0 P-almost surely.
In order to prove the assertion here, assume that lim_{n→∞} |S_n| = ∞ with positive P-probability, use Kolmogorov's 0–1 Law to see that |S_n| → ∞ P-almost surely, and proceed as follows.
¹ These ideas are taken from the book by Wm. Feller cited at the end of § 1.2. They become even more elegant when combined with a theorem due to E.J.G. Pitman, which is given in Feller's book.
(i) Show that there must exist an ε > 0 with the property that

P( ∀ℓ > k: |S_ℓ − S_k| ≥ ε ) ≥ ε for all k ∈ ℕ,

and conclude that P(A) ≥ ε, where A ≡ { ω : ∀ℓ ∈ Z⁺, |S_ℓ(ω)| ≥ ε }.
(ii) For each ω ∈ Ω, set

Γ_n(ω) = { t ∈ ℝ : ∃1 ≤ ℓ ≤ n, |t − S_ℓ(ω)| < ε/2 }

and

Γ′_n(ω) = { t ∈ ℝ : ∃1 ≤ ℓ ≤ n, |t − S′_ℓ(ω)| < ε/2 },

where S′_n ≡ Σ_{ℓ=1}^n X_{ℓ+1}. Next, let R_n(ω) and R′_n(ω) denote the Lebesgue measure of Γ_n(ω) and Γ′_n(ω), respectively; and, using the translation invariance of Lebesgue measure, show that

ε P(A) ≤ E^P[ R_{n+1} − R_n ], n ∈ Z⁺.
(iii) In view of parts (i) and (ii), what remains to be done is show that

m = 0 ⟹ lim_{n→∞} (1/n) E^P[R_n] = 0.

But, clearly, 0 ≤ R_n(ω) ≤ n. Thus, it is enough to show that, when m = 0, R_n/n → 0 P-almost surely; and, to this end, first check that

S_n(ω)/n → 0 ⟹ R_n(ω)/n → 0,

and, finally, apply The Strong Law of Large Numbers.
Exercise 1.4.29. As I have already said, for many applications The Weak
Law of Large Numbers is just as good as and even preferable to the Strong
Law. Nonetheless, here is an application in which the full strength of the Strong
Law plays an essential role. Namely, I want to use the Strong Law to produce
examples of continuous, strictly increasing functions F on [0, 1] with the property
that their derivative

F′(x) ≡ lim_{y→x} (F(y) − F(x))/(y − x) = 0 at Lebesgue-almost every x ∈ (0, 1).
By familiar facts about functions of a real variable, one knows that such func-
tions F are in one-to-one correspondence with non-atomic, Borel probability
measures µ on [0, 1] which charge every non-empty open subset but are singular
to Lebesgue’s measure.
Namely, F is the distribution function determined by µ: F(x) = µ((−∞, x]).
(i) Set Ω = {0, 1}^{Z⁺}, and, for each p ∈ (0, 1), take M_p = (β_p)^{Z⁺}, where β_p on {0, 1} is the Bernoulli measure with β_p({1}) = p = 1 − β_p({0}). Next, define

ω ∈ Ω ↦ Y(ω) ≡ Σ_{n=1}^∞ 2^{−n} ω_n ∈ [0, 1],

let µ_p denote the distribution of Y under M_p, and set ε_n(x) = ⌊2ⁿx⌋ − 2⌊2^{n−1}x⌋ for n ∈ Z⁺ and x ∈ [0, 1), where ⌊s⌋ denotes the integer part of s. If {ε_n : n ≥ 1} ⊆ {0, 1} satisfies x = Σ_1^∞ 2^{−m} ε_m, show that ε_m = ε_m(x) for all m ≥ 1 if and only if ε_m = 0 for infinitely many m ≥ 1. In particular, conclude first that ω_n = ε_n(Y(ω)), n ∈ Z⁺, for M_p-almost every ω ∈ Ω and, second, by the Strong Law, that

(1/n) Σ_{m=1}^n ε_m(x) → p for µ_p-almost every x ∈ [0, 1].
(iv) By Lemma 1.1.6, we know that µ_{1/2} is Lebesgue measure λ_{[0,1]} on [0, 1]. Hence, we now know that µ_p ⊥ λ_{[0,1]} when p ≠ ½. In view of the introductory remarks, this completes the proof that, for each p ∈ (0, 1) \ {½}, the function F_p(x) = µ_p((−∞, x]) is a strictly increasing, continuous function on [0, 1] whose derivative vanishes at Lebesgue-almost every point. Here, one can do better. Namely, referring to part (iii), let ∆_p denote the set of x ∈ [0, 1) such that

lim_{n→∞} Σ_n(x)/n = p, where Σ_n(x) ≡ Σ_{m=1}^n ε_m(x).

We know that ∆_{1/2} has Lebesgue measure 1. Show that, for each x ∈ ∆_{1/2} and p ∈ (0, 1) \ {½}, F_p is differentiable with derivative 0 at x.
Hint: Given x ∈ [0, 1), define

L_n(x) = Σ_{m=1}^n 2^{−m} ε_m(x) and R_n(x) = L_n(x) + 2^{−n}.

Show that

F_p(R_n(x)) − F_p(L_n(x)) = M_p( { ω : Σ_{m=1}^n 2^{−m} ω_m = L_n(x) } ) = p^{Σ_n(x)} (1 − p)^{n−Σ_n(x)}.

When p ∈ (0, 1) \ {½} and x ∈ ∆_{1/2}, use this together with 4p(1 − p) < 1 to show that

lim sup_{n→∞} (1/n) log( ( F_p(R_n(x)) − F_p(L_n(x)) ) / ( R_n(x) − L_n(x) ) ) < 0.
To complete the proof, for given x ∈ ∆_{1/2} and n ≥ 2 such that Σ_n(x) ≥ 2, let m_n(x) denote the largest m < n such that ε_m(x) = 1, and show that m_n(x)/n → 1 as n → ∞. Hence, since 2^{−n−1} < h ≤ 2^{−n} implies that

( F_p(x) − F_p(x − h) )/h ≤ 2^{n−m_n(x)+1} ( F_p(R_n(x)) − F_p(L_n(x)) ) / ( R_n(x) − L_n(x) ),

one concludes that F_p is left-differentiable at x and has left derivative equal to 0 there. To get the same conclusion about right derivatives, simply note that F_p(x) = 1 − F_{1−p}(1 − x).
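The decay asserted in the hint can be seen numerically. For an x whose digits alternate (so Σ_n(x) = ⌊n/2⌋ and x ∈ ∆_{1/2}), the difference quotient over [L_n(x), R_n(x)) is (2p)^{Σ_n(x)}(2q)^{n−Σ_n(x)}, and the Python sketch below (mine, not the exercise's) shows its logarithmic rate approaching ½ log(4pq) < 0:

import math

# Numeric sketch (not from the book) of the hint: along the dyadic intervals
# around an x with digit frequency 1/2, the ratio
# (F_p(R_n) - F_p(L_n)) / (R_n - L_n) = (2p)^{Sigma_n} (2q)^{n - Sigma_n}
# decays geometrically because 4pq < 1 for p != 1/2.
p = 0.3
q = 1.0 - p
for n in (10, 50, 200):
    sigma = n // 2                           # alternating digits: half are 1
    log_ratio = sigma * math.log(2 * p) + (n - sigma) * math.log(2 * q)
    print(n, log_ratio / n, 0.5 * math.log(4 * p * q))
# (1/n) log(ratio) -> (1/2) log(4pq) < 0, so the difference quotients vanish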
(v) Again let p ∈ (0, 1) \ {½} be given, but this time choose x ∈ ∆_p. Show that

lim_{h↘0} ( F_p(x + h) − F_p(x) )/h = +∞.

The argument is similar to the one used to handle part (iv). However, the role played there by the inequality 4pq < 1 is played here by (2p)^p (2q)^q > 1 when q = 1 − p.
§ 1.5 Law of the Iterated Logarithm
S_n/b_n → 0 P-almost surely if Σ_{n=1}^∞ 1/b_n² < ∞.
Thus, for example, S_n grows more slowly than n^{1/2} log n. On the other hand, if the X_n's are N(0, 1)-random variables, then so are the random variables S_n/√n; and therefore, for every R ∈ (0, ∞),

P( lim sup_{n→∞} S_n/√n ≥ R ) = lim_{N→∞} P( ⋃_{n≥N} { S_n/√n ≥ R } ) ≥ lim_{N→∞} P( S_N/√N ≥ R ) > 0.

Hence, at least for normal random variables, one can use Lemma 1.4.1 to see that

lim sup_{n→∞} S_n/√n = ∞ P-almost surely;

and so S_n grows faster than n^{1/2}.
If, as we did in Section 1.3, we proceed on the assumption that Gaussian random variables are typical, we should expect the growth rate of the S_n's to be something between n^{1/2} and n^{1/2} log n. What, in fact, turns out to be the precise growth rate is

(1.5.1) Λ_n ≡ √( 2n log₍₂₎(n ∨ 3) ),

where log₍₂₎ x ≡ log log x (not the logarithm with base 2) for x ∈ [e, ∞). That is, one has The Law of the Iterated Logarithm:

(1.5.2) lim sup_{n→∞} S_n/Λ_n = 1 P-almost surely.

This remarkable fact was discovered first for Bernoulli random variables by Khinchine, was extended by Kolmogorov to random variables possessing 2 + ε moments, and eventually achieved its final form in the work of Hartman and Wintner. The approach that I will adopt here is based on ideas (taught to me by M. Ledoux) introduced originally to handle generalizations of (1.5.2) to random
variables with values in a Banach space.¹ This approach consists of two steps. The first establishes a preliminary version of (1.5.2) that, although it is far cruder than (1.5.2) itself, will allow me to justify a reduction of the general case to the case of bounded random variables. In the second step, I deal with bounded random variables and more or less follow Khinchine's strategy for deriving (1.5.2) once one has estimates like the ones provided by Theorem 1.3.12.
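Before turning to the details, a simulation gives a feeling for (1.5.2). The Python sketch below (mine, not the text's) tracks the running maximum of S_n/Λ_n along a single coin-tossing trajectory; the approach to 1 is extremely slow, which is typical of iterated-logarithm behavior:

import numpy as np

# Simulation sketch (not from the book) of (1.5.2) for coin tossing:
# the running max of S_n / Lambda_n should hover below and near 1.
rng = np.random.default_rng(5)
n_max = 2_000_000
X = rng.choice([-1.0, 1.0], size=n_max)
S = X.cumsum()
n = np.arange(1, n_max + 1)
Lam = np.sqrt(2 * n * np.log(np.log(np.maximum(n, 3))))   # (1.5.1)
ratio = S / Lam
for k in (10_000, 100_000, 2_000_000):
    print(k, ratio[:k].max())        # creeps slowly toward 1 from below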
In what follows, I will use the notation

Λ_β = Λ_{[β]} and S̃_β = S_{[β]}/Λ_β for β ∈ [3, ∞),

where [β] denotes the integer part of β.
Proof: Let β ∈ (1, ∞) be given and, for each m ∈ ℕ and 1 ≤ n ≤ β^m, let α_{m,n} be a median (cf. (1.4.11)) of S_{[β^m]} − S_n. Noting that, by (1.4.12), |α_{m,n}| ≤ √(2β^m), we know that

lim sup_{n→∞} S̃_n = lim sup_{m→∞} max_{β^{m−1}≤n≤β^m} S̃_n ≤ β^{1/2} lim sup_{m→∞} max_{β^{m−1}≤n≤β^m} S_n/Λ_{β^m}
≤ β^{1/2} lim sup_{m→∞} max_{n≤β^m} (S_n + α_{m,n})/Λ_{β^m},

and therefore

P( lim sup_{n→∞} S̃_n ≥ a ) ≤ P( lim sup_{m→∞} max_{n≤β^m} (S_n + α_{m,n})/Λ_{β^m} ≥ aβ^{−1/2} ).
has under Q. Next, using the last part of (iii) in Exercise 1.3.18 with σ_k = X_k(ω), note that

λ_{[0,1)}( { t ∈ [0, 1) : | Σ_{n=1}^{2^m} R_n(t) X_n(ω) | ≥ a } ) ≤ 2 exp( −a² / (2 Σ_{n=1}^{2^m} X_n(ω)²) ), a ∈ [0, ∞) and ω ∈ Ω.
Hence, if

A_m ≡ { ω ∈ Ω : (1/2^m) Σ_{n=1}^{2^m} X_n(ω)² ≥ 2 }

and

F_m(ω) ≡ λ_{[0,1)}( { t ∈ [0, 1) : | Σ_{n=1}^{2^m} R_n(t) X_n(ω) | ≥ 2^{3/2} Λ_{2^m} } ),
52 1 Sums of Independent Random Variables
under the measure Q ≡ P × P0 . Since the Yn ’s are obviously (cf. part (i) of
Exercise 1.4.21) symmetric, the result which I have already proved says that
Sn (ω) − Sn (ω 0 ) 5
lim ≤ 22 ≤ 8 for Q-almost every (ω, ω 0 ) ∈ Ω × Ω0 .
n→∞ Λn
|Sn (ω)|
lim ≥ 8 + for P-almost every ω ∈ Ω;
n→∞ Λn
§ 1.5 Law of the Iterated Logarithm 53
But, again by Fubini’s Theorem, this would mean that there exists a {nm : m ∈
Sn (ω0 )
Z+ } ⊆ Z+ such that nm % ∞ and limm→∞ Λmn ≥ for P0 -almost every
m
ω 0 ∈ Ω0 , and obviously this contradicts
" #
2
P 0 Sn 1
E = −→ 0.
Λn 2 log(2) n
We have now got the crude statement alluded to above. In order to get the
more precise statement contained in (1.5.2), I will need the following application
of the results in § 1.3.
Lemma 1.5.6. Let {Xn : n ≥ 1} be a sequence of independent random
variables with mean value 0, variance 1, and common distribution µ. Further,
assume that (1.3.4) holds. Then, for each R ∈ (0, ∞) there is an N (R) ∈ Z+
such that
" r ! #
8R log(2) n
(1.5.7) P S̃n ≥ R ≤ 2 exp − 1 − K R2 log(2) n
n
for n ≥ N (R). In addition, for each ∈ (0, 1], there is an N () ∈ Z+ such that,
for all n ≥ N () and |a| ≤ 1 ,
1 h i
P S̃n − a < ≥ exp − a2 + 4K|a| log(2) n .
(1.5.8)
2
In both (1.5.7) and (1.5.8), the constant K ∈ (0, ∞) is the one in Theorem
1.3.15.
Proof: Set
12
2 log(2) (n ∨ 3)
Λn
λn = = .
n n
To prove (1.5.7), simply apply the upper bound in the last part of Theorem
1.3.15 to see that, for sufficiently large n ∈ Z+ ,
(Rλn )2
3
P S̃n ≥ R = P S n ≥ Rλn ≤ 2 exp −n
− K Rλn .
2
3This is Fubini at his best and subtlest. Namely, I am using Fubini to switch between hori-
zontal and vertical sets of measure 0.
54 1 Sums of Independent Random Variables
where an = aλn and n = λn . Thus, by the lower bound in the last part of
Theorem 1.3.15,
2
K an 2
P S̃n − a < ≥ 1 − 2 exp −n
+ K|an | n + an
nn 2
!
K h i
≥ 1− 2 exp − a2 + 2K|a| + a2 λn log(2) n
2 log(2) n
(Cf. Exercise 1.5.12 for a converse statement and §§ 8.4.2 and 8.6.3 for related
results.)
Proof: I begin with the observation that, because of (1.5.5), I may restrict
my attention to the case when the Xn ’s are bounded random variables. Indeed,
for any Xn ’s and any > 0, an easy truncation procedure allows us to find an
ψ ∈ Cb (R; R) such that Yn ≡ ψ ◦ Xn again has mean value 0 and variance 1
while Zn ≡ Xn − Yn has variance less than 2 . Hence, if the result is known
when the random variables are bounded, then, by (1.5.5) applied to the Zn ’s,
Pn
m=1 Zm (ω)
lim S̃n (ω) ≤ 1 + lim
≤ 1 + 8,
n→∞ n→∞ Λn
In view of the preceding, from now on I may and will assume that the Xn ’s
are bounded. To prove that limn→∞ S̃n ≤ 1 (a.s., P), let β ∈ (1, ∞) be given,
and use (1.5.7) to see that
1
h 1 i
P S̃β m ≥ β 2 ≤ 2 exp −β 2 log(2) β m
+
large m ∈ Z . Hence, by Lemma 1.5.3 with a = β, we see
for all sufficiently
that limn→∞ S̃n ≤ β (a.s., P) for every β ∈ (1, ∞). To complete the proof, I
must still show that, for every a ∈ (−1, 1) and > 0,
P lim S̃n − a < = 1.
n→∞
Because I want to get this conclusion as an application of the second part of
the Borel–Cantelli Lemma, it is important that we be dealing with independent
events, and for this purpose I use the result just proved to see that, for every
integer k ≥ 2,
lim S̃n − a ≤ inf lim S̃km − a
n→∞ k→∞ m→∞
Skm − Skm−1
= inf lim
− a P-almost surely.
k→∞ m→∞ Λk m
Thus, because the events
Skm − Skm−1
m ∈ Z+ ,
Ak,m ≡ − a < ,
Λkm
are independent for each k ≥ 2, all that I need to do is check that
X∞
P Ak,m = ∞ for sufficiently large k ≥ 2.
m=1
But
Λkm a Λkm
P Ak,m = P S̃km −km−1 −
< ,
Λkm −km−1 Λkm −km−1
and, because
Λkm
lim max+
− 1 = 0,
k→∞ m∈Z Λ m m−1
k −k
everything reduces to showing that
X∞
(*) P S̃km −km−1 − a < = ∞
m=1
for each k ≥ 2, a ∈ (−1, 1), and > 0. Finally, referring to (1.5.8), choose 0 > 0
so small that ρ ≡ a2 + 4K0 |a| < 1, and conclude that, when 0 < < 0 ,
1 h i
P S̃n − < ≥ exp −ρ log(2) n
2
for all sufficiently large n’s, from which (*) is easy.
56 1 Sums of Independent Random Variables
Remark 1.5.11. The reader should notice that the Law of the Iterated Log-
arithm provides a naturally occurring sequence of functions that converge in
measure but not almost everywhere. Indeed, it is obvious that S̃n −→ 0 in
L2 (P; R), but the Law of the Iterated Logarithm says that S̃n : n ≥ 1 is
wildly divergent when looked at in terms of P-almost sure convergence.
Exercises for § 1.5
In this exercise I4 will outline a proof that X1 is P-square integrable, EP X1 = 0,
and
Sn Sn 1
(1.5.14) lim = − lim = EP X12 2 (a.s., P).
n→∞ Λn n→∞ Λn
(i) Using Lemma 1.4.1, show that there is a σ ∈ [0, ∞) such that
Sn
(1.5.15) lim =σ (a.s., P).
n→∞ Λn
1 Sn Sn
σ = EP X12 2 = lim = − lim (a.s., P).
n→∞ Λn n→∞ Λn
In other words, everything comes down to proving that (1.5.13) implies that X1
is P-square integrable.
(ii) Assume that the Xn ’s are symmetric. For t ∈ (0, ∞), set
X̌1t , . . . , X̌nt , . . .
and X1 , . . . , X n , . . .
4I follow Wm. Feller “An extension of the law of the iterated logarithm to variables without
variance,” J. Math. Mech., 18 #4, pp. 345–355 (1968), although V. Strassen was the first to
prove the result.
Exercises for § 1.5 57
have the same distribution. Conclude first that, for all t ∈ [0, 1),
Pn
m=1 Xn 1[0,t] |Xn |
lim ≤ σ (a.s., P),
n→∞ Λn
where σ is the number in (1.5.15), and second that
h i
EP X12 = lim EP X12 , X1 ≤ t ≤ σ 2 .
t%∞
Xn + X̌nt
Xn 1[0,t] |Xn | = ,
2
and apply part (i).
(iii) For general {Xn : n ≥ 1}, produce an independent copy {Xn0 : n ≥ 1} (as
in the proof of Lemma 1.5.4), and set Yn = Xn − Xn0 . After checking that
Pn
| m=1 Ym |
lim ≤ 2σ (a.s., P),
n→∞ Λn
conclude
first that EP Y12 ≤ 4σ 2 and then (cf. part
(i) of Exercise1.4.27) that
EP X12 < ∞. Finally, apply (i) to arrive at EP X1 = 0 and (1.5.14).
Exercise 1.5.16. Let {s̃n : n ≥ 1} be a sequence of real numbers which possess
the properties that
lim s̃n = 1, lim s̃n = −1, and lim s̃n+1 − s̃n = 0.
n→∞ n→∞ n→∞
Show that the set of subsequential limit points of {s̃n : n ≥ 1} coincides with
[−1, 1]. Apply this observation to show that, in order to get the final statement in
Theorem 1.5.9, I need only have proved (1.5.10) for the function f (x) = x, x ∈ R.
Hint: In proving the last part, use the square integrability of X1 to see that
∞ 2
X Xn
P ≥ 1 < ∞,
n=1
n
and apply the Borel–Cantelli Lemma to conclude that S̃n − S̃n−1 −→ 0 (a.s., P).
Exercise 1.5.17. Let {Xn : n ≥ 1} be a sequence of RN -valued, identically
distributed random variables on (Ω, F, P) with the property that, for each e ∈
SN −1 = {x ∈ RN : |x| = 1}, e, X1 RN has mean value 0 and variance 1. Set
Pn Sn
Sn = m=1 Xm and S̃n = Λ n
, and show that limn→∞ |S̃n | = 1 P-almost surely.
Here are some steps that you might want to follow.
58 1 Sums of Independent Random Variables
Show that
|s̃n | ≤ max e, s̃n RN + ,
1≤k≤`
and conclude first that limn→∞ |s̃n | ≤1 + and then that limn→∞ |s̃n | ≤ 1.
At the same time, since |s̃n | ≥ e1 , s̃n RN , show that limn→∞ |s̃n | ≥ 1. Thus
limn→∞ |s̃n | = 1.
(iii) Let {ek : k ≥ 1} be as in (i), and apply Theorem 1.5.9 to show that, for
P-almost all ω ∈ Ω, the sequence {S̃n (ω) : n ≥ 1} satisfies the condition in (i).
Thus, by (ii), limn→∞ |S̃n (ω)| = 1 for P-almost every ω ∈ Ω.
Chapter 2
The Central Limit Theorem
In the preceding chapter I dealt with averages of random variables and showed
that, in great generality, those averages converge almost surely or in probability
to a constant. At least when all the random variables have the same distribution
and moments of all orders, one way of rationalizing this phenomenon is to rec-
ognize that the mean value is conserved whereas all higher moments are driven
to 0 when one averages. Of course, the reason why it is easy to conserve the
first moment is that the mean of the sum is the sum of the means. Thus, if one
is going to attempt to find a simple normalization procedure that conserves a
quantity involving more than the mean value, one should seek a quantity that
shares this additivity property.
With this in mind, one is led to ask what happens if one normalizes in a way
that conserves the variance. For this purpose, suppose that {Xn : n ∈ Z+ } is a
sequence of mutually independent, identicallyP distributed random variables with
n 1
mean value 0 and variance 1, and set Sn = 1 Xk . Then S̆n ≡ n− 2 Sn again
has mean value 0 and variance 1. On the other hand, because of Theorem 1.5.9,
we know that, with probability 1, limn→∞ S̆n = ∞ = − limn→∞ S̆n . Hence,
from the point of view of either almost sure convergence or even convergence in
probability, there is no hope that the S̆n ’s will converge.
Nonetheless, the random variables {S̆n : n ≥ 1} possess remarkable stability
when viewed from a distributional perspective. Indeed, if the Xn ’s are Gaussian,
then so are the S̆n ’s, and therefore S̆n ∈ N (0, 1) for all n ≥ 1. More generally,
even if the Xn ’s are not Gaussian, fixing their mean value and variance in this
way forces all their moments to stabilize. To be precise, assume that X1 has finite
moments of all orders, that its mean is 0, and that its variance is 1. Trivially,
L1 ≡ limn→∞ EP [S̆n ] = 0 and L2 ≡ limn→∞ EP [S̆n2 ] = 1. Next, assume that
L` ≡ limn→∞ EP [S̆n` ] exists for 1 ≤ ` ≤ m, where m ≥ 2. I will show now that
Lm+1 ≡ limn→∞ EP [S̆nm+1 ] exists and is equal to mLm−1 . To this end, first note
that, since EP [Xn ] = 0 and the Xn ’s are independent and identically distributed,
m
m+1
P P
m X m P j+1 P m−j
E Sn = nE Xn Xn + Sn−1 =n E Xn E Sn−1
j=0
j
59
60 2 The Central Limit Theorem
m
m−1
P
X m P j+1 P m−j
= nmE Sn−1 + n E Xn E Sn−1 .
j=2
j
m+1
Thus, after dividing through by n 2 , one gets the desired conclusion when
n → ∞. Starting from L1 = 0 and L2 = 1, one now can use induction to check
Qm
that L2m−1 = 0 and L2m = `=1 (2` − 1) = 2(2m)! +
m m! for all m ∈ Z . That is,
m
Y (2m)!
lim EP S̆n2m−1 = 0 lim EP S̆n2m =
and (2` − 1) = m ,
n→∞ n→∞ 2 m!
`=1
Notice that when the Xk ’s are identically distributed and have variance 1, the
S̆n in (2.1.2) is consistent with the notation used above. Finally, set
n
σm 1 X P h 2 i
(2.1.3) rn = max and gn () = E Xm , X m
≥ Σ n
1≤m≤n Σn Σ2n m=1
§ 2.1 The Basic Central Limit Theorem 61
1
for > 0. Clearly, in the identically distributed case, rn = n− 2 and
h 1
i
gn () = σ1−2 EP X12 , |X1 | ≥ n 2 σ1 −→ 0 as n → ∞ for each > 0.
In particular, because
and observe that T̆n is again an N (0, 1)-random variable and therefore that
∆ ≡ EP ϕ(S̆n ) − hϕ, γ0,1 i = EP ϕ(S̆n ) − EP ϕ(T̆n ) .
Xk
Further, set X̆k = Σn , and define
X X
Um = Y̆k + X̆k for 1 ≤ m ≤ n,
1≤k≤m−1 m+1≤k≤n
where a sum over the empty set is taken to be 0. It is then clear that
n
X
∆≤ ∆m where ∆m ≡ EP ϕ Um + X̆m − EP ϕ Um + Y̆m .
1
Moreover, if
ξ 2 00
Rm (ξ) ≡ ϕ Um + ξ − ϕ(Um ) − ξϕ0 (Um ) −
2 ϕ (Um ), ξ ∈ R,
62 2 The Central Limit Theorem
then (because both X̆m and Y̆m are independent of Um and have the same first
two moments)
∆m = EP Rm (X̆m ) − EP Rm (Y̆m )] ≤ EP Rm (X̆m ) + EP Rm (Y̆m ) .
In order to complete the derivation of (2.1.5), note that, by Taylor’s Theorem,
3
Rm (ξ) ≤
ϕ000
|ξ| ∧
ϕ00
|ξ|2 ;
u 6 u
while
n n 3
X kϕ000 ku P X 3
σm 3 4 rn kϕ000 ku
E |Y1 |3
EP |Rm (Y̆n )| ≤ ≤ .
1
6 1
Σ3n 6
The condition that gn () −→ 0 for each > 0 is often called Lindeberg’s
condition because it introduced by J. Lindeberg and it was he who proved that
it is a sufficient condition for (2.1.1) to hold for all (cf. Theorem 2.1.8) ϕ ∈
Cb (RN ; C). Later, Feller proved that (2.1.1) for all ϕ ∈ Cb (RN ; R) plus rn → 0
imply that Lindeberg’s condition holds. Together, these two results are known
as the Lindeberg–Feller Theorem. See Exercise 2.3.20 for a proof of Feller’s
part.
§ 2.1.2. The Central Limit Theorem. If one is not concerned about rates
of convergence, then the differentiability requirement on ϕ can be dropped from
the last part of Theorem 2.1.4. In order to understand the reason for this, it is
helpful to couch the statement of Theorem 2.1.4 entirely in terms of measures.
Thus, let µn denote the distribution of S̆n . Then, under Lindeberg’s condition,
Theorem 2.1.4 allows one to say that hϕ, µn i −→ hϕ, γ0,1 i for all ϕ ∈ C 3 (RN ; C)
with bounded second and third order derivatives. Because we are dealing with
statements about integration and integration is a very forgiving operation, this
sort of result self-improves. To be precise, I prove the following lemma.
§ 2.1 The Basic Central Limit Theorem 63
−N ρ(−1 x) for > 0. Also, choose η ∈ Cc∞ B(0, 2); [0, 1] so that η = 1 on
B(0, 1), and set ηR (x) = η(R−1 x) for R > 0.
Begin by noting that hϕ, µn i −→ hϕ, µi for all ϕ ∈ Cc∞ (RN ; C). Next, suppose
that ϕ ∈ Cc (RN ; C), and, for > 0, set ϕ = ρ ? ϕ, the convolution
Z
ρ (x − y)ϕ(y) dy
RN
of ρ with ϕ. Then, for each > 0, ϕ ∈ Cc∞ (RN ; C) and therefore hϕ , µn i −→
hϕ , µi. In addition, there is an R > 0 such that supp(ϕ ) ⊆ B(0, R) for all
∈ (0, 1]. Hence,
lim hϕ, µn i − hϕ, µi ≤ 2hηR , µikϕ − ϕku .
n→∞
Since lim&0 kϕ − ϕku = 0, we have now shown that hϕ, µn i −→ hϕ, µi for all
ϕ ∈ Cc (RN ; C).
Now suppose that ψ ∈ C RN ; [0, ∞) , and set ψR = ηR ψ, where ηR is as
above. Then, for each R > 0, hψR , µi = limn→∞ hψR , µn i ≤ limn→∞ hψ, µn i.
Hence, by Fatou’s Lemma, hψ, µi ≤ limR→∞ hψR , µi ≤ limn→∞ hψ, µn i.
Finally, suppose that ψ ∈ C RN ; [0, ∞) is µn -integrable for each n ∈ Z+ and
that hψ, µn i −→ hψ, µi ∈ [0, ∞). Given {ϕn : n ≥ 1} ⊆ C(RN ; C) satisfying
|ϕn | ≤ Cψ and converging uniformly on compacts to ϕ, one has
hϕn , µn i − hϕ, µi ≤ hϕn − ϕ, µn i + hϕ, µn i − hϕ, µi.
1 A Borel measure on a topological space is locally finite if it gives finite measure to compacts.
64 2 The Central Limit Theorem
and similarly
lim hϕ, µn i − hϕ, µi
n→∞
≤ lim hηR ϕ, µn i − hηR ϕ, µi + C lim h(1 − ηR )ψ, µn i + Ch(1 − ηR )ψ, µi
n→∞ n→∞
= 2Ch(1 − ηR )ψ, µi.
as k → ∞, and, similarly,
Z
lim P a ≤ S̆n ≤ b ≤ lim EP ψk S̆n = ψk (y) γ0,1 (dy) −→ γ0,1 [a, b] .
n→∞ n→∞ R
Finally, note that γ0,1 (a, b) = γ0,1 [a, b] .
Exercises for § 2.1 65
lim EP S̆n2 ∧ R2 ≤ 1
for every R ∈ [0, ∞).
n→∞
In particular, by Lemma 2.1.7, this will certainly be the case whenever (2.1.1)
holds for every ϕ ∈ Cc (R; R). The purpose of this exercise is to show that the
Xn ’s are P-square integrable, have mean value 0, and variance no more than 1;
and the method which I will use is based on the same line of reasoning as was
given in Exercise 1.5.12.
(i) Assuming that X1 ∈ L2 (P; R), show that EP X1 = 0 and EP X12 ≤ 1. In
particular, use this together with the result in part (i) of Exercise 1.4.27 to see
that it suffices to handle the case when the Xn ’s are symmetric.
(ii) In this and the succeeding parts of this exercise, we will be assuming that
the Xn ’s are symmetric. Following the same route as was suggested in (ii) of
Exercise 1.5.12, set
and recall that X̌1t , . . . , X̌nt , . . . and X1 , . . . , Xn , . . . have the same distribu-
tion for each t ∈ (0, ∞). Use this together with our basic assumption to see that
limR→∞ sup n∈Z+ P An (t, R) = 0, where
t∈(0,∞)
( n n )
X X 1
X̌kt ≥ n 2 R .
An (t, R) ≡ Xk ∨
1 1
After noting that the Xn 1[0,t] |Xn | ’s are symmetric, check (cf. the proof of
Theorem 1.3.1) that EP |S̆nt |4 ≤ 3t4 . In particular, conclude that, for each
t ∈ (0, ∞), there is an R(t) ∈ (0, ∞) such that
1 1
EP |S̆nt |2 , An t, R(t) ≤ 3 2 t2 P An t, R(t) 2 ≤ 1 for all n ∈ Z+ .
66 2 The Central Limit Theorem
(iv) Given t ∈ (0, ∞), choose R(t) ∈ (0, ∞) as in the preceding. Taking into
account the identity Pn Pn t
t 1 Xk + 1 X̌k
S̆n = 1 ,
2n 2
show that
After checking that T2 maps P into itself, use The Central Limit Theorem to
show that, for every µ ∈ P,
Z Z
n
lim ϕ d T2 µ = ϕ dγ0,1 , ϕ ∈ Cb (R; C).
n→∞ R R
Conclude, in particular, that γ0,1 is the one and only element µ of P with the
property that T2 µ = µ and that this fixed point is attracting. (See Exercise
2.3.21 for more information.)
Exercise 2.1.12. Here is another indication of the remarkable stability of nor-
mal random variables. Namely, I will outline here a derivation2 of the Lévy–
Cramér Theorem which says that if X and Y are independent random vari-
ables whose sum is normal (with some mean and variance), then both X and Y
are normal.
2 This derivation is based on a note by Z. Sasvári, who himself borrowed some of the ideas
from A. Rényi. I know of no derivation that does not rely on complex analysis and would be
very interested in learning one.
Exercises for § 2.1 67
R2
P |X| ≥ r + R ∨ P |Y | ≥ r + R ≤ 4 exp − , R ∈ (0, ∞).
2
In particular,
show that the moment generating
functions z ∈ C 7−→ M (z) =
EP ezX ∈ C and z ∈ C 7−→ N (z) = EhP eizY ∈ C exist and are entire functions.
2
Further, note that M (z)N (z) = exp z2 , and conclude that M and N never
vanish. Finally, from the fact that X + Y has mean 0, show that one can reduce
to the case in which both X and Y have mean 0. Thus, from now on, we assume
that M 0 (0) = 0 = N 0 (0).
(iii) Because M never vanishes and M (0) = 1, elementary complex analysis (cf.
Lemma 3.2.3) guarantees that there is a unique entire function θ : C −→ C such
that θ(0) = 0 and M (z) = eθ(z) for all z ∈ C. Further, from M 0 (0) = 0, note
that θ0 (0) = 0. Thus,
∞
dn
X
cn z n P
xX
θ(z) = where n!cn = log E e ∈ R.
n=2
dxn
x=0
h i
z2
Finally, note that N (z) = exp 2 − θ(z) .
and h i h 2 i
z2
= EP ezY ≤ exp x2 − θ(x)
exp Re 2 − θ(z)
to arrive at
√
−y 2 ≤ 2Re θ(z) ≤ x2
for z = x + −1 y ∈ C.
68 2 The Central Limit Theorem
while, on the other hand (since θ(z) = θ z̄) and therefore ∂z θ(z) = 0),
Z 2π √ √
e− −1 nθ
0= θ re −1 θ dθ.
0
Hence,
Z 2π √ √
1
n
cn r = Re θ re −1 θ e− −1 nθ dθ, n ∈ Z+ and r > 0.
π 0
Finally, in combination with the estimate obtained in (iv) and the fact that
c0 = c1 = 0, this leads to the conclusion that cn = 0 for n 6= 2 and therefore
that θ(z) = c2 z 2 with 0 ≤ c2 ≤ 12 .
Exercise 2.1.13. An important result that is closely related to The Central
Limit Theorem is the following observation, which occupies a central position in
the development of classical statistical mechanics.3
(i) For each n ∈ Z+ , let λn denote the normalized surface measure on the
(n − 1)-dimensional sphere
√ 1
Sn−1 n = x ∈ Rn : |x| = n 2 ,
(1)
and denote by λn the distribution of the coordinate x1 under λn . Check that,
(1)
when n ≥ 2, λn (dt) = fn (t) dt, where
n−3
t2
ωn−2 2
1
fn (t) = 1 1− 1(−1,1) n− 2 t ,
n ωn−1
2 n
3Although E. Borel seems to have thought he was the first to discover this result and rhap-
sodizes about it a good deal in “Sur les principes de la cinétique des gaz,” Ann. l’École Norm.
sup., 3e t. 23, it appears already in the 1866 article “Über die Entwicklungen einer Funktion
von beliebig vielen Variabeln nach Laplaceshen Funktionen höherer Ordnung,” J. Reine u.
Angewandte Math., by F. Mehler and is only a small part of what Mehler discovered there. Be
that as it may, Borel deserves credit for recognizing the significance of this result for statistical
mechanics.
Exercises for § 2.1 69
and ωk−1 denotes the surface area of the (k − 1)-dimensional unit sphere in Rk .
Using polar coordinates to compute the right-hand side of
Z
k |x|2
(2π) 2 = e− 2 dx,
Rk
ωn−2 1
1 −→ √ as n → ∞.
n ωn−1
2 2π
Now, using g to denote the density for the standard Gauss distribution (i.e., the
Gauss kernel in (1.3.5)), apply these computations to show that
fn (t) fn (t)
sup sup < ∞ and that −→ 1 uniformly on compacts.
n≥3 t∈R g(t) g(t)
(ii) A less computational approach to the same calculation is the following. Let
{Xn : n p≥ 1} be a sequence of independent N (0, 1) random
variables, and set
2 2
Rn = X1 + · · · + Xn . First note that P Rn = 0 = 0 and then that the
distribution of 1
n 2 X1 , . . . , Xn
θn ≡
Rn
R2
is λn . Next, use The Strong Law of Large Numbers to see that nn −→ 1 (a.s., P)
and conclude that, for any N ∈ Z+ ,
lim EP ϕ θn(N ) = EP ϕ X1 , . . . , XN , ϕ ∈ Cc RN ; R ,
n→∞
(N )
where, for n ≥ N , θn ∈ RN denotes the projection of θn ∈ Rn onto its first
(N )
N coordinates. Conclude that if λn on RN , BRN denotes the distribution of
x = (x1 , . . . , xn ) ∈ Rn 7−→ x(N ) ≡ x1 , . . . , xN ∈ RN under λn , then
Z Z
(N ) N
for all ϕ ∈ Cb RN ; C .
lim ϕ dλn = ϕ dγ0,1
n→∞ RN RN
70 2 The Central Limit Theorem
(iii) By considering the case when N = 2, show that, for any ϕ ∈ Cb (R; R),
Z n Z !2
1X
(2.1.15) lim ϕ xk − ϕ dγ0,1 λn (dx) = 0.
n→∞
√
n R
k=1
Sn−1 ( n)
Notice that the non-computational argument has the advantage that it immedi-
(N )
ately generalizes the earlier result to cover λn for all N ∈ Z+ , not just N = 1
(cf. Exercise 2.3.24). On the other hand, the conclusion is weaker in the sense
that convergence of the densities has been replaced by convergence of integrals
with bounded continuous integrands and that no estimate on the rate of con-
vergence is provided. More work is required to restore the stronger statements
when N ≥ 2.
When couched in terms of statistical mechanics, this result can be interpreted
as a derivation of the Maxwell distribution of velocities for an ideal gas of free
particles of mass 2 and having average energy 1.
Exercise 2.1.16. The most frequently encountered applications of Stirling’s
formula (cf. (1.3.21)) are to cases when t ∈ Z+ . That is, one is usually interested
in the formula
√ n n
(2.1.17) n! ∼ 2πn .
e
and clearly (2.1.17) follows from these. In fact, if one applies the Berry–Esseen
estimate proved in the next section, one finds that
√ n n
2πn 1
e
= 1 + O n− 2 .
n!
However, this last observation is not very interesting since we saw in Exercise
1.3.19 that the true correction term is of order n−1 .4
§ 2.2 The Berry–Esseen Theorem via Stein’s Method
As we will see in the next section, the principles underlying the passage from
Theorem 2.1.4 to Theorem 2.1.8 are very general. In fact, as we will see in
Chapter 9, some of these principles can be formulated in such a way that they
extend to a very abstract setting. However, rather than delve into such exten-
sions here, I will devote this section to a closer examination of the situation at
hand. Specifically, in this section we are going to see how to make the final part
of Theorem 2.1.8 quantitative.
From (2.1.5), we get a rate of convergence in terms of the second and third
derivatives of ϕ. In fact, if we assume that
1
τk ≡ EP |Xk |3 3 < ∞,
(2.2.1) 1 ≤ k ≤ n,
To see how (2.1.5) and (2.2.2) must be modified in order to gain such information,
first observe that
Z
ϕ0 (x) Fn (x) − G(x) dx
R
(2.2.5) Z
ϕ(y) γ0,1 (dy), ϕ ∈ Cb1 (R; R .
P
= E ϕ(S̆n ) −
R
(To prove (2.2.5), reduce to the case in which ϕ ∈ Cc1 (R; R) and ϕ(0) = 0;
and for this case apply either Fubini’s Theorem or integration by parts over
the intervals (−∞, 0] and [0, ∞) separately.) Hence, in order to get information
about the distance between Fn and G, we will have to learn how to replace
the right-hand sides of (2.1.5) and (2.2.2) with expressions that depend only on
the first derivative of ϕ. For example, if the dependence is on kϕ0 ku , then we
get information about the L1 (R; R) distance between Fn and G, whereas if the
dependence is on kϕ0 kL1 (R;R) , then the information will be about the uniform
distance between Fn and G.
§ 2.2.1. L1 -Berry–Esseen. The basic idea that I will use to get estimates in
terms of ϕ0 was introduced by C. Stein and is an example of a procedure known
as Stein’s method.1 In the case at hand, his method stems from the trivial
observation that if µ is a Borel probability measure
on R and g is the Gauss
kernel in (1.3.5), then µ = γ0,1 if and only if ∂ µg = 0 in the sense of Schwartz
distribution theory. Equivalently, if A+ is the raising operator
D §E
(cf. 2.4.1) given
µ
by A+ ϕ(x) = xϕ(x) − ∂ϕ(x), then, because hA+ ϕ, µi = ϕg, ∂ g , µ = γ0,1
if and only if hA+ ϕ, µi = 0 for sufficiently many test functions ϕ. In fact, as
will be shown in what follows, µ will be close to γ0,1 if, in an appropriate sense,
hA+ ϕ, µi is small.
To make mathematics out of the preceding, I will need the following.
Lemma 2.2.6. Let ϕ ∈ C 1 (R; R), assume that kϕ0 ku < ∞, set ϕ̃ = ϕ−hϕ, γ0,1 i,
and define
Z x
x2 t2
(2.2.7) x ∈ R 7−→ f (x) ≡ e 2 ϕ̃(t)e− 2 dt ∈ R.
−∞
and
Proof: The facts that f ∈ C 1 (R; R) and that (2.2.9) holds are elementary
applications of The Fundamental Theorem of Calculus. Moreover, knowing that
f ∈ C 1 (R; R) and using (2.2.9), we see that f ∈ C 2 (R; R) and, in fact, that
To prove the estimates in (2.2.8), first note that, because ϕ̃ and therefore f are
unchanged when ϕ is replaced by ϕ − ϕ(0), I may and will assume that ϕ(0) = 0
and therefore that |ϕ(t)| ≤ kϕ0 ku |t|. In particular, this means that
Z Z q
ϕ dγ0,1 ≤ kϕ0 ku |t| γ0,1 (dt) = kϕ0 ku 2 .
π
R R
t2
ϕ̃(t)e− 2 dt = 0, an alternative expression for f
R
Next, observe that, because R
is Z ∞
x2 t2
f (x) = −e 2 ϕ̃(t)e− 2 dt, x ∈ R.
x
Thus, by using the original expression for f (x) when x ∈ (−∞, 0) and the
alternative one when x ∈ [0, ∞), we see first that
Z ∞
x2 2
ϕ̃ −t sgn(x) e− t2 dt, x ∈ R,
|f (x)| ≤ e 2
|x|
But, since
Z ∞ Z ∞
d x2
− t2 x2 t2
e2 e 2 dt ≤ e 2 t e− 2 dt − 1 = 0 for x ∈ [0, ∞),
dx x x
which means that I have now proved the first estimate in (2.2.8). To prove the
other two estimates there, derive from (2.2.10)
d − x2 0 x2
e 2 f (x) = e− 2 f (x) + ϕ0 (x)
dx
74 2 The Central Limit Theorem
Thus, reasoning as I did above and using the first estimate in (2.2.8) and the
relations in (2.2.9), (2.2.10), and (2.2.11), one arrives at the second and third
estimates in (2.2.8).
I now have the ingredients needed to apply Stein’s method to the following
example of a Berry–Esseen type of estimate.
Theorem 2.2.12 (L1 -Berry–Esseen Estimate). Continuing in the setting
of Theorem 2.1.4, one has that for all > 0 (cf. (2.1.3), (2.2.3), and (2.2.4))
√
(2.2.13)
Fn − G
1 ≤ 6(rn + ) + 3 2π gn (2).
L (R;R)
6 + 2τ 3 8τ 3
Fn − G
1
L (R;R)
≤ √ ≤ √ .
n n
Proof: Let ϕ ∈ C 1 (R; R) having bounded first derivative be given, and define
f accordingly, as in (2.2.7). Everything turns on the equality in (2.2.9). Indeed,
because of that equality, we know that the right-hand side of (2.2.5) is equal to
n
X
EP f 0 (S̆n ) − EP S̆n f (S̆n ) = E f 0 (S̆n ) − EP X̆m f (S̆n ) ,
2 P
σ̆m
m=1
σm Xm
where I have set σ̆m = Σn and X̆m = Σn . Next, define
Z 1 2 0
EP X̆m f (S̆n ) = EP X̆m f T̆n,m (t) dt
0
Z 1
E f 0 T̆n,m (0) +
2 0
2 P
f (T̆n,m (t) − f 0 T̆n,m (0) dt
= σ̆m EP X̆m
0
§ 2.2 The Berry–Esseen Theorem via Stein’s Method 75
where h i
Am ≡ EP f 0 S̆n ) − f 0 T̆n,m (0)
and h i
2
f 0 (T̆n,m (t) − f 0 T̆n,m (0)
Bm (t) ≡ EP X̆m .
Obviously, by Taylor’s Theorem and Hölder’s Inequality, for each 1 ≤ m ≤ n,
00 τm
(*) |Am | ≤ σ̆m kf ku ≤ rn ∧ kf 00 ku
Σn
while, for each t ∈ [0, 1] and > 0,
kf 0 ku h 2 i
2
kf 00 ku + 2 2 EP Xm
Bm (t) ≤ 2tσ̆m , |Xm | ≥ 2Σn .
Σn
Thus, after summing over 1 ≤ m ≤ n, integrating with respect to t ∈ [0, 1], and
using (2.2.5), (2.2.15), and (*), we arrive at
Z
ϕ0 (x) Fn (x) − G(x) dx ≤ rn + kf 00 ku + 2gn (2)kf 0 ku ,
R
p π Let
Lemma 2.2.16. ϕ ∈ C 1 (R; R), and define f accordingly, as in (2.2.7).
Then kf ku ≤ 8 kϕ kL1 (R;R) and kf 0 ku ≤ kϕ0 kL1 (R;R) .
0
Proof: I will assume, throughout, that kϕ0 kL1 (R;R) = 1. Observe that, by the
Fundamental Theorem of Calculus, (cf. the notation in Lemma 2.2.6)
Z
ϕ̃(x) = − ϕ̃y (x) ϕ0 (y) dy, where ϕy = 1(−∞,y] ,
R
and √ x2
2πxe 2 G(x ∧ y) − G(x)G(y) + 1(−∞,y] (x) − G(y) ≤ 1
which proves the first inequality. To get the second one, it suffices to consider
each of the four cases 0 ≤ x ≤ y, x ≥ 0 & y < x, y < x < 0, and x < 0 & y ≥ x
separately and take into account that, from the first part of (2.2.11),
√ x2 √ x2
x ≥ 0 =⇒ 2πxe 2 1−G(x) ≤ 1 and x < 0 =⇒ 2π|x|e 2 G(x) ≤ 1.
3
Pn 3 max τm
1 τm 1≤m≤n
(2.2.19) kFn − Gku ≤ 10 3 ≤ 10 √ .
n2 n
Proof: For each n ∈ Z+ , let βn denote the smallest number β with the property
that Pn 3
τ
kFn − Gku ≤ β 1 3 m
Σn
for all choices of random variables satisfying the hypotheses under which (2.2.18)
is to be proved. My strategy is to give an inductive proof that βn ≤ 10 for all
n ∈ Z+ ; and, because Σ1 ≤ τ1 and therefore β1 ≤ 1, I need only be concerned
with n ≥ 2.
Given n ≥ 2 and X1 , . . . , Xn , define X̆m , σ̆m , and T̆n,m (t) for 1 ≤ m ≤ n and
t ∈ [0, 1] as in the proof of Theorem 2.2.12. Next, for each 1 ≤ m ≤ n, set
n X τ` 3
p τm X
3
Σn,m = Σ2n − σm
2 , τ̆m = , ρn = τ̆m , and ρn,m = .
Σn 1
Σn,m
1≤`≤n
`6=m
Finally, set
X Sn,m
Sn,m = X` and S̆n,m = ,
Σn,m
1≤`≤n
`6=m
and let x ∈ R 7−→ Fn,m (x) ≡ P S̆n,m ≤ x ∈ [0, 1] denote the distribution
function for S̆n,m . Notice that, by definition, kFn,m − Gku ≤ βn−1 ρn,m for each
1 ≤ m ≤ n. Furthermore, because (cf. (2.1.3))
3
Σ2n,m
2 Σn
= 1 − σ̆m ≥ 1 − rn2 and ρn,m ≤ ρn ,
Σ2n Σn,m
Now let ϕ ∈ Cb2 (R; R) with kϕ00 kL1 (R) < ∞ be given, define f accordingly, as
in (2.2.7), and let
Z 1
E X̆m ϕ0 T̆n,m (ξ) dξ
P
+
0
Σn,m 0
kf ku + max EP X̆m ϕ0 T̆n,m (ξ)
≤ σ̆m kf ku +
Σn ξ∈[0,1]
kf ku + kf 0 ku + max EP X̆m ϕ0 T̆n,m (ξ) .
≤ σ̆m
ξ∈[0,1]
Similarly, from (2.2.9)) and the independence of X̆m from T̆m,n (0), one sees that
|Bm (t)| is dominated by
h i h 2 i
3
tEP X̆m f T̆n,m (t) + EP X̆m T̆n,m (0) f T̆n,m (t) − f T̆n,m (0)
h i
2
+ EP X̆m ϕ T̆n,m (t) − ϕ T̆n,m (0)
Z 1
P 3 0
+t E X̆m ϕ T̆n,m (tξ) dξ
0
kf ku + kf 0 ku + t max EP X̆m
3
3 0
≤ tτ̆m ϕ T̆n,m (ξ) .
ξ∈[0,1]
In order to handle the second term in the last line of each of these calculations,
introduce the function
0 Σn,m
(ξ, ω, y) ∈ [0, 1] × Ω × R 7−→ ψ(ξ, ω, y) ≡ ϕ ξ X̆m (ω) + y ∈ R.
Σn
§ 2.2 The Berry–Esseen Theorem via Stein’s Method 79
and
0
kϕ k 1
L (R;R) βn−1 ρn
3
|Bm (t)| ≤ tτ̆m kf ku + kf 0 ku + 12 +
00
3 kϕ kL1 (R;R)
2 2
(1 − rn )
2π(1 − rn ) 2
for all 1 ≤ m ≤ n and t ∈ [0, 1], and, after putting these together with (2.2.5)
and (2.2.15), we conclude that
Z
ϕ0 (y) G(y) − Fn (y) dy
R
3
(2.2.21) ≤ kf ku + kf 0 ku
2
kϕ0 kL1 (R;R) βn−1 kϕ00 kL1 (R;R) ρn
+ 1 + 3 ρn .
2π(1 − r2 ) 2 (1 − rn2 ) 2
n
80 2 The Central Limit Theorem
and define
Z
−1
η −1 y h(x − y) dy
h (x) = for > 0 and x ∈ R,
R
and set
ϕ,L (x) = h x−a
Lρn , x ∈ R and , L > 0.
It is then an easy matter to check that kϕ0,L kL1 (R;R) = 1 while kϕ00,L kL1 (R;R) ≤
2
Lρn . Hence, by plugging the estimates from Lemma 2.2.16 into (2.2.21) and
then letting & 0, we find that, for each L > 0,
1 Z a+Lρn
sup G(y) − Fn (y) dy
a∈R Lρn a
(2.2.22)
r
3 π 1 2βn−1
≤ 1+ + 1 + 3 ρn .
2 8 2π(1 − r2 ) 2 (1 − rn2 ) 2 L
n
But Z a Z a+Lρn
1 1
Fn (y) dy ≤ Fn (a) ≤ Fn (y) dy,
Lρn a−Lρn Lρn a
while
Z a+Lρn Z a+Lρn
1 1 Lρn
0≤ G(y) dy − G(a) = (a + Lρn − y) γ0,1 (dy) ≤ √ ,
Lρn a Lρn a 8π
and, similarly, Z a
1 Lρn
0 ≤ G(a) − G(y) dy ≤ √ .
Lρn a−Lρn 8π
Thus, from (2.2.22), we first obtain, for each L ∈ (0, ∞),
r
3 9π 3 3βn−1 L
kFn − Gku ≤ + + 1 + 3 + 1 ρn ,
2 32 8π(1 − rn2 ) 2 (1 − rn2 ) 2 L (8π) 2
Exercises for § 2.2 81
In order to complete the proof starting from (2.2.23), we have to consider the
1 1
two cases determined by whether ρn ≥ 10 or ρn < 10 . Because kFn − Gku ≤ 1,
it is obvious that we can take βn ≤ 10 in the first case. On the other hand, if
1
ρn ≤ 10 and we assume that βn−1 ≤ 10, then, because
n n n
1 X P 3 1 X P 2 32 X
3
≥ rn3 ,
ρn = 3 E |Xm | ≥ 3 E Xm = σ̆m
Σn 1 Σn 1 1
(2.2.23) says that kFn − Gku ≤ 10ρn . Hence, in either case, βn−1 ≤ 10 =⇒
βn ≤ 10.
It is clear from the preceding derivation (in particular, the final step) that the
constant 10 appearing in (2.2.18) and (2.2.19) can be replaced by the smallest
β > 1 that satisfies the equation
r r r
3 9π 9 1
− 23 − 2 4 18 1 2 − 3
β= + + 1−β + β 2 1 − β− 3 4 .
2 32 8π π
Numerical experimentation indicates that 10 is quite a good approximation to
the actual solution of this minimization problem. However, it should be rec-
ognized that, with sufficient diligence and entirely different techniques, one can
show that the 10 in (2.2.18) can be replaced by a number that is less than 1.
Thus, I do not claim that Stein’s method gives the best result, only that it gives
whatever it gives with relatively little pain.
Exercises for § 2.2
1, and set ρ (x) = −N ρ(−1 x) for ∈ (0, ∞). Next, define ψ for ∈ (0, ∞) to
be the convolution ρ ? µ of ρ with µ. That is,
Z
ψ (x) = ρ (x − y) µ(dy) for x ∈ RN .
RN
It is then easy to check that ψ ∈ Cb RN ; C and kψ kL1 (RN ;R) ≤ kµkvar for every
∈ (0, ∞). In addition, one sees
(by Fubini’s Theorem) that ψ̂ (ξ) = ρ̂( ξ)µ̂(ξ).
Thus, for any ϕ ∈ Cb (RN ; C ∩ L1 RN ; C , Fubini’s Theorem followed by the
classical Parseval Identity (cf. Exercise 2.3.23) yields
Z Z
1
hϕ , µi = ϕ(x) ψ (x) dx = ρ̂( ξ) ϕ̂(ξ) µ̂(−ξ) dξ,
RN (2π)N RN
Notice that when N = 1, the above use of the notation Σn and S̆n is consistent
with that in § 2.1.1.
With these preparations, I am ready to prove the following multidimensional
generalization of Theorem 2.1.8.
Theorem 2.3.8. Referring to the preceding, assume that the limit
Cn
(2.3.9) A ≡ lim
n→∞ Σ2n
|ϕn (y)|
(2.3.11) sup sup <∞
n≥1 y∈RN 1 + |y|2
86 2 The Central Limit Theorem
Σn (e)
q
Σn (e) = e, Cn e RN and ρn (e) = .
Σn
p
Then, ρ(e) ≡ inf n≥1 ρn (e) ∈ (0, 1] and ρn (e) −→ (e, Ae)RN as n → ∞. In
particular, if (e1 , . . . , eN ) is an orthonormal basis in RN , then
N N
X 2 X
EP |S̆n |2 = ρn (ei )2
EP ei , S̆n RN =
i=1 i=1
N
X Z
|y|2 γ0,A (dy).
−→ ei , Aei RN
=
i=1 RN
Hence, by Lemmas 2.1.7 and 2.3.3 plus (2.3.7), all that we have to do is check
that
h √ i 1
(*) fn (ξ) ≡ EP e −1 (ξ,S̆n )RN −→ e− 2 (ξ,Aξ)RN
for each ξ ∈ RN .
ξ
When ξ = 0, (*) is trivial. Thus, assume that ξ 6= 0, set e = |ξ| , and take
(e,Sn )RN
S̆n (e) = Σn (e) . Because
n
1 X 2
2
EP e, Xm RN , e, Xm RN ≥ Σn (e)
Σn (e) m=1
n
1 X 2
≤ 2 2
EP e, Xm RN , e, Xm RN ≥ ρ(e)Σn (e)
ρ(e) Σn m=1
§ 2.3 Some Extensions of The Central Limit Theorem 87
tends to 0 for each > 0, Theorem 2.1.8 combined with Lemma 2.3.3 guarantees
that, for any η ∈ R,
√ 1 2
EP e −1 ηn S̆n (e) −→ e− 2 |η|
p
for any {ηn : n ≥ 1} ⊆ R that tends to η. In particular, if η = (ξ, Aξ)RN and
ηn = ρn (e)|ξ|, we find that
√ 1
fn (ξ) = EP e −1 ηn S̆n (e) −→ e− 2 (ξ,Aξ)RN .
Proof: By Lemma 2.1.7, all that we have to prove is that hψ, µn i −→ hψ, µi.
For this purpose, note that, under our present hypotheses, Lemma 2.1.7 shows
that hψ, µi ≤ limn→∞ hψ, µn i < ∞ and that hψ ∧ R, µn i −→ hψ ∧ R, µi ≤ hψ, µi
for each R > 0. Thus, it suffices to observe that
Z
suph(ψ − ψ ∧ R), µn i = sup ψ dµn ≤ R1−p suphψ p , µn i −→ 0
n≥1 n≥1 {ψ>R} n≥1
as R → ∞.
Knowing Lemma 2.3.15, one’s problem is to find conditions under which one
can show that supn≥1 EP [ψ(S̆n )] < ∞ for an interesting class of non-negative
ψ’s. One such class is provided by the notion of a sub-Gaussian random vari-
able. Given β ∈ [0, ∞), an RN -valued random variable X is said to be β-sub-
Gaussian if
β 2 |ξ|2
EP e(ξ,X)RN ≤ e 2 , ξ ∈ RN .
(2.3.17)
Proof: Since the moment generating function of the sum of independent ran-
dom variables is the product of the moment generating functions of the sum-
mands, the final assertion is essentially trivial.
To prove the first assertion, use Lebesgue’s Dominated Convergence Theorem
to justify
β 2 t2
P
−1
P t(e,X)RN
e 2 −1
±E (e, X)RN = lim t E e − 1 ≤ lim =0
t&0 t&0 t
§ 2.3 Some Extensions of The Central Limit Theorem 89
and
β 2 t2
EP et(e,X)RN + EP e−t(e,X)RN − 2
2 e 2 −1
= β2
P
E (e, X)RN = lim ≤ 2 lim
t&0 t2 t&0 t2
β 2 t2
−tR P
t(e,X) N
P (e, X)RN ≥ R) ≤ e E e R ≤ exp −tR +
2
R2
−
for any ≥ 0 and e ∈ SN −1 , one gets P (e, X)RN ≥ R) ≤ e 2β 2 by minimizing
over t ≥ 0. Since
1
P |X| ≥ R ≤ 2N max P (e, X)| RN ≥ N − 2 R ,
e∈SN −1
α2 |X|2
the estimate for P(|X| ≥ R) follows. To get the estimate on EP e 2 , use
Tonelli’s Theorem to see that
Z Z
α2 |X|2 β 2 |ξ|2 − N
EP e(ξ,X)RN γ0,α2 I (dξ) ≤ e 2 γ0,α2 I (dξ) = 1−(αβ)2 2 .
EP e 2 =
R R
α2 |X|2
Now assume that A = EP e 2 < ∞ for some α ∈ (0, ∞) and that EP [X]
= 0. Then
1
|ξ|2 P 2 |ξ||X|
Z
EP e(ξ,X)RN = 1 + (1 − t)EP (ξ, X)2RN et(ξ,X)RN dt ≤ 1 +
E |X| e
0 2
|ξ|2 |ξ|22 P 2 α2 |X|2 |ξ|2 |ξ|22 A|ξ|2
|ξ|2
≤1+ e E |X| e
α 4 ≤1+A 2 e α ≤ 1+ e α2 ,
2 α α2
ξ2 ξ2
≤ − gn ()
2 2
and that
n
X ξXm
P
E 1 − cos , |Xm | ≥ Σn ≤ −2 .
m=1
Σ n
Finally, combine these and apply (ii) to get limn→∞ ξ 2 gn () ≤ −2 for all ξ ∈ R.
Exercise 2.3.21. It is of some interest to know that the second moment
assumption can be removed from the hypotheses in Exercise 2.1.11 and that
the result there extends to Borel probability measures on RN .R To explain what
I have in mind, first use that exercise to see that if σ 2 = R x2 µ(dx) < ∞,
then µ = T2 µ =⇒ µR∈ N (0, σ 2 ). What I want to do now is remove the a
priori assumption that R x2 µ(dx) < ∞. That is, I want to show that, for any
probability measure µ on R, µ = T2 µ ⇐⇒ µ ∈ N (0, σ 2 ) for some σ ∈ [0, ∞).
Since the “⇐=” direction is obvious,
R and, by the discussion above, the “ =⇒ ”
direction is already covered when R x2 µ(dx) < ∞, all that remains is to show
that
Z
(2.3.22) µ = T2 µ =⇒ x2 µ(dx) < ∞.
R
Finally, note that 1 − x ≤ − log x for x ∈ (0, 1], apply this to the preceding to
get Z
n
n
1 − cos 2− 2 x µ(dx) ≤ − log µ̂(1) < ∞, n ∈ N,
2
R
92 2 The Central Limit Theorem
and arrive at Z
x2 µ(dx) ≤ −2 log µ̂(1)
R
after an application of Fatou’s Lemma.
(ii) To complete the program, let µ be any solution to µ = T2 µ, and define ν by
ZZ
ν(Γ) = 1Γ (x − y) µ(dx)µ(dy).
R2
(in fact, ν is centered normal). Finally, use this and part (i) of Exercise 1.4.27
to deduce that R x2 µ(dx) < ∞.
R
Using the result just proved when N = 1, show that µ = T2 µ if and only if
µ = γ0,C for some non-negative definite, symmetric C.
Exercise 2.3.23. In connection with the preceding exercise, define Tα µ for
α ∈ (0, ∞) and Borel probability measures µ on RN , so that
ZZ
1
1Γ 2− α (x + y) µ(dx)µ(dy), Γ ∈ BRN .
Tα µ(Γ) =
RN ×RN
The problem under consideration here is that of determining for which α’s there
exist nontrivial (i.e., µ 6= δ0 ) solutions to the fixed point equation µ = Tα µ.
Begin by reducing the problem to the case when N = 1. Next, repeat the initial
argument given in part (ii) of Exercise 2.3.21 to see that there is some solution
if and only if there is one that is symmetric. Assuming that µ is a non-trivial,
symmetric solution, use the reasoning in part (i) there to see that
∞ if α ∈ (0, 2)
Z
2
x µ(dx) =
R 0 if α ∈ (2, ∞).
In particular, when α ∈ (2, ∞), there are no non-trivial solutions to µ = Tα µ.
(See § 3.2.3 for more on this topic.)
Exercise 2.3.24. Return to the setting of Exercise 2.1.13. After noting that,
so long as e ∈ Sn−1 , the distribution of
√
x ∈ Sn−1 n 7−→ (e, x)Rn ∈ R
is independent of e, use Lemma 2.3.3 to prove that the assertion in (2.1.15)
follows as a consequence of the one in (2.1.14).
Exercises for § 2.3 93
q
where q 0 = q−1 is the Hölder conjugate of q.
(iii) Suppose that X1 , . . . , Xn are independent and that, for each 1 ≤ m ≤ n,
2
Xm is βm -sub-Gaussian and has variance σm . Given {a1 , . . . , an } ⊆ R, set
v v
Xn u n u n
uX uX
S= am Xm , Σ = t (am σm )2 , and B = t (am βm )2 ,
m=1 m=1 m=1
(iv) The most famous case of the situation discussed in (iii) is when the Xm ’s are
symmetric Bernoulli (i.e., P(Xm = ±1) = 12 ). First use (iii) in Exercise 1.3.17
or direct computation to check that Xm is 1-sub-Gaussian, and then conclude
that
n
! p2 " n p # n
! p2
−(1− p +
X
2)
X X
(2.3.27) K4 a2m P
≤E am Xm ≤ Kp a2m
m=1 m=1 m=1
Hint: Refer to the beginning of the proof of Lemma 1.1.6, and let R1 , . . . , Rn be
the Rademacher functions on [0, 1), set Q = λ[0,1) × P on [0, 1) × Ω, B[0,1) × F ,
and observe that
Xn
ω ∈ Ω 7−→ S(ω) ≡ Xm (ω)
1
does under Q. Next, apply Khinchine’s inequality to see that, for each ω ∈ Ω,
n
! p2 Z n
! p2
−(1− p
2)
+
T (t, ω)p dt ≤ Kp
X X
Xm (ω)2 Xm (ω)2
K4 ≤ ,
1 [0,1) 1
and complete the proof by taking the P-integral of this with respect to ω.
At least when p ∈ (1, ∞), I will show later that this sort of inequality holds
in much greater generality. Specifically, see Burkholder’s Inequality in Theorem
6.3.6.
Exercise 2.3.29. Suppose that X is an RN -valued Gaussian random variable
with mean value 0 and covariance C.
(i) Show that if A : RN −→ RN is a linear transformation, then AX is an
N (0, ACA> ) random variable, where A> is the adjoint transformation.
Exercises for § 2.3 95
(v) Continuing with the assumption that C(22) is non-degenerate, show that
C(12) C−1
X= (22) Y +
Z
,
Y 0
Exercise 2.3.30. Given h ∈ L2 (RN ; C), recall that the (n + 2)-fold convolution
h?(n+2) is a bounded continuous function for each n ∈ N. Next, assume that
h(−x) = h(x) for almost every x ∈ RN and that h ≡ 0 off of BRN (0, 1). As an
application of part (iii) in Exercise 1.3.22, show that
" 2 #
?(n+2) (|x| − 2)+
h (x) ≤ 2khk2 2
L N
(R ;C) khkn
1 N
L (R ;C) exp − .
2n
Hint: Note that h ∈ L1 (RN ; C), assume that M ≡ khkL1 (RN ;C) > 0, and define
Af = M −1 h ? f for f ∈ L2 (RN ; C). Show that A is a self-adjoint contraction on
L2 (RN ; C), check that
Tx h, A` h L2 (RN ;C) = 0
if ` ≤ |x| − 2.
x2 dn − x2
(2.4.1) Hn (x) = (−1)n e 2 e 2 , x ∈ R.
dxn
Clearly, Hn is an nth order, real, monic (i.e., 1 is the coefficient of the highest
order term) polynomial. Moreover, if we define the raising operator A+ on
C 1 (R; C) by
x2 d − x2 dϕ
A+ ϕ (x) = −e 2 e 2 ϕ(x) = − (x) + xϕ(x), x ∈ R,
dx dx
then
At the same time, if ϕ and ψ are continuously differentiable functions whose first
derivatives are tempered (i.e., have at most polynomial growth at infinity), then
(2.4.3) ϕ, A+ ψ L2 (γ 0,1 ;C)
= A− ϕ, ψ L2 (γ0,1 ;C)
,
dϕ
where A− is the lowering operator given by A− ϕ = dx . After combining
(2.4.2) with (2.4.3), we see that, for all 0 ≤ m ≤ n,
= Hm , An+ H0 = An− Hm , H0
Hm , Hn L2 (γ0,1 ;C) L2 (γ0,1 ;C) L2 (γ0,1 ;C)
= m! δm,n ,
where, at the last step, I have used the fact that Hm is a monic mth order
polynomial. Hence, the (normalized) Hermite polynomials
Hn (x) (−1)n x2 dn − x2
H n (x) = √ = √ e2 e 2 , x ∈ R,
n! n! dxn
form an orthonormal set in L2 (γ0,1 ; C). (Indeed, they are one choice of the
orthogonal polynomials relative to the Gauss weight.)
Lemma 2.4.4. For each λ ∈ C, set
λ2
H(x; λ) = exp λx − , x ∈ R.
2
Then
∞
X λn
(2.4.5) H(x; λ) = Hn (x), x ∈ R,
n=0
n!
where the convergence is both uniform on compact subsets of R× C and, for λ’s
in compact subsets of C, uniform in L2 (γ0,1 ; C). In particular, H n : n ∈ N is
an orthonormal basis in L2 (γ0,1 ; C).
x2
Proof: By (2.4.1) and Taylor’s expansion for the function e− 2 , it is clear that
(2.4.5) holds for each (x, λ) and that the convergence is uniform on compact
subsets of R × C. Furthermore, because the Hn ’s are orthogonal, the asserted
uniform convergence in L2 (γ0,1 ; C) comes down to checking that
∞ n 2
X λ
lim sup
Hn k2 2
L (γ0,1 ;C) = 0
m→∞ |λ|≤R n=m n!
for every R ∈ (0, ∞), and obviously this follows from our earlier calculation that
2
Hn
2 = n!.
L (γ ;C)
0,1
98 2 The Central Limit Theorem
To prove the assertion that H n : n ∈ N forms an orthonormal basis in
L2 (γ0,1 ; C), it suffices to check that any ϕ ∈ L2 (γ0,1 ; C) that is orthogonal to all
of the Hn ’s must be 0. But, because of the L2 (γ0,1 ; C) convergence in (2.4.5),
we would have that
Z
ϕ(x) eλx γ0,1 (dx) = 0, λ ∈ C,
R
n=1
∞
X
θn ϕ, H N
Hθ ϕ = L2 (γ0,1 ;C)
H n, ϕ ∈ Dom Hθ .
n=0
for all θ ∈ (0, 1) and (x, λ) ∈ R × C. In conjunction with (2.4.5), this means that
Z
(2.4.6) Hθ ϕ = M ( · , y; θ) ϕ(y) γ0,1 (dy), θ ∈ (0, 1) and ϕ ∈ L2 (γ0,1 ; C),
R
and from here it is not very difficult to prove the following properties of Hθ for
θ ∈ (0, 1).
Lemma 2.4.7. For each ϕ ∈ L2 (γ0,1 ; C), (θ, x) ∈ (0, 1) × R 7−→ Hθ ϕ(x) ∈
C may be chosen to be a continuous function that is non-negative if ϕ ≥ 0
Lebesgue-almost everywhere. In addition, for each θ ∈ (0, 1) and every p ∈
[1, ∞],
(2.4.8)
Hθ ϕ
p ≤ kϕkLp (γ0,1 ;C) .
L (γ 0,1 ;C)
Hence, (2.4.8) is now proved for p ∈ [1, ∞). The case when p = ∞ is even easier
and is left to the reader.
The conclusions drawn in Lemma 2.4.7 from the Mehler representation in
(2.4.6) are interesting but not very deep (cf. Exercise 2.4.36). A deeper fact is
100 2 The Central Limit Theorem
the relationship between Hermite multipliers and the Fourier transform. For the
purposes of this analysis, it is best to define the Fourier operator F by
Z √
e −1 2πξx f (x) dx, ξ ∈ R,
(2.4.9) Ff (ξ) =
R
1
for f ∈ L (R; C). The advantage
√ of this choice is that, without the introduction
of any further factors of 2π, the Parseval Identity (cf. Exercise 2.4.37) becomes
the statement that F determines a unitary operator on L2 (R; C). In order to
relate F to Hermite multipliers, observe that, after analytically continuing the
result of another simple Gaussian computation,
Z
2 ζ2
eζx e−πx dx = e 4π for all ζ ∈ C,
R
ϕ ∈ L2 (γ0,1 ; C)
(2.4.15)
Hθ ϕ
q ≤ kϕkLp (γ0,1 ;C) for all
L (γ 0,1 ;C)
if
q1 p1
|1 − θζ|q + |1 + θζ|q |1 − ζ|p + |1 + ζ|p
(2.4.16) ≤
2 2
for every ζ ∈ C.
That (2.4.16) implies (2.4.15) is trivial is quite remarkable. Indeed, it takes
a problem in infinite dimensional analysis and reduces it to a calculus question
about functions on the complex plane. Even though, as we will see later, this
reduction leads to highly non-trivial problems in calculus, Theorem 2.4.14 has
to be considered a major step toward understanding the contraction properties
of Hermite multipliers.3
The first step in the proof of Theorem 2.4.14 is to interpret (2.4.16) in oper-
ator theoretic language. For this
purpose, let β denote the standard Bernoulli
probability measure on R, BR . That is, β {±1} = 12 . Next, use χ∅ to denote
the function on R that is constantly equal to 1 and χ{1} to stand for the iden-
tity function on R (i.e., χ{1} (x) = x, x ∈ R). It is then clear that χ∅ and
χ{1} constitute an orthonormal basis in L2 (β; C); in fact, they are the orthog-
onal polynomials there. Hence, for each θ ∈ C, we can define the Bernoulli
multiplier Kθ as the unique normal operator on L2 (β; C) prescribed by
if F = ∅
χ∅
Kθ χF =
θχ{1} if F = {1}.
2 See Beckner’s “Inequalities in Fourier analysis,” Ann. Math., # 102 #1, pp. 159–182 (1975).
3 Later, in his article “Gaussian kernels have only Gaussian maximizers,” Invent. Math. 12,
pp. 179–208 (1990), E. Lieb essentially killed this line of research. His argument, which is
entirely different from the one discussed here, handles not only the Hermite multipliers but
essentially every operator whose kernel can be represented as the exponential of a second order
polynomial.
102 2 The Central Limit Theorem
where, in the passage to the third line, I have used the continuous form of
Minkowski’s Inequality (it is at this point that the only essential use of the
hypothesis p ≤ q is made).
I am now ready to take the main step in the proof of Theorem 2.4.14.
Lemma 2.4.20. Define An : L2 (β; C) −→ L2 β n ; C) by
Pn
`=1 x`
for x ∈ Rn .
An ϕ (x) = ϕ √
n
and
(2.4.22) Hθ ϕ, ψ = lim Kθ⊗n ◦ An ϕ, An ψ
L2 (γ0,1 ;C) n→∞ L2 (β n ;C)
for every θ ∈ (0, 1). Moreover, if, in addition, either ϕ or ψ is a polynomial, then
(2.4.22) continues to hold for all θ ∈ C.
Proof: Let ϕ and ψ be tempered elements of C(R; C), and define
fn (θ) = Kθ⊗n ◦ An ϕ, An ψ
L2 (β n ;C)
and f (θ) = Hθ ϕ, ψ L2 (γ0,1 ;C)
Notice that (2.4.23) is (2.4.22) for θ ∈ (0, 1) and that In (2.4.21) follows from
(2.4.22) with ϕ = 1, ψ = |ϕ|p , and any θ ∈ (0, 1).
In order to prove (2.4.23), I will need to introduce other expressions for f (θ)
and the fn (θ)’s. To this end, set
1 θ
Cθ = ,
θ 1
Next, let, for each x ∈ R\{0}, define kθ (x, · ) to be the probability measure on R
such that kθ x, {±sgnx} = 1±θ
2 , and set kθ (0, · ) = β. Then it is easy to check
104 2 The Central Limit Theorem
R R
that R χ{0} (y) kθ (±1, dy) =R χ{0} (±1) and R χ{1} (y) kθ (±1, dy) = θχ{1} (±1)
and therefore Kθ ϕ(±1) = R ϕ(y) kθ (±1, dy) for all ϕ. Hence, if βθ be the
probability measure on R2 determined by βθ (dx × dy) = kθ (x, dy) β(dx) or,
equivalently,
then Z
Kθ ϕ, ψ L2 (β;C)
= ϕ(x) ψ(y) βθ (dx × dy).
R2
Z+
for all Φ, Ψ ∈ C(Rn ; C). Hence, if (cf. Exercise 1.1.14) Ω = R2 , F = BΩ ,
Z+
and Pθ = βθ , then
Pn
1 Zm
fn (θ) = E Pθ
F √ ,
n
time, because (ϕ, Hm )L2 (γ0,1 ;C) = 0 for m > k, f is also a polynomial of degree
at most k, and therefore (2.4.23) already implies that the convergence extends
to the whole of C and is uniform on compacts. Finally, in the case when ψ,
instead of ϕ, is a polynomial, simply note that
Kθ⊗n ◦ An ϕ, An ψ = Kθ̄⊗n ◦ An ψ, An ϕ
L2 (β n ;C) L2 (β n ;C)
and Hθ ϕ, ψ L2 (γ
= Hθ̄ ψ, ϕ L2 (γ0,1 ;C)
, and apply the preceding.
0,1 ;C)
Proof of Theorem 2.4.14: Assume that (2.4.16) holds for a given pair 1 <
p ≤ q < ∞ and θ ∈ D. We then know that (2.4.19) holds for every n ∈ Z+ .
Hence, by Lemma 2.4.20, if ϕ and ψ are tempered elements of C(R; C) and at
least one of them is a polynomial, then
Hθ ϕ, ψ L2 (γ0,1 ;C) = lim Kθ⊗n ◦ An ϕ, An ψ 2 n
n→∞ L (β ;C)
≤ lim
An ϕ
Lp (β n ;C)
An ψ
Lq0 (β n ;C) = kϕkLp (γ0,1 ;C) kψkLq0 (γ0,1 ;C) .
n→∞
In other words, we now know that, for all tempered ϕ and ψ from C(R; C),
(2.4.24) H ϕ, ψ ≤ kϕkLp (γ0,1 ;C) kψkLq0 (γ0,1 ;C)
θ L2 (γ0,1 ;C)
I will give is entirely different from Nelson’s and is much closer to the ideas
introduced by L. Gross5 as they were developed by Beckner.
Theorem 2.4.25 (Nelson). Let θ ∈ (0, 1) and p ∈ (1, ∞) be given, and set
p−1
q(p, θ) = 1 + .
θ2
Then
ϕ ∈ L2 (γ0,1 ; C),
(2.4.26)
Hθ ϕ
q ≤ kϕkLp (γ0,1 ;C) ,
L (γ0,1 ;C)
Proof: I will leave the proof of (2.4.27) as an exercise. (Try taking ϕ’s of
2
the form eλx .) Also, because γ0,1 is a probability measure and therefore the
left-hand side of (2.4.26) is non-decreasing as a function of q, I will restrict my
attention to the proof of (2.4.26) for q = q(p, θ). Hence, by Theorem 2.4.14, what
I have to do is prove (2.4.16) for every 1 < p < q < ∞ and θ ∈ (0, 1) that are
related by
12
p−1
(2.4.28) θ= .
q−1
I begin with the case when 1 < p < q ≤ 2, and I will first consider ζ ∈ [0, 1).
Introducing the generalized binomial coefficients
r r(r − 1) · · · (r − ` + 1)
≡ for r ∈ R and ` ∈ N,
` `!
and
∞
|1 − ζ|p + |1 + ζ|p
X p
=1+ ζ 2k .
2 2k
k=1
5 See Gross’s “Logarithmic Sobolev inequalities,” Amer. J. Math. 97 #4, pp. 1061–1083
(1975). In this paper, Gross introduced the idea of proving estimates on Hθ from the corre-
sponding estimates for Kθ . In this connection, have a look at Exercises 2.4.39 and 2.4.41.
§ 2.4 An Application to Hermite Multipliers 107
q
Noting that, because q ≤ 2, 2k ≥ 0 for every k ∈ Z+ , and using the fact that,
p
because pq ∈ (0, 1), (1 + x) q ≤ 1 + pq x for all x ≥ 0, we see that
pq ∞
|1 − θζ|q + |1 + θζ|q
pX q
≤1+ (θζ)2k .
2 q 2k
k=1
Hence, I will have completed the case under consideration once I check that
∞ ∞
pX q 2k
X p
(θζ) ≤ ζ 2k ,
q 2k 2k
k=1 k=1
But the choice of θ in (2.4.28) makes the preceding an equality when k = 1, and,
when k ≥ 2,
p q 2k−1
2k
q 2k θ
Y j−q
p
≤ ≤ 1,
2k j=2
j−p
|1 − ζ| + |1 + ζ| |1 − ζ| − |1 + ζ| b
a= , b= , and c = ∈ [−1, 1].
2 2 a
Then
|1 ± θζ| = 1+θ
2 (1 ± ζ) +
1−θ
2 (1 ∓ ζ) ≤ a ∓ θb,
and, therefore, by the preceding applied to c, we have that
1 1
|1 − θζ|q + |1 + θζ|q q |1 − θc|q + |1 + θc|q q
≤a
2 2
1 1 1
|1 − c|p + |1 + c|p p |a − b|p + |a + b|p p |1 − ζ|p + |1 + ζ|p p
≤a = = .
2 2 2
Hence, I have now completed the case when 1 < p < q ≤ 2 and θ is given by
(2.4.28).
108 2 The Central Limit Theorem
To handle the other cases, I will use the equivalence of (2.4.16) and (2.4.17).
Thus, what we already know is that (2.4.17) holds for 1 < p < q ≤ 2 and the θ
in (2.4.28). Next, suppose that 2 ≤ p < q < ∞. Then, since 1 < q 0 < p0 ≤ 2 and
p−1 q0 − 1
= 0 ,
q−1 p −1
≤ kϕkLp (β;C) ,
where the θ is the one given in (2.4.28). Thus, the only case that remains is the
1 1
one when 1 < p ≤ 2 ≤ q < ∞. But, in this case, set ξ = (p − 1) 2 , η = (q − 1)− 2 ,
and observe that, because the associated θ in (2.4.28) is the product of ξ with
η, Kθ = Kη ◦ Kξ and therefore
Kθ ϕ
q ≤
Kξ ϕ
L2 (β;C) ≤ kϕkLp (β;C) .
L (β;C)
(*)
h 2 i p2 h 2 i p2 p1
0 2 0 2
1−ξ + (p − 1)η + 1+ξ + (p − 1)η
≤
2
for all ξ, η ∈ R.
To prove (*), consider,
1for each α ∈ (0, ∞), the function gα : [0, ∞)2 −→ [0, ∞)
1 α
defined by gα (x, y) = x α +y α . It is an easy matter to check that gα is concave
or convex depending on whether α ∈ [1, ∞) or α ∈ (0, 1). In particular, since
p0 p0 0 p0 0
p
and similarly, because 2 ∈ (0, 1),
h 2 i p2 h 2 i p2
0 2 0 2
1−ξ + (p − 1)η + 1+ξ + (p − 1)η
2
" p2 # p2
|1 − ξ|p + |1 + ξ|p
≥ + (p0 − 1)η 2 .
2
0 0
! 20 p2
p
|1 − η|p + |1 + η|p |1 − ξ|p + |1 + ξ|p
(**) +(p−1)ξ ≤ 2
+(p0 −1)η 2 .
2 2
But because (cf. Theorems 2.4.14 and 2.4.25) we know that (2.4.16) holds with
1
p replaced by 2, q = p0 , and θ = p − 1 2 , the left side of (**) is dominated by
1 2 1 2
1 − (p0 − 1) 2 η + 1 + (p0 − 1) 2 η
(p − 1)ξ + 2
= 1 + (p − 1)ξ 2 + (p0 − 1)η 2 .
2
At the same time, again by (2.4.16), only this time with p, 2, and the same
choice of θ, we see that the right-hand side of (**) dominates
1 2 1 2
0 1 − (p − 1) 2 ξ + 1 + (p − 1) 2 ξ
(p − 1)η + 2
= 1 + (p − 1)ξ 2 + (p0 − 1)η 2 .
2
f ◦ π 1 + f ◦ π2
(2.4.34) f ◦S = √ λ2R -almost everywhere,
2
then there is an α ∈ R such that f (x) = αx for λR1 -almost every x ∈ R. Here
are steps which one can take to prove this result.
(i) After noticing that (2.4.34) holds when λR is replaced by γ0,1 , apply Exercise
2.3.21 to see that the γ0,1 -distribution of x f (x) is γ0,α for some α ∈ [0, ∞).
Conclude, in particular, that f ∈ L2 (γ0,1 ; R).
Exercises for § 2.4 111
(ii) For each n ≥ 0, let Z (n) denote span {Hn ◦ π1 Hn−m ◦ π2 : 0 ≤ m ≤ n} .
S∞
Show that Z (m) ⊥ Z (n) in L2 (γ0,1
2
; R) when m 6= n and the span of n=0 Z (n)
is dense in L2 (γ0,1
2
; R). Conclude from these that if F ∈ L2 (γ0,1
2
; R), then F =
P∞ (n)
n=0 Πn F , where Πn denotes orthogonal projection onto Z and the series
convergences in L2 (γ0,1 ; R).
(iii) Using the generating (2.4.5), show that
n
−n
X n
Hn ◦ S = 2 2 Hm ◦ π1 Hn−m ◦ π2 ,
m=0
m
is a strictly convex function that tends to 0 at both end points and is therefore
strictly negative. Hence, Ap < 1 for p ∈ (1, 2).
112 2 The Central Limit Theorem
Check that Π takes B(E; C) into itself and that kΠϕku ≤ kϕku . Next, given a
σ-finite measure µ on (E, B), say that µ is Π-invariant if
Z
µ(Γ) = Π(x, Γ) µ(dx) for all Γ ∈ B.
E
Using Jensen’s Inequality, first show that, for each p ∈ [1, ∞),
p
Πϕ (x) ≤ Π|ϕ|p (x), x ∈ E,
R
(ii) Prove the Logarithmic Sobolev Inequality
Z 2 Z
2 ϕ
2
(2.4.40) ϕ log kϕk 2 dβ ≤ 2 ϕ(x) − ϕ(−x) β(dx)
L (β;C)
R R
for strictly positive ϕ’s on R.
114 2 The Central Limit Theorem
Hint: Reduce to the case when ϕ(x) = 1 + bx for some b ∈ (0, 1), and, in this
case, check that (2.4.40) is the elementary calculus inequality
(iii) By plugging (2.4.40) into (**), arrive at (*), and conclude that (2.4.17)
holds for θ ∈ (0, 1) and q = 1 + p−1
θ2 .
Exercise 2.4.41. The major difference between Gross’s and Beckner’s ap-
proaches to proving Nelson’s Theorem 2.4.25 is that Gross based his proof on
the equivalence of contraction results like (2.4.17) and (2.4.15) to Logarithmic
Sobolev Inequalities like (2.4.40). In Exercise 2.4.38, I outlined how one passes
from a Logarithmic Sobolev Inequality to a contraction result. The object of this
exercise is to go in the opposite direction. Specifically, starting from (2.4.26),
show that
Z 2 Z
(2.4.42) 2
ϕ log ϕ
kϕkL2 (γ dγ0,1 ≤ 2 |ϕ0 |2 γ0,1 (dx)
R 0,1 ;C) R
The results in this chapter are an attempt to answer the following question.
GivenPan RN -valued random variable Y with the property that, for each n ∈ Z+ ,
n
Y = m=1 Xm , where X1 , . . . , Xn are independent and identically distributed,
what can one say about the distribution of Y?
Recall that the convolution ν1 ? ν2 of two finite Borel measures ν1 and ν2 on
RN is given by
ZZ
ν1 ? ν2 (Γ) = 1Γ (x + y) ν1 (dx)ν2 (dy), Γ ∈ BRN ,
RN ×RN
and that the distribution of the sum of two independent random variables is the
convolution of their distributions. Thus, the analytic statement of our problem
is that of describing those probability measures µ that, for each n ≥ 1, can be
written as the n-fold convolution power µ?n 1 of some probability measure µ n1 .
n
I will say that such a µ is infinitely divisible and will use I(RN ) to denote
the class of infinitely divisible measures on RN . Since the Fourier transform
takes convolution into ordinary multiplication, the Fourier formulation of this
problem is that of describing those Borel probability measures on RN whose
Fourier transform µ̂ has, for each n ∈ Z+ , an nth root which is again the Fourier
transform of a Borel probability measure on RN .
Not surprisingly, the Fourier formulation of the problem is, in many ways, the
most amenable to analysis, and it is the formulation in terms of which I will solve
it in this chapter. On the other hand, this formulation has the disadvantage that,
although it yields a quite satisfactory description of µ̂, it leaves the problem
of extracting information about µ from properties of µ̂. For this reason, the
following chapter will be devoted to developing a probabilistic understanding of
the analytic answer obtained in this chapter.
§ 3.1 Convergence of Measures on RN
In order to carry out our program, I will need two important facts about the
convergence of probability measures on RN . The first of these is a minor modifi-
cation of the classical Helly–Bray Theorem, and the second is an improvement,
due to Lévy, of Lemma 2.3.3.
115
116 3 Infinitely Divisible Laws
Hence, if
∞
X
µ(Γ) ≡ lim µ` Γ ∩ B(0, `) = µ` Γ ∩ B(0, `) \ B(0, ` − 1) ,
`→∞
`=1
and
1
µ B(0, N 2 R){ ≤ N sup µ {y : |(e, y)RN | ≥ R}
e∈SN −1
(3.1.5)
N
≤ max 1 − µ̂(ξ) : |ξ| ≤ r .
s(rR)
118 3 Infinitely Divisible Laws
r
!
sin r(e, y)RN
Z Z
1
1 − µ̂(te) dt ≥ 1− µ(dy)
r 0 RN \{0} r(e, y)RN
≥ s(rR)µ {y : |(e, y)RN | ≥ R} ,
and therefore
(3.1.7) sup 1 − µ̂(ξ) ≥ s(rR)µ {y : |(e, y)RN | ≥ R} .
ξ∈B(0,r)
Exercise 3.1.9. One might think that to address the sort of problem posed
at the beginning of this chapter, it would be helpful to know which functions
f : RN −→ C are the Fourier transforms of a probability measure. Such a
characterization is the content of Bochner’s Theorem, whose proof will be
outlined in this exercise. Unfortunately, his characterization looks more useful
than it is in practice. For instance, I will not use it to solve our problem, and it
is difficult to see how its use would simplify matters.
In order to state Bochner’s Theorem, say that a function f : RN −→ C is
N
non-negative definite if, for each n ≥ 1 and ξ1 , . . . , ξn ∈ R , the matrix
f (ξi − ξj ) 1≤i,j≤n is Hermitian and non-negative definite. Equivalently,1
n
X
f (ξi − ξj )ζi ζ̄j ≥ 0 for all ζ1 , . . . , ζn ∈ C.
i,j=1
In particular, when f ∈ L1 RN ; C , set
√
Z
−N
m(x) = (2π) e− −1 (x,ξ)RN
f (ξ) dξ,
RN
1 Recall that a non-negative definite operator on a complex Hilbert space is always Hermitian.
120 3 Infinitely Divisible Laws
and use Parseval’s Identity and Fubini’s Theorem, together with elementary
manipulations, to arrive at
Z ZZ
(2π)N m(x) ψ(x)2 dx = f (ξ − η)ψ̂(ξ)ψ̂(η) dξ dη ≥ 0
RN
RN ×RN
for all ψ ∈ L1 (RN ; R) ∩ Cb (RN ; R) with ψ̂ ∈ L1 (RN ; R). Conclude that m is non-
negative, and use this to complete the proof in the case when f ∈ L1 RN ; C .
(iii) It remains only to pass from the case when f ∈ L1 RN ; C to the general
|x|2
case. For each t ∈ (0, ∞), set ft (x) = e−t 2 f (x). Clearly, ft (0) = 1 and
ft ∈ Cb (RN ; C) ∩ L1 (RN ; C). In addition, show that
Xn Z Xn
ft ξi − ξj ζi ζ̄j = f ξi − ξj ζi (x)ζ̄j (x) γ0,tI (dx) ≥ 0,
i,j=1 RN i,j=1
√
where ζi (x) ≡ ζi e −1 (ξi ,x)RN . Hence, ft is also non-negative definite, and so,
by part (ii), we know thatft = µbt for some µt ∈ M1 (RN ). Finally, apply Lévy’s
Continuity Theorem to see that µt =⇒µ, where µ ∈ M1 (RN ) satisfies f = µ̂.
(iv) Let {µn : n ≥ 1} and f be as in Theorem 3.1.8. Combining Bochner’s
Theorem with Lemma 2.1.7, show that there exists a µ ∈ M1 (RN ) such that
f = µ̂ and µn =⇒ µ if and only if f is continuous.
Exercise 3.1.10. Suppose that f is a non-negative definite function with f (0) =
1. As we have just seen, if f is continuous, then f = µ̂ for some µ ∈ M1 (RN ).
(i) Assuming that f = µ̂, show that
Next, show that (*) follows directly from non-negative definiteness, whether
or not f is continuous. Thus, a non-negative definite function is uniformly
continuous everywhere if it is continuous at the origin.
Hint: Both parts of (*) follow from the fact that
1 f (ξ) f (η)
A = f (ξ) 1 f (ξ − η)
f (η) f (ξ − η) 1
is non-negative
definite. To get the second part, consider the quadratic form
v, Av C3 with v = (v1 , 1, −1).2
2 This choice of v was suggested to me by Linan Chen.
Exercises for § 3.1 121
Show that, for any orthonormal basis {ei : i ∈ Z+ } in H, the functions Xi (h) =
(ei , h)H , i ∈ Z+ , would be, under µ, a sequence of independent, N (0, 1)-random
variables, and conclude from this that
Z
2 Y 2
e−khkH µ(dh) = Eµ e−Xi = 0.
H i∈Z+
Hence, no such µ can exist. See Chapter 8 for a much more thorough account
of this topic.
Hint: The non-negative definiteness of f can be seen as a consequence of the
analogous result for Rn .
Exercise 3.1.12. The Riemann–Lebesgue Lemma says that fˆ(ξ) −→ 0
as |ξ| → ∞ if f ∈ L1 (RN ; C). Thus µ̂(ξ) −→ 0 as |ξ| → ∞ if µ ∈ M1 (R)
is absolutely continuous. In this exercise we will examine situations in which
µ ∈ M1 (R) but µ̂(ξ)−→
6 0 as |ξ| → ∞.
(i) Given a symmetric µ ∈ M1 (R), show that µ̂ is real valued, and use Bochner’s
Theorem to show that µ̂(ξ) cannot tend to a strictly negative number as |ξ| → ∞.
Hint: Let α > 0, and suppose that µ̂(ξ) −→ −2α as |ξ| → ∞. Choose R > 0
+
so that µ̂(ξ) ≤
−α for |ξ| ≥ R and n ∈ Z so that (n − 1)α > 1. Set A =
µ̂(`R − kR) 1≤k,`≤n , and show that A cannot be non-negative definite.
122 3 Infinitely Divisible Laws
(ii) Show that µ̂(ξ)−→ 6 0 if µ has an atom (i.e., µ({x}) > 0 for some x ∈ R).
Hint: Reduce to the case in which µ is symmetric, and therefore that µ = pδ0 +
qν, where p ∈ (0, 1], q = 1 − p, and ν ∈ M1 (R) is symmetric. If p = 1, µ̂(ξ) = 1
for all ξ. If p ∈ (0, 1), then µ̂(ξ) −→ 0 as |ξ| → ∞ implies ν̂(ξ) −→ − pq < 0.
(iii) To produce an example that is non-atomic, refer to Exercise 1.4.29, take
p ∈ (0, 1) \ { 12 }, and let µ = µp , where µp is the measure described in that
exercise. Show that µ is a non-atomic element of M1 (R) for which µ̂−→ 6 0 as
|ξ| → ∞.
Hint: Show that µ̂ never vanishes and that µ̂(2m π) is independent of m ∈ Z+ .
§ 3.2 The Lévy–Khinchine Formula
Throughout, I(R ) will be the set of µ ∈ M1 (RN ) that are infinitely divisible.
N
and therefore that πα,ν = π ?nα . To see why the Poisson measures provide a
n ,ν
more hopeful choice of starting point, let m ∈ RN and a non-negative definite,
symmetric C be given, and choose (e1 , . . . , eN ) to be p an orthonormal basis of
eigenvectors for C. Next, set mi = (m, ei )RN and σi = (ei , Cei )RN , and take
N N
!
1 X 1X
νn = δ mi ei + δ σi ei + δ− σ√i ei .
2N i=1 n 2 i=1 √n n
§ 3.2 The Lévy–Khinchine Formula 123
N √ N
!
X −1mi (ξ,ei ) N
R
X
σi (ξ, ei )RN
exp n e n −1 + n cos 1 −1 ,
i=1 i=1
n2
√
Z
−1(ξ,y)RN
π
d M (ξ) = exp e − 1 M (dy) .
Before turning to the proof of (3.2.1), I need the following simple lemma about
non-vanishing, C-valued functions. In its statement, and elsewhere,
∞
X (1 − ζ)m
(3.2.2) log ζ = − for ζ ∈ C with |1 − ζ| < 1
m=1
m
is the principle branch of logarithm function on the open unit disk around 1 in
the complex plane.
Lemma 3.2.3. Let R ∈ (0, ∞) be given. If f ∈ C B(0, R); C \ {0} with
f (0) = 1, then there is a unique `f ∈ C B(0; R); C such that `f (0) = 0 and
124 3 Infinitely Divisible Laws
f (η)
f = e`f . Moreover, if ξ ∈ B(0; R), r ∈ (0, ∞), and 1 − < 1 for all
f (ξ)
η ∈ B(ξ, r) ∩ B(0, R), then, for each η ∈ B(ξ, r) ∩ B(0, R),
f (η)
`f (η) − `f (ξ) = log ,
f (ξ)
and therefore
f (η) f (η) 1
|`f (η) − `f (ξ)| ≤ 2 1 − if 1 −
≤ .
f (ξ) f (ξ) 2
˜(ξ)
f
` ˜(ξ) − `f (ξ) ≤ 2 1 −
for ξ ∈ B(0, R).
f
f (ξ)
In particular, if {fn : n ≥ 1} ⊆ C B(0, R); C \ {0} with fn (0) = 1 for all n ≥ 1,
and if fn −→ f ∈ C B(0; R); C \ {0} uniformly on B(0, R), then f (0) = 1 and
`fn −→ `f uniformly on B(0; R).
Proof: To prove the existence and uniqueness of `f , begin by observing that
there exists an M ∈ Z+ and 0 = r0 < r1 < · · · < rM = R such that
1 − f (ξ) 1
≤ for 1 ≤ m ≤ M and ξ ∈ B(0, rm ) \ B(0, rm−1 ).
f rm−1 ξ 2
|ξ|
Set
f (η)
`(η) = `f (ξ) + log for η ∈ B(ξ, r) ∩ B(0, R),
f (ξ)
√
( −12π)−1 `(η) − `f (η) is a continuous, Z-valued func-
and check that η
tion that vanishes at ξ. Hence, ` = `f on B(0, R) ∩ B(ξ, r), and therefore on
B(0, R) ∩ B(ξ, r). Since | log(1 − ζ)| ≤ 2|ζ| if |ζ| ≤ 12 , this completes the proof
of the asserted properties of `f .
˜
Turning to the comparison between `f and `f˜ when 1 − ff (ξ) 1
(ξ) ≤ 2 for all
f˜(ξ)
ξ ∈ B(0, R), set `(ξ) = `f (ξ) + log f (ξ) , check that `(0) = 0 and f˜ = e` , and
˜
conclude that `f˜ − `f = log ff . From this, the asserted estimate for |`f˜ − `f | is
immediate.
Lemma 3.2.4. Define r s(r) as in Lemma 3.1.3, and let µ ∈ M1 (RN ) and
0 < r < R be given. If |1 − µ̂(ξ)| ≤ 12 for all ξ ∈ B(0, r) and there is an
ν ∈ M1 (RN ) such that µ = ν ?n for some
16
(3.2.5) n≥ r
,
s 4R
then πMn =⇒µ. Finally, I(RN ) is closed in the sense that µ ∈ I(RN ) if there
exists a sequence {µk : k ≥ 1} ⊆ I(RN ) such that µk =⇒ µ. In particular, µ n1
is uniquely determined and (3.2.1) holds.
Proof: Let µ ∈ I(RN ) be given. Since there is an r > 0 such that |1− µ̂(ξ)| ≤ 12
for all ξ ∈ B(0, r) and, for all n ∈ Z+ , µ = µ?n 1 for some µ n1 ∈ M1 (RN ),
n
Lemma 3.2.4 guarantees that µ̂ never vanishes. Hence, by Lemma 3.2.3, both
the existence and uniqueness of `µ follow. Moreover, if µ = µ?n 1 , then, from
n
n
µ̂(ξ) = µcn1 (ξ) , we know first that µcn1 never vanishes and then that `µ = n`,
where ` is the unique element of C(RN ; C) satisfying `(0) = 0 and µcn1 = e` . In
1
particular, this proves that µ n1 = e n ` for any µ n1 with µ = µ∗n
1 , and so there is
n
at most one such µ n1 .
Now define Mn as in the statement, and observe that
1
πd (ξ) = exp n µ̂ 1 (ξ) − 1 = exp n e n `µ (ξ) − 1 −→ e`µ (ξ) = µ̂(ξ)
Mn n
and clearly this is more than enough to show that µ̂ never vanishes. Thus we can
choose a unique ` ∈ C(RN ; C) so that `(0) = 0 and µ̂ = e` . Moreover, if `k = `µk ,
then, by Lemma 3.2.3, `k −→ ` uniformly on compacts. Now let n ∈ Z+ be given,
and choose {µk, n1 : k ≥ 1} ⊆ M1 (RN ) so that µk = µ?n k, 1
. Then we know that
n
1 1
`
1 = e n k , and so, as k → ∞, µ̂ 1 −→ e n
`
µ
[k, n k, n uniformly on compacts. Hence,
1
by Lévy’s Continuity Theorem, e n ` = µ̂ n1 for some µ n1 ∈ M1 (RN ). Since this
means that µ = µ?n1 , we have shown that µ ∈ I(R ).
N
n
and, as I already noted, only the Poisson component M offers much flexibility.
With this in mind, I introduce for each α ∈ [0, ∞) the class Mα (RN ) of Borel
measures M on RN such that
|y|α
Z
M ({0}) = 0 and α
M (dy) < ∞.
RN 1 + |y|
√−1(ξ,y) N √−1(ξ,y) N
Z i Z i
e R − 1 Mr (dy) −→ e R − 1 M (dy)
RN RN
√ h √ √
Z
i
1
e −1(ξ,y)RN − 1 − −1η(y) ξ, y RN Mr (dy).
−1 ξ, m RN
− 2 ξ, Cξ RN
+
RN
Because
√ √
Z h i
1 −1(ξ,y)RN
`r (ξ) = −1 ξ, mr RN
− 2 ξ, Cξ RN
+ e − 1 Mr (dy),
RN
Z
where mr = m − η(y)y Mr (dy),
RN
128 3 Infinitely Divisible Laws
√ h √ √
Z
i
−1(ξ, m)RN − 1
2 (ξ, Cξ)RN + e −1(ξ,y)RN − 1 − −1η(y) ξ, y RN M (dy),
RN
by
h √ √
Z
2 i
e −1(ξ,y)RN − 1 − −1η(y) ξ, y RN + 12 η(y) ξ, y RN Mr (dy)
RN
in the expression for `r . However, to re-write this `r in the form given in (*),
one would have to replace C by
Z
C− η(y)y ⊗ y Mr (dy),
RN
4
sup |1 − µcn1 (ξ)| ≤ ρR + .
|ξ|≤R ns(rρ)
1
Hence, if R ≥ r, then, by taking ρ = 4R , we obtain sup|ξ|≤R |1 − µcn1 (ξ)| ≤ 12
and therefore sup|ξ|≤R | n1 `µ (ξ)| ≤ 2 if n satisfies (3.2.5). Finally, observe that
2
is an > 0 such that s(t) ≥ t for t ∈ (0, 1], and therefore that |`µ (ξ)| ≤
there
64R2
2 1+ r 2 for |ξ| ≤ R, which completes the proof of the first assertion.
Clearly it suffices to prove (3.2.10) when c = 0. Thus, let ϕ ∈ S (RN ; C) be
given. Then, by (2.3.4),
Z
1
N
n e n `µ (ξ) − 1 ϕ̂(ξ) dξ
(2π) n hϕ, µ n1 i − ϕ(0) =
RN
Z 1 Z Z
t
= e n `µ (ξ) `µ (ξ)ϕ̂(ξ) dξ dt −→ `µ (ξ)ϕ̂(ξ) dξ,
0 RN RN
1
where (keeping in mind that |e n `µ | = |µ̂ n1 (ξ)| ≤ 1, `µ (ξ) has a most quadratic
growth, and ϕ̂(ξ) is rapidly decreasing) the passage to the second line is justified
130 3 Infinitely Divisible Laws
x
where ϕR (x) = ϕ R for R > 0. Notice that, by applying the minimum principle
to both 1 and −1, one knows that A1 = 0.
To see that Aµ satisfies both these conditions, first observe that if ϕ(0) =
minx∈RN ϕ(x), then hϕ, µ n1 i − ϕ(0) ≥ 0 for all n ∈ Z+ , and therefore that
Aµ ϕ ≥ 0. Secondly, to check that Aµ is quasi-local, note that it suffices to treat
ϕ ∈ S (RN ; R) and that for such a ϕ, ϕc N
R (ξ) = R ϕ̂(Rξ). Thus,
Z
N
(2π) Aµ ϕR = `µ R−1 ξ ϕ̂(ξ) dξ −→ 0,
RN
|ϕ(y)|
(3.2.13) sup α
< ∞ =⇒ ϕ ∈ L1 (M ; C).
y∈RN \{0} 1 ∧ |y|
Using (3.2.13), one can easily check that if ϕ ∈ Cb2 (RN ; C) and η ∈ S (RN ; R)
equals 1 in a neighborhood of 0, then
y ϕ(y) − ϕ(0) − η(y) y, ∇ϕ(0) RN
(3.2.16)
Z
+ ϕ(y) − ϕ(0) − η(y) y, ∇ϕ(0) RN M (dy).
RN
preserving and has norm Km ; and so, by the Riesz Representation Theorem,
we now know that there is a unique non-negative Borel measure Mm on RN
such that MRm is supported on B(0, 2−m+1 ) \ B(0, 2−m−2 ), Km = Mm (RN ), and
A(χm ϕ) = RN ϕ(y) Mm (dy) for all ϕ ∈ S (RN ; R).
Now define the non-negative Borel measure M on RN by M = m∈Z Mm .
P
and therefore
Z
(3.2.17) Aϕ = ϕ(y) M (dy)
RN
Now let η be as in the statement of the lemma, and set ηR (y) = η(R−1 y) for
R > 0. By (**) with ϕ(y) = |y|2 η(y) we know that
Z
|y|2 η(y) M (dy) ≤ Aϕ < ∞.
RN
§ 3.2 The Lévy–Khinchine Formula 133
By our assumptions about ϕ at 0, we can find a C < ∞ such that |ηR ϕ(y)| ≤
CR|y|2 η(y) for all R ∈ (0, 1]. Hence, by (*) and the M -integrability of |y|2 η(y),
there is a C 0 < ∞ such that |A(ηR ϕ)| ≤ C 0 R for small R > 0, and therefore
A(ηR ϕ) −→ 0 as R & 0.
To complete the proof from here, let ϕ ∈ S (RN ; R) be given, and set
ϕ̃(x) = ϕ(x) − ϕ(0) − η(x) x, ∇ϕ(0) RN − 12 η(x)2 x, ∇2 ϕ(0)x RN .
Then, by the preceding, (3.2.17) holds for ϕ̃ and, after one re-arranges terms,
says that (3.2.16) holds. Thus, the properties of C are all that remain to be
proved. That C is symmetric requires no comment. In addition, from (*), it
is clearly non-negative definite. Finally, to see that it is independent of the η
chosen, let η 0 be a second choice, note that ηξ0 = ηξ in a neighborhood of 0, and
apply (3.2.17).
134 3 Infinitely Divisible Laws
(3.2.21) Z √
√
+ e −1 (ξ,y)RN − 1 − −1η(y) ξ, y RN M (dy)
RN
for any Lévy system (m, C, M ) and any Borel measurable η : RN −→ [0, 1]
that satisfies (3.2.19). Furthermore, because `η(m,C,Mr ) −→ `η(m,C,M ) uniformly
on compacts when Mr (dy) = 1[r,∞) (|y|) M (dy), it is clear that `η(m,C,M ) is
continuous.
Theorem 3.2.22 (Lévy–Khinchine). For each µ ∈ I(RN ), there is a unique
1
`µ ∈ C(RN ; C) such that `µ (0) = 0 and µ̂ = e`µ , and, for each n ∈ Z+ , e n `µ is
the Fourier transform of the unique µ n1 ∈ M1 (RN ) satisfying µ = µ?n1 . Next,
n
let η : RN −→ [0, 1] be a Borel measurable function that satisfies (3.2.19).
Then, for each µ ∈ I(RN ), there is a unique Lévy system (mηµ , Cµ , Mµ ) such
that `µ = `η(mη ,Cµ ,Mµ ) , and, for each Lévy system (m, C, M ), there is a unique
µ
Z
ϕ(y) Mµ (dy) = lim nhϕ, µ n1 i
RN n→∞
Z Z
2
Cµ = lim n η0 (y) y ⊗ y µ n1 (dy) − η0 (y)2 y ⊗ y Mµ (dy),
n→∞ RN RN
and Z
mηµ0 = lim n η0 (y)y µ n1 (dy)
n→∞ RN
given. For µ ∈ I(RN ), I will show that `µ = `η(mη ,C,M ) , where mη , C, and M are
determined from (cf. √
(3.2.10)) Aµ as in Lemma 3.2.14. To this end, define eξ for
ξ ∈ RN by eξ (x) = e −1(ξ,x)RN , and set ηR (x) = η(R−1 x) for R > 0. The idea
136 3 Infinitely Divisible Laws
is to show that, as R → ∞, Aµ (ηR eξ ) tends to both `µ (ξ) and to `η(mη ,C,M ) (ξ).
To check the first of these, use (3.2.10) to see that
Z Z
(2π)N Aµ (ηR eξ ) = ηR (ξ 0 + ξ) dξ 0 =
`µ (ξ 0 )c `µ (R−1 ξ 0 − x)η̂(ξ 0 ) dξ 0 .
RN RN
Hence, since `µ is continuous and, by Lemma 3.2.9, supR≥1 |`µ (R−1 ξ)η̂(ξ)| is
rapidly decreasing, Lebesgue’s Dominated Convergence Theorem says that
Z
lim Aµ (ηR eξ ) = `µ (−ξ)(2π)−N η̂R (ξ 0 ) dξ 0 = `µ (ξ).
R→∞ RN
η
To prove that Aµ (ηR eξ ) also tends to `(mη ,C,M ) (ξ), use (3.2.16) to write
√
Z
Aµ (ηR eξ ) = `η(mη ,C,M ) (ξ) −
−1
1 − ηR (y) eξ (y) M (dy),
RN
and observe that the last term is dominated by M B(0, R){ −→ 0.
So far we know that, for each µ ∈ I(RN ), there is a Lévy system (mη , C, M )
such that `µ (ξ) = `η(mη ,C,M ) . Moreover, in the preliminary discussion at the
beginning of this subsection, it was shown that, for each Lévy system (m, C, M ),
there exists a µ ∈ I(RN ) for which `η(m,C,M ) = `µ .
Finally, let η0 be as in the statement of this theorem. Given µ ∈ I(RN ), let
mµ ∈ RN , Cµ ∈ Hom(RN ; RN ), and Mµ ∈ M2 (RN ) be associated with Aµ as in
η0
C = Cµ , and M = Mµ .
The expression in (3.2.21) for `µ in terms of a Lévy system is known as the
Lévy–Khinchine formula.
Exercises for § 3.2 137
Hint: The first part is completely elementary complex analysis. To handle the
second part, begin by arguing that it is enough to treat the cases when either
M = 0 or C = 0. The case M = 0 is trivial, and the case when C = 0 can be
further reduced to the one in which µ = πM for an M ∈ M0 (RN ) with compact
P∞ m
support in RN \ {0}. Finally, use the representation πM = e−α m=0 αm! ν ?m to
complete the computation in this case.
Exercise 3.2.24. Given µ ∈ I(RN ) and knowing (3.2.20), show that
√ 2
ξ, mµ = − −1 lim t−1 `µ (tξ) + t2 ξ, Cµ ξ RN
and
t→∞
√
Z √
`µ (ξ) = − 12 ξ, Cµ ξ RN + −1 ξ, mµ RN + e −1(ξ,y)RN − 1 Mµ (dy).
RN
r
(i) To prove the “if” assertion, set M (dy) = 1[r,∞) (y) Mµ (dy) for r > 0, and
show that δmµ ? πM r (−∞, 0] = 0 for all r > 0 and δmµ ? πM =⇒µ as r & 0.
r
(ii) As a consequence of (i), we know that the µt ’s are infinitely divisible. Show
that their Lévy–Khinchine representation is
" Z #
√
−y dy
−1 ξy
µbt (ξ) = exp t e −1 e .
(0,∞) y
Exercise 3.2.27. Given a µ ∈ M1 (RN ) for which there exists a strictly increas-
ing sequence {nm : m ≥ 1} ⊆ Z+ and a sequence {µ n1 : m ≥ 1} ⊆ M1 (RN )
m
such that µ = µ?n1 m for all m ≥ 1, show that µ ∈ I(RN ).
nm
Hint: First use Lemma 3.2.4 to show that µ̂ never vanishes and therefore that
there is a unique `µ ∈ C(RN ; C) such that `µ (0) = 0 and µ̂ = e`µ . Next,
proceed as in the proof of Theorem 3.2.7 to show that µ ∈ P(RN ), and apply
that theorem to conclude that µ ∈ I(RN ).
§ 3.3 Stable Laws 139
for Borel measurable ϕ : RN −→ [0, ∞). It is easy to check that T̂α maps
M2 (RN ) into itself.
Lemma 3.3.2. For any α ∈ (0, 2),
( C = 0, Mµ = T̂α Mµ , and
µ
µ ∈ Fα (RN ) ∪ {δ0 } ⇐⇒
Z
1 1
1− α η
(1 − 2 )mµ = η(y) − η(2 α y) y Mµ (dy).
RN
In addition, if M ∈ M2 (RN ) \ {0} satisfies M = T̂α M for some α ∈ (0, 2), then
M ∈ Mβ (RN ) for all β > α but M ∈ / Mα (RN ).
Proof: From the uniqueness of the Lévy system associated with an element of
2
I(RN ), it is clear that, for any µ ∈ I(RN ), MTα µ = T̂α Mµ , CTα µ = 21− α Cµ ,
and Z
η 1 1
1− α η
mTα µ = 2 mµ + η(y) − η(2 α y) y T̂α Mµ (dy).
RN
140 3 Infinitely Divisible Laws
2
Hence, µ ∈ Fα (RN ) ∪ {δ0 } if and only if Mµ = T̂α Mµ , Cµ = 21− α Cµ , and, for
any η satisfying (3.2.19),
Z
1 1
1− α
)mηµ
(1 − 2 = η(y) − η(2 α y) y Mµ (dy).
RN
1
From this we see that κ ≡ M B(0, 1) \ B(0, 2− α ) > 0 unless M = 0 and that
P∞ β
the M -integral of |y|β over B(0, 1) is bounded below by 2−1 κ n=0 2n(1− α ) and
P∞ β
above by κ n=0 2n(1− α ) .
Theorem 3.3.3. µ ∈ F2 (RN ) if and only if µ = γ0,C for some non-negative
definite, symmetric C ∈ Hom(RN ; RN ) \ {0}. If α ∈ (1, 2), then µ ∈ Fα (RN ) if
and only if µ ∈ I(RN ) and `µ (ξ) equals
√
−1
Z
1
1− α
ξ, y RN
M (dy)
1−2
1
2− α <|y|≤1
√ √
Z
−1(ξ,y)RN
+ e −1− −1 1[0,1] (|y|) ξ, y RN
M (dy)
RN
T
for some M ∈ β>α Mβ (RN ) \ Mα (RN ) satisfying M = T̂α M . If α ∈ (0, 1),
then µ ∈ Fα (RN ) if and only if µ ∈ I(RN ) and `µ (ξ) equals
√
Z
−1(ξ,y)RN
e − 1 M (dy)
RN
T
N
for some M ∈ β>α M β (R ) \ Mα (RN ) satisfying M = T̂α M . Finally, µ ∈
F1 (RN ) if and only if µ ∈ I(RN ) and either µ = δm for some m ∈ RN \ {0} or
√ √ √
Z
e −1(ξ,y)RN − 1 − −1 1[0,1] (|y|) ξ, y RN M (dy)
`µ (ξ) = −1 m, ξ RN
+
RN
T
for some m ∈ RN and M ∈ N N
β∈(1,2] Mβ (R ) \ M1 (R ) satisfying M = T̂1 M
and Z
y M (dy) = 0.
1
2 <|y|≤1
§ 3.3 Stable Laws 141
Proof: The first assertion requires no comment. When α ∈ (0, 2), the “if”
1
assertions can be proved by checking that, in each case, `µ (ξ) = 2`µ (2− α ξ).
When α ∈ [1, 2), the “only if” assertion follows immediately from Lemma 3.3.2
with η = 1B(0,1) , and when α ∈ (0, 1), it follows from that lemma combined
with the observation that
Z Z
1
1− α
M = T̂α M =⇒ 1 − 2 y M (dy) = 1
y M (dy).
B(0,1) {2− α <|y|≤1}
§ 3.3.2. α-Stable Laws. The most studied elements of Fα (RN ) are the α-
stable laws: those µ ∈ I(RN )\{δ0 } such that `µ (tξ) = tα `µ (ξ) for all t ∈ (0, ∞),
not just for t = 2. Equivalently, if µ ∈ M1 (RN ) is α-stable if and only if
µ ∈ I(RN ) \ {δ0 } and, for all non-negative, Borel measurable functions ϕ,
Z Z
1
ϕ(y) µt (dy) = ϕ(t α y) µ(dy), t ∈ (0, ∞),
RN RN
where µbt (ξ) = et`µ (ξ) . Thus, there are no α-stable laws if α > 2, and µ is 2-stable
if and only if µ = γ0,C for some C 6= 0. To examine the α-stable laws when
α ∈ (0, 2), I will need the computations contained in the following lemmas.
Lemma 3.3.4. Assume that M ∈ M2 (RN ) and that α ∈ (0, 2), and define the
finite Borel measure ν on SN −1 by
Z
1 y
2 −|y|
hϕ, νi = ϕ |y| |y| e M (dy)
Γ(2 − α) RN \{0}
uniqueness of the Laplace transform (cf. Exercise 1.2.12) implies that ρ(dr) =
hϕ1 , νir1−α dr, and therefore that
Z Z Z
y
ϕ2 (r) dr
ϕ1 |y| M (dy) = ρ(dr) = hϕ1 , νi ϕ1 (r) .
RN \{0} (0,∞) r2 (0,∞) r1+α
Lemma 3.3.6. Let µ ∈ I(RN ). Then µ is 2-stable if and only if µ = γ0,C for
some symmetric, non-negative definite C 6= 0; µ is α-stable for some α ∈ (0, 1)
if and only if there is a finite, non-negative Borel measure ν 6= 0 on SN −1 such
that !
√−1(ξ,rω) N
Z Z
dr
`µ (ξ) = e R − 1 1+α ν(dω);
SN −1 (0,∞) r
and µ is α-stable for some α ∈ (1, 2) if and only if there is a finite, non-negative,
Borel measure ν 6= 0 on SN −1 such that `µ (ξ) equals
√
−1
Z
ξ, ω RN
ν(dω)
1−α SN −1
!
√−1(ξ,rω) N √
Z Z
dr
+ e R − 1 − −11[0,1] (r) ξ, rω RN 1+α ν(dω).
SN −1 (0,∞) r
§ 3.3 Stable Laws 143
Proof: The sufficiency part of each case is easy to check directly or as a conse-
quence of Theorem 3.3.3. To prove the necessity, first check that if µ is α-stable
and therefore `µ (tξ) = tα `µ (ξ), then M must have the scaling property in (3.3.5)
and therefore have the form described in Lemma 3.3.4. Second, when M has
this form, simply check that in each case the result in Theorem 3.3.3 translates
into the result here.
In the following, C+ denotes the open upper half-space {ζ ∈ C : Im(ζ) > 0}
in C, and C+ denotes its closure {ζ√ ∈ C : Im(ζ) ≥ 0}. In addition, given ζ ∈ C
and α ∈ (0, 2), we take ζ α ≡ |ζ|α e −1αargζ
√
, where argζ is 0 if ζ = 0 and is the
unique θ ∈ (−π, π] such that ζ = |ζ|e −1θ if ζ 6= 0.
Lemma 3.3.7. If α ∈ (0, 1), then
√ α
−1ζr
−1 Γ(1 − α
Z
e ζ
dr = √ for ζ ∈ C+ .
(0,∞) r1+α α −1
In particular,
Γ(2−α)
cos απ
α(α−1) 2 if α ∈ (1, 2)
cos r − 1
Z
aα ≡ dr = − Γ(1−α) cos απ if α ∈ (0, 1)
(0,∞) r1+α π α
2
−2 if α = 1
and
Γ(1 − α)
Z
sin r απ
bα ≡ dr = sin if α ∈ (0, 1).
(0,∞) r1+α α 2
Proof: Let fα (ζ) denote the integral on the left-hand side of the first equation.
Clearly fα is continuous on C+ and analytic on C+ . In addition, fα (ξ) = ξ αfα (1)
for ξ ∈ (0, ∞), and Re fα (1) < 0. Hence, there exist c > 0 and θ ∈ 0, π2 such
√
that fα (ξ) = −ce −1θ ξ α for ξ ∈ (0, ∞). Since ζ ∈ C+ 7−→ ζ α ∈ C is the unique
continuous extension of ξ ∈ (0,√∞) 7−→ ξ α ∈ (0, ∞) to C+ that is analytic on
C+ , we know that fα (ζ) = −ce −1θ ζ α for ζ ∈ C+ . In addition,
√ e−r − 1 Γ(1 − α)
Z Z
1
fα ( −1) = 1+α
dr = − r−α e−r dr = − .
(0,∞) r α (0,∞) α
Hence, c = Γ(1−α)
α and θ = − απ2 .
When α ∈ (0, 1), the values of aα and bα follow immediately from the evalu-
ation of fα (1). When α ∈ (1, 2), one can find the value of aα by first observing
that
cos(ξr) − 1 cos r − 1
Z Z
α
1+α
dr = ξ dr for ξ ∈ (0, ∞),
(0,∞) r (0,∞) r1+α
144 3 Infinitely Divisible Laws
cos r − 1
Z Z
sin r
α dr = − dr = −bα−1 .
(0,∞) r1+α (0,∞) rα
Γ(2 − α) cos απ
2 π
a1 = lim aα = − lim =− .
α&1 α&1 α 1−α 2
and
√ √
Z
`µ (ξ) = −1 ξ, m RN − −1 ξ, ω RN
log ξ, ω RN
ν(dω),
SN −1
√
where ζ log ζ = ζ log |ζ| + −1ζargζ for ζ ∈ C.
Proof: When α ∈ (0, 1), the conclusion is a simple application of the cor-
responding results in Lemmas 3.3.6 and 3.3.7. When α ∈ (1, 2), one has to
massage the corresponding expression in Lemma 3.3.6. Specifically, begin with
the observation that
√
−1ξ h √ √
Z i dr
+ e −1ξr − 1 − −1ξ1[0,1] (r)r 1+α
1−α (0,∞) r
√ !
h √ √ −1sgn(ξ)
Z i dr
α −1sgn(ξ)r
= |ξ| e − 1 − −1sgn(ξ)1[0,1] (r)r 1+α +
(0,∞) r 1−α
Next use integration by parts over the intervals (0, 1] and [1, ∞) to check that
cos r − 1
Z Z
dr 1 1 1 aα−1
sin r − 1[0,1] (r)r 1+α
= + α
dr = + .
(0,∞) r α−1 α (0,∞) r α−1 α
aα−1 Γ(2−α)
Hence, since α = − α(α−1) sin απ
2 ,
Γ(2 − α) ∓ απ
gα (±1) = e 2 ,
α(α − 1)
and therefore
α
α Γ(2 − α) (ξ, ω)RN
gα sgn(x, ω)RN ξ, ω RN = √ .
α(α − 1) −1
1−α
Thus, all that we need to do is replace the ν in Theorem 3.3.8 by Γ(1−α) ν.
Turning to the case α = 1, note that, because of the mean zero condition on
ν,
!
√ √
Z Z h i dr
−1(ξ,ω)RN r
e −1− −11[0,1] (r)r ξ, ω RN
ν(dω)
SN −1 (0,∞) r2
!
Z h √ Z i dr
= lim e −1(ξ,ω)RN r − 1 1+α ν(dω)
α%1 SN −1 (0,∞) r
α
Γ(1 − α)
Z
(ξ, ω)RN
= − lim √ ν(dω)
α%1 α SN −1 −1
√
Z
1 α
= −1 lim ξ, ω RN − ξ, ω RN ν(dω)
α%1 1 − α SN −1
√
Z
= − −1 ξ, ω RN log ξ, ω RN ν(dω),
SN −1
Corollary 3.3.9. For any α ∈ (0, 2], µ is a symmetric and α-stable law if
and only if there is a finite, non-negative, symmetric, Borel measure ν 6= 0 on
SN −1 such that Z
ξ, ω N α ν(dω).
`µ (ξ) = − R
SN −1
we see that
Z
1
ξ, ω N 2 ν(dω).
`µ (ξ) = − ξ, Cξ RN =
2 SN −1
R
the expression for `µ (ξ) in Theorem 3.3.8. Hence, by the preceding calculation,
`µ (ξ) has the desired form.
Finally, if µ is a rotationally invariant, α-stable law, then `µ (ξ) is a rotationally
invariant function of ξ and therefore the preceding leads to
Z Z
α
|ξ|α ω, ω 0 ν(dω 0 ) λSN −1 (dω) = −t|ξ|α ,
`µ (ξ) = −
SN −1 SN −1
Exercise 3.3.10. Given α ∈ (0, 2), define Sα ν for finite, non-negative, Borel
1
measures ν on B(0, 1) \ B(0, 2− α ) by
Z
m
X
Sα ν(Γ) = 2−m 1Γ (2 α y) ν(dy),
m∈Z RN
and show that this map is one-to-one and onto the set of M ∈ M2 (RN ) satisfying
(cf. (3.3.1)) M = T̂α M . Conclude that, for each α ∈ (0, 2), Fα (RN ) contains
lots of elements!
Exercise 3.3.11. Here are a few further properties of elements of Fα (RN ).
(i) Show that there is µ ∈ Fα (RN ) such that µ {y : (e, y)RN < 0} = 0 for some
e ∈ SN −1 if and only if α ∈ (0, 1).
Hint: Reduce to the case when N = 1, and look at Exercise 3.2.24.
N −1
(ii) If µ ∈ F1 (RN ), show that,
for every e ∈ S , µ {y : (e, y)RN < 0} >
0 ⇐⇒ µ {y : (e, y)RN > 0} > 0.
(iii) If α ∈ (1, 2), show that for each > 0 there is a µ ∈ Fα (R) such that
µ (−∞, −] = 0.
Exercise 3.3.12. Take N = 1. This exercise is about an important class of
stable laws known as one-sided stable laws: stable laws that are supported
on [0, ∞).
(i) Show that there exists a one-sided α-stable law only if α ∈ (0, 1).
148 3 Infinitely Divisible Laws
(ii)If α ∈
α(0, 1), show that µ is a one-sided α-stable law if and only if `µ (ξ) =
ξ
−t √−1 for some t ∈ (0, ∞).
(iii) Let α ∈ (0, 1), and use νtα to denote the one-sided α-stable law with `νtα (ξ) =
α
−t √ξ−1 . Show that
Z √
α
−1ζy ζ
e νtα (dy) = exp −t √ for ζ ∈ C with Im(ζ) ≥ 0.
[0,∞) −1
Exercise 3.3.13. Given α ∈ (0, 2], let µα t denote the symmetric α-stable law,
described in Corollary 3.3.9, with `µαt (ξ) = −t|ξ|α . Clearly µ2t = γ0,2tI . When
α ∈ (0, 2), show that
Z
α
α
µt = γ0,2τ I νt2 (dτ ),
[0,∞)
α
where νt2 is the one-sided α2 -stable law in part (iii) of the preceding exercise.
This representation is an example of subordination, and, as we will see in
Exercise 3.3.17, can be used to good effect.
dνtα
(3.3.15) hα
t = for t ∈ (0, ∞),
dλR
1 1
−α α −α
and that hα
t (τ ) ≡ t h1 (t τ ).
Exercises for § 3.3 149
(ii) Only when α = 12 is an explicit expression for hα1 readily available. To find
this expression, first note that, by the uniqueness of the Laplace transform (cf.
1
Exercise 1.2.12) and (i), h12 is uniquely determined by
Z ∞
2 1
e−λ τ h12 (τ ) dτ = e−λ , λ ∈ [0, ∞).
0
Next, show that
Z ∞ 1 ∞ 1
π 2 e−2ab π 2 e−2ab
Z
a2 2 a2
1 3
+b2 τ )
τ − 2 e−( τ +b τ ) dτ = and τ − 2 e−( τ dτ =
0 b 0 a
2
for all (a, b) ∈ (0, ∞) , and conclude from the second of these that
1
1 1(0,∞) (τ )e− 4τ
(3.3.16) h1 (τ ) =
2
√ 3 .
4πτ 2
1 1
Hint: To prove the first identity, try the change of variables x = aτ − 2 − bτ 2 ,
and get the second by differentiating the first with respect to a.
Exercise 3.3.17. In this exercise we will discuss the densities of the symmetric
stable laws µα t for α ∈ (0, 2) (cf. Exercise 3.3.13). Once again, we know that
each µαt admits a smooth density with respect to Lebesgue measure λRN on RN .
Further, it is clear that this density is symmetric and that
dµα 1 dµ
α
1
t
(x) = t− α 1
(t− α x) for t ∈ (0, ∞).
dλRN dλRN
(i) Referring to Exercise 3.3.14 and using Exercise 3.3.12, show that
Z ∞
dµα 1 −N
|x|2 α
2 e− 4τ h 2 (τ ) dτ.
1
(3.3.18) (x) = N τ
dλRN (4π) 2 0
1
(ii) Because we have an explicit expression for h12 , we can use (3.3.18) to get an
dµ11
explicit expression for dλRN . In fact, show that
dµ1t N 2tN
(3.3.19) (x) = πtR (x) ≡ N +1 , (t, x) ∈ (0, ∞) × RN ,
dλRN ωN (t2 + |x|2 ) 2
N +1 −1
where ωN = 2π 2 Γ N2+1 is the surface area of SN in RN +1 . The function
R
π1 is the density for what probabilists call the Cauchy distribution. For
N
general N ’s, (t, x) ∈ (0, ∞) × RN 7−→ πtR (x) is what analysts call the Poisson
kernel for the right half-space in RN +1 . That is (cf. Exercise 10.2.22), if f ∈
Cb (RN ; R), then
Z
N
(t, x) uf (t, x) = f (x − y) πtR (y) dy
RN
is the unique, bounded harmonic extension of f to the right half-space.
150 3 Infinitely Divisible Laws
for f ∈ L1 (RN ; C). This can be used to prove that k · kα determines a Hilbert
norm on Cc (RN ; C).
Chapter 4
Lévy Processes
Although analysis was the engine that drove the proofs in Chapter 3, probability
theory can do a lot to explain the meaning of the conclusions drawn there.
Specifically, in this chapter I will develop an intuitively appealing way of thinking
about a random variable
√−1 (ξ,X) X whose distribution is infinitely divisible, an X for
P
which E e RN
equals
√
1
exp −1 ξ, m) − 2 ξ, Cξ RN
h √ √
Z
−1 (ξ,y)RN
i
+ e − 1 − −1 1[0,1] |y| ξ, y RN M (dy)
RN
151
152 4 Lévy Processes
For reasons that should be obvious now, an evolution {Z(t) : t ∈ [0, ∞)} of the
sort described above used to be called a process with independent, homo-
geneous increments, the term “process” being the common one for continuous
families of random variables and the adjective “homogeneous” referring to the
fact that the distribution of the increment Z(t) − Z(s) for 0 ≤ s < t depends
only on the length t − s of the time interval over which it is taken. In more
recent times, a process with independent, homogeneous increments is said to be
a Lévy process, and so I will adopt this more modern terminology.
Assuming that the family {Z(t) : t ∈ [0, ∞)} exists, notice that we already
know what the joint distribution of {Z(tk ) : k ∈ N} must be for any choice of
0 = t0 < · · · < tk < · · · . Indeed, Z(0) = 0 and
K
Y
P Z(tk ) − Z(tk−1 ) ∈ Γk , 1 ≤ k ≤ K = µtk −tk−1 (Γk )
k+1
for any K ∈ Z+ and Γ1 , . . . , ΓK ∈ BRN . Equivalently, P Z(tk ) ∈ Γk , 0 ≤ k ≤ K
equals
Z Z Y K X k K
Y
1Γ0 (0) · · · 1Γk yj µtk −tk−1 (dyk )
k=1 j=1 k=1
(RN )K
K
X
var[a,b] (ψ) = sup |ψ(tk ) − ψ(tk−1 )| : K ∈ Z+
(4.1.2) k=1
and a = t0 < t1 < · · · < tK = b
is finite subset of (0, t]. In addition, there exists an n(t, r, ψ) ∈ N such that, for
every n ≥ n(t, r, ψ) and m ∈ Z+ ∩ (0, 2n ],
Finally,
kψk[0,t] = lim max |ψ(m2−n t)| : m ∈ N ∩ [0, 2n ]
n→∞
and X
ψ m2−n t − ψ (m − 1)2−n t .
var[0,t] (ψ) = lim
n→∞
m∈Z+ ∩[0,2n ]
Proof: Begin by noting that it suffices to treat the case when t = 1, since one
can always reduce to this case by replacing ψ with τ ψ(tτ ).
If kψk[0,1] were infinite, then we could find a sequence {τn : n ≥ 1} ⊆ [0, 1] such
that |ψ(τn )| −→ ∞, and clearly, without loss in generality, we could choose this
sequence so that τn −→ τ ∈ [0, 1] and {τn : n ≥ 1} is either strictly decreasing or
strictly increasing. But, in the first case this would contradict right-continuity,
and in the second it would contradict the existence of left limits. Thus, kψk[0,1]
must be finite.
Essentially the same reasoning shows that J(1, r, ψ) is finite. If it were not,
then we could find a sequence {τn : n ≥ 0} of distinct points in (0, 1] such
that |ψ(τn ) − ψ(τn −)| ≥ r, and again we could choose them so that they were
either strictly increasing or strictly decreasing. If they were strictly increasing,
then τn % τ for some τ ∈ (0, 1] and, for each n ∈ Z+ , there would exist a
τn0 ∈ (τn−1 , τn ) such that |ψ(τn ) − ψ(τn0 )| ≥ 2r , which would contradict the
existence of a left limit at τ . Similarly, right-continuity would be violated if the
τn ’s were decreasing.
Although it has the same flavor, the proof of the existence of n(1, r, ψ) is a
bit trickier. Let 0 < τ1 < · · · τK ≤ 1 be the elements of J(1, r, ψ). If n(1, r, ψ)
§ 4.1 Stochastic Processes, Some Generalities 155
The assertion about var[0,1] (ψ) is proved in essentially the same manner, al-
though now the monotonicity comes from the triangle inequality and the first
equality in the preceding must be replaced by |ψ(t)−ψ(t−)| = limn→∞ |ψ(btc+n )−
−
ψ(btcn )|.
I next give D(RN ) the topological structure corresponding to uniform conver-
gence on compacts, or, equivalently, the topological structure for which
∞
X kψ − ψ 0 k[0,n]
ρ(ψ, ψ 0 ) ≡ 2−n
n=1
1 + kψ − ψ 0 k[0,n]
lim |ψ(τ ) − ψ(τn )| ≤ 2kψ − ψk k[0,t] + lim |ψk (τ ) − ψk (τn )| ≤ 2kψ − ψk k[0,t]
n→∞ n→∞
156 4 Lévy Processes
{0, 1} for each t > 0, it has at most a countable number of discontinuities, and
at most fr (t) of them can occur in any interval (0, t]. Furthermore, if fr has a
discontinuity at τ , then j τ, B(0, r) − j τ −, B(0, r) = 0, and so the measure
ντ = j(τ, · ) − j(τ −, · ) is a {0, 1}-valued probability measure on RN that assigns
mass 0 to B(0, r). Hence (cf. Exercise 4.1.15) fr (τ ) 6= fr (τ −) =⇒ ντ = δy
for some yτ ∈ RN \ B(0, r). From these considerations, it follows easily that if
J(r) = {τ ∈ (0, ∞) : fr (τ ) 6= fr (τ −)} and if, for each τ ∈ J(r), yτ ∈ RN \B(0, r)
is chosen so that j(τ, · ) − j(τ −, · ) = δyτ , then J(r) ∩ (0, t] is finite for all t > 0
and X
j t, · B(0, r){ = 1[τ,∞) (t)δyτ .
τ ∈J(r)
S
Thus, if J = r>0 J(r), then J is at most countable, {(τ, yτ ) : τ ∈ J} has the
required finiteness property, and (4.1.5) holds.
The reason for my introducing jump functions is that every element ψ ∈
D(RN ) determines a jump function t j(t, · , ψ) by the prescription
X
j(t, Γ, ψ) = 1Γ ψ(τ ) − ψ(τ −) ,
(4.1.6) τ ∈J(t,ψ)
for Γ ⊆ RN \ {0}.
S To check that j(t, · , ψ) is well defined and is a jump function,
take J(ψ) = t>0 J(t, ψ) and yτ = ψ(τ ) − ψ(τ −) when τ ∈ J(ψ), note that,
by Lemma 4.1.3, J(ψ) is at most countable and that {(τ, yτ ) : τ ∈ J(ψ)} has
the finiteness required in Lemma 4.1.4, and observe that (4.1.5) holds when
j(t, · ) = j(t, · , ψ) and J = J(ψ).
Because it will be important for us to know that the distribution of a D(RN )-
valued stochastic process determines the distribution of the jump functions for
its paths, we will make frequent use of the following lemma.
Lemma 4.1.7. If ϕ : RN −→ R is a BRN -measurable function that vanishes in a
neighborhood of 0, then ϕ is j(t, · , ψ)-integrable for all (t, ψ) ∈ [0, ∞) × D(RN ),
and Z
(t, ψ) ∈ [0, ∞) × D(RN ) 7−→ ϕ(y) j(t, dy, ψ) ∈ R
RN
j(t, · ,Rψ)-integrable for all (t, ψ) ∈ [0, ∞) × D(RN ) and, for each ψ ∈ D(RN ),
t RN
ϕ(y) j(t, dy, ψ) is right-continuous and piecewise constant. Thus, it
suffices to show that, for each t ∈ (0, ∞),
Z
(*) ψ ϕ(y) j(t, dy, ψ) is FD(RN ) -measurable.
RN
the absolutely pure jump paths are those that are the piecewise constant paths:
those absolutely pure jump ψ’s for which j(t, · , ψ) ∈ M0 (RN ), t > 0. Because
of Lemma 4.1.7, each of these properties is FD(RN ) -measurable. In particular, if
{Z(t) : t ≥ 0} is a D(RN )-valued stochastic process whose paths almost surely
have any one of these properties, then the paths of every D(RN )-valued stochastic
process with the same distribution as {Z(t) : t ≥ 0} will almost surely possess
that property.
Finally, I need to address the question of when a jump function is the jump
function for some ψ ∈ D(RN ).
Theorem 4.1.8. Let t j(t, · ) be a non-zero jump function, and set j Γ (t, dy)
¯ and if ψ ∆ (t) =
R 1Γ (y)j(t, dy) for ∆Γ ∈ BRN . If ∆ ∈ BRN with 0 ∈
= / ∆
∆
y j(t, dy), then ψ is a piecewise constant element of D(RN ), j(t, · , ψ ∆ ) =
N
j ∆ (t, · ), and j(t, · , ψ−ψ ∆ ) = j R \∆ (t, · ) = j(t, · )−j ∆ (t, · ) for any ψ ∈ D(RN )
whose jump function is t j(t, · ). Finally, suppose that {ψm : m ≥ 0} ⊆
N
D(R ) and a non-decreasing Ssequence {∆m : m ≥ 0} ⊆ BRN satisfy the
∞
conditions that RN \ {0} = m=0 ∆m and, for each m ∈ N, 0 ∈ / ∆m and
∆m
j(t, · , ψm ) = j (t, · ), t ≥ 0. If ψm −→ ψ uniformly on compacts, then
j(t, · , ψ) = j(t, · ), t ≥ 0.
Exercises for § 4.1 159
Proof: Throughout the proof I will use the notation introduced in Lemma
4.1.4.
Assuming that 0 ∈ ¯ we know that
/ ∆,
X
j ∆ (t, · ) = 1[τ,∞) (t)1∆ (yτ )δyτ ,
τ ∈J
where, for each t > 0, there are only finitely many non-vanishing terms. At the
same time,
X X
ψ ∆ (t) = 1[τ,∞) (t)1∆ (yτ )yτ and j(t, · , ψ−ψ ∆ ) = 1[τ,∞) (t)1RN \∆ (yτ )δyτ
τ ∈J τ ∈J
if j(t, · , ψ) = j(t, · ). Thus, all that remains is to prove the final assertion. To
this end, suppose that j(t, · , ψ) 6= j(t−, · , ψ). Since kψ − ψm k[0,t] −→ 0, there
exists an m such that ψm (t) 6= ψm (t−) and therefore that j(t, · ) − j(t−, · ) = δy
for some y ∈ ∆m . Since this means that ψn (t) − ψn (t−) = y for all n ≥ m, it
follows that ψ(t) − ψ(t−) = y and therefore that j(t, · , ψ) − j(t−, · , ψ) = δy =
j(t, · ) − j(t−, · ). Conversely, suppose that j(t, · ) 6= j(t−, · ) and choose m so
that j(t, · ) − j(t−, · ) = δy for some y ∈ ∆m . Then ψn (t) − ψn (t−) = y for
all n ≥ m. Thus, since this means that ψ(t) − ψ(t−) = y, we again have that
j(t, · , ψ) − j(t−, · , ψ) = δy = j(t, · ) − j(t−, · ). After combining these, we see
that j(t, · , ψ) − j(t−, · , ψ) = j(t, · ) − j(t−, · ) for all t > 0, from which it is an
easy step to j(t, · ) = j(t, · , ψ) for all t ≥ 0.
Exercises for § 4.1
Next, define ψt as in Exercise 4.1.10, and use that exercise together with the
preceding to show that the open set {ψ ∈ D(RN ) : ∃ t ∈ [0, 1] kψ − ψt k[0,1] < 1}
is not FD(RN ) -measurable. Conclude that BD(RN ) % FD(RN ) . Similarly, conclude
that neither D(RN ) nor C(RN ) is a measurable subset of (RN )[0,∞) . On the
other hand, as we have seen, C(RN ) ∈ FD(RN ) .
Exercise 4.1.12. Show that
Z
(4.1.13) var[0,t] (ψ) ≥ |y| j(t, dy, ψ), (t, ψ) ∈ [0, ∞) × D(RN ).
RN
Hint: This is most easily seen from the representation of j(t, · , ψ) in terms of
point masses at the discontinuities of ψ. One can use this representation to show
that, for each r > 0,
X Z
var[0,t] (ψ) ≥ ψ(τ ) − ψ(τ −) =
|y| j(t, dy, ψ), (t, ψ) ∈ [0, ∞).
τ ∈J(t,r,ψ) |y|≥r
Exercise
R 4.1.14. If ψ is an absolutely pure jump path, show that var[0,t] (ψ) =
|y| j(t, dy, ψ) and therefore that ψ has locally bounded variation. Conversely,
if ψ ∈ C(RN ) has locally bounded variation, show that ψ is an absolutely pure
if ψ ∈ D(RN )
R
jump path if and only if var[0,t] (ψ) = |y| j(t, dy, ψ). Finally,
and j(t, · , ψ) ∈ M1 (RN ) for all t ≥ 0, set ψc (t) ≡ ψ(t) − y j(t, dy, ψ) and
R
Exercise 4.1.15. If ν ∈ M1 (RN ), show that ν(Γ) ∈ {0, 1} for all Γ ∈ BRN if
and only if ν = δy for some y ∈ RN .
Hint: Begin by showing that it suffices to handle the case when N = 1. Next,
assuming that N = 1, show that ν is compactly supported, let m be its mean
value, and show that ν = δm .
§ 4.2 Discontinuous Lévy Processes
In this section I will construct the Lévy processes corresponding to those µ ∈
I(RN ) with no Gaussian component. That is,
√
µ̂(ξ) = exp −1 ξ, mµ RN
(4.2.1)
√
Z h √
−1(ξ,y)
i
+ e − 1 − −1 1[0,1] (|y|) ξ, y RN Mµ (dy) .
RN
§ 4.2 Discontinuous Lévy Processes 161
Because they are the building blocks out of which all such processes are made,
I will treat separately the case when µ is a Poisson measure πM for some M ∈
M0 (RN ) and will call the corresponding Lévy process the Poisson process
associated with M .
§ 4.2.1. The Simple Poisson Process. I begin with the case when P∞ N 1= 1
−1
and M = δ1 , for which πM is the simple Poisson measure e m=0 m! δm
√
whose Fourier transform is exp e −1ξ − 1 .
To construct the Poisson process associated with δ1 , start with a sequence
{τm : m ≥ 1} of independent, unit exponential random variables on a proba-
bility space (Ω, F, P). That is,
n
!
X
+
P {ω : τ1 (ω) > t1 , . . . , τn (ω) > tn } = exp − tm
m=1
for all n ∈ Z+ and (t1 , . . . , tn ) ∈ Rn . Without loss in generality, I may and will
assume that τm (ω) > 0 for all m ∈ Z+ and ω ∈ Ω. InPaddition, by The Strong
∞
Law of Large Numbers, I may and will assume Pn that m=1 τm (ω) = ∞ for all
ω ∈ Ω. Next, set T0 (ω) = 0 and Tn (ω) = m=1 τm (ω), and define
∞
X
(4.2.2) N (t, ω) = max{n ∈ N : Tn (ω) ≤ t} = 1[Tn (ω),∞) (t) for t ∈ [0, ∞).
n=1
Pn Pn+1
where A = (τ1 , . . . , τn+1 ) ∈ (0, ∞)n+1 : m=1 τm ≤ t < m=1 τm and
Pn
B = (τ1 , . . . , τn ) ∈ (0, ∞)n : m=1 τm ≤ t . By making the change of
Pm
variables sm = j=1 τj and remarking that the associated Jacobian is 1, one
sees that |B| = |C|, where C = (s1 , . . . , sn ) ∈ Rn : 0 < s1 < · · · < sn ≤ t .
n
Since |C| = tn! , we have shown that the P-distribution of N (t) is the Poisson
measure πtδ1 . In particular, πδ1 is the P-distribution of N (1).
I now want to use the same sort of calculation to show that {N (t) : t ∈ [0, ∞)}
is a simple Poisson process, that is, a Lévy process for πδ1 . (See Exercise
4.2.18 for another, perhaps preferable, approach.)
162 4 Lévy Processes
Lemma 4.2.3. For any (s, t) ∈ [0, ∞), the P-distribution of the increment
N (s + t) − N (s) is πtδ1 . In addition, for any K ∈ Z+ and 0 = t0 < t1 < · · · < tK ,
the increments {N (tk ) − N (tk−1 ) : 1 ≤ k ≤ K} are independent.
Proof: What I have to show is that, for all K ∈ Z+ , 0 = n0 ≤ · · · ≤ nK , and
0 = t0 < t1 < · · · < tK ,
P N (tk ) − N (tk−1 ) = nk − nk−1 , 1 ≤ k ≤ K
K
Y e−(tk −tk−1 ) (tk − tk−1 )nk −nk−1
= ,
(nk − nk−1 )!
k=1
and, since the case when nK = 0 is trivial, I will assume that nK ≥ 1. In fact,
because neither side is changed if one removes those nk ’s for which nk = nk−1 ,
I will assume that 0 = n0 < · · · < nK .
Begin by noting that
P N (tk ) = nk , 0 ≤ k ≤ K = P Tnk ≤ tk < Tnk+1 , 1 ≤ k ≤ K
Z Z PnK +1
= · · · e− m=1 τm dτ1 · · · dτnK +1 = e−tK |B|,
A
where
nk nX
k +1
( )
X
nK +1
A= (τ1 , . . . , τnK +1 ) ∈ (0, ∞) : τm ≤ tk < τm , 1 ≤ k ≤ K
m=1 m=1
and
nk
( )
X
B= (τ1 , . . . , τnK ) ∈ (0, ∞)nK : tk−1 < τm ≤ tk : 1 ≤ k ≤ K .
m=1
Pm
To compute |B|, make the change of variables sm = j=1 τj to see that |B| =
|C|, where
C = (s1 , . . . , snK ) ∈ RnK : tk−1 < snk−1 +1 < · · · < snk ≤ tk for 1 ≤ k ≤ K .
Ck = (snk−1 +1 , . . . , snk ) ∈ Rnk −nk−1 : tk−1 < snk−1 +1 < · · · < snk ≤ tk ,
§ 4.2 Discontinuous Lévy Processes 163
X ∞
X
(4.2.5) j t, · , ZM ( · , ω) = δXn (ω) = 1[Tn (ω),∞) (t)δXn (ω) .
1≤n≤N (αt,ω) n=1
I now want to check that {ZM (t) : t ≥ 0} is a Lévy process for πM and, as
such, deserves to be called a Poisson process associated with M : the one with
rate M (RN ) and jump distribution M M (RN )
. That is, I want to show that, for
each 0 = t0 < t1 < · · · tK , the random variables ZM (tk ) − ZM (tk−1 ), 1 ≤ k ≤ K,
164 4 Lévy Processes
are mutually independent and that the kth one has distribution π(tk −tk−1 )M .
Equivalently, I need to check that, for any ξ1 , . . . , ξK ∈ RN ,
" K
!# K
P
√ X Y
E exp −1 ξk , ZM (tk ) − ZM (tk−1 ) RN = π[
τk M (ξk ),
k=1 k=1
Proof: In proving the first part, I will, without loss in generality, assume that
(cf. (4.2.4)) Z = ZM . But then, by (4.2.5),
X
ZF (t, ω) =
F Xn (ω) ,
1≤n≤N (αt,ω)
§ 4.2 Discontinuous Lévy Processes 165
from which the first assertion follows immediately from the same computation
with which I just showed that {ZM (t) : t ≥ 0} is a Poisson process associated
with M .
To prove the second assertion, I begin by observing that it suffices to treat
the case when I = {1, 2}. To see this, suppose that we know the result in that
case, and let n > 2 and a set {i1 , . . . , in } of distinct elements from I be given.
By taking F1 = (Fi1 , . . . , Fin−1 ), F2 = Fin , and applying the assumed result, we
would have that {ZFin (t) : t ≥ 0} is independent of ZFi1 (t), . . . , ZFin−1 (t) :
t ≥ 0 . Hence,
F proceeding by induction, we would be able to show that the
processes {Z im (t) : t ≥ 0} : 1 ≤ m ≤ n are independent.
Now assume that I = {1, 2}. What I have to check is that, for any K ∈ Z+ ,
0 = t0 < t1 < · · · < tK , and {(ξk1 , ξk2 ) : 1 ≤ k ≤ K} ⊆ RN1 × RN2 ,
" K h
√ X
ξk1 , ZF1 (tk ) − ZF1 (tk−1 RN1
P
E exp −1
k=1
i #
+ ξk2 , ZF2 (tk ) − ZF2 (tk−1 ) RN2
" K
!#
√ X
ξk1 , ZF1 (tk ) F1
P
=E exp −1 − Z (tk−1 ) RN1
k=1
" K
!#
√ X
ξk2 , ZF2 (tk ) F2
P
×E exp −1 − Z (tk−1 ) RN2 .
k=1
For this purpose, take F : RN −→ RN1 +N2 to be given by F (y) = F1 (y), F2 (y) ,
and set ξk = (ξk1 , ξk2 ). Then the first expression in the preceding equals
" K #
√ X
F F
P
E exp −1 ξk , Z (tk ) − Z (tk−1 RN1 +N2
k=1
K h √
Y i
EP exp −1 ξk , ZF (tk − tk−1 ) RN1 +N2
= ,
k=1
since {ZF (t) : t ≥ 0} has independent, homogeneous increments. Hence, it
suffices to observe that, for any t > 0 and ξ = (ξ 1 , ξ 2 ),
h i Z √
EP exp ξ, ZF (t) RN1 +N2 = exp t e −1(ξ,F (y))RN1 +N2 − 1 M (dy)
RN
Z √
−1(ξ1 ,F1 (y))RN1
= exp t e − 1 M (dy)
RN
Z √
−1(ξ2 ,F2 (y))RN2
× exp t e − 1 M (dy)
RN
h i h i
= EP exp ξ 1 , ZF1 (t) RN1 EP exp ξ 2 , ZF2 (t) RN2 .
166 4 Lévy Processes
Now assume that the Mk ’s are as in the final part of the statement, and choose
∆k ’s accordingly. Without loss in generality, I will assume that RN \ {0} =
SK
k=1 ∆k . Also, because the assertion depends only on the joint distribution of
the processes involved, I may and will assume that
Z
Zk (t) = y j t, dy, Z for 1 ≤ k ≤ K,
∆k
PK
since then Z(t) = k=1 Zk (t), and, by Theorem 4.2.8, the Zk ’s are independent
and the kth one is a Poisson process associated with Mk . But
with this choice,
another application of Theorem 4.2.8 shows that j t, Γ, Zk = j t, Γ ∩ ∆k , Z ,
and therefore
K
X
j t, Γ, Z = j t, Γ, Zk , t ∈ [0, ∞).
k=1
Because the paths of a Poisson process are piecewise constant, they certainly
have finite variation on each compact time interval. The first part of the next
lemma provides an estimate of that variation. The estimate in the second part
will be used in § 4.2.5.
Lemma 4.2.10. If {Z(t) : t ≥ 0} is a Poisson process associated with M ∈
M0 (RN ), then Z
P
E var[0,t] (Z) = t |y| M (dy).
RN
R R
In addition, if RN
|y| M (dy) < ∞ and Z̄(t) = Z(t) − RN
y M (dy), then
N 2t N 2t
Z
≥ R ≤ 2 EP |Z̄(t)|2 = 2 |y|2 M (dy).
P kZ̄k[0,t]
R R RN
1≤`≤m
are measurable. Note that if |F (y)| vanishes for y’s in a neighborhood of 0, then
Ω(T ) = Ω for all T > 0.
My goal in this subsection is to prove the following existence result.
§ 4.2 Discontinuous Lévy Processes 169
0
then M F ∈ M0 (RN ), {ZF (t) : t ≥ 0} is a Poisson process associated with M F ,
and j t, · , ZF ( · , ω) = j F (t, · , ω).
Proof: To prove the first assertion, suppose that {∆1 , . . . , ∆n } are disjoint
0 Sn
Borel subsets of RN and that 0 ∈ / i=1 ∆i . Then {F −1 (∆1 ), . . . , F −1 (∆n )}
satisfy the same conditions as subsets of RN , and therefore, since j F (t, ∆i , ω) =
−1
F
j t, F (∆i ), ω), {j (t, ∆i ) : t ≥ 0} : 1 ≤ i ≤ n has the required properties.
170 4 Lévy Processes
0
Turning to the second assertion, first note that M F ∈ M0 (RN ) is an immedi-
0
ate consequence of 0 ∈ / F −1 (RN \ {0}) and that the equality j t, · , ZF ( · , ω) =
j F (t, · , ω) is a trivial application of the final part of Theorem 4.1.8. To prove
that {ZF (t) : t ≥ 0} is a Poisson process associated with M F , use Theorem
4.2.8 to see that {j F (t, · ) : t ≥ 0} has the same distribution as the jump
process for a Poisson process {Z(t) : t ≥ 0} associated with M F . Hence,
since Z(t) = y j(t, dy, Z), {ZF (t) : t ≥ 0} has the same distribution as
R
{Z(t) : t ≥ 0}.
§ 4.2.4. Lévy Processes with Bounded Variation. Although the contents
of the previous section provide the machinery with which to construct a Lévy
process for any µ with Fourier transform given by (4.2.1), for reasons made clear
in the next lemma, I will treat the special case when M ∈ M1 (RN ) here and will
deal with M ∈ M2 (RN ) \ M1 (RN ) in the following subsection.
Lemma 4.2.13. Let {j(t, · ) : t ≥ 0}R be a Poisson jump process associated
with M ∈ M2 (RN ), and set V (t, ω) = |y| j(t, dy, ω). Then V (t) < ∞ almost
surely or V (t) = ∞ almost surely for all t > 0, depending on whether M is or is
not in M1 (RN ). (See Exercise 4.3.11 to see that the same conclusion holds for
any M ∈ M∞ (RN ).)
R
Proof: Since |y|>1 |y| j(t, dy, ω) < ∞ for all (t, ω) ∈ [0, ∞) × Ω, the question
R
is entirely about the finiteness of V0 (t, ω) ≡ B(0,1) |y| j(t, dy, ω). To study this
−k+1 ) \ B(0, 2−k ), F (y) = |y|1
Rquestion, set Ak = B(0, 2 k Ak (y), and Vk (t, ω) =
Ak
|y| j(t, dy, ω) for k ≥ 1. Clearly, the processes {V k (t) : t ≥ 0} : k ∈ Z+
are mutually independent. In addition, for each k, t Vk (t) is non-decreasing
and, by the second part of Lemma 4.2.12, {Vk (t) : t ≥ 0} is a Poisson process
associated with M Fk . Thus, by Lemma 4.2.10,
Z Z
|y|2 M (dy).
ak ≡ EP Vk (t) = t |y| M (dy) and bk ≡ Var Vk (t) = t
Ak Ak
∞
"Z # Z
X
P P
E |y| j(t, dy) = E Vk (t) = |y| M (dy),
B(0,1) k=1 B(0,1)
which finishes the case when M ∈ M1 (RN ). When M ∈ M2 (RN ) \ M1 (RN ), set
V̄k (t) = Vk (t) − tak . Then, for each t > 0, {V̄k (t) : k ∈ Z+ } is a sequence of
mutually independent random variables with mean value 0. Furthermore,
∞
X ∞
X Z
|y|2 M (dy) < ∞.
Var V̄k (t) = t bk = t
k=1 k=1 B(0,1)
§ 4.2 Discontinuous Lévy Processes 171
P∞
Hence, by Theorem 1.4.2, k=1 V̄k (t) converges P-almost
P∞ surely. But, when
∞
/ M1 (RN ), k=1 ak = ∞, and so, for each t > 0, k=1 Vk (t) must diverge
P
M ∈
P-almost surely.
Before stating the main result of the subsection, I want to introduce the notion
of a generalized Poisson measure. Namely, if M ∈ M1 (RN ) \ M0 (RN ) and
πM is the element of I(RN ) whose Fourier transform is given by
Z √
−1(ξ,y)RN
exp e − 1 M (dy) ,
R
or, equivalently, π
d M is given by (4.2.1) with m = B(0,1) y M (dy), then I will
call πM the generalized Poisson measure for M . Similarly, if {Z(t) : t ≥ 0}
is a Lévy process for a generalized Poisson measure πM , I will say that it is a
generalized Poisson process associated with M .
Theorem 4.2.14. Suppose that M ∈ M1 (RN ) and that {j(t, · ) : t ≥ 0} is
a Poisson jump process associated with M . Set N = {ω : ∃t > 0 j(t, · , ω) ∈
/
N
M1 (R )}, and define (t, ω) ZM (t, ω) so that
R
y j(t, dy, ω) if ω ∈
/N
ZM (t, ω) =
0 if ω ∈ N .
Then P(N ) = 0 and {ZM (t) : t ≥ 0} is a (possibly generalized) Poisson process
associated with M . In particular, t ZM (t, ω) is absolutely pure jump for all
ω ∈ Ω, and {j(t, · , ZM ) : t ≥ 0} is a Poisson jump process associated with M .
Finally, if µ ∈ I(RN ) has Fourier transform given by (4.2.1), then
( Z ! )
t m− y M (dy) + ZM (t) : t ≥ 0
B(0,1)
then
!
N 2t
Z
sup Z(τ ) − τ m − Z(r) (τ ) ≥ |y|2 M (dy).
P ≤
τ ∈[0,t] 2 B(0,r)
and define
Z
(r)
Z (t, ω) = y j̄(t, dy, ω), (t, ω) ∈ [0, ∞) × Ω,
|y|>r
for r ∈ (0, 1]. By Theorem 4.2.14, we know that {Z(r) (t) : t ≥ 0} is a Lévy
process for µ(r) , where
!
h √ √
Z i
e −1(ξ,y)RN − 1 − −1 1[0,1] (y) ξ, y RN M (dy) .
µd(r) (ξ) = exp
|y|>r
Furthermore, by the second part of Lemma 4.2.10, we know that, for 0 < r <
r0 ≤ 1,
N 2t
Z
0
(*) P kZ(r ) − Z(r) k[0,t] ≥ ≤ 2 |y|2 M (dy).
r<|y|≤r 0
then
X
P sup kZ(rn ) − Z(rm ) k[0,t] ≥ 1
P kZ(rn+1 ) − Z(rn ) k[0,t] ≥ (m + 1)−2
m ≤
n>m
n≥m
∞
X
≤ N 2t (n + 1)4 2−n ,
n=m
§ 4.2 Discontinuous Lévy Processes 173
In particular, this means that {Z(r) (t)−Z∆ (t) : t ≥ 0} has independent, homoge-
neous increments and (cf. Theorem 4.1.8) is independent of {j t, · , Z∆ : t ≥ 0}.
Thus, since, as r & 0, Z(r) (t) −→ Z(t) − tm in probability, it follows that
{Z(t) − Z∆ (t) : t ≥ 0} is independent of {j(t, · , Z∆ ) : t ≥ 0}. In addition,
√ ∆ √ ∆ √ (r) ∆ ∆
e− −1t(ξ,m−m )RN EP e −1(ξ,Z(t)−Z (t))RN = lim EP e −1(ξ,Z (t)−Z (t)+tm )RN
r&0
h √ √
Z i
e −1(ξ,y)RN − 1 − −11[0,1] |y| ξ, y RN M (dy)
= lim exp
r&0
(∆∪B(0,r)){
!
√ √
Z h
−1(ξ,y)RN
i
= exp e −1− −11[0,1] |y| ξ, y RN M (dy) .
RN \∆
Hence, it follows that {Z(t) − Z∆ (t) : t ≥ 0} is a Lévy process for the specified
element of I(RN ).
Exercises for § 4.2
Exercise 4.2.18. Here is another proof that the process {N (t) : t ≥ 0} in
§ 4.2.1 has independent, homogeneous increments. Refer to the notation used
there.
(i) Given n ∈ Z+ and measurable functions f : [0, ∞)n+1 7−→ [0, ∞) and g :
[0, ∞)n −→ R, show that
EP f (τ1 , . . . , τn+1 ), τn+1 > g(τ1 , . . . , τn )
+
= EP e−g(τ1 ,... ,τn ) f τ1 , . . . , τn , τn+1 + g(τ1 , . . . , τn )+ .
(iii) Let n ∈ Z+ and t > 0 be given, and set h(ξ) = P(Tn−1 > ξ). Referring to
(ii) and again using (i), show that
P A ∩ {N (s + t) − N (s) < n} = EP h(t + s − TnK +1 ), B ∩ {τnK +1 > s − TnK }
= EP e−(s−TnK ) h(t − τnK +1 ), B = EP h(t − τnK +1 ) EP e−(s−TnK ) , B
= P N (t) < n P(A).
Exercises for § 4.2 175
to see that
N (t) N (btc)
lim − = 0 P-almost surely.
t→∞ t btc
Exercise 4.2.20. Assume that µ ∈ I(R) has its Fourier transform given by
(4.2.1), and let {Z(t) : t ≥ 0} be a Lévy process for µ. Using Exercise 3.2.25,
show that t R Z(t) is non-decreasing if and only if M ∈ M1 (R), M (−∞, 0) =
0, and m ≥ [−1,1] y M (dy).
Exercise 4.2.21. Let {j(t, · ) : t ≥ 0} be a Poisson jump process associated
with some M ∈ M∞ (RN ), and suppose that F : RN −→ R is a Borel measurable,
M -integrable function that vanishes at 0.
(i) Let N be the set of ω ∈ Ω for which there is a t > 0 such that F is not
j t, · , ω)-integrable, and show that P(N ) = 0.
(ii) Show that (cf. Lemma 4.2.6) M F ∈ M1 (R) and that, in fact,
Z Z
F
|y| M (dy) = |F (y)| M (dy) < ∞.
Next, define R
F F (y) j(t, dy, ω) if ω ∈
/N
Z (t, ω) =
0 if ω ∈ N ,
and show that {Z F (t) : t ≥ 0} is a (possibly generalized) Poisson process asso-
ciated with M F .
(iii) Show that
Z F (t)
Z
lim = F (y) M (dy) P-almost surely.
t→∞ t
Hint: Begin by using Lemma 4.2.10 to show that it suffices to handle F ’s that
vanish in a neighborhood of 0. When F vanishes in a neighborhood of 0, use
Lemma 4.2.12 to see that {Z F (t) : t ≥ 0} is a Poisson process associated with
M F . Finally, use the representation of a Poisson process in terms of a simple
Poisson process and independent random variables, and apply The Strong Law
of Large Numbers together with the result in Exercise 4.2.19.
176 4 Lévy Processes
Exercise 4.2.22. Let {Z(t) : t ≥ 0} be a Lévy process for the µ ∈ I(RN ) with
Z̄(t) = Z(t) − tm. Show that for all
Fourier transform given by (4.2.1), and set
R ∈ [1, ∞) and t ∈ (0, ∞), P kZ̄k[0,t] ≥ R is dominated by t times
4N
Z
2
Z √
|y|2 M (dy) +
√ |y| M (dy) + M B(0, R){ .
R2 B(0,1) R 1<|y|≤ R
Then,
R R
P kZk[0,t] ≥ R ≤ P kZ1 k[0,t] ≥ 2 + P kZ2 k[0,t] ≥ 2 + P kZ3 k[0,t] 6= 0 .
Apply the estimates in Lemma 4.2.10 to control the first two terms on the right,
and use
√ N
√
P j t, RN \ B(0, R), Z 6= 0 = 1 − e−tM (R \B(0, R))
Exercise 4.2.24. Let M ∈ M2 (RN ) be given, and assume that there exists a
decreasing sequence {rn : n ≥ 0} ⊆ (0, 1] with rn & 0 such that
Z
m = lim y M (dy)
n→∞ rn <|y|≤1
exists. Let µ ∈ I(RN ) have Fourier transform given by (4.2.1) with this m and
M . If {Z(t) : t ≥ 0} is a Lévy process for µ, set
Z
Zn (t, ω) = y j t, dy, Z( · , ω) ,
|y|>rn
and show that limn→∞ P kZ − Zn k[0,t] ≥ = 0 for all t ≥ 0 and > 0. Thus,
after passing to a subsequence {nm : m ≥ 0} if necessary, one sees that, P-almost
surely, Z
Z(t, ω) = lim y j t, dy, Z( · , ω) ,
m→∞ |y|>rnm
where the convergence is uniform on finite time intervals. In particular, one can
say that P-almost all the paths t Z(t, ω) are “conditionally pure jump.”
§ 4.3 Brownian Motion, the Gaussian Lévy Process
What remains of the program in this chapter is the construction of a Lévy
process for the standard, normal distribution γ0,I , the infinitely divisible law
|ξ|2
whose Fourier transform is e− 2 . Indeed, if {Zγ0,I (t) : t ≥ 0} is such a process
and {Zµ (t) : t ≥ 0} is a Lévy process for the µ ∈ I(RN ) whose Fourier transform
is given by (4.2.1), and if {Zγ0,I (t) : t ≥ 0} is independent of {Zµ (t) : t ≥ 0},
1
then it is an easy matter to check that C 2 Zγ0,I (t) + Zµ (t) will be a Lévy process
for γ0,C ? µ, whose Fourier transform is
√
−1 ξ, m RN − 12 ξ, Cξ RN
exp
√
Z h √
i
+ e −1(ξ,y)RN − 1 − −1 1[0,1] (|y|) ξ, y RN M (dy) .
RN
Because one of its earliest applications was as a mathematical model for the
motion of “Brownian particles,” 1 such a Lévy process for γ0,1 is called a Brow-
nian motion. In recognition of its provenance, I will adopt this terminology
and will use the notation {B(t) : t ≥ 0} instead of {Zγ0,I (t) : t ≥ 0}.
1 R. Brown, an eighteenth century English botanist, observed the motion of pollen particles
in a dilute gas. His observations were interpreted by A. Einstein as evidence for the kinetic
theory of gases. In his famous 1905 paper, Einstein took the first steps in a program, eventually
completed by N. Wiener in 1923, to give a mathematical model of what Brown had seen.
178 4 Lévy Processes
Before getting into the details, it may be helpful to think a little about what
sorts of properties we should expect the paths t B(t) will possess. For this
N
purpose, set Mn = n δ − 12 + δ − 12 , and recall that we have seen already
n −n
that πMn =⇒γ0,I . Since a Poisson process associated with Mn has nothing but
1
jumps of size n− 2 , if one believes that the Lévy process for γ0,I should be, in
some sense, the limit of such Poisson processes, then it is reasonable to guess
that its paths will have jumps of size 0. That is, they will be continuous.
Although the prediction that the paths of {B(t) : t ≥ 0} will be continuous
is correct, it turns out that, because it is based on the Central Limit Theorem,
the heuristic reasoning just given does not lead to the easiest construction. The
problem is that The Central Limit Theorem gives convergence of distributions,
not random variables, and therefore one should not expect the paths, as opposed
to their distributions, of the approximating Poisson processes to converge. For
this reason, it is easier to avoid The Central Limit Theorem and work with
Gaussian random variables from the start, and that is what I will do here. The
Central Limit approach is the content of § 9.3.
§ 4.3.1. Deconstructing Brownian Motion. My construction of Brownian
motion is based on an idea of Lévy’s; and in order to explain Lévy’s idea, I will
begin with the following line of reasoning.
Assume that {B(t) : t ≥ 0} is a Brownian motion in RN . That is, {B(t) : t ≥
0} starts at 0, has independent increments, any increment B(s + t) − B(s) has
distribution γ0,tI , and the paths t B(t) are continuous. Next, given n ∈ N, let
t Bn (t) be the polygonal path obtained from t B(t) by linear interpolation
during each time interval [m2−n , (m + 1)2−n ]. Thus,
Bn (t) = B(m2−n ) + 2n t − m2−n B (m + 1)2−n − B(m2−n )
n
Xm,n+1 ≡ 2 2 +1 Bn+1 (2m − 1)2−n−1 − Bn (2m − 1)2−n−1
!
B m2−n + B (m − 1)2−n
n
+1 −n−1
= 22 B (2m − 1)2 −
2
n
h
= 2 2 B (2m − 1)2−n−1 − B (m − 1)2−n
i
− B m2−n − B (2m − 1)2−n−1
,
§ 4.3 Brownian Motion, the Gaussian Lévy Process 179
0
for any choice of {ξm : 1 ≤ m ≤ n} ∪ {ξm : 1 ≤ m ≤ n} ⊆ R. But the
expectation value on the left is equal to
!2
n
1 X
0 0
exp − EP ξm Xm + ξm Xm
2 m=1
!2 !2
n n
1 X 1 X
0 0
= exp − EP ξm Xm − EP ξm Xm
2 m=1
2 m=1
" n # " n #
Y √ Y √ 0 0
= EP e −1 ξm Xm EP e −1 ξm Xm ,
m=1 m=1
180 4 Lévy Processes
0 0
since EP [Xm Xm 0 ] = 0 for all 1 ≤ m, m ≤ n.
Armed with Lemma 4.3.1, we can now check that {Xm,n : (m, n) ∈ Z+ ×N}
is independent. Indeed, since, for all (m, n) ∈ Z+ × N and ξ ∈ RN , ξ, Xm,n RN
a member of the Gaussian family G(B), all that we have to do is check that, for
each (m, n) ∈ Z+ × N, ` ∈ N, and (ξ, η) ∈ (RN )2 ,
EP ξ, Xm,n+1 RN η, B(`2−n ) RN = 0.
and therefore
n
2− 2 −1 EP ξ, Xm,n+1 RN η, B(`2−n ) RN
h i
= EP ξ, B (2m − 1)2−n−1 N η, B(`2−n ) N
R R
1 P h −n
i
+ B (m − 1)2−n N η, B(`2−n ) N
− E ξ, B m2
2 R R
m ∧ ` + (m − 1) ∧ `
= 2−n ξ, η RN m − 12 ∧ ` −
= 0.
2
§ 4.3.2. Lévy’s Construction of Brownian Motion. Lévy’s idea was to
invert the reasoning given in the preceding subsection. That is, start with a
family {Xm,n : (m, n) ∈ Z+ × N} of independent N (0, I)-random variables.
Next, define {Bn (t) : t ≥ 0} inductively
P so that t Bn (t) is linear on each
interval [(m − 1)2 , m2 ], B0 (m) = 1≤`≤m X`,0 , m ∈ N, Bn+1 (m2−n ) =
−n −n
independent N (0, 2−n I)-random variables. But, since this sequence is contained
in the Gaussian family spanned by {Xm,n : (m, n) ∈ Z+ × N}, Lemma 4.3.1 says
that we need only show that
h
EP ξ, Bn (m + 1)2−n − Bn m2−n N
R
i
0 0 −n
− Bn m0 2−n = 2−n ξ, ξ 0 RN δm,m0
× ξ , Bn (m + 1)2
RN
§ 4.3 Brownian Motion, the Gaussian Lévy Process 181
Bn (m2−n ) − Bn (m − 1)2−n
n
= − 2− 2 −1 Xm,n+1
2
and
Bn (m2−n ) − Bn (m − 1)2−n
n
= + 2− 2 −1 Xm,n+1 .
2
Using these expressions and the induction hypothesis, it is easy to check the
required equation.
Second, and more challenging, we must show that, P-almost surely, these
processes are converging uniformly on compact time intervals. For this purpose,
consider the difference t Bn+1 (t) − Bn (t). Since this path is linear on each
interval [m2−n−1 , (m + 1)2−n−1 ],
1
where CN ≡ EP |X1,0 |4 4 < ∞.
Starting from the preceding, it is an easy matter to show that there is a
measurable B : [0, ∞) × Ω −→ RN such that B(0) = 0, B( · , ω) ∈ C [0, ∞); RN )
for each ω ∈ Ω, and kBn − Bk[0,t] −→ 0 both P-almost surely and in L1 (P; R)
−n −n
for every t ∈ [0, ∞). Furthermore, since
B(m2 )−n= Bn (m2 −n ) P-almost surely
2
for all (m, n) ∈ N , it is clear that B (m + 1)2 − B(m2 ) : m ≥ 0 is a
sequence of independent N (0, 2−n I)-random variables for all n ∈ N. Hence, by
continuity, it follows that {B(t) : t ≥ 0} is a Brownian motion.
We have now completed the task described in the introduction to this section.
However, before moving on, it is only proper to recognize that, clever as his
method is, Lévy was not the first to construct a Brownian motion. Instead, it
182 4 Lévy Processes
was N. Wiener who was the first. In fact, his famous2 1923 article “Differential
Space” in J. Math. Phys. #2 contains three different approaches.
§ 4.3.3. Lévy’s Construction in Context. There are elements of Lévy’s
construction that admit interesting generalizations, perhaps the most important
of which is Kolmogorov’s Continuity Criterion.
Theorem 4.3.2. Suppose that {X(t) : t ∈ [0, T ]} is a family of random
variables taking values in a Banach space B, and assume that, for some p ∈
[1, ∞), C < ∞, and r ∈ (0, 1],
1 1
EP kX(t) − X(s)kpB p ≤ C|t − s| p +r for all s, t ∈ [0, T ].
Then, there exists a family {X̃(t) : t ∈ [0, T ]} of random variables such that
X(t) = X̃(t) P-almost surely for each t ∈ [0, T ] and t ∈ [0, T ] 7−→ X̃(t, ω) ∈ B is
continuous for all ω ∈ Ω. In fact, for each α ∈ (0, r),
" !p # p1 1
P kX̃(t) − X̃(s)kB 5CT p +r−α
E sup ≤ .
0≤s<t≤T (t − s)α (1 − 2−r )(1 − 2α−r )
Proof: First note that, by rescaling time, it suffices to treat the case when
T = 1.
Given n ≥ 0, set Mn = max1≤m≤2n
X(m2−n ) − X (m − 1)2−n
B , and
observe that
2n
! p1
1 X p
EP Mnp p ≤ EP
X(m2−n ) − X (m − 1)2−n
≤ C2−rn .
B
m=1
kX̃(t) − X̃(s)kB ≤ kX̃(t) − Xn (t)kB + kXn (t) − Xn (s)kB + kXn (s) − X̃(s)kB
≤ 2 sup kX̃(τ ) − Xn (τ )kB + 2n (t − s)Mn ,
τ ∈[0,1]
kX̃(t) − X̃(s)kB
≤ 22α(n+1) sup kX̃(τ ) − Xn (τ )kB + 2n 2(α−1)n Mn .
(t − s)α τ ∈[0,1]
" !p # p1 ∞
2α(n+1) 2−rn
P kX̃(t) − X̃(s)kB X
αn −rn
E sup ≤C 2 + 2 2
0≤s<t≤1 (t − s)α n=0
1 − 2−r
5C
≤ .
(1 − 2−r )(1 − 2α−r )
But
`+1 ` M
P B n −B n
≤
nα , 0≤`<L
Z !L
L −N
|y|2
− 2 1
= γ0, n1 I B 0, nα M
= (2π) 2
1
e dy ≤ Cn( 2 −α)N L .
B(0,M n 2 −α )
N
Proof: Let (e1 , . . . , eN ) be an orthonormal basis for R , and set Xi (k, n) =
ei , ∆k,n B RN . Then, what we have to show is that
m
X m
(*) lim sup Xi (k, n)Xj (k, n) − δi,j = 0 P-almost surely.
n→∞ 1≤m≤nT n
k=1
To this end, note that, for each n ∈ Z+ and 1 ≤ i ≤ N , {Xi (k, n) : k ≥ 1} are
mutually independent N (0, n−1 )-random variables. Hence, for each 1 ≤ i ≤ N ,
{Xi (k, n)2 − n−1 : k ≥ 1} are independent random variables with mean value
0 and variance 3n−2 , and therefore, by (1.4.22) and the second inequality in
(1.3.2),
Xm 4
E max Xi (k, n)2 − n1
1≤m≤nT
k=1
4
12M4 T 2
X 2
1
≤ 4E Xi (k, n) − n ≤ ,
1≤k≤nT n2
§ 4.3.5. General Lévy Processes. Our original reason for constructing Brow-
nian motion was to complete the program of constructing all the Lévy processes.
In this subsection, I will do that.
Throughout this subsection, µ ∈ I(RN ) has Fourier transform
√
−1 ξ, m RN − 12 ξ, Cξ RN
exp
(4.3.6)
√
Z h √
−1(ξ,y)RN
i
+ e − 1 − −11[0,1] (|y|) ξ, y RN M (dy) ,
Thus, µ = µ0 ? µ1 .
186 4 Lévy Processes
N 2t
Z
P kZ(r) − Z1 k[0,t] ≥ ≤ 2 |y|2 M (dy).
B(0,r)
also have locally bounded variation P-almost surely, and, since {Z0 (t) : t ≥ 0}
1
has the same distribution as {tm + C 2 B(t) : t ≥ 0}, Theorem 4.3.5 shows that
this is possible only if C = 0.
Remark 4.3.9. Recall the linear functional Aµ introduced in (3.2.10). As I
showed in Lemma 3.2.14, the action of Aµ on ϕ decomposes into a local part
and a non-local part, which, with 20-20 hindsight, we can write as, respectively,
m, ∇ϕ(0) RN + 12 Trace C∇2 ϕ(0)
Z h
i
and ϕ(y) − ϕ(0) − 1[0,1] (|y|) y, ∇ϕ(0) RN M (dy).
In terms of this decomposition, Corollary 4.3.8 is saying that the local part of
Aµ governs the continuous part of {Z(t) : t ≥ 0} and that the non-local part
governs the discontinuous part.
Exercises for § 4.3
Exercise 4.3.10. This exercise deals with a few elementary facts about Brow-
nian motion.
(i) Let {X(t) : t ≥ 0} be an RN -valued stochastic process satisfying X(0, ω) = 0
and X( · , ω) ∈ C(RN ) for all ω ∈ Ω, and showthat {X(t) N
: t ≥ 0} is an R -valued
Brownian motion if and only if the span of ξ, X(t) RN : t ≥ 0 & ξ ∈ RN } is a
Gaussian family with the property that, for all t, t0 ∈ [0, ∞) and ξ, ξ 0 ∈ RN ,
h i
EP ξ, X(t) RN ξ 0 , X(t0 ) RN = t ∧ t0 (ξ, ξ 0 )RN .
(ii) As a consequence of part (i), prove the Brownian Strong Law of Large Num-
bers: limt→∞ t−1 B(t) = 0.
R2
P kBk[0,T ] ≥ R ≤ 2N e− 2N T .
(4.3.13)
examining its proof, one sees that the inequality in Theorem 1.4.13 comes from
not knowing how far over a the partial sums jump when they first exceed level a.
Thus, because we are now dealing with “continuous partial sums,” one should
suspect that the inequality can be made an equality. To verify this suspicion, let
Γn () denote the set of ω such that |B(t, ω) − B(s, ω)| < for all 0 ≤ s < t ≤ 1
with t − s ≤ 2−n , and show that, for 0 < < a,
{B(1) ≥ a} ∩ Γn ()
2n
[ −1
−n −n −n
⊆ max B(`2 ) < a − ≤ B(m2 ) & B(1) − B(m2 ) > 0 ,
0≤`<m
m=1
∞
r Z
∗ 2 x2
e−
(4.3.14) P B (t) ≥ a = 2P B(t) ≥ a = 2 dx.
π 1
at− 2
This beautiful result, which is sometimes called the reflection principle for
Brownian motion, seems to have appeared first in L. Bachelier’s now famous
1900 thesis, where he used what is now called “Brownian motion” to model
price fluctuations on the Paris Bourse. More information about the reflection
principle can be found in § 8.6.3.
Exercises for § 4.3 189
B(t) B(t)
lim q = 1 = lim q P-almost surely.
t→∞ t&0
2t log(2) t 2t log(2) t−1
Begin by checking that the second equality follows from the first applied to the
time inverted process {B̃(t) : t ≥ 0} described in (i) of Exercise 4.3.11. Next,
observe that
B(n)
lim q = 1 P-almost surely
n→∞
2n log(2) n
is just the Law of the Iterated Logarithm for standard normal random variables.
Thus, all that remains is to show that
B(t) B(n)
lim sup q −q = 0 P-almost surely,
n→∞ t∈[n,n+1]
2t log(2) t 2n log(2) n
which can be checked by a combination of the Strong Law for Brownian motion,
the estimate in (4.3.13), and the easy half of the Borel–Cantelli Lemma.
Exercise 4.3.16. Given a stochastic process {X(t) : t ≥ 0}, the stochastic
process {X̃(t) : t ≥ 0} is said to be a modification of {X(t) : t ≥ 0} if, for
each t ∈ [0, ∞), X̃(t) = X(t) P-almost surely. Further, given a stochastic process
{X(t) : t ≥ 0} with values in a metric space (E, ρ), one says that {X(t) : t ≥ 0}
is stochastically continuous if, as t → s, X(t) −→ X(s) in probability for
each s ∈ [0, ∞).
(i) Show that the simple Poisson process {N (t) : t ≥ 0} is stochastically contin-
uous. Thus, stochastic continuity does not imply path continuity.
(ii) Let Q denote the set of rational real numbers. Show that an RN -valued,
stochastically continuous stochastic process {X(t) : t ≥ 0} admits a continuous
modification if and only if, for each T > 0, t ∈ [0, T ] ∩ Q 7−→ X(t) is uniformly
continuous. Conclude that a stochastically continuous process {X(t) : t ≥ 0}
admits a continuous modification if and only if there exists a µ ∈ M1 C(RN )
such that the distribution of {X(t) : t ≥ 0} under P is the same as the dis-
tribution of {ψ(t) : t ≥ 0} under µ. Equivalently, a stochastically continuous
process {X(t) : t ≥ 0} admits a continuous modification if and only if there
exists a continuous stochastic process {Y (t) : t ≥ 0}, not necessarily on the
same probability space, with the same distribution as {X(t) : t ≥ 0}.
190 4 Lévy Processes
for some p ∈ [1, ∞), r > 0, and C < ∞. Show that there exists a family
{X̃(x) : x ∈ [0, T ]ν } with the properties that x ∈ [0, T ]ν 7−→ X̃(x, ω) ∈ B
is continuous for all ω, and, for each x ∈ [0, T ]ν , X̃(x, ω) = X(x, ω) P-almost
surely. Further, show that, for each α ∈ (0, r), there is a universal K(ν, r, α) < ∞
such that
kX̃(y) − X̃(x)kB ν
+r−α
EP sup ≤ K(ν, r, α)CT p .
|y − x|α
x,y∈[0,T ]ν
y6=x
Hint: First rescale time to reduce to the case when T = 1. Now assume that 2
T = 1. Given n ∈ N, take Sn to be the set of pairs (m, m0 ) ∈ {0, . . . , 2n }N
ν
such that m0i ≥ mi for all 1 ≤ i ≤ ν and i=1 (m0i − mi ) = 1, note that Sn has
P
1
and show that EP [Mn ] ≤ C2ν ν p 2−rn . Next, let x Xn (x) denote the nth
dyadic multiliniarization of x X(x), the one that is multilinear on each dyadic
QN
cube i=1 [(mi − 1)2−n , mi 2−n ] for (m1 , . . . , mN ) ∈ {1, . . . , 2n }N . As in the
proof of Theorem 4.3.2, argue that kXn+1 − Xn ku,B ≤ Mn+1 , and conclude
that there exists an (x, ω) X̃(x, ω) that is continuous in x for each ω and
is P-almost surely equal to X(x, · ) for each x. Finally, to derive the Hölder
1
continuity estimate, observe that kXn (y) − Xn (x)kB ≤ 2n ν 2 |y − x|Mn , and
proceed as in the proof of the corresponding part of Theorem 4.3.2.
Exercise 4.3.19. In this exercise we will examine a couple of the implications
that Theorem 4.3.5 has about any Riemann–Stieltjes type integration theory
Exercises for § 4.3 191
involving Brownian paths. For simplicity, I will restrict my attention to the one-
dimensional case. Thus, let {B(t) : t ≥ 0} be an R-valued Brownian motion.
Because t B(t) is continuous, one knows that any function ψ : [0, 1] −→ R of
bounded variation is Riemann–Stieltjes integrable on [0, 1] with respect to B
[0, 1]. However, as the following shows, almost no Brownian path is Riemann–
Stieltjes with respect to itself. Namely, using Theorem 4.3.5, show that P-almost
surely,
n
X
m−1
m
m−1
B(1)2 − 1
lim B n B n −B n = ,
n→∞
m=1
2
n
X
m
m
m−1
B(1)2 + 1
lim B n B n −B n = ,
n→∞
m=1
2
whereas
n
X
2m−1 m m−1
= B(1)2 .
lim B 2n B n −B n
n→∞
m=1
In this exercise, weR will show that the same is true for any M ∈ M∞ (RN ). That
is, assuming that |y| j(t, dy) < ∞, t ≥ 0, with positive probability, it is to be
shown that M ∈ M1 (RN ). Here are some steps that you might want to follow.
R
(i) As an application of Kolmogorov’s 0–1 Law, show that |y| j(t, dy) < ∞
with positive probability implies it is finite with probability 1.
R
(ii) Let N be the set of ω ∈ Ω for which there is aRt > 0 such that |y| j(t, dy, ω)
= ∞. By (i), P(N ) = 0. Define Z(t, ω) = y j(t, dy, ω) for ω ∈ / N and
Z(t, ω) = 0 for ω ∈ N , and show that {Z(t) : t ≥ 0} is a Lévy process with
absolutely pure jump paths.
(iii) Applying Theorem 4.1.8, first show that {Z(t) : t ≥ 0} is a Lévy process
for a µ with Lévy measure M , and then apply Corollary 4.3.8 to conclude that
M ∈ M1 (RN ).
Exercise 4.3.22. Corollary 4.3.3 can be sharpened. In fact, Lévy showed that
if {B(t) : t ≥ 0} is an R-valued Brownian motion, then
|B(t) − B(s)| √
P lim sup = 2 = 1,
δ&0 0<t−s≤δ L(δ)
192 4 Lévy Processes
p
where L(δ) ≡ δ log δ −1 . Notice that, on the one hand, this result is in the direc-
tion that one should expect: we know (cf. Theorem 4.3.4) that Brownian paths
are almost never Hölder continuous of any order greater than 12 . On the other
hand, the Brownian Law of the Iterated Logarithm (cf. Exercise q 4.3.15) might
make one guess that their true modulus of continuity ought to be δ log(2) δ −1 ,
not L(δ). However, that guess is wrong because it fails to take into account the
difference between a question about what is true at a single time as opposed to
what is true simultaneously for all times. The purpose of this exercise is to show
how the considerations in § 4.3.3 can be used to get a statement that is related
to but far less refined than Lévy’s. The result to be proved here says only that
|B(t) − B(s)|
(4.3.23) P lim sup ≤K =1
δ&0 0<t−s≤δ L(δ)
and combine this with (ii) and (iii) to prove that (*) holds for some K < ∞.
Chapter 5
Conditioning and Martingales
Up to this point I have been dealing with random variables that are either
themselves mutually independent or are built out of other random variables
that are. For this reason, it has not been necessary for me to make explicit
use of the concept of conditioning, although, as we will see shortly, this concept
has been lurking silently in the background. In this chapter I will first give the
modern formulation of conditional expectations and then provide an example of
the way in which conditional expectations can be used.
Let (Ω, F, P) be a probability space, and suppose that A ∈ F is a set having
positive P-measure. For reasons that are most easily understood when Ω is finite
and P is uniform, the ratio
P(A ∩ B)
P(B|A) ≡ , B ∈ F,
P(A)
is called the conditional probability of B given A. As one learns in an
elementary course, the introduction of conditional probabilities makes many
calculations much simpler; in particular, conditional probabilities help to clarify
dependence relations between the events represented by A and B. For example,
B is independent of A precisely when P(B|A) = P(B) or, in words, when the
condition that A occurs does not change the probability that B occurs. Thus, it
is unfortunate that the naı̈ve definition of conditioning as described above does
not cover many important situations. For example, suppose that X and Y are
random variables and that one wants to talk about the conditional probability
that Y ≤ b given that X = a. Unless one is very lucky and P(X = a) > 0,
dividing by P(X = a) is not going to do the job. As this example illustrates,
it is of great importance to generalize the concept of conditional probability to
include situations when the event on which one is conditioning has P-measure 0,
and the next section is devoted to Kolmogorov’s elegant solution to the problem
of doing so.
§ 5.1 Conditioning
In order to appreciate the idea behind Kolmogorov’s solution, imagine someone
told you the conditional probability that the event B occurs given that the
event A occurs. Obviously, since you have no way of saying anything about the
193
194 5 Conditioning and Martingales
probability of B when A does not occur, she has provided you with incomplete
information about B. Thus, before you are satisfied, you should demand to
know also what is the conditional probability of B given that A does not occur.
Of course, this second piece of information is relevant only if A is not certain,
in which case P(A) < 1 and therefore P B A{ is well defined. More generally,
suppose that P = {A1 , . . . , AN } (N here may be either finite or countably
infinite) is a partition of Ω into elements of F having positive P-measure. Then,
of B ∈ F relative to
in order to have complete information about the probability
P, one has to know the entire list of the numbers P B An , 1 ≤ n ≤ N . Next,
suppose that one attempts to describe this list in a way that does not depend
explicitly on the positivity of the numbers P(An ). For this purpose, consider the
function
XN
ω ∈ Ω 7−→ f (ω) ≡ P B An 1An (ω).
n=1
Clearly, f is not only F-measurable, it is measurable with respect to the σ-
algebra σ(P) over Ω generated by P. In particular (because the only σ(P)-
measurable set of P-measure 0 is empty), f is uniquely determined by its P-
integrals EP [f, A] over sets A ∈ σ(P). Moreover, because, for each B ∈ σ(P)
and n, either An ⊆ B or B ∩ An = ∅, we have that
N
X X
EP f, A = P B ∩ An = P An ∩ B = P A ∩ B .
n=1 {n:An ⊆B}
and one would be foolish to take any other representative. More generally, I
will always take non-negative representatives of EP [X|Σ] when X itself is non-
negative and R-valued representatives when X is P-integrable. Finally, for histor-
ical reasons, it is usual to distinguish the case when X is the indicator function
1B of a set B ∈ F and to call EP [1B |Σ] the conditional probability of B
given Σ and to write P(B|Σ) instead of EP [1B |Σ]. Of course, representatives of
P(B|Σ) will always be assumed to take their values in [0, 1].
§ 5.1 Conditioning 197
Once one has established the existence and uniqueness of conditional expec-
tations, there is a long list of more or less obvious properties that one can easily
verify. The following theorem contains some of the more important items that
ought to appear on such a list.
Theorem 5.1.4. Let Σ be a sub-σ-algebra of F. If X is a P-integrable random
variable and C ⊆ Σ is a π-system (cf. Exercise 1.1.12) that generates Σ, then
Y = EP X Σ (a.s., P) ⇐⇒
Y ∈ L1 (Ω, Σ, P; R) and EP Y, A = EP X, A for A ∈ C ∪ {Ω}.
h i
(5.1.6) EP X T = EP EP X Σ T
and
(5.1.7) EP Y X Σ = Y EP X Σ
Proof: To prove the first assertion, note that the set of A ∈ Σ for which
EP [X, A] = EP [Y, A] is (cf. Exercise 1.1.12) a λ-system that contains C and
therefore Σ. Next, clearly (5.1.5) is just an application of Lemma 5.1.2, while
(5.1.6) and the two equations that follow it are all expressions of uniqueness. As
for the next equation, one can first reduce to the case when X and Y are both
non-negative. Then one can use uniqueness to check it when Y is the indicator
function of an element of Σ, use linearity to extend it to simple Σ-measurable
functions, and complete the job by taking monotone limits. Finally, (5.1.8) is an
immediate application of the Monotone Convergence Theorem, whereas (5.1.9)
comes from the conjunction of
m ∈ Z+ ,
P
E inf Xn Σ ≤ inf EP Xn Σ (a.s., P),
n≥m n≥m
with (5.1.8).
It probably will have occurred to most readers that the properties discussed
in Theorem 5.1.4 give strong evidence that, for fixed ω ∈ Ω, X 7−→ EP [X|Σ](ω)
behaves like an integral (in the sense of Daniell) and therefore ought to be
expressible in terms of integration with respect to a probability measure Pω .
Indeed, if one could actually talk about X 7−→ EP [X|Σ](ω) for a fixed (as opposed
to P-almost every) ω ∈ Ω, then there is no doubt that such a Pω would have to
exist. Thus, it is reasonable to ask whether there are circumstances in which one
can gain sufficient control over all the P-null sets involved to really make sense
out of X 7−→ EP [X|Σ](ω) for fixed ω ∈ Ω. Of course, when Σ is generated by a
countable partition P, we already know what to do. Namely, when ω ∈ A ∈ P,
we can take (
0 if P(A) = 0
P
E [X|Σ](ω) = EP [X, A]
P(A) if P(A) > 0.
Even when Σ does not arise in this way, one can often find a satisfactory repre-
sentation of conditional expectations as expectations. A quite general statement
of this sort is the content of Theorem 9.2.1 in Chapter 9.
§ 5.1.2. Some Extensions. For various applications it is convenient to have
two extensions of the basic theory developed in § 5.1.1. Specifically, as I will now
show, the theory is not restricted to probability (or even finite) measures and
can be applied to random variables that take their values in a separable Banach
space. Thus, from now
on, µ will be an arbitrary (non-negative) measure on
(Ω, F) and E, k·kE will be a separable Banach space; and I begin by reviewing
a few elementary facts about µ-integration for E-valued random variables.2
2 The integration that I outline below is what functional analysts call the Bochner integral for
Banach space–valued functions. There is a more subtle and intricate theory due to Pettis, but
Bochner’s theory seems adequate for most probabilistic considerations.
§ 5.1 Conditioning 199
Notice that another description of Eµ [X] is as the unique element of E with the
property that
and will write X ∈ Lp (µ; E) when kXkLp (µ;E) < ∞. Also, I will say the X :
Ω −→ E is µ-integrable if X ∈ L1 (µ; E); and I will say that X is locally
µ-integrable if 1A X is µ-integrable for every A ∈ F with µ(A) < ∞.
The definition of µ-integration for an E-valued X is completed in the following
lemma.
Lemma 5.1.10. For
each µ-integrable X : Ω −→ E there is a unique element
Eµ [X] ∈ E satisfying EP [X], x∗ = EP [hX, x∗ ] for all x∗ ∈ E ∗ . In particular,
Finally, if X ∈ Lp (µ; E), where p ∈ [1, ∞), then there is a sequence {Xn : n ≥ 1}
of E-valued, µ-simple functions with the property that kXn − XkLp (µ;E) −→ 0.
Proof: Clearly uniqueness, linearity, and (5.1.11) all follow immediately from
the given characterization of Eµ [X]. Thus, all that remains is to prove existence
and the final approximation assertion. In fact, once the approximation assertion
is proved, then existence will follow immediately from the observation that, by
(5.1.11), Eµ [X] can be taken equal to limn→∞ Eµ [Xn ] if kX − Xn kL1 (µ;E) −→ 0.
To prove the approximation assertion, I begin with the case when µ is finite
and M = supω∈Ω kX(ω)kE < ∞. Next, choose a dense sequence {x` : ` ≥ 1} in
E, set A0,n = ∅, and let
n o
A`,n = ω : kX(ω) − x` kE < n1 for (`, n) ∈ Z+ × Z+ .
200 5 Conditioning and Martingales
where n o
Ω(r) ≡ ω : r ≤ kX(ω)kE ≤ 1r for r ∈ (0, 1].
Since, for any r ∈ (0, 1], rp µ Ω(r) ≤ kXkpLp (µ;E) , we can apply the preceding to
Eµ XΣ , A = Eµ X, A
(5.1.13) for every A ∈ Σ with µ(A) < ∞.
Hence, not only does (5.1.13) continue to hold for any A ∈ Σ with 1A X ∈
L1 (µ; E), but also, for each p ∈ [1, ∞], the mapping X ∈ Lp (µ; E) 7−→ XΣ ∈
Lp (µ; E) is a linear contraction.
Proof: Clearly, it is only necessary to prove the “⇐=” part of the first assertion.
Thus, suppose that µ(X 6= 0) > 0. Then, because E is separable and therefore
(cf. Exercise 5.1.19) E ∗ with the weak* topology
is also separable, there exists
an > 0 and a x∗ ∈ E ∗ with the property that µ X, x∗ ≥ > 0, from which
it follows (by σ-finiteness) that there is an A ∈ F for which µ(A) < ∞ and
D E h
i
Eµ X, A , x∗ = Eµ X, x∗ , A 6= 0.
I turn next to the uniqueness and other properties of XΣ . But it is obvious that
uniqueness is an immediate consequence of the first assertion and that linearity
follows from uniqueness. As for (5.1.14), notice that if x∗ ∈ E ∗ and kx∗ kE ∗ ≤ 1,
then
Eµ XΣ , x∗ , A = Eµ X, x∗ , A ≤ Eµ kXkE , A = Eµ kXkE Σ , A
element x∗ from the unit ball in E ∗ ; and so, because E ∗ with the weak* topology
is separable, (5.1.14) follows in this case. To handle µ’s that are not probability
measures, note that either µ(Ω) = 0, in which case everything is trivial, or
µ(Ω) ∈ (0, ∞), in which case we can renormalize µ to make it a probability
202 5 Conditioning and Martingales
has the required properties. In order to handle general X ∈ L1 (P; E), I use the
approximation result in Lemma 5.1.10 to find a sequence {Xn : n ≥ 1} of simple
functions that tend to X in L1 (P; E). Then, since
(Xn )Σ − (Xm )Σ = Xn − Xm Σ (a.s., P)
1
we
know that there exists a Σ-measurable XΣ ∈ L (P; E) to which the sequence
(Xn )Σ : n ≥ 1 converges; and clearly XΣ has the required properties.
Referring to the setting in the second part of Theorem 5.1.12, I will extend
the convention introduced following Theorem 5.1.3 and call the µ-equivalence
class of XΣ ’s satisfying (5.1.13) the µ-conditional expectation of X given
Σ, will use Eµ [X|Σ] to denote this µ-equivalence class, and will, in general,
ignore the distinction between the equivalence class and a generic representative
of that class. In addition, if X : Ω −→ E is locally µ-integrable, then, just
as in Theorem 5.1.4, the following are essentially immediate consequences of
uniqueness:
Eµ Y X Σ = Y Eµ X Σ (a.e., µ) for Y ∈ L∞ (Ω, Σ, µ; R),
and h i
Eµ X T = Eµ Eµ X Σ T
(a.e., µ)
X, Y ∈ L∞ (P; R).
(*) Π X ΠY = (ΠX)(ΠY ) for all
Hint: Assume that Π1 = 1 and that (*) holds. Given X ∈ L∞ (P; R), use
induction to show that
n
kΠXknL2n (P) ≤ kXkn−1 = Π X(ΠX)n−1
L∞ (P) kXkL (P) and ΠX
2
n
for all n ∈ Z+ . Conclude that kΠXkL∞ (P) ≤ kXkL∞ (P) and that ΠX ∈
L, n ∈ Z+ , for every X ∈ L∞ (P; R). Next, using the preceding together with
Weierstrass’s Approximation Theorem, show that (ΠX)+ ∈ L, first for X ∈
L∞ (P; R) and then for all X ∈ L2 (P; R). Finally, apply (i) to arrive at L =
L2 Ω, Σ, P; R .
(iii) To emphasize the point being made here, consider once again a closed
linear subspace L of L2 (P; R), and let ΠL be orthogonal projection onto L.
Given X ∈ L2 (P; R), recall that ΠL X is characterized as the unique element of
L for which X − ΠL X ⊥ L, and show that EP [X|ΣL ] is the unique element of
L2 (Ω, ΣL , P; R) with the property that
X − EP X ΣL ⊥ f Y1 , . . . , Yn
for all n ∈ Z+ , f ∈ Cb Rn ; R , and Y1 , . . . , Yn ∈ L. In particular, ΠL X =
EP [X|ΣL ] if and only if X −ΠL X is perpendicular not only to all linear functions
of the Y ’s in L but even to all nonlinear ones.
Exercise 5.1.16. In spite of the preceding, there is a situation in which or-
thogonal projection coincides with conditioning. Namely, suppose that G is a
closed Gaussian family in L2 (P; R), and let L be a closed, linear subspace of G.
As an application of Lemma 4.3.1, show that, for any X ∈ G, the orthogonal
projection ΠL X of X onto L is a conditional expectation value of X given the
σ-algebra ΣL generated by the elements of L.
204 5 Conditioning and Martingales
where the convergence is in L2 ([0, 1]; C). (Also see Exercise 5.2.45.)
Exercise 5.1.18. Let (Ω, F, µ) be a measure space and Σ a sub-σ-algebra of
F with the property that µ Σ is σ-finite. Next, let E be a separable Hilbert
0
space, p ∈ [1, ∞], X ∈ Lp (µ; E), and Y a Σ-measurable element of Lp (µ; E) (p0
is the Hölder conjugate of p). Show that
h i
Eµ Y, X E Σ = Y, Eµ X Σ µ-almost surely.
E
Next, choose an orthonormal basis {en : n ≥ 0} for E, and justify the steps in
∞
X
Eµ Y, X E = Eµ Y, en E en , X E
1
∞ h
X i
Eµ Y, en E Eµ en , X E Σ = Eµ Y, Eµ [X|Σ] E .
=
1
Exercise 5.1.19. Let E be a separable Banach space, and show that, for each
R > 0, the closed ball BE ∗ (0, R) with the weak* topology is a compact metric
space. Conclude from this that the weak* topology on E ∗ is second countable
and therefore separable.
Hint: Choose a countable, dense subset {xn : n ≥ 1} in the unit ball BE (0, 1),
and define
∞
X
ρ(x∗ , y ∗ ) = 2−n hxn , x∗ − y ∗ i for x∗ , y ∗ ∈ BE ∗ (0, R).
n=1
§ 5.2 Discrete Parameter Martingales 205
Show that ρ is a metric for the weak* topology on BE ∗ (0, R). Next, choose
{xnm : m ≥ 1} so that xn1 = x1 and xnm+1 = xn if n is the first n > nm such
that xn is linearly independent of {x1 , . . . , xn−1 }. Given a sequence {x∗` : ` ≥ 1}
in BE ∗ (0, R), use a diagonalization argument to find a subsequence {x∗`k : k ≥ 1}
such that am = limk→∞ hxnm , x∗`k i exists for each m ≥ 1. Now define f on the
PM PM
span S of {xnm : m ≥ 1} so that f (x) = m=1 αm am if x = m=1 αm xnm ,
note that f (x) = limk→∞ hx, x∗`k i for x ∈ S, and conclude that f is linear on
S and satisfies the estimate |f (x)| ≤ RkxkE there. Since S is dense in E,
there is a unique extension of f as a bounded linear functional on E satisfying
the same estimate, and so there exists an x∗ ∈ BE ∗ (0, R) such that hx, x∗ i =
limk→∞ hx, x∗`k i for all x ∈ S. Finally, check that this convergence continues to
hold for all x ∈ E, and conclude that x∗`k −→ x∗ in the weak* topology.
Exercise 5.1.20. The purpose of this exercise is to show that Bochner’s theory
of integration for Banach space functions relies heavily on the assumption that
the Banach space be separable. In particular, the approximation procedure on
which the proof of Lemma 5.1.10 fails in the absence of separability. To see
this, consider the Banach space `∞ (µ; R) of uniformly bounded sequences x =
(x0 , . . . , xn , . . . ) ∈ RN with kxk`∞ (N;R) = supn≥0 |xn |. Next, let {Xn : n ≥ 0}
be a sequence of mutually independent, {−1, 1}-valued, Bernoulli random with
∞
mean value 0 on some probability space (Ω, F, P), and define X : Ω −→ ` (N; R)
by X(ω) = X0 (ω), . . . , Xn (ω), . . . . Show that, for any simple function Y :
Ω −→ `∞ (N; R),
P kX − Yk`∞ (N;R) < 14 = 0.
Hint: For any α ∈ R, show that P |Xn − α| < 14 ≤ 12 and therefore that
Now assume that the Xn ’s are non-negative. Given (5.2.2), (5.2.3) becomes
an easy application of Exercise 1.4.18.
Doob’s inequality is an example of what analysts call a weak-type inequal-
ity. To be more precise, it is a weak-type 1–1 inequality. The terminology derives
from the fact that such an inequality follows immediately from an L1 -norm, or
strong-type 1–1, inequality between the objects under consideration; but, in gen-
eral, it is strictly weaker. In order to demonstrate how powerful such a result
can be, I will now apply Doob’s Inequality to prove a theorem of Marcinkewitz.
Because it is an argument to which we will return again, the reader would do
well to become comfortable with the line of reasoning that allows one to pass
from a weak-type inequality, like Doob’s, to almost sure convergence results.
Corollary 5.2.4. Let X be an R-valued random variable and p ∈ [1, ∞). If
X ∈ Lp (P; R), then, for any non-decreasing sequence Fn : n ∈ N of sub-σ-
algebras of F,
" ∞ #
_
(a.s., P) and in Lp (P; R) as n → ∞.
P
P
E X Fn −→ E X Fn
0
W∞
In particular, if X is 0 Fn -measurable, then EP [X|Fn ] −→ X (a.s., P) and in
Lp (P; R).
208 5 Conditioning and Martingales
W∞
Proof: Without loss in generality, assume that F = 0 Fn .
Given X ∈ L1 (P; R), set Xn = EP [X|Fn ] for n ∈ N. The key to my proof will
be the inequality
1
(5.2.5) P sup |Xn | ≥ α ≤ EP |X|, sup |Xn | ≥ α , α ∈ (0, ∞);
n∈N α n∈N
and, since, by (5.1.5), |Xn | ≤ EP [|X| |Fn ] (a.s., P), while proving (5.2.5) I may
and will assume that X and all the Xn ’s are non-negative. But then, by (5.2.2),
1 P
P sup Xn > α ≤ E XN , sup Xn > α
0≤n≤N α 0≤n≤N
1
= EP X, sup Xn > α
α 0≤n≤N
for all N ∈ Z+ , and therefore (5.2.5) follows when N → ∞ and one takes right
limits in α.
As my first application of (5.2.5), note that {Xn : n ≥ 0} is uniformly P-
integrable. Indeed, because |Xn | ≤ EP [|X| |Fn ], we have from (5.2.5) that
h i h i
sup EP |Xn |, |Xn | ≥ α ≤ sup EP |X|, |Xn | ≥ α
n∈N n∈N
P
≤ E |X|, sup |Xn | ≥ α −→ 0
n∈N
for every X ∈ L.
210 5 Conditioning and Martingales
Proof: To prove the first part, simply set Fn = σ Pn , identify the Xn in
(5.2.6) as EP [X|Fn ], and finally apply Corollary
5.2.4. As for the second part,
let Σ(L) be the σ-algebra generated by EP [X|Σ] : X ∈ L , note that Σ(L) is
countably generated and that
EP X Σ = EP X Σ(L) (a.s., P) for each X ∈ L,
and
X EP g(X), A
Yn ≡ 1A −→ EP g(X)Σ
P(A)
A∈Pn
for all A ∈ F with P(A) > 0. Hence, if Λ ∈ Σ denotes the set of ω for which
Xn (ω)
lim ∈ RN +1
n→∞ Yn (ω)
limn→∞ Xn (ω) if ω ∈ Λ
XΣ (ω) ≡
v if ω ∈
/ Λ,
and
limn→∞ Yn (ω) if ω ∈ Λ
Y (ω) ≡
v if ω ∈
/ Λ,
then XΣ is a C-valued representative of EP [X|Σ], Y is a representative of
E [g(X)|Σ], and Y (ω) ≤ g XΣ (ω) for every ω ∈ Ω.
P
Turning to the final assertion, begin by observing that once one knows that
f (X) ∈ L1 (P; R), the concluding inequality follows immediately by applying the
first part to the non-negative, concave function M − f , where M ∈ R is an upper
bound of f . Thus, what remains to be shown is that f − (X) ∈ L1 (P; R). To
this end, set fn = (−n) ∨ f for n ≥ 1. Then fn is bounded and convex,
and
P
so, by the preceding with Σ = {∅, Ω}, we know that fn E [X] ≤ E fn (X) .
P
Writing fn = f+ − fn− , this shows that EP fn− (X) ≤ M + − f E P [X] when
and, in the case of martingales, the inequality in the preceding can be replaced
by an equality.
Closely related to Doob’s Stopping Time Theorem is an important variant
due to G. Hunt. In order to facilitate the proof of Hunt’s result, I begin with an
easy but seminal observation of Doob’s.
Lemma 5.2.12 (Doob’s Decomposition). For each n ∈ N let Xn be an
Fn -measurable, P-integrable random variable. Then, up to a P-null set, there is
at most one sequence {An : n ≥ 0} ⊆ L1 (P; R) such that A0 = 0, An is Fn−1 -
+
measurable for each n ∈ Z , and Xn − An , Fn , P is a martingale. Moreover, if
(Xn , Fn , P) is an integrable submartingale, then such a sequence {An : n ≥ 0}
exists, and An−1 ≤ An P-almost surely for all n ∈ Z+ .
Proof: To prove the uniqueness assertion, suppose that {An : n ≥ 0} and
{Bn : n ≥ 0} are two such sequences, and set ∆n = Bn − An . Then ∆0 = 0,
∆n is Fn−1 -measurable for each n ∈ Z+ , and (∆n , Fn , P) is a martingale. But
this means that ∆n = EP [∆n | Fn−1 ] = ∆n−1 for all n ∈ Z+ , and so ∆n = 0 for
all n ∈ N.
Now suppose that (Xn , Fn , P) is an integrable submartingale. To prove the
asserted existence result, set A0 ≡ 0 and
for n ∈ Z+ .
An = An−1 + EP Xn − Xn−1 Fn−1 ∨ 0
Theorem 5.2.13 (Hunt). Let Xn , Fn , P be a P-integrable submartingale.
Given bounded stopping times ζ and ζ 0 satisfying ζ ≤ ζ 0 ,
(5.2.14) Xζ ≤ EP Xζ 0 Fζ (a.s., P),
and the inequality can be replaced by equality when Xn , Fn , P is a martingale.
(Cf. Exercise 5.2.39 for unbounded stopping times.)
Proof: Choose {An : n ∈ N} for (Xn , Fn , P) as in Lemma 5.2.12, and set
Yn = Xn − An for n ∈ N. Then, because Aζ ≤ Aζ 0 and Aζ is Fζ -measurable,
EP Xζ 0 Fζ ≥ EP Yζ 0 + Aζ Fζ = EP Yζ 0 Fζ + Aζ .
214 5 Conditioning and Martingales
Hence, it suffices to prove that equality holds in (5.2.14) when Xn , Fn , P is a
martingale. To this end, choose N ∈ Z+ to be an upper bound for ζ 0 , let Γ ∈ Fζ
be given, and note that
N
X
EP XN , Γ = EP XN , Γ ∩ {ζ = n}
n=0
N
X
= EP Xn , Γ ∩ {ζ = n} = EP Xζ , Γ .
n=0
2 In the notes to Chapter VII of his Stochastic Processes, Wiley (1953), Doob gives a thorough
account of the relationship between his convergence result and earlier attempts in the same
direction. In particular, he points out that, in 1946, S. Anderson and B. Jessen formulated
and proved a closely related convergence theorem.
§ 5.2 Discrete Parameter Martingales 215
0
ζk = inf{n ≥ ζk−1 : Xn ≤ a} ∧ N and ζk0 = inf{n ≥ ζk : Xn ≥ b} ∧ N.
N N
(N )
X X
U[a,b] ≤ Yζk0 − Yζk = YN − Y0 − Yζk − Yζk−1
0
k=1 k=1
N
X
≤ YN − Yζk − Yζk−1
0 .
k=1
0
Hence, since ζk−1 ≤ ζk and therefore, by (5.2.14), EP Yζk − Yζk−1
0 ≥ 0 for all
(N )
k ∈ Z+ , we see that EP [U[a,b] ] ≤ EP [YN ], and clearly (5.2.16) follows from this
after one lets N → ∞.
Given (5.2.16), the convergence result is easy. Namely, if (5.2.17) is satisfied,
then (5.2.16) implies that there is a set Λ of full P-measure such that U[a,b] (ω) <
∞ for all rational a < b and ω ∈ Λ; and so, by the remark preceding the
statement of this theorem, for each ω ∈ Λ, {Xn (ω) : n ≥ 0} converges to some
X(ω) ∈ [−∞, ∞]. Hence, we will be done as soon as we know that EP [|X|, Λ] <
∞. But
Q(A ∩ G = EQ XY, A ∩ G = EPa X, A ∩ G
= EP X, A ∩ G ∩ B = EP X, A ∩ B = Qa (A ∩ B) = Qa (A)
for all A ∈ F.
§ 5.2.4. Reversed Martingales and De Finetti’s Theory. For some appli-
cations it is important to know what happens if one runs a submartingale or
martingale backwards. Thus,
again let (Ω, F, P) be a probability space, only
this time suppose that Fn : n ∈ N is a sequence of sub-σ-algebras that
is non-increasing. Given a sequence {Xn : n ≥ 0} of (−∞, ∞]-valued ran-
dom variables, I will say that the triple Xn , Fn , P is either a reversed sub-
martingale or a reversed martingale if, for each n ∈ N, Xn is Fn -measurable
and either Xn− ∈ L1 (P; R) and Xn+1 ≤ EP [Xn | Fn+1 ] or Xn ∈ L1 (P; R) and
Xn+1 = EP [Xn | Fn+1 ].
218 5 Conditioning and Martingales
Moreover, if (Xn , Fn , P) is a reversed martingale, then (|Xn |, Fn , P is a re-
versed submartingale. Finally, if (XnT , Fn , P) is a reversed submartingale and
∞
X0 ∈ L1 (P; R), then there is a F∞ ≡ n=0 Fn -measurable X : Ω −→ [−∞, ∞]
to which Xn converges P-almost surely. In fact, X will be P-integrable if
supn≥0 EP [|Xn |] < ∞; and if (Xn , Fn , P) is either a non-negative reversed sub-
martingale or a reversed martingale with X0 ∈ Lp (P; R) for some p ∈ [1, ∞),
then Xn −→ X in Lp (P; R).
Proof: More or less everything here follows immediately from the observation
that (Xn , Fn , P) is a reversed submartingale or a reversed martingale if and only
if, for each N ∈ Z+ , (XN −n∧N , FN −n∧N , P) is a submartingale or a martingale.
Indeed, by this observation and (5.2.2) applied to (XN −n∧N , FN −n∧N , P),
1 P
P max Xn > R ≤ E X0 , max Xn > R
0≤n≤N R 0≤n≤N
h i
EP F X−1 (A∞ ) = EP EP F σ {Xn : n ≥ 1} X−1 (A∞ )
h i
= lim EP EP F σ {Xm : 1 ≤ m ≤ N } X−1 (A∞ ) .
N →∞
§ 5.2 Discrete Parameter Martingales 221
Now suppose that F is σ {Xm : 1 ≤ m ≤ N } -measurable. Then there
exists a g : E N −→ R such that F = g X1 , . . . , XN ). If N = 1, then, because
Pn
limn→∞ n1 m=1 g ◦ Xm is T -measurable, (5.2.24) says that E P [F | X−1 (A∞ )]
is T -measurable. To get the same conclusion when N ≥ 2, I want to apply the
same reasoning, only now with E replaced by E N . To be precise, define
Z+
A(N )
: B = Sσ B for all π ∈ Σ(N ) , where
∞ = B ∈B
Clearly,
x ∈ Q =⇒ M f (Q) ≤ f ∗ (x) ≡ sup A|f | Qk (x) ,
k≥0
and, because Af Qk (x) = Eµ f σ(Pk ) (x), Doob’s Inequality (5.2.3) implies
p
that kf ∗ kLp (µ;R) ≤ p−1 kf kLp (µ;R) for all p ∈ (1, ∞].
Lemma 5.2.31. For any f ∈ L1 (µ; R),
Z Z
−1
(5.2.32) |f | dµn ≤ θ f ∗ dνn .
0
In particular, if q ∈ [1, ∞) and f ∈ Lqr (µ; R), then
q‘
rKr
(5.2.33) kf kLq (µn ;R) ≤ kf kLqr0 (µ;R) .
θ
Proof: Without loss in generality, I will assume throughout that f ≥ 0.
To prove (5.2.32), first note that
Z ∞ Z
X X
n 1
f dµn = ∆ (Q)νn (Q) f dµ
k=0 Q∈Pk+1
µ(Q̆ \ Q) Q̆\Q
X X
≤ θ−1 ∆n (Q)νn (Q)M f (Q̆),
k=0 Q∈Pk+1
224 5 Conditioning and Martingales
since Z
1
f dµ ≤ θ−1 Af (Q̆) ≤ θ−1 M f (Q̆).
µ(Q̆ \ Q) Q̆\Q
and therefore
Z K X
X n
θ f dµn ≤ lim 1 − µ(Q) νn (Q)M f (Q)
K→∞
k=0 Q∈Pk+1
X n
− 1 − µ(Q) νn (Q)M f (Q)
Q∈Pk
Z
X n
= lim 1 − µ(Q) νn (Q)M f (Q) ≤ f ∗ dνn .
K→∞
Q∈PK+1
r0 rKr
≤ θ−1 Kr 0
kf q kLr0 (µ;R) = kf kqLqr0 (µ;R) .
r −1 θ
Proof: It is easy to prove (5.2.28) from (5.2.35). Indeed, given δ > 0, choose
R
R > 0 so that µ |f | ≥ R < δ, and set f = f 1[−R,R] (f ). Then, by (5.2.35),
limn→∞ P |f R (Yn ) − f R (Zn )| ≥ = 0 for all > 0. Hence,
lim P |f (Yn ) − f (Zn )| ≥ 3
n→∞
≤ lim µn |f − f R | ≥ + lim νn |f − f R | ≥ .
n→∞ n→∞
By Hölder’s Inequality,
1 1
νn |f − f R | ≥ ≤ Kr µ |f − f R | ≥ r0 < Kr δ r0 ,
By (5.2.33),
p1
rKr
kf − fk kLp (µn ;R) ≤ kf − fk kLpr0 (µ;R) ,
θ
and, by Hölder’s Inequality,
1
kf − fk kLp (νn ;R) ≤ Krp kf − fk kLpr0 (µ;R) .
EP Mn2 − Mn−1
2
= EP Mn − Mn−1 Xn + Xn−1
= EP Xn2 − Xn−1
2
− EP An − An−1 Xn + Xn−1 ≤ EP Xn2 − Xn−1
2
,
±
±
For each m ∈ N, define Yn,m = EP Xn∨m Fm ∨0 for n ∈ N. Show that Y ± ≥
n+1,m
± ± ± +
Yn,m (a.s., P), define Ym = limn→∞ Yn,m , check that both Ym , Fm , P and
Ym− , Fm , P are non-negative martingales with EP Y0+ +Y0− ≤ supn∈N EP |Xn | ,
and note that Xm = Y m+ − Ym− (a.s., P) for each m ∈ N. In other words, every
martingale Xn , Fn , P satisfying (5.2.37 ) admits a Hahn decomposition3 as
the difference of two non-negative martingales whose sum has expectation value
dominated by the left-hand side of (5.2.37). Finally, use this observation together
with (iii) to see that every such martingale converges P-almost surely to some
X ∈ L1 (P; R).
(v) By combining the final assertion in (iv) together with Doob’s Decomposition
in Lemma 5.2.12, give another proof of the convergence assertion in Theorem
5.2.15.
Exercise 5.2.38. In this exercise we will develop another way to reduce Doob’s
Martingale Convergence Theorem to the case of L2 -bounded martingales. The
technique here is due to R. Gundy and derives from the ideas introduced by
Calderón and Zygmund in connection with their famous work on weak-type 1–1
estimates for singular integrals.
(i) Let {Zn : n ∈ N} be a Fn : n ∈ N -progressively
measurable, [0, R]-valued
sequence with the property that
−Z n , F n , P is a submartingale. Next, choose
{An : n ∈ N} for −Zn , Fn , P as in Lemma 5.2.12, note that An ’s can be chosen
so that 0 ≤ An − An−1 ≤ R for all n ∈ Z+ , and set Mn = Zn + An , n ∈ N.
Check that Mn , Fn , P is a non-negative martingale with Mn ≤ (n + 1)R for
each n ∈ N. Next, show that
EP Mn2 − Mn−1
2
= EP Mn − Mn−1 Zn + Zn−1
= EP Zn2 − Zn−1
2
+ EP An − An−1 Zn + Zn−1
≤ EP Zn2 − Zn−1
2
+ 2R EP An − An−1 ,
for n ∈ Z+ ; and {∆n : ∈ N} is an Fn : n ∈ N -progressively measurable
sequence satisfying
2
P ∃ 0 ≤ m ≤ n ∆(R)
m 6
= 0 ≤ EP |Xn | .
R
The preceding representation is called the Calderón–Zygmund decomposi-
tion of the martingale Xn , Fn , P .
(iv) Let Xn , Fn , P be a martingale that satisfies (5.2.37), and use part (iii)
above together with part (i) of Exercise 5.2.36 to show that, for each R ∈ (0, ∞),
2
{Xn : n ≥ 0} converges off of a set whose P-measure is no more than R times the
supremum over n ∈ N of E [|Xn |]. In particular, when combined with Lemma
P
5.2.12, the preceding line of reasoning leads to the advertised alternate proof of
the convergence result in Theorem 5.2.15.
Exercise 5.2.39. In this exercise we will extend Hunt’s Theorem (cf. Theorem
5.2.13) to allow unbounded stopping times. To this end, let Xn , Fn , P be a
uniformly P-integrable submartingale on the probability space (Ω, F, P), and set
Mn = Xn − An , n ∈ N, where {An : n ∈ N} is the sequence produced in Lemma
5.2.12. After checking that Mn , Fn , P is a uniformly P-integrable martingale,
show that, for any stopping time ζ: Xζ = EP [M∞ |Fζ ] + Aζ (a.s., P), where
X∞ , M∞ , and A∞ are, respectively, the P-almost sure limits of {Xn : n ≥ 0},
{Mn : n ≥ 0}, and {An : n ≥ 0}. In particular, if ζ and ζ 0 are a pair of stopping
times and ζ ≤ ζ 0 , conclude that Xζ ≤ EP [Xζ 0 |Fζ ] (a.s., P).
Exercises for § 5.2 229
Exercise 5.2.40. There are times when submartingales converge even though
they are not bounded in L1 (P; R). For example, suppose that (Xn , Fn , P) is a
ρ : R 7−→ R with
submartingale for which there exists a non-decreasing function
the properties that ρ(R) ≥ R for all R and Xn+1 ≤ ρ Xn (a.e., P) for each
n ∈ N.
(i) Set ζR (ω) = inf n ∈ N : Xn (ω) ≥ R for R ∈ (0, ∞), and note that
sup Xn∧ζR ≤ X0 ∨ ρ(R) (a.e., P).
n∈N
dνn
where fn ≡ dµ n
. In particular, when νn ∼ µn for each n ∈ N, use Kol-
mogorov’s
0–1 Law (cf. Theorem 1.1.2) to see that Q(G) ∈ {0, 1}, where G ≡
limn→∞ Xn ∈ (0, ∞)}, and combine this with the last part of Theorem 5.2.20
to conclude that Q 6⊥ P =⇒ Q P. Finally, to remove the assumption that
νn ∼ µn for all n’s, define ν̃n on (En , Bn ) by ν̃n = 1 − 2−n−1 νn + 2−n−1 µn ,
Q
check that ν̃n ∼ µn and Q Q̃ ≡ n∈N ν̃n , and use the preceding to complete
the proof.
Exercise 5.2.42. Let (Ω, F) be a measurable space and Σ a sub-σ-algebra of
F. Given a pair of probability measures P and Q on (Ω, F), let XΣ and YΣ
be non-negative Radon–Nikodym derivatives
of, respectively, PΣ ≡ P Σ and
QΣ ≡ Q Σ with respect to PΣ + QΣ , and define
Z
1 1
P, Q Σ = XΣ2 YΣ2 d(P + Q).
Z 12 12
dPΣ dQΣ
dµ.
dµ dµ
Also, check that PΣ ⊥ QΣ if and only if P, Q Σ = 0.
(ii) Suppose that Fn : n ∈ N is a non-decreasing sequence of sub-σ-algebras
of F, and show that (P, Q)Fn −→ (P, Q)W∞ Fn .
0
P∞
depending on whether 0 σn−2 (bn − an )2 converges or diverges.
Exercise 5.2.43. Let {Xn : n ∈ Z+ } be a sequence of identically distributed,
mutually independent, integrable, mean value P
0, R-valued random variables on
n
the probability space (Ω, F, P), and set Sn = 1 Xm for n ∈ Z+ . In Exercise
Exercises for § 5.2 231
1.4.28 we showed that limn→∞ |Sn | < ∞ P-almost surely. Here we will show
that
As was mentioned before, this result was proved first by K.L. Chung and W.H.
Fuchs. The basic observation behind the present proof is due to A. Perlin, who
noticed that, by the Hewitt–Savage 0–1 Law, limn→∞ |Sn | = L P-almost surely
for some L ∈ [0, ∞). Thus, the problem is to show that L = 0, and we will do
this by an simple argument invented by A. Yushkevich.
(i) Assuming that L > 0, use the Hewitt–Savage 0–1 Law to show that
L
P |Sn − x| < 3 i.o. = 0 for any x ∈ R,
where “i.o.” stands for “infinitely often” and means here “for infinitely many
n’s.”
Hint: Set ρ = L3 . Begin by observing that, because {Sm+n − Sm : n ∈ Z+ }
has the same P-distribution as {Sn : n ∈ Z+ }, P(|Sm+n − Sm | < 2ρ i.o.) = 0 for
any m ∈ Z+ . Thus, since |Sm+n − x| ≥ |Sm+n − Sm | − |Sm − x|, P(|Sn − x| <
ρ i.o.) ≤ P(|Sm − x| ≥ ρ) for any m ∈ Z+ . Moreover, by the Hewitt–Savage
0–1 Law, P(|Sn − x| < ρ i.o.) ∈ {0, 1}. Hence, either P(|Sn − x| < ρ i.o.) = 0,
or one has the contradiction that P(|Sm − x| < ρ) = 0 for all m ∈ Z+ and yet
P(|Sn − x| < ρ i.o.) = 1.
(ii) Still assuming that L > 0, argue that
L L
P |Sn − L| < 3 i.o. ∨ P |Sn + L| < 3 i.o. = 1,
After noting that Fn : n ∈ N is non-increasing, use the convergence result for
reversed martingales in Theorem 5.2.21 to see that the expansion
∞
X
f = f, 1 L2 ([0,1);C)
+ ∆m (f )
m=0
4 When f is a function with the property that (f, e` )L2 ([0,1);C) = 0 for all ` ∈ Z\{2m : m ∈ N},
the preceding almost everywhere convergence result can be interpreted as saying that the
Fourier series of f converges almost everywhere, a result that was discovered originally by
Kolmogorov. The proof suggested here is based on fading memories of a conversation with
N. Varopolous. Of course, ever since L. Carleson’s definitive theorem on the almost every
convergence of the Fourier series of an arbitrary square integrable function, the interest in this
result of Kolmogorov is mostly historical.
Chapter 6
Some Extensions and Applications
of Martingale Theory
Many of the results obtained in § 5.2 admit easy extensions to both infinite
measures and Banach space–valued random variables. Furthermore, in many
applications, these extensions play a useful, and occasionally essential, role. In
the first section of this chapter, I will develop some of these extensions, and in the
second section I will show how these extensions can be used to derive Birkhoff’s
Individual Ergodic Theorem. The final section is devoted to Burkholder’s In-
equality for martingales, an estimate that is second in importance only to Doob’s
Inequality.
§ 6.1 Some Extensions
Throughout
discussion that follows, (Ω, F, µ) will be a measure space and
the
Fn : n ∈ N will be a non-decreasing sequence of sub-σ-algebras with the
property that µ F0 is σ-finite. In particular, this means that the conditional
expectation of a locally µ-integrable random variable given Fn is well defined (cf.
Theorem 5.1.12) even if the random variable takes
values in a separable Banach
space E. Thus, I will say that the sequence Xn ; n ∈ N of E-valued random
variables is a µ-martingale with respect to Fn : n ∈ N , or, more briefly,
that the triple Xn , Fn , µ is a martingale, if {Xn : n ∈ N} is Fn : n ∈ N -
progressively measurable, each Xn is locally µ-integrable, and
233
234 6 Some Extensions and Applications
Theorem 6.1.1. Let Xn , Fn , µ be an R-valued µ-submartingale. Then, for
each N ∈ N and A ∈ F0 on which XN is µ-integrable,
1
(6.1.2) µ max Xn ≥ α ∩ A ≤ Eµ XN , max Xn ≥ α ∩ A
0≤n≤N α 0≤n≤N
for all α ∈ (0, ∞); and so, when all the Xn ’s are non-negative, for every p ∈
(1, ∞) and A ∈ F0 ,
p1
p 1
Eµ sup |Xn |p , A ≤ sup Eµ |Xn |p , A p .
n∈N p − 1 n∈N
Furthermore, for each stopping time ζ, Xn∧ζ , Fn , µ is a submartingale or a
martingale depending on whether Xn , Fn , µ is a submartingale or a martingale.
In addition, for any pair of bounded stopping times ζ ≤ ζ 0 ,
Xζ ≤ Eµ Xζ 0 Fζ
(a.e., µ),
and the inequality is an equality in the martingale case. Finally, given a < b
and A ∈ F0 ,
µ
Eµ (Xn − a)+ , A
E U[a,b] , A ≤ sup ,
n∈N b−a
where U[a,b] (ω) denotes the precise number of times that {Xn (ω) : n ≥ 1}
upcrosses [a, b] (cf. the discussion preceding Theorem 5.2.15), and therefore
These partitions are nicely meshed in the sense that the (n + 1)st is a refinement
of the nth. Equivalently, if Fn denotes the σ-algebra over RN generated by the
partition Pn , then Fn ⊆ Fn+1 . Moreover, if f ∈ L1 (RN ; R) and
Z
f nN f (y) dy for x ∈ Cn (k) and k ∈ ZN ,
Xn (x) ≡ 2
Cn (k)
where ( Z )
(0) 1 [
M f (x) = sup |f (y)| dy : x ∈ Q ∈ Pn
|Q| Q n∈Z
and I have used |Γ| to denote λRN (Γ), the Lebesgue measure of Γ.
At first sight, one might hope that it should be possible to pass directly from
(6.1.5) to analogous estimates on the level sets of Mf . However, the passage
from (6.1.5) to control on Mf is not as easy as it might appear at first: the
“sup” in the definition of Mf involves many more cubes than the one in the
definition of M(0) f . For this reason I will have to introduce additional families
of meshed partitions. Namely, for each η ∈ {0, 1}N , set
(−1)n η
N
Pn (η) = + Cn (k) : k ∈ Z ,
3 × 2n
then exactly the same argument that (when η = 0) led us to (6.1.5) can now be
used to get
Z
n
N (η)
o 1
(*) x ∈ R : M f (x) ≥ α ≤ f (y) dy
α
{M(η) f ≥α}
for each η ∈ {0, 1}N and α ∈ (0, ∞). Finally, if Q is given by (6.1.3) and
r ≤ 3 12n , then it is possible to find an η ∈ {0, 1}N and a C ∈ Pn (η) for which
Q ⊆ C. (To see this, first reduce to the case when N = 1.) Hence,
After combining this with the estimate in (*), we arrive at the following version
of the Hardy–Littlewood Maximal Inequality:
n o (12)N Z
(6.1.6) x ∈ RN : Mf (x) ≥ α ≤ |f (y)| dy.
α RN
Next, even though the result in Exercise 1.4.18 was stated for probability mea-
sures, it applies equally well to any finite measure. Thus, we now know that
Z ! p1
(η) (η) p p
kM f kLp (RN ;R) = lim (M f ) (x) dx ≤ kf kLp (RN ;R) ,
R→∞ B(0,R) p−1
where, for each x ∈ RN , the limit is taken over balls B that contain x and tend
to x in the sense that their radii shrink to 0. In particular,
Z
1
f (x) = lim f (y) dy for λRN -almost every x ∈ RN .
B&{x} |B| B
Proof: I begin with the observation that, for each f ∈ L1 (RN ; R),
Z
1 f (y) dy ≤ κN Mf (x), x ∈ RN ,
M̃f (x) ≡ sup
B3x |B| B
238 6 Some Extensions and Applications
2N
where κn = Ω N
with ΩN = B(0, 1). Second, notice that (6.1.9) for every
x ∈ RN is trivial when f ∈ Cc (RN ; R). Hence, all that remains is to check that
if fn −→ f in L1 (RN ; R) and if (6.1.9) holds for each fn , then it holds for f . To
this end, let > 0 be given and check that, because of the preceding and (6.1.6),
Z
x : lim 1
f (y) − f (x) dy ≥
B&{x} |B| B
n o
≤ x : M̃(f − fn )(x) ≥
3
Z
1
fn (y) − fn (x) dy ≥
+ x : lim
B&{x} |B| B 3
n o
+ x : fn (x) − f (x) ≥
3
3
≤ 1 + (12)N κN kf − fn kL1 (RN )
for every n ∈ Z+ . Hence, after letting n → ∞, we get (6.1.9) f .
Although applications like Lebesgue’s Differentiation Theorem might make
one think that (6.1.6) is most interesting because of what it says about averages
over small cubes, its implications for large cubes are also significant. In fact, as I
will show in § 6.2, it allows one to prove Birkhoff’s Individual Ergodic Theorem
(cf. Theorem 6.2.7), which may be viewed as a result about differentiation at
infinity. The link between ergodic theory and the Hardy–Littlewood Inequality
is provided by the following deterministic
version
of the Maximal Ergodic Lemma
(cf. Lemma 6.2.1). Namely, let ak : k ∈ ZN be a summable subset of [0, ∞),
and set
1 X
S n (k) = N
aj+k , n ∈ N and k ∈ ZN ,
(2n)
j∈Qn
where Qn = j ∈ ZN : −n ≤ ji < n for 1 ≤ i ≤ N . By applying (6.1.6) and
(6.1.7) to the function f given by (cf. (6.1.4)) f (x) = ak when x ∈ C0 (k), we
see that
(12)N X
N
(6.1.10) card k ∈ Z : sup S n (k) ≥ α ≤ ak , α ∈ (0, ∞)
n∈Z+ α N k∈Z
and
! p1 ! p1
X (12)N p X
(6.1.11) sup |S n (k)|p ≤ |ak |p for p ∈ (1, ∞].
n∈Z+ p−1
k∈ZN k∈ZN
for the game of cricket. What Hardy wanted to find is the optimal order in
which to arrange batters to maximize the average score per inning. Thus, he
worked with a non-negative sequence {ak : k ≥ 0} in which ak represented the
expected number of runs scored by player k, and what he showed is that, for
each α ∈ (0, ∞),
k ∈ N : sup S n (k) ≥ α
+
n∈Z
Although this sharpened result can also be obtained as a corollary the Sunrise
Lemma,1 Hardy’s approach remains the most appealing.
§ 6.1.2. Banach Space–Valued Martingales. I turn next to martingales
with values in a separable Banach space. Actually, everything except the easiest
aspects of this topic becomes extremely complicated and technical very quickly,
and, for this reason, I will restrict my attention to those results that do not
involve any deep properties of the geometry of Banach spaces. In fact, the only
general theory with which I will deal is contained in the following.
Theorem 6.1.12. Let E be a separable Banach space and X n , Fn , µ an E-
valued martingale. Then kXn kE , Fn , µ is a non-negative submartingale and
therefore, for each N ∈ Z+ and all α ∈ (0, ∞),
1 µ
(6.1.13) µ sup kXn kE ≥ α ≤ E kXN kE , sup kXn kE ≥ α .
0≤n≤N α 0≤n≤N
1See Lemma 3.4.5 in my A Concise Introduction to the Theory of Integration, Third Edition,
Birkhauser (1998).
240 6 Some Extensions and Applications
Proof: The fact kXn kE , Fn , µ is a submartingale is an easy application of
the inequality in (5.1.14); and, given this fact, the inequalities in (6.1.13) and
(6.1.14) follow from the corresponding inequalities in Theorem 6.1.1.
W∞ While proving the convergence statement, I may and will assume that F =
p µ
0 Fn . Now let X ∈ L (µ; E) be given, and set Xn = E [X|Fn ], n ∈ N.
Because of (6.1.13) and (6.1.14), we know (cf. the proofs of Corollary 5.2.4 and
Theorem 6.1.8) that the set of X for which Xn −→ X (a.e., µ) is a closed
subset of Lp (µ; E). Moreover, if X is µ-simple, then the µ-almost everywhere
convergence of Xn to X follows easily from the R-valued result. Hence, we
now know that Xn −→ X (a.s, µ) for each X ∈ L1 (µ; E). In addition, because
of (6.1.14), when p ∈ (1, ∞), the convergence in Lp (µ; E) follows by Lebesgue’s
Dominated Convergence Theorem. Finally, to prove the convergence in L1 (µ; E)
when X ∈ L1 (µ; E), note that, by Fatou’s Lemma,
kXkL1 (µ;E) ≤ lim kXn kL1 (µ;E) ,
n→∞
Hence, because
kXn kE − kXkE − kXn − XkE ≤ 2kXkE ,
kψt ? f kLp (RN ;R) ≤ kψkL1 (RN ;R) kf kLp (RN ;R) , t ∈ (0, ∞) and p ∈ [1, ∞],
2 This proof, which seems to have been the first, of the Strong Law for Banach spaces was
given by E. Mourier in “Eléments aléatoires dans un espace de Banach,” Ann. Inst. Poincaré
13, pp. 166–244 (1953).
242 6 Some Extensions and Applications
and Z
ψt ? f (x) = ψ(y) f (x − ty) dy,
RN
where M̃f is the quantity introduced at the beginning of the proof of Theorem
6.1.8. In particular, conclude that there is a constant KN ∈ (0, ∞), depending
only on N ∈ Z+ , such that
x ∈ RN .
Mψ f (x) ≡ sup ψt ? f (x) ≤ KN A Mf (x),
t∈(0,∞)
N
ΠR
t . In both these cases, A = N .
and finally apply the last part of Theorem 6.1.12 to see that Xn −→ X P-almost
surely.
Exercise 6.1.19. This exercise deals with a variation, proposed by Jan Myciel-
ski, on the sort of search algorithm discussed in § 5.2.5. Let G be a non-empty,
bounded, open subset of R^N with the property that λ_{R^N}( B(x, r) ∩ G ) ≥ αΩ_N r^N
for some α > 0 and all x ∈ G and 0 < r ≤ diam(G), and define µ on (G, B_G)
by µ(Γ) = λ_{R^N}(Γ ∩ G)/λ_{R^N}(G). Next, let (Ω, F, P) be a probability space on which there
exist sequences {X_n : n ≥ 1} and {Z_n : n ≥ 1} of G-valued random variables
with the properties that the X_n's are mutually independent and have distribu-
tion µ, Z_n is independent of {X_1, . . . , X_n} and has distribution ν_n ≪ µ for each
n ≥ 1, and K_r ≡ sup_{n≥1} ‖dν_n/dµ‖_{L^r(µ;R)} < ∞ for some r ∈ (1, ∞). Without loss
in generality, assume that n ≠ n′ =⇒ X_n(ω) ≠ X_{n′}(ω) for all ω ∈ Ω. For each
n ≥ 1, let Y_n(ω) be the element of {X_1(ω), . . . , X_n(ω)} which is closest to
Z_n(ω). That is, if Σ_n is the permutation group on {1, . . . , n} and, for π ∈ Σ_n,

   A_n(π) = { ω : |X_{π(m)}(ω) − Z_n(ω)| < |X_{π(m−1)}(ω) − Z_n(ω)| for 2 ≤ m ≤ n },

then Y_n = X_{π(n)} on A_n(π). Show that for all Borel measurable f : G −→ R,
|f(Y_n) − f(Z_n)| −→ 0 in P-probability. Here are some steps that you might want
to follow (see also the simulation sketched after these steps).
Next, for n ≥ 2, set r_n(ω) = |X_{n−1}(ω) − z|, and show that

   E^P[ f(X_n), A_n(z) ] = E^P[ ∫_{B(z,r_n)} f dµ , A_{n−1}(z) ] ≤ M_G f(z) P( A_n(z) ),
(iii) Given the conclusion drawn at the end of (ii), proceed as in the derivation
of Theorem 5.2.34 from Lemma 5.2.31 to get the desired result.
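A minimal simulation of Mycielski's scheme makes the claimed convergence plausible.
In the sketch below (an illustration under simplifying assumptions, not part of the
exercise) G is the unit square, ν_n = µ so that K_r = 1, and f is an arbitrary
bounded measurable choice:

    import numpy as np

    rng = np.random.default_rng(1)

    def f(p):                                  # any bounded measurable f on G
        return np.sin(5 * p[0]) + p[1] ** 2

    def gap(n):                                # one sample of |f(Y_n) - f(Z_n)|
        X = rng.random((n, 2))                 # X_1,...,X_n i.i.d. ~ mu on G
        Z = rng.random(2)                      # Z_n, here also with law mu
        Y = X[np.argmin(((X - Z) ** 2).sum(axis=1))]   # closest X_m to Z_n
        return abs(f(Y) - f(Z))

    for n in (10, 100, 1000):
        g = np.array([gap(n) for _ in range(2000)])
        print(n, (g > 0.05).mean())            # P(|f(Y_n)-f(Z_n)| > 0.05) shrinks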
§ 6.2 Elements of Ergodic Theory
Among the two or three most important general results about dynamical systems
is G. D. Birkhoff's Individual Ergodic Theorem. In this section, I will present a
generalization, due to N. Wiener, of Birkhoff's basic theorem.
The setting in which I will prove the Ergodic Theorem will be the following.
(Ω, F, µ) will be a σ-finite measure space on which there exists a semigroup
{Σ^k : k ∈ N^N} of measurable, µ-measure preserving transformations.
That is, for each k ∈ N^N, Σ^k is an F-measurable map from Ω into itself, Σ^0 is
the identity map, Σ^{k+ℓ} = Σ^k ∘ Σ^ℓ for all k, ℓ ∈ N^N, and

   µ(Γ) = µ( (Σ^k)^{−1}(Γ) )   for all k ∈ N^N and Γ ∈ F.
(6.2.2)   µ( sup_{n≥1} ‖A_n F‖_E ≥ λ ) ≤ (24^N/λ) ‖F‖_{L^1(µ;E)},   λ ∈ (0, ∞),

or

(6.2.3)   ‖ sup_{n≥1} ‖A_n F‖_E ‖_{L^p(µ)} ≤ (24^N p/(p−1)) ‖F‖_{L^p(µ;E)}.

   a_k(ω) ≡ F ∘ Σ^k(ω) if k ∈ Q^+_{2n},   a_k(ω) ≡ 0 if k ∉ Q^+_{2n},
1The idea of using Hardy’s Inequality was suggested to P. Hartman by J. von Neumann and
appears for the first time in Hartman’s “On the ergodic theorem,” Am. J. Math. 69, pp.
193–199 (1947).
and

   ∑_{k∈Q^+_n} max_{1≤m≤n} ‖A_m F ∘ Σ^k(ω)‖^p_E ≤ (12^N p/(p−1))^p ∑_{k∈Q^+_{2n}} ‖F ∘ Σ^k(ω)‖^p_E.

   ∑_{k∈Q^+_n} µ( max_{1≤m≤n} ‖A_m F ∘ Σ^k‖_E ≥ λ ) = ∫ C_n(ω) µ(dω)
      ≤ (12^N/λ) ∑_{k∈Q^+_{2n}} ∫ ‖F ∘ Σ^k‖_E dµ,

and, similarly,

   ∑_{k∈Q^+_n} ∫ max_{1≤m≤n} ‖A_m F ∘ Σ^k‖^p_E dµ ≤ (12^N p/(p−1))^p ∑_{k∈Q^+_{2n}} ∫ ‖F ∘ Σ^k‖^p_E dµ.

Finally, since the distributions of max_{1≤m≤n} ‖A_m F ∘ Σ^k‖_E and ‖F ∘ Σ^k‖_E do not
depend on k ∈ N^N, the preceding lead immediately to

   µ( max_{1≤m≤n} ‖A_m F‖_E ≥ λ ) ≤ (24^N/λ) ‖F‖_{L^1(µ)}

and

   ‖ max_{1≤m≤n} ‖A_m F‖_E ‖_{L^p(µ)} ≤ (24^N p/(p−1)) ‖F‖_{L^p(µ)}

for all n ∈ Z^+. Thus, (6.2.2) and (6.2.3) follow after one lets n → ∞.
Given (6.2.2) and (6.2.3), I adopt again the strategy used in the proof of
Corollary 5.2.4. That is, I must begin by finding a dense subset of each L^p-space
on which the desired convergence results can be checked by hand, and for this
purpose I will have to introduce the notion of invariance.
A set Γ ∈ F is said to be invariant, and I write Γ ∈ I, if Γ = (Σ^k)^{−1}(Γ) for
every k ∈ N^N. As is easily checked, I is a sub-σ-algebra of F. In addition, it
is clear that Γ ∈ F is invariant if Γ = (Σ^{e_j})^{−1}(Γ) for each 1 ≤ j ≤ N, where
{e_j : 1 ≤ j ≤ N} is the standard orthonormal basis in R^N. Finally, if Ī is the
µ-completion of I relative to F in the sense that Γ ∈ Ī if and only if Γ ∈ F and
there is Γ̃ ∈ I such that µ(Γ∆Γ̃) = 0 (A∆B ≡ (A∖B) ∪ (B∖A) is the symmetric
difference between the sets A and B), then an F-measurable F : Ω −→ E is
Ī-measurable if and only if F = F ∘ Σ^k (a.e., µ) for each k ∈ N^N. Indeed, one
need only check this equivalence for indicator functions of sets. But if Γ ∈ F
and µ(Γ∆Γ̃) = 0 for some Γ̃ ∈ I, then
   µ( Γ∆(Σ^k)^{−1}(Γ) ) ≤ µ( (Σ^k)^{−1}(Γ∆Γ̃) ) + µ(Γ∆Γ̃) = 0,
Proof: I begin with the case when E = R. The first step is to identify the
orthogonal complement I(R)^⊥ of I(R). To this end, let N denote the subspace
of L²(µ; R) consisting of elements having the form g − g ∘ Σ^{e_j} for some g ∈
L²(µ; R) ∩ L^∞(µ; R) and 1 ≤ j ≤ N. Given f ∈ I(R), observe that

   ( f, g − g ∘ Σ^{e_j} )_{L²(µ;R)} = ( f, g )_{L²(µ;R)} − ( f ∘ Σ^{e_j}, g ∘ Σ^{e_j} )_{L²(µ;R)} = 0,

and so

   ‖Π_{I(E)} F‖²_{L²(µ;E)} ≤ ∫ ( ∑_{i=1}^ℓ ‖a_i‖_E Π_{I(R)} 1_{Γ_i} )² dµ
      = ‖ Π_{I(R)}( ∑_{i=1}^ℓ ‖a_i‖_E 1_{Γ_i} ) ‖²_{L²(µ;R)} ≤ ∑_{i=1}^ℓ ‖a_i‖²_E µ(Γ_i) = ‖F‖²_{L²(µ;E)}.
Thus, since the space of µ-simple functions is dense in L2 (µ; E), it is clear that
ΠI(E) not only exists but is also unique.
Finally, to check (6.2.5) for general E’s, note that (6.2.5) for E-valued, µ-
simple F ’s is an immediate consequence of (6.2.5) for E = R. Thus, we already
know (6.2.5) for a dense subspace of L2 (µ; E), and so the rest is another elemen-
tary application of (6.2.3).
§ 6.2.2. Birkhoff's Ergodic Theorem. For any p ∈ [1, ∞), let I_p(E) denote
the subspace of Ī-measurable elements of L^p(µ; E). Clearly I_p(E) is closed for
every p ∈ [1, ∞). Moreover, since

(6.2.6)   µ(Ω) < ∞ =⇒ Π_{I(E)} F = E^µ[ F | Ī ],

when µ is finite Π_{I(E)} extends automatically as a linear contraction from L^p(µ; E)
onto I_p(E) for each p ∈ [1, ∞), the extension being given by the right-hand side
of (6.2.6). However, when µ(Ω) = ∞, there is a problem. Namely, because µ↾Ī
will seldom be σ-finite, it will not be possible to condition µ with respect to Ī.
Be that as it may, (6.2.5) provides an extension of Π_{I(E)}. Namely, from (6.2.5)
and Fatou's Lemma, it is clear that, for each p ∈ [1, ∞),

   ‖Π_{I(E)} F‖_{L^p(µ;E)} ≤ ‖F‖_{L^p(µ;E)},   F ∈ L^p(µ; E) ∩ L²(µ; E),
Proof: As I said above, the proof is now an easy application of the strategy
used to prove Corollary 5.2.4. Namely, by (6.2.2), the set of F ∈ L^1(µ; E) for
which (6.2.8) holds is closed and, by (6.2.5), it includes L^1(µ; E) ∩ L^∞(µ; E).
Hence, (6.2.8) is proved for p = 1. On the other hand, when p ∈ (1, ∞),
(6.2.3) applies and shows first that the set of F ∈ L^p(µ; E) for which (6.2.8)
holds is closed in L^p(µ; E) and second that µ-almost everywhere convergence
already implies convergence in L^p(µ; E). Hence, we have proved that (6.2.8)
holds and that the convergence is in L^p(µ; E) when p ∈ (1, ∞). In addition,
when µ(Γ) ∧ µ(Γ∁) = 0 for all Γ ∈ I, it is clear that the only elements of I_p(E)
are µ-almost everywhere constant, which, in the case when µ(Ω) < ∞, means (cf.
(6.2.6)) that Π_{I(E)} F = E^µ[F]/µ(Ω), and, when µ(Ω) = ∞, means that I_p(E) = {0}
for all p ∈ [1, ∞).
In view of the preceding, all that remains is to discuss the L1 (µ; E) convergence
in the case when p = 1 and µ(Ω) < ∞. To this end, observe that, because the
An ’s are all contractions in L1 (µ; E), it suffices to prove L1 (µ; E) convergence
for E-valued, µ-simple F ’s. But L1 (µ; E) convergence for such F ’s reduces
to showing that An f −→ ΠI(R) f in L1 (µ; R) for non-negative f ∈ L∞ (µ; R).
Finally, if f ∈ L^1( µ; [0, ∞) ), then

   ‖A_n f‖_{L^1(µ)} = ‖f‖_{L^1(µ)} = ‖Π_{I(R)} f‖_{L^1(µ;R)},   n ∈ Z^+,
where, in the last equality, I used (6.2.6); and this, together with (6.2.8), implies
(cf. the final step in the proof of Theorem 6.1.12) convergence in L1 (µ).
I will say that the semigroup {Σ^k : k ∈ N^N} is ergodic on (Ω, F, µ) if, in addition
to being µ-measure preserving, it satisfies µ(Γ) ∧ µ(Γ∁) = 0 for every Γ ∈ I.
Classic Example. In order to get a feeling for what the Ergodic Theorem is
saying, take µ to be Lebesgue measure on the interval [0, 1) and, for a given
α ∈ (0, 1), define Σα : [0, 1) −→ [0, 1) so that
Σα (ω) ≡ ω + α − [ω + α] = ω + α mod 1.
If α is rational and m is the smallest element of Z^+ with the property that
mα ∈ Z^+, then it is clear that, for any F on [0, 1), F ∘ Σ_α = F if and only if F
has period 1/m. Hence, if F ∈ L²( [0, 1); C ) and

   c_ℓ(F) ≡ ∫_{[0,1)} F(ω) e^{−√−1 2πℓω} dω,   ℓ ∈ Z,

then elementary Fourier analysis leads to the conclusion that, in this case,

   lim_{n→∞} A_n F(ω) = ∑_{ℓ∈Z} c_{mℓ}(F) e^{√−1 2mℓπω}   for Lebesgue-almost every ω ∈ [0, 1).
On the other hand, if α is irrational, then {Σ^k_α : k ∈ N} is µ-ergodic on [0, 1).
To see this, suppose that F ∈ I(C). Then (cf. the preceding and use Parseval's
Identity)

   0 = ‖F − F ∘ Σ_α‖²_{L²([0,1);C)} = ∑_{ℓ∈Z} |c_ℓ(F) − c_ℓ(F ∘ Σ_α)|².

But, clearly,

   c_ℓ(F ∘ Σ_α) = e^{√−1 2πℓα} c_ℓ(F),   ℓ ∈ Z,
and so (because α is irrational) c_ℓ(F) = 0 for each ℓ ≠ 0. In other words, the only
elements of I(C) are µ-almost everywhere constant. Thus, for each irrational
α ∈ (0, 1), p ∈ [1, ∞), separable Banach space E, and F ∈ L^p( [0, 1); E ),

   lim_{n→∞} A_n F = ∫_{[0,1)} F(ω) dω   Lebesgue-almost everywhere and in L^p(µ; E).
Finally, notice that the situation changes radically when one moves from [0, 1) to
[0, ∞) and again takes µ to be Lebesgue measure and α ∈ (0, 1) to be irrational.
If I extend the definition of Σ_α by taking Σ_α(ω) = ⌊ω⌋ + Σ_α(ω − ⌊ω⌋) for ω ∈
[0, ∞), then it is clear that the invariant functions are those that are constant on each
interval [m, m+1) and that, Lebesgue-almost surely, A_n f(ω) −→ ∫_{⌊ω⌋}^{⌊ω⌋+1} f(η) dη.
On the other hand, if one defines Σα (ω) = ω + α, then every invariant set that
has non-zero measure will have infinite measure, and so, now, every choice of
α ∈ (0, 1) (not just irrational ones) will give rise to an ergodic system. In
particular, one will have, for each p ∈ [1, ∞) and F ∈ Lp (µ; E),
lim An F = 0 Lebesgue-almost everywhere,
n→∞
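The irrational-rotation case is also pleasant to watch numerically. The following
sketch is my illustration (the averages here run over k = 1, . . . , n, a harmless
variant of A_n): the ergodic averages settle down to ∫_{[0,1)} F dω = 0:

    import numpy as np

    alpha = (np.sqrt(5) - 1) / 2                  # an irrational rotation number
    F = lambda w: np.cos(2 * np.pi * w)           # integral over [0,1) is 0

    n = 10 ** 6
    orbit = (0.3 + alpha * np.arange(1, n + 1)) % 1.0   # Sigma_alpha^k(0.3)
    A = np.cumsum(F(orbit)) / np.arange(1, n + 1)       # ergodic averages
    print(A[99], A[9_999], A[-1])                 # tends to int F d(omega) = 0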
has the same (joint) distribution under P as F itself. Clearly, one can test for
stationarity by checking that the distribution of F_{e_j} is the same as that of F for
each 1 ≤ j ≤ N. In order to apply the considerations of § 6.2.1 to stationary
families, note that all questions about the properties of F can be phrased in
terms of the following canonical setting. Namely, set 𝐄 = E^{N^N} and define µ
on (𝐄, B_𝐄) to be the image measure F_*P. In other words, for each Γ ∈ B_𝐄,
µ(Γ) = P( F ∈ Γ ). Next, for each ℓ ∈ N^N, define Σ^ℓ : 𝐄 −→ 𝐄 to be the natural
shift transformation on 𝐄 given by Σ^ℓ(x)_k = x_{k+ℓ} for all k ∈ N^N. Obviously,
stationarity of F is equivalent to the statement that {Σ^k : k ∈ N^N} is µ-measure
preserving. Moreover, if I is the σ-algebra of shift invariant elements Γ ∈ B_𝐄
(i.e., Γ = (Σ^k)^{−1}(Γ) for all k ∈ N^N), then, by Theorem 6.2.7, for any separable
Banach space B, any p ∈ [1, ∞), and any F ∈ L^p(P; B),

   lim_{n→∞} (1/n^N) ∑_{k∈Q^+_n} F ∘ F_k = E^P[ F ∘ F | F^{−1}(I) ]   (a.s., P) and in L^p(P; B).

In particular, when {Σ^k : k ∈ N^N} is ergodic on (𝐄, B_𝐄, µ), I will say that the
family F is ergodic and conclude that the preceding can be replaced by

(6.2.9)   lim_{n→∞} (1/n^N) ∑_{k∈Q^+_n} F ∘ F_k = E^P[ F ∘ F ]   (a.s., P) and in L^p(P; B).
So far I have discussed one-sided stationary families, that is, families indexed
by N^N. However, for various reasons (cf. Theorem 6.2.11) it is useful to know
that one can usually embed a one-sided stationary family into a two-sided one. In
terms of the semigroup of shifts, this corresponds to the trivial observation that
the semigroup {Σ^k : k ∈ N^N} on 𝐄 = E^{N^N} can be viewed as a sub-semigroup
of the group of shifts {Σ^k : k ∈ Z^N} on 𝐄̂ = E^{Z^N}. With these comments in
mind, I will prove the following.
Lemma 6.2.10. Assume that E is a complete, separable, metric space and that
F = {X_k : k ∈ N^N} is a stationary family of E-valued random variables on the
probability space (Ω, F, P). Then there exists a probability space (Ω̂, F̂, P̂) and
a family F̂ = {X̂_k : k ∈ Z^N} with the property that, for each ℓ ∈ Z^N,
F̂_ℓ ≡ {X̂_{k+ℓ} : k ∈ N^N}
   Λ_n = { k ∈ Z^N : k_j ≥ −n for 1 ≤ j ≤ N },

Hence the µ_n's are consistently defined on the spaces E^{Λ_n}, and therefore Kol-
mogorov's Extension Theorem applies and guarantees the existence of a unique
Borel probability measure µ on E^{Z^N} with the property that

   µ( E^{Z^N∖Λ_n} × Γ ) = µ_n(Γ)   for all n ≥ 0 and Γ ∈ B_{E^{Λ_n}}.
Thus, if

   λ_Γ(ω̂) ≡ inf{ k ∈ N : Û_{−k}(ω̂) = 1 },

then

   P( ρ_Γ ≥ n, X_0 ∈ Γ ) = P̂( λ_Γ = n − 1 ),   n ∈ Z^+,

and so

   E^P[ ρ_Γ, X_0 ∈ Γ ] = P̂( λ_Γ < ∞ ).

Now observe that

   P̂( λ_Γ > n ) = P̂( Û_{−n} = 0, . . . , Û_0 = 0 ) = P( X_0 ∉ Γ, . . . , X_n ∉ Γ ),
Clearly,

   ω ∈ G(F) =⇒ Σ_t(ω) ∈ G(F)   for every t ∈ [0, ∞)^N.

In addition, if F ∈ L^p(µ; E) for some p ∈ [1, ∞), then

   ∫_Ω ∫_{[0,T)^N} ‖F ∘ Σ_t(ω)‖^p_E dt µ(dω) = T^N ‖F‖^p_{L^p(µ;E)} < ∞,

and so

   F ∈ ⋃_{p∈[1,∞)} L^p(µ; E) =⇒ µ( G(F)∁ ) = 0.

   lim_{T→∞} A_T F = E^µ[F]/µ(Ω)   (a.e., µ),
where it is understood that the ratio is 0 when the denominator is infinite.
(6.2.14)   µ( sup_{T>0} ‖A_T F‖_E ≥ λ ) ≤ (24^N/λ) ‖F‖_{L^1(µ;E)},   λ ∈ (0, ∞),

and

(6.2.15)   ‖ sup_{T>0} ‖A_T F‖_E ‖_{L^p(µ;E)} ≤ (24^N p/(p−1)) ‖F‖_{L^p(µ;E)}   for p ∈ (1, ∞).

   lim_{n→∞} ‖ sup_{n≤T≤n+1} ‖A_T F − A_n F̂‖_E ‖_{L^p(µ;R)} = 0   for every p ∈ [1, ∞].
Hence, for F ∈ L^1(µ; E) ∩ L^∞(µ; E), (6.2.13) follows from (6.2.8). As for the case
when µ(Ω) < ∞, all that we have to do is check that Π_{Î(E)} F = E^µ[ F | Î ] (a.e., µ).
However, from (6.2.13), it is easy to see that Π_{Î(E)} F is measurable with respect
to the µ-completion of Î, and so it suffices to show that
Exercise 6.2.16. Given an irrational α ∈ (0, 1) and an ε ∈ (0, 1), let N_n(α, ε)
be the number of 1 ≤ m ≤ n with the property that

   |α − ℓ/m| ≤ ε/(2m)   for some ℓ ∈ Z.
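Applying the Ergodic Theorem to the rotation by α leads one to guess that
N_n(α, ε)/n −→ ε. A quick check of that guess (my sketch, using the equivalence
|α − ℓ/m| ≤ ε/(2m) ⟺ |mα − ℓ| ≤ ε/2):

    import numpy as np

    alpha, eps, n = np.sqrt(2) - 1, 0.1, 200_000
    m = np.arange(1, n + 1)
    frac = np.abs(alpha * m - np.round(alpha * m))   # min over l of |m*alpha - l|
    N_n = int(np.count_nonzero(frac <= eps / 2))     # i.e. |alpha - l/m| <= eps/(2m)
    print(N_n / n)                                    # ~ eps = 0.1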
On the other hand, it is not at all clear how to compare the size of Yn to that
of Xn in any of the Lp spaces other than p = 2.
The problem of finding such a comparison was given a definitive solution by D.
Burkholder, and I will present his solution in this section. Actually, Burkholder
solved the problem twice. His first solution was a beautiful adaptation of general
ideas and results that had been developed over the years to solve related prob-
lems in probability theory and analysis and, as such, did not yield the optimal
solution. His second approach is designed specifically to address the problem
at hand and bears little or no resemblance to familiar techniques. It is entirely
original, remarkably elementary and effective, but somewhat opaque. The ap-
proach is the outgrowth of many years of deep thinking that Burkholder devoted
to the topic, and the reader who wants to understand the path that led him to
it should consult the explanation that he wrote.1
§ 6.3.1. Burkholder’s Comparison Theorem. Burkholder’s basic result is
the following comparison theorem.
Theorem 6.3.1 (Burkholder). Let (Ω, F, P) be a probability space, {F_n :
n ∈ N} a non-decreasing sequence of sub-σ-algebras of F, and E and F a pair
of (real or complex) separable Hilbert spaces. Next, suppose that (X_n, F_n, P)
and (Y_n, F_n, P) are, respectively, E- and F-valued martingales. If
I may assume that both E and F are complex Hilbert spaces, since we can always
complexify them, and, in addition, that E = F , since, if that is not already the
case, I can embed them in E ⊕ F . Thus, I will be making these assumptions
throughout.
The heart of the proof lies in the computations contained in the following two
lemmas.
Lemma 6.3.3. Let p ∈ (1, ∞) be given, and set

   u(x, y) ≡ ( ‖y‖_E − (p − 1)‖x‖_E ) ( ‖x‖_E + ‖y‖_E )^{p−1},   (x, y) ∈ E².

Then

   ‖y‖^p_E − B^p_p ‖x‖^p_E ≤ α_p u(x, y),   (x, y) ∈ E².
Proof: When p = 2, there is nothing to do. Thus, I will assume that p ∈
(1, ∞) \ {2}.
Observe that it suffices to show that, for all (x, y) ∈ E² satisfying ‖x‖_E +
‖y‖_E = 1, depending on whether p ∈ (2, ∞) or p ∈ (1, 2),

(*)   ‖y‖^p_E − (p − 1)^p ‖x‖^p_E ≤ p^{2−p}(p − 1)^{p−1} ( ‖y‖_E − (p − 1)‖x‖_E )
   or
      ‖y‖^p_E − (p − 1)^p ‖x‖^p_E ≥ p^{2−p}(p − 1)^{p−1} ( ‖y‖_E − (p − 1)‖x‖_E ).

Indeed, when p ∈ (2, ∞), (*) is precisely the result desired, and, when p ∈ (1, 2),
(*) gives the desired result after one divides through by (p − 1)^p and reverses
the roles of x and y.
I begin the verification of (*) by checking that

(**)   p^{2−p}(p − 1)^{p−1} > 1 if p ∈ (2, ∞)   and   p^{2−p}(p − 1)^{p−1} < 1 if p ∈ (1, 2).

To this end, set f(p) = (p − 1) log(p − 1) − (p − 2) log p for p ∈ (1, ∞). Then
f is strictly convex on (1, 2) and strictly concave on (2, ∞). Thus, f↾(1, 2)
cannot achieve a maximum and, therefore, since lim_{p↘1} f(p) = 0 = f(2), f < 0
on (1, 2). Similarly, f↾(2, ∞) cannot achieve a minimum and, therefore, since
f(2) = 0 while lim_{p↗∞} f(p) = ∞, we have that f > 0 on (2, ∞).
Next, observe that proving (*) comes down to checking that, for s ∈ [0, 1],

   Φ(s) ≡ p^{2−p}(p − 1)^{p−1}(1 − ps) − (1 − s)^p + (p − 1)^p s^p
      is ≥ 0 if p ∈ (2, ∞) and ≤ 0 if p ∈ (1, 2).
To this end, note that, by (**), Φ(0) > 0 when p ∈ (2, ∞) and Φ(0) < 0 when
p ∈ (1, 2). Also, for s ∈ (0, 1),

   Φ′(s) = p[ (p − 1)^p s^{p−1} + (1 − s)^{p−1} − p^{2−p}(p − 1)^{p−1} ]

and

   Φ″(s) = p(p − 1)[ (p − 1)^p s^{p−2} − (1 − s)^{p−2} ].

Depending on whether p ∈ (2, ∞) or p ∈ (1, 2), lim_{s↘0} Φ″(s) is negative or
positive, Φ″ is strictly increasing or decreasing on (0, 1), and lim_{s↗1} Φ″(s) is
positive or negative. Hence, there exists a unique t = t_p ∈ (0, 1) with the
property that

   Φ″ < 0 on (0, t) and Φ″ > 0 on (t, 1) if p ∈ (2, ∞),
   Φ″ > 0 on (0, t) and Φ″ < 0 on (t, 1) if p ∈ (1, 2).

Moreover, because Φ″(t) = 0, it is easy to see that t ∈ (0, 1/p).
Now suppose that p ∈ (2, ∞) and consider Φ on each of the intervals (1/p, 1],
we also know that Φ↾(0, t) > 0. The argument when p ∈ (1, 2) is similar, only
this time all the signs are reversed.
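The sign analysis of Φ is elementary but fiddly, and it can be reassuring to
evaluate Φ directly. The sketch below (mine) verifies, up to floating-point
rounding, that Φ ≥ 0 for p ∈ (2, ∞) and Φ ≤ 0 for p ∈ (1, 2), with the extreme
value 0 attained at s = 1/p:

    import numpy as np

    def Phi(s, p):                 # the function from the proof of Lemma 6.3.3
        return (p ** (2 - p) * (p - 1) ** (p - 1) * (1 - p * s)
                - (1 - s) ** p + (p - 1) ** p * s ** p)

    s = np.linspace(0.0, 1.0, 100_001)
    for p in (1.3, 1.7, 2.5, 4.0):
        v = Phi(s, p)
        print(p, v.min() if p > 2 else v.max())   # ~0, with the correct sign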
Lemma 6.3.4. Again let p ∈ (1, ∞) be given, and define u : E × F −→ R as in
Lemma 6.3.3. In addition, define the functions v and w on E² ∖ {(0, 0)} by

   v(x, y) = p ( ‖y‖_E + ‖x‖_E )^{p−2} ( ‖y‖_E + (2 − p)‖x‖_E )

and

   w(x, y) = p(1 − p) ( ‖y‖_E + ‖x‖_E )^{p−2} ‖x‖_E.

Then, for (x, y) ∈ E² and (k, h) ∈ E² satisfying

   min_{t∈[0,1]} ( ‖y + th‖_E ∧ ‖x + tk‖_E ) > 0   and   ‖h‖_E ≤ ‖k‖_E,

one has

   u(x + k, y + h) − u(x, y) ≤ v(x, y) Re( y/‖y‖_E , h )_E + w(x, y) Re( x/‖x‖_E , k )_E.

Proof: Set

   Φ(t) = Φ( t; (x, k), (y, h) )
        ≡ ( ‖y + th‖_E − (p − 1)‖x + tk‖_E ) ( ‖x + tk‖_E + ‖y + th‖_E )^{p−1},
In particular, the first expression establishes the required form for Φ′(t). In
addition, from the second expression, we see that

   −Φ″(t)/p = (p − 1)(p − 2) Ψ(t)^{p−3} ‖x(t)‖_E ( a(t) + b(t) )²
      + (p − 1) Ψ(t)^{p−2} ( ‖k‖²_E − ‖h‖²_E ) + (p − 2) Ψ(t)^{p−1} b^⊥(t)²/‖y(t)‖_E,

where a^⊥(t) = ( ‖k‖²_E − a(t)² )^{1/2} and b^⊥(t) = ( ‖h‖²_E − b(t)² )^{1/2}. Hence the
required properties of Φ″(t) have also been established.
and
   dist( Y_0(ω), span{ H_n(ω) : n ∈ Z^+ } ) ≥ ε

for all ω ∈ Ω. Indeed, if this is not already the case, then I can replace E by
R × E (or, when E is complex, C × E) and X_n(ω) and Y_n(ω), respectively, by
X_n^{(ε)}(ω) and Y_n^{(ε)}(ω)
for each n ∈ N. Clearly, (6.3.2) for each X_n^{(ε)} and Y_n^{(ε)} implies (6.3.2) for X_n
and Y_n after one lets ε ↘ 0. Finally, because there is nothing to do when the
right-hand side of (6.3.2) is infinite, let p ∈ (1, ∞) be given, and assume that
X_n ∈ L^p(P; E) for each n ∈ N. In particular, if u is the function defined in
Lemma 6.3.3 and v and w are those defined in Lemma 6.3.4, then

   u(X_n, Y_n) ∈ L^1(P; R)   and   v(X_n, Y_n), w(X_n, Y_n) ∈ L^{p′}(P; R)

for all n ∈ N, where p′ = p/(p − 1) is the Hölder conjugate of p.
Note that, by Lemma 6.3.3, it suffices for us to show that A_n ≡ E^P[ u(X_n, Y_n) ] ≤ 0,
n ∈ N. Since u(X_0, Y_0) ≤ 0 P-almost surely, there is no question that
A0 ≤ 0. Next, assume that An ≤ 0, and, depending on whether p ∈ [2, ∞) or
p ∈ (1, 2], use the appropriate part of Lemma 6.3.4 to see that
   A_{n+1} ≤ E^P[ v(X_n, Y_n) Re( Y_n/‖Y_n‖_E , H_{n+1} )_E ]
           + E^P[ w(X_n, Y_n) Re( X_n/‖X_n‖_E , K_{n+1} )_E ]

or

   A_{n+1} ≤ − E^P[ w(Y_n, X_n) Re( Y_n/‖Y_n‖_E , H_{n+1} )_E ]
           − E^P[ v(Y_n, X_n) Re( X_n/‖X_n‖_E , K_{n+1} )_E ].
Since the same reasoning shows that each of the other terms on the right-hand
side vanishes, we have now proved that An+1 ≤ 0.
As an immediate consequence of Theorem 6.3.1, we have the following answer
to the question raised at the beginning of this section.
This is the form of his inequality which is best known and, as such, is called
Burkholder’s Inequality. Notice that his inequality can be viewed as a vast
generalization of Khinchine’s Inequality (2.3.27), although it applies only when
p ∈ (1, ∞).
Theorem 6.3.6 (Burkholder's Inequality). Let (Ω, F, P) and {F_n : n ∈
N} be as in Theorem 6.3.1, and let (X_n, F_n, P) be a martingale with values in
the separable Hilbert space E. Then, for each p ∈ (1, ∞),

(6.3.7)   B_p^{−1} sup_{n∈N} ‖X_n − X_0‖_{L^p(P;E)}
          ≤ E^P[ ( ∑_{n=1}^∞ ‖X_n − X_{n−1}‖²_E )^{p/2} ]^{1/p}
          ≤ B_p sup_{n∈N} ‖X_n − X_0‖_{L^p(P;E)},

with B_p as in (6.3.2).
Proof: Let F = ℓ²(N; E) be the separable Hilbert space of sequences
y = (x_0, . . . , x_n, . . . ) ∈ E^N satisfying

   ‖y‖_F ≡ ( ∑_{n=0}^∞ ‖x_n‖²_E )^{1/2} < ∞,
and define
Yn (ω) = (X0 (ω), X1 (ω) − X0 (ω), . . . , Xn (ω) − Xn−1 (ω), 0, 0, . . . ) ∈ F
for ω ∈ Ω and n ∈ N. Obviously, (Y_n, F_n, P) is an F-valued martingale. More-
over,

   ‖X_0‖_E = ‖Y_0‖_F   and   ‖X_n − X_{n−1}‖_E = ‖Y_n − Y_{n−1}‖_F,   n ∈ N,
and therefore the right-hand side of (6.3.7) is implied by (6.3.2) while the left-
hand side also follows from (6.3.2) when the roles of the Xn ’s and Yn ’s are
reversed.
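Burkholder's Inequality (6.3.7) can be watched in action by Monte Carlo. In the
sketch below (my illustration; the increments are i.i.d. t-distributed with finite
p-th moment, so sup_n ‖X_n − X_0‖_{L^p} is attained at the final time) the middle
quantity in (6.3.7) and the outer ones stay within constant factors of one another:

    import numpy as np

    rng = np.random.default_rng(2)
    p, n, paths = 3.0, 200, 20_000
    d = rng.standard_t(df=7, size=(paths, n))        # i.i.d. mean-0 increments
    X = np.cumsum(d, axis=1)                         # martingale, X_0 = 0
    lhs = (np.abs(X[:, -1]) ** p).mean() ** (1 / p)  # ~ sup_n ||X_n - X_0||_p
    mid = (((d ** 2).sum(axis=1)) ** (p / 2)).mean() ** (1 / p)
    print(lhs, mid, mid / lhs)                       # ratio bounded by B_p's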
Exercises for § 6.3
Exercise 6.3.8. Because it arises repeatedly in the theory of stochastic inte-
gration, one of the most frequent applications of Burkholder's Inequality is to
situations in which E is a separable Hilbert space and (X_n, F_n, P) is an E-valued
martingale for which one has an estimate of the form

   K_p ≡ sup_{n∈Z^+} ‖ E^P[ ‖X_n − X_{n−1}‖^{2p}_E | F_{n−1} ]^{1/2p} ‖_{L^∞(P;R)} < ∞
for some p ∈ [1, ∞). To see how such an estimate gets used, let F be a sec-
ond separable Hilbert space and suppose that {σ_n : n ∈ N} is a sequence of
Hom(E; F)-valued random variables with the properties that, for each n ∈ N,
σ_n is F_n-measurable and a_n ≡ E^P[ ‖σ_n‖^{2p}_{op} ]^{1/2p} < ∞. Set Y_0 = 0 and

   Y_n = ∑_{m=1}^n σ_{m−1}( X_m − X_{m−1} )   for n ∈ Z^+,
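The sums Y_n here are discrete stochastic integrals, and the point is that bounds
on the σ's transfer to bounds on Y_n. A small sketch (mine, with scalar σ and
|σ| ≤ 1 via tanh, so the transform should be dominated by the original martingale):

    import numpy as np

    rng = np.random.default_rng(3)
    n, paths = 500, 10_000
    xi = rng.choice([-1.0, 1.0], size=(paths, n))        # X_m - X_{m-1}
    X = np.cumsum(xi, axis=1)
    prev = np.hstack([np.zeros((paths, 1)), X[:, :-1]])  # X_{m-1}
    sigma = np.tanh(prev)                                # F_{m-1}-measurable, |.|<=1
    Y = np.cumsum(sigma * xi, axis=1)                    # Y_n = sum sigma_{m-1} dX_m
    for p in (2.0, 4.0):
        print(p, (np.abs(Y[:, -1]) ** p).mean() ** (1 / p),
                 (np.abs(X[:, -1]) ** p).mean() ** (1 / p))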
Exercise 6.3.9. Return to the setting in Exercise 5.2.45, and let λ_{[0,1)} denote
Lebesgue measure on [0, 1). Given f ∈ L²(λ_{[0,1)}; C), show that, for each p ∈
(1, ∞),

   ( (p − 1) ∧ (p − 1)^{−1} ) ‖ f − (f, 1)_{L²(λ_{[0,1)};C)} ‖_{L^p([0,1);C)}
   ≤ ( ∫_{[0,1)} ( ∑_{m=0}^∞ |∆_m(f)|² )^{p/2} dt )^{1/p}
   ≤ ( (p − 1) ∨ (p − 1)^{−1} ) ‖ f − (f, 1)_{L²(λ_{[0,1)};C)} ‖_{L^p(λ_{[0,1)};C)}.
For functions f with (f, e_ℓ)_{L²(λ_{[0,1)};C)} = 0 unless ℓ = ±2^m for some m ∈ N, this
estimate is a case of a famous theorem proved by Littlewood and Paley in order
to generalize Parseval's Identity to cover p ≠ 2. Unfortunately, the argument
here is far too weak to give their inequality for general f's.
Exercise 6.3.10. In connection with the preceding exercise, it is interesting
to note that there is an orthonormal basis for L²(λ_{[0,1)}; R) that, as distinguished
from the trigonometric functions, can be nearly completely understood in terms
of martingale analysis. Namely, recall the Rademacher functions {R_n : n ∈ Z^+}
introduced in § 1.1.2. Next, use F to denote the set of all finite subsets F of Z^+,
and define the Walsh function W_F for F ∈ F by

   W_F = 1 if F = ∅,   and   W_F = ∏_{m∈F} R_m if F ≠ ∅.
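The orthonormality of the Walsh family is easy to confirm on a dyadic grid. In
the sketch below (mine) the Rademacher functions are realized as
R_n(t) = sgn sin(2^n πt), one common realization which may differ from the § 1.1.2
definition by a sign; that does not affect orthonormality:

    import numpy as np
    from itertools import combinations

    def R(n, t):                                   # Rademacher R_n on [0,1)
        return np.sign(np.sin((2 ** n) * np.pi * t))

    t = (np.arange(1024) + 0.5) / 1024             # midpoints of dyadic cells
    W = [np.ones_like(t)]                          # W_emptyset = 1
    W += [R(n, t) for n in range(1, 6)]            # singletons F = {n}
    W += [R(i, t) * R(j, t) for i, j in combinations(range(1, 6), 2)]

    G = np.array(W)
    print(np.allclose(G @ G.T / len(t), np.eye(len(G))))   # orthonormal: True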
Using the result in (i), show that X^f_n = E^{λ_{[0,1)}}[ f | F_n ] and therefore that
(X^f_n, F_n, λ_{[0,1)}) is a martingale. In particular, X^f_n −→ f both (a.e., λ_{[0,1)}) and in
L^1(λ_{[0,1)}; R).
(iii) Show that for each p ∈ (1, ∞) and f ∈ L^1(λ_{[0,1)}; R) with mean value 0,

   ( (p − 1) ∧ (p − 1)^{−1} ) ‖f‖_{L^p([0,1);R)}
   ≤ ( ∫_{[0,1)} ( ∑_{n=1}^∞ | ∑_{F⊆A_n∖A_{n−1}} ( ∫_{[0,1)} f(s) W_F(s) ds ) W_F(t) |² )^{p/2} dt )^{1/p}
   e^{ax} ≤ ((1 + x)/2) e^a + ((1 − x)/2) e^{−a} = cosh a + x sinh a.
(ii) Suppose that {Y_1, . . . , Y_n} are [−1, 1]-valued random variables on the prob-
ability space (Ω, F, P) with the property that, for each 1 ≤ m ≤ n,

   E^P[ Y_{j_1} · · · Y_{j_m} ] = 0   for all 1 ≤ j_1 < · · · < j_m ≤ n.
It turns out that many of the ideas and results introduced in § 5.2 can be easily
transferred to the setting of processes depending on a continuous parameter. In
addition, the resulting theory is intimately connected with Lévy processes, and
particularly with Brownian motion. In this chapter, I will give a brief introduction
to this topic and some of the techniques to which it leads.¹
§ 7.1 Continuous Parameter Martingales
There is a huge number of annoying technicalities which have to be addressed in
order to give a mathematically correct description of the continuous time theory
of martingales. Fortunately, for the applications which I will give here, I can
keep them to a minimum.
§ 7.1.1. Progressively Measurable Functions. Let (Ω, F) be a measurable
space and {F_t : t ∈ [0, ∞)} a non-decreasing family of sub-σ-algebras. I will say
that a function X on [0, ∞) × Ω into a measurable space (E, B) is progressively
measurable with respect to {F_t : t ∈ [0, ∞)} if X↾[0, T] × Ω is B_{[0,T]} × F_T-
measurable for every T ∈ [0, ∞). When E is a metric space, I will say that
X : [0, ∞) × Ω −→ E is right-continuous if X(s, ω) = lim_{t↘s} X(t, ω) for every
(s, ω) ∈ [0, ∞) × Ω and will say that it is continuous if X( · , ω) is continuous
for all ω ∈ Ω.
Remark 7.1.1. The reader might have been expecting a slightly different def-
inition of progressive measurability
here. Namely,
he might have thought that
one would say that X is Ft : t ∈ [0, ∞) -progressively measurable if it is
B[0,∞) × F-measurable and ω ∈ Ω 7−→ X(t, ω) ∈ E is Ft -measurable for each
t ∈ [0, ∞). Indeed, in extrapolating from the discrete parameter setting, this
would be the first definition at which one would arrive. In fact, it was the notion
with which Doob and Itô originally worked; and such functions were said by
them to be adapted to Ft : t ∈ [0, ∞) . However, it came to be realized
that there are various problems with the notion of adaptedness. For example,
even if X is adapted and f : E −→ R is a bounded, B-measurable function, the
¹ A far more thorough treatment can be found in D. Revuz and M. Yor's treatise Continuous
Martingales and Brownian Motion, Springer-Verlag, Grundlehren der Mathematischen #293
(1999).
function (t, ω) ↦ Y(t, ω) ≡ ∫_0^t f( X(s, ω) ) ds ∈ R need not be adapted. On the
other hand, if X is progressively measurable, then Y will be also.
The following simple lemma should help to explain the virtue of progressive
measurability and its relationship to adaptedness.
Lemma 7.1.2. Let PM denote the set of A ⊆ [0, ∞) × Ω with the property
that ([0, t] × Ω) ∩ A ∈ B_{[0,t]} × F_t for every t ≥ 0. Then PM is a sub-σ-algebra of
B_{[0,∞)} × F, and X is progressively measurable if and only if it is PM-measurable.
Furthermore, if E is a separable metric space and X : [0, ∞) × Ω −→ E is a
right-continuous function, then X is progressively measurable if it is adapted.
Proof: Checking that PM is a σ-algebra is easy. Furthermore, for any X :
[0, ∞) × Ω −→ E, T ∈ [0, ∞), and Γ ∈ B,

   { (t, ω) ∈ [0, T] × Ω : X(t, ω) ∈ Γ }
      = ( [0, T] × Ω ) ∩ { (t, ω) ∈ [0, ∞) × Ω : X(t, ω) ∈ Γ },

and so X is {F_t : t ∈ [0, ∞)}-progressively measurable if and only if it is PM-
measurable. Hence, the first assertion has been proved.
Next, suppose that X is a right-continuous, adapted function. To see that X
is progressively measurable, let t ∈ [0, ∞) be given, and define
   X^t_n(τ, ω) = X( ([2^n τ] + 1)/2^n ∧ t, ω )   for (τ, ω) ∈ [0, ∞) × Ω and n ∈ N.

Obviously, X^t_n is B_{[0,t]} × F_t-measurable for every n ∈ N and X^t_n(τ, ω) −→ X(τ, ω)
as n → ∞ for every (τ, ω) ∈ [0, t] × Ω. Hence, X↾[0, t] × Ω is B_{[0,t]} × F_t-
measurable, and so X is progressively measurable.
§ 7.1.2. Martingales: Definition and Examples. Given a probability space
(Ω, F, P) and a non-decreasing family of sub-σ-algebras {F_t : t ∈ [0, ∞)}, I will
say that X : [0, ∞) × Ω −→ (−∞, ∞] is a submartingale with respect to
{F_t : t ∈ [0, ∞)}, or, equivalently, that (X(t), F_t, P) is a submartingale, if X
is a right-continuous, progressively measurable function with the properties that
X(t)^− is P-integrable for every t ∈ [0, ∞) and

   X(s) ≤ E^P[ X(t) | F_s ]   (a.s., P) for all 0 ≤ s ≤ t < ∞.

When both (X(t), F_t, P) and (−X(t), F_t, P) are submartingales, I will say either
that X is a martingale with respect to {F_t : t ∈ [0, ∞)} or simply that
(X(t), F_t, P) is a martingale. Finally, if Z : [0, ∞) × Ω −→ C is a right-
continuous, progressively measurable function, then (Z(t), F_t, P) is said to be a
(complex) martingale if both (Re Z(t), F_t, P) and (Im Z(t), F_t, P) are.
The next two results show that Lévy processes provide a rich source of con-
tinuous parameter martingales.
Theorem 7.1.3. Let µ ∈ I(R^N) with µ̂(ξ) = e^{ℓ_µ(ξ)}, where ℓ_µ(ξ) equals

   √−1 (ξ, m)_{R^N} − ½ (ξ, Cξ)_{R^N}
      + ∫_{R^N} [ e^{√−1(ξ,y)_{R^N}} − 1 − √−1 1_{[0,1]}(|y|) (ξ, y)_{R^N} ] M(dy),

where F_t = σ( {Z(τ) : τ ∈ [0, t]} ).

   = exp( √−1 (ξ, Z(s))_{R^N} − s ℓ_µ(ξ) ).
To prove the converse assertion, observe that the defining distributional property
of a Lévy process for µ can be summarized as the statement that Z(0, ω) = 0
and, for each 0 ≤ s < t, Z(t) − Z(s) is independent of σ( {Z(τ) : τ ∈ [0, s]} ) and
has distribution µ_{t−s}, where µ̂_τ = e^{τℓ_µ}. Hence, since (7.1.4) implies that

   E^P[ exp( √−1 (ξ, Z(t) − Z(s))_{R^N} ) | F_s ] = e^{(t−s)ℓ_µ(ξ)},   ξ ∈ R^N,
(7.1.5)   L_µ φ(x) = ½ Trace( C ∇²φ(x) ) + ( m, ∇φ(x) )_{R^N}
             + ∫_{R^N} [ φ(x + y) − φ(x) − 1_{[0,1]}(|y|) ( y, ∇φ(x) )_{R^N} ] M(dy)

is a martingale for each φ ∈ C_c^∞(R^N; R), then {Z(t) : t ≥ 0} is a Lévy process
for µ.
Proof: Begin by noting that it suffices to handle the case when F is the
restriction to [0, ∞) × R^N of an element of the Schwartz test function space
S(R × R^N; C). Indeed, because ‖L_µ φ‖_u ≤ C ‖φ‖_{C_b²(R^N;C)} for some C < ∞,
the result for F ∈ C_b^{1,2}( [0, ∞) × R^N; C ) follows, via an obvious approxima-
tion procedure, from the result for F ∈ S(R × R^N; C). Next observe that
it suffices to treat F ∈ S(R^N; C). To see this, simply interpret the process
t ∈ [0, ∞) ↦ (t, Z_µ(t)) ∈ R^{N+1} as a Lévy process for δ_1 × µ.
Now let φ ∈ S(R^N; C) be given. The key to proving the required result is the
identity

(*)   (d/dt) φ ⋆ µ̆_t = (L_µ φ) ⋆ µ̆_t,

where µ̆_t is the distribution of −x under µ_t, the measure determined by µ̂_t = e^{tℓ_µ}.
The easiest way to check (*) is to work via Fourier transform and to use (3.2.10)
to verify that

   (d/dt) (φ ⋆ µ̆_t)^(ξ) = ℓ_µ(−ξ) φ̂(ξ) e^{tℓ_µ(−ξ)} = (L_µ φ)^(ξ) e^{tℓ_µ(−ξ)},

which is equivalent to (*). To see how (*) applies, observe that

   E^P[ φ( Z(t) ) | F_s ] = φ ⋆ µ̆_{t−s}( Z(s) ),
Since this means that u(t) = e(t−s)`µ (ξ) u(s), it follows that {Z(t) : t ≥ 0}
satisfies (7.1.4) and is therefore a Lévy process for µ.
As an immediate consequence of the preceding we have the following charac-
terizations of the distribution of a Lévy process. In the statement that follows,
Ft is the σ-algebra over D(RN ) generated by {ψ(τ ) : τ ∈ [0, t]}.
Theorem 7.1.7. Given µ ∈ I(R^N), let Q_µ ∈ M_1( D(R^N) ) be the distribution
of a Lévy process for µ. Then Q_µ is the unique P ∈ M_1( D(R^N) ) that satisfies
either one of the properties:

   ( exp[ √−1 (ξ, ψ(t))_{R^N} − t ℓ_µ(ξ) ], F_t, P )
   is a martingale with mean value 1 for each ξ ∈ R^N,

or

   ( φ(ψ(t)) − φ(0) − ∫_0^t L_µ φ( ψ(τ) ) dτ, F_t, P )
   is a martingale with mean value 0 for each φ ∈ C_c^∞(R^N; R).
§ 7.1.3. Basic Results. In this subsection I run through some of the results
from § 5.2 that transfer immediately to the continuous parameter setting.
Lemma 7.1.8. Let the interval I and the function f : I −→ R ∪ {∞} be as in
Corollary 5.2.10. If either (X(t), F_t, P) is an I-valued martingale or (X(t), F_t, P)
is an I-valued submartingale and f is non-decreasing and bounded below, then
( f ∘ X(t), F_t, P ) is a submartingale.
Proof: The fact that the parameter is continuous plays no role here, and so
this result is already covered by the argument in Corollary 5.2.10.
Theorem 7.1.9 (Doob's Inequality). Let (X(t), F_t, P) be a submartingale.
Then, for every α ∈ (0, ∞) and T ∈ [0, ∞),

   P( sup_{t∈[0,T]} X(t) ≥ α ) ≤ (1/α) E^P[ X(T), sup_{t∈[0,T]} X(t) ≥ α ].
Proof: Because of Exercise 1.4.18, I need only prove the first assertion. To
this end, let T ∈ (0, ∞) and n ∈ N be given, apply Theorem 5.2.1 to the discrete
parameter submartingale ( X(mT/2^n), F_{mT/2^n}, P ), and observe that

   sup{ X(mT/2^n) : 0 ≤ m ≤ 2^n } ↗ sup_{t∈[0,T]} X(t)   as n → ∞.
Finally, again when (X(t), F_t, P) is either a non-negative submartingale or a
martingale, for each p ∈ (1, ∞) the family { |X(t)|^p : t ∈ [0, ∞) } is uniformly P-
integrable if and only if sup_{t∈[0,∞)} ‖X(t)‖_{L^p(P)} < ∞, in which case X(t) −→ X
in L^p(P; R).
Proof: To prove the initial convergence assertion, note that, by Theorem 5.2.15
applied to the discrete parameter process (X(n), F_n, P), there is a ⋁_{n∈N} F_n-
measurable X ∈ L^1(P; R) to which X(n) converges P-almost surely. Hence,
we need only check that lim_{t→∞} X(t) exists in [−∞, ∞] P-almost surely. To
this end, define U^{(n)}_{[a,b]}(ω) for n ∈ N and a < b to be the precise number of
times that the sequence { X(m/2^n, ω) : m ∈ N } upcrosses the interval [a, b] (cf. the
paragraph preceding Theorem 5.2.15), observe that U^{(n)}_{[a,b]}(ω) is non-decreasing
as n increases, and set U_{[a,b]}(ω) = lim_{n→∞} U^{(n)}_{[a,b]}(ω). Note that if U_{[a,b]}(ω) < ∞,
then (by right-continuity), there is an s ∈ [0, ∞) such that either X(t, ω) ≤ b for
all t ≥ s or X(t, ω) ≥ a for all t ≥ s. Hence, we will know that X(t, ω) converges
in [−∞, ∞] for P-almost every ω ∈ Ω as soon as we show that E^P[ U_{[a,b]} ] < ∞
for every pair a < b. In addition, by (5.2.16), we know that

   sup_{n∈N} E^P[ U^{(n)}_{[a,b]} ] ≤ sup_{t∈[0,∞)} E^P[ (X(t) − a)^+ ]/(b − a) < ∞,
for every T ∈ (0, ∞). Hence, (7.1.11) follows when one lets T → ∞. But, again
from (*),
   E^P[ |X(T)|, |X(T)| ≥ α ] ≤ E^P[ |X|, |X(T)| ≥ α ] ≤ E^P[ |X|, sup_{t≥0} |X(t)| ≥ α ],

and therefore, since, by (7.1.11), P( sup_{t≥0} |X(t)| ≥ α ) −→ 0 as α → ∞, we can
conclude that {X(t) : t ≥ 0} is uniformly P-integrable.
Finally, if {X(T ) : T ≥ 0} is bounded in Lp (P; R) for some p ∈ (1, ∞), then,
by the last part of Theorem 7.1.9, supt≥0 |X(t)|p is P-integrable and therefore
X(t) −→ X in Lp (P; R).
§ 7.1.4. Stopping Times and Stopping Theorems. A stopping time
relative to a non-decreasing family {F_t : t ≥ 0} of σ-algebras is a map ζ :
Ω −→ [0, ∞] with the property that {ζ ≤ t} ∈ F_t for every t ≥ 0. Given a
stopping time ζ, I will associate with it the σ-algebra F_ζ consisting of those
A ⊆ Ω such that A ∩ {ζ ≤ t} ∈ F_t for every t ≥ 0. Note that, because
{ζ < t} = ⋃_{n=0}^∞ {ζ ≤ (1 − 2^{−n})t}, {ζ < t} ∈ F_t for all t ≥ 0.
Here are a few useful facts about stopping times.
Lemma 7.1.12. Let ζ be a stopping time. Then ζ is F_ζ-measurable, and,
for any progressively measurable function X with values in a measurable space
(E, B), the function ω ↦ X(ζ, ω) ≡ X( ζ(ω), ω ) is F_ζ-measurable on {ζ < ∞} in
the sense that { ω : ζ(ω) < ∞ & X(ζ, ω) ∈ Γ } ∈ F_ζ for all Γ ∈ B. In addition,
f ∘ ζ is again a stopping time if f : [0, ∞] −→ [0, ∞] is a non-decreasing, right-
continuous function satisfying f(τ) ≥ τ for all τ ∈ [0, ∞]. Next, suppose that
ζ_1 and ζ_2 are a pair of stopping times. Then ζ_1 + ζ_2, ζ_1 ∧ ζ_2, and ζ_1 ∨ ζ_2
are all stopping times, and F_{ζ_1∧ζ_2} ⊆ F_{ζ_1} ∩ F_{ζ_2}. Finally, for any A ∈ F_{ζ_1},
A ∩ {ζ_1 ≤ ζ_2} ∈ F_{ζ_1∧ζ_2}.
Proof: Since {ζ ≤ s} ∩ {ζ ≤ t} = {ζ ≤ s ∧ t} ∈ F_t, it is clear that ζ is
F_ζ-measurable. Next, suppose that X is a progressively measurable function.
To prove that X(ζ) is F_ζ-measurable, begin by checking that { ω : (ζ(ω), ω) ∈
A } ∈ F_t for any A ∈ B_{[0,t]} × F_t. Indeed, this is obvious when A = [0, s] × B for
s ∈ [0, t] and B ∈ F_t and, since these generate B_{[0,t]} × F_t, follows in general.
Now, for any t ≥ 0 and Γ ∈ B,

   A(t, Γ) ≡ { (τ, ω) ∈ [0, ∞) × Ω : ( τ, X(τ, ω) ) ∈ [0, t] × Γ } ∈ B_{[0,t]} × F_t,

and therefore

   { X(ζ) ∈ Γ } ∩ {ζ ≤ t} = { ω : (ζ(ω), ω) ∈ A(t, Γ) } ∈ F_t.
As for f ∘ ζ when f satisfies the stated conditions, simply note that

   { f ∘ ζ ≤ t } = { ζ ≤ f^{−1}(t) } ∈ F_t,

where f^{−1}(t) ≡ inf{ τ : f(τ) ≥ t } ≤ t.
Next suppose that ζ_1 and ζ_2 are two stopping times. It is trivial to see that
ζ_1 ∧ ζ_2 and ζ_1 ∨ ζ_2 are again stopping times. In addition, if Q denotes the set of
rational numbers, then

   { ζ_1 + ζ_2 > t } = { ζ_1 > t } ∪ ⋃_{q∈Q∩[0,1]} { ζ_1 ≥ qt & ζ_2 > (1 − q)t } ∈ F_t.
and so

   E^P[ X(ζ_n), X(ζ_n) ≥ α ] ≤ E^P[ X(T + 1), X(ζ_n) ≥ α ]
      ≤ E^P[ X(T + 1), sup_{t∈[0,T+1]} X(t) ≥ α ].

Starting from here, noting that ζ_n ↘ ζ as n → ∞, and applying Fatou's Lemma,
we arrive at

(*)   E^P[ X(ζ), X(ζ) > α ] ≤ E^P[ X(T + 1), sup_{t∈[0,T+1]} X(t) ≥ α ].

Hence, since, by Theorem 7.1.9, P( sup_{t∈[0,T+1]} X(t) ≥ α ) tends to 0 as α → ∞,
this proves the first assertion. When {X(t) : t ≥ 0} is uniformly integrable, we
can replace (*) by

   E^P[ X(ζ ∧ T), X(ζ ∧ T) > α ] ≤ E^P[ X(∞), sup_{t≥0} X(t) ≥ α ]

for any stopping time ζ and T > 0. Hence, after another application of Fatou's
Lemma, we get

   E^P[ X(ζ), X(ζ) > α ] ≤ E^P[ X(∞), sup_{t≥0} X(t) ≥ α ].

At the same time, the first inequality in Theorem 7.1.9 can be replaced by

   P( sup_{t≥0} X(t) ≥ α ) ≤ (1/α) E^P[ X(∞), sup_{t≥0} X(t) ≥ α ] ≤ (1/α) E^P[ X(∞) ],

and so the asserted uniform integrability follows.
It turns out that in the continuous time context, Doob’s Stopping Time The-
orem is most easily seen as a corollary of Hunt’s. Thus, I will begin with Hunt’s.
(ζ_i)_n is an {F_{m2^{−n}} : m ≥ 0}-stopping time and that F_{ζ_1} ⊆ F_{(ζ_1)_n}, and apply
Theorem 5.2.13 to the discrete parameter submartingale ( X(m2^{−n}), F_{m2^{−n}}, P )
in order to see that

   E^P[ X((ζ_1)_n), A ] ≤ E^P[ X((ζ_2)_n), A ],   A ∈ F_{ζ_1},

with equality in the martingale case. Because of right-continuity and Lemma
7.1.13, X((ζ_i)_n) −→ X(ζ_i) in L^1(P; R), and so we have now shown that
X(ζ_1) ≤ E^P[ X(ζ_2) | F_{ζ_1} ], with equality in the martingale case.
with equality in the martingale case. Letting first T and then t tend to infinity,
one gets the same relationship for X(ζ1 ) and X(ζ2 ), initially with A ∩ {ζ1 < ∞}
and then, trivially, with A alone.
Theorem 7.1.15 (Doob's Stopping Time Theorem). If (X(t), F_t, P) is
either a non-negative, integrable submartingale or a martingale, then, for every
stopping time ζ, ( X(t ∧ ζ), F_t, P ) is either an integrable submartingale or a
martingale.
Proof: Given 0 ≤ s < t and A ∈ F_s, note that A ∩ {ζ > s} ∈ F_{s∧ζ} and
therefore, by Hunt's Theorem applied to the stopping times s ∧ ζ and t ∧ ζ, that

   E^P[ X(t ∧ ζ), A ] = E^P[ X(ζ), A ∩ {ζ ≤ s} ] + E^P[ X(t ∧ ζ), A ∩ {ζ > s} ]
   ≥ E^P[ X(ζ), A ∩ {ζ ≤ s} ] + E^P[ X(s ∧ ζ), A ∩ {ζ > s} ] = E^P[ X(s ∧ ζ), A ],
Proof: By elementary measure theory, all that we have to show is that, for
each B ∈ F_ζ contained in {ζ < ∞}, Q_µ( (δ_ζ^{−1}Γ) ∩ B ) = Q_µ(Γ) Q_µ(B).
Given B ∈ F_ζ contained in {ζ < ∞} with Q_µ(B) > 0, choose T > 0 so that
Q_µ(B_T) > 0 when B_T = B ∩ {ζ ≤ T}, and define Q_T ∈ M_1( D(R^N) ) so that

   Q_T(Γ) = Q_µ( (δ_ζ^{−1}Γ) ∩ B_T ) / Q_µ(B_T).

since ψ ↦ e^{−√−1(ξ,ψ(ζ))_{R^N} + ζℓ_µ(ξ)} 1_A(δ_ζ ψ) 1_{B_T}(ψ) is F_{s+ζ∧T}-measurable.
§ 7.1.5. An Integration by Parts Formula. In this subsection I will derive
a simple result that has many interesting applications.
Theorem 7.1.17. Suppose V : [0, ∞) × Ω −→ C is a right-continuous, progres-
sively measurable function, and let |V|(t, ω) ∈ [0, ∞] denote the total variation
var_{[0,t]}( V( · , ω) ) of V( · , ω) on the interval [0, t]. Then |V| : [0, ∞) × Ω −→ [0, ∞]
is a non-decreasing, progressively measurable function that is right-continuous
on each interval [0, t) for which |V|(t, ω) < ∞. Next, suppose that (X(t), F_t, P)
is a C-valued martingale with the property that, for each (t, ω) ∈ (0, ∞) × Ω,
the product ‖X( · , ω)‖_{[0,t]} |V|(t, ω) < ∞, and define

   Y(t, ω) = ∫_{(0,t]} X(s, ω) V(ds, ω) if |V|(t, ω) < ∞,   and Y(t, ω) = 0 otherwise,

where, in the case when |V|(t, ω) < ∞, the integral is the Lebesgue integral of
X( · , ω) on [0, t] with respect to the C-valued measure determined by V( · , ω).
If

   E^P[ ‖X‖_{[0,T]} ( |V|(T) + |V(0)| ) ] < ∞   for all T ∈ (0, ∞),

then ( X(t)V(t) − Y(t), F_t, P ) is a martingale.
   |V|(t, ω) = sup_{n∈N} ∑_{k=0}^{[2^n t]} | V( (k+1)/2^n ∧ t, ω ) − V( k/2^n, ω ) |;

   lim_{n→∞} ∑_{k=[2^n s]}^{[2^n t]} X( (k+1)/2^n ∧ t, ω ) [ V( (k+1)/2^n ∧ t, ω ) − V( k/2^n ∨ s, ω ) ].
In fact, under the stated integrability condition, the convergence in the preceding
takes place in L1 (P; R) for every t ∈ [0, ∞); and therefore, for any 0 ≤ s ≤ t < ∞
and A ∈ F_s,

   E^P[ Y(t) − Y(s), A ]
   = lim_{n→∞} ∑_{k=[2^n s]}^{[2^n t]} E^P[ X( (k+1)/2^n ∧ t ) ( V( (k+1)/2^n ∧ t ) − V( k/2^n ∨ s ) ), A ]
   = lim_{n→∞} ∑_{k=[2^n s]}^{[2^n t]} E^P[ X(t) ( V( (k+1)/2^n ∧ t ) − V( k/2^n ∨ s ) ), A ]
   = E^P[ X(t) ( V(t) − V(s) ), A ] = E^P[ X(t)V(t) − X(s)V(s), A ],
and clearly this is equivalent to the asserted martingale property.
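One can see the integration by parts formula at work in a simulation. The sketch
below (mine) takes X to be a discretized Brownian motion and V(t) = sin t, and
checks the necessary condition that E[ X(t)V(t) − Y(t) ] stay constant (= 0); note
that this tests only the mean, not the full martingale property:

    import numpy as np

    rng = np.random.default_rng(4)
    dt, n, paths = 1e-3, 1000, 50_000
    X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(paths, n)), axis=1)
    t = dt * np.arange(1, n + 1)
    V = np.sin(t)                                   # deterministic, of bounded variation
    dV = np.diff(V, prepend=0.0)
    Y = np.cumsum(X * dV, axis=1)                   # ~ int over (0,t] of X dV
    M = X * V - Y                                   # candidate martingale
    print(M[:, [99, 499, 999]].mean(axis=0))        # E[M(t)] stays ~ 0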
We will make frequent practical applications of Theorem 7.1.17 later, but
here I will show that it enables us to prove that there is an important dichotomy
between continuous martingales and functions of bounded variation. However,
before doing so, I need to make a small, technical digression.
A function ζ : Ω −→ [0, ∞] is an extended stopping time relative to
{F_t : t ∈ [0, ∞)} if {ζ < t} ∈ F_t for every t ∈ (0, ∞). Since {ζ < t} ∈ F_t for any
stopping time ζ, it is clear that every stopping time is an extended stopping time.
On the other hand, not every extended stopping time is a stopping time. To wit,
if X : [0, ∞) × Ω −→ R is a right-continuous, progressively measurable function
relative to { σ( {X(τ) : τ ∈ [0, t]} ) : t ≥ 0 }, then ζ = inf{t ≥ 0 : X(t) > 1} will
always be an extended stopping time but will seldom be a stopping time.
Lemma 7.1.18. For each t ≥ 0, set F_{t+} = ⋂_{τ>t} F_τ. Then ζ : Ω −→ [0, ∞]
is an extended stopping time if and only if it is a stopping time relative to
{F_{t+} : t ≥ 0}. Moreover, if (X(t), F_t, P) is either a non-negative, integrable
submartingale or a martingale, then so is (X(t), F_{t+}, P). In particular, if ζ is
an extended stopping time, then ( X(t ∧ ζ), F_{t+}, P ) is a non-negative, integrable
submartingale or a martingale.
Proof: The first assertion is immediate from {ζ ≤ t} = ⋂_{τ>t} {ζ < τ}. To prove
the second assertion, apply right-continuity and the first uniform integrability
result in Lemma 7.1.13 to see that if 0 ≤ s < t and A ∈ F_{s+}, then

   E^P[ X(s), A ] = lim_{τ↘s} E^P[ X(τ), A ] ≤ E^P[ X(t), A ],
and conclude that σ( {ψ^ζ(t) : t ≥ 0} ) ⊆ F_ζ. To prove the opposite inclu-
sion, show that if f : Ψ −→ R is F_ζ-measurable, then, for each t ∈ [0, ∞),
1_{{t}}( ζ(ψ) ) f(ψ) = 1_{{t}}( ζ(ψ^t) ) f(ψ^t), and thereby arrive at f(ψ) = f(ψ^ζ). Fi-
nally, use this together with Exercise 4.1.9 to show that f is σ( {ψ^ζ(t) : t ≥ 0} )-
measurable.
Exercise 7.1.22. Let (Ω, F, P) be a probability space and {F_t : t ∈ [0, ∞)}
a non-decreasing family of sub-σ-algebras of F. Denote by F̄ and F̄_t the com-
pletions of F and F_t with respect to P. If (X(t), F_t, P) is a submartingale or
martingale, show that (X(t), F̄_t, P) is also.
   W^{(1)}( ζ^a ≤ t ) = √(2/π) ∫_{a t^{−1/2}}^∞ e^{−y²/2} dy.

Now use the results in Exercise 3.3.14 (especially (3.3.16)) to conclude that the
W^{(1)}-distribution of ζ^a is ν^{1/2}_{2^{1/2}a}, the one-sided ½-stable law “at time 2^{1/2}a.”
(ii) Here is another, more conceptual way to understand the conclusion drawn
in (i) that the W^{(1)}-distribution of ζ^a is a one-sided ½-stable law. Namely, begin by
showing that if ψ(0) = 0 and ζ^a(ψ) < ∞, then ζ^{a+b}(ψ) = ζ^a(ψ) + ζ^b( δ_{ζ^a} ψ ). As
an application of Theorem 7.1.16, conclude from this that if β_a denotes the W^{(1)}-
distribution of ζ^a, then β_{a+b} = β_a ⋆ β_b. In particular, this means that β ≡ β_1
is infinitely divisible and that β̂_a = e^{aℓ_β}, where ℓ_β is the exponent appearing in
the Lévy–Khinchine formula for β̂.
(iii) Next, use Brownian scaling to see that, for all λ > 0, ζ^{λa} has the same W^{(1)}-
distribution as λ²ζ^a, and use this together with part (iii) of Exercise 3.3.12 to
see that the distribution of ζ^1 is ν^{1/2}_c for some c > 0.
(iv) Although we know from (i) that the constant c must be 2^{1/2}, here is an
easier way to find it. Use Exercise 7.1.23 to see that ( e^{λψ(t) − λ²t/2}, F_t, W^{(1)} )
is a martingale for every λ ∈ R, and apply Doob's Stopping Time Theorem and
the fact that W^{(1)}(ζ^a < ∞) = 1 to verify the identity E^{W^{(1)}}[ e^{−λ²ζ^a/2} ] = e^{−λa}
for λ > 0. Hence, the Laplace transform of ν^{1/2}_c is e^{−√(2λ)}, which, by the
calculation in part (iii) of Exercise 3.3.12, means that c = 2^{1/2}. Of course, this
calculation makes the preceding parts of this exercise unnecessary. Nonetheless,
it is interesting to see the Brownian explanation for the properties of the one-sided
½-stable laws.
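The identity E^{W^{(1)}}[ e^{−λ²ζ^a/2} ] = e^{−λa} is also easy to test by simulation.
The sketch below (mine) discretizes the Brownian path, so ζ^a is slightly
overestimated and the Monte Carlo figure comes out a touch below e^{−λa}:

    import numpy as np

    rng = np.random.default_rng(5)
    dt, n, paths, a, lam = 0.01, 10_000, 2_000, 1.0, 1.0
    B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(paths, n)), axis=1)
    hit = B >= a
    zeta = np.where(hit.any(axis=1), hit.argmax(axis=1) * dt, np.inf)
    print(np.exp(-0.5 * lam ** 2 * zeta).mean(), np.exp(-lam * a))  # both ~ 0.368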
   Q_µ( { ψ : ψ(t) ∈ Γ & ζ(ψ) ≤ t } ) = E^{Q_µ}[ µ_{t−ζ}( Γ − ψ(ζ) ), ζ ≤ t ],

where, as usual, µ_τ is determined by µ̂_τ = e^{τℓ_µ}. As a consequence,

   Q_µ( { ψ : ψ(t) ∈ Γ & ζ(ψ) > t } ) = µ_t(Γ) − E^{Q_µ}[ µ_{t−ζ}( Γ − ψ(ζ) ), ζ ≤ t ],
is a martingale for all φ ∈ C_c^∞(R^N; R). In this subsection, following Lévy,¹ I will
give another martingale characterization of Brownian motion, this time involving
many fewer test functions. On the other hand, we will have to assume ahead of
time that B( · , ω) ∈ C(R^N) for every ω ∈ Ω.
Theorem 7.2.1 (Lévy). Let B : [0, ∞) × Ω −→ R^N be a progressively mea-
surable function satisfying

   B(0, ω) = 0 and B( · , ω) ∈ C(R^N)   for every ω ∈ Ω.

Then (B(t), F_t, P) is a Brownian motion if and only if

   ( (ξ, B(t))_{R^N}, F_t, P )   and   ( (η, B(t))²_{R^N} − t|η|², F_t, P )

are martingales for all ξ, η ∈ R^N.
where

   ∆_n(ω) ≡ ( ξ, B(ζ_n(ω), ω) − B(ζ_{n−1}(ω), ω) )_{R^N}

and

   δ_n(ω) ≡ |ξ|² ( ζ_n(ω) − ζ_{n−1}(ω) ).

   D_n ≡ exp[ √−1 ∆_n + ½δ_n ] − 1

and

   M_n ≡ exp[ √−1 (ξ, B(ζ_{n−1}))_{R^N} + ½ |ξ|² ζ_{n−1} ].

By Taylor's Theorem,

   | D_n − ( √−1 ∆_n + ½δ_n ) − ½( √−1 ∆_n + ½δ_n )² |
      ≤ (1/6) | √−1 ∆_n + ½δ_n |³ e^{ | √−1 ∆_n + ½δ_n | }.
Hence, after rearranging terms, we see that D_n = √−1 ∆_n − ½( ∆²_n − δ_n ) + E_n,
In other words, we have now proved that, for every ε ∈ (0, 1], the difference
between the two sides of (*) is dominated by 2ε(1 + |ξ|²)(t − s) e^{|ξ|²(1+t)/2}, and so
the equality in (*) has been established.
As in Theorem 7.1.19, the subtlety here is in the use of the continuity as-
sumption. Indeed, the same example that demonstrated its importance there
does so again here. Namely, if {N(t) : t ≥ 0} is a simple Poisson process and
X(t) = N(t) − t, then both (X(t), F_t, P) and (X(t)² − t, F_t, P) are martingales,
but (X(t), F_t, P) is certainly not a Brownian motion.
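It costs only a few lines to see the compensated Poisson example concretely: at
each fixed t both martingale relations hold, yet X(t) is lattice-valued rather than
Gaussian (my sketch):

    import numpy as np

    rng = np.random.default_rng(6)
    t, paths = 5.0, 200_000
    X = rng.poisson(lam=t, size=paths) - t          # X(t) = N(t) - t
    print(X.mean(), (X ** 2 - t).mean())            # ~0, ~0: both martingale laws
    print(np.unique(X)[:5])                          # lattice-valued, not Gaussian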
§ 7.2.2. Doob–Meyer Decomposition, an Easy Case. The continuous pa-
rameter analog of Lemma 5.2.12 is a highly non-trivial result, one that was
proved by P.A. Meyer and led him to his profound analysis of stochastic pro-
cesses. Nonetheless, there is an important case in which Meyer’s result is rel-
atively easy to prove, and that is the case proved in this subsection. However,
before getting to that result, there is a rather fussy matter to be dealt with.
Lemma 7.2.2. For each n ∈ N, let X_n : [0, ∞) × Ω −→ R be a right-continuous,
progressively measurable function with the property that X_n( · , ω) is continuous
for P-almost every ω ∈ Ω. If

   lim_{m→∞} sup_{n>m} ‖X_n( · , ω) − X_m( · , ω)‖_{[0,t]} = 0   (a.s., P) for each t ∈ (0, ∞),
and so (τ, ω) ↦ τ ∧ ζ(ω) and therefore also (τ, ω) ↦ τ ∧ ζ(ω) − τ are progressively
measurable functions. Hence, since B = { (τ, ω) : τ ∧ ζ(ω) − τ ≥ 0 }, B is
progressively measurable.
Now define

   X(t, ω) = lim_{n→∞} X_n(t, ω) if (t, ω) ∈ A,
   X(t, ω) = 0 if (t, ω) ∈ B ∖ A,
   X(t, ω) = X( ζ(ω), ω ) if (t, ω) ∉ B.
where, in the passage to the second to last equality, I have used the fact that
X_{k−1,n} 1_A 1_{[ζ_{k−1,n},ζ_{k,n})}(s) is F_s-measurable and applied Theorem 7.1.14. At the
same time,

   E^P[ X_{k−1,n} ∆_{k,n}(t), A ∩ {ζ_{k−1,n} > s} ]
   = E^P[ X_{k−1,n} ( X(t ∧ ζ_{k,n}) − X(t ∧ ζ_{k−1,n}) ), A ∩ {s < ζ_{k−1,n} ≤ t} ]
   = E^P[ X_{k−1,n} ( X(t) − X(t) ), A ∩ {s < ζ_{k−1,n} ≤ t} ]
   = 0 = E^P[ X_{k−1,n} ∆_{k,n}(s), A ∩ {ζ_{k−1,n} > s} ],

where I have used the fact that X_{k−1,n} 1_A 1_{(s,t]}(ζ_{k−1,n}) is F_{t∧ζ_{k−1,n}}-measurable
and again applied Theorem 7.1.14 in getting the second to last line. After
combining these, one sees that E^P[ X_{k−1,n} ∆_{k,n}(t), A ] = E^P[ X_{k−1,n} ∆_{k,n}(s), A ],
which means that ( X_{k−1,n} ∆_{k,n}(t), F_t, P ) is a P-almost surely continuous mar-
tingale.
Given the preceding, it is clear that, for each n and ℓ, ( M_n(t ∧ ζ_{ℓ,n}), F_t, P )
is a P-almost surely continuous, square integrable martingale. In addition, for
k ≠ k′, X_{k−1,n} ∆_{k,n}(t ∧ ζ_{ℓ,n}) is orthogonal to X_{k′−1,n} ∆_{k′,n}(t ∧ ζ_{ℓ,n}) in L²(P; R).
Thus

   E^P[ sup_{0≤τ≤t∧ζ_{ℓ,n}} M_n(τ)² ] ≤ 4 E^P[ M_n(t ∧ ζ_{ℓ,n})² ]
   = 4 ∑_{k=1}^ℓ E^P[ X²_{k−1,n} ∆_{k,n}(t ∧ ζ_{ℓ,n})² ] ≤ 4C² ∑_{k=1}^ℓ E^P[ ∆_{k,n}(t ∧ ζ_{ℓ,n})² ]
   = 4C² E^P[ X(t ∧ ζ_{ℓ,n})² ] ≤ 4C² E^P[ X(t)² ],

from which it is easy to see that ( M_n(t), F_t, P ) is a P-almost surely continuous,
square integrable martingale.
I will now show that lim_{m→∞} sup_{n>m} ‖M_n − M_m‖_{[0,t]} = 0 P-almost surely
and in L²(P; R) for each t ∈ [0, ∞). To this end, define Y^{(m)}_{k−1,n} so that Y^{(m)}_{k−1,n}(ω)
= X_{k−1,n}(ω) − X_{ℓ−1,m}(ω) when ζ_{ℓ−1,m}(ω) ≤ ζ_{k−1,n}(ω) < ζ_{ℓ,m}(ω). Then Y^{(m)}_{k−1,n}
is F_{ζ_{k−1,n}}-measurable, |Y^{(m)}_{k−1,n}| ≤ 1/m (a.s., P), and M_n − M_m = ∑_{k=1}^∞ Y^{(m)}_{k−1,n} ∆_{k,n}.
Hence, by the same reasoning as above,

   E^P[ ‖M_n − M_m‖²_{[0,t]} ] ≤ 4 ∑_{k=1}^∞ E^P[ (Y^{(m)}_{k−1,n})² ∆_{k,n}(t)² ] ≤ (4/m²) E^P[ X(t)² ],
particular,

   lim sup_{t→∞} X(t)/√( 2⟨X⟩(t) log^{(2)}⟨X⟩(t) ) = 1
      = − lim inf_{t→∞} X(t)/√( 2⟨X⟩(t) log^{(2)}⟨X⟩(t) )

P-almost surely.
Proof: Clearly, given the first part, the last assertion is a trivial application of
Exercise 4.3.15.
After replacing F and the F_t's by their completions and applying Exercise
7.1.22, I may and will assume that X(0, ω) = 0, X( · , ω) is continuous, ⟨X⟩( · , ω)
is continuous and strictly increasing, and lim_{t→∞} ⟨X⟩(t, ω) = ∞ for every ω ∈ Ω.
Next, for each (t, ω) ∈ [0, ∞) × Ω, set ζ_t(ω) = ⟨X⟩^{−1}(t, ω), where ⟨X⟩^{−1}( · , ω) is the
inverse of ⟨X⟩( · , ω). Clearly, for each ω ∈ Ω, t ↦ ζ_t(ω) is a continuous, strictly
increasing function that tends to infinity as t → ∞. Moreover, because ⟨X⟩ is
progressively measurable, ζ_t is a stopping time for each t ∈ [0, ∞). Now set
B(t) = X(ζ_t). Since it is obvious that X(t) = B( ⟨X⟩(t) ), all that I have to
show is that ( B(t), F′_t, P ) is a Brownian motion for some non-decreasing family
{F′_t : t ≥ 0} of sub-σ-algebras.
Trivially, B(0, ω) = 0 and B( · , ω) is continuous for all ω ∈ Ω. In addi-
tion, B(t) is F_{ζ_t}-measurable, and so B is progressively measurable with respect
to {F_{ζ_t} : t ≥ 0}. Thus, by Theorem 7.2.1, I will be done once I show that
( B(t), F_{ζ_t}, P ) and ( B(t)² − t, F_{ζ_t}, P ) are martingales. To this end, first observe
that

   E^P[ sup_{τ∈[0,ζ_t]} X(τ)² ] = lim_{T→∞} E^P[ sup_{τ∈[0,T∧ζ_t]} X(τ)² ]
Thus, X(T ∧ ζ_t) −→ B(t) in L²(P; R) as T → ∞. Now let 0 ≤ s < t and A ∈ F_{ζ_s}
be given. Then, for each T > 0, A_T ≡ A ∩ {ζ_s ≤ T} ∈ F_{T∧ζ_s}, and so, by
Theorem 7.1.14,

   E^P[ X(T ∧ ζ_t), A_T ] = E^P[ X(T ∧ ζ_s), A_T ]

and

Now let T → ∞, and apply the preceding convergence assertion to get the
desired conclusion.
§ 7.2.3. Burkholder’s Inequality Again. In this subsection we will see what
Burkholder’s Inequality looks like in the continuous parameter setting, a result
whose importance for the theory of stochastic integration is hard to overstate.
Theorem 7.2.6 (Burkholder). Let ( X(t), F_t, P ) be a P-almost surely con-
tinuous, square integrable martingale. Then, for each p ∈ (1, ∞) and t ∈ [0, ∞)
(cf. (6.3.2)),

(7.2.7)   B_p^{−1} ‖X(t) − X(0)‖_{L^p(P;R)} ≤ E^P[ ⟨X⟩(t)^{p/2} ]^{1/p} ≤ B_p ‖X(t) − X(0)‖_{L^p(P;R)}.
Proof: After completing the σ-algebras if necessary, I may (cf. Exercise 7.1.22)
and will assume that X( · , ω) is continuous and that ⟨X⟩( · , ω) is continuous and
non-decreasing for every ω ∈ Ω. In addition, I may and will assume that X(0) =
0. Finally, I will assume that X is bounded. To justify this last assumption, let
ζ_n = inf{ t ≥ 0 : |X(t)| ≥ n }, set X_n(t) = X(t ∧ ζ_n), and use Exercise 7.2.10 to
see that one can take ⟨X_n⟩(t) = ⟨X⟩(t ∧ ζ_n). Hence, if we know (7.2.7) for bounded
martingales, then

   B_p^{−1} ‖X(t ∧ ζ_n)‖_{L^p(P;R)} ≤ E^P[ ⟨X⟩(t ∧ ζ_n)^{p/2} ]^{1/p} ≤ B_p ‖X(t ∧ ζ_n)‖_{L^p(P;R)}
for all n ≥ 1. Since ⟨X⟩ is non-decreasing, we can apply Fatou's Lemma to the
preceding and thereby get

   ‖X(t)‖_{L^p(P;R)} ≤ lim_{n→∞} ‖X(t ∧ ζ_n)‖_{L^p(P;R)} ≤ B_p E^P[ ⟨X⟩(t)^{p/2} ]^{1/p},

which is the left-hand side of (7.2.7). To get the right-hand side, note that either
‖X(t)‖_{L^p(P;R)} = ∞, in which case there is nothing to do, or ‖X(t)‖_{L^p(P;R)} < ∞,
in which case, by the second half of Theorem 7.1.9, X(t ∧ ζ_n) −→ X(t) in
L^p(P; R) and therefore

   E^P[ ⟨X⟩(t)^{p/2} ]^{1/p} = lim_{n→∞} E^P[ ⟨X⟩(t ∧ ζ_n)^{p/2} ]^{1/p}
      ≤ B_p lim_{n→∞} ‖X(t ∧ ζ_n)‖_{L^p(P;R)} = B_p ‖X(t)‖_{L^p(P;R)}.
Proceeding under the above assumptions and referring to the notation in the
proof of Theorem 7.2.3, begin by observing that, for any t ∈ [0, ∞) and n ∈
N, Theorem 7.1.14 shows that ( X(t ∧ ζ_{k,n}), F_{t∧ζ_{k,n}}, P ) is a discrete parameter
martingale indexed by k ∈ N. In addition, ζ_{k,n} = t for all but a finite number
of k's. Hence, by (6.3.7) applied to ( X(t ∧ ζ_{k,n}), F_{t∧ζ_{k,n}}, P ),

   B_p^{−1} ‖X(t)‖_{L^p(P;R)} ≤ E^P[ ⟨X⟩_n(t)^{p/2} ]^{1/p} ≤ B_p ‖X(t)‖_{L^p(P;R)}   for all n ∈ N.

In particular, this shows that sup_{n≥0} ‖⟨X⟩_n(t)‖_{L^p(P;R)} < ∞ for every p ∈ (1, ∞),
and therefore, since ⟨X⟩_n(t) −→ ⟨X⟩(t) (a.s., P), this is more than enough to
verify that E^P[ ⟨X⟩_n(t)^{p/2} ] −→ E^P[ ⟨X⟩(t)^{p/2} ] for every p ∈ (1, ∞).
(i) Given R ∈ (0, ∞), set ζ_R = inf{ t ≥ 0 : |X(t)| ≥ R }, and show that

   ( e^{X(t∧ζ_R)} − 1 − ½ ∫_0^{t∧ζ_R} e^{X(τ)} d⟨X⟩(τ), F_t, P )

is a martingale.
Hint: Choose F ∈ C_c^∞(R; R) so that F(x) = e^x for x ∈ [−2R, 2R], apply
Exercise 7.2.8 to this F, and then use Doob's Stopping Time Theorem.
(ii) Apply Theorem 7.1.17 to the martingale in (i) and e^{−½⟨X⟩(t∧ζ_R)} to show
that ( E(t ∧ ζ_R), F_t, P ) is a martingale.

(iii) By replacing X and R with 2X and 2R in (ii), show that
Finally, given this estimate, show that the conclusion in Exercise 7.2.8 continues
to hold for any F ∈ C 2 (R; C) whose second derivative has at most exponential
growth.
   X(t) = f( t, B(t) ) − ∫_0^t ( ∂_τ + ½∆ ) f( τ, B(τ) ) dτ,

   Y(t) = g( t, B(t) ) − ∫_0^t ( ∂_τ + ½∆ ) g( τ, B(τ) ) dτ,

   f( t, B(t) )² − 2X(t) ∫_0^t ( ∂_τ + ½∆ ) f( τ, B(τ) ) dτ
      − ( ∫_0^t ( ∂_τ + ½∆ ) f( τ, B(τ) ) dτ )²,
Similarly,

   E^P[ exp( √−1 (ξ, Z̃(t))_{R^N} − t ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ e^{2√−1(ξ,Z(t∧ζ))_{R^N}} exp( −√−1 (ξ, Z(t))_{R^N} − t ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ exp( √−1 (ξ, Z(t ∧ ζ))_{R^N} − (t ∧ ζ) ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ exp( √−1 (ξ, Z(s ∧ ζ))_{R^N} − (s ∧ ζ) ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ exp( √−1 (ξ, Z̃(s))_{R^N} − s ℓ_µ(ξ) ), A ∩ {ζ > s} ].
To check that

   P( {B(t) ∈ Γ} ∩ A_1(t) ) − P( {B(t) ∈ 2(a_2 − a_1) + Γ} ∩ A_1(t) ) ≥ 0   when Γ ∈ B_{[a_1,∞)},

   ∑_{m=1}^M [ P( B(t) ∈ 2a_1 − 2(m − 1)(a_2 − a_1) − Γ ) − P( B(t) ∈ 2m(a_2 − a_1) + Γ ) ]
      + P( {B(t) ∈ 2M(a_2 − a_1) + Γ} ∩ A_1(t) )

for all Γ ∈ B_{[a_1,∞)}. The same line of reasoning applies when Γ ∈ B_{(−∞,a_2]} and
A_1(t) is replaced by A_2(t).
Perhaps the most useful consequence of the preceding is the following corollary.
Then

(7.3.5)   P^I(s + t, x, Γ) = ∫_I P^I(t, z, Γ) P^I(s, x, dz).

Next, set

   g̃(t, x) = ∑_{m∈Z} g(t, x + 4m),   where g(t, x) = (2πt)^{−1/2} e^{−x²/2t},

and
Then p^{(−1,1)} is a smooth function that is symmetric in (x, y), strictly positive
on (0, ∞) × (−1, 1)², and vanishes when x ∈ {−1, 1}. Finally, if

then

(7.3.6)   p^I(s + t, x, y) = ∫_I p^I(s, x, z) p^I(t, z, y) dz,
where, in the passage to the second line, I have used Brownian scaling. Now,
use the last part of Theorem 7.3.3, the symmetry of γ_{0,r^{−2}t}, and elementary
rearrangement of terms to arrive first at

   P^I(t, x, Γ) = ∑_{m∈Z} [ γ_{r^{−2}t}( 4m + r^{−1}(Γ − x) ) − γ_{r^{−2}t}( 4m + 2 + r^{−1}(Γ + x − 2c) ) ],

and then at P^I(t, x, dy) = p^I(t, x, y) dy. Given this and (7.3.5), (7.3.6) is obvious.
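For I = (−1, 1) the series is pleasant to compute. The sketch below (mine) assumes
the rearranged form p^{(−1,1)}(t, x, y) = g̃(t, y − x) − g̃(t, y + x + 2), which is what
the display above specializes to when r = 1 and c = 0, and checks symmetry,
vanishing at x = ±1, and sub-probability mass:

    import numpy as np

    def g(t, x):                                    # the Gauss kernel above
        return np.exp(-x * x / (2 * t)) / np.sqrt(2 * np.pi * t)

    def p(t, x, y, M=50):                           # g~(t,y-x) - g~(t,y+x+2)
        m = 4 * np.arange(-M, M + 1)
        return g(t, y - x + m).sum() - g(t, y + x + 2 + m).sum()

    t = 0.7
    print(p(t, 0.2, -0.5), p(t, -0.5, 0.2))         # symmetric in (x, y)
    print(p(t, 1.0, 0.3), p(t, -1.0, 0.3))          # vanishes at x = +-1
    ys = np.linspace(-1, 1, 4001)
    print(sum(p(t, 0.2, y) for y in ys) * (ys[1] - ys[0]))  # total mass <= 1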
Turning to the properties of p^{(−1,1)}(t, x, y), both its symmetry and smooth-
ness are clear. In addition, as the density for P^{(−1,1)}(t, x, · ), it is non-negative,
and, because x ↦ g̃(t, x) is periodic with period 4, it is easy to see that
p^{(−1,1)}(t, ±1, y) = 0. Thus, everything comes down to proving that p^{(−1,1)}(t, x, y)
> 0 for (t, x, y) ∈ (0, ∞) × (−1, 1)². To this end, first observe that, after rear-
ranging terms, one can write p^{(−1,1)}(t, x, y) as

Since each of the terms in the sum over m ∈ Z^+ is positive, we have that

   p^{(−1,1)}(t, x, y) > g(t, y − x) ( 1 − 2e^{−2(1−|x|)(1−|y|)/t} ) ≥ ( 1 − 2e^{−1} ) g(t, y − x)

if t ≤ 2(1 − |x|)(1 − |y|). Hence, for each θ ∈ (0, 1), p^{(−1,1)}(t, x, y) > 0 for all
(t, x, y) ∈ (0, 2θ²] × [−1 + θ, 1 − θ]². Finally, to handle x, y ∈ [−1 + θ, 1 − θ] and
t > 2θ², apply (7.3.6) with I = (−1, 1) to see that

   p^{(−1,1)}( (m + 1)θ², x, y ) ≥ ∫_{|z|≤1−θ} p^{(−1,1)}(θ², x, z) p^{(−1,1)}(mθ², z, y) dz,

and use this and induction to see that p^{(−1,1)}(mθ², x, y) > 0 for all m ≥ 1. Thus,
if n ∈ Z^+ is chosen so that nθ² < t ≤ (n + 1)θ², then another application of
(7.3.6) shows that

   p^{(−1,1)}(t, x, y) ≥ ∫_{|z|≤1−θ} p^{(−1,1)}(t − nθ², x, z) p^{(−1,1)}(nθ², z, y) dz > 0.
and set

   P^G(t, x, Γ) = W^{(N)}( { ψ : x + ψ(t) ∈ Γ & ζ^G_x(ψ) > t } ).

This is the probabilistic version of Duhamel's Formula, which we will see again
in § 10.3.1.
(iii) As a consequence of (ii), show that there is a Borel measurable function
p^G : (0, ∞) × G² −→ [0, ∞) such that (t, y) ↦ p^G(t, x, y) is continuous for
each x ∈ G and P^G(t, x, dy) = p^G(t, x, y) dy for each (t, x) ∈ (0, ∞) × G. In
particular, use this in conjunction with (i) to conclude that

   p^G(s + t, x, y) = ∫_G p^G(t, z, y) p^G(s, x, z) dz.
Hint: Keep in mind that (τ, ξ) ↦ (2πτ)^{−N/2} e^{−|ξ|²/2τ} is smooth and bounded as
long as ξ stays away from the origin.
(iv) Given c = (c_1, . . . , c_N) ∈ R^N and r > 0, let Q(c, r) denote the open cube
∏_{i=1}^N (c_i − r, c_i + r), and show that (cf. Corollary 7.3.4)

   p^{Q(c,r)}(t, x, y) = ∏_{i=1}^N p^{(c_i−r,c_i+r)}(t, x_i, y_i).
1 See I.E. Segal’s “Distributions in Hilbert space and canonical systems of operators,” T.A.M.S.,
88 (1958) and L. Gross’s “Abstract Wiener spaces,” Proc. 5th Berkeley Symp. on Prob. &
Stat., 2 (1965), Univ. of California Press. A good exposition of this topic can be found in
H.-H. Kuo’s Gaussian Measures in Banach Spaces, Springer-Verlag, Math. Lec. Notes., # 463
(1975).
Proof: It is obvious that the inclusion map taking Θ(RN ) into C(RN ) is con-
tinuous. To see that k · kΘ(RN ) is lower semicontinuous on C(RN ) and that
Θ(RN ) ∈ BC(RN ) , note that, for any s ∈ [0, ∞) and R ∈ (0, ∞),
n o
A(s, R) ≡ ψ ∈ C(RN ) : ψ(t) ≤ R(1 + t) for t ≥ s
by
θ (es )
F (θ) (s) = , s ∈ R.
1 + es
1 I use |λ| to denote the variation measure determined by λ.
§ 8.1 The Classical Wiener Space 301
As is well known, C0 R; RN with the uniform norm is a separable Banach space,
N N
and it is obvious that F is an isometry from Θ(R ) onto
C0 R; R . Moreover,
by the Riesz Representation Theorem for C0 R; RN , one knows that the dual
of C0 R; RN is isometric to the space of totally finite, RN -valued measures
on R; BR with the norm given by total variation. Hence, the identification
∗
of Θ(RN ) reduces to the obvious interpretation of the adjoint map F ∗ as a
mapping from totally finite RN -valued measures onto the space of RN -valued
measures that do not charge 0 and whose variation measure integrates (1 + t).
Because of the Strong Law in part (ii) of Exercise 4.3.11, it is clear that almost
every Brownian path is in Θ(RN ). In addition, by the Brownian scaling property
and Doob’s Inequality (cf. Theorem 7.1.9),
∞
X
−n+1
kBk2Θ(RN ) 2
P
P
E ≤ 4 E sup |B(t)|
n=0 0≤t≤2n
X∞
−n+2 2
≤ 32EP |B(1)|2 = 32N.
P
= 2 E sup |B(t)|
n=0 0≤t≤1
Turning to the properties of µ̂, note that its continuity with respect to weak*
convergence is an immediate consequence of Lebesgue’s Dominated Convergence
Theorem. Furthermore, in view of the preceding, we will know that µ̂ completely
determines µ as soon as we show that, for each n ∈ Z+ and X ∗ = x∗1 , . . . , x∗n ∈
n
E ∗ , µ̂ determines the marginal distribution µX ∗ ∈ M1 (RN ) of
x ∈ E 7−→ hx, x∗1 i, . . . , hx, x∗n i ∈ Rn
I will now compute the Fourier transform of W (N) . To this end, first recall
that, for an RN -valued Brownian motion, { ξ, B(t) RN : t ≥ 0 and ξ ∈ RN
spans a Gaussian family G(B) in L2 (P; R). Hence, span ξ, θ(t) : t ≥
0 and ξ ∈ RN is a Gaussian family in L2 (W (N ) ; R). From this, combined
with an easy limit argument using Riemann sum approximations, one sees that,
∗
for any λ ∈ Θ(RN ) , θ hθ, λi is a centered Gaussian random variable under
W (N ) . Furthermore, because, for 0 ≤ s ≤ t,
(N ) (N )
EW ξ, θ(s) RN η, θ(t) RN = EW
ξ, θ(s) RN η, θ(s) RN = s ξ, η RN ,
we can apply Fubini’s Theorem to see that
ZZ
(N )
EW hθ, λi2 =
s ∧ t λ(ds) · λ(dt).
[0,∞)2
Obviously, nothing very significant has happened yet, since nothing very excit-
ing has been done yet. However, if we now close our eyes, suspend our disbelief,
and pass to the limit as n tends to infinity and the tk ’s become dense, we arrive
at Feynman’s representation 2 of Wiener’s measure:
" Z #
(N ) 1 1 2
(8.1.4) W dθ) = exp − θ̇(t) dt dθ,
Z 2 [0,∞)
2 In truth, Feynman himself never dabbled in considerations so mundane as the ones that
√
follow. He was interested in the Schödinger equation, and so he had a factor −1 multiplying
the exponent.
304 8 Gaussian Measures on a Banach Space
is Lebesgue measure for every n ∈ Z+ and 0 < t1 · · · < tn , dθ must be the nonex-
istent translation invariant measure on the infinite dimensional space Θ(RN ). Fi-
nally, the integral in the exponent only makes sense if θ is differentiable in some
sense, but almost no Brownian path is. Nonetheless, ridiculous as it is, (8.1.4)
is exactly the expression at which one would arrive if one were to make a suffi-
ciently naı̈ve interpretation of the notion that Wiener measure is the standard
Gauss measure on the Hilbert space H(RN ) consisting of absolutely continuous
h : [0, ∞) −→ RN with h(0) = 0 and
Θ(RN ) is clear. Knowing this, abstract reasoning (cf. Lemma 8.2.3) guarantees
∗
that Θ(RN ) can be identified as a subspace of H1 (RN ). That is, for each λ ∈
∗
Θ(RN ) , there is a hλ ∈ H1 (RN ) with the property that h, hλ H1 (RN ) = hh, λi
for all h ∈ H1 (RN ), and in the present setting it is easy to give a concrete
∗
representation of hλ . In fact, if λ ∈ Θ(RN ) , then, for any h ∈ H1 (RN ),
Z Z Z !
hh, λi = h(t) · λ(dt) = ḣ(τ ) dτ · λ(dt)
(0,∞) (0,∞) (0,t)
Z
= ḣ(τ ) · λ (τ, ∞) dτ = h, hλ H1 (RN ) ,
(0,∞)
where Z
hλ (t) = λ (τ, ∞) dτ.
(0,t]
§ 8.1 The Classical Wiener Space 305
Moreover,
Z Z ZZ
khλ k2H1 (RN ) = λ (τ, ∞) |2 dτ =
λ(ds) · λ(dt) dτ
(0,∞) (0,∞)
(τ,∞)2
ZZ
= s ∧ t λ(ds) · λ(dt).
(0,∞)2
Hence, by (8.1.3),
khλ k2H(RN )
!
∗
(8.1.5) \
W (N ) (λ) = exp − , λ ∈ Θ(RN ) .
2
Although (8.1.5) is far less intuitively appealing than (8.1.4), it provides a
mathematically rigorous way in which to think of W (N ) as the standard Gaussian
measure on H1 (RN ). Furthermore, there is another way to understand why one
should accept (8.1.5) as evidence for this way of thinking about W (N ) . Indeed,
∗
given λ ∈ Θ(RN ) , write
Z Z T
hθ, λi = lim θ(t) · λ(dt) = − lim θ(t) · dλ (t, ∞) ,
T →∞ [0,T ] T →∞ 0
where the integral in the last expression is taken in the sense of Riemann–
Stieltjes. Next, apply the integration by part formula3 to conclude that t
λ (t, ∞) is Riemann–Stieltjes integrable with respect to t θ(t) and that
Z T Z T
− θ(t) · dλ (t, ∞) = −θ(T ) · λ (T, ∞) + λ (t, ∞) · dθ(t).
0 0
Hence, since
|θ(T )|
Z
lim |θ(T )||λ|(T, ∞) ≤ lim (1 + t) |λ|(dt) = 0,
T →∞ T →∞ 1 + T (0,∞)
Z T
(8.1.6) hθ, λi = lim ḣλ (t) · dθ(t),
T →∞ 0
where again the integral is in the sense of Riemann–Stieltjes. Thus, if one
dt, one can believe that hθ, λi provides a
somewhat casually writes dθ(t) = θ̇(t)
reasonable interpretation of θ, hλ H(RN ) for all θ ∈ Θ(RN ), not just those that
are in H1 (RN ).
Because R. Cameron and T. Martin were the first mathematicians to system-
atically exploit the consequences of this line of reasoning, I will call H1 (RN ) the
Cameron–Martin space for classical Wiener measure.
3See, for example, Theorem 1.2.7 in my A Concise Introduction to the Theory of Integration,
Birkhäuser (1999).
306 8 Gaussian Measures on a Banach Space
is an algebra that generates BH . Show that there always exists a finitely additive
WH on A that is uniquely determined by the properties that it is σ-additive on
A(g1 , . . . , gn ) for every n ∈ Z+ and {g1 , . . . , gn } ⊆ H and that
Z h√ i
kgk2H
exp −1 (h, g)H WH (dh) = exp − , g ∈ H.
H 2
On the other hand, as we already know, this finitely additive measure admits a
countably additive extension to BH if and only if H is finite dimensional.
§ 8.2 A Structure Theorem for Gaussian Measures
Say that a centered Gaussian measure W on a separable Banach space E is
non-degenerate if EW hx, x∗ i2 > 0 unless x∗ = 0. (See Exercise 8.2.11.) In
this section I will show that any non-degenerate, centered Gaussian measure W
on a separable Banach space E shares the same basic structure that W (N ) has
on Θ(RN ). In particular, I will show that there is always a Hilbert space H ⊆ E
for which W is the standard Gauss measure in the same sense that W (N ) was
shown in § 8.1.2 to be the standard Gauss measure for H1 (RN ).
§ 8.2.1. Fernique’s Theorem. In order to carry out my program, I need a
basic integrability result about Banach space–valued, Gaussian random vari-
ables. The one that I will use is due to X. Fernique, and his is arguably the
most singularly beautiful result in the theory of Gaussian measures on a Banach
space.
Theorem 8.2.1 (Fernique’s Theorem). Let E be a real, separable Banach
space, and suppose that X is an E-valued random variable that is centered and
Gaussian in the sense that, for each x∗ ∈ E ∗ , hX, x∗ i is a centered, R-valued
Gaussian random variable. If R = inf{r : P(kXkE ≤ r) ≥ 34 )}, then
h kXk2E i ∞ 2n
1
X e
(8.2.2) E e 18R2 ≤ K ≡ e 2 + .
n=0
3
Proof: After enlarging the sample space if necessary, I may and will assume
that there is an E-valued random variable X 0 that is independent of X and has
1 1
the same distribution as X. Set Y = 2− 2 (X + X 0 ) and Y 0 = 2− 2 (X − X 0 ).
Then the pair (Y, Y 0 ) has the same distribution as the pair (X, X 0 ). Indeed, by
2
Lemma 8.1.2, this comes down to showing that the R ∗-valued random variable
hY, x i, hY , x i has the same distribution as hX, x i, hX 0 , x∗ i , and that is
∗ 0 ∗
3
Now suppose that P kXk ≤ R ≥ 4, and define {tn : n ≥ 0} by t0 = R and
1
tn = R + 2 2 tn−1 for n ≥ 1. Then
2
P kXkE ≤ R P kXkE ≥ tn ≤ P kXkE ≥ tn−1
and therefore
!2
P kXkE ≥ tn P kXkE ≥ tn−1
≤
P kXkE ≤ R P kXkE ≤ R
for n ≥ 1. Working by induction, one gets from this that
!2n
P kXkE ≥ tn P kXkE ≥ R
≤
P kXkE ≤ R P kXkE ≤ R
n+1
2 −1 n+1 n+1 n
and therefore, since tn = R 2 R ≤ 3−2 .
1 ≤ 32 2 R, that P kXkE ≥ 32 2
2 2 −1
Hence,
h kXk2E i ∞
1 X n n n+1
e2 P 32 2 R ≤ kXkE ≤ 32 2 R
EP e 18R2 ≤ e 2 P kXkE ≤ 3R +
n=0
∞ n
1
X e 2
≤ e2 + = K.
n=0
3
§ 8.2.2. The Basic Structure Theorem. I will now abstract the relationship,
proved in § 8.1.2, between Θ(RN ), H1 (RN ), and W (N ) , and for this purpose I
will need the following simple lemma.
308 8 Gaussian Measures on a Banach Space
Lemma 8.2.3. Let E be a separable, real Banach space, and suppose that
H ⊆ E is a real Hilbert space that is continuously embedded as a dense subspace
of E.
(i) For each x∗ ∈ E ∗ there is a unique hx∗ ∈ H with the property that
h, hx∗ H = hh, x∗ i for all h ∈ H, and the map x∗ ∈ E ∗ 7−→ hx∗ ∈ H is
linear, continuous, one-to-one, and onto a dense subspace of H.
(ii) If x ∈ E, then x ∈ H if and only if there is a K < ∞ such that |hx, x∗ i| ≤
Kkhx∗ kH for all x∗ ∈ E ∗ . Moreover, for each h ∈ H, khkH = sup{hh, x∗ i : x∗ ∈
E ∗ & kx∗ kE ∗ ≤ 1}.
(iii) If L∗ is a weak* dense subspace of E ∗ , then there exists a sequence {x∗n :
n ≥ 0} ⊆ L∗ such that {hx∗n : n ≥ 0} is an orthonormal basis for H. Moreover,
P∞
if x ∈ E, then x ∈ H if and only if n=0 hx, x∗n i2 < ∞. Finally,
∞
X
h, h0 hh, x∗n ihh0 , x∗n i for all h, h0 ∈ H.
H
=
n=0
2
P∞ ∗ 2
P∞ ∗ 2
P khkH ∗= n=0 hh, xn i . Finally, if x ∈ E and
In particular, n=0 hx, xn i < ∞,
set g = m=0 hx, xn ihx∗n . Then g ∈ H and hx − g, x i = 0 for all x∗ ∈ S ∗ .
∗
The terminology is justified by the fact, demonstrated at the end of § 8.1.2,
that H1 (RN ), Θ(RN ), W (N ) is an abstract Wiener space. The concept of an
abstract Wiener space was introduced by Gross, although his description was
somewhat different from the one just given (cf. Theorem 8.3.9 for a reconciliation
of mine with his definition).
Theorem 8.2.5. Suppose that E is a separable, real Banach space and that
W ∈ M1 (E) is a centered Gaussian measure that is non-degenerate. Then there
exists a unique Hilbert space H such that (H, E, W) is an abstract Wiener space.
q
Proof: By Fernique’s Theorem, we know that C ≡ EW kxk2E < ∞.
To understand the proof of existence, it is best to start with the proof of
uniqueness. Thus, suppose that H is a Hilbert space for which (E, H, W) is an
abstract Wiener space. Then, for all x∗ , y ∗ ∈ E ∗ , hhx∗ , y ∗ i = (hx∗ , hy∗ )H =
hhy∗ , x∗ i. In addition,
Z
∗
hhx∗ , x i = khx∗ k2H = hx, x∗ i2 W(dx),
310 8 Gaussian Measures on a Banach Space
Moreover, by (*),
Z
∗ ∗ ∗
hhx∗ , y i = xhx, x i W(dx), y for all y ∗ ∈ E ∗ ,
and so Z
(***) hx∗ = xhx, x∗ i W(dx).
and therefore both (*) and (***) hold for this choice of H. Further, given (*), it
is clear that khx∗ k2H is the variance of h · , x∗ i and therefore that (8.2.4) holds.
At the same time, just as in the derivation of (**), kF (ψ)kE ≤ CkψkL2 (W;R) =
CkF (ψ)kH , and so H is continuously embedded inside E. Finally, by the Hahn–
Banach Theorem, to show that H is dense in E it suffices to check that the only
x∗ ∈ E ∗ such Rthat hF (ψ), x∗ i = 0 for all ψ ∈ Ψ is x∗ = 0. But when ψ = h · , x∗ i,
hF (ψ), x∗ i = hx, x∗ i2 W (dx), and therefore, because W is non-degenerate, such
an x∗ would have to be 0.
§ 8.2.3. The Cameron–Marin Space. Given a centered, non-degenerate
Gaussian measure W on E, the Hilbert space H for which (H, E, W) is an ab-
stract Wiener space is called its Cameron–Martin space. Here are a couple
of important properties of the Cameron–Martin subspace.
§ 8.2 A Structure Theorem for Gaussian Measures 311
Now choose ` so that max1≤m≤n |hhk − h, x∗m i| < for all k ≥ `. Then, for any
x∗ ∈ BE ∗ (0, 1) and all k ≥ `,
Since, by the uniform boundedness principle, supk≥1 khk kH < ∞, this proves
that khk − hkE = sup{hhk − h, x∗ i : x∗ ∈ BE ∗ (0, 1)} −→ 0 as k → ∞.
S∞
Because H = 1 BH (0, n) and BH (0, n) is a compact subset of E for each
n ∈ Z+ , it is clear that H ∈ BE . To see that W(H) = 0 when E is infinite
dimensional, choose {x∗n : n ≥ 0} as in the final part of Lemma 8.2.3, and
set Xn (x) = hx, x∗n i. Then the Xn ’s are an infinite
P∞ sequence of independent,
centered, Gaussians with mean value 1, and so n=0 Xn2 = ∞ W-almost surely.
Hence, by Lemma 8.2.3, W-almost no x is in H.
Turning to the map I, define I(hx∗ ) = h · , x∗ i. Then, for each x∗ , I(hx∗ ) is
a centered Gaussian with variance khx∗ k2H , and so I is a linear isometry from
312 8 Gaussian Measures on a Banach Space
1 − kh−gk2H
e 2 λH (dh),
Z
which could be rewritten as
(8.2.8) (τg )∗ W(dx) (dh) = Rg (x) W (dx), where Rg = exp I(g) − 12 kgk2H .
That (8.2.8) is correct was proved for the classical Wiener space by Cameron
and Martin, and for this reason it is called the Cameron–Martin formula. In
fact, one has the following result, the second half of which is due to Segal.
Exercises for § 8.2 313
for all ξ1 , ξ2 ∈ C. Indeed, this is obvious when ξ1 and ξ2 are pure imaginary,
and, since both sides are entire functions of (ξ1 , ξ2 ) ∈ C2 , it follows in general
by analytic
√ continuation. In particular, by taking h1 = g, ξ1 = 1, h2 = hx∗ , and
ξ2 = −1, it is easy to check that the right-hand side of (*) is equal to ν̂(x∗ ).
To prove the second assertion, begin by recalling from Lemma 8.2.3 that if
y ∈ E, then y ∈ H if and only if there is a K < ∞ with the property that
|hy, x∗ i| ≤ K for all x∗ ∈ E ∗ with khx∗ kH = 1. Now suppose that (τx∗ )∗ W 6⊥
W, and let R be the Radon–Nikodym derivative of its absolutely continuous
part. Given x∗ ∈ E ∗ with khx∗ kH = 1, let Fx∗ be the σ-algebra generated by
x hx, x∗ i, and check that (τy )∗ W Fx∗ W Fx∗ with Radon–Nikodym
derivative
hy, x∗ i2
∗ ∗
Y (x) = exp hy, x ihx, x i − .
2
Hence,
1 2
Y ≥ EW R Fx∗ ≥ EW R 2 Fx∗ ,
hy, x∗ i2
1 1
exp − = EW Y 2 ≥ α ≡ EW R 2 ∈ (0, 1].
8
whether one can choose a countable subset C ⊆ N such that x ∈ Ê if and only
if hx, x∗ i = 0 for all x∗ ∈ C. For this purpose, recall that, by Exercise 5.1.19, E ∗
with the weak* topology is second countable and therefore that N is separable
with respect to the weak* topology.
Exercise 8.2.12. Let {xP n : n ≥ 0} be a sequence in the P
separable Banach space
∞ ∞
E with the property that n=0 kxn kE < ∞. Show that n=0 |ξn |kxP n k < ∞ for
∞
N
γ0,1 -almost every ξ ∈ RN , and define X : RN −→ E so that X(ξ) = n=0 ξn xn
P∞
if n=0 |ξn |kxn kE < ∞ and X(ξ) = 0 otherwise. Show that the distribution
µ of X is a centered, Gaussian measure on E. In addition, show that µ is
non-degenerate if and only if the span of {xn : n ≥ 0} is dense in E.
Exercise 8.2.13. Here an application of Fernique’s Theorem to functional anal-
ysis. Let E and F be a pair of separable Banach spaces and ψ a Borel measurable,
linear map from E to F . Given a centered, Gaussian E-valued random variable
X, use Exercise 2.3.21 see that ψ ◦ X is an F -valued, a centered Gaussian ran-
dom variable, and apply Fernique’s Theorem to conclude that ψ ◦ X is a square
integrable and has mean value 0. Next, suppose that ψ is not continuous, and
choose {xn : n ≥ 0} ⊆ E and {yn : n ≥ 0} ⊆ F ∗ so that kxn kE = 1 = kyn ∗ kF ∗
and hψ(xn ), yn∗ i ≥ n + 13 . Using Exercise 8.2.12, show that there exist cen-
tered, Gaussian F -valued random variables {Xn : n ≥ 0},P {X n : n ≥ 0},
N −2 ∞
and X under γ0,1 such that Xn (ξ) = (n + 1) ξn xn , X(ξ) = n=0 Xn (ξ), and
X n (ξ) = X(ξ) − Xn (ξ) for γ0,1
N
-almost every ξ ∈ RN . Show that
Z Z
kψ ◦ X(ξ)k2F N
γ0,1 ≥ hψ ◦ X(ξ), yn∗ i γ0,1
(dξ) N
(dξ)
Z
≥ hψ ◦ Xn (ξ), yn∗ i γ0,1
N
(dξ) ≥ (n + 1),
EP kSkpE = σ p EW kxkpE
for all p ∈ [0, ∞).
Hint: Using Exercise 8.2.11, reduce to the case when W is non-degenerate. For
this case, let H be the Cameron–Martin space for W on E, and show that
h √ ∗
i σ2 2
EP e −1hS,x i = e− 2 khx∗ kH for all x∗ ∈ E ∗ .
Exercise 8.2.15. Referring to the setting in Lemma 8.2.3, show that there is a
(n)
sequence {k · kE : n ≥ 0} of norms on E each of which is commensurate with
(N )
k · kE (i.e., Cn−1 k · k ≤ k · kE ≤ Cn k · k for some Cn ∈ [1, ∞)) such that, for
each R > 0,
(n)
BH (0, R) = {x ∈ E : kxkE ≤ R for all n ≥ 0}.
which is remarkably close to the equality that holds when E = R. See Corollary
8.4.3 for a sharper statement.
Exercise 8.2.17. Again let E be a separable, real Banach space. Suppose that
{Xn : n ≥ 1} is a sequence for centered, Gaussian E-valued random variables on
some probability space (Ω, F, P) and that Xn −→ X in P-probability. Show that
X is again a centered,
Gaussian random variable and that there exists a λ > 0
2
for which supn≥1 EP eλkXn kE < ∞. Conclude, in particular, that Xn −→ X in
Lp (P; E) for every p ∈ [1, ∞).
316 8 Gaussian Measures on a Banach Space
Exercise 8.2.18. Given λ ∈ Θ(RN )∗ , I pointed out at the end of § 8.1.2 that the
Paley–Wiener integral
[I(hλ )](θ) can be interpreted as the Riemann–Stieltjes
integral of λ (s, ∞) with respect to θ(s). In this exercise, I will use this obser-
vation as the starting point for what is called stochastic integration.
(i) Given λ ∈ Θ(RN )∗ and t > 0, set λt (dτ ) = 1[0,t) (τ )λ(dτ ) + δt λ [t, ∞) , and
where again the integral on the right is Riemann–Stieltjes. Use this to see that
the process Z t
f (τ ) · dθ(τ ) : t ≥ 0
0
Of course, unless f has bounded variation, the integrals in the preceding are
no longer interpretable as Riemann–Stieltjes integrals. In fact, they not even
defined θ by θ but only as a stochastic process. For this reason, they are called
stochastic integrals.
§ 8.3 From Hilbert to Abstract Wiener Space 317
Theorem 8.3.1 says that there is a one-to-one correspondence between the ab-
stract Wiener spaces associated with one Hilbert space and the abstract Wiener
spaces associated with any other. In particular, it allows us to prove the theorem
of Gross which states that every Hilbert space is the Cameron–Martin space for
some abstract Wiener space.
Corollary 8.3.2. Given a separable, real Hilbert space H, there exists a
separable Banach space E and a W ∈ M1 (E) such that (H, E, W) is an abstract
Wiener space.
Proof: Let F : H 1 (R) −→ H be an isometric isomorphism, and use Theorem
8.3.1 to construct a separable Banach space E and an isometric, isomorphism
F̃ : Θ(R) −→ E so that (H, E, W) is an abstract Wiener space when W =
F̃∗ W (1) .
It is important to recognize that although a non-degenerate, centered Gaussian
measure on a Banach space E determines a unique Cameron–Martin space H,
a given H will be the Cameron–Martin space for an uncountable number of
abstract Wiener spaces. For example, in the classical case when H = H1 (RN ),
we could have replaced Θ(RN ) by a subspace which reflected the fact that almost
every Brownian path is locally Hölder continuous of any order less than a half.
We will see a definitive, general formulation of this point in Corollary 8.3.10.
§ 8.3.2. Wiener Series. The proof that I gave of Corollary 8.3.2 is too non-
constructive to reveal much about the relationship between H and the abstract
Wiener spaces for which it is the Cameron–Martin space. Thus, in this sub-
section I will develop another, entirely different way of constructing abstract
Wiener spaces for a Hilbert space.
The approach here has its origins in one of Wiener’s own constructions of
Brownian motion and is based on the following line of reasoning. Given H,
choose an orthonormal basis {hn : n ≥ 0}. If there were a standard Gauss
measure W on H, then the random variables {Xn : n ≥ 0} given by Xn (h) =
h, hn H would be independent, standard normal, R-valued random variables,
P∞
and, for each h ∈ H, 0 Xn (h)hn would converge in H to h. Even though
W cannot live on H, this line of reasoning suggests that a way to construct an
abstract Wiener space is to start with a sequence {Xn : n ≥ 0} of R-valued,
independent standard normalPrandom variables on some probability space, find
∞
a Banach space E in which 0 Xn hn converges with probability 1, and take
W on E to the distribution of this series.
§ 8.3 From Hilbert to Abstract Wiener Space 319
To convince oneself that this line of reasoning has a chance of leading some-
where, one should observe that Lévy’s construction corresponds to a particu-
lar choice of the orthonormal basis {hm : m ≥ 0}.1 To see this, determine
{ḣk,n : (k, n) ∈ N2 } by
on k21−n , (2k + 1)2−n
1
n−1
−1 on (2k + 1)2−n , (k + 1)21−n
ḣk,0 = 1[k,k+1) and ḣk,n = 2 2
0 elsewhere
for n ≥ 1. Clearly, the ḣk,n ’s are orthonormal in L2 [0, ∞); R . In addition, for
each n ∈ N, the span of {ḣk,n : k ∈ N} equals that of {1[k2−n ,(k+1)2−n ) : k ∈ N}.
Perhaps the easiest way to check this is to do so by dimension counting. That
is, for a given (`, n) ∈ N2 , note that
has the same number of elements as {1[k2−n ,(k+1)2−n ) : `2n ≤ k < (` + 1)2n }
and that the first set is contained in the span of the second. As a consequence,
we know that {ḣk,n : (k, n) ∈ N2 } is an orthonormal basis in L2 [0, ∞); R , and
Rt
so, if hk,n (t) = 0 ḣk,n (τ ) dτ and (e1 , . . . , eN ) is an orthonormal basis in RN ,
then
hk,n,i ≡ hk,n ei : (k, n, i) ∈ N2 × {1, . . . , N }
is
an orthonormal basis, known as the Haar basis, in H1 (RN ). Finally, if
2
Xk,n,i : (k, n, i) ∈ N ×{1, . . . , N } is a family of independent, N (0, 1)-random
PN
variables and Xk,n = i=1 Xk,n,i ei , then
X ∞ X
n X N ∞
n X
X
Xk,m,i hk,m,i (t) = hk,m (t)Xk,m
m=0 k=0 i=1 m=0 k=0
1 The observation that Lévy’s construction (cf. § 4.3.2) can be interpreted in terms of a Wiener
series is due to Z. Ciesielski. To be more precise, initially Ciesielski himself was thinking
entirely in terms of orthogonal series and did not realize that he was giving a re-interpretation
of Lévy’s construction. Only later did the connection become clear.
320 8 Gaussian Measures on a Banach Space
and if S : RN −→ E is given by
P∞
m=0 ξm hm when the series converges in E
S(ξ) =
0 otherwise,
then H, E, W with W = S∗ γ0,1 N
is an abstract Wiener space. Conversely, if
(H, E, W) is an abstract Wiener space and {hm : m ≥ 0} is an orthogonal
sequence in H such that, for each m ∈ N, either hm = 0 or khm kH = 1, then
"
p #
Xn
W
(8.3.5) sup
I(hm )hm
< ∞ for all p ∈ [1, ∞),
E
n≥0
m=0
E
P∞
and, for W-almost every x ∈ E, m=0 [I(hm )](x)hm converges
in E to the
W-conditional expectation value of x given σ {I(hm ) : m ≥ 0} . Moreover,
∞
X ∞
X
[I(hm )](x)hm is W-independent of x − [I(hm )](x)hm .
m=0 m=0
h √ i n
N Y 1 2 1 2
c ∗ ) = lim Eγ0,1
W(x e −1hSn ,λi = lim e− 2 (hx∗ ,hm )H = e− 2 khx∗ kH ,
n→∞ n→∞
m=0
§ 8.3 From Hilbert to Abstract Wiener Space 321
W
Hence, I(h) is F -measurable for every h ∈ H. In particular, this means that
W
x hx, x∗ i is F -measurable for every x∗ ∈ E ∗ , and so, since BE is generated
W
by {h · , x∗ i : x∗ ∈ E ∗ }, BE ⊆ F .
It is important to acknowledge that the preceding theorem does not give an-
other proof of Wiener’s theorem that Brownian motion exists. Instead, it simply
says that, knowing it exists, there are lots of ways in which to construct it. See
Exercise 8.3.21 for a more satisfactory proof of the same conclusion in the clas-
sical case, one that does not require the a priori existence of W (N ) .
The following result shows that, in some sense, a non-degenerate, centered,
Gaussian measure W on a Banach space does not fit on a smaller space.
Corollary 8.3.6. If W is a non-degenerate, centered Gaussian measure on
a separable Banach space E, then E is the support of W in the sense that W
assigns positive probability to every non-empty open subset of E.
Proof: Let H be the Cameron–Martin space for W. Since H is dense in E, it
suffices to show that W BE (g, r) > 0 for every g ∈ H and r > 0. Moreover,
since, by the Cameron–Martin formula (8.2.8) (cf. Exercise 8.2.19)
kgk2
q
H
≤e 2 W BE (g, r) ,
322 8 Gaussian Measures on a Banach Space
I need only show that W BE (0, r) > 0 for all r > 0. To this end, choose an
Pn
orthonormal basis {hm : m ≥ 0} in H, and set Sn = m=0 I(hm )hm . Then, by
Theorem 8.3.3, x Sn (x) is W-independent of x x − Sn (x) and
Sn (x) −→ x
in E for W-almost every x ∈ E. Hence, W {kx − Sn (x)kE < 2r } ≥ 12 for some
n ∈ N, and therefore
W BE (0, r) ≥ 12 W kSn kE < 2r .
Pn
But kSn k2E ≤ CkSn k2H = m=0 I(hm )2 for some C < ∞, and so
n+1
W kSn kE < 2r ≥ γ0,1 r
BRn+1 0, 2C > 0 for any r > 0.
for some Km < ∞ which depends only on `m . Moreover, and because L ⊥ Lnm ,
g̃m,i ⊥ Lnm for all 1 ≤ i ≤ `m. Hence, we can find an nm+1 ≥ nm + `m so that
span {hn : nm < n ≤ nm+1 } admits an orthonormal basis {fnm +1 , . . . , fnm+1 }
P`
with the property that 1m kgm,i − fnm +i kH ≤ 4 .
Clearly {fn : n ≥ 0} is an orthonormal basis for H. On the other hand,
2 12
2 12
nmX+`m
X`m
EW
I(fn )fn
≥ − EW
I(gm,i )gm,i − I(fnm +i )fnm +i
n=nm +1 E 1 E
`m
X
2 1
EW
I(gm,i )gm,i − I(fnm +i )fnm +i
H 2 ,
≥−
1
2 1
and so, since EW
I(gi,m )gm,i − I(fnm +i )fnm +i
H 2 is dominated by
2 1 1
EW
I(gm,i ) − I(fnm +i ) gm,i
H 2 + EW I(fnm +i )2 2 kgm,i − fnm +i kH
≤ 2kgm,i − fnm +i kH ,
324 8 Gaussian Measures on a Banach Space
we have that
2 12
nmX+`m
EW
I(fn )fn
≥ for all m ≥ 0,
n +1
2
m E
P∞
and this means that 0 I(fn )fn cannot be converging in L2 (W; E).
Besides showing that my definition of an abstract Wiener space is the same
as Gross’s, Theorem 8.3.9 allows us to prove a very convincing statement, again
due to Gross, of just how non-unique is the Banach space for which a given
Hilbert space is the Cameron–Martin space.
Corollary 8.3.10. If (H, E, W) is an abstract Wiener space, then there
exists a separable Banach space E0 that is continuously embedded in E as a
measurable subset and has the properties that W(E 0 ) = 1, bounded subsets of
E0 are relatively compact in E, and (H, E0 , W E0 is again an abstract Wiener
space.
Proof: Again I will assume that k · kE ≤ k · kH .
Choose {x∗n : n ≥ 0} ⊆ E ∗ so that {hn : n ≥ 0} is an orthonormal basis
in H when hn = hx∗n , and set Ln = span {h0 , . . . , hn } . Next, using Theo-
rem 8.3.9, choose an increasing sequence {nm : m ≥ 0} so that n0 = 0 and
1
EW kPL xk2E 2 ≤ 2−m for m ≥ 1 and finite dimensional L ⊥ Lnm , and define
Pm
Finally, set Sm = PLnm = `=0 Q` , and define E0 to be the set of x ∈ E such
that
∞
X
`2
Q` xkE < ∞
kxkE0 ≡ kQ0 xkE + and kSm x − xkE −→ 0.
`=1
and therefore k · kE0 is certainly a norm on E0 . Next, suppose that the sequence
{xk : k ≥ 1} ⊆ E0 is a Cauchy sequence with respect to k · kE0 . By the
preceding, we know that {xk : k ≥ 1} is also Cauchy convergent with respect to
§ 8.3 From Hilbert to Abstract Wiener Space 325
Thus, by choosing k for a given > 0 so that sup`>k kx` − xk kE0 < , we
conclude that limm→∞ kx − Sm xkE < and therefore that Sm x −→ x in E.
Hence, x ∈ E0 . Finally, to see that xk −→ x in E0 , simply note that
∞
X
kx − xk kE0 = kQ0 (x − xk )kE + m2 kQm (x − xk )kE
m=1
∞
!
X
2
≤ lim kQ0 (x` − xk )kE + m kQm (x` − xk )kE ≤ sup kx` − xk kE0 ,
`→∞ m=1 `>k
which tends to 0 as k → ∞.
To show that bounded subsets of E0 are relatively compact in E, it suffices
to show that if {x` : ` ≥ 1} ⊆ BE0 (0, R), then there is an x ∈ E to which a
subsequence converges in E. For this purpose, observe that, for each m ≥ 0,
there is a subsequence {x`k : k ≥ 1} along which Sm x`k converges in Lnm .
Hence, by a diagonalization argument, {x`k : k ≥ 1} can be chosen so that
{Sm x`k : k ≥ 1} converges in Lnm for all m ≥ 0. Since, for 1 ≤ j < k,
X
kx`k − x`j kE ≤ kSm x`k − Sm x`j kE + kQn (x`k − x`j )kE
n>m
X 1
≤ kSm x`k − Sm x`j kE + 2R ,
n>m
n2
1 2 2 khkE khkE
W
1
≥ E I(h) = .
2m khk2H khkH
Hence, we now know that h ⊥ Lnm =⇒ khkE ≤ 2−m khkH . In particular,
kQm+1 hkE ≤ 2−m kQm+1 hkH ≤ 2−m khkH for all m ≥ 0 and h ∈ H, and so
∞ ∞
!
X
2
X m2
khkE0 = kQ0 hkE + m kQm hkE ≤ 1 + 2 m
khkH = 25khkH .
m=1 m=1
2
To complete the proof, I must show that H is dense in E0 and that, for each
c0 (y ∗ ) = e− 12 khy∗ k2H , where W0 = W E0 and hy∗ ∈ H is determined
y ∗ ∈ E0∗ , W
by h, hy∗ H = hh, y ∗ i for h ∈ H. Both these facts rely on the observation that
X
kx − Sm xkE0 = n2 kQn xkE −→ 0 for all x ∈ E0 .
n>m
n
X htm − htm−1
PL θ = θ(tm ) − θ(tm−1 ) ,
t − tm−1
m=1 m
and so
θ(t1 ,... ,tn ) (t) ≡ [θ − PL θ](t)
t−tm−1
(
(8.3.11) θ(t) − θ(tm−1 ) − tm −tm−1 θ(tm ) − θ(tm−1 ) if t ∈ [tm−1 , tm ]
=
θ(t) − θ(tn ) if t ∈ [tn , ∞).
§ 8.3 From Hilbert to Abstract Wiener Space 327
Thus, if (θ, ~y) ∈ Θ(RN ) × (RN )n 7−→ θ(t1 ,... ,tn ),~y ∈ Θ(RN ) is given by
n
X htm − htm−1
θ(t1 ,... ,tn ),~y = θ(t1 ,... ,tn ) + (ym − ym−1 ),
t − tm−1
m=1 m
then
Z
F θ, θ(t1 ) − θ(t0 ), . . . , θ(tn ) − θ(tn−1 ) W (N ) (dθ)
Θ(RN )
(8.3.13) Z Z
(N )
= F θ̌(t1 ,... ,tn ),~y , y W
~ (dθ) γ0,D(t1 ,... ,tn ) (d~y),
(RN )n Θ(RN )
where D(t1 , . . . , tn )(m,i),(m0 ,i0 ) = (tm − tm−1 )δm,m0 δi,i0 for 1 ≤ m, m0 ≤ n and
1 ≤ i, i0 ≤ N is the covariance matrix for θ(t1 ) − θ(t0 ), . . . , θ(tn ) − θ(tn−1 )
under W (N ) .
There are several comments that should be made about these conclusions. In
the first place, it is clear from (8.3.11) that t θ(t1 ,... ,tn ) (t) returns to the origin
at each of the times {tm : 1 ≤ m ≤ n}. In addition, the excursions θ(t1 ,... ,tn )
[tm−1 , tm ], 1 ≤ m ≤ n, are independent of each other and of θ(t1 ,... ,tn ) [tn , ∞).
(N )
Secondly, if W(t1 ,... ,tn ),~y denotes the W (N ) -distribution of θ θ(t1 ,... ,tn ),~y , then
(8.3.12) says that
(N )
θ W(t1 ,... ,tn ),(θ(t1 ),... ,θ(tn ))
has distribution W. Hence, all that remains is to check that I(h)◦TO = I(O> h)
W-almost surely for each h ∈ H. To this end, let x∗ ∈ E ∗ , and observe that
∞
X
∗
[I(hx∗ )](TO x) = hTO x, x i = hx∗ , Ohm H
[I(hm )](x)
m=0
∞
X
O> hx∗ , hm
= H
[I(hm )](x)
m=0
for W-almost every x ∈ E. Thus, since, by Lemma 8.3.7, the last of these
series convergences W-almost surely to I(O> hx∗ ), we have that I(hx∗ ) ◦ TO =
§ 8.3 From Hilbert to Abstract Wiener Space 329
I(O> hx∗ ) W-almost surely. To handle general h ∈ H, simply note that both
h ∈ H 7−→ I(h) ◦ TO ∈ L2 (W; R) and h ∈ H 7−→ I(O> h) ∈ L2 (W; R) are
isometric, and remember that {hx∗ : x∗ ∈ E ∗ } is dense in H.
I next want to discuss the possibility of TO being ergodic for some orthog-
onal transformations O. First notice that TO cannot be ergodic if O has a
Pn subspace L, since if {h1 , . . . , hn } were
non-trivial, finite dimensional invariant
an orthonormal basis for L, then m=1 I(hm )2 would be a non-constant, TO -
invariant function. Thus, the only candidates for ergodicity are O’s that have
no non-trivial, finite dimensional, invariant subspaces. In a more general and
highly abstract context, I. Segal2 showed that the existence of a non-trivial, fi-
nite dimensional subspace for O is the only obstruction to TO being ergodic.
Here I will show less.
Theorem 8.3.15. Let (H, E, W) be an abstract Wiener space. If O is an
orthogonal transformation
on H with the property that, for every g, h ∈ H,
limn→∞ On g, h H = 0, then TO is ergodic.
Proof: What I have to show is that any TO -invariant element Φ ∈ L2 (W; R)
is W-almost surely constant, and for this purpose it suffices to check that
lim EW (Φ ◦ TOn )Φ = 0
(*)
n→∞
2See I.E. Segal’s “Ergodic subsgroups of the orthogonal group on a real Hilbert Space,” Annals
of Math. 66 # 2, pp. 297–303 (1957). For a treatment in the setting here, see my article “Some
thoughts about Segals ergodic theorem,” Colloq. Math. 118 # 1, pp. 89-105 (2010).
330 8 Gaussian Measures on a Banach Space
where
I Bn
Cn = with Bn = hk , On h`
B>n I H 1≤k,`≤N
Perhaps the best tests for whether an orthogonal transformation satisfies the
hypothesis in Theorem 8.3.15 come from spectral theory. To be more precise, if
Hc and Oc are the space and operator obtained by complexifying H and O, the
Spectral Theorem for normal operators allows one to write
Z 2π √
−1α
Oc = e dEα ,
0
See Exercises 8.3.24, 8.3.25, and 8.5.15 for a more concrete examples.
Exercises for § 8.3
Exercise 8.3.16. The purpose of this exercise is to provide the linear algebraic
facts that I used in the proof of Theorem 8.3.9. Namely, I want to show that if
a set {h1 , . . . , hn } ⊆ H is approximately orthonormal, then the vectors hi differ
by very little from their Gram–Schmidt orthogonalization.
3This conclusion highlights the poverty of the result here in comparison to Segal’s result,
which says that TO is ergodic as soon as the spectrum of Oc is continuous.
Exercises for § 8.3 331
(i) Suppose that A = aij 1≤i,j≤n ∈ Rn ⊗Rn is a lower triangular matrix whose
diagonal entries are non-negative. Show that there is a Cn < ∞, depending only
on n, such that kIRn − Akop ≤ Cn kIRn − AA> kop .
Hint: Show that it suffices to treat the case when AA> ≤ 2IRn , and set ∆ =
IRn − AA> . Assuming that AA> ≤ 2IRn , work by induction on n, at each step
using the lower triangularity of A, to see that
12
`
1 X
|a` ` an ` | ≤ |∆n ` | + (AA> )n2 n a2` j if 1 ≤ ` < n
j=1
n−1
X
1 − a2n n ≤ |∆n n | + a2n ` .
`=1
(ii) Let {h1 , . . . , hn } ⊆ H, set B = (hi , hj )H 1≤i,j≤n , and assume that kIRn −
Bkop < 1. Show that the hi ’s are linearly independent.
(iii) Continuing part (ii), let {f1 , . . . , fn } be the orthonormal set obtained from
the hi ’s by the Gram–Schmidt orthogonalization procedure, and let A be the
matrix whose (i, j)th entry is (hi , fj )H . Show that A is lower triangular and
that its diagonal entries are non-negative. In addition, show that AA> = B.
(iv) By combining (i) and (iii), show that there is a Kn < ∞, depending only
on n, such that
n
X n
X
khi − fi kH ≤ Kn δi,j − (hi , hj )H .
i=1 i,j=1
Pn
Hint: Note that hi = j=1 aij fj and therefore that
n
X 2
khi − fi k2H = IRn − A ij
≤ nkIRn − Ak2op .
j=1
Exercise 8.3.18. Let (H, E, W) be an abstract Wiener space, and assume that
H is infinite dimensional. As was pointed out, {hx∗ : x∗ ∈ E ∗ } is the subspace of
g ∈ H for which there exists a C < ∞ with the property that |(h, g)H | ≤ CkhkE
for all h ∈ H. Show that for each g ∈ H there is separable Banach space Eg
that is continuously embedded as a Borel subset of E such that W(Eg ) = 1,
(H, Eg , W Eg ) is an abstract Wiener space, and |(h, g)H | ≤ khkEg for all
h ∈ H.
Hint: Refer to the notation used in the proof of Corollary 8.3.10. Choose nm %
1
gkH ≤ 2−m and EW kPL k2E 2 ≤ 2−m
∞ so that n0 = 0 and, for m ≥ 1, kΠL⊥ nm
for finite dimensional L ⊥ Lnm . Next, define Eg to be the space of x ∈ E with
the properties that PLnm x −→ x in E and
X
kxkEg ≡ kQ` xkE + Q` x, g H < ∞,
`=0
Pn`
where Q0 x = hx, x∗0 ihx∗0 and Q` x = ∗
n=n`−1 +1 hx, xn ihxn for ` ≥ 1. Using
∗
the reasoning in the proof of Corollary 8.3.10, show that Eg has the required
properties.
Exercise 8.3.19. Let N = 1. Using Theorem 8.3.3, take Wiener’s choice of or-
thonormal basis and check that there are independent, standard normal random
variables {Xm : m ≥ 1} under W (1) such that, for W (1) -almost almost every θ,
∞
1
X sin(πmt)
θ(t) = tX0 (θ) + 2 2 Xm (θ) , t ∈ [0, 1],
m=1
mπ
where the convergence is uniform. From this, show that, W (1) -almost surely,
1 ∞ √
X0 (θ)2 1 X Xm (θ)2 + 8X0 (θ)Xm (θ)
Z
2
θ(t) dt = + 2 ,
0 3 π m=1 m2
where the convergence of the series is absolute. Using the preceding, conclude
that, for any α ∈ (0, ∞),
This is a famous calculation that can be made using many different methods.
We will return to it in § 10.1.3. See, in addition, Exercise 8.4.7.
Hint: Use Euler’s product formula to see that
∞
d sinh t X 1
log = 2t for t ∈ R.
dt t n=1
n π + t2
2 2
Exercise 8.3.20. Related to the preceding exercise, but easier, is finding the
Laplace transform of the variance
!2
1 T 1 T
Z Z
2
VT (θ) ≡ θ(t) dt − θ(t) dt
T 0 T 0
of a Brownian path over the interval [0, T ]. To do this calculation, first use
Brownian scaling to show that
(1) (1)
EW e−αVT = EW e−αT V1 .
Next, use elementary Fourier series to show that (cf. part (iii) of Exercise 8.2.18)
R 2
∞ Z 1 2 X ∞ 1
X 0
f k (t) dθ(t)
V1 (θ) = 2 θ(t) cos(kπt) dt = ,
0 k2 π2
k=1 k=1
1
where fk (t) = 2 sin(kπt) for k ≥ 1. Since the fk ’s are orthonormal as elements
2
Exercise 8.3.21. The purpose of this exercise is to show that, without know-
ing ahead of time that W (N ) lives on Θ(RN ), for the Hilbert space H1 (RN ) one
can give a proof that any Wiener series converges γ0,1 N
-almost surely in Θ(RN ).
N
Thus, let {hm : m ≥ 0} be an orthonormal basis Pn in H(R ) and, for n ∈ N
and ω = (ω0 , . . . , ωm , . . . ) ∈ R , set Sn (t, ω) = m=0 ωm hm (t). The goal is to
N
(iii) As an application of Theorem 4.3.2, show that (*) will follow once one
checks that
N
γ0,1
E sup |Sn (t) − Sn (s)| ≤ B(t − s)2 , 0 ≤ s < t,
4
n≥0
(i) Show that the W (N ) -distribution of {θT (t) : t ≥ 0} is the same as that of
1
{T 2 θ1 (T −1 t) : t ≥ 0}.
(ii) Set H1T (RN ) = {h [0, T ] : h ∈ H1 (RN ) & h(T ) = 0}, and define
(N )
khkH1T (RN ) = kḣkL2 ([0,T ];RN ) . Show that the triple H1T (RN ), ΘT (RN ), WT
(N )
is an abstract Wiener space. In addition, show that WT is invariant under
time reversal. That is, {θ(t) : t ∈ [0, T ]} and {θ(T − t) : t ∈ [0, T ]} have the
(N )
same distribution under WT .
Hint: Begin by identifying ΘT (RN )∗ as the space of finite, RN -valued Borel
measures λ on [0, T ] such that λ({0}) = 0 = λ({T }).
(ii) Complete the program by showing that Oαn h, h0 H1 (RN ) tends to 0 for all
α ∈ (0, ∞) \ {1} and h, h0 ∈ H1 (RN ) with ḣ, ḣ0 ∈ Cc∞ (0, ∞); RN .
(iii) There is another way to think about the operator Oα . Namely, let λRN
be Lebesgue measure on R, define U : H(RN ) −→ L2 (λRN ; RN ) by U h(x) =
x
e 2 ḣ(ex ), and show that U is an isometry from H1 (RN ) onto L2 (λRN ; RN ). Fur-
ther, show that U ◦ Oα = τlog α ◦ U , where τα : L2 (λRN ; RN ) −→ L2 (λRN ; RN ) is
the translation map τα f (x) = f (x + α). Conclude from this that
√
Z
Oαn h, h0 = (2π)−1 e− −1nξ log α
H1 (RN )
Uch(ξ), U
d h0 CN
dξ,
R
(iv) As a consequence of the above and Theorem 6.2.7, show that for each
α ∈ (0, ∞) \ {1}, q ∈ [1, ∞), and F ∈ Lq (W (N ) ; C),
n−1
1 X (N )
F Sαn θ = EW [F ] W (N ) -almost surely and in Lq (W (N ) ; C).
lim
n→∞ n
m=0
khk2H
− inf◦ ≤ lim log W (Γ)
h∈Γ 2 &0
(8.4.2)
khk2H
≤ lim log W (Γ) ≤ − inf .
&0 h∈Γ 2
The original version of Theorem 8.4.1 was proved by M. Schilder for the clas-
sical Wiener measure using a method that does not extend easily to the general
case. The statement that I have given is due to Donsker and S.R.S. Varadhan,
and my proof derives from an approach (which very much resembles the argu-
ments given in § 1.3 to prove Cramér’s Theorem) that was introduced into this
context by Varadhan.
The lower bound is an easy application of the Cameron–Martin formula. In-
deed, all that I have to do is show that if h ∈ H and r > 0, then
khk2H
(*) lim log W BE (h, r) ≥ − .
&0 2
khx∗ k2H
BE (hx∗ , δ) ⊆ BE (h, r) =⇒ lim log W BE (hx∗ , r) ≥ −δkx∗ kE ∗ −
,
&0 2
and therefore, after letting δ & 0 and remembering that {hx∗ : x ∈ E ∗ } is dense
in H, that (*) holds.
338 8 Gaussian Measures on a Banach Space
The proof of the upper bound in (8.4.2) is a little more involved. The first step
is to show that it suffices to treat the case when Γ is relatively compact. To this
end, refer to Corollary 8.3.10, and set CR equal to the closure in E of BE0 (0, R).
2
By Fernique’s Theorem applied to W on E0 , we know that EW eαkxkE0 ≤ K <
Thus, if we can prove the upper bound for relatively compact Γ’s, then, because
Γ ∩ CR is relatively compact, we will know that, for all R > 0,
khk2H
∧ αR2
lim log W (Γ) ≤ − inf ,
&0 h∈Γ 2
kyk2
(
− 2 H if y ∈ H
(**) lim lim log W BE (y, r) ≤
r&0 &0 −∞ if y ∈
/ H.
To see that (**) is enough, assume that it is true and let Γ ∈ BE \{∅} be relatively
compact. Given β ∈ (0, 1), for each y ∈ Γ choose r(y) > 0 and (y) > 0 so that
(1−β)
( 2
e− 2 kykH if y ∈ H
W BE (y, r(y)) ≤ 1
− β
e if y ∈
/H
for all 0 < ≤ (y). Because Γ is relatively compact, we can find N ∈ Z+ and
SN
{y1 , . . . , yN } ⊆ Γ such that Γ ⊆ 1 BE (yn , rn ), where rn = r(yn ). Then, for
sufficiently small > 0,
1−β 2 1
W (Γ) ≤ N exp − inf khkH ∧ ,
2 h∈Γ β
and so
1−β 1
lim log W (Γ) ≤ − inf khk2H ∧ .
&0 2 h∈Γ β
Now let β & 0.
§ 8.4 A Large Deviations Result and Strassen’s Theorem 339
khx∗ k2
−1
−1 ∗ ∗ ∗ −1 ∗ H ∗
≤ e− (hy,x i−rkx kE∗ ) EW e 2 hx,x i = e− hy,x i− 2 −rkx kE∗ ,
Finally, note that the preceding supremum is the same as half the supremum
kyk2
of hy, x∗ i over x∗ with khx∗ kH = 1, which, by Lemma 8.2.3, is equal to 2 H if
y ∈ H and to ∞ if y ∈ / H.
An interesting corollary of Theorem 8.4.1 is the following sharpening, due to
Donsker and Varadhan, of Fernique’s Theorem.
Corollary 8.4.3. Let W be a non-degenerate, centered, Gaussian measure on
the separable Banach space E, let H be the associated Cameron–Martin space,
and determine Σ > 0 by Σ−1 = inf{khkH : khkE = 1}. Then
1
lim R−2 log W kxkE ≥ R = − 2 .
R→∞ 2Σ
α2 2
In particular, EW e 2 kxkE is finite if α < Σ−1 and infinite if α ≥ Σ−1 .
Proof: Set f (r) = inf{khkH : khkE ≥ r}. Clearly f (r) = rf (1) and f (1) =
Σ−1 . Thus, by the upper bound in (8.4.2), we know that
f (1)2 Σ−2
lim R−2 log W kxkE ≥ R = lim R−2 log WR−2 kxkE ≥ 1 ≤ −
= .
R→∞ R→∞ 2 2
α2 Λ2 √
Sn
(*) P ∈
/ G ≤ exp − for all n ∈ Z+ and Λ ≥ M n.
Λ 2n
§ 8.4 A Large Deviations Result and Strassen’s Theorem 341
To check (*), first note (cf. Exercise 8.2.14) that the distribution of Sn under
1
P is the same as that of x n 2 x under W and therefore that P S̃Λn ∈ /G =
W n2 (G{). Hence, (*) is really just an application of the upper bound in (8.4.2).
Λ
Given (*), I proceed in very much the same way as I did at the analogous place
in § 1.5. Namely, for any β ∈ (1, 2),
At this point in § 1.5 (cf. the proof of Lemma 1.5.3), I applied Lévy’s reflection
principle to get rid of the “max.” However, Lévy’s argument works only for
R-valued random variables, and so here I will replace his estimate by one based
on the idea in Exercise 1.4.25.
Lemma 8.4.5. Let {YmP: m ≥ 1} be mutually independent, E-valued random
n
variables, and set Sn = m=1 Ym for n ≥ 1. Then, for any closed F ⊆ E and
δ > 0,
P(kSn − F kE ≥ δ)
P max kSm − F kE ≥ 2δ ≤ .
1≤m≤n 1 − max1≤m≤n P(kSn − Sm kE ≥ δ)
Proof: Set
Am = {kSm − F kE ≥ 2δ and kSk − F kE < 2δ for 1 ≤ k < m}.
Following the hint for Exercise 1.4.25, observe that
P max kSm − F kE ≥ 2δ min P(kSn − Sm kE < δ)
1≤m≤n 1≤m≤n
n
X n
X
≤ P Am ∩ {kSn − Sm kE < δ} ≤ P Am ∩ {kSn − F kE ≥ δ} ,
m=1 m=1
which, because the Am ’s are disjoint, is dominated by P kSn − F kE ≥ δ .
Applying the preceding to the situation at hand, we see that
!
Sn
P max
≥ 2δ
− BH (0, 1)
1≤n≤β m
Λ[β m−1 ]
E
S[βm ]
− BH (0, 1)
≥ δ
P
Λ[β
m−1 ] E
≤ .
1 − max1≤n≤β m P kSn kE ≥ δΛ[β m−1 ]
342 8 Gaussian Measures on a Banach Space
After combining this with the estimate in (*), it is an easy matter to show that,
for each δ > 0, there is a β ∈ (1, 2) such that
∞
!
X
Sn
P max
≥ 2δ < ∞,
− BH (0, 1)
β m−1 ≤n≤β m Λ[β m−1 ]
m=1 E
from which it should be clear why limn→∞ kS̃n − BH (0, 1)kE = 0 P-almost surely.
The proof that, P-almost surely, limn→∞ kS̃n − hkE = 0 for all h ∈ BH (0, 1)
differs in no substantive way from the proof of the analogous assertion in the
second part of Theorem 1.5.9. Namely, because BH (0, 1) is separable, it suffices
to work with one h ∈ BH (0, 1) at a time. Furthermore, just as I did there, I can
reduce the problem to showing that, for each k ≥ 2, > 0, and h with khkH < 1,
∞
X
P
S̃km −km−1 − h
E < = ∞.
m=1
Exercise 8.4.6. Let (H, E, W) be an abstract Wiener space, and assume that
dim(H) = ∞. If W is defined for > 0 as in Theorem 8.4.1, show that
W1 ⊥ W2 if 2 6= 1 .
Hint: Choose {x∗m : m ≥ 0} ⊆ E ∗ so that {hx∗m : m ≥ 0} is an orthonormal
basis in H, and show that
n−1
1 X
lim hx, x∗m i2 = W -almost surely.
n→∞ n
m=0
Exercise 8.4.7. Show that the Σ in Corollary 8.4.3 is 12 in the case of the
classical abstract Wiener space H1 (RN ), Θ(RN ), W (N ) and therefore that
and that
!
2
lim R−2 log W (N )
sup |θ(τ )| ≥ R θ(t) = 0 = − .
R→∞ τ ∈[0,t] t
and that
t
π2
Z
−1 (N ) 2
lim R log W |θ(τ )| dτ ≥ R θ(t) = 0 = − 2 .
R→∞ 0 2t
Hint: In each case after the first, Brownian scaling can be used to reduce the
problem to the case when t = 1, and the challenge is to find the optimal constant
C for which khkE ≤ CkhkH , h ∈ H for the appropriate abstract Wiener space
N
(E, H, W). In the second case E =
C 0 [0, 1] : R ≡ θ [0, 1] : θ ∈ Θ(RN )
and H = η [0, 1] : η ∈ H1 (RN ) , in the third (cf. part (ii) of Exercise 8.3.22)
E = Θ1 (RN ) and H = H11 (RN ) , in the fourth E = L2 [0, 1]; N
R ) and H = {η
1 N 2 N 1 N
[0, 1] : η ∈ H (R )}, and in the fifth E = L [0, 1]; R and
H = H 1 (R ).
The optimization problems when E = Θ(RN ) or C0 [0, 1]; RN are rather easy
1
consequences of |η(t)| ≤ t 2 kηkH1 (RN ) . When E = Θ1 (RN ), one should start with
the observation that if η ∈ H11 (RN ), then 2kηku ≤ kη̇kL1 ([0,1];RN ) ≤ kηkH11 (RN ) .
In the final two cases, one can either use elementary variational calculus or one
can make use of, respectively, the orthonormal bases
1 1
1
πτ : n ≥ 0 and 2 2 sin nπτ : n ≥ 1 in L2 [0, 1]; R).
2 2 sin n + 2
Exercise 8.4.8. Suppose that f ∈ C E; R , and show, as a consequence of
Theorem 8.4.4, that
lim f S̃n = min{f (h) : khkH ≤ 1} and lim f S̃n = max{f (h) : khkH ≤ 1}
n→∞ n→∞
W N -almost surely.
§ 8.5 Euclidean Free Fields
In this section I will give a very cursory introduction to a family of abstract
Wiener spaces they played an important role in the attempt to give a mathe-
matically rigorous construction of quantum fields. From the physical standpoint,
the fields treated here are “trivial” in the sense that they model “free” (i.e.,
non-interacting) fields. Nonetheless, they are interesting from a mathematical
344 8 Gaussian Measures on a Banach Space
standpoint and, if nothing else, show how profoundly properties of a process are
effected by the dimension of its parameter set.
I begin with the case when the parameter set is one dimensional and the
resulting process can be seen as a minor variant of Brownian motion. As we
will see, the intractability of the higher dimensional analogs increases with the
number of dimensions.
§ 8.5.1. The Ornstein–Uhlenbeck Process. Given x ∈ RN and θ ∈ Θ(RN ),
consider the integral equation
1 t
Z
(8.5.1) U(t, x, θ) = x + θ(t) − U(τ, x, θ) dτ, t ≥ 0.
2 0
A completely elementary argument (e.g., via Gronwall’s Inequality) shows that,
for each x and θ, there is at most one solution. Furthermore, integration by
parts allows one to check that if
Z t
− 2t τ
U(t, 0, θ) = e e 2 dθ(τ ),
0
1 In their article “On the theory of Brownian motion,” Phys. Reviews 36 # 3, pp. 823-841
(1930), L. Ornstein and G. Uhlenbeck introduced this process in an attempt to reconcile some
of the more disturbing properties of Wiener paths with physical reality.
§ 8.5 Euclidean Free Fields 345
As
−at consequence,
we see that if {B(t) : t ≥ 0} is a Brownian motion, then
e 2 B et : t ≥ 0 is an ancient Ornstein–Uhlenbeck process. In addition, as
we suspected, the ancient Ornstein–Uhlenbeck process is a stationary process
in the sense that, for each T > 0, the distribution of {UA (t + T ) : t ≥ 0} is
the same as that of {UA (t) : t ≥ 0}, which can be checked either by using the
preceding representation in terms of Brownian motion or by observing that its
covariance is a function of t − s.
In fact, even more is true: it is time reversible in the sense that, for each T > 0,
{UA (t) : t ∈ [0, T ]} has the same distribution as {UA (T − t) : t ∈ [0, T ]}. This
observation suggests that we can give the ancient Ornstein–Uhlenbeck its past
by running it backwards. That is, define UR : [0, ∞) × RN × Θ(RN )2 −→ RN by
if t ≥ 0
U(t, x, θ+ )
UR (t, x, θ+ , θ− ) =
U(−t, x, θ− ) if t < 0,
and consider the process {UR (t, x, θ+ , θ− ) : t ∈ R} under γ0,I × W (N ) × W (N ) .
This process also spans a Gaussian family, and it is still true that
(N ) (N ) |t−s|
(8.5.3) Eγ0,I ×W ×W UR (s) ⊗ UR (t) = u(s, t)I, where u(s, t) ≡ e− 2 ,
346 8 Gaussian Measures on a Banach Space
only now for all s, t ∈ R. One advantage of having added the past is that the
statement of reversibility takes a more appealing form. Namely, {UR (t) : t ∈ R}
is reversible in the sense that its distribution is the same whether one runs
it forward or backward in time. That is, {UR (−t) : t ∈ R} has the same
distribution as {UR (t) : t ∈ R}. For this reason, I will say that {UR (t) : t ≥ 0}
is a reversible Ornstein–Uhlenbeck process if its distribution is the same
as that of {UR (t, x, θ+ , θ− ) : t ≥ 0} under γ0,I × W (N ) × W (N ) .
An alternative way to realize a reversible Ornstein–Uhlenbeck process is to
start with an RN -valued Brownian motion {B(t) : t ≥ 0} and consider the
t t
process {e− 2 B(et ) : t ∈ R}. Clearly ξ, e− 2 B(et ) RN : (t, ξ) ∈ R × RN is
that the Hilbert space associated with this process should be the space HU (RN )
t
of functions hU (t) = e− 2 h et − 1), h ∈ H1 (RN ). Thus, define the map F U :
H1 (RN ) −→ HU (RN ) accordingly, and introduce the Hilbert norm k · kHU (RN )
on HU (RN ) that makes F U into an isometry. Equivalently,
Z
U 2
h d 1 i2
kh kHU (RN ) = (1 + s) 2 hU log(1 + s) ds
[0,∞) ds
Note that
Z
U U d U
1
|h (t)|2 dt = 1
lim |hU (t)|2
ḣ , h L2 ([0,∞);RN )
= 2 2 t→∞ = 0.
[0,∞) dt
1
To check the final equality, observe that it is equivalent to limt→∞ t− 2 |h(t)| = 0
1 1
for h ∈ H(RN ). Hence, since supt>0 t− 2 |h(t)| ≤ khkH1 (RN ) and limt→∞ t− 2 |h(t)|
= 0 if ḣ has compact support, the same result is true for all h ∈ H1 (RN ). In
particular,
q
khU kHU (RN ) = kḣU k2L2 ([0,∞);RN ) + 14 khU k2L2 ([0,∞);RN ) .
and so we will adopt ΘU (RN ) as the Banach space for HU (RN ). Clearly, the
dual space ΘU (RN )∗ of ΘU (RN ) can be identified with the space of RN -valued
Borel
R measures λ on [0, ∞) that give 0 mass to {0} and satisfy kλkΛU (RN ) ≡
[0,∞)
log(e + t) |λ|(dt) < ∞.
(N )
Theorem 8.5.4. Let U0 ∈ M1 ΘU (RN ) be the distribution of {U(t, 0) :
(N )
t ≥ 0} under W (N ) . Then HU (RN ), ΘU (RN ), U0 is an abstract Wiener
space.
Proof: Since Cc∞ (0, ∞); RN is contained in HU (RN ) and is dense in ΘU (RN ),
ZZ
(8.5.5) khU 2
λ kHU (RN ) = u0 (s, t) λ(ds) · λ(dt).
[0,∞)2
Furthermore, it should be clear that one can identify ΘU (R; RN )∗ with the space
of RN -valued Borel measures λ on R satisfying
Z
kλkΛU (R;RN ) ≡ log(e + |t|) |λ|(dt) < ∞.
R
(N )
Then H1 (R; RN ), ΘU (R; RN ), UR is an abstract Wiener space.
|s−t|
− 2
Proof: Set u(s, t) ≡ e , and let λ ∈ ΛU (R; RN ). By the same reasoning
as I used in the preceding proof,
hh, λi = h, hλ H1 (R;RN )
§ 8.5 Euclidean Free Fields 349
and ZZ
khλ k2H1 (R;RN ) = u(s, t) λ(ds) · λ(dt)
R×R
u(τ, t) λ(dt). Hence, since ξ, θ(t) RN : t ≥ 0 & ξ ∈ RN
R
when hλ (τ ) = R
(N ) (N )
spans a Gaussian family in L2 UR ; R and u(s, t)I = EUR θ(s) ⊗ θ(t) , the
proof is complete.
§ 8.5.3. Higher Dimensional Free Fields. Thinking a la Feynman, Theorem
(N )
8.5.6 is saying that UR wants to be the measure on H 1 (R; R) given by
Z
1 1 2 1 2
√ exp − |ḣ(t)| + 4 |h(t)| dt λH1 (R;RN ) (dh),
( 2π)dim(H1 (R;RN )) 2 R
The approach that I will adopt is based on the following subterfuge. The space
H 1 (Rν ; R) is one of a continuously graded family of spaces known as Sobolev
spaces. Sobolev spaces are graded according to the number of derivatives “bet-
ter or worse” than L2 (Rν ; R) their elements are. To be more precise, for each
s ∈ R, define the Bessel operator B s on S (Rν ; C) so that
s
Bds ϕ(ξ) = 1 + |ξ|2 − 2 ϕ̂(ξ).
4
m
When s = −2m, it is clear that B s = 14 −∆ , and so, in general, it is reasonable
to think of B s as an operator that, depending on whether s ≤ 0 or s ≥ 0,
involves taking or restoring derivatives of order |s|. In particular, kϕkH 1 (Rν ;R) =
kB −1 ϕkL2 (Rν ;R) for ϕ ∈ S (Rν ; R). More generally, define the Sobolev space
H s (Rν ; R) to be the separable Hilbert space obtained by completing S (Rν ; R)
with respect to
s Z
−s 1 1
s
khkH s (Rν ;R) ≡ kB hkL2 (Rν ;R) = ν 4 + |ξ|2 |ĥ(ξ)|2 dξ.
(2π) Rν
Obviously, H 0 (Rν ; R) is just L2 (Rν ; R). When s > 0, H s (Rν ; R) is a sub-
space of L2 (Rν ; R), and the quality of its elements will improve as s gets larger.
However, when s < 0, some elements of H s (Rν ; R) will be strictly worse than
elements of L2 (Rν ; R), and their quality will deteriorate as s becomes more neg-
ative. Nonetheless, for every s ∈ R, H s (Rν ; R) ⊆ S 0 (Rν ; R), where S 0 (Rν ; R),
whose elements are called real-valued tempered distributions, is the dual
space of S (Rν ; R). In fact, with a little effort, one can check that an alternative
description of H s (Rν ; R) is as the subspace of u ∈ S 0 (Rν ; R) with the prop-
erty that B −s u ∈ L2 (Rν ; R). Equivalently, H s (Rν ; R) is the isometric image in
S (Rν ; R) of L2 (Rν ; R) under the map B s , and, more generally, H s2 (Rν ; R) is
the isometric image of H s1 (Rν ; R) under B s2 −s1 . Thus, by Theorem 8.3.1, once
we understand the abstract Wiener spaces for any one of the spaces H s (Rν ; R),
understanding the abstract Wiener spaces for any of the others comes down to
understanding the action of the Bessel operators, a task that, depending on what
one wants to know, can be highly non-trivial.
ν+1
Lemma 8.5.7. The space H 2 (Rν ; R) is continuously embedded as a dense
subspace of the separable Banach space C0 (Rν ; R) whose elements are continu-
ous functions that tend to 0 at infinity and whose norm is the uniform norm.
Moreover, given a totally finite, signed Borel measure λ on Rν , the function
Z 1−ν
|x−y|
− 2 π 2
hλ (x) ≡ Kν e λ(dy), with Kν ≡ ,
Rν Γ ν+1
2
ν+1
is an element of H 2 (Rν ; R),
ZZ
|x−y|
khλ k 2
ν+1 = Kν e− 2 λ(dx)λ(dy),
H 2 (Rν ;R)
Rν ×Rν
§ 8.5 Euclidean Free Fields 351
and ν+1
(Rν ; R).
hh, λi = h, hλ ν+1 for each h ∈ H 2
H 2 (Rν ;R)
Proof: To prove the initial assertion, use the Fourier inversion formula to write
√
Z
−ν
h(x) = (2π) e− −1(x,ξ)Rν ĥ(ξ) dξ
Rν
for h ∈ S (R ; R), and derive from this the estimate
ν
Z 12
ν ν+1
2 − 2
khku ≤ (2π)− 2 1
4 + |ξ| dξ khk ν+1 .
Rν H 2 (Rν ;R)
ν+1
Hence, since H 2 (Rν ; R) is the completion of S (Rν ; R) with respect to the
ν+1
norm k · k ν+1 , it is clear that H 2 (Rν ; R) is continuously embedded in
H 2 (Rν ;R)
ν+1
C0 (R ; R). In addition, since S (Rν ; R) is dense in C0 (Rν ; R), H 2 (Rν ; R) is
ν
also.
To carry out the next step, let λ be given, and observe that the Fourier
− ν+1
transform of B ν+1 λ is 14 + |ξ|2 2
λ̂(ξ) and therefore that
√
e− −1(x,ξ)Rν λ̂(ξ)
Z
ν+1 1
B λ(x) = ν+1 dξ
(2π)ν Rν 1
+ |ξ| 2 2
4 √
Z Z −1(y−x,ξ)Rν
1 e
=
ν+1 dξ λ(dy).
(2π)ν Rν Rν 1 + |ξ|2 2
4
where Q(z, R) = z + [−R, R)ν . Indeed, by the argument given in that exer-
cise combined with the higher dimensional analog of Kolmogorov’s continuity
criterion in Exercise 4.3.18, (*) will follow once we show that
N
Eγ0,1 |Sn (y) − Sn (x)|2 ≤ C|y − x|, x, y ∈ Rν ,
for some C < ∞. To this end, set λ = δy − δx , and apply Lemma 8.5.7 to check
n
N
γ0,1
2
X 2
E |Sn (y) − Sn (x)| = hm , hλ ν+1
H 2 (Rν ;R)
m=0
|y−x|
≤ khλ k2 = 2Kν 1 − e−
ν+1
2 .
H 2 (Rν ;R)
Knowing (*), it becomes an easy matter to see that there exists a measur-
able S : Rν × RN −→ R such that x S(x, ω) is continuous of each ω and
§ 8.5 Euclidean Free Fields 353
where k · ku,C denotes the uniform norm over a set C ⊆ Rν . At this point, I
would like to apply Fernique’s Theorem (Theorem 8.2.1) to the Banach space
`∞ N; Cb (Q(z, 1); R) and thereby conclude that there exists an α > 0 such that
N
(**) B ≡ sup Eγ0,1 exp α sup kSn k2u,Q(z,1) < ∞.
z∈Rν n≥0
∞
However, ` N; Cb (Q(z, 1); R) is not separable. Nonetheless, there are two
ways to get around this technicality. The first is to observe that the only place
separability was used in the proof of Fernique’s Theorem was at the beginning,
where I used it to guarantee that BE is generated by the maps x hx, x∗ i as
∗ ∗
x runs over E and therefore that the distribution of X is determined by the
distribution of {hX, x∗ i : x∗ ∈ E ∗ }. But, even though `∞ N; Cb (Q(z, 1); R)
is not separable, one can easily check that it nevertheless possesses this prop-
erty. The second way to deal with the problem is to apply his theorem to
`∞ {0, . . . , N }; Cb (Q(z, 1); R) , which is separable, and to note that the result-
ing estimate can be made uniform in N ∈ N. Either way, p one arrives at (**).
2
Now set ψ(t) = eαt − 1 for t ≥ 0. Then ψ −1 (s) = α−1 log(1 + s), and
ν
sup kSn ku,Q(0,M ) = max sup kSn ku,Q(m,1) : m ∈ Q(0, M ) ∩ Z
n≥0 n≥0
X
≤ ψ −1 ψ sup kSn ku,Q(m,1) .
n≥0
m∈Q(0,M )∩Zν
354 8 Gaussian Measures on a Banach Space
and therefore
N
h i
Eγ0,1 supn≥0 kSn ku,Q(0,em4 )
" #
N
γ0,1 Sn (x) X
E sup sup ≤
|x|≥R n≥0 log(e + |x|) 1
log(e + e(m−1)4 )
m≥(log R) 4
p
X log(1 + 2ν eν(m+1)4 B)
≤ √ −→ 0 as R → ∞.
1
α log(e + e(m−1)4 )
m≥(log R) 4
To complete the proof, I must show that, for any α > 12 , W ν+1 -almost
H 2 (Rν ;R)
no θ is anywhere Hölder continuous of order α, and for this purpose I will proceed
as in the proof of Theorem 4.3.4. Because the {θ(x + y) : x ∈ Rν } has the same
W ν+1 ν -distribution for all y, it suffices for me to show that, W ν+1 ν -
H 2 (R ;R) H 2 (R ;R)
almost surely, there is no x ∈ Q(0, 1) at which θ is Hölder continuous of order
α > 12 . Now suppose that α ∈ 12 , 1 , and observe that, for any L ∈ Z+ and
e ∈ Sν−1 , the set H(α) of θ’s that are α-Hölder continuous at some x ∈ Q(0, 1)
is contained in
∞ \
[ ∞ [ L n
\ o
m+(`−1)e
m+`e
M
θ : θ n −θ n ≤ nα .
M =1 n=1 m∈Q(0,n)∩Zν `=1
Hence, again using translation invariance, we see that we need only show that
there is an L ∈ Z+ such that, for each M ∈ Z+ ,
(`−1)e
nν W ν+1 ν θ : θ `e M
n − θ n ≤ nα , 1 ≤ ` ≤ L
H 2 (R ;R)
−1
tends to 0 as n → ∞. To this end, set U (t, θ) = Kν 2 θ(te), and observe that
the W ν+1 ν -distribution of {U (t) : t ≥ 0} is that of an R-valued ancient
H 2 (R ;R)
Ornstein–Uhlenbeck process. Thus, what I have to estimate is
` ` `−1 `−1
P e− 2n B e n − e− 2n B e n ≤ nMα , 1 ≤ ` ≤ L ,
where B(t), Ft , P is an R-valued Brownian motion. But clearly this probability
is dominated by the sum of
` `−1 `
P B e n − B e n ≤ M2ne 2n
α , 1 ≤ ` ≤ L
Exercises for § 8.5 355
and `
1 `−1
P ∃1 ≤ ` ≤ L 1 − e− 2n B e n ≥ M e 2n
2nα .
M 2 n2(1−α)
The second of these is easily dominated by 2Le− 8 , which, since α < 1,
means that it causes no problems. As for the first, one can use the independence
of Brownian increments and Brownian scaling to dominate it by the Lth power
of
1
P B(1)−B e− n ≤ M (2nα )−1 . Hence, I can take any L such that α− 12 L >
ν.
As a consequence of the preceding and Theorem 8.3.1, we have the following
corollary.
Corollary 8.5.9. Given s ∈ R, set
ν+1 ν+1
Θs (Rν ; R) = B s− 2 θ : θ ∈ Θ 2 (Rν ; R) ,
ν+1
kθkΘs (Rν ;R) = kB 2 −s θk ν+1 ,
Θ 2 (Rν ;R)
and ν+1
WH s (Rν ;R) = (B s− 2 )∗ W ν+1 .
H 2 (Rν ;R)
Then Θs (Rν ; R) is a separable Banach space in which H (Rν ; R) is continuously
s
embedded as a dense subspace, and H s (Rν ; R), Θs (Rν ; R), WH s (Rν ;R) is an
abstract Wiener space.
Exercises for § 8.5
Exercise 8.5.10. In this exercise we will show how to use the Ornstein–Uhlen-
beck process to prove Poincaré’s Inequality
(8.5.11) Varγ0,1 (ϕ) = kϕ − hϕ, γ0,1 ik2L2 (γ0,1 ;R) ≤ kϕ0 k2L2 (γ0,1 ;R)
for the standard Gaussian distribution on R. I will outline the proof of (8.5.11)
for ϕ ∈ S (R; R), but the estimate immediately extends to any ϕ ∈ L2 (γ0,1 ; R)
whose (distributional) first derivative is again in L2 (γ0,1 ; R).
(i) For ϕ ∈ S (R; R), set
(1)
uϕ (t, x) = EW
ϕ U (t, x) ,
where {U (t, x) : t ≥ 0} is the one-sided, R-valued Ornstein–Uhlenbeck process
t
starting at x. Show that u0ϕ (t, x) = e− 2 uϕ0 (t, x) and that
lim uϕ (t, · ) = ϕ and lim uϕ (t, · ) = hϕ, γ0,1 i in L2 (γ0,1 ; R).
t&0 t→∞
Show that another expression for uϕ is
t
!
(y − e− 2 x)2
Z
1
−t − 2
uϕ (t, x) = 2π(1 − e ) ϕ(y) exp − dy.
R 2(1 − e−t )
Using this second expression, show that uϕ (t, · ) ∈ S (R; R) and that t ∈
[0, ∞) 7−→ uϕ (t, · ) ∈ S (R; R) is continuous. In addition, show that u̇ϕ (t, x) =
1 00 0
2 uϕ (t, x) − xuϕ (t, x) .
356 8 Gaussian Measures on a Banach Space
(ii) For ϕ1 , ϕ2 ∈ C 2 (R; R) whose second derivative are tempered, show that
and use this together with (i) to show that, for any ϕ ∈ S (R; R),
d
huϕ (t, · ), γ0,1 i = hϕ, γ0,1 i and kuϕ (t, · )k2L2 (γ0,1 ;R) = −e−t kuϕ0 (t, · )k2L2 (γ0,1 ;R) .
dt
Conclude that kuϕ (t, · )kL2 (γ0,1 ;R) ≤ kϕkL2 (γ0,1 ;R) and
d
kuϕ (t, · )k2L2 (γ0,1 ;R) ≥ −e−t kϕ0 k2L2 (γ0,1 ;R) .
dt
(ϕ0 )2
1
(*) ϕ log ϕ γ0,1
≤
2 ϕ γ0,1
e−t (ϕ0 )2
d
uϕ (t, · ) log uϕ (t, · ) γ0,1 ≥ − .
dt 2 ϕ γ0,1
Exercise 8.5.13. Although it should be clear that the arguments given in Ex-
ercises 8.5.10 and 8.5.12 work equally well in RN and yield (8.5.11) and (2.4.42)
with γ0,1 replaced by γ0,I and (ϕ0 )2 replaced by |∇ϕ|2 , it is significant that each
of these inequalities for R implies its RN analog. Indeed, show that Fubini’s The-
orem is all that one needs to pass to the higher dimensional results. The reason
why this remark is significant is that it allows one to prove infinite dimensional
versions of both Poincaré’s Inequality and the logarithmic Sobolev Inequality,
and both of these play a crucial role in infinite dimensional analysis. In fact,
Nelson’s interest in hypercontractive estimates sprung from his brilliant insight
that hypercontractive estimates would allow him to construct a non-trivial (i.e.,
non-Gaussian), translation invariant quantum field for R2 .
Exercise 8.5.14. It is interesting to see what happens if one changes the sign
of the second term on the right-hand side of (8.5.1), thereby converting the
centripetal force into a centrifugal one.
(i) Show that, for each θ ∈ Θ(RN ), the unique solution to
Z t
1
V(t, θ) = θ(t) + 2 V(τ, θ) dτ, t ≥ 0,
0
is Z t
t τ
V(t, θ) = e 2 e− 2 dθ(τ ),
0
(iii) Let {B(t) : t ≥ 0} be an RN -valued Brownian motion, and show that the
distribution of
t
e 2 B 1 − e−t : t ≥ 0
and set kθkΘV (RN ) ≡ supt≥0 e−t |θ(t)|. Show that ΘV (RN ); k · kΘV (RN ) is a
separable Banach space and that there exists a unique V (N ) ∈ M1 ΘV (RN )
such that the distribution of {θ(t) : t ≥ 0} under V (N ) is the same as the
distribution of {V(t) : t ≥ 0} under W (N ) .
358 8 Gaussian Measures on a Banach Space
Theorem 8.6.1. With H 1 (H) and Θ(E) as above, there is a unique W (E) ∈
M1 Θ(E) such that H 1 (H), Θ(E), W (E) is an abstract Wiener space.
most part, the proof follows the same basic line of reasoning as that suggested in
Exercise 8.3.21 when E = RN . However, there is a problem here that we did not
encounter there. Namely, unless E is finite dimensional, bounded subsets will
not necessarily be relatively compact in E. Hence, local uniform equicontinuity
plus local boundedness is not sufficient to guarantee
that a collection of E-valued
paths is relatively compact in C [0, ∞); E , and that is the reason why we have
to work a little harder here.
But,
with variance khλ k2H 1 (H) . To this end, define x∗m ∈ E ∗ so that1 hx, x∗m i =
hh1m x, λi for x ∈ E. Then,
n
X
hB( · , x), λi = lim hSn ( · , x), λi = lim hxm , x∗m i W N -almost surely.
n→∞ n→∞
0
Finally, to complete the proof, all that remains is to take W (E) to be the
W N -distribution of x B( · , x).
§ 8.6.2. Brownian Formulation. Let (H, E, W) be an abstract Wiener space.
Given a probability space (Ω, F, P), a non-decreasing family of sub-σ-algebras
{Ft : t ≥ 0}, and a measurable map B : [0, ∞) × Ω −→ E, say that the triple
B(t), Ft , P is a W-Brownian motion if
(1) B is {Ft : t ≥ 0}-progressively measurable,
(2) B(0, ω) = 0 and B( · , ω) ∈ C [0, ∞); E for P-almost every ω,
(3) B(1) has distribution W, and, for all 0 ≤ s < t, B(t)−B(s) is independent
1
of Fs and has the same distribution as (t − s) 2 B(1).
Lemma 8.6.4. Suppose that {B(t) : t ≥ 0} satisfies conditions (1) and (2).
Then B(t), Ft , P is a W-Brownian motion if and only if hB(t), x∗ i, Ft , P is
hB(t1 ), x∗1 i + hB(t2 ), x∗2 i = hB(t1 ), x∗1 + x∗2 i + hB(t2 ) − B(t1 ), x∗2 i,
and the terms on the right are independent, centered Gaussians, the first with
variance t1 khx∗1 + hx∗2 k2H and the second with variance (t2 − t1 )khx∗2 k2H .
Finally, take Ft = σ {B(τ ) : τ ∈ [0, t]} , and assume that G(B) is a Gaussian
family satisfying (8.6.5). Given x∗ with khx∗ kH = 1 and 0 ≤ s < t, we know
that hB(t) − B(s), x∗ i = hB(t), x∗ i − hB(s), x∗ i is orthogonal in L2 (P; R) to
hB(τ ), y ∗ i for every τ ∈ [0, s] and y ∗ ∈ E ∗ . Hence, since Fs is generated by
{hB(τ ), y ∗ i : (τ, y ∗ ) ∈ [0, s]×E ∗ }, we know that hB(t)−B(s), x∗ i is independent
of Fs . In addition, hB(t) − B(s), x∗ i is a centered Gaussian with variance t − s,
and so we have proved that hB(t), x∗ i, Ft , P is an R-valued Brownian
motion.
Now apply the first part of the lemma to conclude that B(t), Ft , P is a W-
Brownian motion.
Theorem 8.6.6. Refer to the notation in Theorem 8.6.1. When Ω = Θ(E),
F = BE , and Ft = σ {θ(τ ) : τ ∈ [0, t]} , θ(t), Ft , W (E) is a W-Brownian
motion. Conversely, if B(t), Ft , P is any W-Brownian motion, then B( · , ω) ∈
Θ(E) P-almost surely and W (E) is the P-distribution of ω B( · , ω).
Proof: To prove the first assertion, let t1 , t2 ∈ [0, ∞) and x∗1 , x∗2 ∈ E ∗ be given,
and define λi ∈ Θ(E)∗ so that hθ, λi i = hθ(ti ), x∗i i for i ∈ {1, 2}. Then (cf. the
notation in the proof of Theorem 8.6.1) hλi = h1ti hx∗i , and so
(E)
EW hθ(t1 ), x∗1 ihθ(t2 ), x∗2 i = hλ1 hλ2 H 1 (H) = (t1 ∧ t2 ) hx∗1 , hx∗2 H .
§ 8.6 Brownian Motion on a Banach Space 363
Starting from this, it is an easy matter to check that the span of {hθ(t), x∗ i :
(t, x∗ ) ∈ [0, ∞) × E ∗ } is a Gaussian family in L2 (W (E) ; R) that satisfies (8.6.5).
To prove the converse, begin by observing that, because G(B) is a Gaussian
family satisfying (8.6.5), the distribution of ω ∈ Ω 7−→ B( · , ω) ∈ C [0, ∞); E
under P is the same as that of θ ∈ Θ(E) 7−→ θ( · ) ∈ C [0, ∞); E under W (E) .
Hence
kB(t)kE (E) kθ(t)kE
P lim =0 =W lim = 0 = 1,
t→∞ t t→∞ t
Theorem 8.6.7 (Strassen). Given θ ∈ Θ(E), define θ̃n (t) = θ(nt) Λn for n ≥ 1
q
and t ∈ [0, ∞), where Λn = 2n log(2) (n ∨ 3). Then, for W (E) -almost every θ,
the sequence {θ̃n : n ≥ 0} is relatively compact in Θ(E) and BH 1 (H) (0, 1) is its
set of limit points. Equivalently, for W (E) -almost every θ,
Not surprisingly, the proof differs only slightly from that of Theorem 8.4.4.
In proving the W (E) -almost sure convergence of {θ̃n : n ≥ 1} to BH 1 (H) (0, 1),
there are two new ingredients here. The first is the use of the Brownian scaling
invariance property (cf. Exercise 8.6.8), which says that the W (E) is invariant
1
under the scaling maps Sα : Θ(E) −→ Θ(E) given by Sα θ = α− 2 θ(α · ) for
α > 0 and is easily proved as a consequence of the fact that these maps are
isometric from H 1 (H) onto itself. The second new ingredient is the observation
that, for any R > 0, r ∈ (0, 1], and θ ∈ Θ(E), kθ(r · ) − BH 1 (H) (0, R)kΘ(E) ≤
kθ − BH 1 (H) (0, R)kΘ(E) . To see this, let h ∈ BH 1 (H) (0, R) be given, and check
that h(r · ) is again in BH (0, R) and that kθ(r · ) − h(r · )kΘ(E) ≤ kθ − hkΘ(E) .
364 8 Gaussian Measures on a Banach Space
Taking these into account and applying (8.4.2), one can now justify
W (E) m−1max m
θ̃n − BH 1 (H) (0, 1)
Θ(E) ≥ δ
β ≤n≤β
m !
β 2 θ(nβ −m · )
(E)
=W max
− BH 1 (H) (0, 1)
≥δ
β m−1 ≤n≤β m
Λn
Θ(E)
Λ [β m−1 ]
δ
≤ W (E) m−1max m
θ β −m n · − BH 1 (H) 0,
≥ m
m
β ≤n≤β
β2
β 2 Λ[β m−1 ]
Θ(E)
Λ[β m−1 ]
δ
≤ W (E)
θ − BH 1 (H) 0, ≥ m
m
β2
β 2 Λ[β m−1 ]
Θ(E)
m
= W (E)
β 2 Λ−1
[β m−1 ] θ − B 1
H (H) (0, 1)
Θ(E)
≥ δ
R2 [β m−1 ]
(E) m−1
= Wβ m Λ−2 kθ − BH 1 (H) (0, 1)kΘ(E) ≥ δ ≤ exp − log(2) [β ]
[β m−1 ] βm
for all β ∈ (1, 2), R < inf{khkH 1 (H) : khkΘ(E) ≥ δ}, and sufficiently large m ≥ 1.
Armed with this information, one can simply repeat the argument given at the
analogous place in the proof of Theorem 8.4.4.
The proof that, W (E) -almost surely, θ̃n approaches every h ∈ C infinitely often
also requires only minor modification. To begin, one remarks that if A ⊆ Θ(E)
is relatively compact, then
kθ(t)kE
lim sup sup = 0.
/ −1 ,T ] 1 + t
T →∞ θ∈A t∈[T
Thus, since, by the preceding, for W (E) -almost every θ, the union of {θn : n ≥ 1}
and BH 1 (H) (0, 1) is relatively compact in Θ(E), it suffices to prove that
θ̃n (t) − θ̃n (k −1 ) − h(t) − h(k −1 ) kE
lim sup = 0 W (E) -almost surely
n→∞ t∈[k−1 ,k] 1+t
for each h ∈ BH 1 (H) (0, 1) and k ≥ 2. Because, for a fixed k ≥ 2, the random
variables θ̃k2m − θ̃k2m (k −1 ) [k −1 , k], m ≥ 1, are W (E) -independent random
variables, we can use the Borel–Cantelli Lemma as in § 8.4.2 and thereby reduce
the problem to showing that, if θ̌km (t) = θ̃km (t + k −1 ) − θ̃km (k −1 ), then
∞
X
W (E) kθ̌k2m − hkΘ(E) ≤ δ = ∞
m=1
for each δ > 0, k ≥ 2, and h ∈ BH 1 (H) (0, 1). Finally, since W (E) km Λ−1 is the
k2m
W (E) distribution of θ θ̌k2m , the rest of the argument is the same as the one
given in § 8.4.2.
Exercises for § 8.6 365
t > 0. Show that I is an isometry from Θ(E) onto itself and that I H 1 (H)
is an isometry on H onto itself. Finally, use this to prove the Brownian time
inversion invariance property: I∗ W (E) = W (E) .
Exercise 8.6.9. Let H U (H) be the Hilbert space of absolutely continuous hU :
R −→ H with the property that
q
khkH U (H) = kḣU k2L2 (R;H) + 14 khU k2L2 (R;H) < ∞,
and show that, W (E) -almost surely, {θ̆n : n ≥ 1} is relatively compact in Θ(E)
and that BH 1 (H) (0, 1) is the set of its limit points.
Hint: Referring to (ii) in Exercise 8.6.8, show that it suffices to prove these
properties for the sequence {(Iθ)˘n : n ≥ 1}. Next check that
and use Theorem 8.6.7 and the fact that I is an isometry of H 1 (H) onto itself.
Chapter 9
Convergence of Measures on a Polish Space
367
368 9 Convergence of Measures on a Polish Space
dµ ∂ν
(9.1.2) kµ − νkvar = kg − f kL1 (λ;R) , where f = and g = .
dλ ∂λ
and equality holds when ϕ = sgn ◦ (g − f ). To prove the assertion that follows
(9.1.2), note that
and that the inequality is strict if and only if f g > 0 on a set of strictly positive
λ-measure or, equivalently, if and only if µ 6⊥ ν. Thus, all that remains is to
check the completeness assertion. To this end, let {µn : n ≥ 1} ⊆ M1 (E)
satisfying
lim sup kµn − µm kvar = 0
m→∞ n≥m
P∞
be given, and set λ = n=1 2−n µn . Clearly, λ is an element of M1 (E) with
respect to which each µn is absolutely continuous. Moreover, if fn = dµ dλ , then,
n
1
by (9.1.2), {fn : n ≥ 1} is a Cauchy convergent sequence in L (λ; R). Hence,
since L1 (λ; R) is complete, there is an f ∈ L1 (λ; R) to which the fn ’s converge in
L1 (λ; R). Obviously, we may choose f to be non-negative, and certainly it has
λ-integral 1. Thus, the measure µ given by dµ = f dλ is an element of M1 (E),
and, by (9.1.2), kµn − µkvar −→ 0.
As a consequence of Lemma 9.1.1, we see that the uniform topology on M1 (E)
admits a complete metric and that convergence in this topology is intimately re-
lated to L1 -convergence in the L1 -space of an appropriate element of M1 (E).
§ 9.1 Prohorov–Varadarajan Theory 369
In fact, M1 (E) looks in the uniform topology like a galaxy that is broken into
many constellations, each constellation consisting of measures that are all abso-
lutely continuous with respect to some fixed measure. In particular, there will
usually be too many constellations for M1 (E) in the uniform topology to be
separable. To wit, if E is uncountable and {x} ∈ B for every x ∈ E, then the
point masses δx , x ∈ E, (i.e., δx (Γ) = 1Γ (x)) form an uncountable subset of
M1 (E) and kδy − δx kvar = 2 for y 6= x. Hence, in this case, M1 (E) cannot be
covered by a countable collection of open k · kvar -balls of radius 1.
As I said at the beginning of this section, the uniform topology is not the only
one available. Indeed, for many purposes and, in particular, for probability the-
ory, it is too rigid a topology to be useful. For this reason, it is often convenient
to consider a more lenient topology on M1 (E). The first one that comes to mind
is the one that results from eliminating the uniformity in the uniform topology.
That is, given a µ ∈ M1 (E), define
n o
(9.1.3) S µ, δ; ϕ1 , . . . , ϕn ≡ ν ∈ M1 (E) : max hϕk , νi − hϕk , µi < δ
1≤k≤n
are all trivial. Thus, the first part will be complete once I check that (ii) =⇒
(iii), (iv) =⇒ (vi), and that (v) together with (vi) imply (vii). To see the
first of these, let F be a closed subset of E, and set
n1
ρ(x, F )
ψn (x) = 1 − for n ∈ Z+ and x ∈ E.
1 + ρ(x, F )
It is then clear that ψn ∈ Ubρ (E; R) for each n ∈ Z+ and that 1 ≥ ψn (x) & 1F (x)
as n → ∞ for each x ∈ E. Thus, The Monotone Convergence Theorem followed
by (ii) imply that
In proving that (iv) =⇒ (vi), I may and will assume that f is a non-negative,
lower semicontinuous function. For n ∈ N, define
n
∞ 4
X ` ∧ 4n 1 X
fn = 1I`,n ◦f = n 1J`,n ◦ f,
2n 2
`=0 `=0
where
` `+1 `
I`,n = , and J`,n = ,∞ .
2n 2n 2n
It is then clear that 0 ≤ fn % f and therefore that hfn , µi −→ hf, µi as n → ∞.
At the same time, by lower semicontinuity, the sets {f ∈ J`,n } are open, and so
(iv) implies
hfn , µi ≤ limhfn , µα i ≤ limhf, µα i
α α
+
for each n ∈ Z . After letting n → ∞, one sees that (iv) =⇒ (vi).
Turning to the proof that (v) & (vi) =⇒ (vii), suppose that f ∈ B(E; R) is
continuous at µ-almost every x ∈ E, and define
and so I have now completed the proof that conditions (i) through (vii) are
equivalent.
Now assume that E is separable, and let ρ̂ be a totally bounded metric for E.
By (iii) of Lemma 9.1.4, Ubρ̂ (E; R) is separable. Hence, we can find a countable
set {ϕn : n ≥ 1} that is dense in Ubρ̂ (E; R). In particular, by the equivalence of
(i) and (ii) above, we see that hϕn , µα i −→ hϕn , µi for all n ∈ Z+ if and only if
+
µα =⇒ µ, which is to say that the corresponding map H : M1 (E) −→ [0, 1]Z is
+
a homeomorphism. Since [0, 1]Z is a compact metric space and D (cf. the proof
of (ii) in Lemma 9.1.4) is a metric for it, we also see that the R described is a
totally bounded metric for M1 (E). In particular, M1 (E) is separable. Finally,
since, by (ii) in Lemma 9.1.4, it is always possible to find a totally bounded
metric for E, the last assertion needs no further comment.
The reader would do well to pay close attention to what (iii) and (iv) say
about the nature of weak convergence. Namely, even though µα =⇒ µ, it is
possible that some or all of the mass that the µα ’s assign to the interior of a
set may gravitate to the boundary in the limit. This phenomenon is most easily
understood by taking E = R, µα to be the unit point mass δα at α ∈ [0, 1),
checking that δα =⇒ δ1 , and noting that δ1 (0, 1) = 0 < 1 = δα (0, 1) for each
α ∈ [0, 1).
Remark 9.1.6. Those who find nets distasteful will be pleased to learn that,
from now on, I will be restricting my attention to separable metric spaces E and
therefore need only discuss sequential convergence when working with the weak
topology on M1 (E). Furthermore, unless the contrary is explicitly stated, I will
always be thinking of the weak topology when working with M1 (E).
Given a separable metric space E, I next want to find conditions that guarantee
that a subset of M1 (E) is compact; and at this point it will be convenient to
have introduced the notation K ⊂⊂ E to indicate that K is a compact subset
of E. The key to my analysis is the following extension of the sort of Riesz
Representation result in Theorem 3.1.1 combined with a crucial observation
made by S. Ulam.1
1 It is no accident that Ulam was the first to make this observation. Indeed, the term Polish
space was coined by Bourbaki in recognition of the contribution made to this subject by the
Polish school in general and C. Kuratowski in particular (cf. Kuratowski’s Topologie, Vol. I,
Warszawa–Lwow (1933)). Ulam had studied with Kuratowski.
§ 9.1 Prohorov–Varadarajan Theory 375
such that
Conversely, if E is a Polish space and µ ∈ M1 (E), then for every > 0 there is a
K ⊂⊂ E such that µ(K) ≥ 1 − . In particular, if µ ∈ M1 (E) and Λ(ϕ) = hϕ, µi
for ϕ ∈ Cb (E; R), then, for each > 0, (9.1.8) holds for some K ⊂⊂ E.
Proof: I begin with the trivial observation that, because Λ is non-negative and
Λ(1) = 1, Λ(ϕ) ≤ kϕku . Next, according to the Daniell theory of integration,
the first statement will be proved as soon as we know that Λ(ϕn ) & 0 whenever
{ϕn : n ≥ 1} is a non-increasing sequence of functions from Ubρ E; [0, ∞) that
`n
!
[
µ Bk,n ≥1− .
2n
k=1
Hence, if
`n
[ ∞
\
Cn ≡ B k,n and K = Cn ,
k=1 n=1
then µ(K) ≥ 1 − . At the same time, it is obvious that, on the one hand,
K is closed (and therefore ρ-complete) and that, on the other hand, K ⊆
S`n 2
k=1 B pk , n for every n ∈ Z+ . Hence, K is both complete and totally
bounded with respect to ρ and, as such, is compact.
As Lemma 9.1.7 makes clear, probability measures on a Polish space like to
be nearly concentrated on a compact set. Following Prohorov and Varadarajan,2
2 See Yu. V. Prohorov’s article “Convergence of random processes and limit theorems in prob-
ability theory,” Theory of Prob. & Appl., which appeared in 1956. Independently, V.S.
Varadarajan developed essentially the same theory in “Weak convergence of measures on a
separable metric spaces,” Sankhyǎ, which was published in 1958. Although Prohorov got into
print first, subsequent expositions, including this one, rely heavily on Varadarajan.
376 9 Convergence of Measures on a Polish Space
what we are about to see is that, for a Polish space E, relatively compact subsets
of M1 (E) are those whose elements are nearly concentrated on the same compact
set of E. More precisely, given a separable metric space E, say that M ⊆ M1 (E)
is tight if, for every > 0, there exists a K ⊂⊂ E such that µ(K) ≥ 1 − for
all µ ∈ M .
Theorem 9.1.9. Let E be a separable metric space and M ⊆ M1 (E). Then
M is compact if M is tight. Conversely, when E is Polish, M is tight if M is
compact.3
Proof: Since it is clear, from (iii) in Theorem 9.1.5, that M is tight if and only
if M is, I will assume throughout that M is closed in M1 (E).
To prove the first statement, take
ρ̂ to be a totally bounded metric on E,
ρ̂
choose {ϕn : n ≥ 1} ⊆ Ub E; [0, 1] accordingly, as in the last part of Theorem
9.1.5, and let ϕ0 = 1. Given a sequence {µ` : ` ≥ 1} ⊆ M1(E), we can use a
standard diagonalization procedure to extract a subsequence µ`k : k ≥ 1 such
that
Λ(ϕn ) ≡ lim hϕn , µ`k i
k→∞
exists for each n ∈ N. Since Λ(ϕ) ≡ limk→∞ hϕ, µ`k i continues to exist for
every ϕ in the uniform closure of the span of {ϕn : n ≥ 1}, we now see that
Λ determines a non-negative linear functional on Ubρ̂ (E; R) and that Λ(1) = 1.
Moreover, because M is tight, we can find, for any > 0, a K ⊂⊂ E such that
µ(K) ≥ 1 − for every µ ∈ M , and therefore (9.1.8) holds with this choice
of K. Hence, by Lemma 9.1.7, we know that there is a µ ∈ M1 (E) for which
Λ(ϕ) = hϕ, µi, ϕ ∈ Ubρ̂ (E; R). Because this means that hϕ, µ`k i −→ hϕ, µi for
every ϕ ∈ Ubρ̂ (E; R), the equivalence of (i) and (ii) in Theorem 9.1.5 allows us
to conclude that µ`k =⇒ µ.
Finally, suppose that E is Polish and that M is compact in M1 (E). To see
that M must be tight, repeat the argument used to prove the second part of
Lemma 9.1.7. Thus, choose Bk,n for k, n ∈ Z+ as in the proof there, and set
`
!
[
f`,n (µ) = µ Bk,n for `, n ∈ Z+ .
k=1
By (iv) in Theorem 9.1.5, µ ∈ M1 (E) 7−→ f`,n (µ) ∈ [0, 1] is lower semicontinu-
ous. Moreover, for each n ∈ Z+ , f`,n % 1 as ` % ∞. Thus, by Dini’s Lemma,
we can choose, for each n ∈ Z+ , one `n ∈ Z+ so that f`n ,n (µ) ≥ 1 − 2n for all
3 For the reader who wishes to investigate just how far these results can be pushed before
they start of break down, a good place to start is Appendix III in P. Billingsley’s Convergence
of Probability Measures, Wiley (1968). In particular, although it is reasonably clear that
completeness is more or less essential for the necessity, the havoc that results from dropping
separability may come as a surprise.
§ 9.1 Prohorov–Varadarajan Theory 377
µ ∈ M ; and at this point the rest of the argument is precisely the same as the
one given at the end of the proof of Lemma 9.1.7.
§ 9.1.3. The Lévy Metric and Completeness of M1 (E). We have now seen
that M1 (E) inherits properties from E. To be more specific, if E is a metric
space, then M1 (E) is separable or compact if E itself is. What I want to show
next is that completeness also gets transferred. That is, I will show that M1 (E)
is Polish if E is. In order to do this, I will need a lemma that is of considerable
importance in its own right.
Lemma 9.1.10. Let E be a Polish space and Φ a bounded subset of Cb (E; R)
that is equicontinuous at each x ∈ E. (That is, for each x ∈ E, supϕ∈Φ |ϕ(y) −
ϕ(x)| = 0 as y → x.) If {µn : n ≥ 1} ∪ {µ} ⊆ M1 (E) and µn =⇒ µ, then
lim sup hϕ, µn i − hϕ, µi = 0.
n→∞ ϕ∈Φ
Proof: Let > 0 be given, and use the second part of Theorem 9.1.9 to choose
K ⊂⊂ E so that
sup kϕku sup µn K{ < .
ϕ∈Φ n∈Z+ 4
By (iv) of Theorem 9.1.5, µ K{ satisfies the same estimate. Next, choose a
metric ρ for E and a countable dense set {pk : k ≥ 1} in K. Using equicontinuity
together with compactness, find ` ∈ Z+ and δ1 , . . . , δ` > 0 so that K ⊆ x :
ρ(x, pk ) < δk for some 1 ≤ k ≤ ` and
sup ϕ(x) − ϕ(pk ) < for 1 ≤ k ≤ ` and x ∈ K with ρ(x, pk ) < 2δk .
ϕ∈Φ 4
Because r ∈ (0, ∞) 7−→ µ y ∈ K : ρ(y, x) ≤ r ∈ [0, 1] is non-decreasing
for each x ∈ K, we can find,
for each 1 ≤k ≤ `, an rk ∈ δk , 2δk such that
µ(∂Bk ) = 0 when Bk ≡ x ∈ K : ρ x, pk < rk . Finally, set A1 = B1 and
Sk S`
Ak+1 = Bk+1 \ j=1 Bj for 1 ≤ k < `. Then, K ⊆ k=1 Ak , the Ak ’s are
disjoint, and, for each 1 ≤ k ≤ `,
sup sup ϕ(x) − ϕ pk < and µ ∂Ak = 0.
ϕ∈Φ x∈Ak 4
`
X
lim sup hϕ, µn i − hϕ, µi < + lim sup ϕ pk µn Ak − µ Ak = .
n→∞ ϕ∈Φ n→∞ ϕ∈Φ
k=1
378 9 Convergence of Measures on a Polish Space
where F (δ) denotes the set of x ∈ E that lie a ρ-distance less than δ from F .
Then L is a complete metric for M1 (E), and therefore M1 (E) is Polish.
Proof: It is clear that L is symmetric and that it satisfies the triangle in-
equality. Thus,
we will know that it is a metric for M1 (E) as soon as we show
that L µn , µ −→ 0 if and only if µn =⇒ µ. To this end, first suppose that
L µn , µ −→ 0. Then, for every closed F , µ F (δ) + δ ≥ limn→∞ µn (F ) for all
δ > 0; and therefore, by countable additivity, µ(F ) ≥ limn→∞ µn (F ) for every
closed F . Hence, by the equivalence of (i) and (iii) in Theorem 9.1.5, µn =⇒ µ.
Now suppose that µn =⇒ µ, and let δ > 0 be given. Given a closed F in E,
define
ρ x, F (δ) {
ψF (x) = for x ∈ E.
ρ x, F (δ) { + ρ(x, F )
It is then an easy matter to check that both
ρ(x, y)
1F ≤ ψF ≤ 1F (δ) and ψF (x) − ψF (y) ≤ .
δ
In particular, by Lemma 9.1.10, we can choose m ∈ Z+ so that
n o
sup sup hψF , µn i − hψF , µi : F closed in E < δ,
n≥m
In other words, supn≥m L µn , µ ≤ δ, and, since δ > 0 was arbitrary, we have
shown that L µn , µ −→ 0.
In order to finish the proof, I must show that if {µn : n ≥ 1} ⊆ M1 (E) is
L-Cauchy convergent, then it is tight. Thus, let > 0 be given, and choose, for
each ` ∈ Z+ , an m` ∈ Z+ and a K` ⊂⊂ E so that
sup L µn , µm` ≤ `+1 and max µn K` { ≤ `+1 .
n≥m` 2 1≤n≤m ` 2
( )
Setting ` = 2`
one then has that supn∈Z+ µn K` ` { ≤ ` for each ` ∈ Z+ .
,
T∞ ( )
In particular, if K ≡ `=1 K` ` , then µn (K) ≥ 1 − for all n ∈ Z+ . Finally,
§ 9.1 Prohorov–Varadarajan Theory 379
not depend on the choice of metric. To complete the first part, suppose that
ρ(Xn , X) −→ 0 in P-measure. Then, for every ϕ ∈ Ubρ (E; R) and δ > 0,
lim EP ϕ Xn − EP ϕ X) ≤ lim EP ϕ Xn − ϕ(X)
n→∞ n→∞
≤ (δ) + kϕku lim P ρ Xn , X ≥ δ = (δ),
n→∞
where
(δ) ≡ sup |ϕ(y) − ϕ(x)| : ρ(x, y) ≤ δ −→ 0 as δ & 0.
Hence, since the same is true when the roles of X and Y are reversed, the
asserted estimate for L X∗ P, Y∗ P) holds.
As a demonstration of the sort of use to which one can put these ideas, I
present the following version of the Principle of Accompanying Laws.
Theorem 9.1.13. Let E be a Polish space and, for each k ∈ Z+ , let {Yk,n :
n ≥ 1} be a sequence of E-valued random variables on the probability space
(Ω, F, P). Further, assume that, for each k ∈ Z+ , there is a µk ∈ M1 (E) such
∗
that Yk,n P =⇒ µk as n → ∞. Finally, let ρ be a complete metric for E, and
suppose that {Xn : n ≥ 1} is a sequence of E-valued random variables on
(Ω, F, P) with the property that
(9.1.14) lim lim P ρ Xn , Yk,n ≥ δ = 0 for every δ > 0.
k→∞ n→∞
and so
lim L µ, (Xn )∗ P ≤ L(µ, µk ) + lim L (Yk,n )∗ P, (Xn )∗ P .
n→∞ n→∞
Thus, after letting k → ∞ and applying (*), one concludes that (Xn )∗ P =⇒
µ.
Exercises for § 9.1
Exercise 9.1.15. Let (E, B) be a measurable space with the property that
{x} ∈ B for all x ∈ E. In this exercise, we will investigate the strong topology
in a little more detail. In particular, in part (iv), we will show that when
µ ∈ M1 (E) is non-atomic (i.e., µ {x} = 0 for every x ∈ E), then there is no
countable neighborhood basis of µ in the strong topology. Obviously, this means
that the strong topology for M1 (E) admits no metric whenever M1 (E) contains
a non-atomic element.
(i) Show that, in general,
kν − µkvar = 2 max ν(A) − µ(A) : A ∈ B
and that in the case when E is a metric space, B its Borel field, and ρ a metric
for E,
n
1 X ϕ
lim Xm (x) = hϕ, µ
n→∞ n
m=1
(ii) We have seen that M1 (E) is compact if E is. To see that the converse is
also true, show that x ∈ E 7−→ δx ∈ M1 (E) is a homeomorphism whose image
is closed.
(iii) Although it is a little off our track, it is amusing to show that E being
compact is equivalent to Cb (E; R) being separable; and, in view of (i) in Lemma
9.1.4, this comes down to checking that E is compact if Cb (E; R) is separable.
Hint: Let ρ̂ be a totally bounded metric on E, and use Ê to denote the ρ̂-
completion of E. Show that if {xn : n ≥ 1} ⊆ E has the properties that
xn −→ x̂ ∈ Ê and limn→∞ ϕ(xn ) exists for every ϕ ∈ Cb (E; R), then x̂ ∈ E.
1
(Suppose not, set ψ(x) = ρ̂(x,x̂) , and consider functions of the form f ◦ ψ for
f ∈ Cb (R; R).) Finally, assuming that Cb (E; R) is separable, and, using a
diagonalization procedure, show that every sequence {xn : n ≥ 1} ⊆ E admits a
subsequence {xnm : m ≥ 1} that converges to some x̂ ∈ Ê and limm→∞ ϕ xnm
exists for every ϕ ∈ Cb (E; R).
Show that there is a unique µ ∈ M1 (E) such that µ[1,`] = (π` )∗ µ for every
` ∈ Z+ .
384 9 Convergence of Measures on a Polish Space
n≤`
xn if
Φ` x1 , . . . , x` =
n en otherwise.
Show that (Φ` )∗ µ[1,`] : ` ∈ Z+ ∈ M1 (E) is tight and that any limit must be
the desired µ.
The conclusion drawn in (iii) is the renowned Kolmogorov Extension (or
Consistency) Theorem. Notice that, at least for Polish spaces, it represents
a vast generalization of the result obtained in Exercise 1.1.14.
Exercise 9.1.18. In this exercise we will use the theory of weak convergence
to develop variations on The Strong Law of Large Numbers (cf. Theorem 1.4.9).
Thus, let E be a Polish space, (Ω, F, P ) a probability space, and {Xn : n ≥ 1}
a sequence of mutually independent E-valued random variables on (Ω, F, P )
with common distribution µ ∈ M1 (E). Next, define the empirical distribution
function
n
1 X
ω ∈ Ω 7−→ Ln (ω) ≡ δX (ω) ∈ M1 (E),
n m=1 m
n
1 X
n ∈ Z+ and ω ∈ Ω.
ϕ, Ln (ω) = ϕ Xm (ω) ,
n m=1
which is The Strong Law of Large Numbers for the empirical distribu-
tion.
Now show that (9.1.19) provides another (cf. Exercises 6.1.16 and 6.2.18) proof
of the Strong Law of Large Numbers for Banach space–valued random variables.
Thus, let EPbe a real, separable, Banach space with dual space E ∗ , and set
n
S n (ω) = n1 1 Xm (ω) for n ∈ Z+ and ω ∈ Ω.
(i) As a preliminary step, begin with the case when
(*) µ BE (0, R){ = 0 for some R ∈ (0, ∞).
Choose η ∈ Cb R; R so that η(t) = t for t ∈ [−R, R] and η(t) = 0 when |t| ≥
R + 1, and define ψx∗ ∈ Cb (E; R) for x∗ ∈ E ∗ by ψx∗ (x) = η hx, x∗ i , x ∈ E,
Exercises for § 9.1 385
Z
PΣ
P A∩B = ω (B) P(dω) for all A ∈ Σ and B ∈ F.
A
In particular, for each (−∞, ∞]-valued random variable X that is bounded be-
Σ
low, ω ∈ Ω 7−→ EPω [X] is a conditional expectation value of X given Σ. Finally,
if Σ is countably generated, then there is a P-null set N ∈ Σ with the property
that PΣω (A) = 1A (ω) for all ω ∈
/ N and A ∈ Σ.
Proof: To prove the uniqueness, suppose ω ∈ Ω 7−→ QΣ ω ∈ M1 (Ω) were a
second such mapping. We would then know that, for each B ∈ F, QΣ ω (B) =
PΣ
ω (B) for P-almost every ω ∈ Ω. Hence, since F (as the Borel field over a
second countable topological space) is countably generated, we could find one
Σ-measurable P-null set off of which QΣ Σ
ω = Pω . Similarly, to prove the final
assertion when Σ is countably generated, note (cf. (5.1.7)) that, for each A ∈
Σ, PΣ ω (A) = 1A (ω) = δω (A) for P-almost every ω ∈ Ω. Thus, once again
countability allows us to choose one Σ-measurable P-null set N such that PΣ ω
Σ = δω Σ if ω ∈ / N.
I turn next to the question of existence. For this purpose, first choose (cf. (ii)
of Lemma 9.1.4) ρ to be a totally bounded metric for Ω, and let U = Ubρ (Ω; R) be
the space of bounded, ρ-uniformly continuous, R-valued functions on Ω. Then
(cf. (iii) of Lemma 9.1.4) U is a separable Banach space with respect to the
uniform norm. In particular, we can choose a sequence {fn : n ≥ 0} ⊆ U so
that f0 = 1, the functions f0 , . . . , fn are linearly independent for each n ∈ Z+ ,
and the linear span S of {fn : n ≥ 0} is dense in U. Set g0 = 1, and, for each
n ∈ Z+ , let gn be some fixed representative of EP [fn | Σ]. Next, set
R = α ∈ RN : ∃m ∈ N αn = 0 for all n ≥ m
4The beautiful argument that I have just outlined is due to Ranga Rao. See his 1963 article
“The law of large numbers for D[0, 1]-valued random variables,” Theory of Prob. & Appl.
VIII #1, where he shows that this method applies even outside the separable context.
§ 9.2 Regular Conditional Probability Distributions 387
and define
∞
X ∞
X
fα = αn fn and gα = αn gn
n=0 n=0
Clearly, Λω (1) = 1 for all ω ∈ Ω. On the other hand, we cannot say that Λω
is always non-negative as a linear functional on S. In fact, the best we can
do is extract a Σ-measurable P-null set N so that Λω is a non-negative linear
functional on S whenever ω ∈ / N . To this end, let Q denote the rational reals
and set
Q+ = α ∈ R ∩ QN : fα ≥ 0 .
for ω ∈
/N
Λω (f )
g(ω) = P
E [f ] for ω∈N
m ρ(ω, Kn )
ηm,n (ω) = for m, n ∈ Z+ .
1 + m ρ(ω, Kn )
388 9 Convergence of Measures on a Polish Space
Clearly, ηm,n ∈ U for each pair (m, n) and 0 ≤ ηm,n % 1Kn { as m → ∞ for each
n ∈ Z+ . Thus, by The Monotone Convergence Theorem, for each n ∈ Z+ ,
Z Z
sup Λω ηm,n P(dω) = lim Λω ηm,n P(dω)
N { m∈Z+ m→∞ N{
1
= lim EP ηm,n ≤ n ;
m→∞ 2
and so, by the Borel–Cantelli Lemma, we can find a Σ-measurable P-null set
N 0 ⊇ N such that
/ N 0.
M (ω) ≡ sup n sup Λω ηm,n < ∞ for every ω ∈
n∈Z+ m∈Z+
In the older literature, the result in Theorem 9.2.2 would be called a fibering
of µ. The name derives from the idea that µ on E1 × E2 can be decomposed into
its “vertical component” µ2 and its “restrictions” µ(x2 , · ) to “horizontal fibers”
E1 × {x2 }. Alternatively, Theorem 9.2.2 can be interpreted as saying that any
µ ∈ M1 (E1 × E2 ) can be decomposed into its marginal distribution on E2 and
a transition probability x2 ∈ E2 7−→ µ(x2 , · ) ∈ M1 (E1 ). The two extreme cases
are when the coordinates are independent, in which case µ(x2 , · ) is independent
of x2 , and the case when the coordinates are equal, in which case µ(x2 , · ) = δx2 .
As an application of Theorem 9.2.2, I present the following important special
case of a more general result that indicates just how remarkably fungible non-
atomic measures are.
Corollary 9.2.3. Let λ[0,1) denote Lebesgue measure on [0, 1). For each
N ∈ Z+ and µ ∈ M1 (RN ), there is a Borel measurable map f : [0, 1) −→ RN
such that µ = f∗ λ[0,1) .
Proof: I will work by induction on N ∈ Z+ . When N = 1, take
f (u) = inf t ∈ R : µ (−∞, t] ≥ u , u ∈ [0, 1).
for (u1 , u2 ) ∈ [0, 1)2 , then g is Borel measurable on [0, 1)2 and µ = g∗ λ2[0,1) .
Finally, by Lemma 1.1.6 or part (ii) of Exercise 1.1.11, we know
that there is a
Borel measurable map u ∈ [0, 1) 7−→ U(u) = U1 (u), U2 (u) ∈ [0, 1)2 such that
U∗ λ[0,1) = λ2[0,1) , and so we can take f (u) = g ◦ U.
§ 9.2.2. Representing Lévy Measures via the Itô Map. There is another
way of thinking about the construction of the Poisson jump processes, one that
is based on Corollary 9.2.3 and the transformation property described in Lemma
4.2.12. The advantage of this approach is that it provides a method of coupling
Lévy processes corresponding to different Lévy measures. Indeed, it is this cou-
pling procedure that underlies K. Itô’s construction of Markov processes modeled
on Lévy processes.1
Let M0 (dy) = |y|−N −1 dy, which is the Lévy measure for a (cf. Corollary 3.3.9)
symmetric 1-stable law. My first goal is to show that every M ∈ M∞ (RN ) can
be realized as (cf. the notation in Lemma 4.2.6) M0F for some Borel measurable
F : RN −→ RN satisfying F (0) = 0.2
Theorem 9.2.4. For each M ∈ M∞ (RN ) there exists a Borel measurable map
F : RN −→ RN such that F (0) = 0 and
M (Γ) = M0F ≡ M0 F −1 (Γ \ {0}) , Γ ∈ BRN .
Proof: I begin with the case when N = 1. Given M ∈ M∞ (R), define ρ(r, ±1)
for r > 0 by
ρ(r, 1) = sup ρ ∈ [0, ∞) : M [ρ, ∞) ≥ r−1
where I have taken the supremum over the empty set to be 0. Applying Exercise
9.2.6 with ν(dr)= r−2 λ(0,∞) (dr), one sees that M = M0F when F (0) = 0 and
y
F (y) = ρ |y|, |y| for y ∈ R \ {0}.
Now assume that N ≥ 2, and let M ∈ M∞ (RN ). If M = 0, simply take
F ≡ 0. If M 6= 0, choose a non-decreasing function h : (0, ∞) −→ (0, ∞) so that
Z
h |y| M (dy) = 1,
and so
Z Z Z !
−2
ϕ(y) M (dy) = ωN −1 ϕ ρ(r, η)η r dr µ2 (dη)
RN SN −1 (0,∞)
Z Z !
−2
= ωN −1 ϕ ρ(r, η)f (t) r dr λ[0,1) (dt).
[0,1) (0,∞)
We can now prove the following theorem, which is the simplest example of
Itô’s procedure.
Theorem 9.2.5. Let {j0 (t, · ) : t ≥ 0} be a Poisson jump process associated
with M0 . Then, for each M ∈ M∞ (RN ), there is a Borel measurable map
F : RN −→ RN with F (0) = 0 and a Poisson jump process {j(t, · ) : t ≥ 0}
associated with M such that j(t, · ) = j0F (t, · ), t ≥ 0, P-almost surely.
Proof: Choose F as in Theorem 9.2.4 so that M = M0F . For R > 0, set
FR (y) = 1[R,∞) (y)F (y). By Lemma 4.2.12, we know that {j0FR (t, · ) : t ≥ 0} is
a Poisson jump process associated with M FR . In particular, for each r > 0,
EP j0F t, RN \ B(0, r) = lim EP j0FR t, RN \ B(0, r) = M RN \ B(0, r) < ∞.
R&0
Hence, there exists a P-null set N such that t j0F (t, · , ω) is a jump function
F
for all ω ∈/ N . Finally, if j(t, · , ω) = j0 (t, · , ω) when ω ∈ / N and j(t, · , ω) = 0
for ω ∈ N , then {j(t, · ) : t ≥ 0} is a jump process associated with M and
j(t, · ) = j0F (t, · ), t ≥ 0, for P-almost every ω ∈ Ω.
392 9 Convergence of Measures on a Polish Space
lim EP Φ ◦ Sn = hΦ, W (N ) i.
n→∞
Proving this result comes down to showing that {µn : n ≥ 1} is tight and
that every limit point is W (N ) . The second of these is a rather elementary
application of the Central Limit Theorem, and, at least when the Xn ’s have
uniformly bounded fourth moments, the first is an application of Kolmogorov’s
Continuity Criterion. Finally, to remove the fourth moment assumption, I will
use the Principle of Accompanying Laws. It should be noticed that, at no point
in the proof, do I make use of the a priori existence of Wiener measure. Thus,
Theorem 9.3.1 provides another derivation of its existence, a derivation that
includes an an extremely ubiquitous approximation procedure.
1
where τk = tk − tk−1 , 1 ≤ k ≤ `. To this end, for 1 ≤ k ≤ ` and n > τk , set
bntk c
1
X
∆n (k) = n− 2 Xj ,
j=bntk−1 c+1
394 9 Convergence of Measures on a Polish Space
where, as usual, I use the notation btc to denote the integer part of t. Noting
that
Sn tk − Sn tk−1 − ∆n (k)
bntk c bntk−1 c
≤ Sn tk − Sn
+ Sn tk−1 − Sn
n n
Xbnt c+1 + Xbnt c+1
k k−1
≤ 1 ,
n2
one sees that, for any > 0,
`
! `
!
X 2 X 2 n2
Sn tk − Sn tk−1 − ∆n (k) ≥ 2
≤P Xbntk c+1 ≥
P
4
k=1 k=0
`
4 X P h 2 i 4(` + 1)N
≤ E X bnt c+1
= −→ 0
n2 k
n2
k=0
Moreover, since
∆n (1), . . . , ∆n (`) ∗ P = ∆n (1) ∗ P × · · · × ∆n (`) ∗ P
for all sufficiently large n’s, this reduces to checking ∆n (k) ∗ P =⇒ γ0,τk I for
each 1 ≤ k ≤ `. Finally, given 1 ≤ k ≤ `, set Mn (k) = bntk c − bntk−1 c, and use
Theorem 2.3.8 to see that, as n → ∞,
√
Mn (k)
|ξ|2
P −1 X
E exp 1 ξ, Xbntk c+j RN −→ exp −
Mn (k) 2 j=1 2
Mn (k)
uniformly for ξ in compact subsets of RN . Hence, since n −→ τk , we now
see that, for any fixed ξ ∈ RN ,
√
h i
τk |ξ|2
P
E exp −1 ξ, ∆n (k) RN −→ exp − = γ\
0,τk I (ξ),
2
and therefore ∆n (k) ∗ P =⇒ γ0,τk I .
§ 9.3 Donsker’s Invariance Principle 395
iscompact for any α > 0 and {R` : ` ≥ 1} ⊆ [0, ∞). Thus, since µn ψ(0) =
0 = 1, all that we have to do is show that, for each T > 0,
P |Sn (t) − Sn (s)|
sup E sup 1 < ∞,
n≥1 1≤s<t≤T (t − s) 8
and, by Theorem 4.3.2, this would follow if we knew that
sup EP |Sn (t) − Sn (s)|4 ≤ C(t − s)2 , s, t ∈ [0, ∞),
(*)
n≥1
h 4 i
+ 27EP Sn nk − Sn (s)
4
4 X`−k 4
2 ` 27 P 2 k
≤ 27M n t − + 2 E Xk+j + 27M n
−s
n n j=1 n
81N 2 M (` − k)2
≤ 54M (t − s)2 + ≤ 135N 2 M (t − s)2 ,
n2
where, in the passage to the final line, I have taken {ei : 1 ≤ i ≤ N } to be an
orthonormal basis for RN and used the estimate
4 2 2
X`−k N `−k
X X
EP Xk+j = EP ei , Xk+j RN
j=1 i=1 j=1
4
N
X `−k
X
ei , Xk+j RN ≤ 3N 2 M (` − k)2
≤N EP
i=1 j=1
396 9 Convergence of Measures on a Polish Space
and, for every n ∈ Z+ , the random variable Xn,δ ≡ fn,δ ◦ Xn has mean value 0
and covariance I. Next, for each δ > 0, define the maps ω ∈Ω 7−→ Sn,δ ( · , ω) ∈
C(RN ) relative to {Xn,δ : n ≥ 1}, and set µn,δ = Sn,δ ∗ P. Then, by the
preceding, we know that µn,δ =⇒ W (N ) for each δ > 0. Hence, by Theorem
9.1.13, we will have proved that µn =⇒ W (N ) as soon as we show that
lim sup P sup Sn (t) − Sn,δ (t) ≥ = 0
δ&0 n∈Z+ 0≤t≤T
for every T ∈ Z+ and > 0. To this end, first observe that, because Sn ( · ) and
Sn,δ ( · ) are linear on each interval [(m − 1)2−n , m2−n ],
m
1 X
sup Sn (t) − Sn,δ (t) = max Yk,δ ,
1
t∈[0,T ] 1≤m≤nT n k=1
2
for every e ∈ SN −1 .
§ 9.3.2. Rayleigh’s Random Flights Model. Here is a more picturesque
scheme for approximating Brownian motion. Imagine the path t R(t) of a
bird that starts at the origin, flies in a randomly chosen direction at unit speed
§ 9.3 Donsker’s Invariance Principle 397
for a unit exponential random time, then switches to a new randomly chosen
direction for a second unit exponential time, etc. Next, given > 0, rescale time
1
and space so that the path becomes t R (t), where R (t) ≡ 2 R(−1 t). I
will show that, as & 0, the distribution of {R (t) : t ≥ 0} becomes Brownian
motion. This model was introduced by Rayleigh and is called his random flights
model.
In the following, {τm : m ≥ 1} is a sequence of mutually independent, unit
exponential random variables from which their partial sums {Tn : n ≥ 0} and
the associated simple Poisson process {N (t) : t ≥ 0} are defined as in § 4.2.1.
Finally, given > 0, N (t) = N (−1 t).
Lemma 9.3.3. Let {Xn : n ≥ 1} a sequence of mutually independent RN -
valued, uniformly square P-integrable random variables with mean value 0 and
covariance I, and define {Sn (t) : t ≥ 0} accordingly, as in Theorem 9.3.1. (Note
that the Xn ’s are not assumed to be independent of the τn ’s.) Next, define
N (t,ω)
√ X
X (t, ω) = Xm , (t, ω) ∈ [0, ∞) × Ω.
m=1
But, by Theorem 9.3.1 and the converse statement in Theorem 9.1.9, we know
that the first term tends to 0 as & 0 uniformly in δ ∈ (0, 1] and that the third
398 9 Convergence of Measures on a Polish Space
term tends to 0 as δ & 0 uniformly in ∈ (0, 1]. Thus, all that remains is to
note that, by Exercise 4.2.19,
!
(9.3.4) lim P sup N (t) − t ≥ δ = 0.
&0 t∈[0,T ]
Hence, by Lemma 9.3.3 and Theorems 9.3.1 and 9.1.13, all that we have to do
is check that !
√
lim P sup XN (t)+1 ≥ r = 0
&0 t∈[0,T ]
for every r ∈ (0, ∞) and T ∈ [0, ∞). To this end, set T = 1+T . Then, by
(9.3.4), we have that
!
√
r
lim P sup XN (t)+1 ≥ r = lim P max |Xn+1 | ≥ √
&0 t∈[0,T ] &0 0≤n≤T
14
√ 1
P X 4 M (2 + T ) 4
≤ lim E |Xn+1 | ≤ lim = 0.
&0 r &0 r
0≤n≤T
Exercise for § 9.3 399
Show that there is a µ ∈ M1 C(RN ) to which {µn : n ≥ 1} converges in
M1 C(RN ) if and only if, for each T ∈ (0, ∞), there is a µT ∈ M1 C([0, T ]; RN )
with the property that
µTn =⇒ µT in M1 C([0, T ]; RN ) ,
Prove their result as an application of Donsker’s Theorem and part (iii) of Ex-
ercise 4.3.11. According to Kac, it was G. Uhlenbeck who first suggested that
their result might be a consequence of a more general “invariance” principle.
Exercise 9.3.8. Here is another version
of Rayleigh’s random
flights model.
Again let {τk : k ≥ 1}, Tm : m ≥ 0 , and N (t) : t ≥ 0 be as in § 4.2.2, and
set Z t
√
(−1)N (s) ds and R (t) = R t .
R(t) =
0
Show that R ∗ P =⇒ W (1) as & 0.
Hint: Set βk = 0 or 1 according to whether k ∈ N is even or odd, and note that
n
X n
X X
(−1)k τk =
βk τk+1 − τk − βn τn = τ2k − τ2k−1 − βn τn+1 .
k=1 k=1 1≤k≤ n
2
In this chapter I will give a somewhat sketchy survey of the bridge between
Brownian motion and partial differential equations. Like all good bridges, it
is valuable when crossed starting at either end. For those starting from the
probability side, it provides a computational tool with which the evaluation of
many otherwise intractable Wiener integrals is reduced to finding the solution to
a partial differential equation. For aficionados of partial differential equations,
it provides a representation of solutions that often reveals properties that are
not at all apparent in more conventional, purely analytic, representations.
§ 10.1 Martingales and Partial Differential Equations
The origin of all the connections between Brownian motion and partial differen-
tial equations is the observation that the Gauss kernel
N |x|2
(10.1.1) g (N ) (t, x) = (2πt)− 2 e− 2t , (t, x) ∈ (0, ∞) × RN ,
is simultaneously the density for the Gaussian distribution γ0,tI and the solution
to the heat equation ∂t u = 12 ∆u in (0, ∞) × R with initial condition δ0 . More
precisely, if ϕ ∈ Cb (RN ; R), then
Z
uϕ (t, x) = g (N ) (t, y − x)ϕ(y) dy
RN
is the one and only bounded u ∈ C 1,2 (0, ∞) × RN ; R that solves the Cauchy
initial value problem
400
§ 10.1 Martingales and Partial Differential Equations 401
Brownian motion, for each T > 0, u(T −t∧T, x+B(t∧T )Ft , P is a martingale.
Thus,
Z
u(T, x) = EP ϕ B(T ) = ϕ(x + y) γ0,tI (dy) = uϕ (T, x).
RN
In Theorem 10.1.2, I will prove a refinement of Theorem 7.1.6 that will enable
me (cf. the discussion following Corollary 10.1.3) to remove the assumption that
the derivatives of u are bounded.
As the preceding line of reasoning indicates, the advantage that probability
theory provides comes from lifting questions about a partial differential equa-
tion to a pathspace setting, and martingales provide one of the most powerful
machines with which to do the requisite lifting. In this section I will refine and
exploit that machine.
§ 10.1.1. Localizing and Extending Martingale Representations. The
purpose of this subsection is to combine Theorems 7.1.6 and 7.1.17 with Corollary
7.1.15 to obtain a quite general method for representing solutions to partial
differential equations as Wiener integrals.
For the purposes of this chapter, it is best to think of Wiener measure
W (N )
N N
as a Borel measure on the Polish space C(R ) ≡ C [0, ∞); R and to take
{Ft : t ≥ 0} with Ft = σ {ψ(τ ) : τ ∈ [0, t]} as the standard choice of a
non-decreasing family of σ-algebras. The reason for using C(RN ) instead of (cf.
(N )
§ 8.1.3) Θ(RN ) is that we will want to consider the translates Wx of W (N ) by
(N )
x ∈ RN . That is, Wx is the distribution of ψ x + ψ under W (N ) . Since it
(N )
is clear that the map x ∈ R 7−→ Wx ∈ M1 C(RN ) is continuous, there is
N
Z t∧ζsG (ψ)
E (τ, ψ)f s + τ, ψ(τ ) , Ft , Wx(N )
V
−
0
402 10 Wiener Measure and P.D.E.’s
Thus, if Z t
En (t, ψ) = exp Vn τ, ψ(τ ) dτ ,
0
and therefore
Z t
En (t, ψ)Mn (t, ψ) − En (τ, ψ)Mn (τ, ψ)Vn (τ, ψ) dτ
0
Z t
= En (t, ψ)wn t, ψ(t) − En (τ, ψ)fn τ, ψ(τ ) dτ,
0
is a martingale.
Finally, define ζ0Gn for Gn in the same way as ζ0G was defined for G. Since
fn ≥ f on Gn , an application of Theorem 7.1.15 gives the desired result with
ζ0Gn in place of ζ0G , and, because ζ0Gn % ζ0G , this completes the proof.
Perhaps the most famous application of Theorem 10.1.2 is the Feynman–Kac
formula,1 a version of which is the content of the following corollary.
Corollary 10.1.3. Let V : [0, T ] × RN −→ R be a Borel measurable function
that is uniformly bounded above everywhere
and bounded below uniformly on
compacts. If u ∈ C 1,2 (0, T ) × RN ; R is bounded and satisfies the Cauchy initial
value problem
Proof: Given Theorem 10.1.2, there is hardly anything to do. Indeed, here
G = (0, T ) × RN and so ζ0G = T . Thus, by Theorem 10.1.2 applied to w(t, · ) =
u(T − t, · ), we know that
R t∧T
V (τ,ψ(τ )) dτ
e 0 u T − t ∧ T, ψ(t)
Z t∧T Rτ
V (σ,ψ(σ)) dσ
f τ, ψ(τ ) dτ, Ft , Wx(N )
− e 0
is a martingale. Hence,
Rt
W (N ) V (τ,ψ(τ )) dτ
u(T, x) = lim E e 0 u T − t, ψ(t)
t%T
Z t Rτ
V (σ,ψ(σ)) dσ
+ e 0 f τ, ψ(τ ) dτ ,
0
1In the same spirit as he wrote down (8.1.4), Feynman expressed solutions to Schrödinger’s
equation in terms of path-integrals. After hearing Feynman lecture on his method, Kac realized
that one could transfer Feynman’s ideas from the Schrödinger to the heat context and thereby
arrive at a mathematically rigorous but far less exciting theory.
404 10 Wiener Measure and P.D.E.’s
(N )
h R ζn (ψ) i
Wx V (−τ,ψ(τ )) dτ
w(0, x) ≥ E e 0 w ζn (ψ), ψ(ζn ) ,
where ζn (ψ) = inf{t ≥ 0 : t, ψ(t) ∈ fn } ∧ n. Moreover, by (10.1.5), for
/ G
(N )
Wx -almost every ψ, −ζn (ψ), ψ(ζn ) tends to a point in {(t, x) ∈ ∂G : t < 0}
as n → ∞, and therefore
everywhere on G.
Proof: Again, without loss in generality, I assume that s = 0. In addition,I
may and will assume that x = 0, V is uniformly bounded, and u ∈ Cb G; [0, ∞) .
To see that these latter assumptions cause no loss in generality, one can use an
exhaustion argument of the same sort as was used in the proof of Theorem 10.1.2.
N
Given (t, ψ) ∈ (0, ∞)×C(R
) with ψ(0) = 0 and −τ, ψ(τ ) ∈ G for τ ∈ [0, t],
suppose that u −t, ψ(t) > 0. In order to get a contradiction, choose r > 0 so
that u(−t, y) ≥ r if |y − ψ(t)| ≤ r and so that −τ, ψ 0 (τ ) ∈ G if τ ∈ [0, t] and
≥ re−tkV ku k W (N ) {ψ 0 : kψ 0 − ψk[0,t] ≤ r} .
required contradiction.
Turning to the final assertion, take G = R × G, and observe that for all
(x, y) ∈ G2 there is a ψ such that ψ(0) = x, ψ(1) = y, and ψ(τ ) ∈ G for all
τ ∈ [0, 1].
At first glance, one might think that the strong minimum principle overshad-
ows the weak minimum principle and makes it obsolete. However, that is not
entirely true. Specifically, before one can apply the strong minimum principle,
one has to know that a minimum is actually achieved. In many situations,
continuity plus compactness provide the necessary existence. However, when
compactness is absent, special considerations have to be brought to bear. The
weak minimum principle does not suffer from this problem. On the other hand,
it suffers from a related problem. Namely, one has to know ahead of time that
(10.1.5) holds. As we will see below, this is usually not too serious a problem,
but it should be kept in mind.
§ 10.1.3. The Hermite Heat Equation. In the preceding subsection I gave
an example of how probability theory can give information about solutions to
partial differential equations. In this subsection, it will be a differential equation
that gives us information about probability theory. To be precise, I, following M.
Kac, will give in this subsection his derivation of the formulas that we derived
406 10 Wiener Measure and P.D.E.’s
by purely Gaussian techniques in Exercise 8.2.16, and in the next section I will
give his treatment of a closely related problem.2
Closed form solutions to the Cauchy initial value problem are available for
very few V ’s, but there is a famous one for which they are. Namely, when
V = − 12 |x|2 , a great deal is known. Indeed, already in the nineteenth century,
Hermite knew how to analyze the operator 12 ∆− 12 |x|2 . As a result, this operator
is often called the Hermite operator by mathematicians, although physicists
call it the harmonic oscillator because it arises in quantum mechanics as minus
the Hamiltonian for an oscillator that satisfies Hook’s law. Be that as it may,
set (cf. (10.1.1))
1 − e−2t
N t+|x|2 |y|2
− (N ) −t
(10.1.7) h(t, x, y) = e 2 g ,y − e x e 2
2
for (t, x, y) ∈ (0, ∞) × RN × RN . By using the fact that g (N ) solves the heat
equation and tends to δ0 as t & 0, one can apply elementary calculus to check
that
|x|2
Rt
(N )
Wx − 12 |ψ(τ )|2 dτ − N2
E e 0 = cosh t exp − tanh t ,
2
which, together with Brownian scaling, vastly generalizes the result in Exercise
8.2.16.
2See Kac’s “On some connections between probability theory and differential and integral
equations,” Proc. 2nd Berkeley Symp. on Prob. & Stat. Univ. of California Press (1951),
where he gives several additional, intriguing applications of Corollary 10.1.3.
§ 10.1 Martingales and Partial Differential Equations 407
§ 10.1.4. The Arcsine Law. As I said at the beginning of the last subsection,
there are very few V ’s for which one can write down explicit solutions to equa-
tions of the form ∂t u = 12 ∆u + V u. On the other hand, when V is independent
of time one can often, particularly whenRN = 1, write down a closed form ex-
∞
pression for the Laplace transform Uλ = 0 e−λt u(t, · ) dt of u. Indeed, if u is a
bounded solution to ∂t u = 12 ∆u + V u, then it is an elementary exercise to check
that
λ − 12 ∆ − V Uλ = f,
The preceding remark is the origin of Kac’s derivation of Lévy’s Arcsine Law
for Wiener measure.
Theorem 10.1.8. For every T ∈ (0, ∞) and α ∈ [0, 1],
( )!
Z T √
(1) 1 2
W ψ ∈ C(R) : 1[0,∞) ψ(t) dt ≤ α = arcsin α .
T 0 π
Proof: First note that, by Brownian scaling, it suffices to prove the result when
T = 1. Next, set
Z 1
F (α) = W ψ ∈ C(R) : 1[0,∞) ψ(s) ds ≤ α , α ∈ [0, ∞),
0
and let µ denote the element of M1 [0, ∞) for which F is the distribution
function. We are going to compute F (α) by looking at the double Laplace
transform Z
G(λ) ≡ e−λt g(t) dt, λ ∈ (0, ∞),
(0,∞)
where Z
g(t) ≡ e−tα µ(dα), t ∈ (0, ∞);
[0,∞)
408 10 Wiener Measure and P.D.E.’s
At this point, the strategy is to calculate G(λ) with the help of the idea
explained above. For this purpose, I begin by seeking as good a solution x ∈
R 7−→ uλ (x) ∈ R as I can find to the equation 12 u00 + Vλ u = −1. By considering
this equation separately on the left and right half-lines and then matching, in so
far as possible, at 0, one finds that the best choice of bounded uλ will be to take
h p i
1
Aλ exp − 2(1 + λ) x + 1+λ
if x ∈ [0, ∞)
uλ (x) = h√ i
Bλ exp 2λ x + 1
if x ∈ (−∞, 0),
λ
where
12 12
1 1 1 1
Aλ = − and Bλ = − .
λ(1 + λ) 1+λ λ(1 + λ) λ
1 00
fn ≡ uλ,n − λ + 1[0,∞) uλ,n −→ −1 on R \ {0}.
2
Thus, since the argument that I attempted to apply to uλ works for uλ,n , we
know that
Z ∞ R t
W (1) Vλ (ψ(τ )) dτ
uλ,n (0) = E e 0 fn ψ(t) dτ dt .
0
§ 10.1 Martingales and Partial Differential Equations 409
In addition, because
Z ∞ Z ∞
W (1)
E 1{0} ψ(t) dt = γ0,t {0} dt = 0,
0 0
Z ∞ Rt
W (1) Vλ (ψ(τ )) dτ
E e 0 fn ψ(t) dt −→ G(λ).
0
Hence, the conclusion uλ (0) = G(λ) has now been rigorously verified.
− 1
Knowing that G(λ) = λ(1−λ) 2 , the rest of the calculation is easy. Indeed,
since Z ∞ r
− 12 −λt π
t e dt = ,
0 λ
the multiplication rule for Laplace transforms tells us that
t 1
e−s e−tα
Z Z
1 1
g(t) = p ds = p dα;
π 0 s(t − s) π 0 α(1 − α)
1 α∧1
Z
1 2 √
F (α) = p dβ = arcsin α ∧ 1 .
π 0 β(1 − β) π
√
Nn (ω) 2
lim P ω: ≤α = arcsin α ,
n→∞ n π
Pm
where Nn (ω) is the number of m ∈ Z+ ∩ [0, n] for which Sm (ω) ≡ `=1 X` (ω)
is non-negative.
Nn (ω)
Proof: Thinking of n as a Riemann approximation to (cf. the notation in
§ 9.2.1)
Z 1
1[0,∞) Sn (t, ω) dt,
0
one should guess that, in view of Theorem 9.3.1 and Theorem 9.1.13, there
should be very little left to be done. However, once again there are continuity
410 10 Wiener Measure and P.D.E.’s
issues that have to be dealt with. Thus, for each f ∈ C R; [0, 1] and n ∈ Z+ ,
introduce the functions F f and Fnf on C(R) given by
Z 1 n
1 X
F f (ψ) = and Fnf (ψ) = m
f ψ(t) dt f ψ n
0 n m=1
for any f ∈ C R; [0, 1] . Since Fnf −→ F f uniformly on compacts, Theorem
9.3.1 plus Lemma 9.1.10 show that the distribution of
n
1 X Sm (ω)
ω ∈ Ω 7−→ Afn (ω) ≡ f 1
n m=1 n2
under P tends weakly to that of ψ ∈ C(R) 7−→ F f (ψ) under W (1) . Next, for
each δ ∈ (0, ∞), choose continuous functions fδ± so that 1(δ,∞) ≤ fδ+ ≤ 1[0,∞)
and 1[0,∞) ≤ fδ− ≤ 1[−δ,∞) , and conclude that
Nn
(1)
+
fδ
lim P ≤α ≤W F ≤α
n→∞ n
and
Nn
(1)
−
fδ
lim P <α ≥W F <α
n→∞ n
and
Z 1
Nn (1)
lim P <α ≥W ψ: 1[0,∞) ψ(t) dt < α .
n→∞ n 0
Finally, since
Z Z 1 Z 1
(1)
W (1) ψ(t) = 0 dt = 0,
1{0} ψ(t) dt W (dψ) =
0 0
√
and α ∈ [0, 1] 7−→ arcsin α is continuous, the asserted result follows.
§ 10.1 Martingales and Partial Differential Equations 411
Remark 10.1.10. The renown of the Arcsine Law stems, in large part, from the
following counterintuitive deduction that can be drawn from it. Namely, given
δ ∈ 0, 12 , guess which α maximizes limn→∞ P Nnn ∈ (α − δ, α + δ) mod1 for
a fixed δ. Because of The Law of Large Numbers (in more common parlance,
“The Law of Averages”), most people are inclined to guess that the maximum
should occur at α = 12 . Thus, it is surprising that, since
1
α ∈ [0, 1] 7−→ p ∈ [0, ∞]
α(1 − α)
is convex and has its minimum at 12 , the Arcsine Law makes the exact opposite
prediction! The point is, of course, that the sequence of partial sums {Sn (ω) :
n ≥ 1} is most likely to make long excursions above and below 0 but tends to
spend relatively little time in a neighborhood of 0. In other words, although
one may be correct to feel that “my luck has got to change,” one had better be
prepared to wait a long time.
A more technical point is one raised by S. Sternberg. The arcsine distribution
is familiar to people who study iterated maps and is important to them because
(cf. Exercise 10.1.15) it is the one and only absolutely continuous probability
distribution on [0, 1] that is invariant under x ∈ [0, 1] 7−→ 4x(1 − x) ∈ [0, 1].
Sternberg asked whether a derivation R 1 of Theorem 10.1.8 can be
R 1based on this
invariance property. Taking T+ = 0 1[0,∞) ψ(s) ds and S = 0 sgn ψ(s) ds,
and noting that 4T+ (1 − T+ ) = 1 − S 2 , one way to phrase Sternberg’s question
is to ask is whether there is a pure thought way to check that T+ and 1 − S 2
have the same distribution under W (1) and that that distribution is absolutely
continuous. I have posed this problem to several experts but, as yet, none of
them has come up with a satisfactory solution.
ψ ∈ C(RN ).
ζr (ψ) = inf t ∈ [0, ∞) : |ψ(t)| = r ,
Then
(N ) r2 − |x|2
EWx ζr =
N
for |x| < r.
(N ) (N + 4)r2 − N |x|2 2
EWx ζr2 = r − |x|2
2
N (N + 2)
412 10 Wiener Measure and P.D.E.’s
In particular,
and
N −2
r
Wx(N )
ζr < ∞ = , 0 < r < |x|, when N ≥ 3.
|x|
Proof: To prove the first two equalities, set f (t, x) = |x|2 − N t, use Theorem
10.1.2 to show that
f t ∧ ζr , ψ(t ∧ ζr ) , Ft , Wx(N )
and
!
2
Z t∧ζr
f t ∧ ζr , ψ(t ∧ ζr ) − 4 |ψ(s)|2 ds, Ft , Wx(N )
0
and
"Z #
(N ) (N )
t∧ζr
Wx
2
(t ∧ ζr )2 =|x|4 + 4EWx 2
N E |ψ(s)| ds
0
(N )
h 2 i h
(N ) 4 i
+ 2N EWx (t ∧ ζr ) ψ(t ∧ ζr ) − EWx ψ(t ∧ ζr )
§ 10.1 Martingales and Partial Differential Equations 413
(N )
for all t ∈ [0, ∞). Now assume that |x| ≤ r, and use the first of these N EWx [ζr ] ≤
(N ) (N )
r2 . Thus Wx (ζr < ∞) = 1, and so N EWx [ζr ] = r2 −|x|2 follows when t → ∞.
To get the second equality, use Theorem 10.1.2 to show that
Z t∧ζr !
4 2
ψ(s) ds, Ft , Wx(N )
ψ(t ∧ ζr ) − (4 + 2N )
0
as required. Given this, the rest of the theorem follows easily when one lets
R % ∞ and, in the case when N = {1, 2}, r & 0.
The second part of Theorem 10.1.11 says something significant about the
global behavior of Brownian paths and the dependence of that behavior on
dimension. Namely, when N ∈ {1, 2}, it says that, no matter where it is started,
a Brownian path will hit any non-empty open set with probability 1. As will be
shown in Theorem 10.2.3, this property implies the seemly stronger statement
that, with probability 1, a Brownian path will visit every non-empty open set
infinitely often and will spend infinite time in each. For this reason, Brownian
motion in one and two dimensions is said to be recurrent. By contrast, when
N ≥ 3, Theorem 10.1.11 says that, with positive probability, a Brownian path
will never visit a closed ball in which it was not started. Moreover, if it is started
outside of a ball, then the probability of its ever hitting that ball goes to 0 as
the diameter of the set goes to 0. As I am about to show, this latter property
leads to the conclusion that, with probability 1, a Brownian path in three or
more dimensions tends to infinity.
414 10 Wiener Measure and P.D.E.’s
Proof: Given r > 0, apply Theorem 10.1.2 to see that (cf. the notation in
Theorem 10.1.11)
ψ(t ∧ ζr )−N +2 , Ft , Wx(N )
is a bounded, non-negative martingale for every |x| > r > 0. Hence, by Theorem
7.1.14, for any 0 ≤ s ≤ t < ∞ and A ∈ Fs ,
h i
ψ(s)−N +2 , A ∩ ζr (ψ) > s
(N )
|x|−N +2 ≥ EWx
h
(N ) −N +2 i
= EWx ψ t ∧ ζr
, A ∩ ζr (ψ) > s ;
(N )
and, because N ≥ 3 and therefore ζr % ∞ a.s., Wx as r & 0, an application
of the Monotone Convergence Theorem and Fatou’s Lemma leads to
h i h i
ψ(s)−N +2 , A ≥ EWx ψ(t)−N +2 , A
(N ) (N )
|x|−N +2 ≥ EWx
(iii) Now add the assumption that µ λ[0,1] , let f be the corresponding Radon–
Nikodym derivative, and extend f to R by taking f = 0 off of [0, 1]. Given
0 ≤ x < x + y ≤ 1, conclude that
Z
F (x + y) − F (x) − F (y) ≤ f t + x2−n − f (t) dt −→ 0
R
as n → ∞. In other words, F (x + y) = F (x) + F (y) whenever 0 ≤ x <
x + y ≤ 1. Finally, after combining this with the facts that F (0) = 0, F (1) = 1,
and F is continuous, conclude that F (x) = x for x ∈ [0, 1]. In view of part
(i), this completes the proof that the arcsine distribution admits the asserted
characterization.
(vi) To see that absolute continuity is absolutely essential in the preceding con-
+
siderations, consider any Borel probability measure M on {0, 1}Z that is sta-
tionary in the sense that the M -distribution of
+ +
ω ∈ {0, 1}Z 7−→ (ω2 , . . . , ωn+1 , . . . ) ∈ {0, 1}Z
is again M . Show that the M -distribution µ of
∞
+ X
ω ∈ {0, 1}Z 7−→ 2−n ωn ∈ [0, 1]
n=1
is invariant under x 2x mod 1. In particular, this means that, for each
p ∈ (0, 1) \ { 12 }, the µp described in Exercise 1.4.29 is a non-atomic, Borel
probability measure on [0, 1] that is invariant under x 2x mod 1 but singular
to Lebesgue measure.
§ 10.2 The Markov Property and Potential Theory
In this section I will discuss the Markov property for Wiener measure and show
how it can be used as a tool for connecting Brownian motion to partial differential
equations.
§ 10.2.1. The Markov Property for Wiener Measure. The introduction
(N )
of the translates Wx ’s facilitates the statement of the following important in-
terpretation of Theorem 7.1.16. In its statement, and elsewhere, Σt : C(RN ) −→
C(RN ) is the time-shift map determined by Σt ψ(τ ) = ψ(t + τ ), τ ∈ [0, ∞),
and when ζ is a stopping time, Σζ is the map on {ψ : ζ(ψ) < ∞} −→ C(RN )
given by Σζ ψ(τ ) = ψ ζ(ψ) + τ .
Theorem 10.2.1. If ζ is a stopping time and F : C(RN ) × C(RN ) −→ [0, ∞)
is a Fζ × FC(RN ) -measurable function, then
Z
F ψ, Σζ ψ Wx(N ) (dψ)
{ψ:ζ(ψ)<∞}
(10.2.2) Z Z !
0 (N )
= F (ψ, ψ ) Wψ(ζ) (dψ 0 ) Wx(N ) (dψ).
C(RN )
{ψ:ζ(ψ)<∞}
§ 10.2 The Markov Property and Potential Theory 417
Proof: Given Theorem 7.1.16, the proof is mostly a matter of notation. In the
first place, by replacing F (ψ, ψ 0 ) with F (x + ψ, ψ 0 ), one can reduce to the case
when x = 0. Thus, I will assume that x = 0. Secondly, Σζ ψ = ψ(ζ) + δζ ψ if
ζ(ψ) < ∞. Hence,
Z Z
(N )
F ψ, ψ(ζ) + δζ ψ W (N ) (dψ).
F ψ, Σζ ψ W (dψ) =
{ψ:ζ(ψ)<∞} {ψ:ζ(ψ)<∞}
Fζ × BC(RN ) -measurable, and apply Theorem 7.1.16 to reach the desired conclu-
sion.
Theorem 10.2.1 is a statement of the Markov property for Wiener measure.
More precisely, because it involves stopping times, and not just fixed times, it is
often called the strong Markov property.
§ 10.2.2. Recurrence in One and Two Dimensions. As my first application
of the Markov property, I will prove the statement made following Theorem
10.1.11 about the recurrence of Brownian motion when N ∈ {1, 2}.
Theorem 10.2.3. If N ∈ {1, 2}, then, for all x ∈ RN ,
Z ∞
Wx(N ) 1B(c,r) ψ(t) dt = ∞ for all c ∈ RN and r ∈ (0, ∞) = 1.
0
Theorem 10.2.1,
(N )
Wx(N ) ζ 2n+1 ≤ t Fζ 2n = Wψ(ζ 2n ) ζ B(0,r) ≤ t if ζ2n (ψ) < ∞,
(**) (N )
Wx(N ) ζ 2n ≤ t Fζ 2n−1 = Wψ(ζ 2n−1 ) ζ r2 ≤ t
if ζ2n−1 (ψ) < ∞.
418 10 Wiener Measure and P.D.E.’s
In particular, because N ∈ {1, 2}, Theorem 10.1.11 says that both ζ B(0,r) and
(N )
ζ r2 are Wy -almost surely finite for all y ∈ RN . Thus, by induction, ζ n < ∞
(N )
Wx -almost surely for all n ≥ 0.
Next set
2n+1
ζ (ψ) − ζ 2n (ψ) if ζ 2n (ψ) < ∞
Xn (ψ) ≡
0 if ζ 2n (ψ) = ∞.
(N )
By the preceding, we know that, for each n ≥ 0, Xn > 0 Wx -almost surely.
In addition, it is obvious that
Z ∞ ∞
X
1B(0,r) ψ(t) dt ≥ Xn (ψ).
0 n=0
Hence, if we show that the Xn ’s are mutually independent and identically dis-
(N )
tributed under Wx , then (*) will follow from The Strong Law of Large Num-
bers. But, by (**), we will know that the Xn ’s have both these properties once
(N )
we show that Wy (ζ B(0,r) ≤ t) is the same for all y ∈ RN with |y| = 2r . To
this end, let yi , i ∈ {1, 2} with |yi | = 2r be given, and choose an orthogonal
(N )
transformation O of RN so that y2 = Oy1 . Then, Wy2 is the distribution of
(N ) (N ) (N )
ψ Oψ under Wy1 , and so Wy2 (ζ B(0,r) ≤ t) = Wy1 (ζ B(0,r) ≤ t).
§ 10.2.3. The Dirichlet Problem. There are many ways in which the Markov
property can be used to relate Brownian motion to partial differential equations,
but among the most compelling is the one that was discovered by S. Kakutani
and developed by Doob.1 What Kakutani discovered is that the capacitory
potential (cf. § 11.4.1) of a set K ⊆ R2 at a point x ∈ R2 \ K is equal to the
probability that a Brownian motion started at x ever hits K. What Doob did
is extend Kakutani’s result to RN and show that it is a very special case of a
result that identifies the distribution of the place where a Brownian motion hits
the boundary of a set as the harmonic measure (cf. § 11.1.4) for that set. In this
subsection, I will give a brief introduction to these ideas. A much more thorough
account is given in Chapter 11.
Let G be a non-empty, connected open subset of RN . Given an f ∈ Cb (G; R),
one says that u ∈ C 2 (G; R) solves the Dirichlet problem for f in G if u is
1 Kakutani’s 1944 article, “Two dimensional Brownian motion and harmonic functions,” Proc.
Imp. Acad. Tokyo, 20, together with his 1949 article, “Markoff process and the Dirichlet prob-
lem,” Proc. Imp. Acad. Tokyo, 21, are generally accepted as the first place in which a definitive
connection between harmonic functions and Brownian motion was established. However, it
was not until with Doob’s “Semimartingales and subharmonic functions,” T.A.M.S., 77, in
1954 that the connection was completed. It is ironic that this connection was not made by
Wiener himself. Indeed, Wiener’s early fame as an analyst was based on his contributions to
potential theory. However, in spite of his claims to the contrary, I know of no evidence that
he discovered the connection between his measure and potential theory.
§ 10.2 The Markov Property and Potential Theory 419
Wx(N ) ζ G < ∞ = 1,
(10.2.5)
then
(N )
u(x) = EWx u ψ(ζ G ) , ζ G (ψ) < ∞ .
(10.2.6)
(N )
Hence, u(x) = EWx u ψ(t ∧ ζ G ) , and so, after letting t → ∞ and taking
Hence, the proof of (10.2.7) reduces to the observation that the distribution of
(N )
ψ ∈ {ζ B(x,r) < ∞} 7−→ ψ(ζ B(x,r) ) ∈ ∂B(x, r) under Wx is same as that of
ψ ∈ {ζ B(0,r) < ∞} 7−→ x+ψ(ζ B(0,r) ) under W (N ) and that (cf. Exercise 4.3.10)
the distribution of ψ ∈ {ζ B(0,r) < ∞} 7−→ ψ(ζ B(0,r) ) under W (N ) is rotation
invariant.
Turning to the converse assertion, suppose that u : G −→ R is a locally
bounded, Borel measurable function for which (10.2.7) holds. To see that u ∈
C ∞ (G; R), extend u to RN so that it is 0 off of G, and choose a ρ ∈ Cc∞ R; [0, ∞)
with support in (0, 1) and total integral 1. Using (10.2.7) together with Fubini’s
Theorem, one sees that, as long as B(x, r) ⊂⊂ G,
ZZ 1
u(x) = ρ(t) − u(x + trω) λS N −1 (dω) dt
0 N −1
Z S
1
|y − x|1−N ρ r−1 |y − x| u(y) dy,
=
ωN −1 r RN
from which it is clear that u ∈ C ∞ (G; R). Further, knowing that u is smooth
and satisfies (10.2.7), it is easy to see that it is harmonic. Indeed, by Taylor’s
Theorem, we know that
Z Z
− u(x + rω) λSN −1 (dω) − u(x) = − u(x + rω) − u(x) λSN −1 (dω)
SN −1 SN −1
r2
= 2 ∆u(x) + o(r2 ),
and, when 1 ≤ i 6= j ≤ N ,
Z Z
ei , ω RN
λSN −1 (dω) = ei , ω RN
ej , ω RN
λSN −1 (dω) = 0.
SN −1 SN −1
Hence, after dividing through by r2 and letting r & 0, we see that (10.2.7)
implies ∆u(x) = 0.
§ 10.2 The Markov Property and Potential Theory 421
Thus, if (10.2.5) holds for all x ∈ G and we are going to solve the Dirichlet
problem for f , then we have no choice but to show that the u given by (10.2.8)
is a solution. Furthermore, because of the last part of Theorem 10.2.4, we already
know that this u is harmonic in G. Thus, all that remains is to find conditions
under which the u in (10.2.8) will take the correct boundary values.
It should be reasonably clear, and will be verified shortly (cf. Theorem 10.2.14),
that if f is continuous at a ∈ ∂G and if
(10.2.10) ζsG = s + ζ G ◦ Σs G
and ζ0+ G
≥ s =⇒ ζ0+ = ζsG .
Lemma 10.2.11. Regularity is a local property in the sense that, for each
r ∈ (0, ∞), a ∈ ∂reg G if and only if a ∈ ∂reg G ∩ B(a, r) . Furthermore,
and so ∂reg G is Borel measurable. Finally, if a ∈ ∂reg G, then, for each δ > 0,
(N ) G G
(10.2.13) lim
x→a
W x ζ , ψ(ζ ) ∈ (0, δ) × B(a, δ) = 1.
x∈G
Proof: Set G(a, r) = G ∩ BRN (a, r). Since it is obvious that ζ G(a,r) is domi-
nated by ζ G , there is no question that a ∈ ∂reg G =⇒ a ∈ ∂reg G(a, r). On the
other hand, if a ∈ ∂reg G(a, r) and > 0, then, for all 0 < δ < ,
(N ) G
is upper semicontinuous for all δ ≥ 0. In particular, if Wa ζ0+ > 0 = 0, then,
because ζ G (ψ) = ζ0+
G
(ψ) when ψ(0) ∈ G, it follows that
for every δ > 0. To prove the converse, suppose that a ∈ ∂reg G, let positive
and δ be given, and choose r > 0 so that
Then, by the second part of (10.2.10), the Markov property, and (4.3.13), for
each s ∈ (0, δ) one has
(N )
h i
(N )
Wa(N ) ζ0+
G
≥ 2δ ≤ EWa Wψ(s) ζ G ≥ δ , ψ(s) ∈ G
r2
/ B(a, r) ≤ + 2N e− 2N s ,
≤ + Wa(N ) ψ(s) ∈
§ 10.2 The Markov Property and Potential Theory 423
(N ) G
from which Wa ζ0+ > 0 = 0 follows when first s & 0 and then & 0.
Now, assume that a ∈ ∂reg G, and observe that, for each 0 < < δ,
Wx(N ) ψ ζ G ∈ / B(a, δ) or ζ G ≥ δ
!
≤ Wx(N ) ζ G ≥ + Wx(N ) sup |ψ(t) − a| ≥ δ .
t∈[0,]
δ2
lim Wx(N ) G G
x→a
ψ ζ ∈
/ B(x, δ) or ζ ≥ δ ≤ 2N exp − ,
x∈G
2N
At the same time, define L(f ) to be the set of v : G −→ R such that −v ∈ U(−f ).
Finally, given a ∈ ∂G, say that a admits
a barrier if, for some r > 0, there
exists an η ∈ C 2 G ∩ B(a, r); (0, ∞) such that
lim
x→a
η(x) = 0 and ∆η ≤ − for some > 0.
x∈G∩B(a,r)
424 10 Wiener Measure and P.D.E.’s
and that if Hf (x) denotes this common value, then x Hf (x) is a bounded
harmonic function on G with the property that
where, in the passage to the second to last line, I have used the fact, established
earlier, that the exit place from a ball of a Brownian path started at its center
is uniformly distributed. Hence, by Fatou’s Lemma and the boundary condition
satisfied by w,
(N )
w(x) ≥ lim EWx w ψ(ζ n ) , ζ n (ψ) < ∞
n→∞
(N )
≥ EWx f ψ(ζ G ) , ζ G (ψ) < ∞ = u(x).
Thus, we have now shown that w ≥ u for all w ∈ U(f ). Of course, if v ∈ L(f ),
then, because −v ∈ U(−f ), we also know that −v ≥ −u and therefore that
v ≤ u.
(N )
I turn next to the second part of the theorem. Set m(x) = EWx [ζ G ], x ∈ G.
Clearly m is positive. Moreover, if m(x) −→ 0 as x → a through G, then a is
regular. Conversely, suppose a is regular. Since
(N ) 1 (N ) 1
m(x) ≤ δ + EWx ζ G , ζ G ≥ δ ≤ δ + Wx(N ) (ζ G ≥ δ) 2 EWx (ζ G )2 2 ,
is chosen so that G ⊆ B(c, R), then ζ G ≤ ζ B(c,R) , and, by the first part of
(N )
EWx (ζ B(c,R) )2 is bounded
Theorem 10.1.11 and translation invariance, x
on B(c, R). Hence, we now know that a ∈ ∂G is regular if and only if m(x) −→
0 as x → a through G. To complete the proof at this point, set m̃(x) =
(N )
EWx [ζ B(c,R) ], and observe that, since ζ B(c,R) = ζ G + ζ B(c,R) ◦ Σζ G when ζ G <
∞,
(N )
m̃(x) − m(x) = EWx m̃ ψ(ζ G ) , ζ G (ψ) < ∞ .
Thus m̃−m is harmonic on G and so, by the first part of Theorem 10.1.11, ∆m =
∆m̃ = −2 on G. Hence, if a is regular, then m is a barrier at a. Conversely,
suppose that a admits a barrier η ∈ C 2 G ∩ B(a, r); (0, ∞) . Because of the
locality property proved in Lemma 10.2.11, I will, without loss in generality,
assume that B(a, r) ⊇ G. Choose a sequence {Gn : n ≥ 1} of open sets so that
Ḡn ⊂⊂ Gn+1 for each n and Gn % G. Then, by Theorem 10.1.2, for x ∈ Gn
and t ≥ 0,
(N )
η(x) ≥ η(x) − EWx η ψ(t ∧ ζ Gn )
t∧ζ Gn (ψ)
"Z #
(N ) Wx(N )
= −EWx 1
t ∧ ζ Gn .
2 ∆η ψ(τ ) dτ ≥ 2 E
0
Hence, after letting first t and then n tend to infinity, we see that m(x) ≤ 2 η(x)
for all x ∈ G; and, since η(x) −→ 0 as x tends to a through G, it follows that
a ∈ ∂reg G.
426 10 Wiener Measure and P.D.E.’s
The argument used to prove the first part of Theorem 10.2.15 is a probabilistic
implementation of what analysts call the “balayage” procedure for solving the
Dirichlet problem.
Exercises for § 10.2
Exercise 10.2.16. Suppose that G is a non-empty, open subset of RM ×RN and
that (x, y) ∈ G 7−→ u(x, y) ∈ R is a Borel measurable function that is harmonic
with respect to x and y separately (i.e., u( · , y) is harmonic on {x : (x, y) ∈ G}
for each y ∈ G and u(x, · ) is harmonic on {y : (x, y) ∈ G} for each x ∈ G).
Assuming that u is bounded below on compact subsets of G, show that u is
harmonic on G.
Hint: Clearly, all that one has to show is that u is smooth on G. In addition,
without loss in generality, one can assume that u can be extended to RM × RN
as a non-negative, Borel measurable function. Making this assumption, proceed
as in the proof of Theorem 10.2.4 to show that if ρ ∈ Cc∞ (0, 1); R has total
integral 1 and BRM (x, r) × BRN (y, r) ⊂⊂ G, then u(x, y) equals
ZZ
1 1−M 1−N −1
−1
|x−ξ| |y−η| ρ r |x−ξ| ρ r |y−η| u(ξ, η) dxdη.
ωM −1 ωN −1 r2
RM ×RN
|B(a, r) ∩ G{|
lim > 0,
r&0 |B(a, r)|
where |Γ| denotes the Lebesgue measure of Γ ∈ BRN . Show that a is regular. In
particular, because, for any Borel set Γ, the set of x ∈ Γ with upper Lebesgue
density less than 1 has Lebesgue measure 0, this proves that ∂G \ ∂reg G has
Lebesgue measure 0. (See the Lemma 11.1.9 for another proof of this fact.)
Hint: Show that, for all t > 0,
ΩN e− 12 |B(a, t 12 ) ∩ G{|
Wa(N ) Wa(N )
ζ0+ ≤ t ≥ ψ(t) ∈
/G ≥ N 1 ,
(2π) 2 |B(a, t 2 )|
is contained in G{
(v) If F is a closed subset of RN , r > 0, and G = {x ∈ RN : |x − F | > r},
show that every boundary point of G satisfies the exterior cone condition and is
therefore regular.
428 10 Wiener Measure and P.D.E.’s
(N )
EPn,x f ψ(ζ G ) , ζ G (ψ) < ∞ −→ EWx f ψ(ζ G ) , ζ G (ψ) < ∞
(10.2.21)
Thus, if (10.2.5) holds for all x ∈ G and every a ∈ ∂G has positive, upper
Lebesgue density in RN \ Ḡ, then (10.2.21) holds uniformly for x in compact
subsets of G.
4This type of approximation was carried out originally by H. Phillips and N. Wiener in “Nets
and Dirichlet problem,” J. Math. Phys. 2 in 1923. Ironically, the authors do not appear to have
made the connection between their procedure and probability theory. In 1928, a more complete
analysis was carried out in the famous article “Über die partiellen Differenzengleichungen der
Phsik,” Ann. Math. 5 # 2, of R. Courant, K. Friedrichs, and H. Lewy. Interestingly, these
authors do allude to a possible probabilistic interpretation, although their method (based on
energy considerations) makes no use of probability theory.
§ 10.3 Other Heat Kernels 429
where ζ y (ψ) = inf{t ≥ 0 : ψ(t) ≥ y}. Next, recall from Exercise 7.1.24 that the
1
W (1) -distribution of ζ y is the one-sided, 12 -stable law ν 21 . Thus
22 y
Z
(N +1) N
ψ(ζ H ) ∈ Γ × {0} =
W(x,y) πyR (y − x) dy,
Γ
where Z
N 1
πyR (y) = γ0,tI (y) ν 21 (dt).
(0,∞) 22 y
N
Finally, referring to Exercise 3.3.17, conclude that πyR is the Cauchy distribution
in (3.3.19). This, of course, explains the reason, alluded to in (ii) of Exercise
N
3.3.17, why analysts call πyR the Poisson kernel for the upper half-space.
§ 10.3 Other Heat Kernels
As we saw in § 10.1, from the perspective of someone studying partial differential
equations, the function (t, x, y) ∈ (0, ∞) × RN × RN 7−→ g (N ) (t, y − x) ∈ (0, ∞)
is the heat kernel, or, equivalently, the fundamental solution, to the classical
heat equation ∂t u = 12 ∆u in (0, ∞) × RN . That is, if ϕ ∈ Cb (RN ; R), then
Z
u(t, x) = ϕ(y)g (N ) (t, y − x) dy
RN
is the unique bounded solution to the classical heat equation that tends to ϕ
as t & 0. Of course, from a probabilistic perspective, g (N ) (t, y − x) is the
probability (in the sense of densities) of a Brownian path going from x to y
during a time interval of length t.
In this section I will construct other functions that, on the one hand, are
the fundamental solution to a heat equation and, at the same time, the den-
sity for the probability of a Brownian motion making transitions under various
conditions.
§ 10.3.1. A General Construction. For each t > 0, let Et : C(RN ) −→ [0, ∞)
be a Ft -measurable function with the property that
and define
(N )
h i
q(t, x, y) = EW Et x(1 − `t ) + θt + y`t g (N ) (t, y − x),
(10.3.2)
for (t, x, y) ∈ (0, ∞) × RN × RN ,
where `t (τ ) = τ ∧t N
t , τ ∈ [0, ∞), and θt = θ − θ(t)`t , θ ∈ Θ(R ). Clearly (x, y) ∈
(RN )2 7−→ q(t, x, y) ∈ [0, ∞) is Borel measurable for each t > 0.
My goal in this subsection is to prove the following theorem.
Theorem 10.3.3. For each t ∈ (0, ∞) and Borel measurable ϕ : RN −→ R
that is bounded below,
Z h
(N ) i
ϕ(y)q(t, x, y) dy = EWx Et (ψ)ϕ ψ(t) .
RN
s t tx+sy
where α = s+t , β= s+t , and c = s+t . At the same time, by Exercise 10.3.34,
Z
g (N ) (s, αη + ξ − c)g (N ) (t, βη − ξ + c) dη = g (N ) st
s+t , ξ −c
RN
g (N ) (s, ξ − x)g (N ) (t, η − ξ)
= ,
g (N ) (s + t, y − x)
and so we are done.
To prove the last assertion, simply note that when Et is reversible, q̄(t, x, y)
equals
(N )
h i (N )
h i
EW Et x(1−`t )+θt +y`t = EW Et y(1−`t )+(θ˘t )t +y`t = q̄(t, y, x),
since, by part (ii) of Exercise 8.3.22, θ (θ˘t )t [0, t] has the same distribution
(N )
under W as θ θt [0, t].
§ 10.3.2. The Dirichlet Heat Kernel. Let G be a non-empty, open subset
of RN , and set EtG (ψ) = 1(t,∞) ζ G (ψ) . Obviously Et is Ft -measurable and
(10.3.1) holds. In addition, if pG (t, x, y) is used to denote the associated q given
in (10.3.2), then, pG (t, x, y) = 0 unless x, y ∈ G, and, by Theorem 10.3.3,
Z h i
(N )
ϕ(y)pG (t, x, y) dy = EWx ϕ ψ(t) , ζ G (ψ) > t ,
(10.3.4) G
for (t, x) ∈ (0, ∞) × G,
Z
G
(10.3.5) p (s+t, x, y) = pG (s, x, z)pG (t, z, y) dz, (s, x), (t, y) ∈ (0, ∞)×G,
G
and
(10.3.6). pG (t, x, y) = pG (t, y, x) for (t, x, y) ∈ (0, ∞) × G2 .
In order to show that pG is smooth on (0, ∞) × G2 , I will use the Duhamel
formula contained in the following.
432 10 Wiener Measure and P.D.E.’s
Further, by the same argument as was used to prove the first assertion in The-
orem 10.3.3, for any ϕ ∈ Cc (G; R),
Z h i
(N )
ϕ(y)qα (t, x, y) dy = EWx ϕ ψ(ζ G ) , ζ G (ψ) > αt
G
Z
= ϕ(y)g (N ) (y − x) dy
G
(N )
hZ i
− EWx ϕ(y)g (N ) t − ζ G (ψ), y − ψ(ζ G ) , ζ G (ψ) ≤ αt ,
G
where, in the passage to the second line, I have applied the same reasoning as was
suggested in part (i) of Exercise 7.3.7. Hence, (*) will follow once y q̄α (t, x, y)
qα (t,x,y)
≡ g(N ) (t,y−x) is shown to be continuous on G. To this end, argue as in the last
part of Theorem (10.3.1) and apply the Markov property to show that q̄α (t, x, y)
equals
and
lim sup pG (t, x, y) = 0 for (s, a) ∈ (0, ∞) × ∂reg G.
(t,x)→(s,a) y∈K
x∈G
and therefore
Z
lim sup 1 − pG (t, x, y) dy ≤ lim sup Wx(N ) (ζ G ≤ t),
t&0 x∈K G∩B(x,r) t&0 x∈K
Cn −
|x|2
max ∂xα g (N ) (t, x) ≤
(10.3.10) N +n e
4t
kαk=n (t + |x|2 ) 2
and so we now see (cf. (4.3.13)) that, for some other choice of Cn < ∞,
!
1 1 |y−x|2
(10.3.11) ∂yα pG (t, x, y) ≤ Cn e−
N +n + 16N t
when kαk = n.
Combining (10.3.11) with the symmetry of pG , we have
!
1 1 |y−x|2
(10.3.12) ∂xα pG (t, x, y) ≤ Cn e−
N +n + 16N t .
(t + |y − x|2 ) 2 |x − ∂G|(N +n)
and so, by (10.3.12) and (10.3.11), we see that (x, y) pG (t, x, y) is smooth for
each t ∈ (0, ∞).
To check the assertions about the time derivatives, first observe that for any
ϕ ∈ Cb2 (G; R) and (x, y) ∈ G2 ,
Z
1
lim pG (h, x, y)ϕ(y) dy − ϕ(x) = 12 ∆ϕ(x)
h&0 h
ZG
1 G
lim p (h, x, y)ϕ(x) dx − ϕ(y) = 12 ∆ϕ(y).
h&0 h G
§ 10.3 Other Heat Kernels 435
To see this, use the symmetry of pG to show that the second of these follows
from the first one. To prove the first one, use pG (h, x, y) ≤ g (N ) (h, y − x) and
(10.3.8) to show that, for any ϕ̃ ∈ Cc2 (RN ; R) that equals ϕ in a neighborhood
of x, Z Z
G (N )
p (h, x, y)ϕ(y) dy − ϕ(x) − g (h, y − x)ϕ̃(y) dy
G G
tends to 0 faster than any power of h. Thus, since
Z
1 (N )
g (h, y − x)ϕ̃(y) dy − ϕ(x) −→ 12 ∆ϕ(x),
h G
the assertion is proved. Given the preceding, we know that
1h G i 1 Z
G G G G
p (t + h, x, y) − p (t, x, y) = p (h, x, z)p (t, z, y) dz − p (t, x, y)
h h G
tends to 12 ∆x pG (t, x, y). Thus, ∂t pG (t, x, y) = 12 ∆x pG (t, x, y). Similarly, using
Z Z
pG (t + h, x, y) = pG (t, x, z)pG (h, z, y) dz = pG (h, y, z)pG (t, x, z) dz,
G G
G 1 G
one gets ∂t p (t, x, y) = 2 ∆y p (t, x, y). Finally, assume the result for m, use
(10.3.11) to justify
Z
−m
∂tm pG (t + h, x, y) = 2 pG (h, x, z)∆m G
y p (t, z, y) dz,
G
differentiate this with respect to h, and let h & 0 to arrive at
∂tm+1 pG (t, x, y) = 2−m−1 ∆x ∆m G
y p (t, x, y)
m+1 G
∆x p (t, x, y)
= 2−m−1
∆m G m+1 G
y ∆x p (t, x, y) = ∆y p (t, x, y).
The following result provides the justification for my calling pG the Dirichlet
heat kernel on G.
Corollary 10.3.13. For each ϕ ∈ Cb (G; R), the function
Z
(N )
Wx
G
ϕ(y)pG (t, x, y) dy
u(t, x) = E ϕ ψ(t) , ζ (ψ) > t =
G
is a smooth solution to the boundary value problem
Proof: That the u in the first part is a bounded, smooth solution follows easily
from (10.3.12) and the last part of Theorem 10.3.9. To prove the uniqueness
assertion when ∂G = ∂reg G, choose {Gn : n ≥ 1} to be a non-decreasing
S
sequence of open sets so that Gn ⊆ G and G = n≥1 Gn . Given a bounded
solution u, apply Theorem 10.1.2 to see that, for each n ≥ 1, u(t, x) equals
(N ) (N )
EWx ϕ ψ(t) , ζ Gn (ψ) > t + EWx u t − ζ Gn (ψ), ψ(ζ Gn ) , ζ Gn (ψ) ≤ t
(N ) (N )
= EWx ϕ(ψ(t) , ζ G (ψ) > t + EWx u t − ζ Gn (ψ), ψ(ζ Gn ) , ζ G (ψ) < t
(N )
+ EWx u t − ζ Gn (ψ), ψ(ζ Gn ) − ϕ ψ(t) , ζ Gn (ψ) ≤ t < ζ G (ψ) .
we see that
Z Z t
(N )
V Wx
(10.3.16) ϕ(y)q (t, x, y) dy = E exp V ψ(τ ) dτ ϕ ψ(t)
RN 0
for (t, x) ∈ (0, ∞) × RN and Borel measurable ϕ’s that are bounded below,
Z
V
(10.3.17) q (t, x, y) = q V (s, x, z)q V (t, z, y) dz
RN
§ 10.3 Other Heat Kernels 437
I now want to make an analysis of q V (t, x, · ) which, among other things, will
enable me to show (cf. Corollary 10.3.22) that, under suitable conditions on V ,
the right-hand side of (10.3.20) is necessarily a solution to (10.3.19). For this
reason, I will call q V the Feynman–Kac heat kernel with potential V .
Theorem 10.3.21. Assume that V ∈ C n (RN ; R) is bounded above and that,
for some Cn < ∞,
n
Finally, if n ≥ 2 and m ≤ 2, then
m m
∂tm q V (t, x, y) = 1
2 ∆x + V (x) q V (t, x, y) = 1
2 ∆y + V (y) q V (t, x, y).
k=1 0
(0)
× ∂xα g (N ) (t, y − x),
438 10 Wiener Measure and P.D.E.’s
Rt
V (ψt,x,y (τ )) dτ
where ψt,x,y (τ ) = x + ψt (τ ) + τt (y − x), E V (t, x, y, ψ) = e 0 , and
P` (k)
k=0 α = α. Since, by our hypotheses, each of the integrands in these terms
+
is bounded by a constant times etkV ku , the asserted estimate for ∂xα q V (t, x, y)
follows from this and (10.3.10).
The rest of the proof is similar to, but easier than, that of Theorem 10.3.9.
Specifically, one uses q V (t, x, y) = q V (t, y, x) and
Z
V
qV t
qV t
q (t, x, y) = 2 , x, z 2 , z, y dz
RN
to prove the existence of and estimate for ∂xα ∂yβ q V (t, x, y). Also, knowing these
results about the spacial derivatives, one deals with the time derivatives in the
same way as I did at the end of that theorem. The details are left to the
reader.
Corollary 10.3.22. Let V be as in Theorem 10.3.21, and assume that n ≥ 2.
Then, for each ϕ ∈ Cb (RN ; R), the function
h Rt Z
Wx
(N ) V (ψ(τ )) dτ i
u(t, x) = E e 0 ϕ t) = ϕ(y)q V (t, x, y) dy
RN
is the unique u ∈ C 1,2 (0, ∞) × RN ; R that is bounded on (0, T ) × RN for each
T > 0 and satisfies (10.3.19).
Proof: The only assertion that has not already been proved is that the u
described takes on the correct initial value. However, because q V (t, x, y) ≤
+
ekV ku g (N ) (t, y − x), it is clear that, for each r > 0,
Z
lim sup q V (t, x, y) dy = 0.
t&0 x∈RN B(x,r){
To this end, let ϕ ∈ Cc∞ (RN ; R) be given, and apply symmetry, Theorem 10.1.2,
and Fubini’s Theorem to justify
0 = ϕ, QV1 ρ − ρ L2 (RN ;R) = QV1 ϕ − ϕ, ρ L2 (RN ;R)
Z 1
= QVτ 21 ∆ϕ + V ϕ , ρ 2 N dτ
0 L (R ;R)
Z 1
1
+ V ϕ, QVτ ρ 1
= 2 ∆ϕ dτ = 2 ∆ϕ + V ϕ, ρ L2 (RN ;R) .
0 L2 (RN ;R)
440 10 Wiener Measure and P.D.E.’s
and Z
pρ (s + t, x, y) = pρ (t, z, y)pρ (t, x, z) dz.
RN
Finally, if V ∈ C 2 (RN ; R) satisfies (10.3.23), then x pρ (t, x, y) is twice con-
N
tinuously differentiable for each (t, y) ∈ (0, ∞) × R , y ∂xα pρ (t, x, y) is twice
continuously differentiable for each α with kαk ≤ 2 and (t, x) ∈ (0, ∞) × RN ,
and
∂t pρ (t, x, y) = 12 ∆x pρ (t, x, y) + ∇x (log ρ), ∇x pρ (t, x, y) RN
for all (t, x, y) ∈ (0, ∞) × RN × RN . In particular, for each ϕ ∈ Cb (RN ; R), the
function Z
u(t, x) = ϕ(y)pρ (t, x, y) dy
RN
is the one and only bounded u ∈ C 1,2 (0, ∞) × RN ; R that satisfies
where y0 = x. In fact, if
−1 V Rt
ρ −tλ
V V ((ψ(τ )) dτ
R (t, ψ) = e ρ ψ(0) E (t, ψ)ρ ψ(t) where E (t, ψ) = e 0 ,
then (N )
Pρx (A) = EWx
ρ
R (t), A for all t ≥ 0 and A ∈ Ft .
Finally, x Pρx is continuous, and, for any stopping time ζ,
Z Z Z
Pρx (dψ) 0
) Pρψ(ζ) (dψ 0 ) Pρx (dψ)
F ψ, Σζ ψ = F (ψ, ψ
{ζ(ψ)<∞} {ζ(ψ)<∞}
Z (N )
(N ) (N )
Wψ(s)
EWx Rρ (t) Wx(N ) (dψ) = EWx Rρ (s), A
ρ
Rρ (s, ψ)E
R (s+t), A =
A
for A ∈ Fs . (N )
Determine µt,x ∈ M1 C(RN ) by µt,x (dψ) = R(t, ψ)Wx (dψ). By the pre-
ceding, µt1 ,x Ft1 = µt2 ,x Ft1 for
all 0 ≤ t1 ≤ t2 , and so (cf. Exercise 9.3.6)
there is a unique Pρx ∈ M1 C(RN ) whose restriction to Ft is the same as that
of µt,x for all t ≥ 0.
To see that x Pρx is continuous, it suffices to check that
But clearly this convergence is taking place pointwise for each ψ ∈ C(RN ). In
addition, Rρ (t, · ) ≥ 0 and, for each z ∈ RN , Rρ (t, z + ψ) has W (N ) -integral 1.
Hence, the convergence is also taking place in L1 (W (N ) ; R).
442 10 Wiener Measure and P.D.E.’s
Now suppose that ζ is a stopping time and that ζ ≤ T for some T ∈ (0, ∞).
Then, for any Fζ × FT -measurable F : C(RN )2 −→ R that is bounded below,
Z
F ψ, Σζ ψ Pρx (dψ)
Z
= Rρ ζ(ψ), ψ Rρ (2T − ζ(ψ), Σζ ψ F ψ, Σζ ψ Wx(N ) (dψ)
Z Z
ρ 0 0 (N ) 0
R 2T − ζ(ψ), ψ F (ψ, ψ )Wψ(ζ) (dψ ) Wx(N ) (dψ)
ρ
= R ζ(ψ), ψ
Z Z
ρ 0 ρ 0
F (ψ, ψ ) Pψ(ζ) (dψ ) Wx(N ) (dψ)
= R ζ(ψ), ψ
Z Z
0 ρ 0
= F (ψ, ψ ) Pψ(ζ) (dψ ) Pρx (dψ),
where I have again used (10.2.2) and, in the final step, Hunt’s Theorem (cf.
Theorem 7.1.14) to replace Rρ ζ(ψ), ψ) by Rρ (T, ψ). Starting from this, one
can easily remove the condition that ζ is bounded and extend the result to all
F ’s that are Fζ × BC(RN ) -measurable and bounded below.
To complete the proof, observe that, as a special case of the preceding,
Z
ρ ρ
pρ t, ψ(s), y) dy, A
EPx ϕ ψ(s + t) , A = EPx
RN
(N )
is a martingale. Hence, since Rρ (t), Ft , Wx is a martingale, one can apply
Theorem 7.1.14 to see that
is a martingale for all ϕ ∈ Cc∞ (RN ; R), then B(t), Ft , P is a Brownian motion.
Hence
√ |ξ|2
B(0,R) B(0,R)
exp −1 ξ, B(t ∧ ζ (ψ) RN + t∧ζ (ψ) , Ft , P
2
is a martingale for every R > 0, and so, after letting R → ∞, we know, by
Theorem 7.1.7, that B(t), Ft , P is a Brownian motion.
It is important to be clear about what Lemma 10.3.28 says and what it does not
say. It says that there is a progressively measurable B : [0, ∞) × C(RN ) −→ RN
such that B(t), Ft , P is a Brownian motion and
Z t
b ψ(τ ) dτ, (t, ψ) ∈ [0, ∞) × C(RN ).
(*) ψ(t) = x + B(t, ψ) +
0
then A(b) ∈ BC(RN ) , and I can define the Borel measurable map ϕ ∈ C(RN )
7−→ Xb ( · , ϕ) ∈ C(RN ) given by
ϕ1 , QVt ϕ2 = ϕ2 , QVt ϕ1
(10.3.30) L2 (RN ;R) L2 (RN ;R)
Finally,
ZZ Z Z
V 2 V −N
q (t, x, y) dx dy = q (2t, x, x) dx ≤ (4πt) 2 e2tV (x) dx.
RN RN
RN ×RN
In the language of functional analysis, the last part of Lemma 10.3.31 says
that QVT is Hilbert–Schmidt and therefore compact if e2T V ∈ L1 (RN ; R). As
a consequence, the elementary theory of compact, self-adjoint operators allows
us to make the conclusions drawn in the following theorem.
§ 10.3 Other Heat Kernels 447
Theorem 10.3.32. Assume that eT V ∈ L2 (RN ; R) for some T ∈ (0, ∞). Then
there is a unique ρ ∈ Cb RN ; (0, ∞) ∩ L2 (RN ; R) such that
kρkL2 (RN ;R) = 1 and etλ ρ = QVt ρ for some λ ∈ R and all t ∈ (0, ∞).
which, because q V (T, x, y) > 0 for all (x, y), is possible only if α(T ) > 0 and
ρ never changes sign. Therefore we can be take ρ to be non-negative. But,
if ρ ≥ 0, then, since pρ (T, x, y) > 0 everywhere and α(T )ρ = QVT ρ, ρ > 0
everywhere. Thus, we have now shown that every normalized eigenvector for
QVT with eigenvalue α(T ) is a bounded, continuous function that, after a change
of sign, can be taken to be strictly positive. In particular, if ρ1 and ρ2 were
linearly independent, normalized eigenvectors of QVT with eigenvalue α(T ), then
which means that α(t) = etβ for some β ∈ R, and, because α(T ) = eT λ , this com-
pletes the proof of everything except the final statement, which is an immediate
consequence of Theorem 10.3.21.
If nothing else, Theorem 10.3.32 helps to explain the terminology that I have
been using. In Schrödinger mechanics, the function ρ in Theorem 10.3.32 is
called the ground state because it is the wave function corresponding to the
lowest energy level of the quantum mechanical Hamiltonian − 12 ∆ − V . From
our standpoint, its importance is that it shows that lots of V ’s admit a ground
state.
I turn now to the second method for producing ground states. Namely, sup-
pose that ρ ∈ C 2 RN ; (0, ∞) . Then, it is obvious that 12 ∆ρ + V ρ = 0, where
Theorem 10.3.33. Let U ∈ C 2 (RN ; R), and assume that both U and V U ≡
1 2
N
− 2 ∆U + |∇U | are bounded above. Then, for each x ∈ R , there is a unique
U N U
Px ∈ M1 C(R ) such that Px ψ(0) = x = 1 and
Z t
1
∆ϕ + ∇U, ∇ϕ RN ψ(τ ) dτ, Ft , PU
ϕ ψ(t) − 2 x
0
Finally, x PU
x is continuous and, for any stopping time ζ and any Fζ ×BC(RN ) -
measurable F : C(RN ) × C(RN ) that is bounded below,
Z Z Z
0 0
F (ψ, Σζ ψ) PU
x (dψ) = F (ψ, ψ ) PU
ψ(ζ) (dψ ) PU
x (dψ).
{ζ(ψ)<∞} {ζ(ψ)<∞}
Exercises for § 10.3 449
p(0,∞) (t, x, y) = g (1) (t, y − x) − g (1) (t, x + y) for (t, x, y) ∈ (0, ∞) × (0, ∞)2 ,
1 ξ2
where g (1) (τ, ξ) = (2πτ )− 2 e− 2τ . In addition, referring to Corollary 7.3.4, show
that, for c ∈ R, r > 0, and (x, y) ∈ (c − r, c + r),
p(c−r,c+r) (t, x, y) = r−1 g̃ (1) r−2 t, r−1 (y−x) −r−1 g̃ (1) r−2 t, r−1 (x+y+2−2c)) ,
1 N π2
lim log Wx(N ) (ζ Q(a,R) > t) = − for x ∈ Q(a, R).
t→∞ t 8R2
450 10 Wiener Measure and P.D.E.’s
for every T > 0. Conclude from this that limt→∞ 1t f (t) = supT >0 T1 f (T ) ∈
[−∞, 0].
(ii) Refer to the notation in Exercise 10.3.36, set R1 = sup{r ≥ 0 : Q(a, r) ⊆
π2
G for some a ∈ G}, and show that λG ≡ − limt→∞ 1t log w(t) ≤ N 8R2
. In partic-
1
ular, λG < ∞.
(ii) Let R2 be the diameter of G, choose a ∈ RN so that G ⊆ B(a, R2 ), and use
(N ) R2
the first part of Theorem 10.1.11 to show that EWx [ζ G ] ≤ N2 for all x ∈ G. In
particular, conclude that w 2N −1 R22 ≤ 12 and therefore that λG ≥ N2R log 2
2 > 0.
2
for some δ > 0. In particular, this means that λ0 here is equal to λG in Exercise
10.3.37.
Exercises for § 10.3 451
(i) Let PG G
t be the operator on Cb (G; R) whose kernel is p (t, x, y), and show
G 2
that Pt admits a unique extension to L (G; R) as a self-adjoint contraction.
Further, show that {PG t : t > 0} is a continuous semigroup of non-negative
definite, self-adjoint contractions on L2 (G; R). Finally, show that
|G|
ZZ Z
G 2
p (t, x, y) dxdy = pG (2t, x, x) dx ≤ N ,
G (4πt) 2
G×G
(ii) Knowing that the operators PGt form a continuous semigroup of self-adjoint,
Hilbert–Schmidt (and therefore compact), non-negative definite contractions,
standard spectral theory2 guarantees that there exists a non-decreasing sequence
{λn : n ≥ 0} ⊆ [0, ∞) tending to ∞ and an orthonormal basis {ϕn : n ≥ 0}
in L2 (G; R) such that e−tλn ϕn = PGt ϕn for all t ∈ (0, ∞) and n ≥ 0. Conclude
from this that ϕn can be taken to be smooth and bounded. In addition, show
that PGt ϕ0 −→ 0 uniformly, and therefore that λ0 > 0.
∞
X
0
ϕ, PG e−tλn ϕ, ϕn ϕ0 , ϕn
(*) t ϕ L2 (G;R)
= L2 (G;R) L2 (G;R)
n=0
e−λ0 = sup ϕ, PG
1 ϕ L2 (G;R) : kϕkL2 (G;R) = 1 .
Use (cf. the proof of Theorem 10.3.32) this to show that if λn = λ0 , then ϕn
never changes sign and can therefore be taken to be non-negative. In particular,
show that this means that λ1 > λ0 and that ϕ0 > 0.
(iv) Starting from (*), show that
∞ Z
X
−tλn
2 N
e ϕ, ϕn L2 (G;R)
= ϕ(x)pG (t, x, y)ϕ(y) dxdy ≤ (2πt)− 2 kϕk2L1 (G;R)
n=0 G×G
2 What is needed here is the variant of Stone’s Theorem that applies to semigroups. The
technical question which his theorem addresses is that of finding a simultaneous diagonalization
of the operators PGt . Because we are dealing here with compact operators, this question can
be reduced to one about operators in finite dimensions, where it is quite easy to handle. For
a general statement, see, for example, K. Yoshida’s Functional Analysis and its Applications,
Springer-Verlag (1971).
452 10 Wiener Measure and P.D.E.’s
for any ϕ ∈ L2 (G; R), and use this to show that, for any M ∈ N and ϕ, ϕ0 ∈
L2 (G; R),
∞ tλ
X
−tλn
0 e− 2M 0
e ϕ, ϕn ϕ , ϕn L2 (G;R) ≤ N kϕkL (G;R) kϕ kL (G;R) .
1 1
L2 (G;R)
n=M (πt) 2
M −1 tλ
X 2e− 2M
e−tλn ϕn (x)ϕn (y) ≤
G
p (t, x, y) − N ,
n=0 (πt) 2
∞
! 12 ∞
! 12
X X
e−θtλn ϕn (x)2 e−θtλn ϕn (y)2
tλ
e 0 p(t, x, y) − ϕ0 (x)ϕ0 (y) ≤
n=1 n=1
θtλ1
−
θtλ1 12 12 e 2
≤ e− 2 pG θt
2 , x, x pG θt
2 , y, y ≤ N .
(πθt) 2
radiator. When Kac took up the problem, he turned it around. Namely, he asked
what geometric information, besides the volume, is encoded in the eigenvalues.
When he explained his program to L. Bers, Bers rephrased the problem in the
terms that Kac adopted for his title. Audiophiles will be disappointed to learn
that, according to C. Gordon, D. Webb, and S. Wolpert’s,4 one cannot hear the
shape of a drum, even a two dimensional one.
This exercise outlines Kac’s argument for proving Weyl’s asymptotic for-
mula N
|G|λ 2
N (λ) ∼ N ,
(2π) 2 Γ( N2+1 )
in the sense that the ratio of the two sides tends to 1 as λ → ∞.
(i) Refer to Exercise 10.3.38, and show that, for each n ≥ 0,
1
2 ∆ϕn = −λn ϕn and lim ϕn (x) = 0 for a ∈ ∂reg G.
x∈G
x→a
where N (dλ) denotes integration with respect the purely atomic measure on
(0, ∞) determined by the non-decreasing function λ N (λ).
(iii) Using (10.3.8), show that
N
1 ≥ (2πt) 2 pG (t, x, x) ≥ 1 − E(t, x),
At this point, Kac invoked Karamata’s Tauberian Theorem,5 which relates the
asymptotics at infinity of an increasing function to the asymptotics at zero of
4 See their 1992 announcement in B.A.M.S., new series 27 (2), “One cannot hear the shape
of a drum.”
5 See, for example, Theorem 1.7.6 in N. Bingham, C. Goldie, and J. Teugel’s Regularly Varying
its Laplace transform. Given the preceding, Karamata’s theorem yields Weyl’s
asymptotic formula. It should be pointed out that the weakness of Kac’s method
is its reliance on the Laplace transform and Tauberian theory, which gives only
the principal term in the asymptotics. Further information can be obtained
using Fourier methods, which, in terms of partial differential equations, means
that one is replacing the heat equation by the wave equation, an equation about
which probability theory has embarrassingly little to say.
Exercise 10.3.41. It will have occurred to most readers that the relation be-
tween the Hermite heat kernel in (10.1.7) and the Ornstein–Uhlenbeck process
in § 8.4.1 is the archetypal example of what we have been doing in this section.
This exercise gives substance to this remark.
|x|2
(i) Set ρ± (x) = e± 2 , and show that 1
2 ∆ρ± − 12 |x|2 ρ± = ± N2 ρ± . By Lemma
2
10.3.24, ρ− is a ground state for − |x|2 with associated eigenvalue − N2 , a fact
that also can be verified by direct computation using (10.1.7). Show that the
ρ 1 1
measure Px− is the distribution under W (N ) of {2− 2 U(2t, 2 2 x, θ) : t ≥ 0},
where U(t, x, θ) is the Ornstein–Uhlenbeck process described in (8.5.1).
(ii) Although it does not follow from Lemma 10.3.24, use (10.1.7) to show that
2
ρ+ is also a ground state for − |x|2 with associated N2 . (See Exercise 10.3.43.)
ρ 1
Also, show that Px+ is the W (N ) -distribution of {θ ∈ et x+2− 2 V(2t, θ) : t ≥ 0},
where {V(t, θ) : t ≥ 0} is the process discussed in Exercise 8.5.14.
x2 n x2
d − 2
Exercise 10.3.42. Recall the Hermite polynomials Hn (x) = (−1)n e 2 dx ne
in § 2.4.1. Show that the Hermite functions (although these are not precisely the
ones introduced in § 2.4, they are obtained from those by rescaling)
1
24 x2 1
h̃n (x) = 1 e− n ≥ 0,
2 Hn (2 2 x),
(n!) 2
Exercise 10.3.43. Part (ii) of Exercise 10.3.41 might lead one to question the
necessity of the boundedness assumption made in Lemma 10.3.24. However, that
would be a mistake because, in general, a positive solution to 12 ∆ρ + V ρ = λρ
need not be a ground state. For example, in this exercise we will show that
x4
although ρ(x) = e 4 satisfies 12 ∂x2 ρ + V ρ = 0 when V = − 12 x6 + 3x2 , this ρ is
not a ground state for V . The proof is based on the following idea. If ρ were a
ground state, then Theorems 10.3.26 and its corollaries would apply, and so we
would know that the equation
Z t
(*) X(t, ψ) = ψ(t) + X(τ, ψ)3 dτ
0
(1)
would have a solution on [0, ∞) for Wx -almost every ψ ∈ C(R) for every x ∈ R.
The following steps show that this is impossible.
(i) Suppose that ψ1 , ψ2 ∈ C(R) and that 0 ≤ ψ1 (t) ≤ ψ2 (t) for t ∈ [0, 1]. If
X( · , ψ2 ) exists on [0, 1], show that X( · , ψ1 ) exists on [0, 1].
Rt
Hint: Define X0 (t, ψ) = ψ(t) and Xn+1 (t, ψ) = ψ(t) + 0 Xn (τ, ψ)3 dτ . First
show that if 0 ≤ ψ1 (t) ≤ ψ2 (t), then 0 ≤ Xn ( · , ψ1 ) ≤ Xn ( · , ψ2 ). Second, if
supn≥0 kXn ( · , ψ)k[0,T ] < ∞, show that Xn ( · , ψ) converges uniformly on [0, T ]
to the unique solution to (*) on [0, T ].
1
(ii) Show that if ψ(t) ≥ 1 for t ∈ [0, 1], then X(t, ψ) ≥ (1 − 2t)− 2 for t ∈ 0, 12
In this concluding chapter I will discuss a few refinements and extensions of the
material in §§ 10.2 and 10.3. Even so, I will be barely scratching the surface. The
interested reader should consult J.L. Doob’s thorough account in Classical Po-
tential Theory and Its Probabilistic Counterpart, published by Springer–Verlag
in 1984, or S. Port and C. Stones’s Brownian Motion and Classical Potential
Theory, published by Academic Press in 1978.
§ 11.1 Uniqueness Refined
In this section I will refine some of the uniqueness statements made in § 10.2.
The improved statements result from the removal of the defect mentioned in
Remark 10.3.14. To be precise, recall that if G is an open subset of RN , then
ζsG (ψ) = inf{t ≥ s : ψ(t) ∈ / G}, ζ0+G
= lims&0 ζsG , and (cf. Lemma 10.2.11)
(N ) G
∂reg G is the set of x ∈ ∂G such that Wx (ζ0+ = 0) = 1. The main result
proved in this section is Theorem 11.1.15, which states that, for any x ∈ G and
(N )
Wx -almost all ψ ∈ C(RN ), ζ G (ψ) < ∞ =⇒ ψ(ζ G ) ∈ ∂reg G. However, I
will begin by amending the treatment that I gave in § 10.3 of the Dirichlet heat
kernel pG (t, x, y).
§ 11.1.1. The Dirichlet Heat Kernel Again. In § 10.3, I introduced the
Dirichlet heat kernel pG (t, x, y). At the time, I was concerned with it only when
(x, y) ∈ G × G, and so I defined it in such a way that it was 0 outside G × G.
When G is regular in the sense that ∂G = ∂reg G, this choice is the obvious one,
since (cf. Theorem 10.3.9) it is the one that makes pG (t, · , y) continuous on R
for each (t, y) ∈ (0, ∞) × RN . However, when G is not regular, it is too crude
for the analysis here. Instead, from now on I will take
pG (t, x, y) =
(11.1.1)
W (N ) x 1 − `t (τ ) + θt (τ ) + y`t (τ ) ∈ G, τ ∈ (0, t) g (N ) (t, y − x),
where `t (τ ) = τ ∧t
t and θt (τ ) = θ(τ ) − θ(t)`t (τ ). Notice that the difference
between this definition and the one in § 10.3.2 results from the replacement of
the closed interval [0, t] there by the open interval (0, t) here. That is, in § 10.3.2,
pG (t, x, y) was given by
W (N ) x 1 − `t (τ ) + θt (τ ) + y`t (τ ) ∈ G, τ ∈ [0, t] g (N ) (t, y − x).
456
§ 11.1 Uniqueness Refined 457
of Theorem 10.3.3, one can use the results in § 8.3.3 to check that pG (t, x, y) =
pG (t, y, x) is again true but that (10.3.4) has to be replaced by
Z
(N )
ϕ(y)pG (t, x, y) dy = EWx
G
(11.1.2) ϕ(ψ(t) , ζ0+ (ψ) ≥ t .
RN
◦
(ψ) = 1G ψ(s) Es◦ (ψ)Et◦ (ψ).
(11.1.3) Es+t
Thus, repeating the argument used in the proof of Theorem 10.3.3 to derive
(10.3.5), one finds that
Z
G
(11.1.4) p (s + t, x, y) = pG (s, x, z)pG (t, z, y) dz,
G
which, because the integral is over G and not RN , is a flawed version of the
Chapman–Kolmogorov equation. In order to remove this flaw, I will need the
following lemma.
Lemma 11.1.5. For each (t, x) ∈ (0, ∞) × RN ,
and therefore
Z h i
ϕ(y)pG (t, x, y) = Wx(N ) ϕ ψ(t) , ζ0+
G
(11.1.6) (ψ) > t
RN
/ Λ =⇒ Wy(N ) (ζ G = ξ) = 0
ξ∈ for Lebesgue-almost every y ∈ RN .
Wx(N ) ζ0+
G
= t = Wx(N ) ζ0+
G
> s & ζ G ◦ Σs = t − s ≤ Wx(N ) ζ G ◦ Σs = t − s
Z
(N )
Wx+y ζ G = t − s γ0,sI (dy) = 0.
=
RN
pG (t, x, y) =g (N ) (t, y − x)
(11.1.8) (N )
h i
− EWx g (N ) t − ζ0+
G G
G
(ψ), y − ψ(ζ0+ ) , ζ0+ (ψ) < t
for all (t, x, y) ∈ (0, ∞) × (RN )2 , and the idea is very much the same as the one
used to prove (10.3.8). Thus, for α ∈ (0, 1), set
qα◦ (t, x, y) = W (N ) x 1 − `t (τ ) + θt (τ ) + y`t (τ ) ∈ G, τ ∈ (0, αt) g (N ) (t, y − x).
(N )
h i
= EWx ϕ ψ(t) , ζsG (ψ) ≥ αt
Z
(N )
Wx (N ) G G G
+E ϕ(y)g t − ζs (ψ), y − ψ(ζs ) dy, ζs (ψ) < αt .
RN
for all α ∈ (0, 1), t ∈ (0, ∞) and s ∈ (0, αt). Thus, by (*), after letting s & 0,
we see that
Z
ϕ(y)qα (t, x, y) dy
RN
Z
= ϕ(y)g (N ) (t, y − x) dy
RN
Z
(N )
− EWx G G G
ϕ(y)g t − ζ0+ (ψ), y − ψ(ζ0+ ) dy, ζ0+ (ψ) < αt .
RN
for any 0 < s < t, and note that pG (s, · , a) is bounded. Hence, the desired
conclusions follow from (10.3.12) and the argument used to prove the last part of
Theorem 10.3.9. Next, suppose that pG (t0 , x0 , a) = 0 for some (t0 , x0 ) ∈ (0, ∞)×
G. Then, by the strong minimum principle (cf. Theorem 10.1.6), pG (t, x, a) = 0
for all (t, x) ∈ (0, t0 ) × G. But this, by (11.1.2) and symmetry, means that, for
t ∈ (0, t0 ),
Z Z
(N ) G G
Wa (ζ0+ ≥ t) = p (t, a, y) dy = pG (t, x, a) dx = 0,
RN G
where I have used the final part of Lemma 11.1.5 to get the second equality.
Hence, pG (t0 , x0 , a) = 0 =⇒ a ∈ ∂reg G.
Finally, because, by the preceding and symmetry, for any x ∈ G, ∂G \ ∂reg G
is contained in {y ∈ / G : p(1, x, y) > 0}, and, by Lemma 11.1.5, the latter set
has Lebesgue measure 0, it is clear the ∂G \ ∂reg G has Lebesgue measure 0.
I next introduce the function
(N ) −ζ G
(11.1.10) v G (x) ≡ EWx e 0+ , x ∈ RN .
Since, by the Markov property,
Z
(N ) G (N ) G
−s
e g (N ) (s, y − x)EWy e−ζ dy = EWx e−ζs % v G (x)
RN
as s & 0, it is clear that v G is lower semicontinuous. In addition, it is obvious
that v G ≤ 1 everywhere and that
x ∈ RN : v G (x) = 1 = reg(G) = ∂reg G ∪ RN \ G .
for all (t, x, y) ∈ (0, ∞) × RN × RN . Hence, after multiplying by e−t and inte-
grating with respect to t ∈ (0, ∞), one arrives at
(N )
h G i (N )
h G i
EWx e−ζ0+ (ψ) r ψ(ζ0+ G
) − y = EWy e−ζ0+ (ψ) r ψ(ζ0+ G
)−x .
But Z
r(x − y) dy = 1, x ∈ RN ,
RN
and so (11.1.12) follows after one integrates the preceding over y ∈ RN and
applies Tonelli’s Theorem.
Given (11.1.12) and the fact that r is uniformly positive on compacts, it
becomes obvious that ν G must be always locally finite and finite when G{ is
compact. Thus, all that remains is to check (11.1.13). But clearly, after multi-
plying (11.1.8) with G = H throughout by e−t and integrating with respect to
t ∈ (0, ∞), one gets
Z h G
(N ) i
r(x − y) = e−t pH (t, x, y) dt + EWx e−ζ0+ r ψ(ζ0+
G
)−y .
(0,∞)
Hence, since, by the first part of Lemma 11.1.9 with G = H, pH (t, x, · ) vanishes
on reg(H), (11.1.13) follows after one integrates the preceding with respect to
ν G (dy) and uses (11.1.12).
Lemma 11.1.14. If G{ is compact and, for some θ ∈ [0, 1), v G G{ ≤ θ, then
(N ) G
Wx ζ0+ < ∞ = 0 for every x ∈ RN .
G
Proof: I begin by checking that v ≤ θ everywhere. Thus, suppose that
H = x ∈ R : v (x) > θ + 6= ∅ for some > 0. Because v G is lower
N G
(N )
and so, after letting s & 0, we have that v G (x) ≥ (θ + )Wx (ζ0+ H
> 0).
(N ) H
In particular, if x ∈/ G, then θ ≥ (θ + )Wx (ζ0+ > 0), which means that
(N ) H
x ∈/ G =⇒ Wx (ζ0+ > 0) < 1. Hence, because (cf. part (ii) of Exercise
(N ) H
10.2.19) Wx (ζ0+ > 0) ∈ {0, 1}, this means that x ∈ / G =⇒ x ∈ reg(H) and
therefore that (11.1.13) applies. But if x ∈ H, (11.1.13) yields the contradiction
(N )
h H i
θ + < v G (x) = EWx e−ζ0+ v G ψ(ζ0+
H
) < θ + ,
H H
since ζ0+ (ψ) < ∞ =⇒ ψ(ζ0+ )∈ / H. That is, I have shown that H must be
empty.
Knowing that v G ≤ θ everywhere, I now want to argue that ν G (RN ) ≤
θν G (RN ). Since ν G (RN ) < ∞, this will show that ν G = 0 and therefore, by
(N ) G
(11.1.12), that v G ≡ 0, which is the same as saying that Wx (ζ0+ < ∞) = 0
everywhere. Thus, let K = G{, and set Kn = {x : dist(x, K) ≤ n−1 } and
Gn = Kn { for n ≥ 1. Clearly, K ⊆ RN \ Gn ⊆ reg(Gn ), and so, by (11.1.12) and
Tonelli’s Theorem,
Z Z
G N Gn G
ν (R ) = v (x) ν (dx) = v G (y) ν Gn (dy) ≤ θν Gn (RN ).
RN RN
Thus, all that we have to do is check that ν Gn (RN ) & ν G (RN ) when n → ∞.
But Z
Gn N
ν (R ) = v Gn (x) dx
RN
(N )
G
Proof: Suppose not. Because Wy (ζ0+ > 0) ∈ {0, 1} for all y ∈ RN , we could
then find an x ∈ G and a δ > 0 for which
Wx(N ) ζ0+
G G
(ψ) < ∞ & ψ(ζ0+ ) ∈ Γδ > 0,
where n o
Γδ = y ∈ ∂G : Wy(N ) ζ0+
G
≥ δ ≥ 12 .
§ 11.1 Uniqueness Refined 463
is the one and only bounded, smooth solution to the boundary value problem
described in Corollary 10.3.13.
More interesting are the improvements that Theorem 11.1.15 allows me to
make to the results in § 10.2.3.
Theorem 11.1.17. Given an open G ⊆ RN and a bounded Borel measurable
f : ∂G −→ R, set
(N )
uf (x) = EWx f ψ(ζ G ) , ζ G (ψ) < ∞ ,
(11.1.18) for x ∈ G.
Proof: The initial assertions are covered already by Theorem 10.2.14. Next,
let f ∈ Cb (∂G; R) be given, and suppose that u is an element of C 2 G; [0, ∞)
in the second assertion. To prove that uf ≤ u, set
which satisfies the conditions
Ft = σ {ψ(τ ) : τ ∈ [0, t]} , and choose a sequence of bounded, open subsets Gn
(N )
so that Gn ⊆ G and Gn % G. Then, for each n ≥ 1, −u ψ(t ∧ ζ Gn ), Ft , Wx
is a submartingale, and so we know that, for each x ∈ G, u(x) dominates
(N ) (N )
lim EWx u ψ(T ∧ ζ Gn ) ≥ lim lim EWx u ψ(ζ Gn ) , ζ G ≤ T
lim
T %∞ n→∞ T %∞ n→∞
(N )
h i
Wx
f ψ(ζ G ) , ζ G < ∞ = uf (x),
≥E
where, in the passage to the last line, I have used Fatou’s Lemma and Theorem
11.1.15.
Finally, let f ∈ Cb (∂G; R) be given. What I still have to show is that if u
is a harmonic function on G which tends to f at points in ∂reg G and satisfies
(N )
|u(x)| ≤ CWx (ζ G < ∞) for some C < ∞, then u = uf . Thus, suppose u is
such a function, and set M = C + kf ku . Then, by the preceding, we have both
that
and that
Then, for each f ∈ Cb (G; R) the function uf in (11.1.18) is the one and only
bounded, harmonic function u on G which satisfies limx→a u(x) = f (a) for every
x∈G
a ∈ ∂reg G. In particular, this will be the case if G is contained in a half-space.
In order to go further, it will be helpful to have the following lemma.
Lemma 11.1.21. Let G be a non-empty, connected, open set in RN . Then
and so we need only check that limx→b uf (x) > 0. To this end, first note that,
x∈G
since
lim uf (x) = f (a) = 1,
x→a
x∈G
the Strong Minimum Principle (cf. Theorem 10.1.6) says that uf > 0 everywhere
in G. Next, because b is not regular, we can find a δ > 0 and a sequence
{xn : n ≥ 1} ⊆ G such that xn → b and
≡ inf+ Wx(N ) G
n
ζ > δ > 0.
n∈Z
and construct f so that f ≡ 1 on ∂G ∩ B(b, r){ and f (b) = 0. Then f (b) <
limx→b uf (x).
x∈G
I next take a closer look at the conditions under which we can assert the
uniqueness of solutions to the Dirichlet problem. To begin, observe that, by
Corollary 11.1.19, the situation is quite satisfactory when (11.1.20) holds. In
fact, the same line of reasoning which I used there shows that the same conclusion
(N )
holds as soon as one knows that Wx ζ G < ∞ is bounded below by a positive
(N )
constant; and therefore, because x ∈ G 7−→ Wx (ζ G < ∞) is a bounded
harmonic function which tends to 1 at ∂reg G, Theorem 11.1.17 tells us that
inf Wx(N ) ζ G < ∞ > 0 =⇒ inf Wx(N ) ζ G < ∞ = 1.
(11.1.23)
x∈G x∈G
I will close this discussion of the Dirichlet problem with two results which
reflect the transience of Brownian paths in three and higher dimensions and
their recurrence in one and two dimensions.
Theorem 11.1.24. Assume that N ≥ 3, and let G be a nonempty, connected,
open subset of RN . If f ∈ Cc (∂G; R), then uf is the one and only bounded
harmonic function u on G which tends to f at ∂reg G and satisfies
(11.1.25) lim u(x) = 0.
|x|→∞
x∈G
Proof: We already know that uf is a bounded harmonic function which tends
to f at ∂reg G, but we must still show that it satisfies (11.1.25). For this purpose,
choose r ∈ (0, ∞) so that f is supported in B(0, r). Then (cf. the last part of
Theorem 10.1.11), because N ≥ 3,
uf (x) ≤ kf ku Wx(N ) ζr < ∞ −→ 0 as |x| → ∞.
To prove that uf is the only such function u, select bounded open sets Gn % G
with Gn ⊂⊂ G, and note that, for each T ∈ (0, ∞),
(N )
h i
u(x) = lim EWx u ψ(T ∧ ζ Gn )
n→∞
(N )
h i (N )
h i
= EWx f ψ(ζ G ) , ζ G ≤ T + EWx u ψ(T ) , T < ζ G < ∞
(N )
h i
+ EWx u ψ(T ) , ζ G = ∞ .
§ 11.1 Uniqueness Refined 467
Clearly, h i
(N )
uf (x) = lim EWx f ψ(ζ G ) , ζ G ≤ T
T %∞
and h i
(N )
lim EWx u ψ(T ) , T < ζ G < ∞ = 0.
T %∞
Finally, because N ≥ 3 and, therefore, by Corollary 10.1.12, ψ(T ) −→ ∞ as
(N )
T % ∞ for Wx -almost every ψ ∈ C(RN ), (11.1.25) guarantees that
(N )
h i
lim EWx u ψ(T ) , ζ G = ∞ = 0,
T %∞
(N ) (N )
Then −Xn (t), Ft , Wx0 is a non-positive,
right-continuous, Wx0 -submartin-
gale when Ft = σ {ψ(τ ) : τ ∈ [0, t]} . Hence, since
(N )
At the same time, by Theorem 10.2.3, we know that, for Wx0 -almost every
ψ ∈ C(RN ), Z ∞
1U ψ(t) dt = ∞ for all open U 6= ∅.
0
(N )
Hence, since Wx0 ζ G = ∞ > 0, there exists a ψ0 ∈ C(RN ) with the properties
that ψ(0) = x0 , ζ G (ψ0 ) = ∞,
Z ∞
1U ψ0 (t) dt = ∞ for all open U 6= ∅, and lim u ψ0 (t) exists,
0 t→∞
which is possible only if u is constant. In other words, we have now proved that
(N )
when Wx0 (ζ G < ∞) < 1 for some x0 ∈ G, then the only u ∈ C 2 G; [0, ∞)
with ∆u ≤ 0 are constant.
Given the preceding paragraph, the rest is easy. Indeed, if ∂reg G = ∅, then
(N )
Theorem 11.1.15 already implies that Wx (ζ G < ∞) = 0 for all x ∈ G. On the
(N )
other hand, if a ∈ ∂reg G but Wx0 ζ G < ∞ < 1 for some x0 ∈ G, then the
(N ) (N )
preceding paragraph applied to x Wx (ζ G < ∞) says that Wx (ζ G < ∞)
is constant, which leads to the contradiction
1 > Wx(N
0
) G
lim Wx(N ) (ζ G < ∞) = 1.
(ζ < ∞) = x→a
x∈G
I, II, & III,” Ill. J. Math. 1 & 2. In these articles, he literally created the modern theory of
Markov processes and established their relationship to potential theory. To see just how far
Hunt’s ideas can be elaborated, see M. Sharpe’s General Theory of Markov Processes, Acad.
Press Series in Pure & Appl. Math. 133 (1988).
§ 11.1 Uniqueness Refined 469
N 2 y
ΠR+ (0, y), dω = λ N −1 (dω), y ∈ (0, ∞),
ωN −1 y 2 + |ω|2 N2 R
N 2 y
ΠR+ (x, y), dω = λ N −1 (dω)
ωN −1 y 2 + |x − ω|2 N2 R
Moreover, by using further translation plus Wiener rotation invariance (cf. (ii) in
Exercise 4.3.10), one can pass easily from the preceding to an explicit expression
of the harmonic measure for an arbitrary half-space.
In the preceding, we were able to derive an expression giving the harmonic
measure for half-spaces directly from probabilistic considerations. Unfortu-
nately, half-spaces are essentially the only regions for which probabilistic rea-
soning yields such explicit expressions. Indeed, embarrassing as it is to admit,
it must recognized that, when it comes to explicit expressions, the time-honored
techniques of clever changes of variables followed by separation of variables are
more powerful than anything which comes out of (11.1.27). To wit, I have been
unable to give a truly probabilistic derivation of the classical formula given in
the following.
Theorem 11.1.28 (Poisson Formula). Use λSN −1 to denote the surface
measure on the unit sphere SN −1 in RN , and define
1 1 − |x|2
π (N ) (x, ω) = for (x, ω) ∈ B(0, 1) × SN −1 .
ωN −1 |x − ω|N
Then:
ΠB(0,1) (x, dω) = π (N ) (x, ω) λSN −1 (dω), for x ∈ B(0, 1).
470 11 Some Classical Potential Theory
More generally, if c ∈ RN , r ∈ (0, ∞), and λSN −1 (c,r) denotes the surface measure
on the sphere SN −1 (c, r) ≡ ∂B(c, r), then
1 r2 − |x − c|2
ΠB(c,r) (x, dω) = λSN −1 (c,r) (dω), x ∈ B(c, r).
ωN −1 r |x − ω|N
Equivalently, for each open G in RN , harmonic function u on G, B(c, r) ⊂⊂ G,
and x ∈ B(c, r),
Z
u(c + rω) π (N ) x−c
u(x) = r , ω λSN −1 (dω).
SN −1
In particular, if {un : n ≥ 1} is a sequence of harmonic functions on the open
set G and if un −→ u boundedly and pointwise on compact subsets of G, then
u is harmonic on G and un −→ u uniformly on compact subsets. (See Exercise
11.2.22 for another approach.)
Proof: Set B = B(0, 1). Clearly, everything except the final assertion follows
by scaling and translation once we identify π (N ) as the density for ΠB . To make
this identification, first check, by direct calculation, that π (N ) ( · , ω) is harmonic
in B for each ω ∈ SN −1 . Hence, in order to complete the proof, all that we have
to do is check that Z
lim
x→a
f (ω) π (N ) (x, ω) λSN −1 (dω) = f (a)
x∈B SN −1
for each x ∈/ D(r). In particular, if u ∈ Cb R2 \ D(r); R is harmonic on
R2 \ D(r), then
|x|2 |x|2 − r2
Z
u(x) = u(rω)λS1 (dω),
2π S1 |x|2 ω − rx2
and so
Z
1
(11.1.31) lim u(x) = u(rω) λS1 (dω).
|x|→∞ 2π S1
Proof: After an easy scaling argument, I may and will assume that r = 1.
Thus, set D = D(1), and
assume
that u ∈ Cb R2 \ D; R is harmonic in R2 \
x
D. Next, set v(x) = u |x| 2 for x ∈ D \ {0}. Obviously, v is bounded and
continuous. In addition, by using polar coordinates, one can easily check that v
is harmonic in D \ {0}. In particular, if ρ ∈ (0, 1) and G(ρ) ≡ B \ B(0, ρ), then
(N )
h i (N )
h i
v(x) = EWx v ψ(ζ1 ) , ζ1 < ζρ + EWx v ψ(ζρ ) , ζρ < ζ1 , x ∈ G(ρ),
where the notation is that in Theorem 10.1.11. Hence, because, by that theorem,
(N )
ζρ % ∞ (a.s., Wx ) as ρ & 0, this leads to
1 − |x|2
Z
Wx
(N )
h i 1
v(x) = E v ψ(ζ1 ) , ζ1 < ∞ = u(ω) λS1 (dω)
2π S1 ω − x2
for all x ∈ D \{0}. Finally, given the preceding, the rest comes down to a simple
matter of bookkeeping.
As a second application of Poisson’s formula, I make the following famous ob-
servation, which can be viewed as a quantitative version of the Strong Minimum
Principle (cf. Theorem 10.1.6) for harmonic functions.
Corollary 11.1.32 (Harnack’s Principle). For any c ∈ RN and r ∈
(0, ∞),
rN −2 r − |x − c| B(c,r)
N −1 Π (c, · )
r + |x − c|
(11.1.33)
rN −2 r + |x − c| B(c,r)
B(c,r)
≤Π (x, · ) ≤ N −1 Π (c, · ).
r − |x − c|
472 11 Some Classical Potential Theory
for all x ∈ B(c, r). Hence, if u is a non-negative, harmonic function on B(c, r),
then
rN −2 r − |x − c| rN −2 r + |x − c|
(11.1.34) N −1 u(c) ≤ u(x) ≤ N −1 u(c).
r + |x − c| r − |x − c|
(ii) Let K ⊂⊂ RN , and take σK (ψ) = inf{t > 0 : ψ(t) ∈ K} to be the first
positive entrance time of ψ ∈ C(RN ) into K. Given an open G ⊃⊃ K, show
that
Wx(N ) σK < ζ G = 0 for all x ∈ G \ K
(11.1.38)
if and only if K ∩ ∂reg (G \ K) = ∅, and use the locality proved in Lemma 10.2.11
to conclude that (11.1.38) for some G ⊃⊃ K is equivalent to K ∩ ∂reg (G \ K) = ∅
for all G ⊃⊃ K. In particular, conclude that (11.1.38) holds for some G ⊃⊃ K
if and only if
(11.1.39) Wx(N ) ∃t ∈ [0, ∞) ψ(t) ∈ K = 0 for all x ∈ / K.
for all x ∈ H.
(iii) Let K be a compact subset of RN and a connected G ⊃⊃ K be given.
Assuming either that N ≥ 3 or that ∂reg G 6= ∅, show that (11.1.39) holds if K
is a removable singularity in G for every bounded, harmonic function on G \ K.
(N )
Hint: Consider the function x ∈ G \ K 7−→ Wx σK < ζ G ∈ [0, 1], and use
the Strong Minimum Principle.
(iv) Let G be a non-empty, open subset of RN , where N ≥ 2, and set D =
{(x, x) : x ∈ G}, the diagonal in G2 . Given a u ∈ C(G2 ; R) which is harmonic
on G \ D, show that u is harmonic on G2 .
Hint: Show that
(2N )
Wx,y ∃t ∈ [0, ∞) ψ(t) ∈ D
Z
Wy(N ) ∃t ∈ (0, ∞) ψ(t) = ϕ1 (t) Wx(N ) (dϕ) = 0
≤
C(RN )
for (x, y) ∈ G2 \ D.
474 11 Some Classical Potential Theory
Exercise 11.1.40. For each r ∈ (0, ∞), let S(r) denote the open vertical strip
(−r, r) × R in R2 . Clearly,
and so the harmonic measure for S(r), based at any point in S(r), will be
supported on {(x, y) : x = ±r and y ∈ R}. In particular, if u ∈ Cb S(r); R is
bounded and harmonic on S(r), then
for z ∈ S(r, R) ≡ (−r, r) × (−R, R). Conclude that (11.1.41) holds as long as
(2)
lim sup u(x, R) ∨ u(x, −R)Wz(2) ζR < ζr(1) = 0, z ∈ S(r).
R→∞ |x|≤1
Thus, the desired conclusion comes down to showing that, for each ρ ∈ (r, ∞),
πR (2)
Wz(2) ζR < ζr(1) = 0, z ∈ S(r).
(*) lim exp
R→∞ 2ρ
§ 11.2 The Poisson Problem and Green Functions 475
(ii) To prove (*), let ρ ∈ (r, ∞) be given. Show that, for R ∈ (0, ∞) and
z ∈ S(r, R),
h (2)
i
πR
(2)
Wz π ψ1 ζR +ρ (2) (1)
uρ (z) = cosh 2ρ E sin 2ρ , ζ R < ζr
(2)
≥ cosh πR cos πr Wz(2) ζR < ζr(1) ,
2ρ 2ρ
Notice that, at least when G is bounded, or, more generally, whenever (11.1.20)
holds, there is at most one bounded u ∈ C 2 (G; R) which satisfies (11.2.1). In-
deed, if there were two, then their difference would be a bounded harmonic func-
tion on G satisfying boundary condition 0 at ∂reg G, which, because of (11.1.20)
and Corollary 11.1.19, means that this difference vanishes. Moreover, when
N ≥ 3, even if (11.1.20) fails, one can (cf. Theorem 11.1.24) recover uniqueness
by adding to (11.2.1) the condition that
Hence, at least when (11.1.20) holds and therefore G pG (T, x, y)f (y) dy −→ 0
R
desired solution to (11.2.1). On the other hand, it is neither obvious that the
limit will exist nor, even if it does exist, in what sense either the smoothness
properties or (11.2.2) will survive the limit procedure.
Motivated by these considerations, I now define the Green function to be
the function g G given by
Z
G
(11.2.3) g (x, y) = pG (t, x, y) dt, (x, y) ∈ G2 .
(0,∞)
N 2|y − x|2−N
(11.2.4) g R (x, y) = ,
(N − 2)ωN −1
N
where ωN −1 is the area of SN −1 . In particular, when N ≥ 3, g R (x, · ) is smooth
and has bounded derivatives of all orders in RN \ B(x, r) for each r > 0. Next,
by integrating both sides of (10.3.8) with respect to t ∈ (0, ∞), we obtain, for
any G, the Duhamel formula
N (N )
h N i
g G (x, y) = g R (x, y) − EWx g R ψ(ζ0+ G
G
(11.2.5) ), y , ζ0+ <∞ ,
What we still have to find are conditions under which GG f solves (11.2.1) and
satisfies (11.2.2). From (11.2.5) and Theorem 10.2.14, it is clear that GG f (x)
§ 11.2 The Poisson Problem and Green Functions 477
N N
tends to 0 as x tends to ∂reg G. In addition, since |GG f | ≤ GR |f | and GR |f |
tends to 0 at infinity, GG f satisfies (11.2.2). Hence, the remaining question is
whether 12 ∆GG f = −f on G. As an initial step, suppose that GG f ∈ Cb2 (G; R),
and note that, for each x ∈ G,
1 s
Z Z
1 G G
2 ∆G f (x) = lim ϕ(y)p (t, x, y) dy dt
s&0 s 0 G
Z
1 G G G
= lim p (s, x, y)G f (y) dy − G f (x)
s&0 s G
1 s
Z Z
G
= − lim f (y)p (t, x, y) dy dt = −f (x).
s&0 s 0 G
Thus, what we need to know is whether GG f ∈ Cb2 (G; R). By the considerations
N
above, we already know that GG f ∈ Cb2 (G; R) if and only if GR f is. Moreover,
N N
if f ∈ Cc2 (G; R), then ∂ α GR f = GR ∂ α f for any α with kαk ≤ 2. In addition,
Hence, by starting with f ’s that are in Cc2 (G; R) and applying an obvious ap-
proximation argument, we see that GG f ∈ Cb2 (G; R) whenever f ∈ Cc1 (G; R).1
Theorem 11.2.7. Assume that N ≥ 3 and that G is a non-empty, open subset
of RN . Then, for each f ∈ Cc1 (G; R), the function GG f in (11.2.6) is the unique
bounded, twice differentiable solution to (11.2.1) which satisfies (11.2.2).
Remark 11.2.8. Notice that the Duhamel formula in (11.2.5) could have been
N
guessed. To be precise, g R is a fundamental solution for − 12 ∆ in RN in the
N
sense that 12 ∆GR f = −f all test functions f ∈ Cc1 (RN ; R), and g G is to be a
fundamental solution for − 12 ∆ in G with 0 boundary data in the sense that it
should be the kernel for the solution operator which solves the Poisson problem in
(11.2.1). Based on these remarks, one should guess that a reasonable approach
N
to the construction of g G would be to correct g R ( · , y) for each y ∈ G by
N
subtracting off a harmonic function which has g R ( · , y) as its boundary value,
and this is, of course, precisely what is being done in (11.2.5).
§ 11.2.2. Green Functions when N ∈ {1, 2}. Because (cf. Theorem 10.2.3)
Brownian paths in one and two dimensions spend infinite time in every non-
empty open set, the reasoning § 11.2.1 is too crude to handle the Poisson problem
1 It turns out that if f is Hölder continuous of some order, then GG f will be twice continuously
differentiable and its second derivatives will be Hölder continuous of the same order as f . Such
results are called Schauder estimates. See, for example, N.V. Krylov’s Lectures on Elliptic and
Parabolic Equations in Hölder Spaces, A.M.S. Graduate Studies in Math. 12 (1996).
478 11 Some Classical Potential Theory
N
in these dimensions. In particular, when N ∈ {1, 2}, g R will be identically
infinite, and so (11.2.5) does us no good. To overcome this difficulty, I will use a
generalization of (11.2.5). Namely, let H be an open set that contains G. Then,
by the Markov property, it is easy to check that
(N )
h i
P H (t, x, Γ) = P G (t, x, Γ) + EWx P H t − ζ G (ψ), ψ(ζ G ), Γ , ζ G (ψ) < ∞
for (t, x, y) ∈ (0, ∞) × G2 . Hence, after integrating with respect to t ∈ (0, ∞),
one obtains
(N )
h i
g H (x, y) = g G (x, y) + EWx g H t − ζ G (ψ), y , ζ G (ψ) < ∞
(11.2.9)
(1)
Since Wx (ζ (a,b) < ∞) = 1 for all x ∈ R and the boundary of (a, b) is regular,
Corollary 11.1.19 together with (11.2.10) say that, as a function of x ∈ (a, b),
the second term on the right equals −u, where u00 = 0, limx&0 u(x) = 0, and
limx%b u(x) = 2(y − a). Hence, u(x) = 2(x−a)(y−a)
b−a , and so
2
g (a,b) (x, y) =
(11.2.11) x ∧ y − a (b − x ∨ y).
b−a
Starting from these, it is an easy matter to check by hand that if G 6= R is any
open interval and f ∈ Cc (G; R), GG f is bounded and solves (11.2.1). Moreover,
because (11.1.20) holds, GG f is the only such solution.
When N = 2, matters are significantly more complicated but much more
interesting. I will begin by considering the R2 analog of (0, ∞), which is the
upper half-space R2+ = {(x1 , x2 ) : x2 > 0}. It should be clear that, for x =
(x1 , x2 ) and y = (y1 , y2 ),
R2+ (1) (0,∞) 1 |y−x|2
− 2t
|y̌−x|2
− 2t
p (t, x, y) = g (t, y1 − x1 )p (t, y1 , y2 ) = e −e ,
2πt
where y̌ = (y1 , −y2 ). Therefore,
Z
2
2π pR+ (t, x, y) dt
(0,∞)
T
|y − x|2 |y̌ − x|2
Z
1
= lim exp − − exp − dt
T %∞ 0 t 2t 2t
−2
|y−x|
Z
1 − 1
= lim e 2tT dt,
T %∞ t
|y̌−x|−2
2 |y−x|
which means that g R+ (x, y) = − π1 log |y̌−x| . Hence, by (11.2.9), we know that
2 (2)
h 2
i
g G (x, y) = g R+ (x, y) − EWx G
G
g R+ ψ(ζ0+ ), y , ζ0+ <∞
and (2)
h i
EWx log |y − ψ(ζ G )|, ζ G (ψ) < ∞ < ∞.
sup
(x,y)∈K 2
In particular, u is harmonic on G2 .
Proof: Since g G is symmetric, the first equality is obvious. While proving the
associated finiteness assertion, I may and will assume that G is connected. In
addition, it suffices for me to prove
"Z G #
(2)
ζ (ψ)
sup EWx
1B(c,r) ψ(t) dt < ∞
x∈G 0
for all c ∈ G and r > 0 with B(c, 2r) ⊂⊂ G. Given such a ball, set B = B(c, r)
and 2B = B(c, 2r), and define {ζn : n ≥ 0} inductively by ζ0 = 0 and, for n ≥ 1,
ζ2n−1 = inf{t ≥ ζ2(n−1) : ψ(t) ∈ B} and ζ2n = inf{t ≥ ζ2n−1 : ψ(t) ∈ / 2B}.
(2) G
If u(x) = Wx ζ1 < ζ , then u is a [0, 1]-valued, harmonic function on G \ B
that tends to 0 as x tends to ∂reg G and to 1 as x tends to ∂B. Thus, since
∂reg G 6= ∅, the Minimum Principle says that u(x) ∈ (0, 1) for all x ∈ G \ B. In
particular, this means that α ≡ max{u(x) : |x − c| = 2r} ∈ (0, 1). At the same
time, by the Markov property,
(2)
Wx(2) ζ2n+1 < ζ G = EWx u ψ(ζ2n ) , ζ2n (ψ) < ζ G (ψ) ≤ αWx(2) ζ2n−1 < ζ G ,
(2) (2)
and so Wx ζ2n−1 < ζ G ≤ αn−1 for n ∈ Z+ . Hence, if f (y) = EWy ζ 2B ,
then
∞
"Z G # "Z #
(2)
ζ X (2)
ζ2n
Wx Wx G
E 1B ψ(t) dt = E 1B ψ(t) dt, ζ2n−1 (ψ) < ζ
0 n=1 ζ2n−1
∞
X (2) kf ku
EWx f ψ(ζ2n−1 ) , ζ2n−1 (ψ) < ζ G (ψ) ≤
≤ .
n=1
1−α
§ 11.2 The Poisson Problem and Green Functions 481
|y − ψ(ζ G )| G
(2)
= EWx log , ζ (ψ) < ζ B(c,r) (ψ)
D
is a non-negative, harmonic function on B 2 , and, for each (x, y) ∈ B 2 , vr (x, y) is
non-decreasing as a function of r > R. Thus, by Harnack’s Principle (cf. Corol-
lary 11.1.32), either limr→∞ vr = ∞ on B 2 or vr tends uniformly on compact
subsets of B 2 to a harmonic function v. Since
lim sup Wx(2) ζ G < ζ B(c,r) − Wx(2) (ζ G < ∞) = 0,
r→∞ x∈B
(2)
h i
= ur (x, y) + EWx logy − ψ(ζ B(c,r) ), ζ B(c,r) (ψ) < ζ G (ψ) < ∞ .
and (2)
h i
(x, y) ∈ G2 7−→ EWx log y − ψ(ζ G ), ζ G < ∞ ∈ R
1 1 (2)
h i
(11.2.16) g G (x, y) = − log |y−x|+ EWx logy−ψ(ζ G ), ζ G < ∞ +hG (x)
π π
for all distinct x’s and y’s from G, and so either hG ≡ 0 or G is unbounded and
1 1 (2)
h i
gr (x, y) = − log |y − x| + EWx log |y − ψ(ζ G(r) )|, ζ G(r) (ψ) < ∞ .
π π
we conclude from the first part of Lemma 11.2.13 that only the second alternative
is possible. Thus, we now know that g G is harmonic on G c2 and that
where
(2)
h i
ur (x, y) = EWx logy − ψ(ζ G ), ζ G (ψ) < ζ B(c,r) (ψ) for (x, y) ∈ G(r)2 .
Moreover, by combining this with (*) and (**), we also know that the third term
on the right of (**) converges uniformly on compact subsets of G c2 to a harmonic
function on G c2 . At the same time, as r → ∞,
(2)
h i
EWx logy − ψ(ζ B(c,r) ), ζ B(c,r) (ψ) ≤ ζ G (ψ) < ∞
− log rWx(2) ζ B(c,r) ≤ ζ G (ψ) < ∞
" ! #
(2) y − ψ(ζ B(c,r) )
= EWx log , ζ B(c,r) ≤ ζ G (ψ) < ∞ −→ 0
r
and apply Lebesgue’s Dominated Convergence Theorem together with the inte-
grability estimate in the second part of Lemma 11.2.13 to see that, as |y| → ∞
through G, the second term tends to 0 uniformly for x in compact subsets of
G.
484 11 Some Classical Potential Theory
log |x|
Wx(2) ζ D(r) < ζ G = R
.
log Rr
Hence, by (11.2.15), we see that
2
\D(R) 1 |x|
hR (x) = log , x∈
/ D(R).
π R
As we are about to see, for G’s whose complements are compact, the conclusion
drawn about hG at the end of Remark 11.2.18 is typical, at least as |x| → ∞.
Corollary 11.2.19. Let everything be as in Theorem 11.2.14, and assume
that K ≡ R2 \ G is compact. Then, for each R ∈ (0, ∞) with the property that
K ⊂⊂ D(R), one has that
|x| |x|2 |x|2 − R2
Z
1
hG (x) − log = G
h (Rω) λS1 (dω)
π R 2π S1 |x|2 ω − Rx2
Z
1
−→ hG (Rω) λS1 (dω)
2π S1
as |x| → ∞.
Proof: Define σ : C(RN ) −→ [0, ∞] to be the first entrance time into D(R),
and note (cf. the preceding discussion) that, for each r > R and R < |x| < r,
(2)
h i
(2)
= Wx(2) ζ D(r) < σ + EWx Wψ(σ) ζ D(r) < ζ G , σ < ζ D(r)
Hence, after multiplying the preceding through by logπ r , using (11.2.15), and
letting r → ∞, we arrive at
1 |x| 1 (2)
h i
hG (x) = log + EWx hG ψ(σ) , σ < ∞ , x ∈ R2 \ D(R),
π R π
§ 11.2 The Poisson Problem and Green Functions 485
does not depend on R as long as G{ ⊂⊂ B(0, R). This number plays an im-
portant role in classical two-dimensional potential theory, where it is known as
Robin’s constant for G.
Corollary 11.2.20. Again let everything be as in Theorem 11.2.14. Then,
for each K ⊂⊂ G and r > 0,
n o
sup g G (x, y) : |x − y| ≥ r and y ∈ K < ∞
and
lim sup g G (x, y) = 0 for each a ∈ ∂reg G.
x→a
x∈G y∈K
Moreover, for each f ∈ Cc1 (G; R), GG f is the unique bounded solution to
(11.2.1).
Proof: To prove the initial statements, let c ∈ G and r > 0 satisfying B(c, 2r)
⊂⊂ G be given, set B = B(c, r), and define the first entrance time σ(ψ) of ψ
into B by σ(ψ) = inf t ≥ 0 : ψ(t) ∈ B . By the Markov property, we see that,
for any f ∈ Cc B; [0, ∞) ,
"Z G #
Z
(2)
ζ
g G (x, y)f (y) dy = EWx f ψ(t) dt, σ < ζ G
G σ
Z
(2)
Wx G G
=E g ψ(σ), y f (y) dy, σ < ζ .
G
for some C ∈ (0, ∞). In particular, this, combined with the obvious Heine–
Borel argument, proves the first estimate. In addition, if a ∈ ∂reg G, then, for
each δ > 0,
lim Wx(2)
= x→a
σ≤δ .
x∈G
Thus, since the last expression obviously tends to 0 as δ & 0, this, together with
(*), implies that
lim sup g G (x, y) = 0,
x→a
x∈G y∈B
which (again after the obvious Heine–Borel argument) means that we have also
proved the second assertion.
Turning to the last part of the statement, let f ∈ Cc1 (G, R) be given. By the
preceding, we know that GG f is bounded and tends to 0 at ∂reg G. In addition,
using Theorem 11.2.14, especially (11.2.16), and arguing as I did in the case
when N ≥ 3, it is easy to check that GG f ∈ C 2 (G; R) and 12 ∆GG = −f . Thus,
GG f is a bounded solution to (11.2.1), and, because (11.1.20) holds, it can be
the only such solution.
Exercises for § 11.2
Exercises 11.2.21. Give an explicit expression for the Green function g B(c,R)
when N ≥ 2. To this end, first use translation and scaling to see that
x−c y−c
g B(c,R) (x, y) = R2−N g B(0,1) ,
R R
for distinct x, y from B(c, R). Thus, assume that c = 0 and R = 1. Next,
observe that
y
|x − y| = |y|x − |y| for x ∈ SN −1 and y ∈ BRN (0, 1) \ {0},
and use this observation together with (11.2.12) and (11.2.5) to conclude that
(
y
1 1 log − |y|x if y 6= 0
B(0,1) |y|
(x, y) = − log |y − x| +
g
π π 0 if y = 0
when N = 2 and
N
y
N
gR
|y| − |y|x if y 6= 0
g B(0,1) (x, y) = g R (x, y) −
2
(N −2)ωN −1 if y = 0
when N ≥ 3.
§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions 487
Exercise 11.2.22. The derivation that I gave of Poisson’s formula (cf. Theo-
rem 11.1.28) required me to already know the answer and simply verify that it is
correct. Here I outline another approach, which is the basis for a quite general
procedure. To begin with, recall the classical Green’s Identity
Z Z
∂v ∂u
u∆v − v∆u dx = u ∂n − v ∂n dλ∂G
G ∂G
N
for bounded, smooth regions G in R and functions u and v that are smooth
in a neighborhood of G. (In the preceding, ∂w ∂n (x) is used to denote the normal
derivative ∇w(x), n(x) RN , where n(x) is the outer unit normal at x ∈ ∂G
and λ∂G is the standard surface measure for ∂G.) Next, let c be an element of
B(0, 1), suppose r > 0 satisfies B(c, r) ⊂⊂ B(0, 1), and let u be a function that
is harmonic in a neighborhood of BRN (0, 1). By applying Green’s Identity with
G = BRN (0, 1) \ B(c, r) and v = 12 g B(0,1) (c, · ), use Exercise 11.2.21 to verify
Z
N −1
u(c) = lim r ω, ∇v(c + rω) RN u c + rω) λSN −1 (dω)
r&0 SN −1
Z Z
u ω)π (N ) (c, ω) λSN −1 (dω),
= ω, ∇v(ω) RN u ω) λSN −1 (dω) =
SN −1 SN −1
check that, as R & 1, uR −→ u1 uniformly on B(0, 1), and use the preceding to
conclude that Z
u1 (c) = f (ω) π (N ) (c, ω) λSN −1 (dω),
SN −1
which is, of course, the result that was proved in Theorem 11.1.28.
§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions
The origin of the Green function lies in the theory of electricity and magnetism.
Namely, if G is a region in RN whose boundary is grounded and y ∈ G, then
g G ( · , y) should be the electrical potential in G that results from placing a unit
point charge at y. More generally, if µ is any distribution of charge in G (i.e.,
a non-negative, locally finite, Borel measure on G), then one can consider the
potential GG µ given by
Z
(11.3.1) GG µ(x) = g G (x, y) µ(dy), x ∈ G,
G
ζ B(x,r)
"Z #
(N ) (N )
Wx
u ψ(ζ B(x,r) ) , ζ B(x,r) < ∞ − EWx 1
u(x) = E 2 ∆u ψ(τ ) dτ
0
Z
1
≥ u(x + rω) λSN −1 (dω).
ωN −1 SN −1
and used the rotation invariance of Brownian motion. Hence, each un is exces-
sive, and therefore, since
Z ∞
un (x) = pG (t, x, y) dt % g G (x, y) as n → ∞,
1
n
we are done.
§ 11.3.2. Potentials and Riesz Decomposition. My next goal is to prove
that, apart from the trivial case when u ≡ ∞, every excessive function on G
admits a unique representation in the form GG µ + h for an appropriate choice
of µ and h. The proof requires me to make some preparations.
Lemma 11.3.4. If u ∈ E(G), then either u ≡ ∞ or u is locally integrable on G.
Next, given a u ∈ E(G) that is not identically infinite, there exists a sequence
{un : n ≥ 1} ⊆ Cc∞ (G; R) and a non-decreasing sequence {Gn : n ≥ 1} of
open subsets of G with the properties that Gn ⊂⊂ G, Gn % G, un ≤ u,
∆un ≤ 0 on Gn for each n ≥ 1, and un −→ u pointwise as n → ∞. Moreover,
if µn (dy) = − 12 1Gn (y)∆un (y) dy, then there is a non-negative, locally finite,
Borel measure µ on G such that
Z Z
(11.3.5) lim ϕ dµn = ϕ dµ for all ϕ ∈ Cc (G; R).
n→∞ G G
Proof: To prove the first assertion, let U denote the set of all x ∈ G with the
property that
Z
u(y) dy < ∞ for some r > 0 with B(x, r) ⊂⊂ G.
B(x,r)
490 11 Some Classical Potential Theory
and so, after integrating this with respect to N sN −1 ds over (0, r), we get
Z Z
1 1
u(y) ≥ u(z) dz ≥ u(z) dz = ∞,
ΩN −1 rN B(y,r) ΩN −1 rN B(x,δ)
where δ ≡ r − |y − x|. Hence, we now see that G \ U is also open, and therefore
that either U = G or U = ∅ and u ≡ ∞.
Now assume that u ∈ E(G) is not identically infinite. To construct the required
Gn ’s and un ’s, choose a reference point c ∈ G, set R = 12 |c − G{|, and take ρ ∈
Cc∞ B(0, R4 ); [0, ∞) to be a rotationally invariant function with total integral
where ρ̃ : R −→ [0, ∞) is taken so that ρ(x) = ρ̃ |x| . Similarly, if B(x, r) ⊂⊂
Gn , then
Z
un (x + rω) λSN −1 (dω)
SN −1
Z Z
1
= ρ(z) u x+ nz + rω λSN −1 (dω) dz
B(0, R
4 ) SN −1
Z
ρ(z)u x + n1 z dz = ωN −1 un (x).
≤ ωN −1
B(0, R
4 )
observe that we already know that u(x) ≥ limn→∞ un (x). On the other hand,
because u is lower semicontinuous, an application of Fatou’s Lemma yields
Z
ρ(y) u x + n1 y dy = lim un (x).
u(x) ≤ lim
n→∞ G n→∞
To complete the proof, let µn be the measure described, and note that
t∧ζ Gn
"Z #
(N )
h i (N )
un (x) = EWx un ψ(t ∧ ζ Gn ) − EWx 1
2 ∆un ψ(s) ds
0
t∧ζ Gn
"Z # Z t Z
(N )
Wx 1 Gn
≥ −E 2 ∆un ψ(s) ds = p (s, x, y) µn (dy) ds
0 0 Gn
for all n ∈ Z+ and (t, x) ∈ (0, ∞) × Gn . Hence, after letting t % ∞, we see that
Z
u(x) ≥ un (x) ≥ g Gn (x, y) µn (dy), n ∈ Z+ and x ∈ Gn .
Gn
for every pair σ and τ of Bt : t ∈ [0, ∞) -stopping times with σ ≤ τ . In
particular, if u ∈ E(G) and B(x, r) ⊂⊂ G, then, for any rotationally symmetric
ρ ∈ Cc B(0, r); [0, ∞) with total integral 1,
Z
t ∈ (0, 1) 7−→ ρ(y) u(x + ty) dy ∈ [0, ∞]
B(0,r)
is a non-increasing function.
492 11 Some Classical Potential Theory
Proof: Let u ∈ E(G) be given. Clearly (11.3.9) is trivial in the case when
u ≡ ∞. Thus, assume that u 6≡ ∞, and define Gn and un for n ∈ Z+ as in
(11.3.7). Because ∆un Gn ≤ 0, we know that
(N )
h i
EWx un ψ(τ ∧ ζ Gm ∧ T ) , σ(ψ) ∧ T < ζ Gm (ψ)
(N )
h i
≤ EWx un ψ(σ ∧ T ) , σ(ψ) ∧ T < ζ Gm (ψ)
for all 1 ≤ m ≤ n, x ∈ Gm , and T ∈ [0, ∞). Next, after noting that ζ Gm < ∞
(N )
Wx -almost surely, let T % ∞ in the preceding, and arrive at
(N )
h i (N )
h i
EWx un ψ(τ ∧ζ Gm ) , σ(ψ) < ζ Gm (ψ) ≤ EWx un ψ(σ) , σ(ψ) < ζ Gm (ψ) .
Hence, by the Monotone Convergence Theorem, for any locally finite, non-
negative, Borel measure ν on G,
Z ZZ Z
Gm
(*) u(x) ν(dx) = lim g (x, y) ν(dx)dµn (y) + wm (x) ν(dx),
Gm n→∞ Gm
G2m
(N )
where wm (x) = EWx u ψ(ζ Gm ) , ζ Gm < ∞ .
But, since Gm is the intersection of two sets, both of which (cf. part (iv) in
Exercise 10.2.19) are regular, and is therefore regular as well, there is an n(a) ≥
m for which ϕn is continuous whenever n ≥ n(a). In particular, by (11.3.5), we
can now say that
Z Z Z
ρn (x − a) u(x) dx = ϕn (x) µ(dx) + ρn (x − a) wm (x) dx
Gm G Gm
the same
time, it is clear that the second term on the right goes to wm (a) and
that ϕn (y) : n ≥ n(a) tends non-decreasingly to g Gm (a, y). Thus, we have
now proved that
(**) u = GGm µ + wm on Gm for every m ∈ Z+ .
Starting from (**), the rest of the proof is quite easy. Namely, fix x ∈ G,
choose m so that x ∈ Gm , note that, g Gn (x, · ) is non-decreasing as n ≥ m
increases, and conclude that GGn∨m µ(x) % GG µ(x). Hence, by (**) (alter-
natively, by (11.3.9)), we know that wm∨n (x) tends non-increasingly to a limit
h(x), which Harnack’s Principle guarantees to be harmonic as a function of
x ∈ G. Thus, after passing to the limit as m → ∞ in (**), we conclude that
(11.3.11) holds with the µ satisfying (11.3.6) and h = limm→∞ H Gm u.
To prove that these quantities are unique, note that if ν is any locally finite,
non-negative, Borel measure on G for which u − GG ν is a non-negative harmonic
function, then, for every ϕ ∈ Cc∞ (G; R), simple integration by parts plus the
symmetry of g G shows that
Z Z Z
1 1 G
−2 ∆ϕu dx = − 2 ∆G ϕ dν = ϕ dν.
G G G
That is, ν must satisfy (11.3.6); and so we have now derived the required unique-
ness result.
Finally, to check the asserted characterization of h, suppose that v is a non-
negative harmonic function that is dominated by u on G. We then have
(N )
v(x) = EWx v ψ(ζ Gm ) , ζ Gm (ψ) < ∞ ≤ wm (x) for m ∈ Z+ and x ∈ Gm ,
and therefore the desired conclusion follows from the fact that wm tends to h.
By combining Lemma 11.3.2 with Theorem 11.3.10, we arrive at the following
characterization of potentials.
Corollary 11.3.12. Let everything be as in Theorem 11.3.10, and suppose
that u : G −→ [0, ∞] is not identically infinite. Then a necessary and sufficient
condition for u to be the potential GG µ of some locally finite, non-negative,
Borel measure µ on G is that u be excessive on G and have the property that
the constant function 0 is the only non-negative harmonic function on G that is
dominated by u.
Let u be an excessive function on G that is not identically infinite. In keeping
with the electrostatic metaphor, I will call the measure µ entering the Riesz de-
composition (11.3.11) of u the charge determined by u. A more mathematical
interpretation is provided by Schwartz’s theory of distributions. Namely, when
u ∈ E(G) is not identically infinite, it is (cf. Lemma 11.3.4) locally integrable on
G, and, as such, it determines a distribution there. Moreover, in the language
of distribution theory, (11.3.6) says that µ = − 12 ∆u. However, the following
theorem provides a better way of thinking about µ.
§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions 495
Moreover, if u ∈ E(G) is not identically infinite and, for s ∈ (0, ∞), µs (dx) =
fs (x) dx, where fs (x) = u(x)−u
s
s (x)
, then, as s & 0, {µs : s > 0} tends to the
charge µ of u in the sense that
Z Z
ϕ(x) µ(dx) = lim ϕ(x) µs (dx) for all ϕ ∈ Cc (G; R).
G s&0 G
Proof: If u ∈ E(G), then, by the first part of Lemma 11.3.8 with τ = s and
σ = 0, one sees that u ≥ us . Conversely, suppose that u : G −→ [0, ∞] is lower
semicontinuous, not identically infinite, and satisfies u ≥ us for all s > 0. Then,
since pG (s, x, · ) > 0, u is locally integrable on G. Thus, if B(c, r) ⊂⊂ G and
Z
ws (x) = u(y)pB(c,r) (s, x, y) dy,
B(c,r)
for (s, t) ∈ (0, ∞)2 and x ∈ B(c, r). Hence, if ϕ ∈ Cc2 B(c, r); [0, ∞) , then
Z Z
1
− 12 ∆ws (x)ϕ(x) dx = lim
ws (x) − ws+t (x) ϕ(x) dx ≥ 0,
B(c,r) t&0 s B(c,r)
which proves that ∆ws ≤ 0 on B(c, r). Since this means that ws ∈ E B(c, r)
for each s > 0 and because
ws is non-increasing as a function of s, we will know
that u ∈ E B(c, r) once we show that ws −→ u pointwise on B(c, r). But,
since ws ≤ u, this comes down to checking u(x) ≤ lims&0 ws (x), which follows
from lower semicontinuity.
496 11 Some Classical Potential Theory
Turning to the second assertion, begin with the observation that, because
u ≥ us and u is lower semicontinuous, us −→ u pointwise as s & 0. Next, note
that for (s, x) ∈ (0, ∞) × G,
"Z #
Z s Z T +s
G 1
g (x, y)fs (y) dy = lim ut (x) dt − ut (x) dt
G T →∞ s 0 T
1 s
Z
≤ ut (x) dt ≤ u(x).
s 0
Z s Z
1 G
ϕs − ϕ = 2 ∆ϕ(y)p (τ, · , y) dy dτ,
0 G
and so, by Fubini’s Theorem and the symmetry of pG (τ, x, y), one can justify
Z Z Z s
1
ϕ dµs = − uτ (y) dτ
∆ϕ(y) dy
G 2s G 0
Z Z
1
−→ − 2 ∆ϕ(y)u(y) dy = ϕ dµ.
G G
for all x ∈ R3 . Of course, it is not at all obvious that such a µK exists. Indeed,
the proof that it always does was one of Wiener’s significant contributions to
classical potential theory. As we are about to see, probability provides a simple
proof of Wiener’s result.1
§ 11.4.1. The Capacitory Potential. Here I will show that the extremal
problem described above has a solution.
Theorem 11.4.1. Assume that G is a connected, open subset of RN and that
either N ≥ 3 or (11.1.20) holds. Given K ⊂⊂ G, set
pG (N )
∃t ∈ 0, ζ G (ψ) ψ(t) ∈ K , x ∈ G.
(11.4.2) K (x) = Wx
Then pG G
K is a potential whose charge µK is supported on K. Moreover, if µ ∈
M(G) is supported on K and GG µ ≤ 1, then GG µ ≤ pG K.
Proof: I begin by checking that pG K is excessive. For this purpose, note that,
for any s > 0, the Markov property says that
Z
pG G (N )
∃t ∈ s, ζ G (ψ) ψ(t) ∈ K ≤ pG
K (y)p (s, x, y) dy = Wx K (x).
G
In addition, because pG
K is bounded, the left-hand side is continuous with respect
to x ∈ G, and clearly the middle expression tends non-decreasingly to pG K (x) as
s & 0. Thus, by the first part of Theorem 11.3.13, we now know that pGK ∈ E(G).
1It is interesting to note that, although Wiener’s 1924 article, “Certain notions in potential
theory,” J. Math. Phys. M.I.T. 4, contains the first proof that an arbitrary compact set is
capacitable, it contains no reference to his own measure.
498 11 Some Classical Potential Theory
That is, pG
K satisfies the mean value property in G \ K and is therefore harmonic
there.
To complete the proof I must still show that if µ ∈ M(G) is supported on
K and u ≡ GG µ ≤ 1, then u ≤ pG K , and I will start by showing that u ≤ pK
G
The function pG G
K and the measure µK are, for the reasons explained above,
known as, respectively, the capacitory potential and the capacitory distri-
bution for K in G, and the total mass
(11.4.3) Cap(K; G) ≡ µG
K (K)
there exists (cf. Corollary 11.2.20 when N = 2) a C < ∞ such that g G (x, y) ≤
B(0,R)
g G∩B(0,R) (x, y) + C for all x ∈/ B(0, R) and y ∈ K. Hence, GG µK ≤
500 11 Some Classical Potential Theory
B(0,R)
1 + CCap K, B(0, R) , and so we have shown that GG µK is a non-zero,
bounded potential on G whose charge is supported in K, which, by the preceding
equivalences, means that Cap(K; G) > 0. Conversely, if Cap(K; G) > 0, then,
again by the preceding equivalences, we know that pG K > 0everywhere on G,
(N )
which, of course, means that Wx ∃t ∈ (0, ∞) ψ(t) ∈ K > 0, first for all
x ∈ G and then for all x ∈ RN .
The last part of the preceding allows us to use capacity to determine whether
Brownian paths will hit a K ⊂⊂ RN . Indeed, we now know that they will if
and only if Cap(K; G) > 0 for some G ⊃⊃ K satisfying our hypotheses. Thus,
the ability of Brownian paths in RN to hit a set is completely determined by
the singularity in the Green function. Namely, they will hit K with positive
probability if and only if there is a non-zero µ supported on K for which GG µ
is bounded. When N = 1, there is no singularity, and so even points can be hit.
When N ≥ 2, there is a singularity, and so, in order to be hit, K has to be large
enough to support a measure that is sufficiently smooth to mollify the singularity
in the Green function. Non-trivial (i.e., K’s for which K{ is the interior of its
closure) examples of K’s that cannot be hit are hard to come by. “Lebesgue’s
spine” provides one in R3 and can be adapted to RN for N ≥ 3. When N = 2
one has too work much harder. The most famous example is a devilishly clever
construction, known as “Littlewood’s crocodile,” due to J.E. Littlewood. See M.
Brelot’s lecture notes Éléments de la Théorie Classique du Potenial published
in 1965 by Centre de Documentation Universitaire, Sorbonne, Paris V.
§ 11.4.2. The Capacitory Distribution. In this subsection I will give a prob-
abilistic representation, discovered by K.L. Chung, of the capacitory distribution
µG N
K . Again I assume that G is a connected open subset of R and that either
N ≥ 3 or (11.1.20) holds.
The function `G N
K : C(R ) −→ [0, ∞] given by
`G G
K (ψ) = sup t ∈ 0, ζ (ψ) : ψ(t) ∈ K
(11.4.5)
≡ 0 if t ∈ 0, ζ G (ψ) : ψ(t) ∈ K = ∅ .
Cap(K; G) > 0. Then, for all Borel measurable ϕ : G −→ R that are bounded
below and every c ∈ G,
" #
ϕ ψ(`GK)
Z
(N )
G Wc G
, `K ∈ (0, ∞) .
(11.4.7) ϕ dµK = E
G g G c, ψ(`G
K)
(N )
h i
−→ EWc ϕ ψ(`G
G
K ) , `K ∈ (0, ∞) as s & 0,
where, in the passage to the third line, I have applied the Markov property and
used the time-shift property of `G K . Next, let η ∈ Cc (G; R) be given, note that
η
ϕ = gG (c, ·)
is again an element of Cc (G; R), and conclude from Theorem 11.3.13
and the preceding that (11.4.7) holds first for ϕ’s in Cc (G; R) and then for all
bounded, measurable ϕ’s on G.
Aside from its intrinsic beauty, (11.4.7) has the virtue that it simplifies the
proofs of various important facts about capacity. For instance, it allows one to
prove a basic monotone convergence result for capacity. However, before doing
so, I will need to introduce the the energy E G (µ, ν), which is defined for locally
finite, non-negative Borel measures µ and ν on G by
ZZ
E G (µ, ν) = g G (x, y) µ(dx)ν(dy).
G2
Clearly E G (µ, ν) is some sort of inner product, and so it is not surprising that
there is a Schwarz inequality for it.
Lemma 11.4.8. For any pair of locally finite, non-negative, Borel measures µ
and ν on G, q q
E G (µ, ν) ≤ E G (µ, µ) E G (ν, ν);
and, when the factors on the right are both finite, equality holds if and only if
aµ − bν = 0 for some pair (a, b) ∈ [0, ∞)2 \ (0, 0).
502 11 Some Classical Potential Theory
(0,∞)×G (0,∞)×G
12 12
ZZ ZZ
= f (t, x) dtdx g(t, x) dtdx
(0,∞)×G (0,∞)×G
q q
= E G (µ, µ) E G (ν, ν).
Furthermore, when f and g are square integrable, then equality holds if and only
if they are linearly dependent in the sense that af − bg = 0 Lebesgue-almost
everywhere for some non-trivial choice of a, b ∈ [0, ∞). But this means that
Z Z T Z
a G
a ϕ dµ = lim ϕ(x)p (t, x, y) µ(dx) dt
G T &0 T 0 G
ZZ ZZ
a b
= lim ϕ(x) f (t, x) dtdx = lim ϕ(x) g(t, x) dtdx
T &0 T T &0 T
(0,T ]×G (0,T ]×G
Z T Z Z
b
= lim ϕ(x)pG (t, x, y) ν(dx) dt = b ϕ dν
T &0 T 0 G G
Z 12 q
1 p
= pG
K dµG
K E G
(µ, µ) 2 ≤ Cap(K; G) E G (µ, µ),
G
and equality can hold only if aµG K − bµ = 0 for some non-trivial pair (a, b) ∈
[0, ∞)2 . When one takes µ = µG K this, in conjunction with the preceding, proves
,
that Cap(K; G) = E G µG K , µG
K . In addition, for any µ with µ(G \ K) = 0 and
G G
G µ ≤ 1, it shows that E (µ, µ) ≤ Cap(K; G) and that equality can hold only
if µ and µGK are related by a non-trivial linear equation,
in which case µ = µG K
G G G G
follows immediately from the equality E µK , µK = E (µ, µ).
The result in Theorem 11.4.9, which was known to Wiener, played an impor-
tant role in his analysis of classical potential theory. To be more precise, when
3
3
N = 3 and K{ is regular, pR K is the continuous function on R that is harmonic
off K, is 1 on K, and tends to 0 at infinity. Thus, it is a relatively simple prob-
lem to define the capacitory distribution for such K’s in R3 . The importance
to Wiener of results like that in Theorem 11.4.9 is that they enabled him (cf.
Exercise 11.4.20) to make a consistent assignment of capacity to K’s for which
K{ is not necessarily regular.
§ 11.4.3. Wiener’s Test. This subsection is devoted to another of Wiener’s
famous contributions to classical potential theory.
As was pointed out following Corollary 11.4.4, capacity can be used to test
whether Brownian paths will hit a compact set K. By Lemma 11.1.21, an
equivalent statement is that capacity can be used to test whether ∂reg (K{) is
empty or not. The result of Wiener that will be proved here can be viewed as a
sharpening of this remark.
Assume that N ≥ 2, and let an open subset G of RN and an a ∈ ∂G be given.
For n ∈ Z+ , set
n o
Kn = y ∈ / G : 2−n−1 ≤ |y − a| ≤ 2−n ,
and define
nCap Kn ; B(a, 1) if N = 2
(11.4.10) Wn (a, G) =
2n(N −2) Cap Kn ; B(a, 1)
if N ≥ 3.
Then Wiener’s test says that
∞
X
(11.4.11) a ∈ ∂reg G ⇐⇒ Wn (a, G) = ∞.
n=1
§ 11.4 Capacity 505
Notice that, at least qualitatively, (11.4.11) is what one should expect in that
the divergence of the series is some sort of statement that G{ is robust at a.
The key to my proof of Wiener’s test is the trivial observation that because
Z
B(a,1) B(a,1)
pn (x) ≡ pKn (x) = g B(a,1) (x, y) µKn (dy),
Kn
Hence, in probabilistic terms, Wiener’s test comes down to the assertion that
∞
X
Wa(N ) G
Wa(N ) An = ∞,
ζ0+ = 0 = 1 ⇐⇒
1
where An is the set of ψ ∈ C(RN ) that visit Kn before leaving B(a, 1). Actually,
although the preceding equivalence is not obvious, the closely related statement
Wa(N ) ζ0+
G
= 0 = 1 ⇐⇒ Wa(N ) lim An > 0
(11.4.12)
n→∞
G
is essentially immediate. Indeed, if ψ(0) = a and ζ0+ (ψ) = 0, then there
exists a sequence of times tm & 0 with the property that ψ(tm ) ∈ B(a, 1) ∩
G{ for all m, from which it is clear that ψ visits infinitely many Kn ’s before
leaving B(a, 1). Hence, the “ =⇒ ” in (11.4.12) is trivial. As for the opposite
N B(a,1)
implication,
B(a,1) that ψ ∈ C(R ) has the properties that ζ
suppose (ψ) < ∞,
t ∈ 0, ζ (ψ) : ψ(t) = a} = {0}, and that ψ visits infinitely many Kn ’s
before leaving B(a, 1). We can then find a subsequence {nm : m ≥ 1} and
a convergent sequence of times tm > 0 such that ψ(tm ) ∈ Knm for each m.
Clearly, limm→∞ ψ(tm) = a, and therefore
limm→∞
tm = 0. In other words, if
ζ B(a,1) (ψ) < ∞, t ∈ 0, ζ B(a,1) (ψ) : ψ(t) = a = {0}, and ψ ∈ limn→∞ An ,
G
then ζ0+ (ψ) = 0. Hence, since N ≥ 2 and therefore
(N ) G
and therefore, because Wa ζ0+ = 0 ∈ {0, 1}, we have proved the equivalence
in (11.4.12).
506 11 Some Classical Potential Theory
In view of the preceding paragraph, the proof of Wiener’s test reduces to the
problem of showing that
∞
X
Wa(N ) Wa(N ) An = ∞.
(11.4.13) lim An > 0 ⇐⇒
n→∞
1
Proof: Because
∞
X ∞
X
P An = ∞ =⇒ P And+k = ∞ for some 0 ≤ k < d,
n=1 n=1
whereas
P lim An ≥ P lim And+k for each 0 ≤ k < d,
n→∞ n→∞
1
I will assume that P(An ) ≤ 4C for all n ∈ Z+ . In particular, these assumptions
mean that, for each m ∈ Z+ , we can find an nm > m such that
nm
X 3 1
sm ≡ P A` ∈ 4C ,C .
`=m
Pn 1
Indeed, simply take nm to be the largest n > m for which `=m P A` ≤ C.
At the same time, by an easy induction argument on n > m, one has that
n
! n
[ X 1 X
P A` ≥ P A` − P Ak ∩ A`
2
`=m `=m m≤k6=`≤n
§ 11.4 Capacity 507
for all m ∈ Z+ .
Proof of Wiener’s Test: All that remains is to check that the sets An
(N )
appearing in (11.4.13) satisfy the hypothesis in Lemma 11.4.14 when P = Wa .
To this end, set n o
σn (ψ) = inf t ∈ (0, ∞) : ψ(t) ∈ Kn .
Clearly, An = σn < ζ B(a,1) , and so
as the amount of heat that flows into K during [0, t] from outside.
3See Electrostatic capacity, heat flow, and Brownian motion, in Z. Wahrsh. Gebiete. 3. Re-
cently, M. Van den Burg has written several papers in which he greatly refines Spitzer’s result.
508 11 Some Classical Potential Theory
EK (t)
lim = Cap(K; RN ).
t→∞ t
Proof: Because, by the second part of Lemma 11.1.5,
To see this, notice that there would be nothing to do if the integral were over
(N )
K{. On the other hand, by part (ii) of Exercise 10.2.19, Wx (σK > 0) = 0
Lebesgue-almost everywhere on K, and so the integral over K does not con-
tribute anything.
I now want to replace the preceding by
Z
Wy(N ) σK ≤ h and σK h
(*) EK (t) − EK (t − h) = > t dy,
RN
where
h
σK (ψ) ≡ inf s ∈ (h, ∞) : ψ(s) ∈ K
is the first entrance time into K after time h. To prove (*), set
(x,y) t−s s
θt (s) = x + θt (s) + y, s ∈ [0, t],
t t
Wx(N ) t − h < σK ≤ t
Z
(x,y)
= W (N ) t − h < σK θt ≤ t g (N ) (t, y − x) dy
N
ZR
(y,x) (y,x)
= W (N ) σK θt ≤ h and σK h
θt > t g (N ) (t, y − x) dy,
RN
Starting from (*), one has that, for each h ∈ [0, ∞),
∆K (h) ≡ lim EK (t + h) − EK (t)
t→∞
Z
= Wy(N ) σK ≤ h and σK h
= ∞ dy,
RN
Wy(N ) σK ≤ h and σK h
= ∞ = Wy(N ) σKh
= ∞ − Wy(N ) σK = ∞
Z
N N
= Wy(N ) σK < ∞ − Wy(N ) σK h
g (N ) (h, y − ξ)pR
< ∞ = pR K (y) − K (ξ) dξ.
RN
Finally, combine these with Theorem 11.3.13 to arrive at ∆K (1) = Cap K; RN .
To complete the proof, set ]t[= t − btc and write
[t]
X
EK (t) = EK ]t[ + EK ]t[ +n − EK ]t[ +n − 1 .
n=1
Using this together with ∆K (h) = hCap(K; G), one obtains the desired re-
sult.
The next two computations provide asymptotic formulas as t % ∞ for the
(N )
quantity Wx σK ∈ (t, ∞) .
Theorem 11.4.17.4 If N ≥ 3 and K ⊂⊂ RN , then, as t % ∞,
N
2Cap(K; RN ) 1 − pR
K (x) 1− N
pK (t, x) ≡ Wx(N ) σK ∈ (t, ∞) ∼ N t 2
(2π) 2 (N − 2)
uniformly for x in compacts.
4 This result was conjectured by Kac and first proved by his student A. Joffe. However, I will
follow the argument given by F. Spitzer in the article cited above.
510 11 Some Classical Potential Theory
Proof: Without loss in generality (cf. Corollary 11.4.4), I will assume that
N
Cap(K; RN ) > 0. Next, set pK (x) = pR
K (x) and pK (t, x, y) = p
K{
(t, x, y), and
note that, by the Markov property,
Z
pK (t, x) = pK (y) pK (t, x, y) dy.
K{
N
Thus, since pK (t, x, y) ≤ (2πt)− 2 , we know that
Z
N
−1
lim sup t p (t, x) − p (y) p (t, x, y) dy =0
2
K K K
t→∞ x∈RN |y|≥R
for every R > 0 with K ⊂⊂ B(0, R). At the same time, because
Z
3 N
pK (y) = g R (x, y) µR
K (dx),
K
it is clear that
2Cap(K; RN )
lim |y|N −2 pK (y) = .
|y|→∞ (N − 2)ωN −1
Hence, we have now shown that
N Z
N
2Cap(K; R ) pK (t, x, y)
lim sup t −1 pK (t, x) − dy =0
2
t→∞ x∈RN (N − 2)ωN −1 |y|≥R |y| N −2
for each R ∈ (0, ∞) with K ⊂⊂ B(0, R), and what we must still prove is that
N Z pK (t, x, y) ωN −1 (N )
2 −1
(*) lim sup t dy − W (σ = ∞)=0
N −2 N x K
t→∞ |x|≤r |y|≥R |y| (2π) 2
where
g (N ) (t, y − x)
Z
q(t, x) ≡ dy for (t, x) ∈ (0, ∞) × RN .
|y|≥R |y|N −2
§ 11.4 Capacity 511
After changing to polar coordinates and making a change of variables, one can
easily check that, for each T ∈ [0, ∞),
N ωN −1
lim sup t 2 −1 q(t − s, x) − = 0.
N
t→∞ 0<s≤T (2π) 2
|x|≤r
then it becomes clear that (*) will follow once we check that
lim sup Wx(N ) σK ∈ (T, ∞) = 0 and
T →∞ x∈RN
(**) N (N )
h i
sup t 2 −1 EWx q t − σK , ψ(σK ) , σK ∈ (T, t) = 0.
lim
T →∞ t>T
x∈RN
To check the first part of (**), note that, by the Markov property,
Z
Wx(N ) σK ∈ (T, T + 1] = pK (T, x, y)Wy(N ) σK ≤ 1 dy
K{
Z
−N N
Wy(N ) σK ≤ 1 dy ≤ CT − 2 ,
≤ (2πT ) 2
RN
X∞
Wx(N ) σK ∈ (T, ∞) ≤ Wx(N ) σK ∈ (T + n, T + n + 1] ,
n=0
(N )
we see that, as T → ∞, Wx σK ∈ (T, ∞) −→ 0 uniformly with respect to
x ∈ RN .
To handle the second part of (**), note that there is a constant A ∈ (0, ∞)
for which
N
q(t, x) ≤ A (t ∨ 1)1− 2 , (t, x) ∈ (0, ∞) × K,
512 11 Some Classical Potential Theory
and therefore
N (N )
t 2 −1 EWx
q t − σK , ψ(σK ) , σK ∈ (T, t)
N
−1
Wx(N ) σK ∈ [t] − 1, t
≤ At 2
[t]−1
N
X
(t − `)1− 2 Wx(N ) σK ∈ (` − 1, `]
+
`=[T ]
[t]−1
N
−N N N N
X
−1 −1
≤ ACt 2 ([t] − 1) 2 + ACt 2 (t − `)1− 2 (` − 1)− 2 ,
`=[T ]
where the C is the same as the one that appeared in the derivation of the first
part of (**). Thus, everything comes down to verifying that
n−1
N N N
X
lim sup n 2 −1 (n − `)1− 2 `− 2 = 0.
m→∞ n>m
`=m
2
But, by taking m = m N −1 and considering
N N N N
X X
(n − `)1− 2 `− 2 and (n − `)1− 2 `− 2
m≤`≤(1−m )n (1−m )n≤`≤n
n−1
N N N
X
n 2 −1 (n − `)1− 2 `− 2 ≤ Bm .
`=m
2πhK (x)
Wx(2) σK > t ∼ for each x ∈ R2 \ K.
log t
5 This theorem is taken from G. Hunt’s article Some theorems concerning Brownian motion,
T.A.M.S. 81, pp. 294–319 (1956). With breathtaking rapidity, it was followed by the articles
referred to in § 11.1.4.
§ 11.4 Capacity 513
Proof: The strategy of Hunt’s proof is to deal with the Laplace transform
Z ∞ (2)
e−αt W (2) σK > t dt = α−1 1 − EWx e−ασK ,
0
show that
log α1 (2)
(*) lim 1 − EWx e−ασK = hK (x),
α&0 2π
log t t (2)
Z
lim Wx σK > τ dτ = hK (x)
t→∞ 2πt 0
and then, because t W (2) σK > t is non-increasing, that the asserted result
holds. Thus, everything comes down to proving (*).
Set G = R2 \ K. By assumption, G satisfies the hypotheses of Theorem
11.2.14. Now let x ∈ G be given, and choose y ∈ G \ {x} from the same
connected component of G as x. Then pG (t, x, y) > 0 for all t ∈ (0, ∞). In
addition, by (10.3.8), for each α ∈ (0, ∞),
Z ∞
e−αt pG (t, x, y) dt
0
Z ∞ Z ∞
(N )
−αt (2) Wx −ασK −αt (2)
= e g t, y − ψ(σK ) dt − E e e g (t, y − x) dt .
0 0
Writing
Z 1 Z ∞
t exp −βt − t−1 dt +
−1
t−1 e−βt exp −t−1 − 1 dt
2πf (β) =
0 1
Z ∞
+ t−1 e−t dt,
β
where Z ∞
1
κ= e−t log t dt.
π 0
1 1 (2)
g G (x, y) = − log |y − x| + EWx log |y − ψ(σK )|, σK < ∞
π π
log α1 (2)
+ 1 − EWx e−ασK + o(1)
2π
as α & 0. Finally, after comparing this to (11.2.16), we arrive at (*).
Let K ⊂⊂ RN be as in the preceding theorem, and choose some c ∈ K{. By
comparing the result just obtained to (11.2.15), we see that
(2)
Wx σK > t
lim (2) = 2 for each x ∈ K{.
t→∞ Wx σK > ζ BR2 (c,t)
Further, say that Γ ∈ BRN has capacity zero if there is no tame µ ∈ M(RN )
for which µ(Γ) > 0.
(i) If K ⊂⊂ RN , show that K has capacity 0 if and only if Cap K; B(0, R) = 0
for some R > 0 with K ⊂⊂ B(0, R). Further, show that if K has capacity 0, G
is open with K ⊂⊂ G, and either N ≥ 3 or (11.1.20) holds, then Cap(K; G) = 0.
(ii) If Γ ∈ BRN , show that Γ has capacity 0 if and only if every compact K ⊆ Γ
has capacity 0.
(iii) For any open G ⊆ RN , show that ∂G \ ∂reg G has capacity 0.
Exercises for § 11.4 515
General
517
518 Notation
Measure Theoretic
Wiener Measure
Gaussian or normal distribution with mean m and co-
γm,C §2.3.1
variance C
Potential Theoretic
521
522 Index
M Hermite, 98
marginal distribution, 83
N
Markov property, 417
martingale, 205 Nelson’s Inequality, 106
application to Fourier series, 263 non-degenerate, 306
continuous parameter, 267 non-negative definite function, 119
complex, 267 non-negative linear functional, 374
Gundy’s decomposition of, 227 normal law, 23
Hahn decomposition of, 227 fixed point characterization, 91
reversed, 217 Lévy–Cramér Theorem, 66
Banach-valued case, 241 standard, 23
on σ-finite measure space, 233 null set, see P-null set
martingale convergence
continuous parameter, 271 O
Hilbert-valued case, 243
operator
Marcinkewitz’s Theorem, 207
Fourier, 100
preliminary version for Banach space, 239
hypercontractive, 105
second proof, 226
lowering, 97
third proof, 227
raising, 96
via upcrossing inequality, 214
optional stopping time, 280
maximal function
Ornstein–Uhlenbeck process, 344
Hardy–Littlewood, 235
ancient, 345
Hardy–Littlewood inequality, 236
associated martingales, 415
maximum principle of Phragmén– Lin-
Gaussian description, 344
delöf, 474
Hermite heat kernel, 454
Maxwell distribution for ideal gas, 70
reversible, 346
mean value
in Banach space, 365
Banach space case, 199
vector-valued case, 84
P
measure
invariant, 112 Paley–Littlewood Inequality for Walsh
locally finite, 63 series, 264
non-atomic, 381 Paley–Wiener map, 312
product, 10 as a stochastic integral, 316
pushforward Φ∗ µ of µ under Φ, 12 Parseval’s Identity, 112
measure preserving, 244 path properties, 158
measures absolutely pure jump, 158
consistent family, 383 piecewise constant, 158
tight, 376, 382 Phragmén–Lindelöf, 474
median, 39 pinned Brownian motion, 327
variational characterization, 43 π-system, 8
Mehler kernel, 98 P-null set, 194
minimum principle, 130 Poincaré’s Inequality for Gaussian, 355
strong, 405 Poisson jump process, 168
weak, 404 Itô’s construction of, 390
moment estimate for sums of independent Poisson kernel, 149
random variables, 94 for upper half-space, 429
moment generating function, 23 for ball via Green’s Identity, 487
logarithmic, 25 Poisson measure, 122
multiplier generalized, 171
Bernoulli, 101 simple, 161
526 Index