Probability Theory
An Analytic View, Second Edition
This second edition of Daniel W. Stroock's text is suitable for first-year graduate students with a good grasp of introductory undergraduate probability. It provides a reasonably thorough introduction to modern probability theory with an emphasis on the mutually beneficial relationship between probability theory and analysis. It includes more than 750 exercises and offers new material on Lévy processes, large deviations theory, Gaussian measures on a Banach space, and the relationship between Wiener measure and partial differential equations.
The first part of the book deals with independent random variables, Central Limit
phenomena, the general theory of weak convergence and several of its applications, as
well as elements of both the Gaussian and Markovian theories of measures on function
space. The introduction of conditional expectation values is postponed until the
second part of the book, where it is applied to the study of martingales. This part also
explores the connection between martingales and various aspects of classical analysis
and the connections between a Wiener measure and classical potential theory.
Daniel W. Stroock
Massachusetts Institute of Technology
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo, Mexico City
© Daniel W. Stroock 1994, 2011
A catalog record for this publication is available from the British Library.
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not guarantee
that any content on such Web sites is, or will remain, accurate or appropriate.
This book is dedicated to my teachers:
Contents
Preface   xiii
2.4 An Application to Hermite Multipliers   96
2.4.1. Hermite Multipliers   96
2.4.2. Beckner's Theorem   101
2.4.3. Applications of Beckner's Theorem   105
Exercises for § 2.4   110
Notation   517
Index   521
Preface
tell the reader enough so that he could understand the ideas and not so much
that he would become bored by them. In addition, they gave me an introduction
to a host of ideas and techniques (e.g., stopping times and the strong Markov
property), all of which Kac himself consigned to the category of overelaborated
measure theory. In fact, it would be reasonable to say that my thesis was simply
the application of techniques which I picked up from Dynkin to a problem that
I picked up by reading some notes by Kac. Of course, along the way I profited
immeasurably from continued contact with McKean, a large number of courses
at N.Y.U. (particularly ones taught by M. Donsker, F. John, and L. Nirenberg),
and my increasingly animated conversations with S.R.S. Varadhan.
As I trust the preceding description makes clear, my graduate education was
anything but deprived; I had ready access to some of the very best analysts
of the day. On the other hand, I never had a proper introduction to my field,
probability theory. The first time that I ever summed independent random
variables was when I was summing them in front of a class at N.Y.U. Thus,
although I now admire the magnificent body of mathematics created by A.N.
Kolmogorov, P. Lévy, and the other twentieth-century heroes of the field, I
am not a dyed-in-the-wool probabilist (i.e., what Donsker would have called a
true coin-tosser). In particular, I have never been able to develop sufficient
sensitivity to the distinction between a proof and a probabilistic proof. To me,
a proof is clearly probabilistic only if its punch-line comes down to an argument
like P (A) ≤ P (B) because A ⊆ B; and there are breathtaking examples of such
arguments. However, to base an entire book on these examples would require a
level of genius that I do not possess. In fact, I myself enjoy probability theory
best when it is inextricably interwoven with other branches of mathematics and
not when it is presented as an entity unto itself. For this reason, the reader should not be surprised if he finds that some of the material presented in this book does not belong here; but I hope that he will make an effort to figure out why I disagree with him.
Summary
In spite of the realistic assessment contained in the first paragraph of its preface, when I wrote the first edition of this book I harbored the naïve hope that it might become the standard graduate text in probability theory. By the time that I started preparing the second edition, I was significantly older and far less naïve about its prospects. Although the first edition has its admirers, it has
done little to dent the sales record of its competitors. In particular, the first
edition has seldom been adopted as the text for courses in probability, and I
doubt that the second will be either. Nonetheless, I close this preface with a few
suggestions for anyone who does choose to base a course on it.
I am well aware that, except for those who find their way into the poorly
stocked library of some prison camp, few copies of this book will be read from
cover to cover. For this reason, I have attempted to organize it in such a way that,
with the help of the table of dependence that follows, a reader can select a path
which does not require his reading all the sections preceding the information he
is seeking. For example, the contents of §§ 1.1–1.2, § 1.4, § 2.1, § 2.3, and §§ 5.1–5.2 constitute the backbone of a one-semester, graduate-level introduction to
probability theory. What one attaches to this backbone depends on the speed
with which these sections are covered and the content of the courses for which
the course is the introduction. If the goal is to prepare the students for a career
as a “quant” in what is left of the financial industry, an obvious choice is § 4.3
and as much of Chapter 7 as time permits, thereby giving one’s students a
reasonably solid introduction to Brownian motion. On the other hand, if one
wants the students to appreciate that white noise is not the only noise that they
may encounter in life, one might defer the discussion of Brownian motion and
replace it with the material in Chapter 3 and §§ 4.1–4.2.
Alternatively, one might use this book in a more advanced course. An intro-
duction to stochastic processes with an emphasis on their relationship to partial
differential equations can be constructed out of Chapters 6, 7, 10, and 11, and
§ 4.3 combined with Chapter 8 could be used to provide background for a course
on Gaussian processes.
Whatever route one takes through this book, it will be a great help to your
students for you to suggest that they consult other texts. Indeed, it is a familiar
fact that the third book one reads on a subject is always the most lucid, and so
one should suggest at least two other books. Among the many excellent choices
available, I mention Wm. Feller’s An Introduction to Probability Theory and Its
Applications, Vol. II, and M. Loève's classic Probability Theory. In addition, for
background, precision (including accuracy of attribution), and supplementary
material, R. Dudley’s Real Analysis and Probability is superb.
Table of Dependence

[A chart showing how the blocks of sections depend on one another; the blocks involved are §§ 7.1–7.3, §§ 7.4–7.5, §§ 8.1–8.5, §§ 9.1–9.2, § 9.3, §§ 10.1–10.3, and §§ 11.1–11.4.]
Chapter 1
Sums of Independent Random Variables
In one way or another, most probabilistic analysis entails the study of large
families of random variables. The key to such analysis is an understanding
of the relations among the family members; and of all the possible ways in
which members of a family can be related, by far the simplest is when there
is no relationship at all! For this reason, I will begin by looking at families of
independent random variables.
§ 1.1 Independence
In this section I will introduce Kolmogorov’s way of describing independence
and prove a few of its consequences.
§ 1.1.1. Independent σ-Algebras. Let (Ω, F, P) be a probability space (i.e., Ω is a nonempty set, F is a σ-algebra over Ω, and P is a non-negative measure on the measurable space (Ω, F) having total mass 1), and, for each i from the (non-empty) index set I, let F_i be a sub-σ-algebra of F. I will say that the σ-algebras F_i, i ∈ I, are mutually P-independent, or, less precisely, P-independent, if, for every finite subset {i_1, . . . , i_n} of distinct elements of I and every choice of A_{i_m} ∈ F_{i_m}, 1 ≤ m ≤ n,

P(A_{i_1} ∩ · · · ∩ A_{i_n}) = P(A_{i_1}) · · · P(A_{i_n}).
independent σ-algebras tend to fill up space in a sense made precise by the following beautiful thought experiment designed by A.N. Kolmogorov. Let I be any index set, take F_∅ = {∅, Ω}, and, for each non-empty subset Λ ⊆ I, let

F_Λ = ⋁_{i∈Λ} F_i ≡ σ( ⋃_{i∈Λ} F_i )

be the σ-algebra generated by ⋃_{i∈Λ} F_i (i.e., F_Λ is the smallest σ-algebra containing ⋃_{i∈Λ} F_i). Next, define the tail σ-algebra T to be the intersection over all finite Λ ⊆ I of the σ-algebras F_{Λ∁}. When I itself is finite, T = {∅, Ω} and is therefore P-trivial in the sense that P(A) ∈ {0, 1} for every A ∈ T. The interesting remark made by Kolmogorov is that even when I is infinite, T is P-trivial whenever the original F_i's are P-independent. To see this, for a given non-empty Λ ⊆ I, let C_Λ denote the collection of sets of the form A_{i_1} ∩ · · · ∩ A_{i_n}, where {i_1, . . . , i_n} are distinct elements of Λ and A_{i_m} ∈ F_{i_m} for each 1 ≤ m ≤ n. Clearly C_Λ is closed under intersection and F_Λ = σ(C_Λ). In addition, by assumption, P(A ∩ B) = P(A)P(B) for all A ∈ C_Λ and B ∈ C_{Λ∁}. Hence, by Exercise 1.1.12, F_Λ is independent of F_{Λ∁}. But this means that T is independent of F_F for every finite F ⊆ I, and therefore, again by Exercise 1.1.12, T is independent of

F_I = σ( ⋃ {F_F : F a finite subset of I} ).

Since T ⊆ F_I, it follows that T is independent of itself; that is, P(A) = P(A)² for every A ∈ T, and so T is P-trivial. (See part (iii) of Exercise 5.2.40 and Lemma 11.4.14 for generalizations.)
Proof: The first assertion, which is due to E. Borel, is an easy application of countable additivity. Namely, by countable additivity,

P( lim sup_{n→∞} A_n ) = lim_{m→∞} P( ⋃_{n≥m} A_n ) ≤ lim_{m→∞} Σ_{n≥m} P(A_n) = 0

if Σ_{n=1}^∞ P(A_n) < ∞.
To complete the proof of (1.1.5) when the A_n's are independent, note that, by countable additivity, P( lim sup_{n→∞} A_n ) = 1 if and only if

lim_{m→∞} P( ⋂_{n≥m} A_n∁ ) = P( ⋃_{m=1}^∞ ⋂_{n≥m} A_n∁ ) = P( lim inf_{n→∞} A_n∁ ) = 0.

But, because the A_n's are independent, for each m ∈ Z⁺,

P( ⋂_{n=m}^∞ A_n∁ ) = lim_{N→∞} ∏_{n=m}^N (1 − P(A_n)) ≤ lim_{N→∞} exp( −Σ_{n=m}^N P(A_n) ) = 0

if Σ_{n=1}^∞ P(A_n) = ∞. (In the preceding, I have used the trivial inequality 1 − t ≤ e^{−t}, t ∈ [0, ∞).)
A second, and perhaps more transparent, way of dealing with the contents of the preceding is to introduce the non-negative random variable

N(ω) ≡ Σ_{n=1}^∞ 1_{A_n}(ω) ∈ Z⁺ ∪ {+∞},

which counts the number of n ∈ Z⁺ for which ω ∈ A_n.
are P-independent. If B(E; ℝ) = B((E, B); ℝ) denotes the space of bounded measurable ℝ-valued functions on the measurable space (E, B), then it should be clear that P-independence of {X_i : i ∈ I} is equivalent to the statement that

E^P[ f_{i_1} ∘ X_{i_1} · · · f_{i_n} ∘ X_{i_n} ] = E^P[ f_{i_1} ∘ X_{i_1} ] · · · E^P[ f_{i_n} ∘ X_{i_n} ]

for every finite subset {i_1, . . . , i_n} of distinct elements of I and all bounded measurable choices of the f's. Finally, if

1_A(ω) ≡ 1 if ω ∈ A and 0 if ω ∉ A

denotes the indicator function of the set A ⊆ Ω, notice that the family of sets {A_i : i ∈ I} ⊆ F is P-independent if and only if the random variables 1_{A_i}, i ∈ I, are P-independent.
Thus far I have discussed only the abstract notion of independence and have yet to show that the concept is not vacuous. In the modern literature, the standard way to construct lots of independent quantities is to take products of probability spaces. Namely, if (E_i, B_i, µ_i) is a probability space for each i ∈ I, one sets Ω = ∏_{i∈I} E_i; defines π_i : Ω → E_i to be the natural projection map for each i ∈ I; takes F_i = π_i^{−1}(B_i), i ∈ I, and F = ⋁_{i∈I} F_i; and shows that there is a unique probability measure P on (Ω, F) with the properties that

P( π_i^{−1} Γ_i ) = µ_i(Γ_i) for all i ∈ I and Γ_i ∈ B_i

and the σ-algebras F_i, i ∈ I, are P-independent.

¹ Throughout this book, I use E^P[X, A] to denote the expected value under P of X over the set A. That is, E^P[X, A] = ∫_A X dP. Finally, when A = Ω, I will write E^P[X]. Tonelli's Theorem is the version of Fubini's Theorem for non-negative functions. Its virtue is that it applies whether or not the integrand is integrable.
R(t) = −1 if t − ⌊t⌋ ∈ [0, ½) and R(t) = 1 if t − ⌊t⌋ ∈ [½, 1).

I will now show that the Rademacher functions are P-independent. To this end, first note that every real-valued function f on {−1, 1} is of the form α + βx, x ∈ {−1, 1}, for some pair of real numbers α and β. Thus, all that I have to show is that

E^P[ (α_1 + β_1 R_1) · · · (α_n + β_n R_n) ] = α_1 · · · α_n

for any n ∈ Z⁺ and (α_1, β_1), . . . , (α_n, β_n) ∈ ℝ². Since this is obvious when n = 1, I will assume that it holds for n and need only check that it must also hold for n + 1, and clearly this comes down to checking that

E^P[ F(R_1, . . . , R_n) R_{n+1} ] = 0

for any F : {−1, 1}ⁿ → ℝ. But F(R_1, . . . , R_n) is constant on each of the dyadic intervals I_{m,n} ≡ [m2^{−n}, (m + 1)2^{−n}), 0 ≤ m < 2ⁿ, whereas R_{n+1} integrates to 0 on each I_{m,n}. Hence, by writing the integral over Ω as the sum of integrals over the I_{m,n}'s, we get the desired result.
At this point I have produced a countably infinite sequence of independent
Bernoulli random variables (i.e., two-valued random variables whose range is
usually either {−1, 1} or {0, 1}) with mean value 0. In order to get more general
P(U ≤ t) = (t − a)/(b − a) for t ∈ [a, b].
ε_n(ω) ≡ (1 + R_n(ω))/2, n ∈ Z⁺ and ω ∈ [0, 1),

on ([0, 1), B_{[0,1)}, λ_{[0,1)}). But, as is easily checked (cf. part (i) of Exercise 1.1.11), for each ω ∈ [0, 1), ω = Σ_{n=1}^∞ 2^{−n} ε_n(ω). Hence, the desired conclusion is trivial in this case.
Now let (k, ℓ) ∈ Z⁺ × Z⁺ ↦ n(k, ℓ) ∈ Z⁺ be any one-to-one mapping of Z⁺ × Z⁺ onto Z⁺, and set

Y_{k,ℓ} = (1 + R_{n(k,ℓ)})/2, (k, ℓ) ∈ (Z⁺)².

Clearly, each Y_{k,ℓ} is a {0, 1}-valued Bernoulli random variable with mean value ½, and the family {Y_{k,ℓ} : (k, ℓ) ∈ (Z⁺)²} is P-independent. Hence, by Lemma 1.1.6, each of the random variables

U_k ≡ Σ_{ℓ=1}^∞ Y_{k,ℓ}/2^ℓ, k ∈ Z⁺,

is uniformly distributed on [0, 1), and the U_k's, k ∈ Z⁺, are P-independent.
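The construction above is easy to test numerically. The following Python sketch (mine, not the book's; the particular pairing function n(k, ℓ) is one hypothetical choice) realizes ω through its digits ε_n(ω) = (1 + R_n(ω))/2 and assembles two of the U_k's, whose empirical marginals come out uniform and whose correlation vanishes:

import numpy as np

# A numerical sketch (not from the book): realize omega in [0,1) through its
# Rademacher digits and assemble independent uniforms U_k via a pairing n(k,l).
rng = np.random.default_rng(0)
N, depth = 200_000, 24             # samples, digits used per U_k

def pair(k, l):
    # a one-to-one map of Z+ x Z+ into Z+ (Cantor pairing); any injection works
    return (k + l - 2) * (k + l - 1) // 2 + k

nbits = pair(2, depth) + 1         # enough digits of omega for U_1 and U_2
digits = rng.integers(0, 2, size=(N, nbits))   # eps_n(omega) = (1 + R_n)/2

def U(k):
    # U_k = sum_{l>=1} Y_{k,l} / 2^l with Y_{k,l} = eps_{n(k,l)}(omega)
    return sum(digits[:, pair(k, l) - 1] / 2.0**l for l in range(1, depth + 1))

U1, U2 = U(1), U(2)
print("means ~ 0.5:", U1.mean(), U2.mean())
print("corr ~ 0:", np.corrcoef(U1, U2)[0, 1])

Any injection n(k, ℓ) works equally well here; the Cantor pairing is used only because it is easy to write down.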
Exercises for § 1.1

Exercise 1.1.13. In this exercise I discuss two criteria for determining when random variables on the probability space (Ω, F, P) are independent.
(i) Let X_1, . . . , X_n be bounded, real-valued random variables. Using Weierstrass's Approximation Theorem, show that the X_m's are P-independent if and only if

E^P[ X_1^{m_1} · · · X_n^{m_n} ] = E^P[ X_1^{m_1} ] · · · E^P[ X_n^{m_n} ]

for all m_1, . . . , m_n ∈ ℕ.
(ii) Let X : Ω → ℝ^m and Y : Ω → ℝⁿ be random variables. Show that X and Y are P-independent if and only if

E^P[ exp( √−1 ( (α, X)_{ℝ^m} + (β, Y)_{ℝⁿ} ) ) ] = E^P[ exp( √−1 (α, X)_{ℝ^m} ) ] E^P[ exp( √−1 (β, Y)_{ℝⁿ} ) ]

for all α ∈ ℝ^m and β ∈ ℝⁿ.
Hint: First, show that it suffices to check that E^P[f(X) g(Y)] = E^P[f(X)] E^P[g(Y)] for all f ∈ C_c^∞(ℝ^m; ℂ) and g ∈ C_c^∞(ℝⁿ; ℂ). Second, given such f and g, apply Fourier inversion to write them as Fourier integrals with densities ϕ and ψ, where ϕ and ψ are smooth functions with rapidly decreasing (i.e., tending to 0 as |x| → ∞ faster than any power of (1 + |x|)^{−1}) derivatives of all orders. Finally, apply Fubini's Theorem.
Exercise 1.1.14. Given a pair of measurable spaces (E_1, B_1) and (E_2, B_2), recall that their product is the measurable space (E_1 × E_2, B_1 × B_2), where B_1 × B_2 is the σ-algebra over the Cartesian product space E_1 × E_2 generated by the sets Γ_1 × Γ_2, Γ_i ∈ B_i. Further, recall that, for any probability measures µ_i on (E_i, B_i), there is a unique probability measure µ_1 × µ_2 on (E_1 × E_2, B_1 × B_2) such that

(µ_1 × µ_2)(Γ_1 × Γ_2) = µ_1(Γ_1) µ_2(Γ_2) for Γ_i ∈ B_i.

More generally, for any n ≥ 2 and measurable spaces {(E_i, B_i) : 1 ≤ i ≤ n}, one takes ∏_1ⁿ B_i to be the σ-algebra over ∏_1ⁿ E_i generated by the sets ∏_1ⁿ Γ_i, Γ_i ∈ B_i. In particular, since ∏_1^{n+1} E_i and ∏_1^{n+1} B_i can be identified with (∏_1ⁿ E_i) × E_{n+1} and (∏_1ⁿ B_i) × B_{n+1}, respectively, one can use induction to show that, for every choice of probability measures µ_i on (E_i, B_i), there is a unique probability measure ∏_1ⁿ µ_i on (∏_1ⁿ E_i, ∏_1ⁿ B_i) such that

( ∏_1ⁿ µ_i )( ∏_1ⁿ Γ_i ) = ∏_1ⁿ µ_i(Γ_i), Γ_i ∈ B_i.
for every ∅ ≠ F ⊂⊂ I. Not surprisingly, the probability space

( ∏_{i∈I} E_i, ∏_{i∈I} B_i, ∏_{i∈I} µ_i )

is called the product over I of the spaces (E_i, B_i, µ_i); and when all the factors are the same space (E, B, µ), it is customary to denote it by (E^I, B^I, µ^I), and if, in addition, I = {1, . . . , N}, one uses (E^N, B^N, µ^N).
(i) After noting (cf. Exercise 1.1.12) that two probability measures that agree on a π-system agree on the σ-algebra generated by that π-system, show that there is at most one probability measure on (E_I, B_I) that satisfies the condition in (1.1.15). Hence, the problem is purely one of existence.
(ii) Let A be the algebra over E_I generated by C, and show that there is a finitely additive µ : A → [0, 1] with the property that

µ( π_F^{−1} Γ_F ) = ( ∏_{i∈F} µ_i )(Γ_F), Γ_F ∈ B_F,

for all ∅ ≠ F ⊂⊂ I. Hence, all that one has to do is check that µ admits a σ-additive extension to B_I, and, by a standard extension theorem, this comes down to checking that µ(A_n) ↘ 0 whenever {A_n : n ≥ 1} ⊆ A and A_n ↘ ∅. Thus, let {A_n : n ≥ 1} be a non-increasing sequence from A, and assume that µ(A_n) ≥ ε for some ε > 0 and all n ∈ Z⁺. One must show that ⋂_1^∞ A_n ≠ ∅.
(iii) Referring to the last part of (ii), show that there is no loss in generality to assume that A_n = π_{F_n}^{−1}(Γ_{F_n}), where, for each n ∈ Z⁺, ∅ ≠ F_n ⊂⊂ I and Γ_{F_n} ∈ B_{F_n}. In addition, show that one may assume that F_1 = {i_1} and that F_n = F_{n−1} ∪ {i_n}, n ≥ 2, where {i_n : n ≥ 1} is a sequence of distinct elements of I. Now, make these assumptions, and show that it suffices to find a_ℓ ∈ E_{i_ℓ}, ℓ ∈ Z⁺, with the property that, for each m ∈ Z⁺, (a_1, . . . , a_m) ∈ Γ_{F_m}.
(iv) Continuing (iii), for each m, n ∈ Z⁺, define g_{m,n} : E_{F_m} → [0, 1] so that

g_{m,n}(x_{F_m}) = 1_{Γ_{F_n}}(x_{i_1}, . . . , x_{i_n}) if n ≤ m

and

g_{m,n}(x_{F_m}) = ∫_{E_{F_n \ F_m}} 1_{Γ_{F_n}}(x_{F_m}, y_{F_n \ F_m}) ( ∏_{ℓ=m+1}ⁿ µ_{i_ℓ} )(dy_{F_n \ F_m}) if n > m.

= lim_{n→∞} µ(A_n) ≥ ε,

Finally, check that {a_m : m ≥ 1} is a sequence of the sort for which we were looking at the end of part (iii).
Given a non-empty index set I and, for each i ∈ I, a measurable space (E_i, B_i) and an E_i-valued random variable X_i on the probability space (Ω, F, P), define X : Ω → ∏_{i∈I} E_i so that X(ω)_i = X_i(ω) for each i ∈ I and ω ∈ Ω. Show that {X_i : i ∈ I} is a family of P-independent random variables if and only if X_*P = ∏_{i∈I} (X_i)_*P. In particular, given probability measures µ_i on (E_i, B_i), set

Ω = ∏_{i∈I} E_i, F = ∏_{i∈I} B_i, P = ∏_{i∈I} µ_i,

let X_i : Ω → E_i be the natural projection map from Ω onto E_i, and show that {X_i : i ∈ I} is a family of mutually P-independent random variables such that, for each i ∈ I, X_i has distribution µ_i.
Exercise 1.1.17. Although it does not entail infinite product spaces, an inter-
esting example of the way in which the preceding type of construction can be
effectively applied is provided by the following elementary version of a coupling
argument.
(i) Let (Ω, B, P) be a probability space and X and Y a pair of P-square integrable ℝ-valued random variables with the property that

(X(ω_1) − X(ω_2))(Y(ω_1) − Y(ω_2)) ≥ 0 for all (ω_1, ω_2) ∈ Ω².

Show that

E^P[XY] ≥ E^P[X] E^P[Y].

Hint: Define X_i and Y_i on Ω² for i ∈ {1, 2} so that X_i(ω) = X(ω_i) and Y_i(ω) = Y(ω_i) when ω = (ω_1, ω_2), and integrate the inequality

0 ≤ (X(ω_1) − X(ω_2))(Y(ω_1) − Y(ω_2)) = (X_1(ω) − X_2(ω))(Y_1(ω) − Y_2(ω))

with respect to P².
(ii) Suppose that n ∈ Z⁺ and that f and g are ℝ-valued, Borel measurable functions on ℝⁿ that are non-decreasing with respect to each coordinate (separately). Show that if X = (X_1, . . . , X_n) is an ℝⁿ-valued random variable on a probability space (Ω, B, P) whose coordinates are mutually P-independent, then

E^P[ f(X) g(X) ] ≥ E^P[ f(X) ] E^P[ g(X) ].

Hint: First check that the case when n = 1 reduces to an application of (i). Next, describe the general case in terms of a multiple integral, apply Fubini's Theorem, and make repeated use of the case when n = 1.
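For readers who like to see such inequalities in action, here is a small numerical sanity check of part (ii) (a Python sketch of mine, not part of the exercise; f and g below are sample coordinatewise non-decreasing choices):

import numpy as np

# Check (not from the book): for independent coordinates and coordinatewise
# non-decreasing f and g, E[f(X) g(X)] >= E[f(X)] E[g(X)], here with n = 3.
rng = np.random.default_rng(1)
X = rng.standard_normal((500_000, 3))    # independent coordinates

f = np.tanh(X).sum(axis=1)               # non-decreasing in each coordinate
g = np.maximum(X, 0.0).sum(axis=1)       # likewise non-decreasing

lhs = (f * g).mean()
rhs = f.mean() * g.mean()
print(lhs, ">=", rhs, lhs >= rhs)        # holds up to Monte Carlo error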
Exercise 1.1.18. A σ-algebra is said to be countably generated if it contains
a countable collection of sets that generate it. The purpose of this exercise is to
show that just because a σ-algebra is itself countably generated does not mean
that all its sub-σ-algebras are.
Let (Ω, F, P) be a probability space and {A_n : n ∈ Z⁺} ⊆ F a sequence of P-independent sets with the property that α ≤ P(A_n) ≤ 1 − α for some α ∈ (0, 1). Let F_n be the sub-σ-algebra generated by A_n. Show that the tail σ-algebra T determined by {F_n : n ∈ Z⁺} cannot be countably generated.
Hint: Show that C ∈ T is an atom in T (i.e., B = C whenever B ∈ T \ {∅} is contained in C) only if one can write

C = lim sup_{n→∞} C_n ≡ ⋂_{m=1}^∞ ⋃_{n≥m} C_n,

where, for each n, C_n ∈ {A_n, A_n∁}.
Note that, on the one hand, P(C) = 1, while, on the other hand, C is an atom
in T and therefore has probability 0.
Exercise 1.1.19. Here is an interesting application of Kolmogorov’s 0–1 Law
to a property of the real numbers.
(i) Referring to the discussion preceding Lemma 1.1.6 and part (i) of Exercise 1.1.11, define the transformations T_n : [0, 1) → [0, 1) for n ∈ Z⁺ so that

T_n(ω) = ω − R_n(ω)/2ⁿ, ω ∈ [0, 1),

and notice (cf. the proof of Lemma 1.1.6) that T_n(ω) simply flips the nth coefficient in the binary expansion of ω. Next, let Γ ∈ B_{[0,1)}, and show that Γ is measurable with respect to the σ-algebra σ({R_n : n > m}) generated by {R_n : n > m} if and only if T_n(Γ) = Γ for each 1 ≤ n ≤ m. In particular, conclude that λ_{[0,1)}(Γ) ∈ {0, 1} if T_n(Γ) = Γ for every n ∈ Z⁺.
(ii) Let F denote the set of all finite subsets of Z⁺, and for each F ∈ F, define T^F : [0, 1) → [0, 1) so that T^∅ is the identity mapping and

T^{F∪{m}} = T^F ∘ T_m for each F ∈ F and m ∈ Z⁺ \ F.

As an application of (i), show that for every Γ ∈ B_{[0,1)} with λ_{[0,1)}(Γ) > 0,

λ_{[0,1)}( ⋃_{F∈F} T^F(Γ) ) = 1.
In particular, this means that if Γ has positive measure, then almost every
ω ∈ [0, 1) can be moved to Γ by flipping a finite number of the coefficients in the
binary expansion of ω.
§ 1.2 The Weak Law of Large Numbers
Starting with this section, and for the rest of this chapter, I will be studying what
happens when one averages independent, real-valued random variables. The
remarkable fact, which will be confirmed repeatedly, is that the limiting behavior
of such averages depends hardly at all on the variables involved. Intuitively,
one can explain this phenomenon by pretending that the random variables are
building blocks that, in the averaging process, first get homothetically shrunk
and then reassembled according to a regular pattern. Hence, by the time that
one passes to the limit, the peculiarities of the original blocks get lost.
Throughout the discussion, (Ω, F, P) will be a probability space on which there is a sequence {X_n : n ≥ 1} of real-valued random variables. Given n ∈ Z⁺, use S_n to denote the partial sum X_1 + · · · + X_n and S̄_n to denote the average:

S̄_n ≡ S_n/n = (1/n) Σ_{ℓ=1}^n X_ℓ.
§ 1.2.1. Orthogonal Random Variables. My first result is a very general
one; in fact, it even applies to random variables that are not necessarily inde-
pendent and do not necessarily have mean 0.
Lemma 1.2.1. Assume that

E^P[X_n²] < ∞ for n ∈ Z⁺ and E^P[X_k X_ℓ] = 0 if k ≠ ℓ.

Then, for each ε > 0,

(1.2.2) ε² P(|S̄_n| ≥ ε) ≤ E^P[S̄_n²] = (1/n²) Σ_{ℓ=1}^n E^P[X_ℓ²] for n ∈ Z⁺.

In particular, if

M ≡ sup_{n∈Z⁺} E^P[X_n²] < ∞,

then

(1.2.3) ε² P(|S̄_n| ≥ ε) ≤ E^P[S̄_n²] ≤ M/n, n ∈ Z⁺ and ε > 0;

and so S̄_n → 0 in L²(P; ℝ) and therefore also in P-probability.
Proof: By orthogonality,

E^P[S̄_n²] = (1/n²) Σ_{k,ℓ=1}^n E^P[X_k X_ℓ] = (1/n²) Σ_{ℓ=1}^n E^P[X_ℓ²].

The rest is just an application of Chebyshev's inequality, the estimate that results after integrating the inequality

ε² 1_{[ε,∞)}(|Y|) ≤ Y² 1_{[ε,∞)}(|Y|) ≤ Y².
are still P-square integrable, have mean value 0, and therefore are orthogonal.
Hence, the following statement is an immediate consequence of Lemma 1.2.1.
Theorem 1.2.4. Let {X_n : n ∈ Z⁺} be a sequence of P-independent, P-square integrable random variables with mean value m and variance dominated by σ². Then, for every n ∈ Z⁺ and ε > 0,

(1.2.5) ε² P(|S̄_n − m| ≥ ε) ≤ E^P[|S̄_n − m|²] ≤ σ²/n.

In particular, S̄_n → m in L²(P; ℝ) and therefore in P-probability.
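A two-line simulation makes (1.2.5) concrete. The following Python sketch (mine, not the text's) averages uniform random variables, for which m = 1/2 and σ² = 1/12, and compares the empirical second moment of S̄_n − m with σ²/n:

import numpy as np

# Simulation sketch (not from the book) of (1.2.5): E[|S_bar_n - m|^2] <= sigma^2/n.
rng = np.random.default_rng(2)
m, sigma2 = 0.5, 1.0 / 12.0              # uniform on [0,1): mean 1/2, variance 1/12

for n in (10, 100, 1000):
    S_bar = rng.random((10_000, n)).mean(axis=1)
    lhs = ((S_bar - m) ** 2).mean()      # empirical E[|S_bar_n - m|^2]
    print(n, lhs, "<=", sigma2 / n)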
As yet I have made only minimal use of independence: all that I have done
is subtract off the mean of independent random variables and thereby made
them orthogonal. In order to bring the full force of independence into play, one
has to exploit the fact that one can compose independent random variables with
any (measurable) functions without destroying their independence; in particular,
truncating independent random variables does not destroy independence. To see
how such a property can be brought to bear, I will now consider the problem
of extending the last part of Theorem 1.2.4 to Xn ’s that are less than P-square
integrable. In order to understand the statement, recall that a family of random variables {X_i : i ∈ I} is said to be uniformly P-integrable if

lim_{R↗∞} sup_{i∈I} E^P[ |X_i|, |X_i| ≥ R ] = 0.
As the proof of the following theorem illustrates, the importance of this condition
is that it allows one to simultaneously approximate the random variables Xi , i ∈
I, by bounded random variables.
Theorem 1.2.6 (The Weak Law of Large Numbers). Let {X_n : n ∈ Z⁺} be a uniformly P-integrable sequence of P-independent random variables. Then

(1/n) Σ_{m=1}^n ( X_m − E^P[X_m] ) → 0 in L¹(P; ℝ)

and therefore also in P-probability. In particular, if {X_n : n ∈ Z⁺} is a sequence of P-independent, P-integrable random variables that are identically distributed, then S̄_n → E^P[X_1] in L¹(P; ℝ) and P-probability. (Cf. Exercise 1.2.11.)
Proof: Without loss in generality, I will assume that E^P[X_n] = 0 for every n ∈ Z⁺.
For each R ∈ (0, ∞), define f_R(t) = t 1_{[−R,R]}(t), t ∈ ℝ, set m_n^{(R)} = E^P[f_R ∘ X_n], X_n^{(R)} = f_R ∘ X_n − m_n^{(R)}, and Y_n^{(R)} = X_n − X_n^{(R)}, and set

S̄_n^{(R)} = (1/n) Σ_{ℓ=1}^n X_ℓ^{(R)} and T̄_n^{(R)} = (1/n) Σ_{ℓ=1}^n Y_ℓ^{(R)}.

Since E^P[X_n] = 0 implies m_n^{(R)} = −E^P[X_n, |X_n| > R],

E^P[|S̄_n|] ≤ E^P[|S̄_n^{(R)}|] + E^P[|T̄_n^{(R)}|]
≤ E^P[|S̄_n^{(R)}|²]^{1/2} + 2 max_{1≤ℓ≤n} E^P[ |X_ℓ|, |X_ℓ| ≥ R ]
≤ R/√n + 2 sup_{ℓ∈Z⁺} E^P[ |X_ℓ|, |X_ℓ| ≥ R ].

Hence, because the X_ℓ's are uniformly P-integrable, we get the desired convergence in L¹(P; ℝ) by letting R ↗ ∞.
§ 1.2.3. Approximate Identities. The name of Theorem 1.2.6 comes from
a somewhat invidious comparison with the result in Theorem 1.4.9. The reason
why the appellation weak is not entirely fair is that, although The Weak Law
is indeed less refined than the result in Theorem 1.4.9, it is every bit as useful
as the one in Theorem 1.4.9 and maybe even more important when it comes
to applications. What The Weak Law provides is a ubiquitous technique for
constructing an approximate identity (i.e., a sequence of measures that ap-
proximate a point mass) and measuring how fast the approximation is taking
place. To illustrate how clever selections of the random variables entering The Weak Law can lead to interesting applications, I will spend the rest of this section discussing S. Bernstein's approach to Weierstrass's Approximation Theorem.
For a given p ∈ [0, 1], let {X_n : n ∈ Z⁺} be a sequence of P-independent {0, 1}-valued Bernoulli random variables with mean value p. Then

P(S_n = ℓ) = \binom{n}{ℓ} p^ℓ (1 − p)^{n−ℓ} for 0 ≤ ℓ ≤ n.

Hence, for any f ∈ C([0, 1]; ℝ), the nth Bernstein polynomial

(1.2.7) B_n(p; f) ≡ Σ_{ℓ=0}^n \binom{n}{ℓ} f(ℓ/n) p^ℓ (1 − p)^{n−ℓ}

of f at p is equal to E^P[ f ∘ S̄_n ].
In particular,

|f(p) − B_n(p; f)| = |E^P[ f(p) − f ∘ S̄_n ]| ≤ E^P[ |f(p) − f ∘ S̄_n| ] ≤ 2‖f‖_u P(|S̄_n − p| ≥ ε) + ρ(ε; f),

where ‖f‖_u is the uniform norm of f (i.e., the supremum of |f| over the domain of f) and

ρ(ε; f) ≡ sup{ |f(t) − f(s)| : 0 ≤ s < t ≤ 1 with t − s ≤ ε }

is the modulus of continuity of f. Noting that Var(X_n) = p(1 − p) ≤ ¼ and applying (1.2.5), we conclude that, for every ε > 0,
most efficient one,1 as we are about to see, the Bernstein polynomials have a
lot to recommend them. In particular, they have the feature that they provide
non-negative polynomial approximates to non-negative functions. In fact, the
following discussion reveals much deeper non-negativity preservation properties
possessed by the Bernstein approximation scheme.
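Since (1.2.7) is completely explicit, it is easy to watch the uniform convergence numerically. In the Python sketch below (mine, not the book's), f is one sample choice of a continuous function on [0, 1]:

import numpy as np
from math import comb

# Numerical sketch (not from the book's text) of (1.2.7): the Bernstein
# polynomial B_n(.; f) equals E^P[f o S_bar_n] and converges uniformly to f.
def bernstein(f, n, ps):
    l = np.arange(n + 1)
    coeff = np.array([comb(n, k) for k in l], dtype=float)
    terms = coeff * f(l / n) * ps[:, None]**l * (1.0 - ps[:, None])**(n - l)
    return terms.sum(axis=1)

f = lambda t: np.abs(t - 0.5)            # a sample f in C([0,1]; R)
ps = np.linspace(0.0, 1.0, 201)
for n in (10, 100, 1000):
    print(n, np.max(np.abs(f(ps) - bernstein(f, n, ps))))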
In order to bring out the virtues of the Bernstein polynomials, it is important to replace (1.2.7) with an expression in which the coefficients of B_n(·; f) (as polynomials) are clearly displayed. To this end, introduce the difference operator ∆_h for h > 0 given by

∆_h f(t) = (f(t + h) − f(t))/h.

A straightforward inductive argument (using Pascal's Identity for the binomial coefficients) shows that

(−h)^m [∆_h^{(m)} f](t) = Σ_{ℓ=0}^m (−1)^ℓ \binom{m}{ℓ} f(t + ℓh) for m ∈ Z⁺,

where ∆_h^{(m)} denotes the mth iterate of the operator ∆_h. Taking h = 1/n, we now see that
B_n(p; f) = Σ_{ℓ=0}^n Σ_{k=0}^{n−ℓ} \binom{n}{ℓ} \binom{n−ℓ}{k} (−1)^k f(ℓh) p^{ℓ+k}
= Σ_{r=0}^n p^r Σ_{ℓ=0}^r \binom{n}{ℓ} \binom{n−ℓ}{r−ℓ} (−1)^{r−ℓ} f(ℓh)
= Σ_{r=0}^n \binom{n}{r} (−p)^r Σ_{ℓ=0}^r \binom{r}{ℓ} (−1)^ℓ f(ℓh)
= Σ_{r=0}^n \binom{n}{r} (ph)^r [∆_h^{(r)} f](0),
one can exploit the relationship between the Bernstein and Taylor polynomials. Say that a function ϕ ∈ C^∞((a, b); ℝ) is absolutely monotone if its mth derivative D^m ϕ is non-negative for every m ∈ ℕ. Also, say that ϕ ∈ C^∞([0, 1]; [0, 1]) is a probability generating function if there exists a {u_n : n ∈ ℕ} ⊆ [0, 1] such that

Σ_{n=0}^∞ u_n = 1 and ϕ(t) = Σ_{n=0}^∞ u_n tⁿ for t ∈ [0, 1].
Proof: The implication (i) ⟹ (ii) is trivial. To see that (ii) implies (iii), first observe that if ψ is absolutely monotone on (a, b) and h ∈ (0, b − a), then ∆_h ψ is absolutely monotone on (a, b − h). Indeed, because D ∘ ∆_h ψ = ∆_h ∘ Dψ on (a, b − h), we have that

h [D^m ∘ ∆_h ψ](t) = ∫_t^{t+h} D^{m+1} ψ(s) ds ≥ 0, t ∈ (a, b − h).

In particular,

[∆_h^{(m)} ϕ](0) = lim_{t↘0} [∆_h^{(m)} ϕ](t) ≥ 0 if mh < 1,

and so [∆_h^{(m)} ϕ](0) ≥ 0 when h = 1/n and 0 ≤ m < n. Moreover, we also know that [∆_h^{(n)} ϕ](0) ≥ 0 when h = 1/n, and this completes the proof that
Because the un,` ’s are all elements of [0, 1], one can use a diagonalization proce-
dure to choose {nk : k ∈ Z+ } so that
Exercises for § 1.2

Exercise 1.2.11. Although, for historical reasons, The Weak Law is usually thought of as a theorem about convergence in P-probability, the forms in which I have presented it are clearly results about convergence in either P-mean or even P-square mean. Thus, it is interesting to discover that one can replace the uniform integrability assumption made in Theorem 1.2.6 with a weak uniform integrability assumption if one is willing to settle for convergence in P-probability. Namely, let X_1, . . . , X_n, . . . be mutually P-independent random variables, assume that

F(R) ≡ sup_{n∈Z⁺} R P(|X_n| ≥ R) → 0 as R ↗ ∞,
² Wm. Feller, An Introduction to Probability Theory and Its Applications, Vol. II, Wiley Series in Probability and Math. Stat. (1968). Feller provides several other similar applications of The Weak Law, including the ones in the following exercises.
and set

m_n = (1/n) Σ_{ℓ=1}^n E^P[ X_ℓ, |X_ℓ| ≤ n ], n ∈ Z⁺.

Show that, for each ε > 0,

P(|S̄_n − m_n| ≥ ε) ≤ (1/(nε)²) Σ_{ℓ=1}^n E^P[ X_ℓ², |X_ℓ| ≤ n ] + P( max_{1≤ℓ≤n} |X_ℓ| > n )
≤ (2/(nε²)) ∫_0^n F(t) dt + F(n),

and conclude that S̄_n − m_n → 0 in P-probability. (See part (ii) of Exercises 1.4.26 and 1.4.27 for a partial converse to this statement.)
Hint: Use the formula

Var(Y) ≤ E^P[Y²] = 2 ∫_{[0,∞)} t P(|Y| > t) dt.
Exercise 1.2.12. Show that, for each T ∈ [0, ∞) and t ∈ (0, ∞),

lim_{n→∞} e^{−nt} Σ_{0≤k≤nT} (nt)^k/k! = 1 if T > t and 0 if T < t.
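As a check on this dichotomy, note that the quantity inside the limit is P(S_n ≤ nT) for S_n a sum of n independent Poisson(t) random variables, so the limit is exactly The Weak Law applied to S̄_n → t. A small Python computation (mine, not the book's) makes the convergence visible:

import math

# Numeric check (not from the book) of the limit in this exercise.
def lhs(n, t, T):
    lam = n * t
    term = math.exp(-lam)                # k = 0 term; fine for lam well below ~700
    total = term
    for k in range(1, int(n * T) + 1):
        term *= lam / k
        total += term
    return total

t = 1.0
for n in (10, 100, 500):
    print(n, lhs(n, t, T=1.5), lhs(n, t, T=0.5))
# first column -> 1 (T = 1.5 > t), second column -> 0 (T = 0.5 < t)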
§ 1.3 Cramér's Theory of Large Deviations

Obviously, (1.3.4) is more than sufficient to guarantee that the X_n's have moments of all orders. In fact, as an application of Lebesgue's Dominated Convergence Theorem, one sees that ξ ∈ ℝ ↦ M(ξ) ∈ (0, ∞) is infinitely differentiable and that

E^P[X_1ⁿ] = ∫_ℝ xⁿ µ(dx) = (dⁿM/dξⁿ)(0) for all n ∈ ℕ.
In the discussion that follows, I will use m and σ 2 to denote, respectively, the
common mean value and variance of the Xn ’s.
In order to develop some intuition for the considerations that follow, I will
first consider an example, which, for many purposes, is the canonical example in
probability theory. Namely, let g : ℝ → (0, ∞) be the Gauss kernel

(1.3.5) g(y) ≡ (1/√(2π)) exp(−|y|²/2), y ∈ ℝ,

and recall that a random variable X is standard normal if

P(X ∈ Γ) = ∫_Γ g(y) dy, Γ ∈ B_ℝ.
There are two obvious reasons for the honored position held by Gaussian random variables. In the first place, they certainly have finite moment generating functions. In fact, since

∫_ℝ e^{ξy} g(y) dy = exp(ξ²/2), ξ ∈ ℝ,

it is clear that

(1.3.6) M_{γ_{m,σ²}}(ξ) = exp( ξm + σ²ξ²/2 ).

Moreover, when the X_n's are standard normal,

P(S̄_n ∈ Γ) = √(n/(2π)) ∫_Γ exp(−n|y|²/2) dy.
(1.3.7) lim_{n→∞} (1/n) log P(S̄_n ∈ Γ) = −ess inf{ |y|²/2 : y ∈ Γ },

where the "ess" in (1.3.7) stands for essential and means that what follows is taken modulo a set of measure 0. (Hence, apart from a minus sign, the right-hand side of (1.3.7) is the greatest number dominated by |y|²/2 for Lebesgue-almost every y ∈ Γ.) In fact, because

∫_x^∞ g(y) dy ≤ x^{−1} g(x) for all x ∈ (0, ∞),
Of course, in general one cannot hope to know such explicit expressions for the
distribution of S̄_n. Nonetheless, on the basis of the preceding, one can start to
see what is going on. Namely, when the distribution µ falls off rapidly outside of
compacts, averaging n independent random variables with distribution µ has the
effect of building an exponentially deep well in which the mean value m lies at the
bottom. More precisely, if one believes that the Gaussian random variables are
normal in the sense that they are typical, then one should conjecture that, even when the random variables are not normal, the behavior of P(|S̄_n − m| ≥ ε) for large n's should resemble that of Gaussians with the same variance; and it is in the verification of this conjecture that the moment generating function M_µ plays a central role. Namely, although an expression in terms of µ for the distribution of S_n is seldom readily available, the moment generating function for S_n is easily expressed in terms of M_µ. To wit, as a trivial application of independence, we have

E^P[ e^{ξS_n} ] = M_µ(ξ)ⁿ, ξ ∈ ℝ,
where

(1.3.8) Λ_µ(ξ) ≡ log M_µ(ξ).

Notice that (1.3.9) is really very good. For instance, when the X_n's are N(m, σ²)-random variables and σ > 0, then (cf. (1.3.6)) the preceding leads quickly to the estimate

P(|S̄_n − m| ≥ ε) ≤ exp( −nε²/(2σ²) ),

which is essentially the upper bound at which we arrived before.
Taking a hint from the preceding, I now introduce the Legendre transform

(1.3.10) I_µ(x) ≡ sup{ ξx − Λ_µ(ξ) : ξ ∈ ℝ }, x ∈ ℝ,

and

I_µ(x) = sup{ ξx − Λ_µ(ξ) : ξ ≤ 0 } for x ∈ (−∞, m].

Finally, if

α = inf{ x ∈ ℝ : µ((−∞, x]) > 0 } and β = sup{ x ∈ ℝ : µ([x, ∞)) > 0 },

then I_µ is smooth on (α, β) and identically +∞ off of [α, β]. In fact, either µ({m}) = 1 and α = m = β or m ∈ (α, β), in which case Λ_µ′ is a smooth, strictly increasing mapping from ℝ onto (α, β),

I_µ(x) = Ξ_µ(x) x − Λ_µ(Ξ_µ(x)), x ∈ (α, β), where Ξ_µ = (Λ_µ′)^{−1}

is the inverse of Λ_µ′, µ({α}) = e^{−I_µ(α)} if α > −∞, and µ({β}) = e^{−I_µ(β)} if β < ∞.
Proof: For notational convenience, I will drop the subscript “µ” during the
proof. Further, note that the smoothness of Λ follows immediately from the
positivity and smoothness of M , and the identification of Λ0 (ξ) and Λ00 (ξ) with
the mean and variance of νξ is elementary calculus combined with the remark
following (1.3.4). Thus, I will concentrate on the properties of the function I.
As the pointwise supremum of functions that are linear, I is certainly lower
semicontinuous and convex. Also, because Λ(0) = 0, it is obvious that I ≥ 0.
Next, by Jensen's Inequality,

Λ(ξ) ≥ ξ ∫_ℝ x µ(dx) = ξm,

and therefore e^{−I(β)} = inf_{ξ≥0} e^{−ξβ} M(ξ) = µ({β}). Since the same reasoning applies when α > −∞, we are done.
Theorem 1.3.12 (Cramér's Theorem). Let {X_n : n ≥ 1} be a sequence of P-independent random variables with common distribution µ, assume that the associated moment generating function M_µ satisfies (1.3.4), set m = ∫_ℝ x µ(dx), and define I_µ accordingly, as in (1.3.10). Then

P(S̄_n ≥ a) ≤ e^{−nI_µ(a)} for all a ∈ [m, ∞)

and

P(S̄_n ≤ a) ≤ e^{−nI_µ(a)} for all a ∈ (−∞, m].
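To see the upper bound at work, consider P-independent {−1, 1}-valued Bernoulli variables with mean 0, for which Λ_µ(ξ) = log cosh ξ and the Legendre transform can be computed in closed form: I_µ(a) = ½[(1 + a) log(1 + a) + (1 − a) log(1 − a)]. The Python sketch below (mine, not the book's) compares a Monte Carlo estimate of P(S̄_n ≥ a) with e^{−nI_µ(a)}:

import numpy as np

def I(a):
    # Legendre transform of log cosh for mean-zero {-1,1} coin tosses
    return 0.5 * ((1 + a) * np.log1p(a) + (1 - a) * np.log1p(-a))

rng = np.random.default_rng(3)
n, a, trials = 100, 0.2, 100_000
S_bar = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)
print("P(S_bar_n >= a) ~", (S_bar >= a).mean())
print("exp(-n I(a))    =", float(np.exp(-n * I(a))))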
Results like the ones obtained in Theorem 1.3.12 are examples of a class of
results known as large deviations estimates. They are large deviations be-
cause the probability of their occurrence is exponentially small. Although large
deviation estimates are available in a variety of circumstances,1 in general one
has to settle for the cruder sort of information contained in the following.
¹ In fact, some people have written entire books on the subject. See, for example, J.-D. Deuschel and D. Stroock, Large Deviations, now available from the A.M.S. in the Chelsea Series.
(I use Γ° and Γ̄ to denote the interior and closure of a set Γ. Also, recall that I take the infimum over the empty set to be +∞.)
Proof: To prove the upper bound, let Γ be a closed set, and define Γ⁺ = Γ ∩ [m, ∞) and Γ⁻ = Γ ∩ (−∞, m]. Clearly,

P(S̄_n ∈ Γ) ≤ 2 [ P(S̄_n ∈ Γ⁺) ∨ P(S̄_n ∈ Γ⁻) ].

P(S̄_n ∈ Γ) ≥ P(S̄_n = a) ≥ e^{−nI_µ(a)}.
Remark 1.3.14. The upper bound in Theorem 1.3.12 is often called Cher-
noff ’s Inequality. The idea underlying its derivation is rather mundane by
comparison to the subtle idea underlying the proof of the lower bound. Indeed,
it may not be immediately obvious what that idea was! Thus, consider once
again the second part of the proof of Theorem 1.3.12. What I had to do is estimate the probability that S̄_n lies in a neighborhood of a. When a is the mean
value m, such an estimate is provided by the Weak Law. On the other hand,
when a 6= m, the Weak Law for the Xn ’s has very little to contribute. Thus,
what I did is replace the original Xn ’s by random variables Yn , n ∈ Z+ , whose
mean value is a. Furthermore, the transformation from the Xn ’s to the Yn ’s was
sufficiently simple that it was easy to estimate Xn -probabilities in terms of Yn -
probabilities. Finally, the Weak Law applied to the Y_n's gave strong information about the rate of approach of (1/n) Σ_{ℓ=1}^n Y_ℓ to a.
I close this section by verifying the conjecture (cf. the discussion preceding
Lemma 1.3.11) that the Gaussian case is normal. In particular, I want to check
that the well around m in which the distribution of S̄_n becomes concentrated
looks Gaussian, and, in view of Theorem 1.3.12, this comes down to the following.
Theorem 1.3.15. Let everything be as in Lemma 1.3.11, and assume that the variance σ² > 0. There exist a δ ∈ (0, 1] and a K ∈ (0, ∞) such that [m − δ, m + δ] ⊆ (α, β) (cf. Lemma 1.3.11), Λ_µ″(Ξ_µ(x)) ≤ K, |Ξ_µ(x)| ≤ K|x − m|, and

| I_µ(x) − (x − m)²/(2σ²) | ≤ K|x − m|³

for x ∈ [m − δ, m + δ]. Moreover, for all n ∈ Z⁺, ε > 0, and a ∈ [m − δ, m + δ],

P(|S̄_n − a| < ε) ≥ ( 1 − K/(nε²) ) exp( −n[ (a − m)²/(2σ²) + Kε|a − m| + K|a − m|³ ] ).
Proof: Without loss in generality (cf. Exercise 1.3.17), I will assume that m = 0 and σ² = 1. Since, in this case, Λ_µ(0) = Λ_µ′(0) = 0 and Λ_µ″(0) = 1, it follows that Ξ_µ(0) = 0 and Ξ_µ′(0) = 1. Hence, we can find an M ∈ (0, ∞) and a δ ∈ (0, 1] with α < −δ < δ < β for which |Ξ_µ(x) − x| ≤ M|x|² and |Λ_µ(ξ) − ξ²/2| ≤ M|ξ|³ whenever |x| ≤ δ and |ξ| ≤ (M + 1)δ, respectively. In particular, this leads immediately to |Ξ_µ(x)| ≤ (M + 1)|x| for |x| ≤ δ, and the estimate for I_µ comes easily from the preceding combined with the equation

I_µ(x) = Ξ_µ(x) x − Λ_µ(Ξ_µ(x)).
Exercises for § 1.3

Hint: Handle the case µ(E) < ∞ first, and treat the case when f ∈ L¹(µ; ℝ) by considering the measure ν(dx) = f(x) µ(dx).
Exercise 1.3.17. Referring to the notation used in this section, assume that µ is a non-degenerate (i.e., it is not concentrated at a single point) probability measure on ℝ for which (1.3.4) holds. Next, let m and σ² be the mean and variance of µ, use ν to denote the distribution of

x ∈ ℝ ↦ (x − m)/σ ∈ ℝ under µ,

and define Λ_ν, I_ν, and Ξ_ν accordingly. Show that

Λ_µ(ξ) = ξm + Λ_ν(σξ), ξ ∈ ℝ,
I_µ(x) = I_ν((x − m)/σ), x ∈ ℝ,
Image(Λ_µ′) = m + σ Image(Λ_ν′), and
Ξ_µ(x) = (1/σ) Ξ_ν((x − m)/σ), x ∈ Image(Λ_µ′).
distribution of S ≡ Σ_1ⁿ σ_k X_k, and show that I_ν(x) ≥ x²/(2Σ²), where Σ² ≡ Σ_1ⁿ σ_k². In particular, conclude that

P(|S| ≥ a) ≤ 2 exp( −a²/(2Σ²) ), a ∈ [0, ∞).
Exercise 1.3.19. Although it is not exactly the direction in which I have been going, it seems appropriate to include here a derivation of Stirling's formula. Namely, recall Euler's Gamma function:

(1.3.20) Γ(t) ≡ ∫_{[0,∞)} x^{t−1} e^{−x} dx, t ∈ (0, ∞).
This is, of course, far less than we want to know. Nonetheless, it does show that all the action is going to take place near y = 1 and that the principal factor in the asymptotics of Γ(t + 1)/t^{t+1} is e^{−t}. In order to highlight these observations, make the substitution y = z + 1 and obtain

Γ(t + 1)/(t^{t+1} e^{−t}) = ∫_{(−1,∞)} (1 + z)^t e^{−tz} dz.
Before taking the next step, introduce the function R(z) = log(1 + z) − z + z²/2 for z ∈ (−1, 1), and check that R(z) ≤ 0 if z ∈ (−1, 0] and that |R(z)| ≤ |z|³/(3(1 − |z|)) everywhere in (−1, 1). Now let δ ∈ (0, 1) be given, and show that

∫_{−1}^{−δ} (1 + z)^t e^{−tz} dz ≤ (1 − δ) [ (1 − δ)e^δ ]^t ≤ exp(−tδ²/2)

and

∫_δ^∞ (1 + z)^t e^{−tz} dz ≤ [ (1 + δ)e^{−δ} ]^{t−1} ∫_δ^∞ (1 + z)e^{−z} dz ≤ 2 exp( 1 − tδ²/2 + tδ³/(3(1 − δ)) ).
Next, write (1 + z)^t e^{−tz} = e^{−tz²/2} e^{tR(z)}. Then

∫_{|z|≤δ} (1 + z)^t e^{−tz} dz = ∫_{|z|≤δ} e^{−tz²/2} dz + E(t, δ),

where

E(t, δ) = ∫_{|z|≤δ} e^{−tz²/2} ( e^{tR(z)} − 1 ) dz.

Check that

| ∫_{|z|≤δ} e^{−tz²/2} dz − √(2π/t) | = t^{−1/2} ∫_{|z|≥t^{1/2}δ} e^{−z²/2} dz ≤ (2/(tδ)) e^{−tδ²/2}

and that

|E(t, δ)| ≤ t ∫_{|z|≤δ} |R(z)| e^{−tz²/2 + t|R(z)|} dz ≤ (t/(3(1 − δ))) ∫_{|z|≤δ} |z|³ e^{−(tz²/2)(3−5δ)/(3(1−δ))} dz ≤ 12(1 − δ)/((3 − 5δ)² t)

as long as δ < 3/5. Finally, take δ = √(2t^{−1} log t), and combine these to conclude that there is a C < ∞ such that

| Γ(t + 1)/( √(2πt) (t/e)^t ) − 1 | ≤ C/t, t ∈ [1, ∞).
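The final estimate is easy to test. Using lgamma to avoid overflow, the Python check below (mine, not the exercise's) shows the scaled error t·|Γ(t+1)/(√(2πt)(t/e)^t) − 1| staying bounded, near 1/12, in line with the classical expansion:

import math

# Numeric check (not from the book) of Stirling's formula with O(1/t) error.
for t in (1.0, 10.0, 100.0, 1000.0):
    log_ratio = math.lgamma(t + 1) - 0.5 * math.log(2 * math.pi * t) \
                - t * (math.log(t) - 1)
    err = abs(math.exp(log_ratio) - 1)
    print(t, err, err * t)       # err * t stays bounded (roughly 1/12)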
Af(n) = (f(n + 1) + f(n − 1))/2, n ∈ Z,

and show that, for any n ≥ 1, [Aⁿf](m) = E^P[ f(m + S_n) ], where S_n is the sum of n P-independent, {−1, 1}-valued Bernoulli random variables with mean value 0.

² T.H. Carne, "A transformation formula for Markov chains," Bull. Sc. Math., 109, pp. 399–405 (1985). As Carne points out, what he is doing is the discrete analog of Hadamard's representation, via the Weierstrass transform, of solutions to heat equations in terms of solutions to the wave equations.
In particular, this means that |Q(n, x)| ≤ 1 for all x ∈ [−1, 1]. (It also means that Q(n, ·) is the nth Chebychev polynomial.)
(iii) Using induction on n ∈ Z⁺, show that

[Aⁿ Q(·, z)](m) = zⁿ Q(m, z), m ∈ Z and z ∈ ℂ.

In particular, if

p_{m,n}(z) ≡ E[ Q(S_n, z), |S_n| < m ] = 2^{−n} Σ_{|2ℓ−n|<m} \binom{n}{ℓ} Q(2ℓ − n, z),

then

sup_{x∈[−1,1]} | xⁿ − p_{m,n}(x) | ≤ P(|S_n| ≥ m) ≤ 2 exp( −m²/(2n) ) for all 1 ≤ m ≤ n.

|(f, Aⁿg)_H| ≤ 2‖f‖_H ‖g‖_H exp( −m²/(2n) ) for n ≥ m.
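These polynomials are computable, so the estimate can be checked directly. In the Python sketch below (mine, not the exercise's), Q(k, x) = cos(k arccos x) on [−1, 1]:

import numpy as np
from math import comb, exp

# Numeric sketch (not from the book) of the estimate above:
# sup_{[-1,1]} |x^n - p_{m,n}(x)| <= 2 exp(-m^2/(2n)).
def Q(k, x):
    return np.cos(abs(k) * np.arccos(x))

def p(m, n, x):
    return sum(comb(n, l) * Q(2 * l - n, x)
               for l in range(n + 1) if abs(2 * l - n) < m) / 2.0**n

x = np.linspace(-1.0, 1.0, 401)
n = 100
for m in (10, 20, 40):
    print(m, np.max(np.abs(x**n - p(m, n, x))), 2 * exp(-m * m / (2 * n)))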
§ 1.4 The Strong Law of Large Numbers

the set on which

lim_{n→∞} (S_n − a_n)/b_n exists in ℝ

has P-measure either 0 or 1. In fact, if b_n → ∞ as n → ∞, then both

lim sup_{n→∞} (S_n − a_n)/b_n and lim inf_{n→∞} (S_n − a_n)/b_n

are P-almost surely constant.
then

Σ_{n=1}^∞ ( X_n − E^P[X_n] ) converges P-almost surely.
S_N² − S_n² = (S_N − S_n)² + 2(S_N − S_n)S_n ≥ 2(S_N − S_n)S_n;

and therefore, since S_N − S_n has mean value 0 and is independent of the σ-algebra σ({X_1, . . . , X_n}),

(*) E^P[S_N², A_n] ≥ E^P[S_n², A_n] for any A_n ∈ σ({X_1, . . . , X_n}).

In particular, if A_1 = {|S_1| > ε} and

A_{n+1} = { |S_{n+1}| > ε and max_{1≤ℓ≤n} |S_ℓ| ≤ ε }, n ∈ Z⁺,

then, with B_N ≡ ⋃_{n=1}^N A_n,

E^P[S_N², B_N] = Σ_{n=1}^N E^P[S_N², A_n] ≥ Σ_{n=1}^N E^P[S_n², A_n] ≥ ε² Σ_{n=1}^N P(A_n) = ε² P(B_N).
Thus,

ε² P( sup_{n≥1} |S_n| > ε ) = lim_{N→∞} ε² P(B_N) ≤ lim_{N→∞} E^P[S_N²] ≤ Σ_{n=1}^∞ E^P[X_n²],

and so the result follows after one takes left limits with respect to ε > 0.
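A simulation shows how much room the inequality leaves. The Python sketch below (mine, not the text's) uses standard normal X_n's, so that the right-hand side Σ_{n≤N} E^P[X_n²] is exactly N:

import numpy as np

# Simulation sketch (not from the book) of Kolmogorov's inequality:
# eps^2 P(max_{n<=N} |S_n| > eps) <= sum_{n<=N} E[X_n^2].
rng = np.random.default_rng(4)
N, trials, eps = 50, 200_000, 3.0
X = rng.standard_normal((trials, N))     # independent, mean 0, variance 1
S = X.cumsum(axis=1)
lhs = eps**2 * (np.abs(S).max(axis=1) > eps).mean()
print(lhs, "<=", float(N))               # N = sum of the variances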
Proof of Theorem 1.4.2: Again assume that the X_n's have mean value 0. By (1.4.6) applied to {X_{N+n} : n ∈ Z⁺}, we see that (1.4.3) implies

P( sup_{n>N} |S_n − S_N| ≥ ε ) ≤ (1/ε²) Σ_{n=N+1}^∞ E^P[X_n²] → 0 as N → ∞

for every ε > 0, and this is equivalent to the P-almost sure Cauchy convergence of {S_n : n ≥ 1}.
In order to convert the conclusion in Theorem 1.4.2 into a statement about
S n : n ≥ 1 , I will need the following elementary summability fact about
sequences of real numbers.
Lemma 1.4.7 (Kronecker). Let {b_n : n ∈ Z⁺} be a non-decreasing sequence of positive numbers that tend to ∞, and set β_n = b_n − b_{n−1}, where b_0 ≡ 0. If {s_n : n ≥ 1} ⊆ ℝ is a sequence that converges to s ∈ ℝ, then

(1/b_n) Σ_{ℓ=1}^n β_ℓ s_ℓ → s.

Moreover, if {x_n : n ≥ 1} ⊆ ℝ and Σ_{n=1}^∞ x_n/b_n converges in ℝ, then (1/b_n) Σ_{ℓ=1}^n x_ℓ → 0.
Proof: To prove the first part, assume that s = 0, and for given ε > 0 choose N ∈ Z⁺ so that |s_ℓ| < ε for ℓ ≥ N. Then, with M = sup_{n≥1} |s_n|,

(1/b_n) | Σ_{ℓ=1}^n β_ℓ s_ℓ | ≤ (M b_N)/b_n + ε → ε as n → ∞.

Turning to the second part, set y_ℓ = x_ℓ/b_ℓ, s_0 = 0, and s_n = Σ_{ℓ=1}^n y_ℓ. After summation by parts,

(1/b_n) Σ_{ℓ=1}^n x_ℓ = s_n − (1/b_n) Σ_{ℓ=1}^n β_ℓ s_{ℓ−1};

and so, since s_n → s ∈ ℝ as n → ∞, the first part gives the desired conclusion.
After combining Theorem 1.4.2 with Lemma 1.4.7, we arrive at the following
interesting statement.
then

(1/b_n) Σ_{ℓ=1}^n ( X_ℓ − E^P[X_ℓ] ) → 0 P-almost surely.
P( ∃n ∈ Z⁺ ∀N ≥ n: Y_N = X_N ) = 1.

In particular, if T̄_n = (1/n) Σ_{ℓ=1}^n Y_ℓ for n ∈ Z⁺, then, for P-almost every ω ∈ Ω, T̄_n(ω) → 0 if and only if S̄_n(ω) → 0. Finally, to see that T̄_n → 0 P-almost surely, first observe that, because E^P[X_1] = 0, by the first part of Lemma 1.4.7,

lim_{n→∞} (1/n) Σ_{ℓ=1}^n E^P[Y_ℓ] = lim_{n→∞} E^P[ X_1, |X_1| ≤ n ] = 0,
Thus, the P-almost sure convergence is now established, and the L1 (P; R)-conver-
gence result was proved already in Theorem 1.2.6.
Turning to the converse assertion, first note that (by Lemma 1.4.1) if S̄_n converges in ℝ on a set of positive P-measure, then it converges P-almost surely to some m ∈ ℝ. In particular,

lim_{n→∞} |X_n|/n = lim_{n→∞} | S̄_n − ((n−1)/n) S̄_{n−1} | = 0 P-almost surely;

and so, if A_n ≡ {|X_n| > n}, then P( lim sup_{n→∞} A_n ) = 0. But the A_n's are mutually independent, and therefore, by the second part of the Borel–Cantelli Lemma, we now know that Σ_{n=1}^∞ P(A_n) < ∞. Hence,

E^P[|X_1|] = ∫_0^∞ P(|X_1| > t) dt ≤ 1 + Σ_{n=1}^∞ P(|X_n| > n) < ∞.
Remark 1.4.10. A reason for being interested in the converse part of Theorem
1.4.9 is that it provides a reconciliation between the measure theory vs. frequency
schools of probability theory.
Although Theorem 1.4.9 is the centerpiece of this section, I want to give
another approach to the study of the almost sure convergence properties of
{Sn : n ≥ 1}. In fact, following P. Lévy, I am going to show that {Sn : n ≥ 1}
converges P-almost surely if it converges in P-measure. Hence, for example,
Theorem 1.4.2 can be proved as a direct consequence of (1.4.4), without appeal
to Kolmogorov’s Inequality.
The key to Lévy's analysis lies in a version of the reflection principle, whose statement requires the introduction of a new concept. Given an ℝ-valued random variable Y, say that α ∈ ℝ is a median of Y, and write α ∈ med(Y), if

(1.4.11) P(Y ≤ α) ∧ P(Y ≥ α) ≥ ½.
Notice that (as distinguished from a mean value) every Y admits a median; for example, it is easy to check that

α ≡ inf{ t ∈ ℝ : P(Y ≤ t) ≥ ½ }

is a median of Y. On the other hand, the notion of median is flawed by the fact that, in general, a random variable will admit an entire non-degenerate interval of medians. In addition, it is neither easy to compute the medians of a sum in terms of the medians of the summands nor to relate the medians of an integrable random variable to its mean value. Nonetheless, at least if Y ∈ L^p(P; ℝ) for some p ∈ [1, ∞), the following estimate provides some information. Namely, since, for α ∈ med(Y) and β ∈ ℝ,

|α − β|^p/2 ≤ |α − β|^p [ P(Y ≥ α) ∧ P(Y ≤ α) ] ≤ E^P[|Y − β|^p],

it follows that

(1.4.12) |α − β| ≤ ( 2 E^P[|Y − β|^p] )^{1/p}.
Theorem 1.4.13 (Lévy's Reflection Principle). Let {X_n : n ∈ Z⁺} be a sequence of P-independent random variables, and, for k ≤ ℓ, choose α_{ℓ,k} ∈ med(S_ℓ − S_k). Then, for any N ∈ Z⁺ and ε > 0,

(1.4.14) P( max_{1≤n≤N} (S_n + α_{N,n}) ≥ ε ) ≤ 2 P(S_N ≥ ε),

and therefore

(1.4.15) P( max_{1≤n≤N} |S_n + α_{N,n}| ≥ ε ) ≤ 2 P(|S_N| ≥ ε).
Proof: Clearly (1.4.15) follows by applying (1.4.14) to both the sequences {X_n : n ≥ 1} and {−X_n : n ≥ 1} and then adding the two results.
To prove (1.4.14), set A_1 = {S_1 + α_{N,1} ≥ ε} and

A_{n+1} = { max_{1≤ℓ≤n} (S_ℓ + α_{N,ℓ}) < ε and S_{n+1} + α_{N,n+1} ≥ ε }.

Obviously, the A_n's are mutually disjoint and ⋃_{n=1}^N A_n = { max_{1≤n≤N} (S_n + α_{N,n}) ≥ ε }. In addition,

{S_N ≥ ε} ⊇ A_n ∩ {S_N − S_n ≥ α_{N,n}} for each 1 ≤ n ≤ N.

Hence,

P(S_N ≥ ε) ≥ Σ_{n=1}^N P( A_n ∩ {S_N − S_n ≥ α_{N,n}} ) ≥ (1/2) Σ_{n=1}^N P(A_n) = (1/2) P( max_{1≤n≤N} (S_n + α_{N,n}) ≥ ε ),
Remark 1.4.17. The most beautiful and startling feature of Lévy’s line of
reasoning is that it requires no integrability assumptions. Of course, in many
applications of Corollary 1.4.16, integrability considerations enter into the proof
that {Sn : n ≥ 1} converges in P-measure. Finally, a word of caution may be
in order. Namely, the result in Corollary 1.4.16 applies to the quantities S_n themselves; it does not apply to associated quantities like S̄_n. Indeed, suppose that {X_n : n ≥ 1} is a sequence of independent, identically distributed random variables that satisfy

P(X_n ≤ −t) = P(X_n ≥ t) = ( 1 + t² log(e⁴ + t²) )^{−1/2} for all t ≥ 0.

On the one hand, by Exercise 1.2.11, we know that the associated averages S̄_n tend to 0 in probability. On the other hand, by the second part of Theorem 1.4.9, we know that the sequence {S̄_n : n ≥ 1} diverges almost surely.
Exercises for § 1.4
Show that

(1.4.20) E^P[X^p]^{1/p} ≤ (p/(p − 1)) E^P[Y^p]^{1/p}, p ∈ (1, ∞),

and conclude from this that, for each p ∈ (2, ∞), {S_n : n ≥ 1} converges to S in L^p(P) if and only if S ∈ L^p(P).
Exercise 1.4.23. If X ∈ L²(P; ℝ), then it is easy to characterize its mean m as the c ∈ ℝ that minimizes E^P[(X − c)²]. Assuming that X ∈ L¹(P; ℝ), show that α ∈ med(X) if and only if

E^P[|X − α|] = min_{c∈ℝ} E^P[|X − c|].
Note that this can be used in place of (1.4.15) when proving results like the one
in Corollary 1.4.16.
Show that
(ii) Continuing in the same setting, add the assumption that the X_n's are identically distributed, and use part (i) to show that

lim_{n→∞} P(|S̄_n| ≤ C) = 1 for some C ∈ (0, ∞) ⟹ lim_{n→∞} n P(|X_1| ≥ n) = 0,

and that (1 − (1 − x)ⁿ)/x → n as x ↘ 0.
In conjunction with Exercise 1.2.11, this proves that if {X_n : n ≥ 1} is a sequence of independent, identically distributed symmetric random variables, then S̄_n → 0 in P-probability if and only if lim_{n→∞} n P(|X_1| ≥ n) = 0.
The beautiful argument given below is due to Y. Guivarc'h, but its full power cannot be appreciated in the present context (cf. Exercise 6.2.19). Furthermore, a classic result (cf. Exercise 5.2.43) due to K.L. Chung and W.H. Fuchs gives a much better result for the independent random variables. Their result says that lim inf_{n→∞} |S_n| = 0 P-almost surely.
In order to prove the assertion here, assume that lim_{n→∞} |S_n| = ∞ with positive P-probability, use Kolmogorov's 0–1 Law to see that |S_n| → ∞ P-almost surely, and proceed as follows.
¹ These ideas are taken from the book by Wm. Feller cited at the end of § 1.2. They become even more elegant when combined with a theorem due to E.J.G. Pitman, which is given in Feller's book.
(i) Show that there must exist an ε > 0 with the property that

P( ∀ℓ > k: |S_ℓ − S_k| ≥ ε ) ≥ ε for all k ∈ ℕ,

and conclude that P(A) ≥ ε, where A ≡ { ω : ∀ℓ ∈ Z⁺, |S_ℓ(ω)| ≥ ε }.
(ii) For each ω ∈ Ω, set

Γ_n(ω) = { t ∈ ℝ : ∃1 ≤ ℓ ≤ n, |t − S_ℓ(ω)| < ε/2 }

and

Γ′_n(ω) = { t ∈ ℝ : ∃1 ≤ ℓ ≤ n, |t − S′_ℓ(ω)| < ε/2 },

where S′_n ≡ Σ_{ℓ=1}^n X_{ℓ+1}. Next, let R_n(ω) and R′_n(ω) denote the Lebesgue measure of Γ_n(ω) and Γ′_n(ω), respectively; and, using the translation invariance of Lebesgue measure, show that

ε P(A) ≤ E^P[ R_{n+1} − R_n ], n ∈ Z⁺.
(iii) In view of parts (i) and (ii), what remains to be done is show that

m = 0 ⟹ lim_{n→∞} (1/n) E^P[R_n] = 0.

But, clearly, 0 ≤ R_n(ω) ≤ n. Thus, it is enough to show that, when m = 0, R_n/n → 0 P-almost surely; and, to this end, first check that

S_n(ω)/n → 0 ⟹ R_n(ω)/n → 0,

and, finally, apply The Strong Law of Large Numbers.
Exercise 1.4.29. As I have already said, for many applications The Weak
Law of Large Numbers is just as good as and even preferable to the Strong
Law. Nonetheless, here is an application in which the full strength of the Strong
Law plays an essential role. Namely, I want to use the Strong Law to produce
examples of continuous, strictly increasing functions F on [0, 1] with the property
that their derivative

F′(x) ≡ lim_{y→x} (F(y) − F(x))/(y − x) = 0 at Lebesgue-almost every x ∈ (0, 1).
By familiar facts about functions of a real variable, one knows that such func-
tions F are in one-to-one correspondence with non-atomic, Borel probability
measures µ on [0, 1] which charge every non-empty open subset but are singular
to Lebesgue’s measure.
Namely, F is the distribution function determined by µ: F(x) = µ((−∞, x]).
(i) Set Ω = {0, 1}^{Z⁺}, and, for each p ∈ (0, 1), take M_p = (β_p)^{Z⁺}, where β_p on {0, 1} is the Bernoulli measure with β_p({1}) = p = 1 − β_p({0}). Next, define

ω ∈ Ω ↦ Y(ω) ≡ Σ_{n=1}^∞ 2^{−n} ω_n ∈ [0, 1],

let µ_p denote the distribution of Y under M_p, and set ε_n(x) = ⌊2ⁿx⌋ − 2⌊2^{n−1}x⌋ for n ∈ Z⁺ and x ∈ [0, 1), where ⌊s⌋ denotes the integer part of s. If {ε_n : n ≥ 1} ⊆ {0, 1} satisfies x = Σ_1^∞ 2^{−m} ε_m, show that ε_m = ε_m(x) for all m ≥ 1 if and only if ε_m = 0 for infinitely many m ≥ 1. In particular, conclude first that ω_n = ε_n(Y(ω)), n ∈ Z⁺, for M_p-almost every ω ∈ Ω and, second, by the Strong Law, that

(1/n) Σ_{m=1}^n ε_m(x) → p for µ_p-almost every x ∈ [0, 1].
(iv) By Lemma 1.1.6, we know that µ_{1/2} is Lebesgue measure λ_{[0,1]} on [0, 1]. Hence, we now know that µ_p ⊥ λ_{[0,1]} when p ≠ ½. In view of the introductory remarks, this completes the proof that, for each p ∈ (0, 1) \ {½}, the function F_p(x) = µ_p((−∞, x]) is a strictly increasing, continuous function on [0, 1] whose derivative vanishes at Lebesgue-almost every point. Here, one can do better. Namely, referring to part (iii), let ∆_p denote the set of x ∈ [0, 1) such that

lim_{n→∞} Σ_n(x)/n = p, where Σ_n(x) ≡ Σ_{m=1}^n ε_m(x).

We know that ∆_{1/2} has Lebesgue measure 1. Show that, for each x ∈ ∆_{1/2} and p ∈ (0, 1) \ {½}, F_p is differentiable with derivative 0 at x.
Hint: Given x ∈ [0, 1), define

L_n(x) = Σ_{m=1}^n 2^{−m} ε_m(x) and R_n(x) = L_n(x) + 2^{−n}.

Show that

F_p(R_n(x)) − F_p(L_n(x)) = M_p( { ω : Σ_{m=1}^n 2^{−m} ω_m = L_n(x) } ) = p^{Σ_n(x)} (1 − p)^{n−Σ_n(x)}.

When p ∈ (0, 1) \ {½} and x ∈ ∆_{1/2}, use this together with 4p(1 − p) < 1 to show that

lim sup_{n→∞} (1/n) log( ( F_p(R_n(x)) − F_p(L_n(x)) ) / ( R_n(x) − L_n(x) ) ) < 0.
To complete the proof, for given x ∈ ∆_{1/2} and n ≥ 2 such that Σ_n(x) ≥ 2, let m_n(x) denote the largest m < n such that ε_m(x) = 1, and show that m_n(x)/n → 1 as n → ∞. Hence, since 2^{−n−1} < h ≤ 2^{−n} implies that

( F_p(x) − F_p(x − h) )/h ≤ 2^{n−m_n(x)+1} ( F_p(R_n(x)) − F_p(L_n(x)) ) / ( R_n(x) − L_n(x) ),

one concludes that F_p is left-differentiable at x and has left derivative equal to 0 there. To get the same conclusion about right derivatives, simply note that F_p(x) = 1 − F_{1−p}(1 − x).
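The decay asserted in the hint can be seen numerically. For an x whose digits alternate (so Σ_n(x) = ⌊n/2⌋ and x ∈ ∆_{1/2}), the difference quotient over [L_n(x), R_n(x)) is (2p)^{Σ_n(x)}(2q)^{n−Σ_n(x)}, and the Python sketch below (mine, not the exercise's) shows its logarithmic rate approaching ½ log(4pq) < 0:

import math

# Numeric sketch (not from the book) of the hint: along the dyadic intervals
# around an x with digit frequency 1/2, the ratio
# (F_p(R_n) - F_p(L_n)) / (R_n - L_n) = (2p)^{Sigma_n} (2q)^{n - Sigma_n}
# decays geometrically because 4pq < 1 for p != 1/2.
p = 0.3
q = 1.0 - p
for n in (10, 50, 200):
    sigma = n // 2                           # alternating digits: half are 1
    log_ratio = sigma * math.log(2 * p) + (n - sigma) * math.log(2 * q)
    print(n, log_ratio / n, 0.5 * math.log(4 * p * q))
# (1/n) log(ratio) -> (1/2) log(4pq) < 0, so the difference quotients vanish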
(v) Again let p ∈ (0, 1) \ {½} be given, but this time choose x ∈ ∆_p. Show that

lim_{h↘0} ( F_p(x + h) − F_p(x) )/h = +∞.

The argument is similar to the one used to handle part (iv). However, the role played there by the inequality 4pq < 1 is played here by (2p)^p (2q)^q > 1 when q = 1 − p.
§ 1.5 Law of the Iterated Logarithm
S_n/b_n → 0 P-almost surely if Σ_{n=1}^∞ 1/b_n² < ∞.
Thus, for example, S_n grows more slowly than n^{1/2} log n. On the other hand, if the X_n's are N(0, 1)-random variables, then so are the random variables S_n/√n; and therefore, for every R ∈ (0, ∞),

P( lim sup_{n→∞} S_n/√n ≥ R ) = lim_{N→∞} P( ⋃_{n≥N} { S_n/√n ≥ R } ) ≥ lim_{N→∞} P( S_N/√N ≥ R ) > 0.

Hence, at least for normal random variables, one can use Lemma 1.4.1 to see that

lim sup_{n→∞} S_n/√n = ∞ P-almost surely;

and so S_n grows faster than n^{1/2}.
If, as we did in Section 1.3, we proceed on the assumption that Gaussian random variables are typical, we should expect the growth rate of the S_n's to be something between n^{1/2} and n^{1/2} log n. What, in fact, turns out to be the precise growth rate is

(1.5.1) Λ_n ≡ √( 2n log₍₂₎(n ∨ 3) ),

where log₍₂₎ x ≡ log log x (not the logarithm with base 2) for x ∈ [e, ∞). That is, one has The Law of the Iterated Logarithm:

(1.5.2) lim sup_{n→∞} S_n/Λ_n = 1 P-almost surely.

This remarkable fact was discovered first for Bernoulli random variables by Khinchine, was extended by Kolmogorov to random variables possessing 2 + ε moments, and eventually achieved its final form in the work of Hartman and Wintner. The approach that I will adopt here is based on ideas (taught to me by M. Ledoux) introduced originally to handle generalizations of (1.5.2) to random
variables with values in a Banach space.¹ This approach consists of two steps. The first establishes a preliminary version of (1.5.2) that, although it is far cruder than (1.5.2) itself, will allow me to justify a reduction of the general case to the case of bounded random variables. In the second step, I deal with bounded random variables and more or less follow Khinchine's strategy for deriving (1.5.2) once one has estimates like the ones provided by Theorem 1.3.12.
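Before turning to the details, a simulation gives a feeling for (1.5.2). The Python sketch below (mine, not the text's) tracks the running maximum of S_n/Λ_n along a single coin-tossing trajectory; the approach to 1 is extremely slow, which is typical of iterated-logarithm behavior:

import numpy as np

# Simulation sketch (not from the book) of (1.5.2) for coin tossing:
# the running max of S_n / Lambda_n should hover below and near 1.
rng = np.random.default_rng(5)
n_max = 2_000_000
X = rng.choice([-1.0, 1.0], size=n_max)
S = X.cumsum()
n = np.arange(1, n_max + 1)
Lam = np.sqrt(2 * n * np.log(np.log(np.maximum(n, 3))))   # (1.5.1)
ratio = S / Lam
for k in (10_000, 100_000, 2_000_000):
    print(k, ratio[:k].max())        # creeps slowly toward 1 from below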
In what follows, I will use the notation

Λ_β = Λ_{[β]} and S̃_β = S_{[β]}/Λ_β for β ∈ [3, ∞),

where [β] denotes the integer part of β.
Proof: Let β ∈ (1, ∞) be given and, for each m ∈ ℕ and 1 ≤ n ≤ β^m, let α_{m,n} be a median (cf. (1.4.11)) of S_{[β^m]} − S_n. Noting that, by (1.4.12), |α_{m,n}| ≤ √(2β^m), we know that

lim sup_{n→∞} S̃_n = lim sup_{m→∞} max_{β^{m−1}≤n≤β^m} S̃_n ≤ β^{1/2} lim sup_{m→∞} max_{β^{m−1}≤n≤β^m} S_n/Λ_{β^m}
≤ β^{1/2} lim sup_{m→∞} max_{n≤β^m} (S_n + α_{m,n})/Λ_{β^m},

and therefore

P( lim sup_{n→∞} S̃_n ≥ a ) ≤ P( lim sup_{m→∞} max_{n≤β^m} (S_n + α_{m,n})/Λ_{β^m} ≥ aβ^{−1/2} ).
has under Q. Next, using the last part of (iii) in Exercise 1.3.18 with σ_k = X_k(ω), note that

λ_{[0,1)}( { t ∈ [0, 1) : | Σ_{n=1}^{2^m} R_n(t) X_n(ω) | ≥ a } ) ≤ 2 exp( −a² / (2 Σ_{n=1}^{2^m} X_n(ω)²) ), a ∈ [0, ∞) and ω ∈ Ω.
Hence, if

A_m ≡ { ω ∈ Ω : (1/2^m) Σ_{n=1}^{2^m} X_n(ω)² ≥ 2 }

and

F_m(ω) ≡ λ_{[0,1)}( { t ∈ [0, 1) : | Σ_{n=1}^{2^m} R_n(t) X_n(ω) | ≥ 2^{3/2} Λ_{2^m} } ),
52 1 Sums of Independent Random Variables
under the measure Q ≡ P × P0 . Since the Yn ’s are obviously (cf. part (i) of
Exercise 1.4.21) symmetric, the result which I have already proved says that
Sn (ω) − Sn (ω 0 ) 5
lim ≤ 22 ≤ 8 for Q-almost every (ω, ω 0 ) ∈ Ω × Ω0 .
n→∞ Λn
|Sn (ω)|
lim ≥ 8 + for P-almost every ω ∈ Ω;
n→∞ Λn
§ 1.5 Law of the Iterated Logarithm 53
But, again by Fubini’s Theorem, this would mean that there exists a {nm : m ∈
Sn (ω0 )
Z+ } ⊆ Z+ such that nm % ∞ and limm→∞ Λmn ≥ for P0 -almost every
m
ω 0 ∈ Ω0 , and obviously this contradicts
" #
2
P 0 Sn 1
E = −→ 0.
Λn 2 log(2) n
We have now got the crude statement alluded to above. In order to get the
more precise statement contained in (1.5.2), I will need the following application
of the results in § 1.3.
Lemma 1.5.6. Let {Xn : n ≥ 1} be a sequence of independent random
variables with mean value 0, variance 1, and common distribution µ. Further,
assume that (1.3.4) holds. Then, for each R ∈ (0, ∞) there is an N (R) ∈ Z+
such that
" r ! #
8R log(2) n
(1.5.7) P S̃n ≥ R ≤ 2 exp − 1 − K R2 log(2) n
n
for n ≥ N (R). In addition, for each ∈ (0, 1], there is an N () ∈ Z+ such that,
for all n ≥ N () and |a| ≤ 1 ,
1 h i
P S̃n − a < ≥ exp − a2 + 4K|a| log(2) n .
(1.5.8)
2
In both (1.5.7) and (1.5.8), the constant K ∈ (0, ∞) is the one in Theorem
1.3.15.
Proof: Set
12
2 log(2) (n ∨ 3)
Λn
λn = = .
n n
To prove (1.5.7), simply apply the upper bound in the last part of Theorem
1.3.15 to see that, for sufficiently large n ∈ Z+ ,
(Rλn )2
3
P S̃n ≥ R = P S n ≥ Rλn ≤ 2 exp −n
− K Rλn .
2
3This is Fubini at his best and subtlest. Namely, I am using Fubini to switch between hori-
zontal and vertical sets of measure 0.
54 1 Sums of Independent Random Variables
where an = aλn and n = λn . Thus, by the lower bound in the last part of
Theorem 1.3.15,
2
K an 2
P S̃n − a < ≥ 1 − 2 exp −n
+ K|an | n + an
nn 2
!
K h i
≥ 1− 2 exp − a2 + 2K|a| + a2 λn log(2) n
2 log(2) n
(Cf. Exercise 1.5.12 for a converse statement and §§ 8.4.2 and 8.6.3 for related
results.)
Proof: I begin with the observation that, because of (1.5.5), I may restrict
my attention to the case when the Xn ’s are bounded random variables. Indeed,
for any Xn ’s and any > 0, an easy truncation procedure allows us to find an
ψ ∈ Cb (R; R) such that Yn ≡ ψ ◦ Xn again has mean value 0 and variance 1
while Zn ≡ Xn − Yn has variance less than 2 . Hence, if the result is known
when the random variables are bounded, then, by (1.5.5) applied to the Zn ’s,
Pn
m=1 Zm (ω)
lim S̃n (ω) ≤ 1 + lim
≤ 1 + 8,
n→∞ n→∞ Λn
In view of the preceding, from now on I may and will assume that the Xn ’s
are bounded. To prove that limn→∞ S̃n ≤ 1 (a.s., P), let β ∈ (1, ∞) be given,
and use (1.5.7) to see that
1
h 1 i
P S̃β m ≥ β 2 ≤ 2 exp −β 2 log(2) β m
+
large m ∈ Z . Hence, by Lemma 1.5.3 with a = β, we see
for all sufficiently
that limn→∞ S̃n ≤ β (a.s., P) for every β ∈ (1, ∞). To complete the proof, I
must still show that, for every a ∈ (−1, 1) and > 0,
P lim S̃n − a < = 1.
n→∞
Because I want to get this conclusion as an application of the second part of
the Borel–Cantelli Lemma, it is important that we be dealing with independent
events, and for this purpose I use the result just proved to see that, for every
integer k ≥ 2,
lim S̃n − a ≤ inf lim S̃km − a
n→∞ k→∞ m→∞
Skm − Skm−1
= inf lim
− a P-almost surely.
k→∞ m→∞ Λk m
Thus, because the events
Skm − Skm−1
m ∈ Z+ ,
Ak,m ≡ − a < ,
Λkm
are independent for each k ≥ 2, all that I need to do is check that
X∞
P Ak,m = ∞ for sufficiently large k ≥ 2.
m=1
But
Λkm a Λkm
P Ak,m = P S̃km −km−1 −
< ,
Λkm −km−1 Λkm −km−1
and, because
Λkm
lim max+
− 1 = 0,
k→∞ m∈Z Λ m m−1
k −k
everything reduces to showing that
X∞
(*) P S̃km −km−1 − a < = ∞
m=1
for each k ≥ 2, a ∈ (−1, 1), and > 0. Finally, referring to (1.5.8), choose 0 > 0
so small that ρ ≡ a2 + 4K0 |a| < 1, and conclude that, when 0 < < 0 ,
1 h i
P S̃n − < ≥ exp −ρ log(2) n
2
for all sufficiently large n’s, from which (*) is easy.
56 1 Sums of Independent Random Variables
Remark 1.5.11. The reader should notice that the Law of the Iterated Log-
arithm provides a naturally occurring sequence of functions that converge in
measure but not almost everywhere. Indeed, it is obvious that S̃n −→ 0 in
L2 (P; R), but the Law of the Iterated Logarithm says that S̃n : n ≥ 1 is
wildly divergent when looked at in terms of P-almost sure convergence.
Exercises for § 1.5
In this exercise I4 will outline a proof that X1 is P-square integrable, EP X1 = 0,
and
Sn Sn 1
(1.5.14) lim = − lim = EP X12 2 (a.s., P).
n→∞ Λn n→∞ Λn
(i) Using Lemma 1.4.1, show that there is a σ ∈ [0, ∞) such that
Sn
(1.5.15) lim =σ (a.s., P).
n→∞ Λn
1 Sn Sn
σ = EP X12 2 = lim = − lim (a.s., P).
n→∞ Λn n→∞ Λn
In other words, everything comes down to proving that (1.5.13) implies that X1
is P-square integrable.
(ii) Assume that the Xn ’s are symmetric. For t ∈ (0, ∞), set
X̌1t , . . . , X̌nt , . . .
and X1 , . . . , X n , . . .
4I follow Wm. Feller “An extension of the law of the iterated logarithm to variables without
variance,” J. Math. Mech., 18 #4, pp. 345–355 (1968), although V. Strassen was the first to
prove the result.
Exercises for § 1.5 57
have the same distribution. Conclude first that, for all t ∈ [0, 1),
Pn
m=1 Xn 1[0,t] |Xn |
lim ≤ σ (a.s., P),
n→∞ Λn
where σ is the number in (1.5.15), and second that
h i
EP X12 = lim EP X12 , X1 ≤ t ≤ σ 2 .
t%∞
Xn + X̌nt
Xn 1[0,t] |Xn | = ,
2
and apply part (i).
(iii) For general {Xn : n ≥ 1}, produce an independent copy {Xn0 : n ≥ 1} (as
in the proof of Lemma 1.5.4), and set Yn = Xn − Xn0 . After checking that
Pn
| m=1 Ym |
lim ≤ 2σ (a.s., P),
n→∞ Λn
conclude
first that EP Y12 ≤ 4σ 2 and then (cf. part
(i) of Exercise1.4.27) that
EP X12 < ∞. Finally, apply (i) to arrive at EP X1 = 0 and (1.5.14).
Exercise 1.5.16. Let {s̃n : n ≥ 1} be a sequence of real numbers which possess
the properties that
lim s̃n = 1, lim s̃n = −1, and lim s̃n+1 − s̃n = 0.
n→∞ n→∞ n→∞
Show that the set of subsequential limit points of {s̃n : n ≥ 1} coincides with
[−1, 1]. Apply this observation to show that, in order to get the final statement in
Theorem 1.5.9, I need only have proved (1.5.10) for the function f (x) = x, x ∈ R.
Hint: In proving the last part, use the square integrability of X1 to see that
∞ 2
X Xn
P ≥ 1 < ∞,
n=1
n
and apply the Borel–Cantelli Lemma to conclude that S̃n − S̃n−1 −→ 0 (a.s., P).
Exercise 1.5.17. Let {Xn : n ≥ 1} be a sequence of RN -valued, identically
distributed random variables on (Ω, F, P) with the property that, for each e ∈
SN −1 = {x ∈ RN : |x| = 1}, e, X1 RN has mean value 0 and variance 1. Set
Pn Sn
Sn = m=1 Xm and S̃n = Λ n
, and show that limn→∞ |S̃n | = 1 P-almost surely.
Here are some steps that you might want to follow.
58 1 Sums of Independent Random Variables
Show that
|s̃n | ≤ max e, s̃n RN + ,
1≤k≤`
and conclude first that limn→∞ |s̃n | ≤1 + and then that limn→∞ |s̃n | ≤ 1.
At the same time, since |s̃n | ≥ e1 , s̃n RN , show that limn→∞ |s̃n | ≥ 1. Thus
limn→∞ |s̃n | = 1.
(iii) Let {ek : k ≥ 1} be as in (i), and apply Theorem 1.5.9 to show that, for
P-almost all ω ∈ Ω, the sequence {S̃n (ω) : n ≥ 1} satisfies the condition in (i).
Thus, by (ii), limn→∞ |S̃n (ω)| = 1 for P-almost every ω ∈ Ω.
Chapter 2
The Central Limit Theorem
In the preceding chapter I dealt with averages of random variables and showed
that, in great generality, those averages converge almost surely or in probability
to a constant. At least when all the random variables have the same distribution
and moments of all orders, one way of rationalizing this phenomenon is to rec-
ognize that the mean value is conserved whereas all higher moments are driven
to 0 when one averages. Of course, the reason why it is easy to conserve the
first moment is that the mean of the sum is the sum of the means. Thus, if one
is going to attempt to find a simple normalization procedure that conserves a
quantity involving more than the mean value, one should seek a quantity that
shares this additivity property.
With this in mind, one is led to ask what happens if one normalizes in a way
that conserves the variance. For this purpose, suppose that {Xn : n ∈ Z+ } is a
sequence of mutually independent, identicallyP distributed random variables with
n 1
mean value 0 and variance 1, and set Sn = 1 Xk . Then S̆n ≡ n− 2 Sn again
has mean value 0 and variance 1. On the other hand, because of Theorem 1.5.9,
we know that, with probability 1, limn→∞ S̆n = ∞ = − limn→∞ S̆n . Hence,
from the point of view of either almost sure convergence or even convergence in
probability, there is no hope that the S̆n ’s will converge.
Nonetheless, the random variables {S̆n : n ≥ 1} possess remarkable stability
when viewed from a distributional perspective. Indeed, if the Xn ’s are Gaussian,
then so are the S̆n ’s, and therefore S̆n ∈ N (0, 1) for all n ≥ 1. More generally,
even if the Xn ’s are not Gaussian, fixing their mean value and variance in this
way forces all their moments to stabilize. To be precise, assume that X1 has finite
moments of all orders, that its mean is 0, and that its variance is 1. Trivially,
L1 ≡ limn→∞ EP [S̆n ] = 0 and L2 ≡ limn→∞ EP [S̆n2 ] = 1. Next, assume that
L` ≡ limn→∞ EP [S̆n` ] exists for 1 ≤ ` ≤ m, where m ≥ 2. I will show now that
Lm+1 ≡ limn→∞ EP [S̆nm+1 ] exists and is equal to mLm−1 . To this end, first note
that, since EP [Xn ] = 0 and the Xn ’s are independent and identically distributed,
m
m+1
P P
m X m P j+1 P m−j
E Sn = nE Xn Xn + Sn−1 =n E Xn E Sn−1
j=0
j
59
60 2 The Central Limit Theorem
m
m−1
P
X m P j+1 P m−j
= nmE Sn−1 + n E Xn E Sn−1 .
j=2
j
m+1
Thus, after dividing through by n 2 , one gets the desired conclusion when
n → ∞. Starting from L1 = 0 and L2 = 1, one now can use induction to check
Qm
that L2m−1 = 0 and L2m = `=1 (2` − 1) = 2(2m)! +
m m! for all m ∈ Z . That is,
m
Y (2m)!
lim EP S̆n2m−1 = 0 lim EP S̆n2m =
and (2` − 1) = m ,
n→∞ n→∞ 2 m!
`=1
Notice that when the Xk ’s are identically distributed and have variance 1, the
S̆n in (2.1.2) is consistent with the notation used above. Finally, set
n
σm 1 X P h 2 i
(2.1.3) rn = max and gn () = E Xm , X m
≥ Σ n
1≤m≤n Σn Σ2n m=1
§ 2.1 The Basic Central Limit Theorem 61
1
for > 0. Clearly, in the identically distributed case, rn = n− 2 and
h 1
i
gn () = σ1−2 EP X12 , |X1 | ≥ n 2 σ1 −→ 0 as n → ∞ for each > 0.
In particular, because
and observe that T̆n is again an N (0, 1)-random variable and therefore that
∆ ≡ EP ϕ(S̆n ) − hϕ, γ0,1 i = EP ϕ(S̆n ) − EP ϕ(T̆n ) .
Xk
Further, set X̆k = Σn , and define
X X
Um = Y̆k + X̆k for 1 ≤ m ≤ n,
1≤k≤m−1 m+1≤k≤n
where a sum over the empty set is taken to be 0. It is then clear that
n
X
∆≤ ∆m where ∆m ≡ EP ϕ Um + X̆m − EP ϕ Um + Y̆m .
1
Moreover, if
ξ 2 00
Rm (ξ) ≡ ϕ Um + ξ − ϕ(Um ) − ξϕ0 (Um ) −
2 ϕ (Um ), ξ ∈ R,
62 2 The Central Limit Theorem
then (because both X̆m and Y̆m are independent of Um and have the same first
two moments)
∆m = EP Rm (X̆m ) − EP Rm (Y̆m )] ≤ EP Rm (X̆m ) + EP Rm (Y̆m ) .
In order to complete the derivation of (2.1.5), note that, by Taylor’s Theorem,
3
Rm (ξ) ≤
ϕ000
|ξ| ∧
ϕ00
|ξ|2 ;
u 6 u
while
n n 3
X kϕ000 ku P X 3
σm 3 4 rn kϕ000 ku
E |Y1 |3
EP |Rm (Y̆n )| ≤ ≤ .
1
6 1
Σ3n 6
The condition that gn () −→ 0 for each > 0 is often called Lindeberg’s
condition because it introduced by J. Lindeberg and it was he who proved that
it is a sufficient condition for (2.1.1) to hold for all (cf. Theorem 2.1.8) ϕ ∈
Cb (RN ; C). Later, Feller proved that (2.1.1) for all ϕ ∈ Cb (RN ; R) plus rn → 0
imply that Lindeberg’s condition holds. Together, these two results are known
as the Lindeberg–Feller Theorem. See Exercise 2.3.20 for a proof of Feller’s
part.
§ 2.1.2. The Central Limit Theorem. If one is not concerned about rates
of convergence, then the differentiability requirement on ϕ can be dropped from
the last part of Theorem 2.1.4. In order to understand the reason for this, it is
helpful to couch the statement of Theorem 2.1.4 entirely in terms of measures.
Thus, let µn denote the distribution of S̆n . Then, under Lindeberg’s condition,
Theorem 2.1.4 allows one to say that hϕ, µn i −→ hϕ, γ0,1 i for all ϕ ∈ C 3 (RN ; C)
with bounded second and third order derivatives. Because we are dealing with
statements about integration and integration is a very forgiving operation, this
sort of result self-improves. To be precise, I prove the following lemma.
§ 2.1 The Basic Central Limit Theorem 63
−N ρ(−1 x) for > 0. Also, choose η ∈ Cc∞ B(0, 2); [0, 1] so that η = 1 on
B(0, 1), and set ηR (x) = η(R−1 x) for R > 0.
Begin by noting that hϕ, µn i −→ hϕ, µi for all ϕ ∈ Cc∞ (RN ; C). Next, suppose
that ϕ ∈ Cc (RN ; C), and, for > 0, set ϕ = ρ ? ϕ, the convolution
Z
ρ (x − y)ϕ(y) dy
RN
of ρ with ϕ. Then, for each > 0, ϕ ∈ Cc∞ (RN ; C) and therefore hϕ , µn i −→
hϕ , µi. In addition, there is an R > 0 such that supp(ϕ ) ⊆ B(0, R) for all
∈ (0, 1]. Hence,
lim hϕ, µn i − hϕ, µi ≤ 2hηR , µikϕ − ϕku .
n→∞
Since lim&0 kϕ − ϕku = 0, we have now shown that hϕ, µn i −→ hϕ, µi for all
ϕ ∈ Cc (RN ; C).
Now suppose that ψ ∈ C RN ; [0, ∞) , and set ψR = ηR ψ, where ηR is as
above. Then, for each R > 0, hψR , µi = limn→∞ hψR , µn i ≤ limn→∞ hψ, µn i.
Hence, by Fatou’s Lemma, hψ, µi ≤ limR→∞ hψR , µi ≤ limn→∞ hψ, µn i.
Finally, suppose that ψ ∈ C RN ; [0, ∞) is µn -integrable for each n ∈ Z+ and
that hψ, µn i −→ hψ, µi ∈ [0, ∞). Given {ϕn : n ≥ 1} ⊆ C(RN ; C) satisfying
|ϕn | ≤ Cψ and converging uniformly on compacts to ϕ, one has
hϕn , µn i − hϕ, µi ≤ hϕn − ϕ, µn i + hϕ, µn i − hϕ, µi.
1 A Borel measure on a topological space is locally finite if it gives finite measure to compacts.
64 2 The Central Limit Theorem
and similarly
lim hϕ, µn i − hϕ, µi
n→∞
≤ lim hηR ϕ, µn i − hηR ϕ, µi + C lim h(1 − ηR )ψ, µn i + Ch(1 − ηR )ψ, µi
n→∞ n→∞
= 2Ch(1 − ηR )ψ, µi.
as k → ∞, and, similarly,
Z
lim P a ≤ S̆n ≤ b ≤ lim EP ψk S̆n = ψk (y) γ0,1 (dy) −→ γ0,1 [a, b] .
n→∞ n→∞ R
Finally, note that γ0,1 (a, b) = γ0,1 [a, b] .
Exercises for § 2.1 65
lim EP S̆n2 ∧ R2 ≤ 1
for every R ∈ [0, ∞).
n→∞
In particular, by Lemma 2.1.7, this will certainly be the case whenever (2.1.1)
holds for every ϕ ∈ Cc (R; R). The purpose of this exercise is to show that the
Xn ’s are P-square integrable, have mean value 0, and variance no more than 1;
and the method which I will use is based on the same line of reasoning as was
given in Exercise 1.5.12.
(i) Assuming that X1 ∈ L2 (P; R), show that EP X1 = 0 and EP X12 ≤ 1. In
particular, use this together with the result in part (i) of Exercise 1.4.27 to see
that it suffices to handle the case when the Xn ’s are symmetric.
(ii) In this and the succeeding parts of this exercise, we will be assuming that
the Xn ’s are symmetric. Following the same route as was suggested in (ii) of
Exercise 1.5.12, set
and recall that X̌1t , . . . , X̌nt , . . . and X1 , . . . , Xn , . . . have the same distribu-
tion for each t ∈ (0, ∞). Use this together with our basic assumption to see that
limR→∞ sup n∈Z+ P An (t, R) = 0, where
t∈(0,∞)
( n n )
X X 1
X̌kt ≥ n 2 R .
An (t, R) ≡ Xk ∨
1 1
After noting that the Xn 1[0,t] |Xn | ’s are symmetric, check (cf. the proof of
Theorem 1.3.1) that EP |S̆nt |4 ≤ 3t4 . In particular, conclude that, for each
t ∈ (0, ∞), there is an R(t) ∈ (0, ∞) such that
1 1
EP |S̆nt |2 , An t, R(t) ≤ 3 2 t2 P An t, R(t) 2 ≤ 1 for all n ∈ Z+ .
66 2 The Central Limit Theorem
(iv) Given t ∈ (0, ∞), choose R(t) ∈ (0, ∞) as in the preceding. Taking into
account the identity Pn Pn t
t 1 Xk + 1 X̌k
S̆n = 1 ,
2n 2
show that
After checking that T2 maps P into itself, use The Central Limit Theorem to
show that, for every µ ∈ P,
Z Z
n
lim ϕ d T2 µ = ϕ dγ0,1 , ϕ ∈ Cb (R; C).
n→∞ R R
Conclude, in particular, that γ0,1 is the one and only element µ of P with the
property that T2 µ = µ and that this fixed point is attracting. (See Exercise
2.3.21 for more information.)
Exercise 2.1.12. Here is another indication of the remarkable stability of nor-
mal random variables. Namely, I will outline here a derivation2 of the Lévy–
Cramér Theorem which says that if X and Y are independent random vari-
ables whose sum is normal (with some mean and variance), then both X and Y
are normal.
2 This derivation is based on a note by Z. Sasvári, who himself borrowed some of the ideas
from A. Rényi. I know of no derivation that does not rely on complex analysis and would be
very interested in learning one.
Exercises for § 2.1 67
R2
P |X| ≥ r + R ∨ P |Y | ≥ r + R ≤ 4 exp − , R ∈ (0, ∞).
2
In particular,
show that the moment generating
functions z ∈ C 7−→ M (z) =
EP ezX ∈ C and z ∈ C 7−→ N (z) = EhP eizY ∈ C exist and are entire functions.
2
Further, note that M (z)N (z) = exp z2 , and conclude that M and N never
vanish. Finally, from the fact that X + Y has mean 0, show that one can reduce
to the case in which both X and Y have mean 0. Thus, from now on, we assume
that M 0 (0) = 0 = N 0 (0).
(iii) Because M never vanishes and M (0) = 1, elementary complex analysis (cf.
Lemma 3.2.3) guarantees that there is a unique entire function θ : C −→ C such
that θ(0) = 0 and M (z) = eθ(z) for all z ∈ C. Further, from M 0 (0) = 0, note
that θ0 (0) = 0. Thus,
∞
dn
X
cn z n P
xX
θ(z) = where n!cn = log E e ∈ R.
n=2
dxn
x=0
h i
z2
Finally, note that N (z) = exp 2 − θ(z) .
and h i h 2 i
z2
= EP ezY ≤ exp x2 − θ(x)
exp Re 2 − θ(z)
to arrive at
√
−y 2 ≤ 2Re θ(z) ≤ x2
for z = x + −1 y ∈ C.
68 2 The Central Limit Theorem
while, on the other hand (since θ(z) = θ z̄) and therefore ∂z θ(z) = 0),
Z 2π √ √
e− −1 nθ
0= θ re −1 θ dθ.
0
Hence,
Z 2π √ √
1
n
cn r = Re θ re −1 θ e− −1 nθ dθ, n ∈ Z+ and r > 0.
π 0
Finally, in combination with the estimate obtained in (iv) and the fact that
c0 = c1 = 0, this leads to the conclusion that cn = 0 for n 6= 2 and therefore
that θ(z) = c2 z 2 with 0 ≤ c2 ≤ 12 .
Exercise 2.1.13. An important result that is closely related to The Central
Limit Theorem is the following observation, which occupies a central position in
the development of classical statistical mechanics.3
(i) For each n ∈ Z+ , let λn denote the normalized surface measure on the
(n − 1)-dimensional sphere
√ 1
Sn−1 n = x ∈ Rn : |x| = n 2 ,
(1)
and denote by λn the distribution of the coordinate x1 under λn . Check that,
(1)
when n ≥ 2, λn (dt) = fn (t) dt, where
n−3
t2
ωn−2 2
1
fn (t) = 1 1− 1(−1,1) n− 2 t ,
n ωn−1
2 n
3Although E. Borel seems to have thought he was the first to discover this result and rhap-
sodizes about it a good deal in “Sur les principes de la cinétique des gaz,” Ann. l’École Norm.
sup., 3e t. 23, it appears already in the 1866 article “Über die Entwicklungen einer Funktion
von beliebig vielen Variabeln nach Laplaceshen Funktionen höherer Ordnung,” J. Reine u.
Angewandte Math., by F. Mehler and is only a small part of what Mehler discovered there. Be
that as it may, Borel deserves credit for recognizing the significance of this result for statistical
mechanics.
Exercises for § 2.1 69
and ωk−1 denotes the surface area of the (k − 1)-dimensional unit sphere in Rk .
Using polar coordinates to compute the right-hand side of
Z
k |x|2
(2π) 2 = e− 2 dx,
Rk
ωn−2 1
1 −→ √ as n → ∞.
n ωn−1
2 2π
Now, using g to denote the density for the standard Gauss distribution (i.e., the
Gauss kernel in (1.3.5)), apply these computations to show that
fn (t) fn (t)
sup sup < ∞ and that −→ 1 uniformly on compacts.
n≥3 t∈R g(t) g(t)
(ii) A less computational approach to the same calculation is the following. Let
{Xn : n p≥ 1} be a sequence of independent N (0, 1) random
variables, and set
2 2
Rn = X1 + · · · + Xn . First note that P Rn = 0 = 0 and then that the
distribution of 1
n 2 X1 , . . . , Xn
θn ≡
Rn
R2
is λn . Next, use The Strong Law of Large Numbers to see that nn −→ 1 (a.s., P)
and conclude that, for any N ∈ Z+ ,
lim EP ϕ θn(N ) = EP ϕ X1 , . . . , XN , ϕ ∈ Cc RN ; R ,
n→∞
(N )
where, for n ≥ N , θn ∈ RN denotes the projection of θn ∈ Rn onto its first
(N )
N coordinates. Conclude that if λn on RN , BRN denotes the distribution of
x = (x1 , . . . , xn ) ∈ Rn 7−→ x(N ) ≡ x1 , . . . , xN ∈ RN under λn , then
Z Z
(N ) N
for all ϕ ∈ Cb RN ; C .
lim ϕ dλn = ϕ dγ0,1
n→∞ RN RN
70 2 The Central Limit Theorem
(iii) By considering the case when N = 2, show that, for any ϕ ∈ Cb (R; R),
Z n Z !2
1X
(2.1.15) lim ϕ xk − ϕ dγ0,1 λn (dx) = 0.
n→∞
√
n R
k=1
Sn−1 ( n)
Notice that the non-computational argument has the advantage that it immedi-
(N )
ately generalizes the earlier result to cover λn for all N ∈ Z+ , not just N = 1
(cf. Exercise 2.3.24). On the other hand, the conclusion is weaker in the sense
that convergence of the densities has been replaced by convergence of integrals
with bounded continuous integrands and that no estimate on the rate of con-
vergence is provided. More work is required to restore the stronger statements
when N ≥ 2.
When couched in terms of statistical mechanics, this result can be interpreted
as a derivation of the Maxwell distribution of velocities for an ideal gas of free
particles of mass 2 and having average energy 1.
Exercise 2.1.16. The most frequently encountered applications of Stirling’s
formula (cf. (1.3.21)) are to cases when t ∈ Z+ . That is, one is usually interested
in the formula
√ n n
(2.1.17) n! ∼ 2πn .
e
and clearly (2.1.17) follows from these. In fact, if one applies the Berry–Esseen
estimate proved in the next section, one finds that
√ n n
2πn 1
e
= 1 + O n− 2 .
n!
However, this last observation is not very interesting since we saw in Exercise
1.3.19 that the true correction term is of order n−1 .4
§ 2.2 The Berry–Esseen Theorem via Stein’s Method
As we will see in the next section, the principles underlying the passage from
Theorem 2.1.4 to Theorem 2.1.8 are very general. In fact, as we will see in
Chapter 9, some of these principles can be formulated in such a way that they
extend to a very abstract setting. However, rather than delve into such exten-
sions here, I will devote this section to a closer examination of the situation at
hand. Specifically, in this section we are going to see how to make the final part
of Theorem 2.1.8 quantitative.
From (2.1.5), we get a rate of convergence in terms of the second and third
derivatives of ϕ. In fact, if we assume that
1
τk ≡ EP |Xk |3 3 < ∞,
(2.2.1) 1 ≤ k ≤ n,
To see how (2.1.5) and (2.2.2) must be modified in order to gain such information,
first observe that
Z
ϕ0 (x) Fn (x) − G(x) dx
R
(2.2.5) Z
ϕ(y) γ0,1 (dy), ϕ ∈ Cb1 (R; R .
P
= E ϕ(S̆n ) −
R
(To prove (2.2.5), reduce to the case in which ϕ ∈ Cc1 (R; R) and ϕ(0) = 0;
and for this case apply either Fubini’s Theorem or integration by parts over
the intervals (−∞, 0] and [0, ∞) separately.) Hence, in order to get information
about the distance between Fn and G, we will have to learn how to replace
the right-hand sides of (2.1.5) and (2.2.2) with expressions that depend only on
the first derivative of ϕ. For example, if the dependence is on kϕ0 ku , then we
get information about the L1 (R; R) distance between Fn and G, whereas if the
dependence is on kϕ0 kL1 (R;R) , then the information will be about the uniform
distance between Fn and G.
§ 2.2.1. L1 -Berry–Esseen. The basic idea that I will use to get estimates in
terms of ϕ0 was introduced by C. Stein and is an example of a procedure known
as Stein’s method.1 In the case at hand, his method stems from the trivial
observation that if µ is a Borel probability measure
on R and g is the Gauss
kernel in (1.3.5), then µ = γ0,1 if and only if ∂ µg = 0 in the sense of Schwartz
distribution theory. Equivalently, if A+ is the raising operator
D §E
(cf. 2.4.1) given
µ
by A+ ϕ(x) = xϕ(x) − ∂ϕ(x), then, because hA+ ϕ, µi = ϕg, ∂ g , µ = γ0,1
if and only if hA+ ϕ, µi = 0 for sufficiently many test functions ϕ. In fact, as
will be shown in what follows, µ will be close to γ0,1 if, in an appropriate sense,
hA+ ϕ, µi is small.
To make mathematics out of the preceding, I will need the following.
Lemma 2.2.6. Let ϕ ∈ C 1 (R; R), assume that kϕ0 ku < ∞, set ϕ̃ = ϕ−hϕ, γ0,1 i,
and define
Z x
x2 t2
(2.2.7) x ∈ R 7−→ f (x) ≡ e 2 ϕ̃(t)e− 2 dt ∈ R.
−∞
and
Proof: The facts that f ∈ C 1 (R; R) and that (2.2.9) holds are elementary
applications of The Fundamental Theorem of Calculus. Moreover, knowing that
f ∈ C 1 (R; R) and using (2.2.9), we see that f ∈ C 2 (R; R) and, in fact, that
To prove the estimates in (2.2.8), first note that, because ϕ̃ and therefore f are
unchanged when ϕ is replaced by ϕ − ϕ(0), I may and will assume that ϕ(0) = 0
and therefore that |ϕ(t)| ≤ kϕ0 ku |t|. In particular, this means that
Z Z q
ϕ dγ0,1 ≤ kϕ0 ku |t| γ0,1 (dt) = kϕ0 ku 2 .
π
R R
t2
ϕ̃(t)e− 2 dt = 0, an alternative expression for f
R
Next, observe that, because R
is Z ∞
x2 t2
f (x) = −e 2 ϕ̃(t)e− 2 dt, x ∈ R.
x
Thus, by using the original expression for f (x) when x ∈ (−∞, 0) and the
alternative one when x ∈ [0, ∞), we see first that
Z ∞
x2 2
ϕ̃ −t sgn(x) e− t2 dt, x ∈ R,
|f (x)| ≤ e 2
|x|
But, since
Z ∞ Z ∞
d x2
− t2 x2 t2
e2 e 2 dt ≤ e 2 t e− 2 dt − 1 = 0 for x ∈ [0, ∞),
dx x x
which means that I have now proved the first estimate in (2.2.8). To prove the
other two estimates there, derive from (2.2.10)
d − x2 0 x2
e 2 f (x) = e− 2 f (x) + ϕ0 (x)
dx
74 2 The Central Limit Theorem
Thus, reasoning as I did above and using the first estimate in (2.2.8) and the
relations in (2.2.9), (2.2.10), and (2.2.11), one arrives at the second and third
estimates in (2.2.8).
I now have the ingredients needed to apply Stein’s method to the following
example of a Berry–Esseen type of estimate.
Theorem 2.2.12 (L1 -Berry–Esseen Estimate). Continuing in the setting
of Theorem 2.1.4, one has that for all > 0 (cf. (2.1.3), (2.2.3), and (2.2.4))
√
(2.2.13)
Fn − G
1 ≤ 6(rn + ) + 3 2π gn (2).
L (R;R)
6 + 2τ 3 8τ 3
Fn − G
1
L (R;R)
≤ √ ≤ √ .
n n
Proof: Let ϕ ∈ C 1 (R; R) having bounded first derivative be given, and define
f accordingly, as in (2.2.7). Everything turns on the equality in (2.2.9). Indeed,
because of that equality, we know that the right-hand side of (2.2.5) is equal to
n
X
EP f 0 (S̆n ) − EP S̆n f (S̆n ) = E f 0 (S̆n ) − EP X̆m f (S̆n ) ,
2 P
σ̆m
m=1
σm Xm
where I have set σ̆m = Σn and X̆m = Σn . Next, define
Z 1 2 0
EP X̆m f (S̆n ) = EP X̆m f T̆n,m (t) dt
0
Z 1
E f 0 T̆n,m (0) +
2 0
2 P
f (T̆n,m (t) − f 0 T̆n,m (0) dt
= σ̆m EP X̆m
0
§ 2.2 The Berry–Esseen Theorem via Stein’s Method 75
where h i
Am ≡ EP f 0 S̆n ) − f 0 T̆n,m (0)
and h i
2
f 0 (T̆n,m (t) − f 0 T̆n,m (0)
Bm (t) ≡ EP X̆m .
Obviously, by Taylor’s Theorem and Hölder’s Inequality, for each 1 ≤ m ≤ n,
00 τm
(*) |Am | ≤ σ̆m kf ku ≤ rn ∧ kf 00 ku
Σn
while, for each t ∈ [0, 1] and > 0,
kf 0 ku h 2 i
2
kf 00 ku + 2 2 EP Xm
Bm (t) ≤ 2tσ̆m , |Xm | ≥ 2Σn .
Σn
Thus, after summing over 1 ≤ m ≤ n, integrating with respect to t ∈ [0, 1], and
using (2.2.5), (2.2.15), and (*), we arrive at
Z
ϕ0 (x) Fn (x) − G(x) dx ≤ rn + kf 00 ku + 2gn (2)kf 0 ku ,
R
p π Let
Lemma 2.2.16. ϕ ∈ C 1 (R; R), and define f accordingly, as in (2.2.7).
Then kf ku ≤ 8 kϕ kL1 (R;R) and kf 0 ku ≤ kϕ0 kL1 (R;R) .
0
Proof: I will assume, throughout, that kϕ0 kL1 (R;R) = 1. Observe that, by the
Fundamental Theorem of Calculus, (cf. the notation in Lemma 2.2.6)
Z
ϕ̃(x) = − ϕ̃y (x) ϕ0 (y) dy, where ϕy = 1(−∞,y] ,
R
and √ x2
2πxe 2 G(x ∧ y) − G(x)G(y) + 1(−∞,y] (x) − G(y) ≤ 1
which proves the first inequality. To get the second one, it suffices to consider
each of the four cases 0 ≤ x ≤ y, x ≥ 0 & y < x, y < x < 0, and x < 0 & y ≥ x
separately and take into account that, from the first part of (2.2.11),
√ x2 √ x2
x ≥ 0 =⇒ 2πxe 2 1−G(x) ≤ 1 and x < 0 =⇒ 2π|x|e 2 G(x) ≤ 1.
3
Pn 3 max τm
1 τm 1≤m≤n
(2.2.19) kFn − Gku ≤ 10 3 ≤ 10 √ .
n2 n
Proof: For each n ∈ Z+ , let βn denote the smallest number β with the property
that Pn 3
τ
kFn − Gku ≤ β 1 3 m
Σn
for all choices of random variables satisfying the hypotheses under which (2.2.18)
is to be proved. My strategy is to give an inductive proof that βn ≤ 10 for all
n ∈ Z+ ; and, because Σ1 ≤ τ1 and therefore β1 ≤ 1, I need only be concerned
with n ≥ 2.
Given n ≥ 2 and X1 , . . . , Xn , define X̆m , σ̆m , and T̆n,m (t) for 1 ≤ m ≤ n and
t ∈ [0, 1] as in the proof of Theorem 2.2.12. Next, for each 1 ≤ m ≤ n, set
n X τ` 3
p τm X
3
Σn,m = Σ2n − σm
2 , τ̆m = , ρn = τ̆m , and ρn,m = .
Σn 1
Σn,m
1≤`≤n
`6=m
Finally, set
X Sn,m
Sn,m = X` and S̆n,m = ,
Σn,m
1≤`≤n
`6=m
and let x ∈ R 7−→ Fn,m (x) ≡ P S̆n,m ≤ x ∈ [0, 1] denote the distribution
function for S̆n,m . Notice that, by definition, kFn,m − Gku ≤ βn−1 ρn,m for each
1 ≤ m ≤ n. Furthermore, because (cf. (2.1.3))
3
Σ2n,m
2 Σn
= 1 − σ̆m ≥ 1 − rn2 and ρn,m ≤ ρn ,
Σ2n Σn,m
Now let ϕ ∈ Cb2 (R; R) with kϕ00 kL1 (R) < ∞ be given, define f accordingly, as
in (2.2.7), and let
Z 1
E X̆m ϕ0 T̆n,m (ξ) dξ
P
+
0
Σn,m 0
kf ku + max EP X̆m ϕ0 T̆n,m (ξ)
≤ σ̆m kf ku +
Σn ξ∈[0,1]
kf ku + kf 0 ku + max EP X̆m ϕ0 T̆n,m (ξ) .
≤ σ̆m
ξ∈[0,1]
Similarly, from (2.2.9)) and the independence of X̆m from T̆m,n (0), one sees that
|Bm (t)| is dominated by
h i h 2 i
3
tEP X̆m f T̆n,m (t) + EP X̆m T̆n,m (0) f T̆n,m (t) − f T̆n,m (0)
h i
2
+ EP X̆m ϕ T̆n,m (t) − ϕ T̆n,m (0)
Z 1
P 3 0
+t E X̆m ϕ T̆n,m (tξ) dξ
0
kf ku + kf 0 ku + t max EP X̆m
3
3 0
≤ tτ̆m ϕ T̆n,m (ξ) .
ξ∈[0,1]
In order to handle the second term in the last line of each of these calculations,
introduce the function
0 Σn,m
(ξ, ω, y) ∈ [0, 1] × Ω × R 7−→ ψ(ξ, ω, y) ≡ ϕ ξ X̆m (ω) + y ∈ R.
Σn
§ 2.2 The Berry–Esseen Theorem via Stein’s Method 79
and
0
kϕ k 1
L (R;R) βn−1 ρn
3
|Bm (t)| ≤ tτ̆m kf ku + kf 0 ku + 12 +
00
3 kϕ kL1 (R;R)
2 2
(1 − rn )
2π(1 − rn ) 2
for all 1 ≤ m ≤ n and t ∈ [0, 1], and, after putting these together with (2.2.5)
and (2.2.15), we conclude that
Z
ϕ0 (y) G(y) − Fn (y) dy
R
3
(2.2.21) ≤ kf ku + kf 0 ku
2
kϕ0 kL1 (R;R) βn−1 kϕ00 kL1 (R;R) ρn
+ 1 + 3 ρn .
2π(1 − r2 ) 2 (1 − rn2 ) 2
n
80 2 The Central Limit Theorem
and define
Z
−1
η −1 y h(x − y) dy
h (x) = for > 0 and x ∈ R,
R
and set
ϕ,L (x) = h x−a
Lρn , x ∈ R and , L > 0.
It is then an easy matter to check that kϕ0,L kL1 (R;R) = 1 while kϕ00,L kL1 (R;R) ≤
2
Lρn . Hence, by plugging the estimates from Lemma 2.2.16 into (2.2.21) and
then letting & 0, we find that, for each L > 0,
1 Z a+Lρn
sup G(y) − Fn (y) dy
a∈R Lρn a
(2.2.22)
r
3 π 1 2βn−1
≤ 1+ + 1 + 3 ρn .
2 8 2π(1 − r2 ) 2 (1 − rn2 ) 2 L
n
But Z a Z a+Lρn
1 1
Fn (y) dy ≤ Fn (a) ≤ Fn (y) dy,
Lρn a−Lρn Lρn a
while
Z a+Lρn Z a+Lρn
1 1 Lρn
0≤ G(y) dy − G(a) = (a + Lρn − y) γ0,1 (dy) ≤ √ ,
Lρn a Lρn a 8π
and, similarly, Z a
1 Lρn
0 ≤ G(a) − G(y) dy ≤ √ .
Lρn a−Lρn 8π
Thus, from (2.2.22), we first obtain, for each L ∈ (0, ∞),
r
3 9π 3 3βn−1 L
kFn − Gku ≤ + + 1 + 3 + 1 ρn ,
2 32 8π(1 − rn2 ) 2 (1 − rn2 ) 2 L (8π) 2
Exercises for § 2.2 81
In order to complete the proof starting from (2.2.23), we have to consider the
1 1
two cases determined by whether ρn ≥ 10 or ρn < 10 . Because kFn − Gku ≤ 1,
it is obvious that we can take βn ≤ 10 in the first case. On the other hand, if
1
ρn ≤ 10 and we assume that βn−1 ≤ 10, then, because
n n n
1 X P 3 1 X P 2 32 X
3
≥ rn3 ,
ρn = 3 E |Xm | ≥ 3 E Xm = σ̆m
Σn 1 Σn 1 1
(2.2.23) says that kFn − Gku ≤ 10ρn . Hence, in either case, βn−1 ≤ 10 =⇒
βn ≤ 10.
It is clear from the preceding derivation (in particular, the final step) that the
constant 10 appearing in (2.2.18) and (2.2.19) can be replaced by the smallest
β > 1 that satisfies the equation
r r r
3 9π 9 1
− 23 − 2 4 18 1 2 − 3
β= + + 1−β + β 2 1 − β− 3 4 .
2 32 8π π
Numerical experimentation indicates that 10 is quite a good approximation to
the actual solution of this minimization problem. However, it should be rec-
ognized that, with sufficient diligence and entirely different techniques, one can
show that the 10 in (2.2.18) can be replaced by a number that is less than 1.
Thus, I do not claim that Stein’s method gives the best result, only that it gives
whatever it gives with relatively little pain.
Exercises for § 2.2
1, and set ρ (x) = −N ρ(−1 x) for ∈ (0, ∞). Next, define ψ for ∈ (0, ∞) to
be the convolution ρ ? µ of ρ with µ. That is,
Z
ψ (x) = ρ (x − y) µ(dy) for x ∈ RN .
RN
It is then easy to check that ψ ∈ Cb RN ; C and kψ kL1 (RN ;R) ≤ kµkvar for every
∈ (0, ∞). In addition, one sees
(by Fubini’s Theorem) that ψ̂ (ξ) = ρ̂( ξ)µ̂(ξ).
Thus, for any ϕ ∈ Cb (RN ; C ∩ L1 RN ; C , Fubini’s Theorem followed by the
classical Parseval Identity (cf. Exercise 2.3.23) yields
Z Z
1
hϕ , µi = ϕ(x) ψ (x) dx = ρ̂( ξ) ϕ̂(ξ) µ̂(−ξ) dξ,
RN (2π)N RN
Notice that when N = 1, the above use of the notation Σn and S̆n is consistent
with that in § 2.1.1.
With these preparations, I am ready to prove the following multidimensional
generalization of Theorem 2.1.8.
Theorem 2.3.8. Referring to the preceding, assume that the limit
Cn
(2.3.9) A ≡ lim
n→∞ Σ2n
|ϕn (y)|
(2.3.11) sup sup <∞
n≥1 y∈RN 1 + |y|2
86 2 The Central Limit Theorem
Σn (e)
q
Σn (e) = e, Cn e RN and ρn (e) = .
Σn
p
Then, ρ(e) ≡ inf n≥1 ρn (e) ∈ (0, 1] and ρn (e) −→ (e, Ae)RN as n → ∞. In
particular, if (e1 , . . . , eN ) is an orthonormal basis in RN , then
N N
X 2 X
EP |S̆n |2 = ρn (ei )2
EP ei , S̆n RN =
i=1 i=1
N
X Z
|y|2 γ0,A (dy).
−→ ei , Aei RN
=
i=1 RN
Hence, by Lemmas 2.1.7 and 2.3.3 plus (2.3.7), all that we have to do is check
that
h √ i 1
(*) fn (ξ) ≡ EP e −1 (ξ,S̆n )RN −→ e− 2 (ξ,Aξ)RN
for each ξ ∈ RN .
ξ
When ξ = 0, (*) is trivial. Thus, assume that ξ 6= 0, set e = |ξ| , and take
(e,Sn )RN
S̆n (e) = Σn (e) . Because
n
1 X 2
2
EP e, Xm RN , e, Xm RN ≥ Σn (e)
Σn (e) m=1
n
1 X 2
≤ 2 2
EP e, Xm RN , e, Xm RN ≥ ρ(e)Σn (e)
ρ(e) Σn m=1
§ 2.3 Some Extensions of The Central Limit Theorem 87
tends to 0 for each > 0, Theorem 2.1.8 combined with Lemma 2.3.3 guarantees
that, for any η ∈ R,
√ 1 2
EP e −1 ηn S̆n (e) −→ e− 2 |η|
p
for any {ηn : n ≥ 1} ⊆ R that tends to η. In particular, if η = (ξ, Aξ)RN and
ηn = ρn (e)|ξ|, we find that
√ 1
fn (ξ) = EP e −1 ηn S̆n (e) −→ e− 2 (ξ,Aξ)RN .
Proof: By Lemma 2.1.7, all that we have to prove is that hψ, µn i −→ hψ, µi.
For this purpose, note that, under our present hypotheses, Lemma 2.1.7 shows
that hψ, µi ≤ limn→∞ hψ, µn i < ∞ and that hψ ∧ R, µn i −→ hψ ∧ R, µi ≤ hψ, µi
for each R > 0. Thus, it suffices to observe that
Z
suph(ψ − ψ ∧ R), µn i = sup ψ dµn ≤ R1−p suphψ p , µn i −→ 0
n≥1 n≥1 {ψ>R} n≥1
as R → ∞.
Knowing Lemma 2.3.15, one’s problem is to find conditions under which one
can show that supn≥1 EP [ψ(S̆n )] < ∞ for an interesting class of non-negative
ψ’s. One such class is provided by the notion of a sub-Gaussian random vari-
able. Given β ∈ [0, ∞), an RN -valued random variable X is said to be β-sub-
Gaussian if
β 2 |ξ|2
EP e(ξ,X)RN ≤ e 2 , ξ ∈ RN .
(2.3.17)
Proof: Since the moment generating function of the sum of independent ran-
dom variables is the product of the moment generating functions of the sum-
mands, the final assertion is essentially trivial.
To prove the first assertion, use Lebesgue’s Dominated Convergence Theorem
to justify
β 2 t2
P
−1
P t(e,X)RN
e 2 −1
±E (e, X)RN = lim t E e − 1 ≤ lim =0
t&0 t&0 t
§ 2.3 Some Extensions of The Central Limit Theorem 89
and
β 2 t2
EP et(e,X)RN + EP e−t(e,X)RN − 2
2 e 2 −1
= β2
P
E (e, X)RN = lim ≤ 2 lim
t&0 t2 t&0 t2
β 2 t2
−tR P
t(e,X) N
P (e, X)RN ≥ R) ≤ e E e R ≤ exp −tR +
2
R2
−
for any ≥ 0 and e ∈ SN −1 , one gets P (e, X)RN ≥ R) ≤ e 2β 2 by minimizing
over t ≥ 0. Since
1
P |X| ≥ R ≤ 2N max P (e, X)| RN ≥ N − 2 R ,
e∈SN −1
α2 |X|2
the estimate for P(|X| ≥ R) follows. To get the estimate on EP e 2 , use
Tonelli’s Theorem to see that
Z Z
α2 |X|2 β 2 |ξ|2 − N
EP e(ξ,X)RN γ0,α2 I (dξ) ≤ e 2 γ0,α2 I (dξ) = 1−(αβ)2 2 .
EP e 2 =
R R
α2 |X|2
Now assume that A = EP e 2 < ∞ for some α ∈ (0, ∞) and that EP [X]
= 0. Then
1
|ξ|2 P 2 |ξ||X|
Z
EP e(ξ,X)RN = 1 + (1 − t)EP (ξ, X)2RN et(ξ,X)RN dt ≤ 1 +
E |X| e
0 2
|ξ|2 |ξ|22 P 2 α2 |X|2 |ξ|2 |ξ|22 A|ξ|2
|ξ|2
≤1+ e E |X| e
α 4 ≤1+A 2 e α ≤ 1+ e α2 ,
2 α α2
ξ2 ξ2
≤ − gn ()
2 2
and that
n
X ξXm
P
E 1 − cos , |Xm | ≥ Σn ≤ −2 .
m=1
Σ n
Finally, combine these and apply (ii) to get limn→∞ ξ 2 gn () ≤ −2 for all ξ ∈ R.
Exercise 2.3.21. It is of some interest to know that the second moment
assumption can be removed from the hypotheses in Exercise 2.1.11 and that
the result there extends to Borel probability measures on RN .R To explain what
I have in mind, first use that exercise to see that if σ 2 = R x2 µ(dx) < ∞,
then µ = T2 µ =⇒ µR∈ N (0, σ 2 ). What I want to do now is remove the a
priori assumption that R x2 µ(dx) < ∞. That is, I want to show that, for any
probability measure µ on R, µ = T2 µ ⇐⇒ µ ∈ N (0, σ 2 ) for some σ ∈ [0, ∞).
Since the “⇐=” direction is obvious,
R and, by the discussion above, the “ =⇒ ”
direction is already covered when R x2 µ(dx) < ∞, all that remains is to show
that
Z
(2.3.22) µ = T2 µ =⇒ x2 µ(dx) < ∞.
R
Finally, note that 1 − x ≤ − log x for x ∈ (0, 1], apply this to the preceding to
get Z
n
n
1 − cos 2− 2 x µ(dx) ≤ − log µ̂(1) < ∞, n ∈ N,
2
R
92 2 The Central Limit Theorem
and arrive at Z
x2 µ(dx) ≤ −2 log µ̂(1)
R
after an application of Fatou’s Lemma.
(ii) To complete the program, let µ be any solution to µ = T2 µ, and define ν by
ZZ
ν(Γ) = 1Γ (x − y) µ(dx)µ(dy).
R2
(in fact, ν is centered normal). Finally, use this and part (i) of Exercise 1.4.27
to deduce that R x2 µ(dx) < ∞.
R
Using the result just proved when N = 1, show that µ = T2 µ if and only if
µ = γ0,C for some non-negative definite, symmetric C.
Exercise 2.3.23. In connection with the preceding exercise, define Tα µ for
α ∈ (0, ∞) and Borel probability measures µ on RN , so that
ZZ
1
1Γ 2− α (x + y) µ(dx)µ(dy), Γ ∈ BRN .
Tα µ(Γ) =
RN ×RN
The problem under consideration here is that of determining for which α’s there
exist nontrivial (i.e., µ 6= δ0 ) solutions to the fixed point equation µ = Tα µ.
Begin by reducing the problem to the case when N = 1. Next, repeat the initial
argument given in part (ii) of Exercise 2.3.21 to see that there is some solution
if and only if there is one that is symmetric. Assuming that µ is a non-trivial,
symmetric solution, use the reasoning in part (i) there to see that
∞ if α ∈ (0, 2)
Z
2
x µ(dx) =
R 0 if α ∈ (2, ∞).
In particular, when α ∈ (2, ∞), there are no non-trivial solutions to µ = Tα µ.
(See § 3.2.3 for more on this topic.)
Exercise 2.3.24. Return to the setting of Exercise 2.1.13. After noting that,
so long as e ∈ Sn−1 , the distribution of
√
x ∈ Sn−1 n 7−→ (e, x)Rn ∈ R
is independent of e, use Lemma 2.3.3 to prove that the assertion in (2.1.15)
follows as a consequence of the one in (2.1.14).
Exercises for § 2.3 93
q
where q 0 = q−1 is the Hölder conjugate of q.
(iii) Suppose that X1 , . . . , Xn are independent and that, for each 1 ≤ m ≤ n,
2
Xm is βm -sub-Gaussian and has variance σm . Given {a1 , . . . , an } ⊆ R, set
v v
Xn u n u n
uX uX
S= am Xm , Σ = t (am σm )2 , and B = t (am βm )2 ,
m=1 m=1 m=1
(iv) The most famous case of the situation discussed in (iii) is when the Xm ’s are
symmetric Bernoulli (i.e., P(Xm = ±1) = 12 ). First use (iii) in Exercise 1.3.17
or direct computation to check that Xm is 1-sub-Gaussian, and then conclude
that
n
! p2 " n p # n
! p2
−(1− p +
X
2)
X X
(2.3.27) K4 a2m P
≤E am Xm ≤ Kp a2m
m=1 m=1 m=1
Hint: Refer to the beginning of the proof of Lemma 1.1.6, and let R1 , . . . , Rn be
the Rademacher functions on [0, 1), set Q = λ[0,1) × P on [0, 1) × Ω, B[0,1) × F ,
and observe that
Xn
ω ∈ Ω 7−→ S(ω) ≡ Xm (ω)
1
does under Q. Next, apply Khinchine’s inequality to see that, for each ω ∈ Ω,
n
! p2 Z n
! p2
−(1− p
2)
+
T (t, ω)p dt ≤ Kp
X X
Xm (ω)2 Xm (ω)2
K4 ≤ ,
1 [0,1) 1
and complete the proof by taking the P-integral of this with respect to ω.
At least when p ∈ (1, ∞), I will show later that this sort of inequality holds
in much greater generality. Specifically, see Burkholder’s Inequality in Theorem
6.3.6.
Exercise 2.3.29. Suppose that X is an RN -valued Gaussian random variable
with mean value 0 and covariance C.
(i) Show that if A : RN −→ RN is a linear transformation, then AX is an
N (0, ACA> ) random variable, where A> is the adjoint transformation.
Exercises for § 2.3 95
(v) Continuing with the assumption that C(22) is non-degenerate, show that
C(12) C−1
X= (22) Y +
Z
,
Y 0
Exercise 2.3.30. Given h ∈ L2 (RN ; C), recall that the (n + 2)-fold convolution
h?(n+2) is a bounded continuous function for each n ∈ N. Next, assume that
h(−x) = h(x) for almost every x ∈ RN and that h ≡ 0 off of BRN (0, 1). As an
application of part (iii) in Exercise 1.3.22, show that
" 2 #
?(n+2) (|x| − 2)+
h (x) ≤ 2khk2 2
L N
(R ;C) khkn
1 N
L (R ;C) exp − .
2n
Hint: Note that h ∈ L1 (RN ; C), assume that M ≡ khkL1 (RN ;C) > 0, and define
Af = M −1 h ? f for f ∈ L2 (RN ; C). Show that A is a self-adjoint contraction on
L2 (RN ; C), check that
Tx h, A` h L2 (RN ;C) = 0
if ` ≤ |x| − 2.
x2 dn − x2
(2.4.1) Hn (x) = (−1)n e 2 e 2 , x ∈ R.
dxn
Clearly, Hn is an nth order, real, monic (i.e., 1 is the coefficient of the highest
order term) polynomial. Moreover, if we define the raising operator A+ on
C 1 (R; C) by
x2 d − x2 dϕ
A+ ϕ (x) = −e 2 e 2 ϕ(x) = − (x) + xϕ(x), x ∈ R,
dx dx
then
At the same time, if ϕ and ψ are continuously differentiable functions whose first
derivatives are tempered (i.e., have at most polynomial growth at infinity), then
(2.4.3) ϕ, A+ ψ L2 (γ 0,1 ;C)
= A− ϕ, ψ L2 (γ0,1 ;C)
,
dϕ
where A− is the lowering operator given by A− ϕ = dx . After combining
(2.4.2) with (2.4.3), we see that, for all 0 ≤ m ≤ n,
= Hm , An+ H0 = An− Hm , H0
Hm , Hn L2 (γ0,1 ;C) L2 (γ0,1 ;C) L2 (γ0,1 ;C)
= m! δm,n ,
where, at the last step, I have used the fact that Hm is a monic mth order
polynomial. Hence, the (normalized) Hermite polynomials
Hn (x) (−1)n x2 dn − x2
H n (x) = √ = √ e2 e 2 , x ∈ R,
n! n! dxn
form an orthonormal set in L2 (γ0,1 ; C). (Indeed, they are one choice of the
orthogonal polynomials relative to the Gauss weight.)
Lemma 2.4.4. For each λ ∈ C, set
λ2
H(x; λ) = exp λx − , x ∈ R.
2
Then
∞
X λn
(2.4.5) H(x; λ) = Hn (x), x ∈ R,
n=0
n!
where the convergence is both uniform on compact subsets of R× C and, for λ’s
in compact subsets of C, uniform in L2 (γ0,1 ; C). In particular, H n : n ∈ N is
an orthonormal basis in L2 (γ0,1 ; C).
x2
Proof: By (2.4.1) and Taylor’s expansion for the function e− 2 , it is clear that
(2.4.5) holds for each (x, λ) and that the convergence is uniform on compact
subsets of R × C. Furthermore, because the Hn ’s are orthogonal, the asserted
uniform convergence in L2 (γ0,1 ; C) comes down to checking that
∞ n 2
X λ
lim sup
Hn k2 2
L (γ0,1 ;C) = 0
m→∞ |λ|≤R n=m n!
for every R ∈ (0, ∞), and obviously this follows from our earlier calculation that
2
Hn
2 = n!.
L (γ ;C)
0,1
98 2 The Central Limit Theorem
To prove the assertion that H n : n ∈ N forms an orthonormal basis in
L2 (γ0,1 ; C), it suffices to check that any ϕ ∈ L2 (γ0,1 ; C) that is orthogonal to all
of the Hn ’s must be 0. But, because of the L2 (γ0,1 ; C) convergence in (2.4.5),
we would have that
Z
ϕ(x) eλx γ0,1 (dx) = 0, λ ∈ C,
R
n=1
∞
X
θn ϕ, H N
Hθ ϕ = L2 (γ0,1 ;C)
H n, ϕ ∈ Dom Hθ .
n=0
for all θ ∈ (0, 1) and (x, λ) ∈ R × C. In conjunction with (2.4.5), this means that
Z
(2.4.6) Hθ ϕ = M ( · , y; θ) ϕ(y) γ0,1 (dy), θ ∈ (0, 1) and ϕ ∈ L2 (γ0,1 ; C),
R
and from here it is not very difficult to prove the following properties of Hθ for
θ ∈ (0, 1).
Lemma 2.4.7. For each ϕ ∈ L2 (γ0,1 ; C), (θ, x) ∈ (0, 1) × R 7−→ Hθ ϕ(x) ∈
C may be chosen to be a continuous function that is non-negative if ϕ ≥ 0
Lebesgue-almost everywhere. In addition, for each θ ∈ (0, 1) and every p ∈
[1, ∞],
(2.4.8)
Hθ ϕ
p ≤ kϕkLp (γ0,1 ;C) .
L (γ 0,1 ;C)
Hence, (2.4.8) is now proved for p ∈ [1, ∞). The case when p = ∞ is even easier
and is left to the reader.
The conclusions drawn in Lemma 2.4.7 from the Mehler representation in
(2.4.6) are interesting but not very deep (cf. Exercise 2.4.36). A deeper fact is
100 2 The Central Limit Theorem
the relationship between Hermite multipliers and the Fourier transform. For the
purposes of this analysis, it is best to define the Fourier operator F by
Z √
e −1 2πξx f (x) dx, ξ ∈ R,
(2.4.9) Ff (ξ) =
R
1
for f ∈ L (R; C). The advantage
√ of this choice is that, without the introduction
of any further factors of 2π, the Parseval Identity (cf. Exercise 2.4.37) becomes
the statement that F determines a unitary operator on L2 (R; C). In order to
relate F to Hermite multipliers, observe that, after analytically continuing the
result of another simple Gaussian computation,
Z
2 ζ2
eζx e−πx dx = e 4π for all ζ ∈ C,
R
ϕ ∈ L2 (γ0,1 ; C)
(2.4.15)
Hθ ϕ
q ≤ kϕkLp (γ0,1 ;C) for all
L (γ 0,1 ;C)
if
q1 p1
|1 − θζ|q + |1 + θζ|q |1 − ζ|p + |1 + ζ|p
(2.4.16) ≤
2 2
for every ζ ∈ C.
That (2.4.16) implies (2.4.15) is trivial is quite remarkable. Indeed, it takes
a problem in infinite dimensional analysis and reduces it to a calculus question
about functions on the complex plane. Even though, as we will see later, this
reduction leads to highly non-trivial problems in calculus, Theorem 2.4.14 has
to be considered a major step toward understanding the contraction properties
of Hermite multipliers.3
The first step in the proof of Theorem 2.4.14 is to interpret (2.4.16) in oper-
ator theoretic language. For this
purpose, let β denote the standard Bernoulli
probability measure on R, BR . That is, β {±1} = 12 . Next, use χ∅ to denote
the function on R that is constantly equal to 1 and χ{1} to stand for the iden-
tity function on R (i.e., χ{1} (x) = x, x ∈ R). It is then clear that χ∅ and
χ{1} constitute an orthonormal basis in L2 (β; C); in fact, they are the orthog-
onal polynomials there. Hence, for each θ ∈ C, we can define the Bernoulli
multiplier Kθ as the unique normal operator on L2 (β; C) prescribed by
if F = ∅
χ∅
Kθ χF =
θχ{1} if F = {1}.
2 See Beckner’s “Inequalities in Fourier analysis,” Ann. Math., # 102 #1, pp. 159–182 (1975).
3 Later, in his article “Gaussian kernels have only Gaussian maximizers,” Invent. Math. 12,
pp. 179–208 (1990), E. Lieb essentially killed this line of research. His argument, which is
entirely different from the one discussed here, handles not only the Hermite multipliers but
essentially every operator whose kernel can be represented as the exponential of a second order
polynomial.
102 2 The Central Limit Theorem
where, in the passage to the third line, I have used the continuous form of
Minkowski’s Inequality (it is at this point that the only essential use of the
hypothesis p ≤ q is made).
I am now ready to take the main step in the proof of Theorem 2.4.14.
Lemma 2.4.20. Define An : L2 (β; C) −→ L2 β n ; C) by
Pn
`=1 x`
for x ∈ Rn .
An ϕ (x) = ϕ √
n
and
(2.4.22) Hθ ϕ, ψ = lim Kθ⊗n ◦ An ϕ, An ψ
L2 (γ0,1 ;C) n→∞ L2 (β n ;C)
for every θ ∈ (0, 1). Moreover, if, in addition, either ϕ or ψ is a polynomial, then
(2.4.22) continues to hold for all θ ∈ C.
Proof: Let ϕ and ψ be tempered elements of C(R; C), and define
fn (θ) = Kθ⊗n ◦ An ϕ, An ψ
L2 (β n ;C)
and f (θ) = Hθ ϕ, ψ L2 (γ0,1 ;C)
Notice that (2.4.23) is (2.4.22) for θ ∈ (0, 1) and that In (2.4.21) follows from
(2.4.22) with ϕ = 1, ψ = |ϕ|p , and any θ ∈ (0, 1).
In order to prove (2.4.23), I will need to introduce other expressions for f (θ)
and the fn (θ)’s. To this end, set
1 θ
Cθ = ,
θ 1
Next, let, for each x ∈ R\{0}, define kθ (x, · ) to be the probability measure on R
such that kθ x, {±sgnx} = 1±θ
2 , and set kθ (0, · ) = β. Then it is easy to check
104 2 The Central Limit Theorem
R R
that R χ{0} (y) kθ (±1, dy) =R χ{0} (±1) and R χ{1} (y) kθ (±1, dy) = θχ{1} (±1)
and therefore Kθ ϕ(±1) = R ϕ(y) kθ (±1, dy) for all ϕ. Hence, if βθ be the
probability measure on R2 determined by βθ (dx × dy) = kθ (x, dy) β(dx) or,
equivalently,
then Z
Kθ ϕ, ψ L2 (β;C)
= ϕ(x) ψ(y) βθ (dx × dy).
R2
Z+
for all Φ, Ψ ∈ C(Rn ; C). Hence, if (cf. Exercise 1.1.14) Ω = R2 , F = BΩ ,
Z+
and Pθ = βθ , then
Pn
1 Zm
fn (θ) = E Pθ
F √ ,
n
time, because (ϕ, Hm )L2 (γ0,1 ;C) = 0 for m > k, f is also a polynomial of degree
at most k, and therefore (2.4.23) already implies that the convergence extends
to the whole of C and is uniform on compacts. Finally, in the case when ψ,
instead of ϕ, is a polynomial, simply note that
Kθ⊗n ◦ An ϕ, An ψ = Kθ̄⊗n ◦ An ψ, An ϕ
L2 (β n ;C) L2 (β n ;C)
and Hθ ϕ, ψ L2 (γ
= Hθ̄ ψ, ϕ L2 (γ0,1 ;C)
, and apply the preceding.
0,1 ;C)
Proof of Theorem 2.4.14: Assume that (2.4.16) holds for a given pair 1 <
p ≤ q < ∞ and θ ∈ D. We then know that (2.4.19) holds for every n ∈ Z+ .
Hence, by Lemma 2.4.20, if ϕ and ψ are tempered elements of C(R; C) and at
least one of them is a polynomial, then
Hθ ϕ, ψ L2 (γ0,1 ;C) = lim Kθ⊗n ◦ An ϕ, An ψ 2 n
n→∞ L (β ;C)
≤ lim
An ϕ
Lp (β n ;C)
An ψ
Lq0 (β n ;C) = kϕkLp (γ0,1 ;C) kψkLq0 (γ0,1 ;C) .
n→∞
In other words, we now know that, for all tempered ϕ and ψ from C(R; C),
(2.4.24) H ϕ, ψ ≤ kϕkLp (γ0,1 ;C) kψkLq0 (γ0,1 ;C)
θ L2 (γ0,1 ;C)
I will give is entirely different from Nelson’s and is much closer to the ideas
introduced by L. Gross5 as they were developed by Beckner.
Theorem 2.4.25 (Nelson). Let θ ∈ (0, 1) and p ∈ (1, ∞) be given, and set
p−1
q(p, θ) = 1 + .
θ2
Then
ϕ ∈ L2 (γ0,1 ; C),
(2.4.26)
Hθ ϕ
q ≤ kϕkLp (γ0,1 ;C) ,
L (γ0,1 ;C)
Proof: I will leave the proof of (2.4.27) as an exercise. (Try taking ϕ’s of
2
the form eλx .) Also, because γ0,1 is a probability measure and therefore the
left-hand side of (2.4.26) is non-decreasing as a function of q, I will restrict my
attention to the proof of (2.4.26) for q = q(p, θ). Hence, by Theorem 2.4.14, what
I have to do is prove (2.4.16) for every 1 < p < q < ∞ and θ ∈ (0, 1) that are
related by
12
p−1
(2.4.28) θ= .
q−1
I begin with the case when 1 < p < q ≤ 2, and I will first consider ζ ∈ [0, 1).
Introducing the generalized binomial coefficients
r r(r − 1) · · · (r − ` + 1)
≡ for r ∈ R and ` ∈ N,
` `!
and
∞
|1 − ζ|p + |1 + ζ|p
X p
=1+ ζ 2k .
2 2k
k=1
5 See Gross’s “Logarithmic Sobolev inequalities,” Amer. J. Math. 97 #4, pp. 1061–1083
(1975). In this paper, Gross introduced the idea of proving estimates on Hθ from the corre-
sponding estimates for Kθ . In this connection, have a look at Exercises 2.4.39 and 2.4.41.
§ 2.4 An Application to Hermite Multipliers 107
q
Noting that, because q ≤ 2, 2k ≥ 0 for every k ∈ Z+ , and using the fact that,
p
because pq ∈ (0, 1), (1 + x) q ≤ 1 + pq x for all x ≥ 0, we see that
pq ∞
|1 − θζ|q + |1 + θζ|q
pX q
≤1+ (θζ)2k .
2 q 2k
k=1
Hence, I will have completed the case under consideration once I check that
∞ ∞
pX q 2k
X p
(θζ) ≤ ζ 2k ,
q 2k 2k
k=1 k=1
But the choice of θ in (2.4.28) makes the preceding an equality when k = 1, and,
when k ≥ 2,
p q 2k−1
2k
q 2k θ
Y j−q
p
≤ ≤ 1,
2k j=2
j−p
|1 − ζ| + |1 + ζ| |1 − ζ| − |1 + ζ| b
a= , b= , and c = ∈ [−1, 1].
2 2 a
Then
|1 ± θζ| = 1+θ
2 (1 ± ζ) +
1−θ
2 (1 ∓ ζ) ≤ a ∓ θb,
and, therefore, by the preceding applied to c, we have that
1 1
|1 − θζ|q + |1 + θζ|q q |1 − θc|q + |1 + θc|q q
≤a
2 2
1 1 1
|1 − c|p + |1 + c|p p |a − b|p + |a + b|p p |1 − ζ|p + |1 + ζ|p p
≤a = = .
2 2 2
Hence, I have now completed the case when 1 < p < q ≤ 2 and θ is given by
(2.4.28).
108 2 The Central Limit Theorem
To handle the other cases, I will use the equivalence of (2.4.16) and (2.4.17).
Thus, what we already know is that (2.4.17) holds for 1 < p < q ≤ 2 and the θ
in (2.4.28). Next, suppose that 2 ≤ p < q < ∞. Then, since 1 < q 0 < p0 ≤ 2 and
p−1 q0 − 1
= 0 ,
q−1 p −1
≤ kϕkLp (β;C) ,
where the θ is the one given in (2.4.28). Thus, the only case that remains is the
1 1
one when 1 < p ≤ 2 ≤ q < ∞. But, in this case, set ξ = (p − 1) 2 , η = (q − 1)− 2 ,
and observe that, because the associated θ in (2.4.28) is the product of ξ with
η, Kθ = Kη ◦ Kξ and therefore
Kθ ϕ
q ≤
Kξ ϕ
L2 (β;C) ≤ kϕkLp (β;C) .
L (β;C)
(*)
h 2 i p2 h 2 i p2 p1
0 2 0 2
1−ξ + (p − 1)η + 1+ξ + (p − 1)η
≤
2
for all ξ, η ∈ R.
To prove (*), consider,
1for each α ∈ (0, ∞), the function gα : [0, ∞)2 −→ [0, ∞)
1 α
defined by gα (x, y) = x α +y α . It is an easy matter to check that gα is concave
or convex depending on whether α ∈ [1, ∞) or α ∈ (0, 1). In particular, since
p0 p0 0 p0 0
p
and similarly, because 2 ∈ (0, 1),
h 2 i p2 h 2 i p2
0 2 0 2
1−ξ + (p − 1)η + 1+ξ + (p − 1)η
2
" p2 # p2
|1 − ξ|p + |1 + ξ|p
≥ + (p0 − 1)η 2 .
2
0 0
! 20 p2
p
|1 − η|p + |1 + η|p |1 − ξ|p + |1 + ξ|p
(**) +(p−1)ξ ≤ 2
+(p0 −1)η 2 .
2 2
But because (cf. Theorems 2.4.14 and 2.4.25) we know that (2.4.16) holds with
1
p replaced by 2, q = p0 , and θ = p − 1 2 , the left side of (**) is dominated by
1 2 1 2
1 − (p0 − 1) 2 η + 1 + (p0 − 1) 2 η
(p − 1)ξ + 2
= 1 + (p − 1)ξ 2 + (p0 − 1)η 2 .
2
At the same time, again by (2.4.16), only this time with p, 2, and the same
choice of θ, we see that the right-hand side of (**) dominates
1 2 1 2
0 1 − (p − 1) 2 ξ + 1 + (p − 1) 2 ξ
(p − 1)η + 2
= 1 + (p − 1)ξ 2 + (p0 − 1)η 2 .
2
f ◦ π 1 + f ◦ π2
(2.4.34) f ◦S = √ λ2R -almost everywhere,
2
then there is an α ∈ R such that f (x) = αx for λR1 -almost every x ∈ R. Here
are steps which one can take to prove this result.
(i) After noticing that (2.4.34) holds when λR is replaced by γ0,1 , apply Exercise
2.3.21 to see that the γ0,1 -distribution of x f (x) is γ0,α for some α ∈ [0, ∞).
Conclude, in particular, that f ∈ L2 (γ0,1 ; R).
Exercises for § 2.4 111
(ii) For each n ≥ 0, let Z (n) denote span {Hn ◦ π1 Hn−m ◦ π2 : 0 ≤ m ≤ n} .
S∞
Show that Z (m) ⊥ Z (n) in L2 (γ0,1
2
; R) when m 6= n and the span of n=0 Z (n)
is dense in L2 (γ0,1
2
; R). Conclude from these that if F ∈ L2 (γ0,1
2
; R), then F =
P∞ (n)
n=0 Πn F , where Πn denotes orthogonal projection onto Z and the series
convergences in L2 (γ0,1 ; R).
(iii) Using the generating (2.4.5), show that
n
−n
X n
Hn ◦ S = 2 2 Hm ◦ π1 Hn−m ◦ π2 ,
m=0
m
is a strictly convex function that tends to 0 at both end points and is therefore
strictly negative. Hence, Ap < 1 for p ∈ (1, 2).
112 2 The Central Limit Theorem
Check that Π takes B(E; C) into itself and that kΠϕku ≤ kϕku . Next, given a
σ-finite measure µ on (E, B), say that µ is Π-invariant if
Z
µ(Γ) = Π(x, Γ) µ(dx) for all Γ ∈ B.
E
Using Jensen’s Inequality, first show that, for each p ∈ [1, ∞),
p
Πϕ (x) ≤ Π|ϕ|p (x), x ∈ E,
R
(ii) Prove the Logarithmic Sobolev Inequality
Z 2 Z
2 ϕ
2
(2.4.40) ϕ log kϕk 2 dβ ≤ 2 ϕ(x) − ϕ(−x) β(dx)
L (β;C)
R R
for strictly positive ϕ’s on R.
114 2 The Central Limit Theorem
Hint: Reduce to the case when ϕ(x) = 1 + bx for some b ∈ (0, 1), and, in this
case, check that (2.4.40) is the elementary calculus inequality
(iii) By plugging (2.4.40) into (**), arrive at (*), and conclude that (2.4.17)
holds for θ ∈ (0, 1) and q = 1 + p−1
θ2 .
Exercise 2.4.41. The major difference between Gross’s and Beckner’s ap-
proaches to proving Nelson’s Theorem 2.4.25 is that Gross based his proof on
the equivalence of contraction results like (2.4.17) and (2.4.15) to Logarithmic
Sobolev Inequalities like (2.4.40). In Exercise 2.4.38, I outlined how one passes
from a Logarithmic Sobolev Inequality to a contraction result. The object of this
exercise is to go in the opposite direction. Specifically, starting from (2.4.26),
show that
Z 2 Z
(2.4.42) 2
ϕ log ϕ
kϕkL2 (γ dγ0,1 ≤ 2 |ϕ0 |2 γ0,1 (dx)
R 0,1 ;C) R
The results in this chapter are an attempt to answer the following question.
GivenPan RN -valued random variable Y with the property that, for each n ∈ Z+ ,
n
Y = m=1 Xm , where X1 , . . . , Xn are independent and identically distributed,
what can one say about the distribution of Y?
Recall that the convolution ν1 ? ν2 of two finite Borel measures ν1 and ν2 on
RN is given by
ZZ
ν1 ? ν2 (Γ) = 1Γ (x + y) ν1 (dx)ν2 (dy), Γ ∈ BRN ,
RN ×RN
and that the distribution of the sum of two independent random variables is the
convolution of their distributions. Thus, the analytic statement of our problem
is that of describing those probability measures µ that, for each n ≥ 1, can be
written as the n-fold convolution power µ?n 1 of some probability measure µ n1 .
n
I will say that such a µ is infinitely divisible and will use I(RN ) to denote
the class of infinitely divisible measures on RN . Since the Fourier transform
takes convolution into ordinary multiplication, the Fourier formulation of this
problem is that of describing those Borel probability measures on RN whose
Fourier transform µ̂ has, for each n ∈ Z+ , an nth root which is again the Fourier
transform of a Borel probability measure on RN .
Not surprisingly, the Fourier formulation of the problem is, in many ways, the
most amenable to analysis, and it is the formulation in terms of which I will solve
it in this chapter. On the other hand, this formulation has the disadvantage that,
although it yields a quite satisfactory description of µ̂, it leaves the problem
of extracting information about µ from properties of µ̂. For this reason, the
following chapter will be devoted to developing a probabilistic understanding of
the analytic answer obtained in this chapter.
§ 3.1 Convergence of Measures on RN
In order to carry out our program, I will need two important facts about the
convergence of probability measures on RN . The first of these is a minor modifi-
cation of the classical Helly–Bray Theorem, and the second is an improvement,
due to Lévy, of Lemma 2.3.3.
115
116 3 Infinitely Divisible Laws
Hence, if
∞
X
µ(Γ) ≡ lim µ` Γ ∩ B(0, `) = µ` Γ ∩ B(0, `) \ B(0, ` − 1) ,
`→∞
`=1
and
1
µ B(0, N 2 R){ ≤ N sup µ {y : |(e, y)RN | ≥ R}
e∈SN −1
(3.1.5)
N
≤ max 1 − µ̂(ξ) : |ξ| ≤ r .
s(rR)
118 3 Infinitely Divisible Laws
r
!
sin r(e, y)RN
Z Z
1
1 − µ̂(te) dt ≥ 1− µ(dy)
r 0 RN \{0} r(e, y)RN
≥ s(rR)µ {y : |(e, y)RN | ≥ R} ,
and therefore
(3.1.7) sup 1 − µ̂(ξ) ≥ s(rR)µ {y : |(e, y)RN | ≥ R} .
ξ∈B(0,r)
Exercise 3.1.9. One might think that to address the sort of problem posed
at the beginning of this chapter, it would be helpful to know which functions
f : RN −→ C are the Fourier transforms of a probability measure. Such a
characterization is the content of Bochner’s Theorem, whose proof will be
outlined in this exercise. Unfortunately, his characterization looks more useful
than it is in practice. For instance, I will not use it to solve our problem, and it
is difficult to see how its use would simplify matters.
In order to state Bochner’s Theorem, say that a function f : RN −→ C is
N
non-negative definite if, for each n ≥ 1 and ξ1 , . . . , ξn ∈ R , the matrix
f (ξi − ξj ) 1≤i,j≤n is Hermitian and non-negative definite. Equivalently,1
n
X
f (ξi − ξj )ζi ζ̄j ≥ 0 for all ζ1 , . . . , ζn ∈ C.
i,j=1
In particular, when f ∈ L1 RN ; C , set
√
Z
−N
m(x) = (2π) e− −1 (x,ξ)RN
f (ξ) dξ,
RN
1 Recall that a non-negative definite operator on a complex Hilbert space is always Hermitian.
120 3 Infinitely Divisible Laws
and use Parseval’s Identity and Fubini’s Theorem, together with elementary
manipulations, to arrive at
Z ZZ
(2π)N m(x) ψ(x)2 dx = f (ξ − η)ψ̂(ξ)ψ̂(η) dξ dη ≥ 0
RN
RN ×RN
for all ψ ∈ L1 (RN ; R) ∩ Cb (RN ; R) with ψ̂ ∈ L1 (RN ; R). Conclude that m is non-
negative, and use this to complete the proof in the case when f ∈ L1 RN ; C .
(iii) It remains only to pass from the case when f ∈ L1 RN ; C to the general
|x|2
case. For each t ∈ (0, ∞), set ft (x) = e−t 2 f (x). Clearly, ft (0) = 1 and
ft ∈ Cb (RN ; C) ∩ L1 (RN ; C). In addition, show that
Xn Z Xn
ft ξi − ξj ζi ζ̄j = f ξi − ξj ζi (x)ζ̄j (x) γ0,tI (dx) ≥ 0,
i,j=1 RN i,j=1
√
where ζi (x) ≡ ζi e −1 (ξi ,x)RN . Hence, ft is also non-negative definite, and so,
by part (ii), we know thatft = µbt for some µt ∈ M1 (RN ). Finally, apply Lévy’s
Continuity Theorem to see that µt =⇒µ, where µ ∈ M1 (RN ) satisfies f = µ̂.
(iv) Let {µn : n ≥ 1} and f be as in Theorem 3.1.8. Combining Bochner’s
Theorem with Lemma 2.1.7, show that there exists a µ ∈ M1 (RN ) such that
f = µ̂ and µn =⇒ µ if and only if f is continuous.
Exercise 3.1.10. Suppose that f is a non-negative definite function with f (0) =
1. As we have just seen, if f is continuous, then f = µ̂ for some µ ∈ M1 (RN ).
(i) Assuming that f = µ̂, show that
Next, show that (*) follows directly from non-negative definiteness, whether
or not f is continuous. Thus, a non-negative definite function is uniformly
continuous everywhere if it is continuous at the origin.
Hint: Both parts of (*) follow from the fact that
1 f (ξ) f (η)
A = f (ξ) 1 f (ξ − η)
f (η) f (ξ − η) 1
is non-negative
definite. To get the second part, consider the quadratic form
v, Av C3 with v = (v1 , 1, −1).2
2 This choice of v was suggested to me by Linan Chen.
Exercises for § 3.1 121
Show that, for any orthonormal basis {ei : i ∈ Z+ } in H, the functions Xi (h) =
(ei , h)H , i ∈ Z+ , would be, under µ, a sequence of independent, N (0, 1)-random
variables, and conclude from this that
Z
2 Y 2
e−khkH µ(dh) = Eµ e−Xi = 0.
H i∈Z+
Hence, no such µ can exist. See Chapter 8 for a much more thorough account
of this topic.
Hint: The non-negative definiteness of f can be seen as a consequence of the
analogous result for Rn .
Exercise 3.1.12. The Riemann–Lebesgue Lemma says that fˆ(ξ) −→ 0
as |ξ| → ∞ if f ∈ L1 (RN ; C). Thus µ̂(ξ) −→ 0 as |ξ| → ∞ if µ ∈ M1 (R)
is absolutely continuous. In this exercise we will examine situations in which
µ ∈ M1 (R) but µ̂(ξ)−→
6 0 as |ξ| → ∞.
(i) Given a symmetric µ ∈ M1 (R), show that µ̂ is real valued, and use Bochner’s
Theorem to show that µ̂(ξ) cannot tend to a strictly negative number as |ξ| → ∞.
Hint: Let α > 0, and suppose that µ̂(ξ) −→ −2α as |ξ| → ∞. Choose R > 0
+
so that µ̂(ξ) ≤
−α for |ξ| ≥ R and n ∈ Z so that (n − 1)α > 1. Set A =
µ̂(`R − kR) 1≤k,`≤n , and show that A cannot be non-negative definite.
122 3 Infinitely Divisible Laws
(ii) Show that µ̂(ξ)−→ 6 0 if µ has an atom (i.e., µ({x}) > 0 for some x ∈ R).
Hint: Reduce to the case in which µ is symmetric, and therefore that µ = pδ0 +
qν, where p ∈ (0, 1], q = 1 − p, and ν ∈ M1 (R) is symmetric. If p = 1, µ̂(ξ) = 1
for all ξ. If p ∈ (0, 1), then µ̂(ξ) −→ 0 as |ξ| → ∞ implies ν̂(ξ) −→ − pq < 0.
(iii) To produce an example that is non-atomic, refer to Exercise 1.4.29, take
p ∈ (0, 1) \ { 12 }, and let µ = µp , where µp is the measure described in that
exercise. Show that µ is a non-atomic element of M1 (R) for which µ̂−→ 6 0 as
|ξ| → ∞.
Hint: Show that µ̂ never vanishes and that µ̂(2m π) is independent of m ∈ Z+ .
§ 3.2 The Lévy–Khinchine Formula
Throughout, I(R ) will be the set of µ ∈ M1 (RN ) that are infinitely divisible.
N
and therefore that πα,ν = π ?nα . To see why the Poisson measures provide a
n ,ν
more hopeful choice of starting point, let m ∈ RN and a non-negative definite,
symmetric C be given, and choose (e1 , . . . , eN ) to be p an orthonormal basis of
eigenvectors for C. Next, set mi = (m, ei )RN and σi = (ei , Cei )RN , and take
N N
!
1 X 1X
νn = δ mi ei + δ σi ei + δ− σ√i ei .
2N i=1 n 2 i=1 √n n
§ 3.2 The Lévy–Khinchine Formula 123
N √ N
!
X −1mi (ξ,ei ) N
R
X
σi (ξ, ei )RN
exp n e n −1 + n cos 1 −1 ,
i=1 i=1
n2
√
Z
−1(ξ,y)RN
π
d M (ξ) = exp e − 1 M (dy) .
Before turning to the proof of (3.2.1), I need the following simple lemma about
non-vanishing, C-valued functions. In its statement, and elsewhere,
∞
X (1 − ζ)m
(3.2.2) log ζ = − for ζ ∈ C with |1 − ζ| < 1
m=1
m
is the principle branch of logarithm function on the open unit disk around 1 in
the complex plane.
Lemma 3.2.3. Let R ∈ (0, ∞) be given. If f ∈ C B(0, R); C \ {0} with
f (0) = 1, then there is a unique `f ∈ C B(0; R); C such that `f (0) = 0 and
124 3 Infinitely Divisible Laws
f (η)
f = e`f . Moreover, if ξ ∈ B(0; R), r ∈ (0, ∞), and 1 − < 1 for all
f (ξ)
η ∈ B(ξ, r) ∩ B(0, R), then, for each η ∈ B(ξ, r) ∩ B(0, R),
f (η)
`f (η) − `f (ξ) = log ,
f (ξ)
and therefore
f (η) f (η) 1
|`f (η) − `f (ξ)| ≤ 2 1 − if 1 −
≤ .
f (ξ) f (ξ) 2
˜(ξ)
f
` ˜(ξ) − `f (ξ) ≤ 2 1 −
for ξ ∈ B(0, R).
f
f (ξ)
In particular, if {fn : n ≥ 1} ⊆ C B(0, R); C \ {0} with fn (0) = 1 for all n ≥ 1,
and if fn −→ f ∈ C B(0; R); C \ {0} uniformly on B(0, R), then f (0) = 1 and
`fn −→ `f uniformly on B(0; R).
Proof: To prove the existence and uniqueness of `f , begin by observing that
there exists an M ∈ Z+ and 0 = r0 < r1 < · · · < rM = R such that
1 − f (ξ) 1
≤ for 1 ≤ m ≤ M and ξ ∈ B(0, rm ) \ B(0, rm−1 ).
f rm−1 ξ 2
|ξ|
Set
f (η)
`(η) = `f (ξ) + log for η ∈ B(ξ, r) ∩ B(0, R),
f (ξ)
√
( −12π)−1 `(η) − `f (η) is a continuous, Z-valued func-
and check that η
tion that vanishes at ξ. Hence, ` = `f on B(0, R) ∩ B(ξ, r), and therefore on
B(0, R) ∩ B(ξ, r). Since | log(1 − ζ)| ≤ 2|ζ| if |ζ| ≤ 12 , this completes the proof
of the asserted properties of `f .
˜
Turning to the comparison between `f and `f˜ when 1 − ff (ξ) 1
(ξ) ≤ 2 for all
f˜(ξ)
ξ ∈ B(0, R), set `(ξ) = `f (ξ) + log f (ξ) , check that `(0) = 0 and f˜ = e` , and
˜
conclude that `f˜ − `f = log ff . From this, the asserted estimate for |`f˜ − `f | is
immediate.
Lemma 3.2.4. Define r s(r) as in Lemma 3.1.3, and let µ ∈ M1 (RN ) and
0 < r < R be given. If |1 − µ̂(ξ)| ≤ 12 for all ξ ∈ B(0, r) and there is an
ν ∈ M1 (RN ) such that µ = ν ?n for some
16
(3.2.5) n≥ r
,
s 4R
then πMn =⇒µ. Finally, I(RN ) is closed in the sense that µ ∈ I(RN ) if there
exists a sequence {µk : k ≥ 1} ⊆ I(RN ) such that µk =⇒ µ. In particular, µ n1
is uniquely determined and (3.2.1) holds.
Proof: Let µ ∈ I(RN ) be given. Since there is an r > 0 such that |1− µ̂(ξ)| ≤ 12
for all ξ ∈ B(0, r) and, for all n ∈ Z+ , µ = µ?n 1 for some µ n1 ∈ M1 (RN ),
n
Lemma 3.2.4 guarantees that µ̂ never vanishes. Hence, by Lemma 3.2.3, both
the existence and uniqueness of `µ follow. Moreover, if µ = µ?n 1 , then, from
n
n
µ̂(ξ) = µcn1 (ξ) , we know first that µcn1 never vanishes and then that `µ = n`,
where ` is the unique element of C(RN ; C) satisfying `(0) = 0 and µcn1 = e` . In
1
particular, this proves that µ n1 = e n ` for any µ n1 with µ = µ∗n
1 , and so there is
n
at most one such µ n1 .
Now define Mn as in the statement, and observe that
1
πd (ξ) = exp n µ̂ 1 (ξ) − 1 = exp n e n `µ (ξ) − 1 −→ e`µ (ξ) = µ̂(ξ)
Mn n
and clearly this is more than enough to show that µ̂ never vanishes. Thus we can
choose a unique ` ∈ C(RN ; C) so that `(0) = 0 and µ̂ = e` . Moreover, if `k = `µk ,
then, by Lemma 3.2.3, `k −→ ` uniformly on compacts. Now let n ∈ Z+ be given,
and choose {µk, n1 : k ≥ 1} ⊆ M1 (RN ) so that µk = µ?n k, 1
. Then we know that
n
1 1
`
1 = e n k , and so, as k → ∞, µ̂ 1 −→ e n
`
µ
[k, n k, n uniformly on compacts. Hence,
1
by Lévy’s Continuity Theorem, e n ` = µ̂ n1 for some µ n1 ∈ M1 (RN ). Since this
means that µ = µ?n1 , we have shown that µ ∈ I(R ).
N
n
and, as I already noted, only the Poisson component M offers much flexibility.
With this in mind, I introduce for each α ∈ [0, ∞) the class Mα (RN ) of Borel
measures M on RN such that
|y|α
Z
M ({0}) = 0 and α
M (dy) < ∞.
RN 1 + |y|
√−1(ξ,y) N √−1(ξ,y) N
Z i Z i
e R − 1 Mr (dy) −→ e R − 1 M (dy)
RN RN
√ h √ √
Z
i
1
e −1(ξ,y)RN − 1 − −1η(y) ξ, y RN Mr (dy).
−1 ξ, m RN
− 2 ξ, Cξ RN
+
RN
Because
√ √
Z h i
1 −1(ξ,y)RN
`r (ξ) = −1 ξ, mr RN
− 2 ξ, Cξ RN
+ e − 1 Mr (dy),
RN
Z
where mr = m − η(y)y Mr (dy),
RN
128 3 Infinitely Divisible Laws
√ h √ √
Z
i
−1(ξ, m)RN − 1
2 (ξ, Cξ)RN + e −1(ξ,y)RN − 1 − −1η(y) ξ, y RN M (dy),
RN
by
h √ √
Z
2 i
e −1(ξ,y)RN − 1 − −1η(y) ξ, y RN + 12 η(y) ξ, y RN Mr (dy)
RN
in the expression for `r . However, to re-write this `r in the form given in (*),
one would have to replace C by
Z
C− η(y)y ⊗ y Mr (dy),
RN
4
sup |1 − µcn1 (ξ)| ≤ ρR + .
|ξ|≤R ns(rρ)
1
Hence, if R ≥ r, then, by taking ρ = 4R , we obtain sup|ξ|≤R |1 − µcn1 (ξ)| ≤ 12
and therefore sup|ξ|≤R | n1 `µ (ξ)| ≤ 2 if n satisfies (3.2.5). Finally, observe that
2
is an > 0 such that s(t) ≥ t for t ∈ (0, 1], and therefore that |`µ (ξ)| ≤
there
64R2
2 1+ r 2 for |ξ| ≤ R, which completes the proof of the first assertion.
Clearly it suffices to prove (3.2.10) when c = 0. Thus, let ϕ ∈ S (RN ; C) be
given. Then, by (2.3.4),
Z
1
N
n e n `µ (ξ) − 1 ϕ̂(ξ) dξ
(2π) n hϕ, µ n1 i − ϕ(0) =
RN
Z 1 Z Z
t
= e n `µ (ξ) `µ (ξ)ϕ̂(ξ) dξ dt −→ `µ (ξ)ϕ̂(ξ) dξ,
0 RN RN
1
where (keeping in mind that |e n `µ | = |µ̂ n1 (ξ)| ≤ 1, `µ (ξ) has a most quadratic
growth, and ϕ̂(ξ) is rapidly decreasing) the passage to the second line is justified
130 3 Infinitely Divisible Laws
x
where ϕR (x) = ϕ R for R > 0. Notice that, by applying the minimum principle
to both 1 and −1, one knows that A1 = 0.
To see that Aµ satisfies both these conditions, first observe that if ϕ(0) =
minx∈RN ϕ(x), then hϕ, µ n1 i − ϕ(0) ≥ 0 for all n ∈ Z+ , and therefore that
Aµ ϕ ≥ 0. Secondly, to check that Aµ is quasi-local, note that it suffices to treat
ϕ ∈ S (RN ; R) and that for such a ϕ, ϕc N
R (ξ) = R ϕ̂(Rξ). Thus,
Z
N
(2π) Aµ ϕR = `µ R−1 ξ ϕ̂(ξ) dξ −→ 0,
RN
|ϕ(y)|
(3.2.13) sup α
< ∞ =⇒ ϕ ∈ L1 (M ; C).
y∈RN \{0} 1 ∧ |y|
Using (3.2.13), one can easily check that if ϕ ∈ Cb2 (RN ; C) and η ∈ S (RN ; R)
equals 1 in a neighborhood of 0, then
y ϕ(y) − ϕ(0) − η(y) y, ∇ϕ(0) RN
(3.2.16)
Z
+ ϕ(y) − ϕ(0) − η(y) y, ∇ϕ(0) RN M (dy).
RN
preserving and has norm Km ; and so, by the Riesz Representation Theorem,
we now know that there is a unique non-negative Borel measure Mm on RN
such that MRm is supported on B(0, 2−m+1 ) \ B(0, 2−m−2 ), Km = Mm (RN ), and
A(χm ϕ) = RN ϕ(y) Mm (dy) for all ϕ ∈ S (RN ; R).
Now define the non-negative Borel measure M on RN by M = m∈Z Mm .
P
and therefore
Z
(3.2.17) Aϕ = ϕ(y) M (dy)
RN
Now let η be as in the statement of the lemma, and set ηR (y) = η(R−1 y) for
R > 0. By (**) with ϕ(y) = |y|2 η(y) we know that
Z
|y|2 η(y) M (dy) ≤ Aϕ < ∞.
RN
§ 3.2 The Lévy–Khinchine Formula 133
By our assumptions about ϕ at 0, we can find a C < ∞ such that |ηR ϕ(y)| ≤
CR|y|2 η(y) for all R ∈ (0, 1]. Hence, by (*) and the M -integrability of |y|2 η(y),
there is a C 0 < ∞ such that |A(ηR ϕ)| ≤ C 0 R for small R > 0, and therefore
A(ηR ϕ) −→ 0 as R & 0.
To complete the proof from here, let ϕ ∈ S (RN ; R) be given, and set
ϕ̃(x) = ϕ(x) − ϕ(0) − η(x) x, ∇ϕ(0) RN − 12 η(x)2 x, ∇2 ϕ(0)x RN .
Then, by the preceding, (3.2.17) holds for ϕ̃ and, after one re-arranges terms,
says that (3.2.16) holds. Thus, the properties of C are all that remain to be
proved. That C is symmetric requires no comment. In addition, from (*), it
is clearly non-negative definite. Finally, to see that it is independent of the η
chosen, let η 0 be a second choice, note that ηξ0 = ηξ in a neighborhood of 0, and
apply (3.2.17).
134 3 Infinitely Divisible Laws
(3.2.21) Z √
√
+ e −1 (ξ,y)RN − 1 − −1η(y) ξ, y RN M (dy)
RN
for any Lévy system (m, C, M ) and any Borel measurable η : RN −→ [0, 1]
that satisfies (3.2.19). Furthermore, because `η(m,C,Mr ) −→ `η(m,C,M ) uniformly
on compacts when Mr (dy) = 1[r,∞) (|y|) M (dy), it is clear that `η(m,C,M ) is
continuous.
Theorem 3.2.22 (Lévy–Khinchine). For each µ ∈ I(RN ), there is a unique
1
`µ ∈ C(RN ; C) such that `µ (0) = 0 and µ̂ = e`µ , and, for each n ∈ Z+ , e n `µ is
the Fourier transform of the unique µ n1 ∈ M1 (RN ) satisfying µ = µ?n1 . Next,
n
let η : RN −→ [0, 1] be a Borel measurable function that satisfies (3.2.19).
Then, for each µ ∈ I(RN ), there is a unique Lévy system (mηµ , Cµ , Mµ ) such
that `µ = `η(mη ,Cµ ,Mµ ) , and, for each Lévy system (m, C, M ), there is a unique
µ
Z
ϕ(y) Mµ (dy) = lim nhϕ, µ n1 i
RN n→∞
Z Z
2
Cµ = lim n η0 (y) y ⊗ y µ n1 (dy) − η0 (y)2 y ⊗ y Mµ (dy),
n→∞ RN RN
and Z
mηµ0 = lim n η0 (y)y µ n1 (dy)
n→∞ RN
given. For µ ∈ I(RN ), I will show that `µ = `η(mη ,C,M ) , where mη , C, and M are
determined from (cf. √
(3.2.10)) Aµ as in Lemma 3.2.14. To this end, define eξ for
ξ ∈ RN by eξ (x) = e −1(ξ,x)RN , and set ηR (x) = η(R−1 x) for R > 0. The idea
136 3 Infinitely Divisible Laws
is to show that, as R → ∞, Aµ (ηR eξ ) tends to both `µ (ξ) and to `η(mη ,C,M ) (ξ).
To check the first of these, use (3.2.10) to see that
Z Z
(2π)N Aµ (ηR eξ ) = ηR (ξ 0 + ξ) dξ 0 =
`µ (ξ 0 )c `µ (R−1 ξ 0 − x)η̂(ξ 0 ) dξ 0 .
RN RN
Hence, since `µ is continuous and, by Lemma 3.2.9, supR≥1 |`µ (R−1 ξ)η̂(ξ)| is
rapidly decreasing, Lebesgue’s Dominated Convergence Theorem says that
Z
lim Aµ (ηR eξ ) = `µ (−ξ)(2π)−N η̂R (ξ 0 ) dξ 0 = `µ (ξ).
R→∞ RN
η
To prove that Aµ (ηR eξ ) also tends to `(mη ,C,M ) (ξ), use (3.2.16) to write
√
Z
Aµ (ηR eξ ) = `η(mη ,C,M ) (ξ) −
−1
1 − ηR (y) eξ (y) M (dy),
RN
and observe that the last term is dominated by M B(0, R){ −→ 0.
So far we know that, for each µ ∈ I(RN ), there is a Lévy system (mη , C, M )
such that `µ (ξ) = `η(mη ,C,M ) . Moreover, in the preliminary discussion at the
beginning of this subsection, it was shown that, for each Lévy system (m, C, M ),
there exists a µ ∈ I(RN ) for which `η(m,C,M ) = `µ .
Finally, let η0 be as in the statement of this theorem. Given µ ∈ I(RN ), let
mµ ∈ RN , Cµ ∈ Hom(RN ; RN ), and Mµ ∈ M2 (RN ) be associated with Aµ as in
η0
C = Cµ , and M = Mµ .
The expression in (3.2.21) for `µ in terms of a Lévy system is known as the
Lévy–Khinchine formula.
Exercises for § 3.2 137
Hint: The first part is completely elementary complex analysis. To handle the
second part, begin by arguing that it is enough to treat the cases when either
M = 0 or C = 0. The case M = 0 is trivial, and the case when C = 0 can be
further reduced to the one in which µ = πM for an M ∈ M0 (RN ) with compact
P∞ m
support in RN \ {0}. Finally, use the representation πM = e−α m=0 αm! ν ?m to
complete the computation in this case.
Exercise 3.2.24. Given µ ∈ I(RN ) and knowing (3.2.20), show that
√ 2
ξ, mµ = − −1 lim t−1 `µ (tξ) + t2 ξ, Cµ ξ RN
and
t→∞
√
Z √
`µ (ξ) = − 12 ξ, Cµ ξ RN + −1 ξ, mµ RN + e −1(ξ,y)RN − 1 Mµ (dy).
RN
r
(i) To prove the “if” assertion, set M (dy) = 1[r,∞) (y) Mµ (dy) for r > 0, and
show that δmµ ? πM r (−∞, 0] = 0 for all r > 0 and δmµ ? πM =⇒µ as r & 0.
r
(ii) As a consequence of (i), we know that the µt ’s are infinitely divisible. Show
that their Lévy–Khinchine representation is
" Z #
√
−y dy
−1 ξy
µbt (ξ) = exp t e −1 e .
(0,∞) y
Exercise 3.2.27. Given a µ ∈ M1 (RN ) for which there exists a strictly increas-
ing sequence {nm : m ≥ 1} ⊆ Z+ and a sequence {µ n1 : m ≥ 1} ⊆ M1 (RN )
m
such that µ = µ?n1 m for all m ≥ 1, show that µ ∈ I(RN ).
nm
Hint: First use Lemma 3.2.4 to show that µ̂ never vanishes and therefore that
there is a unique `µ ∈ C(RN ; C) such that `µ (0) = 0 and µ̂ = e`µ . Next,
proceed as in the proof of Theorem 3.2.7 to show that µ ∈ P(RN ), and apply
that theorem to conclude that µ ∈ I(RN ).
§ 3.3 Stable Laws 139
for Borel measurable ϕ : RN −→ [0, ∞). It is easy to check that T̂α maps
M2 (RN ) into itself.
Lemma 3.3.2. For any α ∈ (0, 2),
( C = 0, Mµ = T̂α Mµ , and
µ
µ ∈ Fα (RN ) ∪ {δ0 } ⇐⇒
Z
1 1
1− α η
(1 − 2 )mµ = η(y) − η(2 α y) y Mµ (dy).
RN
In addition, if M ∈ M2 (RN ) \ {0} satisfies M = T̂α M for some α ∈ (0, 2), then
M ∈ Mβ (RN ) for all β > α but M ∈ / Mα (RN ).
Proof: From the uniqueness of the Lévy system associated with an element of
2
I(RN ), it is clear that, for any µ ∈ I(RN ), MTα µ = T̂α Mµ , CTα µ = 21− α Cµ ,
and Z
η 1 1
1− α η
mTα µ = 2 mµ + η(y) − η(2 α y) y T̂α Mµ (dy).
RN
140 3 Infinitely Divisible Laws
2
Hence, µ ∈ Fα (RN ) ∪ {δ0 } if and only if Mµ = T̂α Mµ , Cµ = 21− α Cµ , and, for
any η satisfying (3.2.19),
Z
1 1
1− α
)mηµ
(1 − 2 = η(y) − η(2 α y) y Mµ (dy).
RN
1
From this we see that κ ≡ M B(0, 1) \ B(0, 2− α ) > 0 unless M = 0 and that
P∞ β
the M -integral of |y|β over B(0, 1) is bounded below by 2−1 κ n=0 2n(1− α ) and
P∞ β
above by κ n=0 2n(1− α ) .
Theorem 3.3.3. µ ∈ F2 (RN ) if and only if µ = γ0,C for some non-negative
definite, symmetric C ∈ Hom(RN ; RN ) \ {0}. If α ∈ (1, 2), then µ ∈ Fα (RN ) if
and only if µ ∈ I(RN ) and `µ (ξ) equals
√
−1
Z
1
1− α
ξ, y RN
M (dy)
1−2
1
2− α <|y|≤1
√ √
Z
−1(ξ,y)RN
+ e −1− −1 1[0,1] (|y|) ξ, y RN
M (dy)
RN
T
for some M ∈ β>α Mβ (RN ) \ Mα (RN ) satisfying M = T̂α M . If α ∈ (0, 1),
then µ ∈ Fα (RN ) if and only if µ ∈ I(RN ) and `µ (ξ) equals
√
Z
−1(ξ,y)RN
e − 1 M (dy)
RN
T
N
for some M ∈ β>α M β (R ) \ Mα (RN ) satisfying M = T̂α M . Finally, µ ∈
F1 (RN ) if and only if µ ∈ I(RN ) and either µ = δm for some m ∈ RN \ {0} or
√ √ √
Z
e −1(ξ,y)RN − 1 − −1 1[0,1] (|y|) ξ, y RN M (dy)
`µ (ξ) = −1 m, ξ RN
+
RN
T
for some m ∈ RN and M ∈ N N
β∈(1,2] Mβ (R ) \ M1 (R ) satisfying M = T̂1 M
and Z
y M (dy) = 0.
1
2 <|y|≤1
§ 3.3 Stable Laws 141
Proof: The first assertion requires no comment. When α ∈ (0, 2), the “if”
1
assertions can be proved by checking that, in each case, `µ (ξ) = 2`µ (2− α ξ).
When α ∈ [1, 2), the “only if” assertion follows immediately from Lemma 3.3.2
with η = 1B(0,1) , and when α ∈ (0, 1), it follows from that lemma combined
with the observation that
Z Z
1
1− α
M = T̂α M =⇒ 1 − 2 y M (dy) = 1
y M (dy).
B(0,1) {2− α <|y|≤1}
§ 3.3.2. α-Stable Laws. The most studied elements of Fα (RN ) are the α-
stable laws: those µ ∈ I(RN )\{δ0 } such that `µ (tξ) = tα `µ (ξ) for all t ∈ (0, ∞),
not just for t = 2. Equivalently, if µ ∈ M1 (RN ) is α-stable if and only if
µ ∈ I(RN ) \ {δ0 } and, for all non-negative, Borel measurable functions ϕ,
Z Z
1
ϕ(y) µt (dy) = ϕ(t α y) µ(dy), t ∈ (0, ∞),
RN RN
where µbt (ξ) = et`µ (ξ) . Thus, there are no α-stable laws if α > 2, and µ is 2-stable
if and only if µ = γ0,C for some C 6= 0. To examine the α-stable laws when
α ∈ (0, 2), I will need the computations contained in the following lemmas.
Lemma 3.3.4. Assume that M ∈ M2 (RN ) and that α ∈ (0, 2), and define the
finite Borel measure ν on SN −1 by
Z
1 y
2 −|y|
hϕ, νi = ϕ |y| |y| e M (dy)
Γ(2 − α) RN \{0}
uniqueness of the Laplace transform (cf. Exercise 1.2.12) implies that ρ(dr) =
hϕ1 , νir1−α dr, and therefore that
Z Z Z
y
ϕ2 (r) dr
ϕ1 |y| M (dy) = ρ(dr) = hϕ1 , νi ϕ1 (r) .
RN \{0} (0,∞) r2 (0,∞) r1+α
Lemma 3.3.6. Let µ ∈ I(RN ). Then µ is 2-stable if and only if µ = γ0,C for
some symmetric, non-negative definite C 6= 0; µ is α-stable for some α ∈ (0, 1)
if and only if there is a finite, non-negative Borel measure ν 6= 0 on SN −1 such
that !
√−1(ξ,rω) N
Z Z
dr
`µ (ξ) = e R − 1 1+α ν(dω);
SN −1 (0,∞) r
and µ is α-stable for some α ∈ (1, 2) if and only if there is a finite, non-negative,
Borel measure ν 6= 0 on SN −1 such that `µ (ξ) equals
√
−1
Z
ξ, ω RN
ν(dω)
1−α SN −1
!
√−1(ξ,rω) N √
Z Z
dr
+ e R − 1 − −11[0,1] (r) ξ, rω RN 1+α ν(dω).
SN −1 (0,∞) r
§ 3.3 Stable Laws 143
Proof: The sufficiency part of each case is easy to check directly or as a conse-
quence of Theorem 3.3.3. To prove the necessity, first check that if µ is α-stable
and therefore `µ (tξ) = tα `µ (ξ), then M must have the scaling property in (3.3.5)
and therefore have the form described in Lemma 3.3.4. Second, when M has
this form, simply check that in each case the result in Theorem 3.3.3 translates
into the result here.
In the following, C+ denotes the open upper half-space {ζ ∈ C : Im(ζ) > 0}
in C, and C+ denotes its closure {ζ√ ∈ C : Im(ζ) ≥ 0}. In addition, given ζ ∈ C
and α ∈ (0, 2), we take ζ α ≡ |ζ|α e −1αargζ
√
, where argζ is 0 if ζ = 0 and is the
unique θ ∈ (−π, π] such that ζ = |ζ|e −1θ if ζ 6= 0.
Lemma 3.3.7. If α ∈ (0, 1), then
√ α
−1ζr
−1 Γ(1 − α
Z
e ζ
dr = √ for ζ ∈ C+ .
(0,∞) r1+α α −1
In particular,
Γ(2−α)
cos απ
α(α−1) 2 if α ∈ (1, 2)
cos r − 1
Z
aα ≡ dr = − Γ(1−α) cos απ if α ∈ (0, 1)
(0,∞) r1+α π α
2
−2 if α = 1
and
Γ(1 − α)
Z
sin r απ
bα ≡ dr = sin if α ∈ (0, 1).
(0,∞) r1+α α 2
Proof: Let fα (ζ) denote the integral on the left-hand side of the first equation.
Clearly fα is continuous on C+ and analytic on C+ . In addition, fα (ξ) = ξ αfα (1)
for ξ ∈ (0, ∞), and Re fα (1) < 0. Hence, there exist c > 0 and θ ∈ 0, π2 such
√
that fα (ξ) = −ce −1θ ξ α for ξ ∈ (0, ∞). Since ζ ∈ C+ 7−→ ζ α ∈ C is the unique
continuous extension of ξ ∈ (0,√∞) 7−→ ξ α ∈ (0, ∞) to C+ that is analytic on
C+ , we know that fα (ζ) = −ce −1θ ζ α for ζ ∈ C+ . In addition,
√ e−r − 1 Γ(1 − α)
Z Z
1
fα ( −1) = 1+α
dr = − r−α e−r dr = − .
(0,∞) r α (0,∞) α
Hence, c = Γ(1−α)
α and θ = − απ2 .
When α ∈ (0, 1), the values of aα and bα follow immediately from the evalu-
ation of fα (1). When α ∈ (1, 2), one can find the value of aα by first observing
that
cos(ξr) − 1 cos r − 1
Z Z
α
1+α
dr = ξ dr for ξ ∈ (0, ∞),
(0,∞) r (0,∞) r1+α
144 3 Infinitely Divisible Laws
cos r − 1
Z Z
sin r
α dr = − dr = −bα−1 .
(0,∞) r1+α (0,∞) rα
Γ(2 − α) cos απ
2 π
a1 = lim aα = − lim =− .
α&1 α&1 α 1−α 2
and
√ √
Z
`µ (ξ) = −1 ξ, m RN − −1 ξ, ω RN
log ξ, ω RN
ν(dω),
SN −1
√
where ζ log ζ = ζ log |ζ| + −1ζargζ for ζ ∈ C.
Proof: When α ∈ (0, 1), the conclusion is a simple application of the cor-
responding results in Lemmas 3.3.6 and 3.3.7. When α ∈ (1, 2), one has to
massage the corresponding expression in Lemma 3.3.6. Specifically, begin with
the observation that
√
−1ξ h √ √
Z i dr
+ e −1ξr − 1 − −1ξ1[0,1] (r)r 1+α
1−α (0,∞) r
√ !
h √ √ −1sgn(ξ)
Z i dr
α −1sgn(ξ)r
= |ξ| e − 1 − −1sgn(ξ)1[0,1] (r)r 1+α +
(0,∞) r 1−α
Next use integration by parts over the intervals (0, 1] and [1, ∞) to check that
cos r − 1
Z Z
dr 1 1 1 aα−1
sin r − 1[0,1] (r)r 1+α
= + α
dr = + .
(0,∞) r α−1 α (0,∞) r α−1 α
aα−1 Γ(2−α)
Hence, since α = − α(α−1) sin απ
2 ,
Γ(2 − α) ∓ απ
gα (±1) = e 2 ,
α(α − 1)
and therefore
α
α Γ(2 − α) (ξ, ω)RN
gα sgn(x, ω)RN ξ, ω RN = √ .
α(α − 1) −1
1−α
Thus, all that we need to do is replace the ν in Theorem 3.3.8 by Γ(1−α) ν.
Turning to the case α = 1, note that, because of the mean zero condition on
ν,
!
√ √
Z Z h i dr
−1(ξ,ω)RN r
e −1− −11[0,1] (r)r ξ, ω RN
ν(dω)
SN −1 (0,∞) r2
!
Z h √ Z i dr
= lim e −1(ξ,ω)RN r − 1 1+α ν(dω)
α%1 SN −1 (0,∞) r
α
Γ(1 − α)
Z
(ξ, ω)RN
= − lim √ ν(dω)
α%1 α SN −1 −1
√
Z
1 α
= −1 lim ξ, ω RN − ξ, ω RN ν(dω)
α%1 1 − α SN −1
√
Z
= − −1 ξ, ω RN log ξ, ω RN ν(dω),
SN −1
Corollary 3.3.9. For any α ∈ (0, 2], µ is a symmetric and α-stable law if
and only if there is a finite, non-negative, symmetric, Borel measure ν 6= 0 on
SN −1 such that Z
ξ, ω N α ν(dω).
`µ (ξ) = − R
SN −1
we see that
Z
1
ξ, ω N 2 ν(dω).
`µ (ξ) = − ξ, Cξ RN =
2 SN −1
R
the expression for `µ (ξ) in Theorem 3.3.8. Hence, by the preceding calculation,
`µ (ξ) has the desired form.
Finally, if µ is a rotationally invariant, α-stable law, then `µ (ξ) is a rotationally
invariant function of ξ and therefore the preceding leads to
Z Z
α
|ξ|α ω, ω 0 ν(dω 0 ) λSN −1 (dω) = −t|ξ|α ,
`µ (ξ) = −
SN −1 SN −1
Exercise 3.3.10. Given α ∈ (0, 2), define Sα ν for finite, non-negative, Borel
1
measures ν on B(0, 1) \ B(0, 2− α ) by
Z
m
X
Sα ν(Γ) = 2−m 1Γ (2 α y) ν(dy),
m∈Z RN
and show that this map is one-to-one and onto the set of M ∈ M2 (RN ) satisfying
(cf. (3.3.1)) M = T̂α M . Conclude that, for each α ∈ (0, 2), Fα (RN ) contains
lots of elements!
Exercise 3.3.11. Here are a few further properties of elements of Fα (RN ).
(i) Show that there is µ ∈ Fα (RN ) such that µ {y : (e, y)RN < 0} = 0 for some
e ∈ SN −1 if and only if α ∈ (0, 1).
Hint: Reduce to the case when N = 1, and look at Exercise 3.2.24.
N −1
(ii) If µ ∈ F1 (RN ), show that,
for every e ∈ S , µ {y : (e, y)RN < 0} >
0 ⇐⇒ µ {y : (e, y)RN > 0} > 0.
(iii) If α ∈ (1, 2), show that for each > 0 there is a µ ∈ Fα (R) such that
µ (−∞, −] = 0.
Exercise 3.3.12. Take N = 1. This exercise is about an important class of
stable laws known as one-sided stable laws: stable laws that are supported
on [0, ∞).
(i) Show that there exists a one-sided α-stable law only if α ∈ (0, 1).
148 3 Infinitely Divisible Laws
(ii)If α ∈
α(0, 1), show that µ is a one-sided α-stable law if and only if `µ (ξ) =
ξ
−t √−1 for some t ∈ (0, ∞).
(iii) Let α ∈ (0, 1), and use νtα to denote the one-sided α-stable law with `νtα (ξ) =
α
−t √ξ−1 . Show that
Z √
α
−1ζy ζ
e νtα (dy) = exp −t √ for ζ ∈ C with Im(ζ) ≥ 0.
[0,∞) −1
Exercise 3.3.13. Given α ∈ (0, 2], let µα t denote the symmetric α-stable law,
described in Corollary 3.3.9, with `µαt (ξ) = −t|ξ|α . Clearly µ2t = γ0,2tI . When
α ∈ (0, 2), show that
Z
α
α
µt = γ0,2τ I νt2 (dτ ),
[0,∞)
α
where νt2 is the one-sided α2 -stable law in part (iii) of the preceding exercise.
This representation is an example of subordination, and, as we will see in
Exercise 3.3.17, can be used to good effect.
dνtα
(3.3.15) hα
t = for t ∈ (0, ∞),
dλR
1 1
−α α −α
and that hα
t (τ ) ≡ t h1 (t τ ).
Exercises for § 3.3 149
(ii) Only when α = 12 is an explicit expression for hα1 readily available. To find
this expression, first note that, by the uniqueness of the Laplace transform (cf.
1
Exercise 1.2.12) and (i), h12 is uniquely determined by
Z ∞
2 1
e−λ τ h12 (τ ) dτ = e−λ , λ ∈ [0, ∞).
0
Next, show that
Z ∞ 1 ∞ 1
π 2 e−2ab π 2 e−2ab
Z
a2 2 a2
1 3
+b2 τ )
τ − 2 e−( τ +b τ ) dτ = and τ − 2 e−( τ dτ =
0 b 0 a
2
for all (a, b) ∈ (0, ∞) , and conclude from the second of these that
1
1 1(0,∞) (τ )e− 4τ
(3.3.16) h1 (τ ) =
2
√ 3 .
4πτ 2
1 1
Hint: To prove the first identity, try the change of variables x = aτ − 2 − bτ 2 ,
and get the second by differentiating the first with respect to a.
Exercise 3.3.17. In this exercise we will discuss the densities of the symmetric
stable laws µα t for α ∈ (0, 2) (cf. Exercise 3.3.13). Once again, we know that
each µαt admits a smooth density with respect to Lebesgue measure λRN on RN .
Further, it is clear that this density is symmetric and that
dµα 1 dµ
α
1
t
(x) = t− α 1
(t− α x) for t ∈ (0, ∞).
dλRN dλRN
(i) Referring to Exercise 3.3.14 and using Exercise 3.3.12, show that
Z ∞
dµα 1 −N
|x|2 α
2 e− 4τ h 2 (τ ) dτ.
1
(3.3.18) (x) = N τ
dλRN (4π) 2 0
1
(ii) Because we have an explicit expression for h12 , we can use (3.3.18) to get an
dµ11
explicit expression for dλRN . In fact, show that
dµ1t N 2tN
(3.3.19) (x) = πtR (x) ≡ N +1 , (t, x) ∈ (0, ∞) × RN ,
dλRN ωN (t2 + |x|2 ) 2
N +1 −1
where ωN = 2π 2 Γ N2+1 is the surface area of SN in RN +1 . The function
R
π1 is the density for what probabilists call the Cauchy distribution. For
N
general N ’s, (t, x) ∈ (0, ∞) × RN 7−→ πtR (x) is what analysts call the Poisson
kernel for the right half-space in RN +1 . That is (cf. Exercise 10.2.22), if f ∈
Cb (RN ; R), then
Z
N
(t, x) uf (t, x) = f (x − y) πtR (y) dy
RN
is the unique, bounded harmonic extension of f to the right half-space.
150 3 Infinitely Divisible Laws
for f ∈ L1 (RN ; C). This can be used to prove that k · kα determines a Hilbert
norm on Cc (RN ; C).
Chapter 4
Lévy Processes
Although analysis was the engine that drove the proofs in Chapter 3, probability
theory can do a lot to explain the meaning of the conclusions drawn there.
Specifically, in this chapter I will develop an intuitively appealing way of thinking
about a random variable
√−1 (ξ,X) X whose distribution is infinitely divisible, an X for
P
which E e RN
equals
√
1
exp −1 ξ, m) − 2 ξ, Cξ RN
h √ √
Z
−1 (ξ,y)RN
i
+ e − 1 − −1 1[0,1] |y| ξ, y RN M (dy)
RN
151
152 4 Lévy Processes
For reasons that should be obvious now, an evolution {Z(t) : t ∈ [0, ∞)} of the
sort described above used to be called a process with independent, homo-
geneous increments, the term “process” being the common one for continuous
families of random variables and the adjective “homogeneous” referring to the
fact that the distribution of the increment Z(t) − Z(s) for 0 ≤ s < t depends
only on the length t − s of the time interval over which it is taken. In more
recent times, a process with independent, homogeneous increments is said to be
a Lévy process, and so I will adopt this more modern terminology.
Assuming that the family {Z(t) : t ∈ [0, ∞)} exists, notice that we already
know what the joint distribution of {Z(tk ) : k ∈ N} must be for any choice of
0 = t0 < · · · < tk < · · · . Indeed, Z(0) = 0 and
K
Y
P Z(tk ) − Z(tk−1 ) ∈ Γk , 1 ≤ k ≤ K = µtk −tk−1 (Γk )
k+1
for any K ∈ Z+ and Γ1 , . . . , ΓK ∈ BRN . Equivalently, P Z(tk ) ∈ Γk , 0 ≤ k ≤ K
equals
Z Z Y K X k K
Y
1Γ0 (0) · · · 1Γk yj µtk −tk−1 (dyk )
k=1 j=1 k=1
(RN )K
K
X
var[a,b] (ψ) = sup |ψ(tk ) − ψ(tk−1 )| : K ∈ Z+
(4.1.2) k=1
and a = t0 < t1 < · · · < tK = b
is finite subset of (0, t]. In addition, there exists an n(t, r, ψ) ∈ N such that, for
every n ≥ n(t, r, ψ) and m ∈ Z+ ∩ (0, 2n ],
Finally,
kψk[0,t] = lim max |ψ(m2−n t)| : m ∈ N ∩ [0, 2n ]
n→∞
and X
ψ m2−n t − ψ (m − 1)2−n t .
var[0,t] (ψ) = lim
n→∞
m∈Z+ ∩[0,2n ]
Proof: Begin by noting that it suffices to treat the case when t = 1, since one
can always reduce to this case by replacing ψ with τ ψ(tτ ).
If kψk[0,1] were infinite, then we could find a sequence {τn : n ≥ 1} ⊆ [0, 1] such
that |ψ(τn )| −→ ∞, and clearly, without loss in generality, we could choose this
sequence so that τn −→ τ ∈ [0, 1] and {τn : n ≥ 1} is either strictly decreasing or
strictly increasing. But, in the first case this would contradict right-continuity,
and in the second it would contradict the existence of left limits. Thus, kψk[0,1]
must be finite.
Essentially the same reasoning shows that J(1, r, ψ) is finite. If it were not,
then we could find a sequence {τn : n ≥ 0} of distinct points in (0, 1] such
that |ψ(τn ) − ψ(τn −)| ≥ r, and again we could choose them so that they were
either strictly increasing or strictly decreasing. If they were strictly increasing,
then τn % τ for some τ ∈ (0, 1] and, for each n ∈ Z+ , there would exist a
τn0 ∈ (τn−1 , τn ) such that |ψ(τn ) − ψ(τn0 )| ≥ 2r , which would contradict the
existence of a left limit at τ . Similarly, right-continuity would be violated if the
τn ’s were decreasing.
Although it has the same flavor, the proof of the existence of n(1, r, ψ) is a
bit trickier. Let 0 < τ1 < · · · τK ≤ 1 be the elements of J(1, r, ψ). If n(1, r, ψ)
§ 4.1 Stochastic Processes, Some Generalities 155
The assertion about var[0,1] (ψ) is proved in essentially the same manner, al-
though now the monotonicity comes from the triangle inequality and the first
equality in the preceding must be replaced by |ψ(t)−ψ(t−)| = limn→∞ |ψ(btc+n )−
−
ψ(btcn )|.
I next give D(RN ) the topological structure corresponding to uniform conver-
gence on compacts, or, equivalently, the topological structure for which
∞
X kψ − ψ 0 k[0,n]
ρ(ψ, ψ 0 ) ≡ 2−n
n=1
1 + kψ − ψ 0 k[0,n]
lim |ψ(τ ) − ψ(τn )| ≤ 2kψ − ψk k[0,t] + lim |ψk (τ ) − ψk (τn )| ≤ 2kψ − ψk k[0,t]
n→∞ n→∞
156 4 Lévy Processes
{0, 1} for each t > 0, it has at most a countable number of discontinuities, and
at most fr (t) of them can occur in any interval (0, t]. Furthermore, if fr has a
discontinuity at τ , then j τ, B(0, r) − j τ −, B(0, r) = 0, and so the measure
ντ = j(τ, · ) − j(τ −, · ) is a {0, 1}-valued probability measure on RN that assigns
mass 0 to B(0, r). Hence (cf. Exercise 4.1.15) fr (τ ) 6= fr (τ −) =⇒ ντ = δy
for some yτ ∈ RN \ B(0, r). From these considerations, it follows easily that if
J(r) = {τ ∈ (0, ∞) : fr (τ ) 6= fr (τ −)} and if, for each τ ∈ J(r), yτ ∈ RN \B(0, r)
is chosen so that j(τ, · ) − j(τ −, · ) = δyτ , then J(r) ∩ (0, t] is finite for all t > 0
and X
j t, · B(0, r){ = 1[τ,∞) (t)δyτ .
τ ∈J(r)
S
Thus, if J = r>0 J(r), then J is at most countable, {(τ, yτ ) : τ ∈ J} has the
required finiteness property, and (4.1.5) holds.
The reason for my introducing jump functions is that every element ψ ∈
D(RN ) determines a jump function t j(t, · , ψ) by the prescription
X
j(t, Γ, ψ) = 1Γ ψ(τ ) − ψ(τ −) ,
(4.1.6) τ ∈J(t,ψ)
for Γ ⊆ RN \ {0}.
S To check that j(t, · , ψ) is well defined and is a jump function,
take J(ψ) = t>0 J(t, ψ) and yτ = ψ(τ ) − ψ(τ −) when τ ∈ J(ψ), note that,
by Lemma 4.1.3, J(ψ) is at most countable and that {(τ, yτ ) : τ ∈ J(ψ)} has
the finiteness required in Lemma 4.1.4, and observe that (4.1.5) holds when
j(t, · ) = j(t, · , ψ) and J = J(ψ).
Because it will be important for us to know that the distribution of a D(RN )-
valued stochastic process determines the distribution of the jump functions for
its paths, we will make frequent use of the following lemma.
Lemma 4.1.7. If ϕ : RN −→ R is a BRN -measurable function that vanishes in a
neighborhood of 0, then ϕ is j(t, · , ψ)-integrable for all (t, ψ) ∈ [0, ∞) × D(RN ),
and Z
(t, ψ) ∈ [0, ∞) × D(RN ) 7−→ ϕ(y) j(t, dy, ψ) ∈ R
RN
j(t, · ,Rψ)-integrable for all (t, ψ) ∈ [0, ∞) × D(RN ) and, for each ψ ∈ D(RN ),
t RN
ϕ(y) j(t, dy, ψ) is right-continuous and piecewise constant. Thus, it
suffices to show that, for each t ∈ (0, ∞),
Z
(*) ψ ϕ(y) j(t, dy, ψ) is FD(RN ) -measurable.
RN
the absolutely pure jump paths are those that are the piecewise constant paths:
those absolutely pure jump ψ’s for which j(t, · , ψ) ∈ M0 (RN ), t > 0. Because
of Lemma 4.1.7, each of these properties is FD(RN ) -measurable. In particular, if
{Z(t) : t ≥ 0} is a D(RN )-valued stochastic process whose paths almost surely
have any one of these properties, then the paths of every D(RN )-valued stochastic
process with the same distribution as {Z(t) : t ≥ 0} will almost surely possess
that property.
Finally, I need to address the question of when a jump function is the jump
function for some ψ ∈ D(RN ).
Theorem 4.1.8. Let t j(t, · ) be a non-zero jump function, and set j Γ (t, dy)
¯ and if ψ ∆ (t) =
R 1Γ (y)j(t, dy) for ∆Γ ∈ BRN . If ∆ ∈ BRN with 0 ∈
= / ∆
∆
y j(t, dy), then ψ is a piecewise constant element of D(RN ), j(t, · , ψ ∆ ) =
N
j ∆ (t, · ), and j(t, · , ψ−ψ ∆ ) = j R \∆ (t, · ) = j(t, · )−j ∆ (t, · ) for any ψ ∈ D(RN )
whose jump function is t j(t, · ). Finally, suppose that {ψm : m ≥ 0} ⊆
N
D(R ) and a non-decreasing Ssequence {∆m : m ≥ 0} ⊆ BRN satisfy the
∞
conditions that RN \ {0} = m=0 ∆m and, for each m ∈ N, 0 ∈ / ∆m and
∆m
j(t, · , ψm ) = j (t, · ), t ≥ 0. If ψm −→ ψ uniformly on compacts, then
j(t, · , ψ) = j(t, · ), t ≥ 0.
Exercises for § 4.1 159
Proof: Throughout the proof I will use the notation introduced in Lemma
4.1.4.
Assuming that 0 ∈ ¯ we know that
/ ∆,
X
j ∆ (t, · ) = 1[τ,∞) (t)1∆ (yτ )δyτ ,
τ ∈J
where, for each t > 0, there are only finitely many non-vanishing terms. At the
same time,
X X
ψ ∆ (t) = 1[τ,∞) (t)1∆ (yτ )yτ and j(t, · , ψ−ψ ∆ ) = 1[τ,∞) (t)1RN \∆ (yτ )δyτ
τ ∈J τ ∈J
if j(t, · , ψ) = j(t, · ). Thus, all that remains is to prove the final assertion. To
this end, suppose that j(t, · , ψ) 6= j(t−, · , ψ). Since kψ − ψm k[0,t] −→ 0, there
exists an m such that ψm (t) 6= ψm (t−) and therefore that j(t, · ) − j(t−, · ) = δy
for some y ∈ ∆m . Since this means that ψn (t) − ψn (t−) = y for all n ≥ m, it
follows that ψ(t) − ψ(t−) = y and therefore that j(t, · , ψ) − j(t−, · , ψ) = δy =
j(t, · ) − j(t−, · ). Conversely, suppose that j(t, · ) 6= j(t−, · ) and choose m so
that j(t, · ) − j(t−, · ) = δy for some y ∈ ∆m . Then ψn (t) − ψn (t−) = y for
all n ≥ m. Thus, since this means that ψ(t) − ψ(t−) = y, we again have that
j(t, · , ψ) − j(t−, · , ψ) = δy = j(t, · ) − j(t−, · ). After combining these, we see
that j(t, · , ψ) − j(t−, · , ψ) = j(t, · ) − j(t−, · ) for all t > 0, from which it is an
easy step to j(t, · ) = j(t, · , ψ) for all t ≥ 0.
Exercises for § 4.1
Next, define ψt as in Exercise 4.1.10, and use that exercise together with the
preceding to show that the open set {ψ ∈ D(RN ) : ∃ t ∈ [0, 1] kψ − ψt k[0,1] < 1}
is not FD(RN ) -measurable. Conclude that BD(RN ) % FD(RN ) . Similarly, conclude
that neither D(RN ) nor C(RN ) is a measurable subset of (RN )[0,∞) . On the
other hand, as we have seen, C(RN ) ∈ FD(RN ) .
Exercise 4.1.12. Show that
Z
(4.1.13) var[0,t] (ψ) ≥ |y| j(t, dy, ψ), (t, ψ) ∈ [0, ∞) × D(RN ).
RN
Hint: This is most easily seen from the representation of j(t, · , ψ) in terms of
point masses at the discontinuities of ψ. One can use this representation to show
that, for each r > 0,
X Z
var[0,t] (ψ) ≥ ψ(τ ) − ψ(τ −) =
|y| j(t, dy, ψ), (t, ψ) ∈ [0, ∞).
τ ∈J(t,r,ψ) |y|≥r
Exercise
R 4.1.14. If ψ is an absolutely pure jump path, show that var[0,t] (ψ) =
|y| j(t, dy, ψ) and therefore that ψ has locally bounded variation. Conversely,
if ψ ∈ C(RN ) has locally bounded variation, show that ψ is an absolutely pure
if ψ ∈ D(RN )
R
jump path if and only if var[0,t] (ψ) = |y| j(t, dy, ψ). Finally,
and j(t, · , ψ) ∈ M1 (RN ) for all t ≥ 0, set ψc (t) ≡ ψ(t) − y j(t, dy, ψ) and
R
Exercise 4.1.15. If ν ∈ M1 (RN ), show that ν(Γ) ∈ {0, 1} for all Γ ∈ BRN if
and only if ν = δy for some y ∈ RN .
Hint: Begin by showing that it suffices to handle the case when N = 1. Next,
assuming that N = 1, show that ν is compactly supported, let m be its mean
value, and show that ν = δm .
§ 4.2 Discontinuous Lévy Processes
In this section I will construct the Lévy processes corresponding to those µ ∈
I(RN ) with no Gaussian component. That is,
√
µ̂(ξ) = exp −1 ξ, mµ RN
(4.2.1)
√
Z h √
−1(ξ,y)
i
+ e − 1 − −1 1[0,1] (|y|) ξ, y RN Mµ (dy) .
RN
§ 4.2 Discontinuous Lévy Processes 161
Because they are the building blocks out of which all such processes are made,
I will treat separately the case when µ is a Poisson measure πM for some M ∈
M0 (RN ) and will call the corresponding Lévy process the Poisson process
associated with M .
§ 4.2.1. The Simple Poisson Process. I begin with the case when P∞ N 1= 1
−1
and M = δ1 , for which πM is the simple Poisson measure e m=0 m! δm
√
whose Fourier transform is exp e −1ξ − 1 .
To construct the Poisson process associated with δ1 , start with a sequence
{τm : m ≥ 1} of independent, unit exponential random variables on a proba-
bility space (Ω, F, P). That is,
n
!
X
+
P {ω : τ1 (ω) > t1 , . . . , τn (ω) > tn } = exp − tm
m=1
for all n ∈ Z+ and (t1 , . . . , tn ) ∈ Rn . Without loss in generality, I may and will
assume that τm (ω) > 0 for all m ∈ Z+ and ω ∈ Ω. InPaddition, by The Strong
∞
Law of Large Numbers, I may and will assume Pn that m=1 τm (ω) = ∞ for all
ω ∈ Ω. Next, set T0 (ω) = 0 and Tn (ω) = m=1 τm (ω), and define
∞
X
(4.2.2) N (t, ω) = max{n ∈ N : Tn (ω) ≤ t} = 1[Tn (ω),∞) (t) for t ∈ [0, ∞).
n=1
Pn Pn+1
where A = (τ1 , . . . , τn+1 ) ∈ (0, ∞)n+1 : m=1 τm ≤ t < m=1 τm and
Pn
B = (τ1 , . . . , τn ) ∈ (0, ∞)n : m=1 τm ≤ t . By making the change of
Pm
variables sm = j=1 τj and remarking that the associated Jacobian is 1, one
sees that |B| = |C|, where C = (s1 , . . . , sn ) ∈ Rn : 0 < s1 < · · · < sn ≤ t .
n
Since |C| = tn! , we have shown that the P-distribution of N (t) is the Poisson
measure πtδ1 . In particular, πδ1 is the P-distribution of N (1).
I now want to use the same sort of calculation to show that {N (t) : t ∈ [0, ∞)}
is a simple Poisson process, that is, a Lévy process for πδ1 . (See Exercise
4.2.18 for another, perhaps preferable, approach.)
162 4 Lévy Processes
Lemma 4.2.3. For any (s, t) ∈ [0, ∞), the P-distribution of the increment
N (s + t) − N (s) is πtδ1 . In addition, for any K ∈ Z+ and 0 = t0 < t1 < · · · < tK ,
the increments {N (tk ) − N (tk−1 ) : 1 ≤ k ≤ K} are independent.
Proof: What I have to show is that, for all K ∈ Z+ , 0 = n0 ≤ · · · ≤ nK , and
0 = t0 < t1 < · · · < tK ,
P N (tk ) − N (tk−1 ) = nk − nk−1 , 1 ≤ k ≤ K
K
Y e−(tk −tk−1 ) (tk − tk−1 )nk −nk−1
= ,
(nk − nk−1 )!
k=1
and, since the case when nK = 0 is trivial, I will assume that nK ≥ 1. In fact,
because neither side is changed if one removes those nk ’s for which nk = nk−1 ,
I will assume that 0 = n0 < · · · < nK .
Begin by noting that
P N (tk ) = nk , 0 ≤ k ≤ K = P Tnk ≤ tk < Tnk+1 , 1 ≤ k ≤ K
Z Z PnK +1
= · · · e− m=1 τm dτ1 · · · dτnK +1 = e−tK |B|,
A
where
nk nX
k +1
( )
X
nK +1
A= (τ1 , . . . , τnK +1 ) ∈ (0, ∞) : τm ≤ tk < τm , 1 ≤ k ≤ K
m=1 m=1
and
nk
( )
X
B= (τ1 , . . . , τnK ) ∈ (0, ∞)nK : tk−1 < τm ≤ tk : 1 ≤ k ≤ K .
m=1
Pm
To compute |B|, make the change of variables sm = j=1 τj to see that |B| =
|C|, where
C = (s1 , . . . , snK ) ∈ RnK : tk−1 < snk−1 +1 < · · · < snk ≤ tk for 1 ≤ k ≤ K .
Ck = (snk−1 +1 , . . . , snk ) ∈ Rnk −nk−1 : tk−1 < snk−1 +1 < · · · < snk ≤ tk ,
§ 4.2 Discontinuous Lévy Processes 163
X ∞
X
(4.2.5) j t, · , ZM ( · , ω) = δXn (ω) = 1[Tn (ω),∞) (t)δXn (ω) .
1≤n≤N (αt,ω) n=1
I now want to check that {ZM (t) : t ≥ 0} is a Lévy process for πM and, as
such, deserves to be called a Poisson process associated with M : the one with
rate M (RN ) and jump distribution M M (RN )
. That is, I want to show that, for
each 0 = t0 < t1 < · · · tK , the random variables ZM (tk ) − ZM (tk−1 ), 1 ≤ k ≤ K,
164 4 Lévy Processes
are mutually independent and that the kth one has distribution π(tk −tk−1 )M .
Equivalently, I need to check that, for any ξ1 , . . . , ξK ∈ RN ,
" K
!# K
P
√ X Y
E exp −1 ξk , ZM (tk ) − ZM (tk−1 ) RN = π[
τk M (ξk ),
k=1 k=1
Proof: In proving the first part, I will, without loss in generality, assume that
(cf. (4.2.4)) Z = ZM . But then, by (4.2.5),
X
ZF (t, ω) =
F Xn (ω) ,
1≤n≤N (αt,ω)
§ 4.2 Discontinuous Lévy Processes 165
from which the first assertion follows immediately from the same computation
with which I just showed that {ZM (t) : t ≥ 0} is a Poisson process associated
with M .
To prove the second assertion, I begin by observing that it suffices to treat
the case when I = {1, 2}. To see this, suppose that we know the result in that
case, and let n > 2 and a set {i1 , . . . , in } of distinct elements from I be given.
By taking F1 = (Fi1 , . . . , Fin−1 ), F2 = Fin , and applying the assumed result, we
would have that {ZFin (t) : t ≥ 0} is independent of ZFi1 (t), . . . , ZFin−1 (t) :
t ≥ 0 . Hence,
F proceeding by induction, we would be able to show that the
processes {Z im (t) : t ≥ 0} : 1 ≤ m ≤ n are independent.
Now assume that I = {1, 2}. What I have to check is that, for any K ∈ Z+ ,
0 = t0 < t1 < · · · < tK , and {(ξk1 , ξk2 ) : 1 ≤ k ≤ K} ⊆ RN1 × RN2 ,
" K h
√ X
ξk1 , ZF1 (tk ) − ZF1 (tk−1 RN1
P
E exp −1
k=1
i #
+ ξk2 , ZF2 (tk ) − ZF2 (tk−1 ) RN2
" K
!#
√ X
ξk1 , ZF1 (tk ) F1
P
=E exp −1 − Z (tk−1 ) RN1
k=1
" K
!#
√ X
ξk2 , ZF2 (tk ) F2
P
×E exp −1 − Z (tk−1 ) RN2 .
k=1
For this purpose, take F : RN −→ RN1 +N2 to be given by F (y) = F1 (y), F2 (y) ,
and set ξk = (ξk1 , ξk2 ). Then the first expression in the preceding equals
" K #
√ X
F F
P
E exp −1 ξk , Z (tk ) − Z (tk−1 RN1 +N2
k=1
K h √
Y i
EP exp −1 ξk , ZF (tk − tk−1 ) RN1 +N2
= ,
k=1
since {ZF (t) : t ≥ 0} has independent, homogeneous increments. Hence, it
suffices to observe that, for any t > 0 and ξ = (ξ 1 , ξ 2 ),
h i Z √
EP exp ξ, ZF (t) RN1 +N2 = exp t e −1(ξ,F (y))RN1 +N2 − 1 M (dy)
RN
Z √
−1(ξ1 ,F1 (y))RN1
= exp t e − 1 M (dy)
RN
Z √
−1(ξ2 ,F2 (y))RN2
× exp t e − 1 M (dy)
RN
h i h i
= EP exp ξ 1 , ZF1 (t) RN1 EP exp ξ 2 , ZF2 (t) RN2 .
166 4 Lévy Processes
Now assume that the Mk ’s are as in the final part of the statement, and choose
∆k ’s accordingly. Without loss in generality, I will assume that RN \ {0} =
SK
k=1 ∆k . Also, because the assertion depends only on the joint distribution of
the processes involved, I may and will assume that
Z
Zk (t) = y j t, dy, Z for 1 ≤ k ≤ K,
∆k
PK
since then Z(t) = k=1 Zk (t), and, by Theorem 4.2.8, the Zk ’s are independent
and the kth one is a Poisson process associated with Mk . But
with this choice,
another application of Theorem 4.2.8 shows that j t, Γ, Zk = j t, Γ ∩ ∆k , Z ,
and therefore
K
X
j t, Γ, Z = j t, Γ, Zk , t ∈ [0, ∞).
k=1
Because the paths of a Poisson process are piecewise constant, they certainly
have finite variation on each compact time interval. The first part of the next
lemma provides an estimate of that variation. The estimate in the second part
will be used in § 4.2.5.
Lemma 4.2.10. If {Z(t) : t ≥ 0} is a Poisson process associated with M ∈
M0 (RN ), then Z
P
E var[0,t] (Z) = t |y| M (dy).
RN
R R
In addition, if RN
|y| M (dy) < ∞ and Z̄(t) = Z(t) − RN
y M (dy), then
N 2t N 2t
Z
≥ R ≤ 2 EP |Z̄(t)|2 = 2 |y|2 M (dy).
P kZ̄k[0,t]
R R RN
1≤`≤m
are measurable. Note that if |F (y)| vanishes for y’s in a neighborhood of 0, then
Ω(T ) = Ω for all T > 0.
My goal in this subsection is to prove the following existence result.
§ 4.2 Discontinuous Lévy Processes 169
0
then M F ∈ M0 (RN ), {ZF (t) : t ≥ 0} is a Poisson process associated with M F ,
and j t, · , ZF ( · , ω) = j F (t, · , ω).
Proof: To prove the first assertion, suppose that {∆1 , . . . , ∆n } are disjoint
0 Sn
Borel subsets of RN and that 0 ∈ / i=1 ∆i . Then {F −1 (∆1 ), . . . , F −1 (∆n )}
satisfy the same conditions as subsets of RN , and therefore, since j F (t, ∆i , ω) =
−1
F
j t, F (∆i ), ω), {j (t, ∆i ) : t ≥ 0} : 1 ≤ i ≤ n has the required properties.
170 4 Lévy Processes
0
Turning to the second assertion, first note that M F ∈ M0 (RN ) is an immedi-
0
ate consequence of 0 ∈ / F −1 (RN \ {0}) and that the equality j t, · , ZF ( · , ω) =
j F (t, · , ω) is a trivial application of the final part of Theorem 4.1.8. To prove
that {ZF (t) : t ≥ 0} is a Poisson process associated with M F , use Theorem
4.2.8 to see that {j F (t, · ) : t ≥ 0} has the same distribution as the jump
process for a Poisson process {Z(t) : t ≥ 0} associated with M F . Hence,
since Z(t) = y j(t, dy, Z), {ZF (t) : t ≥ 0} has the same distribution as
R
{Z(t) : t ≥ 0}.
§ 4.2.4. Lévy Processes with Bounded Variation. Although the contents
of the previous section provide the machinery with which to construct a Lévy
process for any µ with Fourier transform given by (4.2.1), for reasons made clear
in the next lemma, I will treat the special case when M ∈ M1 (RN ) here and will
deal with M ∈ M2 (RN ) \ M1 (RN ) in the following subsection.
Lemma 4.2.13. Let {j(t, · ) : t ≥ 0}R be a Poisson jump process associated
with M ∈ M2 (RN ), and set V (t, ω) = |y| j(t, dy, ω). Then V (t) < ∞ almost
surely or V (t) = ∞ almost surely for all t > 0, depending on whether M is or is
not in M1 (RN ). (See Exercise 4.3.11 to see that the same conclusion holds for
any M ∈ M∞ (RN ).)
R
Proof: Since |y|>1 |y| j(t, dy, ω) < ∞ for all (t, ω) ∈ [0, ∞) × Ω, the question
R
is entirely about the finiteness of V0 (t, ω) ≡ B(0,1) |y| j(t, dy, ω). To study this
−k+1 ) \ B(0, 2−k ), F (y) = |y|1
Rquestion, set Ak = B(0, 2 k Ak (y), and Vk (t, ω) =
Ak
|y| j(t, dy, ω) for k ≥ 1. Clearly, the processes {V k (t) : t ≥ 0} : k ∈ Z+
are mutually independent. In addition, for each k, t Vk (t) is non-decreasing
and, by the second part of Lemma 4.2.12, {Vk (t) : t ≥ 0} is a Poisson process
associated with M Fk . Thus, by Lemma 4.2.10,
Z Z
|y|2 M (dy).
ak ≡ EP Vk (t) = t |y| M (dy) and bk ≡ Var Vk (t) = t
Ak Ak
∞
"Z # Z
X
P P
E |y| j(t, dy) = E Vk (t) = |y| M (dy),
B(0,1) k=1 B(0,1)
which finishes the case when M ∈ M1 (RN ). When M ∈ M2 (RN ) \ M1 (RN ), set
V̄k (t) = Vk (t) − tak . Then, for each t > 0, {V̄k (t) : k ∈ Z+ } is a sequence of
mutually independent random variables with mean value 0. Furthermore,
∞
X ∞
X Z
|y|2 M (dy) < ∞.
Var V̄k (t) = t bk = t
k=1 k=1 B(0,1)
§ 4.2 Discontinuous Lévy Processes 171
P∞
Hence, by Theorem 1.4.2, k=1 V̄k (t) converges P-almost
P∞ surely. But, when
∞
/ M1 (RN ), k=1 ak = ∞, and so, for each t > 0, k=1 Vk (t) must diverge
P
M ∈
P-almost surely.
Before stating the main result of the subsection, I want to introduce the notion
of a generalized Poisson measure. Namely, if M ∈ M1 (RN ) \ M0 (RN ) and
πM is the element of I(RN ) whose Fourier transform is given by
Z √
−1(ξ,y)RN
exp e − 1 M (dy) ,
R
or, equivalently, π
d M is given by (4.2.1) with m = B(0,1) y M (dy), then I will
call πM the generalized Poisson measure for M . Similarly, if {Z(t) : t ≥ 0}
is a Lévy process for a generalized Poisson measure πM , I will say that it is a
generalized Poisson process associated with M .
Theorem 4.2.14. Suppose that M ∈ M1 (RN ) and that {j(t, · ) : t ≥ 0} is
a Poisson jump process associated with M . Set N = {ω : ∃t > 0 j(t, · , ω) ∈
/
N
M1 (R )}, and define (t, ω) ZM (t, ω) so that
R
y j(t, dy, ω) if ω ∈
/N
ZM (t, ω) =
0 if ω ∈ N .
Then P(N ) = 0 and {ZM (t) : t ≥ 0} is a (possibly generalized) Poisson process
associated with M . In particular, t ZM (t, ω) is absolutely pure jump for all
ω ∈ Ω, and {j(t, · , ZM ) : t ≥ 0} is a Poisson jump process associated with M .
Finally, if µ ∈ I(RN ) has Fourier transform given by (4.2.1), then
( Z ! )
t m− y M (dy) + ZM (t) : t ≥ 0
B(0,1)
then
!
N 2t
Z
sup Z(τ ) − τ m − Z(r) (τ ) ≥ |y|2 M (dy).
P ≤
τ ∈[0,t] 2 B(0,r)
and define
Z
(r)
Z (t, ω) = y j̄(t, dy, ω), (t, ω) ∈ [0, ∞) × Ω,
|y|>r
for r ∈ (0, 1]. By Theorem 4.2.14, we know that {Z(r) (t) : t ≥ 0} is a Lévy
process for µ(r) , where
!
h √ √
Z i
e −1(ξ,y)RN − 1 − −1 1[0,1] (y) ξ, y RN M (dy) .
µd(r) (ξ) = exp
|y|>r
Furthermore, by the second part of Lemma 4.2.10, we know that, for 0 < r <
r0 ≤ 1,
N 2t
Z
0
(*) P kZ(r ) − Z(r) k[0,t] ≥ ≤ 2 |y|2 M (dy).
r<|y|≤r 0
then
X
P sup kZ(rn ) − Z(rm ) k[0,t] ≥ 1
P kZ(rn+1 ) − Z(rn ) k[0,t] ≥ (m + 1)−2
m ≤
n>m
n≥m
∞
X
≤ N 2t (n + 1)4 2−n ,
n=m
§ 4.2 Discontinuous Lévy Processes 173
In particular, this means that {Z(r) (t)−Z∆ (t) : t ≥ 0} has independent, homoge-
neous increments and (cf. Theorem 4.1.8) is independent of {j t, · , Z∆ : t ≥ 0}.
Thus, since, as r & 0, Z(r) (t) −→ Z(t) − tm in probability, it follows that
{Z(t) − Z∆ (t) : t ≥ 0} is independent of {j(t, · , Z∆ ) : t ≥ 0}. In addition,
√ ∆ √ ∆ √ (r) ∆ ∆
e− −1t(ξ,m−m )RN EP e −1(ξ,Z(t)−Z (t))RN = lim EP e −1(ξ,Z (t)−Z (t)+tm )RN
r&0
h √ √
Z i
e −1(ξ,y)RN − 1 − −11[0,1] |y| ξ, y RN M (dy)
= lim exp
r&0
(∆∪B(0,r)){
!
√ √
Z h
−1(ξ,y)RN
i
= exp e −1− −11[0,1] |y| ξ, y RN M (dy) .
RN \∆
Hence, it follows that {Z(t) − Z∆ (t) : t ≥ 0} is a Lévy process for the specified
element of I(RN ).
Exercises for § 4.2
Exercise 4.2.18. Here is another proof that the process {N (t) : t ≥ 0} in
§ 4.2.1 has independent, homogeneous increments. Refer to the notation used
there.
(i) Given n ∈ Z+ and measurable functions f : [0, ∞)n+1 7−→ [0, ∞) and g :
[0, ∞)n −→ R, show that
EP f (τ1 , . . . , τn+1 ), τn+1 > g(τ1 , . . . , τn )
+
= EP e−g(τ1 ,... ,τn ) f τ1 , . . . , τn , τn+1 + g(τ1 , . . . , τn )+ .
(iii) Let n ∈ Z+ and t > 0 be given, and set h(ξ) = P(Tn−1 > ξ). Referring to
(ii) and again using (i), show that
P A ∩ {N (s + t) − N (s) < n} = EP h(t + s − TnK +1 ), B ∩ {τnK +1 > s − TnK }
= EP e−(s−TnK ) h(t − τnK +1 ), B = EP h(t − τnK +1 ) EP e−(s−TnK ) , B
= P N (t) < n P(A).
Exercises for § 4.2 175
to see that
N (t) N (btc)
lim − = 0 P-almost surely.
t→∞ t btc
Exercise 4.2.20. Assume that µ ∈ I(R) has its Fourier transform given by
(4.2.1), and let {Z(t) : t ≥ 0} be a Lévy process for µ. Using Exercise 3.2.25,
show that t R Z(t) is non-decreasing if and only if M ∈ M1 (R), M (−∞, 0) =
0, and m ≥ [−1,1] y M (dy).
Exercise 4.2.21. Let {j(t, · ) : t ≥ 0} be a Poisson jump process associated
with some M ∈ M∞ (RN ), and suppose that F : RN −→ R is a Borel measurable,
M -integrable function that vanishes at 0.
(i) Let N be the set of ω ∈ Ω for which there is a t > 0 such that F is not
j t, · , ω)-integrable, and show that P(N ) = 0.
(ii) Show that (cf. Lemma 4.2.6) M F ∈ M1 (R) and that, in fact,
Z Z
F
|y| M (dy) = |F (y)| M (dy) < ∞.
Next, define R
F F (y) j(t, dy, ω) if ω ∈
/N
Z (t, ω) =
0 if ω ∈ N ,
and show that {Z F (t) : t ≥ 0} is a (possibly generalized) Poisson process asso-
ciated with M F .
(iii) Show that
Z F (t)
Z
lim = F (y) M (dy) P-almost surely.
t→∞ t
Hint: Begin by using Lemma 4.2.10 to show that it suffices to handle F ’s that
vanish in a neighborhood of 0. When F vanishes in a neighborhood of 0, use
Lemma 4.2.12 to see that {Z F (t) : t ≥ 0} is a Poisson process associated with
M F . Finally, use the representation of a Poisson process in terms of a simple
Poisson process and independent random variables, and apply The Strong Law
of Large Numbers together with the result in Exercise 4.2.19.
176 4 Lévy Processes
Exercise 4.2.22. Let {Z(t) : t ≥ 0} be a Lévy process for the µ ∈ I(RN ) with
Z̄(t) = Z(t) − tm. Show that for all
Fourier transform given by (4.2.1), and set
R ∈ [1, ∞) and t ∈ (0, ∞), P kZ̄k[0,t] ≥ R is dominated by t times
4N
Z
2
Z √
|y|2 M (dy) +
√ |y| M (dy) + M B(0, R){ .
R2 B(0,1) R 1<|y|≤ R
Then,
R R
P kZk[0,t] ≥ R ≤ P kZ1 k[0,t] ≥ 2 + P kZ2 k[0,t] ≥ 2 + P kZ3 k[0,t] 6= 0 .
Apply the estimates in Lemma 4.2.10 to control the first two terms on the right,
and use
√ N
√
P j t, RN \ B(0, R), Z 6= 0 = 1 − e−tM (R \B(0, R))
Exercise 4.2.24. Let M ∈ M2 (RN ) be given, and assume that there exists a
decreasing sequence {rn : n ≥ 0} ⊆ (0, 1] with rn & 0 such that
Z
m = lim y M (dy)
n→∞ rn <|y|≤1
exists. Let µ ∈ I(RN ) have Fourier transform given by (4.2.1) with this m and
M . If {Z(t) : t ≥ 0} is a Lévy process for µ, set
Z
Zn (t, ω) = y j t, dy, Z( · , ω) ,
|y|>rn
and show that limn→∞ P kZ − Zn k[0,t] ≥ = 0 for all t ≥ 0 and > 0. Thus,
after passing to a subsequence {nm : m ≥ 0} if necessary, one sees that, P-almost
surely, Z
Z(t, ω) = lim y j t, dy, Z( · , ω) ,
m→∞ |y|>rnm
where the convergence is uniform on finite time intervals. In particular, one can
say that P-almost all the paths t Z(t, ω) are “conditionally pure jump.”
§ 4.3 Brownian Motion, the Gaussian Lévy Process
What remains of the program in this chapter is the construction of a Lévy
process for the standard, normal distribution γ0,I , the infinitely divisible law
|ξ|2
whose Fourier transform is e− 2 . Indeed, if {Zγ0,I (t) : t ≥ 0} is such a process
and {Zµ (t) : t ≥ 0} is a Lévy process for the µ ∈ I(RN ) whose Fourier transform
is given by (4.2.1), and if {Zγ0,I (t) : t ≥ 0} is independent of {Zµ (t) : t ≥ 0},
1
then it is an easy matter to check that C 2 Zγ0,I (t) + Zµ (t) will be a Lévy process
for γ0,C ? µ, whose Fourier transform is
√
−1 ξ, m RN − 12 ξ, Cξ RN
exp
√
Z h √
i
+ e −1(ξ,y)RN − 1 − −1 1[0,1] (|y|) ξ, y RN M (dy) .
RN
Because one of its earliest applications was as a mathematical model for the
motion of “Brownian particles,” 1 such a Lévy process for γ0,1 is called a Brow-
nian motion. In recognition of its provenance, I will adopt this terminology
and will use the notation {B(t) : t ≥ 0} instead of {Zγ0,I (t) : t ≥ 0}.
1 R. Brown, an eighteenth century English botanist, observed the motion of pollen particles
in a dilute gas. His observations were interpreted by A. Einstein as evidence for the kinetic
theory of gases. In his famous 1905 paper, Einstein took the first steps in a program, eventually
completed by N. Wiener in 1923, to give a mathematical model of what Brown had seen.
178 4 Lévy Processes
Before getting into the details, it may be helpful to think a little about what
sorts of properties we should expect the paths t B(t) will possess. For this
N
purpose, set Mn = n δ − 12 + δ − 12 , and recall that we have seen already
n −n
that πMn =⇒γ0,I . Since a Poisson process associated with Mn has nothing but
1
jumps of size n− 2 , if one believes that the Lévy process for γ0,I should be, in
some sense, the limit of such Poisson processes, then it is reasonable to guess
that its paths will have jumps of size 0. That is, they will be continuous.
Although the prediction that the paths of {B(t) : t ≥ 0} will be continuous
is correct, it turns out that, because it is based on the Central Limit Theorem,
the heuristic reasoning just given does not lead to the easiest construction. The
problem is that The Central Limit Theorem gives convergence of distributions,
not random variables, and therefore one should not expect the paths, as opposed
to their distributions, of the approximating Poisson processes to converge. For
this reason, it is easier to avoid The Central Limit Theorem and work with
Gaussian random variables from the start, and that is what I will do here. The
Central Limit approach is the content of § 9.3.
§ 4.3.1. Deconstructing Brownian Motion. My construction of Brownian
motion is based on an idea of Lévy’s; and in order to explain Lévy’s idea, I will
begin with the following line of reasoning.
Assume that {B(t) : t ≥ 0} is a Brownian motion in RN . That is, {B(t) : t ≥
0} starts at 0, has independent increments, any increment B(s + t) − B(s) has
distribution γ0,tI , and the paths t B(t) are continuous. Next, given n ∈ N, let
t Bn (t) be the polygonal path obtained from t B(t) by linear interpolation
during each time interval [m2−n , (m + 1)2−n ]. Thus,
Bn (t) = B(m2−n ) + 2n t − m2−n B (m + 1)2−n − B(m2−n )
n
Xm,n+1 ≡ 2 2 +1 Bn+1 (2m − 1)2−n−1 − Bn (2m − 1)2−n−1
!
B m2−n + B (m − 1)2−n
n
+1 −n−1
= 22 B (2m − 1)2 −
2
n
h
= 2 2 B (2m − 1)2−n−1 − B (m − 1)2−n
i
− B m2−n − B (2m − 1)2−n−1
,
§ 4.3 Brownian Motion, the Gaussian Lévy Process 179
0
for any choice of {ξm : 1 ≤ m ≤ n} ∪ {ξm : 1 ≤ m ≤ n} ⊆ R. But the
expectation value on the left is equal to
!2
n
1 X
0 0
exp − EP ξm Xm + ξm Xm
2 m=1
!2 !2
n n
1 X 1 X
0 0
= exp − EP ξm Xm − EP ξm Xm
2 m=1
2 m=1
" n # " n #
Y √ Y √ 0 0
= EP e −1 ξm Xm EP e −1 ξm Xm ,
m=1 m=1
180 4 Lévy Processes
0 0
since EP [Xm Xm 0 ] = 0 for all 1 ≤ m, m ≤ n.
Armed with Lemma 4.3.1, we can now check that {Xm,n : (m, n) ∈ Z+ ×N}
is independent. Indeed, since, for all (m, n) ∈ Z+ × N and ξ ∈ RN , ξ, Xm,n RN
a member of the Gaussian family G(B), all that we have to do is check that, for
each (m, n) ∈ Z+ × N, ` ∈ N, and (ξ, η) ∈ (RN )2 ,
EP ξ, Xm,n+1 RN η, B(`2−n ) RN = 0.
and therefore
n
2− 2 −1 EP ξ, Xm,n+1 RN η, B(`2−n ) RN
h i
= EP ξ, B (2m − 1)2−n−1 N η, B(`2−n ) N
R R
1 P h −n
i
+ B (m − 1)2−n N η, B(`2−n ) N
− E ξ, B m2
2 R R
m ∧ ` + (m − 1) ∧ `
= 2−n ξ, η RN m − 12 ∧ ` −
= 0.
2
§ 4.3.2. Lévy’s Construction of Brownian Motion. Lévy’s idea was to
invert the reasoning given in the preceding subsection. That is, start with a
family {Xm,n : (m, n) ∈ Z+ × N} of independent N (0, I)-random variables.
Next, define {Bn (t) : t ≥ 0} inductively
P so that t Bn (t) is linear on each
interval [(m − 1)2 , m2 ], B0 (m) = 1≤`≤m X`,0 , m ∈ N, Bn+1 (m2−n ) =
−n −n
independent N (0, 2−n I)-random variables. But, since this sequence is contained
in the Gaussian family spanned by {Xm,n : (m, n) ∈ Z+ × N}, Lemma 4.3.1 says
that we need only show that
h
EP ξ, Bn (m + 1)2−n − Bn m2−n N
R
i
0 0 −n
− Bn m0 2−n = 2−n ξ, ξ 0 RN δm,m0
× ξ , Bn (m + 1)2
RN
§ 4.3 Brownian Motion, the Gaussian Lévy Process 181
Bn (m2−n ) − Bn (m − 1)2−n
n
= − 2− 2 −1 Xm,n+1
2
and
Bn (m2−n ) − Bn (m − 1)2−n
n
= + 2− 2 −1 Xm,n+1 .
2
Using these expressions and the induction hypothesis, it is easy to check the
required equation.
Second, and more challenging, we must show that, P-almost surely, these
processes are converging uniformly on compact time intervals. For this purpose,
consider the difference t Bn+1 (t) − Bn (t). Since this path is linear on each
interval [m2−n−1 , (m + 1)2−n−1 ],
1
where CN ≡ EP |X1,0 |4 4 < ∞.
Starting from the preceding, it is an easy matter to show that there is a
measurable B : [0, ∞) × Ω −→ RN such that B(0) = 0, B( · , ω) ∈ C [0, ∞); RN )
for each ω ∈ Ω, and kBn − Bk[0,t] −→ 0 both P-almost surely and in L1 (P; R)
−n −n
for every t ∈ [0, ∞). Furthermore, since
B(m2 )−n= Bn (m2 −n ) P-almost surely
2
for all (m, n) ∈ N , it is clear that B (m + 1)2 − B(m2 ) : m ≥ 0 is a
sequence of independent N (0, 2−n I)-random variables for all n ∈ N. Hence, by
continuity, it follows that {B(t) : t ≥ 0} is a Brownian motion.
We have now completed the task described in the introduction to this section.
However, before moving on, it is only proper to recognize that, clever as his
method is, Lévy was not the first to construct a Brownian motion. Instead, it
182 4 Lévy Processes
was N. Wiener who was the first. In fact, his famous2 1923 article “Differential
Space” in J. Math. Phys. #2 contains three different approaches.
§ 4.3.3. Lévy’s Construction in Context. There are elements of Lévy’s
construction that admit interesting generalizations, perhaps the most important
of which is Kolmogorov’s Continuity Criterion.
Theorem 4.3.2. Suppose that {X(t) : t ∈ [0, T ]} is a family of random
variables taking values in a Banach space B, and assume that, for some p ∈
[1, ∞), C < ∞, and r ∈ (0, 1],
1 1
EP kX(t) − X(s)kpB p ≤ C|t − s| p +r for all s, t ∈ [0, T ].
Then, there exists a family {X̃(t) : t ∈ [0, T ]} of random variables such that
X(t) = X̃(t) P-almost surely for each t ∈ [0, T ] and t ∈ [0, T ] 7−→ X̃(t, ω) ∈ B is
continuous for all ω ∈ Ω. In fact, for each α ∈ (0, r),
" !p # p1 1
P kX̃(t) − X̃(s)kB 5CT p +r−α
E sup ≤ .
0≤s<t≤T (t − s)α (1 − 2−r )(1 − 2α−r )
Proof: First note that, by rescaling time, it suffices to treat the case when
T = 1.
Given n ≥ 0, set Mn = max1≤m≤2n
X(m2−n ) − X (m − 1)2−n
B , and
observe that
2n
! p1
1 X p
EP Mnp p ≤ EP
X(m2−n ) − X (m − 1)2−n
≤ C2−rn .
B
m=1
kX̃(t) − X̃(s)kB ≤ kX̃(t) − Xn (t)kB + kXn (t) − Xn (s)kB + kXn (s) − X̃(s)kB
≤ 2 sup kX̃(τ ) − Xn (τ )kB + 2n (t − s)Mn ,
τ ∈[0,1]
kX̃(t) − X̃(s)kB
≤ 22α(n+1) sup kX̃(τ ) − Xn (τ )kB + 2n 2(α−1)n Mn .
(t − s)α τ ∈[0,1]
" !p # p1 ∞
2α(n+1) 2−rn
P kX̃(t) − X̃(s)kB X
αn −rn
E sup ≤C 2 + 2 2
0≤s<t≤1 (t − s)α n=0
1 − 2−r
5C
≤ .
(1 − 2−r )(1 − 2α−r )
But
`+1 ` M
P B n −B n
≤
nα , 0≤`<L
Z !L
L −N
|y|2
− 2 1
= γ0, n1 I B 0, nα M
= (2π) 2
1
e dy ≤ Cn( 2 −α)N L .
B(0,M n 2 −α )
N
Proof: Let (e1 , . . . , eN ) be an orthonormal basis for R , and set Xi (k, n) =
ei , ∆k,n B RN . Then, what we have to show is that
m
X m
(*) lim sup Xi (k, n)Xj (k, n) − δi,j = 0 P-almost surely.
n→∞ 1≤m≤nT n
k=1
To this end, note that, for each n ∈ Z+ and 1 ≤ i ≤ N , {Xi (k, n) : k ≥ 1} are
mutually independent N (0, n−1 )-random variables. Hence, for each 1 ≤ i ≤ N ,
{Xi (k, n)2 − n−1 : k ≥ 1} are independent random variables with mean value
0 and variance 3n−2 , and therefore, by (1.4.22) and the second inequality in
(1.3.2),
Xm 4
E max Xi (k, n)2 − n1
1≤m≤nT
k=1
4
12M4 T 2
X 2
1
≤ 4E Xi (k, n) − n ≤ ,
1≤k≤nT n2
§ 4.3.5. General Lévy Processes. Our original reason for constructing Brow-
nian motion was to complete the program of constructing all the Lévy processes.
In this subsection, I will do that.
Throughout this subsection, µ ∈ I(RN ) has Fourier transform
√
−1 ξ, m RN − 12 ξ, Cξ RN
exp
(4.3.6)
√
Z h √
−1(ξ,y)RN
i
+ e − 1 − −11[0,1] (|y|) ξ, y RN M (dy) ,
Thus, µ = µ0 ? µ1 .
186 4 Lévy Processes
N 2t
Z
P kZ(r) − Z1 k[0,t] ≥ ≤ 2 |y|2 M (dy).
B(0,r)
also have locally bounded variation P-almost surely, and, since {Z0 (t) : t ≥ 0}
1
has the same distribution as {tm + C 2 B(t) : t ≥ 0}, Theorem 4.3.5 shows that
this is possible only if C = 0.
Remark 4.3.9. Recall the linear functional Aµ introduced in (3.2.10). As I
showed in Lemma 3.2.14, the action of Aµ on ϕ decomposes into a local part
and a non-local part, which, with 20-20 hindsight, we can write as, respectively,
m, ∇ϕ(0) RN + 12 Trace C∇2 ϕ(0)
Z h
i
and ϕ(y) − ϕ(0) − 1[0,1] (|y|) y, ∇ϕ(0) RN M (dy).
In terms of this decomposition, Corollary 4.3.8 is saying that the local part of
Aµ governs the continuous part of {Z(t) : t ≥ 0} and that the non-local part
governs the discontinuous part.
Exercises for § 4.3
Exercise 4.3.10. This exercise deals with a few elementary facts about Brow-
nian motion.
(i) Let {X(t) : t ≥ 0} be an RN -valued stochastic process satisfying X(0, ω) = 0
and X( · , ω) ∈ C(RN ) for all ω ∈ Ω, and showthat {X(t) N
: t ≥ 0} is an R -valued
Brownian motion if and only if the span of ξ, X(t) RN : t ≥ 0 & ξ ∈ RN } is a
Gaussian family with the property that, for all t, t0 ∈ [0, ∞) and ξ, ξ 0 ∈ RN ,
h i
EP ξ, X(t) RN ξ 0 , X(t0 ) RN = t ∧ t0 (ξ, ξ 0 )RN .
(ii) As a consequence of part (i), prove the Brownian Strong Law of Large Num-
bers: limt→∞ t−1 B(t) = 0.
R2
P kBk[0,T ] ≥ R ≤ 2N e− 2N T .
(4.3.13)
examining its proof, one sees that the inequality in Theorem 1.4.13 comes from
not knowing how far over a the partial sums jump when they first exceed level a.
Thus, because we are now dealing with “continuous partial sums,” one should
suspect that the inequality can be made an equality. To verify this suspicion, let
Γn () denote the set of ω such that |B(t, ω) − B(s, ω)| < for all 0 ≤ s < t ≤ 1
with t − s ≤ 2−n , and show that, for 0 < < a,
{B(1) ≥ a} ∩ Γn ()
2n
[ −1
−n −n −n
⊆ max B(`2 ) < a − ≤ B(m2 ) & B(1) − B(m2 ) > 0 ,
0≤`<m
m=1
∞
r Z
∗ 2 x2
e−
(4.3.14) P B (t) ≥ a = 2P B(t) ≥ a = 2 dx.
π 1
at− 2
This beautiful result, which is sometimes called the reflection principle for
Brownian motion, seems to have appeared first in L. Bachelier’s now famous
1900 thesis, where he used what is now called “Brownian motion” to model
price fluctuations on the Paris Bourse. More information about the reflection
principle can be found in § 8.6.3.
Exercises for § 4.3 189
B(t) B(t)
lim q = 1 = lim q P-almost surely.
t→∞ t&0
2t log(2) t 2t log(2) t−1
Begin by checking that the second equality follows from the first applied to the
time inverted process {B̃(t) : t ≥ 0} described in (i) of Exercise 4.3.11. Next,
observe that
B(n)
lim q = 1 P-almost surely
n→∞
2n log(2) n
is just the Law of the Iterated Logarithm for standard normal random variables.
Thus, all that remains is to show that
B(t) B(n)
lim sup q −q = 0 P-almost surely,
n→∞ t∈[n,n+1]
2t log(2) t 2n log(2) n
which can be checked by a combination of the Strong Law for Brownian motion,
the estimate in (4.3.13), and the easy half of the Borel–Cantelli Lemma.
Exercise 4.3.16. Given a stochastic process {X(t) : t ≥ 0}, the stochastic
process {X̃(t) : t ≥ 0} is said to be a modification of {X(t) : t ≥ 0} if, for
each t ∈ [0, ∞), X̃(t) = X(t) P-almost surely. Further, given a stochastic process
{X(t) : t ≥ 0} with values in a metric space (E, ρ), one says that {X(t) : t ≥ 0}
is stochastically continuous if, as t → s, X(t) −→ X(s) in probability for
each s ∈ [0, ∞).
(i) Show that the simple Poisson process {N (t) : t ≥ 0} is stochastically contin-
uous. Thus, stochastic continuity does not imply path continuity.
(ii) Let Q denote the set of rational real numbers. Show that an RN -valued,
stochastically continuous stochastic process {X(t) : t ≥ 0} admits a continuous
modification if and only if, for each T > 0, t ∈ [0, T ] ∩ Q 7−→ X(t) is uniformly
continuous. Conclude that a stochastically continuous process {X(t) : t ≥ 0}
admits a continuous modification if and only if there exists a µ ∈ M1 C(RN )
such that the distribution of {X(t) : t ≥ 0} under P is the same as the dis-
tribution of {ψ(t) : t ≥ 0} under µ. Equivalently, a stochastically continuous
process {X(t) : t ≥ 0} admits a continuous modification if and only if there
exists a continuous stochastic process {Y (t) : t ≥ 0}, not necessarily on the
same probability space, with the same distribution as {X(t) : t ≥ 0}.
190 4 Lévy Processes
for some p ∈ [1, ∞), r > 0, and C < ∞. Show that there exists a family
{X̃(x) : x ∈ [0, T ]ν } with the properties that x ∈ [0, T ]ν 7−→ X̃(x, ω) ∈ B
is continuous for all ω, and, for each x ∈ [0, T ]ν , X̃(x, ω) = X(x, ω) P-almost
surely. Further, show that, for each α ∈ (0, r), there is a universal K(ν, r, α) < ∞
such that
kX̃(y) − X̃(x)kB ν
+r−α
EP sup ≤ K(ν, r, α)CT p .
|y − x|α
x,y∈[0,T ]ν
y6=x
Hint: First rescale time to reduce to the case when T = 1. Now assume that 2
T = 1. Given n ∈ N, take Sn to be the set of pairs (m, m0 ) ∈ {0, . . . , 2n }N
ν
such that m0i ≥ mi for all 1 ≤ i ≤ ν and i=1 (m0i − mi ) = 1, note that Sn has
P
1
and show that EP [Mn ] ≤ C2ν ν p 2−rn . Next, let x Xn (x) denote the nth
dyadic multiliniarization of x X(x), the one that is multilinear on each dyadic
QN
cube i=1 [(mi − 1)2−n , mi 2−n ] for (m1 , . . . , mN ) ∈ {1, . . . , 2n }N . As in the
proof of Theorem 4.3.2, argue that kXn+1 − Xn ku,B ≤ Mn+1 , and conclude
that there exists an (x, ω) X̃(x, ω) that is continuous in x for each ω and
is P-almost surely equal to X(x, · ) for each x. Finally, to derive the Hölder
1
continuity estimate, observe that kXn (y) − Xn (x)kB ≤ 2n ν 2 |y − x|Mn , and
proceed as in the proof of the corresponding part of Theorem 4.3.2.
Exercise 4.3.19. In this exercise we will examine a couple of the implications
that Theorem 4.3.5 has about any Riemann–Stieltjes type integration theory
Exercises for § 4.3 191
involving Brownian paths. For simplicity, I will restrict my attention to the one-
dimensional case. Thus, let {B(t) : t ≥ 0} be an R-valued Brownian motion.
Because t B(t) is continuous, one knows that any function ψ : [0, 1] −→ R of
bounded variation is Riemann–Stieltjes integrable on [0, 1] with respect to B
[0, 1]. However, as the following shows, almost no Brownian path is Riemann–
Stieltjes with respect to itself. Namely, using Theorem 4.3.5, show that P-almost
surely,
n
X
m−1
m
m−1
B(1)2 − 1
lim B n B n −B n = ,
n→∞
m=1
2
n
X
m
m
m−1
B(1)2 + 1
lim B n B n −B n = ,
n→∞
m=1
2
whereas
n
X
2m−1 m m−1
= B(1)2 .
lim B 2n B n −B n
n→∞
m=1
In this exercise, weR will show that the same is true for any M ∈ M∞ (RN ). That
is, assuming that |y| j(t, dy) < ∞, t ≥ 0, with positive probability, it is to be
shown that M ∈ M1 (RN ). Here are some steps that you might want to follow.
R
(i) As an application of Kolmogorov’s 0–1 Law, show that |y| j(t, dy) < ∞
with positive probability implies it is finite with probability 1.
R
(ii) Let N be the set of ω ∈ Ω for which there is aRt > 0 such that |y| j(t, dy, ω)
= ∞. By (i), P(N ) = 0. Define Z(t, ω) = y j(t, dy, ω) for ω ∈ / N and
Z(t, ω) = 0 for ω ∈ N , and show that {Z(t) : t ≥ 0} is a Lévy process with
absolutely pure jump paths.
(iii) Applying Theorem 4.1.8, first show that {Z(t) : t ≥ 0} is a Lévy process
for a µ with Lévy measure M , and then apply Corollary 4.3.8 to conclude that
M ∈ M1 (RN ).
Exercise 4.3.22. Corollary 4.3.3 can be sharpened. In fact, Lévy showed that
if {B(t) : t ≥ 0} is an R-valued Brownian motion, then
|B(t) − B(s)| √
P lim sup = 2 = 1,
δ&0 0<t−s≤δ L(δ)
192 4 Lévy Processes
p
where L(δ) ≡ δ log δ −1 . Notice that, on the one hand, this result is in the direc-
tion that one should expect: we know (cf. Theorem 4.3.4) that Brownian paths
are almost never Hölder continuous of any order greater than 12 . On the other
hand, the Brownian Law of the Iterated Logarithm (cf. Exercise q 4.3.15) might
make one guess that their true modulus of continuity ought to be δ log(2) δ −1 ,
not L(δ). However, that guess is wrong because it fails to take into account the
difference between a question about what is true at a single time as opposed to
what is true simultaneously for all times. The purpose of this exercise is to show
how the considerations in § 4.3.3 can be used to get a statement that is related
to but far less refined than Lévy’s. The result to be proved here says only that
|B(t) − B(s)|
(4.3.23) P lim sup ≤K =1
δ&0 0<t−s≤δ L(δ)
and combine this with (ii) and (iii) to prove that (*) holds for some K < ∞.
Chapter 5
Conditioning and Martingales
Up to this point I have been dealing with random variables that are either
themselves mutually independent or are built out of other random variables
that are. For this reason, it has not been necessary for me to make explicit
use of the concept of conditioning, although, as we will see shortly, this concept
has been lurking silently in the background. In this chapter I will first give the
modern formulation of conditional expectations and then provide an example of
the way in which conditional expectations can be used.
Let (Ω, F, P) be a probability space, and suppose that A ∈ F is a set having
positive P-measure. For reasons that are most easily understood when Ω is finite
and P is uniform, the ratio
P(A ∩ B)
P(B|A) ≡ , B ∈ F,
P(A)
is called the conditional probability of B given A. As one learns in an
elementary course, the introduction of conditional probabilities makes many
calculations much simpler; in particular, conditional probabilities help to clarify
dependence relations between the events represented by A and B. For example,
B is independent of A precisely when P(B|A) = P(B) or, in words, when the
condition that A occurs does not change the probability that B occurs. Thus, it
is unfortunate that the naı̈ve definition of conditioning as described above does
not cover many important situations. For example, suppose that X and Y are
random variables and that one wants to talk about the conditional probability
that Y ≤ b given that X = a. Unless one is very lucky and P(X = a) > 0,
dividing by P(X = a) is not going to do the job. As this example illustrates,
it is of great importance to generalize the concept of conditional probability to
include situations when the event on which one is conditioning has P-measure 0,
and the next section is devoted to Kolmogorov’s elegant solution to the problem
of doing so.
§ 5.1 Conditioning
In order to appreciate the idea behind Kolmogorov’s solution, imagine someone
told you the conditional probability that the event B occurs given that the
event A occurs. Obviously, since you have no way of saying anything about the
193
194 5 Conditioning and Martingales
probability of B when A does not occur, she has provided you with incomplete
information about B. Thus, before you are satisfied, you should demand to
know also what is the conditional probability of B given that A does not occur.
Of course, this second piece of information is relevant only if A is not certain,
in which case P(A) < 1 and therefore P B A{ is well defined. More generally,
suppose that P = {A1 , . . . , AN } (N here may be either finite or countably
infinite) is a partition of Ω into elements of F having positive P-measure. Then,
of B ∈ F relative to
in order to have complete information about the probability
P, one has to know the entire list of the numbers P B An , 1 ≤ n ≤ N . Next,
suppose that one attempts to describe this list in a way that does not depend
explicitly on the positivity of the numbers P(An ). For this purpose, consider the
function
XN
ω ∈ Ω 7−→ f (ω) ≡ P B An 1An (ω).
n=1
Clearly, f is not only F-measurable, it is measurable with respect to the σ-
algebra σ(P) over Ω generated by P. In particular (because the only σ(P)-
measurable set of P-measure 0 is empty), f is uniquely determined by its P-
integrals EP [f, A] over sets A ∈ σ(P). Moreover, because, for each B ∈ σ(P)
and n, either An ⊆ B or B ∩ An = ∅, we have that
N
X X
EP f, A = P B ∩ An = P An ∩ B = P A ∩ B .
n=1 {n:An ⊆B}
and one would be foolish to take any other representative. More generally, I
will always take non-negative representatives of EP [X|Σ] when X itself is non-
negative and R-valued representatives when X is P-integrable. Finally, for histor-
ical reasons, it is usual to distinguish the case when X is the indicator function
1B of a set B ∈ F and to call EP [1B |Σ] the conditional probability of B
given Σ and to write P(B|Σ) instead of EP [1B |Σ]. Of course, representatives of
P(B|Σ) will always be assumed to take their values in [0, 1].
§ 5.1 Conditioning 197
Once one has established the existence and uniqueness of conditional expec-
tations, there is a long list of more or less obvious properties that one can easily
verify. The following theorem contains some of the more important items that
ought to appear on such a list.
Theorem 5.1.4. Let Σ be a sub-σ-algebra of F. If X is a P-integrable random
variable and C ⊆ Σ is a π-system (cf. Exercise 1.1.12) that generates Σ, then
Y = EP X Σ (a.s., P) ⇐⇒
Y ∈ L1 (Ω, Σ, P; R) and EP Y, A = EP X, A for A ∈ C ∪ {Ω}.
h i
(5.1.6) EP X T = EP EP X Σ T
and
(5.1.7) EP Y X Σ = Y EP X Σ
Proof: To prove the first assertion, note that the set of A ∈ Σ for which
EP [X, A] = EP [Y, A] is (cf. Exercise 1.1.12) a λ-system that contains C and
therefore Σ. Next, clearly (5.1.5) is just an application of Lemma 5.1.2, while
(5.1.6) and the two equations that follow it are all expressions of uniqueness. As
for the next equation, one can first reduce to the case when X and Y are both
non-negative. Then one can use uniqueness to check it when Y is the indicator
function of an element of Σ, use linearity to extend it to simple Σ-measurable
functions, and complete the job by taking monotone limits. Finally, (5.1.8) is an
immediate application of the Monotone Convergence Theorem, whereas (5.1.9)
comes from the conjunction of
m ∈ Z+ ,
P
E inf Xn Σ ≤ inf EP Xn Σ (a.s., P),
n≥m n≥m
with (5.1.8).
It probably will have occurred to most readers that the properties discussed
in Theorem 5.1.4 give strong evidence that, for fixed ω ∈ Ω, X 7−→ EP [X|Σ](ω)
behaves like an integral (in the sense of Daniell) and therefore ought to be
expressible in terms of integration with respect to a probability measure Pω .
Indeed, if one could actually talk about X 7−→ EP [X|Σ](ω) for a fixed (as opposed
to P-almost every) ω ∈ Ω, then there is no doubt that such a Pω would have to
exist. Thus, it is reasonable to ask whether there are circumstances in which one
can gain sufficient control over all the P-null sets involved to really make sense
out of X 7−→ EP [X|Σ](ω) for fixed ω ∈ Ω. Of course, when Σ is generated by a
countable partition P, we already know what to do. Namely, when ω ∈ A ∈ P,
we can take (
0 if P(A) = 0
P
E [X|Σ](ω) = EP [X, A]
P(A) if P(A) > 0.
Even when Σ does not arise in this way, one can often find a satisfactory repre-
sentation of conditional expectations as expectations. A quite general statement
of this sort is the content of Theorem 9.2.1 in Chapter 9.
§ 5.1.2. Some Extensions. For various applications it is convenient to have
two extensions of the basic theory developed in § 5.1.1. Specifically, as I will now
show, the theory is not restricted to probability (or even finite) measures and
can be applied to random variables that take their values in a separable Banach
space. Thus, from now
on, µ will be an arbitrary (non-negative) measure on
(Ω, F) and E, k·kE will be a separable Banach space; and I begin by reviewing
a few elementary facts about µ-integration for E-valued random variables.2
2 The integration that I outline below is what functional analysts call the Bochner integral for
Banach space–valued functions. There is a more subtle and intricate theory due to Pettis, but
Bochner’s theory seems adequate for most probabilistic considerations.
§ 5.1 Conditioning 199
Notice that another description of Eµ [X] is as the unique element of E with the
property that
and will write X ∈ Lp (µ; E) when kXkLp (µ;E) < ∞. Also, I will say the X :
Ω −→ E is µ-integrable if X ∈ L1 (µ; E); and I will say that X is locally
µ-integrable if 1A X is µ-integrable for every A ∈ F with µ(A) < ∞.
The definition of µ-integration for an E-valued X is completed in the following
lemma.
Lemma 5.1.10. For
each µ-integrable X : Ω −→ E there is a unique element
Eµ [X] ∈ E satisfying EP [X], x∗ = EP [hX, x∗ ] for all x∗ ∈ E ∗ . In particular,
Finally, if X ∈ Lp (µ; E), where p ∈ [1, ∞), then there is a sequence {Xn : n ≥ 1}
of E-valued, µ-simple functions with the property that kXn − XkLp (µ;E) −→ 0.
Proof: Clearly uniqueness, linearity, and (5.1.11) all follow immediately from
the given characterization of Eµ [X]. Thus, all that remains is to prove existence
and the final approximation assertion. In fact, once the approximation assertion
is proved, then existence will follow immediately from the observation that, by
(5.1.11), Eµ [X] can be taken equal to limn→∞ Eµ [Xn ] if kX − Xn kL1 (µ;E) −→ 0.
To prove the approximation assertion, I begin with the case when µ is finite
and M = supω∈Ω kX(ω)kE < ∞. Next, choose a dense sequence {x` : ` ≥ 1} in
E, set A0,n = ∅, and let
n o
A`,n = ω : kX(ω) − x` kE < n1 for (`, n) ∈ Z+ × Z+ .
200 5 Conditioning and Martingales
where n o
Ω(r) ≡ ω : r ≤ kX(ω)kE ≤ 1r for r ∈ (0, 1].
Since, for any r ∈ (0, 1], rp µ Ω(r) ≤ kXkpLp (µ;E) , we can apply the preceding to
Eµ XΣ , A = Eµ X, A
(5.1.13) for every A ∈ Σ with µ(A) < ∞.
Hence, not only does (5.1.13) continue to hold for any A ∈ Σ with 1A X ∈
L1 (µ; E), but also, for each p ∈ [1, ∞], the mapping X ∈ Lp (µ; E) 7−→ XΣ ∈
Lp (µ; E) is a linear contraction.
Proof: Clearly, it is only necessary to prove the “⇐=” part of the first assertion.
Thus, suppose that µ(X 6= 0) > 0. Then, because E is separable and therefore
(cf. Exercise 5.1.19) E ∗ with the weak* topology
is also separable, there exists
an > 0 and a x∗ ∈ E ∗ with the property that µ X, x∗ ≥ > 0, from which
it follows (by σ-finiteness) that there is an A ∈ F for which µ(A) < ∞ and
D E h
i
Eµ X, A , x∗ = Eµ X, x∗ , A 6= 0.
I turn next to the uniqueness and other properties of XΣ . But it is obvious that
uniqueness is an immediate consequence of the first assertion and that linearity
follows from uniqueness. As for (5.1.14), notice that if x∗ ∈ E ∗ and kx∗ kE ∗ ≤ 1,
then
Eµ XΣ , x∗ , A = Eµ X, x∗ , A ≤ Eµ kXkE , A = Eµ kXkE Σ , A
element x∗ from the unit ball in E ∗ ; and so, because E ∗ with the weak* topology
is separable, (5.1.14) follows in this case. To handle µ’s that are not probability
measures, note that either µ(Ω) = 0, in which case everything is trivial, or
µ(Ω) ∈ (0, ∞), in which case we can renormalize µ to make it a probability
202 5 Conditioning and Martingales
has the required properties. In order to handle general X ∈ L1 (P; E), I use the
approximation result in Lemma 5.1.10 to find a sequence {Xn : n ≥ 1} of simple
functions that tend to X in L1 (P; E). Then, since
(Xn )Σ − (Xm )Σ = Xn − Xm Σ (a.s., P)
1
we
know that there exists a Σ-measurable XΣ ∈ L (P; E) to which the sequence
(Xn )Σ : n ≥ 1 converges; and clearly XΣ has the required properties.
Referring to the setting in the second part of Theorem 5.1.12, I will extend
the convention introduced following Theorem 5.1.3 and call the µ-equivalence
class of XΣ ’s satisfying (5.1.13) the µ-conditional expectation of X given
Σ, will use Eµ [X|Σ] to denote this µ-equivalence class, and will, in general,
ignore the distinction between the equivalence class and a generic representative
of that class. In addition, if X : Ω −→ E is locally µ-integrable, then, just
as in Theorem 5.1.4, the following are essentially immediate consequences of
uniqueness:
Eµ Y X Σ = Y Eµ X Σ (a.e., µ) for Y ∈ L∞ (Ω, Σ, µ; R),
and h i
Eµ X T = Eµ Eµ X Σ T
(a.e., µ)
X, Y ∈ L∞ (P; R).
(*) Π X ΠY = (ΠX)(ΠY ) for all
Hint: Assume that Π1 = 1 and that (*) holds. Given X ∈ L∞ (P; R), use
induction to show that
n
kΠXknL2n (P) ≤ kXkn−1 = Π X(ΠX)n−1
L∞ (P) kXkL (P) and ΠX
2
n
for all n ∈ Z+ . Conclude that kΠXkL∞ (P) ≤ kXkL∞ (P) and that ΠX ∈
L, n ∈ Z+ , for every X ∈ L∞ (P; R). Next, using the preceding together with
Weierstrass’s Approximation Theorem, show that (ΠX)+ ∈ L, first for X ∈
L∞ (P; R) and then for all X ∈ L2 (P; R). Finally, apply (i) to arrive at L =
L2 Ω, Σ, P; R .
(iii) To emphasize the point being made here, consider once again a closed
linear subspace L of L2 (P; R), and let ΠL be orthogonal projection onto L.
Given X ∈ L2 (P; R), recall that ΠL X is characterized as the unique element of
L for which X − ΠL X ⊥ L, and show that EP [X|ΣL ] is the unique element of
L2 (Ω, ΣL , P; R) with the property that
X − EP X ΣL ⊥ f Y1 , . . . , Yn
for all n ∈ Z+ , f ∈ Cb Rn ; R , and Y1 , . . . , Yn ∈ L. In particular, ΠL X =
EP [X|ΣL ] if and only if X −ΠL X is perpendicular not only to all linear functions
of the Y ’s in L but even to all nonlinear ones.
Exercise 5.1.16. In spite of the preceding, there is a situation in which or-
thogonal projection coincides with conditioning. Namely, suppose that G is a
closed Gaussian family in L2 (P; R), and let L be a closed, linear subspace of G.
As an application of Lemma 4.3.1, show that, for any X ∈ G, the orthogonal
projection ΠL X of X onto L is a conditional expectation value of X given the
σ-algebra ΣL generated by the elements of L.
204 5 Conditioning and Martingales
where the convergence is in L2 ([0, 1]; C). (Also see Exercise 5.2.45.)
Exercise 5.1.18. Let (Ω, F, µ) be a measure space and Σ a sub-σ-algebra of
F with the property that µ Σ is σ-finite. Next, let E be a separable Hilbert
0
space, p ∈ [1, ∞], X ∈ Lp (µ; E), and Y a Σ-measurable element of Lp (µ; E) (p0
is the Hölder conjugate of p). Show that
h i
Eµ Y, X E Σ = Y, Eµ X Σ µ-almost surely.
E
Next, choose an orthonormal basis {en : n ≥ 0} for E, and justify the steps in
∞
X
Eµ Y, X E = Eµ Y, en E en , X E
1
∞ h
X i
Eµ Y, en E Eµ en , X E Σ = Eµ Y, Eµ [X|Σ] E .
=
1
Exercise 5.1.19. Let E be a separable Banach space, and show that, for each
R > 0, the closed ball BE ∗ (0, R) with the weak* topology is a compact metric
space. Conclude from this that the weak* topology on E ∗ is second countable
and therefore separable.
Hint: Choose a countable, dense subset {xn : n ≥ 1} in the unit ball BE (0, 1),
and define
∞
X
ρ(x∗ , y ∗ ) = 2−n hxn , x∗ − y ∗ i for x∗ , y ∗ ∈ BE ∗ (0, R).
n=1
§ 5.2 Discrete Parameter Martingales 205
Show that ρ is a metric for the weak* topology on BE ∗ (0, R). Next, choose
{xnm : m ≥ 1} so that xn1 = x1 and xnm+1 = xn if n is the first n > nm such
that xn is linearly independent of {x1 , . . . , xn−1 }. Given a sequence {x∗` : ` ≥ 1}
in BE ∗ (0, R), use a diagonalization argument to find a subsequence {x∗`k : k ≥ 1}
such that am = limk→∞ hxnm , x∗`k i exists for each m ≥ 1. Now define f on the
PM PM
span S of {xnm : m ≥ 1} so that f (x) = m=1 αm am if x = m=1 αm xnm ,
note that f (x) = limk→∞ hx, x∗`k i for x ∈ S, and conclude that f is linear on
S and satisfies the estimate |f (x)| ≤ RkxkE there. Since S is dense in E,
there is a unique extension of f as a bounded linear functional on E satisfying
the same estimate, and so there exists an x∗ ∈ BE ∗ (0, R) such that hx, x∗ i =
limk→∞ hx, x∗`k i for all x ∈ S. Finally, check that this convergence continues to
hold for all x ∈ E, and conclude that x∗`k −→ x∗ in the weak* topology.
Exercise 5.1.20. The purpose of this exercise is to show that Bochner’s theory
of integration for Banach space functions relies heavily on the assumption that
the Banach space be separable. In particular, the approximation procedure on
which the proof of Lemma 5.1.10 fails in the absence of separability. To see
this, consider the Banach space `∞ (µ; R) of uniformly bounded sequences x =
(x0 , . . . , xn , . . . ) ∈ RN with kxk`∞ (N;R) = supn≥0 |xn |. Next, let {Xn : n ≥ 0}
be a sequence of mutually independent, {−1, 1}-valued, Bernoulli random with
∞
mean value 0 on some probability space (Ω, F, P), and define X : Ω −→ ` (N; R)
by X(ω) = X0 (ω), . . . , Xn (ω), . . . . Show that, for any simple function Y :
Ω −→ `∞ (N; R),
P kX − Yk`∞ (N;R) < 14 = 0.
Hint: For any α ∈ R, show that P |Xn − α| < 14 ≤ 12 and therefore that
Now assume that the Xn ’s are non-negative. Given (5.2.2), (5.2.3) becomes
an easy application of Exercise 1.4.18.
Doob’s inequality is an example of what analysts call a weak-type inequal-
ity. To be more precise, it is a weak-type 1–1 inequality. The terminology derives
from the fact that such an inequality follows immediately from an L1 -norm, or
strong-type 1–1, inequality between the objects under consideration; but, in gen-
eral, it is strictly weaker. In order to demonstrate how powerful such a result
can be, I will now apply Doob’s Inequality to prove a theorem of Marcinkewitz.
Because it is an argument to which we will return again, the reader would do
well to become comfortable with the line of reasoning that allows one to pass
from a weak-type inequality, like Doob’s, to almost sure convergence results.
Corollary 5.2.4. Let X be an R-valued random variable and p ∈ [1, ∞). If
X ∈ Lp (P; R), then, for any non-decreasing sequence Fn : n ∈ N of sub-σ-
algebras of F,
" ∞ #
_
(a.s., P) and in Lp (P; R) as n → ∞.
P
P
E X Fn −→ E X Fn
0
W∞
In particular, if X is 0 Fn -measurable, then EP [X|Fn ] −→ X (a.s., P) and in
Lp (P; R).
208 5 Conditioning and Martingales
W∞
Proof: Without loss in generality, assume that F = 0 Fn .
Given X ∈ L1 (P; R), set Xn = EP [X|Fn ] for n ∈ N. The key to my proof will
be the inequality
1
(5.2.5) P sup |Xn | ≥ α ≤ EP |X|, sup |Xn | ≥ α , α ∈ (0, ∞);
n∈N α n∈N
and, since, by (5.1.5), |Xn | ≤ EP [|X| |Fn ] (a.s., P), while proving (5.2.5) I may
and will assume that X and all the Xn ’s are non-negative. But then, by (5.2.2),
1 P
P sup Xn > α ≤ E XN , sup Xn > α
0≤n≤N α 0≤n≤N
1
= EP X, sup Xn > α
α 0≤n≤N
for all N ∈ Z+ , and therefore (5.2.5) follows when N → ∞ and one takes right
limits in α.
As my first application of (5.2.5), note that {Xn : n ≥ 0} is uniformly P-
integrable. Indeed, because |Xn | ≤ EP [|X| |Fn ], we have from (5.2.5) that
h i h i
sup EP |Xn |, |Xn | ≥ α ≤ sup EP |X|, |Xn | ≥ α
n∈N n∈N
P
≤ E |X|, sup |Xn | ≥ α −→ 0
n∈N
for every X ∈ L.
210 5 Conditioning and Martingales
Proof: To prove the first part, simply set Fn = σ Pn , identify the Xn in
(5.2.6) as EP [X|Fn ], and finally apply Corollary
5.2.4. As for the second part,
let Σ(L) be the σ-algebra generated by EP [X|Σ] : X ∈ L , note that Σ(L) is
countably generated and that
EP X Σ = EP X Σ(L) (a.s., P) for each X ∈ L,
and
X EP g(X), A
Yn ≡ 1A −→ EP g(X)Σ
P(A)
A∈Pn
for all A ∈ F with P(A) > 0. Hence, if Λ ∈ Σ denotes the set of ω for which
Xn (ω)
lim ∈ RN +1
n→∞ Yn (ω)
limn→∞ Xn (ω) if ω ∈ Λ
XΣ (ω) ≡
v if ω ∈
/ Λ,
and
limn→∞ Yn (ω) if ω ∈ Λ
Y (ω) ≡
v if ω ∈
/ Λ,
then XΣ is a C-valued representative of EP [X|Σ], Y is a representative of
E [g(X)|Σ], and Y (ω) ≤ g XΣ (ω) for every ω ∈ Ω.
P
Turning to the final assertion, begin by observing that once one knows that
f (X) ∈ L1 (P; R), the concluding inequality follows immediately by applying the
first part to the non-negative, concave function M − f , where M ∈ R is an upper
bound of f . Thus, what remains to be shown is that f − (X) ∈ L1 (P; R). To
this end, set fn = (−n) ∨ f for n ≥ 1. Then fn is bounded and convex,
and
P
so, by the preceding with Σ = {∅, Ω}, we know that fn E [X] ≤ E fn (X) .
P
Writing fn = f+ − fn− , this shows that EP fn− (X) ≤ M + − f E P [X] when
and, in the case of martingales, the inequality in the preceding can be replaced
by an equality.
Closely related to Doob’s Stopping Time Theorem is an important variant
due to G. Hunt. In order to facilitate the proof of Hunt’s result, I begin with an
easy but seminal observation of Doob’s.
Lemma 5.2.12 (Doob’s Decomposition). For each n ∈ N let Xn be an
Fn -measurable, P-integrable random variable. Then, up to a P-null set, there is
at most one sequence {An : n ≥ 0} ⊆ L1 (P; R) such that A0 = 0, An is Fn−1 -
+
measurable for each n ∈ Z , and Xn − An , Fn , P is a martingale. Moreover, if
(Xn , Fn , P) is an integrable submartingale, then such a sequence {An : n ≥ 0}
exists, and An−1 ≤ An P-almost surely for all n ∈ Z+ .
Proof: To prove the uniqueness assertion, suppose that {An : n ≥ 0} and
{Bn : n ≥ 0} are two such sequences, and set ∆n = Bn − An . Then ∆0 = 0,
∆n is Fn−1 -measurable for each n ∈ Z+ , and (∆n , Fn , P) is a martingale. But
this means that ∆n = EP [∆n | Fn−1 ] = ∆n−1 for all n ∈ Z+ , and so ∆n = 0 for
all n ∈ N.
Now suppose that (Xn , Fn , P) is an integrable submartingale. To prove the
asserted existence result, set A0 ≡ 0 and
for n ∈ Z+ .
An = An−1 + EP Xn − Xn−1 Fn−1 ∨ 0
Theorem 5.2.13 (Hunt). Let Xn , Fn , P be a P-integrable submartingale.
Given bounded stopping times ζ and ζ 0 satisfying ζ ≤ ζ 0 ,
(5.2.14) Xζ ≤ EP Xζ 0 Fζ (a.s., P),
and the inequality can be replaced by equality when Xn , Fn , P is a martingale.
(Cf. Exercise 5.2.39 for unbounded stopping times.)
Proof: Choose {An : n ∈ N} for (Xn , Fn , P) as in Lemma 5.2.12, and set
Yn = Xn − An for n ∈ N. Then, because Aζ ≤ Aζ 0 and Aζ is Fζ -measurable,
EP Xζ 0 Fζ ≥ EP Yζ 0 + Aζ Fζ = EP Yζ 0 Fζ + Aζ .
214 5 Conditioning and Martingales
Hence, it suffices to prove that equality holds in (5.2.14) when Xn , Fn , P is a
martingale. To this end, choose N ∈ Z+ to be an upper bound for ζ 0 , let Γ ∈ Fζ
be given, and note that
N
X
EP XN , Γ = EP XN , Γ ∩ {ζ = n}
n=0
N
X
= EP Xn , Γ ∩ {ζ = n} = EP Xζ , Γ .
n=0
2 In the notes to Chapter VII of his Stochastic Processes, Wiley (1953), Doob gives a thorough
account of the relationship between his convergence result and earlier attempts in the same
direction. In particular, he points out that, in 1946, S. Anderson and B. Jessen formulated
and proved a closely related convergence theorem.
§ 5.2 Discrete Parameter Martingales 215
0
ζk = inf{n ≥ ζk−1 : Xn ≤ a} ∧ N and ζk0 = inf{n ≥ ζk : Xn ≥ b} ∧ N.
N N
(N )
X X
U[a,b] ≤ Yζk0 − Yζk = YN − Y0 − Yζk − Yζk−1
0
k=1 k=1
N
X
≤ YN − Yζk − Yζk−1
0 .
k=1
0
Hence, since ζk−1 ≤ ζk and therefore, by (5.2.14), EP Yζk − Yζk−1
0 ≥ 0 for all
(N )
k ∈ Z+ , we see that EP [U[a,b] ] ≤ EP [YN ], and clearly (5.2.16) follows from this
after one lets N → ∞.
Given (5.2.16), the convergence result is easy. Namely, if (5.2.17) is satisfied,
then (5.2.16) implies that there is a set Λ of full P-measure such that U[a,b] (ω) <
∞ for all rational a < b and ω ∈ Λ; and so, by the remark preceding the
statement of this theorem, for each ω ∈ Λ, {Xn (ω) : n ≥ 0} converges to some
X(ω) ∈ [−∞, ∞]. Hence, we will be done as soon as we know that EP [|X|, Λ] <
∞. But
Q(A ∩ G = EQ XY, A ∩ G = EPa X, A ∩ G
= EP X, A ∩ G ∩ B = EP X, A ∩ B = Qa (A ∩ B) = Qa (A)
for all A ∈ F.
§ 5.2.4. Reversed Martingales and De Finetti’s Theory. For some appli-
cations it is important to know what happens if one runs a submartingale or
martingale backwards. Thus,
again let (Ω, F, P) be a probability space, only
this time suppose that Fn : n ∈ N is a sequence of sub-σ-algebras that
is non-increasing. Given a sequence {Xn : n ≥ 0} of (−∞, ∞]-valued ran-
dom variables, I will say that the triple Xn , Fn , P is either a reversed sub-
martingale or a reversed martingale if, for each n ∈ N, Xn is Fn -measurable
and either Xn− ∈ L1 (P; R) and Xn+1 ≤ EP [Xn | Fn+1 ] or Xn ∈ L1 (P; R) and
Xn+1 = EP [Xn | Fn+1 ].
218 5 Conditioning and Martingales
Moreover, if (Xn , Fn , P) is a reversed martingale, then (|Xn |, Fn , P is a re-
versed submartingale. Finally, if (XnT , Fn , P) is a reversed submartingale and
∞
X0 ∈ L1 (P; R), then there is a F∞ ≡ n=0 Fn -measurable X : Ω −→ [−∞, ∞]
to which Xn converges P-almost surely. In fact, X will be P-integrable if
supn≥0 EP [|Xn |] < ∞; and if (Xn , Fn , P) is either a non-negative reversed sub-
martingale or a reversed martingale with X0 ∈ Lp (P; R) for some p ∈ [1, ∞),
then Xn −→ X in Lp (P; R).
Proof: More or less everything here follows immediately from the observation
that (Xn , Fn , P) is a reversed submartingale or a reversed martingale if and only
if, for each N ∈ Z+ , (XN −n∧N , FN −n∧N , P) is a submartingale or a martingale.
Indeed, by this observation and (5.2.2) applied to (XN −n∧N , FN −n∧N , P),
1 P
P max Xn > R ≤ E X0 , max Xn > R
0≤n≤N R 0≤n≤N
h i
EP F X−1 (A∞ ) = EP EP F σ {Xn : n ≥ 1} X−1 (A∞ )
h i
= lim EP EP F σ {Xm : 1 ≤ m ≤ N } X−1 (A∞ ) .
N →∞
§ 5.2 Discrete Parameter Martingales 221
Now suppose that F is σ {Xm : 1 ≤ m ≤ N } -measurable. Then there
exists a g : E N −→ R such that F = g X1 , . . . , XN ). If N = 1, then, because
Pn
limn→∞ n1 m=1 g ◦ Xm is T -measurable, (5.2.24) says that E P [F | X−1 (A∞ )]
is T -measurable. To get the same conclusion when N ≥ 2, I want to apply the
same reasoning, only now with E replaced by E N . To be precise, define
Z+
A(N )
: B = Sσ B for all π ∈ Σ(N ) , where
∞ = B ∈B
Clearly,
x ∈ Q =⇒ M f (Q) ≤ f ∗ (x) ≡ sup A|f | Qk (x) ,
k≥0
and, because Af Qk (x) = Eµ f σ(Pk ) (x), Doob’s Inequality (5.2.3) implies
p
that kf ∗ kLp (µ;R) ≤ p−1 kf kLp (µ;R) for all p ∈ (1, ∞].
Lemma 5.2.31. For any f ∈ L1 (µ; R),
Z Z
−1
(5.2.32) |f | dµn ≤ θ f ∗ dνn .
0
In particular, if q ∈ [1, ∞) and f ∈ Lqr (µ; R), then
q‘
rKr
(5.2.33) kf kLq (µn ;R) ≤ kf kLqr0 (µ;R) .
θ
Proof: Without loss in generality, I will assume throughout that f ≥ 0.
To prove (5.2.32), first note that
Z ∞ Z
X X
n 1
f dµn = ∆ (Q)νn (Q) f dµ
k=0 Q∈Pk+1
µ(Q̆ \ Q) Q̆\Q
X X
≤ θ−1 ∆n (Q)νn (Q)M f (Q̆),
k=0 Q∈Pk+1
224 5 Conditioning and Martingales
since Z
1
f dµ ≤ θ−1 Af (Q̆) ≤ θ−1 M f (Q̆).
µ(Q̆ \ Q) Q̆\Q
and therefore
Z K X
X n
θ f dµn ≤ lim 1 − µ(Q) νn (Q)M f (Q)
K→∞
k=0 Q∈Pk+1
X n
− 1 − µ(Q) νn (Q)M f (Q)
Q∈Pk
Z
X n
= lim 1 − µ(Q) νn (Q)M f (Q) ≤ f ∗ dνn .
K→∞
Q∈PK+1
r0 rKr
≤ θ−1 Kr 0
kf q kLr0 (µ;R) = kf kqLqr0 (µ;R) .
r −1 θ
Proof: It is easy to prove (5.2.28) from (5.2.35). Indeed, given δ > 0, choose
R
R > 0 so that µ |f | ≥ R < δ, and set f = f 1[−R,R] (f ). Then, by (5.2.35),
limn→∞ P |f R (Yn ) − f R (Zn )| ≥ = 0 for all > 0. Hence,
lim P |f (Yn ) − f (Zn )| ≥ 3
n→∞
≤ lim µn |f − f R | ≥ + lim νn |f − f R | ≥ .
n→∞ n→∞
By Hölder’s Inequality,
1 1
νn |f − f R | ≥ ≤ Kr µ |f − f R | ≥ r0 < Kr δ r0 ,
By (5.2.33),
p1
rKr
kf − fk kLp (µn ;R) ≤ kf − fk kLpr0 (µ;R) ,
θ
and, by Hölder’s Inequality,
1
kf − fk kLp (νn ;R) ≤ Krp kf − fk kLpr0 (µ;R) .
EP Mn2 − Mn−1
2
= EP Mn − Mn−1 Xn + Xn−1
= EP Xn2 − Xn−1
2
− EP An − An−1 Xn + Xn−1 ≤ EP Xn2 − Xn−1
2
,
±
±
For each m ∈ N, define Yn,m = EP Xn∨m Fm ∨0 for n ∈ N. Show that Y ± ≥
n+1,m
± ± ± +
Yn,m (a.s., P), define Ym = limn→∞ Yn,m , check that both Ym , Fm , P and
Ym− , Fm , P are non-negative martingales with EP Y0+ +Y0− ≤ supn∈N EP |Xn | ,
and note that Xm = Y m+ − Ym− (a.s., P) for each m ∈ N. In other words, every
martingale Xn , Fn , P satisfying (5.2.37 ) admits a Hahn decomposition3 as
the difference of two non-negative martingales whose sum has expectation value
dominated by the left-hand side of (5.2.37). Finally, use this observation together
with (iii) to see that every such martingale converges P-almost surely to some
X ∈ L1 (P; R).
(v) By combining the final assertion in (iv) together with Doob’s Decomposition
in Lemma 5.2.12, give another proof of the convergence assertion in Theorem
5.2.15.
Exercise 5.2.38. In this exercise we will develop another way to reduce Doob’s
Martingale Convergence Theorem to the case of L2 -bounded martingales. The
technique here is due to R. Gundy and derives from the ideas introduced by
Calderón and Zygmund in connection with their famous work on weak-type 1–1
estimates for singular integrals.
(i) Let {Zn : n ∈ N} be a Fn : n ∈ N -progressively
measurable, [0, R]-valued
sequence with the property that
−Z n , F n , P is a submartingale. Next, choose
{An : n ∈ N} for −Zn , Fn , P as in Lemma 5.2.12, note that An ’s can be chosen
so that 0 ≤ An − An−1 ≤ R for all n ∈ Z+ , and set Mn = Zn + An , n ∈ N.
Check that Mn , Fn , P is a non-negative martingale with Mn ≤ (n + 1)R for
each n ∈ N. Next, show that
EP Mn2 − Mn−1
2
= EP Mn − Mn−1 Zn + Zn−1
= EP Zn2 − Zn−1
2
+ EP An − An−1 Zn + Zn−1
≤ EP Zn2 − Zn−1
2
+ 2R EP An − An−1 ,
for n ∈ Z+ ; and {∆n : ∈ N} is an Fn : n ∈ N -progressively measurable
sequence satisfying
2
P ∃ 0 ≤ m ≤ n ∆(R)
m 6
= 0 ≤ EP |Xn | .
R
The preceding representation is called the Calderón–Zygmund decomposi-
tion of the martingale Xn , Fn , P .
(iv) Let Xn , Fn , P be a martingale that satisfies (5.2.37), and use part (iii)
above together with part (i) of Exercise 5.2.36 to show that, for each R ∈ (0, ∞),
2
{Xn : n ≥ 0} converges off of a set whose P-measure is no more than R times the
supremum over n ∈ N of E [|Xn |]. In particular, when combined with Lemma
P
5.2.12, the preceding line of reasoning leads to the advertised alternate proof of
the convergence result in Theorem 5.2.15.
Exercise 5.2.39. In this exercise we will extend Hunt’s Theorem (cf. Theorem
5.2.13) to allow unbounded stopping times. To this end, let Xn , Fn , P be a
uniformly P-integrable submartingale on the probability space (Ω, F, P), and set
Mn = Xn − An , n ∈ N, where {An : n ∈ N} is the sequence produced in Lemma
5.2.12. After checking that Mn , Fn , P is a uniformly P-integrable martingale,
show that, for any stopping time ζ: Xζ = EP [M∞ |Fζ ] + Aζ (a.s., P), where
X∞ , M∞ , and A∞ are, respectively, the P-almost sure limits of {Xn : n ≥ 0},
{Mn : n ≥ 0}, and {An : n ≥ 0}. In particular, if ζ and ζ 0 are a pair of stopping
times and ζ ≤ ζ 0 , conclude that Xζ ≤ EP [Xζ 0 |Fζ ] (a.s., P).
Exercises for § 5.2 229
Exercise 5.2.40. There are times when submartingales converge even though
they are not bounded in L1 (P; R). For example, suppose that (Xn , Fn , P) is a
ρ : R 7−→ R with
submartingale for which there exists a non-decreasing function
the properties that ρ(R) ≥ R for all R and Xn+1 ≤ ρ Xn (a.e., P) for each
n ∈ N.
(i) Set ζR (ω) = inf n ∈ N : Xn (ω) ≥ R for R ∈ (0, ∞), and note that
sup Xn∧ζR ≤ X0 ∨ ρ(R) (a.e., P).
n∈N
dνn
where fn ≡ dµ n
. In particular, when νn ∼ µn for each n ∈ N, use Kol-
mogorov’s
0–1 Law (cf. Theorem 1.1.2) to see that Q(G) ∈ {0, 1}, where G ≡
limn→∞ Xn ∈ (0, ∞)}, and combine this with the last part of Theorem 5.2.20
to conclude that Q 6⊥ P =⇒ Q P. Finally, to remove the assumption that
νn ∼ µn for all n’s, define ν̃n on (En , Bn ) by ν̃n = 1 − 2−n−1 νn + 2−n−1 µn ,
Q
check that ν̃n ∼ µn and Q Q̃ ≡ n∈N ν̃n , and use the preceding to complete
the proof.
Exercise 5.2.42. Let (Ω, F) be a measurable space and Σ a sub-σ-algebra of
F. Given a pair of probability measures P and Q on (Ω, F), let XΣ and YΣ
be non-negative Radon–Nikodym derivatives
of, respectively, PΣ ≡ P Σ and
QΣ ≡ Q Σ with respect to PΣ + QΣ , and define
Z
1 1
P, Q Σ = XΣ2 YΣ2 d(P + Q).
Z 12 12
dPΣ dQΣ
dµ.
dµ dµ
Also, check that PΣ ⊥ QΣ if and only if P, Q Σ = 0.
(ii) Suppose that Fn : n ∈ N is a non-decreasing sequence of sub-σ-algebras
of F, and show that (P, Q)Fn −→ (P, Q)W∞ Fn .
0
P∞
depending on whether 0 σn−2 (bn − an )2 converges or diverges.
Exercise 5.2.43. Let {Xn : n ∈ Z+ } be a sequence of identically distributed,
mutually independent, integrable, mean value P
0, R-valued random variables on
n
the probability space (Ω, F, P), and set Sn = 1 Xm for n ∈ Z+ . In Exercise
Exercises for § 5.2 231
1.4.28 we showed that limn→∞ |Sn | < ∞ P-almost surely. Here we will show
that
As was mentioned before, this result was proved first by K.L. Chung and W.H.
Fuchs. The basic observation behind the present proof is due to A. Perlin, who
noticed that, by the Hewitt–Savage 0–1 Law, limn→∞ |Sn | = L P-almost surely
for some L ∈ [0, ∞). Thus, the problem is to show that L = 0, and we will do
this by an simple argument invented by A. Yushkevich.
(i) Assuming that L > 0, use the Hewitt–Savage 0–1 Law to show that
L
P |Sn − x| < 3 i.o. = 0 for any x ∈ R,
where “i.o.” stands for “infinitely often” and means here “for infinitely many
n’s.”
Hint: Set ρ = L3 . Begin by observing that, because {Sm+n − Sm : n ∈ Z+ }
has the same P-distribution as {Sn : n ∈ Z+ }, P(|Sm+n − Sm | < 2ρ i.o.) = 0 for
any m ∈ Z+ . Thus, since |Sm+n − x| ≥ |Sm+n − Sm | − |Sm − x|, P(|Sn − x| <
ρ i.o.) ≤ P(|Sm − x| ≥ ρ) for any m ∈ Z+ . Moreover, by the Hewitt–Savage
0–1 Law, P(|Sn − x| < ρ i.o.) ∈ {0, 1}. Hence, either P(|Sn − x| < ρ i.o.) = 0,
or one has the contradiction that P(|Sm − x| < ρ) = 0 for all m ∈ Z+ and yet
P(|Sn − x| < ρ i.o.) = 1.
(ii) Still assuming that L > 0, argue that
L L
P |Sn − L| < 3 i.o. ∨ P |Sn + L| < 3 i.o. = 1,
After noting that Fn : n ∈ N is non-increasing, use the convergence result for
reversed martingales in Theorem 5.2.21 to see that the expansion
∞
X
f = f, 1 L2 ([0,1);C)
+ ∆m (f )
m=0
4 When f is a function with the property that (f, e` )L2 ([0,1);C) = 0 for all ` ∈ Z\{2m : m ∈ N},
the preceding almost everywhere convergence result can be interpreted as saying that the
Fourier series of f converges almost everywhere, a result that was discovered originally by
Kolmogorov. The proof suggested here is based on fading memories of a conversation with
N. Varopolous. Of course, ever since L. Carleson’s definitive theorem on the almost every
convergence of the Fourier series of an arbitrary square integrable function, the interest in this
result of Kolmogorov is mostly historical.
Chapter 6
Some Extensions and Applications
of Martingale Theory
Many of the results obtained in § 5.2 admit easy extensions to both infinite
measures and Banach space–valued random variables. Furthermore, in many
applications, these extensions play a useful, and occasionally essential, role. In
the first section of this chapter, I will develop some of these extensions, and in the
second section I will show how these extensions can be used to derive Birkhoff’s
Individual Ergodic Theorem. The final section is devoted to Burkholder’s In-
equality for martingales, an estimate that is second in importance only to Doob’s
Inequality.
§ 6.1 Some Extensions
Throughout
discussion that follows, (Ω, F, µ) will be a measure space and
the
Fn : n ∈ N will be a non-decreasing sequence of sub-σ-algebras with the
property that µ F0 is σ-finite. In particular, this means that the conditional
expectation of a locally µ-integrable random variable given Fn is well defined (cf.
Theorem 5.1.12) even if the random variable takes
values in a separable Banach
space E. Thus, I will say that the sequence Xn ; n ∈ N of E-valued random
variables is a µ-martingale with respect to Fn : n ∈ N , or, more briefly,
that the triple Xn , Fn , µ is a martingale, if {Xn : n ∈ N} is Fn : n ∈ N -
progressively measurable, each Xn is locally µ-integrable, and
233
234 6 Some Extensions and Applications
Theorem 6.1.1. Let Xn , Fn , µ be an R-valued µ-submartingale. Then, for
each N ∈ N and A ∈ F0 on which XN is µ-integrable,
1
(6.1.2) µ max Xn ≥ α ∩ A ≤ Eµ XN , max Xn ≥ α ∩ A
0≤n≤N α 0≤n≤N
for all α ∈ (0, ∞); and so, when all the Xn ’s are non-negative, for every p ∈
(1, ∞) and A ∈ F0 ,
p1
p 1
Eµ sup |Xn |p , A ≤ sup Eµ |Xn |p , A p .
n∈N p − 1 n∈N
Furthermore, for each stopping time ζ, Xn∧ζ , Fn , µ is a submartingale or a
martingale depending on whether Xn , Fn , µ is a submartingale or a martingale.
In addition, for any pair of bounded stopping times ζ ≤ ζ 0 ,
Xζ ≤ Eµ Xζ 0 Fζ
(a.e., µ),
and the inequality is an equality in the martingale case. Finally, given a < b
and A ∈ F0 ,
µ
Eµ (Xn − a)+ , A
E U[a,b] , A ≤ sup ,
n∈N b−a
where U[a,b] (ω) denotes the precise number of times that {Xn (ω) : n ≥ 1}
upcrosses [a, b] (cf. the discussion preceding Theorem 5.2.15), and therefore
These partitions are nicely meshed in the sense that the (n + 1)st is a refinement
of the nth. Equivalently, if Fn denotes the σ-algebra over RN generated by the
partition Pn , then Fn ⊆ Fn+1 . Moreover, if f ∈ L1 (RN ; R) and
Z
f nN f (y) dy for x ∈ Cn (k) and k ∈ ZN ,
Xn (x) ≡ 2
Cn (k)
where ( Z )
(0) 1 [
M f (x) = sup |f (y)| dy : x ∈ Q ∈ Pn
|Q| Q n∈Z
and I have used |Γ| to denote λRN (Γ), the Lebesgue measure of Γ.
At first sight, one might hope that it should be possible to pass directly from
(6.1.5) to analogous estimates on the level sets of Mf . However, the passage
from (6.1.5) to control on Mf is not as easy as it might appear at first: the
“sup” in the definition of Mf involves many more cubes than the one in the
definition of M(0) f . For this reason I will have to introduce additional families
of meshed partitions. Namely, for each η ∈ {0, 1}N , set
(−1)n η
N
Pn (η) = + Cn (k) : k ∈ Z ,
3 × 2n
then exactly the same argument that (when η = 0) led us to (6.1.5) can now be
used to get
Z
n
N (η)
o 1
(*) x ∈ R : M f (x) ≥ α ≤ f (y) dy
α
{M(η) f ≥α}
for each η ∈ {0, 1}N and α ∈ (0, ∞). Finally, if Q is given by (6.1.3) and
r ≤ 3 12n , then it is possible to find an η ∈ {0, 1}N and a C ∈ Pn (η) for which
Q ⊆ C. (To see this, first reduce to the case when N = 1.) Hence,
After combining this with the estimate in (*), we arrive at the following version
of the Hardy–Littlewood Maximal Inequality:
n o (12)N Z
(6.1.6) x ∈ RN : Mf (x) ≥ α ≤ |f (y)| dy.
α RN
Next, even though the result in Exercise 1.4.18 was stated for probability mea-
sures, it applies equally well to any finite measure. Thus, we now know that
Z ! p1
(η) (η) p p
kM f kLp (RN ;R) = lim (M f ) (x) dx ≤ kf kLp (RN ;R) ,
R→∞ B(0,R) p−1
where, for each x ∈ RN , the limit is taken over balls B that contain x and tend
to x in the sense that their radii shrink to 0. In particular,
Z
1
f (x) = lim f (y) dy for λRN -almost every x ∈ RN .
B&{x} |B| B
Proof: I begin with the observation that, for each f ∈ L1 (RN ; R),
Z
1 f (y) dy ≤ κN Mf (x), x ∈ RN ,
M̃f (x) ≡ sup
B3x |B| B
238 6 Some Extensions and Applications
2N
where κn = Ω N
with ΩN = B(0, 1). Second, notice that (6.1.9) for every
x ∈ RN is trivial when f ∈ Cc (RN ; R). Hence, all that remains is to check that
if fn −→ f in L1 (RN ; R) and if (6.1.9) holds for each fn , then it holds for f . To
this end, let > 0 be given and check that, because of the preceding and (6.1.6),
Z
x : lim 1
f (y) − f (x) dy ≥
B&{x} |B| B
n o
≤ x : M̃(f − fn )(x) ≥
3
Z
1
fn (y) − fn (x) dy ≥
+ x : lim
B&{x} |B| B 3
n o
+ x : fn (x) − f (x) ≥
3
3
≤ 1 + (12)N κN kf − fn kL1 (RN )
for every n ∈ Z+ . Hence, after letting n → ∞, we get (6.1.9) f .
Although applications like Lebesgue’s Differentiation Theorem might make
one think that (6.1.6) is most interesting because of what it says about averages
over small cubes, its implications for large cubes are also significant. In fact, as I
will show in § 6.2, it allows one to prove Birkhoff’s Individual Ergodic Theorem
(cf. Theorem 6.2.7), which may be viewed as a result about differentiation at
infinity. The link between ergodic theory and the Hardy–Littlewood Inequality
is provided by the following deterministic
version
of the Maximal Ergodic Lemma
(cf. Lemma 6.2.1). Namely, let ak : k ∈ ZN be a summable subset of [0, ∞),
and set
1 X
S n (k) = N
aj+k , n ∈ N and k ∈ ZN ,
(2n)
j∈Qn
where Qn = j ∈ ZN : −n ≤ ji < n for 1 ≤ i ≤ N . By applying (6.1.6) and
(6.1.7) to the function f given by (cf. (6.1.4)) f (x) = ak when x ∈ C0 (k), we
see that
(12)N X
N
(6.1.10) card k ∈ Z : sup S n (k) ≥ α ≤ ak , α ∈ (0, ∞)
n∈Z+ α N k∈Z
and
! p1 ! p1
X (12)N p X
(6.1.11) sup |S n (k)|p ≤ |ak |p for p ∈ (1, ∞].
n∈Z+ p−1
k∈ZN k∈ZN
for the game of cricket. What Hardy wanted to find is the optimal order in
which to arrange batters to maximize the average score per inning. Thus, he
worked with a non-negative sequence {ak : k ≥ 0} in which ak represented the
expected number of runs scored by player k, and what he showed is that, for
each α ∈ (0, ∞),
k ∈ N : sup S n (k) ≥ α
+
n∈Z
Although this sharpened result can also be obtained as a corollary the Sunrise
Lemma,1 Hardy’s approach remains the most appealing.
§ 6.1.2. Banach Space–Valued Martingales. I turn next to martingales
with values in a separable Banach space. Actually, everything except the easiest
aspects of this topic becomes extremely complicated and technical very quickly,
and, for this reason, I will restrict my attention to those results that do not
involve any deep properties of the geometry of Banach spaces. In fact, the only
general theory with which I will deal is contained in the following.
Theorem 6.1.12. Let E be a separable Banach space and X n , Fn , µ an E-
valued martingale. Then kXn kE , Fn , µ is a non-negative submartingale and
therefore, for each N ∈ Z+ and all α ∈ (0, ∞),
1 µ
(6.1.13) µ sup kXn kE ≥ α ≤ E kXN kE , sup kXn kE ≥ α .
0≤n≤N α 0≤n≤N
1See Lemma 3.4.5 in my A Concise Introduction to the Theory of Integration, Third Edition,
Birkhauser (1998).
240 6 Some Extensions and Applications
Proof: The fact kXn kE , Fn , µ is a submartingale is an easy application of
the inequality in (5.1.14); and, given this fact, the inequalities in (6.1.13) and
(6.1.14) follow from the corresponding inequalities in Theorem 6.1.1.
W∞ While proving the convergence statement, I may and will assume that F =
p µ
0 Fn . Now let X ∈ L (µ; E) be given, and set Xn = E [X|Fn ], n ∈ N.
Because of (6.1.13) and (6.1.14), we know (cf. the proofs of Corollary 5.2.4 and
Theorem 6.1.8) that the set of X for which Xn −→ X (a.e., µ) is a closed
subset of Lp (µ; E). Moreover, if X is µ-simple, then the µ-almost everywhere
convergence of Xn to X follows easily from the R-valued result. Hence, we
now know that Xn −→ X (a.s, µ) for each X ∈ L1 (µ; E). In addition, because
of (6.1.14), when p ∈ (1, ∞), the convergence in Lp (µ; E) follows by Lebesgue’s
Dominated Convergence Theorem. Finally, to prove the convergence in L1 (µ; E)
when X ∈ L1 (µ; E), note that, by Fatou’s Lemma,
kXkL1 (µ;E) ≤ lim kXn kL1 (µ;E) ,
n→∞
Hence, because
kXn kE − kXkE − kXn − XkE ≤ 2kXkE ,
kψt ? f kLp (RN ;R) ≤ kψkL1 (RN ;R) kf kLp (RN ;R) , t ∈ (0, ∞) and p ∈ [1, ∞],
2 This proof, which seems to have been the first, of the Strong Law for Banach spaces was
given by E. Mourier in “Eléments aléatoires dans un espace de Banach,” Ann. Inst. Poincaré
13, pp. 166–244 (1953).
242 6 Some Extensions and Applications
and Z
ψt ? f (x) = ψ(y) f (x − ty) dy,
RN
where M̃f is the quantity introduced at the beginning of the proof of Theorem
6.1.8. In particular, conclude that there is a constant KN ∈ (0, ∞), depending
only on N ∈ Z+ , such that
x ∈ RN .
Mψ f (x) ≡ sup ψt ? f (x) ≤ KN A Mf (x),
t∈(0,∞)
N
ΠR
t . In both these cases, A = N .
and finally apply the last part of Theorem 6.1.12 to see that Xn −→ X P-almost
surely.
Exercise 6.1.19. This exercise deals with a variation, proposed by Jan Myciel-
ski, on the sort of search algorithm discussed in § 5.2.5. Let G be a non-empty,
bounded, open subset of R^N with the property that λ_{R^N}( B(x, r) ∩ G ) ≥ αΩ_N r^N
for some α > 0 and all x ∈ G and 0 < r ≤ diam(G), and define µ on (G, B_G)
by µ(Γ) = λ_{R^N}(Γ ∩ G)/λ_{R^N}(G). Next, let (Ω, F, P) be a probability space on which there
exist sequences {X_n : n ≥ 1} and {Z_n : n ≥ 1} of G-valued random variables
with the properties that the X_n's are mutually independent and have distribu-
tion µ, Z_n is independent of {X_1, . . . , X_n} and has distribution ν_n ≪ µ for each
n ≥ 1, and K_r ≡ sup_{n≥1} ‖dν_n/dµ‖_{L^r(µ;R)} < ∞ for some r ∈ (1, ∞). Without loss
in generality, assume that n ≠ n′ =⇒ X_n(ω) ≠ X_{n′}(ω) for all ω ∈ Ω. For each
n ≥ 1, let Y_n(ω) be the element of {X_1(ω), . . . , X_n(ω)} which is closest to
Z_n(ω). That is, if Σ_n is the permutation group on {1, . . . , n} and, for π ∈ Σ_n,

   A_n(π) = { ω : |X_{π(m)}(ω) − Z_n(ω)| < |X_{π(m−1)}(ω) − Z_n(ω)| for 2 ≤ m ≤ n },

then Y_n = X_{π(n)} on A_n(π). Show that for all Borel measurable f : G −→ R,
|f(Y_n) − f(Z_n)| −→ 0 in P-probability. Here are some steps that you might want
to follow (see also the simulation sketched after these steps).
Next, for n ≥ 2, set r_n(ω) = |X_{n−1}(ω) − z|, and show that

   E^P[ f(X_n), A_n(z) ] = E^P[ ∫_{B(z,r_n)} f dµ , A_{n−1}(z) ] ≤ M_G f(z) P( A_n(z) ),
(iii) Given the conclusion drawn at the end of (ii), proceed as in the derivation
of Theorem 5.2.34 from Lemma 5.2.31 to get the desired result.
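A minimal simulation of Mycielski's scheme makes the claimed convergence plausible.
In the sketch below (an illustration under simplifying assumptions, not part of the
exercise) G is the unit square, ν_n = µ so that K_r = 1, and f is an arbitrary
bounded measurable choice:

    import numpy as np

    rng = np.random.default_rng(1)

    def f(p):                                  # any bounded measurable f on G
        return np.sin(5 * p[0]) + p[1] ** 2

    def gap(n):                                # one sample of |f(Y_n) - f(Z_n)|
        X = rng.random((n, 2))                 # X_1,...,X_n i.i.d. ~ mu on G
        Z = rng.random(2)                      # Z_n, here also with law mu
        Y = X[np.argmin(((X - Z) ** 2).sum(axis=1))]   # closest X_m to Z_n
        return abs(f(Y) - f(Z))

    for n in (10, 100, 1000):
        g = np.array([gap(n) for _ in range(2000)])
        print(n, (g > 0.05).mean())            # P(|f(Y_n)-f(Z_n)| > 0.05) shrinks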
§ 6.2 Elements of Ergodic Theory
Among the two or three most important general results about dynamical systems
is G. D. Birkhoff's Individual Ergodic Theorem. In this section, I will present a
generalization, due to N. Wiener, of Birkhoff's basic theorem.
The setting in which I will prove the Ergodic Theorem will be the following.
(Ω, F, µ) will be a σ-finite measure space on which there exists a semigroup
{Σ^k : k ∈ N^N} of measurable, µ-measure preserving transformations.
That is, for each k ∈ N^N, Σ^k is an F-measurable map from Ω into itself, Σ^0 is
the identity map, Σ^{k+ℓ} = Σ^k ∘ Σ^ℓ for all k, ℓ ∈ N^N, and

   µ(Γ) = µ( (Σ^k)^{−1}(Γ) )   for all k ∈ N^N and Γ ∈ F.
(6.2.2)   µ( sup_{n≥1} ‖A_n F‖_E ≥ λ ) ≤ (24^N/λ) ‖F‖_{L^1(µ;E)},   λ ∈ (0, ∞),

or

(6.2.3)   ‖ sup_{n≥1} ‖A_n F‖_E ‖_{L^p(µ)} ≤ (24^N p/(p−1)) ‖F‖_{L^p(µ;E)}.

   a_k(ω) ≡ F ∘ Σ^k(ω) if k ∈ Q^+_{2n},   a_k(ω) ≡ 0 if k ∉ Q^+_{2n},
1The idea of using Hardy’s Inequality was suggested to P. Hartman by J. von Neumann and
appears for the first time in Hartman’s “On the ergodic theorem,” Am. J. Math. 69, pp.
193–199 (1947).
and

   ∑_{k∈Q^+_n} max_{1≤m≤n} ‖A_m F ∘ Σ^k(ω)‖^p_E ≤ (12^N p/(p−1))^p ∑_{k∈Q^+_{2n}} ‖F ∘ Σ^k(ω)‖^p_E.

   ∑_{k∈Q^+_n} µ( max_{1≤m≤n} ‖A_m F ∘ Σ^k‖_E ≥ λ ) = ∫ C_n(ω) µ(dω)
      ≤ (12^N/λ) ∑_{k∈Q^+_{2n}} ∫ ‖F ∘ Σ^k‖_E dµ,

and, similarly,

   ∑_{k∈Q^+_n} ∫ max_{1≤m≤n} ‖A_m F ∘ Σ^k‖^p_E dµ ≤ (12^N p/(p−1))^p ∑_{k∈Q^+_{2n}} ∫ ‖F ∘ Σ^k‖^p_E dµ.

Finally, since the distributions of max_{1≤m≤n} ‖A_m F ∘ Σ^k‖_E and ‖F ∘ Σ^k‖_E do not
depend on k ∈ N^N, the preceding lead immediately to

   µ( max_{1≤m≤n} ‖A_m F‖_E ≥ λ ) ≤ (24^N/λ) ‖F‖_{L^1(µ)}

and

   ‖ max_{1≤m≤n} ‖A_m F‖_E ‖_{L^p(µ)} ≤ (24^N p/(p−1)) ‖F‖_{L^p(µ)}

for all n ∈ Z^+. Thus, (6.2.2) and (6.2.3) follow after one lets n → ∞.
Given (6.2.2) and (6.2.3), I adopt again the strategy used in the proof of
Corollary 5.2.4. That is, I must begin by finding a dense subset of each L^p-space
on which the desired convergence results can be checked by hand, and for this
purpose I will have to introduce the notion of invariance.
A set Γ ∈ F is said to be invariant, and I write Γ ∈ I, if Γ = (Σ^k)^{−1}(Γ) for
every k ∈ N^N. As is easily checked, I is a sub-σ-algebra of F. In addition, it
is clear that Γ ∈ F is invariant if Γ = (Σ^{e_j})^{−1}(Γ) for each 1 ≤ j ≤ N, where
{e_j : 1 ≤ j ≤ N} is the standard orthonormal basis in R^N. Finally, if Ī is the
µ-completion of I relative to F in the sense that Γ ∈ Ī if and only if Γ ∈ F and
there is Γ̃ ∈ I such that µ(Γ∆Γ̃) = 0 (A∆B ≡ (A∖B) ∪ (B∖A) is the symmetric
difference between the sets A and B), then an F-measurable F : Ω −→ E is
Ī-measurable if and only if F = F ∘ Σ^k (a.e., µ) for each k ∈ N^N. Indeed, one
need only check this equivalence for indicator functions of sets. But if Γ ∈ F
and µ(Γ∆Γ̃) = 0 for some Γ̃ ∈ I, then
   µ( Γ∆(Σ^k)^{−1}(Γ) ) ≤ µ( (Σ^k)^{−1}(Γ∆Γ̃) ) + µ(Γ∆Γ̃) = 0,
Proof: I begin with the case when E = R. The first step is to identify the
orthogonal complement I(R)^⊥ of I(R). To this end, let N denote the subspace
of L²(µ; R) consisting of elements having the form g − g ∘ Σ^{e_j} for some g ∈
L²(µ; R) ∩ L^∞(µ; R) and 1 ≤ j ≤ N. Given f ∈ I(R), observe that

   ( f, g − g ∘ Σ^{e_j} )_{L²(µ;R)} = ( f, g )_{L²(µ;R)} − ( f ∘ Σ^{e_j}, g ∘ Σ^{e_j} )_{L²(µ;R)} = 0,

and so

   ‖Π_{I(E)} F‖²_{L²(µ;E)} ≤ ∫ ( ∑_{i=1}^ℓ ‖a_i‖_E Π_{I(R)} 1_{Γ_i} )² dµ
      = ‖ Π_{I(R)}( ∑_{i=1}^ℓ ‖a_i‖_E 1_{Γ_i} ) ‖²_{L²(µ;R)} ≤ ∑_{i=1}^ℓ ‖a_i‖²_E µ(Γ_i) = ‖F‖²_{L²(µ;E)}.
Thus, since the space of µ-simple functions is dense in L2 (µ; E), it is clear that
ΠI(E) not only exists but is also unique.
Finally, to check (6.2.5) for general E’s, note that (6.2.5) for E-valued, µ-
simple F ’s is an immediate consequence of (6.2.5) for E = R. Thus, we already
know (6.2.5) for a dense subspace of L2 (µ; E), and so the rest is another elemen-
tary application of (6.2.3).
§ 6.2.2. Birkhoff's Ergodic Theorem. For any p ∈ [1, ∞), let I_p(E) denote
the subspace of Ī-measurable elements of L^p(µ; E). Clearly I_p(E) is closed for
every p ∈ [1, ∞). Moreover, since

(6.2.6)   µ(Ω) < ∞ =⇒ Π_{I(E)} F = E^µ[ F | Ī ],

when µ is finite Π_{I(E)} extends automatically as a linear contraction from L^p(µ; E)
onto I_p(E) for each p ∈ [1, ∞), the extension being given by the right-hand side
of (6.2.6). However, when µ(Ω) = ∞, there is a problem. Namely, because µ↾Ī
will seldom be σ-finite, it will not be possible to condition µ with respect to Ī.
Be that as it may, (6.2.5) provides an extension of Π_{I(E)}. Namely, from (6.2.5)
and Fatou's Lemma, it is clear that, for each p ∈ [1, ∞),

   ‖Π_{I(E)} F‖_{L^p(µ;E)} ≤ ‖F‖_{L^p(µ;E)},   F ∈ L^p(µ; E) ∩ L²(µ; E),
Proof: As I said above, the proof is now an easy application of the strategy
used to prove Corollary 5.2.4. Namely, by (6.2.2), the set of F ∈ L^1(µ; E) for
which (6.2.8) holds is closed and, by (6.2.5), it includes L^1(µ; E) ∩ L^∞(µ; E).
Hence, (6.2.8) is proved for p = 1. On the other hand, when p ∈ (1, ∞),
(6.2.3) applies and shows first that the set of F ∈ L^p(µ; E) for which (6.2.8)
holds is closed in L^p(µ; E) and second that µ-almost everywhere convergence
already implies convergence in L^p(µ; E). Hence, we have proved that (6.2.8)
holds and that the convergence is in L^p(µ; E) when p ∈ (1, ∞). In addition,
when µ(Γ) ∧ µ(Γ∁) = 0 for all Γ ∈ I, it is clear that the only elements of I_p(E)
are µ-almost everywhere constant, which, in the case when µ(Ω) < ∞, means (cf.
(6.2.6)) that Π_{I(E)} F = E^µ[F]/µ(Ω), and, when µ(Ω) = ∞, means that I_p(E) = {0}
for all p ∈ [1, ∞).
In view of the preceding, all that remains is to discuss the L1 (µ; E) convergence
in the case when p = 1 and µ(Ω) < ∞. To this end, observe that, because the
An ’s are all contractions in L1 (µ; E), it suffices to prove L1 (µ; E) convergence
for E-valued, µ-simple F ’s. But L1 (µ; E) convergence for such F ’s reduces
to showing that An f −→ ΠI(R) f in L1 (µ; R) for non-negative f ∈ L∞ (µ; R).
Finally, if f ∈ L^1( µ; [0, ∞) ), then

   ‖A_n f‖_{L^1(µ)} = ‖f‖_{L^1(µ)} = ‖Π_{I(R)} f‖_{L^1(µ;R)},   n ∈ Z^+,
where, in the last equality, I used (6.2.6); and this, together with (6.2.8), implies
(cf. the final step in the proof of Theorem 6.1.12) convergence in L1 (µ).
I will say that the semigroup {Σ^k : k ∈ N^N} is ergodic on (Ω, F, µ) if, in addition
to being µ-measure preserving, it satisfies µ(Γ) ∧ µ(Γ∁) = 0 for every Γ ∈ I.
Classic Example. In order to get a feeling for what the Ergodic Theorem is
saying, take µ to be Lebesgue measure on the interval [0, 1) and, for a given
α ∈ (0, 1), define Σα : [0, 1) −→ [0, 1) so that
Σα (ω) ≡ ω + α − [ω + α] = ω + α mod 1.
If α is rational and m is the smallest element of Z^+ with the property that
mα ∈ Z^+, then it is clear that, for any F on [0, 1), F ∘ Σ_α = F if and only if F
has period 1/m. Hence, if F ∈ L²( [0, 1); C ) and

   c_ℓ(F) ≡ ∫_{[0,1)} F(ω) e^{−√−1 2πℓω} dω,   ℓ ∈ Z,

then elementary Fourier analysis leads to the conclusion that, in this case,

   lim_{n→∞} A_n F(ω) = ∑_{ℓ∈Z} c_{mℓ}(F) e^{√−1 2mℓπω}   for Lebesgue-almost every ω ∈ [0, 1).
On the other hand, if α is irrational, then {Σ^k_α : k ∈ N} is µ-ergodic on [0, 1).
To see this, suppose that F ∈ I(C). Then (cf. the preceding and use Parseval's
Identity)

   0 = ‖F − F ∘ Σ_α‖²_{L²([0,1);C)} = ∑_{ℓ∈Z} |c_ℓ(F) − c_ℓ(F ∘ Σ_α)|².

But, clearly,

   c_ℓ(F ∘ Σ_α) = e^{√−1 2πℓα} c_ℓ(F),   ℓ ∈ Z,
and so (because α is irrational) c_ℓ(F) = 0 for each ℓ ≠ 0. In other words, the only
elements of I(C) are µ-almost everywhere constant. Thus, for each irrational
α ∈ (0, 1), p ∈ [1, ∞), separable Banach space E, and F ∈ L^p( [0, 1); E ),

   lim_{n→∞} A_n F = ∫_{[0,1)} F(ω) dω   Lebesgue-almost everywhere and in L^p(µ; E).
Finally, notice that the situation changes radically when one moves from [0, 1) to
[0, ∞) and again takes µ to be Lebesgue measure and α ∈ (0, 1) to be irrational.
If I extend the definition of Σ_α by taking Σ_α(ω) = ⌊ω⌋ + Σ_α(ω − ⌊ω⌋) for ω ∈
[0, ∞), then it is clear that the invariant functions are those that are constant on each
interval [m, m+1) and that, Lebesgue-almost surely, A_n f(ω) −→ ∫_{⌊ω⌋}^{⌊ω⌋+1} f(η) dη.
On the other hand, if one defines Σα (ω) = ω + α, then every invariant set that
has non-zero measure will have infinite measure, and so, now, every choice of
α ∈ (0, 1) (not just irrational ones) will give rise to an ergodic system. In
particular, one will have, for each p ∈ [1, ∞) and F ∈ Lp (µ; E),
lim An F = 0 Lebesgue-almost everywhere,
n→∞
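The irrational-rotation case is also pleasant to watch numerically. The following
sketch is my illustration (the averages here run over k = 1, . . . , n, a harmless
variant of A_n): the ergodic averages settle down to ∫_{[0,1)} F dω = 0:

    import numpy as np

    alpha = (np.sqrt(5) - 1) / 2                  # an irrational rotation number
    F = lambda w: np.cos(2 * np.pi * w)           # integral over [0,1) is 0

    n = 10 ** 6
    orbit = (0.3 + alpha * np.arange(1, n + 1)) % 1.0   # Sigma_alpha^k(0.3)
    A = np.cumsum(F(orbit)) / np.arange(1, n + 1)       # ergodic averages
    print(A[99], A[9_999], A[-1])                 # tends to int F d(omega) = 0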
has the same (joint) distribution under P as F itself. Clearly, one can test for
stationarity by checking that the distribution of F_{e_j} is the same as that of F for
each 1 ≤ j ≤ N. In order to apply the considerations of § 6.2.1 to stationary
families, note that all questions about the properties of F can be phrased in
terms of the following canonical setting. Namely, set 𝐄 = E^{N^N} and define µ
on (𝐄, B_𝐄) to be the image measure F_*P. In other words, for each Γ ∈ B_𝐄,
µ(Γ) = P( F ∈ Γ ). Next, for each ℓ ∈ N^N, define Σ^ℓ : 𝐄 −→ 𝐄 to be the natural
shift transformation on 𝐄 given by Σ^ℓ(x)_k = x_{k+ℓ} for all k ∈ N^N. Obviously,
stationarity of F is equivalent to the statement that {Σ^k : k ∈ N^N} is µ-measure
preserving. Moreover, if I is the σ-algebra of shift invariant elements Γ ∈ B_𝐄
(i.e., Γ = (Σ^k)^{−1}(Γ) for all k ∈ N^N), then, by Theorem 6.2.7, for any separable
Banach space B, any p ∈ [1, ∞), and any F ∈ L^p(P; B),

   lim_{n→∞} (1/n^N) ∑_{k∈Q^+_n} F ∘ F_k = E^P[ F ∘ F | F^{−1}(I) ]   (a.s., P) and in L^p(P; B).

In particular, when {Σ^k : k ∈ N^N} is ergodic on (𝐄, B_𝐄, µ), I will say that the
family F is ergodic and conclude that the preceding can be replaced by

(6.2.9)   lim_{n→∞} (1/n^N) ∑_{k∈Q^+_n} F ∘ F_k = E^P[ F ∘ F ]   (a.s., P) and in L^p(P; B).
So far I have discussed one-sided stationary families, that is, families indexed
by N^N. However, for various reasons (cf. Theorem 6.2.11) it is useful to know
that one can usually embed a one-sided stationary family into a two-sided one. In
terms of the semigroup of shifts, this corresponds to the trivial observation that
the semigroup {Σ^k : k ∈ N^N} on 𝐄 = E^{N^N} can be viewed as a sub-semigroup
of the group of shifts {Σ^k : k ∈ Z^N} on 𝐄̂ = E^{Z^N}. With these comments in
mind, I will prove the following.
Lemma 6.2.10. Assume that E is a complete, separable, metric space and that
F = {X_k : k ∈ N^N} is a stationary family of E-valued random variables on the
probability space (Ω, F, P). Then there exists a probability space (Ω̂, F̂, P̂) and
a family F̂ = {X̂_k : k ∈ Z^N} with the property that, for each ℓ ∈ Z^N,
F̂_ℓ ≡ {X̂_{k+ℓ} : k ∈ N^N}
   Λ_n = { k ∈ Z^N : k_j ≥ −n for 1 ≤ j ≤ N },

Hence the µ_n's are consistently defined on the spaces E^{Λ_n}, and therefore Kol-
mogorov's Extension Theorem applies and guarantees the existence of a unique
Borel probability measure µ on E^{Z^N} with the property that

   µ( E^{Z^N∖Λ_n} × Γ ) = µ_n(Γ)   for all n ≥ 0 and Γ ∈ B_{E^{Λ_n}}.
Thus, if

   λ_Γ(ω̂) ≡ inf{ k ∈ N : Û_{−k}(ω̂) = 1 },

then

   P( ρ_Γ ≥ n, X_0 ∈ Γ ) = P̂( λ_Γ = n − 1 ),   n ∈ Z^+,

and so

   E^P[ ρ_Γ, X_0 ∈ Γ ] = P̂( λ_Γ < ∞ ).

Now observe that

   P̂( λ_Γ > n ) = P̂( Û_{−n} = 0, . . . , Û_0 = 0 ) = P( X_0 ∉ Γ, . . . , X_n ∉ Γ ),
Clearly,

   ω ∈ G(F) =⇒ Σ_t(ω) ∈ G(F)   for every t ∈ [0, ∞)^N.

In addition, if F ∈ L^p(µ; E) for some p ∈ [1, ∞), then

   ∫_Ω ∫_{[0,T)^N} ‖F ∘ Σ_t(ω)‖^p_E dt µ(dω) = T^N ‖F‖^p_{L^p(µ;E)} < ∞,

and so

   F ∈ ⋃_{p∈[1,∞)} L^p(µ; E) =⇒ µ( G(F)∁ ) = 0.

   lim_{T→∞} A_T F = E^µ[F]/µ(Ω)   (a.e., µ),
where it is understood that the ratio is 0 when the denominator is infinite.
(6.2.14)   µ( sup_{T>0} ‖A_T F‖_E ≥ λ ) ≤ (24^N/λ) ‖F‖_{L^1(µ;E)},   λ ∈ (0, ∞),

and

(6.2.15)   ‖ sup_{T>0} ‖A_T F‖_E ‖_{L^p(µ;E)} ≤ (24^N p/(p−1)) ‖F‖_{L^p(µ;E)}   for p ∈ (1, ∞).

   lim_{n→∞} ‖ sup_{n≤T≤n+1} ‖A_T F − A_n F̂‖_E ‖_{L^p(µ;R)} = 0   for every p ∈ [1, ∞].
Hence, for F ∈ L^1(µ; E) ∩ L^∞(µ; E), (6.2.13) follows from (6.2.8). As for the case
when µ(Ω) < ∞, all that we have to do is check that Π_{Î(E)} F = E^µ[ F | Î ] (a.e., µ).
However, from (6.2.13), it is easy to see that Π_{Î(E)} F is measurable with respect
to the µ-completion of Î, and so it suffices to show that
Exercise 6.2.16. Given an irrational α ∈ (0, 1) and an ε ∈ (0, 1), let N_n(α, ε)
be the number of 1 ≤ m ≤ n with the property that

   |α − ℓ/m| ≤ ε/(2m)   for some ℓ ∈ Z.
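Applying the Ergodic Theorem to the rotation by α leads one to guess that
N_n(α, ε)/n −→ ε. A quick check of that guess (my sketch, using the equivalence
|α − ℓ/m| ≤ ε/(2m) ⟺ |mα − ℓ| ≤ ε/2):

    import numpy as np

    alpha, eps, n = np.sqrt(2) - 1, 0.1, 200_000
    m = np.arange(1, n + 1)
    frac = np.abs(alpha * m - np.round(alpha * m))   # min over l of |m*alpha - l|
    N_n = int(np.count_nonzero(frac <= eps / 2))     # i.e. |alpha - l/m| <= eps/(2m)
    print(N_n / n)                                    # ~ eps = 0.1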
On the other hand, it is not at all clear how to compare the size of Yn to that
of Xn in any of the Lp spaces other than p = 2.
The problem of finding such a comparison was given a definitive solution by D.
Burkholder, and I will present his solution in this section. Actually, Burkholder
solved the problem twice. His first solution was a beautiful adaptation of general
ideas and results that had been developed over the years to solve related prob-
lems in probability theory and analysis and, as such, did not yield the optimal
solution. His second approach is designed specifically to address the problem
at hand and bears little or no resemblance to familiar techniques. It is entirely
original, remarkably elementary and effective, but somewhat opaque. The ap-
proach is the outgrowth of many years of deep thinking that Burkholder devoted
to the topic, and the reader who wants to understand the path that led him to
it should consult the explanation that he wrote.1
§ 6.3.1. Burkholder’s Comparison Theorem. Burkholder’s basic result is
the following comparison theorem.
Theorem 6.3.1 (Burkholder). Let (Ω, F, P) be a probability space, {F_n :
n ∈ N} a non-decreasing sequence of sub-σ-algebras of F, and E and F a pair
of (real or complex) separable Hilbert spaces. Next, suppose that (X_n, F_n, P)
and (Y_n, F_n, P) are, respectively, E- and F-valued martingales. If
I may assume that both E and F are complex Hilbert spaces, since we can always
complexify them, and, in addition, that E = F , since, if that is not already the
case, I can embed them in E ⊕ F . Thus, I will be making these assumptions
throughout.
The heart of the proof lies in the computations contained in the following two
lemmas.
Lemma 6.3.3. Let p ∈ (1, ∞) be given, and set

   u(x, y) ≡ ( ‖y‖_E − (p − 1)‖x‖_E ) ( ‖x‖_E + ‖y‖_E )^{p−1},   (x, y) ∈ E².

Then

   ‖y‖^p_E − B^p_p ‖x‖^p_E ≤ α_p u(x, y),   (x, y) ∈ E².
Proof: When p = 2, there is nothing to do. Thus, I will assume that p ∈
(1, ∞) \ {2}.
Observe that it suffices to show that, for all (x, y) ∈ E² satisfying ‖x‖_E +
‖y‖_E = 1, depending on whether p ∈ (2, ∞) or p ∈ (1, 2),

(*)   ‖y‖^p_E − (p − 1)^p ‖x‖^p_E ≤ p^{2−p}(p − 1)^{p−1} ( ‖y‖_E − (p − 1)‖x‖_E )
   or
      ‖y‖^p_E − (p − 1)^p ‖x‖^p_E ≥ p^{2−p}(p − 1)^{p−1} ( ‖y‖_E − (p − 1)‖x‖_E ).

Indeed, when p ∈ (2, ∞), (*) is precisely the result desired, and, when p ∈ (1, 2),
(*) gives the desired result after one divides through by (p − 1)^p and reverses
the roles of x and y.
I begin the verification of (*) by checking that

(**)   p^{2−p}(p − 1)^{p−1} > 1 if p ∈ (2, ∞)   and   p^{2−p}(p − 1)^{p−1} < 1 if p ∈ (1, 2).

To this end, set f(p) = (p − 1) log(p − 1) − (p − 2) log p for p ∈ (1, ∞). Then
f is strictly convex on (1, 2) and strictly concave on (2, ∞). Thus, f↾(1, 2)
cannot achieve a maximum and, therefore, since lim_{p↘1} f(p) = 0 = f(2), f < 0
on (1, 2). Similarly, f↾(2, ∞) cannot achieve a minimum and, therefore, since
f(2) = 0 while lim_{p↗∞} f(p) = ∞, we have that f > 0 on (2, ∞).
Next, observe that proving (*) comes down to checking that, for s ∈ [0, 1],

   Φ(s) ≡ p^{2−p}(p − 1)^{p−1}(1 − ps) − (1 − s)^p + (p − 1)^p s^p
      is ≥ 0 if p ∈ (2, ∞) and ≤ 0 if p ∈ (1, 2).
To this end, note that, by (**), Φ(0) > 0 when p ∈ (2, ∞) and Φ(0) < 0 when
p ∈ (1, 2). Also, for s ∈ (0, 1),

   Φ′(s) = p[ (p − 1)^p s^{p−1} + (1 − s)^{p−1} − p^{2−p}(p − 1)^{p−1} ]

and

   Φ″(s) = p(p − 1)[ (p − 1)^p s^{p−2} − (1 − s)^{p−2} ].

Depending on whether p ∈ (2, ∞) or p ∈ (1, 2), lim_{s↘0} Φ″(s) is negative or
positive, Φ″ is strictly increasing or decreasing on (0, 1), and lim_{s↗1} Φ″(s) is
positive or negative. Hence, there exists a unique t = t_p ∈ (0, 1) with the
property that

   Φ″ < 0 on (0, t) and Φ″ > 0 on (t, 1) if p ∈ (2, ∞),
   Φ″ > 0 on (0, t) and Φ″ < 0 on (t, 1) if p ∈ (1, 2).

Moreover, because Φ″(t) = 0, it is easy to see that t ∈ (0, 1/p).
Now suppose that p ∈ (2, ∞) and consider Φ on each of the intervals (1/p, 1],
we also know that Φ↾(0, t) > 0. The argument when p ∈ (1, 2) is similar, only
this time all the signs are reversed.
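The sign analysis of Φ is elementary but fiddly, and it can be reassuring to
evaluate Φ directly. The sketch below (mine) verifies, up to floating-point
rounding, that Φ ≥ 0 for p ∈ (2, ∞) and Φ ≤ 0 for p ∈ (1, 2), with the extreme
value 0 attained at s = 1/p:

    import numpy as np

    def Phi(s, p):                 # the function from the proof of Lemma 6.3.3
        return (p ** (2 - p) * (p - 1) ** (p - 1) * (1 - p * s)
                - (1 - s) ** p + (p - 1) ** p * s ** p)

    s = np.linspace(0.0, 1.0, 100_001)
    for p in (1.3, 1.7, 2.5, 4.0):
        v = Phi(s, p)
        print(p, v.min() if p > 2 else v.max())   # ~0, with the correct sign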
Lemma 6.3.4. Again let p ∈ (1, ∞) be given, and define u : E × F −→ R as in
Lemma 6.3.3. In addition, define the functions v and w on E² ∖ {(0, 0)} by

   v(x, y) = p ( ‖y‖_E + ‖x‖_E )^{p−2} ( ‖y‖_E + (2 − p)‖x‖_E )

and

   w(x, y) = p(1 − p) ( ‖y‖_E + ‖x‖_E )^{p−2} ‖x‖_E.

Then, for (x, y) ∈ E² and (k, h) ∈ E² satisfying

   min_{t∈[0,1]} ( ‖y + th‖_E ∧ ‖x + tk‖_E ) > 0   and   ‖h‖_E ≤ ‖k‖_E,

one has

   u(x + k, y + h) − u(x, y) ≤ v(x, y) Re( y/‖y‖_E , h )_E + w(x, y) Re( x/‖x‖_E , k )_E.

Proof: Set

   Φ(t) = Φ( t; (x, k), (y, h) )
        ≡ ( ‖y + th‖_E − (p − 1)‖x + tk‖_E ) ( ‖x + tk‖_E + ‖y + th‖_E )^{p−1},
In particular, the first expression establishes the required form for Φ′(t). In
addition, from the second expression, we see that

   −Φ″(t)/p = (p − 1)(p − 2) Ψ(t)^{p−3} ‖x(t)‖_E ( a(t) + b(t) )²
      + (p − 1) Ψ(t)^{p−2} ( ‖k‖²_E − ‖h‖²_E ) + (p − 2) Ψ(t)^{p−1} b^⊥(t)²/‖y(t)‖_E,

where a^⊥(t) = ( ‖k‖²_E − a(t)² )^{1/2} and b^⊥(t) = ( ‖h‖²_E − b(t)² )^{1/2}. Hence the
required properties of Φ″(t) have also been established.
and
   dist( Y_0(ω), span{ H_n(ω) : n ∈ Z^+ } ) ≥ ε

for all ω ∈ Ω. Indeed, if this is not already the case, then I can replace E by
R × E (or, when E is complex, C × E) and X_n(ω) and Y_n(ω), respectively, by
X_n^{(ε)}(ω) and Y_n^{(ε)}(ω)
for each n ∈ N. Clearly, (6.3.2) for each X_n^{(ε)} and Y_n^{(ε)} implies (6.3.2) for X_n
and Y_n after one lets ε ↘ 0. Finally, because there is nothing to do when the
right-hand side of (6.3.2) is infinite, let p ∈ (1, ∞) be given, and assume that
X_n ∈ L^p(P; E) for each n ∈ N. In particular, if u is the function defined in
Lemma 6.3.3 and v and w are those defined in Lemma 6.3.4, then

   u(X_n, Y_n) ∈ L^1(P; R)   and   v(X_n, Y_n), w(X_n, Y_n) ∈ L^{p′}(P; R)

for all n ∈ N, where p′ = p/(p − 1) is the Hölder conjugate of p.
Note that, by Lemma 6.3.3, it suffices for us to show that A_n ≡ E^P[ u(X_n, Y_n) ] ≤ 0,
n ∈ N. Since u(X_0, Y_0) ≤ 0 P-almost surely, there is no question that
A0 ≤ 0. Next, assume that An ≤ 0, and, depending on whether p ∈ [2, ∞) or
p ∈ (1, 2], use the appropriate part of Lemma 6.3.4 to see that
   A_{n+1} ≤ E^P[ v(X_n, Y_n) Re( Y_n/‖Y_n‖_E , H_{n+1} )_E ]
           + E^P[ w(X_n, Y_n) Re( X_n/‖X_n‖_E , K_{n+1} )_E ]

or

   A_{n+1} ≤ − E^P[ w(Y_n, X_n) Re( Y_n/‖Y_n‖_E , H_{n+1} )_E ]
           − E^P[ v(Y_n, X_n) Re( X_n/‖X_n‖_E , K_{n+1} )_E ].
Since the same reasoning shows that each of the other terms on the right-hand
side vanishes, we have now proved that An+1 ≤ 0.
As an immediate consequence of Theorem 6.3.1, we have the following answer
to the question raised at the beginning of this section.
This is the form of his inequality which is best known and, as such, is called
Burkholder’s Inequality. Notice that his inequality can be viewed as a vast
generalization of Khinchine’s Inequality (2.3.27), although it applies only when
p ∈ (1, ∞).
Theorem 6.3.6 (Burkholder's Inequality). Let (Ω, F, P) and {F_n : n ∈
N} be as in Theorem 6.3.1, and let (X_n, F_n, P) be a martingale with values in
the separable Hilbert space E. Then, for each p ∈ (1, ∞),

(6.3.7)   B_p^{−1} sup_{n∈N} ‖X_n − X_0‖_{L^p(P;E)}
          ≤ E^P[ ( ∑_{n=1}^∞ ‖X_n − X_{n−1}‖²_E )^{p/2} ]^{1/p}
          ≤ B_p sup_{n∈N} ‖X_n − X_0‖_{L^p(P;E)},

with B_p as in (6.3.2).
Proof: Let F = ℓ²(N; E) be the separable Hilbert space of sequences
y = (x_0, . . . , x_n, . . . ) ∈ E^N satisfying

   ‖y‖_F ≡ ( ∑_{n=0}^∞ ‖x_n‖²_E )^{1/2} < ∞,
and define
Yn (ω) = (X0 (ω), X1 (ω) − X0 (ω), . . . , Xn (ω) − Xn−1 (ω), 0, 0, . . . ) ∈ F
for ω ∈ Ω and n ∈ N. Obviously, (Y_n, F_n, P) is an F-valued martingale. More-
over,

   ‖X_0‖_E = ‖Y_0‖_F   and   ‖X_n − X_{n−1}‖_E = ‖Y_n − Y_{n−1}‖_F,   n ∈ N,
and therefore the right-hand side of (6.3.7) is implied by (6.3.2) while the left-
hand side also follows from (6.3.2) when the roles of the Xn ’s and Yn ’s are
reversed.
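Burkholder's Inequality (6.3.7) can be watched in action by Monte Carlo. In the
sketch below (my illustration; the increments are i.i.d. t-distributed with finite
p-th moment, so sup_n ‖X_n − X_0‖_{L^p} is attained at the final time) the middle
quantity in (6.3.7) and the outer ones stay within constant factors of one another:

    import numpy as np

    rng = np.random.default_rng(2)
    p, n, paths = 3.0, 200, 20_000
    d = rng.standard_t(df=7, size=(paths, n))        # i.i.d. mean-0 increments
    X = np.cumsum(d, axis=1)                         # martingale, X_0 = 0
    lhs = (np.abs(X[:, -1]) ** p).mean() ** (1 / p)  # ~ sup_n ||X_n - X_0||_p
    mid = (((d ** 2).sum(axis=1)) ** (p / 2)).mean() ** (1 / p)
    print(lhs, mid, mid / lhs)                       # ratio bounded by B_p's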
Exercises for § 6.3
Exercise 6.3.8. Because it arises repeatedly in the theory of stochastic inte-
gration, one of the most frequent applications of Burkholder's Inequality is to
situations in which E is a separable Hilbert space and (X_n, F_n, P) is an E-valued
martingale for which one has an estimate of the form

   K_p ≡ sup_{n∈Z^+} ‖ E^P[ ‖X_n − X_{n−1}‖^{2p}_E | F_{n−1} ]^{1/2p} ‖_{L^∞(P;R)} < ∞
for some p ∈ [1, ∞). To see how such an estimate gets used, let F be a sec-
ond separable Hilbert space and suppose that {σ_n : n ∈ N} is a sequence of
Hom(E; F)-valued random variables with the properties that, for each n ∈ N,
σ_n is F_n-measurable and a_n ≡ E^P[ ‖σ_n‖^{2p}_{op} ]^{1/2p} < ∞. Set Y_0 = 0 and

   Y_n = ∑_{m=1}^n σ_{m−1}( X_m − X_{m−1} )   for n ∈ Z^+,
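The sums Y_n here are discrete stochastic integrals, and the point is that bounds
on the σ's transfer to bounds on Y_n. A small sketch (mine, with scalar σ and
|σ| ≤ 1 via tanh, so the transform should be dominated by the original martingale):

    import numpy as np

    rng = np.random.default_rng(3)
    n, paths = 500, 10_000
    xi = rng.choice([-1.0, 1.0], size=(paths, n))        # X_m - X_{m-1}
    X = np.cumsum(xi, axis=1)
    prev = np.hstack([np.zeros((paths, 1)), X[:, :-1]])  # X_{m-1}
    sigma = np.tanh(prev)                                # F_{m-1}-measurable, |.|<=1
    Y = np.cumsum(sigma * xi, axis=1)                    # Y_n = sum sigma_{m-1} dX_m
    for p in (2.0, 4.0):
        print(p, (np.abs(Y[:, -1]) ** p).mean() ** (1 / p),
                 (np.abs(X[:, -1]) ** p).mean() ** (1 / p))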
Exercise 6.3.9. Return to the setting in Exercise 5.2.45, and let λ_{[0,1)} denote
Lebesgue measure on [0, 1). Given f ∈ L²(λ_{[0,1)}; C), show that, for each p ∈
(1, ∞),

   ( (p − 1) ∧ (p − 1)^{−1} ) ‖ f − (f, 1)_{L²(λ_{[0,1)};C)} ‖_{L^p([0,1);C)}
   ≤ ( ∫_{[0,1)} ( ∑_{m=0}^∞ |∆_m(f)|² )^{p/2} dt )^{1/p}
   ≤ ( (p − 1) ∨ (p − 1)^{−1} ) ‖ f − (f, 1)_{L²(λ_{[0,1)};C)} ‖_{L^p(λ_{[0,1)};C)}.
For functions f with (f, e_ℓ)_{L²(λ_{[0,1)};C)} = 0 unless ℓ = ±2^m for some m ∈ N, this
estimate is a case of a famous theorem proved by Littlewood and Paley in order
to generalize Parseval's Identity to cover p ≠ 2. Unfortunately, the argument
here is far too weak to give their inequality for general f's.
Exercise 6.3.10. In connection with the preceding exercise, it is interesting
to note that there is an orthonormal basis for L²(λ_{[0,1)}; R) that, as distinguished
from the trigonometric functions, can be nearly completely understood in terms
of martingale analysis. Namely, recall the Rademacher functions {R_n : n ∈ Z^+}
introduced in § 1.1.2. Next, use F to denote the set of all finite subsets F of Z^+,
and define the Walsh function W_F for F ∈ F by

   W_F = 1 if F = ∅,   and   W_F = ∏_{m∈F} R_m if F ≠ ∅.
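The orthonormality of the Walsh family is easy to confirm on a dyadic grid. In
the sketch below (mine) the Rademacher functions are realized as
R_n(t) = sgn sin(2^n πt), one common realization which may differ from the § 1.1.2
definition by a sign; that does not affect orthonormality:

    import numpy as np
    from itertools import combinations

    def R(n, t):                                   # Rademacher R_n on [0,1)
        return np.sign(np.sin((2 ** n) * np.pi * t))

    t = (np.arange(1024) + 0.5) / 1024             # midpoints of dyadic cells
    W = [np.ones_like(t)]                          # W_emptyset = 1
    W += [R(n, t) for n in range(1, 6)]            # singletons F = {n}
    W += [R(i, t) * R(j, t) for i, j in combinations(range(1, 6), 2)]

    G = np.array(W)
    print(np.allclose(G @ G.T / len(t), np.eye(len(G))))   # orthonormal: True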
Using the result in (i), show that X^f_n = E^{λ_{[0,1)}}[ f | F_n ] and therefore that
(X^f_n, F_n, λ_{[0,1)}) is a martingale. In particular, X^f_n −→ f both (a.e., λ_{[0,1)}) and in
L^1(λ_{[0,1)}; R).
(iii) Show that for each p ∈ (1, ∞) and f ∈ L^1(λ_{[0,1)}; R) with mean value 0,

   ( (p − 1) ∧ (p − 1)^{−1} ) ‖f‖_{L^p([0,1);R)}
   ≤ ( ∫_{[0,1)} ( ∑_{n=1}^∞ | ∑_{F⊆A_n∖A_{n−1}} ( ∫_{[0,1)} f(s) W_F(s) ds ) W_F(t) |² )^{p/2} dt )^{1/p}
   e^{ax} ≤ ((1 + x)/2) e^a + ((1 − x)/2) e^{−a} = cosh a + x sinh a.
(ii) Suppose that {Y_1, . . . , Y_n} are [−1, 1]-valued random variables on the prob-
ability space (Ω, F, P) with the property that, for each 1 ≤ m ≤ n,

   E^P[ Y_{j_1} · · · Y_{j_m} ] = 0   for all 1 ≤ j_1 < · · · < j_m ≤ n.
It turns out that many of the ideas and results introduced in § 5.2 can be easily
transferred to the setting of processes depending on a continuous parameter. In
addition, the resulting theory is intimately connected with Lévy processes, and
particularly with Brownian motion. In this chapter, I will give a brief introduction
to this topic and some of the techniques to which it leads.¹
§ 7.1 Continuous Parameter Martingales
There is a huge number of annoying technicalities which have to be addressed in
order to give a mathematically correct description of the continuous time theory
of martingales. Fortunately, for the applications which I will give here, I can
keep them to a minimum.
§ 7.1.1. Progressively Measurable Functions. Let (Ω, F) be a measurable
space and {F_t : t ∈ [0, ∞)} a non-decreasing family of sub-σ-algebras. I will say
that a function X on [0, ∞) × Ω into a measurable space (E, B) is progressively
measurable with respect to {F_t : t ∈ [0, ∞)} if X↾[0, T] × Ω is B_{[0,T]} × F_T-
measurable for every T ∈ [0, ∞). When E is a metric space, I will say that
X : [0, ∞) × Ω −→ E is right-continuous if X(s, ω) = lim_{t↘s} X(t, ω) for every
(s, ω) ∈ [0, ∞) × Ω and will say that it is continuous if X( · , ω) is continuous
for all ω ∈ Ω.
Remark 7.1.1. The reader might have been expecting a slightly different def-
inition of progressive measurability
here. Namely,
he might have thought that
one would say that X is Ft : t ∈ [0, ∞) -progressively measurable if it is
B[0,∞) × F-measurable and ω ∈ Ω 7−→ X(t, ω) ∈ E is Ft -measurable for each
t ∈ [0, ∞). Indeed, in extrapolating from the discrete parameter setting, this
would be the first definition at which one would arrive. In fact, it was the notion
with which Doob and Itô originally worked; and such functions were said by
them to be adapted to Ft : t ∈ [0, ∞) . However, it came to be realized
that there are various problems with the notion of adaptedness. For example,
even if X is adapted and f : E −→ R is a bounded, B-measurable function, the
¹ A far more thorough treatment can be found in D. Revuz and M. Yor's treatise Continuous
Martingales and Brownian Motion, Springer-Verlag, Grundlehren der Mathematischen #293
(1999).
function (t, ω) ↦ Y(t, ω) ≡ ∫_0^t f( X(s, ω) ) ds ∈ R need not be adapted. On the
other hand, if X is progressively measurable, then Y will be also.
The following simple lemma should help to explain the virtue of progressive
measurability and its relationship to adaptedness.
Lemma 7.1.2. Let PM denote the set of A ⊆ [0, ∞) × Ω with the property
that ([0, t] × Ω) ∩ A ∈ B_{[0,t]} × F_t for every t ≥ 0. Then PM is a sub-σ-algebra of
B_{[0,∞)} × F, and X is progressively measurable if and only if it is PM-measurable.
Furthermore, if E is a separable metric space and X : [0, ∞) × Ω −→ E is a
right-continuous function, then X is progressively measurable if it is adapted.
Proof: Checking that PM is a σ-algebra is easy. Furthermore, for any X :
[0, ∞) × Ω −→ E, T ∈ [0, ∞), and Γ ∈ B,

   { (t, ω) ∈ [0, T] × Ω : X(t, ω) ∈ Γ }
      = ( [0, T] × Ω ) ∩ { (t, ω) ∈ [0, ∞) × Ω : X(t, ω) ∈ Γ },

and so X is {F_t : t ∈ [0, ∞)}-progressively measurable if and only if it is PM-
measurable. Hence, the first assertion has been proved.
Next, suppose that X is a right-continuous, adapted function. To see that X
is progressively measurable, let t ∈ [0, ∞) be given, and define
   X^t_n(τ, ω) = X( ([2^n τ] + 1)/2^n ∧ t, ω )   for (τ, ω) ∈ [0, ∞) × Ω and n ∈ N.

Obviously, X^t_n is B_{[0,t]} × F_t-measurable for every n ∈ N and X^t_n(τ, ω) −→ X(τ, ω)
as n → ∞ for every (τ, ω) ∈ [0, t] × Ω. Hence, X↾[0, t] × Ω is B_{[0,t]} × F_t-
measurable, and so X is progressively measurable.
§ 7.1.2. Martingales: Definition and Examples. Given a probability space
(Ω, F, P) and a non-decreasing family of sub-σ-algebras {F_t : t ∈ [0, ∞)}, I will
say that X : [0, ∞) × Ω −→ (−∞, ∞] is a submartingale with respect to
{F_t : t ∈ [0, ∞)}, or, equivalently, that (X(t), F_t, P) is a submartingale, if X
is a right-continuous, progressively measurable function with the properties that
X(t)^− is P-integrable for every t ∈ [0, ∞) and

   X(s) ≤ E^P[ X(t) | F_s ]   (a.s., P) for all 0 ≤ s ≤ t < ∞.

When both (X(t), F_t, P) and (−X(t), F_t, P) are submartingales, I will say either
that X is a martingale with respect to {F_t : t ∈ [0, ∞)} or simply that
(X(t), F_t, P) is a martingale. Finally, if Z : [0, ∞) × Ω −→ C is a right-
continuous, progressively measurable function, then (Z(t), F_t, P) is said to be a
(complex) martingale if both (Re Z(t), F_t, P) and (Im Z(t), F_t, P) are.
The next two results show that Lévy processes provide a rich source of con-
tinuous parameter martingales.
Theorem 7.1.3. Let µ ∈ I(R^N) with µ̂(ξ) = e^{ℓ_µ(ξ)}, where ℓ_µ(ξ) equals

   √−1 (ξ, m)_{R^N} − ½ (ξ, Cξ)_{R^N}
      + ∫_{R^N} [ e^{√−1(ξ,y)_{R^N}} − 1 − √−1 1_{[0,1]}(|y|) (ξ, y)_{R^N} ] M(dy),

where F_t = σ( {Z(τ) : τ ∈ [0, t]} ).

   = exp( √−1 (ξ, Z(s))_{R^N} − s ℓ_µ(ξ) ).
To prove the converse assertion, observe that the defining distributional property
of a Lévy process for µ can be summarized as the statement that Z(0, ω) = 0
and, for each 0 ≤ s < t, Z(t) − Z(s) is independent of σ( {Z(τ) : τ ∈ [0, s]} ) and
has distribution µ_{t−s}, where µ̂_τ = e^{τℓ_µ}. Hence, since (7.1.4) implies that

   E^P[ exp( √−1 (ξ, Z(t) − Z(s))_{R^N} ) | F_s ] = e^{(t−s)ℓ_µ(ξ)},   ξ ∈ R^N,
(7.1.5)   L_µ φ(x) = ½ Trace( C ∇²φ(x) ) + ( m, ∇φ(x) )_{R^N}
             + ∫_{R^N} [ φ(x + y) − φ(x) − 1_{[0,1]}(|y|) ( y, ∇φ(x) )_{R^N} ] M(dy)

is a martingale for each φ ∈ C_c^∞(R^N; R), then {Z(t) : t ≥ 0} is a Lévy process
for µ.
Proof: Begin by noting that it suffices to handle the case when F is the
restriction to [0, ∞) × R^N of an element of the Schwartz test function space
S(R × R^N; C). Indeed, because ‖L_µ φ‖_u ≤ C ‖φ‖_{C_b²(R^N;C)} for some C < ∞,
the result for F ∈ C_b^{1,2}( [0, ∞) × R^N; C ) follows, via an obvious approxima-
tion procedure, from the result for F ∈ S(R × R^N; C). Next observe that
it suffices to treat F ∈ S(R^N; C). To see this, simply interpret the process
t ∈ [0, ∞) ↦ (t, Z_µ(t)) ∈ R^{N+1} as a Lévy process for δ_1 × µ.
Now let φ ∈ S(R^N; C) be given. The key to proving the required result is the
identity

(*)   (d/dt) φ ⋆ µ̆_t = (L_µ φ) ⋆ µ̆_t,

where µ̆_t is the distribution of −x under µ_t, the measure determined by µ̂_t = e^{tℓ_µ}.
The easiest way to check (*) is to work via Fourier transform and to use (3.2.10)
to verify that

   (d/dt) (φ ⋆ µ̆_t)^(ξ) = ℓ_µ(−ξ) φ̂(ξ) e^{tℓ_µ(−ξ)} = (L_µ φ)^(ξ) e^{tℓ_µ(−ξ)},

which is equivalent to (*). To see how (*) applies, observe that

   E^P[ φ( Z(t) ) | F_s ] = φ ⋆ µ̆_{t−s}( Z(s) ),
Since this means that u(t) = e(t−s)`µ (ξ) u(s), it follows that {Z(t) : t ≥ 0}
satisfies (7.1.4) and is therefore a Lévy process for µ.
As an immediate consequence of the preceding we have the following charac-
terizations of the distribution of a Lévy process. In the statement that follows,
Ft is the σ-algebra over D(RN ) generated by {ψ(τ ) : τ ∈ [0, t]}.
Theorem 7.1.7. Given µ ∈ I(R^N), let Q_µ ∈ M_1( D(R^N) ) be the distribution
of a Lévy process for µ. Then Q_µ is the unique P ∈ M_1( D(R^N) ) that satisfies
either one of the properties:

   ( exp[ √−1 (ξ, ψ(t))_{R^N} − t ℓ_µ(ξ) ], F_t, P )
   is a martingale with mean value 1 for each ξ ∈ R^N,

or

   ( φ(ψ(t)) − φ(0) − ∫_0^t L_µ φ( ψ(τ) ) dτ, F_t, P )
   is a martingale with mean value 0 for each φ ∈ C_c^∞(R^N; R).
§ 7.1.3. Basic Results. In this subsection I run through some of the results
from § 5.2 that transfer immediately to the continuous parameter setting.
Lemma 7.1.8. Let the interval I and the function f : I −→ R ∪ {∞} be as in
Corollary 5.2.10. If either (X(t), F_t, P) is an I-valued martingale or (X(t), F_t, P)
is an I-valued submartingale and f is non-decreasing and bounded below, then
( f ∘ X(t), F_t, P ) is a submartingale.
Proof: The fact that the parameter is continuous plays no role here, and so
this result is already covered by the argument in Corollary 5.2.10.
Theorem 7.1.9 (Doob's Inequality). Let (X(t), F_t, P) be a submartingale.
Then, for every α ∈ (0, ∞) and T ∈ [0, ∞),

   P( sup_{t∈[0,T]} X(t) ≥ α ) ≤ (1/α) E^P[ X(T), sup_{t∈[0,T]} X(t) ≥ α ].
Proof: Because of Exercise 1.4.18, I need only prove the first assertion. To
this end, let T ∈ (0, ∞) and n ∈ N be given, apply Theorem 5.2.1 to the discrete
parameter submartingale ( X(mT/2^n), F_{mT/2^n}, P ), and observe that

   sup{ X(mT/2^n) : 0 ≤ m ≤ 2^n } ↗ sup_{t∈[0,T]} X(t)   as n → ∞.
Finally, again when (X(t), F_t, P) is either a non-negative submartingale or a
martingale, for each p ∈ (1, ∞) the family { |X(t)|^p : t ∈ [0, ∞) } is uniformly P-
integrable if and only if sup_{t∈[0,∞)} ‖X(t)‖_{L^p(P)} < ∞, in which case X(t) −→ X
in L^p(P; R).
Proof: To prove the initial convergence assertion, note that, by Theorem 5.2.15
applied to the discrete parameter process (X(n), F_n, P), there is a ⋁_{n∈N} F_n-
measurable X ∈ L^1(P; R) to which X(n) converges P-almost surely. Hence,
we need only check that lim_{t→∞} X(t) exists in [−∞, ∞] P-almost surely. To
this end, define U^{(n)}_{[a,b]}(ω) for n ∈ N and a < b to be the precise number of
times that the sequence { X(m/2^n, ω) : m ∈ N } upcrosses the interval [a, b] (cf. the
paragraph preceding Theorem 5.2.15), observe that U^{(n)}_{[a,b]}(ω) is non-decreasing
as n increases, and set U_{[a,b]}(ω) = lim_{n→∞} U^{(n)}_{[a,b]}(ω). Note that if U_{[a,b]}(ω) < ∞,
then (by right-continuity), there is an s ∈ [0, ∞) such that either X(t, ω) ≤ b for
all t ≥ s or X(t, ω) ≥ a for all t ≥ s. Hence, we will know that X(t, ω) converges
in [−∞, ∞] for P-almost every ω ∈ Ω as soon as we show that E^P[ U_{[a,b]} ] < ∞
for every pair a < b. In addition, by (5.2.16), we know that

   sup_{n∈N} E^P[ U^{(n)}_{[a,b]} ] ≤ sup_{t∈[0,∞)} E^P[ (X(t) − a)^+ ]/(b − a) < ∞,
for every T ∈ (0, ∞). Hence, (7.1.11) follows when one lets T → ∞. But, again
from (*),
   E^P[ |X(T)|, |X(T)| ≥ α ] ≤ E^P[ |X|, |X(T)| ≥ α ] ≤ E^P[ |X|, sup_{t≥0} |X(t)| ≥ α ],

and therefore, since, by (7.1.11), P( sup_{t≥0} |X(t)| ≥ α ) −→ 0 as α → ∞, we can
conclude that {X(t) : t ≥ 0} is uniformly P-integrable.
Finally, if {X(T ) : T ≥ 0} is bounded in Lp (P; R) for some p ∈ (1, ∞), then,
by the last part of Theorem 7.1.9, supt≥0 |X(t)|p is P-integrable and therefore
X(t) −→ X in Lp (P; R).
§ 7.1.4. Stopping Times and Stopping Theorems. A stopping time
relative to a non-decreasing family {F_t : t ≥ 0} of σ-algebras is a map ζ :
Ω −→ [0, ∞] with the property that {ζ ≤ t} ∈ F_t for every t ≥ 0. Given a
stopping time ζ, I will associate with it the σ-algebra F_ζ consisting of those
A ⊆ Ω such that A ∩ {ζ ≤ t} ∈ F_t for every t ≥ 0. Note that, because
{ζ < t} = ⋃_{n=0}^∞ {ζ ≤ (1 − 2^{−n})t}, {ζ < t} ∈ F_t for all t ≥ 0.
Here are a few useful facts about stopping times.
Lemma 7.1.12. Let ζ be a stopping time. Then ζ is F_ζ-measurable, and,
for any progressively measurable function X with values in a measurable space
(E, B), the function ω ↦ X(ζ, ω) ≡ X( ζ(ω), ω ) is F_ζ-measurable on {ζ < ∞} in
the sense that { ω : ζ(ω) < ∞ & X(ζ, ω) ∈ Γ } ∈ F_ζ for all Γ ∈ B. In addition,
f ∘ ζ is again a stopping time if f : [0, ∞] −→ [0, ∞] is a non-decreasing, right-
continuous function satisfying f(τ) ≥ τ for all τ ∈ [0, ∞]. Next, suppose that
ζ_1 and ζ_2 are a pair of stopping times. Then ζ_1 + ζ_2, ζ_1 ∧ ζ_2, and ζ_1 ∨ ζ_2
are all stopping times, and F_{ζ_1∧ζ_2} ⊆ F_{ζ_1} ∩ F_{ζ_2}. Finally, for any A ∈ F_{ζ_1},
A ∩ {ζ_1 ≤ ζ_2} ∈ F_{ζ_1∧ζ_2}.
Proof: Since {ζ ≤ s} ∩ {ζ ≤ t} = {ζ ≤ s ∧ t} ∈ F_t, it is clear that ζ is
F_ζ-measurable. Next, suppose that X is a progressively measurable function.
To prove that X(ζ) is F_ζ-measurable, begin by checking that { ω : (ζ(ω), ω) ∈
A } ∈ F_t for any A ∈ B_{[0,t]} × F_t. Indeed, this is obvious when A = [0, s] × B for
s ∈ [0, t] and B ∈ F_t and, since these generate B_{[0,t]} × F_t, follows in general.
Now, for any t ≥ 0 and Γ ∈ B,

   A(t, Γ) ≡ { (τ, ω) ∈ [0, ∞) × Ω : ( τ, X(τ, ω) ) ∈ [0, t] × Γ } ∈ B_{[0,t]} × F_t,

and therefore

   { X(ζ) ∈ Γ } ∩ {ζ ≤ t} = { ω : (ζ(ω), ω) ∈ A(t, Γ) } ∈ F_t.
As for f ∘ ζ when f satisfies the stated conditions, simply note that

   { f ∘ ζ ≤ t } = { ζ ≤ f^{−1}(t) } ∈ F_t,

where f^{−1}(t) ≡ inf{ τ : f(τ) ≥ t } ≤ t.
Next suppose that ζ_1 and ζ_2 are two stopping times. It is trivial to see that
ζ_1 ∧ ζ_2 and ζ_1 ∨ ζ_2 are again stopping times. In addition, if Q denotes the set of
rational numbers, then

   { ζ_1 + ζ_2 > t } = { ζ_1 > t } ∪ ⋃_{q∈Q∩[0,1]} { ζ_1 ≥ qt & ζ_2 > (1 − q)t } ∈ F_t.
and so

   E^P[ X(ζ_n), X(ζ_n) ≥ α ] ≤ E^P[ X(T + 1), X(ζ_n) ≥ α ]
      ≤ E^P[ X(T + 1), sup_{t∈[0,T+1]} X(t) ≥ α ].

Starting from here, noting that ζ_n ↘ ζ as n → ∞, and applying Fatou's Lemma,
we arrive at

(*)   E^P[ X(ζ), X(ζ) > α ] ≤ E^P[ X(T + 1), sup_{t∈[0,T+1]} X(t) ≥ α ].

Hence, since, by Theorem 7.1.9, P( sup_{t∈[0,T+1]} X(t) ≥ α ) tends to 0 as α → ∞,
this proves the first assertion. When {X(t) : t ≥ 0} is uniformly integrable, we
can replace (*) by

   E^P[ X(ζ ∧ T), X(ζ ∧ T) > α ] ≤ E^P[ X(∞), sup_{t≥0} X(t) ≥ α ]

for any stopping time ζ and T > 0. Hence, after another application of Fatou's
Lemma, we get

   E^P[ X(ζ), X(ζ) > α ] ≤ E^P[ X(∞), sup_{t≥0} X(t) ≥ α ].

At the same time, the first inequality in Theorem 7.1.9 can be replaced by

   P( sup_{t≥0} X(t) ≥ α ) ≤ (1/α) E^P[ X(∞), sup_{t≥0} X(t) ≥ α ] ≤ (1/α) E^P[ X(∞) ],

and so the asserted uniform integrability follows.
It turns out that in the continuous time context, Doob’s Stopping Time The-
orem is most easily seen as a corollary of Hunt’s. Thus, I will begin with Hunt’s.
(ζ_i)_n is an {F_{m2^{−n}} : m ≥ 0}-stopping time and that F_{ζ_1} ⊆ F_{(ζ_1)_n}, and apply
Theorem 5.2.13 to the discrete parameter submartingale ( X(m2^{−n}), F_{m2^{−n}}, P )
in order to see that

   E^P[ X((ζ_1)_n), A ] ≤ E^P[ X((ζ_2)_n), A ],   A ∈ F_{ζ_1},

with equality in the martingale case. Because of right-continuity and Lemma
7.1.13, X((ζ_i)_n) −→ X(ζ_i) in L^1(P; R), and so we have now shown that
X(ζ_1) ≤ E^P[ X(ζ_2) | F_{ζ_1} ], with equality in the martingale case.
with equality in the martingale case. Letting first T and then t tend to infinity,
one gets the same relationship for X(ζ1 ) and X(ζ2 ), initially with A ∩ {ζ1 < ∞}
and then, trivially, with A alone.
Theorem 7.1.15 (Doob's Stopping Time Theorem). If (X(t), F_t, P) is
either a non-negative, integrable submartingale or a martingale, then, for every
stopping time ζ, ( X(t ∧ ζ), F_t, P ) is either an integrable submartingale or a
martingale.
Proof: Given 0 ≤ s < t and A ∈ F_s, note that A ∩ {ζ > s} ∈ F_{s∧ζ} and
therefore, by Hunt's Theorem applied to the stopping times s ∧ ζ and t ∧ ζ, that

   E^P[ X(t ∧ ζ), A ] = E^P[ X(ζ), A ∩ {ζ ≤ s} ] + E^P[ X(t ∧ ζ), A ∩ {ζ > s} ]
   ≥ E^P[ X(ζ), A ∩ {ζ ≤ s} ] + E^P[ X(s ∧ ζ), A ∩ {ζ > s} ] = E^P[ X(s ∧ ζ), A ],
Proof: By elementary measure theory, all that we have to show is that, for
each B ∈ F_ζ contained in {ζ < ∞}, Q_µ( (δ_ζ^{−1}Γ) ∩ B ) = Q_µ(Γ) Q_µ(B).
Given B ∈ F_ζ contained in {ζ < ∞} with Q_µ(B) > 0, choose T > 0 so that
Q_µ(B_T) > 0 when B_T = B ∩ {ζ ≤ T}, and define Q_T ∈ M_1( D(R^N) ) so that

   Q_T(Γ) = Q_µ( (δ_ζ^{−1}Γ) ∩ B_T ) / Q_µ(B_T).

since ψ ↦ e^{−√−1(ξ,ψ(ζ))_{R^N} + ζℓ_µ(ξ)} 1_A(δ_ζ ψ) 1_{B_T}(ψ) is F_{s+ζ∧T}-measurable.
§ 7.1.5. An Integration by Parts Formula. In this subsection I will derive
a simple result that has many interesting applications.
Theorem 7.1.17. Suppose V : [0, ∞) × Ω −→ C is a right-continuous, progres-
sively measurable function, and let |V|(t, ω) ∈ [0, ∞] denote the total variation
var_{[0,t]}( V( · , ω) ) of V( · , ω) on the interval [0, t]. Then |V| : [0, ∞) × Ω −→ [0, ∞]
is a non-decreasing, progressively measurable function that is right-continuous
on each interval [0, t) for which |V|(t, ω) < ∞. Next, suppose that (X(t), F_t, P)
is a C-valued martingale with the property that, for each (t, ω) ∈ (0, ∞) × Ω,
the product ‖X( · , ω)‖_{[0,t]} |V|(t, ω) < ∞, and define

   Y(t, ω) = ∫_{(0,t]} X(s, ω) V(ds, ω) if |V|(t, ω) < ∞,   and Y(t, ω) = 0 otherwise,

where, in the case when |V|(t, ω) < ∞, the integral is the Lebesgue integral of
X( · , ω) on [0, t] with respect to the C-valued measure determined by V( · , ω).
If

   E^P[ ‖X‖_{[0,T]} ( |V|(T) + |V(0)| ) ] < ∞   for all T ∈ (0, ∞),

then ( X(t)V(t) − Y(t), F_t, P ) is a martingale.
   |V|(t, ω) = sup_{n∈N} ∑_{k=0}^{[2^n t]} | V( (k+1)/2^n ∧ t, ω ) − V( k/2^n, ω ) |;

   lim_{n→∞} ∑_{k=[2^n s]}^{[2^n t]} X( (k+1)/2^n ∧ t, ω ) [ V( (k+1)/2^n ∧ t, ω ) − V( k/2^n ∨ s, ω ) ].
In fact, under the stated integrability condition, the convergence in the preceding
takes place in L1 (P; R) for every t ∈ [0, ∞); and therefore, for any 0 ≤ s ≤ t < ∞
and A ∈ F_s,

   E^P[ Y(t) − Y(s), A ]
   = lim_{n→∞} ∑_{k=[2^n s]}^{[2^n t]} E^P[ X( (k+1)/2^n ∧ t ) ( V( (k+1)/2^n ∧ t ) − V( k/2^n ∨ s ) ), A ]
   = lim_{n→∞} ∑_{k=[2^n s]}^{[2^n t]} E^P[ X(t) ( V( (k+1)/2^n ∧ t ) − V( k/2^n ∨ s ) ), A ]
   = E^P[ X(t) ( V(t) − V(s) ), A ] = E^P[ X(t)V(t) − X(s)V(s), A ],
and clearly this is equivalent to the asserted martingale property.
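One can see the integration by parts formula at work in a simulation. The sketch
below (mine) takes X to be a discretized Brownian motion and V(t) = sin t, and
checks the necessary condition that E[ X(t)V(t) − Y(t) ] stay constant (= 0); note
that this tests only the mean, not the full martingale property:

    import numpy as np

    rng = np.random.default_rng(4)
    dt, n, paths = 1e-3, 1000, 50_000
    X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(paths, n)), axis=1)
    t = dt * np.arange(1, n + 1)
    V = np.sin(t)                                   # deterministic, of bounded variation
    dV = np.diff(V, prepend=0.0)
    Y = np.cumsum(X * dV, axis=1)                   # ~ int over (0,t] of X dV
    M = X * V - Y                                   # candidate martingale
    print(M[:, [99, 499, 999]].mean(axis=0))        # E[M(t)] stays ~ 0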
We will make frequent practical applications of Theorem 7.1.17 later, but
here I will show that it enables us to prove that there is an important dichotomy
between continuous martingales and functions of bounded variation. However,
before doing so, I need to make a small, technical digression.
A function ζ : Ω −→ [0, ∞] is an extended stopping time relative to
{F_t : t ∈ [0, ∞)} if {ζ < t} ∈ F_t for every t ∈ (0, ∞). Since {ζ < t} ∈ F_t for any
stopping time ζ, it is clear that every stopping time is an extended stopping time.
On the other hand, not every extended stopping time is a stopping time. To wit,
if X : [0, ∞) × Ω −→ R is a right-continuous, progressively measurable function
relative to { σ( {X(τ) : τ ∈ [0, t]} ) : t ≥ 0 }, then ζ = inf{t ≥ 0 : X(t) > 1} will
always be an extended stopping time but will seldom be a stopping time.
Lemma 7.1.18. For each t ≥ 0, set F_{t+} = ⋂_{τ>t} F_τ. Then ζ : Ω −→ [0, ∞]
is an extended stopping time if and only if it is a stopping time relative to
{F_{t+} : t ≥ 0}. Moreover, if (X(t), F_t, P) is either a non-negative, integrable
submartingale or a martingale, then so is (X(t), F_{t+}, P). In particular, if ζ is
an extended stopping time, then ( X(t ∧ ζ), F_{t+}, P ) is a non-negative, integrable
submartingale or a martingale.
Proof: The first assertion is immediate from {ζ ≤ t} = ⋂_{τ>t} {ζ < τ}. To prove
the second assertion, apply right-continuity and the first uniform integrability
result in Lemma 7.1.13 to see that if 0 ≤ s < t and A ∈ F_{s+}, then

   E^P[ X(s), A ] = lim_{τ↘s} E^P[ X(τ), A ] ≤ E^P[ X(t), A ],
and conclude that σ( {ψ^ζ(t) : t ≥ 0} ) ⊆ F_ζ. To prove the opposite inclu-
sion, show that if f : Ψ −→ R is F_ζ-measurable, then, for each t ∈ [0, ∞),
1_{{t}}( ζ(ψ) ) f(ψ) = 1_{{t}}( ζ(ψ^t) ) f(ψ^t), and thereby arrive at f(ψ) = f(ψ^ζ). Fi-
nally, use this together with Exercise 4.1.9 to show that f is σ( {ψ^ζ(t) : t ≥ 0} )-
measurable.
Exercise 7.1.22. Let (Ω, F, P) be a probability space and {F_t : t ∈ [0, ∞)}
a non-decreasing family of sub-σ-algebras of F. Denote by F̄ and F̄_t the com-
pletions of F and F_t with respect to P. If (X(t), F_t, P) is a submartingale or
martingale, show that (X(t), F̄_t, P) is also.
   W^{(1)}( ζ^a ≤ t ) = √(2/π) ∫_{a t^{−1/2}}^∞ e^{−y²/2} dy.

Now use the results in Exercise 3.3.14 (especially (3.3.16)) to conclude that the
W^{(1)}-distribution of ζ^a is ν^{1/2}_{2^{1/2}a}, the one-sided ½-stable law “at time 2^{1/2}a.”
(ii) Here is another, more conceptual way to understand the conclusion drawn
in (i) that the W^{(1)}-distribution of ζ^a is a one-sided ½-stable law. Namely, begin by
showing that if ψ(0) = 0 and ζ^a(ψ) < ∞, then ζ^{a+b}(ψ) = ζ^a(ψ) + ζ^b( δ_{ζ^a} ψ ). As
an application of Theorem 7.1.16, conclude from this that if β_a denotes the W^{(1)}-
distribution of ζ^a, then β_{a+b} = β_a ⋆ β_b. In particular, this means that β ≡ β_1
is infinitely divisible and that β̂_a = e^{aℓ_β}, where ℓ_β is the exponent appearing in
the Lévy–Khinchine formula for β̂.
(iii) Next, use Brownian scaling to see that, for all λ > 0, ζ^{λa} has the same W^{(1)}-
distribution as λ²ζ^a, and use this together with part (iii) of Exercise 3.3.12 to
see that the distribution of ζ^1 is ν^{1/2}_c for some c > 0.
(iv) Although we know from (i) that the constant c must be 2^{1/2}, here is an
easier way to find it. Use Exercise 7.1.23 to see that ( e^{λψ(t) − λ²t/2}, F_t, W^{(1)} )
is a martingale for every λ ∈ R, and apply Doob's Stopping Time Theorem and
the fact that W^{(1)}(ζ^a < ∞) = 1 to verify the identity E^{W^{(1)}}[ e^{−λ²ζ^a/2} ] = e^{−λa}
for λ > 0. Hence, the Laplace transform of ν^{1/2}_c is e^{−√(2λ)}, which, by the
calculation in part (iii) of Exercise 3.3.12, means that c = 2^{1/2}. Of course, this
calculation makes the preceding parts of this exercise unnecessary. Nonetheless,
it is interesting to see the Brownian explanation for the properties of the one-sided
½-stable laws.
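The identity E^{W^{(1)}}[ e^{−λ²ζ^a/2} ] = e^{−λa} is also easy to test by simulation.
The sketch below (mine) discretizes the Brownian path, so ζ^a is slightly
overestimated and the Monte Carlo figure comes out a touch below e^{−λa}:

    import numpy as np

    rng = np.random.default_rng(5)
    dt, n, paths, a, lam = 0.01, 10_000, 2_000, 1.0, 1.0
    B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(paths, n)), axis=1)
    hit = B >= a
    zeta = np.where(hit.any(axis=1), hit.argmax(axis=1) * dt, np.inf)
    print(np.exp(-0.5 * lam ** 2 * zeta).mean(), np.exp(-lam * a))  # both ~ 0.368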
   Q_µ( { ψ : ψ(t) ∈ Γ & ζ(ψ) ≤ t } ) = E^{Q_µ}[ µ_{t−ζ}( Γ − ψ(ζ) ), ζ ≤ t ],

where, as usual, µ_τ is determined by µ̂_τ = e^{τℓ_µ}. As a consequence,

   Q_µ( { ψ : ψ(t) ∈ Γ & ζ(ψ) > t } ) = µ_t(Γ) − E^{Q_µ}[ µ_{t−ζ}( Γ − ψ(ζ) ), ζ ≤ t ],
is a martingale for all φ ∈ C_c^∞(R^N; R). In this subsection, following Lévy,¹ I will
give another martingale characterization of Brownian motion, this time involving
many fewer test functions. On the other hand, we will have to assume ahead of
time that B( · , ω) ∈ C(R^N) for every ω ∈ Ω.
Theorem 7.2.1 (Lévy). Let B : [0, ∞) × Ω −→ R^N be a progressively mea-
surable function satisfying

   B(0, ω) = 0 and B( · , ω) ∈ C(R^N)   for every ω ∈ Ω.

Then (B(t), F_t, P) is a Brownian motion if and only if

   ( (ξ, B(t))_{R^N}, F_t, P )   and   ( (η, B(t))²_{R^N} − t|η|², F_t, P )

are martingales for all ξ, η ∈ R^N.
where

   ∆_n(ω) ≡ ( ξ, B(ζ_n(ω), ω) − B(ζ_{n−1}(ω), ω) )_{R^N}

and

   δ_n(ω) ≡ |ξ|² ( ζ_n(ω) − ζ_{n−1}(ω) ).

   D_n ≡ exp[ √−1 ∆_n + ½δ_n ] − 1

and

   M_n ≡ exp[ √−1 (ξ, B(ζ_{n−1}))_{R^N} + ½ |ξ|² ζ_{n−1} ].

By Taylor's Theorem,

   | D_n − ( √−1 ∆_n + ½δ_n ) − ½( √−1 ∆_n + ½δ_n )² |
      ≤ (1/6) | √−1 ∆_n + ½δ_n |³ e^{ | √−1 ∆_n + ½δ_n | }.
Hence, after rearranging terms, we see that D_n = √−1 ∆_n − ½( ∆²_n − δ_n ) + E_n,
In other words, we have now proved that, for every ε ∈ (0, 1], the difference
between the two sides of (*) is dominated by 2ε(1 + |ξ|²)(t − s) e^{|ξ|²(1+t)/2}, and so
the equality in (*) has been established.
As in Theorem 7.1.19, the subtlety here is in the use of the continuity as-
sumption. Indeed, the same example that demonstrated its importance there
does so again here. Namely, if {N(t) : t ≥ 0} is a simple Poisson process and
X(t) = N(t) − t, then both (X(t), F_t, P) and (X(t)² − t, F_t, P) are martingales,
but (X(t), F_t, P) is certainly not a Brownian motion.
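It costs only a few lines to see the compensated Poisson example concretely: at
each fixed t both martingale relations hold, yet X(t) is lattice-valued rather than
Gaussian (my sketch):

    import numpy as np

    rng = np.random.default_rng(6)
    t, paths = 5.0, 200_000
    X = rng.poisson(lam=t, size=paths) - t          # X(t) = N(t) - t
    print(X.mean(), (X ** 2 - t).mean())            # ~0, ~0: both martingale laws
    print(np.unique(X)[:5])                          # lattice-valued, not Gaussian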
§ 7.2.2. Doob–Meyer Decomposition, an Easy Case. The continuous pa-
rameter analog of Lemma 5.2.12 is a highly non-trivial result, one that was
proved by P.A. Meyer and led him to his profound analysis of stochastic pro-
cesses. Nonetheless, there is an important case in which Meyer’s result is rel-
atively easy to prove, and that is the case proved in this subsection. However,
before getting to that result, there is a rather fussy matter to be dealt with.
Lemma 7.2.2. For each n ∈ N, let X_n : [0, ∞) × Ω −→ R be a right-continuous,
progressively measurable function with the property that X_n( · , ω) is continuous
for P-almost every ω ∈ Ω. If

   lim_{m→∞} sup_{n>m} ‖X_n( · , ω) − X_m( · , ω)‖_{[0,t]} = 0   (a.s., P) for each t ∈ (0, ∞),
and so (τ, ω) ↦ τ ∧ ζ(ω) and therefore also (τ, ω) ↦ τ ∧ ζ(ω) − τ are progressively
measurable functions. Hence, since B = { (τ, ω) : τ ∧ ζ(ω) − τ ≥ 0 }, B is
progressively measurable.
Now define

   X(t, ω) = lim_{n→∞} X_n(t, ω) if (t, ω) ∈ A,
   X(t, ω) = 0 if (t, ω) ∈ B ∖ A,
   X(t, ω) = X( ζ(ω), ω ) if (t, ω) ∉ B.
where, in the passage to the second to last equality, I have used the fact that
X_{k−1,n} 1_A 1_{[ζ_{k−1,n},ζ_{k,n})}(s) is F_s-measurable and applied Theorem 7.1.14. At the
same time,

   E^P[ X_{k−1,n} ∆_{k,n}(t), A ∩ {ζ_{k−1,n} > s} ]
   = E^P[ X_{k−1,n} ( X(t ∧ ζ_{k,n}) − X(t ∧ ζ_{k−1,n}) ), A ∩ {s < ζ_{k−1,n} ≤ t} ]
   = E^P[ X_{k−1,n} ( X(t) − X(t) ), A ∩ {s < ζ_{k−1,n} ≤ t} ]
   = 0 = E^P[ X_{k−1,n} ∆_{k,n}(s), A ∩ {ζ_{k−1,n} > s} ],

where I have used the fact that X_{k−1,n} 1_A 1_{(s,t]}(ζ_{k−1,n}) is F_{t∧ζ_{k−1,n}}-measurable
and again applied Theorem 7.1.14 in getting the second to last line. After
combining these, one sees that E^P[ X_{k−1,n} ∆_{k,n}(t), A ] = E^P[ X_{k−1,n} ∆_{k,n}(s), A ],
which means that ( X_{k−1,n} ∆_{k,n}(t), F_t, P ) is a P-almost surely continuous mar-
tingale.
Given the preceding, it is clear that, for each n and ℓ, ( M_n(t ∧ ζ_{ℓ,n}), F_t, P )
is a P-almost surely continuous, square integrable martingale. In addition, for
k ≠ k′, X_{k−1,n} ∆_{k,n}(t ∧ ζ_{ℓ,n}) is orthogonal to X_{k′−1,n} ∆_{k′,n}(t ∧ ζ_{ℓ,n}) in L²(P; R).
Thus

   E^P[ sup_{0≤τ≤t∧ζ_{ℓ,n}} M_n(τ)² ] ≤ 4 E^P[ M_n(t ∧ ζ_{ℓ,n})² ]
   = 4 ∑_{k=1}^ℓ E^P[ X²_{k−1,n} ∆_{k,n}(t ∧ ζ_{ℓ,n})² ] ≤ 4C² ∑_{k=1}^ℓ E^P[ ∆_{k,n}(t ∧ ζ_{ℓ,n})² ]
   = 4C² E^P[ X(t ∧ ζ_{ℓ,n})² ] ≤ 4C² E^P[ X(t)² ],

from which it is easy to see that ( M_n(t), F_t, P ) is a P-almost surely continuous,
square integrable martingale.
I will now show that lim_{m→∞} sup_{n>m} ‖M_n − M_m‖_{[0,t]} = 0 P-almost surely
and in L²(P; R) for each t ∈ [0, ∞). To this end, define Y^{(m)}_{k−1,n} so that Y^{(m)}_{k−1,n}(ω)
= X_{k−1,n}(ω) − X_{ℓ−1,m}(ω) when ζ_{ℓ−1,m}(ω) ≤ ζ_{k−1,n}(ω) < ζ_{ℓ,m}(ω). Then Y^{(m)}_{k−1,n}
is F_{ζ_{k−1,n}}-measurable, |Y^{(m)}_{k−1,n}| ≤ 1/m (a.s., P), and M_n − M_m = ∑_{k=1}^∞ Y^{(m)}_{k−1,n} ∆_{k,n}.
Hence, by the same reasoning as above,

   E^P[ ‖M_n − M_m‖²_{[0,t]} ] ≤ 4 ∑_{k=1}^∞ E^P[ (Y^{(m)}_{k−1,n})² ∆_{k,n}(t)² ] ≤ (4/m²) E^P[ X(t)² ],
particular,

   lim sup_{t→∞} X(t)/√( 2⟨X⟩(t) log^{(2)}⟨X⟩(t) ) = 1
      = − lim inf_{t→∞} X(t)/√( 2⟨X⟩(t) log^{(2)}⟨X⟩(t) )

P-almost surely.
Proof: Clearly, given the first part, the last assertion is a trivial application of
Exercise 4.3.15.
After replacing F and the F_t's by their completions and applying Exercise
7.1.22, I may and will assume that X(0, ω) = 0, X( · , ω) is continuous, ⟨X⟩( · , ω)
is continuous and strictly increasing, and lim_{t→∞} ⟨X⟩(t, ω) = ∞ for every ω ∈ Ω.
Next, for each (t, ω) ∈ [0, ∞) × Ω, set ζ_t(ω) = ⟨X⟩^{−1}(t, ω), where ⟨X⟩^{−1}( · , ω) is the
inverse of ⟨X⟩( · , ω). Clearly, for each ω ∈ Ω, t ↦ ζ_t(ω) is a continuous, strictly
increasing function that tends to infinity as t → ∞. Moreover, because ⟨X⟩ is
progressively measurable, ζ_t is a stopping time for each t ∈ [0, ∞). Now set
B(t) = X(ζ_t). Since it is obvious that X(t) = B( ⟨X⟩(t) ), all that I have to
show is that ( B(t), F′_t, P ) is a Brownian motion for some non-decreasing family
{F′_t : t ≥ 0} of sub-σ-algebras.
Trivially, B(0, ω) = 0 and B( · , ω) is continuous for all ω ∈ Ω. In addi-
tion, B(t) is F_{ζ_t}-measurable, and so B is progressively measurable with respect
to {F_{ζ_t} : t ≥ 0}. Thus, by Theorem 7.2.1, I will be done once I show that
( B(t), F_{ζ_t}, P ) and ( B(t)² − t, F_{ζ_t}, P ) are martingales. To this end, first observe
that

   E^P[ sup_{τ∈[0,ζ_t]} X(τ)² ] = lim_{T→∞} E^P[ sup_{τ∈[0,T∧ζ_t]} X(τ)² ]
Thus, X(T ∧ ζ_t) −→ B(t) in L²(P; R) as T → ∞. Now let 0 ≤ s < t and A ∈ F_{ζ_s}
be given. Then, for each T > 0, A_T ≡ A ∩ {ζ_s ≤ T} ∈ F_{T∧ζ_s}, and so, by
Theorem 7.1.14,

   E^P[ X(T ∧ ζ_t), A_T ] = E^P[ X(T ∧ ζ_s), A_T ]

and

Now let T → ∞, and apply the preceding convergence assertion to get the
desired conclusion.
§ 7.2.3. Burkholder’s Inequality Again. In this subsection we will see what
Burkholder’s Inequality looks like in the continuous parameter setting, a result
whose importance for the theory of stochastic integration is hard to overstate.
Theorem 7.2.6 (Burkholder). Let ( X(t), F_t, P ) be a P-almost surely con-
tinuous, square integrable martingale. Then, for each p ∈ (1, ∞) and t ∈ [0, ∞)
(cf. (6.3.2)),

(7.2.7)   B_p^{−1} ‖X(t) − X(0)‖_{L^p(P;R)} ≤ E^P[ ⟨X⟩(t)^{p/2} ]^{1/p} ≤ B_p ‖X(t) − X(0)‖_{L^p(P;R)}.
Proof: After completing the σ-algebras if necessary, I may (cf. Exercise 7.1.22)
and will assume that X( · , ω) is continuous and that ⟨X⟩( · , ω) is continuous and
non-decreasing for every ω ∈ Ω. In addition, I may and will assume that X(0) =
0. Finally, I will assume that X is bounded. To justify this last assumption, let
ζ_n = inf{ t ≥ 0 : |X(t)| ≥ n }, set X_n(t) = X(t ∧ ζ_n), and use Exercise 7.2.10 to
see that one can take ⟨X_n⟩(t) = ⟨X⟩(t ∧ ζ_n). Hence, if we know (7.2.7) for bounded
martingales, then

   B_p^{−1} ‖X(t ∧ ζ_n)‖_{L^p(P;R)} ≤ E^P[ ⟨X⟩(t ∧ ζ_n)^{p/2} ]^{1/p} ≤ B_p ‖X(t ∧ ζ_n)‖_{L^p(P;R)}
for all n ≥ 1. Since ⟨X⟩ is non-decreasing, we can apply Fatou's Lemma to the
preceding and thereby get

   ‖X(t)‖_{L^p(P;R)} ≤ lim_{n→∞} ‖X(t ∧ ζ_n)‖_{L^p(P;R)} ≤ B_p E^P[ ⟨X⟩(t)^{p/2} ]^{1/p},

which is the left-hand side of (7.2.7). To get the right-hand side, note that either
‖X(t)‖_{L^p(P;R)} = ∞, in which case there is nothing to do, or ‖X(t)‖_{L^p(P;R)} < ∞,
in which case, by the second half of Theorem 7.1.9, X(t ∧ ζ_n) −→ X(t) in
L^p(P; R) and therefore

   E^P[ ⟨X⟩(t)^{p/2} ]^{1/p} = lim_{n→∞} E^P[ ⟨X⟩(t ∧ ζ_n)^{p/2} ]^{1/p}
      ≤ B_p lim_{n→∞} ‖X(t ∧ ζ_n)‖_{L^p(P;R)} = B_p ‖X(t)‖_{L^p(P;R)}.
Proceeding under the above assumptions and referring to the notation in the
proof of Theorem 7.2.3, begin by observing that, for any t ∈ [0, ∞) and n ∈
N, Theorem 7.1.14 shows that ( X(t ∧ ζ_{k,n}), F_{t∧ζ_{k,n}}, P ) is a discrete parameter
martingale indexed by k ∈ N. In addition, ζ_{k,n} = t for all but a finite number
of k's. Hence, by (6.3.7) applied to ( X(t ∧ ζ_{k,n}), F_{t∧ζ_{k,n}}, P ),

   B_p^{−1} ‖X(t)‖_{L^p(P;R)} ≤ E^P[ ⟨X⟩_n(t)^{p/2} ]^{1/p} ≤ B_p ‖X(t)‖_{L^p(P;R)}   for all n ∈ N.

In particular, this shows that sup_{n≥0} ‖⟨X⟩_n(t)‖_{L^p(P;R)} < ∞ for every p ∈ (1, ∞),
and therefore, since ⟨X⟩_n(t) −→ ⟨X⟩(t) (a.s., P), this is more than enough to
verify that E^P[ ⟨X⟩_n(t)^{p/2} ] −→ E^P[ ⟨X⟩(t)^{p/2} ] for every p ∈ (1, ∞).
(i) Given R ∈ (0, ∞), set ζ_R = inf{ t ≥ 0 : |X(t)| ≥ R }, and show that

   ( e^{X(t∧ζ_R)} − 1 − ½ ∫_0^{t∧ζ_R} e^{X(τ)} d⟨X⟩(τ), F_t, P )

is a martingale.
Hint: Choose F ∈ C_c^∞(R; R) so that F(x) = e^x for x ∈ [−2R, 2R], apply
Exercise 7.2.8 to this F, and then use Doob's Stopping Time Theorem.
(ii) Apply Theorem 7.1.17 to the martingale in (i) and e^{−½⟨X⟩(t∧ζ_R)} to show
that ( E(t ∧ ζ_R), F_t, P ) is a martingale.

(iii) By replacing X and R with 2X and 2R in (ii), show that
Finally, given this estimate, show that the conclusion in Exercise 7.2.8 continues
to hold for any F ∈ C 2 (R; C) whose second derivative has at most exponential
growth.
   X(t) = f( t, B(t) ) − ∫_0^t ( ∂_τ + ½∆ ) f( τ, B(τ) ) dτ,

   Y(t) = g( t, B(t) ) − ∫_0^t ( ∂_τ + ½∆ ) g( τ, B(τ) ) dτ,

   f( t, B(t) )² − 2X(t) ∫_0^t ( ∂_τ + ½∆ ) f( τ, B(τ) ) dτ
      − ( ∫_0^t ( ∂_τ + ½∆ ) f( τ, B(τ) ) dτ )²,
Similarly,

   E^P[ exp( √−1 (ξ, Z̃(t))_{R^N} − t ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ e^{2√−1(ξ,Z(t∧ζ))_{R^N}} exp( −√−1 (ξ, Z(t))_{R^N} − t ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ exp( √−1 (ξ, Z(t ∧ ζ))_{R^N} − (t ∧ ζ) ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ exp( √−1 (ξ, Z(s ∧ ζ))_{R^N} − (s ∧ ζ) ℓ_µ(ξ) ), A ∩ {ζ > s} ]
   = E^P[ exp( √−1 (ξ, Z̃(s))_{R^N} − s ℓ_µ(ξ) ), A ∩ {ζ > s} ].
To check that

   P( {B(t) ∈ Γ} ∩ A_1(t) ) − P( {B(t) ∈ 2(a_2 − a_1) + Γ} ∩ A_1(t) ) ≥ 0   when Γ ∈ B_{[a_1,∞)},

   ∑_{m=1}^M [ P( B(t) ∈ 2a_1 − 2(m − 1)(a_2 − a_1) − Γ ) − P( B(t) ∈ 2m(a_2 − a_1) + Γ ) ]
      + P( {B(t) ∈ 2M(a_2 − a_1) + Γ} ∩ A_1(t) )

for all Γ ∈ B_{[a_1,∞)}. The same line of reasoning applies when Γ ∈ B_{(−∞,a_2]} and
A_1(t) is replaced by A_2(t).
Perhaps the most useful consequence of the preceding is the following corollary.
Then

(7.3.5)   P^I(s + t, x, Γ) = ∫_I P^I(t, z, Γ) P^I(s, x, dz).

Next, set

   g̃(t, x) = ∑_{m∈Z} g(t, x + 4m),   where g(t, x) = (2πt)^{−1/2} e^{−x²/2t},

and
Then p^{(−1,1)} is a smooth function that is symmetric in (x, y), strictly positive
on (0, ∞) × (−1, 1)², and vanishes when x ∈ {−1, 1}. Finally, if

then

(7.3.6)   p^I(s + t, x, y) = ∫_I p^I(s, x, z) p^I(t, z, y) dz,
where, in the passage to the second line, I have used Brownian scaling. Now,
use the last part of Theorem 7.3.3, the symmetry of γ_{0,r^{−2}t}, and elementary
rearrangement of terms to arrive first at

   P^I(t, x, Γ) = ∑_{m∈Z} [ γ_{r^{−2}t}( 4m + r^{−1}(Γ − x) ) − γ_{r^{−2}t}( 4m + 2 + r^{−1}(Γ + x − 2c) ) ],

and then at P^I(t, x, dy) = p^I(t, x, y) dy. Given this and (7.3.5), (7.3.6) is obvious.
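For I = (−1, 1) the series is pleasant to compute. The sketch below (mine) assumes
the rearranged form p^{(−1,1)}(t, x, y) = g̃(t, y − x) − g̃(t, y + x + 2), which is what
the display above specializes to when r = 1 and c = 0, and checks symmetry,
vanishing at x = ±1, and sub-probability mass:

    import numpy as np

    def g(t, x):                                    # the Gauss kernel above
        return np.exp(-x * x / (2 * t)) / np.sqrt(2 * np.pi * t)

    def p(t, x, y, M=50):                           # g~(t,y-x) - g~(t,y+x+2)
        m = 4 * np.arange(-M, M + 1)
        return g(t, y - x + m).sum() - g(t, y + x + 2 + m).sum()

    t = 0.7
    print(p(t, 0.2, -0.5), p(t, -0.5, 0.2))         # symmetric in (x, y)
    print(p(t, 1.0, 0.3), p(t, -1.0, 0.3))          # vanishes at x = +-1
    ys = np.linspace(-1, 1, 4001)
    print(sum(p(t, 0.2, y) for y in ys) * (ys[1] - ys[0]))  # total mass <= 1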
Turning to the properties of p^{(−1,1)}(t, x, y), both its symmetry and smooth-
ness are clear. In addition, as the density for P^{(−1,1)}(t, x, · ), it is non-negative,
and, because x ↦ g̃(t, x) is periodic with period 4, it is easy to see that
p^{(−1,1)}(t, ±1, y) = 0. Thus, everything comes down to proving that p^{(−1,1)}(t, x, y)
> 0 for (t, x, y) ∈ (0, ∞) × (−1, 1)². To this end, first observe that, after rear-
ranging terms, one can write p^{(−1,1)}(t, x, y) as

Since each of the terms in the sum over m ∈ Z^+ is positive, we have that

   p^{(−1,1)}(t, x, y) > g(t, y − x) ( 1 − 2e^{−2(1−|x|)(1−|y|)/t} ) ≥ ( 1 − 2e^{−1} ) g(t, y − x)

if t ≤ 2(1 − |x|)(1 − |y|). Hence, for each θ ∈ (0, 1), p^{(−1,1)}(t, x, y) > 0 for all
(t, x, y) ∈ (0, 2θ²] × [−1 + θ, 1 − θ]². Finally, to handle x, y ∈ [−1 + θ, 1 − θ] and
t > 2θ², apply (7.3.6) with I = (−1, 1) to see that

   p^{(−1,1)}( (m + 1)θ², x, y ) ≥ ∫_{|z|≤1−θ} p^{(−1,1)}(θ², x, z) p^{(−1,1)}(mθ², z, y) dz,

and use this and induction to see that p^{(−1,1)}(mθ², x, y) > 0 for all m ≥ 1. Thus,
if n ∈ Z^+ is chosen so that nθ² < t ≤ (n + 1)θ², then another application of
(7.3.6) shows that

   p^{(−1,1)}(t, x, y) ≥ ∫_{|z|≤1−θ} p^{(−1,1)}(t − nθ², x, z) p^{(−1,1)}(nθ², z, y) dz > 0.
and set

   P^G(t, x, Γ) = W^{(N)}( { ψ : x + ψ(t) ∈ Γ & ζ^G_x(ψ) > t } ).

This is the probabilistic version of Duhamel's Formula, which we will see again
in § 10.3.1.
(iii) As a consequence of (ii), show that there is a Borel measurable function
p^G : (0, ∞) × G² −→ [0, ∞) such that (t, y) ↦ p^G(t, x, y) is continuous for
each x ∈ G and P^G(t, x, dy) = p^G(t, x, y) dy for each (t, x) ∈ (0, ∞) × G. In
particular, use this in conjunction with (i) to conclude that

   p^G(s + t, x, y) = ∫_G p^G(t, z, y) p^G(s, x, z) dz.
Hint: Keep in mind that (τ, ξ) ↦ (2πτ)^{−N/2} e^{−|ξ|²/2τ} is smooth and bounded as
long as ξ stays away from the origin.
(iv) Given c = (c_1, . . . , c_N) ∈ R^N and r > 0, let Q(c, r) denote the open cube
∏_{i=1}^N (c_i − r, c_i + r), and show that (cf. Corollary 7.3.4)

   p^{Q(c,r)}(t, x, y) = ∏_{i=1}^N p^{(c_i−r,c_i+r)}(t, x_i, y_i).
1 See I.E. Segal’s “Distributions in Hilbert space and canonical systems of operators,” T.A.M.S.,
88 (1958) and L. Gross’s “Abstract Wiener spaces,” Proc. 5th Berkeley Symp. on Prob. &
Stat., 2 (1965), Univ. of California Press. A good exposition of this topic can be found in
H.-H. Kuo’s Gaussian Measures in Banach Spaces, Springer-Verlag, Math. Lec. Notes., # 463
(1975).
Proof: It is obvious that the inclusion map taking Θ(RN ) into C(RN ) is con-
tinuous. To see that k · kΘ(RN ) is lower semicontinuous on C(RN ) and that
Θ(RN ) ∈ BC(RN ) , note that, for any s ∈ [0, ∞) and R ∈ (0, ∞),
n o
A(s, R) ≡ ψ ∈ C(RN ) : ψ(t) ≤ R(1 + t) for t ≥ s
by
θ (es )
F (θ) (s) = , s ∈ R.
1 + es
1 I use |λ| to denote the variation measure determined by λ.
§ 8.1 The Classical Wiener Space 301
As is well known, C0 R; RN with the uniform norm is a separable Banach space,
N N
and it is obvious that F is an isometry from Θ(R ) onto
C0 R; R . Moreover,
by the Riesz Representation Theorem for C0 R; RN , one knows that the dual
of C0 R; RN is isometric to the space of totally finite, RN -valued measures
on R; BR with the norm given by total variation. Hence, the identification
∗
of Θ(RN ) reduces to the obvious interpretation of the adjoint map F ∗ as a
mapping from totally finite RN -valued measures onto the space of RN -valued
measures that do not charge 0 and whose variation measure integrates (1 + t).
Because of the Strong Law in part (ii) of Exercise 4.3.11, it is clear that almost
every Brownian path is in Θ(RN ). In addition, by the Brownian scaling property
and Doob’s Inequality (cf. Theorem 7.1.9),
∞
X
−n+1
kBk2Θ(RN ) 2
P
P
E ≤ 4 E sup |B(t)|
n=0 0≤t≤2n
X∞
−n+2 2
≤ 32EP |B(1)|2 = 32N.
P
= 2 E sup |B(t)|
n=0 0≤t≤1
Turning to the properties of µ̂, note that its continuity with respect to weak*
convergence is an immediate consequence of Lebesgue’s Dominated Convergence
Theorem. Furthermore, in view of the preceding, we will know that µ̂ completely
determines µ as soon as we show that, for each n ∈ Z+ and X ∗ = x∗1 , . . . , x∗n ∈
n
E ∗ , µ̂ determines the marginal distribution µX ∗ ∈ M1 (RN ) of
x ∈ E 7−→ hx, x∗1 i, . . . , hx, x∗n i ∈ Rn
I will now compute the Fourier transform of W (N) . To this end, first recall
that, for an RN -valued Brownian motion, { ξ, B(t) RN : t ≥ 0 and ξ ∈ RN
spans a Gaussian family G(B) in L2 (P; R). Hence, span ξ, θ(t) : t ≥
0 and ξ ∈ RN is a Gaussian family in L2 (W (N ) ; R). From this, combined
with an easy limit argument using Riemann sum approximations, one sees that,
∗
for any λ ∈ Θ(RN ) , θ hθ, λi is a centered Gaussian random variable under
W (N ) . Furthermore, because, for 0 ≤ s ≤ t,
(N ) (N )
EW ξ, θ(s) RN η, θ(t) RN = EW
ξ, θ(s) RN η, θ(s) RN = s ξ, η RN ,
we can apply Fubini’s Theorem to see that
ZZ
(N )
EW hθ, λi2 =
s ∧ t λ(ds) · λ(dt).
[0,∞)2
Obviously, nothing very significant has happened yet, since nothing very excit-
ing has been done yet. However, if we now close our eyes, suspend our disbelief,
and pass to the limit as n tends to infinity and the tk ’s become dense, we arrive
at Feynman’s representation 2 of Wiener’s measure:
" Z #
(N ) 1 1 2
(8.1.4) W dθ) = exp − θ̇(t) dt dθ,
Z 2 [0,∞)
2 In truth, Feynman himself never dabbled in considerations so mundane as the ones that
√
follow. He was interested in the Schödinger equation, and so he had a factor −1 multiplying
the exponent.
304 8 Gaussian Measures on a Banach Space
is Lebesgue measure for every n ∈ Z+ and 0 < t1 · · · < tn , dθ must be the nonex-
istent translation invariant measure on the infinite dimensional space Θ(RN ). Fi-
nally, the integral in the exponent only makes sense if θ is differentiable in some
sense, but almost no Brownian path is. Nonetheless, ridiculous as it is, (8.1.4)
is exactly the expression at which one would arrive if one were to make a suffi-
ciently naı̈ve interpretation of the notion that Wiener measure is the standard
Gauss measure on the Hilbert space H(RN ) consisting of absolutely continuous
h : [0, ∞) −→ RN with h(0) = 0 and
Θ(RN ) is clear. Knowing this, abstract reasoning (cf. Lemma 8.2.3) guarantees
∗
that Θ(RN ) can be identified as a subspace of H1 (RN ). That is, for each λ ∈
∗
Θ(RN ) , there is a hλ ∈ H1 (RN ) with the property that h, hλ H1 (RN ) = hh, λi
for all h ∈ H1 (RN ), and in the present setting it is easy to give a concrete
∗
representation of hλ . In fact, if λ ∈ Θ(RN ) , then, for any h ∈ H1 (RN ),
Z Z Z !
hh, λi = h(t) · λ(dt) = ḣ(τ ) dτ · λ(dt)
(0,∞) (0,∞) (0,t)
Z
= ḣ(τ ) · λ (τ, ∞) dτ = h, hλ H1 (RN ) ,
(0,∞)
where Z
hλ (t) = λ (τ, ∞) dτ.
(0,t]
§ 8.1 The Classical Wiener Space 305
Moreover,
Z Z ZZ
khλ k2H1 (RN ) = λ (τ, ∞) |2 dτ =
λ(ds) · λ(dt) dτ
(0,∞) (0,∞)
(τ,∞)2
ZZ
= s ∧ t λ(ds) · λ(dt).
(0,∞)2
Hence, by (8.1.3),
khλ k2H(RN )
!
∗
(8.1.5) \
W (N ) (λ) = exp − , λ ∈ Θ(RN ) .
2
Although (8.1.5) is far less intuitively appealing than (8.1.4), it provides a
mathematically rigorous way in which to think of W (N ) as the standard Gaussian
measure on H1 (RN ). Furthermore, there is another way to understand why one
should accept (8.1.5) as evidence for this way of thinking about W (N ) . Indeed,
∗
given λ ∈ Θ(RN ) , write
Z Z T
hθ, λi = lim θ(t) · λ(dt) = − lim θ(t) · dλ (t, ∞) ,
T →∞ [0,T ] T →∞ 0
where the integral in the last expression is taken in the sense of Riemann–
Stieltjes. Next, apply the integration by part formula3 to conclude that t
λ (t, ∞) is Riemann–Stieltjes integrable with respect to t θ(t) and that
Z T Z T
− θ(t) · dλ (t, ∞) = −θ(T ) · λ (T, ∞) + λ (t, ∞) · dθ(t).
0 0
Hence, since
|θ(T )|
Z
lim |θ(T )||λ|(T, ∞) ≤ lim (1 + t) |λ|(dt) = 0,
T →∞ T →∞ 1 + T (0,∞)
Z T
(8.1.6) hθ, λi = lim ḣλ (t) · dθ(t),
T →∞ 0
where again the integral is in the sense of Riemann–Stieltjes. Thus, if one
dt, one can believe that hθ, λi provides a
somewhat casually writes dθ(t) = θ̇(t)
reasonable interpretation of θ, hλ H(RN ) for all θ ∈ Θ(RN ), not just those that
are in H1 (RN ).
Because R. Cameron and T. Martin were the first mathematicians to system-
atically exploit the consequences of this line of reasoning, I will call H1 (RN ) the
Cameron–Martin space for classical Wiener measure.
3See, for example, Theorem 1.2.7 in my A Concise Introduction to the Theory of Integration,
Birkhäuser (1999).
306 8 Gaussian Measures on a Banach Space
is an algebra that generates BH . Show that there always exists a finitely additive
WH on A that is uniquely determined by the properties that it is σ-additive on
A(g1 , . . . , gn ) for every n ∈ Z+ and {g1 , . . . , gn } ⊆ H and that
Z h√ i
kgk2H
exp −1 (h, g)H WH (dh) = exp − , g ∈ H.
H 2
On the other hand, as we already know, this finitely additive measure admits a
countably additive extension to BH if and only if H is finite dimensional.
§ 8.2 A Structure Theorem for Gaussian Measures
Say that a centered Gaussian measure W on a separable Banach space E is
non-degenerate if EW hx, x∗ i2 > 0 unless x∗ = 0. (See Exercise 8.2.11.) In
this section I will show that any non-degenerate, centered Gaussian measure W
on a separable Banach space E shares the same basic structure that W (N ) has
on Θ(RN ). In particular, I will show that there is always a Hilbert space H ⊆ E
for which W is the standard Gauss measure in the same sense that W (N ) was
shown in § 8.1.2 to be the standard Gauss measure for H1 (RN ).
§ 8.2.1. Fernique’s Theorem. In order to carry out my program, I need a
basic integrability result about Banach space–valued, Gaussian random vari-
ables. The one that I will use is due to X. Fernique, and his is arguably the
most singularly beautiful result in the theory of Gaussian measures on a Banach
space.
Theorem 8.2.1 (Fernique’s Theorem). Let E be a real, separable Banach
space, and suppose that X is an E-valued random variable that is centered and
Gaussian in the sense that, for each x∗ ∈ E ∗ , hX, x∗ i is a centered, R-valued
Gaussian random variable. If R = inf{r : P(kXkE ≤ r) ≥ 34 )}, then
h kXk2E i ∞ 2n
1
X e
(8.2.2) E e 18R2 ≤ K ≡ e 2 + .
n=0
3
Proof: After enlarging the sample space if necessary, I may and will assume
that there is an E-valued random variable X 0 that is independent of X and has
1 1
the same distribution as X. Set Y = 2− 2 (X + X 0 ) and Y 0 = 2− 2 (X − X 0 ).
Then the pair (Y, Y 0 ) has the same distribution as the pair (X, X 0 ). Indeed, by
2
Lemma 8.1.2, this comes down to showing that the R ∗-valued random variable
hY, x i, hY , x i has the same distribution as hX, x i, hX 0 , x∗ i , and that is
∗ 0 ∗
3
Now suppose that P kXk ≤ R ≥ 4, and define {tn : n ≥ 0} by t0 = R and
1
tn = R + 2 2 tn−1 for n ≥ 1. Then
2
P kXkE ≤ R P kXkE ≥ tn ≤ P kXkE ≥ tn−1
and therefore
!2
P kXkE ≥ tn P kXkE ≥ tn−1
≤
P kXkE ≤ R P kXkE ≤ R
for n ≥ 1. Working by induction, one gets from this that
!2n
P kXkE ≥ tn P kXkE ≥ R
≤
P kXkE ≤ R P kXkE ≤ R
n+1
2 −1 n+1 n+1 n
and therefore, since tn = R 2 R ≤ 3−2 .
1 ≤ 32 2 R, that P kXkE ≥ 32 2
2 2 −1
Hence,
h kXk2E i ∞
1 X n n n+1
e2 P 32 2 R ≤ kXkE ≤ 32 2 R
EP e 18R2 ≤ e 2 P kXkE ≤ 3R +
n=0
∞ n
1
X e 2
≤ e2 + = K.
n=0
3
§ 8.2.2. The Basic Structure Theorem. I will now abstract the relationship,
proved in § 8.1.2, between Θ(RN ), H1 (RN ), and W (N ) , and for this purpose I
will need the following simple lemma.
308 8 Gaussian Measures on a Banach Space
Lemma 8.2.3. Let E be a separable, real Banach space, and suppose that
H ⊆ E is a real Hilbert space that is continuously embedded as a dense subspace
of E.
(i) For each x∗ ∈ E ∗ there is a unique hx∗ ∈ H with the property that
h, hx∗ H = hh, x∗ i for all h ∈ H, and the map x∗ ∈ E ∗ 7−→ hx∗ ∈ H is
linear, continuous, one-to-one, and onto a dense subspace of H.
(ii) If x ∈ E, then x ∈ H if and only if there is a K < ∞ such that |hx, x∗ i| ≤
Kkhx∗ kH for all x∗ ∈ E ∗ . Moreover, for each h ∈ H, khkH = sup{hh, x∗ i : x∗ ∈
E ∗ & kx∗ kE ∗ ≤ 1}.
(iii) If L∗ is a weak* dense subspace of E ∗ , then there exists a sequence {x∗n :
n ≥ 0} ⊆ L∗ such that {hx∗n : n ≥ 0} is an orthonormal basis for H. Moreover,
P∞
if x ∈ E, then x ∈ H if and only if n=0 hx, x∗n i2 < ∞. Finally,
∞
X
h, h0 hh, x∗n ihh0 , x∗n i for all h, h0 ∈ H.
H
=
n=0
2
P∞ ∗ 2
P∞ ∗ 2
P khkH ∗= n=0 hh, xn i . Finally, if x ∈ E and
In particular, n=0 hx, xn i < ∞,
set g = m=0 hx, xn ihx∗n . Then g ∈ H and hx − g, x i = 0 for all x∗ ∈ S ∗ .
∗
The terminology is justified by the fact, demonstrated at the end of § 8.1.2,
that H1 (RN ), Θ(RN ), W (N ) is an abstract Wiener space. The concept of an
abstract Wiener space was introduced by Gross, although his description was
somewhat different from the one just given (cf. Theorem 8.3.9 for a reconciliation
of mine with his definition).
Theorem 8.2.5. Suppose that E is a separable, real Banach space and that
W ∈ M1 (E) is a centered Gaussian measure that is non-degenerate. Then there
exists a unique Hilbert space H such that (H, E, W) is an abstract Wiener space.
q
Proof: By Fernique’s Theorem, we know that C ≡ EW kxk2E < ∞.
To understand the proof of existence, it is best to start with the proof of
uniqueness. Thus, suppose that H is a Hilbert space for which (E, H, W) is an
abstract Wiener space. Then, for all x∗ , y ∗ ∈ E ∗ , hhx∗ , y ∗ i = (hx∗ , hy∗ )H =
hhy∗ , x∗ i. In addition,
Z
∗
hhx∗ , x i = khx∗ k2H = hx, x∗ i2 W(dx),
310 8 Gaussian Measures on a Banach Space
Moreover, by (*),
Z
∗ ∗ ∗
hhx∗ , y i = xhx, x i W(dx), y for all y ∗ ∈ E ∗ ,
and so Z
(***) hx∗ = xhx, x∗ i W(dx).
and therefore both (*) and (***) hold for this choice of H. Further, given (*), it
is clear that khx∗ k2H is the variance of h · , x∗ i and therefore that (8.2.4) holds.
At the same time, just as in the derivation of (**), kF (ψ)kE ≤ CkψkL2 (W;R) =
CkF (ψ)kH , and so H is continuously embedded inside E. Finally, by the Hahn–
Banach Theorem, to show that H is dense in E it suffices to check that the only
x∗ ∈ E ∗ such Rthat hF (ψ), x∗ i = 0 for all ψ ∈ Ψ is x∗ = 0. But when ψ = h · , x∗ i,
hF (ψ), x∗ i = hx, x∗ i2 W (dx), and therefore, because W is non-degenerate, such
an x∗ would have to be 0.
§ 8.2.3. The Cameron–Marin Space. Given a centered, non-degenerate
Gaussian measure W on E, the Hilbert space H for which (H, E, W) is an ab-
stract Wiener space is called its Cameron–Martin space. Here are a couple
of important properties of the Cameron–Martin subspace.
§ 8.2 A Structure Theorem for Gaussian Measures 311
Now choose ` so that max1≤m≤n |hhk − h, x∗m i| < for all k ≥ `. Then, for any
x∗ ∈ BE ∗ (0, 1) and all k ≥ `,
Since, by the uniform boundedness principle, supk≥1 khk kH < ∞, this proves
that khk − hkE = sup{hhk − h, x∗ i : x∗ ∈ BE ∗ (0, 1)} −→ 0 as k → ∞.
S∞
Because H = 1 BH (0, n) and BH (0, n) is a compact subset of E for each
n ∈ Z+ , it is clear that H ∈ BE . To see that W(H) = 0 when E is infinite
dimensional, choose {x∗n : n ≥ 0} as in the final part of Lemma 8.2.3, and
set Xn (x) = hx, x∗n i. Then the Xn ’s are an infinite
P∞ sequence of independent,
centered, Gaussians with mean value 1, and so n=0 Xn2 = ∞ W-almost surely.
Hence, by Lemma 8.2.3, W-almost no x is in H.
Turning to the map I, define I(hx∗ ) = h · , x∗ i. Then, for each x∗ , I(hx∗ ) is
a centered Gaussian with variance khx∗ k2H , and so I is a linear isometry from
312 8 Gaussian Measures on a Banach Space
1 − kh−gk2H
e 2 λH (dh),
Z
which could be rewritten as
(8.2.8) (τg )∗ W(dx) (dh) = Rg (x) W (dx), where Rg = exp I(g) − 12 kgk2H .
That (8.2.8) is correct was proved for the classical Wiener space by Cameron
and Martin, and for this reason it is called the Cameron–Martin formula. In
fact, one has the following result, the second half of which is due to Segal.
Exercises for § 8.2 313
for all ξ1 , ξ2 ∈ C. Indeed, this is obvious when ξ1 and ξ2 are pure imaginary,
and, since both sides are entire functions of (ξ1 , ξ2 ) ∈ C2 , it follows in general
by analytic
√ continuation. In particular, by taking h1 = g, ξ1 = 1, h2 = hx∗ , and
ξ2 = −1, it is easy to check that the right-hand side of (*) is equal to ν̂(x∗ ).
To prove the second assertion, begin by recalling from Lemma 8.2.3 that if
y ∈ E, then y ∈ H if and only if there is a K < ∞ with the property that
|hy, x∗ i| ≤ K for all x∗ ∈ E ∗ with khx∗ kH = 1. Now suppose that (τx∗ )∗ W 6⊥
W, and let R be the Radon–Nikodym derivative of its absolutely continuous
part. Given x∗ ∈ E ∗ with khx∗ kH = 1, let Fx∗ be the σ-algebra generated by
x hx, x∗ i, and check that (τy )∗ W Fx∗ W Fx∗ with Radon–Nikodym
derivative
hy, x∗ i2
∗ ∗
Y (x) = exp hy, x ihx, x i − .
2
Hence,
1 2
Y ≥ EW R Fx∗ ≥ EW R 2 Fx∗ ,
hy, x∗ i2
1 1
exp − = EW Y 2 ≥ α ≡ EW R 2 ∈ (0, 1].
8
whether one can choose a countable subset C ⊆ N such that x ∈ Ê if and only
if hx, x∗ i = 0 for all x∗ ∈ C. For this purpose, recall that, by Exercise 5.1.19, E ∗
with the weak* topology is second countable and therefore that N is separable
with respect to the weak* topology.
Exercise 8.2.12. Let {xP n : n ≥ 0} be a sequence in the P
separable Banach space
∞ ∞
E with the property that n=0 kxn kE < ∞. Show that n=0 |ξn |kxP n k < ∞ for
∞
N
γ0,1 -almost every ξ ∈ RN , and define X : RN −→ E so that X(ξ) = n=0 ξn xn
P∞
if n=0 |ξn |kxn kE < ∞ and X(ξ) = 0 otherwise. Show that the distribution
µ of X is a centered, Gaussian measure on E. In addition, show that µ is
non-degenerate if and only if the span of {xn : n ≥ 0} is dense in E.
Exercise 8.2.13. Here an application of Fernique’s Theorem to functional anal-
ysis. Let E and F be a pair of separable Banach spaces and ψ a Borel measurable,
linear map from E to F . Given a centered, Gaussian E-valued random variable
X, use Exercise 2.3.21 see that ψ ◦ X is an F -valued, a centered Gaussian ran-
dom variable, and apply Fernique’s Theorem to conclude that ψ ◦ X is a square
integrable and has mean value 0. Next, suppose that ψ is not continuous, and
choose {xn : n ≥ 0} ⊆ E and {yn : n ≥ 0} ⊆ F ∗ so that kxn kE = 1 = kyn ∗ kF ∗
and hψ(xn ), yn∗ i ≥ n + 13 . Using Exercise 8.2.12, show that there exist cen-
tered, Gaussian F -valued random variables {Xn : n ≥ 0},P {X n : n ≥ 0},
N −2 ∞
and X under γ0,1 such that Xn (ξ) = (n + 1) ξn xn , X(ξ) = n=0 Xn (ξ), and
X n (ξ) = X(ξ) − Xn (ξ) for γ0,1
N
-almost every ξ ∈ RN . Show that
Z Z
kψ ◦ X(ξ)k2F N
γ0,1 ≥ hψ ◦ X(ξ), yn∗ i γ0,1
(dξ) N
(dξ)
Z
≥ hψ ◦ Xn (ξ), yn∗ i γ0,1
N
(dξ) ≥ (n + 1),
EP kSkpE = σ p EW kxkpE
for all p ∈ [0, ∞).
Hint: Using Exercise 8.2.11, reduce to the case when W is non-degenerate. For
this case, let H be the Cameron–Martin space for W on E, and show that
h √ ∗
i σ2 2
EP e −1hS,x i = e− 2 khx∗ kH for all x∗ ∈ E ∗ .
Exercise 8.2.15. Referring to the setting in Lemma 8.2.3, show that there is a
(n)
sequence {k · kE : n ≥ 0} of norms on E each of which is commensurate with
(N )
k · kE (i.e., Cn−1 k · k ≤ k · kE ≤ Cn k · k for some Cn ∈ [1, ∞)) such that, for
each R > 0,
(n)
BH (0, R) = {x ∈ E : kxkE ≤ R for all n ≥ 0}.
which is remarkably close to the equality that holds when E = R. See Corollary
8.4.3 for a sharper statement.
Exercise 8.2.17. Again let E be a separable, real Banach space. Suppose that
{Xn : n ≥ 1} is a sequence for centered, Gaussian E-valued random variables on
some probability space (Ω, F, P) and that Xn −→ X in P-probability. Show that
X is again a centered,
Gaussian random variable and that there exists a λ > 0
2
for which supn≥1 EP eλkXn kE < ∞. Conclude, in particular, that Xn −→ X in
Lp (P; E) for every p ∈ [1, ∞).
316 8 Gaussian Measures on a Banach Space
Exercise 8.2.18. Given λ ∈ Θ(RN )∗ , I pointed out at the end of § 8.1.2 that the
Paley–Wiener integral
[I(hλ )](θ) can be interpreted as the Riemann–Stieltjes
integral of λ (s, ∞) with respect to θ(s). In this exercise, I will use this obser-
vation as the starting point for what is called stochastic integration.
(i) Given λ ∈ Θ(RN )∗ and t > 0, set λt (dτ ) = 1[0,t) (τ )λ(dτ ) + δt λ [t, ∞) , and
where again the integral on the right is Riemann–Stieltjes. Use this to see that
the process Z t
f (τ ) · dθ(τ ) : t ≥ 0
0
Of course, unless f has bounded variation, the integrals in the preceding are
no longer interpretable as Riemann–Stieltjes integrals. In fact, they not even
defined θ by θ but only as a stochastic process. For this reason, they are called
stochastic integrals.
§ 8.3 From Hilbert to Abstract Wiener Space 317
Theorem 8.3.1 says that there is a one-to-one correspondence between the ab-
stract Wiener spaces associated with one Hilbert space and the abstract Wiener
spaces associated with any other. In particular, it allows us to prove the theorem
of Gross which states that every Hilbert space is the Cameron–Martin space for
some abstract Wiener space.
Corollary 8.3.2. Given a separable, real Hilbert space H, there exists a
separable Banach space E and a W ∈ M1 (E) such that (H, E, W) is an abstract
Wiener space.
Proof: Let F : H 1 (R) −→ H be an isometric isomorphism, and use Theorem
8.3.1 to construct a separable Banach space E and an isometric, isomorphism
F̃ : Θ(R) −→ E so that (H, E, W) is an abstract Wiener space when W =
F̃∗ W (1) .
It is important to recognize that although a non-degenerate, centered Gaussian
measure on a Banach space E determines a unique Cameron–Martin space H,
a given H will be the Cameron–Martin space for an uncountable number of
abstract Wiener spaces. For example, in the classical case when H = H1 (RN ),
we could have replaced Θ(RN ) by a subspace which reflected the fact that almost
every Brownian path is locally Hölder continuous of any order less than a half.
We will see a definitive, general formulation of this point in Corollary 8.3.10.
§ 8.3.2. Wiener Series. The proof that I gave of Corollary 8.3.2 is too non-
constructive to reveal much about the relationship between H and the abstract
Wiener spaces for which it is the Cameron–Martin space. Thus, in this sub-
section I will develop another, entirely different way of constructing abstract
Wiener spaces for a Hilbert space.
The approach here has its origins in one of Wiener’s own constructions of
Brownian motion and is based on the following line of reasoning. Given H,
choose an orthonormal basis {hn : n ≥ 0}. If there were a standard Gauss
measure W on H, then the random variables {Xn : n ≥ 0} given by Xn (h) =
h, hn H would be independent, standard normal, R-valued random variables,
P∞
and, for each h ∈ H, 0 Xn (h)hn would converge in H to h. Even though
W cannot live on H, this line of reasoning suggests that a way to construct an
abstract Wiener space is to start with a sequence {Xn : n ≥ 0} of R-valued,
independent standard normalPrandom variables on some probability space, find
∞
a Banach space E in which 0 Xn hn converges with probability 1, and take
W on E to the distribution of this series.
§ 8.3 From Hilbert to Abstract Wiener Space 319
To convince oneself that this line of reasoning has a chance of leading some-
where, one should observe that Lévy’s construction corresponds to a particu-
lar choice of the orthonormal basis {hm : m ≥ 0}.1 To see this, determine
{ḣk,n : (k, n) ∈ N2 } by
on k21−n , (2k + 1)2−n
1
n−1
−1 on (2k + 1)2−n , (k + 1)21−n
ḣk,0 = 1[k,k+1) and ḣk,n = 2 2
0 elsewhere
for n ≥ 1. Clearly, the ḣk,n ’s are orthonormal in L2 [0, ∞); R . In addition, for
each n ∈ N, the span of {ḣk,n : k ∈ N} equals that of {1[k2−n ,(k+1)2−n ) : k ∈ N}.
Perhaps the easiest way to check this is to do so by dimension counting. That
is, for a given (`, n) ∈ N2 , note that
has the same number of elements as {1[k2−n ,(k+1)2−n ) : `2n ≤ k < (` + 1)2n }
and that the first set is contained in the span of the second. As a consequence,
we know that {ḣk,n : (k, n) ∈ N2 } is an orthonormal basis in L2 [0, ∞); R , and
Rt
so, if hk,n (t) = 0 ḣk,n (τ ) dτ and (e1 , . . . , eN ) is an orthonormal basis in RN ,
then
hk,n,i ≡ hk,n ei : (k, n, i) ∈ N2 × {1, . . . , N }
is
an orthonormal basis, known as the Haar basis, in H1 (RN ). Finally, if
2
Xk,n,i : (k, n, i) ∈ N ×{1, . . . , N } is a family of independent, N (0, 1)-random
PN
variables and Xk,n = i=1 Xk,n,i ei , then
X ∞ X
n X N ∞
n X
X
Xk,m,i hk,m,i (t) = hk,m (t)Xk,m
m=0 k=0 i=1 m=0 k=0
1 The observation that Lévy’s construction (cf. § 4.3.2) can be interpreted in terms of a Wiener
series is due to Z. Ciesielski. To be more precise, initially Ciesielski himself was thinking
entirely in terms of orthogonal series and did not realize that he was giving a re-interpretation
of Lévy’s construction. Only later did the connection become clear.
320 8 Gaussian Measures on a Banach Space
and if S : RN −→ E is given by
P∞
m=0 ξm hm when the series converges in E
S(ξ) =
0 otherwise,
then H, E, W with W = S∗ γ0,1 N
is an abstract Wiener space. Conversely, if
(H, E, W) is an abstract Wiener space and {hm : m ≥ 0} is an orthogonal
sequence in H such that, for each m ∈ N, either hm = 0 or khm kH = 1, then
"
p #
Xn
W
(8.3.5) sup
I(hm )hm
< ∞ for all p ∈ [1, ∞),
E
n≥0
m=0
E
P∞
and, for W-almost every x ∈ E, m=0 [I(hm )](x)hm converges
in E to the
W-conditional expectation value of x given σ {I(hm ) : m ≥ 0} . Moreover,
∞
X ∞
X
[I(hm )](x)hm is W-independent of x − [I(hm )](x)hm .
m=0 m=0
h √ i n
N Y 1 2 1 2
c ∗ ) = lim Eγ0,1
W(x e −1hSn ,λi = lim e− 2 (hx∗ ,hm )H = e− 2 khx∗ kH ,
n→∞ n→∞
m=0
§ 8.3 From Hilbert to Abstract Wiener Space 321
W
Hence, I(h) is F -measurable for every h ∈ H. In particular, this means that
W
x hx, x∗ i is F -measurable for every x∗ ∈ E ∗ , and so, since BE is generated
W
by {h · , x∗ i : x∗ ∈ E ∗ }, BE ⊆ F .
It is important to acknowledge that the preceding theorem does not give an-
other proof of Wiener’s theorem that Brownian motion exists. Instead, it simply
says that, knowing it exists, there are lots of ways in which to construct it. See
Exercise 8.3.21 for a more satisfactory proof of the same conclusion in the clas-
sical case, one that does not require the a priori existence of W (N ) .
The following result shows that, in some sense, a non-degenerate, centered,
Gaussian measure W on a Banach space does not fit on a smaller space.
Corollary 8.3.6. If W is a non-degenerate, centered Gaussian measure on
a separable Banach space E, then E is the support of W in the sense that W
assigns positive probability to every non-empty open subset of E.
Proof: Let H be the Cameron–Martin space for W. Since H is dense in E, it
suffices to show that W BE (g, r) > 0 for every g ∈ H and r > 0. Moreover,
since, by the Cameron–Martin formula (8.2.8) (cf. Exercise 8.2.19)
kgk2
q
H
≤e 2 W BE (g, r) ,
322 8 Gaussian Measures on a Banach Space
I need only show that W BE (0, r) > 0 for all r > 0. To this end, choose an
Pn
orthonormal basis {hm : m ≥ 0} in H, and set Sn = m=0 I(hm )hm . Then, by
Theorem 8.3.3, x Sn (x) is W-independent of x x − Sn (x) and
Sn (x) −→ x
in E for W-almost every x ∈ E. Hence, W {kx − Sn (x)kE < 2r } ≥ 12 for some
n ∈ N, and therefore
W BE (0, r) ≥ 12 W kSn kE < 2r .
Pn
But kSn k2E ≤ CkSn k2H = m=0 I(hm )2 for some C < ∞, and so
n+1
W kSn kE < 2r ≥ γ0,1 r
BRn+1 0, 2C > 0 for any r > 0.
for some Km < ∞ which depends only on `m . Moreover, and because L ⊥ Lnm ,
g̃m,i ⊥ Lnm for all 1 ≤ i ≤ `m. Hence, we can find an nm+1 ≥ nm + `m so that
span {hn : nm < n ≤ nm+1 } admits an orthonormal basis {fnm +1 , . . . , fnm+1 }
P`
with the property that 1m kgm,i − fnm +i kH ≤ 4 .
Clearly {fn : n ≥ 0} is an orthonormal basis for H. On the other hand,
2 12
2 12
nmX+`m
X`m
EW
I(fn )fn
≥ − EW
I(gm,i )gm,i − I(fnm +i )fnm +i
n=nm +1 E 1 E
`m
X
2 1
EW
I(gm,i )gm,i − I(fnm +i )fnm +i
H 2 ,
≥−
1
2 1
and so, since EW
I(gi,m )gm,i − I(fnm +i )fnm +i
H 2 is dominated by
2 1 1
EW
I(gm,i ) − I(fnm +i ) gm,i
H 2 + EW I(fnm +i )2 2 kgm,i − fnm +i kH
≤ 2kgm,i − fnm +i kH ,
324 8 Gaussian Measures on a Banach Space
we have that
2 12
nmX+`m
EW
I(fn )fn
≥ for all m ≥ 0,
n +1
2
m E
P∞
and this means that 0 I(fn )fn cannot be converging in L2 (W; E).
Besides showing that my definition of an abstract Wiener space is the same
as Gross’s, Theorem 8.3.9 allows us to prove a very convincing statement, again
due to Gross, of just how non-unique is the Banach space for which a given
Hilbert space is the Cameron–Martin space.
Corollary 8.3.10. If (H, E, W) is an abstract Wiener space, then there
exists a separable Banach space E0 that is continuously embedded in E as a
measurable subset and has the properties that W(E 0 ) = 1, bounded subsets of
E0 are relatively compact in E, and (H, E0 , W E0 is again an abstract Wiener
space.
Proof: Again I will assume that k · kE ≤ k · kH .
Choose {x∗n : n ≥ 0} ⊆ E ∗ so that {hn : n ≥ 0} is an orthonormal basis
in H when hn = hx∗n , and set Ln = span {h0 , . . . , hn } . Next, using Theo-
rem 8.3.9, choose an increasing sequence {nm : m ≥ 0} so that n0 = 0 and
1
EW kPL xk2E 2 ≤ 2−m for m ≥ 1 and finite dimensional L ⊥ Lnm , and define
Pm
Finally, set Sm = PLnm = `=0 Q` , and define E0 to be the set of x ∈ E such
that
∞
X
`2
Q` xkE < ∞
kxkE0 ≡ kQ0 xkE + and kSm x − xkE −→ 0.
`=1
and therefore k · kE0 is certainly a norm on E0 . Next, suppose that the sequence
{xk : k ≥ 1} ⊆ E0 is a Cauchy sequence with respect to k · kE0 . By the
preceding, we know that {xk : k ≥ 1} is also Cauchy convergent with respect to
§ 8.3 From Hilbert to Abstract Wiener Space 325
Thus, by choosing k for a given > 0 so that sup`>k kx` − xk kE0 < , we
conclude that limm→∞ kx − Sm xkE < and therefore that Sm x −→ x in E.
Hence, x ∈ E0 . Finally, to see that xk −→ x in E0 , simply note that
∞
X
kx − xk kE0 = kQ0 (x − xk )kE + m2 kQm (x − xk )kE
m=1
∞
!
X
2
≤ lim kQ0 (x` − xk )kE + m kQm (x` − xk )kE ≤ sup kx` − xk kE0 ,
`→∞ m=1 `>k
which tends to 0 as k → ∞.
To show that bounded subsets of E0 are relatively compact in E, it suffices
to show that if {x` : ` ≥ 1} ⊆ BE0 (0, R), then there is an x ∈ E to which a
subsequence converges in E. For this purpose, observe that, for each m ≥ 0,
there is a subsequence {x`k : k ≥ 1} along which Sm x`k converges in Lnm .
Hence, by a diagonalization argument, {x`k : k ≥ 1} can be chosen so that
{Sm x`k : k ≥ 1} converges in Lnm for all m ≥ 0. Since, for 1 ≤ j < k,
X
kx`k − x`j kE ≤ kSm x`k − Sm x`j kE + kQn (x`k − x`j )kE
n>m
X 1
≤ kSm x`k − Sm x`j kE + 2R ,
n>m
n2
1 2 2 khkE khkE
W
1
≥ E I(h) = .
2m khk2H khkH
Hence, we now know that h ⊥ Lnm =⇒ khkE ≤ 2−m khkH . In particular,
kQm+1 hkE ≤ 2−m kQm+1 hkH ≤ 2−m khkH for all m ≥ 0 and h ∈ H, and so
∞ ∞
!
X
2
X m2
khkE0 = kQ0 hkE + m kQm hkE ≤ 1 + 2 m
khkH = 25khkH .
m=1 m=1
2
To complete the proof, I must show that H is dense in E0 and that, for each
c0 (y ∗ ) = e− 12 khy∗ k2H , where W0 = W E0 and hy∗ ∈ H is determined
y ∗ ∈ E0∗ , W
by h, hy∗ H = hh, y ∗ i for h ∈ H. Both these facts rely on the observation that
X
kx − Sm xkE0 = n2 kQn xkE −→ 0 for all x ∈ E0 .
n>m
n
X htm − htm−1
PL θ = θ(tm ) − θ(tm−1 ) ,
t − tm−1
m=1 m
and so
θ(t1 ,... ,tn ) (t) ≡ [θ − PL θ](t)
t−tm−1
(
(8.3.11) θ(t) − θ(tm−1 ) − tm −tm−1 θ(tm ) − θ(tm−1 ) if t ∈ [tm−1 , tm ]
=
θ(t) − θ(tn ) if t ∈ [tn , ∞).
§ 8.3 From Hilbert to Abstract Wiener Space 327
Thus, if (θ, ~y) ∈ Θ(RN ) × (RN )n 7−→ θ(t1 ,... ,tn ),~y ∈ Θ(RN ) is given by
n
X htm − htm−1
θ(t1 ,... ,tn ),~y = θ(t1 ,... ,tn ) + (ym − ym−1 ),
t − tm−1
m=1 m
then
Z
F θ, θ(t1 ) − θ(t0 ), . . . , θ(tn ) − θ(tn−1 ) W (N ) (dθ)
Θ(RN )
(8.3.13) Z Z
(N )
= F θ̌(t1 ,... ,tn ),~y , y W
~ (dθ) γ0,D(t1 ,... ,tn ) (d~y),
(RN )n Θ(RN )
where D(t1 , . . . , tn )(m,i),(m0 ,i0 ) = (tm − tm−1 )δm,m0 δi,i0 for 1 ≤ m, m0 ≤ n and
1 ≤ i, i0 ≤ N is the covariance matrix for θ(t1 ) − θ(t0 ), . . . , θ(tn ) − θ(tn−1 )
under W (N ) .
There are several comments that should be made about these conclusions. In
the first place, it is clear from (8.3.11) that t θ(t1 ,... ,tn ) (t) returns to the origin
at each of the times {tm : 1 ≤ m ≤ n}. In addition, the excursions θ(t1 ,... ,tn )
[tm−1 , tm ], 1 ≤ m ≤ n, are independent of each other and of θ(t1 ,... ,tn ) [tn , ∞).
(N )
Secondly, if W(t1 ,... ,tn ),~y denotes the W (N ) -distribution of θ θ(t1 ,... ,tn ),~y , then
(8.3.12) says that
(N )
θ W(t1 ,... ,tn ),(θ(t1 ),... ,θ(tn ))
has distribution W. Hence, all that remains is to check that I(h)◦TO = I(O> h)
W-almost surely for each h ∈ H. To this end, let x∗ ∈ E ∗ , and observe that
∞
X
∗
[I(hx∗ )](TO x) = hTO x, x i = hx∗ , Ohm H
[I(hm )](x)
m=0
∞
X
O> hx∗ , hm
= H
[I(hm )](x)
m=0
for W-almost every x ∈ E. Thus, since, by Lemma 8.3.7, the last of these
series convergences W-almost surely to I(O> hx∗ ), we have that I(hx∗ ) ◦ TO =
§ 8.3 From Hilbert to Abstract Wiener Space 329
I(O> hx∗ ) W-almost surely. To handle general h ∈ H, simply note that both
h ∈ H 7−→ I(h) ◦ TO ∈ L2 (W; R) and h ∈ H 7−→ I(O> h) ∈ L2 (W; R) are
isometric, and remember that {hx∗ : x∗ ∈ E ∗ } is dense in H.
I next want to discuss the possibility of TO being ergodic for some orthog-
onal transformations O. First notice that TO cannot be ergodic if O has a
Pn subspace L, since if {h1 , . . . , hn } were
non-trivial, finite dimensional invariant
an orthonormal basis for L, then m=1 I(hm )2 would be a non-constant, TO -
invariant function. Thus, the only candidates for ergodicity are O’s that have
no non-trivial, finite dimensional, invariant subspaces. In a more general and
highly abstract context, I. Segal2 showed that the existence of a non-trivial, fi-
nite dimensional subspace for O is the only obstruction to TO being ergodic.
Here I will show less.
Theorem 8.3.15. Let (H, E, W) be an abstract Wiener space. If O is an
orthogonal transformation
on H with the property that, for every g, h ∈ H,
limn→∞ On g, h H = 0, then TO is ergodic.
Proof: What I have to show is that any TO -invariant element Φ ∈ L2 (W; R)
is W-almost surely constant, and for this purpose it suffices to check that
lim EW (Φ ◦ TOn )Φ = 0
(*)
n→∞
2See I.E. Segal’s “Ergodic subsgroups of the orthogonal group on a real Hilbert Space,” Annals
of Math. 66 # 2, pp. 297–303 (1957). For a treatment in the setting here, see my article “Some
thoughts about Segals ergodic theorem,” Colloq. Math. 118 # 1, pp. 89-105 (2010).
330 8 Gaussian Measures on a Banach Space
where
I Bn
Cn = with Bn = hk , On h`
B>n I H 1≤k,`≤N
Perhaps the best tests for whether an orthogonal transformation satisfies the
hypothesis in Theorem 8.3.15 come from spectral theory. To be more precise, if
Hc and Oc are the space and operator obtained by complexifying H and O, the
Spectral Theorem for normal operators allows one to write
Z 2π √
−1α
Oc = e dEα ,
0
See Exercises 8.3.24, 8.3.25, and 8.5.15 for a more concrete examples.
Exercises for § 8.3
Exercise 8.3.16. The purpose of this exercise is to provide the linear algebraic
facts that I used in the proof of Theorem 8.3.9. Namely, I want to show that if
a set {h1 , . . . , hn } ⊆ H is approximately orthonormal, then the vectors hi differ
by very little from their Gram–Schmidt orthogonalization.
3This conclusion highlights the poverty of the result here in comparison to Segal’s result,
which says that TO is ergodic as soon as the spectrum of Oc is continuous.
Exercises for § 8.3 331
(i) Suppose that A = aij 1≤i,j≤n ∈ Rn ⊗Rn is a lower triangular matrix whose
diagonal entries are non-negative. Show that there is a Cn < ∞, depending only
on n, such that kIRn − Akop ≤ Cn kIRn − AA> kop .
Hint: Show that it suffices to treat the case when AA> ≤ 2IRn , and set ∆ =
IRn − AA> . Assuming that AA> ≤ 2IRn , work by induction on n, at each step
using the lower triangularity of A, to see that
12
`
1 X
|a` ` an ` | ≤ |∆n ` | + (AA> )n2 n a2` j if 1 ≤ ` < n
j=1
n−1
X
1 − a2n n ≤ |∆n n | + a2n ` .
`=1
(ii) Let {h1 , . . . , hn } ⊆ H, set B = (hi , hj )H 1≤i,j≤n , and assume that kIRn −
Bkop < 1. Show that the hi ’s are linearly independent.
(iii) Continuing part (ii), let {f1 , . . . , fn } be the orthonormal set obtained from
the hi ’s by the Gram–Schmidt orthogonalization procedure, and let A be the
matrix whose (i, j)th entry is (hi , fj )H . Show that A is lower triangular and
that its diagonal entries are non-negative. In addition, show that AA> = B.
(iv) By combining (i) and (iii), show that there is a Kn < ∞, depending only
on n, such that
n
X n
X
khi − fi kH ≤ Kn δi,j − (hi , hj )H .
i=1 i,j=1
Pn
Hint: Note that hi = j=1 aij fj and therefore that
n
X 2
khi − fi k2H = IRn − A ij
≤ nkIRn − Ak2op .
j=1
Exercise 8.3.18. Let (H, E, W) be an abstract Wiener space, and assume that
H is infinite dimensional. As was pointed out, {hx∗ : x∗ ∈ E ∗ } is the subspace of
g ∈ H for which there exists a C < ∞ with the property that |(h, g)H | ≤ CkhkE
for all h ∈ H. Show that for each g ∈ H there is separable Banach space Eg
that is continuously embedded as a Borel subset of E such that W(Eg ) = 1,
(H, Eg , W Eg ) is an abstract Wiener space, and |(h, g)H | ≤ khkEg for all
h ∈ H.
Hint: Refer to the notation used in the proof of Corollary 8.3.10. Choose nm %
1
gkH ≤ 2−m and EW kPL k2E 2 ≤ 2−m
∞ so that n0 = 0 and, for m ≥ 1, kΠL⊥ nm
for finite dimensional L ⊥ Lnm . Next, define Eg to be the space of x ∈ E with
the properties that PLnm x −→ x in E and
X
kxkEg ≡ kQ` xkE + Q` x, g H < ∞,
`=0
Pn`
where Q0 x = hx, x∗0 ihx∗0 and Q` x = ∗
n=n`−1 +1 hx, xn ihxn for ` ≥ 1. Using
∗
the reasoning in the proof of Corollary 8.3.10, show that Eg has the required
properties.
Exercise 8.3.19. Let N = 1. Using Theorem 8.3.3, take Wiener’s choice of or-
thonormal basis and check that there are independent, standard normal random
variables {Xm : m ≥ 1} under W (1) such that, for W (1) -almost almost every θ,
∞
1
X sin(πmt)
θ(t) = tX0 (θ) + 2 2 Xm (θ) , t ∈ [0, 1],
m=1
mπ
where the convergence is uniform. From this, show that, W (1) -almost surely,
1 ∞ √
X0 (θ)2 1 X Xm (θ)2 + 8X0 (θ)Xm (θ)
Z
2
θ(t) dt = + 2 ,
0 3 π m=1 m2
where the convergence of the series is absolute. Using the preceding, conclude
that, for any α ∈ (0, ∞),
This is a famous calculation that can be made using many different methods.
We will return to it in § 10.1.3. See, in addition, Exercise 8.4.7.
Hint: Use Euler’s product formula to see that
∞
d sinh t X 1
log = 2t for t ∈ R.
dt t n=1
n π + t2
2 2
Exercise 8.3.20. Related to the preceding exercise, but easier, is finding the
Laplace transform of the variance
!2
1 T 1 T
Z Z
2
VT (θ) ≡ θ(t) dt − θ(t) dt
T 0 T 0
of a Brownian path over the interval [0, T ]. To do this calculation, first use
Brownian scaling to show that
(1) (1)
EW e−αVT = EW e−αT V1 .
Next, use elementary Fourier series to show that (cf. part (iii) of Exercise 8.2.18)
R 2
∞ Z 1 2 X ∞ 1
X 0
f k (t) dθ(t)
V1 (θ) = 2 θ(t) cos(kπt) dt = ,
0 k2 π2
k=1 k=1
1
where fk (t) = 2 sin(kπt) for k ≥ 1. Since the fk ’s are orthonormal as elements
2
Exercise 8.3.21. The purpose of this exercise is to show that, without know-
ing ahead of time that W (N ) lives on Θ(RN ), for the Hilbert space H1 (RN ) one
can give a proof that any Wiener series converges γ0,1 N
-almost surely in Θ(RN ).
N
Thus, let {hm : m ≥ 0} be an orthonormal basis Pn in H(R ) and, for n ∈ N
and ω = (ω0 , . . . , ωm , . . . ) ∈ R , set Sn (t, ω) = m=0 ωm hm (t). The goal is to
N
(iii) As an application of Theorem 4.3.2, show that (*) will follow once one
checks that
N
γ0,1
E sup |Sn (t) − Sn (s)| ≤ B(t − s)2 , 0 ≤ s < t,
4
n≥0
(i) Show that the W (N ) -distribution of {θT (t) : t ≥ 0} is the same as that of
1
{T 2 θ1 (T −1 t) : t ≥ 0}.
(ii) Set H1T (RN ) = {h [0, T ] : h ∈ H1 (RN ) & h(T ) = 0}, and define
(N )
khkH1T (RN ) = kḣkL2 ([0,T ];RN ) . Show that the triple H1T (RN ), ΘT (RN ), WT
(N )
is an abstract Wiener space. In addition, show that WT is invariant under
time reversal. That is, {θ(t) : t ∈ [0, T ]} and {θ(T − t) : t ∈ [0, T ]} have the
(N )
same distribution under WT .
Hint: Begin by identifying ΘT (RN )∗ as the space of finite, RN -valued Borel
measures λ on [0, T ] such that λ({0}) = 0 = λ({T }).
(ii) Complete the program by showing that Oαn h, h0 H1 (RN ) tends to 0 for all
α ∈ (0, ∞) \ {1} and h, h0 ∈ H1 (RN ) with ḣ, ḣ0 ∈ Cc∞ (0, ∞); RN .
(iii) There is another way to think about the operator Oα . Namely, let λRN
be Lebesgue measure on R, define U : H(RN ) −→ L2 (λRN ; RN ) by U h(x) =
x
e 2 ḣ(ex ), and show that U is an isometry from H1 (RN ) onto L2 (λRN ; RN ). Fur-
ther, show that U ◦ Oα = τlog α ◦ U , where τα : L2 (λRN ; RN ) −→ L2 (λRN ; RN ) is
the translation map τα f (x) = f (x + α). Conclude from this that
√
Z
Oαn h, h0 = (2π)−1 e− −1nξ log α
H1 (RN )
Uch(ξ), U
d h0 CN
dξ,
R
(iv) As a consequence of the above and Theorem 6.2.7, show that for each
α ∈ (0, ∞) \ {1}, q ∈ [1, ∞), and F ∈ Lq (W (N ) ; C),
n−1
1 X (N )
F Sαn θ = EW [F ] W (N ) -almost surely and in Lq (W (N ) ; C).
lim
n→∞ n
m=0
khk2H
− inf◦ ≤ lim log W (Γ)
h∈Γ 2 &0
(8.4.2)
khk2H
≤ lim log W (Γ) ≤ − inf .
&0 h∈Γ 2
The original version of Theorem 8.4.1 was proved by M. Schilder for the clas-
sical Wiener measure using a method that does not extend easily to the general
case. The statement that I have given is due to Donsker and S.R.S. Varadhan,
and my proof derives from an approach (which very much resembles the argu-
ments given in § 1.3 to prove Cramér’s Theorem) that was introduced into this
context by Varadhan.
The lower bound is an easy application of the Cameron–Martin formula. In-
deed, all that I have to do is show that if h ∈ H and r > 0, then
khk2H
(*) lim log W BE (h, r) ≥ − .
&0 2
khx∗ k2H
BE (hx∗ , δ) ⊆ BE (h, r) =⇒ lim log W BE (hx∗ , r) ≥ −δkx∗ kE ∗ −
,
&0 2
and therefore, after letting δ & 0 and remembering that {hx∗ : x ∈ E ∗ } is dense
in H, that (*) holds.
338 8 Gaussian Measures on a Banach Space
The proof of the upper bound in (8.4.2) is a little more involved. The first step
is to show that it suffices to treat the case when Γ is relatively compact. To this
end, refer to Corollary 8.3.10, and set CR equal to the closure in E of BE0 (0, R).
2
By Fernique’s Theorem applied to W on E0 , we know that EW eαkxkE0 ≤ K <
Thus, if we can prove the upper bound for relatively compact Γ’s, then, because
Γ ∩ CR is relatively compact, we will know that, for all R > 0,
khk2H
∧ αR2
lim log W (Γ) ≤ − inf ,
&0 h∈Γ 2
kyk2
(
− 2 H if y ∈ H
(**) lim lim log W BE (y, r) ≤
r&0 &0 −∞ if y ∈
/ H.
To see that (**) is enough, assume that it is true and let Γ ∈ BE \{∅} be relatively
compact. Given β ∈ (0, 1), for each y ∈ Γ choose r(y) > 0 and (y) > 0 so that
(1−β)
( 2
e− 2 kykH if y ∈ H
W BE (y, r(y)) ≤ 1
− β
e if y ∈
/H
for all 0 < ≤ (y). Because Γ is relatively compact, we can find N ∈ Z+ and
SN
{y1 , . . . , yN } ⊆ Γ such that Γ ⊆ 1 BE (yn , rn ), where rn = r(yn ). Then, for
sufficiently small > 0,
1−β 2 1
W (Γ) ≤ N exp − inf khkH ∧ ,
2 h∈Γ β
and so
1−β 1
lim log W (Γ) ≤ − inf khk2H ∧ .
&0 2 h∈Γ β
Now let β & 0.
§ 8.4 A Large Deviations Result and Strassen’s Theorem 339
khx∗ k2
−1
−1 ∗ ∗ ∗ −1 ∗ H ∗
≤ e− (hy,x i−rkx kE∗ ) EW e 2 hx,x i = e− hy,x i− 2 −rkx kE∗ ,
Finally, note that the preceding supremum is the same as half the supremum
kyk2
of hy, x∗ i over x∗ with khx∗ kH = 1, which, by Lemma 8.2.3, is equal to 2 H if
y ∈ H and to ∞ if y ∈ / H.
An interesting corollary of Theorem 8.4.1 is the following sharpening, due to
Donsker and Varadhan, of Fernique’s Theorem.
Corollary 8.4.3. Let W be a non-degenerate, centered, Gaussian measure on
the separable Banach space E, let H be the associated Cameron–Martin space,
and determine Σ > 0 by Σ−1 = inf{khkH : khkE = 1}. Then
1
lim R−2 log W kxkE ≥ R = − 2 .
R→∞ 2Σ
α2 2
In particular, EW e 2 kxkE is finite if α < Σ−1 and infinite if α ≥ Σ−1 .
Proof: Set f (r) = inf{khkH : khkE ≥ r}. Clearly f (r) = rf (1) and f (1) =
Σ−1 . Thus, by the upper bound in (8.4.2), we know that
f (1)2 Σ−2
lim R−2 log W kxkE ≥ R = lim R−2 log WR−2 kxkE ≥ 1 ≤ −
= .
R→∞ R→∞ 2 2
α2 Λ2 √
Sn
(*) P ∈
/ G ≤ exp − for all n ∈ Z+ and Λ ≥ M n.
Λ 2n
§ 8.4 A Large Deviations Result and Strassen’s Theorem 341
To check (*), first note (cf. Exercise 8.2.14) that the distribution of Sn under
1
P is the same as that of x n 2 x under W and therefore that P S̃Λn ∈ /G =
W n2 (G{). Hence, (*) is really just an application of the upper bound in (8.4.2).
Λ
Given (*), I proceed in very much the same way as I did at the analogous place
in § 1.5. Namely, for any β ∈ (1, 2),
At this point in § 1.5 (cf. the proof of Lemma 1.5.3), I applied Lévy’s reflection
principle to get rid of the “max.” However, Lévy’s argument works only for
R-valued random variables, and so here I will replace his estimate by one based
on the idea in Exercise 1.4.25.
Lemma 8.4.5. Let {YmP: m ≥ 1} be mutually independent, E-valued random
n
variables, and set Sn = m=1 Ym for n ≥ 1. Then, for any closed F ⊆ E and
δ > 0,
P(kSn − F kE ≥ δ)
P max kSm − F kE ≥ 2δ ≤ .
1≤m≤n 1 − max1≤m≤n P(kSn − Sm kE ≥ δ)
Proof: Set
Am = {kSm − F kE ≥ 2δ and kSk − F kE < 2δ for 1 ≤ k < m}.
Following the hint for Exercise 1.4.25, observe that
P max kSm − F kE ≥ 2δ min P(kSn − Sm kE < δ)
1≤m≤n 1≤m≤n
n
X n
X
≤ P Am ∩ {kSn − Sm kE < δ} ≤ P Am ∩ {kSn − F kE ≥ δ} ,
m=1 m=1
which, because the Am ’s are disjoint, is dominated by P kSn − F kE ≥ δ .
Applying the preceding to the situation at hand, we see that
!
Sn
P max
≥ 2δ
− BH (0, 1)
1≤n≤β m
Λ[β m−1 ]
E
S[βm ]
− BH (0, 1)
≥ δ
P
Λ[β
m−1 ] E
≤ .
1 − max1≤n≤β m P kSn kE ≥ δΛ[β m−1 ]
342 8 Gaussian Measures on a Banach Space
After combining this with the estimate in (*), it is an easy matter to show that,
for each δ > 0, there is a β ∈ (1, 2) such that
∞
!
X
Sn
P max
≥ 2δ < ∞,
− BH (0, 1)
β m−1 ≤n≤β m Λ[β m−1 ]
m=1 E
from which it should be clear why limn→∞ kS̃n − BH (0, 1)kE = 0 P-almost surely.
The proof that, P-almost surely, limn→∞ kS̃n − hkE = 0 for all h ∈ BH (0, 1)
differs in no substantive way from the proof of the analogous assertion in the
second part of Theorem 1.5.9. Namely, because BH (0, 1) is separable, it suffices
to work with one h ∈ BH (0, 1) at a time. Furthermore, just as I did there, I can
reduce the problem to showing that, for each k ≥ 2, > 0, and h with khkH < 1,
∞
X
P
S̃km −km−1 − h
E < = ∞.
m=1
Exercise 8.4.6. Let (H, E, W) be an abstract Wiener space, and assume that
dim(H) = ∞. If W is defined for > 0 as in Theorem 8.4.1, show that
W1 ⊥ W2 if 2 6= 1 .
Hint: Choose {x∗m : m ≥ 0} ⊆ E ∗ so that {hx∗m : m ≥ 0} is an orthonormal
basis in H, and show that
n−1
1 X
lim hx, x∗m i2 = W -almost surely.
n→∞ n
m=0
Exercise 8.4.7. Show that the Σ in Corollary 8.4.3 is 12 in the case of the
classical abstract Wiener space H1 (RN ), Θ(RN ), W (N ) and therefore that
and that
!
2
lim R−2 log W (N )
sup |θ(τ )| ≥ R θ(t) = 0 = − .
R→∞ τ ∈[0,t] t
and that
t
π2
Z
−1 (N ) 2
lim R log W |θ(τ )| dτ ≥ R θ(t) = 0 = − 2 .
R→∞ 0 2t
Hint: In each case after the first, Brownian scaling can be used to reduce the
problem to the case when t = 1, and the challenge is to find the optimal constant
C for which khkE ≤ CkhkH , h ∈ H for the appropriate abstract Wiener space
N
(E, H, W). In the second case E =
C 0 [0, 1] : R ≡ θ [0, 1] : θ ∈ Θ(RN )
and H = η [0, 1] : η ∈ H1 (RN ) , in the third (cf. part (ii) of Exercise 8.3.22)
E = Θ1 (RN ) and H = H11 (RN ) , in the fourth E = L2 [0, 1]; N
R ) and H = {η
1 N 2 N 1 N
[0, 1] : η ∈ H (R )}, and in the fifth E = L [0, 1]; R and
H = H 1 (R ).
The optimization problems when E = Θ(RN ) or C0 [0, 1]; RN are rather easy
1
consequences of |η(t)| ≤ t 2 kηkH1 (RN ) . When E = Θ1 (RN ), one should start with
the observation that if η ∈ H11 (RN ), then 2kηku ≤ kη̇kL1 ([0,1];RN ) ≤ kηkH11 (RN ) .
In the final two cases, one can either use elementary variational calculus or one
can make use of, respectively, the orthonormal bases
1 1
1
πτ : n ≥ 0 and 2 2 sin nπτ : n ≥ 1 in L2 [0, 1]; R).
2 2 sin n + 2
Exercise 8.4.8. Suppose that f ∈ C E; R , and show, as a consequence of
Theorem 8.4.4, that
lim f S̃n = min{f (h) : khkH ≤ 1} and lim f S̃n = max{f (h) : khkH ≤ 1}
n→∞ n→∞
W N -almost surely.
§ 8.5 Euclidean Free Fields
In this section I will give a very cursory introduction to a family of abstract
Wiener spaces they played an important role in the attempt to give a mathe-
matically rigorous construction of quantum fields. From the physical standpoint,
the fields treated here are “trivial” in the sense that they model “free” (i.e.,
non-interacting) fields. Nonetheless, they are interesting from a mathematical
344 8 Gaussian Measures on a Banach Space
standpoint and, if nothing else, show how profoundly properties of a process are
effected by the dimension of its parameter set.
I begin with the case when the parameter set is one dimensional and the
resulting process can be seen as a minor variant of Brownian motion. As we
will see, the intractability of the higher dimensional analogs increases with the
number of dimensions.
§ 8.5.1. The Ornstein–Uhlenbeck Process. Given x ∈ RN and θ ∈ Θ(RN ),
consider the integral equation
1 t
Z
(8.5.1) U(t, x, θ) = x + θ(t) − U(τ, x, θ) dτ, t ≥ 0.
2 0
A completely elementary argument (e.g., via Gronwall’s Inequality) shows that,
for each x and θ, there is at most one solution. Furthermore, integration by
parts allows one to check that if
Z t
− 2t τ
U(t, 0, θ) = e e 2 dθ(τ ),
0
1 In their article “On the theory of Brownian motion,” Phys. Reviews 36 # 3, pp. 823-841
(1930), L. Ornstein and G. Uhlenbeck introduced this process in an attempt to reconcile some
of the more disturbing properties of Wiener paths with physical reality.
§ 8.5 Euclidean Free Fields 345
As
−at consequence,
we see that if {B(t) : t ≥ 0} is a Brownian motion, then
e 2 B et : t ≥ 0 is an ancient Ornstein–Uhlenbeck process. In addition, as
we suspected, the ancient Ornstein–Uhlenbeck process is a stationary process
in the sense that, for each T > 0, the distribution of {UA (t + T ) : t ≥ 0} is
the same as that of {UA (t) : t ≥ 0}, which can be checked either by using the
preceding representation in terms of Brownian motion or by observing that its
covariance is a function of t − s.
In fact, even more is true: it is time reversible in the sense that, for each T > 0,
{UA (t) : t ∈ [0, T ]} has the same distribution as {UA (T − t) : t ∈ [0, T ]}. This
observation suggests that we can give the ancient Ornstein–Uhlenbeck its past
by running it backwards. That is, define UR : [0, ∞) × RN × Θ(RN )2 −→ RN by
if t ≥ 0
U(t, x, θ+ )
UR (t, x, θ+ , θ− ) =
U(−t, x, θ− ) if t < 0,
and consider the process {UR (t, x, θ+ , θ− ) : t ∈ R} under γ0,I × W (N ) × W (N ) .
This process also spans a Gaussian family, and it is still true that
(N ) (N ) |t−s|
(8.5.3) Eγ0,I ×W ×W UR (s) ⊗ UR (t) = u(s, t)I, where u(s, t) ≡ e− 2 ,
346 8 Gaussian Measures on a Banach Space
only now for all s, t ∈ R. One advantage of having added the past is that the
statement of reversibility takes a more appealing form. Namely, {UR (t) : t ∈ R}
is reversible in the sense that its distribution is the same whether one runs
it forward or backward in time. That is, {UR (−t) : t ∈ R} has the same
distribution as {UR (t) : t ∈ R}. For this reason, I will say that {UR (t) : t ≥ 0}
is a reversible Ornstein–Uhlenbeck process if its distribution is the same
as that of {UR (t, x, θ+ , θ− ) : t ≥ 0} under γ0,I × W (N ) × W (N ) .
An alternative way to realize a reversible Ornstein–Uhlenbeck process is to
start with an RN -valued Brownian motion {B(t) : t ≥ 0} and consider the
t t
process {e− 2 B(et ) : t ∈ R}. Clearly ξ, e− 2 B(et ) RN : (t, ξ) ∈ R × RN is
that the Hilbert space associated with this process should be the space HU (RN )
t
of functions hU (t) = e− 2 h et − 1), h ∈ H1 (RN ). Thus, define the map F U :
H1 (RN ) −→ HU (RN ) accordingly, and introduce the Hilbert norm k · kHU (RN )
on HU (RN ) that makes F U into an isometry. Equivalently,
Z
U 2
h d 1 i2
kh kHU (RN ) = (1 + s) 2 hU log(1 + s) ds
[0,∞) ds
Note that
Z
U U d U
1
|h (t)|2 dt = 1
lim |hU (t)|2
ḣ , h L2 ([0,∞);RN )
= 2 2 t→∞ = 0.
[0,∞) dt
1
To check the final equality, observe that it is equivalent to limt→∞ t− 2 |h(t)| = 0
1 1
for h ∈ H(RN ). Hence, since supt>0 t− 2 |h(t)| ≤ khkH1 (RN ) and limt→∞ t− 2 |h(t)|
= 0 if ḣ has compact support, the same result is true for all h ∈ H1 (RN ). In
particular,
q
khU kHU (RN ) = kḣU k2L2 ([0,∞);RN ) + 14 khU k2L2 ([0,∞);RN ) .
and so we will adopt ΘU (RN ) as the Banach space for HU (RN ). Clearly, the
dual space ΘU (RN )∗ of ΘU (RN ) can be identified with the space of RN -valued
Borel
R measures λ on [0, ∞) that give 0 mass to {0} and satisfy kλkΛU (RN ) ≡
[0,∞)
log(e + t) |λ|(dt) < ∞.
(N )
Theorem 8.5.4. Let U0 ∈ M1 ΘU (RN ) be the distribution of {U(t, 0) :
(N )
t ≥ 0} under W (N ) . Then HU (RN ), ΘU (RN ), U0 is an abstract Wiener
space.
Proof: Since Cc∞ (0, ∞); RN is contained in HU (RN ) and is dense in ΘU (RN ),
ZZ
(8.5.5) khU 2
λ kHU (RN ) = u0 (s, t) λ(ds) · λ(dt).
[0,∞)2
Furthermore, it should be clear that one can identify ΘU (R; RN )∗ with the space
of RN -valued Borel measures λ on R satisfying
Z
kλkΛU (R;RN ) ≡ log(e + |t|) |λ|(dt) < ∞.
R
(N )
Then H1 (R; RN ), ΘU (R; RN ), UR is an abstract Wiener space.
|s−t|
− 2
Proof: Set u(s, t) ≡ e , and let λ ∈ ΛU (R; RN ). By the same reasoning
as I used in the preceding proof,
hh, λi = h, hλ H1 (R;RN )
§ 8.5 Euclidean Free Fields 349
and ZZ
khλ k2H1 (R;RN ) = u(s, t) λ(ds) · λ(dt)
R×R
u(τ, t) λ(dt). Hence, since ξ, θ(t) RN : t ≥ 0 & ξ ∈ RN
R
when hλ (τ ) = R
(N ) (N )
spans a Gaussian family in L2 UR ; R and u(s, t)I = EUR θ(s) ⊗ θ(t) , the
proof is complete.
§ 8.5.3. Higher Dimensional Free Fields. Thinking a la Feynman, Theorem
(N )
8.5.6 is saying that UR wants to be the measure on H 1 (R; R) given by
Z
1 1 2 1 2
√ exp − |ḣ(t)| + 4 |h(t)| dt λH1 (R;RN ) (dh),
( 2π)dim(H1 (R;RN )) 2 R
The approach that I will adopt is based on the following subterfuge. The space
H 1 (Rν ; R) is one of a continuously graded family of spaces known as Sobolev
spaces. Sobolev spaces are graded according to the number of derivatives “bet-
ter or worse” than L2 (Rν ; R) their elements are. To be more precise, for each
s ∈ R, define the Bessel operator B s on S (Rν ; C) so that
s
Bds ϕ(ξ) = 1 + |ξ|2 − 2 ϕ̂(ξ).
4
m
When s = −2m, it is clear that B s = 14 −∆ , and so, in general, it is reasonable
to think of B s as an operator that, depending on whether s ≤ 0 or s ≥ 0,
involves taking or restoring derivatives of order |s|. In particular, kϕkH 1 (Rν ;R) =
kB −1 ϕkL2 (Rν ;R) for ϕ ∈ S (Rν ; R). More generally, define the Sobolev space
H s (Rν ; R) to be the separable Hilbert space obtained by completing S (Rν ; R)
with respect to
s Z
−s 1 1
s
khkH s (Rν ;R) ≡ kB hkL2 (Rν ;R) = ν 4 + |ξ|2 |ĥ(ξ)|2 dξ.
(2π) Rν
Obviously, H 0 (Rν ; R) is just L2 (Rν ; R). When s > 0, H s (Rν ; R) is a sub-
space of L2 (Rν ; R), and the quality of its elements will improve as s gets larger.
However, when s < 0, some elements of H s (Rν ; R) will be strictly worse than
elements of L2 (Rν ; R), and their quality will deteriorate as s becomes more neg-
ative. Nonetheless, for every s ∈ R, H s (Rν ; R) ⊆ S 0 (Rν ; R), where S 0 (Rν ; R),
whose elements are called real-valued tempered distributions, is the dual
space of S (Rν ; R). In fact, with a little effort, one can check that an alternative
description of H s (Rν ; R) is as the subspace of u ∈ S 0 (Rν ; R) with the prop-
erty that B −s u ∈ L2 (Rν ; R). Equivalently, H s (Rν ; R) is the isometric image in
S (Rν ; R) of L2 (Rν ; R) under the map B s , and, more generally, H s2 (Rν ; R) is
the isometric image of H s1 (Rν ; R) under B s2 −s1 . Thus, by Theorem 8.3.1, once
we understand the abstract Wiener spaces for any one of the spaces H s (Rν ; R),
understanding the abstract Wiener spaces for any of the others comes down to
understanding the action of the Bessel operators, a task that, depending on what
one wants to know, can be highly non-trivial.
ν+1
Lemma 8.5.7. The space H 2 (Rν ; R) is continuously embedded as a dense
subspace of the separable Banach space C0 (Rν ; R) whose elements are continu-
ous functions that tend to 0 at infinity and whose norm is the uniform norm.
Moreover, given a totally finite, signed Borel measure λ on Rν , the function
Z 1−ν
|x−y|
− 2 π 2
hλ (x) ≡ Kν e λ(dy), with Kν ≡ ,
Rν Γ ν+1
2
ν+1
is an element of H 2 (Rν ; R),
ZZ
|x−y|
khλ k 2
ν+1 = Kν e− 2 λ(dx)λ(dy),
H 2 (Rν ;R)
Rν ×Rν
§ 8.5 Euclidean Free Fields 351
and ν+1
(Rν ; R).
hh, λi = h, hλ ν+1 for each h ∈ H 2
H 2 (Rν ;R)
Proof: To prove the initial assertion, use the Fourier inversion formula to write
√
Z
−ν
h(x) = (2π) e− −1(x,ξ)Rν ĥ(ξ) dξ
Rν
for h ∈ S (R ; R), and derive from this the estimate
ν
Z 12
ν ν+1
2 − 2
khku ≤ (2π)− 2 1
4 + |ξ| dξ khk ν+1 .
Rν H 2 (Rν ;R)
ν+1
Hence, since H 2 (Rν ; R) is the completion of S (Rν ; R) with respect to the
ν+1
norm k · k ν+1 , it is clear that H 2 (Rν ; R) is continuously embedded in
H 2 (Rν ;R)
ν+1
C0 (R ; R). In addition, since S (Rν ; R) is dense in C0 (Rν ; R), H 2 (Rν ; R) is
ν
also.
To carry out the next step, let λ be given, and observe that the Fourier
− ν+1
transform of B ν+1 λ is 14 + |ξ|2 2
λ̂(ξ) and therefore that
√
e− −1(x,ξ)Rν λ̂(ξ)
Z
ν+1 1
B λ(x) = ν+1 dξ
(2π)ν Rν 1
+ |ξ| 2 2
4 √
Z Z −1(y−x,ξ)Rν
1 e
=
ν+1 dξ λ(dy).
(2π)ν Rν Rν 1 + |ξ|2 2
4
where Q(z, R) = z + [−R, R)ν . Indeed, by the argument given in that exer-
cise combined with the higher dimensional analog of Kolmogorov’s continuity
criterion in Exercise 4.3.18, (*) will follow once we show that
N
Eγ0,1 |Sn (y) − Sn (x)|2 ≤ C|y − x|, x, y ∈ Rν ,
for some C < ∞. To this end, set λ = δy − δx , and apply Lemma 8.5.7 to check
n
N
γ0,1
2
X 2
E |Sn (y) − Sn (x)| = hm , hλ ν+1
H 2 (Rν ;R)
m=0
|y−x|
≤ khλ k2 = 2Kν 1 − e−
ν+1
2 .
H 2 (Rν ;R)
Knowing (*), it becomes an easy matter to see that there exists a measur-
able S : Rν × RN −→ R such that x S(x, ω) is continuous of each ω and
§ 8.5 Euclidean Free Fields 353
where k · ku,C denotes the uniform norm over a set C ⊆ Rν . At this point, I
would like to apply Fernique’s Theorem (Theorem 8.2.1) to the Banach space
`∞ N; Cb (Q(z, 1); R) and thereby conclude that there exists an α > 0 such that
N
(**) B ≡ sup Eγ0,1 exp α sup kSn k2u,Q(z,1) < ∞.
z∈Rν n≥0
∞
However, ` N; Cb (Q(z, 1); R) is not separable. Nonetheless, there are two
ways to get around this technicality. The first is to observe that the only place
separability was used in the proof of Fernique’s Theorem was at the beginning,
where I used it to guarantee that BE is generated by the maps x hx, x∗ i as
∗ ∗
x runs over E and therefore that the distribution of X is determined by the
distribution of {hX, x∗ i : x∗ ∈ E ∗ }. But, even though `∞ N; Cb (Q(z, 1); R)
is not separable, one can easily check that it nevertheless possesses this prop-
erty. The second way to deal with the problem is to apply his theorem to
`∞ {0, . . . , N }; Cb (Q(z, 1); R) , which is separable, and to note that the result-
ing estimate can be made uniform in N ∈ N. Either way, p one arrives at (**).
2
Now set ψ(t) = eαt − 1 for t ≥ 0. Then ψ −1 (s) = α−1 log(1 + s), and
ν
sup kSn ku,Q(0,M ) = max sup kSn ku,Q(m,1) : m ∈ Q(0, M ) ∩ Z
n≥0 n≥0
X
≤ ψ −1 ψ sup kSn ku,Q(m,1) .
n≥0
m∈Q(0,M )∩Zν
354 8 Gaussian Measures on a Banach Space
and therefore
N
h i
Eγ0,1 supn≥0 kSn ku,Q(0,em4 )
" #
N
γ0,1 Sn (x) X
E sup sup ≤
|x|≥R n≥0 log(e + |x|) 1
log(e + e(m−1)4 )
m≥(log R) 4
p
X log(1 + 2ν eν(m+1)4 B)
≤ √ −→ 0 as R → ∞.
1
α log(e + e(m−1)4 )
m≥(log R) 4
To complete the proof, I must show that, for any α > 12 , W ν+1 -almost
H 2 (Rν ;R)
no θ is anywhere Hölder continuous of order α, and for this purpose I will proceed
as in the proof of Theorem 4.3.4. Because the {θ(x + y) : x ∈ Rν } has the same
W ν+1 ν -distribution for all y, it suffices for me to show that, W ν+1 ν -
H 2 (R ;R) H 2 (R ;R)
almost surely, there is no x ∈ Q(0, 1) at which θ is Hölder continuous of order
α > 12 . Now suppose that α ∈ 12 , 1 , and observe that, for any L ∈ Z+ and
e ∈ Sν−1 , the set H(α) of θ’s that are α-Hölder continuous at some x ∈ Q(0, 1)
is contained in
∞ \
[ ∞ [ L n
\ o
m+(`−1)e
m+`e
M
θ : θ n −θ n ≤ nα .
M =1 n=1 m∈Q(0,n)∩Zν `=1
Hence, again using translation invariance, we see that we need only show that
there is an L ∈ Z+ such that, for each M ∈ Z+ ,
(`−1)e
nν W ν+1 ν θ : θ `e M
n − θ n ≤ nα , 1 ≤ ` ≤ L
H 2 (R ;R)
−1
tends to 0 as n → ∞. To this end, set U (t, θ) = Kν 2 θ(te), and observe that
the W ν+1 ν -distribution of {U (t) : t ≥ 0} is that of an R-valued ancient
H 2 (R ;R)
Ornstein–Uhlenbeck process. Thus, what I have to estimate is
` ` `−1 `−1
P e− 2n B e n − e− 2n B e n ≤ nMα , 1 ≤ ` ≤ L ,
where B(t), Ft , P is an R-valued Brownian motion. But clearly this probability
is dominated by the sum of
` `−1 `
P B e n − B e n ≤ M2ne 2n
α , 1 ≤ ` ≤ L
Exercises for § 8.5 355
and `
1 `−1
P ∃1 ≤ ` ≤ L 1 − e− 2n B e n ≥ M e 2n
2nα .
M 2 n2(1−α)
The second of these is easily dominated by 2Le− 8 , which, since α < 1,
means that it causes no problems. As for the first, one can use the independence
of Brownian increments and Brownian scaling to dominate it by the Lth power
of
1
P B(1)−B e− n ≤ M (2nα )−1 . Hence, I can take any L such that α− 12 L >
ν.
As a consequence of the preceding and Theorem 8.3.1, we have the following
corollary.
Corollary 8.5.9. Given s ∈ R, set
ν+1 ν+1
Θs (Rν ; R) = B s− 2 θ : θ ∈ Θ 2 (Rν ; R) ,
ν+1
kθkΘs (Rν ;R) = kB 2 −s θk ν+1 ,
Θ 2 (Rν ;R)
and ν+1
WH s (Rν ;R) = (B s− 2 )∗ W ν+1 .
H 2 (Rν ;R)
Then Θs (Rν ; R) is a separable Banach space in which H (Rν ; R) is continuously
s
embedded as a dense subspace, and H s (Rν ; R), Θs (Rν ; R), WH s (Rν ;R) is an
abstract Wiener space.
Exercises for § 8.5
Exercise 8.5.10. In this exercise we will show how to use the Ornstein–Uhlen-
beck process to prove Poincaré’s Inequality
(8.5.11) Varγ0,1 (ϕ) = kϕ − hϕ, γ0,1 ik2L2 (γ0,1 ;R) ≤ kϕ0 k2L2 (γ0,1 ;R)
for the standard Gaussian distribution on R. I will outline the proof of (8.5.11)
for ϕ ∈ S (R; R), but the estimate immediately extends to any ϕ ∈ L2 (γ0,1 ; R)
whose (distributional) first derivative is again in L2 (γ0,1 ; R).
(i) For ϕ ∈ S (R; R), set
(1)
uϕ (t, x) = EW
ϕ U (t, x) ,
where {U (t, x) : t ≥ 0} is the one-sided, R-valued Ornstein–Uhlenbeck process
t
starting at x. Show that u0ϕ (t, x) = e− 2 uϕ0 (t, x) and that
lim uϕ (t, · ) = ϕ and lim uϕ (t, · ) = hϕ, γ0,1 i in L2 (γ0,1 ; R).
t&0 t→∞
Show that another expression for uϕ is
t
!
(y − e− 2 x)2
Z
1
−t − 2
uϕ (t, x) = 2π(1 − e ) ϕ(y) exp − dy.
R 2(1 − e−t )
Using this second expression, show that uϕ (t, · ) ∈ S (R; R) and that t ∈
[0, ∞) 7−→ uϕ (t, · ) ∈ S (R; R) is continuous. In addition, show that u̇ϕ (t, x) =
1 00 0
2 uϕ (t, x) − xuϕ (t, x) .
356 8 Gaussian Measures on a Banach Space
(ii) For ϕ1 , ϕ2 ∈ C 2 (R; R) whose second derivative are tempered, show that
and use this together with (i) to show that, for any ϕ ∈ S (R; R),
d
huϕ (t, · ), γ0,1 i = hϕ, γ0,1 i and kuϕ (t, · )k2L2 (γ0,1 ;R) = −e−t kuϕ0 (t, · )k2L2 (γ0,1 ;R) .
dt
Conclude that kuϕ (t, · )kL2 (γ0,1 ;R) ≤ kϕkL2 (γ0,1 ;R) and
d
kuϕ (t, · )k2L2 (γ0,1 ;R) ≥ −e−t kϕ0 k2L2 (γ0,1 ;R) .
dt
(ϕ0 )2
1
(*) ϕ log ϕ γ0,1
≤
2 ϕ γ0,1
e−t (ϕ0 )2
d
uϕ (t, · ) log uϕ (t, · ) γ0,1 ≥ − .
dt 2 ϕ γ0,1
Exercise 8.5.13. Although it should be clear that the arguments given in Ex-
ercises 8.5.10 and 8.5.12 work equally well in RN and yield (8.5.11) and (2.4.42)
with γ0,1 replaced by γ0,I and (ϕ0 )2 replaced by |∇ϕ|2 , it is significant that each
of these inequalities for R implies its RN analog. Indeed, show that Fubini’s The-
orem is all that one needs to pass to the higher dimensional results. The reason
why this remark is significant is that it allows one to prove infinite dimensional
versions of both Poincaré’s Inequality and the logarithmic Sobolev Inequality,
and both of these play a crucial role in infinite dimensional analysis. In fact,
Nelson’s interest in hypercontractive estimates sprung from his brilliant insight
that hypercontractive estimates would allow him to construct a non-trivial (i.e.,
non-Gaussian), translation invariant quantum field for R2 .
Exercise 8.5.14. It is interesting to see what happens if one changes the sign
of the second term on the right-hand side of (8.5.1), thereby converting the
centripetal force into a centrifugal one.
(i) Show that, for each θ ∈ Θ(RN ), the unique solution to
Z t
1
V(t, θ) = θ(t) + 2 V(τ, θ) dτ, t ≥ 0,
0
is Z t
t τ
V(t, θ) = e 2 e− 2 dθ(τ ),
0
(iii) Let {B(t) : t ≥ 0} be an RN -valued Brownian motion, and show that the
distribution of
t
e 2 B 1 − e−t : t ≥ 0
and set kθkΘV (RN ) ≡ supt≥0 e−t |θ(t)|. Show that ΘV (RN ); k · kΘV (RN ) is a
separable Banach space and that there exists a unique V (N ) ∈ M1 ΘV (RN )
such that the distribution of {θ(t) : t ≥ 0} under V (N ) is the same as the
distribution of {V(t) : t ≥ 0} under W (N ) .
358 8 Gaussian Measures on a Banach Space
Theorem 8.6.1. With H 1 (H) and Θ(E) as above, there is a unique W (E) ∈
M1 Θ(E) such that H 1 (H), Θ(E), W (E) is an abstract Wiener space.
most part, the proof follows the same basic line of reasoning as that suggested in
Exercise 8.3.21 when E = RN . However, there is a problem here that we did not
encounter there. Namely, unless E is finite dimensional, bounded subsets will
not necessarily be relatively compact in E. Hence, local uniform equicontinuity
plus local boundedness is not sufficient to guarantee
that a collection of E-valued
paths is relatively compact in C [0, ∞); E , and that is the reason why we have
to work a little harder here.
But,
with variance khλ k2H 1 (H) . To this end, define x∗m ∈ E ∗ so that1 hx, x∗m i =
hh1m x, λi for x ∈ E. Then,
n
X
hB( · , x), λi = lim hSn ( · , x), λi = lim hxm , x∗m i W N -almost surely.
n→∞ n→∞
0
Finally, to complete the proof, all that remains is to take W (E) to be the
W N -distribution of x B( · , x).
§ 8.6.2. Brownian Formulation. Let (H, E, W) be an abstract Wiener space.
Given a probability space (Ω, F, P), a non-decreasing family of sub-σ-algebras
{Ft : t ≥ 0}, and a measurable map B : [0, ∞) × Ω −→ E, say that the triple
B(t), Ft , P is a W-Brownian motion if
(1) B is {Ft : t ≥ 0}-progressively measurable,
(2) B(0, ω) = 0 and B( · , ω) ∈ C [0, ∞); E for P-almost every ω,
(3) B(1) has distribution W, and, for all 0 ≤ s < t, B(t)−B(s) is independent
1
of Fs and has the same distribution as (t − s) 2 B(1).
Lemma 8.6.4. Suppose that {B(t) : t ≥ 0} satisfies conditions (1) and (2).
Then B(t), Ft , P is a W-Brownian motion if and only if hB(t), x∗ i, Ft , P is
hB(t1 ), x∗1 i + hB(t2 ), x∗2 i = hB(t1 ), x∗1 + x∗2 i + hB(t2 ) − B(t1 ), x∗2 i,
and the terms on the right are independent, centered Gaussians, the first with
variance t1 khx∗1 + hx∗2 k2H and the second with variance (t2 − t1 )khx∗2 k2H .
Finally, take Ft = σ {B(τ ) : τ ∈ [0, t]} , and assume that G(B) is a Gaussian
family satisfying (8.6.5). Given x∗ with khx∗ kH = 1 and 0 ≤ s < t, we know
that hB(t) − B(s), x∗ i = hB(t), x∗ i − hB(s), x∗ i is orthogonal in L2 (P; R) to
hB(τ ), y ∗ i for every τ ∈ [0, s] and y ∗ ∈ E ∗ . Hence, since Fs is generated by
{hB(τ ), y ∗ i : (τ, y ∗ ) ∈ [0, s]×E ∗ }, we know that hB(t)−B(s), x∗ i is independent
of Fs . In addition, hB(t) − B(s), x∗ i is a centered Gaussian with variance t − s,
and so we have proved that hB(t), x∗ i, Ft , P is an R-valued Brownian
motion.
Now apply the first part of the lemma to conclude that B(t), Ft , P is a W-
Brownian motion.
Theorem 8.6.6. Refer to the notation in Theorem 8.6.1. When Ω = Θ(E),
F = BE , and Ft = σ {θ(τ ) : τ ∈ [0, t]} , θ(t), Ft , W (E) is a W-Brownian
motion. Conversely, if B(t), Ft , P is any W-Brownian motion, then B( · , ω) ∈
Θ(E) P-almost surely and W (E) is the P-distribution of ω B( · , ω).
Proof: To prove the first assertion, let t1 , t2 ∈ [0, ∞) and x∗1 , x∗2 ∈ E ∗ be given,
and define λi ∈ Θ(E)∗ so that hθ, λi i = hθ(ti ), x∗i i for i ∈ {1, 2}. Then (cf. the
notation in the proof of Theorem 8.6.1) hλi = h1ti hx∗i , and so
(E)
EW hθ(t1 ), x∗1 ihθ(t2 ), x∗2 i = hλ1 hλ2 H 1 (H) = (t1 ∧ t2 ) hx∗1 , hx∗2 H .
§ 8.6 Brownian Motion on a Banach Space 363
Starting from this, it is an easy matter to check that the span of {hθ(t), x∗ i :
(t, x∗ ) ∈ [0, ∞) × E ∗ } is a Gaussian family in L2 (W (E) ; R) that satisfies (8.6.5).
To prove the converse, begin by observing that, because G(B) is a Gaussian
family satisfying (8.6.5), the distribution of ω ∈ Ω 7−→ B( · , ω) ∈ C [0, ∞); E
under P is the same as that of θ ∈ Θ(E) 7−→ θ( · ) ∈ C [0, ∞); E under W (E) .
Hence
kB(t)kE (E) kθ(t)kE
P lim =0 =W lim = 0 = 1,
t→∞ t t→∞ t
Theorem 8.6.7 (Strassen). Given θ ∈ Θ(E), define θ̃n (t) = θ(nt) Λn for n ≥ 1
q
and t ∈ [0, ∞), where Λn = 2n log(2) (n ∨ 3). Then, for W (E) -almost every θ,
the sequence {θ̃n : n ≥ 0} is relatively compact in Θ(E) and BH 1 (H) (0, 1) is its
set of limit points. Equivalently, for W (E) -almost every θ,
Not surprisingly, the proof differs only slightly from that of Theorem 8.4.4.
In proving the W (E) -almost sure convergence of {θ̃n : n ≥ 1} to BH 1 (H) (0, 1),
there are two new ingredients here. The first is the use of the Brownian scaling
invariance property (cf. Exercise 8.6.8), which says that the W (E) is invariant
1
under the scaling maps Sα : Θ(E) −→ Θ(E) given by Sα θ = α− 2 θ(α · ) for
α > 0 and is easily proved as a consequence of the fact that these maps are
isometric from H 1 (H) onto itself. The second new ingredient is the observation
that, for any R > 0, r ∈ (0, 1], and θ ∈ Θ(E), kθ(r · ) − BH 1 (H) (0, R)kΘ(E) ≤
kθ − BH 1 (H) (0, R)kΘ(E) . To see this, let h ∈ BH 1 (H) (0, R) be given, and check
that h(r · ) is again in BH (0, R) and that kθ(r · ) − h(r · )kΘ(E) ≤ kθ − hkΘ(E) .
364 8 Gaussian Measures on a Banach Space
Taking these into account and applying (8.4.2), one can now justify
W (E) m−1max m
θ̃n − BH 1 (H) (0, 1)
Θ(E) ≥ δ
β ≤n≤β
m !
β 2 θ(nβ −m · )
(E)
=W max
− BH 1 (H) (0, 1)
≥δ
β m−1 ≤n≤β m
Λn
Θ(E)
Λ [β m−1 ]
δ
≤ W (E) m−1max m
θ β −m n · − BH 1 (H) 0,
≥ m
m
β ≤n≤β
β2
β 2 Λ[β m−1 ]
Θ(E)
Λ[β m−1 ]
δ
≤ W (E)
θ − BH 1 (H) 0, ≥ m
m
β2
β 2 Λ[β m−1 ]
Θ(E)
m
= W (E)
β 2 Λ−1
[β m−1 ] θ − B 1
H (H) (0, 1)
Θ(E)
≥ δ
R2 [β m−1 ]
(E) m−1
= Wβ m Λ−2 kθ − BH 1 (H) (0, 1)kΘ(E) ≥ δ ≤ exp − log(2) [β ]
[β m−1 ] βm
for all β ∈ (1, 2), R < inf{khkH 1 (H) : khkΘ(E) ≥ δ}, and sufficiently large m ≥ 1.
Armed with this information, one can simply repeat the argument given at the
analogous place in the proof of Theorem 8.4.4.
The proof that, W (E) -almost surely, θ̃n approaches every h ∈ C infinitely often
also requires only minor modification. To begin, one remarks that if A ⊆ Θ(E)
is relatively compact, then
kθ(t)kE
lim sup sup = 0.
/ −1 ,T ] 1 + t
T →∞ θ∈A t∈[T
Thus, since, by the preceding, for W (E) -almost every θ, the union of {θn : n ≥ 1}
and BH 1 (H) (0, 1) is relatively compact in Θ(E), it suffices to prove that
θ̃n (t) − θ̃n (k −1 ) − h(t) − h(k −1 ) kE
lim sup = 0 W (E) -almost surely
n→∞ t∈[k−1 ,k] 1+t
for each h ∈ BH 1 (H) (0, 1) and k ≥ 2. Because, for a fixed k ≥ 2, the random
variables θ̃k2m − θ̃k2m (k −1 ) [k −1 , k], m ≥ 1, are W (E) -independent random
variables, we can use the Borel–Cantelli Lemma as in § 8.4.2 and thereby reduce
the problem to showing that, if θ̌km (t) = θ̃km (t + k −1 ) − θ̃km (k −1 ), then
∞
X
W (E) kθ̌k2m − hkΘ(E) ≤ δ = ∞
m=1
for each δ > 0, k ≥ 2, and h ∈ BH 1 (H) (0, 1). Finally, since W (E) km Λ−1 is the
k2m
W (E) distribution of θ θ̌k2m , the rest of the argument is the same as the one
given in § 8.4.2.
Exercises for § 8.6 365
t > 0. Show that I is an isometry from Θ(E) onto itself and that I H 1 (H)
is an isometry on H onto itself. Finally, use this to prove the Brownian time
inversion invariance property: I∗ W (E) = W (E) .
Exercise 8.6.9. Let H U (H) be the Hilbert space of absolutely continuous hU :
R −→ H with the property that
q
khkH U (H) = kḣU k2L2 (R;H) + 14 khU k2L2 (R;H) < ∞,
and show that, W (E) -almost surely, {θ̆n : n ≥ 1} is relatively compact in Θ(E)
and that BH 1 (H) (0, 1) is the set of its limit points.
Hint: Referring to (ii) in Exercise 8.6.8, show that it suffices to prove these
properties for the sequence {(Iθ)˘n : n ≥ 1}. Next check that
and use Theorem 8.6.7 and the fact that I is an isometry of H 1 (H) onto itself.
Chapter 9
Convergence of Measures on a Polish Space
367
368 9 Convergence of Measures on a Polish Space
dµ ∂ν
(9.1.2) kµ − νkvar = kg − f kL1 (λ;R) , where f = and g = .
dλ ∂λ
and equality holds when ϕ = sgn ◦ (g − f ). To prove the assertion that follows
(9.1.2), note that
and that the inequality is strict if and only if f g > 0 on a set of strictly positive
λ-measure or, equivalently, if and only if µ 6⊥ ν. Thus, all that remains is to
check the completeness assertion. To this end, let {µn : n ≥ 1} ⊆ M1 (E)
satisfying
lim sup kµn − µm kvar = 0
m→∞ n≥m
P∞
be given, and set λ = n=1 2−n µn . Clearly, λ is an element of M1 (E) with
respect to which each µn is absolutely continuous. Moreover, if fn = dµ dλ , then,
n
1
by (9.1.2), {fn : n ≥ 1} is a Cauchy convergent sequence in L (λ; R). Hence,
since L1 (λ; R) is complete, there is an f ∈ L1 (λ; R) to which the fn ’s converge in
L1 (λ; R). Obviously, we may choose f to be non-negative, and certainly it has
λ-integral 1. Thus, the measure µ given by dµ = f dλ is an element of M1 (E),
and, by (9.1.2), kµn − µkvar −→ 0.
As a consequence of Lemma 9.1.1, we see that the uniform topology on M1 (E)
admits a complete metric and that convergence in this topology is intimately re-
lated to L1 -convergence in the L1 -space of an appropriate element of M1 (E).
§ 9.1 Prohorov–Varadarajan Theory 369
In fact, M1 (E) looks in the uniform topology like a galaxy that is broken into
many constellations, each constellation consisting of measures that are all abso-
lutely continuous with respect to some fixed measure. In particular, there will
usually be too many constellations for M1 (E) in the uniform topology to be
separable. To wit, if E is uncountable and {x} ∈ B for every x ∈ E, then the
point masses δx , x ∈ E, (i.e., δx (Γ) = 1Γ (x)) form an uncountable subset of
M1 (E) and kδy − δx kvar = 2 for y 6= x. Hence, in this case, M1 (E) cannot be
covered by a countable collection of open k · kvar -balls of radius 1.
As I said at the beginning of this section, the uniform topology is not the only
one available. Indeed, for many purposes and, in particular, for probability the-
ory, it is too rigid a topology to be useful. For this reason, it is often convenient
to consider a more lenient topology on M1 (E). The first one that comes to mind
is the one that results from eliminating the uniformity in the uniform topology.
That is, given a µ ∈ M1 (E), define
n o
(9.1.3) S µ, δ; ϕ1 , . . . , ϕn ≡ ν ∈ M1 (E) : max hϕk , νi − hϕk , µi < δ
1≤k≤n
are all trivial. Thus, the first part will be complete once I check that (ii) =⇒
(iii), (iv) =⇒ (vi), and that (v) together with (vi) imply (vii). To see the
first of these, let F be a closed subset of E, and set
n1
ρ(x, F )
ψn (x) = 1 − for n ∈ Z+ and x ∈ E.
1 + ρ(x, F )
It is then clear that ψn ∈ Ubρ (E; R) for each n ∈ Z+ and that 1 ≥ ψn (x) & 1F (x)
as n → ∞ for each x ∈ E. Thus, The Monotone Convergence Theorem followed
by (ii) imply that
In proving that (iv) =⇒ (vi), I may and will assume that f is a non-negative,
lower semicontinuous function. For n ∈ N, define
n
∞ 4
X ` ∧ 4n 1 X
fn = 1I`,n ◦f = n 1J`,n ◦ f,
2n 2
`=0 `=0
where
` `+1 `
I`,n = , and J`,n = ,∞ .
2n 2n 2n
It is then clear that 0 ≤ fn % f and therefore that hfn , µi −→ hf, µi as n → ∞.
At the same time, by lower semicontinuity, the sets {f ∈ J`,n } are open, and so
(iv) implies
hfn , µi ≤ limhfn , µα i ≤ limhf, µα i
α α
+
for each n ∈ Z . After letting n → ∞, one sees that (iv) =⇒ (vi).
Turning to the proof that (v) & (vi) =⇒ (vii), suppose that f ∈ B(E; R) is
continuous at µ-almost every x ∈ E, and define
and so I have now completed the proof that conditions (i) through (vii) are
equivalent.
Now assume that E is separable, and let ρ̂ be a totally bounded metric for E.
By (iii) of Lemma 9.1.4, Ubρ̂ (E; R) is separable. Hence, we can find a countable
set {ϕn : n ≥ 1} that is dense in Ubρ̂ (E; R). In particular, by the equivalence of
(i) and (ii) above, we see that hϕn , µα i −→ hϕn , µi for all n ∈ Z+ if and only if
+
µα =⇒ µ, which is to say that the corresponding map H : M1 (E) −→ [0, 1]Z is
+
a homeomorphism. Since [0, 1]Z is a compact metric space and D (cf. the proof
of (ii) in Lemma 9.1.4) is a metric for it, we also see that the R described is a
totally bounded metric for M1 (E). In particular, M1 (E) is separable. Finally,
since, by (ii) in Lemma 9.1.4, it is always possible to find a totally bounded
metric for E, the last assertion needs no further comment.
The reader would do well to pay close attention to what (iii) and (iv) say
about the nature of weak convergence. Namely, even though µα =⇒ µ, it is
possible that some or all of the mass that the µα ’s assign to the interior of a
set may gravitate to the boundary in the limit. This phenomenon is most easily
understood by taking E = R, µα to be the unit point mass δα at α ∈ [0, 1),
checking that δα =⇒ δ1 , and noting that δ1 (0, 1) = 0 < 1 = δα (0, 1) for each
α ∈ [0, 1).
Remark 9.1.6. Those who find nets distasteful will be pleased to learn that,
from now on, I will be restricting my attention to separable metric spaces E and
therefore need only discuss sequential convergence when working with the weak
topology on M1 (E). Furthermore, unless the contrary is explicitly stated, I will
always be thinking of the weak topology when working with M1 (E).
Given a separable metric space E, I next want to find conditions that guarantee
that a subset of M1 (E) is compact; and at this point it will be convenient to
have introduced the notation K ⊂⊂ E to indicate that K is a compact subset
of E. The key to my analysis is the following extension of the sort of Riesz
Representation result in Theorem 3.1.1 combined with a crucial observation
made by S. Ulam.1
1 It is no accident that Ulam was the first to make this observation. Indeed, the term Polish
space was coined by Bourbaki in recognition of the contribution made to this subject by the
Polish school in general and C. Kuratowski in particular (cf. Kuratowski’s Topologie, Vol. I,
Warszawa–Lwow (1933)). Ulam had studied with Kuratowski.
§ 9.1 Prohorov–Varadarajan Theory 375
such that
Conversely, if E is a Polish space and µ ∈ M1 (E), then for every > 0 there is a
K ⊂⊂ E such that µ(K) ≥ 1 − . In particular, if µ ∈ M1 (E) and Λ(ϕ) = hϕ, µi
for ϕ ∈ Cb (E; R), then, for each > 0, (9.1.8) holds for some K ⊂⊂ E.
Proof: I begin with the trivial observation that, because Λ is non-negative and
Λ(1) = 1, Λ(ϕ) ≤ kϕku . Next, according to the Daniell theory of integration,
the first statement will be proved as soon as we know that Λ(ϕn ) & 0 whenever
{ϕn : n ≥ 1} is a non-increasing sequence of functions from Ubρ E; [0, ∞) that
`n
!
[
µ Bk,n ≥1− .
2n
k=1
Hence, if
`n
[ ∞
\
Cn ≡ B k,n and K = Cn ,
k=1 n=1
then µ(K) ≥ 1 − . At the same time, it is obvious that, on the one hand,
K is closed (and therefore ρ-complete) and that, on the other hand, K ⊆
S`n 2
k=1 B pk , n for every n ∈ Z+ . Hence, K is both complete and totally
bounded with respect to ρ and, as such, is compact.
As Lemma 9.1.7 makes clear, probability measures on a Polish space like to
be nearly concentrated on a compact set. Following Prohorov and Varadarajan,2
2 See Yu. V. Prohorov’s article “Convergence of random processes and limit theorems in prob-
ability theory,” Theory of Prob. & Appl., which appeared in 1956. Independently, V.S.
Varadarajan developed essentially the same theory in “Weak convergence of measures on a
separable metric spaces,” Sankhyǎ, which was published in 1958. Although Prohorov got into
print first, subsequent expositions, including this one, rely heavily on Varadarajan.
376 9 Convergence of Measures on a Polish Space
what we are about to see is that, for a Polish space E, relatively compact subsets
of M1 (E) are those whose elements are nearly concentrated on the same compact
set of E. More precisely, given a separable metric space E, say that M ⊆ M1 (E)
is tight if, for every > 0, there exists a K ⊂⊂ E such that µ(K) ≥ 1 − for
all µ ∈ M .
Theorem 9.1.9. Let E be a separable metric space and M ⊆ M1 (E). Then
M is compact if M is tight. Conversely, when E is Polish, M is tight if M is
compact.3
Proof: Since it is clear, from (iii) in Theorem 9.1.5, that M is tight if and only
if M is, I will assume throughout that M is closed in M1 (E).
To prove the first statement, take
ρ̂ to be a totally bounded metric on E,
ρ̂
choose {ϕn : n ≥ 1} ⊆ Ub E; [0, 1] accordingly, as in the last part of Theorem
9.1.5, and let ϕ0 = 1. Given a sequence {µ` : ` ≥ 1} ⊆ M1(E), we can use a
standard diagonalization procedure to extract a subsequence µ`k : k ≥ 1 such
that
Λ(ϕn ) ≡ lim hϕn , µ`k i
k→∞
exists for each n ∈ N. Since Λ(ϕ) ≡ limk→∞ hϕ, µ`k i continues to exist for
every ϕ in the uniform closure of the span of {ϕn : n ≥ 1}, we now see that
Λ determines a non-negative linear functional on Ubρ̂ (E; R) and that Λ(1) = 1.
Moreover, because M is tight, we can find, for any > 0, a K ⊂⊂ E such that
µ(K) ≥ 1 − for every µ ∈ M , and therefore (9.1.8) holds with this choice
of K. Hence, by Lemma 9.1.7, we know that there is a µ ∈ M1 (E) for which
Λ(ϕ) = hϕ, µi, ϕ ∈ Ubρ̂ (E; R). Because this means that hϕ, µ`k i −→ hϕ, µi for
every ϕ ∈ Ubρ̂ (E; R), the equivalence of (i) and (ii) in Theorem 9.1.5 allows us
to conclude that µ`k =⇒ µ.
Finally, suppose that E is Polish and that M is compact in M1 (E). To see
that M must be tight, repeat the argument used to prove the second part of
Lemma 9.1.7. Thus, choose Bk,n for k, n ∈ Z+ as in the proof there, and set
`
!
[
f`,n (µ) = µ Bk,n for `, n ∈ Z+ .
k=1
By (iv) in Theorem 9.1.5, µ ∈ M1 (E) 7−→ f`,n (µ) ∈ [0, 1] is lower semicontinu-
ous. Moreover, for each n ∈ Z+ , f`,n % 1 as ` % ∞. Thus, by Dini’s Lemma,
we can choose, for each n ∈ Z+ , one `n ∈ Z+ so that f`n ,n (µ) ≥ 1 − 2n for all
3 For the reader who wishes to investigate just how far these results can be pushed before
they start of break down, a good place to start is Appendix III in P. Billingsley’s Convergence
of Probability Measures, Wiley (1968). In particular, although it is reasonably clear that
completeness is more or less essential for the necessity, the havoc that results from dropping
separability may come as a surprise.
§ 9.1 Prohorov–Varadarajan Theory 377
µ ∈ M ; and at this point the rest of the argument is precisely the same as the
one given at the end of the proof of Lemma 9.1.7.
§ 9.1.3. The Lévy Metric and Completeness of M1 (E). We have now seen
that M1 (E) inherits properties from E. To be more specific, if E is a metric
space, then M1 (E) is separable or compact if E itself is. What I want to show
next is that completeness also gets transferred. That is, I will show that M1 (E)
is Polish if E is. In order to do this, I will need a lemma that is of considerable
importance in its own right.
Lemma 9.1.10. Let E be a Polish space and Φ a bounded subset of Cb (E; R)
that is equicontinuous at each x ∈ E. (That is, for each x ∈ E, supϕ∈Φ |ϕ(y) −
ϕ(x)| = 0 as y → x.) If {µn : n ≥ 1} ∪ {µ} ⊆ M1 (E) and µn =⇒ µ, then
lim sup hϕ, µn i − hϕ, µi = 0.
n→∞ ϕ∈Φ
Proof: Let > 0 be given, and use the second part of Theorem 9.1.9 to choose
K ⊂⊂ E so that
sup kϕku sup µn K{ < .
ϕ∈Φ n∈Z+ 4
By (iv) of Theorem 9.1.5, µ K{ satisfies the same estimate. Next, choose a
metric ρ for E and a countable dense set {pk : k ≥ 1} in K. Using equicontinuity
together with compactness, find ` ∈ Z+ and δ1 , . . . , δ` > 0 so that K ⊆ x :
ρ(x, pk ) < δk for some 1 ≤ k ≤ ` and
sup ϕ(x) − ϕ(pk ) < for 1 ≤ k ≤ ` and x ∈ K with ρ(x, pk ) < 2δk .
ϕ∈Φ 4
Because r ∈ (0, ∞) 7−→ µ y ∈ K : ρ(y, x) ≤ r ∈ [0, 1] is non-decreasing
for each x ∈ K, we can find,
for each 1 ≤k ≤ `, an rk ∈ δk , 2δk such that
µ(∂Bk ) = 0 when Bk ≡ x ∈ K : ρ x, pk < rk . Finally, set A1 = B1 and
Sk S`
Ak+1 = Bk+1 \ j=1 Bj for 1 ≤ k < `. Then, K ⊆ k=1 Ak , the Ak ’s are
disjoint, and, for each 1 ≤ k ≤ `,
sup sup ϕ(x) − ϕ pk < and µ ∂Ak = 0.
ϕ∈Φ x∈Ak 4
`
X
lim sup hϕ, µn i − hϕ, µi < + lim sup ϕ pk µn Ak − µ Ak = .
n→∞ ϕ∈Φ n→∞ ϕ∈Φ
k=1
378 9 Convergence of Measures on a Polish Space
where F (δ) denotes the set of x ∈ E that lie a ρ-distance less than δ from F .
Then L is a complete metric for M1 (E), and therefore M1 (E) is Polish.
Proof: It is clear that L is symmetric and that it satisfies the triangle in-
equality. Thus,
we will know that it is a metric for M1 (E) as soon as we show
that L µn , µ −→ 0 if and only if µn =⇒ µ. To this end, first suppose that
L µn , µ −→ 0. Then, for every closed F , µ F (δ) + δ ≥ limn→∞ µn (F ) for all
δ > 0; and therefore, by countable additivity, µ(F ) ≥ limn→∞ µn (F ) for every
closed F . Hence, by the equivalence of (i) and (iii) in Theorem 9.1.5, µn =⇒ µ.
Now suppose that µn =⇒ µ, and let δ > 0 be given. Given a closed F in E,
define
ρ x, F (δ) {
ψF (x) = for x ∈ E.
ρ x, F (δ) { + ρ(x, F )
It is then an easy matter to check that both
ρ(x, y)
1F ≤ ψF ≤ 1F (δ) and ψF (x) − ψF (y) ≤ .
δ
In particular, by Lemma 9.1.10, we can choose m ∈ Z+ so that
n o
sup sup hψF , µn i − hψF , µi : F closed in E < δ,
n≥m
In other words, supn≥m L µn , µ ≤ δ, and, since δ > 0 was arbitrary, we have
shown that L µn , µ −→ 0.
In order to finish the proof, I must show that if {µn : n ≥ 1} ⊆ M1 (E) is
L-Cauchy convergent, then it is tight. Thus, let > 0 be given, and choose, for
each ` ∈ Z+ , an m` ∈ Z+ and a K` ⊂⊂ E so that
sup L µn , µm` ≤ `+1 and max µn K` { ≤ `+1 .
n≥m` 2 1≤n≤m ` 2
( )
Setting ` = 2`
one then has that supn∈Z+ µn K` ` { ≤ ` for each ` ∈ Z+ .
,
T∞ ( )
In particular, if K ≡ `=1 K` ` , then µn (K) ≥ 1 − for all n ∈ Z+ . Finally,
§ 9.1 Prohorov–Varadarajan Theory 379
not depend on the choice of metric. To complete the first part, suppose that
ρ(Xn , X) −→ 0 in P-measure. Then, for every ϕ ∈ Ubρ (E; R) and δ > 0,
lim EP ϕ Xn − EP ϕ X) ≤ lim EP ϕ Xn − ϕ(X)
n→∞ n→∞
≤ (δ) + kϕku lim P ρ Xn , X ≥ δ = (δ),
n→∞
where
(δ) ≡ sup |ϕ(y) − ϕ(x)| : ρ(x, y) ≤ δ −→ 0 as δ & 0.
Hence, since the same is true when the roles of X and Y are reversed, the
asserted estimate for L X∗ P, Y∗ P) holds.
As a demonstration of the sort of use to which one can put these ideas, I
present the following version of the Principle of Accompanying Laws.
Theorem 9.1.13. Let E be a Polish space and, for each k ∈ Z+ , let {Yk,n :
n ≥ 1} be a sequence of E-valued random variables on the probability space
(Ω, F, P). Further, assume that, for each k ∈ Z+ , there is a µk ∈ M1 (E) such
∗
that Yk,n P =⇒ µk as n → ∞. Finally, let ρ be a complete metric for E, and
suppose that {Xn : n ≥ 1} is a sequence of E-valued random variables on
(Ω, F, P) with the property that
(9.1.14) lim lim P ρ Xn , Yk,n ≥ δ = 0 for every δ > 0.
k→∞ n→∞
and so
lim L µ, (Xn )∗ P ≤ L(µ, µk ) + lim L (Yk,n )∗ P, (Xn )∗ P .
n→∞ n→∞
Thus, after letting k → ∞ and applying (*), one concludes that (Xn )∗ P =⇒
µ.
Exercises for § 9.1
Exercise 9.1.15. Let (E, B) be a measurable space with the property that
{x} ∈ B for all x ∈ E. In this exercise, we will investigate the strong topology
in a little more detail. In particular, in part (iv), we will show that when
µ ∈ M1 (E) is non-atomic (i.e., µ {x} = 0 for every x ∈ E), then there is no
countable neighborhood basis of µ in the strong topology. Obviously, this means
that the strong topology for M1 (E) admits no metric whenever M1 (E) contains
a non-atomic element.
(i) Show that, in general,
kν − µkvar = 2 max ν(A) − µ(A) : A ∈ B
and that in the case when E is a metric space, B its Borel field, and ρ a metric
for E,
n
1 X ϕ
lim Xm (x) = hϕ, µ
n→∞ n
m=1
(ii) We have seen that M1 (E) is compact if E is. To see that the converse is
also true, show that x ∈ E 7−→ δx ∈ M1 (E) is a homeomorphism whose image
is closed.
(iii) Although it is a little off our track, it is amusing to show that E being
compact is equivalent to Cb (E; R) being separable; and, in view of (i) in Lemma
9.1.4, this comes down to checking that E is compact if Cb (E; R) is separable.
Hint: Let ρ̂ be a totally bounded metric on E, and use Ê to denote the ρ̂-
completion of E. Show that if {xn : n ≥ 1} ⊆ E has the properties that
xn −→ x̂ ∈ Ê and limn→∞ ϕ(xn ) exists for every ϕ ∈ Cb (E; R), then x̂ ∈ E.
1
(Suppose not, set ψ(x) = ρ̂(x,x̂) , and consider functions of the form f ◦ ψ for
f ∈ Cb (R; R).) Finally, assuming that Cb (E; R) is separable, and, using a
diagonalization procedure, show that every sequence {xn : n ≥ 1} ⊆ E admits a
subsequence {xnm : m ≥ 1} that converges to some x̂ ∈ Ê and limm→∞ ϕ xnm
exists for every ϕ ∈ Cb (E; R).
Show that there is a unique µ ∈ M1 (E) such that µ[1,`] = (π` )∗ µ for every
` ∈ Z+ .
384 9 Convergence of Measures on a Polish Space
n≤`
xn if
Φ` x1 , . . . , x` =
n en otherwise.
Show that (Φ` )∗ µ[1,`] : ` ∈ Z+ ∈ M1 (E) is tight and that any limit must be
the desired µ.
The conclusion drawn in (iii) is the renowned Kolmogorov Extension (or
Consistency) Theorem. Notice that, at least for Polish spaces, it represents
a vast generalization of the result obtained in Exercise 1.1.14.
Exercise 9.1.18. In this exercise we will use the theory of weak convergence
to develop variations on The Strong Law of Large Numbers (cf. Theorem 1.4.9).
Thus, let E be a Polish space, (Ω, F, P ) a probability space, and {Xn : n ≥ 1}
a sequence of mutually independent E-valued random variables on (Ω, F, P )
with common distribution µ ∈ M1 (E). Next, define the empirical distribution
function
n
1 X
ω ∈ Ω 7−→ Ln (ω) ≡ δX (ω) ∈ M1 (E),
n m=1 m
n
1 X
n ∈ Z+ and ω ∈ Ω.
ϕ, Ln (ω) = ϕ Xm (ω) ,
n m=1
which is The Strong Law of Large Numbers for the empirical distribu-
tion.
Now show that (9.1.19) provides another (cf. Exercises 6.1.16 and 6.2.18) proof
of the Strong Law of Large Numbers for Banach space–valued random variables.
Thus, let EPbe a real, separable, Banach space with dual space E ∗ , and set
n
S n (ω) = n1 1 Xm (ω) for n ∈ Z+ and ω ∈ Ω.
(i) As a preliminary step, begin with the case when
(*) µ BE (0, R){ = 0 for some R ∈ (0, ∞).
Choose η ∈ Cb R; R so that η(t) = t for t ∈ [−R, R] and η(t) = 0 when |t| ≥
R + 1, and define ψx∗ ∈ Cb (E; R) for x∗ ∈ E ∗ by ψx∗ (x) = η hx, x∗ i , x ∈ E,
Exercises for § 9.1 385
Z
PΣ
P A∩B = ω (B) P(dω) for all A ∈ Σ and B ∈ F.
A
In particular, for each (−∞, ∞]-valued random variable X that is bounded be-
Σ
low, ω ∈ Ω 7−→ EPω [X] is a conditional expectation value of X given Σ. Finally,
if Σ is countably generated, then there is a P-null set N ∈ Σ with the property
that PΣω (A) = 1A (ω) for all ω ∈
/ N and A ∈ Σ.
Proof: To prove the uniqueness, suppose ω ∈ Ω 7−→ QΣ ω ∈ M1 (Ω) were a
second such mapping. We would then know that, for each B ∈ F, QΣ ω (B) =
PΣ
ω (B) for P-almost every ω ∈ Ω. Hence, since F (as the Borel field over a
second countable topological space) is countably generated, we could find one
Σ-measurable P-null set off of which QΣ Σ
ω = Pω . Similarly, to prove the final
assertion when Σ is countably generated, note (cf. (5.1.7)) that, for each A ∈
Σ, PΣ ω (A) = 1A (ω) = δω (A) for P-almost every ω ∈ Ω. Thus, once again
countability allows us to choose one Σ-measurable P-null set N such that PΣ ω
Σ = δω Σ if ω ∈ / N.
I turn next to the question of existence. For this purpose, first choose (cf. (ii)
of Lemma 9.1.4) ρ to be a totally bounded metric for Ω, and let U = Ubρ (Ω; R) be
the space of bounded, ρ-uniformly continuous, R-valued functions on Ω. Then
(cf. (iii) of Lemma 9.1.4) U is a separable Banach space with respect to the
uniform norm. In particular, we can choose a sequence {fn : n ≥ 0} ⊆ U so
that f0 = 1, the functions f0 , . . . , fn are linearly independent for each n ∈ Z+ ,
and the linear span S of {fn : n ≥ 0} is dense in U. Set g0 = 1, and, for each
n ∈ Z+ , let gn be some fixed representative of EP [fn | Σ]. Next, set
R = α ∈ RN : ∃m ∈ N αn = 0 for all n ≥ m
4The beautiful argument that I have just outlined is due to Ranga Rao. See his 1963 article
“The law of large numbers for D[0, 1]-valued random variables,” Theory of Prob. & Appl.
VIII #1, where he shows that this method applies even outside the separable context.
§ 9.2 Regular Conditional Probability Distributions 387
and define
∞
X ∞
X
fα = αn fn and gα = αn gn
n=0 n=0
Clearly, Λω (1) = 1 for all ω ∈ Ω. On the other hand, we cannot say that Λω
is always non-negative as a linear functional on S. In fact, the best we can
do is extract a Σ-measurable P-null set N so that Λω is a non-negative linear
functional on S whenever ω ∈ / N . To this end, let Q denote the rational reals
and set
Q+ = α ∈ R ∩ QN : fα ≥ 0 .
for ω ∈
/N
Λω (f )
g(ω) = P
E [f ] for ω∈N
m ρ(ω, Kn )
ηm,n (ω) = for m, n ∈ Z+ .
1 + m ρ(ω, Kn )
388 9 Convergence of Measures on a Polish Space
Clearly, ηm,n ∈ U for each pair (m, n) and 0 ≤ ηm,n % 1Kn { as m → ∞ for each
n ∈ Z+ . Thus, by The Monotone Convergence Theorem, for each n ∈ Z+ ,
Z Z
sup Λω ηm,n P(dω) = lim Λω ηm,n P(dω)
N { m∈Z+ m→∞ N{
1
= lim EP ηm,n ≤ n ;
m→∞ 2
and so, by the Borel–Cantelli Lemma, we can find a Σ-measurable P-null set
N 0 ⊇ N such that
/ N 0.
M (ω) ≡ sup n sup Λω ηm,n < ∞ for every ω ∈
n∈Z+ m∈Z+
In the older literature, the result in Theorem 9.2.2 would be called a fibering
of µ. The name derives from the idea that µ on E1 × E2 can be decomposed into
its “vertical component” µ2 and its “restrictions” µ(x2 , · ) to “horizontal fibers”
E1 × {x2 }. Alternatively, Theorem 9.2.2 can be interpreted as saying that any
µ ∈ M1 (E1 × E2 ) can be decomposed into its marginal distribution on E2 and
a transition probability x2 ∈ E2 7−→ µ(x2 , · ) ∈ M1 (E1 ). The two extreme cases
are when the coordinates are independent, in which case µ(x2 , · ) is independent
of x2 , and the case when the coordinates are equal, in which case µ(x2 , · ) = δx2 .
As an application of Theorem 9.2.2, I present the following important special
case of a more general result that indicates just how remarkably fungible non-
atomic measures are.
Corollary 9.2.3. Let λ[0,1) denote Lebesgue measure on [0, 1). For each
N ∈ Z+ and µ ∈ M1 (RN ), there is a Borel measurable map f : [0, 1) −→ RN
such that µ = f∗ λ[0,1) .
Proof: I will work by induction on N ∈ Z+ . When N = 1, take
f (u) = inf t ∈ R : µ (−∞, t] ≥ u , u ∈ [0, 1).
for (u1 , u2 ) ∈ [0, 1)2 , then g is Borel measurable on [0, 1)2 and µ = g∗ λ2[0,1) .
Finally, by Lemma 1.1.6 or part (ii) of Exercise 1.1.11, we know
that there is a
Borel measurable map u ∈ [0, 1) 7−→ U(u) = U1 (u), U2 (u) ∈ [0, 1)2 such that
U∗ λ[0,1) = λ2[0,1) , and so we can take f (u) = g ◦ U.
§ 9.2.2. Representing Lévy Measures via the Itô Map. There is another
way of thinking about the construction of the Poisson jump processes, one that
is based on Corollary 9.2.3 and the transformation property described in Lemma
4.2.12. The advantage of this approach is that it provides a method of coupling
Lévy processes corresponding to different Lévy measures. Indeed, it is this cou-
pling procedure that underlies K. Itô’s construction of Markov processes modeled
on Lévy processes.1
Let M0 (dy) = |y|−N −1 dy, which is the Lévy measure for a (cf. Corollary 3.3.9)
symmetric 1-stable law. My first goal is to show that every M ∈ M∞ (RN ) can
be realized as (cf. the notation in Lemma 4.2.6) M0F for some Borel measurable
F : RN −→ RN satisfying F (0) = 0.2
Theorem 9.2.4. For each M ∈ M∞ (RN ) there exists a Borel measurable map
F : RN −→ RN such that F (0) = 0 and
M (Γ) = M0F ≡ M0 F −1 (Γ \ {0}) , Γ ∈ BRN .
Proof: I begin with the case when N = 1. Given M ∈ M∞ (R), define ρ(r, ±1)
for r > 0 by
ρ(r, 1) = sup ρ ∈ [0, ∞) : M [ρ, ∞) ≥ r−1
where I have taken the supremum over the empty set to be 0. Applying Exercise
9.2.6 with ν(dr)= r−2 λ(0,∞) (dr), one sees that M = M0F when F (0) = 0 and
y
F (y) = ρ |y|, |y| for y ∈ R \ {0}.
Now assume that N ≥ 2, and let M ∈ M∞ (RN ). If M = 0, simply take
F ≡ 0. If M 6= 0, choose a non-decreasing function h : (0, ∞) −→ (0, ∞) so that
Z
h |y| M (dy) = 1,
and so
Z Z Z !
−2
ϕ(y) M (dy) = ωN −1 ϕ ρ(r, η)η r dr µ2 (dη)
RN SN −1 (0,∞)
Z Z !
−2
= ωN −1 ϕ ρ(r, η)f (t) r dr λ[0,1) (dt).
[0,1) (0,∞)
We can now prove the following theorem, which is the simplest example of
Itô’s procedure.
Theorem 9.2.5. Let {j0 (t, · ) : t ≥ 0} be a Poisson jump process associated
with M0 . Then, for each M ∈ M∞ (RN ), there is a Borel measurable map
F : RN −→ RN with F (0) = 0 and a Poisson jump process {j(t, · ) : t ≥ 0}
associated with M such that j(t, · ) = j0F (t, · ), t ≥ 0, P-almost surely.
Proof: Choose F as in Theorem 9.2.4 so that M = M0F . For R > 0, set
FR (y) = 1[R,∞) (y)F (y). By Lemma 4.2.12, we know that {j0FR (t, · ) : t ≥ 0} is
a Poisson jump process associated with M FR . In particular, for each r > 0,
EP j0F t, RN \ B(0, r) = lim EP j0FR t, RN \ B(0, r) = M RN \ B(0, r) < ∞.
R&0
Hence, there exists a P-null set N such that t j0F (t, · , ω) is a jump function
F
for all ω ∈/ N . Finally, if j(t, · , ω) = j0 (t, · , ω) when ω ∈ / N and j(t, · , ω) = 0
for ω ∈ N , then {j(t, · ) : t ≥ 0} is a jump process associated with M and
j(t, · ) = j0F (t, · ), t ≥ 0, for P-almost every ω ∈ Ω.
392 9 Convergence of Measures on a Polish Space
lim EP Φ ◦ Sn = hΦ, W (N ) i.
n→∞
Proving this result comes down to showing that {µn : n ≥ 1} is tight and
that every limit point is W (N ) . The second of these is a rather elementary
application of the Central Limit Theorem, and, at least when the Xn ’s have
uniformly bounded fourth moments, the first is an application of Kolmogorov’s
Continuity Criterion. Finally, to remove the fourth moment assumption, I will
use the Principle of Accompanying Laws. It should be noticed that, at no point
in the proof, do I make use of the a priori existence of Wiener measure. Thus,
Theorem 9.3.1 provides another derivation of its existence, a derivation that
includes an an extremely ubiquitous approximation procedure.
1
where τk = tk − tk−1 , 1 ≤ k ≤ `. To this end, for 1 ≤ k ≤ ` and n > τk , set
bntk c
1
X
∆n (k) = n− 2 Xj ,
j=bntk−1 c+1
394 9 Convergence of Measures on a Polish Space
where, as usual, I use the notation btc to denote the integer part of t. Noting
that
Sn tk − Sn tk−1 − ∆n (k)
bntk c bntk−1 c
≤ Sn tk − Sn
+ Sn tk−1 − Sn
n n
Xbnt c+1 + Xbnt c+1
k k−1
≤ 1 ,
n2
one sees that, for any > 0,
`
! `
!
X 2 X 2 n2
Sn tk − Sn tk−1 − ∆n (k) ≥ 2
≤P Xbntk c+1 ≥
P
4
k=1 k=0
`
4 X P h 2 i 4(` + 1)N
≤ E X bnt c+1
= −→ 0
n2 k
n2
k=0
Moreover, since
∆n (1), . . . , ∆n (`) ∗ P = ∆n (1) ∗ P × · · · × ∆n (`) ∗ P
for all sufficiently large n’s, this reduces to checking ∆n (k) ∗ P =⇒ γ0,τk I for
each 1 ≤ k ≤ `. Finally, given 1 ≤ k ≤ `, set Mn (k) = bntk c − bntk−1 c, and use
Theorem 2.3.8 to see that, as n → ∞,
√
Mn (k)
|ξ|2
P −1 X
E exp 1 ξ, Xbntk c+j RN −→ exp −
Mn (k) 2 j=1 2
Mn (k)
uniformly for ξ in compact subsets of RN . Hence, since n −→ τk , we now
see that, for any fixed ξ ∈ RN ,
√
h i
τk |ξ|2
P
E exp −1 ξ, ∆n (k) RN −→ exp − = γ\
0,τk I (ξ),
2
and therefore ∆n (k) ∗ P =⇒ γ0,τk I .
§ 9.3 Donsker’s Invariance Principle 395
iscompact for any α > 0 and {R` : ` ≥ 1} ⊆ [0, ∞). Thus, since µn ψ(0) =
0 = 1, all that we have to do is show that, for each T > 0,
P |Sn (t) − Sn (s)|
sup E sup 1 < ∞,
n≥1 1≤s<t≤T (t − s) 8
and, by Theorem 4.3.2, this would follow if we knew that
sup EP |Sn (t) − Sn (s)|4 ≤ C(t − s)2 , s, t ∈ [0, ∞),
(*)
n≥1
h 4 i
+ 27EP Sn nk − Sn (s)
4
4 X`−k 4
2 ` 27 P 2 k
≤ 27M n t − + 2 E Xk+j + 27M n
−s
n n j=1 n
81N 2 M (` − k)2
≤ 54M (t − s)2 + ≤ 135N 2 M (t − s)2 ,
n2
where, in the passage to the final line, I have taken {ei : 1 ≤ i ≤ N } to be an
orthonormal basis for RN and used the estimate
4 2 2
X`−k N `−k
X X
EP Xk+j = EP ei , Xk+j RN
j=1 i=1 j=1
4
N
X `−k
X
ei , Xk+j RN ≤ 3N 2 M (` − k)2
≤N EP
i=1 j=1
396 9 Convergence of Measures on a Polish Space
and, for every n ∈ Z+ , the random variable Xn,δ ≡ fn,δ ◦ Xn has mean value 0
and covariance I. Next, for each δ > 0, define the maps ω ∈Ω 7−→ Sn,δ ( · , ω) ∈
C(RN ) relative to {Xn,δ : n ≥ 1}, and set µn,δ = Sn,δ ∗ P. Then, by the
preceding, we know that µn,δ =⇒ W (N ) for each δ > 0. Hence, by Theorem
9.1.13, we will have proved that µn =⇒ W (N ) as soon as we show that
lim sup P sup Sn (t) − Sn,δ (t) ≥ = 0
δ&0 n∈Z+ 0≤t≤T
for every T ∈ Z+ and > 0. To this end, first observe that, because Sn ( · ) and
Sn,δ ( · ) are linear on each interval [(m − 1)2−n , m2−n ],
m
1 X
sup Sn (t) − Sn,δ (t) = max Yk,δ ,
1
t∈[0,T ] 1≤m≤nT n k=1
2
for every e ∈ SN −1 .
§ 9.3.2. Rayleigh’s Random Flights Model. Here is a more picturesque
scheme for approximating Brownian motion. Imagine the path t R(t) of a
bird that starts at the origin, flies in a randomly chosen direction at unit speed
§ 9.3 Donsker’s Invariance Principle 397
for a unit exponential random time, then switches to a new randomly chosen
direction for a second unit exponential time, etc. Next, given > 0, rescale time
1
and space so that the path becomes t R (t), where R (t) ≡ 2 R(−1 t). I
will show that, as & 0, the distribution of {R (t) : t ≥ 0} becomes Brownian
motion. This model was introduced by Rayleigh and is called his random flights
model.
In the following, {τm : m ≥ 1} is a sequence of mutually independent, unit
exponential random variables from which their partial sums {Tn : n ≥ 0} and
the associated simple Poisson process {N (t) : t ≥ 0} are defined as in § 4.2.1.
Finally, given > 0, N (t) = N (−1 t).
Lemma 9.3.3. Let {Xn : n ≥ 1} a sequence of mutually independent RN -
valued, uniformly square P-integrable random variables with mean value 0 and
covariance I, and define {Sn (t) : t ≥ 0} accordingly, as in Theorem 9.3.1. (Note
that the Xn ’s are not assumed to be independent of the τn ’s.) Next, define
N (t,ω)
√ X
X (t, ω) = Xm , (t, ω) ∈ [0, ∞) × Ω.
m=1
But, by Theorem 9.3.1 and the converse statement in Theorem 9.1.9, we know
that the first term tends to 0 as & 0 uniformly in δ ∈ (0, 1] and that the third
398 9 Convergence of Measures on a Polish Space
term tends to 0 as δ & 0 uniformly in ∈ (0, 1]. Thus, all that remains is to
note that, by Exercise 4.2.19,
!
(9.3.4) lim P sup N (t) − t ≥ δ = 0.
&0 t∈[0,T ]
Hence, by Lemma 9.3.3 and Theorems 9.3.1 and 9.1.13, all that we have to do
is check that !
√
lim P sup XN (t)+1 ≥ r = 0
&0 t∈[0,T ]
for every r ∈ (0, ∞) and T ∈ [0, ∞). To this end, set T = 1+T . Then, by
(9.3.4), we have that
!
√
r
lim P sup XN (t)+1 ≥ r = lim P max |Xn+1 | ≥ √
&0 t∈[0,T ] &0 0≤n≤T
14
√ 1
P X 4 M (2 + T ) 4
≤ lim E |Xn+1 | ≤ lim = 0.
&0 r &0 r
0≤n≤T
Exercise for § 9.3 399
Show that there is a µ ∈ M1 C(RN ) to which {µn : n ≥ 1} converges in
M1 C(RN ) if and only if, for each T ∈ (0, ∞), there is a µT ∈ M1 C([0, T ]; RN )
with the property that
µTn =⇒ µT in M1 C([0, T ]; RN ) ,
Prove their result as an application of Donsker’s Theorem and part (iii) of Ex-
ercise 4.3.11. According to Kac, it was G. Uhlenbeck who first suggested that
their result might be a consequence of a more general “invariance” principle.
Exercise 9.3.8. Here is another version
of Rayleigh’s random
flights model.
Again let {τk : k ≥ 1}, Tm : m ≥ 0 , and N (t) : t ≥ 0 be as in § 4.2.2, and
set Z t
√
(−1)N (s) ds and R (t) = R t .
R(t) =
0
Show that R ∗ P =⇒ W (1) as & 0.
Hint: Set βk = 0 or 1 according to whether k ∈ N is even or odd, and note that
n
X n
X X
(−1)k τk =
βk τk+1 − τk − βn τn = τ2k − τ2k−1 − βn τn+1 .
k=1 k=1 1≤k≤ n
2
In this chapter I will give a somewhat sketchy survey of the bridge between
Brownian motion and partial differential equations. Like all good bridges, it
is valuable when crossed starting at either end. For those starting from the
probability side, it provides a computational tool with which the evaluation of
many otherwise intractable Wiener integrals is reduced to finding the solution to
a partial differential equation. For aficionados of partial differential equations,
it provides a representation of solutions that often reveals properties that are
not at all apparent in more conventional, purely analytic, representations.
§ 10.1 Martingales and Partial Differential Equations
The origin of all the connections between Brownian motion and partial differen-
tial equations is the observation that the Gauss kernel
N |x|2
(10.1.1) g (N ) (t, x) = (2πt)− 2 e− 2t , (t, x) ∈ (0, ∞) × RN ,
is simultaneously the density for the Gaussian distribution γ0,tI and the solution
to the heat equation ∂t u = 12 ∆u in (0, ∞) × R with initial condition δ0 . More
precisely, if ϕ ∈ Cb (RN ; R), then
Z
uϕ (t, x) = g (N ) (t, y − x)ϕ(y) dy
RN
is the one and only bounded u ∈ C 1,2 (0, ∞) × RN ; R that solves the Cauchy
initial value problem
400
§ 10.1 Martingales and Partial Differential Equations 401
Brownian motion, for each T > 0, u(T −t∧T, x+B(t∧T )Ft , P is a martingale.
Thus,
Z
u(T, x) = EP ϕ B(T ) = ϕ(x + y) γ0,tI (dy) = uϕ (T, x).
RN
In Theorem 10.1.2, I will prove a refinement of Theorem 7.1.6 that will enable
me (cf. the discussion following Corollary 10.1.3) to remove the assumption that
the derivatives of u are bounded.
As the preceding line of reasoning indicates, the advantage that probability
theory provides comes from lifting questions about a partial differential equa-
tion to a pathspace setting, and martingales provide one of the most powerful
machines with which to do the requisite lifting. In this section I will refine and
exploit that machine.
§ 10.1.1. Localizing and Extending Martingale Representations. The
purpose of this subsection is to combine Theorems 7.1.6 and 7.1.17 with Corollary
7.1.15 to obtain a quite general method for representing solutions to partial
differential equations as Wiener integrals.
For the purposes of this chapter, it is best to think of Wiener measure
W (N )
N N
as a Borel measure on the Polish space C(R ) ≡ C [0, ∞); R and to take
{Ft : t ≥ 0} with Ft = σ {ψ(τ ) : τ ∈ [0, t]} as the standard choice of a
non-decreasing family of σ-algebras. The reason for using C(RN ) instead of (cf.
(N )
§ 8.1.3) Θ(RN ) is that we will want to consider the translates Wx of W (N ) by
(N )
x ∈ RN . That is, Wx is the distribution of ψ x + ψ under W (N ) . Since it
(N )
is clear that the map x ∈ R 7−→ Wx ∈ M1 C(RN ) is continuous, there is
N
Z t∧ζsG (ψ)
E (τ, ψ)f s + τ, ψ(τ ) , Ft , Wx(N )
V
−
0
402 10 Wiener Measure and P.D.E.’s
Thus, if Z t
En (t, ψ) = exp Vn τ, ψ(τ ) dτ ,
0
and therefore
Z t
En (t, ψ)Mn (t, ψ) − En (τ, ψ)Mn (τ, ψ)Vn (τ, ψ) dτ
0
Z t
= En (t, ψ)wn t, ψ(t) − En (τ, ψ)fn τ, ψ(τ ) dτ,
0
is a martingale.
Finally, define ζ0Gn for Gn in the same way as ζ0G was defined for G. Since
fn ≥ f on Gn , an application of Theorem 7.1.15 gives the desired result with
ζ0Gn in place of ζ0G , and, because ζ0Gn % ζ0G , this completes the proof.
Perhaps the most famous application of Theorem 10.1.2 is the Feynman–Kac
formula,1 a version of which is the content of the following corollary.
Corollary 10.1.3. Let V : [0, T ] × RN −→ R be a Borel measurable function
that is uniformly bounded above everywhere
and bounded below uniformly on
compacts. If u ∈ C 1,2 (0, T ) × RN ; R is bounded and satisfies the Cauchy initial
value problem
Proof: Given Theorem 10.1.2, there is hardly anything to do. Indeed, here
G = (0, T ) × RN and so ζ0G = T . Thus, by Theorem 10.1.2 applied to w(t, · ) =
u(T − t, · ), we know that
R t∧T
V (τ,ψ(τ )) dτ
e 0 u T − t ∧ T, ψ(t)
Z t∧T Rτ
V (σ,ψ(σ)) dσ
f τ, ψ(τ ) dτ, Ft , Wx(N )
− e 0
is a martingale. Hence,
Rt
W (N ) V (τ,ψ(τ )) dτ
u(T, x) = lim E e 0 u T − t, ψ(t)
t%T
Z t Rτ
V (σ,ψ(σ)) dσ
+ e 0 f τ, ψ(τ ) dτ ,
0
1In the same spirit as he wrote down (8.1.4), Feynman expressed solutions to Schrödinger’s
equation in terms of path-integrals. After hearing Feynman lecture on his method, Kac realized
that one could transfer Feynman’s ideas from the Schrödinger to the heat context and thereby
arrive at a mathematically rigorous but far less exciting theory.
404 10 Wiener Measure and P.D.E.’s
(N )
h R ζn (ψ) i
Wx V (−τ,ψ(τ )) dτ
w(0, x) ≥ E e 0 w ζn (ψ), ψ(ζn ) ,
where ζn (ψ) = inf{t ≥ 0 : t, ψ(t) ∈ fn } ∧ n. Moreover, by (10.1.5), for
/ G
(N )
Wx -almost every ψ, −ζn (ψ), ψ(ζn ) tends to a point in {(t, x) ∈ ∂G : t < 0}
as n → ∞, and therefore
everywhere on G.
Proof: Again, without loss in generality, I assume that s = 0. In addition,I
may and will assume that x = 0, V is uniformly bounded, and u ∈ Cb G; [0, ∞) .
To see that these latter assumptions cause no loss in generality, one can use an
exhaustion argument of the same sort as was used in the proof of Theorem 10.1.2.
N
Given (t, ψ) ∈ (0, ∞)×C(R
) with ψ(0) = 0 and −τ, ψ(τ ) ∈ G for τ ∈ [0, t],
suppose that u −t, ψ(t) > 0. In order to get a contradiction, choose r > 0 so
that u(−t, y) ≥ r if |y − ψ(t)| ≤ r and so that −τ, ψ 0 (τ ) ∈ G if τ ∈ [0, t] and
≥ re−tkV ku k W (N ) {ψ 0 : kψ 0 − ψk[0,t] ≤ r} .
required contradiction.
Turning to the final assertion, take G = R × G, and observe that for all
(x, y) ∈ G2 there is a ψ such that ψ(0) = x, ψ(1) = y, and ψ(τ ) ∈ G for all
τ ∈ [0, 1].
At first glance, one might think that the strong minimum principle overshad-
ows the weak minimum principle and makes it obsolete. However, that is not
entirely true. Specifically, before one can apply the strong minimum principle,
one has to know that a minimum is actually achieved. In many situations,
continuity plus compactness provide the necessary existence. However, when
compactness is absent, special considerations have to be brought to bear. The
weak minimum principle does not suffer from this problem. On the other hand,
it suffers from a related problem. Namely, one has to know ahead of time that
(10.1.5) holds. As we will see below, this is usually not too serious a problem,
but it should be kept in mind.
§ 10.1.3. The Hermite Heat Equation. In the preceding subsection I gave
an example of how probability theory can give information about solutions to
partial differential equations. In this subsection, it will be a differential equation
that gives us information about probability theory. To be precise, I, following M.
Kac, will give in this subsection his derivation of the formulas that we derived
406 10 Wiener Measure and P.D.E.’s
by purely Gaussian techniques in Exercise 8.2.16, and in the next section I will
give his treatment of a closely related problem.2
Closed form solutions to the Cauchy initial value problem are available for
very few V ’s, but there is a famous one for which they are. Namely, when
V = − 12 |x|2 , a great deal is known. Indeed, already in the nineteenth century,
Hermite knew how to analyze the operator 12 ∆− 12 |x|2 . As a result, this operator
is often called the Hermite operator by mathematicians, although physicists
call it the harmonic oscillator because it arises in quantum mechanics as minus
the Hamiltonian for an oscillator that satisfies Hook’s law. Be that as it may,
set (cf. (10.1.1))
1 − e−2t
N t+|x|2 |y|2
− (N ) −t
(10.1.7) h(t, x, y) = e 2 g ,y − e x e 2
2
for (t, x, y) ∈ (0, ∞) × RN × RN . By using the fact that g (N ) solves the heat
equation and tends to δ0 as t & 0, one can apply elementary calculus to check
that
|x|2
Rt
(N )
Wx − 12 |ψ(τ )|2 dτ − N2
E e 0 = cosh t exp − tanh t ,
2
which, together with Brownian scaling, vastly generalizes the result in Exercise
8.2.16.
2See Kac’s “On some connections between probability theory and differential and integral
equations,” Proc. 2nd Berkeley Symp. on Prob. & Stat. Univ. of California Press (1951),
where he gives several additional, intriguing applications of Corollary 10.1.3.
§ 10.1 Martingales and Partial Differential Equations 407
§ 10.1.4. The Arcsine Law. As I said at the beginning of the last subsection,
there are very few V ’s for which one can write down explicit solutions to equa-
tions of the form ∂t u = 12 ∆u + V u. On the other hand, when V is independent
of time one can often, particularly whenRN = 1, write down a closed form ex-
∞
pression for the Laplace transform Uλ = 0 e−λt u(t, · ) dt of u. Indeed, if u is a
bounded solution to ∂t u = 12 ∆u + V u, then it is an elementary exercise to check
that
λ − 12 ∆ − V Uλ = f,
The preceding remark is the origin of Kac’s derivation of Lévy’s Arcsine Law
for Wiener measure.
Theorem 10.1.8. For every T ∈ (0, ∞) and α ∈ [0, 1],
( )!
Z T √
(1) 1 2
W ψ ∈ C(R) : 1[0,∞) ψ(t) dt ≤ α = arcsin α .
T 0 π
Proof: First note that, by Brownian scaling, it suffices to prove the result when
T = 1. Next, set
Z 1
F (α) = W ψ ∈ C(R) : 1[0,∞) ψ(s) ds ≤ α , α ∈ [0, ∞),
0
and let µ denote the element of M1 [0, ∞) for which F is the distribution
function. We are going to compute F (α) by looking at the double Laplace
transform Z
G(λ) ≡ e−λt g(t) dt, λ ∈ (0, ∞),
(0,∞)
where Z
g(t) ≡ e−tα µ(dα), t ∈ (0, ∞);
[0,∞)
408 10 Wiener Measure and P.D.E.’s
At this point, the strategy is to calculate G(λ) with the help of the idea
explained above. For this purpose, I begin by seeking as good a solution x ∈
R 7−→ uλ (x) ∈ R as I can find to the equation 12 u00 + Vλ u = −1. By considering
this equation separately on the left and right half-lines and then matching, in so
far as possible, at 0, one finds that the best choice of bounded uλ will be to take
h p i
1
Aλ exp − 2(1 + λ) x + 1+λ
if x ∈ [0, ∞)
uλ (x) = h√ i
Bλ exp 2λ x + 1
if x ∈ (−∞, 0),
λ
where
12 12
1 1 1 1
Aλ = − and Bλ = − .
λ(1 + λ) 1+λ λ(1 + λ) λ
1 00
fn ≡ uλ,n − λ + 1[0,∞) uλ,n −→ −1 on R \ {0}.
2
Thus, since the argument that I attempted to apply to uλ works for uλ,n , we
know that
Z ∞ R t
W (1) Vλ (ψ(τ )) dτ
uλ,n (0) = E e 0 fn ψ(t) dτ dt .
0
§ 10.1 Martingales and Partial Differential Equations 409
In addition, because
Z ∞ Z ∞
W (1)
E 1{0} ψ(t) dt = γ0,t {0} dt = 0,
0 0
Z ∞ Rt
W (1) Vλ (ψ(τ )) dτ
E e 0 fn ψ(t) dt −→ G(λ).
0
Hence, the conclusion uλ (0) = G(λ) has now been rigorously verified.
− 1
Knowing that G(λ) = λ(1−λ) 2 , the rest of the calculation is easy. Indeed,
since Z ∞ r
− 12 −λt π
t e dt = ,
0 λ
the multiplication rule for Laplace transforms tells us that
t 1
e−s e−tα
Z Z
1 1
g(t) = p ds = p dα;
π 0 s(t − s) π 0 α(1 − α)
1 α∧1
Z
1 2 √
F (α) = p dβ = arcsin α ∧ 1 .
π 0 β(1 − β) π
√
Nn (ω) 2
lim P ω: ≤α = arcsin α ,
n→∞ n π
Pm
where Nn (ω) is the number of m ∈ Z+ ∩ [0, n] for which Sm (ω) ≡ `=1 X` (ω)
is non-negative.
Nn (ω)
Proof: Thinking of n as a Riemann approximation to (cf. the notation in
§ 9.2.1)
Z 1
1[0,∞) Sn (t, ω) dt,
0
one should guess that, in view of Theorem 9.3.1 and Theorem 9.1.13, there
should be very little left to be done. However, once again there are continuity
410 10 Wiener Measure and P.D.E.’s
issues that have to be dealt with. Thus, for each f ∈ C R; [0, 1] and n ∈ Z+ ,
introduce the functions F f and Fnf on C(R) given by
Z 1 n
1 X
F f (ψ) = and Fnf (ψ) = m
f ψ(t) dt f ψ n
0 n m=1
for any f ∈ C R; [0, 1] . Since Fnf −→ F f uniformly on compacts, Theorem
9.3.1 plus Lemma 9.1.10 show that the distribution of
n
1 X Sm (ω)
ω ∈ Ω 7−→ Afn (ω) ≡ f 1
n m=1 n2
under P tends weakly to that of ψ ∈ C(R) 7−→ F f (ψ) under W (1) . Next, for
each δ ∈ (0, ∞), choose continuous functions fδ± so that 1(δ,∞) ≤ fδ+ ≤ 1[0,∞)
and 1[0,∞) ≤ fδ− ≤ 1[−δ,∞) , and conclude that
Nn
(1)
+
fδ
lim P ≤α ≤W F ≤α
n→∞ n
and
Nn
(1)
−
fδ
lim P <α ≥W F <α
n→∞ n
and
Z 1
Nn (1)
lim P <α ≥W ψ: 1[0,∞) ψ(t) dt < α .
n→∞ n 0
Finally, since
Z Z 1 Z 1
(1)
W (1) ψ(t) = 0 dt = 0,
1{0} ψ(t) dt W (dψ) =
0 0
√
and α ∈ [0, 1] 7−→ arcsin α is continuous, the asserted result follows.
§ 10.1 Martingales and Partial Differential Equations 411
Remark 10.1.10. The renown of the Arcsine Law stems, in large part, from the
following counterintuitive deduction that can be drawn from it. Namely, given
δ ∈ 0, 12 , guess which α maximizes limn→∞ P Nnn ∈ (α − δ, α + δ) mod1 for
a fixed δ. Because of The Law of Large Numbers (in more common parlance,
“The Law of Averages”), most people are inclined to guess that the maximum
should occur at α = 12 . Thus, it is surprising that, since
1
α ∈ [0, 1] 7−→ p ∈ [0, ∞]
α(1 − α)
is convex and has its minimum at 12 , the Arcsine Law makes the exact opposite
prediction! The point is, of course, that the sequence of partial sums {Sn (ω) :
n ≥ 1} is most likely to make long excursions above and below 0 but tends to
spend relatively little time in a neighborhood of 0. In other words, although
one may be correct to feel that “my luck has got to change,” one had better be
prepared to wait a long time.
A more technical point is one raised by S. Sternberg. The arcsine distribution
is familiar to people who study iterated maps and is important to them because
(cf. Exercise 10.1.15) it is the one and only absolutely continuous probability
distribution on [0, 1] that is invariant under x ∈ [0, 1] 7−→ 4x(1 − x) ∈ [0, 1].
Sternberg asked whether a derivation R 1 of Theorem 10.1.8 can be
R 1based on this
invariance property. Taking T+ = 0 1[0,∞) ψ(s) ds and S = 0 sgn ψ(s) ds,
and noting that 4T+ (1 − T+ ) = 1 − S 2 , one way to phrase Sternberg’s question
is to ask is whether there is a pure thought way to check that T+ and 1 − S 2
have the same distribution under W (1) and that that distribution is absolutely
continuous. I have posed this problem to several experts but, as yet, none of
them has come up with a satisfactory solution.
ψ ∈ C(RN ).
ζr (ψ) = inf t ∈ [0, ∞) : |ψ(t)| = r ,
Then
(N ) r2 − |x|2
EWx ζr =
N
for |x| < r.
(N ) (N + 4)r2 − N |x|2 2
EWx ζr2 = r − |x|2
2
N (N + 2)
412 10 Wiener Measure and P.D.E.’s
In particular,
and
N −2
r
Wx(N )
ζr < ∞ = , 0 < r < |x|, when N ≥ 3.
|x|
Proof: To prove the first two equalities, set f (t, x) = |x|2 − N t, use Theorem
10.1.2 to show that
f t ∧ ζr , ψ(t ∧ ζr ) , Ft , Wx(N )
and
!
2
Z t∧ζr
f t ∧ ζr , ψ(t ∧ ζr ) − 4 |ψ(s)|2 ds, Ft , Wx(N )
0
and
"Z #
(N ) (N )
t∧ζr
Wx
2
(t ∧ ζr )2 =|x|4 + 4EWx 2
N E |ψ(s)| ds
0
(N )
h 2 i h
(N ) 4 i
+ 2N EWx (t ∧ ζr ) ψ(t ∧ ζr ) − EWx ψ(t ∧ ζr )
§ 10.1 Martingales and Partial Differential Equations 413
(N )
for all t ∈ [0, ∞). Now assume that |x| ≤ r, and use the first of these N EWx [ζr ] ≤
(N ) (N )
r2 . Thus Wx (ζr < ∞) = 1, and so N EWx [ζr ] = r2 −|x|2 follows when t → ∞.
To get the second equality, use Theorem 10.1.2 to show that
Z t∧ζr !
4 2
ψ(s) ds, Ft , Wx(N )
ψ(t ∧ ζr ) − (4 + 2N )
0
as required. Given this, the rest of the theorem follows easily when one lets
R % ∞ and, in the case when N = {1, 2}, r & 0.
The second part of Theorem 10.1.11 says something significant about the
global behavior of Brownian paths and the dependence of that behavior on
dimension. Namely, when N ∈ {1, 2}, it says that, no matter where it is started,
a Brownian path will hit any non-empty open set with probability 1. As will be
shown in Theorem 10.2.3, this property implies the seemly stronger statement
that, with probability 1, a Brownian path will visit every non-empty open set
infinitely often and will spend infinite time in each. For this reason, Brownian
motion in one and two dimensions is said to be recurrent. By contrast, when
N ≥ 3, Theorem 10.1.11 says that, with positive probability, a Brownian path
will never visit a closed ball in which it was not started. Moreover, if it is started
outside of a ball, then the probability of its ever hitting that ball goes to 0 as
the diameter of the set goes to 0. As I am about to show, this latter property
leads to the conclusion that, with probability 1, a Brownian path in three or
more dimensions tends to infinity.
414 10 Wiener Measure and P.D.E.’s
Proof: Given r > 0, apply Theorem 10.1.2 to see that (cf. the notation in
Theorem 10.1.11)
ψ(t ∧ ζr )−N +2 , Ft , Wx(N )
is a bounded, non-negative martingale for every |x| > r > 0. Hence, by Theorem
7.1.14, for any 0 ≤ s ≤ t < ∞ and A ∈ Fs ,
h i
ψ(s)−N +2 , A ∩ ζr (ψ) > s
(N )
|x|−N +2 ≥ EWx
h
(N ) −N +2 i
= EWx ψ t ∧ ζr
, A ∩ ζr (ψ) > s ;
(N )
and, because N ≥ 3 and therefore ζr % ∞ a.s., Wx as r & 0, an application
of the Monotone Convergence Theorem and Fatou’s Lemma leads to
h i h i
ψ(s)−N +2 , A ≥ EWx ψ(t)−N +2 , A
(N ) (N )
|x|−N +2 ≥ EWx
(iii) Now add the assumption that µ λ[0,1] , let f be the corresponding Radon–
Nikodym derivative, and extend f to R by taking f = 0 off of [0, 1]. Given
0 ≤ x < x + y ≤ 1, conclude that
Z
F (x + y) − F (x) − F (y) ≤ f t + x2−n − f (t) dt −→ 0
R
as n → ∞. In other words, F (x + y) = F (x) + F (y) whenever 0 ≤ x <
x + y ≤ 1. Finally, after combining this with the facts that F (0) = 0, F (1) = 1,
and F is continuous, conclude that F (x) = x for x ∈ [0, 1]. In view of part
(i), this completes the proof that the arcsine distribution admits the asserted
characterization.
(vi) To see that absolute continuity is absolutely essential in the preceding con-
+
siderations, consider any Borel probability measure M on {0, 1}Z that is sta-
tionary in the sense that the M -distribution of
+ +
ω ∈ {0, 1}Z 7−→ (ω2 , . . . , ωn+1 , . . . ) ∈ {0, 1}Z
is again M . Show that the M -distribution µ of
∞
+ X
ω ∈ {0, 1}Z 7−→ 2−n ωn ∈ [0, 1]
n=1
is invariant under x 2x mod 1. In particular, this means that, for each
p ∈ (0, 1) \ { 12 }, the µp described in Exercise 1.4.29 is a non-atomic, Borel
probability measure on [0, 1] that is invariant under x 2x mod 1 but singular
to Lebesgue measure.
§ 10.2 The Markov Property and Potential Theory
In this section I will discuss the Markov property for Wiener measure and show
how it can be used as a tool for connecting Brownian motion to partial differential
equations.
§ 10.2.1. The Markov Property for Wiener Measure. The introduction
(N )
of the translates Wx ’s facilitates the statement of the following important in-
terpretation of Theorem 7.1.16. In its statement, and elsewhere, Σt : C(RN ) −→
C(RN ) is the time-shift map determined by Σt ψ(τ ) = ψ(t + τ ), τ ∈ [0, ∞),
and when ζ is a stopping time, Σζ is the map on {ψ : ζ(ψ) < ∞} −→ C(RN )
given by Σζ ψ(τ ) = ψ ζ(ψ) + τ .
Theorem 10.2.1. If ζ is a stopping time and F : C(RN ) × C(RN ) −→ [0, ∞)
is a Fζ × FC(RN ) -measurable function, then
Z
F ψ, Σζ ψ Wx(N ) (dψ)
{ψ:ζ(ψ)<∞}
(10.2.2) Z Z !
0 (N )
= F (ψ, ψ ) Wψ(ζ) (dψ 0 ) Wx(N ) (dψ).
C(RN )
{ψ:ζ(ψ)<∞}
§ 10.2 The Markov Property and Potential Theory 417
Proof: Given Theorem 7.1.16, the proof is mostly a matter of notation. In the
first place, by replacing F (ψ, ψ 0 ) with F (x + ψ, ψ 0 ), one can reduce to the case
when x = 0. Thus, I will assume that x = 0. Secondly, Σζ ψ = ψ(ζ) + δζ ψ if
ζ(ψ) < ∞. Hence,
Z Z
(N )
F ψ, ψ(ζ) + δζ ψ W (N ) (dψ).
F ψ, Σζ ψ W (dψ) =
{ψ:ζ(ψ)<∞} {ψ:ζ(ψ)<∞}
Fζ × BC(RN ) -measurable, and apply Theorem 7.1.16 to reach the desired conclu-
sion.
Theorem 10.2.1 is a statement of the Markov property for Wiener measure.
More precisely, because it involves stopping times, and not just fixed times, it is
often called the strong Markov property.
§ 10.2.2. Recurrence in One and Two Dimensions. As my first application
of the Markov property, I will prove the statement made following Theorem
10.1.11 about the recurrence of Brownian motion when N ∈ {1, 2}.
Theorem 10.2.3. If N ∈ {1, 2}, then, for all x ∈ RN ,
Z ∞
Wx(N ) 1B(c,r) ψ(t) dt = ∞ for all c ∈ RN and r ∈ (0, ∞) = 1.
0
Theorem 10.2.1,
(N )
Wx(N ) ζ 2n+1 ≤ t Fζ 2n = Wψ(ζ 2n ) ζ B(0,r) ≤ t if ζ2n (ψ) < ∞,
(**) (N )
Wx(N ) ζ 2n ≤ t Fζ 2n−1 = Wψ(ζ 2n−1 ) ζ r2 ≤ t
if ζ2n−1 (ψ) < ∞.
418 10 Wiener Measure and P.D.E.’s
In particular, because N ∈ {1, 2}, Theorem 10.1.11 says that both ζ B(0,r) and
(N )
ζ r2 are Wy -almost surely finite for all y ∈ RN . Thus, by induction, ζ n < ∞
(N )
Wx -almost surely for all n ≥ 0.
Next set
2n+1
ζ (ψ) − ζ 2n (ψ) if ζ 2n (ψ) < ∞
Xn (ψ) ≡
0 if ζ 2n (ψ) = ∞.
(N )
By the preceding, we know that, for each n ≥ 0, Xn > 0 Wx -almost surely.
In addition, it is obvious that
Z ∞ ∞
X
1B(0,r) ψ(t) dt ≥ Xn (ψ).
0 n=0
Hence, if we show that the Xn ’s are mutually independent and identically dis-
(N )
tributed under Wx , then (*) will follow from The Strong Law of Large Num-
bers. But, by (**), we will know that the Xn ’s have both these properties once
(N )
we show that Wy (ζ B(0,r) ≤ t) is the same for all y ∈ RN with |y| = 2r . To
this end, let yi , i ∈ {1, 2} with |yi | = 2r be given, and choose an orthogonal
(N )
transformation O of RN so that y2 = Oy1 . Then, Wy2 is the distribution of
(N ) (N ) (N )
ψ Oψ under Wy1 , and so Wy2 (ζ B(0,r) ≤ t) = Wy1 (ζ B(0,r) ≤ t).
§ 10.2.3. The Dirichlet Problem. There are many ways in which the Markov
property can be used to relate Brownian motion to partial differential equations,
but among the most compelling is the one that was discovered by S. Kakutani
and developed by Doob.1 What Kakutani discovered is that the capacitory
potential (cf. § 11.4.1) of a set K ⊆ R2 at a point x ∈ R2 \ K is equal to the
probability that a Brownian motion started at x ever hits K. What Doob did
is extend Kakutani’s result to RN and show that it is a very special case of a
result that identifies the distribution of the place where a Brownian motion hits
the boundary of a set as the harmonic measure (cf. § 11.1.4) for that set. In this
subsection, I will give a brief introduction to these ideas. A much more thorough
account is given in Chapter 11.
Let G be a non-empty, connected open subset of RN . Given an f ∈ Cb (G; R),
one says that u ∈ C 2 (G; R) solves the Dirichlet problem for f in G if u is
1 Kakutani’s 1944 article, “Two dimensional Brownian motion and harmonic functions,” Proc.
Imp. Acad. Tokyo, 20, together with his 1949 article, “Markoff process and the Dirichlet prob-
lem,” Proc. Imp. Acad. Tokyo, 21, are generally accepted as the first place in which a definitive
connection between harmonic functions and Brownian motion was established. However, it
was not until with Doob’s “Semimartingales and subharmonic functions,” T.A.M.S., 77, in
1954 that the connection was completed. It is ironic that this connection was not made by
Wiener himself. Indeed, Wiener’s early fame as an analyst was based on his contributions to
potential theory. However, in spite of his claims to the contrary, I know of no evidence that
he discovered the connection between his measure and potential theory.
§ 10.2 The Markov Property and Potential Theory 419
Wx(N ) ζ G < ∞ = 1,
(10.2.5)
then
(N )
u(x) = EWx u ψ(ζ G ) , ζ G (ψ) < ∞ .
(10.2.6)
(N )
Hence, u(x) = EWx u ψ(t ∧ ζ G ) , and so, after letting t → ∞ and taking
Hence, the proof of (10.2.7) reduces to the observation that the distribution of
(N )
ψ ∈ {ζ B(x,r) < ∞} 7−→ ψ(ζ B(x,r) ) ∈ ∂B(x, r) under Wx is same as that of
ψ ∈ {ζ B(0,r) < ∞} 7−→ x+ψ(ζ B(0,r) ) under W (N ) and that (cf. Exercise 4.3.10)
the distribution of ψ ∈ {ζ B(0,r) < ∞} 7−→ ψ(ζ B(0,r) ) under W (N ) is rotation
invariant.
Turning to the converse assertion, suppose that u : G −→ R is a locally
bounded, Borel measurable function for which (10.2.7) holds. To see that u ∈
C ∞ (G; R), extend u to RN so that it is 0 off of G, and choose a ρ ∈ Cc∞ R; [0, ∞)
with support in (0, 1) and total integral 1. Using (10.2.7) together with Fubini’s
Theorem, one sees that, as long as B(x, r) ⊂⊂ G,
ZZ 1
u(x) = ρ(t) − u(x + trω) λS N −1 (dω) dt
0 N −1
Z S
1
|y − x|1−N ρ r−1 |y − x| u(y) dy,
=
ωN −1 r RN
from which it is clear that u ∈ C ∞ (G; R). Further, knowing that u is smooth
and satisfies (10.2.7), it is easy to see that it is harmonic. Indeed, by Taylor’s
Theorem, we know that
Z Z
− u(x + rω) λSN −1 (dω) − u(x) = − u(x + rω) − u(x) λSN −1 (dω)
SN −1 SN −1
r2
= 2 ∆u(x) + o(r2 ),
and, when 1 ≤ i 6= j ≤ N ,
Z Z
ei , ω RN
λSN −1 (dω) = ei , ω RN
ej , ω RN
λSN −1 (dω) = 0.
SN −1 SN −1
Hence, after dividing through by r2 and letting r & 0, we see that (10.2.7)
implies ∆u(x) = 0.
§ 10.2 The Markov Property and Potential Theory 421
Thus, if (10.2.5) holds for all x ∈ G and we are going to solve the Dirichlet
problem for f , then we have no choice but to show that the u given by (10.2.8)
is a solution. Furthermore, because of the last part of Theorem 10.2.4, we already
know that this u is harmonic in G. Thus, all that remains is to find conditions
under which the u in (10.2.8) will take the correct boundary values.
It should be reasonably clear, and will be verified shortly (cf. Theorem 10.2.14),
that if f is continuous at a ∈ ∂G and if
(10.2.10) ζsG = s + ζ G ◦ Σs G
and ζ0+ G
≥ s =⇒ ζ0+ = ζsG .
Lemma 10.2.11. Regularity is a local property in the sense that, for each
r ∈ (0, ∞), a ∈ ∂reg G if and only if a ∈ ∂reg G ∩ B(a, r) . Furthermore,
and so ∂reg G is Borel measurable. Finally, if a ∈ ∂reg G, then, for each δ > 0,
(N ) G G
(10.2.13) lim
x→a
W x ζ , ψ(ζ ) ∈ (0, δ) × B(a, δ) = 1.
x∈G
Proof: Set G(a, r) = G ∩ BRN (a, r). Since it is obvious that ζ G(a,r) is domi-
nated by ζ G , there is no question that a ∈ ∂reg G =⇒ a ∈ ∂reg G(a, r). On the
other hand, if a ∈ ∂reg G(a, r) and > 0, then, for all 0 < δ < ,
(N ) G
is upper semicontinuous for all δ ≥ 0. In particular, if Wa ζ0+ > 0 = 0, then,
because ζ G (ψ) = ζ0+
G
(ψ) when ψ(0) ∈ G, it follows that
for every δ > 0. To prove the converse, suppose that a ∈ ∂reg G, let positive
and δ be given, and choose r > 0 so that
Then, by the second part of (10.2.10), the Markov property, and (4.3.13), for
each s ∈ (0, δ) one has
(N )
h i
(N )
Wa(N ) ζ0+
G
≥ 2δ ≤ EWa Wψ(s) ζ G ≥ δ , ψ(s) ∈ G
r2
/ B(a, r) ≤ + 2N e− 2N s ,
≤ + Wa(N ) ψ(s) ∈
§ 10.2 The Markov Property and Potential Theory 423
(N ) G
from which Wa ζ0+ > 0 = 0 follows when first s & 0 and then & 0.
Now, assume that a ∈ ∂reg G, and observe that, for each 0 < < δ,
Wx(N ) ψ ζ G ∈ / B(a, δ) or ζ G ≥ δ
!
≤ Wx(N ) ζ G ≥ + Wx(N ) sup |ψ(t) − a| ≥ δ .
t∈[0,]
δ2
lim Wx(N ) G G
x→a
ψ ζ ∈
/ B(x, δ) or ζ ≥ δ ≤ 2N exp − ,
x∈G
2N
At the same time, define L(f ) to be the set of v : G −→ R such that −v ∈ U(−f ).
Finally, given a ∈ ∂G, say that a admits
a barrier if, for some r > 0, there
exists an η ∈ C 2 G ∩ B(a, r); (0, ∞) such that
lim
x→a
η(x) = 0 and ∆η ≤ − for some > 0.
x∈G∩B(a,r)
424 10 Wiener Measure and P.D.E.’s
and that if Hf (x) denotes this common value, then x Hf (x) is a bounded
harmonic function on G with the property that
where, in the passage to the second to last line, I have used the fact, established
earlier, that the exit place from a ball of a Brownian path started at its center
is uniformly distributed. Hence, by Fatou’s Lemma and the boundary condition
satisfied by w,
(N )
w(x) ≥ lim EWx w ψ(ζ n ) , ζ n (ψ) < ∞
n→∞
(N )
≥ EWx f ψ(ζ G ) , ζ G (ψ) < ∞ = u(x).
Thus, we have now shown that w ≥ u for all w ∈ U(f ). Of course, if v ∈ L(f ),
then, because −v ∈ U(−f ), we also know that −v ≥ −u and therefore that
v ≤ u.
(N )
I turn next to the second part of the theorem. Set m(x) = EWx [ζ G ], x ∈ G.
Clearly m is positive. Moreover, if m(x) −→ 0 as x → a through G, then a is
regular. Conversely, suppose a is regular. Since
(N ) 1 (N ) 1
m(x) ≤ δ + EWx ζ G , ζ G ≥ δ ≤ δ + Wx(N ) (ζ G ≥ δ) 2 EWx (ζ G )2 2 ,
is chosen so that G ⊆ B(c, R), then ζ G ≤ ζ B(c,R) , and, by the first part of
(N )
EWx (ζ B(c,R) )2 is bounded
Theorem 10.1.11 and translation invariance, x
on B(c, R). Hence, we now know that a ∈ ∂G is regular if and only if m(x) −→
0 as x → a through G. To complete the proof at this point, set m̃(x) =
(N )
EWx [ζ B(c,R) ], and observe that, since ζ B(c,R) = ζ G + ζ B(c,R) ◦ Σζ G when ζ G <
∞,
(N )
m̃(x) − m(x) = EWx m̃ ψ(ζ G ) , ζ G (ψ) < ∞ .
Thus m̃−m is harmonic on G and so, by the first part of Theorem 10.1.11, ∆m =
∆m̃ = −2 on G. Hence, if a is regular, then m is a barrier at a. Conversely,
suppose that a admits a barrier η ∈ C 2 G ∩ B(a, r); (0, ∞) . Because of the
locality property proved in Lemma 10.2.11, I will, without loss in generality,
assume that B(a, r) ⊇ G. Choose a sequence {Gn : n ≥ 1} of open sets so that
Ḡn ⊂⊂ Gn+1 for each n and Gn % G. Then, by Theorem 10.1.2, for x ∈ Gn
and t ≥ 0,
(N )
η(x) ≥ η(x) − EWx η ψ(t ∧ ζ Gn )
t∧ζ Gn (ψ)
"Z #
(N ) Wx(N )
= −EWx 1
t ∧ ζ Gn .
2 ∆η ψ(τ ) dτ ≥ 2 E
0
Hence, after letting first t and then n tend to infinity, we see that m(x) ≤ 2 η(x)
for all x ∈ G; and, since η(x) −→ 0 as x tends to a through G, it follows that
a ∈ ∂reg G.
426 10 Wiener Measure and P.D.E.’s
The argument used to prove the first part of Theorem 10.2.15 is a probabilistic
implementation of what analysts call the “balayage” procedure for solving the
Dirichlet problem.
Exercises for § 10.2
Exercise 10.2.16. Suppose that G is a non-empty, open subset of RM ×RN and
that (x, y) ∈ G 7−→ u(x, y) ∈ R is a Borel measurable function that is harmonic
with respect to x and y separately (i.e., u( · , y) is harmonic on {x : (x, y) ∈ G}
for each y ∈ G and u(x, · ) is harmonic on {y : (x, y) ∈ G} for each x ∈ G).
Assuming that u is bounded below on compact subsets of G, show that u is
harmonic on G.
Hint: Clearly, all that one has to show is that u is smooth on G. In addition,
without loss in generality, one can assume that u can be extended to RM × RN
as a non-negative, Borel measurable function. Making this assumption, proceed
as in the proof of Theorem 10.2.4 to show that if ρ ∈ Cc∞ (0, 1); R has total
integral 1 and BRM (x, r) × BRN (y, r) ⊂⊂ G, then u(x, y) equals
ZZ
1 1−M 1−N −1
−1
|x−ξ| |y−η| ρ r |x−ξ| ρ r |y−η| u(ξ, η) dxdη.
ωM −1 ωN −1 r2
RM ×RN
|B(a, r) ∩ G{|
lim > 0,
r&0 |B(a, r)|
where |Γ| denotes the Lebesgue measure of Γ ∈ BRN . Show that a is regular. In
particular, because, for any Borel set Γ, the set of x ∈ Γ with upper Lebesgue
density less than 1 has Lebesgue measure 0, this proves that ∂G \ ∂reg G has
Lebesgue measure 0. (See the Lemma 11.1.9 for another proof of this fact.)
Hint: Show that, for all t > 0,
ΩN e− 12 |B(a, t 12 ) ∩ G{|
Wa(N ) Wa(N )
ζ0+ ≤ t ≥ ψ(t) ∈
/G ≥ N 1 ,
(2π) 2 |B(a, t 2 )|
is contained in G{
(v) If F is a closed subset of RN , r > 0, and G = {x ∈ RN : |x − F | > r},
show that every boundary point of G satisfies the exterior cone condition and is
therefore regular.
428 10 Wiener Measure and P.D.E.’s
(N )
EPn,x f ψ(ζ G ) , ζ G (ψ) < ∞ −→ EWx f ψ(ζ G ) , ζ G (ψ) < ∞
(10.2.21)
Thus, if (10.2.5) holds for all x ∈ G and every a ∈ ∂G has positive, upper
Lebesgue density in RN \ Ḡ, then (10.2.21) holds uniformly for x in compact
subsets of G.
4This type of approximation was carried out originally by H. Phillips and N. Wiener in “Nets
and Dirichlet problem,” J. Math. Phys. 2 in 1923. Ironically, the authors do not appear to have
made the connection between their procedure and probability theory. In 1928, a more complete
analysis was carried out in the famous article “Über die partiellen Differenzengleichungen der
Phsik,” Ann. Math. 5 # 2, of R. Courant, K. Friedrichs, and H. Lewy. Interestingly, these
authors do allude to a possible probabilistic interpretation, although their method (based on
energy considerations) makes no use of probability theory.
§ 10.3 Other Heat Kernels 429
where ζ y (ψ) = inf{t ≥ 0 : ψ(t) ≥ y}. Next, recall from Exercise 7.1.24 that the
1
W (1) -distribution of ζ y is the one-sided, 12 -stable law ν 21 . Thus
22 y
Z
(N +1) N
ψ(ζ H ) ∈ Γ × {0} =
W(x,y) πyR (y − x) dy,
Γ
where Z
N 1
πyR (y) = γ0,tI (y) ν 21 (dt).
(0,∞) 22 y
N
Finally, referring to Exercise 3.3.17, conclude that πyR is the Cauchy distribution
in (3.3.19). This, of course, explains the reason, alluded to in (ii) of Exercise
N
3.3.17, why analysts call πyR the Poisson kernel for the upper half-space.
§ 10.3 Other Heat Kernels
As we saw in § 10.1, from the perspective of someone studying partial differential
equations, the function (t, x, y) ∈ (0, ∞) × RN × RN 7−→ g (N ) (t, y − x) ∈ (0, ∞)
is the heat kernel, or, equivalently, the fundamental solution, to the classical
heat equation ∂t u = 12 ∆u in (0, ∞) × RN . That is, if ϕ ∈ Cb (RN ; R), then
Z
u(t, x) = ϕ(y)g (N ) (t, y − x) dy
RN
is the unique bounded solution to the classical heat equation that tends to ϕ
as t & 0. Of course, from a probabilistic perspective, g (N ) (t, y − x) is the
probability (in the sense of densities) of a Brownian path going from x to y
during a time interval of length t.
In this section I will construct other functions that, on the one hand, are
the fundamental solution to a heat equation and, at the same time, the den-
sity for the probability of a Brownian motion making transitions under various
conditions.
§ 10.3.1. A General Construction. For each t > 0, let Et : C(RN ) −→ [0, ∞)
be a Ft -measurable function with the property that
and define
(N )
h i
q(t, x, y) = EW Et x(1 − `t ) + θt + y`t g (N ) (t, y − x),
(10.3.2)
for (t, x, y) ∈ (0, ∞) × RN × RN ,
where `t (τ ) = τ ∧t N
t , τ ∈ [0, ∞), and θt = θ − θ(t)`t , θ ∈ Θ(R ). Clearly (x, y) ∈
(RN )2 7−→ q(t, x, y) ∈ [0, ∞) is Borel measurable for each t > 0.
My goal in this subsection is to prove the following theorem.
Theorem 10.3.3. For each t ∈ (0, ∞) and Borel measurable ϕ : RN −→ R
that is bounded below,
Z h
(N ) i
ϕ(y)q(t, x, y) dy = EWx Et (ψ)ϕ ψ(t) .
RN
s t tx+sy
where α = s+t , β= s+t , and c = s+t . At the same time, by Exercise 10.3.34,
Z
g (N ) (s, αη + ξ − c)g (N ) (t, βη − ξ + c) dη = g (N ) st
s+t , ξ −c
RN
g (N ) (s, ξ − x)g (N ) (t, η − ξ)
= ,
g (N ) (s + t, y − x)
and so we are done.
To prove the last assertion, simply note that when Et is reversible, q̄(t, x, y)
equals
(N )
h i (N )
h i
EW Et x(1−`t )+θt +y`t = EW Et y(1−`t )+(θ˘t )t +y`t = q̄(t, y, x),
since, by part (ii) of Exercise 8.3.22, θ (θ˘t )t [0, t] has the same distribution
(N )
under W as θ θt [0, t].
§ 10.3.2. The Dirichlet Heat Kernel. Let G be a non-empty, open subset
of RN , and set EtG (ψ) = 1(t,∞) ζ G (ψ) . Obviously Et is Ft -measurable and
(10.3.1) holds. In addition, if pG (t, x, y) is used to denote the associated q given
in (10.3.2), then, pG (t, x, y) = 0 unless x, y ∈ G, and, by Theorem 10.3.3,
Z h i
(N )
ϕ(y)pG (t, x, y) dy = EWx ϕ ψ(t) , ζ G (ψ) > t ,
(10.3.4) G
for (t, x) ∈ (0, ∞) × G,
Z
G
(10.3.5) p (s+t, x, y) = pG (s, x, z)pG (t, z, y) dz, (s, x), (t, y) ∈ (0, ∞)×G,
G
and
(10.3.6). pG (t, x, y) = pG (t, y, x) for (t, x, y) ∈ (0, ∞) × G2 .
In order to show that pG is smooth on (0, ∞) × G2 , I will use the Duhamel
formula contained in the following.
432 10 Wiener Measure and P.D.E.’s
Further, by the same argument as was used to prove the first assertion in The-
orem 10.3.3, for any ϕ ∈ Cc (G; R),
Z h i
(N )
ϕ(y)qα (t, x, y) dy = EWx ϕ ψ(ζ G ) , ζ G (ψ) > αt
G
Z
= ϕ(y)g (N ) (y − x) dy
G
(N )
hZ i
− EWx ϕ(y)g (N ) t − ζ G (ψ), y − ψ(ζ G ) , ζ G (ψ) ≤ αt ,
G
where, in the passage to the second line, I have applied the same reasoning as was
suggested in part (i) of Exercise 7.3.7. Hence, (*) will follow once y q̄α (t, x, y)
qα (t,x,y)
≡ g(N ) (t,y−x) is shown to be continuous on G. To this end, argue as in the last
part of Theorem (10.3.1) and apply the Markov property to show that q̄α (t, x, y)
equals
and
lim sup pG (t, x, y) = 0 for (s, a) ∈ (0, ∞) × ∂reg G.
(t,x)→(s,a) y∈K
x∈G
and therefore
Z
lim sup 1 − pG (t, x, y) dy ≤ lim sup Wx(N ) (ζ G ≤ t),
t&0 x∈K G∩B(x,r) t&0 x∈K
Cn −
|x|2
max ∂xα g (N ) (t, x) ≤
(10.3.10) N +n e
4t
kαk=n (t + |x|2 ) 2
and so we now see (cf. (4.3.13)) that, for some other choice of Cn < ∞,
!
1 1 |y−x|2
(10.3.11) ∂yα pG (t, x, y) ≤ Cn e−
N +n + 16N t
when kαk = n.
Combining (10.3.11) with the symmetry of pG , we have
!
1 1 |y−x|2
(10.3.12) ∂xα pG (t, x, y) ≤ Cn e−
N +n + 16N t .
(t + |y − x|2 ) 2 |x − ∂G|(N +n)
and so, by (10.3.12) and (10.3.11), we see that (x, y) pG (t, x, y) is smooth for
each t ∈ (0, ∞).
To check the assertions about the time derivatives, first observe that for any
ϕ ∈ Cb2 (G; R) and (x, y) ∈ G2 ,
Z
1
lim pG (h, x, y)ϕ(y) dy − ϕ(x) = 12 ∆ϕ(x)
h&0 h
ZG
1 G
lim p (h, x, y)ϕ(x) dx − ϕ(y) = 12 ∆ϕ(y).
h&0 h G
§ 10.3 Other Heat Kernels 435
To see this, use the symmetry of pG to show that the second of these follows
from the first one. To prove the first one, use pG (h, x, y) ≤ g (N ) (h, y − x) and
(10.3.8) to show that, for any ϕ̃ ∈ Cc2 (RN ; R) that equals ϕ in a neighborhood
of x, Z Z
G (N )
p (h, x, y)ϕ(y) dy − ϕ(x) − g (h, y − x)ϕ̃(y) dy
G G
tends to 0 faster than any power of h. Thus, since
Z
1 (N )
g (h, y − x)ϕ̃(y) dy − ϕ(x) −→ 12 ∆ϕ(x),
h G
the assertion is proved. Given the preceding, we know that
1h G i 1 Z
G G G G
p (t + h, x, y) − p (t, x, y) = p (h, x, z)p (t, z, y) dz − p (t, x, y)
h h G
tends to 12 ∆x pG (t, x, y). Thus, ∂t pG (t, x, y) = 12 ∆x pG (t, x, y). Similarly, using
Z Z
pG (t + h, x, y) = pG (t, x, z)pG (h, z, y) dz = pG (h, y, z)pG (t, x, z) dz,
G G
G 1 G
one gets ∂t p (t, x, y) = 2 ∆y p (t, x, y). Finally, assume the result for m, use
(10.3.11) to justify
Z
−m
∂tm pG (t + h, x, y) = 2 pG (h, x, z)∆m G
y p (t, z, y) dz,
G
differentiate this with respect to h, and let h & 0 to arrive at
∂tm+1 pG (t, x, y) = 2−m−1 ∆x ∆m G
y p (t, x, y)
m+1 G
∆x p (t, x, y)
= 2−m−1
∆m G m+1 G
y ∆x p (t, x, y) = ∆y p (t, x, y).
The following result provides the justification for my calling pG the Dirichlet
heat kernel on G.
Corollary 10.3.13. For each ϕ ∈ Cb (G; R), the function
Z
(N )
Wx
G
ϕ(y)pG (t, x, y) dy
u(t, x) = E ϕ ψ(t) , ζ (ψ) > t =
G
is a smooth solution to the boundary value problem
Proof: That the u in the first part is a bounded, smooth solution follows easily
from (10.3.12) and the last part of Theorem 10.3.9. To prove the uniqueness
assertion when ∂G = ∂reg G, choose {Gn : n ≥ 1} to be a non-decreasing
S
sequence of open sets so that Gn ⊆ G and G = n≥1 Gn . Given a bounded
solution u, apply Theorem 10.1.2 to see that, for each n ≥ 1, u(t, x) equals
(N ) (N )
EWx ϕ ψ(t) , ζ Gn (ψ) > t + EWx u t − ζ Gn (ψ), ψ(ζ Gn ) , ζ Gn (ψ) ≤ t
(N ) (N )
= EWx ϕ(ψ(t) , ζ G (ψ) > t + EWx u t − ζ Gn (ψ), ψ(ζ Gn ) , ζ G (ψ) < t
(N )
+ EWx u t − ζ Gn (ψ), ψ(ζ Gn ) − ϕ ψ(t) , ζ Gn (ψ) ≤ t < ζ G (ψ) .
we see that
Z Z t
(N )
V Wx
(10.3.16) ϕ(y)q (t, x, y) dy = E exp V ψ(τ ) dτ ϕ ψ(t)
RN 0
for (t, x) ∈ (0, ∞) × RN and Borel measurable ϕ’s that are bounded below,
Z
V
(10.3.17) q (t, x, y) = q V (s, x, z)q V (t, z, y) dz
RN
§ 10.3 Other Heat Kernels 437
I now want to make an analysis of q V (t, x, · ) which, among other things, will
enable me to show (cf. Corollary 10.3.22) that, under suitable conditions on V ,
the right-hand side of (10.3.20) is necessarily a solution to (10.3.19). For this
reason, I will call q V the Feynman–Kac heat kernel with potential V .
Theorem 10.3.21. Assume that V ∈ C n (RN ; R) is bounded above and that,
for some Cn < ∞,
n
Finally, if n ≥ 2 and m ≤ 2, then
m m
∂tm q V (t, x, y) = 1
2 ∆x + V (x) q V (t, x, y) = 1
2 ∆y + V (y) q V (t, x, y).
k=1 0
(0)
× ∂xα g (N ) (t, y − x),
438 10 Wiener Measure and P.D.E.’s
Rt
V (ψt,x,y (τ )) dτ
where ψt,x,y (τ ) = x + ψt (τ ) + τt (y − x), E V (t, x, y, ψ) = e 0 , and
P` (k)
k=0 α = α. Since, by our hypotheses, each of the integrands in these terms
+
is bounded by a constant times etkV ku , the asserted estimate for ∂xα q V (t, x, y)
follows from this and (10.3.10).
The rest of the proof is similar to, but easier than, that of Theorem 10.3.9.
Specifically, one uses q V (t, x, y) = q V (t, y, x) and
Z
V
qV t
qV t
q (t, x, y) = 2 , x, z 2 , z, y dz
RN
to prove the existence of and estimate for ∂xα ∂yβ q V (t, x, y). Also, knowing these
results about the spacial derivatives, one deals with the time derivatives in the
same way as I did at the end of that theorem. The details are left to the
reader.
Corollary 10.3.22. Let V be as in Theorem 10.3.21, and assume that n ≥ 2.
Then, for each ϕ ∈ Cb (RN ; R), the function
h Rt Z
Wx
(N ) V (ψ(τ )) dτ i
u(t, x) = E e 0 ϕ t) = ϕ(y)q V (t, x, y) dy
RN
is the unique u ∈ C 1,2 (0, ∞) × RN ; R that is bounded on (0, T ) × RN for each
T > 0 and satisfies (10.3.19).
Proof: The only assertion that has not already been proved is that the u
described takes on the correct initial value. However, because q V (t, x, y) ≤
+
ekV ku g (N ) (t, y − x), it is clear that, for each r > 0,
Z
lim sup q V (t, x, y) dy = 0.
t&0 x∈RN B(x,r){
To this end, let ϕ ∈ Cc∞ (RN ; R) be given, and apply symmetry, Theorem 10.1.2,
and Fubini’s Theorem to justify
0 = ϕ, QV1 ρ − ρ L2 (RN ;R) = QV1 ϕ − ϕ, ρ L2 (RN ;R)
Z 1
= QVτ 21 ∆ϕ + V ϕ , ρ 2 N dτ
0 L (R ;R)
Z 1
1
+ V ϕ, QVτ ρ 1
= 2 ∆ϕ dτ = 2 ∆ϕ + V ϕ, ρ L2 (RN ;R) .
0 L2 (RN ;R)
440 10 Wiener Measure and P.D.E.’s
and Z
pρ (s + t, x, y) = pρ (t, z, y)pρ (t, x, z) dz.
RN
Finally, if V ∈ C 2 (RN ; R) satisfies (10.3.23), then x pρ (t, x, y) is twice con-
N
tinuously differentiable for each (t, y) ∈ (0, ∞) × R , y ∂xα pρ (t, x, y) is twice
continuously differentiable for each α with kαk ≤ 2 and (t, x) ∈ (0, ∞) × RN ,
and
∂t pρ (t, x, y) = 12 ∆x pρ (t, x, y) + ∇x (log ρ), ∇x pρ (t, x, y) RN
for all (t, x, y) ∈ (0, ∞) × RN × RN . In particular, for each ϕ ∈ Cb (RN ; R), the
function Z
u(t, x) = ϕ(y)pρ (t, x, y) dy
RN
is the one and only bounded u ∈ C 1,2 (0, ∞) × RN ; R that satisfies
where y0 = x. In fact, if
−1 V Rt
ρ −tλ
V V ((ψ(τ )) dτ
R (t, ψ) = e ρ ψ(0) E (t, ψ)ρ ψ(t) where E (t, ψ) = e 0 ,
then (N )
Pρx (A) = EWx
ρ
R (t), A for all t ≥ 0 and A ∈ Ft .
Finally, x Pρx is continuous, and, for any stopping time ζ,
Z Z Z
Pρx (dψ) 0
) Pρψ(ζ) (dψ 0 ) Pρx (dψ)
F ψ, Σζ ψ = F (ψ, ψ
{ζ(ψ)<∞} {ζ(ψ)<∞}
Z (N )
(N ) (N )
Wψ(s)
EWx Rρ (t) Wx(N ) (dψ) = EWx Rρ (s), A
ρ
Rρ (s, ψ)E
R (s+t), A =
A
for A ∈ Fs . (N )
Determine µt,x ∈ M1 C(RN ) by µt,x (dψ) = R(t, ψ)Wx (dψ). By the pre-
ceding, µt1 ,x Ft1 = µt2 ,x Ft1 for
all 0 ≤ t1 ≤ t2 , and so (cf. Exercise 9.3.6)
there is a unique Pρx ∈ M1 C(RN ) whose restriction to Ft is the same as that
of µt,x for all t ≥ 0.
To see that x Pρx is continuous, it suffices to check that
But clearly this convergence is taking place pointwise for each ψ ∈ C(RN ). In
addition, Rρ (t, · ) ≥ 0 and, for each z ∈ RN , Rρ (t, z + ψ) has W (N ) -integral 1.
Hence, the convergence is also taking place in L1 (W (N ) ; R).
442 10 Wiener Measure and P.D.E.’s
Now suppose that ζ is a stopping time and that ζ ≤ T for some T ∈ (0, ∞).
Then, for any Fζ × FT -measurable F : C(RN )2 −→ R that is bounded below,
Z
F ψ, Σζ ψ Pρx (dψ)
Z
= Rρ ζ(ψ), ψ Rρ (2T − ζ(ψ), Σζ ψ F ψ, Σζ ψ Wx(N ) (dψ)
Z Z
ρ 0 0 (N ) 0
R 2T − ζ(ψ), ψ F (ψ, ψ )Wψ(ζ) (dψ ) Wx(N ) (dψ)
ρ
= R ζ(ψ), ψ
Z Z
ρ 0 ρ 0
F (ψ, ψ ) Pψ(ζ) (dψ ) Wx(N ) (dψ)
= R ζ(ψ), ψ
Z Z
0 ρ 0
= F (ψ, ψ ) Pψ(ζ) (dψ ) Pρx (dψ),
where I have again used (10.2.2) and, in the final step, Hunt’s Theorem (cf.
Theorem 7.1.14) to replace Rρ ζ(ψ), ψ) by Rρ (T, ψ). Starting from this, one
can easily remove the condition that ζ is bounded and extend the result to all
F ’s that are Fζ × BC(RN ) -measurable and bounded below.
To complete the proof, observe that, as a special case of the preceding,
Z
ρ ρ
pρ t, ψ(s), y) dy, A
EPx ϕ ψ(s + t) , A = EPx
RN
(N )
is a martingale. Hence, since Rρ (t), Ft , Wx is a martingale, one can apply
Theorem 7.1.14 to see that
is a martingale for all ϕ ∈ Cc∞ (RN ; R), then B(t), Ft , P is a Brownian motion.
Hence
√ |ξ|2
B(0,R) B(0,R)
exp −1 ξ, B(t ∧ ζ (ψ) RN + t∧ζ (ψ) , Ft , P
2
is a martingale for every R > 0, and so, after letting R → ∞, we know, by
Theorem 7.1.7, that B(t), Ft , P is a Brownian motion.
It is important to be clear about what Lemma 10.3.28 says and what it does not
say. It says that there is a progressively measurable B : [0, ∞) × C(RN ) −→ RN
such that B(t), Ft , P is a Brownian motion and
Z t
b ψ(τ ) dτ, (t, ψ) ∈ [0, ∞) × C(RN ).
(*) ψ(t) = x + B(t, ψ) +
0
then A(b) ∈ BC(RN ) , and I can define the Borel measurable map ϕ ∈ C(RN )
7−→ Xb ( · , ϕ) ∈ C(RN ) given by
ϕ1 , QVt ϕ2 = ϕ2 , QVt ϕ1
(10.3.30) L2 (RN ;R) L2 (RN ;R)
Finally,
ZZ Z Z
V 2 V −N
q (t, x, y) dx dy = q (2t, x, x) dx ≤ (4πt) 2 e2tV (x) dx.
RN RN
RN ×RN
In the language of functional analysis, the last part of Lemma 10.3.31 says
that QVT is Hilbert–Schmidt and therefore compact if e2T V ∈ L1 (RN ; R). As
a consequence, the elementary theory of compact, self-adjoint operators allows
us to make the conclusions drawn in the following theorem.
§ 10.3 Other Heat Kernels 447
Theorem 10.3.32. Assume that eT V ∈ L2 (RN ; R) for some T ∈ (0, ∞). Then
there is a unique ρ ∈ Cb RN ; (0, ∞) ∩ L2 (RN ; R) such that
kρkL2 (RN ;R) = 1 and etλ ρ = QVt ρ for some λ ∈ R and all t ∈ (0, ∞).
which, because q V (T, x, y) > 0 for all (x, y), is possible only if α(T ) > 0 and
ρ never changes sign. Therefore we can be take ρ to be non-negative. But,
if ρ ≥ 0, then, since pρ (T, x, y) > 0 everywhere and α(T )ρ = QVT ρ, ρ > 0
everywhere. Thus, we have now shown that every normalized eigenvector for
QVT with eigenvalue α(T ) is a bounded, continuous function that, after a change
of sign, can be taken to be strictly positive. In particular, if ρ1 and ρ2 were
linearly independent, normalized eigenvectors of QVT with eigenvalue α(T ), then
which means that α(t) = etβ for some β ∈ R, and, because α(T ) = eT λ , this com-
pletes the proof of everything except the final statement, which is an immediate
consequence of Theorem 10.3.21.
If nothing else, Theorem 10.3.32 helps to explain the terminology that I have
been using. In Schrödinger mechanics, the function ρ in Theorem 10.3.32 is
called the ground state because it is the wave function corresponding to the
lowest energy level of the quantum mechanical Hamiltonian − 12 ∆ − V . From
our standpoint, its importance is that it shows that lots of V ’s admit a ground
state.
I turn now to the second method for producing ground states. Namely, sup-
pose that ρ ∈ C 2 RN ; (0, ∞) . Then, it is obvious that 12 ∆ρ + V ρ = 0, where
Theorem 10.3.33. Let U ∈ C 2 (RN ; R), and assume that both U and V U ≡
1 2
N
− 2 ∆U + |∇U | are bounded above. Then, for each x ∈ R , there is a unique
U N U
Px ∈ M1 C(R ) such that Px ψ(0) = x = 1 and
Z t
1
∆ϕ + ∇U, ∇ϕ RN ψ(τ ) dτ, Ft , PU
ϕ ψ(t) − 2 x
0
Finally, x PU
x is continuous and, for any stopping time ζ and any Fζ ×BC(RN ) -
measurable F : C(RN ) × C(RN ) that is bounded below,
Z Z Z
0 0
F (ψ, Σζ ψ) PU
x (dψ) = F (ψ, ψ ) PU
ψ(ζ) (dψ ) PU
x (dψ).
{ζ(ψ)<∞} {ζ(ψ)<∞}
Exercises for § 10.3 449
p(0,∞) (t, x, y) = g (1) (t, y − x) − g (1) (t, x + y) for (t, x, y) ∈ (0, ∞) × (0, ∞)2 ,
1 ξ2
where g (1) (τ, ξ) = (2πτ )− 2 e− 2τ . In addition, referring to Corollary 7.3.4, show
that, for c ∈ R, r > 0, and (x, y) ∈ (c − r, c + r),
p(c−r,c+r) (t, x, y) = r−1 g̃ (1) r−2 t, r−1 (y−x) −r−1 g̃ (1) r−2 t, r−1 (x+y+2−2c)) ,
1 N π2
lim log Wx(N ) (ζ Q(a,R) > t) = − for x ∈ Q(a, R).
t→∞ t 8R2
450 10 Wiener Measure and P.D.E.’s
for every T > 0. Conclude from this that limt→∞ 1t f (t) = supT >0 T1 f (T ) ∈
[−∞, 0].
(ii) Refer to the notation in Exercise 10.3.36, set R1 = sup{r ≥ 0 : Q(a, r) ⊆
π2
G for some a ∈ G}, and show that λG ≡ − limt→∞ 1t log w(t) ≤ N 8R2
. In partic-
1
ular, λG < ∞.
(ii) Let R2 be the diameter of G, choose a ∈ RN so that G ⊆ B(a, R2 ), and use
(N ) R2
the first part of Theorem 10.1.11 to show that EWx [ζ G ] ≤ N2 for all x ∈ G. In
particular, conclude that w 2N −1 R22 ≤ 12 and therefore that λG ≥ N2R log 2
2 > 0.
2
for some δ > 0. In particular, this means that λ0 here is equal to λG in Exercise
10.3.37.
Exercises for § 10.3 451
(i) Let PG G
t be the operator on Cb (G; R) whose kernel is p (t, x, y), and show
G 2
that Pt admits a unique extension to L (G; R) as a self-adjoint contraction.
Further, show that {PG t : t > 0} is a continuous semigroup of non-negative
definite, self-adjoint contractions on L2 (G; R). Finally, show that
|G|
ZZ Z
G 2
p (t, x, y) dxdy = pG (2t, x, x) dx ≤ N ,
G (4πt) 2
G×G
(ii) Knowing that the operators PGt form a continuous semigroup of self-adjoint,
Hilbert–Schmidt (and therefore compact), non-negative definite contractions,
standard spectral theory2 guarantees that there exists a non-decreasing sequence
{λn : n ≥ 0} ⊆ [0, ∞) tending to ∞ and an orthonormal basis {ϕn : n ≥ 0}
in L2 (G; R) such that e−tλn ϕn = PGt ϕn for all t ∈ (0, ∞) and n ≥ 0. Conclude
from this that ϕn can be taken to be smooth and bounded. In addition, show
that PGt ϕ0 −→ 0 uniformly, and therefore that λ0 > 0.
∞
X
0
ϕ, PG e−tλn ϕ, ϕn ϕ0 , ϕn
(*) t ϕ L2 (G;R)
= L2 (G;R) L2 (G;R)
n=0
e−λ0 = sup ϕ, PG
1 ϕ L2 (G;R) : kϕkL2 (G;R) = 1 .
Use (cf. the proof of Theorem 10.3.32) this to show that if λn = λ0 , then ϕn
never changes sign and can therefore be taken to be non-negative. In particular,
show that this means that λ1 > λ0 and that ϕ0 > 0.
(iv) Starting from (*), show that
∞ Z
X
−tλn
2 N
e ϕ, ϕn L2 (G;R)
= ϕ(x)pG (t, x, y)ϕ(y) dxdy ≤ (2πt)− 2 kϕk2L1 (G;R)
n=0 G×G
2 What is needed here is the variant of Stone’s Theorem that applies to semigroups. The
technical question which his theorem addresses is that of finding a simultaneous diagonalization
of the operators PGt . Because we are dealing here with compact operators, this question can
be reduced to one about operators in finite dimensions, where it is quite easy to handle. For
a general statement, see, for example, K. Yoshida’s Functional Analysis and its Applications,
Springer-Verlag (1971).
452 10 Wiener Measure and P.D.E.’s
for any ϕ ∈ L2 (G; R), and use this to show that, for any M ∈ N and ϕ, ϕ0 ∈
L2 (G; R),
∞ tλ
X
−tλn
0 e− 2M 0
e ϕ, ϕn ϕ , ϕn L2 (G;R) ≤ N kϕkL (G;R) kϕ kL (G;R) .
1 1
L2 (G;R)
n=M (πt) 2
M −1 tλ
X 2e− 2M
e−tλn ϕn (x)ϕn (y) ≤
G
p (t, x, y) − N ,
n=0 (πt) 2
∞
! 12 ∞
! 12
X X
e−θtλn ϕn (x)2 e−θtλn ϕn (y)2
tλ
e 0 p(t, x, y) − ϕ0 (x)ϕ0 (y) ≤
n=1 n=1
θtλ1
−
θtλ1 12 12 e 2
≤ e− 2 pG θt
2 , x, x pG θt
2 , y, y ≤ N .
(πθt) 2
radiator. When Kac took up the problem, he turned it around. Namely, he asked
what geometric information, besides the volume, is encoded in the eigenvalues.
When he explained his program to L. Bers, Bers rephrased the problem in the
terms that Kac adopted for his title. Audiophiles will be disappointed to learn
that, according to C. Gordon, D. Webb, and S. Wolpert’s,4 one cannot hear the
shape of a drum, even a two dimensional one.
This exercise outlines Kac’s argument for proving Weyl’s asymptotic for-
mula N
|G|λ 2
N (λ) ∼ N ,
(2π) 2 Γ( N2+1 )
in the sense that the ratio of the two sides tends to 1 as λ → ∞.
(i) Refer to Exercise 10.3.38, and show that, for each n ≥ 0,
1
2 ∆ϕn = −λn ϕn and lim ϕn (x) = 0 for a ∈ ∂reg G.
x∈G
x→a
where N (dλ) denotes integration with respect the purely atomic measure on
(0, ∞) determined by the non-decreasing function λ N (λ).
(iii) Using (10.3.8), show that
N
1 ≥ (2πt) 2 pG (t, x, x) ≥ 1 − E(t, x),
At this point, Kac invoked Karamata’s Tauberian Theorem,5 which relates the
asymptotics at infinity of an increasing function to the asymptotics at zero of
4 See their 1992 announcement in B.A.M.S., new series 27 (2), “One cannot hear the shape
of a drum.”
5 See, for example, Theorem 1.7.6 in N. Bingham, C. Goldie, and J. Teugel’s Regularly Varying
its Laplace transform. Given the preceding, Karamata’s theorem yields Weyl’s
asymptotic formula. It should be pointed out that the weakness of Kac’s method
is its reliance on the Laplace transform and Tauberian theory, which gives only
the principal term in the asymptotics. Further information can be obtained
using Fourier methods, which, in terms of partial differential equations, means
that one is replacing the heat equation by the wave equation, an equation about
which probability theory has embarrassingly little to say.
Exercise 10.3.41. It will have occurred to most readers that the relation be-
tween the Hermite heat kernel in (10.1.7) and the Ornstein–Uhlenbeck process
in § 8.4.1 is the archetypal example of what we have been doing in this section.
This exercise gives substance to this remark.
|x|2
(i) Set ρ± (x) = e± 2 , and show that 1
2 ∆ρ± − 12 |x|2 ρ± = ± N2 ρ± . By Lemma
2
10.3.24, ρ− is a ground state for − |x|2 with associated eigenvalue − N2 , a fact
that also can be verified by direct computation using (10.1.7). Show that the
ρ 1 1
measure Px− is the distribution under W (N ) of {2− 2 U(2t, 2 2 x, θ) : t ≥ 0},
where U(t, x, θ) is the Ornstein–Uhlenbeck process described in (8.5.1).
(ii) Although it does not follow from Lemma 10.3.24, use (10.1.7) to show that
2
ρ+ is also a ground state for − |x|2 with associated N2 . (See Exercise 10.3.43.)
ρ 1
Also, show that Px+ is the W (N ) -distribution of {θ ∈ et x+2− 2 V(2t, θ) : t ≥ 0},
where {V(t, θ) : t ≥ 0} is the process discussed in Exercise 8.5.14.
x2 n x2
d − 2
Exercise 10.3.42. Recall the Hermite polynomials Hn (x) = (−1)n e 2 dx ne
in § 2.4.1. Show that the Hermite functions (although these are not precisely the
ones introduced in § 2.4, they are obtained from those by rescaling)
1
24 x2 1
h̃n (x) = 1 e− n ≥ 0,
2 Hn (2 2 x),
(n!) 2
Exercise 10.3.43. Part (ii) of Exercise 10.3.41 might lead one to question the
necessity of the boundedness assumption made in Lemma 10.3.24. However, that
would be a mistake because, in general, a positive solution to 12 ∆ρ + V ρ = λρ
need not be a ground state. For example, in this exercise we will show that
x4
although ρ(x) = e 4 satisfies 12 ∂x2 ρ + V ρ = 0 when V = − 12 x6 + 3x2 , this ρ is
not a ground state for V . The proof is based on the following idea. If ρ were a
ground state, then Theorems 10.3.26 and its corollaries would apply, and so we
would know that the equation
Z t
(*) X(t, ψ) = ψ(t) + X(τ, ψ)3 dτ
0
(1)
would have a solution on [0, ∞) for Wx -almost every ψ ∈ C(R) for every x ∈ R.
The following steps show that this is impossible.
(i) Suppose that ψ1 , ψ2 ∈ C(R) and that 0 ≤ ψ1 (t) ≤ ψ2 (t) for t ∈ [0, 1]. If
X( · , ψ2 ) exists on [0, 1], show that X( · , ψ1 ) exists on [0, 1].
Rt
Hint: Define X0 (t, ψ) = ψ(t) and Xn+1 (t, ψ) = ψ(t) + 0 Xn (τ, ψ)3 dτ . First
show that if 0 ≤ ψ1 (t) ≤ ψ2 (t), then 0 ≤ Xn ( · , ψ1 ) ≤ Xn ( · , ψ2 ). Second, if
supn≥0 kXn ( · , ψ)k[0,T ] < ∞, show that Xn ( · , ψ) converges uniformly on [0, T ]
to the unique solution to (*) on [0, T ].
1
(ii) Show that if ψ(t) ≥ 1 for t ∈ [0, 1], then X(t, ψ) ≥ (1 − 2t)− 2 for t ∈ 0, 12
In this concluding chapter I will discuss a few refinements and extensions of the
material in §§ 10.2 and 10.3. Even so, I will be barely scratching the surface. The
interested reader should consult J.L. Doob’s thorough account in Classical Po-
tential Theory and Its Probabilistic Counterpart, published by Springer–Verlag
in 1984, or S. Port and C. Stones’s Brownian Motion and Classical Potential
Theory, published by Academic Press in 1978.
§ 11.1 Uniqueness Refined
In this section I will refine some of the uniqueness statements made in § 10.2.
The improved statements result from the removal of the defect mentioned in
Remark 10.3.14. To be precise, recall that if G is an open subset of RN , then
ζsG (ψ) = inf{t ≥ s : ψ(t) ∈ / G}, ζ0+G
= lims&0 ζsG , and (cf. Lemma 10.2.11)
(N ) G
∂reg G is the set of x ∈ ∂G such that Wx (ζ0+ = 0) = 1. The main result
proved in this section is Theorem 11.1.15, which states that, for any x ∈ G and
(N )
Wx -almost all ψ ∈ C(RN ), ζ G (ψ) < ∞ =⇒ ψ(ζ G ) ∈ ∂reg G. However, I
will begin by amending the treatment that I gave in § 10.3 of the Dirichlet heat
kernel pG (t, x, y).
§ 11.1.1. The Dirichlet Heat Kernel Again. In § 10.3, I introduced the
Dirichlet heat kernel pG (t, x, y). At the time, I was concerned with it only when
(x, y) ∈ G × G, and so I defined it in such a way that it was 0 outside G × G.
When G is regular in the sense that ∂G = ∂reg G, this choice is the obvious one,
since (cf. Theorem 10.3.9) it is the one that makes pG (t, · , y) continuous on R
for each (t, y) ∈ (0, ∞) × RN . However, when G is not regular, it is too crude
for the analysis here. Instead, from now on I will take
pG (t, x, y) =
(11.1.1)
W (N ) x 1 − `t (τ ) + θt (τ ) + y`t (τ ) ∈ G, τ ∈ (0, t) g (N ) (t, y − x),
where `t (τ ) = τ ∧t
t and θt (τ ) = θ(τ ) − θ(t)`t (τ ). Notice that the difference
between this definition and the one in § 10.3.2 results from the replacement of
the closed interval [0, t] there by the open interval (0, t) here. That is, in § 10.3.2,
pG (t, x, y) was given by
W (N ) x 1 − `t (τ ) + θt (τ ) + y`t (τ ) ∈ G, τ ∈ [0, t] g (N ) (t, y − x).
456
§ 11.1 Uniqueness Refined 457
of Theorem 10.3.3, one can use the results in § 8.3.3 to check that pG (t, x, y) =
pG (t, y, x) is again true but that (10.3.4) has to be replaced by
Z
(N )
ϕ(y)pG (t, x, y) dy = EWx
G
(11.1.2) ϕ(ψ(t) , ζ0+ (ψ) ≥ t .
RN
◦
(ψ) = 1G ψ(s) Es◦ (ψ)Et◦ (ψ).
(11.1.3) Es+t
Thus, repeating the argument used in the proof of Theorem 10.3.3 to derive
(10.3.5), one finds that
Z
G
(11.1.4) p (s + t, x, y) = pG (s, x, z)pG (t, z, y) dz,
G
which, because the integral is over G and not RN , is a flawed version of the
Chapman–Kolmogorov equation. In order to remove this flaw, I will need the
following lemma.
Lemma 11.1.5. For each (t, x) ∈ (0, ∞) × RN ,
and therefore
Z h i
ϕ(y)pG (t, x, y) = Wx(N ) ϕ ψ(t) , ζ0+
G
(11.1.6) (ψ) > t
RN
/ Λ =⇒ Wy(N ) (ζ G = ξ) = 0
ξ∈ for Lebesgue-almost every y ∈ RN .
Wx(N ) ζ0+
G
= t = Wx(N ) ζ0+
G
> s & ζ G ◦ Σs = t − s ≤ Wx(N ) ζ G ◦ Σs = t − s
Z
(N )
Wx+y ζ G = t − s γ0,sI (dy) = 0.
=
RN
pG (t, x, y) =g (N ) (t, y − x)
(11.1.8) (N )
h i
− EWx g (N ) t − ζ0+
G G
G
(ψ), y − ψ(ζ0+ ) , ζ0+ (ψ) < t
for all (t, x, y) ∈ (0, ∞) × (RN )2 , and the idea is very much the same as the one
used to prove (10.3.8). Thus, for α ∈ (0, 1), set
qα◦ (t, x, y) = W (N ) x 1 − `t (τ ) + θt (τ ) + y`t (τ ) ∈ G, τ ∈ (0, αt) g (N ) (t, y − x).
(N )
h i
= EWx ϕ ψ(t) , ζsG (ψ) ≥ αt
Z
(N )
Wx (N ) G G G
+E ϕ(y)g t − ζs (ψ), y − ψ(ζs ) dy, ζs (ψ) < αt .
RN
for all α ∈ (0, 1), t ∈ (0, ∞) and s ∈ (0, αt). Thus, by (*), after letting s & 0,
we see that
Z
ϕ(y)qα (t, x, y) dy
RN
Z
= ϕ(y)g (N ) (t, y − x) dy
RN
Z
(N )
− EWx G G G
ϕ(y)g t − ζ0+ (ψ), y − ψ(ζ0+ ) dy, ζ0+ (ψ) < αt .
RN
for any 0 < s < t, and note that pG (s, · , a) is bounded. Hence, the desired
conclusions follow from (10.3.12) and the argument used to prove the last part of
Theorem 10.3.9. Next, suppose that pG (t0 , x0 , a) = 0 for some (t0 , x0 ) ∈ (0, ∞)×
G. Then, by the strong minimum principle (cf. Theorem 10.1.6), pG (t, x, a) = 0
for all (t, x) ∈ (0, t0 ) × G. But this, by (11.1.2) and symmetry, means that, for
t ∈ (0, t0 ),
Z Z
(N ) G G
Wa (ζ0+ ≥ t) = p (t, a, y) dy = pG (t, x, a) dx = 0,
RN G
where I have used the final part of Lemma 11.1.5 to get the second equality.
Hence, pG (t0 , x0 , a) = 0 =⇒ a ∈ ∂reg G.
Finally, because, by the preceding and symmetry, for any x ∈ G, ∂G \ ∂reg G
is contained in {y ∈ / G : p(1, x, y) > 0}, and, by Lemma 11.1.5, the latter set
has Lebesgue measure 0, it is clear the ∂G \ ∂reg G has Lebesgue measure 0.
I next introduce the function
(N ) −ζ G
(11.1.10) v G (x) ≡ EWx e 0+ , x ∈ RN .
Since, by the Markov property,
Z
(N ) G (N ) G
−s
e g (N ) (s, y − x)EWy e−ζ dy = EWx e−ζs % v G (x)
RN
as s & 0, it is clear that v G is lower semicontinuous. In addition, it is obvious
that v G ≤ 1 everywhere and that
x ∈ RN : v G (x) = 1 = reg(G) = ∂reg G ∪ RN \ G .
for all (t, x, y) ∈ (0, ∞) × RN × RN . Hence, after multiplying by e−t and inte-
grating with respect to t ∈ (0, ∞), one arrives at
(N )
h G i (N )
h G i
EWx e−ζ0+ (ψ) r ψ(ζ0+ G
) − y = EWy e−ζ0+ (ψ) r ψ(ζ0+ G
)−x .
But Z
r(x − y) dy = 1, x ∈ RN ,
RN
and so (11.1.12) follows after one integrates the preceding over y ∈ RN and
applies Tonelli’s Theorem.
Given (11.1.12) and the fact that r is uniformly positive on compacts, it
becomes obvious that ν G must be always locally finite and finite when G{ is
compact. Thus, all that remains is to check (11.1.13). But clearly, after multi-
plying (11.1.8) with G = H throughout by e−t and integrating with respect to
t ∈ (0, ∞), one gets
Z h G
(N ) i
r(x − y) = e−t pH (t, x, y) dt + EWx e−ζ0+ r ψ(ζ0+
G
)−y .
(0,∞)
Hence, since, by the first part of Lemma 11.1.9 with G = H, pH (t, x, · ) vanishes
on reg(H), (11.1.13) follows after one integrates the preceding with respect to
ν G (dy) and uses (11.1.12).
Lemma 11.1.14. If G{ is compact and, for some θ ∈ [0, 1), v G G{ ≤ θ, then
(N ) G
Wx ζ0+ < ∞ = 0 for every x ∈ RN .
G
Proof: I begin by checking that v ≤ θ everywhere. Thus, suppose that
H = x ∈ R : v (x) > θ + 6= ∅ for some > 0. Because v G is lower
N G
(N )
and so, after letting s & 0, we have that v G (x) ≥ (θ + )Wx (ζ0+ H
> 0).
(N ) H
In particular, if x ∈/ G, then θ ≥ (θ + )Wx (ζ0+ > 0), which means that
(N ) H
x ∈/ G =⇒ Wx (ζ0+ > 0) < 1. Hence, because (cf. part (ii) of Exercise
(N ) H
10.2.19) Wx (ζ0+ > 0) ∈ {0, 1}, this means that x ∈ / G =⇒ x ∈ reg(H) and
therefore that (11.1.13) applies. But if x ∈ H, (11.1.13) yields the contradiction
(N )
h H i
θ + < v G (x) = EWx e−ζ0+ v G ψ(ζ0+
H
) < θ + ,
H H
since ζ0+ (ψ) < ∞ =⇒ ψ(ζ0+ )∈ / H. That is, I have shown that H must be
empty.
Knowing that v G ≤ θ everywhere, I now want to argue that ν G (RN ) ≤
θν G (RN ). Since ν G (RN ) < ∞, this will show that ν G = 0 and therefore, by
(N ) G
(11.1.12), that v G ≡ 0, which is the same as saying that Wx (ζ0+ < ∞) = 0
everywhere. Thus, let K = G{, and set Kn = {x : dist(x, K) ≤ n−1 } and
Gn = Kn { for n ≥ 1. Clearly, K ⊆ RN \ Gn ⊆ reg(Gn ), and so, by (11.1.12) and
Tonelli’s Theorem,
Z Z
G N Gn G
ν (R ) = v (x) ν (dx) = v G (y) ν Gn (dy) ≤ θν Gn (RN ).
RN RN
Thus, all that we have to do is check that ν Gn (RN ) & ν G (RN ) when n → ∞.
But Z
Gn N
ν (R ) = v Gn (x) dx
RN
(N )
G
Proof: Suppose not. Because Wy (ζ0+ > 0) ∈ {0, 1} for all y ∈ RN , we could
then find an x ∈ G and a δ > 0 for which
Wx(N ) ζ0+
G G
(ψ) < ∞ & ψ(ζ0+ ) ∈ Γδ > 0,
where n o
Γδ = y ∈ ∂G : Wy(N ) ζ0+
G
≥ δ ≥ 12 .
§ 11.1 Uniqueness Refined 463
is the one and only bounded, smooth solution to the boundary value problem
described in Corollary 10.3.13.
More interesting are the improvements that Theorem 11.1.15 allows me to
make to the results in § 10.2.3.
Theorem 11.1.17. Given an open G ⊆ RN and a bounded Borel measurable
f : ∂G −→ R, set
(N )
uf (x) = EWx f ψ(ζ G ) , ζ G (ψ) < ∞ ,
(11.1.18) for x ∈ G.
Proof: The initial assertions are covered already by Theorem 10.2.14. Next,
let f ∈ Cb (∂G; R) be given, and suppose that u is an element of C 2 G; [0, ∞)
in the second assertion. To prove that uf ≤ u, set
which satisfies the conditions
Ft = σ {ψ(τ ) : τ ∈ [0, t]} , and choose a sequence of bounded, open subsets Gn
(N )
so that Gn ⊆ G and Gn % G. Then, for each n ≥ 1, −u ψ(t ∧ ζ Gn ), Ft , Wx
is a submartingale, and so we know that, for each x ∈ G, u(x) dominates
(N ) (N )
lim EWx u ψ(T ∧ ζ Gn ) ≥ lim lim EWx u ψ(ζ Gn ) , ζ G ≤ T
lim
T %∞ n→∞ T %∞ n→∞
(N )
h i
Wx
f ψ(ζ G ) , ζ G < ∞ = uf (x),
≥E
where, in the passage to the last line, I have used Fatou’s Lemma and Theorem
11.1.15.
Finally, let f ∈ Cb (∂G; R) be given. What I still have to show is that if u
is a harmonic function on G which tends to f at points in ∂reg G and satisfies
(N )
|u(x)| ≤ CWx (ζ G < ∞) for some C < ∞, then u = uf . Thus, suppose u is
such a function, and set M = C + kf ku . Then, by the preceding, we have both
that
and that
Then, for each f ∈ Cb (G; R) the function uf in (11.1.18) is the one and only
bounded, harmonic function u on G which satisfies limx→a u(x) = f (a) for every
x∈G
a ∈ ∂reg G. In particular, this will be the case if G is contained in a half-space.
In order to go further, it will be helpful to have the following lemma.
Lemma 11.1.21. Let G be a non-empty, connected, open set in RN . Then
and so we need only check that limx→b uf (x) > 0. To this end, first note that,
x∈G
since
lim uf (x) = f (a) = 1,
x→a
x∈G
the Strong Minimum Principle (cf. Theorem 10.1.6) says that uf > 0 everywhere
in G. Next, because b is not regular, we can find a δ > 0 and a sequence
{xn : n ≥ 1} ⊆ G such that xn → b and
≡ inf+ Wx(N ) G
n
ζ > δ > 0.
n∈Z
and construct f so that f ≡ 1 on ∂G ∩ B(b, r){ and f (b) = 0. Then f (b) <
limx→b uf (x).
x∈G
I next take a closer look at the conditions under which we can assert the
uniqueness of solutions to the Dirichlet problem. To begin, observe that, by
Corollary 11.1.19, the situation is quite satisfactory when (11.1.20) holds. In
fact, the same line of reasoning which I used there shows that the same conclusion
(N )
holds as soon as one knows that Wx ζ G < ∞ is bounded below by a positive
(N )
constant; and therefore, because x ∈ G 7−→ Wx (ζ G < ∞) is a bounded
harmonic function which tends to 1 at ∂reg G, Theorem 11.1.17 tells us that
inf Wx(N ) ζ G < ∞ > 0 =⇒ inf Wx(N ) ζ G < ∞ = 1.
(11.1.23)
x∈G x∈G
I will close this discussion of the Dirichlet problem with two results which
reflect the transience of Brownian paths in three and higher dimensions and
their recurrence in one and two dimensions.
Theorem 11.1.24. Assume that N ≥ 3, and let G be a nonempty, connected,
open subset of RN . If f ∈ Cc (∂G; R), then uf is the one and only bounded
harmonic function u on G which tends to f at ∂reg G and satisfies
(11.1.25) lim u(x) = 0.
|x|→∞
x∈G
Proof: We already know that uf is a bounded harmonic function which tends
to f at ∂reg G, but we must still show that it satisfies (11.1.25). For this purpose,
choose r ∈ (0, ∞) so that f is supported in B(0, r). Then (cf. the last part of
Theorem 10.1.11), because N ≥ 3,
uf (x) ≤ kf ku Wx(N ) ζr < ∞ −→ 0 as |x| → ∞.
To prove that uf is the only such function u, select bounded open sets Gn % G
with Gn ⊂⊂ G, and note that, for each T ∈ (0, ∞),
(N )
h i
u(x) = lim EWx u ψ(T ∧ ζ Gn )
n→∞
(N )
h i (N )
h i
= EWx f ψ(ζ G ) , ζ G ≤ T + EWx u ψ(T ) , T < ζ G < ∞
(N )
h i
+ EWx u ψ(T ) , ζ G = ∞ .
§ 11.1 Uniqueness Refined 467
Clearly, h i
(N )
uf (x) = lim EWx f ψ(ζ G ) , ζ G ≤ T
T %∞
and h i
(N )
lim EWx u ψ(T ) , T < ζ G < ∞ = 0.
T %∞
Finally, because N ≥ 3 and, therefore, by Corollary 10.1.12, ψ(T ) −→ ∞ as
(N )
T % ∞ for Wx -almost every ψ ∈ C(RN ), (11.1.25) guarantees that
(N )
h i
lim EWx u ψ(T ) , ζ G = ∞ = 0,
T %∞
(N ) (N )
Then −Xn (t), Ft , Wx0 is a non-positive,
right-continuous, Wx0 -submartin-
gale when Ft = σ {ψ(τ ) : τ ∈ [0, t]} . Hence, since
(N )
At the same time, by Theorem 10.2.3, we know that, for Wx0 -almost every
ψ ∈ C(RN ), Z ∞
1U ψ(t) dt = ∞ for all open U 6= ∅.
0
(N )
Hence, since Wx0 ζ G = ∞ > 0, there exists a ψ0 ∈ C(RN ) with the properties
that ψ(0) = x0 , ζ G (ψ0 ) = ∞,
Z ∞
1U ψ0 (t) dt = ∞ for all open U 6= ∅, and lim u ψ0 (t) exists,
0 t→∞
which is possible only if u is constant. In other words, we have now proved that
(N )
when Wx0 (ζ G < ∞) < 1 for some x0 ∈ G, then the only u ∈ C 2 G; [0, ∞)
with ∆u ≤ 0 are constant.
Given the preceding paragraph, the rest is easy. Indeed, if ∂reg G = ∅, then
(N )
Theorem 11.1.15 already implies that Wx (ζ G < ∞) = 0 for all x ∈ G. On the
(N )
other hand, if a ∈ ∂reg G but Wx0 ζ G < ∞ < 1 for some x0 ∈ G, then the
(N ) (N )
preceding paragraph applied to x Wx (ζ G < ∞) says that Wx (ζ G < ∞)
is constant, which leads to the contradiction
1 > Wx(N
0
) G
lim Wx(N ) (ζ G < ∞) = 1.
(ζ < ∞) = x→a
x∈G
I, II, & III,” Ill. J. Math. 1 & 2. In these articles, he literally created the modern theory of
Markov processes and established their relationship to potential theory. To see just how far
Hunt’s ideas can be elaborated, see M. Sharpe’s General Theory of Markov Processes, Acad.
Press Series in Pure & Appl. Math. 133 (1988).
§ 11.1 Uniqueness Refined 469
N 2 y
ΠR+ (0, y), dω = λ N −1 (dω), y ∈ (0, ∞),
ωN −1 y 2 + |ω|2 N2 R
N 2 y
ΠR+ (x, y), dω = λ N −1 (dω)
ωN −1 y 2 + |x − ω|2 N2 R
Moreover, by using further translation plus Wiener rotation invariance (cf. (ii) in
Exercise 4.3.10), one can pass easily from the preceding to an explicit expression
of the harmonic measure for an arbitrary half-space.
In the preceding, we were able to derive an expression giving the harmonic
measure for half-spaces directly from probabilistic considerations. Unfortu-
nately, half-spaces are essentially the only regions for which probabilistic rea-
soning yields such explicit expressions. Indeed, embarrassing as it is to admit,
it must recognized that, when it comes to explicit expressions, the time-honored
techniques of clever changes of variables followed by separation of variables are
more powerful than anything which comes out of (11.1.27). To wit, I have been
unable to give a truly probabilistic derivation of the classical formula given in
the following.
Theorem 11.1.28 (Poisson Formula). Use λSN −1 to denote the surface
measure on the unit sphere SN −1 in RN , and define
1 1 − |x|2
π (N ) (x, ω) = for (x, ω) ∈ B(0, 1) × SN −1 .
ωN −1 |x − ω|N
Then:
ΠB(0,1) (x, dω) = π (N ) (x, ω) λSN −1 (dω), for x ∈ B(0, 1).
470 11 Some Classical Potential Theory
More generally, if c ∈ RN , r ∈ (0, ∞), and λSN −1 (c,r) denotes the surface measure
on the sphere SN −1 (c, r) ≡ ∂B(c, r), then
1 r2 − |x − c|2
ΠB(c,r) (x, dω) = λSN −1 (c,r) (dω), x ∈ B(c, r).
ωN −1 r |x − ω|N
Equivalently, for each open G in RN , harmonic function u on G, B(c, r) ⊂⊂ G,
and x ∈ B(c, r),
Z
u(c + rω) π (N ) x−c
u(x) = r , ω λSN −1 (dω).
SN −1
In particular, if {un : n ≥ 1} is a sequence of harmonic functions on the open
set G and if un −→ u boundedly and pointwise on compact subsets of G, then
u is harmonic on G and un −→ u uniformly on compact subsets. (See Exercise
11.2.22 for another approach.)
Proof: Set B = B(0, 1). Clearly, everything except the final assertion follows
by scaling and translation once we identify π (N ) as the density for ΠB . To make
this identification, first check, by direct calculation, that π (N ) ( · , ω) is harmonic
in B for each ω ∈ SN −1 . Hence, in order to complete the proof, all that we have
to do is check that Z
lim
x→a
f (ω) π (N ) (x, ω) λSN −1 (dω) = f (a)
x∈B SN −1
for each x ∈/ D(r). In particular, if u ∈ Cb R2 \ D(r); R is harmonic on
R2 \ D(r), then
|x|2 |x|2 − r2
Z
u(x) = u(rω)λS1 (dω),
2π S1 |x|2 ω − rx2
and so
Z
1
(11.1.31) lim u(x) = u(rω) λS1 (dω).
|x|→∞ 2π S1
Proof: After an easy scaling argument, I may and will assume that r = 1.
Thus, set D = D(1), and
assume
that u ∈ Cb R2 \ D; R is harmonic in R2 \
x
D. Next, set v(x) = u |x| 2 for x ∈ D \ {0}. Obviously, v is bounded and
continuous. In addition, by using polar coordinates, one can easily check that v
is harmonic in D \ {0}. In particular, if ρ ∈ (0, 1) and G(ρ) ≡ B \ B(0, ρ), then
(N )
h i (N )
h i
v(x) = EWx v ψ(ζ1 ) , ζ1 < ζρ + EWx v ψ(ζρ ) , ζρ < ζ1 , x ∈ G(ρ),
where the notation is that in Theorem 10.1.11. Hence, because, by that theorem,
(N )
ζρ % ∞ (a.s., Wx ) as ρ & 0, this leads to
1 − |x|2
Z
Wx
(N )
h i 1
v(x) = E v ψ(ζ1 ) , ζ1 < ∞ = u(ω) λS1 (dω)
2π S1 ω − x2
for all x ∈ D \{0}. Finally, given the preceding, the rest comes down to a simple
matter of bookkeeping.
As a second application of Poisson’s formula, I make the following famous ob-
servation, which can be viewed as a quantitative version of the Strong Minimum
Principle (cf. Theorem 10.1.6) for harmonic functions.
Corollary 11.1.32 (Harnack’s Principle). For any c ∈ RN and r ∈
(0, ∞),
rN −2 r − |x − c| B(c,r)
N −1 Π (c, · )
r + |x − c|
(11.1.33)
rN −2 r + |x − c| B(c,r)
B(c,r)
≤Π (x, · ) ≤ N −1 Π (c, · ).
r − |x − c|
472 11 Some Classical Potential Theory
for all x ∈ B(c, r). Hence, if u is a non-negative, harmonic function on B(c, r),
then
rN −2 r − |x − c| rN −2 r + |x − c|
(11.1.34) N −1 u(c) ≤ u(x) ≤ N −1 u(c).
r + |x − c| r − |x − c|
(ii) Let K ⊂⊂ RN , and take σK (ψ) = inf{t > 0 : ψ(t) ∈ K} to be the first
positive entrance time of ψ ∈ C(RN ) into K. Given an open G ⊃⊃ K, show
that
Wx(N ) σK < ζ G = 0 for all x ∈ G \ K
(11.1.38)
if and only if K ∩ ∂reg (G \ K) = ∅, and use the locality proved in Lemma 10.2.11
to conclude that (11.1.38) for some G ⊃⊃ K is equivalent to K ∩ ∂reg (G \ K) = ∅
for all G ⊃⊃ K. In particular, conclude that (11.1.38) holds for some G ⊃⊃ K
if and only if
(11.1.39) Wx(N ) ∃t ∈ [0, ∞) ψ(t) ∈ K = 0 for all x ∈ / K.
for all x ∈ H.
(iii) Let K be a compact subset of RN and a connected G ⊃⊃ K be given.
Assuming either that N ≥ 3 or that ∂reg G 6= ∅, show that (11.1.39) holds if K
is a removable singularity in G for every bounded, harmonic function on G \ K.
(N )
Hint: Consider the function x ∈ G \ K 7−→ Wx σK < ζ G ∈ [0, 1], and use
the Strong Minimum Principle.
(iv) Let G be a non-empty, open subset of RN , where N ≥ 2, and set D =
{(x, x) : x ∈ G}, the diagonal in G2 . Given a u ∈ C(G2 ; R) which is harmonic
on G \ D, show that u is harmonic on G2 .
Hint: Show that
(2N )
Wx,y ∃t ∈ [0, ∞) ψ(t) ∈ D
Z
Wy(N ) ∃t ∈ (0, ∞) ψ(t) = ϕ1 (t) Wx(N ) (dϕ) = 0
≤
C(RN )
for (x, y) ∈ G2 \ D.
474 11 Some Classical Potential Theory
Exercise 11.1.40. For each r ∈ (0, ∞), let S(r) denote the open vertical strip
(−r, r) × R in R2 . Clearly,
and so the harmonic measure for S(r), based at any point in S(r), will be
supported on {(x, y) : x = ±r and y ∈ R}. In particular, if u ∈ Cb S(r); R is
bounded and harmonic on S(r), then
for z ∈ S(r, R) ≡ (−r, r) × (−R, R). Conclude that (11.1.41) holds as long as
(2)
lim sup u(x, R) ∨ u(x, −R)Wz(2) ζR < ζr(1) = 0, z ∈ S(r).
R→∞ |x|≤1
Thus, the desired conclusion comes down to showing that, for each ρ ∈ (r, ∞),
πR (2)
Wz(2) ζR < ζr(1) = 0, z ∈ S(r).
(*) lim exp
R→∞ 2ρ
§ 11.2 The Poisson Problem and Green Functions 475
(ii) To prove (*), let ρ ∈ (r, ∞) be given. Show that, for R ∈ (0, ∞) and
z ∈ S(r, R),
h (2)
i
πR
(2)
Wz π ψ1 ζR +ρ (2) (1)
uρ (z) = cosh 2ρ E sin 2ρ , ζ R < ζr
(2)
≥ cosh πR cos πr Wz(2) ζR < ζr(1) ,
2ρ 2ρ
Notice that, at least when G is bounded, or, more generally, whenever (11.1.20)
holds, there is at most one bounded u ∈ C 2 (G; R) which satisfies (11.2.1). In-
deed, if there were two, then their difference would be a bounded harmonic func-
tion on G satisfying boundary condition 0 at ∂reg G, which, because of (11.1.20)
and Corollary 11.1.19, means that this difference vanishes. Moreover, when
N ≥ 3, even if (11.1.20) fails, one can (cf. Theorem 11.1.24) recover uniqueness
by adding to (11.2.1) the condition that
Hence, at least when (11.1.20) holds and therefore G pG (T, x, y)f (y) dy −→ 0
R
desired solution to (11.2.1). On the other hand, it is neither obvious that the
limit will exist nor, even if it does exist, in what sense either the smoothness
properties or (11.2.2) will survive the limit procedure.
Motivated by these considerations, I now define the Green function to be
the function g G given by
Z
G
(11.2.3) g (x, y) = pG (t, x, y) dt, (x, y) ∈ G2 .
(0,∞)
N 2|y − x|2−N
(11.2.4) g R (x, y) = ,
(N − 2)ωN −1
N
where ωN −1 is the area of SN −1 . In particular, when N ≥ 3, g R (x, · ) is smooth
and has bounded derivatives of all orders in RN \ B(x, r) for each r > 0. Next,
by integrating both sides of (10.3.8) with respect to t ∈ (0, ∞), we obtain, for
any G, the Duhamel formula
N (N )
h N i
g G (x, y) = g R (x, y) − EWx g R ψ(ζ0+ G
G
(11.2.5) ), y , ζ0+ <∞ ,
What we still have to find are conditions under which GG f solves (11.2.1) and
satisfies (11.2.2). From (11.2.5) and Theorem 10.2.14, it is clear that GG f (x)
§ 11.2 The Poisson Problem and Green Functions 477
N N
tends to 0 as x tends to ∂reg G. In addition, since |GG f | ≤ GR |f | and GR |f |
tends to 0 at infinity, GG f satisfies (11.2.2). Hence, the remaining question is
whether 12 ∆GG f = −f on G. As an initial step, suppose that GG f ∈ Cb2 (G; R),
and note that, for each x ∈ G,
1 s
Z Z
1 G G
2 ∆G f (x) = lim ϕ(y)p (t, x, y) dy dt
s&0 s 0 G
Z
1 G G G
= lim p (s, x, y)G f (y) dy − G f (x)
s&0 s G
1 s
Z Z
G
= − lim f (y)p (t, x, y) dy dt = −f (x).
s&0 s 0 G
Thus, what we need to know is whether GG f ∈ Cb2 (G; R). By the considerations
N
above, we already know that GG f ∈ Cb2 (G; R) if and only if GR f is. Moreover,
N N
if f ∈ Cc2 (G; R), then ∂ α GR f = GR ∂ α f for any α with kαk ≤ 2. In addition,
Hence, by starting with f ’s that are in Cc2 (G; R) and applying an obvious ap-
proximation argument, we see that GG f ∈ Cb2 (G; R) whenever f ∈ Cc1 (G; R).1
Theorem 11.2.7. Assume that N ≥ 3 and that G is a non-empty, open subset
of RN . Then, for each f ∈ Cc1 (G; R), the function GG f in (11.2.6) is the unique
bounded, twice differentiable solution to (11.2.1) which satisfies (11.2.2).
Remark 11.2.8. Notice that the Duhamel formula in (11.2.5) could have been
N
guessed. To be precise, g R is a fundamental solution for − 12 ∆ in RN in the
N
sense that 12 ∆GR f = −f all test functions f ∈ Cc1 (RN ; R), and g G is to be a
fundamental solution for − 12 ∆ in G with 0 boundary data in the sense that it
should be the kernel for the solution operator which solves the Poisson problem in
(11.2.1). Based on these remarks, one should guess that a reasonable approach
N
to the construction of g G would be to correct g R ( · , y) for each y ∈ G by
N
subtracting off a harmonic function which has g R ( · , y) as its boundary value,
and this is, of course, precisely what is being done in (11.2.5).
§ 11.2.2. Green Functions when N ∈ {1, 2}. Because (cf. Theorem 10.2.3)
Brownian paths in one and two dimensions spend infinite time in every non-
empty open set, the reasoning § 11.2.1 is too crude to handle the Poisson problem
1 It turns out that if f is Hölder continuous of some order, then GG f will be twice continuously
differentiable and its second derivatives will be Hölder continuous of the same order as f . Such
results are called Schauder estimates. See, for example, N.V. Krylov’s Lectures on Elliptic and
Parabolic Equations in Hölder Spaces, A.M.S. Graduate Studies in Math. 12 (1996).
478 11 Some Classical Potential Theory
N
in these dimensions. In particular, when N ∈ {1, 2}, g R will be identically
infinite, and so (11.2.5) does us no good. To overcome this difficulty, I will use a
generalization of (11.2.5). Namely, let H be an open set that contains G. Then,
by the Markov property, it is easy to check that
(N )
h i
P H (t, x, Γ) = P G (t, x, Γ) + EWx P H t − ζ G (ψ), ψ(ζ G ), Γ , ζ G (ψ) < ∞
for (t, x, y) ∈ (0, ∞) × G2 . Hence, after integrating with respect to t ∈ (0, ∞),
one obtains
(N )
h i
g H (x, y) = g G (x, y) + EWx g H t − ζ G (ψ), y , ζ G (ψ) < ∞
(11.2.9)
(1)
Since Wx (ζ (a,b) < ∞) = 1 for all x ∈ R and the boundary of (a, b) is regular,
Corollary 11.1.19 together with (11.2.10) say that, as a function of x ∈ (a, b),
the second term on the right equals −u, where u00 = 0, limx&0 u(x) = 0, and
limx%b u(x) = 2(y − a). Hence, u(x) = 2(x−a)(y−a)
b−a , and so
2
g (a,b) (x, y) =
(11.2.11) x ∧ y − a (b − x ∨ y).
b−a
Starting from these, it is an easy matter to check by hand that if G 6= R is any
open interval and f ∈ Cc (G; R), GG f is bounded and solves (11.2.1). Moreover,
because (11.1.20) holds, GG f is the only such solution.
When N = 2, matters are significantly more complicated but much more
interesting. I will begin by considering the R2 analog of (0, ∞), which is the
upper half-space R2+ = {(x1 , x2 ) : x2 > 0}. It should be clear that, for x =
(x1 , x2 ) and y = (y1 , y2 ),
R2+ (1) (0,∞) 1 |y−x|2
− 2t
|y̌−x|2
− 2t
p (t, x, y) = g (t, y1 − x1 )p (t, y1 , y2 ) = e −e ,
2πt
where y̌ = (y1 , −y2 ). Therefore,
Z
2
2π pR+ (t, x, y) dt
(0,∞)
T
|y − x|2 |y̌ − x|2
Z
1
= lim exp − − exp − dt
T %∞ 0 t 2t 2t
−2
|y−x|
Z
1 − 1
= lim e 2tT dt,
T %∞ t
|y̌−x|−2
2 |y−x|
which means that g R+ (x, y) = − π1 log |y̌−x| . Hence, by (11.2.9), we know that
2 (2)
h 2
i
g G (x, y) = g R+ (x, y) − EWx G
G
g R+ ψ(ζ0+ ), y , ζ0+ <∞
and (2)
h i
EWx log |y − ψ(ζ G )|, ζ G (ψ) < ∞ < ∞.
sup
(x,y)∈K 2
In particular, u is harmonic on G2 .
Proof: Since g G is symmetric, the first equality is obvious. While proving the
associated finiteness assertion, I may and will assume that G is connected. In
addition, it suffices for me to prove
"Z G #
(2)
ζ (ψ)
sup EWx
1B(c,r) ψ(t) dt < ∞
x∈G 0
for all c ∈ G and r > 0 with B(c, 2r) ⊂⊂ G. Given such a ball, set B = B(c, r)
and 2B = B(c, 2r), and define {ζn : n ≥ 0} inductively by ζ0 = 0 and, for n ≥ 1,
ζ2n−1 = inf{t ≥ ζ2(n−1) : ψ(t) ∈ B} and ζ2n = inf{t ≥ ζ2n−1 : ψ(t) ∈ / 2B}.
(2) G
If u(x) = Wx ζ1 < ζ , then u is a [0, 1]-valued, harmonic function on G \ B
that tends to 0 as x tends to ∂reg G and to 1 as x tends to ∂B. Thus, since
∂reg G 6= ∅, the Minimum Principle says that u(x) ∈ (0, 1) for all x ∈ G \ B. In
particular, this means that α ≡ max{u(x) : |x − c| = 2r} ∈ (0, 1). At the same
time, by the Markov property,
(2)
Wx(2) ζ2n+1 < ζ G = EWx u ψ(ζ2n ) , ζ2n (ψ) < ζ G (ψ) ≤ αWx(2) ζ2n−1 < ζ G ,
(2) (2)
and so Wx ζ2n−1 < ζ G ≤ αn−1 for n ∈ Z+ . Hence, if f (y) = EWy ζ 2B ,
then
∞
"Z G # "Z #
(2)
ζ X (2)
ζ2n
Wx Wx G
E 1B ψ(t) dt = E 1B ψ(t) dt, ζ2n−1 (ψ) < ζ
0 n=1 ζ2n−1
∞
X (2) kf ku
EWx f ψ(ζ2n−1 ) , ζ2n−1 (ψ) < ζ G (ψ) ≤
≤ .
n=1
1−α
§ 11.2 The Poisson Problem and Green Functions 481
|y − ψ(ζ G )| G
(2)
= EWx log , ζ (ψ) < ζ B(c,r) (ψ)
D
is a non-negative, harmonic function on B 2 , and, for each (x, y) ∈ B 2 , vr (x, y) is
non-decreasing as a function of r > R. Thus, by Harnack’s Principle (cf. Corol-
lary 11.1.32), either limr→∞ vr = ∞ on B 2 or vr tends uniformly on compact
subsets of B 2 to a harmonic function v. Since
lim sup Wx(2) ζ G < ζ B(c,r) − Wx(2) (ζ G < ∞) = 0,
r→∞ x∈B
(2)
h i
= ur (x, y) + EWx logy − ψ(ζ B(c,r) ), ζ B(c,r) (ψ) < ζ G (ψ) < ∞ .
and (2)
h i
(x, y) ∈ G2 7−→ EWx log y − ψ(ζ G ), ζ G < ∞ ∈ R
1 1 (2)
h i
(11.2.16) g G (x, y) = − log |y−x|+ EWx logy−ψ(ζ G ), ζ G < ∞ +hG (x)
π π
for all distinct x’s and y’s from G, and so either hG ≡ 0 or G is unbounded and
1 1 (2)
h i
gr (x, y) = − log |y − x| + EWx log |y − ψ(ζ G(r) )|, ζ G(r) (ψ) < ∞ .
π π
we conclude from the first part of Lemma 11.2.13 that only the second alternative
is possible. Thus, we now know that g G is harmonic on G c2 and that
where
(2)
h i
ur (x, y) = EWx logy − ψ(ζ G ), ζ G (ψ) < ζ B(c,r) (ψ) for (x, y) ∈ G(r)2 .
Moreover, by combining this with (*) and (**), we also know that the third term
on the right of (**) converges uniformly on compact subsets of G c2 to a harmonic
function on G c2 . At the same time, as r → ∞,
(2)
h i
EWx logy − ψ(ζ B(c,r) ), ζ B(c,r) (ψ) ≤ ζ G (ψ) < ∞
− log rWx(2) ζ B(c,r) ≤ ζ G (ψ) < ∞
" ! #
(2) y − ψ(ζ B(c,r) )
= EWx log , ζ B(c,r) ≤ ζ G (ψ) < ∞ −→ 0
r
and apply Lebesgue’s Dominated Convergence Theorem together with the inte-
grability estimate in the second part of Lemma 11.2.13 to see that, as |y| → ∞
through G, the second term tends to 0 uniformly for x in compact subsets of
G.
484 11 Some Classical Potential Theory
log |x|
Wx(2) ζ D(r) < ζ G = R
.
log Rr
Hence, by (11.2.15), we see that
2
\D(R) 1 |x|
hR (x) = log , x∈
/ D(R).
π R
As we are about to see, for G’s whose complements are compact, the conclusion
drawn about hG at the end of Remark 11.2.18 is typical, at least as |x| → ∞.
Corollary 11.2.19. Let everything be as in Theorem 11.2.14, and assume
that K ≡ R2 \ G is compact. Then, for each R ∈ (0, ∞) with the property that
K ⊂⊂ D(R), one has that
|x| |x|2 |x|2 − R2
Z
1
hG (x) − log = G
h (Rω) λS1 (dω)
π R 2π S1 |x|2 ω − Rx2
Z
1
−→ hG (Rω) λS1 (dω)
2π S1
as |x| → ∞.
Proof: Define σ : C(RN ) −→ [0, ∞] to be the first entrance time into D(R),
and note (cf. the preceding discussion) that, for each r > R and R < |x| < r,
(2)
h i
(2)
= Wx(2) ζ D(r) < σ + EWx Wψ(σ) ζ D(r) < ζ G , σ < ζ D(r)
Hence, after multiplying the preceding through by logπ r , using (11.2.15), and
letting r → ∞, we arrive at
1 |x| 1 (2)
h i
hG (x) = log + EWx hG ψ(σ) , σ < ∞ , x ∈ R2 \ D(R),
π R π
§ 11.2 The Poisson Problem and Green Functions 485
does not depend on R as long as G{ ⊂⊂ B(0, R). This number plays an im-
portant role in classical two-dimensional potential theory, where it is known as
Robin’s constant for G.
Corollary 11.2.20. Again let everything be as in Theorem 11.2.14. Then,
for each K ⊂⊂ G and r > 0,
n o
sup g G (x, y) : |x − y| ≥ r and y ∈ K < ∞
and
lim sup g G (x, y) = 0 for each a ∈ ∂reg G.
x→a
x∈G y∈K
Moreover, for each f ∈ Cc1 (G; R), GG f is the unique bounded solution to
(11.2.1).
Proof: To prove the initial statements, let c ∈ G and r > 0 satisfying B(c, 2r)
⊂⊂ G be given, set B = B(c, r), and define the first entrance time σ(ψ) of ψ
into B by σ(ψ) = inf t ≥ 0 : ψ(t) ∈ B . By the Markov property, we see that,
for any f ∈ Cc B; [0, ∞) ,
"Z G #
Z
(2)
ζ
g G (x, y)f (y) dy = EWx f ψ(t) dt, σ < ζ G
G σ
Z
(2)
Wx G G
=E g ψ(σ), y f (y) dy, σ < ζ .
G
for some C ∈ (0, ∞). In particular, this, combined with the obvious Heine–
Borel argument, proves the first estimate. In addition, if a ∈ ∂reg G, then, for
each δ > 0,
lim Wx(2)
= x→a
σ≤δ .
x∈G
Thus, since the last expression obviously tends to 0 as δ & 0, this, together with
(*), implies that
lim sup g G (x, y) = 0,
x→a
x∈G y∈B
which (again after the obvious Heine–Borel argument) means that we have also
proved the second assertion.
Turning to the last part of the statement, let f ∈ Cc1 (G, R) be given. By the
preceding, we know that GG f is bounded and tends to 0 at ∂reg G. In addition,
using Theorem 11.2.14, especially (11.2.16), and arguing as I did in the case
when N ≥ 3, it is easy to check that GG f ∈ C 2 (G; R) and 12 ∆GG = −f . Thus,
GG f is a bounded solution to (11.2.1), and, because (11.1.20) holds, it can be
the only such solution.
Exercises for § 11.2
Exercises 11.2.21. Give an explicit expression for the Green function g B(c,R)
when N ≥ 2. To this end, first use translation and scaling to see that
x−c y−c
g B(c,R) (x, y) = R2−N g B(0,1) ,
R R
for distinct x, y from B(c, R). Thus, assume that c = 0 and R = 1. Next,
observe that
y
|x − y| = |y|x − |y| for x ∈ SN −1 and y ∈ BRN (0, 1) \ {0},
and use this observation together with (11.2.12) and (11.2.5) to conclude that
(
y
1 1 log − |y|x if y 6= 0
B(0,1) |y|
(x, y) = − log |y − x| +
g
π π 0 if y = 0
when N = 2 and
N
y
N
gR
|y| − |y|x if y 6= 0
g B(0,1) (x, y) = g R (x, y) −
2
(N −2)ωN −1 if y = 0
when N ≥ 3.
§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions 487
Exercise 11.2.22. The derivation that I gave of Poisson’s formula (cf. Theo-
rem 11.1.28) required me to already know the answer and simply verify that it is
correct. Here I outline another approach, which is the basis for a quite general
procedure. To begin with, recall the classical Green’s Identity
Z Z
∂v ∂u
u∆v − v∆u dx = u ∂n − v ∂n dλ∂G
G ∂G
N
for bounded, smooth regions G in R and functions u and v that are smooth
in a neighborhood of G. (In the preceding, ∂w ∂n (x) is used to denote the normal
derivative ∇w(x), n(x) RN , where n(x) is the outer unit normal at x ∈ ∂G
and λ∂G is the standard surface measure for ∂G.) Next, let c be an element of
B(0, 1), suppose r > 0 satisfies B(c, r) ⊂⊂ B(0, 1), and let u be a function that
is harmonic in a neighborhood of BRN (0, 1). By applying Green’s Identity with
G = BRN (0, 1) \ B(c, r) and v = 12 g B(0,1) (c, · ), use Exercise 11.2.21 to verify
Z
N −1
u(c) = lim r ω, ∇v(c + rω) RN u c + rω) λSN −1 (dω)
r&0 SN −1
Z Z
u ω)π (N ) (c, ω) λSN −1 (dω),
= ω, ∇v(ω) RN u ω) λSN −1 (dω) =
SN −1 SN −1
check that, as R & 1, uR −→ u1 uniformly on B(0, 1), and use the preceding to
conclude that Z
u1 (c) = f (ω) π (N ) (c, ω) λSN −1 (dω),
SN −1
which is, of course, the result that was proved in Theorem 11.1.28.
§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions
The origin of the Green function lies in the theory of electricity and magnetism.
Namely, if G is a region in RN whose boundary is grounded and y ∈ G, then
g G ( · , y) should be the electrical potential in G that results from placing a unit
point charge at y. More generally, if µ is any distribution of charge in G (i.e.,
a non-negative, locally finite, Borel measure on G), then one can consider the
potential GG µ given by
Z
(11.3.1) GG µ(x) = g G (x, y) µ(dy), x ∈ G,
G
ζ B(x,r)
"Z #
(N ) (N )
Wx
u ψ(ζ B(x,r) ) , ζ B(x,r) < ∞ − EWx 1
u(x) = E 2 ∆u ψ(τ ) dτ
0
Z
1
≥ u(x + rω) λSN −1 (dω).
ωN −1 SN −1
and used the rotation invariance of Brownian motion. Hence, each un is exces-
sive, and therefore, since
Z ∞
un (x) = pG (t, x, y) dt % g G (x, y) as n → ∞,
1
n
we are done.
§ 11.3.2. Potentials and Riesz Decomposition. My next goal is to prove
that, apart from the trivial case when u ≡ ∞, every excessive function on G
admits a unique representation in the form GG µ + h for an appropriate choice
of µ and h. The proof requires me to make some preparations.
Lemma 11.3.4. If u ∈ E(G), then either u ≡ ∞ or u is locally integrable on G.
Next, given a u ∈ E(G) that is not identically infinite, there exists a sequence
{un : n ≥ 1} ⊆ Cc∞ (G; R) and a non-decreasing sequence {Gn : n ≥ 1} of
open subsets of G with the properties that Gn ⊂⊂ G, Gn % G, un ≤ u,
∆un ≤ 0 on Gn for each n ≥ 1, and un −→ u pointwise as n → ∞. Moreover,
if µn (dy) = − 12 1Gn (y)∆un (y) dy, then there is a non-negative, locally finite,
Borel measure µ on G such that
Z Z
(11.3.5) lim ϕ dµn = ϕ dµ for all ϕ ∈ Cc (G; R).
n→∞ G G
Proof: To prove the first assertion, let U denote the set of all x ∈ G with the
property that
Z
u(y) dy < ∞ for some r > 0 with B(x, r) ⊂⊂ G.
B(x,r)
490 11 Some Classical Potential Theory
and so, after integrating this with respect to N sN −1 ds over (0, r), we get
Z Z
1 1
u(y) ≥ u(z) dz ≥ u(z) dz = ∞,
ΩN −1 rN B(y,r) ΩN −1 rN B(x,δ)
where δ ≡ r − |y − x|. Hence, we now see that G \ U is also open, and therefore
that either U = G or U = ∅ and u ≡ ∞.
Now assume that u ∈ E(G) is not identically infinite. To construct the required
Gn ’s and un ’s, choose a reference point c ∈ G, set R = 12 |c − G{|, and take ρ ∈
Cc∞ B(0, R4 ); [0, ∞) to be a rotationally invariant function with total integral
where ρ̃ : R −→ [0, ∞) is taken so that ρ(x) = ρ̃ |x| . Similarly, if B(x, r) ⊂⊂
Gn , then
Z
un (x + rω) λSN −1 (dω)
SN −1
Z Z
1
= ρ(z) u x+ nz + rω λSN −1 (dω) dz
B(0, R
4 ) SN −1
Z
ρ(z)u x + n1 z dz = ωN −1 un (x).
≤ ωN −1
B(0, R
4 )
observe that we already know that u(x) ≥ limn→∞ un (x). On the other hand,
because u is lower semicontinuous, an application of Fatou’s Lemma yields
Z
ρ(y) u x + n1 y dy = lim un (x).
u(x) ≤ lim
n→∞ G n→∞
To complete the proof, let µn be the measure described, and note that
t∧ζ Gn
"Z #
(N )
h i (N )
un (x) = EWx un ψ(t ∧ ζ Gn ) − EWx 1
2 ∆un ψ(s) ds
0
t∧ζ Gn
"Z # Z t Z
(N )
Wx 1 Gn
≥ −E 2 ∆un ψ(s) ds = p (s, x, y) µn (dy) ds
0 0 Gn
for all n ∈ Z+ and (t, x) ∈ (0, ∞) × Gn . Hence, after letting t % ∞, we see that
Z
u(x) ≥ un (x) ≥ g Gn (x, y) µn (dy), n ∈ Z+ and x ∈ Gn .
Gn
for every pair σ and τ of Bt : t ∈ [0, ∞) -stopping times with σ ≤ τ . In
particular, if u ∈ E(G) and B(x, r) ⊂⊂ G, then, for any rotationally symmetric
ρ ∈ Cc B(0, r); [0, ∞) with total integral 1,
Z
t ∈ (0, 1) 7−→ ρ(y) u(x + ty) dy ∈ [0, ∞]
B(0,r)
is a non-increasing function.
492 11 Some Classical Potential Theory
Proof: Let u ∈ E(G) be given. Clearly (11.3.9) is trivial in the case when
u ≡ ∞. Thus, assume that u 6≡ ∞, and define Gn and un for n ∈ Z+ as in
(11.3.7). Because ∆un Gn ≤ 0, we know that
(N )
h i
EWx un ψ(τ ∧ ζ Gm ∧ T ) , σ(ψ) ∧ T < ζ Gm (ψ)
(N )
h i
≤ EWx un ψ(σ ∧ T ) , σ(ψ) ∧ T < ζ Gm (ψ)
for all 1 ≤ m ≤ n, x ∈ Gm , and T ∈ [0, ∞). Next, after noting that ζ Gm < ∞
(N )
Wx -almost surely, let T % ∞ in the preceding, and arrive at
(N )
h i (N )
h i
EWx un ψ(τ ∧ζ Gm ) , σ(ψ) < ζ Gm (ψ) ≤ EWx un ψ(σ) , σ(ψ) < ζ Gm (ψ) .
Hence, by the Monotone Convergence Theorem, for any locally finite, non-
negative, Borel measure ν on G,
Z ZZ Z
Gm
(*) u(x) ν(dx) = lim g (x, y) ν(dx)dµn (y) + wm (x) ν(dx),
Gm n→∞ Gm
G2m
(N )
where wm (x) = EWx u ψ(ζ Gm ) , ζ Gm < ∞ .
But, since Gm is the intersection of two sets, both of which (cf. part (iv) in
Exercise 10.2.19) are regular, and is therefore regular as well, there is an n(a) ≥
m for which ϕn is continuous whenever n ≥ n(a). In particular, by (11.3.5), we
can now say that
Z Z Z
ρn (x − a) u(x) dx = ϕn (x) µ(dx) + ρn (x − a) wm (x) dx
Gm G Gm
the same
time, it is clear that the second term on the right goes to wm (a) and
that ϕn (y) : n ≥ n(a) tends non-decreasingly to g Gm (a, y). Thus, we have
now proved that
(**) u = GGm µ + wm on Gm for every m ∈ Z+ .
Starting from (**), the rest of the proof is quite easy. Namely, fix x ∈ G,
choose m so that x ∈ Gm , note that, g Gn (x, · ) is non-decreasing as n ≥ m
increases, and conclude that GGn∨m µ(x) % GG µ(x). Hence, by (**) (alter-
natively, by (11.3.9)), we know that wm∨n (x) tends non-increasingly to a limit
h(x), which Harnack’s Principle guarantees to be harmonic as a function of
x ∈ G. Thus, after passing to the limit as m → ∞ in (**), we conclude that
(11.3.11) holds with the µ satisfying (11.3.6) and h = limm→∞ H Gm u.
To prove that these quantities are unique, note that if ν is any locally finite,
non-negative, Borel measure on G for which u − GG ν is a non-negative harmonic
function, then, for every ϕ ∈ Cc∞ (G; R), simple integration by parts plus the
symmetry of g G shows that
Z Z Z
1 1 G
−2 ∆ϕu dx = − 2 ∆G ϕ dν = ϕ dν.
G G G
That is, ν must satisfy (11.3.6); and so we have now derived the required unique-
ness result.
Finally, to check the asserted characterization of h, suppose that v is a non-
negative harmonic function that is dominated by u on G. We then have
(N )
v(x) = EWx v ψ(ζ Gm ) , ζ Gm (ψ) < ∞ ≤ wm (x) for m ∈ Z+ and x ∈ Gm ,
and therefore the desired conclusion follows from the fact that wm tends to h.
By combining Lemma 11.3.2 with Theorem 11.3.10, we arrive at the following
characterization of potentials.
Corollary 11.3.12. Let everything be as in Theorem 11.3.10, and suppose
that u : G −→ [0, ∞] is not identically infinite. Then a necessary and sufficient
condition for u to be the potential GG µ of some locally finite, non-negative,
Borel measure µ on G is that u be excessive on G and have the property that
the constant function 0 is the only non-negative harmonic function on G that is
dominated by u.
Let u be an excessive function on G that is not identically infinite. In keeping
with the electrostatic metaphor, I will call the measure µ entering the Riesz de-
composition (11.3.11) of u the charge determined by u. A more mathematical
interpretation is provided by Schwartz’s theory of distributions. Namely, when
u ∈ E(G) is not identically infinite, it is (cf. Lemma 11.3.4) locally integrable on
G, and, as such, it determines a distribution there. Moreover, in the language
of distribution theory, (11.3.6) says that µ = − 12 ∆u. However, the following
theorem provides a better way of thinking about µ.
§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions 495
Moreover, if u ∈ E(G) is not identically infinite and, for s ∈ (0, ∞), µs (dx) =
fs (x) dx, where fs (x) = u(x)−u
s
s (x)
, then, as s & 0, {µs : s > 0} tends to the
charge µ of u in the sense that
Z Z
ϕ(x) µ(dx) = lim ϕ(x) µs (dx) for all ϕ ∈ Cc (G; R).
G s&0 G
Proof: If u ∈ E(G), then, by the first part of Lemma 11.3.8 with τ = s and
σ = 0, one sees that u ≥ us . Conversely, suppose that u : G −→ [0, ∞] is lower
semicontinuous, not identically infinite, and satisfies u ≥ us for all s > 0. Then,
since pG (s, x, · ) > 0, u is locally integrable on G. Thus, if B(c, r) ⊂⊂ G and
Z
ws (x) = u(y)pB(c,r) (s, x, y) dy,
B(c,r)
for (s, t) ∈ (0, ∞)2 and x ∈ B(c, r). Hence, if ϕ ∈ Cc2 B(c, r); [0, ∞) , then
Z Z
1
− 12 ∆ws (x)ϕ(x) dx = lim
ws (x) − ws+t (x) ϕ(x) dx ≥ 0,
B(c,r) t&0 s B(c,r)
which proves that ∆ws ≤ 0 on B(c, r). Since this means that ws ∈ E B(c, r)
for each s > 0 and because
ws is non-increasing as a function of s, we will know
that u ∈ E B(c, r) once we show that ws −→ u pointwise on B(c, r). But,
since ws ≤ u, this comes down to checking u(x) ≤ lims&0 ws (x), which follows
from lower semicontinuity.
496 11 Some Classical Potential Theory
Turning to the second assertion, begin with the observation that, because
u ≥ us and u is lower semicontinuous, us −→ u pointwise as s & 0. Next, note
that for (s, x) ∈ (0, ∞) × G,
"Z #
Z s Z T +s
G 1
g (x, y)fs (y) dy = lim ut (x) dt − ut (x) dt
G T →∞ s 0 T
1 s
Z
≤ ut (x) dt ≤ u(x).
s 0
Z s Z
1 G
ϕs − ϕ = 2 ∆ϕ(y)p (τ, · , y) dy dτ,
0 G
and so, by Fubini’s Theorem and the symmetry of pG (τ, x, y), one can justify
Z Z Z s
1
ϕ dµs = − uτ (y) dτ
∆ϕ(y) dy
G 2s G 0
Z Z
1
−→ − 2 ∆ϕ(y)u(y) dy = ϕ dµ.
G G
for all x ∈ R3 . Of course, it is not at all obvious that such a µK exists. Indeed,
the proof that it always does was one of Wiener’s significant contributions to
classical potential theory. As we are about to see, probability provides a simple
proof of Wiener’s result.1
§ 11.4.1. The Capacitory Potential. Here I will show that the extremal
problem described above has a solution.
Theorem 11.4.1. Assume that G is a connected, open subset of RN and that
either N ≥ 3 or (11.1.20) holds. Given K ⊂⊂ G, set
pG (N )
∃t ∈ 0, ζ G (ψ) ψ(t) ∈ K , x ∈ G.
(11.4.2) K (x) = Wx
Then pG G
K is a potential whose charge µK is supported on K. Moreover, if µ ∈
M(G) is supported on K and GG µ ≤ 1, then GG µ ≤ pG K.
Proof: I begin by checking that pG K is excessive. For this purpose, note that,
for any s > 0, the Markov property says that
Z
pG G (N )
∃t ∈ s, ζ G (ψ) ψ(t) ∈ K ≤ pG
K (y)p (s, x, y) dy = Wx K (x).
G
In addition, because pG
K is bounded, the left-hand side is continuous with respect
to x ∈ G, and clearly the middle expression tends non-decreasingly to pG K (x) as
s & 0. Thus, by the first part of Theorem 11.3.13, we now know that pGK ∈ E(G).
1It is interesting to note that, although Wiener’s 1924 article, “Certain notions in potential
theory,” J. Math. Phys. M.I.T. 4, contains the first proof that an arbitrary compact set is
capacitable, it contains no reference to his own measure.
498 11 Some Classical Potential Theory
That is, pG
K satisfies the mean value property in G \ K and is therefore harmonic
there.
To complete the proof I must still show that if µ ∈ M(G) is supported on
K and u ≡ GG µ ≤ 1, then u ≤ pG K , and I will start by showing that u ≤ pK
G
The function pG G
K and the measure µK are, for the reasons explained above,
known as, respectively, the capacitory potential and the capacitory distri-
bution for K in G, and the total mass
(11.4.3) Cap(K; G) ≡ µG
K (K)
there exists (cf. Corollary 11.2.20 when N = 2) a C < ∞ such that g G (x, y) ≤
B(0,R)
g G∩B(0,R) (x, y) + C for all x ∈/ B(0, R) and y ∈ K. Hence, GG µK ≤
500 11 Some Classical Potential Theory
B(0,R)
1 + CCap K, B(0, R) , and so we have shown that GG µK is a non-zero,
bounded potential on G whose charge is supported in K, which, by the preceding
equivalences, means that Cap(K; G) > 0. Conversely, if Cap(K; G) > 0, then,
again by the preceding equivalences, we know that pG K > 0everywhere on G,
(N )
which, of course, means that Wx ∃t ∈ (0, ∞) ψ(t) ∈ K > 0, first for all
x ∈ G and then for all x ∈ RN .
The last part of the preceding allows us to use capacity to determine whether
Brownian paths will hit a K ⊂⊂ RN . Indeed, we now know that they will if
and only if Cap(K; G) > 0 for some G ⊃⊃ K satisfying our hypotheses. Thus,
the ability of Brownian paths in RN to hit a set is completely determined by
the singularity in the Green function. Namely, they will hit K with positive
probability if and only if there is a non-zero µ supported on K for which GG µ
is bounded. When N = 1, there is no singularity, and so even points can be hit.
When N ≥ 2, there is a singularity, and so, in order to be hit, K has to be large
enough to support a measure that is sufficiently smooth to mollify the singularity
in the Green function. Non-trivial (i.e., K’s for which K{ is the interior of its
closure) examples of K’s that cannot be hit are hard to come by. “Lebesgue’s
spine” provides one in R3 and can be adapted to RN for N ≥ 3. When N = 2
one has too work much harder. The most famous example is a devilishly clever
construction, known as “Littlewood’s crocodile,” due to J.E. Littlewood. See M.
Brelot’s lecture notes Éléments de la Théorie Classique du Potenial published
in 1965 by Centre de Documentation Universitaire, Sorbonne, Paris V.
§ 11.4.2. The Capacitory Distribution. In this subsection I will give a prob-
abilistic representation, discovered by K.L. Chung, of the capacitory distribution
µG N
K . Again I assume that G is a connected open subset of R and that either
N ≥ 3 or (11.1.20) holds.
The function `G N
K : C(R ) −→ [0, ∞] given by
`G G
K (ψ) = sup t ∈ 0, ζ (ψ) : ψ(t) ∈ K
(11.4.5)
≡ 0 if t ∈ 0, ζ G (ψ) : ψ(t) ∈ K = ∅ .
Cap(K; G) > 0. Then, for all Borel measurable ϕ : G −→ R that are bounded
below and every c ∈ G,
" #
ϕ ψ(`GK)
Z
(N )
G Wc G
, `K ∈ (0, ∞) .
(11.4.7) ϕ dµK = E
G g G c, ψ(`G
K)
(N )
h i
−→ EWc ϕ ψ(`G
G
K ) , `K ∈ (0, ∞) as s & 0,
where, in the passage to the third line, I have applied the Markov property and
used the time-shift property of `G K . Next, let η ∈ Cc (G; R) be given, note that
η
ϕ = gG (c, ·)
is again an element of Cc (G; R), and conclude from Theorem 11.3.13
and the preceding that (11.4.7) holds first for ϕ’s in Cc (G; R) and then for all
bounded, measurable ϕ’s on G.
Aside from its intrinsic beauty, (11.4.7) has the virtue that it simplifies the
proofs of various important facts about capacity. For instance, it allows one to
prove a basic monotone convergence result for capacity. However, before doing
so, I will need to introduce the the energy E G (µ, ν), which is defined for locally
finite, non-negative Borel measures µ and ν on G by
ZZ
E G (µ, ν) = g G (x, y) µ(dx)ν(dy).
G2
Clearly E G (µ, ν) is some sort of inner product, and so it is not surprising that
there is a Schwarz inequality for it.
Lemma 11.4.8. For any pair of locally finite, non-negative, Borel measures µ
and ν on G, q q
E G (µ, ν) ≤ E G (µ, µ) E G (ν, ν);
and, when the factors on the right are both finite, equality holds if and only if
aµ − bν = 0 for some pair (a, b) ∈ [0, ∞)2 \ (0, 0).
502 11 Some Classical Potential Theory
(0,∞)×G (0,∞)×G
12 12
ZZ ZZ
= f (t, x) dtdx g(t, x) dtdx
(0,∞)×G (0,∞)×G
q q
= E G (µ, µ) E G (ν, ν).
Furthermore, when f and g are square integrable, then equality holds if and only
if they are linearly dependent in the sense that af − bg = 0 Lebesgue-almost
everywhere for some non-trivial choice of a, b ∈ [0, ∞). But this means that
Z Z T Z
a G
a ϕ dµ = lim ϕ(x)p (t, x, y) µ(dx) dt
G T &0 T 0 G
ZZ ZZ
a b
= lim ϕ(x) f (t, x) dtdx = lim ϕ(x) g(t, x) dtdx
T &0 T T &0 T
(0,T ]×G (0,T ]×G
Z T Z Z
b
= lim ϕ(x)pG (t, x, y) ν(dx) dt = b ϕ dν
T &0 T 0 G G
Z 12 q
1 p
= pG
K dµG
K E G
(µ, µ) 2 ≤ Cap(K; G) E G (µ, µ),
G
and equality can hold only if aµG K − bµ = 0 for some non-trivial pair (a, b) ∈
[0, ∞)2 . When one takes µ = µG K this, in conjunction with the preceding, proves
,
that Cap(K; G) = E G µG K , µG
K . In addition, for any µ with µ(G \ K) = 0 and
G G
G µ ≤ 1, it shows that E (µ, µ) ≤ Cap(K; G) and that equality can hold only
if µ and µGK are related by a non-trivial linear equation,
in which case µ = µG K
G G G G
follows immediately from the equality E µK , µK = E (µ, µ).
The result in Theorem 11.4.9, which was known to Wiener, played an impor-
tant role in his analysis of classical potential theory. To be more precise, when
3
3
N = 3 and K{ is regular, pR K is the continuous function on R that is harmonic
off K, is 1 on K, and tends to 0 at infinity. Thus, it is a relatively simple prob-
lem to define the capacitory distribution for such K’s in R3 . The importance
to Wiener of results like that in Theorem 11.4.9 is that they enabled him (cf.
Exercise 11.4.20) to make a consistent assignment of capacity to K’s for which
K{ is not necessarily regular.
§ 11.4.3. Wiener’s Test. This subsection is devoted to another of Wiener’s
famous contributions to classical potential theory.
As was pointed out following Corollary 11.4.4, capacity can be used to test
whether Brownian paths will hit a compact set K. By Lemma 11.1.21, an
equivalent statement is that capacity can be used to test whether ∂reg (K{) is
empty or not. The result of Wiener that will be proved here can be viewed as a
sharpening of this remark.
Assume that N ≥ 2, and let an open subset G of RN and an a ∈ ∂G be given.
For n ∈ Z+ , set
n o
Kn = y ∈ / G : 2−n−1 ≤ |y − a| ≤ 2−n ,
and define
nCap Kn ; B(a, 1) if N = 2
(11.4.10) Wn (a, G) =
2n(N −2) Cap Kn ; B(a, 1)
if N ≥ 3.
Then Wiener’s test says that
∞
X
(11.4.11) a ∈ ∂reg G ⇐⇒ Wn (a, G) = ∞.
n=1
§ 11.4 Capacity 505
Notice that, at least qualitatively, (11.4.11) is what one should expect in that
the divergence of the series is some sort of statement that G{ is robust at a.
The key to my proof of Wiener’s test is the trivial observation that because
Z
B(a,1) B(a,1)
pn (x) ≡ pKn (x) = g B(a,1) (x, y) µKn (dy),
Kn
Hence, in probabilistic terms, Wiener’s test comes down to the assertion that
∞
X
Wa(N ) G
Wa(N ) An = ∞,
ζ0+ = 0 = 1 ⇐⇒
1
where An is the set of ψ ∈ C(RN ) that visit Kn before leaving B(a, 1). Actually,
although the preceding equivalence is not obvious, the closely related statement
Wa(N ) ζ0+
G
= 0 = 1 ⇐⇒ Wa(N ) lim An > 0
(11.4.12)
n→∞
G
is essentially immediate. Indeed, if ψ(0) = a and ζ0+ (ψ) = 0, then there
exists a sequence of times tm & 0 with the property that ψ(tm ) ∈ B(a, 1) ∩
G{ for all m, from which it is clear that ψ visits infinitely many Kn ’s before
leaving B(a, 1). Hence, the “ =⇒ ” in (11.4.12) is trivial. As for the opposite
N B(a,1)
implication,
B(a,1) that ψ ∈ C(R ) has the properties that ζ
suppose (ψ) < ∞,
t ∈ 0, ζ (ψ) : ψ(t) = a} = {0}, and that ψ visits infinitely many Kn ’s
before leaving B(a, 1). We can then find a subsequence {nm : m ≥ 1} and
a convergent sequence of times tm > 0 such that ψ(tm ) ∈ Knm for each m.
Clearly, limm→∞ ψ(tm) = a, and therefore
limm→∞
tm = 0. In other words, if
ζ B(a,1) (ψ) < ∞, t ∈ 0, ζ B(a,1) (ψ) : ψ(t) = a = {0}, and ψ ∈ limn→∞ An ,
G
then ζ0+ (ψ) = 0. Hence, since N ≥ 2 and therefore
(N ) G
and therefore, because Wa ζ0+ = 0 ∈ {0, 1}, we have proved the equivalence
in (11.4.12).
506 11 Some Classical Potential Theory
In view of the preceding paragraph, the proof of Wiener’s test reduces to the
problem of showing that
∞
X
Wa(N ) Wa(N ) An = ∞.
(11.4.13) lim An > 0 ⇐⇒
n→∞
1
Proof: Because
∞
X ∞
X
P An = ∞ =⇒ P And+k = ∞ for some 0 ≤ k < d,
n=1 n=1
whereas
P lim An ≥ P lim And+k for each 0 ≤ k < d,
n→∞ n→∞
1
I will assume that P(An ) ≤ 4C for all n ∈ Z+ . In particular, these assumptions
mean that, for each m ∈ Z+ , we can find an nm > m such that
nm
X 3 1
sm ≡ P A` ∈ 4C ,C .
`=m
Pn 1
Indeed, simply take nm to be the largest n > m for which `=m P A` ≤ C.
At the same time, by an easy induction argument on n > m, one has that
n
! n
[ X 1 X
P A` ≥ P A` − P Ak ∩ A`
2
`=m `=m m≤k6=`≤n
§ 11.4 Capacity 507
for all m ∈ Z+ .
Proof of Wiener’s Test: All that remains is to check that the sets An
(N )
appearing in (11.4.13) satisfy the hypothesis in Lemma 11.4.14 when P = Wa .
To this end, set n o
σn (ψ) = inf t ∈ (0, ∞) : ψ(t) ∈ Kn .
Clearly, An = σn < ζ B(a,1) , and so
as the amount of heat that flows into K during [0, t] from outside.
3See Electrostatic capacity, heat flow, and Brownian motion, in Z. Wahrsh. Gebiete. 3. Re-
cently, M. Van den Burg has written several papers in which he greatly refines Spitzer’s result.
508 11 Some Classical Potential Theory
EK (t)
lim = Cap(K; RN ).
t→∞ t
Proof: Because, by the second part of Lemma 11.1.5,
To see this, notice that there would be nothing to do if the integral were over
(N )
K{. On the other hand, by part (ii) of Exercise 10.2.19, Wx (σK > 0) = 0
Lebesgue-almost everywhere on K, and so the integral over K does not con-
tribute anything.
I now want to replace the preceding by
Z
Wy(N ) σK ≤ h and σK h
(*) EK (t) − EK (t − h) = > t dy,
RN
where
h
σK (ψ) ≡ inf s ∈ (h, ∞) : ψ(s) ∈ K
is the first entrance time into K after time h. To prove (*), set
(x,y) t−s s
θt (s) = x + θt (s) + y, s ∈ [0, t],
t t
Wx(N ) t − h < σK ≤ t
Z
(x,y)
= W (N ) t − h < σK θt ≤ t g (N ) (t, y − x) dy
N
ZR
(y,x) (y,x)
= W (N ) σK θt ≤ h and σK h
θt > t g (N ) (t, y − x) dy,
RN
Starting from (*), one has that, for each h ∈ [0, ∞),
∆K (h) ≡ lim EK (t + h) − EK (t)
t→∞
Z
= Wy(N ) σK ≤ h and σK h
= ∞ dy,
RN
Wy(N ) σK ≤ h and σK h
= ∞ = Wy(N ) σKh
= ∞ − Wy(N ) σK = ∞
Z
N N
= Wy(N ) σK < ∞ − Wy(N ) σK h
g (N ) (h, y − ξ)pR
< ∞ = pR K (y) − K (ξ) dξ.
RN
Finally, combine these with Theorem 11.3.13 to arrive at ∆K (1) = Cap K; RN .
To complete the proof, set ]t[= t − btc and write
[t]
X
EK (t) = EK ]t[ + EK ]t[ +n − EK ]t[ +n − 1 .
n=1
Using this together with ∆K (h) = hCap(K; G), one obtains the desired re-
sult.
The next two computations provide asymptotic formulas as t % ∞ for the
(N )
quantity Wx σK ∈ (t, ∞) .
Theorem 11.4.17.4 If N ≥ 3 and K ⊂⊂ RN , then, as t % ∞,
N
2Cap(K; RN ) 1 − pR
K (x) 1− N
pK (t, x) ≡ Wx(N ) σK ∈ (t, ∞) ∼ N t 2
(2π) 2 (N − 2)
uniformly for x in compacts.
4 This result was conjectured by Kac and first proved by his student A. Joffe. However, I will
follow the argument given by F. Spitzer in the article cited above.
510 11 Some Classical Potential Theory
Proof: Without loss in generality (cf. Corollary 11.4.4), I will assume that
N
Cap(K; RN ) > 0. Next, set pK (x) = pR
K (x) and pK (t, x, y) = p
K{
(t, x, y), and
note that, by the Markov property,
Z
pK (t, x) = pK (y) pK (t, x, y) dy.
K{
N
Thus, since pK (t, x, y) ≤ (2πt)− 2 , we know that
Z
N
−1
lim sup t p (t, x) − p (y) p (t, x, y) dy =0
2
K K K
t→∞ x∈RN |y|≥R
for every R > 0 with K ⊂⊂ B(0, R). At the same time, because
Z
3 N
pK (y) = g R (x, y) µR
K (dx),
K
it is clear that
2Cap(K; RN )
lim |y|N −2 pK (y) = .
|y|→∞ (N − 2)ωN −1
Hence, we have now shown that
N Z
N
2Cap(K; R ) pK (t, x, y)
lim sup t −1 pK (t, x) − dy =0
2
t→∞ x∈RN (N − 2)ωN −1 |y|≥R |y| N −2
for each R ∈ (0, ∞) with K ⊂⊂ B(0, R), and what we must still prove is that
N Z pK (t, x, y) ωN −1 (N )
2 −1
(*) lim sup t dy − W (σ = ∞)=0
N −2 N x K
t→∞ |x|≤r |y|≥R |y| (2π) 2
where
g (N ) (t, y − x)
Z
q(t, x) ≡ dy for (t, x) ∈ (0, ∞) × RN .
|y|≥R |y|N −2
§ 11.4 Capacity 511
After changing to polar coordinates and making a change of variables, one can
easily check that, for each T ∈ [0, ∞),
N ωN −1
lim sup t 2 −1 q(t − s, x) − = 0.
N
t→∞ 0<s≤T (2π) 2
|x|≤r
then it becomes clear that (*) will follow once we check that
lim sup Wx(N ) σK ∈ (T, ∞) = 0 and
T →∞ x∈RN
(**) N (N )
h i
sup t 2 −1 EWx q t − σK , ψ(σK ) , σK ∈ (T, t) = 0.
lim
T →∞ t>T
x∈RN
To check the first part of (**), note that, by the Markov property,
Z
Wx(N ) σK ∈ (T, T + 1] = pK (T, x, y)Wy(N ) σK ≤ 1 dy
K{
Z
−N N
Wy(N ) σK ≤ 1 dy ≤ CT − 2 ,
≤ (2πT ) 2
RN
X∞
Wx(N ) σK ∈ (T, ∞) ≤ Wx(N ) σK ∈ (T + n, T + n + 1] ,
n=0
(N )
we see that, as T → ∞, Wx σK ∈ (T, ∞) −→ 0 uniformly with respect to
x ∈ RN .
To handle the second part of (**), note that there is a constant A ∈ (0, ∞)
for which
N
q(t, x) ≤ A (t ∨ 1)1− 2 , (t, x) ∈ (0, ∞) × K,
512 11 Some Classical Potential Theory
and therefore
N (N )
t 2 −1 EWx
q t − σK , ψ(σK ) , σK ∈ (T, t)
N
−1
Wx(N ) σK ∈ [t] − 1, t
≤ At 2
[t]−1
N
X
(t − `)1− 2 Wx(N ) σK ∈ (` − 1, `]
+
`=[T ]
[t]−1
N
−N N N N
X
−1 −1
≤ ACt 2 ([t] − 1) 2 + ACt 2 (t − `)1− 2 (` − 1)− 2 ,
`=[T ]
where the C is the same as the one that appeared in the derivation of the first
part of (**). Thus, everything comes down to verifying that
n−1
N N N
X
lim sup n 2 −1 (n − `)1− 2 `− 2 = 0.
m→∞ n>m
`=m
2
But, by taking m = m N −1 and considering
N N N N
X X
(n − `)1− 2 `− 2 and (n − `)1− 2 `− 2
m≤`≤(1−m )n (1−m )n≤`≤n
n−1
N N N
X
n 2 −1 (n − `)1− 2 `− 2 ≤ Bm .
`=m
2πhK (x)
Wx(2) σK > t ∼ for each x ∈ R2 \ K.
log t
5 This theorem is taken from G. Hunt’s article Some theorems concerning Brownian motion,
T.A.M.S. 81, pp. 294–319 (1956). With breathtaking rapidity, it was followed by the articles
referred to in § 11.1.4.
§ 11.4 Capacity 513
Proof: The strategy of Hunt’s proof is to deal with the Laplace transform
Z ∞ (2)
e−αt W (2) σK > t dt = α−1 1 − EWx e−ασK ,
0
show that
log α1 (2)
(*) lim 1 − EWx e−ασK = hK (x),
α&0 2π
log t t (2)
Z
lim Wx σK > τ dτ = hK (x)
t→∞ 2πt 0
and then, because t W (2) σK > t is non-increasing, that the asserted result
holds. Thus, everything comes down to proving (*).
Set G = R2 \ K. By assumption, G satisfies the hypotheses of Theorem
11.2.14. Now let x ∈ G be given, and choose y ∈ G \ {x} from the same
connected component of G as x. Then pG (t, x, y) > 0 for all t ∈ (0, ∞). In
addition, by (10.3.8), for each α ∈ (0, ∞),
Z ∞
e−αt pG (t, x, y) dt
0
Z ∞ Z ∞
(N )
−αt (2) Wx −ασK −αt (2)
= e g t, y − ψ(σK ) dt − E e e g (t, y − x) dt .
0 0
Writing
Z 1 Z ∞
t exp −βt − t−1 dt +
−1
t−1 e−βt exp −t−1 − 1 dt
2πf (β) =
0 1
Z ∞
+ t−1 e−t dt,
β
where Z ∞
1
κ= e−t log t dt.
π 0
1 1 (2)
g G (x, y) = − log |y − x| + EWx log |y − ψ(σK )|, σK < ∞
π π
log α1 (2)
+ 1 − EWx e−ασK + o(1)
2π
as α & 0. Finally, after comparing this to (11.2.16), we arrive at (*).
Let K ⊂⊂ RN be as in the preceding theorem, and choose some c ∈ K{. By
comparing the result just obtained to (11.2.15), we see that
(2)
Wx σK > t
lim (2) = 2 for each x ∈ K{.
t→∞ Wx σK > ζ BR2 (c,t)
Further, say that Γ ∈ BRN has capacity zero if there is no tame µ ∈ M(RN )
for which µ(Γ) > 0.
(i) If K ⊂⊂ RN , show that K has capacity 0 if and only if Cap K; B(0, R) = 0
for some R > 0 with K ⊂⊂ B(0, R). Further, show that if K has capacity 0, G
is open with K ⊂⊂ G, and either N ≥ 3 or (11.1.20) holds, then Cap(K; G) = 0.
(ii) If Γ ∈ BRN , show that Γ has capacity 0 if and only if every compact K ⊆ Γ
has capacity 0.
(iii) For any open G ⊆ RN , show that ∂G \ ∂reg G has capacity 0.
Exercises for § 11.4 515
General
517
518 Notation
Measure Theoretic
Wiener Measure
Gaussian or normal distribution with mean m and co-
γm,C §2.3.1
variance C
Potential Theoretic
521
522 Index
M Hermite, 98
marginal distribution, 83
N
Markov property, 417
martingale, 205 Nelson’s Inequality, 106
application to Fourier series, 263 non-degenerate, 306
continuous parameter, 267 non-negative definite function, 119
complex, 267 non-negative linear functional, 374
Gundy’s decomposition of, 227 normal law, 23
Hahn decomposition of, 227 fixed point characterization, 91
reversed, 217 Lévy–Cramér Theorem, 66
Banach-valued case, 241 standard, 23
on σ-finite measure space, 233 null set, see P-null set
martingale convergence
continuous parameter, 271 O
Hilbert-valued case, 243
operator
Marcinkewitz’s Theorem, 207
Fourier, 100
preliminary version for Banach space, 239
hypercontractive, 105
second proof, 226
lowering, 97
third proof, 227
raising, 96
via upcrossing inequality, 214
optional stopping time, 280
maximal function
Ornstein–Uhlenbeck process, 344
Hardy–Littlewood, 235
ancient, 345
Hardy–Littlewood inequality, 236
associated martingales, 415
maximum principle of Phragmén– Lin-
Gaussian description, 344
delöf, 474
Hermite heat kernel, 454
Maxwell distribution for ideal gas, 70
reversible, 346
mean value
in Banach space, 365
Banach space case, 199
vector-valued case, 84
P
measure
invariant, 112 Paley–Littlewood Inequality for Walsh
locally finite, 63 series, 264
non-atomic, 381 Paley–Wiener map, 312
product, 10 as a stochastic integral, 316
pushforward Φ∗ µ of µ under Φ, 12 Parseval’s Identity, 112
measure preserving, 244 path properties, 158
measures absolutely pure jump, 158
consistent family, 383 piecewise constant, 158
tight, 376, 382 Phragmén–Lindelöf, 474
median, 39 pinned Brownian motion, 327
variational characterization, 43 π-system, 8
Mehler kernel, 98 P-null set, 194
minimum principle, 130 Poincaré’s Inequality for Gaussian, 355
strong, 405 Poisson jump process, 168
weak, 404 Itô’s construction of, 390
moment estimate for sums of independent Poisson kernel, 149
random variables, 94 for upper half-space, 429
moment generating function, 23 for ball via Green’s Identity, 487
logarithmic, 25 Poisson measure, 122
multiplier generalized, 171
Bernoulli, 101 simple, 161
526 Index