Conditional Expectation and Martingales
1. Introduction
The stochastic processes known as martingales were invented by Joseph Doob a little over
fifty years ago at the University of Illinois. Their introduction changed the face of modern
probability theory. Martingales now play a central role in probabilistic potential theory, in
stochastic calculus, and (as some of you will learn) in mathematical finance. The purpose of
these notes is to give an introduction to martingale theory, and to illustrate how martingales
arise in the study of Markov chains and random walks.
2.3. First Application: The Gambler’s Ruin Problem. Two gamblers, Fats and
Slim, play the following game: Fats repeatedly tosses a fair coin. After each toss that
comes up H, Slim pays Fats one dollar. After each toss that comes up T , Fats pays Slim
one dollar. The game continues until either one or the other gambler runs out of money. If
Fats starts with $A and Slim starts with $B,
(A) What is the probability that, when the game ends, Fats has all the cash?
(B) What is the expected duration of the game?
Solution to Problem B. To solve Problem (B), we shall appeal to the Second Wald
Identity. Since µ = 0 and σ 2 = 1, and since Eτ < ∞ (see below), the Second Wald Identity
implies that
ESτ^2 = Eτ.
But Sτ takes one of only two possible values, +A and −B, and by equations (9) above, the
probability distribution is known. Thus,
ESτ^2 = A^2 B/(A + B) + B^2 A/(A + B) = AB.
This proves that
(10) Eτ = AB.
To justify the calculations that led to equations (9) and (10), we must show that Eτ < ∞.
To this end, set M = A + B, and consider how the game proceeds in blocks of M consecutive
tosses. In each such block, it is possible that there will be M consecutive Heads, in which
case the game will end and Fats will take home the money. For each such block of M
consecutive tosses, this happens with probability 1/2M . Thus, the probability that the game
has not ended after the first kM tosses is at most (1 − 2^−M )^k . Thus,
P {τ > kM } ≤ (1 − 1/2^M )^k ,
that is, the distribution of τ has an exponentially decaying tail. This implies that Eτ < ∞.
(Exercise: Explain why.)
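Both answers are easy to corroborate by simulation. The following Monte Carlo sketch (the function names and the choice A = 3, B = 7 are ours, not part of the notes) estimates the win probability A/(A + B) and the expected duration AB:

```python
import random

def ruin_game(A, B, rng):
    """Play one game; return (True if Fats ends with all the cash, duration)."""
    fortune, tosses = A, 0            # Fats's current fortune
    while 0 < fortune < A + B:
        fortune += 1 if rng.random() < 0.5 else -1
        tosses += 1
    return fortune == A + B, tosses

def estimate(A, B, trials=20000, seed=0):
    rng = random.Random(seed)
    wins = total = 0
    for _ in range(trials):
        won, tosses = ruin_game(A, B, rng)
        wins += won
        total += tosses
    return wins / trials, total / trials

# Theory: P(Fats wins) = A/(A+B) = 0.3 and E(tau) = AB = 21 for A = 3, B = 7.
p_win, mean_tau = estimate(3, 7)
```

Both estimates should land within Monte Carlo error of 3/10 and 21.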
where the sum is over the set of all possible values x of X. Note that this expression depends
on the value y – this will be important in what follows. For discrete random variables that
take values in finite sets there are no difficulties regarding possible divergence of the sum, nor
is there any difficulty regarding the meaning of the conditional probability P (X = x | Y = y).
For continuous random variables, or, worse, random variables that are neither discrete nor
have probability densities, the definition (14) is problematic. There are two main difficulties:
(a) If X is not discrete, then the sum must be replaced by an integral of some sort; and (b) If
Y is not discrete then it is no longer clear how to define the conditional probabilities P (X =
x | Y = y). Fortunately, there is an alternative way of defining conditional expectation that
works in both the discrete and the non-discrete cases, and additionally allows for conditioning
not only on the value of a single random variable, but for conditioning simultaneously on
the values of finitely or even countably many random variables or random vectors:
Definition 1. Let X be a real-valued random variable such that either E|X| < ∞ or X ≥ 0,
and let Y1 , Y2 , . . . , Ym be random variables taking values in a set Y.1 The conditional expec-
tation E(X | Y1 , Y2 , . . . , Ym ) is the unique (measurable) real-valued function of Y1 , Y2 , . . . , Ym
such that for every bounded (measurable) function g(Y1 , Y2 , . . . , Ym ),
(15) EXg(Y1 , Y2 , . . . , Ym ) = E(E(X | Y1 , Y2 , . . . , Ym )g(Y1 , Y2 , . . . , Ym )).
It is by no means clear a priori that such a function E(X | Y1 , Y2 , . . . , Ym ) should always
exist, nor is it obvious that it should be unique. In fact, the existence of such a function
is an important theorem of measure theory, called the Radon-Nikodym theorem. You will
see its proof next year, in Prof. Wichura’s class. The uniqueness of the function is not so
difficult to prove:
Proof of Uniqueness. For convenience, denote by Y = (Y1 , Y2 , . . . , Ym ) the random vector on
which we are conditioning. Suppose that there are distinct functions h1 (y) and h2 (y) such
1More precisely, Y1 , Y2 , . . . , Ym take values in a measurable space (Y, G). If the set Y is finite, there is no
need to worry about measurability issues.
that, for every bounded function g(y),
EXg(Y ) = Eh1 (Y )g(Y ) and
EXg(Y ) = Eh2 (Y )g(Y ).
Then by the linearity of ordinary expectation (taking the difference of the two equations) it
must be the case that for every bounded function g(y),
0 = E(h1 (Y ) − h2 (Y ))g(Y );
in particular, this equation must hold for the function g(y) that is 1 if h1 (y) > h2 (y) and
0 otherwise. But this implies that P {h1 (Y ) > h2 (Y )} = 0. A similar argument shows that
P {h2 (Y ) > h1 (Y )} = 0. It follows that P {h1 (Y ) = h2 (Y )} = 1.
3.2. Equivalence of the Naive and Modern Definitions. It is not difficult to show that
the naive definition (14) coincides with the modern Definition 1 when the random variable
X and the random vector Y = (Y1 , Y2 , . . . , Ym ) are discrete and assume only finitely many
possible values with positive probability. Define
h(y) = Σ_x x P (X = x | Y = y) = Σ_x x P {X = x and Y = y}/P {Y = y}
for those values of y such that P {Y = y} > 0. To show that E(X | Y ) = h(Y ), it suffices,
by Definition 1, to show that, for any bounded function g(y),
EXg(Y ) = Eh(Y )g(Y ).
But
EXg(Y ) = Σ_y Σ_x x g(y) P {X = x and Y = y}
        = Σ_y g(y) P {Y = y} Σ_x x P (X = x | Y = y)
        = Σ_y g(y) P {Y = y} h(y)
        = Eh(Y )g(Y ).
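The same computation can be checked numerically on a toy example. In the sketch below (the joint pmf is an arbitrary illustration of ours), h is the naive conditional expectation, and the defining identity EXg(Y ) = Eh(Y )g(Y ) is verified for several bounded test functions g:

```python
# A small joint pmf for (X, Y); the particular numbers are arbitrary.
pmf = {(1, 'a'): 0.2, (2, 'a'): 0.1, (1, 'b'): 0.3, (3, 'b'): 0.4}

def h(y):
    """The naive conditional expectation E(X | Y = y)."""
    p_y = sum(p for (x, yy), p in pmf.items() if yy == y)
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_y

def expect(f):
    """E f(X, Y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in pmf.items())

# The defining identity EXg(Y) = Eh(Y)g(Y), for several bounded g.
tests = (lambda y: 1.0, lambda y: float(y == 'a'), lambda y: 2.0 if y == 'b' else -1.0)
for g in tests:
    lhs = expect(lambda x, y, g=g: x * g(y))
    rhs = expect(lambda x, y, g=g: h(y) * g(y))
    assert abs(lhs - rhs) < 1e-12
```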
3.3. Properties of Conditional Expectation. The raw Definition 1 can be rather clumsy
to work with directly. In this section we present a short list of important rules for manipu-
lating and calculating conditional expectations. The bottom line will be that, in many im-
portant respects, conditional expectations behave like ordinary expectations, with random
quantities that are functions of the conditioning random variable being treated as constants.
In stating the properties below, it is convenient to use the abbreviation Y = (Y1 , Y2 , . . . , Yn ).
Note: Definition 1 requires only that the equation EXg(Y ) = EE(X | Y )g(Y ) be
valid for bounded functions g. A standard argument from measure theory, which you will
learn next year, then implies that it holds for all functions such that the product Xg(Y )
has finite first moment. Similarly, Property (4) holds provided the product has finite first
moment.
Proof of the Positivity Property. The idea is to exploit the defining property (15) of condi-
tional expectation. First, suppose that X ≥ 0. Define B to be the set of possible values of Y
where the conditional expectation E(X | Y ) < 0, so that the event {E(X | Y ) < 0} coincides
with the event {Y ∈ B}. Then by equation (15),
EX1B (Y ) = E(E(X | Y )1B (Y )).
Since X ≥ 0, the left side of this equality is nonnegative; but by definition of B, the right
side is negative unless P {Y ∈ B} = 0. It follows that P {Y ∈ B} = 0, that is, E(X | Y ) ≥ 0
with probability one.
Proof of the Linearity Property. Since each of the conditional expectations E(U | Y ) and
E(V | Y ) is a function of Y , so must be the linear combination aE(U | Y ) + bE(V | Y ).
Thus, by Definition 1, to show that this linear combination is the conditional expectation
E(aU + bV | Y ), it suffices to show that it satisfies equation (15), that is, that for every
bounded function g(Y ),
(16) E(aU + bV )g(Y ) = E(aE(U | Y ) + bE(V | Y ))g(Y ).
But equation (15) holds for X = U and for X = V :
EU g(Y ) = EE(U | Y )g(Y ),
EV g(Y ) = EE(V | Y )g(Y ).
Multiplying these equations by a and b, respectively, and then adding gives (16), because
the unconditional expectation operator is linear.
Proof of the Independence Property. This relies on the fact that if U and V are independent,
integrable random variables whose product U V is also integrable, then E(U V ) = EU EV .
Now suppose that X is independent of Y , and let g(Y ) be any bounded (measurable) function
of Y . Then EXg(Y ) = EXEg(Y ) = E(EX)g(Y ). Since any constant, and in particular
EX, is trivially a function of Y , Definition 1 implies that EX must be the conditional
expectation E(X | Y ).
Proof of the Jensen Inequalities. One of the basic properties of convex functions is that every
point on the graph of a convex function ϕ has a support line: that is, for every argument
x∗ ∈ R there is a linear function yx∗ (x) = ax + b such that
ϕ(x∗ ) = yx∗ (x∗ ) and
ϕ(x) ≥ yx∗ (x) for all x ∈ R.
Let X be a random variable such that E|X| < ∞, so that the expectation EX is well-
defined and finite. Let yEX (x) = ax + b be the support line to the convex function at
the point (EX, ϕ(EX)). Then by definition of a support line, yEX (EX) = ϕ(EX); also,
yEX (X) ≤ ϕ(X), and so
EyEX (X) ≤ Eϕ(X).
But because yEX (x) = ax + b is a linear function of x,
EyEX (X) = yEX (EX) = ϕ(EX).
This proves the Jensen inequality for ordinary expectation. The proof for conditional ex-
pectation is similar. For any value of Y , let yE(X|Y ) (x) be the support line at the point
(E(X|Y ), ϕ(E(X|Y ))). Then yE(X|Y ) (E(X|Y )) = ϕ(E(X|Y )), and for every value of X,
yE(X|Y ) (X) ≤ ϕ(X). Consequently, by the linearity and positivity properties of conditional
expectation,
ϕ(E(X|Y )) = yE(X|Y ) (E(X|Y ))
= E(yE(X|Y ) (X)|Y )
≤ E(ϕ(X)|Y ).
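The support-line mechanism in this proof is easy to verify concretely for ϕ(x) = x². The sketch below (our own illustration) checks the two support-line properties, then the resulting inequality ϕ(EX) ≤ Eϕ(X) for a small discrete distribution:

```python
def phi(x):
    """A convex function."""
    return x * x

def support_line(x_star):
    """Support line of phi at x*: y(x) = 2 x* x - x*^2."""
    return lambda x: 2 * x_star * x - x_star * x_star

# Support-line properties: touches the graph at x*, lies below phi everywhere.
for x_star in (-1.5, 0.0, 2.0):
    y = support_line(x_star)
    assert abs(y(x_star) - phi(x_star)) < 1e-12
    assert all(y(x) <= phi(x) for x in (-3.0, -0.5, 0.1, 4.0))

# Jensen by averaging: phi(EX) = E y_EX(X) <= E phi(X).
values, probs = [-2.0, 0.0, 1.0, 3.0], [0.1, 0.4, 0.3, 0.2]
EX = sum(x * p for x, p in zip(values, probs))          # = 0.7
EphiX = sum(phi(x) * p for x, p in zip(values, probs))  # = 2.5
assert phi(EX) <= EphiX
```

Here the gap Eϕ(X) − ϕ(EX) is exactly the variance of X.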
4. Martingales
4.1. Definition of a Martingale. Let X0 , X1 , . . . and Y0 , Y1 , . . . be sequences of random
variables defined on a common probability space, with Xn being real-valued and integrable
(that is, E|Xn | < ∞), and Yn valued in an arbitrary (measurable) space. In common
applications in stochastic process theory, the random variables Yn might be the successive
states in a Markov chain, and the random variables Xn numerical functions of the states Yn .
The sequence X0 , X1 , . . . is said to be adapted to the sequence Y0 , Y1 , . . . if for each n the
random variable Xn is a function2 of Y1 , Y2 , . . . , Yn . The sequence X0 , X1 , . . . is said to be a
martingale relative to the sequence Y0 , Y1 , . . . if it is adapted, and if for every n,
(17) E(Xn+1 | Y0 , Y1 , Y2 , . . . , Yn ) = Xn .
Similarly, it is said to be a supermartingale (respectively, submartingale) if for every n,
(18) E(Xn+1 | Y0 , Y1 , Y2 , . . . , Yn ) ≤ (≥)Xn .
Observe that any martingale is automatically both a submartingale and a supermartingale.
4.2. Martingales and Martingale Difference Sequences. The most basic examples of
martingales are sums of independent, mean zero random variables. Let Y0 , Y1 , . . . be a
sequence of independent, identically distributed random variables such that EYn = 0. Then
the sequence of partial sums
(19) Xn = Σ_{j=1}^n Yj
is a martingale relative to the sequence 0, Y1 , Y2 , . . . . This is easily verified, using the linearity
and stability properties and the Independence Law for conditional expectation:
E(Xn+1 | Y1 , Y2 , . . . , Yn ) = E(Xn + Yn+1 | Y1 , Y2 , . . . , Yn )
= E(Xn | Y1 , Y2 , . . . , Yn ) + E(Yn+1 | Y1 , Y2 , . . . , Yn )
= Xn + EYn+1
= Xn .
The importance of martingales in modern probability stems at least in part from the fact
that most of the essential properties of sums of independent, identically distributed random
variables are inherited (with minor modification) by martingales: As you will learn, there are
versions of the SLLN, the Central Limit Theorem, the Wald identities, and the Chebyshev,
Markov, and Kolmogorov inequalities for martingales.3 To get some appreciation of why this
might be so, consider the decomposition of a martingale {Xn } as a partial sum process:
(20) Xn = X0 + Σ_{j=1}^n ξj , where ξj = Xj − Xj−1 .
Proposition 1. The martingale difference sequence {ξn } has the following properties: (a)
the random variable ξn is a function of Y1 , Y2 , . . . , Yn ; and (b) for every n ≥ 0,
(21) E(ξn+1 | Y1 , Y2 , . . . , Yn ) = 0.
Proof. This is another three-line calculation using the properties of conditional expectation
and the definition of a martingale.
2Here and in what follows, the term function will always mean measurable function. If you do not know
yet what this means, don't worry about it: all reasonable functions are measurable.
3We won’t have a chance to discuss all of these in this class, but you can rest assured that Prof. Wichura
will give a complete tour next year.
Corollary 1. Let {Xn } be a martingale relative to {Yn }, with martingale difference sequence
{ξn }. Then for every n ≥ 0,
(22) EXn = EX0 .
Moreover, if X0 = 0 and EXn^2 < ∞ then
(23) EXn^2 = Σ_{j=1}^n Eξj^2 .
Proof. The first property follows almost trivially from Proposition 1 and the Expectation
Law for conditional expectation, as these together imply that Eξn = 0 for each n. Summing
and using the linearity of ordinary expectation, one obtains (22).
The second property is only slightly more difficult: First, observe that each of the terms
ξj has finite variance, because it is the difference of two random variables with finite second
moments. (That EXj^2 < ∞ follows from the hypothesis that EXn^2 < ∞, together with
the Tower Property.) Consequently, all of the products ξi ξj have finite first moments, by
the Cauchy-Schwarz inequality. Next, if j ≤ k ≤ n then ξj is a function of Y1 , Y2 , . . . , Yj ,
and therefore also a function of Y1 , Y2 , . . . , Yk . Thus, by Properties (1), (4), (6), and (7) of
conditional expectation, if j ≤ k ≤ n then
Eξj ξk+1 = E[E(ξj ξk+1 | Y1 , Y2 , . . . , Yk )]
         = E[ξj E(ξk+1 | Y1 , Y2 , . . . , Yk )]
         = E(ξj · 0) = 0.
The variance of Xn may now be calculated in exactly the same manner as for sums of
independent random variables with mean zero:
EXn^2 = E(Σ_{j=1}^n ξj )^2
      = E Σ_{j=1}^n Σ_{k=1}^n ξj ξk
      = Σ_{j=1}^n Σ_{k=1}^n Eξj ξk
      = Σ_{j=1}^n Eξj^2 + 2 Σ_{j<k} Eξj ξk
      = Σ_{j=1}^n Eξj^2 + 0.
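The orthogonality of the increments that drives this computation can be verified by exact enumeration. The sketch below builds a small martingale with a history-dependent (predictable) step size — the particular step rule is our own illustration — and checks identity (23) exactly over all 2^n coin paths:

```python
from itertools import product

n = 8
E_Xn_sq = 0.0          # will hold E X_n^2
E_xi_sq = [0.0] * n    # will hold E xi_j^2, term by term

for signs in product((-1, 1), repeat=n):   # all 2^n fair-coin paths
    x, xis = 0.0, []
    for s in signs:
        z = 1.0 if x >= 0 else 2.0         # predictable step size (our choice)
        xi = z * s                         # martingale difference
        xis.append(xi)
        x += xi
    p = 2.0 ** (-n)                        # each path equally likely
    E_Xn_sq += p * x * x
    for j, xi in enumerate(xis):
        E_xi_sq[j] += p * xi * xi

# Identity (23): E X_n^2 = sum_j E xi_j^2, exactly (the cross terms cancel).
assert abs(E_Xn_sq - sum(E_xi_sq)) < 1e-9
```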
Here E^x denotes the expectation corresponding to the probability measure P^x under which
P^x {Y0 = x} = 1. Notice the similarity between equation (28) and the equation for the
stationary distribution – one is just the transpose of the other.
Proposition 3. If h is harmonic for the transition probability matrix P then for every
starting state x ∈ Y the sequence h(Yn ) is a martingale under the probability measure P^x .
Proof. This is once again nothing more than a routine calculation. The key is the Markov
property, which allows us to rewrite any conditional expectation on Y0 , Y1 , Y2 , . . . , Yn as a
conditional expectation on Yn . Thus,
E(h(Yn+1 ) | Y0 , Y1 , Y2 , . . . , Yn ) = E(h(Yn+1 ) | Yn )
                                       = Σ_{y∈Y} h(y) p(Yn , y)
                                       = h(Yn ).
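For a concrete instance (our own choice, not from the notes), take simple random walk on {0, 1, . . . , N} absorbed at both endpoints; the identity function h(x) = x is harmonic, so by Proposition 3 the walk itself is a martingale under each P^x. A few lines of code confirm harmonicity directly:

```python
N = 5
# Transition matrix of simple random walk on {0,...,N}, absorbed at 0 and N.
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = P[N][N] = 1.0
for x in range(1, N):
    P[x][x - 1] = P[x][x + 1] = 0.5

h = [float(x) for x in range(N + 1)]   # h(x) = x

# Harmonicity: sum_y p(x, y) h(y) = h(x) at every state x.
for x in range(N + 1):
    Ph_x = sum(P[x][y] * h[y] for y in range(N + 1))
    assert abs(Ph_x - h[x]) < 1e-12
```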
4.3.8. Submartingales from Martingales. Let {Xn }n≥0 be a martingale relative to the se-
quence Y0 , Y1 , . . . . Let ϕ : R → R be a convex function such that Eϕ(Xn ) < ∞ for each
n ≥ 0. Then the sequence {Zn }n≥0 defined by
(29) Zn = ϕ(Xn )
is a submartingale. This is a consequence of the Jensen inequality and the martingale
property of {Xn }n≥0 :
E(Zn+1 | Y0 , Y1 , . . . , Yn ) = E(ϕ(Xn+1 ) | Y0 , Y1 , . . . , Yn )
                               ≥ ϕ(E(Xn+1 | Y0 , Y1 , . . . , Yn ))
                               = ϕ(Xn ) = Zn .
Useful special cases: (a) ϕ(x) = x2 , and (b) ϕ(x) = exp{θx}.
4See his 800-page book Classical Potential Theory and its Probabilistic Counterpart for more on this.
5. Martingale and Submartingale Transforms
According to the Merriam-Webster Collegiate Dictionary, a martingale is
any of several systems of betting in which a player increases the stake usually
by doubling each time a bet is lost.
The use of the term in the theory of probability derives from the connection with fair games
or fair bets; and the importance of the theoretical construct in the world of finance also
derives from the connection with fair bets. Seen in this light, the notion of a martingale
transform, which we are about to introduce, becomes most natural. Informally, a martingale
transform is nothing more than a system of placing bets on a fair game.
5.1. Martingale Transforms. A formal definition of a martingale transform requires two
auxiliary notions: martingale differences and predictable sequences. Let X0 , X1 , . . . be a
martingale relative to another sequence Y0 , Y1 , . . . . For n = 1, 2, . . . , define
(30) ξn = Xn − Xn−1
to be the martingale difference sequence associated with the martingale Xn .
A predictable sequence Z1 , Z2 , . . . relative to the sequence Y0 , Y1 , . . . is a sequence of ran-
dom variables such that for each n = 1, 2, . . . the random variable Zn is a function of
Y0 , Y1 , . . . , Yn−1 . In gambling (and financial) contexts, Zn might represent the size (say, in
dollars) of a bet placed on the nth play of a game, while ξn represents the (random) payoff
of the nth play per dollar bet. The requirement that the sequence Zn be predictable in such
contexts is merely an assertion that the gambler not be clairvoyant.
Definition 2. Let X0 , X1 , . . . be a martingale relative to Y0 , Y1 , . . . and let ξn = Xn − Xn−1
be the associated martingale difference sequence. Let Z0 , Z1 , . . . be a predictable sequence
relative to Y0 , Y1 , . . . . Then the martingale transform {(Z · X)n }n≥0 is defined by
(31) (Z · X)n = Z0 X0 + Σ_{k=1}^n Zk ξk .
Example: The St. Petersburg Game. In this game, a referee tosses a fair coin repeat-
edly, with results ξ1 , ξ2 , . . . , where ξn = +1 if the nth toss is a Head and ξn = −1 if the nth
toss is a Tail. Before each toss, a gambler is allowed to place a wager of size Wn (in roubles)
on the outcome of the next toss. The size of the wager Wn may depend on the observed
tosses ξ1 , ξ2 , . . . , ξn−1 , but not on ξn (or on any of the future tosses); thus, the sequence
{Wn }n≥1 is predictable relative to {ξn }n≥1 . If the nth toss is a Head, the gambler nets +Wn ,
but if the nth toss is a Tail, the gambler loses Wn . Thus, the net winnings Sn after n tosses
is the martingale transform
Sn = (W · X)n = Σ_{k=1}^n Wk ξk ,
where Xn = ξ1 + ξ2 + · · · + ξn .
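Since the wagers Wn are predictable, the transform Sn is again a fair game; in particular ESn = 0. For the classical doubling strategy this can be verified exactly by enumerating all coin-toss sequences (the strategy and function below are our own illustration):

```python
from itertools import product

def doubling_winnings(tosses):
    """Net winnings S_n for the doubling strategy: bet 1 rouble, double the
    stake after each loss, reset to 1 after each win.  tosses: tuple of +/-1."""
    s, w = 0, 1
    for xi in tosses:
        s += w * xi                   # W_k * xi_k
        w = 1 if xi == 1 else 2 * w   # predictable: depends only on past tosses
    return s

# Exact expectation over all 2^n equally likely toss sequences.
n = 10
mean = sum(doubling_winnings(t) for t in product((-1, 1), repeat=n)) / 2 ** n
assert abs(mean) < 1e-12   # E S_n = 0: the transform is still a fair game
```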
The most important fact about martingale transforms is that they are martingales in their
own right, as the next proposition asserts:
Proposition 4. Assume that the predictable sequence {Zn }n≥0 consists of bounded random
variables. Then the martingale transform {(Z · X)n }n≥0 is itself a martingale relative to
{Yn }n≥0 .
Proof. This is a simple exercise in the use of the linearity and stability properties of condi-
tional expectation:
E((Z · X)n+1 | Y1 , Y2 , . . . , Yn ) = (Z · X)n + E(Zn+1 ξn+1 | Y1 , Y2 , . . . , Yn )
= (Z · X)n + Zn+1 E(ξn+1 | Y1 , Y2 , . . . , Yn )
= (Z · X)n ,
the last equation because {ξn }n≥1 is a martingale difference sequence relative to {Yn }n≥0 .
Proposition 5. If the terms Zn of the predictable sequence are nonnegative and bounded,
and if {Xn }n≥0 is a submartingale, then the submartingale transform (Z · X)n is also a
submartingale. Moreover, if, for each n ≥ 0,
(33) 0 ≤ Zn ≤ 1,
then
(34) E(Z · X)n ≤ EXn .
Proof. To show that (Z · X)n is a submartingale, it suffices to verify that the differences
Zk ξk constitute a submartingale difference sequence. Since Zk is a predictable sequence, the
differences Zk ξk are adapted to {Yk }k≥0 , and
E(Zk ξk | Y1 , Y2 , . . . , Yk−1 ) = Zk E(ξk | Y1 , Y2 , . . . , Yk−1 ).
Since ξk is a submartingale difference sequence, E(ξk | Y1 , Y2 , . . . , Yk−1 ) ≥ 0; and therefore,
since 0 ≤ Zk ≤ 1,
0 ≤ E(Zk ξk | Y1 , Y2 , . . . , Yk−1 ) ≤ E(ξk | Y1 , Y2 , . . . , Yk−1 ).
Consequently, Zk ξk is a submartingale difference sequence. Moreover, by taking expectations
in the last inequalities, we have
E(Zk ξk ) ≤ Eξk ,
which implies (34).
There is a similar result for supermartingales:
Proposition 6. If {Xn }n≥0 is a supermartingale, and if the terms Zn of the predictable
sequence are nonnegative and bounded, then {(Z · X)n }n≥0 is a supermartingale; and if
inequality (33) holds for each n ≥ 0 then
(35) E(Z · X)n ≥ EXn .
6. Optional Stopping
The cornerstone of martingale theory is Doob’s Optional Sampling Theorem. This states,
roughly, that “stopping” a martingale at a random time τ does not alter the expected
“payoff”, provided the decision about when to stop is based solely on information available
up to τ . Such random times are called stopping times.5
Theorem 1. (Optional Sampling Theorem) Let {Xn }n∈Z+ be a martingale, submartingale,
or supermartingale relative to a sequence {Yn }n≥0 , and let τ be a stopping time. Then for
any n ∈ N,
EXτ ∧n = EX0 (martingales)
EXτ ∧n ≤ EX0 (supermartingales)
EXτ ∧n ≤ EXn (submartingales)
Proof. The easiest proof is based on the fact that martingale (respectively, submartingale, su-
permartingale) transforms are martingales (respectively, submartingales, supermartingales).
The connection between transforms and the Optional Sampling Theorem is that the sequence
{Xτ ∧n }n≥0 may be represented as a transform of the sequence {Xn }n≥0 :
(36) Xτ ∧n = (Z · X)n
where
(37) Zn = 1 if τ ≥ n, and Zn = 0 if τ < n.
The equation (36) is easy to verify. Note that Z0 = 1, since τ is nonnegative; consequently,
(Z · X)n = X0 + Σ_{j=1}^n Zj (Xj − Xj−1 )
         = X0 + Σ_{j=1}^{τ∧n} (Xj − Xj−1 )
         = Xτ∧n ,
since the last sum telescopes.
5In some of the older literature, they are called Markov times or optional times.
In order that the sequence {(Z · X)n }n≥0 be a martingale transform (respectively, sub- or
super- martingale transform) it must be the case that the sequence {Zn }n≥0 is predictable.
This is where the assumption that τ is a stopping time enters: Since τ is a stopping time,
for each fixed m the event that τ = m depends only on Y0 , Y1 , Y2 , . . . , Ym . Hence, the event
{τ ≥ n} = (∪_{m=0}^{n−1} {τ = m})^c
depends only on Y0 , Y1 , . . . , Yn−1 . But this event is the same as the event that Zn = 1;
this proves that Zn is a function only of Y0 , Y1 , . . . , Yn−1 , and so the sequence {Zn }n≥0 is
predictable.
The first two assertions of the Optional Sampling Theorem now follow easily from Propo-
sitions 4 and 5, in view of the “Conservation of Expectation” properties of martingales and
supermartingales. For instance, if {Xn }n≥0 is a martingale, then since martingale transforms
are themselves martingales, and since expectation is “preserved” for martingales,
EXτ ∧n = E(Z · X)n = E(Z · X)0 = EX0 .
A similar argument establishes the corresponding result for supermartingales. Finally, the
last assertion, regarding the case where {Xn }n≥0 is a submartingale, follows from inequality
(34), since the terms Zn of the predictable sequence are between 0 and 1.
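The representation (36) used in this proof is a purely algebraic, path-by-path identity, so it can be checked directly on arbitrary paths and arbitrary stopping values, with no distributional assumptions (the random path below is just a convenient test case):

```python
import random

rng = random.Random(1)
n = 12
for _ in range(200):
    # An arbitrary path X_0,...,X_n and an arbitrary nonnegative time tau.
    X = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    tau = rng.randrange(0, 2 * n)

    # (Z.X)_n with Z_j = 1 if tau >= j, else 0, as in equations (36)-(37).
    zx = X[0]
    for j in range(1, n + 1):
        if tau >= j:
            zx += X[j] - X[j - 1]

    assert abs(zx - X[min(tau, n)]) < 1e-9   # equals X_{tau ^ n}
```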
7. Maximal Inequalities
The Optional Sampling Theorem has immediate implications concerning the pathwise
behavior of martingales, submartingales, and supermartingales. The most elementary of
these concern the maxima of the sample paths, and so are called maximal inequalities.
Proposition 7. Let {Xn }n≥0 be a sub- or super-martingale relative to {Yn }n≥0 , and for each
n ≥ 0 define
(38) Mn = max_{0≤m≤n} Xm and M∞ = sup_{m≥0} Xm .
Proof. Assume first that {Xn }n≥0 is a submartingale. Without loss of generality, we may
assume that each Xn ≥ 0, because if not we may replace the original submartingale Xn by
the larger submartingale Xn ∨ 0. Define τ to be the smallest n ≥ 0 such that Xn ≥ α, or
+∞ if there is no such n. Then for any nonrandom n ≥ 0, the truncation τ ∧ n is a stopping
time and so, by the Optional Sampling Theorem,
EXτ ∧n ≤ EXn .
But because the random variables Xm are nonnegative, and because Xτ ∧n ≥ α on the event
that τ ≤ n,
Now the random variables Mn are nondecreasing in n, and converge up to M∞ , so for any
ε > 0, the event that M∞ ≥ α is contained in the event that Mn ≥ α − ε for some n. But by
the last displayed inequality and the monotone convergence theorem, the probability of this
is no larger than EX0 /(α − ε). Since ε > 0 may be taken arbitrarily small, inequality (41)
follows.
Example: The St. Petersburg Game, Revisited. In Dostoevsky’s novel The Gambler,
the hero (?) is faced with the task of winning a certain amount of money at the roulette
table, starting with a fixed stake strictly less than the amount he wishes to take home from
the casino. What strategy for allocating his stake will maximize his chance of reaching
his objective? Here we will consider an analogous problem for the somewhat simpler St.
Petersburg game described earlier. Suppose that the gambler starts with 100 roubles, and
that he wishes to maximize his chance of leaving with 200 roubles. There is a very simple
strategy that gives him a .5 probability of reaching his objective: stake all 100 roubles on
the first coin toss, and quit the game after one play. Is there a strategy that will give the
gambler more than a .5 probability of reaching the objective?
The answer is NO, and we may prove this by appealing to the Maximal Inequality (41)
for supermartingales. Let {Wn }n≥0 be any predictable sequence (recall that, for a non-
clairvoyant bettor, the sequence of wagers must be predictable). Then the gambler’s fortune
after n plays equals
Fn = 100 + Σ_{k=1}^n Wk ξk ,
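Simulation bears this out: whatever predictable wagering rule we plug in, the success frequency stays at 1/2 or below. The sketch below tries one concrete rule of our own devising — double after each loss, but never bet more than the current fortune — and its success frequency comes out at about 1/2, no better than staking everything on a single toss:

```python
import random

def session(rng, fortune=100, goal=200, max_tosses=10_000):
    """One session: bet 1 rouble, double after each loss, but never bet more
    than the current fortune; stop on ruin or on reaching the goal."""
    bet = 1
    for _ in range(max_tosses):
        if fortune <= 0 or fortune >= goal:
            break
        w = min(bet, fortune)        # predictable: depends only on the past
        if rng.random() < 0.5:
            fortune += w
            bet = 1
        else:
            fortune -= w
            bet = 2 * w
    return fortune >= goal

rng = random.Random(7)
trials = 4000
p_hat = sum(session(rng) for _ in range(trials)) / trials
# Optional stopping forces the success probability to be at most 1/2.
```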