Gradprob Notes4

Notes 4 : Laws of large numbers
Math 733-734: Theory of Probability Lecturer: Sebastien Roch
References: [Fel71, Sections V.5, VII.7], [Dur10, Sections 2.2-2.4].
1 Easy laws
Let $X_1, X_2, \ldots$ be a sequence of RVs. Throughout we let $S_n = \sum_{k \le n} X_k$.
We begin with a straightforward application of Chebyshev's inequality.
THM 4.1 ($L^2$ weak law of large numbers) Let $X_1, X_2, \ldots$ be uncorrelated RVs, i.e., $E[X_i X_j] = E[X_i]E[X_j]$ for $i \neq j$, with $E[X_i] = \mu$, $|\mu| < +\infty$, and $\mathrm{Var}[X_i] \le C < +\infty$. Then $n^{-1}S_n \to_{L^2} \mu$ and, as a result, $n^{-1}S_n \to_P \mu$.
Proof: Note that
$$\mathrm{Var}[S_n] = E[(S_n - E[S_n])^2] = E\Big[\Big(\sum_i (X_i - E[X_i])\Big)^2\Big] = \sum_{i,j} E[(X_i - E[X_i])(X_j - E[X_j])] = \sum_i \mathrm{Var}[X_i],$$
since, for $i \neq j$,
$$E[(X_i - E[X_i])(X_j - E[X_j])] = E[X_i X_j] - E[X_i]E[X_j] = 0.$$
Hence, since $E[n^{-1}S_n] = \mu$,
$$E[(n^{-1}S_n - \mu)^2] = \mathrm{Var}[n^{-1}S_n] \le n^{-2}(nC) = n^{-1}C \to 0,$$
that is, $n^{-1}S_n \to_{L^2} \mu$, and the convergence in probability follows from Chebyshev's inequality.
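As a concrete check of the Chebyshev step, here is a minimal Python sketch for the special case of IID Bernoulli($p$) variables (independent, hence uncorrelated, with $C = p(1-p)$). It computes $\mathrm{Var}[n^{-1}S_n]$ and the tail $P[|n^{-1}S_n - p| > \varepsilon]$ exactly with `fractions.Fraction` and compares the tail with the bound $C/(n\varepsilon^2)$. The values $p = 1/3$, $\varepsilon = 1/10$ and the helper names are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact pmf of S_n = X_1 + ... + X_n for IID Bernoulli(p), p a Fraction."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def variance_of_mean(n, p):
    """Exact Var[S_n / n]."""
    pmf = binomial_pmf(n, p)
    mean = sum(Fraction(k, n) * w for k, w in enumerate(pmf))
    return sum((Fraction(k, n) - mean) ** 2 * w for k, w in enumerate(pmf))


def exact_tail(n, p, eps):
    """Exact P[|S_n / n - p| > eps]."""
    pmf = binomial_pmf(n, p)
    return sum(w for k, w in enumerate(pmf) if abs(Fraction(k, n) - p) > eps)


def chebyshev_bound(n, c, eps):
    """Var[S_n / n] / eps^2 <= C / (n eps^2), as in the proof of Theorem 4.1."""
    return c / (n * eps**2)


# Illustrative parameters (assumptions of this sketch).
p, eps = Fraction(1, 3), Fraction(1, 10)
for n in (10, 50, 200):
    print(n, float(variance_of_mean(n, p)), float(exact_tail(n, p, eps)),
          float(chebyshev_bound(n, p * (1 - p), eps)))
```

The bound is crude for small $n$ (it can exceed 1), but both the exact tail and the bound decay in $n$, which is all the weak law needs.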
With a stronger assumption, we get an easy strong law.
THM 4.2 (Strong law in $L^4$) If the $X_i$'s are IID with $E[X_i^4] < +\infty$ and $E[X_i] = \mu$, then $n^{-1}S_n \to \mu$ a.s.
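Proof: Assume w.l.o.g. that $\mu = 0$. (Otherwise translate all $X_i$'s by $\mu$.) Then
$$E[S_n^4] = E\Big[\sum_{i,j,k,l} X_i X_j X_k X_l\Big] = nE[X_1^4] + 3n(n-1)\big(E[X_1^2]\big)^2 = O(n^2),$$
where we used that, by independence and the fact that $\mu = 0$, every term in which some index appears exactly once vanishes (e.g., $E[X_i^3 X_j] = E[X_i^3]E[X_j] = 0$ for $i \neq j$), so that only the $n$ terms $X_i^4$ and the $3n(n-1)$ terms $X_i^2 X_j^2$ with $i \neq j$ survive. (Note that $E[X_1^2] \le 1 + E[X_1^4] < +\infty$.) Markov's inequality then implies that, for all $\varepsilon > 0$,
$$P[|S_n| > n\varepsilon] \le \frac{E[S_n^4]}{n^4\varepsilon^4} = O(n^{-2}),$$
which is summable, and (BC1) concludes the proof: a.s., $|S_n| > n\varepsilon$ happens only finitely often, for every rational $\varepsilon > 0$.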
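The fourth-moment identity can be checked by brute force in a special case. The sketch below assumes Rademacher variables ($\pm 1$ with probability $1/2$, so $\mu = 0$ and $E[X_1^2] = E[X_1^4] = 1$), enumerates all $2^n$ sign patterns, and compares $E[S_n^4]$ with $nE[X_1^4] + 3n(n-1)(E[X_1^2])^2 = 3n^2 - 2n$. It also evaluates the resulting Markov bound, which is of order $n^{-2}$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_brute(n):
    """E[S_n^4] for n IID Rademacher variables, by exhaustive enumeration."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2**n)


def fourth_moment_formula(n, m2=1, m4=1):
    """n E[X^4] + 3 n (n - 1) (E[X^2])^2 for IID mean-zero X."""
    return n * m4 + 3 * n * (n - 1) * m2**2


def markov_bound(n, eps):
    """E[S_n^4] / (n^4 eps^4), the bound on P[|S_n| > n eps] fed into (BC1)."""
    return Fraction(fourth_moment_formula(n)) / (n**4 * eps**4)


for n in range(1, 11):
    assert fourth_moment_brute(n) == fourth_moment_formula(n)
print([float(markov_bound(n, Fraction(1, 2))) for n in (10, 100, 1000)])
```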
The law of large numbers has interesting implications, for instance:

EX 4.3 (A high-dimensional cube is almost the boundary of a ball) Let $X_1, X_2, \ldots$ be IID uniform on $(-1, 1)$. Let $Y_i = X_i^2$ and note that $E[Y_i] = 1/3$, $\mathrm{Var}[Y_i] \le E[Y_i^2] \le 1$, and $E[Y_i^4] \le 1 < +\infty$. Then, by Theorems 4.1 and 4.2,
$$\frac{X_1^2 + \cdots + X_n^2}{n} \to \frac{1}{3},$$
both in probability and almost surely. In particular, this implies that, for $\varepsilon > 0$,
$$P\left[(1-\varepsilon)\sqrt{\frac{n}{3}} < \|X^{(n)}\|_2 < (1+\varepsilon)\sqrt{\frac{n}{3}}\right] \to 1,$$
where $X^{(n)} = (X_1, \ldots, X_n)$. I.e., most of the cube (in terms of volume) is close to the boundary of a ball of radius $\sqrt{n/3}$.
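A quick Monte Carlo illustration of Example 4.3, written as a sketch with Python's standard `random` module. The seed, the number of trials and $\varepsilon = 0.05$ are arbitrary choices, and the helper names are hypothetical. It samples points uniformly in $(-1, 1)^n$ and reports the fraction whose Euclidean norm lies in the shell $((1-\varepsilon)\sqrt{n/3}, (1+\varepsilon)\sqrt{n/3})$.

```python
import math
import random


def norm_ratio(n, rng):
    """||X^(n)||_2 / sqrt(n/3) for one point X^(n) uniform on the cube (-1, 1)^n."""
    squared_norm = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(n))
    return math.sqrt(squared_norm / (n / 3.0))


def fraction_in_shell(n, eps, trials=200, seed=0):
    """Fraction of sampled points with (1-eps) sqrt(n/3) < ||X^(n)||_2 < (1+eps) sqrt(n/3)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if abs(norm_ratio(n, rng) - 1.0) < eps)
    return hits / trials


for n in (2, 20, 200, 2000):
    print(n, fraction_in_shell(n, eps=0.05))
```

In low dimension only a small fraction of points falls in the thin shell. As $n$ grows the fraction approaches 1, as the law of large numbers predicts.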
2 Weak laws
In the case of IID sequences we get the following.
THM 4.4 (Weak law of large numbers) Let $(X_n)_n$ be IID. A necessary and sufficient condition for the existence of constants $(\mu_n)_n$ such that
$$\frac{S_n}{n} - \mu_n \to_P 0$$
is
$$nP[|X_1| > n] \to 0.$$
In that case, the choice
$$\mu_n = E[X_1\mathbf{1}_{\{|X_1| \le n\}}]$$
works.
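COR 4.5 ($L^1$ weak law) If $(X_n)_n$ are IID with $E|X_1| < +\infty$, then
$$\frac{S_n}{n} \to_P E[X_1].$$
Proof: From (DOM),
$$nP[|X_1| > n] \le E[|X_1|\mathbf{1}_{\{|X_1| > n\}}] \to 0,$$
and, again by (DOM),
$$\mu_n = E[X_1\mathbf{1}_{\{|X_1| \le n\}}] \to E[X_1].$$
Theorem 4.4 then gives $S_n/n - \mu_n \to_P 0$, hence $S_n/n \to_P E[X_1]$.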
Before proving the theorem, we give an example showing that the condition in
Theorem 4.4 does not imply the existence of a first moment. We need the following
important lemma which follows from Fubini’s theorem. (Exercise.)
LEM 4.6 If $Y \ge 0$ and $p > 0$, then
$$E[Y^p] = \int_0^\infty p\,y^{p-1}P[Y > y]\,dy.$$
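Lemma 4.6 is easy to sanity-check numerically. The following sketch approximates the right-hand side with a plain midpoint rule. The truncation point `upper` and the step count are assumptions that must be chosen so the neglected tail is negligible. It uses two illustrative laws (hypothetical examples, not from the notes): $Y$ exponential with mean 1, for which $E[Y^2] = 2$ and $E[Y^3] = 6$, and $Y$ uniform on $\{1, 2, 3\}$, for which $E[Y] = 2$.

```python
import math


def moment_via_tail(p, tail, upper, steps=100_000):
    """Midpoint-rule approximation of int_0^upper p y^(p-1) P[Y > y] dy."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += p * y ** (p - 1) * tail(y)
    return total * h


def exp_tail(y):
    """P[Y > y] for Y ~ Exponential(1)."""
    return math.exp(-y)


def uniform123_tail(y):
    """P[Y > y] for Y uniform on {1, 2, 3}."""
    return sum(1 for v in (1, 2, 3) if v > y) / 3


print(moment_via_tail(2, exp_tail, 60.0))        # close to E[Y^2] = 2
print(moment_via_tail(3, exp_tail, 60.0))        # close to E[Y^3] = 6
print(moment_via_tail(1, uniform123_tail, 3.0))  # close to E[Y] = 2
```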
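EX 4.7 Let $X \ge e$ be such that, for some $\alpha \ge 0$,
$$P[X > x] = \frac{1}{x(\log x)^\alpha}, \quad \forall x \ge e.$$
(There is a jump at $e$. The choice of $e$ makes it clear that the tail stays under 1.) Then, by Lemma 4.6,
$$E[X^2] = e^2 + \int_e^{+\infty}\frac{2x}{x(\log x)^\alpha}\,dx \ge \int_e^{+\infty}\frac{2}{(\log x)^\alpha}\,dx = +\infty, \quad \forall \alpha \ge 0.$$
(Indeed, the integrand decays more slowly than $1/x$, whose integral diverges.) So the $L^2$ weak law does not apply. On the other hand, with the change of variables $u = \log x$,
$$E[X] = e + \int_e^{+\infty}\frac{1}{x(\log x)^\alpha}\,dx = e + \int_1^{+\infty}\frac{1}{u^\alpha}\,du.$$
This is $+\infty$ if $0 \le \alpha \le 1$. But for $\alpha > 1$,
$$E[X] = e + \left[\frac{u^{-\alpha+1}}{-\alpha+1}\right]_1^{+\infty} = e + \frac{1}{\alpha - 1}.$$
Finally,
$$nP[X > n] = \frac{1}{(\log n)^\alpha} \to 0, \quad \forall \alpha > 0.$$
So, for $0 < \alpha \le 1$, the condition of Theorem 4.4 holds even though $E[X] = +\infty$. (For $\alpha = 0$, however, $nP[X > n] = 1$ and the WLLN does not apply.) Also, we can compute $\mu_n$ in Theorem 4.4. For $\alpha = 1$, apply Lemma 4.6 to $X\mathbf{1}_{\{X \le n\}}$, whose tail is $P[X > y] - P[X > n]$ for $y < n$, and use the change of variables above:
$$\mu_n = E[X\mathbf{1}_{\{X \le n\}}] = e\left(1 - \frac{1}{n\log n}\right) + \int_e^n\left(\frac{1}{x\log x} - \frac{1}{n\log n}\right)dx = e + \log\log n - \frac{1}{\log n} \sim \log\log n.$$
Note, in particular, that $\mu_n$ may not have a limit.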
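The closed form for $\mu_n$ when $\alpha = 1$ can be cross-checked numerically. The sketch below applies Lemma 4.6 to the nonnegative variable $X\mathbf{1}_{\{X \le n\}}$ and integrates on $[e, n]$ after the substitution $y = e^u$. That substitution is a convenience choice of this sketch, since the integrand decays slowly. The sketch also prints $nP[X > n] = 1/\log n$, which shows how slowly the condition of Theorem 4.4 kicks in. Function names are hypothetical.

```python
import math


def tail(x, alpha):
    """P[X > x] for the law of Example 4.7 (X >= e almost surely)."""
    if x < math.e:
        return 1.0
    return 1.0 / (x * math.log(x) ** alpha)


def mu_n_numeric(n, alpha=1.0, steps=100_000):
    """mu_n = E[X 1{X <= n}] = int_0^n (P[X > y] - P[X > n]) dy, by Lemma 4.6."""
    pn = tail(n, alpha)
    total = math.e * (1.0 - pn)  # on [0, e) the integrand is the constant 1 - P[X > n]
    a, b = 1.0, math.log(n)      # y = e^u maps [e, n] to [1, log n], with dy = e^u du
    h = (b - a) / steps
    for i in range(steps):
        y = math.exp(a + (i + 0.5) * h)
        total += (tail(y, alpha) - pn) * y * h
    return total


def mu_n_closed_form(n):
    """e + log log n - 1 / log n, valid for alpha = 1."""
    return math.e + math.log(math.log(n)) - 1.0 / math.log(n)


for n in (10**2, 10**4, 10**8):
    print(n, mu_n_numeric(n), mu_n_closed_form(n), n * tail(n, 1.0))
```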
2.1 Truncation
To prove sufficiency, we use truncation. In particular, we give a weak law for
triangular arrays which does not require a second moment—a result of independent
interest.
THM 4.8 (Weak law for triangular arrays) For each $n$, let $(X_{n,k})_{k \le n}$ be independent. Let $b_n > 0$ with $b_n \to +\infty$ and let $X'_{n,k} = X_{n,k}\mathbf{1}_{\{|X_{n,k}| \le b_n\}}$. Suppose that
1. $\sum_{k=1}^n P[|X_{n,k}| > b_n] \to 0$.
2. $b_n^{-2}\sum_{k=1}^n \mathrm{Var}[X'_{n,k}] \to 0$.
If we let $S_n = \sum_{k=1}^n X_{n,k}$ and $a_n = \sum_{k=1}^n E[X'_{n,k}]$, then
$$\frac{S_n - a_n}{b_n} \to_P 0.$$
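Proof: Let $S'_n = \sum_{k=1}^n X'_{n,k}$. Clearly
$$P\left[\left|\frac{S_n - a_n}{b_n}\right| > \varepsilon\right] \le P[S'_n \neq S_n] + P\left[\left|\frac{S'_n - a_n}{b_n}\right| > \varepsilon\right].$$
For the first term, by a union bound,
$$P[S'_n \neq S_n] \le \sum_{k=1}^n P[|X_{n,k}| > b_n] \to 0.$$
For the second term, we use Chebyshev's inequality (note that $a_n = E[S'_n]$ and that the $X'_{n,k}$ are independent):
$$P\left[\left|\frac{S'_n - a_n}{b_n}\right| > \varepsilon\right] \le \frac{\mathrm{Var}[S'_n]}{\varepsilon^2 b_n^2} = \frac{1}{\varepsilon^2 b_n^2}\sum_{k=1}^n \mathrm{Var}[X'_{n,k}] \to 0.$$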
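Proof: (of sufficiency in Theorem 4.4) We apply Theorem 4.8 with $X_{n,k} = X_k$ and $b_n = n$. Note that $a_n = n\mu_n$ and that the first condition of Theorem 4.8 reads $nP[|X_1| > n] \to 0$, which is our assumption. Moreover, by Lemma 4.6,
$$n^{-1}\mathrm{Var}[X'_{n,1}] \le n^{-1}E[(X'_{n,1})^2] = n^{-1}\int_0^\infty 2y\,P[|X'_{n,1}| > y]\,dy = n^{-1}\int_0^n 2y\big(P[|X_1| > y] - P[|X_1| > n]\big)\,dy \le \frac{2}{n}\int_0^n y\,P[|X_1| > y]\,dy \to 0,$$
since we are “averaging” the function $y\,P[|X_1| > y]$, which goes to 0 as $y \to +\infty$. Details in [Dur10]. Hence $(S_n - n\mu_n)/n \to_P 0$.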

The other direction is proved in the appendix.
3 Strong laws
Recall:
DEF 4.9 (Tail σ-algebra) Let $X_1, X_2, \ldots$ be RVs on $(\Omega, \mathcal{F}, P)$. Define
$$T_n = \sigma(X_{n+1}, X_{n+2}, \ldots), \qquad T = \bigcap_{n \ge 1} T_n.$$
By a previous lemma, $T$ is a σ-algebra. It is called the tail σ-algebra of the sequence $(X_n)_n$.
THM 4.10 (Kolmogorov’s 0-1 law) Let (Xn )n be a sequence of independent RVs
with tail σ-algebra T . Then T is P-trivial, i.e., for all A ∈ T we have P[A] = 0
or 1. In particular, if Z ∈ mT then there is z ∈ [−∞, +∞] such that
P[Z = z] = 1.
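EX 4.11 Let $X_1, X_2, \ldots$ be independent. Then
$$\limsup_n n^{-1}S_n \quad\text{and}\quad \liminf_n n^{-1}S_n$$
are almost surely constant (possibly $\pm\infty$): they do not change if finitely many of the $X_k$'s are modified, hence they are $T$-measurable, and Theorem 4.10 applies.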

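3.1 Strong law of large numbers
THM 4.12 (Strong law of large numbers) Let $X_1, X_2, \ldots$ be pairwise independent and identically distributed with $E|X_1| < +\infty$. Let $S_n = \sum_{k \le n} X_k$ and $\mu = E[X_1]$. Then
$$\frac{S_n}{n} \to \mu, \quad \text{a.s.}$$
If instead the $X_i$'s are IID with $E|X_1| = +\infty$, then
$$P\left[\lim_n \frac{S_n}{n} \text{ exists} \in (-\infty, +\infty)\right] = 0.$$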
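Proof: For the converse, assume $E|X_1| = +\infty$. From Lemma 4.6,
$$+\infty = E|X_1| \le \sum_{n=0}^{+\infty} P[|X_1| > n].$$
By (BC2),
$$P[|X_n| > n \text{ i.o.}] = 1.$$
Because
$$\frac{S_n}{n} - \frac{S_{n+1}}{n+1} = \frac{(n+1)S_n - nS_{n+1}}{n(n+1)} = \frac{S_n - nX_{n+1}}{n(n+1)} = \frac{S_n}{n(n+1)} - \frac{X_{n+1}}{n+1},$$
we get that
$$\left\{\lim_n n^{-1}S_n \text{ exists} \in (-\infty, +\infty)\right\} \cap \{|X_n| > n \text{ i.o.}\} = \emptyset$$
because, on that event, $S_n/(n(n+1)) \to 0$ so that
$$\left|\frac{S_n}{n} - \frac{S_{n+1}}{n+1}\right| > \frac{2}{3} \quad \text{i.o.},$$
a contradiction, since the left-hand side must tend to 0 when the limit exists. The result follows because $P[|X_n| > n \text{ i.o.}] = 1$.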
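The algebraic identity behind this argument can be double-checked with exact rational arithmetic. The sketch below uses arbitrary integer values for the $X_k$ (a hypothetical input drawn with a fixed seed) and verifies, for every $n$, that $S_n/n - S_{n+1}/(n+1) = S_n/(n(n+1)) - X_{n+1}/(n+1)$.

```python
import random
from fractions import Fraction


def identity_holds(xs):
    """Check S_n/n - S_{n+1}/(n+1) == S_n/(n(n+1)) - X_{n+1}/(n+1) for 1 <= n < len(xs)."""
    s = 0  # running partial sum S_n; xs[0] plays the role of X_1
    for n in range(1, len(xs)):
        s += xs[n - 1]          # S_n = X_1 + ... + X_n
        s_next = s + xs[n]      # S_{n+1}
        lhs = Fraction(s, n) - Fraction(s_next, n + 1)
        rhs = Fraction(s, n * (n + 1)) - Fraction(xs[n], n + 1)
        if lhs != rhs:
            return False
    return True


rng = random.Random(0)
sample = [rng.randint(-10**6, 10**6) for _ in range(500)]
print(identity_holds(sample))
```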

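There are several steps in the proof of the $\Rightarrow$ direction:
1. Truncation. Let $Y_k = X_k\mathbf{1}_{\{|X_k| \le k\}}$ and $T_n = \sum_{k \le n} Y_k$. (Note that the $Y_i$'s are not identically distributed.) Since (by Lemma 4.6, comparing the integral with a sum)
$$\sum_{k=1}^{+\infty} P[|X_k| > k] \le E|X_1| < +\infty,$$
(BC1) implies that a.s. $X_k = Y_k$ for all but finitely many $k$, so that $n^{-1}(S_n - T_n) \to 0$ a.s. and it suffices to prove $n^{-1}T_n \to \mu$.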

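2. Subsequence. For $\alpha > 1$, let $k(n) = \lfloor \alpha^n \rfloor$. By Chebyshev's inequality, for $\varepsilon > 0$,
$$\begin{aligned}
\sum_{n=1}^{+\infty} P\big[|T_{k(n)} - E[T_{k(n)}]| > \varepsilon k(n)\big]
&\le \frac{1}{\varepsilon^2}\sum_{n=1}^{+\infty}\frac{\mathrm{Var}[T_{k(n)}]}{k(n)^2}\\
&= \frac{1}{\varepsilon^2}\sum_{n=1}^{+\infty}\frac{1}{k(n)^2}\sum_{i=1}^{k(n)}\mathrm{Var}[Y_i]\\
&= \frac{1}{\varepsilon^2}\sum_{i=1}^{+\infty}\mathrm{Var}[Y_i]\sum_{n:\,k(n) \ge i}\frac{1}{k(n)^2}\\
&\le \frac{1}{\varepsilon^2}\sum_{i=1}^{+\infty}\mathrm{Var}[Y_i]\,(C i^{-2})\\
&< +\infty,
\end{aligned}$$
where the first equality uses that the $Y_i$'s are pairwise independent (hence uncorrelated), the next-to-last line follows from the sum of a geometric series (since $\lfloor y \rfloor \ge y/2$ for $y \ge 1$, one can take $C = 4/(1 - \alpha^{-2})$), and the last line follows from the next lemma, proved later:
LEM 4.13 There is a constant $C' < +\infty$ such that
$$\sum_{i=1}^{+\infty}\frac{\mathrm{Var}[Y_i]}{i^2} \le C' E|X_1| < +\infty.$$
By (DOM), $E[Y_k] \to \mu$, so that $E[T_{k(n)}]/k(n) \to \mu$. By (BC1), since $\varepsilon$ is arbitrary, we then have
$$\frac{T_{k(n)}}{k(n)} \to \mu, \quad \text{a.s.}$$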
3. Sandwiching. To use a sandwiching argument, we need a monotone sequence. Note that the assumption of the theorem applies to both $X_1^+$ and $X_1^-$ and the result is linear, so that we can assume w.l.o.g. that $X_1 \ge 0$. Then $T_m$ is nondecreasing in $m$ and, for $k(n) \le m < k(n+1)$,
$$\frac{T_{k(n)}}{k(n+1)} \le \frac{T_m}{m} \le \frac{T_{k(n+1)}}{k(n)},$$
and using $k(n+1)/k(n) \to \alpha$ we get
$$\frac{1}{\alpha}E[X_1] \le \liminf_m \frac{T_m}{m} \le \limsup_m \frac{T_m}{m} \le \alpha E[X_1].$$
Since $\alpha > 1$ is arbitrary, we are done. But it remains to prove the lemma:
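Proof: Since $P[|Y_i| > y] \le P[|X_1| > y]$ for all $y$, and $P[|Y_i| > y] = 0$ for $y \ge i$, Lemma 4.6 and Fubini's theorem (all terms are nonnegative) give
$$\begin{aligned}
\sum_{i=1}^{+\infty}\frac{\mathrm{Var}[Y_i]}{i^2} &\le \sum_{i=1}^{+\infty}\frac{E[Y_i^2]}{i^2}
= \sum_{i=1}^{+\infty}\frac{1}{i^2}\int_0^\infty 2y\,P[|Y_i| > y]\,dy\\
&\le \sum_{i=1}^{+\infty}\frac{1}{i^2}\int_0^\infty \mathbf{1}_{\{y \le i\}}\,2y\,P[|X_1| > y]\,dy\\
&= \int_0^\infty\left(2y\sum_{i=1}^{+\infty}\frac{1}{i^2}\mathbf{1}_{\{y \le i\}}\right)P[|X_1| > y]\,dy\\
&\le C'\int_0^\infty P[|X_1| > y]\,dy\\
&= C' E|X_1|,
\end{aligned}$$
where the second-to-last inequality follows by integrating: comparing the sum with an integral shows that $2y\sum_{i \ge y} i^{-2}$ is bounded uniformly in $y > 0$ (for instance by $C' = 4$).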
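The two summation bounds used in the proof can be checked numerically. The sketch below verifies that $i^2\sum_{n:\,k(n) \ge i} k(n)^{-2}$ stays below $4/(1 - \alpha^{-2})$ for $k(n) = \lfloor \alpha^n \rfloor$, and that $2y\sum_{i \ge y} i^{-2}$ stays below 4, one admissible choice of $C'$ in Lemma 4.13. The ranges, the grid of $y$ values and the truncation at `n_max` are assumptions of this sketch, not part of the proof, and the helper names are hypothetical.

```python
import math


def subsequence_sum(i, alpha, n_max=300):
    """Sum of 1/k(n)^2 over 1 <= n <= n_max with k(n) = floor(alpha**n) >= i."""
    total = 0.0
    for n in range(1, n_max + 1):
        k = math.floor(alpha**n)
        if k >= i:
            total += 1.0 / k**2
    return total


def worst_geometric_ratio(alpha, i_max=500):
    """max over 1 <= i <= i_max of i^2 * subsequence_sum(i, alpha)."""
    return max(i**2 * subsequence_sum(i, alpha) for i in range(1, i_max + 1))


def worst_lemma_ratio(y_max=200, step=0.01):
    """max over a grid of y in (0, y_max] of 2y * sum_{i >= max(1, y)} 1/i^2."""
    m_max = y_max + 2
    prefix = [0.0] * (m_max + 1)  # prefix[m] = sum_{1 <= i < m} 1/i^2
    for m in range(2, m_max + 1):
        prefix[m] = prefix[m - 1] + 1.0 / (m - 1) ** 2
    zeta2 = math.pi**2 / 6
    worst = 0.0
    for j in range(1, round(y_max / step) + 1):
        y = j * step
        m = max(1, math.ceil(y))  # smallest integer i with i >= y
        worst = max(worst, 2 * y * (zeta2 - prefix[m]))
    return worst


for alpha in (1.1, 1.5, 2.0):
    print(alpha, worst_geometric_ratio(alpha), 4 / (1 - alpha**-2))
print(worst_lemma_ratio())  # about pi^2 / 3, attained at y = 1
```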

In the infinite case:
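THM 4.14 (SLLN: Infinite mean case) Let $X_1, X_2, \ldots$ be IID with $E[X_1^+] = +\infty$ and $E[X_1^-] < +\infty$. Then
$$\frac{S_n}{n} \to +\infty, \quad \text{a.s.}$$
Proof: Let $M > 0$ and $X_i^M = X_i \wedge M$. Since $E|X_i^M| < +\infty$, the SLLN applies to $S_n^M = \sum_{i \le n} X_i^M$. Then, since $X_i \ge X_i^M$,
$$\liminf_n \frac{S_n}{n} \ge \liminf_n \frac{S_n^M}{n} = E[X_1^M] \uparrow +\infty,$$
as $M \to +\infty$ by (MON) applied to the positive part.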
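A Monte Carlo sketch of Theorems 4.12 and 4.14 using only Python's standard `random` module (seeds, checkpoints and distributions are arbitrary illustrative choices). The running mean of Exponential(1) variables settles near 1. For $X = 1/U$ with $U$ uniform on $(0, 1]$, so that $P[X > x] = 1/x$ for $x \ge 1$ and $E[X^+] = +\infty$, the running mean keeps growing, roughly like $\log n$ with occasional large jumps.

```python
import random


def running_means(sampler, checkpoints, seed=0):
    """Return S_n / n at each checkpoint n (checkpoints sorted increasingly)."""
    rng = random.Random(seed)
    total, n, out = 0.0, 0, []
    for target in checkpoints:
        while n < target:
            total += sampler(rng)
            n += 1
        out.append(total / n)
    return out


def exp1(rng):
    """Exponential(1): E[X_1] = 1, so Theorem 4.12 applies."""
    return rng.expovariate(1.0)


def pareto1(rng):
    """X = 1/U, U uniform on (0, 1]: P[X > x] = 1/x for x >= 1, E[X_1^+] = +inf."""
    return 1.0 / (1.0 - rng.random())


checkpoints = [10**k for k in range(1, 6)]
print(running_means(exp1, checkpoints))
print(running_means(pareto1, checkpoints))
```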
3.2 Applications
An important application of the SLLN:
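THM 4.15 (Glivenko-Cantelli) Let $(X_n)_n$ be IID and, for $x \in \mathbb{R}$, let
$$F_n(x) = \frac{1}{n}\sum_{k \le n}\mathbf{1}\{X_k \le x\}$$
be the empirical distribution function. Then
$$\sup_{x \in \mathbb{R}}|F_n(x) - F(x)| \to 0, \quad \text{a.s.},$$
where $F$ is the distribution function of $X_1$.
Proof: Pointwise convergence follows immediately from the SLLN applied to the IID indicators $\mathbf{1}\{X_k \le x\}$ (and, for left limits, $\mathbf{1}\{X_k < x\}$). Uniform convergence then follows from the boundedness and monotonicity of $F$ and $F_n$. See [Dur10] for details.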
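For a continuous $F$, the supremum in Theorem 4.15 is attained at the jump points of $F_n$, which gives an exact formula from the sorted sample. The sketch below assumes that $F$ is the Uniform(0, 1) distribution function, a hypothetical choice for illustration. It computes $\sup_x |F_n(x) - F(x)|$ for simulated samples of increasing size.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - x| for a sample in [0, 1] with distinct values.

    F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic x, so only
    the one-sided gaps (i+1)/n - x and x - i/n need to be checked.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(1)
for n in (10, 100, 1000, 10000):
    print(n, sup_distance_uniform([rng.random() for _ in range(n)]))
```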
References
[Dur10] Rick Durrett. Probability: theory and examples. Cambridge Series in
Statistical and Probabilistic Mathematics. Cambridge University Press,
Cambridge, 2010.
[Fel71] William Feller. An introduction to probability theory and its applications. Vol. II. Second edition. John Wiley & Sons Inc., New York, 1971.
A Symmetrization
To prove the other direction of the weak law, we use symmetrization.
DEF 4.16 Let X ∼ F . We say that X is symmetric if X and −X have the same
distribution function, that is, if at points of continuity we have F (x) = 1 − F (−x)
for all x.
EX 4.17 (Symmetrization) Let $X_1$ be a RV (not necessarily symmetric) and $\tilde X_1$ an independent copy. Then $X_1^\circ = X_1 - \tilde X_1$ is symmetric.
LEM 4.18 For all $t > 0$,
$$P[|X_1^\circ| > t] \le 2P[|X_1| > t/2]. \qquad (1)$$
If $m$ is a median for $X_1$, i.e.,
$$P[X_1 \le m] \ge \frac{1}{2}, \qquad P[X_1 \ge m] \ge \frac{1}{2},$$
and we assume w.l.o.g. that $m \ge 0$, then
$$P[|X_1^\circ| > t] \ge \frac{1}{2}P[|X_1| > t + m]. \qquad (2)$$
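Proof: For the first one, if $|X_1^\circ| > t$ then at least one of $|X_1| > t/2$ or $|\tilde X_1| > t/2$ must be satisfied, and a union bound gives (1). For the second one, the event
$$\{X_1 > t + m,\ \tilde X_1 \le m\} \cup \{X_1 < -t - m,\ \tilde X_1 \ge -m\}$$
implies $|X_1^\circ| > t$, and, by independence, its probability is at least $\frac{1}{2}P[|X_1| > t + m]$ because $P[\tilde X_1 \le m] \ge 1/2$ and
$$P[\tilde X_1 \ge -m] \ge P[\tilde X_1 \ge m] \ge 1/2.$$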
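Both inequalities of Lemma 4.18 can be checked exactly on a small discrete example. The sketch assumes that $X_1$ is uniform on $\{0, 1, \ldots, 9\}$, a hypothetical choice. Then $m = 4$ is a median, since $P[X_1 \le 4] = 1/2$ and $P[X_1 \ge 4] = 3/5$. The law of $X_1^\circ = X_1 - \tilde X_1$ is computed by enumerating all pairs.

```python
from fractions import Fraction
from itertools import product

VALUES = range(10)       # hypothetical example: X_1 uniform on {0, ..., 9}
PROB = Fraction(1, 10)   # probability of each value
MEDIAN = 4               # P[X_1 <= 4] = 1/2 and P[X_1 >= 4] = 3/5


def p_abs_x_gt(t):
    """P[|X_1| > t]."""
    return sum((PROB for x in VALUES if abs(x) > t), Fraction(0))


def p_abs_sym_gt(t):
    """P[|X_1 - X~_1| > t], with X~_1 an independent copy of X_1."""
    return sum((PROB * PROB for x, y in product(VALUES, VALUES) if abs(x - y) > t),
               Fraction(0))


def check_lemma_4_18(ts):
    for t in ts:
        assert p_abs_sym_gt(t) <= 2 * p_abs_x_gt(t / 2)       # inequality (1)
        assert p_abs_sym_gt(t) >= p_abs_x_gt(t + MEDIAN) / 2  # inequality (2)
    return True


print(check_lemma_4_18([Fraction(k, 2) for k in range(1, 41)]))
```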
LEM 4.19 Let $\{Y_k\}_{k \le n}$ be independent and symmetric with $S_n = \sum_{k=1}^n Y_k$, and let $M_n$ be the first term among $\{Y_k\}_{k \le n}$ with greatest absolute value. Then, for all $t > 0$,
$$P[|S_n| \ge t] \ge \frac{1}{2}P[|M_n| \ge t]. \qquad (3)$$
Moreover, if the $Y_k$'s have a common distribution $F$, then
$$P[|S_n| \ge t] \ge \frac{1}{2}\big(1 - \exp(-n[1 - F(t) + F(-t)])\big). \qquad (4)$$
Proof: We start with the second one. Note that, since $P[|Y_1| < t] \le F(t) - F(-t)$ and $1 - x \le e^{-x}$,
$$P[|M_n| < t] \le (F(t) - F(-t))^n \le \exp(-n[1 - F(t) + F(-t)]).$$
Plug the latter into the first statement.
For the first one, note that by symmetry we can drop the absolute values: for $t > 0$, $P[|S_n| \ge t] = 2P[S_n \ge t]$ and $P[|M_n| \ge t] = 2P[M_n \ge t]$. Then
$$P[S_n \ge t] = P[M_n + (S_n - M_n) \ge t] \ge P[M_n \ge t,\ S_n - M_n \ge 0]. \qquad (5)$$
By symmetry, the four combinations $(\pm M_n, \pm(S_n - M_n))$ have the same distribution. Indeed, $M_n$ and $S_n - M_n$ need not be independent, but flipping the sign of the selected term, or of all the other terms, does not change their joint law, because $M_n$ is selected through absolute values only and $S_n - M_n$ is the sum of the other variables. Hence,
$$P[M_n \ge t] \le P[M_n \ge t,\ S_n - M_n \ge 0] + P[M_n \ge t,\ S_n - M_n \le 0],$$
and the two terms on the RHS are equal. Plugging this back into (5), we are done.
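Inequalities (3) and (4) can likewise be checked by exhaustive enumeration. The sketch assumes $n = 4$ IID variables uniform on $\{-2, -1, 1, 2\}$, hypothetical values chosen to give a symmetric law. It takes $M_n$ to be the first term of largest absolute value, as in the statement, and compares both sides exactly, using floating point only for the exponential in (4).

```python
import math
from fractions import Fraction
from itertools import product

SUPPORT = (-2, -1, 1, 2)  # hypothetical symmetric law, each value with probability 1/4
N = 4                     # number of summands (illustrative)


def first_max_abs(ys):
    """First term among ys with the greatest absolute value (M_n in Lemma 4.19)."""
    best = ys[0]
    for y in ys[1:]:
        if abs(y) > abs(best):  # strict: ties keep the earlier term
            best = y
    return best


def cdf(t):
    """F(t) = P[Y <= t] for the law above."""
    return Fraction(sum(1 for v in SUPPORT if v <= t), len(SUPPORT))


def check_lemma_4_19(ts):
    outcomes = list(product(SUPPORT, repeat=N))
    weight = Fraction(1, len(outcomes))
    for t in ts:
        p_s = sum((weight for ys in outcomes if abs(sum(ys)) >= t), Fraction(0))
        p_m = sum((weight for ys in outcomes if abs(first_max_abs(ys)) >= t), Fraction(0))
        assert p_s >= p_m / 2                                           # (3)
        rhs4 = 0.5 * (1 - math.exp(-N * float(1 - cdf(t) + cdf(-t))))
        assert float(p_s) >= rhs4 - 1e-12                               # (4)
    return True


print(check_lemma_4_19([Fraction(k, 2) for k in range(1, 17)]))
```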
Going back to the proof of necessity:

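Proof: (of necessity in Theorem 4.4) Assume that there is $\mu_n$ such that, for all $\varepsilon > 0$,
$$P[|S_n - n\mu_n| \ge \varepsilon n] \to 0.$$
Note that
$$S_n^\circ = (S_n - n\mu_n)^\circ = \sum_{k \le n} X_k^\circ.$$
Therefore, by (1) (whose proof works equally well with non-strict inequalities), then (4) applied to the symmetric variables $X_k^\circ$ (note that $1 - F^\circ(t) + F^\circ(-t) \ge P[|X_1^\circ| > t]$ by symmetry), then (2) with a median $m$ of $X_1$, assuming w.l.o.g. $m \ge 0$,
$$\begin{aligned}
P[|S_n - n\mu_n| \ge \varepsilon n] &\ge \frac{1}{2}P[|S_n^\circ| \ge 2\varepsilon n]\\
&\ge \frac{1}{4}\big(1 - \exp(-nP[|X_1^\circ| > 2n\varepsilon])\big)\\
&\ge \frac{1}{4}\left(1 - \exp\left(-\frac{1}{2}nP[|X_1| > 2n\varepsilon + m]\right)\right)\\
&\ge \frac{1}{4}\left(1 - \exp\left(-\frac{1}{2}nP[|X_1| \ge n]\right)\right),
\end{aligned}$$
for $\varepsilon$ small enough and $n$ large enough (so that $2n\varepsilon + m < n$). Since the LHS goes to 0, so does $nP[|X_1| \ge n]$, and we are done.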
B St-Petersburg paradox
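EX 4.20 (St-Petersburg paradox) Consider an IID sequence with
$$P[X_1 = 2^j] = 2^{-j}, \quad \forall j \ge 1.$$
Clearly $E[X_1] = +\infty$. Note that
$$P[|X_1| \ge n] = \Theta\left(\frac{1}{n}\right)$$
(indeed it is a geometric series and the sum is dominated by the first term), so that $nP[|X_1| > n]$ does not go to 0 and we cannot apply the WLLN (Theorem 4.4). Instead we apply the weak law for triangular arrays (Theorem 4.8) to a properly normalized sum. We take $X_{n,k} = X_k$ and $b_n = n\log_2 n$, ignoring rounding to integers below. We check the two conditions. First,
$$\sum_{k=1}^n P[|X_{n,k}| > b_n] = \Theta\left(\frac{n}{n\log_2 n}\right) \to 0.$$
To check the second one, let $X'_{n,k} = X_{n,k}\mathbf{1}_{\{|X_{n,k}| \le b_n\}}$ and note that
$$E[(X'_{n,k})^2] = \sum_{j=1}^{\log_2 n + \log_2\log_2 n} 2^{2j}2^{-j} \le 2\cdot 2^{\log_2 n + \log_2\log_2 n} = 2n\log_2 n.$$
So, since $\mathrm{Var}[X'_{n,k}] \le E[(X'_{n,k})^2]$,
$$\frac{1}{b_n^2}\sum_{k=1}^n E[(X'_{n,k})^2] \le \frac{2n^2\log_2 n}{n^2(\log_2 n)^2} \to 0.$$
Finally,
$$a_n = \sum_{k=1}^n E[X'_{n,k}] = nE[X'_{n,1}] = n\sum_{j=1}^{\log_2 n + \log_2\log_2 n} 2^j 2^{-j} = n(\log_2 n + \log_2\log_2 n),$$
so that
$$\frac{S_n - a_n}{b_n} \to_P 0,$$
and, since $a_n/b_n \to 1$,
$$\frac{S_n}{n\log_2 n} \to_P 1.$$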
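The quantities in Example 4.20 can be computed exactly when $n = 2^L$ with $L$ a power of two, since then $b_n = n\log_2 n = 2^K$ with $K = L + \log_2 L$ an integer and no rounding is needed. The sketch below uses the standard library only. $L$, the sample size, the number of trials and the seed are arbitrary choices, and the function names are hypothetical. It evaluates the two conditions of Theorem 4.8 and $a_n/b_n$ exactly, then simulates $S_n/(n\log_2 n)$ using $J = 1 + \lfloor -\log_2 U \rfloor$ with $U$ uniform on $(0, 1]$, which has $P[J = j] = 2^{-j}$. Because convergence is only in probability and the fluctuations are heavy-tailed, it reports the median over independent trials.

```python
import math
import random
import statistics
from fractions import Fraction


def triangular_quantities(L):
    """Exact (n P[X > b_n], b_n^-2 n E[(X')^2], a_n / b_n) for n = 2**L, L a power of 2."""
    n = 2**L
    K = L + (L.bit_length() - 1)      # log2(b_n) = log2(n) + log2(log2(n))
    b_n = 2**K
    cond1 = Fraction(n, 2**K)         # n * P[X > 2^K] = n * sum_{j > K} 2^-j
    second_moment = 2 ** (K + 1) - 2  # E[(X')^2] = sum_{j <= K} 2^(2j) 2^-j
    cond2 = Fraction(n * second_moment, b_n**2)
    a_over_b = Fraction(n * K, b_n)   # a_n = n * sum_{j <= K} 2^j 2^-j = n K
    return cond1, cond2, a_over_b


def st_petersburg(rng):
    """One draw with P[X = 2^j] = 2^-j, j >= 1."""
    u = 1.0 - rng.random()            # u in (0, 1]
    return 2 ** (1 + math.floor(-math.log2(u)))


def median_ratio(L, trials=41, seed=0):
    """Median over independent trials of S_n / (n log2 n) for n = 2**L."""
    rng = random.Random(seed)
    n = 2**L
    ratios = [sum(st_petersburg(rng) for _ in range(n)) / (n * L) for _ in range(trials)]
    return statistics.median(ratios)


for L in (4, 8, 16, 32):
    print(L, [float(q) for q in triangular_quantities(L)])
print(median_ratio(12))
```

The exact values show the first condition equal to $1/\log_2 n$, the second just below $2/\log_2 n$, and $a_n/b_n = 1 + \log_2\log_2 n/\log_2 n$, all consistent with the slow rates in the example.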
THM 4.21 Let $(X_n)_n$ be IID with $E|X_1| = +\infty$ and $S_n = \sum_{k \le n} X_k$. Let $(a_n)_n$ be a sequence with $a_n/n$ increasing. Then $\limsup_n |S_n|/a_n = 0$ or $+\infty$ according as $\sum_n P[|X_1| \ge a_n] < +\infty$ or $= +\infty$.
The proof uses random series and is presented in [Dur10].
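EX 4.22 (Continued) Note that
$$P[|X_1| \ge n\log_2 n] = \Omega\left(\frac{1}{n\log_2 n}\right),$$
which is not summable. Therefore, by the previous theorem (with $a_n = n\log_2 n$, and since $S_n \ge 0$),
$$\limsup_n \frac{S_n}{n\log_2 n} = +\infty, \quad \text{a.s.}$$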
