
Studying Markov Chains and the Monte Carlo Method

Math 157 Final Paper
Denis Turcu
May 4th, 2016

Abstract
This paper provides a brief introduction to Markov chains. It also includes a
discussion of the Monte Carlo method for solving problems that allow
probabilistic interpretations. The first part introduces a few simple,
conceptual problems to provide intuition about these approaches before
going into theoretical aspects. This basic formulation of Markov chains is
followed by a section on discrete-time Markov chains with a more rigorous
formulation, which can be applied to a wide range of problems across many
fields of study. We then briefly cover continuous-time Markov chains for a
more comprehensive study. The last part introduces the Monte Carlo method
for numerical estimation where analytical solutions are nearly impossible to
compute or do not exist by the nature of the problem.

1 Introduction

Let's start with a brief aside to first see how powerful the Markov chain
method is. We assume familiarity with the law of large numbers. Assume we
have two boxes, A_w and A_b, which contain 200 white pebbles and 300 black
pebbles respectively. Then we know that, mixing these two boxes together
and picking a random pebble a large number of times, with replacement, the
ratio of white pebbles encountered to black pebbles encountered converges to:
\[
\lim_{n\to\infty}\frac{\#\text{ white pebbles encountered in } n \text{ trials}}{\#\text{ black pebbles encountered in } n \text{ trials}} = \frac{2}{3}.
\]

For the above to be true, we require independence of the trials.
While this example can be extended to more complicated ones, it is important to understand that there exists an underlying probability distribution
describing this situation. Namely, after n samples, we expect to have seen
[2/5, 3/5] n white and black pebbles respectively, where [2/5, 3/5] represents the
probability distribution for the two observables we have.
Now imagine we have the same two boxes as before, but instead of mixing
them into one box, we create two other boxes as follows. We will have
box W, containing 150 white pebbles and 100 black pebbles, and box B,
containing the rest of the pebbles, 50 white and 200 black. We start picking
pebbles one at a time from one of these two boxes, and assume we start
from box W with probability p. However, we move between box W and box B
depending on the last observation, as follows: if the last picked pebble is
white, we pick the next one from box W; if it is black, we pick from box B.
Therefore, we can draw the following diagram to understand the rules better:
[Diagram: three states, 0 (start), W and B. From 0 we move to W with probability p and to B with probability 1 - p; from W we stay in W with probability 0.6 and move to B with probability 0.4; from B we move to W with probability 0.2 and stay in B with probability 0.8.]


If we are at some discrete time t in state B, we have probability 200/(50+200) = 0.8
of extracting a black pebble; therefore we remain in this state
with probability 0.8 based on the definition of the problem. It should be
quite simple to see how we can compute the probability of being in any of
the possible new states from a given initial state. The interesting question
arises when we want to find closed formulas for any number of times we
picked pebbles out of the boxes. We can use simple linear algebra notation
for this problem. We have a row vector with 3 components, in which the ith
component represents the probability of being in the ith state at the given time. Let the
first component represent state 0, the second component state W and the third component state B. Assuming that we start from state 0, we write the initial
probability distribution for the states as (1, 0, 0). At the next step, it becomes
(0, p, 1 - p). Note that the sum of the entries of these vectors must
be 1 because at any step we must be in one of the states of the problem. We
can define a transition matrix that helps us easily compute the probability
distribution from one step to another by:
\[
T \;=\;
\begin{array}{c|ccc}
   & 0 & W & B \\ \hline
 0 & 0 & p & 1-p \\
 W & 0 & 0.6 & 0.4 \\
 B & 0 & 0.2 & 0.8
\end{array}
\]

where the entry T_ij represents the probability of transitioning from state i
to state j. In particular, regardless of our initial state, we cannot return
to the first state, 0, because the first column contains only 0s. Note that each row of this
matrix sums to 1, because regardless of which state is our initial state, we
must transition to one of the existing states, and the probability of the entire
space is 1. Now, if we let v_0 denote the initial distribution, using the above notation,
at the nth step we will have the distribution:
\[
v^{(n)} = v_0 T^n.
\]
While this seems to be a very simple and straightforward example, it is
not the main result, but just a tool for easier notation. The interesting result
was proven by Andrey Markov; recall at this point that the previous problem
does not assume independence among trials; on the contrary, they are made
explicitly dependent. Still, in this case, given an initial state, the final result
converges in the limit n → ∞.
The above example is adapted from [5].
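To make the matrix notation concrete, here is a minimal sketch in Python with NumPy (the choice of language and the value of p are our own illustrative assumptions, not part of the original example) that computes v^{(n)} = v_0 T^n and shows that the distribution stabilizes:

    import numpy as np

    # Transition matrix for the pebble example, rows/columns ordered as (0, W, B).
    p = 0.7                      # assumed probability of starting from box W
    T = np.array([[0.0, p, 1 - p],
                  [0.0, 0.6, 0.4],
                  [0.0, 0.2, 0.8]])

    v0 = np.array([1.0, 0.0, 0.0])   # start in state 0 with probability 1

    # Distribution after n steps: v(n) = v0 @ T^n
    for n in (1, 2, 5, 20):
        vn = v0 @ np.linalg.matrix_power(T, n)
        print(n, vn, vn.sum())       # the entries always sum to 1

    # After many steps the result no longer depends on p: approx [0, 1/3, 2/3].
    print(v0 @ np.linalg.matrix_power(T, 1000))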
1.1 Gambler's ruin

This is a famous problem that has a variety of different formulations. In more
technical terms, for this section, we are asked to think about a constrained
random walk problem in which we can move one step at a time, and which,
by definition, has absorbing states at both ends (an absorbing state is one in
which, with probability 1, the next state is the same as the current one); we
start somewhere in between these states (the problem is not defined outside).
To state the problem more rigorously in terms of gambler's ruin, we
assume there is a fixed amount of money, $S, that two players, A and B, share.
At each round of the game, one of the players earns $1 and the other loses
$1. The game ends when one of the players has $0, and implicitly the other
has $S. We assume that each round is independent of any other factors,
and the winning chances of each player do not change in time. Denote by
p the probability that A wins a round. Therefore, A loses with probability
q = 1 - p, and B wins with probability q and loses with probability p.
To be able to deal with the problem, we want to have very precise definitions from now on. Let X_n be the amount player A has at discrete time
n ∈ N. Clearly, B will have Y_n = S - X_n at the same time. We will assume
that A starts with X_0, which, as mentioned before, must satisfy 0 ≤ X_0 ≤ S.
Note that in this case 0 and S are absorbing states.
We are interested in finding the probability that player A eventually loses
the game (will have no money left). Based on the definitions, we can write:
\[
\begin{cases}
P(X_{n+1} = k+1 \mid X_n = k \in \{1, \dots, S-1\}) = p \\
P(X_{n+1} = k-1 \mid X_n = k \in \{1, \dots, S-1\}) = q \\
P(X_{n+1} = S \mid X_n = S) = 1, \qquad P(X_{n+1} = 0 \mid X_n = 0) = 1
\end{cases}
\]


To visualize the problem, we can draw the following diagram, denoting
the amount of money player A has in each state:
[Diagram: states 0, 1, 2, ..., S - 1, S in a line, labeled by the amount of money player A has; from each intermediate state we move one step to the right with probability p and one step to the left with probability q; states 0 and S are absorbing.]

Note that for S ∈ {0, 1, 2} the problem is trivial, and for S = 3 it is not
considerably harder [2]. However, we are interested in general S. Denote
the event that A loses the game by R_A (for "ruin of A"). Avoiding the absorbing
states as initial states, because these would make the problem trivial, we can
write a very useful recursive relation based on the above Markov chain to
transform the problem into a system of linear equations:
\[
P(R_A \mid X_0 = k) = p\, P(R_A \mid X_0 = k+1) + q\, P(R_A \mid X_0 = k-1).
\]
While this result should be self-explanatory, an extensive proof can be
found in Lemma 3.1 of [2]. Intuitively speaking, since each round is independent of the others, we have P(R_A | X_i = k) = P(R_A | X_j = k) for all i, j ∈ N.²
Therefore, the above equation simply represents one round of the game, going
right with probability p and left with probability q at the first step.
Introducing a shorter notation for the above equation, we denote:
\[
f_S(k) = P(R_A \mid X_0 = k),
\]
for the function f_S : \{0, 1, \dots, S\} \to [0, 1], with f_S(0) = 1 and f_S(S) = 0. Then we
write:
\[
f_S(k) = p\, f_S(k+1) + q\, f_S(k-1), \qquad 1 \le k \le S-1.
\]
Assuming this is a known problem that can be solved without the use
of Markov chains, we cite equations 3.8 and 3.9 from [2] for the final result,
which solves the above recursive equation with the given boundary conditions:
\[
f_S(k) = \frac{(q/p)^k - (q/p)^S}{1 - (q/p)^S}, \quad p \neq q;
\qquad\qquad
f_S(k) = 1 - \frac{k}{S}, \quad p = q = \tfrac{1}{2}.
\]

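As a quick sanity check of this closed form, the following minimal sketch (Python; the particular values of p, S and k are illustrative assumptions) simulates the game many times and compares the empirical ruin frequency with f_S(k):

    import random

    def ruin_probability_mc(p, S, k, trials=100_000):
        """Estimate P(A is ruined | X_0 = k) by playing the game many times."""
        ruined = 0
        for _ in range(trials):
            x = k
            while 0 < x < S:                 # play until an absorbing state is hit
                x += 1 if random.random() < p else -1
            ruined += (x == 0)
        return ruined / trials

    def ruin_probability_exact(p, S, k):
        q = 1 - p
        if p == q:
            return 1 - k / S
        r = q / p
        return (r**k - r**S) / (1 - r**S)

    p, S, k = 0.45, 10, 4                    # assumed example parameters
    print(ruin_probability_mc(p, S, k))      # close to the exact value below
    print(ruin_probability_exact(p, S, k))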
Recall that we defined the transition matrix, which can be easily computed
in this case, especially having the diagram above. In particular we have:
\[
T \;=\;
\begin{array}{c|ccccccc}
        & 0 & 1 & 2 & \cdots & S-2 & S-1 & S \\ \hline
 0      & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
 1      & q & 0 & p & \cdots & 0 & 0 & 0 \\
 2      & 0 & q & 0 & \cdots & 0 & 0 & 0 \\
 \vdots & \vdots & & & \ddots & & & \vdots \\
 S-2    & 0 & 0 & 0 & \cdots & 0 & p & 0 \\
 S-1    & 0 & 0 & 0 & \cdots & q & 0 & p \\
 S      & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{array}
\]

² This means that no matter how many rounds have been played already, as long as A
has $k at the current time, from that time on, the probability of losing is the same as if
he or she had started with $k.


Starting from a given state (an (S+1)-dimensional almost-null vector, v_0, with
only the component corresponding to state k equal to 1, for X_0 = k), after n rounds the probability
distribution over the possible states will be:
\[
v^{(n)} = v_0 T^n.
\]
The previous result about the gambler's ruin problem, assuming p ≠ q
for the equations below, tells us that in the limit n → ∞ we know the vector
v^{(n)}, and it has a very interesting form:
\[
\lim_{n\to\infty} v^{(n)} =
\left( \frac{(q/p)^k - (q/p)^S}{1-(q/p)^S},\; 0,\; 0,\; \dots,\; 0,\; \frac{1-(q/p)^k}{1-(q/p)^S} \right).
\]

Likewise, we may infer intuitively, without rigorous proof, how T^n behaves
in the same limit:
\[
\lim_{n\to\infty} T^n =
\begin{pmatrix}
\frac{(q/p)^0 - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^0}{1-(q/p)^S} \\[1ex]
\frac{(q/p)^1 - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^1}{1-(q/p)^S} \\[1ex]
\frac{(q/p)^2 - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^2}{1-(q/p)^S} \\[1ex]
\vdots & \vdots & & \vdots & \vdots \\[1ex]
\frac{(q/p)^{S-1} - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^{S-1}}{1-(q/p)^S} \\[1ex]
\frac{(q/p)^S - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^S}{1-(q/p)^S}
\end{pmatrix}.
\]

Note how the results for p ≠ q and p = q differ from each other. Consider
sequences (a_n)_{n≥0} and (b_n)_{n≥0} that converge to (q/p)^k - (q/p)^S and,
respectively, 1 - (q/p)^S as n → ∞, for given p, q, k, S. Then the result of the
gambler's ruin problem can be written as:
\[
f_S(k) = \lim_{n\to\infty} \frac{a_n}{b_n}.
\]

The above form does not need to differentiate between p ≠ q and p = q as before,
because in the indeterminate case 0/0, which is obtained for p = q, we must
apply l'Hôpital's rule (in the variable x = q/p, with x → 1) to compute the limit. This would indeed result in the
previous equation,
\[
f_S(k) = \frac{k \cdot 1^{k-1} - S \cdot 1^{S-1}}{0 - S \cdot 1^{S-1}} = 1 - \frac{k}{S}
\qquad \text{for } p = q.
\]
A simple example which helped adapt the problem and build intuition
about the behavior of the gambler's ruin transition matrix can be found in [6].
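The limiting behavior of T^n can also be checked numerically. Below is a small sketch (Python with NumPy; S and p are illustrative assumptions) that builds the gambler's ruin transition matrix, raises it to a large power and compares its first column, the ruin probabilities, with the closed form:

    import numpy as np

    def gamblers_ruin_T(p, S):
        """Tridiagonal transition matrix on states 0, 1, ..., S with absorbing ends."""
        q = 1 - p
        T = np.zeros((S + 1, S + 1))
        T[0, 0] = 1.0
        T[S, S] = 1.0
        for k in range(1, S):
            T[k, k - 1] = q
            T[k, k + 1] = p
        return T

    p, S = 0.55, 8                       # assumed example parameters
    T = gamblers_ruin_T(p, S)
    Tn = np.linalg.matrix_power(T, 10_000)

    r = (1 - p) / p
    exact_first_column = (r**np.arange(S + 1) - r**S) / (1 - r**S)
    print(np.max(np.abs(Tn[:, 0] - exact_first_column)))   # tiny: the columns agree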

1.2 Unrestricted Random Walk

We now look at a similar problem, but unlike the previous section we do not
have boundaries anymore. It is now more convenient to slightly redefine the
problem in terms of the position we are at after step n ≥ 0. First of all, without
loss of generality, assume we start at position 0 (otherwise, relabel/shift the
axis so that this is achieved). We are still assuming that at each step we
must move, either to the left or to the right. Denote by p the probability of
walking to the right, and by q = 1 - p the probability of walking to the left.
We want to express the position, S, as a function of n, but this cannot be a
deterministic equation because we are walking randomly in either direction.
Therefore, we let (X_i)_{i≥1} be a set of random variables such that:
\[
P(X_i = +1) = p \quad\text{and}\quad P(X_i = -1) = q.
\]
Now we can write the position as a sequence:
\[
S_n = X_1 + X_2 + \dots + X_n.
\]
While in the previous section we were interested in the probability of
winning/losing, that was not the only relevant question we could ask.
Recall that we dealt with a discrete-time problem, as we do in this section
as well. It seems reasonable to assume that each round/step takes a roughly
constant amount of time; therefore, we are interested in how long we will
be playing/walking until we reach some desired state. In the gambler's
ruin problem we were interested in the game ending, and, luckily, it turned
out that it ends at some point with probability 1. However, without
boundaries, we do not have an end anymore, so we must define a stopping
situation. Assuming that we start from the origin, let's explore how many
steps we must take to arrive back at the origin. Since this section is only
meant to offer a taste of what we can solve using recursive solutions that
Markov chains make more intuitive, we will only briefly discuss a few
results. Following the notation in [2], we let
\[
T_0 = \inf\{n > 0 \mid S_n = 0\}
\]
denote the first time we return to the initial position (measured in units of
steps). Once we agree on the convention that inf ∅ = ∞ for this problem,
we may continue the analysis. This is simply an intuitive notation, because
{n > 0 | S_n = 0} = ∅ means we never reach the initial position again.
The parameters of the problem are only p and q (or rather just
p, because q = 1 - p), so we will assume these are given from the beginning.
However, we are still not working with a deterministic problem, so T_0 can be
represented by a distribution. Let this distribution be:
\[
g : \mathbb{N} \to [0, 1], \quad\text{defined by}\quad g(n) = P(T_0 = n \mid S_0 = 0).
\]
Note that the above indeed defines a distribution because it satisfies
g(n) ∈ [0, 1] and the values of g sum to 1. The second property should be quite
clear, because the events {T_0 = n} are mutually exclusive:
\[
\sum_{n} g(n) = \sum_{n} P(T_0 = n \mid S_0 = 0)
= P\Big(\bigcup_{n} \{T_0 = n\} \;\Big|\; S_0 = 0\Big) = 1,
\]
where, consistent with the convention inf ∅ = ∞, the index n runs over N ∪ {∞}
(the value n = ∞ accounts for walks that never return).

Since we are making a step in either direction at each time, note that the
problem has a parity symmetry relating the position and the number of steps.
Assuming n ∈ N and k ∈ Z, we can write the following equations:
\[
P(S_{2n} = 2k+1 \mid S_0 \text{ even}) = 0 \quad\text{and}\quad P(S_{2n+1} = 2k \mid S_0 \text{ even}) = 0.
\]
Therefore, if we start from 0, after an odd number of steps we will be at an odd
coordinate, so we have:
\[
g(2k+1) = P(T_0 = 2k+1 \mid S_0 = 0) = 0, \qquad \forall\, k \in \mathbb{N}.
\]
So we see that we need an even number of steps to get back to the initial position. Recall from the binomial distribution the following equation, adapted
to our problem:
\[
P(S_{2n} = 2k \mid S_0 = 0) = \binom{2n}{n+k} p^{n+k} q^{n-k},
\]

because we need to make 2k net steps to the right, which means the remaining
2n - 2k steps must be half to the right and half to the left. In other words,
we are making 2k + (2n - 2k)/2 = n + k steps to the right and (2n - 2k)/2 = n - k steps to
the left. The binomial coefficient accounts for all possible paths we can walk
having these numbers of steps in each direction. Note that this holds for
k < 0 as well, but then a negative number of net steps to the right physically
represents moving to the left.
Combining the above equation with the previous one, starting from 0
instead of any even number, P(S_{2n} = 2k+1 \mid S_0 = 0) = 0, we define
h(n) = P(S_n = 0 \mid S_0 = 0), so that
\[
h(2n) = \binom{2n}{n} p^n q^n \quad\text{and}\quad h(2n+1) = 0, \qquad n \in \mathbb{N}.
\]
Now we can write down a key result, which is rigorously proven in Lemma
4.1 of [2]. However, there is a lot of intuition behind the result if we think
about what g(l) and h(n) mean. To emphasize: g(l) represents the probability
that our random walk returns to the initial position in l steps for
the first time. On the other hand, h(n) represents the probability that we
are at the initial state at step n, so in this case we may have passed through 0
multiple times. The proof of Lemma 4.1 in [2] is basically nothing else
than applying the law of total probability and working out the notation to
explicitly obtain:
\[
h(n) = \sum_{k=0}^{n-2} g(n-k)\, h(k).
\]

The above is called the convolution equation, and solving it for g(l) will
give us the result. While computing the final answer is beyond the current
scope of this paper, we will use the results from [2] to make a few brief
remarks. It turns out that the distribution is rather complex, but we can
easily have a look at a physically interesting situation, namely whether the return
to the initial state is achieved in a finite number of steps. From equation 4.21
of [2] we get:
\[
g(\infty) = P(T_0 = \infty \mid S_0 = 0) = |p - q|.
\]
A few important remarks: the result is symmetric in p and q, as
expected, because we could simply rename the directions of walking. Note
that for p = q = 1/2, the walk returns to the origin in a finite number of steps with probability 1.
This seems to be a reasonable result because in this case we expect to stay

around 0. However, if say p > q, then we expect to drift slightly to the right
at each step, moving away from the origin, towards infinity.
If we are interested in the average number of steps after which the game
ends, we clearly see that if the walk is biased (i.e. p ≠ q), this average is infinite
because we have a nonzero probability of never returning.
The surprising result is that the average number of steps to return does not
converge even in the p = q case [2]. Whether or not this is an intuitive result
is open for discussion, but it means that the length of the walks needed to
return to the initial state grows faster than the probability of those longer
walks decays.
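The convolution equation can be unwound numerically: since h(0) = 1, the k = 0 term isolates g(n), so we can solve for g step by step and check that the total return probability is close to 1 - |p - q|. A minimal sketch (Python; the value of p and the truncation level are illustrative assumptions):

    p = 0.5                      # assumed; try 0.6 to see a nonzero escape probability
    q = 1 - p
    N = 2000                     # truncation of the (infinite) convolution

    # h(n) = P(S_n = 0 | S_0 = 0), computed iteratively to avoid huge binomials
    h = [0.0] * (N + 1)
    h[0] = 1.0
    for m in range(1, N // 2 + 1):
        # h(2m) = C(2m, m) p^m q^m = h(2m-2) * (2m)(2m-1)/m^2 * p*q
        h[2 * m] = h[2 * m - 2] * (2 * m) * (2 * m - 1) / (m * m) * p * q

    # Solve h(n) = sum_{k=0}^{n-2} g(n-k) h(k) for g, using h(0) = 1.
    g = [0.0] * (N + 1)
    for n in range(2, N + 1):
        g[n] = h[n] - sum(g[n - k] * h[k] for k in range(1, n - 1))

    # Total probability of returning in at most N steps:
    print(sum(g))    # approaches 1 - |p - q| as N grows (slowly when p = q)

For p = q the convergence in the truncation level is slow, which is consistent with the heavy tail of the return-time distribution discussed above.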


2 Markov Chains

We are now about to dive into a more formal study of Markov chains. Reading Chapter 5 of [2] and Chapter 1 of [1], we will develop notation
and definitions to facilitate this study. Most of the technical words used in
this introductory paragraph will be defined in the subsequent sections. This
being said, roughly speaking, we will first deal with discrete state spaces, focusing on the Markov property and the role of the transition matrix previously
introduced. The discussion of continuous-time chains will be less formal.

2.1 Discrete-time Markov chains

Let's start by defining a discrete state space, S, which is a countable set. For
most problems we generally have S = Z, as we have seen in the unrestricted
random walk situation. We then call any i ∈ S a state, and the fact that
S is countable allows us to label these discrete states because we have a
bijection between S and a subset of N. Note that this concept is not
related to the notion of a discrete-time process. We will assume that the state
space is discrete for the continuous-time Markov chains as well.
We call a measure on S a list λ = (λ_i | i ∈ S) such that 0 ≤ λ_i < ∞ for all i ∈ S.
Because measures are additive, for this particular case of a discrete space we
see that it suffices to know the measure of each individual state in order to
compute the measure of any given subset of S.
In addition, we can add structure to the previous definition if
we require that the measure of the entire discrete state space, S, is 1, namely:
∑_{i∈S} λ_i = 1. We call such a measure a distribution because it resembles the
idea of probabilities for discrete processes.
Defining a discrete-time stochastic process, (Z_n)_{n∈N}, which
takes values in S, we can slowly move on to what the Markov property is.
We will assume that each of these random variables is picked from the
distribution at the time step corresponding to n.
Recall the transition matrices we used in the first part for developing
intuition about the problems. More precisely, these were stochastic matrices,
i.e. each row of each matrix was required to be a distribution (sum to 1).
Moreover, the dimensionality of the matrices was the same as the cardinality

of the state space. Therefore we define a transition matrix to be an |S| × |S|
stochastic matrix:
\[
T_S = (t_{ij} \mid i, j \in S), \qquad 0 \le t_{ij} \le 1 \;\; \forall\, i, j \in S; \qquad \sum_{j \in S} t_{ij} = 1 \;\; \forall\, i \in S.
\]

Note that there is a one-to-one correspondence between this type of matrix and the
diagrams drawn for this type of problem. This is because the two objects have
the same dimensionality, and the requirement that each row of the matrix
is a distribution is equivalent to the requirement that the outward-pointing
arrows in the diagrams form a distribution as well.
We are at the point where we can formally define a Markov chain. There
are only a few, rather intuitive and expected properties. We say that (Z_n)_{n∈N}
is a Markov chain with initial distribution λ and transition matrix T if:
\[
\begin{cases}
P(Z_0 = i_0) = \lambda_{i_0} \\
P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \dots, Z_n = i_n) = P(Z_{n+1} = i_{n+1} \mid Z_n = i_n) = t_{i_n i_{n+1}}
\end{cases}
\]
In other words, the initial variable is picked from the initial distribution
λ, and each following one depends only on the previous one, and no more
than that, through the transition matrix. In short notation, we will say that
(Z_n)_{n∈N} is Markov(λ, T).
From the above definitions, we immediately have a very useful result for
computing the distribution of Z_n starting from any initial step. Indeed, this
result is part of the path towards confirming the matrix equations we
used in the first part when working through the example problems.
Theorem 2.1 A discrete-time, finite stochastic process (Z_n)_{0≤n≤N} is Markov(λ, T) if and only if
\[
P(Z_0 = i_0, \dots, Z_N = i_N) = \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-1} i_N}, \qquad \forall\, i_0, \dots, i_N \in S.
\]
Proof: The forward direction follows immediately by applying both properties
in the definition and by using conditional probability identities [1].
For the reverse direction, we will use induction. Assume the equality is
true for N. Then we can sum over all possible values of i_N to get:
\[
\sum_{i_N \in S} P(Z_0 = i_0, \dots, Z_N = i_N) = \sum_{i_N \in S} \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-1} i_N}
\]
\[
\Rightarrow\; P(Z_0 = i_0, \dots, Z_{N-1} = i_{N-1}) = \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-2} i_{N-1}} \sum_{i_N \in S} t_{i_{N-1} i_N}
= \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-2} i_{N-1}},
\]
using that ∑_{i_N ∈ S} t_{i_{N-1} i_N} = 1. Then, by induction, we get that:
\[
P(Z_0 = i_0, \dots, Z_n = i_n) = \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{n-1} i_n}, \qquad \forall\, 0 \le n \le N.
\]
Therefore, we note that the first property is satisfied; namely, for n = 0
in the above we get P(Z_0 = i_0) = λ_{i_0}. For the second property, we simply do
the computation:
\[
P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \dots, Z_n = i_n) = \frac{P(Z_{n+1} = i_{n+1}, Z_0 = i_0, \dots, Z_n = i_n)}{P(Z_0 = i_0, \dots, Z_n = i_n)},
\]
which by the previous result simply gives:
\[
P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \dots, Z_n = i_n) = \frac{\lambda_{i_0} t_{i_0 i_1} \cdots t_{i_n i_{n+1}}}{\lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{n-1} i_n}} = t_{i_n i_{n+1}}. \qquad \square
\]


Before showing that the more convenient matrix equations used in the introductory section are valid, and exploring examples afterwards, we state and prove the Markov property. The rough
idea of the theorem is that as long as we know the position of the system at a given
observation time m, i.e. the distribution of Z_m consists of (|S| - 1)
zeros and a single 1 (a unit, or delta, distribution δ_i), we can take this to be our initial state; starting from there
we have a Markov chain with initial distribution δ_i and the same transition matrix as before.
Theorem 2.2 (Markov Property) Let (Z_n)_{n∈N} be Markov(λ, T). Then,
conditional on Z_m = i, the remaining chain (Z_{m+n})_{n∈N} is Markov(δ_i, T),
and is independent of Z_0, ..., Z_m.
Proof: We will simply use the properties of Markov chains. Since
(Z_n)_{n∈N} is Markov(λ, T), it satisfies the second property in the definition:
\[
P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \dots, Z_n = i_n) = t_{i_n i_{n+1}}, \qquad \forall\, n \in \mathbb{N}.
\]
In our problem we have an extra condition to satisfy, namely Z_m = i; this changes the problem as follows. First, there is nothing special for
n + 1 < m, so we have:
\[
P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \dots, Z_n = i_n) = t_{i_n i_{n+1}}, \qquad \forall\, 0 \le n < m - 1.
\]
The only requirement that must be satisfied to be able to condition on Z_m = i
is that this is a reachable state, i.e. it has a strictly positive probability. For Z_m we then have:
\[
P(Z_m = i_m \mid Z_0 = i_0, \dots, Z_{m-1} = i_{m-1}, Z_m = i) = \delta_{i, i_m}.
\]
Let A be the event that a certain path is followed among (Z_k)_{0 \le k < m} to
get to Z_m = i. Then, we want to prove:
\[
P(\{Z_m = i_m, \dots, Z_{m+n} = i_{m+n}\} \cap A \mid Z_m = i)
= \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}} \, P(A \mid Z_m = i).
\]
But now we can expand both sides by writing:
\[
P(A \cap \{Z_m = i_m, \dots, Z_{m+n} = i_{m+n}\} \cap \{Z_m = i\}) / P(Z_m = i)
= \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}} \, P(A \cap \{Z_m = i\}) / P(Z_m = i)
\]
\[
\Leftrightarrow\;
P(A \cap \{Z_m = i_m, \dots, Z_{m+n} = i_{m+n}\} \cap \{Z_m = i\})
= \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}} \, P(A \cap \{Z_m = i\}),
\]
which is true by Theorem 2.1. Now, if we sum the above over all possible paths
A that can be followed to get to Z_m = i in m steps, we get:
\[
P(\{Z_m = i_m, \dots, Z_{m+n} = i_{m+n}\} \cap \{Z_m = i\})
= \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}} \, P(Z_m = i)
\]
\[
\Rightarrow\;
P(Z_m = i_m, \dots, Z_{m+n} = i_{m+n} \mid Z_m = i) = \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}},
\]
so this is Markov(δ_i, T) by Theorem 2.1. □

Having these two theorems handy, we can show that the matrix notation
we previously used is allowed when working with Markov chains with given
initial distribution and transition matrix.


Corollary 2.3 Let (Z_n)_{n∈N} be Markov(λ, T). Then for all n ∈ N we have:
\[
P(Z_n = j) = (\lambda T^n)_j.
\]
Proof: This is a simple application of Theorem 2.1, together with contracting the sum notation into matrix notation. □
Corollary 2.4 Let (Z_n)_{n∈N} be Markov(λ, T). Then for all n, m ∈ N we have:
\[
P(Z_{m+n} = j \mid Z_m = i) = (\delta_i T^n)_j = t_{ij}^{(n)}.
\]
Proof: This follows from Corollary 2.3 using Theorem 2.2, with λ = δ_i as
suggested in the equation above. □
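Corollary 2.3 is easy to check empirically: sample many independent trajectories of a small chain and compare the empirical distribution of Z_n with λT^n. A minimal sketch (Python with NumPy; the chain, λ and n are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)

    # An arbitrary 3-state chain (each row of T is a distribution).
    T = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    lam = np.array([0.2, 0.5, 0.3])            # initial distribution lambda
    n, samples = 5, 50_000

    counts = np.zeros(3)
    for _ in range(samples):
        state = rng.choice(3, p=lam)           # Z_0 ~ lambda
        for _ in range(n):
            state = rng.choice(3, p=T[state])  # Z_{k+1} drawn from row Z_k of T
        counts[state] += 1

    print(counts / samples)                    # empirical distribution of Z_n
    print(lam @ np.linalg.matrix_power(T, n))  # (lambda T^n), matches closely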

Since we started the paper by introducing a few examples that use
Markov chains to solve and develop better intuition for different problems,
we will only provide one more example here.
Example 2.5 (Simplest non-trivial Markov chain) The simplest non-trivial Markov chain that we can explore is one with a state space of
cardinality 2. This is depicted below, with its respective transition probabilities (assume that when the system is in one state, it changes to the other
with respective probabilities α and β).
[Diagram: two states, 1 and 2; from state 1 the system moves to state 2 with probability α and stays with probability 1 - α; from state 2 it moves to state 1 with probability β and stays with probability 1 - β.]
The transition matrix then takes the form:
\[
T = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.
\]
In order to solve the problem for any number of steps, we must find out
the behavior of T^n for n ∈ N. As in the example from [1], let's look
at t_{11}^{(n)}, since the other 3 entries can be solved for similarly. We start from
T^{n+1} = T^n T. Therefore, we have:
\[
t_{11}^{(n+1)} = t_{11}^{(n)} (1-\alpha) + t_{12}^{(n)} \beta.
\]

Using the properties of Markov chains, we know that T^n must be a
stochastic matrix for all n ∈ N; therefore, t_{11}^{(n)} + t_{12}^{(n)} = 1. Then we get:
\[
t_{11}^{(n+1)} = t_{11}^{(n)} (1-\alpha) + \beta\,\big(1 - t_{11}^{(n)}\big)
= t_{11}^{(n)} (1-\alpha-\beta) + \beta.
\]
Solving the above equation, and 3 similar others, will give the exact form
of the transition probabilities for any step size in time. There is something important to note,
however. Take, for example, α = β = 1 in the above. This physically means
that we move from one state to the other at each step. Therefore, the powers
of the transition matrix should not converge, i.e. lim_{n→∞} T^n does not exist. This is
indeed true, because in this case we have:

\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{\!n} =
\begin{cases}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, & n \text{ even} \\[2ex]
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, & n \text{ odd.}
\end{cases}
\]
This does not mean that the system is unpredictable; on the contrary,
given an initial state, we know precisely where the system will be at
time n. In other words it is deterministic. Therefore, T^n does not necessarily converge, but the processes are still predictable. Using the results from
Example 1.1.4 of [1], we have:
\[
t_{11}^{(n)} =
\begin{cases}
\dfrac{\beta}{\alpha+\beta} + \dfrac{\alpha}{\alpha+\beta}\,(1-\alpha-\beta)^n, & \alpha + \beta > 0 \\[2ex]
1, & \alpha + \beta = 0.
\end{cases}
\]
We look at the above expression to note that the (1,1) component
of T^n is divergent only for α = β = 1. This follows quite easily from 0 ≤
α, β ≤ 1, which means that -1 ≤ 1 - α - β ≤ 1. The only case among these
in which β/(α+β) + α/(α+β)·(1-α-β)^n fails to converge is 1 - α - β = -1, which is obtained
only for α = β = 1. This is precisely the case we discussed just before. Note,
however, that because the transition matrix involves determined transitions
in the cases when T^n diverges, the system is completely determined, with
two possibilities: it will either periodically rotate between the states in a
predefined order (for example α = β = 1 in the previous situation), or it will
reach a state in which it remains from that time on, called an absorbing
state (for example α = 0 or β = 0 in the previous situation).
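The recursion and the closed form for t_{11}^{(n)} can be compared directly; the following minimal sketch (Python; the values of α and β are illustrative assumptions) iterates the recursion and prints both:

    alpha, beta = 0.3, 0.8          # assumed transition probabilities

    def t11_recursive(n):
        t = 1.0                     # t_11^(0) = 1: at step 0 we are surely still in state 1
        for _ in range(n):
            t = t * (1 - alpha - beta) + beta
        return t

    def t11_closed_form(n):
        if alpha + beta == 0:
            return 1.0
        return beta / (alpha + beta) + alpha / (alpha + beta) * (1 - alpha - beta) ** n

    for n in (1, 2, 5, 50):
        print(n, t11_recursive(n), t11_closed_form(n))   # the two columns agree

    # For alpha = beta = 1 the closed form oscillates between 0 and 1 and never converges.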
2.2 Continuous-time Markov chains

This section is meant to make the transition from the previously discussed
discrete-time stochastic processes to the continuous-time version. Note that,
in general, for computational and simulation purposes, continuous-time
problems must be converted to discrete-time ones with a desired, preferably
small, step size [2], because computers only use
discrete states and discrete transitions. From a theoretical point of view, the
study of continuous-time Markov chains is considerably more complex. For
the purposes of this paper, we will briefly introduce this concept, with help
from Chapter 2 of [1] and Chapter 10 of [2].
Recall the state space S defined in the previous section. We continue
to work with a similar (or the same) state space for this part. We are not
making the states continuous, but rather the time. This is an important
distinction to make.
With the above notion in mind, we can now define the continuous-time
stochastic process as (X_t)_{t∈R_+}. Just as before, X_t ∈ S for all t ∈ R_+. Keeping
the same names as in the previous section, we consider such a process to
have the Markov property if the probability distribution of X_t, given the previous
values up to time s, is determined by the state X_s and does not depend
on the earlier values X_u for u < s. We can write this as:
\[
P(X_t = i_t \mid X_{s_0} = i_0, \dots, X_{s_{n-1}} = i_{n-1}, X_s = i_s) = P(X_t = i_t \mid X_s = i_s),
\]
for all 0 ≤ s_0 < s_1 < ⋯ < s_{n-1} < s < t. The difference here from the previous
section is that s can now get arbitrarily close to t.
Instead of focusing on a specific example, let's go into a more general
discussion about these continuous-time processes. Following [1], we
do not have any particularly nice and useful relations involving uncountable
unions such as ∪_{t∈R_+} A_t, as we do in the case of countable collections:
\[
P\Big(\bigcup_i A_i\Big) \le \sum_i P(A_i).
\]
(Countable additivity of this kind was needed in Theorem 2.2 to be allowed to sum over all possible
paths A, although it was left implicit for the purpose of the proof.)


There is a way to avoid having to deal with such a situation, as specified
in [1], by restricting to right-continuous processes. In terms of intervals, this
means that we have constant segments that are open to the right, if we were
to plot X_t. Writing this more formally, we have:
\[
\forall\, t \in \mathbb{R}_+ \;\; \exists\, \varepsilon > 0 \;\text{ such that }\; X_s = X_t \;\; \forall\, s \in [t, t+\varepsilon].
\]
From this definition it should be clear that X_t must remain constant
for some positive time before it can jump to another value in S.
While this might look like we are making time discrete, keep in mind
that X_t can remain constant for arbitrarily small, but positive, intervals of
time, therefore allowing the process to explode if it wants to. It is proven in Section
6.6 of [1] that the right-continuity property allows us to determine the
probability of any event from the finite-dimensional distributions P(X_{t_0} = i_0, …, X_{t_n} = i_n), for n ∈ N, 0 ≤ t_0 ≤ ⋯ ≤ t_n and i_0, …, i_n ∈ S:
\[
P(X_t = i \text{ for some } t \in [0, \infty))
= 1 - \lim_{n\to\infty} \sum_{j_0, \dots, j_n \neq i} P(X_{q_0} = j_0, \dots, X_{q_n} = j_n),
\]
where the q_i are rationals.


Based on the shape and physical description of the right-continuity property for (Xt )tR+ , we are interested in finding the points in time where
jumps occur. We will call these jump times, and define them by:
J0 = 0, Jn+1 = inf{t > Jn |Xt 6= XJn }.
In the above, we set J0 = 0 because at the beginning we must jump to the
initial value. We used the same convention as before, inf = , for the same
reason, and we consider n to be a countable index. Of similar interest are
the holding times, or stationary, defined by:
(
Jn Jn1 , Jn1 <
Sn =
,
,
otherwise
where Sn > 0 by right-continuity.
Indeed the right-continuity assumption makes the study considerably simpler, especially when we define the jump process (Yn )nN given by Yn = XJn .
This discrete-time process can itself be a Markov chain. It allows us to write:
P(Xt = i for some t R+ ) = P(Yn = i for some n N).
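The definitions of jump times, holding times and the jump process can be illustrated on a discretized trajectory, in the spirit of the earlier remark that continuous time must in practice be sampled on a fine grid. A minimal sketch (Python; the path below is a made-up right-continuous step function, not generated from any particular chain):

    # A right-continuous path sampled on a fine time grid (dt apart).
    dt = 0.01
    path = [0] * 120 + [2] * 35 + [1] * 300 + [2] * 45   # X_t on the grid (toy data)

    jump_times = [0.0]          # J_0 = 0
    jump_states = [path[0]]     # Y_0 = X_{J_0}
    for k in range(1, len(path)):
        if path[k] != path[k - 1]:          # a jump occurred between grid points
            jump_times.append(k * dt)       # J_{n+1}: first time the state changes
            jump_states.append(path[k])     # Y_{n+1} = X_{J_{n+1}}

    holding_times = [jump_times[i] - jump_times[i - 1] for i in range(1, len(jump_times))]

    print(jump_times)      # J_0, J_1, J_2, ...
    print(jump_states)     # the jump process Y_n = X_{J_n}
    print(holding_times)   # S_n = J_n - J_{n-1}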

3 Monte Carlo Method

We have now arrived at the final section of the paper, a brief introduction
to the Monte Carlo method. It consists of solving different problems in
the computational sciences by constructing a suitable random process for
each problem. Of course, the parameters of each process depend on the
needs of the problem, in particular on the quantities involved in it.
These quantities are then determined approximately through observations of
the constructed random process, and statistical analysis becomes possible for
each of them. Let's jump into an example to better understand the
method. Ultimately, this is the purpose of the paper: to stimulate intuition
about these problems and methods, to make it easier to apply them rigorously
when required.
Assume we want to estimate x through a random variable ξ which has
E(ξ) = x. One approach involving the Monte Carlo method for this simple
problem is to determine the approximate value of the quantity x by independently sampling N times from the defined random variable and computing
the mean value:
\[
\bar{\xi} = \frac{\xi_1 + \dots + \xi_N}{N}.
\]
By the law of large numbers this converges to x; moreover E(\bar{\xi}) = x, with variance given by σ²/N (the Bienaymé formula), where σ² is the variance of a single sample. Therefore,
we note that the variance decreases like 1/N, so we can get arbitrarily
small errors in estimating x with sufficiently many samples.
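A minimal sketch of this estimator (Python; purely for illustration, ξ is taken to be uniform on [0, 1], so that x = E(ξ) = 0.5 and σ² = 1/12):

    import random

    def mc_estimate(n_samples):
        # xi_i ~ Uniform(0, 1); the estimator is the sample mean
        return sum(random.random() for _ in range(n_samples)) / n_samples

    true_x = 0.5
    for n in (100, 10_000, 1_000_000):
        est = mc_estimate(n)
        # The error shrinks roughly like sigma / sqrt(n), with sigma^2 = 1/12.
        print(n, est, abs(est - true_x))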
The previous example is a toy example, and there are many others like it.
However, we see from [3] that for some of these problems it can be much
faster to compute closed-form solutions and do all the computations exactly, rather
than running large numbers of experiments to obtain the desired accuracy.
This being said, we will continue with the first remarkable application of the
Monte Carlo method: estimating π, or rather 1/π.
This is known as Buffon's celebrated problem of needle-tossing [3]. Consider
a system of parallel lines separated by distance d on a plane. We suppose
that a needle of length l is tossed onto the plane, so that its position on the

plane is random (the position being given by the center of the needle and its angle with respect to the
parallel lines). Naturally, we may assume that the position of the center, x, is
uniformly (and periodically) distributed between 0 and d, and that the angle
formed with the parallel lines, φ, is likewise uniformly distributed between
-π/2 and π/2. For the following discussion, assume l < d.
We want to analyze situations in which the tossed needle does not intersect any of the parallel lines in the plane. This condition can be written
as the following requirements:
\[
0 < x - \frac{l}{2}\cos(\varphi) \quad\text{and}\quad x + \frac{l}{2}\cos(\varphi) < d.
\]
At this point, we have luckily restricted the problem from the infinite 2-dimensional plane to a rectangle, namely (x, φ) ∈ [0, d] × [-π/2, π/2]. Let
D denote the region of the rectangle representing states in which the needle
does not touch any of the parallel lines. Then, because we assumed both
variables, x and φ, to be uniformly distributed, the probability of
no intersection for a needle of length l, p_l, is given by:
\[
p_l = \frac{\int_D dx\, d\varphi}{\int_{[0,d]\times[-\pi/2,\pi/2]} dx\, d\varphi}
    = \frac{1}{\pi d} \int_D dx\, d\varphi.
\]

We can now use the previous inequalities to describe D in a way that
allows us to integrate the above expression:
\[
p_l = \frac{1}{\pi d} \int_{-\pi/2}^{\pi/2} \int_{\frac{l}{2}\cos(\varphi)}^{\,d - \frac{l}{2}\cos(\varphi)} dx\, d\varphi
    = \frac{1}{\pi d} \int_{-\pi/2}^{\pi/2} \big(d - l\cos(\varphi)\big)\, d\varphi
    = \frac{1}{\pi d}\,(\pi d - 2l)
\;\Rightarrow\; p_l = 1 - \frac{2l}{\pi d}.
\]
Given that the probability of the above event is expressed in terms of π,
we can define a new event that has the desired probability of 1/π:
let l = d/2 < d and consider the complementary event, namely that the needle
intersects at least one of the parallel lines. Then the probability of
this event is:
\[
q_{d/2} = 1 - p_{d/2} = \frac{1}{\pi}.
\]

For sufficiently many tosses of the needle, N, we will observe some number K of intersections of
the needle with one of the parallel lines, and by the law of large numbers we
will have:
\[
\frac{K}{N} \to q_{d/2} = \frac{1}{\pi}, \qquad\text{i.e.}\qquad \frac{N}{K} \to \pi.
\]

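A minimal simulation of the needle-tossing experiment (Python; it follows the setup above with l = d/2, sampling the center x uniformly in [0, d] and the angle φ uniformly in [-π/2, π/2]):

    import math
    import random

    def estimate_pi_buffon(n_tosses, d=1.0):
        l = d / 2                                   # needle length chosen so that q = 1/pi
        hits = 0                                    # K: number of tosses that cross a line
        for _ in range(n_tosses):
            x = random.uniform(0.0, d)              # center of the needle (mod d)
            phi = random.uniform(-math.pi / 2, math.pi / 2)
            half_span = (l / 2) * math.cos(phi)
            if x - half_span < 0 or x + half_span > d:   # the needle crosses a line
                hits += 1
        return n_tosses / hits                      # N / K -> pi

    print(estimate_pi_buffon(1_000_000))

Even with a million tosses the estimate is typically only good to two or three decimal places, which motivates the error discussion below.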

This is clearly enough to estimate π in this case, but so far it does not
provide any margins for the errors of the method. Quite simply, if we could
estimate π ≈ 3.1416 only with an error of 4, this would not be of any use. Recall the first example in this section; in particular, the variance was given by
σ²·(1/N), where N was the number of trials. Therefore, the standard deviation
of the final result is proportional to 1/√N. The standard deviation is strongly connected
with the error of the method, and we can see from the previous example
that its asymptotic behavior is O(1/√N). Section I.2 of [3] provides a few
other examples with different distributions as further evidence of the general
behavior of the error of the Monte Carlo method.
The asymptotic behavior O(1/√N) tells us that it is better to
run an experiment as many times as possible to get a smaller error. Indeed,
this seems reasonable and even promising, since lim_{N→∞} O(1/√N) = 0,
which means we can achieve arbitrarily small error. However, in the computational
sciences we may care to a considerable extent about the time it takes to
run these experiments versus the error improvement we get. Clearly the time
increases linearly with N while the error only improves like 1/√N, so,
roughly speaking, an effort proportional to N only reduces the error by a factor of √N. Although
this is not an ideal effort-to-accuracy trade-off, such as O(1), it is
a good approach because the final accuracy is achieved in polynomial time
as a function of the number of runs of the experiment, and moreover it is
a low-degree polynomial, of degree 1/2. This being said, we may conclude
that the Monte Carlo method proves to be fairly effective in delivering a good
ratio of time spent to quality of the final results for problems that require
computational tools to be solved or numerically estimated.
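To see the O(1/√N) behavior concretely, one can repeat the mean estimator from the first example of this section at several sample sizes and average the observed error over a few independent runs. A minimal sketch (Python; the uniform samples are again purely illustrative):

    import random
    import statistics

    def mean_abs_error(n_samples, repeats=200):
        errors = []
        for _ in range(repeats):
            est = sum(random.random() for _ in range(n_samples)) / n_samples
            errors.append(abs(est - 0.5))
        return statistics.mean(errors)

    for n in (100, 400, 1600, 6400):
        # Quadrupling N should roughly halve the error, consistent with O(1/sqrt(N)).
        print(n, mean_abs_error(n))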


References
[1] Norris, J. R. Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics, No. 2. New York: Cambridge University Press, 1997.
[2] Privault, Nicolas. Understanding Markov Chains: Examples and Applications. Springer Undergraduate Mathematics Series. Springer, 2013.
[3] Shreider, Iu. A., and Buslenko, N. P. The Monte Carlo Method; the Method of Statistical Trials. International Series of Monographs in Pure and Applied Mathematics, v. 87. Oxford, New York: Pergamon Press, 1966.
[4] Fishman, George S. Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research. New York: Springer-Verlag, 1995.
[5] ArtOfTheProblem. "What Is a Markov Chain?" YouTube. YouTube, 26 June 2013. Web. 2 May 2016.
[6] BCFoltz. "Finite Math: Markov Chain Example - The Gambler's Ruin." YouTube. YouTube, 16 Nov. 2012. Web. 2 May 2016.
