Brief Study of Markov Chains
Abstract
This paper provides a brief introduction to Markov chains. It also includes a
discussion of the Monte Carlo method for solving problems that admit
probabilistic interpretations. The first part introduces a few simple,
conceptual problems to provide intuition about these approaches before
going into theoretical aspects. This basic formulation of Markov chains will
be followed by a section on Discrete-time Markov chains involving a more
rigorous formulation, which can be applied to a varied range of problems over
many fields of study. We will briefly go into Continuous-time Markov chains
in the next section for a more comprehensive study. The last part introduces
the Monte Carlo method for numerical estimations where analytical solutions
are nearly impossible to compute or do not exist by the nature of the problem.
Denis Turcu
Math 157
Introduction
Let's start with a brief aside to first see how powerful the Markov chain
method is. We assume familiarity with the law of large numbers. Assume we
have two boxes, Aw and Ab, which contain 200 white pebbles and 300 black
pebbles respectively. Then, we know that after mixing these two boxes together
and picking a random pebble a large number of times, with replacement, the
ratio of white pebbles encountered to black pebbles encountered converges to:

$$\lim_{n\to\infty} \frac{\#\text{ white pebbles encountered in } n \text{ trials}}{\#\text{ black pebbles encountered in } n \text{ trials}} = \frac{2}{3}.$$
For the above to be true, we require independence for each of the trials.
While this example can be extended to more complicated ones, it is important to understand that there exists an underlying probability distribution
describing this situation. Namely, after n samples, we expect to have seen
[2/5, 3/5]·n white and black pebbles respectively, where [2/5, 3/5] represents the
probability distribution for the two observables we have.
Now imagine we have the same two boxes as before, but instead of mixing
them into a single box, we create two other boxes as follows. We will have
box W, containing 150 white pebbles and 100 black pebbles, and box B,
containing the rest of the pebbles, 50 white and 200 black. We start picking
pebbles one at a time from one of these two boxes, and assume we start
from box W with probability p. However, we move between box W and box B
depending on the last observation as follows: if the last picked pebble is
white, we pick the next one from box W; if it is black, we pick from box B.
Therefore, we can draw the following diagram to understand the rules better:
[Transition diagram: from the initial state we move to W with probability p and to B with probability 1 − p; from W we stay at W with probability 0.6 and move to B with probability 0.4; from B we move to W with probability 0.2 and stay at B with probability 0.8.]
$$T = \begin{array}{c|ccc} & 0 & W & B \\ \hline 0 & 0 & p & 1-p \\ W & 0 & 0.6 & 0.4 \\ B & 0 & 0.2 & 0.8 \end{array}$$
where the entry T_ij represents the probability of transitioning from state i
to state j. In particular, regardless of our current state, we can never return
to the initial state because the first column contains only 0s. Note that each row of this
matrix sums to 1, because regardless of which state is our initial state, we
must transition to one of the existing states, and the probability of the entire
space is 1. Now, if we let v0 denote the initial distribution, using the above notation,
after the nth step the distribution over states is:

$$v^{(n)} = v_0 T^n.$$
While this seems to be a very simple and straightforward example, it is
not the main result, but just a tool for easier notation. The interesting result
was proven by Andrey Markov; recall at this point that the previous problem
does not assume independence among trials; on the contrary, they are made
explicitly dependent. Still, in this case, given an initial state, the final result
converges in the limit n → ∞.
The above example is adapted from [5].
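The convergence claim can be checked numerically. The sketch below (assuming Python with NumPy; the initial probability p = 0.5 is an arbitrary illustrative choice) builds the transition matrix from the diagram and computes v(n) = v0 T^n for a large n:

```python
import numpy as np

# States ordered as (initial, W, B); rows follow the diagram above.
# p = 0.5 is an arbitrary illustrative choice for the initial split.
p = 0.5
T = np.array([[0.0, p,   1 - p],
              [0.0, 0.6, 0.4],
              [0.0, 0.2, 0.8]])

v0 = np.array([1.0, 0.0, 0.0])          # start in the initial state
v = v0 @ np.linalg.matrix_power(T, 50)  # v(n) = v0 T^n

print(v)  # converges to [0, 1/3, 2/3], independently of p
```

Changing p leaves the limit unchanged, which matches the point of the example: the trials are explicitly dependent, yet the distribution still converges.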
1.1 Gambler's ruin
[Diagram: the gambler's ruin chain on states 0, 1, 2, …, S − 1, S; from each interior state the chain moves one step right with probability p and one step left with probability q = 1 − p, while 0 and S are absorbing states.¹]
Note that for S ∈ {0, 1, 2} the problem is trivial, and for S = 3 it is not
considerably harder [2]. However, we are interested in general S. Denote
¹An absorbing state is a situation where, with probability 1, the next states are the same
as the current one.
the event that A loses the game by RA (for "ruin A"). Avoiding the absorbing
states as initial states, because these would make the problem trivial, we can
write a very useful recursive relation based on the above Markov chain to
transform the problem into a system of linear equations:

$$P(R_A \mid X_0 = k) = p\,P(R_A \mid X_0 = k+1) + q\,P(R_A \mid X_0 = k-1).$$
While this result should be self-explanatory, an extensive proof can be
found in Lemma 3.1 from [2]. Intuitively speaking, since each round is independent of the others, we have P(R_A | X_i = k) = P(R_A | X_j = k) for all i, j ∈ N.²
Therefore, the above equation simply represents one round of the game, going
right with probability p and left with probability q at the first step.
Introducing a shorter notation for the above equation, we denote:

$$f_S(k) = P(R_A \mid X_0 = k),$$

for the function f_S : {0, 1, …, S} → [0, 1] with f_S(0) = 1 and f_S(S) = 0. Then we write:

$$f_S(k) = p f_S(k+1) + q f_S(k-1), \quad 1 \le k \le S-1.$$
Assuming this is a known problem that can be solved without the use
of Markov chains, we will cite equations 3.8 and 3.9 from [2] for the final result,
which solves the above recursive equations with the given boundary conditions:

$$f_S(k) = \frac{(q/p)^k - (q/p)^S}{1 - (q/p)^S}, \quad p \neq q,$$
$$f_S(k) = 1 - \frac{k}{S}, \quad p = q = \frac{1}{2}.$$
Recall that we defined the transition matrix which can be easily computed
in this case, especially having the diagram above. In particular we have:
$$T = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
0 & q & 0 & p & \cdots & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & q & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix},$$

with rows and columns indexed by the states 0, 1, …, S − 1, S.
²This means that no matter how many rounds have been played already, as long as A
has $k at the current time, from that time on the probability of losing is the same as if
he or she had started with $k.
In the limit, row k of T^n contains the ruin probability f_S(k) in the first column, the winning probability 1 − f_S(k) in the last column, and 0s everywhere else (shown here for p ≠ q):

$$\lim_{n\to\infty} T^n = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
\frac{(q/p)^1 - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^1}{1-(q/p)^S} \\
\frac{(q/p)^2 - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^2}{1-(q/p)^S} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{(q/p)^{S-1} - (q/p)^S}{1-(q/p)^S} & 0 & \cdots & 0 & \frac{1-(q/p)^{S-1}}{1-(q/p)^S} \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}$$
Note how the results for p ≠ q and p = q differ from each other. Consider
the sequences (a_n)_{n≥0} and (b_n)_{n≥0} that converge to (q/p)^k − (q/p)^S and,
respectively, 1 − (q/p)^S as n → ∞ for given p, q, k, S. Then the result of the
gambler's ruin problem can be written as:

$$f_S(k) = \lim_{n\to\infty} \frac{a_n}{b_n}.$$
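As a numerical sketch (Python with NumPy; S = 10 and p = 0.45 are illustrative choices, not values from the text), we can build the transition matrix above, approximate lim T^n by a large matrix power, and compare its first column with the closed-form f_S(k):

```python
import numpy as np

S, p = 10, 0.45          # illustrative choices
q = 1.0 - p

# Transition matrix: 0 and S absorbing; interior k moves to k-1 (q) or k+1 (p).
T = np.zeros((S + 1, S + 1))
T[0, 0] = T[S, S] = 1.0
for k in range(1, S):
    T[k, k - 1], T[k, k + 1] = q, p

Tn = np.linalg.matrix_power(T, 10_000)   # proxy for the n -> infinity limit

r = q / p                                # closed form, valid for p != q
f = lambda k: (r**k - r**S) / (1 - r**S)

# First column of lim T^n should equal the ruin probabilities f_S(k).
print([round(Tn[k, 0] - f(k), 12) for k in range(S + 1)])
```

Each row of the limit matrix also splits its mass entirely between the two absorbing states, matching the structure of the limit matrix above.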
1.2 Unrestricted random walk
We now look at a similar problem, but unlike the previous section we no
longer have boundaries. It is now more convenient to slightly redefine the
problem in terms of the position we are at after step n ≥ 0. First of all, without
loss of generality, assume we start at position 0 (otherwise, relabel/shift the
axis so that this is achieved). We are still assuming that at each step we
must move, either to the left or to the right. Denote by p the probability of
walking to the right, and by q = 1 − p the probability of walking to the left.
We want to express the position, S, as a function of n, but this cannot be
a deterministic equation because we are walking randomly in either direction.
Therefore, we let (X_i)_{i≥1} be a set of random variables such that:

$$P(X_i = 1) = p \quad \text{and} \quad P(X_i = -1) = q.$$

Now we can write the position as a sequence:

$$S_n = X_1 + X_2 + \cdots + X_n.$$
While in the previous section we were interested in the probability of
winning/losing, that was not the only relevant question that we could ask.
Recall that we dealt with a discrete-time problem, as we do in this section
as well. It seems reasonable to assume that each round/step takes a roughly
constant amount of time, therefore, we are interested in how long we will
be playing/walking until we reach some desired state. In the gambler's
ruin problem we were interested in the game ending, and, luckily, it turned
out that it ended at some point with probability 1. However, without
boundaries, we do not have an end anymore, so we must define a stopping
situation. Assuming that we start from the origin, let's explore how many
steps we must take to arrive back at the origin. Since this section is only
meant to offer a taste of what we can solve using recursive solutions that
Markov chains make seem more intuitive, we will only briefly discuss a few
results. Following the notation in [2], we let:

$$T_0 = \inf\{n > 0 \mid S_n = 0\}$$

denote the first time we return to the initial position (measured in units of
steps). Once we agree on the convention that inf ∅ = ∞ for this problem,
we may continue the analysis. This is simply an intuitive notation, because
{n > 0 | S_n = 0} = ∅ means we never reach the initial position again.
The parameters of the problem are only p and q (or rather just
p, because q = 1 − p), so we will assume these are given from the beginning.
However, we are still not working with a deterministic problem, so T_0 is
represented by a distribution. Let this distribution be:

$$g : \mathbb{N} \cup \{\infty\} \to [0, 1], \quad \text{defined by } g(n) = P(T_0 = n \mid S_0 = 0).$$
Note that the above indeed defines a distribution, because it satisfies
g(n) ∈ [0, 1] and Σ_n g(n) = 1. The second equality should be quite
clear and self-explanatory, because the events {T_0 = n} are mutually exclusive, so:

$$\sum_{n} g(n) = \sum_{n} P(T_0 = n \mid S_0 = 0) = P\Big(\bigcup_{n} \{T_0 = n\} \,\Big|\, S_0 = 0\Big) = 1.$$
Since we are making a step in either direction at each time, note that the
problem has a parity symmetry regarding the position and number of steps.
Assuming n ∈ N and k ∈ Z, we can write the following equations:

$$P(S_{2n} = 2k+1 \mid S_0 \text{ even}) = 0 \quad \text{and} \quad P(S_{2n+1} = 2k \mid S_0 \text{ even}) = 0.$$

Therefore, if we start from 0, after an odd number of steps we will be at an odd
coordinate, so we have:

$$g(2k+1) = P(T_0 = 2k+1 \mid S_0 = 0) = 0, \quad \forall k \in \mathbb{N}.$$
So we see that we need an even number of steps to get back to the initial position. Recall from the binomial distribution the following equation, adapted
to our problem:

$$P(S_{2n} = 2k \mid S_0 = 0) = \binom{2n}{n+k} p^{n+k} q^{n-k},$$
because we need to make 2k net steps to the right, which means the remaining
2n − 2k steps must be half to the right and half to the left. In other words,
we are making 2k + (2n − 2k)/2 = n + k steps to the right and (2n − 2k)/2 = n − k to
the left. The binomial coefficient accounts for all possible paths we can walk
on having these numbers of steps in each direction. Note that this holds for
k < 0 as well, but then a negative number of net steps to the right physically
represents moving to the left.
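This binomial formula is easy to test by simulation. The sketch below (plain Python; p, n, and k are illustrative values) compares the closed form with the empirical frequency over many simulated walks:

```python
import math
import random

random.seed(0)
p, n, k = 0.5, 5, 1              # illustrative: check P(S_10 = 2 | S_0 = 0)
q = 1.0 - p

# Closed form: C(2n, n+k) p^(n+k) q^(n-k).
exact = math.comb(2 * n, n + k) * p**(n + k) * q**(n - k)

# Simulate many walks of 2n steps and count those ending at 2k.
trials, hits = 200_000, 0
for _ in range(trials):
    pos = sum(1 if random.random() < p else -1 for _ in range(2 * n))
    hits += (pos == 2 * k)

print(exact, hits / trials)      # the two values should agree closely
```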
Combining the above equation with the previous one, starting from 0
instead of any even number, P(S_{2n} = 2k+1 | S_0 = 0) = 0, we define
h(n) = P(S_n = 0 | S_0 = 0), so that:

$$h(n) = \begin{cases} \binom{n}{n/2}\, p^{n/2} q^{n/2}, & n \text{ even} \\ 0, & n \text{ odd} \end{cases}, \quad n \in \mathbb{N}.$$
Now we can write down a key result, which is rigorously proven in Lemma
4.1 from [2]. However, there is a lot of intuition behind the result, if we think
about what g(l) and h(n) mean. To emphasize, g(l) represents the probability
that we will return in our random walk to the initial position in l steps for
the first time. On the other hand, h(n) represents the probability that we
will be in the initial state at step n, so we can pass through 0 in our case
multiple times. The proof of Lemma 4.1 from [2] is basically nothing else
than applying the law of total probability and working out the notation to
explicitly obtain:

$$h(n) = \sum_{k=0}^{n-2} g(n-k)\, h(k).$$
The above is called the convolution equation, and solving it for g will
give us the result. While computing the final answer is beyond the current
scope of this paper, we will use the results from [2] to make a few brief
remarks. It turns out that the distribution is rather complex, but we can
easily have a look at a physically interesting question, namely whether the return
to the initial state is achieved in a finite number of steps. From equation 4.21
from [2] we get:

$$g(\infty) = P(T_0 = \infty \mid S_0 = 0) = |p - q|.$$

A few important remarks: the result is symmetric in p and q, as
expected, because we could simply rename the directions of walking. Note
that for p = q = 1/2, the walk returns for sure in a finite number of steps.
This seems to be a reasonable result because in this case we expect to stay
around 0. However, if say p > q, then we expect to drift slightly to the right
at each step, shifting our origin to the right, towards infinity.
If we are interested in the average number of steps in which the walk
ends, we clearly see that if the walk is biased (i.e. p ≠ q), this is infinite,
because we have a nonzero probability of an infinite number of steps.
The surprising result is that the average number of steps to return does not
converge even in the p = q case [2]. Whether or not this is an intuitive result
is open for discussion, but it means that the number of steps to return to
the initial state grows faster than the probability of following the longer walks
decays.
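The convolution equation also gives a practical way to compute g numerically: since h(0) = 1, the k = 0 term of the sum is g(n) itself, so g can be solved for recursively. The sketch below (plain Python; p = 0.3 and the truncation horizon N = 400 are illustrative choices) does this and checks that Σ_n g(n) approaches 1 − |p − q|:

```python
import math

p = 0.3                      # illustrative bias; q = 0.7
q = 1.0 - p
N = 400                      # truncation horizon for the sums

# h(n) = P(S_n = 0 | S_0 = 0): C(n, n/2) (pq)^(n/2) for even n, else 0.
def h(n):
    return 0.0 if n % 2 else math.comb(n, n // 2) * (p * q) ** (n // 2)

H = [h(n) for n in range(N + 1)]

# Solve h(n) = sum_{k=0}^{n-2} g(n-k) h(k) for g, using h(0) = 1, h(1) = 0.
g = [0.0] * (N + 1)
for n in range(2, N + 1):
    g[n] = H[n] - sum(g[n - k] * H[k] for k in range(2, n - 1))

# P(T0 < infinity) = sum_n g(n), which should approach 1 - |p - q|.
print(sum(g), 1.0 - abs(p - q))   # both close to 0.6
```

For the biased walk the tail of g decays geometrically, so a modest truncation horizon already gives the limit to high accuracy; for p = q the tail decays only polynomially and convergence is much slower.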
Markov Chains
We are now about to dive into a more formal study of Markov chains. Reading Chapter 5 from [2] and Chapter 1 from [1], we will develop notation
and definitions to facilitate this study. Most of the technical terms used in
this introductory paragraph will be defined in the subsequent sections. This
being said, roughly speaking, we will first deal with discrete state spaces, focusing on the Markov property and the role of the transition matrix previously
introduced. The discussion on continuous-time chains will be less formal.
2.1 Discrete-time Markov chains

Let's start by defining a discrete state space, S, which is a countable set. For
most problems, we generally have S = Z, as we have seen in the unrestricted
random walk situation. We then call any i ∈ S a state, and the fact that
S is countable allows us to call these states discrete, because there is a
bijection between S and a subset of N. Note that this concept is not
related to the notion of a discrete-time process. We will assume that the state
space is discrete for the continuous-time Markov chains as well.
We call a measure on S a list λ = (λ_i : i ∈ S) such that 0 ≤ λ_i < ∞ for all i ∈ S.
Because measures are additive, for this particular case of a discrete space we
see that it suffices to know the measure of each individual state in order to
compute the measure of any given subset of S.
In addition, we can give the previous definition more structure if we
require that the measure of the entire discrete state space, S, be 1, namely:
Σ_{i∈S} λ_i = 1. We call such a measure a distribution, because it captures the
idea of probabilities for discrete processes.
Defining a discrete-time stochastic process, (Z_n)_{n∈N}, which
takes values in S, we can slowly move on to the Markov property.
We will assume that each of these random variables is picked from the
distribution at the time step corresponding to n.
Recall the transition matrices we used in the first part for developing
intuition about the problems. More precisely, these were stochastic matrices,
i.e. each row of each matrix was supposed to be a distribution (sum to 1).
Moreover, the dimensionality of the matrices was the same as the cardinality
of the state space.
Note that there is a one-to-one correspondence between this type of matrix and the
diagrams drawn for this type of problem. This is because the two objects have
the same dimensionality, and the requirement that each row of the matrix
is a distribution is equivalent to the requirement that the outward-pointing
arrows in the diagrams form a distribution as well.
We are at the point where we can formally define a Markov chain. There
are only a few, rather intuitive and expected properties. We say that (Z_n)_{n∈N}
is a Markov chain with initial distribution λ and transition matrix T if:

$$\begin{cases} P(Z_0 = i_0) = \lambda_{i_0} \\ P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \ldots, Z_n = i_n) = P(Z_{n+1} = i_{n+1} \mid Z_n = i_n) = t_{i_n i_{n+1}} \end{cases}$$

In other words, the initial variable is picked from the initial distribution
λ, and each following one depends only on the previous one, and no more
than that, through the transition matrix. In short notation, we will say that
(Z_n)_{n∈N} is Markov(λ, T).
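The definition translates directly into a sampler. The sketch below (plain Python; the state space and T are taken from the two-box example, with the illustrative choice p = 0.5) draws Z_0 from λ and then each Z_{n+1} using only row Z_n of T:

```python
import random

random.seed(42)

def sample_chain(lam, T, n):
    """Sample (Z_0, ..., Z_n) from Markov(lam, T): Z_0 ~ lam, and each
    next state is drawn using only the current state's row of T."""
    states = range(len(lam))
    z = random.choices(states, weights=lam)[0]
    path = [z]
    for _ in range(n):
        z = random.choices(states, weights=T[z])[0]
        path.append(z)
    return path

# Two-box chain from the introduction (states: initial, W, B), with p = 0.5.
T = [[0.0, 0.5, 0.5],
     [0.0, 0.6, 0.4],
     [0.0, 0.2, 0.8]]
path = sample_chain([1.0, 0.0, 0.0], T, 1000)
print(path[:10])
```

Because the first column of T is all zeros, the initial state never reappears after the start, matching the earlier observation about that chain.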
From the above definitions, we immediately have a very useful result for
computing the joint distribution of the chain from any initial step. Indeed, this
result is a first step towards confirming the matrix equations we
used in the first part when working through the example problems.
Theorem 2.1. A discrete-time, finite stochastic process (Z_n)_{0≤n≤N} is Markov(λ, T) if and only if:

$$P(Z_0 = i_0, \ldots, Z_N = i_N) = \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-1} i_N}, \quad \forall\, i_0, \ldots, i_N \in S.$$
Proof: The forward direction follows immediately by applying both properties
in the definition and by using conditional probability identities [1].
For the reverse direction, we will use induction. Assume the equality is
true for N. Then, we can sum over all possible values of i_N to get:

$$\sum_{i_N \in S} P(Z_0 = i_0, \ldots, Z_N = i_N) = \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-2} i_{N-1}} \sum_{i_N \in S} t_{i_{N-1} i_N} = \lambda_{i_0} t_{i_0 i_1} \cdots t_{i_{N-2} i_{N-1}},$$

since Σ_{i_N∈S} t_{i_{N-1} i_N} = 1, which is the corresponding formula for N − 1. Repeating this argument down to a single variable recovers P(Z_0 = i_0) = λ_{i_0}, and taking the ratio of consecutive product formulas gives:

$$P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \ldots, Z_n = i_n) = \frac{P(Z_{n+1} = i_{n+1}, Z_0 = i_0, \ldots, Z_n = i_n)}{P(Z_0 = i_0, \ldots, Z_n = i_n)} = t_{i_n i_{n+1}},$$

so both defining properties hold. □
Before showing the possibility of working with the more convenient matrix equations used in the introductory section, and exploring examples afterwards, we state and prove the Markov property. The rough
idea of the theorem is that as long as we know the position of the observed
system at a given time, i.e. the distribution of Z_m is formed by (|S| − 1)
zeros and a single 1 for some m after the observation (the unit mass δ_i at some state i),
we can set this to be our initial state; starting from here
we have a Markov chain with initial distribution δ_i and the same transition
matrix as before.
Theorem 2.2 (Markov Property). Let (Z_n)_{n∈N} be Markov(λ, T). Then,
conditional on Z_m = i, the remaining chain (Z_{m+n})_{n∈N} is Markov(δ_i, T),
and independent of Z_0, …, Z_m.
Proof: We will simply use the properties of Markov chains. Since
(Z_n)_{n∈N} is Markov(λ, T), it satisfies the second property from the definition:

$$P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \ldots, Z_n = i_n) = t_{i_n i_{n+1}}, \quad \forall n \in \mathbb{N}.$$
In our problem we have an extra condition to satisfy, namely Z_m = i. Therefore, this changes the problem as follows. First, there is nothing special for
n + 1 < m, so we have:

$$P(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \ldots, Z_n = i_n) = t_{i_n i_{n+1}}, \quad 0 \le n < m-1.$$

The only requirement that must be satisfied to be able to condition on Z_m = i
is that this is a reachable state, i.e. it has a strictly positive probability. Now
for Z_m we have:

$$P(Z_m = i_m \mid Z_0 = i_0, \ldots, Z_{m-1} = i_{m-1}, Z_m = i) = \delta_{i, i_m}.$$

Let A be the event that a certain path is followed among (Z_k)_{0≤k<m} to
get to Z_m = i. Then, we want to prove:

$$P((Z_m = i_m, \ldots, Z_{m+n} = i_{m+n}) \cap A \mid Z_m = i) = \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}}\, P(A \mid Z_m = i).$$

But now we can expand both sides by dividing by P(Z_m = i), so the claim is equivalent to:

$$P(A \cap (Z_m = i_m, \ldots, Z_{m+n} = i_{m+n}) \cap \{Z_m = i\}) = \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}}\, P(A \cap \{Z_m = i\}),$$

which is true by Theorem 2.1. Now, if we sum the above over all possible paths
A that we can follow to get to Z_m = i in m steps, we get:

$$P((Z_m = i_m, \ldots, Z_{m+n} = i_{m+n}) \cap \{Z_m = i\}) = \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}}\, P(Z_m = i),$$

so that:

$$P(Z_m = i_m, \ldots, Z_{m+n} = i_{m+n} \mid Z_m = i) = \delta_{i, i_m} t_{i_m i_{m+1}} \cdots t_{i_{m+n-1} i_{m+n}},$$

and this is Markov(δ_i, T) by Theorem 2.1. □
Having these two theorems handy, we can show that the matrix notation
we previously used is valid when working with Markov chains with a given
initial distribution and transition matrix.
As in Example 1.1.4 from [1], consider the general two-state chain with transition matrix:

$$T = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.$$

In order to solve the problem for any number of steps, we must find out
the behavior of T^n, n ∈ N. Let's look
at t_{11}^{(n)}, since the other 3 entries can be solved for similarly. We start from
T^{n+1} = T^n T. Therefore, we have:

$$t_{11}^{(n+1)} = t_{11}^{(n)}(1-\alpha) + t_{12}^{(n)}\beta.$$
For α = β = 1 the chain deterministically swaps states at every step, so T^n alternates and does not converge:

$$T^n = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \text{ for } n \text{ even}, \qquad T^n = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \text{ for } n \text{ odd}.$$
This does not mean that the system is unpredictable; on the contrary,
given an initial state, we will know precisely where the system will be at
time n. In other words, it is deterministic. Therefore, T^n does not necessarily converge, but the processes are still predictable. Using the results from
Example 1.1.4 from [1], we have:

$$t_{11}^{(n)} = \begin{cases} \dfrac{\beta}{\alpha+\beta} + \dfrac{\alpha}{\alpha+\beta}(1-\alpha-\beta)^n, & \alpha+\beta > 0 \\ 1, & \alpha+\beta = 0 \end{cases}$$
We are looking at the above expression to note that the (1,1) component
of T^n is divergent only for α = β = 1. This follows quite easily from
0 ≤ α, β ≤ 1, which means that −1 ≤ 1 − α − β ≤ 1. The only case among these
in which β/(α+β) + α/(α+β)·(1−α−β)^n diverges is 1 − α − β = −1, which is obtained
only for α = β = 1. This is precisely the case we discussed just before. Note,
however, that because the transition matrix involves determined transitions
in the case when T^n diverges, the system is completely determined, with
two possibilities: it will either periodically rotate between the states in a
predefined order (for example α = β = 1 in the previous situation), or it will
reach a state in which it remains from that time on, called an absorbing
state (for example α = 0 or β = 0 in the previous situation).
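The closed form for t_{11}^{(n)} can be checked against direct matrix powers. A quick sketch (Python with NumPy; α = 0.3 and β = 0.5 are illustrative values):

```python
import numpy as np

alpha, beta = 0.3, 0.5           # illustrative rates
T = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Closed form for the (1,1) entry of T^n (Example 1.1.4 of [1]).
def t11(n):
    s = alpha + beta
    return beta / s + (alpha / s) * (1 - s) ** n

for n in (0, 1, 5, 50):
    assert abs(np.linalg.matrix_power(T, n)[0, 0] - t11(n)) < 1e-10

print(t11(50))  # close to beta / (alpha + beta) = 0.625
```

Since |1 − α − β| < 1 here, the geometric term dies out and T^n converges; setting α = β = 1 instead reproduces the alternating, non-convergent case above.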
2.2 Continuous-time Markov chains
This section is meant to make the transition from the previously discussed
discrete-time stochastic processes to the continuous-time version. Note that,
in general, for computational and simulation purposes, continuous-time
problems must be converted to discrete-time ones with a desired, preferably
small, step size [2], because computers only use
discrete states and discrete transitions. From a theoretical point of view, the
study of continuous-time Markov chains is considerably more complex. For
the purpose of this paper, we will briefly introduce this concept, with help
from Chapter 2 from [1] and Chapter 10 from [2].
Recall the state space S defined in the previous section. We continue
to work with a similar (or the same) state space for this part. We are not
making the states continuous, but rather the time steps. This is an important
distinction to make.
With the above notion in mind, we can now define a continuous-time
stochastic process as (X_t)_{t∈R_+}. Just as before, X_t ∈ S for all t ∈ R_+. Keeping
the same names as in the previous section, we consider such a process to
have the Markov property if the probability distribution of X_t, given the
process up to time s, is determined by the state X_s and does not depend
on earlier values X_u for u < s. We can write this as:

$$P(X_t = i_t \mid X_{s_0} = i_0, \ldots, X_{s_{n-1}} = i_{n-1}, X_s = i_s) = P(X_t = i_t \mid X_s = i_s),$$

for all 0 ≤ s_0 < s_1 < ⋯ < s_{n−1} < s < t. The difference here from the previous
section is that s can now get arbitrarily close to t.
Instead of focusing on a specific example, let's go into a more general
discussion about these continuous-time processes. Following [1], we
do not have any particularly nice and useful relations involving uncountable
unions such as ∪_{t∈R_+} A_t, unlike the countable case, where:

$$P\Big(\bigcup_i A_i\Big) \le \sum_i P(A_i).^3$$

³Note that this was mandatory in Theorem 2.2 to be allowed to sum over all possible
paths A, although it was implied for the purpose of the proof.
Monte Carlo Method

We have now arrived at the final section of the paper, a brief introduction
to the Monte Carlo method. It consists of solving different problems in
computational sciences by constructing a random process for each problem.
Of course, the parameters of each process depend on the
needs of the problem, in particular on the quantities involved.
These quantities are then determined approximately through observations of
the constructed random processes, and statistical analysis becomes possible for
each of these quantities. Let's jump into an example to better understand the
method. Ultimately, this is the purpose of the paper: to stimulate intuition
about these problems and methods, to make it easier to apply them rigorously
when required.
Assume we want to estimate x through a random variable, ξ, which has
E(ξ) = x. One approach involving the Monte Carlo method for this simple
problem is to determine the approximate value of the quantity x by independently sampling N times from the defined random variable and computing
the mean value:

$$\bar{\xi} = \frac{\xi_1 + \cdots + \xi_N}{N}.$$

According to the law of large numbers, we have E(ξ̄) = x, with variance given by σ²/N,⁴ where σ² is the variance of a single sample. Therefore,
we note that the variance decreases as 1/N, so we can get arbitrarily
small errors in estimating x with sufficiently many samples.
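A minimal sketch of this estimator (plain Python; taking ξ uniform on [0, 2] is an illustrative choice, so x = 1 and σ² = 1/3) repeats the N-sample experiment many times and confirms that the spread of the estimates behaves like σ²/N:

```python
import random
import statistics

random.seed(1)

# Illustrative target: xi ~ Uniform[0, 2], so x = E[xi] = 1, sigma^2 = 1/3.
def estimate(N):
    return sum(random.uniform(0.0, 2.0) for _ in range(N)) / N

# Repeat the N-sample experiment many times and look at the estimator's spread.
N, runs = 100, 5000
estimates = [estimate(N) for _ in range(runs)]

print(statistics.mean(estimates))      # close to x = 1
print(statistics.variance(estimates))  # close to sigma^2 / N = 1/300
```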
The previous example is a toy example, and there are many others like it.
However, we see from [3] that for some of these problems it can be much
faster to compute closed-form solutions and do the computation exactly, rather
than running large numbers of experiments to obtain the desired accuracy.
This being said, we will continue with the first remarkable application of the
Monte Carlo method: estimating π, or rather 1/π.
This is known as Buffon's celebrated problem of needle-tossing [3]. Consider
a system of parallel lines separated by distance d in a plane. We suppose
that a needle of length l is tossed onto the plane, so that its position on the
⁴Bienaymé formula.
plane is random (the center of the needle and its angle with respect to the
parallel lines). Naturally, we may assume that the position of the center, x, is
uniformly (and periodically) distributed between 0 and d, and that the angle
formed with the parallel lines, φ, is likewise uniformly distributed between
−π/2 and π/2. For the following discussion, assume l < d.
We want to analyze situations in which the tossed needle does not intersect any of the parallel lines in the plane. This condition can be written
as the following requirements:

$$0 < x - \frac{l}{2}\cos(\varphi) \quad \text{and} \quad x + \frac{l}{2}\cos(\varphi) < d.$$
At this point, we have luckily restricted the problem from the infinite 2-dimensional plane to a rectangle, namely (x, φ) ∈ [0, d] × [−π/2, π/2]. Let
D denote the region of the rectangle representing states in which the needle
does not touch any of the parallel lines. Then, because we assumed both
variables, x and φ, to be uniformly distributed, the probability of
no intersection for a needle of length l, p_l, is given by:

$$p_l = \frac{\int_D dx\, d\varphi}{\int_{[0,d] \times [-\pi/2, \pi/2]} dx\, d\varphi} = \frac{1}{\pi d} \int_D dx\, d\varphi.$$
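Evaluating the integral over the complement of D shows that the probability of hitting a line is 2l/(πd), which is what makes the experiment usable as an estimator of π. A simulation sketch (plain Python; d = 2, l = 1, and the sample size are illustrative choices):

```python
import math
import random

random.seed(2016)

d, l = 2.0, 1.0                  # illustrative spacing and needle length, l < d
N = 200_000
hits = 0
for _ in range(N):
    x = random.uniform(0.0, d)                        # center position
    phi = random.uniform(-math.pi / 2, math.pi / 2)   # angle with the lines
    half = (l / 2) * math.cos(phi)
    # The needle touches a line when the no-intersection condition fails.
    hits += not (0 < x - half and x + half < d)

# P(hit) = 2 l / (pi d)  =>  pi is estimated by 2 l N / (d * hits).
pi_hat = 2 * l * N / (d * hits)
print(pi_hat)  # close to 3.14159
```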
This is clearly enough to estimate π in this case, but so far it does not
provide any margins for the errors of the method. Quite simply, if we could
estimate π ≈ 3.1416 only with an error of ±4, this would not be of any good. Recall the first example in this section; in particular, the variance was given by
σ²/N, where N was the number of trials. Therefore, the standard deviation
of the final result scales as 1/√N. The standard deviation has a strong correlation
with the error of the method, and we can see from the previous example
that its asymptotic behavior is O(1/√N). Section I.2 from [3] provides a few
other examples with different distributions for more evidence of the general
behavior of the error of the Monte Carlo method.
The assumed asymptotic behavior, O(1/√N), tells us that it is better to
run an experiment as many times as possible to get a smaller error. Indeed,
this seems reasonable and even promising, since lim_{N→∞} O(1/√N) = 0,
which means we can have arbitrarily small error. However, in computational
sciences, we might care to a considerable extent about the time it takes to
run these experiments versus the error improvement we get. Clearly, the time
increases linearly with N while the error only improves as 1/√N, so
halving the error requires roughly four times the computation.
References
[1] Norris, J. R. Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics, No. 2. New York: Cambridge University Press, 1997.
[2] Privault, Nicolas. Understanding Markov Chains: Examples and Applications. Springer Undergraduate Mathematics Series. 2013.
[3] Shreider, Iu. A., and Buslenko, N. P. The Monte Carlo Method: The Method of Statistical Trials. International Series of Monographs in Pure and Applied Mathematics, Vol. 87. Oxford, New York: Pergamon Press, 1966.
[4] Fishman, George S. Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research. New York: Springer-Verlag, 1995.
[5] ArtOfTheProblem. "What Is a Markov Chain?" YouTube, 26 June 2013. Web. 02 May 2016.
[6] BCFoltz. "Finite Math: Markov Chain Example - The Gambler's Ruin." YouTube, 16 Nov. 2012. Web. 02 May 2016.