Chapter 8

Finite Markov Chains
A discrete system is characterized by a set V of “states” and transitions between the states.
V is referred to as the state space. We think of the transitions as occurring at each time
beat, so the state of the system at time t is a value Xt ∈ V (t = 0, 1, 2, . . . ). The adjective
“discrete” refers to discrete time beats.
A discrete stochastic process is a discrete system in which transitions occur randomly
according to some probability distribution. The process is memoryless if the probability of an
i → j transition does not depend on the history of the process (the sequence of previous states):
(∀i, j, u0 , . . . , ut−1 ∈ V )(P (Xt+1 = j | Xt = i, Xt−1 = ut−1 , . . . , X0 = u0 ) = P (Xt+1 = j | Xt =
i)). (Here the universal quantifier is limited to feasible sequences of states u0 , u1 , . . . , ut−1 , i,
i. e., to sequences which occur with positive probability; otherwise the conditional probability
stated would be undefined.) If in addition the transition probability pij = P (Xt+1 = j | Xt = i)
does not depend on the time t, we call the process homogeneous. A memoryless, homogeneous
discrete stochastic process with a finite state space is called a finite Markov chain.
Let M be a finite Markov chain with n states, V = [n] = {1, 2, . . . , n}. Let pij denote the
probability of transition from state i to state j, i. e., pij = P (Xt+1 = j | Xt = i). (Note that
this is a conditional probability: the question of i → j transition only arises if the system is in
state i, i. e., Xt = i.)
The finite Markov chain M is characterized by the n × n transition matrix T = (pij )
(i, j ∈ [n]) and an initial distribution q = (q1 , . . . , qn ) where qi = P (X0 = i).
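Concretely, given T and q one can sample a trajectory X0, X1, X2, . . . of the chain. The following is a minimal sketch in Python with NumPy; the function name and the example matrix are ours, introduced only for illustration.

```python
import numpy as np

def sample_trajectory(T, q, steps, rng=None):
    """Sample X0, X1, ..., X_steps of the finite Markov chain with transition
    matrix T and initial distribution q (states are 0, ..., n-1)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(q)
    states = [rng.choice(n, p=q)]                      # X0 is drawn from q
    for _ in range(steps):
        states.append(rng.choice(n, p=T[states[-1]]))  # X_{t+1} is drawn from row X_t of T
    return states

# An arbitrary 2-state example.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])
q = np.array([0.5, 0.5])
print(sample_trajectory(T, q, 10))
```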
Definition 8.0.8. An n × n matrix T = (pij ) is stochastic if its entries are nonnegative real
numbers and the sum of each row is 1:
(∀i, j)(pij ≥ 0) and (∀i)(Σ_{j=1}^{n} pij = 1).
Exercise 8.0.9. The transition matrix of a finite Markov chain is a stochastic matrix. Con-
versely, every stochastic matrix can be viewed as the transition matrix of a finite Markov
chain.
Exercise 8.0.10. Prove: if T is a stochastic matrix then T^k is a stochastic matrix for every
k.
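These two facts are easy to check numerically (a quick sanity check, not a proof). Below is a minimal sketch in Python with NumPy; the particular matrix T is an arbitrary example.

```python
import numpy as np

def is_stochastic(T, tol=1e-12):
    """True if T has (numerically) nonnegative entries and every row sums to 1."""
    T = np.asarray(T, dtype=float)
    return bool(np.all(T >= -tol) and np.allclose(T.sum(axis=1), 1.0, atol=tol))

# An arbitrary example of a stochastic matrix.
T = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])

# Numerically confirm that T^k stays stochastic for several values of k.
for k in range(1, 6):
    assert is_stochastic(np.linalg.matrix_power(T, k))
print("T^k is (numerically) stochastic for k = 1, ..., 5")
```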
Random walks on digraphs are important examples of finite Markov chains. They are
defined by hopping from vertex to neighboring vertex, giving equal chance to each out-neighbor.
The state space will be V , the set of vertices. The formal definition follows.
Let G = (V, E) be a finite digraph; let V = [n]. Assume (∀i ∈ V )(deg+(i) ≥ 1). Set
pij = 1/deg+(i) if (i, j) ∈ E; pij = 0 otherwise.
Exercise 8.0.11. Prove that the matrix (pij ) defined in the preceding paragraph is stochastic.
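The construction pij = 1/deg+(i) can be carried out explicitly. The following sketch (Python with NumPy; the helper name random_walk_matrix is ours, introduced only for illustration) builds the transition matrix of the random walk from out-adjacency lists and checks the row sums, in the spirit of Exercise 8.0.11.

```python
import numpy as np

def random_walk_matrix(out_neighbors):
    """Transition matrix of the random walk on a digraph.

    out_neighbors: dict mapping each vertex i in {0, ..., n-1} to a nonempty list
    of its out-neighbors.  Then pij = 1/deg+(i) if (i, j) is an edge and 0 otherwise.
    """
    n = len(out_neighbors)
    P = np.zeros((n, n))
    for i, nbrs in out_neighbors.items():
        for j in nbrs:
            P[i, j] = 1.0 / len(nbrs)
    return P

# Example: the directed triangle 0 -> 1 -> 2 -> 0 with one extra edge 0 -> 2.
P = random_walk_matrix({0: [1, 2], 1: [2], 2: [0]})
print(P)
print(P.sum(axis=1))   # every row sums to 1, as Exercise 8.0.11 asserts
```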
Conversely, all finite Markov chains can be viewed as weighted random walks on a digraph,
the weights being the transition probabilities. The formal definition follows.
Let T = (pij ) be an arbitrary (not necessarily stochastic) n × n matrix. We associate with
T a digraph G = (V, E) as follows. Let V = [n] and E = {(i, j) : pij ≠ 0}. We label the edge
i → j with the number pij ≠ 0 (the “weight” of the edge).
This definition makes sense for any matrix T ; edges indicate nonzero entries. If T is the
transition matrix of a finite Markov chain M then we call the associated digraph the transition
digraph of M. The vertices of the transition digraph represent the states of M and the
edges the feasible transitions (transitions that occur with positive probability).
Exercise 8.0.12. Prove that in the transition digraph of a finite Markov chain, (∀i)(deg+(i) ≥ 1).
Exercise 8.0.13. Draw the transition digraph corresponding to the stochastic matrix

        A = ( 0.7  0.3
              0.2  0.8 ).
Label the edges with the transition probabilities.
The principal subject of study in the theory of Markov chains is the evolution of the
system.
The initial distribution q = (q1 , . . . , qn ) describes the probability that the system is in a
particular state at time t = 0. So qi ≥ 0 and Σ_{i=1}^{n} qi = 1.
Set q(0) = q and let q(t) = (q1(t), . . . , qn(t)) be the distribution of the states at time t, i. e.,
the distribution of the random variable Xt:

        qi(t) = P(Xt = i).
The following simple equation describes the evolution of a finite Markov chain.

Exercise 8.0.14. Prove: q(t+1) = q(t) T , and consequently q(t) = q T^t .
[Figure: transition digraph of the matrix A from Exercise 8.0.13 — loops of weight 0.7 at state 1 and 0.8 at state 2, an edge 1 → 2 of weight 0.3, and an edge 2 → 1 of weight 0.2.]
So the study of the evolution of a finite Markov chain amounts to studying the powers of
the transition matrix.
Exercise 8.0.15. Experiment: study the powers of the matrix A defined in Exercise 8.0.13.
Observe that the sequence I, A, A^2, A^3, . . . appears to converge. What is the limit?
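One way to run the experiment of Exercise 8.0.15 is the following sketch (Python with NumPy); the printed values suggest the limit, which the later exercises on stationary distributions explain.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])

# Print a few powers of A and watch them settle down.
for t in (1, 2, 5, 10, 20, 50):
    print(t)
    print(np.linalg.matrix_power(A, t).round(6))

# Evolution of a distribution: q(t) = q A^t.
q = np.array([1.0, 0.0])                       # start in state 1 with probability 1
print(q @ np.linalg.matrix_power(A, 50))       # approaches the same row as the limit above
```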
The study of the powers rests on the study of eigenvalues and eigenvectors. For a row
vector x = (x1 , . . . , xn ), the left action of an n × n matrix A = (aij ) is the row vector z = xA
with coordinates

        zj = Σ_{i=1}^{n} xi aij .                                                        (8.1)

Note that if G is the digraph associated with the matrix A then the summation can be reduced
to the in-neighbors of j:

        zj = Σ_{i : i→j} xi aij .                                                        (8.2)
Exercise 8.0.18. State the equations for the left action and the right eigenvectors of the
matrix A.
Theorem. The left and the right eigenvalues of a matrix are the same (but not the eigenvec-
tors!).
Proof. Both the right and the left eigenvalues are the roots of the characteristic polynomial
fA (x) = det(xI − A), where I is the n × n identity matrix: indeed, the left eigenvalues of A are
exactly the right eigenvalues of the transpose A^T , and det(xI − A^T ) = det((xI − A)^T ) = det(xI − A).
Exercise 8.0.19. Find the eigenvalues and the corresponding left and right eigenvectors of
the matrix A from Exercise 8.0.13.
Hint. The characteristic polynomial is

        fA (x) = det(xI − A) = (x − 0.7)(x − 0.8) − (−0.3)(−0.2) = x^2 − 1.5x + 0.5 = (x − 1)(x − 1/2).
So the eigenvalues are λ1 = 1 and λ2 = 1/2. Each eigenvalue gives rise to a system of linear
equations for the coordinates of the corresponding (left/right) eigenvectors.
Observe the consequence that λ = 1 is also a left eigenvalue. This is significant because it
raises the possibility of having stationary distributions.
Exercise 8.0.25. Find a left eigenvector x = (x1 , x2 ) to the eigenvalue 1 for the stochastic
matrix A defined in Exercise 8.0.13. Normalize your eigenvector such that |x1 | + |x2 | = 1.
Observe that x is a stationary distribution for A.
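Exercises 8.0.19 and 8.0.25 can be checked numerically as follows (a sketch, not a proof; left eigenvectors are computed as right eigenvectors of the transpose).

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])

print(np.sort(np.linalg.eigvals(A)))       # approximately [0.5, 1.0]

# Left eigenvectors of A are right eigenvectors of A^T (the transpose).
vals, vecs = np.linalg.eig(A.T)
k = np.argmin(np.abs(vals - 1.0))          # pick the eigenvalue closest to 1
x = np.real(vecs[:, k])
x = x / x.sum()                            # normalize so the entries sum to 1
print(x)                                   # approximately [0.4, 0.6]
print(np.allclose(x @ A, x))               # x A = x: x is a stationary distribution
```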
Exercise 8.0.26. Let T be a stochastic matrix. Prove: if the limit T^∞ = lim_{t→∞} T^t exists
then every row of T^∞ is a stationary distribution.
Exercise 8.0.28. Let C⃗n denote the directed cycle of length n. Prove that the powers of the
transition matrix of the random walk on C⃗n do not converge; but a stationary distribution
exists.
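The situation in Exercise 8.0.28 is easy to see numerically: the transition matrix of the random walk on the directed n-cycle is a cyclic permutation matrix, so its powers cycle with period n instead of converging, while the uniform distribution is stationary. A small sketch:

```python
import numpy as np

n = 4
# Random walk on the directed n-cycle: from i the walk moves to i+1 (mod n) with probability 1.
C = np.zeros((n, n))
for i in range(n):
    C[i, (i + 1) % n] = 1.0

print(np.array_equal(np.linalg.matrix_power(C, n), np.eye(n)))   # True: C^n = I, so the powers cycle forever
u = np.full(n, 1.0 / n)                                          # uniform distribution
print(np.allclose(u @ C, u))                                     # True: uniform is stationary
```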
A matrix is called non-negative if all entries of the matrix are non-negative. The Perron–
Frobenius theory of non-negative matrices provides the fundamental results underlying the
following exercises.
Exercise 8.0.32. Prove that a non-negative matrix has a non-negative right eigenvector. (Use
the Perron–Frobenius Theorem.)
Exercise 8.0.33. Let T be a stochastic matrix and x a non-negative left eigenvector to eigen-
value λ. Prove: λ = 1. Hint. Use Exercise 8.0.21.
Exercise 8.0.34. Prove: every finite Markov chain has a stationary distribution.
Finite Markov chains with a strongly connected transition digraph (every state is accessi-
ble from every state) are of particular importance. Such Markov chains are called irreducible.
To emphasize the underlying graph theoretic concept (and reduce the terminology overload),
we shall deviate from the accepted usage and use the term strongly connected Markov
chains instead of the classical and commonly used term “irreducible Markov chains.”
Our results are summed up in the following exercise, an immediate consequence of the
preceding three exercises.
Exercise 8.0.38. Prove: A strongly connected finite Markov chain (a) has exactly
one stationary distribution; and (b) all probabilities in the stationary distribution
are positive.
As we have seen (which exercise?), strong connectivity is not sufficient for the powers of
the transition matrix to converge. One more condition is needed.
Definition 8.0.39. The period of a vertex v in the digraph G is the g.c.d. of the lengths of
all closed directed walks in G passing through v. If G has no closed directed walks through v,
the period of v is said to be 0. If the period of v is 1 then v is said to be aperiodic.
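The period of a vertex can be explored computationally: there is a closed walk of length t through v exactly if the (v, v) entry of the t-th power of the adjacency matrix is nonzero. The sketch below (Python with NumPy) takes the gcd of the walk lengths it finds up to an arbitrary cutoff; that is enough for the small examples shown, but it is offered only as an illustration, not as a complete algorithm.

```python
import numpy as np
from math import gcd

def period_of_vertex(adj, v, max_len=None):
    """gcd of the lengths (up to max_len) of closed directed walks through v; 0 if none found.

    adj: n x n adjacency matrix (nonzero entry = edge).  Only walk lengths up to
    max_len are inspected -- a cutoff chosen for illustration; for the small
    examples below the gcd has already stabilized by then.
    """
    A = (np.array(adj) != 0).astype(int)
    n = A.shape[0]
    if max_len is None:
        max_len = 2 * n
    reach = np.eye(n, dtype=int)
    d = 0
    for t in range(1, max_len + 1):
        reach = np.minimum(reach @ A, 1)   # reach[u, w] = 1 iff there is a u -> w walk of length t
        if reach[v, v]:
            d = gcd(d, t)
    return d

# The directed 3-cycle 0 -> 1 -> 2 -> 0: every vertex has period 3.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
print(period_of_vertex(cycle, 0))    # 3

# Adding a loop at vertex 0 makes the vertices aperiodic.
loopy = [[1, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
print(period_of_vertex(loopy, 0))    # 1
```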
Exercise 8.0.40. (a) Show that it is not possible for every state of a finite Markov chain to
have period 0 (in the transition digraph). (b) Construct a Markov chain with n states, such
that all but one state has period 0.
Note that a loop is a closed walk of length 1, so if G has a loop at v then v is automatically
aperiodic. A lazy random walk on a digraph stays at the current vertex with probability 1/2 and
divides the remaining 1/2 evenly among the out-neighbors (pii = 1/2, and if i → j then
pij = 1/(2 deg+(i))). So lazy random walks are aperiodic at each vertex.
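A lazy walk is easy to write down once a stochastic matrix P is available; the following sketch forms (I + P)/2 and checks that it is stochastic with every diagonal entry equal to 1/2.

```python
import numpy as np

def lazy_walk_matrix(P):
    """Lazy version of the walk given by the stochastic matrix P:
    stay put with probability 1/2, otherwise take a step of P."""
    return 0.5 * np.eye(P.shape[0]) + 0.5 * P

# Example: lazy walk on the directed 3-cycle.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
L = lazy_walk_matrix(P)
print(L)              # every diagonal entry is 1/2, so every state has a loop (hence is aperiodic)
print(L.sum(axis=1))  # the lazy walk is still stochastic
```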
It follows that all states of a strongly connected finite Markov chain have the same
period. We call this common value the period of the strongly connected Markov chain. A
Markov chain is aperiodic if every node has period 1.
Exercise 8.0.42. Recall that (undirected) graphs can be viewed as digraphs with each pair
of adjacent vertices being connected in both directions. Let G be an undirected graph viewed
as a digraph. Prove: every vertex of G has period 1 or 2. The period of a vertex v is 2 if and
only if the connected component of G containing v is bipartite.
Exercise 8.0.43. Suppose a finite Markov chain M is strongly connected and NOT aperiodic.
(It follows that the period ≥ 2 (why?).)
Prove: the powers of the transition matrix do not converge.
Hint. If the period is d, prove that the transition graph is a “blown-up directed cycle of length d”
in the following sense: the vertices of the transition graph can be divided into d disjoint subsets
V0 , V1 , . . . , Vd−1 such that (∀k) all edges starting at Vk end in Vk+1 , where the subscript is read
modulo d (wraps around). – Once you have this structure, observe that any t-step transition
would take a state in Vk to a state in Vk+t (the subscript again modulo d).
(b) If G is strongly connected and aperiodic then (∀i ≥ 2)(λ1 > |λi |).
Exercise 8.0.45. Prove that the eigenvalues of the random walk on the directed n-cycle are
exactly the n-th roots of unity. (So all of them have unit absolute value.)
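A quick numerical check of Exercise 8.0.45 (a sketch, not a proof): the transition matrix of the walk on the directed n-cycle is the cyclic shift, and its computed eigenvalues are n-th roots of unity.

```python
import numpy as np

n = 6
C = np.roll(np.eye(n), 1, axis=1)       # cyclic shift: the walk moves i -> i+1 (mod n)
eig = np.linalg.eigvals(C)

print(np.allclose(eig ** n, 1.0))       # True: every eigenvalue is an n-th root of unity
print(np.allclose(np.abs(eig), 1.0))    # True: all eigenvalues lie on the unit circle
```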
Exercise 8.0.46. Let A be a (not necessarily non-negative) n × n matrix and G the associated
digraph. Suppose d is a common divisor of the periods of the vertices of G. Let ω be a complex
d-th root of unity (i. e., ω^d = 1). Prove: if λ is an eigenvalue of A then λω is also an eigenvalue
of A. Hint. Equation (8.3).
Exercise 8.0.48. Prove: if T is the transition matrix of an ergodic Markov chain and
lim_{t→∞} T^t = M then all rows of M are equal.
Exercise 8.0.49. Prove: if a finite Markov chain is ergodic then from any initial distribution,
the process will approach the unique stationary distribution. In other words, let T be the
transition matrix, s the stationary distribution, and q an arbitrary initial distribution. Then

        lim_{t→∞} q T^t = s.
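Exercise 8.0.49 can be illustrated with the 2-state matrix A of Exercise 8.0.13 (strongly connected and aperiodic, hence ergodic). The sketch below iterates q ↦ qT and compares the result with the stationary distribution found in the earlier sketch.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
s = np.array([0.4, 0.6])       # stationary distribution of A (as computed earlier)

q = np.array([1.0, 0.0])       # an arbitrary initial distribution
for _ in range(100):
    q = q @ A                  # one step of the evolution q(t+1) = q(t) A
print(q)                       # approximately [0.4, 0.6]
print(np.allclose(q, s))       # True
```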
The following example illuminates the kind of Markov chains encountered in combinatorics,
theoretical computer science, and statistical physics.
Random recoloring: a class of large Markov chains. Let G = (V, E) be a graph with n
vertices and maximum degree ∆; and let Q ≥ ∆ + 1. Let S be the set of all legal colorings of
G with Q colors, i. e., S is the set of functions f : V → [Q] such that if v, w ∈ V are adjacent
then f (v) ≠ f (w). This “random recoloring process” is a Markov chain which takes S as its
set of states (the “state space”). The transitions from a legal coloring are defined as follows.
We pick a vertex v ∈ V at random, and recolor it by one of the available colors (colors not
used by the neighbors of v), giving each available color an equal chance (including the current
color of v).
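One transition of the random recoloring process can be written out directly. The sketch below (Python; the graph is given as an adjacency list and the colors are the integers 1, . . . , Q) assumes the current coloring is legal, so at least one color, namely the current color of v, is always available.

```python
import random

def recoloring_step(adj, coloring, Q, rng=random):
    """One step of the random recoloring chain.

    adj: dict vertex -> list of neighbors; coloring: dict vertex -> color in 1..Q,
    assumed to be a legal coloring.  Pick a uniformly random vertex v and give it
    a uniformly random color not used by its neighbors (possibly its current color).
    """
    v = rng.choice(sorted(adj))                      # random vertex
    forbidden = {coloring[w] for w in adj[v]}        # colors of the neighbors of v
    available = [c for c in range(1, Q + 1) if c not in forbidden]
    coloring = dict(coloring)                        # do not modify the input coloring
    coloring[v] = rng.choice(available)              # uniform over available colors
    return coloring

# Example: a path on 3 vertices, Q = 3 colors.
adj = {0: [1], 1: [0, 2], 2: [1]}
coloring = {0: 1, 1: 2, 2: 1}
for _ in range(5):
    coloring = recoloring_step(adj, coloring, 3)
    print(coloring)
```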
Exercise 8.0.51. Prove that the number of states of the random recoloring process is between
(Q − ∆ − 1)^n and Q^n. So if Q ≥ ∆ + 2 then the state space is exponentially large.
Exercise 8.0.52. Prove: if Q ≥ ∆ + 2 then the stationary distribution for the random
recoloring process is uniform.
Concluding remarks. Markov chains are widely used models in a variety of areas of
theoretical and applied mathematics and science, including statistics, operations research, in-
dustrial engineering, linguistics, artificial intelligence, demographics, genomics. Markov chain
models are used in performance evaluation for computer systems (“if the system goes down,
what is the chance it will come back?”), in queuing theory (server queuing, intelligent trans-
portation systems). Hidden Markov models (where the underlying states are not directly observed)
are a standard tool in the design of intelligent systems, including speech recognition, natural
language modelling, pattern recognition, and weather prediction.
In discrete mathematics, theoretical computer science, and statistical physics, we often
have to consider finite Markov chains with an enormous number of states. Card shuffling is an
example of a Markov chain with 52! states. The “random recoloring process,” discussed above,
is an example of a class of Markov chains which have exponentially many states compared
to the length of the description of the Markov chain. (The description of an instance of the
random recoloring process consists of specifying the graph G and the parameter Q.)

[Figure: transition digraph of a 5-state Markov chain, with edge weights 1/3, 2/3, and 1.]

We remark
that the random recoloring process is but one instance of a class of Markov chains referred to
as “Glauber dynamics,” originating in statistical physics.
An example from computer science: if the state of a memory unit on a computer chip can
be described by a bit-string of length k then the number of states of the chip is 2^k. (Transitions
can be defined by changing one bit at a time.)
This exponential behavior is typical of combinatorially defined Markov chains.
Because of the exponential growth in the number of states, it is not possible to store the
transition matrices and to compute their powers; the size of the matrices becomes prohibitive
even for moderate values of the description length of the states. (Think of a 52! × 52! matrix
to study card shuffling!)
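Although the transition matrix is far too large to store, simulating such a chain step by step is cheap. The sketch below uses the random-transposition shuffle, chosen here only as an illustrative example of a shuffling chain on the 52! arrangements of a deck.

```python
import random

def shuffle_step(deck, rng=random):
    """One step of the random-transposition chain: swap two uniformly random positions."""
    i, j = rng.randrange(len(deck)), rng.randrange(len(deck))
    deck[i], deck[j] = deck[j], deck[i]
    return deck

deck = list(range(52))          # one of the 52! states
for _ in range(1000):
    deck = shuffle_step(deck)
print(deck[:10])                # the chain is simulated without ever forming a 52! x 52! matrix
```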
The evolution of such “combinatorially defined” Markov chains is therefore the subject
of intense theoretical study. It is of great importance to find conditions under which the
distribution is guaranteed to get close to the stationary distribution very fast (in a polynomial
number of steps). As noted above, this circumstance is called rapid mixing. Note that rapid
mixing takes place much faster than it would take to visit each state! (Why is this not a
paradox?)
8.1 Problems
1. Is M strongly connected?
5. Prove that lim_{t→∞} T^t does not exist. Prove this directly; do not refer to the Perron–
Frobenius theorem.