Chapter 6

Continuous Time Markov Chains
6.1
Definitions and Examples
We consider a process X(t), taking values in a discrete state space S, satisfying the Markov property

P{X(s + t) = j | X(s) = i, X(u), 0 ≤ u < s} = P{X(s + t) = j | X(s) = i},   (6.1)

for all s, t ≥ 0 and i, j ∈ S, together with the time-homogeneity condition

P{X(s + t) = j | X(s) = i} = P{X(t) = j | X(0) = i}.   (6.2)
We will call any process satisfying (6.1) and (6.2) a time-homogeneous continuous
time Markov chain, though a more useful equivalent definition in terms of transition
rates will be given in Definition 6.1.3 below. Property (6.1) should be compared with
the discrete time analog (3.3). As we did for the Poisson process, which we shall see
is the simplest (and most important) continuous time Markov chain, we will attempt
to understand such processes in more than one way.
Before proceeding, we make the technical assumption that the processes under
consideration are right-continuous. This implies that if a transition occurs at time
t, then we take X(t) to be the new state, and note that X(t) ≠ X(t−), where X(t−)
denotes the state just before the jump.
Example 6.1.1. Consider a two state continuous time Markov chain. We denote the
states by 1 and 2, and assume there can only be transitions between the two states
(i.e. we do not allow 1 → 1 or 2 → 2). Graphically, we have

1 ⇄ 2.

Note that if we were to model the dynamics via a discrete time Markov chain, the
transition matrix would simply be

P = [ 0  1
      1  0 ],
and the dynamics are quite trivial: the process begins in state 1 or 2, depending upon
the initial distribution, and then deterministically transitions between the two states.
At this point, we do not know how to understand the dynamics in the continuous
time setting. All we know is that the distribution of the process should only depend
upon the current state, and not the history. This does not yet tell us when the firings
will occur.
It can be shown that, for a process satisfying (6.1) and (6.2), the amount of time Tx
that the process remains in state x after entering it must be exponentially distributed,
with some parameter λ(x) depending upon x. In particular,

E Tx = 1/λ(x).

Thus, the higher the rate λ(x), representing the rate out of state x, the smaller the
expected time for the transition to occur, which is intuitively pleasing.
Example 6.1.2. We return to Example 6.1.1, though now we assume the rate from
state 1 to state 2 is λ(1) > 0, and the rate from state 2 to state 1 is λ(2) > 0. We
commonly incorporate these parameters into the model by placing them next to the
transition arrows in the graph:

1 ⇄ 2,  with rate λ(1) for 1 → 2 and rate λ(2) for 2 → 1.

The dynamics of the model are now clear. Assuming X(0) = 1, the process will
remain in state 1 for an exponentially distributed amount of time, with parameter
λ(1), at which point it will transition to state 2, where it will remain for an exponentially
distributed amount of time, with parameter λ(2). This process then continues
indefinitely.
Example 6.1.2 is deceptively simple as it is clear that when the process transitions
out of state 1, it must go to state 2, and vice versa. However, consider the process
with states 1, 2, and 3 satisfying

1 ⇄ 2 ⇄ 3.

Even if you are told the holding time parameter for state 2, without further information
you cannot figure out whether you transition to state 1 or state 3 after you leave
state 2. Therefore, we see we want to study the transition probabilities associated
with the process, which we do now.
Still letting Tx denote the amount of time the process stays in state x after entering
state x, and which we now know is exponentially distributed with a parameter of λ(x),
we define, for y ≠ x,

pxy def= P{X(Tx) = y | X(0) = x}

to be the probability that the process transitions to state y after leaving state x. It
can be shown that the time of the transition, Tx, and the value of the new state,
y, are independent random variables. Loosely, this follows since if the amount of
time the chain stays in state x affects the transition probabilities, then the Markov
property (6.1) is not satisfied, as we would require to know both the current state
and the amount of time the chain has been there to know the probabilities associated
with ending up in the different states.
We next define

λ(x, y) def= λ(x) pxy.

Since Tx is exponential with parameter λ(x), we have that

P{Tx < h} = 1 − e^{−λ(x)h} = λ(x)h + o(h), as h → 0.
Combining the above, for y ≠ x and mild assumptions on the function λ, we have

P{X(h) = y | X(0) = x} = P{Tx < h, X(Tx) = y | X(0) = x} + o(h)
                       = λ(x)h pxy + o(h)
                       = λ(x, y)h + o(h),   (6.3)
as h → 0, where the o(h) in the first equality represents the probability of seeing
two or more jumps (each with an exponential distribution) in the time window [0, h].
Therefore, λ(x, y) yields the local rate, or intensity, of transitioning from state x to
state y. It is worth explicitly pointing out that, for x ∈ S,

∑_{y≠x} λ(x, y) = ∑_{y≠x} λ(x) pxy = λ(x).
Similarly, for the probability of remaining in state x we have

P{X(h) = x | X(0) = x} = 1 − ∑_{y≠x} P{X(h) = y | X(0) = x}
                       = 1 − ∑_{y≠x} λ(x, y)h + o(h)
                       = 1 − λ(x)h ∑_{y≠x} pxy + o(h)
                       = 1 − λ(x)h + o(h).   (6.4)
Similarly to our consideration of the Poisson process, it can be argued that any
time homogeneous process satisfying the local conditions (6.3) and (6.4) also satisfies
the Markov property (6.1). This is not surprising as the conditions (6.3)-(6.4) only
make use of the current state of the system and ignore the entire past. This leads
to a formal definition of a continuous time Markov chain that incorporates all the
relevant parameters of the model and is probably the most common definition in the
literature.
Definition 6.1.3. A time-homogeneous continuous time Markov chain with transition
rates λ(x, y) is a stochastic process X(t) taking values in a finite or countably
infinite state space S satisfying

P{X(t + h) = x | X(t) = x} = 1 − λ(x)h + o(h),
P{X(t + h) = y | X(t) = x} = λ(x, y)h + o(h),

where y ≠ x, and λ(x) = ∑_{y≠x} λ(x, y).
When only the local rates λ(x, y) are given in the construction of the chain, then it
is important to recognize that the transition probabilities of the chain can be recovered
via the identity

pxy = λ(x, y)/λ(x) = λ(x, y) / ∑_{j≠x} λ(x, j).
Example 6.1.4. Let N be a Poisson process with intensity λ > 0. As N satisfies

P{N(t + h) = j + 1 | N(t) = j} = λh + o(h),
P{N(t + h) = j | N(t) = j} = 1 − λh + o(h),

we see that it is a continuous time Markov chain. Note also that any Poisson process
is the continuous time version of the deterministically monotone chain from Chapter
3.
Example 6.1.5. Consider a chain on the states 1, 2, and 3 with possible transitions
2 → 1, 2 → 3, and 3 → 2, pictured as

1 ← 2 ⇄ 3,

where the local transition rates λ(2, 1), λ(2, 3), and λ(3, 2) have been placed next to
their respective arrows. Note that the holding time in state two is an exponential
random variable with a parameter of

λ(2) def= λ(2, 1) + λ(2, 3),

and the probability that the chain enters state 1 after leaving state 2 is

p21 def= λ(2, 1) / (λ(2, 1) + λ(2, 3)),

whereas the probability that the chain enters state 3 after leaving state 2 is

p23 def= λ(2, 3) / (λ(2, 1) + λ(2, 3)).
This chain could then be simulated by sequentially computing holding times and
transitions.
This example suggests the general recipe for constructing, and simulating, a continuous
time Markov chain from its rates, which we record as Algorithm 1: given that the
chain has just entered state x, compute

λ(x) = ∑_y λ(x, y),

hold in state x for an exponentially distributed amount of time with parameter λ(x),
and then jump to state y with probability

pxy = λ(x, y)/λ(x) = λ(x, y) / ∑_{j≠x} λ(x, j).
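For instance, the following minimal sketch (in Python; the exercises in these notes use Matlab, and the translation is direct) implements Algorithm 1 for a chain specified by a table of rates. The numeric rate values used for the chain 1 ← 2 ⇄ 3 of Example 6.1.5 are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_ctmc(rates, x0, t_end, seed=None):
    """Simulate a continuous time Markov chain path via Algorithm 1:
    exponential holding times with parameter lam(x) = sum_y lam(x, y),
    followed by a jump to y with probability p_xy = lam(x, y) / lam(x).

    `rates` maps pairs (x, y) to the transition rate lam(x, y)."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        out = {y: r for (z, y), r in rates.items() if z == x}
        lam_x = sum(out.values())
        if lam_x == 0.0:                      # absorbing state: no further jumps
            break
        t += rng.exponential(1.0 / lam_x)     # holding time ~ Exp(lam(x))
        if t > t_end:
            break
        targets = list(out)
        probs = [out[y] / lam_x for y in targets]
        x = targets[rng.choice(len(targets), p=probs)]
        times.append(t)
        states.append(x)
    return times, states

# Hypothetical numeric rates for the three-state chain of Example 6.1.5.
rates = {(2, 1): 1.0, (2, 3): 2.0, (3, 2): 0.5}
print(simulate_ctmc(rates, x0=2, t_end=10.0, seed=0))
```

Note that once the chain enters state 1 it has no outgoing rates and the simulation stops, exactly as the graph dictates.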
Example 6.1.6. Consider a chain on {0, 1, 2, . . . } with transition rates

λ(i, i + 1) = λ, for all i ≥ 0,
λ(i, i − 1) = μ, if i > 0,

and λ(0, −1) = 0. Therefore, the probability of the embedded discrete time Markov
chain transitioning up if the current state is i ≠ 0 is λ/(λ + μ), whereas the probability
of transitioning down is μ/(λ + μ); from state 0 the chain always transitions up. When
i ≠ 0, the holding times will always be exponentially distributed with a parameter of
λ + μ.
Example 6.1.7. We generalize Example 6.1.6 by allowing the transition rates to
depend upon the current state of the system. As in the discrete time setting, this
leads to a birth and death process. More explicitly, for i ∈ {0, 1, . . . } we let

λ(i, i + 1) = B(i),
λ(i, i − 1) = D(i),

where D(0) = 0. Note that the transition rates are now state dependent, and may even
be unbounded as i → ∞. Common choices for the rates include
B(i) = λi,  D(i) = μi,

for some scalars λ, μ > 0. Another common model would be to assume a population
satisfies a logistic growth model,

B(i) = ri,  D(i) = (r/K) i²,

where K is the carrying capacity.
Analogously to Example 5.2.18, if we let X(t) denote the state of the system at
time t, we have that X(t) solves the stochastic equation

X(t) = X(0) + Y1( ∫₀ᵗ B(X(s)) ds ) − Y2( ∫₀ᵗ D(X(s)) ds ),   (6.5)

where Y1 and Y2 are independent unit-rate Poisson processes. Letting

A(t) def= Y1( ∫₀ᵗ B(X(s)) ds )  and  D(t) def= Y2( ∫₀ᵗ D(X(s)) ds ),

we see that

P{X(t + h) = x + 1 | X(t) = x}
  = P{A(t + h) − A(t) = 1, D(t + h) − D(t) = 0 | X(t) = x} + o(h)
  = B(x)h (1 − D(x)h) + o(h)
  = B(x)h + o(h).
Example 6.1.8. We will model the dynamical behavior of a single gene, the mRNA
molecules it produces, and finally the resulting proteins via a continuous time Markov
chain. It is an entirely reasonable question to ask whether it makes sense to model
the reaction times of such cellular processes via exponential random variables. The
answer is almost undoubtedly no; however, the model should be interpreted as an
approximation to reality, and it has been quite successful in elucidating cellular dynamics.
It is also a much more realistic model than a classical ODE approach, which
is itself a crude approximation to the continuous time Markov chain model (we will
discuss this fact later).
Consider a single gene that is producing mRNA (this process is called transcription)
at a constant rate of λ1, where the units of time are hours, say. Further, we
suppose the mRNA molecules are producing proteins (this process is called translation)
at a rate of λ2 (#mRNA), for some λ2 > 0. Next, we assume that the mRNA
molecules are being degraded at a rate of dm (#mRNA), and proteins are being
degraded at a rate of dp (#proteins). Graphically, we may represent this system via

G →(λ1) G + M,
M →(λ2) M + P,
M →(dm) ∅,
P →(dp) ∅,

where the rates are written next to the arrows.
It is important to note that this is not the only way to write down these reactions.
For example, many in the biological communities would write M → P, as opposed
to M → M + P. However, we feel it is important to stress, through the notation
M → M + P, that the mRNA molecule is not lost during the course of the reaction.
As the number of genes in the model is assumed to be constant in time, the state
space should be taken to be Z^2_{≥0}. Therefore, we let X(t) ∈ Z^2_{≥0} be the state of the
process at time t, where the first component gives the number of mRNA molecules
and the second gives the number of proteins.
Now we ask: what are the possible transitions in the model, and what are the
rates? We see that the possible transitions are given by addition of the reaction
vectors
(1, 0), (0, 1), (−1, 0), (0, −1),

which occur at respective rates

λ1, λ2 X1(t), dm X1(t), dp X2(t).
Note that the rate of reaction 3, respectively 4, will be zero when X1 (t) = 0, respectively X2 (t) = 0. Therefore, non-negativity of the molecules is assured.
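As a concrete illustration, the following sketch simulates this gene expression model by exactly the recipe of Algorithm 1 (this is the stochastic simulation, or Gillespie, algorithm). The numeric parameter values are hypothetical.

```python
import numpy as np

# Reaction vectors and rate functions for the gene expression model;
# the state is x = (#mRNA, #protein).  Parameter values are hypothetical.
lam1, lam2, dm, dp = 10.0, 5.0, 1.0, 0.5

reactions = [np.array([ 1,  0]),   # transcription: G -> G + M
             np.array([ 0,  1]),   # translation:   M -> M + P
             np.array([-1,  0]),   # mRNA decay:    M -> 0
             np.array([ 0, -1])]   # protein decay: P -> 0

def rates(x):
    return np.array([lam1, lam2 * x[0], dm * x[0], dp * x[1]])

def gillespie(x0, t_end, seed=None):
    rng = np.random.default_rng(seed)
    t, x = 0.0, np.array(x0)
    path = [(t, x.copy())]
    while True:
        r = rates(x)
        total = r.sum()                    # lam(x): total rate out of state x
        t += rng.exponential(1.0 / total)  # exponential holding time
        if t >= t_end:
            break
        k = rng.choice(len(reactions), p=r / total)  # pick the next reaction
        x += reactions[k]
        path.append((t, x.copy()))
    return path

print(gillespie((0, 0), t_end=10.0, seed=0)[-1])
```

Because the transcription rate λ1 is strictly positive, the total rate never vanishes and the loop always terminates by reaching t_end.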
6.2
Explosions
Now that we have a good idea of what a continuous time Markov chain is, we demonstrate
a behavior that is not possible in the discrete time setting: explosions. Recall
that in Algorithm 1, which constructs a continuous time Markov chain, the value
Tn represents the time of the nth transition of the chain. Therefore, the chain so
constructed is only defined up until the (random) time

T∞ def= lim_{n→∞} Tn.

Definition 6.2.1. If

Pi{T∞ = ∞} def= P{T∞ = ∞ | X(0) = i} = 1,  for all i ∈ S,

then we will say the process is non-explosive. Otherwise we will say the process is
explosive.
Note that a process could be explosive even if

Pi{T∞ = ∞} = 1, for some i ∈ S;

see Example 6.2.4. It is not too difficult to construct an explosive
process. To do so, we will first need the following result pertaining to exponential
random variables.
Proposition 6.2.2. Suppose that {En}, n ≥ 1, are independent exponential random
variables with respective parameters λn. Then,

P{ ∑_n En < ∞ } = 1  ⟺  ∑_n 1/λn < ∞.
Proof. We will prove one direction of the implication (the one we will use). For
the other direction, see [13, Section 5.1]. We suppose that ∑_n 1/λn < ∞. Because
∑_n En ≥ 0 and

E[ ∑_n En ] = ∑_n E En = ∑_n 1/λn < ∞,

the sum ∑_n En is a nonnegative random variable with finite mean, and is therefore
finite with probability one.
Example 6.2.3. Consider a pure birth process in which the embedded discrete time
Markov chain is the deterministically monotone chain of Example 3.1.5. Suppose that
the holding time parameter in state i is λ(i). Finally, let X(t) denote the state of the
continuous time process at time t. Note that the stochastic equation satisfied by X
is

X(t) = X(0) + N( ∫₀ᵗ λ(X(s)) ds ).
Suppose that λ(n) = λn² for some λ > 0 and that X(0) = 1. Then the nth holding
time is determined by an exponential random variable with parameter λn², which we
denote by En. Since

∑_n 1/(λn²) < ∞,

Proposition 6.2.2 gives

P{ ∑_n En < ∞ } = 1,
and the process is explosive. The stochastic equation for this model is

X(t) = X(0) + N( λ ∫₀ᵗ X(s)² ds ),

which should be compared with its deterministic analog, the ordinary differential
equation

x′(t) = λ x(t)²,  that is,  x(t) = x(0) + λ ∫₀ᵗ x(s)² ds,

whose solution also blows up in finite time.
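The expected explosion time in this example is E T∞ = ∑_{n≥1} 1/(λn²) = π²/(6λ), which is easily confirmed by Monte Carlo; a sketch with λ = 1:

```python
import numpy as np

# Monte Carlo check of the explosion time in Example 6.2.3 with lambda = 1:
# T_infinity = sum_n E_n with E_n ~ Exp(n^2), so E[T_infinity] = pi^2/6.
rng = np.random.default_rng(0)
N = 10_000   # truncation level; the neglected tail sum_{n>N} 1/n^2 is ~1e-4
scales = 1.0 / np.arange(1, N + 1) ** 2          # Exp(n^2) has mean 1/n^2
samples = rng.exponential(scales, size=(1000, N)).sum(axis=1)
print(samples.mean(), np.pi ** 2 / 6)            # both approximately 1.645
```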
Example 6.2.4. Consider a continuous time Markov chain with state space {−2, −1, 0, 1, 2, . . . }. We suppose that the graph of the model is

−2 ← −1 ← 0 → 1 → 2 → 3 → ··· ,

where the two leftward rates and the rate from 0 to 1 are all equal to 1, while the
rate from i to i + 1 is (i + 1)² for i ≥ 1. Starting from any state i ≥ 1, the chain moves
monotonically to the right and, exactly as in Example 6.2.3, explodes with probability
one; starting from −2 the chain never moves, so P−2{T∞ = ∞} = 1. Hence the process
is explosive, even though Pi{T∞ = ∞} = 1 for some states i.
The following proposition characterizes the most common ways in which a process
is non-explosive. A full proof can be found in [13].
Roughly, the proposition states that the process is non-explosive if any of the following
conditions holds:

1. There is some c > 0 for which λ(i) ≤ c for all i ∈ S.
2. The state space S is finite.
3. X(0) = i and Pi{Xn ∈ T for all n} = 0, where T denotes the set of transient
states of the embedded chain.

The key fact, which follows from Proposition 6.2.2, is that the process is non-explosive
if and only if, with probability one,

∑_n 1/λ(Xn) = ∞.

To show (1), note that if λ(i) ≤ c for all i ∈ S, then

∑_n 1/λ(Xn) ≥ ∑_n 1/c = ∞.
To show (2), we note that if the state space is finite, we may simply take c = max_i λ(i),
and apply (1).

We will now show (3). If Pi{Xn ∈ T for all n} = 0, then entry into T^c is assured. There
must, therefore, be a state i* ∈ T^c which is hit infinitely often (note that this state
can be different for different realizations of the process). Let the infinite sequence of
times when Xn = i* be denoted by {nj}. Then,

∑_n 1/λ(Xn) ≥ ∑_j 1/λ(X_{nj}) = ∑_j 1/λ(i*) = ∞.
6.3
The Kolmogorov Equations
Forward equation

For i, j ∈ S, let Pij(t) def= P{X(t) = j | X(0) = i}. Conditioning on the state at time t
and using the Markov property, we have

P′ij(t) = lim_{h→0} (1/h) [ ∑_{y∈S} P{X(t + h) = j | X(t) = y, X(0) = i} P{X(t) = y | X(0) = i}
          − P{X(t) = j | X(0) = i} ].

However,

∑_{y∈S} P{X(t + h) = j | X(t) = y, X(0) = i} P{X(t) = y | X(0) = i}   (6.6)
  = (1 − λ(j)h) Pij(t) + ∑_{y≠j} λ(y, j)h Piy(t) + o(h),   (6.7)

and so

P′ij(t) = lim_{h→0} (1/h) [ −λ(j)h Pij(t) + ∑_{y≠j} λ(y, j)h Piy(t) + o(h) ].

Thus,

P′ij(t) = −λ(j)Pij(t) + ∑_{y≠j} λ(y, j)Piy(t).   (6.8)
These are the Kolmogorov forward equations for the process. In the biology literature
this system of equations is termed the chemical master equation.
We point out that there was a small mathematical sleight of hand in the above
calculation. To move from (6.6) to (6.7), we had to assume that

∑_{y≠j} oy(h) = o(h),

where we write oy(h) to show that the size of the error can depend upon the state y.
This condition is satisfied for all systems we will consider.
Definition 6.3.1. Let X(t) be a continuous time Markov chain on some state space
S with transition intensities λ(i, j) ≥ 0. Recalling that

λ(i) = ∑_{j≠i} λ(i, j),

the matrix A with entries

Aij = −λ(i),  if i = j,
Aij = λ(i, j), if i ≠ j,

is called the generator, or infinitesimal generator, of the Markov chain.

With this notation, the forward equation (6.8) becomes

P′ij(t) = −λ(j)Pij(t) + ∑_{y≠j} Piy(t) Ayj = (P(t)A)ij,

or, in matrix form, P′(t) = P(t)A with P(0) = I. The solution is the matrix exponential

P(t) = e^{tA} def= ∑_{n=0}^∞ (tⁿ Aⁿ)/n!.

This solution is always valid in the case that the state space is finite.
We make the following observations pertaining to the generator A: the off-diagonal
entries are nonnegative, each row sums to zero, and A (together with an initial
distribution) determines all of the transition rates, and hence the dynamics, of the
chain. For example, the two-state chain of Example 6.1.2 with λ(1) = 3 and λ(2) = 1
has generator

A = [ −3  3
       1 −1 ].
As another example, consider the chain with state space {A, B, C} and graph

A ⇄ B ⇄ C,

where both rates between A and B are λ1 and both rates between B and C are λ2.
The infinitesimal generator of this process is

A = [ −λ1      λ1         0
       λ1  −(λ1 + λ2)    λ2
        0      λ2       −λ2 ],

while the transition matrix of the embedded discrete time Markov chain is

P = [      0        1       0
     λ1/(λ1 + λ2)   0  λ2/(λ1 + λ2)
           0        1       0      ].
Likewise, the generator of the unit-rate Poisson process (the continuous time version
of the deterministically monotone chain) is the infinite matrix

A = [ −1   1   0   0  ···
       0  −1   1   0  ···
       0   0  −1   1  ···
       ⋮              ⋱ ].
If we are given an initial condition, α, then αP(t) is the vector with jth element

(αP(t))j def= ∑_i αi Pij(t) = ∑_i P{X(t) = j | X(0) = i} P{X(0) = i} = P{X(t) = j},   (6.9)

giving the probability of being in state j at time t given an initial distribution of α.
Thus, we see that if α is given, we have

αP(t) = α e^{tA}.
Backward equation

Before attempting to solve a system using Kolmogorov's forward equations, we introduce
another set of equations, called Kolmogorov's backward equations, which are
valid for all continuous time Markov chains. The derivation below follows that of [13].
We begin by finding an integral equation satisfied by Pij(t). We will then differentiate
it to get the backward equations.
Proposition 6.3.5. For all i, j ∈ S and t ≥ 0, we have

Pij(t) = δij e^{−λ(i)t} + ∫₀ᵗ λ(i) e^{−λ(i)s} ∑_{k≠i} Qik Pkj(t − s) ds,

where, as usual,

δij = 1, if i = j,  and  δij = 0, if i ≠ j,

is the Kronecker delta function, and Q is the transition matrix of the embedded
discrete time Markov chain.
Proof. Conditioning on the first jump time of the chain, T1, we have

P{X(t) = j | X(0) = i}
  = P{X(t) = j, T1 > t | X(0) = i} + P{X(t) = j, T1 ≤ t | X(0) = i}.

We handle these terms separately. For the first term on the right hand side of the
above equation, a first transition has not been made. Thus, X(t) = j if and only if
j = i, and does so with a probability of one. That is,

P{X(t) = j, T1 > t | X(0) = i}
  = P{X(t) = j | T1 > t, X(0) = i} P{T1 > t | X(0) = i}
  = δij Pi{T1 > t}
  = δij e^{−λ(i)t}.

For the second term, we will condition on the time of the first jump happening
in (s, s + Δ], for small Δ > 0 (we will eventually take Δ → 0). As the holding time is
exponential with parameter λ(i), this event has probability

∫_s^{s+Δ} λ(i) e^{−λ(i)u} du = λ(i) e^{−λ(i)s} Δ + o(Δ).
Partition [0, t] into N subintervals [sn, sn + Δ], n = 0, . . . , N − 1, of common width
Δ = t/N. Conditioning on which subinterval contains T1, and on the state k ≠ i
entered at that first jump (which occurs with probability Qik), the Markov property
gives

P{X(t) = j, T1 ≤ t | X(0) = i} = lim_{N→∞} ∑_{n=0}^{N−1} λ(i) e^{−λ(i)sn} Δ ∑_{k≠i} Qik Pkj(t − sn)
                               = ∫₀ᵗ λ(i) e^{−λ(i)s} ∑_{k≠i} Qik Pkj(t − s) ds,

where the error terms vanish as Δ → 0. This completes the proof.
Differentiating the integral equation of Proposition 6.3.5 produces the system of
equations (6.10) below, called the Kolmogorov backward equations. Note
that the difference with the forward equations is the order of the multiplication of
P(t) and A. However, the solution of the backward equation is once again seen to
be

P(t) = e^{tA},

agreeing with previous results. To carry out the differentiation, first substitute
u = t − s inside the integral:
Pij(t) = δij e^{−λ(i)t} + ∫₀ᵗ λ(i) e^{−λ(i)s} ∑_{k≠i} Qik Pkj(t − s) ds
       = δij e^{−λ(i)t} + ∫₀ᵗ λ(i) e^{−λ(i)(t−u)} ∑_{k≠i} Qik Pkj(u) du
       = e^{−λ(i)t} ( δij + ∫₀ᵗ λ(i) e^{λ(i)u} ∑_{k≠i} Qik Pkj(u) du ).

Differentiating yields

P′ij(t) = −λ(i) e^{−λ(i)t} δij − λ(i) e^{−λ(i)t} ∫₀ᵗ λ(i) e^{λ(i)u} ∑_{k≠i} Qik Pkj(u) du
          + λ(i) ∑_{k≠i} Qik Pkj(t)
        = −λ(i) Pij(t) + ∑_{k≠i} λ(i) Qik Pkj(t)
        = ∑_k Aik Pkj(t),   (6.10)

since λ(i)Qik = λ(i, k) = Aik for k ≠ i, and Aii = −λ(i). In matrix form, (6.10) reads
P′(t) = AP(t).
Both the forward and backward equations can be used to solve for the associated
probabilities as the next example demonstrates.
Example 6.3.7. We consider a two state, {0, 1}, continuous time Markov chain with
generator matrix

A = [ −α   α
       β  −β ],

for some α, β > 0. We will use both the forward and backward equations to solve for P(t).

Approach 1: Backward equation. While we want to compute Pij(t) for each pair
i, j ∈ {0, 1}, we know that

P00(t) + P01(t) = P10(t) + P11(t) = 1,

for all t ≥ 0, and so it is sufficient to solve just for P00(t) and P10(t).
The backward equation is P′(t) = AP(t), yielding the equations

P′00(t) = −αP00(t) + αP10(t),
P′10(t) = βP00(t) − βP10(t).
We see that

βP′00(t) + αP′10(t) = 0  ⟹  βP00(t) + αP10(t) = c,

and evaluating at t = 0, where P00(0) = 1 and P10(0) = 0, gives c = β, so that
P10(t) = (β/α)(1 − P00(t)). Substituting this into the first equation yields

P′00(t) = −αP00(t) + β(1 − P00(t)) = β − (α + β)P00(t).

Solving this linear equation with P00(0) = 1 gives

P00(t) = β/(α + β) + (α/(α + β)) e^{−(α+β)t},

and therefore

P10(t) = (β/α)(1 − P00(t)) = β/(α + β) − (β/(α + β)) e^{−(α+β)t}.
Approach 2: Forward equation. This is easier. We want to solve

P′(t) = P(t)A.

We now get

P′00(t) = −αP00(t) + βP01(t) = −αP00(t) + β(1 − P00(t)) = β − (α + β)P00(t),
P′10(t) = −αP10(t) + βP11(t) = −αP10(t) + β(1 − P10(t)) = β − (α + β)P10(t),

which are solved as before. In either case, letting t → ∞,

lim_{t→∞} P(t) = (1/(α + β)) [ β  α
                               β  α ],

yielding a common row vector which can be interpreted as a limiting distribution.
There is a more straightforward way to make the above computations: simply
compute the matrix exponential P(t) = e^{tA}.
Suppose that A can be diagonalized as A = QDQ⁻¹, where D is the diagonal matrix
of eigenvalues. Then

e^{At} = ∑_{n=0}^∞ tⁿAⁿ/n! = ∑_{n=0}^∞ tⁿ(QDQ⁻¹)ⁿ/n! = Q ( ∑_{n=0}^∞ tⁿDⁿ/n! ) Q⁻¹ = Q e^{Dt} Q⁻¹,

where e^{Dt}, because D is diagonal, is a diagonal matrix with diagonal elements e^{λi t},
where λi is the ith eigenvalue.
Example 6.3.9. We now solve the above problem using the matrix exponential.
Supposing, for concreteness, that α = 3 and β = 1, we have that the generator
matrix is

A = [ −3  3
       1 −1 ].

It is easy to check that the eigenvalues are 0 and −4, and the associated eigenvectors
are [1, 1]ᵗ and [−3, 1]ᵗ. Therefore,

Q = [ 1 −3        Q⁻¹ = [ 1/4  3/4
      1  1 ],            −1/4  1/4 ],

and

e^{tA} = Q e^{Dt} Q⁻¹ = [ 1/4 + (3/4)e^{−4t}   3/4 − (3/4)e^{−4t}
                          1/4 − (1/4)e^{−4t}   3/4 + (1/4)e^{−4t} ],

so that

lim_{t→∞} e^{tA} = [ 1/4  3/4
                     1/4  3/4 ],

which has a common row. Thus, for example, in the long run, the chain will be in
state zero with a probability of 1/4.
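These computations are easy to verify numerically; a minimal sketch using scipy's matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-3.0,  3.0],
              [ 1.0, -1.0]])

# e^{tA} via scipy, compared with the diagonalization Q e^{Dt} Q^{-1}.
t = 2.0
print(expm(t * A))

evals, Q = np.linalg.eig(A)               # columns of Q are eigenvectors
print(Q @ np.diag(np.exp(evals * t)) @ np.linalg.inv(Q))

# Large-t limit: every row approaches the stationary distribution [1/4, 3/4].
print(expm(100.0 * A))
```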
6.4
Stationary Distributions
In this section we will parallel our treatment of stationary distributions for discrete
time Markov chains. We will aim for intuition, as opposed to attempting to prove
everything, and point the interested reader to [13] and [11] for the full details of the
proofs.
6.4.1
Classification of states
We start by again classifying the states of our process. Viewing a continuous time
Markov chain as an embedded discrete time Markov chain with exponential holding
times makes the classification of states, analogous to Section 3.4 in the discrete time
setting, easy. We will again denote our state space as S.
Definition 6.4.1. The communication classes of the continuous time Markov chain
X(t) are the communication classes of the embedded Markov chain Xn . If there is
only one communication class, we say the chain is irreducible; otherwise it is said to
be reducible.
Noting that X(t) will return to a state i infinitely often if and only if the embedded
discrete time chain does (even in the case of an explosion!) motivates the following.
Definition 6.4.2. State i S is called recurrent for X(t) if i is recurrent for the
embedded discrete time chain Xn . Otherwise, i is transient.
Definition 6.4.3. Let T1 denote the first jump time of the continuous time chain.
We define

σi def= inf{t ≥ T1 : X(t) = i}

to be the first return time to state i. A recurrent state i is called positive recurrent
if Ei σi < ∞, and null recurrent otherwise.
6.4.2
Invariant measures

Recall that equation (6.9) states that if the initial distribution of the process is α,
then αP(t) is the vector whose jth component gives the probability that X(t) = j.
We therefore define an invariant measure in the following manner.

Definition 6.4.4. A measure π = {πj, j ∈ S} on S is called invariant if, for all t > 0,

πP(t) = π.

If this measure is a probability distribution (i.e. sums to one), then it is called a
stationary distribution.
We have the following characterization: π is invariant if and only if πA = 0. To see
this, suppose first that πA = 0. Then, using the backward equation P′(t) = AP(t),

d/dt (πP(t)) = πP′(t) = πAP(t) = 0.

Thus,

πP(t) = πP(0) = π,

for all t ≥ 0.
Now assume that πP(t) = π for all t ≥ 0. Then, for all h > 0, we have

πP(h) = π  ⟹  π(P(h) − I) = 0  ⟹  π (P(h) − I)/h = 0,

and letting h → 0 yields πA = 0.
Next, note that

πA = 0  ⟺  ∑_j πj Ajk = 0, for all k
        ⟺  ∑_{j≠k} πj λ(j) Qjk − πk λ(k) = 0, for all k
        ⟺  ∑_{j≠k} πj λ(j) Qjk = πk λ(k)  ⟺  μQ = μ,  where μk def= λ(k) πk.

That is, the final equation (and hence all the others) holds if and only if μ is invariant
for the Markov matrix Q. Such a μ exists, and satisfies all the desired properties,
by Theorem 3.5.16. Further, we see the invariant measure of the continuous time
process satisfies πk = μk/λ(k), as desired.
Example 6.4.7. Consider the continuous time Markov chain with generator matrix

A = [ −5   3   1   1
       1  −1   0   0
       2   1  −4   1
       0   2   2  −4 ].

Solving πA = 0, normalized so that the entries sum to one, yields

π = ( 14/83, 58/83, 6/83, 5/83 ).
Further, note that the transition matrix for the embedded discrete time Markov chain
is

P = [  0    3/5  1/5  1/5
       1     0    0    0
      1/2   1/4   0   1/4
       0    1/2  1/2   0  ].

Solving for the stationary distribution of the embedded chain, i.e. solving μP = μ,
yields

μ = ( 35/86, 29/86, 6/43, 5/43 ).
To connect this with the embedded chain, note that

[ π1 λ(1), π2 λ(2), π3 λ(3), π4 λ(4) ] = [ 5 · 14/83, 1 · 58/83, 4 · 6/83, 4 · 5/83 ]
                                       = [ 70/83, 58/83, 24/83, 20/83 ]
                                       = (172/83) [ 35/86, 29/86, 6/43, 5/43 ]
                                       = (172/83) μ,

in agreement with the correspondence μk ∝ λ(k) πk derived above.
Example 6.4.8. Consider the discrete time Markov chain on {0, 1, 2, . . . } with
transition matrix

Q = [ 0  1  0  0  ···
      q  0  p  0  ···
      q  0  0  p  ···
      ⋮           ⋱ ],

where p + q = 1. This is the success run chain, and we showed in Problem 2.11 that
the discrete time chain is positive recurrent. Let λ(i) be the holding time parameter
for state i of the associated continuous time Markov chain, and let Em, m ≥ 0,
denote a sequence of independent unit exponential random variables, which are also
independent of the embedded discrete time Markov chain. We will see that λ(i) can
be chosen so that the continuous time chain fails to be positive recurrent. Assuming that
X0 = 0, let T1 denote the first return time to state 0 of the embedded chain. For
example, if T1 = 3, then X0 = 0, X1 = 1, X2 = 2, and X3 = 0. More generally,
we have X0 = 0, X1 = 1, . . . , X_{T1−1} = T1 − 1, and X_{T1} = 0. For m < T1, we let
W(m) = Em/λ(m) be the holding time in state m. We have

m0 def= E0 σ0 = E0 ∑_{m=0}^{T1−1} W(m) = E0 ∑_{m=0}^∞ W(m) 1{m<T1} = ∑_{m=0}^∞ E[ W(m) 1{m<T1} ].

However, we know that the holding times and the embedded chain are independent.
Thus, as 1{m<T1} is simply a statement pertaining to the embedded chain,

E[ W(m) 1{m<T1} ] = [E W(m)][E 1{m<T1}] = (1/λ(m)) P0{m < T1}.
Therefore,

m0 = ∑_{m=0}^∞ (1/λ(m)) P0{m < T1} = 1/λ(0) + ∑_{m=1}^∞ (1/λ(m)) P0{m < T1}.
For m ≥ 1,

P0{m < T1} = ∑_{n=m+1}^∞ P0{T1 = n} = ∑_{n=m+1}^∞ q p^{n−2} = q p^{m−1} ∑_{n=0}^∞ pⁿ = p^{m−1}.
Thus,

m0 = 1/λ(0) + ∑_{m=1}^∞ (1/λ(m)) p^{m−1}.
Of course, we have not chosen λ(m) yet. Taking λ(m) = p^m, we see

m0 = 1/λ(0) + ∑_{m=1}^∞ (1/p^m) p^{m−1} = 1 + ∑_{m=1}^∞ 1/p = ∞.

Hence, with this choice of holding time parameters, the continuous time chain is not
positive recurrent, even though its embedded chain is.
The following example, taken from [11], shows two things. First, it demonstrates
that a transient chain can have an invariant measure. Further, it shows something
stranger still: a transient chain can even have a stationary distribution. Consider a
continuous time Markov chain on {0, 1, 2, . . . } with generator matrix

A = [ −pλ(0)   pλ(0)     0        0      ···
       qλ(1)  −λ(1)    pλ(1)      0      ···
         0     qλ(2)  −λ(2)     pλ(2)    ···
         0      0      qλ(3)   −λ(3)     ⋱
         ⋮                               ⋱ ],

where p + q = 1.
We know that this chain is transient if p > q since the discrete time chain is. We now
search for an invariant measure satisfying

πA = 0,  πi > 0.

Balancing the flows between neighboring states gives π0 pλ(0) = π1 qλ(1) and, in
general, πi pλ(i) = π_{i+1} qλ(i + 1), so that

πi = (p/q)^i (λ(0)/λ(i)) π0,  i ≥ 0.

Now, consider the case when p > q, with 1 < p/q < 2, and take λ(i) = 2^i. Define
ρ def= p/q < 2. Then, taking π0 = 1,

∑_{i=0}^∞ πi = ∑_{i=0}^∞ (ρ/2)^i = 1/(1 − ρ/2) = 2/(2 − ρ) < ∞,

and so the invariant measure may be normalized to produce a stationary distribution,
even though the chain is transient.
6.4.3
Limiting distributions
Example 6.4.13. Let S = {0, 1} with transition rates λ(0, 1) = 3 and λ(1, 0) = 1.
Then the generator matrix is

A = [ −3  3
       1 −1 ].

Solving directly for the left eigenvector of A with eigenvalue 0, normalized to sum to
one, yields

π = [1/4, 3/4],

which agrees with the result found in Example 6.3.9.
As in the discrete time setting, for an irreducible, positive recurrent chain with
stationary distribution π, the stationary distribution also describes the long run
behavior of averages along a single path:

P{ (1/t) ∫₀ᵗ 1{X(s)=i} ds → πi, as t → ∞ } = 1.

Moreover, for any bounded function f : S → R we have

P{ (1/t) ∫₀ᵗ f(X(s)) ds → f̄, as t → ∞ } = 1,

where

f̄ def= ∑_{j∈S} πj f(j) = Eπ f(X∞),

with X∞ distributed according to π.
Thus, as in the discrete time setting, we see that i gives the proportion of time
spent in state i over long periods of time. This gives us an algorithmic way to sample
from the stationary distribution: simulate a single long trajectory and average over
it.
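A minimal sketch of this procedure in Python, applied to the chain of Example 6.4.7 (whose stationary distribution we computed exactly, so the estimate can be checked):

```python
import numpy as np

# Estimate the stationary distribution of the chain in Example 6.4.7 by
# averaging over one long trajectory: pi_i is approximated by the fraction
# of time the path spends in state i.
A = np.array([[-5.,  3.,  1.,  1.],
              [ 1., -1.,  0.,  0.],
              [ 2.,  1., -4.,  1.],
              [ 0.,  2.,  2., -4.]])

rng = np.random.default_rng(1)
x, t, t_end = 0, 0.0, 100_000.0
occupation = np.zeros(4)
while t < t_end:
    lam = -A[x, x]                           # rate out of state x
    hold = rng.exponential(1.0 / lam)
    occupation[x] += min(hold, t_end - t)    # time credited to state x
    t += hold
    p = A[x].clip(min=0.0) / lam             # jump probabilities lam(x,y)/lam(x)
    x = rng.choice(4, p=p)

print(occupation / occupation.sum())         # ~ [14, 58, 6, 5] / 83
print(np.array([14, 58, 6, 5]) / 83)
```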
6.5
The Generator and Dynkin's Formula
A function f : S → R may be viewed as a column vector whose ith element is f(i);
for example, if S = {1, . . . , 100}, then f = [f(1), f(2), . . . , f(100)]ᵗ. As A is a matrix,
it therefore makes sense to discuss the well defined object Af, which is itself a column
vector, and hence a function from S to R.
Next, we note that if the initial distribution for our Markov chain is α, then for
any f we have that

Eα f(X(t)) = ∑_{j∈S} Pα{X(t) = j} f(j)
           = ∑_{j∈S} ∑_{i∈S} αi Pij(t) f(j)
           = ∑_{i∈S} αi (P(t)f)i   (6.11)
           = αP(t)f.
Now recall that the forward equation stated that P′(t) = P(t)A. Integrating this
equation yields

P(t) = I + ∫₀ᵗ P(s)A ds,

and hence

P(t)f = f + ∫₀ᵗ P(s)(Af) ds.   (6.12)

Multiplying (6.12) on the left by α and applying (6.11) gives

Eα f(X(t)) = Eα f(X(0)) + ∫₀ᵗ Eα (Af)(X(s)) ds
           = Eα f(X(0)) + Eα ∫₀ᵗ (Af)(X(s)) ds.
This is a version of Dynkin's formula. For a more formal derivation in the Markov
process setting, see [4, Section 1.1]. In the next section, we will use this formulation
to calculate the mean and variance of a linear birth and death model.
Example 6.5.1. We will re-derive the mean and variance of a Poisson process using
Dynkin's formula. Let X(t) be a Poisson process with intensity λ > 0 defined on
S = {0, 1, 2, . . . }. Then, for any function f : S → R,

Af = [ −λ   λ   0  ···     [ f(0)       [ λ(f(1) − f(0))
        0  −λ   λ  ···   ·   f(1)    =    λ(f(2) − f(1))
        ⋮          ⋱ ]       ⋮  ]          ⋮             ],

so that (Af)(i) = λ(f(i + 1) − f(i)).
Letting f(i) = i, and taking X(0) = 0 with a probability of one, we therefore see that

E f(X(t)) = EX(t) = 0 + ∫₀ᵗ E(Af)(X(s)) ds
          = ∫₀ᵗ E[ λ( f(X(s) + 1) − f(X(s)) ) ] ds
          = ∫₀ᵗ λ ds
          = λt.
Similarly, letting g(i) = i², we find

E g(X(t)) = EX(t)² = ∫₀ᵗ E[ λ( g(X(s) + 1) − g(X(s)) ) ] ds
          = ∫₀ᵗ λ E[ X(s)² + 2X(s) + 1 − X(s)² ] ds
          = ∫₀ᵗ λ E[ 2X(s) + 1 ] ds
          = ∫₀ᵗ λ (2λs + 1) ds
          = λ²t² + λt.

Hence, Var(X(t)) = EX(t)² − (EX(t))² = λt.
Of course, both the mean and variance of a Poisson process are well known.
However, the above method is quite general and is useful in a myriad of applications.
Example 6.5.2. Consider a pure birth process with growth rate λ(i) = bi for some
b > 0. That is, the embedded chain is the deterministically monotone chain and the
holding time in state i is exponentially distributed with parameter bi. Taking X(0) = 1,
for an arbitrary function f we have that

E f(X(t)) = f(1) + ∫₀ᵗ E[ bX(s)( f(X(s) + 1) − f(X(s)) ) ] ds.

Choosing f(i) = i yields

EX(t) = 1 + b ∫₀ᵗ EX(s) ds.

Setting g(t) = EX(t), we see that g′(t) = b g(t) with

g(0) = 1.

Thus,

g(t) = EX(t) = e^{bt}.
This result should be compared with the solution to the ODE linear growth model
x′(t) = bx(t), x(0) = 1, which yields the same function. You will derive the variance
in a homework exercise.
Now consider the (row) vector ei, with a one in the ith component, and zeros
everywhere else. Taking ei as an initial distribution, we see from (6.11) that for all
t ≥ 0,

ei P(t)f = Ei f(X(t)).

In words, the ith component of the vector P(t)f gives Ei f(X(t)). Next, note that
since P(t) = e^{tA},

Af = lim_{h→0} (1/h)( P(h)f − P(0)f ),

and so, componentwise,

(Af)(i) = lim_{h→0} (1/h)( Ei f(X(h)) − f(i) ).   (6.14)
Note that applying this to the indicator function of state j recovers

Aij = lim_{h→0} (1/h) ( P{X(h) = j | X(0) = i} ) = λ(i, j),

when i ≠ j, and

Ajj = lim_{h→0} (1/h) ( P{X(h) = j | X(0) = j} − 1 ) = −λ(j),
for the diagonal elements. Therefore, (6.14) could be taken as an alternative definition
of the generator for a Markov process, though one which views the generator as an
operator and not simply as a matrix that stores the transition intensities. In fact, in
many ways this definition is much more useful than that of simply the matrix with
transition rates.
Example 6.5.3. Consider a process with arrivals coming in at rate λ > 0 and
departures taking place at rate μX(t), where X(t) is the number of items at time t.
Then, for i ≥ 0 we have

(Af)(i) = lim_{h→0} (1/h)( Ei f(X(h)) − f(i) )
        = lim_{h→0} (1/h)[ f(i + 1) Pi{X(h) = i + 1} + f(i − 1) Pi{X(h) = i − 1}
                           + f(i) Pi{X(h) = i} − f(i) ]
        = lim_{h→0} (1/h)[ f(i + 1)λh + f(i − 1)μih + f(i)(1 − λh − μih) − f(i) + o(h) ]
        = λ( f(i + 1) − f(i) ) + μi( f(i − 1) − f(i) ).
So, for example, taking f(y) = y to be the identity, and X(0) = x, we have that

E f(X(t)) = EX(t) = EX(0) + E ∫₀ᵗ (Af)(X(s)) ds
          = x + E ∫₀ᵗ [ λ( X(s) + 1 − X(s) ) + μX(s)( X(s) − 1 − X(s) ) ] ds
          = x + ∫₀ᵗ ( λ − μ EX(s) ) ds.
Setting g(t) = EX(t), we see that g(0) = x and g′(t) = λ − μg(t). Solving this initial
value problem yields the solution

EX(t) = x e^{−μt} + (λ/μ)(1 − e^{−μt}).
The second moment, and hence the variance, of the process can be calculated in a
similar manner.
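As a sanity check on this solution, the following sketch estimates EX(T) by simulation and compares it with the formula above; the parameter values λ = 4, μ = 2, X(0) = 10, T = 3 are hypothetical.

```python
import numpy as np

# Check E X(t) = x e^{-mu t} + (lam/mu)(1 - e^{-mu t}) for the arrival/
# departure model of Example 6.5.3 by direct simulation.
lam, mu, x0, T = 4.0, 2.0, 10, 3.0
rng = np.random.default_rng(2)

def sample():
    t, x = 0.0, x0
    while True:
        total = lam + mu * x                 # total rate out of state x
        t += rng.exponential(1.0 / total)
        if t >= T:
            return x
        x += 1 if rng.random() < lam / total else -1

est = np.mean([sample() for _ in range(20_000)])
print(est, x0 * np.exp(-mu * T) + (lam / mu) * (1 - np.exp(-mu * T)))
```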
6.6
Continuous Time Birth and Death Processes
We again consider a Markovian birth and death process, though now in the continuous
time setting. As in Section 4.2, our state space is S = {0, 1, 2, . . . }. The transition
rates are
λ(n, n + 1) = bn,
λ(n, n − 1) = dn,
λ(n, j) = 0, if |j − n| ≥ 2,

for some bn ≥ 0 and dn ≥ 0 (with d0 = 0), giving the generator matrix

A = [ −b0       b0          0           0        0   ···
       d1   −(b1 + d1)      b1          0        0   ···
       0        d2      −(b2 + d2)      b2       0   ···
       0        0           d3      −(b3 + d3)   b3  ···
       ⋮                                             ⋱ ].
We begin with examples, many of which are analogous to those in the discrete
time setting.
Example 6.6.1. The Poisson process is a birth and death process with bn ≡ λ, for
some λ > 0, and dn ≡ 0.
Example 6.6.3 (Queueing Models). We suppose that arrivals of customers are occurring
at a constant rate of λ > 0. That is, we assume that bn ≡ λ. However,
departures occur when a customer has been served. There are a number of natural
choices for the model of the service times.

(a) (Single server) If there is a single server, and that person always serves the first
person in line, then we take dn = μ > 0, if n ≥ 1, and d0 = 0 (as there is no one
to serve).

(b) (k servers) If there are k ≥ 1 servers, and the first k people in line are always
being served, then for some μ > 0 we take

dn = nμ, if n ≤ k,  and  dn = kμ, if n > k.

(c) (∞ servers) If we suppose that there is an infinite number of servers, then
dn = nμ for some μ > 0.
Example 6.6.4 (Population Models). Suppose that X(t) represents the number of
individuals in a certain population at time t. Assuming the rates of both reproduction
and death are proportional to population size, we have

bn = λn,  dn = μn,

for some λ, μ > 0, and 0 is an absorbing state.

Example 6.6.5 (Population with immigration). Consider the previous system except
bn = λn + ν for some ν > 0, representing an inflow due to immigration. Now 0 is no
longer an absorbing state.

Example 6.6.6. Consider the pure birth process with bn = λn² and dn = 0, as in
Example 6.2.3. We have seen that this population grows so fast that it reaches an
infinite population in finite time with a probability of one.
Example 6.6.7 (Chemistry 1). Consider the chemical system A ⇄ B, with forward
rate constant k1, backward rate constant k2, A(0) + B(0) = N, and mass action
kinetics. Then A(t), giving the number of A molecules at time t, is a birth and death
process with state space {0, 1, . . . , N} and transition rates

bn = k2(N − n), and dn = k1 n.
Example 6.6.8 (Chemistry 2). Consider the chemical system ∅ ⇄ A, in which A
molecules are produced at a constant rate λ and each existing molecule is degraded at
rate μ, and suppose X(t) tracks the number of A molecules. Then this model is a birth
and death process with the exact same transition rates as the infinite server queue of
Example 6.6.3.
Returning to a general system, consider the embedded discrete time Markov chain
of a general continuous time birth and death process. The transition probabilities of
this chain are

pn,n+1 def= pn = bn/(bn + dn),
qn,n−1 def= qn = dn/(bn + dn).

Note that in this case we have pn + qn = 1 for all n ≥ 0. We will first consider when
these processes are recurrent and transient, and then consider positive recurrence.
The following proposition follows directly from Proposition 4.2.6.
Proposition 6.6.9. A continuous time birth and death process is transient if and
only if

∑_{k=1}^∞ (d1 ⋯ dk)/(b1 ⋯ bk) < ∞.

Proof. From Proposition 4.2.6, the embedded chain, and hence the continuous time
chain, is transient if and only if

∑_{k=1}^∞ (q1 ⋯ qk)/(p1 ⋯ pk) < ∞.

Noting that

∑_{k=1}^∞ (q1 ⋯ qk)/(p1 ⋯ pk) = ∑_{k=1}^∞ (d1 ⋯ dk)/(b1 ⋯ bk)

completes the proof.
For example, for the infinite server queue of Example 6.6.3, with bn = λ and dn = nμ,

∑_{k=1}^∞ (d1 ⋯ dk)/(b1 ⋯ bk) = ∑_{k=1}^∞ (μ/λ)^k k! = ∞,

and so the process is recurrent for every choice of λ, μ > 0.
Turning to positive recurrence, an invariant measure π must satisfy πA = 0, which
for a birth and death process reduces to

b0 π0 = d1 π1,  (bk + dk) πk = bk−1 πk−1 + dk+1 πk+1,

for k ≥ 1. Noting that these are the same equations as (4.5) and (4.6), we can conclude
that such a π exists and can be made into a probability vector if and only if

∑_{k=1}^∞ (b0 b1 ⋯ bk−1)/(d1 ⋯ dk) < ∞.
In this case,

π0 = ( ∑_{k=0}^∞ (b0 b1 ⋯ bk−1)/(d1 ⋯ dk) )^{−1},

where the k = 0 term in the above sum is taken to be equal to one, and, for k ≥ 1,

πk = (b0 ⋯ bk−1)/(d1 ⋯ dk) π0.
For the infinite server queue, with bn = λ and dn = nμ, we have

(b0 ⋯ bk−1)/(d1 ⋯ dk) = λ^k/(μ^k k!) = (λ/μ)^k (1/k!),

so that

∑_{k=0}^∞ (b0 ⋯ bk−1)/(d1 ⋯ dk) = ∑_{k=0}^∞ (λ/μ)^k (1/k!) = e^{λ/μ} < ∞.
Therefore, a stationary distribution exists, and since we already know the chain is
recurrent we may conclude it is positive recurrent. Note that the stationary distribution
is Poisson(λ/μ), and

πk = e^{−λ/μ} (λ/μ)^k / k!,  for k ≥ 0.
In the next chapter, we will see why many models from chemistry and biology have
stationary distributions that are Poisson.
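This can be checked numerically by truncating the state space at a large N and verifying that the Poisson(λ/μ) vector is a left null vector of the truncated generator; a sketch, with hypothetical λ and μ:

```python
import numpy as np
from scipy.stats import poisson

# Verify that pi = Poisson(lam/mu) satisfies pi A = 0 for the infinite
# server queue (b_n = lam, d_n = n mu), truncated at a large level N.
lam, mu, N = 3.0, 1.5, 60
A = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    if n < N:
        A[n, n + 1] = lam            # birth rate b_n
    if n > 0:
        A[n, n - 1] = n * mu         # death rate d_n
    A[n, n] = -A[n].sum()            # rows of a generator sum to zero

pi = poisson.pmf(np.arange(N + 1), lam / mu)
print(np.abs(pi @ A).max())          # ~ 0, up to truncation error
```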
We close by considering the generator for a continuous time birth and death
process. It is straightforward to show that it satisfies

(Af)(i) = bi( f(i + 1) − f(i) ) + di( f(i − 1) − f(i) ),

for all i ≥ 0. This fact can be used in the case of linear intensities to easily calculate
the time-dependent moments.
Example 6.6.11. Consider the linear birth and death process with transition rates

bi = λi,  di = μi,

where λ, μ > 0. The generator of the process satisfies

(Af)(i) = λi( f(i + 1) − f(i) ) + μi( f(i − 1) − f(i) ),

for all i ≥ 0. Taking f(i) = i to be the identity, and X(0) = x, we have that

E f(X(t)) = EX(t) = EX(0) + E ∫₀ᵗ (Af)(X(s)) ds
          = x + E ∫₀ᵗ [ λX(s)( X(s) + 1 − X(s) ) + μX(s)( X(s) − 1 − X(s) ) ] ds
          = x + (λ − μ) ∫₀ᵗ EX(s) ds.
Solving yields

EX(t) = x e^{(λ−μ)t},   (6.15)

which, it is worth noting, is the solution to the ordinary differential equation x′(t) =
(λ − μ)x(t) that is the standard deterministic model for this system.
6.6.1
Parameter inference
While not a topic covered in this course to any great depth, we turn briefly to the
question of parameter inference. We do so by considering the linear birth and death
process of Example 6.6.11. Specifically, we suppose that we believe our system can
be modeled as a linear birth and death system; however, we do not know the key
parameters λ and μ.

We first note that we have multiple options for how to model the dynamics of
the process, with the two most common choices being (i) deterministic ODEs and (ii)
the stochastic model considered in Example 6.6.11. If we choose to model using ordinary
differential equations, then the time dependent solution to the process, equation
(6.15), only depends upon the parameter λ − μ, and not on the actual values of λ
and μ. Therefore, there will not be a way to recover λ and μ from data, only their
difference.
Perhaps surprisingly, more can be accomplished in the stochastic setting. While
the mean value of X(t) is a function of the single parameter λ − μ given in equation
(6.15), we can also solve for the variance, which turns out to be (this is the subject of
a homework exercise)

Var(X(t)) = X(0) ((λ + μ)/(λ − μ)) ( e^{2(λ−μ)t} − e^{(λ−μ)t} ).   (6.16)
Note that this is a function of both the difference and the sum of λ and μ. Therefore,
we may use the mean and variance of any data to approximate both λ and μ. In this
way, we see that having noisy data actually helps us solve for parameters.
For example, suppose that we know that X(0) = 60 (perhaps because we begin
each experiment with 60 bacteria), and after a number of experiments we found
the mean of the process at time 1 is 22.108, and the variance is 67.692. Using the
equations for the mean and variance, equations (6.15) and (6.16) respectively, this
reduces to solving the system of equations

λ − μ = −0.9984,
λ + μ = 4.8406,

yielding

λ = 1.9211 and μ = 2.9195.
For comparison's sake, the data reported above was actually generated from 1,000
samples of a process with actual values of λ = 2 and μ = 3.
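The inversion of equations (6.15) and (6.16) carried out above takes only a few lines; a sketch reproducing the numbers of this section:

```python
import numpy as np

# Recover lam and mu from the sample mean and variance at t = 1, using
# equations (6.15) and (6.16), with X(0) = 60 as in the text.
x0, t, mean, var = 60, 1.0, 22.108, 67.692

diff = np.log(mean / x0) / t              # lam - mu, inverted from (6.15)
# (6.16): var = x0 * (lam+mu)/(lam-mu) * (exp(2(lam-mu)t) - exp((lam-mu)t)),
# which is linear in lam + mu once lam - mu is known:
s = var * diff / (x0 * (np.exp(2 * diff * t) - np.exp(diff * t)))
lam, mu = (s + diff) / 2, (s - diff) / 2
print(lam, mu)                            # ~ 1.9211 and ~ 2.9195
```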
6.7
Exercises
5. Consider the continuous time Markov chain on {1, 2, 3} with generator matrix

A = [ −2  1/2  3/2
       1   −4   3
       0    1  −1 ].

Write a Matlab code that simulates a path of this chain. To do so, use the
construction provided in the notes (i.e. simulate the embedded chain and holding
times sequentially). Using this code, and assuming that X(0) = 1, estimate
EX(3) by averaging over 10,000 such paths. Note that you will need to make
sure you break your for or while loop after you see that the time will go
beyond T = 3, without updating the state for that step.
6. Consider the linear birth and death process of Example 6.6.11. Suppose that
X(0) = 100, and at time T = 2 the mean of 100 experiments is 212, and the
variance is 1,100. Estimate the parameters λ and μ.