Markov 123
Markov 123
Markov 123
Kishor Trivedi
A probability of failure
A failure rate
A distribution of time to failure
Steady-state and instantaneous unavailability
Markov chain
To model complex interactions between components,
use other kinds of models like Markov chains or
more generally state space models.
Many examples of dependencies among system
components have been observed in practice and
captured by Markov models.
MARKOV CHAINS
State-space based model
States represent various conditions of the system
Transitions between states indicate occurrences of
events
State-Space-Based Models
States and labeled state transitions
State can keep track of:
Number of functioning resources of each type
States of recovery for each failed resource
Number of tasks of each type waiting at each
resource
Allocation of resources to tasks
A transition:
Can occur from any state to any other state
Can represent a simple or a compound event
MARKOV CHAINS
(Continued)
Chapter 8
Continuous Time Markov Chains
Formal Definition
A discrete-state continuous-time stochastic process
is called a Markov chain if
for t0 < t1 < t2 < . < tn < t , the conditional pmf
satisfies the following Markov property:
pmf of X(t)
Using the theorem of total probability
Homogenous CTMCs
CTMC Dynamics
Chapman-Kolmogorov Equation
Transition Rates
Define the rates (probabilities per unit time):
net rate out of state j at time t:
Kolmogorov Differential
Equation
The transition probabilities and transition rates are,
Homogeneous CTMC
Specialize to HCTMC (Kolmogorov diff. eqn) :
CTMC Measures
(LTODE)
UP
1
DN
0
1
MTTF
1
MTTR
0 1
1
0
2 unknowns, 2 equations, but there is only
one independent
equation.
0 1 1
1
1 1 1 1
1
MTTF
1 Ass
1 1 MTTF MTTR
1
MTTR
1 Ass
MTTF MTTR
MTTR
MTTF MTTR
* 8760*60
d 1
0 (t ) 1 (t )
dt
d 1
(1 1 (t )) 1 (t )
since 0 (t ) 1 (t ) 1 we have
dt
d 1
( ) 1 (t )
dt
This equation can be solved to obtain assuming 1(0)=1
( ) t
1 (t ) A(t )
3)
R (t ) e t
2
2
Non-shared (independent)
repair
Shared repair
2 2 1
( ) 1 2 2 0
1 0
Hence
Since
We have
or
2
1 1 0
2
0 1 2 1
0 0
1
0
2
1 2
2
0 1
2
1 Anon shared
2
2
1
2
A larger example
Return to the 2 control and 3 voice channels example and
assume that the control channel failure rate is c, voice channel
failure rate is v.
Repair rates are c and v, respectively. Assuming a single
shared repair facility and control channel having preemptive
repair priority over voice channels, draw the state diagram of a
Markov availability model. Using SHARPE GUI, solve the
Markov chain for steady-state and instantaneous availability.
WFS Example
A Workstations-Fileserver
Example
Computing system consisting of:
A file-server
Two workstations
Computing network connecting them
2w
2,1
1,1
0,1
w
f
w
f
2w
2,0
1,0
0,0
Since all states are reachable from every other states, the
CTMC is irreducible. Furthermore, all states are positive
recurrent.
Markov Model
Let {X(t), t > 0} represent a finite-state
Continuous Time Markov Chain (CTMC)
with state space .
Infinitesimal Generator Matrix Q = [qij]:
qij (i !j) : transition rate from state i to state j
qii = - qi=
f
2w
0
0
0
( f 2 w )
)
0
2
0
0
f
f
w
w
0
( w f w )
f
w
0
= w
0
0
)
0
f
f
w
w
0
0
w
0
( w f ) f
0
0
0
0
f
f
Q 0,
after solving A
for
SS
availability
obtain
( 2,1) Steady-state
(1,1)
given (0)
A (t ) ( 2 ,1) (t ) (1,1) (t )
Ass
lim
A
(
t
)
Availability (Continued)
Interval Availability:
Steady-State Availability:
A (t )
I
A( x) dx
t
d
L(t ) L(t )Q (0) ,
dt
L ( 0) 0
Interval availability
AI (t )
L( 2,1) (t ) L(1,1) (t )
t
Ass 0.9999
2 , 1D , 1 , 0
Asys 2 1 (or 1D )
Closed-form
2
2
0 [1
] 1
( ) 22
0
1
E
1
E
2
1
1D
( ) E
2
1
2
22 E
1
A 2 1 r 1 D
2
2
( 2
r
)/ E
2
( )
2-components availability
model : delay + imperfect
coverage
Assumptions
Parameters
Process MTTF = 10 days (1/p)
Node MTTF = 20 days (1/n)
Process polling interval = 2 seconds (1/p)
Mean process restart time = 30 seconds (1/p)
Mean process failover time = 2 minutes (1/n)
Switching time with mean 1/ s
C = 0.95
Outline
Description of the system
Using a rate approximation
Using a 3-stage Erlang approximation to a
uniform distribution
Using a Semi-Markov model - approximation
method using a 3-stage Erlang distribution
Using equations of the underlying SemiMarkov Process
Solutions for the models
Outline
Description of the system
Using a rate approximation
Using a 3-stage Erlang approximation to a
uniform distribution
Using a Semi-Markov model - approximation
method using a 3-stage Erlang distribution
Using equations of the underlying SemiMarkov Process
Solutions for the models
(1-c)
(c+d)
Protection
Switch
Failure
Failure to
Detect
Protection
Fault
Simplex
(1)
2
Failed
(0)
Normal:
1
Protection Switch Failure:
2
Simplex:
3
Failure to detect protection fault: 4
Failed:
5
N=1
(1-c)
(c+d)
Protection
Switch
Failure
Simplex
(1)
2
Failed
(0)
Time to diagnostic is
exponentially distributed
with mean T/2
2/T
Failure to
Detect
Protection
Fault
Normal:
1
Protection Switch Failure:
2
Simplex:
3
Failure to detect protection fault: 4
Failed:
5
N=1
Outline
Description of the system
Using a rate approximation
Using a 3-stage Erlang approximation to a
uniform distribution
Using a Semi-Markov model approximation method using a 3-stage
Erlang distribution
Using equations of the underlying SemiMarkov Process
Solutions for the models
1.8
Comparison of
probability density functions (pdf)
1.6
1.4
1.2
1
pdf
0.8
0.6
0.4
0.2
0
T=1
time
1.2
cdf
0.8
0.6
U(0,1) cdf
0.4
0.2
T=1
time
(1-d)
(1-c)
(c+d)
Protection
Switch
Failure
Time to
diagnostic is
uniformly
distributed over
(0,T) approximated
by a 3-stage
Erlang with
mean T/2
Simplex
(1)
s1
s2
6/T
Failure to
Detect
Protection
Fault
6/T
Failed
(0)
6/T
Outline
Description of the system
Using a rate approximation
Using a 3-stage Erlang approximation to a
uniform distribution
Using a Semi-Markov model approximation method using a 3-stage
Erlang distribution
Using equations of the underlying SemiMarkov Process
Solutions for the models
Normal
(1+1)
(1-d)
(1-c)
(c+d)
Time to
diagnostic is
uniformly
distributed over
(0,T) approximated
by a 3-stage
Erlang
distribution
with mean T/2
Protection
Switch
Failure
Simplex
(1)
Failure to
Detect
Protection
E(t)
Fault
Failed
(0)
k 0
( T6 t ) k
k!
T6 t
Outline
Description of the system
Using a rate approximation
Using a 3-stage Erlang approximation to a
uniform distribution
Using a Semi-Markov model approximation method using a 3-stage
Erlang distribution
Using equations of the underlying SemiMarkov Process
Solutions for the models
0
0
1-c
2
0
0
0
0
cd
2
0
T
1
(
1
e
)
T
1
1-d
2
0
0
0
0
1
T
(1 e
0
v v,
1 c
1 2
v,
1 d
1 2
v,
v1 , (
(1 c )
2( )
1 d
2
(1
2 ( )
1
T
(1 e
)))v1
( )t
H 2 (t ) 1 e ( )t ,
,
H 5 (t ) 1 e
tT
1,
tT
H 4 (t )
1 (1 Tt )e t ,
2 t
h1
1
2
, h2
, h3
, h4
1
1
T 2
(1 e
vi hi
5
v jhj
j 1
Unavailability 2 5
), h5
1
2
Outline
Description of the system
Using a rate approximation
Using a 3-stage Erlang approximation to a
uniform distribution
Using a Semi-Markov model approximation method using a 3-stage
Erlang distribution
Using equations of the underlying SemiMarkov Process
Solutions for the models
Results obtained
Steady state availability
Steady state
unavailability
Avg. downtime in
steady state
(Minutes/year)
Exp(2/T)
9.99989992e-01
1.00075983e-05
5.25999
1.99977503e+00
9.99989992e-01
1.00075983e-05
5.25999
1.99977503e+00
Semi-Markov (3-stage
Erlang approx. with mean
T/2)
9.99989992e-01
1.00075983e-05
5.25999
1.99977503e+00
Semi-Markov Process
equations-U([0,T])
9.99989992e-01
1.00075983e-05
5.25999
1.99977503e+00
Absorbing state
d 0 (t )
1 (t )
dt
MTTF
2
2 2
, we get:
factor (1 22 ) 1
3
States (0,1), (1,0) and (2,0) become absorbing states while (2,1) and (1,1) are transient states.
Note: we have made a simplification that, once the CTMC reaches a system failure state, we
do not allow any more transitions.
(i , j )
( i , j ) ( x ) dx ,
QB B (0)
QB B (0)
(i, j ) B
MTTA
( i , j )B
(i , j )
QB
( f 2w )
2w
( w f w )
First Solve
( f 2 w )
2w
( f w )
Markov model
with imperfect coverage
Next consider a modification of the above
example proposed by Arnold as a model of
duplex processors of an electronic
switching system. We assume that not all
faults are recoverable and that c is the
coverage factor which denotes the
conditional probability that the system
recovers given that a fault has occurred.
The state diagram is now given by the
following picture:
Markov model
with imperfect coverage
(Continued)
Markov model
with imperfect coverage
(Continued)
(1 2c)
MTTF
2[ (1 c)]
It should be clear that the system MTTF and system reliability are
critically dependent on the coverage factor.