CPS Mid Sem
)-Final Year
Lecture 1 & 2
09-06-2021 09:00 to 11:00 AM
Unit-1
Summary
The Alarm Problem
"My definition of an expert in any field is a person who knows enough about what is really going on to be
scared." - P.J. Plauger
What is an Alarm?
"An alarm is an audible and/or visible means of indicating to the operator an
equipment malfunction, process deviation, or abnormal condition requiring a
response."
Poorly performing alarm systems have resulted in serious incidents and accidents
(examples: the Bhopal incident in India, Three Mile Island in the US, an offshore platform in Mexico/US).
What keeps the process within limits?
Can alarms alone maintain chemical process safety?
[Figure (exida): Alarm Purpose. Process value (PV) during a typical process upset; labels: Alarm, Abnormal, Response, Shutdown.]
Alarms are One of the First Protection Layers
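[Figure: layers of protection, from outermost to innermost]
Community emergency response
Plant emergency response
Physical protection (label only partly legible: "...ve protection")
Safety Instrumented System (trip)
Operator intervention (alarm)
Process control (loop)
Process design
[Axis label: Process value]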
Adding an alarm needs wiring from sensor to panel lamp through Annunciator panel to its window!
The Good Old Days... (slide: Rockwell Automation)
Misapplication of Modern Technology
[Screenshot: an alarm list crowded with equipment status messages, e.g. "K22 COAL WEIGHFEEDER FLOW TOTALIZER",
"WEIGHFEEDER ZERO SPEED SWITCH", "DOME SUMP PUMP", "BAGHOUSE", "DUST CYCLONE VALVE", "SOLENOID VALVE",
next to a configuration view listing BOOL and STRING fields such as BEEP_ON.]
Who can help ? (Competent People)
People who are acknowledged experts in the alarm management field, with in-
depth understanding of the historical and current problem, the science and
literature, the studies and standards, and the range of solutions
People with in-depth knowledge of process control, distributed control systems
(DCS/PLC), human-machine interfaces (HMI), process networks, and critical
condition management
People with experience in every stage of a successful alarm system improvement
project, along with many examples of successful projects
People who understand work processes based on successful experience in
different PROCESS industry segments. You want to know what your industry is
doing, what are the best and most efficient practices, and frankly, what the worst
practices are.
The ANSI/ISA-18.2-2009 (2016)
Alarm Management Standard
CHAPTER 1
Alarm Management
Best Practices:
Highly Condensed
Having decided to investigate this area, how do you proceed? Your time
and resources are always limited. The subject is complex. Alarm system
improvement involves an interlinked combination of technology and
work processes.
6 Alarm Management: A Comprehensive Guide, Second Edition
The issuance of ISA-18.2 is a significant and important event for the pro-
cess industries. It sets forth the work processes for designing, implement-
ing, operating, and maintaining a modern alarm system, presented in a
lifecycle format. This standard will definitely have a regulatory impact,
but more on that later.
There is no conflict between this book's seven step approach and the
ISA-18.2 life cycle approach; there is only some different nomenclature
and arrangement of the topics. The seven step approach is well proven
for efficiency and effectiveness.
These first three steps are placed first in the process because they collec-
tively provide the most improvement for the least expenditure of effort.
They provide the best possible start and the fundamental underpinnings
for the remainder of steps necessary for effective alarm management.
These first three steps are universally needed for the improvement of an
alarm system. The following steps generally involve more time, resourc-
es, and expense. Some of them may or may not be needed depending on
the performance characteristics of your system.
best solutions, which are provided here in this book. One result of a D&R
effort is the creation of a Master Alarm Database, which contains the post-
rationalized alarm configuration with changed setpoints, priorities, and so
forth. A Master Alarm Database has several uses.
1.5 Summary
If you know or suspect you have an alarm problem, read this book and
begin doing the things it recommends.
16-07-2021 09:00 to 11:00
Operator Response Time Cycle
What makes alarms a problem?
When is an alarm system considered "overloaded" as per the standard?
Definition of "alarm flood" and its permitted range as per the standard
Definition of "nuisance alarm"
Stale alarms and their recommended range as per the ISA-18.2 standard
The operator needs to detect the alarm easily (Detect),
to think about the reasons why (Diagnose),
and to take an action (Respond).
What makes a successful operator response?
Sensor, Logic Solver, Final Element
Todd Stauffer, exida
What makes alarms a PROBLEM? (exida)
Common Alarm Management "Villains"
o Alarm Overload (Steady State)
o Alarm Floods (After an Upset)
o Nuisance Alarms
o Chattering Alarms
o Standing / Stale Alarms
o Redundant Alarms
o Alarms which have no response
o Alarm priority not meaningful
[Figure labels: Good Alarm, Bad Alarm]
When is an alarm system considered overloaded?
What is the recommended practice as per ISA-18.2 for alarms per hour?
Todd Stauffer
When does an overload / alarm flood occur?
Can a flood be contained? How about an alarm flood?
What is the permitted range of alarm flood, in percentage, as per the standard?
Alarm Flood (Alarm Shower)
Definition: A condition during which the alarm rate is greater than the operator
can effectively manage (e.g., more than 10 alarms per 10 minutes).
Usually triggered by a single event (plant upset)
Most complex alarm problem to solve
Results
Increased likelihood of the operator missing an alarm
Potential to overwhelm the operator
Occurs during the most critical time for the operator
Metrics (each has a target value in the standard)
Percentage of 10-minute periods containing more than 10 alarms
Maximum number of alarms in a 10-minute period
Percentage of time the alarm system is in flood condition
(Copyright exida.com LLC 2000-2018)
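The first two flood metrics can be computed directly from an alarm journal. The following is a minimal sketch rather than a vendor tool. It assumes the journal is just a list of annunciation timestamps and that the analysis window is a whole number of fixed 10-minute periods. The function name flood_metrics is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=10)   # fixed 10-minute analysis periods
FLOOD_THRESHOLD = 10             # "more than 10 alarms per 10 minutes"


def flood_metrics(alarm_times, start, end):
    """Return (percent of periods with more than 10 alarms, peak alarms in one period)."""
    n_periods = (end - start) // PERIOD          # whole periods in the window
    if n_periods <= 0:
        raise ValueError("analysis window must cover at least one 10-minute period")

    counts = Counter()
    for t in alarm_times:
        if start <= t < end:
            counts[(t - start) // PERIOD] += 1   # index of the period containing t

    periods_over = sum(1 for c in counts.values() if c > FLOOD_THRESHOLD)
    peak = max(counts.values(), default=0)
    return 100.0 * periods_over / n_periods, peak


if __name__ == "__main__":
    start = datetime(2021, 7, 16, 9, 0, 0)
    end = start + timedelta(hours=1)
    # 12 alarms in the first period (a flood), 3 in the second, none afterwards
    alarms = [start + timedelta(seconds=30 * i) for i in range(12)]
    alarms += [start + timedelta(minutes=10, seconds=60 * i) for i in range(3)]
    percent_over, peak = flood_metrics(alarms, start, end)
    print(f"{percent_over:.1f}% of periods above threshold, peak {peak} alarms")
```

The third metric, percentage of time in flood, also needs a rule for when a flood ends. These notes do not state one, so it is left out of the sketch.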
"The Boy Who Cried Wolf": a universally known story
What is the moral of the story with respect to nuisance alarm bells?
Definition of "Nuisance Alarm" (exida)
Nuisance Alarm
An alarm that annunciates excessively, unnecessarily, or does not
return to normal after the correct response is taken (e.g., chattering,
fleeting, or stale alarms).
Results
Desensitize the operator
Distract the operator
Lead to missed alarms
Increase operator stress
Ref: ISA-18.2, IEC 62682
What happens if the level high alarm (LAH) is set at 70% of level and the level moves
up and down between 69.9% and 70.1%?
Chattering Alarm
An alarm that repeatedly transitions between the alarm state and the normal
state in a short period of time (e.g., 3x / min)
Clears quickly whether the operator responds or not
Makes up a large percentage of the system's total alarm loading
A single chattering alarm may produce thousands of events
Most common nuisance alarm
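The LAH question (setpoint at 70%, level oscillating between 69.9% and 70.1%) can be tried out numerically. The sketch below is an illustration only. It assumes a simple high alarm that sets at or above the setpoint and clears only once the level falls below setpoint minus deadband. The function name and the 1% deadband are hypothetical choices, not values from a standard.

```python
def count_alarm_activations(levels, setpoint=70.0, deadband=0.0):
    """Count how often a high alarm activates over a sequence of level readings.

    The alarm sets when level >= setpoint and clears only when
    level < setpoint - deadband (hysteresis).
    """
    active = False
    activations = 0
    for level in levels:
        if not active and level >= setpoint:
            active = True
            activations += 1
        elif active and level < setpoint - deadband:
            active = False
    return activations


if __name__ == "__main__":
    # Level hovering around the 70% setpoint: 69.9, 70.1, 69.9, 70.1, ...
    noisy_level = [69.9, 70.1] * 30
    print(count_alarm_activations(noisy_level))                # 30 activations: chattering
    print(count_alarm_activations(noisy_level, deadband=1.0))  # 1 activation
```

Without a deadband every upward crossing produces a new alarm event. With a 1% deadband the alarm stays active until the level actually drops below 69%, so the same signal produces a single event.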
Stale Alarm (exida)
An alarm that remains in the alarm state for an extended
period of time (e.g., 24 hours).
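A stale-alarm report is a simple filter on how long each alarm has been standing. The sketch below assumes the active alarms are available as a mapping from tag to the time the alarm went active, and uses the 24-hour example figure as the limit. The function name and the tag PIC102 are hypothetical.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=24)   # example limit from the definition above


def stale_alarms(active_since, now):
    """Return the tags whose alarm has been active for 24 hours or more."""
    return sorted(tag for tag, since in active_since.items()
                  if now - since >= STALE_AFTER)


if __name__ == "__main__":
    now = datetime(2021, 7, 17, 9, 0, 0)
    active = {
        "LIC301": now - timedelta(hours=30),   # standing for 30 hours: stale
        "PIC102": now - timedelta(hours=2),    # recent alarm: not stale
    }
    print(stale_alarms(active, now))
```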
Bad Actors (aka Frequently Occurring Alarms) (exida)
Definition: A tag that produces a large number of alarm events
80/20 Rule: "10-20 tags produce 80% of the alarms"
Often caused by system / instrument diagnostic alarms
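Bad actors can be found with a frequency count over the alarm journal. The sketch below assumes the journal has been reduced to one tag name per alarm event and returns the smallest set of most frequent tags that covers a chosen share of all events. The function name and the tag names other than LIC301 are hypothetical.

```python
from collections import Counter


def bad_actors(event_tags, share_percent=80):
    """Return [(tag, count), ...] for the most frequent tags that together
    account for at least share_percent of all alarm events."""
    counts = Counter(event_tags)
    total = sum(counts.values())
    selected = []
    covered = 0
    for tag, n in counts.most_common():
        if covered * 100 >= share_percent * total:
            break
        selected.append((tag, n))
        covered += n
    return selected


if __name__ == "__main__":
    events = ["LIC301"] * 50 + ["PIC102"] * 30 + ["TI7"] * 10 + ["FI9"] * 10
    for tag, n in bad_actors(events):
        print(tag, n)
```

In this made-up journal two tags out of four cover 80% of the events, so fixing those two gives most of the benefit.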
Alarm priority: be selective in prioritisation
5% High priority; 15% Medium priority; 80% Low priority
High / Medium / Low vs. response in 5 min / 15 min / 30 min
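The configured priority distribution can be compared against the 5 / 15 / 80 split with a few lines of code. This sketch assumes the alarm database can be exported as a list with one priority label per configured alarm. The function names are hypothetical.

```python
TARGET_PERCENT = {"High": 5.0, "Medium": 15.0, "Low": 80.0}


def priority_distribution(priorities):
    """Return the percentage of configured alarms at each priority."""
    total = len(priorities)
    if total == 0:
        raise ValueError("no alarms configured")
    return {p: 100.0 * priorities.count(p) / total for p in TARGET_PERCENT}


def deviation_from_target(priorities):
    """Return actual minus target percentage for each priority."""
    actual = priority_distribution(priorities)
    return {p: actual[p] - TARGET_PERCENT[p] for p in TARGET_PERCENT}


if __name__ == "__main__":
    configured = ["High"] * 6 + ["Medium"] * 6 + ["Low"] * 8
    print(priority_distribution(configured))   # far too many High alarms
    print(deviation_from_target(configured))
```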
17-07-2021 09:00 to 11:00
Common DCS and Alarm Displays
Modern large screen in Control Room in addition to table top Computer
Alarm Management Life Cycle
Key Performance Indicators
Six DCS screen shots (Operation & Engineering Configuration)
Actual Alarm Summary Display, explained for the topmost alarm dated 14th March 2013:
Time: 15:15:10
Tag no.: LIC 301
Process value set point (HH): PVHH 110 (cms)
Description of equipment: 56 TPD Polish Brine Tank Level
Actual value: 110.505 (cms)
PLC manufacturer claims features: Rockwell Automation PlantPAx Process Automation System, "It's More Than a DCS!"
Other Useful Alarm System KPIs
Number of long standing / stale alarms
Priority distribution of alarms
Unauthorized alarm property changes
Number of alarms per operating position
Chattering alarms
Suppressed alarms outside of approved methodologies
Target benchmark number(s) for 8 KPIs
6-12 alarms/hour
Alarms/hour: 5 vs. 12 alarms/hour (ISA benchmark, 2009), done in 1994/96 by GACL/UPL
Alarm Displays on DCS Screens
[Screenshot: Mobile HMI (InTouch Access Anywhere), 14 Mar 14. Legible entries include "56 TPD POLISH BRINE LVL" (110.585),
"X-FER PUMP STOPPED", "CLARIFIED BRINE LEVEL", and "UNLOAD TRANSFER PUMP".]
Honeywell
PlantPAx Library Alarms (Rockwell Automation)
From ISA 18.2, 11.2.1 HMI Info Requirements (Rockwell Allen-Bradley PLC, FactoryTalk software features):
The interface shall clearly indicate:
[list only partly legible: tag in alarm, alarm description, alarm priority (symbol and color), alarm state, alarm types]
From ISA 18.2, 11.2.3 HMI Display Requirements:
The interface shall provide capability for the following:
a) At least one alarm summary display
b) Alarm indications on process displays
c) Alarm indications on tag detail display
[Slide callouts, partly legible: an alarm banner stationed as a permanent fixture on the HMI client, so that it always
appears on screen; a button to launch the alarm summary.]
[Screenshot: DCS alarm summary list. Legible entries include "56 TPD POLISH BRINE LVL", "COOLING WATER SUPP-TEMP",
"STORAGE-5 PRESSURE", "X-FER PUMP STOPPED", "HCL HEAD TANK LEVEL", "CLARIFIED BRINE LEVEL", "COOLING WATER SUMP LEVEL",
and "UNLOAD TRANSFER PUMP", with alarm types such as PVHI, PVLO, and OFFNORM.]
Achievement & way forward for Associated Display (PN Parikh, 25th March 14)
PlantPAx Library Alarms (Rockwell Automation)
Way forward: alarms to be set with a dead band on time and on alarm value at DVACL
Alarm Shelving
Reason for shelving: nuisance alarm
Shelving period: 2 hours
Way forward: alarm shelving at DVACL (PN Parikh, 26th March 14)
Copyrighted Materials
Copyright 2011 ISA. Retrieved from www.knovel.com
CHAPTER 4
There are typically three methods by which alarms are displayed to a DCS
or SCADA console operator. (The term "DCS" is used to include SCADA
systems, since their alarm-related functionality is essentially identical.)
These methods are:
The alarm display functionality provided by the DCS manufacturer
Custom graphics created by the owning company
External lightbox annunciators added to the DCS
from switches, and other more complex point types have many addi-
tional alarm types and choices. Logic points can be constructed to create
special-purpose alarms under a variety of Boolean conditions. Program
code can be written to create quite complex alarms.
When alarms occur, their status is depicted on the control system screens.
New alarms can be acknowledged by the operator, which generally alters
their appearance in some way. When the alarm condition is no longer
in effect, the alarm clears and either automatically disappears from the
displays or can be manually dismissed by the operator. Time-stamped
electronic records of new alarms, alarm acknowledgement, and alarm
clearing are automatically created and saved.
The best practice is to use three primary levels of annunciated DCS alarm
priority. Your DCS may allow many more than that. Do not succumb
to the temptation of using them! Humans are wonderfully able to put
things in three categories and to understand items in three categories.
Four or five categories are about the maximum; more than that will get
cognitively blurred together and become confusing rather than helpful.
(Quick! What is the difference between Priority 17 and Priority 18?)
CHAPTER 12
Understanding and
Applying ISA-18.2:
Management of Alarms for
the Process Industries
"Laws are like sausages. It is better not to see them being made."
-Otto von Bismarck (1815-1898)
Ditto for standards! Over the last several years, alarm management has
become a highly important topic, and the subject of a number of ar-
ticles, technical symposia, and books. In response to this, in 2003 ISA be-
gan developing an alarm management standard. Dozens of contributors,
from a variety of industry segments, spent thousands of person-hours
participating in the development. The authors of this book participated
in the roles of section editor and voting member. After six years of work,
the new standard, ANSI/ISA-18.2-2009 Management of Alarm Systems for
the Process Industries, is now available at www.isa.org.
ISA-18.2 is quite different from the usual ISA standard. It is not about
specifying how some sort of hardware talks to other hardware, or the
detailed design of control components. It is about work processes of
people. Alarm management isn't really about hardware or software; it's
about work processes. (Poorly performing alarm systems do not create
In this section, we will review the most important aspects about the scope,
requirements, recommendations, and other contents of ISA-18.2. But,
there is no substitute for obtaining and understanding the full document!
Additionally, you will learn a bit about what it is like to be on a standards
committee, and why a standard requires several years to develop.
[Figure: ISA-18.2 alarm management life cycle stages]
Philosophy
Identification
Rationalization
Detailed Design
Implementation
Operation
Maintenance
Monitoring & Assessment
Management of Change
Audit
For example, in a few minutes an engineer could sit down and resolve a
single nuisance chattering alarm. This simple task could involve going
through several different life cycle stages as part of performing the activi-
ties associated with a simple task. Follow along with this engineer as he
thinks through the process of resolving a single nuisance alarm:
Monitoring Stage:
"Well today, I am spending some time fixing nuisance alarms.
Which of my alarms are on the most frequent alarm list? Ah, there's
one-a chattering high-value alarm on the column pressure."
Identification Stage
"Ah yes, I happen to remember that we need this alarm as part of
our quality program. My job today, though, is to make it work cor-
rectly and eliminate the chattering behavior, not to decide wheth-
er to get rid of it or not. So I don't have to research as to whether it
was originally specified by some particular process like a PHA."
Implementation Stage:
"Now I actually change the deadband. I type in the new number
and hit Enter. Done!"
Monitoring Stage:
"Part of my work process for this is to continue to look at the
alarm data to see if this deadband setting change solved the prob-
lem. I will add this one to my tracking and follow up list."
In a few minutes, several different life cycle stages were briefly visited in
accomplishing this one example task. So in understanding and applying
ISA-18.2, don't get overwrought about trying to figure out which life
cycle stage you are in at any point in time. It is a requirements structure,
not a work process sequential checklist!
There are no surprises in the list except for two concepts not seen before
in the alarm management lexicon-namely "Alarm Classification" and
"Highly Managed Alarms."
The various mandatory requirements for HMAs are spread out in several
sections throughout ISA-18.2. These include:
Specific Shelving requirements, such as access control with audit trail
Specific Out of Service requirements, such as interim protection,
access control, and audit trail
Mandatory initial and refresher training with specific content and
documentation
Mandatory initial and periodic testing with specific documentation
Mandatory training around maintenance requirements with spe-
cific documentation
Mandatory audit requirements
Our advice is to specifically avoid the usage of this alarm classification!
You might choose to have your own similar classification, and then
choose only the administrative requirements you deem specifically nec-
essary for those alarms. These will probably be only a subset of the ISA
18.2 listing for HMAs.
Even better is the integration of relevant alarm data available via a single
click call-up within the DCS graphics themselves, i.e. within the opera-
tor's HMI. This method provides for the quickest access by the operator
to alarm information at the time and at the point where they need it. It
is an important adjunct to operator effectiveness and abnormal situation
response.
The changed alarm system will usually involve a major shift in operat-
ing methodology for most operators. It can be quite uncomfortable for
them, as well as staff engineers, to accept. There are several consider-
ations and methods to accomplish effective implementation.
For example, consider a simple interlock that closes a feed valve based
on a high level of 80% in a tank.
Poor Practice: Configure the logic element with the occurrence of the
high alarm (often via a flag) as the input to cause the valve to close. This
is poor because:
The alarm setpoint parameter, or even the existence of the alarm,
is subject to change from a variety of places. Years of history have
led many to believe that the change of alarm settings is not a
significant action, regardless of procedures or MOC policies. A
change to the alarm setpoint will change the functionality of the
interlock, and this will likely not be obvious!
In some DCSs you have many obscure choices and methods as to
suppression options on an alarm, some of which could negate the
flag chosen to close the valve. So a suppressed alarm could prevent
the safety function of an interlock.
Chapter 5-Step 1: The Alarm Philosophy
Better Practice: Configure such logic elements with the process value
(PV) as an input, and compare it to a numeric (80%) contained within
the logic construct. This is better because:
Even though the numeric could be changed, logic elements are far
more obscure control system constructs and are much less likely to
be changed by the non-expert. The logic will activate and the valve
will close based on the PV, whether the alarm occurs or not.
A separate alarm can be configured to provide warning of the im-
pending interlock action.
This design leaves the flexibility for adjusting, resetting, shelving,
or otherwise modifying the alarm appropriately, without inadver-
tently changing the performance characteristics of the interlock.
Our conclusion is, if you want something to happen based on the pro-
cess attaining a certain value, then program it or configure it based on
reading the value itself, not on whether an alarm occurs at that value.
Exceptions deserve careful evaluation.
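The difference between the two practices can be shown with a small model. This is an illustrative sketch in Python, not DCS or safety-system code. The 80% trip value comes from the example above, while the 75% pre-alarm setpoint, the class name, and the function name are hypothetical.

```python
HIGH_LEVEL_TRIP = 80.0   # numeric held inside the interlock logic (percent level)


def feed_valve_should_close(level_pv):
    """Better practice: the interlock compares the process value itself."""
    return level_pv >= HIGH_LEVEL_TRIP


class LevelAlarm:
    """Separate pre-alarm that warns of the impending interlock action."""

    def __init__(self, setpoint=75.0):
        self.setpoint = setpoint   # hypothetical warning setpoint
        self.shelved = False       # operators may shelve or retune the alarm

    def is_annunciated(self, level_pv):
        return (not self.shelved) and level_pv >= self.setpoint


if __name__ == "__main__":
    alarm = LevelAlarm()
    alarm.shelved = True                       # alarm suppressed by the operator
    level = 82.0
    print(alarm.is_annunciated(level))         # False: no alarm is shown
    print(feed_valve_should_close(level))      # True: the interlock still acts
```

Because the interlock never reads the alarm state, shelving the alarm or changing its setpoint has no effect on when the valve closes.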
system. For details, see Chapter 10.) If the messaging system attracts the
operator's attention by sounding tones or flashing lights, and requires
acknowledgement, then the messaging system has a similar effect as the
alarm system in loading the operator. Therefore, the use of such mes-
sages should meet many of the same principles as alarms.
Any messaging system should use a separate visual and audible interface
(different tones) than the alarm system.
CHAPTER 6
Step 2: Baseline
and Benchmarking
of Alarm System
Performance
"If you torture data sufficiently, it will confess to almost anything."
-Fred Menger
Good alarm analysis software should be able to perform all of the analy-
ses in this chapter, and many others. It is possible to do these in a spread-
sheet, although the data parsing and reduction will become tedious,
speed is quite slow, and spreadsheet page size limits are easily exceeded
when importing alarm journals. Frankly, using a spreadsheet to analyze
alarm events is like using a water hose to fill an Olympic-sized swim-
ming pool! The proper tool for alarm analysis is a real database.
in the afternoon." The control console is not logically split in such situ-
ations, nor are the alarms segregated. The question arises-since more
than one person is monitoring them, are substantially higher alarm rates
(perhaps doubled) possible to be handled successfully?
While some minor alarm handling rate increase might be possible, there
is no documented research or testing available about this situation. It is
obvious that doubled rates would not be achievable.
The situation can take quite awhile to figure out, involving looking per-
haps at trends of all of these flows and comparing them to the proper
numbers for the current process situation. The correct action to take var-
ies highly with the proper determination of the cause(s). The diagnosis
time is highly variable based upon the experience of the operator and
whether the operator has been in the situation before.
The result is that the diagnosis and response to a simple high tank level
alarm becomes not quite so simple at all. Given the tasks involved, cer-
tainly much less than ten such alarms can be handled in a ten minute
period. Or, sixty in an hour.
Compare and contrast the above simple "high level tank" alarm to anoth-
er, different simple alarm stating "Pump 14 is supposed to be running but
has kicked off." The needed action is very direct: "Restart the pump or if it
won't, start the spare." Operators can handle several such alarms as these in
ten minutes. The time required to figure out the situation is much less.
The real concern is to get the alarm rates down to a level so there is a
low likelihood an alarm will be missed. Remember, when alarms indi-
cate a situation requiring an operator action, missing an alarm means an
avoidable consequence will occur. Alarm rate also then indirectly indi-
cates control system effectiveness-its ability to keep the process within
bounds that do not require manual operator intervention to avoid con-
sequences of differing severity!
Alarm rates are thus controlled by indirect means rather than direct
means. The solution to an alarm rate problem may lie in control im-
provements rather than in directly addressing the alarm system.
In the early 1990s, control systems were generally big, expensive, closed, pro-
prietary boxes. They were not designed to connect to alien systems like PCs.
The printer port was one of the few standard interfaces available. The DCS
manufacturer wanted you to buy their equipment for anything you needed.
A simple replacement keyboard could cost $5,000 (but it was "certified!")
The closed nature of DCSs meant that any advanced methods of collect-
ing alarm events for analysis were very DCS-specific, which made multi-
DCS commercial solutions uneconomic. Many home-brewed solutions
began to appear from innovative end-users and third parties. In the late
1990s and early 2000s, DCSs became more "open," generally beginning
to support Microsoft-based technologies. A major advance came about
with the support of the OPC standard by several DCS manufacturers.
OPC stands for Object Linking and Embedding (OLE) for Process Control.
All of the examples in this book are of real data, but slightly disguised to
protect the embarrassed.
[Figure: alarms per day over 30 days, with a "Likely Acceptable" reference line at 150/day; the vertical axis is in thousands of alarms]
In the above (and quite typical) data, this alarm system produces alarms
at rates far beyond the operator's abilities of evaluation and response,
and for days at a time. Such an alarm system is not a useful tool to help
the operator perform the right action at the right time! In fact, it is much
more of a distraction or a hindrance to the operator.
[Figure: alarms per ten minutes over 8 weeks; vertical axis 0 to 100]
Fixed ten minute intervals are used (e.g., 1:00:00 pm through 1:09:59
pm). An alarm rate of ten or more alarms in ten minutes defines the be-
ginning of an alarm flood. In the chart above, rates often exceed 20, 40,
60, or more alarms in ten minutes. Such rates can continue for hours.
During such periods, the likelihood of an operator missing an important
alarm increases, as has been shown many times in the analysis of major
accidents.
An alarm flood can last for many hours and include hundreds or thou-
sands of alarm events. Alarm floods can make a difficult process situation
much worse. This analysis depicts the alarm floods occurring during an
eight week analysis period, showing a breakdown by alarm count during
the flood. Only alarms annunciated to the operator are included.
Alarm floods are a significant problem for this system. Most alarms pro-
duced by the system are during flood periods. Flood magnitude is very
high, generally hundreds of alarms contained in each flood. There are
over fourteen floods per day on average.
[Figure: alarm floods over 8 weeks, by alarm count per flood. Annotations: 820 separate floods; several peaks above 1000;
highest count in an alarm flood: 2771; longest duration of a flood: 19 hrs]
Upon first examination, this might seem like very good performance,
and our work is done! The average per day (138) is less than even the
"Likely Acceptable" value of 150. However, a more detailed look at the
data is needed, preferably involving trends. The question is-regardless
of my averages-how many alarms were likely to have been missed?
The alarms per day chart for this week looks like this:
[Figure: alarms per day over 7 days, with a "Maximum Manageable" reference line; vertical axis 100 to 500]
Five of my days were less than the 150 "Likely Acceptable" value, and
although two days exceeded it, they were still well under the 300 "Maxi-
mum Manageable" value. Is this the end of the story?
The alarms per ten minute chart (averaging only 0.96) looks like this:
[Figure: alarms per ten minutes over 7 days; vertical axis 10 to 40]
There were only two fairly minor floods. The peak rate during one flood
was 48 alarms in ten minutes and the other had sixty alarms in one ten
minute period. The flood breakdowns were:
By this method:
Flood 1: 78 alarms were likely to have been missed.
Flood 2: 104 alarms were likely to have been missed.
This week: 182 alarms were likely to have been missed.
In other words, despite these great averages (and actually this is pret-
ty good alarm system performance, and most sites would be happy to
achieve it!) we still put the operators in the position of being likely to
miss almost 200 alarms. Almost 200 cases where the failure of the op-
erator to take proper corrective action could have resulted in a conse-
quence, perhaps a quite significant one.
The result of this simple analysis is plotted in Figure 6-9, which is taken
from the same data as Figure 6-3, Alarms per Ten Minutes.
[Figure 6-9: alarms likely to have been missed, per week, over 8 weeks; vertical axis 200 to 1000]
Week 1: 3,885
Week 2: 2,281
Week 3: 2,728
Week 4: 1,903
Week 5: 2,173
Week 6: 1,443
Week 7: 2,253
Week 8: 4,260
In the eight week period of Figure 6-3, almost 21,000 alarms were very
likely to have been missed! This is great for the proverbial elevator speech
where you have one opportunity to state your case in-between floors. A
weekly view of such data can really get the attention of management. So,
don't rely on averages to tell the whole story!
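The "almost 21,000" figure can be checked against the weekly values given with Figure 6-9. The snippet below is only a worked arithmetic check using those eight numbers. The variable names are illustrative.

```python
# Alarms likely to have been missed in each of the eight weeks (Figure 6-9)
missed_per_week = [3885, 2281, 2728, 1903, 2173, 1443, 2253, 4260]

total_missed = sum(missed_per_week)                    # eight-week total
average_per_day = total_missed / (len(missed_per_week) * 7)
worst_week = missed_per_week.index(max(missed_per_week)) + 1

print(f"Total likely missed: {total_missed}")
print(f"Average per day:     {average_per_day:.0f}")
print(f"Worst week:          week {worst_week}")
```

The weekly values add up to 20,926, which matches the "almost 21,000" quoted in the text.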
As is often the case, only ten alarms are a significant fraction of the en-
tire system alarm load, in this case 55%. (The analysis of hundreds of
systems shows the number is often over 80%, rarely is it less than 20%.)
In fact, the top four alarms are over 40% of the load! Were they inten-
tionally designed to annunciate so frequently? Of course not! Are they
performing a useful function in their current configuration? Doubtful.
The beauty of this analysis is that it can direct improvement efforts to
where they will do the most good. Imagine finding the time and making
the effort to improve only one alarm per week-to make it work as it was
intended to work. In four weeks, this system would be improved by over
40%. Someone would be a hero!
[Figure: most frequent alarms, showing count per alarm (left axis, 0 to 3500) and accumulated percentage (right axis, 0 to 100)]
Unconfigured Alarm Feature (Mobile HMI, InTouch Access Anywhere; PN Parikh, 25th March 14)
The corresponding graphic page is available to the operator, who can jump from the
Alarm Summary page to the operational graphic page to get
more details about the same tag number: A-16-105 B
5.5 Alarm Response Timeline
[Figure: measurement vs. time, showing the alarm setpoint, deadband, delay, and consequence threshold. Alarm states:
Normal (A), Unack Alarm (B), Ack & Response (C), Return-to-Normal (D). Curves: process response without operator action
and process response to operator action, with the point where the operator takes action marked.]
Key actions which can reduce alarm floods by 50%
Completion of Phase 1: removal of unnecessary (bad-acting) alarms
Phase 2: optimize alarm settings dynamically with the help of standard data and Mr. PN Parikh Sir:
dead band, (dead) time delay for operator actions; temperature, pressure, level and flow set-points
[Table, partly legible. Type of alarms / Counts: Process Alarms 2924; System Alarms 658.
Other column headings: average alarms/hour, maximum alarms/hour, number of hours with a given alarm rate, classification.]
for overhead tank to keep himself alert to manually trip the pump on need basis) Alerts setting is by
Operator for his convenience.
A few Acronyms ... worth remembering
SCADA: Supervisory Control & Data Acquisition (software engineering)
RAGAGEP: Recognized And Generally Accepted Good Engineering Practice
Standards and codes are RAGAGEP.
OSHA (Occupational Safety and Health Administration) fined the Texas City Refinery
87 million USD in 2009 for not following the ASME code and ISA standard,
resulting in accidents with fatalities. (The level high alarm did not function.)
ANSI: American National Standards Institute
ISA: International Society of Automation
IEC: International Electrotechnical Commission (headquartered in Geneva)
BIS: Bureau of Indian Standards (designations start with "IS"; IS/IEC 61511 is accepted)
ISA/IEC 61511 is the SIS safety standard for the process industries, including chemicals.
Unit 2: Hazard and Operability Studies (HAZOP)
30-07-2021: Friday, Lectures 8 & 9: 09:00 to 11:00 AM
31-07-2021: Saturday, Lectures 10 & 11: 09:00 to 11:00 AM
(Reference: textbook Chemical Process Safety by Crowl & Louvar, 3rd Edition, 2011)
Hazard analysis : Why needed ? What is required ? How done?
Definitions: Hazard, Risk, Failures- Safe & Dangerous (Hidden), Human Error, Safe guards
Fault tree analysis
HAZOP (Hazard and Operability Studies; CPS by Crowl & Louvar, 3rd Edition, Ch. 11, Section 11.3, page 510)
c. Symbols knowledge (AND gate, OR gate, Final Event, Intermediate Event)
d. Logical thinking and skill expressed in terms of the above gate(s): Logic Diagram
Analysis starts from the Top Event down to individual Basic Event(s) through Intermediate Events
Circle: basic event
Diamond: undeveloped event
[Figure: fault tree for a reactor explosion]
REACTOR EXPLOSION: 3.6 x 10^-4 F/YR
  RUNAWAY REACTION: 1.8 x 10^-2 F/YR
    FLOW CONTROL LOOP FAILS: 0.3 F/YR
      FLOW CONTROLLER FAILS: 0.2 F/YR
      VALVE STICKS OPEN: 0.1 F/YR
    TEMPERATURE INTERLOCK FAILS: 0.06 (probability of failure on demand)
      THERMOCOUPLE & RELAY FAIL: 0.05 (probability of failure on demand)
      VALVE FAILS TO CLOSE: 0.01 (probability of failure on demand)
  BURSTING DISC FAILS: 0.02 (probability of failure on demand)
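The numbers in the reactor fault tree can be reproduced with two gate rules. The sketch below assumes OR gates add their inputs (the rare-event approximation) and AND gates multiply a demand frequency by probabilities of failure on demand. The gate types are inferred from the arithmetic of the figure, and the function names are hypothetical.

```python
import math


def or_gate(*inputs):
    """OR gate, rare-event approximation: add the input frequencies or probabilities."""
    return sum(inputs)


def and_gate(*inputs):
    """AND gate: multiply a demand frequency by probabilities of failure on demand."""
    return math.prod(inputs)


# Basic events from the figure
flow_controller_fails = 0.2      # F/yr
valve_sticks_open = 0.1          # F/yr
thermocouple_relay_fail = 0.05   # probability of failure on demand
valve_fails_to_close = 0.01      # probability of failure on demand
bursting_disc_fails = 0.02       # probability of failure on demand

# Intermediate events and top event
flow_control_loop_fails = or_gate(flow_controller_fails, valve_sticks_open)
temperature_interlock_fails = or_gate(thermocouple_relay_fail, valve_fails_to_close)
runaway_reaction = and_gate(flow_control_loop_fails, temperature_interlock_fails)
reactor_explosion = and_gate(runaway_reaction, bursting_disc_fails)

print(f"Flow control loop fails:     {flow_control_loop_fails:.2f} F/yr")
print(f"Temperature interlock fails: {temperature_interlock_fails:.2f}")
print(f"Runaway reaction:            {runaway_reaction:.1e} F/yr")
print(f"Reactor explosion:           {reactor_explosion:.1e} F/yr")
```

The exact OR combination of two independent probabilities is 1 - (1 - 0.05)(1 - 0.01) = 0.0595. The figure's 0.06 is the simple sum, which is close enough when the probabilities are small.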
04-08-2021 Pramod N Parikh
Fault Tree Analysis
When to Use:
a. Design: FTA can be used in the design phase of the chemical plant to uncover hidden failure modes that
result from combinations of equipment failures.
b. Operation: FTA, including operator and procedure characteristics, can be used to study an operating
plant to identify potential combinations of failures for specific accidents.
Staffing Requirements
Single Individual
One analyst should be responsible for a single fault tree, with frequent consultation
with the engineers, operators, and other personal who have experience with the
systems/equipment that are included in the analysis.
Team
A team approach desirable if multiple fault trees are needed, with each team
is
member concentrating on one individual fault tree. Interactions between team
members and other experienced personnel are necessary for completeness in the
analysis process.
If a fault or failure takes place in the FRC/FCV, the reactor temperature may rise to a hazardous level. The process design engineer therefore provides a TIS (Temperature Indicating Switch) to stop the flow of Material A by closing the Emergency Shut-Off Valve (XV). This is termed the High Temperature Interlock through the TIS.
Further, if this high temperature interlock fails, the high temperature may cause high pressure. A bursting disc is therefore installed on the reactor to release the pressure from the reactor vessel and prevent a rupture that could harm, or if the impact is extensive kill, operators in the area. (Rupture of reactor / fatal accident)
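[Figure: reactor fed with Material A and Material B, showing the flow controller (FRC) with its flow control valve, the high-temperature emergency interlock (TIS) acting on the emergency shut-off valve, and the bursting disc. The accompanying fault tree repeats the RUNAWAY REACTION (1.8 x 10^-2 F/yr) and BURSTING DISC FAILS (0.02 probability of failure on demand) branches, with FLOW CONTROL LOOP FAILS and TEMPERATURE INTERLOCK FAILS beneath the runaway reaction.]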
Markov Models
[Figure: fault tree. Flat tire (top event) = Road debris OR Tire failure; Tire failure = Defective tire OR Worn tire]
Figure 11-12 A fault tree describing the various events contributing to a flat tire
Failure Analysis of a Flat Tire
FTA for flat tire (continued)
For instance, a flat tire on an automobile is caused by two possible events.
In one case the flat is due to driving over debris on the road, such as a nail. The other possible cause is tire failure. The flat tire is identified as the top event.
The two contributing causes are either basic or intermediate events. The basic events are events that cannot be defined further, and intermediate events are events that can be defined further.
For this example, driving over the road debris is a basic event because no further definition is possible.
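The distinction between top, intermediate, and basic events can be made concrete by writing the flat tire tree as a small data structure. This is a minimal sketch: the dictionary layout and the function name `classify` are hypothetical, and it assumes that any event with no inputs listed beneath it is treated as basic.

```python
# Flat tire fault tree of Figure 11-12 as nested dictionaries (hypothetical layout).
FLAT_TIRE = {
    "event": "Flat tire",
    "gate": "OR",
    "inputs": [
        {"event": "Road debris"},
        {
            "event": "Tire failure",
            "gate": "OR",
            "inputs": [{"event": "Defective tire"}, {"event": "Worn tire"}],
        },
    ],
}

def classify(node, is_top=True):
    """Label every event as 'top', 'intermediate', or 'basic'.

    Assumption of this sketch: an event with no inputs is basic
    (it is not defined further in the tree).
    """
    if is_top:
        kind = "top"
    elif "inputs" in node:
        kind = "intermediate"
    else:
        kind = "basic"
    labels = {node["event"]: kind}
    for child in node.get("inputs", []):
        labels.update(classify(child, is_top=False))
    return labels

for event, kind in classify(FLAT_TIRE).items():
    print(f"{event}: {kind}")
```

Road debris comes out as a basic event and tire failure as an intermediate event, matching the description above.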
Figure 12-5 A chemical reactor with an alarm and an inlet feed solenoid. The alarm and feed shutdown systems are linked in parallel.
Example 12-5
Consider again the alarm indicator and emergency shutdown system of Example 12-5. Draw a fault tree for this system.
Solution
The first step is to define the problem.
Q-2. When can (PIC) Emergency Shutdown through Switch 2 & Solenoid fail?
[Figure: fault tree. Failure of alarm (Pressure Switch 1 failure, Pressure Indicator failure) and failure of emergency shutdown (Pressure Switch 2 failure, Solenoid failure)]
Simple Probability Concepts
The probability of any event occurring can be expressed mathematically in the range from zero percent to 100 percent. Since 100 % = 1, values from 0.00 to 1 are valid.
Bathtub Curve
[Figure: failure rate (faults/time) versus time, with three regions: Infant Mortality, Period of Approximately Constant failure rate, Old Age]
Figure 12-2 A typical bathtub failure rate curve for process hardware. The failure rate is approximately constant over the midlife of the component.
Solution: Refer Slide-6 1 for failure rates of PS, Indicator & Solenoid
Reliability of alarm system: R = 0.835 (multiplication of reliabilities)
Probability of failure of alarm system: P = 1 - R = 0.165
Failure rate of alarm system: μ = -ln R = -ln (0.835) = 0.180 faults/year
Solution (continued)
For the shutdown system, the components are also in series.
So, R = (0.87)(0.66) = 0.574
P = 1 - R = 1 - 0.574 = 0.426 (probability of failure)
μ = -ln R = -ln (0.574) = 0.555 faults/year
Two systems combined (the hazard occurs only when both the alarm system and the shutdown system fail):
The failure probabilities are multiplied for overall performance: 0.165 x 0.426 = 0.070
This is the probability that both systems fail, so that the hazard occurs.
System reliability = (1 - 0.070) = 0.93 (93 %)
Failure rate = -ln R = -ln (0.93) = 0.073 faults/year
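The arithmetic of this solution can be reproduced in a short script. The sketch below is an illustration only: the function names are hypothetical, the alarm system reliability of 0.835 and the component reliabilities 0.87 and 0.66 are taken from the solution as given, and the conversion μ = -ln R assumes a one-year period with a constant failure rate.

```python
import math

def series_reliability(*reliabilities):
    """Components in series: every one must work, so reliabilities multiply."""
    r = 1.0
    for value in reliabilities:
        r *= value
    return r

def failure_rate(reliability):
    """Failure rate in faults/year, assuming a one-year period (mu = -ln R)."""
    return -math.log(reliability)

# Alarm system (reliability as stated in the solution)
r_alarm = 0.835
p_alarm = 1 - r_alarm

# Shutdown system: two components in series
r_shutdown = series_reliability(0.87, 0.66)
p_shutdown = 1 - r_shutdown

# Both systems must fail for the hazard to occur, so failure probabilities multiply
p_both = p_alarm * p_shutdown
r_system = 1 - p_both

print(f"Alarm:    R = {r_alarm:.3f}, P = {p_alarm:.3f}, mu = {failure_rate(r_alarm):.3f} faults/yr")
print(f"Shutdown: R = {r_shutdown:.3f}, P = {p_shutdown:.3f}, mu = {failure_rate(r_shutdown):.3f} faults/yr")
print(f"Overall:  R = {r_system:.3f}, P = {p_both:.3f}, mu = {failure_rate(r_system):.3f} faults/yr")
```

Small differences in the last digit relative to the slide come from rounding intermediate values.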
07-08-2021
Explain: operating modes of controllers (Auto/Manual)
How the calculation is done for the 9 results in Event Tree Analysis
Basic understanding of the calculations with the data available: the IE in occurrences/year and the failure rate of each safety function
Relationship between FTA & ETA
11-2 Event Trees
[Figure: reactor with reactor feed, cooling coils (cooling water in and out), thermocouple, temperature controller, and high-temperature alarm]
Figure 11-8 Reactor with high-temperature alarm and temperature controller.
Event-Tree with 4 Safety Functions for an Initiating Event
IE in failures/year and safety functions in probability of failure on demand
Initiating event A: loss of cooling, 1 occurrence/year
Safety functions (identifier, failures/demand):
  High-temperature alarm alerts operator (B, 0.01)
  Operator notices high temperature (C, 0.25)
  Operator restarts cooling (D, 0.25)
  Operator shuts down reactor (E, 0.1)
Sequence   Occurrences/year   Result
A          0.7425             Continue operation
AD         0.2227             Shutdown
ADE        0.02475            Runaway
AB         0.005625           Continue operation
ABD        0.001688           Shutdown
ABDE       0.0001875          Runaway
ABC        0.001875           Continue operation
ABCD       0.0005625          Shutdown
ABCDE      0.0000625          Runaway
Figure 11-9 Event tree for a loss-of-coolant accident for the reactor of Figure 11-8.
Shutdown: total of all 3 shutdown results = 0.2250 occurrences/year (A)
Runaway: total of all 3 runaway results = 0.02500 occurrences/year (B)
Continued: total of all 3 continued-operation results = 0.7500 occurrences/year (C)
Total occurrences per year = A + B + C = 1 occurrence/year per reactor
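The nine results and the three totals above can be regenerated from the initiating event frequency and the four failure-on-demand probabilities. The following is a minimal sketch, assuming the branch structure of the loss-of-cooling event tree (the "operator notices" function is only challenged when the alarm fails); the function name `loss_of_cooling_tree` is hypothetical.

```python
def loss_of_cooling_tree(ie_frequency, pfd_alarm, pfd_notice, pfd_restart, pfd_shutdown):
    """Return {sequence: (result, occurrences per year)} for the event tree.

    Identifiers: A initiating event, B alarm fails, C operator does not notice,
    D operator fails to restart cooling, E operator fails to shut down.
    """
    outcomes = {}

    def restart_and_shutdown(frequency, path):
        # Cooling restarted: operation continues
        outcomes[path] = ("continue operation", frequency * (1 - pfd_restart))
        # Restart fails: the operator must shut down the reactor
        not_restarted = frequency * pfd_restart
        outcomes[path + "D"] = ("shutdown", not_restarted * (1 - pfd_shutdown))
        outcomes[path + "DE"] = ("runaway", not_restarted * pfd_shutdown)

    alarm_failed = ie_frequency * pfd_alarm
    restart_and_shutdown(ie_frequency * (1 - pfd_alarm), "A")
    restart_and_shutdown(alarm_failed * (1 - pfd_notice), "AB")
    restart_and_shutdown(alarm_failed * pfd_notice, "ABC")
    return outcomes

tree = loss_of_cooling_tree(1.0, pfd_alarm=0.01, pfd_notice=0.25,
                            pfd_restart=0.25, pfd_shutdown=0.1)

totals = {}
for sequence, (result, frequency) in tree.items():
    print(f"{sequence:6s} {frequency:.7f}  {result}")
    totals[result] = totals.get(result, 0.0) + frequency

for result, frequency in totals.items():
    print(f"Total {result}: {frequency:.4f} occurrences/year")
```

The three totals add up to the initiating event frequency, which is a quick consistency check on any event tree.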
Basic Understanding
ETA: Begins with the IE (initiating event) and works forward toward the top event (induction)
FTA: Begins with the top event and works backward toward the IE (deduction)
IE: Cause of the incident (accident)
Top event: Final outcome of the incident (hazard)
Basic Understanding
The concept is extended in Layer of Protection Analysis (LOPA) to determine the risk reduction requirement
Risk: Acceptable, Not Acceptable
LOPA: Layer of Protection Analysis
Advantages and Disadvantages of Fault Trees
Relationship between Fault Trees and Event Trees
Risk:
Risk is defined as the combination of the probability of occurrence of harm (frequency/likelihood) and the severity of that harm (consequence).
Risk may involve loss of human life, loss of costly assets, or harm to the environment affecting all living things (PPE).
Acceptable risk: Refer to Figure 11-15 on page 499 of the textbook (Frequency & Consequence).
For an acceptable risk, further risk reduction at additional cost is NOT required.
Not acceptable risk: Refer to Figure 11-15 on page 499 (Frequency & Consequence).
Consider re-design of the process or protection system to reduce the risk to the acceptable zone by adding layers of protection and/or layers of mitigation in the process design.
LOPA: Layers of Protection Analysis: Refer to Figure 11.16 on page 501 of the textbook.
The next slides are meant to make the above risk concepts and LOPA easier to understand.
How is Not Acceptable (Intolerable) risk reduced to Acceptable risk?
(By HAZOP study, controllers in DCS/PLC, and LOPA)
Who decides the acceptable risk?
[Figure: risk levels (intolerable risk level, ALARP or tolerable risk region, acceptable risk) shown alongside the steps HAZOP, SIF, SIL selection (SIL of the SIF), safety requirements, safety loop implementation, and validation]
Onion Diagram for Protection Layers: Preventive & Mitigation Layers (please refer to the textbook of Crowl & Louvar)
Layers of Protection: Onion Model
[Figure: onion model of the layers of protection for avoiding hazards and consequences (IEC 61508/61511). Legible layers include: process design; automatic safety instrumented system (layer 4); active physical protection (relief devices); passive physical protection and mitigation (containment, dikes); plant emergency response (layer 7)]
SIS is one of the layers of protection, and Event Tree Analysis is deployed in LOPA.
Defense in depth, or, don't put all your eggs in one basket.
Is an alarm a perfect Independent Protection Layer (IPL)?
Alarms are One of the First Protection Layers
[Figure: protection layers, from the process outward: Process Design; Process Control Loop; Alarm (Operator Intervention); Trip (Safety Instrumented System); a further protection layer whose label is only partly legible; Plant Emergency response; Community Emergency response]
[Figure: process value rising through Normal Condition, High level, High alarm level (Alarm Condition, operator takes action), Trip level (SIF action), and Unsafe Condition (mechanical safety)]
Chain of events with failure of 3 layers (BPCS, SIS, PSV), with release of toxic, explosive, or flammable gas leading to harm to humans and fire damage to assets.
[Figure: chain of events. Process under control -> initiating event (control loop failure, ESD valve failure, human error, pump malfunction, etc.) -> process deviation or disturbance -> hazardous situation -> the SIF prevents the hazardous situation, or on FAILURE ON DEMAND the hazard is released -> consequences]
LOPA-Event Tree: Risk Arrow reduces in size due to each of
protection layer's effectiveness to reduce risk.
[Figure: LOPA event tree. The initiating event (estimated frequency) passes through successive protection layers, each with its own PFD. Success of any layer leads to a safe outcome; failure of all layers gives the consequence event frequency.]
Tolerable risk is same as Acceptable risk
Example: Du-Pont story in USA (1 death in 14 months)
[Figure: risk reduction. The process risk is brought down past the tolerable risk to the residual risk; the reduction is made up of partial risk covered by other non-SIF protection layers and partial risk covered by SIF protection layers (initially SIL 1). Inset: Bhopal, India, 1984]
Simple Example of Hexane Tank overflow Scenario
Determine the frequency of a fire event and of operator fatality by the LOPA method for the case where:
- The LIC fails once in 10 years (this is the initiating event in the event tree)
- Overflow has passive protection by a dike, so the hexane is normally contained safely
[Figure: hexane storage tank with vent and level control]
Event tree for the LOPA example, branch probabilities:
  BPCS loop failure (initiating event): 0.1/yr
  Dike: success P = 0.99, failure P = 0.01
  Probability of ignition: yes P = 1.0, no P = 0
  Probability of personnel in area: yes P = 0.5, no P = 0.5
  Probability of fatality: yes P = 0.5, no P = 0.5
Outcomes:
  Dike succeeds: no significant event
  Dike fails, no ignition: no significant event
  Dike fails, ignition, no personnel in area: fire
  Dike fails, ignition, personnel in area, no fatality: fire, no fatality
  Dike fails, ignition, personnel in area, fatality: fire with fatality, 0.1 x 0.01 x 1.0 x 0.5 x 0.5 = 2.5 x 10^-4 /yr
Figure 7-4. Event Tree for LOPA Example
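The hexane overflow scenario can be worked as a chain of multiplications along the event tree. This is a sketch using the branch values read from Figure 7-4 (initiating frequency 0.1/yr, dike PFD 0.01, ignition 1.0, personnel in area 0.5, fatality 0.5); the variable names are illustrative and the branches are assumed independent.

```python
# LOPA arithmetic for the hexane tank overflow scenario (values read from Figure 7-4).
initiating_frequency = 0.1   # LIC / BPCS loop fails once in 10 years (per year)
pfd_dike = 0.01              # probability that the dike fails to contain the spill
p_ignition = 1.0             # probability of ignition once hexane escapes the dike
p_personnel_in_area = 0.5    # probability that an operator is present
p_fatality = 0.5             # probability of fatality given fire and presence

# Frequency of hexane escaping the dike
spill_outside_dike = initiating_frequency * pfd_dike

# Frequency of a fire (any severity)
fire = spill_outside_dike * p_ignition

# Frequency of a fire with fatality, and of the remaining fires
fire_with_fatality = fire * p_personnel_in_area * p_fatality
fire_no_fatality = fire - fire_with_fatality

print(f"Fire:               {fire:.1e} per year")
print(f"Fire with fatality: {fire_with_fatality:.1e} per year")
print(f"Fire, no fatality:  {fire_no_fatality:.1e} per year")
```

The fatality frequency can then be compared against a benchmark tolerable risk to decide whether further protection layers are needed.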
MHRD Scheme on Global Initiative on Academic Network (GIAN) & Commissionerate of Technical Education, Gujarat State
Initiating Event Frequency example
(REF: AIChE CCPS LOPA Text Book)
Lightning strike: 1 x 10^-3
BPCS loop failure (control valve, logic solver, sensor; HW/SW): 1 x 10^-1
Regulator failure: 1 x 10^-1
Probability of Failure on Demand (PFD) example
(REF: AIChE CCPS LOPA Text Book)
Benchmark tolerable risk:
  Shell: 10^-3, 10^-6
  BP: 10^-3, 10^-6
  ICI (on shore): 3.3 x 10^-5
  Typical: 10^-4, 10^-6
Initiating event: likelihood of failure (events per year)
  BPCS instrument loop failure: 10^-1
    Note: IEC 61511 limits the likelihood of BPCS failure to no less than 8.76 x 10^-2/yr (IEC, 2001)
  Regulator failure: 10^-1
Independent Protection Layer (IPL): probability of failure on demand (PFD)
  Basic Process Control System, if not associated with the initiating event being considered (assume high demand BPCS): 1 x 10^[?]
  Operator response to an audible alarm with at least 10 minutes response time: 1 x 10^[?]
  Critical operating procedure: 1 x 10^[?]
  Relief valve (non-dirty service): 1 x 10^[?]
  Relief valve (dirty service): 1 x 10^[?]
HAZOP:Hazard and Operability Study
Revision of last 3 slides:(Benchmark Tolerable Risk, Initiating Frequency, PFD for devices)
HAZOP:Hazard and Operability Study
Definition
It relies on:
Systematic identification
Methodical brainstorming
Creative interaction of diverse disciplines
And:
To identify all process safety, health and environmental (SH&E) hazards
And wherever possible, to determine:
27-08-2021
And:
Independent Leader
Project Manager / Engineer
Operation / Maintenance Representative(s)
Discipline Engineer(s) / Specialist:
  Process
  Mechanical
  Electrical / Instrument
Others, e.g. Chemist, Vendor Representative
HAZOP Minutes Recorder
Team Size 4 to 8 People (10 to 12 people if Design is out-sourced)
When either or both of the above fail to function as designed, a hazardous event is initiated (the Initiating Event).
This initiating event may develop into a hazardous event if the process design does not have adequate protection.
In a HAZOP study, the cause of the initiating event is identified for each deviation from the design intent of operating in the normal range.
The consequence (resulting outcome) is also recorded.
The HAZOP team reviews and records any safeguards in place to prevent or mitigate the consequence.
PROJECT:
TEAM MEMBERS: DATE:
NODE: LEADER:
DRAWING: MINUTES BY:
POTENTIAL CONSEQUENCES
INTERMEDIATE EVENTS
INITIATING EVENTS
HAZOP Study General Principles
GUIDEWORDS:
NO, NOT, NONE
MORE
LESS
AS WELL AS
PART OF
REVERSE
OTHER THAN
Plus, Special Application Guide Words:
SOONER / FASTER
LATER / SLOWER
WHERE ELSE
[Figure: reactor with monomer feed, cooling coils (cooling water in), thermocouple, temperature controller (TC), level controller (LC), and outlet to plant]
Figure 10-8 An exothermic reaction controlled by cooling water.
Example 10-2
Consider the reactor system shown in Figure 10-8. The reaction is exothermic, so a cooling system is provided to remove the excess energy of reaction. In the event that the cooling function is lost, the temperature of the reactor would increase. This would lead to an increase in reaction rate, leading to additional energy release. The result would be a runaway reaction with pressures exceeding the bursting pressure of the reactor vessel.
The temperature within the reactor is measured and is used to control the cooling water flow rate by a valve.
Perform a HAZOP study on this unit to improve the safety of the process. Use as study nodes the cooling coil (process parameters: flow and temperature) and the stirrer (process parameter: agitation).
Solution
The guide words are applied to the study node of the cooling coils and the stirrer with the designated process parameters.
The HAZOP results are shown in Table 10-7, which is only a small part of the complete analysis.
Node Description: Cooling coil used for circulating cooling water in exothermic reactor, Tag no. 27-R 108
Ref. Drg.: Figure 10-8
Virtual class of Sem. VIII students, B.Tech (Chem.)
1. Begin with a detailed flow sheet. Break the flow sheet into a number of process units. Thus
the reactor area might be one unit, and the storage tank another. Select a unit for study.
2. Choose a study node (vessel, line, operating instruction).
3. Describe the design intent of the study node. For example, vessel V-1 is designed to store
the benzene feedstock and provide it on demand to the reactor.
4. Pick a process parameter: flow, level, temperature, pressure, concentration, pH, viscosity,
state (solid, liquid, or gas), agitation, volume, reaction, sample, component, start, stop,
stability, power, inert
5. Apply a guide word to the process parameter to suggest possible deviations. A list of guide
words is shown in Table 10-3. Some of the guide word process parameter combinations
are meaningless, as shown in Tables 10-4 and 10-5 for process lines and vessels.
6. If the deviation is applicable, determine possible causes and note any protective systems.
7. Evaluate the consequences of the deviation (if any).
8. Recommend action (what? by whom? by when?)
9. Record all information.
Guidelines for Hazard Evaluation Procedures, 2d ed. (New York: American Institute of Chemical Engi-
neers, 1992).
Table 10-4 (process lines)
Columns: Process parameters; No, not, none; More, higher, greater; Less, lower; As well as; Part of; Reverse; Other than; Sooner, faster; Later, slower; Where else
Flow X X X X X
Temperature
Pressure
Concentration X X
pH x X
Viscosity X X X
State X
Table 10-5 (vessels)
Columns: Process parameters; No, not, none; More, higher, greater; Less, lower; As well as; Part of; Reverse; Other than; Sooner, faster; Later, slower; Where else
Level X X x X
Temperature X
Pressure X
Concentration
pH X X
Viscosity X X
Agitation X X
Volume
Reaction
State X X
Sample X X X X
10. Repeat steps 5 through 9 until all applicable guide words have been applied to the cho-
sen process parameter.
11. Repeat steps 4 through 10 until all applicable process parameters have been considered
for the given study node.
12. Repeat steps 2 through 11 until all study nodes have been considered for the given sec-
tion and proceed to the next section on the flow sheet.
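The nested repetition in steps 2 through 12 (study nodes, then process parameters, then guide words) amounts to three loops. The sketch below only enumerates candidate deviations for the team to review; it does not judge which combinations are meaningful. The study nodes and parameters are those of Example 10-2, while the function name `candidate_deviations` and the item numbering (node number plus a letter that runs on through all parameters of that node) are assumptions made for illustration.

```python
from itertools import product
from string import ascii_uppercase

# Guide words listed in the notes, including the batch-oriented ones
GUIDE_WORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE",
               "OTHER THAN", "SOONER THAN", "LATER THAN", "WHERE ELSE"]

# Study nodes and process parameters of Example 10-2
STUDY_NODES = {
    "Cooling coils": ["flow", "temperature"],
    "Stirrer": ["agitation"],
}

def candidate_deviations(study_nodes, guide_words):
    """List (item, node, parameter, guide word) rows for the HAZOP team to review.

    Item labels are a node number plus a letter, e.g. '1A'. This sketch supports
    at most 26 rows per node because it uses single letters.
    """
    rows = []
    for node_number, (node, parameters) in enumerate(study_nodes.items(), start=1):
        # Steps 4 and 5: every parameter of the node against every guide word
        combinations = product(parameters, guide_words)
        for letter, (parameter, guide_word) in zip(ascii_uppercase, combinations):
            rows.append((f"{node_number}{letter}", node, parameter, guide_word))
    return rows

rows = candidate_deviations(STUDY_NODES, GUIDE_WORDS)
for item, node, parameter, guide_word in rows:
    print(f"{item:3s} {node:14s} {parameter:12s} {guide_word}")
```

For each printed row the team would then carry out steps 6 to 9: decide whether the deviation applies, list causes and safeguards, evaluate consequences, and record the recommended action.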
The guide words AS WELL AS, PART OF, and OTHER THAN can sometimes be conceptually difficult to apply. AS WELL AS means that something else happens in addition to the intended design intention. This could be boiling of a liquid, transfer of some additional component, or the transfer of some fluid somewhere else than expected. PART OF means that one of the components is missing or the stream is being preferentially pumped to only part of the process. OTHER THAN applies to situations in which a material is substituted for the expected material, is transferred somewhere else, or the material solidifies and cannot be transported. The guide words SOONER THAN, LATER THAN, and WHERE ELSE are applicable to batch processing.
An important part of the HAZOP procedure is the organization required to record and
use the results. There are many methods to accomplish this and most companies customize
their approach to fit their particular way of doing things.
Table 10-6 presents one type of basic HAZOP form. The first column, denoted "Item," is used to provide a unique identifier for each case considered. The numbering system used is a number-letter combination. Thus the designation "1A" would designate the first study node and the first guide word. The second column lists the study node considered. The third column lists the process parameter, and the fourth column lists the deviations or guide words. The next three columns are the most important results of the analysis. The first column lists the possible causes. These causes are determined by the committee and are based on the specific deviation-guide word combination. The next column lists the possible consequences of the deviation. The last column lists the action required to prevent the hazard from resulting in an accident. Notice that the items listed in these three columns are numbered consecutively. The last several columns are used to track the work responsibility and completion of the work.
Table 10-6 HAZOP Form for Recording Data
Project name:    Date:    Page    of
Process:
Section:    Reference drawing:
Columns: Item; Study node; Process parameters; Deviations (guide words); Possible causes; Possible consequences; Action required; Assigned to; Completed; No action; Reply date
Table 10-7 HAZOP Study Applied to the Exothermic Reactor of Example 10-2.
Project name: Example 10-2    Date: 1/1/93    Page 1 of 2
Process: Reactor of Example 10-2
Section: Reactor shown in Example 10-2    Reference drawing: Figure 10-8
Each entry lists: item, study node, process parameter, deviation (guide word), then possible causes, possible consequences, and action required (assigned to, date).
1A. Cooling coils, Flow, No
  Causes: 1. Control valve fails closed; 2. Plugged cooling coils; 3. Cooling water service failure; 4. Controller fails and closes valve; 5. Air pressure fails, closing valve
  Consequences: 1. Loss of cooling, possible runaway
  Actions: 1. Select valve to fail open (DAC, 1/93); 2. Install filter with maintenance procedure (DAC, 1/93); install cooling water flow meter and low flow alarm (DAC, 2/93); install high temperature alarm to alert operator (DAC, 2/93); 3. Check and monitor reliability of water service (DAC, 2/93); 4. Place controller on critical instrumentation list (DAC, 1/93); 5. See 1A.1
1B. High
  Causes: 1. Control valve fails open; 2. Controller fails and opens valve
  Consequences: 1. Reactor cools, reactant conc. builds, possible runaway on heating
  Actions: 1. Instruct operators and update procedures (JFL, 1/93); 2. See 1A.4
1C. Low
  Causes: 1. Partially plugged cooling line; 2. Partial water source failure; 3. Control valve fails to respond
  Consequences: 1. Diminished cooling, possible runaway
  Actions: 1. See 1A.2; 2. See 1A.2; 3. Place valve on critical instrumentation list (JFL, 1/93)
1D. As well as
  Causes: 1. Contamination of water supply
  Consequences: 1. Not possible here
  Actions: 1. None
1E. Part of
  Causes: 1. Covered under 1C
1F. Reverse
  Causes: 1. Failure of water source resulting in backflow; 2. Backflow due to high backpressure
  Consequences: 1. Loss of cooling, possible runaway
  Actions: 1. See 1A.2; 2. Install check valve (JFL, 2/93)
1G. Other than
  Causes: 1. Not considered possible
1H. Sooner than
  Causes: 1. Cooling normally started early
  Consequences: 1. None
1I. Later than
  Causes: 1. Operator error
  Consequences: 1. Temperature rises, possible runaway
  Actions: 1. Interlock between cooling flow and reactor feed (JW, 1/93)
1J. Where else
  Causes: 1. Not considered possible
1K. Cooling coils, Temp., Low
  Causes: 1. Low water supply temperature
  Consequences: 1. None, controller handles
  Actions: 1. None
1L. High
  Causes: 1. High water supply temperature
  Consequences: 1. Cooling system capacity limited, temp. increases
  Actions: 1. Install high flow alarm and/or cooling water high temp. alarm (JW, 1/93)
2A. Stirrer, Agitation, No
  Causes: 1. Stirrer motor malfunction; 2. Power failure
  Consequences: 1. No mixing, possible accumulation of unreacted materials; 2. Monomer feed continues, possible accumulation of unreacted materials
  Actions: 1. Interlock with feed line (JW, 1/93); 2. Monomer feed valve must fail closed on power loss (JW, 2/93)
2B. More
  Causes: 1. Stirrer motor controller fails, resulting in high motor speed
  Consequences: 1. None
The potential process modifications resulting from this study (Example 10-2) are the following:
- install a high-temperature alarm to alert the operator in the event of cooling function loss,
- install a high-temperature shutdown system (this system would automatically shut down the process in the event of a high reactor temperature; the shutdown temperature would be higher than the alarm temperature to provide the operator with the opportunity to restore cooling before the reactor is shut down),
- install a check valve in the cooling line to prevent reverse flow (a check valve could be installed both before and after the reactor to prevent the reactor contents from flowing upstream and to prevent the backflow in the event of a leak in the coils),
- periodically inspect the cooling coil to ensure its integrity,
- study the cooling water source to consider possible contamination and interruption of supply,
- install a cooling water flow meter and low-flow alarm (which will provide an immediate indication of cooling loss).
In the event that the cooling water system fails (regardless of the source of the failure), the high-temperature alarm and emergency shutdown system prevents a runaway reaction. The review committee performing the HAZOP study decided that the installation of a backup controller and control valve was not essential. The high-temperature alarm and shutdown system prevents a runaway reaction in this event. Similarly, a loss of coolant water source or a plugged cooling line would be detected by either the alarm or the emergency shutdown system. The review committee suggested that all coolant water failures be properly reported and that if a particular cause occurred repeatedly, then additional process modifications were warranted.
Example 10-2 demonstrates that the number of suggested process changes is great, although only a single process intention is considered.
The advantage to this approach is that it provides a more complete identification of the hazards, including information on how hazards can develop as a result of operating procedures and operational upsets in the process. Companies that perform detailed HAZOP studies find that their processes operate better and have less down time, that their product quality is improved, that less waste is produced, and that their employees are more confident in the safety of the process. The disadvantages are that the HAZOP approach is tedious to apply, requires considerable staff time, and can potentially identify hazards independent of the risk.
[Figure: phosgene (COCl2) supply feeding the reactor, with reflux condenser, caustic (NaOH) scrubber, and vent]
Figure 10-9 Original design of phosgene reactor before informal safety review.
The informal safety review is used for small changes to existing processes and for small
bench-scale or laboratory processes. The informal safety review procedure usually involves
just two or three people. It includes the individual responsible for the process and one or two
others not directly associated with the process but experienced with proper safety procedures.
The idea is to provide a lively dialogue where ideas can be exchanged and safety improvements
can be developed.
The reviewers simply meet in an informal fashion to examine the process equipment and
operating procedures and to offer suggestions on how the safety of the process might be improved. Significant improvements should be summarized in a memo for others to reference in
the future. The improvements must be implemented before the process is operated.
Example 10-3
Consider the laboratory reactor system shown in Figure 10-9. This system is designed to react phosgene (COCl2) with aniline to produce isocyanate and HCl. The reaction is shown in Figure 10-10.
The isocyanate is used for the production of foams and plastics.
Phosgene is a colorless vapor with a boiling point of 46.8°F. Thus it is normally stored as a liquid in a container under pressure above its normal boiling point temperature. The TLV for phosgene is 0.1 ppm, and its odor threshold is 0.5-1 ppm, well above the TLV.
Aniline is a liquid with a boiling point of 364°F. Its TLV is 2 ppm. It is absorbed through
the skin.
C6H5NH2 (aniline) + COCl2 -> C6H5NCO (isocyanate) + 2 HCl
Figure 10-10 Reaction stoichiometry for phosgene reactor.
[Figure: phosgene (COCl2) supply with flow indicator, reactor, reflux condenser, vacuum control, relief trap, and caustic solution scrubbers (50% NaOH, 20% NH4OH)]
Figure 10-11 Final design of phosgene reactor after informal safety review.
In the process shown in Figure 10-9 the phosgene is fed from the container through a valve
into a fritted glass bubbler in the reactor. The reflux condenser condenses aniline vapors and re-
turns them to the reactor. A caustic scrubber is used to remove the phosgene and HCl vapors from
the exit vent stream. The complete process is contained in a hood.
Conduct an informal safety review on this process.
Solution
The safety review was completed by two individuals. The final process design is shown in Fig-
ure 10-11. The changes and additions to the process are as follows:
In addition, the reviewers recommended the following: (1) Hang phosgene indicator paper around
the hood, room, and operating areas (this paper is normally white but turns brown when exposed
to 0.1 ppm of phosgene), (2) use a safety checklist, daily, before the process is started, and (3) post
an up-to-date process sketch near the process.
The formal safety review is used for new processes, substantial changes in existing processes, and processes that need an updated review. The formal safety review is a three-step procedure. This consists in preparing a detailed formal safety review report, having a committee review the report and inspect the process, and implementing the recommendations. The formal safety review report includes the following sections:
I. Introduction
A. Overview or summary: Provides a brief summary of the results of the formal safety
review. This is done after the formal safety review is complete.
B. Process overview or summary: Provides a brief description of the process with an
emphasis on the major hazards in the operation.
C. Reactions and stoichiometry: Provides the chemical reaction equations and stoi-
chiometry.
D. Engineering data: Provides operating temperatures, pressures, and relevant physical
property data for the materials used.
II. Raw materials and products: Refers to specific hazards and handling problems associated
with the raw materials and products. Discusses procedures to minimize these hazards.
III. Equipment setup
A. Equipment description: Describes the configuration of the equipment. Sketches of
the equipment are provided.
B. Equipment specifications: Identifies the equipment by manufacturer name and model
number. Provides the physical data and design information associated with the
equipment.
IV. Procedures
A. Normal operating procedures: Describes how the process is operated.
B. Safety procedures: Provides a description of the unique concerns associated with the
equipment and materials and specific procedures used to minimize the risk. This in-
cludes:
1. Emergency shutdown: Describes the procedure used to shut down the equipment
if an emergency should occur. This includes major leaks, reactor runaway, and loss
of electricity, water, and air pressure.
2. Fail-safe procedures: Examines the consequences of utility failures, such as loss of
steam, electricity, water, air pressure, or inert padding. Describes what to do for
each case so that the system fails safely.
3. Major release procedures: Describes what to do in the event of a major spill of toxic
or flammable material.
C. Waste disposal procedure: Describes how toxic or hazardous materials are collected,
handled, and disposed.
D. Cleanup procedures: Describes how to clean the process after use.
V. Safety checklist: Provides the complete safety checklist for the operator to complete be-
fore operation of the process. This checklist is used before every startup.
VI. Material safety data sheets: Provided for each hazardous material used.
Figure 10-12 Toluene water wash process before formal safety review. (Labels recovered from the
figure: dirty toluene storage, Pod, water, clean toluene.)
Example 10-4
A toluene water wash process is shown in Figure 10-12. This process is used to clean water-soluble
impurities from contaminated toluene. The separation is achieved with a Podbielniak centrifuge,
or Pod, because of a difference in densities. The light phase (contaminated toluene) is fed to the
periphery of the centrifuge and travels to the center. The heavy phase (water) is fed to the center
and travels countercurrent to the toluene to the periphery of the centrifuge. Both phases are mixed
within the centrifuge and separated countercurrently. The extraction is conducted at 190°F.
The contaminated toluene is fed from a storage tank into the Pod. The heavy liquid out
(contaminated water) is sent to waste treatment and the light liquid out (clean toluene) is
collected in a 55-gal drum.
Perform a formal safety review on this process.
Solution
The complete safety review report is provided in appendix D. Figure 10-13 shows the modified pro-
cess after the formal safety review has been completed. The significant changes or additions added
as a result of the review are as follows:
1. add grounding and bonding to all collection and storage drums and process vessels,
2. add inerting and purging to all drums,
3. add elephant trunks at all drums to provide ventilation,
4. provide dip legs in all drums to prevent the free fall of solvent resulting in the generation and
accumulation of static charge,
5. add a charge drum with grounding, bonding, inerting, and ventilation,
6. provide a vacuum connection to the dirty toluene storage for charging,
7. add a relief valve to the dirty toluene storage tank,
8. add heat exchangers to all outlet streams to cool the exit solvents below their flash point (this
must include temperature gauges to ensure proper operation), and
Figure 10-13 Toluene water wash process after formal safety review. (Labels recovered from the
figure: vacuum, dirty toluene storage, dirty toluene charge, Pod, N2, water, clean toluene.)
9. provide a waste water collection drum to collect all waste water that might contain substan-
tial amounts of toluene from upset conditions.
Additional changes were made in the operating and emergency procedure. They included
1. checking the room air periodically with colorimetric tubes to determine whether any
toluene vapors are present and
2. changing the emergency procedure for spills to include (a) activating the spill alarm, (b) in-
creasing the ventilation to high speed, and (c) throwing the sewer isolation switch to prevent
solvent from entering the main sewer lines.
The formal safety review can be used almost immediately, is relatively easy to apply, and is
known to provide good results. However, the committee participants must be experienced in
identifying safety problems. For less experienced committees, a more formal HAZOP study
may be more effective in identifying the hazards.
2. Human error analysis: This method is used to identify the parts and the procedures of a
process that have a higher than normal probability of human error. Control panel layout
is an excellent application for human error analysis because a control panel can be
designed in such a fashion that human error is inevitable.
3. Failure mode, effects, and criticality analysis (FMECA): This method tabulates a list of
equipment in the process along with all the possible failure modes for each item. The
effect of a particular failure is considered with respect to the process.
Suggested Reading
Dow's Fire and Explosion Index Hazard Classification Guide, 7th ed. (New York: American Institute of
Chemical Engineers, 1994).
Guidelines for Hazard Evaluation Procedures, 2d ed. (New York: American Institute of Chemical
Engineers, 1992).
Trevor A. Kletz, HAZOP and HAZAN, 3d ed. (Warwickshire, England: Institution of Chemical
Engineers, 1992).
Frank P. Lees, Loss Prevention in the Process Industries, 2d ed. (London: Butterworths, 1996), ch. 8.
Problems
10-1. The hydrolysis of acetic anhydride is being studied in a laboratory-scale continuously
stirred tank reactor (CSTR). In this reaction acetic anhydride [(CH3CO)2O] reacts with
water to produce acetic acid (CH3COOH).
The concentration of acetic anhydride at any time in the CSTR is determined by
titration with sodium hydroxide. Because the titration procedure requires time (rela-
tive to the hydrolysis reaction time), it is necessary to quench the hydrolysis reaction as
soon as the sample is taken. The quenching is achieved by adding an excess of aniline
to the sample. The quench reaction is
The quenching reaction also forms acetic acid, but in a different stoichiometric ratio
than the hydrolysis reaction. Thus it is possible to determine the acetic anhydride con-
centration at the time the sample was taken.
The initial experimental design is shown in Figure 10-14. Water and acetic anhy-
dride are gravity-fed from reservoirs and through a set of rotameters. The water is
mixed with the acetic anhydride just before it enters the reactor. Water is also circulated
by a centrifugal pump from the temperature bath through coils in the reactor vessel.
This maintains the reactor temperature at a fixed value. A temperature controller in the
water bath maintains the temperature to within 1°F of the desired temperature.
CHAPTER 11
Risk Assessment
bers are used with the typical HAZOP study, although the experience of the review committee
is used to decide on an appropriate course of action.
In this chapter we will
We focus on determining the frequency of accident scenarios. The last two sections show
how the frequencies are used in QRA and LOPA studies; LOPA is a simplified QRA. It should
be emphasized that the teachings of this chapter are all easy to use and to apply, and the results
are often the basis for significantly improving the design and operation of chemical and
petrochemical plants.
$R(t) = e^{-\mu t}$, (11-1)
where $R$ is the reliability. Equation 11-1 assumes a constant failure rate $\mu$. As $t \to \infty$,
the reliability goes to 0. The speed at which this occurs depends on the value of the failure
rate $\mu$. The higher the failure rate, the faster the reliability decreases. Other and more
complex distributions are available. This simple exponential distribution is the one that is
used most commonly because it requires only a single parameter, $\mu$. The complement of the
reliability is called the failure probability (or sometimes the unreliability), $P$, and it is
given by
$P(t) = 1 - R(t) = 1 - e^{-\mu t}$. (11-2)
The failure density function is defined as the derivative of the failure probability:
$f(t) = \frac{dP(t)}{dt} = \mu e^{-\mu t}$. (11-3)
The probability of a failure between times $t_0$ and $t_1$ is
$P(t_0 \to t_1) = \int_{t_0}^{t_1} f(t)\,dt = \mu \int_{t_0}^{t_1} e^{-\mu t}\,dt = e^{-\mu t_0} - e^{-\mu t_1}$. (11-4)
B. Roffel and J. E. Rijnsdorp, Process Dynamics, Control, and Protection (Ann Arbor, MI: Ann Arbor
Science, 1982), p. 381.
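[Figure 11-1: plots of (a) the failure rate $\mu$, (b) the failure density $f(t)$, with total
area = 1, (c) the failure probability $P(t)$, and (d) the reliability $R(t) = 1 - P(t)$.]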
The integral represents the fraction of the total area under the failure density function between
time $t_0$ and $t_1$.
The time interval between two failures of the component is called the mean time between
failures (MTBF) and is given by the first moment of the failure density function:
$E(t) = \mathrm{MTBF} = \int_0^{\infty} t f(t)\,dt = \frac{1}{\mu}$. (11-5)
Typical plots of the functions $\mu$, $f$, $P$, and $R$ are shown in Figure 11-1.
Equations 11-1 through 11-5 are valid only for a constant failure rate $\mu$. Many components
exhibit a typical bathtub failure rate, shown in Figure 11-2. The failure rate is highest
when the component is new (infant mortality) and when it is old (old age). Between these two
periods (denoted by the lines in Figure 11-2), the failure rate is reasonably constant and
Equations 11-1 through 11-5 are valid.
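The constant-failure-rate relations in Equations 11-1 through 11-5 can be checked numerically. The following is a minimal Python sketch; the function names and the sample rate of 0.5 faults/yr are illustrative assumptions and do not come from the text.

```python
import math


def reliability(mu, t):
    """Equation 11-1: R(t) = exp(-mu * t) for a constant failure rate mu."""
    return math.exp(-mu * t)


def failure_probability(mu, t):
    """Equation 11-2: P(t) = 1 - R(t)."""
    return 1.0 - reliability(mu, t)


def failure_density(mu, t):
    """Equation 11-3: f(t) = dP/dt = mu * exp(-mu * t)."""
    return mu * math.exp(-mu * t)


def interval_failure_probability(mu, t0, t1):
    """Equation 11-4: probability of a failure between times t0 and t1."""
    return math.exp(-mu * t0) - math.exp(-mu * t1)


def mtbf(mu):
    """Equation 11-5: MTBF = 1 / mu."""
    return 1.0 / mu


if __name__ == "__main__":
    mu = 0.5  # faults/yr, hypothetical value chosen only for illustration
    for t in (0.5, 1.0, 2.0):
        print(t, round(reliability(mu, t), 3), round(failure_probability(mu, t), 3))
    print("MTBF (yr):", mtbf(mu))
```

The interval probability starting at time zero reduces to Equation 11-2, which is a quick consistency check on the two functions.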
Figure 11-2 A typical bathtub failure rate curve for process hardware. The failure rate is
approximately constant over the midlife of the component. (Axes: failure rate $\mu$ in
faults/time versus time; the midlife region is marked as the period of approximately constant $\mu$.)
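For process components in parallel (the system fails only if all components fail), the overall
failure probability is
$P = \prod_{i=1}^{n} P_i$, (11-6)
where $n$ is the total number of components and $P_i$ is the failure probability of each component.
For process components in series (the system fails if any component fails), the overall
reliability and failure probability are
$R = \prod_{i=1}^{n} R_i$, (11-8)
$P = 1 - \prod_{i=1}^{n} (1 - P_i)$. (11-9)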
The cross-product term P(A)P(B) compensates for counting the overlapping cases twice. Con-
sider the example of tossing a single die and determining the probability that the number of
points is even or divisible by 3. In this case
The last term subtracts the cases in which both conditions are satisfied.
If the failure probabilities are small (a common situation), the term P(A)P(B) is negligible,
and Equation 11-10 reduces to
$P(A \text{ or } B) = P(A) + P(B)$.
This result is generalized for any number of components. For this special case Equation 11-9
reduces to
$P = \sum_{i=1}^{n} P_i$.
Failure rate data for a number of typical process components are provided in Table 11-1.
These are average values determined at a typical chemical process facility. Actual values would
depend on the manufacturer, materials of construction, the design, the environment, and other
factors. The assumptions in this analysis are that the failures are independent, hard, and not
intermittent and that the failure of one device does not stress adjacent devices to the point that
the failure probability is increased.

Table 11-1 Failure rate data for typical process components
Component                                  Failure rate (faults/yr)
Controller                                 0.29
Control valve                              0.60
Flow measurement (fluids)                  1.14
Flow measurement (solids)                  3.75
Flow switch                                .12
Gas-liquid chromatograph                   30.6
Hand valve                                 0.13
Indicator lamp                             0.044
Level measurement (liquids)                1.70
Level measurement (solids)                 6.86
Oxygen analyzer                            5.65
pH meter                                   5.88
Pressure measurement                       1.41
Pressure relief valve                      0.022
Pressure switch                            0.14
Solenoid valve                             0.42
Stepper motor                              0.044
Strip chart recorder                       0.22
Thermocouple temperature measurement       .52
Thermometer temperature measurement        0.027
Valve positioner                           0.44

[Figure 11-3 (fragment): Parallel link of components (AND): the failure of the system requires
the failure of both components, so $P = P_1 P_2$ and $R = 1 - (1 - R_1)(1 - R_2)$, with
$\mu = (-\ln R)/t$. The remainder of the figure, including the series case, was not recovered.]
A summary of computations for parallel and series process components is shown in
Figure 11-3.
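The series and parallel rules can be sketched as a few small helper functions. This is a minimal illustration, assuming independent failures as stated above; the function names and the two sample probabilities (0.1 and 0.2) are hypothetical and are not taken from the text.

```python
from math import log, prod


def parallel_failure_probability(probabilities):
    """Parallel (AND) link, Equation 11-6: the system fails only if every component fails."""
    return prod(probabilities)


def series_reliability(reliabilities):
    """Series (OR) link, Equation 11-8: the system works only if every component works."""
    return prod(reliabilities)


def series_failure_probability(probabilities):
    """Series (OR) link, Equation 11-9: P = 1 - prod(1 - P_i)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def equivalent_failure_rate(reliability, period=1.0):
    """Overall failure rate implied by a system reliability over a period: mu = -ln(R) / t."""
    return -log(reliability) / period


if __name__ == "__main__":
    p = [0.1, 0.2]  # hypothetical component failure probabilities
    print("series P   =", series_failure_probability(p))
    print("parallel P =", parallel_failure_probability(p))
    print("series R   =", series_reliability([1.0 - pi for pi in p]))
```

For the same components, the series arrangement always has the larger failure probability, since any single failure brings the system down.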
Example 11-1
The water flow to a chemical reactor cooling coil is controlled by the system shown in Figure 11-4.
The flow is measured by a differential pressure (DP) device, the controller decides on an
appropriate control strategy, and the control valve manipulates the flow of coolant. Determine the
overall failure rate, the unreliability, the reliability, and the MTBF for this system. Assume a
1-yr period of operation.
Figure 11-4 Flow control system. The components of the control system are linked in series.
(Labels recovered from the figure: pump, control valve, flow meter, controller FIC.)
Solution
These process components are related in series. Thus, if any one of the components fails, the en-
tire system fails. The reliability and failure probability are computed for each component using
Equations 11-1 and 11-2. The results are shown in the following table. The failure rates are from
Table 11-1.
Component    Failure rate $\mu$ (faults/yr)    Reliability $R = e^{-\mu t}$    Failure probability $P = 1 - R$
The overall reliability for components in series is computed using Equation 11-8. The result is
$R = 0.10$, and the overall failure rate is
$\mu = -\ln(0.10) = 2.30$ failures/yr.
The MTBF is computed using Equation 11-5:
$\mathrm{MTBF} = \frac{1}{\mu} = 0.43$ yr.
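The series calculation for this example can be reproduced in a few lines. The sketch below is illustrative only: it reads the failure rates from Table 11-1 and assumes that the DP flow device corresponds to the "pressure measurement" entry (1.41 faults/yr), which is consistent with the overall rate of 2.30 failures/yr quoted in the solution. The variable names are hypothetical.

```python
import math

# Failure rates in faults/yr from Table 11-1. Mapping the DP device to the
# "pressure measurement" entry is an assumption of this sketch.
failure_rates = {
    "control valve": 0.60,
    "controller": 0.29,
    "DP cell": 1.41,
}
t = 1.0  # period of operation, yr

# Equations 11-1 and 11-2 for each component.
reliabilities = {name: math.exp(-mu * t) for name, mu in failure_rates.items()}

# Series link: Equation 11-8 for the system reliability.
system_reliability = math.prod(reliabilities.values())
system_failure_probability = 1.0 - system_reliability
system_failure_rate = -math.log(system_reliability) / t  # faults/yr
system_mtbf = 1.0 / system_failure_rate  # yr, Equation 11-5

for name, r in reliabilities.items():
    print(f"{name}: R = {r:.2f}, P = {1.0 - r:.2f}")
print(f"system: R = {system_reliability:.2f}, P = {system_failure_probability:.2f}")
print(f"system: mu = {system_failure_rate:.2f} faults/yr, MTBF = {system_mtbf:.2f} yr")
```

For components in series with constant failure rates, the overall failure rate is simply the sum of the component rates, which is why the least reliable instrument dominates the result.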
Figure 11-5 A chemical reactor with an alarm and an inlet feed solenoid. The alarm and feed
shutdown systems are linked in parallel. (Labels recovered from the figure: pressure switch,
alarm, reactor feed, solenoid valve, reactor.)
Example 11-2
A diagram of the safety systems in a certain chemical reactor is shown in Figure 11-5. This reactor
contains a high-pressure alarm to alert the operator in the event of dangerous reactor pressures. It
consists of a pressure switch within the reactor connected to an alarm light indicator. For additional
safety an automatic high-pressure reactor shutdown system is installed. This system is activated at
a pressure somewhat higher than the alarm system and consists of a pressure switch connected to a
solenoid valve in the reactor feed line. The automatic system stops the flow of reactant in the event
of dangerous pressures. Compute the overall failure rate, the failure probability, the reliability, and
the MTBF for a high-pressure condition. Assume a 1-yr period of operation. Also, develop an
expression for the overall failure probability based on the component failure probabilities.
Solution
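Failure rate data are available from Table 11-1. The reliability and failure probabilities of each
component are computed using Equations 11-1 and 11-2:
Component               Failure rate $\mu$ (faults/yr)   Reliability $R = e^{-\mu t}$   Failure probability $P = 1 - R$
1. Pressure switch 1    0.14                             0.87                           0.13
2. Alarm indicator      0.044                            0.96                           0.04
3. Pressure switch 2    0.14                             0.87                           0.13
4. Solenoid valve       0.42                             0.66                           0.34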
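A dangerous high-pressure reactor situation occurs only when both the alarm system and the
shutdown system fail. These two components are in parallel. For the alarm system the components
are in series:
$R = \prod_{i=1}^{2} R_i = (0.87)(0.96) = 0.835$,
$P = 1 - R = 1 - 0.835 = 0.165$,
$\mu = -\ln R = -\ln(0.835) = 0.180$ faults/yr,
$\mathrm{MTBF} = \frac{1}{\mu} = 5.5$ yr.
For the shutdown system the components are also in series:
$R = \prod_{i=1}^{2} R_i = (0.87)(0.66) = 0.574$,
$P = 1 - R = 1 - 0.574 = 0.426$,
$\mu = -\ln R = -\ln(0.574) = 0.555$ faults/yr,
$\mathrm{MTBF} = \frac{1}{\mu} = 1.80$ yr.
The two systems are combined using Equation 11-6:
$P = \prod_{i=1}^{2} P_i = (0.165)(0.426) = 0.070$,
$R = 1 - P = 0.930$,
$\mu = -\ln R = -\ln(0.930) = 0.073$ faults/yr,
$\mathrm{MTBF} = \frac{1}{\mu} = 13.7$ yr.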
For the alarm system alone a failure is expected once every 5.5 yr. Similarly, for a reactor with a high-
pressure shutdown system alone, a failure is expected once every 1.80 yr. However, with both sys-
tems in parallel the MTBF is significantly improved and a combined failure is expected every 13.7 yr.
The overall failure probability is given by
$P = P(A)P(S)$,
where P(A) is the failure probability of the alarm system and P(S) is the failure probability of the
emergency shutdown system. An alternative procedure is to invoke Equation 11-9 directly. For the
alarm system
$P(A) = P_1 + P_2 - P_1 P_2$.
If the cross-product terms are neglected, then $P(A) = P_1 + P_2$ and $P(S) = P_3 + P_4$, and
$P = P(A)P(S) = (P_1 + P_2)(P_3 + P_4) = 0.080$.
The difference between this answer and the answer obtained previously is 14.3%. The component
probabilities are not small enough in this example to assume that the cross-products are negligible.
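The comparison between the exact result and the small-probability shortcut can be sketched as follows. This is an illustrative calculation with hypothetical variable names; it uses the unrounded component probabilities computed from the Table 11-1 failure rates, so its figures differ slightly in the third decimal place from the values above, which carry two-digit intermediate results.

```python
import math

t = 1.0  # period of operation, yr

# Failure rates in faults/yr from Table 11-1.
rates = {"pressure switch": 0.14, "indicator lamp": 0.044, "solenoid valve": 0.42}
p = {name: 1.0 - math.exp(-mu * t) for name, mu in rates.items()}

p1, p2 = p["pressure switch"], p["indicator lamp"]  # alarm system
p3, p4 = p["pressure switch"], p["solenoid valve"]  # shutdown system

# Exact series (OR) combination inside each system, Equation 11-9.
p_alarm = p1 + p2 - p1 * p2
p_shutdown = p3 + p4 - p3 * p4

# The two systems are in parallel (AND): both must fail.
p_exact = p_alarm * p_shutdown
p_approx = (p1 + p2) * (p3 + p4)  # cross-products neglected

overall_rate = -math.log(1.0 - p_exact) / t
print(f"P(A) = {p_alarm:.3f}, P(S) = {p_shutdown:.3f}")
print(f"exact P = {p_exact:.3f}, approximate P = {p_approx:.3f}")
print(f"overall rate = {overall_rate:.3f} faults/yr, MTBF = {1.0 / overall_rate:.1f} yr")
```

The shortcut always overestimates the failure probability because it ignores the overlap between component failures, and the gap grows as the component probabilities grow.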
Figure 11-6 Component cycles for revealed failures. A failure requires a period of time for repair.
(Labels recovered from the figure: operational, failed, $\tau_0$, $\tau_r$, MTBF, time.)
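For revealed failures the period of inactivity or downtime for a particular component is
computed by averaging the inactive period for a number of failures:
$\tau_r \cong \frac{1}{n}\sum_{i=1}^{n} \tau_{r_i}$, (11-12)
where $n$ is the number of failures and $\tau_{r_i}$ is the repair period for a particular failure.
Similarly, the period of operation is given by
$\tau_0 \cong \frac{1}{n}\sum_{i=1}^{n} \tau_{0_i}$, (11-13)
where $\tau_{0_i}$ is the period of operation between a particular set of failures.
The MTBF is the sum of the period of operation and the repair period:
$\mathrm{MTBF} = \frac{1}{\mu} = \tau_r + \tau_0$. (11-14)
The availability $A$ and the unavailability $U$ are related by
$A + U = 1$. (11-15)
The quantity $\tau_0$ represents the period that the process is in operation, and $\tau_r + \tau_0$
represents the total time. By definition, it follows that the availability is given by
$A = \frac{\tau_0}{\tau_r + \tau_0}$, (11-16)
and, similarly, the unavailability is
$U = \frac{\tau_r}{\tau_r + \tau_0}$. (11-17)
By combining Equations 11-16 and 11-17 with the result of Equation 11-14, we can write the
equations for the availability and unavailability for revealed failures:
$U = \mu\tau_r$,
$A = \mu\tau_0$. (11-18)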
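For unrevealed failures the failure becomes obvious only after regular inspection. This
situation is shown in Figure 11-7. If $\tau_u$ is the average period of unavailability during the
inspection interval and if $\tau_i$ is the inspection interval, then
$U = \frac{\tau_u}{\tau_i}$. (11-19)
The average period of unavailability is computed from the failure probability:
$\tau_u = \int_0^{\tau_i} P(t)\,dt$. (11-20)
Combining this with Equation 11-19 gives
$U = \frac{1}{\tau_i}\int_0^{\tau_i} P(t)\,dt$. (11-21)
[Figure 11-7. Labels recovered from the figure: failed, $\tau_u$, $\tau_i$, time.]
The failure probability $P(t)$ is given by Equation 11-2. This is substituted into Equation 11-21
and integrated. The result is
$U = 1 - \frac{1}{\mu\tau_i}\left(1 - e^{-\mu\tau_i}\right)$. (11-22)
For small failure rates the failure probability is approximated by
$P(t) \approx \mu t$, (11-24)
and Equation 11-21 is integrated to give
$U = \frac{1}{2}\mu\tau_i$. (11-25)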
This is a useful and convenient result. It demonstrates that, on average, for unrevealed failures
the process or component is unavailable during a period equal to half the inspection interval. A
decrease in the inspection interval is shown to increase the availability of an unrevealed failure.
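The step from Equation 11-22 to Equation 11-25 can be checked with a series expansion of the exponential. The following is a sketch of that reasoning, under the assumption that $\mu\tau_i \ll 1$:

```latex
\begin{aligned}
U &= 1 - \frac{1}{\mu\tau_i}\left(1 - e^{-\mu\tau_i}\right), \\
e^{-\mu\tau_i} &\approx 1 - \mu\tau_i + \tfrac{1}{2}(\mu\tau_i)^2
  \qquad (\mu\tau_i \ll 1), \\
U &\approx 1 - \frac{1}{\mu\tau_i}\left(\mu\tau_i - \tfrac{1}{2}(\mu\tau_i)^2\right)
  = \tfrac{1}{2}\,\mu\tau_i .
\end{aligned}
```

The same result follows from integrating the linearized failure probability directly: $\frac{1}{\tau_i}\int_0^{\tau_i}\mu t\,dt = \frac{1}{2}\mu\tau_i$.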
Equations 11-19 through 11-25 assume a negligible repair time. This is usually a valid
assumption because on-line process equipment is generally repaired within hours, whereas the
inspection intervals are usually monthly.
Example 11-3
Compute the availability and the unavailability for both the alarm and the shutdown systems of
Example 11-2. Assume that a maintenance inspection occurs once every month and that the repair time
is negligible.
Solution
Both systems demonstrate unrevealed failures. For the alarm system the failure rate is $\mu = 0.18$
faults/yr. The inspection period is 1/12 = 0.083 yr. The unavailability is computed using
Equation 11-25:
$U = \frac{1}{2}\mu\tau_i = (1/2)(0.18)(0.083) = 0.0075$,
$A = 1 - U = 0.992$.
The alarm system is available 99.2% of the time. For the shutdown system $\mu = 0.55$ faults/yr. Thus
$U = \frac{1}{2}\mu\tau_i = (1/2)(0.55)(0.083) = 0.023$,
$A = 1 - 0.023 = 0.977$.
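The unavailability figures in this example can be reproduced, and compared with the unapproximated expression of Equation 11-22, with a short script. This is a minimal sketch; the function names are hypothetical.

```python
import math


def unavailability_exact(mu, tau_i):
    """Equation 11-22: average unavailability over one inspection interval."""
    x = mu * tau_i
    return 1.0 - (1.0 - math.exp(-x)) / x


def unavailability_approx(mu, tau_i):
    """Equation 11-25: small-failure-rate approximation U = mu * tau_i / 2."""
    return 0.5 * mu * tau_i


tau_i = 1.0 / 12.0  # monthly inspection interval, yr

# Failure rates in faults/yr, as quoted in the example solution.
for name, mu in (("alarm", 0.18), ("shutdown", 0.55)):
    u = unavailability_approx(mu, tau_i)
    u_exact = unavailability_exact(mu, tau_i)
    print(f"{name}: U = {u:.4f} (Equation 11-22 gives {u_exact:.4f}), A = {1.0 - u:.3f}")
```

With monthly inspections the product of failure rate and inspection interval is a few hundredths at most, so the approximation and the full expression agree closely.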
Probability of Coincidence
All process components demonstrate unavailability as a result of a failure. For alarms
and emergency systems it is unlikely that these systems will be unavailable when a dangerous
process episode occurs. The danger results only when a process upset occurs and the emer-
gency system is unavailable. This requires a coincidence of events.
Assume that a dangerous process episode occurs $p_d$ times in a time interval $T_i$. The
frequency of this episode is given by
$\lambda = \frac{p_d}{T_i}$. (11-26)
For an emergency system with unavailability $U$, a dangerous situation will occur only when the
process episode occurs and the emergency system is unavailable. This is every $p_d U$ episodes.
The average frequency of dangerous episodes $\lambda_d$ is the number of dangerous coincidences
divided by the time period:
$\lambda_d = \frac{p_d U}{T_i} = \lambda U$. (11-27)
For small failure rates $U = \frac{1}{2}\mu\tau_i$ and $p_d = \lambda T_i$. Substituting into Equation 11-27 yields
$\lambda_d = \frac{1}{2}\lambda\mu\tau_i$. (11-28)
The mean time between coincidences (MTBC) is the reciprocal of the average frequency of
dangerous coincidences:
$\mathrm{MTBC} = \frac{1}{\lambda_d} = \frac{2}{\lambda\mu\tau_i}$. (11-29)
Example 11-4
For the reactor of Example 11-3 a high-pressure incident is expected once every 14 months. Com-
pute the MTBC for a high-pressure excursion and a failure in the emergency shutdown device. As-
sume that a maintenance inspection occurs every month.
Solution
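The frequency of process episodes is given by Equation 11-26:
$\lambda = \frac{1 \text{ episode}}{(14/12) \text{ yr}} = 0.857$/yr.
The unavailability of the shutdown system is $U = 0.023$ (Example 11-3). The average frequency of
dangerous coincidences is given by Equation 11-27:
$\lambda_d = \lambda U = (0.857)(0.023) = 0.020$/yr.
The MTBC is, from Equation 11-29,
$\mathrm{MTBC} = \frac{1}{\lambda_d} = 50$ yr.
It is expected that a simultaneous high-pressure incident and failure of the emergency shutdown
device will occur once every 50 yr.
If the inspection interval $\tau_i$ is halved, then $U = 0.0115$, $\lambda_d = 0.010$, and the resulting
MTBC is 100 yr. This is a significant improvement and shows why a proper and timely maintenance
program is important.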
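The effect of the inspection interval on the MTBC can be sketched directly from Equation 11-29. The function name below is hypothetical, and the calculation uses unrounded inputs, so it returns values slightly above the rounded figures quoted in the example.

```python
def mtbc(episode_frequency, failure_rate, inspection_interval):
    """Equation 11-29: MTBC = 2 / (lambda * mu * tau_i), in consistent time units."""
    return 2.0 / (episode_frequency * failure_rate * inspection_interval)


lam = 12.0 / 14.0  # one high-pressure episode every 14 months, episodes/yr
mu = 0.55  # shutdown system failure rate, faults/yr (Example 11-3)
monthly = 1.0 / 12.0  # inspection interval, yr

print(f"monthly inspection: MTBC = {mtbc(lam, mu, monthly):.0f} yr")
print(f"interval halved:    MTBC = {mtbc(lam, mu, monthly / 2.0):.0f} yr")
```

Because the MTBC is inversely proportional to the inspection interval, halving the interval doubles the expected time between dangerous coincidences.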
Redundancy²
Systems are designed to function normally even when a single instrument or control
function fails. This is achieved with redundant controls, including two or more measurements,
processing paths, and actuators that ensure that the system operates safely and reliably. The
degree of redundancy depends on the hazards of the process and on the potential for economic
losses. An example of a redundant temperature measurement is an additional temperature
probe. An example of a redundant temperature control loop is an additional temperature probe,
controller, and actuator (for example, cooling water control valve).
If appropriate data are available, the procedure is used to assign numerical values to the vari-
ous events. This is used effectively to determine the probability of a certain sequence of events
and to decide what improvements are required.
² S. S. Grossel and D. A. Crowl, eds. Handbook of Highly Toxic Materials Handling and Management
(New York: Marcel Dekker, 1995), p. 264.
³ Guidelines for Hazard Evaluation Procedures, 2d ed. (New York: American Institute of Chemical
Engineers, 1992).
[Figure 11-8. Labels recovered from the figure: reactor feed, cooling coils, cooling water in,
cooling water out, reactor, temperature controller, thermocouple, high-temperature alarm.]
Consider the chemical reactor system shown in Figure 11-8. This system is identical to the
system shown in Figure 10-6, except that a high-temperature alarm has been installed to warn
the operator of a high temperature within the reactor. The event tree for a loss-of-coolant ini-
tiating event is shown in Figure 11-9. Four safety functions are identified. These are written
across the top of the sheet. The first safety function is the high-temperature alarm. The second
safety function is the operator noticing the high reactor temperature during normal inspection.
The third safety function is the operator reestablishing the coolant flow by correcting the prob-
lem in time. The final safety function is invoked by the operator performing an emergency shut-
down of the reactor. These safety functions are written across the page in the order in which they
logically occur.
The event tree is written from left to right. The initiating event is written first in the cen-
ter of the page on the left. A line is drawn from the initiating event to the first safety function.
At this point the safety function can either succeed or fail. By convention, a successful opera-
tion is drawn by a straight line upward and a failure is drawn downward. Horizontal lines are
drawn from these two states to the next safety function.
If a safety function does not apply, the horizontal line is continued through the safety
function without branching. For this example, the upper branch continues through the second
function, where the operator notices the high temperature. If the high-temperature alarm op-
erates properly, the operator will already be aware of the high-temperature condition. The se-
quence description and consequences are indicated on the extreme right-hand side of the event
tree. The open circles indicate safe conditions, and the circles with the crosses represent unsafe
conditions.
[Figure 11-9 (fragment). Column headings recovered from the figure: safety functions and
identifiers: high-temperature alarm alerts operator (B), operator notices high temperature (C),
operator restarts cooling (D), operator shuts down reactor (E). Initiating event A: 1 occurrence/yr.]
Figure 11-10 The computational sequence across a safety function in an event tree. (Values
recovered from the figure: a safety function with 0.01 failures/demand and a branch carrying
0.5 occurrences/yr.)
The lettering notation in the sequence description column is useful for identifying the
particular event. The letters indicate the sequence of failures of the safety systems. The initiating
event is always included as the first letter in the notation. An event tree for a different initiating
event in this study would use a different letter. For the example here, the lettering sequence
ADE represents initiating event A followed by failure of safety functions D and E.
The event tree can be used quantitatively if data are available on the failure rates of the
safety functions and the occurrence rate of the initiation event. For this example assume that
a loss-of-cooling event occurs once a year. Let us also assume that the hardware safety
functions fail 1% of the time they are placed in demand. This is a failure rate of 0.01
failure/demand. Also assume that the operator will notice the high reactor temperature 3 out of 4 times
and that 3 out of 4 times the operator will be successful at reestablishing the coolant flow. Both
of these cases represent a failure rate of 1 time out of 4, or 0.25 failure/demand. Finally, it is es-
timated that the operator successfully shuts down the system 9 out of 10 times. This is a failure
rate of 0.10 failure /demand.
The failure rates for the safety functions are written below the column headings. The oc-
currence frequency for the initiating event is written below the line originating from the initi-
ating event.
The computational sequence performed at each junction is shown in Figure 11-10. Again,
the upper branch, by convention, represents a successful safety function and the lower branch
represents a failure. The frequency associated with the lower branch is computed by multiply-
ing the failure rate of the safety function times the frequency of the incoming branch. The fre-
quency associated with the upper branch is computed by subtracting the failure rate of the
safety function from 1 (giving the success rate of the safety function) and then multiplying by
the frequency of the incoming branch.
The net frequency associated with the event tree shown in Figure 11-9 is the sum of the
frequencies of the unsafe states (the states with the circles and x's). For this example the net
frequency is estimated at 0.025 failure per year (sum of failures ADE, ABDE, and ABCDE).
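The branch-by-branch arithmetic can be sketched in code. The following is an illustrative reading of the event tree, not a reproduction of Figure 11-9: it assumes that function C is branched only when the alarm (B) has failed, and that functions D and E apply on every path, which is consistent with the unsafe sequences ADE, ABDE, and ABCDE named above. The function and variable names are hypothetical.

```python
# Failure probabilities per demand for safety functions B to E, as stated in the text.
FAILURE_PER_DEMAND = {"B": 0.01, "C": 0.25, "D": 0.25, "E": 0.10}
INITIATING_FREQUENCY = 1.0  # loss-of-cooling events per year (initiating event A)


def split(frequency, p_fail):
    """Return (success, failure) branch frequencies across one safety function."""
    return frequency * (1.0 - p_fail), frequency * p_fail


def restart_then_shutdown(frequency, label):
    """Apply function D (restart cooling), then E (shutdown) if D fails."""
    continue_operation, d_failed = split(frequency, FAILURE_PER_DEMAND["D"])
    shutdown, runaway = split(d_failed, FAILURE_PER_DEMAND["E"])
    return {label: continue_operation, label + "D": shutdown, label + "DE": runaway}


alarm_ok, alarm_failed = split(INITIATING_FREQUENCY, FAILURE_PER_DEMAND["B"])
# Assumption: function C (operator notices) is branched only when the alarm has failed.
noticed, missed = split(alarm_failed, FAILURE_PER_DEMAND["C"])

outcomes = {}
outcomes.update(restart_then_shutdown(alarm_ok, "A"))
outcomes.update(restart_then_shutdown(noticed, "AB"))
outcomes.update(restart_then_shutdown(missed, "ABC"))

# Unsafe end states are the sequences in which both D and E fail.
runaway_frequency = sum(f for seq, f in outcomes.items() if seq.endswith("DE"))

for seq, f in outcomes.items():
    print(f"{seq:6s} {f:.7f} per yr")
print(f"runaway frequency = {runaway_frequency:.3f} per yr")
```

The end-state frequencies sum to the initiating frequency, which is a useful check that no branch has been dropped.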
This event tree analysis shows that a dangerous runaway reaction will occur on average
0.025 time per year, or once every 40 years. This is considered too high for this installation. A
possible solution is the inclusion of a high-temperature reactor shutdown system. This control
system would automatically shut down the reactor in the event that the reactor temperature
exceeds a fixed value. The emergency shutdown temperature would be higher than the alarm
value to provide an opportunity for the operator to restore the coolant flow.
The event tree for the modified process is shown in Figure 11-11. The additional safety
function provides a backup in the event that the high-temperature alarm fails or the opera-
tor fails to notice the high temperature. The runaway reaction is now estimated to occur
0.00025 time per year, or once every 400 years. This is a substantial improvement obtained by
the addition of a simple redundant shutdown system.
The event tree is useful for providing scenarios of possible failure modes. If quantitative
data are available, an estimate can be made of the failure frequency. This is used most suc-
cessfully to modify the design to improve the safety. The difficulty is that for most real processes
the method can be extremely detailed, resulting in a huge event tree. If a probabilistic compu-
tation is attempted, data must be available for every safety function in the event tree.
An event tree begins with a specified failure and terminates with a number of resulting
consequences. If an engineer is concerned about a particular consequence, there is no certainty
that the consequence of interest will actually result from the selected failure. This is perhaps
the major disadvantage of event trees.
Figure 11-12 A fault tree describing the various events contributing to a flat tire. (Labels
recovered from the figure: OR gate, defective tire, worn tire.)
Events in a fault tree are not restricted to hardware failures. They can also include
software, human, and environmental factors.
For reasonably complex chemical processes a number of additional logic functions are
needed to construct a fault tree. A detailed list is given in Figure 11-13. The AND logic func-
tion is important for describing processes that interact in parallel. This means that the output
state of the AND logic function is active only when both of the input states are active. The
INHIBIT function is useful for events that lead to a failure only part of the time. For instance, driv-
ing over debris in the road does not always lead to a flat tire. The INHIBIT gate could be used
in the fault tree of Figure 11-12 to represent this situation.
Before the actual fault tree is drawn, a number of preliminary steps must be taken:
1. Define precisely the top event. Events such as "high reactor temperature" or "liquid level
too high" are precise and appropriate. Events such as "explosion of reactor" or "fire in
process" are too vague, whereas an event such as "leak in valve" is too specific.
2. Define the existing event. What conditions are sure to be present when the top event
occurs?
3. Define the unallowed events. These are events that are unlikely or are not under con-
sideration at the present. This could include wiring failures, lightning, tornadoes, and
hurricanes.
4. Define the physical bounds of the process. What components are to be considered in the
fault tree?
5. Define the equipment configuration. What valves are open or closed? What are the liq-
uid levels? Is this a normal operation state?
6. Define the level of resolution. Will the analysis consider just a valve, or will it be neces-
sary to consider the valve components?
The next step in the procedure is to draw the fault tree. First, draw the top event at the
top of the page. Label it as the top event to avoid confusion later when the fault tree has spread
out to several sheets of paper.
Second, determine the major events that contribute to the top event. Write these down
as intermediate, basic, undeveloped, or external events on the sheet. If these events are related
in parallel (all events must occur in order for the top event to occur), they must be connected
to the top event by an AND gate. If these events are related in series (any event can occur in
order for the top event to occur), they must be connected by an OR gate. If the new events can-
not be related to the top event by a single logic function, the new events are probably improp-
erly specified. Remember, the purpose of the fault tree is to determine the individual event
steps that must occur to produce the top event.
Now consider any one of the new intermediate events. What events must occur to con-
tribute to this single event? Write these down as either intermediate, basic, undeveloped, or ex-
ternal events on the tree. Then decide which logic function represents the interaction of these
newest events.
Continue developing the fault tree until all branches have been terminated by basic, un-
developed, or external events. All intermediate events must be expanded.
Example 11-5
Consider again the alarm indicator and emergency shutdown system of Example 11-2. Draw a fault
tree for this system.
Solution
The first step is to define the problem.
The top event is written at the top of the fault tree and is indicated as the top event (see Figure 11-14).
Two events must occur for overpressuring: failure of the alarm indicator and failure of the emergency
shutdown system. These events must occur together so they must be connected by an AND
function. The alarm indicator can fail by a failure of either pressure switch 1 or the alarm indicator
light. These must be connected by an OR function. The emergency shutdown system can fail by a
failure of either pressure switch 2 or the solenoid valve. These must also be connected by an OR
function. The complete fault tree is shown in Figure 11-14.
[Fault tree diagram. Failure probability P and reliability R of the basic events:]
Basic event   P      R
1             0.13   0.87
2             0.04   0.96
3             0.13   0.87
4             0.34   0.66
Figure 11-14 Fault tree for Example 11-5.
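The gate logic of Example 11-5 can also be evaluated numerically. The sketch below is illustrative only: the nested-tuple encoding, the names `P`, `TREE`, and `failure_probability` are assumptions made here, and the calculation assumes the basic events are independent and that no event appears in more than one branch. The probabilities are the values shown for basic events 1 to 4 in Figure 11-14.

```python
from math import prod

# Basic-event failure probabilities read from Figure 11-14
# (1: pressure switch 1, 2: alarm indicator light,
#  3: pressure switch 2, 4: solenoid valve).
P = {1: 0.13, 2: 0.04, 3: 0.13, 4: 0.34}

# Hypothetical encoding: a gate is (kind, [inputs]); an int is a basic event.
TREE = ("AND", [("OR", [1, 2]), ("OR", [3, 4])])

def failure_probability(node):
    """Failure probability of a node, assuming independent basic events."""
    if isinstance(node, int):
        return P[node]
    kind, inputs = node
    probs = [failure_probability(child) for child in inputs]
    if kind == "AND":
        # All inputs must fail: multiply the probabilities.
        return prod(probs)
    if kind == "OR":
        # At least one input fails: one minus the product of reliabilities.
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate type: {kind}")

top = failure_probability(TREE)
print(f"P(top event) = {top:.4f}")  # prints P(top event) = 0.0702
```

Encoding the tree as data keeps the AND and OR rules in one place, so a larger tree only needs a different `TREE` value.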
the various sets of events that could lead to the top event. In general, the top event could occur
through a variety of different combinations of events. The different unique sets of events lead-
ing to the top event are the minimal cut sets.
The minimal cut sets are useful for determining the various ways in which a top event
could occur. Some of the minimal cut sets have a higher probability than others. For instance,
a set involving just two events is more likely than a set involving three. Similarly, a set involv-
ing human interaction is more likely to fail than one involving hardware alone. Based on these
simple rules, the minimal cut sets are ordered with respect to failure probability. The higher
probability sets are examined carefully to determine whether additional safety systems are
required.
The minimal cut sets are determined using a procedure developed by Fussell and Vesely (see footnote 4).
The procedure is best described using an example.
4. J. B. Fussell and W. E. Vesely, "A New Methodology for Obtaining Cut Sets for Fault Trees," Transactions of the American Nuclear Society (1972), 15.
Example 11-6
Determine the minimal cut sets for the fault tree of Example 11-5.
Solution
The first step in the procedure is to label all the gates using letters and to label all the basic events
using numbers. This is shown in Figure 11-14. The first logic gate below the top event is written:
A
AND gates increase the number of events in the cut sets, whereas OR gates lead to more sets. Logic
gate A in Figure 11-14 has two inputs: one from gate B and the other from gate C. Because gate A
is an AND gate, gate A is replaced by gates B and C
B  C
Gate B has inputs from event 1 and event 2. Because gate B is an OR gate, gate B is replaced by
adding an additional row below the present row. First, replace gate B by one of the inputs, and then
create a second row below the first. Copy into this new row all the entries in the remaining column
of the first row:
1  C
2  C
Note that the C in the second column of the first row is copied to the new row.
Next, replace gate C in the first row by its inputs. Because gate C is also an OR gate, replace
C by basic event 3 and then create a third row with the other event. Be sure to copy the 1 from the
other column of the first row:
1  3
2  C
1  4
Finally, replace gate C in the second row by its inputs. This generates a fourth row:
1  3
2  3
1  4
2  4
The cut sets are then
1,3
2,3
1,4
2,4
This means that the top event occurs as a result of any one of these sets of basic events.
11-3 Fault Trees 497
The procedure does not always deliver the minimal cut sets. Sometimes a set might be of the
following form:
1,2,2
This is reduced to simply 1, 2. On other occasions the sets might include supersets. For instance,
consider
1,2
1,2,4
1,2,3
The second and third sets are supersets of the first basic set because events 1 and 2 are in common.
The supersets are eliminated to produce the minimal cut sets.
For this example there are no supersets.
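The row-replacement procedure of Example 11-6 and the reduction step can be sketched in code. This is a minimal illustration, not the published algorithm verbatim: the `GATES` dictionary and the function names `cut_sets` and `minimal` are assumptions introduced here. AND gates lengthen a row, OR gates add rows, and storing each row as a set removes repeated events such as 1, 2, 2.

```python
# Hypothetical encoding of Figure 11-14: gates are letters, basic events are numbers.
GATES = {
    "A": ("AND", ["B", "C"]),
    "B": ("OR", [1, 2]),
    "C": ("OR", [3, 4]),
}

def cut_sets(top, gates):
    """Expand gates row by row until only basic events remain."""
    rows = [[top]]
    while True:
        # Look for the first row that still contains a gate.
        for r, row in enumerate(rows):
            gate = next((x for x in row if x in gates), None)
            if gate is not None:
                break
        else:
            break  # no gates left in any row
        kind, inputs = gates[gate]
        rest = [x for x in row if x != gate]
        if kind == "AND":
            # AND gate: all inputs join the same row (larger cut set).
            rows[r] = rest + list(inputs)
        else:
            # OR gate: one new row per input (more cut sets).
            rows[r:r + 1] = [rest + [inp] for inp in inputs]
    return [frozenset(row) for row in rows]

def minimal(sets):
    """Drop duplicate sets and any set that is a superset of another."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]

result = sorted(sorted(s) for s in minimal(cut_sets("A", GATES)))
print(result)  # prints [[1, 3], [1, 4], [2, 3], [2, 4]]
```

The rows are produced in a different order than in the worked example, but the sets are the same. Applying `minimal` to the sets 1,2 / 1,2,4 / 1,2,3 leaves only 1,2.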
Total 0.0799
This compares to the exact result of 0.0702 obtained using the actual fault tree. The cut sets are
related to each other by the OR function. For Example 11-6 all the cut set probabilities were
added. This is an approximate result, as shown by Equation 11-10, because the cross-product
terms were neglected. For small probabilities the cross-product terms are negligible and the
addition will approach the true result.
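The effect of the neglected cross-product terms can be checked numerically. The sketch below assumes independent basic events with the probabilities shown in Figure 11-14 (0.13, 0.04, 0.13, 0.34) and the four cut sets of Example 11-6; the names `P`, `CUT_SETS`, and `p_all` are illustrative. It compares the simple sum of cut set probabilities with the full inclusion-exclusion expansion over the OR-related cut sets.

```python
from itertools import combinations
from math import prod

P = {1: 0.13, 2: 0.04, 3: 0.13, 4: 0.34}
CUT_SETS = [{1, 3}, {2, 3}, {1, 4}, {2, 4}]

def p_all(events):
    """Probability that every event in the set occurs (independence assumed)."""
    return prod(P[e] for e in events)

# Approximation: add the cut set probabilities, ignoring cross-product terms.
approx = sum(p_all(cs) for cs in CUT_SETS)

# Inclusion-exclusion over the cut sets restores the cross-product terms.
exact = 0.0
for k in range(1, len(CUT_SETS) + 1):
    sign = (-1) ** (k + 1)
    for group in combinations(CUT_SETS, k):
        exact += sign * p_all(set().union(*group))

print(f"sum of cut sets:     {approx:.4f}")  # prints 0.0799
print(f"inclusion-exclusion: {exact:.4f}")   # prints 0.0702
```

With these probabilities the simple sum overestimates the top event probability by about 0.01; the gap shrinks as the basic event probabilities get smaller.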
[Figure 11-15: risk plot with regions marked "Not acceptable" and "Acceptable".]
The initiating events are the causes of the incident, and the top events are the final outcomes.
The two methods are related in that the top events for fault trees are the initiating events for
the event trees. Both are used together to produce a complete picture of an incident, from its
initiating causes all the way to its final outcome. Probabilities and frequencies are attached to
these diagrams.
Risk is the product of the probability of a release, the probability of exposure, and the conse-
quences of the exposure. Risk is usually described graphically, as shown in Figure 11-15. All
companies decide their levels of acceptable risk and unacceptable risk. The actual risk of a process
or plant is usually determined using quantitative risk analysis (QRA) or a layer of protection
analysis (LOPA). Other methods are sometimes used; however, QRA and LOPA are the
methods that are most commonly used. In both methods the frequency of the release is deter-
mined using a combination of event trees, fault trees, or an appropriate adaptation.
5. CCPS, Guidelines for Chemical Process Quantitative Risk Analysis, 2d ed. (New York: Center for Chemical Process Safety, AIChE, 2000).
of a project (conceptual review and design phases) and are maintained throughout the facility's
life cycle.
The QRA method is designed to provide managers with a tool to help them evaluate the
overall risk of a process. QRAs are used to evaluate potential risks when qualitative methods
cannot provide an adequate understanding of the risks. QRA is especially effective for evalu-
ating alternative risk reduction strategies.
The major steps of a QRA study include
In general, QRA is a relatively complex procedure that requires expertise and a sub-
stantial commitment of resources and time. In some instances this complexity may not be war-
ranted; then the application of LOPA methods may be more appropriate.
[Process diagram labels: monomer feed, steam, polymer product, CW.]
Figure 11-16 Layers of protection to lower the frequency of a specific accident scenario.
The primary purpose of LOPA is to determine whether there are sufficient layers of protection
against a specific accident scenario. As illustrated in Figure 11-16, many types of protective
layers are possible. Figure 11-16 does not include all possible layers of protection. A scenario
may require one or many layers of protection, depending on the process complexity and
potential severity of an accident. Note that for a given scenario only one layer must work suc-
cessfully for the consequence to be prevented. Because no layer is perfectly effective, however,
sufficient layers must be added to the process to reduce the risk to an acceptable level.
The major steps of a LOPA study include
11-4 QRA and LOPA 503
4. identifying the protection layers available for this particular consequence and estimating
the probability of failure on demand for each protection layer,
5. combining the initiating event frequency with the probabilities of failure on demand for
the independent protection layers to estimate a mitigated consequence frequency for this
initiating event,
6. plotting the consequence versus the consequence frequency to estimate the risk (the risk
is usually shown in a figure similar to Figure 11-15), and
7. evaluating the risk for acceptability (if unacceptable, additional layers of protection are
required).
This procedure is repeated for other consequences and scenarios. A number of variations on
this procedure are used.
Consequence
The most common scenario of interest for LOPA in the chemical process industry is loss
of containment of hazardous material. This can occur through a variety of incidents, such as a
leak from a vessel, a ruptured pipeline, a gasket failure, or release from a relief valve.
In a QRA study the consequences of these releases are quantified using dispersion mod-
eling and a detailed analysis to determine the downwind consequences as a result of fires, ex-
plosions, or toxicity. In a LOPA study the consequences are estimated using one of the follow-
ing methods: (1) semi-quantitative approach without the direct reference to human harm, (2)
qualitative estimates with human harm, and (3) quantitative estimates with human harm. See
footnote 6 for the detailed methods.
When using the semi-quantitative method, the quantity of the release is estimated using
source models, and the consequences are characterized with a category, as shown in Table 11-2.
This is an easy method to use compared with QRA.
Although the method is easy to use, it clearly identifies problems that may need additional
work, such as a QRA. It also identifies problems that may be deemphasized because
the consequences are insignificant.
Frequency
When conducting a LOPA study, several methods can be used to determine the frequency.
One of the less rigorous methods includes the following steps:
1. Determine the failure frequency of the initiating event.
2. Adjust this frequency to include the demand; for example, a reactor failure frequency is
divided by 12 if the reactor is used only 1 month during the entire year. The frequencies
are also adjusted (reduced) to include the benefits of preventive maintenance. If, for ex-
ample, a control system is given preventive maintenance 4 times each year, then its fail-
ure frequency is divided by 4.
3. Adjust the failure frequency to include the probabilities of failure on demand (PFDs) for
each independent layer of protection.
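The three steps above can be sketched as a small calculation. This is only an illustration of the less rigorous method described in the list: the function name `adjusted_frequency`, its parameters, and the numbers in the usage lines are hypothetical and are not taken from any table.

```python
from math import prod

def adjusted_frequency(base_per_yr, fraction_of_year_in_use=1.0,
                       maintenance_per_yr=1, pfds=()):
    """Adjust an initiating-event frequency following steps 1 to 3.

    base_per_yr             -- step 1: failure frequency of the initiating event
    fraction_of_year_in_use -- step 2: demand (1/12 for one month per year)
    maintenance_per_yr      -- step 2: preventive maintenance events per year
    pfds                    -- step 3: PFD of each independent layer of protection
    """
    f = base_per_yr * fraction_of_year_in_use
    f /= max(maintenance_per_yr, 1)   # e.g. divide by 4 for 4 services per year
    return f * prod(pfds)             # prod(()) is 1 when no layers are given

# Hypothetical numbers: 1 failure/yr, used 1 month per year,
# maintained 4 times per year, one protection layer with PFD 1e-2.
f = adjusted_frequency(1.0, fraction_of_year_in_use=1 / 12,
                       maintenance_per_yr=4, pfds=[1e-2])
print(f"{f:.2e} per yr")
```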
Initiating event                                  Frequency range from            Example of a value chosen by a
                                                  literature (per yr)             company for use in LOPA (per yr)
Large external fire (aggregate causes)            $10^{-2}$ to $10^{-3}$          $1 \times 10^{-2}$
LOTO (lock-out tag-out) procedure failure         $10^{-3}$ to $10^{-4}$          $1 \times 10^{-3}$
(overall failure of a multiple element process)   /opportunity                    /opportunity
Operator failure (to execute routine procedure;   $10^{-1}$ to $10^{-3}$          $1 \times 10^{-2}$
well trained, unstressed, not fatigued)           /opportunity                    /opportunity
Individual companies choose their own values, consistent with the degree of conservatism or the company's risk tolerance
criteria. Failure rates can also be greatly affected by preventive maintenance routines.
The failure frequencies for the common initiating events of an accident scenario are
shown in Table 11-3.
The PFD for each independent protection layer (IPL) varies from $10^{-1}$ to $10^{-5}$ for a weak
IPL and a strong IPL, respectively. The common practice is to use a PFD of $10^{-2}$ unless experience
shows it to be higher or lower. Some PFDs recommended by CCPS (see footnote 6) for
screening are given in Tables 11-4 and 11-5. There are three rules for classifying a specific sys-
tem or action of an IPL:
6. CCPS, Simplified Process Risk Assessment: Layer of Protection Analysis, D. A. Crowl, ed. (New York: American Institute of Chemical Engineers, 2001) (in press).
The mitigated consequence frequency for a specific scenario is computed using
$$f_i^C = f_i^I \times \prod_{j} \mathrm{PFD}_{ij} \tag{11-30}$$
where
$f_i^C$ is the mitigated consequence frequency for a specific consequence C for an initiating
event i,
$f_i^I$ is the initiating event frequency for the initiating event i, and
$\mathrm{PFD}_{ij}$ is the probability of failure of the jth IPL that protects against the specific
consequence and the specific initiating event i. The PFD is usually $10^{-2}$, as described
previously.
When there are multiple scenarios with the same consequence, each scenario is evalu-
ated individually using Equation 11-30. The frequency of the consequence is subsequently de-
termined using
$$f^C = \sum_{i=1}^{I} f_i^C \tag{11-31}$$
7. IEC (1998), IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems, Parts 1-7, Geneva: International Electrotechnical Commission.
8. IEC (2001), IEC 61511, Functional Safety Instrumented Systems for the Process Industry Sector, Parts 1-3 (Draft in Process), Geneva: International Electrotechnical Commission.
where
$f_i^C$ is the frequency of the Cth consequence for the ith initiating event and
$I$ is the total number of initiating events for the same consequence.
Example 11-7
Determine the consequence frequency for a cooling water failure if the system is designed with two
IPLs. The IPLs are human interaction with 10-min response time and a basic process control sys-
tem (BPCS).
Solution
The frequency of a cooling water failure is taken from Table 11-3, that is, $f_i^I = 10^{-1}$. The PFDs are
estimated from Tables 11-4 and 11-5. The human response PFD is $10^{-1}$ and the PFD for the BPCS
is $10^{-1}$. The consequence frequency is found using Equation 11-30:
$$f_i^C = f_i^I \times \prod_{j} \mathrm{PFD}_{ij} = 10^{-1} \times (10^{-1})(10^{-1}) = 10^{-3}\ \text{failure/yr}.$$
As illustrated in Example 11-7, the failure frequency is determined easily by using LOPA
methods.
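The calculation of Example 11-7 can be reproduced in a few lines. This is a minimal sketch of Equations 11-30 and 11-31; the function names `mitigated_frequency` and `consequence_frequency` are assumptions made here, and the second scenario in the usage lines is hypothetical, included only to show the summation over initiating events.

```python
from math import prod

def mitigated_frequency(initiating_freq, pfds):
    """Equation 11-30: initiating frequency times the product of the IPL PFDs."""
    return initiating_freq * prod(pfds)

def consequence_frequency(scenarios):
    """Equation 11-31: sum over all initiating events with the same consequence."""
    return sum(mitigated_frequency(f, pfds) for f, pfds in scenarios)

# Example 11-7: cooling water failure at 1e-1 per yr with two IPLs
# (human response, PFD 1e-1; BPCS, PFD 1e-1).
cooling_water = (1e-1, [1e-1, 1e-1])
f_cw = mitigated_frequency(*cooling_water)
print(f"{f_cw:.0e} failure/yr")  # prints 1e-03 failure/yr

# Hypothetical second initiating event leading to the same consequence.
other = (1e-2, [1e-2])
total = consequence_frequency([cooling_water, other])
print(f"{total:.1e} failure/yr")
```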
The concept of PFD is also used when designing emergency shutdown systems called
safety instrumented functions (SIFs). A SIF achieves low PFD figures by
There are three safety integrity levels (SILs) that are generally accepted in the chemical
process industry for emergency shutdown systems:
1. SIL1 (PFD = $10^{-1}$ to $10^{-2}$): These SIFs are normally implemented with a single sensor,
a single logic solver, and a single final control element, and they require periodic proof testing.
2. SIL2 (PFD = $10^{-2}$ to $10^{-3}$): These SIFs are typically fully redundant, including the sensor,
logic solver, and final control element, and they require periodic proof testing.
3. SIL3 (PFD = $10^{-3}$ to $10^{-4}$): SIL3 systems are typically fully redundant, including the sensor,
logic solver, and final control element; and the system requires careful design and
frequent validation tests to achieve the low PFD figures. Many companies find that they
have a limited number of SIL3 systems because of the high cost normally associated with
this architecture.
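The three PFD bands can be turned into a simple lookup. The sketch below is illustrative: the function name `sil_for_pfd` is hypothetical, and because adjacent bands share an end point, the convention that each band includes its lower limit and excludes its upper limit is an assumption made here rather than something stated in the text.

```python
def sil_for_pfd(pfd):
    """Return the SIL (1, 2, or 3) whose PFD band contains pfd, else None.

    Bands follow the list above: SIL1 1e-1 to 1e-2, SIL2 1e-2 to 1e-3,
    SIL3 1e-3 to 1e-4. Assumed convention: lower limit included,
    upper limit excluded.
    """
    bands = [(1, 1e-2, 1e-1), (2, 1e-3, 1e-2), (3, 1e-4, 1e-3)]
    for level, low, high in bands:
        if low <= pfd < high:
            return level
    return None  # outside the three generally accepted bands

for value in (5e-2, 5e-3, 5e-4, 0.5):
    print(value, sil_for_pfd(value))
```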
Suggested Reading
CCPS, Guidelines for Consequence Analysis of Chemical Releases (New York: American Institute of
Chemical Engineers, 1999).
Guidelines for Hazard Evaluation Procedures, 2d ed. (New York: American Institute of Chemical Engi-
neers, 1992).
J. B. Fussell and W. E. Vesely, "A New Methodology for Obtaining Cut Sets for Fault Trees," Transactions
of the American Nuclear Society (1972), 15.
F. P. Lees, Loss Prevention in the Process Industries, 2d ed. (London: Butterworths, 1996).
J. F. Louvar and B. D. Louvar, Health and Environmental Risk Analysis: Fundamentals with Applications
(Upper Saddle River, NJ: Prentice Hall PTR, 1998).
B. Roffel and J. E. Rijnsdorp, Process Dynamics, Control, and Protection (Ann Arbor, MI: Ann Arbor
Science, 1982), ch. 19.
[Figure 11-17: fault tree gate diagrams, panels (i) through (v); diagrams not recoverable.]
Problems
11-1. Given the fault tree gates shown in Figure 11-17 and the following set of failure
probabilities:
Component   Failure probability
1           0.1
2           0.2
3           0.3
4           0.4
a. Determine an expression for the probability of the top event in terms of the component
failure probabilities.
b. Determine the minimal cut sets.
c. Compute a value for the failure probability of the top event. Use both the expression
of part a and the fault tree itself.
11-2. The storage tank system shown in Figure 11-18 is used to store process feedstock. Over-
filling of storage tanks is a common problem in the process industries. To prevent overfill-
ing, the storage tank is equipped with a high-level alarm and a high-level shutdown sys-
tem. The high-level shutdown system is connected to a solenoid valve that stops the flow
of input stock.
a. Develop an event tree for this system using the "failure of level indicator" as the ini-
tiating event. Given that the level indicator fails 4 times/yr, estimate the number of
overflows expected per year. Use the following data: