Achieving Hardware Fault Tolerance
Session Ten
Achieving Compliance in Hardware Fault Tolerance
Mirek Generowicz
FS Senior Expert (TÜV Rheinland #183/12)
Engineering Manager, I&E Systems Pty Ltd
Abstract
The functional safety standards ISA S84/IEC 61511 and IEC 61508 both set
out requirements for hardware fault tolerance or architectural constraints.
The method specified in ISA S84 and IEC 61511 for assessing hardware fault
tolerance has often proven to be impracticable for SIL 3 in the process sector.
Many users in the process sector have not been able to comply fully with the
requirements.
Further confusion has been created because there are many SIL certificates in
circulation that are undeniably incorrect and misleading.
This paper describes common problems and misunderstandings in assessing
hardware fault tolerance.
The 2010 edition of IEC 61508 introduced a new method for assessing hardware
fault tolerance that is much simpler and more practicable. The method is
called Route 2H.
This paper explains how Route 2H overcomes the problems with the earlier
methods.
The proposed new edition of IEC 61511 will be based on Route 2H.
The AS IEC 61511-1 method for HFT can only be used for relatively simple
architectures. The AS IEC 61508-2 methods can be applied to assess
hardware fault tolerance requirements for complex architectures.
[Table: minimum HFT required for SIL 1, SIL 2 and SIL 3, for elements whose
dominant failure is to a dangerous state and for elements whose dominant
failure is to a safe state.]
The bare minimum requirement for SIL 3 is therefore to have 3 valves in series:
Route 1H applies the concept of Safe Failure Fraction (SFF). This is another
way of assessing whether the dominant failure is to the safe state. The
maximum SIL that can be claimed depends on the HFT.
The results are very similar to those of the AS IEC 61511 method.
The following table shows the maximum SIL that can be claimed for Type A
elements under Route 1H, depending on the HFT and SFF:
Safe Failure Fraction      HFT 0     HFT 1     HFT 2
of the element
SFF < 60%                  SIL 1     SIL 2     SIL 3
60% ≤ SFF < 90%            SIL 2     SIL 3     SIL 4
90% ≤ SFF < 99%            SIL 3     SIL 4     SIL 4
SFF ≥ 99%                  SIL 3     SIL 4     SIL 4
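The Route 1H lookup for Type A elements can be sketched as a small function. This is an illustrative sketch only: the function name is hypothetical, and the band limits and SIL values are assumed to follow Table 2 of IEC 61508-2 for Type A elements.

```python
# Maximum claimable SIL for Type A elements under Route 1H.
# Each row is (upper SFF limit of the band, SILs for HFT 0, 1 and 2).
# Assumption: values follow IEC 61508-2 Table 2 (Type A elements).
ROUTE_1H_TYPE_A = [
    (0.60, (1, 2, 3)),          # SFF < 60%
    (0.90, (2, 3, 4)),          # 60% <= SFF < 90%
    (0.99, (3, 4, 4)),          # 90% <= SFF < 99%
    (float("inf"), (3, 4, 4)),  # SFF >= 99%
]


def max_sil_route_1h_type_a(sff: float, hft: int) -> int:
    """Return the maximum SIL claimable for a given SFF (0..1) and HFT (0..2)."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("SFF must be a fraction between 0 and 1")
    if hft not in (0, 1, 2):
        raise ValueError("HFT must be 0, 1 or 2")
    for upper_limit, sils in ROUTE_1H_TYPE_A:
        if sff < upper_limit:
            return sils[hft]
    raise AssertionError("unreachable: last band has no upper limit")


# With SFF below 60%, SIL 3 needs HFT 2 (3 valves in series).
print(max_sil_route_1h_type_a(0.50, 1))  # SIL 2 only
print(max_sil_route_1h_type_a(0.50, 2))  # SIL 3
# With SFF of at least 60%, HFT 1 (2 valves) is enough for SIL 3.
print(max_sil_route_1h_type_a(0.60, 1))  # SIL 3
```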
If the SFF < 60% then the dominant failure mode is not to the safe state and to
claim SIL 3 we still need HFT 2, requiring 3 valves in series:
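To claim SIL 3 with only 2 valves we need to prove that SFF ≥ 60%.
The SFF is the proportion of failures that are either safe ($\lambda_S$) or are
dangerous but detected by on-line diagnostics ($\lambda_{DD}$):
$$\mathrm{SFF} = \frac{\lambda_S + \lambda_{DD}}{\lambda}$$
where the total failure rate $\lambda$ also includes the dangerous undetected
failures ($\lambda_{DU}$):
$$\lambda = \lambda_S + \lambda_{DD} + \lambda_{DU}$$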
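As a worked illustration of the SFF definition above, the following minimal sketch computes SFF from the safe, dangerous detected and dangerous undetected failure rates. The function name and the example failure rates are hypothetical and are not taken from any data source.

```python
def safe_failure_fraction(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """SFF = (safe + dangerous detected) / (safe + dangerous detected + dangerous undetected)."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


# Hypothetical valve with no on-line diagnostics (lambda_dd = 0),
# failure rates in failures per 10^6 hours.
sff_low = safe_failure_fraction(lambda_s=1.0, lambda_dd=0.0, lambda_du=2.0)
sff_ok = safe_failure_fraction(lambda_s=3.0, lambda_dd=0.0, lambda_du=2.0)

print(f"SFF = {sff_low:.0%}")  # below 60%: dominant failure is not to the safe state
print(f"SFF = {sff_ok:.0%}")   # exactly at the 60% threshold
```

Without on-line diagnostics the only way to raise SFF is a larger share of safe failures, which is why the classification of failure modes has such a strong effect on the result.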
Understandably, equipment suppliers and designers have been creative in
trying to prove that SFF ≥ 60%.
The diagnostic interval must be included in the MTTR that is used in calculating
the probability of failure. For a test to count as a diagnostic, either:
  - the diagnostic interval plus the time for the safety action response must
    be less than the process safety time, or
  - the diagnostic test rate must be at least 100 times the demand rate.
Automatic weekly or daily testing might be sufficiently frequent for low demand
applications in the process sector but it is usually impractical.
6-monthly testing cannot be classed as a diagnostic and does not
contribute to improving SFF.
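The two timing criteria can be checked with a few lines of code. This is a sketch under stated assumptions: the function name, the demand interval of once in 10 years, and the process safety time and response time values are all hypothetical examples.

```python
def qualifies_as_diagnostic(
    test_interval_h: float,
    safety_action_time_h: float,
    process_safety_time_h: float,
    demand_interval_h: float,
) -> bool:
    """Return True if a periodic test meets either timing criterion for a diagnostic."""
    if min(test_interval_h, process_safety_time_h, demand_interval_h) <= 0:
        raise ValueError("intervals and process safety time must be positive")
    if safety_action_time_h < 0:
        raise ValueError("safety action time cannot be negative")
    # Criterion 1: detect and respond within the process safety time.
    within_process_safety_time = (
        test_interval_h + safety_action_time_h < process_safety_time_h
    )
    # Criterion 2: test rate at least 100 times the demand rate.
    # Rates are reciprocals of intervals, so the ratio of rates is
    # demand interval divided by test interval.
    rate_ratio = demand_interval_h / test_interval_h
    return within_process_safety_time or rate_ratio >= 100.0


HOURS_PER_YEAR = 8760.0
DEMAND_INTERVAL_H = 10 * HOURS_PER_YEAR  # hypothetical: one demand in 10 years

# Hypothetical response time of 0.01 h and process safety time of 0.1 h.
weekly = qualifies_as_diagnostic(168.0, 0.01, 0.1, DEMAND_INTERVAL_H)
six_monthly = qualifies_as_diagnostic(4380.0, 0.01, 0.1, DEMAND_INTERVAL_H)

print(weekly)       # True: about 520 tests per demand
print(six_monthly)  # False: only 20 tests per demand
```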
TÜV Rheinland has published a statement clarifying how these certificates
should be interpreted:
The requirement for Route 2H is very simple. If failure rates can be
demonstrated with the required confidence level (90%), then an HFT of 1 is
sufficient for SIL 3, and an HFT of 0 is acceptable for SIL 2.
There is no need to consider SFF for Type A elements.
The requirement for Type B elements is simply that:
"All type B elements used in Route 2H shall have, as a minimum, a
diagnostic coverage of not less than 60 %."
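The Route 2H rules as stated above can be sketched as a simple check. The sketch covers only the two cases given in the text (SIL 2 and SIL 3) and assumes that the failure rate evidence at the required confidence level has already been demonstrated separately. The function and parameter names are hypothetical.

```python
# Minimum HFT under Route 2H, limited to the two cases stated in the text.
ROUTE_2H_MIN_HFT = {2: 0, 3: 1}

MIN_TYPE_B_DIAGNOSTIC_COVERAGE = 0.60


def meets_route_2h(
    sil: int,
    hft: int,
    element_type: str,
    diagnostic_coverage: float = 0.0,
) -> bool:
    """Check HFT (and, for Type B, diagnostic coverage) against Route 2H.

    Assumes failure rates have already been shown at the required
    confidence level; that evidence is not evaluated here.
    """
    if sil not in ROUTE_2H_MIN_HFT:
        raise ValueError("this sketch only covers SIL 2 and SIL 3")
    if element_type not in ("A", "B"):
        raise ValueError("element_type must be 'A' or 'B'")
    if hft < ROUTE_2H_MIN_HFT[sil]:
        return False
    # SFF is not considered for Type A elements; Type B elements
    # need at least 60% diagnostic coverage.
    if element_type == "B" and diagnostic_coverage < MIN_TYPE_B_DIAGNOSTIC_COVERAGE:
        return False
    return True


print(meets_route_2h(sil=3, hft=1, element_type="A"))  # True: 2 valves suffice
print(meets_route_2h(sil=3, hft=0, element_type="A"))  # False: a single valve does not
print(meets_route_2h(sil=2, hft=0, element_type="B", diagnostic_coverage=0.5))  # False
```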
Failure rates with a confidence level of 90% can be expected to be
approximately 0.8 standard deviations ($0.8\sigma$) higher than failure rates
with a confidence level of 70%.
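The factor of approximately 0.8 standard deviations can be checked from the quantiles of a normal distribution. The sketch below assumes that failure rates are normally distributed, which is a simplification, and uses only the Python standard library. The helper name and the example figures are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided quantiles: number of standard deviations above the mean
# below which 70% and 90% of the population lie.
z_70 = standard_normal.inv_cdf(0.70)
z_90 = standard_normal.inv_cdf(0.90)
delta_z = z_90 - z_70

print(f"z_70 = {z_70:.2f}, z_90 = {z_90:.2f}, difference = {delta_z:.2f}")


def rate_at_90_percent(rate_at_70_percent: float, sigma: float) -> float:
    """Estimate the 90% confidence failure rate from the 70% rate and sigma."""
    return rate_at_70_percent + delta_z * sigma


# Hypothetical example: 70% rate of 4.0 and sigma of 2.0 (per 10^6 hours).
print(f"{rate_at_90_percent(4.0, 2.0):.1f} per 10^6 hours")
```

The difference between the two quantiles is close to 0.76, which rounds to the 0.8 used in the text.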
Finding data
Two dependable sources: OREDA and exida SERH
The OREDA Offshore Reliability Handbook published by SINTEF gives the
standard deviation and the mean for failure rates of components commonly
applied in the hydrocarbons industry.
OREDA is based on extensive field experience, though in limited applications.
The SERH Safety Equipment Reliability Handbook is published by exida.
The failure rates in exida SERH are calculated using FMEDA, but are based on
extensive datasets for individual component parts.
The results are broadly consistent, though OREDA includes some site-specific
failures and OREDA failure rates may be twice as high as the corresponding
exida rates.
Confidence levels
The confidence level in exida SERH is stated as 70%.
OREDA shows full details of the spread of failure rates recorded, including the
mean and the standard deviation.
The standard deviation allows us to estimate failure rates with 90% confidence
level ($\lambda_{90\%}$) from failure rates with 70% confidence level
($\lambda_{70\%}$).
In a normal distribution approximately 90% of the population lies within
$1.6\sigma$ of the mean.
Typically failure rates are distributed over one or two orders of magnitude.
According to OREDA, the following failure rates are typical for actuated ball
valves:
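$\lambda_{50\%}$ = 2.3 per $10^6$ hours
$\sigma$ = 2.7 per $10^6$ hours
$\lambda_{70\%}$ = 3.6 per $10^6$ hours
$\lambda_{90\%}$ = 5.8 per $10^6$ hours
$\lambda_{90\%} / \lambda_{70\%} \approx 1.6$
This value of the ratio $\lambda_{90\%} / \lambda_{70\%}$ is typical.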
We might infer that the calculated probability of failure for designs relying on
Route 2H will typically be around 60% higher than calculations based on Route
1H.
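The inference of roughly 60% can be reproduced from the two ball valve figures quoted above. The sketch below also applies the simplified single-channel approximation PFDavg = failure rate x proof test interval / 2, which is an assumption introduced here for illustration and not a formula from this paper. Treating the quoted rates as dangerous undetected failure rates and using a 1-year proof test interval are likewise hypothetical choices.

```python
# Failure rates for actuated ball valves quoted above, in failures per hour.
LAMBDA_70 = 3.6e-6  # 70% confidence level
LAMBDA_90 = 5.8e-6  # 90% confidence level

ratio = LAMBDA_90 / LAMBDA_70
increase = ratio - 1.0
print(f"ratio = {ratio:.2f}, increase = {increase:.0%}")


def pfd_avg_single_channel(lambda_du: float, proof_test_interval_h: float) -> float:
    """Simplified average probability of failure on demand for one channel.

    Assumption: PFDavg ~ lambda_du * T / 2, valid when lambda_du * T << 1.
    """
    return lambda_du * proof_test_interval_h / 2.0


PROOF_TEST_INTERVAL_H = 8760.0  # hypothetical: annual proof test

pfd_70 = pfd_avg_single_channel(LAMBDA_70, PROOF_TEST_INTERVAL_H)
pfd_90 = pfd_avg_single_channel(LAMBDA_90, PROOF_TEST_INTERVAL_H)
print(f"PFDavg at 70%: {pfd_70:.4f}, at 90%: {pfd_90:.4f}")
```

Because this approximation is linear in the failure rate, the calculated probability of failure scales by the same ratio as the failure rates.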
Studies based on vendor returns may inadvertently exclude many failures that
were not reported to the vendor. They may also exclude failures considered to
be systematic or outside the design envelope.
Low failure rates from restricted datasets may be unrealistically optimistic.
[Table: minimum HFT required for SIL 1, SIL 2 (low demand mode), SIL 2 (high
demand/continuous mode), SIL 3 and SIL 4.]
The proposed draft excludes the requirement for 90% confidence level.
Conclusions
The HFT methods in AS IEC 61511 and AS IEC 61508 Route 1H do not work
well in practice for the process sector. These methods require 3 valves in
series (1 out of 3) to achieve SIL 3.
Route 2H in IEC 61508 is based on failure rates with the confidence level
increased to 90%. It is much simpler and easier to apply. It allows SIL 3 to
be achieved with only 2 valves as final elements.
The new edition of IEC 61511 will apply Route 2H though without an explicit
requirement for 90% confidence levels.
OREDA and exida SERH provide failure rate data that are widely accepted as
being dependable. These references provide enough information to allow us to
infer failure rates with 90% confidence levels.
There are many certificates in circulation that claim failure rates that are much
lower than the rates published by OREDA and exida.
Users should collect their own data. Requirements for collection of evidence
are onerous. A large volume of evidence is required. Users should compare
their failure rates with those in OREDA and SERH.
Failure rates from different sources should always be compared and assessed
for plausibility. For Route 2H a conservative approach should be taken: the
complete spread of failure rates should be taken into account.
Published failure rates for valves all include systematic failures. All valve
failures are essentially systematic in nature and can be avoided or controlled to
some extent. In evaluating failure rates the effectiveness of the planned
operation and maintenance should be considered. Particular attention should
be given to identifying and controlling common cause failures as these will
almost always dominate in the calculated probability of failure.
There are some certificates in circulation that take credit for no effect failures
or for partial stroke testing in determining SFF. These certificates must be
interpreted with caution. It is not valid to claim SFF > 60% for valves by:
  - taking credit for no-effect failures, or
  - taking credit for infrequent partial stroke testing as a diagnostic.
Certificates on their own are not sufficient as evidence of compliance to
AS IEC 61508-2 and AS IEC 61508-3. Detailed safety manuals must be
provided in accordance with AS IEC 61508-2 Annex D.
References
AS IEC 61511.1-2004 Functional safety - Safety instrumented systems for the
process industry sector
Part 1: Framework, definitions, systems, hardware and software requirements
AS IEC 61508.2-2011 Functional safety of electrical/electronic/programmable
electronic safety-related systems
Part 2: Requirements for electrical/electronic/programmable electronic
safety-related systems
SINTEF 2009, OREDA Offshore Reliability Handbook 5th Edition
Volume 1 Topside Equipment
exida.com L.L.C. 2007, Safety Equipment Reliability Handbook 3rd Edition
Volume 3 Final Elements
YouTube video http://youtu.be/SHAiFH4v_K8
The exida FMEDA Process - Accurate Failure Data for the Process Industries
Dr. William M. Goble, CFSE, exida Consulting, February 2012, Field Failure
Data - the Good, the Bad and the Ugly
http://www.exida.com/images/uploads/Field_Failure_Ratesgood_bad_and_ugly_Feb_2012.pdf