Brall 2007
Packages
Aron Brall, SRS Technologies
William Hagen, Ford Motor Company
Hung Tran, SRS Technologies
Key Words: Reliability Modeling, Reliability Software, Simulation
0-7803-9766-5/07/$25.00 ©2007 IEEE
Raptor appears to be a pure Monte Carlo simulation tool to solve reliability block diagrams.

2.2 Reliasoft BlockSim 6.5.2

From the Reliasoft BlockSim web site: “Flexible Reliability Block Diagram (RBD) creation. Exact reliability results/plots and optimum reliability allocation. Repairable system analysis via simulation (reliability, maintainability, availability) plus throughput, life cycle cost and related analyses.”

BlockSim appears to use Monte Carlo simulation with algorithms used to speed the processing time to solve reliability block diagrams. BlockSim will also provide an analytical calculation of reliability.

2.3 Relex Reliability Block Diagram

From the Relex web site: “At the core of Relex RBD is a highly intelligent computational engine. First, each diagram is analyzed to determine the best approach for problem solving using pure analytical solutions, simulation, or a combination of both. Once a methodology is determined, the powerful Relex RBD calculations are engaged to produce fast, accurate results.”

Relex RBD appears to be a hybrid tool that uses algorithms and simulation in varying combinations to solve reliability block diagrams.

3 THE MODELS

We used several models to put the software packages through their paces and identify differences in results. A total of four models were used in varying combinations across the packages. The intent was not to pick a winner, but to increase awareness of the care that must be taken in simulating.

3.1 One Block Model

This model consists of one simple block with a Weibull failure distribution and Lognormal repair distribution. The simulation was set at 1,000 hours run time and 10,000 runs. See Figure 3-1 and Table 3-1 for the details of this model.

Block  Parameter             Probability Distribution  Parameter 1  Parameter 2
a      Failure Distribution  Weibull                   Shape 1.5    Scale 1,000
a      Repair Distribution   Lognormal                 Mu 5         Sigma 0.5
Table 3-1 - One Block Model Input Data

3.3 Complex Model

This model consists of 194 blocks, redundancy, k of n, and corrective maintenance. The simulation was set at 100 hours and 10,000 runs. Figure 3-3 gives an impression of the complexity of this model. Due to its size, it can’t be shown effectively within the page limitations of this paper.

3.4 Large Exponential Model

This model consists of 83 blocks, all modeled with the Exponential distribution for failure, and no repair distribution. The simulation was set at 61,312 hours and 1,000 runs. Figure 3.4 shows the block diagram.

Block Name  Failure Distribution  Parameter 1  Parameter 2
a           Weibull               Shape 1.5    Scale 1,000
b           Normal                Mean 250     Std Dev 50
c           Exponential           10,000       0
d           Lognormal             Mu 6         Sigma 2
e           Weibull               Shape 1.5    Scale 2,300
f           Normal                Mean 250     Std Dev 50
g           Exponential           10,000       0
h           Lognormal             Mu 8         Sigma 1
i           Weibull               Shape 1.5    Scale 1,000
j           Normal                Mean 250     Std Dev 50
k           Exponential           10,000       0
l           Lognormal             Mu 8         Sigma 3
m           Weibull               Shape 2.0    Scale 1,000
n           Weibull               Shape 3.0    Scale 1,000
o           Weibull               Shape 4.0    Scale 1,000
p           Weibull               Shape 0.5    Scale 1,000
q           Weibull               Shape 0.4    Scale 1,000
Table 3-2 - Simple Model Input Data
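The Lognormal rows in Tables 3-1 and 3-2 are specified as Mu and Sigma. The following is a minimal sketch of how those values relate to the mean and standard deviation of the time itself. It assumes that Mu and Sigma are the mean and standard deviation of the natural logarithm of the time, which is the common convention but is not confirmed by the package documentation quoted here. The function names are illustrative only.

```python
import math


def lognormal_mean_std(mu, sigma):
    """Mean and std dev of T when ln(T) is Normal(mu, sigma) (assumed convention)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, std


def lognormal_mu_sigma(mean, std):
    """Inverse conversion: log-space mu and sigma from the mean and std dev of T."""
    sigma_sq = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


# Repair distribution of block "a" in Table 3-1: Mu 5, Sigma 0.5.
mean_hours, std_hours = lognormal_mean_std(5.0, 0.5)
print(f"mean = {mean_hours:.1f} h, std dev = {std_hours:.1f} h")

# Converting back recovers the log-space parameters.
mu_back, sigma_back = lognormal_mu_sigma(mean_hours, std_hours)
print(f"mu = {mu_back:.3f}, sigma = {sigma_back:.3f}")
```

Under this assumption, Mu 5 and Sigma 0.5 correspond to a mean repair time of roughly 168 hours, so entering 5 and 0.5 into fields that expect the mean and standard deviation in hours would describe a very different repair distribution.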
4 RESULTS OF SIMULATIONS/ANALYSIS
Figure 3-2 - Simple Model
key parameters from the simulations, Reliability at the end of simulation time (for non-maintainable systems), Availability (for maintainable systems) and Mean Time to First Failure or similar measure for non-repairable systems vary by small to surprisingly large numbers, especially for MTTFF and MTBF. The Large model was a no maintenance model. Reporting MTTFF and Mean Down Time are reasonable parameters for this model. However, one of the packages reported a MTBF and MTTR, which are inappropriate. Note that the Availability for this model is actually the percentage of the simulation time until first failure. In this case, the availability for one of the packages is 85.8% of 61362 hours.

To extract some of the parameters from the packages requires an intimate knowledge of how they work, and what tool within the package will provide the parameter desired. One of the packages requires simulating a second time to one failure to produce an MTTFF.

The values produced for the Large model show some curious behavior. For example, as the number of trials is increased, one software package had decreasing results and the other had increasing results. The implication is that the software packages are iterating from a different direction. The difference in the MTTFF value raises serious concerns. The difference is greater than 40%! The difference in reliability is just 5% and the difference in availability is 2%. Care must be taken in interpreting or using any or all of these values. The differences between Raptor and BlockSim vs. Relex need to be investigated for differences in how models are built and interpreted.

The packages provide statistical measures of all or some of the calculated values. When running a large number of runs, the Standard Error of the Mean can be used to show the range of the mean reliability. Using +/- 3 SEM will give you a good estimate of the range.
                                                                       Software Package
Model      Parameter                     Trials or Runs  Time (hours)  Raptor       BlockSim     Relex
One Block  Reliability                   1,000           1,000         0.3797       0.3663       0.365
One Block  Availability                  1,000           1,000         0.8927       0.8894       0.8430
Simple     Reliability                   1,000           100           0.983        0.977        0.978
Simple     Availability                  1,000           100           0.9955       0.9892       0.978
Simple     System Failures               1,000           100           0.017        0.023        Not Reported
Large      Reliability                   10,000          61,362        0.7024       0.737        0.6914
Large      Reliability                   1,000           61,362        0.718        0.729        0.707
Large      Availability                  1,000           61,362        0.858        0.861        0.691
Large      Availability                  10,000          61,362        0.847        0.865        0.6866
Large      MTTFF (Hours)                 10,000          61,362        144,775.992  201,679.125  146,321.53
Complex    Reliability                   10,000          100           0.1313       0.1215       0.0988
Complex    Availability                  10,000          100           0.3877       0.3741       0.3333
Complex    MTBF (MTBDE) (Hours)          10,000          100           36.2732      37.2032      33.92
Complex    MTTR (MDT) (Hours)            10,000          100           68.3853      62.2399      74.51
Table 4-1 - Results of Simulations
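The One Block rows offer a convenient sanity check, because the reliability of a single Weibull block has a closed form, R(t) = exp(-(t/scale)^shape). The sketch below is not the algorithm of any of the three packages. It is a plain Monte Carlo loop using Python's standard library, under these assumptions: the block starts in the up state, it is as good as new after each repair, and Mu and Sigma are log-space parameters. The function names and the seed are illustrative.

```python
import math
import random

SHAPE, SCALE = 1.5, 1000.0  # Weibull failure distribution (Table 3-1)
MU, SIGMA = 5.0, 0.5        # Lognormal repair distribution (Table 3-1), assumed log-space
MISSION = 1000.0            # simulation time in hours


def analytic_reliability(t):
    """Closed-form Weibull reliability at time t."""
    return math.exp(-((t / SCALE) ** SHAPE))


def steady_state_availability():
    """Long-run ratio MTTF / (MTTF + MTTR); not the same as mean availability over MISSION."""
    mttf = SCALE * math.gamma(1.0 + 1.0 / SHAPE)
    mttr = math.exp(MU + SIGMA ** 2 / 2.0)
    return mttf / (mttf + mttr)


def simulate(runs, seed=1):
    """Return (reliability at MISSION, mean availability over [0, MISSION])."""
    rng = random.Random(seed)
    survived = 0
    uptime_total = 0.0
    for _ in range(runs):
        clock, first_failure = 0.0, True
        while clock < MISSION:
            ttf = rng.weibullvariate(SCALE, SHAPE)  # arguments are (scale, shape)
            if first_failure and ttf >= MISSION:
                survived += 1  # no failure before the end of the mission
            first_failure = False
            uptime_total += min(ttf, MISSION - clock)
            clock += ttf
            if clock >= MISSION:
                break
            clock += rng.lognormvariate(MU, SIGMA)  # repair (down) time
    return survived / runs, uptime_total / (runs * MISSION)


reliability, availability = simulate(10_000)
print(f"analytic R(1000)        = {analytic_reliability(MISSION):.4f}")
print(f"simulated R(1000)       = {reliability:.4f}")
print(f"simulated mean A        = {availability:.4f}")
print(f"steady-state A (ratio)  = {steady_state_availability():.4f}")
```

The closed form gives exp(-1), about 0.368, which is in line with the three One Block reliabilities in Table 4-1. The steady-state ratio works out to about 0.843, while the mean availability over the first 1,000 hours is a different quantity. The source does not say which definition each package reports, so this only illustrates that availability figures carrying the same label need not measure the same thing.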
5 OBSERVATIONS AND CAUTIONS

Building models and entering data is a human activity subject to human reliability problems. We all fell into the abyss and made errors in connecting boxes, entering data, and setting up simulations. If we weren’t comparing the results of several software packages, these errors may have gone undetected. This error rate points to one important caution. If the results of the RBD analysis are critical to a decision making process, and not just for information, it is important that a redundant analysis path be developed to assure the results are correct within the limitations of the software, and not a product of erroneous modeling. We offer three approaches. (1) Have two analysts independently model the design using the same software package. (2) Have a second analyst review the first analyst’s work in detail, including all modeling decisions and data entries. (3) Have one analyst use two different packages for modeling.

Many times the results of these simulations are used to demonstrate compliance with a specified reliability or availability requirement. A result that would show a Reliability of 0.85 when the requirement was 0.90 might cause redesign, request for waiver, or other action to address the shortfall. However, the shortfall may be due to the parameters used for the simulation, the algorithms used by the software, a lack of understanding of how long to simulate, how many independent random number streams to use, and/or how many runs to use. Analytical solutions for highly complex models are based on approximations and simulations produce statistics which represent the results of multiple simulation runs.

The various programs do not necessarily describe variables in the same manner. When using the Lognormal distribution for example, we encountered a difference in terminology between Raptor and BlockSim. Raptor allows the Lognormal to be entered as Mean and Std. Dev. or Mu and Sigma. BlockSim only uses Mean and Std. Dev., but this is the same as Raptor’s Mu and Sigma. A novice could waste a
great deal of time clarifying what needs to be entered as data.

Modeling special cases can be difficult because of the way the programs handle standby (which was in our models) and phasing (which was not in our models).

The output parameters were not consistently labeled, and the user should understand the difference between MTTF, MTTFF, MTBDE, and MTBF for reliability and MDT and MTTR for maintainability. The products also provide reliability and availability results with various adjectives such as “mean”, “point”, “conditional”, etc. A review of the literature provided with the packages is necessary to understand these terms and relate them to those found in specifications, handbooks, references, and texts. It is a serious issue that there doesn’t appear to be standard and/or consistent terminology and notation from one program to another as well as to standard literature in the field.

Each of the packages has tabs, checkboxes, preferences, defaults, multiple random number streams, selectable seeds for random numbers, etc. to facilitate the modeling, analysis, and simulation process. However, this flexibility can provide huge pitfalls to the analyst. Care in modeling, and use of support services provided by the software supplier, is a good practice.

Each of the authors worked with the software package each was most familiar with. Despite this familiarity, numerous runs and reruns were necessary due to idiosyncrasies of the software, as well as errors in modeling, confusion of parameter definition, etc. Simple RBDs (parallel-series combinations of Exponential failure rate blocks without maintenance) are not the issue here. The problems compound as a variety of failure distributions are intermixed with a similar grouping of repair distributions. As these become more complex, a simulation becomes mandatory.

Some additional observations and cautions are given in the paragraphs that follow.

The models can run quickly even on old Pentium II PCs, or they can take hours to run. Length of simulation time, number of runs, and failure rate of the system can all contribute to lengthening of simulation time. One of the models took in excess of 1 hour on a 3 GHz Pentium IV. Convergence of the results is heavily dependent on how consistent the block failure rates are. For example, one block with an MTBF of 1,000 hours can double or triple simulation time in a system where the other blocks have MTBFs in the 100,000-hour range. The display during simulation on some of the packages shows the general trend, but there can be a lot of outliers.

The display of Availability and/or Reliability during simulation can be useful for seeing how the simulation is behaving. For most models, this rapidly stabilizes to the first decimal place, and then the second decimal place tends to bounce around. Usually you get the first two significant figures in a hundred runs.

One of the models was so complex that it failed to converge on one of the packages; again, this may have been due to a subtle preference selection (or non-selection) or a human error.

We have the impression that most of the user interfaces were designed by software designers, working with R&M engineers. The problem is that we seem to have gotten what an R&M engineer would tell someone never having used the product, not the interface he would like after he becomes familiar with the product. For example, double-clicking and working through multiple tabs to put data into blocks in a block diagram is very modern. Sometimes an alternative method using tables of properties is easier to use even if it doesn't let you create blocks or change probability distributions.

BIOGRAPHIES

Aron Brall
SRS Technologies, Mission Support Division
NASA Goddard Space Flight Center
Code 302.9, Building 6
Greenbelt, Maryland 20771 USA
abrall@pop300.gsfc.nasa.gov

Aron Brall is the Reliability Team Lead at NASA Goddard Space Flight Center for SRS Technologies. Previously he held several positions in Product Assurance, including Vice President of Quality, in 14 years at Landis Grinding Systems, a Division of UNOVA Industrial Automation Systems. Prior to that he worked 12 years for the Amecom Division of Litton Systems as a Systems Effectiveness Project Engineer. Out of thirty-nine years professional experience, thirty-two have been in Reliability and Product Assurance. He received a BS in Electrical Engineering in 1967 from Columbia University, NY, NY, and an MBA in 1987 from Loyola College, Baltimore, MD. He is a senior member of ASQ, IEEE, and SME and a member of the SRE and SAE. He is an ASQ Certified Reliability Engineer. He is a contributing member of the committees that prepared the initial and revised editions of SAE M-110, Reliability and Maintainability Guideline for Manufacturing Machinery and Equipment. He is also a member of the RAMS 2007 Management Committee.

William Hagen
Ford Motor Company, Powertrain Manufacturing Engineering
R&M Section, Global Engineering Alignment
36200 Plymouth Road T3A
Livonia, MI 48150 USA
e-mail: whagen2@ford.com

Mr. Hagen has worked in manufacturing equipment Reliability and Maintainability at Ford Motor Company’s Powertrain Division for the last 11 years. He received a Bachelor of Science degree in Electrical Engineering from Michigan State University in 1978, and has undertaken graduate studies in computer science at Worcester Polytechnic Institute. His background in Reliability engineering includes four years as a reliability engineer working on sonar systems at Raytheon Submarine Signal Division and thirteen years on facsimile, communication, electronic support measures and spacecraft at Litton Industries Amecom Division.

Hung Tran
SRS Technologies, Mission Support Division
NASA Goddard Space Flight Center
Code 302.9, Building 6
Greenbelt, Maryland 20771 USA
htran@pop300.gsfc.nasa.gov

Hung Tran has over 6 years of reliability engineering experience in relation to unmanned and manned spacecraft systems. Currently he works as Reliability Engineer at NASA Goddard Space Flight Center for SRS Technologies. Previously, he held positions as Reliability Engineer and Risk Management at NASA Goddard Space Flight Center for MEI Technologies and at NASA Lyndon B. Johnson Space Center for GHG Corporation. He graduated from University of Houston with Bachelor of Science in Mathematics. He has extensive experience with utilizing computation tools such as Rapid Availability Prototyping for Testing Operational Readiness (RAPTOR), System Analysis Programs for Hands-On Integrated Reliability Evaluation (SAPHIRE), and Relex Reliability Software.