Clemson University
TigerPrints
All Dissertations
Dissertations
12-2018
Augmenting a Hazard Analysis Method with Error
Propagation Information for Safety-Critical
Systems
Fryad Khalid M. Rashid
Clemson University, fryad1979@gmail.com
Follow this and additional works at: https://tigerprints.clemson.edu/all_dissertations
Recommended Citation
Rashid, Fryad Khalid M., "Augmenting a Hazard Analysis Method with Error Propagation Information for Safety-Critical Systems"
(2018). All Dissertations. 2259.
https://tigerprints.clemson.edu/all_dissertations/2259
This Dissertation is brought to you for free and open access by the Dissertations at TigerPrints. It has been accepted for inclusion in All Dissertations by
an authorized administrator of TigerPrints. For more information, please contact kokeefe@clemson.edu.
AUGMENTING A H AZARD A NALYSIS M ETHOD WITH E RROR
P ROPAGATION I NFORMATION FOR S AFETY-C RITICAL S YSTEMS
A Dissertation
Presented to
the Graduate School of
Clemson University
In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy
Computer Science
by
Fryad Khalid M. Rashid
December 2018
Accepted by:
Dr. John D McGregor, Committee Chair
Dr. Brian Malloy, Committee Co-Chair
Dr. Amy Apon
Dr. Murali Sitaraman
Abstract
Safety-critical systems need specific activities in the software development life cycle to ensure that
the system will operate safely. The objective of this dissertation is to develop a new safety analysis method
to identify hazards. The method uses error propagation information and the internal structure rather than the
interfaces of a system. We propose development procedures to augment STPA (System-Theoretic Process
Analysis) with error propagation information derived from the architecture description of a system represented in the AADL (Architecture Analysis & Design Language). We will focus on how the AADL error
ontology can be used to assist in identifying errors, how those errors propagate among components, and
whether the errors lead to hazards in the system. Our research shows that tracing error propagation leads to
the discovery of hazards and additional information that other methods miss.
The new safety analysis method, Architecture Safety Analysis Method (ASAM), by augmenting
STPA with early design information, is able to find more hazards, unsafe control actions, safety constraints
and causes of the unsafe control actions than by using STPA alone. Our method leaves more false positives
than STPA, but in safety analysis having false positive is preferred over missing actual hazards. We use
the AADL error ontology to rigorously describe system component errors and how they propagate among
components. We illustrate this rigorous description through several examples and we demonstrate that it
yields hazards that an STPA analysis of the example did not find. In addition, we provide a mathematical
notation and expressions so that formal analysis and verification of the hazards can be done to ensure that
all causes of the hazards have been identified and that any developed safety constraints fully mitigate the
hazards, through the use of compositional reasoning.
ii
Dedication
I dedicate this doctoral dissertation to the cancer patients, nurses, physicians and professors in the
MD Anderson Cancer Center at the University of Texas (Houston Campus).
iii
Acknowledgments
I would like to express my deepest gratitude to my advisor, Dr. John D. McGregor, for his unwavering support, advice, and help that he has given me throughout this academic journey.
I would like to specially thank my best friend, Dr. Ethan McGee, for his help and support throughout
this dissertation. In addition, I would like to thank my other fellow Ph.D. colleagues in the Strategic Software
Engineering Research Group (SSERG), Yates Monteith and Wanda Moses, for their willingness to listen and
offer constructive feedback about my research ideas. Also, I would like to thank my dissertation defense
committee members for their feedback on this research and for their willingness to serve throughout the
defense process. Finally, I would like to thank my scholarship program, The Higher Committee For Education
Development (HCED) in Iraq, for helping to make my dreams come true.
I am deeply thankful to my lovely wife, Dr. Tavga A. Hussain, for her love, support, motivation and
sacrifice. Without her, this doctoral degree would never have been done.
I would also like to thank everyone who has offered encouragement and support throughout my
entire graduate career. This includes my parents, my siblings, my former co-workers at Sulaimani Polytechnic
University / Clemson University and my close friends.
iv
Table of Contents
Title Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iv
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
viii
1 Introduction . . . . . . . . .
1.1 Problem Statement . . .
1.2 Research Method . . . .
1.3 Contributions . . . . . .
1.4 Dissertation Statement .
1.5 Research Questions . . .
1.6 Dissertation Organization
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
2
3
4
4
5
5
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1 System-Theoretic Process Analysis . . . . . . . . . . . . . .
2.2 Architecture Analysis and Design Language . . . . . . . . .
2.3 Error-Model Annex . . . . . . . . . . . . . . . . . . . . . .
2.4 Safety Verification Methods . . . . . . . . . . . . . . . . .
2.5 System-Theoretic Process Analysis Augmentation Reasons .
2.6 False Positive and False Negative in Safety-Critical Systems
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7
7
9
12
18
20
21
3 Feedback Control Loop Architecture . . . . . . . . . . . . . . . . . . . . . . . . .
3.1 Augmenting a Hazard Analysis Method with Error Propagation Information . . .
3.2 Additional Scenarios to the Augmented Method . . . . . . . . . . . . . . . . . .
3.3 Using Formal Notations to Augment a Hazard Analysis Method . . . . . . . . .
3.4 Introducing Feedback Control Loop Architecture with Compositional Reasoning
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
23
23
28
32
38
4 Analysis and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1 Demonstrating Three-Way Communication Format . . . . . . . . . . . . . . . .
4.2 Demonstrating Error Mitigation by Safety Constraints . . . . . . . . . . . . . . .
4.3 Demonstrating Safety Constraints Verification . . . . . . . . . . . . . . . . . . .
4.4 Demonstrating Notations and Expressions for Error Flows and Safety Constraints
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. 42
. 43
. 68
. 88
. 102
5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.1 Difference Between Two-Way and Three-Way Communication Formats . . . . . . . . . . . 110
v
5.2
5.3
Justification of & Limitations of Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 111
ASAM Field Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.1 Works Related to ASAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2 Additional Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.1 ASAM’s Future Work in Adaptive Cruise Control System . . . . .
7.2 ASAM’s Future Work in Medical Embedded Device - Pacemaker .
7.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
118
118
120
121
8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Appendices . . . . . . . . . . . . . . . . . . .
A
Error Ontology . . . . . . . . . . . .
B
STPA Example . . . . . . . . . . . .
C
Safety-Critical System Terminologies
D
List of Abbreviations . . . . . . . . .
E
ASAM Annex Language Reference .
F
XText Grammar . . . . . . . . . . . .
G
ASAM’s plug-in . . . . . . . . . . . .
H
Feedback Results Representation . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
124
125
127
130
132
133
138
143
145
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
vi
List of Tables
3.1
3.2
3.3
3.4
Unsafe Functional Behavior . . . . . . . . . . . . . . . . . .
Finding More Possibilities . . . . . . . . . . . . . . . . . . .
Mathematical Notations for Error Flows and Safety Constraints
Mitigating the unsafe control actions . . . . . . . . . . . . . .
.
.
.
.
37
37
38
40
4.1
Evaluation Examples Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
43
7.1
7.2
Identify FP/FN/TP/TN in ACC system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Identify FP/FN/TP/TN in pacemaker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
vii
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
List of Figures
2.1
2.2
Feedback Control Loop in STPA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Flow Visualized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
12
3.1
3.2
3.3
3.4
Feedback control loop augmented with error propagation and finite state machines .
The augmented method for additional scenarios . . . . . . . . . . . . . . . . . . .
A framework for feedback control loop inputs/outputs and functions . . . . . . . .
Introduce feedback control loop architecture with assume-guarantee reasoning . . .
26
29
36
41
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12
4.13
4.14
4.15
4.16
4.17
4.18
Three-Way Communication Format for Adaptive Cruise Control System Architecture . . . . 52
ASAM’s report for Adaptive Cruise Control System Architecture . . . . . . . . . . . . . . . 53
Three-Way Communication Format for Train Automated Door Control System Architecture
60
ASAM’s report for Train Automated Door Control System Architecture . . . . . . . . . . . 61
Three-Way Communication Format for Embedded Medical Device . . . . . . . . . . . . . . 63
ASAM’s report for Pacemaker System Architecture . . . . . . . . . . . . . . . . . . . . . . 63
Three-Way Communication Format for Isolette System Architecture . . . . . . . . . . . . . 66
ASAM’s report for Isolette System Architecture . . . . . . . . . . . . . . . . . . . . . . . . 67
ASAM’s Error Mitigation report for ACC System Architecture . . . . . . . . . . . . . . . . 72
ASAM’s report for Non-Mitigated Error in ACC system . . . . . . . . . . . . . . . . . . . 74
ASAM’s Error Mitigation report for Train Automated Door Control System Architecture . . 79
ASAM’s report for Non-Mitigated Error in Train Door . . . . . . . . . . . . . . . . . . . . 81
ASAM’s Error Mitigation report for pacemaker . . . . . . . . . . . . . . . . . . . . . . . . 85
ASAM’s report for Non-Mitigated Error in pacemaker . . . . . . . . . . . . . . . . . . . . 87
ASAM’s report about safety constraints verification for train door . . . . . . . . . . . . . . 93
ASAM’s report about safety constraints are not verified for the train door . . . . . . . . . . 94
ASAM’s report about safety constraints verification for pacemaker . . . . . . . . . . . . . . 99
ASAM’s report about safety constraints are not verified for the pacemaker . . . . . . . . . . 101
5.1
5.2
5.3
Two-way versus Three-way communication format . . . . . . . . . . . . . . . . . . . . . . 111
Control Table versus Error Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
ASAM’s contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1
7.2
ASAM’s future work in ACC example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
ASAM’s future work in pacemaker example . . . . . . . . . . . . . . . . . . . . . . . . . . 120
1
2
3
4
5
6
Feedback control loop for automated door control system for train
Unsafe control actions for train door controller . . . . . . . . . .
Process model of the controller . . . . . . . . . . . . . . . . . . .
Context of the control actions . . . . . . . . . . . . . . . . . . . .
Feedback questionnaire . . . . . . . . . . . . . . . . . . . . . . .
Results of the votes . . . . . . . . . . . . . . . . . . . . . . . . .
viii
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
127
128
128
129
145
146
Chapter 1
Introduction
John Knight wrote that ”Safety-critical systems are those systems whose failure could result in
loss of life, significant property damage, or damage to the environment” [19]. His primary concern was the
consequences of failure. There are many safety-critical system examples such as automotive systems, medical
devices, aircraft flight controls, nuclear systems, etc. Malfunctions in these systems could have significant
consequences such as severe injuries and mission failure. We know that those systems are dependent on
software to the extent that they could not work without the software. So, they should be designed, verified,
and validated very carefully to make sure that they obey the system specifications and requirements and are
free from errors.
One of the motivations for doing this research is a study by the National Institute of Standard and
Technology [42] found that 70% of software faults are introduced during the requirements and architecture
design phases, while 80% of those faults are not discovered until system integration testing or later in the
development life cycle. The cost of rework to fix the software problem is more expensive than finding and
fixing during the requirements and design phase. This presents a big challenge in designing safety-critical
embedded systems to address system faults that are recently discovered late in the development life cycle.
The software in safety-critical systems may not harm people directly, but it can control different
types of equipment which may cause harm. There are many examples of safety systems that have failed due
to software related faults such as ESA Rocket [5] , Therac-25 [23], PCA Pump [37], Toyota Prius [14],
ARIANE 5 [24], Boeing 777-200 [41], and pacemaker types such as Medtronic/Biotronik/St.Jude Medical,
[6]. This evidence shows that software errors, bugs and flaws in the requirements might have catastrophic
consequences. For that reason, software operating critical functions should be very carefully designed and
1
analyzed to avoid and prevent errors as well as their propagation.
From the system safety perspective, interaction complexity among embedded software and hardware
platforms will increase criticality between system components. For that reason, safety can be described
as an emergent behavior property that arises from the interaction of system components. That property
requires understanding how the components are connected and how they interact with each other in the
system design. Also, understanding is required because the concept of emergent behavior carries the idea
that unsafe interaction among components could lead to violating safety requirements.
From the software perspective, in the early system design phase, we could notice that software
components are scattered across so many nodes that it may introduce problems such as late or early delivery
of data, loss of system operation or concurrent access to the same resources. Additionally, we could see in
the early design phase of the software that it is possible to generate values beyond expected boundaries, that
there is inconsistency between requirements or that there are mismatched timing requirements. If we know
the faults cause errors and the those errors are not mitigated, then errors will propagate to the implementation
and may not be caught by testing efforts. If these problems remain undetected during software testing, they
could lead to serious errors and potential injury.
From safety and software perspectives, we need to deal with software aspects according to the safety
requirements to build a safe system. The real challenges start here. How can one develop a safe system that
fulfills safety constraints in the development of software-intensive systems? At the same time, hazard analysis
techniques cannot describe the dynamic error behavior of the system and error propagation among system
components. Because of that, traditional hazard analysis techniques depend on decomposition of the system
with respect to the hierarchy of failure effects instead of the systems architectural model. For that reason,
there is a need for a new hazard analysis method to support modeling and analyzing dynamic error behavior
to find hazards in early development phases.
To the best of our knowledge, no guidance to augment existing hazard analysis techniques to find
hazards and additional information that they missed exists. Also, to the best of our knowledge, no formalized
guidance for hazard analysis techniques to improve the rigor of their methods exists.
1.1 Problem Statement
In this dissertation, we address the incompleteness of hazard analysis to find faults that exist beyond
just the controller. We provide a systematic procedure to augment a recent hazard analysis approach, STPA
2
(System-Theoretic Process Analysis), with error propagation information and dynamic support contributors
to find additional hazards that STPA did not find. In addition, the augmented method also addresses the
lack of using mathematical models in safety analysis to ensure that all causes of a fault have been identified.
Finally, we also address the lack of ability of ensure that a safety constraint fully mitigates a fault. The
problem statement could be summarized as follows: The development of safety critical software would be
improved if:
1. Guidance were available on how to augment the fault finding analyses to address incompleteness.
2. Guidance were available on how to formalize analysis to improve the rigor of the hazard analysis
methods.
3. Recent hazard analysis methods were linked with compositional reasoning techniques to verify the
safety constraints against the faults they mitigate.
1.2 Research Method
In this dissertation, we are analyzing the system before the architecture design is completed. The
purpose of our work is to allow a stakeholder or safety analyst, at design time, to improve the safety of
the system and build a safe product. Building on the work previously done [38], this research seeks to
develop a safety analysis method that provides a deeper analysis of the system under review. The method
will be developed in the form of an architectural model that describes internal failures in one of the major
components (sensor, controller, actuator, controlled process, and controller input device) of the feedback
control loop architecture, and we will show how the internal failure affects the other components in the form
of three-way interactions. We will illustrate the method through several scenarios.
We will provide a theoretical foundation in the form of mathematical models for each communication channel and error in the feedback control architecture. This formal background will help to provide a
guide to determine the expected behavior of the system because most of system design starts with a system
specification and there is not necessarily a complete description of the system’s behavior.
In addition to documenting and describing the error, we will also verify that the hazard situation
which causes the unsafe behavior of the system is mitigated. This evaluation of the system safety verification
help to verify the safety constraints based on the system / component design.
3
1.3 Contributions
The primary contributions of the dissertation will be as follows:
1. The introduction of a method of augmenting STPA (System-Theoretic Process Analysis) with error
propagation information and error behavior state machines using AADL (Architectural Analysis and
Design Language).
2. The introduction and analysis of a mathematical notation for each error flow in the feedback control
loop architecture.
3. The identification of a method of verifying safety constraints / mitigating unsafe control actions in the
hazard analysis process.
1.4 Dissertation Statement
In this dissertation, we assume that:
1. The error propagation information and error behavior state machine analysis can be applied to augment
hazard analysis methods to find new hazards as well as specific causes and effects. Recent hazard analysis methods such as STPA (System-Theoretic Process Analysis) mechanisms can be adapted and used
to provide support for this type of augmentation under different scenarios and development contexts.
2. The increasing complexity and criticality of modern computing systems are driving the need for enhanced hazard analysis methods. This augmentation is used to support modeling dynamic error behavior of the system during hazard analysis and representing a formal framework for the hazard identification in system architecture.
3. Safety-critical systems are driving the need to understand the feedback control loop architecture to
manage variations, component failures and emergent behavior between system components resulting in
hazards. Our framework can be applied to augment feedback control loops with mathematical notations
and expressions to find hazardous conditions in a design, to improve rigor of STPA, and to formalize
error descriptions in systems described in AADL.
4. Compositional reasoning, especially assume-guarantee contract techniques, can be applied to augment
safety analysis methods to verify safety requirements identified for each component. The feedback
control loop architecture in STPA can be adapted and used to support the contract.
4
1.5 Research Questions
In this dissertation, we are attempting to answer the following specific research questions in the
safety-critical systems:
• RQ1) Can the safety of the system be improved by augmenting a hazard analysis method with error
propagation information? We need to provide deeper analysis of the system under review. This allows
us to find more faults before instantiating the product.
• RQ2) Can we analyze the internal failures of the components in order to show its effect of the other
components during hazard analysis process? Modern information systems have many components.
The interaction among components can make the system unreliable or unsafe. We need to know how
the components interact with each other in the early design phase. For example, what will happen to
the system if one of the components has internal failure?
• RQ3) How does knowledge of the interaction among the components in the early design phase help
us to identify internal failures? Safety can be defined as an emergent behavior property of the system.
This allows us to deal with interactions among system components.
• RQ4) How can the propagated error be presented in the form a mathematical model? We need to
analyze system behavior in a formal way to expect unsafe interactions between system components.
For example, what are those elements / parameters that effect the other components?
• RQ5) How can we identify the assume-guarantee contract for each component in the feedback control
loop architecture? We need to know the specific information about specification and implementation
for each component in the feedback control loop.
1.6 Dissertation Organization
The remainder of the dissertation is organized as follows: Chapter 2 provides necessary background
on STPA (System-Theoretic Process Analysis), AADL (Architecture Analysis and Design Language), EMV2
(Error-Model Annex version 2), safety verification methods, STPA augmentation reasons and false positive/false negative in safety-critical systems; Chapter 3 provides feedback control loop architecture augmentation
sections such as augmenting STPA with error propagation information, mathematical notation for formalized
augmentation of STPA and augmenting STPA with compositional reasoning; Chapter 4 provides our method
5
for analysis and evaluation; Chapter 5 provides justifications and limitations of evaluation; Chapter 6 discusses related work; Chapter 7 introduces plans for future enhancements of each safety-critical application
example; and Chapter 8 provides conclusions based on the differentiation between our method and the current accepted standard.
6
Chapter 2
Background
In this section, we present relevant background material for this dissertation. This section will provide an overview of STPA (System-Theoretic Process Analysis), AADL (Architecture Analysis and Design
Language), EMV2 (Error-Model Annex v2), STPA augmentation reasons and false positives / false negatives
in safety-critical systems.
2.1 System-Theoretic Process Analysis
STAMP (Systems-Theoretic Accident Model and Processes) is a causality model built on system
theory used to analyze system accidents. In this model of system safety, accidents could occur when components fail or when interactions between system components are inadequately handled by the control system.
Therefore, STAMP views system safety as a control problem and introduces three main concepts: safety
constraints, hierarchical control structures and process models [20].
STPA (System-Theoretic Process Analysis) is a top-down hazard analysis approach built on STAMP.
The idea behind this approach is to avoid accidents by modifying the design of the system before implementation. The goal is to identify potential causes of accidents, that is scenarios that may lead to losses, so they
can be controlled or eliminated in the system design or operations before damage occurs. In a nutshell, it
provides scenarios to control and mitigate the hazards in the system design. The method consists of four steps
to provide scenarios [21]:
1. The stakeholder establishes fundamental analyses to identify accidents and the hazards associated with
those accidents. For example, if we assume that the accident is two aircraft colliding with each other,
7
then, the hazards could be the two aircraft violate minimum separation or the two aircraft enter unsafe
atmospheric regions.
2. The stakeholder designs a feedback control loop for the system to identify major components such as
sensors, controllers, actuators and the controlled process. Then, the stakeholder labels the control /
feedback arrows as shown in figure 2.1. Essentially, the information from the figure 2.1 about each
element is extracted into in the feedback control loop. For example, we need to know that the process
model is the mental model of the controller. It is used to store information about variables and their
values. The sensor is used to measure the values of the variables. The actuator is used to follow the
instructions of the controller. The controlled process is used to implement commands that come from
the controller through the actuator and send the result back to the controller through the sensor.The
guide words among elements are used to identify hazards.
3. The stakeholder identifies unsafe control actions that could lead to hazardous states. The stakeholder
should identify the following unsafe control actions, and then translate them to corresponding safety
constraints:
• A control action is needed for safety, but it is not provided.
• An unsafe control action is provided.
• A safe control action is provided too late or too early.
• A safe control action is stopped too soon or applied too long.
4. The stakeholder identifies causal factors for the unsafe control actions. The safety analyst determines
how each hazardous control action could occur by identifying the process model variables for the
controller in the feedback control loop and analyzes each path to find out the cause of the unsafe
control actions.
An example STPA analysis is provided in appendix B.
8
Figure 2.1: Feedback Control Loop in STPA
2.2 Architecture Analysis and Design Language
The Architecture Analysis and Design Language (AADL) is an architecture modeling language
which uses a common description language to bind software and hardware elements together into a unified
model. The language is a standard of the Society of Automotive Engineers (SAE) and is actively maintained.
The language can be extended through the use of annexes; each annex is standardized independently and has
separate syntax or functionality [32].
9
2.2.1
Language Overview
The language has keyword constructs for representing the execution platform, hardware components
(device, processor, bus, memory), software components (data, subprogram, thread, process) and the integration of the two (system). AADL is a well known modeling language for safety-critical systems because it
captures the hardware and software aspects of a system and their interactions. It is also possible to create
(abstract) components in the early stage of system design that can then can be refined into either software or
hardware at a later time. Additionally, AADL includes connections among components of various types such
as ports, bus access and data access [32].
AADL components are composed of two separate definitions. The first is a specification of the
component. It defines the component type and its external interfaces used to communicate with the rest of
the world. The second is an implementation. It defines the inside of the component and how the interfaces
are connected to sub-components to provide services. One of the important of specification feature is a port.
The port represents the interaction point for an event / notification, data transfer or a combination of events
and data. The port can be specified for components such as devices, threads, processes and systems. There
are three types of ports [11]:
1. The event port represents the reception of a notification and carries the signal without any data. For
example, detection of data from a sensor such as temperature being above or below a defined threshold.
When using this kind of port, the software is waiting for an incoming event.
2. The data port represents an interface which receives data flow and keeps track of the latest values. The
port value will be updated as soon as possible when a new data value is received.
3. The event data port represents an interface that combines the event port and the data port to exchange
data with notification in the same entity. It sends the signal (event) associated with the data.
AADL also introduces the concept of flows. The flow enables the representation and analysis of
logical paths through an architecture, and can reflect scenarios that can be the basis for system level analysis. AADL can support and declare flow specifications and implementations for the components. The flow
specification defines a logical flow from input port to output port of the component. The logical flow can be
represented as either a data flow, fault event flow or control flow. The flow specification can define three types
of flows for a component [9, 15]:
10
1. The flow source defines a flow which originates within a component and emerges from the component
through an outgoing port.
2. The flow sink defines a flow which terminates within a component and enters the component through
an incoming port.
3. The flow path defines a flow which enters a component through an incoming port and emerges through
an outgoing port.
The flow implementation elaborates the flow specifications through sub-components. It starts with the incoming port of the flow specification, follows a sequence of connections and sub-components and ends with an
outgoing port. The last type declaration of a flow is the end-to-end flow which defines a complete flow path
from the starting component to the final component in the system.
In this dissertation, we use the flows concept to analyze important system characteristics. For instance, we show (in listing 2.1 and figure 2.2) a flow within a speed control system in the vehicle. This
flow originates at Brake Pedal Sensor, traverses through the Speed Control System and terminates at Throttle Actuator. When the driver pushes the brake pedal, the Brake Event will pass through the outgoing
event data port via flow source to the Speed Control System. Then, the Speed Control System receives the
Brake Event through incoming event data port, adjusts the throttle setting value to decrease speed of the vehicle and sends the value to the throttle value via the flow path to the out event data port. The Throttle Actuator
reduces fuel into the engine based on the received setting value through the incoming event data port. The
flow sink shows the impact of Brake Event on the Throttle Actuator and stops the flow.
1
2
3
4
5
6
device
Brake Pedal Sensor
features
Brake Event : out event data port ;
flows
Brake Flow : flow s o u r c e Brake Event ;
end B r a k e P e d a l S e n s o r ;
7
8
9
system Speed Control System
features
10
Brake Event : in event data port ;
11
T h r o t t l e A d j u s t i n g : out event data port ;
12
flows
11
13
14
C r u i s e F l o w : f l o w p a t h B r a k e E v e n t −> T h r o t t l e A d j u s t i n g ;
end S p e e d C o n t r o l S y s t e m ;
15
16
17
18
19
20
21
device Throttle Actuator
features
Throttle Adjusting : in event data port ;
flows
T h r o t t l e F l o w : flow s i nk T h r o t t l e A d j u s t i n g ;
end T h r o t t l e A c t u a t o r ;
Listing 2.1: Flow AADL Example
Figure 2.2: Flow Visualized
2.3 Error-Model Annex
In designing safety-critical systems, we need to show that the system is reliable and resilient to
different types of failures. For that purpose, a number of different types of safety analysis methods are
required in order to show the system is safe and has appropriate measures to handle critical failures. We
know that the main objective of a Model-Based Engineering (MBE) approach is to automate the development
process such as building model based tools to automate the safety validation process. The core AADL
language does not contain the language constructs to do safety analysis. For that reason, the core language
is extended with an annex dedicated to safety analysis, Error Model Annex v2, which provides the required
semantics for catching safety information and showing error propagation through the architecture.
The Error Model Annex is used to capture what could go wrong in the architecture, to specify which
12
errors could arise from the components, how those errors propagate through the architecture and how they
impact components behavior. This annex has four levels of abstraction. In order, they are [9, 11, 33]:
1. Error Ontology: Represents error types in a hierarchical structure to support hazard analysis and assist
modelers to make sure that they have considered different types of error propagation in the interaction
among system components. The error ontology can distinguish errors within the architecture and how
each type can propagate or impact the components’ behavior. We provide a hierarchical structure of
error ontology in appendix A. In here, we give a brief description of each type of error [10, 38]:
• Service Errors: Represent errors which are related to delivering service for items. Service
errors can be categorized as omission errors, no service delivered (such as loss of a message
or command), and commission errors, which represent unexpected service provided (such as
unintended incoming data).
• Value Errors: Represent errors which are related to the value domain of a service. Value errors
can be categorized as value errors for individual service items like out of range sensor reading,
value errors for sequences of service items like bounded value change and value errors related to
the service as a whole like out of calibration.
• Timing Errors: Represent errors which are related to the time domain of a service. Timing
errors can be categorized as individual service items like early/late item delivery, timing errors
for sequences of service items like rate errors, and timing errors related to the service as a whole
like early/delayed services.
• Replication Errors: Represent errors which are related to delivery of replicated services. For
example, replicate service items delivered for one recipient or to multiple recipients.
• Concurrency Errors: Represent errors which are related to behavior of concurrent systems
like executing tasks concurrently to access shared resources. Errors are distinguished between
race condition errors and mutual exclusion errors.
• Access Control Errors: Represent errors which are related to operation of access control
services like authentication and authorization errors.
Developers can use the error ontology in AADL, which is defined in the ErrorLibrary.aadl in
OSATE (Open Source AADL Tool Environment), when they design a system. Although, if the existing
13
names do not fit with their system, developers can define their own error types in a package as shown
in listing 2.2.
1
2
package My Error Types
public
3
a n n e x EMV2 { ∗∗
4
error types
5
BrakeEventError : type ;
6
BadValue : t y p e e x t e n d s B r a k e E v e n t E r r o r ;
7
NoValue : t y p e e x t e n d s B r a k e E v e n t E r r o r ;
end t y p e s ;
8
∗∗};
9
10
end MyErrorTypes ;
Listing 2.2: Define Error Types in AADL
2. Error Flows in the Architecture: Represents how errors flow through the system architecture, after
we identified the type of errors for the system in the previous step. In this step, EMV2 allow us to
describe:
• Error Propagation: Define incoming and outgoing errors for a component. The error propagation has the ability to specify what error type the component can expect to receive or send,
but it does not have the ability to specify how incoming and outgoing error types relate to each
other. For example, no one knows how an error from an incoming port can generate an error on
an outgoing port or how an error in the processor effects a process and results in an outgoing
propagation on its data port. The error flow can solve the relation problem between errors.
• Error Flow: Defines the relationship between incoming and outgoing error propagations. There
are three types of error flows:
(a) Error Source: Specifies that the error originates from this component. Especially, when
the component has an internal failure, it will generate the error being propagated.
(b) Error Path: Specifies the transmission of an incoming error propagation to an outgoing
error propagation. The error is propagated by a component itself. The component is passing
the error from the incoming port to the outgoing port.
14
(c) Error Sink: Specifies that the error is stopped at this component. The error can be handled
or mitigated at this component because it impacts the architecture.
In listing 2.3, we show how to define error propagations for each component such as the Brake Pedal Sensor,
Speed Control System and Throttle Actuator. We can see that the device Brake Pedal Sensor is an error source for any error type which propagates out of the event data port Brake Event. The device
Throttle Actuator is an error sink for any error type. The Speed Control System is an error path because an error of any type can propagate into the system through the incoming event data port, and the
error can propagate out through the outgoing event data portm Throttle Adjusting. We do not know
how the propagated error impacts the component at this stage. We will show that in the next step.
1
2
device Brake Pedal Sensor
features
Brake Event : out event data port ;
3
4
a n n e x EMV2 {∗∗
5
use types E r r o r L i b r a r y ;
6
error propagations
Brake Event : out p r o p a g a t i o n s { AnyError };
7
flows
8
Sensor Flow : e r r o r s o u r c e Brake Event { AnyError };
9
end p r o p a g a t i o n s ;
10
11
12
∗∗};
end B r a k e P e d a l S e n s o r ;
13
14
15
system Speed Control System
features
16
Brake Event : in event data port ;
17
T h r o t t l e A d j u s t i n g : out event data port ;
18
a n n e x EMV2 {∗∗
19
use types E r r o r L i b r a r y ;
20
error propagations
21
Brake Event : i n p r o p a g a t i o n s { AnyError };
22
T h r o t t l e A d j u s t i n g : out p r o p a g a t i o n s { AnyError };
23
flows
15
C o n t r o l F l o w : e r r o r p a t h B r a k e E v e n t −> T h r o t t l e A d j u s t i n g ;
24
end p r o p a g a t i o n s ;
25
26
27
∗∗};
end S p e e d C o n t r o l S y s t e m ;
28
29
30
device Throttle Actuator
features
Throttle Adjusting : in event data port ;
31
32
a n n e x EMV2{∗∗
33
use types E r r o r L i b r a r y ;
34
error propagations
T h r o t t l e A d j u s t i n g : i n p r o p a g a t i o n s { AnyError };
35
flows
36
Actuator Flow : e r r o r s i n k T h r o t t l e A d j u s t i n g { AnyError };
37
end p r o p a g a t i o n s ;
38
39
40
∗∗};
end T h r o t t l e A c t u a t o r ;
Listing 2.3: Error Propagation Concept in AADL
3. Component Error Behavior: Represents how errors impact a component’s behavior. For instance,
an error can change the operating mode of a component from nominal mode to failure mode. For
that purpose, EMV2 provides an error behavior state machine (EBSM) to catch safety behavior of a
component according to its states, internal failure/internal error events and errors coming from outside
world.
We can define the states of a component regarding its error behavior. Basically, error behavior state
machines (EBSM) allows stakeholders to define states according to system operational environment.
For example, we consider two states for each component: Operational state (which means the component is active and it is operating without any error), and Failed state (which means the component is
possibly active but not without error). A transition can define the condition under which a state change
could occur. Each transition consists of a source state which is the initial state of the component and
a destination state which is the final state after the transition is triggered. A condition is an error event
that needs to be activated or triggered to activate the transition. We show the error behavior of a com16
ponent in listing 2.4, by two transitions. The first transition is from Operational state to Failed state
when the Failure error event is raised. The second is from Failed state to Operational state when the
Recovery event is activated.
1
2
e r r o r behavior Fail And Recover
events
3
Failure E : error event ;
4
Recovery E : r e c o v e r event ;
5
states
6
Operational : i n i t i a l
7
Failed : state ;
8
9
10
11
state ;
transitions
t 1 : O p e r a t i o n a l −[ F a i l u r e E ]−> F a i l e d ;
t 2 : F a i l e d −[ R e c o v e r y E]−> O p e r a t i o n a l ;
end b e h a v i o r ;
Listing 2.4: Error Behavior State Machine
4. Composite Error Behavior: Represents the state of the component according to state of the subcomponents. In another context, the state of a system depends on the state of the components. For
example, a computer system is in the Failed state when one of its sub-parts (processor or memory)
is Failing. EMV2 can support this through composite error behavior specifications. We show this in
listing 2.5; when a system consists of two sensors to provide temperature, the main system will be in
the failure if both sensors are failing. Otherwise, it will be in the operational state.
1
2
composite e r r o r behavior
states
3
s e n s o r 1 . F a i l e d and s e n s o r 2 . F a i l e d −> F a i l e d ;
4
s e n s o r 1 . O p e r a t i o n a l o r s e n s o r 2 . O p e r a t i o n a l −> O p e r a t i o n a l ;
5
end c o m p o s i t e ;
Listing 2.5: Composite Error Behavior
17
2.4 Safety Verification Methods
Generally, verification concentrates on showing consistency among system implementation and
functional specification but for the safety-critical systems that is not enough. The implementation of the
system must satisfy the safety requirements. Specifically, the safety verification is different than functional
verification. The functional verification ensures that the system fully satisfies its specification, but safety
verification uses the result of safety analysis methods to ensure that the system meets the safety constraints
[46]. We can classify safety verification into two types:
2.4.0.1
Safety Verification in Software Testing
The software safety verification can be achieved by in two ways [2]:
1. Dynamic Safety Testing: requires the execution of the software, the result can be evaluated with respect
to system safety features. In addition, testers monitor behavior of the system during testing like entering
erroneous values / out of range values.
2. Static Safety Analysis: looks over the code and design document of the system. It is used to verify
accuracy of safety feature with regard to the actual system/software.
2.4.0.2
Safety Verification in Software Architecture
Compositional reasoning is a partitioning formal analysis of a system architecture into a sequence
of verification tasks which correspond to the decomposition of the system architecture. This partitioning of
verification effort will help to find proof for each subcomponent within the system architecture so that the
analysis will handle a large system design. In a nutshell, system level properties depend on the properties of
the components within the system. For example, from safety perspective, system safety depends on the safety
properties of the components. For that purpose, AGREE (Assume Guarantee REasoning Environment) was
created for managing proofs of components described in AADL [7].
AGREE is a compositional verification tool that can be used to confirm the behavior of a component.
It follows the popular assume guarantee reasoning model which states that provided the assumptions about a
component’s inputs are met the component can provide certain guarantees about its output. The guarantees
correspond to component requirements, and the assumptions correspond to the environmental requirements
that were used in verifying the component requirements. The contract specifies the information which is
needed to reason about the component’s interaction with other components of the system [26].
18
As an example, let’s build a safety constraint based on error values: In an ACC (Adaptive Cruise
Control) system, the sensor should measure the distance between vehicles in the specific range (0-100) feet.
Any values beyond that limit are likely erroneous and distance cannot be negative. We will focus on the
actions of the controller. If the controller has an assumption that it will receive values in the range, it will
perform a safe action if the values it received are in the specified range. Outside of this range, whether a safe
action is performed cannot be guaranteed. An implementation of this example is shown in listing 2.6
1
2
system sensor
features
d i s t a n c e : out d at a p o r t Base Types : : I n t e g e r ;
3
4
a n n e x a g r e e {∗∗
g u a r a n t e e ” S e n s o r o u t p u t i s b e t w e e n 0 and 100 ” : d i s t a n c e >= 0 and
5
d i s t a n c e <= 1 0 0 ;
6
7
∗∗};
end s e n s o r ;
8
9
10
system implementation sensor . i
a n n e x a g r e e {∗∗
a s s e r t d i s t a n c e = i f d i s t a n c e < 0 t h e n 0 e l s e i f d i s t a n c e > 100 t h e n
11
100 e l s e d i s t a n c e ;
12
13
∗∗};
end s e n s o r . i ;
14
15
16
system c o n t r o l l e r
features
d i s t a n c e : in d at a p o r t : Base Types : : I n t e g e r ;
17
18
a n n e x a g r e e {∗∗
assume ” C o n t r o l l e r s h o u l d r e c e i v e v a l u e s among 0−100 ” : d i s t a n c e <= 100
19
and d i s t a n c e >= 0 ;
20
21
∗∗};
end c o n t r o l l e r ;
Listing 2.6: AADL AGREE Annex ACC system example
Additionally, the Strategic Software Engineering Research Group (SSERG) at Clemson University
19
extended AGREE to XAGREE (eXtended Assume Guarantee REasoning Environment) to support the inheritance features of the AADL language. This addition allows XAGREE to produce more maintainable and
less complex verification assets than AGREE for architectures that incorporate inheritance such as Software
Product Lines and Dynamic Software Product Lines [25].
2.5 System-Theoretic Process Analysis Augmentation Reasons
From our experience in applying STPA on different types of safety-critical systems, a systematic
literature review that we conducted in the domain and the understanding of the method in the system safety
community, we come to the conclusion that STPA needs significant augmentation because of the following
reasons:
1. STPA lacks precise information about each major component: STPA is a system method built on the
system theory. Nancy Leveson states ”The systems approach focuses on systems taken as a whole, not
on the parts taken separately” [21]. This means that system theory allows STPA to provide accidents,
hazards, and causes at the system level not the component level. However, in AADL, systems include
hardware and software components as described in section 2.2.1. This allows stakeholders to deeply
analyze the system and identify specific causes of system failure. The causes can be either software,
hardware or both together leading to the system failure.
2. STPA uses a feedback control loop as a system model but doesn’t use the feedback control loop used
as an architectural model: STPA is a recent hazard analysis technique and depends on a monolithic
decomposition of the system with respect to the hierarchy of failure effects instead of the system’s
architectural model. The reason for using monolithic decomposition is because of system complexity.
Nancy Leveson states ”Increased interactive complexity and coupling make it difficult for the designers
to consider all the potential system states or for operators to handle all normal and abnormal situations
and disturbances safely and effectively” [20]. This means that system theory does not let STPA access
lower subcomponents in the feedback control loop because of the system complexity. At the same time,
software engineering principles allow stakeholders to decompose systems into subsystems to solve the
complexity problem. For example, in AADL, we can decompose the system into two layers with
each layer containing subsystems. We can bind the layers together, and we can show error propagation
between the layers. This deeper analysis could help stakeholders identify the source of faults or hazards
20
in the feedback control loop architecture. For instance, system software can have failures because of
error propagation from hardware to software. If the stakeholder upgrades the software to resolve the
problem, the problem will come back after the upgrade because the problem is not the software but
the hardware. The stakeholder needs to change the hardware not upgrade the software to resolve the
problem.
3. STPA does not use the reliability principle (fault→ error → f ailure): STPA looks for causes of hazards
based on system thinking, and it considers the result of the hazard as an accident. There is not a
component reliability concept in STPA because it analyzes safety based on the system. Nancy Leveson
states ”Safety then can be treated as a dynamic control problem, rather than a component reliability
problem” [22]. She also states ”In general, words like ”failure” and ”error” should be avoided, even as
causes, because they provide very little information and are often used to avoid having to provide more
information or to think hard about the cause” [22].
However, reliability engineering principles provide system safety based on the reliability of each individual component. The system will not be safe if the components have failures. Additionally, in
AADL, we use the error ontology to support hazard analysis. This allows stakeholders to identify hazards based on the reliability concept. The basic principles of reliability needed in this dissertation are:
fault is root cause of the error, error is the difference between the measured value and the correct value,
and failure is the termination of a product’s ability to do a required function.
4. STPA does not use a formalization concept: STAMP provides a framework for STPA to investigate and
analyze accidents. This is a major limitation of STPA. Nancy Leveson states ”Three basic constructs
underlie STAMP: safety constraints, hierarchical safety control structures, and process models” [21].
This limitation tells us that STPA does not have any connection with formal principles such as system
architecture analysis, mathematical models, state machine analysis, error propagation, etc. However,
the formalization concept helps improve the rigors of safety analysis methods.
2.6 False Positive and False Negative in Safety-Critical Systems
The safety analysts / stakeholders are expecting the system failure before it occurs. They apply
preventative techniques to the systems to avoid the failure or reduce time to repair by preparing the system
for upcoming failure events. For that purpose, stakeholders proposed proactive fault handling methods to deal
21
with faults in order to prevent the system failure or to minimize the system downtime. This method combines
failure prediction and proactive action. The preventative actions are triggered by prediction of upcoming
failure events, and the repair actions are prepared for imminent failure to reduce time to repair. Thus, the
systems require the prediction of failures to judge a faulty situation or to make a decision to handle the fault.
It is very possible that the failure predictor will be wrong in its predictions. For example, the predictor might
detect an upcoming failure event that never occurs, this is called false positive (FP). Also, the predictor might
fail to predict a failure, this is called false negative (FN) [43].
Now, we provide the real world examples to identify false positives and false negatives. For example,
from the software testing perspective, when a test is executed and, in spite of it running correctly, the test
shows us there is an error, the tester will search for the nonexistent bug adding a lot of cost. That is a false
positive (FP). In addition, when the execution of a test shows no faults although there is a bug in the software
application. That is a false negative (FN).
From the the system safety perspective, in ACC system, if a warning / automatic braking happens
before the critical distance, this is a false positive (FP). Also, if a warning / automatic braking happens too
late, this is a false negative (FN) too [28].
22
Chapter 3
Feedback Control Loop Architecture
3.1 Augmenting a Hazard Analysis Method with Error Propagation
Information
As stated in section 1.3, the first goal of this research is:
• The introduction of a method of augmenting STPA (System-Theoretic Process Analysis) with error
propagation information and error behavior state machines using AADL (Architectural Analysis &
Design Language).
The first goal is associated with three of the research questions shown in section 1.5:
• Can the safety of the system be improved by augmenting a hazard analysis method with error propagation information?
• Can we analyze the internal failures of the components in order to show its effect on the other components during hazard analysis process?
• How does the interaction of the components in the early design phase help us to identify internal
failures?
The traditional version of STPA cannot describe dynamic error behavior, state transitions, and error
propagations among system components. This is because of the reasons described in section 2.5, thus, as part
of this dissertation, we have augmented STPA with error propagation information and error behavior state
23
machines to support modeling / analyzing the dynamic error behavior of the system during hazard analysis
process and to identify potential internal failures of the major components during application of STPA.
This will allow us to identify additional hazards based on different criteria for existing application
examples, identify additional unsafe control actions in different analysis, identify general / specific causes for
unsafe control actions and blend safety constraints to the systems architecture to build a safe product.
We propose a method that helps stakeholders or safety analysts, during hazard analysis, to consider
the error behavior of a component, to identify each path in the form of three-way interactions among components in the feedback control loop architecture and to analyze the trace of each hazard. We illustrate this
method by describing several scenarios and examples to support it. We will provide models for the scenarios
by using the Architecture Analysis and Design Language (AADL) supported by Open Source AADL Tool
Environment (OSATE) [16] to develop a architecture representation. We extend each step in the STPA process with error propagation information and dynamic error behavior. In addition, we add a final step to merge
safety aspects of the system into the systems architecture. The proposed method consists of the following
steps:
1. Identify hazards using different criteria: This step involves identifying accidents based on the system operational context. The error ontology provides guidance in identifying hazards.
2. Build control structures with contributors: This step is the construction of a feedback control loop
with finite state machines for describing dynamic behavior of the system, and add error propagation
specifications across the system to analyze the trace of the hazards which may lead to the accidents.
3. Identify unsafe control actions using tracing: This step helps to identify the unsafe control actions
based on the error propagations tracing. It identifies the error behavior of a component which can lead
to inadequate control actions that could become an unsafe action. This step also helps to identify any
error flow for which a corresponding safety constraint needs to be created to mitigate the identified
hazards.
4. Identify specific causes: This step helps to identify causes for the unsafe control actions in the feedback control loop. Generally, it needs to select the component first and then specifically look for the
causes which relate internally and external to the component.
5. Develop safety architecture: A safety architecture is implemented based on the safety constraints
identified in the previous steps. It blends safety aspects of the system into the overall system architec24
ture.
We will create this augmentation to help the stakeholder or the safety analyst in identifying and evaluating dysfunctional behavior of the system during hazard analysis. The main goal of analyzing the behavior
of the system is to identify hazardous control actions by considering the specification of error propagations
across the system and operating states of the system which can have an effect on control actions.
Using the characteristics of event driven models and error propagation information in the safety
analysis can assist in providing an effective way to annotate and assess all possible paths in the feedback
control loop. In this method, steps 1 through 4 can identify the source of errors and their impact on the other
components during operation. Step 5 helps to enhance the safety of the system by feeding the identified
hazardous situations or propagated error events into the system’s architecture to create constraints to mitigate
the hazards.
The heart of the STPA approach is a feedback control loop to analyze safety of the system. We wish
to improve STPA to facilitate a deeper analysis of the system. Figure 3.1 shows the feedback control loop
with error propagation and finite state machine models. Figure 3.1 consists of four component types: sensors,
controllers, actuators, and the controlled process. These components have incoming and outgoing ports used
to send / receive data as well as error events. The connections among components represent the nominal
control flow as well as the error propagation path.
The error propagation path follows port connections from sensors to controllers, controllers to actuators, actuators to the controlled process and tje controlled process to the sensors. First, the sensors are used
to measure the values of attributes and send them to the controller. Each sensor has two states, operational
and failed. Second, the controller acquires information about the state of the process from measured variables and controlled variables, then the controller uses this information for an initial action by manipulating
controlled variables to maintain the operational process within predefined limits. The controller is used to
regulate the process variables and send commands to the actuator. The controller consists of a process model
which is used to present variables and their values. The controller has two states: normal and error states. The
normal state shows the status of the variables in normal operation. The error state shows abnormal values of
the variables. Third, the actuator follows the controller’s instructions to execute the commands. The actuator
has two states, operational and failed. Fourth, the controlled process is used to show processes inside of the
controller and implement the controller’s decision. The error flow passes errors inside of the component from
an incoming port to an outgoing port.
25
Figure 3.1: Feedback control loop augmented with error propagation and finite state machines
26
In the figure 3.1, we show three scenarios to demonstrate our method and to show the need for
augmentation. We now detail each scenario.
3.1.1
Scenario 1:
Scenario 1 begins with the sensor in the operational state, and the failed state is entered when the
sensor detects an internal failure. In this scenario, we demonstrate how the error propagates from the sensor
to actuator and becomes hazardous to the system. Consider when the sensor detects an internal failure. The
error continuously effects the normal operational state of the sensor. If the sensor is able to recover, the
error will not propagate to the next component. But, if it does not recover, the error propagates through the
outgoing port to the controller. The controller receives the error through its incoming port and processes
it using the process model. In this case, the process model does not understand the propagated error. The
propagated error then becomes an error in the controller. If the controller is able to handle the error, then no
hazard would exist. The controller will automatically go back to its normal state. But, if the controller is not
able to resolve the error, then a hazard would exist because the controller could possibly send a command to
the actuator when it shouldn’t. To diagnose the cause of this unsafe control action, we go back to the system
controller because we expect the controller to be faulty due to its sending of commands when it should not.
However, in this situation the controller does not have any faults. We need to go back another step to find
the exact cause of the fault. In conclusion, using this scenario, we have shown that the propagated error
from sensor to actuator has three important effects in the system. First, it effects the state of the sensor itself.
Second, it effects the decision of the controller. Third, it effects the actuator.
3.1.2
Scenario 2:
Scenario 2 begins with the controller in the normal state, and the error state is entered when the
controller detects an internal failure. The generated error does not propagate out; it stays within controller.
Different types of errors propagate through the outgoing port to the actuator instead of the generated error.
For example, the controller generates an (out of range) error, but it sends a (no data) error to the actuator
because the controller is unable the out of range error. In this case, the controller becomes a source of errors.
In this potentially hazardous situation, for each propagated error, the actuator is executing a different type of
unsafe control action. For instance, if these errors are generated in the controller (i.e. stuck value error, out
of range error, and out of calibration error) the actuator is doing an unacceptable action for the transformed
27
error (i.e. no value, bad value, and incorrect value). After that, the controlled process receives the propagated
error from the actuator as an inadequate or ineffective action through the error propagation path. The error
then passes through the error flow inside of the controlled process to the outgoing port. Therefore, the output
of the controlled process would be an incorrect, inaccurate or delayed measurement. For example, if the input
to the controlled process is delayed, then it is likely its output would also be delayed. In conclusion, using
this scenario, we have illustrated that the propagated error from the controller to the controlled process has
three important effects on the system. First, it effects the controller decision. Second, the actuator performs
an inadequate action. Third, it effects the controlled process by selecting the inappropriate process for the
action. This illustration allows us to identify the source of the hazard by back-tracing for the error.
3.1.3
Scenario 3:
Scenario 3 begins with the actuator in the operational state, and the failed state is entered when
the actuator detects an internal failure. The actuator receives a command from the controller through the
incoming port, but the command is not executed as the controller intended because of the actuator’s internal
failure. Thus an error occurs in the actuator leading to the actuator entering the failed state. If the actuator
is able to resolve the error, the controller’s command will not be impacted. If the actuator is not able to
resolve the error, a hazardous situation could result impacting the actuator’s output, like causing the actuator
to produce output late. The delayed operation directly effects the controlled process because the timing error
passes through error flow to the output port. The output of the controlled process becomes an input to the
sensor, but it has a feedback delay because of propagating the timing error. At the same time, the output of
the sensor becomes an input to the controller, but it also has a feedback delay because of propagating the
timing error. Therefore, the propagated error from the actuator to the sensor has three important effects on
the system. First, it effects of the actuator state. Second, it effects of the controlled process by having a delay
time to select the appropriate process for the action. Third, it effects of the sensor by having the delay time
to obtain measured values.
3.2 Additional Scenarios to the Augmented Method
In this section, we provide additional scenarios that answer the following questions:
• How can an error outside of the feedback control loop effect a component’s behavior inside of the loop?
(Scenario 4)
28
Figure 3.2: The augmented method for additional scenarios
• How can we identify the system states for each scenario? (Scenario 5)
Figure 3.2 will be used for the additional scenarios on the feedback control loop architecture. Consider the following scenarios:
3.2.1
Scenario 4:
As shown in figure 3.2, scenario 4 begins with a device which is connected to the feedback control
loop. The device has a user interface which is used to set values of the controller and monitor the status of the
system. The purpose of this scenario is that we wish to understand how the internal failure of the connected
device effects the feedback control loop architecture and how errors propagate from devices external to the
29
loop to components in the loop.
We begin with the operator interface device (OID) in the operational state, and the failed state is
entered when the OID detects an internal failure. The error continuously effects the normal operational of the
device. If the device is able to recover, the error will not propagate to the feedback control loop. But, if it
does not recover, the error propagates through an outgoing port to the controller.
The controller receives the error through its incoming port processes it using the process model. But,
the process model does not understand the propagated error. Because of this, the propagated error becomes
an error event which may lead to the controller transitioning from the normal state to the error state. If the
controller is able to resolve the error event, no hazard would be possible. But, if the controller is not able
to resolve the error event, a harzardous event leading to an unsafe control action being sent to the actuator
and the feedback control loop state becoming unsynchronized with the OID could occur, as shown in the
figure 3.2.
In conclusion, using this scenario, we demonstrate the propagated error from the connected device
has three important effects in the system. First, it effects the state of the connected device which leads to
the sending of an ineffective/inadequate command to the controller. Second, it effects the decision of the
controller. Third, it effects the actuator causing it to perform an ineffective/inadequate action.
3.2.2
Scenario 5:
As shown in figure 3.2, scenario 5 is used to find the system or top level, states of the feedback
control loop according to the previous scenarios. As a reminder, we present each error flow based on threeway interactions back-tracing for the error. Each error flow could assist in identifying failure states of the
system.
Scenario 1: We have this error flow (Sensor → Controller → Actuator), and we need to find out
how this flow can show us the system states. Specifically, we need to provide a safety requirement for the
failure state:
[(Sensor.FailedState) ∨ (Controller.ErrorState) ∨ (Actuator.FailedState)]
→ System.FailedState (3.1)
30
[(Sensor.OperationalState) ∧ (Controller.NormalState) ∧ (Actuator.OperationalState)]
→ System.OperationalState (3.2)
Scenario 2: We have this error flow (Controller → Actuator → ControlledProcess), and we need
to find out how this flow can show us the system state. Specifically, we need to provide a safety requirement
for the failure state:
[(Controller.ErrorState) ∨ (Actuator.FailedState) ∨ (ControlledProcess.ErrorState)]
→ System.FailedState (3.3)
[(Controller.NormalState) ∧ (Actuator.OperationalState) ∧ (ControlledProcess.NormalState)]
→ System.OperationalState (3.4)
Scenario 3: We have this error flow (Actuator → ControlledProcess → Sensor), and we need to
find out how this flow can show us the system state. Specifically, we need to provide a safety requirement for
the failure state:
[(Actuator.FailedState) ∨ (ControlledProcess.ErrorState) ∨ (Sensor.FailedState)]
→ System.FailedState (3.5)
[(Actuator.OperationalState) ∧ (ControlledProcess.NormalState) ∧ (Sensor.OperationalState)]
→ System.OperationalState (3.6)
31
Scenario 4: We have this error flow (Inter f aceDevice → Controller → Actuator), and we need to
find out how this flow can show us the system state. Specifically, we need to provide a safety requirement for
the failure state:
[(Inter f aceDevice.ErrorState) ∨ (Controller.ErrorState) ∨ (Actuator.FailedState)]
→ System.FailedState (3.7)
[(Inter f aceDevice.NormalState) ∧ (Controller.NormalState) ∧ (Actuator.OperationState)]
→ System.OperationalState (3.8)
3.2.3
Notifications
Section 3.1 and its examples have been published as a technical paper in the 35th International
System Safety Conference (ISSC) in Albuquerque, New Mexico, USA. The evaluation results of the scenarios
are described in section 4.
3.3 Using Formal Notations to Augment a Hazard Analysis Method
As stated in section 1.3, the second goal of this research is:
• The introduction and analysis of a theoretical framework to identify unsafe system behavior.
The second goal is associated with a research question as stated in section 1.5:
• How to find more hazardous possibilities in the feedback control loop architecture?
The primary goal of augment the analysis processes of STPA is to perform a deeper analysis of
system safety. We know that STPA identifies hazardous control actions based on the values of the process
model variables and then changes the hazards into safety constraints for the controller. To identify hazards,
each relevant value will be examined to determine the context of the control action because the controller will
decide which action to perform based on the value. If one of the existing values in the process model is not
32
compatible within the control action context that action can be considered an unsafe control action. From a
software perspective, what will happen if the value/variable is not defined or missed in that context? Does
this lead to missing safety constraints for the system? Finally, how does STPA resolve this problem? From
our point of view, we need to know the specific information about each component in the feedback control
loop such as functions, inputs, outputs, variables, and values. In addition, we need to know specifically how
the process model in the controller makes a decision based on the existing values and variables.
Specifically, operation of the safety-critical system can be expressed as a function relating system
inputs and outputs. There is a relationship among inputs and outputs where each output value will be related
to an input value. When the system is working correctly, the system satisfies all functional requirements.
This does not make the system safe. The system should be analyzed regarding the safety aspects and verified
according to its safety requirements.
Our objective is to find new hazards and more safety constraints to further ensure safe. In this
case, we need to identify safety constraints for unsafe actions because any unsafe behavior in the operation
of the component. For this purpose, we now introduce the theoretical background of our analysis process.
Our background consists of several functions describing the behavior of the feedback control loop, and by
providing details of the functions, potential hazards can be identified and mitigated.
1. Specify criteria to identify unsafe behavior of a component: This step involves specifying an identifier to identify a faulty component in the feedback control loop and giving an error to the component.
This step can be expressed formally as a two-tuple: <INTF, Err> where:
• < INT F > is faulty component identifier. This means a component will deliver incorrect service
to the next component becoming an error source:
[INT F ∈ DC], where DC is the set of detection conditions.
• < Err > is the error. It is the result of an internal failure of a component. The error has an effect
on the component’s behavior. If the error propagates to the other components it may / may not
lead to unsafe interaction among components.
[Err ∈ EO], where EO is the set of error ontology.
2. Build notation to specify input/output/function for each component: The purpose of this step is
the construction of the feedback control loop with mathematical notation for describing the functional
behavior of each component as is shown in figure 3.3
33
Controller - tuple: < FC (V me, FPM (Var,Val,Cond)) = CA >, where:
• < FC > is the function of the controller. Normally, it takes an input from the sensor/interface
device, processes it, and gives a command to the actuator.
• < V me > is the measured variable. It is the input to the controller. It is used to measure the value
of the variable which is read by the sensor.
[V me ∈ Var], where Var is the set of variables in process model. But if [V me → Err], where
[Err ∈
/ FPM ]
• < FPM > is the process model function. It is the mental model of the controller. It can be
represented as another component inside of the controller. It contains variables and their values.
It is used to determine what control action is needed based on receiving content of the measured
variables. It consists of three important elements:
– < Var > is the set of variables which are used by the process model.
– < Val > is the set of values that can be used by the variables.
– < Cond > is the set of conditions that can be used by the process model to make a decision,
where: [{Var,Val} ∈ Cond]. The condition is that the received values via measured variable
should be matched with the defined values in the process model function.
• < CA > is the control action. It is the output of the controller. It can be considered as a command
to actuator, where the {CA → True iff Cond → True}. This means the control action is going to
be a safe action if and only if the condition is true. Otherwise, the control action is going to be
inadequate action that can become an unsafe action.
Actuator-tuple:< FA (CA) = VMa > where:
• < CA > is the control action, input to the actuator from the controller. If this is an expected
control action, then the actuator will behave as expected. However, if this is an unsafe control
action, the actions of the actuator might lead to a hazardous situation.
• < V ma > is the manipulated variable, input to the controlled process from the actuator. The
control action (CA), as previously mentioned, determines the output of the actuator.
Controlled Process-tuple: < FCP (Vma ) = V c> where:
34
• < V ma > is the manipulated variable, input to the controlled process. If the values are manipulated according to the controller’s command, then the controlled process will be modified
according to the limits defined in the controller.
• < V c > is the controlled variable, output of the controlled process. The actual behavior of the
controlled process is monitored through the controlled variable which is dependent on input. If
the input is in the range defined by the controller, the output will be in range as well.
Sensor-tuple: < FS (V c) = V me > where:
• < V c > is the controlled variable, input to the sensor, which is used to update information that
produced by the sensor.
• < V me > is the measured variable, output of the sensor. The sensor sends information to the
controller in the form of measured values. If the sensor has an internal failure, it will provide
incorrect information to the controller. Additionally, the sensor can give incorrect information to
the controller if the sensor updates its information using incorrect values that have been provided
by the controlled processes through controlled variables.
Top-tuple: <System> where:
• System is the set of sub-components that the top component decomposes into. If a component has
an internal failure, how will the system react to it? Does the system have a safe behavior or will
it react unsafely?
3. Identify unsafe functional behavior and find more hazardous possibilities using the error ontology: This step helps to identify unsafe functional behavior of a component that violates safety requirements for the associated system. The internal failure of a component can give inadequate results that
could lead to an unsafe function. This step also helps to identify corresponding safety constraints which
need to be created to mitigate unsafe functional behavior. In this step, we are using the error ontology
to find the possibilities of unsafe functional behavior which leads to the system behaving incorrectly.
If we find the hazardous possibilities before we develop the system, it helps to improve our confidence
that the system will behave correctly when it faces hazardous conditions.
4. Identify mathematical notation for each error flow: This step identifies formal expressions for each
error flow. We know that the propagated error cuts across three components. For that reason, each
35
Figure 3.3: A framework for feedback control loop inputs/outputs and functions
error flow should contain three functions. In this case, the proposed safety constraints should cover
three components to mitigate the effect of the error flow.
3.3.1
Notations and Expressions for Train Automated Door Control System
We take the same example described in appendix B which is an automated door controller for a train,
and we apply our notations and expressions to find more hazardous possibilities:
1. Specify criteria to identify unsafe behavior of a component: Doors are locked. The train driver is
unable to open them because the sensor provides incorrect readings to the controller. This leads to the
passengers being stuck inside the train when the train is stopped at the station platform.
FS (V c) =
V me
Err
if Measured Values ∈ V me;
if Measured Values ∈
/ V me, where FS (INT F) → Err, Err ∈ EO.
The sensor sends measured values to the controller via measured variables. The controller is able to
recognize the values that were previously defined in the variables (i.e measured variables). When the
sensor has an internal failure, it will send different values to the controller. Then, the controller is not
36
able to understand those values leading to the stuck door.
2. Build notation to specify input/output and function for each component: As shown in figure 3.3.
3. Identify unsafe functional behavior and find more hazardous possibilities using the error ontology: These functions are identified by predicting the behavior of the sensor and can help to determine
the response of the controller.
Function
Open Door
Table 3.1: Unsafe Functional Behavior
Sensor Function
Controller Function
FS (V c, INT F) = Err, assume Err ∈ FC (FPM (Var,Val,Cond))
6=
ErrValues
CA, i f ErrValues ∈
/ Val, T hen Cond →
False
As shown in table 3.1, the function of the sensor shows the internal failure of the component and
provides incorrect values to the controller (i.e the output of the sensor is an error (Err) which can be
considered as one of the errors in the error ontology [like error values].) The function of the controller
shows that the incoming error value is not an element in the defined values of the process model. For
that reason, the decision of the process model is going to be unknown. This effects the output of the
controller because the condition of the decision is false. Then, the output of the controller is not equal
to the safe control action. As shown in table 3.2 by formalizing the error ontology, we can find more
hazardous possibilities.
Table 3.2: Finding More Possibilities
Finding possibilities in controller
If ErrValue > MaxVal
If ErrValue < MinVal
If ErrValue ∈
/ Val
If ErrValue = ”Null Value”
If ErrValue = ”Stuck Value”
Description
The controller does not open the
door when it gets above of the
range values
The controller does not open the
door when it gets below of the
range values
The controller does not open
the door when value error occur
(none of the values)
The controller does not open the
door when it gets omitted error
value
The controller does not open the
door when it gets the same repeated value
37
Is this possibility hazardous?
Yes
Yes
Yes
Yes
Yes
4. Identify mathematical notation for each error flow: As shown in table 3.3, we identify formal
expressions for each error flow or scenario that was described in previous sections. This helps to
identify safety constraints with respect to error contexts to mitigate the effects of the error on the
system. For example, SC(S, C, A, Err) means the controller (C) is required to provide control action for
the actuator (A) with respect to error type (Err) context for that message which comes from the sensor
(S).
Table 3.3: Mathematical Notations for Error Flows and Safety Constraints
3.3.2
Scenarios
Error Flow Expressions
Safety Constraints (SC)
1
FS (V c, INT F) → FC → FA (V ma, Err)
SC(S, C, A, Err)
2
FC (V me, FPM , INT F) → FA → FCP (V c, Err)
SC(C, A, CP, Err)
3
FA (CA, INT F) → FCP → FS (V me, Err)
SC(A, CP, S, Err)
4
FOID (V me, INT F) → FC → FA (V ma, Err)
SC(OID, C, A, Err)
Notifications
This section 3.3 and its example 3.3.1 have been published as a technical paper in the 36th In-
ternational System Safety Conference (ISSC), Phoenix, Arizona, USA. In that session, 21 system safety
researchers and practitioners are voted for 10 questions that have been prepared about the notations. The
results of the votes are described in the appendix H.
3.4 Introducing Feedback Control Loop Architecture with Compositional Reasoning
This section can be considered the last step of the proposed method. The main reason behind doing
this as part of the dissertation is that STPA has no method of verifying that the safety constraints proposed
are correct. Thus our associated research question is How to introduce STPA with assume-guarantee contract
to verify safety constraints? What are the steps to verify the safety constraints against system model?
System / software verification concentrates on showing functional correctness of the system and
demonstrating that the system fully satisfies all functional requirements. However, the functional requirements do not make the system safe or reduce the risk level of the system. Therefore, the system should be
38
analyzed with respect to safety aspects and verified against its safety requirements at component / system
level.
The overall objective of this part of the dissertation is to demonstrate the possibility of verifying
safety-critical software (i.e architecture model) against safety constraints which are derived at the component
/ system level. Generally, to control the criticality of safety critical software, we need to identify the potential
hazards using the error ontology as a guide and then demonstrate that a potential hazard could not occur (i.e
the software of the system cannot contribute to an unsafe state).
The contribution of this work is to introduce the feedback control loop architecture with safety
verification procedures to verify the identified safety constraints and to show that hazards have been mitigated
or controlled to an acceptable level of risk. In addition, our contribution could assist stakeholders with
reducing the amount of time and effort by doing safety analysis and verification of the safety requirements
on the same system architecture rather than doing them separately.
The procedure consists of the following steps:
1. Identify an assume-guarantee contract for each component: This step is the construction of a
feedback control loop architecture with assumptions (A) describing expectations the component makes
about the environment and guarantees (G) describing the behavior the component makes about its
output if the assumptions are met.
2. Identify mathematical notations for each component: This step involves identifying expressions
for each component based on the mathematical notations which are described in previous sections. For
example, using FS , FC , FA , FCP as functions for the components in the implementation of the component,
and using V me,CA,V ma,V c as input/output for components in the specification.
3. Identify safe behavior for the top level system: This step assists in identifying guarantee of safe
behavior in the overall system which depends on the guarantee of correct output of each component in
the feedback control loop architecture. The notation inside the the figure 3.4 demonstrates this.
4. Satisfy safety constraints: This step assists in verifying the identified safety requirements against
system model. We need to show that the model of the system satisfies the derived safety constraints.
Previously, we built the safety constraints with regard to context of the error types. For example,
our safety constraint, (The system should not accept null values), is identified as an omission error in
the error ontology; it can also be mapped to one of the unsafe control actions in STPA which is (not
39
providing control action). If the system does not accept null values, it means the system satisfies the
safety constraint against system model.
Consider figure 3.4 which introduces the feedback control loop architecture with compositional reasoning. The main advantage of this introduction is it mitigates the unsafe control action, the wrong decision
of the controller that has been made based on the received incorrect values. To eliminate this action, we can
make a guarantee for the output of the sensor, at the same time, constructing an assumption input for the
controller. As a result, validate that the sensor does not send incorrect values to the controller leading to a
safer system. This type of analysis can provide a certainty about behavior for the system and ensure that the
system is not behaving incorrectly.
Table 3.4 helps to ensure the controller does not follow/execute the unsafe control actions because
there is a contract among components such as sensor-controller, controller-actuator, actuator-controlled process, and controlled process-sensor. For example, we build the following safety constraints based error ontology context:
SC1: The controller has a guarantee to not receive null values from the sensor. (Omission error
value)!! SC2: The controller has a guarantee to not receive unintended incoming values from the sensor.
(Commission error value)!! SC3: The controller has a guarantee to do an action within defined time range.
(Timing error)!!
When we verify the SC1, the controller mitigates one of the unsafe control actions which is [Action
required but not provided] that can lead to hazards. In addition, when we verify SC2, the controller mitigates
another unsafe control action which is [Unsafe action provided], and so on.
Table 3.4: Mitigating the unsafe control actions
STPA
Error Ontology
Action required but not provided
Omission service error types
Unsafe action provided
Commission service error types
Action provided too early/too late
Service timing errors
Provided action stopped too soon/applied Sequence commission (early start/late tertoo long
mination)
40
A/G Contract
SC1
SC2
SC3
SC3
Figure 3.4: Introduce feedback control loop architecture with assume-guarantee reasoning
41
Chapter 4
Analysis and Evaluation
In this section, we evaluate our compositional safety analysis method, Architecture Safety Analysis
Method (ASAM), used for developing safety-critical systems. Our method will allow system stakeholders or
safety analysts to mitigate the effects of errors through safety constraints developed during the construction
of software architecture. In the method, we focus on finding safety constraints and verifying them within
the system model. To do so, we inject the AADL error ontology into the architecture model to identify the
unsafe control actions and causal factors of the unsafe control actions. We verify the safety constraints that
we generate within the system model to ensure the hazardous situations are sufficiently mitigated through the
use of compositional reasoning on the system model.
From a technical perspective, ASAM is a new software safety analysis tool which works with AADL
models annotated with an implementation of our formulas (i.e. asam annex, introduced in section 3.3) and
supported by the Open Source Architectural Tool Environment (OSATE). It is used to analyze and generate
a report based on information attached to each component in the feedback control architecture. In fact,
ASAM provides several statements to support discovering hazards such as error statements, error propagation
statements, internal failure statements and safety constraint statements that either handle or prevent hazards.
The goal of the statements is to describe errors in order to determine their source and prevent propagation.
If error propagation cannot be prevented, ASAM can describe how the error is eventually handled as well as
what component handles it. Additionally, ASAM lets the safety analyst record the severity level of the hazard
for the specific error type based on the probability of occurrence for that type of error as well as verify that a
safety constraint properly mitigates the error.
In this research, we focus on evaluating ASAM based on concerning claims:
42
Table 4.1: Evaluation Examples Overview
C1
Safety-Critical System Examples
Adaptive Cruise Control System -(ACCS)
4.1.2
Train Automated Door Control System-(TADCS)
4.1.3
Medical Embedded Device-Pacemaker (PM)
4.1.4
Infant Incubator Temperature Control System-Isolette (ISO) 4.1.5
C2
C3
4.2.2
4.2.3
4.2.4
−
−
4.3.2
4.3.3
−
• Claim C1: Whereas STPA only finds hazards that exist in the communication between the controller
and the actuator, ASAM finds errors, that may or may not lead to hazards, that exist in not only in the
controller / actuator but other components as well.
• Claim C2: Each error flow in ASAM assists in identifying a safety constraint of appropriate breadth
to eliminate the unsafe control actions caused by an identified error. In addition, each error flow can
guide us to identify specific causes for the unsafe control action.
• Claim C3: ASAM uses verification, unlike existing safety analysis methods, to further ensure that the
safety constraint fully mitigates the hazardous condition improving the quality of the safety constraints.
Each claim is discussed individually in the remaining sections. Throughout the evaluation, we will
use the safety-critical system examples that were described in the previous chapter such as the adaptive cruise
control system (scenario 3.1.1), the train automated door control system (scenario 3.1.2), the pacemaker
(scenario 3.1.3), and the isolette (scenario 3.2.1). For each of our selected examples, we will present the
implementation of the scenarios, some of which will be reused across evaluations.
4.1 Demonstrating Three-Way Communication Format
Claim C1: Whereas STPA only finds hazards that exist in the communication between the controller
and the actuator, ASAM finds errors, that may or may not lead to hazards, that exist in not only in the
controller / actuator but other components as well.
4.1.1
Evaluation Plan
To demonstrate claim C1, we will evaluate each of our selected examples. For each system, we
will first introduce the system and its architecture. We will then produce a report for that system under
the described scenario. Each scenario evaluates an error flow based on three-way interaction format for the
specific example to identify the source of the hazard.
43
ASAM shows the effect of unsafe interactions based on the three-way communication format among
components while back-tracing the error. As a reminder, generally, the current accepted standard identifies
unsafe interactions based on the two-way communication format. Specifically, it focuses on the interaction
between the controller and actuator to identify unsafe control actions.
4.1.2
Adaptive Cruise Control System
The first system we introduce is an adaptive cruise control system for modern vehicles. The goal
of this system is to monitor the distance and speed of the vehicle in front of the monitored vehicle (i.e. the
system compares the speed of the monitored vehicle with the vehicle in front based on repeated measurements
of the distance between them). This example is used to support (scenario 3.1.1) and show how our method
helps to identify additional hazardous situations for an existing application example, described in the [22].
For demonstration purposes, we evaluate what will happen if the sensor in adaptive cruise control (ACC)
estimates the incorrect values for speed and distance of the vehicle in front of the monitored vehicle (e.g., the
sensor estimates close enough, but in reality is not) during driving because of an internal failure. When the
component has an internal failure, the result is propagation of an error. The error values propagate from the
sensor to the controller causing the controller to process the erroneous value. This will cause the controller to
send an incorrect command to the actuator. For this situation, the ACC system warns the driver to apply the
brake because the monitored vehicle is close enough with the vehicle in front. If the driver does not apply the
brake, the system automatically will do it. Either way, an accident will happen because the monitored vehicle
is doing an unacceptable action based on a decision made with incorrect values. In this case, either the driver
or the system will apply the brakes in the middle of the road possibly causing the monitored vehicle to be hit
by the vehicle behind it.
The initial results for this ACC system example have been published in [38] which are:
1. The error ontology identifies wrong estimation as incorrect values. The hazard is ”incorrect estimation
values for the monitored vehicle from the ACC system”.
2. The feedback control loop system has been built from the sensor to the actuator as shown in Figure 3.1
which is error flow 1 (scenario 3.1.1).
3. The unsafe control action is that ”The system applies the brakes in the middle of the road”. The safety
constraint for the unsafe control action is that ”The system must not apply the brakes when it has
incorrect values”.
44
4. The general cause for the unsafe control action is that ”The sensor has an internal failure” and the specific cause is that ”The propagating of incorrect values causes an error event from sensor to actuator”.
5. This is shown in the developed safety architecture subsection, as shown in figure 3.1.
We will now provide and discuss the individual core components of the feedback control loop architecture for ACC system.
1
2
s y s t e m ACC sensor
features
3
ACC Speed s : o u t d a t a p o r t ;
4
ACC Distance s : out data p o r t ;
5
ACC State s : in data p o r t ;
6
flows
7
f s o u r c e 1 : f l o w s o u r c e ACC Speed s ;
8
f s o u r c e 2 : flow source ACC Distance s ;
9
end ACC sensor ;
Listing 4.1: ACC System
In listing 4.1, we show the first primary component of the architecture: the interface for the ACC
sensor. This interface shows that the ACC sensor component produces 2 values and receives 1 value. The
two outputs are a ACC Speed s and a ACC Distance s value that represent the speed and distance of the in
front of car. These two values also represent the source of information flow. The final value ACC State s is
the controlled process state. The implementation of this interface is shown in listing 4.2.
1
s y s t e m i m p l e m e n t a t i o n ACC sensor . i m p l
2
a n n e x asam {∗∗
3
E r r s => [ {
4
−− I d e n t i f y s p e c i f i c e r r o r t y p e s f o r t h e s e n s o r , i d e n t i f y c h a n c e o f
harming people .
5
6
Type =>AboveRangeSpeed ( C r i t i c a l , p = 0 . 1 ) ,
−−F o r t h e s p e c i f i c e r r o r t y p e , what k i n d o f u n s a f e a c t i o n w i l l t h e
sensor perform ?
7
UCA => UCA1 : ”ACC s e n s o r d i s p a t c h e s a b o v e r a n g e o f s p e e d v a l u e s t o
controller ” ,
45
−−What a r e t h e c a u s e s o f t h e U n s a f e C o n t r o l A c t i o n (UCA) o f t h e
8
sensor ?
C a u s e s => {
9
10
G e n e r a l => ”ACC s e n s o r h a s an i n t e r n a l f a i l u r e ” ,
11
S p e c i f i c => ”ACC s e n s o r r e c e i v e s a b o v e r a n g e o f s p e e d v a l u e
from f e e d b a c k ”
},
12
−− I d e n t i f y S a f e t y C o n s t r a i n t s ( SC ) f o r t h e s e n s o r t o m i t i g a t e
13
e f f e c t o f t h e UCA.
SC => SC1 : ”ACC s e n s o r s h o u l d n o t s e n d i n c o r r e c t s p e e d v a l u e s t o
14
c o n t r o l l e r for computation ”
15
},
16
{
17
Type =>B e l o w R a n g e D i s t a n c e ( C a t a s t r o p h i c , p = 0 . 1 ) ,
18
UCA => UCA2 : ”ACC s e n s o r d i s p a t c h e s below r a n g e o f d i s t a n c e v a l u e s
to c o n t r o l l e r ” ,
C a u s e s => {
19
20
G e n e r a l => ”ACC s e n s o r h a s an i n t e r n a l f a i l u r e ” ,
21
S p e c i f i c => ”ACC s e n s o r r e c e i v e s below r a n g e d i s t a n c e v a l u e
from f e e d b a c k ”
22
},
23
SC => SC2 : ”ACC s e n s o r must n o t d i s p a t c h i n c o r r e c t d i s t a n c e v a l u e s
to c o n t r o l l e r for computation ”
}]
24
25
26
∗∗};
end ACC sensor . i m p l ;
Listing 4.2: ASAM for ACC System Sensor
In listing 4.2, the ASAM annex allows stakeholders to record specific information in the implementation part of each component in the ACC architecture model. For example, we have recorded the following
information in the ACC sensor’s implementation: error (Err) types for the sensor’s internal failure (above
range of speed and below range of distance), unsafe control actions (UCA) for each error type, general or
specific causes for each unsafe control action, probability of occurrence (P) for each error type, severity level
46
of hazards for each error type, and safety constraints (SC) to mitigate the effects of the error types or unsafe
control actions.
1
2
system ACC controller
features
3
ACC Speed c : i n d a t a p o r t ;
4
ACC Distance c : in d a t a p o r t ;
5
ACC Speed cmd : o u t d a t a p o r t ;
6
ACC Ditance cmd : o u t d a t a p o r t ;
7
flows
8
f p a t h 1 : f l o w p a t h ACC Speed c −> ACC Speed cmd ;
9
f p a t h 2 : f l o w p a t h A C C D i s t a n c e c −> ACC Ditance cmd ;
10
end A C C c o n t r o l l e r ;
Listing 4.3: ACC System
In listing 4.3, we have the ACC controller interface that controls the speed and distance of the
vehicle. This interface has two in data ports, ACC Speed c and ACC Distance c, that allow the controller to
receive speed and distance values from the sensor. The interface also has two out data ports:ACC Speed cmd
and ACC Ditance cmd. ACC Speed cmd sends the speed command to actuator and ACC Ditance cmd sends
the distance command to actuator. The interface has a flow path to make a link between input and output
ports. For example, f path1 is a flow path used to transfer speed information within the controller from input
port ACC Speed c to output port ACC Speed cmd. Also, f path2 is another flow path used to pass distance
information within the controller from ACC Distance c to ACC Ditance cmd. The implementation of this
interface is shown in listing 4.4.
1
system implementation ACC controller . impl
2
a n n e x asam {∗∗
3
E r r s => [ {
4
Type =>AboveRangeSpeed ( C r i t i c a l , p = 0 . 0 0 0 3 ) ,
5
UCA => UCA3 : ”ACC c o n t r o l l e r w a r n s t h e d r i v e r t o a p p l y t h e b r a k e s ” ,
6
C a u s e s => {
7
G e n e r a l => ”ACC c o n t r o l l e r h a s an i n t e r n a l f a i l u r e ” ,
8
S p e c i f i c => ”ACC c o n t r o l l e r r e c e i v e s a b o v e r a n g e o f s p e e d v a l u e
from s e n s o r ”
9
},
47
SC => SC3 : ”ACC c o n t r o l l e r s h o u l d n o t do a c o m p u t a t i o n when i t h a s
10
i n c o r r e c t speed values ”
11
},
12
{
13
Type =>B e l o w R a n g e D i s t a n c e ( C a t a s t r o p h i c , p = 0 . 0 0 0 3 ) ,
14
UCA => UCA4 : ”ACC c o n t r o l l e r s e n d s command t o a p p l y t h e b r a k e s i n
the middle of the road ” ,
C a u s e s => {
15
16
G e n e r a l => ”ACC c o n t r o l l e r h a s an i n t e r n a l f a i l u r e ” ,
17
S p e c i f i c => ”ACC c o n t r o l l e r r e c e i v e s below r a n g e d i s t a n c e v a l u e
from f e e d b a c k ”
18
},
19
SC => SC4 : ”ACC c o n t r o l l e r must n o t a p p l y t h e b r a k e s when i t h a s
incorrect distance values ”
}]
20
21
22
∗∗};
end A C C c o n t r o l l e r . i m p l ;
Listing 4.4: ASAM for ACC System Controller
In listing 4.4, we have recorded two incoming errors from the ACC sensor such as above range speed
and below range distance. For the first error, above range speed, the controller warns the driver to apply the
brakes because the speed of your monitored vehicle is higher than the vehicle in front. The error is estimated
by the sensor incorrectly and propagated to the controller through the nominal control flow as well as the
error propagation path. In this case, the action of controller is going to be an unsafe action. The causes for
that action can be classified into two types: 1) a general cause, ACC controller has an internal failure and 2)
a specific cause, ACC controller received incorrect speed values from the sensor. To mitigate the effects of
that unsafe action, we provide a safety constraint: ACC controller should not do a computation when it has
incorrect speed values. We have attached a safety requirement for that error to know how to correct the error
if it happens. Also, we have recorded the probability of occurrence for above range speed error and its hazard
severity level.
For the second error, below range distance, the ACC controller sends a command to apply the brakes
in the middle of the road because your car is too close to the car in front. The error is estimated by the sensor
incorrectly and propagated to the controller. In this case, the action of controller is going to be an unsafe
48
control action. The causes for that action can be divided into two types: 1) a general cause, ACC controller
has an internal failure, and 2) a specific cause, ACC controller received incorrect distance values from the
sensor. We provide a safety constraint to mitigate the effects of the unsafe actions: ACC controller must
not apply the brakes when it has incorrect distance values. We have attached a safety requirement for that
error which helps to know how to correct the error if it happens. Also, we have recorded the probability of
occurrence for below range distance error and its hazard severity level.
1
2
system ACC actuator
features
3
ACC EXE Speed a : i n d a t a p o r t ;
4
ACC EXE Distance a : i n d a t a p o r t ;
5
Unsafe Action a : out data port ;
6
flows
7
f p a t h 3 : f l o w p a t h ACC EXE Speed a−>U n s a f e A c t i o n a ;
8
f p a t h 4 : f l o w p a t h ACC EXE Distance a−>U n s a f e A c t i o n a ;
9
end A C C a c t u a t o r ;
Listing 4.5: ACC System
In listing 4.5, we have the ACC actuator interface that follows the controller’s instructions to execute
commands. This interface has two in data ports, ACC EXE Speed a and ACC EXE Distance a, that allow
the actuator to execute speed and distance commands from the controller. The interface also has an out data
port, Unsafe Action a, that allows the actuator to send commands to the controlled process. In this case,
inadequate commands from controller lead to unsafe commands from the ACC actuator. The interface also
has a flow path to make a link among in and out data ports. For example, f path3 is a flow path within the
actuator used to transfer data from ACC EXE Speed a to Unsafe Action a. Also, f path4 is another flow path
within the actuator used to transfer data from ACC EXE Distance a to Unsafe Action a. The implementation
of this interface is shown in listing 4.6.
1
system implementation ACC actuator . impl
2
a n n e x asam {∗∗
3
E r r s => [ {
4
Type =>AboveRangeSpeed ( C r i t i c a l , p = 0 . 2 ) ,
5
UCA => UCA5 : ”ACC a c t u a t o r e x e c u t e s t h e ’ warn t h e d r i v e r ’ command ” ,
6
C a u s e s => {
49
7
G e n e r a l => ”ACC a c t u a t o r h a s an i n t e r n a l f a i l u r e ” ,
8
S p e c i f i c => ”ACC a c t u a t o r r e c e i v e s i n a d e q u a t e command from
controller ”
},
9
SC => SC5 : ”ACC a c t u a t o r s h o u l d n o t e x e c u t e t h e i n c o r r e c t command ”
10
11
},
12
{
13
Type =>B e l o w R a n g e D i s t a n c e ( C a t a s t r o p h i c , p = 0 . 3 ) ,
14
UCA => UCA6 : ”ACC a c t u a t o r e x e c u t e s t h e ’ a p p l y b r a k e s ’ command ” ,
C a u s e s => {
15
16
G e n e r a l => ”ACC a c t u a t o r h a s an i n t e r n a l f a i l u r e ” ,
17
S p e c i f i c => ”ACC a c t u a t o r r e c e i v e s an i n c o r r e c t command from
controller ”
18
},
19
SC => SC6 : ”ACC a c t u a t o r must n o t e x e c u t e t h e i n a d e q u a t e command ”
}]
20
21
22
∗∗};
end A C C a c t u a t o r . i m p l ;
Listing 4.6: ASAM for ACC System Actuator
In listing 4.6, we have two incoming commands from the ACC controller to the ACC actuator
through two in data ports: ACC EXE Speed a and ACC EXE Distance a. But, each command contains an
error. For example, there is an above range of speed error in ACC EXE Speed a, and a below range of distance
error in ACC EXE Distance a. For the first error, above range of speed, the actuator executes the ’warn the
driver’ command because the actuator has an internal failure or received an inadequate command from the
controller. For the second error, below range of distance, the actuator executes the ’apply brakes’ command
in the middle of the road because the actuator has an internal failure or receives an inadequate command from
controller. To know how to mitigate the effects of the errors, we have attached safety constraints which help
mitigate incorrect or inadequate commands.
1
2
system ACC controlled process
features
3
Controlled Action : in data port ;
4
Feedback : out d a t a p o r t ;
50
5
6
7
flows
f s i n k : flow s in k C o n t r o l l e d A c t i o n ;
end A C C c o n t r o l l e d p r o c e s s ;
Listing 4.7: ACC System
In listing 4.7, we show the last interface, ACC controlled process. This interface gives a Feedback
value to the sensor through the out data port. The interface also receives Controlled Action through an in data
port from the actuator. The flow sink represents the end of the information flow which comes from the sensor.
1
2
system implementation e n t i r e s y s t e m . impl
subcomponents
3
ACC sensor : s y s t e m ACC sensor . i m p l ;
4
ACC controller : system ACC controller . impl ;
5
ACC actuator : system ACC actuator . impl ;
6
A CC c on tr ol le d pr o ce ss : system A CC c on tr ol le d p ro ce ss . impl ;
7
connections
8
c o n n 1 : p o r t ACC sensor . ACC Speed s −> A C C c o n t r o l l e r . ACC Speed c ;
9
c o n n 2 : p o r t ACC sensor . A C C D i s t a n c e s −> A C C c o n t r o l l e r . A C C D i s t a n c e c ;
10
c o n n 3 : p o r t A C C c o n t r o l l e r . ACC Speed cmd −>
A C C a c t u a t o r . ACC EXE Speed a ;
11
c o n n 4 : p o r t A C C c o n t r o l l e r . ACC Ditance cmd −>
A C C a c t u a t o r . ACC EXE Distance a ;
12
c o n n 5 : p o r t A C C a c t u a t o r . U n s a f e A c t i o n a −>
ACC controlled process . Controlled Action ;
13
14
15
c o n n 6 : p o r t A C C c o n t r o l l e d p r o c e s s . F e e d b a c k −> ACC sensor . A C C S t a t e s ;
flows
e t o e f s p e e d : end t o end f l o w
ACC sensor . f s o u r c e 1 −>conn 1 −>A C C c o n t r o l l e r . f p a t h 1
16
17
−>conn 3 −>A C C a c t u a t o r . f p a t h 3 −>c onn 5 −>A C C c o n t r o l l e d p r o c e s s . f s i n k ;
e t o e f d i s t a n c e : end t o end f l o w
ACC sensor . f s o u r c e 2 −>conn 2 −>A C C c o n t r o l l e r . f p a t h 2
−>c onn 4 −>A C C a c t u a t o r . f p a t h 4 −>conn 5 −>A C C c o n t r o l l e d p r o c e s s . f s i n k ;
18
end e n t i r e s y s t e m . i m p l ;
Listing 4.8: ACC System
51
In listing 4.8, we have represented the core components of the entire ACC system architecture.
We have represented the connections among components. We have also represented the end to end flow
information which goes from the sensor to the actuator.
Figure 4.1: Three-Way Communication Format for Adaptive Cruise Control System Architecture
Figure 4.1, a feedback control loop architecture has been built for the ACC system. Figure 4.1
shows the error flow for each error type based on the three-way interaction format. Figure 4.1 shows two
error flows: one for the distance error and the other for the speed error. The first error flow (i.e., red line)
shows the error propagation from ACC sensor to ACC actuator. The error in the measuring of distance
propagates to the ACC controller through the output data port. The error distance value in the sensor can
be prevented from being communicated through a safety constraint statement. But, if it is not prevented, it
propagates to the controller. The controller also can handle the incoming error distance value through the in
data port according to the safety constraint. If the controller is not able to handle that, it sends an inadequate
command to the actuator. Then, the actuator will execute an inadequate action. The effect of that error
changes the manipulated variable which is the output of the actuator. The second error flow (i.e. blue line)
has the same description.
ASAM generates a report for the adaptive cruise control system architecture based on component
names, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes
of unsafe control actions and safety constraints. This stream of information is attached to each imple52
Figure 4.2: ASAM’s report for Adaptive Cruise Control System Architecture
mentation part of the error flow (flow source [ACC sensor.impl], flow path[ACC controller.impl], and flow
sink[ACC actuator.impl]). The report is shown in figure 4.2. That analysis shows the source of faults in the
ACC model, which is the sensor. Each fault in the sensor produces errors. Each error has a chance to harm
people based on the probability of occurrence for that error and result of that error. For example, an ”above
range of speed” error results in a critical level and a ”below range of distance” error results in a catastrophic
level. Each error in the sensor gives an unsafe control action to the controller which results in sending an
inadequate command to the actuator. For example, the ACC controller warns the driver to apply the brake. If
the driver does not do that action, the system will automatically do it. Also, general and specific causes for
each unsafe control action have been identified. Finally, we have safety constraints to know how to mitigate
the effects of the unsafe control actions or the errors.
4.1.3
Train Automated Door Control System
The second system we introduce is a train automated door control system. The goal of this system
is to monitor the open and close events of the train door. This example is used to support (scenario 2 3.1.2).
The train door example is described in [22]. We need to extend the same example to improve safety of the
system from better to best. For instance, what will happen if train automated door controller sends inadequate
commands instead of correct commands because of internal failure? Will it select the correct doors to open,
and only open those doors when the train is stopped completely at the station platform? If the controller
has an internal failure, it produces an error. The error becomes an event leading to changing the normal
operational state of the process model to the error state. If the controller is able to handle the error event, a
hazardous state will not occur. The controller will go back to its normal state. But, if the controller is not able
to handle the error event, the error will be transformed into an outgoing propagation error from controller to
actuator. As the result, the actuator is will perform an incorrect action. For example, it might select the right
side doors processes in the controlled process to execute instead of left side doors. This can definitely lead to
53
a hazardous situation because the system guides people in the wrong direction.
The initial result for this train automated door control system example has been published in [38]
which are:
1. The error ontology identifies wrong selection as incorrect values. Now, the hazard is ”wrong side
selection to open the wrong doors at the station platform”.
2. The feedback control loop has been built from the controller to the controlled process as shown in
figure 3.1 which is error flow 2 (scenario 2).
3. The unsafe control action is that ”system selects right side doors to open instead of left side”. The
safety constraints for the unsafe action is that ”the system must not open the doors when it has incorrect
values”.
4. The general cause for the unsafe control action is that ”the controller has an internal failure” and the
specific cause is that ”the propagating error event from the door controller to the physical door”.
5. This is shown in the developed safety architecture subsection, as shown in figure 3.1.
We will now provide and discuss the individual core components of the feedback control loop architecture for train automated door control system.
1
2
system D o o r c o n t r o l l e r
features
3
door open c : in data port ;
4
door blocked c : in data port ;
5
open door c : out data port ;
6
close door c : out data port ;
7
flows
8
c s o u r c e 1 : flow source open door c ;
9
c s o u r c e 2 : flow source c l o s e d o o r c ;
10
end D o o r c o n t r o l l e r ;
Listing 4.9: Train Automated Door Control System
In listing 4.9, we have the Door controller interface that opens and closes the train door when it is
required. This interface has two in data ports, door open c and door blocked c, that allow the controller to
54
receive open door state and blocked door state values. The interface also has two out data ports: open door c
and close door c. open door c is used to send the open door command to the physical door and close door c
is used to send the close door command to the physical door as well. The interface has two flow sources of information, c source1 and c source2. The Door controller sends open door commands through c source1, and
sends close door commands through c source2. The implementation of this interface is shown in listing 4.10.
1
system implementation D o o r c o n t r o l l e r . impl
2
a n n e x asam {∗∗
3
E r r s => [ {
−− I d e n t i f y s p e c i f i c e r r o r t y p e s f o r t h e d o o r c o n t r o l l e r , i d e n t i f y
4
c h a n c e o f harm p e o p l e .
Type =>S e r v i c e C o m m i s s i o n ( C r i t i c a l , p = 0 . 1 ) ,
5
−−F o r t h e s p e c i f i c e r r o r t y p e , what k i n d o f u n s a f e a c t i o n w i l l t h e
6
d o o r c o n t r o l l e r do ?
UCA => UCA1 : ” Door c o n t r o l l e r s e l e c t s r i g h t s i d e d o o r s t o open
7
i n s t e a d of l e f t side ” ,
C a u s e s => {
8
−−What a r e t h e c a u s e s o f t h e U n s a f e C o n t r o l A c t i o n (UCA) o f t h e
9
door c o n t r o l l e r ?
10
G e n e r a l => ” The d o o r c o n t r o l l e r h a s an i n t e r n a l f a i l u r e ” ,
11
S p e c i f i c => ” The p r o p a g a t i n g e r r o r e v e n t from t h e s e n s o r t o
door c o n t r o l l e r ”
},
12
−− I d e n t i f y S a f e t y C o n s t r a i n t s ( SC ) f o r t h e d o o r c o n t r o l l e r t o
13
m i t i g a t e e f f e c t s o f t h e UCA.
SC => SC1 : ” The d o o r c o n t r o l l e r must n o t open t h e d o o r s when
14
unexpected s e r v i c e or data i s provided f or a n a l y s i s ”
}]
15
16
17
∗∗};
end D o o r c o n t r o l l e r . i m p l ;
Listing 4.10: ASAM for Train Door Controller
In listing 4.10, we have identified a specific error type, the service commission, which represents
unexpected service provided by the controller because of the controller’s internal failure. For that type of
55
error, we have also identified the probability of occurrence for the error and the hazard’s severity level. In
addition, we have recorded the unsafe action of the door controller for that type of error: Door controller
selects right side doors to open instead of left side. The general cause for that unsafe action is the controller’s
internal failure. The specific cause is propagating an error from the sensor of the door to controller. We also
have identified a safety constraint, the controller should not do any analysis for unintended incoming data, to
mitigate the effects of the controller’s unsafe action within the model.
1
2
system D o o r a c t u a t o r
features
3
open door a : in data port ;
4
close door a : in data port ;
5
d o o r s t a t e a : out data port ;
6
flows
7
a p a t h 1 : f l o w p a t h o p e n d o o r a −> d o o r s t a t e a ;
8
a p a t h 2 : f l o w p a t h c l o s e d o o r a −> d o o r s t a t e a ;
9
end D o o r a c t u a t o r ;
Listing 4.11: Train Automated Door Control System
In listing 4.11, we have the Door actuator interface that follows the door controller’s to execute
commands. The interface has two in data ports, open door a and close door a, that allow the door actuator
to execute open and close commands from the door controller. The interface also has an out data port,
door state a, that allows the door actuator to show the physical door status after executing open or close
commands. The interface also has two flow paths to make a link between the in and out data ports within
door actuator. For example, a path1 is a flow path within the actuator used to transfer data from open door a
to door state a. Also, a path2 is another flow path within the actuator used to transfer data from close door a
to door state a. The implementation of this interface is shown in listing 4.12.
1
system implementation D o o r a c t u a t o r . impl
2
a n n e x asam {∗∗
3
E r r s => [ {
4
−− I d e n t i f y s p e c i f i c e r r o r t y p e s f o r t h e d o o r a c t u a t o r , i d e n t i f y t h e
chance of harming people .
5
6
Type =>S e r v i c e C o m m i s s i o n ( C r i t i c a l , p = 0 . 0 0 2 ) ,
−−F o r t h e s p e c i f i c e r r o r t y p e , what k i n d o f u n s a f e a c t i o n w i l l t h e
56
d o o r a c t u a t o r do ?
UCA => UCA2 : ” Door a c t u a t o r f o l l o w s t h e d o o r c o n t r o l l e r ’ s commands
7
which o p e n s t h e wrong s i d e d o o r ” ,
C a u s e s => {
8
−−What a r e t h e c a u s e s o f t h e U n s a f e C o n t r o l A c t i o n (UCA) o f t h e d o o r
9
actuator ?
10
G e n e r a l => ” The d o o r a c t u a t o r h a s an i n t e r n a l f a i l u r e ” ,
11
S p e c i f i c => ” The d o o r a c t u a t o r r e c e i v e d i n a d e q u a t e command from
door c o n t r o l l e r ”
},
12
−− I d e n t i f y S a f e t y C o n s t r a i n t s ( SC ) f o r t h e d o o r a c t u a t o r t o
13
m i t i g a t e e f f e c t o f t h e UCA.
SC => SC2 : ” The d o o r a c t u a t o r must n o t e x e c u t e i n c o r r e c t commands ”
14
}]
15
16
17
∗∗};
end D o o r a c t u a t o r . i m p l ;
Listing 4.12: ASAM for Train Door Actuator
In listing 4.12, we have identified that the actuator opens the wrong doors. That’s because of two
reasons: 1) the door actuator receives an inadequate command from the controller or 2) the door actuator has
an internal failure. In this case, we need to provide a safety constraint to know how to mitigate the effects of
actuator’s unsafe actions.
1
2
system Physical Door
features
3
d o o r s t a t e i n c p : in data port ;
4
d o o r s t a t e o u t c p : out data port ;
5
6
7
flows
c p p a t h 3 : f l o w p a t h d o o r s t a t e i n c p −> d o o r s t a t e o u t c p ;
end P h y s i c a l D o o r ;
Listing 4.13: Train Automated Door Control System
In listing 4.13, we show the physical door interface in the train door system architecture. The interface has an in data port, door state in cp, that allows it to take the executed commands from door actuator.
57
Also, the interface has an out data port, door state out cp, that allows it to show the status of executed process
that has been selected by the door controller. The data of the executed process automatically passes within
the physical door interface from door state in cp to door state out cp. The implementation of this interface
is shown in listing 4.14.
1
system implementation Physical Door . impl
2
a n n e x asam {∗∗
3
E r r s => [ {
−− I d e n t i f y s p e c i f i c e r r o r t y p e s f o r t h e p h y s i c a l door , i d e n t i f y t h e
4
chance of harming people .
Type =>S e r v i c e C o m m i s s i o n ( M a r g i n a l , p = 0 . 0 0 0 0 1 ) ,
5
−−F o r t h e s p e c i f i c e r r o r t y p e , what k i n d o f u n s a f e a c t i o n w i l l t h e
6
p h y s i c a l d o o r do ?
UCA => UCA3 : ” The wrong p r o c e s s h a s b e e n s e l e c t e d t o e x e c u t e ” ,
7
C a u s e s => {
8
−−What a r e t h e c a u s e s o f t h e U n s a f e C o n t r o l A c t i o n (UCA) o f t h e
9
p h y s i c a l door ?
10
G e n e r a l => ” The d o o r c o n t r o l l e r ’ s wrong d e c i s i o n ” ,
11
S p e c i f i c => ” The d o o r a c t u a t o r ’ s wrong e x e c u t i o n ”
},
12
−− I d e n t i f y S a f e t y C o n s t r a i n t s ( SC ) f o r t h e p h y s i c a l d o o r t o
13
m i t i g a t e e f f e c t o f t h e UCA.
SC => SC3 : ” The p h y s i c a l d o o r s h o u l d n o t a l l o w t h e s e l e c t i o n o f t h e
14
wrong p r o c e s s ”
}]
15
16
17
∗∗};
end P h y s i c a l D o o r . i m p l ;
Listing 4.14: ASAM for Physical Door Interface
In listing 4.14, the physical door implementation tells us the wrong process has been executed because of two reasons: 1) the wrong decision has been made by the door controller or 2) the wrong execution
has been made by the door actuator. We provide a safety constraint, the physical door should not allow the
selection of the wrong process. This helps to mitigate the effects of the unsafe action of the physical door.
Finally, this case tells us that the open or close processes in the physical door have been updated by abnormal
58
data because of a service commission error which comes from the controller. We know the physical door
passes the wrong updated process to sensor of the door. This means the sensor’s information will be updated
incorrectly.
1
2
system Door sensor
features
3
d o o r s t a t e s : in data port ;
4
door open s : out data port ;
5
door blocked s : out data port ;
6
7
8
flows
s s i n k : flow s in k d o o r s t a t e s ;
end D o o r s e n s o r ;
Listing 4.15: Train Automated Door Control System
In listing 4.15, we show the last component of the train automated door control system architecture
(TADCS): the interface for the TADCS sensor. This interface shows that the Door sensor component receives
one value and produces two values. The first received value door state s represents door state. The other two
values are door open s and door blocked s, representing door open state and door blocked state. The flow
sink represents the end of the information flow which comes from the Door controller. The implementation
of the entire system is shown in listing 4.16.
1
2
system implementation e n t i r e s y s t e m . impl
subcomponents
3
Door sensor : system Door sensor . impl ;
4
D o o r c o n t r o l l e r : system D o o r c o n t r o l l e r . impl ;
5
D o o r a c t u a t o r : system D o o r a c t u a t o r . impl ;
6
Physical Door : system Physical Door . impl ;
7
connections
8
c o n n 1 : p o r t D o o r s e n s o r . d o o r o p e n s −> D o o r c o n t r o l l e r . d o o r o p e n c ;
9
c o n n 2 : p o r t D o o r s e n s o r . d o o r b l o c k e d s −>
Door controller . door blocked c ;
10
c o n n 3 : p o r t D o o r c o n t r o l l e r . o p e n d o o r c −> D o o r a c t u a t o r . o p e n d o o r a ;
11
c o n n 4 : p o r t D o o r c o n t r o l l e r . c l o s e d o o r c −> D o o r a c t u a t o r . c l o s e d o o r a ;
12
c o n n 5 : p o r t D o o r a c t u a t o r . d o o r s t a t e a −>
Physical Door . door sta te in cp ;
59
13
c o n n 6 : p o r t P h y s i c a l D o o r . d o o r s t a t e o u t c p −>
Door sensor . d o o r s t a t e s ;
14
15
flows
e t o e f o p e n : end t o end f l o w
D o o r c o n t r o l l e r . c s o u r c e 1 −>c onn 3 −>D o o r a c t u a t o r . a p a t h 1
16
17
−>conn 5 −>P h y s i c a l D o o r . c p p a t h 3 −>c onn 6 −>D o o r s e n s o r . s s i n k ;
e t o e f c l o s e : end t o end f l o w
D o o r c o n t r o l l e r . c s o u r c e 2 −>c onn 4 −>D o o r a c t u a t o r . a p a t h 2
18
19
−>conn 5 −>P h y s i c a l D o o r . c p p a t h 3 −>c onn 6 −>D o o r s e n s o r . s s i n k ;
end e n t i r e s y s t e m . i m p l ;
Listing 4.16: Train Automated Door Control System
In listing 4.16, we show the core components of the entire train automated door control system
architecture. We represent the connections among components and we also represent the end to end flow
information which comes from the door controller to the controlled process (i.e.physical door).
Figure 4.3: Three-Way Communication Format for Train Automated Door Control System Architecture
Figure 4.3 shows the feedback control loop architecture which has been built for the train automated
60
door control system. Figure 4.3 shows error flows based on the three-way interaction format for that train
door model. The Door controller has an internal failure. This leads to the production of a service commission
error (i.e. door controller closes the door instead of open). That error propagates to the Door actuator in the
form of inadequate command. Then, the actuator executes the command which contains the error because
the actuator follows the Door controller’s order. The Physical door shows the process has been executed
by the door actuator and updates that process incorrectly. Finally, we show the propagated error which cuts
across three components, from Door controller to Door actuator, and from Door actuator to Physical door.
Also, the propagated error has three important effects on the train automated door control system. First, it
effects the Door controller decision. Second, the Door actuator performs inadequate action. Third, it effects
the controlled process Physical door by selecting the inappropriate process for the action. This illustration
allows us to identify the source of the hazard by back-tracing for the error.
Figure 4.4: ASAM’s report for Train Automated Door Control System Architecture
ASAM generates a report for the train automated door control system architecture based on component names, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe control actions and safety constraints. This stream of information is attached to
each implementation part of the component. Specifically, besides identifying error propagation information for each component, we have recorded additional information for each part of the error flow (flow
source [Door controller.impl], flow path[Door actuator.impl], and flow sink[Physical door.impl]. The report
is shown in figure 4.4. That analysis report shows the source of the faults in the train door model, which is the
Door controller.impl. Each fault in the controller produces an error, which is of the type service commission.
Each error in the door controller has a chance of harming people based on the probability of occurrence for
that error, represented as (p=0.1). The report tells also tells us what the unsafe action of the door controller
is for that type of error as well as the general and specific causes for that error. Finally, the report shows the
safety constraint for the safety analysts or stake holders to mitigate the effects of the error.
61
4.1.4
Medical Embedded Device-Pacemaker
The third system we introduce is a safety-critical embedded system, an implantable medical device
known as a pacemaker. The pacemaker is used to regulate abnormal heart rhythms. It has two main tasks,
sensing and pacing. In pacing, it paces the heart in case the heart’s own rhythm is irregular or too slow. In
sensing, it monitors the heart’s natural electrical activity. If the pacemaker senses a natural heart beat, it will
not stimulate the heart. According to our method we need to specify major components of the pacemaker
like the controller (DCM:Device-Controller Monitor), actuator (PG: Pulse Generator), controlled process
(heart), and sensor (electrode/lead) [29, 31]. We can connect the components as shown in figure 3.1 . Our
example starts from the actuator to the sensor’s output as shown in figure 3.1, error flow 3 (scenario 3: PG
→ heart → electrode/lead).
The initial results for this medical device has been published in [38] which are:
1. The error ontology identifies timing errors like late delivery. Now, the hazard is ”the pacemaker is not
working properly”.
2. The feedback control loop has been built from the actuator to the sensor as shown in figure 3.1 which
is error flow 3 (scenario 3).
3. The unsafe control action is that ”the pacemaker does not provide an electrical pulse when its required”.
The safety constraints for the unsafe action is that ”the pacemaker should provide an electrical pulse
whenever its needed”.
4. The general cause for the unsafe control action is that ”the actuator has an internal failure” and the
specific cause is that ”the propagation of timing error from the electrode to pulse generator”.
5. This is shown in the developed safety architecture subsection, as shown in figure 3.1
We will provide and discuss the feedback control loop architecture for the pacemaker.
62
Figure 4.5: Three-Way Communication Format for Embedded Medical Device
Figure 4.5 shows the feedback control loop architecture that has been built for the pacemaker. Figure
4.5 shows the error flow based on the three-way interaction format for the pacemaker.PULSE GENERATOR
has an internal failure. We know the result of each internal failure is an error and the error propagates to the
next component through nominal control flow as well as the error propagation path. When the ELECTRODE
detects that the HEART needs pacing, the DEVICE CONTROLLER MONITOR decides to send the command
to the PULSE GENERATOR to send an electrical pulse to the HEART. The PULSE GENERATOR receives
the command, but it is not executed instantly because it has internal failure. In this situation, the produced
error inside of the PULSE GENERATOR constantly effects the operational state of the PULSE GENERATOR
and also directly impacts the device CONTROLLER MONITOR’s command. If the PULSE GENERATOR is
able to resolve this problem that would not be hazardous situation. But, if it is not, the situation will be
hazardous for the patient such as sending pulse to the heart late.
Figure 4.6: ASAM’s report for Pacemaker System Architecture
ASAM generates the report for the pacemaker system architecture based on the following param63
eters: component names, error ontology, probability values, probability thresholds, hazard levels, unsafe
control actions, causes of unsafe control actions and safety constraints. These streams of information are
attached to each implementation part of the component. Besides identifying error propagation information for each component, we have recorded additional information for each part of the error flow (flow
source[Pulse Generator.impl], flow path[Heart.impl] and flow sink [Electrode.impl]. The report is shown
in figure 4.6. This analysis report shows the source of the error in the safety-critical embedded system
model(i.e. pacemaker), which is LatePacingDelivery. Each error in the pacemaker has a chance of harming
the patient’s heart based on the probability of occurrence for that type of error, represented with (p=0.01).
ASAM compares the p value with probability of occurrence thresholds to find the probability of the hazard
occurring. In this example, it found (occasional) and the hazard level is (Marginal). The report tells us the
unsafe action of the pulse generator for that type of error, which is ”The pulse generator does not provide
an electrical pulse when its required”. Also, the report shows the general and specific causes for that error,
which are ”general: the pulse generator has an internal failure, specific: the propagating of timing error
from electrode to the pulse generator”. Finally, the report shows the safety constraint for the safety analysts
to know how to mitigate the effects of the error, which is ”the pulse generator should provide an electrical
pulse whenever its needed”.
4.1.5
Infant Incubator Temperature Control System- Isolette
The fourth system we introduce is another safety-critical system, known as an isolette. The isolette
is an infant incubator temperature control system. The goal of this system is to monitor an infant at a safe or
comfortable temperature and warn the nurse if the infant becomes too hot or too cold. In general, the system
is used to control air temperatures inside of incubator so that they remain in a desired temperature range.
We provide this example to support scenario 4 3.2.1. We evaluate what will happen if the operator interface
device (OID) device in an infant incubator (isolette) gives an out of range value for the air temperature inside
the isolette because of an internal failure? (e.g. the OID device displays correct values on the screen, but in
reality does not produce an air temperature within the target range specified by the nurse). At a result, the
infant is harmed by air temperature on the inside isolette being too hot / too cold. To solve this problem, the
controller should send notification to the screen of the OID device about error values and should not send the
command to the actuator.
The initial result for isolette example are:
64
1. The error ontology identifies out of range errors as an incorrect values. Now, the hazard is ”the infant
is not safe with (out of range) temperature values such as below or above of the desired range”.
2. The feedback control loop has been built from the operator interface device to the actuator as shown in
figure 3.2 which is error flow 4 (scenario 4).
3. The unsafe control action is that ”the operator interface device does not send the message to warn the
nurse when its required”. The safety constraints for the unsafe action is that ”the device should dispatch
the message to warn the nurse whenever its needed”.
4. The general cause for the unsafe control action is that ”the connected device has an internal failure”
and the specific cause is that ”the propagating of error values from connected device to actuator”.
5. This is shown in the developed safety architecture subsection, as shown in figure 3.2.
We will provide and discuss the feedback control loop architecture for the infant incubator temperature control system.
65
Figure 4.7: Three-Way Communication Format for Isolette System Architecture
Figure 4.7 shows the feedback control loop architecture which has been built for the isolette. Figure
4.7 shows the error flow based on three-way communication format for that isolette system. This example
is different from the previous example because we show the effects of the internal failure of in OPERATOR INTERFACE DEVICE which is not a fundamental element in the feedback control loop architecture.
The OPERATOR INTERFACE DEVICE is used to set THERMOSTAT and monitor the status of the system.
When OPERATOR INTERFACE DEVICE has an internal failure, this leads to the sending an out of range
temperature error values to THERMOSTAT. In this case, the THERMOSTAT will receive above range or below range temperature values. In this example, we have selected the above range temperature error value.
This error effects the THERMOSTAT’s decision because it is not able to understand the temperature value
that has been defined previously in the system controller. In this situation, if the THERMOSTAT is able to
resolve this problem, it would not be a hazardous situation. But, if it is not, the situation will be hazardous for
the infant such as sending inadequate commands to the HEAT SOURCE actuator. The inadequate command
66
affects the infant because the air temperature inside of the incubator is going to be higher than maximum
temperature value. The infant will be harmed because incubator is going to be too hot. Also, that inadequate command does not send the message to the OPERATOR INTERFACE DEVICE to warn the nurse of
the malfunction.
Figure 4.8: ASAM’s report for Isolette System Architecture
ASAM generates a report for the isolette system architecture based on the following parameters:
component names, error ontology, probability values, probability thresholds, hazard levels, unsafe control
actions, causes of unsafe control actions and safety constraints. This particular information is connected to
each component’s implementation portion. In spite of identifying error propagation information for each
component’s specification, we have recorded additional information for each portion of error: flow source
(flow source[Operator Interface.impl], flow path[Thermostat.impl], and flow sink[Heat Source.impl]). The
report is shown in figure 4.8. That report shows the component’s fault source in the infant incubator temperature control system, which is Operator Interface.impl. Each fault gives an error. The operator interface’s fault
gives an OutOfRangeTemp error. Each error in the isolette system has a chance of harming the infant based
on the probability of occurrence for that type of error, represented as (p=0.1). ASAM compares the p value
with the probability of occurrence thresholds to find the probability of hazard occurring; it found (probable)
so the hazard level is (critical). The report tells us what unsafe action the operator interface performs for that
type of specific error, which is ”The operator interface sends an out of range temperature error values to
thermostat”. Also, the report shows the general and specific causes for that type of error, which are ”general:
the operator interface has an internal failure, specific: The operator interface accidentally receives incorrect setting values by the nurse”. Finally, the report shows the safety constraint for the stakeholders to know
how to mitigate the effects of the error, which is ”The operator interface device should not send incorrect
values to the thermostat”.
4.1.6
Summary
In this section, we have introduced four examples of varying complexity and scenarios for each
example. For each scenario, we have provided an instantiation of the ASAM annex representing the intro67
duced three-way communication format scenarios as well as a report of the output produced by the ASAM
evaluator.
The purpose of these examples is to demonstrate that ASAM finds more hazards using the threeway communication format as opposed to using the two-way format. ASAM identify hazards for the system
scenarios based on the following parameters: component specification and implementation information, error
ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe
control actions and safety constraints. For each scenario, we have also demonstrated not only the ability
to represent the scenario in the architecture model but the ability to evaluate the representation to produce
reports.
4.2 Demonstrating Error Mitigation by Safety Constraints
Claim C2: Each error flow in ASAM assists in identifying a safety constraint of appropriate breadth
to eliminate the unsafe control actions caused by an identified error. In addition, each error flow can guide us
to identify specific causes for the unsafe control action.
4.2.1
Evaluation Plan
To demonstrate claim C2, we take the previous complex examples that we analyzed with respect
to the scenarios and we perform error mitigation analysis by safety constraints on each example. For that
purpose, we will use ASAM to focus on error behavior of the components in the feedback control loop
architecture and follow each error (port by port) and (component by component) until the destination. This
following of the error allows us to identify the source of the hazard and provides safety constrains to mitigate
it.
We will evaluate our selected examples that were described previously to support the scenarios. In
these examples, we need to identify the major component types (i.e. sensor, controller, actuator, controlled
process), the component’s internal failure which causes error on the output ports, and the component’s safety
constraint need to handle errors through in ports and prevent errors through out ports. Then, we will produce
an error mitigation analysis report for each example to ensure that error has been mitigated by the safety
constraint in each specific component.
68
4.2.2
Error Mitigation Analysis for Adaptive Cruise Control System
We will now provide and discuss error mitigation by safety constraint for implementation part of
each individual core components of the feedback control loop architecture for ACC system.
1
2
s y s t e m ACC sensor
features
3
ACC State s : in data p o r t ;
4
ACC Speed s : o u t d a t a p o r t ;
5
ACC Distance s : out data p o r t ;
6
end ACC sensor ;
7
8
9
10
s y s t e m i m p l e m e n t a t i o n ACC sensor . i m p l
a n n e x asam {∗∗
−− I d e n t i f y t h e component t y p e
t y p e => s e n s o r ;
11
12
−−ACC sensor ’ s i n t e r n a l f a i l u r e c a u s e s e r r o r s on o u t p o r t s
i n t e r n a l f a i l u r e INTF1 : ”ACC s e n s o r h a s i n t e r n a l f a i l u r e ” c a u s e s
13
[ S e n s o r R e a d s E r r o r V a l u e s ( C r i t i c a l , p = 0 . 0 2 ) ] on [ ACC Speed s ,
ACC Distance s ] ;
14
15
∗∗};
end ACC sensor . i m p l ;
Listing 4.17: ASAM Error Mitigation Analysis for ACC System
In listing 4.17, we have represented the ACC sensor interfaces: specification and implementation.
The ACC sensor specification interface is already described in section 4.1.2. The ACC sensor implementation interface shows the component’s sensor. Also, it shows the ACC sensor’s internal failure which causes
SensorReadsErrorValues on the output ports ACC Speed s and ACC Distance s. This port ACC Speed s represents an interface which sends the measured values of the speed of your own car as compared to the car in
front. This port ACC Distance s represent an interface which sends the measured values of distance between
your own car and the car in front.
1
2
3
system ACC controller
features
ACC Speed c : i n d a t a p o r t ;
69
4
ACC Distance c : in d a t a p o r t ;
5
ACC Speed cmd : o u t d a t a p o r t ;
6
ACC Ditance cmd : o u t d a t a p o r t ;
7
end A C C c o n t r o l l e r ;
8
9
10
11
system implementation ACC controller . impl
a n n e x asam {∗∗
−− i d e n t i f y t h e component t y p e
t y p e => c o n t r o l l e r ;
12
13
−−ACC c o n t r o l l e r h a v e i n t e r n a l f a i l u r e c a u s e s e r r o r on o u t p o r t s .
i n t e r n a l f a i l u r e INTF2 : ”ACC c o n t r o l l e r h a s an i n t e r n a l f a i l u r e ” c a u s e s
14
[ C o n t r o l l e r C o m p u t e E r r o r V a l u e s ( C a t a s t r o p h i c , p = 0 . 0 3 ) ] on
[ ACC Speed cmd , ACC Ditance cmd ] ;
15
−−E r r o r comes from t h e ACC s e n s o r and c a u s e s e r r o r on o u t p o r t s .
16
[ S e n s o r R e a d s E r r o r V a l u e s ] on [ ACC Speed c , A C C D i s t a n c e c ] c a u s e s
[ S e n s o r R e a d s E r r o r V a l u e s ( C a t a s t r o p h i c , p = 0 . 0 3 ) ] on [ ACC Speed cmd ,
ACC Ditance cmd ] ;
17
−−ACC c o n t r o l l e r h a v e a SC t o h a n d l e e r r o r ( s ) on i n p o r t s .
s a f e t y c o n s t r a i n t SC1 : ” The ACC c o n t r o l l e r s h o u l d h a n d l e e r r o r s t h a t
18
come from t h e ACC s e n s o r ” h a n d l e s [ S e n s o r R e a d s E r r o r V a l u e s ] on
[ ACC Speed c , A C C D i s t a n c e c ] ;
19
20
∗∗};
end A C C c o n t r o l l e r . i m p l ;
Listing 4.18: ASAM Error Mitigation Analysis for ACC System
In listing 4.18, we show the ACC controller’s specification and implementation. The ACC controller
specification interface is already described in 4.1.2. The ACC controller implementation interface shows
the component’s controller. Also, it shows the ACC controller’s internal failure causes ControllerComputeErrorValues on the output ports ACC Speed cmd and ACC Distance cmd. That error effects of the
ACC controller’s decision by giving incorrect commands to the actuator, which are catastrophic. At the same
time, ACC sensor’s error, SensorReadsErrorValues, is propagated to ACC controller’s in ports ACC Speed c
and ACC Distance c through the error propagation path. In this case, we provide a safety constraint to handle
the ACC controller’s incoming errors. Also, the ACC controller does not let sensor’s error propagate to the
70
next component.
1
2
system ACC actuator
features
3
ACC EXE Speed a : i n d a t a p o r t ;
4
ACC EXE Distance a : i n d a t a p o r t ;
5
Unsafe Action a : out data port ;
6
end A C C a c t u a t o r ;
7
8
9
10
system implementation ACC actuator . impl
a n n e x asam {∗∗
−− i d e n t i f y t h e component t y p e
t y p e => a c t u a t o r ;
11
12
−−ACC a c t u a t o r ’ s i n t e r n a l f a i l u r e c a u s e s e r r o r s on o u t p o r t s
i n t e r n a l f a i l u r e INTF3 : ” The ACC a c t u a t o r h a s an i n t e r n a l f a i l u r e ”
13
c a u s e s [ A c t u a t o r E r r o r V a l u e s ( M a r g i n a l , p = 0 . 0 4 ) ] on [ U n s a f e A c t i o n a ] ;
14
−−E r r o r comes from t h e ACC c o n t r o l l e r and c a u s e s e r r o r on o u t p o r t s .
[ C o n t r o l l e r C o m p u t e E r r o r V a l u e s ] on [ ACC EXE Speed a , ACC EXE Distance a ]
15
c a u s e s [ C o n t r o l l e r C o m p u t e E r r o r V a l u e s ( C r i t i c a l , p = 0 . 1 ) ] on
[ Unsafe Action a ] ;
16
−−ACC a c t u a t o r h a s a SC t o h a n d l e i n c o m i n g e r r o r s from c o n t r o l l e r
s a f e t y c o n s t r a i n t SC2 : ” The ACC h a n d l e s t h e e r r o r t h a t comes from t h e
17
ACC c o n t r o l l e r ” h a n d l e s [ C o n t r o l l e r C o m p u t e E r r o r V a l u e s ] on
[ ACC EXE Speed a , ACC EXE Distance a ]
18
−−ACC a c t u a t o r h a s a SC t o p r e v e n t i t s e r r o r s t o o u t
s a f e t y c o n s t r a i n t SC3 : ” The ACC p r e v e n t s t h e e r r o r o f t h e a c t u a t o r ”
19
p r e v e n t s [ A c t u a t o r E r r o r V a l u e s ] on [ U n s a f e A c t i o n a ] ;
20
21
∗∗};
end A C C a c t u a t o r . i m p l ;
Listing 4.19: ASAM Error Mitigation Analysis for ACC System
In listing 4.18, we show the ACC actuator’s specification and implementation. The ACC actuator
specification interface is already described in 4.1.2. The ACC actuator implementation interface shows the
component’s actuator. Also, it shows the ACC actuator’s internal failure which causes ActuatorErrorValues
71
on an output port Unsafe Action a. At the same time, the ACC controller’s error, ControllerComputeErrorValues, is propagated to ACC actuator in ports ACC EXE Speed a and ACC EXE Distance a through the error propagation path. In this case, we need to provide two safety constraints, one to handle the ACC actuator’s
incoming errors, the other to prevent the errors of ACC actuator itself.
Figure 4.9: ASAM’s Error Mitigation report for ACC System Architecture
Figure 4.9 shows ASAM’s generated report about error mitigation analysis by safety constraints for
the adaptive cruise control system architecture. ASAM generates the report based on the following information: internal failure for each major component, each internal failure causes error, probability value of
error occurrence, probability thresholds to identify probability of which hazard could occur, major component implementation, major component specification, propagation path to identify error mitigation process
step by step, error specification propagation gathers information about each error, mitigated error by safety
constraint, mitigated error in component’s implementation and mitigated error in component’s specification.
We iteratively explain figure 4.9 below:
1. Internal failure (INTF1) recorded in ACC sensor causes SensorReadsErrorValues in ACC Distance s
and ACC Speed s. The probability of occurrence for that error in measuring of distance and speed is
(0.02). That error gives a critical level of hazard for the ACC system because ACC sensor reads incorrect values of distance and speed for the car in front. In the error propagation path, we show that these
two error values are propagated from sensor to controller [ACC sensor.impl → ACC controller.impl].
Also, in the error propagation type, we gather information about each error such as error name, port
to propagate, hazard’s severity level and p value. In this case, the controller has enough information about each propagated error. To mitigate each error, ACC controller has a safety constraint (SC).
For example, ”SC1: The ACC controller should handle errors that come from the sensor” is used to
mitigate the effects of the error measuring in distance and speed. This mitigation process has been
72
executed in ACC controller.impl. Finally, we show that the errors are propagated from sensor (S) or
ACC sensor.impl as a source and have been mitigated in controller (C) or ACC controller.impl as a
destination.
2. Previously, we show that how the ACC controller mitigates the incoming errors. But, what will happen
if the ACC controller itself has an internal failure? How does the ACC controller solve this problem? For that purpose, we have recorded an internal failure (INTF2) in the ACC controller which
causes error, ControllerComputeErrorValues, in commands ACC Distance cmd and ACC Speed cmd.
The probability of occurrence for that error in the commands is (0.03). That’s (catastrophic) because the ACC controller’s command is considered as an inadequate command. In the error propagation path, we show that these two error values have been propagated from controller to actuator
[ACC controller.impl → ACC actuator.impl]. In addition, in the error propagation information, we collect information about each error such as error name, port to propagate, hazard’s severity level and p
value. In this case, the actuator has enough information about each propagated error from controller. To
mitigate each error on an in ports, ACC actuator has a safety constraint (SC) for it. For example,”SC2:
The ACC actuator handles the errors that come from the ACC controller” is used to mitigate the effects
of the error in controller’s commands. This mitigation process has been executed in ACC actuator.impl
as a destination point. Finally, we show that the errors in the commands are propagated from controller
(C) or ACC controller.impl and have been mitigated in actuator (A) or ACC actuator.impl.
3. Now, we know that how the ACC controller handles incoming errors from the ACC sensor and also
we show that how ACC actuator handles the incoming errors from the ACC controller. But, what
will happen if the ACC actuator itself has an internal failure? How does this impact the controller’s
commands? For that purpose, we have recorded an internal failure (INTF3) in the ACC actuator which
causes ActuatorErrorValues. The probability of occurrence for that error during execution is (0.04).
We consider the severity level for that error is (marginal) because the error effects the controller’s
commands during execution and leads to produce an unsafe action in Unsafe Action a. But, previously
we show that how ACC actuator mitigates the incoming propagated errors from ACC controller. Now,
we need to show that the ACC actuator prevents errors that have been created by itself. In the error
propagation path, we show that the source of the error is ACC actuator.impl and we have enough
information about it. To prevent that error to out, the ACC actuator has a safety constraint (SC) for
it. For example, ”SC3: The ACC actuator prevents the error of actuator itself ” is used to mitigate
73
the effects of the actuator itself. This mitigation process has been executed in ACC actuator.impl as a
destination point. Finally, we show that the errors produced in the actuator (A) or ACC actuator.impl
have been mitigated in the actuator (A) or ACC actuator.impl itself.
4. What will happen if the ACC actuator.impl is not able to prevent ActuatorErrorValues error out? Figure 4.10, the error propagation path shows that the ActuatorErrorValues error is propagated from the
actuator to the sensor through controlled process, [ACC actuator.impl → controlled process.impl →
ACC sensor.impl]. This means Unsafe Action a of the ACC actuator.impl updates the ACC sensor.impl’s
information by incorrect values of speed and distance. This leads to the ACC controller.impl not being able to control speed and distance of the car in front because of the ACC actuator.impl’s internal
failure. To ensure that the error is propagated and has not been mitigated, the figure 4.10 shows empty
cells in some specific parameters such as mitigated error by safety constraints, mitigated in component
name and mitigated by component type.
Figure 4.10: ASAM’s report for Non-Mitigated Error in ACC system
4.2.3
Error Mitigation Analysis for Train Automated Door Control System
We will now provide and discuss error mitigation by safety constraint for the implementation of
each individual core component of the feedback control loop architecture for the train automated door control
system.
1
2
system Door sensor
features
3
door open s : out data port ;
4
do or c l os e s : out data port ;
5
d o o r s t a t e s : in data port ;
6
door close on person : out data port ;
7
end D o o r s e n s o r ;
8
74
9
10
11
system implementation Door sensor . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => s e n s o r ;
12
13
−−Door s e n s o r h a s an i n t e r n a l f a i l u r e c a u s e s e r r o r ( s ) on o u t p o r t s .
i n t e r n a l f a i l u r e INTF1 : ” d o o r r e p o r t s i t i s c l o s e d when a p e r s o n i s i n
14
t h e doorway ” c a u s e s [ D o o r S e n s o r C o m m i s s i o n E r r o r ( C r i t i c a l , p = 0 . 0 2 ) ] on
[ door close on person , door open s , door close s ] ;
15
−−Door s e n s o r h a s a s a f e t y c o n s t r a i n t t o p r e v e n t e r r o r o u t .
s a f e t y c o n s t r a i n t SC1 : ” d o o r s h o u l d be o p e n e d when t h e p e r s o n i s i n t h e
16
doorway ” p r e v e n t s [ D o o r S e n s o r C o m m i s s i o n E r r o r ] on [ d o o r o p e n s ,
door close s ];
17
18
∗∗};
end D o o r s e n s o r . i m p l ;
Listing 4.20: ASAM Error Mitigation Analysis for Train Automated Door Control System
In listing 4.20, we have represented the Door sensor specification and implementation. In the specification part, the interface has three out data ports, door open s and door close s are used to send the open
and close status of the door to the controller, door close on person is used to send status of the the door to the
controller when an object is in the doorway. The interface also has one in data port, door state s, that allows
the sensor to receive the last update values from physical door feedback about door status. In the implementation part, the interface shows the component’s major type is sensor. Also, it shows the Door sensor’s internal
failure which causes DoorSensorCommisionError on the output ports,door close on person, door open s
and door close s. We have also represented the probability of occurrence of the error and hazard’s severity
level. Also, we need to provide a safety constraint, SC1, for the Door sensor interface to prevent DoorSensorCommisionError through output ports.
1
2
system D o o r c o n t r o l l e r
features
3
door open c : in data port ;
4
door close c : in data port ;
5
open door cmd : out da t a p o r t ;
6
close door cmd : out data port ;
7
door state c : in data port ;
75
d o o r o p e n t r a i n n o t s t o p p e d : out data port ;
8
9
end D o o r c o n t r o l l e r ;
10
11
12
13
system implementation D o o r c o n t r o l l e r . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => c o n t r o l l e r ;
14
15
−−Door c o n t r o l l e r h a s i n t e r n a l f a i l u r e c a u s e s e r r o r ( s ) on o u t p o r t s .
16
i n t e r n a l f a i l u r e INTF2 : ” d o o r r e p o r t s i t i s o p e n e d when t h e t r a i n i s
not stopped ” causes
[ D o o r C o n t r o l l e r C o m m i s s i o n E r r o r ( M a r g i n a l , p = 0 . 0 4 ) ] on
[ d o o r o p e n t r a i n n o t s t o p p e d , close door cmd , open door cmd ] ;
17
−−E r r o r comes from d o o r s e n s o r and c a u s e s e r r o r on c o n t r o l l e r ’ s o u t p u t
ports .
[ D o o r S e n s o r C o m m i s s i o n E r r o r ] on [ d o o r o p e n c , d o o r c l o s e c ,
18
door state c ] causes
[ D o o r S e n s o r C o m m i s s i o n E r r o r ( C a t a s t r o p h i c , p = 0 . 0 0 2 ) ] on
[ d o o r o p e n t r a i n n o t s t o p p e d , close door cmd , open door cmd ] ;
19
−−Door c o n t r o l l e r h a v e s a f e t y c o n s t r a i n t s t o h a n d l e i n c o m i n g e r r o r and
prevent outgoing error .
s a f e t y c o n s t r a i n t SC2 : ” d o o r s s h o u l d be o p e n e d when t h e p e r s o n i s i n t h e
20
doorway ” h a n d l e s [ D o o r S e n s o r C o m m i s s i o n E r r o r ] on [ d o o r o p e n c ,
door close c , door state c ];
s a f e t y c o n s t r a i n t SC3 : ” d o o r s s h o u l d be c l o s e d when t h e t r a i n i s n o t
21
stopped at the s t a t i o n platform ” prevents
[ D o o r C o n t r o l l e r C o m m i s s i o n E r r o r ] on [ c l o s e d o o r c m d , o p e n d o o r c m d ] ;
22
23
∗∗};
end D o o r c o n t r o l l e r . i m p l ;
Listing 4.21: ASAM Error Mitigation Analysis for Train Automated Door Control System
In listing 4.21, we have represented the Door controller specification and implementation. In the
specification part, Door controller has three in data ports, door open c and door close c are used to receive open and close door status from the sensor respectively, door state c is used to receive any abnor-
76
mal status of the door such as an object in the doorway but the door is closed. Also, Door controller
has three out data ports, open door cmd and close door cmd are used to send open and close door commands to actuator, door open train not stopped is used to send abnormal status of the door to actuator
such as door train is opened before train is stopped at the station platform. In the implementation part,
Door controller shows the type of the component is the controller. Also, it shows the Door controller’s
internal failure which causes DoorControllerCommissionError on output ports door open train not stopped,
open door cmd and close door cmd. This error effects the Door controller’s decision by giving inadequate commands to door actuator. At the same time, Door sensor error, DoorSensorCommisionError, is
propagated from Door controller’s in ports door open c, door close c and door state c through the error
propagation path. This effects the Door controller’s decision as well. In this case, we need to provide two
safety constraints, SC2 and SC3, to mitigate the effects of the incoming and outgoing errors respectively.
1
2
system D o o r a c t u a t o r
features
3
open door a : in data port ;
4
close door a : in data port ;
5
door state a : in data port ;
6
door blocked in emergency : out data port ;
7
end D o o r a c t u a t o r ;
8
9
10
11
12
13
14
system implementation D o o r a c t u a t o r . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => a c t u a t o r ;
−−Door a c t u a t o r h a s i n t e r n a l f a i l u r e c a u s e s e r r o r on o u t p o r t s
i n t e r n a l f a i l u r e INTF3 : ” d o o r r e p o r t s i t i s b l o c k e d d u r i n g e m e r g e n c y ”
c a u s e s [ D o o r A c t u a t o r C o m i s s i o n E r r o r ( C a t a s t r o p h i c , p = 0 . 2 ) ] on
[ door blocked in emergency ] ;
15
16
−−E r r o r comes from d o o r c o n t r o l l e r and c a u s e s e r r o r on o u t p o r t .
[ D o o r C o n t r o l l e r C o m m i s s i o n E r r o r ] on [ o p e n d o o r a , c l o s e d o o r a ,
door state a ] causes
[ D o o r C o n t r o l l e r C o m m i s s i o n E r r o r ( C r i t i c a l , p = 0 . 1 ) ] on
[ door blocked in emergency ] ;
77
17
−− Door a c t u a t o r h a v e s a f e t y c o n s t r a i n t s t o h a n d l e i n c o m i n g e r r o r s and
prevent outgoing e r r o r s .
s a f e t y c o n s t r a i n t SC4 : ” d o o r s s h o u l d be c l o s e d b e f o r e t r a i n i s moving
18
and d o o r s s h o u l d be o p e n e d a f t e r t r a i n i s s t o p p e d a t t h e s t a t i o n
p l a t f o r m ” h a n d l e s [ D o o r C o n t r o l l e r C o m m i s s i o n E r r o r ] on
[ open door a , close door a , d o o r s t a t e a ] ;
s a f e t y c o n s t r a i n t SC5 : ” d o o r s s h o u l d be o p e n e d d u r i n g e m e r g e n c y ”
19
p r e v e n t s [ D o o r A c t u a t o r C o m i s s i o n E r r o r ] on [ d o o r b l o c k e d i n e m e r g e n c y ] ;
20
21
∗∗};
end D o o r a c t u a t o r . i m p l ;
Listing 4.22: ASAM Error Mitigation Analysis for Train Automated Door Control System
In listing 4.22, we have represented the Door actuator specification and implementation. From
the specification perspective, Door actuator has three in data ports, open door a and close door a are used
to receive open and close door commands from the controller, door state a is used to receive any abnormal status of the door such as doors are not closed before moving. Also, Door actuator has an out data
port, door blocked in emergency is used to send abnormal status of the door to controlled process such as
doors are blocked during an emergency which leads to people stuck inside the train. From the implementation information perspective, the Door actuator shows the component’s major type, which is the actuator.
Also, it shows the Door actuator’s internal failure which causes DoorActuatorComissionError on an output
port door blocked in emergency. That error effects the Door actuator’s execution commands. At the same
time, the Door controller’s error, DoorControllerCommissionError, is propagated to Door actuator in ports
open door a, close door a and door state a through the error propagation path. Now, the Door actuator gets
two types of errors: one from the controller of the door and the other generated by the actuator itself. In this
situation, we need to provide two safety constraints, SC4 and SC5, to mitigate the effects of the incoming
and outgoing errors respectively.
78
Figure 4.11: ASAM’s Error Mitigation report for Train Automated Door Control System Architecture
Figure 4.11 shows ASAM’s generated report about error mitigation analysis by safety constraints
for the train automated door control system architecture. ASAM generates the report based on the following
information: internal failure for each major component, each internal failure causes error, probability value
of error occurrence, probability thresholds to identify probability of which hazard could occur, major component implementation, major component specification, propagation path to identify error mitigation process
step by step, error specification propagation collects information about each error, mitigated error by safety
constraint, mitigated error in components implementation and mitigated error in components specification.
We iteratively explain figure 4.11 below:
1. The internal failure (INTF1) recorded in the Door sensor which causes DoorSensorCommisionError
on output port. This error leads to closing the door on a person in the doorway. The probability
of occurrence for that error in the sensor mis-reading an object is (0.02). This error gives a critical
level of hazard for people before train door system is developed because Door sensor does not read
objects in the doorway. In the error propagation path, we show that the error is propagated from the
sensor to the controller[Door sensor.impl → Door controller.impl] through the error propagation path.
Also, in the error propagation specification, the controller gathers information about the error such as
name of the error, port to propagate, hazard’s severity level of the error and p value. To mitigate the
propagated error, the Door controller has safety constraints (SC). For example, ”SC2: doors should be
opened when the person is in the doorway” is used to mitigate the effects of misreading objects. This
mitigation process has been executed in the Door controller.impl. Finally, we show that the error is
propagated from sensor(S) or Door sensor.impl as a source of the error and has been mitigated in the
controller(C) or Door controller.impl as a destination of the error.
79
2. Previously, we show that how the Door controller mitigates the incoming errors from the Door sensor.
But, what will happen if the Door controller itself has an internal failure? How can the Door controller
solve this problem? For that purpose, we have recorded an internal failure (INTF2) in the Door controller
which causes error, DoorControllerCommissionError, in commands. This error leads to letting the
controller open the doors before train is stopped or after train is started. The probability of occurrence
for that error is (0.03). That p value gives marginal severity level to people. In the error propagation
path, we show that the error is propagated from the controller to the actuator [Door controller.impl →
Door actuator.impl] through the error propagation path. Also, in the error propagation information, the
controller gathers information about the error and sends it to the actuator such as name of the error,
port to propagate, hazard’s severity level of the error and p value. Now, to mitigate the propagated error
in the actuator, Door actuator provides a safety constraint for it. For example, ”SC4: doors should be
closed before train is moving and doors should be opened after train is stopped at the station platform”
is used to mitigate the effects of the error in the controller’s command. This mitigation process has
been executed in the Door actuator.impl as a destination point. Finally, we show that the source of
the error is the door controller(C) or Door controller.impl and mitigated in the door actuator (A) or
Door actuator.impl.
3. Now, we know that how the Door controller handles incoming errors from the Door sensor and we
also show that the Door actuator handles the incoming errors from the Door controller. But, what
will happen if the Door actuator itself has an internal failure? For that purpose, we have recorded
an internal failure (INTF3) in the Door actuator which causes DoorActuatorCommissionError. This
error leads to blocking the doors during an emergency. The probability of occurrence for that error
during execution is (0.04). That p value gives a catastrophic severity level. Previously, we show how
Door actuator mitigates the incoming propagated errors from Door controller. Now, we need to show
how the Door actuator prevents errors that have been produced by itself. In the error propagation path,
we show that the source of the error is Door actuator.impl and the actuator has sufficient information
about the error such as name of produced error, port to propagate, p value and hazard’s severity level. To
prevent DoorActuatorCommissionError, the Door actuator has a safety constraint for it. For example,
”SC5: doors should be opened during emergency” is used to mitigate the effects of the error. This
mitigation process has been executed in Door actuator.impl as a destination point. Finally, we show
that the source of produced error in the actuator (A) or Door actuator.impl and has been mitigated in
80
the (A) or Door actuator.impl.
4. What will happen if the Door actuator is not able to prevent DoorActuatorCommissionError? As
shown in figure 4.12, the error propagation analysis shows that the error has been propagated to
Door sensor.impl through Physical Door.impl. This means Door actuator updates the Door sensor’s
information incorrectly. This leads to the Door controller not being able to open the door during an
emergency because the sensor continuously provides incorrect values of data to controller and the controller will do incorrect computation for that data which causing it to send inadequate commands that
would not lead to opening the physical door. To ensure that the error is propagated and has not been
mitigated, the figure 4.12 shows empty cells in some specific parameters such as mitigated error by
safety constraints, mitigated in component name, and mitigated by component type.
Figure 4.12: ASAM’s report for Non-Mitigated Error in Train Door
4.2.4
Error Mitigation Analysis for Medical Embedded Device
We will now provide and discuss error mitigation by safety constraint for the implementation of
each individual core component of the feedback control loop architecture for the medical embedded device pacemaker.
1
system E l e c t r o d e
2
features
3
Measure Sensing : out event data port ;
4
Heart Response : in event data port ;
5
end E l e c t r o d e ;
6
7
8
9
10
system implementation E l e c t r o d e . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => s e n s o r ;
81
11
−−E l e c t r o d e i s t h e s e n s o r o f t h e p a c e m a k e r . The i n t e r n a l f a i l u r e i n i t
causes electrode error .
i n t e r n a l f a i l u r e INTF1 : ” The e l e c t r o d e d o e s n o t m e a s u r e s e n s i n g v a l u e s
12
on t i m e ” c a u s e s [ L a t e M e a s u r e S e n s i n g ( C r i t i c a l , p = 0 . 0 2 ) ] on
[ Measure Sensing ] ;
13
14
∗∗};
end E l e c t r o d e . i m p l ;
Listing 4.23: ASAM Error Mitigation Analysis for pacemaker
In listing 4.23, we have represented the Electrode sensor specification and implementation. From
the specification perspective, the interface has two ports Measure Sensing and Heart Response. The Measure Sensing is an out event data port and used to send sensing measured values to controller.Heart Response
is an in event data port and used to receive the heart response values whenever the heart is paced. From the
implementation perspective, we have identified the Electrode major type which is the sensor. Also, we show
that the Electrode has an internal failure which causes LateMeasureSensing error on the out event data port
Measure Sensing. Additionally, we have specified the probability of occurrence for that error and hazard’s
severity level.
1
2
system D e v i c e C o n t r o l l e r M o n i t o r
features
3
Heart Monitoring : in event data port ;
4
Send Pulse Command : o u t e v e n t d a t a p o r t ;
5
end D e v i c e C o n t r o l l e r M o n i t o r ;
6
7
8
9
10
11
system implementation D e v i c e C o n t r o l l e r M o n i t o r . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => c o n t r o l l e r ;
−−DCM i s t h e c o n t r o l l e r o f t h e p a c e m a k e r . The i n t e r n a l f a i l u r e i n i t
c a u s e s DCM e r r o r .
12
i n t e r n a l f a i l u r e INTF2 : ” The DCM d o e s n o t s e n d p a c i n g command on t i m e t o
t h e p u l s e g e n e r a t o r ” c a u s e s [ LatePacingCommand ( C r i t i c a l , p = 0 . 0 3 ) ] on
[ Send Pulse Command ] ;
13
−−The E l e c t r o d e e r r o r p r o p a g a t e s t o DCM.
82
[ L a t e M e a s u r e S e n s i n g ] on [ H e a r t M o n i t o r i n g ] c a u s e s
14
[ L a t e M e a s u r e S e n s i n g ( C r i t i c a l , p = 0 . 0 0 2 ) ] on [ Send Pulse Command ] ;
15
−−DCM h a v e a SC t o m i t i g a t e e r r o r s coming from t h e e l e c t r o d e
s a f e t y c o n s t r a i n t SC1 : ” The DCM s h o u l d r e c e i v e h e a r t s e n s i n g v a l u e s on
16
t i m e t o s e n d t h e p a c e command on t i m e a s w e l l ” h a n d l e s
[ L a t e M e a s u r e S e n s i n g ] on [ H e a r t M o n i t o r i n g ] ;
17
18
∗∗};
end D e v i c e C o n t r o l l e r M o n i t o r . i m p l ;
Listing 4.24: ASAM Error Mitigation Analysis for pacemaker
In listing 4.24, we have represented the Device Controller Monitor specification and implementation. From specification perspective, the interface has two ports Heart Monitoring and Send Pulse Command.
The Heart Monitoring is an in event data port and is used to monitor measured sensing values of the heart.
The Send Pulse Command is an out event data port and is used to send pulse command to pulse generator. From the implementation perspective, we have identified the component major type which is controller. We show the Device Controller Monitor has an internal failure which causes LatePacingCommand
on the out event data port Send Pulse Command. This error effects of the controller’s decision by giving
pulse command in late. At the same time, Electrode sensor error, LateMeasureSensing is propagated to Device Controller Monitor through error propagation path. The sensor’s propagated error also effects of the
controller’s decision as well. In the controller, we need to provide a safety constraint, SC1, to handle the
incoming error first.
1
2
system P u l s e G e n e r a t o r
features
3
Receive Pulse Command : i n e v e n t d a t a p o r t ;
4
Heart Pacing : out event data port ;
5
end P u l s e G e n e r a t o r ;
6
7
8
9
10
11
system implementation P u l s e G e n e r a t o r . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => a c t u a t o r ;
−−The PG i s t h e p u l s e g e n e r a t o r . The i n t e r n a l f a i l u r e i n i t c a u s e s PG
errors .
83
i n t e r n a l f a i l u r e INTF3 : ” The p u l s e g e n e r a t o r d o e s n o t g e n e r a t e t h e
12
r e q u i r e d p u l s e whenever i s needed ” c a u s e s
[ D e l a y e d P a c i n g S e r v i c e ( C r i t i c a l , p = 0 . 0 4 ) ] on [ H e a r t P a c i n g ] ;
13
−−DCM e r r o r from c o n t r o l l e r c a u s e s D e l a y e d S e r v i c e i n PG .
[ LatePacingCommand ] on [ R e c e i v e P u l s e C o m m a n d ] c a u s e s
14
[ LatePacingCommand ( C r i t i c a l , p = 0 . 1 ) ] on [ H e a r t P a c i n g ] ;
s a f e t y c o n s t r a i n t SC2 : ” The p u l s e g e n e r a t o r s h o u l d r e c e i v e t h e p a c i n g
15
command on t i m e from DCM” h a n d l e s [ LatePacingCommand ] on
[ Receive Pulse Command ] ;
s a f e t y c o n s t r a i n t SC3 : ” The p u l s e g e n e r a t o r s h o u l d n o t p r o v i d e t h e l a t e
16
p a c i n g s e r v i c e t o t h e h e a r t ” p r e v e n t s [ D e l a y e d P a c i n g S e r v i c e ] on
[ Heart Pacing ] ;
17
18
∗∗};
end P u l s e G e n e r a t o r . i m p l ;
Listing 4.25: ASAM Error Mitigation Analysis for pacemaker
In listing 4.25, we have represented the Pulse Generator specification and implementation. From
the specification perspective, the interface has two ports Receive Pulse Command and Heart Pacing. The
Receive Pulse Command is an in event data port and is used to receive pulse command from the controller.
The Heart Pacing is an out event data port and is used to execute the received command such as pacing heart
according to the controller’s decision. From the implementation perspective, we have identified the component’s major type which is the actuator. Also, we show that the Pulse Generator has an internal failure which
causes DelayedPacingService error on the output port Heart Pacing. That error effects the Pulse Generator’s
execution commands. At the same time, the Device Controller Monitor error, LatePacingCommand, is propagated to Pulse Generator in port Receive Pulse Command through error propagation path. That error also
effects of the execution commands in the Pulse Generator as well. Now, the Pulse Generator gets two types
of errors: one from the Device Controller Monitor and the other by Pulse Generator itself. In this case, we
need to provide two safety constraints, SC2 and SC3, to mitigate the effects of the incoming and outgoing
errors respectively.
84
Figure 4.13: ASAM’s Error Mitigation report for pacemaker
Figure 4.13 shows ASAM’s generated report about error mitigation analysis by safety constraints
for the an embedded medical device is called pacemaker. ASAM generates the report based on the following
information: internal failure for each major component, each internal failure causes error, probability value
of error occurrence, probability thresholds to identify probability of which hazard could occur, major component implementation, major component specification, propagation path to identify error mitigation process
step by step, error specification propagation collects information about each error, mitigated error by safety
constraint, mitigated error in components implementation and mitigated error in components specification.
We iteratively explain figure 4.13 below:
1. The internal failure (INTF1) recorded in the Electrode which causes LateMeasureSensing error on output port Measure Sensing. This error leads to reading the heart sensing values in late. The probability
of occurrence for that error in the sensor late-reading sense value is (0.02). This error gives a critical
level of hazard for patient before pacemaker is developed because Electrode does not read sensing values on time during operational system context. In the error propagation path, we show that the error
is propagated from the sensor to controller[Electrode.impl → Device Controller Monitor.impl] through
the error propagation path. Also, in the error propagation specification, the controller gathers information about the propagated error such as name of the error, port to propagate, hazard’s severity level of
the error and p value. To mitigate the propagated error, the Device Controller Monitor.impl has safety
constraint (SC). For example, ”SC1:The DCM should receive heart sensing values on time to send the
pace command on time as well” is used to mitigate the effects of late-reading values. This mitigation
process has been executed in the Device Controller Monitor.impl. Finally, we show that the error is
propagated from sensor (S) or Electrode.impl as a source of the error and has been mitigated in the
controller (C) or Device Controller Monitor.impl as a destination of the error.
2. Previously, we show that the Device Controller Monitor.impl mitigates the incoming errors from the
Electrode.impl. But, what will happen if the Device Controller Monitor.impl itself has an internal
85
failure?How can the Device Controller Monitor.impl solve this problem? For that purpose, we have
recorded an internal failure (INTF2) in the Device Controller Monitor.impl which causes error, LatePacingCommand. This error leads to letting the controller send the pace command in a late time because it receives sensing values in a late time as well. The probability of occurrence for that error is
(0.03). That p value gives critical severity level to the patient. In the error propagation path, we show
that the error is propagated from the controller to the actuator [Device Controller Monitor.impl →
Pulse Generator.impl] through the error propagation path. Also, in the error propagation information,
the controller gathers information about the error and sends it to the actuator such as name of the error,
port to propagate, hazard’s severity level of the error and p value. Now, to mitigate the propagated
error in the actuator, Pulse Generator.impl provides a safety constraint for it. For example, ”SC2:
The pulse generator should receive the pacing command on time from DCM” is used to mitigate the
effects of the error in the controller’s command. This mitigation process has been executed in the
Pulse Generator.impl as a destination point. Finally, we show that the source of the error is the controller (C) or Device Controller Monitor.impl and mitigated in the actuator (A) or Pulse Generator.impl
3. Now, we know that how the Device Controller Monitor.impl handles incoming errors from the Electrode.impl and we also show that the Pulse Generator.impl handles the incoming errors from the Device Controller Monitor.impl. But, what will happen if the Pulse Generator.impl itself has an internal
failure? For that purpose, we have recorded an internal failure (INTF3) in the Pulse Generator.impl
which causes DelayedPacingService error. This error leads to delaying the pace service to the heart.
The probability of occurrence for that error during execution is (0.04). That p value gives a critical severity level. Previously, we show how Pulse Generator.impl mitigates the incoming propagated
errors from Device Controller Monitor.impl. Now, we need to show how the Pulse Generator.impl prevents errors that have been produced by itself. In the error propagation path, we show that the source
of the error is Pulse Generator.impl and the actuator has sufficient information about the error such as
name of the produce error, port to propagate, p value and hazard’s severity level. To prevent DelayedPacingService error, the Pulse Generator.impl has a safety constraint for it. For example, ”SC3: The
pulse generator should not provide the late pacing service to the heart” is used to mitigate the effects
of the error. This mitigation process has been executed in the Pulse Generator.impl as a destination
point. Finally, we show that the source of the error is the actuator (A) or Pulse Generator.impl and has
been mitigated in the (A) or Pulse Generator.impl itself.
86
4. What will happen if the Pulse Generator.impl is not able to prevent DelayedPacingService error? As
shown in figure 4.14, the error propagation analysis shows that the DelayedPacingService error has
been propagated from the actuator to the sensor [Pulse Generator.impl → Heart.impl → Electrode.impl].
This means that the Pulse Generator.impl updates the Electrode.impl’s information in a late time. This
leads to the Device Controller Monitor.impl not being able to pace the heart when it is required because
of the Pulse Generator.impl’s internal failure. If we assume that the Device Controller Monitor.impl
sends another pulse command to the Pulse Generator.impl in a late time. Also, the Pulse Generator.impl
executes the second command in a late time as well. This leads to accumulating time of delay pacing
service to the heart. This operation does not help to raise the patient’s heart beating because every
sensing operation needs pacing action on time. To ensure that the error is propagated and has not been
mitigated, the figure 4.14 shows empty cells in some specific parameters such as mitigated error by
safety constraints, mitigated in component name, and mitigated by component type.
Figure 4.14: ASAM’s report for Non-Mitigated Error in pacemaker
4.2.5
Summary
In this evaluation, we have taken three examples which contain different types of errors such as
value errors in ACC example, service errors in train door and timing errors in pacemaker. Each example has
been evaluated based on mitigated and non-mitigated errors. The result of each example has been showed
based on the ASAM’s generated report.
The purpose of this evaluation is to demonstrate that ASAM mitigates errors by safety constraints
for different kinds of safety-critical system examples. ASAM generates a report based on the following
information: identify internal failure for each major component, each internal failure causes error, probability
value of error occurrence, thresholds to identify probability of which hazard could occur, identify major
component’s implementation & specification parts, propagation path to identify error mitigation process step
by step, and mitigate error effects by safety constraints in the component’s implementation parts. Finally,
in each example, we have also demonstrated the ASAM’s ability, to handle incoming errors and prevent
outgoing errors by safety constraints, for each specific component in the feedback control loop architecture.
87
4.3 Demonstrating Safety Constraints Verification
Claim C3: ASAM uses verification, unlike existing safety analysis methods, to further ensure that
the safety constraint fully mitigates the hazardous condition improving the quality of the safety constraints.
4.3.1
Evaluation Plan
To demonstrate claim C3, we use ASAM to verify the safety constraints against the system model
with injected errors. This allows us to determine that either unsafe behavior occurs (which means the error
leads to a hazardous condition in the system) or verify that the error does not lead to unsafe behavior.
To evaluate this claim, we need to add a new piece, verified by [Guarantee Identifier], to the safety
constraint statements of ASAM and pass the identifiers to the guarantee statements in XAGREE (eXtended
Assume Guarantee REasoning Environment) which is described in section 2.4. The identifier of the guarantee
statement is used in the verified by statements of ASAM. When we invoke either activity (report or analysis)
in ASAM, XAGREE is spawned in the background and as each component of the feedback control loop is
encountered, a verify all activity in XAGREE is spawned. ASAM waits for XAGREE to complete the analysis
and then it matches each safety constraint with the XAGREE results. If XAGREE verified the guarantee(s)
associated with the safety constraint statement, then the safety constraint is reported as verified. If any
guarantee associated with a safety constraint fails verification, the safety constraint is marked as unverified.
4.3.2
Safety Constraints Verification for Train Automated Door Control System
We will provide and discuss safety constraint verification for the feedback control loop architecture
of the train automated door control system. We will verify the safety constraints that have been placed in
the implementation of each component. The specification part of the component ensures that the component
does not have unsafe behavior by listing the verification conditions using XAGREE. In this example, we will
verify three important safety safety constraints of the train door, which are:
1. SC1: The train door should not be blocked when it is opened.
2. SC2: The train door should not be blocked when it is closed.
3. SC3: The train door can not be opened and closed at the same time (i.e, open and then close, close and
then open, not both together).
88
1
2
system d o o r s e n s o r
features
3
d o o r b l o c k e d s : in da t a p o r t Base Types : : I n t e g e r ;
4
d o o r o p e n s : out d a t a p o r t Base Types : : Boolean ;
5
d o o r c l o s e s : out d a t a p o r t Base Types : : Boolean ;
6
7
a n n e x x a g r e e {∗∗
−− I d e n t i f y d o o r s e n s o r i n p u t s t a t e s
assume ” Door i n b l o c k s t a t e s ” : d o o r b l o c k e d s >= 0 ;
8
9
−− I d e n t i f y d o o r s e n s o r o u t p u t i d e n t i f i e r s
GAU1 : g u a r a n t e e ” Door i s b l o c k e d o n l y i f
10
i t opens ” : ( d o o r b l o c k e d s = 1
and d o o r o p e n s = t r u e ) o r d o o r o p e n s = f a l s e ;
GAU2 : g u a r a n t e e ” Door i s b l o c k e d o n l y i f
11
i t closes ” : ( door blocked s =
0 and d o o r c l o s e s = t r u e ) o r d o o r c l o s e s = f a l s e ;
12
13
∗∗};
end d o o r s e n s o r ;
Listing 4.26: XAGREE Identifiers for Train Door Sensor Specification
In listing 4.26, we have presented the door sensor specification. Notice that identifiers have been
annotated in the XAGREE annex. In the specification, the interface has two out data ports, door open s and
door close s, which are used to send the open and close status of the door to the controller. The interface
also has an in data port, door blocked s, that allows the sensor to receive the blocked status value from the
physical door sensors. In the XAGREE annex, we have preconditions for the in data port. Also, we have
postconditions and identifiers, GAU1 and GAU2, for the out data ports.
1
2
system implementation d o o r s e n s o r . impl
a n n e x asam {∗∗
3
t y p e => s e n s o r ;
4
E r r s => [ {
5
Type => V a l u e O p e n e d E r r o r ( C r i t i c a l , p = 0 . 1 ) ,
6
UCA => UCA1 : ” Door r e p o r t i t i s b o t h o p e n e d and b l o c k e d ” ,
7
C a u s e s => {
G e n e r a l => ”A s i g n a l was g e n e r a t e d wrong when d o o r i s o p e n e d ”
8
9
10
},
−−V e r i f y d o o r s e n s o r o u t p u t s by s a f e t y c o n s t r a i n t s
89
identifiers .
SC => SC1 : ” Door s h o u l d n o t be b l o c k e d when t h e d o o r i s o p e n e d ”
11
v e r i f i e d by [GAU1]
12
},
13
{
14
Type => V a l u e C l o s e d E r r o r ( C r i t i c a l , p = 0 . 1 ) ,
15
UCA => UCA2 : ” Door r e p o r t i t i s b o t h c l o s e d and b l o c k e d ” ,
C a u s e s => {
16
G e n e r a l => ”A s i g n a l was g e n e r a t e d wrong when d o o r i s c l o s e d ”
17
},
18
19
−−V e r i f y d o o r s e n s o r o u t p u t s by s a f e t y c o n s t r a i n t s
20
SC => SC2 : ” Door s h o u l d n o t be b l o c k e d when t h e d o o r i s c l o s e d ”
identifiers .
v e r i f i e d by [GAU2]
}]
21
22
23
24
∗∗};
−− S a t i s f y e a c h s t a t u s o f t h e d o o r s e n s o r
a n n e x x a g r e e {∗∗
25
assign door open s = ( door blocked s = 1) ;
26
assign d o o r c l o s e s = ( door blocked s = 0) ;
27
28
∗∗};
end d o o r s e n s o r . i m p l ;
Listing 4.27: ASAM and XAGREE for Train Door Sensor Implementation
In listing 4.27, we have presented ASAM and XAGREE in the door sensor implementation. ASAM
matches the guarantee identifiers, GAU1 and GAU2, to the guarantee statements in the XAGREE specification shown in list 4.26. These identifiers are used to verify the safety constraints SC1 and SC2 respectively
to ensure that the train door is not blocked when it is opened or closed. When we call XAGREE, it checks all
activities in the specification and implementation parts of the door sensor and returns the result to ASAM. At
that time, ASAM waits for the XAGREE response and then it matches each safety constraint with the XAGREE results. If XAGREE verified the guarantee identifiers, GAU1 and GAU2, associated with the safety
constraint, SC1 and SC2, then ASAM reports that the safety constraints have been verified.
1
−−The d a t a m e s s a g e comes from t h e d o o r s e n s o r , i t c o n t a i n s two s t a t u s a t t h e
same t i m e
2
d a t a message
90
3
end m e s s a g e ;
4
5
6
d a t a i m p l e m e n t a t i o n message . impl
subcomponents
7
open door c : d a t a b a s e t y p e s : : Boolean ;
8
c l o s e d o o r c : d a t a b a s e t y p e s : : Boolean ;
9
end m e s s a g e . i m p l ;
10
11
12
13
−−The d o o r c o n t r o l l e r s p e c i f i c a t i o n
system d o o r c o n t r o l l e r
features
14
d o o r o p e n a n d c l o s e m s j : i n d a t a p o r t message . impl ;
15
d o o r b l o c k e d c : in d a ta p o r t Base Types : : I n t e g e r ;
16
open door c : out d a t a p o r t Base Types : : Boolean ;
17
c l o s e d o o r c : out d a t a p o r t Base Types : : Boolean ;
18
a n n e x x a g r e e {∗∗
19
assume ” Door s t a t e s i n ” : d o o r b l o c k e d c <= 2 and d o o r b l o c k e d c >=0;
20
GAU3 : g u a r a n t e e ” Door c a n n o t be o p e n e d and c l o s e d a t t h e same t i m e ” :
n o t ( o p e n d o o r c and c l o s e d o o r c ) ;
21
22
∗∗};
end d o o r c o n t r o l l e r ;
Listing 4.28: XAGREE for Train Door Controller Specification
In listing 4.28, we present a message object, read by the door sensor, containing two status at the
same time. In this example, only one of the boolean flags of the message will be set at a time; there should not
be an instance where both are set. Also, we present the door controller specification. In the specification, the
interface has two out data ports, open door c and close door c, which are used to send commands to open and
close the train door. The interface also has two in data ports, door open and close msj and door blocked c.
The door open and close msj is used to receive the data messages that have been read by the door sensor.
The door blocked c is used to receive the blocked status of the train door. In the XAGREE annex, we have
preconditions for the in data port. Also, we have a postcondition ensuring that the door controller does not
send both open and close door commands at the same time.
1
system implementation d o o r c o n t r o l l e r . impl
91
2
a n n e x asam {∗∗
3
t y p e => c o n t r o l l e r ;
4
E r r s => [ {
5
Type => V a l u e O p e n d A n d C l o s e d E r r o r ( C r i t i c a l , p = 0 . 2 ) ,
6
UCA => UCA3 : ” Door r e p o r t i t i s i n two s t a t e s ( open and c l o s e ) a t
t h e same t i m e ” ,
C a u s e s => {
7
G e n e r a l => ”A s i g n a l was g e n e r a t e d wrong when d o o r i s c l o s e d
8
and o p e n e d ”
},
9
10
−− V e r i f y t h e s a f e t y c o n s t r a i n t i d e n t i f i e r
11
SC => SC3 : ” Door c o n t r o l l e r s h o u l d n o t s e n d open and c l o s e d o o r
commands a t t h e same t i m e ” v e r i f i e d by [GAU3]
}]
12
13
∗∗};
14
a n n e x x a g r e e {∗∗
15
−− S a t i s f y t h a t open and c l o s e h a v e t h e same s t a t u s
eq d o o r o p e n m s j c : b o o l = f a l s e −> i f
16
( ( d o o r o p e n a n d c l o s e m s j . o p e n d o o r c = t r u e ) and
( door open and close msj . close door c = f a l s e ) ) then true else
false ;
eq d o o r c l o s e m s j c : b o o l = f a l s e −> i f
17
( ( d o o r o p e n a n d c l o s e m s j . c l o s e d o o r c = t r u e ) and
( door open and close msj . open door c = f a l s e ) ) then true else f a l s e ;
18
−− S a t i s f y t h a t t h e d a t a m e s s a g e do n o t l e a d t o b l o c k t h e d o o r when i t
i s opened or c l o s e d
19
a s s e r t ( o p e n d o o r c = ( d o o r o p e n m s j c and ( d o o r b l o c k e d c <= 2 ) ) ) ;
20
a s s e r t ( c l o s e d o o r c = ( d o o r c l o s e m s j c and ( d o o r b l o c k e d c >= 0 ) ) ) ;
21
22
∗∗};
end d o o r c o n t r o l l e r . i m p l ;
Listing 4.29: ASAM and XAGREE for Train Door Controller Implementation
In listing 4.29, we have both ASAM and XAGREE in the door controller implementation. ASAM
provides an identifier, GAU3, to the guarantee statement in the XAGREE specification of the train door
92
controller, as shown in listing 4.28, to verify the safety constraint SC3. When ASAM calls XAGREE, it
checks all activity of the train door controller in the specification / implementation and returns the result to
ASAM. As we show that in the door controller implementation, XAGREE checks the status of the open and
close flags in the data message door open and close msj. Also, XAGREE has to check that the door sensor
data message does not allow the door to enter a state where it is blocked while open or closed.
Figure 4.15: ASAM’s report about safety constraints verification for train door
Figure 4.15 shows ASAM’s generated report about safety constraint verification for the train automated door control system. ASAM generates the report based on the following information: component’s
implementation part, error ontology, probability occurrence of the specific error, probability thresholds to
identify hazard’s severity level, unsafe control action, causes of unsafe control action, safety constraints, and
verification of the safety constraints. This stream of information was previously described in other examples.
In this section of the report, we will focus on the verification parts of the safety constraints.
1. The unsafe control action (UCA1) in the door sensor.impl tells us that the train door has been blocked
when it is opened. The general cause for this unsafe action is a wrong signal generated by the sensor
when the door is opened, or an internal failure of the sensor. To solve this problem, we need to provide a
safety constraint, SC1, to show that the door should not be blocked when it is opened. At the same time,
we need to ensure that the train door is opened correctly/safely and it is not blocked. For that purpose,
we provide a guarantee identifier, GAU1, for the safety constraint, SC1. ASAM sends the identifier to
XAGREE to verify the activity of the door sensor.impl and then returns the result (Yes/No) to ASAM.
ASAM reports that result (Yes) showing verification of the safety constraint, SC1. This verification
means that the door sensor.impl cannot be blocked when the door is opened.
2. The unsafe control action (UCA2) in the door sensor.impl tells us that the train door has been blocked
when it is closed. The general cause for this unsafe action is a wrong signal generated by the sensor
when the door is closed, or an internal failure of the sensor. To solve this problem, we need to provide a
93
safety constraint, SC2, to show that the door should not be blocked when it is closed. At the same time,
we need to ensure that the train door is closed correctly/safely and it is not blocked. For that purpose,
we provide a guarantee identifier, GAU2, for the safety constraint, SC2. ASAM sends the identifier to
XAGREE to verify the activity of the door sensor.impl and then returns the result (Yes/No) to ASAM.
ASAM reports that result (Yes) showing verification of the safety constraint, SC2. This verification
means that the door sensor.impl cannot be blocked when the door is closed.
3. The unsafe control action (UCA3) in the door controller.impl tells us that the incoming data message
from the door sensor.impl contains two status (open and close) at the same time. The general cause
for this unsafe action is a wrong signal generated by the sensor when the door is opened or closed, or
the controller has an internal failure. To solve this problem, we need to provide a safety constraint,
SC3, to show that the door controller.impl should not send both open and close commands at the same
time. Also, we need to ensure that the door controller.impl does not send the wrong command. For
that purpose, we provide a guarantee identifier, GAU3, for the safety constraint, SC3. ASAM sends
the identifier to XAGREE to verify the activity of the door controller.impl and then returns the result
(Yes/No) to ASAM. ASAM reports the result (Yes) showing verification of the safety constraint, SC3.
This verification means that the door controller.impl does not send inadequate commands, or two status
at the same time.
Figure 4.16: ASAM’s report about safety constraints are not verified for the train door
Figure 4.16 shows ASAM’s generated report about safety constraints of the door sensor status in
the train automated door control system are not verified. We will describe the reasons as follows:
1. As a reminder, we need to verify the safety constraint to ensure the door is not blocked when it is
opened. For that purpose, we provide a guarantee identifier, GAU1, for the safety constraint, SC1.
ASAM sends the identifier to XAGREE to verify the activity of the door sensor.impl and then returns
the result (Yes/No) to ASAM. ASAM reports that result (NO or N) showing the safety constraint, SC1,
94
is not verified. This verification means that the door is blocked when the it is opened. This means
the door will not return to different status because it is blocked in that situation. ASAM verified that
the door is blocked when it is opened because ASAM did not obtain the true value from XAGREE to
ensure the door is not blocked when it is opened.
2. Also, we need to verify the safety constraint to ensure the door is not blocked when it is closed. For
that purpose, we provide a guarantee identifier, GAU2, for the safety constraint, SC2. ASAM sends the
identifier to XAGREE to verify the activity of the door sensor.impl and then returns the result (Yes/No)
to ASAM. ASAM reports that result (NO or N) showing the safety constraint, SC2, is not verified. This
verification means that the door is blocked when the it is closed. This means the door will not return to
different status because it is blocked in that situation. ASAM verified that the door is blocked when it
is closed because ASAM did not obtain the true value from XAGREE to ensure the door is not blocked
when it is closed.
4.3.3
Safety Constraints Verification for Pacemaker
We will provide and discuss safety constraint verification for the pacemaker feedback control loop
architecture. We will verify the safety constraints that have been fed into the implementation part of each
individual component do not have unsafe behavior or that the hazard effects have been mitigated. In this
example, we will verify three important safety constraints of the pacemaker, which are:
1. SC1: The electrode should not be blocked when it is sensing.
2. SC2: The electrode should not be blocked when the heart is paced.
3. SC3: The heart should not be sensed and paced at the same time (i.e, sense and then pace, not both
together).
1
system E l e c t r o d e
2
features
3
E l e c t r o d e B l o c k e d s : in d a t a p o r t Base Types : : I n t e g e r ;
4
Measured Sensing : out d a t a p o r t Base Types : : Boolean ;
5
Measured Pacing : out d a t a p o r t Base Types : : Boolean ;
6
7
a n n e x x a g r e e {∗∗
assume ” E l e c t r o d e i n b l o c k e d s t a t e s ” : E l e c t r o d e B l o c k e d s >= 0 ;
95
GAU1 : g u a r a n t e e ” E l e c t r o d e i s b l o c k e d o n l y i f
8
i t i s i n measured
s e n s i n g ” : ( E l e c t r o d e B l o c k e d s = 0 and M e a s u r e d S e n s i n g = t r u e ) o r
Measured Sensing = f a l s e ;
GAU2 : g u a r a n t e e ” E l e c t r o d e i s b l o c k e d o n l y i f
9
i t i s h e a r t paced ” :
( E l e c t r o d e B l o c k e d s = 1 and M e a s u r e d P a c i n g = t r u e ) o r
Measured Pacing = f a l s e ;
10
11
∗∗};
end E l e c t r o d e ;
Listing 4.30: XAGREE Identifiers for Electrode Specification
In listing 4.30, we show the Electrode sensor specification along with preconditions and postconditions for the operation of the sensor. In the specification, the interface has two out data ports, Measured Sensing and Measured Pacing, which are used to send sensed and paced values of the heart to Controller Monitor Device. The interface also has an in data port, Electrode Blocked s that allows the Electrode
to receive the blocked status value from the heart.
1
2
3
4
5
system implementation E l e c t r o d e . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => s e n s o r ;
−−E l e c t r o d e i s t h e s e n s o r o f t h e p a c e m a k e r . The i n t e r n a l f a i l u r e i n i t
causes electrode error .
6
i n t e r n a l f a i l u r e INTF1 : ” The e l e c t r o d e r e p o r t i t i s b l o c k e d i n m e a s u r e d
s e n s i n g ” c a u s e s [ M e a s u r e d S e n s i n g E r r o r ( C r i t i c a l , p = 0 . 0 2 ) ] on
[ Measure Sensing ] ;
7
i n t e r n a l f a i l u r e INTF2 : ” The e l e c t r o d e r e p o r t i t i s b l o c k e d when t h e
h e a r t i s p a c e d ” c a u s e s [ M e a s u r e d P a c i n g E r r o r ( C r i t i c a l , p = 0 . 0 2 ) ] on
[ Measured Pacing ] ;
8
s a f e t y c o n s t r a i n t SC1 : ” The e l e c t r o d e s h o u l d n o t be b l o c k e d when i t i s
s e n s i n g ” p r e v e n t s [ M e a s u r e d S e n s i n g E r r o r ] on [ M e a s u r e S e n s i n g ]
v e r i f i e d by [GAU1 ] ;
9
s a f e t y c o n s t r a i n t SC2 : ” The e l e c t r o d e s h o u l d n o t be b l o c k e d when t h e
h e a r t i s p a c e d ” p r e v e n t s [ M e a s u r e d P a c i n g E r r o r ] on [ M e a s u r e d P a c i n g ]
v e r i f i e d by [GAU2 ] ;
96
10
∗∗};
11
a n n e x x a g r e e {∗∗
12
−− S a t i s f y e a c h s t a t u s o f t h e e l e c t r o d e
13
assign Measured Sensing = ( Electrode Blocked s = 0) ;
14
assign Measured Pacing = ( Electrode Blocked s = 1) ;
15
16
∗∗};
end E l e c t r o d e . i m p l ;
Listing 4.31: ASAM and XAGREE for Electrode Implementation
In listing 4.31, we show ASAM and XAGREE in the Electrode sensor implementation. ASAM gives
the identifiers, GAU1 and GAU2, to the guarantee statements in the XAGREE specification as shown in listing
4.30. These identifiers are used to verify the safety constraints, SC1 and SC2, to ensure that the Electrode is
not blocked when it is sensing the heart or when the heart is paced. If any guarantee associated with safety
constraints fail verification, then ASAM will report that the safety constraint are marked as unverified.
1
−−The d a t a m e s s a g e comes from t h e e l e c t r o d e s e n s o r , i t c o n t a i n s two s t a t u s a t
t h e same t i m e , s e n s i n g and p a c i n g
2
data Electrode message
3
end E l e c t r o d e m e s s a g e ;
4
5
6
d a ta implementation Electrode message . impl
subcomponents
7
DCM sensing c : d a t a B a s e T y p e s : : B o o l e a n ;
8
DCM pacing c : d a t a B a s e T y p e s : : B o o l e a n ;
9
end E l e c t r o d e m e s s a g e . i m p l ;
10
11
12
system D e v i c e C o n t r o l l e r M o n i t o r
features
13
DCM sensing and pacing msj : in d a t a p o r t Electrode message . impl ;
14
DCM Blocked c : i n d a t a p o r t B a s e T y p e s : : I n t e g e r ;
15
DCM sensing c : o u t d a t a p o r t B a s e T y p e s : : B o o l e a n ;
16
DCM pacing c : o u t d a t a p o r t B a s e T y p e s : : B o o l e a n ;
17
18
a n n e x x a g r e e {∗∗
assume ”DCM s t a t e s i n ” : DCM Blocked c <= 2 and DCM Blocked c >=0;
97
GAU3 : g u a r a n t e e ” H e a r t c a n n o t be s e n s e d and p a c e d a t t h e same t i m e ,
19
s e n s e and t h e n s p a c e ” : n o t ( DCM sensing c and DCM pacing c ) ;
20
21
∗∗};
end D e v i c e C o n t r o l l e r M o n i t o r ;
Listing 4.32: XAGREE for Device Controller Monitor Specification
In listing 4.32, we show a data message read by the Electrode containing two statuses at a time.
This means the message could contain both sensing and pacing information simultaneously. In reality, the
pacemaker should first sense and then pace the heart with respect to sensing values. We also show the Device Controller Monitor specification with identifiers in the XAGREE part. In the specification of the controller, the interface has two out data ports, DCM sensing c and DCM pacing c, to send sensing and pacing
commands. The interface also has two in data ports, DCM sensing and pacing msj and DCM Blocked c to
receive different types of messages. The DCM sensing and pacing msj is used to receive the data messages
that have been read and sent by the Electrode sensor. The DCM Blocked c is used to receive the blocked
status values of the Electrode sensor. In the XAGREE part, we show a precondition for the in data port to
identify blocked status values of the Device Controller Monitor. Also, we show a postcondition identifier
or guarantee identifier, GAU3, for the out data ports of the controller, DCM sensing c and DCM pacing c,
to identify that the Device Controller Monitor does not send sensing and pacing commands together at the
same time.
1
2
3
system implementation D e v i c e C o n t r o l l e r M o n i t o r . impl
a n n e x asam {∗∗
−− I d e n t i f y t y p e o f m a j o r component
t y p e => c o n t r o l l e r ;
4
5
−−DCM i s t h e c o n t r o l l e r o f p a c e m a k e r . The i n t e r n a l f a i l u r e i n i t c a u s e s
DCM e r r o r .
i n t e r n a l f a i l u r e INTF3 : ” The DCM r e p o r t t h a t h e a r t s e n s e d and p a c e d a t
6
t h e same t i m e ” c a u s e s [ SensedAndPacedCMD ( C r i t i c a l , p = 0 . 0 3 ) ] on
[ DCM sensing c ] ;
s a f e t y c o n s t r a i n t SC3 : ” The DCM s h o u l d n o t s e n d s e n s i n g and p a c i n g
7
commands t o g e t h e r a t t h e same t i m e ” p r e v e n t s [ SensedAndPacedCMD ] on
[ DCM sensing c ] v e r i f i e d by [GAU3 ] ;
8
∗∗};
98
9
10
a n n e x x a g r e e {∗∗
−− S a t i s f y t h a t s e n s i n g and p a c i n g h a v e t h e same s t a t u s
eq D C M s e n s i n g m s j c : b o o l = f a l s e −> i f
11
( ( D C M s e n s i n g a n d p a c i n g m s j . DCM sensing c = t r u e ) and
( D C M s e n s i n g a n d p a c i n g m s j . DCM pacing c = f a l s e ) ) t h e n t r u e e l s e
false ;
eq DCM pa cing msj c : b o o l = f a l s e −> i f
12
( ( D C M s e n s i n g a n d p a c i n g m s j . DCM pacing c = t r u e ) and
( D C M s e n s i n g a n d p a c i n g m s j . DCM sensing c = f a l s e ) ) t h e n t r u e e l s e
false ;
13
−− S a t i s f y t h a t e l e c t r o d e d a t a m e s s a g e s h o u l d n o t l e a d t o b l o c k t h e DCM
when i t i s t h e s e n s i n g and p a c i n g h a v e t h e same s t a t u s
14
a s s e r t ( DCM sensing c = ( D C M s e n s i n g m s j c and ( DCM Blocked c <= 2 ) ) ) ;
15
a s s e r t ( DCM pacing c = ( DCM pac ing msj c and ( DCM Blocked c >=0) ) ) ;
16
17
∗∗};
end D e v i c e C o n t r o l l e r M o n i t o r . i m p l ;
Listing 4.33: ASAM and XAGREE for Device Controller Monitor Implementation
In listing 4.33, we show ASAM and XAGREE in the Device Controller Monitor implementation.
ASAM gives an identifier, GAU3, to the guarantee statement in the XAGREE specification of the controller
as shown in listing 4.32, to verify the safety constraint, SC3, to ensure that the controller of the pacemaker
does not send sensing and pacing commands together at the same time. For that purpose, when we call the
XAGREE, it checks all activity of the Device Controller Monitor in the specification and implementation
parts and returns the result to ASAM.
Figure 4.17: ASAM’s report about safety constraints verification for pacemaker
Figure 4.17 shows ASAM’s report about the safety constraints verification for the pacemaker. ASAM
generates the report based on the following information: component’s implementation part, error ontology,
99
probability occurrence of the specific error, probability thresholds to identify hazard’s severity level, unsafe
control action, causes of unsafe control action, safety constraints, and verification of the safety constraints.
This stream of information is described previously in detail for different types of examples. Now, we need to
focus on the verification part of the safety constraints. We iteratively explain figure 4.17 below:
1. The internal failure (INTF1) in the Electrode.impl tells us that the electrode sensor has been blocked
when it is sensing the heart. This failure gives MeasuredSensingError to the Electrode.impl. We need
a safety constraint to mitigate the effects of the error. For that purpose, we provide a safety constraint,
SC1, to show that the electrode sensor should not be blocked because of the error when it is sensing
the heart. This is not a sufficient mechanism to mitigate the effects of the error only. Also, we need to
ensure that Electrode.impl is sensing the heart correctly/safely and it is not blocked. For this case, we
provide a guarantee identifier, GAU1, for the safety constraint, SC1. ASAM sends the guarantee identifier to XAGREE to verify the activity of the Electrode.impl specification / implementation and returns
the result (Yes/No) to ASAM. Then, ASAM reported (Yes) as a verification of the safety constraint,
SC1. This verification means that the Electrode.impl is not blocked when it senses the heart.
2. The internal failure (INTF2) in the Electrode.impl tells us that the electrode sensor has been blocked
when the heart is paced. This failure gives MesuredPacingError to the Electrode.impl. We need a safety
constraint to mitigate the effects of the error. For that purpose, we provide a safety constraint, SC2,
to show that the electrode sensor should not be blocked because of the error when the heart is paced.
That is not sufficient mechanism to mitigate the effects of the error only. Also, we need to ensure that
the Electrode.impl sensor has not been blocked when the heart is paced. For this case, we provide a
guarantee identifier, GAU2, for the safety constraint, SC2. ASAM sends the guarantee identifier to
XAGREE to verify the activity of the Electrode.impl specification / implementation and returns the
result (Yes/No) to ASAM. Then, ASAM reported (Yes) as a verification of the safety constraint, SC2.
This verification means that the Electrode.impl is not blocked when the heart is paced.
3. The internal failure (INTF3) in the Device Controller Monitor.impl tell us that the heart is sensed and
paced together at the same time. This failure gives SensedAndPacedCMD error to the controller. We
need a safety constraint to mitigate the effects of the error. For that purpose, we provide a safety
constraint, SC3, to show that the controller should not send sensing and pacing commands together at
the same time. That is not sufficient mechanism to mitigate the effects of the error only. Also, we need
to ensure that the controller do not send the wrong command to the heart. For this case, we provide a
100
guarantee identifier, GAU3, for the safety constraint, SC3. ASAM sends the identifier to XAGREE to
verify the activity of the Device Controller Monitor.impl specification / implementation and returns the
result (Yes/No) to ASAM. Then, ASAM reported (Yes) as a verification of the safety constraint, SC3.
This verification means that the Device Controller Monitor.impl do not send the inadequate command
to the heart such as sensing and pacing commands together to the heart.
Figure 4.18: ASAM’s report about safety constraints are not verified for the pacemaker
Figure 4.18 shows ASAM’s generated report about safety constraints of the Electrode.impl status in
the pacemaker system are not verified. We will describe the reasons as follows:
1. As a reminder, we need to verify the safety constraint of the Electrode.impl to ensure that the electrode is not blocked when it is sensing. For that purpose, we provide a guarantee identifier, GAU1,
for the safety constraint, SC1. ASAM sends the identifier to XAGREE to verify the activity of the
Electrode.impl and then returns the result (Yes/No) to ASAM. ASAM reports that result (NO or N)
showing the safety constraint, SC1, is not verified. This verification means that the electrode sensor is
blocked when it is sensing. This means the electrode will not sense any more because it is blocked in
that status. ASAM verified that the electrode sensor is blocked when it is sensed because ASAM did
not obtain the true value from XAGREE to ensure the electrode is not blocked when it is sensed.
2. Also, we need to verify the safety constraint of the Electrode.impl to ensure that the electrode is not
blocked when the heart is paced. For that purpose, we provide a guarantee identifier, GAU2, for
the safety constraint, SC2. ASAM sends the identifier to XAGREE to verify the activity of the Electrode.impl and then returns the result (Yes/No) to ASAM. ASAM reports that result (NO or N) showing
the safety constraint, SC2, is not verified. This verification means that the electrode sensor is blocked
when the heart is paced. This means the electrode will not pace the heart any more because it is blocked
in that situation. ASAM verified that the electrode sensor is blocked when the heart is paced because
ASAM did not obtain the true value from XAGREE to ensure the electrode sensor is not blocked when
the heart is paced.
101
4.3.4
Summary
In this evaluation, we have taken two important safety-critical system examples, the train automated
door control system and a medical embedded device (i.e. pacemaker). In each example, we have multiple
safety constraints, (SC1, SC2, and SC3) and multiple guarantee identifiers (GAU1, GAU2, and GAU3) which
are used to verify the safety constraints. The result of each example has been evaluated based on XAGREE
analysis and ASAM’s generated report.
The purpose of this evaluation is to demonstrate that ASAM uses verification, unlike existing safety
analysis methods, to prove that the safety constraint fully mitigates the hazardous condition improving the
quality of the safety constraints.
The result tells us that ASAM verified the safety requirements for different kinds of examples in
the safety-critical domains. We provided safety constraints in the implementation part for each individual
core component in the feedback control loop architecture. We also provided guarantee identifier for each
safety constraint within the component implementation. We adopted XAGREE to check the activity of the
identifiers in the specification and implementation of the component and then it returns the result to ASAM.
In each example, we have also demonstrated that ASAM generate a report to verify the result. In summary,
we show how to ensure the error effects have been mitigated by verifying safety the constraints within the
architecture model.
4.4 Demonstrating Notations and Expressions for Error Flows and
Safety Constraints
In this section, we will show that how to identify the notations in the ASAM’s implementation.
Essentially, we are going to say that the tooling shown in claims 1 - 3 is an implementation of the mathematical
framework. We described in detail the theoretical foundation of notation and expression in section 3.3.
As a reminder, previously, we provide the mathematical notation for the feedback control loop architecture and provide formal specification for the error ontology in the loop. By doing so, we allow stakeholders
to find more unsafe possibilities in the operational system context. Additionally, we provide an expression
for each component in the feedback control loop and use error types in the form of formal specifications
to find more possibilities for how the system could behave incorrectly. Also, we identify unsafe functional
behavior for each component, and we provide safety constraints for possibilities of unsafe functions in the
102
system because if the system knows about those possibilities, it can take action to avoid incorrect behavior.
We will show the notations in ASAM’s implementation to identify the locations of the variables in
the feedback control loop architecture such as measured variables (Vme), control actions (CA), manipulated
variables (Vma), and controlled variables(Vc). We also show the other elements of the notation in ASAM
such as safety constraints (SC), internal failures (INTF), errors (Err), the error ontology (EO), guarantee
identifiers (GAU), unsafe control actions (UCA), probability values (P), hazards types (H), major component
types such as sensors (S), controllers (C), actuators (A) and controlled processes (CP). These elements help
the stakeholders use our notation in the safety-critical examples.
4.4.1
Identify Notations in ASAM’s Implementation
When the safety analysts or stakeholders run ASAM, they will see our notations that have been
executed within the architecture model to generate a report. We have presented the notations in the form of
executed safety analysis statements in ASAM such as error statements, error propagation statements, internal
failure statements, safety constraint statements that either handle or prevent hazards, and safety constraint
verification statements. In this section, we will provide and discuss the notations that we have implemented
in ASAM.
We have presented different types of variables in the feedback control loop architecture such as
measured variables (Vme), control actions (CA), manipulated variables (Vma) and controlled variables (Vc).
We also identify major component types such as sensors (S), controllers (C), actuators (A), and controlled
processes (CP). We have also specified the location of the variables among the components. For example, we
have presented (Vme) between (S and C), (CA) between (C and A), (Vma) between (A and CP), (Vc) between
(CP and S).
Each component in the feedback control loop could has an internal failure (INTF) which leads to an
error (Err) in the variable. In this case, the stakeholders can select any of the six (Type) errors in the error
ontology (EO) such as (Service Errors, Value Errors, Timing Errors, Replication Errors, Concurrency Errors,
Access Control Errors) as an effect of the internal failure. In the our application examples, we have focused
on the first three. Also, each error causes an unsafe control action (UCA) in the component. We identify the
general and specific cause for that error. To mitigate the effects of the (Err), we provide safety constraints
(SC), to handle and prevent incoming and outgoing (Err)s. We also have represented the probability value
(P) of error (Err)’s occurrence as well as the severity level of the the (Err) such as (Catastrophic, Critical,
Marginal, Negligible). Finally, we show the notation in the components implementation within the architec103
ture model. Also, XAGREE helped ASAM to verify the guarantee identifiers (GAU) in the specification of
the components. ASAM wants to verify the safety constraint (SC) identifiers that have been fed in the implementation by specification using XAGREE to ensure that the (Err) effects have been mitigated or hazardous
condition could not occur. In the following example, we just remind the reader how to identify our notations
in ASAM:
1
2
system S
features
−−C o n t r o l l e d V a r i a b l e s ( Vc ) o f t h e S e n s o r ( S )
3
Vc S : i n d a t a p o r t B a s e T y p e s : : I n t e g e r ;
4
5
−− Measured V a r i a b l e (Vme) o f t h e S e n s o r ( S )
6
Vme S : o u t d a t a p o r t B a s e T y p e s : : B o o l e a n ;
7
−−Check t h e s e n s o r ’ s s p e c i f i c a t i o n i n p u t and o u t p u t
8
a n n e x x a g r e e {∗∗
assume ” C o n t r o l l e d v a r i a b l e o f t h e s e n s o r ’ s i n p u t ’ ” : Vc S >= 0 ;
9
GAU1 : g u a r a n t e e ” Check m e a s u r e d v a r i a b l e w i t h r e s p e c t t o c o n t r o l l e d
10
v a r i a b l e ” : ( Vc S = 1 and Vme S = t r u e ) o r Vme S = f a l s e ;
11
12
∗∗};
end S ;
Listing 4.34: Notations in ASAM
In listing 4.34, we show controlled variable (Vc) as an in data port and also have represented measured variable (Vme) as an out data port. We also have used XAGREE to check the sensor, (S)’s, input and
output specifications respectively. XAGREE provides conditions for the controlled variable (Vc) and provides
a guarantee identifier (GAU1) for the measured variable (Vme).
1
2
system implementation S . impl
a n n e x asam {∗∗
3
t y p e => s e n s o r ;
4
i n t e r n a l f a i l u r e INTF1 : ” S e n s o r i n t e r n a l f a i l u r e s t a t e m e n t ” c a u s e s
[ EO S ( C r i t i c a l , p = 0 . 0 2 ) ] on [ Vme S ] ;
s a f e t y c o n s t r a i n t SC1 : ” S e n s o r s a f e t y c o n s t r a i n t s t a t e m e n t f o r t h e
5
s e n s o r ’ s m e a s u r e d v a r i a b l e ” p r e v e n t s [ EO S ] on [ Vme S ] v e r i f i e d by
[GAU1 ] ;
6
∗∗};
104
7
a n n e x x a g r e e {∗∗
a s s i g n Vme S = ( Vc S = 1 ) ; −−R e t u r n t r u e when t h e Vc i s c o r r e c t
8
9
10
∗∗};
end S . i m p l ;
Listing 4.35: Notations in ASAM
In listing 4.35, in the sensor’s implementation part (S.impl), we show the type of the component
(type => sensor), an internal failure statement (INTF1) which gives an error (Err) in the error ontology
(EO S) to the sensor (S). The error has a probability of occurrence (P) and has a severity level, (Critical).
We also a safety constraint statement (SC1) to prevent the effects of the error on the measured variable of the
sensor (Vme S), which also has a guarantee identifier (GAU1) to that the effect has been mitigated. Also, we
have use XAGREE to return true or (Yes) to ASAM if the safety constraint is correct.
1
2
system C
features
3
−−Measured V a r i a b l e s (Vme) o f t h e C o n t r o l l e r (C )
4
Vme C : i n d a t a p o r t B a s e T y p e s : : B o o l e a n ;
5
−−C o n t r o l A c t i o n (CA) o f t h e C o n t r o l l e r (C )
6
CA C : o u t d a t a p o r t
7
Base Types : : I n t e g e r ;
end C ;
8
9
10
system implementation C. impl
a n n e x asam {∗∗
11
t y p e => c o n t r o l l e r ;
12
i n t e r n a l f a i l u r e INTF2 : ” C o n t r o l l e r i n t e r n a l f a i l u r e s t a t e m e n t ” c a u s e s
[ EO C ( M a r g i n a l , p = 0 . 0 4 ) ] on [ CA C ] ;
13
−−E r r o r comes from t h e m e a s u r e d v a r i a b l e and c a u s e s e r r o r on c o n t r o l
action .
[ EO S ] on [ Vme C ] c a u s e s [ EO S ( C a t a s t r o p h i c , p = 0 . 0 0 2 ) ] on [ CA C ] ;
14
15
−− C o n t r o l l e r h a v e a SC t o h a n d l e e r r o r ( s ) i n m e a s u r e d v a r i a b l e .
s a f e t y c o n s t r a i n t SC2 : ” C o n t r o l l e r s a f e t y c o n s t r a i n t f o r c o n t r o l l e r ’ s
16
m e a s u r e d v a r i a b l e ” h a n d l e s [ EO S ] on [ Vme C ] v e r i f i e d by [GAU2 ] ;
17
∗∗};
105
18
end C . i m p l ;
Listing 4.36: Notations in ASAM
In listing 4.36, we have represented the controller’s (C) specification and implementation. In the
specification, we have represented measured variables as in data port to the controller (Vme C) and have
represented control action as an out data port of the the controller (CA C). In the implementation, we have
identified the major type of the component (type => controller), internal failure statement (INTF2) of the
controller which gives an error (Err) in the error ontology to the controller (EO C). Then, the error (Err) will
propagate to the actuator (A) through control action of the controller (CA C). At the same time, the incoming
error from the sensor (EO S) arrives to the controller (C) through measured variable of the controller (Vme C),
it causes an error on the control action (CA C) of the controller’s output. We also have represented a safety
constraint, (SC2), to handle the incoming error from the sensor (EO S) on controller’s measured variable
(Vme C), and we have an identifier (GAU2) to ensure that it is mitigated.
1
2
system A
features
3
−− C o n t r o l A c t i o n (CA) o f t h e A c t u a t o r (A)
4
CA A : i n d a t a p o r t B a s e T y p e s : : I n t e g e r ;
5
−− M a n i p u l a t e d V a r i a b l e s (Vma) o f t h e A c t u a t o r (A)
6
Vma A : o u t d a t a p o r t B a s e T y p e s : : I n t e g e r ;
7
end A;
8
9
10
system implementation A. impl
a n n e x asam {∗∗
11
t y p e => a c t u a t o r ;
12
i n t e r n a l f a i l u r e INTF3 : ” A c t u a t o r i n t e r n a l f a i l u r e s t a t e m e n t ” c a u s e s
[ EO A ( C a t a s t r o p h i c , p = 0 . 2 ) ] on [ Vma A ] ;
13
−−E r r o r comes from t h e c o n t r o l l e r and c a u s e s e r r o r on m a n i p u l a t e d
variables .
14
[ EO C ] on [ CA A ] c a u s e s [ EO C ( C r i t i c a l , p = 0 . 1 ) ] on [ Vma A ] ;
15
s a f e t y c o n s t r a i n t SC3 : ” A c t u a t o r s a f e t y c o n s t r a i n t t o h a n d l e i n c o m i n g
e r r o r from c o n t r o l a c t i o n ” h a n d l e s [ EO C ] on [ CA A ] v e r i f i e d by
[GAU3 ] ;
16
s a f e t y c o n s t r a i n t SC4 : ” A c t u a t o r s a f e t y c o n s t r a i n t t o p r e v e n t o u t g o i n g
106
e r r o r on m a n i p u l a t e d v a r i a b l e ” p r e v e n t s [ EO A ] on [ Vma A ] v e r i f i e d
by [GAU4 ] ;
17
18
∗∗};
end A . i m p l ;
Listing 4.37: Notations in ASAM
In listing 4.37, we show the actuator’s, (A), specification and implementation. In the specification,
we show the control action of the actuator (CA A) as an in data port and show the manipulated variables of
the actuator (Vma A) as an out data port. In the implementation, we have identified the type of the component
(type => actuator) and we show an internal failure statement (INTF3) of the actuator which gives an error
(Err) in the error ontology (EO A). The error will propagate through the manipulated variables of the actuator
(Vma A) to the controlled process (CP). At the same time, an incoming error from controller (EO C) arrives
to the actuator (A) through the control action of the actuator (CA A), it causes an error on the manipulated
variables of the actuator (Vma A). Also, we show a safety constraint, (SC3), to handle incoming errors from
controller (EO C) on the control action of the actuator (CA A) and we have an identifier (GAU3) to ensure
that it is mitigated. We also represent another safety constraint, (SC4) to prevent errors in the actuator itself
(EO A) and we have an identifier (GAU4) to ensure that they are mitigated.
1
2
s y s t e m CP
features
3
−−M a n i p u l a t e d V a r i a b l e s (Vma) o f t h e C o n t r o l l e d P r o c e s s ( CP )
4
Vma CP : i n d a t a p o r t B a s e T y p e s : : I n t e g e r ;
5
−−C o n t r o l l e d V a r i a b l e s ( Vc ) o f t h e C o n t r o l l e d P r o c e s s ( CP )
Vc CP : o u t d a t a p o r t B a s e T y p e s : : I n t e g e r ;
6
7
end CP ;
8
9
10
s y s t e m i m p l e m e n t a t i o n CP . i m p l
a n n e x asam {∗∗
t y p e => c o n t r o l l e d p r o c e s s ;
11
12
−−P a s s t h e i n c o m i n g e r r o r s from m a n i p u l a t e d v a r i a b l e s t o c o n t r o l l e d
variables
13
[ EO C ] on [ Vma CP ] c a u s e s [ EO C ( M a r g i n a l , p = 0 . 0 1 ) ] on [ Vc CP ] ;
14
[ EO A ] on [ Vma CP ] c a u s e s [ EO A ( N e g l i g i b l e , p = 0 . 0 0 0 0 1 ) ] on [ Vc CP ] ;
15
∗∗};
107
16
end CP . i m p l ;
Listing 4.38: Notations in ASAM
In listing 4.38, we show the controlled process (CP) specification and implementation. In the specification, we show the manipulated variables of the controlled process (Vma CP) as an in data port and show
the controlled variable of the controlled process (Vc CP) as an out data port. In the implementation, we
have identified the type of the component (type => controlled process). The errors of the controller (C) and
actuator (A), (EO C) and (EO A), will propagate through the manipulated variable of the controlled process
(Vma CP) to the controlled variable of the controlled process (Vc CP).
1
s y s t e m Top System
2
end Top System ;
3
4
5
s y s t e m i m p l e m e n t a t i o n Top System . i m p l
subcomponents
6
S : system S . impl ;
7
C: system C. impl ;
8
A: s y s t e m A . i m p l ;
9
CP : s y s t e m CP . i m p l ;
10
connections
11
c o n n 1 : p o r t S . Vme S −> C . Vme C ;
12
c o n n 2 : p o r t C . CA C −> A . CA A ;
13
c o n n 3 : p o r t A . Vma A −> CP . Vma CP ;
14
c o n n 4 : p o r t CP . Vc CP −> S . Vc S ;
15
end Top System . i m p l ;
Listing 4.39: Notations in ASAM
In listing 4.39, we show the top level specification and implementation. In the specification, we
do not have any features to describe. In the implementation, we show the subcomponents and their connections. In the subcomponents, we have called the component implementations such as sensor implementation
(S.impl), controller implementation (C.impl), actuator implementation (A.impl) and controlled process implementation (CP.impl). In their connections, we have connected the components together through the ports.
For example, the measured variable of the sensor (S.Vme S) is connected to the measured variable of the controller (C.Vme C) under the connection name (conn 1). Also, the control action of the controller (C.CA C)
108
is connected to the control action of the actuator (A.CA A) under the connection name (conn 2). Then, the
manipulated variables of the actuator (A.Vma A) are connected to the manipulated variables of the controlled
process (CP.Vma CP) under the connection name (conn 3). Finally, the controlled variable of the controlled
process (CP.Vc CP) is connected to the controlled variable of the sensor (S.Vc S) under the name the connection (conn 4).
4.4.2
Summary
In this evaluation, we have made two different types of analysis, theoretical and practical. In the
theoretical analysis, described in section 3.3.1, we have applied our notations and expressions for the train
automated door control system to identify more safety constraints for the hazardous possibilities. In the
practical analysis, described in section 4.4.1, we have an architectural description for the stakeholders or
safety analysts to know how to use our notations in the feedback control loop architecture to run ASAM
successfully and generate a report for the architecture model.
The purpose of this evaluation is to demonstrate that ASAM uses system mathematical notation,
unlike existing safety analysis methods, to describe the systems error flows allowing it more rigorously audit
the model.
In the theoretical analysis, the result of the notation and expression tell us that the safety analysts are
able to identify unsafe behavior of the components, specify input/output/function for each component, identify unsafe functional behavior/find more hazardous possibilities using error ontology, identify mathematical
notation for each error flow / scenario and provide the safety constraint to mitigate the effects of the error
flow.
In the practical analysis, the result tells us that the ASAM’s implementation satisfied the notations
that have been used in the theoretical analysis.
109
Chapter 5
Discussion
In this section, we will first discuss the limitations of the techniques used to evaluate the framework
presented as part of this research. We will also justify the choices made during the evaluation, and we will
discuss the contributions of ASAM.
5.1 Difference Between Two-Way and Three-Way Communication Formats
In this section, we will describe the difference between two-way and three-way communication formats. The two-way communication format is the interaction between two components only. The stakeholders
know the source of the hazard and impact destination. But, the three-way communication format is the interaction among the three components. The safety analysts need to focus on the system behavior when the data
is sent by the first component, crosses into the second component and is received by the third component.
This allows us to identify different kinds of hazard sources and different types of impact destinations. We
show the difference between the two formats in figure 5.1.
110
Figure 5.1: Two-way versus Three-way communication format
Figure 5.1(a), the two-way communication format, depends on the interaction between controller
and actuator to identify unsafe control actions (UCA). This allows the stakeholders / safety analysts to focus
on the system controller as a source of the hazard because they are continuously monitoring the system
behavior when the controller sends inadequate commands to the actuator. Then, they will provide the safety
requirements to mitigate that hazard during the operational system context.
Figure 5.1(b), the three-way communication format, focuses on the interaction among multiple components, usually three in a sequence such as [sensor → controller → actuator], [controller → actuator →
controlled process], and [actuator → controlled process → sensor]. In each sequence, we have identified
the new unsafe control action for the system. Each element in the sequence has a different role of (source,
path, sink) in the context of the error propagation information. This allows the stakeholders to identify different types of hazard sources because each of three major component can have different types of hazards.
5.2 Justification of & Limitations of Evaluation
In the first evaluation (for claim C1), we chose to use only the three-way communication format
rather than do a comparison against the two-way format. STPA is a long process that requires extensive
description, and STPA analyses of each of our examples are already available in existing literature so repeating the analysis for each example was deemed unnecessary. For those wishing to see one of our examples
analyzed in STPA, please see appendix B.
In the second evaluation (for claim C2), we chose error flows to identify safety constraints instead
111
of using control tables, used in two-way communication format analysis. The rational behind this choice
was two fold. First, the error flows allow the stakeholders to identify the safety constraints and mitigate the
effects of errors that have been propagated from a component to another. Second, the error flows also allow
the identification of general and specific causes of the unsafe control actions. In figure 5.2, we show the
difference between safety analysis using control tables and safety analysis using error flows:
Figure 5.2: Control Table versus Error Flows
As shown in the figure 5.2(a), the control table has been created to hold the safety requirements to
eliminate system hazards or unsafe interaction between the controller and actuator. This allows the safety analysts to provide the safety requirements between the two components to mitigate the unsafe system behavior.
Also, as shown in the figure 5.2(b), we show that error flow analysis allows us to specify safety constraints
and describe hazards between components other than the actuator and controller. The error flows also allow
us to specify the exact source, propagation path and sink of each error, allowing us to specify both the general
and specific causes of each hazard.
In the third evaluation (for claim C3), we extended safety analysis by adding verification. Most current standards do verify the safety constraints resulting from the analysis instead relying on the experience of
the analyst to ensure the correctness of the constraint. Due to our use of formal notation and exact specification, it is possible to perform verification of the constraint against the architecture model of the system. The
rational behind this was to introduce the safety constraint verification to ensure the hazard was completely
mitigated.
112
5.3 ASAM Field Contributions
ASAM makes three primary contributions or advancements to the field of safety-critical systems
and error propagation information. Briefly, we summarized ASAM’s contributions:
First, ASAM uses an architectural model rather than a mental process model. This provides several
benefits:
1. It allows us to use defined error ontologies to describe hazards.
2. It allows us to define an unsafe control action’s propagation path rather than just describing its effect.
3. It allows us to define more precise safety constraints focusing on the general and / or specific cause of
an error identified through the propagation path.
Second, ASAM is based a formal expression notation in addition to the architectural model. This
provides several benefits:
1. Formalizing the error ontology allows us to find more possibilities of incorrect system behavior.
2. It allows us to formally specify the behavior of each component, and through rigorous analysis uncover
the unsafe functional behavior of each component in the feedback control loop.
3. It allows us to discover additional safety constraints as well as improve existing constraints to more
thoroughly mitigate potential hazards.
Finally, ASAM allows verification of its safety constraints in the architectural model through the use
of its formal notation. This provides several benefits:
1. It allows us to better ensure that safety constraints completely cover all hazardous possibilities.
2. It allows us to discover weak constraints so that they can be improved.
3. It allows us to discover areas of the architectural model that have missing constraints when verification
is done not just against the safety constraints but also against the system requirements, a possibility
that exists due to our use of a rigorous architecture specification format and formal notation.
Figure 5.3 shows the contributions in the road map of the ASAM’s implementation:
113
Figure 5.3: ASAM’s contributions
114
Chapter 6
Related Work
In this section, we will discuss works related to ASAM. We will also discuss work that is loosely
related to our method, including research that is likely to be incorporated into our method at a later date.
6.1 Works Related to ASAM
STPA is the current accepted standard used to analyze safety of systems. It is the latest top-down
hazard analysis method. It has been used successfully in many safety-critical systems outside of AADL.
Some examples of this are [22] and [45]. Our work differs from these groups in where safety analysis is
applied during the development process of safety-critical systems. We augment STPA with error propagation
information to finding hazards and additional information in the system that other groups miss such as unsafe
control actions, safety constraints and specific causes of the unsafe actions. We apply safety analysis to the
system architecture during the design phase of the software development life cycle while these other projects
or groups apply hazard analysis during the requirements phase.
Our work is most similar to the work performed by the following groups, particularly [37] and [35]
both of which use AADL. Additional architecture-based techniques exist, such as [10], [36], [34] and [12].
Our work differs from these groups in that we are explicitly focused on allowing the error ontology assets to
be reused in the same manner as the safety-critical assets, and we also add and exploit new several statements
in the AADL language such as error statements, error propagation statements, internal failure statements,
safety constraint statements that either handle or prevent hazards and verify safety constraints statements.
ASAM is a model-based software safety analysis tool which works with AADL models annotated
115
with an annex (i.e. annex asam) and supported by OSATE. It is used to analyze and generate a report based
on information attached to each component in the feedback control loop architecture. There are many other
safety and hazard analysis techniques, for example, STPA [22], SAFE [36], FTA [3], HAZOP [30], FMEA
[40], FPTN [13], Hip-HOPS [48], FPTC [47]. We differ from each of these primarily in focus.
Our work is focused on mitigating the effects of the errors through safety constraints and verifying
the constraints within the system model which are unique to safety-critical systems. Additionally, ASAM is
a modified version of the previously existing STPA that has been made more rigorous, and ASAM effectively
analyzes hardware & software components in safety-critical systems. Also, ASAM is different than SAFE in
handling incoming errors / preventing outgoing errors via the implementation of the safety constraints, and
verifies the safety constraint statements to ensure that the error effect or unsafe action has been mitigated.
Our work is different from FTA, HAZOP and FMEA in where faults are inserted into the architecture
model. Our generated report also provides how the fault produces an error in the system & how the error
propagates through the model & gives the probability value of the occurrence for each error in the model &
records the hazard’s severity level for each error.
Also, our work is different than FPTN, Hip-HOPS and FPTC in that we show how the error propagation effects are mitigated via the implementation of safety constraints instead of just showing how errors
propagate through the model. Another difference is that we focus on introducing compositional reasoning in
ASAM, unlike other safety analysis methods, to ensure that the hazardous condition of the propagated error
will not happen.
6.2 Additional Related Works
According to our experience in applying hazard analysis methods on different types of systems,
these hazard analysis techniques cannot describe the dynamic error behavior of the system, system states, the
transitions of the system, and error propagations among system components. Because of this, the traditional
hazard analysis techniques depend on decomposition of the system with respect to the hierarchy of failure
effects instead of the systems architectural model. Based on this fact, we have decided to develop procedures
to augment the existing hazard analysis techniques (such as STPA) with error propagation information and
state machines to support modeling and analyzing dynamic error behavior. To prove that the current hazard
analysis methods like STPA still needs development, we have several references:
This work [1] shows that the STPA requires suitable diagrammatic notations to represent the rela116
tionship among hazards, process model variables in the controller and control actions to the actuator. Our
work is different from this group in that we use formal notations to augment STPA to find hazardous conditions in a design. Also, we investigated how a formalized error ontology could assist in identifying unsafe
behavior. The formal specification of the error ontology can aid in identifying mathematical expressions for
each error flow in the canonical feedback control loop architecture.
This work [2] describes software safety verification based on recent hazard analysis methods. They
found the safety requirements using STPA. Then, they changed the requirements to formal specifications
using theorem proving such as CTL/LTL (Computation Tree Logic / Linear-Time Temporal Logic). Then,
they used a model checker such as SMV (Symbolic Model Verifier) to verify the safety requirements. This
work has been done by using several tools, but our work is different than this group, we make it easier by
putting the safety requirements within the architecture model and then verify the requirements within in the
model as well, using less tools.
This work [4] represents a formal framework for the hazard identification step in STPA. Our work is
different from this project, in that we provide guidance, [39], concerning mathematical notations to formalize
an error ontology used in the architecture descriptions of systems represented in AADL to improve the rigor
of STPA. The results of our work have shown that providing a formal notation for the feedback control loop
and providing formal specification for the error ontology lead to finding hazards in the operational system
context that other methods miss. By augmenting STPA with an error ontology described in a formal notation,
we are able to find more hazards.
117
Chapter 7
Future Work
As a reminder, ASAM has been created based on three primary contributions that have been completed and evaluated as part of the work for this dissertation, which are:
• Augmenting a hazard analysis method with error propagation information for safety-critical systems
[38]
• Using formal notations to augment a hazard analysis method [39]
• Verifying safety constraints in a compositional safety analysis method (i.e. ASAM) [39]
As mentioned in section 2.6, ASAM’s future work is the implementation of False Positive (FP) /
False Negative (FN) identification for each safety-critical application that we described in the dissertation.
Shown below are partially complete FP/FN identification in some of the safety-critical example systems used
in this dissertation. Now, we will show that how false positive and false negative could occur and be identified.
7.1 ASAM’s Future Work in Adaptive Cruise Control System
As a reminder, a false positive (FP) means the result is positive when in reality it is not. For example,
when a warning or automatic braking happens before the critical distance. Also, a false negative (FN) means
the result is negative when in reality it is present. For example, if a warning or an automatic braking happens
too late.
Figure 7.1 shows ASAM’s future work in the adaptive cruise control system to warn a driver if
the minimum distance between your own vehicle and the vehicle’s front is passed. The obstacle is that the
118
Figure 7.1: ASAM’s future work in ACC example
ACC system warns the driver to the apply the brakes when the minimum distance is not presented to the
system. Another obstacle is that the ACC sensor presents the minimum distance, but the ACC controller
does not send the warn command to the driver. For that purpose, we need to analyze the ACC sensor data
availability according to warning / not warning the driver to identify false positives (FP), false negatives (FN),
true positives (TP), and true negatives (TN) in ACC system. The analysis is shown in a table below 7.1:
Table 7.1: Identify FP/FN/TP/TN in ACC system
Min. Distance presented Min. Distance not presented
Positive Result (Warn)
ACC TP*(1)
ACC FP*(2)
Negative Result (Not Warn)
ACC FN*(3)
ACC TN*(4)
As shown in table 7.1, we have identified a true positive which is ACC TP*(1): ACC controller
warns the driver when the minimum distance is presented. Also, we identified a false positive which is
ACC FP*(2): ACC controller warns the driver when the minimum distance is not presented.. It means
ACC controller made a wrong decision because the warning command has been sent based on incorrect data.
Also, we identified a false negative which is ACC FN*(3): ACC controller does not warn the driver when the
minimum distance is presented.. It means ACC controller does not make a decision when data is available.
That’s a hazard because the data is available but the controller missed sending the warning command, potentially leading to a crash. Finally, we have identified a true negative which is ACC TN*(4): ACC controller
does not warn the driver when the minimum distance is not presented. It means the ACC controller does not
make a decision when the data is not available, normal operation for the system.
119
7.2 ASAM’s Future Work in Medical Embedded Device - Pacemaker
In this example, we will identify FP/FN/TP/TN for the pacemaker. The device controller monitor
(DCM) sends the pace command to the pulse generator (PG) to pace the heart with respect to sensing data.
Figure 7.2: ASAM’s future work in pacemaker example
Figure 7.2 shows ASAM’s future work in the pacemaker system to pace the heart with respect to
availability of sensing data which provides by the electrode sensor. The problem is that the DCM paces the
heart when the sensing data is not available. Another problem is that the DCM does not pace the heart when
the sensing data is available. For that purpose, we need to analyze the electrode’s sensing data according to
pacing / not pacing the heart to identify false positives (FP), false negatives (FN), true positives (TP) and true
negatives (TN) in the pacemaker system. We will show that in the table 7.2:
Table 7.2: Identify FP/FN/TP/TN in pacemaker
Sensing Data Available Sensing Data Not Available
Positive Result (Pace)
PM TP*(1)
PM FP*(2)
Negative Result (Not Pace)
PM FN*(3)
PM TN*(4)
As shown in table 7.2, we have identified a true positive which is PM TP*(1): DCM paces the
heart when the sensing data is available. In addition, we identified a false positive PM FP*(2): DCM paces
the heart when the sensing data is not available. It means DCM made a wrong decision because the pace
command has been sent based on incorrect sensing data. Also, we have identified a false negative which is
PM FN*(3): DCM does not pace the heart when the sensing data is available. That is a hazard because
the sensing data is available but the controller does not send the command to the pulse generator, potentially
120
causing a life-threatening situation. Finally, we have identified a true negative which is PM TN*(4): DCM
does not pace the heart when the sensing data is not available. It means that DCM does not make a decision
when the data is not available, normal operation for the system.
7.3 Summary
In this section, we have focused on how to identify the false positives (FP) and false negatives
(FN) in each safety-critical application example that described in the dissertation. Specifically, we focused
on availability of the data which is provided by the sensor and then analyzed by the controller to make a
decision. This step can be considered as an initial identification step to start the future work of ASAM to
identify FP/FN.
According to the identification step, false negatives (FN) can be assessed as a hazard for the safetycritical systems. As part of our future work we will analyze the following questions:
• RQ6) How false positives and false negatives could occur in the safety-critical systems?
• RQ7) What can be done to reduce false positive and false negative probabilities in the feedback control
loop architecture?
• RQ8) How to handle false positives and false negatives in the architecture to identify some margins of
system safety?
121
Chapter 8
Conclusion
In the proposal for this dissertation, we promised to provide a new architecture safety analysis
method (ASAM) to analyze system safety in a novel way. In this dissertation, we have presented the outline
of a new method as well as an implementation of of that method for the analysis of safety-critical systems.
Our method has been built based on STPA but it has the following additions:
• ASAM is capable of finding more hazards by using a three-way communication format as opposed
to the two-way format used by STPA. This allows the identification of potential internal failures of
the major components in the feedback control loop architecture. The result of each internal failure is
an error. The error propagates and cuts across components based on the three-way interaction format.
This allows stakeholders to identify new unsafe control actions and allows the reduction of the potential
effects of residual hazards in the operational system context.
• ASAM also provides a mathematical notation/expression and formal specification of error ontology
for the feedback control loop architecture to find more safety constraints during hazard analysis. This
allows stakeholders to use formal methods to find more hazardous possibilities and mitigate them using
safety constraints.
• ASAM provides safety verification procedures to verify the safety constraints to ensure that hazardous
conditions cannot occur. ASAM verifies the safety constraints against the system model with injected
errors. This allows us to determine that either unsafe behaviors occur (which means the error leads to
a hazardous condition in the system) or verifies that the error does not lead to unsafe behaviors of the
system.
122
In conclusion, ASAM allows system stakeholders / safety analysts to find new hazards and new
safety constraints that STPA does not find. This does not mean that our method is a replacement for STPA;
however, it supports more effective and rigorous analyses of the elements of safety-critical systems. Our
method is an augmentation of existing safety analysis methods, including STPA, to be used alongside current
industry standards to uncover more difficult latent errors that current methods are not capable of discovering.
123
Appendices
124
Appendix A
Error Ontology
• Service Errors
– ItemOmission / ItemCommission
– ServiceOmission / ServiceCommission
– SequenceOmission / SequenceCommission
∗ LateServiceStart
∗ EarlyServiceTermination
∗ BoundedOmissionInterval
∗ EarlyServiceStart
∗ LateServiceTermination
• Timing Errors
– ItemTimingErro
∗ EarlyDelivery
∗ LateDelivery
– SequenceTimingError
∗ HighRate
∗ LowRate
∗ RateJitter
– ServiceTimingError
∗ DelayedService
∗ EarlyService
• Value Errors
– ItemValueError
∗ UndetectableValueError
∗ DetectableValueError
· OutOfRange
125
· BelowRange
· AboveRange
· OutOfBounds
– SequenceValueError
∗ BoundedValueChange
∗ StuckValue
∗ OutOfOrder
– ServiceValueError
∗ OutOfCalibration
• Replication Errors
• Concurrency Errors
• Access Control Errors
126
Appendix B
STPA Example
In [45], an automated door control system for a train is described. We want to know how the train
door can harm people. We simplified the example according to the steps:
1. The stakeholder establishes fundamental analyses to identify accidents and the hazards associated with
those accidents. Here, we have: Accident: Hit by closing door, falling out of the train door, people
trapped inside the train during emergency evacuation. Hazards: The hazards related to the accident
could be: H-1: Door closes on a person in the doorway. H-2: Door opens when the train is not in the
station / it is not aligned with a station platform. H-3: People are not able to exit during emergency.
2. The stakeholder designs a feedback control loop for the system to identify major components such as
sensors, controllers, actuators, and the controlled process. For the train door, we have the following
structure as shown in figure 1
Figure 1: Feedback control loop for automated door control system for train
3. The stakeholder identifies unsafe control actions that could lead to hazardous states. The figure 2 shows
the relationship among hazards that we have found previously and the provided control actions.
4. The stakeholder identifies causal factors for the unsafe control actions. For example, the doors are
commanded closed while a person in the doorway. One of the potential causes of that action is an
incorrect process model (i.e, the controller incorrectly believes that the doorway is clear). This is the
result of inadequate / ineffective feedback which is provided by a failing sensor.
The figure 3 will help to identify causal factors of the unsafe control actions. For that purpose, we
provide table 4 to clarify the context of each control action. The name of the columns are the variables
127
Figure 2: Unsafe control actions for train door controller
Figure 3: Process model of the controller
of the process model and the cells are the values of the variables. We know that the controller makes
a decision based on the context of the control action. For example, the open door command consists
of the values: the train is stopped, no emergency, train is not aligned with platform, and no obstacle in
doorway.
128
Figure 4: Context of the control actions
129
Appendix C
Safety-Critical System Terminologies
• Fault - 1) A manifestation of an error in software 2) An incorrect step/ process /data definition in a
computer program 3)A defect in a component / hardware device [17].
• Error - 1) The difference between a measured value and the specified value. 2)Omission of a requirement/constraints in the design specification. 3)A human action which produces an incorrect result such
as software containing a fault [17].
• Failure - 1) Termination of the capability of a product to do a required function 2) Can be described
as an event in a system / component which does not perform a required function within specified
limits [17].
• Accident - Can be described as a (loss) which results from inadequate enforcement of the behavioural
safety requirements on the process [21].
• Hazard - 1) A source of potential harm which leads to human injury, damage to health, property/the
environment 2) An essential property or condition which has potential to cause harm. 3) System state
/ set of conditions that will lead to an accident [17, 21].
• Unsafe Control Actions - They are hazardous scenarios that may occur in the system due to provided/not provided control action when the system need it [21].
• Safety Constraints - They are the safeguards which prevent the system from leading to accident/loss
[21].
• Process model - A model is required to determine system variables states and their values that affect
the controller decision and it can be updated through feedback [44].
• Process model variables - They are the safety-critical system variables of the controller in the feedback
control loop. They have an effect on the safety of issuing the control actions [44].
• Causal Factors - They are the accident scenarios which explain how unsafe control actions may occur
and how safe control actions may not occur [44].
• Software Safety - Can be described as a field of study of software assurance, it is a systematic method
to identifying, analyzing, mitigating, and controlling software hazards and hazardous functions to assure that the system is safe within operational context [27].
130
• Internal failure - 1) Can be described as an error detection condition. Specifically, it is used to identify
source of the error because result of internal failure is a propagation [33].
2) given a correct input, a failure happens during the execution which leads to produce an erroneous
output [8].
3) Can be interpreted as a probability of component failure per demand and accumulated with the
failure rate which occurs during interaction with other components [18]
131
Appendix D
List of Abbreviations
• AADL - Architecture Analysis and Design Language
• STPA - System Theoretic Process Analysis
• STAMP - Systems Theoretic Accident Model and Processes
• EMV2 - Error Model Annex Version 2
• AGREE - Assume Guarantee REasoning Environment
• SAE - Society of Automotive Engineers
• ACC - Adaptive-Cruise Control
• UCA - Unsafe Control Action
• MBE - Model-Based Engineering
• FTA - Fault Tree Analysis
• FMEA - Failure Modes and Effects Analysis
• SAFE - Systematic Analysis of Faults and Errors
• HAZOP - Hazard and Operability Study
• FPTN - Failure Propagation and Transformation Notation
• FPTC - Fault Propagation and Transformation Calculus
• Hip-HOPS - Hierarchically Performed Hazard Origin and Propagation Studies
• LTL - Linear-Time Temporal Logic
• CTL - Computation Tree Logic
• SMV - Symbolic Model Verifier
• FP - False Positive
• FN - False Negative
• TN - True Negative
• TP - True Positive
132
Appendix E
ASAM Annex Language Reference
E.1 Error Statement
E.1.1
1
Format
E r r s => [ {
2
Type => <ERROR 1 TYPE>(<ERROR 1 SEVERITY>, p=<ERROR 1 PROB>) ,
3
UCA => <UCA 1 ID >: ”<UCA 1 DESCRIPTION>” ,
C a u s e s => {
4
5
G e n e r a l => ”<ERROR 1 GENERAL CAUSE DESCRIPTION>” ,
6
S p e c i f i c => ”<ERROR 1 SPECIFIC CAUSE DESCRIPTION>”
7
},
8
SC => <SC 1 ID >: ”<SC 1 DESCRIPTION>” v e r i f i e d by [GAU1]
9
}, {
10
Type => <ERROR 2 TYPE>(<ERROR 2 SEVERITY>,p=<ERROR 2 PROB>) ,
11
UCA => <UCA 2 ID >: ”<UCA 2 DESCRIPTION>” ,
C a u s e s => {
12
13
G e n e r a l => ”<ERROR 2 GENERAL CAUSE DESCRIPTION>” ,
14
S p e c i f i c => ”<ERROR 2 SPECIFIC CAUSE DESCRIPTION>”
15
},
16
SC => <SC 2 ID >: ”<SC 2 DESCRIPTION>” v e r i f i e d by [GAU2]
17
} ,... ,{
18
Type => <ERROR N TYPE>(<ERROR N SEVERITY>,p=<ERROR N PROB>) ,
19
UCA => <UCA N ID>: ”<UCA N DESCRIPTION>” ,
C a u s e s => {
20
21
G e n e r a l => ”<ERROR N GENERAL CAUSE DESCRIPTION>” ,
22
S p e c i f i c => ”<ERROR N SPECIFIC CAUSE DESCRIPTION>”
23
},
24
SC => <SC N ID >: ”<SC N DESCRIPTION>” v e r i f i e d by [GAU3]
25
}];
E.1.2
Description
The errors statement allows known errors that determined causes and known safety constraints to be
attached to a component of the architecture. A report of all components and their mitigated errors can then be
133
produced. ERROR SEVERITY can be one of 4 values: Catastrophic, Critical, Marginal, or Negligible. The
ERROR PROB represents how likely the error is to occur. This is a value between 0 and 1 (inclusive). The
verified by portion requires the XAGREE annex to be installed. Once installed, you can a mathematical proof
of your safety constraint associating XAGREE guarantees to each constraint. Provided that those guarantees
can be proven, the safety constraint will be reported as verified.
E.1.3
Example
1
a n n e x asam {∗∗
2
E r r s => [ {
3
Type => V a l u e O u t O f R a n g e E r r o r ( C a t a s t r o p h i c , p = 0 . 0 0 1 ) ,
4
UCA => UCA1 : ” T h a t e r r o r g i v e s t h e u n s a f e c o n t r o l a c t i o n t o t h e s y s t e m ” ,
C a u s e s => {
5
6
G e n e r a l => ”Why t h i s u n s a f e a c t i o n h a p p e n e d ? ”
7
S p e c i f i c => ” T h a t e r r o r p r o p a g a t e d and c r o s s e d t h r e e c o m p o n e n t s ”
8
},
9
SC => SC1 : ” P r o v i d e t h e s a f e t y c o n s t r a i n t t o m i t i g a t e t h e e f f e c t s o f
t h e e r r o r ” v e r i f i e d by [GAU1]
}]
10
11
};
E.2 Error Propagation Statement
E.2.1
1
Format
[<INCOMING ERROR 1>, <INCOMING ERROR 2>, . . . <INCOMING ERROR N>] on
[<INCOMING PORT 1>, <INCOMING PORT 2>, . . . <INCOMING PORT N>] c a u s e s
[<OUTGOING ERROR 1>(<ERROR 1 SEVERITY>, p=<ERROR 1 PROB>) ,
<OUTGOING ERROR 2>(<ERROR 2 SEVERITY>,
p=<ERROR 2 PROB>) , . . . < OUTGOING ERROR N>(<ERROR N SEVERITY>,
p=<ERROR N PROB>) ] on [<OUTGOING PORT 1>, <OUTGOING PORT 2>,
. . . < OUTGOING PORT N>];
134
E.2.2
Description
Error propagation statements allow incoming errors to be transformed within the current component
to outgoing errors. Errors that do not have a transformation rule are assumed to not propagate beyond the current component. ERROR SEVERITY can be one of 4 values: Catastrophic, Critical, Marginal, or Negligible.
The ERROR PROB represents how likely the error is to occur. This is a value between 0 and 1 (inclusive).
E.2.3
1
Example
a n n e x asam {∗∗
2
[ ValueOutOfRange ] on [ d o o r o p e n c , d o o r b l o c k e d c ] c a u s e s
3
[ ValueOutOfRange ( M a r g i n a l , p = 0 . 5 ) ] on [ o p e n d o o r c , c l o s e d o o r c ] ;
4
};
E.3 Internal Failure Statement
E.3.1
1
Format
i n t e r n a l f a i l u r e <ID >: ”<DESCRIPTION>” c a u s e s [<ERROR 1>(<ERROR 1 SEVERITY>,
p=<ERROR 1 PROB>) , <ERROR 2>(<ERROR 2 SEVERITY>,
<ERROR N>(<ERROR N SEVERITY>,
p=<ERROR 2 PROB>) , . . .
p=<ERROR N PROB>) ] on [<OUT PORT 1>,
<OUT PORT 2>, . . . <OUT PORT N>];
E.3.2
Description
The internal failure statement allows internal failures to be documented as well as what errors they
cause and on which ports they are caused. ERROR SEVERITY can be one of 4 values: Catastrophic, Critical,
Marginal, or Negligible. The ERROR PROB represents how likely the error is to occur. This is a value
between 0 and 1 (inclusive).
E.3.3
1
Example
a n n e x asam {∗∗
i n t e r n a l f a i l u r e INTF2 : ” s o m e t h i n g bad h a p p e n e d ” c a u s e s
2
[ ValueOutOfRange ( N e g l i g i b l e , p = 0 . 0 0 2 ) ] on [ o p e n d o o r c , c l o s e d o o r c ] ;
3
};
135
E.4 Safety Constraint Handles Statement
E.4.1
1
Format
s a f e t y c o n s t r a i n t <ID >: ”<DESCRIPTION>” h a n d l e s [<ERROR 1>, <ERROR 2>, . . .
<ERROR N>] on [<IN PORT 1 >, <IN PORT 2 >, . . . <IN PORT N>] v e r i f i e d by
[GAU1, GAU2,
E.4.2
. . . . GAU N ] ;
Description
The safety constraint handles statements allows you to specify safety constraints for handling errors
on in ports. The verified by portion requires the XAGREE annex to be installed. Once installed, you can a
mathematical proof of your safety constraint associating XAGREE guarantees to each constraint. Provided
that those guarantees can be proven, the safety constraint will be reported as verified.
E.4.3
1
Example
a n n e x asam {∗∗
s a f e t y c o n s t r a i n t SC1 : ” h a n d l e an e r r o r ” h a n d l e s [ ValueOutOfRange ] on
2
[ i n d a t a ] v e r i f i e d by [GAU1 ] ;
3
};
E.5 Safety Constraint Prevents Statement
E.5.1
1
Format
s a f e t y c o n s t r a i n t <ID >: ”<DESCRIPTION>” p r e v e n t s [<ERROR 1>, <ERROR 2>, . . .
<ERROR N>] on [<OUT PORT 1>, <OUT PORT 2>, . . . <OUT PORT N>] v e r i f i e d by
[GAU1, GAU2,
E.5.2
. . . GAU N ] ;
Description
The safety constraint handles statements allows you to specify safety constraints for preventing
errors on out ports. The verified by portion requires the XAGREE annex to be installed. Once installed,
you can a mathematical proof of your safety constraint associating XAGREE guarantees to each constraint.
Provided that those guarantees can be proven, the safety constraint will be reported as verified.
136
E.5.3
1
Example
a n n e x asam {∗∗
s a f e t y c o n s t r a i n t SC1 : ” h a n d l e an e r r o r ” p r e v e n t s [ ValueOutOfRange ] on
2
[ o u t d a t a ] v e r i f i e d by [GAU1 ] ;
3
};
137
Appendix F
1
XText Grammar
grammar edu . c l e m s o n . asam . Asam w i t h o r g . o s a t e . x t e x t . a a d l 2 . p r o p e r t i e s . P r o p e r t i e s
2
3
g e n e r a t e asam ” h t t p : / / www. c l e m s o n . edu /ASAM”
4
5
i m p o r t ” h t t p : / / a a d l . i n f o /AADL/ 2 . 0 ” a s a a d l 2
6
7
8
AnnexLibrary r e t u r n s aadl2 : : AnnexLibrary :
AsamLibrary ;
9
10
11
AnnexSubclause r e t u r n s aadl2 : : AnnexSubclause :
A s a m S u b c l a u se ;
12
13
14
AsamLibrary :
{ AsamContractLibrary } c o n t r a c t =AsamContract ;
15
16
17
A s a m S u b c l a u se :
{ AsamContractSubclause } c o n t r a c t =AsamContract ;
18
19
20
AsamContract r e t u r n s C o n t r a c t :
{ A s a m C o n t r a c t } ( s t a t e m e n t += A s a m S t a t e m e n t ) ∗ ;
21
22
23
AsamStatement :
ErrsStatement | InternalFailureStatement | SafetyConstraintHandlesStatement |
SafetyConstraintPreventsStatement | ErrorPropagationRuleStatement |
TypeStatement ;
24
25
26
InternalFailureStatement :
’ i n t e r n a l ’ ’ f a i l u r e ’ i n t e r n a l F a i l u r e I d =ID ’ : ’
i n t e r n a l F a i l u r e D e s c r i p t i o n =STRING ’ c a u s e s ’
’[ ’ ports = P o r t s L i s t
’] ’
’; ’;
27
28
SafetyConstraintHandlesStatement :
138
’[ ’ e r r o r s = E r r o r s L i s t
’ ] ’ ’ on ’
29
’ s a f e t y ’ ’ c o n s t r a i n t ’ s a f e t y C o n s t r a i n t I d =ID ’ : ’
s a f e t y C o n s t r a i n t D e s c r i p t i o n =STRING ’ h a n d l e s ’
errors=ErrorsListNoProbability
’ by ’
’ ] ’ ’ on ’
’[ ’
’[ ’ ports = P o r t s L i s t
’[ ’ matchingXagreeStatements=GuaranteeIds
’] ’
’] ’ ’ verified ’
’; ’;
30
31
32
SafetyConstraintPreventsStatement :
’ s a f e t y ’ ’ c o n s t r a i n t ’ s a f e t y C o n s t r a i n t I d =ID ’ : ’
s a f e t y C o n s t r a i n t D e s c r i p t i o n =STRING ’ p r e v e n t s ’
errors=ErrorsListNoProbability
’ by ’
’ ] ’ ’ on ’
’[ ’
’[ ’ ports = P o r t s L i s t
’[ ’ matchingXagreeStatements=GuaranteeIds
’] ’
’] ’ ’ verified ’
’; ’;
33
34
35
ErrorPropagationRuleStatement :
’[ ’ i n E r r o r s L i s t = ErrorsListNoProbability
’] ’ ’ causes ’
’ ] ’ ’ on ’
’[ ’ o u t E r r o r s L i s t = E r r o r s L i s t
outPortsLists=PortsList
’] ’
’[ ’ i n P o r t s L i s t s = P o r t s L i s t
’ ] ’ ’ on ’
’[ ’
’; ’;
36
37
38
TypeStatement :
’ t y p e ’ ’=>’ t y p e = ( ’ s e n s o r ’ | ’ c o n t r o l l e r ’ | ’ c o n t r o l l e d p r o c e s s ’ | ’ a c t u a t o r ’ )
’; ’;
39
40
41
ErrorsList :
f i r s t E r r o r = E r r o r ( ’ , ’ r e s t E r r o r s += E r r o r ) ∗ ;
42
43
44
ErrorsListNoProbability :
f i r s t E r r o r = E r r o r N o P r o b a b i l i t y ( ’ , ’ r e s t E r r o r s += E r r o r N o P r o b a b i l i t y ) ∗ ;
45
46
47
PortsList :
f i r s t P o r t =ID ( ’ , ’ r e s t P o r t s +=ID ) ∗ ;
48
49
50
ErrsStatement :
{ E r r s S t a t e m e n t } ’ E r r s ’ ’=>’ ’ [ ’ f i r s t E r r o r = E r r o r S t a t e m e n t ( ’ , ’
r e s t E r r o r s += E r r o r S t a t e m e n t ) ∗ ’ ] ’ ;
51
139
52
53
ErrorStatement :
’{ ’ t y p e = E r r o r T y p e S t a t e m e n t ’ , ’
54
uca= U n s a f e C o n t r o l A c t i o n S t a t e m e n t ’ , ’
55
cause=CausesStatement ’ , ’
56
sc= S a f e t y C o n s t r a i n t S t a t e m e n t
57
58
’} ’;
59
60
61
ErrorTypeStatement :
’ Type ’ ’=>’ ( e r r o r T y p e = E r r o r ) ;
62
63
64
UnsafeControlActionStatement :
’UCA’ ’=>’ i d =ID ’ : ’ d e s c r i p t i o n =STRING ;
65
66
67
CausesStatement :
’ Causes ’ ’=>’ ’ { ’ ( g e n e r a l = G e n e r a l C a u s e S t a t e m e n t |
s p e c i f i c =SpecificCauseStatement | general=GeneralCauseStatement ’ , ’
specific =SpecificCauseStatement )
’} ’;
68
69
70
GeneralCauseStatement :
’ G e n e r a l ’ ’=>’ d e s c r i p t i o n =STRING ;
71
72
73
SpecificCauseStatement :
’ S p e c i f i c ’ ’=>’ d e s c r i p t i o n =STRING ;
74
75
76
SafetyConstraintStatement :
’SC ’ ’=>’ i d =ID ’ : ’ d e s c r i p t i o n =STRING ’ v e r i f i e d ’ ’ by ’
matchingXagreeStatements=GuaranteeIds
77
78
79
GuaranteeIds :
f i r s t =ID ( ’ , ’ r e s t +=ID ) ∗ ;
80
81
Error :
140
’] ’;
’[ ’
82
e r r o r =( ’ S e r v i c e E r r o r ’ | ’ ItemOmission ’ | ’ ServiceOmission ’ |
’ SequenceOmission ’ | ’ T r a n s i e n t S e r v i c e O m i s s i o n ’ | ’ L a t e S e r v i c e S t a r t ’ |
’ E a r l y S e r v i c e T e r m i n a t i o n ’ | ’ B o u n d e d O m i s s i o n I n t e r v a l ’ | ’ ItemCommission ’ |
’ S e r v i c e C o m m i s s i o n ’ | ’ SequenceCommission ’ | ’ E a r l y S e r v i c e S t a r t ’ |
’ LateServiceTermination ’ | ’ TimingRelatedError ’ | ’ ItemTimingError ’ |
’ E a r l y D e l i v e r y ’ | ’ L a t e D e l i v e r y ’ | ’ S e q u e n c e T i m i n g E r r o r ’ | ’ HighRate ’ |
’ LowRate ’ | ’ R a t e J i t t e r ’ | ’ S e r v i c e T i m i n g E r r o r ’ | ’ D e l a y e d S e r v i c e ’ |
’ EarlyService ’ | ’ ValueRelatedError ’ | ’ ItemValueError ’ |
’ U n d e t e c t a b l e V a l u e E r r o r ’ | ’ D e t e c t a b l e V a l u e E r r o r ’ | ’ OutOfRange ’ |
’ BelowRange ’ | ’ AboveRange ’ | ’ OutOfBounds ’ | ’ S e q u e n c e V a l u e E r r o r ’ |
’ BoundedValueChange ’ | ’ S t u c k V a l u e ’ | ’ OutOfOrder ’ | ’ S e r v i c e V a l u e E r r o r ’ |
’ OutOfCalibration ’ | ’ ReplicationError ’ | ’ AsymmetricReplicatesError ’ |
’ AsymmetricValue ’ | ’ A s y m m e t r i c A p p r o x i m a t e V a l u e ’ | ’ A s y m m e t r i c E x a c t V a l u e ’
| ’ AsymmetricTiming ’ | ’ A s y m m e t r i c O m i s s i o n ’ | ’ A s y m m e t r i c I t e m O m i s s i o n ’ |
’ AsymmetricServiceOmission ’ | ’ SymmetricReplicatesError ’ |
’ SymmetricValue ’ | ’ SymmetricApproximateValue ’ | ’ SymmetricExactValue ’ |
’ SymmetricTiming ’ | ’ S y m m e t r i c O m i s s i o n ’ | ’ S y m m e t r i c I t e m O m i s s i o n ’ |
’ SymmetricServiceOmission ’ | ’ ConcurrencyError ’ | ’ RaceCondition ’ |
’ ReadWriteRace ’ | ’ W r i t e W r i t e R a c e ’ | ’ MutExError ’ | ’ Deadlock ’ |
’ S t a r v a t i o n ’ | ID )
83
’( ’
84
s e v e r i t y L e v e l =( ’ C a t a s t r o p h i c ’ | ’ C r i t i c a l ’ | ’ Marginal ’ | ’ N e g l i g i b l e ’ )
85
’ , p=’
86
p r o b a b i l i t y = ( ’ 0 ’ | ’ 1 ’ | FLOAT)
87
’) ’;
88
89
90
ErrorNoProbability :
e r r o r =( ’ S e r v i c e E r r o r ’ | ’ ItemOmission ’ | ’ ServiceOmission ’ |
’ SequenceOmission ’ | ’ T r a n s i e n t S e r v i c e O m i s s i o n ’ | ’ L a t e S e r v i c e S t a r t ’ |
’ E a r l y S e r v i c e T e r m i n a t i o n ’ | ’ B o u n d e d O m i s s i o n I n t e r v a l ’ | ’ ItemCommission ’ |
’ S e r v i c e C o m m i s s i o n ’ | ’ SequenceCommission ’ | ’ E a r l y S e r v i c e S t a r t ’ |
’ LateServiceTermination ’ | ’ TimingRelatedError ’ | ’ ItemTimingError ’ |
’ E a r l y D e l i v e r y ’ | ’ L a t e D e l i v e r y ’ | ’ S e q u e n c e T i m i n g E r r o r ’ | ’ HighRate ’ |
141
’ LowRate ’ | ’ R a t e J i t t e r ’ | ’ S e r v i c e T i m i n g E r r o r ’ | ’ D e l a y e d S e r v i c e ’ |
’ EarlyService ’ | ’ ValueRelatedError ’ | ’ ItemValueError ’ |
’ U n d e t e c t a b l e V a l u e E r r o r ’ | ’ D e t e c t a b l e V a l u e E r r o r ’ | ’ OutOfRange ’ |
’ BelowRange ’ | ’ AboveRange ’ | ’ OutOfBounds ’ | ’ S e q u e n c e V a l u e E r r o r ’ |
’ BoundedValueChange ’ | ’ S t u c k V a l u e ’ | ’ OutOfOrder ’ | ’ S e r v i c e V a l u e E r r o r ’ |
’ OutOfCalibration ’ | ’ ReplicationError ’ | ’ AsymmetricReplicatesError ’ |
’ AsymmetricValue ’ | ’ A s y m m e t r i c A p p r o x i m a t e V a l u e ’ | ’ A s y m m e t r i c E x a c t V a l u e ’
| ’ AsymmetricTiming ’ | ’ A s y m m e t r i c O m i s s i o n ’ | ’ A s y m m e t r i c I t e m O m i s s i o n ’ |
’ AsymmetricServiceOmission ’ | ’ SymmetricReplicatesError ’ |
’ SymmetricValue ’ | ’ SymmetricApproximateValue ’ | ’ SymmetricExactValue ’ |
’ SymmetricTiming ’ | ’ S y m m e t r i c O m i s s i o n ’ | ’ S y m m e t r i c I t e m O m i s s i o n ’ |
’ SymmetricServiceOmission ’ | ’ ConcurrencyError ’ | ’ RaceCondition ’ |
’ ReadWriteRace ’ | ’ W r i t e W r i t e R a c e ’ | ’ MutExError ’ | ’ Deadlock ’ |
’ S t a r v a t i o n ’ | ID ) ;
91
92
t e r m i n a l FLOAT :
’ 0 . ’ INTEGER LIT ;
142
Appendix G ASAM’s plug-in
G.1 Install OSATE
http://osate.org/download-and-install.html
G.2 Install ASAM
https://bitbucket.org/strategicsoftwareengineering/asam-update-site/
G.3 ASAM Examples
G.3.1
Adaptive Cruise Control System - (ACC)
Example 1: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/ACCExample1.aadl
Example 2: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/ACCExample2.aadl
G.3.2
Medical Embedded Device - Pacemaker (PM)
Example 1: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/PMExample1.aadl
Example 2: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/PMExample2.aadl
Example 3: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/PMExample3.aadl
G.3.3
Train Automated Door Control System - (TADCS)
Example 1: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/TADCSExample1.aadl
Example 2: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/TADCSExample2.aadl
Example 3: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/
src/master/TADCSExample3.aadl
143
G.3.4
Infant Incubator Temperature Control System - Isolette (ISO)
https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/
ISOLETTEExample1.aadl
G.3.5
Notation Example
https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/
NotationExample.aadl
144
Appendix H Feedback Results Representation
In this section, we present the results of the feedback questionnaire from section 3.3 which was
published as a technical paper in the 36th international system safety conference (ISSC). In that session,
after the presentation, 21 system safety specialists voted for 10 questions based on different criteria such as
strongly agree, agree, undecided, disagree and strongly disagree. Questions are listed in figure 5.
Figure 5: Feedback questionnaire
Also, figure 6 shows the results of their votes with respect to each question.
145
Figure 6: Results of the votes
146
Bibliography
[1] Asim Abdulkhaleq and Stefan Wanger. Integrating state machine analysis with system-theoretic process analysis. LNI Proceeding of Software Engineering Workshop Band, Springer, P(215):501–514,
December 2013.
[2] Asim Abdulkhaleq and Stefan Wanger. A software safety verification method based on systemtheoritic process analysis. In International Conference on Computer Safety, Reliability, and Security,(SAFECOMP). Springer, 2014.
[3] Risza Ruzli Ahmed Baig and Azizul Buang. Reliability analysis using fault tree analysis: A review.
International Journal of Chemical Engineering and Applications, 4(3):169–173, June 2013.
[4] Philip Asare, John Lach, and John A. Stankovic. Fstpa-i: A formal approach to hazard identification
via system theoretic process analysis. In Proceedings of the ACM/IEEE 4th International Conference
on Cyber-Physical Systems, ICCPS ’13, pages 150–159, New York, NY, USA, 2013. ACM.
[5] Mordechai Ben-Ari. The bug that destroyed a rocket. Special Interest Group on Computer Science
Education, SIGCSE Bulletin, 33(2):58–59, June 2004.
[6] Ronal Berger and et al. Rhythm management product performance report: Q2 edition. Boston Scientific:Advancing Science for Life, June 2016.
[7] Darren Cofer and et al. Compositional verification of architectural models. In NFM 2012, LNCS 7226,
pp. 126140. Springer-Verlag Berlin Heidelberg, 2012.
[8] V. Cortellessa and V. Grassi. Role and impact of error propagation in software architecture reliability.
Technical Report TRCS 007/2006, 2006.
[9] Julien Delange. AADL in Practice: Design and Validate the Architecture of Critical Systems. Reblochon
Development Company, May 2017.
[10] Julien Delange and Peter Feiler. Supporting safety evaluation process using aadl. SEI, 7th Layered
Assurance Workshop, 2013.
[11] Peter Feiler and David Gluch. Model-Based Engineering with AADL. Addison-Wesley, 2013.
[12] Peter H. Feiler, David Gluch, and John D. McGregor. An Architecture-Led Safety Analysis Method.
In 8th European Congress on Embedded Real Time Software and Systems (ERTS 2016), TOULOUSE,
France, January 2016.
[13] Peter Fenelon and John A McDermid. An integrated toolset for software safety analysis. Journal of
Systems and Software, 21(3):279–290, June 1993.
[14] Jeremy Hsu. Toyota recalls 1.9 million prius hybrids over software flaw. IEEE Spectrum, Feb. 2014.
147
[15] John Hudak and Peter Feiler. Developing aadl models for control systems: A practitioners guide.
CMU/SEI-2007-TR-014, July 2007.
[16] Carnegie Mellon University/Software Engineering Institute(SEI). Open source aadl tool environment:
http://osate.org/, 2005.
[17] International Electrotechnical Commission International Organization for Standardization, Institute
of Electrical, and Electronics Engineers. Iso/iec/ieee 24765:2010 systems and software engineeringvocabulary, December 2010.
[18] Kishor S. Trivedi Katerina Goeva-Popstojanova. Architecture-based approach to reliability assessment
of software systems. An International Journal of Performance Evaluation, 45(1):179–204, April 2001.
[19] John C. Knight. Safety critical systems: challenges and directions. In The 24 rd International Conference on Software Engineering, ICSE. IEEE, May 2005.
[20] Nancy Leveson. A new accidental model for engineering safer systems. Safety Science, 42(4):237–270,
April 2004.
[21] Nancy Leveson. Engineering a safer world: Systems Thinking Applied to Safety. MIT Press, 2011.
[22] Nancy Leveson. An stpa primer:version 1. MIT publications, http://sunnyday.mit.edu/STPA-Primerv0.pdf, August 2013.
[23] Nancy Leveson and Clark Turner. An investigation of the therac-25 accidents. IEEE Computer,
26(7):18–41, July 1993.
[24] Jacques-Louis Lions. Ariane 5:flight 501 failure. In Technical Report: Report by the Inquiry Board,
July 1996.
[25] Ethan Mcgee and John McGregor. A decision support system for selecting between designs for dynamic
software product lines. PhD Dissertation in Computer Science, Clemson University, August 2018.
[26] Anitha Murugesan and et al. Compositional verification of a medical device system. In n ACM SIGAda
Ada Letters, volume 33, pages 5164. ACM, 2013.
[27] NASA-GB-8719.13. NASA Software Safety Guidebook: TECHNICAL STANDARD. NASA, March
2004.
[28] Michael Maile Natalya An and Daniel Jiang. Balancing the requirements for a zero false positive/negative forward collision warning. 10th IEEE Annual Conference on Wireless On-demand Network Systems and Services (WONS), Banff, Canada, March 2013.
[29] National
Heart,
Lung,
and
Blood
Institute.
https://www.nhlbi.nih.gov/health/health-topics/topics/pace, Feb.28 2012.
”explore
pacemakers”.
[30] Dennis Nolan. Safety and Security Review for the Process Industries Application of HAZOP, PHA,
What-If and SVA Reviews, 4th Edition. Elsevier, 2015.
[31] Standard Pacemaker. Pacemaker system specification, software quality research laboratory (sqrl),.
Boston Scientific Publisher, Jan.3 2007.
[32] David Gluch Peter Feiler and John Hudak. The architecture analysis and design language(aadl):an
introduction. SEI, CMU/SEI-2006-TN-011, February 2006.
[33] David Gluch Peter Feiler, John Hudak and Julien Delange. Architecture fault modeling and analysis
with the error model annex, version 2. SEI, CMU/SEI-2016-TR-009, June 2016.
148
[34] David Gluch Peter Feiler, Julien Delange and John D. McGregor. Architecture-led safety process.
Technical Report: CMU/SEI-2016-TR-012, Carnegie Mellon University/Software Engineering Institute, Pittsburgh, 2016.
[35] Julien Delange Peter Feiler, John Hudak and David Gluch. Architecture fault modeling and analysis
with the error model annex, version 2. Technical Report: CMU/SEI-2016-TR-009, Carnegie Mellon
University/Software Engineering Institute, Pittsburgh, 2016.
[36] Sam Procter. A development and assurance process for medical application platform apps. Ph.D.
Dissertation, Kansas State University, 2016.
[37] Sam Procter and John Hatcliff. An architecturally-integrated, systems-based hazard analysis for medical applications. In Twelfth ACM/IEEE Conference on Formal Methods and Models for Codesign,
MEMOCODE. IEEE, November 2014.
[38] Fryad M. Rashid and John D. McGregor. Augmenting a hazard analysis method with error propagation
information for safety-critical systems. 35th International System Safety Conference, Albuquerque,
New Mexico, USA, 21-25 August 2017.
[39] Fryad M. Rashid and John D. McGregor. Using formal notations to augment a hazard analysis method.
36th International System Safety Conference, Phoenix, Arizona, USA, 13-17 August 2018.
[40] Michael Beauregard Raymond Mikulak, Robin McDermott. The Basics of FMEA, 2nd Edition. CRC
Press, 2008.
[41] ATSB Transport Safety Investigation Report. In-flight upset event 240km north-west of perth, wa boeing
company 777-200, 9m-mrg. Australian Transport Safety Bureau, August 2005.
[42] NIST Planning report 02-3. The economic impacts of inadequate infrastructure for software testing.
May 2002.
[43] Felix Salfner and Miroslaw Malek. Reliability Modeling of Proactive Fault Handling. HumboldtUniversitt zu Berlin, Mathematisch-Naturwissenschaftliche Fakultt II, Institut fr Informatik, 2006.
[44] John Thomas. Extending and automating a systems-theoretic hazard analysis for requirements generation and analysis. Ph.D. Dissertation. MIT., 2013.
[45] John Thomas and Nanacy Leveson. Performing hazard analysis on complex, software and humanintensive systems. The 29th International System Safety Conference, Las Vegas, NV, August 2011.
[46] Nigel J. Tracey and et al. Integrating safety analysis with automatic test-data generation for software
safety verification. Proceedings of the 17th International Conference on System Safety, 1999.
[47] Malcolm Wallace. Modular architectural representation and analysis of fault propagation and transformation. Electronic Notes in Theoretical Computer Science, 141(3):53–71, December 2005.
[48] R. Sasseb G. Heinerb Y. Papadopoulosa, J. McDermida. Analysis and synthesis of the behaviour of
complex programmable electronic systems in conditions of failure. Reliability Engineering and System
Safety, 71(3):229–247, 2001.
149