
Clemson University TigerPrints, All Dissertations, 12-2018

Augmenting a Hazard Analysis Method with Error Propagation Information for Safety-Critical Systems

Fryad Khalid M. Rashid, Clemson University, fryad1979@gmail.com

Recommended Citation: Rashid, Fryad Khalid M., "Augmenting a Hazard Analysis Method with Error Propagation Information for Safety-Critical Systems" (2018). All Dissertations. 2259. https://tigerprints.clemson.edu/all_dissertations/2259

A Dissertation Presented to the Graduate School of Clemson University in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy, Computer Science

by Fryad Khalid M. Rashid, December 2018

Accepted by: Dr. John D. McGregor, Committee Chair; Dr. Brian Malloy, Committee Co-Chair; Dr. Amy Apon; Dr. Murali Sitaraman

Abstract

Safety-critical systems need specific activities in the software development life cycle to ensure that the system will operate safely. The objective of this dissertation is to develop a new safety analysis method to identify hazards. The method uses error propagation information and the internal structure of a system rather than its interfaces. We propose development procedures to augment STPA (System-Theoretic Process Analysis) with error propagation information derived from the architecture description of a system represented in the AADL (Architecture Analysis & Design Language).
We will focus on how the AADL error ontology can be used to assist in identifying errors, how those errors propagate among components, and whether the errors lead to hazards in the system. Our research shows that tracing error propagation leads to the discovery of hazards and additional information that other methods miss. The new safety analysis method, the Architecture Safety Analysis Method (ASAM), augments STPA with early design information and is thereby able to find more hazards, unsafe control actions, safety constraints and causes of unsafe control actions than STPA alone. Our method produces more false positives than STPA, but in safety analysis a false positive is preferred over a missed hazard. We use the AADL error ontology to rigorously describe system component errors and how they propagate among components. We illustrate this rigorous description through several examples, and we demonstrate that it yields hazards that an STPA analysis of the example did not find. In addition, we provide a mathematical notation and expressions so that, through the use of compositional reasoning, formal analysis and verification of the hazards can ensure that all causes of the hazards have been identified and that any developed safety constraints fully mitigate the hazards.

Dedication

I dedicate this doctoral dissertation to the cancer patients, nurses, physicians and professors in the MD Anderson Cancer Center at the University of Texas (Houston Campus).

Acknowledgments

I would like to express my deepest gratitude to my advisor, Dr. John D. McGregor, for the unwavering support, advice, and help that he has given me throughout this academic journey. I would like to specially thank my best friend, Dr. Ethan McGee, for his help and support throughout this dissertation. In addition, I would like to thank my other fellow Ph.D.
colleagues in the Strategic Software Engineering Research Group (SSERG), Yates Monteith and Wanda Moses, for their willingness to listen and offer constructive feedback about my research ideas. I would also like to thank my dissertation defense committee members for their feedback on this research and for their willingness to serve throughout the defense process. Finally, I would like to thank my scholarship program, The Higher Committee For Education Development (HCED) in Iraq, for helping to make my dreams come true. I am deeply thankful to my lovely wife, Dr. Tavga A. Hussain, for her love, support, motivation and sacrifice. Without her, this doctoral degree would never have been completed. I would also like to thank everyone who has offered encouragement and support throughout my entire graduate career. This includes my parents, my siblings, my former co-workers at Sulaimani Polytechnic University and Clemson University, and my close friends.

Table of Contents

Title Page
Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
1 Introduction
  1.1 Problem Statement
  1.2 Research Method
  1.3 Contributions
  1.4 Dissertation Statement
  1.5 Research Questions
  1.6 Dissertation Organization
2 Background
  2.1 System-Theoretic Process Analysis
  2.2 Architecture Analysis and Design Language
  2.3 Error-Model Annex
  2.4 Safety Verification Methods
  2.5 System-Theoretic Process Analysis Augmentation Reasons
  2.6 False Positive and False Negative in Safety-Critical Systems
3 Feedback Control Loop Architecture
  3.1 Augmenting a Hazard Analysis Method with Error Propagation Information
  3.2 Additional Scenarios to the Augmented Method
  3.3 Using Formal Notations to Augment a Hazard Analysis Method
  3.4 Introducing Feedback Control Loop Architecture with Compositional Reasoning
4 Analysis and Evaluation
  4.1 Demonstrating Three-Way Communication Format
  4.2 Demonstrating Error Mitigation by Safety Constraints
  4.3 Demonstrating Safety Constraints Verification
  4.4 Demonstrating Notations and Expressions for Error Flows and Safety Constraints
5 Discussion
  5.1 Difference Between Two-Way and Three-Way Communication Formats
  5.2 Justification of & Limitations of Evaluation
  5.3 ASAM Field Contributions
6 Related Work
  6.1 Works Related to ASAM
  6.2 Additional Related Works
7 Future Work
  7.1 ASAM's Future Work in Adaptive Cruise Control System
  7.2 ASAM's Future Work in Medical Embedded Device - Pacemaker
  7.3 Summary
8 Conclusion
Appendices
  A Error Ontology
  B STPA Example
  C Safety-Critical System Terminologies
  D List of Abbreviations
  E ASAM Annex Language Reference
  F XText Grammar
  G ASAM's plug-in
  H Feedback Results Representation
Bibliography

List of Tables

3.1 Unsafe Functional Behavior
3.2 Finding More Possibilities
3.3 Mathematical Notations for Error Flows and Safety Constraints
3.4 Mitigating the unsafe control actions
4.1 Evaluation Examples Overview
7.1 Identify FP/FN/TP/TN in ACC system
7.2 Identify FP/FN/TP/TN in pacemaker

List of Figures

2.1 Feedback Control Loop in STPA
2.2 Flow Visualized
3.1 Feedback control loop augmented with error propagation and finite state machines
3.2 The augmented method for additional scenarios
3.3 A framework for feedback control loop inputs/outputs and functions
3.4 Introduce feedback control loop architecture with assume-guarantee reasoning
4.1 Three-Way Communication Format for Adaptive Cruise Control System Architecture
4.2 ASAM's report for Adaptive Cruise Control System Architecture
4.3 Three-Way Communication Format for Train Automated Door Control System Architecture
4.4 ASAM's report for Train Automated Door Control System Architecture
4.5 Three-Way Communication Format for Embedded Medical Device
4.6 ASAM's report for Pacemaker System Architecture
4.7 Three-Way Communication Format for Isolette System Architecture
4.8 ASAM's report for Isolette System Architecture
4.9 ASAM's Error Mitigation report for ACC System Architecture
4.10 ASAM's report for Non-Mitigated Error in ACC system
4.11 ASAM's Error Mitigation report for Train Automated Door Control System Architecture
4.12 ASAM's report for Non-Mitigated Error in Train Door
4.13 ASAM's Error Mitigation report for pacemaker
4.14 ASAM's report for Non-Mitigated Error in pacemaker
4.15 ASAM's report about safety constraints verification for train door
4.16 ASAM's report about safety constraints are not verified for the train door
4.17 ASAM's report about safety constraints verification for pacemaker
4.18 ASAM's report about safety constraints are not verified for the pacemaker
5.1 Two-way versus Three-way communication format
5.2 Control Table versus Error Flows
5.3 ASAM's contributions
7.1 ASAM's future work in ACC example
7.2 ASAM's future work in pacemaker example
1 Feedback control loop for automated door control system for train
2 Unsafe control actions for train door controller
3 Process model of the controller
4 Context of the control actions
5 Feedback questionnaire
6 Results of the votes

Chapter 1

Introduction

John Knight wrote that "Safety-critical systems are those systems whose failure could result in loss of life, significant property damage, or damage to the environment" [19]. His primary concern was the consequences of failure. There are many examples of safety-critical systems, such as automotive systems, medical devices, aircraft flight controls and nuclear systems. Malfunctions in these systems could have significant consequences such as severe injuries and mission failure. These systems depend on software to the extent that they could not work without it, so they should be designed, verified, and validated very carefully to make sure that they obey the system specifications and requirements and are free from errors. One of the motivations for this research is a study by the National Institute of Standards and Technology [42], which found that 70% of software faults are introduced during the requirements and architecture design phases, while 80% of those faults are not discovered until system integration testing or later in the development life cycle. The cost of rework to fix a software problem at that point is much higher than the cost of finding and fixing it during the requirements and design phase.
This presents a big challenge in designing safety-critical embedded systems: addressing system faults that are often not discovered until late in the development life cycle. The software in safety-critical systems may not harm people directly, but it can control equipment that does. There are many examples of safety systems that have failed due to software-related faults, such as the ESA Rocket [5], Therac-25 [23], the PCA Pump [37], the Toyota Prius [14], ARIANE 5 [24], the Boeing 777-200 [41], and pacemakers from Medtronic, Biotronik and St. Jude Medical [6]. This evidence shows that software errors, bugs and flaws in the requirements can have catastrophic consequences. For that reason, software operating critical functions should be very carefully designed and analyzed to avoid and prevent errors as well as their propagation. From the system safety perspective, interaction complexity between embedded software and hardware platforms increases the criticality of interactions between system components. For that reason, safety can be described as an emergent property that arises from the interaction of system components. That property requires understanding how the components are connected and how they interact with each other in the system design. This understanding is required because the concept of emergent behavior carries the idea that unsafe interaction among components can lead to violated safety requirements. From the software perspective, we can notice in the early system design phase that software components are scattered across many nodes, which may introduce problems such as late or early delivery of data, loss of system operation or concurrent access to the same resources. We can also see in the early design phase that the software may generate values beyond expected boundaries, that requirements may be inconsistent or that timing requirements may be mismatched.
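As a rough illustration of how one such problem, a value beyond expected boundaries, could be caught at design time, consider the following Python sketch. It is not part of the dissertation's method; all port names and ranges are hypothetical. It compares each producer port's declared output range with its consumer's assumed input range:

```python
# Toy design-time consistency check (illustrative only): flag connections
# where a producer's declared output range exceeds the consumer's assumed
# input range, i.e. "value beyond expected boundaries" faults, before any
# code is implemented. All names and ranges below are hypothetical.

def range_mismatches(connections):
    """connections: list of (producer, out_range, consumer, in_range)
    tuples, each range an inclusive (lo, hi) pair."""
    problems = []
    for producer, (out_lo, out_hi), consumer, (in_lo, in_hi) in connections:
        if out_lo < in_lo or out_hi > in_hi:
            problems.append(f"{producer} may send values outside "
                            f"{consumer}'s assumed range [{in_lo}, {in_hi}]")
    return problems

design = [
    # temperature sensor can emit -40..150 C, controller assumes 0..100 C
    ("temp_sensor", (-40, 150), "controller", (0, 100)),
    # controller commands 0..100% duty cycle, actuator accepts 0..100%
    ("controller", (0, 100), "heater_actuator", (0, 100)),
]

issues = range_mismatches(design)  # flags only the sensor->controller link
```

A mismatch flagged this way corresponds to a fault injected in the requirements or design phase that would otherwise surface only during integration testing or operation.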
We know that faults cause errors; if those errors are not mitigated, they propagate into the implementation and may not be caught by testing efforts. If these problems remain undetected during software testing, they could lead to serious errors and potential injury. From the safety and software perspectives, we need to address software aspects according to the safety requirements to build a safe system. The real challenges start here: how can one develop a safe system that fulfills safety constraints in the development of software-intensive systems? At the same time, hazard analysis techniques cannot describe the dynamic error behavior of the system or error propagation among system components. Because of that, traditional hazard analysis techniques depend on decomposition of the system with respect to the hierarchy of failure effects instead of the system's architectural model. For that reason, a new hazard analysis method is needed that supports modeling and analyzing dynamic error behavior to find hazards in early development phases. To the best of our knowledge, no guidance exists for augmenting existing hazard analysis techniques to find the hazards and additional information that they missed. Also, to the best of our knowledge, no formalized guidance exists for improving the rigor of hazard analysis methods.

1.1 Problem Statement

In this dissertation, we address the incompleteness of hazard analysis in finding faults that exist beyond just the controller. We provide a systematic procedure to augment a recent hazard analysis approach, STPA (System-Theoretic Process Analysis), with error propagation information and dynamic support contributors to find additional hazards that STPA did not find. In addition, the augmented method addresses the lack of mathematical models in safety analysis to ensure that all causes of a fault have been identified.
Finally, we also address the lack of an ability to ensure that a safety constraint fully mitigates a fault. The problem statement can be summarized as follows. The development of safety-critical software would be improved if:

1. Guidance were available on how to augment the fault-finding analyses to address incompleteness.

2. Guidance were available on how to formalize analysis to improve the rigor of the hazard analysis methods.

3. Recent hazard analysis methods were linked with compositional reasoning techniques to verify the safety constraints against the faults they mitigate.

1.2 Research Method

In this dissertation, we analyze the system before the architecture design is completed. The purpose of our work is to allow a stakeholder or safety analyst, at design time, to improve the safety of the system and build a safe product. Building on work previously done [38], this research seeks to develop a safety analysis method that provides a deeper analysis of the system under review. The method will be developed in the form of an architectural model that describes internal failures in one of the major components (sensor, controller, actuator, controlled process, and controller input device) of the feedback control loop architecture, and we will show how an internal failure affects the other components in the form of three-way interactions. We will illustrate the method through several scenarios. We will provide a theoretical foundation in the form of mathematical models for each communication channel and error in the feedback control architecture. This formal background helps determine the expected behavior of the system, because most system designs start with a system specification and there is not necessarily a complete description of the system's behavior. In addition to documenting and describing the error, we will also verify that the hazardous situation which causes the unsafe behavior of the system is mitigated.
This evaluation of system safety verification helps to verify the safety constraints against the system/component design.

1.3 Contributions

The primary contributions of the dissertation will be as follows:

1. The introduction of a method of augmenting STPA (System-Theoretic Process Analysis) with error propagation information and error behavior state machines using AADL (Architecture Analysis and Design Language).

2. The introduction and analysis of a mathematical notation for each error flow in the feedback control loop architecture.

3. The identification of a method of verifying safety constraints and mitigating unsafe control actions in the hazard analysis process.

1.4 Dissertation Statement

In this dissertation, we assume that:

1. Error propagation information and error behavior state machine analysis can be applied to augment hazard analysis methods to find new hazards as well as specific causes and effects. Recent hazard analysis methods such as STPA (System-Theoretic Process Analysis) can be adapted and used to provide support for this type of augmentation under different scenarios and development contexts.

2. The increasing complexity and criticality of modern computing systems are driving the need for enhanced hazard analysis methods. This augmentation is used to support modeling the dynamic error behavior of the system during hazard analysis and to represent a formal framework for hazard identification in the system architecture.

3. Safety-critical systems drive the need to understand the feedback control loop architecture in order to manage variations, component failures and emergent behavior between system components that results in hazards. Our framework can be applied to augment feedback control loops with mathematical notations and expressions to find hazardous conditions in a design, to improve the rigor of STPA, and to formalize error descriptions in systems described in AADL.

4.
Compositional reasoning, especially assume-guarantee contract techniques, can be applied to augment safety analysis methods to verify the safety requirements identified for each component. The feedback control loop architecture in STPA can be adapted and used to support the contract.

1.5 Research Questions

In this dissertation, we attempt to answer the following research questions for safety-critical systems:

• RQ1) Can the safety of the system be improved by augmenting a hazard analysis method with error propagation information? We need to provide a deeper analysis of the system under review. This allows us to find more faults before instantiating the product.

• RQ2) Can we analyze the internal failures of components in order to show their effects on the other components during the hazard analysis process? Modern information systems have many components, and the interactions among them can make a system unreliable or unsafe. We need to know how the components interact with each other in the early design phase. For example, what will happen to the system if one of the components has an internal failure?

• RQ3) How does knowledge of the interaction among the components in the early design phase help us to identify internal failures? Safety can be defined as an emergent property of the system. This allows us to deal with interactions among system components.

• RQ4) How can a propagated error be presented in the form of a mathematical model? We need to analyze system behavior in a formal way to anticipate unsafe interactions between system components. For example, which elements/parameters affect the other components?

• RQ5) How can we identify the assume-guarantee contract for each component in the feedback control loop architecture? We need to know specific information about the specification and implementation of each component in the feedback control loop.
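The kind of analysis these questions call for can be sketched in miniature. The following toy Python example is not the dissertation's ASAM method and uses entirely hypothetical component and error names; it propagates an error token along a simplified sensor-controller-actuator loop and checks whether some component's stated guarantee mitigates the error before it escapes the loop:

```python
# Toy sketch: propagate an error token along a feedback control loop and
# report which components it reaches before some component's guarantee
# mitigates it. All component and error names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    # errors this component promises to absorb (its "guarantee")
    mitigates: set = field(default_factory=set)

def propagate(loop, error):
    """Walk the loop in order; return (components reached, mitigated?)."""
    reached = []
    for comp in loop:
        reached.append(comp.name)
        if error in comp.mitigates:
            return reached, True   # mitigated here, propagation stops
    return reached, False          # error escapes the loop unmitigated

sensor     = Component("sensor")
controller = Component("controller", mitigates={"OutOfRange"})
actuator   = Component("actuator")

loop = [sensor, controller, actuator]

# An out-of-range sensor reading is caught by the controller's guarantee,
# but a late delivery is not mitigated anywhere and reaches the actuator.
path, ok   = propagate(loop, "OutOfRange")
path2, ok2 = propagate(loop, "LateDelivery")
```

In assume-guarantee terms, a component's `mitigates` set plays the role of its guarantee; a real analysis would operate on AADL error types and EMV2 propagation declarations rather than on strings.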
1.6 Dissertation Organization

The remainder of the dissertation is organized as follows: Chapter 2 provides necessary background on STPA (System-Theoretic Process Analysis), AADL (Architecture Analysis and Design Language), EMV2 (Error-Model Annex version 2), safety verification methods, reasons for augmenting STPA, and false positives/false negatives in safety-critical systems; Chapter 3 presents the feedback control loop architecture augmentations: augmenting STPA with error propagation information, mathematical notation for formalized augmentation of STPA, and augmenting STPA with compositional reasoning; Chapter 4 presents our analysis and evaluation; Chapter 5 provides justifications and limitations of the evaluation; Chapter 6 discusses related work; Chapter 7 introduces plans for future enhancements of each safety-critical application example; and Chapter 8 provides conclusions based on the differentiation between our method and the current accepted standard.

Chapter 2

Background

In this chapter, we present relevant background material for this dissertation: an overview of STPA (System-Theoretic Process Analysis), AADL (Architecture Analysis and Design Language), EMV2 (Error-Model Annex v2), safety verification methods, reasons for augmenting STPA, and false positives/false negatives in safety-critical systems.

2.1 System-Theoretic Process Analysis

STAMP (Systems-Theoretic Accident Model and Processes) is a causality model built on system theory used to analyze system accidents. In this model of system safety, accidents can occur when components fail or when interactions between system components are inadequately handled by the control system. Therefore, STAMP views system safety as a control problem and introduces three main concepts: safety constraints, hierarchical control structures and process models [20]. STPA (System-Theoretic Process Analysis) is a top-down hazard analysis approach built on STAMP.
The idea behind this approach is to avoid accidents by modifying the design of the system before implementation. The goal is to identify potential causes of accidents, that is, scenarios that may lead to losses, so they can be controlled or eliminated in the system design or operations before damage occurs. In a nutshell, it provides scenarios to control and mitigate the hazards in the system design. The method consists of four steps to produce these scenarios [21]:

1. The stakeholder establishes the fundamentals of the analysis by identifying accidents and the hazards associated with those accidents. For example, if we assume that the accident is two aircraft colliding with each other, then the hazards could be that the two aircraft violate minimum separation or that the two aircraft enter unsafe atmospheric regions.

2. The stakeholder designs a feedback control loop for the system to identify major components such as sensors, controllers, actuators and the controlled process. Then, the stakeholder labels the control/feedback arrows as shown in figure 2.1. Essentially, the information shown in figure 2.1 about each element is extracted into the feedback control loop. For example, we need to know that the process model is the mental model of the controller; it is used to store information about variables and their values. The sensor is used to measure the values of the variables. The actuator is used to follow the instructions of the controller. The controlled process is used to implement commands that come from the controller through the actuator and send the result back to the controller through the sensor. The guide words among elements are used to identify hazards.

3. The stakeholder identifies unsafe control actions that could lead to hazardous states. The stakeholder should identify the following unsafe control actions, and then translate them into corresponding safety constraints:

• A control action is needed for safety, but it is not provided.
• An unsafe control action is provided.
• A safe control action is provided too late or too early.
• A safe control action is stopped too soon or applied too long.

4. The stakeholder identifies causal factors for the unsafe control actions. The safety analyst determines how each hazardous control action could occur by identifying the process model variables for the controller in the feedback control loop and analyzing each path to find the cause of the unsafe control actions.

An example STPA analysis is provided in appendix B.

Figure 2.1: Feedback Control Loop in STPA

2.2 Architecture Analysis and Design Language

The Architecture Analysis and Design Language (AADL) is an architecture modeling language which uses a common description language to bind software and hardware elements together into a unified model. The language is a standard of the Society of Automotive Engineers (SAE) and is actively maintained. The language can be extended through the use of annexes; each annex is standardized independently and has its own syntax and functionality [32].

2.2.1 Language Overview

The language has keyword constructs for representing the execution platform, hardware components (device, processor, bus, memory), software components (data, subprogram, thread, process) and the integration of the two (system). AADL is a well-known modeling language for safety-critical systems because it captures the hardware and software aspects of a system and their interactions. It is also possible to create (abstract) components in the early stage of system design that can then be refined into either software or hardware at a later time. Additionally, AADL includes connections among components of various types such as ports, bus access and data access [32]. AADL components are composed of two separate definitions. The first is a specification of the component. It defines the component type and its external interfaces used to communicate with the rest of the world. The second is an implementation.
It defines the inside of the component and how the interfaces are connected to sub-components to provide services. One important feature of the specification is the port. A port represents an interaction point for an event/notification, a data transfer or a combination of the two. Ports can be specified for components such as devices, threads, processes and systems. There are three types of ports [11]:

1. The event port represents the reception of a notification and carries a signal without any data, for example, a sensor signaling that a temperature is above or below a defined threshold. When using this kind of port, the software waits for an incoming event.

2. The data port represents an interface which receives a data flow and keeps track of the latest value. The port value is updated as soon as possible when a new data value is received.

3. The event data port represents an interface that combines the event port and the data port to exchange data and notification in the same entity. It sends the signal (event) together with the associated data.

AADL also introduces the concept of flows. A flow enables the representation and analysis of logical paths through an architecture, and can reflect scenarios that form the basis for system-level analysis. AADL can declare flow specifications and flow implementations for components. The flow specification defines a logical flow from an input port to an output port of a component. The logical flow can represent a data flow, a fault/event flow or a control flow. The flow specification can define three types of flows for a component [9, 15]:

1. The flow source defines a flow which originates within a component and emerges from the component through an outgoing port.

2. The flow sink defines a flow which terminates within a component and enters the component through an incoming port.

3.
The flow path defines a flow that enters a component through an incoming port and emerges through an outgoing port.

The flow implementation elaborates the flow specification through sub-components. It starts with the incoming port of the flow specification, follows a sequence of connections and sub-components and ends with an outgoing port. The last flow declaration type is the end-to-end flow, which defines a complete flow path from the starting component to the final component in the system. In this dissertation, we use the flow concept to analyze important system characteristics. For instance, we show (in listing 2.1 and figure 2.2) a flow within a vehicle speed control system. This flow originates at the Brake Pedal Sensor, traverses the Speed Control System and terminates at the Throttle Actuator. When the driver pushes the brake pedal, the Brake Event passes through the outgoing event data port via the flow source to the Speed Control System. The Speed Control System receives the Brake Event through its incoming event data port, adjusts the throttle setting value to decrease the speed of the vehicle and sends the value via the flow path to its outgoing event data port. The Throttle Actuator reduces the fuel into the engine based on the setting value received through its incoming event data port. The flow sink shows the impact of the Brake Event on the Throttle Actuator and stops the flow.
device Brake_Pedal_Sensor
features
  Brake_Event : out event data port;
flows
  Brake_Flow : flow source Brake_Event;
end Brake_Pedal_Sensor;

system Speed_Control_System
features
  Brake_Event : in event data port;
  Throttle_Adjusting : out event data port;
flows
  Cruise_Flow : flow path Brake_Event -> Throttle_Adjusting;
end Speed_Control_System;

device Throttle_Actuator
features
  Throttle_Adjusting : in event data port;
flows
  Throttle_Flow : flow sink Throttle_Adjusting;
end Throttle_Actuator;

Listing 2.1: Flow AADL Example

Figure 2.2: Flow Visualized

2.3 Error-Model Annex

In designing safety-critical systems, we need to show that the system is reliable and resilient to different types of failures. For that purpose, a number of different safety analysis methods are required in order to show that the system is safe and has appropriate measures to handle critical failures. The main objective of a Model-Based Engineering (MBE) approach is to automate the development process, for example by building model-based tools that automate the safety validation process. The core AADL language does not contain the language constructs needed for safety analysis. For that reason, the core language is extended with an annex dedicated to safety analysis, Error Model Annex v2 (EMV2), which provides the semantics required for capturing safety information and showing error propagation through the architecture. The Error Model Annex is used to capture what could go wrong in the architecture: which errors could arise from the components, how those errors propagate through the architecture and how they impact component behavior. The annex has four levels of abstraction. In order, they are [9, 11, 33]:

1.
Error Ontology: Represents error types in a hierarchical structure to support hazard analysis and to assist modelers in making sure that they have considered the different types of error propagation in the interactions among system components. The error ontology distinguishes errors within the architecture and how each type can propagate or impact component behavior. We provide the hierarchical structure of the error ontology in appendix A. Here, we give a brief description of each type of error [10, 38]:

• Service Errors: Errors related to delivering a service. Service errors are categorized as omission errors, where no service is delivered (such as loss of a message or command), and commission errors, where an unexpected service is provided (such as unintended incoming data).

• Value Errors: Errors related to the value domain of a service. Value errors are categorized as value errors for individual service items (like an out of range sensor reading), value errors for sequences of service items (like a bounded value change) and value errors related to the service as a whole (like being out of calibration).

• Timing Errors: Errors related to the time domain of a service. Timing errors are categorized as timing errors for individual service items (like early/late item delivery), timing errors for sequences of service items (like rate errors) and timing errors related to the service as a whole (like early/delayed services).

• Replication Errors: Errors related to the delivery of replicated services, for example replicated service items delivered to one recipient or to multiple recipients.

• Concurrency Errors: Errors related to the behavior of concurrent systems, like tasks executing concurrently to access shared resources. These errors are distinguished between race condition errors and mutual exclusion errors.
• Access Control Errors: Errors related to the operation of access control services, like authentication and authorization errors.

Developers can use the error ontology in AADL, which is defined in ErrorLibrary.aadl in OSATE (Open Source AADL Tool Environment), when they design a system. If the existing names do not fit their system, developers can define their own error types in a package, as shown in listing 2.2.

package My_Error_Types
public
  annex EMV2 {**
    error types
      BrakeEventError : type;
      BadValue : type extends BrakeEventError;
      NoValue : type extends BrakeEventError;
    end types;
  **};
end My_Error_Types;

Listing 2.2: Define Error Types in AADL

2. Error Flows in the Architecture: Represents how errors flow through the system architecture, after we have identified the error types for the system in the previous step. In this step, EMV2 allows us to describe:

• Error Propagation: Defines the incoming and outgoing errors for a component. An error propagation specifies which error types the component can expect to receive or send, but it does not specify how incoming and outgoing error types relate to each other. For example, it does not say how an error on an incoming port can generate an error on an outgoing port, or how an error in a processor affects a process and results in an outgoing propagation on its data port. The error flow solves this relation problem.

• Error Flow: Defines the relationship between incoming and outgoing error propagations. There are three types of error flows:

(a) Error Source: Specifies that the error originates from this component. In particular, when the component has an internal failure, it generates the error being propagated.

(b) Error Path: Specifies the transmission of an incoming error propagation to an outgoing error propagation.
The component itself propagates the error, passing it from the incoming port to the outgoing port.

(c) Error Sink: Specifies that the error stops at this component. The error can be handled or mitigated at this component before it impacts the architecture.

In listing 2.3, we show how to define error propagations for the components Brake Pedal Sensor, Speed Control System and Throttle Actuator. The device Brake Pedal Sensor is an error source for any error type that propagates out of the event data port Brake_Event. The device Throttle Actuator is an error sink for any error type. The Speed Control System is an error path, because an error of any type can propagate into the system through the incoming event data port and propagate out through the outgoing event data port Throttle_Adjusting. We do not know how the propagated error impacts the component at this stage; we will show that in the next step.

device Brake_Pedal_Sensor
features
  Brake_Event : out event data port;
annex EMV2 {**
  use types ErrorLibrary;
  error propagations
    Brake_Event : out propagation {AnyError};
  flows
    Sensor_Flow : error source Brake_Event {AnyError};
  end propagations;
**};
end Brake_Pedal_Sensor;

system Speed_Control_System
features
  Brake_Event : in event data port;
  Throttle_Adjusting : out event data port;
annex EMV2 {**
  use types ErrorLibrary;
  error propagations
    Brake_Event : in propagation {AnyError};
    Throttle_Adjusting : out propagation {AnyError};
  flows
    Control_Flow : error path Brake_Event -> Throttle_Adjusting;
  end propagations;
**};
end Speed_Control_System;

device Throttle_Actuator
features
  Throttle_Adjusting : in event data port;
annex EMV2 {**
  use types ErrorLibrary;
  error propagations
    Throttle_Adjusting : in propagation {AnyError};
  flows
    Actuator_Flow : error sink Throttle_Adjusting {AnyError};
  end propagations;
**};
end Throttle_Actuator;

Listing 2.3: Error Propagation Concept in AADL

3. Component Error Behavior: Represents how errors impact a component's behavior. For instance, an error can change the operating mode of a component from nominal mode to failure mode. For that purpose, EMV2 provides an error behavior state machine (EBSM) to capture the safety behavior of a component according to its states, internal failure/error events and errors coming from the outside world. An EBSM allows stakeholders to define the states of a component with regard to its error behavior in its operational environment. For example, we consider two states for each component: the Operational state (the component is active and operating without any error) and the Failed state (the component is possibly active, but not without error). A transition defines the condition under which a state change can occur. Each transition consists of a source state, which is the state of the component before the transition, and a destination state, which is its state after the transition is triggered. The condition is an error event that must be raised to trigger the transition. We show the error behavior of a component in listing 2.4, with two transitions. The first transition is from the Operational state to the Failed state when the Failure error event is raised. The second is from the Failed state to the Operational state when the Recovery event is raised.
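The two-state machine just described can also be sketched in ordinary code. The following Python fragment is an illustrative sketch only — the dictionary encoding and function names are ours, not EMV2 semantics — mirroring the states, events and transitions of listing 2.4.

```python
# Sketch of the Fail_And_Recover error behavior state machine:
# two states (Operational, Failed) and two transitions triggered by
# the Failure_E and Recovery_E events, as in listing 2.4.
TRANSITIONS = {
    ("Operational", "Failure_E"): "Failed",
    ("Failed", "Recovery_E"): "Operational",
}

def step(state, event):
    """Return the next state; events with no matching transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

def run(events, initial="Operational"):
    """Apply a sequence of error/recover events and return the final state."""
    state = initial
    for e in events:
        state = step(state, e)
    return state
```

For example, run(["Failure_E", "Recovery_E"]) ends in the Operational state, while run(["Failure_E"]) ends in the Failed state.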
error behavior Fail_And_Recover
events
  Failure_E : error event;
  Recovery_E : recover event;
states
  Operational : initial state;
  Failed : state;
transitions
  t1 : Operational -[Failure_E]-> Failed;
  t2 : Failed -[Recovery_E]-> Operational;
end behavior;

Listing 2.4: Error Behavior State Machine

4. Composite Error Behavior: Represents the state of a component according to the states of its subcomponents. In other words, the state of a system depends on the states of its components. For example, a computer system is in the Failed state when one of its sub-parts (processor or memory) has failed. EMV2 supports this through composite error behavior specifications. We show this in listing 2.5: when a system consists of two sensors that provide temperature readings, the overall system is in the Failed state only if both sensors have failed; otherwise, it is in the Operational state.

composite error behavior
states
  [sensor1.Failed and sensor2.Failed]-> Failed;
  [sensor1.Operational or sensor2.Operational]-> Operational;
end composite;

Listing 2.5: Composite Error Behavior

2.4 Safety Verification Methods

Generally, verification concentrates on showing consistency between a system's implementation and its functional specification, but for safety-critical systems that is not enough: the implementation must also satisfy the safety requirements. Safety verification is therefore different from functional verification. Functional verification ensures that the system fully satisfies its specification, whereas safety verification uses the results of safety analysis methods to ensure that the system meets its safety constraints [46].
We can classify safety verification into two types:

2.4.0.1 Safety Verification in Software Testing

Software safety verification can be achieved in two ways [2]:

1. Dynamic Safety Testing: requires execution of the software; the results are evaluated with respect to the system's safety features. In addition, testers monitor the behavior of the system during testing, for example by entering erroneous or out-of-range values.

2. Static Safety Analysis: examines the code and design documents of the system. It is used to verify the accuracy of the safety features with regard to the actual system/software.

2.4.0.2 Safety Verification in Software Architecture

Compositional reasoning partitions the formal analysis of a system architecture into a sequence of verification tasks that correspond to the decomposition of the system architecture. This partitioning of the verification effort helps to find a proof for each subcomponent within the system architecture so that the analysis can handle a large system design. In a nutshell, system-level properties depend on the properties of the components within the system; from a safety perspective, system safety depends on the safety properties of the components. For that purpose, AGREE (Assume Guarantee REasoning Environment) was created for managing proofs of components described in AADL [7]. AGREE is a compositional verification tool that can be used to confirm the behavior of a component. It follows the popular assume-guarantee reasoning model, which states that, provided the assumptions about a component's inputs are met, the component can provide certain guarantees about its outputs. The guarantees correspond to component requirements, and the assumptions correspond to the environmental requirements that were used in verifying the component requirements. The contract specifies the information needed to reason about the component's interaction with the other components of the system [26].
As an example, let's build a safety constraint based on erroneous values. In an ACC (Adaptive Cruise Control) system, the sensor should measure the distance between vehicles within a specific range (0-100 feet). Any value beyond that limit is likely erroneous, and distance cannot be negative. We focus on the actions of the controller. If the controller assumes that it will receive values in this range, it will perform a safe action whenever the values it receives are in the specified range. Outside of this range, a safe action cannot be guaranteed. An implementation of this example is shown in listing 2.6.

system sensor
features
  distance : out data port Base_Types::Integer;
annex agree {**
  guarantee "Sensor output is between 0 and 100" :
    distance >= 0 and distance <= 100;
**};
end sensor;

system implementation sensor.i
annex agree {**
  assert distance = if distance < 0 then 0
                    else if distance > 100 then 100
                    else distance;
**};
end sensor.i;

system controller
features
  distance : in data port Base_Types::Integer;
annex agree {**
  assume "Controller should receive values between 0 and 100" :
    distance <= 100 and distance >= 0;
**};
end controller;

Listing 2.6: AADL AGREE Annex ACC system example

Additionally, the Strategic Software Engineering Research Group (SSERG) at Clemson University extended AGREE to XAGREE (eXtended Assume Guarantee REasoning Environment) to support the inheritance features of the AADL language. This addition allows XAGREE to produce more maintainable and less complex verification assets than AGREE for architectures that incorporate inheritance, such as Software Product Lines and Dynamic Software Product Lines [25].
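The contract in listing 2.6 can be mimicked as a plain runtime check. The following Python sketch is illustrative only — AGREE discharges such obligations statically with a model checker, and the function names here are ours — but it shows why the sensor's guarantee is sufficient to satisfy the controller's assumption.

```python
def sensor_output(raw):
    # Mirrors the sensor implementation's assert: clamp any raw reading
    # into the guaranteed 0..100 range.
    return max(0, min(100, raw))

def controller_assumption(distance):
    # Mirrors the controller's assume clause: inputs must lie in 0..100.
    return 0 <= distance <= 100

def loop(raw):
    # Because the sensor guarantees its output range, the controller's
    # assumption holds for every value the sensor can produce.
    d = sensor_output(raw)
    assert controller_assumption(d)
    return d
```

For instance, a raw reading of -5 is clamped to 0 and a reading of 250 to 100, so the controller's assumption is never violated by the sensor.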
2.5 System-Theoretic Process Analysis Augmentation Reasons

From our experience applying STPA to different types of safety-critical systems, a systematic literature review that we conducted in the domain and the understanding of the method in the system safety community, we have come to the conclusion that STPA needs significant augmentation, for the following reasons:

1. STPA lacks precise information about each major component: STPA is a system method built on system theory. Nancy Leveson states, "The systems approach focuses on systems taken as a whole, not on the parts taken separately" [21]. This means that system theory leads STPA to provide accidents, hazards and causes at the system level, not the component level. In AADL, however, systems include hardware and software components, as described in section 2.2.1. This allows stakeholders to analyze the system deeply and identify specific causes of system failure. The causes leading to system failure can be software, hardware or both together.

2. STPA uses a feedback control loop as a system model but does not use the feedback control loop as an architectural model: STPA is a recent hazard analysis technique and depends on a monolithic decomposition of the system with respect to the hierarchy of failure effects instead of the system's architectural model. The reason for using a monolithic decomposition is system complexity. Nancy Leveson states, "Increased interactive complexity and coupling make it difficult for the designers to consider all the potential system states or for operators to handle all normal and abnormal situations and disturbances safely and effectively" [20]. This means that system theory does not let STPA access lower subcomponents in the feedback control loop because of system complexity. At the same time, software engineering principles allow stakeholders to decompose systems into subsystems to solve the complexity problem.
For example, in AADL, we can decompose the system into two layers, with each layer containing subsystems. We can bind the layers together, and we can show error propagation between the layers. This deeper analysis could help stakeholders identify the source of faults or hazards in the feedback control loop architecture. For instance, system software can fail because of error propagation from hardware to software. If the stakeholder upgrades the software to resolve the problem, the problem will come back after the upgrade, because the problem is not the software but the hardware; the stakeholder needs to change the hardware, not upgrade the software, to resolve the problem.

3. STPA does not use the reliability principle (fault → error → failure): STPA looks for causes of hazards based on system thinking, and it considers the result of a hazard to be an accident. There is no component reliability concept in STPA because it analyzes safety at the system level. Nancy Leveson states, "Safety then can be treated as a dynamic control problem, rather than a component reliability problem" [22]. She also states, "In general, words like 'failure' and 'error' should be avoided, even as causes, because they provide very little information and are often used to avoid having to provide more information or to think hard about the cause" [22]. However, reliability engineering principles ground system safety in the reliability of each individual component: the system will not be safe if its components have failures. Additionally, in AADL, we use the error ontology to support hazard analysis, which allows stakeholders to identify hazards based on the reliability concept. The basic reliability principles needed in this dissertation are: a fault is the root cause of an error, an error is the difference between the measured value and the correct value, and a failure is the termination of a product's ability to perform a required function.

4.
STPA does not use a formalization concept: STAMP provides the framework that STPA uses to investigate and analyze accidents. This is a major limitation of STPA. Nancy Leveson states, "Three basic constructs underlie STAMP: safety constraints, hierarchical safety control structures, and process models" [21]. This limitation tells us that STPA has no connection with formal principles such as system architecture analysis, mathematical models, state machine analysis, error propagation, etc. However, the formalization concept helps improve the rigor of safety analysis methods.

2.6 False Positives and False Negatives in Safety-Critical Systems

Safety analysts and stakeholders anticipate system failure before it occurs. They apply preventative techniques to the system to avoid the failure, or they reduce the time to repair by preparing the system for upcoming failure events. For that purpose, stakeholders have proposed proactive fault handling methods to deal with faults in order to prevent system failure or to minimize system downtime. This approach combines failure prediction and proactive action: preventative actions are triggered by the prediction of upcoming failure events, and repair actions are prepared for imminent failures to reduce the time to repair. Thus, these systems require failure prediction to judge a faulty situation or to decide how to handle a fault. It is quite possible that the failure predictor will be wrong in its predictions. For example, the predictor might detect an upcoming failure event that never occurs; this is called a false positive (FP). The predictor might also fail to predict a failure; this is called a false negative (FN) [43]. We now provide real-world examples of false positives and false negatives.
From the software testing perspective, when a test is executed and, in spite of the software running correctly, the test reports an error, the tester will search for the nonexistent bug, adding a lot of cost; that is a false positive (FP). When the execution of a test shows no faults although there is a bug in the software application, that is a false negative (FN). From the system safety perspective, in an ACC system, if a warning / automatic braking happens before the critical distance, this is a false positive (FP); if a warning / automatic braking happens too late, this is a false negative (FN) [28].

Chapter 3

Feedback Control Loop Architecture

3.1 Augmenting a Hazard Analysis Method with Error Propagation Information

As stated in section 1.3, the first goal of this research is:

• The introduction of a method of augmenting STPA (System-Theoretic Process Analysis) with error propagation information and error behavior state machines using AADL (Architecture Analysis & Design Language).

The first goal is associated with three of the research questions shown in section 1.5:

• Can the safety of the system be improved by augmenting a hazard analysis method with error propagation information?

• Can we analyze the internal failures of the components in order to show their effects on the other components during the hazard analysis process?

• How does the interaction of the components in the early design phase help us to identify internal failures?

The traditional version of STPA cannot describe dynamic error behavior, state transitions, and error propagation among system components.
This is because of the reasons described in section 2.5. Thus, as part of this dissertation, we have augmented STPA with error propagation information and error behavior state machines to support modeling and analyzing the dynamic error behavior of the system during the hazard analysis process and to identify potential internal failures of the major components during the application of STPA. This allows us to identify additional hazards based on different criteria for existing application examples, identify additional unsafe control actions in different analyses, identify general and specific causes for the unsafe control actions and blend safety constraints into the system's architecture to build a safe product. We propose a method that helps stakeholders or safety analysts, during hazard analysis, to consider the error behavior of a component, to identify each path in the form of three-way interactions among components in the feedback control loop architecture and to analyze the trace of each hazard. We illustrate this method by describing several scenarios and examples to support it. We provide models for the scenarios using the Architecture Analysis and Design Language (AADL), supported by the Open Source AADL Tool Environment (OSATE) [16], to develop an architecture representation. We extend each step in the STPA process with error propagation information and dynamic error behavior. In addition, we add a final step to merge the safety aspects of the system into the system's architecture. The proposed method consists of the following steps:

1. Identify hazards using different criteria: This step involves identifying accidents based on the system's operational context. The error ontology provides guidance in identifying hazards.

2.
Build control structures with contributors: This step constructs a feedback control loop with finite state machines describing the dynamic behavior of the system, and adds error propagation specifications across the system to analyze the traces of the hazards that may lead to the accidents.

3. Identify unsafe control actions using tracing: This step helps to identify the unsafe control actions based on error propagation tracing. It identifies the error behavior of a component that can lead to inadequate control actions that could become unsafe actions. This step also helps to identify any error flow for which a corresponding safety constraint needs to be created to mitigate the identified hazards.

4. Identify specific causes: This step helps to identify causes for the unsafe control actions in the feedback control loop. Generally, it requires selecting a component first and then specifically looking for the causes that relate internally and externally to the component.

5. Develop a safety architecture: A safety architecture is implemented based on the safety constraints identified in the previous steps. It blends the safety aspects of the system into the overall system architecture.

We created this augmentation to help the stakeholder or the safety analyst identify and evaluate dysfunctional behavior of the system during hazard analysis. The main goal of analyzing the behavior of the system is to identify hazardous control actions by considering the specification of error propagations across the system and the operating states of the system that can have an effect on control actions. Using the characteristics of event-driven models and error propagation information in the safety analysis provides an effective way to annotate and assess all possible paths in the feedback control loop. In this method, steps 1 through 4 identify the sources of errors and their impact on the other components during operation.
Step 5 helps to enhance the safety of the system by feeding the identified hazardous situations or propagated error events into the system's architecture to create constraints that mitigate the hazards. The heart of the STPA approach is a feedback control loop used to analyze the safety of the system; we wish to improve STPA to facilitate a deeper analysis of the system. Figure 3.1 shows the feedback control loop with error propagation and finite state machine models. Figure 3.1 consists of four component types: sensors, controllers, actuators and the controlled process. These components have incoming and outgoing ports used to send and receive data as well as error events. The connections among components represent the nominal control flow as well as the error propagation path. The error propagation path follows the port connections from the sensors to the controllers, the controllers to the actuators, the actuators to the controlled process and the controlled process to the sensors. First, the sensors are used to measure the values of attributes and send them to the controller. Each sensor has two states, operational and failed. Second, the controller acquires information about the state of the process from measured variables and controlled variables; the controller then uses this information for an initial action, manipulating controlled variables to maintain the operational process within predefined limits. The controller is used to regulate the process variables and send commands to the actuator. The controller contains a process model which is used to represent variables and their values. The controller has two states: a normal state and an error state. The normal state reflects the status of the variables in normal operation; the error state reflects abnormal values of the variables. Third, the actuator follows the controller's instructions to execute the commands. The actuator has two states, operational and failed.
Fourth, the controlled process is used to show the processes inside the controlled system and to implement the controller's decision. The error flow passes errors inside a component from an incoming port to an outgoing port.

Figure 3.1: Feedback control loop augmented with error propagation and finite state machines

In figure 3.1, we show three scenarios to demonstrate our method and to show the need for augmentation. We now detail each scenario.

3.1.1 Scenario 1:

Scenario 1 begins with the sensor in the operational state; the failed state is entered when the sensor detects an internal failure. In this scenario, we demonstrate how an error propagates from the sensor to the actuator and becomes hazardous to the system. Consider when the sensor detects an internal failure. The error continuously affects the normal operational state of the sensor. If the sensor is able to recover, the error will not propagate to the next component. But if it does not recover, the error propagates through the outgoing port to the controller. The controller receives the error through its incoming port and processes it using the process model. In this case, the process model does not understand the propagated error, and the propagated error becomes an error in the controller. If the controller is able to handle the error, then no hazard exists, and the controller automatically goes back to its normal state. But if the controller is not able to resolve the error, then a hazard exists, because the controller could send a command to the actuator when it should not. To diagnose the cause of this unsafe control action, we go back to the system controller, because we expect the controller to be faulty due to its sending of commands when it should not. However, in this situation the controller does not have any faults; we need to go back another step to find the exact cause of the fault.
In conclusion, using this scenario, we have shown that the error propagated from the sensor to the actuator has three important effects on the system. First, it affects the state of the sensor itself. Second, it affects the decision of the controller. Third, it affects the actuator.

3.1.2 Scenario 2:

Scenario 2 begins with the controller in the normal state; the error state is entered when the controller detects an internal failure. The generated error does not propagate out; it stays within the controller. A different type of error propagates through the outgoing port to the actuator instead of the generated error. For example, the controller generates an out-of-range error, but it sends a no-data error to the actuator because the controller is unable to handle the out-of-range error. In this case, the controller becomes a source of errors. In this potentially hazardous situation, for each propagated error, the actuator executes a different type of unsafe control action. For instance, if errors such as a stuck value error, an out of range error or an out of calibration error are generated in the controller, the actuator performs an unacceptable action for the transformed error (i.e., no value, bad value or incorrect value). After that, the controlled process receives the propagated error from the actuator as an inadequate or ineffective action through the error propagation path. The error then passes through the error flow inside the controlled process to the outgoing port. Therefore, the output of the controlled process would be an incorrect, inaccurate or delayed measurement. For example, if the input to the controlled process is delayed, then its output would likely also be delayed. In conclusion, using this scenario, we have illustrated that the error propagated from the controller to the controlled process has three important effects on the system. First, it affects the controller's decision. Second, the actuator performs an inadequate action.
Third, it affects the controlled process by causing selection of an inappropriate process for the action. This illustration allows us to identify the source of the hazard by back-tracing the error.

3.1.3 Scenario 3:

Scenario 3 begins with the actuator in the operational state; the failed state is entered when the actuator detects an internal failure. The actuator receives a command from the controller through the incoming port, but the command is not executed as the controller intended because of the actuator's internal failure. Thus an error occurs in the actuator, leading the actuator to enter the failed state. If the actuator is able to resolve the error, the controller's command will not be impacted. If the actuator is not able to resolve the error, a hazardous situation could result, impacting the actuator's output, such as causing the actuator to produce output late. The delayed operation directly affects the controlled process because the timing error passes through the error flow to the output port. The output of the controlled process becomes an input to the sensor, but it carries a feedback delay because of the propagated timing error. In turn, the output of the sensor becomes an input to the controller, carrying the same feedback delay. Therefore, the error propagated from the actuator to the sensor has three important effects on the system. First, it affects the state of the actuator. Second, it affects the controlled process by delaying selection of the appropriate process for the action. Third, it affects the sensor by delaying the acquisition of measured values.

3.2 Additional Scenarios to the Augmented Method

In this section, we provide additional scenarios that answer the following questions:

• How can an error outside of the feedback control loop affect a component's behavior inside of the loop?
(Scenario 4)

Figure 3.2: The augmented method for additional scenarios

• How can we identify the system states for each scenario? (Scenario 5)

Figure 3.2 will be used for the additional scenarios on the feedback control loop architecture. Consider the following scenarios:

3.2.1 Scenario 4:

As shown in Figure 3.2, scenario 4 begins with a device that is connected to the feedback control loop. The device has a user interface that is used to set values of the controller and to monitor the status of the system. The purpose of this scenario is to understand how an internal failure of the connected device affects the feedback control loop architecture and how errors propagate from devices external to the loop to components in the loop. We begin with the operator interface device (OID) in the operational state; the failed state is entered when the OID detects an internal failure. The error continuously affects the normal operation of the device. If the device is able to recover, the error will not propagate to the feedback control loop. But, if it does not recover, the error propagates through an outgoing port to the controller. The controller receives the error through its incoming port and processes it using the process model. But the process model does not understand the propagated error. Because of this, the propagated error becomes an error event which may lead to the controller transitioning from the normal state to the error state. If the controller is able to resolve the error event, no hazard would be possible. But, if the controller is not able to resolve the error event, a hazardous event could occur, leading to an unsafe control action being sent to the actuator and the feedback control loop state becoming unsynchronized with the OID, as shown in Figure 3.2. In conclusion, using this scenario, we demonstrate that the error propagated from the connected device has three important effects on the system.
First, it affects the state of the connected device, which leads to the sending of an ineffective/inadequate command to the controller. Second, it affects the decision of the controller. Third, it affects the actuator, causing it to perform an ineffective/inadequate action.

3.2.2 Scenario 5:

As shown in Figure 3.2, scenario 5 is used to find the system, or top-level, states of the feedback control loop according to the previous scenarios. As a reminder, we present each error flow based on three-way interactions, back-tracing the error. Each error flow could assist in identifying failure states of the system.

Scenario 1: We have the error flow (Sensor → Controller → Actuator), and we need to find out how this flow can show us the system states. Specifically, we need to provide a safety requirement for the failure state:

[(Sensor.FailedState) ∨ (Controller.ErrorState) ∨ (Actuator.FailedState)] → System.FailedState (3.1)

[(Sensor.OperationalState) ∧ (Controller.NormalState) ∧ (Actuator.OperationalState)] → System.OperationalState (3.2)

Scenario 2: We have the error flow (Controller → Actuator → ControlledProcess), and we need to find out how this flow can show us the system state. Specifically, we need to provide a safety requirement for the failure state:

[(Controller.ErrorState) ∨ (Actuator.FailedState) ∨ (ControlledProcess.ErrorState)] → System.FailedState (3.3)

[(Controller.NormalState) ∧ (Actuator.OperationalState) ∧ (ControlledProcess.NormalState)] → System.OperationalState (3.4)

Scenario 3: We have the error flow (Actuator → ControlledProcess → Sensor), and we need to find out how this flow can show us the system state.
Specifically, we need to provide a safety requirement for the failure state:

[(Actuator.FailedState) ∨ (ControlledProcess.ErrorState) ∨ (Sensor.FailedState)] → System.FailedState (3.5)

[(Actuator.OperationalState) ∧ (ControlledProcess.NormalState) ∧ (Sensor.OperationalState)] → System.OperationalState (3.6)

Scenario 4: We have the error flow (InterfaceDevice → Controller → Actuator), and we need to find out how this flow can show us the system state. Specifically, we need to provide a safety requirement for the failure state:

[(InterfaceDevice.ErrorState) ∨ (Controller.ErrorState) ∨ (Actuator.FailedState)] → System.FailedState (3.7)

[(InterfaceDevice.NormalState) ∧ (Controller.NormalState) ∧ (Actuator.OperationalState)] → System.OperationalState (3.8)

3.2.3 Notifications

Section 3.1 and its examples have been published as a technical paper in the 35th International System Safety Conference (ISSC) in Albuquerque, New Mexico, USA. The evaluation results of the scenarios are described in section 4.

3.3 Using Formal Notations to Augment a Hazard Analysis Method

As stated in section 1.3, the second goal of this research is:

• The introduction and analysis of a theoretical framework to identify unsafe system behavior.

The second goal is associated with a research question as stated in section 1.5:

• How can we find more hazardous possibilities in the feedback control loop architecture?

The primary goal of augmenting the analysis processes of STPA is to perform a deeper analysis of system safety. We know that STPA identifies hazardous control actions based on the values of the process model variables and then translates the hazards into safety constraints for the controller. To identify hazards, each relevant value is examined to determine the context of the control action, because the controller decides which action to perform based on the value.
If one of the existing values in the process model is not compatible with the control action context, that action can be considered an unsafe control action. From a software perspective, what happens if the value or variable is not defined or is missing in that context? Does this lead to missing safety constraints for the system? Finally, how does STPA resolve this problem? From our point of view, we need to know specific information about each component in the feedback control loop such as functions, inputs, outputs, variables, and values. In addition, we need to know specifically how the process model in the controller makes a decision based on the existing values and variables. Specifically, the operation of a safety-critical system can be expressed as a function relating system inputs and outputs. There is a relationship among inputs and outputs in which each output value is related to an input value. When the system is working correctly, the system satisfies all functional requirements. This does not make the system safe. The system should be analyzed with regard to safety and verified against its safety requirements. Our objective is to find new hazards and more safety constraints to further ensure safety. In this case, we need to identify safety constraints for unsafe actions because any unsafe behavior in the operation of a component may contribute to a hazard. For this purpose, we now introduce the theoretical background of our analysis process. The background consists of several functions describing the behavior of the feedback control loop; by providing details of these functions, potential hazards can be identified and mitigated.

1. Specify criteria to identify unsafe behavior of a component: This step involves specifying an identifier to identify a faulty component in the feedback control loop and assigning an error to the component. This step can be expressed formally as a two-tuple: <INTF, Err>, where:

• <INTF> is the faulty component identifier.
This means a component will deliver incorrect service to the next component, becoming an error source: [INTF ∈ DC], where DC is the set of detection conditions.

• <Err> is the error. It is the result of an internal failure of a component. The error has an effect on the component's behavior. If the error propagates to other components, it may or may not lead to unsafe interaction among components. [Err ∈ EO], where EO is the set of errors in the error ontology.

2. Build notation to specify input/output/function for each component: The purpose of this step is the construction of the feedback control loop with mathematical notation describing the functional behavior of each component, as shown in Figure 3.3.

Controller-tuple: <FC(Vme, FPM(Var, Val, Cond)) = CA>, where:

• <FC> is the function of the controller. Normally, it takes an input from the sensor/interface device, processes it, and gives a command to the actuator.

• <Vme> is the measured variable. It is the input to the controller. It is used to measure the value of the variable which is read by the sensor. [Vme ∈ Var], where Var is the set of variables in the process model. But if [Vme → Err], then [Err ∉ FPM].

• <FPM> is the process model function. It is the mental model of the controller. It can be represented as another component inside of the controller. It contains variables and their values. It is used to determine what control action is needed based on the content of the received measured variables. It consists of three important elements:

– <Var> is the set of variables which are used by the process model.

– <Val> is the set of values that can be used by the variables.

– <Cond> is the set of conditions that can be used by the process model to make a decision, where [{Var, Val} ∈ Cond]. The condition is that the values received via the measured variable should match the values defined in the process model function.

• <CA> is the control action. It is the output of the controller.
It can be considered a command to the actuator, where {CA → True iff Cond → True}. This means the control action will be a safe action if and only if the condition is true. Otherwise, the control action will be an inadequate action that can become an unsafe action.

Actuator-tuple: <FA(CA) = Vma>, where:

• <CA> is the control action, the input to the actuator from the controller. If this is an expected control action, then the actuator will behave as expected. However, if this is an unsafe control action, the actions of the actuator might lead to a hazardous situation.

• <Vma> is the manipulated variable, the input to the controlled process from the actuator. The control action (CA), as previously mentioned, determines the output of the actuator.

Controlled Process-tuple: <FCP(Vma) = Vc>, where:

• <Vma> is the manipulated variable, the input to the controlled process. If the values are manipulated according to the controller's command, then the controlled process will be modified according to the limits defined in the controller.

• <Vc> is the controlled variable, the output of the controlled process. The actual behavior of the controlled process is monitored through the controlled variable, which is dependent on the input. If the input is in the range defined by the controller, the output will be in range as well.

Sensor-tuple: <FS(Vc) = Vme>, where:

• <Vc> is the controlled variable, the input to the sensor, which is used to update the information produced by the sensor.

• <Vme> is the measured variable, the output of the sensor. The sensor sends information to the controller in the form of measured values. If the sensor has an internal failure, it will provide incorrect information to the controller. Additionally, the sensor can give incorrect information to the controller if the sensor updates its information using incorrect values that have been provided by the controlled process through controlled variables.
Top-tuple: <System>, where:

• System is the set of sub-components into which the top component decomposes. If a component has an internal failure, how will the system react to it? Does the system behave safely, or will it react unsafely?

3. Identify unsafe functional behavior and find more hazardous possibilities using the error ontology: This step helps to identify unsafe functional behavior of a component that violates safety requirements for the associated system. The internal failure of a component can give inadequate results that could lead to an unsafe function. This step also helps to identify the corresponding safety constraints which need to be created to mitigate unsafe functional behavior. In this step, we use the error ontology to find the possibilities of unsafe functional behavior which lead to the system behaving incorrectly. If we find the hazardous possibilities before we develop the system, it improves our confidence that the system will behave correctly when it faces hazardous conditions.

4. Identify mathematical notation for each error flow: This step identifies formal expressions for each error flow. We know that a propagated error cuts across three components. For that reason, each error flow should contain three functions. In this case, the proposed safety constraints should cover three components to mitigate the effect of the error flow.

Figure 3.3: A framework for feedback control loop inputs/outputs and functions

3.3.1 Notations and Expressions for Train Automated Door Control System

We take the same example described in appendix B, an automated door controller for a train, and apply our notations and expressions to find more hazardous possibilities:

1. Specify criteria to identify unsafe behavior of a component: Doors are locked. The train driver is unable to open them because the sensor provides incorrect readings to the controller.
This leads to the passengers being stuck inside the train when the train is stopped at the station platform.

FS(Vc) = Vme, if Measured Values ∈ Vme
FS(Vc) = Err, if Measured Values ∉ Vme

where FS(INTF) → Err, Err ∈ EO.

The sensor sends measured values to the controller via measured variables. The controller is able to recognize the values that were previously defined in the variables (i.e., measured variables). When the sensor has an internal failure, it will send different values to the controller. The controller is then unable to understand those values, leading to the stuck door.

2. Build notation to specify input/output and function for each component: As shown in Figure 3.3.

3. Identify unsafe functional behavior and find more hazardous possibilities using the error ontology: These functions are identified by predicting the behavior of the sensor and can help to determine the response of the controller.

Table 3.1: Unsafe Functional Behavior

Function: Open Door
Sensor Function: FS(Vc, INTF) = Err, assume Err ∈ ErrValues
Controller Function: FC(FPM(Var, Val, Cond)) ≠ CA, if ErrValues ∉ Val, then Cond → False

As shown in Table 3.1, the function of the sensor shows the internal failure of the component, which provides incorrect values to the controller (i.e., the output of the sensor is an error (Err) which can be considered one of the errors in the error ontology, such as error values). The function of the controller shows that the incoming error value is not an element of the defined values of the process model. For that reason, the decision of the process model is unknown. This affects the output of the controller because the condition of the decision is false. Then, the output of the controller is not equal to the safe control action. As shown in Table 3.2, by formalizing the error ontology we can find more hazardous possibilities.
Table 3.2: Finding More Possibilities

Finding possibilities in controller | Description | Is this possibility hazardous?
If ErrValue > MaxVal | The controller does not open the door when it receives values above the range | Yes
If ErrValue < MinVal | The controller does not open the door when it receives values below the range | Yes
If ErrValue ∉ Val | The controller does not open the door when a value error occurs (none of the defined values) | Yes
If ErrValue = "Null Value" | The controller does not open the door when it receives an omitted error value | Yes
If ErrValue = "Stuck Value" | The controller does not open the door when it receives the same repeated value | Yes

4. Identify mathematical notation for each error flow: As shown in Table 3.3, we identify formal expressions for each error flow or scenario that was described in previous sections. This helps to identify safety constraints with respect to error contexts to mitigate the effects of the error on the system. For example, SC(S, C, A, Err) means the controller (C) is required to provide a control action for the actuator (A) with respect to the error type (Err) context for the message that comes from the sensor (S).

Table 3.3: Mathematical Notations for Error Flows and Safety Constraints

Scenario | Error Flow Expression | Safety Constraint (SC)
1 | FS(Vc, INTF) → FC → FA(Vma, Err) | SC(S, C, A, Err)
2 | FC(Vme, FPM, INTF) → FA → FCP(Vc, Err) | SC(C, A, CP, Err)
3 | FA(CA, INTF) → FCP → FS(Vme, Err) | SC(A, CP, S, Err)
4 | FOID(Vme, INTF) → FC → FA(Vma, Err) | SC(OID, C, A, Err)

3.3.2 Notifications

Section 3.3 and its example 3.3.1 have been published as a technical paper in the 36th International System Safety Conference (ISSC), Phoenix, Arizona, USA. In that session, 21 system safety researchers and practitioners voted on 10 questions that had been prepared about the notations. The results of the votes are described in appendix H.
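The checks enumerated in Table 3.2 can be sketched as a guard that a controller's process model might apply to an incoming measured value. This is an illustrative sketch: MIN_VAL, MAX_VAL, the set of valid values, and the category names are assumptions chosen for the example, not values from the train door system:

```python
# Sketch of the Table 3.2 checks: classify an incoming measured value against
# the process model's defined values before deciding on a control action.
# Range limits and the valid-value set are illustrative assumptions.

MIN_VAL, MAX_VAL = 0, 100
VALID_VALUES = set(range(MIN_VAL, MAX_VAL + 1))

def classify(value, previous=None):
    """Return the error-ontology category for a bad value, or None if acceptable."""
    if value is None:
        return "omitted value"      # null value: the door stays shut
    if value == previous:
        return "stuck value"        # the same value repeated
    if value > MAX_VAL:
        return "above range"
    if value < MIN_VAL:
        return "below range"
    if value not in VALID_VALUES:
        return "value error"        # in range but not one of the defined values
    return None                     # condition holds; the control action is safe

# Every non-None category corresponds to a "Yes" row in Table 3.2.
print(classify(150))                # above range
print(classify(42, previous=42))    # stuck value
print(classify(50, previous=49))    # None -> safe to act
```

Each non-None result marks a case in which the condition (Cond) of the process model is false, so the controller must not issue the Open Door action.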
3.4 Introducing Feedback Control Loop Architecture with Compositional Reasoning

This section can be considered the last step of the proposed method. The main reason for doing this as part of the dissertation is that STPA has no method of verifying that the proposed safety constraints are correct. Thus our associated research questions are: How can STPA be introduced with assume-guarantee contracts to verify safety constraints? What are the steps to verify the safety constraints against the system model?

System/software verification concentrates on showing functional correctness of the system and demonstrating that the system fully satisfies all functional requirements. However, the functional requirements do not make the system safe or reduce the risk level of the system. Therefore, the system should be analyzed with respect to safety aspects and verified against its safety requirements at the component/system level. The overall objective of this part of the dissertation is to demonstrate the possibility of verifying safety-critical software (i.e., the architecture model) against safety constraints which are derived at the component/system level. Generally, to control the criticality of safety-critical software, we need to identify the potential hazards using the error ontology as a guide and then demonstrate that a potential hazard cannot occur (i.e., the software of the system cannot contribute to an unsafe state). The contribution of this work is to introduce the feedback control loop architecture with safety verification procedures to verify the identified safety constraints and to show that hazards have been mitigated or controlled to an acceptable level of risk. In addition, our contribution could assist stakeholders in reducing time and effort by doing safety analysis and verification of the safety requirements on the same system architecture rather than doing them separately. The procedure consists of the following steps:

1.
Identify an assume-guarantee contract for each component: This step is the construction of a feedback control loop architecture with assumptions (A) describing the expectations the component makes about its environment and guarantees (G) describing the behavior the component promises about its output if the assumptions are met.

2. Identify mathematical notations for each component: This step involves identifying expressions for each component based on the mathematical notations described in previous sections. For example, using FS, FC, FA, FCP as functions for the components in the implementation of the component, and using Vme, CA, Vma, Vc as input/output for components in the specification.

3. Identify safe behavior for the top-level system: This step assists in identifying the guarantee of safe behavior of the overall system, which depends on the guarantee of correct output of each component in the feedback control loop architecture. The notation inside Figure 3.4 demonstrates this.

4. Satisfy safety constraints: This step assists in verifying the identified safety requirements against the system model. We need to show that the model of the system satisfies the derived safety constraints. Previously, we built the safety constraints with regard to the context of the error types. For example, our safety constraint, "The system should not accept null values", is identified as an omission error in the error ontology; it can also be mapped to one of the unsafe control actions in STPA, which is "not providing control action". If the system does not accept null values, it means the system satisfies the safety constraint against the system model.

Consider Figure 3.4, which introduces the feedback control loop architecture with compositional reasoning. The main advantage of this introduction is that it mitigates the unsafe control action: the wrong decision of the controller that has been made based on received incorrect values.
To eliminate this action, we can make a guarantee for the output of the sensor and, at the same time, construct an assumption on the input of the controller. As a result, we can validate that the sensor does not send incorrect values to the controller, leading to a safer system. This type of analysis can provide certainty about the behavior of the system and ensure that the system does not behave incorrectly. Table 3.4 helps to ensure that the controller does not follow/execute unsafe control actions because there is a contract among components: sensor-controller, controller-actuator, actuator-controlled process, and controlled process-sensor. For example, we build the following safety constraints based on the error ontology context:

SC1: The controller has a guarantee to not receive null values from the sensor. (Omission error value)

SC2: The controller has a guarantee to not receive unintended incoming values from the sensor. (Commission error value)

SC3: The controller has a guarantee to perform an action within the defined time range. (Timing error)

When we verify SC1, the controller mitigates one of the unsafe control actions, [Action required but not provided], that can lead to hazards. In addition, when we verify SC2, the controller mitigates another unsafe control action, [Unsafe action provided], and so on.

Table 3.4: Mitigating the unsafe control actions

STPA | Error Ontology | A/G Contract
Action required but not provided | Omission service error types | SC1
Unsafe action provided | Commission service error types | SC2
Action provided too early/too late | Service timing errors | SC3
Provided action stopped too soon/applied too long | Sequence commission (early start/late termination) | SC3

Figure 3.4: Introduce feedback control loop architecture with assume-guarantee reasoning

Chapter 4

Analysis and Evaluation

In this section, we evaluate our compositional safety analysis method, Architecture Safety Analysis Method (ASAM), used for developing safety-critical systems.
Our method allows system stakeholders and safety analysts to mitigate the effects of errors through safety constraints developed during the construction of the software architecture. In the method, we focus on finding safety constraints and verifying them within the system model. To do so, we inject the AADL error ontology into the architecture model to identify the unsafe control actions and the causal factors of the unsafe control actions. We verify the safety constraints that we generate within the system model to ensure the hazardous situations are sufficiently mitigated through the use of compositional reasoning on the system model. From a technical perspective, ASAM is a new software safety analysis tool which works with AADL models annotated with an implementation of our formulas (i.e., the asam annex, introduced in section 3.3) and supported by the Open Source AADL Tool Environment (OSATE). It is used to analyze and generate a report based on the information attached to each component in the feedback control architecture. ASAM provides several statements to support discovering hazards, such as error statements, error propagation statements, internal failure statements, and safety constraint statements that either handle or prevent hazards. The goal of the statements is to describe errors in order to determine their source and prevent propagation. If error propagation cannot be prevented, ASAM can describe how the error is eventually handled as well as which component handles it. Additionally, ASAM lets the safety analyst record the severity level of the hazard for a specific error type based on the probability of occurrence for that type of error, as well as verify that a safety constraint properly mitigates the error.
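The kind of record ASAM attaches to a component can be illustrated with a simple data structure. This mirrors the information listed above (error type, severity, probability, unsafe control action, causes, safety constraint), but the class layout and field names are our assumptions for illustration, not the actual asam annex grammar:

```python
# Illustrative model of the per-error information ASAM records for a component.
from dataclasses import dataclass

@dataclass
class AsamError:
    error_type: str          # drawn from the AADL error ontology
    severity: str            # e.g. "Critical", "Catastrophic"
    probability: float       # probability of occurrence for this error type
    uca: str                 # unsafe control action the error can trigger
    causes: dict             # {"General": ..., "Specific": ...}
    safety_constraint: str   # constraint intended to mitigate the UCA

def report(component: str, errors: list) -> list:
    """One report line per recorded error, in the spirit of ASAM's report generator."""
    return [f"{component}: {e.error_type} ({e.severity}, p={e.probability}) -> {e.uca}"
            for e in errors]

sensor_errors = [AsamError(
    error_type="AboveRangeSpeed", severity="Critical", probability=0.1,
    uca="sensor dispatches above-range speed values to controller",
    causes={"General": "sensor has an internal failure",
            "Specific": "sensor receives above-range speed value from feedback"},
    safety_constraint="sensor should not send incorrect speed values to controller")]

print(report("ACC_sensor", sensor_errors)[0])
```

Grouping the error type, its causes, and its mitigating constraint in one record is what lets a report generator trace from an observed error back to the constraint that should have contained it.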
In this research, we focus on evaluating ASAM based on the following claims:

Table 4.1: Evaluation Examples Overview

Safety-Critical System Examples | C1 | C2 | C3
Adaptive Cruise Control System (ACCS) | 4.1.2 | 4.2.2 | 4.3.2
Train Automated Door Control System (TADCS) | 4.1.3 | 4.2.3 | 4.3.3
Medical Embedded Device: Pacemaker (PM) | 4.1.4 | 4.2.4 | −
Infant Incubator Temperature Control System: Isolette (ISO) | 4.1.5 | − | −

• Claim C1: Whereas STPA only finds hazards that exist in the communication between the controller and the actuator, ASAM finds errors, which may or may not lead to hazards, not only in the controller/actuator but in other components as well.

• Claim C2: Each error flow in ASAM assists in identifying a safety constraint of appropriate breadth to eliminate the unsafe control actions caused by an identified error. In addition, each error flow can guide us to identify specific causes of the unsafe control action.

• Claim C3: ASAM uses verification, unlike existing safety analysis methods, to further ensure that the safety constraint fully mitigates the hazardous condition, improving the quality of the safety constraints.

Each claim is discussed individually in the remaining sections. Throughout the evaluation, we will use the safety-critical system examples that were described in the previous chapter: the adaptive cruise control system (scenario 3.1.1), the train automated door control system (scenario 3.1.2), the pacemaker (scenario 3.1.3), and the isolette (scenario 3.2.1). For each of our selected examples, we will present the implementation of the scenarios, some of which will be reused across evaluations.

4.1 Demonstrating Three-Way Communication Format

Claim C1: Whereas STPA only finds hazards that exist in the communication between the controller and the actuator, ASAM finds errors, which may or may not lead to hazards, not only in the controller/actuator but in other components as well.
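The three-way format behind claim C1 can be related to the error flows of scenario 5 with a short sketch. The flow triples come from the scenarios described earlier; the "ok" state label and the dictionary layout are illustrative assumptions:

```python
# Sketch of three-way back-tracing: the system is failed if any component on an
# error flow is out of its nominal state (cf. the rules of scenario 5), and the
# first non-nominal component along the flow is the candidate root cause.

FLOWS = {
    "scenario1": ("sensor", "controller", "actuator"),
    "scenario2": ("controller", "actuator", "controlled_process"),
    "scenario3": ("actuator", "controlled_process", "sensor"),
    "scenario4": ("interface_device", "controller", "actuator"),
}

def system_failed(flow: str, states: dict) -> bool:
    """True if any component on the flow is not in its nominal state."""
    return any(states[c] != "ok" for c in FLOWS[flow])

def failure_source(flow: str, states: dict):
    """Back-trace: the first non-nominal component along the flow, if any."""
    for c in FLOWS[flow]:
        if states[c] != "ok":
            return c
    return None

states = {"sensor": "failed", "controller": "ok", "actuator": "ok",
          "controlled_process": "ok", "interface_device": "ok"}
print(system_failed("scenario1", states))   # True
print(failure_source("scenario1", states))  # sensor
```

A two-way analysis that examined only the controller/actuator pair would report no fault here, while the three-way flow surfaces the failed sensor, which is the point of claim C1.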
4.1.1 Evaluation Plan

To demonstrate claim C1, we will evaluate each of our selected examples. For each system, we will first introduce the system and its architecture. We will then produce a report for that system under the described scenario. Each scenario evaluates an error flow based on the three-way interaction format for the specific example to identify the source of the hazard. ASAM shows the effect of unsafe interactions based on the three-way communication format among components while back-tracing the error. As a reminder, the currently accepted standard identifies unsafe interactions based on a two-way communication format. Specifically, it focuses on the interaction between the controller and actuator to identify unsafe control actions.

4.1.2 Adaptive Cruise Control System

The first system we introduce is an adaptive cruise control system for modern vehicles. The goal of this system is to monitor the distance and speed of the vehicle in front of the monitored vehicle (i.e., the system compares the speed of the monitored vehicle with the vehicle in front based on repeated measurements of the distance between them). This example is used to support scenario 3.1.1 and to show how our method helps to identify additional hazardous situations for an existing application example, described in [22]. For demonstration purposes, we evaluate what happens if the sensor in the adaptive cruise control (ACC) system, because of an internal failure, estimates incorrect values for the speed and distance of the vehicle in front of the monitored vehicle during driving (e.g., the sensor estimates that the vehicle in front is close, but in reality it is not). When the component has an internal failure, the result is the propagation of an error. The error values propagate from the sensor to the controller, causing the controller to process the erroneous value. This causes the controller to send an incorrect command to the actuator.
For this situation, the ACC system warns the driver to apply the brakes because the monitored vehicle appears to be too close to the vehicle in front. If the driver does not apply the brakes, the system will do it automatically. Either way, an accident could happen because the monitored vehicle is performing an unacceptable action based on a decision made with incorrect values. In this case, either the driver or the system will apply the brakes in the middle of the road, possibly causing the monitored vehicle to be hit by the vehicle behind it. The initial results for this ACC system example have been published in [38], which are:

1. The error ontology identifies the wrong estimation as incorrect values. The hazard is "incorrect estimation values for the monitored vehicle from the ACC system".

2. The feedback control loop system has been built from the sensor to the actuator as shown in Figure 3.1, which is error flow 1 (scenario 3.1.1).

3. The unsafe control action is "The system applies the brakes in the middle of the road". The safety constraint for the unsafe control action is "The system must not apply the brakes when it has incorrect values".

4. The general cause of the unsafe control action is "The sensor has an internal failure", and the specific cause is "The propagation of incorrect values causes an error event from sensor to actuator".

5. This is shown in the developed safety architecture subsection, as shown in Figure 3.1.

We will now provide and discuss the individual core components of the feedback control loop architecture for the ACC system.

system ACC_sensor
features
  ACC_Speed_s: out data port;
  ACC_Distance_s: out data port;
  ACC_State_s: in data port;
flows
  fsource1: flow source ACC_Speed_s;
  fsource2: flow source ACC_Distance_s;
end ACC_sensor;

Listing 4.1: ACC System

In listing 4.1, we show the first primary component of the architecture: the interface for the ACC sensor.
This interface shows that the ACC sensor component produces two values and receives one value. The two outputs, ACC_Speed_s and ACC_Distance_s, represent the speed of and distance to the car in front. These two values also represent the sources of the information flow. The final value, ACC_State_s, is the controlled process state. The implementation of this interface is shown in listing 4.2.

    system implementation ACC_sensor.impl
      annex asam {**
        Errs => [{
          -- Identify specific error types for the sensor; identify the chance of harming people.
          Type => AboveRangeSpeed (Critical, p = 0.1),
          -- For the specific error type, what kind of unsafe action will the sensor perform?
          UCA => UCA1 : "ACC sensor dispatches above range of speed values to controller",
          -- What are the causes of the Unsafe Control Action (UCA) of the sensor?
          Causes => {
            General  => "ACC sensor has an internal failure",
            Specific => "ACC sensor receives above range of speed value from feedback"
          },
          -- Identify Safety Constraints (SC) for the sensor to mitigate the effect of the UCA.
          SC => SC1 : "ACC sensor should not send incorrect speed values to controller for computation"
        },
        {
          Type => BelowRangeDistance (Catastrophic, p = 0.1),
          UCA => UCA2 : "ACC sensor dispatches below range of distance values to controller",
          Causes => {
            General  => "ACC sensor has an internal failure",
            Specific => "ACC sensor receives below range distance value from feedback"
          },
          SC => SC2 : "ACC sensor must not dispatch incorrect distance values to controller for computation"
        }]
      **};
    end ACC_sensor.impl;

Listing 4.2: ASAM for ACC System Sensor

In listing 4.2, the ASAM annex allows stakeholders to record specific information in the implementation part of each component in the ACC architecture model. For example, we have recorded the following information in the ACC sensor's implementation: error (Errs) types for the sensor's internal failure (above range of speed and below range of distance), unsafe control actions (UCA) for each error type, general and specific causes for each unsafe control action, the probability of occurrence (p) for each error type, the hazard severity level for each error type, and safety constraints (SC) to mitigate the effects of the error types or unsafe control actions.

    system ACC_controller
      features
        ACC_Speed_c     : in  data port;
        ACC_Distance_c  : in  data port;
        ACC_Speed_cmd   : out data port;
        ACC_Ditance_cmd : out data port;
      flows
        f_path1 : flow path ACC_Speed_c -> ACC_Speed_cmd;
        f_path2 : flow path ACC_Distance_c -> ACC_Ditance_cmd;
    end ACC_controller;

Listing 4.3: ACC System

In listing 4.3, we have the ACC controller interface that controls the speed and distance of the vehicle. This interface has two in data ports, ACC_Speed_c and ACC_Distance_c, that allow the controller to receive speed and distance values from the sensor. The interface also has two out data ports: ACC_Speed_cmd and ACC_Ditance_cmd.
ACC_Speed_cmd sends the speed command to the actuator and ACC_Ditance_cmd sends the distance command to the actuator. The interface has flow paths to link the input and output ports. For example, f_path1 is a flow path used to transfer speed information within the controller from input port ACC_Speed_c to output port ACC_Speed_cmd, and f_path2 is another flow path used to pass distance information within the controller from ACC_Distance_c to ACC_Ditance_cmd. The implementation of this interface is shown in listing 4.4.

    system implementation ACC_controller.impl
      annex asam {**
        Errs => [{
          Type => AboveRangeSpeed (Critical, p = 0.0003),
          UCA => UCA3 : "ACC controller warns the driver to apply the brakes",
          Causes => {
            General  => "ACC controller has an internal failure",
            Specific => "ACC controller receives above range of speed value from sensor"
          },
          SC => SC3 : "ACC controller should not do a computation when it has incorrect speed values"
        },
        {
          Type => BelowRangeDistance (Catastrophic, p = 0.0003),
          UCA => UCA4 : "ACC controller sends command to apply the brakes in the middle of the road",
          Causes => {
            General  => "ACC controller has an internal failure",
            Specific => "ACC controller receives below range distance value from feedback"
          },
          SC => SC4 : "ACC controller must not apply the brakes when it has incorrect distance values"
        }]
      **};
    end ACC_controller.impl;

Listing 4.4: ASAM for ACC System Controller

In listing 4.4, we have recorded two incoming errors from the ACC sensor: above range speed and below range distance.
For the first error, above range speed, the controller warns the driver to apply the brakes because the speed of the monitored vehicle appears higher than that of the vehicle in front. The error is estimated incorrectly by the sensor and propagated to the controller through the nominal control flow as well as the error propagation path. In this case, the action of the controller is going to be an unsafe action. The causes for that action can be classified into two types: 1) a general cause, the ACC controller has an internal failure, and 2) a specific cause, the ACC controller received incorrect speed values from the sensor. To mitigate the effects of that unsafe action, we provide a safety constraint: the ACC controller should not do a computation when it has incorrect speed values. We have attached a safety requirement for that error that specifies how to correct the error if it happens. We have also recorded the probability of occurrence for the above range speed error and its hazard severity level. For the second error, below range distance, the ACC controller sends a command to apply the brakes in the middle of the road because the car appears too close to the car in front. The error is estimated incorrectly by the sensor and propagated to the controller. In this case, the action of the controller is going to be an unsafe control action. The causes for that action can be divided into two types: 1) a general cause, the ACC controller has an internal failure, and 2) a specific cause, the ACC controller received incorrect distance values from the sensor. We provide a safety constraint to mitigate the effects of the unsafe action: the ACC controller must not apply the brakes when it has incorrect distance values. We have attached a safety requirement for that error, which specifies how to correct the error if it happens. We have also recorded the probability of occurrence for the below range distance error and its hazard severity level.
    system ACC_actuator
      features
        ACC_EXE_Speed_a    : in  data port;
        ACC_EXE_Distance_a : in  data port;
        Unsafe_Action_a    : out data port;
      flows
        f_path3 : flow path ACC_EXE_Speed_a -> Unsafe_Action_a;
        f_path4 : flow path ACC_EXE_Distance_a -> Unsafe_Action_a;
    end ACC_actuator;

Listing 4.5: ACC System

In listing 4.5, we have the ACC actuator interface that follows the controller's instructions to execute commands. This interface has two in data ports, ACC_EXE_Speed_a and ACC_EXE_Distance_a, that allow the actuator to execute speed and distance commands from the controller. The interface also has an out data port, Unsafe_Action_a, that allows the actuator to send commands to the controlled process. In this case, inadequate commands from the controller lead to unsafe commands from the ACC actuator. The interface also has flow paths to link the in and out data ports. For example, f_path3 is a flow path within the actuator used to transfer data from ACC_EXE_Speed_a to Unsafe_Action_a, and f_path4 is another flow path within the actuator used to transfer data from ACC_EXE_Distance_a to Unsafe_Action_a. The implementation of this interface is shown in listing 4.6.

    system implementation ACC_actuator.impl
      annex asam {**
        Errs => [{
          Type => AboveRangeSpeed (Critical, p = 0.2),
          UCA => UCA5 : "ACC actuator executes the 'warn the driver' command",
          Causes => {
            General  => "ACC actuator has an internal failure",
            Specific => "ACC actuator receives inadequate command from controller"
          },
          SC => SC5 : "ACC actuator should not execute the incorrect command"
        },
        {
          Type => BelowRangeDistance (Catastrophic, p = 0.3),
          UCA => UCA6 : "ACC actuator executes the 'apply brakes' command",
          Causes => {
            General  => "ACC actuator has an internal failure",
            Specific => "ACC actuator receives an incorrect command from controller"
          },
          SC => SC6 : "ACC actuator must not execute the inadequate command"
        }]
      **};
    end ACC_actuator.impl;

Listing 4.6: ASAM for ACC System Actuator

In listing 4.6, we have two incoming commands from the ACC controller to the ACC actuator through two in data ports, ACC_EXE_Speed_a and ACC_EXE_Distance_a, but each command contains an error: an above range of speed error in ACC_EXE_Speed_a and a below range of distance error in ACC_EXE_Distance_a. For the first error, above range of speed, the actuator executes the 'warn the driver' command because the actuator has an internal failure or received an inadequate command from the controller. For the second error, below range of distance, the actuator executes the 'apply brakes' command in the middle of the road because the actuator has an internal failure or receives an inadequate command from the controller. To mitigate the effects of the errors, we have attached safety constraints that help guard against incorrect or inadequate commands.

    system ACC_controlled_process
      features
        Controlled_Action : in  data port;
        Feedback          : out data port;
      flows
        f_sink : flow sink Controlled_Action;
    end ACC_controlled_process;

Listing 4.7: ACC System

In listing 4.7, we show the last interface, ACC_controlled_process. This interface gives a Feedback value to the sensor through the out data port. The interface also receives Controlled_Action through an in data port from the actuator. The flow sink represents the end of the information flow, which comes from the sensor.
    system implementation entire_system.impl
      subcomponents
        ACC_sensor             : system ACC_sensor.impl;
        ACC_controller         : system ACC_controller.impl;
        ACC_actuator           : system ACC_actuator.impl;
        ACC_controlled_process : system ACC_controlled_process.impl;
      connections
        conn_1 : port ACC_sensor.ACC_Speed_s -> ACC_controller.ACC_Speed_c;
        conn_2 : port ACC_sensor.ACC_Distance_s -> ACC_controller.ACC_Distance_c;
        conn_3 : port ACC_controller.ACC_Speed_cmd -> ACC_actuator.ACC_EXE_Speed_a;
        conn_4 : port ACC_controller.ACC_Ditance_cmd -> ACC_actuator.ACC_EXE_Distance_a;
        conn_5 : port ACC_actuator.Unsafe_Action_a -> ACC_controlled_process.Controlled_Action;
        conn_6 : port ACC_controlled_process.Feedback -> ACC_sensor.ACC_State_s;
      flows
        etoef_speed : end to end flow ACC_sensor.f_source1 -> conn_1 -> ACC_controller.f_path1
          -> conn_3 -> ACC_actuator.f_path3 -> conn_5 -> ACC_controlled_process.f_sink;
        etoef_distance : end to end flow ACC_sensor.f_source2 -> conn_2 -> ACC_controller.f_path2
          -> conn_4 -> ACC_actuator.f_path4 -> conn_5 -> ACC_controlled_process.f_sink;
    end entire_system.impl;

Listing 4.8: ACC System

In listing 4.8, we have represented the core components of the entire ACC system architecture. We have represented the connections among components, and we have also represented the end-to-end flow information, which goes from the sensor to the actuator.

Figure 4.1: Three-Way Communication Format for Adaptive Cruise Control System Architecture

In Figure 4.1, a feedback control loop architecture has been built for the ACC system.
Figure 4.1 shows the error flow for each error type based on the three-way interaction format. Figure 4.1 shows two error flows: one for the distance error and the other for the speed error. The first error flow (i.e., the red line) shows the error propagation from the ACC sensor to the ACC actuator. The error in the measurement of distance propagates to the ACC controller through the output data port. The erroneous distance value in the sensor can be prevented from being communicated through a safety constraint statement; but if it is not prevented, it propagates to the controller. The controller can also handle the incoming erroneous distance value through the in data port according to the safety constraint. If the controller is not able to handle that, it sends an inadequate command to the actuator. Then, the actuator will execute an inadequate action. The effect of that error changes the manipulated variable, which is the output of the actuator. The second error flow (i.e., the blue line) has the same description. ASAM generates a report for the adaptive cruise control system architecture based on component names, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe control actions, and safety constraints. This stream of information is attached to each implementation part of the error flow (flow source [ACC_sensor.impl], flow path [ACC_controller.impl], and flow sink [ACC_actuator.impl]). The report is shown in Figure 4.2.

Figure 4.2: ASAM's report for Adaptive Cruise Control System Architecture

That analysis shows the source of faults in the ACC model, which is the sensor. Each fault in the sensor produces errors. Each error has a chance of harming people based on the probability of occurrence for that error and the result of that error. For example, an "above range of speed" error results in a critical level and a "below range of distance" error results in a catastrophic level.
Each error in the sensor gives rise to an unsafe control action in the controller, which results in sending an inadequate command to the actuator. For example, the ACC controller warns the driver to apply the brakes; if the driver does not take that action, the system will do it automatically. Also, general and specific causes for each unsafe control action have been identified. Finally, we have safety constraints that specify how to mitigate the effects of the unsafe control actions or the errors.

4.1.3 Train Automated Door Control System

The second system we introduce is a train automated door control system. The goal of this system is to monitor the open and close events of the train door. This example is used to support scenario 2 (3.1.2). The train door example is described in [22]. We extend the same example to further improve the safety of the system. For instance, what will happen if the train automated door controller sends inadequate commands instead of correct commands because of an internal failure? Will it select the correct doors to open, and only open those doors when the train has stopped completely at the station platform? If the controller has an internal failure, it produces an error. The error becomes an event, changing the normal operational state of the process model to the error state. If the controller is able to handle the error event, a hazardous state will not occur; the controller will go back to its normal state. But if the controller is not able to handle the error event, the error will be transformed into an outgoing propagation error from controller to actuator. As a result, the actuator will perform an incorrect action. For example, it might select the right-side door process in the controlled process to execute instead of the left-side doors. This can definitely lead to a hazardous situation because the system guides people in the wrong direction.
The initial results for this train automated door control system example have been published in [38], which are:

1. The error ontology identifies wrong selection as incorrect values. The hazard is "wrong side selection to open the wrong doors at the station platform".

2. The feedback control loop has been built from the controller to the controlled process as shown in Figure 3.1, which is error flow 2 (scenario 2).

3. The unsafe control action is that "the system selects right side doors to open instead of left side". The safety constraint for the unsafe action is that "the system must not open the doors when it has incorrect values".

4. The general cause for the unsafe control action is that "the controller has an internal failure" and the specific cause is "the propagating error event from the door controller to the physical door".

5. This is shown in the developed safety architecture subsection, as shown in Figure 3.1.

We will now provide and discuss the individual core components of the feedback control loop architecture for the train automated door control system.

    system Door_controller
      features
        door_open_c    : in  data port;
        door_blocked_c : in  data port;
        open_door_c    : out data port;
        close_door_c   : out data port;
      flows
        c_source1 : flow source open_door_c;
        c_source2 : flow source close_door_c;
    end Door_controller;

Listing 4.9: Train Automated Door Control System

In listing 4.9, we have the Door_controller interface that opens and closes the train door when required. This interface has two in data ports, door_open_c and door_blocked_c, that allow the controller to receive the open-door-state and blocked-door-state values. The interface also has two out data ports: open_door_c and close_door_c. open_door_c is used to send the open door command to the physical door and close_door_c is used to send the close door command to the physical door.
The interface has two flow sources of information, c_source1 and c_source2. The Door_controller sends open door commands through c_source1 and close door commands through c_source2. The implementation of this interface is shown in listing 4.10.

    system implementation Door_controller.impl
      annex asam {**
        Errs => [{
          -- Identify specific error types for the door controller; identify the chance of harming people.
          Type => ServiceCommission (Critical, p = 0.1),
          -- For the specific error type, what kind of unsafe action will the door controller do?
          UCA => UCA1 : "Door controller selects right side doors to open instead of left side",
          -- What are the causes of the Unsafe Control Action (UCA) of the door controller?
          Causes => {
            General  => "The door controller has an internal failure",
            Specific => "The propagating error event from the sensor to door controller"
          },
          -- Identify Safety Constraints (SC) for the door controller to mitigate effects of the UCA.
          SC => SC1 : "The door controller must not open the doors when unexpected service or data is provided for analysis"
        }]
      **};
    end Door_controller.impl;

Listing 4.10: ASAM for Train Door Controller

In listing 4.10, we have identified a specific error type, the service commission, which represents an unexpected service provided by the controller because of the controller's internal failure. For that type of error, we have also identified the probability of occurrence for the error and the hazard's severity level.
In addition, we have recorded the unsafe action of the door controller for that type of error: the door controller selects right side doors to open instead of left side. The general cause for that unsafe action is the controller's internal failure. The specific cause is an error propagating from the door's sensor to the controller. We have also identified a safety constraint, the controller should not do any analysis of unintended incoming data, to mitigate the effects of the controller's unsafe action within the model.

    system Door_actuator
      features
        open_door_a  : in  data port;
        close_door_a : in  data port;
        door_state_a : out data port;
      flows
        a_path1 : flow path open_door_a -> door_state_a;
        a_path2 : flow path close_door_a -> door_state_a;
    end Door_actuator;

Listing 4.11: Train Automated Door Control System

In listing 4.11, we have the Door_actuator interface that follows the door controller's instructions to execute commands. The interface has two in data ports, open_door_a and close_door_a, that allow the door actuator to execute open and close commands from the door controller. The interface also has an out data port, door_state_a, that allows the door actuator to show the physical door status after executing open or close commands. The interface also has two flow paths that link the in and out data ports within the door actuator. For example, a_path1 is a flow path within the actuator used to transfer data from open_door_a to door_state_a, and a_path2 is another flow path within the actuator used to transfer data from close_door_a to door_state_a. The implementation of this interface is shown in listing 4.12.

    system implementation Door_actuator.impl
      annex asam {**
        Errs => [{
          -- Identify specific error types for the door actuator; identify the chance of harming people.
          Type => ServiceCommission (Critical, p = 0.002),
          -- For the specific error type, what kind of unsafe action will the door actuator do?
          UCA => UCA2 : "Door actuator follows the door controller's commands which opens the wrong side door",
          -- What are the causes of the Unsafe Control Action (UCA) of the door actuator?
          Causes => {
            General  => "The door actuator has an internal failure",
            Specific => "The door actuator received inadequate command from door controller"
          },
          -- Identify Safety Constraints (SC) for the door actuator to mitigate the effect of the UCA.
          SC => SC2 : "The door actuator must not execute incorrect commands"
        }]
      **};
    end Door_actuator.impl;

Listing 4.12: ASAM for Train Door Actuator

In listing 4.12, we have identified that the actuator opens the wrong doors for one of two reasons: 1) the door actuator receives an inadequate command from the controller, or 2) the door actuator has an internal failure. In this case, we need to provide a safety constraint that specifies how to mitigate the effects of the actuator's unsafe actions.

    system Physical_Door
      features
        door_state_in_cp  : in  data port;
        door_state_out_cp : out data port;
      flows
        cp_path3 : flow path door_state_in_cp -> door_state_out_cp;
    end Physical_Door;

Listing 4.13: Train Automated Door Control System

In listing 4.13, we show the physical door interface in the train door system architecture. The interface has an in data port, door_state_in_cp, that allows it to take the executed commands from the door actuator.
Also, the interface has an out data port, door_state_out_cp, that allows it to show the status of the executed process that has been selected by the door controller. The data of the executed process automatically passes within the physical door interface from door_state_in_cp to door_state_out_cp. The implementation of this interface is shown in listing 4.14.

    system implementation Physical_Door.impl
      annex asam {**
        Errs => [{
          -- Identify specific error types for the physical door; identify the chance of harming people.
          Type => ServiceCommission (Marginal, p = 0.00001),
          -- For the specific error type, what kind of unsafe action will the physical door do?
          UCA => UCA3 : "The wrong process has been selected to execute",
          -- What are the causes of the Unsafe Control Action (UCA) of the physical door?
          Causes => {
            General  => "The door controller's wrong decision",
            Specific => "The door actuator's wrong execution"
          },
          -- Identify Safety Constraints (SC) for the physical door to mitigate the effect of the UCA.
          SC => SC3 : "The physical door should not allow the selection of the wrong process"
        }]
      **};
    end Physical_Door.impl;

Listing 4.14: ASAM for Physical Door Interface

In listing 4.14, the physical door implementation tells us the wrong process has been executed for one of two reasons: 1) the wrong decision has been made by the door controller, or 2) the wrong execution has been made by the door actuator. We provide a safety constraint: the physical door should not allow the selection of the wrong process.
This helps to mitigate the effects of the unsafe action of the physical door. Finally, this case tells us that the open or close processes in the physical door have been updated with abnormal data because of a service commission error that comes from the controller. The physical door passes the wrongly updated process to the door's sensor. This means the sensor's information will be updated incorrectly.

    system Door_sensor
      features
        door_state_s   : in  data port;
        door_open_s    : out data port;
        door_blocked_s : out data port;
      flows
        s_sink : flow sink door_state_s;
    end Door_sensor;

Listing 4.15: Train Automated Door Control System

In listing 4.15, we show the last component of the train automated door control system (TADCS) architecture: the interface for the TADCS sensor. This interface shows that the Door_sensor component receives one value and produces two values. The received value, door_state_s, represents the door state. The other two values, door_open_s and door_blocked_s, represent the door open state and the door blocked state. The flow sink represents the end of the information flow, which comes from the Door_controller. The implementation of the entire system is shown in listing 4.16.

    system implementation entire_system.impl
      subcomponents
        Door_sensor     : system Door_sensor.impl;
        Door_controller : system Door_controller.impl;
        Door_actuator   : system Door_actuator.impl;
        Physical_Door   : system Physical_Door.impl;
      connections
        conn_1 : port Door_sensor.door_open_s -> Door_controller.door_open_c;
        conn_2 : port Door_sensor.door_blocked_s -> Door_controller.door_blocked_c;
        conn_3 : port Door_controller.open_door_c -> Door_actuator.open_door_a;
        conn_4 : port Door_controller.close_door_c -> Door_actuator.close_door_a;
        conn_5 : port Door_actuator.door_state_a -> Physical_Door.door_state_in_cp;
        conn_6 : port Physical_Door.door_state_out_cp -> Door_sensor.door_state_s;
      flows
        etoef_open : end to end flow Door_controller.c_source1 -> conn_3 -> Door_actuator.a_path1
          -> conn_5 -> Physical_Door.cp_path3 -> conn_6 -> Door_sensor.s_sink;
        etoef_close : end to end flow Door_controller.c_source2 -> conn_4 -> Door_actuator.a_path2
          -> conn_5 -> Physical_Door.cp_path3 -> conn_6 -> Door_sensor.s_sink;
    end entire_system.impl;

Listing 4.16: Train Automated Door Control System

In listing 4.16, we show the core components of the entire train automated door control system architecture. We represent the connections among components, and we also represent the end-to-end flow information, which goes from the door controller to the controlled process (i.e., the physical door).

Figure 4.3: Three-Way Communication Format for Train Automated Door Control System Architecture

Figure 4.3 shows the feedback control loop architecture that has been built for the train automated door control system. Figure 4.3 shows error flows based on the three-way interaction format for the train door model. The Door_controller has an internal failure. This leads to the production of a service commission error (i.e., the door controller closes the door instead of opening it). That error propagates to the Door_actuator in the form of an inadequate command. Then, the actuator executes the command that contains the error because the actuator follows the Door_controller's order. The Physical_Door shows the process that has been executed by the door actuator and updates that process incorrectly.
Finally, we show the propagated error that cuts across three components: from Door_controller to Door_actuator, and from Door_actuator to Physical_Door. The propagated error has three important effects on the train automated door control system. First, it affects the Door_controller's decision. Second, the Door_actuator performs an inadequate action. Third, it affects the controlled process, Physical_Door, by selecting the inappropriate process for the action. This illustration allows us to identify the source of the hazard by back-tracing the error.

Figure 4.4: ASAM's report for Train Automated Door Control System Architecture

ASAM generates a report for the train automated door control system architecture based on component names, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe control actions, and safety constraints. This stream of information is attached to each implementation part of the component. Specifically, besides identifying error propagation information for each component, we have recorded additional information for each part of the error flow (flow source [Door_controller.impl], flow path [Door_actuator.impl], and flow sink [Physical_Door.impl]). The report is shown in Figure 4.4. The analysis report shows the source of the faults in the train door model, which is Door_controller.impl. Each fault in the controller produces an error of the type service commission. Each error in the door controller has a chance of harming people based on the probability of occurrence for that error, represented as (p=0.1). The report also tells us what the unsafe action of the door controller is for that type of error, as well as the general and specific causes for that error. Finally, the report shows the safety constraint for safety analysts or stakeholders to mitigate the effects of the error.
4.1.4 Medical Embedded Device-Pacemaker

The third system we introduce is a safety-critical embedded system, an implantable medical device known as a pacemaker. The pacemaker is used to regulate abnormal heart rhythms. It has two main tasks, sensing and pacing. In pacing, it paces the heart in case the heart's own rhythm is irregular or too slow. In sensing, it monitors the heart's natural electrical activity. If the pacemaker senses a natural heart beat, it will not stimulate the heart. According to our method, we need to specify the major components of the pacemaker: the controller (DCM: Device-Controller Monitor), the actuator (PG: Pulse Generator), the controlled process (heart), and the sensor (electrode/lead) [29, 31]. We can connect the components as shown in figure 3.1. Our example runs from the actuator to the sensor's output as shown in figure 3.1, error flow 3 (scenario 3: PG → heart → electrode/lead). The initial results for this medical device have been published in [38] and are:

1. The error ontology identifies timing errors like late delivery. Now, the hazard is "the pacemaker is not working properly".

2. The feedback control loop has been built from the actuator to the sensor as shown in figure 3.1, which is error flow 3 (scenario 3).

3. The unsafe control action is that "the pacemaker does not provide an electrical pulse when it is required". The safety constraint for the unsafe action is that "the pacemaker should provide an electrical pulse whenever it is needed".

4. The general cause for the unsafe control action is that "the actuator has an internal failure" and the specific cause is "the propagation of a timing error from the electrode to the pulse generator".

5. This is shown in the developed safety architecture subsection, as shown in figure 3.1.

We will provide and discuss the feedback control loop architecture for the pacemaker.
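The component-to-role mapping and the scenario-3 flow just listed can be sketched as data. The role names are the four major component types of the method; `scenario_flow` is an illustrative helper, not ASAM code.

```python
# Pacemaker components mapped onto the four control-loop roles.
ROLES = {
    "DCM": "controller",
    "Pulse_Generator": "actuator",
    "Heart": "controlled_process",
    "Electrode_Lead": "sensor",
}

# Nominal ordering of roles around the feedback control loop.
ORDER = ("controller", "actuator", "controlled_process", "sensor")

def scenario_flow(start_role):
    """Components visited from start_role through to the sensor."""
    by_role = {role: name for name, role in ROLES.items()}
    return [by_role[r] for r in ORDER[ORDER.index(start_role):]]

# Scenario 3 starts at the actuator: PG -> heart -> electrode/lead.
flow3 = scenario_flow("actuator")
```

Starting the walk at a different role yields the other scenarios in the same way.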
Figure 4.5: Three-Way Communication Format for Embedded Medical Device

Figure 4.5 shows the feedback control loop architecture that has been built for the pacemaker, and it shows the error flow based on the three-way interaction format for the pacemaker. The PULSE GENERATOR has an internal failure. We know the result of each internal failure is an error, and the error propagates to the next component through the nominal control flow as well as the error propagation path. When the ELECTRODE detects that the HEART needs pacing, the DEVICE CONTROLLER MONITOR decides to send the command to the PULSE GENERATOR to send an electrical pulse to the HEART. The PULSE GENERATOR receives the command, but it is not executed instantly because the PULSE GENERATOR has an internal failure. In this situation, the error produced inside the PULSE GENERATOR constantly affects the operational state of the PULSE GENERATOR and also directly impacts the DEVICE CONTROLLER MONITOR's command. If the PULSE GENERATOR is able to resolve this problem, that would not be a hazardous situation. But, if it is not, the situation will be hazardous for the patient, such as sending a pulse to the heart late.

Figure 4.6: ASAM's report for Pacemaker System Architecture

ASAM generates the report for the pacemaker system architecture based on the following parameters: component names, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe control actions and safety constraints. These streams of information are attached to each implementation part of the component. Besides identifying error propagation information for each component, we have recorded additional information for each part of the error flow (flow source [Pulse_Generator.impl], flow path [Heart.impl] and flow sink [Electrode.impl]). The report is shown in figure 4.6. This analysis report shows the source of the error in the safety-critical embedded system model (i.e., the pacemaker), which is LatePacingDelivery.
Each error in the pacemaker has a chance of harming the patient's heart based on the probability of occurrence for that type of error, represented as (p=0.01). ASAM compares the p value with the probability of occurrence thresholds to find the probability of the hazard occurring. In this example, it found (occasional) and the hazard level is (Marginal). The report tells us the unsafe action of the pulse generator for that type of error, which is "The pulse generator does not provide an electrical pulse when it is required". Also, the report shows the general and specific causes for that error, which are "general: the pulse generator has an internal failure; specific: the propagation of a timing error from the electrode to the pulse generator". Finally, the report shows the safety constraint for the safety analysts to know how to mitigate the effects of the error, which is "the pulse generator should provide an electrical pulse whenever it is needed".

4.1.5 Infant Incubator Temperature Control System - Isolette

The fourth system we introduce is another safety-critical system, known as an isolette. The isolette is an infant incubator temperature control system. The goal of this system is to maintain an infant at a safe and comfortable temperature and warn the nurse if the infant becomes too hot or too cold. In general, the system is used to control the air temperature inside the incubator so that it remains in a desired temperature range. We provide this example to support scenario 4 (section 3.2.1). We evaluate what will happen if the operator interface device (OID) in an infant incubator (isolette) gives an out-of-range value for the air temperature inside the isolette because of an internal failure (e.g., the OID displays correct values on the screen, but in reality does not produce an air temperature within the target range specified by the nurse). As a result, the infant is harmed by the air temperature inside the isolette being too hot or too cold.
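The failure just described, the OID reporting an air temperature outside the nurse-specified range, reduces to a range check against the desired bounds. The bounds below are illustrative placeholders, not values from the isolette specification.

```python
# Hypothetical desired range for the incubator air temperature (degrees F).
DESIRED_RANGE = (97.0, 100.0)

def classify_temperature(value, desired=DESIRED_RANGE):
    """Return 'ok' for an in-range value, otherwise the error-ontology
    classification used for this scenario (OutOfRangeTemp)."""
    lower, upper = desired
    if lower <= value <= upper:
        return "ok"
    return "OutOfRangeTemp"

# An in-range reading passes; an above-range reading is flagged.
assert classify_temperature(98.6) == "ok"
assert classify_temperature(104.0) == "OutOfRangeTemp"
```

A controller applying this check could warn the nurse instead of forwarding the bad value to the actuator.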
To solve this problem, the controller should send a notification to the screen of the OID about the error values and should not send the command to the actuator. The initial results for the isolette example are:

1. The error ontology identifies out-of-range errors as incorrect values. Now, the hazard is "the infant is not safe with (out of range) temperature values, such as below or above the desired range".

2. The feedback control loop has been built from the operator interface device to the actuator as shown in figure 3.2, which is error flow 4 (scenario 4).

3. The unsafe control action is that "the operator interface device does not send the message to warn the nurse when it is required". The safety constraint for the unsafe action is that "the device should dispatch the message to warn the nurse whenever it is needed".

4. The general cause for the unsafe control action is that "the connected device has an internal failure" and the specific cause is "the propagation of error values from the connected device to the actuator".

5. This is shown in the developed safety architecture subsection, as shown in figure 3.2.

We will provide and discuss the feedback control loop architecture for the infant incubator temperature control system.

Figure 4.7: Three-Way Communication Format for Isolette System Architecture

Figure 4.7 shows the feedback control loop architecture that has been built for the isolette, and it shows the error flow based on the three-way communication format for that isolette system. This example is different from the previous example because we show the effects of an internal failure in the OPERATOR INTERFACE DEVICE, which is not a fundamental element in the feedback control loop architecture. The OPERATOR INTERFACE DEVICE is used to set the THERMOSTAT and monitor the status of the system. When the OPERATOR INTERFACE DEVICE has an internal failure, this leads to sending out-of-range temperature error values to the THERMOSTAT.
In this case, the THERMOSTAT will receive above-range or below-range temperature values. In this example, we have selected the above-range temperature error value. This error affects the THERMOSTAT's decision because it is not able to understand the temperature value that has been defined previously in the system controller. In this situation, if the THERMOSTAT is able to resolve this problem, it would not be a hazardous situation. But, if it is not, the situation will be hazardous for the infant, such as sending inadequate commands to the HEAT SOURCE actuator. The inadequate command affects the infant because the air temperature inside the incubator is going to be higher than the maximum temperature value. The infant will be harmed because the incubator is going to be too hot. Also, that inadequate command does not send the message to the OPERATOR INTERFACE DEVICE to warn the nurse of the malfunction.

Figure 4.8: ASAM's report for Isolette System Architecture

ASAM generates a report for the isolette system architecture based on the following parameters: component names, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe control actions and safety constraints. This particular information is connected to each component's implementation portion. Besides identifying error propagation information for each component's specification, we have recorded additional information for each portion of the error flow (flow source [Operator_Interface.impl], flow path [Thermostat.impl], and flow sink [Heat_Source.impl]). The report is shown in figure 4.8. That report shows the component that is the fault source in the infant incubator temperature control system, which is Operator_Interface.impl. Each fault gives an error. The operator interface's fault gives an OutOfRangeTemp error.
Each error in the isolette system has a chance of harming the infant based on the probability of occurrence for that type of error, represented as (p=0.1). ASAM compares the p value with the probability of occurrence thresholds to find the probability of the hazard occurring; it found (probable), so the hazard level is (critical). The report tells us what unsafe action the operator interface performs for that type of specific error, which is "The operator interface sends out-of-range temperature error values to the thermostat". Also, the report shows the general and specific causes for that type of error, which are "general: the operator interface has an internal failure; specific: the operator interface accidentally receives incorrect setting values from the nurse". Finally, the report shows the safety constraint for the stakeholders to know how to mitigate the effects of the error, which is "The operator interface device should not send incorrect values to the thermostat".

4.1.6 Summary

In this section, we have introduced four examples of varying complexity and scenarios for each example. For each scenario, we have provided an instantiation of the ASAM annex representing the introduced three-way communication format scenarios as well as a report of the output produced by the ASAM evaluator. The purpose of these examples is to demonstrate that ASAM finds more hazards using the three-way communication format as opposed to using the two-way format. ASAM identifies hazards for the system scenarios based on the following parameters: component specification and implementation information, error ontology, probability values, probability thresholds, hazard levels, unsafe control actions, causes of unsafe control actions and safety constraints. For each scenario, we have also demonstrated not only the ability to represent each scenario in the architecture model but also the ability to evaluate the representation to produce reports.
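The probability-threshold comparison used throughout these reports can be sketched as a lookup table. The threshold values and category names below are illustrative guesses, chosen only so that the two reported cases (p=0.1 yielding "probable" and p=0.01 yielding "occasional") classify as stated in the text; ASAM's actual thresholds are not reproduced here.

```python
# Hypothetical (lower bound, category) pairs, highest bound first.
THRESHOLDS = (
    (0.1, "probable"),
    (0.01, "occasional"),
    (0.001, "remote"),
)

def occurrence(p):
    """Map a probability of occurrence onto its qualitative category."""
    for bound, category in THRESHOLDS:
        if p >= bound:
            return category
    return "improbable"

assert occurrence(0.1) == "probable"      # as in the isolette report
assert occurrence(0.01) == "occasional"   # as in the pacemaker report
```

The hazard level then follows from combining this occurrence category with the error's severity, as the reports do.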
4.2 Demonstrating Error Mitigation by Safety Constraints

Claim C2: Each error flow in ASAM assists in identifying a safety constraint of appropriate breadth to eliminate the unsafe control actions caused by an identified error. In addition, each error flow can guide us to identify specific causes for the unsafe control action.

4.2.1 Evaluation Plan

To demonstrate claim C2, we take the previous complex examples that we analyzed with respect to the scenarios and we perform error mitigation analysis by safety constraints on each example. For that purpose, we will use ASAM to focus on the error behavior of the components in the feedback control loop architecture and follow each error, port by port and component by component, until it reaches its destination. This following of the error allows us to identify the source of the hazard and provides safety constraints to mitigate it. We will evaluate our selected examples that were described previously to support the scenarios. In these examples, we need to identify the major component types (i.e. sensor, controller, actuator, controlled process), the component's internal failure which causes errors on the output ports, and the component's safety constraints needed to handle errors through in ports and prevent errors through out ports. Then, we will produce an error mitigation analysis report for each example to ensure that each error has been mitigated by the safety constraint in each specific component.

4.2.2 Error Mitigation Analysis for Adaptive Cruise Control System

We will now provide and discuss error mitigation by safety constraint for the implementation part of each individual core component of the feedback control loop architecture for the ACC system.

  system ACC_sensor
    features
      ACC_State_s: in data port;
      ACC_Speed_s: out data port;
      ACC_Distance_s: out data port;
  end ACC_sensor;

  system implementation ACC_sensor.impl
    annex asam {**
      -- Identify the component type
      type => sensor;
      -- ACC sensor's internal failure causes errors on out ports
      internal failure INTF1: "ACC sensor has internal failure" causes
        [SensorReadsErrorValues (Critical, p=0.02)] on [ACC_Speed_s, ACC_Distance_s];
    **};
  end ACC_sensor.impl;

Listing 4.17: ASAM Error Mitigation Analysis for ACC System

In listing 4.17, we have represented the ACC sensor interfaces: specification and implementation. The ACC sensor specification interface is already described in section 4.1.2. The ACC sensor implementation interface shows that the component is the sensor. Also, it shows the ACC sensor's internal failure, which causes SensorReadsErrorValues on the output ports ACC_Speed_s and ACC_Distance_s. The port ACC_Speed_s represents an interface which sends the measured values of the speed of your own car as compared to the car in front. The port ACC_Distance_s represents an interface which sends the measured values of the distance between your own car and the car in front.

  system ACC_controller
    features
      ACC_Speed_c: in data port;
      ACC_Distance_c: in data port;
      ACC_Speed_cmd: out data port;
      ACC_Distance_cmd: out data port;
  end ACC_controller;

  system implementation ACC_controller.impl
    annex asam {**
      -- Identify the component type
      type => controller;
      -- ACC controller has an internal failure that causes errors on out ports.
      internal failure INTF2: "ACC controller has an internal failure" causes
        [ControllerComputeErrorValues (Catastrophic, p=0.03)] on
        [ACC_Speed_cmd, ACC_Distance_cmd];
      -- Error comes from the ACC sensor and causes errors on out ports.
      [SensorReadsErrorValues] on [ACC_Speed_c, ACC_Distance_c] causes
        [SensorReadsErrorValues (Catastrophic, p=0.03)] on
        [ACC_Speed_cmd, ACC_Distance_cmd];
      -- ACC controller has a SC to handle error(s) on in ports.
      safety constraint SC1: "The ACC controller should handle errors that
        come from the ACC sensor" handles [SensorReadsErrorValues] on
        [ACC_Speed_c, ACC_Distance_c];
    **};
  end ACC_controller.impl;

Listing 4.18: ASAM Error Mitigation Analysis for ACC System

In listing 4.18, we show the ACC controller's specification and implementation. The ACC controller specification interface is already described in section 4.1.2. The ACC controller implementation interface shows that the component is the controller. Also, it shows that the ACC controller's internal failure causes ControllerComputeErrorValues on the output ports ACC_Speed_cmd and ACC_Distance_cmd. That error affects the ACC controller's decision by giving incorrect commands to the actuator, which is catastrophic. At the same time, the ACC sensor's error, SensorReadsErrorValues, is propagated to the ACC controller's in ports ACC_Speed_c and ACC_Distance_c through the error propagation path. In this case, we provide a safety constraint to handle the ACC controller's incoming errors. Also, the ACC controller does not let the sensor's error propagate to the next component.

  system ACC_actuator
    features
      ACC_EXE_Speed_a: in data port;
      ACC_EXE_Distance_a: in data port;
      Unsafe_Action_a: out data port;
  end ACC_actuator;

  system implementation ACC_actuator.impl
    annex asam {**
      -- Identify the component type
      type => actuator;
      -- ACC actuator's internal failure causes errors on out ports
      internal failure INTF3: "The ACC actuator has an internal failure"
        causes [ActuatorErrorValues (Marginal, p=0.04)] on [Unsafe_Action_a];
      -- Error comes from the ACC controller and causes error on out ports.
      [ControllerComputeErrorValues] on [ACC_EXE_Speed_a, ACC_EXE_Distance_a]
        causes [ControllerComputeErrorValues (Critical, p=0.1)] on
        [Unsafe_Action_a];
      -- ACC actuator has a SC to handle incoming errors from controller
      safety constraint SC2: "The ACC handles the error that comes from the
        ACC controller" handles [ControllerComputeErrorValues] on
        [ACC_EXE_Speed_a, ACC_EXE_Distance_a];
      -- ACC actuator has a SC to prevent its errors to out
      safety constraint SC3: "The ACC prevents the error of the actuator"
        prevents [ActuatorErrorValues] on [Unsafe_Action_a];
    **};
  end ACC_actuator.impl;

Listing 4.19: ASAM Error Mitigation Analysis for ACC System

In listing 4.19, we show the ACC actuator's specification and implementation. The ACC actuator specification interface is already described in section 4.1.2. The ACC actuator implementation interface shows that the component is the actuator. Also, it shows the ACC actuator's internal failure, which causes ActuatorErrorValues on an output port, Unsafe_Action_a.
At the same time, the ACC controller's error, ControllerComputeErrorValues, is propagated to the ACC actuator's in ports ACC_EXE_Speed_a and ACC_EXE_Distance_a through the error propagation path. In this case, we need to provide two safety constraints, one to handle the ACC actuator's incoming errors, the other to prevent the errors of the ACC actuator itself.

Figure 4.9: ASAM's Error Mitigation report for ACC System Architecture

Figure 4.9 shows ASAM's generated report about error mitigation analysis by safety constraints for the adaptive cruise control system architecture. ASAM generates the report based on the following information: the internal failure for each major component, the error each internal failure causes, the probability value of error occurrence, the probability thresholds used to identify which hazard probability could occur, the major component implementation, the major component specification, the propagation path used to identify the error mitigation process step by step, the error propagation specification which gathers information about each error, the error mitigated by safety constraint, the mitigated error in the component's implementation and the mitigated error in the component's specification. We iteratively explain figure 4.9 below:

1. Internal failure (INTF1) recorded in the ACC sensor causes SensorReadsErrorValues in ACC_Distance_s and ACC_Speed_s. The probability of occurrence for that error in measuring distance and speed is (0.02). That error gives a critical level of hazard for the ACC system because the ACC sensor reads incorrect values of distance and speed for the car in front. In the error propagation path, we show that these two error values are propagated from sensor to controller [ACC_sensor.impl → ACC_controller.impl]. Also, in the error propagation type, we gather information about each error such as the error name, the port to propagate, the hazard's severity level and the p value. In this case, the controller has enough information about each propagated error.
To mitigate each error, the ACC controller has a safety constraint (SC). For example, "SC1: The ACC controller should handle errors that come from the sensor" is used to mitigate the effects of the error in measuring distance and speed. This mitigation process has been executed in ACC_controller.impl. Finally, we show that the errors are propagated from the sensor (S), or ACC_sensor.impl, as a source and have been mitigated in the controller (C), or ACC_controller.impl, as a destination.

2. Previously, we showed how the ACC controller mitigates the incoming errors. But what will happen if the ACC controller itself has an internal failure? How does the ACC controller solve this problem? For that purpose, we have recorded an internal failure (INTF2) in the ACC controller which causes an error, ControllerComputeErrorValues, in the commands ACC_Distance_cmd and ACC_Speed_cmd. The probability of occurrence for that error in the commands is (0.03). That error is catastrophic because the ACC controller's command is considered an inadequate command. In the error propagation path, we show that these two error values have been propagated from controller to actuator [ACC_controller.impl → ACC_actuator.impl]. In addition, in the error propagation information, we collect information about each error such as the error name, the port to propagate, the hazard's severity level and the p value. In this case, the actuator has enough information about each error propagated from the controller. To mitigate each error on its in ports, the ACC actuator has a safety constraint (SC) for it. For example, "SC2: The ACC actuator handles the errors that come from the ACC controller" is used to mitigate the effects of the error in the controller's commands. This mitigation process has been executed in ACC_actuator.impl as a destination point. Finally, we show that the errors in the commands are propagated from the controller (C), or ACC_controller.impl, and have been mitigated in the actuator (A), or ACC_actuator.impl.

3.
Now we know how the ACC controller handles incoming errors from the ACC sensor, and we have also shown how the ACC actuator handles the incoming errors from the ACC controller. But what will happen if the ACC actuator itself has an internal failure? How does this impact the controller's commands? For that purpose, we have recorded an internal failure (INTF3) in the ACC actuator which causes ActuatorErrorValues. The probability of occurrence for that error during execution is (0.04). We consider the severity level for that error to be marginal because the error affects the controller's commands during execution and leads to producing an unsafe action in Unsafe_Action_a. Previously, we showed how the ACC actuator mitigates the incoming propagated errors from the ACC controller. Now, we need to show that the ACC actuator prevents errors that have been created by itself. In the error propagation path, we show that the source of the error is ACC_actuator.impl and we have enough information about it. To prevent that error from propagating out, the ACC actuator has a safety constraint (SC) for it. For example, "SC3: The ACC actuator prevents the error of the actuator itself" is used to mitigate the effects of the actuator's own error. This mitigation process has been executed in ACC_actuator.impl as a destination point. Finally, we show that the errors produced in the actuator (A), or ACC_actuator.impl, have been mitigated in the actuator (A), or ACC_actuator.impl, itself.

4. What will happen if ACC_actuator.impl is not able to prevent the ActuatorErrorValues error from propagating out? In figure 4.10, the error propagation path shows that the ActuatorErrorValues error is propagated from the actuator to the sensor through the controlled process, [ACC_actuator.impl → controlled_process.impl → ACC_sensor.impl]. This means Unsafe_Action_a of ACC_actuator.impl updates ACC_sensor.impl's information with incorrect values of speed and distance.
This leads to ACC_controller.impl not being able to control the speed and distance of the car in front because of ACC_actuator.impl's internal failure. To show that the error propagated and has not been mitigated, figure 4.10 contains empty cells in some specific parameters such as mitigated error by safety constraint, mitigated in component name and mitigated by component type.

Figure 4.10: ASAM's report for Non-Mitigated Error in ACC system

4.2.3 Error Mitigation Analysis for Train Automated Door Control System

We will now provide and discuss error mitigation by safety constraint for the implementation of each individual core component of the feedback control loop architecture for the train automated door control system.

  system Door_sensor
    features
      door_open_s: out data port;
      door_close_s: out data port;
      door_state_s: in data port;
      door_close_on_person: out data port;
  end Door_sensor;

  system implementation Door_sensor.impl
    annex asam {**
      -- Identify type of major component
      type => sensor;
      -- Door sensor has an internal failure that causes error(s) on out ports.
      internal failure INTF1: "door reports it is closed when a person is in
        the doorway" causes [DoorSensorCommissionError (Critical, p=0.02)] on
        [door_close_on_person, door_open_s, door_close_s];
      -- Door sensor has a safety constraint to prevent error out.
      safety constraint SC1: "door should be opened when the person is in the
        doorway" prevents [DoorSensorCommissionError] on
        [door_open_s, door_close_s];
    **};
  end Door_sensor.impl;

Listing 4.20: ASAM Error Mitigation Analysis for Train Automated Door Control System

In listing 4.20, we have represented the Door sensor specification and implementation. In the specification part, the interface has three out data ports: door_open_s and door_close_s are used to send the open and close status of the door to the controller, and door_close_on_person is used to send the status of the door to the controller when an object is in the doorway. The interface also has one in data port, door_state_s, that allows the sensor to receive the last updated values from the physical door's feedback about the door status. In the implementation part, the interface shows that the component's major type is sensor. Also, it shows the Door sensor's internal failure, which causes DoorSensorCommissionError on the output ports door_close_on_person, door_open_s and door_close_s. We have also represented the probability of occurrence of the error and the hazard's severity level. Also, we need to provide a safety constraint, SC1, for the Door sensor interface to prevent DoorSensorCommissionError through the output ports.

  system Door_controller
    features
      door_open_c: in data port;
      door_close_c: in data port;
      open_door_cmd: out data port;
      close_door_cmd: out data port;
      door_state_c: in data port;
      door_open_train_not_stopped: out data port;
  end Door_controller;

  system implementation Door_controller.impl
    annex asam {**
      -- Identify type of major component
      type => controller;
      -- Door controller has an internal failure that causes error(s) on out ports.
      internal failure INTF2: "door reports it is opened when the train is
        not stopped" causes [DoorControllerCommissionError (Marginal, p=0.04)]
        on [door_open_train_not_stopped, close_door_cmd, open_door_cmd];
      -- Error comes from door sensor and causes error on controller's output ports.
      [DoorSensorCommissionError] on [door_open_c, door_close_c, door_state_c]
        causes [DoorSensorCommissionError (Catastrophic, p=0.002)] on
        [door_open_train_not_stopped, close_door_cmd, open_door_cmd];
      -- Door controller has safety constraints to handle incoming error and
      -- prevent outgoing error.
      safety constraint SC2: "doors should be opened when the person is in the
        doorway" handles [DoorSensorCommissionError] on
        [door_open_c, door_close_c, door_state_c];
      safety constraint SC3: "doors should be closed when the train is not
        stopped at the station platform" prevents
        [DoorControllerCommissionError] on [close_door_cmd, open_door_cmd];
    **};
  end Door_controller.impl;

Listing 4.21: ASAM Error Mitigation Analysis for Train Automated Door Control System

In listing 4.21, we have represented the Door controller specification and implementation. In the specification part, Door_controller has three in data ports: door_open_c and door_close_c are used to receive the open and close door status from the sensor, respectively, and door_state_c is used to receive any abnormal status of the door, such as an object in the doorway while the door is closed.
Also, Door controller has three out data ports, open door cmd and close door cmd are used to send open and close door commands to actuator, door open train not stopped is used to send abnormal status of the door to actuator such as door train is opened before train is stopped at the station platform. In the implementation part, Door controller shows the type of the component is the controller. Also, it shows the Door controller’s internal failure which causes DoorControllerCommissionError on output ports door open train not stopped, open door cmd and close door cmd. This error effects the Door controller’s decision by giving inadequate commands to door actuator. At the same time, Door sensor error, DoorSensorCommisionError, is propagated from Door controller’s in ports door open c, door close c and door state c through the error propagation path. This effects the Door controller’s decision as well. In this case, we need to provide two safety constraints, SC2 and SC3, to mitigate the effects of the incoming and outgoing errors respectively. 1 2 system D o o r a c t u a t o r features 3 open door a : in data port ; 4 close door a : in data port ; 5 door state a : in data port ; 6 door blocked in emergency : out data port ; 7 end D o o r a c t u a t o r ; 8 9 10 11 12 13 14 system implementation D o o r a c t u a t o r . impl a n n e x asam {∗∗ −− I d e n t i f y t y p e o f m a j o r component t y p e => a c t u a t o r ; −−Door a c t u a t o r h a s i n t e r n a l f a i l u r e c a u s e s e r r o r on o u t p o r t s i n t e r n a l f a i l u r e INTF3 : ” d o o r r e p o r t s i t i s b l o c k e d d u r i n g e m e r g e n c y ” c a u s e s [ D o o r A c t u a t o r C o m i s s i o n E r r o r ( C a t a s t r o p h i c , p = 0 . 2 ) ] on [ door blocked in emergency ] ; 15 16 −−E r r o r comes from d o o r c o n t r o l l e r and c a u s e s e r r o r on o u t p o r t . 
    [DoorControllerCommissionError] on [open_door_a, close_door_a, door_state_a] causes
      [DoorControllerCommissionError(Critical, p = 0.1)] on [door_blocked_in_emergency];
    -- Door actuator has safety constraints to handle incoming errors and prevent outgoing errors.
    safety constraint SC4: "doors should be closed before train is moving and doors should be opened after train is stopped at the station platform"
      handles [DoorControllerCommissionError] on [open_door_a, close_door_a, door_state_a];
    safety constraint SC5: "doors should be opened during emergency"
      prevents [DoorActuatorCommissionError] on [door_blocked_in_emergency];
  **};
end Door_actuator.impl;

Listing 4.22: ASAM Error Mitigation Analysis for Train Automated Door Control System

In listing 4.22, we have represented the Door_actuator specification and implementation. From the specification perspective, Door_actuator has three in data ports: open_door_a and close_door_a are used to receive the open and close door commands from the controller, and door_state_a is used to receive any abnormal status of the door, such as the doors not being closed before the train moves. Also, Door_actuator has an out data port, door_blocked_in_emergency, which is used to send an abnormal status of the door to the controlled process, such as the doors being blocked during an emergency, which could leave people stuck inside the train. From the implementation perspective, the Door_actuator shows the component's major type, which is actuator.
Also, it shows the Door_actuator's internal failure, which causes DoorActuatorCommissionError on the output port door_blocked_in_emergency. That error affects the Door_actuator's execution of commands. At the same time, the Door_controller's error, DoorControllerCommissionError, is propagated to the Door_actuator's in ports open_door_a, close_door_a and door_state_a through the error propagation path. Now, the Door_actuator receives two types of errors: one from the controller of the door and the other generated by the actuator itself. In this situation, we need to provide two safety constraints, SC4 and SC5, to mitigate the effects of the incoming and outgoing errors, respectively.

Figure 4.11: ASAM's Error Mitigation report for Train Automated Door Control System Architecture

Figure 4.11 shows ASAM's generated report about the error mitigation analysis by safety constraints for the train automated door control system architecture. ASAM generates the report from the following information: the internal failure of each major component, the error each internal failure causes, the probability of the error's occurrence, the probability thresholds used to identify which hazard could occur, the major component's implementation, the major component's specification, the propagation path that identifies the error mitigation process step by step, the error propagation specification that collects information about each error, the error mitigated by a safety constraint, the component implementation in which the error is mitigated, and the component specification in which the error is mitigated. We explain figure 4.11 item by item below:

1. The internal failure (INTF1) recorded in the Door_sensor causes DoorSensorCommissionError on its output ports. This error leads to closing the door on a person in the doorway. The probability of occurrence for this error, the sensor misreading an object, is 0.02. This error represents a critical hazard level for people, identified before the train door system is developed, because the Door_sensor does not detect objects in the doorway.
In the error propagation path, we show that the error is propagated from the sensor to the controller [Door_sensor.impl → Door_controller.impl]. Also, in the error propagation specification, the controller gathers information about the error, such as the name of the error, the port on which it propagates, the hazard severity level of the error and the p value. To mitigate the propagated error, the Door_controller has safety constraints (SC). For example, "SC2: doors should be opened when the person is in the doorway" is used to mitigate the effects of misreading objects. This mitigation process is executed in Door_controller.impl. Finally, we show that the error is propagated from the sensor (S), Door_sensor.impl, as the source of the error and is mitigated in the controller (C), Door_controller.impl, as the destination of the error.

2. Previously, we showed how the Door_controller mitigates incoming errors from the Door_sensor. But what happens if the Door_controller itself has an internal failure? How can the Door_controller solve this problem? For that purpose, we have recorded an internal failure (INTF2) in the Door_controller which causes the error DoorControllerCommissionError in its commands. This error can let the controller open the doors before the train has stopped or after the train has started. The probability of occurrence for this error is 0.03. That p value gives a marginal severity level for people. In the error propagation path, we show that the error is propagated from the controller to the actuator [Door_controller.impl → Door_actuator.impl]. Also, in the error propagation information, the controller gathers information about the error, such as the name of the error, the port on which it propagates, the hazard severity level of the error and the p value, and sends it to the actuator. Now, to mitigate the propagated error in the actuator, the Door_actuator provides a safety constraint for it.
For example, "SC4: doors should be closed before train is moving and doors should be opened after train is stopped at the station platform" is used to mitigate the effects of the error in the controller's command. This mitigation process is executed in Door_actuator.impl as the destination point. Finally, we show that the source of the error is the door controller (C), Door_controller.impl, and that it is mitigated in the door actuator (A), Door_actuator.impl.

3. Now, we know how the Door_controller handles incoming errors from the Door_sensor, and we have also shown that the Door_actuator handles incoming errors from the Door_controller. But what happens if the Door_actuator itself has an internal failure? For that purpose, we have recorded an internal failure (INTF3) in the Door_actuator which causes DoorActuatorCommissionError. This error leads to blocking the doors during an emergency. The probability of occurrence for this error during execution is 0.04. That p value gives a catastrophic severity level. Previously, we showed how the Door_actuator mitigates incoming propagated errors from the Door_controller. Now, we need to show how the Door_actuator prevents errors that it produces itself. In the error propagation path, we show that the source of the error is Door_actuator.impl and that the actuator has sufficient information about the error, such as the name of the produced error, the port on which it propagates, the p value and the hazard severity level. To prevent DoorActuatorCommissionError, the Door_actuator has a safety constraint for it. For example, "SC5: doors should be opened during emergency" is used to mitigate the effects of the error. This mitigation process is executed in Door_actuator.impl as the destination point. Finally, we show that the produced error originates in the actuator (A), Door_actuator.impl, and is mitigated in the same component, Door_actuator.impl.

4. What happens if the Door_actuator is not able to prevent DoorActuatorCommissionError?
As shown in figure 4.12, the error propagation analysis shows that the error has been propagated to Door_sensor.impl through Physical_Door.impl. This means the Door_actuator updates the Door_sensor's information incorrectly. As a result, the Door_controller is not able to open the door during an emergency, because the sensor continuously provides incorrect data values to the controller, and the controller performs incorrect computations on that data, causing it to send inadequate commands that do not lead to opening the physical door. To confirm that the error is propagated and has not been mitigated, figure 4.12 shows empty cells for some specific parameters, such as the error mitigated by safety constraints, the component name in which it is mitigated, and the component type by which it is mitigated.

Figure 4.12: ASAM's report for Non-Mitigated Error in Train Door

4.2.4 Error Mitigation Analysis for Medical Embedded Device

We will now provide and discuss error mitigation by safety constraints for the implementation of each individual core component of the feedback control loop architecture of the medical embedded device, the pacemaker.

system Electrode
features
  Measure_Sensing: out event data port;
  Heart_Response: in event data port;
end Electrode;

system implementation Electrode.impl
  annex asam {**
    -- Identify type of major component
    type => sensor;
    -- Electrode is the sensor of the pacemaker. The internal failure in it causes electrode error.
    internal failure INTF1: "The electrode does not measure sensing values on time" causes
      [LateMeasureSensing(Critical, p = 0.02)] on [Measure_Sensing];
  **};
end Electrode.impl;

Listing 4.23: ASAM Error Mitigation Analysis for pacemaker

In listing 4.23, we have represented the Electrode sensor specification and implementation. From the specification perspective, the interface has two ports, Measure_Sensing and Heart_Response. Measure_Sensing is an out event data port used to send the measured sensing values to the controller. Heart_Response is an in event data port used to receive the heart response values whenever the heart is paced. From the implementation perspective, we have identified the Electrode's major type, which is sensor. Also, we show that the Electrode has an internal failure which causes the LateMeasureSensing error on the out event data port Measure_Sensing. Additionally, we have specified the probability of occurrence for this error and the hazard severity level.

system Device_Controller_Monitor
features
  Heart_Monitoring: in event data port;
  Send_Pulse_Command: out event data port;
end Device_Controller_Monitor;

system implementation Device_Controller_Monitor.impl
  annex asam {**
    -- Identify type of major component
    type => controller;
    -- DCM is the controller of the pacemaker. The internal failure in it causes DCM error.
    internal failure INTF2: "The DCM does not send pacing command on time to the pulse generator" causes
      [LatePacingCommand(Critical, p = 0.03)] on [Send_Pulse_Command];
    -- The Electrode error propagates to DCM.
    [LateMeasureSensing] on [Heart_Monitoring] causes
      [LateMeasureSensing(Critical, p = 0.002)] on [Send_Pulse_Command];
    -- DCM has a SC to mitigate errors coming from the electrode
    safety constraint SC1: "The DCM should receive heart sensing values on time to send the pace command on time as well"
      handles [LateMeasureSensing] on [Heart_Monitoring];
  **};
end Device_Controller_Monitor.impl;

Listing 4.24: ASAM Error Mitigation Analysis for pacemaker

In listing 4.24, we have represented the Device_Controller_Monitor specification and implementation. From the specification perspective, the interface has two ports, Heart_Monitoring and Send_Pulse_Command. Heart_Monitoring is an in event data port used to monitor the measured sensing values of the heart. Send_Pulse_Command is an out event data port used to send the pulse command to the pulse generator. From the implementation perspective, we have identified the component's major type, which is controller. We show that the Device_Controller_Monitor has an internal failure which causes LatePacingCommand on the out event data port Send_Pulse_Command. This error affects the controller's decisions by issuing the pulse command late. At the same time, the Electrode sensor error, LateMeasureSensing, is propagated to the Device_Controller_Monitor through the error propagation path. The sensor's propagated error affects the controller's decisions as well. In the controller, we need to provide a safety constraint, SC1, to handle the incoming error first.

system Pulse_Generator
features
  Receive_Pulse_Command: in event data port;
  Heart_Pacing: out event data port;
end Pulse_Generator;

system implementation Pulse_Generator.impl
  annex asam {**
    -- Identify type of major component
    type => actuator;
    -- The PG is the pulse generator.
    -- The internal failure in it causes PG errors.
    internal failure INTF3: "The pulse generator does not generate the required pulse whenever it is needed" causes
      [DelayedPacingService(Critical, p = 0.04)] on [Heart_Pacing];
    -- DCM error from controller causes DelayedService in PG.
    [LatePacingCommand] on [Receive_Pulse_Command] causes
      [LatePacingCommand(Critical, p = 0.1)] on [Heart_Pacing];
    safety constraint SC2: "The pulse generator should receive the pacing command on time from DCM"
      handles [LatePacingCommand] on [Receive_Pulse_Command];
    safety constraint SC3: "The pulse generator should not provide the late pacing service to the heart"
      prevents [DelayedPacingService] on [Heart_Pacing];
  **};
end Pulse_Generator.impl;

Listing 4.25: ASAM Error Mitigation Analysis for pacemaker

In listing 4.25, we have represented the Pulse_Generator specification and implementation. From the specification perspective, the interface has two ports, Receive_Pulse_Command and Heart_Pacing. Receive_Pulse_Command is an in event data port used to receive the pulse command from the controller. Heart_Pacing is an out event data port used to execute the received command, such as pacing the heart according to the controller's decision. From the implementation perspective, we have identified the component's major type, which is actuator. Also, we show that the Pulse_Generator has an internal failure which causes the DelayedPacingService error on the output port Heart_Pacing. That error affects the Pulse_Generator's execution of commands.
At the same time, the Device_Controller_Monitor error, LatePacingCommand, is propagated to the Pulse_Generator's in port Receive_Pulse_Command through the error propagation path. That error affects the execution of commands in the Pulse_Generator as well. Now, the Pulse_Generator receives two types of errors: one from the Device_Controller_Monitor and the other from the Pulse_Generator itself. In this case, we need to provide two safety constraints, SC2 and SC3, to mitigate the effects of the incoming and outgoing errors, respectively.

Figure 4.13: ASAM's Error Mitigation report for pacemaker

Figure 4.13 shows ASAM's generated report about the error mitigation analysis by safety constraints for the embedded medical device called a pacemaker. ASAM generates the report from the following information: the internal failure of each major component, the error each internal failure causes, the probability of the error's occurrence, the probability thresholds used to identify which hazard could occur, the major component's implementation, the major component's specification, the propagation path that identifies the error mitigation process step by step, the error propagation specification that collects information about each error, the error mitigated by a safety constraint, the component implementation in which the error is mitigated, and the component specification in which the error is mitigated. We explain figure 4.13 item by item below:

1. The internal failure (INTF1) recorded in the Electrode causes the LateMeasureSensing error on the output port Measure_Sensing. This error leads to reading the heart sensing values late. The probability of occurrence for this error, the sensor reading sense values late, is 0.02. This error represents a critical hazard level for the patient, identified before the pacemaker is developed, because the Electrode does not read sensing values on time in the operational system context.
In the error propagation path, we show that the error is propagated from the sensor to the controller [Electrode.impl → Device_Controller_Monitor.impl]. Also, in the error propagation specification, the controller gathers information about the propagated error, such as the name of the error, the port on which it propagates, the hazard severity level of the error and the p value. To mitigate the propagated error, Device_Controller_Monitor.impl has a safety constraint (SC). For example, "SC1: The DCM should receive heart sensing values on time to send the pace command on time as well" is used to mitigate the effects of late-read values. This mitigation process is executed in Device_Controller_Monitor.impl. Finally, we show that the error is propagated from the sensor (S), Electrode.impl, as the source of the error and is mitigated in the controller (C), Device_Controller_Monitor.impl, as the destination of the error.

2. Previously, we showed how Device_Controller_Monitor.impl mitigates incoming errors from Electrode.impl. But what happens if Device_Controller_Monitor.impl itself has an internal failure? How can Device_Controller_Monitor.impl solve this problem? For that purpose, we have recorded an internal failure (INTF2) in Device_Controller_Monitor.impl which causes the error LatePacingCommand. This error leads the controller to send the pace command late, because it receives the sensing values late as well. The probability of occurrence for this error is 0.03. That p value gives a critical severity level for the patient. In the error propagation path, we show that the error is propagated from the controller to the actuator [Device_Controller_Monitor.impl → Pulse_Generator.impl].
Also, in the error propagation information, the controller gathers information about the error, such as the name of the error, the port on which it propagates, the hazard severity level of the error and the p value, and sends it to the actuator. Now, to mitigate the propagated error in the actuator, Pulse_Generator.impl provides a safety constraint for it. For example, "SC2: The pulse generator should receive the pacing command on time from DCM" is used to mitigate the effects of the error in the controller's command. This mitigation process is executed in Pulse_Generator.impl as the destination point. Finally, we show that the source of the error is the controller (C), Device_Controller_Monitor.impl, and that it is mitigated in the actuator (A), Pulse_Generator.impl.

3. Now, we know how Device_Controller_Monitor.impl handles incoming errors from Electrode.impl, and we have also shown that Pulse_Generator.impl handles incoming errors from Device_Controller_Monitor.impl. But what happens if Pulse_Generator.impl itself has an internal failure? For that purpose, we have recorded an internal failure (INTF3) in Pulse_Generator.impl which causes the DelayedPacingService error. This error leads to delaying the pacing service to the heart. The probability of occurrence for this error during execution is 0.04. That p value gives a critical severity level. Previously, we showed how Pulse_Generator.impl mitigates incoming propagated errors from Device_Controller_Monitor.impl. Now, we need to show how Pulse_Generator.impl prevents errors that it produces itself. In the error propagation path, we show that the source of the error is Pulse_Generator.impl and that the actuator has sufficient information about the error, such as the name of the produced error, the port on which it propagates, the p value and the hazard severity level. To prevent the DelayedPacingService error, Pulse_Generator.impl has a safety constraint for it.
For example, "SC3: The pulse generator should not provide the late pacing service to the heart" is used to mitigate the effects of the error. This mitigation process is executed in Pulse_Generator.impl as the destination point. Finally, we show that the source of the error is the actuator (A), Pulse_Generator.impl, and that it is mitigated in Pulse_Generator.impl itself.

4. What happens if Pulse_Generator.impl is not able to prevent the DelayedPacingService error? As shown in figure 4.14, the error propagation analysis shows that the DelayedPacingService error has been propagated from the actuator to the sensor [Pulse_Generator.impl → Heart.impl → Electrode.impl]. This means that Pulse_Generator.impl updates Electrode.impl's information late. As a result, Device_Controller_Monitor.impl is not able to pace the heart when it is required because of Pulse_Generator.impl's internal failure. Suppose that Device_Controller_Monitor.impl then sends another pulse command to Pulse_Generator.impl late, and Pulse_Generator.impl executes the second command late as well. This accumulates the delay in the pacing service to the heart. This operation does not help to raise the patient's heart rate, because every sensing operation needs a pacing action on time. To confirm that the error is propagated and has not been mitigated, figure 4.14 shows empty cells for some specific parameters, such as the error mitigated by safety constraints, the component name in which it is mitigated, and the component type by which it is mitigated.

Figure 4.14: ASAM's report for Non-Mitigated Error in pacemaker

4.2.5 Summary

In this evaluation, we have examined three examples which contain different types of errors: value errors in the ACC example, service errors in the train door and timing errors in the pacemaker. Each example has been evaluated based on mitigated and non-mitigated errors.
The results of each example have been shown in the ASAM-generated reports. The purpose of this evaluation is to demonstrate that ASAM mitigates errors by safety constraints for different kinds of safety-critical system examples. ASAM generates a report from the following information: the internal failure identified for each major component, the error each internal failure causes, the probability of the error's occurrence, the thresholds used to identify which hazard could occur, the major component's implementation and specification parts, the propagation path that identifies the error mitigation process step by step, and the mitigation of error effects by safety constraints in the components' implementation parts. Finally, in each example, we have also demonstrated ASAM's ability to handle incoming errors and prevent outgoing errors by safety constraints for each specific component in the feedback control loop architecture.

4.3 Demonstrating Safety Constraints Verification

Claim C3: ASAM uses verification, unlike existing safety analysis methods, to further ensure that a safety constraint fully mitigates the hazardous condition, improving the quality of the safety constraints.

4.3.1 Evaluation Plan

To demonstrate claim C3, we use ASAM to verify the safety constraints against the system model with injected errors. This allows us to determine either that unsafe behavior occurs (which means the error leads to a hazardous condition in the system) or that the error does not lead to unsafe behavior. To evaluate this claim, we need to add a new piece, verified by [Guarantee Identifier], to the safety constraint statements of ASAM and pass the identifiers to the guarantee statements in XAGREE (eXtended Assume Guarantee REasoning Environment), which is described in section 2.4. The identifier of the guarantee statement is used in the verified by statements of ASAM.
When we invoke either activity (report or analysis) in ASAM, XAGREE is spawned in the background, and as each component of the feedback control loop is encountered, a verify all activity in XAGREE is spawned. ASAM waits for XAGREE to complete the analysis and then matches each safety constraint with the XAGREE results. If XAGREE verified the guarantee(s) associated with a safety constraint statement, the safety constraint is reported as verified. If any guarantee associated with a safety constraint fails verification, the safety constraint is marked as unverified.

4.3.2 Safety Constraints Verification for Train Automated Door Control System

We will provide and discuss safety constraint verification for the feedback control loop architecture of the train automated door control system. We will verify the safety constraints that have been placed in the implementation of each component. The specification part of each component ensures that the component does not have unsafe behavior by listing the verification conditions using XAGREE. In this example, we will verify three important safety constraints of the train door:

1. SC1: The train door should not be blocked when it is opened.

2. SC2: The train door should not be blocked when it is closed.

3. SC3: The train door cannot be opened and closed at the same time (i.e., open and then close, or close and then open, not both together).
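Read as predicates over the door's state, these three constraints amount to simple boolean invariants. The following sketch is illustrative only; the three-flag state encoding and the function names are our assumptions for this sketch and are not part of ASAM or XAGREE, which express these conditions over AADL ports instead:

```python
# Illustrative boolean reading of the three train-door safety constraints.
# The state encoding (three flags) is an assumption made for this sketch.

def sc1(door_open: bool, door_blocked: bool) -> bool:
    # SC1: the train door should not be blocked when it is opened.
    return not (door_open and door_blocked)

def sc2(door_closed: bool, door_blocked: bool) -> bool:
    # SC2: the train door should not be blocked when it is closed.
    return not (door_closed and door_blocked)

def sc3(door_open: bool, door_closed: bool) -> bool:
    # SC3: the train door cannot be opened and closed at the same time.
    return not (door_open and door_closed)

def door_state_is_safe(door_open: bool, door_closed: bool,
                       door_blocked: bool) -> bool:
    # A state is safe only if all three constraints hold together.
    return (sc1(door_open, door_blocked)
            and sc2(door_closed, door_blocked)
            and sc3(door_open, door_closed))
```

For instance, a state in which the door is simultaneously reported open and blocked violates SC1, so `door_state_is_safe(True, False, True)` is false.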
system door_sensor
features
  door_blocked_s: in data port Base_Types::Integer;
  door_open_s: out data port Base_Types::Boolean;
  door_close_s: out data port Base_Types::Boolean;

  annex xagree {**
    -- Identify door sensor input states
    assume "Door in block states": door_blocked_s >= 0;
    -- Identify door sensor output identifiers
    GAU1: guarantee "Door is blocked only if it opens":
      (door_blocked_s = 1 and door_open_s = true) or door_open_s = false;
    GAU2: guarantee "Door is blocked only if it closes":
      (door_blocked_s = 0 and door_close_s = true) or door_close_s = false;
  **};
end door_sensor;

Listing 4.26: XAGREE Identifiers for Train Door Sensor Specification

In listing 4.26, we have presented the door_sensor specification. Notice that identifiers have been annotated in the XAGREE annex. In the specification, the interface has two out data ports, door_open_s and door_close_s, which are used to send the open and close status of the door to the controller. The interface also has an in data port, door_blocked_s, that allows the sensor to receive the blocked status value from the physical door sensors. In the XAGREE annex, we have preconditions for the in data port. Also, we have postconditions and identifiers, GAU1 and GAU2, for the out data ports.

system implementation door_sensor.impl
  annex asam {**
    type => sensor;
    Errs => [{
      Type => ValueOpenedError(Critical, p = 0.1),
      UCA => UCA1: "Door report it is both opened and blocked",
      Causes => {
        General => "A signal was generated wrong when door is opened"
      },
      -- Verify door sensor outputs by safety constraint identifiers.
      SC => SC1: "Door should not be blocked when the door is opened" verified by [GAU1]
    },
    {
      Type => ValueClosedError(Critical, p = 0.1),
      UCA => UCA2: "Door report it is both closed and blocked",
      Causes => {
        General => "A signal was generated wrong when door is closed"
      },
      -- Verify door sensor outputs by safety constraint identifiers.
      SC => SC2: "Door should not be blocked when the door is closed" verified by [GAU2]
    }]
  **};
  -- Satisfy each status of the door sensor
  annex xagree {**
    assign door_open_s = (door_blocked_s = 1);
    assign door_close_s = (door_blocked_s = 0);
  **};
end door_sensor.impl;

Listing 4.27: ASAM and XAGREE for Train Door Sensor Implementation

In listing 4.27, we have presented ASAM and XAGREE in the door_sensor implementation. ASAM matches the guarantee identifiers, GAU1 and GAU2, to the guarantee statements in the XAGREE specification shown in listing 4.26. These identifiers are used to verify the safety constraints SC1 and SC2, respectively, to ensure that the train door is not blocked when it is opened or closed. When we call XAGREE, it checks all activities in the specification and implementation parts of the door_sensor and returns the result to ASAM. ASAM waits for the XAGREE response and then matches each safety constraint with the XAGREE results.
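This matching step can be pictured as a lookup from each safety constraint's verified by list into the per-guarantee results returned by XAGREE. The sketch below only illustrates the idea; the data structures and function names are our own and do not reflect the actual ASAM implementation:

```python
# Sketch of matching safety constraints to XAGREE guarantee results.
# A constraint is reported as verified only if every guarantee named in
# its "verified by" clause was proven by XAGREE; otherwise it is marked
# unverified. Names and structures here are illustrative assumptions.

def match_safety_constraints(verified_by, xagree_results):
    """verified_by: {constraint id: [guarantee ids]}
    xagree_results: {guarantee id: True/False as reported by XAGREE}
    Returns {constraint id: 'verified' or 'unverified'}."""
    report = {}
    for sc, guarantees in verified_by.items():
        # A missing result counts as a failed guarantee.
        ok = all(xagree_results.get(g, False) for g in guarantees)
        report[sc] = "verified" if ok else "unverified"
    return report

# SC1 and SC2 are tied to GAU1 and GAU2 as in the door sensor listings.
links = {"SC1": ["GAU1"], "SC2": ["GAU2"]}
results = {"GAU1": True, "GAU2": True}
print(match_safety_constraints(links, results))
```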
If XAGREE verifies the guarantee identifiers, GAU1 and GAU2, associated with the safety constraints SC1 and SC2, then ASAM reports that the safety constraints have been verified.

-- The data message comes from the door sensor; it contains two status flags at the same time
data message
end message;

data implementation message.impl
subcomponents
  open_door_c: data base_types::Boolean;
  close_door_c: data base_types::Boolean;
end message.impl;

-- The door controller specification
system door_controller
features
  door_open_and_close_msj: in data port message.impl;
  door_blocked_c: in data port Base_Types::Integer;
  open_door_c: out data port Base_Types::Boolean;
  close_door_c: out data port Base_Types::Boolean;
  annex xagree {**
    assume "Door states in": door_blocked_c <= 2 and door_blocked_c >= 0;
    GAU3: guarantee "Door cannot be opened and closed at the same time":
      not (open_door_c and close_door_c);
  **};
end door_controller;

Listing 4.28: XAGREE for Train Door Controller Specification

In listing 4.28, we present a message object, read by the door sensor, containing two status flags at the same time. In this example, only one of the boolean flags of the message should be set at a time; there should not be an instance where both are set. Also, we present the door_controller specification. In the specification, the interface has two out data ports, open_door_c and close_door_c, which are used to send the commands to open and close the train door. The interface also has two in data ports, door_open_and_close_msj and door_blocked_c.
The door_open_and_close_msj port receives the data messages that have been read by the door sensor. The door_blocked_c port receives the blocked status of the train door. In the XAGREE annex, we have a precondition for the in data port. We also have a postcondition ensuring that the door controller does not send both the open and close door commands at the same time.

    system implementation door_controller.impl
      annex asam {**
        type => controller;
        Errs => [{
          Type => ValueOpendAndClosedError (Critical, p = 0.2),
          UCA => UCA3 : "Door report it is in two states (open and close) at the same time",
          Causes => {
            General => "A signal was generated wrong when door is closed and opened"
          },
          -- Verify the safety constraint identifier
          SC => SC3 : "Door controller should not send open and close door commands
            at the same time" verified by [GAU3]
        }]
      **};
      annex xagree {**
        -- Satisfy that open and close have the same status
        eq door_open_msj_c : bool = false -> if
          ((door_open_and_close_msj.open_door_c = true) and
           (door_open_and_close_msj.close_door_c = false)) then true else false;
        eq door_close_msj_c : bool = false -> if
          ((door_open_and_close_msj.close_door_c = true) and
           (door_open_and_close_msj.open_door_c = false)) then true else false;
        -- Satisfy that the data message does not lead to blocking the door
        -- when it is opened or closed
        assert (open_door_c = (door_open_msj_c and (door_blocked_c <= 2)));
        assert (close_door_c = (door_close_msj_c and (door_blocked_c >= 0)));
      **};
    end door_controller.impl;

Listing 4.29: ASAM and XAGREE for Train Door Controller Implementation

In listing 4.29, we have both the ASAM and XAGREE annexes in the door controller implementation. ASAM refers to the identifier GAU3 of the guarantee statement in the XAGREE specification of the train door controller, shown in listing 4.28, to verify the safety constraint SC3. When ASAM calls XAGREE, XAGREE checks all activity of the train door controller in the specification and implementation and returns the result to ASAM. As shown in the door controller implementation, XAGREE checks the status of the open and close flags in the data message door_open_and_close_msj. XAGREE also checks that the door sensor data message does not allow the door to enter a state where it is blocked while open or closed.

Figure 4.15: ASAM's report about safety constraints verification for train door

Figure 4.15 shows ASAM's generated report about safety constraint verification for the train automated door control system. ASAM generates the report from the following information: the component's implementation part, the error ontology, the probability of occurrence of the specific error, the probability thresholds that identify a hazard's severity level, the unsafe control actions, the causes of the unsafe control actions, the safety constraints, and the verification of the safety constraints. This stream of information was described previously in other examples; in this section of the report, we focus on the verification of the safety constraints. 1.
The unsafe control action (UCA1) in door_sensor.impl tells us that the train door has been blocked while open. The general cause of this unsafe action is a wrong signal generated by the sensor when the door is opened, or an internal failure of the sensor. To solve this problem, we provide a safety constraint, SC1, stating that the door should not be blocked when it is opened. At the same time, we need to ensure that the train door opens correctly and safely and is not blocked. For that purpose, we provide a guarantee identifier, GAU1, for the safety constraint SC1. ASAM sends the identifier to XAGREE, which verifies the activity of door_sensor.impl and returns the result (Yes/No) to ASAM. ASAM reports that result (Yes), showing that the safety constraint SC1 is verified. This verification means that the door modeled by door_sensor.impl cannot be blocked when it is opened. 2. The unsafe control action (UCA2) in door_sensor.impl tells us that the train door has been blocked while closed. The general cause of this unsafe action is a wrong signal generated by the sensor when the door is closed, or an internal failure of the sensor. To solve this problem, we provide a safety constraint, SC2, stating that the door should not be blocked when it is closed. At the same time, we need to ensure that the train door closes correctly and safely and is not blocked. For that purpose, we provide a guarantee identifier, GAU2, for the safety constraint SC2. ASAM sends the identifier to XAGREE, which verifies the activity of door_sensor.impl and returns the result (Yes/No) to ASAM. ASAM reports that result (Yes), showing that the safety constraint SC2 is verified. This verification means that the door modeled by door_sensor.impl cannot be blocked when it is closed. 3. The unsafe control action (UCA3) in door_controller.impl tells us that the incoming data message from door_sensor.impl contains two statuses (open and close) at the same time.
The general cause of this unsafe action is a wrong signal generated by the sensor when the door is opened or closed, or an internal failure of the controller. To solve this problem, we provide a safety constraint, SC3, stating that door_controller.impl should not send both the open and close commands at the same time. We also need to ensure that door_controller.impl does not send a wrong command. For that purpose, we provide a guarantee identifier, GAU3, for the safety constraint SC3. ASAM sends the identifier to XAGREE, which verifies the activity of door_controller.impl and returns the result (Yes/No) to ASAM. ASAM reports the result (Yes), showing that the safety constraint SC3 is verified. This verification means that door_controller.impl does not send inadequate commands, such as two statuses at the same time.

Figure 4.16: ASAM's report about safety constraints that are not verified for the train door

Figure 4.16 shows ASAM's generated report for the case where the safety constraints on the door sensor status in the train automated door control system are not verified. We describe the reasons as follows: 1. As a reminder, we need to verify the safety constraint ensuring that the door is not blocked when it is opened. For that purpose, we provide a guarantee identifier, GAU1, for the safety constraint SC1. ASAM sends the identifier to XAGREE, which verifies the activity of door_sensor.impl and returns the result (Yes/No) to ASAM. ASAM reports that result (No), showing that the safety constraint SC1 is not verified. This means that the door is blocked when it is opened and cannot return to a different status while it remains blocked. ASAM concluded that the door is blocked when opened because it did not obtain a true value from XAGREE ensuring that the door is not blocked when it is opened. 2. We also need to verify the safety constraint ensuring that the door is not blocked when it is closed.
For that purpose, we provide a guarantee identifier, GAU2, for the safety constraint SC2. ASAM sends the identifier to XAGREE, which verifies the activity of door_sensor.impl and returns the result (Yes/No) to ASAM. ASAM reports that result (No), showing that the safety constraint SC2 is not verified. This means that the door is blocked when it is closed and cannot return to a different status while it remains blocked. ASAM concluded that the door is blocked when closed because it did not obtain a true value from XAGREE ensuring that the door is not blocked when it is closed.

4.3.3 Safety Constraints Verification for Pacemaker

We now provide and discuss safety constraint verification for the pacemaker feedback control loop architecture. We verify that the safety constraints that have been fed into the implementation part of each individual component do not permit unsafe behavior, or that the hazard effects have been mitigated. In this example, we verify three important safety constraints of the pacemaker: 1. SC1: The electrode should not be blocked when it is sensing. 2. SC2: The electrode should not be blocked when the heart is paced. 3. SC3: The heart should not be sensed and paced at the same time (i.e., sense and then pace, not both together).
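Unlike SC1 and SC2, SC3 is a sequencing requirement rather than a single-state invariant. As a toy sketch (a hypothetical Python model, not part of the ASAM tooling), the "sense and then pace, not both together" rule can be expressed as a small state machine:

```python
class PacemakerMonitor:
    """Toy monitor for SC3: the heart is sensed and then paced, never both
    at once. Illustrative sketch only; not the ASAM/XAGREE implementation."""

    def __init__(self) -> None:
        self.sensed = False  # has a sensing step just occurred?

    def command_ok(self, sense: bool, pace: bool) -> bool:
        if sense and pace:            # SC3 violated: both commands together
            return False
        if pace and not self.sensed:  # pacing must follow a sensing step
            return False
        self.sensed = sense
        return True

m = PacemakerMonitor()
assert m.command_ok(sense=True, pace=False)      # sense first
assert m.command_ok(sense=False, pace=True)      # then pace
assert not m.command_ok(sense=True, pace=True)   # never both at once
```

The XAGREE guarantee GAU3 below captures only the "not both at once" part of this rule; the sequencing is enforced by the controller's data flow.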
    system Electrode
      features
        Electrode_Blocked_s : in data port Base_Types::Integer;
        Measured_Sensing : out data port Base_Types::Boolean;
        Measured_Pacing : out data port Base_Types::Boolean;
      annex xagree {**
        assume "Electrode in blocked states" : Electrode_Blocked_s >= 0;
        GAU1 : guarantee "Electrode is blocked only if it is in measured sensing" :
          (Electrode_Blocked_s = 0 and Measured_Sensing = true) or Measured_Sensing = false;
        GAU2 : guarantee "Electrode is blocked only if it is heart paced" :
          (Electrode_Blocked_s = 1 and Measured_Pacing = true) or Measured_Pacing = false;
      **};
    end Electrode;

Listing 4.30: XAGREE Identifiers for Electrode Specification

In listing 4.30, we show the Electrode sensor specification along with preconditions and postconditions for the operation of the sensor. In the specification, the interface has two out data ports, Measured_Sensing and Measured_Pacing, which are used to send the sensed and paced values of the heart to the Device_Controller_Monitor. The interface also has an in data port, Electrode_Blocked_s, that allows the Electrode to receive the blocked status value from the heart.

    system implementation Electrode.impl
      annex asam {**
        -- Identify type of major component
        type => sensor;
        -- Electrode is the sensor of the pacemaker.
        -- An internal failure in it causes an electrode error.
        internal failure INTF1 : "The electrode report it is blocked in measured sensing"
          causes [MeasuredSensingError (Critical, p = 0.02)] on [Measured_Sensing];
        internal failure INTF2 : "The electrode report it is blocked when the heart is paced"
          causes [MeasuredPacingError (Critical, p = 0.02)] on [Measured_Pacing];
        safety constraint SC1 : "The electrode should not be blocked when it is sensing"
          prevents [MeasuredSensingError] on [Measured_Sensing] verified by [GAU1];
        safety constraint SC2 : "The electrode should not be blocked when the heart is paced"
          prevents [MeasuredPacingError] on [Measured_Pacing] verified by [GAU2];
      **};
      annex xagree {**
        -- Satisfy each status of the electrode
        assign Measured_Sensing = (Electrode_Blocked_s = 0);
        assign Measured_Pacing = (Electrode_Blocked_s = 1);
      **};
    end Electrode.impl;

Listing 4.31: ASAM and XAGREE for Electrode Implementation

In listing 4.31, we show the ASAM and XAGREE annexes in the Electrode sensor implementation. ASAM refers to the identifiers, GAU1 and GAU2, of the guarantee statements in the XAGREE specification shown in listing 4.30. These identifiers are used to verify the safety constraints, SC1 and SC2, ensuring that the Electrode is not blocked while it is sensing the heart or while the heart is paced. If any guarantee associated with a safety constraint fails verification, then ASAM reports that safety constraint as unverified.

    -- The data message comes from the electrode sensor;
    -- it contains two statuses at the same time, sensing and pacing
    data Electrode_message
    end Electrode_message;

    data implementation Electrode_message.impl
      subcomponents
        DCM_sensing_c : data Base_Types::Boolean;
        DCM_pacing_c : data Base_Types::Boolean;
    end Electrode_message.impl;

    system Device_Controller_Monitor
      features
        DCM_sensing_and_pacing_msj : in data port Electrode_message.impl;
        DCM_Blocked_c : in data port Base_Types::Integer;
        DCM_sensing_c : out data port Base_Types::Boolean;
        DCM_pacing_c : out data port Base_Types::Boolean;
      annex xagree {**
        assume "DCM states in" : DCM_Blocked_c <= 2 and DCM_Blocked_c >= 0;
        GAU3 : guarantee "Heart cannot be sensed and paced at the same time,
          sense and then pace" : not (DCM_sensing_c and DCM_pacing_c);
      **};
    end Device_Controller_Monitor;

Listing 4.32: XAGREE for Device Controller Monitor Specification

In listing 4.32, we show a data message, read by the Electrode, that can contain two statuses at a time; that is, the message could contain both sensing and pacing information simultaneously. In reality, the pacemaker should first sense and then pace the heart based on the sensed values. We also show the Device_Controller_Monitor specification with identifiers in the XAGREE part. In the specification of the controller, the interface has two out data ports, DCM_sensing_c and DCM_pacing_c, used to send the sensing and pacing commands. The interface also has two in data ports, DCM_sensing_and_pacing_msj and DCM_Blocked_c, which receive different types of messages. The DCM_sensing_and_pacing_msj port receives the data messages that have been read and sent by the Electrode sensor. The DCM_Blocked_c port receives the blocked status values of the Electrode sensor. In the XAGREE part, we show a precondition for the in data port that identifies the blocked status values of the Device_Controller_Monitor.
We also show a postcondition (guarantee) identifier, GAU3, for the out data ports of the controller, DCM_sensing_c and DCM_pacing_c, which specifies that the Device_Controller_Monitor does not send the sensing and pacing commands together at the same time.

    system implementation Device_Controller_Monitor.impl
      annex asam {**
        -- Identify type of major component
        type => controller;
        -- DCM is the controller of the pacemaker.
        -- An internal failure in it causes a DCM error.
        internal failure INTF3 : "The DCM report that heart sensed and paced at the same time"
          causes [SensedAndPacedCMD (Critical, p = 0.03)] on [DCM_sensing_c];
        safety constraint SC3 : "The DCM should not send sensing and pacing commands
          together at the same time"
          prevents [SensedAndPacedCMD] on [DCM_sensing_c] verified by [GAU3];
      **};
      annex xagree {**
        -- Satisfy that sensing and pacing have the same status
        eq DCM_sensing_msj_c : bool = false -> if
          ((DCM_sensing_and_pacing_msj.DCM_sensing_c = true) and
           (DCM_sensing_and_pacing_msj.DCM_pacing_c = false)) then true else false;
        eq DCM_pacing_msj_c : bool = false -> if
          ((DCM_sensing_and_pacing_msj.DCM_pacing_c = true) and
           (DCM_sensing_and_pacing_msj.DCM_sensing_c = false)) then true else false;
        -- Satisfy that the electrode data message should not lead to blocking
        -- the DCM when sensing and pacing have the same status
        assert (DCM_sensing_c = (DCM_sensing_msj_c and (DCM_Blocked_c <= 2)));
        assert (DCM_pacing_c = (DCM_pacing_msj_c and (DCM_Blocked_c >= 0)));
      **};
    end Device_Controller_Monitor.impl;

Listing 4.33: ASAM and XAGREE for Device Controller Monitor Implementation

In listing 4.33, we show the ASAM and XAGREE annexes in the Device_Controller_Monitor implementation. ASAM refers to the identifier GAU3 of the guarantee statement in the XAGREE specification of the controller, shown in listing 4.32, to verify the safety constraint SC3, ensuring that the controller of the pacemaker does not send the sensing and pacing commands together at the same time. For that purpose, when ASAM calls XAGREE, XAGREE checks all activity of the Device_Controller_Monitor in the specification and implementation parts and returns the result to ASAM.

Figure 4.17: ASAM's report about safety constraints verification for pacemaker

Figure 4.17 shows ASAM's report about the safety constraint verification for the pacemaker. ASAM generates the report from the following information: the component's implementation part, the error ontology, the probability of occurrence of the specific error, the probability thresholds that identify a hazard's severity level, the unsafe control actions, the causes of the unsafe control actions, the safety constraints, and the verification of the safety constraints. This stream of information was described previously in detail for different examples; here we focus on the verification of the safety constraints. We explain figure 4.17 item by item below: 1. The internal failure (INTF1) in Electrode.impl tells us that the electrode sensor has been blocked while sensing the heart.
This failure gives a MeasuredSensingError to Electrode.impl. We need a safety constraint to mitigate the effects of the error. For that purpose, we provide a safety constraint, SC1, stating that the electrode sensor should not be blocked by the error when it is sensing the heart. The safety constraint alone is not a sufficient mechanism to mitigate the effects of the error; we also need to ensure that Electrode.impl senses the heart correctly and safely and is not blocked. For this case, we provide a guarantee identifier, GAU1, for the safety constraint SC1. ASAM sends the guarantee identifier to XAGREE, which verifies the activity of the Electrode.impl specification and implementation and returns the result (Yes/No) to ASAM. ASAM then reported (Yes) as a verification of the safety constraint SC1. This verification means that Electrode.impl is not blocked when it senses the heart. 2. The internal failure (INTF2) in Electrode.impl tells us that the electrode sensor has been blocked while the heart is paced. This failure gives a MeasuredPacingError to Electrode.impl. We need a safety constraint to mitigate the effects of the error. For that purpose, we provide a safety constraint, SC2, stating that the electrode sensor should not be blocked by the error when the heart is paced. The safety constraint alone is not a sufficient mechanism to mitigate the effects of the error; we also need to ensure that the Electrode.impl sensor has not been blocked when the heart is paced. For this case, we provide a guarantee identifier, GAU2, for the safety constraint SC2. ASAM sends the guarantee identifier to XAGREE, which verifies the activity of the Electrode.impl specification and implementation and returns the result (Yes/No) to ASAM. ASAM then reported (Yes) as a verification of the safety constraint SC2. This verification means that Electrode.impl is not blocked when the heart is paced. 3.
The internal failure (INTF3) in Device_Controller_Monitor.impl tells us that the heart is sensed and paced together at the same time. This failure gives a SensedAndPacedCMD error to the controller. We need a safety constraint to mitigate the effects of the error. For that purpose, we provide a safety constraint, SC3, stating that the controller should not send the sensing and pacing commands together at the same time. The safety constraint alone is not a sufficient mechanism to mitigate the effects of the error; we also need to ensure that the controller does not send a wrong command to the heart. For this case, we provide a guarantee identifier, GAU3, for the safety constraint SC3. ASAM sends the identifier to XAGREE, which verifies the activity of the Device_Controller_Monitor.impl specification and implementation and returns the result (Yes/No) to ASAM. ASAM then reported (Yes) as a verification of the safety constraint SC3. This verification means that Device_Controller_Monitor.impl does not send inadequate commands to the heart, such as the sensing and pacing commands together.

Figure 4.18: ASAM's report about safety constraints that are not verified for the pacemaker

Figure 4.18 shows ASAM's generated report for the case where the safety constraints on the Electrode.impl status in the pacemaker system are not verified. We describe the reasons as follows: 1. As a reminder, we need to verify the safety constraint of Electrode.impl ensuring that the electrode is not blocked when it is sensing. For that purpose, we provide a guarantee identifier, GAU1, for the safety constraint SC1. ASAM sends the identifier to XAGREE, which verifies the activity of Electrode.impl and returns the result (Yes/No) to ASAM. ASAM reports that result (No), showing that the safety constraint SC1 is not verified. This means that the electrode sensor is blocked while it is sensing and therefore cannot sense any more, because it is blocked in that status.
ASAM concluded that the electrode sensor is blocked while sensing because it did not obtain a true value from XAGREE ensuring that the electrode is not blocked while sensing. 2. We also need to verify the safety constraint of Electrode.impl ensuring that the electrode is not blocked when the heart is paced. For that purpose, we provide a guarantee identifier, GAU2, for the safety constraint SC2. ASAM sends the identifier to XAGREE, which verifies the activity of Electrode.impl and returns the result (Yes/No) to ASAM. ASAM reports that result (No), showing that the safety constraint SC2 is not verified. This means that the electrode sensor is blocked when the heart is paced and therefore cannot pace the heart any more, because it is blocked in that situation. ASAM concluded that the electrode sensor is blocked when the heart is paced because it did not obtain a true value from XAGREE ensuring that the electrode sensor is not blocked when the heart is paced.

4.3.4 Summary

In this evaluation, we have examined two important safety-critical system examples: the train automated door control system and an embedded medical device (i.e., a pacemaker). In each example, we have multiple safety constraints (SC1, SC2, and SC3) and multiple guarantee identifiers (GAU1, GAU2, and GAU3) that are used to verify the safety constraints. The result of each example has been evaluated based on the XAGREE analysis and ASAM's generated report. The purpose of this evaluation is to demonstrate that ASAM uses verification, unlike existing safety analysis methods, to prove that a safety constraint fully mitigates the hazardous condition, improving the quality of the safety constraints. The results tell us that ASAM verified the safety requirements for different kinds of examples in the safety-critical domains. We provided safety constraints in the implementation part of each individual core component in the feedback control loop architecture.
We also provided a guarantee identifier for each safety constraint within the component implementation. We adopted XAGREE to check the activity of the identifiers in the specification and implementation of the component and return the result to ASAM. In each example, we have also demonstrated that ASAM generates a report to verify the result. In summary, we show how to ensure that the error effects have been mitigated by verifying the safety constraints within the architecture model.

4.4 Demonstrating Notations and Expressions for Error Flows and Safety Constraints

In this section, we show how to identify the notations in ASAM's implementation. Essentially, the tooling shown in claims 1 - 3 is an implementation of the mathematical framework. We described the theoretical foundation of the notations and expressions in detail in section 3.3. As a reminder, we previously provided the mathematical notation for the feedback control loop architecture and a formal specification for the error ontology in the loop. By doing so, we allow stakeholders to find more unsafe possibilities in the operational system context. Additionally, we provided an expression for each component in the feedback control loop and used error types in the form of formal specifications to find more possibilities for how the system could behave incorrectly. We also identified unsafe functional behavior for each component and provided safety constraints for the possible unsafe functions in the system, because if the system knows about those possibilities, it can take action to avoid incorrect behavior. We will show the notations in ASAM's implementation that identify the locations of the variables in the feedback control loop architecture, such as measured variables (Vme), control actions (CA), manipulated variables (Vma), and controlled variables (Vc).
We also show the other elements of the notation in ASAM, such as safety constraints (SC), internal failures (INTF), errors (Err), the error ontology (EO), guarantee identifiers (GAU), unsafe control actions (UCA), probability values (P), hazard types (H), and the major component types: sensors (S), controllers (C), actuators (A), and controlled processes (CP). These elements help the stakeholders use our notation in safety-critical examples.

4.4.1 Identify Notations in ASAM's Implementation

When safety analysts or stakeholders run ASAM, they see our notations executed within the architecture model to generate a report. We have presented the notations in the form of executed safety analysis statements in ASAM, such as error statements, error propagation statements, internal failure statements, safety constraint statements that either handle or prevent hazards, and safety constraint verification statements. In this section, we provide and discuss the notations that we have implemented in ASAM. We have presented the different types of variables in the feedback control loop architecture, such as measured variables (Vme), control actions (CA), manipulated variables (Vma), and controlled variables (Vc). We also identify the major component types: sensors (S), controllers (C), actuators (A), and controlled processes (CP). We have also specified the locations of the variables among the components: (Vme) flows between (S) and (C), (CA) between (C) and (A), (Vma) between (A) and (CP), and (Vc) between (CP) and (S). Each component in the feedback control loop can have an internal failure (INTF), which leads to an error (Err) in a variable. In this case, the stakeholders can select any of the six error (Type)s in the error ontology (EO) (Service Errors, Value Errors, Timing Errors, Replication Errors, Concurrency Errors, Access Control Errors) as an effect of the internal failure.
In our application examples, we have focused on the first three. Each error causes an unsafe control action (UCA) in the component, and we identify the general and specific causes of that error. To mitigate the effects of the (Err), we provide safety constraints (SC) to handle and prevent incoming and outgoing (Err)s. We have also represented the probability value (P) of an (Err)'s occurrence as well as the severity level of the (Err): (Catastrophic, Critical, Marginal, Negligible). Finally, we show the notation in the components' implementations within the architecture model. XAGREE helps ASAM verify the guarantee identifiers (GAU) in the specifications of the components. ASAM verifies the safety constraint (SC) identifiers that have been fed into the implementation via the specification, using XAGREE, to ensure that the (Err) effects have been mitigated and that the hazardous condition cannot occur. The following example reminds the reader how to identify our notations in ASAM:

    system S
      features
        -- Controlled Variable (Vc) of the Sensor (S)
        Vc_S : in data port Base_Types::Integer;
        -- Measured Variable (Vme) of the Sensor (S)
        Vme_S : out data port Base_Types::Boolean;
      -- Check the sensor's specification input and output
      annex xagree {**
        assume "Controlled variable of the sensor's input" : Vc_S >= 0;
        GAU1 : guarantee "Check measured variable with respect to controlled variable" :
          (Vc_S = 1 and Vme_S = true) or Vme_S = false;
      **};
    end S;

Listing 4.34: Notations in ASAM

In listing 4.34, we show the controlled variable (Vc) as an in data port and the measured variable (Vme) as an out data port.
We also use XAGREE to check the sensor (S)'s input and output specifications, respectively. XAGREE provides a condition for the controlled variable (Vc) and a guarantee identifier (GAU1) for the measured variable (Vme).

    system implementation S.impl
      annex asam {**
        type => sensor;
        internal failure INTF1 : "Sensor internal failure statement"
          causes [EO_S (Critical, p = 0.02)] on [Vme_S];
        safety constraint SC1 : "Sensor safety constraint statement for the
          sensor's measured variable"
          prevents [EO_S] on [Vme_S] verified by [GAU1];
      **};
      annex xagree {**
        assign Vme_S = (Vc_S = 1); -- Return true when the Vc is correct
      **};
    end S.impl;

Listing 4.35: Notations in ASAM

In listing 4.35, in the sensor's implementation part (S.impl), we show the type of the component (type => sensor) and an internal failure statement (INTF1), which gives the sensor (S) an error (Err) from the error ontology (EO_S). The error has a probability of occurrence (P) and a severity level, (Critical). We also show a safety constraint statement (SC1) to prevent the effects of the error on the measured variable of the sensor (Vme_S), which in turn has a guarantee identifier (GAU1) to verify that the effect has been mitigated. We also use XAGREE to return true (Yes) to ASAM if the safety constraint holds.

    system C
      features
        -- Measured Variable (Vme) of the Controller (C)
        Vme_C : in data port Base_Types::Boolean;
        -- Control Action (CA) of the Controller (C)
        CA_C : out data port Base_Types::Integer;
    end C;

    system implementation C.impl
      annex asam {**
        type => controller;
        internal failure INTF2 : "Controller internal failure statement"
          causes [EO_C (Marginal, p = 0.04)] on [CA_C];
        -- Error comes from the measured variable and causes an error
        -- on the control action.
        [EO_S] on [Vme_C] causes [EO_S (Catastrophic, p = 0.002)] on [CA_C];
        -- Controller has a SC to handle error(s) in the measured variable.
        safety constraint SC2 : "Controller safety constraint for controller's
          measured variable"
          handles [EO_S] on [Vme_C] verified by [GAU2];
      **};
    end C.impl;

Listing 4.36: Notations in ASAM

In listing 4.36, we have represented the controller's (C) specification and implementation. In the specification, we represent the measured variable as an in data port of the controller (Vme_C) and the control action as an out data port of the controller (CA_C). In the implementation, we identify the major type of the component (type => controller) and an internal failure statement (INTF2) of the controller, which gives the controller an error (Err) from the error ontology (EO_C). The error (Err) then propagates to the actuator (A) through the control action of the controller (CA_C). At the same time, when the incoming error from the sensor (EO_S) arrives at the controller (C) through the measured variable of the controller (Vme_C), it causes an error on the control action (CA_C) of the controller's output. We also represent a safety constraint (SC2) to handle the incoming error from the sensor (EO_S) on the controller's measured variable (Vme_C), with an identifier (GAU2) to verify that it is mitigated.
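The propagation statement attaches a new probability to the error at each hop: EO_S originates on the sensor's measured variable with p = 0.02 and reaches the control action with p = 0.002. Under an independence assumption, this is consistent with multiplying the source probability by a per-hop transmission probability. A small illustrative sketch (a hypothetical helper, not ASAM's actual probability calculus):

```python
def propagate(p_source: float, p_transmission: float) -> float:
    """Probability that the source error occurs and is also passed on to the
    next variable, assuming the two events are independent."""
    return p_source * p_transmission

# EO_S originates on the sensor's measured variable with p = 0.02.
p_on_vme = 0.02
# With a hypothetical per-hop transmission probability of 0.1, the error
# reaches the control action with the p = 0.002 shown in the listing:
p_on_ca = propagate(p_on_vme, 0.1)
assert abs(p_on_ca - 0.002) < 1e-12
```

ASAM takes the per-hop probabilities directly from the model rather than computing them, so the multiplication here only illustrates one way the modeled values can be kept mutually consistent.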
    system A
      features
        --Control Action (CA) of the Actuator (A)
        CA_A : in data port Base_Types::Integer;
        --Manipulated Variables (Vma) of the Actuator (A)
        Vma_A : out data port Base_Types::Integer;
    end A;

    system implementation A.impl
      annex asam {**
        type => actuator;
        internal failure INTF3 : "Actuator internal failure statement"
          causes [EO_A (Catastrophic, p=0.2)] on [Vma_A];
        --Error comes from the controller and causes error on manipulated variables.
        [EO_C] on [CA_A] causes [EO_C (Critical, p=0.1)] on [Vma_A];
        safety constraint SC3 : "Actuator safety constraint to handle incoming
          error from control action" handles [EO_C] on [CA_A] verified by [GAU3];
        safety constraint SC4 : "Actuator safety constraint to prevent outgoing
          error on manipulated variable" prevents [EO_A] on [Vma_A] verified by [GAU4];
      **};
    end A.impl;

Listing 4.37: Notations in ASAM

In listing 4.37, we show the actuator's (A) specification and implementation. In the specification, we show the control action of the actuator (CA_A) as an in data port and the manipulated variables of the actuator (Vma_A) as an out data port. In the implementation, we identify the type of the component (type => actuator) and show an internal failure statement (INTF3) of the actuator, which attaches an error from the error ontology (EO_A). The error will propagate through the manipulated variables of the actuator (Vma_A) to the controlled process (CP). At the same time, when an incoming error from the controller (EO_C) arrives at the actuator (A) through the control action of the actuator (CA_A), it causes an error on the manipulated variables of the actuator (Vma_A). We show a safety constraint (SC3) to handle incoming errors from the controller (EO_C) on the control action of the actuator (CA_A), with an identifier (GAU3) to ensure that they are mitigated. We also represent another safety constraint (SC4) to prevent errors originating in the actuator itself (EO_A), with an identifier (GAU4) to ensure that they are mitigated.

    system CP
      features
        --Manipulated Variables (Vma) of the Controlled Process (CP)
        Vma_CP : in data port Base_Types::Integer;
        --Controlled Variables (Vc) of the Controlled Process (CP)
        Vc_CP : out data port Base_Types::Integer;
    end CP;

    system implementation CP.impl
      annex asam {**
        type => controlled process;
        --Pass the incoming errors from manipulated variables to controlled variables
        [EO_C] on [Vma_CP] causes [EO_C (Marginal, p=0.01)] on [Vc_CP];
        [EO_A] on [Vma_CP] causes [EO_A (Negligible, p=0.00001)] on [Vc_CP];
      **};
    end CP.impl;

Listing 4.38: Notations in ASAM

In listing 4.38, we show the controlled process (CP) specification and implementation. In the specification, we show the manipulated variables of the controlled process (Vma_CP) as an in data port and the controlled variable of the controlled process (Vc_CP) as an out data port. In the implementation, we identify the type of the component (type => controlled process).
The errors of the controller (C) and actuator (A), (EO_C) and (EO_A), will propagate through the manipulated variable of the controlled process (Vma_CP) to the controlled variable of the controlled process (Vc_CP).

    system Top_System
    end Top_System;

    system implementation Top_System.impl
      subcomponents
        S  : system S.impl;
        C  : system C.impl;
        A  : system A.impl;
        CP : system CP.impl;
      connections
        conn1 : port S.Vme_S  -> C.Vme_C;
        conn2 : port C.CA_C   -> A.CA_A;
        conn3 : port A.Vma_A  -> CP.Vma_CP;
        conn4 : port CP.Vc_CP -> S.Vc_S;
    end Top_System.impl;

Listing 4.39: Notations in ASAM

In listing 4.39, we show the top-level specification and implementation. The specification has no features to describe. In the implementation, we show the subcomponents and their connections. As subcomponents, we instantiate the component implementations: the sensor (S.impl), controller (C.impl), actuator (A.impl) and controlled process (CP.impl). In the connections, we connect the components together through their ports. The measured variable of the sensor (S.Vme_S) is connected to the measured variable of the controller (C.Vme_C) under the connection name (conn1). The control action of the controller (C.CA_C) is connected to the control action of the actuator (A.CA_A) under the connection name (conn2). The manipulated variables of the actuator (A.Vma_A) are connected to the manipulated variables of the controlled process (CP.Vma_CP) under the connection name (conn3). Finally, the controlled variable of the controlled process (CP.Vc_CP) is connected to the controlled variable of the sensor (S.Vc_S) under the connection name (conn4).
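As an illustrative aside (not part of ASAM's tooling), the per-hop probabilities declared in the listings above can be chained to estimate how likely an error raised at one component is to reach another port in the loop. The sketch below assumes each declared `causes ... p=...` probability is conditional and independent, which the listings do not state; the dictionary encoding and the function `chain` are our own.

```python
# Illustrative sketch: chain the conditional propagation probabilities taken
# from the ASAM annex declarations in listings 4.35-4.38.
# (error type, port it appears on) -> probability of that propagation step
propagations = {
    ("EO_S", "Vme_S"): 0.02,    # INTF1: sensor internal failure
    ("EO_S", "CA_C"): 0.002,    # controller passes the sensor error on
    ("EO_C", "CA_C"): 0.04,     # INTF2: controller internal failure
    ("EO_C", "Vma_A"): 0.1,     # actuator passes the controller error on
    ("EO_A", "Vma_A"): 0.2,     # INTF3: actuator internal failure
    ("EO_C", "Vc_CP"): 0.01,    # controlled process passes the controller error on
    ("EO_A", "Vc_CP"): 0.00001, # controlled process passes the actuator error on
}

def chain(*steps):
    """Multiply the conditional probabilities along an error-flow path."""
    p = 1.0
    for step in steps:
        p *= propagations[step]
    return p

# Probability that a controller internal failure (EO_C on CA_C) propagates
# through the actuator (Vma_A) to the controlled variable (Vc_CP):
p_eo_c = chain(("EO_C", "CA_C"), ("EO_C", "Vma_A"), ("EO_C", "Vc_CP"))
print(f"EO_C reaches Vc_CP with p = {p_eo_c:.5f}")
```

Under the independence assumption, the product shows why errors that survive several hops (for example EO_A reaching Vc_CP) end up with the very small probabilities seen in listing 4.38.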
4.4.2 Summary

In this evaluation, we performed two types of analysis, theoretical and practical. In the theoretical analysis, described in section 3.3.1, we applied our notations and expressions to the train automated door control system to identify more safety constraints for the hazardous possibilities. In the practical analysis, described in section 4.4.1, we provided an architectural description so that stakeholders or safety analysts know how to use our notations in the feedback control loop architecture to run ASAM successfully and generate a report for the architecture model. The purpose of this evaluation is to demonstrate that ASAM, unlike existing safety analysis methods, uses mathematical notation to describe the system's error flows, allowing it to audit the model more rigorously. The result of the theoretical analysis is that safety analysts are able to identify unsafe behavior of the components, specify the input/output/function of each component, identify unsafe functional behavior and find more hazardous possibilities using the error ontology, identify a mathematical notation for each error flow / scenario, and provide the safety constraint that mitigates the effects of the error flow. The result of the practical analysis is that ASAM's implementation satisfied the notations used in the theoretical analysis.

Chapter 5

Discussion

In this section, we first discuss the limitations of the techniques used to evaluate the framework presented as part of this research. We also justify the choices made during the evaluation, and we discuss the contributions of ASAM.

5.1 Difference Between Two-Way and Three-Way Communication Formats

In this section, we describe the difference between the two-way and three-way communication formats. The two-way communication format is the interaction between two components only.
The stakeholders know the source of the hazard and the impact destination. The three-way communication format, in contrast, is the interaction among three components. The safety analysts need to focus on the system behavior when data is sent by the first component, crosses the second component and is received by the third component. This allows us to identify different kinds of hazard sources and different types of impact destinations. We show the difference between the two formats in figure 5.1.

Figure 5.1: Two-way versus Three-way communication format

In figure 5.1(a), the two-way communication format depends on the interaction between controller and actuator to identify unsafe control actions (UCA). This allows the stakeholders / safety analysts to focus on the system controller as a source of the hazard because they are continuously monitoring the system behavior when the controller sends inadequate commands to the actuator. They then provide the safety requirements to mitigate that hazard in the operational system context. In figure 5.1(b), the three-way communication format focuses on the interaction among multiple components, usually three in a sequence, such as [sensor → controller → actuator], [controller → actuator → controlled process], and [actuator → controlled process → sensor]. In each sequence, we identify new unsafe control actions for the system. Each element in the sequence has a different role (source, path, sink) in the context of the error propagation information. This allows the stakeholders to identify different types of hazard sources because each of the three major components can have different types of hazards.

5.2 Justification of & Limitations of Evaluation

In the first evaluation (for claim C1), we chose to use only the three-way communication format rather than do a comparison against the two-way format.
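Before turning to the justification, note that the (source, path, sink) sequences of the three-way format can be enumerated mechanically from the loop ordering. A brief Python sketch (our own illustration; the function name is hypothetical and not part of ASAM's implementation):

```python
# Illustrative sketch: enumerate the (source, path, sink) analysis windows
# that the three-way communication format walks around the feedback loop.
loop = ["sensor", "controller", "actuator", "controlled process"]

def three_way_windows(components):
    """Return each length-3 window, wrapping around the closed loop."""
    n = len(components)
    return [tuple(components[(i + k) % n] for k in range(3)) for i in range(n)]

for source, path, sink in three_way_windows(loop):
    print(f"{source} -> {path} -> {sink}")
```

With the four canonical components this yields four windows, including the three sequences listed in section 5.1; each window assigns a component the role of source, path, or sink.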
STPA is a long process that requires extensive description, and STPA analyses of each of our examples are already available in the existing literature, so repeating the analysis for each example was deemed unnecessary. For those wishing to see one of our examples analyzed with STPA, please see appendix B. In the second evaluation (for claim C2), we chose error flows to identify safety constraints instead of using control tables, which are used in two-way communication format analysis. The rationale behind this choice was twofold. First, the error flows allow the stakeholders to identify the safety constraints and mitigate the effects of errors that have propagated from one component to another. Second, the error flows also allow the identification of general and specific causes of the unsafe control actions. In figure 5.2, we show the difference between safety analysis using control tables and safety analysis using error flows.

Figure 5.2: Control Table versus Error Flows

As shown in figure 5.2(a), the control table is created to hold the safety requirements that eliminate system hazards or unsafe interaction between the controller and actuator. This allows the safety analysts to provide safety requirements between the two components to mitigate the unsafe system behavior. As shown in figure 5.2(b), error flow analysis allows us to specify safety constraints and describe hazards between components other than the actuator and controller. The error flows also allow us to specify the exact source, propagation path and sink of each error, allowing us to specify both the general and specific causes of each hazard. In the third evaluation (for claim C3), we extended safety analysis by adding verification. Most current standards do not verify the safety constraints resulting from the analysis, relying instead on the experience of the analyst to ensure the correctness of the constraints.
Due to our use of formal notation and exact specification, it is possible to verify a constraint against the architecture model of the system. The rationale behind this was to introduce safety constraint verification to ensure that each hazard is completely mitigated.

5.3 ASAM Field Contributions

ASAM makes three primary contributions or advancements to the field of safety-critical systems and error propagation information. Briefly, we summarize ASAM's contributions.

First, ASAM uses an architectural model rather than a mental process model. This provides several benefits:

1. It allows us to use defined error ontologies to describe hazards.
2. It allows us to define an unsafe control action's propagation path rather than just describing its effect.
3. It allows us to define more precise safety constraints focusing on the general and / or specific cause of an error identified through the propagation path.

Second, ASAM is based on a formal expression notation in addition to the architectural model. This provides several benefits:

1. Formalizing the error ontology allows us to find more possibilities of incorrect system behavior.
2. It allows us to formally specify the behavior of each component and, through rigorous analysis, uncover the unsafe functional behavior of each component in the feedback control loop.
3. It allows us to discover additional safety constraints as well as improve existing constraints to more thoroughly mitigate potential hazards.

Finally, ASAM allows verification of its safety constraints in the architectural model through the use of its formal notation. This provides several benefits:

1. It allows us to better ensure that safety constraints completely cover all hazardous possibilities.
2. It allows us to discover weak constraints so that they can be improved.
3.
It allows us to discover areas of the architectural model that have missing constraints when verification is done not just against the safety constraints but also against the system requirements, a possibility that exists due to our use of a rigorous architecture specification format and formal notation.

Figure 5.3 shows the contributions in the road map of ASAM's implementation:

Figure 5.3: ASAM's contributions

Chapter 6

Related Work

In this section, we discuss work related to ASAM. We also discuss work that is loosely related to our method, including research that is likely to be incorporated into our method at a later date.

6.1 Works Related to ASAM

STPA is the currently accepted standard used to analyze the safety of systems. It is the latest top-down hazard analysis method and has been used successfully in many safety-critical systems outside of AADL; some examples are [22] and [45]. Our work differs from these groups in where safety analysis is applied during the development process of safety-critical systems. We augment STPA with error propagation information to find hazards and additional information that other groups miss, such as unsafe control actions, safety constraints and specific causes of the unsafe actions. We apply safety analysis to the system architecture during the design phase of the software development life cycle, while these other projects or groups apply hazard analysis during the requirements phase. Our work is most similar to the work performed by the following groups, particularly [37] and [35], both of which use AADL. Additional architecture-based techniques exist, such as [10], [36], [34] and [12].
Our work differs from these groups in that we are explicitly focused on allowing the error ontology assets to be reused in the same manner as the safety-critical assets, and we also add and exploit several new statements in the AADL language, such as error statements, error propagation statements, internal failure statements, safety constraint statements that either handle or prevent hazards, and statements that verify safety constraints. ASAM is a model-based software safety analysis tool that works with AADL models annotated with an annex (i.e., annex asam) and supported by OSATE. It is used to analyze, and generate a report based on, information attached to each component in the feedback control loop architecture. There are many other safety and hazard analysis techniques, for example, STPA [22], SAFE [36], FTA [3], HAZOP [30], FMEA [40], FPTN [13], Hip-HOPS [48], and FPTC [47]. We differ from each of these primarily in focus. Our work is focused on mitigating the effects of errors through safety constraints and verifying those constraints within the system model, concerns that are unique to safety-critical systems. Additionally, ASAM is a modified version of the previously existing STPA that has been made more rigorous, and ASAM effectively analyzes hardware and software components in safety-critical systems. ASAM also differs from SAFE in handling incoming errors and preventing outgoing errors via the implementation of safety constraints, and in verifying the safety constraint statements to ensure that the error effect or unsafe action has been mitigated. Our work differs from FTA, HAZOP and FMEA in where faults are inserted into the architecture model. Our generated report also shows how a fault produces an error in the system, how the error propagates through the model, the probability of occurrence of each error in the model, and the recorded hazard severity level for each error.
Our work also differs from FPTN, Hip-HOPS and FPTC in that we show how the effects of error propagation are mitigated via the implementation of safety constraints instead of just showing how errors propagate through the model. Another difference is that we focus on introducing compositional reasoning in ASAM, unlike other safety analysis methods, to ensure that the hazardous condition of the propagated error cannot happen.

6.2 Additional Related Works

In our experience applying hazard analysis methods to different types of systems, these techniques cannot describe the dynamic error behavior of the system, the system states, the transitions of the system, or the error propagations among system components. Because of this, the traditional hazard analysis techniques depend on decomposition of the system with respect to the hierarchy of failure effects instead of the system's architectural model. Based on this fact, we decided to develop procedures to augment existing hazard analysis techniques (such as STPA) with error propagation information and state machines to support modeling and analyzing dynamic error behavior. Several references show that current hazard analysis methods like STPA still need development. The work in [1] shows that STPA requires suitable diagrammatic notations to represent the relationship among hazards, process model variables in the controller, and control actions to the actuator. Our work differs from this group in that we use formal notations to augment STPA to find hazardous conditions in a design. We also investigated how a formalized error ontology could assist in identifying unsafe behavior. The formal specification of the error ontology can aid in identifying mathematical expressions for each error flow in the canonical feedback control loop architecture. The work in [2] describes software safety verification based on recent hazard analysis methods.
They found the safety requirements using STPA, translated the requirements into formal specifications such as CTL/LTL (Computation Tree Logic / Linear-Time Temporal Logic), and then used a model checker such as SMV (Symbolic Model Verifier) to verify the safety requirements. That work was done using several tools; our work differs in that we make this easier by putting the safety requirements within the architecture model and then verifying the requirements within the model as well, using fewer tools. The work in [4] presents a formal framework for the hazard identification step in STPA. Our work differs from this project in that we provide guidance, [39], concerning mathematical notations to formalize an error ontology used in the architecture descriptions of systems represented in AADL to improve the rigor of STPA. The results of our work have shown that providing a formal notation for the feedback control loop and a formal specification for the error ontology leads to finding hazards in the operational system context that other methods miss. By augmenting STPA with an error ontology described in a formal notation, we are able to find more hazards.

Chapter 7

Future Work

As a reminder, ASAM has been created based on three primary contributions that have been completed and evaluated as part of the work for this dissertation:

• Augmenting a hazard analysis method with error propagation information for safety-critical systems [38]
• Using formal notations to augment a hazard analysis method [39]
• Verifying safety constraints in a compositional safety analysis method (i.e., ASAM) [39]

As mentioned in section 2.6, ASAM's future work is the implementation of False Positive (FP) / False Negative (FN) identification for each safety-critical application described in the dissertation.
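The identification this future work calls for follows the standard confusion-matrix logic: compare whether the triggering condition was actually present against whether the controller issued the action. A minimal Python sketch (the function name and encoding are ours, not ASAM's implementation):

```python
# Illustrative sketch: classify one controller decision against the ground
# truth of whether the triggering condition was actually present.
def classify(condition_present: bool, action_issued: bool) -> str:
    """Return TP/FP/FN/TN for one (ground truth, decision) pair."""
    if action_issued:
        return "TP" if condition_present else "FP"
    return "FN" if condition_present else "TN"

# ACC reading of the four cells: warn iff the minimum distance is presented.
print(classify(True, True))    # warned, distance presented: true positive
print(classify(False, True))   # warned on bad data: false positive
print(classify(True, False))   # missed warning (the hazardous case): false negative
print(classify(False, False))  # no warning, no condition: true negative
```

The same function covers the pacemaker case by substituting "sensing data available" for the condition and "pace" for the action.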
Shown below are partially complete FP/FN identifications for some of the safety-critical example systems used in this dissertation. We now show how false positives and false negatives could occur and be identified.

7.1 ASAM's Future Work in Adaptive Cruise Control System

As a reminder, a false positive (FP) means the result is positive when in reality the condition is absent; for example, a warning or automatic braking happens before the critical distance. A false negative (FN) means the result is negative when in reality the condition is present; for example, a warning or automatic braking happens too late. Figure 7.1 shows ASAM's future work in the adaptive cruise control system to warn a driver when the minimum distance between one's own vehicle and the vehicle in front is passed.

Figure 7.1: ASAM's future work in ACC example

One obstacle is that the ACC system warns the driver to apply the brakes when the minimum distance is not presented to the system. Another obstacle is that the ACC sensor presents the minimum distance, but the ACC controller does not send the warn command to the driver. For that purpose, we need to analyze the availability of the ACC sensor data with respect to warning / not warning the driver to identify false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN) in the ACC system. The analysis is shown in table 7.1:

Table 7.1: Identify FP/FN/TP/TN in ACC system

                             Min. Distance presented   Min. Distance not presented
Positive Result (Warn)       ACC TP*(1)                ACC FP*(2)
Negative Result (Not Warn)   ACC FN*(3)                ACC TN*(4)

As shown in table 7.1, we identified a true positive, ACC TP*(1): the ACC controller warns the driver when the minimum distance is presented. We also identified a false positive, ACC FP*(2): the ACC controller warns the driver when the minimum distance is not presented. This means the ACC controller made a wrong decision because the warning command was sent based on incorrect data.
We also identified a false negative, ACC FN*(3): the ACC controller does not warn the driver when the minimum distance is presented. This means the ACC controller does not make a decision when data is available. That is a hazard because the data is available but the controller missed sending the warning command, potentially leading to a crash. Finally, we identified a true negative, ACC TN*(4): the ACC controller does not warn the driver when the minimum distance is not presented. This means the ACC controller does not make a decision when the data is not available, which is normal operation for the system.

7.2 ASAM's Future Work in Medical Embedded Device - Pacemaker

In this example, we identify FP/FN/TP/TN for the pacemaker. The device controller monitor (DCM) sends the pace command to the pulse generator (PG) to pace the heart with respect to sensing data.

Figure 7.2: ASAM's future work in pacemaker example

Figure 7.2 shows ASAM's future work in the pacemaker system to pace the heart with respect to the availability of the sensing data provided by the electrode sensor. One problem is that the DCM paces the heart when the sensing data is not available. Another problem is that the DCM does not pace the heart when the sensing data is available. For that purpose, we need to analyze the electrode's sensing data with respect to pacing / not pacing the heart to identify false positives (FP), false negatives (FN), true positives (TP) and true negatives (TN) in the pacemaker system. We show this in table 7.2:

Table 7.2: Identify FP/FN/TP/TN in pacemaker

                             Sensing Data Available   Sensing Data Not Available
Positive Result (Pace)       PM TP*(1)                PM FP*(2)
Negative Result (Not Pace)   PM FN*(3)                PM TN*(4)

As shown in table 7.2, we identified a true positive, PM TP*(1): the DCM paces the heart when the sensing data is available. We also identified a false positive, PM FP*(2): the DCM paces the heart when the sensing data is not available.
This means the DCM made a wrong decision because the pace command was sent based on incorrect sensing data. We also identified a false negative, PM FN*(3): the DCM does not pace the heart when the sensing data is available. That is a hazard because the sensing data is available but the controller does not send the command to the pulse generator, potentially causing a life-threatening situation. Finally, we identified a true negative, PM TN*(4): the DCM does not pace the heart when the sensing data is not available. This means that the DCM does not make a decision when the data is not available, which is normal operation for the system.

7.3 Summary

In this section, we focused on how to identify the false positives (FP) and false negatives (FN) in each safety-critical application example described in the dissertation. Specifically, we focused on the availability of the data that is provided by the sensor and then analyzed by the controller to make a decision. This step can be considered an initial identification step from which to start ASAM's future work on identifying FP/FN. According to this identification step, false negatives (FN) can be assessed as hazards for safety-critical systems. As part of our future work we will analyze the following questions:

• RQ6) How could false positives and false negatives occur in safety-critical systems?
• RQ7) What can be done to reduce false positive and false negative probabilities in the feedback control loop architecture?
• RQ8) How can false positives and false negatives be handled in the architecture to identify margins of system safety?

Chapter 8

Conclusion

In the proposal for this dissertation, we promised to provide a new architecture safety analysis method (ASAM) to analyze system safety in a novel way. In this dissertation, we have presented the outline of a new method as well as an implementation of that method for the analysis of safety-critical systems.
Our method is built on STPA but makes the following additions:

• ASAM is capable of finding more hazards by using a three-way communication format as opposed to the two-way format used by STPA. This allows the identification of potential internal failures of the major components in the feedback control loop architecture. The result of each internal failure is an error. The error propagates and cuts across components based on the three-way interaction format. This allows stakeholders to identify new unsafe control actions and reduces the potential effects of residual hazards in the operational system context.

• ASAM also provides a mathematical notation/expression and a formal specification of the error ontology for the feedback control loop architecture to find more safety constraints during hazard analysis. This allows stakeholders to use formal methods to find more hazardous possibilities and mitigate them using safety constraints.

• ASAM provides safety verification procedures to verify the safety constraints and ensure that hazardous conditions cannot occur. ASAM verifies the safety constraints against the system model with injected errors. This allows us to determine either that unsafe behaviors occur (which means the error leads to a hazardous condition in the system) or that the error does not lead to unsafe behaviors of the system.

In conclusion, ASAM allows system stakeholders / safety analysts to find new hazards and new safety constraints that STPA does not find. This does not mean that our method is a replacement for STPA; rather, it supports more effective and rigorous analyses of the elements of safety-critical systems. Our method is an augmentation of existing safety analysis methods, including STPA, to be used alongside current industry standards to uncover more difficult latent errors that current methods are not capable of discovering.
Appendices

Appendix A Error Ontology

• Service Errors
  – ItemOmission / ItemCommission
  – ServiceOmission / ServiceCommission
  – SequenceOmission / SequenceCommission
    ∗ LateServiceStart
    ∗ EarlyServiceTermination
    ∗ BoundedOmissionInterval
    ∗ EarlyServiceStart
    ∗ LateServiceTermination
• Timing Errors
  – ItemTimingError
    ∗ EarlyDelivery
    ∗ LateDelivery
  – SequenceTimingError
    ∗ HighRate
    ∗ LowRate
    ∗ RateJitter
  – ServiceTimingError
    ∗ DelayedService
    ∗ EarlyService
• Value Errors
  – ItemValueError
    ∗ UndetectableValueError
    ∗ DetectableValueError
      · OutOfRange
      · BelowRange
      · AboveRange
      · OutOfBounds
  – SequenceValueError
    ∗ BoundedValueChange
    ∗ StuckValue
    ∗ OutOfOrder
  – ServiceValueError
    ∗ OutOfCalibration
• Replication Errors
• Concurrency Errors
• Access Control Errors

Appendix B STPA Example

In [45], an automated door control system for a train is described. We want to know how the train door can harm people. We simplified the example according to the following steps:

1. The stakeholder establishes the fundamental analyses to identify accidents and the hazards associated with those accidents. Here, we have:

Accidents: Hit by a closing door; falling out of the train door; people trapped inside the train during an emergency evacuation.

Hazards: The hazards related to these accidents could be:
H-1: Door closes on a person in the doorway.
H-2: Door opens when the train is not in the station / not aligned with a station platform.
H-3: People are not able to exit during an emergency.

2. The stakeholder designs a feedback control loop for the system to identify the major components such as sensors, controllers, actuators, and the controlled process. For the train door, we have the structure shown in figure 1.

Figure 1: Feedback control loop for automated door control system for train

3. The stakeholder identifies unsafe control actions that could lead to hazardous states. Figure 2 shows the relationship between the hazards found previously and the provided control actions.

4.
The stakeholder identifies causal factors for the unsafe control actions. For example, the doors are commanded closed while a person is in the doorway. One of the potential causes of that action is an incorrect process model (i.e., the controller incorrectly believes that the doorway is clear). This is the result of inadequate / ineffective feedback provided by a failing sensor. Figure 3 helps to identify the causal factors of the unsafe control actions. For that purpose, we provide table 4 to clarify the context of each control action. The column names are the variables of the process model and the cells are the values of the variables. We know that the controller makes a decision based on the context of the control action. For example, the open door command consists of the values: the train is stopped, no emergency, the train is not aligned with the platform, and no obstacle in the doorway.

Figure 2: Unsafe control actions for train door controller

Figure 3: Process model of the controller

Figure 4: Context of the control actions

Appendix C Safety-Critical System Terminologies

• Fault - 1) A manifestation of an error in software. 2) An incorrect step / process / data definition in a computer program. 3) A defect in a component / hardware device [17].

• Error - 1) The difference between a measured value and the specified value. 2) Omission of a requirement / constraint in the design specification. 3) A human action that produces an incorrect result, such as software containing a fault [17].

• Failure - 1) Termination of the capability of a product to perform a required function. 2) Can be described as an event in which a system / component does not perform a required function within specified limits [17].

• Accident - Can be described as a loss which results from inadequate enforcement of the behavioural safety requirements on the process [21].
• Hazard - 1) A source of potential harm that can lead to human injury or damage to health, property, or the environment. 2) An essential property or condition that has the potential to cause harm. 3) A system state or set of conditions that will lead to an accident [17, 21].
• Unsafe Control Actions - Hazardous scenarios that may occur in the system because a control action is provided, or not provided, when the system needs it [21].
• Safety Constraints - The safeguards that prevent the system from leading to an accident or loss [21].
• Process model - A model required to determine the states of the system variables, and their values, that affect the controller's decisions; it can be updated through feedback [44].
• Process model variables - The safety-critical system variables of the controller in the feedback control loop. They affect the safety of issuing the control actions [44].
• Causal Factors - The accident scenarios that explain how unsafe control actions may occur and how safe control actions may fail to occur [44].
• Software Safety - A field of study within software assurance; a systematic method for identifying, analyzing, mitigating, and controlling software hazards and hazardous functions to assure that the system is safe within its operational context [27].
• Internal failure - 1) An error detection condition; specifically, it is used to identify the source of an error, because the result of an internal failure is a propagation [33]. 2) Given a correct input, a failure occurs during execution that leads to an erroneous output [8].
3) The probability of component failure per demand, accumulated with the failure rate that occurs during interaction with other components [18].

Appendix D List of Abbreviations

• AADL - Architecture Analysis and Design Language
• STPA - System-Theoretic Process Analysis
• STAMP - Systems-Theoretic Accident Model and Processes
• EMV2 - Error Model Annex Version 2
• AGREE - Assume Guarantee REasoning Environment
• SAE - Society of Automotive Engineers
• ACC - Adaptive Cruise Control
• UCA - Unsafe Control Action
• MBE - Model-Based Engineering
• FTA - Fault Tree Analysis
• FMEA - Failure Modes and Effects Analysis
• SAFE - Systematic Analysis of Faults and Errors
• HAZOP - Hazard and Operability Study
• FPTN - Failure Propagation and Transformation Notation
• FPTC - Fault Propagation and Transformation Calculus
• HiP-HOPS - Hierarchically Performed Hazard Origin and Propagation Studies
• LTL - Linear-Time Temporal Logic
• CTL - Computation Tree Logic
• SMV - Symbolic Model Verifier
• FP - False Positive
• FN - False Negative
• TN - True Negative
• TP - True Positive

Appendix E ASAM Annex Language Reference

E.1 Error Statement

E.1.1 Format

Errs => [{
    Type => <ERROR 1 TYPE>(<ERROR 1 SEVERITY>, p=<ERROR 1 PROB>),
    UCA => <UCA 1 ID>: "<UCA 1 DESCRIPTION>",
    Causes => {
        General => "<ERROR 1 GENERAL CAUSE DESCRIPTION>",
        Specific => "<ERROR 1 SPECIFIC CAUSE DESCRIPTION>"
    },
    SC => <SC 1 ID>: "<SC 1 DESCRIPTION>" verified by [GAU1]
}, {
    Type => <ERROR 2 TYPE>(<ERROR 2 SEVERITY>, p=<ERROR 2 PROB>),
    UCA => <UCA 2 ID>: "<UCA 2 DESCRIPTION>",
    Causes => {
        General => "<ERROR 2 GENERAL CAUSE DESCRIPTION>",
        Specific => "<ERROR 2 SPECIFIC CAUSE DESCRIPTION>"
    },
    SC => <SC 2 ID>: "<SC 2 DESCRIPTION>" verified by [GAU2]
}, ...
, {
    Type => <ERROR N TYPE>(<ERROR N SEVERITY>, p=<ERROR N PROB>),
    UCA => <UCA N ID>: "<UCA N DESCRIPTION>",
    Causes => {
        General => "<ERROR N GENERAL CAUSE DESCRIPTION>",
        Specific => "<ERROR N SPECIFIC CAUSE DESCRIPTION>"
    },
    SC => <SC N ID>: "<SC N DESCRIPTION>" verified by [GAU3]
}];

E.1.2 Description

The Errs statement allows known errors, their determined causes, and known safety constraints to be attached to a component of the architecture. A report of all components and their mitigated errors can then be produced. ERROR SEVERITY can be one of 4 values: Catastrophic, Critical, Marginal, or Negligible. ERROR PROB represents how likely the error is to occur; it is a value between 0 and 1 (inclusive). The verified by portion requires the XAGREE annex to be installed. Once installed, you can provide a mathematical proof of your safety constraint by associating XAGREE guarantees with each constraint. Provided that those guarantees can be proven, the safety constraint will be reported as verified.

E.1.3 Example

annex asam {**
    Errs => [{
        Type => ValueOutOfRangeError(Catastrophic, p=0.001),
        UCA => UCA1: "That error gives the unsafe control action to the system",
        Causes => {
            General => "Why this unsafe action happened?"
            Specific => "That error propagated and crossed three components"
        },
        SC => SC1: "Provide the safety constraint to mitigate the effects of the error" verified by [GAU1]
    }]
**};

E.2 Error Propagation Statement

E.2.1 Format

[<INCOMING ERROR 1>, <INCOMING ERROR 2>, ... <INCOMING ERROR N>] on
[<INCOMING PORT 1>, <INCOMING PORT 2>, ...
<INCOMING PORT N>] causes
[<OUTGOING ERROR 1>(<ERROR 1 SEVERITY>, p=<ERROR 1 PROB>),
 <OUTGOING ERROR 2>(<ERROR 2 SEVERITY>, p=<ERROR 2 PROB>), ...
 <OUTGOING ERROR N>(<ERROR N SEVERITY>, p=<ERROR N PROB>)]
on [<OUTGOING PORT 1>, <OUTGOING PORT 2>, ... <OUTGOING PORT N>];

E.2.2 Description

Error propagation statements allow incoming errors to be transformed within the current component into outgoing errors. Errors that do not have a transformation rule are assumed not to propagate beyond the current component. ERROR SEVERITY can be one of 4 values: Catastrophic, Critical, Marginal, or Negligible. ERROR PROB represents how likely the error is to occur; it is a value between 0 and 1 (inclusive).

E.2.3 Example

annex asam {**
    [ValueOutOfRange] on [door_open_c, door_blocked_c] causes
    [ValueOutOfRange(Marginal, p=0.5)] on [open_door_c, close_door_c];
**};

E.3 Internal Failure Statement

E.3.1 Format

internal failure <ID>: "<DESCRIPTION>" causes
[<ERROR 1>(<ERROR 1 SEVERITY>, p=<ERROR 1 PROB>),
 <ERROR 2>(<ERROR 2 SEVERITY>, p=<ERROR 2 PROB>), ...
 <ERROR N>(<ERROR N SEVERITY>, p=<ERROR N PROB>)]
on [<OUT PORT 1>, <OUT PORT 2>, ... <OUT PORT N>];

E.3.2 Description

The internal failure statement allows internal failures to be documented, along with the errors they cause and the ports on which those errors appear. ERROR SEVERITY can be one of 4 values: Catastrophic, Critical, Marginal, or Negligible. ERROR PROB represents how likely the error is to occur; it is a value between 0 and 1 (inclusive).

E.3.3 Example

annex asam {**
    internal failure INTF2: "something bad happened" causes
    [ValueOutOfRange(Negligible, p=0.002)] on [open_door_c, close_door_c];
**};

E.4 Safety Constraint Handles Statement

E.4.1 Format

safety constraint <ID>: "<DESCRIPTION>" handles
[<ERROR 1>, <ERROR 2>, ... <ERROR N>] on [<IN PORT 1>, <IN PORT 2>, ... <IN PORT N>]
verified by [GAU1, GAU2, ... GAU N];

E.4.2 Description

The safety constraint handles statement allows you to specify safety constraints for handling errors on in ports. The verified by portion requires the XAGREE annex to be installed. Once installed, you can provide a mathematical proof of your safety constraint by associating XAGREE guarantees with each constraint. Provided that those guarantees can be proven, the safety constraint will be reported as verified.

E.4.3 Example

annex asam {**
    safety constraint SC1: "handle an error" handles [ValueOutOfRange] on
    [in_data] verified by [GAU1];
**};

E.5 Safety Constraint Prevents Statement

E.5.1 Format

safety constraint <ID>: "<DESCRIPTION>" prevents
[<ERROR 1>, <ERROR 2>, ... <ERROR N>] on [<OUT PORT 1>, <OUT PORT 2>, ... <OUT PORT N>]
verified by [GAU1, GAU2, ... GAU N];

E.5.2 Description

The safety constraint prevents statement allows you to specify safety constraints for preventing errors on out ports. The verified by portion requires the XAGREE annex to be installed. Once installed, you can provide a mathematical proof of your safety constraint by associating XAGREE guarantees to each constraint. Provided that those guarantees can be proven, the safety constraint will be reported as verified.

E.5.3 Example

annex asam {**
    safety constraint SC1: "handle an error" prevents [ValueOutOfRange] on
    [out_data] verified by [GAU1];
**};

Appendix F XText Grammar

grammar edu.clemson.asam.Asam with org.osate.xtext.aadl2.properties.Properties

generate asam "http://www.clemson.edu/ASAM"

import "http://aadl.info/AADL/2.0" as aadl2

AnnexLibrary returns aadl2::AnnexLibrary:
    AsamLibrary;

AnnexSubclause returns aadl2::AnnexSubclause:
    AsamSubclause;

AsamLibrary:
    {AsamContractLibrary} contract=AsamContract;

AsamSubclause:
    {AsamContractSubclause} contract=AsamContract;

AsamContract returns Contract:
    {AsamContract} (statement+=AsamStatement)*;

AsamStatement:
    ErrsStatement | InternalFailureStatement | SafetyConstraintHandlesStatement |
    SafetyConstraintPreventsStatement | ErrorPropagationRuleStatement | TypeStatement;

InternalFailureStatement:
    'internal' 'failure' internalFailureId=ID ':' internalFailureDescription=STRING
    'causes' '[' errors=ErrorsList ']' 'on' '[' ports=PortsList ']' ';';

SafetyConstraintHandlesStatement:
    'safety' 'constraint' safetyConstraintId=ID ':' safetyConstraintDescription=STRING
    'handles' '[' errors=ErrorsListNoProbability ']' 'on' '[' ports=PortsList ']'
    'verified' 'by' '[' matchingXagreeStatements=GuaranteeIds ']' ';';

SafetyConstraintPreventsStatement:
    'safety' 'constraint' safetyConstraintId=ID ':' safetyConstraintDescription=STRING
    'prevents' '[' errors=ErrorsListNoProbability ']' 'on' '[' ports=PortsList ']'
    'verified' 'by' '[' matchingXagreeStatements=GuaranteeIds ']' ';';

ErrorPropagationRuleStatement:
    '[' inErrorsList=ErrorsListNoProbability ']' 'on' '[' inPortsLists=PortsList ']'
    'causes' '[' outErrorsList=ErrorsList ']' 'on' '[' outPortsLists=PortsList ']' ';';

TypeStatement:
    'type' '=>' type=('sensor' | 'controller' | 'controlled_process' | 'actuator') ';';

ErrorsList:
    firstError=Error (',' restErrors+=Error)*;

ErrorsListNoProbability:
    firstError=ErrorNoProbability (',' restErrors+=ErrorNoProbability)*;

PortsList:
    firstPort=ID (',' restPorts+=ID)*;

ErrsStatement:
    {ErrsStatement} 'Errs' '=>' '[' firstError=ErrorStatement
    (',' restErrors+=ErrorStatement)* ']';

ErrorStatement:
    '{' type=ErrorTypeStatement ','
    uca=UnsafeControlActionStatement ','
    cause=CausesStatement ','
    sc=SafetyConstraintStatement
    '}';

ErrorTypeStatement:
    'Type' '=>' (errorType=Error);

UnsafeControlActionStatement:
    'UCA' '=>' id=ID ':' description=STRING;

CausesStatement:
    'Causes' '=>' '{' (general=GeneralCauseStatement | specific=SpecificCauseStatement |
    general=GeneralCauseStatement ',' specific=SpecificCauseStatement) '}';

GeneralCauseStatement:
    'General' '=>' description=STRING;

SpecificCauseStatement:
    'Specific' '=>' description=STRING;

SafetyConstraintStatement:
    'SC' '=>' id=ID ':' description=STRING 'verified' 'by'
    '[' matchingXagreeStatements=GuaranteeIds ']';

GuaranteeIds:
    first=ID (',' rest+=ID)*;

Error:
    error=('ServiceError' | 'ItemOmission' | 'ServiceOmission' | 'SequenceOmission' |
    'TransientServiceOmission' | 'LateServiceStart' | 'EarlyServiceTermination' |
    'BoundedOmissionInterval' | 'ItemCommission' | 'ServiceCommission' |
    'SequenceCommission' | 'EarlyServiceStart' | 'LateServiceTermination' |
    'TimingRelatedError' | 'ItemTimingError' | 'EarlyDelivery' | 'LateDelivery' |
    'SequenceTimingError' | 'HighRate' | 'LowRate' | 'RateJitter' |
    'ServiceTimingError' | 'DelayedService' | 'EarlyService' | 'ValueRelatedError' |
    'ItemValueError' | 'UndetectableValueError' | 'DetectableValueError' |
    'OutOfRange' | 'BelowRange' | 'AboveRange' | 'OutOfBounds' |
    'SequenceValueError' | 'BoundedValueChange' | 'StuckValue' | 'OutOfOrder' |
    'ServiceValueError' | 'OutOfCalibration' | 'ReplicationError' |
    'AsymmetricReplicatesError' | 'AsymmetricValue' | 'AsymmetricApproximateValue' |
    'AsymmetricExactValue' | 'AsymmetricTiming' | 'AsymmetricOmission' |
    'AsymmetricItemOmission' | 'AsymmetricServiceOmission' |
    'SymmetricReplicatesError' | 'SymmetricValue' | 'SymmetricApproximateValue' |
    'SymmetricExactValue' | 'SymmetricTiming' | 'SymmetricOmission' |
    'SymmetricItemOmission' | 'SymmetricServiceOmission' | 'ConcurrencyError' |
    'RaceCondition' | 'ReadWriteRace' | 'WriteWriteRace' | 'MutExError' |
    'Deadlock' | 'Starvation' | ID)
    '(' severityLevel=('Catastrophic' | 'Critical' | 'Marginal' | 'Negligible')
    ', p=' probability=('0' | '1' | FLOAT) ')';

ErrorNoProbability:
    error=('ServiceError' | 'ItemOmission' | 'ServiceOmission' | 'SequenceOmission' |
    'TransientServiceOmission' | 'LateServiceStart' | 'EarlyServiceTermination' |
    'BoundedOmissionInterval' | 'ItemCommission' | 'ServiceCommission' |
    'SequenceCommission' | 'EarlyServiceStart' | 'LateServiceTermination' |
    'TimingRelatedError' | 'ItemTimingError' | 'EarlyDelivery' | 'LateDelivery' |
    'SequenceTimingError' | 'HighRate' | 'LowRate' | 'RateJitter' |
    'ServiceTimingError' | 'DelayedService' | 'EarlyService' | 'ValueRelatedError' |
    'ItemValueError' | 'UndetectableValueError' | 'DetectableValueError' |
    'OutOfRange' | 'BelowRange' | 'AboveRange' | 'OutOfBounds' |
    'SequenceValueError' | 'BoundedValueChange' | 'StuckValue' | 'OutOfOrder' |
    'ServiceValueError' | 'OutOfCalibration' | 'ReplicationError' |
    'AsymmetricReplicatesError' | 'AsymmetricValue' | 'AsymmetricApproximateValue' |
    'AsymmetricExactValue' | 'AsymmetricTiming' | 'AsymmetricOmission' |
    'AsymmetricItemOmission' | 'AsymmetricServiceOmission' |
    'SymmetricReplicatesError' | 'SymmetricValue' | 'SymmetricApproximateValue' |
    'SymmetricExactValue' | 'SymmetricTiming' | 'SymmetricOmission' |
    'SymmetricItemOmission' | 'SymmetricServiceOmission' | 'ConcurrencyError' |
    'RaceCondition' | 'ReadWriteRace' | 'WriteWriteRace' | 'MutExError' |
    'Deadlock' | 'Starvation' | ID);

terminal FLOAT: '0.' INTEGER_LIT;

Appendix G ASAM's Plug-in

G.1 Install OSATE

http://osate.org/download-and-install.html

G.2 Install ASAM

https://bitbucket.org/strategicsoftwareengineering/asam-update-site/

G.3 ASAM Examples

G.3.1 Adaptive Cruise Control System (ACC)

Example 1: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/ACCExample1.aadl
Example 2: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/ACCExample2.aadl

G.3.2 Medical Embedded Device - Pacemaker (PM)

Example 1: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/PMExample1.aadl
Example 2: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/PMExample2.aadl
Example 3: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/PMExample3.aadl

G.3.3 Train Automated Door Control System (TADCS)

Example 1: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/TADCSExample1.aadl
Example 2: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/TADCSExample2.aadl
Example 3: https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/TADCSExample3.aadl

G.3.4 Infant Incubator Temperature Control System - Isolette (ISO)

https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/ISOLETTEExample1.aadl

G.3.5 Notation Example

https://bitbucket.org/strategicsoftwareengineering/asam-examples-2/src/master/NotationExample.aadl

Appendix H Feedback Results Representation

In this section, we present the results of the feedback questionnaire from Section 3.3, which was published as a technical paper at the 36th International System Safety Conference (ISSC). In that session, after the presentation, 21 system safety specialists voted on 10 questions using the categories strongly agree, agree, undecided, disagree, and strongly disagree. The questions are listed in Figure 5.
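The per-question vote counts can be summarized as percentages over the five agreement levels. The Python sketch below illustrates that aggregation for 21 respondents; the vote counts in it are hypothetical placeholders, not the actual tallies reported in Figure 6.

```python
# Summarize Likert-scale feedback for one question as percentages.
# The counts below are hypothetical placeholders; the real tallies
# for the 10 questions appear in Figure 6.
LEVELS = ["strongly agree", "agree", "undecided", "disagree", "strongly disagree"]

def summarize(votes):
    """Map each agreement level to its percentage of the total votes cast."""
    total = sum(votes.values())
    return {level: round(100.0 * votes.get(level, 0) / total, 1) for level in LEVELS}

# Hypothetical distribution of the 21 specialists' votes for one question.
q1 = {"strongly agree": 8, "agree": 10, "undecided": 2, "disagree": 1}
print(summarize(q1))
# -> {'strongly agree': 38.1, 'agree': 47.6, 'undecided': 9.5,
#     'disagree': 4.8, 'strongly disagree': 0.0}
```

Applying the same aggregation to each of the 10 questions yields the percentage breakdown that a stacked-bar presentation of the voting results would display.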
Figure 5: Feedback questionnaire

Also, Figure 6 shows the results of their votes with respect to each question.

Figure 6: Results of the votes

Bibliography

[1] Asim Abdulkhaleq and Stefan Wagner. Integrating state machine analysis with system-theoretic process analysis. LNI Proceedings of the Software Engineering Workshop, Springer, P(215):501–514, December 2013.
[2] Asim Abdulkhaleq and Stefan Wagner. A software safety verification method based on system-theoretic process analysis. In International Conference on Computer Safety, Reliability, and Security (SAFECOMP). Springer, 2014.
[3] Ahmed Baig, Risza Ruzli, and Azizul Buang. Reliability analysis using fault tree analysis: A review. International Journal of Chemical Engineering and Applications, 4(3):169–173, June 2013.
[4] Philip Asare, John Lach, and John A. Stankovic. FSTPA-I: A formal approach to hazard identification via system theoretic process analysis. In Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems, ICCPS '13, pages 150–159, New York, NY, USA, 2013. ACM.
[5] Mordechai Ben-Ari. The bug that destroyed a rocket. Special Interest Group on Computer Science Education, SIGCSE Bulletin, 33(2):58–59, June 2004.
[6] Ronal Berger et al. Rhythm management product performance report: Q2 edition. Boston Scientific: Advancing Science for Life, June 2016.
[7] Darren Cofer et al. Compositional verification of architectural models. In NFM 2012, LNCS 7226, pages 126–140. Springer-Verlag Berlin Heidelberg, 2012.
[8] V. Cortellessa and V. Grassi. Role and impact of error propagation in software architecture reliability. Technical Report TRCS 007/2006, 2006.
[9] Julien Delange. AADL in Practice: Design and Validate the Architecture of Critical Systems. Reblochon Development Company, May 2017.
[10] Julien Delange and Peter Feiler. Supporting safety evaluation process using AADL. SEI, 7th Layered Assurance Workshop, 2013.
[11] Peter Feiler and David Gluch. Model-Based Engineering with AADL. Addison-Wesley, 2013.
[12] Peter H. Feiler, David Gluch, and John D. McGregor. An architecture-led safety analysis method. In 8th European Congress on Embedded Real Time Software and Systems (ERTS 2016), Toulouse, France, January 2016.
[13] Peter Fenelon and John A. McDermid. An integrated toolset for software safety analysis. Journal of Systems and Software, 21(3):279–290, June 1993.
[14] Jeremy Hsu. Toyota recalls 1.9 million Prius hybrids over software flaw. IEEE Spectrum, February 2014.
[15] John Hudak and Peter Feiler. Developing AADL models for control systems: A practitioner's guide. CMU/SEI-2007-TR-014, July 2007.
[16] Carnegie Mellon University / Software Engineering Institute (SEI). Open Source AADL Tool Environment: http://osate.org/, 2005.
[17] International Organization for Standardization, International Electrotechnical Commission, and Institute of Electrical and Electronics Engineers. ISO/IEC/IEEE 24765:2010 Systems and software engineering — Vocabulary, December 2010.
[18] Katerina Goseva-Popstojanova and Kishor S. Trivedi. Architecture-based approach to reliability assessment of software systems. Performance Evaluation: An International Journal, 45(1):179–204, April 2001.
[19] John C. Knight. Safety critical systems: challenges and directions. In The 24th International Conference on Software Engineering (ICSE). IEEE, May 2002.
[20] Nancy Leveson. A new accident model for engineering safer systems. Safety Science, 42(4):237–270, April 2004.
[21] Nancy Leveson. Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press, 2011.
[22] Nancy Leveson. An STPA primer: Version 1. MIT publications, http://sunnyday.mit.edu/STPA-Primerv0.pdf, August 2013.
[23] Nancy Leveson and Clark Turner. An investigation of the Therac-25 accidents. IEEE Computer, 26(7):18–41, July 1993.
[24] Jacques-Louis Lions. Ariane 5: Flight 501 failure. Technical Report: Report by the Inquiry Board, July 1996.
[25] Ethan McGee and John McGregor. A decision support system for selecting between designs for dynamic software product lines. Ph.D. dissertation in Computer Science, Clemson University, August 2018.
[26] Anitha Murugesan et al. Compositional verification of a medical device system. In ACM SIGAda Ada Letters, volume 33, pages 51–64. ACM, 2013.
[27] NASA-GB-8719.13. NASA Software Safety Guidebook: Technical Standard. NASA, March 2004.
[28] Natalya An, Michael Maile, and Daniel Jiang. Balancing the requirements for a zero false positive/negative forward collision warning. 10th IEEE Annual Conference on Wireless On-demand Network Systems and Services (WONS), Banff, Canada, March 2013.
[29] National Heart, Lung, and Blood Institute. "Explore pacemakers". https://www.nhlbi.nih.gov/health/health-topics/topics/pace, February 28, 2012.
[30] Dennis Nolan. Safety and Security Review for the Process Industries: Application of HAZOP, PHA, What-If and SVA Reviews, 4th Edition. Elsevier, 2015.
[31] Pacemaker system specification. Software Quality Research Laboratory (SQRL). Boston Scientific Publisher, January 3, 2007.
[32] Peter Feiler, David Gluch, and John Hudak. The Architecture Analysis and Design Language (AADL): An introduction. SEI, CMU/SEI-2006-TN-011, February 2006.
[33] Peter Feiler, David Gluch, John Hudak, and Julien Delange. Architecture fault modeling and analysis with the Error Model Annex, version 2. SEI, CMU/SEI-2016-TR-009, June 2016.
[34] Peter Feiler, Julien Delange, David Gluch, and John D. McGregor. Architecture-led safety process. Technical Report CMU/SEI-2016-TR-012, Carnegie Mellon University / Software Engineering Institute, Pittsburgh, 2016.
[35] Peter Feiler, John Hudak, Julien Delange, and David Gluch. Architecture fault modeling and analysis with the Error Model Annex, version 2. Technical Report CMU/SEI-2016-TR-009, Carnegie Mellon University / Software Engineering Institute, Pittsburgh, 2016.
[36] Sam Procter. A development and assurance process for medical application platform apps. Ph.D. dissertation, Kansas State University, 2016.
[37] Sam Procter and John Hatcliff. An architecturally-integrated, systems-based hazard analysis for medical applications. In Twelfth ACM/IEEE Conference on Formal Methods and Models for Codesign (MEMOCODE). IEEE, November 2014.
[38] Fryad M. Rashid and John D. McGregor. Augmenting a hazard analysis method with error propagation information for safety-critical systems. 35th International System Safety Conference, Albuquerque, New Mexico, USA, 21–25 August 2017.
[39] Fryad M. Rashid and John D. McGregor. Using formal notations to augment a hazard analysis method. 36th International System Safety Conference, Phoenix, Arizona, USA, 13–17 August 2018.
[40] Raymond Mikulak, Robin McDermott, and Michael Beauregard. The Basics of FMEA, 2nd Edition. CRC Press, 2008.
[41] ATSB Transport Safety Investigation Report. In-flight upset event, 240 km north-west of Perth, WA, Boeing Company 777-200, 9M-MRG. Australian Transport Safety Bureau, August 2005.
[42] NIST Planning Report 02-3. The economic impacts of inadequate infrastructure for software testing. May 2002.
[43] Felix Salfner and Miroslaw Malek. Reliability Modeling of Proactive Fault Handling. Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, Institut für Informatik, 2006.
[44] John Thomas. Extending and automating a systems-theoretic hazard analysis for requirements generation and analysis. Ph.D. dissertation, MIT, 2013.
[45] John Thomas and Nancy Leveson. Performing hazard analysis on complex, software- and human-intensive systems. The 29th International System Safety Conference, Las Vegas, NV, August 2011.
[46] Nigel J. Tracey et al. Integrating safety analysis with automatic test-data generation for software safety verification. Proceedings of the 17th International Conference on System Safety, 1999.
[47] Malcolm Wallace. Modular architectural representation and analysis of fault propagation and transformation. Electronic Notes in Theoretical Computer Science, 141(3):53–71, December 2005.
[48] Y. Papadopoulos, J. McDermid, R. Sasse, and G. Heiner. Analysis and synthesis of the behaviour of complex programmable electronic systems in conditions of failure. Reliability Engineering and System Safety, 71(3):229–247, 2001.