Abstract
In many cases, an interlocking system is required to provide not only a high safety integrity level but also high availability. Many manufacturers therefore build the interlocking system on architecture 2003 or architecture 2 × (2002). The aim of the paper is to compare the safety integrity against random failures of two identical safety functions implemented by interlocking systems with different architectures: architecture 2003 and architecture 2 × (2002). To observe the simultaneous influence of the system and operational parameters of the interlocking system on the safety integrity of the safety function against random failures, the models are built with a method based on Markov chain theory. The calculations, based on the mathematical description of the models, are carried out in the software tool Wolfram Mathematica, and the results are presented graphically.
1 Introduction
To achieve a higher safety integrity level (SIL3, SIL4), electronic interlocking systems (IS) use the composite fail-safety technique, which is based on redundancy. If safety integrity is the main observed property, solutions based on architecture 2002 (2-out-of-2) are normally used. If availability is also a main observed property (in addition to safety integrity), solutions based on architecture 2003 (2-out-of-3) or 2 × (2002) are used. These are multichannel solutions whose safety is achieved by processing the safety function (SF) multiple times and then comparing the results or intermediate results. The necessary safety conditions are:
- Early detection of failures and subsequent negation of their consequences;
- Independence of the channels.
If redundancy is used in the system only to improve the dependability indicators, the redundant parts (channels in this case) are themselves referred to as the redundancy. From the point of view of correct system operation, the redundant parts are those parts that are superfluous as long as the system works without a failure (within the scope of the paper, a failure means a random failure).
Active redundancy is characterized by the fact that the control function is performed by the basic item and all redundant items at the same time. A system with active redundancy is functional as long as m out of all n system items perform the required function. Systems with this architecture are therefore called m out of n systems. A suitable choice of the parameters m and n has a significant influence on the dependability and safety properties of the system.
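The m out of n condition can be illustrated with a short Python sketch. It assumes identical, mutually independent items with exponentially distributed times to failure and no repair; the function names are hypothetical, and the result is the probability that the system is functional, not a safety integrity figure.

```python
from math import comb, exp


def item_reliability(failure_rate: float, hours: float) -> float:
    """Probability that one item survives `hours` (exponential model, assumed)."""
    return exp(-failure_rate * hours)


def m_out_of_n_reliability(m: int, n: int, r: float) -> float:
    """Probability that at least m of n independent items work,
    each item working with probability r (binomial sum)."""
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))


# Illustrative values only: failure rate 1e-6 per hour, one year of operation.
r = item_reliability(1e-6, 8760.0)
print("2 out of 3:", m_out_of_n_reliability(2, 3, r))
print("2 out of 2:", m_out_of_n_reliability(2, 2, r))
```

The 2 out of 3 system tolerates one failed item, so its probability of remaining functional is higher than that of the 2 out of 2 system built from the same items.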
If only a part of the total number of items that perform the required function is active and the rest of the items are inactive (they wait to be put into the active state if one of the active items fails), the system is called a system with standby redundancy. A very common configuration is one in which the system performs the required function as long as at least one item is functional. At system commissioning, the basic item is connected to the process. If the basic item fails, it is disconnected and its function is taken over by the first available functional redundant item. If this first redundant item also fails, the function is taken over by the next functional redundant item. This cycle is repeated as long as a functional redundant item is available.
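For the common configuration described above, the probability that the function is still performed can be sketched as follows. The sketch assumes idealized cold standby: identical items with a constant failure rate, no failures while an item is inactive, perfect and instantaneous switchover, and no repair. Under these assumptions the system survives as long as fewer than n failures have occurred, which gives the Erlang survival function. The function name is hypothetical.

```python
from math import exp, factorial


def cold_standby_reliability(n_items: int, failure_rate: float, hours: float) -> float:
    """Probability that fewer than n_items failures occur within `hours`
    when exactly one item is active at a time (idealized cold standby)."""
    x = failure_rate * hours
    return exp(-x) * sum(x**k / factorial(k) for k in range(n_items))


# Illustrative values only: one basic item plus one standby item.
print(cold_standby_reliability(2, 1e-6, 8760.0))
```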
The IS with architecture 2003 uses active redundancy to achieve the required SIL and availability, whereas the IS with architecture 2 × (2002) uses standby redundancy for the same purpose.
Selecting a suitable solution at the design stage requires knowledge of its observed properties. This paper focuses on comparing the safety integrity of the SF realized by two different solutions: architecture 2003 and architecture 2 × (2002). For simplicity, it is assumed within the paper that the IS realizes just one SF (the SIL of the SF can then be identified with the SIL of the IS).
2 Theoretical Part
2.1 The Random Failures Influence on the Safety Integrity of the IS with Architecture 2003
Figure 1 shows the architecture of the IS, where A, B and C are three identical and mutually independent channels. The links AB, BC and AC represent the communication systems for data transmission between the channels. Among other things, the system is able to remain functional (with the required SIL) even after a failure in one channel, thanks to the reconfiguration from architecture 2003 to architecture 2002. The reconfiguration also includes disconnecting (isolating) the failed channel. After the failure is removed and the isolated channel is returned to operation (for example, by replacing the failed channel with a functional one), the system again works with architecture 2003. The results (or intermediate results) from the individual channels are evaluated by voting.
A dangerous state of the system (with respect to the consequences of random failures) is a state in which at least two of the three channels have a potentially dangerous random failure while the system is regarded as functional by its environment (the cooperating systems); that is, its data are regarded as correct, or as data that do not threaten the controlled process. The hazard analysis shows that the dangerous state of the system may be caused by one of the following events:
- Occurrence of random failures of the system;
- Dangerous failure of the communication between channels;
- Dangerous failure of the voting.
Let us assume that the system includes a continuous failure detection mechanism, which enters the analysis of integrity against random failures through the diagnostic coverage coefficient \( c_{d} \) and the duration of the diagnostic cycle \( T_{d} \).
For clarity, we will assume that a dangerous failure of the voting or of the communication between channels has a negligible influence on the system safety integrity. More detailed information about modelling the safety integrity of the communication system is given in [3].
The influence of random failures on the system safety integrity can be analysed using a continuous-time Markov chain (CTMC). Figure 2 shows the diagram describing the transition of the system with architecture 2003 from the no-failure state 1 into the dangerous state 7, or into the safe state 8, provided that the random failure rates of channels A, B and C are identical, i.e. \( \lambda_{A} = \lambda_{B} = \lambda_{C} = \lambda \).
The meaning of the individual states and transitions of the CTMC in Fig. 2 is given in Tables 1 and 2.
The transition rate \( \delta \) from state 3 into state 4, or from state 6 into state 8, can be determined as (pessimistic approach):
\[ \delta = \frac{1}{T_{d} + T_{n}} \qquad (1) \]
where \( T_{d} \) is the failure detection time and \( T_{n} \) is the failure negation time.
The diagram in Fig. 2 does not take into account the influence of recovery on the safety integrity (it describes only the transition into an absorbing state). The influence of recovery on the system safety integrity depends on the recovery time and on the activities related to the system recovery. In general, recovery can take place when:
- The system is in the operable state and the recovery applies only to a part of the system (recovery from architecture 2002 back to architecture 2003);
- The system is in the inoperable state; this can be a recovery after a regular preventive inspection or after the transition of the system into state 8 or 7.
The transition of the system from the inoperable state to the operable state can be modelled using a discrete-time Markov chain (DTMC) [4, 5].
The recovery of the repaired channel while the system is in the operable state is shown in Fig. 3. Since the recovery of a faulty part can occur only after the failure has been detected and negated, recovery is possible only from architecture 2002, that is, when the system is in one of states 4, 5 and 6. The transition rate from one state into another is given by the recovery rate \( \mu \). If the system is in state 4, both remaining channels are functional, and after the recovery of the third channel the system passes into state 1. It cannot be excluded that, during the activities related to the recovery of the failed channel, a failure occurs in one of the two operable channels; the system then passes into state 5 or 6. After the recovery of the third channel, the system passes from state 6 into state 3 if this failure is detectable, and from state 5 into state 2 if it is undetectable.
The recovery rate \( \mu \) (the rate of transitions 4 → 1, 6 → 3 and 5 → 2) can be calculated according to the following equation:
\[ \mu = \frac{1}{MTTR} \qquad (2) \]
where \( MTTR \) is the mean time to recovery.
The CTMC shown in Fig. 3 can be described by the transition rate matrix (3) and the system of differential equations (4).
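The matrix (3) and the system (4) are not reproduced in this text. As a reminder of their general shape only, and assuming the row-vector convention, a CTMC with \( n \) states satisfies the relations below. The symbols \( \mathbf{A} \), \( a_{ij} \) and \( \mathbf{P}(t) \) are notation chosen for this sketch and are not taken from the paper.

\[ \frac{\mathrm{d}\mathbf{P}(t)}{\mathrm{d}t} = \mathbf{P}(t)\,\mathbf{A}, \qquad \mathbf{P}(t) = \left( p_{1}(t), \ldots, p_{n}(t) \right), \qquad a_{ii} = -\sum_{j \ne i} a_{ij} \]

Here \( a_{ij} \) (for \( i \ne j \)) is the transition rate from state \( i \) into state \( j \), so every row of \( \mathbf{A} \) sums to zero, and the formal solution is \( \mathbf{P}(t) = \mathbf{P}(0)\,e^{\mathbf{A}t} \).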
From the solution of the system of differential equations (4) and the known probability distribution vector at time \( t = 0 \), it is possible to calculate the probability of the dangerous state 7, \( p_{D} \left( t \right) \), and subsequently the rate of dangerous failure of the system, \( \lambda_{D} \left( t \right) \).
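The mechanics of this calculation can be sketched in Python on a deliberately small toy chain. The three-state model below is an assumption made for illustration and is not the model of Fig. 2 or Fig. 3: a single channel fails with rate \( \lambda \), a fraction \( c_{d} \) of failures leads directly to a safe state, and the rest leads to a dangerous state. The helper names are hypothetical, and the matrix exponential is a simple scaling-and-squaring routine rather than the Wolfram Mathematica solver used by the authors.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, taylor_terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2**squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, taylor_terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def state_probabilities(p0, rate_matrix, hours):
    """Row vector p(t) = p(0) * exp(rate_matrix * t)."""
    e = mat_exp([[x * hours for x in row] for row in rate_matrix])
    n = len(p0)
    return [sum(p0[i] * e[i][j] for i in range(n)) for j in range(n)]


# Toy chain (assumed): state 0 = no failure, 1 = dangerous, 2 = safe.
lam, c_d = 1e-6, 0.99
rate_matrix = [
    [-lam, (1 - c_d) * lam, c_d * lam],
    [0.0, 0.0, 0.0],  # dangerous state is absorbing
    [0.0, 0.0, 0.0],  # safe state is absorbing
]
t = 20 * 8760.0  # 20 years in hours, assuming 8760 h per year
p = state_probabilities([1.0, 0.0, 0.0], rate_matrix, t)
p_dangerous = p[1]
print(p_dangerous)
```

For this toy chain the result can be checked against the closed form \( (1 - c_{d})(1 - e^{-\lambda t}) \); the models of the paper have more states and require the numerical solution.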
2.2 The Random Failures Influence on the Safety Integrity of the IS with Architecture 2 × (2002)
Figure 4 shows the architecture of the IS, which consists of two independent subsystems, SS1 and SS2. Each subsystem is a dual-channel system with architecture 2002 consisting of two independent channels, A and B. Among other things, the system is able to remain functional (with the required SIL) even after a failure in one subsystem, thanks to the switchover to the second, functional subsystem and the disconnection of the failed subsystem from the controlled process. The switch control block commands the switch to change over on the basis of information about the operable state of the individual subsystems.
A dangerous state of the system (with respect to the consequences of random failures) is a state in which channels A and B of the subsystem connected to the controlled process have a potentially dangerous random failure while the system is regarded as functional by its environment (the cooperating systems); that is, its data are regarded as correct, or as data that do not threaten the controlled process. The hazard analysis shows that the dangerous state of the system may be caused by one of the following events:
- Occurrence of random failures of the system;
- Dangerous failure of the switch;
- Dangerous failure of the switch control.
Let us assume that the system includes a continuous failure detection mechanism, which enters the analysis of integrity against random failures through the diagnostic coverage coefficient \( c_{d} \) and the duration of the diagnostic cycle \( T_{d} \).
For clarity, we will assume that a dangerous failure of the switch or of the switch control has a negligible influence on the system safety integrity.
The reliability and safety properties of systems with architecture 2 × (2002) (also denoted as 2004) are studied, for example, in [6, 7].
The influence of random failures on the system safety integrity can be analysed using a CTMC. Figure 5 shows the diagram describing the transition of the system with architecture 2 × (2002) from the no-failure state 1 into the dangerous state 7, or into the safe state 8, provided that the random failure rates of channels A and B within a subsystem are identical, i.e. \( \lambda_{1A} = \lambda_{1B} = \lambda_{1} \) and \( \lambda_{2A} = \lambda_{2B} = \lambda_{2} \). It is assumed here that the standby redundancy (subsystem SS2) works in the mode called cold standby redundancy: while SS1 is in operation, SS2 is disconnected from the power source, and it is therefore assumed that a random failure of the redundant part cannot occur in this operation mode (an idealization) [2].
The meaning of the individual states and transitions of the CTMC in Fig. 5 is given in Tables 3 and 4.
The transition rate from state 3 into state 4, δ1, is the rate of failure detection and negation in subsystem SS1 together with the switchover to subsystem SS2. The transition rate from state 6 into state 8, δ2, is the rate of failure detection and negation in subsystem SS2. Both δ1 and δ2 can be determined according to (1).
The diagram in Fig. 5 does not take into account the influence of recovery on the safety integrity. The influence of recovery on the system safety integrity depends on the recovery time and on the activities related to the system recovery. In general, recovery can take place when:
- The system is in the operable state and the recovery applies only to a part of the system (recovery from architecture 2002 back to architecture 2 × (2002));
- The system is in the inoperable state; this can be a recovery after a regular preventive inspection or after the transition of the system into state 8 or 7.
The recovery of a subsystem while the system is in the operable state is shown in Fig. 6. Since the recovery of a faulty part can occur only after the failure has been detected and negated, recovery is possible only from architecture 2002, that is, when the system is in one of states 4, 5 and 6. The transition rate from one state into another is given by the recovery rate \( \mu \). It is assumed that a failure cannot occur in a subsystem during its recovery. If the system is in state 4, both channels of subsystem SS2 are functional, and after the recovery of subsystem SS1 the system passes into state 1. It cannot be excluded that, during the activities related to the recovery of subsystem SS1, a failure occurs in one of the two operable channels of subsystem SS2; the system then passes into state 5 or 6 (depending on whether the failure is undetectable or detectable). Since it is assumed that a failure cannot occur in subsystem SS1 during its recovery, transitions of the system from states 5 and 6 into state 1 can also be assumed. In that case, however, after a subsequent failure in subsystem SS1 and its detection and negation, the system would pass not into state 4 but into state 5 or 6. It is therefore necessary to carry out a regular inspection (proof test) of subsystem SS2 during the recovery of subsystem SS1, which ensures that subsystem SS2 is free of failures while subsystem SS1 is in operation.
The recovery rate (the rate of transitions 4 → 1, 6 → 1 and 5 → 1) can be calculated according to (2).
The CTMC shown in Fig. 6 can be described by the transition rate matrix (5) and the system of differential equations (6).
From the solution of the system of differential equations (6) and the known probability distribution vector at time \( t = 0 \), it is possible to calculate the probability of the dangerous state 7, \( p_{D} \left( t \right) \), and subsequently the rate of dangerous failure of the system, \( \lambda_{D} \left( t \right) \).
3 Experimental Part
In order to compare the safety properties of two different solutions of the same SF, let us assume that:
- The basic blocks (channels) in both solutions have identical hardware and the random failure rate \( \lambda_{A} = \lambda_{B} = \lambda_{C} = \lambda = 10^{ - 6} \,{\text{h}}^{ - 1} \);
- Both solutions implement a continuous failure detection mechanism with the diagnostic coverage coefficient \( c_{d} = 0.99 \) and the duration of the diagnostic cycle \( T_{d} = 1 \,{\text{h}} \);
- The negation time is negligible compared with the duration of the diagnostic cycle;
- The mean time to recovery is \( MTTR = 5\, {\text{h}} \);
- The useful lifetime of the IS is 20 years;
- All the channels of the given solutions are involved in the realization of one SF.
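The assumptions above can be turned into model rates with a few lines of Python. This is a sketch under stated assumptions: the detection and negation rate is taken in the pessimistic form \( \delta = 1/(T_{d} + T_{n}) \), the recovery rate as \( \mu = 1/MTTR \), the split of \( \lambda \) into detectable and undetectable parts by \( c_{d} \) is the usual convention and is assumed here because the transition tables are not reproduced, and a year is taken as 8760 h. All variable names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365-day year

lam = 1e-6   # random failure rate of one channel [1/h]
c_d = 0.99   # diagnostic coverage coefficient
t_d = 1.0    # duration of the diagnostic cycle [h]
t_n = 0.0    # negation time, neglected as stated above [h]
mttr = 5.0   # mean time to recovery [h]
lifetime_h = 20 * HOURS_PER_YEAR  # useful lifetime [h]

delta = 1.0 / (t_d + t_n)             # detection and negation rate [1/h]
mu = 1.0 / mttr                       # recovery rate [1/h]
lam_detectable = c_d * lam            # assumed split by diagnostic coverage
lam_undetectable = (1.0 - c_d) * lam

print(f"delta = {delta} 1/h, mu = {mu} 1/h, lifetime = {lifetime_h} h")
print(f"detectable = {lam_detectable:.3e} 1/h, undetectable = {lam_undetectable:.3e} 1/h")
```

The rates span about eight orders of magnitude (from roughly \( 10^{-8} \) to 1 per hour), which makes the differential equations stiff; this is worth keeping in mind when choosing a numerical solver.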
3.1 Comparison of Safety Properties of Both Architectures
The influence of random failures, of failure detection and negation, and of the architecture change after failure detection and negation on the safety integrity of the SF for architectures 2003 and 2 × (2002) can be seen in Figs. 7 and 8.
Figure 7 shows the curves of the SF failure probability for both compared architectures (see the CTMC in Fig. 2 for architecture 2003 and the CTMC in Fig. 5 for architecture 2 × (2002)) over the entire useful lifetime of the IS.
When the IS works in the continuous mode of operation, the quantity decisive for the SIL determination is the SF failure rate, which can be determined from \( p_{D} \left( t \right) \) [1]. The curves of the SF failure rate over the entire useful lifetime of the IS are shown in Fig. 8.
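The relation between \( p_{D} \left( t \right) \) and the failure rate is not written out in this text. One common choice, assumed here and not confirmed as the authors' formula, is the hazard rate conditional on the system not yet having failed dangerously:

\[ \lambda_{D}(t) = \frac{1}{1 - p_{D}(t)} \cdot \frac{\mathrm{d}p_{D}(t)}{\mathrm{d}t} \]

For \( p_{D}(t) \ll 1 \) this is approximately \( \mathrm{d}p_{D}(t)/\mathrm{d}t \), so the rate curve is essentially the slope of the probability curve.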
Figures 9 and 10 complement the results presented in Figs. 7 and 8 with the influence of recovery on the safety integrity.
Figure 9 shows the curves of the SF failure probability for both compared architectures (see the CTMC in Fig. 3 for architecture 2003 and the CTMC in Fig. 6 for architecture 2 × (2002)) over the entire useful lifetime of the IS.
When the IS works in the continuous mode of operation, the quantity decisive for the SIL determination is the SF failure rate, which can be determined from \( p_{D} \left( t \right) \) [1]. The curves of the SF failure rate over the entire useful lifetime of the IS are shown in Fig. 10.
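Once a failure rate curve is available, its worst value over the lifetime can be compared with the tolerable hazard rate bands. The sketch below assumes the bands commonly quoted from EN 50129 (SIL 4 below \( 10^{-8} \), SIL 3 below \( 10^{-7} \), SIL 2 below \( 10^{-6} \), SIL 1 below \( 10^{-5} \) per hour and function); these limits should be checked against the standard itself. The function name is hypothetical, and a quantitative result of this kind covers random failures only, not the measures against systematic failures that a SIL also requires.

```python
from typing import Optional

# Upper band limits [1/h] and the SIL they correspond to (assumed, see lead-in).
SIL_BANDS = [(1e-8, 4), (1e-7, 3), (1e-6, 2), (1e-5, 1)]


def sil_for_hazard_rate(rate_per_hour: float) -> Optional[int]:
    """Highest SIL whose quantitative band the given rate satisfies, or None."""
    for upper_limit, sil in SIL_BANDS:
        if rate_per_hour < upper_limit:
            return sil
    return None


# Illustrative rates only, not results from the paper.
for rate in (5e-9, 3e-8, 2e-4):
    print(rate, sil_for_hazard_rate(rate))
```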
4 Conclusion
Based on the results presented in the paper, it can be concluded that, from the point of view of safety integrity (for the considered parameter values), architecture 2 × (2002) seems to be more appropriate. This result remains valid even when recovery, which has a negative influence on the safety integrity, is considered.
Since the paper deals only with the safety properties of the IS, it is necessary to add that an analysis of the reliability properties could point to the other architecture as the more suitable one.
In practice, it is often necessary to find a compromise between the safety properties, the reliability properties and the complexity of the technical solution of the system. The task is to find a global optimum with regard to all observed properties of the IS.
References
1. EN 50129: Railway application. Safety-related electronic systems for signalling (2003)
2. Balák, J., Ždánsky, P.: Modelling of transition of system with standby redundancy into failed state. In: Procedia Engineering 12th International Scientific Conference of Young Scientists on Sustainable, Modern and Safe Transport Book Series vol. 192, pp. 10–15 (2017)
3. Rástočný, K., et al.: Quantitative assessment of safety integrity level of message transmission between safety-related equipment. Comput. Inform. 33(2), 343–368 (2014)
4. Ilavský, J., Rástočný, K.: Considerations of the recovery in 2-out-of-3 safety-related control system. In: 11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems, Brno, Czech Republic (2012)
5. Rástočný, K., Ilavský, J.: Effects of a periodic maintenance on the safety integrity level of a control system. In: FORMS/FORMAT 2010: Formal Methods for Automation and Safety in Railway and Automotive Systems, pp. 77–85 (2011). https://doi.org/10.1007/978-3-642-14261-1_8
6. Haridasan, R., Kumar, M., Marathe, P.P.: Safety analysis of 2004 coincidence logic systems. Int. J. Syst. Assur. Eng. Manage. 6(1), 26–31 (2015)
7. Borcsok, J., et al.: Estimation and evaluation of the 2004-architecture for safety related systems. Risk, reliability and societal safety, vols 1–3: vol 1: specialisation topics; vol 2: thematic topics; vol 3: applications topics, p. 361+ (2007)
Acknowledgment
This paper has been supported by the Educational Grant Agency of the Slovak Republic (KEGA) Number: 034ŽU-4/2016: Implementation of modern technologies focusing on control using the safety PLC into education.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Rástočný, K., Ždánsky, P. (2018). Comparison of Some Safety Properties of Architecture 2003 and Architecture 2 × (2002). In: Mikulski, J. (eds) Management Perspective for Transport Telematics. TST 2018. Communications in Computer and Information Science, vol 897. Springer, Cham. https://doi.org/10.1007/978-3-319-97955-7_19
DOI: https://doi.org/10.1007/978-3-319-97955-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97954-0
Online ISBN: 978-3-319-97955-7
eBook Packages: Computer Science, Computer Science (R0)