1 s2.0 S0967066123003817 Main
Keywords: Industrial alarm systems; Process monitoring; Correlated alarms; Pattern mining; Network analysis

Alarm systems are essential for the safe and efficient operation of process industries. However, complex plant connectivity and process interactions can cause many correlated alarms in practice and thus compromise alarm system performance. To address correlated alarms, alarm correlations should be discovered from historical Alarm and Event (A&E) logs, so that the results can help improve alarm configurations or design suppression strategies. Motivated by this problem, a systematic method to extract alarm correlations is proposed in this work, with the following contributions: (1) correlated alarms and their occurrence orders are captured as correlation patterns through pattern mining, and such patterns are characterized by statistical features; (2) alarm correlations and their statistical features are visualized as network graphs to indicate process interactions and identify alarms for prioritized analysis. To demonstrate the effectiveness of the proposed method, case studies are provided using the Vinyl Acetate Monomer (VAM) plant model, an industrial simulation benchmark.
✩ This work was supported by the Natural Sciences and Engineering Research Council of Canada. A preliminary version of this work was presented as H.R.M. Rao, B. Zhou, T. Chen, and S.L. Shah, “Discovery of alarm correlations based on pattern mining and network analysis,” in 2022 American Control Conference (ACC), Atlanta, GA, USA, 2022. The additional contributions are summarized in Section 1.
∗ Corresponding author.
E-mail addresses: mohanrao@ualberta.ca (H.R.M. Rao), bzhou@ualberta.ca (B. Zhou), kevin.brown@mcloudcorp.com (K. Brown), tchen@ualberta.ca
(T. Chen), sirish.shah@ualberta.ca (S.L. Shah).
https://doi.org/10.1016/j.conengprac.2023.105812
Received 19 June 2023; Received in revised form 27 September 2023; Accepted 28 November 2023
Available online 6 December 2023
0967-0661/© 2023 Elsevier Ltd. All rights reserved.
H.R.M. Rao et al. Control Engineering Practice 143 (2024) 105812
Fig. 3. Visualizations of correlation types. The vertical axis represents two temporal records, distinguished by arrows in blue and green, respectively. The horizontal axis gives the time instants of the temporal records, where the first (second) record has start and end time instants $t_s$ and $t_e$ ($t'_s$ and $t'_e$), respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
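The correlation types visualized in Fig. 3 are determined from the start and end instants of two temporal records. The following Python sketch classifies an ordered pair of records. It assumes rules in the spirit of Allen's interval relations for the five types that appear in this section (before 𝐁, overlap 𝐎, equal 𝐄, start 𝐒, during 𝐃); the paper's exact boundary conditions (e.g., for touching intervals or further types) may differ. For 𝐃, the contained record is listed first, as in $\phi_2$ of Table 2. The names `TemporalRecord` and `correlation_type` are hypothetical.

```python
from typing import NamedTuple, Optional


class TemporalRecord(NamedTuple):
    """One alarm occurrence: alarm tag with start and end time instants."""
    alarm: str
    start: float
    end: float


def correlation_type(r: TemporalRecord, q: TemporalRecord) -> Optional[str]:
    """Return the assumed correlation type of r -> q, or None if none applies.

    Assumed Allen-style rules (not the paper's exact definitions):
    B: r ends before q starts; E: identical start and end;
    S: same start, r ends first; D: r lies strictly inside q;
    O: r starts first and q starts before r ends, with q ending last.
    """
    if r.end < q.start:
        return "B"
    if r.start == q.start and r.end == q.end:
        return "E"
    if r.start == q.start and r.end < q.end:
        return "S"
    if q.start < r.start and r.end < q.end:
        return "D"
    if r.start < q.start < r.end < q.end:
        return "O"
    return None


# Temporal records 1-5 of Table 1, keyed by record number.
records = {
    1: TemporalRecord("a1", 3, 5),
    2: TemporalRecord("a2", 7, 11),
    3: TemporalRecord("a3", 9, 13),
    4: TemporalRecord("a5", 17, 22),
    5: TemporalRecord("a4", 18, 20),
}

for first, second in [(1, 2), (2, 3), (5, 4)]:
    print(first, second, correlation_type(records[first], records[second]))
```

Under these assumed rules, the pairs (1, 2), (2, 3), and (5, 4) receive the types 𝐁, 𝐎, and 𝐃, which agree with the correlation instances $\phi_1$ and $\phi_2$ listed in Table 2.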
extended to capture more correlation instances, and such an extension is represented by

$$\phi = \mathcal{T}_1 \xrightarrow{\gamma_1} \mathcal{T}_2 \xrightarrow{\gamma_2} \mathcal{T}_3 \cdots \xrightarrow{\gamma_{|\phi|-1}} \mathcal{T}_{|\phi|}, \tag{4}$$

where $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{|\phi|}$ are the captured temporal records, $\gamma_1, \gamma_2, \ldots, \gamma_{|\phi|-1}$ are their corresponding correlation types, and $|\phi|$ is the number of temporal records in $\phi$. To avoid pattern redundancy caused by the correlation type equal ($\mathbf{E}$), a unification strategy is adopted such that the correlation instances $\mathcal{T} \xrightarrow{\mathbf{E}} \mathcal{T}'$ and $\mathcal{T}' \xrightarrow{\mathbf{E}} \mathcal{T}$ are considered identical during correlation instance extension. The significance of this unification strategy is illustrated by the example in Remark 2.

Remark 2. In the preliminary version of this work (Rao et al., 2022), notable redundancy was observed in the discovered correlation patterns: certain patterns were treated as distinct even though they differed only in a few mismatched alarms or in alarm occurrence order. Such redundancy becomes more prevalent as the pattern length grows. To mitigate this issue, similar patterns are consolidated by the unification strategy in this paper, so that they can be represented by a common general pattern. For instance, the four correlation patterns $a_1 \xrightarrow{\mathbf{B}} a_2 \xrightarrow{\mathbf{O}} a_4$, $a_1 \xrightarrow{\mathbf{B}} a_3 \xrightarrow{\mathbf{O}} a_4$, $a_1 \xrightarrow{\mathbf{B}} a_2 \xrightarrow{\mathbf{E}} a_3 \xrightarrow{\mathbf{O}} a_4$, and $a_1 \xrightarrow{\mathbf{B}} a_3 \xrightarrow{\mathbf{E}} a_2 \xrightarrow{\mathbf{O}} a_4$ were considered distinct patterns in Rao et al. (2022), but the unification strategy now consolidates them into the general pattern $a_1 \xrightarrow{\mathbf{B}} a_2 \xrightarrow{\mathbf{E}} a_3 \xrightarrow{\mathbf{O}} a_4$.

Thereafter, correlation instances are recursively extended until they cannot be extended further. This calculation follows the approach in Dorgo and Abonyi (2018) and Kong et al. (2010), with modifications to distinguish alarm correlation types and to incorporate the pattern unification strategy. Finally, the obtained correlation instances are collected into the set

$$\Phi = \{\phi_1, \phi_2, \ldots, \phi_{|\Phi|}\}, \tag{5}$$

where $\phi_i \in \Phi$ is the $i$th correlation instance, $i = 1, 2, \ldots, |\Phi|$.

To represent alarm correlations generally as patterns, the exact time instants in the temporal records are discarded, because this information is already conveyed by the correlation types. As a result, $\phi$ is extracted as the correlation pattern

$$\Psi = a_1 \xrightarrow{\gamma_1} a_2 \xrightarrow{\gamma_2} a_3 \cdots \xrightarrow{\gamma_{|\Psi|-1}} a_{|\Psi|}, \tag{6}$$

where $a_1, a_2, \ldots, a_{|\Psi|}$ are the alarms in $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{|\phi|}$. Clearly, $a_i \xrightarrow{\gamma} a_j$ and $\mathcal{T}_p \xrightarrow{\gamma} \mathcal{T}_q$ represent the same correlation type $\gamma$ when $i = p$ and $j = q$. This pattern representation is strict on both the captured alarms and the correlation types: for example, correlation instances comprising identical alarms but different correlation types cannot be represented by a common correlation pattern, because their correlation types do not match. Moreover, as correlation instances are recursively extended by (4), the obtained correlation patterns also grow gradually in length. To represent such pattern growth, the notation

$$\Psi = \check{\Psi} \uplus a \tag{7}$$

is introduced, where $\uplus$ indicates that $\Psi$ is obtained by attaching alarm $a$ to the end of a previous pattern $\check{\Psi}$. Next, statistical metrics are calculated to evaluate the obtained correlation patterns and prioritize analysis.

3.2. Pattern statistical features evaluation

The extracted correlation patterns are evaluated with three statistical metrics, namely support, confidence, and lift, calculated according to Definitions 1, 2, and 3 (below), respectively.

Definition 1. The support of $\Psi$ measures the pattern occurrence frequency by counting the number of correlation instances that can be represented by this pattern, namely,

$$f_\lambda(\Psi) = \left|\{\phi \mid \phi \succ \Psi,\ \phi \in \Phi\}\right|, \tag{8}$$

where $f_\lambda(\cdot)$ calculates the support of $\Psi$, and $\phi \succ \Psi$ denotes that the correlation instance $\phi$ can be represented by the pattern $\Psi$. Note that similar correlation instances can occur multiple times in a process owing to recurrences of similar faults and abnormalities. Such recurrences are captured in the A&E log and quantified through the support value.

Definition 2. The confidence of $\Psi$ evaluates pattern reliability based on its relative occurrence frequency (Singh et al., 2011) as

$$f_\theta(\Psi) = \begin{cases} f_\theta(\check{\Psi}) \times f_\lambda(\Psi) / f_\lambda(\check{\Psi}), & \text{if } \gamma_{|\Psi|-1} \in \Gamma \setminus \{\mathbf{D}\}, \\ f_\lambda(\Psi) / f_\lambda(a), & \text{if } \gamma_{|\Psi|-1} \in \{\mathbf{D}\}, \end{cases} \tag{9}$$

where $f_\theta(\cdot)$ determines the confidence of $\Psi$, $\Psi = \check{\Psi} \uplus a$, and the operator $\setminus$ denotes set exclusion. The confidence value measures the conditional probability that alarm $a$ occurs with correlation pattern
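Table 1
Temporal database for the numerical example.

| Temporal record | Alarm | Start time | End time |
|---|---|---|---|
| 1 | $a_1$ | 3 | 5 |
| 2 | $a_2$ | 7 | 11 |
| 3 | $a_3$ | 9 | 13 |
| 4 | $a_5$ | 17 | 22 |
| 5 | $a_4$ | 18 | 20 |
| 6 | $a_6$ | 25 | 28 |
| 7 | $a_1$ | 33 | 35 |
| 8 | $a_2$ | 37 | 42 |
| 9 | $a_3$ | 40 | 45 |
| 10 | $a_5$ | 50 | 58 |
| 11 | $a_4$ | 52 | 54 |
| 12 | $a_5$ | 62 | 67 |
| 13 | $a_4$ | 63 | 65 |
| 14 | $a_7$ | 72 | 75 |
| 15 | $a_5$ | 80 | 82 |
| 16 | $a_8$ | 90 | 100 |

Table 2
Captured correlation instances, patterns, and statistical features.

| Correlation instances | Correlation pattern | Support | Confidence | Lift |
|---|---|---|---|---|
| $\phi_1 = \mathcal{T}_1 \xrightarrow{\mathbf{B}} \mathcal{T}_2 \xrightarrow{\mathbf{O}} \mathcal{T}_3$; $\phi_3 = \mathcal{T}_7 \xrightarrow{\mathbf{B}} \mathcal{T}_8 \xrightarrow{\mathbf{O}} \mathcal{T}_9$ | $\Psi_1 = a_1 \xrightarrow{\mathbf{B}} a_2 \xrightarrow{\mathbf{O}} a_3$ | 2 | 1 | 2 |
| $\phi_2 = \mathcal{T}_5 \xrightarrow{\mathbf{D}} \mathcal{T}_4$; $\phi_4 = \mathcal{T}_{11} \xrightarrow{\mathbf{D}} \mathcal{T}_{10}$; $\phi_5 = \mathcal{T}_{13} \xrightarrow{\mathbf{D}} \mathcal{T}_{12}$ | $\Psi_2 = a_5 \xrightarrow{\mathbf{D}} a_4$ | 3 | 0.75 | 1 |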
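The support and confidence values in Table 2 can be checked with a short Python sketch of (6), (8), and (9). It rests on assumptions that this excerpt does not fix: an instance is taken to be represented by a pattern ($\phi \succ \Psi$) when the pattern matches the leading alarms and types of the instance; the support of a single alarm is its number of temporal records; and the confidence of a single alarm is 1. Alarms are kept in the order in which their records appear in each instance, so $\phi_2$, $\phi_4$, and $\phi_5$ (each listing the $a_4$ record first) map to the key `('a4', 'D', 'a5')`. Lift is omitted because Definition 3 is not reproduced here. All names (`to_pattern`, `support`, `confidence`) are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# A pattern is a flat tuple: (alarm_1, type_1, alarm_2, ..., alarm_n).
Pattern = Tuple[str, ...]

# Table 1: temporal record number -> alarm tag (time instants not needed here).
ALARM_OF = {1: "a1", 2: "a2", 3: "a3", 4: "a5", 5: "a4", 6: "a6", 7: "a1", 8: "a2",
            9: "a3", 10: "a5", 11: "a4", 12: "a5", 13: "a4", 14: "a7", 15: "a5", 16: "a8"}

# Table 2: each instance as (record numbers, correlation types between them).
INSTANCES = [
    ([1, 2, 3], ["B", "O"]),  # phi_1
    ([5, 4], ["D"]),          # phi_2
    ([7, 8, 9], ["B", "O"]),  # phi_3
    ([11, 10], ["D"]),        # phi_4
    ([13, 12], ["D"]),        # phi_5
]


def to_pattern(records: List[int], types: List[str]) -> Pattern:
    """Eq. (6): drop the time instants by replacing each record with its alarm."""
    out: List[str] = [ALARM_OF[records[0]]]
    for rec, gamma in zip(records[1:], types):
        out += [gamma, ALARM_OF[rec]]
    return tuple(out)


PATTERNS = [to_pattern(recs, types) for recs, types in INSTANCES]
# Assumed support of a single alarm: its number of temporal records.
ALARM_COUNT = Counter(ALARM_OF.values())


def support(psi: Pattern) -> int:
    """Eq. (8), assuming an instance is represented by psi if psi is its prefix."""
    if len(psi) == 1:
        return ALARM_COUNT[psi[0]]
    return sum(1 for p in PATTERNS if p[:len(psi)] == psi)


def confidence(psi: Pattern) -> float:
    """Eq. (9), with psi = prev (+) a and the assumed base case f_theta(a) = 1."""
    if len(psi) == 1:
        return 1.0
    prev, gamma, last = psi[:-2], psi[-2], psi[-1]
    if gamma == "D":
        # D case: divide by the support of the attached alarm a.
        return support(psi) / ALARM_COUNT[last]
    return confidence(prev) * support(psi) / support(prev)


for psi in sorted(set(PATTERNS)):
    print(psi, support(psi), confidence(psi))
```

Under these assumptions the sketch reports support 2 and confidence 1 for $\Psi_1$, and support 3 and confidence 3/4 = 0.75 for the 𝐃-type pattern, because alarm $a_5$ has four temporal records in Table 1 (records 4, 10, 12, and 15). These agree with the values listed in Table 2.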
asymmetric due to ), it gives 𝑤3,2 = ∅ and the value of 𝑤2,3 is shown
in Fig. 5b, where alarms 𝑎2 and 𝑎3 have the correlation types equal and start; their corresponding statistical values (namely, support 𝜆 and confidence 𝜃) are therefore recorded in columns 𝐄 and 𝐒, whereas the values in the remaining columns are set to 0 because those correlation types were not discovered.
4.2.3. Identification of highly interacting alarms

As a practical application, correlation networks can be used to prioritize alarm analysis by identifying highly interacting alarms. Specifically, highly interacting alarms are defined as alarms that are configured with higher priorities and are captured by correlation patterns with higher statistical metric values (namely, support and confidence). Highly interacting alarms are therefore characterized by frequent occurrences and by association with severe abnormal situations, and they require immediate attention in alarm management. To help identify them, an index called correlation centrality is introduced to rank alarms in the correlation network. The correlation centrality is calculated by

$$f_\tau(a_i) = (\nu_i)^{\beta} \cdot (k_i)^{1-\beta} \cdot (1/\rho_i), \tag{14}$$

where the function $f_\tau(\cdot)$ calculates the correlation centrality of alarm $a_i$, and $k_i = \sum_{j}^{N} \theta_{i,j,\gamma}$ represents the weighted degree of $a_i$ as described in (13). $N$ is the number of captured alarm correlations, namely, the total number of edges in the network. Here, $\nu_i$ is the degree of $a_i$, measuring the number of alarm correlations containing $a_i$, and it is calculated by

$$\nu_i = \sum_{j}^{N} v_{i,j}, \tag{15}$$

where $v_{i,j}$ is the element of the adjacency matrix in the $i$th row and $j$th column. The user-specified parameter $\beta$ adjusts the relative weight between $\nu_i$ and $k_i$, where the former (latter) evaluates the number of correlations (the strength of correlations, quantified by confidence) of alarm $a_i$. By default, $\beta = 0.5$, so that both factors are given equal importance in identifying highly interacting alarms. In addition, the alarm priority configuration is taken into account through the scaling factor $1/\rho_i$. Here, $\rho_i$ is the numerical denomination of the configured alarm priorities on a linear scale, taking integer values from 1 to the number of configured priority levels, such that $\rho_i = 1$ when alarm $a_i$ is configured with the highest priority and $\rho_i$ equals the number of priority levels when it is configured with the lowest priority. The value of $\rho_i$ could also be assigned with further modifications to accommodate other factors (e.g., priority distribution and relative occurrence frequency), giving more flexibility to prioritize alarm analysis according to specific requirements.

Fig. 6. Systematic procedure for the analysis of alarm correlation networks.

In summary, correlated alarms and their occurrence orders are captured as correlation patterns, which are characterized by statistical features and correlation types. Furthermore, the alarm correlations and the statistical features of correlated alarms are visualized as network graphs, representing the captured information in a compact form. This succinct representation makes it easier to understand the interactions between the patterns and within the process. Although existing methods in the literature extract alarm patterns (Jacobs & Dagnino, 2016; Wang et al., 2017, 2016; Zhu et al., 2021), none of them considers correlation types when determining correlation patterns or uses a statistical framework to quantify alarm correlation. For clarity, the major steps involved in constructing and analyzing alarm correlation networks are illustrated as a flowchart in Fig. 6, where the input is the set of valid correlation patterns obtained in Section 3.

Remark 5. Correlation centrality is introduced to efficiently identify highly interacting alarms, which require immediate attention in alarm management. The metric incorporates both pertinent statistical features of correlation patterns and alarm configuration attributes to give a systematic ranking of alarms for prioritized analysis. Correlation centrality therefore offers an effective metric for prioritizing alarm rationalization, with an advantage over the conventional top-bad-actors approach (Hollifield & Habibi, 2011), which ranks alarms primarily by their occurrence frequencies.

5. Case study

In this section, case studies are presented to demonstrate the effectiveness of the proposed method based on simulated data generated from an industrial benchmark model.
Table 3
Details of the process simulation using the VAM plant model (Machida et al., 2016).

| Start time | End time | Fault description | Type of fault | Fault magnitude |
|---|---|---|---|---|
| 00 h 00 min | 05 h 00 min | Steady State Operation | N/A | N/A |
| 05 h 00 min | 05 h 20 min | Malfunction-21: Fail Absorber Bottom Valve | Process Failure | Binary Malfunction |
| 14 h 20 min | 14 h 40 min | Malfunction-29: Column Differential Pressure | Process Failure | 150% |
| 23 h 40 min | 24 h 00 min | Malfunction-25: Reactor-In Temperature Indicator | Process Failure | 10 °C |
| 33 h 00 min | 33 h 20 min | Malfunction-26: Dirty Vaporizer Level Indicator | Process Failure | 30% |
| 42 h 20 min | 42 h 40 min | Malfunction-13: Change 15K Steam Temperature | Disturbance | −20 °C |
| 51 h 40 min | 52 h 00 min | Malfunction-07: Decrease Vaporizer Heat-Transfer | Disturbance | 50% |
| 61 h 00 min | 61 h 20 min | Malfunction-23: Gas Feed Pressure Indicator Trouble | Process Failure | 2 MPa |
| 70 h 20 min | 70 h 40 min | Malfunction-21: Fail Absorber Bottom Valve | Process Failure | Binary Malfunction |
| 80 h 00 min | 100 h 00 min | Malfunction-20: Steady State Operation | N/A | N/A |
Table 6
Description of alarms captured in correlation network cluster-3.ᵃ

| Alarm $a$ | Description of alarm $a$ | Alarm $a'$ | Description of alarm $a'$ | Support | Confidence | Correlation type |
|---|---|---|---|---|---|---|
| PC130.PV.LO | C₂H₄ Feed Pressure Low | FI102.PV.LO | Recycle Gas Flow Rate Low | 54 | 0.568 | 𝐁 |
| FI102.PV.LO | Recycle Gas Flow Rate Low | FC170.PV.HI | O₂ Feed Flow Rate High | 15 | 0.577 | 𝐁 |
| FI102.PV.LO | Recycle Gas Flow Rate Low | FC101.PV.HI | C₂H₄ Feed Flow Rate High | 15 | 0.577 | 𝐁 |
| FC170.PV.HI | O₂ Feed Flow Rate High | FC101.PV.HI | C₂H₄ Feed Flow Rate High | 14 | 0.538 | 𝐁 |

ᵃ Here, alarms $a$ and $a'$ form the correlation pattern $a \xrightarrow{\gamma} a'$, and their correlation type is $\gamma$.
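To make (14) and (15) concrete, the following Python sketch applies them to the four correlations of cluster-3 listed in Table 6. It assumes that each listed correlation counts toward both of its alarms when computing $\nu_i$ and $k_i$, and it uses the confidence values as $\theta_{i,j,\gamma}$ with the default $\beta = 0.5$. Because the configured priorities of these alarms are not given in this excerpt, every $\rho_i$ is set to 1 as a hypothetical placeholder. The function name `correlation_centrality` is illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Correlations of cluster-3 from Table 6: (alarm a, alarm a', confidence).
EDGES: List[Tuple[str, str, float]] = [
    ("PC130.PV.LO", "FI102.PV.LO", 0.568),
    ("FI102.PV.LO", "FC170.PV.HI", 0.577),
    ("FI102.PV.LO", "FC101.PV.HI", 0.577),
    ("FC170.PV.HI", "FC101.PV.HI", 0.538),
]


def correlation_centrality(edges: List[Tuple[str, str, float]],
                           rho: Dict[str, int],
                           beta: float = 0.5) -> Dict[str, float]:
    """Eq. (14): nu_i**beta * k_i**(1 - beta) / rho_i for every alarm.

    Assumes each correlation counts toward both of its alarms, so nu_i is the
    number of correlations containing a_i (Eq. (15)) and k_i is the sum of
    their confidences (the weighted degree).
    """
    nu: Dict[str, int] = defaultdict(int)
    k: Dict[str, float] = defaultdict(float)
    for a, a_prime, theta in edges:
        for alarm in (a, a_prime):
            nu[alarm] += 1
            k[alarm] += theta
    return {alarm: nu[alarm] ** beta * k[alarm] ** (1 - beta) / rho[alarm]
            for alarm in nu}


# Hypothetical priorities: every alarm treated as highest priority (rho_i = 1).
rho = {alarm: 1 for a, a_prime, _ in EDGES for alarm in (a, a_prime)}
scores = correlation_centrality(EDGES, rho)
for tag, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag:12s} {score:.3f}")
```

Under these assumptions FI102.PV.LO, which participates in three of the four correlations, ranks highest. With actual priority settings, the $1/\rho_i$ factor would lower the scores of alarms configured with lower priorities.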
Singh, A., Chaudhary, M., Rana, A., & Dubey, G. (2011). Online mining of data to generate association rule mining in large databases. In International conference on recent trends in information systems (pp. 126–131).
Song, X., Liu, Q., Dong, M., Meng, Y., Qin, C., Zhao, D., Yin, F., & Jiu, J. (2022). Chemical process alarm root cause diagnosis method based on the combination of data-knowledge-driven method and time retrospective reasoning. ACS Omega, 7(24), 20886–20905.
Wang, J., & Chen, T. (2016). Main causes of long-standing alarms and their removal by dynamic state-based alarm systems. Journal of Loss Prevention in the Process Industries, 43, 106–119.
Wang, J., He, C., Liu, Y., Tian, G., Peng, I., Xing, J., Ruan, X., Xie, H., & Wang, F. L. (2017). Efficient alarm behavior analytics for telecom networks. Information Sciences, 402, 1–14.
Wang, J., Li, H., Huang, J., & Su, C. (2015). A data similarity based analysis to consequential alarms of industrial processes. Journal of Loss Prevention in the Process Industries, 35, 29–34.
Wang, J., Li, H., Huang, J., & Su, C. (2016). Association rules mining based analysis of consequential alarm sequences in chemical processes. Journal of Loss Prevention in the Process Industries, 41, 178–185.
Yang, G., Hu, W., Cao, W., & Wu, M. (2020). Simulating industrial alarm systems by extending the public model of a Vinyl Acetate Monomer process. In Chinese control conference (pp. 6093–6098).
Yang, F., Shah, S., Xiao, D., & Chen, T. (2012). Improved correlation analysis and visualization of industrial alarm data. ISA Transactions, 51(4), 499–506.
Yang, Z., Wang, J., & Chen, T. (2013). Detection of correlated alarms based on similarity coefficients of binary data. IEEE Transactions on Automation Science and Engineering, 10(4), 1014–1025.
Yang, B., Wang, H., Li, H., & He, Y. (2020). A novel detection of correlated alarms with delays based on improved block matching similarities. ISA Transactions, 98, 393–402.
Zhang, L., Chen, G., Brijs, T., & Zhang, X. (2008). Discovering during-temporal patterns (DTPs) in large temporal databases. Expert Systems with Applications, 34(2), 1178–1189.
Zhu, Q., Jin, C., He, Y., & Xu, Y. (2021). Pattern mining of alarm flood sequences using an improved prefixspan algorithm with tolerance to short-term order ambiguity. Industrial and Engineering Chemistry Research, 60(11), 4375–4384.