
Control Engineering Practice 143 (2024) 105812

Contents lists available at ScienceDirect

Control Engineering Practice


journal homepage: www.elsevier.com/locate/conengprac

Alarm correlation analysis with applications to industrial alarm management✩
Harikrishna Rao Mohan Rao a,∗, Boyuan Zhou a, Kevin Brown b, Tongwen Chen a, Sirish L. Shah c
a Department of Electrical & Computer Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, Canada
b mCloud Technologies Corp., Edmonton, T6E 5Z9, Alberta, Canada
c Department of Chemical & Materials Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, Canada

ARTICLE INFO

Keywords: Industrial alarm systems; Process monitoring; Correlated alarms; Pattern mining; Network analysis

ABSTRACT

Alarm systems are essential for the safe and efficient operation of process industries. However, complex plant connectivity and process interactions could cause many correlated alarms in practice and thus compromise alarm system performance. To address correlated alarms, it is desired that alarm correlations are discovered from historical Alarm and Event (A&E) logs, so the obtained results could help improve alarm configurations or design suppression strategies. Motivated by this problem, a systematic method to extract alarm correlation is proposed in this work and the contributions are: (1) Correlated alarms and their occurrence orders are captured as correlation patterns through pattern mining, and such patterns are characterized by statistical features. (2) Alarm correlations and their statistical features are visualized as network graphs to indicate process interactions and identify alarms for prioritized analysis. To demonstrate the effectiveness of the proposed method, case studies are provided using an industrial simulation benchmark Vinyl Acetate Monomer (VAM) plant model.

✩ This work was supported by the Natural Sciences and Engineering Research Council of Canada. A preliminary version of this work was presented as H.R.M. Rao, B. Zhou, T. Chen, and S.L. Shah, "Discovery of alarm correlations based on pattern mining and network analysis," in 2022 American Control Conference (ACC), Atlanta, GA, USA, 2022. The additional contributions are summarized in Section 1.
∗ Corresponding author.
E-mail addresses: mohanrao@ualberta.ca (H.R.M. Rao), bzhou@ualberta.ca (B. Zhou), kevin.brown@mcloudcorp.com (K. Brown), tchen@ualberta.ca (T. Chen), sirish.shah@ualberta.ca (S.L. Shah).

https://doi.org/10.1016/j.conengprac.2023.105812
Received 19 June 2023; Received in revised form 27 September 2023; Accepted 28 November 2023
Available online 6 December 2023
0967-0661/© 2023 Elsevier Ltd. All rights reserved.

1. Introduction

Alarm systems are integral parts of process monitoring for industrial facilities, ensuring the safe and efficient operation of plants by alerting operators about abnormal situations (EEMUA-191, 2013). Well-performed alarm systems could help maintain processes in their desired operating conditions by accurately alerting abnormalities for operators to take remedial actions (ANSI/ISA-18.2, 2016). As a result, alarm systems serve as the first line of defense in safeguarding plants, personnel, and property. However, the advent of digitalization has introduced sophisticated alarm configurations causing many alarm management problems in practice, and even leading to catastrophic disasters, such as Three Mile Island (1979), Chernobyl (1986), Phillips-66 Complex (1989), Milford Haven Refinery (1994), and BP Texas (2005) (EEMUA-191, 2013; Hollifield & Habibi, 2011). In addition, due to complex plant connectivity and interacting process variables, abnormalities occurring at a particular location could propagate to other parts of the plant, triggering correlated secondary alarms. By definition, correlated alarms are alarms that often occur together or within a short time duration (Rothenberg, 2009). Therefore, such correlated alarms are more likely to be associated with a common abnormality.

To address correlated alarms, many methods have been proposed in literature, where the discovery and analysis of alarm correlations were commonly used to help with process monitoring and alarm rationalization/reconfiguration (Bergquist et al., 2003; Salah et al., 2013). Such methods could be categorized into two groups based on their objectives, namely, correlation analysis and pattern discovery. For the analysis of correlated alarms, various metrics were utilized to quantify alarm correlation using covariance measures, such as Sorgenfrei coefficient (Yang et al., 2013) and Pearson's correlation coefficient (Hu et al., 2015). To give improved evaluations of correlated alarms, occurrence delays were further considered using block matching (Yang, Wang, et al., 2020). For the discovery of correlation patterns, many data mining techniques were adapted, such as weighted fuzzy association rule mining (Wang et al., 2016), graph-based analysis (Jacobs & Dagnino, 2016), and neural networks (Li & Li, 2010). In addition, PrefixSpan was adapted to identify alarm correlations in telecommunication networks (Wang et al., 2017), and such an algorithm was further improved to tolerate alarm order ambiguity (Zhu et al., 2021). Thereafter, to help with alarm rationalization and further analysis in practice, the discovered alarm correlations were visualized as color maps (Yang et al., 2012). In addition, alarm grouping (Higuchi et al., 2009) and suppression (Dorgo & Abonyi, 2018) strategies were adapted to reduce alarm rates. To identify the path of abnormality propagation triggering correlated alarms, causal relationships were evaluated using Granger causality (Wang et al., 2015) and alarm causal networks (Song et al., 2022).
Despite the methods in literature, there still exist some open problems for alarm correlation analysis. Specifically, alarm occurrence orders are commonly used by operators to diagnose process malfunctions. However, since the captured alarm correlation is only represented by a number (namely, correlation coefficient), alarm correlation determined by covariance metrics cannot capture more complicated information, such as occurrence order indicating alarm directionality, thus losing the potential indications for abnormality propagation paths. Moreover, alarm occurrence orders could reveal different types of faults and operation modes that are pertinent to alarm management. However, extracting such sequential information from alarm time stamps is non-trivial, especially for large-scale industrial facilities associated with many alarms. Therefore, it is desired that the sequential order of correlated alarms be captured along with correlation measurement. In addition, to prioritize correlated alarms during their analysis and minimize redundant information presented to operators, the captured alarm correlations should be given in a compact format to facilitate accurate operator responses.

Moreover, the information about process interactions is desired to help with further analysis and determining the paths of abnormality propagation, but such information about process interactions could change due to various reasons, such as design modifications, periodic plant maintenance/repairs, and operator interventions (Mah, 2013). Consequently, capturing and maintaining precise records of process interaction is resource-intensive and time-consuming, especially for industrial facilities with complex structures. If process interactions could be precisely captured and timely updated with plant operations, the analysis of correlated alarms could be benefited by having information to indicate abnormality propagation paths.

Motivated by the problems mentioned above, a systematic method to capture alarm correlations and discover process interactions from historical Alarm & Event (A&E) logs is proposed in this work, and the contributions are:

1. Correlated alarms and their occurrence orders are captured as correlation patterns by pattern mining and such patterns are characterized by statistical features.
2. Alarm correlations and the statistical features of correlated alarms are visualized as network graphs to indicate process interactions and help prioritize alarm analysis.

Some preliminary results of this work were presented in Rao et al. (2022), and the work is further enriched in this paper as follows: (1) A pattern unification strategy is incorporated to avoid redundant results in correlation pattern extraction. (2) An additional statistical metric, namely, lift, is introduced to help effectively identify meaningful correlation patterns. (3) A partitioning algorithm based on modularity is introduced to help generate optimal partitions of correlation networks for indicating process interactions. (4) A correlation centrality metric is utilized to help prioritize alarm analysis by capturing highly interacting alarms. The details of these contributions are further clarified in Remarks 2, 3, 4, and 5, respectively.

Compared with existing methods in literature, the proposed method differs in four aspects: (1) Alarm correlations are discovered along with directionality, which is represented by the time instants of alarm occurrence and clearance. (2) Instead of using process data to determine process interactions, only alarm data is required in this work, making it more suitable for direct applications to industrial alarm systems (The reasons are further explained in Remark 1). (3) Network graphs are visualized with statistical features to give a concise representation of alarm correlations and captured process connectivity. (4) To assist process experts in analyzing results effectively, a centrality metric is introduced to prioritize the analysis of alarms captured by the correlation network.

Fig. 1. Applications of the proposed method to industrial alarm management with respect to alarm configuration and alarm monitoring.

Remark 1. The proposed method only utilizes alarm data for three main reasons: (1) The primary objective of alarm management is to efficiently reduce alarm occurrences (without loss of information) to a manageable level for operators. This involves bolstering the reliability and acceptance of the proposed approach within the industrial practitioners by harnessing the same decision-making information sources as operators. (2) Alarm data accurately preserves historical alarm situations and is easily accessible. In contrast, process data may not be able to reveal all historical situations, due to data compression and down-sampling that are commonly used to curtail storage expenses. (3) Some state-based alarms are not associated with process data. For example, communication-related alarms are triggered based on the status of the communication system, rather than any process measurement that generates process data.

For practical applications, this work could help industrial alarm management achieve better performance, where the performance indicators identified in EEMUA-191 (2013) and Rothenberg (2009) are precision, necessity, uniqueness, direction, and timeliness. Such indicators evaluate alarm systems through the entire life cycle of alarm management, namely, alarm configuration and alarm monitoring. Fig. 1 summarizes the applications of the proposed method to industrial alarm management to achieve improved performance. In the alarm configuration stage, the captured correlation patterns could be utilized to design alarm reduction strategies, such as alarm grouping and suppression, thus achieving improved alarm configuration based on requirements for precision, necessity, and uniqueness. In the alarm monitoring stage, the visualized network is enhanced with alarm statistical features to help operators discover process interactions and prioritize their actions and thus giving improved alarm monitoring in terms of direction and timeliness requirements.

The rest of the paper is organized as follows. Section 2 presents some preliminary concepts for alarm correlation analysis. The discovery of correlation patterns is provided in Section 3, and the analysis of correlation networks is presented in Section 4. The effectiveness of the proposed method is demonstrated using case studies in Section 5, followed by conclusions in Section 6.

2. Preliminaries of correlation analysis

2.1. Alarm data and temporal data

For better understanding, some preliminary concepts associated with alarm correlation analysis are provided here. The discovery of alarm correlations is carried out based on the historical Alarm & Event (A&E) log, which is a database comprised of chronologically ordered


records of alarm instances. An alarm record is defined as a tuple $\mathcal{R} = (a, s, t, p)$, where $a \in \mathcal{A}$ is the unique identification of an alarm and $s \in \mathcal{S}$ is alarm state indicating the status of alarm $a$ at time instant $t \in \mathcal{T}$. Here, $\mathcal{A}$ is the finite set of configured alarms, and $\mathcal{T}$ is the period of the collected A&E log. Typically, $\mathcal{S} = \{0, 1\}$, where 1 (0) stands for the alarm occurrence (clearance). Each alarm $a$ is associated with a priority $p \in \mathcal{P}$ to indicate its severity and $\mathcal{P}$ is the set of all priorities configured in the alarm system. For example, $\mathcal{P} = \{\text{Critical}, \text{High}, \text{Low}\}$ and Critical (Low) denotes the most (least) severe alarm priority. Then, the A&E log could be represented as

$$\mathbb{D} = \langle \mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_{|\mathbb{D}|} \rangle, \tag{1}$$

where $\mathcal{R}_i$ is the $i$th alarm record and $i = 1, 2, \ldots, |\mathbb{D}|$. The operator $\langle \cdot \rangle$ implies a chronologically organized sequence and $|\cdot|$ denotes the size of a sequence or set.

To effectively capture the occurrence orders of correlated alarms, the A&E log is transformed into a temporal database. For each alarm $a \in \mathcal{A}$ recorded in $\mathbb{D}$, its historical occurrence and clearance are paired based on alarm state to generate a temporal record $\mathcal{I} = (a, t_s, t_e)$, where $t_s$ ($t_e$) is the occurrence (clearance) time instant of the paired alarm records and it is required that $t_s \le t_e$ and $\nexists\, t'_e$ of $a$ such that $t_s \le t'_e \le t_e$. Thereafter, temporal records are exhaustively extracted from $\mathbb{D}$ for all alarms, and a temporal database is obtained as

$$\mathbb{E} = \langle \mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_{|\mathbb{E}|} \rangle, \tag{2}$$

where $\mathcal{I}_i \in \mathbb{E}$ is the $i$th temporal record, $i = 1, 2, \ldots, |\mathbb{E}|$.

For better illustration, Fig. 2 gives an example showing the transformation of the temporal database from the A&E log, where the A&E log (temporal database) is highlighted by the dashed blue (purple) rectangle. The A&E log has four columns storing time instant, alarm, state, and priority, whereas the temporal database has three columns containing alarm, start time, and end time. Columns of the A&E log used in constructing a temporal database are highlighted in solid red rectangles. The paired alarm records in the A&E log and their corresponding temporal records in the temporal database are indicated by the dashed green and magenta rectangles, respectively. In summary, the transformation of the A&E log into a temporal database ensures that both alarm occurrence and clearance instants are captured, and thus enable the extraction of alarm correlations with sequential orders. As a result, the start and end time attributes in temporal records are utilized to distinguish alarm correlation into distinct types to indicate different alarm occurrence orders.

Fig. 2. Transformation of temporal database (dashed purple rectangle) from A&E log (dashed blue rectangle). Temporal records (dashed magenta rectangles) are generated from paired alarm records (dashed green rectangles) based on attributes in time, alarm, and state columns (solid red rectangles). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.2. Alarm correlation types

To capture correlated alarms along with their occurrence orders, alarm correlations are distinguished into seven distinct types. Following the convention in causal reasoning, these correlation types are equal, before, during, overlap, meet, start, and finish (Allen & Ferguson, 1994). Given temporal records $\mathcal{I} = (a, t_s, t_e)$ and $\mathcal{I}' = (a', t'_s, t'_e)$, where $a \ne a'$, all possible correlation types between alarms $a$ and $a'$ are given by set $\Gamma = \{\mathbf{E}, \mathbf{B}, \mathbf{D}, \mathbf{O}, \mathbf{M}, \mathbf{S}, \mathbf{F}\}$. For a better explanation, the visualizations of all correlation types and their definitions are provided in Fig. 3, where the vertical axis represents temporal records $\mathcal{I}$ ($\mathcal{I}'$), which are distinguished by arrows in blue (green). The horizontal axis gives the time instants of temporal records, where $\mathcal{I}$ ($\mathcal{I}'$) has its start and end time instants as $t_s$ and $t_e$ ($t'_s$ and $t'_e$), respectively. For instance, $\mathcal{I}$ and $\mathcal{I}'$ have their correlation type as during when $t'_s < t_s < t_e < t'_e$.

It is evident that alarm occurrence orders are preserved in correlation types because their definitions have distinct requirements on time instants. In comparison, four of the correlation types (namely, overlap, before, during, and meet) have directional features, whereas the remaining ones (namely, equal, start and finish) are directionless. Moreover, it is worth noting that $\Delta t$ is a user-specified parameter incorporated into correlation type before, where this type is considered valid if and only if $\mathcal{I}'$ occurs after $\mathcal{I}$ within time period $\Delta t$. Therefore, the value of $\Delta t$ should be selected based on the dynamics of the process under consideration, where a larger (smaller) value is recommended for processes with slow (fast) dynamics.

As for practical applications, alarm correlation types could be utilized to indicate different process operation modes, which are usually associated with changes in the control loop and material flow and thus give distinct alarm correlations. For example, alarms correlated by the type start are more likely to be triggered by a common abnormality, whereas the alarms with the correlation type equal indicate potential redundancy in alarm configuration. Therefore, such correlation-type information helps alarm rationalization by indicating potential alarm suppression (grouping) strategies.

3. Discovery of correlation patterns

In this section, the discovery of correlation patterns is presented, where the calculations include pattern extraction from temporal database and pattern statistical features evaluation.

3.1. Pattern extraction from temporal database

Correlated alarms are extracted as correlation patterns based on the temporal database, which is transformed from A&E logs in Section 2.1. The calculations to extract meaningful correlation patterns are conducted in two steps, namely, (1) deriving temporal records from the temporal database to capture potential correlation patterns; and (2) calculating statistical metrics to evaluate the extracted patterns.

Correlation patterns are extracted to discover correlated alarms and capture their occurrence orders by the correlation type. For this purpose, a correlation instance is constructed from temporal records using temporal database $\mathbb{E}$, and such correlation instance serves as the basic element of the correlation pattern. Here, denote a correlation instance by

$$\bar{\phi} = \mathcal{I} \overset{\gamma}{\Longrightarrow} \mathcal{I}', \tag{3}$$

where $\overset{\gamma}{\Longrightarrow}$ denotes that $\mathcal{I}$ and $\mathcal{I}'$ are correlated in type $\gamma \in \Gamma$, and $\Gamma$ is the set of all correlation types. A correlation instance could be further


Fig. 3. Visualizations of correlation types. The vertical axis represents temporal records $\mathcal{I}$ ($\mathcal{I}'$), which are distinguished by arrows in blue (green). The horizontal axis gives the time instants of temporal records, where $\mathcal{I}$ ($\mathcal{I}'$) has its start and end time instants as $t_s$ and $t_e$ ($t'_s$ and $t'_e$), respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
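The transformation of Section 2.1 and the correlation types of Section 2.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made for illustration only: the log is taken as a list of `(time, alarm, state, priority)` tuples, the names `TemporalRecord`, `to_temporal_database`, and `correlation_type` are hypothetical, and the types other than during and before follow the standard Allen interval definitions because their exact conditions are given only graphically in Fig. 3.

```python
from typing import NamedTuple


class TemporalRecord(NamedTuple):
    alarm: str
    start: float
    end: float


def to_temporal_database(log):
    """Pair each occurrence (state 1) with the next clearance (state 0) of the same alarm.

    `log` is assumed to be a chronologically ordered list of
    (time, alarm, state, priority) tuples; the priority is not used here.
    """
    open_since = {}  # alarm -> occurrence time still waiting for its clearance
    records = []
    for time, alarm, state, _priority in log:
        if state == 1:
            open_since.setdefault(alarm, time)  # keep the first occurrence while active
        elif alarm in open_since:
            records.append(TemporalRecord(alarm, open_since.pop(alarm), time))
    return sorted(records, key=lambda r: (r.start, r.end))


def correlation_type(x, y, delta_t):
    """Return the type in {E, B, D, O, M, S, F} relating record x to record y, else None."""
    if x.start == y.start and x.end == y.end:
        return "E"  # equal
    if x.start == y.start:
        return "S"  # start
    if x.end == y.end:
        return "F"  # finish
    if x.end == y.start:
        return "M"  # meet
    if 0 < y.start - x.end <= delta_t:
        return "B"  # before: y occurs after x within delta_t
    if y.start < x.start and x.end < y.end:
        return "D"  # during: x lies inside y
    if x.start < y.start < x.end < y.end:
        return "O"  # overlap
    return None
```

With records shaped like the first rows of the numerical example in Section 3.2, `correlation_type` yields before for an alarm active over [3, 5] followed by one over [7, 11] when `delta_t = 3`, and overlap for [7, 11] followed by [9, 13].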

extended to capture more correlation instances and such an extension is represented by

$$\phi = \mathcal{I}_1 \overset{\gamma_1}{\Longrightarrow} \mathcal{I}_2 \overset{\gamma_2}{\Longrightarrow} \mathcal{I}_3 \cdots \overset{\gamma_{|\phi|-1}}{\Longrightarrow} \mathcal{I}_{|\phi|}, \tag{4}$$

where $\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_{|\phi|}$ are the captured temporal records and $\gamma_1, \gamma_2, \ldots, \gamma_{|\phi|-1}$ represent their corresponding correlation types. $|\phi|$ gives the number of temporal records in $\phi$. It is worth mentioning that to avoid pattern redundancy due to correlation type equal, a unification strategy is adopted, such that correlation instances $\mathcal{I} \overset{\mathbf{E}}{\Longrightarrow} \mathcal{I}'$ and $\mathcal{I}' \overset{\mathbf{E}}{\Longrightarrow} \mathcal{I}$ are considered as identical during correlation instance extension. The significance of such unification strategy is further explained by the example in Remark 2.

Remark 2. In the preliminary version of the work, presented as Rao et al. (2022), a notable redundancy was observed in the discovered correlation patterns, where certain patterns were initially considered distinct, but they were different only in a few mismatched alarms or alarm occurrence orders. As the pattern length grows, such redundancy becomes more prevalent. To mitigate this issue, similar patterns are consolidated by the unification strategy in this paper. Consequently, similar correlation patterns could be commonly represented by a general pattern. For instance, the four correlation patterns $\mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_4$, $\mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_3 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_4$, $\mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{E}}{\Longrightarrow} \mathcal{I}_3 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_4$, and $\mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_3 \overset{\mathbf{E}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_4$, were considered distinct patterns in Rao et al. (2022), but they are now consolidated into a general pattern as $\mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{E}}{\Longrightarrow} \mathcal{I}_3 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_4$ by the unification strategy.

Thereafter, correlation instances are recursively extended until the obtained ones cannot be further extended; such calculation follows the approach in Dorgo and Abonyi (2018) and Kong et al. (2010), but with further modifications to distinguish alarm correlation types and incorporate the pattern unification strategy. Eventually, the obtained correlation instances are collected into a set

$$\Phi = \{\phi_1, \phi_2, \ldots, \phi_{|\Phi|}\}, \tag{5}$$

where $\phi_i \in \Phi$ is the $i$th correlation instance, $i = 1, 2, \ldots, |\Phi|$.

To give general representations of alarm correlations as patterns, the exact time instants in temporal records are discarded because such information is indicated by correlation types. As a result, $\phi$ is extracted as correlation pattern

$$\Psi = a_1 \overset{\gamma_1}{\Longrightarrow} a_2 \overset{\gamma_2}{\Longrightarrow} a_3 \cdots \overset{\gamma_{|\Psi|-1}}{\Longrightarrow} a_{|\Psi|}, \tag{6}$$

where $a_1, a_2, \ldots, a_{|\Psi|}$ are alarms in $\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_{|\phi|}$. Obviously, $a_i \overset{\gamma}{\Longrightarrow} a_j$ and $\mathcal{I}_p \overset{\gamma}{\Longrightarrow} \mathcal{I}_q$ represent identical correlation type $\gamma$, when $i = p$ and $j = q$. It is worth mentioning that such pattern representation is strict on both the captured alarms and correlation types. For example, given correlation instances comprised of identical alarms with different correlation types, such correlation instances cannot be represented by a common correlation pattern due to mismatches in correlation types.

Moreover, with the recursive extension of correlation instance by (4), the obtained correlation patterns also grow in length gradually. To represent such pattern growth, a notation is introduced as

$$\Psi = \check{\Psi} \uplus a, \tag{7}$$

where $\uplus$ indicates $\Psi$ is obtained by attaching alarm $a$ to the end of a previous pattern $\check{\Psi}$. Next, some statistical metrics are calculated to help evaluate the obtained correlation patterns and prioritize analysis.

3.2. Pattern statistical features evaluation

The extracted correlation patterns are evaluated with three statistical metrics, namely, support, confidence, and lift, where their calculations are performed based on Definitions 1, 2, and 3 (below), respectively.

Definition 1. The support of $\Psi$ measures the pattern occurrence frequency by counting the number of correlation instances that can be represented by this pattern, namely,

$$f_\lambda(\Psi) = |\{\phi \mid \phi \succ \Psi,\ \phi \in \Phi\}|, \tag{8}$$

where $f_\lambda(\cdot)$ calculates the support of $\Psi$ and $\succ$ implies that correlation instances $\phi$ can be represented by pattern $\Psi$. It should be noted that similar correlation instances could occur multiple times in a process due to re-occurrences of similar faults and abnormalities. Such re-occurrences are captured in the A&E log and quantified through the support value.

Definition 2. The confidence of $\Psi$ evaluates pattern reliability based on its relative occurrence frequency (Singh et al., 2011) as

$$f_\theta(\Psi) = \begin{cases} f_\theta(\check{\Psi}) \times f_\lambda(\Psi) / f_\lambda(\check{\Psi}), & \text{if } \gamma_{|\Psi|-1} \in \Gamma \setminus \{\mathbf{D}\}, \\ f_\lambda(\Psi) / f_\lambda(a), & \text{if } \gamma_{|\Psi|-1} \in \{\mathbf{D}\}, \end{cases} \tag{9}$$

where $f_\theta(\cdot)$ determines the confidence of $\Psi$ and $\Psi = \check{\Psi} \uplus a$. The operator $\setminus$ denotes set exclusion. The confidence value measures the conditional probability that alarm $a$ occurs with correlation pattern


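Table 1
Temporal database for the numerical example.

Temporal record        Alarm    Start time    End time
$\mathcal{I}_{1}$      $a_1$    3             5
$\mathcal{I}_{2}$      $a_2$    7             11
$\mathcal{I}_{3}$      $a_3$    9             13
$\mathcal{I}_{4}$      $a_5$    17            22
$\mathcal{I}_{5}$      $a_4$    18            20
$\mathcal{I}_{6}$      $a_6$    25            28
$\mathcal{I}_{7}$      $a_1$    33            35
$\mathcal{I}_{8}$      $a_2$    37            42
$\mathcal{I}_{9}$      $a_3$    40            45
$\mathcal{I}_{10}$     $a_5$    50            58
$\mathcal{I}_{11}$     $a_4$    52            54
$\mathcal{I}_{12}$     $a_5$    62            67
$\mathcal{I}_{13}$     $a_4$    63            65
$\mathcal{I}_{14}$     $a_7$    72            75
$\mathcal{I}_{15}$     $a_5$    80            82
$\mathcal{I}_{16}$     $a_8$    90            100

Table 2
Captured correlation instances, patterns, and statistical features.

Correlation pattern: $\Psi_1 = a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2 \overset{\mathbf{O}}{\Longrightarrow} a_3$
  Correlation instances: $\phi_1 = \mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_3$; $\phi_3 = \mathcal{I}_7 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_8 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_9$
  Support: 2    Confidence: 1    Lift: 2

Correlation pattern: $\Psi_2 = a_4 \overset{\mathbf{D}}{\Longrightarrow} a_5$
  Correlation instances: $\phi_2 = \mathcal{I}_5 \overset{\mathbf{D}}{\Longrightarrow} \mathcal{I}_4$; $\phi_4 = \mathcal{I}_{11} \overset{\mathbf{D}}{\Longrightarrow} \mathcal{I}_{10}$; $\phi_5 = \mathcal{I}_{13} \overset{\mathbf{D}}{\Longrightarrow} \mathcal{I}_{12}$
  Support: 3    Confidence: 0.75    Lift: 1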
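The statistical metrics in Table 2 can be checked with a short Python sketch that applies Eqs. (8) to (10) to the instances listed there. This is an illustration under stated assumptions, not the authors' code: the instances are copied from Table 2 rather than mined, the support of a single alarm is taken as its number of temporal records in Table 1, the confidence of a single alarm is taken as 1, the second pattern is read as $a_4$ during $a_5$ (which is how the records 5, 11, 13 and 4, 10, 12 map onto alarms), and the helper names are hypothetical.

```python
from collections import Counter

# Alarm of each temporal record in Table 1 (record index -> alarm).
alarm_of = {1: "a1", 2: "a2", 3: "a3", 4: "a5", 5: "a4", 6: "a6", 7: "a1", 8: "a2",
            9: "a3", 10: "a5", 11: "a4", 12: "a5", 13: "a4", 14: "a7", 15: "a5", 16: "a8"}

# Correlation instances of Table 2: record indices alternating with correlation types.
instances = [
    (1, "B", 2, "O", 3),
    (5, "D", 4),
    (7, "B", 8, "O", 9),
    (11, "D", 10),
    (13, "D", 12),
]


def to_pattern(instance):
    """Replace record indices (even positions) by their alarms, as in Eq. (6)."""
    return tuple(alarm_of[x] if i % 2 == 0 else x for i, x in enumerate(instance))


# Support, Eq. (8). Prefixes are counted too, so that the support of the
# previous pattern (e.g. a1 -B-> a2) is available for Eqs. (9) and (10).
support = Counter()
for inst in instances:
    pattern = to_pattern(inst)
    for end in range(3, len(pattern) + 1, 2):
        support[pattern[:end]] += 1

alarm_count = Counter(alarm_of.values())  # |xi_i| for every alarm
xi_max = max(alarm_count.values())        # normalization constant of Eq. (10)


def sup(pattern):
    """Support of a pattern; a single alarm uses its record count (assumption)."""
    return alarm_count[pattern[0]] if len(pattern) == 1 else support[pattern]


def confidence(pattern):
    """Eq. (9), with the confidence of a single alarm taken as 1 (assumption)."""
    if len(pattern) == 1:
        return 1.0
    previous, last_type, last_alarm = pattern[:-2], pattern[-2], pattern[-1]
    if last_type == "D":
        return sup(pattern) / alarm_count[last_alarm]
    return confidence(previous) * sup(pattern) / sup(previous)


def lift(pattern):
    """Eq. (10) with supports normalized by xi_max."""
    previous, last_alarm = pattern[:-2], pattern[-1]
    return (sup(pattern) / xi_max) / (
        (sup(previous) / xi_max) * (alarm_count[last_alarm] / xi_max))


psi_1 = ("a1", "B", "a2", "O", "a3")
psi_2 = ("a4", "D", "a5")
for psi in (psi_1, psi_2):
    print(psi, sup(psi), confidence(psi), lift(psi))
```

Under these assumptions the sketch reproduces the support, confidence, and lift values listed in Table 2 for both patterns.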

$\Psi$ (Fournier-Viger et al., 2015). The correlation type $\gamma_{|\Psi|-1} \in \{\mathbf{D}\}$ is considered as a special case to accommodate the fact that time intervals of $a_{|\Psi|-1}$ would be contained in the time intervals of $a_{|\Psi|}$ in (6) if $\gamma_{|\Psi|-1} \in \{\mathbf{D}\}$ (Zhang et al., 2008).

Definition 3. The lift of $\Psi$ determines pattern independence based on the ratio of support values by

$$f_\pi(\Psi) = \frac{\tilde{f}_\lambda(\Psi)}{\tilde{f}_\lambda(\check{\Psi}) \cdot \tilde{f}_\lambda(a)}, \tag{10}$$

where $f_\pi(\cdot)$ calculates the lift of $\Psi$ and $\Psi = \check{\Psi} \uplus a$. Here, $\tilde{f}_\lambda(\cdot)$ calculates the normalized value of support of a pattern $\Psi$ as $\tilde{f}_\lambda(\Psi) = f_\lambda(\Psi)/\xi_{\max}$, and $\xi_{\max} = \max(|\xi_i|)$ for $i = 1, 2, \ldots, |\mathcal{A}|$. Here, $\xi_i$ is the set of temporal records associated with alarm $a_i$, namely, $\xi_i = \{\mathcal{I} \mid a_i \in \mathcal{I} \text{ and } \mathcal{I} \in \mathbb{E}\}$. The lift value measures the association between correlation pattern $\Psi$ and alarm $a$, such that the occurrence of $a$ significantly indicates the occurrence of $\Psi$ when $f_\pi(\Psi) > 1$ (Gadár & Abonyi, 2019).

Based on the above statistical metrics, valid correlation patterns could be effectively identified for prioritized analysis. Specifically, correlation pattern $\Psi$ is taken as valid if and only if it satisfies that $f_\lambda(\Psi) \ge \bar{\lambda}$, $f_\theta(\Psi) \ge \bar{\theta}$, and $f_\pi(\Psi) > 1$. Here, $\bar{\lambda}$ ($\bar{\theta}$) is a user-specified parameter called minimum support (minimum confidence) to regulate the occurrence frequency (relative occurrence frequency) of a correlation pattern and the value of such parameter could be determined through a combination of process knowledge, industry standards, and practical requirements from alarm management teams.

To better explain the evaluation of statistical features, a numerical example is provided as follows. A temporal database is given in Table 1, which contains columns for the temporal records ($\mathcal{I}_i$ and $i = 1, 2, \ldots, 16$), alarms associated with the temporal records, and their corresponding start/end time instants. To capture the correlation instances and patterns, set $\Delta t = 3$. The complete list of correlation instances, the corresponding correlation patterns, and their statistical metrics are given in Table 2. Some examples of correlation instances are: $\bar{\phi}_1 = \mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2$, $\bar{\phi}_2 = \mathcal{I}_2 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_3$, and $\bar{\phi}_3 = \mathcal{I}_5 \overset{\mathbf{D}}{\Longrightarrow} \mathcal{I}_4$. Here, $\bar{\phi}_1 = \mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2$ could be further extended to a longer correlation instance as $\phi_1 = \mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_3$. Thereafter, correlation patterns are extracted to give generalized representations of the captured alarm correlation instances. For instance, correlation instances $\phi_1 = \mathcal{I}_1 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_2 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_3$ and $\phi_3 = \mathcal{I}_7 \overset{\mathbf{B}}{\Longrightarrow} \mathcal{I}_8 \overset{\mathbf{O}}{\Longrightarrow} \mathcal{I}_9$ could be generally represented by pattern $\Psi_1 = a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2 \overset{\mathbf{O}}{\Longrightarrow} a_3$.

The calculations of pattern statistical features are demonstrated by using pattern $\Psi_1 = a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2 \overset{\mathbf{O}}{\Longrightarrow} a_3$ as an example. It should be noticed that $\Psi_1$ is obtained by attaching alarm $a_3$ to the end of $\check{\Psi} = a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2$. The support value for $\Psi_1$, denoted as $f_\lambda(\Psi_1)$, is determined to be 2, because there are two alarm correlation instances represented by $\Psi_1$. Prior to calculating the confidence value of $\Psi_1$, the confidence value of $\check{\Psi}$ is calculated following (9) as $f_\theta(a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2) = f_\theta(a_1) \times f_\lambda(a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2)/f_\lambda(a_1) = 1 \times 2/2 = 1$. Thereafter, the confidence of correlation pattern $\Psi_1$ is calculated by $f_\theta(\Psi_1) = f_\theta(a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2) \times f_\lambda(\Psi_1)/f_\lambda(a_1 \overset{\mathbf{B}}{\Longrightarrow} a_2) = 1 \times 2/2 = 1$. Eventually, the lift value of correlation pattern $\Psi_1$ is calculated by

$$f_\pi(\Psi_1) = \frac{\tilde{f}_\lambda(\Psi_1)}{\tilde{f}_\lambda(\check{\Psi}) \cdot \tilde{f}_\lambda(a_3)} = \frac{f_\lambda(\Psi_1)/\xi_{\max}}{(f_\lambda(\check{\Psi})/\xi_{\max}) \cdot (f_\lambda(a_3)/\xi_{\max})} = \frac{2/4}{(2/4) \cdot (2/4)} = 2,$$

where the normalization constant $\xi_{\max}$ is determined as $\xi_{\max} = \max(|\xi_i|) = \max(\{2, 2, 2, 3, 4, 1, 1, 1\}) = 4$.

Remark 3. The statistical metrics, namely, support, confidence, and lift, quantify correlation patterns in different aspects. For practical applications, an evaluation framework combining these three statistical metrics is desired. Therefore, strongly correlated alarms could be identified from correlation patterns that frequently occurred (indicated by support), having a higher degree of relative occurrence frequency (indicated by confidence), and giving positive association (indicated by lift).

Eventually, all valid correlation patterns are collected into set

$$\mathbb{P} = \{\Psi_1, \Psi_2, \ldots, \Psi_{|\mathbb{P}|}\}, \tag{11}$$

where $\Psi_i$ represents the $i$th valid correlation pattern and $i = 1, 2, \ldots, |\mathbb{P}|$. Then, the alarm correlation patterns are further analyzed as network graphs for a better representation of the results and to identify process interactions. In summary, the major steps in the discovery of alarm correlation patterns from historic A&E logs are presented as a flowchart in Fig. 4.

4. Analysis of correlation networks

In this section, correlation networks are constructed to visualize the extracted alarm correlation patterns along with their statistical features.

4.1. Construction of correlation networks

The construction of a correlation network is achieved by creating a graph from the extracted correlation patterns, such that the network is comprised of ordered pairs of vertices and edges, where the vertices represent correlated alarms and the edges indicate their correlation types. Therefore, the vertices are connected by edges in a correlation network if and only if their corresponding alarms are correlated. Based on set $\mathbb{P}$ of correlation patterns, the obtained correlation network is

$$\mathcal{G} = (\mathcal{V}, \mathcal{W}), \tag{12}$$

where $\mathcal{V}$ ($\mathcal{W}$) is the adjacency (edge) matrix to store vertices (edges). Here, $\mathcal{V}$ is a binary square matrix of size $|\mathcal{A}|$ and $v_{i,j}$ is the element in the $i$th row and $j$th column to indicate if alarms $a_i, a_j \in \mathcal{A}$ are correlated, i.e., $v_{i,j} = 1$, if $a_i \overset{\gamma}{\Longrightarrow} a_j$ is captured by $\Psi \in \mathbb{P}$; otherwise, $v_{i,j} = 0$. It should be noticed that $\mathcal{V}$ is usually asymmetric as correlation type is considered.

To record correlation types and statistical features, an edge matrix $\mathcal{W}$ (having the same dimension as $\mathcal{V}$) is utilized; however, $\mathcal{W}$ is augmented to store sets as its elements, such that the scenarios where multiple correlation types coexist for the same pair of alarms could be handled. Such scenarios may exist for various reasons, such as different

asymmetric due to $\mathcal{V}$), it gives $w_{3,2} = \emptyset$ and the value of $w_{2,3}$ is shown in Fig. 5b, where alarms $a_2$ and $a_3$ have correlation types equal and start and thus their corresponding statistical values (namely, support $\lambda$ and confidence $\theta$) are recorded in columns $\mathbf{E}$ and $\mathbf{S}$; whereas the statistical values for the remaining columns are assigned as 0's because their correlation types are not discovered.

4.2. Correlation analysis based on networks

Based on the correlation network, further analysis is performed to help prioritize alarm analysis and identify process interactions that indicate potential abnormality propagation paths.

4.2.1. Partition of correlation networks


To prioritize further analysis, a correlation network is partitioned
into clusters of correlated alarms having significant statistical features,
where alarms in each cluster are potentially associated with common
abnormalities. In addition, after such partition, a more granular view
of the process is obtained and thus making it easier for operators to
diagnose the ongoing abnormality by focusing on the cluster of relevant
alarms.
For this purpose, the partition of the correlation network is per-
formed using a network analysis metric called modularity, which is an
index defined as the difference between the fraction of edges assigned
Fig. 4. Systematic procedure for the discovery of alarm correlation patterns from
to a specific cluster and the expected fraction assuming such edges are
historical A&E logs. randomly clustered in the network (Blondel et al., 2008; Newman &
Girvan, 2004). Moreover, the statistical features (namely, support and
confidence) of correlation networks are further adapted in this work,
and thus modularity is calculated by
[ ]
1 ∑ 𝑘𝑖 𝑘𝑗
= 𝜃𝑖,𝑗,𝛾 − 𝛿(𝑐𝑖 , 𝑐𝑗 ), (13)
2𝑁 𝑖,𝑗 2𝑁
∑𝑁
where 𝑘𝑖 = 𝑗 𝜃𝑖,𝑗,𝛾 denotes the weighted degree of alarm 𝑎𝑖 and 𝜃𝑖,𝑗,𝛾
𝛾
represents the confidence of 𝑎𝑖 ⇐⇐⇐⇒ ⇐ 𝑎𝑗 . For a specific clustering of a
correlation network, 𝑎𝑖 and 𝑎𝑗 are assigned with cluster labels 𝑐𝑖 and 𝑐𝑗 ,
respectively. Consequently, 𝛿(𝑐𝑖 , 𝑐𝑗 ) = 1, if 𝑐𝑖 = 𝑐𝑗 (i.e., 𝑎𝑖 and 𝑎𝑗 are in
the same cluster); otherwise, 𝛿(𝑐𝑖 , 𝑐𝑗 ) = 0. Here, 𝑁 denotes the number
of captured alarm correlations, which is determined by counting the
total number of edges in the network.
Fig. 5. Construction of a correlation network based on adjacency matrix  and edge Thereafter, the optimal clustering of the correlation network is
matrix . Here, the elements having values 1’s (0’s) in  indicate the corresponding determined by assigning class labels 𝑐 to all captured alarms using a
alarms are (are not) correlated and the diagonal values are discarded;  is augmented
to store the statistical features (namely, support 𝜆 and confidence 𝜃) of all possible
heuristic search approach (Blondel et al., 2008), such that modularity
correlation types.  is maximized. As a result, each obtained cluster is comprised of
strongly correlated alarms having significant statistical features and
such alarms could come from multiple distinct correlation patterns.
operating modes with distinct process dynamics. Consequently, the The clustering of the correlation network consolidates the extracted
element in the 𝑖th row and 𝑗th column of  is denoted as 𝑤𝑖,𝑗 = alarm correlation patterns because such calculations to merge distinct
{(𝛾, 𝜆, 𝜃)|𝛾 ∈ 𝛤 }, where the tuple (𝛾, 𝜆, 𝜃) indicates that alarms 𝑎𝑖 and 𝑎𝑗 patterns are not feasible during the pattern extraction stage. It is worth
are correlated by 𝛾 ∈ 𝛤 and this correlation is statistically characterized noting that the consolidation of correlation patterns through network
by support (confidence) 𝜆 (𝜃). If no correlation is discovered 𝑤𝑖,𝑗 is an graphs is tolerant to order switching of alarm tags in the patterns
empty set, namely, 𝑤𝑖,𝑗 = ∅. Lift is utilized to filter significant patterns without loss of information.
with 𝑓𝜋 (𝛹 ) > 1, and thus all resulting patterns satisfy this requirement.
Hence, this metric is not included in the visualization of correlation Remark 4. The integration of modularity as a network analysis
networks. metric significantly enhances the comprehension and analysis of alarm
For better illustration, an example of the construction of a correla- correlation in the following two aspects: (1) It simplifies the analysis
tion network is given in Fig. 5. The adjacency matrix  is shown in of large and complex correlation networks by breaking them down
Fig. 5a, where  stores the correlations between five alarms, namely, into manageable and more comprehensive functional modules, where
𝑎1 , 𝑎2 , … , 𝑎5 . The diagonal values in  are discarded as the analysis is each module contains alarms that are strongly correlated; (2) after
focusing on correlations between different alarms. More specifically, such network partition, it becomes easier to observe the interactions
𝛾
alarms 𝑎2 and 𝑎3 are only correlated by 𝑎2 ⇐⇐⇐⇒
⇐ 𝑎3 and thus 𝑣2,3 = between modules, which provides insights to the interactions within
1, whereas 𝑣3,2 = 0. Accordingly, in the edge matrix  (usually the process.
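As a rough illustration of (12) and (13), the following Python sketch builds the adjacency and edge matrices from a list of captured correlations and evaluates the modularity of a given cluster assignment. It is not the authors' implementation: the function names, the dictionary-based storage, and the example values are hypothetical. Three assumptions are made where the text leaves room for interpretation: confidences of coexisting correlation types are summed per ordered alarm pair, $N$ counts the ordered pairs with $v_{i,j} = 1$, and the weighted degree sums over outgoing edges only. The heuristic search of Blondel et al. (2008) is not reproduced; only the objective it maximizes is evaluated.

```python
from collections import defaultdict


def build_network(correlations):
    """Build the adjacency and edge matrices as sparse dictionaries.

    correlations: iterable of (a_i, a_j, gamma, support, confidence).
    Returns (adjacency, edges): adjacency holds the ordered pairs with
    v_ij = 1; edges maps each pair to its set w_ij of
    (gamma, support, confidence) tuples. Missing pairs mean v_ij = 0.
    """
    adjacency = set()
    edges = defaultdict(set)
    for a_i, a_j, gamma, support, confidence in correlations:
        if a_i == a_j:
            continue  # diagonal entries are discarded
        adjacency.add((a_i, a_j))
        edges[(a_i, a_j)].add((gamma, support, confidence))
    return adjacency, dict(edges)


def modularity(adjacency, edges, labels):
    """Evaluate (13) for the cluster assignment `labels` (alarm -> cluster)."""
    n_edges = len(adjacency)  # assumption: N = number of ordered pairs
    # Assumption: theta_ij sums the confidences of all types stored in w_ij.
    theta = {pair: sum(t[2] for t in edges[pair]) for pair in adjacency}
    # Assumption: the weighted degree k_i sums over outgoing edges of a_i.
    k = {alarm: 0.0 for alarm in labels}
    for (a_i, _), value in theta.items():
        k[a_i] += value
    total = 0.0
    for a_i in labels:
        for a_j in labels:
            if labels[a_i] != labels[a_j]:
                continue  # delta(c_i, c_j) = 0
            total += theta.get((a_i, a_j), 0.0) - k[a_i] * k[a_j] / (2 * n_edges)
    return total / (2 * n_edges)


# Hypothetical correlations: (a_i, a_j, type, support, confidence).
correlations = [
    ("a1", "a2", "B", 20, 0.8),
    ("a2", "a1", "O", 12, 0.6),
    ("a3", "a4", "E", 15, 0.9),
]
adjacency, edges = build_network(correlations)
two_clusters = {"a1": 0, "a2": 0, "a3": 1, "a4": 1}
one_cluster = {"a1": 0, "a2": 0, "a3": 0, "a4": 0}
q_two = modularity(adjacency, edges, two_clusters)
q_one = modularity(adjacency, edges, one_cluster)
print(f"two clusters: {q_two:.4f}, one cluster: {q_one:.4f}")
```

In this toy network the two-cluster assignment scores higher than placing all alarms in one cluster, which is the kind of comparison a heuristic search would perform repeatedly while moving alarms between clusters.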


4.2.2. Visualization of correlation networks

A correlation network is presented using Gephi (Bastian et al., 2009), which is an open-source platform for graph and network visualization. Here, the force-directed graph layout of Fruchterman and Reingold (1991) is utilized to convert numerical graph representations into drawn graphs. In this visualization, correlation network graphs are characterized by three main features: (1) The vertices and edges represent the captured correlated alarms and their corresponding correlation types, respectively. Arrows on the edges indicate the correlation types having directionality (namely, $\gamma \in \{\mathbf{B}, \mathbf{D}, \mathbf{O}, \mathbf{M}\}$), whereas the edges for the remaining correlation types (namely, $\gamma \in \{\mathbf{E}, \mathbf{S}, \mathbf{F}\}$) do not have an arrow. (2) The statistical features of correlation patterns are provided to facilitate alarm diagnosis, where the size of vertices indicates the relative values of support and the thickness of edges indicates the relative values of confidence, such that correlated alarms with higher support (confidence) values are visualized using bigger vertices (thicker edges) in the graph. Consequently, alarm analysis could be prioritized based on such visual indicators of their statistical features. (3) The partition of the correlation network based on modularity is indicated by different colors, such that correlated alarms assigned to the same cluster have identical colors on their vertices and edges. As a result, it is easier to distinguish such alarms, which have significant statistical features and thus are potentially associated with common root causes. It is worth noting that connections may exist across the clusters, due to process interactions and even fault propagation in the plant. Therefore, such connections across the clusters are highlighted using transitioning colors for visual distinction.
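The three visual features above amount to a mapping from network data to drawing attributes, which can be sketched in Python as follows. This is a minimal illustration and not Gephi's API: the function names, the size and width ranges, and all numerical values are hypothetical; only the split between directional and non-directional correlation types is taken from the text.

```python
DIRECTED_TYPES = {"B", "D", "O", "M"}  # drawn with an arrow; E, S, F are not


def rescale(value, lo, hi, out_lo, out_hi):
    """Map value from [lo, hi] linearly onto [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2.0
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def visual_attributes(support, cluster, edges):
    """Derive vertex sizes, edge widths, arrows, and color groups.

    support: alarm -> support value; cluster: alarm -> cluster label;
    edges: list of (source, target, gamma, confidence).
    """
    s_lo, s_hi = min(support.values()), max(support.values())
    nodes = {
        alarm: {
            "size": rescale(s, s_lo, s_hi, 10.0, 50.0),  # bigger = more support
            "color": cluster[alarm],                      # one color per cluster
        }
        for alarm, s in support.items()
    }
    confidences = [theta for _, _, _, theta in edges]
    t_lo, t_hi = min(confidences), max(confidences)
    styled_edges = []
    for source, target, gamma, theta in edges:
        styled_edges.append({
            "source": source,
            "target": target,
            "label": gamma,
            "arrow": gamma in DIRECTED_TYPES,
            "width": rescale(theta, t_lo, t_hi, 1.0, 5.0),  # thicker = more confident
            # Edges between clusters would get a transitioning color.
            "crosses_clusters": cluster[source] != cluster[target],
        })
    return nodes, styled_edges


# Hypothetical alarms, supports, clusters, and correlations.
support = {"a1": 108, "a2": 38, "a3": 12}
cluster = {"a1": 1, "a2": 1, "a3": 2}
edges = [("a1", "a2", "B", 0.9), ("a2", "a3", "E", 0.5)]
nodes, styled_edges = visual_attributes(support, cluster, edges)
for edge in styled_edges:
    print(edge)
```

The resulting dictionaries could then be written to whatever file format the chosen visualization tool imports.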

4.2.3. Identification of highly interacting alarms

As a practical application, correlation networks could be utilized to prioritize alarm analysis by identifying highly interacting alarms. Specifically, highly interacting alarms are defined as alarms configured with higher priorities and captured by correlation patterns with higher statistical metric values (namely, support and confidence). As a result, highly interacting alarms are characterized by frequent occurrences and association with severe abnormal situations and thus require immediate attention for alarm management. To help identify highly interacting alarms, an index called correlation centrality is introduced to rank alarms in the correlation network. Here, correlation centrality is calculated by

$$f_\tau(a_i) = (\nu_i)^{\beta} \cdot (k_i)^{1-\beta} \cdot (1/\rho_i), \tag{14}$$

where function $f_\tau(\cdot)$ calculates correlation centrality of alarm $a_i$ and $k_i = \sum_{j}^{N} \theta_{i,j,\gamma}$ represents the weighted degree of $a_i$ as described in (13). $N$ is the number of captured alarm correlations, namely, the total number of edges in the network. Here, $\nu_i$ is the degree of $a_i$ measuring the number of alarm correlations containing $a_i$, and it is calculated by

$$\nu_i = \sum_{j}^{N} v_{i,j}, \tag{15}$$

where $v_{i,j}$ is the element of adjacency matrix $\mathcal{V}$ in the $i$th row and $j$th column. The user-specified parameter $\beta$ is utilized to adjust the weight between $\nu_i$ and $k_i$, where the former evaluates the number of correlations of alarm $a_i$ and the latter evaluates the strength of these correlations quantified by confidence. By default, $\beta = 0.5$ is set, such that both factors are considered with equal importance for identifying highly interacting alarms. In addition, alarm priority configuration is considered in correlation centrality, and thus a scaling factor (namely, $1/\rho_i$) is employed. Here, $\rho_i \in \{1, 2, \ldots, n_\rho\}$, with $n_\rho$ the number of configured priority levels, is the numerical denomination of configured alarm priorities on a linear scale, such that $\rho_i = 1$ ($\rho_i = n_\rho$) when alarm $a_i$ is configured with the highest (lowest) priority. It is worth mentioning that the value of $\rho_i$ could be assigned with further modifications to accommodate other factors (e.g., priority distribution and relative occurrence frequency), which gives more flexibility to prioritize alarm analysis based on specific requirements.

In summary, correlated alarms and their occurrence orders are captured as correlation patterns, and such patterns are characterized by statistical features and their correlation types. Furthermore, the alarm correlations and the statistical features of correlated alarms are visualized as network graphs to represent the captured information in a compact form. Such a succinct representation facilitates the easy comprehension of interactions between the patterns and within the process. Even though there are existing methods in the literature to extract alarm patterns (Jacobs & Dagnino, 2016; Wang et al., 2017, 2016; Zhu et al., 2021), none of them considers correlation types to determine the correlation patterns or utilizes a statistical framework to quantify alarm correlation. For better presentation, the major steps involved in the construction and analysis of alarm correlation networks are illustrated as a flowchart in Fig. 6, where the input is the set of obtained valid correlation patterns from Section 3.

Fig. 6. Systematic procedure for the analysis of alarm correlation networks.

Remark 5. The introduction of correlation centrality is to help efficiently identify highly interacting alarms, which require immediate attention for alarm management. The correlation centrality metric incorporates both pertinent statistical features of correlation patterns and attributes in alarm configuration to give a systematic rank of alarms for prioritized analysis. Therefore, correlation centrality offers an effective metric to prioritize alarm rationalization and it holds an advantage over the conventional approach of top bad actors (Hollifield & Habibi, 2011), which primarily considers alarm occurrence frequencies for ranking the alarms.

5. Case study

In this section, case studies are presented to demonstrate the effectiveness of the proposed method based on simulated data generated from an industrial benchmark model.
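The correlation centrality of Section 4.2.3, equation (14), can be sketched in a few lines of Python. The function and variable names are hypothetical; the inputs are the degree, weighted degree, and numerical priority denomination reported for three alarms in Table 7 (Section 5.2.4), evaluated with the default $\beta = 0.5$.

```python
def correlation_centrality(degree, weighted_degree, priority_rank, beta=0.5):
    """Correlation centrality: nu_i^beta * k_i^(1 - beta) * (1 / rho_i)."""
    return (degree ** beta) * (weighted_degree ** (1.0 - beta)) / priority_rank


# (alarm, degree nu_i, weighted degree k_i, priority rho_i, reported centrality)
table7_rows = [
    ("PC330.PV.LO", 15, 9.291, 1, 11.805),
    ("PI401.PV.LL", 6, 3.842, 2, 2.400),
    ("FI403.PV.LO", 7, 5.475, 3, 2.064),
]
for alarm, nu, k, rho, reported in table7_rows:
    computed = correlation_centrality(nu, k, rho)
    print(f"{alarm}: computed {computed:.3f}, reported {reported:.3f}")
```

The computed values agree with the tabulated centralities up to the rounding of the tabulated inputs. Raising `beta` toward 1 would weight the number of correlations more heavily than their confidence, and a lower-priority alarm (larger `priority_rank`) is scaled down accordingly.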


Table 3
Details of the process simulation using the VAM plant model (Machida et al., 2016).

| Start time | End time | Fault description | Type of fault | Fault magnitude |
|---|---|---|---|---|
| 00 h 00 min | 05 h 00 min | Steady State Operation | N/A | N/A |
| 05 h 00 min | 05 h 20 min | Malfunction-21: Fail Absorber Bottom Valve | Process Failure | Binary Malfunction |
| 14 h 20 min | 14 h 40 min | Malfunction-29: Column Differential Pressure | Process Failure | 150% |
| 23 h 40 min | 24 h 00 min | Malfunction-25: Reactor-In Temperature Indicator | Process Failure | 10 °C |
| 33 h 00 min | 33 h 20 min | Malfunction-26: Dirty Vaporizer Level Indicator | Process Failure | 30% |
| 42 h 20 min | 42 h 40 min | Malfunction-13: Change 15K Steam Temperature | Disturbance | −20 °C |
| 51 h 40 min | 52 h 00 min | Malfunction-07: Decrease Vaporizer Heat-Transfer | Disturbance | 50% |
| 61 h 00 min | 61 h 20 min | Malfunction-23: Gas Feed Pressure Indicator Trouble | Process Failure | 2 MPa |
| 70 h 20 min | 70 h 40 min | Malfunction-21: Fail Absorber Bottom Valve | Process Failure | Binary Malfunction |
| 80 h 00 min | 100 h 00 min | Malfunction-20: Steady State Operation | N/A | N/A |

5.1. Generation of simulation data

In the case studies, alarm data was generated through long-term alarm monitoring of the industrial benchmark Vinyl Acetate Monomer (VAM) plant model (Machida et al., 2016), where alarm configurations were based on Yang, Hu, et al. (2020). The plant model consists of eight distinct process units, namely, (1) raw material feed, (2) reactor, (3) separator & compressor, (4) absorber, (5) buffer tank, (6) distillation column, (7) decanter, and (8) CO2 remover & purge line. To simulate industrial conditions, the steady-state system was perturbed with malfunctions (namely, process failures and disturbances) as shown in Table 3. During the entire simulation period, only one malfunction was introduced at a time, such that the system had enough time to recover to normal operating conditions before the next malfunction was introduced. For each process variable (PV), the alarm limits were obtained from statistical analysis of its steady state process data for 24 h of operation, and thus four alarm limits were configured, namely, Low–Low (PV.LL), Low (PV.LO), High (PV.HI), and High–High (PV.HH). In total, 193 process variables were configured with alarm limits through this approach. Thereafter, alarm priorities (namely, Critical, High, and Low) were assigned following the recommendations in Yang, Hu, et al. (2020). Eventually, alarm data was generated by comparing process variable values with their corresponding alarm limits. To ensure the accuracy of alarm correlation analysis, chattering alarms, oscillating alarms, and long-standing alarms were identified and removed, where alarms that stayed active for more than 10 h were taken as long-standing (Wang & Chen, 2016). As a result, the obtained data was comprised of 468 unique alarms. For illustration, a portion of the generated alarm data is given in Table 4. Next, the proposed method was applied to extract alarm correlation patterns and generate correlation networks.

Table 4
Example of alarm data generated based on VAM simulator.

| Time | Alarm | State | Priority | Area |
|---|---|---|---|---|
| 10:59:32 AM | FC101.PV.HI | ALM | Low | Raw Material Feed |
| 10:59:42 AM | FC101.PV.HI | RTN | Low | Raw Material Feed |
| 10:59:52 AM | TP201.PV(6).HI | ALM | High | Reactor |
| 11:00:22 AM | FC101.PV.HI | ALM | Low | Raw Material Feed |
| 11:00:22 AM | FC170.PV.HI | ALM | Low | Raw Material Feed |
| 11:00:32 AM | FC170.PV.HI | RTN | Low | Raw Material Feed |
| 11:00:42 AM | FC101.PV.HI | RTN | Low | Raw Material Feed |
| 11:00:42 AM | TC230.PV.HI | ALM | High | Reactor |
| 11:00:52 AM | TC230.PV.HI | RTN | High | Reactor |

5.2. Presentation of obtained results

Following the calculation stages of the proposed method, the obtained results are presented below.

5.2.1. Discovered correlation patterns

Based on the alarm data, a temporal database was created, where 23819 temporal records were captured. Thereafter, alarm correlation patterns were extracted using this temporal database, and the statistical requirements were selected as $\bar{\lambda} = 10$, $\bar{\theta} = 0.5$, $f_\pi(\Psi) > 1$, and $\Delta t = 10$ min. Consequently, 902 correlation patterns were extracted. An overview of the obtained patterns is provided in Fig. 7, where the horizontal axis denotes pattern length, namely, the number of alarms in a correlation pattern. The vertical axis represents pattern statistical features: in Fig. 7a, it indicates the number of patterns having a specific length, whereas in Fig. 7b, it gives support/confidence values, where the blue (red) bar represents the average support (average confidence) value for patterns of a specific length, and the exact value reading is indicated on the left (right) side.

Fig. 7. Overview of correlation pattern statistical features, where the horizontal axis represents the length of a correlation pattern, namely, the number of alarms captured by a correlation pattern. For (a), the vertical axis indicates the number of patterns having a specific length. For (b), the vertical axis indicates pattern statistical features, where the blue (red) bar represents the average support (average confidence) value for patterns of a specific length, and the exact reading is indicated on the left (right) side. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

5.2.2. Visualized correlation networks

Based on the obtained correlation patterns, correlation networks were constructed. The networks were partitioned into seven clusters by maximizing modularity $Q$ in (13). Then, the network graph was visualized to help analyze process interactions. Fig. 8 shows the obtained correlation network, where individual clusters are highlighted by distinct colors. Details of such clusters are provided in Table 5, where cluster names, the number of alarms captured by a cluster, and the corresponding alarms are provided in the three columns, respectively. Since alarm clusters in the correlation network consolidate multiple correlation patterns, the number of alarms in a cluster might exceed the maximum pattern length. Therefore, correlation networks could give more information to analyze alarm correlations by indicating process interactions.

Fig. 8. Correlation network containing seven clusters of correlated alarms. The individual clusters are highlighted by distinct colors. Each alarm is represented by a vertex and correlation types are indicated by letters on graph edges. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 5
Alarm correlation clusters in the correlation network.

| Cluster | No. of alarms | Captured alarms in cluster |
|---|---|---|
| Cluster-1 | 7 | PI401.PV.HI, PI402.PV.HI, PI331.PV.HI, FC460.PV.HI, FI102.PV.HH, FC101.PV.LO, FC170.PV.LO |
| Cluster-2 | 2 | FI150.PV.LL, PI150.PV.LO |
| Cluster-3 | 2 | EI430.PV.LO, FC430.PV.LO |
| Cluster-4 | 4 | PC130.PV.LO, FI102.PV.LO, FC101.PV.HI, FC170.PV.HI |
| Cluster-5 | 11 | PC330.PV.LO, PC210.PV.HI, TC130.PV.HI, PC210.PV.LO, FI150.PV.HI, TP201.PV(3).HI, LC310.PV.HI, PI150.PV.HI, TC150.PV.LO, TP201.PV(5).LO, TP201.PV(1).HI |
| Cluster-6 | 7 | FC460.PV.LL, PI402.PV.LL, FI403.PV.LO, PI331.PV.LL, PI401.PV.LL, FI451.PV.LO, FI402.PV.LO |
| Cluster-7 | 3 | TP201.PV(8).HI, TC201.PV.HI, TP201.PV(7).HI |

To illustrate that a correlation network could help analyze correlated alarms and identify process interactions, an example is given based on cluster-3. In this cluster, two correlated alarms were captured, namely, EI430.PV.LO and FC430.PV.LO. According to the Vinyl Acetate Monomer (VAM) plant model instruction manual (Machida et al., 2016), these alarms were triggered due to malfunction "Absorber Circulation Pump (P430) Failure", such that pump P430 stopped for an electrical issue. Consequently, this abnormality caused the process values of the pump electric indicator (EI430.PV) and circulation flow rate (FC430.PV) to become zero simultaneously. Therefore, the two process variables were correlated and such a scenario was presented by cluster-3 in the correlation network, where alarms EI430.PV.LO and FC430.PV.LO were correlated by type $\mathbf{E}$ (as indicated by the letter on the graph edge).

5.2.3. Results validation

To utilize the obtained results in designing alarm reduction strategies, it is essential to validate alarm correlations using process knowledge and/or process data. When the correlated alarms are linked to process data, the validation process involves examining whether the behavior of process variables during abnormal events aligns with the identified correlation patterns. Therefore, for further validation of the obtained correlation network, the alarms in cluster-4 were utilized. Comparing Table 6 with the correlation network in Fig. 8, it could be observed that this correlation network captured alarm correlations that endorse the expected behavior of the process. Table 6 gives detailed descriptions of alarms $a$ and $a'$ captured by correlation pattern $a \overset{\gamma}{\Longrightarrow} a'$, where the statistical features of the patterns are provided in the support and confidence columns. After analyzing process information, it could be verified that all alarms in cluster-4 were correlated because they were triggered due to the malfunction "Gas (Ethylene, C2H4) Feed Pressure Indicator Trouble" (Yang, Hu, et al., 2020). Specifically, the activation of alarm PC130.PV.LO (C2H4 feed pressure low) was frequently accompanied by the other three alarms, namely, FI102.PV.LO (recycle gas flow rate low), FC101.PV.HI (C2H4 feed flow rate high), and FC170.PV.HI (O2 feed flow rate high). When such abnormality happened, the feed pressure (PC130.PV) of raw material ethylene dropped, and recycle gas flow rate (FI102.PV) decreased. In the meantime, the flow rates for C2H4 (FC101.PV) and O2 (FC170.PV) increased because the system was attempting to maintain gas pipeline head pressure by giving higher flow rates.

In addition, Fig. 9a (Fig. 9b) is provided to show the typical process variable values (alarm signals) associated with the alarms in cluster-4 of the correlation network. These process variables and alarm signals are aligned by time stamps on the horizontal axis. Here, the process variables are shown in blue curves, and their alarm limits are indicated by red lines. If process variables PC130.PV/FI102.PV (FC170.PV/FC101.PV) drop below (exceed) their alarm limits, their corresponding alarms PC130.PV.LO/FI102.PV.LO (FC170.PV.HI/FC101.PV.HI) would be triggered. The green lines indicate alarm signals, where 1 (0) represents alarm occurrence (clearance). Consequently, it could be observed that alarms in cluster-4 of the correlation network represent the exact process interactions in plant operation.

Fig. 9. Process variables and alarm signals for alarms captured by correlation network cluster-4. (a) The process variables are shown in blue curves, and the corresponding alarm limits are indicated by red lines. (b) The green lines indicate the alarm signal, where 1 (0) represents alarm occurrence (clearance). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
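The relation between a process variable, its alarm limit, and the binary alarm signal described for Fig. 9 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, sample values, and the limit are hypothetical, and the filtering of chattering, oscillating, and long-standing alarms mentioned in Section 5.1 is not included.

```python
def alarm_signal(values, limit, kind):
    """Return 1 where the alarm is active and 0 where it is cleared."""
    if kind in ("HI", "HH"):
        return [1 if v > limit else 0 for v in values]
    if kind in ("LO", "LL"):
        return [1 if v < limit else 0 for v in values]
    raise ValueError(f"unknown alarm kind: {kind}")


def alarm_events(times, signal, tag):
    """Convert a binary alarm signal into ALM (0 -> 1) and RTN (1 -> 0) events."""
    events = []
    previous = 0
    for t, state in zip(times, signal):
        if state == 1 and previous == 0:
            events.append((t, tag, "ALM"))
        elif state == 0 and previous == 1:
            events.append((t, tag, "RTN"))
        previous = state
    return events


# Hypothetical samples of a feed pressure and a hypothetical low limit.
times = [0, 10, 20, 30, 40, 50]  # seconds
pressure = [5.0, 4.8, 3.9, 3.7, 4.6, 5.1]
signal = alarm_signal(pressure, limit=4.0, kind="LO")
events = alarm_events(times, signal, "PC130.PV.LO")
print(signal)  # [0, 0, 1, 1, 0, 0]
print(events)  # [(20, 'PC130.PV.LO', 'ALM'), (40, 'PC130.PV.LO', 'RTN')]
```

The event list has the same ALM/RTN structure as the alarm log excerpt in Table 4, which is the form of data the pattern mining stage takes as input.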


Table 6
Description of alarms captured in correlation network cluster-4.^a

| Alarm $a$ | Description of alarm $a$ | Alarm $a'$ | Description of alarm $a'$ | Support | Confidence | Correlation |
|---|---|---|---|---|---|---|
| PC130.PV.LO | C2H4 Feed Pressure Low | FI102.PV.LO | Recycle Gas Flow Rate Low | 54 | 0.568 | B |
| FI102.PV.LO | Recycle Gas Flow Rate Low | FC170.PV.HI | O2 Feed Flow Rate High | 15 | 0.577 | B |
| FI102.PV.LO | Recycle Gas Flow Rate Low | FC101.PV.HI | C2H4 Feed Flow Rate High | 15 | 0.577 | B |
| FC170.PV.HI | O2 Feed Flow Rate High | FC101.PV.HI | C2H4 Feed Flow Rate High | 14 | 0.538 | B |

^a Here, alarms $a$ and $a'$ give correlation pattern $a \overset{\gamma}{\Longrightarrow} a'$ and their correlation type is $\gamma$.

5.2.4. Identified highly interacting alarms

As a practical application of the proposed method to give prioritized analysis for alarm configuration, highly interacting alarms were identified. Here, Table 7 lists the top 10 highly interacting alarms, which were ranked in descending order by correlation centrality. The numerical denomination represents the conversion of alarm priority to corresponding numerical values for centrality calculation. The cluster values denote the clusters to which the highly interacting alarms belong. For instance, alarm PC330.PV.LO (compressor inlet pressure low) was in cluster-5 and had Critical priority. This alarm occurred 108 times in the temporal database and its correlation centrality is 11.805. Since alarm PC330.PV.LO has the highest centrality, it indicates that this alarm was triggered frequently and correlated with many other alarms. This scenario is also revealed by the correlation network, where alarm PC330.PV.LO has the largest vertex size (indicating high support value for frequent occurrence) and this vertex is connected with 15 edges (indicating correlations with 15 alarms). Consequently, alarm PC330.PV.LO could be prioritized for further analysis to improve alarm system configuration.

Table 7
Highly interacting alarms in the alarm correlation network.

| Alarm | Priority | Numerical denomination | Cluster | Support | Degree | Weighted degree | Correlation centrality |
|---|---|---|---|---|---|---|---|
| PC330.PV.LO | Critical | 1 | 5 | 108 | 15 | 9.291 | 11.805 |
| PI331.PV.HI | Critical | 1 | 1 | 38 | 4 | 2.688 | 3.279 |
| PI401.PV.HI | Critical | 1 | 1 | 30 | 3 | 2.289 | 2.621 |
| PI402.PV.HI | Critical | 1 | 1 | 33 | 3 | 2.095 | 2.507 |
| PI401.PV.LL | High | 2 | 6 | 28 | 6 | 3.842 | 2.400 |
| FI403.PV.LO | Low | 3 | 6 | 28 | 7 | 5.475 | 2.064 |
| PI402.PV.LL | High | 2 | 6 | 27 | 4 | 2.628 | 1.621 |
| FC460.PV.LL | High | 2 | 6 | 13 | 3 | 2.059 | 1.243 |
| FI102.PV.HH | High | 2 | 1 | 12 | 3 | 1.619 | 1.102 |
| FC460.PV.HI | Low | 3 | 1 | 38 | 4 | 2.704 | 1.096 |

As shown in the case studies, the proposed method is able to discover meaningful alarm correlation patterns and concisely present the results by visualization as correlation networks. For practical application, the method could cluster alarms based on their statistical features and effectively identify highly interacting alarms to help improve alarm system configurations.

6. Conclusion

Analysis of correlated alarms is a commonly used strategy for alarm rationalization, such that alarm system performance could be improved by reducing redundant alarms. Therefore, this work proposed a systematic method to extract and analyze correlated alarms from historical Alarm & Event (A&E) logs. The proposed method consists of two main stages: (1) Pattern mining is performed to extract correlated alarms along with alarm occurrence orders as correlation patterns. To help with further analysis, pattern statistical features are calculated. (2) Graph visualization techniques are utilized to generate correlation networks, which could help prioritize alarm analysis and indicate process interactions. As a practical application, a correlation centrality metric is introduced to identify highly interacting alarms, which are alarms having frequent occurrences and strong correlations with other alarms. To demonstrate the effectiveness of the proposed method, case studies are provided based on the industrial benchmark Vinyl Acetate Monomer (VAM) plant model.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Allen, J. F., & Ferguson, G. (1994). Actions and events in interval temporal logic. Journal of Logic and Computation, 4(5), 531–579.
ANSI/ISA-18.2 (2016). ANSI/ISA-18.2: Management of alarm systems for the process industries. Durham, NC USA: International Society of Automation (ISA).
Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. In International AAAI conference on web and social media (pp. 361–362).
Bergquist, T., Ahnlund, J., & Larsson, J. E. (2003). Alarm reduction in industrial process control. In IEEE conference on emerging technologies and factory automation (pp. 58–65).
Blondel, V. D., Guillaume, J., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10).
Dorgo, G., & Abonyi, J. (2018). Sequence mining based alarm suppression. IEEE Access, 6, 15365–15379.
EEMUA-191 (2013). Alarm systems: A guide to design, management and procurement. London U.K.: Engineering Equipment and Materials Users' Association (EEMUA).
Fournier-Viger, P., Wu, C., Tseng, V. S., Cao, L., & Nkambou, R. (2015). Mining partially-ordered sequential rules common to multiple sequences. IEEE Transactions on Knowledge and Data Engineering, 27(8), 2203–2216.
Fruchterman, T. M., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software - Practice and Experience, 21(11), 1129–1164.
Gadár, L., & Abonyi, J. (2019). Frequent pattern mining in multidimensional organizational networks. Scientific Reports, 9(1).
Higuchi, F., Yamamoto, I., Takai, T., Noda, M., & Nishitani, H. (2009). Use of event correlation analysis to reduce number of alarms. Computer Aided Chemical Engineering, 27, 1521–1526.
Hollifield, B. R., & Habibi, E. (2011). Alarm management: A comprehensive guide: Practical and proven methods to optimize the performance of alarm management systems. Durham, NC USA: International Society of Automation (ISA).
Hu, W., Wang, J., & Chen, T. (2015). A new method to detect and quantify correlated alarms with occurrence delays. Computers & Chemical Engineering, 80, 189–198.
Jacobs, S. A., & Dagnino, A. (2016). Large-scale industrial alarm reduction and critical events mining using graph analytics on spark. In IEEE second international conference on big data computing service and applications (pp. 66–71).
Kong, X., Wei, Q., & Chen, G. (2010). An approach to discovering multi-temporal patterns and its application to financial databases. Information Sciences, 180(6), 873–885.
Li, T., & Li, X. (2010). Novel alarm correlation analysis system based on association rules mining in telecommunication networks. Information Sciences, 180(16), 2960–2978.
Machida, Y., Ootakara, S., Seki, H., Hashimoto, Y., Kano, M., Miyake, Y., Anzai, N., Sawai, M., Katsuno, T., & Omata, T. (2016). Vinyl acetate monomer (VAM) plant model: a new benchmark problem for control and operation study. IFAC-PapersOnLine, 49(7), 533–538.
Mah, R. S. (2013). Chemical process structures and information flows. Stoneham, MA USA: Butterworth Publishers, Department of Chemical Engineering, Northwestern University.
Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2).
Rao, H. R. M., Zhou, B., Chen, T., & Shah, S. L. (2022). Discovery of alarm correlations based on pattern mining and network analysis. In American control conference (pp. 2467–2472).
Rothenberg, D. H. (2009). Alarm management for process control: A best-practice guide for design, implementation, and use of industrial alarm systems. New York, NY USA: Momentum Press.
Salah, S., Maciá-Fernández, G., & Díaz-Verdejo, J. E. (2013). A model-based survey of alert correlation techniques. Computer Networks, 57(5), 1289–1317.


Singh, A., Chaudhary, M., Rana, A., & Dubey, G. (2011). Online mining of data to generate association rule mining in large databases. In International conference on recent trends in information systems (pp. 126–131).
Song, X., Liu, Q., Dong, M., Meng, Y., Qin, C., Zhao, D., Yin, F., & Jiu, J. (2022). Chemical process alarm root cause diagnosis method based on the combination of data-knowledge-driven method and time retrospective reasoning. ACS Omega, 7(24), 20886–20905.
Wang, J., & Chen, T. (2016). Main causes of long-standing alarms and their removal by dynamic state-based alarm systems. Journal of Loss Prevention in the Process Industries, 43, 106–119.
Wang, J., He, C., Liu, Y., Tian, G., Peng, I., Xing, J., Ruan, X., Xie, H., & Wang, F. L. (2017). Efficient alarm behavior analytics for telecom networks. Information Sciences, 402, 1–14.
Wang, J., Li, H., Huang, J., & Su, C. (2015). A data similarity based analysis to consequential alarms of industrial processes. Journal of Loss Prevention in the Process Industries, 35, 29–34.
Wang, J., Li, H., Huang, J., & Su, C. (2016). Association rules mining based analysis of consequential alarm sequences in chemical processes. Journal of Loss Prevention in the Process Industries, 41, 178–185.
Yang, G., Hu, W., Cao, W., & Wu, M. (2020). Simulating industrial alarm systems by extending the public model of a Vinyl Acetate Monomer process. In Chinese control conference (pp. 6093–6098).
Yang, F., Shah, S., Xiao, D., & Chen, T. (2012). Improved correlation analysis and visualization of industrial alarm data. ISA Transactions, 51(4), 499–506.
Yang, Z., Wang, J., & Chen, T. (2013). Detection of correlated alarms based on similarity coefficients of binary data. IEEE Transactions on Automation Science and Engineering, 10(4), 1014–1025.
Yang, B., Wang, H., Li, H., & He, Y. (2020). A novel detection of correlated alarms with delays based on improved block matching similarities. ISA Transactions, 98, 393–402.
Zhang, L., Chen, G., Brijs, T., & Zhang, X. (2008). Discovering during-temporal patterns (DTPs) in large temporal databases. Expert Systems with Applications, 34(2), 1178–1189.
Zhu, Q., Jin, C., He, Y., & Xu, Y. (2021). Pattern mining of alarm flood sequences using an improved prefixspan algorithm with tolerance to short-term order ambiguity. Industrial and Engineering Chemistry Research, 60(11), 4375–4384.
