1. Introduction
The requirements for safety in industrial production have become more rigorous with the rapid expansion of modern industry. Accurate fault monitoring, diagnosis, and treatment of industrial processes help maintain normal operation and prevent accidents from spreading. However, modern industrial systems are complex, with numerous subsystems and variables that are frequently dynamic, nonlinear, and highly correlated. Furthermore, due to changes in actual production requirements, industrial processes frequently contain multiple stable and transitional modes. These characteristics of complex industrial processes make monitoring and diagnosing faults more difficult.
In recent decades, data-driven methods [1] have been widely used in the field of industrial technology. This is because a data-driven method does not need an accurate mathematical model or much prior knowledge. Such methods mainly depend on a large amount of process data to analyze and monitor [2,3,4] the operation of the system or equipment under study. Currently, the mainstream research methods of fault monitoring and diagnosis technology primarily include data-driven multivariate statistics and novel machine learning or artificial intelligence methods. Aldrich and Auret provided a comprehensive review of unsupervised machine learning-based process monitoring methods [5]. Fan built a neural network structure, trained the autoencoder on offline normal data, and then used it for online fault detection [6].
Classical data-driven approaches include PCA [7], ICA [8], PLS [9], and other methods. Each of these methods has its own strengths and drawbacks. Traditional PCA is well suited to production process data that are linear and follow a Gaussian distribution, but it performs poorly for data that are strongly nonlinear, dynamic, and coupled. Therefore, many scholars have proposed improvements based on PCA. For example, Lee proposed KPCA to handle industrial process data with strong nonlinearity [10]. Ku considered the serial correlation of process data and proposed DPCA, which performs better for highly dynamic data [11]. Bakshi proposed MSPCA to solve the multiscale problem [12]. Harrou combined the multivariate exponentially weighted moving average (MEWMA) monitoring scheme with PCA modeling to improve anomaly detection performance [13].
Considering the importance of local data features represented by data neighborhood information, manifold learning was proposed to provide a novel perspective for preserving the local features of data [14]. According to this method, data are formed by mapping a low-dimensional manifold onto a high-dimensional space. As a result, the low-dimensional data can uniquely represent the original data. Extraction of low-dimensional manifolds from high-dimensional data is accomplished by first establishing a local reduced-dimensional mapping relationship and then attempting to generalize the local mapping relationship to the global one. Isomap [15], Laplacian eigenmaps (LE) [16], and locality-preserving projections (LPP) [17] are popular manifold learning techniques. Manifold learning has since been applied to industrial process monitoring. By using LPP in combination with PCA, Yu proposed a local and global principal component analysis method (LGPCA) [18]. Luo further revealed the relationship between LPP and PCA and proposed a novel dimensionality reduction algorithm, GLPP, which aims to preserve the global and local structure of the dataset by solving a biobjective optimization function [19]. Wu used PCA, LPP, and isometric feature mapping (Isomap) to fuse features extracted from vibration signals for fault diagnosis [20]. These methods have been reported to outperform PCA-based and LPP-based monitoring methods.
However, actual industrial process variables are highly dynamic and exhibit autocorrelation and intercorrelation. Traditional methods have difficulty modeling highly dynamic data effectively, which may result in false alarms in online monitoring. Moreover, due to changes in the input point of the working condition and in the underlying raw material, the operating state of the industrial process changes to varying degrees and thus exhibits several different modes. Most data-driven methods work well in a single stable mode but perform poorly on multimodal process data. To accurately model and monitor multimodal processes, several methods have been proposed and applied. There are two main ideas: (1) overall modeling [21,22,23], which uses the same model to describe different modes; and (2) multimodal modeling [24,25,26], which describes the process characteristics of each mode by building local models for the different stable modes. The goal of overall modeling is to build a model that describes the different structures of all modes, such as a global PCA model. However, this kind of method can degrade monitoring accuracy for some modes and may even cause false alarms. Multimodal modeling-based approaches model the different modes separately, and modeling individual modes is more accurate than overall modeling [27,28].
The main contributions of this paper are as follows. First, an offline mode identification method based on the variable-length sliding window-mean augmented Dickey–Fuller (VLSW-MADF) test is proposed. Commonly used offline mode identification methods classify modes based on the variation trend of each variable. The proposed method instead uses the stationarity of the data as the basis for identifying stable and transitional modes. Compared with other multimodal classification methods, the proposed method is more intuitive, and the starting position of transition modes can be determined more accurately. Second, this paper improves the traditional data-driven fault monitoring method and proposes a novel fault monitoring method, DLPPCA. Many scholars have studied the modeling of transition modes [29]; however, these methods often do not focus on the transition modes themselves. DLPPCA performs well on dynamic transition mode data, can accurately model and monitor both transition and stable modes, and is more suitable for modeling and monitoring multimodal processes. Finally, this paper proposes a novel and less computationally intensive online modal identification method. The traditional online modal identification method requires traversing all offline models [30], so the computational cost becomes excessive when the process has many modes. The modal identification method proposed in this paper is based on matching value calculation and uses an offline matching matrix with the same sample length as the online data, which reduces the computational effort and improves the accuracy at the same time.
The paper is organized as follows: Section 2 describes process monitoring based on DLPPCA. Section 3 describes the offline mode recognition and modeling steps based on the VLSW-MADF test and DLPPCA. Section 4 describes the online mode recognition and monitoring strategies proposed in this paper. Section 5 uses the Tennessee Eastman (TE) process and actual power plant data to verify the validity and correctness of the methods presented in this paper. Finally, the conclusion of this article is presented in Section 6. To avoid confusion among the many symbols, we have listed a nomenclature.
2. Process Monitoring Based on DLPPCA
PCA is a widely used data-driven method that performs well on data feature extraction tasks and is often applied for process monitoring in industrial practice [31,32,33,34,35]. However, this method often ignores the local structure underlying the data, resulting in the loss of potential information from such structures. Locality-preserving projection (LPP) is a manifold learning method that maintains the local structure of data and can restore a low-dimensional manifold structure from high-dimensional sample data [36,37,38]. At present, scholars use LPP in combination with PCA [39,40], but the statistical model established by this combination is static; that is, it assumes that the current process is time-invariant. In real industrial processes, process variables have dynamic characteristics of autocorrelation and cross-correlation. Since static PCA is unable to extract dynamic relationships from the data, autocorrelation and cross-correlation are mixed together, which makes it difficult for traditional PCA to reveal which types of relations exist among the measured variables. Direct application of traditional fault monitoring methods to dynamic data may lead to misleading results (real-time statistics exceeding real-time thresholds, resulting in false fault alarms). Therefore, the serial correlations of the process data must be considered to implement an efficient monitoring method.
To solve the above problems, a process monitoring method based on dynamic locality-preserving principal component analysis (DLPPCA) is presented. DLPPCA first constructs an extended matrix to associate adjacent sample points, which addresses the strong correlation among sample points in a dynamic process. LPP and PCA are then combined to extract the maximum variance information of the manifold structure. This algorithm not only addresses the difficulty that traditional data-driven methods have in modeling the strongly dynamic nature of industrial processes, but also compensates for the disadvantages of using PCA or LPP alone. The steps for DLPPCA are as follows:
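First, we assume that the sample set is $X \in \mathbb{R}^{m \times n}$ ($m$ is the number of variables, and $n$ is the number of samples) and that the sample set $X$ has been standardized.
The original sample set $X$ is dynamically expanded into a new matrix $X_d$ by adding the time lag values of the variables using the “time lag shift” method proposed by Ku [11]. The sample set:
$$X = [x_1, x_2, \ldots, x_n]$$
is expanded to:
$$X_d = \begin{bmatrix} x_{l+1} & x_{l+2} & \cdots & x_n \\ x_l & x_{l+1} & \cdots & x_{n-1} \\ \vdots & \vdots & & \vdots \\ x_1 & x_2 & \cdots & x_{n-l} \end{bmatrix}$$
where $l$ is the number of lags. It is selected by experience and should not be too large. Here, $[x_{l+1}^T, x_l^T, \ldots, x_1^T]^T$ is the first column of the dynamic expansion matrix and $x_t$ is the $m$-dimensional observation vector in the sample set at moment $t$.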
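The time lag shift described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `time_lag_expand` and its arguments are hypothetical, and each expanded sample is returned as one list (one list per time step) instead of as a matrix column.

```python
def time_lag_expand(samples, lags):
    """Stack each observation with its `lags` predecessors.

    samples: list of observation vectors, one list of m values per time step.
    Returns one expanded vector per time step t = lags .. n-1, ordered as
    [x_t, x_{t-1}, ..., x_{t-lags}], so each has m * (lags + 1) entries.
    """
    if lags < 0 or lags >= len(samples):
        raise ValueError("lags must satisfy 0 <= lags < number of samples")
    expanded = []
    for t in range(lags, len(samples)):
        stacked = []
        for shift in range(lags + 1):
            stacked.extend(samples[t - shift])  # current value first, then older ones
        expanded.append(stacked)
    return expanded


# Hypothetical data: 4 samples of 2 variables, expanded with one lag.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
x_d = time_lag_expand(x, lags=1)
print(len(x_d), len(x_d[0]))  # 3 4
```

With $n$ samples and $l$ lags, the expansion keeps $n - l$ samples, each of dimension $m(l+1)$, which is why the number of lags should stay small relative to the data length.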
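Next, the low-dimensional manifold structure of the data is extracted. The goal of manifold feature extraction is to find a projection matrix $A$ such that the extracted low-dimensional manifold $Y = A^T X_d$ retains a local structure similar to that of the expanded matrix $X_d$. Then, we have the following objective function:
$$\min_A \sum_{i,j} \lVert y_i - y_j \rVert^2 W_{ij} \qquad (4)$$
where $y_i$ is the $i$-th column of $Y$, $W$ is a relational matrix whose entry $W_{ij}$ weights the neighborhood relation between samples $i$ and $j$, and $D$ is a diagonal matrix of the same size. The diagonal elements are $D_{ii} = \sum_j W_{ij}$ and can be used to indicate the importance of each sample. To ensure that the objective function is solvable, a restriction $a^T X_d D X_d^T a = 1$ (for a single projection vector $a$) or $A^T X_d D X_d^T A = I$ needs to be added. We define a Laplacian matrix $L$: $L = D - W$. Equation (4) is converted to an optimization problem, as shown below:
$$\min_A \operatorname{tr}\left(A^T X_d L X_d^T A\right) \quad \text{s.t.} \quad A^T X_d D X_d^T A = I \qquad (6)$$
Equation (6) is equivalent to solving the generalized eigenvalue problem shown below:
$$X_d L X_d^T a = \lambda X_d D X_d^T a$$
The projection matrix $A$ is composed of the eigenvectors corresponding to the smallest generalized eigenvalues. Thus, the extracted low-dimensional manifold $Y$ is obtained.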
Finally, the principal components of manifold $Y$ are extracted by PCA. The covariance matrix $\Sigma$ of the low-dimensional manifold $Y$ is as follows:
$$\Sigma = \frac{1}{N-1} Y Y^T$$
where $N$ is the number of columns (samples) of $Y$. Eigenvalue decomposition is performed on the resulting covariance matrix as follows:
$$\Sigma = P \Lambda P^T$$
The projection matrix $P_k$ is obtained by taking the eigenvectors corresponding to the $k$ largest eigenvalues. Finally, the feature data extracted by DLPPCA are obtained:
$$T = P_k^T Y = P_k^T A^T X_d \qquad (10)$$
The feature data from Equation (10) are called the matching matrix. This matching matrix will be used later for online modal recognition.
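The two-stage feature extraction (LPP followed by PCA) can be sketched with NumPy as below. This is an illustrative sketch, not the authors' code: the function name `dlppca_features`, the k-nearest-neighbor heat-kernel weights, the kernel width, the small ridge term added for numerical stability, and the explicit mean-centering before PCA are all assumptions introduced here. The input is taken to be an already lag-expanded, standardized matrix with variables in rows and samples in columns.

```python
import numpy as np


def dlppca_features(x_d, n_neighbors=5, lpp_dim=3, pca_dim=2, ridge=1e-8):
    """Return (features, lpp_projection, pca_projection) for x_d of shape (features, samples)."""
    n_feat, n_samp = x_d.shape
    # Pairwise squared distances between samples (columns).
    sq = np.sum(x_d ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x_d.T @ x_d), 0.0)
    # Assumed relational matrix: heat-kernel weights on a symmetric k-NN graph.
    sigma2 = dist2.mean() + 1e-12
    w = np.zeros((n_samp, n_samp))
    for i in range(n_samp):
        nearest = [j for j in np.argsort(dist2[i]) if j != i][:n_neighbors]
        w[i, nearest] = np.exp(-dist2[i, nearest] / sigma2)
    w = np.maximum(w, w.T)
    d = np.diag(w.sum(axis=1))
    lap = d - w  # Laplacian matrix
    a_mat = x_d @ lap @ x_d.T
    b_mat = x_d @ d @ x_d.T + ridge * np.eye(n_feat)
    # Generalized eigenproblem a_mat v = lambda * b_mat v, solved by Cholesky whitening.
    chol_inv = np.linalg.inv(np.linalg.cholesky(b_mat))
    _, vecs = np.linalg.eigh(chol_inv @ a_mat @ chol_inv.T)  # ascending eigenvalues
    proj_lpp = chol_inv.T @ vecs[:, :lpp_dim]  # smallest generalized eigenvalues
    y = proj_lpp.T @ x_d
    # PCA on the extracted manifold (centered here as a precaution).
    y = y - y.mean(axis=1, keepdims=True)
    cov = y @ y.T / (n_samp - 1)
    evals, evecs = np.linalg.eigh(cov)
    top = np.argsort(evals)[::-1][:pca_dim]  # largest eigenvalues first
    proj_pca = evecs[:, top]
    return proj_pca.T @ y, proj_lpp, proj_pca


# Hypothetical random data standing in for an expanded sample set.
rng = np.random.default_rng(0)
x_d = rng.standard_normal((6, 40))
features, proj_lpp, proj_pca = dlppca_features(x_d)
print(features.shape)  # (2, 40)
```

The returned `features` array plays the role of the matching matrix: one column per (expanded) sample, so it can later be truncated along the sampling direction.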
The above derivation explains the basic principles of DLPPCA. Statistics and confidence limits also need to be built to monitor an industrial process. This paper uses the $T^2$ and squared prediction error (SPE) statistics. Among them, the $T^2$ statistic represents the fluctuations of the model variables, and the SPE statistic measures the goodness of fit of the constructed model. Once the statistics of the online data exceed the corresponding confidence limits calculated from the normal offline data, the current process is considered to be faulty.
The
statistic is as follows:
where
and
is a diagonal matrix composed of the largest
in
.
The
statistic obeys the F distribution, so the confidence limit for
is:
The
statistic is as follows:
The confidence limit for the
is:
where
represents the confidence level, which is 0.99 for this article.
3. Offline Mode Identification Based on the VLSW-MADF Test and Modeling
Multimodal processes contain different stable and transition modes [41]. The stable mode mentioned in this paper refers to an industrial production process that remains in a steady working condition for a period of time; for most of that time, the process data fluctuate around a stable central level. The transition (nonstationary) mode is the state of the production process when it moves from one operating condition to another. The process data in this time series tend to have a clear upward or downward trend. Moreover, nonstationary time series can often be differenced to form stationary series.
Notably, the sampled data of a multimodal process are essentially a time series. Therefore, from the perspective of data stationarity, the process data can be regarded as stationary in a stable mode and as nonstationary in a transition mode. From this point of view, this paper presents a method of pattern recognition based on the VLSW-MADF test. The method uses the stationarity of the process data as the basis for the identification of stable and transitional modes. Compared with other multimodal identification methods, the method presented in this paper starts with the intrinsic characteristics of the given data, which is more intuitive and less difficult to implement.
3.1. ADF Test
The augmented Dickey–Fuller (ADF) test is a stationarity test method that is widely used in the field of economics [42,43,44]. However, to the authors’ knowledge, the ADF test has not yet been applied in the field of process monitoring or fault diagnosis. This method makes a stationarity judgment by determining whether there is a unit root in the current data: if there is no unit root, the data are stationary; if there is a unit root, the data are nonstationary. The specific process of the ADF test is as follows:
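Assume that we have a time series denoted by $\{y_t\}$, $t = 1, \ldots, n$ ($n$ is the number of samples). The ADF test can be completed by validating the following three models after taking the first-order difference $\Delta y_t = y_t - y_{t-1}$:
Model 1:
$$\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \qquad (17)$$
Model 2:
$$\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \qquad (18)$$
Model 3:
$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \qquad (19)$$
where $t$ is the time index, $\alpha$ is an intercept constant called a drift, $\beta$ is the coefficient on a time trend, $\beta t$ is a trend term, $\gamma$ is the coefficient presenting process root, $\delta_i$ are the coefficients of the $p$ lagged difference terms, and $\varepsilon_t$ is a white noise sequence.
The assumptions presented are as follows:
Hypothesis 0 (H0). $\gamma = 0$ (a unit root exists).
Hypothesis 1 (H1). $\gamma < 0$ (no unit root exists).
This test is performed by calculating the $\tau$-statistic for each model:
$$\tau = \frac{\hat{\gamma}}{SE(\hat{\gamma})} \qquad (20)$$
where $\hat{\gamma}$ is the estimated value of $\gamma$ and $SE(\hat{\gamma})$ is the standard error.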
By consulting the ADF critical value table, if the obtained test statistic is less than the critical values at the three significance levels (10%, 5%, and 1%), the null hypothesis is rejected with 90%, 95%, and 99% confidence, respectively. If the statistic is greater than or equal to the critical value, the null hypothesis cannot be rejected and the current data are considered nonstationary. If the statistic is less than the critical value, the current data are stationary.
Since it is not known in the actual test which model the data being tested conform to at this time, the ADF test first checks model 3 (Equation (19)), and then it checks model 2 (Equation (18)) and model 1 (Equation (17)) in turn. If the null hypothesis is rejected, the test stops; otherwise, the test continues. That is, when none of the three models can reject the null hypothesis, the time series tested is considered nonstationary, and if one model rejects the null hypothesis, the time series is considered stationary.
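The sequential checking order described above (model 3, then model 2, then model 1, stopping at the first rejection) can be sketched as follows. The function name `adf_sequential_decision` is hypothetical, and the numbers in the example are illustrative placeholders, not values taken from an ADF critical value table; in practice the statistics would come from fitting Equations (17)–(19).

```python
def adf_sequential_decision(statistics, critical_values):
    """Check model 3, then model 2, then model 1.

    statistics / critical_values: dicts keyed by model number (1, 2, 3).
    Returns (is_stationary, rejecting_model); rejecting_model is None when
    no model rejects the unit-root null hypothesis.
    """
    for model in (3, 2, 1):
        # The null hypothesis is rejected when the statistic falls below the critical value.
        if statistics[model] < critical_values[model]:
            return True, model
    return False, None


# Placeholder numbers for illustration only.
stats = {3: -2.1, 2: -3.4, 1: -0.5}
crit = {3: -3.45, 2: -2.89, 1: -1.94}
print(adf_sequential_decision(stats, crit))  # (True, 2)
```

Stopping at the first rejection means a window is declared nonstationary only when all three model forms fail to reject the unit root.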
3.2. Mode Identification Based on the VLSW-MADF Test
The traditional ADF test introduced in Section 3.1 can only test the stationarity of a single variable. The production data from actual industrial processes are multivariable. As a result, the ADF test cannot be directly applied to industrial process data. However, pattern recognition requires recognizing a pattern according to the changing trend of the variables. If we can extract a single variable that represents the fluctuation trend of the multivariable industrial data, we can carry out pattern recognition through a stationarity analysis of that single variable. Such a variable is called the trend variable of the process. This paper proposes using mean value processing to extract the trend variable of a given process; the ADF test applied to this mean-based trend variable is referred to as the MADF test. We find that when the process is in a stable mode, the mean value of each sampling point also remains relatively stable; when the process is in a transitional mode, the mean value of each sampling point fluctuates accordingly. The trend variable extracted by mean processing can accurately reflect the change trend of the process data. The method for extracting the trend variable is described in detail in the second half of this section.
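Mean value processing as described in this paragraph reduces each multivariable sample to a single number. A minimal sketch, assuming the data are stored as one list of variable values per sampling point (the function name `trend_variable` and the example values are hypothetical):

```python
def trend_variable(data):
    """Return the per-sample mean across variables.

    data: list of samples, each a list with one value per process variable.
    """
    return [sum(sample) / len(sample) for sample in data]


# Hypothetical data: the last sample departs from the level of the first two.
data = [[1.0, 3.0], [2.0, 4.0], [10.0, 20.0]]
print(trend_variable(data))  # [2.0, 3.0, 15.0]
```

Because the result is a single univariate series, a univariate stationarity test can then be applied to it directly.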
However, the MADF test alone cannot realize mode identification for process data. If we apply the MADF test directly to a whole dataset containing multiple modes without distinguishing between them, we will obtain incorrect test results that cannot reflect when the process enters a new mode. Only after the process data are divided into segments can mode identification be realized by the MADF test. The result of mode identification is closely related to the length of the selected partition. Therefore, this paper combines the MADF test with a variable-length sliding window and proposes the VLSW-MADF test for modal identification. The approximate framework of the method is as follows: First, a longer window is used to divide the trend variable of the given data, and the ADF test is used for rough mode identification, which roughly distinguishes stable modes from transition modes. Then, a shorter window is used to divide the trend variable, and the ADF test is used for detailed mode identification, which determines the beginning and end of each transition mode. Finally, mode identification is realized for the multimodal process.
We provide the user with a criterion for selecting the hyperparameters of the proposed offline method. The shorter window length should be chosen according to the length of the shortest transition mode of the current process. According to experience with multivariate statistical regression modeling, each window should contain at least 2–3 times as many samples as there are variables in order to achieve effective statistical feature extraction. The longer window length should be chosen according to the length of the shortest stable mode of the current process. Moreover, the longer window length should be at least two times the shorter window length.
The above procedure is the VLSW-MADF test proposed in this paper. It is worth noting that the final modal recognition results are the same whether mean processing is applied before or after sliding window partitioning. However, when mean processing is applied after sliding window partitioning, the mean must be computed many times. In the VLSW-MADF test proposed in this paper, mean value processing is therefore applied before sliding window partitioning to reduce the number of required calculation steps. That is, sliding window partitioning and the ADF test are carried out directly on the trend variable.
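More detailed steps are as follows. It is assumed that we have multimodal process data $X \in \mathbb{R}^{n \times m}$ ($n$ is the number of samples, and $m$ is the number of variables). The mean value of the sample point data $x_i = [x_{i1}, x_{i2}, \ldots, x_{im}]$ is calculated to obtain:
$$\bar{x}_i = \frac{1}{m} \sum_{j=1}^{m} x_{ij} \qquad (21)$$
By summarizing the results of Equation (21), the trend variable of the process is finally obtained as follows:
$$\bar{x} = [\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n]^T \qquad (22)$$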
Next, rough mode identification is performed: the trend variable is segmented along the sampling direction using the longer sliding window. In this paper, we choose the window lengths based on the aforementioned criterion and the characteristics of the actual process. In the two numerical simulation cases of this paper, the shorter window length is set to 50 and the longer window length is set to 100. After cutting, we obtain a series of windows. The ADF test in Section 3.1 is used to test the stationarity of the data in each window. Finally, a stationarity matrix is obtained as follows:
where
.
The nonstationarity window data correspond to processes in a transition mode, while the stationarity window data correspond to processes in a steady mode. The resulting stationarity matrix shows the result of rough mode identification for a multimodal process. However, at this point, we can only obtain a rough idea of whether the process corresponding to each segment of data is in a stable mode or a transitional mode. To achieve pattern recognition, it is necessary to further determine the positions of the beginning and end of each mode. Therefore, detailed mode identification is also needed.
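Rough mode identification can be sketched as a loop over windows of the trend variable. The sketch below rests on several assumptions that are not stated in the text: windows are taken as consecutive and non-overlapping, a flag of 1 marks a stationary window and 0 a nonstationary one, and the stationarity check is passed in as a function. The toy range-based check used in the example is only a stand-in for the ADF test, and all names are hypothetical.

```python
def rough_mode_identification(trend, window, is_stationary):
    """Flag each consecutive window of `trend`: 1 = stationary, 0 = nonstationary."""
    flags = []
    for start in range(0, len(trend) - window + 1, window):
        segment = trend[start:start + window]
        flags.append(1 if is_stationary(segment) else 0)
    return flags


def nonstationary_runs(flags):
    """Return (first, last) window indices for each run of consecutive zeros."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag == 0 and start is None:
            start = i
        elif flag == 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))
    return runs


def toy_is_stationary(segment):
    # Placeholder check (NOT the ADF test): small range means "stationary".
    return max(segment) - min(segment) < 1.0


trend = [0.0, 0.1, 0.0, 0.1, 0.5, 1.5, 2.5, 3.5, 4.0, 4.1, 4.0, 4.1]
flags = rough_mode_identification(trend, window=4, is_stationary=toy_is_stationary)
print(flags, nonstationary_runs(flags))  # [1, 0, 1] [(1, 1)]
```

Each run of zeros, together with its neighboring windows, marks the stretch of data that would be re-examined with the shorter window in detailed mode identification.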
Detailed mode identification: Based on the results of rough mode identification, for the previous window entering the nonstationary state to the next window ending the nonstationary state mode, the shorter window is used for pattern recognition and stability testing. For example, assuming that the continuous values from to in the stationary matrix are all zero, the sample dataset that needs to be reidentified and retested is .
In this step, we need to choose a shorter window length
than
based on the aforementioned criterion and the characteristics of the actual process. We choose the window length
. Similar to the matrix obtained via the previous steps of rough mode identification, the final stationarity matrix is as follows:
The starting point of the transition mode can be judged more accurately from this detailed stationarity matrix than from the rough one. By analogy, the other transition modes in the rough pattern recognition results are determined in the same way, and finally, pattern recognition is realized for the multimodal process. The detailed steps of the VLSW-MADF test are shown in Figure 1.
3.3. Offline Modeling
The main idea of this method is to first identify the modes and then model each one separately. In the offline modeling stage, the transition modes and stable modes obtained after pattern recognition are modeled individually. As stated in Section 2, the process monitoring method used in this paper is DLPPCA. This method performs well on dynamic data and can accurately model transition modes with large variation ranges and strong dynamics. Although the process data in a stable mode fluctuate around a stable central level most of the time, the serial correlation of the stable mode process data must still be considered. DLPPCA is therefore equally applicable to offline modeling of both stable and transition modes, and this paper uses it to model transition modes and stable modes separately based on the mode identification results. Algorithm 1 shows the offline modeling phase algorithm.
Algorithm 1. Offline modeling phase
|
Step 1: Input multimodal process data ; |
Step 2: Calculate trend variables by (21) and (22); |
Step 3: Divide through a window of length obtaining through ADF test; |
Step 4: Further divide the shorter window into obtaining through ADF test; |
Step 5: Obtain by step 3 and step 4, and model them separately using DLPPCA, saving and ; |
Step 6: Similar to Step 5, obtain by step 3 and step 4, and model them separately using DLPPCA, saving and . |
The specific steps are as follows:
Step 1: We acquire multimodal process data .
Step 2: The mean value of the training dataset is calculated to obtain the trend variables of the process .
Step 3: The VLSW-MADF test is used to test the stability of determine the starting position of each mode, and complete the pattern recognition task.
Step 4: According to the results of mode division obtained in the previous step, the training data are divided into several subsegments. The stable mode subsegments are and the transition mode subsegments are .
Step 5: DLPPCA is used to model each transition mode subblock. Taking one transition subsegment as an example, following Section 2, the confidence limits of the two monitoring statistics are obtained by modeling the subsegment with DLPPCA, and the final feature extraction result is obtained. This feature extraction result is also called the matching matrix. The matching matrix and the confidence limits are both saved.
Step 6: Similar to Step 5, DLPPCA is also used to model the stable mode subblocks. Taking one stable subsegment as an example, DLPPCA is used to model it and obtain the confidence limits of the two monitoring statistics. In the subsequent online mode identification step, it is not necessary to use the matching matrices of the stable modes, so only the confidence limits need to be saved here. A flowchart of the offline modeling is shown in Figure 2.
It is worth noting that this paper only takes a portion of a process dataset as an example to illustrate the steps of offline mode identification and modeling. In practical applications, however, modeling all the modes offline often requires identifying and modeling multiple segments of process data. In particular, suppose a process contains stable mode A and stable mode B. The transition from stable mode A to stable mode B (transition mode AB) and the transition from stable mode B to stable mode A (transition mode BA) are two different transition modes, and the change trends of their related characteristics are also different, so it is necessary to establish separate transition models for them. In other words, for stable modes A and B, transition modes AB and BA are not interchangeable.
4. Online Mode Identification and Monitoring Algorithm
In the last section, offline mode identification and modeling were completed. When conducting online monitoring for multimodal processes, it is also necessary to identify the current online process data. Only by determining which mode the current process belongs to can the appropriate offline model for subsequent online monitoring be selected. If the judgment is wrong, it may result in false alarms. Therefore, when online monitoring is performed, it is necessary to carry out mode identification first and then statistical monitoring.
For online mode identification, researchers have proposed several methods, such as the minimum principle, which involves traversing all models and selecting the model with the lowest monitoring statistic. Another approach is the probability monitoring method, in which the online samples are assumed to come from each process with a certain probability, and all offline models are used for joint detection with a certain probability.
However, all offline models need to be considered in the above methods. When there are too many modes in the examined process, the number of calculations is too large. When the process corresponding to the online data first enters the transition mode, the amount of data is small, and the data characteristics are different from those of the whole transition dataset. If the offline model derived from the whole transition dataset is used to match the online data, mode identification errors easily occur.
Therefore, considering the above problems, a new online mode identification method is proposed. The method proposed in this paper does not need to identify all online data but instead discusses them in different situations, and mode identification is performed only in certain situations. The proposed mode identification method is based on the calculation of matching values. An offline matching matrix with the same sample length as that of the online dataset is used for mode identification instead of using the whole transition dataset for matching, thereby improving the accuracy of the method.
It should be noted that, if the current sampling time selected for online operation is $k$, reliable and accurate conclusions cannot be obtained if the online modal recognition process depends only on the result of the single sample point at time $k$. Therefore, online mode identification is performed by combining the recognition results of $\omega$ consecutive online samples, that is, the $\omega$ consecutive samples ending at the $k$-th sample. Algorithm 2 shows the online monitoring phase algorithm. Based on the above premises, the detailed steps of the online monitoring method proposed in this paper are as follows (the proposed online mode identification method is used in Step 4).
Algorithm 2. Online monitoring phase.
|
Step 1: Input online process data ; |
Step 2: Determine the mode of the starting phase by minimum ; |
Step 3: Monitor the current continuous data from to using an offline model corresponding to time data;
Situation 1: Below the control limit.
The current process mode is the same as the previous one.
Situation 2: Exceeding the control limit.
Situation 2.1: The current process has a fault situation.
Situation 2.2: The current process enters a new mode.
Situation 2.2.1: The process is in transition mode at the previous moment.
Situation 2.2.2: The process is in stable mode at the previous moment. |
Step 4: For situation 2.2.2, calculate matching value for online modal recognition. |
Step 5: Use the model determined in step 4 to remonitor. If it is below the control limits, the currently selected model matches the actual mode. If it exceeds the control limits, a fault has occurred. |
Step 1: Determining the model for the starting stage.
For the initial phase of a process, since there are no data from the sample points at the previous moment to use as references, the corresponding model for the starting phase needs to be determined. Here, the minimum principle is used: for the initial data segment, the online data are monitored in turn using the models of the known historical stable modes, and the model giving the lowest monitoring statistic for the online samples is selected for monitoring.
Step 2: Trial monitoring of online process data:
When $\omega$ consecutive data points are available, the offline model from the previous moment is used for detection: the current $\omega$ ($\omega > 10$) consecutive samples ending at time $k$ are monitored using the offline model that corresponds to the data at time $k - \omega$.
Step 3: Analyzing the monitoring test results.
There are several possibilities for monitoring the test results obtained in Step 2. If the current process data statistic is below the control limit, it means that the process data ω and (k − ω) correspond to the same mode of the process. If the current process data statistic exceeds the control limit, the mode changes at the time when the control limit is exceeded. There are two possibilities for change.
The current process has a fault situation;
The current process enters a new mode. There are also two possibilities for entering a new modal process, from transition mode to stable mode or from stable mode to transition mode.
It should be noted that, to avoid false alarms, this paper considers a process abnormal only when a number of consecutive samples are beyond the control limit, rather than relying only on the identification result of a single sampling point.
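The consecutive-sample rule above can be sketched as a simple run counter. The function name `sustained_violation`, the threshold, and the required run length in the example are hypothetical; the paper does not give this code.

```python
def sustained_violation(statistics, limit, min_consecutive):
    """Return the index at which `min_consecutive` consecutive values above
    `limit` is first reached, or None if no such run occurs."""
    run = 0
    for i, value in enumerate(statistics):
        run = run + 1 if value > limit else 0  # reset on any in-control sample
        if run >= min_consecutive:
            return i
    return None


# An isolated spike at index 1 is ignored; the run at indices 3-5 is flagged.
values = [0.5, 1.2, 0.4, 1.3, 1.4, 1.5, 0.2]
print(sustained_violation(values, limit=1.0, min_consecutive=3))  # 5
```

Requiring a run of violations trades a short detection delay for fewer false alarms caused by single outlying samples.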
Step 4: Online modal identification.
If the current process data statistics are beyond the confidence limit, it is necessary to judge whether the current process has a fault or has entered a new mode.
First, we assume that the current data enter a new mode.
Situation 2.2.1: If the process was in a transition mode at the previous moment (k − ω), the current process enters the stable mode that follows that transition mode. This stable mode is selected as the monitoring mode, and no mode identification is required.
Situation 2.2.2: If the process was in a stable mode at the previous moment (k − ω), the current process enters a transition mode that adjoins this stable mode. Modal identification is required. However, only the transition modes adjoining that stable mode need to be considered, not all transition modes.
In online mode identification, a method based on matching value calculation is presented in this paper.
Consider all possible historical transition modes. According to Section 3.3, a matching matrix can be obtained for each historical transition mode. The online data are also modeled by DLPPCA, which gives the online matching matrix. The historical matching matrix comes from feature extraction over the whole transition dataset, whereas the online matching matrix is derived only from the current data points. Transition data are highly volatile and vary over a large range, so the features of the online matching matrix differ from those of the historical matching matrix, and direct matching is prone to errors. Therefore, to match accurately, the historical matching matrix is shortened: it is truncated along the direction of the sampling points, and only its leading column vectors are taken. Because the data used for online mode identification are considered to have just entered the transition mode, it is reasonable to select the leading column vectors of the historical matching matrix.
Next, the matching value between the truncated historical matching matrix of each candidate transition mode and the online matching matrix is calculated in turn. In this paper, the matching value is based on the Euclidean distance: the similarity of the two matrices is measured by summing the distances between their corresponding column vectors. The smaller the total distance, the smaller the matching value and the more similar the two matrices. The specific procedure is as follows:
First, the vector of Euclidean distances between the corresponding column vectors of the two matrices is calculated. The elements of this distance vector are then summed to obtain the matching value. The transition mode corresponding to the minimum matching value is selected as the monitoring model.
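The matching-value calculation described above can be sketched as follows. This is an illustrative sketch only: the function names, the representation of a matrix as a list of rows, and the choice to truncate the historical matrix to the number of columns of the online matrix are assumptions made here, and the small matrices are invented values rather than matching matrices from the experiments.

```python
import math


def matching_value(hist, online):
    """Sum of Euclidean distances between corresponding column vectors.

    Matrices are lists of rows. Only the first len(online[0]) columns of
    `hist` are used (assumed truncation rule).
    """
    n_cols = len(online[0])
    total = 0.0
    for j in range(n_cols):
        # Euclidean distance between column j of hist and column j of online.
        total += math.sqrt(
            sum((h_row[j] - o_row[j]) ** 2 for h_row, o_row in zip(hist, online))
        )
    return total


def select_transition_mode(candidates, online):
    """Return the candidate with the smallest matching value, plus all values."""
    values = {name: matching_value(m, online) for name, m in candidates.items()}
    return min(values, key=values.get), values


online = [[1.0, 2.0], [1.0, 2.0]]
candidates = {
    "BA": [[5.0, 6.0, 7.0], [5.0, 6.0, 7.0]],
    "BC": [[1.0, 2.0, 3.0], [1.0, 3.0, 3.0]],
}
best, values = select_transition_mode(candidates, online)
print(best)  # BC
```

Summing column-wise distances, rather than taking a single matrix norm, keeps each sampling instant's contribution separate, which matches the column-by-column comparison described in the text.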
Step 5: Remonitoring.
The same consecutive process data points are monitored again using the monitoring model determined in Step 4, and the monitoring results are analyzed again. If the current process statistics are below the confidence limit, the selected model matches the actual mode and can continue to be used for process monitoring. If the current data still exceed the confidence limit, a fault has occurred. A flowchart of the online mode identification and monitoring algorithm is shown in Figure 3. To avoid confusion among the many symbols, a nomenclature is provided in the Nomenclature section.
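The decision logic of Steps 3 to 5 can be summarized in a compact sketch. All names and interfaces here are hypothetical: `exceeds`, `candidates_for`, and `matching_value` are placeholder callables standing in for the monitoring and matching procedures, and the string labels used as windows are toy inputs.

```python
def identify_online_mode(window, prev_mode, exceeds, candidates_for, matching_value):
    """Return (decision, mode) for one window of online samples.

    exceeds(mode, window)        -> True if the statistics under `mode` stay above the limit
    candidates_for(prev_mode)    -> modes that can follow `prev_mode`
    matching_value(mode, window) -> smaller means a better match
    """
    if not exceeds(prev_mode, window):
        return "same_mode", prev_mode      # Step 3: no mode change
    options = candidates_for(prev_mode)
    if not options:
        return "fault", prev_mode
    # Step 4: pick the candidate with the minimum matching value.
    best = min(options, key=lambda m: matching_value(m, window))
    # Step 5: remonitor with the selected model.
    if exceeds(best, window):
        return "fault", prev_mode          # keep monitoring with the previous model
    return "new_mode", best


# Toy stubs (assumed values, for illustration only).
over_limit = {
    ("B", "normal"): False,
    ("B", "drift"): True, ("BC", "drift"): False, ("BA", "drift"): True,
    ("B", "broken"): True, ("BC", "broken"): True, ("BA", "broken"): True,
}
match = {"BA": 9.0, "BC": 2.0}


def decide(window):
    return identify_online_mode(
        window, "B",
        exceeds=lambda mode, w: over_limit[(mode, w)],
        candidates_for=lambda mode: ["BA", "BC"],
        matching_value=lambda mode, w: match[mode],
    )


print(decide("normal"), decide("drift"), decide("broken"))
```

When the previous mode is a transition mode, `candidates_for` would return only the stable mode that follows it, so the matching step reduces to a single candidate.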
5. Application and Results
The previous section described the proposed method in detail. In this section, two numerical simulation cases are used to verify its effectiveness. The first case is based on the TE process: multimodal data are generated under normal operating conditions by adjusting the operating points of the TE process. The second case is based on data from a power plant generating unit, from which multimodal process data are also simulated. The validity and feasibility of the methods presented in this paper are verified from the following four perspectives:
The presented offline mode identification method based on the VLSW-MADF test is accurate and feasible;
The online mode identification method proposed in this paper is accurate and feasible;
Transition modes are more accurately modeled and monitored using DLPPCA than with other approaches;
Modeling stable modes and transition modes separately can improve the accuracy of online monitoring.
To evaluate the proposed method, the fault detection rate (FDR), false alarm rate (FAR), missed alarm rate (MAR), and detection delay (DD) are considered. These metrics quantify the method's performance in the two numerical simulation cases that follow. FAR measures the probability of a false alarm, that is, an indication of a fault when no fault has occurred. FDR measures the probability of successful fault detection, that is, an indication of a fault when a fault has occurred. MAR measures the probability of a missed alarm, that is, a fault that occurs but is not detected. DD is the time between the start of a fault and its detection. A larger FDR is better, whereas smaller values are better for the other three indicators. The formulae for calculating the FDR, FAR, and MAR indicators are as follows.
In these formulae, the current data statistic value is compared with the control limit.
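As an illustration of how these four indicators follow from the verbal definitions above, a minimal sketch is given below. It assumes that an alarm is raised whenever a sample's statistic exceeds the control limit and that DD is counted in samples from the fault onset to the first alarm inside the fault interval; the function name, argument names, and data are hypothetical.

```python
def monitoring_metrics(stats, limit, fault_flags):
    """Compute FDR, FAR, MAR and detection delay (in samples).

    stats       : monitoring statistic per sample
    limit       : control limit
    fault_flags : True where a fault is actually present
    """
    alarms = [s > limit for s in stats]
    n_fault = sum(fault_flags)
    n_normal = len(fault_flags) - n_fault
    detected = sum(a and f for a, f in zip(alarms, fault_flags))
    false_alarms = sum(a and not f for a, f in zip(alarms, fault_flags))
    fdr = detected / n_fault if n_fault else float("nan")
    far = false_alarms / n_normal if n_normal else float("nan")
    mar = 1.0 - fdr  # every faulty sample is either detected or missed
    fault_start = fault_flags.index(True) if n_fault else None
    first_hit = next(
        (i for i, (a, f) in enumerate(zip(alarms, fault_flags)) if a and f), None
    )
    delay = None if fault_start is None or first_hit is None else first_hit - fault_start
    return {"FDR": fdr, "FAR": far, "MAR": mar, "DD": delay}


stats = [0.5, 1.2, 0.4, 0.6, 1.5, 1.8, 0.9, 2.0]          # illustrative values
faults = [False, False, False, True, True, True, True, True]
m = monitoring_metrics(stats, limit=1.0, fault_flags=faults)
print(m)
```

In this toy example three of the five faulty samples raise an alarm, one of the three normal samples raises a false alarm, and the first alarm inside the fault interval arrives one sample after the fault starts.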
5.1. TE Process
The TE process is a simulation based on a real industrial process [45,46,47]. Its operating points can be adjusted to meet production requirements, which makes it possible to generate multimodal data. This paper considers a 160 h multimodal process. At the 50th hour, the values of the production setpoint, sep level setpoint, and steam valve position are changed so that the TE process transitions from stable mode A to stable mode B. At 90 h, the values of the production setpoint, sep level setpoint, steam valve position, mole%g setpoint, and yA setpoint are changed again so that the TE process transitions from stable mode B to stable mode C.
Finally, multimodal process data with a total of 1600 sample points were obtained. This process includes stable mode A, stable mode B, stable mode C, transition mode AB, and transition mode BC. There are 53 variables in the TE process. Eight continuous process variables are selected to validate the proposed method. These eight variables are listed in Table 1. The change curves of the eight variables of the simulation data under normal working conditions are shown in Figure 4.
The multimodal process dataset consisting of these eight variables is first segmented based on the VLSW-MADF test. The trend variable is derived from Formulas (21) and (22), and its change curve is shown in Figure 5. Notably, the trend variable follows the mode changes of the original process: the mode changes at the 50th and 90th hours and transitions to a new mode each time. This indicates that the trend variable can represent the trend of the multivariable process data.
After the trend variable is obtained, rough mode identification is performed using a long window, which yields a stationarity matrix. According to this matrix, the process starts in a stable mode, then becomes nonstationary and enters a transition mode, returns to a stable mode, enters a transition mode again, and remains in a stable mode in the final window. To resolve the details of the mode transitions, a finer mode identification step is then needed to determine the exact starting position of each transition mode. Two small stationarity matrices are obtained by dividing the data of the two nonstationary windows using a shorter window.
Finally, the results of the VLSW-MADF test are obtained. Data points 1–500 are in stable mode A, data points 500–650 are in transition mode AB, data points 650–950 are in stable mode B, data points 950–1100 are in transition mode BC, and data points 1100–1600 are in stable mode C. These results are consistent with the actual situation and can be explained by the trend variable. Figure 6 shows local magnifications of the trend variable, which demonstrate the correctness of the mode identification results based on the VLSW-MADF test.
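The coarse-to-fine use of window lengths described above can be sketched as follows. This sketch does not implement the MADF test: the predicate `is_stationary` is a deliberately simple stand-in (range of the window below a tolerance), and the function names, window lengths, and toy series are assumptions chosen only to show how nonstationary coarse windows are re-examined with a shorter window.

```python
def window_labels(series, width, is_stationary, start=0):
    """Split `series` into consecutive windows and label each as stationary or not.

    Returns (begin, end, stationary) tuples with indices offset by `start`.
    """
    out = []
    for i in range(0, len(series), width):
        chunk = series[i:i + width]
        out.append((start + i, start + i + len(chunk), is_stationary(chunk)))
    return out


def coarse_to_fine(series, coarse, fine, is_stationary):
    """Label with a long window first, then refine only the nonstationary windows."""
    segments = []
    for lo, hi, ok in window_labels(series, coarse, is_stationary):
        if ok:
            segments.append((lo, hi, True))
        else:
            segments.extend(window_labels(series[lo:hi], fine, is_stationary, start=lo))
    return segments


# Toy trend variable: flat, linear ramp, flat (illustrative values only).
series = [0.0] * 25 + [float(i) for i in range(1, 11)] + [10.0] * 25


def flat(window):
    # Placeholder for a stationarity test; NOT the MADF test.
    return max(window) - min(window) < 0.5


segments = coarse_to_fine(series, coarse=20, fine=5, is_stationary=flat)
transitions = [(lo, hi) for lo, hi, ok in segments if not ok]
print(transitions)  # [(25, 30), (30, 35)]
```

Refining only the nonstationary windows keeps the number of stationarity tests small while still locating the start and end of each transition to within one short window.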
Next, based on the mode identification results, stable modes A, B, and C and transition modes AB and BC are modeled using DLPPCA, and the confidence limits and matching matrices of each mode are saved. Note that only a portion of the full dataset is presented here; the other forms of multimodal data also need to be identified and modeled. In the end, the confidence limits and matching matrices of stable modes A, B, and C and transition modes AB, BC, AC, BA, CB, and CA are obtained. In the online monitoring phase, a section of normal operating process data that moves from stable mode B to stable mode C is monitored first. The test data contain a total of 1000 sample points; the process enters transition mode BC at around the 500th sample point, leaves the transition mode at around the 700th sample point, and finally enters stable mode C.
This test dataset is used to verify the correctness of the online mode identification method proposed in this paper. As described in Section 4, no previous sample data points are available for reference at the beginning of the process. Therefore, the online data are monitored using the known stable modes A, B, and C as offline models; that is, 30 consecutive data points starting from the first sample are monitored online. The resulting monitoring statistics are shown in Figure 7.
The red dashed line in the figure represents the confidence limit. Notably, the statistic is smallest when stable mode B is used to monitor the online process data, so the production process is determined to be in stable mode B. This conclusion is consistent with the actual situation. The online process data are then continuously monitored using stable mode B. In the subsequent monitoring steps, most of the online data statistics are below the confidence limit and only occasionally exceed it. As stated previously, however, a fault or a new mode is considered only if several consecutive sampling points exceed the confidence limit.
When the 521st to 550th sampling points are monitored, all 30 consecutive sampling points exceed the confidence limit, as shown in Figure 8.
At this point, the current process is considered either to have transitioned to a new operating mode or to have a fault, and the steps in Section 4 are followed. First, assume that the current process has transitioned to a new operating mode. Since the process was in stable mode B at the previous time, the current process must be in one of the transition modes connected to B. Therefore, transition modes BA and BC are selected as candidates to be matched against the current process data. Following Step 4 in Section 4, the matching values between the online data and transition modes BA and BC are calculated, and the value for transition mode BC is the smaller of the two. It is therefore determined that the current process has entered transition mode BC. Transition mode BC is used as the offline model to remonitor the current process data online, and the results are shown in Figure 9.
At this time, the online monitoring sample statistics are below the confidence limit. This confirms that the current process is in transition mode BC, which is consistent with the actual situation. The above simulation, which performs online monitoring and online mode identification on process data obtained under normal working conditions, verifies the feasibility and correctness of the online mode identification method proposed in this paper.
Next, a data segment from stable mode A in which a fault occurs is used as online test data. The fault occurs in the variable D Feed: a fault signal introduced at the 51st sample point linearly increases D Feed to simulate a progressively increasing fault. The increased value is maintained between the 80th and 130th sample points; starting at the 131st sample point, the value falls back toward its normal level, and at the 150th sample point, the fault disappears. The change curve of D Feed is shown in Figure 10.
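The shape of this fault signal can be reproduced with a short sketch. The fault magnitude is not stated in the text, so `magnitude` is a hypothetical parameter, and the linear return to the normal level between the 131st and 150th sample points is an assumption (the text only states that the value falls back).

```python
def fault_offset(k, magnitude=1.0):
    """Additive fault offset at 1-based sample index k.

    Ramp up over samples 51..80, hold until sample 130, then fall back
    (assumed linear) until the fault disappears at sample 150.
    """
    if k < 51 or k >= 150:
        return 0.0
    if k < 80:
        return magnitude * (k - 50) / 30.0   # linear increase
    if k <= 130:
        return magnitude                     # held at the increased value
    return magnitude * (150 - k) / 20.0      # assumed linear decrease


profile = [fault_offset(k) for k in range(1, 201)]
print(max(profile), profile[49], profile[149])  # 1.0 0.0 0.0
```

Adding `profile` to the normal D Feed trajectory yields a fault of the shape described above, with the peak value set by the assumed `magnitude`.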
The simulation here omits steps such as determining the initial mode and begins directly at the 55th sample point. Since the historical mode of the data at the previous moment is known to be stable mode A, stable mode A continues to be used for the online monitoring of 30 consecutive data points starting at the 55th sample point, as shown in
Figure 11.
As seen in the figure above, the T² and SPE statistics for the current data almost all exceed the confidence limit. The current process has therefore either faulted or entered a transition mode. Assume first that the current process has entered a transition mode. Since the previous moment was in stable mode A, the current process can only be in a transition mode connected to A: transition mode AB or transition mode AC. Following Step 4 in Section 4, the matching values between the online data and transition modes AB and AC are calculated. The value for transition mode AC is the smaller of the two, so transition mode AC is the more likely candidate for the mode of the online process. The online data are remonitored using transition mode AC, as shown in Figure 12.
Clearly, all data statistics exceed the confidence limit. This indicates that the current process has not entered transition mode AC but has a fault. Therefore, the current process data continue to be monitored using stable mode A. To better illustrate the effectiveness of using stable mode A for fault monitoring of the online data, Figure 13 shows the results of online monitoring at sample points 51 to 200.
The results in Figure 13 are based on the monitoring model selected through mode identification. The fault is introduced at the 51st sampling time and is clearly detected when T² and SPE exceed the control limits at the 56th and 60th sampling times, respectively. The monitoring statistics T² and SPE therefore have detection delays of 5 and 9 sampling times, respectively. In addition, the estimated fault end time differs from the real situation by only 4 sampling times, which indicates that the DLPPCA monitoring model based on mode identification can accurately locate the fault interval. The FDR, FAR, MAR, and detection delay for the T² and SPE statistics are given in Table 2.
The above simulation on the TE process data demonstrates the following:
The correctness and feasibility of offline mode identification based on the VLSW-MADF test. The VLSW-MADF method proposed in this paper can accurately and quickly identify the modes of a multimodal process.
The correctness and feasibility of the online mode identification method proposed in this paper. This method does not require mode identification for all online data and makes full use of the data from the previous moment to handle each case separately. When a fault occurs or the process enters a transition mode, the method identifies the situation accurately.
In Section 5.2, the validation continues with actual data from a power plant generating unit to demonstrate the superiority of the DLPPCA method and the necessity of modeling stable modes and transition modes separately.
5.2. Power Plant Data Simulation
In the simulation experiments in this section, relevant data from a 2 × 660 MW power plant are used. A steam feedwater pump system is selected as an example for simulation purposes. The schematic diagram of the thermal power unit is shown in
Figure 14. The steam feedwater pump system contains seven variables, as shown in
Table 3. The change curves of these seven variables are shown in
Figure 15.
First, the VLSW-MADF method is used for offline mode identification of the multimodal process. At the 200th sampling point, the process leaves stable mode A and enters transition mode AB, and after 50 sampling points it enters a new stable mode (B). At the 450th sample point, the process leaves stable mode B and enters transition mode BA, and after 50 sample points it returns to stable mode A. After mode identification, the DLPPCA method is used to model stable mode A, stable mode B, transition mode AB, and transition mode BA. The confidence limits of each mode and the matching matrices of the transition modes are saved. Since this part of the procedure is the same as in the simulation of the TE process in Section 5.1, it is not repeated here.
To show that DLPPCA performs well on dynamic transition data, fault data in the transition mode are used for online detection. The test dataset contains 400 sample points, and the transition from stable mode A to stable mode B begins gradually at the 150th sample point. A fault signal is added to the variable “small engine speed” during transition mode AB to simulate a noise fault during the transition. The variable change curve is shown in Figure 16.
Ignoring the initial mode matching process, the online monitoring effect is demonstrated for 30 consecutive sample points, starting at sample point 151. Since at the previous moment the process was in stable mode A, the online process data are monitored using stable mode A first, and the results are in
Figure 17.
Clearly, all sample points exceed the confidence limit. Assume that the current process has entered a new mode. Since it was in stable mode A at the previous moment, the current mode can only be transition mode AB. The online data are then remonitored using transition mode AB, as shown in Figure 18.
If the current process were in transition mode AB, the statistics of the online data would be below the confidence limit when transition mode AB is used for online monitoring. However, the statistics of the online data almost all exceed the confidence limit. This indicates that the online data are not in normal transition mode AB but contain a fault. Sample points 150 to 200 are monitored using transition mode AB, as shown in Figure 19.
From the monitoring results shown in the figure above, it can be seen that most of the sample points exceed the confidence limit when transition mode AB is used to monitor the online process data. Although the sample points do not exceed the confidence limit by much, the occurrence of a fault is still identified.
These monitoring results reflect the nature of the fault itself. Transition mode data are dynamic, and the fault consists of noise signals added on top of the original change trend, which increases the fluctuation amplitude of the transition data. Such faults are therefore difficult to monitor. However, the DLPPCA method can still accurately model and monitor such transition data online. To illustrate the advantages of the DLPPCA method, it is compared with the DPCA method and with the LPPCA method without dynamic expansion. Only the modeling and monitoring methods are replaced in the comparison; all other steps remain the same.
After modeling with DPCA and monitoring the 151st to 180th sample points of the online dataset, the obtained results are shown in
Figure 20.
After modeling using LPPCA and monitoring the 151st to 180th sample points of the online dataset, the obtained results are shown in
Figure 21.
Using DLPPCA, DPCA, and LPPCA, the fault detection rate (FDR) and missed alarm rate (MAR) obtained when monitoring this transitional mode failure are shown in the following
Table 4.
When a fault occurs in the transition mode, the missed alarm rate (MAR) obtained with DPCA or LPPCA is much higher than that obtained with DLPPCA, which demonstrates the advantage of the DLPPCA method. In general, the DLPPCA method presented in this paper performs better on dynamic transition mode data and is more suitable for modeling and monitoring multimodal processes. Compared with DPCA, DLPPCA achieves higher fault monitoring accuracy. This is because the combined use of the LPP method enables the extraction of a manifold structure that is more representative of the essential characteristics of the data while maintaining the nonlinear structure. DLPPCA considers both the global Euclidean structure and the local neighborhood structure of the dataset, instead of only one of these aspects. The false alarm rate of LPPCA for fault monitoring is much higher than that of DPCA and DLPPCA. This is due to the dynamic characteristics, namely the autocorrelation and cross-correlation of process variables, in real industrial processes. A traditional PCA-based approach cannot extract dynamic relationships from the data, which makes it difficult to reveal the types of relationships between measured variables.
Next, another comparative simulation is used to illustrate the necessity of modeling stable modes and transition modes separately. This comparative simulation uses the DLPPCA method to perform overall offline modeling on the process containing stable mode A, stable mode B, transition mode AB, and transition mode BA without distinguishing between them.
On this basis, online data with the same trend as that of the corresponding offline data are selected for monitoring; there are 700 sample points in the online dataset. At the 200th sample point, stable mode A changes to transition mode AB, and after 50 additional sample points, stable mode B is entered. At the 450th sample point, stable mode B switches to transition mode BA and then becomes stable mode A after 50 more sample points. All online data are monitored, and the results are shown in
Figure 22.
It can be clearly seen that the normal transition mode process is incorrectly identified as a fault when the process is monitored as a whole. This does not happen when the stable modes and transition modes are modeled and monitored separately, which demonstrates the necessity of modeling them separately.
The above simulation using an actual dataset from a power plant generating unit demonstrates the following:
The DLPPCA method is more accurate than existing methods when modeling and monitoring transition modes. Compared with DPCA and LPPCA, the DLPPCA proposed in this paper has higher modeling accuracy. For transition mode faults that are difficult to monitor accurately with other methods, accurate results can still be obtained using DLPPCA.
Modeling stable modes and transition modes separately can improve the accuracy of online monitoring. If the multimodal process is modeled indiscriminately as a whole, a normal transition mode process can easily be misidentified as a fault during online monitoring. Modeling stable modes and transition modes separately avoids such errors and makes online monitoring more accurate.
6. Conclusions
In this paper, a new multimodal process monitoring method is presented. In the offline phase, the VLSW-MADF test is used to identify the inherent modes, separating the stable modes from the transition modes. Then, based on the results of mode identification, the stable modes and the transition modes are modeled separately using the proposed DLPPCA method, and the confidence limit and matching matrix of each mode are saved for online mode identification and monitoring. The VLSW-MADF test is fast and accurate in mode identification, and DLPPCA performs better on transition mode data than traditional methods. In the online monitoring phase, this paper takes full advantage of the historical mode at the previous moment and presents a new online mode identification method. This method handles each case separately and performs online mode identification only when necessary, which reduces the computational load and improves the efficiency of mode identification. The feasibility and efficiency of the proposed method have been evaluated through case studies involving the TE process and power plant data, together with several comparisons. The results show that the proposed multimodal process fault monitoring method based on the VLSW-MADF test and DLPPCA improves the efficiency and accuracy of multimodal data monitoring.
In previous research, multiple PCA and multiple PLS have been regarded as the most classical methods for multimode process monitoring. Zhao [25,26] used historical data to build a single PCA or PLS model for each mode. However, this approach splits the useful information hidden between data sequences and is highly dependent on similarity measure algorithms. The DLPPCA model proposed in this paper considers the serial correlation of process data and makes full use of the global Euclidean structure and local neighborhood structure of the dataset by introducing manifold learning. The simulation results show that DLPPCA performs better than the conventional methods on transition mode data. Moreover, the DLPPCA monitoring model can detect faults in time not only in stable modes but also in transition modes. For the problem of unknown modes in the modeling data, Tan [29] used variable-length sliding windows to extract the correlation changes of offline normal operation data and divided the data into stable mode data and transition mode data according to the similarity of the correlation between windows. However, in that mode identification process, the selection of the boundary parameter α has an enormous influence on the mode identification accuracy and the monitoring effect, and α must be selected through a large number of repeated experiments and expert experience. The VLSW-MADF test method proposed in this study instead uses the stationarity of the data as the basis for identifying stable and transition modes and can accurately determine the onset of transition modes.
Although the method in this paper achieves good results on the two numerical simulation cases, it has some limitations and leaves room for improvement. For example, dynamic feature extraction and analysis are still performed by constructing an augmented matrix with time lags, which increases the dimensionality of the data matrix and the computational effort. In addition, continuous learning, or lifelong learning, has become a key research focus in machine learning, and many researchers have introduced continuous learning into the field of process monitoring and fault diagnosis. For example, Zhang [48] investigated a single model with continuous learning capability to monitor successive modes and achieved good results. In future research, the authors will consider improvements to existing algorithms and aim to extend PCA to a framework of continuous learning or adaptive updating of the overall model in order to propose a more effective approach for industrial process monitoring.