Cem 800
This paper provides an overview and analysis of statistical process monitoring methods for fault
detection, identification and reconstruction. Several fault detection indices in the literature are
analyzed and unified. Fault reconstruction for both sensor and process faults is presented which
extends the traditional missing value replacement method. Fault diagnosis methods that have
appeared recently are reviewed. The reconstruction-based approach and the contribution-based
approach are analyzed and compared with simulation and industrial examples. The complementary
nature of the reconstruction- and contribution-based approaches is highlighted. An industrial
example of polyester film process monitoring is given to demonstrate the power of the contribution-
and reconstruction-based approaches in a hierarchical monitoring framework. Finally we demonstrate
that the reconstruction-based framework provides a convenient way for fault analysis,
including fault detectability, reconstructability and identifiability conditions, resolving many
theoretical issues in process monitoring. Additional topics are summarized at the end of the paper
for future investigation. Copyright © 2003 John Wiley & Sons, Ltd.
KEYWORDS: process monitoring; process chemometrics; fault detection; fault identification; fault reconstruction; sensor
validation; contribution plots; fault analysis
following characteristics of process monitoring which distinguish it from MQC: (i) fault diagnosis in process monitoring becomes feasible and more interesting owing to the inclusion of process variables in the analysis; (ii) fault reconstruction is possible based on multivariate statistical models to maintain control and optimization of process variables on-going even though some sensors have failed [6, 10, 15]; (iii) the stationarity assumption of multivariate statistical methods is challenged in SPM, since there are frequently normal process changes and process drifts which would be reflected in the process variables, while the quality variables are not supposed to change or drift; this leads to methods for multiscale and recursive monitoring to remove or adapt for process non-stationarity [18–22]; (iv) process dynamics becomes a concern, which causes autocorrelation in the variables [19, 23–25]; and (v) multiway analysis such as multiway PCA or multiway PLS is suitable for monitoring batch processes, which historically have less sophisticated control strategies than their continuous counterparts [26].
The tasks in process monitoring can be compared in parallel to those in the following areas: (i) gross error detection and identification (GDI) based on first-principles models [27–30]; (ii) fault detection and isolation (FDI) based on FDI observers (a special form of observers), parity relations and Kalman filters [31–36]; and (iii) multivariate statistics-based outlier detection and missing value replacement [5, 37–39]. The multivariate process monitoring methods based on PCA and PLS models offer a practical approach for fault detection and diagnosis. While fault detection is accomplished by directly applying statistics used in MQC, fault diagnosis is made possible by the use of contribution plots [8, 9, 40, 41] and a fault identification index based on fault reconstruction [10, 13]. Recent work by Gertler et al. [16] describes an isolation-enhanced PCA approach which uses a bank of PCA models for fault identification. A structured residuals approach with maximized sensitivity for fault diagnosis in processes is proposed by Qin and Li [15]. A related data-based method is the use of auto-associative neural networks as non-linear PCA for sensor validation [42]. In these methods, quasi-steady state models are used to detect sensor gross errors.
While the area of statistical process monitoring has progressed rapidly, with many successful industrial applications reported, only a few efforts have been made to provide overviews of the area. MacGregor [43] and MacGregor and Kourti [44] provide early overviews of the methods available in SPM. Wise and Gallagher [45] provide an overview of many aspects of statistical process monitoring based on PCA, PLS and their variations. A recent text by Chiang et al. [46] on process monitoring provides an introduction to SPM methods and their applications.
The objective of this paper is to provide an overview and analysis of recently developed process monitoring methods for fault detection, reconstruction and identification. The focus of the review necessarily reflects the author’s experience in this area. Methods that are well reviewed in earlier papers [44, 45] will be mentioned here but will not be repeated at length. The paper is organized as follows. Section 2 discusses how processes are modeled using statistical methods and how faults are represented using subspaces or directions. Section 3 summarizes many existing fault detection indices, including global and subspace-based indices. A unified representation of these indices is presented as well. Section 4 discusses methods for fault reconstruction and their relation to missing value replacement approaches. Fault diagnosis methods are summarized in Section 5, with special attention to reconstruction-based methods and contribution plot approaches. Section 6 provides an industrial application in which the contribution- and reconstruction-based methods are used in a hierarchical monitoring framework. The analysis of fault detectability, reconstructability and identifiability is given in Section 7. Section 8 gives conclusions and further discussion.
2. PROCESS AND FAULT MODELING
Statistical process monitoring relies on the use of normal process data to build process models. These models include PCA, PLS and their variants. PCA models are predominantly used to extract variable correlation from data [4, 6]. Wise and co-workers [6, 47] suggest the use of PLS models for process monitoring in a similar manner to PCA models, but they point out different characteristics of the two types of models. In this section we discuss the main points of SPM using PCA models, because PLS has been used in a similar manner.
2.1. Principal component analysis
Let $x \in \mathbb{R}^m$ denote a sample vector of $m$ sensors. Assuming that there are $N$ samples for each sensor, a data matrix $X \in \mathbb{R}^{N \times m}$ is composed with each row representing a sample. The matrix $X$ is scaled to zero mean for covariance-based PCA and, in addition, to unit variance for correlation-based PCA. The matrix $X$ can be decomposed into a score matrix $T$ and a loading matrix $P$ by either the NIPALS [48] or the singular value decomposition (SVD) algorithm:
$$X = TP^T + \tilde{X} = TP^T + \tilde{T}\tilde{P}^T = [T \;\; \tilde{T}]\,[P \;\; \tilde{P}]^T \equiv \bar{T}\bar{P}^T \tag{1}$$
where $\tilde{X} = \tilde{T}\tilde{P}^T$ is the residual matrix, $\bar{T} = [T \;\; \tilde{T}]$ and $\bar{P} = [P \;\; \tilde{P}]$. Since the columns of $\bar{T}$ are orthogonal, the covariance matrix is
$$S \approx \frac{1}{N-1} X^T X = \bar{P}\bar{\Lambda}\bar{P}^T \tag{2}$$
where
$$\bar{\Lambda} = \frac{1}{N-1}\bar{T}^T\bar{T} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_m\} \tag{3}$$
and
$$\lambda_i = \frac{1}{N-1} t_i^T t_i \approx \mathrm{var}\{t_i\} \tag{4}$$
when $N$ is very large. The score vector $t_i$ is the $i$th column of $\bar{T}$, and $\lambda_i$ are the eigenvalues of the covariance matrix in descending order. For variance-scaled $X$, Equation (2) gives the correlation matrix $R$. The principal component subspace (PCS) is $\mathcal{S}_p = \mathrm{span}\{P\}$ and the residual subspace (RS) is $\mathcal{S}_r = \mathrm{span}\{\tilde{P}\}$.
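The decomposition of Section 2.1 can be sketched numerically as follows. This is a minimal illustration, not the author's implementation: the synthetic data, the mixing matrix and all variable names (`P_bar`, `T_bar`, `lam`) are assumptions introduced for the example, and the SVD route is used rather than NIPALS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N samples of m = 4 correlated sensors driven by 2 factors.
N, m, l = 500, 4, 2
factors = rng.normal(size=(N, 2))
mixing = np.array([[1.0, 0.5, -0.3, 0.8],
                   [0.2, -1.0, 0.7, 0.4]])
X = factors @ mixing + 0.05 * rng.normal(size=(N, m))

# Scale to zero mean (covariance-based PCA).
X = X - X.mean(axis=0)

# SVD of X: X = U diag(s) Vt. Loadings are the rows of Vt, scores are U diag(s).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P_bar = Vt.T                      # all m loading vectors, [P  P_tilde]
T_bar = U * s                     # all m score vectors,  [T  T_tilde]
P, P_tilde = P_bar[:, :l], P_bar[:, l:]
T, T_tilde = T_bar[:, :l], T_bar[:, l:]

# Equation (1): X = T P^T + T_tilde P_tilde^T
X_rebuilt = T @ P.T + T_tilde @ P_tilde.T

# Equations (3)-(4): eigenvalues are the score variances, in descending order.
lam = (T_bar ** 2).sum(axis=0) / (N - 1)

# Equation (2): the sample covariance equals P_bar diag(lam) P_bar^T.
S = X.T @ X / (N - 1)
S_rebuilt = P_bar @ np.diag(lam) @ P_bar.T
```

With `l` retained components, the columns of `P` span the PCS and the columns of `P_tilde` span the RS; the choice `l = 2` here simply matches the two factors used to generate the data.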
Copyright © 2003 John Wiley & Sons, Ltd. J. Chemometrics 2003; 17: 480–502
482 S. J. Qin
A sample vector $x$ can be projected on the PCS and RS respectively:
$$\hat{x} = PP^T x \equiv Cx \in \mathcal{S}_p \tag{5}$$
$$\tilde{x} = \tilde{P}\tilde{P}^T x = (I - PP^T)\,x = (I - C)\,x \in \mathcal{S}_r \tag{6}$$
Since $\mathcal{S}_p$ and $\mathcal{S}_r$ are orthogonal,
$$\hat{x}^T\tilde{x} = 0 \tag{7}$$
and
$$x = \hat{x} + \tilde{x} \tag{8}$$
2.2. Fault direction matrix
The sample vector for normal operating conditions is denoted by $x^*$, which is unknown when a fault has occurred. In the presence of a process fault $\mathcal{F}_i$ the sample vector $x$ is represented by the expression
$$x = x^* + \Xi_i f \tag{9}$$
where $\Xi_i$ is orthonormal and $\|f\|$ represents the magnitude of the fault. Note that $f$ may change over time depending on how the actual fault develops over time. The actual fault belongs to the set of possible faults, denoted by $\{\mathcal{F}_i\}$. Some members of this set may be combinations of faults. For unidimensional faults, $\Xi_i$ is a column vector, while for multidimensional faults, $\Xi_i$ is a matrix. It is straightforward to derive $\Xi_i$ for sensor faults. For example,
$$\Xi_i^T = [\,0 \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0\,] \tag{10}$$
represents a single sensor fault in sensor 3, while
$$\Xi_i^T = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{bmatrix} \tag{11}$$
represents a simultaneous sensor fault in sensors 1 and 3.
For process faults, $\Xi_i$ represents the direction or subspace in which the process deviates from the normal situation. Process faults are usually multidimensional, but they can be unidimensional and at the same time impact multiple sensors. Examples of such can be found in e.g. Reference [13]. Yoon and MacGregor [49] classify faults into simple faults and complex faults. It should be noted that $\Xi_i$ can represent both simple and complex faults. If faulty data are available for a complex process fault of interest, the fault direction matrix $\Xi_i$ can be extracted from faulty data. Valle-Cervantes et al. [50] and Yue and Qin [51] give examples of extracting process fault directions for both continuous and batch processes.
2.3. How does PCA model a process?
In statistical process monitoring, PCA is used to model the correlation among process variables. Therefore it is desirable to understand what PCA is actually modeling about the process and how good the model is. Industrial processes usually have many unmeasured but normal process disturbances. We can represent the process at quasi-steady state as
$$\begin{aligned} B\bar{x}(k) + D\,d(k) &= 0 \\ x(k) &= \bar{x}(k) + n(k) \end{aligned} \tag{12}$$
where $\bar{x}(k)$ represents the noise-free values, $d(k)$ is the unmeasured disturbances and $n(k)$ is the measurement noise. The first part of Equation (12) is a representation of the process conservation laws at quasi-steady state. The second part of Equation (12) is the measurement equation. The time index $k$ can be suppressed for convenience.
Denoting $D^\perp$ as the orthogonal complement of $D$ such that $D^T D^\perp = 0$, Equation (12) can be rewritten as
$$(D^\perp)^T B\,\bar{x} \equiv \bar{C}\bar{x} = 0 \tag{13}$$
where $\bar{C} \equiv (D^\perp)^T B \in \mathbb{R}^{q \times m}$. If we can identify the matrix $\bar{C}$ consistently, the process model is accurately identified up to a similarity transformation. From Equation (13) we know that $\bar{x}$ is in the orthogonal complement of $\bar{C}^T$, i.e.
$$\bar{x} = (\bar{C}^T)^\perp s \tag{14}$$
where $s$ is an independent vector, which leads to
$$x = (\bar{C}^T)^\perp s + n \tag{15}$$
Equation (15) has a similar form to the PCA model and is an equivalent representation of Equation (12).
From Equation (15) we observe that the dimension of independently varying factors $s$ is $m - q$. For a multivariable process under feedback control, these independently varying factors include (i) unmeasured disturbance changes, (ii) measured disturbance changes and (iii) possible setpoint changes.
One note is in order on the persistent excitation of the data used for process monitoring. When system identification models are used for control purposes [52], it is required that the data are persistently excited. In process monitoring, however, if the process data are not persistently excited, i.e. some of the independent factors are not active, the identified PCA model will have fewer principal components than $m - q$. In this case the model can still be used for process monitoring if these inactive factors remain silent. If these factors become active, a process monitoring alarm will be triggered, which indicates a new active mode rather than an actual fault. In this case the process model can be simply updated with the data that reflect the new active mode in order to avoid future false alarms [22].
Another note is on causal models versus correlation-based models. Industrial processes usually operate under highly constrained conditions due to material and energy balances, quality requirements, operational constraints, safety constraints and control feedback. These constrained conditions appear as correlation in the operational data. When PCA and PLS models are built from these data, they are correlation-based models and cannot be interpreted as causal relations. Yoon and MacGregor [53] discuss the difference between causal models and correlation-based models. Care must be taken using correlation models for fault diagnosis. However, if the process data are generated from designed experiments, it is possible to build causal models using PCA if the noise-to-signal ratio is low. Such a model is equivalent to a total least squares approach. If the noise-to-signal ratio is high, direct PCA modeling leads to poor models [24, 53], and system identification techniques [24] or PCA with proper instrumental variables [54] should be used.
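The argument of Section 2.3, that data generated by the quasi-steady-state model of Equation (12) lead to a PCA model with $m - q$ principal components, can be checked on a toy case. The sketch below is only an illustration under stated assumptions: the matrices `B` and `D`, the noise level and the sample size are hypothetical, and the constraint matrix of Equation (13) is written `C_bar` in the code to keep it apart from the projection matrix $C = PP^T$ of Equation (5).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical process: 2 balance equations, m = 3 variables, 1 disturbance.
B = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
D = np.array([[1.0],
              [1.0]])

# D_perp spans the orthogonal complement of the columns of D (D^T D_perp = 0).
U, sv, _ = np.linalg.svd(D, full_matrices=True)
rank_D = int((sv > 1e-12).sum())
D_perp = U[:, rank_D:]                 # here 2 x 1, so q = 1

# Equation (13): C_bar x_bar = 0 with C_bar = D_perp^T B (q x m).
C_bar = D_perp.T @ B
q, m = C_bar.shape

# Basis of the null space of C_bar, i.e. the (C_bar^T)^perp of Equation (14).
_, _, Vt = np.linalg.svd(C_bar, full_matrices=True)
null_basis = Vt[q:].T                  # m x (m - q)

# Equation (15): x = (C_bar^T)^perp s + n, with m - q independent factors s.
N = 2000
s = rng.normal(size=(N, m - q))
X = s @ null_basis.T + 0.01 * rng.normal(size=(N, m))
X = X - X.mean(axis=0)

# PCA of the data: the m - q large eigenvalues belong to the PCS.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / (N - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
P_tilde = eigvecs[:, m - q:]                         # residual loadings

# The residual direction should align with the row of C_bar.
c_unit = C_bar[0] / np.linalg.norm(C_bar[0])
alignment = abs(c_unit @ P_tilde[:, 0])
```

In this toy case two eigenvalues are of the order of the factor variance and one is of the order of the noise variance, and `alignment` is close to one, which is the sense in which PCA recovers the constraint up to scaling.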
Statistical process monitoring 483
3. FAULT DETECTION INDICES
Fault detection is usually the first step in multivariate process monitoring. Typically the SPE (or Q statistic) and Hotelling’s $T^2$ statistic are used to represent the variability in the RS and PCS respectively. Owing to the complementary nature of these two indices, combined indices are also proposed for fault detection and diagnosis [51, 55]. Another statistic that measures the variability in the RS is Hawkins’ statistic [5]. The global Mahalanobis distance test can also be used as a combined measure of variability in the PCS and RS. Individual tests of PCs can also be conducted [5], but they are often not preferred in practice, since one has to monitor many statistics. In this section we summarize these fault detection indices and provide a unified representation.
3.1. Squared prediction error
The SPE index measures the projection of the sample vector on the residual subspace:
$$\mathrm{SPE} \equiv \|\tilde{x}\|^2 = \|(I - PP^T)x\|^2 \tag{16}$$
The process is considered normal if
$$\mathrm{SPE} \le \delta_\alpha^2 \tag{17}$$
where $\delta_\alpha^2$ denotes the upper control limit for SPE with a significance level $\alpha$. Jackson and Mudholkar [56] developed an expression for $\delta_\alpha^2$:
$$\delta_\alpha^2 = \theta_1\left(\frac{c_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2}\right)^{1/h_0} \tag{18}$$
where
$$\theta_i = \sum_{j=l+1}^{m} \lambda_j^i, \qquad i = 1, 2, 3 \tag{19}$$
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2} \tag{20}$$
$l$ is the number of retained principal components and $c_\alpha$ is the normal deviate corresponding to the upper $1 - \alpha$ percentile. Note that this result is derived under the following conditions.
(i) The sample vector $x$ follows a multivariate normal distribution.
(ii) An approximation for the distribution is made in deriving the control limit, which is valid when $\theta_1$ is very large.
(iii) The result holds regardless of how many principal components are retained in the model.
When a fault occurs, the faulty sample vector $x$ is composed of the normal portion superimposed with the fault portion. The fault can make SPE larger than $\delta_\alpha^2$, leading to the detection of the fault.
An alternative upper control limit for SPE is derived in Reference [26] using the result in Reference [57]:
$$\delta_\alpha^2 = g\,\chi^2_{h;\alpha} \tag{21}$$
where
$$g = \theta_2/\theta_1, \qquad h = \theta_1^2/\theta_2$$
The relationship between Equation (21) and Equation (18) is discussed in Reference [26].
3.2. Hotelling’s $T^2$ statistic and Hawkins’ $T_H^2$ statistic
Hotelling’s $T^2$ statistic measures variations in the PCS:
$$T^2 = x^T P\Lambda^{-1}P^T x \tag{22}$$
Under the condition that the process is normal and the data follow a multivariate normal distribution, the $T^2$ statistic is related to an $F$ distribution considering that the population mean and covariance are estimated from data [58]:
$$\frac{N(N - l)}{l(N^2 - 1)}\,T^2 \sim F_{l,N-l} \tag{23}$$
where $F_{l,N-l}$ is an $F$ distribution with $l$ and $N - l$ degrees of freedom. For a given significance level $\alpha$ the process is considered normal if
$$T^2 \le T_\alpha^2 \equiv \frac{l(N^2 - 1)}{N(N - l)}\,F_{l,N-l;\alpha} \tag{24}$$
If one considers that the mean is accurately known and only the covariance is estimated from data, the $T^2$ upper control limit is [5]:
$$T_\alpha^2 = \frac{l(N - 1)}{N - l}\,F_{l,N-l;\alpha} \tag{25}$$
Note that the difference between the above two expressions is only a factor of $(N + 1)/N$. If the number of data points, $N$, is so large that the mean and covariance estimated from data are accurate, the $T^2$ index can be well approximated with a $\chi^2$ distribution with $l$ degrees of freedom and
$$T_\alpha^2 = \chi^2_{l;\alpha} \tag{26}$$
This result can also be obtained using Reference [57] directly on Equation (22). In process monitoring it is typically true that $N$ is very large. Therefore the $\chi^2$ upper control limit is adequate and often used in the process monitoring literature.
Hawkins’ $T_H^2$ statistic is a symmetric implementation of Hotelling’s $T^2$ statistic in the residual subspace [59]:
$$T_H^2 = x^T \tilde{P}\tilde{\Lambda}^{-1}\tilde{P}^T x \sim \frac{(m - l)(N^2 - 1)}{N(N - m + l)}\,F_{m-l,N-m+l} \tag{27}$$
A drawback of this statistic compared with SPE is that ill-conditioning can result when some of the residual eigenvalues $\lambda_i$ ($i = l + 1, \ldots, m$) are very close to zero. Similarly, if $N$ is large, the process is considered normal if
$$T_H^2 \le \chi^2_{m-l;\alpha} \tag{28}$$
for a significance level $\alpha$. This test has been used in gross error detection [28] on the residuals due to conservation equations.
3.3. Mahalanobis distance
The Mahalanobis distance, which is defined as follows, forms the global Hotelling’s $T^2$ test:
$$D = x^T S^{-1} x \sim \frac{m(N^2 - 1)}{N(N - m)}\,F_{m,N-m} \tag{29}$$
$$D_r = x^T S^{+} x \sim \frac{r(N^2 - 1)}{N(N - r)}\,F_{r,N-r} \tag{30}$$
where $S^{+}$ is the Moore–Penrose pseudoinverse. Mardia [60] further shows that the distance $D_r$ is independent of the type of pseudoinverse. Hence we choose the Moore–Penrose pseudoinverse.
It is straightforward to show that the global Mahalanobis distance is the sum of $T^2$ in the PCS and $T_H^2$ in the RS:
$$D = T^2 + T_H^2 \tag{31}$$
In process monitoring where the number of observations, $N$, is rather large, the global Mahalanobis distance approximately follows a $\chi^2$ distribution with $m$ degrees of freedom,
$$D \sim \chi^2_m \tag{32}$$
and the reduced Mahalanobis distance
$$D_r \sim \chi^2_r \tag{33}$$
The upper control limits for $D$ and $D_r$ can be defined accordingly.
3.4. Combined indices
In practice, a single index, rather than two indices, is preferred for monitoring a process. Yue and Qin [14, 51] propose a combined index for fault detection, which combines SPE and $T^2$ as follows:
After $g$ and $h$ are calculated, the upper control limit of $\varphi$ can be obtained for a given significance level $\alpha$; a fault is detected by $\varphi$ if
$$\varphi > g\,\chi^2_{h;\alpha} \tag{41}$$
It is worth noting that Raich and Cinar [55] suggest the combined statistic
$$c\,\frac{\mathrm{SPE}(x)}{\delta_\alpha^2} + (1 - c)\,\frac{T^2(x)}{\chi^2_{l;\alpha}} \tag{42}$$
where $c \in (0, 1)$ is a constant. They further give a rule that the statistic less than one is considered normal. This, however, can lead to erroneous results, because it is possible to have either $\mathrm{SPE}(x) > \delta_\alpha^2$ or $T^2(x) > \chi^2_{l;\alpha}$ even if the above statistic is less than one.
3.5. Unified form of the quadratic indices
Tong and Crowe [41] compared several univariate and collective statistical tests, including the global Mahalanobis distance. In this subsection we provide a unified form for these indices available in the literature and compare them in terms of strengths and weaknesses.
Denoting
$$t = [P \;\; \tilde{P}]^T x \tag{43}$$
as the principal components, the above fault detection indices can be unified as follows:
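As a hedged illustration of Sections 3.1 to 3.3, the sketch below evaluates SPE, $T^2$, $T_H^2$ and the Mahalanobis distance as quadratic forms $x^T M x$, with $M$ read off Equations (16), (22), (27) and (29), and computes the Jackson and Mudholkar limit of Equations (18)–(20). The data, the function name `spe_limit_jm` and the choice $\alpha = 0.01$ are assumptions made for the example; the normal deviate is taken from Python's standard `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist

def spe_limit_jm(residual_eigvals, alpha=0.01):
    """SPE control limit following Equations (18)-(20)."""
    lam = np.asarray(residual_eigvals, dtype=float)
    theta1, theta2, theta3 = (np.sum(lam ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)   # upper normal deviate
    term = (c_alpha * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + 1.0 + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * term ** (1.0 / h0)

# Hypothetical normal operating data with m = 5 variables, l = 2 retained PCs.
rng = np.random.default_rng(2)
N, m, l = 1000, 5, 2
X = rng.normal(size=(N, m)) @ rng.normal(size=(m, m))
X = X - X.mean(axis=0)
S = X.T @ X / (N - 1)
lam, P_bar = np.linalg.eigh(S)
lam, P_bar = lam[::-1], P_bar[:, ::-1]            # descending eigenvalues
P, P_tilde = P_bar[:, :l], P_bar[:, l:]

# Each index is a quadratic form x^T M x.
M_spe = P_tilde @ P_tilde.T                           # Equation (16)
M_t2 = P @ np.diag(1.0 / lam[:l]) @ P.T               # Equation (22)
M_th2 = P_tilde @ np.diag(1.0 / lam[l:]) @ P_tilde.T  # Equation (27)
M_d = np.linalg.inv(S)                                # Equation (29)

x = X[0]
spe, t2, th2, d = (float(x @ M @ x) for M in (M_spe, M_t2, M_th2, M_d))

# Control limit for SPE from the residual eigenvalues.
delta2 = spe_limit_jm(lam[l:], alpha=0.01)
```

For this sample, `d` equals `t2 + th2` up to rounding, which is Equation (31). The $\chi^2$-based limits of Equations (21) and (26) are left out because they need a $\chi^2$ quantile function that the standard library does not provide.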
These aforementioned indices (SPE and $T_H^2$ in the RS, $T^2$ in the PCS, the Mahalanobis distance and the combined index) are all collective statistical tests. They are preferable to univariate statistical tests, because the correlation in the data is taken into account. One common feature is that they are all quadratic-form indices. The in-control regions for these indices can be different owing to the possibility of near-zero eigenvalues. The in-control region defined by SPE and $T^2$ jointly is the joint of two ellipsoids, each in one subspace, while the combined index and the Mahalanobis distance define a global ellipsoid in the measurement space. In the case of process monitoring, SPE is often preferred to $T^2$, as explained in the next subsection, while in other cases, such as quality control, the $T^2$ can be preferred. In these cases it is beneficial to use subspace-based indices such as SPE or $T^2$. If both indices or subspaces are equally important, it is desirable to use a global index such as the combined index $\varphi$. This will result in one index to monitor instead of two, and the diagnosis can be done using methods such as contribution plots, which reveal much more information than the two indices.
Figure 1. Two flow sensors measuring the inlet and outlet flow of a unit.
3.6. The asymmetric role of SPE and $T^2$ in process monitoring
Although both SPE and $T^2$ are used for process monitoring, it is necessary to point out that they measure different situations of the process, and their roles in process monitoring are not symmetric. The SPE index measures variability that breaks the normal process correlation, which often indicates an abnormal situation. The $T^2$ index measures the distance to the origin in the principal component subspace. Since the principal component subspace typically contains normal process variations with large variance that represent signals, and the residual subspace contains mainly noise, the normal region defined by the control limit for $T^2$ is usually much larger than that of SPE. Therefore it usually takes a much larger fault magnitude to exceed the $T^2$ control limit. The normal region defined by the SPE control limit includes residual components that are mainly noise. Therefore faults with small to moderate magnitudes can easily exceed the SPE control limit. Furthermore, if a sample exceeds the $T^2$ limit only but does not violate the SPE limit, it does not break the correlation structure but simply shifts further away from the origin in the PCS. This case could be a fault, but it could also be a change in the operating region which is not necessarily a fault.
The use of SPE and $T^2$ for fault detection can be explained with a simple example shown in Figure 1(a), which has two flow sensors measuring the inlet and outlet flow rates of a unit. Under normal steady state operation the data are shown in Figure 1(b). The PCA model with one PC is the 45° line, which is the PCS. A faulty sample (full circle) deviates from the normal model line and increases SPE. This fault breaks the mass balance and is clearly detected using SPE. Note that the $T^2$ index is inside the control limit in this case. While a fault can cause SPE and $T^2$ to increase, an increase in $T^2$ alone indicates that the change is consistent with the model; it may be just a shift of operating region. For example, a change in throughput of the process shown in Figure 1(a) will increase $T^2$ only, which is not a fault. Therefore we consider the use of SPE for fault detection more preferable than the $T^2$ index.
Another difference between the PCS and RS is the stationarity of the projected vectors $\hat{x}$ and $\tilde{x}$. Real process signals are hardly normally distributed and stationary. While PCA decomposition does not require the signals to be normal or stationary, the principal component subspace usually captures the non-stationary parts of the signals, because they have large variability. Use of the $T^2$ index can incur false alarms due to the non-stationary, non-normal signals. On the other hand, the $T^2$ control limit defined by the non-stationary signals can be very wide, which increases the rate of undetected faults. After the major variability is extracted in the PCS, the residual $\tilde{x}$ usually looks much more stationary and random, which makes the SPE control limit valid. Therefore SPE may have lower chances of type I and type II errors compared with $T^2$. The $T^2$ index, however, is more suitable for monitoring quality variables [5], since in-control quality variables are usually stationary.
When the process variables are considered stationary and their number is large, the principal components, which are linear combinations of the variables, can often be approximated with normal distributions owing to the central limit theorem, even though the original variables are not normally distributed. However, it would be dangerous to generalize this statement to considering that all major components are normally distributed, because, if so, this would imply that the original variables are normally distributed as they are essentially linear combinations of the major components. If the process variables are highly correlated and the number of independently varying components is not very large, the applicability of the central limit theorem is discounted. One such example is the boiler process data in Reference [10], where one principal component captures the major trend and most of the variance in the data but is obviously not normally distributed.
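The two-flow-sensor example of Section 3.6 can be reproduced with synthetic data. The sketch below is only an illustration: the flow values, noise levels and the two test samples (`leak`, `throughput`) are hypothetical, and simple empirical 99% quantiles of the training data stand in for the control limits of Equations (18) and (26).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical normal data: inlet and outlet flows agree up to sensor noise.
N = 1000
flow = 10.0 + rng.normal(scale=1.0, size=N)
X_raw = np.column_stack([flow, flow]) + rng.normal(scale=0.1, size=(N, 2))
mean = X_raw.mean(axis=0)
X = X_raw - mean

# PCA with one retained PC: the PCS is (close to) the 45-degree line.
lam, vecs = np.linalg.eigh(X.T @ X / (N - 1))
lam, vecs = lam[::-1], vecs[:, ::-1]
P = vecs[:, :1]

def spe(x):
    """Squared norm of the residual, Equation (16)."""
    r = x - P @ (P.T @ x)
    return float(r @ r)

def t2(x):
    """Hotelling's T^2 in the one-dimensional PCS, Equation (22)."""
    t = P.T @ x
    return float(t @ t / lam[0])

# Assumption: empirical 99% limits from the normal data, not the paper's formulas.
spe_lim = float(np.quantile([spe(x) for x in X], 0.99))
t2_lim = float(np.quantile([t2(x) for x in X], 0.99))

# A sample that breaks the mass balance, and a pure throughput change.
leak = np.array([12.0, 9.0]) - mean
throughput = np.array([15.0, 15.0]) - mean

leak_flags = (spe(leak) > spe_lim, t2(leak) > t2_lim)
throughput_flags = (spe(throughput) > spe_lim, t2(throughput) > t2_lim)
```

Under these assumptions the mass-balance violation exceeds only the SPE limit, while the throughput change exceeds only the $T^2$ limit, which mirrors the asymmetry described in the text.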
where $f_i$ is an estimate of the actual fault magnitude $f$ along direction $\Xi_i$.
The objective for reconstruction is to find $f_i$ such that the reconstructed SPE
$$\mathrm{SPE}(x_i) = \|\tilde{x}_i\|^2 = \|\tilde{x} - \tilde{\Xi}_i f_i\|^2 \tag{45}$$
is minimized. The optimal solution to this problem leads to an optimal estimate of the fault magnitude:
$$f_i = \tilde{\Xi}_i^{+}\tilde{x} = \tilde{\Xi}_i^{+} x \tag{46}$$
The second equal sign is due to the fact that $(I - C)^2 = I - C$. The reconstructed measurement vector is
$$x_i = x - \Xi_i\tilde{\Xi}_i^{+} x = (I - \Xi_i\tilde{\Xi}_i^{+})\,x \tag{47}$$
and in the residual space
$$\tilde{x}_i = (I - \tilde{\Xi}_i\tilde{\Xi}_i^{+})\,\tilde{x} \tag{48}$$
The reconstructed SPE becomes
$$\mathrm{SPE}(x_i) = \|\tilde{x}_i\|^2 = \tilde{x}^T (I - \tilde{\Xi}_i\tilde{\Xi}_i^{+})\,\tilde{x} \tag{49}$$
Therefore the best reconstruction for sensor 2 is the average of sensors 1 and 3. For an arbitrary process fault direction, say
$$\Xi_i = \frac{1}{\sqrt{3}}\,[\,1 \;\; 1 \;\; {-1}\,]^T$$
the reconstruction according to Equation (47) is
$$x_i = \frac{1}{4}\begin{bmatrix} 3x_1 - x_2 + 2x_3 \\ -x_1 + 3x_2 + 2x_3 \\ x_1 + x_2 + 2x_3 \end{bmatrix} \tag{50}$$
For any given fault magnitude $f$ in the $\Xi_i$ direction the faulty data vector is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \end{bmatrix} + \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} f$$
The reconstruction using Equation (50) is
$$x_i = \frac{1}{4}\begin{bmatrix} 3x_1^* - x_2^* + 2x_3^* \\ -x_1^* + 3x_2^* + 2x_3^* \\ x_1^* + x_2^* + 2x_3^* \end{bmatrix}$$
which completely eliminates the impact of the fault in the reconstruction. This result extends the missing value replacement approach which allows for sensor faults only.
4.2. Related methods
In the case of sensor faults the fault direction matrix $\Xi_i$ takes a special form that corresponds to some columns of an identity matrix. In this special case, other methods such as the missing value replacement method (e.g. References [5, 37, 38]) and the iterative approach of Dunia et al. [10] can be applied. Dunia et al. [10] further show that these three methods give identical results, unifying the three methods. For a single sensor fault at the $i$th sensor the fault direction matrix $\Xi_i = [\,0 \cdots 0, 1, 0 \cdots 0\,]^T$, which is the $i$th column of an identity matrix. The fault estimate from Equation (46) reduces to
$$f_i = x_i - \frac{[\,c_{-i}^T \;\; 0 \;\; c_{+i}^T\,]\,x}{1 - c_{ii}} \tag{51}$$
where $x_i$ is the $i$th element of $x$ and $[\,c_{-i}^T, c_{ii}, c_{+i}^T\,]$ forms the $i$th row of matrix $C$, with $c_{-i}^T$ including the elements before $i$ and $c_{+i}^T$ including the elements after $i$. The relation $(I - C)^2 = I - C$ is used in deriving the above equation. The reconstructed $i$th variable can be calculated from Equation (47) as
$$x_i^{\mathrm{rec}} = x_i - f_i = \frac{[\,c_{-i}^T \;\; 0 \;\; c_{+i}^T\,]\,x}{1 - c_{ii}} \tag{52}$$
Although the aforementioned methods give the same results for reconstruction, the optimization method is more general since it works for process faults as well.
Another advantage of the reconstruction method is described by Dunia and Qin [13]: it checks for the reconstructability of a fault and the reliability of the reconstruction. If the fault subspace happens to overlap with the PCS, $\tilde{\Xi}_i$ will be rank-deficient and the fault cannot be completely reconstructed. The pseudoinverse solution in Equation (47) leads to the minimum norm reconstruction. Even if $\tilde{\Xi}_i$ is not rank-deficient, it is possible for the reconstructed values to have a larger variance than the original variables. In this case a better reconstruction or replacement of the faulty sensor is the mean of the variable instead of the reconstruction through the PCA model. This point is further elaborated in Section 7. Furthermore, the variance of the reconstruction error, after reconstructability is guaranteed, can be used to determine the optimal number of principal components [63]. The more traditional cross-validation method for selecting the number of principal components [64] does not guarantee that the left-out values are reconstructable, although a general rule of thumb such as no more than 20% left-out values is recommended in the literature.
4.3. Fault reconstruction using the combined index
In the case where the reconstruction should minimize both SPE and $T^2$, the combined index of the reconstructed vector can be minimized:
$$\varphi(x_i) = \frac{\mathrm{SPE}(x_i)}{\delta_\alpha^2} + \frac{T^2(x_i)}{\chi^2_{l;\alpha}} = x_i^T \Phi x_i \tag{53}$$
Yue and Qin [14, 51] suggest the use of the combined index for reconstruction. The optimal reconstruction is obtained by minimizing the combined index:
$$f_i = (\Xi_i^T \Phi \Xi_i)^{-1}\,\Xi_i^T \Phi x \tag{54}$$
Yue and Qin [51] demonstrate that $\Xi_i^T \Phi \Xi_i$ is positive definite, which guarantees its invertibility. The reconstructed $x_i$ and $\varphi(x_i)$ can be calculated from Equation (54):
$$x_i = [\,I - \Xi_i(\Xi_i^T \Phi \Xi_i)^{-1}\Xi_i^T \Phi\,]\,x \equiv P^{\perp}_{\Xi_i,\Phi}\,x \tag{55}$$
where $P^{\perp}_{\Xi_i,\Phi} = I - \Xi_i(\Xi_i^T \Phi \Xi_i)^{-1}\Xi_i^T \Phi$ is the projection matrix on the orthogonal complement of $\Xi_i$ weighted by $\Phi$; $(P^{\perp}_{\Xi_i,\Phi})^2 = P^{\perp}_{\Xi_i,\Phi}$, and
$$\varphi(x_i) = x^T \Phi P^{\perp}_{\Xi_i,\Phi}\,x$$
The use of $\varphi$ for reconstruction means that the fault is corrected in both the PCS and RS. Owing to the large normal variability in the PCS, it may not always be appropriate to correct in the PCS. Therefore one should assess whether $\varphi$ should be used for reconstruction as well as for detection.
5. FAULT IDENTIFICATION AND DIAGNOSIS
There has been tremendous interest in diagnosing the possible root causes of a fault situation once it is detected. Thus far the most popular approach to diagnosis is the contribution plot approach [8, 9]. This approach requires no prior knowledge except for a normal PCA or PLS model. The contributions are actually the effects of the fault on the observed vector of measurements. If prior knowledge or historical data of the faults are available, the reconstruction-based approach [10, 12, 13] can lead to more conclusive results. If there are plenty of historical fault records with many fault categories, classification and clustering methods are applicable. In the context of statistical process monitoring, Raich and Cinar [11] apply a similarity index to discriminate among different faults using angles and distances between clusters. Kano et al. [65] use a dissimilarity measure to discriminate between the normal and faulty clusters. By using Fisher discriminant analysis, Chiang et al. [66] achieve maximum separation between the normal and faulty clusters. This approach is more suitable for detecting mean changes than covariance changes.
In the rest of this section we focus on the reconstruction-based approach and the contribution-based approach. Both approaches are applicable to sensor and process fault diagnosis, but they have different characteristics that are often complementary. The contribution plot approach does not require any information about the fault to generate the plots; in the case where prior knowledge is available, it is up to the user to interpret the plots. The reconstruction-based approach, on the other hand, requires knowledge of the fault directions. In the case where faulty data are available for a process fault, it is straightforward to model the fault directions from the faulty data and use them for future fault identification [50, 51]. In this case, knowledge from historical faulty data is built into the method. Owing to the use of
additional information, the reconstruction-based approach can be more conclusive in the results. If a fault has never happened before but is detected for the first time, the reconstruction-based approach can also extract the fault direction from the faulty data. The interpretation of the fault direction can be done similarly to that of the contribution plots. Further, the fault direction can be used for future fault identification via reconstruction. The fault direction extraction will be discussed in Section 6. An additional benefit of the reconstruction-based approach is that it provides a framework to analyze fault detectability, reconstructability and identifiability, which will be reviewed in Section 7.
5.1. Reconstruction-based approach
The reconstruction approach for fault identification consists of finding the true fault from a set of candidate faults. Dunia and Qin [13] propose a fault identification method by assuming each of the faults in $\{\mathcal{F}_j\}$ in turn and performing reconstruction. When $\Xi_j$ is assumed, which may or may not be the true fault, the reconstructed sample vector is $x_j = x - \Xi_j f_j$, and $f_j$ is estimated such that
$$\mathrm{SPE}(x_j) \equiv \|\tilde{x}_j\|^2 = \|\tilde{x} - \tilde{\Xi}_j f_j\|^2 \tag{56}$$
is minimized. The least squares solution for $f_j$ is
detection and diagnosis. This method, however, requires knowing $\Xi_j = \tilde{\Xi}_j + \hat{\Xi}_j$, while the SPE-based method requires only $\tilde{\Xi}_j$. If the process fault directions are modeled from data, as shown in References [50, 51], the estimate of $\hat{\Xi}_j$ is usually less accurate than that of $\tilde{\Xi}_j$ owing to the large normal variability in the PCS, leading to a less accurate estimate for $\Xi_j$.
5.2. Contribution plots
Contribution plots are well known diagnostic tools for fault identification [8, 9, 40, 43, 67]. The commercial use of contribution plots was patented by Hopkins et al. [68] in 1995. The most common indices used for fault diagnosis with contribution plots are SPE and $T^2$. Contribution plots on SPE indicate the significance of the effect of each variable on the index at different sampling times. If a sample vector $x$ has an abnormal SPE, the variables that appear to have a significant contribution are investigated. A contribution plot on PCA scores indicates the significance of the effect of each variable on the $T^2$ index. The variables with the largest contribution are considered major contributors to the fault. The contribution for SPE is simply breaking down the summation of SPE (Equation (16)) into each element:
X
m X
m
~ þ~
fj ¼ ð57Þ SPE ¼ x~2i ¼ SPEi ð62Þ
j x
i¼1 i¼1
The reconstructed vector ~xj can be related to the fault-free
where SPEi ¼ x~2i is the contribution of the ith variable. The T 2
vector ~x as
distribution is not as clearly defined owing to the definition
~ j fj ¼ ðI
~xj ¼ ~x ~ j
~ þ Þ~ of T 2 . Miller et al. [8] define the contribution for each PC and
j x
each variable, which is difficult to use in practice. Nomikos
~ þ Þ~
~ j
¼ ðI ~ ~þ ~
j x þ ðI j j Þi f ð58Þ
[69] defines a T 2 distribution that involves cross-talk among
When the actual fault i is assumed, i.e. j ¼ i, the second variables, which could lead to negative contributions. Qin
term in Equation (58) is zero, which leads to et al. [70] define a T 2 contribution that eliminates the cross-
talk among variables. Westerhuis et al. [71] propose other
~ i
xi ¼ ðI
~ ~ þ Þ~
ð59Þ
i x generalizations to the T 2 contributions by including all
~ þ is a projection matrix,
~ i principal components. Upper control limits for contribution
Since I i
plots are discussed in References [70–72].
~ i
jjðI ~ þ Þ~
x jj
i x jj jj~ ð60Þ The contribution plots are very easy to calculate, with no
prior knowledge required to generate the plots. Prior knowl-
The SPE of the reconstructed vector is
edge, however, is often used and required to interpret the
xi jj2 ¼ jjðI
SPEðxi Þ ¼ jj~ ~ i
~ þ Þ~ 2
i x jj
plots. As explained by Kourti and MacGregor [40], the
ð61Þ contribution plots may not explicitly identify the cause of
x jj2 ¼ SPEðx Þ 2
jj~
an abnormal condition, but they determine the entries in x
Therefore, when the true fault is assumed, the reconstructed that are not consistent with the normal operating conditions.
SPE is brought into the control limit. If ~ j 6¼
~ i , the last term The reason is that the contribution from one variable is
in Equation (58) is not zero, which makes SPEðxj Þ outside the propagated to other variables in calculating the projection
SPE control limit given a large enough fault magnitude. The ~
x. This ‘smearing’ effect can reduce the difference between
issue of fault identifiability between i and j is discussed by contributing and non-contributing variables, which in the
Dunia and Qin [13]. Section 7 will discuss briey the fault extreme case can lead to mis-identification. The reconstruc-
identifiability issue for the case of unidimensional faults. In tion-based approach with known fault directions completely
summary, if SPEðxj Þ 2 , j is considered as the fault; if eliminates the fault when the actual fault direction is used for
several j are identified, then the true fault is not uniquely reconstruction. In the case of arbitrary process fault direc-
identifiable but is identified to a subset. tions the reconstruction-based approach completely re-
Yue and Qin [51] identify faults based on the combined moves the effect of the fault and brings SPE within the
index. If the reconstruction in a fault subspace leads to a normal control limit.
feasible solution in the normal region, the fault subspace is Owing to limited redundancy or correlation among the
considered as the true fault. The reconstruction method process variables, it is possible that some faults may not be
minimizes ’ along j in this case. This method is appropriate identifiable. In this case one should be cautious in drawing
when both SPE and T 2 are important indices for fault conclusions from any diagnosis methods. For example, for
the case of two variables and one PC the loading matrices can be parametrized as
\[ \mathbf{P} = \begin{bmatrix} \sin\theta \\ -\cos\theta \end{bmatrix}, \qquad \tilde{\mathbf{P}} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]
Consider a fault in sensor 1. The reconstruction-based identification results along sensor 1 and sensor 2 directions are
\[ \mathrm{SPE}(\mathbf{x}_1) = 0 < \delta^2, \qquad \mathrm{SPE}(\mathbf{x}_2) = 0 < \delta^2 \]
which indicates explicitly that one of the two is faulty, but they are not identifiable further owing to lack of redundancy. Using the contribution plot approach, the SPE contributions are
\[ \mathrm{SPE}_1 = \tilde{x}_1^2 = \cos^2\theta \,(x_1 \cos\theta + x_2 \sin\theta)^2, \qquad \mathrm{SPE}_2 = \tilde{x}_2^2 = \sin^2\theta \,(x_1 \cos\theta + x_2 \sin\theta)^2 \]
Therefore $\mathrm{SPE}_2/\mathrm{SPE}_1 = \tan^2\theta$ regardless of the fault in sensor 1. If the two variables are scaled to unit variance, the angle $\theta$ will be 45° and the contributions $\mathrm{SPE}_1$ and $\mathrm{SPE}_2$ are about the same. However, if variable 2 is more important and is given more weight, the $\mathrm{SPE}_2$ contribution will always be larger, which leads to mis-identification. The correct answer to this problem is that there is not enough redundancy to identify the two faults; the identifiability issue will be discussed in Section 7. A similar example is illustrated in Reference [73].
It might also be noted that the contribution plot approach is dependent on the scaling of variables. To demonstrate the effect of scaling on the contribution- and reconstruction-based approaches, we give the following example.

Example 2. Sensor faults
The data generating equation for this example is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0.3873 & 0.1190 \\ 0.1291 & 0.2379 \\ 0.9037 & 0.1530 \\ 0.1291 & 0.9518 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + \text{noise} \]
where $s_1$ and $s_2$ are zero-mean random sequences with standard deviations of 1 and 0.8 respectively. We generate 100 data samples to build a PCA model and then generate an additional 12 samples with a bias fault in sensor 3. The noise standard deviation is 0.2 with normal distribution. The standard deviation of the normal data is $[0.47 \;\; 0.34 \;\; 0.99 \;\; 0.85]^T$. We perform fault diagnosis using reconstruction-based and contribution plot approaches. We also consider the case where it is required not to scale the variables owing to other reasons, and the case where the variables are scaled to unit variance.
The fault identification results using the reconstruction-based approach and the contribution plots are shown in Figure 3. When the variables are not scaled to unit variance, the reconstruction-based approach correctly identifies that

Figure 3. Sensor fault identification results for Example 2. Top plots: each group has four bars representing reconstructed SPE along four sensor directions; smallest value indicates a fault. Bottom plots: each group has four bars representing contributions of four sensors; largest value indicates a fault.
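The two-variable, one-PC case discussed before Example 2 can be checked numerically. The following is a minimal sketch, not the author's code: the function names, the angle of 30 degrees and the sample values are illustrative assumptions, and the minus sign placed in the loading vector is an assumption chosen so that it is orthogonal to the residual loading $[\cos\theta \;\; \sin\theta]^T$.

```python
import numpy as np


def residual_projector(P):
    """Return I - P P^T, the projector onto the residual subspace."""
    return np.eye(P.shape[0]) - P @ P.T


def spe_contributions(x, P):
    """SPE contribution of each variable: squared entries of the residual."""
    return (residual_projector(P) @ x) ** 2


def reconstructed_spe(x, P, xi):
    """SPE after reconstructing x along a single fault direction xi."""
    C_res = residual_projector(P)
    x_res, xi_res = C_res @ x, C_res @ xi
    f = (xi_res @ x_res) / (xi_res @ xi_res)  # least squares fault magnitude
    return float(np.sum((x_res - f * xi_res) ** 2))


theta = np.deg2rad(30.0)                           # illustrative angle (assumption)
P = np.array([[np.sin(theta)], [-np.cos(theta)]])  # one PC; sign is an assumption
x = np.array([3.0, 0.5])                           # sample with a bias on sensor 1

contrib = spe_contributions(x, P)
ratio = contrib[1] / contrib[0]                    # compare with tan(theta)**2
spe_along_1 = reconstructed_spe(x, P, np.array([1.0, 0.0]))
spe_along_2 = reconstructed_spe(x, P, np.array([0.0, 1.0]))
print(ratio, spe_along_1, spe_along_2)
```

In this sketch the contribution ratio depends only on the angle, so rescaling the variables changes which contribution looks larger, while reconstruction along either sensor drives SPE to zero. Both observations agree with the conclusion that the two faults cannot be separated with this little redundancy.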
faults that are important to the safety and health of the process. The contribution plot approach does not require prior knowledge to calculate the contributions, but requires process knowledge to interpret the contributions.
The use of contribution plots is not always to find the largest contributions. In Example 3 the sign of the contributions is much more informative. For a more complex, realistic process it is up to the user’s experience to determine whether one should look at the magnitude or the sign of the contributions, or a combination of both sign and magnitude. For large processes with possibly hundreds of variables it can be overwhelming to examine either the magnitude or the sign of the contributions. Multiblock approaches in this case can be much more informative [9,70]. In the next section the contribution plot approach is discussed in a hierarchical framework with an industrial example. Fault directions extracted from fault data are interpreted similarly to contribution plots and are used for fault identification via reconstruction.
So far we have discussed mainly SPE contributions. The $T^2$ contributions can also be useful, but the contributions to $T^2$ break down to individual principal components. The lumped $T^2$ contributions for each variable have to incur some approximation [69–71]. The reconstruction-based approach can also be done based on $T^2$ or the combined index of References [51, 74].
In the case where no prior knowledge is available about the fault directions, the reconstruction-based approach can be implemented similarly to the contribution plot approach. In this approach, one simply assumes that each variable is a potential contributor and reconstructs along each variable to minimize the distance to the normal model. The distance measure can be SPE or Mahalanobis distance. The largest correction indicates a large contribution to the fault situation. This approach is suggested by Runger et al. [73]. Similarly to the contribution plot approach, this approach is also sensitive to variance scaling. A major advantage of this approach over the contribution plot approach is that the calculation of $T^2$ contributions does not require approximations as done in References [69–71]. Process knowledge is also needed to interpret these reconstruction-based contributions. The reconstruction-based contributions can also be calculated for the combined index or any quadratic indices.

6. FAULT DIAGNOSIS OF AN INDUSTRIAL FILM PROCESS
This case study shows how the reconstruction-based approach and the hierarchical contribution plot approach can be used for complex process fault diagnosis. The process is a polyester film manufacturing process which is studied in Reference [70]. The raw material, a polyethylene polymer that comes from batch reactors, is first extruded in a chill roll drum to form a film. The film is then biaxially oriented and stretched first in the machine direction and then in the transverse direction. The orientation is accomplished by passing the film over rollers that run at increasingly faster speed (300 m min$^{-1}$), then fed into a tenter oven, where it is pulled at right angles (transverse direction orientation). This stretching rearranges the polymer molecules into an orderly structure to substantially improve the film’s mechanical properties. Finally, the film is heat-set to stabilize it. A number of faults can occur in the process. A typical fault is a sudden oscillation in some temperature loops by about 10 °C, which severely affects the quality of the film product. It is desirable to diagnose this fault as soon as it happens.
In the polyester film process there were initially a total of 308 variables, which were reduced to 103 variables with the help of the plant engineers. The variables in the total data set include process variables, setpoints, output variables and monitoring variables. For this analysis, process variables and monitoring variables were used. After a preliminary analysis it was clear that the first 1167 samples can be used to generate the normal PCA model. The period from sample 1168 to sample 1417 contains the typical fault which can be used to test different fault diagnosis methods. For the normal PCA model the number of principal components is determined to be 15 using the variance of the reconstruction error [63].
Figure 6 shows an example of contribution plots for the polyester film process for the faulty samples. The SPE in the top-left plot shows significant violation of the control limit. The bottom-right plot shows a typical contribution plot. For this faulty period the highest contributing variable is shown in the top-right plot, which frequently points to variable 28, variable 25 and sometimes variable 32. The faulty data for variable 28 and its PCA model projection are shown in the bottom-left plot, which shows a large difference between them. This indicates that the contribution plot points to the contributing variable correctly.
Owing to the large number of variables in this example, it can be difficult to interpret the contribution plot when there are many competing large contributors, as in the case of the bottom-right plot in Figure 6. In the next subsection we discuss the use of the reconstruction-based approach for fault diagnosis. The faulty data are divided into two parts. The first part is used to extract the fault direction matrix $\boldsymbol{\Xi}_i$ to characterize this fault. The fault direction matrix is then used to identify major contributing variables. The second part of the faulty data is treated as new faulty data, and the fault direction matrix extracted earlier is used to identify that the second part is essentially the same fault as the first part.

6.1. Fault subspace extraction
When a process is under faulty condition, its measurement will contain the normal values of the process variables and the fault. The $k$th sample under fault $\mathcal{F}_i$ can be projected to the residual subspace:
\[ \tilde{\mathbf{x}}(k) = \tilde{\mathbf{x}}^*(k) + \tilde{\boldsymbol{\Xi}}_i \mathbf{f}(k) \]
It is often the case that the faulty residual is much larger than the normal part projected on the residual subspace, i.e.
\[ \|\tilde{\mathbf{x}}(k)\| \gg \|\tilde{\mathbf{x}}^*(k)\| \]
Then we have
\[ \tilde{\mathbf{x}}(k) \approx \tilde{\boldsymbol{\Xi}}_i \mathbf{f}(k) \]
Collecting $p$ observations under fault $\mathcal{F}_i$ and denoting
\[ \mathbf{X}_i = [\mathbf{x}(1) \;\; \mathbf{x}(2) \;\; \cdots \;\; \mathbf{x}(p)]^T \]
directions is demonstrated in the next subsection. In the limiting case where only one fault observation is used in fault direction extraction, the fault direction in Equation (63) reduces to the standard contributions to SPE. For a multidimensional fault the number of observations required for extracting the fault direction should be larger than the dimension of the fault; more observations often lead to better estimates of the fault directions. The SVD approach to extracting fault directions is not simply averaging the fault contributions over multiple observations. It extracts the common, significant variations due to the effect of the fault, similarly to applying PCA on the faulty residuals.
With the knowledge of $\tilde{\boldsymbol{\Xi}}_i$, Equation (56) can be calculated. Now that we know $\mathrm{SPE}(\mathbf{x}_i)$ and $\mathrm{SPE}(\mathbf{x})$, we can define a fault identification index (FII)
\[ \eta_i = \frac{\mathrm{SPE}(\mathbf{x}_i)}{\mathrm{SPE}(\mathbf{x})} \]
whose values range from 0 to 1. If $\eta_i$ is close to one, $\mathcal{F}_i$ is not likely the true fault, since it offers little correction in SPE. If $\eta_i$ is close to zero, the fault has been identified. The FII works well for process fault identification even though the fault direction estimation is not very accurate.

6.2. Reconstruction-based approach
By applying fault direction extraction to the polyester film manufacturing process, we obtain in Figure 7 the directions extracted from the faulty data. These extractions are made using the data from samples 1–125 in Figure 6. Observe in Figure 7 that variable 28 in dimension 1 and variable 25 in dimension 2 dominate these directions. Variable 32 in dimension 3 is also significant and variable 40 dominates dimension 4. This result is essentially consistent with the contribution plot results in Figure 6, with the additional suspect, variable 40, being identified. Therefore it is possible to identify variables that are causing the upset in the process using fault direction information.
To adequately extract all dimensions of the fault, the reconstructed SPE after extracting the fault directions should be deflated within the normal SPE limit. Figure 8 shows that the SPE is deflated every time a dimension is extracted from the faulty data. After extracting dimension 1, the major hump is deflated. After extracting dimension 2, the second major hump is deflated. More than 95% of the reconstructed SPEs fall in the control limit with five dimensions, and all reconstructed SPEs fall in the control limit after the eighth dimension.
After the extraction has been done for a known fault, we can use this fault signature to identify that fault in new faulty data. The top plot in Figure 9 shows the SPE for the new data from samples 126–250. When the extracted directions are applied to this faulty section, we observe in the middle plot of Figure 9 that the reconstructed SPE is deflated. The fault identification index, which is the ratio of the reconstructed SPE to the original SPE, is shown in the bottom plot of Figure 9. This result shows that the same fault is identified up to sample 90, because the reconstructed SPE is small and the FII values are close to zero. After sample 90 the reconstructed SPE cannot be adequately deflated, which indicates that a new disturbance enters the process.
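The extraction and identification steps of Sections 6.1 and 6.2 can be sketched as follows. This is a hedged illustration only: Equation (63) is not reproduced in this part of the text, so taking the leading right singular vectors of the stacked faulty residuals is an assumption based on the description of the SVD approach. The three-sensor model with loading $[1 \;\; 1 \;\; 1]^T/\sqrt{3}$, the noise-free data and all function names are likewise hypothetical.

```python
import numpy as np


def extract_fault_directions(X_fault, P, n_dims):
    """Estimate residual-space fault directions from faulty samples (rows)."""
    C_res = np.eye(P.shape[0]) - P @ P.T
    X_res = X_fault @ C_res                      # residuals; C_res is symmetric
    _, _, Vt = np.linalg.svd(X_res, full_matrices=False)
    return Vt[:n_dims].T                         # columns span the fault subspace


def fault_identification_index(x, P, Xi_res):
    """FII: reconstructed SPE divided by the original SPE."""
    C_res = np.eye(P.shape[0]) - P @ P.T
    x_res = C_res @ x
    f = np.linalg.pinv(Xi_res) @ x_res           # least squares fault estimate
    x_rec = x_res - Xi_res @ f
    return float(x_rec @ x_rec) / float(x_res @ x_res)


P = np.ones((3, 1)) / np.sqrt(3.0)               # assumed one-PC model
t = np.array([0.5, -1.0, 2.0, 0.3])              # normal variation along the PC
f = np.array([4.0, 5.0, 6.0, 5.5])               # bias magnitudes on sensor 1
X_fault = np.outer(t, np.ones(3)) + np.outer(f, [1.0, 0.0, 0.0])

Xi_res = extract_fault_directions(X_fault, P, n_dims=1)
same_fault = np.array([1.0, 1.0, 1.0]) + 5.0 * np.array([1.0, 0.0, 0.0])
other_fault = np.array([1.0, 1.0, 1.0]) + 5.0 * np.array([0.0, 1.0, 0.0])
fii_same = fault_identification_index(same_fault, P, Xi_res)
fii_other = fault_identification_index(other_fault, P, Xi_res)
print(fii_same, fii_other)
```

With these assumed inputs the index is essentially zero for a new sample carrying the same sensor 1 bias and stays well above zero for a bias on sensor 2, which is the behaviour the FII is meant to expose. Real data contain noise, so the index would only approach these idealized values.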
In summary, when a fault is detected, the faulty data can be used to extract the fault direction matrix. This fault direction matrix can be used to identify major contributing variables to the fault situation, similarly to the contribution plot approach. Furthermore, the directional knowledge of the fault can be used to identify newly detected faults. If a newly detected fault is reconstructed adequately using a known fault direction, one can conclude that the same fault happens again.

6.3. Hierarchical contribution plots
In the case of a large number of variables it can be difficult to interpret the contribution plots. An effective approach is to use multiple-block fault diagnosis [9]. Qin et al. [70] report the use of hierarchical contribution plots for this industrial process. The process is partitioned into seven blocks as shown in Table I. The partitioning is based on the knowledge of the process in terms of process sections. Observe that the sizes of the blocks are very different. It is suggested to divide the process into sections that describe a unit or a specific physical or chemical operation. After grouping into blocks, the overall SPE and $T^2$ are used to detect a fault. Once a fault is detected, the block SPE and $T^2$ are calculated and examined against their control limit [70]. If a block SPE or $T^2$ is outside the control limit, variable contributions in that block are further examined.
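The drill-down from blocks to variables can be sketched as follows. This is an illustrative sketch under assumptions: the block SPE is taken here as the sum of squared residuals of the variables in a block, and the block names, index lists and control limits are hypothetical placeholders rather than the partition of Table I or the limits of Reference [70].

```python
import numpy as np


def block_spe(x_res, blocks):
    """Sum the squared residuals of the variables belonging to each block."""
    return {name: float(np.sum(x_res[idx] ** 2)) for name, idx in blocks.items()}


def blocks_to_examine(x_res, blocks, limits):
    """Return the blocks whose SPE exceeds its (assumed) control limit."""
    spe = block_spe(x_res, blocks)
    return [name for name in blocks if spe[name] > limits[name]]


x_res = np.array([0.1, -0.2, 3.0, 4.0])          # residual vector of one sample
blocks = {"A": [0, 1], "B": [2, 3]}              # hypothetical two-block partition
limits = {"A": 1.0, "B": 1.0}                    # hypothetical block limits

spe_by_block = block_spe(x_res, blocks)
flagged = blocks_to_examine(x_res, blocks, limits)
print(spe_by_block, flagged)
```

Only the variables inside a flagged block then need a contribution plot, which keeps the number of bars to inspect small even when the process has hundreds of variables.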
responsible for the abnormal situation. The bottom plot of Figure 11 shows the contribution to $T^2$ for Block 2, which again identifies variables 25 and 28 as the major contributors.
The industrial film process clearly demonstrates the complementary nature and duality of the reconstruction-based approach and contribution-based approach to fault diagnosis. The contribution plot approach requires no prior knowledge to calculate the contributions, but needs prior process knowledge to interpret the results. The reconstruction-based approach can be used in two ways. One way is to use it to extract fault directions and interpret the directions similarly to contribution plots. In this way it does not require prior knowledge up front. Another way is to use the extracted fault directions to identify future faults. For important, frequent process faults this approach can be more efficient and conclusive than other approaches, since it uses fault directional knowledge explicitly. The hierarchical monitoring framework shown in this section can be extended to the reconstruction-based approach as well, which leads to even clearer identification results [75].

7. FAULT DETECTABILITY, RECONSTRUCTABILITY AND IDENTIFIABILITY
From both an analytical and a practical point of view it is important to know whether a fault is detectable, reconstructable or identifiable given the available measurement information and redundancy. Such an analysis can help to determine whether it is possible to carry out the fault detection and diagnosis task or whether additional measurement information needs to be collected. This is analogous to the design of a Kalman filter, where the observability needs to be checked first. Dunia and Qin [12, 13] and Yue and Qin [51] give the necessary and sufficient detectability and identifiability conditions for unidimensional and multidimensional faults. In this section we only give an account of the unidimensional fault analysis, as it is easier to visualize geometrically. The multidimensional fault case can be found in References [13, 51] for the SPE- and combined index-based fault detection and diagnosis.

7.1. Fault detectability
The example process in Figure 1(a) can be used to illustrate the concept of fault detectability: a leakage upstream of sensor $x_1$ will affect both $x_1$ and $x_2$ in exactly the same way and is consistent with the PCA model. This leakage cannot be detected using the SPE index. With the help of a geometric interpretation we give explicit conditions for fault detectability for the unidimensional fault case, where $\boldsymbol{\Xi}_i$ becomes a column vector $\boldsymbol{\xi}_i$. In the presence of a
process fault $\mathcal{F}_i$ the sample vector $\mathbf{x}$ can be represented using a fault direction vector $\boldsymbol{\xi}_i$:
\[ \mathbf{x} = \mathbf{x}^* + f \boldsymbol{\xi}_i \tag{64} \]
where $\boldsymbol{\xi}_i$ is normalized and the scalar $f$ represents the magnitude of the fault $\mathcal{F}_i$. Such a fault belongs to the set of possible faults, denoted by $\{\mathcal{F}_j\}$. This fault vector can represent a sensor fault as well as a process fault.
Since $\boldsymbol{\xi}_i \in \mathbb{R}^m$, it can be projected onto $\mathcal{S}_p$ and $\mathcal{S}_r$:
\[ \boldsymbol{\xi}_i = \hat{\boldsymbol{\xi}}_i + \tilde{\boldsymbol{\xi}}_i \tag{65} \]
where $\hat{\boldsymbol{\xi}}_i = \mathbf{C} \boldsymbol{\xi}_i \in \mathcal{S}_p$ and $\tilde{\boldsymbol{\xi}}_i = \tilde{\mathbf{C}} \boldsymbol{\xi}_i \in \mathcal{S}_r$. If $\tilde{\boldsymbol{\xi}}_i = \mathbf{0}$, the following relation results from Equation (64):
\[ \tilde{\mathbf{x}} = \tilde{\mathbf{x}}^* + f \tilde{\boldsymbol{\xi}}_i = \tilde{\mathbf{x}}^* \]
Therefore
\[ \mathrm{SPE}(\mathbf{x}) = \mathrm{SPE}(\mathbf{x}^*) \tag{66} \]
As a consequence, no matter how large $f$ is, the fault is not detectable if $\tilde{\boldsymbol{\xi}}_i = \mathbf{0}$.
Given that $\tilde{\boldsymbol{\xi}}_i \ne \mathbf{0}$, we define
\[ \tilde{\boldsymbol{\xi}}_i^0 \equiv \frac{\tilde{\boldsymbol{\xi}}_i}{\|\tilde{\boldsymbol{\xi}}_i\|} \]
as the normalized residual direction for the fault vector $\boldsymbol{\xi}_i$. With this notation,
\[ \tilde{\mathbf{x}} = \tilde{\mathbf{x}}^* + f \tilde{\boldsymbol{\xi}}_i = \tilde{\mathbf{x}}^* + \tilde{f} \tilde{\boldsymbol{\xi}}_i^0 \tag{67} \]
where $\tilde{f} = f \|\tilde{\boldsymbol{\xi}}_i\|$ is the orthogonal distance of the fault to the PCS.
To guarantee that the fault will be detectable, it is required that $\mathrm{SPE} = \|\tilde{\mathbf{x}}\|^2 > \delta^2$ for all possible normal values of $\mathbf{x}^*$. Figure 12 illustrates geometrically the sufficient condition for detectability in the case of a two-dimensional residual subspace. The vector $\tilde{\mathbf{x}}^*$ can be anywhere inside the circle defined by the upper control limit $\delta$. To guarantee that the fault is detectable, the corrupted sample $\tilde{\mathbf{x}}$ has to be outside the circle, which requires that $|\tilde{f}|$ be larger than the diameter of the circle for the extreme case. In other words, one must have
\[ |\tilde{f}| > 2\delta \tag{68} \]
A general derivation for the case of multidimensional faults can be found in Reference [13].

7.2. Fault reconstructability
The feasibility of calculating $f_i$ assures the existence of $\mathbf{x}_i$, which is the best estimate for $\mathbf{x}^*$ by reconstruction in the direction $\boldsymbol{\xi}_i$. Therefore the condition for a feasible calculation of $f_i$ is identical to the condition for reconstructability along $\boldsymbol{\xi}_i$. From Equation (46) it is noticed that $f_i$ can be calculated when $\tilde{\boldsymbol{\Xi}}_i$ (or $\tilde{\boldsymbol{\xi}}_i$) $\ne \mathbf{0}$, i.e. $\boldsymbol{\xi}_i \notin \mathcal{S}_p$. Intuitively, this condition suggests that the displacement caused by $\mathcal{F}_i$ should not lie in the PCS. Therefore a necessary condition for reconstructability is $\tilde{\boldsymbol{\xi}}_i \ne \mathbf{0}$, which is also the necessary condition for detectability.
Dunia and Qin [13] use the variance of reconstruction error (VRE) to measure the reliability of the reconstruction. The VRE in the direction $\boldsymbol{\xi}_i$ is denoted by $u_i$ and represents the variance of the projection of $\mathbf{x}^* - \mathbf{x}_i$ on the fault direction $\boldsymbol{\xi}_i$:
\[ u_i \equiv \mathrm{var}\{\boldsymbol{\xi}_i^T (\mathbf{x}^* - \mathbf{x}_i)\} = \frac{\tilde{\boldsymbol{\xi}}_i^T E\{\mathbf{x}^* \mathbf{x}^{*T}\} \tilde{\boldsymbol{\xi}}_i}{(\tilde{\boldsymbol{\xi}}_i^T \tilde{\boldsymbol{\xi}}_i)^2} = \frac{\tilde{\boldsymbol{\xi}}_i^T \mathbf{S} \tilde{\boldsymbol{\xi}}_i}{(\tilde{\boldsymbol{\xi}}_i^T \tilde{\boldsymbol{\xi}}_i)^2} \tag{69} \]
where $\mathbf{S}$ denotes the covariance matrix of the normal data.
It is possible that the best reconstruction is worse than using the average of the historical data as a reconstruction if a particular sensor is little correlated with other sensors. In other words, it is possible to have
\[ u_i > \mathrm{var}\{\boldsymbol{\xi}_i^T (\mathbf{x}^* - \bar{\mathbf{x}})\} = \boldsymbol{\xi}_i^T \mathbf{S} \boldsymbol{\xi}_i \tag{70} \]
In this case the particular sensor or fault is better reconstructed using the historical average instead of the correlation-based PCA model. Dunia and Qin [13] proposed an iterative procedure to determine the number of sensors in the model and the set of faults that can be reliably reconstructed using the PCA model.

Example 4
To illustrate the notion of reconstructability, the process in Figure 2 with five possible faults is considered: $\mathcal{F}_1$ to $\mathcal{F}_3$ are sensor faults, while $\mathcal{F}_4$ and $\mathcal{F}_5$ represent leaks in units A and B. The PCA model for the process with three sensors is
\[ \mathbf{p}^T = \frac{\sqrt{3}}{3} [1 \;\; 1 \;\; 1] \]
The fault direction for $\mathcal{F}_4$ is
\[ \boldsymbol{\xi}_4^T = \frac{\sqrt{3}}{3} [1 \;\; 1 \;\; 1] \]
which makes $\tilde{\boldsymbol{\xi}}_4 = \mathbf{0}$. Therefore this fault is neither reconstructable nor detectable. The physical significance is that the fault direction is consistent with the normal variation in the data.
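The conditions of Sections 7.1 and 7.2 can be checked numerically for the model of Example 4. This is a minimal sketch under stated assumptions: the covariance matrix `S`, the control limit `delta` and the function names are hypothetical, chosen only to make the arithmetic easy to follow; the loading vector and the direction of the leak fault are the ones given in Example 4.

```python
import numpy as np


def residual_direction(xi, P):
    """Project a fault direction onto the residual subspace."""
    return xi - P @ (P.T @ xi)


def min_detectable_magnitude(xi, P, delta):
    """Smallest |f| meeting the sufficient condition |f| * ||xi_res|| > 2 * delta."""
    norm = np.linalg.norm(residual_direction(xi, P))
    return np.inf if np.isclose(norm, 0.0) else 2.0 * delta / norm


def vre(xi, P, S):
    """Variance of reconstruction error along xi, following Equation (69)."""
    xi_res = residual_direction(xi, P)
    return float(xi_res @ S @ xi_res) / float(xi_res @ xi_res) ** 2


p = np.ones(3) / np.sqrt(3.0)
P = p.reshape(3, 1)
S = 3.0 * np.outer(p, p) + 0.04 * np.eye(3)      # assumed covariance of normal data
delta = 0.5                                      # assumed SPE control limit

xi_leak = p.copy()                               # direction of the leak fault
xi_sensor = np.array([1.0, 0.0, 0.0])            # sensor 1 fault direction

f_leak = min_detectable_magnitude(xi_leak, P, delta)      # infinite: undetectable
f_sensor = min_detectable_magnitude(xi_sensor, P, delta)
u_sensor = vre(xi_sensor, P, S)
use_pca = u_sensor < float(xi_sensor @ S @ xi_sensor)     # compare with Equation (70)
print(f_leak, f_sensor, u_sensor, use_pca)
```

Under these assumptions the leak direction has no residual component, so no magnitude makes it detectable by SPE, while the sensor fault is guaranteed detectable once its magnitude exceeds the returned threshold. The VRE comparison indicates whether the PCA-based reconstruction is preferable to the historical average for that sensor.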
24. Negiz A, Cinar A. Statistical monitoring of multivariate dynamic processes with state space models. AIChE J. 1997; 43: 2002–2020.
25. Misra M, Qin SJ, Yue H, Ling C. Multivariate process monitoring and fault identification using multi-scale PCA. Comput. Chem. Eng. 2002; 26: 1281–1293.
26. Nomikos P, MacGregor JF. Multivariate SPC charts for monitoring batch processes. Technometrics 1995; 37: 41–59.
27. Mah RSH, Stanley GM, Downing D. Reconciliation and rectification of process flow and inventory data. Ind. Eng. Chem. Process Design Develop 1976; 15: 175.
28. Romagnoli JA, Stephanopoulos G. Rectification of process measurement data in the presence of gross errors. Chem. Eng. Sci. 1981; 36: 1849–1863.
29. Narasimhan S, Mah RSH. Generalized likelihood ratios for gross error identification. AIChE J. 1987; 33: 1514–1521.
30. Narasimhan S, Mah RSH. Generalized likelihood ratios for gross error identification in dynamic processes. AIChE J. 1988; 34: 1321–1331.
31. Chow EY, Willsky AS. Analytical redundancy and the design of robust failure detection systems. IEEE Trans. Automatic Control 1984; 29: 603–614.
32. Isermann R. Process fault detection based on modeling and estimation methods - a survey. Automatica 1984; 20: 387–404.
33. Gertler J. Survey of model-based failure detection and isolation in complex plants. IEEE Control Syst. Mag. 1988; 12: 3–11.
34. Benveniste A, Basseville M, Moustakides G. The asymptotic local approach to change detection and model validation. IEEE Trans. Automatic Control 1987; 32: 583–592.
35. Frank PM. Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy - a survey and some new results. Automatica 1990; 26: 459–474.
36. Frank PM. Analytical and qualitative model-based fault diagnosis - a survey and some new results. Eur. J. Control 1996; 2: 6–28.
37. Gleason TC, Staelin R. A proposal for handling missing data. Psychometrika 1975; 40: 229–252.
38. Martens H, Naes T. Multivariate Calibration. Wiley: New York, 1989.
39. Nelson PRC, Taylor PA, MacGregor JF. Missing data methods in PCA and PLS: score calculations with incomplete observations. Chemometrics Intell. Lab. Syst. 1996; 35: 45–65.
40. Kourti T, MacGregor JF. Multivariate SPC methods for monitoring and diagnosing of process performance. Proc. PSE, 1994; 739–746.
41. Tong H, Crowe CM. Detection of gross errors in data reconciliation by principal component analysis. AIChE J. 1995; 41: 1712–1722.
42. Kramer MA. Autoassociative neural networks. Comput. Chem. Eng. 1992; 16: 313–328.
43. MacGregor JF. Statistical process control of multivariate processes. Prepr. IFAC ADCHEM, 1994.
44. MacGregor JF, Kourti T. Statistical process control of multivariate processes. Control Eng. Pract. 1995; 3: 403–414.
45. Wise BM, Gallagher NB. The process chemometrics approach to process monitoring and fault detection. J. Process Control 1996; 6: 329–348.
46. Chiang LH, Russell EL, Braatz RD. Fault Detection and Diagnosis in Industrial Systems. Springer: London, 2001.
47. Wise BM, Ricker NL, Veltkamp DF, Kowalski BR. A theoretical basis for the use of principal component models for monitoring multivariate processes. Process Control Qual. 1990; 1: 41–51.
48. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics Intell. Lab. Syst. 1987; 2: 37–52.
49. Yoon S, MacGregor JF. Fault diagnosis with multivariate statistical models. Part I: using steady state fault signatures. J. Process Control 2001; 11: 387–400.
50. Valle-Cervantes S, Qin SJ, Piovoso MJ, Bachmann M, Mandakoro N. Extracting fault subspaces for fault identification of a polyester film process. Proc. ACC, Arlington, VA, 2001; 4466–4471.
51. Yue H, Qin SJ. Reconstruction based fault identification using a combined index. Ind. Eng. Chem. Res. 2001; 40: 4403–4414.
52. Ljung L. System Identification: Theory for the User. Prentice-Hall: Englewood Cliffs, NJ, 1999.
53. Yoon S, MacGregor JF. Statistical and causal model-based approaches to fault detection and isolation. AIChE J. 2000; 46: 1813–1824.
54. Li W, Qin SJ. Consistent dynamic PCA based on errors-in-variables subspace identification. J. Process Control 2001; 11: 661–678.
55. Raich A, Cinar A. Statistical process monitoring and disturbance diagnosis in multivariate continuous processes. AIChE J. 1996; 42: 995–1009.
56. Jackson JE, Mudholkar G. Control procedures for residuals associated with principal component analysis. Technometrics 1979; 21: 341–349.
57. Box GEP. Some theorems on quadratic forms applied in the study of analysis of variance problems. I. Effect of inequality of variance in the one-way classification. Ann. Math. Statist. 1954; 25: 290–302.
58. Tracy ND, Young JC, Mason RL. Multivariate control charts for individual observations. J. Qual. Technol. 1992; 24: 88–95.
59. Hawkins DM. The detection of errors in multivariate data using principal components. J. Am. Statist. Assoc. 1974; 69: 340–344.
60. Mardia KV. Mahalanobis distances and angles. In Multivariate Analysis-IV, Krishnaiah PR (ed.). North-Holland: Amsterdam, 1977; 495–511.
61. Takemura A. A principal decomposition of Hotelling’s T2 statistic. In Multivariate Analysis-VI, Krishnaiah PR (ed.). North-Holland: Amsterdam, 1985; 583–597.
62. Wold H. Nonlinear estimation by iterative least squares procedures. In Research Papers in Statistics, David F (ed.). Wiley: New York, 1966.
63. Qin SJ, Dunia R. Determining the number of principal components for best reconstruction. Proc. 5th IFAC DYCOPS, Corfu, 1998; 359–364.
64. Wold S. Cross validatory estimation of the number of components in factor and principal component analysis. Technometrics 1978; 20: 397–406.
65. Kano M, Nagao K, Hasebe S, Hashimoto I, Ohno H. Statistical process monitoring based on dissimilarity of process data. AIChE J. 2002; 48: 1231–1240.
66. Chiang LH, Russell EL, Braatz RD. Fault diagnosis and Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemometrics Intell. Lab. Syst. 2000; 50: 243–252.
67. Kourti T, MacGregor JF. Multivariate SPC methods for process and product monitoring. J. Qual. Technol. 1996; 28: 409–428.
68. Hopkins RW, Miller P, Swanson RE, Scheible JJ. Method of controlling a manufacturing process using multivariate analysis. US Patent 5442562, 1995.
69. Nomikos P. Statistical monitoring of batch processes. Prepr. Joint Statistical Meet., Anaheim, CA, 1997.
70. Qin SJ, Valle-Cervantes S, Piovoso M. On unifying multiblock analysis with applications to decentralized process monitoring. J. Chemometrics 2001; 15: 715–742.
71. Westerhuis JA, Gurden SP, Smilde AK. Generalized contribution plots in multivariate statistical process monitoring. Chemometrics Intell. Lab. Syst. 2000; 51: 95–114.
Copyright © 2003 John Wiley & Sons, Ltd. J. Chemometrics 2003; 17: 480–502
502 S. J. Qin
72. Conlin AK, Martin EB, Morris AJ. Confidence limits for 81. Bonvin D, Srinivasan B, Ruppen D. Dynamic optimiza-
contribution plots. J. Chemometrics 2000; 14: 725–736. tion in the batch chemical industry. Prepr. Chemical Pro-
73. Runger GC, Alt FB, Montgomery DC. Contributors to cess Control-6, Assessment and New Directions for Research
a multivariate statiscal process control chart signal. (CPC-6), Tuscon, AZ, 2001.
Commun. Statist.—Theory Methods 1996; 25: 2203–2213. 82. Pranatyasto TN, Qin SJ. Sensor validation and process
74. Cherry G, Good R, Qin SJ. Semiconductor process mon- fault diagnosis for FCC units under MPC feedback.
itoring and fault detection with recursive multiway PCA Control Eng. Pract. 2001; 9: 877–888.
based on a combined index. AEC/APC Symp. XIV, Salt 83. McNabb CA. MIMO control performance monitoring
Lake City, UT, 2002. based on subspace projections. PhD Thesis, University
75. Valle-Cervantes S. Plant-wide monitoring of processes of Texas at Austin, 2002.
under closed-loop control. PhD Thesis, University of 84. Qin SJ. Control performance monitoring—a review and
Texas at Austin, 2001. assessment. Comput. Chem. Eng. 1998; 23: 178–186.
76. Westerhuis JA, Kourti T, MacGregor JF. Analysis of mul- 85. Harris TJ, Seppala CT. Recent developments in control-
tiblock and hierarchical PCA and PLS models. J. Chemo- ler performance monitoring and assessment techniques.
metrics 1998; 12: 301–321. Chemical Process Control—CPC VI, Tuscon, AZ, 2002;
77. Smilde A. Comments on three-way analyses used for 208–222.
batch process data. J. Chemometrics 2001; 15: 15–27. 86. Kozub DJ. Controller performance monitoring and diag-
78. Wang Y, Seborg D, Larimore W. Process monitoring nosis: experiences and challenges. Fifth Int. Conf. on Che-
based on canonical variate analysis. Proc. ADCHEM 97, mical Process Control, Tahoe, CA, 1996; 83–96.
Banff, 1997; 523–528. 87. Harris TJ, Seppala CT, Desborough LD. A review of per-
79. Qin SJ, Li W. Detection and identification of faulty sen- formance monitoring and assessment techniques for uni-
sors in dynamic processes. AIChE J. 2001; 47: 1581–1593. variate and multivariate control systems. J. Process
80. Woodall WH, Montgomery DC. Research issues and Control 1999; 9: 1–17.
ideas in statistical process control. J. Qual. Technol.
1999; 31: 376–386.