Data-Driven Inter-Turn Short Circuit Fault Detection in Induction Machines
Received September 2, 2017, accepted October 9, 2017, date of publication October 24, 2017,
date of current version November 28, 2017.
Digital Object Identifier 10.1109/ACCESS.2017.2764474
ABSTRACT Inter-turn short circuit (ITSC) fault is one of the critical electrical faults in induction motors that affects the reliability of many industrial applications. Although the use of data-driven fault detection techniques has gained much interest, the main deterrent to using these approaches for detecting ITSC faults lies in the generalization and robustness of the diagnosis. In this paper, a data-driven on-line fault detection framework, incorporating multi-feature extraction/selection and a multi-classifier ensemble, is proposed, capable of detecting ITSC faults in induction motors (IMs) that are subjected to variable operating conditions. Using the synchronous time series signals collected from the machines, multiple feature extraction/selection methods are explored to find sensitive fault features, and different types of classification strategies are used to increase the diversity of the single base models. With the increased diversity of the base learners, the fault detection accuracy is expected to be enhanced and the robustness can be guaranteed. The framework was implemented and tested using real data collected from a designed test bed, with the experimental results showing the effectiveness of the framework in detecting ITSC faults in IMs.
INDEX TERMS Data-driven, fault diagnosis, induction motor, inter-turn short circuit.
2169-3536 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 5, 2017.

Z. Xu et al.: Data-Driven ITSC Fault Detection in Induction Machines
For example, as an indication of short circuit failure in the windings, it is common to detect changes in the spectrum of the negative sequence voltage. However, the use of such conventional methods is often only for detecting specific known types of abnormalities, and is therefore unable to detect any new abnormal behaviour present in the system. In addition, if the spectrum of a healthy IM is close to or overlaps with that of a faulty IM, it is difficult to distinguish the faulty from the healthy operating conditions. Furthermore, even in safe conditions, the frequency components depend on the speed and power supply, which these strategies are not well adapted to, and they are often only applicable to machines under steady state working conditions, i.e., at constant speed and load. In overcoming these limitations, advanced signal processing techniques and high-resolution spectral analysis such as negative- and zero-sequence currents [8] and wavelet based analysis [9] were suggested, with the robustness of these methods still being questioned [10].

The second category uses model-based approaches [11]–[14], which require physical and mathematical knowledge of the process a priori. The fault diagnosis is realized by generating features such as specific residuals, parameter estimates, state estimates, etc. However, the downside to this category of methods is that, in many situations, the complexity of the systems under observation makes it almost impossible to derive robust and accurate models for online applications. Moreover, these methods assume accurate knowledge of the model parameters, which is not the case in practice, as uncertainties often exist, leading to high false alarm rates [10].

A good alternative to the above-mentioned two categories is the third category, which prescribes the use of data-driven approaches to detect ITSC faults. This is undertaken by evaluating the large quantity of available data, collected from non-intrusive and inexpensive sensors that are already implemented in current IM control systems without disturbing the normal operation of the machines. In the area of on-line monitoring of IMs, several data-driven methods have been applied, with multivariate statistical process monitoring methods and machine learning approaches [15]–[20] to name a few. However, in using these methods, the assumption was often made that all faults are known a priori. With the exception of some common faults, not all faults can be identified before the design of the diagnostic system, thus risking the misclassification of unknown faults. Although fault detection is a reasonably mature field of research, there are very few techniques developed with real-time operation in mind and with the ability to predict incipient faults early enough. Furthermore, an IM can be subjected to various operating states that can be considered normal, arising from the different speed and/or loading conditions throughout the lifetime of the machine. In detecting the specific modes of failure experienced throughout the lifetime of the machine, the failure detection method must be robust and generalizable over a range of operating conditions.

Some papers have addressed motor fault diagnosis based on ensemble approaches. [21] investigated fault feature extraction of mechanical anomalies in induction motor bearings using an ensemble super-wavelet transform. [22] employed vibration signals from normal bearings and bearings with three different fault locations; bearing fault detection was based on a hybrid ensemble detector and empirical mode decomposition. In [23], the stator current signals of induction motors are obtained using the MCSA method and the signals are then processed to produce a set of harmonic-based features for classification using the FMM-CRFE model. The above literature often uses current or vibration signals as the critical fault indicator; in our experiments, however, no obvious difference was observed in the early stages of ITSC, e.g., 2% and 10%. This implies that simply analysing the current or vibration signals will not guarantee the performance of the fault detection. Furthermore, most existing data-driven approaches cannot generalize to unseen data, meaning that the testing data must have the same characteristics as the training data. However, as the operating conditions of IMs vary, it is not plausible to exhaust all of the working conditions during training. In addition, a framework trained on previous historical data should correctly predict the data generated after training, which may be affected by noise resulting from the re-start of the machine, variations of the system dynamics, etc.

In mitigating the above identified issues in monitoring ITSC faults, a diagnostic condition monitoring framework, based on an ensemble of data-driven techniques using electrical signals from the IM, is proposed as shown in Fig. 1 and explained in detail in Fig. 9 of Section IV. The framework in Fig. 1 begins with synchronous time series signals collected from the machines, acting as inputs. As mining unknown knowledge from data is one of the most important characteristics of data-driven methods, all of the signals were put together as inputs to the proposed framework so that useful signals could be automatically found during training. In the data preparation stage, a sliding window mechanism is used, breaking down the time series data into short segments before any preprocessing is done. Feature extraction/selection methods were then used to extract features and select the most informative ones for the following tasks. This is then followed by the classification step, where models were built based on the selected features for fault detection. The purpose of multiple feature extraction/selection is to explore the sensitive fault features, and that of the different types of classification strategies is to increase the diversity of the single base models. With the increased diversity of the base learners, the ensemble performance is expected to be enhanced and the robustness of the fault detection can be guaranteed. From ensemble learning theory [24], it was argued that the use of a multi-learner ensemble improves the performance as compared to single base learners, provided the base learners are accurate and diverse. Thus, the aim of the proposed framework is to develop an on-line monitoring system, capable of detecting ITSC faults in IMs subjected to variable operating conditions. It is envisaged that the incorporation of these methods in an on-line monitoring setup will make the results sensitive to low severity faults, robust in handling data noise and the operating conditions of the IMs, and fast in detecting ITSC faults.

Experimental studies were conducted on the proposed framework, using healthy and faulty ITSC data generated from a three-phase powered IM. The results from the studies showed that faulty conditions can be distinguished from healthy ones even at low severity, and that unseen faults can be detected under different working conditions.

The rest of this paper is organized as follows. Section II presents the experimental setup and the description of the data collected from IMs. Section III presents the data preprocessing and feature extraction and selection methods. Section IV presents the classification techniques and the proposed ensemble framework for the condition monitoring system. Finally, the results in Section V show the effectiveness of the data-driven fault detection in IMs.

• Generator: controls the load applied to the IM;
• ITSC controller: constructs various types of ITSC faults;
• Signal collector: collects various signals from the sensors.

FIGURE 3. Configuration of the three-phase stator windings with ITSC fault.

Fig. 3 illustrates how an ITSC fault is simulated in the controller. The quantity xa = NA/N represents the relative fraction of the fault in phase A, defined as the ratio between the shorted turns NA and the total turns N in each stator winding. Defined as the short percentage (Short%), it represents the percentage of the stator windings that are short circuited in the test run. The higher the percentage, the more severe the fault is. In simulating the degradation of the fault from incipient to severe, the resistor R is attached to the phase-A stator coil. Different values of Short% and R represent the different levels of fault severity that the IM is subjected to. Based on domain knowledge, the fault severity will increase with an increase in Short% and/or a decrease in R. This phenomenon is illustrated in Fig. 4, where it can be clearly seen that decreasing fault severity implies increasing difficulty in detecting the faults.
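As a minimal illustrative sketch (not code from the paper), the fault-severity ordering described above, where severity grows with Short% = NA/N and with a smaller shunt resistance R, can be expressed as a simple sort key; the (Short%, R) pairs below are example values drawn from the text, not the full experimental matrix.

```python
# Order ITSC test conditions by severity. Severity increases with the
# short percentage (Short% = NA/N) and decreases with the shunt
# resistance R attached to the phase-A coil.
conditions = [
    (0.0, float("inf")),   # healthy: 0% shorted turns, R -> infinity
    (0.02, 50.0),          # incipient: 2% ITSC through 50 ohm
    (0.02, 0.8),           # 2% ITSC through 0.8 ohm (more severe)
    (0.10, 0.8),           # 10% ITSC through 0.8 ohm (most severe here)
]

def severity_key(cond):
    short_pct, r = cond
    # higher Short% and lower R => more severe, so negate R
    return (short_pct, -r)

ranked = sorted(conditions, key=severity_key)  # least to most severe
```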
by a 2-tuple of (speed, load). The 'speed' values represent the nominal speed and correspond to the actual fundamental frequency of the current/voltage of the IM, controlled by the power supplier. The value of speed is varied from 600 rpm to 1400 rpm. The 'load' represents how much work the motor is outputting, with values ranging from 0 N·m to 5 N·m of the IM loading conditions. The sensors pick up data under the different working conditions of the IM, including: phase current from all three phases (IA, IB, IC) and phase voltage from all three phases (VA, VB, VC).

TABLE 1. Data description.

Table 1 shows the description of the three phase currents and voltages collected from the experiments, with the Short% set to four different values. When the IM is healthy, ITSC is 0% and R is initially set to positive infinity to collect the healthy data. After approximately 300 seconds, ITSC is manually set to a certain value of Short%, with the value of R gradually lowered, representing the degradation process. Other faulty data were also collected in the same manner, i.e., from (0%, infinity) to (2%, 50 ohm) and then (2%, 0.8 ohm) and so on. The sensors collect various types of data at the sampling frequency of 5 kHz.

FIGURE 5. Illustration of raw signal segmentation for feature extraction.

III. FAULT INDICATOR
A. DATA PREPARATION
As discussed in Section I, in order to capture and identify specific trends or patterns in the data, a sliding window mechanism is used. As illustrated in Fig. 5, which shows how a raw signal is segmented for the purpose of feature extraction, the mechanism breaks down the long time series data into short segments before any preprocessing is done, thus providing more insight into the machine behavior. Since the sensors acquire data at the frequency of 5 kHz, this implies that there are 5000 data points per second. In our experiments, the window size is set at 2500 data points, or 0.5 second. This value was chosen so as to achieve a good balance between efficiency and effectiveness. If the window size is too big, the time information will be lost as all the data within the window will be considered as a single instance, i.e. the time resolution of the data processing suffers. On the other hand, if the window size is too small, the extracted information from the signal may not be accurate as the window contains too few cycles, compromising the accuracy of the frequency spectrum analysis.

B. FEATURE EXTRACTION
To further extract useful information from the currents (IA, IB, IC) and voltages (VA, VB, VC) in diagnosing the ITSC fault, various feature extraction techniques were employed. Each technique will be used to process all the data from a single window into a set of feature values. By building new features from the original time series, feature extraction will reduce the massive time series data points into a manageable synoptic data structure whilst preserving most of the characteristics of the time series. In addition, the use of feature extraction provides an opportunity to incorporate domain knowledge into the data.

1) FFT
The Fourier Transform (FT) is one of the most popular techniques used in analyzing time series, and the FFT (Fast Fourier Transform) is its fast implementation. Given a vector (x_1, ..., x_N), the time series is represented by its spectrum as

    x(k) = Σ_{j=1}^{N} x_j ω_N^{(j−1)(k−1)}    (1)

where ω_N = e^{−2πi/N} is the Nth root of unity. After the FT, the coefficients obtained are complex values, which are not suitable for most of the existing classifiers. A following step to get real-valued features is often required. Existing means include calculating the real part, the absolute value, the imaginary part, etc. Here, the absolute values of the FT coefficients are used as the extracted features. Based on the corresponding frequency components of the features, two types of FFT-based feature extraction technique can be used:

• Basic FFT (FFT-B): FFT-B uses all frequency components as the extracted features. There are no pre-assumptions made on the input signal and therefore FFT-B can be applied to any type of time series.
• Harmonic FFT (FFT-H): FFT-H uses the harmonic frequency components as the extracted features. An assumption made on the input signal is that there is a fundamental frequency in the signal and that its harmonics play an important role in analyzing the signal. This is true for the phase currents and/or voltages in an IM, and FFT-H can be used to pick out the most important frequency components. Another merit of FFT-H is that the features are independent of the fundamental frequency, i.e. when
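The two FFT-based schemes can be sketched as follows for one 0.5 s window sampled at 5 kHz. This is an illustrative sketch, not the paper's implementation: the rule used here to estimate the fundamental (strongest non-DC spectral peak) and the choice of five harmonics are assumptions for demonstration.

```python
import numpy as np

FS = 5000          # sampling frequency (Hz), as in the experiments
WINDOW = 2500      # window size: 2500 points = 0.5 s

def fft_b_features(segment):
    """FFT-B: absolute values of all (one-sided) FFT coefficients."""
    return np.abs(np.fft.rfft(segment))

def fft_h_features(segment, n_harmonics=5):
    """FFT-H: magnitudes at the fundamental frequency and its harmonics.

    The fundamental is estimated from the data itself (strongest non-DC
    peak), so the features track the actual, not the nominal, frequency.
    """
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / FS)
    k0 = 1 + int(np.argmax(spectrum[1:]))     # bin of the fundamental
    idx = [k0 * h for h in range(1, n_harmonics + 1) if k0 * h < len(spectrum)]
    return freqs[k0], spectrum[idx]

# Example: a synthetic 50 Hz "phase current" with a small 3rd harmonic.
t = np.arange(WINDOW) / FS
ia = np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 150 * t)
f0, feats = fft_h_features(ia)
```

With a 2500-point window the frequency resolution is 2 Hz, so the 50 Hz fundamental falls exactly on a bin; shorter windows would smear it, which is the window-size trade-off discussed above.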
FIGURE 8. Scatter plot of data with FFT-H features.

FIGURE 9. Framework for fault detection in ITSC.
are healthy or faulty. In practice, there are different types of classifiers, namely linear, non-linear, statistical, kernel based, etc., and each has its own advantages and disadvantages. In our experiments, multiple classifiers were explored for the purpose of the ensemble, including NB, NN [32], [33], linear and nonlinear SVM [34]–[36], and linear and nonlinear ELM [37], [38]. All these models need to be trained using training data prior to testing. In our experiments, offline training is employed for the model training process.

TABLE 2. Classifier parameters.
The step by step implementation of offline training is as follows:
1) The training data was prepared into healthy and faulty data sets.
2) A sliding window mechanism was used to break down the healthy and faulty data sets from a long time series into short segments before any preprocessing was undertaken, as explained in Section III-A.
3) Each data segment then goes into the four feature extraction blocks, and new features were built from the data in the segment by the four extraction methods independently. The specific features extracted by each method were introduced in Section III-B.
4) The extracted features from the four feature extraction methods were put together as inputs for feature selection. Fisher's ratio was used as the feature selection method to evaluate the importance of all the features based on (2). Only high ranked features were retained for classification. The number of features kept was decided by the criterion that the classification performance will not be much improved by incorporating more features as inputs to the classifiers.
5) The selected features then act as inputs to each classifier. Models were trained off-line and the parameters of each classifier were decided. The performance of the classifiers was then evaluated by using the definitions of TPR and TNR as introduced in Section V-A.
6) The individual outputs of each classifier were then combined at the ensemble stage using the majority voting approach.

TABLE 3. Experimental setting for feature extraction.

During offline training, the parameters of each block in the framework were decided and the models were saved for online testing. The parameters of the classifiers are given in Table 2, with Table 3 showing the four techniques and their extracted features as well as the number of extracted features from each input signal. For on-line testing, as long as a data segment is available as input, the processing in Fig. 9 will take place step-by-step and the ensuing results will be displayed. Since online testing is based on the models saved during offline training and only one data segment is processed at a time, the detection of the ITSC faults was found to be very fast. Besides giving improved performance, the proposed framework was found to be easy to extend and modify, with the possibility of incorporating new data and/or new techniques in each stage.

V. EXPERIMENTAL VERIFICATION
A. PERFORMANCE EVALUATION
Classification performance evaluation is an important stage in the construction of a fault detection system, as it acts as an index feeding back to the previous stages for improving the outputs from each stage or the whole system. It is also a way to reflect the ability of the classification system, helping to build confidence for users. For a binary classification problem, without loss of generality, assuming the healthy class is 'positive' and the faulty class is 'negative', each classified instance will belong to one of the following four categories:
• True Positive (TP): the instance is predicted as 'positive' when it is actually positive;
• False Positive (FP): the instance is predicted as 'positive' when it is actually negative;
• True Negative (TN): the instance is predicted as 'negative' when it is actually negative;
• False Negative (FN): the instance is predicted as 'negative' while it is actually positive.
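A minimal sketch of how these four counts yield the rate measures used in the evaluation below; the example labels are invented for illustration, with 1 denoting the 'positive' (healthy) class and 0 the 'negative' (faulty) class.

```python
# Compute TPR, TNR and overall accuracy from 0/1 labels.
def rates(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn)                      # true positive rate
    tnr = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + fp + tn + fn)     # overall accuracy
    return tpr, tnr, acc

# Illustrative labels: 4 healthy (1) and 4 faulty (0) instances.
tpr, tnr, acc = rates([1, 1, 1, 1, 0, 0, 0, 0],
                      [1, 1, 1, 0, 0, 0, 1, 0])
```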
FIGURE 10. Classification results under different numbers of features by using PLP.
In this paper, the True Positive Rate (TPR) and True Negative Rate (TNR) are utilized as measures to evaluate the performance of the fault detection. TPR is the proportion of correctly predicted positive instances, defined as:

    TPR = TP / (TP + FN)    (3)

TNR measures the proportion of negative instances that are correctly identified, defined as:

    TNR = TN / (TN + FP)    (4)

TABLE 4. Performance evaluation methods.

Table 4 lists the three evaluation methods that are used in the experiments to evaluate the performance of the classification. The 10CV is to randomly split the data into 10 equal folds; each fold is then selected in turn as testing data, with the rest as training data. For time series data, the performance evaluated using the 10CV method is optimistic, as data appearing in the past and at some future time would be used to make predictions for the data appearing in-between the past and the future. However, a classification model trained using previous historical data may not be able to correctly predict the data generated after training, which may be affected by the re-start of the machine, variations of the system dynamics, etc. In order to verify whether these data variations can be handled, 'Time-Generalization' is used to ensure that the testing data will always appear in time after the training data. In this way, the impact of the time sequence will also be investigated. In addition, most existing data-driven approaches are unable to generalize to unseen data, implying that the testing data must have the same characteristics as the training ones. However, it is often not plausible to exhaust all of the Short% values during training. Therefore, 'Short%-Generalization' is employed to test the generalization from a group of Short% values to another unseen Short% value, e.g., train on 2% and 5%, test on 10%.

IMs work in varying operating conditions, e.g. changing load or speed. Using the above-mentioned methods, the following sections will test: 1) whether the framework can detect ITSC in its early stage, e.g., a 2% ITSC fault; 2) the time-generalization ability (robustness); 3) the short percentage generalization ability in multi-load and multi-speed scenarios. The analysis results will be shown in Sections V-B, V-C and V-D, and summarized in Section V-F. Note that the following sections only show representative results due to the space limitation, which are usually the worst case or the most difficult testing scenarios based on the data listed in Table 1.

B. MULTI-LOAD SCENARIO
Load can be connected to or disconnected from the IMs, leading to operating point variations. It is clear from Fig. 8 that the instances under load and no-load conditions may form different clusters even under the same speed. Whilst it is not plausible to validate against all possible conditions, the ability to generalize and detect unseen behaviors is important. In the following part, the above issue is addressed by selecting the appropriate feature extraction methods, classifiers and their combinations for the framework shown in Fig. 9.

1) 10-FOLD CROSS-VALIDATION UNDER MULTI-LOAD SCENARIO
One of the key tasks of the proposed framework is the ability to generalize to unseen scenarios, which assumes there exists an underlying set of rules relating the parts of the input signals or features that do not change when the condition of the IM changes. This relationship, if captured by the classifiers, will enable the framework to function under a wide range of operating conditions. This investigation is thus focused on finding the feature extraction method that can identify signals with little variance under different working conditions across the different settings.

The previously discussed four feature extraction techniques, namely PLP, Wavelet-based feature extraction, FFT-B and FFT-H, were selected and implemented. For comparisons of the performance between the different feature extraction methods, all available healthy (0% short circuit + Inf ohm) and faulty (2%, 5%, 8%, 10%) data under each operating condition (speed, load) were used in the 10CV evaluation to estimate the TPR and TNR. NB was the chosen classifier due to its simplicity and potentially high generalization ability. The accuracy (ACC) is used as another metric to select the number of features according to the classification performance. ACC is the probability that the classifier classifies a randomly selected instance to the correct class, which is the proportion of the total number of predictions that are correct. It is determined using the equation:

    ACC = (TP + TN) / (TP + FP + TN + FN)

Table 3 shows the results from the four feature extraction techniques and their corresponding numbers of extracted features from the experiments. The number of features is selected based on the classification performance. For example, the classification results of 'speed = 1200 rpm and load = [0, 5] N·m' using PLP and NB are plotted as an example in Fig. 10, which shows that the classification performance does not indicate any obvious change after 55 features. In our experiments, 11 features out of the total of 66 features were selected from each input signal for classification.

TABLE 5. Performance of the four feature extraction techniques.

Table 5 shows the averaged TPR and TNR for the four feature extraction techniques, with the performance values < 0.8 highlighted. The observations from the table are as follows:
• PLP and FFT-H perform better than the other two methods, with PLP having better acceptability;
• FFT-B performed the worst, with some of the values very close to 0.5 (random guess);
• Wavelet-based feature extraction showed low performance in TPR, with some of the values close to 0.6.
In addition, when comparing the results from FFT-B and FFT-H, it is obvious that FFT-H performed better. This is reasonable and within expectation. The fundamental frequency will change with the load under the same speed, and the true fundamental frequency will change over time even under the same speed and load. Therefore, FFT-H, which chooses the features based on the actual fundamental frequencies, will be more reliable than FFT-B in an environment where the fundamental frequency is dynamic. Based on the above analysis and comparison, it is concluded that PLP and FFT-H are the first choice and Wavelet-based feature extraction the second choice. FFT-B will not be suitable for signals with a changing fundamental frequency.

2) TIME-GENERALIZATION (ROBUSTNESS) UNDER MULTI-LOAD SCENARIO
The instances in the data set appear in time order. When 10CV is used, the time order is randomized when generating the training and testing data, i.e., instances in the training data may appear after testing instances, which is not the case in practice. Thus, the experiment here is designed to verify this effect. From the results obtained in Section V-B.1, it was shown that FFT-H is a generally good feature extraction technique, and that the data under 'speed = 1400 rpm' has more overlaps between the faulty and the healthy data in the feature space. Thus, in the experiments, the data under 'speed = 1400 rpm + load = [0, 5] N·m' was chosen, with FFT-H employed to extract features. In addition to the extracted features, the nominal fundamental and actual fundamental frequencies from the Fourier transform were also used as two additional features. This provided the ability to handle variations in the fundamental frequency, which are mainly caused by the changes of load over time. Six classifiers are used in the experiments, including NB, SVM, NN, ELM and the linear forms of SVM and ELM, namely linearSVM and linearELM. In the experiments, the instances under each condition (load, Short%, resistance) were first separated into two parts based on their appearing time. Within the available data, the first part, which includes the instances appearing during the first half of the data capture time, is used for training, whilst the second part, which includes those appearing after the first part, is used for testing. In obtaining a robust estimation of the performance, in each round, 90% of the training instances were randomly selected to train a classifier, which was then tested on the testing data. Table 6 shows the performance of the classifiers, with 'P' and 'N' being the number of positive and negative instances respectively.

The observations from the table are as follows:
• NB seems to be the worst classifier as compared to the other classifiers since it has the worst TPR;
• TNR values are shown to be generally good for the six classifiers;
• In essence, it can be concluded that both nonlinear (SVM, NN, ELM) and linear classifiers (linearSVM, linearELM) can be used for monitoring the working conditions;
• Although not indicated in the table, it should be noted that linearSVM is more computationally intensive than the rest. Specifically, linearSVM requires several hundred seconds for training whilst the others need less than 20 seconds.

TABLE 8. Performance of short%-generalization evaluation.
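The two evaluation protocols, Time-Generalization and Short%-Generalization, can be sketched as simple data splits; the records below are illustrative (timestamp, Short%) pairs, not the paper's data, and the per-round 90% subsampling follows the procedure described above.

```python
import random

# Illustrative records: (timestamp, short_pct), 10 time steps per level.
data = [(t, s) for s in (0.02, 0.05, 0.10) for t in range(10)]

# Time-Generalization: train on the first half of the capture time and
# test on what comes after, so testing always follows training in time.
t_split = 5
train_tg = [d for d in data if d[0] < t_split]
test_tg = [d for d in data if d[0] >= t_split]

# Per round, 90% of the training instances are randomly subsampled.
rng = random.Random(0)
round_train = rng.sample(train_tg, int(0.9 * len(train_tg)))

# Short%-Generalization: hold out one unseen Short% level,
# e.g. train on 2% and 5%, test on 10%.
held_out = 0.10
train_sg = [d for d in data if d[1] != held_out]
test_sg = [d for d in data if d[1] == held_out]
```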
TABLE 9. Performance of time-generalization under multi-speed scenario.

TABLE 11. Performance of short%-generalization under multi-speed scenario.

TABLE 13. Comparison of performance of short%-generalization under multi-speed scenario.

and ELM are selected as the base classifiers for the ensemble. The experiments are done in the same scenarios as in Section V-C, with the evaluation of Time-Generalization and Short%-Generalization. The ensemble results of the two scenarios are shown in the last columns of Table 9 and Table 11.
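The ensemble combination referenced above, majority voting over the outputs of the selected base classifiers, can be sketched as follows. The three prediction vectors are invented stand-ins for the base classifiers' outputs; note that an odd number of voters avoids ties, while an even number would require an explicit tie-breaking rule.

```python
import numpy as np

def majority_vote(predictions):
    """Combine 0/1 outputs of several base classifiers by majority voting.

    predictions: array-like of shape (n_classifiers, n_instances).
    """
    p = np.asarray(predictions)
    # An instance is labelled 1 when more than half of the voters say 1.
    return (p.sum(axis=0) * 2 > p.shape[0]).astype(int)

# Three hypothetical base classifiers disagreeing on some instances:
preds = [
    [1, 1, 0, 0, 1],   # e.g. SVM output
    [1, 0, 0, 1, 1],   # e.g. NN output
    [1, 1, 1, 0, 0],   # e.g. ELM output
]
combined = majority_vote(preds)
```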
CHANGHUA HU was a Visiting Scholar with the University of Duisburg in 2008. He is currently a Professor with the High-Tech Institute, Xi'an, China. He has published two books and about 100 articles. His research interests include fault diagnosis and prediction, life prognosis, and fault tolerant control. He received the Changjiang Scholar award from the Chinese Ministry of Education in 2013.

CHI-KEONG GOH (SM'14) received the B.Eng. and Ph.D. degrees in electrical engineering from the National University of Singapore, Singapore, in 2003 and 2007, respectively. He is currently the Team Lead in data analytics and optimization with the Rolls-Royce Advanced Technology Centre, Singapore. His current research interests include evolutionary computation and data analytics and their applications. Dr. Goh has served as a reviewer for various international journals, such as the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART C: APPLICATIONS AND REVIEWS, and Neurocomputing.