Multi-fault diagnosis method applied to an electric machine based on high-dimensional feature reduction

Juan Jose Saucedo-Dorantes, Miguel Delgado-Prieto, Member, IEEE, Roque Alfredo Osornio-Rios, Member,
IEEE, and Rene de Jesus Romero-Troncoso, Senior Member, IEEE.

Abstract – Condition monitoring schemes are essential for increasing the reliability and ensuring the equipment efficiency in industrial processes. Feature extraction and dimensionality reduction are useful preprocessing steps to obtain high performance in condition monitoring schemes. To address this issue, this work presents a novel diagnosis methodology based on high-dimensional feature reduction applied to detect multiple faults in an induction motor linked to a kinematic chain. The proposed methodology involves a hybrid feature reduction that ensures a good processing of the acquired vibration signals. The method is performed sequentially: first, signal decomposition is carried out by means of Empirical Mode Decomposition. Second, statistical-time based features are estimated from the resulting decompositions. Third, a feature optimization is performed to preserve the data variance by a Genetic Algorithm in conjunction with Principal Component Analysis. Fourth, a feature selection is done by means of Fisher score analysis. Fifth, a feature extraction is performed through Linear Discriminant Analysis. Finally, sixth, the different considered faults are diagnosed by a Neural Network-based classifier. The performance and the effectiveness of the proposed diagnosis methodology are validated experimentally and compared with classical feature reduction strategies, making the proposed methodology suitable for industry applications.

Index Terms – Induction Motor; Condition Monitoring; Multiple Faults; Feature Reduction; Vibrations.

This research was partially supported by CONACYT, Mexico, under doctoral scholarship number 278033, and MINECO, Spain, under the Project CICYT TRA2013-46757-R.
J. J. Saucedo-Dorantes and R. A. Osornio-Rios are with the HSPdigital CA-Mecatronica, Engineering Faculty, Autonomous University of Queretaro, San Juan del Rio 76806, Mexico.
M. Delgado-Prieto is with the MCIA Research Center, Department of Electronic Engineering, Technical University of Catalonia (UPC), Spain.
R. J. Romero-Troncoso is with the HSPdigital CA-Telematica, DICIS, University of Guanajuato, Salamanca 36885, Mexico.

I. INTRODUCTION

Induction motors (IM) represent the most common rotating electrical machines used in industry due to their robustness and competitive cost [1]-[2]. However, unexpected faults may occur during the useful life of the IM, causing unscheduled downtimes of all the components associated with the kinematic chain. Typical faults in IM may be due to mechanical and electrical stresses. Mechanical stresses caused by overloads can produce bearing defects, rotor bar breakage, rotor unbalance and misalignment in couplings, whereas electrical stresses associated with problems in the power supply cause stator faults like short circuits in the stator winding [3]. Thus, the related condition monitoring plays a key role in the reliability and safety strategies of several industry applications [4]-[6]. Although different physical magnitudes have been investigated for IM condition monitoring [3], [7]-[9], vibration analysis remains the most industrially accepted approach. Vibration analysis is a useful and reliable tool to assess the IM condition since the characteristic vibration modes of any rotating machine change in the presence of faults [10]-[13]. Yet, although several methodologies applied to diagnose faults in electric motors have been presented during the last decades, most of these methodologies are focused on the analysis of a specific fault mode [7], [10], [14]-[15]. Indeed, the application of such health monitoring schemes to industrial scenarios presents new challenges that must be addressed, where different faults may appear, hiding or overlapping the expected characteristic fault patterns.

Typically, the root mean square (RMS) of the vibration signal is estimated as a numerical indicator to assess the general condition of the machine [16]-[18]. In order to improve the characterization of the vibration signal, the numerical set of features is extended with additional statistical time-domain, frequency-domain, and also time-frequency-domain features [19]-[22]. Yet, although the fast Fourier transform and Cohen's class time-frequency distributions have been successfully applied [22]-[23], the simplicity and low computational cost of the statistical time-domain features give them a high characterization potential when dealing with the regular stationary speed cycles found in industry [24].

Condition monitoring strategies that use a high-dimensional set of features to characterize the properties of faults inevitably contain redundant and non-significant information. Recently, signal decomposition approaches have been widely used in condition monitoring schemes. Different decomposition techniques can be used, and signal decomposition by means of Empirical Mode Decomposition (EMD) has been applied due to its self-adaptive capability to extract a set of Intrinsic Mode Functions (IMF) from the raw signal. The estimation of numerical features from each IMF represents a good opportunity to obtain a potentially high-dimensional set of features for diagnosis purposes [20]. Yet, dimensionality reduction procedures must be applied to avoid low fault diagnosis performance and overfitting of the classification algorithm [21], [25]. In this regard, classical techniques of dimensionality reduction have been integrated in condition monitoring schemes; for instance, Principal Component Analysis (PCA) [20], [26] and Linear Discriminant Analysis (LDA) [27] are the main techniques used for reducing high-dimensional sets of features. However, each dimensionality reduction approach is based on a specific objective function; that is, PCA aims to identify orthogonal components aligned with the direction of maximum data dispersion, whereas LDA aims to maximize the distance among different data sets [28]. Such difference of criteria leads to multiple works in which the dimensionality reduction approach is selected according to the performance ratio obtained when it is combined with the classification algorithm [29]-[30].

Moreover, when dealing with multiple faults, such classical dimensionality reduction approaches are usually combined with complex hierarchical classification structures in order to compensate for the loss of performance. In this sense, in [23], a set of features, estimated by means of wavelet decomposition from vibration signals, is used in a hierarchical deep belief based network to classify different bearing defects. Although this methodology exhibits good results, the proposed multi-stage network implies multiple trainings, one for each specific condition to be solved. In [22], a bi-spectrum set of features estimated from vibration measurements is reduced through the PCA technique and then used by a hierarchical classifier based on Support Vector Machines (SVM). Although this scheme assesses different bearing conditions, the proposed approach requires as many SVMs as considered faults.

Thereby, the contribution of this work lies in a novel multi-fault diagnosis methodology, and in the verification that the proposed hybrid high-dimensional feature reduction method increases the diagnosis performance when dealing with multiple faults in an induction motor linked to a kinematic chain. The originality of the work includes the empirical mode decomposition of the available vibration signals, the estimation of statistical-time features, and the validation of the proposed hybrid high-dimensional feature reduction method. Indeed, the resulting high-dimensional set of features is analyzed by means of a novel multi-stage dimensionality reduction approach in which an optimization is performed by a Genetic Algorithm (GA) in conjunction with the PCA to seek an optimal set of IMFs that best preserves the data variance; afterwards, a selection of the most discriminative statistical features is carried out by means of the Fisher score; and then the selected features are compressed and transformed into a 2-dimensional space through LDA-based feature extraction. Such multi-stage dimensionality reduction allows using a simple Neural Network (NN)-based classification algorithm as diagnosis estimator, including class identification and membership probability.

The proposed diagnosis methodology is validated with a complete set of experimental vibrations acquired from an electromechanical system, where five different mechanical faults are considered. In this context, the novelties of this work include the validation that the application of hybrid feature reduction strategies (selection and extraction) represents a high-performance information analysis procedure, which improves the classification capabilities compared with the use of classical approaches, such as PCA and LDA, as the only technique for high-dimensional feature reduction. Notice that the proposed hybrid feature reduction methodology has not been studied in multi-fault diagnosis so far, and the results are promising.

This paper is structured as follows. Section II describes the theoretical aspects of the proposed method and Section III describes the diagnosis methodology. The experimental test bench used to assess and validate the method is presented and discussed in Sections IV and V, respectively. Conclusions and future work are summarized in Section VI.

II. FEATURE REDUCTION

The feature set is a critical aspect that compromises the performance of classification algorithms; a too small number of features will not contain enough information to describe and characterize the machine working conditions. Therefore, the addition of new features is an option to increase the capability of discrimination, and it is commonly believed that the classification performance will improve. Yet, an increase in the number of features may not offer additional information about the machine condition, and the classification performance will be degraded instead of improved. Thus, misclassifications can occur because of the redundant and useless information contained in large sets of features.

Working with a high-dimensional set of features complicates the fault identification task of multi-class classification methods. Besides, a high computational cost is required, and the use of redundant and useless information could compromise the proper convergence of the algorithms [21]. For that reason, feature reduction procedures are implemented in condition monitoring schemes [14]. Mainly, it is possible to remove redundant or non-discriminative features by means of two reduction strategies: feature selection and feature extraction.

Feature selection is a filtering strategy in which all the features are independently evaluated by considering only their individual descriptive capabilities; thus, the features are ranked in terms of their relevance, even though a specific feature that is not useful by itself can be very useful when it is combined with others. Filtering strategies do not require a particular learning algorithm, making them effective and easy to compute. Most of these algorithms are based on general characteristics of the data such as distance, dependence, and consistency, among others [17]. Consequently, feature selection strategies are implemented in condition monitoring schemes to preserve the most discriminative features; in this sense, the filtered features are those that best describe the machine working condition [20].
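As an illustration of a filter-type ranking, the following minimal Python sketch scores each feature independently and sorts the features by relevance. The score used here, the squared distance between the two class means divided by the sum of the class variances, is a common two-class form of the Fisher score and is an assumption; the function names and the toy data are hypothetical and do not come from the experiments of this work.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Two-class Fisher score of a single feature (assumed textbook form):
    squared distance between class means over the sum of class variances."""
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0.0:
        # Degenerate case: no within-class scatter at all.
        return float("inf") if mean(class_a) != mean(class_b) else 0.0
    return (mean(class_a) - mean(class_b)) ** 2 / spread


def rank_features(healthy, faulty):
    """Return feature names ordered from most to least discriminative.

    healthy, faulty: dicts mapping a feature name to its list of samples.
    """
    scores = {name: fisher_score(healthy[name], faulty[name]) for name in healthy}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: the RMS separates the classes, the kurtosis does not.
healthy = {"rms": [1.0, 1.1, 0.9, 1.0], "kurtosis": [3.0, 3.4, 2.6, 3.0]}
faulty = {"rms": [2.0, 2.1, 1.9, 2.0], "kurtosis": [3.1, 3.5, 2.7, 3.1]}
ranking = rank_features(healthy, faulty)
```

Each feature is scored on its own, so no classifier has to be trained to obtain the ranking, which is what makes filters cheap to compute.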
On the other hand, feature extraction techniques are distinguished by whether they are supervised or unsupervised. The main difference between both kinds of technique is the availability of labels to distinguish the different classes.

PCA is a well-known technique and the most commonly used one for unsupervised dimensionality reduction and feature extraction [15]. This technique projects a high-dimensional data set into a new uncorrelated set of features; therefore, no redundant information is present. These projections, named principal components, are linear combinations in which the variability of the data is better captured. PCA is based on statistical analysis and, even though it is not concerned with the separation of different classes, it is advantageous for feature extraction because it preserves the variability of the data. Therefore, PCA is helpful in condition monitoring schemes to discard redundant information that is not required to detect faults in a system.

LDA is one of the most well-known supervised techniques used in multi-class problems for linear dimensionality reduction and feature extraction [20]. LDA aims to find a projection into a low-dimensional representation that contains the most discriminant information, attempting to maximize the linear separation between data points belonging to different classes. LDA is a suitable feature extraction technique for condition monitoring schemes because it pays attention to the differences between known classes; thus, through the proper application of this technique it is possible to obtain the parameters that correctly indicate the machine working condition.

Feature selection and feature extraction approaches provide complementary feature reduction effects; therefore, there is no clear criterion for choosing a specific technique: the reduction stage is typically implemented in order to fulfill a required data processing.

III. DIAGNOSIS METHODOLOGY

The proposed multi-fault diagnosis methodology is composed of six steps, as depicted in Fig. 1. First, the signal decomposition, with the estimation of the IMFs from the vibration signal, is done by the EMD. Second, a set of statistical-time based features is calculated from each IMF. The proposed hybrid high-dimensional feature reduction method follows. Thus, third, a feature optimization of the available set of features is done by selecting the most significant IMFs to maximize the data variance preservation. Fourth, a feature selection is done by filtering the set of statistical-time features through the analysis of the Fisher score. Fifth, a feature reduction is performed by extracting a reduced set of features that maximizes the fault discrimination. Finally, sixth, the classification stage based on a NN is performed, where the different considered faults are diagnosed.

In this work, six different conditions of the induction motor have been considered: healthy (HLT), bearing defect (BD), half-broken rotor bar (½ BRB), one broken rotor bar (1 BRB), unbalance (UNB) and misalignment (MAL). For each considered condition, ninety axial vibration measurements have been acquired. Each measurement corresponds to one second of the machine operation.

A. Signal decomposition and features calculation

The decomposition of the acquired vibration signal is performed by means of EMD; such decomposition is applied to each considered condition and yields a set of IMFs which are automatically adapted to the corresponding vibrational pattern.

Afterwards, each resulting set of IMFs is characterized by estimating 15 statistical time-based features: mean, maximum value, RMS, square root mean, standard deviation, variance, RMS shape factor, square root mean shape factor, crest factor, latitude factor, impulse factor, skewness, kurtosis, and normalized fifth and sixth moments. Therefore, a total of 150 numerical features are estimated for each considered condition. The proposed set of statistical features is shown in Table I. These statistical-time features have been successfully used for fault detection in electrical motors because they are a high-performance source of information and are able to capture general trends of the signal [19].

TABLE I

Here $x_k$ denotes the $k$-th sample of the analyzed signal and $n$ its number of samples.

Mean: $\bar{x} = \frac{1}{n} \cdot \sum_{k=1}^{n} |x_k|$ (1)
Maximum value: $\hat{x} = \max(x)$ (2)
Root mean square: $x_{RMS} = \sqrt{\frac{1}{n} \cdot \sum_{k=1}^{n} x_k^2}$ (3)
Square root mean: $x_{SRM} = \left( \frac{1}{n} \cdot \sum_{k=1}^{n} \sqrt{|x_k|} \right)^2$ (4)
Standard deviation: $\sigma = \sqrt{\frac{1}{n} \cdot \sum_{k=1}^{n} (x_k - \bar{x})^2}$ (5)
Variance: $\sigma^2 = \frac{1}{n} \cdot \sum_{k=1}^{n} (x_k - \bar{x})^2$ (6)
RMS shape factor: $SF_{RMS} = \frac{x_{RMS}}{\frac{1}{n} \cdot \sum_{k=1}^{n} |x_k|}$ (7)
SRM shape factor: $SF_{SRM} = \frac{x_{SRM}}{\frac{1}{n} \cdot \sum_{k=1}^{n} |x_k|}$ (8)
Crest factor: $CF = \frac{\hat{x}}{x_{RMS}}$ (9)
Latitude factor: $LF = \frac{\hat{x}}{x_{SRM}}$ (10)
Impulse factor: $IF = \frac{\hat{x}}{\frac{1}{n} \cdot \sum_{k=1}^{n} |x_k|}$ (11)
Skewness: $S_k = \frac{E[(x_k - \bar{x})^3]}{\sigma^3}$ (12)
Kurtosis: $k = \frac{E[(x_k - \bar{x})^4]}{\sigma^4}$ (13)
Fifth moment: $5th_m = \frac{E[(x_k - \bar{x})^5]}{\sigma^5}$ (14)
Sixth moment: $6th_m = \frac{E[(x_k - \bar{x})^6]}{\sigma^6}$ (15)
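The 15 statistical-time features of Table I can be computed for one signal segment as in the minimal Python sketch below. It is an illustration only: the function name and the toy input are hypothetical, and the formulas follow one reading of Table I, in which the mean of (1) is taken over absolute values and is also the reference used for the deviations in (5), (6) and (12) to (15).

```python
import math


def time_features(x):
    """Compute the 15 statistical-time features of Table I for a sequence x.

    Assumes x is not constant, so the standard deviation is non-zero.
    """
    n = len(x)
    mean_abs = sum(abs(v) for v in x) / n                  # (1) mean
    peak = max(x)                                          # (2) maximum value
    rms = math.sqrt(sum(v * v for v in x) / n)             # (3) root mean square
    srm = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2     # (4) square root mean
    variance = sum((v - mean_abs) ** 2 for v in x) / n     # (6) variance
    std = math.sqrt(variance)                              # (5) standard deviation

    def normalized_moment(order):
        # Central moment around the mean of (1), normalized by std ** order.
        return sum((v - mean_abs) ** order for v in x) / n / std ** order

    return {
        "mean": mean_abs,
        "maximum": peak,
        "rms": rms,
        "srm": srm,
        "std": std,
        "variance": variance,
        "rms_shape_factor": rms / mean_abs,                # (7)
        "srm_shape_factor": srm / mean_abs,                # (8)
        "crest_factor": peak / rms,                        # (9)
        "latitude_factor": peak / srm,                     # (10)
        "impulse_factor": peak / mean_abs,                 # (11)
        "skewness": normalized_moment(3),                  # (12)
        "kurtosis": normalized_moment(4),                  # (13)
        "fifth_moment": normalized_moment(5),              # (14)
        "sixth_moment": normalized_moment(6),              # (15)
    }


# Hypothetical toy segment; in the methodology x would be one IMF of a
# one-second vibration measurement.
features = time_features([1.0, -1.0, 1.0, -1.0])
```

Applying the function to each of the ten IMFs of a measurement and concatenating the results gives the 150-element feature vector described above.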
Fig. 1. Proposed diagnosis methodology based on hybrid feature reduction for the detection of multiple faults in electromechanical systems.

B. Multi-stage dimensionality reduction

The estimated high-dimensional set of features contains a large portion of the information related to the working condition of the kinematic chain, but only some features contain representative information. First, considering the obtained set of IMFs, an optimization process is performed by a GA in conjunction with the PCA technique in order to seek a subset of IMFs that provides a better representation of each one of the considered conditions. This optimization is carried out individually for each considered condition as follows: the chromosomes of the GA are logical vectors with ten elements, in which every element represents one of the obtained IMFs. Then, an initial population is randomly generated under the constraint that at least one of the elements of the logical vector has to be selected for evaluation; more than one element can also be evaluated. Once the initial population is defined, the fitness function is assessed, which is based on the accumulation of the data variance. The IMFs to be evaluated are represented by their corresponding statistical-time sets of features. In this sense, the cumulative variance of the selected IMFs is computed through PCA, and the fitness function comprises the cumulative variance of the first two and three principal components. Afterwards, another population is generated using Roulette wheel selection; moreover, the GA applies a mutation and, based on the Gaussian distribution, the new population is chosen. The process is then iteratively repeated until finding the set of IMFs that best accumulates the data variability; the stop criterion of the algorithm is controlled by different goals, such as achieving a maximization of the variance or reaching a maximum number of generations. In this process, each fault condition is compared against the healthy condition; as a result, the sets of IMFs are optimized by removing redundant information, and just the discriminative information related to each condition is retained in the optimized IMFs.

Second, in the feature selection, an analysis of the individual discriminative capabilities is applied to the statistical-time features with the objective of filtering and preserving the most discriminative features. In this process the considered statistical-time features are those that belong to the optimized set of IMFs. The feature selection is performed by computing the Fisher score, which is a relative measure of the distances between data points in different classes; that is, statistical features with a large Fisher score represent the largest distances, while a small Fisher score represents the smallest distances. Consequently, the features are ranked in terms of their relevance; in other words, the best ranked features are considered the most discriminative features, whereas the worst ranked are the non-discriminative features. The feature selection is a combinatorial problem; in this regard, the Fisher score is computed by combining all the statistical-time features, where subsets of two and three features are considered. As in the optimization stage, in this process the faulty conditions are compared against the healthy condition. After computing the Fisher score for each feature, the three first ranked features are considered the most significant set of features that best describes the machine working condition.

These first two stages correspond to a filtering process of the initial high-dimensional set of features. Thus, for each considered fault condition, a set of features is computed that contains the most significant and discriminative information about the working condition of the kinematic chain.

Finally, in a feature extraction stage, all the filtered sets of statistical features are subjected to a compression process and a base transformation by means of LDA. Through this compression process a new set of features is extracted, and these extracted features are a combination, with different weights, of the selected set of features. Consequently, the extracted features are projected into a 2-dimensional space, allowing a visual interpretation of the considered conditions. Moreover, this resulting 2-dimensional representation facilitates the classification task, since just two inputs must be managed. With this approach the most discriminative features are projected into a reduced dimensional space in which their discriminative capabilities between all the considered conditions are retained.

C. Classification

The proposed diagnosis methodology is based on a consecutive processing of the original set of features, and during this process those features that are significantly important to represent the characteristic failure patterns are preserved. Therefore, high performance is obtained by applying the proposed hybrid feature reduction to the high-dimensional set of features. It must be noted that the proposed hybrid feature reduction allows a simpler configuration of the classification stage since the input vectors are reduced to two dimensions.

In this sense, a simple NN-based classifier is used to obtain the diagnosis estimation of all the considered conditions. Indeed, the classifier has a classical three-layer structure. The input layer is composed of two neurons corresponding to the two-dimensional feature vectors resulting from the proposed hybrid feature reduction methodology. The hidden layer has ten neurons, following classical recommendations [31]. The output layer is composed of six neurons, one for each considered condition. This simple and classical NN structure has been successfully implemented in different condition monitoring schemes [32]-[33]. In addition to the resulting classification, the proposed NN also offers the diagnosis probability because the sigmoid function is used as activation function in the output layer; thus, the classification result of the NN is related to a probability value. The training uses the back-propagation rule for the gradient estimation and the scaled conjugate gradient as minimization technique.

IV. EXPERIMENTAL TEST BENCH

The experimental test bench used to validate the proposed diagnosis methodology and the data acquisition system used to capture the vibration signals are shown in Fig. 2. The test bench consists of a kinematic chain composed of a 1492-W, three-phase IM (WEG00236ET3E145T-W22), with its rotational speed controlled through a variable frequency drive (VFD) (WEGCFW08). It also includes a 4:1 ratio gearbox (BALDOR GCF4X01AA) that is used for coupling the motor drive to a DC generator (BALDOR CDP3604). The DC generator is used as a non-controlled mechanical load comprising around 20% of the nominal load. The vibration signals from the plane perpendicular to the IM axis are acquired using a triaxial accelerometer (LIS3L02AS4), mounted on a board with the signal conditioning and anti-alias filtering. A 12-bit 4-channel serial-output sampling analog-to-digital converter (ADS7841) is used on board the data acquisition system (DAS). The DAS is a proprietary low-cost design based on field programmable gate array (FPGA) technology. The sampling frequency is set to 3 kHz for the acquisition of the vibration signals, obtaining 270 kS during 90 seconds of continuous sampling of the IM from start-up to steady state.

Regarding the six different conditions considered, the artificial damage on the bearing is produced by drilling a hole of 1.191 mm in diameter on the bearing outer race using a tungsten drill bit. The artificially damaged bearing, model 6205-2ZNR, used in this experimentation is shown in Fig. 3a. The artificial damage in both rotor bar elements is produced by drilling a hole of 6 mm in diameter. For the ½ BRB fault, the hole has a depth of 3 mm, which corresponds to about 22% of the section of the rotor bar. For the 1 BRB fault, a through-hole with a depth of 14 mm is produced, which corresponds to the complete section of the rotor bar. These faults are shown in Fig. 3b and Fig. 3c, respectively. The UNB condition is related to the mechanical load distribution in the IM; a non-uniform load distribution takes the center of mass out of the motor shaft. Accordingly, the UNB condition is produced by attaching a bolt to one of the IM couplings, as Fig. 3d shows. Finally, the MAL condition is present when the centerlines of coupled shafts do not coincide with each other; consequently, the dynamic load on bearings and couplings increases. Therefore, an angular misalignment is produced by moving the free end of the IM, so that a misalignment of 5 mm in the horizontal plane is produced only from the free end; Fig. 3e shows the misaligned shaft coupling.

In most of the considered fault conditions, the experiments are carried out by replacing the healthy elements with each one of the damaged elements in turn. Only the MAL condition is induced by moving the free end of the IM as explained above. The operational frequency of the driving IM, controlled by the VFD, is set to 60 Hz, which results in an average rotating speed of 3585 rpm in the IM.

Fig. 2. Experimental test bench used to validate the proposed diagnosis
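The acquisition figures quoted for the test bench can be checked with a few lines of Python. The sketch below also shows the splitting of one record into one-second segments, which is how the ninety samples per condition are obtained in Section V. The function name and the zero-filled placeholder record are hypothetical; a real record would hold the accelerometer samples.

```python
SAMPLING_RATE_HZ = 3000   # 3 kHz sampling frequency of the DAS
RECORD_SECONDS = 90       # duration of one continuous acquisition


def split_into_segments(signal, samples_per_segment):
    """Split a record into consecutive, non-overlapping segments.

    Any incomplete tail shorter than samples_per_segment is dropped.
    """
    count = len(signal) // samples_per_segment
    return [
        signal[i * samples_per_segment:(i + 1) * samples_per_segment]
        for i in range(count)
    ]


# Placeholder record of zeros standing in for one measured vibration signal.
record = [0.0] * (SAMPLING_RATE_HZ * RECORD_SECONDS)

# One-second windows: 3000 samples each.
segments = split_into_segments(record, SAMPLING_RATE_HZ)
```

With these values the record holds 270 000 samples (the 270 kS quoted above) and yields ninety segments of 3000 samples each.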
generations is fixed to 50 and the algorithm uses the Roulette
wheel selection scheme with mutation and based on the
Gaussian distribution as the selection operator. Meanwhile, in
the PCA two and three principal components are considered to
compute the cumulative variance. The optimization process is
applied individually to the considered conditions; it should be
noticed that the faulty conditions are faced to the healthy
condition. Thus, the optimized set of IMFs contains those
functions with greater variability and relevant information
related to the considered fault condition.
During the optimization process of the GA, the cumulative
variance of the two and three principal components is used to
compare the optimized results. Through this comparison it is
obtained the same set of optimized IMFs. Although the results
Fig. 3. Arrangement of the different faults produced in the experimental test are similar, there is a clear difference in the computational
bench. (a) Bearing defect. (b) ½ Broken rotor bar. (c) 1 Broken rotor bar.
(d) Unbalance. (e) Misalignment. resources. That is, when three principal components are
considered, the computational load increases, and as a
V. VALIDATION OF THE METHOD consequence the convergence time increases approximately
twice compared with the convergence time when only two
The proposed diagnosis methodology is implemented under
principal components are considered. Regarding the
MATLAB, which is used for processing the acquired signals
cumulative variance, by considering three components the
and to provide the fault diagnosis. As some researchers report,
percentage of cumulative variance is around 3% greater than
the related information to the working condition of rotational
the computed cumulative variance when considering only two
machines is reflected in the appearance of perpendicular
components. Thereby, good results with low computational
vibrations on the rotating axis [3], [17], [28]. Thus, the
acquired and stored vibration measurements belong to the resources are obtained when two principal components are
considered in the PCA, as expected. Table II lists the optimized
perpendicular plane to the IM rotating axis. As
set of IMFs for all the considered conditions. In all the
aforementioned, the stored information consist of ninety
seconds of kinematic chain working condition under the optimized set of features the cumulative variance is in the
upper 75%, which reflects a good concentration of the data.
considered conditions; then, each acquisition is segmented in
Also, Table II summarizes the percentages of the cumulative
ninety parts of one second with the aim to generate a set of
variance and the detail of the statistical features with greater
consecutive samples.
attribution considered by the two first principal components.
Afterwards, the signal decomposition is carried out by
Regarding to the optimization process it was stopped because
means of EMD. That is, the decomposition is iteratively
the maximum number of generations was reached, in all the
obtained from each segmented part; as a result, an adaptive
optimization processes it was obtained a good performance; in
characteristic set of IMFs is computed for each considerer
Fig. 4 are shown the graphics related to the performance
condition. Thus, each vibrational pattern is represented by a set
achieved in terms of the cumulative variance for the
of 90 samples with 10 IMFs. After performing the signal
optimization of the IMFs, for all the considered conditions and
decomposition, each function of the resulting sets of IMFs are
it is possible to notice that in all the cases after the 20
then characterized by the estimation of a number of 15
generations the best individual shows the best result.
statistical-time based features. Consequently, a high-
In this optimization stage, redundant information is
dimensional set of features is estimated for each considered
discarded. In order to show the difference between the data
condition. Thus, all of the high-dimensional sets are composed
variance, the PCA is applied to an optimized set of IMFs and
by 90 samples with 150 statistical-time features.
a random set of IMFs of the MAL condition. In this regard, the
Although the estimated high-dimensional set of features
optimized set of IMFs for the MAL condition is composed by
contains a large portion of the information related to the
the two first modes (IMF1 and IMF2); thus, their
kinematic chain working condition, only some will have
corresponding statistical-time features are used by the PCA to
representative information. In this sense, the estimated high-
compute the cumulative variance (91.1%). A representation of
dimensional set of features is then analyzed through the
the scattered data points obtained by the optimized set of IMFs
proposed hybrid feature reduction methodology in order to
is shown in Fig. 5. It could be believed that a better
retain the most discriminative features.
characterization of the machine working condition can be
First, as previously described, the optimization process is
obtained if there is as much information as possible. In this
performed by means of a GA in conjunction with the PCA. The
sense, for the same MAL condition, a random set of the IMFs
main setting parameters of the GA are defined as: a population
composed by the last three modes (IMF8, IMF9 and IMF10) is
of 10 for the number of individuals, the maximum number of
used to compute its cumulative variance. Thus, a set of 45 statistical-time features estimated from the random set of IMFs is evaluated through the PCA, obtaining a cumulative variance of 36.9%. Fig. 6 shows the scattered data points obtained by the random set of IMFs. The difference in data scatter between Fig. 5 and Fig. 6 is noticeable: both figures use the same scale, and the data points in Fig. 5 are widely spread, whereas in Fig. 6 they are concentrated within a smaller area. This comparison shows that not all the information related to the machine working condition is useful in condition monitoring schemes, and that including uninformative features can degrade the performance of such schemes instead of improving it.

Condition            Optimized      %σ      Statistical-time features with greater attribution
                     set of IMFs            in the two principal components
Bearing defect       IMF1, IMF2             Maximum value, root mean square, square root mean,
                                            standard deviation, variance, RMS shape factor,
                                            SRM shape factor, crest factor, latitude factor,
                                            kurtosis.
½ Broken rotor bar   IMF1           84.2%   Maximum value, root mean square, square root mean,
                                            standard deviation, variance, RMS shape factor,
                                            SRM shape factor, crest factor, latitude factor,
                                            impulse factor.
1 Broken rotor bar   IMF1           92.3%   Maximum value, root mean square, square root mean,
                                            standard deviation, variance, RMS shape factor,
                                            SRM shape factor, crest factor, latitude factor,
                                            impulse factor, kurtosis, sixth moment.
Unbalance            IMF1           92.4%   Mean, maximum value, root mean square, square root
                                            mean, standard deviation, variance, RMS shape factor,
                                            SRM shape factor, crest factor, latitude factor,
                                            impulse factor, skewness, kurtosis, fifth moment,
                                            sixth moment.
Misalignment         IMF1, IMF2     91.1%   Maximum value, root mean square, square root mean,
                                            standard deviation, variance, RMS shape factor,
                                            SRM shape factor, crest factor, latitude factor,
                                            impulse factor, kurtosis, sixth moment.

Fig. 4. Performance of the GA-based procedure applied during the optimization of the number of IMFs to represent each considered fault. Evolution and maximum percentage of the cumulative variance obtained: (a) Bearing defect. (b) ½ broken rotor bar. (c) 1 broken rotor bar. (d) Unbalance. (e) Misalignment. (Horizontal axis: generation. Vertical axis: fitness value, cumulative variance.)

Feature selection is the next process considered in the proposed hybrid feature reduction; in this process, the discriminative capabilities of the statistical-time features are analyzed by computing the Fisher score. The statistical-time features considered in this selection process are those computed from the optimized sets of IMFs. The Fisher score analysis is a combinatorial problem; thus, all the statistical-time features are analyzed by considering subsets of two and three statistical features. Moreover, in this process the faulty conditions are compared against the healthy condition in order to highlight the most discriminative statistical features.
As aforementioned, different subsets are used to compute the Fisher score; thus, the computational resources could be compromised by the number of statistical features used, because this strategy is a combinatorial problem. All the combinations are evaluated and the statistical features are ranked in terms of their relevance; that is, the features with the largest scores are considered the most discriminative. Then, the three top-ranked subsets of features are considered the best to describe the machine working condition. Through this feature selection approach, the subsets of statistical features with the best class separation are obtained, and the optimized sets of statistical-time features are reduced. For all the considered conditions, Table III shows the selected subsets of statistical features obtained from the combinations of two features, their corresponding IMF and the computed Fisher score for each statistical feature. The obtained Fisher scores reveal a good separability between the conditions of BD, 1 BRB, MAL and HLT. On the other hand, an overlap could appear between the ½ BRB, UNB and HLT conditions. To obtain a good separability between classes, the expected Fisher score should be higher than one; however, the combination of all the
selected sets of statistical features will be useful in a feature extraction technique such as LDA. Regarding the use of different subsets of features, the results are not significantly different and, when three statistical features are considered, the execution time of the technique is at least twice that required when only two features are considered.

Fig. 5. Scatterplot of the optimized set of IMFs (IMF1 and IMF2) for the misalignment condition using two principal components in the PCA. (Axes: 1st principal component, 2nd principal component.)

Fig. 6. Scatterplot of the random set of IMFs (IMF8, IMF9 and IMF10) for the misalignment condition using two principal components in the PCA. (Axes: 1st principal component, 2nd principal component.)

Condition            Statistical-time feature   Optimized IMF   Fisher score rank
Bearing defect       Root mean square           IMF1            76.9
                     Standard deviation         IMF1            76.9
                     RMS shape factor           IMF2            62.7
                     Kurtosis                   IMF2            37.8
½ Broken rotor bar   Square root mean           IMF2            0.13
                     Standard deviation         IMF2            0.13
                     Kurtosis                   IMF2            0.13
                     Sixth moment               IMF2            0.11
1 Broken rotor bar   RMS shape factor           IMF1            12.4
                     Kurtosis                   IMF1            6.7
                     Kurtosis                   IMF2            4.8
                     Sixth moment               IMF2            2.1
Unbalance            SRM shape factor           IMF1            0.29
                     Maximum value              IMF2            0.16
                     Root mean square           IMF2            0.16
                     Standard deviation         IMF2            0.15
Misalignment         Root mean square           IMF1            89.0
                     Square root mean           IMF1            88.9
                     Standard deviation         IMF1            80.0
                     SRM shape factor           IMF2            67.5

In the last stage, a feature extraction is carried out by the LDA, in which all the selected sets of statistical features are subjected to the compression procedure. The LDA strategy aims to find a projection that maximizes the linear separation between different classes. Through this approach a new set of extracted features is obtained, and those extracted features are a weighted combination of the selected set of features. In addition, the dimensionality of all the selected sets of statistical features is reduced. Consequently, the extracted set of features is projected in a 2-dimensional space where it is possible to obtain a visual interpretation of all the considered conditions. Fig. 7 shows the projection of the extracted set of features resulting from the application of the proposed hybrid feature reduction. Although an overlap between the ½ BRB, UNB and HLT conditions was expected, the LDA yields a separation between them. This projection is obtained through a transformation matrix composed of a set of values with different weights. Table IV shows the details of the transformation matrix. The values that compose the transformation matrix show that the projected features do not depend on only one or two statistical features; even though some statistical features have a low weight, they are essential to improve the separability between classes.
Prior to the fault classification, a comparison between the proposed hybrid feature reduction and classical approaches such as LDA and PCA is carried out. That is, the proposed 15 statistical-time features are directly estimated from segmented parts of the acquired vibration signals. Then, for all the considered conditions, a feature extraction is carried out by the classical approaches, and these extracted features are also projected in a 2-dimensional space to have the same basis of comparison. Fig. 8 and Fig. 9 show the projection of the extracted features computed by the PCA and LDA, respectively.
Both classical approaches present some disadvantages. There is a clear difference between the extracted features computed through the proposed hybrid feature reduction and the extracted features obtained by the classical approaches. That is, in Fig. 8 and Fig. 9 an overlap appears between the conditions of ½ BRB, 1 BRB, UNB and HLT, while in Fig. 7 these classes show a better separation. Although the conditions of BD and MAL are not overlapped, the use of classical approaches of feature extraction such as
PCA and LDA would not be capable of characterizing all the considered faults.

Fig. 7. Projection of the extracted set of features resulting from the application of the proposed hybrid feature reduction strategy. (Axes: Feature 1, Feature 2.)

Fig. 8. Projection of the extracted set of features resulting from the classical approach PCA. (Axes: 1st principal component, 2nd principal component.)

Fig. 9. Projection of the extracted set of features resulting from the classical approach LDA. (Axes: Feature 1, Feature 2.)

Statistical-time feature   Optimized IMF   Column 1   Column 2
Root mean square           IMF1            0.570      0.505
Root mean square           IMF1            0.021      0.026
Standard deviation         IMF1            0.551      0.481
RMS shape factor           IMF1            0.095      0.055
Kurtosis                   IMF1            0.088      0.031
Maximum value              IMF2            0.014      0.001
Root mean square           IMF2            0.297      0.476
Root mean square           IMF2            0.075      0.037
Standard deviation         IMF2            0.376      0.528
RMS shape factor           IMF2            0.180      0.025
SRM shape factor           IMF2            0.012      0.010
Kurtosis                   IMF2            0.258      0.014
Sixth moment               IMF2            0.131      0.009

Regarding the fault classification, a multilayer NN-based classifier is used to obtain the output classes. Because the proposed hybrid feature reduction provides a better performance, a simple classifier structure is enough to obtain good results without excessive use of resources. Thus, the classifier has 10 neurons in its hidden layer, a probabilistic sigmoid function is used as the activation function in the output layer, and 70 epochs are considered for training using the back-propagation rule. These parameters are selected by trial-and-error tests.
In order to obtain statistically significant results and to assess the performance of the proposed diagnosis methodology, the classifier is trained and tested under a 5-fold cross-validation scheme. Considering all the conditions, the original database is composed of 540 samples, 90 samples of each condition. This database is divided into two parts: one composed of 432 samples for training purposes (72 samples per condition), and the other composed of 108 samples for testing purposes (18 samples per condition). In order to analyze the performance of the classification using all the variance available in the original database, a 5-fold cross-validation scheme has been applied, in which five classification ratios are obtained as a result of five iterations with complementary partitions of the original database into training and test sets. Averaged classification ratios of 91% for the training and 92% for the test have been obtained.
It should also be clarified that the classification ratios obtained during the 5-fold cross-validation exhibit a stable behavior, that is, within the range of 89.8% to 91.7% in the training stage, and within the range of 90.7% to 92.8% in the test stage. Besides providing the classification rates, the NN classifier is used to compute the decision regions. A visual representation of the classification performance reached during the training and test of the NN classifier is provided next. The resulting decision regions and sample projections, using the first fold partition as reference, are shown in Fig. 10 (training) and Fig. 11 (test).
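As a rough illustration of the partitioning described in the text, the following Python sketch splits 540 labelled samples (90 per condition) into five complementary, class-balanced folds. The helper name `stratified_folds`, the random seed and the use of plain sample indices instead of real feature vectors are assumptions made for illustration; the source does not state how the partitions were generated.

```python
import random

# Condition labels as used in the text; 90 samples per condition (540 in total).
CONDITIONS = ["HLT", "BD", "1/2 BRB", "1 BRB", "UNB", "MAL"]
SAMPLES_PER_CONDITION = 90
N_FOLDS = 5


def stratified_folds(labels, n_folds, seed=0):
    """Return a list of (train_idx, test_idx) pairs with class-balanced folds.

    Hypothetical helper: the shuffling strategy and the seed are assumptions.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)

    splits = []
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits


labels = [c for c in CONDITIONS for _ in range(SAMPLES_PER_CONDITION)]
splits = stratified_folds(labels, N_FOLDS)

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    per_class_test = {c: sum(labels[i] == c for i in test_idx) for c in CONDITIONS}
    print(fold, len(train_idx), len(test_idx), per_class_test)
```

Under these assumptions every fold contains 432 training samples and 108 test samples, with 18 test samples per condition, which matches the partition sizes quoted in the text.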
Fig. 10. Projection of the decision regions resulting from the NN-based classification algorithm. Projection of the training data set corresponding to the first cross validation.

Fig. 11. Projection of the decision regions for the multiple fault classification corresponding to the test of the first cross validation computed by the proposed NN-based classifier.

To analyze the performance of each class individually, the same structure of the NN-based classifier is trained and tested with the extracted features provided by the classical approaches. Table V and Table VI summarize the confusion matrices computed by the classical approaches, PCA and LDA, respectively. As the confusion matrices of the classical approaches show, misclassification problems are present between the classes of ½ BRB, 1 BRB, UNB and HLT. It should be noted that the most critical misclassification cases are related to the HLT condition, which represents a disadvantage. The classification ratios achieved by the classical approaches PCA and LDA are approximately 80% and 72%, respectively.

Assigned   True Class
Class      HLT   BD   ½ BRB   1 BRB   UNB   MAL
HLT        8     0    2       0       2     0
BD         0     18   0       0       0     0
½ BRB      1     0    14      1       0     0
1 BRB      3     0    2       17      4     0
UNB        6     0    0       0       12    0
MAL        0     0    0       0       0     18

Assigned   True Class
Class      HLT   BD   ½ BRB   1 BRB   UNB   MAL
HLT        13    0    1       3       1     0
BD         0     18   0       0       0     0
½ BRB      1     0    13      13      1     0
1 BRB      2     0    4       2       2     0
UNB        2     0    0       0       14    0
MAL        0     0    0       0       0     18

Regarding the proposed diagnosis methodology, the classification ratios achieved during the training and test of the NN classifier are 91% and 92%, respectively. Table VII and Table VIII summarize the respective confusion matrices corresponding to the evaluation of all the considered conditions by using the proposed hybrid feature reduction. Although some misclassifications are obtained in both training and test of the NN classifier, the results are promising. Considering the results generated during the test of the NN classifier, the global classification ratio is improved by 12% and 20% in comparison with the classical approaches PCA and LDA, respectively. Regarding the correct classification of the healthy condition, which is the most important condition, the proposed approach improved its correct classification by 39% compared to the classical approach PCA, and by 11% compared to the LDA. These results represent a high-performance feature reduction in the development of diagnosis schemes for electromechanical systems.

TABLE VII
CONFUSION MATRIX RESULTING FROM THE EVALUATION OF ALL CONSIDERED

Assigned   True Class
Class      HLT   BD   ½ BRB   1 BRB   UNB   MAL
HLT        58    0    2       0       3     0
BD         0     70   0       0       0     0
½ BRB      5     0    70      0       0     0
1 BRB      4     0    0       71      14    0
UNB        5     0    0       1       55    0
MAL        0     2    0       0       0     72
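The classification ratios quoted in this section can be recomputed from the confusion matrices as the sum of the diagonal divided by the total number of samples. The following Python sketch does this for the three matrices that appear in this section. The function names are hypothetical, the rows are assumed to be assigned classes and the columns true classes (in the order HLT, BD, ½ BRB, 1 BRB, UNB, MAL), and the two matrices without a caption are simply taken in the order in which they appear.

```python
def classification_ratio(matrix):
    """Global ratio: correctly assigned samples over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def true_class_ratio(matrix, cls):
    """Ratio of samples of true class `cls` (a column) assigned correctly."""
    column_total = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / column_total


# First uncaptioned matrix (assumed to be a test-set matrix, 18 samples per class).
first_matrix = [
    [8, 0, 2, 0, 2, 0],
    [0, 18, 0, 0, 0, 0],
    [1, 0, 14, 1, 0, 0],
    [3, 0, 2, 17, 4, 0],
    [6, 0, 0, 0, 12, 0],
    [0, 0, 0, 0, 0, 18],
]

# Second uncaptioned matrix (same assumption).
second_matrix = [
    [13, 0, 1, 3, 1, 0],
    [0, 18, 0, 0, 0, 0],
    [1, 0, 13, 13, 1, 0],
    [2, 0, 4, 2, 2, 0],
    [2, 0, 0, 0, 14, 0],
    [0, 0, 0, 0, 0, 18],
]

# Table VII (72 samples per true class).
table_vii = [
    [58, 0, 2, 0, 3, 0],
    [0, 70, 0, 0, 0, 0],
    [5, 0, 70, 0, 0, 0],
    [4, 0, 0, 71, 14, 0],
    [5, 0, 0, 1, 55, 0],
    [0, 2, 0, 0, 0, 72],
]

HLT = 0  # index of the healthy condition in the assumed class order

for name, matrix in [
    ("first matrix", first_matrix),
    ("second matrix", second_matrix),
    ("Table VII", table_vii),
]:
    print(
        f"{name}: global {classification_ratio(matrix):.1%}, "
        f"HLT {true_class_ratio(matrix, HLT):.1%}"
    )
```

With these inputs the global ratios come out at about 80.6%, 72.2% and 91.7%, which is consistent with the approximate values of 80%, 72% and 91% reported in the text for PCA, LDA and the training stage of the proposed methodology.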
classification," IEEE Transactions on Instrumentation and
Measurement, vol. 64, no. 12, pp. 3358-3600, 2015.
Juan Jose Saucedo-Dorantes received the B.E.

degree in electromechanical engineering and the
M.E. degree (Hons.) in mechatronics from the
Autonomous University of Queretaro (UAQ),
Queretaro, Mexico, in 2012 and 2014,
respectively. He is currently working toward the
Ph.D. degree at the Engineering Faculty, UAQ,
Queretaro, Mexico. Since 2012, he has been
doing research work at the HSPdigital group. His
research interests include digital signal processing
on FPGAs for applications in engineering, fault diagnosis in
electromechanical systems, fault detection algorithms, artificial
intelligence and signal processing methods.

