A Domain Adaptation Model For Early Gear Pitting Fault Diagnosis Based On Deep Transfer Learning Network
Abstract
In recent years, research on gear pitting fault diagnosis has been conducted. Most of the research has focused on feature
extraction and feature selection process, and diagnostic models are only suitable for one working condition. To diagnose
early gear pitting faults under multiple working conditions, this article proposes a domain adaptation diagnostic model based on an improved deep neural network and transfer learning, using raw vibration signals. A particle swarm optimization algorithm and L2 regularization are used to optimize the improved deep neural network to improve the
stability and accuracy of the diagnosis. When using the domain adaptation diagnostic model for fault diagnosis, it is neces-
sary to discriminate whether the target domain (test data) is the same as the source domain (training data). If the target
domain and the source domain are consistent, the trained improved deep neural network can be used directly for diag-
nosis. Otherwise, the transfer learning is combined with improved deep neural network to develop a deep transfer
learning network to improve the domain adaptability of the diagnostic model. Vibration signals for seven gear types with
early pitting faults under 25 working conditions collected from a gear test rig are used to validate the proposed method.
The validation results confirm that the developed domain adaptation diagnostic model significantly improves adaptability to multiple working conditions.
Keywords
Early gear pitting, multiple working conditions, transfer learning, improved deep neural network
extraction, (2) feature selection, and (3) pattern recognition.6 Saravanan et al.7 used wavelet analysis to extract features from vibration signals and used two pattern recognition methods, artificial neural network (ANN) and proximal support vector machine (PSVM), to diagnose gearbox faults. Wu and Chan8 used acoustic emission signals instead of vibration signals for gear fault diagnosis, and a continuous wavelet transform technique combined with feature selection on the energy spectrum was used to generate the inputs of the ANN. In the study by Samanta et al.,9 statistical features extracted from time domain signals were applied as the inputs of ANN and SVM, and the genetic algorithm (GA) was applied for optimization. Traditional pattern recognition methods such as ANN and SVM can only achieve shallow learning tasks, and their diagnosis performance is directly affected by the feature selection process.10,11 Moreover, the feature selection process is done manually and depends largely on prior diagnostic knowledge, and the feature selection method of one fault diagnosis problem may not be applicable to another.
In recent years, enthusiasm for deep learning has been triggered by Hinton et al.12 Deep learning can overcome the shortcomings of shallow models. When it is applied to fault diagnosis, the feature selection process can be omitted, which saves time and labor. There are many deep learning methods; according to the training method, they can be divided into two types: supervised training and unsupervised training.13 Methods for supervised training include the deep neural network (DNN)14 and the convolutional neural network (CNN).15,16 Methods for unsupervised training include the deep belief network (DBN)17,18 and the autoencoder (AE).19,20 Heydarzadeh et al.21 applied the discrete wavelet transform (DWT) results of three common monitoring signals (vibration, acoustic, and torque) as the inputs of a DNN to diagnose five classes of gear faults. Sun et al.22 applied a dual-tree complex wavelet transform (DTCWT) to extract multi-scale features of signals, and a CNN was applied for gear fault diagnosis. Shao et al.23 also applied DTCWT for feature extraction and used an adaptive deep belief network (ADBN) for fault diagnosis. Jia et al.24 used AE technology to pre-train the parameters of a DNN to diagnose rotating machinery faults. Several of the references presented above used different deep learning methods to diagnose mechanical faults, but all include a feature extraction process such as DWT. Manual feature extraction is time-consuming and labor-intensive, and unsuitable extraction methods will also affect the diagnosis results. Jing et al.25 proposed an adaptive gearbox fault diagnosis method based on a deep convolutional neural network (DCNN) with no feature extraction process: the raw data collected from the experiment were directly applied as the inputs of the DCNN. Wang et al.26 proposed the adaptive deep convolutional neural network (ADCNN) method to diagnose bearing faults. Qu et al.27 used a deep sparse autoencoder (SAE) to diagnose gear pitting: the authors combined dictionary learning with sparse coding, stacked it into the AE network, and diagnosed two types of gear conditions (healthy, pitting) with raw data as the inputs of the deep SAE.
The domain adaptability of the diagnostic model is also a key evaluation criterion. Ren et al.28 proposed a new feature extraction method for diagnosing rolling bearing faults under varying speed conditions. Considering the increase in energy when the ball passes through the fault, the frequency values are divided by the instantaneous speed and the corresponding amplitude to form a new fault feature array, and a Euclidean distance classifier was used for recognition. Tong et al.29 proposed domain adaptation using transferable features (DATF) to solve the diagnosis of different working conditions; they used maximum mean discrepancy (MMD) to reduce the marginal and conditional distributions simultaneously across domains. Cheng et al.30 first transformed the vibration signal into a two-dimensional recurrence plot (RP) and then utilized speeded up robust features to extract fault features, considering the visual invariance characteristic of the human visual system (HVS). Liu et al.31 applied the Hilbert–Huang transform (HHT), singular value decomposition (SVD), and an Elman neural network to solve bearing fault diagnosis under variable working conditions; the SVD method is mainly applied to reduce the dimension of the instantaneous amplitude matrix and obtain insensitive fault features. Zhang et al.32 applied transfer learning (TL) to make diagnostic methods quickly adaptable to other working conditions.
Most of the aforementioned gear pitting fault diagnosis methods include feature extraction and feature selection processes. Moreover, the conventional diagnostic model is only suitable for fault diagnosis under one working condition. This article proposes a newly developed DNN methodology for the diagnosis of early gear pitting faults. Meanwhile, a particle swarm optimization (PSO) algorithm and L2 regularization are used to optimize the traditional DNN. In addition, TL is combined to develop a deep transfer learning network (DTLN) to improve the domain adaptability of the diagnostic model. The innovation of the proposed method is that the feature extraction and selection processes are omitted and the domain adaptability of the network is improved. The rest of the article is organized as follows: in ''The proposed method'' section, the methodology of the proposed method is introduced. In ''Experiment setup and data segmentation'' section, the data collected from the experimental test rig and the preprocessing of the collected vibration data are explained. In ''Results and discussions'' section, the validation of the proposed method using the collected vibration data is reported. Finally, ''Conclusions'' section concludes the article.
170 Proc IMechE Part O: J Risk and Reliability 234(1)
Figure 2. Different learning processes between traditional machine learning and transfer learning.
calculate the particle velocity and position, as shown in equations (27) and (28)

$$v_{ij}(t+1) = r\,v_{ij}(t) + c_1 e_1 \left[ Pb_j(t) - x_{ij}(t) \right] + c_2 e_2 \left[ gb_j - x_{ij}(t) \right] \tag{27}$$

$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1) \tag{28}$$

where i is the ith particle; j is the jth dimension of the P-dimensional space; c1 and c2 are the learning factors: c1 is the particle's own part, expressing the particle's own understanding of and influence on the optimization, and c2 is the social part, indicating that the particles are affected by the population; t is the number of iterations; e1 and e2 are random numbers uniformly distributed between 0 and 1; and r is the inertia weight of the particle, indicating that it is affected by the previous speed.
PSO is used to optimize the parameters in the DNN. If the DNN contains a total of k parameters, the dimension j of the space in equations (27) and (28) is equal to k. The number of particles is set empirically, and each particle contains j parameters. The best performing particle is selected after t iterations, and its j parameters are attached to the DNN as the initial parameters. The inertia weight determines the influence of the previous speed of the particle on the current speed, which plays a role in balancing the global search and the local search. As shown in equation (29), the weight decays linearly with iterations. This gives the particle swarm algorithm strong search ability at the beginning of the iterations and good local search ability in the later stage45

$$r = r_{\max} - \left( r_{\max} - r_{\min} \right) \frac{t}{t_{\max}} \tag{29}$$

where rmax is the set maximum weight, rmin is the set minimum weight, and tmax is the maximum number of iterations.
The position and velocity of the particles both have a range. When the velocity or position value is out of range, the processes shown in equation (30) will be performed

$$v_{ij} = \begin{cases} v_{\max}, & v_{ij} > v_{\max} \\ v_{\min}, & v_{ij} < v_{\min} \end{cases} \qquad x_{ij} = \begin{cases} x_{\max}, & x_{ij} > x_{\max} \\ x_{\min}, & x_{ij} < x_{\min} \end{cases} \tag{30}$$

The range of the particle velocity cannot be too large; otherwise the system will be unstable and it is easy to ''skip'' the optimal solution during particle iteration. The particle position range is set similarly to the velocity range, and limiting the particles' position helps find the optimal solution.
The initial positions of the particles are randomly assigned within a certain range, and the optimal solution found after several iterations may not be the global optimal solution. Therefore, the positions of the particles should be mutated with a certain probability, which can increase the diversity of the particles and find the optimal solution in a new area. After repeating the above-mentioned operations several times, the global optimal solution can be found.

The framework and diagnostic process of DTLN

Figure 3 shows the framework of DTLN. It can be seen that the overall framework of DTLN is divided into two parts: (1) when the training and test data are in the same working condition, perform 1-2 (purple circles marked in Figure 3) and (2) when the test data (target domain) are different from the training data (source domain), perform 1-3-4-5.
The detailed diagnostic process of the DTLN is defined as follows:
Step 1: Select one working condition's data from all the collected data. Then, cut the raw data into n segmentations with the same number of points. Finally, divide all segmentations into two groups, 80% of which is used for training and the remaining 20% for testing.
Step 2: Set the structure of the IDNN, set the minimum training error and the maximum epoch of training, and use the PSO algorithm to generate the initial weights and biases of the IDNN.
Li et al. 173
Step 3: Randomly select a batch of segmentations as the inputs of the IDNN.
Step 4: Get the actual output through the IDNN, and use the cost function corrected by L2 regularization to calculate the error between the actual output and the ideal output.
Step 5: Compute the gradients of the weights and biases in each layer with the back propagation algorithm, and update the weights and biases with the learning rate.
Step 6: Change to another batch of segmentations and repeat Steps 3–5 until all the training data are used up.
Step 7: Repeat Steps 3–6 until the training epochs reach the maximum epoch or the output error reaches the minimum set value.
Step 8: Test the trained network with the testing data. When the test working condition is the same as the working condition selected in Step 1, the trained network will be used directly for fault diagnosis. Otherwise, Steps 9–10 will be performed.
Step 9: Transfer the parameters of the trained IDNN to the new diagnostic model.
Step 10: Fine-tune the new diagnostic network with a small amount of data from the target domain (fine-tuning with 1% of all data gives a significant improvement). Finally, the fine-tuned model is used to diagnose the fault.

Experiment setup and data segmentation

Experiment setup

The experimental test rig and gear pitting types are shown in Figure 4. The gearbox is driven by two 45 kW Siemens servo motors: motor 1 is the drive motor and motor 2 is the load motor. The gearbox contains a pair of spur gears. The driving gear connected to motor 1 has 40 teeth, the driven gear connected to motor 2 has 72 teeth, and the gear module is 3 mm. The gearbox is also equipped with a lubrication and cooling system, and the vibration sensor is mounted on the bearing housing of the driven gear.
Table 1 describes the gear pitting conditions in Figure 4. Six different early pitting faults were created manually with a drill on the driven gear, and the degree of gear pitting gradually increases, as shown in Table 1. This setting of gear pitting faults simulates the process of gear pitting growing from small to large and also allows analysis of the relationship between pitting type and fault diagnostic accuracy.
This article proposes to establish a gear pitting diagnosis model suitable for various working conditions, so vibration data under various working conditions are collected to construct and test the model. In the experiment, vibration signals under five speed conditions and five torque conditions are collected, a total of 25
Figure 4. (a) Experimental test rig and (b) gear pitting type.
Figure 5. The vibration signal of 100 r/min-100 Nm: (a) one second of signal and (b) one segmentation of the signal.
working conditions, as shown in Table 2. Note that the circles in Table 2 represent the six conditions used in the mixed working condition diagnosis in ''Diagnosis results of IDNN under multiple working conditions'' and ''Diagnostic results with DTLN'' sections.

Data segmentation

The tri-axial accelerometer was mounted on the bearing housing of the driven gear and collected vibration signals in all three directions, with a sampling rate of 10,240 Hz. In this article, the vibration signals of seven kinds of gears under 25 working conditions are collected. Comparing the vibration signals of all three directions, the amplitude of the Z-axis is the largest. Therefore, we use the Z-axis vibration signal in the diagnosis of gear pitting faults. The Z-axis vibration signal under the 100 r/min-100 Nm working condition is shown in Figure 5(a).
We collected vibration signals five times for each gear fault type (C1–C7), so there are 35 files in each working condition and 60,000 data points per file. The number of data points in each file is too large to be used directly as input to the DNN, so we cut the raw signal into suitable segmentations. The advantage of data segmentation is that the number of neurons in the input layer is reduced, which in turn reduces the complexity of the DNN structure and makes the network fit more quickly. At the same time, the training sample size and sample diversity are increased, and the diagnostic accuracy of the network is improved.
The sampling rate is 10,240 Hz and the maximum rotation speed is 500 r/min, so approximately 1200 data points per gear rotation can be computed. We put 300 data points (a quarter of the data collected per gear rotation) in each segmentation.46 So each file is divided into 200 segmentations, a total of 7000 segmentations. About 80% of all data are used for training and the rest is
used for testing. The diagnostic model training matrix dimension for each working condition is 300 × 5600, and the testing matrix dimension for each working condition is 300 × 1400.

Table 2. Experimental working conditions (○ marks the six conditions used in the mixed working condition diagnosis).

Torque (Nm) \ Speed (r/min)   100   200   300   400   500
100                            ○     •     ○     •     ○
200                            •     •     •     •     •
300                            ○     •     •     •     •
400                            •     •     •     •     •
500                            ○     •     •     •     ○

Results and discussions

Diagnosis results of IDNN under working condition 100 r/min-100 Nm

First, we should decide the structure of the IDNN: the number of neurons in the input layer is equal to the number of data points in a segmentation (300 neurons), there are seven neurons in the output layer (corresponding to the seven gear types), and there are three hidden layers (300, 200, and 100 neurons). The minimum training error is set to 0.01 and the maximum number of training epochs is set to 150. All samples are randomly divided into batches, each batch is trained in turn, and one training epoch is completed when all batches are trained.
Figure 6(a) shows the effect of PSO on training. By comparison, it is found that after PSO optimization, the initial error is reduced from 25 to 2, and the number of training epochs is also greatly reduced, which means that PSO optimization can shorten the training process and make it more stable. Table 3 shows the effect of the PSO algorithm on training time and training accuracy. The term NAN in the table indicates that the network does not converge. The PSO algorithm allows the network to start with good initial parameters; in this case, it is possible to choose a larger learning rate and speed up network convergence.
Figure 6(b) shows the influence of the magnitude of the L2 coefficient λ on the diagnostic accuracy. It can be seen from the figure that when λ is equal to 0, that is, there is no L2 optimization, the accuracy is about 0.9. As the value of λ increases, the accuracy shows an upward trend. The accuracy reaches the maximum value of 0.96386 when λ is equal to 0.35. As the L2 coefficient λ continues to increase, the fluctuation of the accuracy becomes larger, that is, the stability of the diagnostic model decreases.
The confusion matrices of the standard DNN (SDNN) method and the IDNN are shown in Figure 7. The activation function ReLU is used in the standard DNN. It can be seen that the improved method has better diagnostic accuracy. The misdiagnoses of the two methods are consistent (case 1: C2 misjudged as C4; case 2: C2 misjudged as C6; case 3: C5 misjudged as C6; case 4: C6 misjudged as C4). The initial judgment is that the misdiagnoses are due to occasional single-tooth engagement of the gearbox resulting in a change in the fault type.
The diagnostic accuracy of four methods for diagnosing gear pitting faults under 100 r/min-100 Nm is shown in Table 4. When the SVM and ANN methods were used, 12 statistical characteristics (mean, root mean square (RMS), variance, etc.) were extracted from the time domain and frequency domain. In contrast, the standard DNN method and the proposed method used the raw vibration signal as the input.
The fault type of the gear is the type corresponding to the neuron with the maximum value. The diagnostic
Table 3. The effect of PSO on the training time and diagnostic accuracy.
Figure 6. Training error curve of hybrid model: (a) influence of PSO and (b) influence of L2 coefficient λ.
accuracy is the same whether the maximum neuron output is 0.5 or 0.99. Therefore, the diagnostic accuracy cannot fully represent the diagnostic ability of the network. We performed principal component analysis (PCA) on the output matrix of the network to further analyze the diagnostic ability of the three methods, and then used the first two principal components (PCs) of the PCA results to form a scatter plot, as shown in Figure 8. The diagnostic accuracy of Figure 8(a) and (b) is similar, but from the PCA results we can see that the diagnostic ability of the SDNN method is significantly better than that of the ANN method. Compared with the SDNN method, the diagnostic ability of the IDNN has also improved significantly.
The parameter settings during training also affect the diagnostic accuracy. Figure 9 shows the effect of the learning rate and the batch size (samples in each batch). Figure 9(a) and (b) shows that as the learning rate and batch size increase, the accuracy decreases.

Diagnosis results of IDNN under multiple working conditions

''Diagnosis results of IDNN under working condition 100 r/min-100 Nm'' section shows the results of applying IDNN to diagnose gear faults under the 100 r/min-100 Nm working condition. This section applies a variety of working conditions to verify the adaptability of IDNN for diagnosing multiple working conditions. Figure 10(a), (c), and (e) shows the diagnostic accuracy of the three methods (SVM, ANN, and IDNN) in 25 working conditions (as shown in Table 2). It can be found from Figure 10(e) that the IDNN method has high accuracy under each working condition, but it is necessary to retrain the network when the working conditions change. Figure 10(b), (d), and (f) shows the cross-diagnosis accuracy of six working conditions (labeled as circles in Table 2) without retraining the network. It can be seen from Figure 10(f) that the diagnostic
Table 4 (header row): Method | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Average
SVM: support vector machine; ANN: artificial neural network; SDNN: standard deep neural network; IDNN: improved deep neural network.
Figure 8. The PCA result of three kinds of network outputs: (a) ANN, (b) SDNN, and (c) improved DNN.
Figure 9. Influence of the network parameter on diagnostic accuracy: (a) learning rate and (b) batch size.
accuracy is good only when the training and testing data are from the same working condition. In other words, a trained IDNN developed under one working condition is only applicable to the same working condition and cannot be used in other working conditions.

Diagnostic results with DTLN

As can be seen from Figure 10, the IDNN has good diagnostic accuracy under each working condition. However, an IDNN well trained under one working condition can only diagnose data under this condition. In order to improve the working condition adaptability of the diagnostic model, this article proposes a DTLN based on TL. This section applies six working conditions (labeled as circles in Table 2) to test the adaptability of the DTLN. The six working conditions are as follows: A: 100 r/min-100 Nm, B: 100 r/min-300 Nm, C: 100 r/min-500 Nm, D: 300 r/min-100 Nm, E: 500 r/min-100 Nm, and F: 500 r/min-500 Nm.
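Steps 9 and 10 of the DTLN procedure (transfer the trained parameters, then fine-tune on a small target-domain sample) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the tiny ReLU network, the synthetic two-condition data, and all hyperparameters are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in, n_hid, n_out):
    # Small random weights stand in for the PSO-generated initialization.
    return {"W1": rng.normal(0, 0.1, (n_in, n_hid)), "b1": np.zeros(n_hid),
            "W2": rng.normal(0, 0.1, (n_hid, n_out)), "b2": np.zeros(n_out)}

def forward(p, X):
    h = np.maximum(0.0, X @ p["W1"] + p["b1"])        # ReLU hidden layer
    z = h @ p["W2"] + p["b2"]
    e = np.exp(z - z.max(axis=1, keepdims=True))      # softmax output
    return h, e / e.sum(axis=1, keepdims=True)

def sgd_step(p, X, y, lr, lam=1e-3):
    # One full-batch gradient step on cross-entropy with L2 weight decay.
    h, out = forward(p, X)
    d2 = (out - np.eye(out.shape[1])[y]) / len(X)
    d1 = (d2 @ p["W2"].T) * (h > 0)
    p["W2"] -= lr * (h.T @ d2 + lam * p["W2"]); p["b2"] -= lr * d2.sum(0)
    p["W1"] -= lr * (X.T @ d1 + lam * p["W1"]); p["b1"] -= lr * d1.sum(0)

def accuracy(p, X, y):
    return float((forward(p, X)[1].argmax(axis=1) == y).mean())

def make_data(n, scale, shift):
    # Three synthetic fault classes; scale/shift mimic a changed working condition.
    y = rng.integers(0, 3, n)
    X = np.eye(3)[y].repeat(10, axis=1) + 0.25 * rng.normal(size=(n, 30))
    return scale * X + shift, y

Xs, ys = make_data(600, 1.0, 0.0)     # source domain (e.g. condition A)
Xt, yt = make_data(600, 2.0, 1.0)     # target domain (e.g. condition B)

src = init_params(30, 16, 3)
for _ in range(500):                  # pre-train on the source domain
    sgd_step(src, Xs, ys, lr=0.5)

tgt = {k: v.copy() for k, v in src.items()}   # Step 9: transfer the parameters
for _ in range(50):                           # Step 10: fine-tune on a small
    sgd_step(tgt, Xt[:60], yt[:60], lr=0.1)   # slice of target-domain data

print(accuracy(src, Xs, ys), accuracy(tgt, Xt, yt))
```

Fine-tuning only adjusts the copied parameters with a small slice of target data; in the paper, the same role is played by roughly 1% of the target-domain segmentations.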
Figure 10. Diagnostic accuracy of different methods: (a), (b) SVM method; (c), (d) ANN method; (e), (f) IDNN method. The
training data and test data used in (a), (c), and (e) are from the same working condition; the training data and test data used in (b),
(d), and (f) are from different working conditions.
Figure 11 shows the diagnostic accuracy changes corresponding to an increase in the training sample size for both DTLN and IDNN. The horizontal axis is the target domain sample size used to fine-tune the pre-trained network. The data in working condition A and the data in working condition B were used as training data for the results in Figure 11(a) and (b), respectively. The four curves in Figure 11(a) correspond to four cases: (1) case 1 (A-A with IDNN, all samples are used): trains the network with 80% of the data in working condition A and tests with the remaining data in working condition A; (2) case 2 (A-B with DTLN, with different training sample sizes): uses DTLN to diagnose faults, where the source domain Ds was the data in working condition A and the target domain Dt was the data in working condition B. As discussed in ''Data segmentation'' section, the number of samples in each working condition was 7000. Setting the percentage of the data used for fine-tuning from 0.1% to 2%, the fine-tuning sample size was changed from seven (7000 × 0.1%) to 140
Figure 11. The accuracy changes corresponding to the changes in training sample size for DTLN and IDNN: (a) source domain:
working condition A, target domain: working condition B; (b) source domain: working condition B, target domain: working condition
A.
(7000 × 2%); (3) case 3 (A-B with IDNN, all samples are used): trains the network with 80% of the data in working condition A, and then tests the trained network with 20% of the data from working condition B; and (4) case 4 (A-A with IDNN, accuracy fluctuates with training sample size): trains the network with sample sizes from seven to 140 in working condition A, and then tests the trained network with 20% of the data in working condition A. In Figure 11(b), data in working condition B were used as the source domain and data in working condition A were used as the target domain. As can be seen from the figure, the DTLN can achieve high diagnostic accuracy with 1% of the data used for fine-tuning.
Figure 12 shows a comparison of the diagnostic accuracy of DTLN and IDNN with different target domains. Taking Figure 12(a) as an example, the data in working condition A as the source domain were used to train the model, and the data in the other five working conditions (B to F) were used as the target domains to test the model. When using the DTLN method, 5% of the target domain data were used to fine-tune the pre-trained model. Comparing the diagnostic accuracy of the two methods, it can be found that the DTLN is significantly more adaptable to different working conditions than the IDNN. The DTLN method not only improves the diagnostic accuracy under multiple working conditions, but also requires fewer training samples and less training time. The IDNN required 67 s to train the model with 80% of the data in working condition A. When the data in working condition B were used as the target domain, it took 72 s to develop the model with 80% of the data in working condition B. However, using the DTLN method to fine-tune the model required only 6 s, which reduced the training time by more than a factor of 10. In summary, the DTLN can not only make the model adapt to multiple working conditions, but also save training time and samples.

Conclusions

In this article, a domain adaptation model for early gear pitting fault diagnosis based on deep TL was presented. By combining an IDNN with TL, a DTLN was developed to give the diagnostic model good diagnostic accuracy under multiple working conditions. The vibration signals for seven types of gears with early pitting faults under 25 working conditions collected from a gear test rig were used to validate the DTLN. Based on the validation results, we can draw the following conclusions:

1. Using PSO optimization to initialize the model parameters speeds up the training process. L2 regularization improves the diagnostic ability of the diagnostic model by weight decay during training.
2. The IDNN has high diagnostic accuracy when the target domain (testing data) and the source domain (training data) are in the same working condition, and the maximum accuracy can reach 99.93%. However, a diagnostic model developed with the IDNN is only suitable for fault diagnosis under the same working condition.
3. The DTLN overcomes the shortcomings of the IDNN and greatly improves the adaptability of the diagnostic model to multiple working conditions. Moreover, to fine-tune the pre-trained model, only a small number of target samples and little training time are required.
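Conclusion 1 rests on the PSO initialization described by equations (27), (28), and (30), with the linearly decaying inertia weight of equation (29). A minimal sketch of those update rules is given below on a quadratic test function standing in for the network training error; the particle count, bounds, and learning factors c1 = c2 = 2 are assumptions for illustration, not the authors' settings, and the mutation step described in the text is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(fitness, dim, n_particles=30, t_max=100,
        r_max=0.9, r_min=0.4, c1=2.0, c2=2.0,
        x_bound=(-5.0, 5.0), v_bound=(-1.0, 1.0)):
    x = rng.uniform(*x_bound, (n_particles, dim))       # initial positions
    v = rng.uniform(*v_bound, (n_particles, dim))       # initial velocities
    pb = x.copy()                                       # personal bests Pb
    pb_val = np.apply_along_axis(fitness, 1, x)
    gb = pb[pb_val.argmin()].copy()                     # global best gb
    for t in range(t_max):
        r = r_max - (r_max - r_min) * t / t_max         # eq. (29): linear inertia decay
        e1 = rng.random((n_particles, dim))             # uniform random factors
        e2 = rng.random((n_particles, dim))
        v = r * v + c1 * e1 * (pb - x) + c2 * e2 * (gb - x)   # eq. (27)
        v = np.clip(v, *v_bound)                        # eq. (30): velocity limits
        x = np.clip(x + v, *x_bound)                    # eq. (28) + position limits
        val = np.apply_along_axis(fitness, 1, x)
        better = val < pb_val
        pb[better], pb_val[better] = x[better], val[better]
        gb = pb[pb_val.argmin()].copy()
    return gb, float(pb_val.min())

# A quadratic test function stands in for the DNN training error; in the
# paper each particle would instead hold the k network parameters.
best, best_val = pso(lambda w: float(np.sum(w ** 2)), dim=5)
print(best_val)
```

In the paper, the best performing particle's parameters become the initial weights and biases of the IDNN, replacing the random initialization.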
Figure 12. Comparison of diagnostic accuracy between DTLN and IDNN: (a) source domain: working condition A, (b) source
domain: working condition B, (c) source domain: working condition C, (d) source domain: working condition D, (e) source domain:
working condition E, and (f) source domain: working condition F.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the National Natural Science Foundation of China (No. 51675089 and No. 51505353).

ORCID iDs
Jialin Li https://orcid.org/0000-0002-9940-179X
Xueyi Li https://orcid.org/0000-0002-1751-2809
David He https://orcid.org/0000-0002-5703-6616
33. Wang Z, Wang J and Wang Y. An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition. Neurocomputing 2018; 310: 132–222.
34. Cui JL, Qiu S, Jiang MY, et al. Text classification based on ReLU activation function of SAE algorithm. In: Proceedings of the international symposium on neural networks, Hokkaido, Japan, 21–26 June 2017, pp.44–50. Cham: Springer.
35. Ye J. Fault diagnosis of turbine based on fuzzy cross entropy of vague sets. Expert Syst Appl 2009; 36(4): 8103–8106.
36. Clevert DA, Unterthiner T and Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs). In: Proceedings of the international conference on learning representations, San Juan, PR, 2–4 May 2016, https://arxiv.org/abs/1511.07289
37. Zhao M, Chow TWS, Zhang H, et al. Rolling fault diagnosis via robust semi-supervised model with capped l2,1-norm regularization. In: Proceedings of the IEEE international conference on industrial technology, Toronto, ON, Canada, 22–25 March 2017. New York: IEEE.
38. Shao SY, McAleer S, Yan RQ, et al. Highly accurate machine fault diagnosis using deep transfer learning. IEEE T Ind Inform 2018; 15: 2446–2455.
39. Cao P, Zhang S and Tang J. Pre-processing-free gear fault diagnosis using small datasets with deep convolutional neural network-based transfer learning. IEEE Access 2018; 6: 26241–26253.
40. Qian WW, Li SM and Wang JR. A new transfer learning method and its application on rotating machine fault diagnosis under variant working conditions. IEEE Access 2018; 6: 69907–69917.
41. Mohammadi N and Mirabedini SJ. Comparison of particle swarm optimization and backpropagation algorithms for training feedforward neural network. J Math Comp Sci 2014; 12: 113–123.
42. Kulkarni VR and Desai V. ABC and PSO: a comparative analysis. In: Proceedings of the IEEE international conference on computational intelligence & computing research, Chennai, India, 14–16 December 2017. New York: IEEE.
43. Chen L, Xiao C, Li X, et al. A seismic fault recognition method based on ant colony optimization. J Appl Geophys 2018; 152: 1–8.
44. Rajeswari C, Sathiyabhama B, Devendiran S, et al. A gear fault identification using wavelet transform, rough set based GA, ANN and C4.5 algorithm. Procedia Engineer 2014; 97: 1831–1841.
45. Fang H. Monopole-gear design based on neural network and modified particle swarm optimization. Appl Mech Mater 2013; 477–478: 368–373.
46. Zhang R, Peng Z, Wu L, et al. Fault diagnosis from raw sensor data using deep neural networks considering temporal coherence. Sensors 2017; 17(3): E549.