Paper FD With RL 2
net/publication/336186043
7 authors, including:
Yu Ding Liang Ma
Beihang University (BUAA) Beihang University (BUAA)
All content following this page was uploaded by Yu Ding on 23 November 2020.
Abstract
Fault diagnosis methods for rotating machinery have always been a hot research
topic, and artificial intelligence-based approaches have attracted increasing attention
from both researchers and engineers. Among those related studies and methods,
artificial neural networks, especially deep learning-based methods, are widely used to
extract fault features or classify fault features obtained by other signal processing
techniques. Although such methods could solve the fault diagnosis problems of rotating
machinery, there are still two deficiencies. (1) Because these methods cannot establish a
direct linear or non-linear mapping between raw data and the corresponding fault modes,
their performance depends heavily on the quality of the extracted features.
(2) The optimization of neural network architecture and parameters, especially for deep
neural networks, requires considerable manual modification and expert experience,
which limits the applicability and generalization of such methods. As a remarkable
breakthrough in artificial intelligence, AlphaGo, a representative achievement of deep
reinforcement learning, provides inspiration and direction for addressing the aforementioned
shortcomings. Combining the advantages of deep learning and reinforcement learning,
deep reinforcement learning is able to build an end-to-end fault diagnosis architecture
that can directly map raw fault data to the corresponding fault modes. Thus, based on
deep reinforcement learning, a novel intelligent diagnosis method is proposed that is
able to overcome the shortcomings of the aforementioned diagnosis methods.
Validation tests of the proposed method are carried out using datasets of two types of
rotating machinery, rolling bearings and hydraulic pumps, which contain a large
number of measured raw vibration signals under different health states and working
conditions. The diagnosis results show that the proposed method is able to obtain
intelligent fault diagnosis agents that can mine the relationships between the raw
vibration signals and fault modes autonomously and effectively. Considering that the
learning process of the proposed method depends only on the replayed memories of the
agent and the overall rewards, which represent much weaker feedback than that
obtained by the supervised learning-based method, the proposed method is promising
in establishing a general fault diagnosis architecture for rotating machinery.
Keywords: fault diagnosis, rotating machinery, deep reinforcement learning, deep Q-
network
1. Introduction
Rotating machinery is currently employed in a wide variety of industrial
applications, including the petroleum, energy and chemical industries. Such
applications require high reliability, safety, and economy of the machines during
operation, as the downtime caused by failure may cause economic loss or even
catastrophic accidents [1]. With the development of reliability-related theories, people
realize that crafting “absolutely reliable” machines is impossible. Thus, recent studies
that aim to monitor, evaluate, diagnose and predict the health state of machinery have
attracted considerable interest [2, 3]. Data-driven signal processing techniques are a
significant research direction in fault diagnosis of rotating machinery, and these
techniques are used to analyse the collected signals and extract useful fault
characteristics. Such methods can adaptively analyse the collected data and obtain
sensitive information for distinguishing the health state of machinery [4–6].
As fault diagnosis can be regarded as a pattern recognition problem, artificial
intelligence (AI) has shown great potential in both academic studies and industrial
applications. Among the AI-based intelligent fault diagnosis methods for rotating
machinery, artificial neural networks (ANNs) have been researched and employed in
fault feature extraction and fault mode classification. Bin et al. [7] utilized the feature
obtained by wavelet packet transformation and empirical mode decomposition as the
input of a three-layer ANN to identify early rotor lateral cracks in rotating machinery.
Rafiee et al. [8] used a multiple layer perceptron (MLP) ANN, which takes the wavelet
packet decomposition results of the collected data as input, for fault detection and
diagnosis of gearboxes. Samanta et al. [9] utilized features extracted from the time
domain to identify the state of bearings and employed ANNs and support vector
machines (SVMs) to classify the bearing faults. In the above approaches, the ANNs
were used as a classifier, which still relied on the features extracted by time-domain and
frequency-domain signal processing methods. There are two major drawbacks when
employing ANN-based fault diagnosis methods. The first is that such methods rely on
manual feature extraction, which largely depends on prior knowledge of signal
processing techniques and expert experience. Second, the shallow architecture of ANNs
lacks enough ability to learn the relationships that show the complex non-linear
characteristics existing in fault diagnosis problems. With the advent and development
of deep learning, such problems have been solved to a certain extent [10]. Deep learning
is a class of machine learning algorithms that can learn abstract representations at
multiple levels through supervised or unsupervised learning [11].
Instead of extracting the fault features manually, deep learning methods are able to
adaptively learn the hierarchical representation from raw data through multiple non-
linear transformations and approximate complex non-linear functions. Inspired by the
successful applications of deep learning in the fields of natural language processing and
image recognition, deep learning-based fault diagnosis methods have attracted
increasing attention [12]. Based on deep neural networks (DNNs), Jia et al. [13]
proposed an intelligent diagnosis method to automatically mine the fault characteristics
from the frequency spectrum of rotating machinery and classify the health conditions.
Lu et al. [14] used a convolutional neural network (CNN) to learn hierarchical feature
representations of rolling bearing signals and to diagnose rolling bearing faults.
Liu et al. [15] utilized recurrent neural network (RNN)-based
autoencoders to process multiple time sequences collected from rotating
machines. Anomaly detection and fault type classification are realized
by evaluating the reconstruction error between the data at the next time stamp and the
predicted data generated by RNN-based denoising autoencoders. Shao et al. [16]
proposed a method named the improved convolutional deep belief network (CDBN),
which used compressed sensing (CS) for feature learning and fault diagnosis of rolling
bearings. Their method used CS to reduce the amount of vibration data and
CDBN to enhance the feature learning capability for the compressed data.
The literature reviewed above shows that ANN-based methods serve effectively as
basic classifiers [7–9], while deep learning-based methods can realize adaptive fault
feature extraction through DNN architectures and hierarchical representations [13, 14, 16].
Compared with traditional ANN-based methods, deep learning-based methods have
improved automation and autonomy in seeking representations that are strongly
correlated with faults. However, two major deficiencies still need to be
addressed. (1) Choosing a proper architecture and optimizing the corresponding
hyperparameters for a DNN still requires a great deal of prior knowledge and experience,
which limits the generalization ability of such methods. Although deep learning-based
methods can effectively and automatically extract fault features, there are still many
details to be determined for a specific task. (2) The training mechanism of such methods
is mostly based on supervised learning or semi-supervised learning, which means that
the diagnosis algorithms need to be taught to learn the different fault patterns
specifically. Such a mechanism does not conform to the human cognitive mechanism
in the real world, which makes it impossible for the fault diagnosis agents to explore
the problem they face and learn to solve it autonomously. These deficiencies limit the
generalization and intelligence level of deep learning-based fault diagnosis methods.
Therefore, it is necessary to establish a deep reinforcement learning mechanism-based
intelligent architecture to realize autonomous fault diagnosis of rotating machinery.
Deep reinforcement learning (DRL) provides new ideas for further improving the
intelligence of fault diagnosis and has the potential to solve the problems mentioned
above. DRL involves the combination of reinforcement learning and deep learning, and
this combination enables artificial agents to learn their knowledge and experience from
raw data directly. Deep learning gives agents the ability to sense the environment, while
reinforcement learning gives agents the ability to learn the best strategies to deal with
real-world problems [17]. Utilizing DRL algorithms, agents can learn by themselves to
obtain successful strategies that lead to the highest long-term rewards directly from raw
input data without any hand-engineered features or domain heuristics [18]. One of the
most famous and striking achievements of DRL is AlphaGo, which is the first program
to defeat a professional human player in Go [19]. Such success has aroused much
enthusiasm for applying DRL methods in different fields. DRL algorithms have been
applied to a wide range of problems in fields such as robotics, natural language
processing, and computer vision. In robotics, DRL is applied to train robots to learn
control policies directly from camera signals from the real world [20, 21]. Furthermore,
DRL has been used to create agents that can generalize to an environment they have
never seen before, which is considered meta-learning [22, 23]. For the purpose of
creating agents or systems that can learn how to adapt in the real world, DRL algorithms
have been applied not only in managing power consumption [24] and picking and
stowing objects [21] but also in designing novel and machine translation models [25]
and constructing new optimization functions [26]. As a general way of solving
optimization problems by trial and error, DRL has already been studied and applied
in most fields of machine learning.
To overcome the deficiencies of the aforementioned existing methods, this paper
proposes a novel intelligent fault diagnosis method based on fault diagnosis agents
trained utilizing DRL. In this method, a fault diagnosis “game” is established, which is
able to provide an interactive environment for the fault diagnosis agent to observe, act
and receive rewards. Then, the fault diagnosis agent is built through a stacked
autoencoder (SAE) and learns to diagnose faults utilizing a deep Q-network (DQN)
combined with experience replay and a separated target network [27]. The highlights
of the proposed method are outlined as follows. (1) Combining the autonomous learning
ability of reinforcement learning and the perception ability of deep learning, the
proposed deep reinforcement learning-based method can realize an end-to-end fault
diagnosis method for rotating machinery. The proposed method greatly reduces the
dependence on prior knowledge and expert experience for fault diagnosis. (2) By
designing a fault diagnosis game environment, the proposed method can successfully
obtain an intelligent fault diagnosis agent using DRL. (3) Benefitting from memory
replay and reward-based feedback mechanisms, the fault diagnosis agent is able to learn
the non-linear mapping relationships between different fault modes and raw vibration
signals based only on weak feedback. This paper is organized as follows. Section 2
briefly introduces the theoretical background of the DQN. Section 3 describes the
proposed DRL-based intelligent fault diagnosis method. Section 4 gives a detailed
description of validating the proposed method using rolling bearing datasets and a
hydraulic pump dataset.
2.1 Q-learning
In the Q-learning update, $Q^{new}(s_t, a_t)$ is the new Q value for state $s_t$ and
action $a_t$, $Q(s_t, a_t)$ is the current Q value, $\alpha$ is the learning rate,
$R_{t+1}$ is the reward for taking action $a_t$ at state $s_t$, and
$\max_a Q(s_{t+1}, a)$ is the maximum expected future reward given the new state $s_{t+1}$.
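For reference, the symbols defined here fit the standard Q-learning update rule, written out below as a sketch. The discount factor $\gamma$ is an assumption: it is part of the textbook rule but is not defined in the surrounding text.

```latex
Q^{new}(s_t, a_t) = Q(s_t, a_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the gap between the current estimate and the reward plus the discounted best estimate for the next state.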
With the definition and the differential form of the Q-function, the target can be
optimized. Additionally, the Q-learning algorithm can be recovered in this
framework by updating the weights after every time step, replacing the expectations
with single samples, and setting $\theta_i^- = \theta_{i-1}$.
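The loss that this paragraph refers to can be sketched in the form commonly used for the DQN. The symbols $r$, $s'$, $a'$, the discount factor $\gamma$, and the expectation over stored transitions are assumptions taken from that common formulation rather than from the surrounding text; $\theta_i$ denotes the parameters of the online network at iteration $i$ and $\theta_i^-$ those used to compute the target.

```latex
L_i(\theta_i) = \mathbb{E}_{(s, a, r, s')}
  \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i) \right)^2 \right]
```

Under this form, holding $\theta_i^-$ fixed while minimizing over $\theta_i$ is what keeps the target from moving at every step.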
In the training process of the DQN, two modifications of Q-learning are made to
ensure that the training process of DNNs does not diverge. The first is the experience
replay in which the agent’s experiences are stored into a replay memory by stacking the
state, action, and reward of the current time step and the state of the subsequent time
step. Experience replay is an effective technique for avoiding oscillations or
divergence in the parameters; it smooths out learning by enabling the agent to draw on
its past experience in the learning process. The second modification to Q-learning is to use
a separate network for generating the targets in the Q-learning update, and
this modification can significantly improve the stability of the DQN. The separate network
adds a delay between the moment the Q value is updated and the moment that update
affects the targets, which reduces the likelihood of divergence or
oscillations in the parameters of DNNs.
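The two modifications can be illustrated with a minimal sketch. The class and function names, the memory capacity, and the synchronization period below are hypothetical choices made for illustration and are not taken from the paper.

```python
import copy
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest experience once it is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params, target_params, step, period):
    """Return a fresh copy of the online parameters every `period` steps.

    Between synchronizations the target parameters are returned unchanged,
    which delays the effect of each update on the training targets.
    """
    if step % period == 0:
        return copy.deepcopy(online_params)
    return target_params
```

In a training loop, each interaction would be passed to `store`, a minibatch drawn with `sample`, and `sync_target` called once per step so that the targets only change periodically.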
At this point, although the training process of the agent is complete, the agent cannot
yet be used to diagnose rolling bearing faults. The agent outputs only the Q values
of the actions rather than a diagnosis result. Thus, the output layer of the agent
needs to be modified. The modified layer finds the maximum of the Q values
and outputs the corresponding action. In this way, an agent that is able to diagnose the
faults of rotating machinery is obtained. Notably, in both the training and testing
processes, the input data are the original vibration signals. In other words, an end-to-
end fault diagnosis method using SAE-based DRL is established. The complete method
flowchart is shown in Fig. 2.
Fig. 2. Flowchart of the proposed method
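The output-layer modification described above amounts to taking the action with the largest Q value. A minimal sketch follows; the function name and the example Q values are hypothetical, and the fault mode names are used only as illustrative labels.

```python
def diagnose(q_values, fault_modes):
    """Return the fault mode whose action has the largest Q value."""
    if len(q_values) != len(fault_modes):
        raise ValueError("one Q value is expected per fault mode")
    best_action = max(range(len(q_values)), key=lambda i: q_values[i])
    return fault_modes[best_action]


# Hypothetical Q values for three actions.
print(diagnose([0.12, 0.87, -0.40], ["Normal", "R_007", "I_007"]))  # prints R_007
```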
4. Fault diagnosis using the proposed method
Rolling bearings and hydraulic pumps, two representative research objects of
rotating machinery, are crucial for the safety of industrial systems. Due to severe
working conditions, they are prone to suffer different kinds of damage, which could
lead to a breakdown or even catastrophic failure. In this section, two fault diagnosis
cases for rotating machinery, rolling bearings and hydraulic pumps, are used to
illustrate the effectiveness of the proposed method.
The fault data of rolling bearings used in our study are collected by Case Western
Reserve University (CWRU) [34]. The dataset consists of ball bearing test data for both
normal and faulty bearings. The test rig is driven by a two-horsepower
Reliance Electric motor, and the vibration data are acceleration signals
measured at the drive end (DE), near the motor bearings, with a sampling frequency
of 48 kHz. The experimental equipment for rolling
bearing fault injection is shown in Fig. 3. Data under four conditions were collected,
including one normal condition (N) and three fault conditions. The fault conditions are
outer race fault (OF), inner race fault (IF) and roller fault (RF). In addition, the faults
introduced to the rolling bearings were all single point faults. Moreover, the severity of
the faults varied, with fault diameters of 0.007 inches, 0.014 inches, 0.021
inches, and 0.028 inches.
To validate the proposed method, the vibration data are divided into four sets (S1,
S2, S3, and S4) according to loads. Each set contains ten fault modes, which are divided
according to fault diameter and fault location. S1, S2, and S3 contain all of the fault
modes under loads of one, two and three horsepower, respectively. There are 12000
samples for each fault set, and each sample contains 400 data points. The datasets are
all raw vibration signals without preprocessing. S4, which is the union set of S1, S2,
and S3, has 36000 samples under all of the loads. The detailed information is shown in
Table 1.
Table 1 Description of the rolling bearing fault datasets
Datasets     Load (horsepower)  Number of samples        Fault type  Fault diameter (inch)  Alias   Label
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  N           0                      Normal  1
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  RF          0.007                  R_007   2
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  RF          0.014                  R_014   3
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  RF          0.021                  R_021   4
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  IF          0.007                  I_007   5
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  IF          0.014                  I_014   6
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  IF          0.021                  I_021   7
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  OF          0.007                  O_007   8
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  OF          0.014                  O_014   9
S1/S2/S3/S4  1/2/3/1-3          12000/12000/12000/36000  OF          0.021                  O_021   10
In the training process of the proposed method, the SAE network has four layers,
including an input layer, two trained encoders, and an output layer. The dimension of
the input samples determines the number of units in the input layer, the first and
second encoders have 128 and 32 units, respectively, and the number of
Q values determines the number of units in the output layer. The activation functions used
in the encoders are ReLUs, and the optimizer used with backpropagation is Adam. Moreover,
L1 regularization, which can prevent overfitting and improve the training
efficiency, is also utilized to restrict the parameter space of the neural units’ weights.
Each autoencoder is trained for at most 200 epochs, the DQN agent is trained for
2000 epochs, and each epoch contains 512 rounds of the diagnosis game. The replay
memory size of our DQN agent is 64. Here, replay memory refers to the samples randomly selected
from memory and used as a minibatch to train the DQN agent. Thus, the
minibatch size in this study is 64. To test and validate the performance of the proposed
method, ten-fold cross-validation is applied to each of the four datasets. To monitor the
learning performance of our DQN agent, the mean values of the reward are calculated
for every ten rounds. Fig. 4 shows the progress curve of the DQN agent in a training
process. Moreover, the diagnosis results are summarized in Fig. 5.
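The layer sizes stated above can be illustrated with a minimal forward-pass sketch in plain Python. It assumes a 400-point input sample and ten Q values, one per fault mode; the random weights merely stand in for the pre-trained encoder weights, and all function names are hypothetical. Training, L1 regularization, and the Adam optimizer are not shown.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    # Small random weights are placeholders, not trained parameters.
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def q_network(sample, layers):
    """Input -> encoder 1 (ReLU) -> encoder 2 (ReLU) -> linear Q-value output."""
    hidden = sample
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    return dense(hidden, weights, bias)  # no activation on the output layer


rng = random.Random(0)
sizes = [400, 128, 32, 10]  # input, first encoder, second encoder, Q values
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
q_values = q_network([rng.gauss(0.0, 1.0) for _ in range(400)], layers)
print(len(q_values))  # prints 10
```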
Fig. 4. The agent’s score in the training process of fault diagnosis for rolling
bearing datasets
The fault diagnosis agent scores one point for a correct answer and negative one
point for a wrong answer, which means that the total score in a game of 512 rounds is
between -512 and 512 points. At the beginning of the game, the DQN agent is in the
stage of random guessing because of the lack of any prior knowledge and experience
as references. As the fault diagnosis result has ten categories, the agent has a probability
of only 0.1 to obtain the correct answer. Thus, in the beginning, the scores are
approximately -460 points. As the training process goes on, the agent learns from its
own historical experience and the change in scores. As shown in Fig. 4, the score
increases with training and reaches nearly 512 points when approaching the final
epochs. It is worth noting that the scores mentioned above are the same as the rewards,
which are used as feedback for evaluating the performance of the agent. The
convergence of the DQN agent to the global extremum can be observed through the total
reward of each epoch, because the diagnosis accuracy is directly reflected by the
total reward. When the total reward is high enough and starts to oscillate, the training
process of the DQN agent can be considered finished.
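The scoring rule of the diagnosis game can be sketched as follows. The function name is hypothetical, `agent` stands for any callable that maps a sample to a predicted label, and drawing samples uniformly with replacement is an assumption made for illustration.

```python
import random


def play_game(agent, samples, labels, rounds=512, rng=random):
    """Play one diagnosis game and return the total score."""
    score = 0
    for _ in range(rounds):
        i = rng.randrange(len(samples))           # the environment presents a sample
        guess = agent(samples[i])                 # the agent acts
        score += 1 if guess == labels[i] else -1  # +1 if correct, -1 if wrong
    return score
```

An agent that is always right scores 512 and one that is always wrong scores -512, which gives the score range stated above.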
Fig. 5. Diagnosis results of ten-fold cross-validation of rolling bearing datasets using the
proposed method and the SAE-Softmax method: (a)-(d) results of datasets S1, S2, S3
and S4 using the proposed method, (e)-(h) results of datasets S1, S2, S3 and S4 using the
SAE-Softmax method.
Fig. 5 illustrates that the trained agent can effectively diagnose faults under various
conditions. The quantitative results of ten-fold cross-validation are shown in Table 2.
To illustrate the performance and effectiveness of the proposed method, we also process
the same datasets by ten-fold cross-validation using an SAE model that is trained based
on supervised learning. The input layer and the two autoencoders are the same as the
corresponding layers of the DQN agent. The output layer has ten nodes and uses the
softmax function to output classified results. The training process of the model contains
layer-by-layer pre-training of the autoencoders and fine-tuning of the entire SAE model.
Overall, both methods achieved a good level of diagnostic accuracy and stability. On
the training sets, the proposed method can distinguish the samples correctly with nearly
100% accuracy, and the SAE-Softmax method can distinguish all the samples correctly.
On the testing sets, the two methods are very close in terms of their accuracies and the
corresponding standard deviations. The absolute differences in testing accuracy between the
two methods for datasets S1, S2, S3, and S4 are 0.0069, 0.012, 0.0126, and 0.0075,
respectively. Considering that the supervised learning-based method has the advantage
of being given the relationships between the samples and their corresponding labels directly,
this result illustrates the effectiveness of the proposed method. In addition, the
maximum standard deviation of the testing accuracies is 0.0144, which indicates that
the proposed method can stably classify the fault samples. Specifically, the situation of
dataset S4, which contains raw vibration signals of ten health conditions under three
different loads, is the most complicated of the four datasets. The proposed method can
distinguish the fault modes regardless of the working conditions. Interestingly, the
SAE-Softmax method performs better than the proposed method on all of the experiments
except for the average accuracy of the testing on dataset S4, which is the most
challenging experiment in case 1. In summary, the results show that the proposed
method can train a DNN-based agent to learn how to diagnose the faults of rolling
bearings using only the raw vibration signals, regardless of the fault categories,
severities, and working conditions.
Table 2 Diagnosis results of the rolling bearing datasets
Datasets    The proposed method                   The SAE-Softmax based method
            Training accuracy  Testing accuracy   Training accuracy  Testing accuracy
Dataset S1  0.9984±0.0004      0.9358±0.0099      1                  0.9427±0.0051
Fig. 6. Two-dimensional t-SNE of the representations in the last hidden layer of the DQN for
the rolling bearing datasets
The SAE model used in our method perceives the raw vibration signals and is the
front end of our proposed end-to-end fault diagnosis method. To further understand how
the agent learns to diagnose the faults and validate the agent’s ability to mine the fault
characteristics, t-distributed stochastic neighbor embedding (t-SNE) is utilized to
visualize the output of the SAE models inside the DQN agent. The second encoder’s output
is taken as the features that our agent has learned, because the output layer of the DQN agent
is a linear layer that outputs the Q values. Thus, the dimensionality of a feature vector
extracted from a single sample is 32 in case 1. The t-SNE algorithm is employed on the
high-dimensional features, and the visualization results of datasets S1-S4 are shown in
Fig. 6. Samples of the same health condition gather together in the shape of curves or
circular areas, and apparent boundaries exist between samples of different health
conditions. The visualization of the representations that the sensing part of the DQN agent
learns from raw vibration signals shows that the proposed fault diagnosis method based on DRL can
extract the fault characteristics and distinguish the faults accordingly.
The hydraulic pump data used in case 2 are collected from seven fuel transfer
pumps, as shown in Fig. 7. Two commonly occurring faults, slipper loosening and valve
plate wear, are physically injected into the pumps. The vibration signals of the hydraulic
pumps are acquired by an acceleration sensor from the pumps’ end face, with a
stabilized motor speed of 528 rpm and a 1024 Hz sampling rate. The states of the pumps
are summarized in Table 3. Two samples are collected from each pump, and each
sample contains 1024 points, so that 14 samples are collected in total.
Each collected signal is then re-segmented with a sliding window 512 points long,
yielding 500 new samples per signal. The reorganized
datasets are summarized in Table 3. The samples under three different health states
compose the dataset of case 2, which has 7000 samples to be distinguished. Additionally,
ten-fold cross-validation is conducted to evaluate the performance and generalization
ability of the proposed method.
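The re-segmentation step can be sketched as below. The text states only the window length (512 points), the signal length (1024 points), and the resulting counts; the step of one point and the choice to keep the first 500 windows are assumptions made so that the numbers match, and the function name is hypothetical.

```python
def sliding_windows(signal, window=512, step=1, limit=500):
    """Cut `signal` into overlapping windows, keeping at most `limit` of them."""
    starts = list(range(0, len(signal) - window + 1, step))
    return [signal[s:s + window] for s in starts[:limit]]


signal = list(range(1024))            # stand-in for one 1024-point recording
windows = sliding_windows(signal)
print(len(windows), len(windows[0]))  # prints 500 512
```

With 14 recordings and 500 windows each, this gives the 7000 samples stated for case 2.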
Since ten-fold cross-validation is used on the dataset for validation, ten trials are
conducted where 10% of the samples are randomly selected for testing and 90% of the
samples for training. The performance of our proposed method for case 2 is evaluated
by the average accuracies and the corresponding standard deviations of the ten-fold
cross-validation.
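The evaluation protocol can be sketched as a shuffled ten-fold split followed by the mean and standard deviation of the per-fold accuracies. The function names and the fixed seed are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form is reported.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def mean_and_std(values):
    """Mean and population standard deviation of the per-fold accuracies."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5
```

For the 7000 samples of case 2, each fold would hold 700 samples for testing and leave 6300 for training.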
Table 4 Diagnosis results of pump datasets
1 1 1 0.9997±0.0009
Fig. 9. Two-dimensional t-SNE of the representations in the last hidden layer of the
DQN for the hydraulic pump dataset
As shown in Fig. 9, the dimension of the features extracted by the DQN agent is
reduced by t-SNE for visualization. The scatter points of different
health conditions are clearly separated from each other. However, the scatter points of the
same health condition form several small clusters instead of
a single cluster. This is because, after the sparsity restriction is
added to the SAE inside the DQN, samples of the same health
condition that differ only slightly may activate neurons at different locations. As a result, the features
are relatively far from each other in the high-dimensional space. However, the activated neurons
are all related to the same output; in other words, the samples can still be classified into the
same category.
This study was supported by the Fundamental Research Funds for the Central
Universities (Grant No. YWF-16-BJ-J-18) and the National Natural Science
Foundation of China (Grant Nos. 61803013, 51575021 and 61603016), as well as the
China Postdoctoral Science Foundation (Grant Nos. 2017M610033 and 2018T110030).
References