See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/336186043

Intelligent fault diagnosis for rotating machinery using deep Q-network
based health state classification: A deep reinforcement learning approach

Article  in  Advanced Engineering Informatics · October 2019


DOI: 10.1016/j.aei.2019.100977


All content following this page was uploaded by Yu Ding on 23 November 2020.



Intelligent fault diagnosis for rotating machinery using deep
Q-network based health state classification: a deep
reinforcement learning approach
Yu Ding 1,2, Liang Ma 1,2, Jian Ma 1,2, Laifa Tao 1,2, Yujie Cheng 3, Chen Lu 1,2,*
1 School of Reliability and Systems Engineering, Beihang University, Beijing, 100191, China.
2 Science & Technology on Reliability and Environmental Engineering Laboratory, Beijing, 100191, China.
3 School of Aeronautic Science and Engineering, Beihang University, Beijing, 100191, China.
* Correspondence: luchen@buaa.edu.cn

Abstract
Fault diagnosis methods for rotating machinery have always been a hot research
topic, and artificial intelligence-based approaches have attracted increasing attention
from both researchers and engineers. Among those related studies and methods,
artificial neural networks, especially deep learning-based methods, are widely used to
extract fault features or classify fault features obtained by other signal processing
techniques. Although such methods can solve the fault diagnosis problems of rotating
machinery, two deficiencies remain. (1) Because such methods cannot establish a direct
linear or non-linear mapping between raw data and the corresponding fault modes, their
performance depends heavily on the quality of the extracted features.
(2) The optimization of neural network architecture and parameters, especially for deep
neural networks, requires considerable manual modification and expert experience,
which limits the applicability and generalization of such methods. As a remarkable
breakthrough in artificial intelligence, AlphaGo, a representative achievement of deep
reinforcement learning, provides inspiration and direction for the aforementioned
shortcomings. Combining the advantages of deep learning and reinforcement learning,
deep reinforcement learning is able to build an end-to-end fault diagnosis architecture
that can directly map raw fault data to the corresponding fault modes. Thus, based on
deep reinforcement learning, a novel intelligent diagnosis method is proposed that is
able to overcome the shortcomings of the aforementioned diagnosis methods.
Validation tests of the proposed method are carried out using datasets of two types of
rotating machinery, rolling bearings and hydraulic pumps, which contain a large
number of measured raw vibration signals under different health states and working
conditions. The diagnosis results show that the proposed method is able to obtain
intelligent fault diagnosis agents that can mine the relationships between the raw
vibration signals and fault modes autonomously and effectively. Considering that the
learning process of the proposed method depends only on the replayed memories of the
agent and the overall rewards, which represent much weaker feedback than that
obtained by the supervised learning-based method, the proposed method is promising
in establishing a general fault diagnosis architecture for rotating machinery.
Keywords: fault diagnosis, rotating machinery, deep reinforcement learning, deep Q-
network

1. Introduction
Rotating machinery is currently employed in a wide variety of industrial
applications, including the petroleum, energy and chemical industries. Such
applications require high reliability, safety, and economy of the machines during
operation, as the downtime caused by failure may cause economic loss or even
catastrophic accidents [1]. With the development of reliability-related theories, people
realize that crafting “absolutely reliable” machines is impossible. Thus, recent studies
that aim to monitor, evaluate, diagnose and predict the health state of machinery have
attracted considerable interest [2, 3]. Data-driven signal processing techniques are a
significant research direction in fault diagnosis of rotating machinery, and these
techniques are used to analyse the collected signals and extract useful fault
characteristics. Such methods can adaptively analyse the collected data and obtain
sensitive information for distinguishing the health state of machinery [4–6].
As fault diagnosis can be regarded as a pattern recognition problem, artificial
intelligence (AI) has shown great potential in both academic studies and industrial
applications. Among the AI-based intelligent fault diagnosis methods for rotating
machinery, artificial neural networks (ANNs) have been researched and employed in
fault feature extraction and fault mode classification. Bin et al. [7] utilized the feature
obtained by wavelet packet transformation and empirical mode decomposition as the
input of a three-layer ANN to identify early rotor lateral cracks in rotating machinery.
Rafiee et al. [8] used a multiple layer perceptron (MLP) ANN, which takes the wavelet
packet decomposition results of the collected data as input, for fault detection and
diagnosis of gearboxes. Samanta et al. [9] utilized features extracted from the time
domain to identify the state of bearings and employed ANNs and support vector
machines (SVMs) to classify the bearing faults. In the above approaches, the ANNs
were used as a classifier, which still relied on the features extracted by time-domain and
frequency-domain signal processing methods. There are two major drawbacks when
employing ANN-based fault diagnosis methods. The first is that such methods rely on
manual feature extraction, which largely depends on prior knowledge of signal
processing techniques and expert experience. Second, the shallow architecture of ANNs
lacks enough ability to learn the relationships that show the complex non-linear
characteristics existing in fault diagnosis problems. With the advent and development
of deep learning, such problems have been solved to a certain extent [10]. Deep learning
is a class of machine learning algorithms that can learn multiple levels of abstract
representations through supervised or unsupervised learning [11].
Instead of extracting the fault features manually, deep learning methods are able to
adaptively learn the hierarchical representation from raw data through multiple non-
linear transformations and approximate complex non-linear functions. Inspired by the
successful applications of deep learning in the fields of natural language processing and
image recognition, deep learning-based fault diagnosis methods have attracted
increasing attention [12]. Based on deep neural networks (DNNs), Jia et al. [13]
proposed an intelligent diagnosis method to automatically mine the fault characteristics
from the frequency spectrum of rotating machinery and classify the health conditions.
Lu et al. [14] used a convolutional neural network (CNN), which can be used to
diagnose the faults of rolling bearings, to learn the hierarchical feature representations
of rolling bearings. Liu et al. [15] utilized recurrent neural network (RNN)-based
autoencoders to process the multiple time sequence data collected from rotating
machines. Anomaly detection and fault type classification are realized by evaluating
the reconstruction error between the data at the next time stamp and the
predicted data generated by the RNN-based denoising autoencoders. Shao et al. [16]
proposed a method named the improved convolutional deep belief network (CDBN),
which used compressed sensing (CS) for feature learning and fault diagnosis of rolling
bearings. The proposed method used CS to reduce the amount of vibration data and
CDBN to enhance the feature learning capability for the compressed data.
The literature review shows that ANN-based methods can serve effectively as
basic classifiers [7–9], while deep learning-based methods can realize adaptive fault
feature extraction through DNN architectures and hierarchical representations [13, 14, 16].
Compared with traditional ANN-based methods, deep learning-based methods have
improved automation and autonomy in seeking representations that are strongly
correlated with faults. However, there are still two major deficiencies that need to be
improved. (1) Choosing a proper architecture and optimizing the corresponding hyper-
parameters for a DNN still requires a great deal of prior knowledge and experience,
which limits the generalization ability of such methods. Although deep learning-based
methods can effectively and automatically extract fault features, there are still many
details to be determined for a specific task. (2) The training mechanism of such methods
is mostly based on supervised learning or semi-supervised learning, which means that
the diagnosis algorithms need to be taught to learn the different fault patterns
specifically. Such a mechanism does not conform to the human cognitive mechanism
in the real world, which makes it impossible for the fault diagnosis agents to explore
the problem they face and learn to solve it autonomously. These deficiencies limit the
generalization and intelligence level of deep learning-based fault diagnosis methods.
Therefore, it is necessary to establish a deep reinforcement learning mechanism-based
intelligent architecture to realize autonomous fault diagnosis of rotating machinery.
Deep reinforcement learning (DRL) provides new ideas for further improving the
intelligence of fault diagnosis and has the potential to solve the problems mentioned
above. DRL involves the combination of reinforcement learning and deep learning, and
this combination enables artificial agents to learn their knowledge and experience from
raw data directly. Deep learning gives agents the ability to sense the environment, while
reinforcement learning gives agents the ability to learn the best strategies to deal with
real-world problems [17]. Utilizing DRL algorithms, agents can learn by themselves to
obtain successful strategies that lead to the highest long-term rewards directly from raw
input data without any hand-engineered features or domain heuristics [18]. One of the
most famous and striking achievements of DRL is AlphaGo, which is the first program
to defeat a professional human player in Go [19]. Such success has aroused much
enthusiasm in applying DRL methods in different fields. DRL algorithms have been
applied to a wide range of problems in fields such as robotics, natural language
processing, and computer vision. In robotics, DRL is applied to train robots to learn
control policies directly from camera signals from the real world [20, 21]. Furthermore,
DRL has been used to create agents that can generalize to an environment they have
never seen before, which is considered meta-learning [22, 23]. For the purpose of
creating agents or systems that can learn how to adapt in the real world, DRL algorithms
have been applied not only in managing power consumption [24] and picking and
stowing objects [21] but also in designing novel machine translation models [25]
and constructing new optimization functions [26]. As a general way of solving
optimization problems by trial and error, DRL has already been studied and applied in
many fields of machine learning.
To overcome the deficiencies of the aforementioned existing methods, this paper
proposes a novel intelligent fault diagnosis method based on fault diagnosis agents
trained utilizing DRL. In this method, a fault diagnosis “game” is established, which is
able to provide an interactive environment for the fault diagnosis agent to observe, act
and receive rewards. Then, the fault diagnosis agent is built through a stacked
autoencoder (SAE) and learns to diagnose faults utilizing a deep Q-network (DQN)
combined with experience replay and a separated target network [27]. The highlights
of the proposed method are outlined as follows. (1) Combining the autonomous learning
ability of reinforcement learning and the perception ability of deep learning, the
proposed deep reinforcement learning-based method can realize an end-to-end fault
diagnosis method for rotating machinery. The proposed method greatly reduces the
dependence on prior knowledge and expert experience for fault diagnosis. (2) By
designing a fault diagnosis game environment, the proposed method can successfully
obtain an intelligent fault diagnosis agent using DRL. (3) Benefitting from memory
replay and reward-based feedback mechanisms, the fault diagnosis agent is able to learn
the non-linear mapping relationships between different fault modes and raw vibration
signals based only on weak feedback. This paper is organized as follows. Section 2
briefly introduces the theoretical background of the DQN. Section 3 describes the
proposed DRL-based intelligent fault diagnosis method. Section 4 gives a detailed
description of validating the proposed method using rolling bearing datasets and a
hydraulic pump dataset.

2. A brief introduction to the deep Q-network


Reinforcement learning is a set of goal-oriented algorithms and aims to train
software agents on how to take actions in an environment to maximize the cumulative
reward. Based on reinforcement learning, agents can learn an optimal policy through
trial and error for sequential decision-making problems in a wide range of fields. While
reinforcement learning algorithms have achieved success in a variety of domains, their
applicability is limited to domains with available handcrafted features or low-dimensional
state spaces that are fully observed. On the other hand, the emergence of deep learning
enables automatic feature engineering through DNN architecture and gradient descent,
which reduces the dependence on domain knowledge. Thus, by combining
reinforcement learning and DNNs, DRL is able to solve problems close to the
complexity of the real world. A DQN, one of the most remarkable achievements in DRL,
can learn successful policies directly from high-dimensional sensory inputs using a
DNN to replace the Q-table in original Q-learning [27].

2.1 Q-learning

As a model-free reinforcement learning algorithm, Q-learning can also be
regarded as a method of asynchronous dynamic programming. The Q-learning
algorithm creates a table to calculate the maximum expected future reward for each
state and action. Specifically, the columns are the actions, and the rows are the states.
The value of each cell is the maximum expected future reward for that given state and
action. The table is called a Q-table, where “Q” stands for the quality of the action. Each
Q-table score is the maximum expected future reward that the agent gets as long as the
agent takes the action at the corresponding state. In other words, the agent will always
know the best action to take for each state by searching for the highest Q value in the
row corresponding to the state. The learning process of a Q-learning agent can be
summarized as three steps: 1) the agent tries an action based on the observation of the
current state; 2) the environment gives back immediate feedback for the action, which
includes either reward or penalty; and 3) the algorithm updates the Q-table considering
the reward, action and state using the Bellman equation [28].
The Q-learning algorithm can be used to learn each value of the Q-table through
the action-value function (Q-function). The Q-function is defined as:
$$Q^{\pi}(s_t, a_t) = E\left[R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots \mid s_t, a_t\right] \tag{1}$$
where $Q^{\pi}(s_t, a_t)$ is the Q value for state $s_t$ given action $a_t$, $R_t$ is the reward
of each time step, and $\gamma$ is the discount rate. The Q-function takes the state and action
as inputs and returns their expected future reward. The Q-learning algorithm updates
$Q(s, a)$ using the Bellman equation as follows:

$$Q^{new}(s_t, a_t) = Q(s_t, a_t) + \alpha \left[R_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right] \tag{2}$$
where $Q^{new}(s_t, a_t)$ is the new Q value for state $s_t$ and action $a_t$, $Q(s_t, a_t)$ is
the current Q value, $\alpha$ is the learning rate, $R_{t+1}$ is the reward for taking action $a_t$ at
state $s_t$, and $\max_{a} Q(s_{t+1}, a)$ is the maximum expected future reward given the new
state $s_{t+1}$ and all possible actions.
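As a purely illustrative check of Eq. (2), consider a single update with numbers that are assumed here and not taken from the paper: learning rate 0.1, discount rate 0.9, current Q value 0.5, observed reward 1, and a maximum next-state Q value of 2.

$$Q^{new}(s_t, a_t) = 0.5 + 0.1\,[\,1 + 0.9 \times 2 - 0.5\,] = 0.5 + 0.1 \times 2.3 = 0.73$$

The Q value moves a fraction $\alpha$ of the way from its current estimate (0.5) towards the bootstrapped target (2.8).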


The core training process of the Q-learning algorithm can be summarized as
follows. The first step is initializing the Q-table. Then, the learning process is started
by choosing an action $a_t$ in the current state $s_t$ based on the Q-table. Subsequently,
the action $a_t$ is performed, and the next state $s_{t+1}$ and reward $R_{t+1}$ are observed.
Then, the Q-table is updated by the Bellman equation introduced above. The above
steps are repeated until the training process reaches a predefined terminal condition [29].
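The training steps above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the five-state chain environment, the ε-greedy exploration, the step cap, and all hyperparameter values are assumptions made only to obtain something runnable.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    Action 1 moves right, action 0 moves left; reaching the last state
    gives reward +1 and ends the episode (all of this is assumed).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # step 1: initialize the Q-table
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so that an episode always terminates
            # Choose an action from the Q-table (epsilon-greedy, random tie-breaking).
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            # Perform the action and observe the next state and the reward.
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Bellman update of Eq. (2); no bootstrapping from a terminal state.
            target = r + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q

q_table = q_learning()
```

After training, the row of each non-terminal state should hold a larger value for the "move right" action, which is the greedy policy one would expect on this toy chain.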

2.2 Deep Q-network

A DQN is an end-to-end reinforcement learning agent that
utilizes a DNN to map the relationships between actions and states, playing the role of the
Q-table in Q-learning. DNNs, such as a CNN, stacked sparse autoencoder, and RNN, are
capable of directly learning abstract representations from raw sensory data. The DQN
agent proposed in Ref. [27] utilizes a CNN to capture the local spatial correlations
presented in consecutive game frames. A DQN agent also has to interact with an
environment through a sequence of observations, actions, and rewards, which is similar
to the task that a Q-learning agent faces. Compared with a DQN agent, the most
severe defect of a traditional Q-learning agent lies in the Q-table, which Q-learning uses
to map the relations between actions and states. Typically, the problem to be solved by
a Q-learning agent has a large number of states and actions, which could lead to the curse of
dimensionality. Thus, instead of using a Q-table, the DQN in Ref. [27] uses a deep CNN
to approximate the optimal Q-function.
Furthermore, reinforcement learning is known to be unstable or even to diverge when
the Q-function is represented by a non-linear function approximator, such as a neural
network. The reasons for the instability include the correlations existing in the sequence
of observations, the fact that small updates to Q values may cause a severe change in the
agent’s policy, and the correlations between the Q values and the target values
$R_{t+1} + \gamma \max_{a} Q(s_{t+1}, a)$. Thus, two strategies called experience replay and
iterative update are introduced to overcome such deficiencies. Experience replay addresses
the problem by eliminating the correlations in the observation sequences and smoothing
over changes in the data distribution through randomizing over the data. Iterative update
reduces the correlations between the Q values and the target values by adjusting
the Q values towards target values that are only periodically updated. Firstly, experience replay stores the agent’s
experience at each time step to form a collection named memory containing a certain
number of experiences. A single experience $e_t$ at time step $t$ is defined as
$e_t = (s_t, a_t, r_t, s_{t+1})$. The memory at time step $t$ is defined as $D_t$, where $D_t = \{e_1, \ldots, e_t\}$.
Then, experience replay randomly draws experiences from the memory when updating
the DQN agent [27].
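The replay mechanism described above can be illustrated with a small buffer class. This is a minimal sketch under assumptions: the class name `ReplayMemory`, its method names, and the fixed-capacity behaviour (oldest experiences are dropped first) are illustrative choices, not details given in the paper.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity store of experiences e_t = (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently discards the oldest experience when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive experiences.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=3, seed=0)
for t in range(5):
    memory.store(t, 0, 1.0, t + 1)
minibatch = memory.sample(2)
```

Because the capacity is 3, only the three most recent experiences remain after five insertions, and the minibatch is drawn from those.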
The performance of the DQN agent is evaluated by teaching it to play a series of
Atari games. The agent observes the states of games by directly taking the raw game
frames as input. Furthermore, the agent can observe only several frames at a time, which
means that the task is partially observed. Frequently, reinforcement learning algorithms
use the Bellman equation as an iterative update to estimate the action-value function.
Because this basic approach estimates the value of each state-action pair separately, without
any generalization, it is impractical, and the action-value function is commonly
estimated using a function approximator. The DQN uses a CNN with weights $\theta$ as the
function approximator, which is referred to as a Q-network. The Q-network can be trained
by updating the parameters $\theta_i$ at iteration $i$ to reduce the mean-squared error in
the Bellman equation. Thus, the loss function $L_i(\theta_i)$, which changes at each iteration
$i$, is defined as:
$$L_i(\theta_i) = E_{s,a,r}\left[\left(E_{s'}[y \mid s, a] - Q(s, a; \theta_i)\right)^{2}\right] \tag{3}$$
where $y = r + \gamma \max_{a'} Q(s', a'; \theta_i^{-})$ is the target value computed with the
parameters $\theta_i^{-}$ of the target network.
Differentiating the loss function with respect to the weights results in the following
form:
$$\nabla_{\theta_i} L(\theta_i) = E_{s,a,r,s'}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right] \tag{4}$$

With the loss function and its gradient defined, the Q-network can be optimized by
gradient-based methods. Additionally, the Q-learning algorithm can be recovered in this
framework by updating the weights after every time step, replacing the expectations
using single samples, and setting $\theta_i^{-} = \theta_{i-1}$.
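To make the update of Eq. (4) concrete, the following sketch applies one minibatch step to a linear Q-function in NumPy. It is an illustration under assumptions, not the network of Ref. [27]: the linear model (one weight row per action), the function names, and the learning rate are all hypothetical.

```python
import numpy as np

def q_values(theta, s):
    """Linear Q-function (assumed for illustration): Q(s, a; theta) = theta[a] @ s."""
    return theta @ s

def dqn_step(theta, theta_target, batch, gamma=0.9, lr=0.05):
    """Move theta along (y - Q) * grad Q, the quantity inside Eq. (4).

    Stepping in this direction decreases the squared difference between the
    target y and the current estimate Q(s, a; theta).
    """
    direction = np.zeros_like(theta)
    for s, a, r, s_next in batch:
        # The target uses the frozen parameters theta^- of the target network.
        y = r + gamma * np.max(q_values(theta_target, s_next))
        td_error = y - q_values(theta, s)[a]
        # For a linear model, the gradient of Q(s, a) with respect to theta[a] is s.
        direction[a] += td_error * s
    return theta + lr * direction / len(batch)

theta = np.zeros((2, 3))              # 2 actions, 3 state features
theta_target = theta.copy()           # frozen copy, refreshed only periodically
state = np.array([1.0, 0.0, 0.0])
theta = dqn_step(theta, theta_target, [(state, 1, 1.0, state)])
```

Keeping `theta_target` fixed while `theta` changes is what decouples the target from the estimate being updated; copying `theta` into `theta_target` at every step would recover the plain Q-learning setting mentioned above.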
In the training process of the DQN, two modifications of Q-learning are made to
ensure that the training process of DNNs does not diverge. The first is the experience
replay in which the agent’s experiences are stored into a replay memory by stacking the
state, action, and reward of the current time step and the state of the subsequent time
step. Experience replay is an effective technique for avoiding oscillations or
divergence in the parameters, and it smooths out learning by enabling the agent to reuse
its experience in the learning process. The second modification to Q-learning is to use
a separate network for generating the targets in the Q-learning updating process, and
this modification can significantly improve the stability of the DQN. Such methods can
add a delay between the moment of updating the Q value and the corresponding
influence that the update might cause, which reduces the likelihood of divergence or
oscillations existing in the parameters of DNNs.

3. The DQN-based end-to-end fault diagnosis method


Reinforcement learning is generally studied in areas such as game theory, control
theory, operations research, and multi-agent systems. Fault diagnosis is generally
considered a classification problem that is solved by supervised learning. Supervised
learning-based fault diagnosis methods need to take both fault data and the
corresponding labels simultaneously as inputs to distinguish fault modes. For
reinforcement learning-based methods, the learning performance of fault diagnosis
agents is evaluated by the overall rewards, which provide much weaker feedback for the
agents to adjust their performance compared with that used in supervised learning
methods. In other words, during the learning process, the agents cannot tell
which of the input samples have not been learned well. Such a
mechanism compels the agents to discover the intrinsic differences among the fault
modes, which makes the agent more robust.
To realize the fault diagnosis method based on DRL, we designed a fault diagnosis
simulation environment by which we can convert the supervised learning problem to a
reinforcement learning problem. This environment could be considered a “fault
diagnosis game.” Each game contains a certain number of fault diagnosis questions,
and each question consists of a fault sample and the corresponding fault label. When
the agent is playing the game, the game presents one question at a time for the agent to
diagnose. Then, the game will check whether the agent’s answer is correct. The reward
mechanism of the game is one point for the correct answer and minus one point for a
wrong answer. When the game is over, the agent will receive a total score for the game.
The flow of the fault diagnosis game is shown in Fig. 1.
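The game described above can be sketched as a small environment class. This is a hypothetical illustration, not the authors' implementation: the class name `FaultDiagnosisGame`, the `reset`/`step` interface, and the uniform random choice of the next question are assumptions; only the +1/−1 reward and the per-game total score come from the text.

```python
import random

class FaultDiagnosisGame:
    """Hypothetical environment: each step poses one labelled sample as a question."""

    def __init__(self, samples, labels, questions_per_game, seed=None):
        assert len(samples) == len(labels) and len(samples) > 0
        self.samples = samples
        self.labels = labels
        self.questions_per_game = questions_per_game
        self.rng = random.Random(seed)

    def _draw(self):
        # Assumption: questions are drawn uniformly at random from the dataset.
        self.current = self.rng.randrange(len(self.samples))

    def reset(self):
        """Start a new game and return the first question (a fault sample)."""
        self.asked = 0
        self.score = 0
        self._draw()
        return self.samples[self.current]

    def step(self, action):
        """Check the answer, update the score, and pose the next question."""
        reward = 1 if action == self.labels[self.current] else -1
        self.score += reward
        self.asked += 1
        done = self.asked >= self.questions_per_game
        self._draw()
        return self.samples[self.current], reward, done

# Toy data in which the single feature equals the label, so an oracle is trivial.
game = FaultDiagnosisGame([[0.0], [1.0]], [0, 1], questions_per_game=5, seed=0)
obs, done = game.reset(), False
while not done:
    obs, reward, done = game.step(int(obs[0]))
total_score = game.score
```

Note that the agent never sees the label itself; it only receives the reward, which is the weak feedback discussed in the preceding paragraphs.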
Fig. 1. The fault diagnosis game flow
The vibration signal of rotating machinery contains sufficient fault information.
Effective perception of the environment and state by the agent is a critical element
in reinforcement learning. For original reinforcement learning methods such as
Q-learning, it is difficult to differentiate and identify the states of rotating machinery
based only on the vibration signals due to the limited ability of those methods to
represent states. Inspired by the DQN’s idea in processing game screens, we use an
SAE DNN in our method to sense the fault information existing in vibration signals.
Recently, SAEs have shown great potential in extracting fault features of rolling
bearings from raw vibration signals. In our method, the DQN agent is built by
sequentially stacking the pretrained encoders, initializing the parameters using the
weights stored during pretraining of the SAE, and adding a linear layer to map the output
of the SAE to the Q values. Unlike CNN-based feature extraction methods, which use
convolutional and subsampling operations to learn a set of locally connected neurons,
an SAE employs a fully connected model for level-by-level intrinsic feature learning.
There are two significant reasons why an SAE can effectively extract fault features.
The first is that an SAE can learn the representations of the original signal through
layer-by-layer self-learning. The second reason is that the sparse constraint can prevent
overfitting by limiting each autoencoder’s parameter space. The SAE used in our work
has two hidden layers between the input layer and the layer that outputs the Q values.
Notably, the architecture of the SAE remains the same in the cases of this study. The
activation function of each encoder is a rectified linear unit (ReLU) [30], and the
regularization method applied to the encoders is L1 regularization [31], which means
the absolute values of parameters in the neural network are penalized if they are too
large. Before training our DRL method, the pre-training method is applied to the
autoencoders on all of the training data samples. Then, the encoders of the autoencoders
are extracted, and the perception part of our agent is built by stacking the input layer,
all the trained encoders, and the fully connected layer used to output the Q values.
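The way the perception part is assembled can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the encoder weights below are random stand-ins for pretrained ones, the helper names are hypothetical, and the sizes (400-point input, 128 and 32 hidden units, 10 outputs) are taken from the rolling bearing case described later in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def build_q_network(encoders, n_actions, rng):
    """Stack encoders (list of (W, b) pairs) and append a linear Q-value layer."""
    hidden = encoders[-1][0].shape[1]
    w_out = rng.normal(0.0, 0.01, size=(hidden, n_actions))
    b_out = np.zeros(n_actions)
    return list(encoders) + [(w_out, b_out)]

def q_forward(layers, x):
    """ReLU encoders followed by a linear output layer that yields the Q values."""
    h = x
    for w, b in layers[:-1]:
        h = relu(h @ w + b)
    w_out, b_out = layers[-1]
    return h @ w_out + b_out

def l1_penalty(layers, lam=1e-4):
    """L1 regularization term on the encoder weights (lam is an assumed value)."""
    return lam * sum(np.abs(w).sum() for w, _ in layers[:-1])

rng = np.random.default_rng(0)
# Random stand-ins for the weights that pretraining would provide.
encoders = [(0.05 * rng.normal(size=(400, 128)), np.zeros(128)),
            (0.05 * rng.normal(size=(128, 32)), np.zeros(32))]
layers = build_q_network(encoders, n_actions=10, rng=rng)
q = q_forward(layers, rng.normal(size=(4, 400)))  # a batch of 4 raw samples
```

Each row of `q` holds one Q value per candidate health state for the corresponding input sample.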
The original DQN in Ref. [27] takes advantage of the games on the Atari 2600
platform to evaluate the performance of the agent. For an agent that is attempting to
master a game, a single observed state contains several game screens. Thus, the
observed states from a game are not independent and identically distributed. For fault
diagnosis of rotating machinery, due to the periodicity of the vibration signals, the
samples can be regarded as independent and identically distributed. Considering the
differences above, the observation of the agent in this study is composed of a single
fault sample. Moreover, the discount factor γ in the Bellman equation is set to zero, which
means that only the immediate reward is considered and the expected future reward of the
next state is ignored.
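A short worked consequence of setting the discount (decay) factor to zero is given below. It is an illustration under two assumptions: each sample has exactly one correct label, and the rewards are the ±1 scores of the game described above.

$$y_t = r_t + \gamma \max_{a} Q(s_{t+1}, a) \;\xrightarrow{\;\gamma = 0\;}\; y_t = r_t, \qquad Q^{*}(s, a) = E[r \mid s, a] = \begin{cases} +1, & a \text{ is the correct label of } s \\ -1, & \text{otherwise} \end{cases}$$

Under these assumptions, the greedy action $\arg\max_{a} Q^{*}(s, a)$ coincides with the correct label, so choosing the largest Q value amounts to a classification decision.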
The algorithm for training the SAE-based DQN is illustrated in Algorithm 1. The
training process consists of two separate parts. The first is the pre-training of the SAE,
and the second is the training of the DQN agent. The agent selects and executes actions
according to the ε-greedy policy, which means that a random action is selected with
probability ε, and the action corresponding to the maximum Q value is selected with
probability 1 − ε [28]. The optimization algorithm used in this study is adaptive moment
estimation (Adam). Adam is an algorithm that can be used instead of the classical
stochastic gradient descent procedure to update network weights iteratively based on
training data [32].
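The ε-greedy rule can be written in a few lines. This is a minimal sketch; the function name and the optional `rng` argument are illustrative choices.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q value (first index on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is purely random; values in between trade exploration against exploitation.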

Algorithm 1: Training of the SAE-based DQN.

Initialize replay memory D to capacity N;
Initialize the feature extraction SAE network with random weights $\theta_1$;
For i = 1, I do
    Train the ith autoencoder;
    Save the parameters of the ith encoder;
End For
Initialize the action-value network Q with the parameters of the encoders, except for
the input layer and the output layer;
Initialize the target action-value function Q’ with the same parameters as those of Q;
For episode = 1, M do
    Initialize the observation sequence $s_1$ by outputting a fault sample randomly by
    the emulator;
    For t = 1, T do
        With probability $\varepsilon$, select a random action $a_t$,
        otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$;
        Execute action $a_t$ in the emulator and observe reward $r_t$;
        Generate the next state $s_{t+1}$ randomly by the emulator;
        Store memory $(s_t, a_t, r_t, s_{t+1})$ in D;
        Sample random minibatch of memory $(s_j, a_j, r_j, s_{j+1})$ from D;
        Perform a gradient descent step using Adam on $(r_j - Q(s_j, a_j; \theta))^{2}$ with
        respect to the network parameters $\theta$;
        Every C steps, reset Q’ = Q;
    End For
End For
*minibatch: The training data is divided into several batches, and the network’s
parameters are updated using each batch. Each batch is called a “minibatch” [33].
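The second loop of Algorithm 1 can be sketched as a compact NumPy routine. This is a simplified illustration with several substitutions that are assumptions, not the authors' setup: a single linear layer stands in for the SAE-based Q-network, plain gradient steps replace Adam, the SAE pretraining loop is omitted, and all hyperparameter values are arbitrary. Because the discount factor is zero, the next state is not needed for the update and is therefore not stored.

```python
import numpy as np

def train_agent(x, labels, n_actions, episodes=50, steps=64, batch=32,
                capacity=512, epsilon=0.1, lr=0.05, sync_every=50, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((x.shape[1], n_actions))  # linear stand-in for the Q-network
    theta_target = theta.copy()                # Q'; does not enter the loss when gamma = 0
    memory = []                                # replay memory of (sample index, action, reward)
    step_count = 0
    for _ in range(episodes):
        i = rng.integers(len(x))               # initial observation: a random fault sample
        for _ in range(steps):
            # Epsilon-greedy action selection on the current Q values.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(x[i] @ theta))
            r = 1.0 if a == labels[i] else -1.0  # reward of the diagnosis game
            memory.append((i, a, r))
            if len(memory) > capacity:
                memory.pop(0)                    # drop the oldest experience
            # Sample a random minibatch and step on (r_j - Q(s_j, a_j))^2.
            idx = rng.integers(len(memory), size=min(batch, len(memory)))
            for k in idx:
                j, aj, rj = memory[k]
                q = x[j] @ theta[:, aj]
                theta[:, aj] += lr * (rj - q) * x[j] / len(idx)
            step_count += 1
            if step_count % sync_every == 0:
                theta_target = theta.copy()      # "Every C steps, reset Q' = Q"
            i = rng.integers(len(x))             # next state drawn at random
    return theta

# Toy usage: two one-hot "signals", one per health state.
x_toy = np.eye(2)
y_toy = np.array([0, 1])
theta_toy = train_agent(x_toy, y_toy, n_actions=2, episodes=20)
predicted = np.argmax(x_toy @ theta_toy, axis=1)
```

On this toy problem the Q value of the correct action is pushed towards +1 and that of the wrong action towards −1, so the greedy prediction recovers the labels even though the labels are never shown to the agent directly.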

Thus far, although the training process of the agent is completed, the agent cannot
yet be used to diagnose rolling bearing faults. The agent can output only the Q values
of actions, instead of giving out the diagnosis result. Thus, the output layer of the agent
needs to be modified. This modification determines the maximum value of the Q values
and then outputs the corresponding action. Thus, an agent that is able to diagnose the
faults of rotating machinery is obtained. Notably, in both the training and testing
processes, the input data are the original vibration signals. In other words, an end-to-
end fault diagnosis method using SAE-based DRL is established. The complete method
flowchart is shown in Fig. 2.
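The output-layer modification amounts to taking the greedy action over the Q values. A minimal sketch follows; the function name is hypothetical, and the alias list uses the ten health states of the rolling bearing case (Table 1) purely as an example.

```python
import numpy as np

# Health-state aliases in label order 1..10, as listed in Table 1.
ALIASES = ["Normal", "R_007", "R_014", "R_021", "I_007",
           "I_014", "I_021", "O_007", "O_014", "O_021"]

def diagnose(q_values):
    """Map one row (or a batch of rows) of Q values to health-state aliases."""
    actions = np.argmax(np.atleast_2d(q_values), axis=1)
    return [ALIASES[a] for a in actions]

q_row = np.zeros(10)
q_row[4] = 1.0          # the fifth action has the largest Q value
result = diagnose(q_row)
```

No exploration is used at this stage: the diagnosis is simply the action with the largest Q value.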
Fig. 2. Flowchart of the proposed method

4. Fault diagnosis using the proposed method

Rolling bearings and hydraulic pumps, two representative research objects of
rotating machinery, are crucial for the safety of industrial systems. Due to severe
working conditions, they are prone to suffer different kinds of damage, which could
lead to a breakdown or even catastrophic failure. In this section, two fault diagnosis
cases for rotating machinery, covering rolling bearings and hydraulic pumps, are used to
illustrate the effectiveness of the proposed method.

4.1 Case 1: Fault diagnosis of rolling bearings

4.1.1 Data description

The fault data of rolling bearings used in our study were collected by Case Western
Reserve University (CWRU) [34]. The dataset consists of ball bearing test data for both
normal and faulty bearings. The test rig is driven by a two-horsepower Reliance Electric
motor, and the vibration data are acceleration data measured near the motor bearings
at the drive end (DE) with a sampling frequency of 48 kHz. The experimental equipment for rolling
bearing fault injection is shown in Fig. 3. Data under four conditions were collected,
including one normal condition (N) and three fault conditions. The fault conditions are
outer race fault (OF), inner race fault (IF) and roller fault (RF). In addition, the faults
introduced to the rolling bearings were all single point faults. Moreover, the severity of
the faults differed, with fault diameters of 0.007 inches, 0.014 inches, 0.021
inches, and 0.028 inches.
To validate the proposed method, the vibration data are divided into four sets (S1,
S2, S3, and S4) according to loads. Each set contains ten fault modes, which are divided
according to fault diameter and fault location. S1, S2, and S3 contain all of the fault
modes under loads of one, two and three horsepower, respectively. There are 12000
samples for each fault set, and each sample contains 400 data points. The datasets are
all raw vibration signals without preprocessing. S4, which is the union set of S1, S2,
and S3, has 36000 samples under all of the loads. The detailed information is shown in
Table 1.
Table 1 Description of the rolling bearing fault datasets
Datasets  Load (horsepower)  Number of samples  Fault type  Fault diameter (inch)  Alias  Label
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 N 0 Normal 1
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 RF 0.007 R_007 2
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 RF 0.014 R_014 3
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 RF 0.021 R_021 4
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 IF 0.007 I_007 5
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 IF 0.014 I_014 6
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 IF 0.021 I_021 7
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 OF 0.007 O_007 8
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 OF 0.014 O_014 9
S1/S2/S3/S4 1/2/3/1-3 12000/12000/12000/36000 OF 0.021 O_021 10
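
The label assignment in Table 1 can be sketched as a small lookup structure. The snippet below is only an illustration of how the ten aliases map to labels 1-10; the function name `build_label_map` and the constants are hypothetical and are not taken from the original implementation.

```python
# Alias prefixes and diameter codes as they appear in Table 1.
FAULT_PREFIXES = ["R", "I", "O"]        # roller, inner race, outer race
DIAMETER_CODES = ["007", "014", "021"]  # 0.007, 0.014, 0.021 inch

def build_label_map():
    """Map each alias in Table 1 to its integer label (hypothetical helper)."""
    labels = {"Normal": 1}
    for prefix in FAULT_PREFIXES:
        for code in DIAMETER_CODES:
            # Labels are assigned consecutively in the row order of Table 1.
            labels[f"{prefix}_{code}"] = len(labels) + 1
    return labels

label_map = build_label_map()
```

With this mapping, dataset S4 simply reuses the same ten labels for the samples pooled from S1, S2, and S3.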

Fig. 3. Experimental equipment for rolling bearing fault injection

4.1.2 Diagnosis results

In the training process of the proposed method, the SAE network has four layers,
including an input layer, two trained encoders, and an output layer. The dimension of
the input samples determines the unit number of the input layer, the unit number of the
first encoder and the second encoder are 128 and 32, respectively, and the dimension of
Q values determines the unit number of the output layer. The activation functions used
in the encoders are ReLUs, and the network is optimized with Adam. Moreover, the
L1 regularization restriction, which can prevent overfitting and improve the training
efficiency, is also utilized to restrict the parameter space of the neural units’ weights.
The maximum training epoch of each autoencoder is 200, the training epoch of the
DQN agent is 2000, and each epoch contains 512 rounds of the diagnosis game. The
replay memory size of our DQN agent is 64. Samples randomly selected from the replay
memory are used as a minibatch to train the DQN agent; thus, the minibatch size in
this study is 64. To test and validate the performance of the proposed
method, ten-fold cross-validation is applied to each of the four datasets. To monitor the
learning performance of our DQN agent, the mean values of the reward are calculated
for every ten rounds. Fig. 4 shows the progress curve of the DQN agent in a training
process. Moreover, the diagnosis results are summarized in Fig. 5.
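
As a rough illustration of the architecture described above, the following sketch runs one forward pass through a 400-128-32-10 network in plain Python. The input size of 400 and the output size of 10 are inferred from the sample length and the number of health states in case 1. The random weight initialization and the helper names (`init_layer`, `dense`, `forward`) are assumptions made for illustration only; autoencoder pre-training, the L1 penalty, and Adam updates are not shown.

```python
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (illustrative only)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, relu):
    """Affine map followed by an optional ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out

def forward(x, layers):
    """Return (features, q_values) for one input sample."""
    h1 = dense(x, layers[0], relu=True)                # first encoder, 128 units
    features = dense(h1, layers[1], relu=True)         # second encoder, 32 units
    q_values = dense(features, layers[2], relu=False)  # linear Q-value output
    return features, q_values

rng = random.Random(0)
sizes = [400, 128, 32, 10]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]  # stand-in for a raw vibration sample
features, q_values = forward(sample, layers)
action = max(range(len(q_values)), key=q_values.__getitem__)  # greedy diagnosis index
```

The index of the largest Q value is taken as the diagnosed health state, and the 32-dimensional `features` vector corresponds to the second encoder's output.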

Fig. 4. The agent’s score in the training process of fault diagnosis for rolling
bearing datasets

The fault diagnosis agent scores one point for a correct answer and negative one
point for a wrong answer, which means that the total score in a game of 512 rounds is
between -512 and 512 points. At the beginning of the game, the DQN agent is in the
stage of random guessing because of the lack of any prior knowledge and experience
as references. As the fault diagnosis result has ten categories, the agent has a probability
of only 0.1 to obtain the correct answer. Thus, in the beginning, the scores are
approximately -460 points. As the training process goes on, the agent learns from its
own historical experience and the change in scores. As shown in Fig. 4, the score
increases with training and reaches nearly 512 points when approaching the final
epochs. It is worth noting that the scores mentioned above are the same as the rewards,
which are used as feedback for evaluating the performance of the agent. The
convergence of the DQN agent can be observed through the total reward of each
epoch, because the diagnosis accuracy is directly reflected by the total reward.
When the total reward is high enough and starts to oscillate, the training
process of the DQN agent can be considered finished.
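
The scoring rule of the diagnosis game can be sketched as follows. This is a minimal illustration, not the original environment: the function names and the synthetic labels are hypothetical. Under the assumption of uniformly random guessing over ten classes, the expected score of a 512-round game is 512 × (0.1 − 0.9) = −409.6. This idealized baseline is not a value read from Fig. 4 and need not coincide with the starting scores observed there.

```python
import random

ROUNDS = 512  # rounds per diagnosis game, as stated in the text

def reward(predicted, actual):
    """+1 for a correct diagnosis, -1 for a wrong one."""
    return 1 if predicted == actual else -1

def play_game(policy, labels, rounds=ROUNDS):
    """Return the per-round rewards of one game; policy maps a round index to a label."""
    return [reward(policy(i), labels[i]) for i in range(rounds)]

def expected_random_score(n_classes, rounds=ROUNDS):
    """Expected total score under uniform random guessing (an idealized assumption)."""
    p = 1.0 / n_classes
    return rounds * (2.0 * p - 1.0)

def mean_reward_every(rewards, window=10):
    """Mean reward over consecutive windows, as used to monitor training."""
    chunks = [rewards[i:i + window] for i in range(0, len(rewards), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

rng = random.Random(0)
labels = [rng.randint(1, 10) for _ in range(ROUNDS)]    # synthetic ground truth
perfect = play_game(lambda i: labels[i], labels)        # always correct
always_wrong = play_game(lambda i: 0, labels)           # label 0 never occurs
curve = mean_reward_every(perfect)
```

A perfect agent collects 512 points and an always-wrong agent collects -512, which are the bounds quoted above.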

Fig. 5. Diagnosis results of ten-fold cross-validation of rolling bearing datasets using the
proposed method and the SAE-Softmax method: (a)-(d) results of datasets S1, S2, S3
and S4 using the proposed method, (e)-(h) results of datasets S1, S2, S3 and S4 using the
SAE-Softmax method.
Fig. 5 illustrates that the trained agent can effectively diagnose faults under various
conditions. The quantitative results of ten-fold cross-validation are shown in Table 2.
To illustrate the performance and effectiveness of the proposed method, we also process
the same datasets by ten-fold cross-validation using an SAE model that is trained based
on supervised learning. The input layer and the two autoencoders are the same as the
corresponding layers of the DQN agent. The output layer has ten nodes and uses the
softmax function to output classified results. The training process of the model contains
layer-by-layer pre-training of the autoencoders and fine-tuning of the entire SAE model.
Overall, both methods achieved a good level of diagnostic accuracy and stability. On
the training sets, the proposed method can distinguish the samples correctly with nearly
100% accuracy, and the SAE-Softmax method can distinguish all the samples correctly.
On the testing sets, the two methods are very close in terms of their accuracies and the
corresponding standard deviations. The absolute values of the differences between the
two methods for datasets S1, S2, S3, and S4 are 0.0069, 0.012, 0.0126, and 0.0075,
respectively. Considering that the supervised learning-based method has the advantage
of being given the relationships between the samples and their corresponding labels directly,
this result illustrates the effectiveness of the proposed method. In addition, the
maximum standard deviation of the testing accuracies is 0.0144, which indicates that
the proposed method can stably classify the fault samples. Specifically, the situation of
dataset S4, which contains raw vibration signals of ten health conditions under three
different loads, is the most complicated of the four datasets. The proposed method can
distinguish the fault modes regardless of the working conditions. Interestingly, the
SAE-Softmax method performs better than the proposed method in all of the experiments
except for the average accuracy of the testing on dataset S4, which is the most
challenging experiment in case 1. In summary, the results show that the proposed
method can train a DNN-based agent to learn how to diagnose the faults of rolling
bearings using only the raw vibration signals, regardless of the fault categories,
severities, and working conditions.
Table 2 Diagnosis results of the rolling bearing datasets

Datasets     The proposed method                    The SAE-Softmax based method
             Training accuracy   Testing accuracy   Training accuracy   Testing accuracy
Dataset S1   0.9984±0.0004       0.9358±0.0099      1                   0.9427±0.0051
Dataset S2   0.9985±0.0005       0.9043±0.0110      1                   0.9163±0.0101
Dataset S3   0.9994±0.0001       0.9187±0.0143      1                   0.9313±0.0079
Dataset S4   0.9883±0.0083       0.9408±0.0144      1                   0.9333±0.0040

The result format: average accuracy ± standard deviation.
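
The differences quoted in the text can be re-derived from the testing accuracies in Table 2. The short check below is only a worked recomputation; the dictionary layout and variable names are illustrative.

```python
# Testing accuracies (mean, standard deviation) copied from Table 2.
table2 = {
    "S1": {"proposed": (0.9358, 0.0099), "sae_softmax": (0.9427, 0.0051)},
    "S2": {"proposed": (0.9043, 0.0110), "sae_softmax": (0.9163, 0.0101)},
    "S3": {"proposed": (0.9187, 0.0143), "sae_softmax": (0.9313, 0.0079)},
    "S4": {"proposed": (0.9408, 0.0144), "sae_softmax": (0.9333, 0.0040)},
}

# Absolute difference between the two methods' mean testing accuracies.
diffs = {
    name: round(abs(row["proposed"][0] - row["sae_softmax"][0]), 4)
    for name, row in table2.items()
}

# Largest standard deviation of the proposed method's testing accuracy.
max_std = max(row["proposed"][1] for row in table2.values())

# Datasets on which the proposed method has the higher mean testing accuracy.
proposed_wins = [n for n, r in table2.items() if r["proposed"][0] > r["sae_softmax"][0]]
```

The recomputed values agree with the reported differences of 0.0069, 0.012, 0.0126, and 0.0075, with the maximum standard deviation of 0.0144, and with S4 being the only dataset on which the proposed method has the higher mean testing accuracy.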

Fig. 6. Two-dimensional t-SNE of the representations in the last hidden layer of the DQN for
the rolling bearing datasets

The SAE model used in our method perceives the raw vibration signals and is the
front end of our proposed end-to-end fault diagnosis method. To further understand how
the agent learns to diagnose the faults and validate the agent’s ability to mine the fault
characteristics, t-distributed stochastic neighbor embedding (t-SNE) is utilized to
visualize the output of SAE models inside the DQN agent. The second encoder’s output
is considered as the features that our agent learned, as the output layer of the DQN agent
is a linear layer which outputs the Q values. Thus, the dimensionality of a feature vector
extracted from a single sample is 32 in case 1. The t-SNE algorithm is employed on the
high-dimensional features, and the visualization results of datasets S1-S4 are shown in
Fig. 6. Samples of the same health condition gather together in the shape of curves or
circular areas, and apparent boundaries exist between samples of different health
conditions. This visualization of the representations of raw vibration signals obtained
by the sensing part of the DQN agent indicates that the proposed DRL-based fault
diagnosis method can extract the fault characteristics and distinguish the faults accordingly.

4.2 Case 2: Fault diagnosis of hydraulic pumps

4.2.1 Data description

The hydraulic pump data used in case 2 are collected from 7 fuel transferring
pumps, as shown in Fig. 7. Two commonly occurring faults, slipper loosing and valve
plate wear, are physically injected into the pumps. The vibration signals of the hydraulic
pumps are acquired by an acceleration sensor from the pumps’ end face, with a
stabilized motor speed of 528 rpm and a 1024 Hz sampling rate. The states of the pumps
are summarized in Table 3. Two samples are collected from each pump, and each
sample contains 1024 points, so that 14 samples are collected in total.
Each sample of the sensing signals is rearranged into 500 sets of data using a
sliding window with a length of 512 points. The reorganized
datasets are summarized in Table 3. The samples under three different health states
compose the dataset of case 2, which has 7000 samples to be distinguished. Additionally,
ten-fold cross-validation is conducted to evaluate the performance and generalization
ability of the proposed method.
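
The sliding-window rearrangement described above can be sketched as follows. The text gives the window length (512 points) and the number of sets per sample (500) but not the step between windows, so the stride of one point used here is an assumption, as is the helper name `sliding_windows`.

```python
def sliding_windows(signal, window=512, n_segments=500, stride=1):
    """Cut a signal into overlapping segments (stride is an assumed value)."""
    max_segments = (len(signal) - window) // stride + 1
    if n_segments > max_segments:
        raise ValueError("signal too short for the requested number of segments")
    return [signal[s * stride:s * stride + window] for s in range(n_segments)]

# Stand-in for one 1024-point vibration sample from a pump.
raw_sample = list(range(1024))
segments = sliding_windows(raw_sample)

# 7 pumps x 2 samples per pump x 500 segments per sample.
total_sets = 7 * 2 * len(segments)
```

With these settings each pump contributes 1000 sets and the whole dataset contains 7000 sets, matching the counts given in the text.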

Fig. 7. Experimental hydraulic pump for fault injection


Table 3 Description of the pump datasets
Pump ID 1 2 3 4 5 6 7
Fault State Slipper Loosing Valve Plate Wear Normal State
Number of datasets 1000 sets 1000 sets 1000 sets 1000 sets 1000 sets 1000 sets 1000 sets

4.2.2 Diagnosis results

Since ten-fold cross-validation is used on the dataset for validation, ten trials are
conducted where 10% of the samples are randomly selected for testing and 90% of the
samples for training. The performance of our proposed method for case 2 is evaluated
by the average accuracies and the corresponding standard deviations of the ten-fold
cross-validation.
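
A ten-fold split of the 7000 pump samples can be sketched as below. This is a generic k-fold partition written for illustration; the function name, the fixed seed, and the shuffling scheme are assumptions rather than details of the original experiments.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds take one extra sample when k does not divide n.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop

folds = list(k_fold_indices(7000))
```

Each of the ten trials then tests on 700 samples (10%) and trains on the remaining 6300 (90%), and the accuracy is reported as the mean and standard deviation over the trials.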
Table 4 Diagnosis results of pump datasets

The proposed method                     The SAE-Softmax based method
Training accuracy   Testing accuracy    Training accuracy   Testing accuracy
1                   1                   1                   0.9997±0.0009

The result format: average accuracy ± standard deviation.


Fig. 8. The agent’s score in the training process of fault diagnosis for the
hydraulic pump datasets
The trend of the agent’s mean reward for every ten iterations is shown in Fig. 8.
As case 2 contains three health states of the pumps, an untrained agent guesses the
correct answer with a probability of one-third. Thus, the score is approximately -314 at the beginning, and it increases
towards 512 points as the training process progresses. The average accuracies and the
corresponding standard deviations of ten-fold cross-validation are shown in Table 4.
The average accuracies of both training and testing are 100%, which means that the
proposed method can classify the samples of each fault mode correctly. As in case
1, an SAE-Softmax model is also used to diagnose the faults, and its diagnosis result
is slightly worse than that of the proposed method. Considering the safety hazard of injecting
heavy faults into the core components of the hydraulic pump, the variety of fault modes
and severities is not as complicated as in case 1. However, the signals are collected from
7 different pumps, and the sensors’ sample rate is much lower than that in case 1. Thus,
case 2 illustrates the effectiveness of the proposed method from three perspectives:
multiple devices, relatively few samples and rotating machinery fault diagnosis. The
results show that both the proposed method and the SAE-Softmax-based method can
effectively classify the states of different hydraulic pumps using the limited duration of
vibration signals. Such a result shows the proposed method’s ability to mine the features
related to the fault modes from the raw sensing signals. Likewise, to visualize the
features that our method autonomously learned from the raw data, t-SNE is utilized to
present the learned low-dimensional representation of the DQN agent.

Fig. 9. Two-dimensional t-SNE of the representations in the last hidden layer of the
DQN for the hydraulic pump dataset
As shown in Fig. 9, the dimension of the features extracted by the DQN agent is
reduced by t-SNE for visualization. The scatter points of different health
conditions are clearly separated from each other. However, the scatter points of the
same health condition form several small clusters instead of a single cluster.
This is because, after the sparse restriction is added to the SAE inside the DQN,
samples of the same health condition with small differences may activate neurons
at different locations. As a result, their features are relatively far from each
other in the high-dimensional space. Nevertheless, the activated neurons are all
related to the same output; in other words, the samples can still be classified
into the same category.

4.3 A brief discussion

To explore and understand the performance of the proposed method comprehensively,
we select one of the most complicated ways of preparing the training
and testing datasets of rolling bearings under various fault modes and working
conditions. The datasets are organized in a similar way to those of case 1 in Ref. [13],
where a DNN method is proposed for the fault diagnosis of rolling bearings. The results
in Ref. [13] are very good, and most of them are nearly 100%. However, the input of
the model in Ref. [13] is the frequency spectra of the original vibration signals. To
present a more informative controlled trial, the SAE-Softmax model used in our study
is built and trained following the method in Ref. [13]. As discussed in case
1, the two methods achieve similar diagnostic accuracies and standard deviations.
Furthermore, as the proposed method is an end-to-end self-learning intelligent method,
this result shows the prospect of realizing a more generalized intelligent fault diagnosis
method.
For most of the existing data-driven fault diagnosis methods, the diagnosis process
contains feature extraction from the raw vibration signals, dimensionality reduction of
the features, which is an optional step, and classification using the obtained features.
Among these steps, it is well known that extracting high-quality features becomes the
most crucial step in designing fault diagnosis algorithms. The existing data-driven fault
diagnosis methods, such as time-frequency decomposition-based methods and ANN-
based methods, especially deep learning-based methods, focus on extracting features
that are useful for achieving high diagnostic accuracy. The assembly of those relatively
independent processes into an effective fault diagnosis algorithm takes advantage of
human ingenuity and requires much experience and prior knowledge on the relevant
techniques. Although deep learning-based methods have made particular progress in
adaptive feature extraction, they still cannot combine feature extraction and fault
classification into a unified process. Such methods reduce the requirements for fault
diagnosis prior knowledge and signal processing techniques but increase the
requirements for constructing and training deep models. Considering the above
problems, a deep reinforcement learning-based fault diagnosis method is proposed in
this study. This proposed method not only can solve the fault diagnosis problem in an
end-to-end way from original signals to fault categories but also can learn how to
distinguish the fault samples through self-learning and memory replaying. Thus, our
method further raises the level of intelligence in fault diagnosis and
reduces the dependence on signal processing technology and expert experience. The
proposed method provides a method and idea for constructing general fault diagnosis
agents.
The case studies indicate that both our method and the SAE-Softmax-based
method can classify the health states of rotating machinery. However, the two methods
differ in the time consumed during training. For the proposed method, the running
time is determined by the number of epochs the agent needs to learn, as reflected in
its learning curve. Specifically, for both case 1 and case 2, the agent needs 2000
epochs to achieve a high score of almost 512 points. Moreover, the time of each epoch
is the same regardless of the number of samples in the training set. For the supervised
learning-based SAE-Softmax method, the runtime of the training process is positively
related to the number of samples in the training set. As the way of learning for fault
diagnosis in the proposed method is self-exploration, the runtime of our method is
higher than that of the SAE-Softmax-based method in our study. However, because its
time per epoch does not depend on the size of the training set, the proposed
method has the potential to exceed supervised learning-based deep learning methods
in algorithm efficiency. Moreover, with the accumulation of
relevant experimental data, a detailed analysis of algorithm efficiency can be conducted
in future studies.

5. Conclusions and future work


The emergence of DRL has increased the autonomy and intelligence of AI
algorithms and inspired us to propose a novel fault diagnosis method for rotating
machinery. Case studies of rolling element bearings and hydraulic pumps
illustrate the feasibility and effectiveness of the proposed method. By designing a fault
diagnosis game environment, the proposed method makes it possible for the fault
diagnosis agent to learn to identify the health states of rotating machinery only through
experience replay and self-exploration. Through the diagnosis results of the datasets, it
is noted that the fault diagnosis agent masters the game, which suggests that our method
can accurately sense the fault characteristics indicating the health conditions of the
machinery. The combination of SAE-based raw signal sensing and DQN-based self-
taught experiential learning is the key reason why the proposed method can achieve
such good results. Unlike supervised learning-based fault diagnosis methods, the deep
reinforcement learning-based method is an end-to-end solution that is capable of
autonomously mining the non-linear mapping relationship between raw vibration data
and the corresponding health states of rotating machinery. Such an advantage can
significantly reduce the dependency on expert experience and prior knowledge for
developing fault diagnosis models. Therefore, the proposed method provides a method
and idea for implementing a more generalized intelligent solution to rotating machinery
fault diagnosis problems.
Our future plans include implementing more experimental tests covering more
types of rotating machinery under various working conditions and fault states, and these
tests would be used to further understand the limitations and boundaries of the proposed
methods in rotating machinery fault diagnosis. In addition, the computing efficiency of
the proposed method during the training process needs to be improved.
Acknowledgements

This study was supported by the Fundamental Research Funds for the Central
Universities (Grant No. YWF-16-BJ-J-18) and the National Natural Science
Foundation of China (Grant Nos. 61803013, 51575021 and 61603016), as well as the
China Postdoctoral Science Foundation (Grant Nos. 2017M610033 and 2018T110030).
References

[1] A. Azadeh, M. Saberi, A. Kazem, V. Ebrahimipour, A. Nourmohammadzadeh, Z. Saberi, A
flexible algorithm for fault diagnosis in a centrifugal pump with corrupted data and noise based
on ANN and support vector machine with hyper-parameters optimization, Appl. Soft Comput.
J. (2013). doi:10.1016/j.asoc.2012.06.020.
[2] A. Heng, S. Zhang, A.C.C. Tan, J. Mathew, Rotating machinery prognostics: State of the art,
challenges and opportunities, Mech. Syst. Signal Process. 23 (2009) 724–739.
doi:10.1016/j.ymssp.2008.06.009.
[3] A. Zhou, D. Yu, W. Zhang, A research on intelligent fault diagnosis of wind turbines based on
ontology and FMECA, Adv. Eng. Informatics. 29 (2015) 115–125.
doi:10.1016/j.aei.2014.10.001.
[4] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, D. Siegel, Prognostics and health management
design for rotary machinery systems - Reviews, methodology and applications, Mech. Syst.
Signal Process. 42 (2014) 314–334. doi:10.1016/j.ymssp.2013.06.004.
[5] D. An, N.H. Kim, J.-H. Choi, Practical options for selecting data-driven or physics-based
prognostics algorithms with reviews, Reliab. Eng. Syst. Saf. 133 (2015) 223–236.
[6] M.S. Kan, A.C.C. Tan, J. Mathew, A review on prognostic techniques for non-stationary and
non-linear rotating systems, Mech. Syst. Signal Process. 62 (2015) 1–20.
[7] G.F. Bin, J.J. Gao, X.J. Li, B.S. Dhillon, Early fault diagnosis of rotating machinery based on
wavelet packets—Empirical mode decomposition feature extraction and neural network, Mech.
Syst. Signal Process. 27 (2012) 696–711.
[8] J. Rafiee, F. Arvani, A. Harifi, M.H. Sadeghi, Intelligent condition monitoring of a gearbox
using artificial neural network, Mech. Syst. Signal Process. 21 (2007) 1746–1754.
[9] B. Samanta, C. Nataraj, Use of particle swarm optimization for machinery fault detection, Eng.
Appl. Artif. Intell. 22 (2009) 308–316.
[10] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks,
Science (80-. ). 313 (2006) 504–507. doi:10.1126/science.1127647.
[11] Y. Lecun, Y. Bengio, G. Hinton, Deep learning, Nature. 521 (2015) 436–444.
doi:10.1038/nature14539.
[12] C. Lu, Z.Y. Wang, W.L. Qin, J. Ma, Fault diagnosis of rotary machinery components using a
stacked denoising autoencoder-based health state identification, Signal Processing. 130 (2017)
377–388. doi:10.1016/j.sigpro.2016.07.028.
[13] F. Jia, Y. Lei, J. Lin, X. Zhou, N. Lu, Deep neural networks: A promising tool for fault
characteristic mining and intelligent diagnosis of rotating machinery with massive data, Mech.
Syst. Signal Process. 72–73 (2016) 303–315. doi:10.1016/j.ymssp.2015.10.025.
[14] C. Lu, Z. Wang, B. Zhou, Intelligent fault diagnosis of rolling bearing using hierarchical
convolutional network based health state classification, Adv. Eng. Informatics. 32 (2017) 139–
151. doi:10.1016/j.aei.2017.02.005.
[15] H. Liu, J. Zhou, Y. Zheng, W. Jiang, Y. Zhang, Fault diagnosis of rolling bearings with
recurrent neural network-based autoencoders, ISA Trans. 77 (2018) 167–178.
[16] H. Shao, H. Jiang, H. Zhang, W. Duan, T. Liang, S. Wu, Rolling bearing fault feature learning
using improved convolutional deep belief network with compressed sensing, Mech. Syst.
Signal Process. 100 (2018) 743–765.
[17] K. Arulkumaran, M.P. Deisenroth, M. Brundage, A.A. Bharath, Deep reinforcement learning:
A brief survey, IEEE Signal Process. Mag. 34 (2017) 26–38. doi:10.1109/MSP.2017.2743240.
[18] S.S. Mousavi, M. Schukat, E. Howley, Deep Reinforcement Learning: An Overview, (2018) 1–
17. doi:10.1007/978-3-319-56991-8_32.
[19] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser,
I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N.
Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis,
Mastering the game of Go with deep neural networks and tree search, Nature. (2016).
doi:10.1038/nature16961.
[20] S. Levine, C. Finn, T. Darrell, P. Abbeel, End-to-End Training of Deep Visuomotor Policies, J.
Mach. Learn. Res. 17 (2016). doi:10.1007/s13398-014-0173-7.2.
[21] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, D. Quillen, Learning hand-eye coordination for
robotic grasping with deep learning and large-scale data collection, Int. J. Rob. Res. 37 (2018)
421–436. doi:10.1177/0278364917710318.
[22] J.X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J.Z. Leibo, R. Munos, C. Blundell, D.
Kumaran, M. Botvinick, Learning to reinforcement learn, (2016).
http://arxiv.org/abs/1611.05763 (accessed November 8, 2018).
[23] Y. Duan, J. Schulman, X. Chen, P.L. Bartlett, I. Sutskever, P. Abbeel, RL$^2$: Fast
Reinforcement Learning via Slow Reinforcement Learning, (2016).
http://arxiv.org/abs/1611.02779 (accessed November 8, 2018).
[24] G. Tesauro, R. Das, H. Chan, … J.K.-A. in N., undefined 2008, Managing power
consumption and performance of computing systems using reinforcement learning,
Papers.Nips.Cc. (n.d.). http://papers.nips.cc/paper/3251-managing-power-consumption-and-
performance-of-computing-systems-using-reinforcement-learning.pdf (accessed November 8,
2018).
[25] B. Zoph, Q. V. Le, Neural Architecture Search with Reinforcement Learning, (2016).
http://arxiv.org/abs/1611.01578 (accessed November 8, 2018).
[26] K. Li, J. Malik, Learning to Optimize, (2016). http://arxiv.org/abs/1606.01885 (accessed
November 8, 2018).
[27] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M.
Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H.
King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep
reinforcement learning, Nature. 518 (2015) 529–533. doi:10.1038/nature14236.
[28] R.S. Sutton, A.G. Barto, Reinforcement learning: An introduction, (2011).
[29] C.J.C.H. Watkins, P. Dayan, Q-learning, Mach. Learn. 8 (1992) 279–292.
doi:10.1007/BF00992698.
[30] V. Nair, G.E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proc.
27th Int. Conf. Mach. Learn., 2010: pp. 807–814.
[31] M.Y. Park, T. Hastie, L1-regularization path algorithm for generalized linear models, J. R. Stat.
Soc. Ser. B (Statistical Methodol. 69 (2007) 659–677.
[32] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, ArXiv Prepr.
ArXiv1412.6980. (2014).
[33] M. Li, T. Zhang, Y. Chen, A.J. Smola, Efficient mini-batch training for stochastic optimization,
in: Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2014: pp. 661–670.
[34] X. Lou, K.A. Loparo, Bearing fault diagnosis based on wavelet transform and fuzzy inference,
Mech. Syst. Signal Process. (2004). doi:10.1016/S0888-3270(03)00077-3.
