Anomaly Detection and Condition Monitoring of Wind Turbine Gearbox Based On LSTM-FS and Transfer Learning

Renewable Energy 189 (2022) 90–103
Yongchao Zhu a, Caichao Zhu a, *, Jianjun Tan a, Yong Tan a, Lei Rao b
a The State Key Laboratory of Mechanical Transmissions, Chongqing University, Chongqing, 400044, China
b CSSC Haizhuang Windpower Co., Ltd., Chongqing, 401122, China
Article info

Article history:
Received 8 November 2021
Received in revised form 12 February 2022
Accepted 15 February 2022
Available online 18 February 2022

Keywords:
Wind turbine
Condition monitoring
Deep learning
Fuzzy synthesis
Transfer learning

Abstract

To take full advantage of the limited monitoring data with fault information for operational state prediction in the case of the discrepancy in data distribution between the WTGs, a novel combined method is proposed based on the long short-term memory, fuzzy synthesis and feature-based transfer learning. After the statistical analysis and prediction of the monitoring indexes of two 2-MW WTGs with faulty information, an operational state calibration framework is proposed based on deep learning and fuzzy synthesis. Following this, three feature-based transfer learning methods are adopted to narrow the discrepancy among the data distribution of the WTGs. Correspondingly, feasibility verification of the proposed method is equally addressed. Case applications are performed using the actual monitoring data from No. 13 and 15 wind turbines of a wind farm in northern China. The results show that the operational state calibration framework can sensitively detect the potential fault information of the WTG in advance. Meanwhile, three transfer learning algorithms can effectively narrow the distance of the data distribution among the WTGs, and the classification accuracy can almost reach above 0.9. The proposed method can make full use of existing monitoring data with faulty information to predict the status of other WTGs.
© 2022 Published by Elsevier Ltd.
* Corresponding author.
E-mail address: cczhu@cqu.edu.cn (C. Zhu).
https://doi.org/10.1016/j.renene.2022.02.061
0960-1481/© 2022 Published by Elsevier Ltd.

Nomenclature

THSSf      temperature of the front of the high-speed shaft (°C)
THSSb      temperature of the back of the high-speed shaft (°C)
Toil       temperature of gearbox oil (°C)
Tinlet     temperature of inlet lubricant (°C)
Tcool      temperature of cooling water (°C)
nhub1      rotate speed of the hub, sensor 1 (r/min)
nhub2      rotate speed of the hub, sensor 2 (r/min)
nhub       rotate speed of the hub (r/min)
ngen       generator rotate speed (r/min)
Pact10m    average active power in 10 min (kW)
nwind10m   average wind speed in 10 min (m/s)
Pact60s    average active power in 60 s (kW)
nwind60s   average wind speed in 60 s (m/s)
Pact       active power of the WTG (kW)
Pactgen    active power of generator
Pactgrid   active power of the grid
nwind10s   average wind speed in 10 s (m/s)
nhubdif    rotate speed difference between two sensors (r/min)
pin        pressure of the inlet lubricant (bar)
Pact10s    average active power in 10 s (kW)
Tgen       maximum temperature of generator windings (°C)
pout       pressure of the outlet lubricant (bar)
Jgen       torque of the generator (kN·m)
nwind1     instantaneous wind speed, sensor 1 (m/s)
nwind2     instantaneous wind speed, sensor 2 (m/s)
nwind      instantaneous wind speed (m/s)
Vna        the maximum vibration value of nacelle
Sij        the deterioration degree of each index
wij        dynamic weight value of the index
w0ij       constant weight value
Vc         fuzzy judgment matrix of indices
h(sij)     membership value
Wc         weight matrix of the project layer
Vi         matrix of the destination assessment
bkj        membership degree under the condition j
Nep        number of iterations of training
O&M        operation and maintenance
WTG        wind turbine gearbox
SCADA      supervisory control and data acquisition
CMS        condition monitoring system
CM         condition monitoring
LSTM       long short-term memory
FS         fuzzy synthesis
CNN        convolutional neural network
DDC        deep domain confusion
DAN        deep adaptation networks
MMD        maximum mean discrepancy
CORAL      correlation alignment
PE         percentage error
RMSE       root mean square error
MK-MMD     multiple kernel variant of maximum mean discrepancy

1. Introduction

With the increasing number of wind turbines installed across the world, higher standards of wind turbine operation and maintenance (O&M) strategies are now needed due to their complex structure and high repair cost [1]. Moreover, offshore wind energy resources are being vigorously developed, especially floating wind turbines [2], which greatly increases the difficulty and costs of O&M. According to the statistics, the operational state of WTGs is of particular concern due to their high failure rate and longest troubleshooting time [3], and nearly a third of the O&M costs are wasted owing to inappropriate O&M strategies [4]. Therefore, optimizing the O&M strategies based on the actual operating state of the wind turbine, especially the wind turbine gearbox (WTG), has become an indispensable means to improve the operational reliability and reduce the cost per kilowatt hour [5]. For this purpose, a lot of research has focused on WTG operational state prediction through the Supervisory Control and Data Acquisition (SCADA) and Condition Monitoring System (CMS) [6], and these studies mainly focused on condition monitoring (CM) and fault diagnosis. Among them, fault diagnosis is applicable when the fault has already occurred or the fault signal characteristics are relatively obvious. Moreover, the system monitors the health of components as the reference for preventive maintenance, which replaces or repairs parts before a related failure occurs, so that the O&M costs can be significantly reduced [7]. CM mainly consists of lubrication analysis, acoustic emission analysis, vibration analysis for fault detection, and data-driven operational state prediction [8].

The analysis of lubrication can be divided into two ways: online (analysis of viscosity and particle count) and offline (analysis of filter flow and cleanliness characteristics) [9], such as the monitoring of lubrication state based on viscosity, sensor and a particle filtering technique to predict the remaining useful life [10,11]. Meanwhile, the acoustic emission and vibration of mechanical systems have also been addressed in recent years [12–15]; these studies mainly focused on the Wigner-Ville distribution, Fourier transform, wavelet transform and empirical mode decomposition support, etc. [8], for example, the harmonic wavelet transform analysis for fault detection and diagnosis [16], the time-synchronous averaging for diagnostics of WTGs [17], the fault-signature enhancement algorithm [18] and pattern recognition approach [19] for fault
characteristics and cointegration analysis of SCADA data for fault diagnosis, nonlinear data trends and abnormal problems detection [20].

In recent years, with the continuous development of data-driven artificial intelligence, many algorithms have been applied to CM, such as support vector machine, deep learning and statistical analysis. Since Zhang et al. [21] proposed a fault detection method for wind turbine main bearings based on an artificial neural network (ANN) to identify the early stage of the main bearing fault in 2014, many studies have addressed CM based on machine learning [22–26], statistical algorithms [27–31] and the life cycle assessment [32–37]. In 2020, Zepeng Liu et al. [38] concluded that the present works on failure modes, CM and fault diagnosis methods for WTG are mainly validated only with artificially seeded defects in idealized laboratory environments; as a result, the designed techniques may not perform well on actual wind turbines due to their complex structure and failure modes. Meanwhile, after reviewing the typical WTG failure modes, the approaches of CM in Ref. [39] and the methods of CM based on SCADA and CMS data in Refs. [40,41], the conclusion can be drawn that the SCADA and CMS data should be integrated when detecting, diagnosing and predicting WTG faults. Besides, most of the current studies on wind turbine fault diagnosis can identify the fault locations, while we should pay more attention to the accuracy of dynamic models for different fault types, more specifically representative features, and powerful classifiers.

Moreover, some recent studies have focused on the application of transfer learning in condition monitoring and fault diagnosis, due to the insufficiency of the faulty data samples in the actual operation of the wind turbines. For example, to predict the power characteristics of wind turbines in the short term, Qureshi AS et al. [42] proposed a prediction system which exploits the learning ability of a deep neural network based ensemble technique and the concept of transfer learning. Furthermore, to transfer knowledge among similar wind turbines and make full use of the faulty data, Wanqiu Chen et al. [43] used two transfer learning methods to establish wind turbine fault diagnosis models for blade icing accretion and gear cog belt fracture failures. Yanting Li et al. [44] proposed a fault diagnosis method based on transfer learning and convolutional autoencoder. Ren H et al. [45], aiming at the health condition diagnosis of wind turbines, built a fault diagnosis method based on variational mode decomposition, multi-scale permutation entropy and feature-based transfer learning. For the automatic extraction of image features and the accurate and efficient detection of wind turbine blade damage, Yang X et al. [46] proposed an image recognition model based on deep transfer learning, and the performance was verified by using unmanned aerial vehicle images of the wind turbine blades.

As can be seen from the above literature, the data-driven algorithm is one of the most significant methods for WTG condition monitoring. Many studies have been conducted in this field and some precise results have been obtained, but there are few applications. This is caused mainly by the following aspects: ⅰ) the monitoring system of each wind farm has indeed collected a large amount of monitoring data, but the faulty data samples are not sufficient and the data between WTGs cannot be directly used due to their non-identity; ⅱ) it is very costly to establish a prediction model for each type of failure of each wind turbine, due to the diversity of the failure modes and because the corresponding fault data samples are not always available for each WTG. In addition, the accuracy of the CM model should not only be measured by the size of the prediction error, but also by the difference of prediction results between normal and abnormal status. The operational state of the WTG should be calibrated from multiple dimensions and dynamic thresholds, due to its complex structure and constantly changing wind speed, power, and environmental conditions.

From the above, the dilemma we are facing is not only how to detect the potential faulty information of WTGs from multiple
dimensions, but also to make full use of the limited faulty data samples to detect the potential fault of other WTGs in the case of the discrepancy among the data distribution of the WTGs. Nevertheless, transfer learning can learn the common fault characteristics of similar WTGs and the specific fault characteristics of target WTGs, which improves the accuracy of state prediction. Besides, considering the fuzziness between different state levels of WTGs and the superiority of LSTM in dealing with time series data, we therefore propose a novel combined method for operational state prediction based on LSTM, fuzzy synthesis (FS), convolutional neural network (CNN) and feature-based transfer learning.

The rest of this paper is organized as follows. Section 2 elaborates how to calibrate and predict the operational state of the WTG based on transfer learning, LSTM and FS (LSTM-FS). Section 3 verifies the feasibility of the combined method based on the monitoring performance, statistical analysis and the necessity of the transfer learning by utilizing the prediction results of actual monitoring data. Section 4 verifies and discusses the accuracy of the method based on the prediction results of the actual monitoring data from No. 13 and No. 15 wind turbines of Pandaoliang wind farm in Shanxi, China.

2. WTG status prediction through LSTM-FS and feature-based transfer learning

2.1. The process of state prediction based on LSTM-FS and transfer learning

In this section, the process of WTG state prediction based on LSTM-FS and transfer learning is elaborated. As shown in Fig. 1, the process can be roughly divided into two parts: (ⅰ) feasibility verification of the methodology and operational state calibration, (ⅱ) state prediction based on transfer learning and discussion.

In the first part, monitoring indexes with obvious fault characteristics were selected from the historical monitoring data as evaluation indexes for the operational state of WTGs, after reviewing the fault log. Then, prediction dimensions of each evaluation index are determined based on the correlation analysis, and the power level of each data sample is obtained based on K-means. Meanwhile, the predicted results of each evaluation index are obtained based on LSTM, with the percentage error (PE) between the predicted results and the actual data used as the deterioration of each evaluation index. Correspondingly, the operational state level of each data set is obtained by inputting the deterioration to the FS model. In the other part, transfer learning is used to predict the operational state of the WTG, after the filtration of data samples in the source and target domain. Finally, the accuracy of the prediction results for cross-WTG and power level are acquired by using the three transfer learning algorithms, respectively.

The processes of the proposed state prediction and evaluation for the WTG include (ⅰ) feasibility verification of the methodology, (ⅱ) indexes prediction, (ⅲ) operational state calibration, (ⅳ) training transfer learning model, (ⅴ) state prediction based on transfer learning and (ⅵ) discussion.

Fig. 1. The process of operational state prediction based on LSTM-FS and transfer learning.

2.2. Operational state calibration based on LSTM-FS

2.2.1. Indexes prediction based on LSTM

Because the LSTM network unit can filter the information to the cell state in training through its three channels (forgetting, input, and output channels), the LSTM has a strong advantage in dealing with time-series data. Hence the LSTM is adopted to predict each evaluation index. As shown in Fig. 2, the LSTM network unit loops itself by updating the cell state in the hidden layer. In this figure, $x_t$ and $y_t$ are the elements of the input and the output, $y_{t-1}$ is the previous output, $h_t$ is the current cell state that stores the hidden layer state, and $h_{t-1}$ is the previous cell state [26].

Among them, the forget gate is used to determine whether the information needs to be forgotten through the following equation:

$$ s_f = \sigma\left(W_f [c_{t-1}, x_t] + b_f\right) \tag{1} $$

where the forget gate outputs a value between 0 and 1 with respect to $c_{t-1}$ and $x_t$, which decides whether to forget or retain $h_{t-1}$ in the current cell, respectively.

In a similar way, the input gate is used to update the cell state from $h_{t-1}$ to $h_t$, which can be determined as:

$$ s_i = \sigma\left(W_i [c_{t-1}, x_t] + b_i\right) \tag{2} $$

The forget and update gates are used to determine the final cell state, which is shown as:

$$ h_t = s_f \cdot h_{t-1} + s_i \cdot \tanh\left(W_c \cdot [c_{t-1}, x_t] + b_c\right) \tag{3} $$

Fig. 2. LSTM unit and structure used in this paper.
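As a minimal sketch of Eqs. (1)–(3), the forget gate, input gate and cell-state update can be written in plain Python. The function names (`sigmoid`, `affine`, `cell_update`), the parameter dictionary and the toy all-zero weights are illustrative assumptions and not the authors' implementation; the products in Eq. (3) are assumed to be element-wise.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b for a list-of-rows matrix W and list vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def cell_update(c_prev, x_t, h_prev, params):
    """One cell-state update following Eqs. (1)-(3).

    c_prev : vector written c_{t-1} in the equations
    x_t    : current input vector
    h_prev : previous cell state h_{t-1}
    params : hypothetical weights W_f, W_i, W_c and biases b_f, b_i, b_c
    """
    z = list(c_prev) + list(x_t)  # concatenation [c_{t-1}, x_t]
    s_f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]  # Eq. (1)
    s_i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]  # Eq. (2)
    cand = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]
    # Eq. (3): keep part of the old state, add part of the candidate state
    # (element-wise products are an assumption of this sketch).
    h_t = [f * h + i * g for f, h, i, g in zip(s_f, h_prev, s_i, cand)]
    return s_f, s_i, h_t


# Toy parameters (assumed): one hidden unit, all weights and biases zero.
params = {
    "W_f": [[0.0, 0.0]], "b_f": [0.0],
    "W_i": [[0.0, 0.0]], "b_i": [0.0],
    "W_c": [[0.0, 0.0]], "b_c": [0.0],
}
s_f, s_i, h_t = cell_update(c_prev=[0.3], x_t=[1.2], h_prev=[2.0], params=params)
# With zero weights both gates equal 0.5 and the candidate is tanh(0) = 0,
# so the new state is half of the previous one: h_t == [1.0].
```

With trained weights, a forget-gate value close to 1 keeps most of $h_{t-1}$, while a value close to 0 discards it.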
The output channel is used to determine the final output, which can be determined as:

$$ s_o = \sigma\left(W_o [c_{t-1}, x_t] + b_o\right) \tag{4} $$

The LSTM network with a single circulation unit and its criterion for determining the prediction dimensions are discussed in Section 3. Following this, the hyper-parameters are adjusted according to experience during the training process. The network structure is illustrated in Fig. 2. The numbers of cycle units and epochs are 10 and 200, respectively.

2.2.2. Operational state calibration based on predicted results

The operating health state of the WTG is hard to quantify due to its complex structure, which is formed by the coupling of multiple components, so the WTG has a certain ambiguity or uncertainty among various operational state levels. Therefore, combining the prediction errors of multiple dimensions (evaluation indexes THSSf, THSSb, Toil and Tinlet), we calibrate the operational state of the WTG based on LSTM and FS; the detailed processes can be found in our previous study [26].

Firstly, the constant weight of each evaluation index is obtained based on the Analytic Hierarchy Process and Least Square Method, which are used to determine the judgment matrix based on the 1–9 calibration method and engineering experience. The constant weights are obtained as shown:

$$ W^0 = [0.2272,\ 0.2272,\ 0.4231,\ 0.1225] \tag{5} $$

Secondly, the deterioration degree of each evaluation index is equal to the percentage error of each index after normalization. Particularly, the operational state is faulty when the deterioration degree of an index is greater than 0.8. In addition, a dynamic weight calculation method is introduced, which combines the constant weight and the deterioration degree, to avoid the faulty information of an index being concealed due to its smaller weight. The dynamic weight is shown as:

$$ w_{ij}(s_{i1}, s_{i2}, \ldots, s_{im}) = \frac{w^0_{ij}\, s_{ij}^{\alpha-1}}{\sum_{k=1}^{m} w^0_{ik}\, s_{ik}^{\alpha-1}} \tag{6} $$

where $w_{ij}$ and $w^0_{ij}$ are the dynamic weight and constant weight; $s_{ij}$ is the percentage error of each index after normalization; $m$ is the number of indices; $\alpha$ is an undetermined coefficient, with $\alpha \le 0.5$ when some indexes have a great influence on the operational state, and $\alpha > 0.5$ for converse cases. In this study, $\alpha$ is 0.4.

The membership function is obtained by combining the half trapezoid and half ellipse in this study [47]. Based on the percentage error and the dynamic weight of each index, the membership values of the three operational state levels are defined as follows. The equation of $v_1$ is defined as:

$$ h(s_{ij}) = \begin{cases} 1, & 0 \le s_{ij} < l_1 \\ \dfrac{1}{2} - \dfrac{1}{2}\sin\left[\dfrac{\pi}{l_2 - l_1}\left(s_{ij} - \dfrac{l_1 + l_2}{2}\right)\right], & l_1 \le s_{ij} < l_2 \\ 0, & l_2 \le s_{ij} \end{cases} \tag{7} $$

The membership equation of $v_2$ is determined as:

$$ h(s_{ij}) = \begin{cases} 0, & s_{ij} < l_1 \ \text{or} \ s_{ij} \ge l_4 \\ \dfrac{1}{2} + \dfrac{1}{2}\sin\left[\dfrac{\pi}{l_2 - l_1}\left(s_{ij} - \dfrac{l_1 + l_2}{2}\right)\right], & l_1 \le s_{ij} < l_2 \\ 1, & l_2 \le s_{ij} < l_3 \\ \dfrac{1}{2} - \dfrac{1}{2}\sin\left[\dfrac{\pi}{l_4 - l_3}\left(s_{ij} - \dfrac{l_3 + l_4}{2}\right)\right], & l_3 \le s_{ij} < l_4 \end{cases} \tag{8} $$

The membership equation of $v_3$ is determined as:

$$ h(s_{ij}) = \begin{cases} 0, & s_{ij} < l_3 \\ \dfrac{1}{2} + \dfrac{1}{2}\sin\left[\dfrac{\pi}{l_4 - l_3}\left(s_{ij} - \dfrac{l_3 + l_4}{2}\right)\right], & l_3 \le s_{ij} < l_4 \\ 1, & l_4 \le s_{ij} \end{cases} \tag{9} $$

where $l_i$ is defined as the fuzzy boundary interval of the three state levels. Among them, $l_1$ is the maximum PE under the normal state, $l_4$ is the minimum PE when the fault features are obvious, and $l_2 = l_1 + (l_4 - l_1)/4$ and $l_3 = l_4 - (l_4 - l_1)/4$ are selected according to the relevant regulations and expert experience.

Following this, the fuzzy judgment matrix ($V_c$) of each index is obtained by substituting the deterioration degree into the membership functions.

$$ V_c = \begin{bmatrix} v_{c1} \\ \vdots \\ v_{c4} \end{bmatrix} = \begin{bmatrix} v_{c11} & \cdots & v_{c14} \\ \vdots & \ddots & \vdots \\ v_{c41} & \cdots & v_{c44} \end{bmatrix} \tag{10} $$

where $v_{cjk}$ is the $j$th index's membership degree under the $k$th state level; $v_{cj}$ is the membership matrix of the $j$th index.

$W_c$ is defined as the dynamic weight matrix:

$$ W_c = [\, w_1 \quad w_2 \quad \ldots \quad w_4 \,] \tag{11} $$

$$ V_i = W_c \cdot V_c \tag{12} $$

Finally, the balanced function is introduced to obtain the operational state level after rounding off $f$; the balanced function is defined as:
$$ f = \frac{\sum_{j=1}^{4} b_j^2 \cdot h_j}{\sum_{j=1}^{4} b_j^2} \tag{13} $$

where $h_j = j$ is the degree of each operational state level in $V_i$.

2.3. Operational state prediction model based on transfer learning

In general, each wind farm contains dozens of wind turbines of similar or identical types. Monitoring data of these wind turbines contain some universal failure information, due to their similar structures and working conditions. Although a large amount of SCADA data about wind turbines is stored at present, few samples are labeled, and they have not been fully utilized. Meanwhile, the generalization ability of the prediction model is insufficient when applied across WTGs. Hence it is necessary to narrow the discrepancy of data distribution among WTGs based on transfer learning algorithms, to make full use of the historical monitoring data of wind turbines of the same type for condition monitoring.

In this section, a WTG operational state prediction method based on convolutional autoencoder and transfer learning is established. The architecture and process of deep domain confusion (DDC) [48] and Deep Adaptation Networks (DAN) [49] are shown in Fig. 3. As shown in Fig. 3 (a), the DDC automatically learns the representation of joint training to optimize classification and domain invariance by utilizing an adaptive layer and domain confusion loss based on maximum mean discrepancy (MMD). The classifier can be trained based on the source labeled data and be applied to the target domain directly with minimal loss, after learning a representation for minimizing the distance between the source and target distributions. In this study, the MMD is computed with respect to $\phi(\cdot)$, which operates on source data points ($x_s \in X_S$) and target data points ($x_t \in X_T$) [48]. The distance between source and target data is defined as follows:

$$ \mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{|X_S|} \sum_{x_s \in X_S} \phi(x_s) - \frac{1}{|X_T|} \sum_{x_t \in X_T} \phi(x_t) \right\| \tag{14} $$

In order to obtain a representation that is conducive to enhancing the generalization ability of the classifier, it is necessary to minimize the loss via the following:

$$ L = L_C(X_L, y) + \alpha \cdot \mathrm{MMD}^2(X_S, X_T) \tag{15} $$

where $L_C(X_L, y)$ denotes the classification loss of the labeled data ($X_L$) and the ground truth labels ($y$), and the hyperparameter $\alpha$ determines the intensity of the confusion domain.

The architecture of DDC in this study has two convolutional and pooling layers and three fully connected layers with dimensions. To directly regularize the representation to be invariant to the source and target domains, the domain distance loss is located on the top of the “bottleneck” layer, and the adaptation layer and the domain distance loss are placed in the layer after the fully connected layer, FC 4. The source and target CNN in the architecture (Fig. 3 (a)) share the weights, and the labeled monitoring data samples of WTG are used to calculate the classification loss, while the domain confusion loss can be obtained by the utilization of all data from the source and target domain.

Fig. 3. The process of deep domain confusion and deep adaptation networks.

As shown in Fig. 3 (b), distinguished from DDC, DAN has three adaptation layers and the domain distance loss, which are respectively placed before the classifier. Secondly, the multiple kernel variant of MMD (MK-MMD) with better characterization ability is adopted in the DAN method to replace the single kernel MMD [49]. The MK-MMD $D_k(p, q)$ between probability distributions $p$ and $q$ is defined as the reproducing kernel Hilbert space distance. The MK-MMD is defined as

$$ D_k^2(p, q) \triangleq \left\| \mathbf{E}_p[\phi(x_s)] - \mathbf{E}_q[\phi(x_t)] \right\|^2_{\mathcal{H}_k} \tag{16} $$

where $\mathcal{H}_k$ denotes the reproducing kernel Hilbert space endowed with a characteristic kernel $k$, and the mean embedding of $p$ in $\mathcal{H}_k$ is a
unique element $\mu_k(p)$ such that $\mathbf{E}_{x \sim p} f(x) = \langle f(x), \mu_k(p) \rangle_{\mathcal{H}_k}$ for all $f \in \mathcal{H}_k$.

In this study, we fine-tune the CNN on the source labeled data samples, to make the distributions of the source and the target similar under the hidden representations of fully connected layers FC 4-FC 6. The optimization objective can be defined as

$$ \min_{\Theta} \frac{1}{n_a} \sum_{i=1}^{n_a} J\left(\theta(X_i^a), y_i^a\right) + \lambda \cdot \mu \cdot \sum_{l=l_1}^{l_2} D_k^2\left(D_s^l, D_t^l\right) \tag{17} $$

where $\Theta$ denotes the bias parameter and the weight of the network; $l_1$ and $l_2$ are 3 and 5 respectively, indicating that the network adaptation is from the 4th layer to the 6th layer; $n_a$ and $X^a$ represent the labeled data sets in the source and target domains; $J(\cdot)$ defines a loss function, cross-entropy. $\lambda = 2/(1 + e^{-10 \cdot i/N_{ep}}) - 1$, where $i$ is the index of the iterations. $\mu$ is the undetermined coefficient used to regulate the balance between the MMD and classification loss.

Furthermore, the correlation alignment for deep domain adaptation (Deep CORAL) [50] is similar to DDC and DAN. The Deep CORAL can be integrated into different layers or architectures. The CORAL loss, as the distance between covariances of the source and target features, is defined as

$$ L_{CORAL} = \frac{1}{4d^2} \left\| C_S - C_T \right\|_F^2 \tag{18} $$

where $\|\cdot\|_F^2$ represents the squared matrix Frobenius norm, and $C_S$ and $C_T$ are covariance matrices of the source and target data, which are defined as

$$ C_S = \frac{1}{n_S - 1}\left( D_S^{\top} D_S - \frac{1}{n_S}\left(\mathbf{1}^{\top} D_S\right)^{\top}\left(\mathbf{1}^{\top} D_S\right) \right) \tag{19} $$

$$ C_T = \frac{1}{n_T - 1}\left( D_T^{\top} D_T - \frac{1}{n_T}\left(\mathbf{1}^{\top} D_T\right)^{\top}\left(\mathbf{1}^{\top} D_T\right) \right) \tag{20} $$

where $\mathbf{1}$ is a column vector with all elements equal to 1.

Finally, three transfer learning methods are adopted to narrow the discrepancy of the data distribution between WTGs, respectively. Meanwhile, the accuracy and generalization ability of the three methods are also discussed.

3. Feasibility verification of the methodology

The feasibility verification of the proposed methodology mainly consists of the following parts: (ⅰ) selection of the index and its prediction dimensions based on the performance and correlation, (ⅱ) classification of the power level and operational state and (ⅲ) necessity analysis of transfer learning adoption.

3.1. The performance and correlation among monitoring indexes

3.1.1. The performance of the indexes

The operational health of a WTG can be reflected by multiple monitoring indexes in the SCADA system, and the operational health degree reflected by each monitoring index is different. We therefore filtered the monitoring indexes under the normal and failure state of the WTGs, after reviewing the fault log and the historical fault data samples of No. 13 and No. 15. Finally, a total of four monitoring indexes are selected to calibrate the health status of the WTG. As shown in Fig. 4, each selected index has obvious fault characteristics when the WTG is in the fault state in the period between $t_0$ and $t_1$. However, almost all the operational state is normal in the operational log, due to the fixed alarm threshold in the SCADA system. To detect the potential faults of the WTG, the prediction error based on LSTM is used as the dynamic alarm threshold, and the operating health status of the WTG is calibrated from the above four indexes.

Fig. 4. The performance of monitoring indexes under each operational state.

3.1.2. Correlation analysis among the indexes

In most of the existing research on CM, the prediction error is regarded as the only standard to evaluate the accuracy of the prediction model. This tends to ignore the correlation difference between the predicted dimension and the label in normal and abnormal states, so that the difference of the predicted results in each operational state grade may not be obvious enough. In other words, the criterion for judging the accuracy of prediction results should not be the size of the prediction error, but the degree of prediction difference between normal and abnormal states instead.

To obtain the prediction models with large differences in prediction results between normal and fault states, a total of 10,000 sets of data samples from No. 13 wind turbine under normal and failure conditions were selected. Following this, we calculate the correlation between each predicted dimension and the above selected four indicators under the normal and abnormal conditions, as well as the correlation difference between the two conditions. The correlation coefficient is described as follows:

$$ r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \tag{21} $$

where $\mathrm{Cov}(X, Y)$ is the covariance between $X$ and $Y$, $\mathrm{Var}[X]$ is the variance of $X$, and $\mathrm{Var}[Y]$ is the variance of $Y$.

As a result, the predicted dimensions are selected only if their correlation with the predicted label and the correlation difference between the two states are both greater than 0.2. The prediction dimensions of each evaluation index are shown in Table 1. The influence of the correlation between the indexes is analyzed in detail in the following section.

3.2. Verification and classification based on clustering

To verify the rationality of the prediction dimensions selection based on correlation, as well as the accuracy of the generalization ability of the prediction model across WTGs, the corresponding data samples are selected and analyzed respectively in this section.

3.2.1. Verification of the dimensions

To verify the significance of the prediction dimensions selection and power classification, in this part, we take the prediction results of THSSb of No. 13 WTG under the fault and normal state as an example, in that the high-speed bearing exhibits the highest failure rate in the WTG. According to the absolute value of the correlation and its difference between the two states, the prediction dimensions were classified into two groups based on the value of the correlation with the predicted label and the correlation difference between the two operational states. In the first group, the correlation between the prediction dimension and the prediction label in the normal state is located in the interval (0.2, 0.8], and the correlation difference between the two operational states is located in (0.2, 1). In the second group, the correlation in the normal state is in the interval (0.2, 1). Correspondingly, a total of 10,000 sets of data samples under the normal (2,000 sets) and fault (8,000 sets) state were selected for verification; the prediction results and the PE are shown in Fig. 5.

As can be seen from Fig. 5(a), the accurate prediction results and the strong generalization ability have been obtained, according to the normal operational state from 0 to $t_1$ of two groups. However,
the prediction results of the WTG under the fault state are widely divergent after t1. In Fig. 5(b), λ1 and λ2 are, respectively, the maximum PE of the two groups under the normal state. According to the statistical results, the number of predicted PEs less than λ1 under the failure state is 852 in the first group, while 6,268 are less than λ2 in the second group, which means the number of missed alarms in the fault state can be reduced by determining the prediction dimensions based on the correlation analysis. However, the probability of missed alarms still exceeds 10% (852/8000) in the first group. In addition, it is found that both temperature and prediction errors are positively correlated with the active power, which means that the faulty information becomes more evident with the increase of the active power in the prediction results of the index, especially the temperature.

Table 1
The prediction dimensions of each evaluation index.
Indexes THSSf THSSb Toil Tinlet
THSSf ✓ ✓
THSSb ✓ ✓
Toil ✓ ✓
Tinlet ✓ ✓
Tcool ✓
nhub1 ✓ ✓ ✓ ✓
nhub2 ✓ ✓ ✓ ✓
nhub ✓ ✓ ✓ ✓
ngen ✓ ✓ ✓ ✓
Pact10m ✓ ✓ ✓ ✓
nwind10m ✓ ✓ ✓ ✓
Pact60s ✓ ✓ ✓ ✓
nwind60s ✓ ✓ ✓ ✓
Pact ✓ ✓ ✓ ✓
Pactgen ✓ ✓ ✓ ✓
Pactgrid ✓ ✓ ✓ ✓
nwind10s ✓ ✓ ✓ ✓
Jgen ✓ ✓ ✓ ✓
nwind1 ✓ ✓ ✓ ✓
nwind2 ✓ ✓ ✓ ✓
nwind ✓ ✓ ✓ ✓
nhubdif ✓ ✓ ✓
pin ✓ ✓ ✓
pout ✓ ✓ ✓
Tgen ✓ ✓
Pact10s ✓

Fig. 5. Prediction results of the two groups under the different correlation degrees.

3.2.2. Classification of the operational state and power level
Based on the previous analysis of the predicted results, we classified the active power to further reduce the probability of missed alarms. To this end, we take No. 13 WTG as an example, and a total of 10,000 data samples under normal and fault states are selected. K-means clustering is adopted to obtain the silhouette coefficients under each classification quantity. As can be seen from Fig. 6(a), the maximum silhouette coefficient occurs when the active power is divided into two categories. Therefore, the power status of No. 13 is divided into two categories based on K-means clustering, with the results presented in Fig. 6(b).

Fig. 6. The silhouette coefficient and power classification of No. 13 WTG.

Based on the above results, it is difficult to distinguish the operational health state of the WTG through the prediction error under the lower power state. Therefore, this study addresses only the operational health status under high power, marked in blue in Fig. 6(b). To further reduce the influence of the active power on the accuracy of prediction, 10,000 sets of data were selected from the normal and abnormal states of each of the two WTGs, and k-means is used to cluster the four indexes of each WTG, and their corresponding PEs, with Pact10m. The silhouette coefficients of the clustering results are shown in Fig. 7.
As shown in Fig. 7(a), the clustering results of each index with Pact10m all indicate that high power should be further divided into 2 categories. Correspondingly, we divide the high power into the lower and the higher power, which are represented by P0 and P1, respectively.
In addition, according to the clustering results of the time-domain characteristics of the vibration data in CMS, it is reasonable to divide the operational state of the WTG into 3 or 4 categories [47]. As the study is conducted based on the SCADA data, we clustered the PE of the four indexes of the two WTGs with Pact10m, and the results are shown in Fig. 7(b). The clustering results show that the operational health state should be divided into 2 or 3 categories. Finally, to meet the actual engineering requirements, we divided the operational health state into 3 categories: fine (v1), attention (v2) and fault (v3).

3.3. Necessity analysis of transfer learning adoption

The mapping relationship between monitoring indexes is not exactly the same in each WTG, due to the differences in manufacturing and installation errors, health status, operational conditions and other factors among the WTGs. In order to directly evaluate the generalization ability of the prediction model across WTGs, and to clarify the necessity of aligning the data distribution between the WTGs, a total of 84,000 sets (42,000 sets for each WTG) of SCADA data from No. 01 and No. 02 WTG were selected, based on the correlation analysis made above. As shown in Fig. 8(a) and (b), the active power and oil temperature of the two WTGs are almost identical in the time domain. Finally, the loss and root mean square error (RMSE) of the predicted results are shown in Fig. 8(c) and (d).
In Fig. 8(c) and (d), Loss 21 denotes the loss in the training process when the data of No. 02 WTG is used as the training set to predict the oil temperature of No. 01 WTG. Loss 11, Loss 22 and Loss 12 are defined in the same way, as are RMSE 11, RMSE 21, RMSE 22 and RMSE 12. It can be seen from the loss curves that all four groups of training results are convergent, indicating that a prediction model with accurate regression performance has been obtained in each group. As shown in Fig. 8(d), RMSE 11, RMSE 21, RMSE 22 and RMSE 12 are 0.8070, 3.1177, 1.4135 and 2.1818, respectively. Thereinto, RMSE 12 is 1.5435 times as much as RMSE 22, and RMSE 21 is 3.8626 times as much as RMSE 11.
Fig. 7. The clustering results of Pact10m with indexes and their prediction errors.
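The selection of the number of power categories described in Section 3.2.2 can be illustrated with a minimal sketch. It assumes a plain one-dimensional k-means and the standard silhouette coefficient; the helper names `kmeans_1d` and `silhouette`, and the synthetic power values, are hypothetical and are not taken from the SCADA data or from the authors' implementation.

```python
import random
from statistics import mean


def kmeans_1d(values, k, iters=50, seed=0):
    """Lloyd's algorithm on scalar data; returns one cluster label per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initial centers drawn from the data
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = mean(members)
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient, s = (b - a) / max(a, b), over all samples."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(abs(v - u) for u in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(mean(abs(v - u) for u in other)  # mean distance to nearest other cluster
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Synthetic stand-in for active power in kW (hypothetical values, not SCADA data).
rng = random.Random(1)
power = [rng.gauss(400, 60) for _ in range(60)] + [rng.gauss(1700, 80) for _ in range(60)]

# Score each candidate number of categories and keep the one with the largest silhouette.
scores = {k: silhouette(power, kmeans_1d(power, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On two well-separated groups such as these, the largest silhouette coefficient is obtained for two categories, which mirrors the kind of result reported for the active power in Fig. 6(a).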
Fig. 8. The monitoring data and predicted results.
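The cross-turbine comparison in Section 3.3 rests on simple arithmetic, sketched below. The `rmse` helper is the usual definition and is an assumption about how the values in Fig. 8(d) were produced; the dictionary only restates the four RMSE values quoted in the text. Recomputing the ratios from these rounded values gives roughly 1.54 and 3.86.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# RMSE values quoted for Fig. 8(d). Following the text's convention for Loss 21,
# key "21" means: trained on No. 02 WTG data, tested on No. 01 WTG.
reported = {"11": 0.8070, "21": 3.1177, "22": 1.4135, "12": 2.1818}

# Cross-turbine error relative to the same-turbine error on each test turbine.
ratio_12_22 = reported["12"] / reported["22"]  # test turbine: No. 02
ratio_21_11 = reported["21"] / reported["11"]  # test turbine: No. 01

print(round(ratio_12_22, 2), round(ratio_21_11, 2))  # prints 1.54 3.86
```

A ratio well above 1 indicates that a model trained on one turbine loses accuracy when applied directly to the other, which is the motivation given for aligning the data distributions.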
To sum up, the SCADA data distribution among WTGs is not identical enough to directly predict the operational state of different WTGs, due to the discrepancy among the data distribution in each WTG. Therefore, the transfer learning method should be adopted to align the data distribution among different WTGs, so that the existing fault data samples can be fully utilized to achieve more accurate performance for CM.

4. Performance analysis and discussion

In this section, two WTGs with fault data samples, No. 13 and No. 15, are taken as cases to verify the effectiveness of the LSTM-FS and feature-based transfer learning. The practicability in identifying the potential faults of the WTG is demonstrated by comparing the actual operational state level and the predicted results. The case analysis mainly includes two aspects: (ⅰ) operational state calibration based on LSTM-FS; (ⅱ) prediction across power levels and devices based on transfer learning, followed by discussion.

4.1. Operational state calibration based on LSTM-FS

For the sake of the accuracy of the operational state calibration, a total of 239,978 and 293,657 data samples were selected from No. 13 and No. 15 WTGs, respectively, under the normal and abnormal state for approximately 2 years from April 2018 to June 2020, with the data samples collected once per minute. Thereinto, a total of 40,000 sets of the data samples under the normal state from each WTG were taken as the training set, and the remaining data samples are used to test the generalization ability and classification accuracy of the model.

4.1.1. Index prediction for state calibration based on LSTM
Due to the existence of faulty information in the test sets, multiple groups of different hyperparameters are adopted in the training process, and then the most appropriate prediction model for each index is selected based on the principle of the minimum prediction error and loss under the normal state. As shown in Fig. 9(a) and (b), after 200 iterations of training, the Loss curve of each index is close to 0, and the RMSE also approaches a constant value, which indicates that the faulty information can be accurately predicted.
Fig. 9(c) and (d) show the prediction percentage error of each WTG index after min-max normalization under the normal and abnormal operational state. As can be seen from the figures, the percentage errors are inconspicuous before Tf when the WTG is under the normal state, which indicates that the model has a good generalization ability on the selected data sets. Correspondingly, PE under the abnormal state after Tf is significantly different from that in the normal state, which means that the model of each index can detect the faulty information accurately. Meanwhile, it further verifies the rationality of selecting the prediction dimensions for each indicator based on correlation analysis. It is also found that the normalized percentage errors differ greatly among the indexes at the same time under the abnormal state, which means that the degree of faulty information reflected by each index varies with the conditions. This further clarifies that the operational state of the WTG should be calibrated based on the prediction results of multiple indexes.

4.1.2. Operational state calibration based on FS
To verify the objectivity of the operational state calibration and the ability of the LSTM-FS to detect faulty information, the calibration proceeds as follows. Firstly, based on the normalized PE of each index predicted previously, the dynamic weight (Equ. 6) of each index and the parameters λi of the membership functions (Equ. 7 to Equ. 9) are obtained. Secondly, the membership values for each operational state level can be calculated based on the deterioration degree,
Fig. 9. The prediction results of the indexes.
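The normalized percentage error plotted in Fig. 9(c) and (d) can be illustrated with a short sketch. The definition of the percentage error used here (absolute error divided by the measured value, times 100) is an assumption, since the formula is not restated in this section, and the temperatures are hypothetical values chosen only to mimic a fault developing in the last three samples.

```python
def percentage_error(actual, predicted):
    """Absolute percentage error per sample (assumed definition of PE)."""
    return [100.0 * abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical oil temperatures in degC: measured values vs. values predicted by a
# model trained on normal-state data; the last three samples mimic a developing fault.
measured = [55.0, 56.0, 55.5, 56.5, 60.0, 63.0, 66.0]
predicted = [55.2, 55.8, 55.6, 56.3, 56.4, 56.5, 56.6]

pe = percentage_error(measured, predicted)
pe_norm = min_max_normalize(pe)
print([round(x, 3) for x in pe_norm])
```

In this toy series the normalized error stays near zero while the prediction tracks the measurement and rises once the two diverge, which is the behaviour the text describes before and after Tf.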
Fig. 10. The operational state calibration results of two WTGs.
dynamic weight and membership functions. Thirdly, the operational state level of each data set is obtained based on the balanced function (Equ. 13). Finally, the corresponding operational state levels of the data samples selected for the two WTGs under different wind speeds are shown in Fig. 10.
In Fig. 10, λp is the threshold of power classification. The method can detect faulty information at both high power (P1) and low power (P0) levels after auxiliary classification based on K-means clustering. This also holds for the monitoring data retained in No. 15 WTG (marked as 'Downrating'), recorded while the turbine operated at reduced power after the alarm. In addition, a review of the fault log shows that the fault of No. 13 WTG was caused by the fracture of the middle pinion, and that of No. 15 WTG by wear of the inner ring in the low-speed stage.

4.2. Operational state prediction based on transfer learning

For the sake of verifying the effectiveness of the transfer learning in operational state prediction, 10,000 sets of data calibrated previously are selected for each WTG corresponding to each power level and each operational state, among which 24,000 sets are used as the training data and the rest as the test data. Operational state prediction tests across the WTGs are divided into three parts: (ⅰ) prediction without power classification, (ⅱ) prediction under the same power level and (ⅲ) prediction across the power level.

4.2.1. State prediction based on transfer learning without auxiliary power classification
In order to verify the necessity of auxiliary power classification and its effect on the accuracy of prediction results, we predict the states of the two WTGs without auxiliary power classification
based on the three methods established in section 2.3. In the training process, we adjust the initial learning rate, the number of iterations, the magnification of the MMD and other parameters in sequence. After a comprehensive evaluation of the accuracy and the convergence of the loss, we obtain the accuracy of these three methods in narrowing the discrepancy between the data distributions of the source and the target domain, so as to minimize the impact of random error on the results. The confusion matrices for the classification accuracy of the three operational states are shown in Fig. 11.
In Fig. 11, N13 → N15 means that N13 is used as the source domain and N15 as the target domain, and v1, v2 and v3 represent the three operational states. As shown in Fig. 11, the average accuracy of the three transfer learning algorithms is 0.8733, 0.8710 and 0.8248, respectively, when No. 13 WTG is used as the source domain. Conversely, the average accuracy is 0.7990, 0.7480 and 0.7569 when No. 13 WTG is used as the target domain. It can be seen that the classification accuracy is higher when No. 13 WTG is used as the source domain, especially for the fault state (v3), when the monitoring data are used without the auxiliary power classification. The normal state (v1) is classified with a higher accuracy, no matter which WTG serves as the target domain.

4.2.2. State prediction based on transfer learning with auxiliary power classification
In this section, after the auxiliary power classification, the prediction is carried out in two ways, under the same power level and across the power level, to obtain the influence of the auxiliary power classification on the prediction accuracy. The confusion matrices of prediction accuracy under the same and different power levels are shown in Figs. 12 and 13, respectively.
In Fig. 12, N13P0 → N15P0 indicates that the data of No. 13 WTG at power level P0 is used as the source domain, and that of No. 15 WTG at power level P0 as the target domain. In general, the accuracy is improved compared with that without auxiliary power classification. The average accuracy of the three transfer learning algorithms is 0.8893, 0.9393 and 0.9048, respectively, when No. 15 WTG is used as the target domain. Conversely, the average accuracy is 0.9228, 0.9133 and 0.8011 when No. 13 WTG is used as the target domain. Thereinto, the classification accuracy of the fault state is greater than 0.95 when No. 13 WTG is used as the source domain. The accuracy of the normal state (v1) is relatively high when No. 15 WTG is used as the source domain, especially under the power level P1.
As shown in Fig. 13, the average accuracy of the three transfer learning algorithms across the power level is 0.8643, 0.9502 and 0.8424, respectively, when No. 13 WTG is used as the source domain. Similarly, the average accuracy is 0.9297, 0.9190 and 0.7590 when No. 13 WTG is used as the target domain. The accuracy obtained with DAN is the highest and exceeds 0.9 in both cases, while that of DDC is lower, especially when No. 13 WTG is used as the source domain. Despite the fastest convergence and the shortest training time, the accuracy is low when the CORAL loss is used to narrow the discrepancy between the data distributions.

4.3. Discussion
As shown in Fig. 10, the 'fault' state of the two WTGs first appeared at 12:05 on January 7, 2018 and 22:43 on June 1, 2018, respectively, marked as v3 first. However, according to the fault log, the system alarmed and shut down at 12:05 on August 16 and 9:20 on November 8, 2018, due to the high temperature (70 °C) of the Tinlet. Meanwhile, it was found that wear debris had almost blocked the filter element when it was being replaced. Therefore, the multidimensional LSTM-FS can detect the faulty information of the WTG earlier than the fixed alarm threshold in the SCADA system, with a good generalization ability, so as to reflect the actual operational state of the WTG.
According to the obtained state prediction results shown in section 4.2, the average accuracy is 0.8122 without the auxiliary
Fig. 11. The confusion matrices of prediction results without power classification.
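How accuracies are read off a confusion matrix such as those in Fig. 11 can be sketched as follows. The labels are hypothetical and are not taken from the figure, and treating the per-class accuracy as the row-wise fraction of correct predictions is an assumption about how the matrices were normalized. The last lines only average the six values quoted in Section 4.2.1, which appears consistent with the 0.8122 quoted in the discussion.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly (row-wise)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


classes = ["v1", "v2", "v3"]  # fine, attention, fault
# Hypothetical labels for ten target-domain samples (not taken from Fig. 11).
y_true = ["v1", "v1", "v1", "v1", "v2", "v2", "v2", "v3", "v3", "v3"]
y_pred = ["v1", "v1", "v1", "v2", "v2", "v2", "v3", "v3", "v3", "v3"]

cm = confusion_matrix(y_true, y_pred, classes)
print(cm, per_class_accuracy(cm), overall_accuracy(cm))

# Mean of the six average accuracies quoted in Section 4.2.1.
quoted = [0.8733, 0.8710, 0.8248, 0.7990, 0.7480, 0.7569]
mean_quoted = sum(quoted) / len(quoted)
print(round(mean_quoted, 4))
```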
Fig. 12. The confusion matrices of prediction results under the same power level.
Fig. 13. The confusion matrices of prediction results crossing the power level.
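The two kinds of distribution discrepancy compared in Section 4.2 can be sketched in a few lines. This is a simplified illustration, not the authors' networks: the MMD is computed with a linear kernel (DAN uses a multi-kernel MMD over several layers), the CORAL term follows the usual Deep CORAL form with the squared Frobenius distance between feature covariances scaled by 1/(4d²), and the "magnification of the MMD" is taken to be a weight `alpha` on the MMD term. All function names and the toy feature matrices are hypothetical.

```python
def column_means(X):
    """Mean of each feature column of a list-of-rows matrix."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def linear_mmd2(Xs, Xt):
    """Squared MMD with a linear kernel: squared distance between feature means."""
    ms, mt = column_means(Xs), column_means(Xt)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def covariance(X):
    """Unbiased feature covariance matrix (denominator n - 1)."""
    n, d = len(X), len(X[0])
    m = column_means(X)
    return [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(Xs, Xt):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(Xs[0])
    Cs, Ct = covariance(Xs), covariance(Xt)
    frob2 = sum((Cs[i][j] - Ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob2 / (4 * d * d)


def total_loss(cls_loss, Xs, Xt, alpha=1.0):
    """Classification loss plus the weighted discrepancy term (MMD variant)."""
    return cls_loss + alpha * linear_mmd2(Xs, Xt)


# Toy source/target features (hypothetical): the shifted target has the same spread
# as the source, the stretched target has the same per-feature offset but a wider spread.
source = [[0.0, 0.0], [2.0, 2.0]]
target_shifted = [[1.0, 1.0], [3.0, 3.0]]
target_stretched = [[0.0, 0.0], [4.0, 4.0]]

print(linear_mmd2(source, target_shifted), coral_loss(source, target_shifted))
print(linear_mmd2(source, target_stretched), coral_loss(source, target_stretched))
```

The toy data show why the two terms behave differently: a pure shift of the target features changes the linear MMD but leaves the covariance-based CORAL term at zero, whereas a change in spread is picked up by CORAL.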
power classification, 0.8915 when the two domains are under the same power level, and 0.8707 across the power level. This shows that the auxiliary power classification can improve the accuracy, which reaches its highest value when the two domains are under the same power level.
It can be seen from Fig. 12 that the average accuracy under the power level P1 is higher than that under the power level P0. It can be inferred that the fault characteristic is positively correlated with the active power, which is consistent with the conclusion in section 3.2, and that the power level ranges of the two WTGs do not coincide completely. In addition, the average accuracy under the same power level is higher when No. 13 WTG is used as the source domain than when it is used as the target domain, because the monitoring data under the reduced power status is retained in No. 15 WTG. As can be seen from the confusion matrices of N13P1 → N15P1 and N15P0 → N13P0, compared with the DDC based on MMD with a single feature layer, the three feature layers with MK-MMD in the DAN structure can achieve a high accuracy, although there is slight over-fitting. Although the mean accuracy across the power level is not high enough, the performance of the three transfer learning algorithms is reflected in the prediction results in Fig. 13. The mean accuracy is also higher when MMD and MK-MMD are adopted. CORAL struggles to narrow the discrepancy accurately when the power levels of the two domains are not aligned, especially when the monitoring data under the reduced power status is retained in the source domain.
To sum up, the multidimensional LSTM-FS is effective in detecting faulty information when calibrating the operational state of the WTG. The auxiliary power classification can improve the classification accuracy of transfer learning-based prediction. The power level and the monitoring data under the reduced power status have a great influence on the generalization ability of the three transfer learning methods adopted in this case application.

5. Conclusion

Operational state-based maintenance is considered an effective measure to improve the reliability and reduce the O&M costs of a wind turbine over its lifetime. To make full use of the monitoring data with faulty information for the operational state prediction of the WTG, we proposed a multidimensional state calibration method based on LSTM, FS, statistical analysis and transfer learning. Following this, the necessity analysis of the power classification, the clustering, and the prediction based on transfer learning were performed. Correspondingly, monitoring data from two WTGs with faulty information were selected for state calibration and prediction and compared with the fault logs. Finally, the following conclusions are drawn from the study:

(1) In the process of training the index prediction model, it is suggested to eliminate any dimension that correlates strongly with active power and whose correlation differs little between the normal and abnormal states, which is conducive to improving the ability of the prediction model to distinguish the normal and abnormal states. The potential faults can be detected in advance by merging the multiple dimensions based on the LSTM-FS when calibrating the operational state.
(2) Power classification should be performed by Pact10m for its strongest correlation with temperature indexes. Meanwhile, the classification accuracy can almost reach above 0.9 by utilizing the MK-MMD to narrow the discrepancy between the data distribution of the WTGs, while considering the auxiliary power classification.
(3) The monitoring data under the power-reduced operational status have a negative influence on the accuracy of classification, especially when they are placed in the source domain. Hence the power-reduced data should be excluded or placed in the target domain during the training process to improve the generalization ability of the model. In addition, the visibility of fault information and the status classification accuracy are positively correlated with the active power.

In summary, the LSTM-FS and feature-based transfer learning proposed in this study are effective in applying the existing monitoring data with faulty information to the potential fault detection of other WTGs, for O&M strategy optimization, reliability improvement and cost saving. It will also be beneficial to investigate how to make full use of the data under the power-reduced status and detect the faulty information when the wind turbine is operating under the low power level.

CRediT authorship contribution statement

Yongchao Zhu: Methodology, Program, Data processing, Writing - original draft, Writing - review & editing. Caichao Zhu: Supervision. Jianjun Tan: Investigation, Writing - review & editing. Yong Tan: Investigation, Writing - review & editing. Lei Rao: Guidance based on engineering experience.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors gratefully appreciate the support from the National High Tech ship research project (No.360 [2019] issued by MIIT, China).

References

[1] Y. Li, F.P.A. Coolen, C. Zhu, J. Tan, Reliability assessment of the hydraulic system of wind turbines based on load-sharing using survival signature, Renew. Energy 153 (2020) 766-776.
[2] G. Rinaldi, A. Garcia-Teruel, H. Jeffrey, P.R. Thies, L. Johanning, Incorporating stochastic operation and maintenance models into the techno-economic analysis of floating offshore wind farms, Appl. Energy (2021) 301.
[3] C.J.F.Y. Crabtree, P.J. Tavner, Detecting incipient wind turbine gearbox failure: a signal analysis method for on-line condition monitoring, in: Proceedings of the Scientific Track of the European Wind Energy Conference, 2010, pp. 154-156.
[4] P. Dehghanian, M. Fotuhi-Firuzabad, S. Bagheri-Shouraki, A.A. Razi Kazemi, Critical component identification in reliability centered asset management of power distribution systems via fuzzy AHP, IEEE Syst. J. 6 (2012) 593-602.
[5] A. Kusiak, Z. Zhang, A. Verma, Prediction, operations, and condition monitoring in wind energy, Energy 60 (2013) 1-12.
[6] A. Meyer, Multi-target normal behaviour models for wind farm condition monitoring, Appl. Energy (2021) 300.
[7] A. Stetco, F. Dinmohammadi, X. Zhao, V. Robu, D. Flynn, M. Barnes, et al., Machine learning methods for wind turbine condition monitoring: a review, Renew. Energy 133 (2019) 620-635.
[8] J.P. Salameh, S. Cauet, E. Etien, A. Sakout, L. Rambault, Gearbox condition monitoring in wind turbines: a review, Mech. Syst. Signal Process. 111 (2018) 251-264.
[9] S. Sheng, Monitoring of wind turbine gearbox condition through oil and wear debris analysis: a full-scale testing perspective, Tribol. Trans. 59 (2016) 149-162.
[10] Lubrication Oil Condition Monitoring and Remaining Useful Life Prediction with Particle Filtering.
[11] J. Zhu, J.M. Yoon, D. He, E. Bechhoefer, Online particle-contaminated lubrication oil condition monitoring and remaining useful life prediction for wind turbines, Wind Energy 18 (2015) 1131-1149.
[12] Y. Qu, D. He, J. Yoon, B. Van Hecke, E. Bechhoefer, J. Zhu, Gearbox tooth cut
fault diagnostics using acoustic emission and vibration sensors–a comparative study, Sensors (Basel) 14 (2014) 1372-1393.
[13] Y. Zhang, W. Lu, F. Chu, Planet gear fault localization for wind turbine gearbox using acoustic emission signals, Renew. Energy 109 (2017) 449-460.
[14] C. Li, R.-V. Sanchez, G. Zurita, M. Cerrada, D. Cabrera, R.E. Vásquez, Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals, Mech. Syst. Signal Process. 76-77 (2016) 283-293.
[15] F. Elasha, M. Greaves, D. Mba, D. Fang, A comparative study of the effectiveness of vibration and acoustic emission in diagnosing a defective bearing in a planetry gearbox, Appl. Acoust. 115 (2017) 181-195.
[16] E.J. Diehl, J. Tang, Predictive modeling of a two-stage gearbox towards fault detection, Shock and Vib. (2016) 1-13.
[17] J.M. Ha, B.D. Youn, H. Oh, B. Han, Y. Jung, J. Park, Autocorrelation-based time synchronous averaging for condition monitoring of planetary gearboxes in wind turbines, Mech. Syst. Signal Process. 70-71 (2016) 161-175.
[18] L. Hong, Y.Z. Qu, J.S. Dhupia, S.W. Sheng, Y.G. Tan, Z.D. Zhou, A novel vibration-based fault diagnostic algorithm for gearboxes under speed fluctuations without rotational speed measurement, Mech. Syst. Signal Process. 94 (2017) 14-32.
[19] R. Ruiz de la Hermosa González-Carrato, Sound and vibration-based pattern recognition for wind turbines driving mechanisms, Renew. Energy 109 (2017) 262-274.
[20] P.B. Dao, W.J. Staszewski, T. Barszcz, T. Uhl, Condition monitoring and fault detection in wind turbines based on cointegration analysis of SCADA data, Renew. Energy 116 (2018) 107-122.
[21] Z.-Y. Zhang, K.-S. Wang, Wind turbine fault detection based on SCADA data analysis using ANN, Adv. Manuf. 2 (2014) 70-78.
[22] H. Chen, H. Liu, X. Chu, Q. Liu, D. Xue, Anomaly detection and critical SCADA parameters identification for wind turbines based on LSTM-AE neural network, Renew. Energy 172 (2021) 829-840.
[23] H. Sun, C. Qiu, L. Lu, X. Gao, J. Chen, H. Yang, Wind turbine power modelling and optimization using artificial neural network with wind field experimental data, Appl. Energy (2020) 280.
[24] L. Xiang, X. Yang, A. Hu, H. Su, P. Wang, Condition monitoring and anomaly detection of wind turbine based on cascaded and bidirectional deep learning networks, Appl. Energy (2022) 305.
[25] J. Zhang, X. Zhao, Three-dimensional spatiotemporal wind field reconstruction based on physics-informed deep learning, Appl. Energy (2021) 300.
[26] Y. Zhu, C. Zhu, J. Tan, Y. Wang, J. Tao, Operational state assessment of wind turbine gearbox based on long short-term memory networks and fuzzy synthesis, Renew. Energy (2021).
[27] V. Inturi, N. Shreyas, K. Chetti, G.R. Sabareesh, Comprehensive fault diagnostics of wind turbine gearbox through adaptive condition monitoring scheme, Appl. Acoust. (2021) 174.
[28] P. Qian, D. Zhang, X. Tian, Y. Si, L. Li, A novel wind turbine condition monitoring method based on cloud computing, Renew. Energy 135 (2019) 390-398.
[29] Y. Zheng, J. Wei, K. Zhu, B. Dong, Reliability analysis assessment of the wind turbines system under multi-dimensions, Adv. Compos. Lett. 29 (2020).
[30] J. Zhong, D. Wang, C. Li, A nonparametric health index and its statistical threshold for machine condition monitoring, Measurement (2021) 167.
[31] R. Zou, J. Yang, Y. Wang, F. Liu, M. Essaaidi, D. Srinivasan, Wind turbine power curve modeling using an asymmetric error characteristic-based loss function and a hybrid intelligent optimizer, Appl. Energy (2021) 304.
[32] A. Schreiber, J. Marx, P. Zapp, Comparative life cycle assessment of electricity generation by different wind turbine types, J. Clean. Prod. 233 (2019) 561-572.
[33] J. Dai, W. Yang, J. Cao, D. Liu, X. Long, Ageing assessment of a wind turbine over time by interpreting wind farm SCADA data, Renew. Energy 116 (2018) 199-208.
[34] B. Guezuraga, R. Zauner, W. Pölz, Life cycle assessment of two different 2 MW class wind turbines, Renew. Energy 37 (2012) 37-44.
[35] A. Orlando, L. Pagnini, M.P. Repetto, Structural response and fatigue assessment of a small vertical axis wind turbine under stationary and non-stationary excitation, Renew. Energy 170 (2021) 251-266.
[36] X. Li, W. Zhang, Long-term fatigue damage assessment for a floating offshore wind turbine under realistic environmental conditions, Renew. Energy 159 (2020) 570-584.
[37] X. Jia, C. Jin, M. Buzza, W. Wang, J. Lee, Wind turbine performance degradation assessment based on a novel similarity metric for machine performance curves, Renew. Energy 99 (2016) 1191-1201.
[38] Z. Liu, L. Zhang, A review of failure modes, condition monitoring and fault diagnosis methods for large-scale wind turbine bearings, Measurement (2020) 149.
[39] Y. Feng, Y. Qiu, C.J. Crabtree, H. Long, P.J. Tavner, Monitoring wind turbine gearboxes, Wind Energy 16 (2013) 728-740.
[40] T.Y. Wang, Q.K. Han, F.L. Chu, Z.P. Feng, Vibration based condition monitoring and fault diagnosis of wind turbine planetary gearbox: a review, Mech. Syst. Signal Process. 126 (2019) 662-685.
[41] J. Igba, K. Alemzadeh, C. Durugbo, K. Henningsen, Performance assessment of wind turbine gearboxes using in-service data: current approaches and future trends, Renew. Sustain. Energy Rev. 50 (2015) 144-159.
[42] A.S. Qureshi, A. Khan, A. Zameer, A. Usman, Wind power prediction using deep neural network based meta regression and transfer learning, Appl. Soft Comput. 58 (2017) 742-755.
[43] W. Chen, Y. Qiu, Y. Feng, Y. Li, A. Kusiak, Diagnosis of wind turbine faults with transfer learning algorithms, Renew. Energy 163 (2021) 2053-2067.
[44] Y. Li, W. Jiang, G. Zhang, L. Shu, Wind turbine fault diagnosis based on transfer learning and convolutional autoencoder with small-scale data, Renew. Energy 171 (2021) 103-115.
[45] H. Ren, W. Liu, M. Shan, X. Wang, A new wind turbine health condition monitoring method based on VMD-MPE and feature-based transfer learning, Measurement (2019) 148.
[46] X. Yang, Y. Zhang, W. Lv, D. Wang, Image recognition of wind turbine blade damage based on a deep learning model with transfer learning and an ensemble learning classifier, Renew. Energy 163 (2021) 386-397.
[47] Y. Zhu, C. Zhu, C. Song, Y. Li, X. Chen, B. Yong, Improvement of reliability and wind power generation based on wind turbine real-time condition assessment, Int. J. Electr. Power Energy Syst. 113 (2019) 344-354.
[48] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, T. Darrell, Deep Domain Confusion: Maximizing for Domain Invariance, 2014, arXiv:1412.3474.
[49] M. Long, Y. Cao, J. Wang, M.I. Jordan, Learning Transferable Features with Deep Adaptation Networks, 2015, arXiv:1502.02791.
[50] B. Sun, K. Saenko, Deep CORAL: Correlation Alignment for Deep Domain Adaptation, Springer International Publishing, Cham, 2016, pp. 443-450.