Remaining Useful Life Estimation Framework For The Main Bearing of Wind Turbines Operating in Real Time

energies
Article
Januário Leal de Moraes Vieira 1 , Felipe Costa Farias 1 , Alvaro Antonio Villa Ochoa 1,2, * ,
Frederico Duarte de Menezes 1,2 , Alexandre Carlos Araújo da Costa 2,3 , José Ângelo Peixoto da Costa 1,2 ,
Gustavo de Novaes Pires Leite 1,2,3 , Olga de Castro Vilela 3 , Marrison Gabriel Guedes de Souza 4
and Paula Suemy Arruda Michima 2
1 Department of Higher Education Courses (DACS), Federal Institute of Education, Science and Technology of
Pernambuco, Av. Prof Luiz Freire, 500, Recife 50740-545, Brazil; januarioleal@recife.ifpe.edu.br (J.L.d.M.V.);
felipe.farias@paulista.ifpe.edu.br (F.C.F.); fredericomenezes@recife.ifpe.edu.br (F.D.d.M.);
angelocosta@recife.ifpe.edu.br (J.Â.P.d.C.); gustavonovaes@recife.ifpe.edu.br (G.d.N.P.L.)
2 Department of Mechanical Engineering, Federal University of Pernambuco, Cidade Universitaria, 1235,
Recife 50670-901, Brazil; alexandre.acosta@ufpe.br (A.C.A.d.C.); paula.michima@ufpe.br (P.S.A.M.)
3 Centro de Energias Renováveis (CER), Universidade Federal de Pernambuco, Cidade Universitaria, 1235,
Recife 50670-901, Brazil; olga.vilela@ufpe.br
4 NEOG—New Energy Options Geração de Energia, Guamaré 59598-000, Brazil; marrison.souza@neog.com.br
* Correspondence: ochoaalvaro@recife.ifpe.edu.br; Tel.: +55-81-99976-4266
Abstract: The prognosis of wind turbine failures under real operating conditions is a significant gap in the academic literature and is essential for achieving viable performance parameters for the operation and maintenance of these machines, especially those located offshore. This paper presents a framework for estimating the remaining useful life (RUL) of the main bearing using regression models fed with operational data (temperature, wind speed, and the active power of the network) collected by a supervisory control and data acquisition (SCADA) system. The framework begins with a careful data filtering process, followed by the creation of a degradation profile based on identifying the behavior of temperature time series. It also uses a cross-validation strategy to mitigate data scarcity and increase model robustness by combining subsets of data from the different available turbines. Support vector, gradient boosting, random forest, and extra trees models were created, which, in the tests, showed an average of 20 days in estimating the remaining useful life and presented mean absolute error (MAE) values of 0.047 and mean squared errors (MSE) of 0.012. As its main contributions, this work proposes (i) a robust and effective regression modeling method for estimating RUL based on temperature data and (ii) an approach for dealing with a lack of data, a common problem in wind turbine operation. The results demonstrate the potential of using these forecasts to support the decision making of the teams responsible for operating and maintaining wind farms.
Keywords: wind turbine; main bearing; remaining useful life (RUL); remaining useful life; machine learning; regression models; supervisory control and data acquisition (SCADA); bearing temperature
Citation: Vieira, J.L.d.M.; Farias, F.C.; Ochoa, A.A.V.; de Menezes, F.D.; Costa, A.C.A.d.; da Costa, J.Â.P.; de Novaes Pires Leite, G.; Vilela, O.d.C.; de Souza, M.G.G.; Michima, P.S.A. Remaining Useful Life Estimation Framework for the Main Bearing of Wind Turbines Operating in Real Time. Energies 2024, 17, 1430. https://doi.org/10.3390/en17061430
Academic Editors: Mohammadreza Aghaei and Aref Eskandari
Received: 16 February 2024
Revised: 5 March 2024
Accepted: 13 March 2024
Published: 16 March 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Energies 2024, 17, 1430. https://doi.org/10.3390/en17061430 https://www.mdpi.com/journal/energies
1. Introduction
Since the beginning of the 21st century, the pursuit of developing and enhancing renewable energy production has been driven by environmental regulations, new business prospects, and a shifting global mindset regarding the significance of utilizing energy sources with minimal impact on the planet [1].
In this context, wind energy, a resource harnessed by humans for centuries, currently stands out for its advanced technological level and widespread use in various countries, both onshore and offshore. In terms of design, modern wind turbines surpass 120 m hub heights, 200 m rotor diameters, and 5 MW production capacity barriers. Operationally, wind farms have expanded in size, increasingly venturing into waters far from the coast and employing specialized monitoring and control systems to ensure safety and efficiency throughout their life cycle.
However, these large machines entail high operating and maintenance costs, around
12% for onshore installations and up to 23% [2] for offshore installations. The companies
operating and maintaining them have been striving to employ more precise and effective
maintenance methods and techniques to achieve more profitable production levels [3] and
enhanced safety.
In the wind energy industry, particularly offshore, corrective maintenance is undesir-
able due to its high costs and operational and environmental impacts. Therefore, preventive
and predictive maintenance are utilized to avoid early wind turbine failures. Predictive
maintenance, a technique based on condition monitoring as per NSAI [4], is carried out
using forecasts from the repeated analysis or evaluation of parameters and characteristics
indicating component degradation. This type of maintenance necessitates investment in
sensors, equipment to collect data from critical machine components, and human capital
capable of interpreting the data to generate relevant information for the decision making of
senior managers.
Vibration monitoring is an effective predictive technique for identifying component degradation in mechanical assemblies. However, in applications such as wind turbines, where loads vary randomly over a wide range within short periods [5] (pp. 178–227), and under low-speed conditions, identifying incipient faults in bearings and gears is challenging. In these cases, changes in the vibrational signature of these components must be identified and analyzed relative to their signature at the start of operation or after their replacement.
Temperature monitoring also provides valuable data. Analyzing deviations from historically observed temperature values can reveal developing flaws, allowing for early corrections before major problems occur.
In order to successfully implement condition-based maintenance (CBM), sophisticated
techniques for fault detection, diagnosis, and prediction that satisfy the wind energy
sector’s technical, operational, safety, and environmental requirements must be used.
For failure prognosis, the earlier a degradation profile can be identified in which the data show a monotonic trend, the more successful the prognosis estimate will be.
Fault prognosis is the next frontier that the wind energy industry seeks. From this
perspective, there has been an explosion of articles since 2014 investigating the use of data-
driven methods and models for the assessment of remaining useful life (RUL), as shown
in Figure 1, addressing different failure prognosis techniques in the kinematic assembly,
mainly in wind turbine speed reducers. Several authors [6–10] have used machine learning
techniques or artificial neural networks (ANN) to estimate the RUL of gears and bearings
in speed reducers and carry out failure prognoses. Still, the number of studies for main
bearing prognosis is significantly lower [2,11], as shown in Figure 1.
Some studies [12,13] employing hybrid models with deep neural networks have
focused on main bearing fatigue estimates. Other studies advance the estimation of the
RUL of the main bearing using recurrent neural network models (e.g., LSTM) [9]. The authors
emphasize the need for data manipulation, data augmentation [13], and resampling [9] to
estimate fatigue or RUL.
Challenges in the failure prognosis of wind turbine main bearings stem from a limited understanding of the relationship between load and damage, and of failure frequencies and modes, in real applications [11].
An accurate calculation of the RUL estimate is crucial for cost-effective maintenance
and greater wind turbine availability. Machine learning and artificial intelligence algorithms
enable the use of the massive amounts of data from condition-based maintenance systems and supervisory control and data acquisition (SCADA) systems to estimate the RUL of powertrain components [7,14].
The use of variables such as temperature is proposed as an alternative for identifying faults and conducting prognosis. Wind turbine SCADA systems monitor the temperatures of the various components of the kinematic assembly, from the main bearing to the bearing on the side opposite the coupling, as well as the nacelle and ambient temperatures. Temperature limits can be set based on technical information about the components and lubricants used.
Figure 1. Number of records of publications related to failure prognostics of all components or subsystems and failure prognostics of main bearings of wind turbines on Engineering Village (COMPENDEX).
This article presents a methodology for estimating the RUL of main bearings from regression models using real wind turbine data. The models were built with the a priori definition of an RUL profile through the analysis of local temperature time series, the creation of training sets, validation and testing with the time series of different turbines, the evaluation of metrics such as mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2 Score), and the estimation of RUL using the test dataset.
The main contributions of this article include the following:
• The development of a robust framework for estimating RUL from real main bearing temperature series from a SCADA system;
• The presentation of a cross-validation strategy to mitigate the issue of scarce data and increase the models' generalization capacity.
The remainder of this manuscript is organized as follows. Section 2 presents the framework methodology for estimating the main bearing's RUL. Section 3 provides the results and discussion obtained with the developed models, and Section 4 presents the conclusions.
2. Methodology for Wind Turbine Useful Life Estimation
This paper presents a framework for predicting wind turbine failures by estimating the remaining useful life of the main bearing using temperature data monitored by a SCADA system.
The framework used raw main bearing temperature data and regression models to calculate the RUL of turbines from two wind farms installed on the coast of northeastern Brazil. The turbines were all the same model and had a rated power above 1 MW. Detailed information cannot be made available for confidentiality reasons.
The procedure consists of reading and preprocessing the raw data from the SCADA system and creating a degradation profile by comparing the temperature behavior with the design limits. The main bearing average temperature, environment average temperature, wind speed, and active grid power data within the operational range of the wind turbines were considered variables of interest. The environmental conditions and the mechanical properties of the main bearing lubricant of three wind turbines with different degradation dynamics were also accounted for. Simulation cases were then defined in which the datasets of these three turbines were used, in turn, for training, validating, and testing machine learning regression models.
The MSE, RMSE, MAE, and R2 Score metrics were calculated for the regression models developed, and rankings were made based first on the MAE and then on the R2 Score. The two models with the lowest MAEs and highest R2 Scores were chosen to estimate the RUL of the main bearings of the wind farm's turbines.
Figure 2 shows the flowchart for estimating the RUL of the main bearing from temperature data.
Figure 2. RUL estimation using regression machine learning models.
As shown in Figure 2, the overall method consists of eleven stages. In the first stage, the raw data from the SCADA system are read. The second stage preprocesses the data using cleaning, filtering, and resampling techniques. The third stage classifies the data. The fourth stage creates the training, validation, and testing data by separating the available data into subsets. The fifth stage prepares and trains the regression models. In the sixth stage, cross-validation is carried out to identify the best hyperparameters of the models. The seventh stage determines the study metrics. In the eighth stage, the models are ranked, and the best ones are selected. In the ninth stage, simulations are carried out to estimate the remaining useful life (RUL) based on data read from the test subsets. In the tenth stage, the simulation with the best regression models is carried out, and finally, in the eleventh stage, the RUL is calculated.
2.1. SCADA System Data Reading and Preprocessing

The SCADA system has a lower data acquisition rate than condition-based maintenance systems [15] (pp. 303–342). The signals are recorded as 10 min samples over a given time interval, but they can still identify faults generated by components and subsystems that involve changes in measured quantities. The raw turbine data from the two wind farms were provided in ".csv" format and were read and condensed into a parquet file. A single data frame containing all the SCADA variables was created from the parquet file, which we will call the raw dataset. From the raw dataset, a new dataset was created containing only the time series of the variables of interest (main bearing average temperature, environment average temperature, windspeed average status, grid average active power) of 34 wind turbines (WT1–WT34) from the two wind farms.
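The reading and selection steps can be illustrated with a short pandas sketch. The column names (`timestamp`, `turbine_id`, `main_bearing_temp_avg`, and so on), the extra `pitch_angle` column, and the helper functions are hypothetical, since the SCADA tag names of the studied turbines are confidential; the sketch only shows the pattern of stacking the exported files and keeping the variables of interest.

```python
import io

import pandas as pd

# Hypothetical column names: the real SCADA tag names are not disclosed in the source.
VARIABLES_OF_INTEREST = [
    "timestamp",
    "turbine_id",
    "main_bearing_temp_avg",
    "ambient_temp_avg",
    "wind_speed_avg",
    "grid_active_power_avg",
]


def load_raw_dataset(csv_sources, parquet_path=None):
    """Read every exported .csv file and stack them into one raw data frame."""
    frames = [pd.read_csv(src, parse_dates=["timestamp"]) for src in csv_sources]
    raw = pd.concat(frames, ignore_index=True)
    if parquet_path is not None:
        # Optional: condense the raw data into a single Parquet file
        # (pandas needs pyarrow or fastparquet installed for this call).
        raw.to_parquet(parquet_path, index=False)
    return raw


def select_variables_of_interest(raw):
    """Keep only the variables of interest, ordered by turbine and time."""
    subset = raw[VARIABLES_OF_INTEREST]
    return subset.sort_values(["turbine_id", "timestamp"]).reset_index(drop=True)


# Usage with two in-memory "files" standing in for the exported CSVs.
header = ("timestamp,turbine_id,main_bearing_temp_avg,ambient_temp_avg,"
          "wind_speed_avg,grid_active_power_avg,pitch_angle\n")
farm_a = io.StringIO(header + "2021-01-01 00:10,WT9,41.2,27.0,7.4,812.0,1.5\n")
farm_b = io.StringIO(header + "2021-01-01 00:00,WT9,41.0,27.1,7.1,790.0,1.2\n")
raw = load_raw_dataset([farm_a, farm_b])
selected = select_variables_of_interest(raw)
```

In practice, one file per turbine or per farm would be passed as paths instead of `io.StringIO` objects.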
Figure 3 shows the preprocessing of the signal. In the first step, the data were filtered to retain only measurements within the turbine's operating conditions: wind speed greater than or equal to 3.5 m/s, generator speed between 1198 and 1200 RPM, and ambient temperature less than or equal to 50 °C; measurements outside these ranges were removed. The temperature series was then resampled at 1 h intervals to filter out intra-hour variations. In the second step, a feature was created by subtracting the ambient temperature from the main bearing temperature, as outlined by Wiese et al. [16]; this feature, which we will call temperature variation, removes the influence of the ambient temperature on the main bearing behavior.
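A minimal sketch of the first two preprocessing steps follows, assuming a per-turbine pandas DataFrame indexed by timestamp with hypothetical column names, including a `generator_speed_avg` column. It keeps samples inside the operating window stated above, averages them over 1 h, and forms the temperature variation as bearing temperature minus ambient temperature.

```python
import numpy as np
import pandas as pd


def filter_operating_range(df):
    """Keep 10 min samples inside the operating window described in the text."""
    mask = (
        (df["wind_speed_avg"] >= 3.5)
        & df["generator_speed_avg"].between(1198, 1200)  # inclusive bounds
        & (df["ambient_temp_avg"] <= 50.0)
    )
    return df[mask]


def hourly_temperature_variation(df):
    """Resample one turbine's series to 1 h means and subtract the ambient temperature."""
    hourly = df[["main_bearing_temp_avg", "ambient_temp_avg"]].resample("60min").mean()
    variation = hourly["main_bearing_temp_avg"] - hourly["ambient_temp_avg"]
    return variation.rename("temperature_variation")


# Usage: two hours of synthetic 10 min data for a single turbine.
index = pd.date_range("2021-01-01 00:00", periods=12, freq="10min")
scada = pd.DataFrame(
    {
        "main_bearing_temp_avg": 40.0,
        "ambient_temp_avg": 25.0,
        "wind_speed_avg": 6.0,
        "generator_speed_avg": 1199.0,
    },
    index=index,
)
scada.loc[index[0], "wind_speed_avg"] = 2.0  # below 3.5 m/s, so it is filtered out
filtered = filter_operating_range(scada)
variation = hourly_temperature_variation(filtered)
```

Hours that end up with no valid samples after filtering become NaN in the resampled series, which is consistent with the later step that removes missing values.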
Figure 3. Signal preprocessing.
In the third step of preprocessing, the temperature variation outliers were removed. This involved eliminating values below 10 °C, which indicate that the wind turbine is operating at low speed, and values above the 98th percentile of the time series, considering the operating temperature range (−30 °C to +110 °C) of the SKF LGWM 1 grease used to lubricate the main bearing. Figure 4 displays the time series of the failed turbines (WT9, WT14, and WT29) with the outliers, while Figure 5 shows the series with the outliers removed and with the 1-day resampling necessary for seasonal decomposition.
In the fourth step, measurements of the main bearing temperature variation that presented missing values (Not a Number, NaN) or that could not be converted into the float type [17] were removed, in order to obtain the desired result in the last preprocessing step.
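The third and fourth steps can be sketched as below. The thresholds (10 °C and the 98th percentile) and the 1-day resampling come from the text; the function names are hypothetical. In this sketch, the conversion to float and the removal of NaN values are done before the percentile filter, because the percentile can only be computed on numeric data; the text lists these operations in the opposite order.

```python
import numpy as np
import pandas as pd


def clean_temperature_variation(series, lower=10.0, upper_quantile=0.98):
    """Drop non-numeric/NaN samples, then values below `lower` or above the 98th percentile."""
    # Values that cannot be converted to float become NaN and are removed (fourth step).
    numeric = pd.to_numeric(series, errors="coerce").dropna()
    upper = numeric.quantile(upper_quantile)
    # Outlier removal (third step): low-speed operation and the upper 2% tail.
    return numeric[(numeric >= lower) & (numeric <= upper)]


def daily_mean(series):
    """1-day resampling used before the seasonal decomposition."""
    return series.resample("1D").mean().dropna()


# Usage: hourly samples with one low value and one corrupted entry.
hours = pd.date_range("2021-01-01", periods=102, freq="60min")
values = list(np.linspace(11.0, 30.0, 100)) + [5.0, "n/a"]
raw_variation = pd.Series(values, index=hours, dtype=object)
cleaned = clean_temperature_variation(raw_variation)
daily = daily_mean(cleaned)
```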
The final preprocessing step involved applying an additive seasonal decomposition to the time series. This method estimates the time series trend by applying a convolution filter to the data, removing the computed trend from the series, and then calculating the average of the detrended series for each position within the period [18]. In this work, additive seasonal decomposition was used for 7-day periods, and the results are shown in Figure 6.
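The additive decomposition described above can be written out explicitly. The sketch below is a naive decomposition in NumPy intended to mirror the classical approach available as `statsmodels.tsa.seasonal.seasonal_decompose(x, model="additive", period=7)`; it is an illustration, not the exact routine used in the study. Which component feeds the later stages is not stated in this section, so the function returns all three.

```python
import numpy as np


def additive_seasonal_decompose(x, period=7):
    """Naive additive decomposition: x = trend + seasonal + residual.

    The trend is a centred moving average computed by convolution; for an even
    period a 2 x MA filter is used. Trend values are NaN where the window does
    not fit (the first and last period // 2 samples).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if period % 2:
        weights = np.full(period, 1.0 / period)
    else:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = weights.size // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    detrended = x - trend
    # Average the detrended values at each position inside the period...
    profile = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    # ...and centre the profile so the seasonal component sums to zero over a period.
    profile -= profile.mean()
    seasonal = np.tile(profile, n // period + 1)[:n]
    residual = detrended - seasonal
    return trend, seasonal, residual


# Usage: 10 weeks of daily values = linear trend + a fixed weekly pattern.
days = np.arange(70)
weekly = np.array([1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 0.0])
series = 0.1 * days + np.tile(weekly, 10)
trend, seasonal, residual = additive_seasonal_decompose(series, period=7)
```

On this synthetic series, the 7-day moving average recovers the linear trend exactly and the seasonal profile recovers the weekly pattern, leaving a zero residual wherever the trend is defined.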
Figure 4. Main bearing average temperature series with outliers (WT9, WT14, and WT29).
Figure 5. Temperature variation for the main bearing after outlier removal (WT9, WT14, and WT29).
Figure 6. Temperature variation for the main bearing after seasonal decomposition (WT9, WT14, and WT29).
2.2. Classification of Temperature Variation Data
This section outlines the process of classifying temperature variation data for application in regression models. The following steps are involved:
1. Identification of Failure Times: A graphical time series analysis determines the first prediction time (FPT) and the failure threshold time (FTT). The FPT marks the initiation of the component degradation process, while the FTT signifies complete degradation.
2. Minimum Classification Value Definition: A minimum classification value is defined based on the temperature variation.
3. Linear Interpolation: Linear interpolation is used to classify the data between the FPT and FTT instants.
In analyzing the turbine time series data, essential steps were taken to estimate the remaining useful life (RUL). The first prediction time (FPT) and the failure threshold time (FTT) were pinpointed by graphically examining the time series. The FPT represents the beginning of component degradation, and the FTT marks the moment when the component is considered thoroughly degraded. The period between the FPT and the FTT is the remaining useful life.
Following data preprocessing (described in Figure 3), the next step is classifying the temperature variation data to establish the dependent variable for the regression models. After the seasonal decomposition stage, this classification was defined manually by observing the behavior of the main bearing temperature variation time series.
A graphical analysis of the temperature variation time series identified instances of monotonic increase, indicating the FPT. Data before this moment were classified as healthy (assigned the value 1). The FTT, identified as the instant with the most significant temperature variation, marked the point from which data were classified as degraded (assigned the value 0). Upon the return of temperature variation to non-zero values, the data were reclassified as healthy, denoting the restoration of useful life; this instant is referred to as the restoration instant (RT). Each turbine exhibited a different maximum temperature variation: 34.17 °C, 39.5 °C, and 28.5 °C for the WT9, WT14, and WT29 turbines, respectively. These values were normalized and employed to define the minimum data classification value according to Equation (1).
$$\mathrm{Class}_{\min} = 1 - \mathrm{Temperature}_{\mathrm{norm}} \tag{1}$$
Considering the instants FPT, FTT, and RT, along with their corresponding data classification values, linear interpolation was employed to classify the temperature variation data between the FPT and FTT instants. The classification process unfolded across different intervals as follows:
a. Interval 1 (Beginning of Time Series to Before Observable Increase in Main Bearing
Temperature): Data are consistently classified with a value of one (1).
b. Interval 2 (Start of Rise in Main Bearing Temperature to First Occurrence of Maximum Temperature): Values gradually decrease, through linear interpolation, from 1 to the minimum classification value (Class_min). This interval extends from the initial rise in main bearing temperature to the first occurrence of the maximum temperature.
c. Interval 3 (First Occurrence of Maximum Temperature to Turbine Shutdown): Data
classification is assigned a zero value (0) during this interval, extending from the first
occurrence of the maximum temperature until the turbine ceases operation.
d. Post-Turbine Shutdown and Restart: Upon the resumption of wind turbine operation,
the data classification reverts to a value of one (1).
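Equation (1) and intervals a to d can be combined into one labeling function. The sketch assumes that the FPT, FTT, and RT have already been located manually and are given as sample indices, and that the normalized maximum temperature is supplied by the caller, because this section does not state the reference used for the normalization. The function names and the indices in the usage line are hypothetical.

```python
import numpy as np


def class_min(temperature_norm):
    """Equation (1): minimum classification value from the normalized temperature."""
    return 1.0 - temperature_norm


def health_labels(n_samples, fpt, ftt, rt, cmin):
    """Piecewise label series for one turbine, given FPT, FTT, and RT sample indices.

    a. [0, fpt)    -> 1 (healthy)
    b. [fpt, ftt)  -> linear decrease from 1 to cmin
    c. [ftt, rt)   -> 0 (degraded, from the maximum temperature until shutdown)
    d. [rt, end)   -> 1 (useful life restored after restart)
    """
    if not 0 <= fpt < ftt <= rt <= n_samples:
        raise ValueError("expected 0 <= fpt < ftt <= rt <= n_samples")
    labels = np.ones(n_samples)
    labels[fpt:ftt] = np.linspace(1.0, cmin, ftt - fpt)
    labels[ftt:rt] = 0.0
    return labels


# Usage with illustrative indices (not taken from the study).
labels = health_labels(n_samples=10, fpt=2, ftt=6, rt=8, cmin=class_min(0.8))
```

For example, if the maxima were normalized by the largest of the three values (39.5 °C), which is only an assumption, WT9 would receive Class_min = 1 − 34.17/39.5 ≈ 0.135.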
Figure 7 illustrates the complete data classification for the three selected turbines across the entire time series. It shows distinct degradation dynamics in the main bearings, with differences in degradation speed: degradation was faster for WT9 and slower for WT29.
2.3. Creation of Data Subsets

One common issue with machine learning models is overfitting, which can occur when
using samples from the same dataset for training and testing. However, using different
datasets involves a trade-off: reducing overfitting at the expense of regression model
metrics.
In this study, we employed a strategy to address data scarcity and minimize overfitting, as suggested by [19,20]. This involved using subsets of data, defined by permuting the three time series of the WT9, WT14, and WT29 turbines, which exhibited a monotonic growth of the main bearing temperature (see Table 1).
Table 1. Subsets used to deal with the scarcity of data.
ID Training Subset Validation Subset Testing Subset

SC01 WT9 WT14 WT29
SC02 WT9 WT29 WT14
SC03 WT14 WT9 WT29
SC04 WT14 WT29 WT9
SC05 WT29 WT9 WT14
SC06 WT29 WT14 WT9
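The six cases in Table 1 are exactly the ordered permutations of the three turbines, taken as (training, validation, testing). A short sketch with Python's `itertools.permutations` reproduces the table in the same order; the function name is hypothetical.

```python
from itertools import permutations

TURBINES = ["WT9", "WT14", "WT29"]


def simulation_cases(turbines):
    """Map case IDs (SC01, SC02, ...) to the training, validation, and testing turbines."""
    return {
        f"SC{i:02d}": {"training": train_wt, "validation": val_wt, "testing": test_wt}
        for i, (train_wt, val_wt, test_wt) in enumerate(permutations(turbines, 3), start=1)
    }


cases = simulation_cases(TURBINES)
for case_id, roles in cases.items():
    print(case_id, roles["training"], roles["validation"], roles["testing"])
```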
Figure 7. RUL profiles estimated from the main bearings' temperature time series for wind turbines (a) WT9, (b) WT14, and (c) WT29.
The IDs indicated in Table 1 represent a combination of data from one turbine used to train the model (training subset), data from another turbine used to validate the model (validation subset), and data from a third turbine used to test the model after validation (testing subset). These IDs will be used throughout the text to refer to the simulation cases.
2.4. Model Development
Six regression machine learning models, namely support vector regression (SVR), isotonic regression (ISOR), gradient boosting regression (GBR), decision tree regression (DTR), extra trees regression (ETR), and random forest regression (RFR), were developed to estimate the RUL using the scikit-learn Python library.
The training, validation, and test data were normalized according to their respective data ranges (MinMax Scaler) to standardize the range of the independent variables [21].
The independent variables used in the models' training, validation, and testing subsets included the temperature variation of the main bearing, the wind speed, and the net active power of the wind turbines in each subset. The dependent variables were the classified data, shown in Figure 7, estimated from the temperature time series. The data subsets were used as described in Section 2.3 to guarantee the use of distinct subsets for training, validation, and testing.
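Normalizing each subset with its own range, as described above, can be sketched with scikit-learn's `MinMaxScaler`. The function name and the feature values below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_each_subset(subsets):
    """Scale every subset to [0, 1] with its own per-feature minimum and maximum."""
    return {name: MinMaxScaler().fit_transform(X) for name, X in subsets.items()}


# Columns (illustrative): temperature variation, wind speed, active power.
subsets = {
    "training": np.array([[12.0, 4.0, 300.0], [20.0, 8.0, 900.0], [28.0, 12.0, 1500.0]]),
    "validation": np.array([[15.0, 5.0, 400.0], [25.0, 9.0, 1200.0]]),
}
scaled = scale_each_subset(subsets)
```

A common alternative is to fit the scaler on the training subset only and apply `scaler.transform` to the validation and test subsets. Scaling each subset independently, as sketched here, follows the wording of the text but maps every turbine onto the full [0, 1] range regardless of its absolute levels.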
Cross-validation is a critical process because it improves the reliability of model performance estimates, reduces overfitting, and aids in selecting the best model and its hyperparameters, thus improving the robustness of the tests.
Datasets from different turbines and different power plants were chosen to avoid inflated model scores when hyperparameters are tuned through cross-validation, as observed by Leahy et al. [22].
Following the model training, cross-validation was employed to identify the model
and hyperparameters that would minimize the mean squared error regression loss. The
parameter ranges for each regression model are detailed in Table 2.
The following metrics were calculated for each case to evaluate and compare the results of the regression models: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R² Score). The results are shown in Figures 8 and 9 and Tables 3 to 6. The averages and standard deviations of the training, validation, and test metrics across the simulated cases were also calculated.
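The per-case metrics and their averages and standard deviations across cases can be sketched with `sklearn.metrics`. The target and prediction values below are hypothetical normalized RUL values, not the paper's data.

The paper does not state which standard deviation estimator was used. This sketch uses NumPy's default population form (`ddof=0`).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """The four metrics used for one model on one subset."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # sqrt of MSE avoids version-specific RMSE helpers
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }


def summarize(per_case):
    """Mean and (population) standard deviation of each metric across cases."""
    return {
        name: (float(np.mean([m[name] for m in per_case])),
               float(np.std([m[name] for m in per_case])))
        for name in per_case[0]
    }


# Hypothetical normalized RUL targets and predictions for two cases.
case_a = regression_metrics([1.0, 0.5, 0.0], [0.9, 0.5, 0.1])
case_b = regression_metrics([1.0, 0.5, 0.0], [1.0, 0.4, 0.0])
print(summarize([case_a, case_b]))
```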
The metrics obtained from the six models across the six cases showed only slight variations. Consequently, we selected the two models with the lowest MAE values and the R² Score values closest to 1. The RUL was estimated for each model, and the estimated and observed values were compared, considering the stop dates recorded for the wind turbines, as identified from the main bearing temperature time series.
The calculation of the RUL depends on the information available at the first prediction time; in this study, this time was identified from the evaluated time series. RUL estimates were performed using data from the WT9, WT14, and WT29 turbines and were then compared with the failure date, which was also identified from the time series.
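A common way to turn a failure date into an RUL target is a linear countdown, in days, from the first prediction time to the recorded failure. The sketch below shows this, optionally rescaled to [0, 1].

This labelling is an assumption for illustration. The paper's own target (the classified data in Figure 7) may be built differently, and the dates are hypothetical.

```python
import pandas as pd


def rul_labels(timestamps, failure_date, normalize=True):
    """Days remaining until the recorded failure for each timestamp.

    Assumes the timestamps are sorted and start at the first prediction time.
    """
    ts = pd.Series(pd.to_datetime(timestamps))
    rul_days = (pd.Timestamp(failure_date) - ts).dt.days
    if normalize:
        # 1.0 at the first prediction time, 0.0 on the failure date.
        return rul_days / rul_days.iloc[0]
    return rul_days


# Hypothetical daily records ending on a hypothetical failure date.
days = pd.date_range("2020-01-01", periods=5, freq="D")
print(rul_labels(days, "2020-01-05", normalize=False).tolist())
print(rul_labels(days, "2020-01-05").tolist())
```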
Table 2. List of parameters and ranges of parameters used for the cross-validation task.
Support vector regression (SVR)
  Parameter 1: regularization parameter (C); range: 0.1, 1.0, 10.0
  Parameter 2: polynomial degree; range: 5, 6, 7, 8, 9
Decision tree regression (DTR)
  Parameter 1: criterion, the function to measure the quality of a split; range: squared error, absolute error, and Friedman MSE
  Parameter 2: maximum depth of the tree; range: None, 5, 10
  Parameter 3: minimum number of samples required to split an internal node; range: 2, 5, 10
Isotonic regression (ISOR)
  Parameter 1: lower bound on the lowest predicted value (y_min); range: 0, 0.1, 0.25, 0.5
  Parameter 2: upper bound on the highest predicted value (y_max); range: 0.5, 0.75, 1
  Parameter 3: whether the data are increasing or decreasing (increasing); range: True, False, 'auto'
  Parameter 4: how X values outside of the training domain are handled during prediction (out_of_bounds); range: clip
Gradient boosting regression (GBR)
  Parameter 1: loss; range: squared error, absolute error, Huber, and quantile
  Parameter 2: learning rate; range: 0.01, 0.05, 0.1
  Parameter 3: number of estimators; range: 100, 250, 500
  Parameter 4: criterion; range: Friedman MSE and squared error
Random forest regression (RFR)
  Parameter 1: criterion, the function to measure the quality of a split; range: squared error, absolute error, and Friedman MSE
  Parameter 2: maximum depth of the tree; range: None, 5, 10
  Parameter 3: random_state; range: None, 10, 100
Extra trees regression (ETR)
  Parameter 1: criterion, the function to measure the quality of a split; range: squared error, absolute error, and Friedman MSE
  Parameter 2: maximum depth of the tree; range: None, 5, 10
  Parameter 3: random_state; range: None, 10, 100
3. Analysis and Discussion of Results

This section presents the results obtained with the models at the training, validation, and testing stages. Firstly, the results of the regression model metrics and the selected models are shown. Secondly, the results of the simulations for estimating the RUL are shown. After that, remarks on the RUL estimation results and a discussion of the section results are presented.
3.1. Metrics for Regression Models
The metrics were chosen for three reasons:
- MAE, MSE, and RMSE allow a straightforward interpretation;
- MAE has a lower sensitivity to outliers;
- R² Score expresses how much of the variation in the dependent variable, the RUL, is explained by the independent variables.
The lower the MSE and RMSE values, the better the model results. Because the RUL is expressed on a 0 to 1 range, the MAE should vary from 0 to 1, with values closer to zero being desirable. For the R² Score, values closer to 1 are desirable.
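For reference, the standard textbook definitions of the four metrics are given below; they are not reproduced from the paper. Here $y_i$ is the observed (normalized) RUL, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the observations, and $n$ the number of samples.

```latex
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},\\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \qquad
R^2 = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}.
\end{aligned}
```

If both $y_i$ and $\hat{y}_i$ lie in $[0, 1]$, each absolute error is at most 1, which bounds the MAE (and the MSE) by 1.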
Figure 8 illustrates the average MSE, RMSE, and MAE metric values during the model testing stage. Small variations were observed in the MAE and MSE. The average MAE varied from 0.02 to 0.07, and its standard deviation varied between 0.002 and 0.029, indicating that the models are likely to differ in their RUL estimates. The models with the lowest MAE averages were RFR and DTR, and the lowest standard deviation values were observed for the GBR, ETR, and DTR models. Consequently, these models were expected to provide estimates with smaller errors. This was confirmed for the GBR and RFR models, as shown in Section 3.2.
Figure 8. MAE, MSE, and RMSE results for each regression model and simulation case. (a) Support vector, (b) isotonic, (c) gradient boosting, (d) decision tree, (e) extra trees, and (f) random forest.
In the case of the R² Score metric, the average values for the training, validation, and test subsets ranged from 0.639 to 0.894, with standard deviations ranging from 0.073 to 0.402 (see Figure 9). The RFR, ETR, and GBR models achieved higher R² Score values, indicating that they better captured the influence of the main bearing temperature variation on the RUL. The lowest standard deviation values were observed for the ISOR, ETR, and GBR models. Additionally, the R² Score results indicate that simulation cases SC01 and SC03 exhibited the lowest values.
Figure 9. Results of R² Score for each regression model and each simulation case. (a) Support vector, (b) isotonic, (c) gradient boosting, (d) decision tree, (e) extra trees, and (f) random forest.
Among the metrics, there was little variation between the training and validation stages:
• For the MAE, the overall mean was 0.027 in training and 0.025 in validation (Table 3);
• For the MSE, the overall mean was 0.005 in training and 0.004 in validation (Table 4);
• For the RMSE, the overall mean was 0.066 in training and 0.063 in validation (Table 5);
• For the R² Score, the overall mean was 0.839 in training and 0.861 in validation (Table 6).
Table 3. Results of mean absolute error (MAE) for all simulation cases, for each model, for training,
validation, and testing steps.
Models Mean of Training Values [days] Mean of Validation Values [days] Mean of Testing Values [days]
DTR 0.014 0.014 0.040
ETR 0.023 0.023 0.040
GBR 0.026 0.026 0.049
ISOR 0.025 0.025 0.043
RFR 0.018 0.014 0.034
SVR 0.057 0.051 0.075
Overall mean MAE 0.027 0.025 0.047
Table 4. Mean squared error (MSE) results for all simulation cases, for each model, for training, validation, and testing steps.
Models Mean of Training Values [days²] Mean of Validation Values [days²] Mean of Testing Values [days²]
DTR 0.004 0.004 0.013
ETR 0.004 0.004 0.010
GBR 0.004 0.004 0.012
ISOR 0.005 0.005 0.013
RFR 0.005 0.003 0.011
SVR 0.007 0.006 0.015
Overall mean MSE 0.005 0.004 0.012
Table 5. Root mean squared error (RMSE) results for all simulation cases, for each model, for training, validation, and testing steps.
Models Mean of Training Values [days] Mean of Validation Values [days] Mean of Testing Values [days]
DTR 0.059 0.058 0.111
ETR 0.062 0.062 0.096
GBR 0.060 0.059 0.109
ISOR 0.071 0.071 0.111
RFR 0.063 0.054 0.100
SVR 0.082 0.077 0.122
Overall mean RMSE 0.066 0.063 0.108
Table 6. Coefficient of determination (R² Score) results for all simulation cases, for each model, for training, validation, and testing steps.
Models Mean of Training Values [dimensionless] Mean of Validation Values [dimensionless] Mean of Testing Values [dimensionless]
DTR 0.877 0.880 0.623
ETR 0.864 0.865 0.677
GBR 0.861 0.885 0.639
ISOR 0.836 0.836 0.633
RFR 0.807 0.901 0.630
SVR 0.790 0.798 0.545
Overall mean R² Score 0.839 0.861 0.625
Ranking the models was challenging because of the slight variations across the stages.
Given the small variation in the MAE averages and the variations in the average and standard deviation of the R² Score, we decided to run simulations with all models and to evaluate the performance of each model in each simulated case.
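To make the ranking difficulty concrete, the sketch below ranks the six models on the test-stage means from Tables 3 (MAE) and 6 (R² Score). The equal-weight average of per-metric ranks is an illustrative choice, not the selection rule used in this study.

It shows the two metrics favoring different models: RFR has the lowest test MAE, while ETR has the highest test R² Score.

```python
import pandas as pd

# Test-stage means copied from Tables 3 (MAE) and 6 (R2 Score).
test_means = pd.DataFrame(
    {
        "MAE": [0.040, 0.040, 0.049, 0.043, 0.034, 0.075],
        "R2": [0.623, 0.677, 0.639, 0.633, 0.630, 0.545],
    },
    index=["DTR", "ETR", "GBR", "ISOR", "RFR", "SVR"],
)

# Rank each metric separately (1 = best): low MAE is better, high R2 is better.
ranks = pd.DataFrame({
    "MAE_rank": test_means["MAE"].rank(method="min"),
    "R2_rank": test_means["R2"].rank(ascending=False, method="min"),
})
ranks["mean_rank"] = ranks.mean(axis=1)
print(ranks.sort_values("mean_rank"))
```

With these numbers, DTR, GBR, and ISOR tie on the averaged rank, which illustrates why a single ranking was hard to justify.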
3.2. Estimation of the Remaining Useful Life of the Sample Set of Failed Turbines
At this stage, simulations were carried out with the models using the turbine data
defined in the test subset for the cases presented in Section 2.3. Table 7 presents the best
results obtained in the simulations with the test subsets.
Table 7. Best simulation results with test subsets.
Simulation Case Model RUL Estimation [days] Calculation Error of RUL Estimation [days]
SC02 SVR 1638.0 −111
SC05 SVR 1685.0 −64
SC04 GBR 393.0 0
SC04 RFR 461.0 68
SC02 ETR 1856.0 107
SC06 ETR 1855.0 118
The “RUL Estimation” column gives the remaining useful life calculated by the respective model with the respective dataset. The “Calculation Error of RUL Estimation” column gives the difference, in days, between the forecast failure date and the date on which the fault was recorded.
Negative values in the error column indicate that the estimated date was earlier than the recorded date; positive values indicate that the estimated date was later than the recorded date.
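The sign convention above can be sketched as follows, assuming the estimated failure date is the first prediction time plus the estimated RUL. The labels reuse the terms from Section 3.4 (conservative, assertive, non-conservative).

The dates are hypothetical. The recorded RUL of 393 days for SC04 is inferred from the zero GBR error in Table 7.

```python
from datetime import date, timedelta


def rul_error(first_prediction, estimated_rul_days, recorded_failure):
    """Estimated failure date minus recorded failure date, in days."""
    estimated_failure = first_prediction + timedelta(days=estimated_rul_days)
    return (estimated_failure - recorded_failure).days


def interpret(error_days):
    """Negative: too early (conservative); zero: exact (assertive);
    positive: too late (non-conservative)."""
    if error_days < 0:
        return "conservative"
    if error_days == 0:
        return "assertive"
    return "non-conservative"


# Hypothetical dates; the RUL values reuse the SC04 rows of Table 7.
start = date(2020, 1, 1)
failure = start + timedelta(days=393)
for model, rul in [("GBR", 393), ("RFR", 461)]:
    err = rul_error(start, rul, failure)
    print(model, err, interpret(err))
```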
The results for the SVR and ETR models were obtained without information on the fault detection date, whereas the results for the GBR and RFR models were obtained using the fault detection date.
Figure 10 shows the results of the SVR models for the two simulation cases (SC02 and SC05). The models, trained and validated with data from other turbines, can represent the behavior of WT14, used in the tests, until close to the failure date. The model presents larger errors in the interval between stopping and returning to operation.
Figure 10. Best simulation results with SVR models (SC02 and SC05).
3.3. Remarks on RUL Estimation Results
The developed models demonstrated satisfactory performance on the metrics commonly used to evaluate regression models. However, the small differences in performance made it challenging to establish a ranking strategy for the models.
In the simulations for estimating the RUL, the SVR and ETR models could reproduce the degradation behavior from the time series data. The SVR model produced more conservative results, predicting the failure date earlier than the recorded one. On the other hand, the GBR and RFR models required information on the fault detection date to estimate the RUL more accurately.
3.4. Final Discussion of the Results
The results show that it is possible to estimate the RUL with little data, from only three wind turbines, while limiting overfitting. This makes the approach a viable complementary option for the decision making of wind farm maintenance managers. However, as highlighted by several authors [21,23,24], intensive preprocessing of the real data was necessary to obtain time series that would yield reasonable results from the machine learning regression models. This stage requires the analysis of the mechanical behavior of the components, subsystems, and systems under study, combined with data processing techniques, to effectively generate useful information for the operation and maintenance of wind turbines.
The remaining useful life (RUL) estimates allow different interpretations from a maintenance point of view (see Table 7):
- Assertive: the GBR model predicted the date with an error of zero days.
- Conservative: the SVR models underestimated the useful life.
- Non-conservative: the ETR and RFR models overestimated the values found in the time series.
Conservative estimates, such as those from the SVR, and assertive ones, such as those from the GBR, can therefore support proper planning of the resources needed for scheduled shutdowns and interventions on the turbines. Such planning helps avoid catastrophic failures that could reduce the availability of the wind farm and supports the continued operation of the turbines [25]. In the same context, using subsets of real data effectively mitigated the effects of scarce data and overfitting. This is a significant contribution given the limited availability of time series data on main bearing failures.
The strategy of creating simulation cases proved effective in dealing with the issue of
data scarcity without the need to use data padding or synthetic data. Future work could
address the development of a framework that uses data of increasing complexity (synthetic
data, bench test data, and real data) to enable the use and evaluation of different machine
learning or deep learning models.
Nevertheless, the method has two main data requirements. First, to minimize overfitting, it needs data on the variables of interest indicated in Section 2.1 from at least three wind turbines that have experienced main bearing failure. Second, the main bearing temperature data must show monotonic growth and should, if possible, include records close to the date on which the turbine was shut down for replacement.
Using low-temperature values for model training and validation may bias the results, causing the method to recommend replacement before the component's actual end of life. Careful and representative data collection is therefore essential to ensure the reliability and precision of the analysis and to prevent premature or erroneous decisions about main bearing replacement.
4. Conclusions
A framework for predicting wind turbine main bearing failures was developed and tested using SCADA data. The framework estimates the remaining useful life (RUL) of the main bearings of wind turbines with scarce data. It uses main bearing temperature data, together with wind speed and active power, as inputs to six machine learning regression models: support vector regression, isotonic regression, gradient boosting regression, decision tree regression, extra trees regression, and random forest regression. The main findings of the study are as follows:
• The models were tested on real data from three wind turbines in northeastern Brazil, showing satisfactory results in each step of validation and testing. The MAE, MSE, RMSE, and R² Score values in the validation step were 0.025, 0.004, 0.063, and 0.86, respectively;
• Regarding the simulation, the SVR, ETR, GBR, and RFR models produced the best estimates of the remaining useful life of the main bearings, with errors ranging from −111 to +118 days and a mean signed error of about 20 days (Table 7);
• The methodology showed that conservative estimates, such as those from the SVR, and
assertive ones, such as those from the GBR, can support proper maintenance planning,
thereby avoiding catastrophic failures that could reduce the wind farm’s availability.
Future work will focus on developing a framework that uses synthetic, bench, and real data to build more complex models. It is also essential to understand the failure mechanisms and the data collection process for model development, as indicated by [2,26]. Furthermore, organizations must improve data management processes for effective decision making [26].
Author Contributions: Conceptualization, J.L.d.M.V., F.C.F., and G.d.N.P.L.; methodology, J.L.d.M.V.,

G.d.N.P.L. and A.A.V.O.; software, J.L.d.M.V., F.C.F., G.d.N.P.L. and A.A.V.O.; validation, J.L.d.M.V.
and G.d.N.P.L.; formal analysis, J.L.d.M.V.; investigation, J.L.d.M.V.; resources A.A.V.O., J.Â.P.d.C.,
F.D.d.M., A.C.A.d.C., M.G.G.d.S. and G.d.N.P.L.; data curation, J.L.d.M.V., O.d.C.V. and M.G.G.d.S.;
writing—original draft preparation, J.L.d.M.V., P.S.A.M., and A.A.V.O.; writing—review and edit-
ing, J.Â.P.d.C., P.S.A.M., G.d.N.P.L., and A.A.V.O.; visualization, P.S.A.M.; supervision, J.Â.P.d.C.,
G.d.N.P.L., A.A.V.O. and P.S.A.M.; project administration, A.C.A.d.C., J.Â.P.d.C., G.d.N.P.L., A.A.V.O.,
M.G.G.d.S. and P.S.A.M.; funding acquisition, A.C.A.d.C., J.Â.P.d.C., G.d.N.P.L., M.G.G.d.S., O.d.C.V.
and A.A.V.O. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Rio Amazonas SA—2021.
Data Availability Statement: The data are not publicly available due to confidentiality issues.
Acknowledgments: This study obtained information while developing a Research and Development—
R&D project supported by Rio Amazonas SA and approved by the Brazilian Regulatory Agency
ANEEL. The first author thanks IFPE, Brazil, for the support through their doctorate thesis. The
third author thanks CNPq, Brazil, for the support through productivity grant n. 303417/2022-6. The
seventh author thanks CNPq for his support through post-doctorate grant 200820/2022-2 under Call
26/2021 and productivity grant n. 303200/2023-5. The third, fourth, sixth, and seventh authors thank
IFPE for the financial support under Call REI/IFPE No. 43/2023.
Conflicts of Interest: Author Marrison Gabriel Guedes de Souza was employed by the company
NEOG, Brazil. The remaining authors declare that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential conflict of interest.
References
1. Strielkowski, W.; Civín, L.; Tarkhanova, E.; Tvaronavičienė, M.; Petrenko, Y. Renewable Energy in the Sustainable Development
of Electrical Power Sector: A Review. Energies 2021, 14, 8420. [CrossRef]
2. de Novaes Pires Leite, G.; Araújo, A.M.; Rosas, P.A.C. Prognostic Techniques Applied to Maintenance of Wind Turbines: A
Concise and Specific Review. Renew. Sustain. Energy Rev. 2018, 81, 1917–1925. [CrossRef]
3. Guo, Y.; Sheng, S.; Phillips, C.; Keller, J.; Veers, P.; Williams, L. A Methodology for Reliability Assessment and Prognosis of
Bearing Axial Cracking in Wind Turbine Gearboxes. Renew. Sustain. Energy Rev. 2020, 127, 109888. [CrossRef]
4. BS EN 13306:2010; Maintenance—Maintenance Terminology. NSAI. 2010. Available online: https://www.en-standard.eu/
bs-en-13306-2017-maintenance-maintenance-terminology/?gad_source=1&gclid=Cj0KCQjwwMqvBhCtARIsAIXsZpZS1
xtdpaIhepDSfK9Ukr8llB0tSP-j860QQhjm2l81JU8jXHbnnDIaAu1TEALw_wcB (accessed on 17 January 2024).
5. Randall, R.B. Vibration-Based Condition Monitoring, 1st ed.; John Wiley & Sons, Ltd: Hoboken, NJ, USA, 2011; ISBN 9780470747858.
6. Koukoura, S. Failure and Remaining Useful Life Prediction of Wind Turbine Gearboxes. Annu. Conf. PHM Soc. 2018, 10.
[CrossRef]
7. Carroll, J.; Koukoura, S.; McDonald, A.; Charalambous, A.; Weiss, S.; McArthur, S. Wind Turbine Gearbox Failure and Remaining
Useful Life Prediction Using Machine Learning Techniques. Wind Energy 2019, 22, 360–375. [CrossRef]
8. Dameshghi, A.; Refan, M.H. Combination of Condition Monitoring and Prognosis Systems Based on Current Measurement and
PSO-LS-SVM Method for Wind Turbine DFIGs with Rotor Electrical Asymmetry. Energy Syst. 2021, 12, 203–232. [CrossRef]
9. Herp, J.; Pedersen, N.L.; Nadimi, E.S. Assessment of Early Stopping through Statistical Health Prognostic Models for Empirical
Rul Estimation in Wind Turbine Main Bearing Failure Monitoring. Energies 2019, 13, 83. [CrossRef]
10. Elasha, F.; Shanbr, S.; Li, X.; Mba, D. Prognosis of a Wind Turbine Gearbox Bearing Using Supervised Machine Learning. Sensors
2019, 19, 3092. [CrossRef] [PubMed]
11. Hart, E.; Clarke, B.; Nicholas, G.; Kazemi Amiri, A.; Stirling, J.; Carroll, J.; Dwyer-Joyce, R.; McDonald, A.; Long, H. A Review of
Wind Turbine Main Bearings: Design, Operation, Modelling, Damage Mechanisms and Fault Detection. Wind Energy Sci. 2020,
5, 105–124. [CrossRef]
12. Yucesan, Y.A.; Viana, F.A.C. A Hybrid Model for Main Bearing Fatigue Prognosis Based on Physics and Machine Learning. In
Proceedings of the AIAA Scitech 2020 Forum, Orlando, FL, USA, 6–10 January 2020; American Institute of Aeronautics and
Astronautics Inc., AIAA: Orlando, FL, USA, 2020; Volume 1. Part F.
13. Yucesan, Y.A.; Viana, F.A.C. Wind Turbine Main Bearing Fatigue Life Estimation with Physics-Informed Neural Networks; PHM Society:
Orlando, FL, USA, 2019.
14. Yang, L.; Zhang, Z. Wind Turbine Gearbox Failure Detection Based on SCADA Data: A Deep Learning Based Approach. IEEE
Trans. Instrum. Meas. 2020, 70, 3507911. [CrossRef]
15. Hu, C.; Byeng, D.Y.; Youn, D.; Wang, P. Springer Series in Reliability Engineering Design under Uncertainty and Health Prognostics;
Springer Publisher: New York, NY, USA, 2019.
16. Wiese, B.; Pedersen, N.L.; Nadimi, E.S.; Herp, J. Estimating the Remaining Power Generation of Wind Turbines—An Exploratory
Study for Main Bearing Failures. Energies 2020, 13, 3406. [CrossRef]
17. McKinney, W. Pandas: A Python Data Analysis Library. Available online: http://pandas.sourceforge.net (accessed on
17 January 2024).
18. Perktold, J.; Seabold, S.; Taylor, J. Statsmodels Documentation. Available online: https://www.statsmodels.org/stable/generated/
statsmodels.tsa.seasonal.seasonal_decompose.html#statsmodels.tsa.seasonal.seasonal_decompose (accessed on 17 January 2024).
19. Rezamand, M.; Carriveau, R.; Ting, D.S.K.; Davison, M.; Davis, J.J. Aggregate Reliability Analysis of Wind Turbine Generators.
IET Renew. Power Gener. 2019, 13, 1902–1910. [CrossRef]
20. Abid, K.; Sayed-Mouchaweh, M.; Laurence, C. Adaptive Machine Learning Approach for Fault Prognostics Based on Normal
Conditions—Application to Shaft Bearings of Wind Turbine. Annu. Conf. PHM Soc. 2019, 11, 46–50. [CrossRef]
21. Tutivén, C.; Benalcazar–Parra, C.; Escuela, A.E.; Vidal, Y.; Puruncaias, B.; Fajardo, M. Wind Turbine Main Bearing Condition
Monitoring via Convolutional Autoencoder Neural Networks. In Proceedings of the 2021 International Conference on Electrical,
Computer, Communications and Mechatronics Engineering (ICECCME), Mauritius, 7–8 October 2021; pp. 1–6.
22. Leahy, K.; Gallagher, C.; O’Donovan, P.; Bruton, K.; O’Sullivan, D.T.J. A Robust Prescriptive Framework and Performance Metric
for Diagnosing and Predicting Wind Turbine Faults Based on SCADA and Alarms Data with Case Study. Energies 2018, 11, 1738.
[CrossRef]
23. Zhao, Y.; Li, D.; Dong, A.; Kang, D.; Lv, Q.; Shang, L. Fault Prediction and Diagnosis of Wind Turbine Generators Using SCADA
Data. Energies 2017, 10, 1210. [CrossRef]
24. Correa-Jullian, C.; Cofre-Martel, S.; Martin, G.S.; Droguett, E.L.; de Novaes Pires Leite, G.; Costa, A. Exploring Quantum Machine
Learning and Feature Reduction Techniques for Wind Turbine Pitch Fault Detection. Energies 2022, 15, 2792. [CrossRef]
25. Sahu, A.; Jambhale, R.; Adiga, D.T.; Powar, N.; Mckinley, T. Formulation of Model Stability Metrics for Remaining Useful Life
Models of Engine Components. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023;
Volume 2023, pp. 1–11.
26. Sikorska, J.Z.; Hodkiewicz, M.; Ma, L. Prognostic Modelling Options for Remaining Useful Life Estimation by Industry. Mech.
Syst. Signal Process. 2011, 25, 1803–1836. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Energies 2024, 17, 1430. https://doi.org/10.3390/en17061430 https://www.mdpi.com/journal/energies
