Open Access Article

Multistep Forecasting Method for Offshore Wind Turbine Power Based on Multi-Timescale Input and Improved Transformer

Anping Wan ¹, Zhipeng Gong ¹,², Chao Wei ³,*, Khalil AL-Bukhaiti ¹,⁴,*, Yunsong Ji ⁵, Shidong Ma ⁵ and Fareng Yao ⁵

¹ Department of Mechanical Engineering, Hangzhou City University, Hangzhou 310015, China

² College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China

³ Huadian Electric Power Research Institute, Hangzhou 310030, China

⁴ School of Civil Engineering, Southwest Jiaotong University, Chengdu 610032, China

⁵ Guangdong Huadian Fuxin Yangjiang Offshore Wind Power Co., Ltd., Yangjiang 529500, China

* Authors to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(6), 925; https://doi.org/10.3390/jmse12060925

Submission received: 12 April 2024 / Revised: 18 May 2024 / Accepted: 27 May 2024 / Published: 31 May 2024

(This article belongs to the Section Ocean Engineering)


Abstract

Wind energy is highly volatile, and large-scale wind power grid integration significantly impacts grid stability. Accurate forecasting of wind turbine power can improve wind power consumption and ensure the economy of the power grid. This paper proposes a multistep forecasting method for offshore wind turbine power based on a multi-timescale input and an improved transformer. First, the wind speed sequence is decomposed by the VMD method to extract adequate timing information and remove the noise, after which the decomposition signals are merged with the rest of the timing features, and the dataset is split according to different timescales. A GRU receives the short-timescale inputs, and the Improved Transformer captures the timing relationship of the long-timescale inputs. Finally, a CNN is used to extract the information of each time point at the output of each branch, and the fully connected layer outputs multistep forecasting results. Experiments were conducted on operation data from four wind turbines located within the offshore wind farm but not near the edge. The results show that the proposed method achieved average errors of 0.0522 in MAE, 0.0084 in MSE, and 0.0907 in RMSE on a four-step forecast. This outperformed comparison methods LSTM, CNN-LSTM, LSTM-Attention, and Informer. The proposed method demonstrates superior forecasting performance and accuracy for multistep offshore wind turbine power forecasting.

Keywords: offshore wind turbines; VMD decomposition; multi-timescale input; improved transformer; multistep power forecasting

1. Introduction

Wind energy has gained global momentum as a response to pressing environmental concerns and fuel shortages [1], with its share in the new energy sector growing significantly [2]. Offshore wind power, a crucial subfield of wind energy development, has emerged as a new trend in the global wind power industry due to its vast resources, minimal environmental impact, high efficiency, large individual capacity, and proximity to load centers [3]. However, the unpredictability of wind speed and other meteorological factors leads to erratic power yields from wind turbines [4], causing considerable fluctuations in the power grid when wind energy is integrated in large volumes. Hence, precise and resilient short-term wind power forecasting is vital for large-scale grid integration [5]. Although factors like wind speed, direction, temperature, humidity, barometric pressure, and altitude cause significant variances in wind power, offshore wind energy remains more stable and less turbulent than onshore wind energy, as it is not influenced by topography, vegetation, and buildings [6]. Nonetheless, the short distance between units, the long and extensive reach of the rotor wakes, and the complexity of regional numerical models at sea make accurate wind energy predictions challenging. Wind power forecasting methods can be categorized into ultra-short-term, short-term, medium-term, and long-term forecasts based on the forecasting horizon [7]. Ultra-short-term forecasts predict wind power up to 4 h ahead with a resolution of 15 min or less, short-term forecasts extend up to 72 h, medium-term forecasts span three days to a week, and long-term forecasts exceed a week. The latter two are generally used in wind farm site selection and power plant development plans and are not the focus of this study [7].

Forecasting methods can be divided into physical modeling, statistical learning, machine learning, and combined physical–statistical methods [8]. Although physical-model-based methods are computationally demanding and challenging to implement, statistical learning methods, including the autoregressive integrated moving average (ARIMA) [9], are easy to use and quick to compute. However, they tend to be less accurate when dealing with highly volatile wind power forecasts. With the evolution of artificial intelligence, machine learning has been applied to wind power forecasting, with methods including support vector machines (SVM) [10], extreme gradient boosting tree models (XGBoost) [11], traditional neural network models [12,13], recurrent neural network (RNN) models [14,15,16], and transformer architecture models [17,18]. However, due to the complex volatility of wind power and the lack of clear time-series characteristics, single-model predictions often fall short in accuracy. Combined prediction methods, which leverage the strengths of various models, have proven to significantly improve forecast accuracy compared to single models [19]. For instance, Wu et al. [20] first used LSTM neural networks to forecast wind speed and other meteorological data, then applied similar time-series matching methods to filter out the main factors for modeling, training, and prediction in LightGBM. Similarly, Cao et al. [21] used a convolutional neural network (CNN) to extract the spatial and temporal correlation vectors from different stations and used LSTM to extract the temporal relationship between historical time points for multi-step wind power forecasting.

However, conventional methods struggle to handle the high noise, high volatility, and non-stationarity of the original time-series data. To address this, some researchers have applied signal-processing algorithms like empirical wavelet transformation (EWT) [22], empirical mode decomposition (EMD) [23], and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [24] to wind power forecasting. For example, Abedinia et al. [25] proposed an improved empirical modal decomposition method (IEMD) to decompose wind speed and fed the decomposed signals into a hybrid prediction model based on BaNN and K-means clustering, using the intelligent optimization method ChB-SSO for the automatic tuning of BaNN parameters. Similarly, Li et al. [26] used the historical wind speed and key meteorological factors decomposed by variational modal decomposition (VMD) and weighted permutation entropy (WPE) as inputs and output the forecasts with the CNN-LSTM model.

Recent research further explores advanced techniques for monitoring and diagnosing mechanical health across diverse applications [27,28,29,30]. While Sharma et al. [31] use swarm decomposition and permutation entropy for bearing defect detection, Vashishtha et al. [32] explore a Levy flight-based genetic algorithm for Pelton wheel health assessment. Chauhan et al. [33] propose a corrected conditional entropy measure combined with a multi-parent evolutionary algorithm for bearing diagnostics. Meanwhile, Chauhan [34] leverages an adaptive wavelet mutation strategy within an evolutionary algorithm for bearing defect identification. However, these methods have limitations, such as poor time resolution accuracy or short prediction time steps. To overcome these issues, this study proposes a composite deep-learning multistep forecasting method based on multi-timescale inputs (MSI) for short-term wind power forecasting. This includes VMD signal deconstruction, multi-timescale inputs, and composite prediction models based on gated recurrent units (GRU), CNN, and improved transformers. The key contributions of this study are as follows:

First, a trigonometric function is used to standardize the unique meteorological feature of wind direction, after which it is further standardized to the interval [0,1] using the maximum–minimum normalization method. This aligns with the actual position distribution of wind direction;
The Variational Mode Decomposition (VMD) method is used to decompose the primary time-series information in the original wind speed signal, yielding three components that reflect the overall trend, primary fluctuation trend, and sub-fluctuation trend of the original wind speed signal, respectively;
A multi-timescale input method is proposed to improve the power forecast effect of the model by capturing the long- and short-term time-series relationships of different input data scales;
The GRU neural network captures the time-series relationship of short-timescale input data, while the improved transformer time-series forecast model is used to process long-timescale input data. Finally, the CNN’s strong ability to process local information is utilized to extract the information of each time point of each branch output individually, and the multistep forecast results are output through the fully connected neural network (FCN).

2. Data Sources and Processing

2.1. Data Sources

This study scrutinized acquired data from wind turbines located at an offshore wind farm in Guangdong. The wind farm, situated roughly 55 km off the coast in waters 41–46 m deep, has a total installed capacity of 500 MW. It consists of 37 wind turbines, each with a capacity of 6.8 MW, and 30 turbines of 8.3 MW. The facility also includes a 220 kV offshore booster station and 35 kV underwater transmission cables. We randomly selected historical data from four units within the wind farm, two of which were 6.8 MW and the other two 8.3 MW. This historical dataset comprises measured output power gleaned from the SCADA and NWP systems. All the data maintained a time resolution of 10 min, accruing to 144 data samples per day. Each unit’s data spanned one year and one month. We used the first year’s data for both training and validation purposes, while the subsequent month’s data served to test the model’s precision. Table 1 and Figure 1 provide a detailed overview of the raw power data for the four wind turbines under consideration.

As shown in Table 1, there are 57,996 data points for each unit, sampled at a 10-min interval. As shown in Figure 1, the datasets for units 3 and 12 run from October 2022 to November 2023, and those for units 14 and 61 run from September 2022 to October 2023. The first 52,560 data points in each dataset constitute the training set, and the remaining 5436 are the test set. The complete dataset, comprising 105,408 10-min samples, was partitioned into training, validation, and testing sets. The breakdown is as follows: Training set: 60,000 samples (56.9% of the total data). Validation set: 10,000 samples (9.5% of the total data). Test set: 35,408 samples (33.6% of the total data). The training set was used for model training and parameter estimation. The validation set, held out from the training process, was used for model selection and hyperparameter tuning. Monitoring the model’s performance on the validation set during training made it possible to identify the best-performing model configuration and prevent overfitting. Finally, the test set, a separate, unseen subset of the data, was used for the final model evaluation and reporting of performance metrics. This approach ensured an unbiased assessment of the model’s generalization capabilities.
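The per-unit split described above can be sketched as a simple chronological cut. This is a minimal illustration that assumes the stated sizes (52,560 training points followed by 5436 test points); the function name and the placeholder series are hypothetical and not taken from the paper.

```python
SAMPLES_PER_DAY = 24 * 60 // 10      # 10-min resolution -> 144 samples per day
TRAIN_SIZE = 365 * SAMPLES_PER_DAY   # first year of data -> 52,560 samples


def chronological_split(series, train_size=TRAIN_SIZE):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if len(series) <= train_size:
        raise ValueError("series is shorter than the requested training span")
    return series[:train_size], series[train_size:]


if __name__ == "__main__":
    # Placeholder for one unit's records (hypothetical data).
    unit_series = list(range(TRAIN_SIZE + 5436))
    train, test = chronological_split(unit_series)
    print(len(train), len(test))
```

Keeping the split chronological, rather than random, means that every test sample lies after the training period, which matches the forecasting setting.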

2.2. Wind Turbine Operation and Control

Wind turbine generators (WTGs) are influenced by many factors, broadly classified into internal and external factors. Internal factors primarily include the blade’s shape, size, and material, along with the transmission efficiency of the drive train, which were established during the WTG design phase. Over time, as the wind turbine accumulates operational hours and undergoes multiple maintenance procedures and overhauls, these internal factors experience minor variations. However, these changes are not easily measurable with specific indicators. Nonetheless, the benefits of deep learning can be harnessed to adapt to the operational status of WTGs by employing numerous parameters in a deep neural network.

On the other hand, external factors impacting wind turbine power generation encompass wind speed, wind direction, air density, and other meteorological variables. The fundamental operation of WTGs involves the transformation of the kinetic energy of the airflow into the mechanical energy of the wind wheel’s rotation. This mechanical energy is then transmitted to the generator through the wind turbine drive system, which converts it into electrical energy. According to Betz’s theory, the wind energy absorbed by the wind turbine can be expressed as:

P_{out} = \frac{1}{2} ρ π R^{2} v^{3} C_{p} (λ, β)

(1)

where ρ represents the air density, R is the radius of the turbine impeller, v is the ambient wind speed, C_{p} is the power coefficient of the wind turbine, λ is the tip speed ratio, and β is the pitch angle of the turbine blades. The characteristic curve of C_{p} depends on the design parameters of the wind turbine, which are given directly by the manufacturer, and the theoretical maximum power coefficient is C_{p,max} = 0.593. According to the Betz limit theory, the wind power formula shows that wind speed, which enters as a cube, is the main factor affecting the power of WTGs. Figure 2 shows the standard power curves of the 6.8 MW and 8.3 MW WTGs of this offshore wind farm, with a cut-in wind speed of 3 m/s, a cut-out wind speed of 25 m/s, rated powers of 6800 kW and 8300 kW, respectively, and a rated wind speed of 11.1 m/s. When the wind speed exceeds 3 m/s, the WTGs start up, and the wind power is transmitted through the impeller, main shaft, gearbox, etc., to the generator, which rotates and generates electricity. When the wind speed exceeds 11.1 m/s, the generated power reaches the rated value. At this point, the pitch system must be engaged to limit the amount of wind energy the turbine captures and to prevent the rotor from overspeeding, which could lead to runaway accidents. When wind speeds surpass 25 m/s, the yaw and pitch systems are activated simultaneously, and the turbine is shut down.
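As a rough illustration of Equation (1) and of the operating regions just described, the sketch below computes the aerodynamic power with the swept area taken as πR² and builds an idealized power curve. The cubic ramp between cut-in and rated wind speed is an assumption made only for illustration; the actual curve is the manufacturer's curve in Figure 2, and the function names are hypothetical.

```python
import math


def aerodynamic_power(rho, radius, wind_speed, cp):
    """Equation (1): P = 0.5 * rho * pi * R^2 * v^3 * Cp, in watts."""
    return 0.5 * rho * math.pi * radius**2 * wind_speed**3 * cp


def idealized_power_curve(v, rated_kw, cut_in=3.0, rated_v=11.1, cut_out=25.0):
    """Piecewise power curve in kW.

    Below cut-in and above cut-out the output is zero; at or above the rated
    wind speed the output is held at rated power by the pitch system. The
    cubic ramp in between is an illustrative assumption, not the real curve.
    """
    if v < cut_in or v > cut_out:
        return 0.0
    if v >= rated_v:
        return rated_kw
    ramp = (v**3 - cut_in**3) / (rated_v**3 - cut_in**3)
    return rated_kw * ramp


if __name__ == "__main__":
    for speed in (2.0, 7.0, 11.1, 20.0, 26.0):
        print(speed, round(idealized_power_curve(speed, rated_kw=6800), 1))
```

Because power scales with the cube of wind speed below the rated point, a small error in forecast wind speed translates into a much larger error in forecast power, which is why wind speed receives special treatment in the later sections.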

2.3. Feature Selection

When the wind direction is stable, the rotor of the wind turbine can be held at its optimal angle to the wind, so the wind energy is used more fully and the turbine generates more power at higher efficiency. In contrast, when the wind direction varies significantly, the rotor must continually re-orient to follow the wind, which reduces the efficiency and power generation of the generator. Although advanced offshore wind turbines are now equipped with an automatic yaw system that adjusts the nacelle’s direction and tracks the incoming wind in real time, the system requires a certain reaction time, so yaw misalignment errors are inevitable. Wind direction time-series features help the deep learning model learn the distribution of this yaw misalignment error and compensate for it. Wind direction and wind speed are also closely related. Figure 3 shows the historical wind rose of the wind farm, in which the distribution of wind speed values over the different wind speed intervals in each direction shows a high degree of similarity. The scale of the wind speed distribution is given in counts over the 13 months, and the color scale of the wind speed contours is in m/s. For a given wind farm, the terrain and geomorphological features in each direction are fixed and strongly influence how the wind speed changes. Therefore, wind direction time-series features also help the model capture the pattern of wind speed variation and improve the accuracy of the wind power prediction results.

As can be seen from the wind-power calculation formula, the air density is also closely related to the size of the wind energy. According to the IEC61400-12-1 standard [35], the actual air density calculation formula is:

ρ = \frac{1}{T} [\frac{B}{R_{0}} - Φ P_{w} (\frac{1}{R_{0}} - \frac{1}{R_{w}})]

(2)

where ρ is the density of air, kg/m³; B is the atmospheric pressure, Pa; T is the absolute temperature, K; Φ is the relative humidity, taken as 0 < Φ < 1; R_{0} is the gas constant of dry air, 287.05 J/(kg·K); R_{w} is the gas constant of water vapor, 461.5 J/(kg·K); and P_{w} is the vapor pressure, Pa.

Therefore, to obtain the actual air density, it is only necessary to obtain the relative humidity, atmospheric pressure, and atmospheric temperature. In summary, the main external factors affecting the power generation of wind turbines are meteorological factors, such as wind speed, wind direction, relative humidity, atmospheric pressure, and atmospheric temperature, which were selected as the initial input parameters of the model. Table 2 summarizes the features selected for the analysis, along with their descriptions and sources.
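Equation (2) can be evaluated directly once the vapor pressure is known. The sketch below is a minimal illustration; it takes the vapor pressure P_{w} as an input instead of deriving it from temperature, and the function and argument names are hypothetical.

```python
R_DRY = 287.05    # gas constant of dry air, J/(kg*K)
R_VAPOR = 461.5   # gas constant of water vapor, J/(kg*K)


def air_density(pressure_pa, temperature_k, rel_humidity, vapor_pressure_pa):
    """Equation (2): moist-air density in kg/m^3.

    rel_humidity is a fraction between 0 and 1, not a percentage.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be given in kelvin and be positive")
    if not 0.0 <= rel_humidity <= 1.0:
        raise ValueError("relative humidity must be a fraction in [0, 1]")
    dry_term = pressure_pa / R_DRY
    moist_term = rel_humidity * vapor_pressure_pa * (1.0 / R_DRY - 1.0 / R_VAPOR)
    return (dry_term - moist_term) / temperature_k


if __name__ == "__main__":
    # Dry air at standard sea-level pressure and 15 degrees Celsius.
    print(round(air_density(101325.0, 288.15, 0.0, 1700.0), 4))
```

Since 1/R_{0} is larger than 1/R_{w}, the humidity term is always subtracted, so humid air is slightly less dense than dry air at the same pressure and temperature.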

2.4. Data Preprocessing and Gap Handling

Upon closer inspection of the raw data, we identified several missing or interpolated data gaps. These gaps could arise for various reasons, such as scheduled maintenance, unscheduled downtime, or temporary disconnection of the turbines or the entire wind farm from the grid. Specifically, we observed a significant data gap for turbines #3 and #12 just before July 2023. After consulting with the wind farm operators, we learned that this gap was due to a scheduled maintenance period during which these turbines were taken offline for routine inspections and servicing. To handle such data gaps, we explored two approaches:

Gap Interpolation: One approach was to interpolate the missing data points based on the available data before and after the gap. While this approach can provide a continuous data stream, it may introduce biases or inaccuracies, especially for larger gaps or periods with rapidly changing wind conditions;
Gap Removal: Alternatively, we opted to remove the data gaps entirely from the dataset, treating them as missing values. This approach ensures that our analysis and modeling efforts are based solely on actual recorded data, avoiding any potential biases introduced by interpolation.

After careful consideration, we chose the gap removal approach for our analysis. We believe that this conservative approach preserves the integrity of the data and provides a more accurate representation of the turbines’ performance under the observed conditions. It is important to note that the presence of data gaps and the chosen gap-handling strategy may impact the overall dataset size and the distribution of samples across different operating conditions. We have considered these factors during our data partitioning and model training processes to ensure robust and reliable results.
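The gap removal strategy can be sketched as follows. This is an illustration under stated assumptions: missing readings are marked with None (a hypothetical convention), and the series is additionally split into contiguous 10-min runs so that later windowing does not span a removed gap. That segmentation step is an assumption of this sketch and is not described in the text.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)  # sampling interval of the dataset


def contiguous_segments(records, step=STEP):
    """Drop missing readings and split the series at time gaps.

    records is a time-ordered list of (timestamp, value) pairs; a value of
    None marks a missing reading. No interpolated value is substituted.
    """
    segments, current = [], []
    for stamp, value in records:
        if value is None:
            continue  # gap removal: the record is discarded
        if current and stamp - current[-1][0] != step:
            segments.append(current)  # a time jump closes the current run
            current = []
        current.append((stamp, value))
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    t0 = datetime(2023, 6, 1)
    demo = [(t0 + i * STEP, None if i == 2 else float(i)) for i in range(5)]
    print([len(seg) for seg in contiguous_segments(demo)])
```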

2.5. Deconstruction of Wind Speed Signals

The volatility of offshore wind signals is complex because of several natural factors. First, the shape and size of land features and their relative position to the sea can change the direction and strength of the wind. The fluctuation of ocean waves and their interaction with the wind can also affect the wind speed signal. Furthermore, meteorological factors, such as temperature, humidity, and pressure in the atmosphere, and the influence of atmospheric currents can have a complex effect on wind speed signals. These signals are not simply superimposed but are intertwined and interfere with each other, making it challenging to extract adequate timing information from the original wind speed signal. To effectively extract the timing features in the wind speed signal, it is necessary to preprocess the wind speed signal with feature deconstruction to decompose the primary signals and remove the related noise.

2.5.1. Principles of VMD

This study employs the Variational Mode Decomposition (VMD) [36] method, which effectively processes non-stationary and nonlinear mixed time-frequency signals. The VMD method decomposes the original one-dimensional wind speed signal x(t) into k finite-bandwidth intrinsic mode functions (IMFs), allowing the frequency-domain features of the signal to be extracted. The constrained variational expression for the VMD method is given by:

\{\begin{matrix} \underset{\{u_{k}\}, \{w_{k}\}}{\min} \{\sum_{k} {‖\partial_{t} [(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j w_{k} t}‖}_{2}^{2}\} \\ s.t. \sum_{k} u_{k} (t) = x (t) \end{matrix}, \quad u_{k} (t) = \sum_{m = 1}^{n} a_{m} \sin (w_{m} t)

(3)

where k is the number of modes to be decomposed, \{u_{k}\} = \{u_{1}, \dots, u_{k}\} denotes the k intrinsic mode components, \{w_{k}\} = \{w_{1}, \dots, w_{k}\} are the center frequencies of the components, δ(t) is the Dirac delta function, ∗ is the convolution operation, t is time, and \partial_{t} denotes the partial derivative with respect to time t. Equation (3) aims to decompose the input signal x(t) into k intrinsic mode functions, subject to the constraint that the sum of these functions equals the original signal. To solve Equation (3), the Lagrange multiplier λ is introduced, converting the constrained variational problem into an unconstrained variational problem. This leads to the augmented Lagrangian expression:

L (\{u_{k}\}, \{w_{k}\}, λ) = α \sum_{k} {‖\partial_{t} [(δ (t) + \frac{j}{π t}) * u_{k} (t)] e^{- j w_{k} t}‖}_{2}^{2} + {‖x (t) - \sum_{k} u_{k} (t)‖}_{2}^{2} + ⟨λ (t), x (t) - \sum_{k} u_{k} (t)⟩

(4)

where α is the quadratic penalty factor, which is used to reduce the interference of Gaussian noise. The final solution reduces the noise and volatility of the original signal, yielding IMF components with a higher signal-to-noise ratio within the filtered bandwidths. The VMD defines each component as an amplitude-modulated frequency-modulated (AM-FM) function, which can be expressed as:

u_{k} (t) = A_{k} (t) \cos (ϕ_{k} (t))

(5)

where A_{k} (t) is the instantaneous amplitude of the component and ϕ_{k} (t) is the instantaneous phase of the component.
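To make the bandwidth term of Equation (3) and the AM-FM form of Equation (5) more concrete, the sketch below builds the analytic signal (δ(t) + j/(πt)) ∗ u(t) with an FFT, shifts it to baseband with e^{-jwt}, and measures the squared L2 norm of its time derivative. It is an illustration of a single term of the objective, not an implementation of the full VMD solver; the function names are hypothetical, NumPy is assumed to be available, and the center frequency is assumed to be in rad/s.

```python
import numpy as np


def analytic_signal(x):
    """Return x(t) + j*H[x](t) using a one-sided FFT spectrum."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def amplitude_and_phase(mode):
    """Instantaneous amplitude A(t) and unwrapped phase phi(t) of Equation (5)."""
    z = analytic_signal(mode)
    return np.abs(z), np.unwrap(np.angle(z))


def bandwidth_term(mode, center_freq, dt):
    """Squared L2 norm of d/dt of the demodulated analytic signal."""
    t = np.arange(len(mode)) * dt
    baseband = analytic_signal(mode) * np.exp(-1j * center_freq * t)
    derivative = np.gradient(baseband, dt)
    return float(np.sum(np.abs(derivative) ** 2) * dt)


if __name__ == "__main__":
    dt = 0.01
    t = np.arange(1000) * dt
    w0 = 2.0 * np.pi * 5.0            # a 5 Hz test tone, in rad/s
    mode = np.cos(w0 * t)
    print(bandwidth_term(mode, w0, dt), bandwidth_term(mode, 0.5 * w0, dt))
```

The term is smallest when the assumed center frequency matches the frequency content of the mode, which is why minimizing it drives each mode toward a narrow band around its own w_{k}.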

2.5.2. Wind Speed Signal Decomposition

The penalty factor α and the number of decomposition layers k in the VMD algorithm must be set manually. The penalty factor α is taken as 1.5 to 2.0 times the sampling point length, and the number of decomposition layers k is determined according to the actual decomposition effect [37]. Define \hat{x} (t) as the wind speed signal reconstructed from the decomposition components, where:

\hat{x} (t) = \sum_{i = 1}^{k} IMF_{i} (t), \quad IMF_{i} (t) = \sum_{m = 1}^{n} a_{m} \sin (w_{m} t)

(6)

Define the root mean square error between the original wind speed and reconstructed signals as:

RMSE = \sqrt{\frac{1}{n} \sum_{j = 1}^{n} ({\hat{x}}_{j} - x_{j})^{2}}

(7)

Define the Pearson correlation coefficient between the original wind speed and reconstructed signals as:

ρ_{p} = \frac{cov (\hat{x} (t), x (t))}{σ_{\hat{x} (t)} σ_{x (t)}}

(8)

where cov denotes the covariance, σ_{\hat{x} (t)} and σ_{x (t)} are the standard deviations, and ρ_{p} \in [-1, 1]. Among these reconstruction performance indicators, the smaller the RMSE and the closer the correlation coefficient is to 1, the better the signal reconstructed from the decomposed components overlaps with the original signal.

Taking the data of Unit 12 as an example, the parameters of the decomposition process are shown in Table 3. The correlation coefficient nearly reaches its maximum value when k is 3. As k increases further, the RMSE continues to decrease, but only slowly, while the correlation coefficient remains relatively stable and the computation time increases greatly. A larger number of decomposition layers k gives better overlap between the signal before and after reconstruction; however, it also makes the decomposition prone to introducing noise. As shown in Figure 4 and Figure 5, the VMD 4-layer decomposition adds a small-amplitude noise component on top of the VMD 3-layer decomposition, which tends to obscure the timing features. Therefore, the number of decomposition layers is chosen as k = 3, and each component is shown in Figure 4. Among them, IMF1 reflects the overall trend of the original wind speed signal and is the trend component; IMF2 is the main fluctuation component of the wind speed signal; IMF3 is the secondary fluctuation component of the wind speed signal.
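The reconstruction check used to choose k can be sketched with the three quantities defined in this subsection: the reconstructed signal (sum of the IMFs), the RMSE, and the Pearson correlation coefficient. The sketch uses only the standard library; the function names and the toy data are hypothetical.

```python
import math


def reconstruct(imfs):
    """Sum the IMF components sample by sample to rebuild the signal."""
    return [sum(samples) for samples in zip(*imfs)]


def rmse(reconstructed, original):
    """Root mean square error between two equally long sequences."""
    n = len(original)
    squared = sum((r - o) ** 2 for r, o in zip(reconstructed, original))
    return math.sqrt(squared / n)


def pearson(reconstructed, original):
    """Pearson correlation coefficient between two sequences."""
    n = len(original)
    mean_r = sum(reconstructed) / n
    mean_o = sum(original) / n
    cov = sum((r - mean_r) * (o - mean_o) for r, o in zip(reconstructed, original))
    var_r = sum((r - mean_r) ** 2 for r in reconstructed)
    var_o = sum((o - mean_o) ** 2 for o in original)
    return cov / math.sqrt(var_r * var_o)


if __name__ == "__main__":
    original = [5.0, 6.5, 7.0, 6.0, 4.5]          # toy wind speeds, m/s
    imfs = [[5.5, 5.8, 6.0, 5.8, 5.5],            # toy trend component
            [-0.4, 0.6, 0.9, 0.1, -0.9]]          # toy fluctuation component
    rebuilt = reconstruct(imfs)
    print(rmse(rebuilt, original), pearson(rebuilt, original))
```

In practice, these two numbers would be computed for each candidate k and weighed against computation time, as described above.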

2.6. Feature Standardization

The VMD method deconstructs the wind speed signal to obtain three components representing different aspects of the wind speed, such as periodicity, trend volatility, etc. Each component has unique features and contributions, which can provide us with more detailed and comprehensive wind speed information. To further extract the time-period features, this paper extracts four key temporal features, namely month, day, hour, and minute, from the date. These features are crucial for understanding the temporal properties of wind speed. For example, a month may affect the seasonal wind speed variation, while day, hour, and minute provide information on a finer timescale. Together with the power itself, 12-dimensional features were obtained, and the data needed to be normalized in the next step. First, since the angle of wind direction ranges from 0 to 360°, and the due north direction is 0°, when the wind angle tends to be close to 0° and close to 360°, the numerical representation results of the wind position should be similar. However, if the standard normalization or the maximum and minimum normalization are used to deal with it, the difference in the results obtained is enormous. Therefore, the trigonometric normalization method is used first for this particular feature of the wind direction angle, as shown in the following equation.

x_{dir} = \sin (\frac{θ}{360} \cdot 2 π) + \cos (\frac{θ}{360} \cdot 2 π)

(9)

where x_{dir} is the standardized value for wind direction and θ is the wind angle, ranging from 0° to 360°. Afterward, for all 12-dimensional features, the data are normalized using the maximum–minimum normalization method, as shown below:

\bar{x} = \frac{x - x_{min}}{x_{max} - x_{min}}

(10)

where x is the original data, x_{min} is the minimum value, and x_{max} is the maximum value.
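The two normalization steps, Equations (9) and (10), can be sketched in a few lines. This is a minimal illustration with hypothetical function names; the handling of a constant feature column is an assumption, since the text does not cover that case.

```python
import math


def encode_wind_direction(theta_deg):
    """Equation (9): sin + cos of the wind angle given in degrees."""
    angle = theta_deg / 360.0 * 2.0 * math.pi
    return math.sin(angle) + math.cos(angle)


def min_max_normalize(values):
    """Equation (10): rescale a feature column to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    directions = [0.0, 1.0, 180.0, 359.0, 360.0]
    encoded = [encode_wind_direction(d) for d in directions]
    print([round(e, 4) for e in encoded])
    print([round(v, 4) for v in min_max_normalize(encoded)])
```

The encoding makes 1° and 359° numerically close, which plain min-max scaling of the raw angle would not. Note that, as written, Equation (9) is not one-to-one: for example, 0° and 90° both map to 1.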

3. Forecasting Model Structure

The forecasting model was developed using a data-driven machine learning approach. The overall methodology involved several key steps: (1) data preprocessing and feature engineering, (2) variational mode decomposition (VMD) for extracting intrinsic mode functions (IMFs) from the wind speed and power generation time series, (3) feature selection to identify the most relevant IMFs and meteorological variables, and (4) training and evaluation of various machine learning models (e.g., random forest, gradient boosting, neural networks) for wind power forecasting. The following subsections provide detailed descriptions of each step in the proposed approach.

3.1. GRU Network

An RNN can learn the dependencies between earlier and later samples when processing continuous time-series data, which gives it certain advantages in prediction tasks. However, because of its structure, an RNN suffers from vanishing gradients during backpropagation, which makes it unsuitable for long time-series data. The GRU [38] is an improved RNN structure that effectively alleviates the vanishing-gradient and exploding-gradient problems by introducing structures such as a memory unit and gating mechanism, enabling the network to handle sequence data better.

As shown in Figure 6, the GRU neural network effectively processes sequential information by introducing a gating mechanism. The mathematical principle is based on the neural network’s activation function and weight matrix, which control the information flow by calculating the gating unit’s value.

\{\begin{matrix} z_{t} = σ (W_{z} \cdot [h_{t - 1}, x_{t}]) \\ r_{t} = σ (W_{r} \cdot [h_{t - 1}, x_{t}]) \\ {\bar{h}}_{t} = \tanh (W \cdot [r_{t} * h_{t - 1}, x_{t}]) \\ h_{t} = (1 - z_{t}) * h_{t - 1} + z_{t} * {\bar{h}}_{t} \end{matrix}

where z_{t}, r_{t}, {\bar{h}}_{t}, and h_{t} correspond to the update gate, reset gate, candidate state at the current moment, and hidden state at the current moment, respectively. x_{t} is the input variable of the current moment, and h_{t - 1} is the hidden state of the previous moment, whose contribution to the candidate state is controlled by the reset gate r_{t}. W_{z}, W_{r}, and W are the trainable parameters of the model, and σ is the nonlinear gate activation function. Here, the sigmoid function is used, so that the gate values lie in (0, 1). The GRU neural network achieves layer-by-layer transmission and information extraction of sequence data by controlling the forgetting and retaining of information through two gating units, the reset gate and the update gate, respectively. This mechanism can effectively capture the sequence’s long-term dependencies and improve the model’s performance.
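A single GRU update following the four equations above can be sketched with NumPy. This is an illustration, not the authors' implementation: it assumes the standard logistic sigmoid for the two gates, omits bias terms as the equations do, and uses hypothetical sizes (12 input features, 8 hidden units) and random weights.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping gate pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(h_prev, x_t, W_z, W_r, W):
    """One GRU update; each weight matrix has shape (hidden, hidden + input)."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                   # update gate
    r_t = sigmoid(W_r @ concat)                   # reset gate
    gated = np.concatenate([r_t * h_prev, x_t])   # reset gate scales h_{t-1}
    h_cand = np.tanh(W @ gated)                   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand    # new hidden state


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, features = 8, 12                      # hypothetical sizes
    W_z, W_r, W = (rng.normal(scale=0.1, size=(hidden, hidden + features))
                   for _ in range(3))
    h = np.zeros(hidden)
    for x_t in rng.normal(size=(6, features)):    # six time steps
        h = gru_step(h, x_t, W_z, W_r, W)
    print(h.shape)
```

Because z_{t} lies between 0 and 1, the last line of the update is an elementwise interpolation between the previous hidden state and the candidate state.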

3.2. Improved Transformer Timing Forecast Model

The Transformer model was first proposed by the Google machine translation team and applied to natural language processing (NLP) tasks with good results. It uses a self-attention mechanism and position encoding (PE) to capture long-distance dependencies in the input sequence. Despite the similarities between time series prediction and NLP tasks, there are some key differences between them, and some modifications to the model structure are required to apply the Transformer model to the task of WTG power forecast. The input sequences in NLP tasks are mostly words or symbols, which must be mapped into a fixed-size numeric vector representation by word vector coding before the computer can process them. In the WTG power forecast task, each time point of the resulting time series data is a 12×1 numerical feature vector that can be used directly as input to the model. Because Transformer does not use the structure of the recurrent neural network but uses global information, it is not able to utilize the knowledge of temporal features before and after the data, and it needs to embed the positional relationship between the data in the input data, as shown in the following equation:

\begin{cases} PE_{(pos, 2i)} = \sin\left(pos / 10000^{2i/d}\right) \\ PE_{(pos, 2i + 1)} = \cos\left(pos / 10000^{2i/d}\right) \end{cases}

(11)

where pos denotes the position of the data, d represents the dimension of the PE, 2i indicates the even dimensions, and 2i + 1 denotes the odd dimensions (i.e., 2i ≤ d, 2i + 1 ≤ d). Computing the PE in this way allows the model to work with relative positions: for a fixed spacing k, PE(pos + k) can be obtained from PE(pos). The encoded data X is then passed into the multi-head attention module (as shown in Figure 7), which combines multiple self-attention mechanisms in parallel. Three linear transformation matrices $W_{Q}, W_{K}, W_{V}$ are used in the attention mechanism to calculate Q, K, and V, where Q is the query, K is the key, and V is the value, after which the attention is calculated as shown in the following equations:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V

(12)

\mathrm{softmax}(x)_{i} = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}

(13)
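Equations (11) to (13) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, matrices are nested lists, and in the toy usage the projections $W_{Q}, W_{K}, W_{V}$ are taken to be identity matrices so that Q = K = V = X.

```python
import math


def positional_encoding(n_positions, d):
    """Sinusoidal position encoding of Eq. (11): n_positions rows of length d."""
    pe = [[0.0] * d for _ in range(n_positions)]
    for pos in range(n_positions):
        for j in range(d):
            i = j // 2                       # column j is 2i (even) or 2i + 1 (odd)
            angle = pos / (10000 ** (2 * i / d))
            pe[pos][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return pe


def softmax(row):
    """Eq. (13), with the row maximum subtracted for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Scaled dot-product attention of Eq. (12); also returns the weights."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy usage: 4 time steps, 6 features, identity projections assumed.
x = positional_encoding(4, 6)
z, weights = attention(x, x, x)
print(len(z), len(z[0]))
```

Each row of `weights` sums to one, so every output row is a weighted average of the rows of V.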

The outputs of the individual attention modules, $Z_{1}, Z_{2}, \dots, Z_{h}$, are spliced by columns and passed through a fully connected linear layer to obtain the output Z. The matrix Z needs to be consistent in dimension with its input matrix X.

Figure 7. Multi-head Attention.

The matrix Z is summed with its input matrix X through a residual connection and then passed through layer normalization. Layer normalization gives the inputs of each layer of neurons the same mean and variance, which speeds up convergence. The output matrix $O_{n \times m}$ of the encoder module is thus obtained. The traditional Transformer decoder uses a large number of matrix operations and attention mechanisms, so it consumes a large amount of computational resources. It also generates the output sequence one element at a time, which limits output efficiency. When dealing with large-scale datasets or long sequences, high-performance GPUs, TPUs, or other computational resources may be needed, which increases the cost of model training and inference and makes the model difficult to apply in practical industrial scenarios.

Here, the decoding layer uses a CNN and an FCN instead (as shown in Figure 8). The CNN performs local feature extraction on the matrix output by the encoder and captures local attentional information through convolutional operations. The FCN then captures the global temporal information and integrates the local features into a complete sequence representation. A decoder built from a CNN and an FCN can output the prediction results of a long sequence simultaneously, avoiding the limitation of traditional decoders, which generate the output sequence one element at a time. In addition, this structure reduces the computational complexity of decoding and improves decoding efficiency.
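The idea of decoding all forecast steps in one pass can be illustrated with a small sketch. The layer sizes, the single convolution filter, the ReLU, and the function names below are assumptions made for illustration; the source does not specify the decoder's hyperparameters.

```python
def conv1d_valid(o, kernel):
    """Slide a (k x m) kernel along the time axis of an (n x m) matrix.

    Returns n - k + 1 local features, one per window of k consecutive rows.
    """
    k = len(kernel)
    out = []
    for start in range(len(o) - k + 1):
        acc = 0.0
        for dt in range(k):
            acc += sum(w * v for w, v in zip(kernel[dt], o[start + dt]))
        out.append(max(acc, 0.0))            # ReLU on each local feature
    return out


def dense(features, weights, biases):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def decode(o, kernel, weights, biases):
    """Map the encoder output to all forecast steps in a single pass."""
    local = conv1d_valid(o, kernel)          # local patterns (CNN)
    return dense(local, weights, biases)     # global mixing (FCN)


# Toy usage: an 8 x 4 encoder output decoded into 24 steps (4 h at 10 min).
o = [[float(i + j) for j in range(4)] for i in range(8)]
kernel = [[0.1] * 4 for _ in range(3)]       # hypothetical 3 x 4 filter
weights = [[0.01] * 6 for _ in range(24)]    # 24 outputs from 6 local features
forecast = decode(o, kernel, weights, [0.0] * 24)
print(len(forecast))
```

Because the dense layer emits every output in one call, no step of the forecast depends on a previously generated step, unlike an autoregressive decoder.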

3.3. Multi-Timescale Input Forecast Models

Meteorological factors such as wind speed, wind direction, temperature, humidity, and barometric pressure all affect the power generated by wind turbines. However, these meteorological factors are highly uncertain and difficult to predict accurately. To capture the time series characteristics of wind energy more accurately, based on the GRU neural network, a method of wind power forecast using a multi-timescale input model with GRU and an improved transformer for time series forecast (MSI-GTTS) is proposed. The specific model structure is shown in Figure 9. The core idea of this method is to divide the processed data into different timescales. Specifically, we process the data using three timescales: days, weeks, and months.

That is, the data from the day, the week, and the month before moment t are each taken as inputs to the model for predicting the power generated by the wind turbine after moment t. This paper uses two different models to deal with these timescales: the GRU neural network and the improved Transformer time series prediction model. The one-day and one-week data are processed by GRU neural networks, which are well suited to time series data and can effectively capture its temporal features. For the one-month data, however, the timescale is long and a GRU neural network may struggle to process it effectively. Therefore, we adopt the improved Transformer model to process the one-month input and capture its long-term dependencies. Finally, the features from these three timescales are spliced together and passed through a one-dimensional convolution and an FCN to obtain the final forecasting results. The MSI-GTTS model thus consists of three branch channels and one aggregated output channel: each of the first two branches consists of a GRU recurrent neural network and a fully connected layer, and the third branch is the improved Transformer temporal forecasting model. The coding layer in the third branch consists of three encoders, each built on the multi-head attention mechanism.
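The three input windows can be sketched as simple slices of a series sampled every 10 min. The function name is hypothetical and the 30-day month is an assumption; the 24-step target corresponds to the four-hour horizon used in the experiments.

```python
STEPS_PER_HOUR = 6                    # 10-min resolution
DAY = 24 * STEPS_PER_HOUR             # 144 steps
WEEK = 7 * DAY                        # 1008 steps
MONTH = 30 * DAY                      # 4320 steps (30-day month is an assumption)
HORIZON = 4 * STEPS_PER_HOUR          # 24 steps, i.e. the next four hours


def make_sample(series, t):
    """Build one sample around index t.

    series: list of per-timestep records (feature vectors or scalars).
    Returns (day_window, week_window, month_window, target).
    """
    if t < MONTH or t + HORIZON > len(series):
        raise ValueError("not enough history or future data around t")
    day_window = series[t - DAY:t]        # input to the first GRU branch
    week_window = series[t - WEEK:t]      # input to the second GRU branch
    month_window = series[t - MONTH:t]    # input to the Transformer branch
    target = series[t:t + HORIZON]        # values to be forecast
    return day_window, week_window, month_window, target


# Toy usage with integer placeholders instead of real measurements.
series = list(range(5000))
day, week, month, target = make_sample(series, 4320)
print(len(day), len(week), len(month), len(target))
```

All three windows end at the same moment t, so the branches see the same recent history at different lengths.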

4. Experimental Analysis and Verification

4.1. Evaluation Indicators

In this paper, the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and coefficient of determination (R²) were used as the evaluation indexes of the model performance. Among them:

MAE = \frac{1}{n} \sum_{i = 1}^{n} |P_{real} - P_{pre}|

(14)

MSE = \frac{1}{n} \sum_{i = 1}^{n} (P_{real} - P_{pre})^{2}

(15)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (P_{real} - P_{pre})^{2}}

(16)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (P_{real} - P_{pre})^{2}}{\sum_{i = 1}^{n} (P_{real} - \bar{P}_{real})^{2}}

(17)

Here, $P_{real}$ denotes the measured power value, $\bar{P}_{real}$ denotes the mean of the measured power values, and $P_{pre}$ denotes the predicted power value. The smaller the MAE, MSE, and RMSE are, the better the prediction. R² assesses how closely the forecast values match the actual values; it is at most 1 and lies in [0,1] for any model that fits at least as well as the mean of the measured values. The closer R² is to 1, the better the fit between the forecast and actual values.
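The four indicators can be computed directly from their definitions. The sketch below uses plain Python; the function name and the toy values are illustrative only.

```python
import math


def evaluate(p_real, p_pre):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    n = len(p_real)
    errors = [r - p for r, p in zip(p_real, p_pre)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_real = sum(p_real) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((r - mean_real) ** 2 for r in p_real)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Toy usage: one of four predictions is off by 1.
print(evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Note that `r2` becomes negative whenever the residual sum of squares exceeds the total sum of squares, that is, when the forecast is worse than predicting the mean.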

4.2. Analysis of the Model Training Process

The network parameters are updated by backpropagation gradient descent, the optimizer is Adam, the learning rate is set to 0.0001, the loss function is MSE, and the current optimal model is automatically saved when the loss function values of both the training set and the validation set reach the minimum. The number of iterations is set to 1000. Taking the data of Unit 12 as an example, the loss function curves of each model’s training set and validation set are shown in Figure 10 and Figure 11.
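The training settings above (Adam, learning rate 0.0001, MSE loss, 1000 iterations, saving the best model) can be illustrated on a deliberately tiny problem. The one-parameter model y = w · x, the data, and the function names are hypothetical; only the optimizer settings come from the text, and the Adam constants are its usual defaults.

```python
import math


def mse(w, data):
    """Mean square error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the MSE with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def train(train_data, val_data, lr=1e-4, iterations=1000,
          beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam updates with the best parameter kept as a checkpoint."""
    w, m, v = 0.0, 0.0, 0.0
    best = {"w": w, "train": mse(w, train_data), "val": mse(w, val_data)}
    for step in range(1, iterations + 1):
        g = grad(w, train_data)
        m = beta1 * m + (1.0 - beta1) * g            # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g        # second-moment estimate
        m_hat = m / (1.0 - beta1 ** step)            # bias correction
        v_hat = v / (1.0 - beta2 ** step)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        train_loss, val_loss = mse(w, train_data), mse(w, val_data)
        # Save only when both losses improve on the stored checkpoint.
        if train_loss < best["train"] and val_loss < best["val"]:
            best = {"w": w, "train": train_loss, "val": val_loss}
    return best


# Toy usage: the data follow y = 0.5 * x.
train_data = [(0.2, 0.1), (0.4, 0.2), (0.8, 0.4)]
val_data = [(0.6, 0.3), (1.0, 0.5)]
print(train(train_data, val_data))
```

With such a small learning rate the parameter moves only a short distance in 1000 iterations, which is why normalized inputs and a well-scaled loss matter in practice.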

The loss values shown in the curves are small in magnitude because the data are normalized. As can be seen from Figure 10 and Figure 11, the proposed VMD-MSI-GTTS method has the fastest convergence speed and the smallest training loss, and the difference between the loss values of the training and validation sets is very small, which indicates that there is no overfitting problem. The loss of the GRU model alone is much larger than that of the other three methods, all of which use VMD signal decomposition. This indicates that the VMD method can effectively extract the primary temporal information from the raw wind speed signals, which in turn significantly improves the accuracy of the WTG power prediction results. Among the three models that use VMD signal decomposition, the VMD-GRU model with single-timescale input has the largest loss, and the two models with multi-timescale input (VMD-MSI-GRU and VMD-MSI-GTTS) have smaller losses. Moreover, during training, the loss curves of the two models with multi-timescale inputs change more stably and converge more strongly than that of the model with single-timescale input. This indicates that the multi-timescale input models can effectively capture the long- and short-term temporal relationships of input data at different scales and improve the models’ prediction effect. Of the two models that use the multi-timescale input method, the VMD-MSI-GTTS model has smaller training and validation losses than the VMD-MSI-GRU model. This indicates that the improved Transformer temporal forecasting model captures the temporal relationships of the long-timescale input data more effectively than the GRU, ultimately allowing the model to achieve better forecast results.

4.3. Comparison of Module Analysis Experiment Results

The module analysis experiment results for the test set data of the four units are shown in Table 4, and the data visualization results are shown in Figure 12.

Figure 12 shows that the VMD-MSI-GTTS model proposed in this paper achieves the best evaluation indexes in the power forecast experiments of all four units. Compared with the GRU, VMD-GRU, and VMD-MSI-GRU methods, the VMD-MSI-GTTS model has the smallest MAE, MSE, and RMSE for the power forecast results of the four units, and its coefficient of determination R² is the closest to 1. The following can be derived from Table 4.

Compared to the GRU model, the VMD-GRU model reduces the MAE, MSE, and RMSE values of the prediction results for the four datasets by an average of 0.0253, 0.0081, and 0.0322, respectively, and improves the R² value by an average of 0.112.

Compared to the VMD-GRU model, the MAE, MSE, and RMSE values of the prediction results of the VMD-MSI-GRU model for the four datasets were reduced by an average of 0.0069, 0.0024, and 0.0095, respectively, and the R² value was improved by an average of 0.033.

Compared to the VMD-MSI-GRU model, the MAE, MSE, and RMSE values of the prediction results of the VMD-MSI-GTTS model for the four datasets were reduced by an average of 0.0042, 0.0009, and 0.0063, respectively, and the R² values were improved by an average of 0.023. From the experimental results, it can be concluded that the VMD signal decomposition method, the multi-timescale input structure, and the improved Transformer timing prediction method in the proposed VMD-MSI-GTTS model each effectively improve the accuracy of the power forecast results for WTGs.
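The average changes quoted for each pair of models can be recomputed from Table 4 with a short script. The values are copied from the table; the dictionary layout and function names are illustrative.

```python
# Per-unit metrics copied from Table 4, ordered as units #3, #12, #14, #61.
TABLE_4 = {
    "GRU": {"MAE": [0.1035, 0.0987, 0.0759, 0.0763],
            "MSE": [0.0250, 0.0240, 0.0136, 0.0158],
            "RMSE": [0.1580, 0.1549, 0.1164, 0.1258],
            "R2": [0.758, 0.787, 0.696, 0.683]},
    "VMD-GRU": {"MAE": [0.0633, 0.0736, 0.0652, 0.0512],
                "MSE": [0.0128, 0.0154, 0.0099, 0.0081],
                "RMSE": [0.1130, 0.1240, 0.0994, 0.0898],
                "R2": [0.891, 0.856, 0.782, 0.843]},
    "VMD-MSI-GRU": {"MAE": [0.0635, 0.0618, 0.0499, 0.0505],
                    "MSE": [0.0114, 0.0102, 0.0077, 0.0075],
                    "RMSE": [0.1067, 0.1061, 0.0880, 0.0873],
                    "R2": [0.903, 0.912, 0.841, 0.848]},
    "VMD-MSI-GTTS": {"MAE": [0.0621, 0.0579, 0.0425, 0.0464],
                     "MSE": [0.0108, 0.0094, 0.0058, 0.0074],
                     "RMSE": [0.1038, 0.0971, 0.0763, 0.0858],
                     "R2": [0.910, 0.921, 0.891, 0.873]},
}


def mean(values):
    """Arithmetic mean over the four units."""
    return sum(values) / len(values)


def average_change(baseline, improved, metric):
    """Mean of the improved model minus mean of the baseline.

    Negative values are reductions (good for MAE, MSE, RMSE);
    positive values are increases (good for R2).
    """
    return mean(TABLE_4[improved][metric]) - mean(TABLE_4[baseline][metric])


pairs = [("GRU", "VMD-GRU"), ("VMD-GRU", "VMD-MSI-GRU"),
         ("VMD-MSI-GRU", "VMD-MSI-GTTS")]
for baseline, improved in pairs:
    changes = {k: round(average_change(baseline, improved, k), 4)
               for k in ("MAE", "MSE", "RMSE", "R2")}
    print(baseline, "->", improved, changes)
```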

The VMD method can effectively extract the leading time series information from the raw wind speed signal, which significantly improves the accuracy of the power prediction results of WTGs. The multi-timescale input method can effectively capture the long-term and short-term time series relationships of input data at different scales, which improves the power prediction of the model. Compared with the GRU, the improved Transformer time series forecast model is better at capturing the time series relationships of long-time input data, which ultimately leads to better predictions.

After back-normalizing the forecast results, the distribution of errors in the results of the module analysis experiments for the four datasets is shown in Figure 13. It shows that the VMD-MSI-GTTS model has the highest prediction accuracy in each of the four datasets, with the smallest spread of errors between the predicted values and the true values.

4.4. Comparison of Different Decomposition Methods

To verify the effectiveness of the VMD signal decomposition method, the VMD method is compared with the EEMD, EWT, and TVF-EMD methods. The comparative experimental results of different decomposition methods are shown in Table 5 and Figure 14. From Table 5, it can be seen that in the experimental results of the four units, compared with the EEMD-MSI-GTTS, EWT-MSI-GTTS, and TVFEMD-MSI-GTTS models, the prediction results of the VMD-MSI-GTTS model proposed in this paper have the smallest errors and the highest prediction accuracy.

As shown in Figure 14, the absolute error distribution ranges of the prediction results of the proposed VMD-MSI-GTTS models are also all minimized. The prediction performance of the TVFEMD-MSI-GTTS in the datasets of Units 3 and 14 is similar to that of the VMD-MSI-GTTS model, but the prediction performance in the datasets of Units 12 and 61 is poor.

The prediction performance of the EWT-MSI-GTTS model is good in the Unit 61 dataset but poor in the other three datasets. The experimental results show that the VMD method can improve the short-term multistep power prediction accuracy of offshore wind turbines in a better and more stable way than the EEMD, EWT, and TVF-EMD decomposition methods and is more universal.

4.5. Model Comparison Experimental Analysis and Validation

To further validate the proposed VMD-MSI-GTTS model on the offshore wind turbine power forecasting problem, comparison experiments are carried out against the LSTM [39], CNN-LSTM [40], LSTM-Attention [41], and Informer [42,43] models. The results of the comparison experiments are shown in Table 6, and the visualization results are shown in Figure 15. Figure 15 shows that the VMD-MSI-GTTS model proposed in this paper exhibits excellent forecast performance on all four WTGs, significantly outperforming the other four models. Specifically, the VMD-MSI-GTTS model achieved the best evaluation metrics on all datasets. In contrast, the other models had their strengths and weaknesses on the different unit datasets. For example, in the dataset of Unit 3, the Informer model achieved the smallest MAE, MSE, and RMSE values of the four comparison models; however, in the dataset of Unit 61, it had the highest MAE, MSE, and RMSE values of the four.

The VMD-MSI-GTTS model proposed in this paper demonstrates excellent power forecast performance in all four datasets compared to the LSTM, CNN-LSTM, LSTM-Attention, and Informer models. The mean values of MAE, MSE, and RMSE for the forecast results of the four datasets are 0.0522, 0.0084, and 0.0907, respectively, and the mean value of R² is 0.899. Compared with the other four methods, the VMD-MSI-GTTS model significantly improves the accuracy of the power forecast of offshore wind turbines and provides a useful reference for the field of offshore wind turbine power forecasting. After back-normalizing the prediction results of the five models on the four datasets, the distribution of errors in the prediction results of each model on the different datasets is shown in Figure 16. Some of the forecasting result curves for the comparison experiments on the four datasets are shown in Figure 17. Compared with the LSTM, CNN-LSTM, LSTM-Attention, and Informer models, the VMD-MSI-GTTS model has the smallest error distribution range between the forecast values and the true values in all four datasets, with the highest forecast accuracy and the strongest reliability, which indicates that it is more robust in forecasting the power of different WTGs. This robustness may be attributed to the sensitivity of the VMD-MSI-GTTS model to multi-timescale information and its effective capture of temporal structure. With an MAE of 0.05 on the normalized power values (scaled between 0 and 1), the model can predict the 10-min-ahead wind power with an average error of approximately 5% of the nominal power capacity.
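To relate a normalized error to physical units, the min-max scaling can be inverted. The sketch below assumes that power was scaled with the minimum and maximum of the raw data reported in Table 1; the source does not state the exact scaling bounds, so the resulting figure is indicative only.

```python
def denormalize(value, p_min, p_max):
    """Invert min-max scaling: map a value in [0, 1] back to kW."""
    return value * (p_max - p_min) + p_min


def error_in_kw(normalized_error, p_min, p_max):
    """An error is a difference, so only the range (not the offset) applies."""
    return normalized_error * (p_max - p_min)


# Unit #12: minimum -73 kW and maximum 8394 kW (Table 1), MAE 0.0579 (Table 4).
mae_kw = error_in_kw(0.0579, -73.0, 8394.0)
print(round(mae_kw, 1))  # about 490 kW under the assumed scaling bounds
```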

5. Conclusions

To improve the accuracy of ultra-short-term multistep power forecasts for offshore wind turbines, an ultra-short-term wind power forecast method based on VMD signal decomposition, multi-timescale inputs, and an improved Transformer time-series forecast model is proposed. Experiments are conducted with the historical data of four WTGs in an offshore wind farm in Guangdong, and the output power of the WTGs over the next four hours is forecast with a time resolution of 10 min. The experimental results show that the proposed VMD-MSI-GTTS model has the highest accuracy and the strongest stability for ultra-short-term multistep power forecasts of offshore wind turbines. The detailed conclusions are as follows:

The unique meteorological feature of wind direction is first standardized by a trigonometric function and then normalized by maximum–minimum normalization, which can normalize the data to the [0,1] interval and is in line with the actual positional distribution of the wind direction.

The VMD method can effectively extract the primary timing information from the raw wind speed signals, which in turn significantly improves the accuracy of the wind turbine power prediction results.

The multi-timescale input method can effectively capture the long- and short-term temporal relationships of input data at different scales and improve the power forecast of the model.

Compared to the GRU, the improved Transformer temporal forecast model is more capable of capturing the temporal relationships of input data over long-timescales, ultimately allowing the model to achieve better forecast results.

The proposed VMD-MSI-GTTS model also exhibits superior prediction performance compared to LSTM, CNN-LSTM, LSTM-Attention, and Informer temporal prediction methods, significantly outperforming these comparison models.
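The wind-direction preprocessing mentioned in the first conclusion can be sketched as follows. The source does not spell out the trigonometric transform, so using both sine and cosine components is an assumption, as are the function names.

```python
import math


def min_max(values):
    """Scale a list of numbers to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_wind_direction(degrees):
    """Map directions in degrees to normalized sine and cosine components."""
    radians = [math.radians(d) for d in degrees]
    sin_part = min_max([math.sin(r) for r in radians])
    cos_part = min_max([math.cos(r) for r in radians])
    return list(zip(sin_part, cos_part))


# Toy usage: 0 and 359 degrees are neighbouring directions.
for direction, code in zip([0, 90, 180, 270, 359],
                           encode_wind_direction([0, 90, 180, 270, 359])):
    print(direction, code)
```

Scaling the raw angle alone would place 0° and 359° at opposite ends of the interval, whereas the trigonometric form keeps them close, which matches the circular nature of wind direction.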

In the future, the effect of the timescale size of each input on the model’s forecast accuracy can be explored in detail, and the impact of seasonal periodicity on the forecast model can be considered to build a longer-time wind turbine power forecast model. Furthermore, optimization algorithms can be incorporated to optimize each hybrid deep learning model parameter to improve the model’s forecast performance.

Author Contributions

Conceptualization, A.W. and Z.G.; methodology, Z.G.; software, Y.J.; validation, S.M. and K.A.-B.; formal analysis, A.W.; investigation, K.A.-B.; resources, A.W.; data curation, Z.G.; writing—original draft preparation, A.W.; writing—review and editing, A.W., C.W. and K.A.-B.; visualization, F.Y.; supervision, A.W.; project administration, A.W.; funding acquisition, A.W. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially supported by the Special Support for Marine Economic Development of Guangdong Province (GDNRC [2022]28) and the National Natural Science Foundation of China (52372420).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed during the current study are included in the manuscript.

Conflicts of Interest

Author Chao Wei was employed by the company Huadian Electric Power Research Institute. Authors Yunsong Ji, Shidong Ma and Fareng Yao were employed by the company Guangdong Huadian Fuxin Yangjiang Offshore Wind Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Kılkış, Ş.; Krajačić, G.; Duić, N.; Montorsi, L.; Wang, Q.; Rosen, M.A. Research frontiers in sustainable energy, water, and environment development in a climate crisis. Energy Convers. Manag. 2019, 199, 111938.
2. IRENA. Renewable Energy Statistics 2022; International Renewable Energy Agency: Abu Dhabi, United Arab Emirates, 2022.
3. Chen, Y.; Lin, H. Overview of offshore wind power generation development in China. Sustain. Energy Technol. Assess. 2022, 53, 102766.
4. Xia, S.; Chan, K.W.; Luo, X.; Bu, S.Q.; Ding, Z.; Zhou, B. Optimal sizing of the energy storage system and its cost-benefit analysis for power grid planning with intermittent wind generation. Renew. Energy 2018, 122, 472–486.
5. Breeze, P. Wind Power Grid Integration and Environmental Issues. In Wind Power Generation; Elsevier: Amsterdam, The Netherlands, 2016.
6. Fan, Q.; Wang, X.; Yuan, J.; Liu, X.; Hu, H.; Lin, P. A Review of the Development of Key Technologies for Offshore Wind Power in China. J. Mar. Sci. Eng. 2022, 10, 929.
7. Santhosh, M.; Venkaiah, C.; Vinod Kumar, D.M. Current advances and approaches in wind speed and wind power forecasting for improved renewable energy integration: A review. Eng. Rep. 2020, 2, e12178.
8. Ahmed, A.; Khalid, M.W. A review of the selected applications of forecasting models in renewable power systems. Renew. Sustain. Energy Rev. 2019, 100, 9–21.
9. Singh, P.K.; Singh, N.; Negi, R. Wind Power Forecasting Using Hybrid ARIMA-ANN Technique. In Advances in Intelligent Systems and Computing; Springer: Singapore, 2019.
10. Zeng, J.; Qiao, W. Support vector machine-based short-term wind power forecasting. In Proceedings of the 2011 IEEE/PES Power Systems Conference and Exposition, Phoenix, AZ, USA, 20–23 March 2011; pp. 1–8.
11. Cai, R.; Xie, S.; Wang, B.; Yang, R.; Xu, D.; He, Y. Wind Speed Forecasting Based on Extreme Gradient Boosting. IEEE Access 2020, 8, 175063–175069.
12. Chang, W. Wind Energy Conversion System Power Forecasting Using Radial Basis Function Neural Network. Appl. Mech. Mater. 2013, 284–287, 1067–1071.
13. Bhaskar, K.; Singh, S.N. AWNN-Assisted Wind Power Forecasting Using Feed-Forward Neural Network. IEEE Trans. Sustain. Energy 2012, 3, 306–315.
14. Ren, B.; Chen, L.; Ma, H.; Xue, X. A Robust Short-term Wind Power Forecasting Algorithm Based on LSTM-XGBoost Model. In Proceedings of the 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021; pp. 2854–2859.
15. Olaofe, Z.O.; Folly, K.A. Wind power estimation using recurrent neural network technique. In Proceedings of the IEEE Power and Energy Society Conference and Exposition in Africa: Intelligent Grid Integration of Renewable Energy Resources (PowerAfrica), Johannesburg, South Africa, 9–13 July 2012; pp. 1–7.
16. Xiaoyun, Q.; Xiaoning, K.; Chao, Z.; Shuai, J.; Xiuda, M. Short-term wind power prediction based on deep Long Short-Term Memory. In Proceedings of the 2016 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), Xi’an, China, 25–28 October 2016; pp. 1148–1152.
17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
18. Wu, H.; Meng, K.; Fan, D.; Zhang, Z.; Liu, Q. Multistep short-term wind speed forecasting using Transformer. Energy 2022, 261, 125231.
19. Agarwal, P.; Shukla, P.; Sahay, K.B. A Review on Different Methods of Wind Power Forecasting. In Proceedings of the 2018 International Electrical Engineering Congress (iEECON), Krabi, Thailand, 7–9 March 2018; pp. 1–4.
20. Wu, Q.; Guan, F.; Lv, C.; Yongzhang, H. Ultra-short-term multistep wind power forecasting based on CNN-LSTM. IET Renew. Power Gener. 2021, 15, 1019–1029.
21. Cao, Y.; Gui, L. Multistep wind power forecasting model Using LSTM networks, Similar Time Series and LightGBM. In Proceedings of the 2018 5th International Conference on Systems and Informatics (ICSAI), Nanjing, China, 10–12 November 2018; pp. 192–197.
22. Lio, J.; Liancai, M.; Li, J.; Ma, L. Short-term wind power combined prediction based on EWT-SMMKL methods. Arch. Electr. Eng. 2023, 70, 801–817.
23. Zhang, F.; Guo, Z.; Sun, X.; Xi, J. Short-term wind power prediction based on EMD-LSTM combined model. IOP Conf. Ser. Earth Environ. Sci. 2020, 514, 42003.
24. Hu, C.; Zhao, Y.; Jiang, H.; Jiang, M.; You, F.; Liu, Q. Prediction of ultra-short-term wind power based on CEEMDAN-LSTM-TCN. Energy Rep. 2022, 8, 483–492.
25. Abedinia, O.; Lotfi, M.; Bagheri, M.; Sobhani, B.; Shafie Khah, M.; Catalão, J.P.S. Improved EMD-Based Complex Prediction Model for Wind Power Forecasting. IEEE Trans. Sustain. Energy 2020, 11, 2790–2802.
26. Lu, P.; Ye, L.; Pei, M.; Zhao, Y.; Dai, B.; Li, Z. Short-term wind power forecasting based on meteorological feature extraction and optimization strategy. Renew. Energy 2021, 184, 642–661.
27. Vashishtha, G.; Chauhan, S.; Kumar, S.; Kumar, R.; Zimroz, R.; Kumar, A. Intelligent Fault Diagnosis of Worm Gearbox Based on Adaptive CNN Using Amended Gorilla Troop Optimization with Quantum Gate Mutation Strategy. Knowl.-Based Syst. 2023, 280, 110984.
28. Chauhan, S.; Vashishtha, G.; Gupta, M.K.; Korkmaz, M.E.; Demirsöz, R.; Noman, K.; Kolesnyk, V. Parallel Structure of Crayfish Optimization with Arithmetic Optimization for Classifying the Friction Behaviour of Ti-6Al-4V Alloy for Complex Machinery Applications. Knowl.-Based Syst. 2024, 286, 111389.
29. Korkmaz, M.E.; Gupta, M.K.; Kuntoğlu, M.; Patange, A.D.; Ross, N.S.; Yılmaz, H.; Chauhan, S.; Vashishtha, G. Prediction and Classification of Tool Wear and Its State in Sustainable Machining of Bohler Steel with Different Machine Learning Models. Measurement 2023, 223, 113825.
30. Chauhan, S.; Vashishtha, G.; Kumar, R.; Zimroz, R.; Gupta, M.K.; Kundu, P. An Adaptive Feature Mode Decomposition Based on a Novel Health Indicator for Bearing Fault Diagnosis. Measurement 2024, 226, 114191.
31. Sharma, S.; Tiwari, S.; Singh, S. Integrated Approach Based on Flexible Analytical Wavelet Transform and Permutation Entropy for Fault Detection in Rotary Machines. Measurement 2021, 169, 108389.
32. Vashishtha, G.; Chauhan, S.; Singh, M.; Kumar, R. Bearing Defect Identification by Swarm Decomposition considering Permutation Entropy Measure and Opposition-based Slime Mould Algorithm. Measurement 2021, 178, 109389.
33. Chauhan, S.; Singh, M.; Kumar Aggarwal, A. An Effective Health Indicator for Bearing Using Corrected Conditional Entropy through Diversity-Driven Multi-Parent Evolutionary Algorithm. Struct. Health Monit. 2020, 20, 2525–2539.
34. Chauhan, S.; Singh, M.; Aggarwal, A.K. Bearing Defect Identification via Evolutionary Algorithm with Adaptive Wavelet Mutation Strategy. Measurement 2021, 179, 109445.
35. International Standard IEC61400-12-1: Wind Energy Generation Systems-Part 12-1: Power Performance Measurements of Electricity Producing Wind Turbines. 2017. Available online: https://cdn.standards.iteh.ai/samples/17046/768eb857b82a4b8d99aa87ef901d23ea/IEC-61400-12-1-2017.pdf (accessed on 1 May 2024).
36. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
37. Li, J.; Song, Z.; Wang, X.; Wang, Y.; Jia, Y. A novel offshore wind farm typhoon wind speed prediction model based on PSO–Bi-LSTM improved by VMD. Energy 2022, 251, 123848.
38. Kisvari, A.; Lin, Z.; Liu, X. Wind power forecasting—A data-driven method and gated recurrent neural network. Renew. Energy 2021, 163, 1895–1909.
39. Ookura, S.; Mori, H. An Efficient Method for Wind Power Generation Forecasting by LSTM in Consideration of Overfitting Prevention. IFAC-PapersOnLine 2020, 53, 12169–12174.
40. Chen, Y.; Wang, Y.; Dong, Z.; Su, J.; Han, Z.; Zhou, D.; Zhao, Y.; Bao, Y. 2-D regional short-term wind speed forecast based on CNN-LSTM deep learning model. Energy Convers. Manag. 2021, 244, 114451.
41. Yu, C.; Yan, G.; Yu, C.; Mi, X. Attention mechanism is useful in spatiotemporal wind speed prediction: Evidence from China. Appl. Soft Comput. 2023, 148, 110864.
42. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
43. Gong, M.; Yan, C.; Xu, W.; Zhao, Z.; Li, W.; Liu, Y.; Li, S. Short-term wind power forecasting model based on temporal convolutional network and Informer. Energy 2023, 283, 129171.

Figure 1. Raw power data.

Figure 2. Standard wind speed power curve for offshore wind turbines.

Figure 3. Wind speed distribution and contours in m/s.

Figure 4. Three-level decomposition of the VMD.

Figure 5. Four-level decomposition of the VMD.

Figure 6. GRU unit.

Figure 8. Decoder structure.

Figure 9. Wind turbine multistep power forecast method based on VMD-MSI-GTTS.

Figure 10. Change curve of the loss function in the training set of Unit 12.

Figure 11. Change curve of the loss function in the validation set of Unit 12.

Figure 12. Evaluation index of module analysis experimental results.

Figure 13. Box plot of the error distribution of the module analysis experiment results.

Figure 14. Error distribution of the comparative experimental results of different decomposition methods.

Figure 15. Evaluation index of the comparative test results.

Figure 16. Box plot of the error distribution of the comparative experiments results.

Figure 17. Line chart of the partial comparison of experimental results.

Table 1. Raw power data statistics.

	#3-6.8MW	#12-8.3MW	#14-6.8MW	#61-8.3MW
	Active Power, kW	Active Power, kW	Active Power, kW	Active Power, kW
Count	67,996	67,996	67,996	67,996
Mean	2372	3147	2238	2288
Standard deviation	2409	2796	2279	2623
Minimum	−55	−73	−42	−118
25%	178	595	330	0
Median	1473	2422	1444	1274
75%	4161	5265	3626	3702
Maximum	7198	8394	7232	8379

Table 2. Summary of the features, along with their descriptions and sources.

Feature Name	Description	Source
Wind Power	Active power output from the wind turbine	SCADA system
Wind Speed	Ambient wind speed	Anemometer on the meteorological mast or nacelle instrumentation
Wind Direction	Direction of wind flow	Wind vane on the meteorological mast or nacelle instrumentation
Turbine Yaw Angle	Orientation of the turbine nacelle	Yaw angle sensor on the nacelle
Air Temperature	Ambient air temperature	Meteorological sensor
Barometric Pressure	Atmospheric pressure	Barometric pressure sensor
Relative Humidity	Amount of water vapor in the air	Humidity sensor
Month	Month of the year (1–12)	Derived from timestamp
Day	Day of the month (1–31)	Derived from timestamp
Hour	Hour of the day (0–23)	Derived from timestamp
Minute	Minute of the hour (0–59)	Derived from timestamp

Table 3. Decomposition performance parameters.

Number of Decomposition Layers (k)	RMSE	$ρ_{p}$	Calculation Time/s
2	1.111	0.947	0.27
3	0.794	0.973	0.91
4	0.765	0.975	2.24
5	0.765	0.975	3.31
6	0.761	0.977	5.32
7	0.759	0.977	6.17

Table 4. Results of module analysis experiments.

Methods	Unit Number	Output Length	MAE	MSE	RMSE	R²
GRU	#3	4∗6	0.1035	0.0250	0.1580	0.758
	#12	4∗6	0.0987	0.0240	0.1549	0.787
	#14	4∗6	0.0759	0.0136	0.1164	0.696
	#61	4∗6	0.0763	0.0158	0.1258	0.683
VMD-GRU	#3	4∗6	0.0633	0.0128	0.1130	0.891
	#12	4∗6	0.0736	0.0154	0.1240	0.856
	#14	4∗6	0.0652	0.0099	0.0994	0.782
	#61	4∗6	0.0512	0.0081	0.0898	0.843
VMD-MSI-GRU	#3	4∗6	0.0635	0.0114	0.1067	0.903
	#12	4∗6	0.0618	0.0102	0.1061	0.912
	#14	4∗6	0.0499	0.0077	0.0880	0.841
	#61	4∗6	0.0505	0.0075	0.0873	0.848
VMD-MSI-GTTS	#3	4∗6	0.0621	0.0108	0.1038	0.910
	#12	4∗6	0.0579	0.0094	0.0971	0.921
	#14	4∗6	0.0425	0.0058	0.0763	0.891
	#61	4∗6	0.0464	0.0074	0.0858	0.873

Table 5. Results of the comparative experiments of different decomposition methods.

Methods	Unit Number	Output Length	MAE	MSE	RMSE	R²
EEMD-MSI-GTTS	#3	4∗6	0.1042	0.0235	0.1534	0.761
	#12	4∗6	0.1031	0.0262	0.1617	0.795
	#14	4∗6	0.07495	0.0127	0.1129	0.745
	#61	4∗6	0.1120	0.0215	0.1467	0.619
EWT-MSI-GTTS	#3	4∗6	0.0977	0.0232	0.1524	0.781
	#12	4∗6	0.0988	0.0223	0.1495	0.797
	#14	4∗6	0.0710	0.0131	0.1142	0.716
	#61	4∗6	0.0691	0.0145	0.1203	0.753
TVFEMD-MSI-GTTS	#3	4∗6	0.0661	0.0112	0.1055	0.905
	#12	4∗6	0.1017	0.0236	0.1536	0.781
	#14	4∗6	0.0515	0.0081	0.0897	0.856
	#61	4∗6	0.0744	0.0148	0.1215	0.798
VMD-MSI-GTTS	#3	4∗6	0.0621	0.0108	0.1038	0.910
	#12	4∗6	0.0579	0.0094	0.0971	0.921
	#14	4∗6	0.0425	0.0058	0.0763	0.891
	#61	4∗6	0.0464	0.0074	0.0858	0.873

Table 6. Comparative experimental results of five methods.

Methods	Unit Number	Output Length	MAE	MSE	RMSE	R²
LSTM	#3	4∗6	0.1093	0.0302	0.1737	0.730
	#12	4∗6	0.1054	0.0274	0.1656	0.769
	#14	4∗6	0.0737	0.0144	0.1100	0.699
	#61	4∗6	0.0801	0.0176	0.1326	0.673
CNN-LSTM	#3	4∗6	0.1051	0.0285	0.1689	0.750
	#12	4∗6	0.1128	0.0300	0.1732	0.761
	#14	4∗6	0.0789	0.0151	0.1228	0.651
	#61	4∗6	0.0782	0.0159	0.1263	0.672
LSTM-Attention	#3	4∗6	0.0891	0.0215	0.1467	0.815
	#12	4∗6	0.0983	0.0205	0.1430	0.815
	#14	4∗6	0.0672	0.0134	0.1156	0.653
	#61	4∗6	0.0707	0.0143	0.1194	0.688
Informer	#3	4∗6	0.0851	0.0187	0.1366	0.795
	#12	4∗6	0.1027	0.0235	0.1533	0.747
	#14	4∗6	0.0768	0.0154	0.1242	0.665
	#61	4∗6	0.0835	0.0178	0.1333	0.633
VMD-MSI-GTTS	#3	4∗6	0.0621	0.0108	0.1038	0.910
	#12	4∗6	0.0579	0.0094	0.0971	0.921
	#14	4∗6	0.0425	0.0058	0.0763	0.891
	#61	4∗6	0.0464	0.0074	0.0858	0.873
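
To summarize Table 6 in a single figure per baseline, the MAE values can be averaged over the four units and compared with VMD-MSI-GTTS. The sketch below performs that arithmetic on the tabulated values; averaging across units is an aggregation chosen here for illustration and is not a statistic reported in the paper.

```python
# MAE values copied from Table 6, ordered as units #3, #12, #14, #61.
mae_table6 = {
    "LSTM": [0.1093, 0.1054, 0.0737, 0.0801],
    "CNN-LSTM": [0.1051, 0.1128, 0.0789, 0.0782],
    "LSTM-Attention": [0.0891, 0.0983, 0.0672, 0.0707],
    "Informer": [0.0851, 0.1027, 0.0768, 0.0835],
    "VMD-MSI-GTTS": [0.0621, 0.0579, 0.0425, 0.0464],
}


def mean(values):
    return sum(values) / len(values)


proposed_mae = mean(mae_table6["VMD-MSI-GTTS"])

# Percentage reduction in unit-averaged MAE relative to each baseline.
reductions = {
    name: 100.0 * (mean(values) - proposed_mae) / mean(values)
    for name, values in mae_table6.items()
    if name != "VMD-MSI-GTTS"
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower mean MAE with VMD-MSI-GTTS")
```

The same pattern applies to the MSE, RMSE, and R² columns, and to Tables 4 and 5.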

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Wan, A.; Gong, Z.; Wei, C.; AL-Bukhaiti, K.; Ji, Y.; Ma, S.; Yao, F. Multistep Forecasting Method for Offshore Wind Turbine Power Based on Multi-Timescale Input and Improved Transformer. J. Mar. Sci. Eng. 2024, 12, 925. https://doi.org/10.3390/jmse12060925

