Civil airline fare prediction with a multi-attribute dual-stage attention mechanism

Zhichao Zhao ORCID: orcid.org/0000-0003-0402-7852^1,2,
Jinguo You^1,2,
Guoyu Gan^1,2,
Xiaowu Li^1,2 &
…
Jiaman Ding^1,2


Abstract

Airfare price prediction is one of the core facilities of the decision support system in civil aviation, and it depends on factors such as departure time, days of purchase in advance, and airline. Traditional airfare price prediction systems are limited by the nonlinear interrelationship of multiple factors and fail to deal with the impact of different time steps, resulting in low prediction accuracy. To address these challenges, this paper proposes a novel civil airline fare prediction system with a Multi-Attribute Dual-stage Attention (MADA) mechanism integrating different types of data extracted from the same dimension. In this method, a Seq2Seq model is used, with attention mechanisms added to both the encoder and the decoder. The encoder attention mechanism extracts multi-attribute data from time series, which are optimized and filtered by the temporal attention mechanism in the decoder to capture the complex time dependence of the ticket price sequence. Extensive experiments with actual civil aviation data sets were performed, and the results suggest that MADA outperforms airfare prediction models based on the Auto-Regressive Integrated Moving Average (ARIMA), random forest, or deep learning models in the MSE, RMSE, and MAE indicators. Across a large amount of experimental data, the prediction results of the proposed MADA model on different routes are at least 2.3% better than those of the other compared models.


1 Introduction

Air travel is becoming more and more popular in China, and numerous online booking channels for aircraft tickets are now available. It is well recognized that airlines set aircraft ticket prices based on the time of purchase. Airlines nowadays use complex strategies to dynamically allocate ticket prices, and these strategies take into account a variety of financial, marketing, commercial, and social factors. Because of the high complexity of the pricing model and the dynamic price changes, it is difficult for customers to buy tickets at the lowest price. Therefore, several applications have been developed recently to predict the ticket price, thereby guiding customers to buy tickets at the most appropriate time. Specifically, Hopper [23] is a relatively mature airfare forecast app with a reported accuracy of 95%, and 60% of its push messages tell its consumers that it is not yet the optimal time to order tickets.

Ticket price forecasts are of great reference value for the aviation industry. Ticket prices are determined by various factors, such as airlines, days of early purchase, as well as the departure time and airport. Airlines can adjust their ticket prices based on these factors to get the expected income for an effective pricing strategy [10, 19, 30].

However, several factors can limit the accuracy of air ticket price forecasts. First, the price of air tickets is a random walk time series, which is affected by the purchase time and other related factors. Second, the ARIMA model can capture only simple types of non-stationarity, whereas the series to be predicted are both non-linear and non-stationary. The time series data used for prediction is generally required to be regressive and periodic, which is not the case with air ticket prices. Finally, ticket prices are affected by many uncertain factors, such as the long-term impact of governmental regulations, the short-term impact of the market and the weather, as well as unexpected or international events. One example of such events is the novel coronavirus outbreak, which led the entire international airline industry to experience a downturn.

A linear quantile model [18] was proposed in 2014 to predict ticket prices. The model integrates four LR models to obtain the best fit, mainly to give passengers unbiased information about whether to purchase tickets now or wait longer for better prices. Besides, Tziridis et al. [28] used eight machine learning models to predict ticket prices, including ANNs, RF, SVM, and LR, and compared their results. Bagging Regression Tree was found to be the best model in their comparison, as it is stable and unaffected by various input feature sets. Moreover, deep learning has also demonstrated great promise and made significant progress in computer vision and natural language processing. Neural networks [1, 13, 37], instead of traditional methods, have become one of the latest trends in predicting airfare ticket prices.

To address this problem, this paper proposes a novel strategy for predicting air ticket prices based on the multi-attribute dual-stage attention (MADA) mechanism. The Seq2Seq neural network is adopted to encode and decode the input multi-dimensional fare-related attributes, and dual-stage attention mechanisms [31] are employed to extract the informative variables. The mean square error loss function is used to train the model on real data to learn the trend of fare changes.

Our main contributions are threefold.

1.
An improved multi-attribute dual-stage attention mechanism model is proposed. The first attention mechanism is performed by the encoder in the input time series, which selects important weight information for the decoder layer. Subsequently, the decoder layer uses such weight information in its temporal attention mechanism to produce the final prediction outputs.
2.
Various major models for airfare prediction were compared on real data sets. The results showed that the MADA model outperformed others in MSE, RMSE, and MAE indicators.
3.
Finally, the accuracy of civil aviation ticket price prediction was compared among different prediction models, and the influence of different data attributes under different hidden layer settings of the proposed model was also analyzed. In terms of RMSE, MSE, and MAE indicators, the MADA model outperformed the variant and benchmark models.

The rest of the paper is organized as follows. Section 2 introduces the related previous works about civil aviation fare prediction, and Section 3 introduces the relevant models, including a detailed description of model data preprocessing and the network models. Section 4 describes the algorithm in this paper. Subsequently, the experimental evaluations, including data sets, evaluation indexes, comparisons of experimental results, and ablation analysis, are illustrated in Section 5. Finally, in Section 6, the work in this paper and the direction for future research are summarized.

2 Related work

Airfare prediction is essentially a time series forecast. The Auto-Regressive Integrated Moving Average (ARIMA) method is a traditional method for time series prediction, where AR and MA eliminate positive and negative correlations, respectively. AR and MA offset each other, and they usually contain two elements that can avoid overfitting. Gordiievych et al. [12] proposed to use the ARIMA model to build a system that will help customers make purchase decisions by predicting the price of air tickets. Besides, another idea called Facebook-prophet [35] is similar to STL (Seasonal-Trend decomposition procedure based on Loess) decomposition, which can divide the time-series signals into seasonal, trend, and residue ones. STL decomposition, as the name suggests, is better suited to dealing with seasonal time series data than traditional time series models in terms of control and interpretability.

In Random forest (RF), multiple decision trees are integrated by ensemble learning. XGBoost (Extreme Gradient Boosting Decision Tree) [5] is a machine learning algorithm with higher robustness and efficiency, which can be applied to detect problems and predict time series. To aid the consumer decision-making process, Wohlfarth et al. [7] integrated clustering in an early phase and then applied several state-of-the-art supervised learning algorithms (classification tree (CART) and RF). They utilized CART to derive meaningful rules and RF to provide information about each feature’s relevance. To compare relative values with the total average price, Ren et al. [32] proposed utilizing LR, Naive Bayes, Softmax regression, and SVMs to develop a prediction model and categorize the ticket price into five bins (60% to 80%, 80% to 100%, 100% to 120%, and so on). The models were built using over 9,000 data points, comprising six features (such as the start of the departure week, the date of the price quote, and the number of stops in the itinerary). Using the LR model, the authors reported the best training error rate of around 22.9%. Alternatively, the prices were classified as “higher” or “lower” than the average using an SVM classification model.

XGBoost has been used to predict crude oil prices [39] and housing prices [29], and it is more effective than the traditional ARIMA. As XGBoost and deep learning techniques continue to develop, research on flight delays [13] has also integrated random tree models with deep learning. The flight delay experiments show that the LSTM cell is an effective structure for handling time sequences and that the random forest-based method achieves good classification accuracy (90.2% for binary classification) and overcomes the overfitting problem.

As deep learning techniques continue to develop, RNN (LSTM, GRU) time series analysis and CNN + RNN + attention prediction have been proposed as two prediction methods. Specifically, CNN captures short-term local dependence, while RNN captures long-term macro dependence. Attention assigns weights to essential periods or variables, and the representative models for this strategy are TPA-LSTM [33], LSTNet [22] and MTA-RNN [9]. Shih et al. [33] introduced a different attention mechanism that selects important time series and uses frequency domain information for multivariate forecasting. The authors made the attention dimension feature-wise so that the model can learn interdependencies among variables not only within the same time step, but also over all past times and series. Meanwhile, Guokun et al. [22] proposed a time series prediction model that combines the advantages of convolutional and recurrent neural networks with an auto-regressive component to improve robustness. Deep learning on passenger load factors and ticket prices, together with a multi-granularity temporal attention RNN structure [9], has also been reported to predict the relationship between ticket prices.

Besides, Seq2Seq can also predict time series, and it is combined with the attention mechanisms, such as DA-RNN [31], which adopts the dual-stage attention mechanism. In this way, the attention mechanism for time series is added in both the encoder and decoder layers in Seq2Seq. Experiments in this DA-RNN prove that increasing the temporal attention mechanism can capture more relevant input features. Another price forecast model based on the attention mechanism, TADA [6], has also been proposed and proved to produce more evident effects than a single-layer encoder with attention. The TADA splits influencing factors into internal and exterior features, and then uses the dual attention mechanism to forecast future price trends.

At present, although deep learning has received a lot of attention for stock forecasting [4, 16, 24, 27], it has received very little attention for forecasting civil aviation ticket prices. The RNN is the most widely used deep learning model for prediction, and it has gained a lot of traction. Recent years have seen a surge in approaches that use neural network structures to make the prediction results more accurate [17, 21, 38].

A summary of the discussed ticket price prediction models and related time series forecasting models is shown in Table 1.

Table 1 Summary of airline fare prediction models and correlation time series forecasting models


3 The airfare prediction model

In this section, we first describe the data preprocessing and then present the technical details of our proposed model, MADA. We also formulate civil airline fare prediction mathematically.

3.1 Data preprocessing and profiling

Firstly, the data is preprocessed, and the missing values within the data are removed. Based on the currency exchange rates, the face value of ticket prices is represented as Chinese yuan, and the non-numerical data is vectorized to meet the requirements for data input into the neural network. Subsequently, identical route data is selected for model training. The attributes related to airfare prediction include departure airport (Dpt_AirPt_Cd), arrival airport (Arrv_Airpt_Cd), boarding time (Datetime), route (Air_route), flight number (Flt_nbr), airline (Airln_cd), the total number of guests (Pax_Qty_y), and fare (FARE) (Table 2). Finally, the flight ticket price data is visualized, as is shown in Fig. 1. Figures 2, 3 and 4 show the price changes in different quarters of the flight segment. For a specific flight, the red line represents its daily average price, and the blue line represents its real-time price changes. Note that only airfare information of the first eight months is displayed.

Table 2 Input attributes


Figure 1 shows the dynamic fluctuation of actual ticket prices in the first eight months of a certain line segment. Figures 2, 3 and 4 show the price fluctuation trends of 3 quarters in a year, respectively. Particularly, Fig. 2 stands out because of the phenomenal growth around mid-February, which is because mid-February coincides with the traditional Chinese Spring Festival in that year. In the second quarter of Fig. 3, the air ticket prices exhibit periodic changes; however, as can be seen in the third quarter of Fig. 4, the variation in air tickets becomes relatively stable. Therefore, it is fair to deduce that generally speaking, the airfare of this particular flight segment shows periodic fluctuations. At the beginning of the year, the ticket prices vary significantly because of the holidays, and then they exhibit periodic changes. After the middle of the year, which is the off-season period, the ticket prices of this segment reveal a stable increase.

In one execution session of the model, the original data is cleaned to remove duplicate or missing values, and a table with information such as departure time, airline, flight number, route, and guests’ number is created.

Subsequently, the non-numerical values are One-Hot encoded and vectorized to serve as the inputs to the neural network. After that, the data is divided into training and test sets in an 8:2 ratio.
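The encoding and split described above can be illustrated with a minimal sketch. The helper names `one_hot_encode` and `split_train_test` and the airline codes are hypothetical, and the split is taken in record order without shuffling, which is an assumption because the paper does not state how the 8:2 split is drawn.

```python
def one_hot_encode(values):
    """Map each distinct category to a one-hot vector.

    Categories are sorted so that the column order is deterministic.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors


def split_train_test(rows, train_ratio=0.8):
    """Split rows into training and test parts in record order (assumed, no shuffling)."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Hypothetical airline codes, for illustration only.
airlines = ["CA", "MU", "CZ", "MU", "CA", "CZ", "MU", "CA", "CZ", "MU"]
categories, encoded = one_hot_encode(airlines)
train, test = split_train_test(encoded)
print(categories, len(train), len(test))
```

Keeping the record order in the split avoids leaking later fares into the training set, which is a common choice for time series but not one the source confirms.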

Note that the dimensionality of the selected feature attributes is also reduced. First, the relevance of each attribute to the fare is visualized (Fig. 5) with the feature selection module in the Scikit-Learn machine learning toolbox. Fig. 5 shows that the airline (Airln_cd) is the most significant factor affecting ticket fares, followed by the departure airport (Dpt_AirPt_Cd) and the route (Air_route). By removing the relatively non-critical dimensions, the model incurs lower overhead and generalizes better.

3.2 The deep learning model for fare prediction

The structure of the MADA mechanism model proposed in this paper is shown in Fig. 6.

(1) Network model: To perform multi-dimensional airfare prediction, a multi-attribute dual-stage attention mechanism model is proposed in this paper.

Ticket price forecasts need multidimensional data, which are represented as x₁,x₂,…,x_n. The input is $X = \left (x_{1},x_{2},\ldots ,x_{n}\right )^{T}=(x^{1},x^{2},\ldots ,x^{p})\in R^{n\ast p}$, where p represents the window size; X holds the processing results of the multiple attributes over the time periods t. After that, X is input into the LSTM layer to obtain the feature vector, which is integrated with the feature weight $a_{t}$ at time t to obtain the output $\widetilde {Z_{t}}$ in the encoder layer.

Next, the input for the decoder layer of the LSTM network is the time series $Z_{t}=(\widetilde {Z_{1}},\widetilde {Z_{2}},\widetilde {Z_{3}},...,\widetilde {Z_{n}}) \in R^{n\ast p}$ of time t in the encoder layer. The decoding results are integrated with the feature weight $l_{t}$ at time t to get the context vector $C_{t-1}$. Finally, the predicted value $\widetilde {Y_{t-1}}$ is obtained from the final output layer in LSTM.

Following the steps above, the processed data can be input into the LSTM network for training. The model employs a supervised learning methodology, with the multi-dimensional data (airline, flight number, departure airport, arrival airport, and flight path) representing X and the ticket price data representing Y. In this way, model training is enabled, and the MADA model can learn the patterns of change in the relevant data. During model training, various parameters need to be adjusted to optimize the model, so that both the training data and the test data can achieve the best possible results.

(2) Loss function: The mean square error loss, MSELoss, is used as the loss function. Let the vectors s and y be the predicted and actual values, respectively. MSELoss calculates the error (a scalar e) and the gradient of e with respect to s.

$$ e=\operatorname{MSELoss}(s, y)=\frac{1}{n} {\sum}_{t=1}^{n}\left( s_{t}-y_{t}\right)^{2} $$

The gradients are:

$$ \begin{array}{c} \frac{d e}{d s}=\frac{2}{n}\left( \left( s_{1}-y_{1}\right),\left( s_{2}-y_{2}\right), \ldots,\left( s_{n}-y_{n}\right)\right) \\ \frac{d e}{d y}=-\frac{d e}{d s} \end{array} $$

The mean square error loss is calculated by the distance between the target and the calculated values, and the gradient of each step is obtained by backward propagation. After multiple iterations, the minimum loss is obtained.
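As a small worked example of the loss and its gradient with respect to s, the following sketch evaluates both on a three-element vector. The function names and the numbers are illustrative only and are not taken from the paper's implementation.

```python
def mse_loss(s, y):
    """Mean square error e = (1/n) * sum_t (s_t - y_t)^2."""
    n = len(s)
    return sum((s_t - y_t) ** 2 for s_t, y_t in zip(s, y)) / n


def mse_grad(s, y):
    """Gradient de/ds = (2/n) * (s - y), one entry per element of s."""
    n = len(s)
    return [2.0 * (s_t - y_t) / n for s_t, y_t in zip(s, y)]


s = [3.0, 5.0, 2.0]   # predicted values (illustrative)
y = [2.0, 5.0, 4.0]   # actual values (illustrative)
loss = mse_loss(s, y)          # (1 + 0 + 4) / 3
grad = mse_grad(s, y)          # (2/3) * (1, 0, -2)
print(loss, grad)
```

The gradient is zero exactly where the prediction matches the target, and its sign points in the direction that increases the error, so a gradient step moves each prediction toward its target.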

(3) Activation function: The Leaky ReLU [26] and Softmax are used as the activation functions. Particularly, the Leaky ReLU can extract the feature information hidden in the data and map it to the corresponding ranges. The equation of the Leaky ReLU function is:

$$ y_{i}=\left\{\begin{array}{ll} x_{i}, & \text { if } x_{i} \geq 0 \\ \frac{x_{i}}{a_{i}}, & \text { if } x_{i}<0 \end{array}\right. $$

(1)

Compared with the ReLU [11], a general activation function, the Leaky ReLU is used in this paper because it divides negative inputs by a coefficient a_i instead of setting them to zero. Empirically, the Leaky ReLU is more efficient than the ReLU.
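The piecewise definition in (1) can be sketched as follows. The coefficient value `a=100.0` (a negative-side slope of 0.01) is an assumption for illustration, since the paper does not report the value it uses.

```python
def leaky_relu(x, a=100.0):
    """Leaky ReLU as in Eq. (1): x if x >= 0, otherwise x / a.

    The coefficient a > 1 is assumed here; the source does not give its value.
    """
    return x if x >= 0 else x / a


def relu(x):
    """Plain ReLU, shown only for comparison."""
    return x if x >= 0 else 0.0


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), leaky_relu(value))
```

Unlike the ReLU, the negative branch keeps a small non-zero output and therefore a non-zero gradient.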

On the other hand, Softmax can convert all the input values into others within the range of 0–1. Its equation is:

$$ y_{i}=S(z)_{i}=\frac{e^{z_{i}}}{{\sum}_{j=1}^{C} e^{z_{j}}}, i=1, \ldots, C $$

(2)

z is the output of the previous layer, which serves as the input of Softmax. C is the dimension of the predicted object (the number of categories), and y_i is the probability that it belongs to the i-th category.
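A minimal sketch of (2) is given below. Subtracting the maximum score before exponentiating is a standard numerical-stability step added here; it does not change the result and is not mentioned in the source.

```python
import math


def softmax(z):
    """Softmax as in Eq. (2): y_i = exp(z_i) / sum_j exp(z_j)."""
    shift = max(z)                              # stability shift (added here)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]    # illustrative layer outputs
probs = softmax(scores)
print(probs, sum(probs))
```

Every output lies strictly between 0 and 1, the outputs sum to 1, and the ordering of the inputs is preserved.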

(4) Optimizer: Adam (Adaptive Moment Estimation) [20] is a first-order optimization algorithm that can be used instead of the conventional stochastic gradient descent. Furthermore, iterations based on the training data can be used to adjust neural network weights. Adam is essentially RMSprop with a momentum term, which uses the first-order and second-order moment estimations of the gradient to realize dynamic adjustments of the learning rate for each parameter. It is especially beneficial because, after bias correction, the learning rate at each iteration has a defined range, resulting in reasonably stable parameters. Its equations are as follows:

$$ m_{t}=\mu * m_{t-1}+(1-\mu) * g_{t} $$

(3)

$$ n_{t}=v * n_{t-1}+(1-v) * {g_{t}^{2}} $$

(4)

$$ \widehat{m_{t}}=\frac{m_{t}}{1-\mu^{t}} $$

(5)

$$ \widehat{n_{t}}=\frac{n_{t}}{1-v^{t}} $$

(6)

$$ {\varDelta} \theta_{t}=-\frac{\widehat{m_{t}}}{\sqrt{\widehat{n_{t}}}+\epsilon} * \eta $$

(7)

The symbols in the above formulas are as follows: $g_{t}$ is the gradient at timestep t; μ, v ∈ [0,1) are the exponential decay rates for the moment estimates; m₀ and n₀ are the initial first-order and second-order moment vectors; 𝜃₀ is the initial parameter vector; t is the timestep; η is the step size; and 𝜖 is a small constant that prevents division by zero.

Among them, (3) and (4) are the first-order and second-order moment estimations of the gradient, which can be considered estimates of the expectations $E[g_{t}]$ and $E[g_{t}^{2}]$, respectively. Besides, (5) and (6) are bias corrections to (3) and (4), so that they can be approximated as unbiased estimates of the expectations. The moment estimations are computed directly from the gradients, do not require additional memory, and adjust dynamically according to the gradients. In (7), the factor that multiplies η forms a dynamic constraint on the learning rate and stays within a definite range.
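The update can be traced on a scalar parameter with the sketch below. It uses the standard Adam form, with (1 − μ) in the first-moment update and the bias-corrected second moment under the square root; the default values of `eta`, `mu`, `v`, and `eps` are the commonly used ones and are assumptions, not settings reported in the paper.

```python
import math


def adam_step(theta, grad, m, n, t, eta=0.001, mu=0.9, v=0.999, eps=1e-8):
    """One Adam update for a scalar parameter, following Eqs. (3) to (7)."""
    m = mu * m + (1 - mu) * grad            # first-order moment, Eq. (3)
    n = v * n + (1 - v) * grad ** 2         # second-order moment, Eq. (4)
    m_hat = m / (1 - mu ** t)               # bias correction, Eq. (5)
    n_hat = n / (1 - v ** t)                # bias correction, Eq. (6)
    theta = theta - eta * m_hat / (math.sqrt(n_hat) + eps)   # update, Eq. (7)
    return theta, m, n


# Toy objective f(theta) = (theta - 3)^2, chosen only for illustration.
theta, m, n = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, n = adam_step(theta, grad, m, n, t, eta=0.05)
print(theta)
```

At the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly η regardless of the gradient's scale. This is the bounded step size that the text describes.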

3.3 Dual-stage attention mechanism

Aside from the Seq2Seq model, the model also integrates the attention mechanism into its time series dimension in its encoder and decoder layers.

Encoder with input attention

The input time series is X = (x¹,x²,…,x^p) = (x₁,x₂,…,x_n)^T ∈ R^(n∗p), where n is the window of the input time series, h_t ∈ R^m represents the encoder layer’s hidden state at time t, and m is the size of the hidden state.

$$ \mathbf{h}_{t}=LSTM\left( \mathbf{h}_{t-1}, \mathbf{x}_{t}\right) $$

(8)

LSTM is used as the encoder model. Each LSTM unit contains a forget gate, an input gate, and an output gate. The output is expressed as h_t.

Here, $(\mathbf{h}_{t-1};\mathbf{x}_{t}) \in R^{m+n}$ is the result of concatenating the previous hidden state $\mathbf{h}_{t-1}$ with the current input $\mathbf{x}_{t}$. LSTM is chosen as the recurrent unit because it mitigates the vanishing and exploding gradient problems, enabling better capture of the input time series.

The attention mechanism, originally proposed in computer vision, helps improve the model’s prediction accuracy. Within the time series X = (x¹,x²,…,x^p) = (x₁,x₂,…,x_n)^T ∈ R^(n∗p), deterministic attention is used to extract the relevant input dimensions. The previous hidden state $\mathbf{h}_{t-1}$ and cell state $\mathbf{v}_{t-1}$ of the encoder LSTM serve as the attention input. The equations are as follows:

$$ {b_{t}^{p}}=\mathbf{U}_{e}^{\top} \tanh \left( \mathbf{W}_{e}\left[\mathbf{h}_{t-1} ; \mathbf{v}_{t-1}\right]+\mathbf{B}_{e} \mathbf{x}^{p}\right) $$

(9)

and

$$ {\alpha_{t}^{p}}=\frac{\exp \left( {b_{t}^{p}}\right)}{{\sum}_{i=1}^{n} \exp \left( {b_{t}^{i}}\right)} $$

(10)

Among them, the parameters $\mathbf{U}_{e} \in R^{T}$, $\mathbf{W}_{e} \in R^{T\ast 2m}$, and $\mathbf{B}_{e} \in R^{T\ast T}$ must be learned, and the bias term is omitted from equation (9). Equation (10) calculates the attention weight at time t, using Softmax to normalize the scores. This input attention extracts the time series information that the decoder needs for the subsequent steps. After that, the input time series can be extracted adaptively following equation (11):

$$ \tilde{\mathbf{x}}_{t}=\left( {\alpha_{t}^{1}} {x_{t}^{1}}, {\alpha_{t}^{2}} {x_{t}^{2}}, \cdots, {\alpha_{t}^{n}} {x_{t}^{n}}\right)^{\top} $$

(11)

Then, using the equation (12), the hidden state at time t is updated:

$$ \mathbf{h}_{t}=LSTM\left( \mathbf{h}_{t-1}, \tilde{\mathbf{x}}_{t}\right) $$

(12)

x_t is replaced by $\tilde {\mathbf {x}}_{t}$, and the weight of the input time series is calculated.
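Equations (9) to (11) can be sketched in plain Python as below. The shapes follow the stated dimensions (U_e of length T, W_e of size T by 2m, B_e of size T by T), each attribute series is taken to have length T, and random numbers stand in for the learned parameters; the function names and all values are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Normalize scores into weights that sum to 1, as in Eq. (10)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_attention(h_prev, v_prev, series, x_t, U_e, W_e, B_e):
    """Score every attribute series, then reweight the inputs at time t.

    series[k] is the k-th attribute over the window; x_t[k] is its value at time t.
    """
    state_term = matvec(W_e, h_prev + v_prev)        # W_e [h_{t-1}; v_{t-1}]
    scores = []
    for x_k in series:
        series_term = matvec(B_e, x_k)               # B_e x^k
        hidden = [math.tanh(a + b) for a, b in zip(state_term, series_term)]
        scores.append(sum(u * h for u, h in zip(U_e, hidden)))   # Eq. (9)
    alpha = softmax(scores)                          # Eq. (10)
    x_tilde = [a * x for a, x in zip(alpha, x_t)]    # Eq. (11)
    return alpha, x_tilde


def random_matrix(rows, cols, rng):
    """Random placeholder for a learned weight matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, m, num_attrs = 4, 3, 2            # window length, hidden size, attribute count
U_e = [rng.uniform(-0.5, 0.5) for _ in range(T)]
W_e = random_matrix(T, 2 * m, rng)
B_e = random_matrix(T, T, rng)
series = random_matrix(num_attrs, T, rng)
h_prev, v_prev = [0.0] * m, [0.0] * m
t = 2
x_t = [x_k[t] for x_k in series]
alpha, x_tilde = input_attention(h_prev, v_prev, series, x_t, U_e, W_e, B_e)
print(alpha, x_tilde)
```

The reweighted vector would then replace x_t as the LSTM input in (12), so attributes with larger weights contribute more to the next hidden state.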

Decoder with attention

To predict the final result $\tilde {\mathbf {Y}}_{T}$, both the encoder and the decoder are used. However, the robustness of a plain encoder-decoder model degrades as the input time series becomes longer. Therefore, the decoder adopts a temporal attention mechanism based on the concatenation of the hidden state $\mathbf {d}_{t-1}$ and the cell state $\mathbf {v}_{t-1}^{\prime }$ from the previous time step. The parameters of the temporal attention network in the decoder, U_d, W_d, and B_d, are learned during training. In equation (13), ${\beta _{t}^{i}}$ is the attention score of the i-th encoder hidden state, and equation (14) normalizes it into the attention weight ${l_{t}^{i}}$ at time t.

$$ {\beta_{t}^{i}}=\mathbf{U}_{d}^{\top} \tanh \left( \mathbf{W}_{d}\left[\mathbf{d}_{t-1} ; \mathbf{v}_{t-1}^{\prime}\right]+\mathbf{B}_{d} \mathbf{h}_{i}\right) $$

(13)

and

$$ {l_{t}^{i}} =\frac{\exp \left( {\beta_{t}^{i}}\right)}{{\sum}_{j=1}^{T} \exp \left( {\beta_{t}^{j}}\right)} $$

(14)

Each encoder hidden state h_i is used as an input to the decoder layer and weighted by its attention weight to obtain the context vector C_t as a weighted average. The full set of hidden states is [h₁,h₂,h₃,...,h_T].

$$ \mathbf{C}_{t}={\sum}_{i=1}^{T} {l_{t}^{i}} \mathbf{h}_{i} $$

(15)

C_t is the context vector at time t. Once the context vectors are obtained, they are combined with the target series $(y_{1},y_{2},...,y_{T-1})$, which gives:

$$ \tilde{y}_{t-1}=\tilde{\mathbf{w}}^{\top}\left[y_{t-1} ; \mathbf{C}_{t-1}\right]+\tilde{b} $$

(16)

Here, $y_{t-1}$ is the input of the decoder layer and $\mathbf{C}_{t-1}$ is the context vector.

$$ \mathbf{d}_{t}=LSTM\left( \mathbf{d}_{t-1}, \tilde{y}_{t-1}\right) $$

(17)

Then $\tilde {y}_{t-1}$ and the previous decoder hidden state $\mathbf{d}_{t-1}$ are fed into the LSTM network to obtain $\mathbf{d}_{t}$. Subsequently, $\mathbf{d}_{t}$ is concatenated with $\mathbf{C}_{t}$ in the fully connected neural network (FC), and $\tilde {\mathbf {Y}}_{T}$ is the final prediction result from training.

$$ \tilde{\mathbf{Y}}_{T}=FC\left( \mathbf{d}_{t}, \mathbf{C}_{t}\right) $$

(18)
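The temporal attention in equations (13) to (15) can be sketched as below. The parameter shapes (U_d of length m, W_d of size m by 2q for a decoder hidden size q, B_d of size m by m) are assumptions, since the source does not state them, and the small fixed numbers stand in for learned weights.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Normalize scores into weights that sum to 1, as in Eq. (14)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(d_prev, v_prev, encoder_states, U_d, W_d, B_d):
    """Weight the encoder hidden states h_1..h_T and build the context vector."""
    state_term = matvec(W_d, d_prev + v_prev)        # W_d [d_{t-1}; v'_{t-1}]
    scores = []
    for h_i in encoder_states:
        hidden = [math.tanh(a + b)
                  for a, b in zip(state_term, matvec(B_d, h_i))]
        scores.append(sum(u * h for u, h in zip(U_d, hidden)))   # Eq. (13)
    weights = softmax(scores)                                    # Eq. (14)
    size = len(encoder_states[0])
    context = [sum(w * h_i[j] for w, h_i in zip(weights, encoder_states))
               for j in range(size)]                             # Eq. (15)
    return weights, context


# Illustrative values: T = 3 encoder states of size m = 2, decoder size q = 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d_prev, v_prev = [0.1, -0.2], [0.0, 0.3]
W_d = [[0.2, -0.1, 0.4, 0.0], [0.0, 0.3, -0.2, 0.1]]
B_d = [[0.5, -0.3], [0.1, 0.2]]
U_d = [0.7, -0.4]
weights, context = temporal_attention(d_prev, v_prev, encoder_states, U_d, W_d, B_d)
print(weights, context)
```

Because the weights are non-negative and sum to 1, every component of the context vector lies between the smallest and largest corresponding components of the encoder states. With all-zero scores the context reduces to their plain average.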

And that concludes the introduction of the training process for the MADA model.

4 Algorithms

The entire execution process of our MADA model consists of Algorithm 1 (Multi-attribute Data Processing) and Algorithm 2 (Multi-attribute Dual-stage Attention Mechanism model).

Algorithm 1 takes the raw multi-attribute input X and determines the data type of each attribute; the attributes are then preprocessed according to their types. Numerical data is normalized with MinMax to the range [-1,1], and non-numerical data is One-Hot encoded. Finally, after such preprocessing, the values are concatenated to obtain the output $\tilde {Z}$, which serves as the input for Algorithm 2.
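A minimal sketch of this type-dependent preprocessing is shown below. The attribute names are taken from Table 2, but the values, the function names, and the handling of a constant column are illustrative assumptions and not the authors' Algorithm 1.

```python
def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly rescale numeric values to [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant column: map everything to the midpoint (a choice made here).
        return [(low + high) / 2.0 for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (value - v_min) * scale for value in values]


def preprocess(columns):
    """Scale numeric columns, one-hot encode the others, and concatenate per row."""
    n_rows = len(next(iter(columns.values())))
    rows = [[] for _ in range(n_rows)]
    for name, values in columns.items():
        if all(isinstance(value, (int, float)) for value in values):
            for row, scaled in zip(rows, min_max_scale(values)):
                row.append(scaled)
        else:
            categories = sorted(set(values))
            for row, value in zip(rows, values):
                row.extend(1.0 if value == c else 0.0 for c in categories)
    return rows


# Hypothetical records: passenger count and airline code.
columns = {"Pax_Qty_y": [100, 150, 200], "Airln_cd": ["CA", "MU", "CA"]}
Z = preprocess(columns)
print(Z)
```

Each output row is the concatenation of one scaled number and one one-hot vector, which corresponds to the concatenated output that is passed on to Algorithm 2.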

In Algorithm 2, $\tilde {Z}$ is the input; to get the final prediction result $\tilde {\mathbf {Y}}_{T}$, $\mathbf{d}_{t-1}$ and $\mathbf{v}_{t-1}$ are fed into the decoder layer. MSELoss then calculates the difference between the real and predicted values, and the learning parameters are updated by back propagation to gradually improve the model’s capability of generalization.

5 Experimental evaluations

In this section, we conduct experiments on real civil aviation datasets to showcase the advantage of MADA in the task of airline fare prediction.

5.1 Datasets

For experimental evaluations, our model was implemented in Python 3.7.6 and performed on an Ubuntu 18.04 with a 2.5GHz Intel Core i7 CPU and 8GB memory. The data set was a two-year anonymous airfare record from a real airline, and the training set contained more than 1.7 million data pieces, including essential attributes such as airline, flight number, departure time, and passenger volume. The data set contains more than 200 routes, and the paper selects one of the representative routes for discussion.

5.2 Evaluation metrics

The mean squared error (MSE), root mean square error (RMSE) and mean absolute error (MAE) were applied as the assessment metrics. Their calculation equations are:

$$ \text{MSE}=\frac{1}{m} {\sum}_{i=1}^{m}\left( y_{\text {test}}^{(i)}-\hat{y}_{\text {test}}^{(i)}\right)^{2} $$

$$ \text{RMSE}=\sqrt{\frac{1}{m} {\sum}_{i=1}^{m}\left( y_{\text {test}}^{(i)}-\hat{y}_{\text {test}}^{(i)}\right)^{2}} $$

$$ \text{MAE}=\frac{1}{m} {\sum}_{i=1}^{m}\left|y_{\text {test}}^{(i)}-\hat{y}_{\text {test}}^{(i)}\right| $$

Computed between the predicted and actual fares, these three metrics serve as reference values when tuning the model for better generalization capability.
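The three metrics can be computed as in the following sketch. The fares are made-up numbers for illustration and are not results from the paper.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error over the test samples."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error over the test samples."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


y_true = [500.0, 620.0, 580.0, 710.0]   # illustrative actual fares
y_pred = [510.0, 600.0, 590.0, 700.0]   # illustrative predicted fares
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

RMSE and MAE are in the same unit as the fare, while MSE is in squared units and penalizes large errors more heavily.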

5.3 Baselines

ARIMA, RF, XGBoost, CNN-LSTM, CNN-LSTM + Attention, and other models were compared with the MADA prediction model in this paper. ARIMA, RF, and XGBoost were executed without any further configurations.

LSTM-CNN [15]

This approach first uses the LSTM model, followed by CNN for parameter classification. Initially, this model was used to predict gold prices.

CNN-LSTM [14]

This model uses CNN first, and then connects to LSTM. These two sub-modules are combined to form a CNN-LSTM model. For time series forecasting, the CNN-LSTM model is frequently used.

CNN-LSTM+Attn [25]

In this model, CNN is used to extract multidimensional attributes, and the results are entered into LSTM, then the final results are output through the attention mechanism. The model is used to predict PM2.5 concentration.

Seq2Seq [34]

Seq2Seq was first proposed for language translation problems. This paper extends its application to time series prediction.

Seq2Seq+Attention [3]

Seq2Seq with attention was first proposed for language translation problems with longer input sequences. This paper extends its application to time series prediction.

MADA

The model proposed in this paper adopted multi-attribute data preprocessing, and the attention mechanism was added to both the encoder and the decoder layers.

5.4 Experimental results

Generally speaking, the new MADA model produced significantly lower MSE, RMSE, and MAE results than the previous models, implying that it can better predict price fluctuation trends. The experiments predominantly used the PyTorch deep learning framework for extensive comparisons in terms of airfare forecasts and analyses. With repeated parameter adjustments, an optimal model was trained with Adam of unequal steps until the model parameters converged. During the experiments, the fare of one route was used to compare the prediction results produced by the involved models.

Table 3 shows the data comparison results when the time window is T= 30 for a 7-day forecast (n= 7 days).
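One way to build training samples for this setting is a sliding window, sketched below. Pairing each 30-day history with the following 7 days as the target is an assumption, because the paper does not spell out how the windows are formed, and `make_windows` is a hypothetical helper.

```python
def make_windows(series, window=30, horizon=7):
    """Pair each run of `window` past values with the next `horizon` values."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        past = series[start:start + window]
        future = series[start + window:start + window + horizon]
        samples.append((past, future))
    return samples


prices = list(range(100))        # stand-in for 100 days of daily fares
samples = make_windows(prices)   # T = 30 input days, n = 7 forecast days
print(len(samples))              # 100 - 30 - 7 + 1 = 64
```

Consecutive samples overlap by all but one day, so a long fare history yields many training pairs.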

5.4.1 Performance comparison

The performance comparison results can be seen in Table 3, in which the indicators for comparison are MSE, RMSE, and MAE. Judging from the experimental results, the predictions produced by the MADA model were markedly better, with much lower MSE, RMSE, and MAE than those obtained from traditional machine learning. In particular, the MADA model integrates multi-dimensional data and captures more of the relevant input attributes, which leads to more accurate predictions.

Table 3 Comparison of MADA and various models


Moreover, the MADA mechanism is more effective at extracting time series information than common deep learning models. However, it took more time to train a MADA model, and the prediction results still need considerable tuning to reach the best accuracy. In short, the MADA model proposed in this paper is preferable to the other methods for predicting air ticket prices.

5.4.2 Effectiveness comparison

To compare the prediction effects, data from one flight was used for forecasting. After data preprocessing, models such as CNN-LSTM, Seq2Seq, and MADA were used for prediction. Figure 7 shows the trends of the MSE, RMSE, and MAE obtained by the three models.

The X-axis in Fig. 7 shows the time window used. The data was used to predict the next-day price under time windows of 10, 15, 20, 25, and 30 days. The other models, such as CNN-LSTM + Attn, performed unstably across the different time windows, whereas MADA was more stable and performed better.

Based on the experimental results, predicting the next-day fare works best when the model is trained on data from the past 15 days.
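The window-size comparison can be sketched as a simple loop that evaluates the next-day error for each candidate window and keeps the one with the lowest error. In the sketch below, `mean_forecast` is a deliberately trivial stand-in forecaster (it is not MADA), and the fare series is synthetic; both are assumptions used only to show the evaluation procedure.

```python
def mean_forecast(history):
    """Stand-in forecaster: predict the mean of the window (not MADA)."""
    return sum(history) / len(history)


def next_day_mae(series, window, forecaster):
    """Average absolute next-day error over all windows of a given size."""
    errors = []
    for start in range(len(series) - window):
        history = series[start:start + window]
        target = series[start + window]
        errors.append(abs(forecaster(history) - target))
    return sum(errors) / len(errors)


# Synthetic fares with a weekly pattern (illustrative values only).
fares = [500 + 10 * (day % 7) for day in range(60)]
scores = {w: next_day_mae(fares, w, mean_forecast) for w in (10, 15, 20, 25, 30)}
best_window = min(scores, key=scores.get)
```

With a real model, `forecaster` would be replaced by a model trained for the corresponding window size, and the same loop would reproduce the kind of comparison plotted in Fig. 7.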

Figure 8 compares the predictions from CNN-LSTM, CNN-LSTM + Attn, Seq2Seq, and Seq2Seq + Attn, and Fig. 9 depicts the visualization results of the MADA model. The upper parts of the graphs show the predictions on the training set and the test set, while the lower parts show the visualization of the upper half of the test set. The graphs indicate that the MADA model produces better results for civil aviation fare prediction.

5.4.3 Ablation study

The ablation study involved experiments on the proposed MADA model with different structures. The original model was altered as follows.

MADA_nAttn

Encoders and decoders with 16, 32, 64, 100, or 128 hidden layers were used to build the model. Both the encoder and the decoder were LSTMs.

MADA_sAttn

A Seq2Seq model with 16, 32, 64, 100, or 128 hidden layers was built, and the temporal attention mechanism was then added to the decoder layer.

MADA

A Seq2Seq model with 16, 32, 64, 100, or 128 hidden layers was used. Attention mechanisms were then added to both the encoder and decoder layers to build the model.
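The three structural variants and five sizes form a small experiment grid, which can be written down as configuration objects. The sketch below is one possible encoding; the class name `AblationConfig` and its field names are hypothetical, and `hidden_size` simply holds the value that the text refers to as the number of hidden layers.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AblationConfig:
    name: str
    encoder_attention: bool
    decoder_attention: bool
    hidden_size: int  # the "number of hidden layers" in the text


# (encoder attention, decoder attention) per variant, as described above.
VARIANTS = {
    "MADA_nAttn": (False, False),
    "MADA_sAttn": (False, True),
    "MADA": (True, True),
}
HIDDEN_SIZES = (16, 32, 64, 100, 128)

configs = [
    AblationConfig(name, enc, dec, hidden)
    for (name, (enc, dec)), hidden in product(VARIANTS.items(), HIDDEN_SIZES)
]
print(len(configs))  # 15 runs: 3 variants x 5 sizes
```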

The experimental results are shown in Fig. 10. For the MADA_nAttn model, MSE, RMSE, and MAE initially decreased as the number of hidden layers increased, so its predictions improved. However, after reaching its best results at 64 hidden layers, MADA_nAttn produced less satisfactory results as the number of hidden layers continued to increase. This is because the model does not fit the data set well: the hidden information within the data cannot be learned from a single attention, and this limitation was exacerbated by a growing number of hidden layers. The MADA model proposed in this paper, on the other hand, cannot learn the hidden information in the data with a small number of hidden layers (no more than 32). As the number of hidden layers grew, however, the MADA model outperformed the other deep learning models.

In addition, the importance of multidimensional attributes was studied. Different attributes were fed into the MADA model for training, with the time window set to T = 15 and n = 1. The following variants were defined.

MADA_nEx

There were no weekend or holiday attributes in the multi-attribute data.

MADA_nAttn_allData

The data fed into the model was complete, but no attention mechanism was added.

MADA_sAttn_allData

The data fed into the model was complete, but only temporal attention was added to the decoder layer.
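The MADA_nEx variant amounts to removing the weekend and holiday attributes from each input record before training. A minimal sketch of that preprocessing step follows; the attribute names `is_weekend`, `is_holiday`, `days_ahead`, and `fare`, as well as the helper `drop_attributes`, are hypothetical, since the actual field names of the data set are not given in this section.

```python
def drop_attributes(records, excluded):
    """Return copies of the records without the excluded attribute keys."""
    return [{k: v for k, v in row.items() if k not in excluded}
            for row in records]


# Hypothetical attribute names and values, for illustration only.
records = [
    {"fare": 520.0, "days_ahead": 7, "is_weekend": 0, "is_holiday": 0},
    {"fare": 610.0, "days_ahead": 6, "is_weekend": 1, "is_holiday": 1},
]

# Input for a MADA_nEx-style run: weekend and holiday attributes removed.
no_ex = drop_attributes(records, excluded={"is_weekend", "is_holiday"})
```

The original `records` list is left unchanged, so the complete data can still be used for the `_allData` variants.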

Table 4 compares the MADA variants on the different indicators; only the average results are reported.

Table 4 Comparison of the variant MADA models


Table 4 shows that MADA_nEx was less accurate than the original MADA, indicating that multi-attribute input plays a vital role in prediction accuracy. In addition, MADA_sAttn was more accurate than MADA_nAttn, suggesting that the attention mechanism is relevant to prediction.

6 Conclusion

Currently, the prediction of civil aviation ticket prices remains rather inaccurate and unreliable. To address this problem, a prediction method based on MADA is proposed.

Judging from the experimental results, the MADA-based method produces more accurate predictions of civil aviation ticket prices than the traditional methods. Moreover, training on multidimensional data makes the predictions more accurate. Combined with the dual-stage attention mechanism, the implicit information in the time series can be extracted more fully.

Although the experimental results show that MADA is effective, some problems remain. First, ticket prices change with other uncontrollable attributes; for example, weather conditions also affect ticket prices. Second, although this paper has studied airline fare prediction extensively, optimal purchase time prediction has not been studied. Predicting the best time to buy air tickets may become the next research direction in this field. In addition, from the airlines' perspective, issues such as demand prediction and price discrimination require further in-depth research.

In the future, more accurate prediction methods should be explored to address the current imperfections. Deep learning methods can predict civil aviation ticket prices, so that ticket buyers can choose a more reasonable time to purchase. At the same time, airlines can increase their revenue through predictive models. There is a tradeoff between customers saving money and companies increasing revenue. Therefore, a prediction model is needed that can predict optimal ticket prices that benefit both customers and airlines.

References

Abdella JA, Zaki N, Shuaib K, Khan F (2019) Airline ticket price and demand prediction: a survey. Journal of King Saud University - Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2019.02.001. https://www.sciencedirect.com/science/article/pii/S131915781830884X
Asteriou D, Hall SG (2016) ARIMA Models and the Box-Jenkins Methodology
Bahdanau D, Cho KH, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd international conference on learning representations. ICLR 2015 - Conference Track Proceedings. 1409.0473, pp 1–15
Bao W, Yue J, Rao Y (2017) A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLOS ONE 12(7). https://doi.org/10.1371/journal.pone.0180944
Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: The 22nd ACM SIGKDD international conference
Chen T, Yin H, Chen H, Wu L, Wang H, Zhou X, Li X (2018) TADA: trend alignment with dual-attention multi-task recurrent neural networks for sales prediction. In: Proceedings - IEEE International Conference on Data Mining. ICDM 2018-Novem. https://doi.org/10.1109/ICDM.2018.00020, pp 49–58
Clémençon S, Casellato X, Roueff F, Wohlartfh T (2012) A data-mining approach to travel price forecasting. In: ICMLA
Dal Molin Ribeiro MH, Coelho LDS (2020) Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Applied Soft Computing 86. https://doi.org/10.1016/j.asoc.2019.105837
Yujing D, Zhihao W, Youfang L (2020) Flight passenger load factors prediction based on RNN using multi-granularity time attention. Computer Engineering v.46 No.509(01):300–307. https://doi.org/10.19678/j.issn.1000-3428.0053569
Ding J (2018) Research on ticket pricing strategy of shandong airlines. PhD thesis
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Gordon G, Dunson D, Dudík M (eds) Proceedings of the fourteenth international conference on artificial intelligence and statistics, PMLR, Fort Lauderdale, FL, USA. Proceedings of Machine Learning Research, vol 15, pp 315–323. http://proceedings.mlr.press/v15/glorot11a.html
Gordiievych A, Shubin I (2015) Forecasting of airfare prices using time series. In: 2015 information technologies in innovation business conference. ITIB 2015 - Proceedings. https://doi.org/10.1109/ITIB.2015.7355055, pp 68–71
Gui G, Liu F, Sun J, Yang J, Zhou Z, Zhao D (2020) Flight delay prediction based on aviation big data and machine learning. IEEE Trans Veh Technol 69(1):140–150. https://doi.org/10.1109/TVT.2019.2954094
Guo X, Zhao Q, Zheng D, Ning Y, Gao Y (2020) A short-term load forecasting model of multi-scale cnn-lstm hybrid neural network considering the real-time electricity price. Energy Reports 6:1046–1053
He Z, Zhou J, Dai HN, Wang H (2019) Gold price forecast based on LSTM-CNN model. In: Proceedings - IEEE 17th international conference on dependable, autonomic and secure computing, IEEE 17th international conference on pervasive intelligence and computing, IEEE 5th international conference on cloud and big data computing. 4th Cyber Science. https://doi.org/10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00188, pp 1046–1053
Hoseinzade E, Haratizadeh S (2019) Cnnpred: cnn-based stock market prediction using a diverse set of variables. Expert Syst Appl 129(SEP.):273–285
Huang B, Liang Y, Qiu X (2021) Wind power forecasting using attention-based recurrent neural networks: a comparative study. IEEE Access 9:40432–40444. https://doi.org/10.1109/ACCESS.2021.3065502
Janssen T (2014) A linear quantile mixed regression model for prediction of airline ticket prices
Juan S (2017) Analysis of the rule of civil aviation passenger reservation and parallel flights management. PhD thesis
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Lago J, De Ridder F, De Schutter B (2018) Forecasting spot electricity prices: deep learning approaches and empirical comparison of traditional algorithms. Applied Energy 221:386–405. https://doi.org/10.1016/j.apenergy.2018.02.069
Lai G, Chang WC, Yang Y, Liu H (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: 41st international ACM SIGIR conference on research and development in information retrieval, SIGIR 2018. https://doi.org/10.1145/3209978.3210006, pp 95–104
Lalonde F (2020) Hopper - book flights & hotels on mobile. https://www.hopper.com/
Li M (2019) The study of stock market prediction based on deep learning networks. PhD thesis
Li S, Xie G, Ren J, Guo L, Yang Y, Xu X (2020) Urban PM2.5 concentration prediction via attention-based CNN–LSTM. Applied Sciences 10(6):1953
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proc. icml, Citeseer, vol 30, p 3
Pang X, Zhou Y, Wang P, Lin W, Chang V (2018) An innovative neural network approach for stock market prediction. The Journal of Supercomputing (1):1–21
Papadakis M (2012) Predicting airfare prices using machine learning. Stanford Assignment
Peng Z, Huang Q, Han Y (2019) Model research on forecast of second-hand house price in Chengdu based on XGboost algorithm. In: 2019 IEEE 11th international conference on advanced infocomm technology, ICAIT 2019. https://doi.org/10.1109/ICAIT.2019.8935894, pp 168–172
Qiang Z (2015) Research on the several issues about pricing model in airline revenue management. PhD thesis
Qin Y, Song D, Cheng H, Cheng W, Jiang G, Cottrell GW (2017) A dual-stage attention-based recurrent neural network for time series prediction. In: IJCAI International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/366. 1704.02971, vol 0, pp 2627–2633
Ren R, Yang Y, Yuan S (2014) Prediction of airline ticket price. University of Stanford
Shih SY, Sun FK, Yi Lee H (2019) Temporal pattern attention for multivariate time series forecasting. Machine Learning 108(8-9):1421–1441. https://doi.org/10.1007/s10994-019-05815-0. 1809.04206
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems 4(January):3104–3112. 1409.3215
Taylor SJ, Letham B (2018) Forecasting at scale. Am Stat 72(1):37–45. https://doi.org/10.1080/00031305.2017.1380080
Wang J, Sun X, Cheng Q, Cui Q (2021) An innovative random forest-based nonlinear ensemble paradigm of improved feature extraction and deep learning for carbon price forecasting. Science of the Total Environment 762. https://doi.org/10.1016/j.scitotenv.2020.143099
Wang T, Pouyanfar S, Tian H, Tao Y, Alonso M, Luis S, Chen SC (2019) A framework for airfare price prediction: a machine learning approach. In: 2019 IEEE 20th international conference on information reuse and integration for data science (IRI). https://doi.org/10.1109/IRI.2019.00041, pp 200–207
Yang Z, Yan W, Huang X, Mei L (2020) Adaptive temporal-frequency network for time-series forecasting. IEEE Trans Knowl Data Eng, pp 1–1. https://doi.org/10.1109/TKDE.2020.3003420
Zhou Y, Li T, Shi J, Qian Z (2019) A CEEMDAN and XGBOOST-based approach to forecast crude oil prices. Complexity 2019. https://doi.org/10.1155/2019/4392785


Acknowledgements

This work was supported in part by the Natural Science Foundation of China (62062046) and the Enterprise cooperation project of Yunnan Province (649320190106).

Author information

Authors and Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
Zhichao Zhao, Jinguo You, Guoyu Gan, Xiaowu Li & Jiaman Ding
Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, 650500, China
Zhichao Zhao, Jinguo You, Guoyu Gan, Xiaowu Li & Jiaman Ding


Corresponding author

Correspondence to Jinguo You.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Zhao, Z., You, J., Gan, G. et al. Civil airline fare prediction with a multi-attribute dual-stage attention mechanism. Appl Intell 52, 5047–5062 (2022). https://doi.org/10.1007/s10489-021-02602-0


Accepted: 07 June 2021
Published: 03 August 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10489-021-02602-0
