Research Article: Study of Flight Departure Delay and Causal Factor Using Spatial Analysis

Hindawi

Journal of Advanced Transportation


Volume 2019, Article ID 3525912, 11 pages
https://doi.org/10.1155/2019/3525912

Research Article
Study of Flight Departure Delay and Causal Factor Using
Spatial Analysis

Shaowu Cheng,1 Yaping Zhang,1 Siqi Hao,1 Ruiwei Liu,2 Xiao Luo,3 and Qian Luo3
1 School of Transportation Science and Technology, Harbin Institute of Technology, Harbin 150001, China
2 State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China
3 The Second Research Institute of Civil Aviation Administration of China, Chengdu 610041, China

Correspondence should be addressed to Siqi Hao; siqihao47@163.com

Received 15 February 2019; Revised 5 May 2019; Accepted 23 May 2019; Published 4 June 2019

Academic Editor: Eneko Osaba

Copyright © 2019 Shaowu Cheng et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Analysis of flight delay and causal factors is crucial in maintaining airspace efficiency and safety. However, delay samples are not
independent since they always show a certain aggregation pattern. Therefore, this study develops a novel spatial analysis approach to
explore delay and its causal factors that accounts for this dependence and the problems it may cause, namely, error correlation
and the lagged effect of causal factors on delay. The study first explores the delay aggregation pattern by measuring
and quantifying the spatial dependence of delay. The spatial error model (SEM) and spatial lag model (SLM) are then established
to address the error correlation and the variable lag effect, respectively. Results show that the SEM and SLM achieve a better fit than
ordinary least squares (OLS) regression, which indicates the effectiveness of considering dependence by employing spatial analysis.
Moreover, the outcomes suggest that, aside from the well-known weather and flow control factors, delay-reduction strategies also
need to pay more attention to reducing the impact of delay at the previous airport.

1. Introduction

With the rapid development of the civil aviation industry, airspace has become increasingly crowded. This crowdedness causes increasingly frequent delays in most major airports worldwide. This situation seriously affects airports, airlines, and passengers. From 2007 to 2017, the annual flights in China consistently increased from 3.65 million to 10.83 million, with an average increasing rate of approximately 12.2% in the past five years. Meanwhile, the rate of flights arriving on time decreased from 83.19% in 2007 to 71.67% in 2017. The annual cost of flight delays in China was estimated to be more than $7.4 billion. Such high economic costs of delay necessitate delay causal factor analysis and delay-reduction strategies.

Several approaches have been taken to analyze the factors that affect flight arrival and departure delay. Allan et al. [1] studied several determining causes of flight delay at the Newark International Airport (EWR) using a comprehensive approach. The results show that adverse weather conditions, low ceilings, and low visibility conditions strongly influence flight delays. Similarly, Asfe et al. [2] investigated the major causal factors of flight delays by ranking different factors using the analytical hierarchical process. They found technical failure and delayed entries to be two of the most influential factors. Based on the identification of causal factors, further research explored the quantitative effect of each factor on flight delay. By analyzing the characteristics of flight departure and arrival delays by constructing probability density functions, Mueller et al. [3] explored several causal factors of delays, such as traffic volume, aircraft type, aircraft maintenance, airline operations, weather conditions, change of procedures en route, capacity constraints, customer service issues, and late aircraft or crew arrival. The results show that weather contributed to 69% of the delays. Different results can be achieved by different methods and variables; research results of Kwan and Hansen [4] show that airport congestion contributed to approximately 32% of the average delays, in which a series of econometric models was established to identify the key causal factors of flight delays, including airport congestion, total traffic, and en route weather. In addition to identifying the causal factors and their quantitative effect on flight delay, more studies focus on the development of models
to determine the probability of aircraft delay. Wesonga et al. [5] proposed and evaluated a multiple parametric approach, which includes the apparently significant meteorological and aviation parameters, to predict the probability of aircraft delay. Recent research and development efforts in delay probability prediction seek to develop an asymmetric Bayesian logit model to take the asymmetric distribution pattern of the dependent variable into consideration (see Perez-Rodriguez et al. [6]). By using data from BTS and IATA, that article corroborates the necessity and superiority of the proposed asymmetric Bayesian logit model, as well as identifying new significant factors affecting the probability of arrival delay.

In addition to traditional statistical methods, machine learning algorithms were used by several studies. Bayesian networks are a commonly used approach for building delay models that explore the delay propagation mode and estimate delay [7, 8]. Artificial neural networks were also utilized to examine the relationship between departure delay and different causal factors, compared with linear and nonlinear regressions [9]. Deep learning models have also been investigated for air traffic delay prediction tasks [10]. Moreover, a number of studies attempted to determine the major causal factors of flight delays by detecting the time series data trend. Abdel-Aty et al. [11] applied the “two-stage approach” to detect periods of regularly repeating patterns in their data and to identify the factors correlated with them. Tu et al. [12] employed a smoothing spline model to identify the relationship between seasonal trends, random effects, and daily delay propagation pattern. Delay propagation has also been deeply investigated by many studies to help understand air congestion [13–15] and alleviate flight delay [16, 17]. The effects of day and time were assumed to be additive, and the residuals were assumed to be identically and independently distributed in the study.

However, delays show a certain aggregation pattern in the temporal dimension: high delays are normally clustered, and low delays tend to be surrounded by low delays. In other words, the delay values of samples with a shorter distance between them are normally more similar than the delay values of samples with a longer distance between them. The correlation between two delay values depends on their spatial attributes, such as spatial location and spatial distance. Without doubt, there is a high degree of spatial dependence among delays in a space organized by hour and by day. Given that most of the aforementioned methods were based on certain assumptions which either ignore or simplify the correlation of samples in the dataset, Diana [18] initially introduced the approach of spatial analysis for delay prediction, which is able to take the spatial dependence in every direction into account. In the study, delay was considered as a spatially distributed variable in a space coordinated by day and time. A spatial error model (SEM) was built to consider spatial dependence in error.

Actually, flight departure delay is a complex problem with substantial direct causal factors and many concealed indirect causal factors. Flight departure delay is caused by the abovementioned factors, as well as by the flight delays that occur earlier [12], as the operation resources required by the current flight, such as the crew, aircraft, and passenger gates, might have been utilized by previously delayed flights. This resources correlation may lead to daily delay propagation. The spatial dependence exists in every direction since the aggregation is observed in both day of week and hour of day, which probably leads to error correlation and a variable lag effect of causal factors on delay [19].

Motivated by the exploration of the main causal factors of flight departure delays in consideration of correlation between delay samples, our study analyzes departure delay as a geographic problem instead of a statistical problem by assuming delay as a spatially distributed variable organized by hour and by day. Causal factor analysis using spatial analysis allows for the existence of spatial dependence in variables, which solves the problem of sample correlations among hours and days simultaneously. Specifically, spatial regression models were built to absorb the delay spatial dependence by adding a spatial independent variable. The spatial lag model (SLM) and spatial error model (SEM) are established in our study to address the variable lagged effect and the error correlation, respectively. Comparisons between the SLM, the SEM, and the OLS estimation are also conducted.

This paper is structured as follows. Section 2 introduces the spatial analysis methodology. Section 3 describes the data sources, defines the variables, and describes the data-processing methodology. Section 4.1 shows the exploration analysis of flight departure delay with a distribution map and trend analysis. Section 4.2 demonstrates identification of delay pattern. Section 4.3 maps the semivariogram to quantify the spatial dependence of flight departure delay. Based on the results of Section 4.3, Section 4.4 illustrates spatial prediction considering spatial autocorrelation by employing the ordinary kriging method. Section 4.5 discusses the establishment of the classical regression model, SEM, and SLM, as well as the comparative analysis of the three models to explore the main causal factors of flight departure delays. Finally, this paper is concluded with a summary.

2. Methods

This study employs the spatial analysis method to explore the delay distribution pattern and causal factors of flight departure delays while considering delay spatial dependence. Delay is assumed to be a spatially distributed variable. Spatial analysis is a quantifying technique used in the study of spatial variables [20].

2.1. Delay Pattern Analysis

(1) Exploring Delay Distribution. The first step to analyze delay pattern is to explore the distribution. By defining a space with $x$ coordinate of day of week and $y$ coordinate of hour, the delay is added to each hour unit as an attribute. The delay distribution can be plotted with different colors representing the delay minutes. The time when an intense delay occurred can be recognized in the distribution map. 3D trend analysis can be used to visualize the departure delay distribution and trend in the temporal dimension.

(2) Identifying the Pattern of Delay. The pattern of delay is then identified by calculating Moran’s $I$ and general $G$
to measure the degree of delay spatial dependence among observations. Positive autocorrelation suggests that the values of one hour unit and its neighbors are similar. Negative autocorrelation suggests that the values of one hour unit and its neighbors are different. No autocorrelation suggests that the values are randomly distributed over the space. Moran’s $I$ is calculated as

$$I = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{S^2 \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \tag{1}$$

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{2}$$

$$Z = \frac{I - E(I)}{\sqrt{\mathrm{VAR}(I)}} \tag{3}$$

where $I$ is the value of Moran’s $I$, $x_i$ is the total minutes of departure delay during the hour unit $i$, and $w_{ij}$ is the element of the spatial weight matrix for hour units $i$ and $j$. The $Z$ value is generally used to test Moran’s $I$ value against the null hypothesis that no spatial autocorrelation exists.

Most of the spatial weight matrices are built based on spatial connectivity and spatial distance. The weight matrix in this study is generated based on distance measured by the inverse Euclidean distance between two hour units. The value of Moran’s $I$ ranges from −1 to 1. Moran’s $I$ identifies the similarity between units with delay and the spatial distribution pattern. However, it cannot distinguish high- from low-value clusters. General $G$ identifies the two different patterns of spatial cluster; it is computed as

$$G(d) = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(d)\,x_i x_j}{\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j} \tag{4}$$

$$Z = \frac{G - E(G)}{\sqrt{\mathrm{VAR}(G)}}$$

When the $Z$ value is significant, a general $G$ value that is greater than its expected value $E(G)$ indicates a high-value cluster, whereas a general $G$ value that is less than its expected value indicates a low-value cluster. A general $G$ value that is equal to its expected value indicates no autocorrelation.

The cluster type in the flight departure delay is then identified, and the hot and cold spots of flight departure delay are explored.

(3) Quantifying Delay Spatial Dependence. After measuring the degree of spatial dependence, the semivariogram is modeled to quantify the departure delay spatial dependence and to analyze its random and structural properties. Departure delay is considered as a regionalized variable because it is correlated with the hour and the day. Structural property indicates the existence of the autocorrelation between the departure delay at location $x$ and at location $x + h$ ($h$ is the distance from $x$). Semivariance calculates the average difference in departure delays between pairs of hour units at a given interval [9]; it is computed as

$$\gamma(h) = \frac{1}{2N(h)}\sum_{i=1}^{N(h)}\left[Z(x_i) - Z(x_i + h)\right]^2, \tag{5}$$

where $Z(x_i)$ is the total minutes of departure delay of location $x_i$; $Z(x_i + h)$ is the total minutes of departure delay of the locations at distance $h$ from $x_i$; and $N(h)$ is the number of locations at distance $h$ from $x_i$.

(4) Delay Prediction. After the spatial dependence structure of a variable is determined, the measured data can be used to estimate the variable at unmeasured locations. This interpolation method is known as kriging interpolation. Based on the unbiased estimation and the minimum variance principle, the kriging interpolation method can quantify the spatial dependence between the known sample and the estimated point according to the statistical characteristics and spatial variation of the sample.

2.2. Causal Factor Analysis. After the identification of delay dependence, causal factor analysis is performed using spatial analysis, which allows for the existence of spatial dependence in variables. To explore the causal factors of flight departure delay, spatial econometric models were built to absorb the delay spatial dependence by adding a spatial independent variable, and the outcomes of the SLM, SEM, and classical regression model are compared.

(1) Classical Regression Model. A classical regression model can be written as

$$Y = \beta \mathbf{X} + \varepsilon, \tag{6}$$

where $Y$ represents the total minutes of departure delay at the target airport and $\mathbf{X}$ represents the factor variables. $\beta$ represents the effect of the independent variables on the dependent variable, and $\varepsilon$ is the random error term vector, which follows a normal distribution.

(2) SEM. SEM is able to consider the spatial dependence in error terms by adding a spatial error term as an explanatory variable. The SEM takes the following form:

$$Y = \beta \mathbf{X} + \varepsilon, \qquad \varepsilon = \lambda \mathbf{W} \varepsilon + \mu, \tag{7}$$

where $Y$ is the total minutes of departure delay and $\mathbf{X}$ is the factor variables. $\beta$ represents the effect of the independent variables on the dependent variable, $\varepsilon$ is the random error term vector, $\lambda$ is the spatial error coefficient, $\mathbf{W}$ is the spatial weight matrix of the error term generated based on distance measured by the inverse Euclidean distance between two hour units, and $\mu$ is the random error term vector, which follows a normal distribution.
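To make the error structure of the SEM in (7) concrete, the following is a minimal Python sketch, not the authors' implementation. It builds an inverse-Euclidean-distance weight matrix on a small, hypothetical day-by-hour grid and generates spatially correlated errors that satisfy ε = λWε + μ. Row standardization of W, the grid size, the value of λ, and the fixed-point solver are all assumptions made for illustration; the paper does not specify them.

```python
import math
import random


def inverse_distance_weights(coords):
    """Inverse-Euclidean-distance weights with a zero diagonal.

    Rows are standardized to sum to 1 (an assumption, not stated in the paper).
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
        row_sum = sum(w[i])
        w[i] = [value / row_sum for value in w[i]]
    return w


def mat_vec(w, v):
    """Multiply the weight matrix by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sem_errors(w, mu, lam, n_iter=200):
    """Solve eps = lam * W eps + mu by fixed-point iteration.

    Converges for |lam| < 1 when W is row-standardized.
    """
    eps = list(mu)
    for _ in range(n_iter):
        w_eps = mat_vec(w, eps)
        eps = [lam * we + m for we, m in zip(w_eps, mu)]
    return eps


# Hypothetical grid of hour units: 3 days x 4 hours, coordinates (day, hour).
coords = [(day, hour) for day in range(3) for hour in range(4)]
W = inverse_distance_weights(coords)

rng = random.Random(0)
mu = [rng.gauss(0.0, 1.0) for _ in coords]  # independent normal disturbances
lam = 0.5  # illustrative spatial error coefficient, not an estimate

eps = sem_errors(W, mu, lam)

# Largest violation of eps = lam * W eps + mu across hour units.
residual = max(
    abs(e - (lam * we + m)) for e, we, m in zip(eps, mat_vec(W, eps), mu)
)
```

The sketch only illustrates how neighboring hour units share part of their error through W; estimating λ and β from data requires a dedicated spatial econometrics package.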

Table 1: Frequency and effect of each factor of flight departure delay.

Factor  Frequency  Rank of frequency  Average delay (min)  Rank of average delay
T       21         7                  206.5                4
TL      34         5                  235.8                2
W       30         6                  238.6                1
WL      161        2                  154.8                6
WD      9          9                  176.1                5
WR      13         8                  212.2                3
CF      650        1                  99.2                 11
CR      71         4                  126.9                8
A       115        3                  134.8                7
F       3          11                 110.7                10
P       2          12                 98.0                 12
D       7          10                 118.0                9
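The two rank columns of Table 1 follow directly from the frequency and average-delay columns. As a small illustration, the Python sketch below recomputes them from the tabulated values (rank 1 is the largest value). The helper name `rank_descending` is hypothetical, and the code is only a consistency check on the table, not part of the original analysis.

```python
# (frequency, average delay in minutes) per factor, copied from Table 1.
table1 = {
    "T": (21, 206.5),
    "TL": (34, 235.8),
    "W": (30, 238.6),
    "WL": (161, 154.8),
    "WD": (9, 176.1),
    "WR": (13, 212.2),
    "CF": (650, 99.2),
    "CR": (71, 126.9),
    "A": (115, 134.8),
    "F": (3, 110.7),
    "P": (2, 98.0),
    "D": (7, 118.0),
}


def rank_descending(values):
    """Map each key to its rank, with rank 1 for the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {key: position + 1 for position, key in enumerate(ordered)}


frequency_rank = rank_descending({k: v[0] for k, v in table1.items()})
delay_rank = rank_descending({k: v[1] for k, v in table1.items()})

# Print factors from most to least frequent with both ranks.
for factor in sorted(table1, key=frequency_rank.get):
    print(factor, frequency_rank[factor], delay_rank[factor])
```

Reading the two ranks side by side shows the contrast discussed in the text: flow control (CF) is the most frequent factor but ranks near the bottom in average delay, while weather at the target airport (W) is the reverse.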

(3) SLM. SLM is able to consider the spatial autocorrelation in the delay variable by adding a spatial lag variable as an explanatory variable. The SLM takes the following form:

$$Y = \rho \mathbf{W} Y + \beta \mathbf{X} + \varepsilon, \tag{8}$$

where $Y$ is the total minutes of departure delay and $\mathbf{X}$ is the factor variables. $\beta$ represents the effect of the independent variables on the dependent variable; $\mathbf{W}$ is a spatial weight matrix of the dependent variables generated based on distance measured by the inverse Euclidean distance between two hour units; $\rho$ is the spatial regression coefficient, which reflects the effects of the delay in the neighbor hours $\mathbf{W}Y$ on the delay in one hour $Y$; and $\varepsilon$ is the random error term vector, which follows a normal distribution.

3. Data Collection

3.1. Data Sample. The data in this study are obtained from the database of an international hub airport in China in June 2016. To maintain the privacy of the institution, the name of the airport is not revealed. In June 2016, 8788 flights departed from the target airport, among which 18 flights returned, 51 flights were canceled, and 5357 flights (60.96%) were delayed for more than 15 minutes; 3180 flights (36.19%) were delayed for more than half an hour; 1528 flights (17.39%) were delayed for more than one hour; and 489 flights (5.56%) were delayed for more than two hours. The most severe delay lasted for 888 minutes. Approximately 70% of the delays were within 60 minutes. The data are organized by day of week and hour of day. To demonstrate the spatial dependence of delay distribution intuitively, the study assumed delay as a spatially distributed variable. The space is defined with day of week as the x coordinate and hour of day as the y coordinate. Compared with the total number of flights (8788), there were few flights (72) from 0:00 to 7:00, and hour units with fewer than five flights are not considered since they could bias the average. The study area covers 7:00 to 24:00, including a total of 510 hour units with departure delays.

3.2. Definitions of Variables. The first step of variable construction is to find the factors affecting flight departure delay. The flight delay determinants considered in previous studies include weather, delay propagation, flight schedule, airplane shortage, air route, airplane type, flight order, air traffic flow, hub airport, ability of the airline to pay debt, ability of the airline to profit, load factors of the airline, load rate of the airline, and other factors [21, 22]. Chinese aviation determined the following factors of flight delay. Technical failure includes technical failure at the target airport (T) and technical failure at the previous airport (TL). Weather refers to weather conditions at the target airport (W), at the previous airport (WL), at the destination airport (WD), and en route (WR). Control factors include flow control (CF) and route restriction (CR). Other factors include the airline (A), airport facility (F), passenger (P), and capacity allocation (D).

Then, nominal factors are selected by calculating the frequency and the effect of each factor in our dataset. The effect of each factor of flight delay in Table 1 is measured by the average delay minutes caused by that factor. As shown in Table 1, the frequency of the flow control factor is significantly higher than the others, but the average delay minutes caused by the flow control factor is lower. Conversely, the frequency of the weather condition at the target airport and the technical failure at the previous airport are significantly lower, with high average delay minutes.

All factors are classified into three categories: high frequency and low effect, low frequency and high effect, and low frequency and low effect. Flow control, airline factor, route restriction, and weather condition at the previous airport caused most of the departure delays; however, these factors can usually be controlled well, and the delay can be eliminated in a short time. The effects of weather conditions at the target airport and en route and the technical failure at the target, previous, and destination airports, although they did not happen often, have dramatic impacts with long departure delays. Airport facility, passenger, and capacity allocation are the minor reasons for flight departure delay, and we will not focus on these factors in the following discussion.

In addition, delay can be related to time period (morning, afternoon, night, and weekday or weekend). The total traffic and passengers are also important factors. Aviation industry experts are interviewed about the limitations of the data

Table 2: Descriptive statistics and definitions of the variables used in the model.

Variable  Definition                                                                        Mean     s.d.     Min  Max
Delay     Total minutes of departure delay                                                  587.824  428.433  0    2341
T         Technical failure at the target airport (equals 1 if technical failure occurs)    0.039    0.194    0    1
TL        Technical failure at the previous airport (equals 1 if technical failure occurs)  0.065    0.246    0    1
W         Weather condition at the target airport (equals 1 if weather is adverse)          0.018    0.132    0    1
WL        Weather condition at the previous airport (equals 1 if weather is adverse)        0.251    0.434    0    1
WR        Weather condition en route (equals 1 if weather is adverse)                       0.016    0.124    0    1
CF        Flow control (equals 1 if flow control is conducted)                              0.571    0.496    0    1
CR        Route restriction (equals 1 if route restriction is conducted)                    0.122    0.327    0    1
NF        Scheduled departure traffic                                                       16.943   4.414    4    27

collection, and the final list included 15 factors that affected flight departure delays.

We then conducted a stepwise-backwards regression in variable construction and set the significance level for introducing an independent variable to $\alpha_{\mathrm{in}} = 5\%$ and the significance level for excluding an independent variable to $\alpha_{\mathrm{out}} = 10\%$. Seven factors are excluded, and the remaining 8 explanatory variables comprise the regression model (Table 2). All variables are calculated in an hour unit.

3.3. Data Processing. The data are processed with various software. The exploration analysis module in ArcGIS 10.2 is used for the distribution mapping and the 3D trend analysis. The geostatistic module in ArcGIS 10.2 software is adopted to generate the theoretical and empirical semivariograms, as well as the kriging interpolation. The GeoDa software is used to develop the spatial econometric models.

4. Results and Discussion

4.1. Exploring Delay Distribution

(1) Distribution Map. The distribution map is a commonly used spatial data visualization method. Each grid defined by day and hour is colored according to the departure delay in minutes that occurred in an hour unit. Red represents high departure delay, whereas dark blue indicates low departure delay. The distribution map highlights delay intensity, the day on which the delay occurred, and the duration of the delay. As demonstrated in Figure 1, the delay levels between neighborhoods are usually similar, which indicates obvious spatial cluster characteristics. The distribution map also shows that intense delay occurred mostly at 16:00, 18:00, and 21:00, especially from June 18 to June 22 in our dataset.

(2) 3D Trend Analysis. The trend analysis generated a 3D trend map of the departure delay. In Figure 2, the x-axis and y-axis represent the day and the hour of delay, respectively, and the z-axis represents the total minutes of departure delay. The green line in the x–z plane and the blue line in the y–z plane indicate the trend of the delay. The figure shows that flight departure delay is more intense late in the month and follows a parabola over the day with the peak at 18:00 in our dataset.

4.2. Identifying the Pattern of Delay. As mentioned in the Introduction, there exists a high degree of spatial dependence among delays. Moran’s $I$ and general $G$ are calculated to measure the degree of spatial autocorrelation and to identify the pattern of delay. The Moran’s $I$ and general $G$ values of all variables are calculated. Values with a significant autocorrelation are listed in Table 3.

Table 3 shows significant positive spatial autocorrelation in variables such as the total minutes of departure delay, weather conditions at the target airport, weather conditions at the previous airport, weather conditions en route, flow control, total departure traffic, and number of passengers. The general $G$ test showed that all of the abovementioned variables are high-value clusters, indicating that the hours of intense delay are clustered.

The hot and cold spots of delay are explored after the degree of autocorrelation is measured and the hour units of high-value clusters in flight departure delay are identified. A high degree of delay from June 18 to 22 that lasted for 8 hours from 14:00 to 22:00 is noted, as shown by the red area in Figure 3. Among all the factors responsible for flight departure delay, this large-scale cluster is probably caused by an exogenous variable such as sudden adverse weather conditions. This conclusion corresponded to the actual weather report record of the target airport in June 2016. Between

Figure 1: Distribution map of departure delay. (Axes: X, day; Y, hour. Legend classes of delay in minutes: 0 to 281.14, 281.14 to 424.34, 424.34 to 705.48, 705.48 to 1257.4, and 1257.4 to 2341.)

Figure 2: 3D trend map of departure delay. (Axes: X, day; Y, hour; Z, delay in minutes.)

Table 3: Autocorrelation statistics for selected variables.

Variable  Statistic  Observed  Expected   Stddev    Z          P
Delay     Moran’s I  0.580509  -0.001965  0.001016  18.275923  0.000000
Delay     General G  0.010119  0.007496   0.000000  19.158604  0.000000
W         Moran’s I  0.397390  -0.001965  0.000915  13.198725  0.000000
W         General G  0.194444  0.007496   0.000201  13.177839  0.000000
WL        Moran’s I  0.161547  -0.001965  0.001021  5.116695   0.000000
WL        General G  0.011565  0.007496   0.000001  5.576605   0.000000
WR        Moran’s I  0.248777  -0.001965  0.000901  8.352514   0.000000
WR        General G  0.142857  0.007496   0.000260  8.398764   0.000000
CF        Moran’s I  0.341630  -0.001965  0.001024  10.738782  0.000000
CF        General G  0.009717  0.007496   0.000000  11.624289  0.000000
NF        Moran’s I  0.532762  -0.001965  0.001018  16.761879  0.000000
NF        General G  0.007859  0.007496   0.000000  14.762860  0.000000
NP        Moran’s I  0.489845  -0.001965  0.001019  15.405064  0.000000
NP        General G  0.007888  0.007496   0.000000  12.831494  0.000000
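The Moran’s I rows of Table 3 can be checked against the test statistic in (3). The Python sketch below is an illustrative consistency check under two stated assumptions. First, the expected value of Moran’s I under the null hypothesis is the standard −1/(n − 1), with n = 510 hour units from Section 3.1. Second, the column labeled "Stddev" is read as the variance VAR(I); the text does not say this, but that reading reproduces the tabulated Z values to within rounding.

```python
import math

N_HOUR_UNITS = 510  # number of hour units reported in Section 3.1

# Moran's I rows of Table 3:
# (observed I, expected I, value in the "Stddev" column, reported Z).
morans_rows = {
    "Delay": (0.580509, -0.001965, 0.001016, 18.275923),
    "W": (0.397390, -0.001965, 0.000915, 13.198725),
    "WL": (0.161547, -0.001965, 0.001021, 5.116695),
    "WR": (0.248777, -0.001965, 0.000901, 8.352514),
    "CF": (0.341630, -0.001965, 0.001024, 10.738782),
    "NF": (0.532762, -0.001965, 0.001018, 16.761879),
    "NP": (0.489845, -0.001965, 0.001019, 15.405064),
}


def z_score(observed, expected, variance):
    """Eq. (3): standardize the observed statistic."""
    return (observed - expected) / math.sqrt(variance)


# Expected Moran's I under the null hypothesis of no autocorrelation.
expected_i = -1.0 / (N_HOUR_UNITS - 1)

# Assumption: the tabulated "Stddev" value is treated as a variance.
z_values = {
    name: z_score(observed, expected, spread)
    for name, (observed, expected, spread, _) in morans_rows.items()
}

# Largest gap between recomputed and reported Z values.
max_gap = max(abs(z_values[name] - morans_rows[name][3]) for name in morans_rows)
```

The small remaining gaps come from the six-decimal rounding of the tabulated inputs.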

Table 4: Theoretical model fit comparison of isotropic semivariogram.

Model        Nugget       Sill         Range    RSS       R²
Exponential  66979.4200   219070.4200  17.6000  5.48E+08  0.977
Spherical    90023.7000   206937.5788  14.0439  1.12E+09  0.955
Gaussian     105523.0000  203931.7816  11.4025  1.35E+09  0.947
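As a worked illustration of the semivariogram quantities used in Section 4.3, the Python sketch below derives the structural variance C and the nugget–sill ratio C0/(C + C0) from the nugget and sill values in Table 4, and implements the empirical semivariance of (5) on a toy one-dimensional series. The toy data, the function names, and the distance tolerance `tol` are hypothetical choices made for the example; the paper's own semivariograms were produced with GIS software.

```python
import math

# (nugget C0, sill C + C0) per theoretical model, copied from Table 4.
models = {
    "Exponential": (66979.4200, 219070.4200),
    "Spherical": (90023.7000, 206937.5788),
    "Gaussian": (105523.0000, 203931.7816),
}


def structural_variance(nugget, sill):
    """C: the part of the sill attributed to autocorrelation."""
    return sill - nugget


def nugget_sill_ratio(nugget, sill):
    """C0 / (C + C0): share of variance attributed to randomness."""
    return nugget / sill


def empirical_semivariance(coords, values, h, tol=1e-9):
    """Eq. (5): half the mean squared difference over pairs separated by h."""
    squared_diffs = [
        (values[i] - values[j]) ** 2
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
        if abs(math.dist(coords[i], coords[j]) - h) <= tol
    ]
    if not squared_diffs:
        raise ValueError("no pairs at this separation distance")
    return sum(squared_diffs) / (2 * len(squared_diffs))


ratios = {name: nugget_sill_ratio(c0, sill) for name, (c0, sill) in models.items()}
structural = {name: structural_variance(c0, sill) for name, (c0, sill) in models.items()}

# Toy series on a line: values grow by 1 per unit distance.
toy_coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
toy_values = [0.0, 1.0, 2.0, 3.0]
gamma_1 = empirical_semivariance(toy_coords, toy_values, 1.0)
gamma_2 = empirical_semivariance(toy_coords, toy_values, 2.0)
```

For the exponential model the ratio is about 0.306, which matches the 30.6% nugget–sill ratio quoted in the text; in the toy series the semivariance grows with separation distance, as expected for a trending variable.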

June 18 and 22, there were cloudy skies for 5 days and a thunderstorm for 3 days.

Figure 3: Hot and cold spots of departure delay. (Axes: X, day; Y, hour. Legend: cold spot at 99%, 95%, and 90% confidence; not significant; hot spot at 90%, 95%, and 99% confidence.)

4.3. Quantifying Delay Spatial Dependence. After measuring the degree of delay spatial dependence between observations, the variogram is utilized to quantify the spatial dependence based on the theory of regionalized variables. The experimental semivariogram is mapped to quantify the spatial dependence of delays and to provide the spatial structure for the subsequent kriging interpolation.

The following are the key parameters in the semivariogram:

(1) Nugget effect $C_0$ is estimated from the empirical variogram at $h = 0$. This represents the measurement error or random property of the departure delay. The $C_0$ value reflects the variation caused by the stochastic factor.

(2) Range $A_0$ is the distance at which the variogram reaches its plateau. This represents the largest distance of autocorrelation. Data can be considered uncorrelated if their distance exceeds the range.

(3) Sill $C + C_0$ is the plateau that the variogram reaches at the range. This represents the total variance of regionalized variables, which is equal to the sum of the autocorrelation and stochastic variances.

The following are the other two parameters that can be calculated from the three parameters mentioned above:

(4) Structural variance $C$ represents the structural property of delay. The value of $C$ reflects the variance caused by the autocorrelation.

(5) Nugget–sill ratio $C_0/(C + C_0)$ represents the percentage of variance caused by randomness. A low nugget–sill ratio reflects that the variation is mainly affected by the autocorrelation factors.

A theoretical semivariogram is necessary to obtain the spatial structure of delay. The experimental semivariogram generated from limited samples is used to estimate the correlation in the whole area by fitting a theoretical semivariogram to the empirical semivariogram. Different theoretical models are compared in Table 4. The comparison shows that, when fitting the isotropic semivariogram, the exponential model is more effective than others, such as the spherical and Gaussian models.

According to the results of the semivariogram, the low nugget–sill ratio (30.6%) suggests that the variation of delay is mainly caused by autocorrelation (69.4%). In Figure 4, the delays separated by short intervals are strongly correlated to one another. The correlation decreases as the intervals increase to a distance of 17.60.

Figure 4: Isotropic semivariogram of departure delay. (Axes: separation distance h; semivariance.)

4.4. Delay Prediction. Spatial interpolation allows us to further comprehend the overall situation of the entire study area from a limited number of spatial sample points. We randomly select 10% of the sample dataset as the test set and the remaining 90% as the training set. Spatial autocorrelation undermines the accuracy and effectiveness of some commonly used interpolation methods such as the trend surface method or the inverse distance weighting (IDW) method. We use the ordinary kriging method to interpolate delays since it can take spatial dependence into account by considering the spatial structure obtained by the semivariogram. Similar to the IDW method, the ordinary kriging method predicts the value at an unmeasured position by generating weights of the surrounding points. IDW generates weights according to

Figure 5: Prediction surface of delay. (Axes: day, hour, and delay; color scale of delay from 0 to 2341.)

Table 5: Results of the comparison of cross-validation between different interpolation methods.

Interpolation method            M       RMS       A Std     Std M   Std RMS
IDW                             8.0616  298.4739  -         -       -
Ordinary kriging (exponential)  1.7808  301.0585  312.6256  0.0040  0.9665
Ordinary kriging (spherical)    3.0301  307.9240  331.4682  0.0077  0.9312
Ordinary kriging (Gaussian)     2.9745  316.4641  336.4019  0.0075  0.9426
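The Mean Error (M) and Root Mean Square Error (RMS) columns of Table 5 come from leave-one-out cross-validation. The Python sketch below shows that procedure for the IDW method only, as a minimal illustration rather than a reproduction of the paper's results. The distance exponent `power=2.0` and the toy data are assumptions; the paper does not report the IDW settings, and ordinary kriging is not implemented here.

```python
import math


def idw_predict(train_coords, train_values, target, power=2.0):
    """Inverse distance weighting: weights are 1 / distance**power."""
    numerator = 0.0
    denominator = 0.0
    for coord, value in zip(train_coords, train_values):
        distance = math.dist(coord, target)
        if distance == 0.0:
            return value  # exact hit on a measured location
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def loo_cross_validation(coords, values, power=2.0):
    """Leave each point out in turn, predict it from the rest, and
    return (mean error, root mean square error)."""
    errors = []
    for i in range(len(coords)):
        rest_coords = coords[:i] + coords[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        predicted = idw_predict(rest_coords, rest_values, coords[i], power)
        errors.append(predicted - values[i])
    mean_error = sum(errors) / len(errors)
    rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rms_error


# Toy data: three points on a line with linearly increasing values.
line_coords = [(0, 0), (1, 0), (2, 0)]
line_values = [0.0, 1.0, 2.0]
mean_error, rms_error = loo_cross_validation(line_coords, line_values)

# A constant field is reproduced without error by any weighted average.
flat_mean_error, flat_rms_error = loo_cross_validation(
    [(0, 0), (1, 0), (0, 1), (1, 1)], [5.0, 5.0, 5.0, 5.0]
)
```

In the toy line, the end points are predicted with errors of +1.2 and −1.2 and the middle point exactly, so the mean error cancels to about zero while the RMS does not; this is why both indicators are reported together.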

the distance between the unmeasured position and the surrounding points. Unlike IDW, the kriging
method derives its weights from the semivariogram, which is developed by considering the spatial
properties and spatial structure of the data. The interpolation results of the prediction surface
are shown in Figure 5, in which the x-axis represents the day of week, the y-axis represents the
hour of day, and the z-axis represents the value of delay. In this way the value at unmeasured
locations can be inferred from the values at measured locations and the spatial relation between
them.
After the generation of the prediction surface, it is important to evaluate the interpolation
precision, which is done by cross-validation. Cross-validation leaves one point out and uses the
rest to predict a value at that location. The omitted point is then changed in turn until the
process has been performed for all samples in the dataset. As with other typical interpolation
methods, prediction performance can be evaluated by the Mean Error (M) and the Root Mean Square
Error (RMS). The smaller the RMS, the better. In addition, ordinary kriging has other indicators
of prediction performance: the Average Standard Error (A Std), which measures the average of the
prediction standard errors; the Mean Standardized Error (Std M), whose value should be close to 0;
and the Root Mean Square Standardized Error (Std RMS), which should be close to 1. A Std RMS
greater than 1 indicates that the variability in the predictions is underestimated. A Std RMS
less than 1 indicates that the variability in the predictions is overestimated.

Cross-validation can also be an effective way to select between different interpolation methods.
Among the ordinary kriging results, the exponential semivariogram shows the minimum RMS and the
Std RMS closest to 1. IDW attains a slightly lower RMS (298.4739) but a much larger mean error
(8.0616 versus 1.7808). Therefore, the best result is obtained in this study using the exponential
fitting semivariogram for ordinary kriging interpolation (Table 5).

4.5. Spatial Econometric Analysis. The regression model is commonly used to analyze the factors
of departure delay. First, we perform an ordinary least squares (OLS) estimation based on the
classical regression model (as in (6)). The estimation results of each variable are reported as
𝛽̂ in Table 6. After the model parameters are estimated, it is necessary to perform the
statistical tests of the model, which include the goodness of fit test, the significance test of
the equation, and the significance test of the variables.
The goodness of fit is reflected by R². The R² value is the ratio of the regression sum of squares
to the total sum of squares, and it indicates the degree to which the explanatory variables
together explain the variation of the dependent variable. The value is between 0 and 1; the closer
to 1, the better the estimated regression model fits.
The F test is a joint significance test for multiple coefficients to infer whether the linear
relationship between the dependent variable and the explanatory variables is significant.
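
The goodness of fit and joint significance statistics described above can be sketched in a few lines of Python. This is a generic OLS illustration on synthetic data, not the paper's model (6): the function name ols_fit and the simulated regressors and coefficients are assumptions made for the example.

```python
import numpy as np


def ols_fit(X, y):
    """Fit y = b0 + X b by least squares and return (beta, R^2, F statistic)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape  # n observations, k explanatory variables
    design = np.column_stack([np.ones(n), X])  # prepend the constant term
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    ss_total = np.sum((y - y.mean()) ** 2)  # total sum of squares
    ss_resid = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_reg = ss_total - ss_resid            # regression sum of squares
    r2 = ss_reg / ss_total
    # Joint test of H0: all k slope coefficients are zero.
    f_stat = (ss_reg / k) / (ss_resid / (n - k - 1))
    return beta, r2, f_stat


# Synthetic example: three made-up regressors with known coefficients plus noise.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = 5.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=200)
beta_demo, r2_demo, f_demo = ols_fit(X_demo, y_demo)
print(f"R^2 = {r2_demo:.3f}, F = {f_demo:.1f}")
```

Under the null hypothesis the F statistic follows an F distribution with k and n - k - 1 degrees of freedom, so a large value relative to that distribution leads to rejection.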

Table 6: Estimation results of delay and causal factors for each model.

                  OLS estimation            Spatial lag model         Spatial error model
Variable          𝛽̂               p         𝛽̂               p         𝛽̂               p
Constant          -189.2450∗∗∗    0.0008    -275.3832∗∗∗    0.0000    81.4065         0.2570
T                 192.5201∗∗∗     0.0066    163.5734∗∗∗     0.0029    197.2891∗∗∗     0.0003
TL                179.6183∗∗∗     0.0013    150.6237∗∗∗     0.0005    140.6904∗∗∗     0.0006
W                 1059.5640∗∗∗    0.0000    699.4476∗∗∗     0.0000    769.7684∗∗∗     0.0000
WL                247.8793∗∗∗     0.0000    181.4880∗∗∗     0.0000    161.2410∗∗∗     0.0000
WR                275.8793∗∗      0.0187    90.3014         0.3228    147.4748        0.1265
CF                292.8883∗∗∗     0.0000    139.7609∗∗∗     0.0000    128.7156∗∗∗     0.0000
CR                205.2254∗∗∗     0.0000    141.3503∗∗∗     0.0002    120.0013∗∗∗     0.0002
NF                28.3640∗∗∗      0.0000    19.8473∗∗∗      0.0000    28.7060∗∗∗      0.0000
𝜌                 -               -         0.5806∗∗∗       0.0000    -               -
𝜆                 -               -         -               -         0.6926∗∗∗       0.0000
R²                0.5977                    0.7928                    0.7884
F                 62.0591∗∗∗                -                         -
Log likelihood    -3638.21                  -3539.13                  -3554.8640
SC                7332.53                   7140.61                   7165.84
AIC               7294.43                   7098.26                   7127.73
Likelihood Ratio  -                         198.1630∗∗∗               166.6973∗∗∗
Note: ∗∗∗, ∗∗, and ∗ represent significance at the 1%, 5%, and 10% levels, respectively.
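
As a consistency check on Table 6, the likelihood ratio and AIC rows can be recomputed from the reported log likelihoods. The sketch below uses LR = 2(log L_full − log L_restricted) and AIC = −2 log L + 2k. The parameter counts k (9 for OLS and SEM, 10 for SLM) are an assumption, chosen because they reproduce the reported AIC values; they are not stated in the paper. The function names are illustrative.

```python
# Log likelihood values copied from Table 6.
loglik = {"OLS": -3638.21, "SLM": -3539.13, "SEM": -3554.8640}

# Assumed parameter counts (constant + 8 variables, plus rho for the SLM).
n_params = {"OLS": 9, "SLM": 10, "SEM": 9}

# Critical value of the chi-squared distribution, 1 degree of freedom, 1% level.
CHI2_1DF_1PCT = 6.635


def likelihood_ratio(ll_restricted, ll_full):
    """LR statistic comparing a restricted model with the model that nests it."""
    return 2.0 * (ll_full - ll_restricted)


def aic(ll, k):
    """Akaike information criterion for log likelihood ll and k parameters."""
    return -2.0 * ll + 2.0 * k


lr = {m: likelihood_ratio(loglik["OLS"], loglik[m]) for m in ("SLM", "SEM")}
aic_values = {m: aic(loglik[m], n_params[m]) for m in loglik}

for model, value in lr.items():
    verdict = "reject H0" if value > CHI2_1DF_1PCT else "fail to reject H0"
    print(f"LR({model} vs OLS) = {value:.4f}: {verdict}")
for model, value in aic_values.items():
    print(f"AIC({model}) = {value:.2f}")
```

Both recomputed likelihood ratios agree with the tabulated 198.1630 and 166.6973 to within rounding of the log likelihoods, and both far exceed the 1% critical value.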

Table 7: Test results of OLS residuals’ spatial dependence.

TEST                          MI/DF    VALUE      PROB
Moran’s I (error)             0.3640   11.5242    0.0000
Lagrange Multiplier (lag)     1        195.3046   0.0000
Robust LM (lag)               1        68.0750    0.0000
Lagrange Multiplier (error)   1        127.4078   0.0000
Robust LM (error)             1        10.1783    0.0000
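
Moran's I for a set of values such as regression residuals, the statistic in the first row of Table 7, can be sketched as follows. The weight matrix here is a hypothetical chain in which each hour unit neighbors only the unit before and after it; the weight matrix actually used in the study is defined earlier in the paper and may differ. The names chain_weights and morans_i are illustrative.

```python
import numpy as np


def chain_weights(n):
    """Binary weights for n units in a line: each unit neighbors the adjacent ones."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W


def morans_i(values, W):
    """Moran's I = (n / S0) * (e' W e) / (e' e), with e the mean-centered values."""
    e = np.asarray(values, dtype=float)
    e = e - e.mean()
    n = e.size
    s0 = W.sum()  # sum of all spatial weights
    return float((n / s0) * (e @ W @ e) / (e @ e))


W = chain_weights(24)
rng = np.random.default_rng(1)
clustered = np.cumsum(rng.normal(size=24))  # neighboring values resemble each other
shuffled = rng.permutation(clustered)       # same values, neighbor structure destroyed
print("clustered:", morans_i(clustered, W))
print("shuffled: ", morans_i(shuffled, W))
```

Values near zero suggest no spatial autocorrelation, while clearly positive values, as reported for the OLS residuals in Table 7, indicate that similar residuals cluster among neighboring units.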

The null hypothesis (H0) of the F test is that all the parameters to be estimated are
simultaneously zero. The larger the F value, the stronger the evidence against the null
hypothesis.
The p-value is the probability, computed under the null hypothesis that a single coefficient is
zero, of obtaining a test statistic at least as extreme as the one observed. A smaller p-value
gives stronger evidence against the null hypothesis, so a variable is regarded as significant
when its p-value falls below the chosen significance level.
In Table 6, the OLS estimation shows an F value of 62.0591, significant at the 1% level, and a
goodness of fit R² value of 0.5977, which indicates that the explanatory variables and the
dependent variable have a relatively significant linear correlation and that the dependent
variable can be predicted reasonably well by the explanatory variables. The p-values indicate
that the significant variables include the technical fault at the target airport (T), the
technical fault at the previous airport (TL), the weather condition at the previous airport (WL),
the weather condition at the airport of departure (W), the flow control (CF), the route
restriction (CR), and the number of scheduled departure flights (NF).
However, the Moran’s 𝐼 test shows a significant spatial autocorrelation in the residuals of the
OLS estimators. The spatial dependence tests for both the error and the lag are significant, as
shown in Table 7.

The classical regression model fails to reflect the spatial dependence between hour units and the
influence of their interactions on the total minutes of departure delay. Therefore, spatial
factors are introduced into the regression model, and spatial econometric analysis is necessary.
SEM and SLM are built to measure the spatial dependence in error terms and the spatial dependence
of delay between the hour units, respectively [23].
The spatial lag variable and spatial error terms are considered as explanatory variables because
of the spatial effects. Under spatial dependence, OLS yields biased or inefficient estimates.
Therefore, the maximum likelihood estimation method is used in this study. The model selection is
based on the value of the Log likelihood (Log L), the Akaike information criterion (AIC), and the
Schwarz criterion (SC), which are statistical measures of model fit, as well as the goodness of
fit (R²). A greater Log likelihood and goodness of fit value and smaller Akaike information and
Schwarz criteria indicate a better model fit.
Comparing the estimation results between the OLS estimation, spatial lag model, and spatial error
model in Table 6, the goodness of fit R² is 0.7928 for the SLM, which is greater
than 0.5977 in the OLS estimation and 0.7884 in the SEM. The AIC (7098.26) and SC (7140.61)
values of the SLM are both less than the values of the OLS estimation (AIC 7294.43, SC 7332.53)
and the SEM (AIC 7127.73, SC 7165.84). Moreover, the OLS model is nested in the SLM, and likewise
in the SEM. Adding parameters to a model can only increase its maximized likelihood, so judging
the fit of the models by comparing raw log likelihood values alone is inaccurate. We therefore
conduct the likelihood ratio test for both models.
The likelihood ratio test uses the likelihood function to compare a simple model, obtained by
imposing parameter constraints, with a more complex model. The likelihood ratio is defined as the
ratio of the maximum value of the likelihood function under constrained conditions to that under
unconstrained conditions. A statistic that follows the chi-square distribution can be constructed
from the likelihood ratio. The null hypothesis H0 is that there is no significant difference in
the goodness of fit between model A and model B. The rejection or acceptance of the null
hypothesis can be judged from the constructed chi-square statistic or its p-value. In this way,
we can judge whether the difference between models is significant. The results of the likelihood
ratio tests of SLM-OLS and SEM-OLS show that the likelihood ratio values exceed the critical value
of the chi-squared distribution with 1 degree of freedom at the 1% significance level (6.635), and
the null hypotheses are rejected. This means that the SLM and SEM provide a significantly better
fit than the OLS estimation and that the explanatory capacity is enhanced by adding the spatial
effect to the model.
Among the explanatory variables, the effects of the weather condition at the previous airport,
the weather condition at the airport of departure, flow control, and the number of scheduled
departure flights are the most significant. Moreover, the technical failure at the target airport
and the previous airport and the route restriction also significantly affect departure delay.
Adverse weather is the primary cause of flight departure delays, with a stronger influence than
flow control. Delay reduction should therefore focus primarily on weather forecasting and on
dynamic adjustment to weather changes.
In addition, the comparison results in Table 6 show that the WR variable, which is significant in
the OLS estimation, is insignificant in the SLM and SEM. According to the OLS estimation,
delay-reduction strategies might focus on en-route weather prediction, but according to the SLM
and SEM this would not achieve a significant reduction in delay. The SLM shows that the spatial
lag coefficient (𝜌 = 0.5806) is significant at the 1% level, which indicates a strong spillover
effect of the departure delay in the temporal dimension.
Consistent with the results on causal factors obtained in previous studies, this study also
indicates that the effect of the weather condition at the target airport on flight delay is much
greater than that of other factors. However, this study exhibits an interesting finding: the
technical failure and weather condition at the previous airport have a larger effect on departure
delays than flow control, which is regarded as one of the two most significant factors that affect
delays, alongside weather condition. This finding suggests that dealing with technical failure and
weather prediction at the previous airport is crucial in delay reduction.

5. Conclusions

This study examined flight departure delay and its causal factors by developing a novel spatial
analysis method that accounts for the correlation among data samples. The main conclusions are as
follows.
First, spatial analysis is confirmed as a useful method for delay and causal factor analysis.
Exploratory analysis can intuitively demonstrate the distribution pattern of flight departure
delay in the temporal dimension, the semivariogram can quantify the spatial structure of the
delay, and kriging interpolation allows delay estimation at unmeasured locations.
Second, the spatial econometric models achieve better fit by taking the spatial dependence into
consideration: the fit of the SLM and SEM is better than that of the OLS estimation. The results
of this study reconfirm the significant effect of the weather condition and technical failure on
flight departure delay.
This study also indicates that the weather condition and technical failure at the previous airport
significantly affect departure delay. These effects are larger than that of the flow control
factor, which is regarded as one of the two most important factors that affect delay. This result
suggests that delay-reduction strategies must also focus on reducing the impact of delay at the
previous airport.

Data Availability

The data used to support the findings of this study are available from the corresponding author
upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this
paper.

Acknowledgments

The authors would like to acknowledge the financial support from the Research and Development
Project of Scientific and Technological Cooperation between Sichuan Provincial Colleges and
Universities (Grant No. 2019YFSY0024), Key Research and Development Projects of Sichuan Science
and Technology Plan Project (Grant No. 2019YFG0050), and National Natural Science Foundation of
China (Grant Nos. U1533203 and 61179069).

References

[1] S. S. Allan, J. A. Beesley, J. E. Evans et al., “Analysis of delay causality at Newark International Airport,” in Proceedings of the 4th USA/Europe Air Traffic Management R&D Seminar, 2001.
[2] M. Kazemi Asfe, M. Jangi Zehi, M. N. Shahiki Tash, and N. M. Yaghoubi, “Ranking different factors influencing flight delay,” Management Science Letters, vol. 4, no. 7, pp. 1397–1400, 2014.
[3] E. R. Mueller and G. B. Chatterji, “Analysis of aircraft arrival and departure delay characteristics,” in Proceedings of the AIAA’s Aircraft Technology, Integration, and Operations (ATIO) 2002 Technical Forum, pp. 1–14, Los Angeles, Calif, USA, October 2002.
[4] I. Kwan and M. Hansen, “US flight delay in the 2000s: an econometric analysis,” in Proceedings of the Transportation Research Board 90th Annual Meeting, vol. 11-4283, 2011.
[5] R. Wesonga, F. Nabugoomu, and P. Jehopio, “Parameterized framework for the analysis of probabilities of aircraft delay at an airport,” Journal of Air Transport Management, vol. 23, pp. 1–4, 2012.
[6] J. Pérez-Rodríguez, J. Pérez-Sánchez, and E. Gómez-Déniz, “Modelling the asymmetric probabilistic delay of aircraft arrival,” Journal of Air Transport Management, vol. 62, pp. 90–98, 2017.
[7] Y.-J. Liu and S. Ma, “Flight delay and delay propagation analysis based on Bayesian network,” in Proceedings of the 2008 International Symposium on Knowledge Acquisition and Modeling, KAM 2008, pp. 318–322, China, December 2008.
[8] W. Cao and X. Fang, “Airport Flight departure delay model on improved BN structure learning,” Physics Procedia, vol. 33, pp. 597–603, 2012.
[9] D. M. Dai and J. S. Liou, “Delay prediction models for departure flights,” Transportation Research Record, 2006.
[10] Y. J. Kim, S. Choi, S. Briceno, and D. Mavris, “A deep learning approach to flight delay prediction,” in Proceedings of the 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), pp. 1–6, Sacramento, Calif, USA, September 2016.
[11] M. Abdel-Aty, C. Lee, Y. Bai, X. Li, and M. Michalak, “Detecting periodic patterns of arrival delay,” Journal of Air Transport Management, vol. 13, no. 6, pp. 355–361, 2007.
[12] Y. Tu, M. O. Ball, and W. S. Jank, “Estimating flight departure delay distributions: a statistical approach with long-term trend and short-term pattern,” Journal of the American Statistical Association, vol. 103, no. 481, pp. 112–125, 2008.
[13] X. Dai, M. Hu, W. Tian, and H. Liu, “Modeling congestion propagation in multistage schedule within an Airport Network,” Journal of Advanced Transportation, vol. 2018, Article ID 6814348, 11 pages, 2018.
[14] N. Kafle and B. Zou, “Modeling flight delay propagation: A new analytical-econometric approach,” Transportation Research Part B: Methodological, vol. 93, pp. 520–542, 2016.
[15] Q. Wu, M. Hu, X. Ma, Y. Wang, W. Cong, and D. Delahaye, “Modeling flight delay propagation in airport and Airspace Network,” in Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 3556–3561, Maui, HI, USA, November 2018.
[16] W. Wu, C.-L. Wu, T. Feng, H. Zhang, and S. Qiu, “Comparative analysis on propagation effects of flight delays: a case study of china airlines,” Journal of Advanced Transportation, vol. 2018, Article ID 5236798, 10 pages, 2018.
[17] N. Ivanov, F. Netjasov, R. Jovanović, S. Starita, and A. Strauss, “Air Traffic Flow Management slot allocation to minimize propagated delay and improve airport slot adherence,” Transportation Research Part A: Policy and Practice, vol. 95, pp. 183–197, 2017.
[18] T. Diana, “Predicting arrival delays: An application of spatial analysis,” Journal of Aircraft, vol. 48, no. 2, pp. 462–467, 2011.
[19] A. Reynolds-Feighan, “Competing networks, spatial and industrial concentration in the US airline industry,” Spatial Economic Analysis, vol. 2, no. 3, pp. 237–257, 2007.
[20] R. Haining, Spatial Data Analysis: Theory and Practice, Cambridge University Press, 2003.
[21] J.-T. Wong and S.-C. Tsai, “A survival model for flight delay propagation,” Journal of Air Transport Management, vol. 23, pp. 5–11, 2012.
[22] X. Yang, J. Wang, and J. He, “The identification of flight delay’s determinants and their impact degree based on the analysis of dynamic queues model,” Statistic Information Forum, vol. 4, no. 29, pp. 88–95, 2014.
[23] L. Anselin, Spatial Econometrics: Methods and Models, vol. 4, Springer Science & Business Media, 2013.