sustainability
Article
Identifying Factors that Influence the Patterns of
Road Crashes Using Association Rules: A Case Study
from Wisconsin, United States
Shuai Yu *, Yuanhua Jia and Dongye Sun
School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China; yhjia@bjtu.edu.cn (Y.J.);
14114218@bjtu.edu.cn (D.S.)
* Correspondence: 14114217@bjtu.edu.cn; Tel.: +86-152-1057-6646

Received: 14 February 2019; Accepted: 25 March 2019; Published: 1 April 2019
Abstract: Road traffic injury is currently the leading cause of death among children and young adults
aged 5–29 years worldwide. Measures must be taken to prevent crashes and promote the sustainability
of road safety. The current study aimed to identify risk factors that are significantly associated with
crash severity, so that traffic crashes can be reduced and the sustainable safety level of roadways can be
improved. The Apriori algorithm is applied to mine significant association rules between crash severity
and the factors that influence crash occurrence. Compared with previous studies, the current study
includes a more comprehensive set of variables, covering the environment, management, and the state
of drivers and vehicles. The data come from the Wisconsin Transportation crash database, which
contains information on all reported crashes in Wisconsin in 2016. The results indicate that male
drivers aged 16–29 are more likely to be involved in crashes on roadways with no physical separation.
Additionally, fatal crashes are more likely to occur in towns, while property damage crashes are more
likely to occur in cities. The findings can help governments make effective road safety policies.
Keywords: traffic safety; significant factor; association rules; Apriori
1. Introduction
The number of road traffic deaths worldwide remains unacceptably high and continues to increase,
reaching 1.35 million in 2016 [1]. Yet every one of these deaths and injuries is preventable. Improving
traffic safety is one of the great opportunities to save lives around the world, but it receives far less
attention than it deserves [2].
Traffic crashes can be reduced significantly, and identifying their causes is the most critical step in
adopting precautionary measures that reduce the severity and number of crashes. However, some
previous studies estimated models of crash frequency and severity using only traffic volume as an
explanatory variable, whereas many other factors clearly affect crash frequency and severity, such as
environmental conditions, roadway geometrics, and driver characteristics. Because of the complex
nature of traffic crashes, policy makers must consider numerous contributory factors when deciding how
to improve safety [3], and it is vital for them to find the factors that most strongly affect the occurrence
and consequences of traffic crashes. After years of research, it is generally accepted that the impact of
crashes can be significantly reduced by recognizing the risk factors that affect crash severity, as shown
in Figure 1, and by adopting corresponding coping strategies [4–6].
Sustainability 2019, 11, 1925; doi:10.3390/su11071925 www.mdpi.com/journal/sustainability

Figure 1. The causative mechanisms of traffic incidents/accidents.
Some previous studies have been devoted to identifying, from traffic data, the contributing factors that
affect the occurrence and severity of traffic crashes. These studies proposed various approaches, such
as binary logit/probit models [7,8], multinomial logit models [9,10], nested logit models [11,12],
log-linear models [13], artificial neural networks [14,15], spatial and temporal correlations [16], Markov
switching models [17], and genetic algorithms [18]. This literature has also identified various factors
that contribute to the frequency and severity of traffic crashes, such as weather, driver gender and age,
posted speed, roadway geometrics, and driver condition.
In recent years, the analysis of various types of data with data mining techniques has attracted
increasing attention among researchers. Data mining has been employed in traffic crash analysis with
satisfactory results in areas such as assessing the inherent connection between crashes and road
geometry [19–21], identifying critical points [22], determining the factors that contribute to crash
severity [23], and relating driver characteristics to traffic crashes [24]. Agrawal et al. utilized
association analysis for crash data analysis [25]. Golob and Recker used clustering analysis to relate
prevailing traffic conditions on freeways to the type of collision most likely to occur [26]. Prati et al.
applied a decision tree technique and a Bayesian network to predict the severity of bicycle crashes [27].
However, some of these studies assume that the factors are independent of one another, which may
misrepresent the contribution of each individual factor.
Among these data mining techniques, association rule mining is well suited to analyzing traffic crashes,
because it does not rely on any prior hypothesis and can discover meaningful connections hidden in
large datasets. There are three basic kinds of algorithms for association rule mining: the Apriori
algorithm, partition-based algorithms, and the Frequent Pattern tree algorithm. The Apriori algorithm
is succinct and clear, adopting an iterative, layer-by-layer search. Compared with the other two
algorithms, the Apriori algorithm is more capable of processing large-scale datasets. In the current
study, the Apriori algorithm was used to discover the significant rules between the factors and crashes
in Wisconsin.
2. Data Description and Processing
2.1. Raw Data and Study Area
The raw crash data for the current study were collected from the Wisconsin Transportation crash
database, which contains information about all reported crashes in Wisconsin in 2016. A reportable
crash was one that resulted in the injury or death of any person, in apparent total damage of $1000 or
more to property owned by any one person, or in any apparent damage of $200 or more to
government-owned non-vehicle property.
The crash data included 129,051 crashes that occurred in Wisconsin, described by 49 variables
including the calendar date on which the crash occurred, crash severity, type of crash, and age of the
driver. However, not all reported crashes in the database are described by all 49 variables, and not all
variables are necessarily significant for the crashes. Therefore, in the current study the dataset was
pretreated with the process shown in Figure 2.
Figure 2. The procedure of data pretreatment.

2.2. Crash Data Processing
First, the k-means clustering algorithm was used to clean noise data, that is, records that were
erroneous or abnormal [28]. Each reported crash was also checked for missing values. A reported crash
was removed if it contained noise data or lacked key information, such as the cause of the crash, road
condition, weather condition, injury condition, or driver information.
Because the data for the current study came from crash and on-site investigations that were
meticulously checked, the variables in the dataset were independent and there were no data conflicts,
so there was no need to remove redundant data or perform data integration. To mine association rules
more efficiently, variables that contributed little to the traffic crash were removed, such as the calendar
date on which the crash occurred, the name of the street, the name of the highway, and house, fire,
railroad, or other numbers.
Some variables shared the same range of values as other variables, such as NTFYHOUR (the one-hour
range in which the enforcement agency was notified of the crash) and POSTSPD (posted speed). These
values were converted into distinct labels, as shown in Table 1, so that every item identifies the value
of exactly one variable. Because the Apriori algorithm requires Boolean or discrete variables, the
continuous numerical variable AGE was discretized as shown in Table 2. Since residents of the United
States can obtain a driver's license at the age of 16, the first age group was set to (0,15].
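The text does not say how k-means identifies noise records. The sketch below assumes one common approach: cluster numeric attributes with Lloyd's k-means and flag any record whose distance to its cluster centroid exceeds the mean distance by more than three standard deviations. The deterministic initialization, the 3-standard-deviation cutoff, and the toy 2-D points are illustrative assumptions, not the authors' settings.

```python
import math


def _assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means, initialized (for determinism) with k evenly spaced input points."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        labels = _assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Mean of each coordinate over the cluster members.
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new_centroids
    return centroids, _assign(points, centroids)


def flag_noise(points, k=2, z=3.0):
    """Flag points farther from their centroid than mean + z * std of all such distances."""
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [d > mean + z * std for d in dists]


# Hypothetical example: two tight groups of records plus one abnormal record.
group_a = [(x, y) for x in range(3) for y in range(3)]
group_b = [(x + 10, y + 10) for x in range(3) for y in range(3)]
points = group_a + group_b + [(40, 40)]
flags = flag_noise(points)
print(sum(flags), flags[-1])
```

Plain k-means tends to give isolated outliers their own cluster when k is large, which hides them; keeping k small and combining it with a distance cutoff is one way around that, but the right choice depends on the actual attributes.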
Table 1. Variable conversion.
Variable | NTFYHOUR | POSTSPD | ROADCOND | VEHDMG | SAFETY
Initial data | X (hour), e.g., 5 | XX (mile/h), e.g., 55 | SNOW | NONE | NONE
Converted data | HX (hour), e.g., H5 | SXX (mile/h), e.g., S55 | SNOWY | VNONE | SNONE
Table 2. Variable discretization.
Initial age | (0,15] | [16,25] | [26,35] | [36,45] | [46,55] | [56,65] | [66,75] | [76,85] | [86,99]
Discretization | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9
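The conversions in Tables 1 and 2 amount to turning each crash record into a set of uniquely labeled items. The sketch below assumes a record is a Python dict mapping variable names to raw values; the function names, the dict layout, and the sample record are hypothetical, and values not listed in Table 1 are passed through unchanged.

```python
# Age groups from Table 2; ages are whole years, so (0,15] is written as 1..15.
AGE_GROUPS = [(1, 15, "A1"), (16, 25, "A2"), (26, 35, "A3"), (36, 45, "A4"),
              (46, 55, "A5"), (56, 65, "A6"), (66, 75, "A7"), (76, 85, "A8"),
              (86, 99, "A9")]

# Table 1: values that would collide with another variable's values get new labels.
RENAME = {
    "ROADCOND": {"SNOW": "SNOWY"},  # WTHRCOND also uses SNOW
    "VEHDMG": {"NONE": "VNONE"},    # SAFETY also uses NONE
    "SAFETY": {"NONE": "SNONE"},
}


def age_group(age):
    """Map a driver age to its discretized label A1..A9."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def to_transaction(record):
    """Turn one crash record (dict of variable -> value) into a frozenset of items."""
    items = set()
    for var, value in record.items():
        if var == "NTFYHOUR":
            items.add(f"H{value}")       # e.g., 5 -> H5
        elif var == "POSTSPD":
            items.add(f"S{value}")       # e.g., 55 -> S55
        elif var == "AGE":
            items.add(age_group(value))  # e.g., 22 -> A2
        else:
            items.add(RENAME.get(var, {}).get(value, value))
    return frozenset(items)


# Hypothetical record: snowy road in snowy weather, no vehicle damage recorded.
crash = {"NTFYHOUR": 5, "POSTSPD": 55, "AGE": 22, "ROADCOND": "SNOW",
         "WTHRCOND": "SNOW", "VEHDMG": "NONE", "SAFETY": "NONE", "SEX": "MALE"}
print(sorted(to_transaction(crash)))
```

Without the renaming, the road-condition SNOW and the weather SNOW would merge into a single item, and a rule such as SNOW ⇒ X could no longer say which variable it refers to.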
2.3. Structured Dataset Construction

Data processing retained twenty-one variables and 63,325 of the 129,051 reported crashes. The
descriptions and value ranges of the twenty-one variables are cataloged in Table 3.
Table 3. Description and information field of corresponding variables.
NO. | Variable | Description | Information fields (percentage, %)
1 | MUNITYPE | The municipality type | C = City (57.9); T = Town (29.6); V = Village (12.5)
2 | INTDIS | Distance, in hundredths of a mile, from the intersection location listed (1 = approx. 50 feet) | 0 (40.8); [0,288] (59.2)
3 | MNRCOLL | Manner (first harmful event) in which participants collided in the crash | ANGL = Angle (23.6); HEAD = Head on collision (1.5); NO C = No collision with another vehicle (31.1); REAR = Rear end (30.0); RTR = Rear to rear (0.3); SSO = Sideswipe/opposite direction (2.8); SSS = Sideswipe/same direction (10.7)
4 | RLTNRDWY | Location of the first harmful event in relation to a roadway | GORE = Gore (0.2); LTSH = Outside shoulder-left (4.8); MED = Median (2.0); OFF = Off roadway, location unknown (0.8); ON = On roadway (78.8); PLOT = Private lot or private prop (0.0); RAMP = On ramp (0.7); RTSH = Outside shoulder-right (9.1); SHLD = Shoulder (3.6)
5 | HWYCLASS | The type of road the crash took place on | R CITY = City street rural (3.5); R CTH = County trunk rural (8.4); R IH = Interstate highway rural (3.2); R STH = State highway rural (14.8); R TOWN = Town road rural (6.8); U CITY = City street urban (40.3); U CTH = County trunk urban (0.1); U IH = Interstate highway urban (5.1); U STH = State highway urban (17.7)
6 | ACCDSVR | The worst level of crash severity to life and property | FAT = Fatal accident (0.5); INJ = Injury occurred (31.2); PD = Property damage only (68.3)
7 | POSTSPD | Posted speed for a vehicle unit at the location where a crash occurred | [S5, S10, S15, S20] mile/hour (1.0); S25 mile/hour (26.5); [S30, S35, S40, S45, S50] mile/hour (43.3); S55 mile/hour (19.2); [S60, S65, S70, S77] mile/hour (10.0)
8 | TRFCWAY | Text describing areas designed for motor vehicle operation | ND = Not physically divided (60.7); D/WO = Divided highway without traffic barrier (21.5); D/B = Divided highway with traffic barrier (13.6); OW = One-way traffic (4.2)
9 | AGE | The age of the driver who causes the crash | A1 (0.3); A2 (35.3); [A3, A9] (64.7)
10 | SEX | The sex of the driver | Male (57.4); Female (42.6)
11 | VEHDMG | The extent of the worst vehicle damage | V MNR = Very minor (7.4); MNR = Minor (19.1); MOD = Moderate (38.6); SVR = Severe (22.7); V SVR = Very severe (8.3); VNONE = None (3.8)
12 | ROADCOND | Surface condition of the road | DRY (67.9); MUD (0.2); SNOWY (14.0); ICE (3.1); WET (14.8)
13 | WTHRCOND | The weather condition at the time of a crash | CLR = Clear (49.1); CLDY = Cloudy (31.5); RAIN = Rain (7.7); SNOW = Snow (10.0); FOG = Fog/smog/smoke (0.5); SLET = Sleet/hail (0.7); WIND = Blowing sand/dirt/snow (0.4); XWIND = Severe crosswinds (0.0)
14 | DRVRDO | What the driver of the unit was doing at the time of the crash | BACKING = Backing up (3.4); CHG LN = Changing lanes (3.7); GO STR = Going straight (55.4); IL PRK = Illegally parked (0.0); LG PRK = Legally parked (0.0); LT TRN = Making left turn (13.4); MERGING = Merging into traffic (1.4); NEGCRV = Negotiating curve (7.0); NPASZN = Violate no pass zone (0.1); OVT LT = Overtaking on the left (0.7); OVT RT = Overtaking on right (0.4); PARKNG = Parking maneuver (0.3); RT TRN = Right turn (5.9); RTOR = Right turn on red (0.0); SL/ST = Slowing or stopped (7.3); STOPED = Stopped in traffic (0.3); UTURN = U turn (0.7)
15 | DRVRPC | The possible driver contributing circumstances (Driver Factors) in a collision | DC = Driver condition (2.2); DIS = Physically disabled (0.0); DTC = Disregard traffic control (3.4); FTC = Following too close (11.1); FTY = Failure to yield (20.8); FVC = Failure to keep vehicle under control (13.6); IC = In conflict (0.0); ID = Inattentive driving (24.2); IO = Improper overtake (1.4); IT = Improper turn (2.5); LOC = Left of center (1.1); SPD = Exceed speed limit (2.6); TFC = Too fast for conditions (14.5); UB = Unsafe backing (2.6)
3. Methodology
3.1. Basic Conceptions

In the current study, an item is a single attribute value of a reported crash (for example, MALE or
WET), and an item set is a set of items; each reported crash corresponds to one transaction, namely the
set of items that describe it. A k-item set is an item set consisting of k items. A frequent pattern is a
combination of attribute values that occurs at least a certain number of times in the dataset [29]. An
association pattern represents the association and correlation between several items, and association
rules are association patterns that satisfy a user-specified minimum support [30].
Let I = {i_1, i_2, ..., i_m} be a finite set of items, and let D be a dataset of transactions, each of which is
a subset of I [31]. An extracted association rule is an implication of the form X ⇒ Y, where the
antecedent X and the consequent Y are item sets drawn from I and X ∩ Y = ∅. Support and confidence
are the two most commonly used criteria for measuring the importance of association rules. The
support of X ⇒ Y is the proportion of transactions in D that contain both X and Y:

$$\mathrm{Sup}(X \Rightarrow Y) = \frac{|X \cup Y|}{|D|} \qquad (1)$$

where |D| is the total number of transactions and |X ∪ Y| is the number of transactions that include
both item sets X and Y.
The confidence indicates the credibility of the association rule X ⇒ Y and is defined as

$$\mathrm{Conf}(X \Rightarrow Y) = \frac{|X \cup Y|}{|X|} = \frac{\mathrm{Sup}(X \cup Y)}{\mathrm{Sup}(X)} \qquad (2)$$

where |X| is the number of transactions that contain item set X and |X ∪ Y| is the number of
transactions that include both item sets X and Y. Association rules whose support and confidence are
both greater than or equal to the user-defined thresholds are valid rules that deserve to be analyzed.
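Equations (1) and (2) can be checked on a few transactions. The sketch below assumes each crash is represented as a frozenset of items; the five crashes and their items are made up for illustration.

```python
def support(itemset, transactions):
    """Equation (1): share of transactions that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Equation (2): Sup(X ∪ Y) / Sup(X)."""
    x = frozenset(antecedent)
    return support(x | frozenset(consequent), transactions) / support(x, transactions)


# Hypothetical transactions, one per reported crash.
crashes = [frozenset(t.split()) for t in [
    "MALE A2 ND", "MALE ND WET", "FEMALE ND", "MALE A2 D/B", "FEMALE A3 ND"]]

print(support({"MALE", "ND"}, crashes))       # MALE and ND together in 2 of 5 crashes
print(confidence({"MALE"}, {"ND"}, crashes))  # 2 of the 3 MALE crashes also contain ND
```

Note that |X ∪ Y| counts transactions containing the union of the two item sets, which is why the code tests `s <= t` on the merged set rather than intersecting anything.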
To avoid generating a great number of uninteresting association rules, many algorithms for mining
association rules use criteria based on minimum support and minimum confidence. However,
confidence does not take the support of the consequent Y into account, so useless association rules may
still be generated when the consequent is very frequent. To solve this problem, previous researchers
have proposed several measures, of which lift is the most widely used:

$$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{Conf}(X \Rightarrow Y)}{\mathrm{Sup}(Y)} \qquad (3)$$

where Conf(X ⇒ Y) is the confidence of the association rule X ⇒ Y and Sup(Y) is the support of item
set Y. A lift of 1 means that item sets X and Y are independent (no correlation), while a lift below 1
means that they are negatively correlated, that is, the occurrence of X makes Y less likely. Only rules
with lift > 1 are recognized as valuable rules.
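The following sketch illustrates the problem that lift addresses: when the consequent is very common, a rule can have high confidence and still be uninformative. The data are hypothetical, chosen so that PD appears in 9 of 10 crashes.

```python
def support(itemset, transactions):
    """Equation (1): share of transactions containing every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Equation (3): Conf(X => Y) / Sup(Y)."""
    x, y = frozenset(antecedent), frozenset(consequent)
    conf = support(x | y, transactions) / support(x, transactions)
    return conf / support(y, transactions)


# Hypothetical data in which the consequent PD is very frequent (9 of 10 crashes).
crashes = [frozenset(t.split()) for t in [
    "DRY PD", "DRY PD", "DRY PD", "DRY PD", "DRY INJ",
    "ICE PD", "ICE PD", "WET PD", "WET PD", "WET PD"]]

print(lift({"DRY"}, {"PD"}, crashes))  # confidence 0.8, yet lift = 0.8 / 0.9 < 1
print(lift({"ICE"}, {"PD"}, crashes))  # confidence 1.0, lift = 1.0 / 0.9 > 1
```

DRY ⇒ PD would pass a confidence threshold of 0.8, but its lift below 1 shows that a dry road makes property-damage-only crashes slightly less likely than the baseline in this toy data.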
3.2. Association Rule Mining
Extracting important and hidden information from a large dataset by mining association rules is one
of the most common tasks in data mining [32]. Association rule mining can be described as a two-step
process [33]:
• Generating frequent item sets: find all item sets whose support value is equal to or greater than the
minimum support value;
• Generating association rules: generate association rules from the frequent item sets under the
condition of minimum confidence.
Figure 3 shows the process of association rule mining.
Figure 3. Association rule mining process.

Association rule mining algorithms include Apriori, SETM [34], ECLAT [35], Pincer Search [36], and
MAFIA [37], which are based on the support-confidence framework proposed by Agrawal and Srikant.
The Apriori algorithm is succinct and clear, adopting an iterative, layer-by-layer search. In the current
study, the Apriori algorithm was used to discover the significant rules between the factors and crashes
in Wisconsin.
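The two-step process and the layer-by-layer Apriori search can be sketched compactly. This is a minimal, unoptimized illustration, not the authors' Python implementation (which the paper does not describe). It assumes transactions are frozensets of items, and it rescans the data once per candidate, which is acceptable for a toy example but slow for tens of thousands of crashes. The transactions and thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Step 1: level-wise search; returns {frozenset(item set): support}."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count the support of each remaining candidate.
        current = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                current[c] = sup
        frequent.update(current)
        k += 1
    return frequent


def rules_from(frequent, min_conf, min_lift=1.0):
    """Step 2: split each frequent item set into X => Y and keep strong rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                x = frozenset(ante)
                y = itemset - x
                conf = sup / frequent[x]   # Sup(X ∪ Y) / Sup(X), Equation (2)
                lift = conf / frequent[y]  # Conf / Sup(Y), Equation (3)
                if conf >= min_conf and lift > min_lift:
                    rules.append((x, y, sup, conf, lift))
    return rules


# Hypothetical transactions and thresholds.
crashes = [frozenset(t.split()) for t in [
    "MALE A2 ND", "MALE A2 ND", "MALE ND WET", "FEMALE ND", "MALE A2 D/B"]]
frequent = apriori(crashes, min_support=0.4)
for x, y, sup, conf, lift in rules_from(frequent, min_conf=0.6):
    print(sorted(x), "=>", sorted(y), round(sup, 2), round(conf, 2), round(lift, 2))
```

Because every subset of a frequent item set is itself frequent, `frequent[x]` and `frequent[y]` are always available in step 2, so rule generation needs no further passes over the data.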
3.3. Validity Test of Association Rules
Because of the large number of association rules, there is a high risk of type I errors, so a validity test
is needed to evaluate the statistical significance of the rules obtained [38]. Validation approaches
generally fall into two categories. The first is the direct adjustment approach, which requires every
association rule to pass a statistical test at an adjusted critical value. The second is the holdout
approach, which divides the data into exploratory data, used to generate association rules without
regard for the problem of multiple testing, and holdout data, used for the statistical tests.
In the current study, a direct adjustment approach was applied to test the validity of the association
rules, because it uses all of the data for both association rule discovery and statistical evaluation [38];
moreover, it requires no more statistical tests than the holdout approach. A number of direct
adjustment approaches have been used to perform multiple hypothesis tests, such as the Bonferroni
correction [39], the sequentially rejective Bonferroni procedure [40], and the adaptive
Benjamini–Hochberg algorithm [41]. The Bonferroni correction states that if an experimenter is testing
n hypotheses on a set of data, then the significance level used for each hypothesis separately should be
1/n times the level that would be used if only one hypothesis were tested. Because it bounds the
familywise error rate with a simple, conservative threshold, the Bonferroni correction yields rigorous
results, and it was therefore applied in the current study. The Bonferroni correction is defined as
follows.
Let H_1, H_2, ..., H_n be a family of hypotheses and p_1, p_2, ..., p_n their corresponding p-values, where
n is the total number of null hypotheses and n_0 is the number of true null hypotheses. The familywise
error rate (FWER) is the probability of rejecting at least one true H_i, in other words, of making at least
one type I error. The Bonferroni correction rejects the null hypothesis H_i whenever p_i ≤ α/n, where α is
the global significance level. Control of the FWER follows from Boole's inequality. Indexing the true
null hypotheses as 1, ..., n_0, and noting that under a true null hypothesis the p-value is uniformly
distributed, so that P(p_i ≤ α/n) = α/n:

$$\mathrm{FWER} = P\left(\bigcup_{i=1}^{n_0}\left\{p_i \le \frac{\alpha}{n}\right\}\right) \le \sum_{i=1}^{n_0} P\left(p_i \le \frac{\alpha}{n}\right) = n_0\frac{\alpha}{n} \le n\frac{\alpha}{n} = \alpha \qquad (4)$$
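The Bonferroni rule itself is a one-line threshold. The sketch below applies it to four hypothetical p-values at a hypothetical global level α = 0.05; none of these numbers come from the study.

```python
def bonferroni(p_values, alpha):
    """Reject H_i when p_i <= alpha / n; returns the threshold and the decisions."""
    n = len(p_values)
    threshold = alpha / n
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values for four rules, tested at a global level alpha = 0.05.
threshold, rejected = bonferroni([0.001, 0.01, 0.02, 0.04], alpha=0.05)
print(threshold, rejected)
```

All four p-values are below 0.05, so without correction every rule would be declared significant; with n = 4 the per-rule threshold drops to 0.0125 and only the first two survive, which is what keeps the familywise error rate at or below α in Equation (4).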
4. Results and Discussion
Through data pretreatment, 63,325 valid reported crash records were retained. Among them, there were
43,239 property damage only (PD) crashes, 19,766 injury (INJ) crashes, and 320 fatal (FAT) crashes, as
shown in Figure 4. Based on this dataset, association rules were then generated using Python 3.5 on a
Lenovo laptop with an Intel Core i5-5200U 2.20 GHz CPU and 8 GB of RAM. A total of 766 association
rules were obtained with the filter criteria of minimum support equal to 0.1, minimum confidence
equal to 0.14, and lift greater than 1.0, as shown in Figure 5.
Figure 4. The proportion of accident category.

Figure 5. Seven hundred and sixty-six association rules.
The current study compared the p-values of the association rules against the upper bound 0.1/766 ≈ 1.3 × 10^−4, since 766 association rules were obtained with a minimum support of 0.1 [42]. Only two rules had p-values higher than 1.3 × 10^−4: the p-value of rule WET, MALE ⇒ ND is 0.012, and that of rule LT TRN ⇒ PD is 0.029. The number of false discoveries is extremely low because the support, confidence, and lift thresholds already prune out most rules that are not statistically significant.
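The cutoff 0.1/766 has the form of a Bonferroni correction (a significance level divided by the number of rules tested). The sketch below reproduces that arithmetic and, as one possible way to obtain a per-rule p-value, applies a one-sided Fisher exact test (the upper tail of the hypergeometric distribution) to a rule's co-occurrence counts. The article does not state which test it used, so the choice of Fisher's test, the helper names, and the example counts are assumptions. It requires Python 3.8+ for `math.comb`.

```python
from math import comb


def bonferroni_threshold(alpha, n_rules):
    """Per-rule cutoff when a family-wise level alpha is split over n_rules tests."""
    return alpha / n_rules


def fisher_one_sided_p(n, n_a, n_c, n_ac):
    """One-sided Fisher exact test for a rule A => C.

    n    -- number of transactions (crashes)
    n_a  -- transactions containing the antecedent A
    n_c  -- transactions containing the consequent C
    n_ac -- transactions containing both A and C
    Returns P(X >= n_ac) for hypergeometric X, i.e. the chance of seeing at
    least this much co-occurrence if A and C were independent.
    """
    denom = comb(n, n_a)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(
        comb(n_c, k) * comb(n - n_c, n_a - k)
        for k in range(n_ac, min(n_a, n_c) + 1)
    )
    return tail / denom


threshold = bonferroni_threshold(0.1, 766)
print(f"threshold = {threshold:.2e}")  # about 1.31e-04
# Hypothetical counts, for illustration only.
p = fisher_one_sided_p(n=1000, n_a=200, n_c=300, n_ac=90)
print(p, p < threshold)
```

A rule is kept as significant when its p-value falls below `threshold`; with SciPy available, `scipy.stats.fisher_exact` with `alternative="greater"` is an equivalent off-the-shelf option.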
High-support rules indicate a high frequency of association (i.e., events that occur together frequently in crashes), while high confidence indicates a high probability that the consequent event occurs when the antecedent event has occurred (i.e., the consequent event is more likely to occur when the antecedent event happens in a crash). Rules with a lift greater than 1.0 are valid rules and indicate a positive association between the factors (i.e., the two events co-occur in crashes more often than they would if they were independent); the higher the lift, the stronger the association. The current study screened out the top 20 association rules with the highest support (Table 4), the top 20 with the highest confidence (Table 5), and the top 20 with the highest lift (Table 6).
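The screening into Tables 4, 5, and 6 amounts to filtering the rules on lift and sorting them by each metric in turn. A minimal sketch follows; the `Rule` record and the `top_k` helper are hypothetical names, and the five sample rules are copied from Table 5 with their values as printed in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    support: float
    confidence: float
    lift: float


def top_k(rules, metric, k=20):
    """Return the k rules with the highest value of metric ('support', 'confidence' or 'lift')."""
    return sorted(rules, key=lambda r: getattr(r, metric), reverse=True)[:k]


# A few rules copied from Table 5 (values as reported in the article).
rules = [
    Rule("FTC", "REAR", 0.14, 0.95, 3.17),
    Rule("S25", "ND", 0.27, 0.85, 1.41),
    Rule("MNR", "PD", 0.19, 0.83, 1.21),
    Rule("0, FTY", "ANGL", 0.15, 0.78, 3.31),
    Rule("S55", "ND", 0.19, 0.76, 1.25),
]
# Keep only rules with lift > 1.0 (positive association), then rank by each metric.
valid = [r for r in rules if r.lift > 1.0]
for metric in ("support", "confidence", "lift"):
    best = top_k(valid, metric, k=1)[0]
    print(metric, best.antecedent, "=>", best.consequent)
```

The same rule set yields a different leader under each metric (S25 => ND by support, FTC => REAR by confidence, 0, FTY => ANGL by lift), which is why the three rankings are reported separately.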
Table 4. Top 20 support association rules of the highest value.
Rules Antecedent Consequent Support Confidence Lift
1 PD S25, 0 0.68 0.15 1.01
2 PD S25, CLR 0.68 0.15 1.05
3 PD U CITY, CLR, ND 0.68 0.15 1.01
4 PD S25, M 0.68 0.16 1.08
5 PD MOD, A2 0.68 0.16 1.13
6 PD U CITY, ND, M 0.68 0.16 1.04
7 PD TFC 0.68 0.16 1.08
8 PD REAR, M 0.68 0.17 1.02
9 PD 0, MOD 0.68 0.17 1.05
10 PD S25, U CITY, ND 0.68 0.18 1.07
11 PD MOD, U CITY 0.68 0.18 1.09
12 PD F, U CITY 0.68 0.19 1.01
13 PD MOD, F 0.68 0.19 1.10
14 PD MOD, CLR 0.68 0.20 1.06
15 PD S25, U CITY 0.68 0.20 1.07
16 PD A2, M 0.68 0.20 1.02
17 PD U CITY, M 0.68 0.22 1.03
18 PD MOD, GO STR 0.68 0.23 1.07
19 PD MNR 0.68 0.23 1.21
20 PD MOD, M 0.68 0.24 1.11
The following is an analysis of the results in Table 4:
• Because PD (property damage only) crashes account for 68.3% of the whole dataset, all of the top 20 association rules ranked by support involve PD. This indicates that most crashes do not involve injuries or fatalities, which is consistent with the findings of the Global Status Report on Road Safety 2018 [1].
• The significant factors in the high-support association rules are the type of road, the extent of the worst vehicle damage, posted speed, male drivers, roadways with no physical separation, weather, location, and driver age.
• In property damage only crashes, the extent of vehicle damage is most often moderate (MOD) (rules 5, 9, 11, 13, 14, 18, and 20).
• Crashes mostly occurred in urban areas (rules 11, 12, and 17), on roadways with no physical separation (rules 3 and 6), and at lower posted speeds (rule 15); relatedly, Abdel-Aty and Radwan found that highway geometry is the second most important factor in the occurrence of traffic crashes [24]. In particular, rule 10, PD → S25, U CITY, ND (support = 0.68, confidence = 0.18, lift = 1.07), expresses the relationship among these factors. These rules suggest that decision makers could reduce the occurrence of crashes by installing physical separation on crash-prone sections.
• Male drivers are more prone than female drivers to be associated with property damage only crashes, as can be observed by comparing rules 4, 6, 8, 16, 17, and 20 (male drivers) with rules 12 and 13 (female drivers). On the one hand, male drivers are more likely than female drivers to drive drunk and/or speed [43]. On the other hand, male drivers are probably less likely to comply with traffic rules and are generally overconfident while driving [44].
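Every rule in Table 4 reports a support of 0.68 but a confidence of only 0.15 to 0.24. Because the support of a whole rule equals supp(A) × conf(A ⇒ C) and therefore cannot exceed its confidence, the Support column appears to report the support of the antecedent PD (its 68.3% share) rather than of the whole rule. Under that reading, and assuming the minimum support of 0.1 was applied to the whole rule, the following arithmetic suggests why no rule with antecedent PD falls below a confidence of about 0.15:

```latex
\begin{align*}
\operatorname{conf}(A \Rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)}
  \;\Longrightarrow\; \operatorname{supp}(A \cup C) = \operatorname{supp}(A)\,\operatorname{conf}(A \Rightarrow C) \le \operatorname{conf}(A \Rightarrow C),\\
\operatorname{supp}(\mathrm{PD} \cup C) \ge 0.1
  &\;\Longrightarrow\; \operatorname{conf}(\mathrm{PD} \Rightarrow C) \ge \frac{0.1}{0.683} \approx 0.146,\\
\text{rule 1:}\quad \operatorname{supp}(\mathrm{PD} \cup \{\mathrm{S25}, 0\}) &\approx 0.68 \times 0.15 \approx 0.10.
\end{align*}
```

The lowest confidence in Table 4 (0.15) sits just above this floor, which is consistent with the stated minimum support of 0.1.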
Table 5. Top 20 confidence association rules of the highest value.
Rules Antecedent Consequent Support Confidence Lift

1 FTC REAR 0.14 0.95 3.17
2 S55, NO C ND 0.12 0.89 1.46
3 S25, GO STR ND 0.14 0.87 1.44
4 S25, U CITY, PD ND 0.14 0.87 1.43
5 S25, U CITY ND 0.19 0.87 1.43
6 S25, CLR ND 0.14 0.86 1.42
7 ANGL, GO STR 0 0.12 0.86 2.10
8 S25, PD ND 0.19 0.86 1.41
9 S25, M ND 0.14 0.85 1.41
10 S25 ND 0.27 0.85 1.41
11 S25, F ND 0.12 0.85 1.40
12 MNR PD 0.19 0.83 1.21
13 S25, 0 ND 0.15 0.82 1.36
14 FTY, ANGL 0 0.15 0.78 1.92
15 0, FTY ANGL 0.15 0.78 3.31
16 ND, FVC NO C 0.14 0.78 2.50
17 MOD, A2 PD 0.14 0.77 1.13
18 S55 ND 0.19 0.76 1.25
19 FTY, ND ANGL 0.14 0.75 3.20
20 MOD, M PD 0.21 0.75 1.11
The following is an analysis of the results in Table 5:
• The rule with the highest confidence, FTC (following too close) → REAR (rear end) (support = 0.14, confidence = 0.95, lift = 3.17), indicates that 95% of the crashes involving following too closely were rear-end collisions, a well-known relationship.
• Consistent with Table 4, low posted speed and roadways with no physical separation (rules 3, 4, 5, 6, etc.) are significant factors in the occurrence of crashes. Crashes at locations with a low posted speed are perhaps caused by a large deviation in speed, generated by drivers who ignore the posted speed limit and drive well above it; Elvik found that a lower posted speed is prone to lead to crashes as a result of a high deviation of speed [45]. On roadways with no physical separation, drivers sometimes occupy the opposite lanes, which can lead to collisions.
• Compared with other drivers, drivers aged 16–25, represented by A2 in Tables 4 and 5, are the most likely to be involved in crashes (rules 5 and 16 in Table 4; rule 17 in Table 5), possibly because drivers aged 16–25 make up a large proportion of all drivers and are more likely to violate traffic rules. Decision makers can strengthen traffic safety education for drivers aged 16–25 to reduce the occurrence of traffic crashes.
• ‘0’ indicates that the crash occurred at an intersection. Four rules (rules 7, 13, 14, and 15) show that crashes are more likely to occur at intersections. An intersection is where vehicle and pedestrian flows converge, creating complex traffic conditions that are more likely to lead to a crash, and previous work has found that crashes are more prone to occur at intersections [46]. Appropriate organization of intersection traffic flow might help decision makers control the occurrence of crashes effectively.
• Following too close (FTC), failure to yield (FTY), and failure to keep the vehicle under control (FVC) are perhaps the most significant driver-contributing circumstances in a crash (rules 1, 14, 15, 16, and 19). Abdel-Aty and Radwan found that driver conditions were the most important factors in the occurrence of traffic crashes [24].
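Three rules in Table 5 with consequent PD (rules 12, 17, and 20) reappear in Table 4 with the direction reversed (rules 19, 5, and 20), which allows a consistency check on the published values. The sketch below assumes, as the numbers suggest, that the Support column reports the support of the antecedent. Under that assumption both directions should give the same whole-rule support, supp(A) × conf, the same lift (lift is symmetric), an implied PD share (conf / lift) close to the 68.3% reported for the dataset, and, from the Table 4 direction, an implied antecedent share matching the Support printed in Table 5. This is a reader's check on rounded published figures, and the helper names are hypothetical.

```python
def rule_support(antecedent_support, confidence):
    """Support of the whole rule A => C, recovered as supp(A) * conf(A => C)."""
    return antecedent_support * confidence


def implied_consequent_support(confidence, lift):
    """supp(C) recovered from lift = conf / supp(C)."""
    return confidence / lift


# (antecedent support, confidence, lift) as printed in the article:
# first tuple = Table 5 rule X => PD, second tuple = Table 4 rule PD => X.
pairs = {
    "MOD, A2": ((0.14, 0.77, 1.13), (0.68, 0.16, 1.13)),
    "MNR": ((0.19, 0.83, 1.21), (0.68, 0.23, 1.21)),
    "MOD, M": ((0.21, 0.75, 1.11), (0.68, 0.24, 1.11)),
}
for x, (fwd, rev) in pairs.items():
    print(
        f"{x}: rule support {rule_support(*fwd[:2]):.3f} vs {rule_support(*rev[:2]):.3f}; "
        f"lift {fwd[2]} vs {rev[2]}; "
        f"implied PD share {implied_consequent_support(*fwd[1:]):.3f}; "
        f"implied {x} share {implied_consequent_support(*rev[1:]):.3f} (Table 5: {fwd[0]})"
    )
```

Small mismatches in the third decimal place are expected because the article rounds every value to two decimals.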
Table 6. Top 20 lift association rules of the highest value.
Rules Antecedent Consequent Support Confidence Lift
1 0, FTY ANGL 0.15 0.78 3.31
2 FTY 0, ANGL 0.21 0.57 3.23
3 FTY, ND ANGL 0.14 0.75 3.20
4 FTC REAR 0.14 0.95 3.17
5 S55 ND, NO C 0.19 0.54 2.52
6 ND, FVC NO C 0.14 0.78 2.50
7 M, FVC NO C 0.14 0.73 2.36
8 FVC, GO STR NO C 0.14 0.72 2.33
9 M, NO C FVC 0.20 0.53 2.31
10 FVC NO C 0.23 0.71 2.30
11 PD, FVC NO C 0.15 0.71 2.28
12 ND, S55 NO C 0.15 0.71 2.27
13 U CITY, PD, ND S25 0.20 0.59 2.24
14 FVC ND, NO C 0.23 0.48 2.24
15 FVC PD, NO C 0.23 0.47 2.21
16 S25 U CITY, ND 0.27 0.62 2.16
17 U CITY, ND S25, PD 0.29 0.42 2.14
18 ANGL, GO STR 0 0.12 0.86 2.10
19 0, GO STR ANGL 0.21 0.50 2.08
20 0, U CITY ANGL 0.22 0.49 2.05
The following is an analysis of the results in Table 6:
• High lift values suggest a strong interdependence between the antecedent and the consequent. The three rules with the highest lift values (rules 1, 2, and 3) indicate a strong connection among drivers failing to yield, crashes occurring at intersections, and angle collisions [24].
• The rule with the highest lift value is 0, FTY → ANGL (support = 0.15, confidence = 0.78, lift = 3.31). The support value shows that 15% of crashes involved a failure to yield at an intersection [46]. The confidence value shows that 78% of these crashes were angle collisions, and the lift value shows that this share is 3.31 times the share of angle collisions among all crashes.
• Crashes are more likely to happen when drivers are going straight (rules 8, 18, and 19), possibly because drivers tend to be less vigilant when going straight than when negotiating a curve.
• Nine rules have NO C (no collision) as the consequent, which indicates that most of these crashes did not involve a collision between vehicles, because the vehicle instead collided with a physical barrier.
• Male drivers are more prone to fail to keep the vehicle under control (rules 7 and 9). Das et al. also found that male drivers are associated with a higher number of crashes [47].
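Because lift divides a rule's confidence by the overall support of its consequent, the values reported for rule 1 of Table 6 can be unpacked as follows. The implied share of angle collisions is derived from rounded published values, so it is approximate:

```latex
\operatorname{lift}(A \Rightarrow C) = \frac{\operatorname{conf}(A \Rightarrow C)}{\operatorname{supp}(C)}
\quad\Longrightarrow\quad
\operatorname{supp}(\mathrm{ANGL}) \approx \frac{0.78}{3.31} \approx 0.24
```

In other words, angle collisions make up roughly 24% of all crashes but 78% of the crashes in which a driver failed to yield at an intersection. Rules 3 and 19 of Table 6 imply nearly the same share (0.75/3.20 ≈ 0.23 and 0.50/2.08 ≈ 0.24).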
Because fatal crashes account for only 0.5% of the dataset, rules involving them cannot reach high support values and rarely reach high confidence values. To examine the factors influencing fatal crashes, a dataset containing only fatal crashes was therefore used. The twelve association rules obtained with the filter criteria of minimum support equal to 0.5, minimum confidence equal to 0.5, and minimum lift greater than 1.0 are shown in Table 7.
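Mining the fatal crashes separately amounts to filtering the records before counting, so support and confidence become shares of fatal crashes rather than of all crashes. Because every rule in Table 7 has a single antecedent and a single consequent, the sketch below only enumerates one-item rules with the thresholds stated above (0.5, 0.5, and 1.0). The records and the helper name `pairwise_rules` are hypothetical, and the support it returns is that of the whole rule, which may differ from the convention used in Table 7's Support column.

```python
from itertools import permutations


def pairwise_rules(transactions, min_support, min_confidence, min_lift):
    """All one-item => one-item rules passing the three thresholds.

    Support is relative to the transactions passed in, so mining the
    fatal-only subset gives shares of fatal crashes, not of all crashes.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    supp = {i: sum(1 for t in transactions if i in t) / n for i in items}
    rules = []
    for a, c in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and c in t) / n
        if both < min_support:
            continue
        confidence = both / supp[a]
        lift = confidence / supp[c]
        if confidence >= min_confidence and lift > min_lift:
            rules.append((a, c, both, confidence, lift))
    return rules


# Hypothetical crash records; the severity code is one of the items.
records = [
    {"FAT", "T", "M", "DRY", "CLR"},
    {"FAT", "T", "M", "DRY", "CLR"},
    {"FAT", "T", "M", "DRY", "ND"},
    {"FAT", "U CITY", "F", "WET"},
    {"FAT", "T", "F", "DRY", "CLR"},
    {"PD", "U CITY", "M", "WET"},
    {"INJ", "U CITY", "F", "DRY"},
]
# Keep fatal crashes only and drop the severity code itself.
fatal = [t - {"FAT"} for t in records if "FAT" in t]
for a, c, s, conf, lift in pairwise_rules(fatal, 0.5, 0.5, 1.0):
    print(f"{a} => {c}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

As in Table 7, each qualifying pair appears in both directions (for example DRY => CLR and CLR => DRY) with the same lift but different confidence.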
Table 7. Association rules related to fatal crashes.
Rules Antecedent Consequent Support Confidence Lift
1 T M 0.68 0.76 1.02
2 M T 0.75 0.69 1.02
3 DRY CLR 0.84 0.69 1.14
4 CLR DRY 0.60 0.96 1.14
5 M ND 0.75 0.77 1.02
6 ND M 0.76 0.76 1.02
7 M DRY 0.75 0.87 1.03
8 DRY M 0.84 0.77 1.03
9 T ND 0.68 0.86 1.13
10 ND T 0.76 0.77 1.13
11 V SVR ND 0.68 0.76 1.02
12 ND V SVR 0.75 0.69 1.02
The following is an analysis of the results in Table 7:
• The significant factors for fatal crashes are location, male drivers, the extent of the worst vehicle damage, roadways with no physical separation, weather, and road surface condition.
• Unlike property damage only crashes, fatal crashes are more likely to occur in towns than in cities. Compared with city roads, town roads have fewer vehicles, less police presence, and less supervision, so drivers may be less vigilant and more inclined to speed.
• Male drivers are prone to be involved in fatal crashes, for the same reasons as for other types of crashes.
• Drivers are more likely to be involved in fatal crashes when the weather is clear and the road surface is dry, perhaps because drivers pay more attention to driving when the weather and road surface conditions are dangerous. Karlaftis and Yannis suggest a negative relationship between adverse weather and crash occurrence, mainly because drivers who are not used to driving in adverse weather adjust their behavior by driving more carefully [48].
• Roadways with no physical separation remain a persistent threat to traffic safety (rules 5, 6, and 9–12).
5. Conclusions
Because of the complicated interactions among different factors (the state of the driver, the condition of the vehicle and road, the environment, and management), a traffic crash is a complex and systemic problem. To decrease the number of traffic crashes, their fundamental causes, which form the basis for countermeasures, need to be analyzed systematically. In recent years, many researchers have worked to identify the key factors that influence the severity and frequency of traffic crashes, in order to formulate effective safety countermeasures that enhance traffic sustainability [47].
In the current study, the Apriori algorithm was implemented to identify characteristics and factors affecting traffic crashes in Wisconsin, United States. By setting appropriate support and confidence thresholds, essential information on traffic crash characteristics can be obtained to analyze the fundamental causes of traffic crashes. The association rules generated in the current study suggest several significant factors: posted speed, driver condition, weather condition, road surface condition, distance from the intersection, roadways with no physical separation, the administrative grade of the crash location, male drivers, and driver age. Taking these factors into account, the government can take targeted countermeasures to improve the sustainable level of traffic safety.
The majority of the findings are consistent with previous studies. The critical contribution of the current study is that the variables considered are more comprehensive, covering the environment, management, and the state of drivers and vehicles.
Note that the present study did not tune the parameters with any optimization method, because objective and significant results were obtained at the current size of the database. In future work, genetic algorithms and particle swarm optimization could be incorporated into the Apriori algorithm to optimize the parameter values and to obtain significant results efficiently when analyzing large-scale databases.
Author Contributions: S.Y. and Y.J. developed the concept and designed the study. S.Y. performed the
methodology. S.Y. and D.S. performed the data analysis. Y.J. and D.S. read and approved the final manuscript.
All authors contributed to the result interpretation and the final version of the manuscript.
Funding: This research received no external funding.
Acknowledgments: This research is supported by National Natural Science Foundation of China (71340020).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. World Health Organization (WHO). Global Status Report on Road Safety 2018; WHO: Geneva,
Switzerland, 2018.
2. Wegman, F. Road Accidents: Worldwide a Problem that Can Be Tackled Successfully; Permanent International
Association of Road Congresses: Paris, France, 1996.
3. Tešić, M.; Hermans, E.; Lipovac, K.; Pešić, D. Identifying the most significant indicators of the total road safety
performance index. Accid. Anal. Prev. 2018, 113, 263–278. [CrossRef] [PubMed]
4. Haddon, W. Options for the prevention of motor vehicle crash injury. Isr. J. Med Sci. 1980, 16, 45–65.
[PubMed]
5. Sun, Y.; Hrušovský, M.; Zhang, C.; Lang, M.X. A Time-Dependent Fuzzy Programming Approach for the
Green Multimodal Routing Problem with Rail Service Capacity Uncertainty and Road Traffic Congestion.
Complexity 2018, 2018, 1–22. [CrossRef]
6. Figueira, A.D.C.; Pitombo, C.S.; Oliveira, D.; Paulo, T.M.S.; Larocca, A.P.C. Identification of rules induced
through decision tree algorithm for detection of traffic accidents with victims: A study case from Brazil.
Case Stud. Transp. Policy 2017, 5, 200–207. [CrossRef]
7. Shibata, A.; Fukuda, K. Risk factors of fatality in motor vehicle traffic accidents. Accid. Anal. Prev. 1994, 26,
391–397. [CrossRef]
8. Moudon, A.; Lin, L.; Jiao, J.; Hurvitz, P.; Reeves, P. The risk of pedestrian injury and fatality in collisions
with motor vehicles, a social ecological study of state routes and city streets in King County, Washington.
Accid. Anal. Prev. 2011, 43, 11–24. [CrossRef] [PubMed]
9. Shankar, V.; Mannering, F. An exploratory multinomial logit analysis of single-vehicle motorcycle accident
severity. J. Saf. Res. 1996, 27, 183–194. [CrossRef]
10. Yasmin, S.; Eluru, N.; Ukkusuri, S. Alternative ordered response frameworks for examining pedestrian injury
severity in New York City. J. Transp. Saf. Secur. 2014, 6, 275–300. [CrossRef]
11. Wu, Z.; Sharma, A.; Mannering, F.; Wang, S. Safety impacts of signal-warning flashers and speed control at
high-speed signalized intersections. Accid. Anal. Prev. 2013, 54, 90–98. [CrossRef]
12. Savolainen, P.; Mannering, F. Probabilistic models of motorcyclists' injury severities in single- and
multi-vehicle crashes. Accid. Anal. Prev. 2007, 39, 955–963. [CrossRef]
13. Chen, W.; Jovanis, P. Method for identifying factors contributing to driver-injury severity in traffic crashes.
Transp. Res. Rec. 2000, 1707, 1–9. [CrossRef]
14. Abdelwahab, H.; Abdel-Aty, M. Development of artificial neural network models to predict driver injury
severity in traffic accidents at signalized intersections. Transp. Res. Rec. 2001, 1746, 6–13. [CrossRef]
15. Chimba, D.; Sando, T. Neuromorphic prediction of highway injury severity. Adv. Transp. Stud. 2009, 19,
17–26.
16. Castro, M.; Paleti, R.; Bhat, C. A spatial generalized ordered response model to examine highway crash
injury severity. Accid. Anal. Prev. 2013, 52, 188–203. [CrossRef]
17. Xiong, Y.; Tobias, J.; Mannering, F. The analysis of vehicle crash injury-severity data: A Markov switching
approach with road-segment heterogeneity. Transp. Res. Part B: Methodol. 2014, 67, 109–128. [CrossRef]
18. Martin, D.; Rosete, A.; Alcala-Fdez, J.; Herrera, F. A new multiobjective evolutionary algorithm for mining a
reduced set of interesting positive and negative quantitative association rules. IEEE Trans. Evol. Comput.
2014, 18, 54–69. [CrossRef]
19. Miaou, S.; Lum, H. Modeling vehicle accidents and highway geometric design relationships.
Accid. Anal. Prev. 1993, 25, 689–709. [CrossRef]
20. Shankar, V.; Mannering, F.; Barfield, W. Effect of roadway geometrics and environmental factors on rural
freeway accident frequencies. Accid. Anal. Prev. 1995, 27, 371–389. [CrossRef]
21. Milton, J.; Mannering, F. The relationship among highway geometrics, traffic-related elements and
motor-vehicle accident frequencies. Transportation 1998, 25, 395–431. [CrossRef]
22. Tarko, A.P.; Kanodia, M. Effective and Fair Identification of Hazardous Locations. In Transportation Research
Record: Journal of the Transportation Research Board, No. 1897; Transportation Research Board of the National
Academies: Washington, DC, USA, 2004; pp. 64–70.
23. Bagdadi, O. Estimation of the severity of safety critical events. Accid. Anal. Prev. 2013, 50, 167–174. [CrossRef]
[PubMed]
24. Abdel-Aty, M.; Radwan, A. Modeling traffic accident occurrence and involvement. Accid. Anal. Prev. 2000,
32, 633–642. [CrossRef]
25. Agrawal, R.; Imielinski, T.; Swami, A. Mining association rules between sets of items in large databases.
ACM SIGMOD Rec. 1993, 22, 207–216. [CrossRef]
26. Golob, T.; Recker, W. A method for relating type of crash to traffic flow characteristics on urban freeways.
Transp. Res. Part A Policy Pract. 2004, 38, 52–80.
27. Prati, G.; Pietrantoni, L.; Fraboni, F. Using data mining techniques to predict the severity of bicycle crashes.
Accid. Anal. Prev. 2017, 101, 44–54. [CrossRef]
28. Kumar, S.S.; Kumar, J.S. A Study of K-Means and C-Means Clustering Algorithms for Intension Detection
Product Development. Int. J. Innov. Technol. Manag. 2016, 5, 207–213.
29. Rodríguez, G.; Ansel, Y.; Martínez, T.; José, F. Mining frequent patterns and association rules using similarities.
Expert Syst. Appl. 2013, 40, 6823–6836. [CrossRef]
30. Xue, C.J.; Song, W.J.; Qin, L.J.; Dong, Q.; Wen, X.Y. A mutual-information-based mining method for marine
abnormal association rules. Comput. Geosci. 2015, 76, 121–129.
31. Lazzerini, B.; Pistolesi, F. Profiling risk sensibility through association rules. Expert Syst. Appl. 2013, 40,
1484–1490. [CrossRef]
32. Kabir, M.M.J.; Xu, S.X.; Kang, B.H.; Zhao, Z.Y. A new multiple seeds based genetic algorithm for discovering
a set of interesting Boolean association rules. Expert Syst. Appl. 2017, 74, 55–69. [CrossRef]
33. Agrawal, R.; Srikant, R. Fast algorithms for mining association rules in large databases. In Proceedings of the
20th International Conference on Very Large Data Bases, Santiago, Chile, 1994; pp. 487–499.
34. Houtsma, M.; Swami, A. Set-oriented mining for association rules in relational databases. In Proceedings of
the Eleventh International Conference on Data Engineering, Taipei, Taiwan, 6–10 March 1995; pp. 25–33.
35. Zaki, M.J. Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 2000, 12, 372–390.
[CrossRef]
36. Lin, D.I.; Kedem, Z.M. Pincer-search: An efficient algorithm for discovering the maximum frequent set.
IEEE Trans. Knowl. Data Eng. 2002, 14, 553–566. [CrossRef]
37. Burdick, D.; Calimlim, M.; Flannick, J.; Gehrke, J.; Yiu, T. MAFIA: A maximal frequent itemset algorithm.
IEEE Trans. Knowl. Data Eng. 2005, 17, 1490–1504. [CrossRef]
38. Webb, G.I. Discovering significant patterns. Mach. Learn. 2007, 68, 1–33. [CrossRef]
39. Scheffer, T. Finding association rules that trade support optimally against confidence. Intell. Data Anal. 2005,
9, 381–395. [CrossRef]
40. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
41. Benjamini, Y.; Hochberg, Y. On the Adaptive Control of the False Discovery Rate in Multiple Testing with
Independent Statistics. J. Educ. Behav. Stat. 2000, 25, 60–83. [CrossRef]
42. Megiddo, N.; Srikant, R. Discovering predictive association rules. In Proceedings of the International
Conference on Knowledge Discovery & Data Mining, San Diego, CA, USA, 15–18 August 1999.
43. Shinar, D.; Compton, R. Aggressive driving: An observational study of driver, vehicle, and situational
variables. Accid. Anal. Prev. 2004, 36, 429–437. [CrossRef]
44. Zhang, G.N.; Yau, K.K.W.; Gong, X.P. Traffic violations in Guangdong Province of China: Speeding and
drunk driving. Accid. Anal. Prev. 2014, 63, 30–40. [CrossRef]
45. Elvik, R. A comprehensive and unified framework for analysing the effects on injuries of measures
influencing speed. Accid. Anal. Prev. 2019, 125, 63–69. [CrossRef]
46. Yuan, J.; Abdel-Aty, M. Approach-level real-time crash risk analysis for signalized intersections.
Accid. Anal. Prev. 2018, 119, 274–289.
47. Das, S.; Dutta, A.; Jalayer, M.; Bibeka, A.; Wu, L. Factors influencing the patterns of wrong-way driving
crashes on freeway exit ramps and median crossovers: Exploration using ‘Eclat’ association rules to promote
safety. Int. J. Transp. Sci. Technol. 2018, 7, 114–123. [CrossRef]
48. Karlaftis, M.; Yannis, G. Weather effects on daily traffic accidents and fatalities: A time series count data
approach. In Proceedings of the 89th Annual Meeting of the Transportation Research Board, Washington,
DC, USA, 10–14 January 2010.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Sustainability 2019, 11, 1925; doi:10.3390/su11071925 www.mdpi.com/journal/sustainability
