Water 11 00973 v2 PDF
Article
Use of Artificial Intelligence to Improve Resilience
and Preparedness Against Adverse Flood Events
Sara Saravi 1, * , Roy Kalawsky 1 , Demetrios Joannou 1 , Monica Rivas Casado 2 ,
Guangtao Fu 3 and Fanlin Meng 3
1 Wolfson School of Mechanical, Electrical & Manufacturing Engineering, Advanced VR Research Centre,
Loughborough University, Loughborough LE11 3TU, UK; r.s.kalawsky@lboro.ac.uk (R.K.);
d.joannou@lboro.ac.uk (D.J.)
2 School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire MK43 0AL, UK;
m.rivas-casado@cranfield.ac.uk
3 Centre for Water Systems, College of Engineering, Mathematics and Physical Sciences, University of Exeter,
Exeter, Devon EX4 4QF, UK; g.fu@exeter.ac.uk (G.F.); m.fanlin@exeter.ac.uk (F.M.)
* Correspondence: s.saravi@lboro.ac.uk; Tel.: +44-(0)-1509-222-938
Received: 12 February 2019; Accepted: 6 May 2019; Published: 9 May 2019
Abstract: The main focus of this paper is the novel use of Artificial Intelligence (AI) in natural disasters,
more specifically flooding, to improve flood resilience and preparedness. Different types of flood have
varying consequences and follow specific patterns. For example, a flash flood can be a result
of snow or ice melt and can occur in specific geographic places and in certain seasons. The motivation
behind this research arose from the Building Resilience into Risk Management (BRIM)
project, which looks at resilience in water systems. This research applies state-of-the-art
techniques, i.e., AI, and more specifically Machine Learning (ML) approaches, to big data collected from
previous flood events in order to learn from the past, extract patterns and information, and understand flood
behaviours so as to improve resilience, prevent damage, and save lives. In this paper, various ML
models have been developed and evaluated for classifying floods, e.g., flash flood, lakeshore flood,
etc., using current information, e.g., weather forecasts in different locations. The analytical results show
that the Random Forest technique provides the highest accuracy of classification, followed by J48
decision tree and Lazy methods. The classification results can lead to better decision-making on what
measures can be taken for prevention and preparedness and thus improve flood resilience.
Keywords: Artificial Intelligence; machine learning; flood; preparedness; resilience; flood resilience
1. Introduction
Climate change is expected to increase the frequency and intensity of extreme events, including
flooding. Across the world, flooding has an enormous economic impact and costs millions of lives.
The number of large scale natural disasters has significantly increased in the past few years; this results
in considerable impact to human lives, environment and buildings, and substantial damage to societies.
During these disasters, vast quantities of data are collected on the characteristics of the event via
governmental bodies, society (e.g., citizen science), emergency responders, loss adjusters and social
media, amongst others. However, there is a lack of research on how this data can be used to inform
pre-, during and post-event disaster management decisions and to show how different stakeholders
are, or can be, directly or indirectly affected by large scale natural disasters. There is a growing popularity and need
for the use of Artificial Intelligence (AI) techniques [1] that bring large-scale natural disaster data into
real practice and provide suitable tools for natural disaster forecasting, impact assessment, and societal
resilience. This in turn will inform on resource allocation, which can lead to better preparedness and
prevention for a natural disaster, save lives, minimize economic impact, provide better emergency
response, and make communities stronger and more resilient.
The majority of the work done in the area of AI in flooding has been on the use of social media [2]
(e.g., Facebook, Twitter or Instagram) where status updates, comments and photo sharing have been
used for data mining to improve flood modelling and risk management [3–5]. The author of [6] has
used Artificial Neural Networks (ANN) in flash flood prediction using data from soil moisture and
rainfall volume. Further research [7] has focused on the use of the Bursty Keyword technique combined
with the Group Burst algorithm to retrieve co-occurring keywords and derive valuable information
for flood emergency response. AI has also been used on images provided by citizens affected by
flooding for emergency responders to have situational awareness. In [8], the authors explored the use
of algorithms based on ground photography shared within social networks. Use of specific algorithms
for satellite images or aerial imagery [9] to detect flood extent was also explored. Within this context,
the resolution of the imagery collected is of key relevance to detect features of interest due to the
complexity of the imagery acquired in urban areas [10]. Some studies have focused on the analysis of
high resolution, real-time data processing to derive flood information [11,12].
Overall, the majority of disaster-monitoring methods are based on change detection algorithms,
where the affected area is identified through a complex elaboration on images from pre- and post-event.
Change detection can be applied to the amplitude or intensity, filtered or elaborated versions of the
amplitude [2,13,14]. For example, in [15], a technique based on change detection applied to quantities
related to the fractal parameters of the observed surface was developed to address change detection.
In [16], information extracted from images taken and shared on social media by people in flooded
regions was combined with the embedded metadata within them to detect flood patterns. In this study,
a convolutional inception network was applied on pre-trained weights on ImageNet to extract rich
visual information from the social media imagery. A word embedding was used for the metadata to
represent the textual information continuously and feed it to a bidirectional Recurrent Neural Network
(RNN). The word embedding was initialized using GloVe vectors, and finally, the image and text
features were concatenated to estimate the probability that a sample includes information related to
flooding. Similarly, in [17], an AI system was designed to retrieve social media images containing
direct evidence of flooding events and derive visual properties of images and the related metadata via
a multimodal approach. For that purpose, an image pre-processing including cropping and test-set
pre-filtering based on image colour or textual metadata and ranking for fusion was implemented.
In [18,19], Convolutional Neural Networks and Relation Networks were used for end-to-end learning
for disaster image retrieval and flood detection from satellite images.
2. Methodology
Flood management strategies and emergency response depend upon the type of area affected
(e.g., agricultural or urban) as well as on the flood type (e.g., fluvial, pluvial or coastal). Resilience
measures are generally deployed by governmental agencies to reduce the impact of flooding. The use
of AI to derive flood information for specific events is well documented in the scientific literature.
However, little is known about how AI could inform future global patterns of flood impact and
associated resilience needs.
The main focus of this paper is on the use of AI and more explicitly Machine Learning (ML)
applied to natural disasters involving flooding to estimate the flood type from the weather forecast,
location, days event lasted, begin/end location, begin/end latitude and longitude, injuries direct/indirect,
death direct/indirect and property and crop damage.
The proposed method uses historical information collected from 1994 to 2018 to learn the patterns
and changes in the behaviour of various parameters in flood events and to draw inferences about future events.
This paper focuses only on providing an insight on how floods behave differently in terms of damage.
Using the historic data, the models developed adapt to all the changes over time by learning from
past information and can provide high accuracy of classification. The proposed technique is highly
The flood pathways and key variables are first described, and data sourcing and ML techniques
used in this study are then explained, and finally the model evaluation metrics are provided.
Figure 1. An overall causal loop of flood pathways. Green refers to normal conditions, amber refers
to caution for a probability of flooding, and red refers to a very high risk or event of flooding.
The season, temperature, location of the area (highland/inland, coastal/urban), and rain/snowfall
can affect the levels of the sea or river and reservoirs. Usage of water by energy suppliers and human/farming
water demand can change the balance of the reservoirs and river water levels and indicate a warning
sign for flooding. The use of social media and public awareness can help tackle the risks of a flooding
event. When flooding occurs, many sectors are affected, e.g., road/railway damage, gas/water pipe
damage, power cuts, farming damage, etc. Emergency response and access to food and local amenities
are restricted. Grocery prices spike due to lack of supply and businesses are affected by physical
building damage or lack of human resources. In this loop, public awareness, emergency responses
(local/public), early release of sewerage system, and shelters can help save lives.
There are three states in the diagram in Figure 1: Green refers to normal conditions, amber refers
to caution for a probability of flooding, and red refers to a very high risk or event of flooding.
2.2. Data Collation and Preparation
One of the most important requirements for this research was a detailed historic and inclusive data
set, which was acquired from Federal Emergency Management Agency (FEMA) [20], National
Oceanic and Atmospheric Administration (NOAA) [21] and National Climatic Data Centre (NCDC)
[21]. The data used in this study covers the period of 1950 until 2018. However, the data of flooding
events is recorded from the year 1994 onwards and is inclusive of all event types, i.e., heavy snow,
Water 2019, 11, 973 4 of 16
Figure 2. Work flow summarising the analytical steps followed. “Data” includes both data collation
and extraction.
Figure 3. A sample of input data prepared for training.
2.3. Machine Learning—Model Development
AI is human intelligence demonstrated by machines and ML is an approach to achieve AI.
In this study, the focus will be on supervised ML to learn from historic data, find clustered data,
and build a classification model for future events. This type of ML works best when used in
combination with historic data (results included). For this purpose, a number of data mining tools
such as Weka [22], MATLAB [23] and Orange [24] have been deployed. The reason for using two
software packages (Weka and Orange) for this purpose is to test more ML techniques with various training
and testing dataset sizes. The data is divided into two parts. The first will be used for training and
generating the model, and the second will be used for testing and verification.
Several models were developed using different ML techniques to be able to measure and compare
their performance and accuracy and choose the best. These techniques included Random Forest (RF),
Lazy, J48 tree, Artificial Neural Network (ANN), Naïve Bayes (NB), and Logistic Regression (LR).
The class for the model in all cases was set as “event type” (Table 1), which included flash floods,
coastal floods, lakeshore floods and other kinds of floods. The independent attributes in all models
were weather forecast, location, injuries direct, injuries indirect, death direct, death indirect, property
damage ($) and crop damage ($) (Table 1). Two of the ML methods used, i.e., RF and NB, are tested in
both software packages (Weka and Orange) to ensure the accuracy of results.
2.3.1. Random Forest (RF)
RF [25] is an ensemble learning technique. It is a combination of the Bagging algorithm and the
random subspace method and deploys decision trees as the base classifier. Each tree is built from
a bootstrap sample of the original dataset. The key point is that the trees are not pruned,
allowing them to partly overfit to their own sample of the data. To diversify the classifiers further, at
every branch in the tree the decision of which feature to split on is limited to a random subset
of the full feature set. The random subset is chosen again for each branching point.
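The two sources of randomness described above, together with the final vote across trees, can be sketched in a few lines of Python. This is only an illustrative sketch and not the Weka or Orange implementation used in the study; the function names, the row identifiers and the choice of three features out of eight are assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bagging step: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, k, rng):
    """Random subspace step: pick k distinct feature indices for one split."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(labels):
    """Combine the predictions of the individual trees into one class label."""
    return Counter(labels).most_common(1)[0][0]

rng = random.Random(0)                    # fixed seed so the sketch is repeatable
rows = [f"row{i}" for i in range(10)]     # hypothetical training-row identifiers
sample = bootstrap_sample(rows, rng)      # may contain repeated rows
features = random_feature_subset(n_features=8, k=3, rng=rng)
vote = majority_vote(["Flash Flood", "Flood", "Flash Flood"])
```

A new feature subset would be drawn at every branching point of every tree, while a new bootstrap sample is drawn once per tree.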
2.3.2. Lazy
Lazy learning [25] is an ML approach where learning is delayed until testing time. The calculations
within a learning system can be divided as happening at two separate times: training and testing
(consultation). Testing time is the time between when an object is introduced to a system for an action
to be taken and the time when the action is accomplished. Training time is before testing time during
which the system takes actions from training data in preparation for testing time. Lazy learning refers
to any ML process that postpones the majority of computation to testing time. Lazy learning can
improve estimation precision by allowing a system to concentrate on deriving the best possible decision
for the exact points of the instance space for which estimations are to be made. However, lazy learning
must store the entire training set for use in classification. In contrast, eager learning need only store a
model, which may be more compact than the original data.
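A k-nearest-neighbour classifier is one well-known lazy learner and makes the trade-off above concrete. The sketch below is illustrative only: the source does not state which lazy algorithm was used, and the class name `LazyKNN`, the two-dimensional feature vectors and the labels are assumptions made for the example.

```python
import math
from collections import Counter

class LazyKNN:
    """Hypothetical k-nearest-neighbour classifier used as a lazy-learning example."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []

    def fit(self, features, labels):
        # "Training" only stores the data; no model is built at this point.
        self.samples = list(zip(features, labels))
        return self

    def predict(self, query):
        # All computation is postponed to testing time: rank the stored
        # samples by Euclidean distance to the query and vote among the k nearest.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], query))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative data, not taken from the study.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
y = ["Flood", "Flood", "Flood", "Flash Flood", "Flash Flood", "Flash Flood"]
model = LazyKNN(k=3).fit(X, y)
prediction = model.predict((4.9, 5.0))
```

Note that `fit` keeps every training sample in memory, which is the storage cost of lazy learning mentioned above.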
The MAE is the mean of the absolute value of the error per instance over all samples in the test
data. Each estimation error is the difference between the true value and the estimated value for the
sample. MAE is calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\mathrm{est},i} - y_i\right| \qquad (5)$$
where $y_i$ is the true target value for test sample $i$, $y_{\mathrm{est},i}$ is the estimated target value for test sample $i$,
and $n$ is the number of test samples.
The RMSE of a model with respect to a test data set is the square root of the mean of the squared
estimation errors over all samples in the test data. The estimation error is the difference between the
true value and the estimated value for a sample. RMSE is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{est},i} - y_i\right)^{2}} \qquad (6)$$
where $y_i$ is the true target value for test sample $i$, $y_{\mathrm{est},i}$ is the estimated target value for test sample $i$,
and $n$ is the number of test samples.
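Both metrics follow directly from their definitions in Equations (5) and (6). The sketch below uses made-up values that are not taken from the study; the function names `mae` and `rmse` are assumptions made for the example.

```python
import math

def mae(y_true, y_est):
    """Mean absolute error: average of |y_est_i - y_i| over the test samples."""
    n = len(y_true)
    return sum(abs(e - t) for t, e in zip(y_true, y_est)) / n

def rmse(y_true, y_est):
    """Root mean squared error: square root of the mean squared estimation error."""
    n = len(y_true)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(y_true, y_est)) / n)

# Illustrative values only.
y_true = [1.0, 0.0, 0.0, 1.0]
y_est = [0.5, 0.0, 0.5, 1.0]
mae_value = mae(y_true, y_est)
rmse_value = rmse(y_true, y_est)
```

Because the errors are squared before averaging, RMSE penalises large individual errors more heavily than MAE and is never smaller than MAE on the same data.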
3. Model Training and Testing Results
The original data consisted of 126,315 samples. After removing the outliers and filtering using the
ADAT application, 69,558 instances were retained for learning. The data was then
divided into two parts: a larger section (data from 1994 to 2017) for training purposes and a smaller
section (data from 2018) for testing purposes. A scatter plot of the training data for event type can
be seen in Figure 4.
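The year-based split described above can be sketched as follows. The record layout (a dictionary with `year` and `event_type` keys) and the function name `split_by_year` are assumptions made for illustration; the actual column names in the prepared data set are not given in this section.

```python
def split_by_year(records, first_year=1994, test_year=2018):
    """Keep first_year..test_year-1 for training and test_year for testing."""
    train = [r for r in records if first_year <= r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test

# Illustrative records only; not taken from the study's data set.
records = [
    {"year": 1994, "event_type": "Flash Flood"},
    {"year": 2005, "event_type": "Coastal Flood"},
    {"year": 2017, "event_type": "Flood"},
    {"year": 2018, "event_type": "Lakeshore Flood"},
    {"year": 2018, "event_type": "Flash Flood"},
]
train, test = split_by_year(records)
```

Splitting by time rather than at random means that the model is always evaluated on events that occurred after everything it was trained on.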
Figure 5. Visual overview of the model training and testing process in Orange.
The results of the models and their performance are discussed below.
Based on the confusion matrix, the RF model using Orange software correctly classified 7 out of 9
instances as Lakeshore Flood, 49 out of 53 as Flood, 32 out of 58 as Flash Flood and 44 out of 44 as
Coastal Flood. The total number of correctly classified instances was 132 (80.49%). According to the
proportion of correct classifications on the test data, the RF was ahead of all other techniques. Figure 6
shows the evaluation results and confusion matrix for the RF model based on the supplied test set.
Based on the confusion matrix, the RF model using Weka software correctly classified 1850 out of 2104 as Flash
Flood, 820 out of 1266 as Flood, 100 out of 100 as Coastal Flood and six out of eight instances as
Lakeshore Flood. The total number of correctly classified instances is 2776 (79.83%). The MAE is
0.13 and RMSE is 0.27. The RF technique provides the best results compared to the other techniques tested
in Weka. Figure 7 shows a visual classifier error plot for the RF model. The diagram shows the
distribution of classified instances in coloured clusters, where the bigger clusters (shown in
crosses) are the correctly classified instances and the smaller clusters (shown with small squares) are
the misclassified instances.
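The overall accuracy reported for the RF model in Orange can be recomputed from the per-class counts quoted above. The sketch below uses only those four pairs of numbers; the dictionary layout and function names are assumptions made for the example.

```python
# (correctly classified, total instances) per class, as reported for RF in Orange.
rf_orange = {
    "Lakeshore Flood": (7, 9),
    "Flood": (49, 53),
    "Flash Flood": (32, 58),
    "Coastal Flood": (44, 44),
}

def overall_accuracy(per_class):
    """Sum correct and total counts over all classes and return their ratio."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    return correct, total, correct / total

def per_class_recall(per_class):
    """Fraction of each class's instances that were classified correctly."""
    return {name: c / t for name, (c, t) in per_class.items()}

correct, total, accuracy = overall_accuracy(rf_orange)
recall = per_class_recall(rf_orange)
print(f"{correct}/{total} = {accuracy:.2%}")  # 132/164 = 80.49%
```

The per-class view shows that most of the errors come from the Flash Flood class (32 of 58 correct), while every Coastal Flood instance is classified correctly.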
Figure 6. Evaluation Results for Random Forest model in Weka.
Figure 9. Evaluation Results for J48 model in Weka.
Based on the confusion matrix (Figure 10) for the ANN model (Orange), the correct
classifications are 7 out of 9 as Lakeshore Flood, 49 out of 53 as Flood, 27 out of 58 as Flash
Flood and 44 out of 44 as Coastal Flood. The total of correctly classified instances is 127
(a 77.44% rate of success).
Figure 10. Evaluation Results for Neural Network model in Orange.
The NB model using Orange software correctly classified nine out of nine instances of Lakeshore
Flood, 45 out of 53 as Flood, 27 out of 58 as Flash Flood and 44 out of 44 as Coastal Flood according to
the resulting confusion matrix (Figure 10). The total of correctly classified instances was 125 (76.22%).
The proportion of instances confused between Flash Flood and Flood is slightly higher than
for the ANN, based on the 164 instances considered for validation. Correspondingly, the confusion matrix
showed that the NB model built using Weka software (Figure 11) correctly classified 1614 out of 2104 as Flash
Flood, 853 out of 1266 as Flood, 89 out of 100 as Coastal Flood and seven out of eight instances as
Lakeshore Flood. The total number of correctly classified instances was 2563 (73.69%). The MAE was
0.18 and RMSE was 0.29. This outcome indicates that the NB has a very high classification accuracy
when trained on a small or a larger data set, as both software packages have produced similar results when
trained and tested on both large and small data sets.
Figure 11. Evaluation Results for Naïve Bayes model in Weka.
The LR model (Orange) is disregarded, as it correctly classified only 27.44% of the
instances (Figure 12).
Figure 12. Evaluation Results for Logistic Regression model in Orange.
Table 2 shows the predictive models’ performance using the MAE and RMSE evaluation metrics.
Table 2. Summary of evaluation metrics of all classification models’ performance in Orange and Weka.
Model | RMSE | MAE | % of correctly classified instances
4. Discussion
4.2.3. Resilience
Bringing human knowledge and AI together is an important way to build resilience. The advantage
of this research is to help comprehend, prioritise, and respond to the potential impact of flooding based
on flood types and protect the community and environment. Flood type classification based on weather
forecast will allow key decision-makers such as local councils and emergency response agencies to
take action and put in place mitigation measures to decrease the potential impact of an oncoming event.
A better understanding of flood types also supports long-term strategic planning, in which investment
is prioritised according to flood type risk and consequence to reduce the impact on lives,
infrastructure, finances, etc. Solutions that are resilient to a variety of flood types can be
developed, mitigation measures can be implemented, and locations at higher risk can be prioritised and
kept under surveillance leading up to an anticipated flood occurrence.
5. Conclusions
This paper describes a robust evaluation of state-of-the-art ML techniques to classify flood type
based on weather forecast, location, days event lasted, begin/end location (name of the place), begin/end
latitude and longitude, injuries direct/indirect, death direct/indirect and property and crop damage.
This study is the first to apply ML to historic data for flood type classification.
Extensive historic data has been filtered and used for training and testing
purposes. Several models were built and compared using evaluation metrics, i.e., RMSE, MAE and
confusion matrix. The comparison of the evaluation metrics from the models built suggests that the RF
technique outperforms other techniques in terms of RMSE, MAE and confusion matrix (accuracy rate
of 80.49%), followed by ANN (accuracy rate of 77.44%). One of the benefits of this work is that the
same tools and techniques can be used to classify and estimate many other parameters that are
present in the current training set, e.g., location, potential financial damage, etc.
This study has focused on flooding as a sub-branch of natural disasters. Nevertheless, there are
many other possibilities to apply AI in natural disasters and help build resilience. Data mining can be
applied to help insurance companies, estimate level of compensations, estimate damage to crops and
buildings, and estimate number of injuries and death in specific areas. By being able to estimate more
accurately and learn from past events, many lessons can be learned and applied in building resilience
against natural disasters. This will also improve public awareness and preparedness, and save lives,
if faced with an adverse natural condition.
A constraint in this study is restrictive access to inclusive data. Most of the big data sets are in the
hands of private companies, and there are no principles for data sharing. Accessing and collecting
these data is difficult and expensive.
The results from this study show for the first time that ML can be used to analyze datasets from
historical disaster events to reveal the most likely events that could occur, should similar events be
experienced in the future. From the literature review, to the best of the authors’ knowledge there is
no equivalent set of data as the NCDC NOAA data from a UK source. It would be advantageous for
the UK environmental agency to provide detailed historic data from past natural disasters similar
to the NCDC NOAA data. This work has demonstrated the application of the ML concept; if such data is
made available from the UK, this ML method can be applied, and more advances can be made within
the UK not only for flooding, but for any type of natural disaster (based on provided data type) to help
preparedness, raise awareness and build resilience in disaster management, especially in areas more
prone to natural disasters.
The results are highly dependent on data quality and precision. If the data is not reliable or
is “bad” data, the ML is trained on wrong information and therefore the results will be completely
misleading. Missing information or parameter limitation can also adversely affect the model built.
For further study, building a predictive model for future events will be considered. Furthermore,
extending the use of AI to other types of natural disaster and to improving resilience in disaster
management, especially in the UK, is strongly suggested.
Author Contributions: Conceptualization, S.S. and R.K.; Data curation, S.S. and D.J.; Formal analysis, S.S. and D.J.;
Funding acquisition, R.K.; Investigation, S.S.; Methodology, S.S.; Project administration, R.K. and G.F.; Resources,
S.S.; Software, S.S.; Supervision, R.K.; Validation, S.S. and D.J.; Visualization, S.S.; Writing–original draft, S.S. and
D.J.; Writing–review & editing, S.S., R.K., M.R.C., G.F. and F.M.
Funding: This research was funded by the EPSRC through the BRIM (Building Resilience Into Risk Management)
project, Ref: EP/N010329/1.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Arslan, M.; Roxin, A.; Cruz, C.; Ginhac, D. A review on applications of big data for disaster management.
In Proceedings of the 2017 13th International Conference on Signal-Image Technology & Internet-Based
Systems (SITIS), Jaipur, India, 4–7 December 2017; pp. 370–375. [CrossRef]
2. Sande, C.J.V.D.; Jong, S.M.D.; Roo, A.P.J.D. A segmentation and classification approach of IKONOS-2 imagery
for land cover mapping to assist flood risk and flood damage assessment. Int. J. Appl. Earth Obs. Geoinf.
2003, 4, 217–229. [CrossRef]
3. Wang, Y.; Chen, A.; Fu, G.; Djordjevic, S.; Zhang, C.; Savic, D. An integrated framework for high-resolution
urban flood modelling considering multiple information sources and urban features. Environ. Modell. Softw.
2018, 107, 85–95. [CrossRef]
4. Mathioudakis, M.; Koudas, N. Twitter monitor: Trend detection over the twitter stream. In Proceedings of
the 2010 ACM SIGMOD International Conference on Management of Data, ACM, Indianapolis, IN, USA,
6–10 June 2010; pp. 1155–1158.
5. Smith, L.; Liang, Q.; James, P.; Lin, W. Assessing the utility of social media as a data source for flood risk
management using a real-time modelling framework. J. Flood Risk Manag. 2017, 10, 370–380. [CrossRef]
6. Sangeetha, S.; Jayakumar, D. Flash flood forecasting using different artificial intelligence method. Int. J. Eng.
Trends Technol. 2018, 3, 140–144. [CrossRef]
7. Robert, P.; Bella, R.; John, C.; Mark, C. Emergency Situation Awareness: Twitter Case Studies. Available
online: https://link.springer.com/chapter/10.1007/978-3-319-11818-5_19 (accessed on 9 May 2019).
8. Lopez-Fuentes, L.; Weijer, J.; González-Hidalgo, M.; Skinnemoen, H.; Bagdanov, D.A. Review on computer
vision techniques in emergency situations. Multimed. Tools Appl. 2017, 77, 17069–17107. [CrossRef]
9. Lai, C.L.; Yang, J.C.; Chen, Y.H. A real time video processing-based surveillance system for early fire and flood
detection. In Proceedings of the Instrumentation and Measurement Technology Conference Proceedings,
Warsaw, Poland, 1–3 May 2007; pp. 1–6.
10. Liu, F.; Xu, F.; Yang, S. A flood forecasting model based on deep learning algorithm via integrating stacked
autoencoders with BP neural network. In Proceedings of the IEEE International Conference on Multimedia
Big Data, Laguna Hills, CA, USA, 19–21 April 2017; pp. 58–61.
11. Martinis, S. Automatic near real-time flood detection in high resolution X-band synthetic aperture radar
satellite data using context-based classification on irregular graphs. Ph.D. Thesis, Electronic Theses of LMU
Munich, Munich, Germany, 2010.
12. Mason, D.C.; Davenport, I.J.; Neal, J.C.; Schumann, G.J.P.; Bates, P.D. Near real-time flood detection in urban
and rural areas using high-resolution synthetic aperture radar images. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3041–3052.
[CrossRef]
13. Sayers, W.; Savić, D.; Kapelan, A.; Kellagher, R. Artificial intelligence techniques for flood risk management
in urban environments. Proc. Eng. 2014, 70, 1505–1512. [CrossRef]
14. Wu, Z.Y.; Liu, W.; Xu, J.; Feng, S.; Palaiahnakote, T.L. Context-aware attention lstm network for flood
prediction. In Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing,
China, 20–24 August 2018; pp. 1301–1306. [CrossRef]
15. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan
Kaufmann: San Francisco, CA, USA, 2011; p. 191.
16. Mason, D.C.; Speck, R.; Devereux, B.; Schumann, G.J.P.; Neal, J.C.; Bates, P.D. Flood detection in urban areas
using TerraSAR-X. IEEE Trans. Geosci. Remote Sens. 2010, 48, 882–894. [CrossRef]
17. Di Martino, G.; Iodice, A.; Riccio, D.; Ruello, G. A novel approach for disaster monitoring: Fractal models
and tools. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1559–1570. [CrossRef]
18. Keiller, N.; Samuel, G.F.; Ícaro, C.D.; de Rafael, O.W.; Javier, A.V.M.; Otávio, A.B.P.; Rodrigo, T.C.; Lin, T.L.;
dos Jefersson, A.S.; da Ricardo, S.T. Data-driven flood detection using neural networks. Available online:
http://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_39.pdf (accessed on 9 May 2019).
19. Lopez-Fuentes, L.; van de Weijer, J.; Bolanos, M.; Skinnemoen, H. Multi-modal deep learning approach for
flood detection. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017.
20. Federal Emergency Management Agency (FEMA), Headquarters: Washington D.C., US. Available online:
https://www.fema.gov/data-visualization (accessed on 8 May 2019).
21. National Climatic Data Center (NCDC NOAA), Asheville, North Carolina, US. Available online:
https://www.ncdc.noaa.gov/stormevents/ftp.jsp (accessed on 8 May 2019).
22. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software:
An update. SIGKDD Explor. 2009, 11, 10–18. [CrossRef]
23. The MathWorks, Inc. MATLAB and Statistics Toolbox Release 2012b; The MathWorks, Inc.: Natick, MA, USA, 2012.
24. Demsar, J.; Curk, T.; Erjavec, A.; Gorup, C.; Hocevar, T.; Milutinovic, M.; Mozina, M.; Polajnar, M.; Toplak, M.;
Staric, A.; et al. Orange: Data mining toolbox in python. J. Mach. Learn. Res. 2014, 14, 2349–2353.
25. Sammut, C.; Webb, G.I. Encyclopedia of Machine Learning and Data Mining; Springer: Boston, MA, USA, 2011.
26. Harris, D.; Harris, S. Digital Design and Computer Architecture, 2nd ed.; Elsevier: San Francisco, CA, USA,
2010; p. 129. ISBN 978-0-12-394424-5.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).