Keywords: Risk assessment; Dynamic risk analysis; Machine learning; Deep learning

Risk assessment has a primary role in safety-critical industries. However, it faces a series of overall challenges, partially related to technology advancements and increasing needs. There is currently a call for continuous risk assessment, improvement in learning past lessons and definition of techniques to process relevant data, which are to be coupled with adequate capability to deal with unexpected events and provide the right support to enable risk management. Through this work, we suggest a risk assessment approach based on machine learning. In particular, a deep neural network (DNN) model is developed and tested for a drive-off scenario involving an Oil & Gas drilling rig. Results show reasonable accuracy for DNN predictions and general suitability to (partially) overcome risk assessment challenges. Nevertheless, intrinsic model limitations should be taken into account and appropriate model selection and customization should be carefully carried out to deliver appropriate support for safety-related decision-making.
N. Paltrinieri, et al. Safety Science 118 (2019) 475–486
Yet, as the technologies of sensors and timing advanced in managing high-risk operations, the focus on organizational control shifted to highly sensitive programs of computational management of machines that combined multiple measures of performance to provide more consistently reliable management of technical operations in changing risk environments (Nobre, 2009). The risk remained, but the management practice and technologies changed.
Villa et al. (2016a, 2016b) demonstrate how different risk definitions may affect the approach taken for risk assessment and management. Villa et al. (2016a, 2016b) also remind us that, while quantitative risk assessment (QRA) is required by law in several industrial sectors, it is performed mainly during the design phase. For this reason, it only describes a static risk picture of the system (Pasman and Reniers, 2014). The issue of realistically evaluating a given scenario s is also addressed by Apostolakis (2004) and Creedy (2011). They question the probabilities and frequencies used in quantitative risk analysis, affirming that they are retrieved from outdated databases and may not fit the studied system. They also affirm that probability calculation is heavily affected by scarcity of data. Landucci et al. (2016a, 2016b) demonstrate how the impact of an unwanted event is influenced by a series of dynamic variables, which are not always considered for its prediction. Moreover, if we want to assess the overall risk covering all the possible scenarios s_i, i = 1, …, N, how do we know that we are not missing anything and N = N_max? We cannot be sure that we will be free from “atypical” scenarios, as theorized by Paltrinieri et al. (2013, 2012a); that is, scenarios that are not captured by standard hazard identification techniques because they deviate from normal expectations of unwanted events or worst-case scenarios.
This study proposes a solution to the main challenges of risk assessment based on the application of machine learning techniques. While the following section introduces the additional risk dimension of knowledge and summarizes the state of the art of the main challenges of industrial risk assessment, section 3 describes indicator-based approaches and a representative case study from the offshore Oil & Gas industry. Machine learning and Deep Neural Networks (DNN) are suggested as a possible solution and applied to the case study in section 4. Section 5 illustrates application results, section 6 discusses benefits and limitations of machine learning for risk assessment, and section 7 provides some conclusions.

2. Risk knowledge

Aven and Krohn (2014) suggest including a new dimension in the definition of risk (R): knowledge (k):

R = f(s, p, c, k)    (2)

Fig. 1a shows how a two-dimensional risk matrix depicts formula 1. A traffic-light colour code represents acceptable (green), unacceptable (red) or intermediate (yellow and orange) risk. The application of the additional knowledge dimension (formula 2) would bend the matrix as depicted in Fig. 1b. Expressing the level of knowledge used for risk assessment is an intrinsic feature of the calculated value of risk. This implies the definition of a condition of unacceptable knowledge, which may be represented by the space under the matrix in Fig. 1b. We can tolerate having relatively little knowledge of scenarios with both low probability and low consequence. For this reason, the matrix is bent towards its minimum values in this area. The matrix reaches its peak where red is more intense and probability and consequence have their highest values. This represents the need for thorough knowledge of scenarios falling in this area.
Formula 2 gives important insight into how we should treat risk assessment results and supports the continuous improvement of the analysis: we become aware that uncertainty is always a companion and that we should cope with it (De Marchi and Ravetz, 1999). For this reason, we adopt this formula as the basis for this study among numerous definitions of risk (Aven, 2012). However, another question emerges: how can we consider knowledge in quantitative risk assessment? In addition, even if we can assess risk with all the knowledge available, we would provide a risk picture that is “frozen” in time, while the system is changing around it. The conditions considered on day zero may not be valid anymore on day one. For this reason, we also need to address how to consider system evolutions. Calibration and correction based on new evidence would possibly allow risk analysis to consider evolving conditions and improve system knowledge. Such a dynamic approach to risk management is theorized and reviewed by a number of studies (Khan et al., 2016; Paltrinieri et al., 2014; Paltrinieri and Khan, 2016a,b; Villa et al., 2016a).
Underlying the dynamic approach to risk management is the concept of “initial conditions” that set the trajectory for evolving system performance (Kaufmann, 1993; Prigogine and Stengers, 1984). Initial conditions represent the existing state of an organization at risk, prior to a specific hazardous event. They include the basic resources available for learning and action, as well as the current operating context of the organization. These conditions shape the possible courses of action for coordinated response to an actual event (Comfort, 2019, 1999). Given the distinctive set of initial conditions, an organization engages in an evolving learning process that reflects its practical response to risk and its interaction with other organizations and conditions, and produces the next (temporary) state of operations. The set of interactive responses by organizations with the environment, repeated over time, constitutes a dynamic response system as it adapts to risk.
Fig. 1. (a) Two-dimensional risk matrix according to formula (1); (b) three-dimensional risk matrix according to formula (2).
Fig. 2. (a) Dynamic Risk Management Framework (DRMF - clockwise), adapted from Paltrinieri et al. (2014); (b) DRMF revolving around the knowledge dimension,
adapted from Villa et al. (2016a).
Fig. 2a represents the Dynamic Risk Management Framework (DRMF) defined by Paltrinieri et al. (2014). DRMF focuses on continuous systematization of information on new risk evidence. Its shape is open to the outside to avoid vicious circles and self-sustained processes. It opens the process to new information, early warnings and unwanted events by means of continuous monitoring. Such information is an input (through communication or consultation) to each of the four steps of risk management. There is no end to the process, but iteration, in order to keep track of changes and elaborate them for improved management. Such iteration is in accordance with the revised definition of risk in formula 2, as shown by the three-dimensional representation of DRMF (Fig. 2b) revolving around the dimension of knowledge to escape the aforementioned space of unacceptability.
Epistemic limitations and continuous modifications of the world around us lead to an obvious conclusion: there will always be something that we cannot capture while assessing risk. Within the space of unacceptable knowledge we may encounter Unknown unknown events (as defined in Table 1), or Black Swans. Taleb (2007) defines such events as those that can be explained only after the fact and cannot be anticipated. Our best chance to lower risk is being aware that there are scenarios that we do not know (in part or at all, i.e. Known unknowns in Table 1) and implementing DRMF. This represents a way out from unacceptable knowledge towards Known knowns (Table 1). Nevertheless, knowledge may be disregarded or simply forgotten, retracing the spiral in Fig. 2b backwards and falling into Unknown knowns (Table 1). This underlines the fact that the main challenge is effectively capitalizing on the accumulated knowledge and avoiding its oblivion.
Nowadays, emerging cyber-physical systems within industry present a significant opportunity to implement DRMF. Such systems embed internet of things solutions and wireless sensor networks, allowing for collection of data records in all phases of product lifecycle (Lasi et al., 2014; Wang et al., 2016). Lasi et al. (2014) state that the increasing digitalization in industry is resulting in the registration of an increasing amount of actor- and sensor-data which can support functions of dynamic risk analysis, as opposed to traditional risk analysis incapable of reflecting evolving real-world risk (Paltrinieri and Khan, 2016a,b; Yang et al., 2017). However, increasing complexity creates uncertainty about technological capabilities and adequate strategies to apply them (Schumacher et al., 2016). For this reason, the transformation of risk models should result in handy software tools to enable DRMF application in practice.

2.1. State of the art and overall challenges

A number of approaches address the need for continuous update of risk assessment and may be grouped in two macro groups: empirical and theoretical. First-group approaches are generally developed by observing a large amount of relevant data. Sparse data, by contrast, would lead to relying on theory-based approaches, given some inevitable assumptions. Fig. 3 depicts an overall simplification of the state of the art of risk assessment and the ideal risk assessment approach on a models/data graph.
Representative examples of approaches from industrial applications rely on simplified (empirical) models and a large amount of data, as reported in the following.

• Popular software for bowtie analysis allows for real-time monitoring of safety measures performance (CGE Risk Management Solutions B.V., 2016). End-users from Oil & Gas, aviation, mining, maritime industries and healthcare may use such software to support risk management.
• Attempts to better monitor safety measures were carried out by the Norwegian Oil & Gas industry (Etterlid and Etterlid, 2013; Hansen, 2015; Statoil, 2013). However, they only rely on monitoring and several of them have been suspended during the 2014–2015 oil crisis.
• Preliminary methodologies developed in collaboration with industry can also be found in literature (Risk Barometer (Hauge et al., 2015)
Table 1
Definitions of Known/Unknown events (Paltrinieri et al., 2012a).
Unknown unknowns: Events we are not aware that we do not know, whose risk cannot be managed
Known unknowns: Events we are aware that we do not know, for which we employ both prevention and learning capabilities
Unknown knowns: Events we are not aware that we already know, or used to know, with certain confidence
Known knowns: Events we are aware that we know, whose risk we can manage with a certain level of confidence
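
As an illustration of Table 1, the four categories can be read as the combination of two yes/no questions: are we aware of the event, and do we actually hold knowledge of it? The following Python sketch encodes that reading. The names EventCategory and categorize are hypothetical and are introduced here only for illustration.

```python
from enum import Enum


class EventCategory(Enum):
    """The four Known/Unknown categories of Table 1."""
    UNKNOWN_UNKNOWN = "Unknown unknown"
    KNOWN_UNKNOWN = "Known unknown"
    UNKNOWN_KNOWN = "Unknown known"
    KNOWN_KNOWN = "Known known"


def categorize(aware: bool, known: bool) -> EventCategory:
    """Map awareness (first word) and knowledge (second word) to a category."""
    if aware and known:
        return EventCategory.KNOWN_KNOWN
    if aware and not known:
        return EventCategory.KNOWN_UNKNOWN
    if not aware and known:
        # Knowledge exists, or used to exist, but has been forgotten.
        return EventCategory.UNKNOWN_KNOWN
    return EventCategory.UNKNOWN_UNKNOWN


# A forgotten lesson: the knowledge exists, but nobody is aware of it.
print(categorize(aware=False, known=True).value)
```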
Risk assessment models with relatively strong theoretical bases may be found in literature and mainly aim at dynamic risk assessment. Nevertheless, lack of data from real cases has led to large sets of assumptions and simulations for their development. Representative examples are reported in the following.

• Khakzad has extensively worked on the application of Bayesian networks to dynamic risk assessment problems in the chemical process industry (Khakzad, 2015; Khakzad et al., 2014, 2013a).
• Several contributions to dynamic risk assessment by means of the Monte Carlo method can be found in literature (Noh et al., 2014; Targoutzidis, 2012). Such contributions are either applied (Noh et al., 2014) or address new findings on a purely methodological level (Durga Rao et al., 2009; Targoutzidis, 2012).
• The Petri nets method is also used to improve risk assessment and capture dynamic sequences (Nivolianitou et al., 2004; Nývlt et al., 2015; Nývlt and Rausand, 2012; Zhou et al., 2017; Zhou and Reniers, 2017, 2016a, 2016b).

Improving risk assessment would mean iteratively learning from this experience and providing an ideal approach that relies on both Big Data and theoretical models. A first effort to move from the ellipses in the Models/Data graph (Fig. 3) is provided by Paltrinieri and Khan (2016a,b). However, we can summarize five main methodological challenges to be addressed in such a journey towards “ideal” risk assessment:

Paté-Cornell (2012) and Haugen and Vinnem (2015) warn against the misuse of the Black Swan concept. This should not be a reason for ignoring potential scenarios or waiting until a disaster happens to take safety measures and issue regulations against a predictable situation. On the contrary, it should represent an incentive to continuously learn and improve (as suggested by Fig. 2b). What can we do against what we do not know? Sornette (2009) provides an answer to such concern by applying a geophysical model (Musgrave, 2013) to the prediction of earthquakes. He saw that some degrees of organization and coordination could serve to amplify small fractures, always present and forming in the tectonic plates. Organization and coordination may turn small causes into large effects, i.e. large earthquakes characterized by low probability. Paltrinieri and Khan (2016a,b) are in line with this, claiming that extreme accidents may be described as a particular combination of single events, some of which may be considered as “Small Things”, e.g. apparently meaningless technical malfunction or human distraction. Acting on Small Things would allow breaking the chain of events leading to an accident and lowering its probability.
A number of approaches are used to describe accident sequences and understand how to stop them. Some of the best known and most used in industry are logic trees such as fault tree, event tree and bow-tie diagram (Center for Chemical Process Safety, 2000). An example from the offshore Oil & Gas industry is shown in Fig. 4 and further described in Section 3.1. Logic trees are used to evaluate risk on a probabilistic basis. The concept of “safety barriers” is used to model and include prevention and/or mitigation measures. The Norwegian oil & gas sector (Petroleum Safety Authority, 2013) commonly uses a specific hierarchical structure
Fig. 4. Event tree describing a drive-off scenario of an Oil & Gas drilling rig. EDS stands for Emergency Disconnect System. Adapted from Paltrinieri et al. (2016b).
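
To illustrate how an event tree such as the one in Fig. 4 supports a probabilistic evaluation, the following Python sketch multiplies an initiating-event frequency by the failure probabilities of the safety barriers along one branch. It is a minimal sketch that assumes independent barriers. The function name, the two barriers chosen and all numerical values are hypothetical and are not taken from the case study.

```python
from math import prod


def outcome_frequency(initiating_frequency, barrier_failure_probs):
    """Frequency (events/year) of the outcome reached when every barrier fails.

    Assumes the barriers fail independently of each other.
    """
    return initiating_frequency * prod(barrier_failure_probs)


# Hypothetical values, for illustration only.
drive_off_frequency = 1e-2          # initiating events per year
barrier_failure_probs = {
    "stop drive-off": 0.1,          # probability that the barrier fails on demand
    "EDS": 0.05,
}

damage_frequency = outcome_frequency(
    drive_off_frequency, barrier_failure_probs.values()
)
print(f"Outcome frequency: {damage_frequency:.1e} events/year")
```

Updating a barrier failure probability from new indicator readings and recomputing the product is the simplest way in which such a tree can reflect changing conditions.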
Fig. 6. Simulated frequency trend of wellhead damage (events/year, logarithmic scale) and examples of simulated indicator trends associated to the barrier systems in Fig. 5, according to Matteini’s (2015) study.
layers to learn representations of data with multiple levels of abstraction.
A computer may be trained to assess risk for safety-critical industries such as Oil & Gas through deep learning techniques (Fig. 7). This would allow processing a large amount of information in the form of indicators from normal operations and past unwanted events (from mishaps to major accidents), which would be used for training. Due to the subjectivity of risk definition (as discussed in section 2), risk level cannot be assigned to each event with certainty and expert supervision is needed. Deep learning allows for this supervised learning (Goodfellow et al., 2016). Once the model has learned risk categorization, it uses its knowledge to evaluate real-time risk from the state of the monitored system, e.g. an offshore Oil & Gas platform.

4.1. Deep neural network

The deep learning model considered in this work is a feed-forward neural network, wherein connections between the units do not form a cycle (Svozil et al., 1997). The model was chosen due to its similarity with the hierarchical structure used to aggregate indicator information (Fig. 5). A linear model, such as a linear regression, would be restricted to linear functions, while a DNN model describes the target as a non-linear function of the input features (Goodfellow et al., 2016). The DNN model can be described as a series of functional transformations
associated to the model layers (Fig. 7). The overall length of the chain gives the depth of the model. The name “deep learning” derives from this (Goodfellow et al., 2016). Specifically, the first network layer performs the following computation of the inputs x_1, …, x_p, which, in this case, are performance indicators:

a_i = b_i + \sum_{j=1}^{p} x_j w_{i,j}    (3)

with i = 1, …, m, where a_i, b_i and w_{i,j} are respectively defined as activation, bias and model weight.
The activations are transformed by the activation function g within the hidden layer:

z_i = g(a_i)    (4)

where z_i is defined as hidden unit. The most used activation function is the sigmoid (Goodfellow et al., 2016). Fig. 7 shows only one hidden layer for the sake of simplicity, but there can be several.
The hidden units are combined to give the activations a_o of the output layer:

a_o = b_o + \sum_{j=1}^{m} z_j w_{o,j}    (5)

where a_o, b_o and w_{o,j} are activation, bias and model weight. Fig. 7 shows only one output for the sake of simplicity, but there can be several.
Finally, the activation function h is used to obtain the output y, which, in this case, is an index for risk R:

y = h(a_o) \sim R    (6)

Given a dataset of x_i and associated y, the model can be trained to minimize the final loss function in a supervised way (Goodfellow et al., 2016), in order to predict y based on new inputs x_i.

4.2. Model application

Matteini (2015) has simulated the trend of 50 different indicator categories over 30 years (Fig. 6) to assess the performance of the safety barriers involved in a drive-off scenario (Fig. 4). Indicator readings are assumed every 6 weeks for a total of 240 values per indicator category. As already mentioned, aggregation of these indicators through relatively complex barrier hierarchical structures and event tree analysis allowed assessing the wellhead damage frequency over time (Fig. 6). Trend definition is particularly important in terms of decision-making support, because it allows the operator to understand whether the system is improving or worsening in terms of risk. For this reason, the study focuses on the prediction of risk increase given the indicator trends.
Since the simulated wellhead damage frequency Freq is an expression of the scenario probability p and, in turn, of the risk R for constant scenario s and consequence c, we can state that:

\frac{dFreq}{dt} \sim \frac{dR}{dt}    (7)

For this reason, Freq was transformed into its derivative with respect to time t, and labels indicating its increase or decrease were added within the database (Table 2).

Table 2
Definition of the output used as risk index to predict by means of the DNN model. Freq = wellhead damage frequency value.
dFreq/dt    Output (risk index)
≥0          Risk increase
<0          Risk decrease

The simulated indicator values Ind_i, for i = 1, …, 50, were also transformed into their derivative with respect to time t, in order to define the inputs x_i to the DNN model:

x_i = \frac{dInd_i}{dt}

Two datasets were created from the overall database:

- Training dataset used to train the DNN model, with about 2/3 of the x_i and associated y values (160), and
- Test dataset used to test the DNN model, with about 1/3 of the x_i and associated y values (79).

Python code was written for training and testing. The classifier tf.contrib.learn.DNNClassifier from the open-source library TensorFlow (Google LLC, 2018) was used for the DNN model. The DNN model structure (i.e. number of layers and nodes) was inspired by Cheng et al. (2016). Moreover, a multiple linear regression (MLR) model was applied to the same datasets, to provide a term of comparison and evaluate the ability of the DNN model to predict risk increase.

5. Results

Fig. 8 shows the derivative of risk over time for constant scenario s and consequence c (Eq. (7)) within the considered dataset. For about the first 40 year quarters, the risk value is relatively constant as its derivative oscillates around “0”. Risk variations can be described by the variation of frequency of wellhead damage in Fig. 6, but they are not sudden enough to produce high derivative values. It should be remembered that the frequency of wellhead damage in Fig. 6 is plotted on a logarithmic scale and does not appropriately show the sharp variations of wellhead damage frequency occurring from year quarter 80, which are nonetheless represented by the risk derivative in Fig. 8.
Fig. 9 shows the results of the risk increase prediction tests by the DNN and MLR models. The following outcomes are considered:

• true positive (tp), as correct prediction of risk increase;
• false positive (fp), as incorrect prediction of risk increase;
• true negative (tn), as correct prediction of risk decrease; and
• false negative (fn), as incorrect prediction of risk decrease.

The DNN model has produced fewer false positives and more false negatives than the MLR model. Fig. 10 shows the incorrect predictions over the simulated time. The errors are well distributed along the trend and do not show a specific pattern.
Such results may also be combined to define more representative metrics, as reported in Table 3.
Fewer false positives by the DNN model resulted in higher precision and slightly higher accuracy. However, the higher number of false negatives affected the recall, which is relatively lower than that of the MLR model.
The results were also evaluated considering a set of tolerance values for the risk derivative. Outcomes obtained for absolute risk derivative lower than specific tolerance values were omitted. Tables 4 and 5 show how the DNN and MLR outcomes, respectively, gradually change from the baseline case (null tolerance value) to a tolerance value equal to 0.001, where only 14% of the predictions are considered (the highest peaks in Fig. 8) and no errors are made.

Fig. 8. Derivative of risk over time for constant scenario s and consequence c (Eq. (7)).

Fig. 9. Test results: number of true positives, false positives, true negatives, and false negatives.

Table 4
DNN outcomes for specific tolerance values.
Tolerance   Considered predictions (%)   Tp   Tn   Fp   Fn
0           100                          37   29   7    6
0.0001      81                           33   24   3    4
0.0002      61                           25   19   2    2
0.0003      42                           21   10   0    2
0.0004      34                           17   9    0    1
0.0005      33                           17   9    0    0
0.0006      28                           13   9    0    0
0.0007      24                           12   7    0    0
0.0008      23                           11   7    0    0
0.0009      20                           9    7    0    0
0.001       14                           6    5    0    0
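
The layer computations of Section 4.1 (Eqs. (3)–(6)) can be sketched in a few lines of plain Python for a network with a single hidden layer. This is an illustration only and not the code used in the study: the function names are hypothetical, the weights are arbitrary, and the choice of the sigmoid for both g and h is an assumption made for the example.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, used here for both g and h (an assumption)."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_hidden, b_hidden, w_out, b_out, g=sigmoid, h=sigmoid):
    """Forward pass of a feed-forward network with one hidden layer.

    x        : list of p inputs (indicator derivatives)
    w_hidden : m rows of p weights, w_hidden[i][j] = w_{i,j}
    b_hidden : m hidden biases b_i
    w_out    : m output weights w_{o,j}
    b_out    : output bias b_o
    """
    # Eq. (3): a_i = b_i + sum_j x_j * w_{i,j}
    a = [
        b_i + sum(x_j * w_ij for x_j, w_ij in zip(x, w_i))
        for w_i, b_i in zip(w_hidden, b_hidden)
    ]
    # Eq. (4): z_i = g(a_i)
    z = [g(a_i) for a_i in a]
    # Eq. (5): a_o = b_o + sum_j z_j * w_{o,j}
    a_o = b_out + sum(z_j * w_j for z_j, w_j in zip(z, w_out))
    # Eq. (6): y = h(a_o), read as an index for risk
    return h(a_o)


# Arbitrary example with p = 2 inputs and m = 2 hidden units.
y = forward(
    x=[0.2, -0.1],
    w_hidden=[[1.0, 2.0], [3.0, 4.0]],
    b_hidden=[0.0, 0.0],
    w_out=[1.0, -1.0],
    b_out=0.0,
)
print(f"risk index y = {y:.3f}")
```

Training then amounts to adjusting the weights and biases so that y matches the labelled outputs, which is the part delegated to the TensorFlow classifier in the study.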
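
The outcome counts in Tables 4 and 5 can be turned into accuracy, precision and recall as sketched below. The sketch assumes the standard definitions of these metrics, which may differ in notation from those reported in Table 3. The function name is hypothetical and only a few rows of the two tables are reproduced.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# (Tp, Tn, Fp, Fn) at selected tolerance values, copied from Tables 4 and 5.
dnn_rows = {0.0: (37, 29, 7, 6), 0.0003: (21, 10, 0, 2), 0.0005: (17, 9, 0, 0)}
mlr_rows = {0.0: (39, 26, 10, 4), 0.0004: (18, 8, 1, 0), 0.001: (6, 5, 0, 0)}

for name, rows in (("DNN", dnn_rows), ("MLR", mlr_rows)):
    for tolerance, counts in rows.items():
        acc, prec, rec = classification_metrics(*counts)
        print(
            f"{name} tolerance={tolerance}: "
            f"accuracy={acc:.1%}, precision={prec:.1%}, recall={rec:.1%}"
        )
```

For the baseline rows (null tolerance), the counts sum to the 79 test values in both tables, and the computed values reproduce the pattern described in the text: higher precision and slightly higher accuracy for the DNN model, higher recall for the MLR model.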
Fig. 11 illustrates the trend of the considered metrics as the tolerance values are varied. The DNN model reports high levels of precision, reaching 100% for a tolerance value equal to 0.0003. Accuracy and recall are also satisfactory, as they reach 100% if the tolerance is equal to 0.0005. On the other hand, the MLR model has higher performance than DNN only in terms of recall, as it reports consistently higher values and reaches 100% for a tolerance value equal to 0.0004. MLR accuracy and precision reach 100% only if tolerance is set to 0.001 due to a persistent false positive error, as seen in Table 5.

Table 5
MLR outcomes for specific tolerance values.
Tolerance   Considered predictions (%)   Tp   Tn   Fp   Fn
0           100                          39   26   10   4
0.0001      81                           34   22   5    3
0.0002      61                           26   17   4    1
0.0003      42                           22   9    1    1
0.0004      34                           18   8    1    0
0.0005      33                           17   8    1    0
0.0006      28                           13   8    1    0
0.0007      24                           12   6    1    0
0.0008      23                           11   6    1    0
0.0009      20                           9    6    1    0
0.001       14                           6    5    0    0

Fig. 10. False positives and false negatives for the prediction of increase in frequency of wellhead damage obtained from the models.

6. Discussion

The case study results allow illustrating benefits and limitations of artificial cognition (particularly deep learning) for risk assessment in industry. Having said that, it must be underlined that the main issue is to identify or customize the most suitable model and features given a specific purpose. This requires knowing the state of the art, defining a systematic and evaluation-oriented approach, and applying the right amount of creativity. To this end, the categories of Known/Unknown events (Table 1) and the challenges listed in Section 2.1 are used as a structure to discuss the case study results.

6.1. Known/unknown framework

Paltrinieri et al. (2012b) report an adapted version of the risk management cycle by Merad (2010), which includes the categories of Known/Unknown events (Table 1). Such a framework is used in this work to describe the impact of machine learning on Known/Unknown events (Fig. 12). While machine learning may be considered mostly useful for Known knowns and Unknown knowns, most of the effort is required beforehand, to acquire relevant and usable knowledge.
Paltrinieri et al. (2012b) compare an ideal risk management model with the case of an atypical accident (Fig. 12). In this work, we plot the machine learning effort for the ideal case, defined as follows:

E = \frac{d(A_i)}{d(K_i)}    (12)

where E is the machine learning effort, equal to the derivative of the awareness A of the unwanted event with respect to the knowledge K of the unwanted event for the ideal case i.
In an initial phase, despite a lack of both knowledge and awareness, the latter may increase somewhat owing to reasonable doubt (Merad, 2010). In an ideal case, such reasonable doubt leads to a consolidated
awareness that “something may go wrong” (Kaplan and Garrick, 1981). On the other hand, relative unawareness of a specific accident scenario s and no delayed reasonable doubts can potentially lead to an atypical accident (Paltrinieri et al., 2015, 2011).
The machine learning effort in the ideal case is particularly required in the initial phase (phase 1 in Fig. 12). A system for data collection and categorization (the “small things” of section 3) is a necessary support for machine learning, as incomplete and unreliable input data inevitably affect the quality of results. Such a system should be designed at the early stages of risk management for effective implementation of machine learning methods.
The data collection system would also be functional to the realization that there are potential unknown scenarios (Known unknowns). In this phase, new effort should be made to build machine learning models (phase 2 in Fig. 12). The models may already represent a possible response to Known unknown events, if associated with unsupervised learning (Hastie et al., 2009).

Fig. 11. Metrics for specific tolerance values of predictions by (a) the DNN model and (b) the MLR model.

The models are trained (phase 3 in Fig. 12) once relevant knowledge is identified, consciously or unconsciously. In fact, they have more computational power to process all possible variables, so they can detect patterns where human assessors may not be able to see patterns or predictive risk factors.
Once accident scenarios are considered Known knowns, machine learning may help maintain such capability and avoid the potential shift from Known knowns to Unknown knowns due to loss of memory (phase 4 in Fig. 12). However, this phase does not require particular effort as the models are supposed to be trained and effective in terms of prediction.
In case of an accident, which may be due to several reasons, such as the presence of an atypical scenario or a loss of memory, a phase of compensation will occur. Such a phase represents a response to experienced failure and requires an intense effort for implementing or improving machine learning approaches in the system (phase 5 in Fig. 12).

6.2. Dynamicity

Indicators reporting the system performance on a regular basis represent an opportunity to consider changes and evolutions, and continuously update risk assessment. The example used (Matteini, 2015) simulates the monitoring of 50 indicator categories with regular reading every 6 weeks (Fig. 6). Heterogeneous indicators are considered to describe the safety barrier “stop drive-off”. Considering operational and organizational factors (e.g. number of simulator hours carried out by the DPO in the last three months), in addition to technical ones (e.g. the number of thruster control failures in the last three months), aims at producing proactive risk evaluation (Paltrinieri et al., 2016a; Scarponi and Paltrinieri, 2016).
Nevertheless, these indicators reflect different projections in time. A technical failure may be directly associated to the accident development, while early operational/organizational deviations have a lower degree of causality and may be disregarded and not registered. Moreover, operational and organizational indicators rely on personnel’s feedback and may be collected less frequently than technical ones. For this reason, sparsity of data may be especially encountered for operational and organizational indicators, and this may undermine the dynamic capabilities of the model.
It must also be mentioned that the DNN model used in this case study has limitations concerning dynamicity. In fact, every time a new
set of indicators arrives, the model needs to be re-trained. However, retraining from scratch every time is computationally expensive and increases the delay between data arrival and serving an updated model. To tackle this challenge, Cheng et al. (2016) implement a warm-starting system, which initializes a new model with embeddings and weights from the previous model.

6.3. Cognition

An artificial cognition model has the potential to capitalize on the information collected from indicators and avoid disregard of past lessons. This is made possible by the training sessions, where model features are defined. In this case study, supervised learning was applied: derivatives of the 50 indicator categories were provided together with the associated outputs showing risk increase or decrease. This allows for automatic learning of aggregation structures for input data. Although it was not used in this case study, unsupervised learning is also a possibility for machine learning (Hastie et al., 2009). In this case, the desired output is not known (although some potential patterns may be provided) and the model aims at drawing inferences from the dataset used.

The additional knowledge dimension for risk definition (as suggested by Aven and Krohn (2014)) is quantitatively represented by the characteristics of the training dataset, such as the number of indicator categories (columns) and values over time (rows), and the number of iterations to minimize the final loss function during model training. In this way, a fundamental concept such as the level of assessment uncertainty can be measured and quantitatively compared.

When we consider such training processes, it is easy to assume that more is better. Nonetheless, as Christian and Griffiths (2016) point out, “the question of how hard to think, and how many factors to consider, is the heart of a knotty problem that statisticians and machine-learning researchers call over-fitting.” The DNN model may have such a sensitivity to input data that the solutions it produces are highly variable. There can be errors in how the data were collected or reported; this is especially true for operational and organizational factors. For instance, collection of the number of DPO delays in the last three months (Fig. 6) depends on DPO’s memory (or honesty) and small mistakes may be amplified in the prediction. For this reason, cross-validating with a test dataset is essential. In this study, a relatively more complex model (DNN) proved 1.3% more accurate than a linear one (MLR, Table 3), despite the presence of several operational and organizational factors. These factors were simulated to show high volatility (e.g. percentage of time in the last three months with more than one operator monitoring), but we should consider that they may still not be completely realistic.

6.4. Data processing

While machine learning in general allows overcoming the definition of tangled data aggregation structures and relative weights used for indicators, there are some important differences among the specific techniques. Linear models such as MLR are widely used for prediction purposes. Indicator interactions can be easily memorized through the provided datasets, such as the one in this study (Fig. 8 and Table 3). However, a relatively simple model may not be able to capture the essential pattern in the data (Christian and Griffiths, 2016). Generalization of lessons learned for prediction under unknown circumstances requires a higher level of complexity, which linear functions may fail to provide (Goodfellow et al., 2016). Deep neural networks are suggested for such a task (Christian and Griffiths, 2016) and the case study results hint at this: when tested with an unknown dataset, the DNN model produced 66 correct predictions of risk increase/decrease against 65 correct predictions by the MLR model (Fig. 8). These results show not only slightly higher accuracy, but also a 5%-higher value for the DNN model precision, compensated by lower recall.

The DNN model seems to perform even better if some tolerance is introduced (Tables 4, 5 and Fig. 11). DNN metrics reach values between 90 and 95% when tolerance is about 0.0001 and 81% of predictions are considered, and reach 100% for a tolerance of 0.0005 with 33% of predictions. For a tolerance of 0.0001, MLR accuracy and precision are equal to 87%, while recall is at about 92%. All the MLR metrics reach 100% when tolerance is equal to 0.001 and only 14% of predictions are considered. Such behavior may be explained by the higher sensitivity of DNN models (Christian and Griffiths, 2016), which commit errors only for relatively small risk variations or in the vicinity of stationary points. However, such sensitivity should be appropriately handled as it may lead to over-fitting phenomena (Christian and Griffiths, 2016).

A limitation of DNN is that its results can be altered by its random initialization of parameters before every training session. This has the potential to affect the whole model development and, in turn, lead to slight alterations of prediction capabilities. Such differences may be amplified in case of relatively small datasets and few iterations to minimize the final loss function during training. Another limitation of the DNN model used in this case study may be related to its setting based on Cheng et al.’s (2016) work. In fact, the DNN model used may still need appropriate optimization for the case study.

As mentioned, the quality of the model, as with all models, depends on the quality of the data input. For instance, if humans within the system do not think a factor is important, they may not collect the data or include them in the model. In addition, according to the “no free lunch theorem” (Wolpert, 2002), if an algorithm A performs better than algorithm B on a certain problem, it is not necessarily true that A will perform better on other problems. This is why in machine learning it is common to approach the problem by trying several solutions for a particular case. A further model to consider may also be the one suggested by Cheng et al. (2016): a mixed machine-learning model to combine the strengths of both linear and deep approaches. Such a technique would allow memorization of registered indicator interactions and generalization of previously unseen ones.

6.5. Emergence

Major accidents are (fortunately) rare events in industry, even considering evidence of fat-tailed distributions (Taleb, 2007). For this reason, appropriate models should be used to deal with such unexpected events. In this respect, linear regression techniques are well-known for their limited ability to handle rare events data (King and Zeng, 2001). Relatively simple models tend to forecast the basic trend and may potentially miss several exact points (Christian and Griffiths, 2016). Sophisticated models such as DNN are better suited to consider rare events, due to their sensitivity to input data and capability to generalize (Cheng et al., 2016).

The case study in this work does not directly address such problems, because it simulates dynamic positioning operations where only deviations from normal conditions and no specific accidents occur. The only relevant result is represented by the demonstration of the potential flexibility of a DNN model. In fact, such a machine learning model is not tied to a rigid structure to aggregate information from indicators (Landucci and Paltrinieri, 2016), but it has the potential to reshape its own structure based on new batches of data. Such an approach is reminiscent of that proposed by Paltrinieri et al. (2013), who developed a technique to update logic trees describing accident scenarios dynamically, in order to account for new evidence and prevent emergence of atypical events.

Finally, to address the emergence challenge, it is possible to apply progressive learning techniques, which may be independent of the number of indicator categories and can learn new indicators once relevant information emerges, while retaining the knowledge of previous ones (Venkatesan and Er, 2016). For instance, new sets of indicators describing the appropriate operator response to alarms could be introduced in the case study in a second phase without invalidating the evaluation.
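
The idea of introducing new indicator categories in a second phase, while retaining what was learned from the previous ones, can be illustrated with a minimal sketch. It assumes a plain logistic model in place of the DNN and randomly generated indicator derivatives in place of the case-study data. It is neither the warm-starting system of Cheng et al. (2016) nor the progressive learning technique of Venkatesan and Er (2016), and all function names, labels and values are hypothetical.

```python
import math
import random


def predict(weights, bias, x):
    """Logistic model: estimated probability that overall risk increases."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, bias, rows, labels, epochs=20, lr=0.5):
    """Stochastic gradient descent on log-loss, starting from the given parameters."""
    weights = list(weights)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def expand_for_new_indicators(weights, n_new):
    """Keep the learned weights and add zero weights for the new indicators."""
    return list(weights) + [0.0] * n_new


random.seed(0)

# Phase 1 (assumed data): two indicator derivatives; risk rises when their sum is positive.
rows_1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(200)]
labels_1 = [1 if sum(r) > 0 else 0 for r in rows_1]
w1, b1 = train([0.0, 0.0], 0.0, rows_1, labels_1)

# Phase 2: a third indicator category emerges; previous weights are retained.
w2_init = expand_for_new_indicators(w1, n_new=1)
probe = [0.4, -0.1]
p_before = predict(w1, b1, probe)
p_after = predict(w2_init, b1, probe + [0.7])  # new weight is zero, so output is unchanged

# Training continues from the phase-1 parameters instead of restarting from scratch.
rows_2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
labels_2 = [1 if sum(r) > 0 else 0 for r in rows_2]
w2, b2 = train(w2_init, b1, rows_2, labels_2)
```

Because the added weights start at zero, predictions based on the existing indicators are unchanged until the extended model is trained on new data; only then does the new indicator begin to influence the output.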
This research was funded by the project Lo-Risk (“Learning about Risk”), supported by the Norwegian University of Science and Technology, NTNU (Onsager fellowship).

The case study showed how a machine approach allows predicting the overall risk of well damage increase or decrease based on the variation of individual technical, operational and organizational indicators. This approach may be used for both real-time risk assessment of the
