Journal of Petroleum Science and Engineering 210 (2022) 109937


Data-driven deep-learning forecasting for oil production and pressure


Rafael de Oliveira Werneck a,∗, Raphael Prates a, Renato Moura a, Maiara Moreira Gonçalves b, Manuel Castro a, Aurea Soriano-Vargas a, Pedro Ribeiro Mendes Júnior a, M. Manzur Hossain b, Marcelo Ferreira Zampieri b, Alexandre Ferreira a, Alessandra Davólio b, Denis Schiozer b,c, Anderson Rocha a

a RECOD.ai, Institute of Computing, University of Campinas – UNICAMP, 13083-852 Campinas, SP, Brazil
b CEPETRO, University of Campinas – UNICAMP, 13083-970 Campinas, SP, Brazil
c School of Mechanical Engineering, University of Campinas, 13083-970 Campinas, SP, Brazil

ARTICLE INFO

Keywords: Forecasting; Data-driven; Deep learning; Oil production; Pre-salt

ABSTRACT

Production forecasting plays an important role in oil and gas production, aiding engineers to perform field management. However, this can be challenging for complex reservoirs such as the highly heterogeneous carbonate reservoirs from Brazilian Pre-salt fields. We propose a new setup for forecasting multiple outputs using machine-learning algorithms and evaluate a set of deep-learning architectures suitable for time-series forecasting. The setup proposed is called N-th Day, and it provides a coherent solution for the problem of forecasting multiple data points in which a sliding window mechanism guarantees there is no data leakage during training. We also devise four deep-learning architectures for forecasting, stacking the layers to focus on different timescales, and compare them with different existing off-the-shelf methods. The obtained results confirm that specific architectures, as those we propose, are crucial for oil and gas production forecasting. Although LSTM and GRU layers are designed to capture temporal sequences, the experiments also indicate that the investigated scenario of production forecasting requires additional and specific structures.

1. Introduction

Managing hydrocarbon reservoirs is challenging, as it requires integrating different areas of knowledge such as reservoir and production engineering and geosciences. Through field data and production measurements, we gain insights into the reservoir behavior and can perform estimations of the reservoir's future, e.g., production forecasting (Ertekin and Sun, 2019). Reservoir simulation models are the most common tool used to forecast the production of petroleum fields. Although they are a consolidated tool to assist the decision-making process, they have some drawbacks, especially in short-term contexts. These models can also be very time-consuming depending on the field size and the complexity of the reservoir. These issues are significant for some Brazilian pre-salt fields, since they are giant reservoirs formed by highly heterogeneous carbonate rocks under a complex production strategy such as WAG (water alternating gas) injection. These reservoir models are commonly used for long-term decisions, which involve several years of production forecast. On the other hand, short-term forecasting requires specific tuning of model properties to avoid high production fluctuations in the transition period from past to future, a common issue observed given the changes in the operational controls used in both periods. Therefore, the short-term forecast is usually done by analytics or machine-learning approaches (Tadjer et al., 2021).

More recently, the use of data-driven approaches to perform production forecasts has gained attention. These approaches only consider field response data and machine-learning techniques to perform the forecast (Kubota and Reinert, 2019; Davtyan et al., 2020; Liu et al., 2020; Zhong et al., 2020), which is critical for pre-salt and unconventional reservoirs (Sun et al., 2018). A data-driven approach can be an important aid (directly or as a complement) to model-based approaches, providing production forecasts within a window of several days or weeks.

Accurate forecasting is an essential part of a reservoir's operation, as it helps engineers make proper designs and developments for the field (Liu et al., 2020). However, it is difficult to predict well production and bottom-hole pressure (BHP) accurately, as the reservoir properties and its dynamics influence them, i.e., injection of gas and water, maintenance of reservoir pressure, and interference from other wells. In addition, well monitoring data are non-linear, non-stationary, non-parametric, noisy, and of a chaotic nature.

∗ Corresponding author.
E-mail address: rafael.werneck@ic.unicamp.br (R.d.O. Werneck).

https://doi.org/10.1016/j.petrol.2021.109937
Received 15 April 2021; Received in revised form 3 November 2021; Accepted 6 November 2021
Available online 13 December 2021

This work focuses on data-driven approaches to perform well production and pressure forecasting in the short term. We aim to combine past information from several sensors of producer and injector wells into an end-to-end solution. The solution is expected to predict several days ahead, shaping a multiple-output regression model based on multivariate time-series data. The short-term oil and pressure forecasting horizons are much longer than the traditional scope in the machine-learning literature. Therefore, we must leverage state-of-the-art machine-learning techniques and appropriately adapt them to cope with these challenges (Ertekin and Sun, 2019). In this case, we compare machine-learning with analytics approaches and not model-based ones.

We propose: (a) a new forecasting setup focused on the prediction window's last day, which considers the challenging problem of forecasting for several days; (b) the use of stacked Recurrent Neural Networks (RNN) in forecasting, allowing them to focus on different timescales; and (c) a discussion on the use of off-the-shelf methods to forecast oil production and pressure.

This paper can be divided into three main parts: Information and Theory, which comprises the Introduction and Forecasting in the literature, respectively Sections 1 and 2; the work performed, presented in Sections 3 (Proposed Method), 4 (Experimental Protocol), and 5 (Experiments and Results); and, finally, the Conclusions and Future Work in Section 6.

Fig. 1. Common setups for multi-step-ahead and multiple outputs according to Bontempi (2008). The example considers two input and three output sequences. $\hat{\varphi}$ denotes the predicted values.
2. Forecasting in the literature

This paper investigates forecasting for oil production and bottom-hole pressure (BHP) of producer wells in a carbonate reservoir, considering only data from the floating production storage and offloading units (FPSOs). Traditional methods in the oil and gas industry leverage numerical reservoir models to perform production forecasts, a model-based approach known to be time-consuming and computationally expensive. The literature on forecasting well production from data-driven features, mostly with deep-learning techniques, is still scarce and not well documented (Cao et al., 2016; Kubota and Reinert, 2019; Zhan et al., 2019; Davtyan et al., 2020; Liu et al., 2020; Zhong et al., 2020).

This section presents different setups for time-series forecasting in the prior art that consider a broader horizon, i.e., setups that perform multiple output predictions for a target variable. Multiple output prediction aims at predicting a sequence of two or more data points based on a sequence of input data. We then discuss the scarce literature on data-driven oil production forecasting. Finally, we describe off-the-shelf and state-of-the-art methods for forecasting general data that could be adapted to the present problem.

2.1. Forecasting setups

The literature on time-series forecasting lacks a default setup for performing forecasts with multiple outputs. Bontempi (2008) describes the two most common approaches to multiple outputs as a conditional distribution over input and output sequences. Fig. 1 depicts these setups in a graphical model, where Fig. 1(a) is the iterated prediction and Fig. 1(b) is the direct prediction. The iterated prediction is an iterative one-step-ahead prediction, in which the dependencies are preserved but the error is propagated throughout the iterations. On the other hand, the direct approach has a different model for each next step, making it a conditionally independent problem.

Brownlee proposed another setup for multiple outputs on his website (Brownlee, 2018). In this approach, each prediction window's output is concatenated in a sequence, considering a one-step sliding window, and then evaluated. The advantage of this approach is that every multiple-output prediction is considered when calculating the metrics. However, it is impossible to plot the prediction results for this method, as it has more than one result for each data point.

Pao (2007) proposed a rolling cross-validation setup, in which the testing set is separated into folds. The training set for each fold is composed of the previous data in the time series. This approach is similar to what we call retraining. When using retraining, we improve the learning model at each step of the testing by incorporating the most recent data points. However, one disadvantage of this method is that retraining at each step is time-consuming.

Bedi and Toshniwal (2019) studied a different technique. They used an output window of multiple outputs and slid the window by the size of the output for the next prediction, with the ground-truth values as input. They then combined their results at the end of the prediction and presented a plot of their predictions. In this method, they avoid multiple predictions for the same day. However, they combined different confidences of prediction when concatenating the last day of a window with the first day of the next window.

We detail these different setups in the following figures. In each figure, one must consider each column as a day in the test set and the different rows as different predictions. The concatenation approach performs its multiple output prediction and slides the window by one day to perform the next prediction. All predictions are then concatenated and the resulting representation is used to calculate the metrics. In Fig. 2, we have three predictions (blue, green, and red, with yellow representing each input) of three days, and they are concatenated at the end. Hence, the last day of the first prediction (day 3 in blue) is succeeded by the first day of the second prediction (day 2 in green), and so on. This final concatenation is used to calculate the metrics of this setup.

In the retraining setup, we included a yellow strip to represent the training set, showing that we have a larger training set for each forecast. The method retrains before performing the next prediction. In the end, the predictions complete the test set (last row), and we then evaluate the method. Fig. 3 depicts this setup.

Finally, the sliding window setup performs a prediction for multiple outputs without intersections between the columns, i.e., if the first prediction predicts day 1 to day 3, the second one predicts day 4 to day 6. Fig. 4 illustrates this method. These different setups and their problems (time consumption and the combination of different prediction confidences) motivated us to discuss and propose a new setup, more compatible with the challenge of forecasting over longer periods.
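All of the setups above share the same underlying windowing step: slicing the series into fixed-size input/output pairs. The sketch below is our own minimal illustration of that step (not code from any of the cited works); a stride of 1 gives the one-step sliding window, while a stride equal to the output size gives the non-overlapping sliding-window setup. The 85-day input and 30-day output sizes match the protocol used later in Section 4.2.

```python
import numpy as np

def make_windows(series: np.ndarray, n_in: int, n_out: int, stride: int = 1):
    """Slice a (time, features) array into multi-output forecasting pairs.

    Each X window holds n_in consecutive points; each Y window holds the
    n_out values of the target (here, the first column) that follow it.
    """
    X, Y = [], []
    for start in range(0, len(series) - n_in - n_out + 1, stride):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out, 0])
    return np.stack(X), np.stack(Y)

# Example: 85-day inputs predicting the next 30 days on a toy 5-variable series.
series = np.random.rand(500, 5)
X, Y = make_windows(series, n_in=85, n_out=30, stride=1)
print(X.shape, Y.shape)  # (386, 85, 5) (386, 30)
```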


Fig. 2. Concatenation setup. For each prediction (lines in blue, green, and red), the values are concatenated in a new line, in which each number corresponds to a day (last blue day followed by first green day), to calculate the measures of the forecast. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Retraining setup. After each forecast, the method includes the ground-truth value of the predicted part and trains again for the next output window, represented by the hashed blocks.

Fig. 4. Sliding window setup. For each prediction, there is no intersection between the forecast output windows.

2.2. Forecasting oil production

Some authors have already addressed the problem of production forecasting using data-driven algorithms. Kubota and Reinert (2019) tackled this issue using linear regression techniques along with recurrent neural networks. The authors only considered three sets of time series: injection history, production history, and the number of producers. They performed experiments for longer prediction horizons (12 months, but on monthly data). The authors showed that a reliable production forecast could be made with data-driven models, without a geological model or numerical simulators. These findings motivated us to design the methods we investigate herein.

Davtyan et al. (2020) also used linear regression methods to forecast oil production, in this case based on sliding windows. They combined features from aggregated characteristics of oil fields, local information of the wells, pressure features, and autoregressive features to perform their regression. They applied their approach on a monthly-grained dataset, predicting the following month. They also performed an experiment considering a longer time frame, but creating 𝑚 independent regression models for a horizon of size 𝑚. Unfortunately, they did not compare their method with any other forecasting approaches and only applied simple linear regression.

Departing from the linear formulation, Liu et al. (2020) performed production forecasts using an ensemble empirical mode decomposition (EEMD) followed by Long Short-Term Memory (LSTM) learning. They split two daily datasets into training and test sets, decomposed the training set using EEMD to choose the basis functions, applied a Genetic Algorithm to select the hyperparameters of their methods, and tested on the test set. However, they only compared their approach with simpler variants of their own method. There is no detailed information on how they performed the forecasting, whether it was daily or over a longer time frame.

Other works focus on Multilayer Perceptron (MLP) networks (Aizenberg et al., 2016; Amirian et al., 2018; Yuan et al., 2021) to perform forecasting. However, these approaches do not consider the propagation of information through time. The literature has more recently been moving to Recurrent Neural Networks (RNN) (Song et al., 2020) and taking advantage of stacking RNN layers (Al-Shabandar et al., 2021; Chaikine and Gates, 2021), which is aligned with our proposed RNNs. However, such approaches have fewer layers and do not consider multiple outputs, i.e., several days at a time.

In this line, some studies in the literature focus on long-term forecasting with RNNs. Pan et al. (2019) proposed two different scenarios that use specific types of LSTMs trained considering hidden periods of the time series. In the first scenario, a Denoising Long Short-Term Memory (DeLSTM) only considers the oil production rate, followed by a Decline Curve Analysis (DCA) to perform the forecasting. The second scenario uses a Savitzky-Golay Cascaded Long Short-Term Memory (SG-CLSTM) to smooth, fill hidden periods, and forecast. In this case, both the oil production rate and pressure are used as input data. Although the results indicate a long-term production forecasting ability, they did not focus on the forecasting window or compare the forecasting results against other strategies.

Another type of recurrent network used for time-series forecasting is the Echo State Network (ESN), which is based on reservoir computing theory (Bianchi et al., 2021). Deng and Pan (2020) took advantage of ESNs to capture the inherent dependencies among the wells and added empirical fractional-flow relationships to perform well-control optimization. However, the proposed approach is designed for mature fields where a certain amount of water has reached the producers, which is not the case of the pre-salt field in focus here.

Different from the prior data-driven works, Zhong et al. (2020) designed a proxy model using a conditional convolutional generative neural network to predict field production considering waterflooding for oil recovery. First, they used geostatistical methods to generate stochastic input parameters. Then, they applied a numerical simulator to obtain the training and test sets and trained the proxy model. Finally, they applied the material balance rule to calculate the oil production rate. The authors performed three experiments with their approach but only compared it with the simulator model.

Kim and Durlofsky (2021) used an RNN-based proxy to predict different oil and water rates, using the well's BHP data sequence as input. Unlike our goal, the authors trained the network with 256 simulated BHP profiles, and the obtained predictions are used as input for a constrained production optimization problem. The proposed RNN-based proxy model is based on a sequence-to-sequence LSTM layer, which is quite similar to what we propose in this paper. However, our network is formed by two stacked LSTM layers and several dense layers at the top, enabling it to capture diverse aspects of the input time series. In addition, we propose a multivariate input in which BHP is one variable among others. It is therefore impossible to fairly compare their model to our approach for forecasting.

Razak et al. (2021) proposed an encoder–decoder approach for long-term production forecasting combined with well properties and future controls of the producer. They also perform transfer learning, i.e., they train the network on a collection of historical production data from other wells and fine-tune it for the target well.


Thus, the model can exploit other dynamical trends to improve the generalization. After concatenating the input encodings, the decoder predicts oil, water, and gas as multivariate time series. Although the results indicate a long-term forecasting ability, they only perform 6-data-points-ahead forecasting and do not compare it with other strategies.

Most of these studies are data-driven, the same category as our proposed approach. Nevertheless, such methods comprise simple combinations of linear regressions or recurrent networks with few layers. This paper proposes solutions with deeper architectures and, most importantly, different (and more complete) evaluation setups. Therefore, we opted not to compare the proposed methods with the ones mentioned above but, instead, with more recent deep-learning-derived methods and off-the-shelf solutions.

2.3. Baseline methods

We adopted two baselines for our experiments: Pressure-Normalized Decline Curve Analysis and a simple Recurrent Neural Network. Decline Curve Analysis (DCA) is a traditional method used in the oil and gas industry to predict future production (Belyadi et al., 2019), using a graphical procedure to analyze declining production rates. Lacayo and Lee (2014) proposed a modified decline curve analysis for unconventional reservoirs that have not achieved the pressure-stabilized state. The Pressure-Normalized Decline Curve Analysis (PN-DCA) performs a decline curve analysis using pressure-normalized production rates. This pressure-normalized rate can be described by Eq. (1):

$$\Delta p_N = \frac{q}{p_i - p_{wf}} \tag{1}$$

where $q$ is the production rate, $p_i$ the average initial reservoir pressure, and $p_{wf}$ the flowing pressure. This pressure-normalized rate $\Delta p_N$ can be calculated by Eq. (2):

$$\frac{1}{\Delta p_N} = m\sqrt{t} + b' \tag{2}$$

where $m$ is the slope of the straight line and $b'$ is where the curve intercepts the $y$-axis.

RNNs are one of the most promising techniques for time-series forecasting. Their main advantage is the presence of memory cells capable of propagating information through time. At a specific time step $t$, the output $h_t$ is calculated based on the current input $X_t$ and the previous output $h_{t-1}$. Our baseline RNN is a simple network composed of an RNN layer followed by a single dense layer matching the output size.

2.4. Off-the-shelf methods

As the literature on data-driven oil-production forecasting is still scarce, we also adopt state-of-the-art methods for general forecasting and compare them to our proposal. These methods are considered off-the-shelf, as they are general-purpose and not tailored for oil production forecasting.

Salinas et al. (2020) proposed DeepAR, an autoregressive recurrent neural network for probabilistic forecasting. This method learns a global model from all time series in a dataset. The authors claim as an advantage that the model learns seasonal behaviors across time series. It makes a probabilistic forecast in the form of Monte Carlo samples, learning from similar items, and can incorporate many likelihood functions to fit the data. DeepAR was proposed for Amazon's retail businesses but was also evaluated on datasets from various problems.

Oreshkin et al. (2019) presented a neural architecture based on backward and forward residual links and fully connected layers for forecasting univariate time series. Their architecture is generic and straightforward; it does not rely on time-series feature engineering, and it is easy to interpret and extend. They also used ensembling to be comparable to other methods from the M4 forecasting competition. They evaluated their approach, called N-BEATS, on the M4 (Makridakis et al., 2018), M3 (Makridakis and Hibon, 2000), and Tourism (Athanasopoulos et al., 2011) datasets.

Vaswani et al. (2017) proposed an architecture based on attention mechanisms called the Transformer. Their network has an encoder–decoder structure using stacked self-attention and point-wise connected layers. As the model has neither recurrent nor convolutional layers, a positional encoding is needed to provide information about relative positions in the sequence. The Transformer architecture was evaluated on translation tasks, outperforming other architectures.

Taylor and Letham (2018) described a modular regression model that can be adjusted to different time series with the help of a specialist. Their method, named Prophet, uses a decomposable time-series model with three components: trend, seasonality, and holidays. The authors frame the problem as a curve-fitting exercise, claiming it provides flexibility, no need for interpolating missing values, efficiency in fitting the model, and interpretable parameters. Taylor and Letham evaluated Prophet on business time series, specifically Facebook events.

3. Proposed method

In this section, we present our proposed approaches to deal with bottom-hole pressure and oil production forecasting. Our main contributions are divided into three fronts. First, in terms of validation, we propose a more realistic setup for comparing different forecasting methods when predicting multiple days. This setup allows us to plot results correctly and avoid mixing different prediction confidences over a long range. Second, we introduce a series of pre-processing and data augmentation steps, and the inclusion of injection data in the forecasting modeling. Finally, we propose methods that leverage cutting-edge deep-learning formulations for temporal data to tackle bottom-hole pressure and oil production forecasting. Fig. 5 presents a pipeline of our methodology.

3.1. Proposed evaluation setups

To avoid the problems raised in Section 2, we propose two forecasting setups for time series. The first, which we denominate First Prediction, slides a multi-output window one step at a time. This method keeps the first prediction made for each test data point: we consider all the data from the first forecast window and, for the subsequent predictions, only the last data point of each output. Fig. 6 details how this approach works. Considering each column a data point and each row the predictions of multiple outputs with a sliding window of one point each time, the First Prediction approach corresponds to the first prediction made for each data point. In this case, we select the three data predictions of the first forecast and then the last data point of each subsequent prediction to compose our final forecast. We can plot the predictions with this setup, but it still combines different confidences, given the first prediction window.

Our second setup focuses on the last predicted data point, obtaining a result that is more compatible with the challenging problem of forecasting over longer periods. We name this approach N-th Day. In this case, we perform the same multi-output sliding window of the previous setup but only consider each window's last day for the evaluation. Thus, the obtained results for this approach represent the most challenging forecast data, i.e., the data point in the output window most distant from the input data. This setup has the advantage of not combining different prediction confidences in its results. However, it lacks all the data points needed for plotting, being complemented by the First Prediction setup in this case.

This setup helps us focus the evaluation on the behavior of the forecasting methods for the N-th day prediction. Fig. 7 shows an example of this evaluation approach considering an output window of size 3. For each prediction (each row), the forecast's last predicted value (third data point) is selected to compose the final forecast series. While the First Prediction considers all the predicted data from the first prediction and then the last data point from each subsequent prediction, the N-th Day only considers the last data point of every prediction. For long testing time series, both approaches tend to become comparable. In addition, the N-th Day prediction can be used to evaluate how the forecasting ability of the model decreases as the forecasting target goes further into the future.
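To make the two proposed setups concrete, the following sketch (ours, not the paper's released code) assembles each setup's evaluation series from the matrix of overlapping multi-output forecasts; row i holds the forecast issued after sliding the input window i steps:

```python
import numpy as np

def first_prediction_series(preds: np.ndarray) -> np.ndarray:
    """First Prediction setup: keep the entire first window, then only the
    last value of every subsequent window (preds: [n_windows, n_out])."""
    return np.concatenate([preds[0], preds[1:, -1]])

def nth_day_series(preds: np.ndarray) -> np.ndarray:
    """N-th Day setup: keep only each window's last (hardest) prediction."""
    return preds[:, -1]

# Toy example with 4 sliding forecasts of a 3-day output window.
preds = np.arange(12, dtype=float).reshape(4, 3)
print(first_prediction_series(preds))  # test days 1-3 from window 0, then days 4-6
print(nth_day_series(preds))           # only the 3rd-day-ahead value of each window
```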

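Returning to the PN-DCA baseline of Section 2.3: Eqs. (1) and (2) reduce the fit to ordinary least squares against the square root of time. The sketch below is our reading of those two equations, not the baseline's actual implementation; the synthetic check and all names are illustrative only.

```python
import numpy as np

def fit_pn_dca(t, q, p_i, p_wf):
    """Fit 1/dpn = m*sqrt(t) + b' (Eq. (2)) and return (m, b_prime)."""
    dpn = q / (p_i - p_wf)  # Eq. (1): pressure-normalized rate
    m, b_prime = np.polyfit(np.sqrt(t), 1.0 / dpn, deg=1)
    return m, b_prime

def forecast_pn_dca(t_future, m, b_prime, p_i, p_wf_future):
    """Invert the fit to predict future production rates q(t)."""
    dpn = 1.0 / (m * np.sqrt(t_future) + b_prime)
    return dpn * (p_i - p_wf_future)

# Synthetic check: data generated with m = 0.02 and b' = 0.5 is recovered.
t = np.arange(1.0, 200.0)
p_i, p_wf = 300.0, 250.0
q = (p_i - p_wf) / (0.02 * np.sqrt(t) + 0.5)
print(fit_pn_dca(t, q, p_i, p_wf))  # approximately (0.02, 0.5)
```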

Fig. 5. Visual pipeline of our methodology.

Fig. 6. The proposed First Prediction setup. For each data point, we select the first prediction made for it. For the first prediction (blue), we select all predicted data and, for the subsequent predictions (green and red), we select the last data point to form the last line. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. The proposed N-th Day setup, in which we only select the last data point (i.e., day 8 - blue, day 9 - green, and day 10 - red) of each forecast output window and create a new time series with these selected data. This new time series is the result for later assessment of accuracy. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.2. Data pre-processing

Reservoir production time series might present some anomalies and even inconsistent or erroneous observations, e.g., due to unexpected events, defective observations or measurements, and human interventions. These events can alter the series, making it difficult for forecasting methods to learn a pattern for prediction. Therefore, it is essential to identify and remove these anomalies during data pre-processing. Data pre-processing is a vital step in supervised learning solutions.

For anomaly removal, we adopt z-score modeling. We calculate the mean and standard deviation of each time series and remove the data above or below one standard deviation from the mean.

However, after removing noisy data, we can have fewer data points than necessary for our forecast model to learn a time-series pattern. An approach to avoid this problem is data augmentation, which comprises methods for increasing the training data, including unobserved data related to or transformed from the original training points (van Dyk and Meng, 2001). With more data, our methods and models can improve their learning.

We propose performing data augmentation using two approaches. The first is to create data points between two time quanta with interpolation; e.g., between the data points day 1 00:00 and day 2 00:00, this generates a point day 1 12:00 through interpolation.

The second approach is to perturb the data during the training of the method. In this case, the training variability helps the method to be invariant to noise. In this approach, we set a probability of applying a transformation (a.k.a. perturbation) to subsets of the training data. The transformations can be: Add Noise, Convolution, Drift, Pool, Quantization, Reverse, and Time Warp. Table 1 describes each perturbation and Fig. 8 depicts these augmentations.

Table 1
Perturbations for the data augmentation in the training phase, with their descriptions.

- Add noise: Adds random noise to the time series.
- Convolution: Performs a convolution on the time series with a kernel window, i.e., a composition function between the time series' values and a kernel function (e.g., triangular, Hann window), which acts as a filter.
- Drift: Adds a drift value to some points of the time series.
- Pool: Divides the time series into windows and then applies a pooling function (e.g., maximum, minimum, average) to each window.
- Quantization: Defines level sets according to a distribution (e.g., uniform, quantile, k-means) and rounds the time series' values to the nearest level in the level set.
- Reverse: Reverses the timeline of the series.
- Time warp: Randomly changes the speed of the timeline.

Fig. 8. Data augmentation in the training phase. (a) shows the original time series, and (b) to (h) show the augmentations through perturbations.
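A minimal sketch of the first two pre-processing steps (z-score anomaly removal and interpolation-based augmentation), assuming a pandas DataFrame indexed by day; this is our illustration, not the paper's code, and the resampling frequency is a free parameter:

```python
import pandas as pd

def preprocess(df: pd.DataFrame, z_thresh: float = 1.0, freq: str = "12h") -> pd.DataFrame:
    """Z-score anomaly removal followed by interpolation-based augmentation.

    Drops rows farther than z_thresh standard deviations from the mean
    (the paper uses one standard deviation), then up-samples the daily
    series to a finer grid, filling new points by linear interpolation
    ("12h" reproduces the day 1 12:00 example in the text; "3h" matches
    the private-dataset experiments of Section 5.2).
    """
    z = (df - df.mean()) / df.std()
    clean = df[(z.abs() <= z_thresh).all(axis=1)]
    return clean.resample(freq).interpolate(method="linear")

# Example on a hypothetical daily BHP series.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"bhp": range(10)}, index=idx, dtype=float)
print(preprocess(df).head())
```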


3.3. Injector data

Aside from the time-series data of the producer well, we also have information from the injector wells in the reservoir. The process of injecting water or gas takes time to influence the producer well. In other words, when a fluid is injected into the reservoir, it generates a pressure pulse that takes some time to reach the producer well (diffusivity time) (Johnson et al., 1966).

When considering injector well data, we have to examine the connectivity among the wells. One approach to determine interwell connectivity is the Time Lagged Cross-Correlation (TLCC), which uses Pearson's correlation coefficient applied to two time series shifted in time (Shen, 2015). The TLCC helps identify lags of influence from an injector that might be useful to infer a producer's production (Menke and Menke, 2016).

Pearson's correlation coefficient defines the degree of linear correlation between two time series. The higher this coefficient, the more significant the correlation between them (Tian and Horne, 2016). Pearson's correlation coefficient is defined as

$$r = \frac{\sum_{i=1}^{n} \left( I_i - \bar{I} \right)\left( P_i - \bar{P} \right)}{\sqrt{\sum_{i=1}^{n} \left( I_i - \bar{I} \right)^2} \sqrt{\sum_{i=1}^{n} \left( P_i - \bar{P} \right)^2}} \tag{3}$$

in which $I_i$ and $P_i$ are the injector and producer time series, respectively, $n$ is the length of the series, and $\bar{I}$ and $\bar{P}$ are the mean values of the series $I$ and $P$, respectively.

In this work, we considered two correlations. The first is the correlation between the mass flow of water and gas of the injector and the producer's BHP derivative. The other correlation was made considering the BHP and reservoir pressure difference, both from the injector and the producer.
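A short sketch of how the TLCC can be computed in practice (our illustration, assuming daily series as NumPy arrays; np.corrcoef evaluates the Pearson's r of Eq. (3), and the lag search range is an assumption):

```python
import numpy as np

def tlcc(injector: np.ndarray, producer: np.ndarray, max_lag: int = 30):
    """Return the lag (in samples) with the strongest Pearson correlation
    between the injector series and the time-shifted producer series."""
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        i = injector[:len(injector) - lag] if lag else injector
        p = producer[lag:]
        n = min(len(i), len(p))
        r = np.corrcoef(i[:n], p[:n])[0, 1]  # Pearson's r, Eq. (3)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

# Example: a producer that responds to injection with a 5-day delay.
rng = np.random.default_rng(0)
inj = rng.standard_normal(400)
prod = np.roll(inj, 5) + 0.1 * rng.standard_normal(400)
print(tlcc(inj, prod))  # expected to recover a lag close to 5
```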


3.4. Data-driven techniques

Time series derived from petroleum field production data are highly non-linear; to perform forecasts with these data, we need to rely on non-linear formulations, such as Artificial Neural Networks (ANN) (Li et al., 2013). These brain-inspired architectures can help solve large and complex tasks, with applications in both academia and industry. The idea of ANNs is to combine a vast set of artificial neurons and connections, with weights learned directly from the data, aiming at solving a specific problem. The literature on data-driven techniques presents several neural network architectures designed for different tasks. For time-series forecasting, some of the most suitable are RNNs and Convolutional Neural Networks (CNN). However, it is unlikely that an off-the-shelf solution would solve a problem as complex as the one in this work.

The networks we devise herein stack multiple recurrent layers, creating a deep RNN, varying the input/output sequences among the layers (Géron, 2019). There are two main recurrent layers in the literature: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit). LSTM is a recurrent layer that remembers previous steps using three gates that manipulate information. The first, called the forget gate, removes irrelevant information from the previous steps. The second is the input gate, which allows (or not) the input value to be accumulated and updates the cell memory. Finally, the output gate can shut off the output of the cell (Goodfellow et al., 2016). GRU differs from LSTM by using a single gate, named the update gate, to simultaneously control the forget function and the decision to update the current state, together with a reset gate. Figs. 9 and 10 show these two recurrent layers and their gates.

Fig. 9. Representation of an LSTM layer. $X_t$ represents the input at instant $t$, $C_{t-1}$ and $C_t$ are the memories of the previous and current LSTM cells, and $h_{t-1}$ and $h_t$ are the outputs of the previous and current cells. The gates of the cell are enumerated as follows: (1) the forget gate, (2) the input gate, and (3) the output gate. In this recurrent cell, the input and the previous output decide whether to consider the memory of the last cell, the memory state of the cell is then updated in the input gate, and, finally, the memory of the cell combined with the inputs results in the output of the cell.

Fig. 10. Representation of a GRU layer. $X_t$ represents the input at instant $t$, and $h_{t-1}$ and $h_t$ are the outputs of the previous and current cells. The gates of the cell are enumerated as follows: (1) the reset gate and (2) the update gate. First, the data goes through the reset gate, which decides how much information from the past to forget, and then passes through the update gate, which controls the information that flows into the memory.

We experimented with two different types of recurrent networks. In the first, Seq2Vector, we feed the network with a sequence of inputs and only the output vector of the last recurrent layer is considered. In the second, Seq2Seq, we feed the network with a sequence of inputs and the network produces an output at every recurrent step. The advantage is that the output at every time step is considered in the error calculation, which improves training (Géron, 2019). We adopted the Seq2Vector form in the last recurrent layer, followed by one or more dense layers to provide the multiple output prediction.

CNNs follow a different formulation than RNNs. They are well-known and widely successful networks, used mainly for image classification (Krizhevsky et al., 2012). CNNs consist of sequences of convolutional layers, which perform convolutions between local regions of the input and a defined filter, or weight matrix, that slides over the input. A CNN architecture learns the filters needed to recognize patterns in the input data. Fig. 11 presents an example of a network using CNNs. In this network, a sliding window over the input goes through the convolutional layers to find the input patterns, then through a pooling layer to obtain the most salient elements of the convolution. Finally, a dense layer interprets the extracted features and returns the output.

Fig. 11. Example of a forecasting network using CNNs.

Applying CNNs to time-series forecasting consists of learning filters that represent the patterns as a 1D feature map, which is then used to help forecast future values. One advantage of CNNs for forecasting is that they can access a broad range of the time series' history (van den Oord et al., 2016; Borovykh et al., 2017) and reliably capture local structures in the data. The created 1D feature map can also be mixed with recurrent layers, helping the recurrent layer detect longer patterns (expanding a local view), especially when the convolutional layer reduces the sequence's size. Our experiments evaluated both setups for forecasting: CNNs alone and CNNs allied with RNNs.

Table 2 summarizes the four network architectures (named GRU2, GRUconv, Seq2Seq, and CNN) adopted in this work. Note that we opted to use GRU layers instead of LSTM, since they proved faster. For the sake of visualization, the number of layers and their respective types are listed for each model. The table also gives the hyperparameter values, such as the number of neurons in each layer, dropout, filters, kernel size, stride, padding, and whether a layer returns a sequence (T) or not (F). We selected these networks to represent the two approaches for forecasting (RNN and CNN) and a combination of them. Compared to the oil production forecasting literature, these architectures are deeper and a little more complex, stacking more recurrent layers and more dense layers. We intend to release the complete source code of our methods freely through GitHub upon acceptance of this paper.

Table 2
Details of the ANNs used in the experiments. A plot of the models can be found in Fig. A.1.

- GRU2: 2x GRU (units = 128, return sequences = T/F); 10x dense (units = 128/64/32/30).
- GRUconv: 2x conv1D (filters = 64/32, kernel size = 4, strides = 2, padding = "valid"); 2x GRU (units = 128, dropout = 0.1, recurrent dropout = 0.5, return sequences = T/F); 10x dense (units = 128/64/32/30).
- Seq2Seq: 2x GRU (units = 128, dropout = 0.1, recurrent dropout = 0.5, return sequences = T); lambda (last 30 days); time distributed (dense(1)).
- CNN: 3x conv1D (filters = 64, kernel size = 2, padding = "same") + batch normalization; global average pooling1D; 1x dense (units = 30).


4. Experimental protocol

This section describes the protocol guiding our experiments, including the selected datasets, the protocol for splitting the datasets between training and test sets, and the metrics applied to the results.

4.1. Datasets

For experiments and validation, we adopted two benchmarks from the literature that are not from the oil industry (Metro Interstate Traffic Volume and Appliances Energy Prediction) to assess the forecasting setups, and a proprietary dataset from a pre-salt oil reservoir. For the sake of open science, we also present new data from a benchmark model for experiments in oil and pressure forecasting, which will be fully available upon acceptance of this paper, also with pre-salt conditions and characteristics. We adopted the first two datasets to show compatibility with previously evaluated methods in the literature and to pinpoint that the methods we explore in this work might apply to setups beyond the oil and gas industry.

The Metro Interstate Traffic Volume Dataset¹ is a multivariate time-series benchmark created from hourly data of the Interstate 94 Westbound traffic volume at MN DoT ATR station 301 from 2012 to 2018. This station is located roughly midway between Minneapolis and St. Paul, MN, USA. The dataset also contains hourly weather features and information on holidays. It has 48,204 data points with 9 variables. We selected the numeric variables (temp, rain_1h, snow_1h, clouds_all, and traffic_volume) for our experiments and defined traffic_volume as the target.

The Appliances Energy Prediction Dataset (Candanedo et al., 2017) is a multivariate benchmark for the regression of energy use in a low-energy building. It comprises almost five months of information, measured every 10 min, resulting in 19,735 data points with 29 attributes.

Considering oil and pressure forecasting, our primary focus, we worked with two datasets: one private and another generated from a benchmark model. The private dataset comprises production data from a Brazilian pre-salt oil reservoir. It provides information on fluid production (oil, gas, and water), pressure (bottom-hole), and the ratios between them (water cut, gas-oil ratio, and gas-liquid ratio). The reservoir contains 16 producer and 16 injector wells, the latter divided into nine water injectors and seven WAG injectors. For the oldest producer well, we have five years of historical data.

The final dataset is the UNISIM-II-M-CO benchmark,² created by the UNISIM group at the University of Campinas. This synthetic benchmark, run on the CMG-GEM simulator,³ has production and injection trends similar to the private dataset, apart from being a carbonate reservoir based on real field data. The model is a synthetic light oil based on a combination of Pre-Salt characteristics, such as fractures, Super-K layers, and high heterogeneity (Correia et al., 2015). The fluid model is compositional, with seven components in the oil phase. The simulation model has 6.5 years of production history and contains eight WAG injectors, alternating every six months, and ten producer wells. All the producers and injectors of the simulation model present total and partial closure frequencies similar to real cases.

¹ https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume, accessed on November 23rd, 2020.
² https://www.unisim.cepetro.unicamp.br/benchmarks/en/unisim-ii/overview, accessed on November 23rd, 2020.
³ https://www.cmgl.ca/, accessed on September 29th, 2020.

4.2. Protocol

In this subsection, we describe the adopted evaluation protocols. We selected the Metro and the Energy datasets for the experiments comparing the forecasting setups, to show that our proposed setup can generalize to other forecasting problems.

We reserved 10% of each dataset for testing (4820 and 1973 data points, respectively), following the procedure of Hu and Zheng (2020). We applied an RNN (the GRU2 network, described in Section 3.4) with Huber loss in all experiments as our proposed baseline, unless stated otherwise. As the approach is stochastic, we performed ten runs to obtain a margin for its results and took the mean of these predictions as our final result. For training the network, we fed batches of data containing 85 points as input to learn how to predict the following 30 data points (output). All experiments were performed with 100 epochs, using early stopping after 10 epochs without improvement in the validation loss, a common practice in machine learning.

For the oil datasets, we performed experiments considering our N-th Day approach, as it is better for assessing the quality of the forecast further into the future, as discussed in Section 3.1. As the datasets contain daily data, we performed a forecast of one month ahead (30 data points of output), given an input of 85 data points.

Thirty days might seem a small time frame for petroleum engineers, who are used to forecasting years using numerical simulations. However, it is a large enough window to help decide on interventions in the field's operation. A reservoir simulator is usually not predictive enough for short-term events, particularly for large and heterogeneous reservoirs, due to the complexity of representing such reservoirs and computational limitations. Machine-learning approaches can deal better with the complex data available from different sources and the high-frequency data of the oil and gas industry. Therefore, through more refined data, these machine-learning approaches can be more accurate in predicting near-future events, such as kicks, hydrate formation, or early water and gas breakthroughs.

All the networks tested on these datasets use Huber loss. We selected two targets for our experiments on these datasets: daily oil production and well bottom-hole pressure. Each data point input to the networks consists of all available variables for the given data point, e.g., production data, injection data, and well pressure.

After obtaining the forecast results, we performed post-processing on the data. We ensured that the target was not negative for the oil datasets, as our targets are daily pressure and daily production. We also removed any outliers above one standard deviation.

Upon obtaining the forecasting results, we evaluated the approaches through three metrics commonly used in forecasting problems (Kubota and Reinert, 2019; Oreshkin et al., 2019; Liu et al., 2020): the Mean Absolute Error (MAE), which shows the magnitude of the errors,

$$\mathrm{MAE}(X, h) = \frac{1}{m} \sum_{i=1}^{m} \lvert h(x_i) - y_i \rvert,$$

the Root Mean Square Error (RMSE), which measures how spread out the errors are,

$$\mathrm{RMSE}(X, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2},$$

and the Symmetric Mean Absolute Percentage Error (SMAPE), which measures the percentage error of the predicted values,

$$\mathrm{SMAPE}(X, h) = \frac{100}{m} \sum_{i=1}^{m} \frac{\lvert h(x_i) - y_i \rvert}{\left( \lvert y_i \rvert + \lvert h(x_i) \rvert \right) / 2},$$

where $X$ are the predicted values and $y$ the ground truth.

5. Experiments and results

In this section, we present the sets of experiments and discuss their results. The first set of experiments (Section 5.1) compares different forecasting setups. The second set (Section 5.2) shows our forecasting method applied to the oil and gas field and highlights the importance of data pre-processing. The third set (Section 5.3) adds information about the injection data, considering the delay of influence extracted from correlation techniques. Finally, the last set (Section 5.4) compares the proposed approach with a number of off-the-shelf data-driven solutions.

5.1. Comparing forecast setups

Our first experiment compared the different forecast setups from the literature (Concatenation, Tumbling, i.e., the non-overlapping sliding window of Section 2.1, and Retraining) with our proposed approach (N-th Day). All these setups perform multiple-output forecasting, contemplate a more extensive testing set, and combine their outputs differently. Table 3 shows how these methods perform on the Metro and Energy datasets.

Table 3
Forecasting setups on the Metro and Energy datasets.

Dataset  Metric  Concatenation  Tumbling  Retraining + Tumbling  N-th Day
Metro    MAE     1062.65        1054.47   1046.67                1432.57
Metro    RMSE    1405.41        1391.53   1383.81                1764.99
Metro    SMAPE   43.49          43.30     43.03                  54.21
Energy   MAE     43.29          43.25     43.21                  45.78
Energy   RMSE    91.56          91.33     88.56                  93.79
Energy   SMAPE   36.08          36.15     36.21                  38.25

As Table 3 shows, the N-th Day evaluation setup does not have the best result, as expected. This is coherent with its proposition of considering only the last data point of each horizon window and integrating these points at the end. The last day is the most challenging data point to predict, as it is the furthest from the input data.

All the other setups also consider the easier predicted data points (e.g., the 1-day-ahead prediction) in their evaluations and, because of this, their results tend to be better. For instance, the first predicted data point is straightforward: a method can simply repeat the last data point seen, without any learning, and still obtain reasonable results. We can see on the Energy dataset that the results for the N-th Day setup are not far from those of the other approaches. The subsequent experiments present results using only the N-th Day setup for forecasting.

It is interesting to notice that the Retraining setup outperforms the Concatenation and Tumbling methods. However, as it retrains for every new prediction, its cost is multiple times that of the other methods.
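For reference, the MAE, RMSE, and SMAPE values reported in Table 3, and in all the following experiments, can be computed directly from the definitions given in Section 4.2. A minimal NumPy sketch (ours, for illustration):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: the average magnitude of the errors."""
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large deviations more heavily."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, in percent."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return 100 * np.mean(np.abs(y_pred - y_true) / denom)

y = np.array([10.0, 12.0, 15.0])
p = np.array([11.0, 11.0, 14.0])
print(mae(y, p), rmse(y, p), smape(y, p))  # 1.0 1.0 ~8.37
```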

Fig. 12. Augmentation and anomaly detection experiments on the private dataset, considering 104 experiments (4 networks, 13 wells, and 2 target variables).

Fig. 13. Augmentation and anomaly detection experiments on the UNISIM-II-M-CO dataset, considering 80 experiments (4 networks, 10 wells, and 2 target variables).

5.2. Comparing different pre-processing techniques

Our second experiment focuses on field response data, i.e., fluid production and well pressure from a producing reservoir. We want to remove erroneous data that we cannot predict, such as human interventions.

For these experiments, we separated the anomaly removal and the data augmentation to see how each improved the forecasting results. We used a z-score method for anomaly removal. For data augmentation, we created data points corresponding to intervals of 3 h in our original daily data, interpolating these data linearly.

The perturbations for data augmentation are presented in Table 1. Using our two targets, we performed all augmentations combined with our four networks on 13 producer wells of the private dataset (the producers with enough data to perform the perturbations). We compared the results to select the augmentation approach that performed better in more experiments. Fig. 12(a) shows the number of experiments that achieved the best result with the corresponding augmentation. This figure shows that the augmentation by data points performed better in more wells than the other approaches. Moreover, Fig. 12(b) considers the same experiments and shows the influence of the use of an anomaly detector.

We also performed these experiments on the UNISIM-II-M-CO dataset, considering all its ten producer wells. Figs. 13(a) and 13(b) present these results for augmentation and anomaly detection, respectively. It is clear that, for the UNISIM-II-M-CO dataset, the removal of anomalies worsens both results (augmentation and anomaly detection). We believe this is because the interference in this benchmark model was artificially generated, following a distribution. Thus, the anomalies and synthetic data are intrinsic to the time-series data, so any anomaly removal would disrupt the dataset model, and interpolation would only add noise.

The next experiments in this work consider the best results on augmentation and anomaly removal for each dataset. For experiments with the private dataset, we used augmentation by 3 h and z-score anomaly removal. For the UNISIM-II-M-CO dataset, we perform neither augmentation nor anomaly removal. Figs. 14 and 15 present plots for three wells and two targets, DailyProdOil (daily oil production) and DailyPressureBHP (daily measure of well bottom-hole pressure), respectively. We can see how the forecast (red) performs in these plots compared to the ground truth (blue).

5.3. Injector data

Our following experiments considered injector data as input to our forecasting approaches. In these experiments, for each producer well, we used the TLCC to determine which injector wells are connected to the producer and the lag between them. Tables 4 and 5 show the SMAPE metric for our two datasets, considering the oil production of a well with and without its correlated injector wells.

Table 4
SMAPE results for 3 wells of the private dataset using the GRU2 method and considering (or not) their correlated injectors. Well P1 is connected to injectors I1, with a 9-day delay, and I2, with a 5-day delay. Well P2 has 3 connected injectors: I2, I3, and I4, with 1-day, 15-day, and 1-day delays, respectively. Producer well P3 is correlated with 6 different injector wells: I5 without delay, I6 with a 2-day delay, and wells I7, I8, I9, and I10 with a 1-day delay each.

Well  Target            Without injector  With correlated injectors
P1    DailyProdOil      29.12             28.73
P1    DailyPressureBHP  0.99              0.96
P2    DailyProdOil      46.12             38.94
P2    DailyPressureBHP  2.60              2.69
P3    DailyProdOil      7.01              7.44
P3    DailyPressureBHP  0.35              1.00

As shown in Tables 4 and 5, we do not have a definitive answer on whether it is better to use data from injector wells, especially considering the private dataset. Intuitively, we think it is better to include this information. We believe future investigations should be performed to enhance the correlation of producer and injector wells, so that more precise information can be used as input to our algorithms.

5.4. Comparing data-driven techniques

Our last round of experiments compares our methods with the baselines and different off-the-shelf deep-learning networks from the literature, e.g., Transformer, N-BEATS, and DeepAR, among others.

Fig. 14. DailyProdOil forecasting in the private dataset. The purple strip represents the start of the test set and has the size of the output window. The red line is the forecasted
data, and the blue points are the ground-truth data. The green shadow is the maximum and minimum values obtained considering all 10 runs. (For interpretation of the references
to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 15. DailyPressureBHP forecasting in the private dataset. The purple strip represents the start of the test set and has the size of the output window. The red line is the forecasted data, and the blue points are the ground-truth data. The green shadow is the maximum and minimum values obtained considering all 10 runs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Table 5 literature, we identified that forecasting multiple outputs is challenging


SMAPE results for 3 wells on the UNISIM-II-M-CO dataset using GRU2 method and
due to the sliding window overlapping values already predicted. We
considering (or not) their correlated injectors. Well PRK028 is connected to injectors
IRK004 with 8 days delay and IRK049 with 5 days delay. Well PRK045 has just 1
identified that several approaches continuously update the forecasting
connected injector: IRK049, with 4 days delay. Producer well PRK014 is correlated to values, leading to a wrong interpretation of the multiple outputs com-
the injector wells IRK004 and IRK049, both with 2 days delay. parable to a single output, which is crucial for managing oil fields.
Well Target Without injector With correlated injectors Thus, we aim for this proposed method to be the standard for multiple
DailyProdOil 45.74 23.09 output forecasting scenarios. The differences between the proposed
PRK028
DailyPressureBHP 6.86 7.03 method and the standard methods found in the literature are detailed
DailyProdOil 15.70 15.52 throughout several experiments.
PRK045
DailyPressureBHP 7.64 6.81 Considering the data-driven oil production forecasting scenario,
PRK014
DailyProdOil 106.12 90.98 we evaluated the impact of different pre-processing techniques, such
DailyPressureBHP 3.52 2.99 as augmentation and removal of anomalies. The experiments were
e.g., Transformer, N-BEATS, DeepAR, among others. For these new networks, we used a toolkit for time-series modeling called GluonTS.4 We adapted these baselines and off-the-shelf networks to our N-th Day setup: we restricted the test set to the last 100 days of each dataset and kept the output window of 30 data points, which yields 71 predictions, each of them evaluated on its 30th day. However, we could not apply this setup to the Prophet network, as its solution does not consider the test input; we therefore report its results using a retraining setup combined with our N-th Day setup.

4 https://ts.gluon.ai, accessed on October 27th, 2020.
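To make this adaptation concrete, the sketch below shows one way to implement the N-th Day evaluation loop in Python with NumPy. It is a minimal illustration, not our actual implementation: the names model, series, and input_size are placeholders, and model.predict is assumed to return a full 30-point output window.

    import numpy as np

    def nth_day_evaluation(model, series, input_size, output_size=30, test_size=100):
        # Slide over the last test_size points; with test_size=100 and
        # output_size=30 this yields 71 forecast positions.
        start = len(series) - test_size
        preds, truth = [], []
        for t in range(start, len(series) - output_size + 1):
            window = series[t - input_size:t]                  # past data only: no leakage
            forecast = model.predict(window[np.newaxis, ...])  # shape (1, output_size)
            preds.append(forecast[0, -1])                      # keep only the N-th (30th) day
            truth.append(series[t + output_size - 1])
        return np.array(preds), np.array(truth)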
For comparison, we selected baselines, off-the-shelf forecasting solutions, a mechanism to improve recurrent networks, and a state-of-the-art forecasting approach. Details of these networks can be found in Vaswani et al. (2017), Taylor and Letham (2018), Oreshkin et al. (2019), and Salinas et al. (2020). Figs. 16 and 17 show the SMAPE metric for three producer wells and two target variables.
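For reference, and assuming the commonly used symmetric formulation of the metric, the SMAPE over n evaluated points, with observed values A_t and forecasts F_t, reads

    \mathrm{SMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{2\,|F_t - A_t|}{|A_t| + |F_t|},

so lower values indicate better forecasts and the score is bounded above by 200%.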
As Figs. 16 and 17 show, no single network outperforms all the others. Thus, for oil production forecasting, it is not enough to select an off-the-shelf method and apply it to the problem; it is essential to understand the problem and the available variables and to design an approach accordingly. One possible reason for PN-DCA performing worse than the machine-learning techniques in these experiments is the ability of the latter to adapt to changes in the data. In contrast, PN-DCA depends on reservoir conditions that do not hold in our cases (Arps, 1945; Lacayo and Lee, 2014): no influence from injection support, a long production history, and production under pseudo-steady-state or boundary-dominated flow conditions. Moreover, as PN-DCA forecasts a pressure-normalized rate, we had to define bottom-hole pressure values for the testing period. For that, we used linear regression to obtain these values, as we lack access to the ground-truth data of the testing period.
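A minimal sketch of this estimation step, assuming the training-period bottom-hole pressures are available in a NumPy array bhp_train (an illustrative name) and that a 100-day test horizon is needed:

    import numpy as np

    # Fit a straight line to the known BHP history and extrapolate it
    # over the test horizon required by PN-DCA.
    days = np.arange(len(bhp_train))
    slope, intercept = np.polyfit(days, bhp_train, deg=1)
    test_days = np.arange(len(bhp_train), len(bhp_train) + 100)
    bhp_test_estimate = slope * test_days + intercept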
Each reservoir is unique, and specific forecasting methods are needed for the Oil and Gas industry. In particular, the forecasting window is much larger than in typical time-series applications, the data suffer strong interference from unknown correlations, and anomalies (such as human interventions) are frequent. Our approaches account for these characteristics, but there is still room for improvement.
6. Conclusion and future work

We focused this work on short-term, 30-day forecasting of fluid rates and bottom-hole pressure for hydrocarbon reservoirs using data-driven procedures. We used different datasets to evaluate various forecasting setups over broader horizons, applied a number of pre-processing techniques and studied their impact on forecasting, including the use of injector data, and evaluated several off-the-shelf approaches to this problem.

The proposed N-th Day approach highlights the importance of having an accurate method to forecast multiple outputs. Compared to the literature, we identified that forecasting multiple outputs is challenging because the sliding window overlaps values that were already predicted. We also identified that several approaches continuously update the forecast values, leading multiple-output results to be wrongly interpreted as comparable to single-output ones, a distinction that is crucial for managing oil fields. Thus, we aim for the proposed method to become the standard for multiple-output forecasting scenarios; the differences between it and the standard methods found in the literature are detailed throughout our experiments.

Considering the data-driven oil production forecasting scenario, we evaluated the impact of different pre-processing techniques, such as data augmentation and anomaly removal. The experiments were conducted using both real data and synthetic data from the UNISIM-II-M-CO benchmark. Unlike common forecasting tasks, such as weather prediction, the benefit of pre-processing proved to be dataset dependent. We also studied the influence of injector data on oil production forecasting, but could not determine whether it is better to use injector data or not, so additional studies on this question are required.

We performed experiments comparing different off-the-shelf data-driven techniques with our network configurations to demonstrate the complexity of the oil production forecasting context. These experiments verified that simply selecting an off-the-shelf method for oil production forecasting is not enough. They also confirmed that forecasting multiple outputs is tricky and may lead to an erroneous evaluation of the designed model if a specific approach, such as the proposed N-th Day, is not adopted. Further, it is essential to understand the data and the problem at hand.
Based on the obtained results, the best performance was achieved with stacked recurrent layers, such as LSTM and GRU, that take long time frames as input. Harnessing these networks is already an improvement over the linear regression and single-layer recurrent networks commonly found in the oil production forecasting literature. Experiments with several recurrent networks indicate that we do not simply need a longer input time frame; instead, we need to design specific temporal representations that highlight crucial snippets of the different input time series to support oil and pressure forecasting. We see this as a natural evolution of our proposed Seq2Seq and GRU2 architectures.
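As an illustration of this kind of architecture, the Keras sketch below stacks two GRU layers over a multivariate input window; the layer sizes, optimizer, and loss are placeholder choices, not the exact configuration of our GRU2 network.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_stacked_gru(input_size, n_features, output_size=30, units=64):
        # The first GRU returns the whole sequence so the second one can
        # refine the temporal representation before the dense output head.
        model = keras.Sequential([
            layers.GRU(units, return_sequences=True,
                       input_shape=(input_size, n_features)),
            layers.GRU(units),                # default tanh activation
            layers.Dense(output_size),        # one value per forecasted day
        ])
        model.compile(optimizer="adam", loss="mse")
        return model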
Future work

Our approach to oil production forecasting can be improved by considering, for example, attention mechanisms to quantify the interdependence between input and output, or among input elements, and a new hyperparameter for the input window size, which could be obtained with an auto-correlation method. Regarding the model hyperparameters, we used the Keras default values, which the literature reports as a good option; for instance, the hyperbolic tangent, the default activation function for all recurrent layers, helps to reduce issues with unstable gradients. Searching the hyperparameter space can be quite costly, as a deep-learning model has many parameters (batch size, learning rate, loss function, number of epochs, early stopping, to name a few), so we plan to adopt a Genetic Algorithm for this search.
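One possible realization of the auto-correlation idea is sketched below, under the assumption that a window remains informative while the series stays noticeably autocorrelated; the threshold and lag cap are illustrative values, not tuned choices.

    import numpy as np

    def input_size_from_autocorrelation(series, max_lag=200, threshold=0.2):
        # Return the largest lag whose autocorrelation still exceeds the
        # threshold, as a data-driven guess for the input window size.
        x = np.asarray(series, dtype=float) - np.mean(series)
        denom = np.dot(x, x)
        lag = 1
        for k in range(1, min(max_lag, len(x) - 1)):
            if np.dot(x[:-k], x[k:]) / denom < threshold:
                break
            lag = k
        return lag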
Another direction is to consider the information of other producer wells in the training step of the target well. With more data, we can better understand how production evolves and behaves across different wells. Additional studies on the correlation between producer and injector wells are also crucial to account for the interactions between them and to improve the input data; in future work, we can include more details about the correlation between two wells, the delay for one well to influence the other, and the strength of that correlation. Another investigation proposal is to program the neural networks to learn this delay from the data, so that current data from both injector and producer are fed to the network regardless of the actual delay.
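A simple starting point for estimating such a delay is the lagged cross-correlation between the injection and production series, as in the deliberately naive sketch below; real data would additionally require detrending (e.g., following Shen, 2015) and anomaly handling.

    import numpy as np

    def estimate_delay(injection, production, max_lag=30):
        # Lag (in days) at which the injector series best correlates
        # with the producer series; lags start at one day.
        inj = (injection - injection.mean()) / injection.std()
        prod = (production - production.mean()) / production.std()
        scores = [np.corrcoef(inj[:len(inj) - k], prod[k:])[0, 1]
                  for k in range(1, max_lag + 1)]
        return int(np.argmax(scores)) + 1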


Fig. 16. SMAPE results for the private dataset considering oil production and bottom-hole pressure. The blue bars are the networks proposed in this work, the red bars are the off-the-shelf methods from the literature, and the green bars are the baseline methods. All experiments were performed with augmentation and anomaly removal. Prophet* means that this network uses a retraining approach. Note that the PN-DCA method is not applicable (N/A) to BHP forecasting. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Thirty days of production forecast can be used to assist short-term decisions, such as responding to kicks or to early water and gas breakthrough.


Fig. 17. SMAPE results for the UNISIM-II-M-CO dataset considering oil production and bottom-hole pressure. The blue bars are the networks proposed in this work, the red bars are the off-the-shelf methods from the literature, and the green bars are the baseline methods. These experiments considered the best pre-processing for the UNISIM-II-M-CO dataset (no augmentation and no anomaly removal). Prophet* means that this network uses a retraining approach. Note that the PN-DCA method is not applicable (N/A) to BHP forecasting. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

As mentioned before, this time window is much smaller than those usually considered in model-based approaches, whose forecasts cover more than ten years. We believe that data-driven and model-based approaches are complementary; an even better solution may be found by joining the two into a hybrid procedure, which is a topic for future research.


CRediT authorship contribution statement

Rafael de Oliveira Werneck: Conceptualization, Methodology, Software, Investigation, Writing – original draft. Raphael Prates: Software, Investigation, Writing – review & editing. Renato Moura: Software, Investigation, Writing – review & editing. Maiara Moreira Gonçalves: Data curation, Writing – review & editing. Manuel Castro: Data curation, Writing – review & editing. Aurea Soriano-Vargas: Visualization, Writing – review & editing. Pedro Ribeiro Mendes Júnior: Formal analysis, Writing – review & editing. M. Manzur Hossain: Resources, Writing – review & editing. Marcelo Ferreira Zampieri: Resources, Data curation, Writing – review & editing. Alexandre Ferreira: Conceptualization, Supervision, Writing – review & editing. Alessandra Davólio: Supervision, Writing – review & editing. Denis Schiozer: Supervision, Writing – review & editing. Anderson Rocha: Conceptualization, Project administration, Supervision, Writing – review & editing.

Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.petrol.2021.109937.

Acknowledgments

This work was conducted in association with the ongoing Project registered under ANP number 21373-6 as "Desenvolvimento de Técnicas de Aprendizado de Máquina para Análise de Dados Complexos de Produção de um Campo do Pre-Sal" (UNICAMP/Shell Brazil/ANP), funded by Shell Brazil, under the ANP R&D levy as "Compromisso de Investimentos com Pesquisa e Desenvolvimento". The authors also thank Schlumberger and CMG for software licenses and Vitor Ferreira for helping with the PN-DCA method.

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.petrol.2021.109937.

References

Aizenberg, I., Sheremetov, L., Villa-Vargas, L., Martínez-Muñoz, J., 2016. Multilayer neural network with multi-valued neurons in time series forecasting of oil production. Neurocomputing 175, 980–989. http://dx.doi.org/10.1016/j.neucom.2015.06.092.
Al-Shabandar, R., Jaddoa, A., Liatsis, P., Hussain, A.J., 2021. A deep gated recurrent neural network for petroleum production forecasting. Mach. Learn. Appl. 3, 100013. http://dx.doi.org/10.1016/j.mlwa.2020.100013.
Amirian, E., Fedutenko, E., Yang, C., Chen, Z., Nghiem, L., 2018. Artificial neural network modeling and forecasting of oil reservoir performance. In: Applications of Data Management and Analysis: Case Studies in Social Networks and Beyond. Springer International Publishing, Cham, pp. 43–67. http://dx.doi.org/10.1007/978-3-319-95810-1_5.
Arps, J., 1945. Analysis of decline curves. Trans. AIME 160 (01), 228–247. http://dx.doi.org/10.2118/945228-G.
Athanasopoulos, G., Hyndman, R.J., Song, H., Wu, D.C., 2011. The tourism forecasting competition. Int. J. Forecast. 27 (3), 822–844. http://dx.doi.org/10.1016/j.ijforecast.2010.04.009.
Bedi, J., Toshniwal, D., 2019. Deep learning framework to forecast electricity demand. Appl. Energy 238, 1312–1326. http://dx.doi.org/10.1016/j.apenergy.2019.01.113.
Belyadi, H., Fathi, E., Belyadi, F., 2019. Chapter seventeen - decline curve analysis. In: Belyadi, H., Fathi, E., Belyadi, F. (Eds.), Hydraulic Fracturing in Unconventional Reservoirs, second ed. Gulf Professional Publishing, pp. 311–340. http://dx.doi.org/10.1016/B978-0-12-817665-8.00017-5.
Bianchi, F.M., Scardapane, S., Løkse, S., Jenssen, R., 2021. Reservoir computing approaches for representation and classification of multivariate time series. IEEE Trans. Neural Netw. Learn. Syst. 32 (5), 2169–2179. http://dx.doi.org/10.1109/TNNLS.2020.3001377.
Bontempi, G., 2008. Long term time series prediction with multi-input multi-output local learning. In: 2nd European Symposium on Time Series Prediction, pp. 145–154.
Borovykh, A., Bohte, S., Oosterlee, C.W., 2017. Conditional time series forecasting with convolutional neural networks. arXiv:1703.04691.
Brownlee, J., 2018. How to develop multi-step LSTM time series forecasting models for power usage. https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/ (accessed: Aug 3rd, 2020).
Candanedo, L.M., Feldheim, V., Deramaix, D., 2017. Data driven prediction models of energy use of appliances in a low-energy house. Energy Build. 140, 81–97. http://dx.doi.org/10.1016/j.enbuild.2017.01.083.
Cao, Q., Banerjee, R., Gupta, S., Li, J., Zhou, W., Jeyachandra, B., 2016. Data driven production forecasting using machine learning. In: SPE Argentina Exploration and Production of Unconventional Resources Symposium, pp. 1–10. http://dx.doi.org/10.2118/180984-MS.
Chaikine, I.A., Gates, I.D., 2021. A machine learning model for predicting multi-stage horizontal well production. J. Pet. Sci. Eng. 198, 108133. http://dx.doi.org/10.1016/j.petrol.2020.108133.
Correia, M., Hohendorff, J., Gaspar, A.T., Schiozer, D., 2015. UNISIM-II-D: Benchmark case proposal based on a carbonate reservoir. In: SPE Latin America and Caribbean Petroleum Engineering Conference, pp. 1–21. http://dx.doi.org/10.2118/177140-MS.
Davtyan, A., Rodin, A., Muchnik, I., Romashkin, A., 2020. Oil production forecast models based on sliding window regression. J. Pet. Sci. Eng. 195, 107916. http://dx.doi.org/10.1016/j.petrol.2020.107916.
Deng, L., Pan, Y., 2020. Machine-learning-assisted closed-loop reservoir management using echo state network for mature fields under waterflood. SPE Reservoir Eval. Eng. 23 (04), 1298–1313. http://dx.doi.org/10.2118/200862-PA.
van Dyk, D.A., Meng, X.L., 2001. The art of data augmentation. J. Comput. Graph. Statist. 10 (1), 1–50. http://dx.doi.org/10.1198/10618600152418584.
Ertekin, T., Sun, Q., 2019. Artificial intelligence applications in reservoir engineering: A status check. Energies 12 (15), 2897–2919. http://dx.doi.org/10.3390/en12152897.
Géron, A., 2019. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
Hu, J., Zheng, W., 2020. A deep learning model to effectively capture mutation information in multivariate time series prediction. Knowl.-Based Syst. 203, 106139. http://dx.doi.org/10.1016/j.knosys.2020.106139.
Johnson, C.R., Greenkorn, R.A., Woods, E.G., 1966. Pulse-testing: A new method for describing reservoir flow properties between wells. J. Pet. Technol. 18 (12), 599–604.
Kim, Y.D., Durlofsky, L.J., 2021. A recurrent neural network-based proxy model for well-control optimization with nonlinear output constraints. SPE J. 26 (04), 1837–1857. http://dx.doi.org/10.2118/203980-PA.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (Eds.), Advances in Neural Information Processing Systems, Vol. 25. Curran Associates, Inc., pp. 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
Kubota, L., Reinert, D., 2019. Machine learning forecasts oil rate in mature onshore field jointly driven by water and steam injection. In: SPE Annual Technical Conference and Exhibition, pp. 1–18. http://dx.doi.org/10.2118/196152-MS.
Lacayo, J., Lee, J., 2014. Pressure normalization of production rates improves forecasting results. In: SPE Unconventional Resources Conference/Gas Technology Symposium. http://dx.doi.org/10.2118/168974-MS.
Li, X., Chan, C., Nguyen, H., 2013. Application of the neural decision tree approach for prediction of petroleum production. J. Pet. Sci. Eng. 104, 11–16. http://dx.doi.org/10.1016/j.petrol.2013.03.018.
Liu, W., Liu, W.D., Gu, J., 2020. Forecasting oil production using ensemble empirical model decomposition based long short-term memory neural network. J. Pet. Sci. Eng. 189, 107013. http://dx.doi.org/10.1016/j.petrol.2020.107013.
Makridakis, S., Hibon, M., 2000. The M3-Competition: results, conclusions and implications. Int. J. Forecast. 16 (4), 451–476. http://dx.doi.org/10.1016/S0169-2070(00)00057-1.
Makridakis, S., Spiliotis, E., Assimakopoulos, V., 2018. The M4 Competition: Results, findings, conclusion and way forward. Int. J. Forecast. 34 (4), 802–808. http://dx.doi.org/10.1016/j.ijforecast.2018.06.001.
Menke, W., Menke, J., 2016. Environmental Data Analysis with MatLab. Academic Press.
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., Kavukcuoglu, K., 2016. WaveNet: A generative model for raw audio. arXiv:1609.03499.
Oreshkin, B.N., Carpov, D., Chapados, N., Bengio, Y., 2019. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv:1905.10437.
Pan, Y., Bi, R., Zhou, P., Deng, L., Lee, J., 2019. An effective physics-based deep learning model for enhancing production surveillance and analysis in unconventional reservoirs. In: SPE/AAPG/SEG Unconventional Resources Technology Conference. OnePetro. http://dx.doi.org/10.15530/urtec-2019-145.


Pao, H.T., 2007. Forecasting electricity market pricing using artificial neural networks. Energy Convers. Manage. 48 (3), 907–912. http://dx.doi.org/10.1016/j.enconman.2006.08.016.
Razak, S.M., Cornelio, J., Cho, Y., Liu, H.-H., Vaidya, R., Jafarpour, B., 2021. Transfer learning with recurrent neural networks for long-term production forecasting in unconventional reservoirs. In: SPE/AAPG/SEG Unconventional Resources Technology Conference. OnePetro. http://dx.doi.org/10.15530/urtec-2021-5687.
Salinas, D., Flunkert, V., Gasthaus, J., Januschowski, T., 2020. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 36 (3), 1181–1191.
Shen, C., 2015. Analysis of detrended time-lagged cross-correlation between two nonstationary time series. Phys. Lett. A 379 (7), 680–687.
Song, X., Liu, Y., Xue, L., Wang, J., Zhang, J., Wang, J., Jiang, L., Cheng, Z., 2020. Time-series well performance prediction based on long short-term memory (LSTM) neural network model. J. Pet. Sci. Eng. 186, 106682. http://dx.doi.org/10.1016/j.petrol.2019.106682.
Sun, J., Ma, X., Kazi, M., 2018. Comparison of decline curve analysis DCA with recursive neural networks RNN for production forecast of multiple wells. In: SPE Western Regional Meeting. http://dx.doi.org/10.2118/190104-MS.
Tadjer, A., Hong, A., Bratvold, R.B., 2021. Machine learning based decline curve analysis for short-term oil production forecast. Energy Explor. Exploit. http://dx.doi.org/10.1177/01445987211011784.
Taylor, S.J., Letham, B., 2018. Forecasting at scale. Amer. Statist. 72 (1), 37–45.
Tian, C., Horne, R.N., 2016. Inferring interwell connectivity using production data. In: SPE Annual Technical Conference and Exhibition.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I., 2017. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, pp. 6000–6010.
Yuan, Z., Huang, H., Jiang, Y., Li, J., 2021. Hybrid deep neural networks for reservoir production prediction. J. Pet. Sci. Eng. 197, 108111. http://dx.doi.org/10.1016/j.petrol.2020.108111.
Zhan, C., Sankaran, S., LeMoine, V., Graybill, J., Mey, D.O.S., 2019. Application of machine learning for production forecasting for unconventional resources. In: Unconventional Resources Technology Conference, Denver, Colorado, 22–24 July 2019, pp. 1945–1954. http://dx.doi.org/10.15530/urtec-2019-47.
Zhong, Z., Sun, A.Y., Wang, Y., Ren, B., 2020. Predicting field production rates for waterflooding using a machine learning-based proxy model. J. Pet. Sci. Eng. 194, 107574. http://dx.doi.org/10.1016/j.petrol.2020.107574.
