Well placement optimization

Zyed Bouzarkouna

To cite this version:


Zyed Bouzarkouna. Well placement optimization. Other [cs.OH]. Université Paris Sud - Paris XI,
2012. English. NNT: 2012PA112061. tel-00690456.

HAL Id: tel-00690456


https://theses.hal.science/tel-00690456
Submitted on 23 Apr 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of
scientific research documents, whether they are published or not. The documents may come
from teaching and research institutions in France or abroad, or from public or private
research centers.
UNIVERSITE PARIS-SUD
ÉCOLE DOCTORALE : Informatique
Laboratoire de Recherche en Informatique

DISCIPLINE : INFORMATIQUE

THÈSE DE DOCTORAT

soutenue le 03/04/2012

par

Zyed BOUZARKOUNA

Optimisation de placement des puits

Directeur de thèse : Marc SCHOENAUER INRIA Saclay - Île-de-France, France


Co-directeur de thèse : Anne AUGER INRIA Saclay - Île-de-France, France

Composition du jury :

Président du jury : Jan Dirk JANSEN TU Delft, Pays-Bas


Rapporteurs : Christian IGEL University of Copenhagen, Danemark
Pierre THORE Total, Royaume-Uni
Examinateurs : Didier Yu DING IFP Energies nouvelles, France
UNIVERSITY PARIS-SUD
ÉCOLE DOCTORALE : Informatique
Laboratoire de Recherche en Informatique

DISCIPLINE: COMPUTER SCIENCE

Ph.D. THESIS

defended on 03/04/2012

by

Zyed BOUZARKOUNA

Well placement optimization

Thesis advisor: Marc SCHOENAUER INRIA Saclay - Île-de-France, France


Thesis co-advisor: Anne AUGER INRIA Saclay - Île-de-France, France

Jury members:

President of the jury: Jan Dirk JANSEN TU Delft, Netherlands


Reviewers: Christian IGEL University of Copenhagen, Denmark
Pierre THORE Total, United Kingdom
Examiners: Didier Yu DING IFP Energies nouvelles, France
To my Mother.

Acknowledgements

It would not have been possible to write this doctoral thesis without the help and support
of the kind people around me, to only some of whom it is possible to give particular
mention here.
First and foremost, I want to thank my thesis advisor, Marc Schoenauer. It has been an
honor to be one of his Ph.D. students. I appreciate all his contributions of time and
funding that made my Ph.D. experience productive and stimulating.
This thesis would not have been possible without the help, support and patience of my
thesis co-advisor, Anne Auger, not to mention her advice and unsurpassed knowledge of
her research area: optimization. Her good advice, support and friendship have been
invaluable on both an academic and a personal level, for which I am extremely grateful.
The joy and enthusiasm she has for her research were contagious and motivational for me!
I would also like to express my deep and sincere gratitude to Didier Yu Ding, my advisor
at IFP Energies nouvelles. I thank Didier for his strategic insight that made this research
project both interesting and doable. He has put a huge amount of time, energy and
dedication into this research: thank you, Didier!
The jury for my thesis consisted of Christian Igel, Pierre Thore and Jan Dirk Jansen.
I am grateful and honored that they accepted to be part of the jury of my Ph.D.
defense. In particular, I thank Christian Igel and Pierre Thore for taking the time to read
and review the thesis.
I would also like to thank the whole “Simulation of Flows and Transfers in Porous Media”
department at IFP Energies nouvelles for having created such a nice and ideal envi-
ronment. Special thanks to Benoit Noetinger, Mickaele Le Ravalec, Sébastien Da Veiga,
Frédéric Roggero, Sylvie Hoguet, and all the others whose names are not mentioned
here. It was an honor for me to be part of your team.
I would also like to thank the whole TAO team, INRIA Saclay - Île-de-France, for their help,
guidance and enjoyable environment. Special thanks to Nikolaus Hansen,
whose scientific insight and experience were invaluable, for his helpful discussions and in-
sightful comments. I also thank Mohamed, Mouadh, Ilya, Hassen, Raymond, and all the
other TAO members.
Also, many thanks to all the Ph.D. students and friends I had the chance to
know during my stay at IFP Energies nouvelles. I spent three very pleasant years in your
company, and I will surely miss it. In particular, I am thankful to Salem, Samir, Ekaterina,
Franck, François, Romain, Ratiba, Marius, Fakher, Höel, and all the other nice people
from Rueil-Malmaison and Solaize I have known. We will keep in touch and will surely
meet again many times. “It’s A Small World!” Well, the world of Geosciences is even
smaller!
I also thank all the friends I have made, from Gabès to Tunis, then Paris, and now
Pau! They all know who they are.
Finally, I want to thank my family: my mother, my brother, my sister, my brother-in-law
and my nephew.
This thesis is in particular dedicated to my mother: words cannot express how grateful I
am for all of the sacrifices that you have made on my behalf.

Abstract

The amount of hydrocarbon recovered can be considerably increased by finding an optimal
placement of non-conventional wells. For that purpose, the use of optimization algorithms,
in which the objective function is evaluated using a reservoir simulator, is needed. Further-
more, for complex reservoir geologies with high heterogeneities, the optimization problem
requires algorithms able to cope with the non-regularity of the objective function. The
goal of this thesis is to develop an efficient methodology for determining optimal well
locations and trajectories, one that offers the maximum asset value using a technically
feasible number of reservoir simulations.
In this thesis, we show a successful application of the Covariance Matrix Adaptation -
Evolution Strategy (CMA-ES), which is recognized as one of the most powerful derivative-
free optimizers for continuous optimization. Furthermore, in order to reduce the number of
reservoir simulations (objective function evaluations), we design two new algorithms. First,
we propose a new variant of CMA-ES with meta-models, called the new-local-meta-model
CMA-ES (nlmm-CMA), which improves over the existing variant, the local-meta-model
CMA-ES (lmm-CMA), on most benchmark functions, in particular for population
sizes larger than the default one. Then, we propose to exploit the partial separability
of the objective function in the optimization process to define a new algorithm, called the
partially separable local-meta-model CMA-ES (p-sep lmm-CMA), leading to an important
speedup compared to the standard CMA-ES.
In this thesis, we also apply the developed algorithms (nlmm-CMA and p-sep lmm-CMA)
to the well placement problem and show, through several examples, a significant reduction
in the number of reservoir simulations needed to find optimal well configurations. The
proposed approaches prove promising under a restricted budget of
reservoir simulations, which is the context imposed in practice.
Finally, we propose a new approach to handle geological uncertainty for the well placement
optimization problem. The proposed approach uses only one realization, together with
the neighborhood of each well configuration, to estimate its objective function,
instead of using multiple realizations. The approach is illustrated on a synthetic benchmark
reservoir case and is shown to be able to capture the geological uncertainty using a reduced
number of reservoir simulations.

Contents

1 Introduction 1
1.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1.1 Deterministic methods . . . . . . . . . . . . . . . . . . . . 5
1.2.1.2 Stochastic methods . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1.3 Search algorithms using surrogates . . . . . . . . . . . . . . 8
1.2.1.4 Hybrid methods . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.2 Well placement optimization . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Thesis objectives and methodology . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Dissertation road-map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 CMA-ES and CMA-ES with meta-models 14


2.1 Covariance Matrix Adaptation - Evolution Strategy . . . . . . . . . . . . . 14
2.2 Handling constraints with CMA-ES . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 CMA-ES with local meta-models . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 The local-meta-model CMA-ES (lmm-CMA) . . . . . . . . . . . . . 23
2.3.1.1 Locally weighted regression . . . . . . . . . . . . . . . . . . 23
2.3.1.2 Approximate ranking procedure . . . . . . . . . . . . . . . 24
2.3.2 Evaluating lmm-CMA on increasing population size . . . . . . . . . 27
2.3.2.1 Experimental procedure . . . . . . . . . . . . . . . . . . . . 27
2.3.2.2 Performances of lmm-CMA with increasing population size 28
2.3.3 A new variant of lmm-CMA . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.3.1 A new meta-model acceptance criterion . . . . . . . . . . . 28
2.3.3.2 Evaluation of nlmm-CMA . . . . . . . . . . . . . . . . . . . 29
2.3.3.3 Impact of the recombination type . . . . . . . . . . . . . . 30
2.3.3.4 Impact of initial parameters . . . . . . . . . . . . . . . . . 30
2.4 Summary and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


3 Well placement optimization with CMA-ES and CMA-ES with meta-models 35
3.1 The well placement optimization problem formulation . . . . . . . . . . . . 35
3.1.1 Objective function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.2 Well parameterization . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 CMA-ES and a real-coded GA for the well placement problem . . . . . . . 39
3.2.1 Well placement using CMA-ES . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Well placement using GA . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.3 Well placement performance . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Application of CMA-ES with meta-models on the PUNQ-S3 case . . . . . . 45
3.3.1 Placement of one unilateral producer and one unilateral injector . . 46
3.3.2 Placement of one multi-segment producer . . . . . . . . . . . . . . . 50
3.4 Summary and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4 Local-meta-model CMA-ES for partially separable functions 54


4.1 Partial separability and problem modeling . . . . . . . . . . . . . . . . . . . 54
4.2 lmm-CMA for partially separable functions . . . . . . . . . . . . . . . . . . 56
4.3 Evaluation of p-sep lmm-CMA . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 Test functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 Performance of p-sep lmm-CMA . . . . . . . . . . . . . . . . . . . . 59
4.3.3 Optimal bandwidth for building partially separated meta-models . . 63
4.3.4 Computational cost . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 Summary and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5 Partially separated meta-models with CMA-ES for well placement optimization 66
5.1 p-sep lmm-CMA for well placement optimization . . . . . . . . . . . . . . . 66
5.2 Application of p-sep lmm-CMA on the PUNQ-S3 case . . . . . . . . . . . . 68
5.3 Summary and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

6 Well placement optimization under geological uncertainty 75


6.1 Optimization under uncertainty: a literature review . . . . . . . . . . . . . 75
6.1.1 Optimization community . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.1.1 Explicit Averaging . . . . . . . . . . . . . . . . . . . . . . . 77
6.1.1.2 Implicit Averaging . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.2 Petroleum community . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Well placement under uncertainty with CMA-ES using the neighborhood . 80
6.3 Application of CMA-ES using the neighborhood approach on the PUNQ-S3
case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82


6.4 Summary and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

7 Conclusions and perspectives 90


7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2 Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Nomenclature 93

Résumé en Français 95

References 105

List of Figures

1.1 Brent crude oil price (in US dollar), Oct 2007 - Sep 2011 . . . . . . . . . . . 2

2.1 Geometrical representation of a 2-dimensional multivariate normal distribution N(m, C) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Speedup of nlmm-CMA and lmm-CMA on $f_{\text{Schw}}^{1/4}$, $f_{\text{Rosen}}$ and $f_{\text{Rast}}$ for
dimension n = 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.1 An example of a single multilateral well parameterization with two segments ($N_s = 2$) and one branch ($N_b = 1$) . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Elevation (in meters) and geometry of the PUNQ-S3 test case . . . . . . . . 41
3.3 The mean value of NPV (in US dollar) and its corresponding standard
deviation for well placement optimization using CMA-ES and GA. Four-
teen runs are performed for each algorithm. Constraints are handled using
an adaptive penalization with rejection technique for CMA-ES and using
Genocop III for GA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4 The mean number of unfeasible individuals per generation and its corre-
sponding standard deviation using CMA-ES with an adaptive penalization
with rejection technique. Here, we consider only unfeasible individuals far
from the feasible domain, i.e., resampled individuals . . . . . . . . . . . . . 42
3.5 The positions of solution wells found by 14 runs of CMA-ES projected on
the top face of the reservoir . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6 The positions of solution wells found by 14 runs of the GA projected on the
top face of the reservoir . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.7 The mean value of NPV (in US dollar) and its corresponding standard
deviation for well placement optimization using CMA-ES with meta-models
and CMA-ES. Ten runs are performed for each algorithm. Constraints are
handled using an adaptive penalization with rejection technique . . . . . . . 46
3.8 The mean number of reservoir simulations needed to reach a given NPV
value using CMA-ES with meta-models and CMA-ES. Ten runs are per-
formed for each algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47


3.9 The SoPhiH map, with the solution well configuration obtained using CMA-ES
with meta-models and two well configurations proposed by engineers . . . . 48
3.10 Production curves for an optimized solution using CMA-ES with meta-
models and two configurations proposed by engineers . . . . . . . . . . . . . 49
3.11 The mean value of NPV (in US dollar) and its corresponding standard
deviation for well placement optimization using CMA-ES with meta-models
of one multi-segment well. Ten runs are performed . . . . . . . . . . . . . . 51
3.12 The positions of solution multi-segment producers found by 10 runs of CMA-
ES with meta-models. A zoom on the region containing the solution wells
is performed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.13 The positions of solution multi-segment producers found by 10 runs of CMA-
ES with meta-models projected on the top face of the reservoir. A zoom on
the region containing the solution wells is performed . . . . . . . . . . . . . 52

4.1 Average number of evaluations of the p-sep lmm-CMA on $f_{\text{Rosen}}^{100}$ to reach
$f_{\text{stop}}$ for varying population sizes $\lambda = \gamma \times \lambda_{\text{default}}$, and average number of
evaluations per generation of the p-sep lmm-CMA on $f_{\text{Rosen}}^{100}$ for varying
population sizes $\lambda = \gamma \times \lambda_{\text{default}}$ . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Success performance SP1 over the dimension of the problem on $f_{\text{Rosen}}^{\alpha}$, with
$\alpha = 1$, $10^2$ and $10^4$ for dimensions between 4 and 40. The dimension of
the sub-functions $n_M$ equals 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Average speedup with respect to CMA-ES to reach $f_{\text{stop}}$ with a varying
number of points used to build the meta-model, $k_i = \beta \times k_{i,\min}$ where
$k_{i,\min} = \frac{n_i(n_i+3)}{2} + 1$. Each point corresponds to 20 runs performed . . . . 63

5.1 The SoPhiH map with the location of the injector already drilled . . . . . . 69
5.2 The mean value of NPV (in US dollars) for well placement optimization
using CMA-ES with partially separated meta-models denoted p-sep lmm-
CMA, CMA-ES with meta-models denoted lmm-CMA and CMA-ES. Ten
runs are performed for each method . . . . . . . . . . . . . . . . . . . . . . 70
5.3 The SoPhiH map with the location of the injector already drilled, and the
solution producers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72


5.4 The evolution of the well placement optimization process on the PUNQ-
S3 case using CMA-ES with partially separated meta-models, i.e., p-sep
lmm-CMA. The three figures depict one of the ten performed runs of p-sep
lmm-CMA. In (a), the evolution of the best overall NPV value and of the
best NPV obtained at each generation is depicted. In (b), the evolution of the
well trajectory parameters is depicted; each well is plotted using a different
color, and the parameters form three groups. The group of angles
encoding each well is shown in the lower part of the figure (values below
10). The group of well lengths is shown in the intermediate part of the
figure (the three curves with values around 500). The group of Cartesian
coordinates of the wells is shown in the upper part of the figure. In (c), the
evolution of the well trajectory parameters is depicted on a log scale . . . . 74

6.1 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the mean of samples” approach. The
best mean value of the NPV over the 20 possible realizations, i.e., NPVR is
shown. Three runs are performed . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach, for three
independent runs in (a), (b) and (c). The evolutions of the best estimated
objective function, i.e., NPVE are drawn with green lines. The evaluations
on the true objective function over the 20 possible realizations, i.e., NPVR
are depicted with red crosses . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.3 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach for eight
independent runs. (a) shows the evolution of the evaluations on NPVR . (b)
shows the evolution of the best found evaluation on NPVR . The maximum
distance of selection for the neighborhood dmax is equal to 4000 . . . . . . . 84
6.4 The evolution of the well placement optimization process on the PUNQ-
S3 case using CMA-ES with the “using the mean of samples” approach
and the “using the neighborhood” approach. The evolution of the best
found evaluation on NPVR for the “using the neighborhood” approach is
drawn with red lines. The evolution of the best found evaluation on NPVR
for the “using the mean of samples” approach is drawn with blue lines.
Three independent runs are performed for each approach. For the “using
the neighborhood” approach, the maximum distance of selection for the
neighborhood dmax is equal to 4000 . . . . . . . . . . . . . . . . . . . . . . . 86


6.5 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach for four
independent runs. (a) shows the evolution of the evaluations on NPVR . (b)
shows the evolution of the best found evaluation on NPVR . The maximum
distance of selection for the neighborhood dmax is equal to 3000 . . . . . . . 86
6.6 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach for four
independent runs. (a) shows the evolution of the evaluations on NPVR . (b)
shows the evolution of the best found evaluation on NPVR . The maximum
distance of selection for the neighborhood dmax is equal to 6000 . . . . . . . 87
6.7 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using one realization” approach, for three
independent runs in (a), (b) and (c). The evolutions of the best estimated
objective function (equal to an evaluation on a randomly chosen realization)
are drawn with green lines. The evaluations on the true objective function
over the 20 possible realizations, i.e., NPVR are depicted with blue crosses . 88
6.8 The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using one realization” approach. The best
mean value of the NPV over the 20 possible realizations, i.e., NPVR is
shown. Three runs are performed . . . . . . . . . . . . . . . . . . . . . . . . 89

.1 Un exemple de paramétrage d’un puits multilatéral ayant deux segments et une branche . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
.2 Valeur moyenne du NPV (en dollars) et l’écart type correspondant pour un
problème de placement des puits utilisant CMA-ES et AG. Quatorze tests
sont effectués pour chaque algorithme . . . . . . . . . . . . . . . . . . . . . 99
.3 Valeur moyenne du NPV (en dollars) et l’écart type correspondant pour un
problème de placement des puits utilisant CMA-ES avec des méta-modéles
et CMA-ES. Dix tests sont effectués pour chaque algorithme . . . . . . . . . 100
.4 Valeur moyenne du NPV (en dollars) pour un problème de placement des
puits utilisant CMA-ES avec des méta-modéles partiellement séparés, i.e.,
p-sep lmm-CMA, CMA-ES avec des méta-modéles, i.e., lmm-CMA et CMA-
ES. Dix tests sont effectués pour chaque algorithme . . . . . . . . . . . . . . 101
.5 L’évolution du meilleur NPV (en dollars) obtenu pour un problème de place-
ment des puits utilisant CMA-ES avec une approche utilisant la moyenne
des échantillons, et avec l’approche proposée utilisant le voisinage. Trois
tests sont démontrés pour chaque algorithme . . . . . . . . . . . . . . . . . 102

Chapter 1

Introduction

“Drill for oil? You mean drill into the
ground to try and find oil? You’re crazy.” –
this is what the drillers whom Edwin L. Drake1
tried to enlist for his project to drill for oil
in 1859 said.

“If you can draw it (the well), I can drill
it!” – this has become the modern refrain of
the driller.

1.1 Problem statement

The state of the art in reservoir management has recently been greatly influenced by
technology. Nowadays, drilling technologies have made great strides with the advances
achieved in directional drilling capabilities. Hence, reservoir engineers can take advantage
of different well architectures, such as vertical, horizontal and more complex
configurations, to enhance reservoir productivity, especially given the present price of oil
which, although it has continued to fluctuate in recent years, remains above US$40/barrel
(Fig. 1.1).
Environments, work areas and conditions in which oil and gas fields are now being
discovered are much more complex and challenging. The existing fields are becoming
more depleted and, therefore, more marginal. Unless there is a way to optimize
their productivity and to take corrective actions, it would be hard to justify continuing to
1 Edwin L. Drake (1819 - 1880) was an American oil driller, popularly credited with being the first to
drill for oil in the United States.


Figure 1.1: Brent crude oil price (in US dollar), Oct 2007 - Sep 2011. Reprinted from
Index Mundi website, November 9, 2011.
[ http://www.indexmundi.com/commodities/?commodity=crude-oil-brent&months=60 ]

invest in producing these existing fields, for economic reasons [14]. On the other hand,
new discoveries also need an optimal production scheme to be economically viable.
One of the most important issues that must be addressed to maximize a given project’s
asset value is to optimally decide where to drill wells. A well placement decision affects
the hydrocarbon recovery and thus the asset value of a project. In general, such a decision
is difficult to make since an optimal placement depends on a large number of parameters
such as reservoir heterogeneities, faults and fluids in place. Moreover, dealing with complex
well configurations, e.g., non-conventional wells, implies additional challenges such as the
concentration of investment and the difficulty of well intervention.1
The current approach, mostly used in the industry, is based on the so-called professional
judgment of reservoir engineers (which requires understanding the impact of the different
influencing engineering and geological parameters), confirmed by a number
of reservoir simulation trials. However, reservoir performance is influenced by non-
linearly correlated parameters, which may also evolve with time. Hence, the professional
judgment approach in general fails to predict the best well configurations.
Recently, many efforts have been made to formulate the well placement decision as an opti-
mization problem: the objective function, which is evaluated using a reservoir
simulator, measures the economics of the project; the parameters sought encode the posi-
tions of the different wells (including locations and trajectories). We define the location
of a given well as the position of the starting point of the well, and the trajectory
of a given well as the positions of the mainbore and the laterals (if any). If the number of
wells to be placed and their types (injector or producer) are fixed, the parameters encoding
1 Drilling a well costs in general from US$1 million to US$30 million.


the well positions are real numbers and the objective function f maps a subset of $\mathbb{R}^n$
to $\mathbb{R}$, where n, the number of parameters, equals the sum of the numbers of parameters
needed to encode the position of each well to be placed. Formally, we want to find a vector
of parameters $p_{\max} \in \mathbb{R}^n$ such that:

$$ f(p_{\max}) = \max_{p} \{ f(p) \} \,, \qquad (1.1) $$

where p denotes the vector of parameters to be optimized, encoding the positions and
trajectories of the well configuration. The vector $p_{\max}$ must be found using a technically
feasible number of reservoir simulations.
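For concreteness, the sketch below shows one possible flattening of a two-well configuration into the parameter vector p of Eq. (1.1). The heel/toe encoding with six numbers per well is a simplifying assumption made for this illustration only; the parameterization actually used in this thesis is defined later, in Section 3.1.2.

```python
import numpy as np

# Illustrative only: each unilateral well is assumed to be described by the
# coordinates of its heel and toe; the thesis' own parameterization differs.
producer = {"heel": (1500.0, 2200.0, -2400.0), "toe": (1900.0, 2300.0, -2450.0)}
injector = {"heel": (3100.0, 1200.0, -2380.0), "toe": (3400.0, 1500.0, -2420.0)}

def encode(wells):
    """Concatenate the positions of all wells into a single vector p of R^n."""
    return np.concatenate([np.asarray(w["heel"] + w["toe"]) for w in wells])

p = encode([producer, injector])
print(p.shape)  # (12,) -> n = 12 parameters for this two-well example
```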
The well placement optimization problem is challenging as:

• The objective function, e.g., the net present value (NPV), is difficult to optimize. In
particular, it is multi-modal, i.e., it has multiple local optima, and it is non-convex and
non-smooth. An illustration can be found in [103], where the NPV of a single vertical
well placement is sampled to construct the objective function surface. The surface
is shown to be highly non-smooth and to contain several local optima. In this
illustration, the problem dimension equals two, and it was thus possible to
sample all the points from a fine grid spanning the search space regularly. However,
this becomes impossible for problem dimensions larger than 3, as the number of
points needed to keep a fine discretization would grow exponentially with the search
space dimension (this is referred to as the curse of dimensionality), rendering the
search task difficult.

• The problem is costly: a single function evaluation requires one reservoir simulation,
which is often very demanding in CPU time (several minutes to several hours). The
affordable number of reservoir simulations is therefore often restricted.

• The problem in general involves optimizing under geological uncertainty: the prob-
lem assumes that we have already defined one (or a number of) realistic geological
model(s). Each model is obtained using history matching, which consists in adjusting
the reservoir model until it closely reproduces the past behavior of the
reservoir (historical production and pressures). However, the history matching problem
is mathematically ill-posed, with non-unique solutions, i.e., several possible (gen-
erally equiprobable) geological models. Thus, taking into account several geological
models introduces the problem of handling geological uncertainty, which adds an-
other challenge to the optimization of the objective function; in particular, it leads
to a large increase in the number of reservoir simulations performed. In the context
of geological uncertainty, which will be addressed in Chapter 6, we will denote by


f the objective function to optimize, and let us consider a number $N_r$ of geological
realizations denoted by $(R_i)_{i=1,\cdots,N_r}$. We denote by $f(p, R_i)$ the objective function
value of the well configuration p on the realization $R_i$. Thus, we want to find a
vector of parameters $p_{\max,R} \in \mathbb{R}^n$ such that:

$$ f_R(p_{\max,R}) = \max_{p} f_R(p) \,, \qquad (1.2) $$

where $f_R$ is in general the average of the objective function evaluations of the
well configuration p over all the realizations (a direct implementation of this estimator
is sketched right after this list):

$$ f_R(p) = \frac{1}{N_r} \sum_{i=1}^{N_r} f(p, R_i) \,. \qquad (1.3) $$
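The estimator $f_R$ of Eq. (1.3) is straightforward to implement, but note its cost: each call performs $N_r$ reservoir simulations for a single well configuration p. A minimal sketch, assuming f wraps one reservoir simulation per call:

```python
import numpy as np

def f_R(p, realizations, f):
    """Average of f(p, R_i) over all N_r realizations, as in Eq. (1.3).

    Costs N_r reservoir simulations per candidate p, which is what motivates
    the cheaper uncertainty-handling approach proposed in Chapter 6.
    """
    return np.mean([f(p, R) for R in realizations])
```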

Furthermore, constraints are imposed to guarantee the physical feasibility of the so-
lution wells, and thus to avoid very long wells or wells that violate common engineering
practices (e.g., wells outside the reservoir). Therefore, a constrained optimization problem
needs to be handled. Formally, when dealing with constraints, we want to find a vector of
parameters $p_{\max} \in \mathbb{R}^n$ such that:

$$ \begin{cases} f(p_{\max}) = \max_{p} f(p) \\ \text{s.t.} \quad h_i(p) \le d_i \quad \forall i = 1, \cdots, m \end{cases} \qquad (1.4) $$

where m is the number of constraints, the $d_i$ are real numbers and $h_i : \mathbb{R}^n \to \mathbb{R}$ are the
constraint functions that need to be satisfied.
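A common baseline for handling such constraints with an unconstrained optimizer is a static penalty that degrades the objective value of infeasible candidates. The sketch below only illustrates this generic idea; it is not the adaptive penalization with rejection technique proposed later in this thesis (Section 2.2), and the penalty weight is an arbitrary illustrative choice.

```python
def penalized(p, f, constraints, weight=1e6):
    """Generic static penalty for the maximization problem of Eq. (1.4).

    constraints is a list of (h_i, d_i) pairs; each violated h_i(p) <= d_i
    lowers the objective. This baseline is NOT the adaptive penalization
    with rejection used in the thesis.
    """
    violation = sum(max(0.0, h(p) - d) for h, d in constraints)
    return f(p) - weight * violation  # maximization: violations reduce NPV
```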
The main objective of this thesis is to propose a procedure for solving the well place-
ment optimization problem, in particular the problem of optimizing well locations and
trajectories. The proposed procedure must offer the maximum asset value using a technically
feasible number of reservoir simulations. This implies addressing the challenges explained
above, namely:

(I) The non-smoothness, the multi-modality, the non-convexity and the high dimen-
sionality of the objective function;

(II) The expensive cost of the objective function;

(III) The geological uncertainty handling problem.

In this thesis, we will consider the well placement optimization problem as a black-box
optimization (also known as derivative-free optimization) problem. Black-box opti-
mization means that only the inputs and outputs of the objective function are observed,
not its internal operations and processes. The black-box setting is natural in our

context, since an objective function evaluation involves a reservoir simulation, which
generally corresponds to commercial software whose internal structure and code are
often unavailable.
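In practice, the black-box objective is a thin wrapper around the simulator call. In the sketch below, the simulator executable name and the input/output formats are hypothetical placeholders; only the input-output behavior is exposed to the optimizer, which is exactly the black-box setting described above.

```python
import subprocess

def write_simulation_deck(path, p):
    # Placeholder: serialize the well parameters p into the simulator's
    # input format. Entirely simulator-specific in reality.
    with open(path, "w") as fh:
        fh.write(" ".join(map(str, p)))

def parse_npv(path):
    # Placeholder: extract the economic indicator from the simulator output.
    with open(path) as fh:
        return float(fh.read())

def npv(p):
    """Black-box objective: one call = one (expensive) reservoir simulation.

    'reservoir_simulator' is a hypothetical external executable; its
    internals are never accessed, only its inputs and outputs.
    """
    write_simulation_deck("case.data", p)
    subprocess.run(["reservoir_simulator", "case.data"], check=True)  # minutes to hours
    return parse_npv("case.results")
```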
We now review the critical points of current knowledge and methodological approaches
related to the well placement optimization.

1.2 Literature review

Many optimization algorithms exist to address the continuous optimization problem formu-
lated in Eq. (1.1). In this section, we give a survey of existing continuous optimization
algorithms. Only some of these algorithms will be detailed, depending on their importance
for this thesis; other algorithms will be briefly mentioned, with their corresponding refer-
ences for more details. Then, a survey of studies describing existing approaches to
the well placement optimization problem is given. A detailed literature review of
well placement optimization under geological uncertainty, as formulated in Eq. (1.2), will be
provided in Chapter 6.

1.2.1 Optimization algorithms

Optimization algorithms for non-linear continuous optimization can be classified according
to the method they use to explore the search space. In the following, we enumerate a
number of selected representative algorithms, divided into four categories: deterministic
algorithms, stochastic algorithms, search algorithms using surrogates, and hybrid algo-
rithms.

1.2.1.1 Deterministic methods

Deterministic algorithms include descent methods, which use the explicit value of the gra-
dient or of higher-order derivatives of the objective function. If this information is not avail-
able, i.e., in the case of black-box optimization, it can be approximated. Other deterministic
optimization techniques include trust region methods (e.g., [107]), direct pattern search
methods [80] and simplex methods [101]. A major drawback of deterministic optimization
methods is that they can easily get stuck in a local optimum.

• Descent methods: Descent methods are iterative methods that use the gradient
of the objective function f to search for a minimum. Given the current point $x_k$
at iteration k, a new point is calculated as follows:

$$ x_{k+1} = x_k + \alpha_k p_k \,, \qquad (1.5) $$


where $p_k$ is the search direction at iteration k and $\alpha_k$ denotes the step width. The
optimization process continues until a convergence criterion is reached. The search
direction can be calculated using a linear (first-order) approximation of the target
function, i.e., $p_k = -\nabla f(x_k)$; in this case, the method is called the steepest de-
scent method. A second-order approach uses a quadratic approximation and leads
to methods referred to as Newton methods. Quasi-Newton methods are based on
Newton methods, but without computing the Hessian matrix; in this case, the search
direction is $p_k = -\tilde{H}_k^{-1} \nabla f(x_k)$, where $\tilde{H}_k$ is an approximation of the Hessian matrix
at the current solution. The most popular quasi-Newton method is the Broyden-
Fletcher-Goldfarb-Shanno (BFGS) algorithm [31, 53, 61, 118].

If no explicit formula for the objective function is available, derivatives are in general
approximated using methods such as finite differences (a minimal sketch is given
after this list). Another way to compute the gradients is to use adjoint methods. In
contrast to finite difference methods, where the number of objective function
evaluations required to estimate the gradients grows linearly with the number of
parameters of the problem, adjoint methods provide the gradients in a fraction of the
computational time of an objective function evaluation. However, implementing adjoint
methods requires a deep understanding of the simulation code (corresponding to the
objective function evaluation), which is usually not trivial for real-world problems. It also
requires access to the simulation code, which is usually not available for real-
world problems. Adjoint methods are widely used in aerodynamics [81]. In the oil
and gas industry, it is still difficult to apply adjoint approaches, although
some research has already been performed, in particular in the reservoir simulation
community [89].

• Trust region methods: Trust region methods, also called quadratic approximation
methods, rely on an approximation of the objective function f by a quadratic
function, which is supposed to be a reasonable approximation of f in a neighborhood
of the current estimate. This neighborhood is called the trust region. A state-
of-the-art trust region method is the NEW Unconstrained Optimization Algorithm
(NEWUOA) [107], which is a derivative-free optimization method. At each iteration,
NEWUOA creates a quadratic model that interpolates the objective function f at m
points (usually m = 2n + 1, where n is the number of parameters to be optimized).
The quadratic model is then minimized inside the trust region to propose the next
iterate. A more detailed presentation of trust region methods can be found in [91].
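To make the descent template of Eq. (1.5) concrete in the black-box setting, the sketch below implements steepest descent with a central finite-difference gradient, as discussed above. The fixed step width and the toy quadratic test function are illustrative assumptions; note the 2n objective evaluations consumed per gradient, which is what makes this family of methods expensive when each evaluation is a reservoir simulation.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Central finite differences: costs 2n evaluations of f per gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def steepest_descent(f, x0, alpha=0.1, iters=100):
    """x_{k+1} = x_k + alpha_k * p_k with p_k = -grad f(x_k), fixed step width."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * fd_gradient(f, x)
    return x

# Example on a smooth convex function (a descent method is at home here, but
# it may stall in a local optimum on a rugged, multi-modal NPV landscape):
print(steepest_descent(lambda x: np.sum((x - 1.0) ** 2), np.zeros(3)))
```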


1.2.1.2 Stochastic methods

Stochastic methods have been employed to mitigate the shortcomings of deterministic
methods on functions that are difficult to solve (e.g., non-smooth and multi-modal). In
particular, stochastic optimization algorithms aim at being more robust when dealing with
multi-modal objective functions. They include simulated annealing (SA) [88, 124],
particle swarm optimization (PSO) [86], the simultaneous perturbation stochastic
approximation (SPSA) [119] and evolutionary algorithms (EA). EAs, which have received
increasing interest, have mainly three origins: genetic algorithms (GA) [78, 79], evolutionary
programming (EP) [56] and evolution strategies (ES) [108, 117].

• Evolutionary algorithms (EA): An overview of evolutionary algorithms is pre-
sented in [15]. EAs are stochastic optimization algorithms inspired by biological
evolution. Starting from an initial population of points called individuals, at
each iteration the candidate solutions evolve by selection, mutation and recombination,
until a stopping criterion is reached with a satisfactory solution. This process is common
to the three origins of EAs, i.e., GA, EP and ES. Only two of them are detailed
in this section: genetic algorithms and evolution strategies.
Genetic algorithms (GA) [78, 79] are stochastic search algorithms designed ini-
tially to deal with binary-encoded individuals. For continuous optimization, problem
variables can either be mapped to binary strings, or other encodings can be adopted,
such as real encoding. However, representing real vectors as bit strings leads to poor
performance [122].
Evolution strategies (ES) [108, 117]: besides the common principles shared with
other EAs, i.e., mutation, recombination and selection, ESs sample new individuals
during the optimization process according to a multivariate normal distribution, and
use a self-learning mechanism, called adaptive search, to adapt the parameters of this
distribution. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [74] is
the state-of-the-art evolution strategy, in which the multivariate normal distribution has
a mean and a covariance matrix that are continually updated during the optimization
process. Intensive benchmarking of several derivative-free algorithms has established
that CMA-ES is one of the most efficient methods for dealing with difficult numerical
optimization problems [70]. CMA-ES has also been applied to real-world problems
[18, 42, 92, 94]. More details about CMA-ES are provided in Chapter 2.

• Simulated annealing (SA) [88, 124]: The name and the inspiration of simulated
annealing come from annealing in metallurgy, a technique involving heating and
controlled cooling of a material to increase the size of its crystals and reduce their
defects. The algorithm avoids getting trapped in local optima by allowing moves
that may lead to a deterioration of the objective function value. The SA algorithm
is outlined as follows (a minimal sketch is given after this list). Given a candidate
solution s, a random neighbor solution s′ is accepted1 if (1) s′ is better than s with
respect to the objective function, or (2) with a probability that depends on the
change in the corresponding objective function values and on a control parameter T,
called the temperature. Otherwise, if none of the above conditions is met, the
current solution remains unchanged. The parameter T is gradually decreased to zero
in the course of the optimization according to a deterministic “cooling schedule”.
The performance of the simulated annealing algorithm is very sensitive to the choice
of the cooling schedule.

• Particle swarm optimization (PSO) [86]: PSO is an iterative population-based
algorithm, inspired by the movement of swarms of birds or insects searching for food
or protection. Each particle’s movement is influenced by its own experience (the best
position it has found) and by the experience of the others (the best position found by
all the particles). Based on these best found positions, the positions of the members
of the swarm and their velocities are adjusted. The performance of PSO is not
invariant with respect to rotations of the coordinate system; i.e., the performance of
PSO on non-separable, ill-conditioned functions declines dramatically with increasing
condition numbers [75].

• Simultaneous perturbation stochastic approximation (SPSA) [119]: SPSA is
a stochastic gradient approximation method in which, at each iteration, the parameters
are randomly perturbed and the objective function is evaluated at the perturbed
points to estimate the gradient.
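The acceptance rule and cooling schedule described for SA above fit in a few lines. The following is a minimal, self-contained sketch (minimization, geometric cooling); the neighbor move, initial temperature and cooling factor are illustrative choices, not prescriptions from the references cited above.

```python
import math
import random

def simulated_annealing(f, s0, neighbor, T0=1.0, cooling=0.99, iters=10000):
    """Minimal SA sketch: Metropolis acceptance with a geometric cooling schedule."""
    s, T = s0, T0
    for _ in range(iters):
        s_new = neighbor(s)
        delta = f(s_new) - f(s)
        # Accept improvements, and deteriorations with probability exp(-delta/T).
        if delta <= 0 or random.random() < math.exp(-delta / T):
            s = s_new
        T *= cooling  # deterministic cooling schedule, T -> 0
    return s

# Example: minimize a 1-D multi-modal function.
best = simulated_annealing(
    f=lambda x: x * x + 10 * math.sin(x),
    s0=5.0,
    neighbor=lambda x: x + random.gauss(0.0, 0.5),
)
```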

1.2.1.3 Search algorithms using surrogates

Search algorithms using surrogates, called proxy-modeling or meta-modeling in the lit-
erature, are based on approximating the objective function by an approximate model
(also called a surrogate, proxy-model or meta-model). In the context of costly objective
functions, a surrogate can be considered as a computationally cheaper replacement of the
objective function. Thus, during the optimization process, the surrogate is constructed and
objective function evaluations are replaced by evaluations on the surrogate [20, 83].
Search algorithms using surrogates need to consider the so-called exploration-exploitation
trade-off [58]: evaluating more (respectively, fewer) candidate solutions using the “true”
objective function implies a better (respectively, worse) quality of the surrogate but, on the
other hand, a higher (respectively, reduced) computational cost of the optimization.
1 If a candidate solution is accepted, it replaces the current solution.


The most popular surrogate models include polynomial response surfaces, kriging
[90, 48], support vector machines [40] and artificial neural networks [110].
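As a simple illustration of the idea, the sketch below fits a global full quadratic surrogate by least squares and returns a cheap predictor that can stand in for f. This toy model is only a stand-in for the surrogate families cited above; the meta-models used later in this thesis (lmm-CMA, Section 2.3) are locally weighted quadratics, not a single global fit.

```python
import numpy as np

def quad_features(X):
    """Map points to full quadratic monomials: 1, x_i, x_i * x_j (i <= j)."""
    X = np.atleast_2d(X)
    cols = [np.ones(len(X))]
    n = X.shape[1]
    for i in range(n):
        cols.append(X[:, i])
        for j in range(i, n):
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)

def fit_quadratic_surrogate(X, y):
    """Least-squares quadratic surrogate over evaluated points (X, y).

    Needs at least (n+1)(n+2)/2 points for the fit to be determined; the
    returned callable is a computationally cheap replacement for f.
    """
    coef, *_ = np.linalg.lstsq(quad_features(X), np.asarray(y), rcond=None)
    return lambda x: quad_features(x) @ coef
```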

1.2.1.4 Hybrid methods

Several algorithms (two or more) from different classes can be combined to form
so-called hybrid methods. Hybridization aims at producing an algorithm that retains
the positive features of the combined algorithms. Several hybridizations have been
proposed in the literature to tackle specific applications. For instance, a review of
hybridizations of genetic algorithms can be found in [46], and a review of hybridizations
of particle swarm optimization in [123].

1.2.2 Well placement optimization

Well placement optimization is a recent area of research that is attracting growing interest.
Different methodologies have been used in the literature to tackle the well placement
problem.
On the one hand, approaches based on stochastic search algorithms have been used;
these require minimal assumptions on the problem and are thus more robust than
deterministic methods when dealing with rugged problems such as the well placement
problem. Simulated annealing (SA) was used in [19] for well placement and scheduling,
and in [85] for well placement. Particle swarm optimization (PSO) was applied in [103] for
the determination of optimal well type and position. Genetic algorithms (GA) were applied
in [98, 47, 99, 33]. The simultaneous perturbation stochastic approximation (SPSA) was
used in [16, 17]. In particular, in [17], a comparison between three optimization algorithms
is performed: the SPSA algorithm, very fast simulated annealing (VFSA) and the finite
difference gradient (FDG) method.
On the other hand, deterministic optimization methods have also been used. Descent
algorithms were used mostly, in which adjoint methods compute the gradients
[67, 114, 57, 125, 130]. Using descent methods implies that the underlying model of the
function needs to be smooth enough. In [67], the adjoint method is used to place an injec-
tor in a 2D oil-water reservoir with 4 producers already fixed in each of the four corner
grid blocks. Results show that the algorithm, as expected due to its deterministic nature,
converges to a different local optimum for every initial well location. In [114], the wells
are defined by continuous variables and the adjoint method is tested on a few synthetic
waterflood optimization problems.
Search algorithms using surrogates, or proxy-modeling, have also been used in the literature.
In proxy-modeling, the true objective function is replaced by a proxy-model, and different
optimization techniques are applied to the proxy. Proxy-models include least squares and


kriging [105], radial basis functions [50], quality maps [41, 100], and multiple regression
techniques (including kriging) [1]. Although proxy-modeling is an efficient way to obtain
an approach with a reduced number of reservoir simulations, its application is not
recommended when the complexity of the solution space increases [132].
Stochastic algorithms have been combined with surrogate-based and deterministic
approaches to form hybrid algorithms: GA with a polytope algorithm and
kriging [63, 64]; GA with a polytope algorithm, kriging and neural networks [65]; GA with
neural networks, a hill climber and a near-well upscaling technique [129]. Results show
that a hybrid stochastic algorithm converges in general to a reasonable solution with a
reduced number of evaluations compared to a pure stochastic algorithm. The approaches
in [63, 64, 65, 129] build a proxy-model at each iteration, determine its maximum and
include the location of this maximum in the population (replacing the worst individual) if
it is better than the best individual of the current population. In [10], a GA is defined in
which, at each iteration, only a predefined percentage of the individuals, chosen according
to a set of scenario attributes, is simulated. The objective function of the non-simulated
points is estimated using a statistical proxy based on cluster analysis.

1.3 Thesis objectives and methodology

In this thesis, the objective is to address the previously mentioned challenges (I), (II) and
(III) of Section 1.1, namely:

(I) The non-smoothness, the multi-modality, the non-convexity and the high dimen-
sionality of the objective function;

(II) The expensive cost of the objective function;

(III) The geological uncertainty handling problem.

Considering the state of the art in optimization, the choice of the CMA-ES algorithm
[74] seems a priori natural to address problem (I). Indeed, CMA-ES is recognized as one
of the most powerful derivative-free optimizers for continuous optimization [70]. CMA-ES
is both a fast and robust local search algorithm, exhibiting linear convergence on wide
classes of functions, and a global search algorithm when restarts and increased population
sizes are used. CMA-ES, in contrast to most other evolutionary algorithms, is a quasi-
parameter-free algorithm.1
In the petroleum industry, to the best of our knowledge, CMA-ES has been applied in
only two studies previous to this work: a characterization of fracture conductivities
1 Only the population size is suggested to be adjusted by the user, in order to account for the ruggedness
of the objective function landscape.


from well test inversion [32], and a well placement optimization, but with respect to simple
attributes (e.g., productivity indexes) [43]. A more recent application to well placement
optimization is presented in [116, 115].
To tackle problem (II), we propose to investigate coupling the CMA-ES optimizer with
surrogates (or meta-models). In this context, we aim at defining an efficient variant of
CMA-ES coupled with meta-models, able to significantly reduce the number of reservoir
simulations. Furthermore, we aim at exploiting knowledge about the optimization
problem, in particular the so-called partial separability of the objective function, in order
to further reduce the number of reservoir simulations.
Finally, to tackle problem (III), we aim at defining an approach (for CMA-ES) able
to capture the geological uncertainty with a significantly reduced cost in reservoir simu-
lations. In this context, we aim at defining an approach that performs a small number
of reservoir simulations (typically one) for each well configuration, instead of performing
reservoir simulations on all possible geological realizations.

1.4 Summary of contributions

The following presents a summary of the contributions of this thesis.

We have tackled problem (I), related to the non-smoothness, the multi-modality, the
non-convexity and the high dimensionality of the objective function in the well placement
problem, and we have shown:

A first successful application of CMA-ES to the well placement problem. (Re-
sults published in [26, 24].) We propose a new methodology for well location and
trajectory optimization based on the population-based stochastic search algorithm called
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [74]. We propose a new
adaptive penalization with rejection technique to handle constraints. Because ge-
netic algorithms are quite often the method of choice in the petroleum industry, we show the
improvement of applying CMA-ES over a GA on the synthetic benchmark reservoir case
PUNQ-S3 [54]. To allow a fair comparison, both algorithms are used without parameter
tuning on the problem: standard settings are used for the GA and default settings for
CMA-ES. It is shown that our new approach outperforms the genetic algorithm: it leads
in general both to a higher net present value and to a significant reduction in the number of
reservoir simulations needed to reach a good well configuration.


After this application of CMA-ES to the well placement problem, we have tackled the
problem (II), related to the expensive cost of the objective function, and we have proposed
two new algorithms:

A new variant of CMA-ES with local meta-models. (Results published in [22].)
The local-meta-model CMA-ES (lmm-CMA) [87], which couples local quadratic meta-models
with the Covariance Matrix Adaptation Evolution Strategy, is investigated. The scaling
of the algorithm with respect to the population size is analyzed, and limitations of the
approach for population sizes larger than the default one are shown. A new variant for
deciding when the meta-model is accepted is proposed, called the new-local-meta-model
CMA-ES (nlmm-CMA).

A new variant of CMA-ES with local meta-models for partially separable func-
tions. (Results published in [23].) We propose a new variant of the covariance matrix
adaptation evolution strategy with local meta-models for optimizing partially separable
functions, called the partially separable local-meta-model CMA-ES (p-sep lmm-CMA).
We propose to exploit partial separability by building, at each iteration, a meta-model for
each element function (or sub-function) using a full quadratic local model. Our results
demonstrate that exploiting partial separability leads to an important speedup compared
to the standard CMA-ES. We show on the tested functions that the speedup increases
with increasing dimension for a fixed dimension of the element functions. On the stan-
dard Rosenbrock function, the maximum speedup of λ, where λ is the population size, is
reached in dimension 40 using element functions of dimension 2. We also show that higher
speedups can be achieved by increasing the population size.

We have then applied the two newly proposed algorithms to the well placement problem
to achieve:

A significant reduction of the number of reservoir simulations for the well
placement problem. (Results published in [26, 24, 25].) We propose to apply
CMA-ES with local meta-models (nlmm-CMA) to the well placement problem, where
for each well configuration in the population an approximate convex quadratic model is
built using true objective function evaluations collected during the optimization process.
Coupling CMA-ES with a meta-model leads to a significant improvement, around 20%
on the synthetic benchmark reservoir case PUNQ-S3.
Moreover, we also propose to apply p-sep lmm-CMA to the well placement problem,
building partially separated meta-models for each well or set of wells, which results in a
more accurate modeling. Results show that taking advantage of the partial separability of
the objective function leads to a significant decrease in the number of reservoir simulations
needed to find the “optimal” well configuration, given a restricted budget of reservoir
simulations.

We have also tackled problem (III), related to the handling of geological uncertainty,
and we have proposed:

A new approach to handle geological uncertainty for the well placement prob-
lem. We propose a new approach to handle geological uncertainty for the well placement
problem with a reduced number of reservoir simulations. We propose to use only one re-
alization, together with the neighborhood of each well configuration, to estimate
its objective function, instead of using multiple realizations. The approach is applied to
the synthetic benchmark reservoir case PUNQ-S3 and shown to be able to capture the
geological uncertainty using a reduced number of reservoir simulations.

1.5 Dissertation road-map

This thesis is structured as follows. Chapter 2 gives a “theoretical” overview of the opti-
mization method used in this thesis: the Covariance Matrix Adaptation Evolution Strategy
(CMA-ES). An adaptive penalization technique to handle the optimization constraints is
also introduced, and a combination of CMA-ES with meta-models is investigated, leading
to a new variant of CMA-ES with local meta-models, called the new-local-meta-model
CMA-ES (nlmm-CMA).
In Chapter 3, the CMA-ES optimizer is applied to the well placement problem. The
improvement of applying CMA-ES over a GA on a synthetic benchmark reservoir case is
shown. In addition, the contribution of CMA-ES with meta-models to reducing the
number of reservoir simulations is demonstrated on a number of examples.
In Chapter 4, we propose a new variant of CMA-ES with local meta-models for optimiz-
ing partially separable functions, called the partially separable local-meta-model CMA-ES
(p-sep lmm-CMA).
In Chapter 5, the resulting approach (p-sep lmm-CMA) is applied to the well placement
problem.
Finally, in Chapter 6, the problem of dealing with uncertainty in well placement is
tackled. A new approach using the neighborhood of each well configuration is proposed
and demonstrated on a synthetic benchmark reservoir case.
The thesis closes with the conclusions and a number of suggestions for future work.

Chapter 2

CMA-ES and CMA-ES with


meta-models

This chapter is based on the paper [22]. It gives a detailed overview of the optimization
methods applied in Chapter 3 to the well placement problem. We present the CMA-ES
algorithm, a constraint handling technique needed for well placement, and a new surrogate
approach that couples CMA-ES with meta-models. This latter approach mitigates some
defects of the local-meta-model CMA-ES (lmm-CMA). The different methodologies defined
are tested and validated on mathematical test functions.
This chapter is structured as follows. Section 2.1 gives an overview of the optimization
algorithm CMA-ES. In Section 2.2, we propose an adaptive penalization with rejection
technique to handle optimization constraints. Finally, in Section 2.3, the reduction
of the number of evaluations is addressed by coupling CMA-ES with meta-models.
In the following, we denote the objective function to be optimized by $f : \mathbb{R}^n \to \mathbb{R}$.

2.1 Covariance Matrix Adaptation - Evolution Strategy

The Covariance Matrix Adaptation - Evolution Strategy (CMA-ES) [74, 71] is an iterative
stochastic optimization algorithm in which, at each iteration, a population of candidate solu-
tions is sampled. In contrast to the classical presentation of population-based stochastic
search algorithms (like genetic algorithms [78, 79]), where the different steps of the algo-
rithm are described in terms of operators acting on the population (crossover, mutation),
the natural algorithm template for CMA-ES describes the evolution of the probability
distribution used to sample points at each iteration. Indeed, the algorithm loops over the
following steps:

1. sample a population of λ candidate solutions (points of $\mathbb{R}^n$)

2. evaluate the λ candidate solutions on f


3. adapt the sampling distribution (using the feedback from f obtained at step 2.)

We see that this general template depends on a probability distribution (the sampling distri-
bution) and on the update of this probability distribution. The sampling distribution in
CMA-ES is a multivariate normal distribution. In the next paragraphs, we give more
insight into multivariate normal distributions and their geometrical interpretation, and then
explain how the update of the distribution is performed at each iteration within CMA-ES.

Multivariate normal distributions A random vector of Rn distributed according to a
multivariate normal distribution is usually denoted by N(m, C) where m is a vector of Rn
and C an n × n symmetric positive definite matrix corresponding to the covariance matrix
of the random vector. The set of parameters (m, C) entirely determines the random vector.
Fig. 2.1 gives the geometric interpretation of a random vector N(m, C) in two dimensions.
We visualize that m is the symmetry center of the distribution and that isodensity lines
are ellipsoids centered at m, with main axes carried by the eigenvectors of C and lengths
determined by the square roots of the eigenvalues of C. Fig. 2.1 also depicts points
sampled according to a multivariate normal distribution. As expected, the spread of the
points follows the isodensity lines. A useful relation is m + N(0, C) = N(m, C), which
interprets m as a displacement from the origin 0.
In CMA-ES, the mean vector represents the favorite solution or best estimate of the
optimum, and the covariance matrix C characterizing the geometric shape of the distri-
bution defines where new solutions are sampled. Furthermore, an additional parameter is
added, which is the step-size σ used as a global scaling factor for the covariance matrix.
Overall, in step 1. for CMA-ES, points are sampled according to:

m + σN(0, C) . (2.1)

The adaptation of m targets to find the best estimate of the optimum, the adaptation of
C aims at learning the right coordinate system of the problem (rotation and scaling of the
main axes) and the adaptation of σ aims at achieving fast convergence to an optimum and
preventing premature convergence. We will now describe how the distribution is updated,
that is how the parameters m, σ and C are updated in step 3. of the template.
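As a concrete illustration of the sampling step of Eq. (2.1), the short Python/NumPy sketch below (our own didactic code, not the implementation used in this thesis) draws a population from m + σN(0, C) via the eigendecomposition of C, mirroring the geometry of Fig. 2.1: standard normal vectors are stretched by the square roots of the eigenvalues of C along its eigenvectors, then shifted by m.

```python
import numpy as np

def sample_population(m, sigma, C, lam, rng=None):
    """Draw lam candidate solutions from m + sigma * N(0, C) (Eq. (2.1))."""
    rng = rng or np.random.default_rng()
    eigvals, B = np.linalg.eigh(C)           # C = B diag(eigvals) B^T
    z = rng.standard_normal((lam, len(m)))   # lam independent N(0, I) vectors
    # scale along the eigenvectors of C, then translate by the mean m
    return m + sigma * (z * np.sqrt(eigvals)) @ B.T

# Distribution of Fig. 2.1: m = (2, 2)^T, eigenvectors (1, 1)/sqrt(2) and
# (-1, 1)/sqrt(2) with eigenvalues 16 and 1.
B = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
C = B @ np.diag([16.0, 1.0]) @ B.T
X = sample_population(np.array([2.0, 2.0]), 1.0, C, lam=100)
```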

Update of mean vector, covariance matrix and step-size We adopt here some
time-dependent notations. The iteration index is denoted g. Let (m(g) , g ∈ N) be the
sequence of mean vectors of the multivariate normal distribution generated by CMA-
ES and let (σ (g) , g ∈ N) and (C(g) , g ∈ N) be respectively the sequences of step-sizes and



Figure 2.1: Geometrical representation of a 2-dimensional multivariate normal distribution
N(m, C) where $m = (2, 2)^T$ and the covariance matrix C admits $\frac{1}{\sqrt{2}}(1, 1)^T$ and $\frac{1}{\sqrt{2}}(-1, 1)^T$ as
normalized eigenvectors with respective eigenvalues 16 and 1. Depicted on each plot is the
mean vector m and the ellipsoid isodensity lines defined as $(x-m)^T C^{-1} (x-m) = c$ where
the constant c equals 1 (inner line) and 3 (outer line). The main axes of the (isodensity)
ellipsoid are carried by eigenvectors of C. The half lengths of the axes of the unit isodensity
line ($(x - m)^T C^{-1} (x - m) = 1$) are the square roots of the eigenvalues of C.
Depicted on the 2nd, 3rd and 4th plots are samples among 10 (resp. 100 and 1000) samples
from N(m, C) falling into the box $[-8, 12] \times [-8, 12]$.


covariance matrices. Assume that m(g) , σ (g) , C(g) are given, the λ new points or individuals
are sampled in step 1. according to:

$$ x_i^{(g)} = m^{(g)} + \sigma^{(g)} \underbrace{N_i(0, C^{(g)})}_{= y_i} \,, \qquad \text{for } i = 1, \cdots, \lambda \,. \qquad (2.2) $$

Those λ individuals are evaluated in step 2. and ranked according to f :

$$ f(x_{1:\lambda}^{(g)}) \leq \cdots \leq f(x_{\mu:\lambda}^{(g)}) \leq \cdots \leq f(x_{\lambda:\lambda}^{(g)}) \,, \qquad (2.3) $$

where we use the notation $x_{i:\lambda}^{(g)}$ for the $i$th best individual.
The mean $m^{(g)}$ is then updated by taking the weighted mean of the best $\mu$ individuals:

$$ m^{(g+1)} = \sum_{i=1}^{\mu} \omega_i\, x_{i:\lambda}^{(g)} = m^{(g)} + \sigma^{(g)} \sum_{i=1}^{\mu} \omega_i\, y_{i:\lambda} \,, \qquad (2.4) $$

where $y_{i:\lambda} = (x_{i:\lambda}^{(g)} - m^{(g)})/\sigma^{(g)}$. In general $\mu = \frac{\lambda}{2}$ and $(\omega_i)_{1 \leq i \leq \mu}$ are strictly positive
and normalized weights, i.e., satisfying $\sum_{i=1}^{\mu} \omega_i = 1$. This update displaces the mean vector
toward the best solutions. The increment $\sigma^{(g)} \sum_{i=1}^{\mu} \omega_i\, y_{i:\lambda}$ has an interpretation in terms of a
(stochastic) approximation of the gradient with respect to $m$ of a joint criterion $J$ mapping
$(m, \sigma, C)$ to $\mathbb{R}$ and depending on quantiles of the objective function $f$ [9].
A measure characterizing the recombination used is called the variance effective selection
mass and defined by $\mu_{\mathrm{eff}} = \left( \sum_{i=1}^{\mu} \omega_i^2 \right)^{-1}$. The choice of the recombination type has
an important impact on the efficiency of the algorithm [6]. The default weights are equal
to:

$$ \omega_i = \frac{\ln(\mu + 1) - \ln(i)}{\mu \ln(\mu + 1) - \ln(\mu!)} \,, \qquad \text{for } i = 1, \cdots, \mu \,. \qquad (2.5) $$
The update of the covariance matrix $C^{(g)}$ uses two mechanisms. First of all the rank-one
update [74] using the so-called evolution path $p_c^{(g)} \in \mathbb{R}^n$ whose update is given by:

$$ p_c^{(g+1)} = (1 - c_c)\, p_c^{(g)} + \sqrt{c_c (2 - c_c)\, \mu_{\mathrm{eff}}}\; \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \,, \qquad (2.6) $$

where $c_c \in\, ]0, 1]$. For the constant $c_c = 1$, the evolution path points toward the descent
direction $\frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$ and for $c_c \neq 1$, the vector $p_c^{(g)}$ adds the steps followed by the mean
vector over the iterations using some normalization to dampen previous steps, so as not
to rely too much on old information. The vector $p_c^{(g+1)}$ gives a direction where we expect
to see good solutions. From the evolution path, the rank-one matrix $p_c^{(g+1)} \left(p_c^{(g+1)}\right)^T$ is
built and added to the covariance matrix (see Eq. (2.7)). Geometrically it deforms the
density ellipsoid in the direction $p_c^{(g+1)}$, i.e., the rank-one update increases the probability
to sample in the next iteration in the direction $p_c^{(g+1)}$.


The second mechanism is the rank-mu update [72] where the rank-mu matrix
$\sum_{i=1}^{\mu} \omega_i\, y_{i:\lambda} y_{i:\lambda}^T$ is added to the covariance matrix. This rank-mu matrix is also the stochastic
approximation of the gradient of the joint criterion $J$ with respect to $C$ [9]. The update
of the covariance matrix combines the rank-one and rank-mu updates and reads:

$$ C^{(g+1)} = (1 - c_{\mathrm{cov}})\, C^{(g)} + \underbrace{\frac{c_{\mathrm{cov}}}{\mu_{\mathrm{cov}}}\, p_c^{(g+1)} \left(p_c^{(g+1)}\right)^T}_{\text{rank-one update}} + \underbrace{c_{\mathrm{cov}} \left(1 - \frac{1}{\mu_{\mathrm{cov}}}\right) \sum_{i=1}^{\mu} \omega_i\, y_{i:\lambda} y_{i:\lambda}^T}_{\text{rank-mu update}} \,. \qquad (2.7) $$

The initial evolution path $p_c^{(0)}$, $c_c$, $c_{\mathrm{cov}}$ and $\mu_{\mathrm{cov}}$ are parameters of the algorithm. Default
values can be found in [71].
In addition to the covariance matrix adaptation, the step-size $\sigma^{(g)}$ is controlled after
every iteration. To perform the adaptation, a conjugate evolution path $p_\sigma^{(g)} \in \mathbb{R}^n$ at
generation $g$ is updated according to:

$$ p_\sigma^{(g+1)} = (1 - c_\sigma)\, p_\sigma^{(g)} + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{\mathrm{eff}}}\; \left(C^{(g)}\right)^{-\frac{1}{2}} \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \,. \qquad (2.8) $$

The conjugate path differs from the evolution path in the direction of the steps added,
as in the conjugate path the normalized step $\frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$ is multiplied by the matrix
$\left(C^{(g)}\right)^{-\frac{1}{2}}$¹.
The step-size is adapted according to:

$$ \sigma^{(g+1)} = \sigma^{(g)} \exp\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\lVert p_\sigma^{(g+1)} \rVert}{E\lVert N(0, I)\rVert} - 1 \right) \right) \,, \qquad (2.9) $$

where $p_\sigma^{(0)}$, $c_\sigma$ and $d_\sigma$ are parameters of the algorithm with default values defined in [71].
This update rule increases the step-size when the length of the conjugate evolution path
is larger than the length it would have if selection were random (this length would then
be equal to $\lVert N(0, I)\rVert$) and decreases it otherwise.
All the updates rely only on the ranking determined by Eq. (2.3) and not on the exact
values of the objective function, making the algorithm invariant to monotonic transformations
of the objective function that preserve the ranking of solutions.
On the class of functions $x \mapsto g_M \circ f_{cq}(x)$ where $f_{cq}$ is a convex quadratic function
and $g_M : \mathbb{R} \to \mathbb{R}$ a monotonically increasing function, the covariance matrix sequence $C^{(g)}$
becomes proportional to the inverse Hessian of the function $f_{cq}$, i.e., the algorithm is
able to learn second order information without using any derivatives.
¹ This difference is mainly technical, in order to be able to compare the length of the conjugate path at
different iterations even though the steps have been sampled with different covariance matrices [74].


Step-size adaptation is important to achieve fast convergence, corresponding to linear
convergence with rates close to the optimal rates that can be achieved by evolution strategy
algorithms. In combination with covariance matrix adaptation, step-size adaptation
allows linear convergence to be achieved on a wide range of functions including ill-conditioned
problems.
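To fix ideas, the following sketch assembles Eqs. (2.2)-(2.9) into one optimization loop. This is our own didactic Python/NumPy code, not the reference implementation of [71]: it uses the equivalent $c_1/c_\mu$ writing of Eq. (2.7) (with $c_1 = c_{\mathrm{cov}}/\mu_{\mathrm{cov}}$ and $c_\mu = c_{\mathrm{cov}}(1 - 1/\mu_{\mathrm{cov}})$) and one commonly published choice of the default constants; the exact defaults are those of [71].

```python
import numpy as np

def cma_es_minimize(f, m, sigma, n_iter=500, rng=np.random.default_rng(0)):
    """Minimal (mu/mu_w, lambda)-CMA-ES sketch following Eqs. (2.2)-(2.9)."""
    n = len(m)
    lam = 4 + int(3 * np.log(n))                    # default population size
    mu = lam // 2
    w = np.log(mu + 1) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                    # normalized weights, Eq. (2.5)
    mu_eff = 1.0 / np.sum(w ** 2)
    cc = 4.0 / (n + 4.0)
    cs = (mu_eff + 2.0) / (n + mu_eff + 3.0)
    c1 = 2.0 / ((n + 1.3) ** 2 + mu_eff)            # rank-one coefficient
    cmu = min(1 - c1, 2 * (mu_eff - 2 + 1 / mu_eff) / ((n + 2) ** 2 + mu_eff))
    ds = 1 + cs + 2 * max(0, np.sqrt((mu_eff - 1) / (n + 1)) - 1)
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # E||N(0,I)||
    C, pc, ps = np.eye(n), np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        vals, B = np.linalg.eigh(C)                 # assumes C stays positive definite
        inv_sqrt_C = B @ np.diag(vals ** -0.5) @ B.T
        y = rng.standard_normal((lam, n)) @ (B * np.sqrt(vals)).T
        x = m + sigma * y                           # sampling, Eq. (2.2)
        order = np.argsort([f(xi) for xi in x])     # ranking, Eq. (2.3)
        y_w = w @ y[order[:mu]]
        m = m + sigma * y_w                         # mean update, Eq. (2.4)
        ps = (1 - cs) * ps + np.sqrt(cs * (2 - cs) * mu_eff) * inv_sqrt_C @ y_w
        pc = (1 - cc) * pc + np.sqrt(cc * (2 - cc) * mu_eff) * y_w
        rank_mu = sum(wi * np.outer(yi, yi) for wi, yi in zip(w, y[order[:mu]]))
        C = (1 - c1 - cmu) * C + c1 * np.outer(pc, pc) + cmu * rank_mu  # Eq. (2.7)
        sigma *= np.exp(cs / ds * (np.linalg.norm(ps) / chi_n - 1))     # Eq. (2.9)
    return m

# Usage: minimize the sphere function in dimension 5.
x_best = cma_es_minimize(lambda x: np.dot(x, x), m=np.full(5, 3.0), sigma=1.0)
print(x_best)  # close to the optimum at 0
```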

CMA-ES and EnOpt The ensemble-based optimization (EnOpt) [37, 36, 131] shares
similarities with CMA-ES. In the following, we briefly present the main idea of EnOpt as
well as the similarities and differences with CMA-ES. Original notations defined in [131]
have been changed in order to be in accordance with the notations used for CMA-ES.
In EnOpt, for every iteration, an ensemble of λ points is sampled according to:

$$ x_i^{(g+1)} = m^{(g)} + N_i(0, C_X) \,, \qquad \text{for } i = 1, \cdots, \lambda \,, \qquad (2.10) $$

where Ni (0, CX )1≤i≤λ are λ independent multivariate normal distributions with zero mean
vector and covariance matrix CX . CX is a user specified matrix, which remains constant
during the whole optimization process. Therefore, EnOpt adapts only the mean m(g) of
the distribution according to:

$$ m^{(g+1)} = m^{(g)} + \alpha^{(g)}\, C_X\, C_{X,J}^{(g)} \,, \qquad (2.11) $$

where $\alpha^{(g)}$ is the step-size and $C_{X,J}^{(g)}$ is the cross-covariance between the population and
the approximate gradient of the objective function.
Hence, while EnOpt and CMA-ES share some similarities, CMA-ES presents three
important advantages:

• CMA-ES adapts the covariance matrix used to sample its population to the landscape
of the objective function as shown above. However, EnOpt uses the same covari-
ance matrix during the whole optimization process which may lead to difficulties in
refining the search at the end of the optimization;

• CMA-ES uses a step-size adaptation mechanism where the step-size is increased or


decreased depending on the situation which is crucial to obtain linear convergence.
However, in EnOpt, the step-size is always decreased and thus too small values at
the beginning will be very detrimental for the convergence rate. Situations where
step-size should be increased (linear environment) are also sub-optimally handled;

• CMA-ES is invariant to monotonic transformations of the objective functions that


preserve the ranking of solutions, which represents a source of robustness of the
algorithm [59]. More particularly, this invariance of CMA-ES removes the need to


tune the parameters of the algorithm according to the scale of the objective function,
which is in general a challenging task. However, EnOpt uses the exact values of
the objective function to update the mean of its search distribution which leads to
breaking the invariance that comparison-based algorithms, such as CMA-ES, have.

2.2 Handling constraints with CMA-ES

Several methods are used, in the literature, to handle constraints in stochastic optimization
algorithms. In general, unfeasible individuals can be rejected, penalized or repaired. In
the following, we briefly discuss these alternatives. A more detailed study and comparison
can be found in [96].

• Rejection of unfeasible individuals: Besides its simplicity and ease of implementation,


rejecting the unfeasible individuals, also called “death penalty”, does not require any
parameter to be tuned. However, ignoring unfeasible individuals can prevent the
algorithm from finding the region containing the optimum solution if it is close to
the feasible domain boundaries [95];

• Penalizing unfeasible individuals: Penalization is the most widespread approach used


to handle constraints. This method corresponds to a transformation of the optimization
problem:

$$ \begin{cases} \min f(x) \\ \text{s.t. } h_i(x) \leq d_i \quad \forall i = 1, \cdots, m \end{cases} \;\Rightarrow\; \min f(x) + \sum_{i=1}^{m} g(h_i(x) - d_i) \,, \qquad (2.12) $$

where m is the number of constraints and g(.) is the penalty function which is
non-negative, equal to zero in R− and increasing in R+ . In general, g(.) contains pa-
rameters to be tuned. These parameters depend on the problem to be optimized. A
solution to avoid the difficulty of tuning those parameters consists in using an adap-
tive penalization which does not require any user specified constant. However, pe-
nalizing all unfeasible individuals implies evaluating all unfeasible individuals which
can be costly;

• Repairing unfeasible individuals: Another popular solution to handle constraints is


to repair each unfeasible individual before evaluating it. An important parameter
to be specified is the probability of replacement of the unfeasible individual by the
repaired new feasible individual. Moreover, repairing introduces a new individual in
the population which may not obey the adapted distribution, and hence may slow down
the optimization process of CMA-ES.


Knowing the limitations of each of these constraint-handling approaches, the approach
used in the present work is a mixture of two of them: adaptive penalization of the
marginally unfeasible individuals and rejection of only the unfeasible individuals far from
the boundaries of the feasible domain. With this approach, rejecting only individuals far
from the feasible domain does not prevent the algorithm from finding a solution near the
feasible domain boundaries, and by using adaptive penalization, the critical penalization
coefficients are adapted automatically during the course of the search¹.
¹ The penalization method depends in general on other parameters which are on the other hand much
less critical and which are tuned beforehand to be suitable for a wide range of problems [73].
A box constraint handling is presented in [73] in which the feasible space is a hypercube
defined by lower and upper boundary values for each parameter. In the following, this
approach is generalized in order to handle feasible spaces defined by lower and upper
boundary values for a sum of some of the parameters (e.g., to constrain the length of
multilateral wells).
Given an optimization problem with a dimension n, let us suppose we have m ∈ N
constraints denoted by Sj , ∀j = 1, · · · , m. For each constraint Sj , we define Pj ⊂
{1, · · · , n} such that a vector x = (xi )1≤i≤n is feasible with respect to the constraint Sj if:
$$ v_{(j,-)} < q_j = \sum_{p \in P_j} x_p < v_{(j,+)} \,, \qquad (2.13) $$

where v(j,−) and v(j,+) are the lower and upper boundaries defining Sj . Constraints are
then handled as follows, when evaluating an individual x:
- Initializing weights: In the first generation, boundary weights γj are initialized to
γj = 0, ∀j = 1, · · · , m ;
- Setting weights: From the second generation upwards, if the distribution mean is unfeasible
and weights are not set yet:

$$ \gamma_j \leftarrow \frac{2\, \delta_{\mathrm{fit}}}{\sigma^2\, \frac{1}{n} \sum_{i=1}^{n} C_{ii}} \,, \qquad \forall j = 1, \cdots, m \,, \qquad (2.14) $$

where $\delta_{\mathrm{fit}}$ is the median, over the last $20 + \frac{3n}{\lambda}$ generations, of the interquartile range
of the unpenalized objective function evaluations, and $C_{ii}$ is the $i$th diagonal element of
the covariance matrix. The term $\sigma^2 \frac{1}{n} \sum_{i=1}^{n} C_{ii}$ represents the mean of $(\sigma^2 C_{ii})_{i=1,\cdots,n}$,
which will be used in Eq. (2.16) in order to normalize the squared distance
$(q_j^{\mathrm{feas}} - q_j)^2$ with respect to the covariance matrix adapted by CMA-ES ;
- Increasing weights: For each constraint Sj , if the distribution mean Mj , i.e., the mean of
qj for the λ individuals of the current generation, is out-of-bounds, and the distance from
Mj to the feasible domain, i.e., max(0, Mj − v(j,+) ) + max(0, v(j,−) − Mj ), is larger than
$\sigma \times \sqrt{\frac{1}{\mathrm{card}(P_j)} \sum_{p \in P_j} C_{pp}} \times \max\left(1, \frac{\sqrt{n}}{\mu_{\mathrm{eff}}}\right)$, then

$$ \gamma_j \leftarrow \gamma_j \times 1.1^{\max\left(1, \frac{\mu_{\mathrm{eff}}}{10 n}\right)} \,, \qquad \forall j = 1, \cdots, m \,, \qquad (2.15) $$

where card(Pj ) denotes the cardinality of the set Pj ;


- Evaluating the individual:

$$ f(x) \leftarrow f(x) + \frac{1}{m} \sum_{j=1}^{m} \gamma_j\, \frac{(q_j^{\mathrm{feas}} - q_j)^2}{\xi_j} \,, \qquad (2.16) $$

where $q_j^{\mathrm{feas}}$ is the projection of $q_j$ on the feasible domain and
$\xi_j = \exp\left( 0.9 \left( \frac{1}{\mathrm{card}(P_j)} \sum_{p \in P_j} \log(C_{pp}) - \frac{1}{n} \sum_{i=1}^{n} \log(C_{ii}) \right) \right)$.
An individual x will, in the following, be rejected and resampled if $|q_j^{\mathrm{feas}} - q_j| >
p_\% \times |v_{(j,+)} - v_{(j,-)}|$, where $p_\%$ is a parameter to be chosen. In all runs presented in the
sequel, $p_\%$ is chosen equal to 20%.
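To make the mechanics above concrete, here is a minimal Python/NumPy sketch of the evaluation rule, assuming the weights γj have already been initialized and adapted following Eqs. (2.14)-(2.15); function and variable names are ours, and this is an illustration rather than the implementation used in the experiments.

```python
import numpy as np

def penalized_value(f, x, constraints, gamma, C, p_reject=0.20):
    """Sketch of the penalization/rejection rule of Section 2.2.

    Each constraint S_j is a triple (P_j, v_minus, v_plus) with
    q_j = sum of x[p] for p in P_j (Eq. (2.13)). Returns None when the
    individual is too far from the feasible domain and must be resampled.
    """
    diag = np.diag(C)
    penalty = 0.0
    for (P, v_lo, v_hi), g in zip(constraints, gamma):
        idx = list(P)
        q = x[idx].sum()
        q_feas = np.clip(q, v_lo, v_hi)        # projection on the feasible set
        if abs(q_feas - q) > p_reject * (v_hi - v_lo):
            return None                        # reject far-unfeasible individual
        # xi_j compares the C-coordinates involved in S_j to their global mean
        xi = np.exp(0.9 * (np.log(diag[idx]).mean() - np.log(diag).mean()))
        penalty += g * (q_feas - q) ** 2 / xi
    return f(x) + penalty / len(constraints)   # penalized value, Eq. (2.16)

# Example: one constraint bounding x_0 + x_1 in [0, 10].
f = lambda x: float(np.dot(x, x))
val = penalized_value(f, np.array([4.0, 8.0, 1.0]),
                      constraints=[({0, 1}, 0.0, 10.0)],
                      gamma=[1.0], C=np.eye(3))
```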

2.3 CMA-ES with local meta-models

Many real-world optimization problems are formulated in a black-box scenario where the
objective function to optimize may have noise, multiple optima and can be computationally
expensive. For expensive objective functions–several minutes to several hours for one
evaluation–a strategy is to couple evolutionary algorithms with meta-models or surrogates:
a model of f is built, based on “true” evaluations of f , and used during the optimization
process to save evaluations of the expensive objective function [83]. One key issue when
coupling EAs and meta-models is to decide when the quality of the model is good enough to
continue exploiting this model and when new evaluations on the “true” objective functions
should be performed, i.e., the exploration-exploitation trade-off defined in Section 1.2.1.3.
Indeed, performing too few evaluations on the original objective function can result in
suboptimal solutions, whereas performing too many of them can lead to an inefficient
approach.
CMA-ES was coupled with local meta-models to define the local-meta-model CMA-ES
(lmm-CMA) [87]. In the proposed algorithm, the quality of the meta-model is appraised
by tracking the change in the exact ranking of the best individuals. The lmm-CMA
algorithm has been evaluated on test functions using the default population size of CMA-
ES for unimodal functions and for some multi-modal functions and has been shown to
improve CMA-ES [87].
In this section, we review the lmm-CMA algorithm as defined in [87] in Section 2.3.1
and then we analyze the performance of lmm-CMA when using population sizes larger than


the default one in Section 2.3.2. We show that tracking the exact rank-change of the best
solutions to determine when to re-evaluate new solutions is a too conservative criterion
and leads to a decrease of the speedup with respect to CMA-ES when the population
size is increased. Instead we propose in Section 2.3.3 a less conservative criterion that we
evaluate on test functions to define a new variant of CMA-ES with meta-models that we
call the new-local-meta-model CMA-ES (nlmm-CMA).

2.3.1 The local-meta-model CMA-ES (lmm-CMA)

The lmm-CMA algorithm [87] combines the CMA-ES with local meta-models by exploiting
the fact that the updates of CMA-ES only rely on the ranking of the µ best solutions. An
iteration of lmm-CMA consists of one iteration of CMA-ES where the evaluation step on
the (true) objective function that usually determines the ranking of the µ best solutions
is replaced by the approximate ranking procedure that outputs an approximate ranking
of the candidate solutions and that costs maximally λ function evaluations on the (true)
objective function (the benefit of the approach comes of course when it costs less than λ).
The mean value, covariance matrix and step-size of CMA-ES are then updated according
to the update equations defined by the standard CMA-ES.

2.3.1.1 Locally weighted regression

To build an approximate model of the objective function f , denoted by fˆ, we use a locally
weighted regression. During the optimization process, a database, i.e., a training set is
built by storing, after every evaluation on the true objective function, points together with
their objective function values (x, y = f (x)). Assuming that the training set contains a
sufficient number m of couples (x, f (x)), let us consider an individual denoted q ∈ Rn to
be evaluated with the approximate model, where n is the dimension of the problem. We
begin by selecting the k nearest points (xj )1≤j≤k from the training set. The distance used
for this purpose exploits the natural metric defined by the covariance matrix of CMA,
namely the Mahalanobis distance with respect to the current covariance matrix C, defined
for two given points $z_1 \in \mathbb{R}^n$ and $z_2 \in \mathbb{R}^n$ by $d_C(z_1, z_2) = \sqrt{(z_1 - z_2)^T C^{-1} (z_1 - z_2)}$.

We build with locally weighted regression an approximate objective function using (true)
evaluations (yj )1≤j≤k corresponding to the k selected nearest points to q.
The use of a full quadratic meta-model is suggested in [87]. Hence, using a vector
$\beta \in \mathbb{R}^{\frac{n(n+3)}{2}+1}$, we define $\hat{f}$ as follows:

$$ \hat{f}(x, \beta) = \beta^T \left( x_1^2, \cdots, x_n^2, x_1 x_2, \cdots, x_{n-1} x_n, x_1, \cdots, x_n, 1 \right)^T \,. \qquad (2.17) $$


The full quadratic meta-model is built by minimizing the following criterion with
respect to the vector of parameters β of the meta-model at q:

$$ A(q) = \sum_{j=1}^{k} \left( \hat{f}(x_j, \beta) - y_j \right)^2 K\left( \frac{d_C(x_j, q)}{h} \right) \,. \qquad (2.18) $$

The kernel weighting function K(·) is defined by $K(\zeta) = (1 - \zeta^2)^2$, and h is the bandwidth
defined by the distance of the $k$th nearest neighbor data point to q, where k must be greater
than or equal to $\frac{n(n+3)}{2} + 1$ for a full quadratic meta-model.
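The construction can be sketched as follows in Python/NumPy; this is an illustration under the definitions above, solving the weighted least-squares problem of Eq. (2.18) explicitly, and the function names are ours.

```python
import numpy as np

def quadratic_features(x):
    """Feature vector of the full quadratic model of Eq. (2.17)."""
    n = len(x)
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate([x ** 2, cross, x, [1.0]])

def lwr_predict(q, X, y, C, k):
    """Locally weighted regression prediction f_hat(q), Eqs. (2.17)-(2.18).

    X, y: training set evaluated on the true f; C: current CMA-ES
    covariance matrix (Mahalanobis metric); k >= n(n+3)/2 + 1 neighbors.
    """
    C_inv = np.linalg.inv(C)
    d = np.sqrt(np.einsum('ij,jk,ik->i', X - q, C_inv, X - q))  # d_C to q
    idx = np.argsort(d)[:k]                  # k nearest neighbors
    h = d[idx[-1]]                           # bandwidth: k-th distance
    wts = (1 - (d[idx] / h) ** 2) ** 2       # kernel K(zeta) = (1 - zeta^2)^2
    Phi = np.array([quadratic_features(xj) for xj in X[idx]])
    # weighted least squares: minimize A(q) of Eq. (2.18) over beta
    beta, *_ = np.linalg.lstsq(Phi * np.sqrt(wts)[:, None],
                               np.sqrt(wts) * y[idx], rcond=None)
    return beta @ quadratic_features(q)

# Example: learn a quadratic function from 30 random points in dimension 2.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(30, 2))
y = np.array([3 * a ** 2 + b ** 2 + a * b for a, b in X])
print(lwr_predict(np.array([1.0, 2.0]), X, y, C=np.eye(2), k=8))  # approx. 9.0
```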

2.3.1.2 Approximate ranking procedure

To incorporate the approximate model built using the locally weighted regression, we use
the approximate ranking procedure [111]. This procedure decides whether the quality of
the model is good enough in order to continue exploiting this model or new true objective
function evaluations should be performed. The resulting method is called the local-meta-
model CMA-ES (lmm-CMA) [87] and is defined as follows. For a given generation, let
us denote individuals of the current population of CMA-ES by (xi )1≤i≤λ , where λ is the
population size. The following procedure is then performed:

1. build fˆ (xi ) for all individuals of the current population (xi )1≤i≤λ .

2. rank individuals according to their approximated value fˆ (xi ): ranking0 .

3. evaluate the best ninit individuals with the true objective function and add their
evaluations to the training set.
 
4. for $n_{ic}$ from 1 to $\left\lceil \frac{\lambda - n_{init}}{n_b} \right\rceil$, we:

(a) build fˆ (xi )1≤i≤λ .

(b) rank individuals according to their approximated value fˆ (xi )1 : rankingnic .

(c) if (rankingnic = rankingnic −1 ), the meta-model is accepted.

(d) if the meta-model is accepted, we break. If not, we evaluate the best nb un-
evaluated individuals with the true objective function, add their evaluations to
the training set, and loop to step 4, until reaching the acceptance criterion of
the meta-model.

5. if (nic > 2), ninit = min(ninit + nb , λ − nb ) .

6. if (nic < 2), ninit = max(nb , ninit − nb ) .


¹ Or the true objective function if the individuals have been evaluated on it.


Table 2.1: Test functions and their corresponding initial intervals and standard deviations.
The starting point is uniformly drawn from the initialized interval.
Name           Function                                                                                  Init.           σ0
Noisy Sphere   $f_{NSphere}(x) = \left( \sum_{i=1}^{n} x_i^2 \right) \exp(\epsilon N(0, 1))$              $[-3, 7]^n$     5
Schwefel       $f_{Schw}(x) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2$                         $[-10, 10]^n$   10
Schwefel1/4    $f_{Schw^{1/4}}(x) = (f_{Schw}(x))^{1/4}$                                                  $[-10, 10]^n$   10
Rosenbrock     $f_{Rosen}(x) = \sum_{i=1}^{n-1} \left( 100 (x_i^2 - x_{i+1})^2 + (x_i - 1)^2 \right)$     $[-5, 5]^n$     5
Ackley         $f_{Ack}(x) = 20 - 20 \exp\left( -0.2 \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2} \right) + e - \exp\left( \frac{1}{n} \sum_{i=1}^{n} \cos(2\pi x_i) \right)$   $[1, 30]^n$   14.5
Rastrigin      $f_{Rast}(x) = 10n + \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2\pi x_i) \right)$              $[1, 5]^n$      2

This procedure heavily exploits the rank-based property of the CMA-ES algorithm.
Initially, a number ninit of best individuals based on the meta-model is evaluated using
the true objective function and then added to the training set. A batch of nb individuals
is evaluated until satisfying the meta-model acceptance criterion: keeping the ranking of
each of the µ best individuals based on the meta-model unchanged for two iteration cycles.
Hence, (ninit + nb ∗ nic ) individuals are evaluated every generation where nic represents
the number of iteration cycles needed to satisfy the meta-model acceptance criterion. The
integer $n_b$ is chosen to be equal to $\max[1, (\frac{\lambda}{10})]$ and $n_{init}$ is initialized to λ and adapted after
every generation. The minimum number of evaluations performed for a given generation,
which corresponds to the minimum value that ninit can reach, is then equal to nb .
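In code, the acceptance test of lmm-CMA reduces to an exact comparison of index lists; the sketch below uses our own notation (a ranking is the list of individual indices sorted by meta-model value) and is only meant to make the per-cycle criterion explicit.

```python
def lmm_acceptance(prev_ranking, new_ranking, mu):
    """Meta-model acceptance test of lmm-CMA (Section 2.3.1.2).

    The model is kept only if the exact ranking of the mu best
    individuals is unchanged between two iteration cycles.
    """
    return prev_ranking[:mu] == new_ranking[:mu]

# Example: with mu = 2, swapping ranks 3 and 4 is accepted,
# swapping ranks 1 and 2 is not.
assert lmm_acceptance([1, 2, 3, 4], [1, 2, 4, 3], mu=2)
assert not lmm_acceptance([1, 2, 3, 4], [2, 1, 3, 4], mu=2)
```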


Table 2.2: Success performance SP1, i.e., the average number of function evaluations
for successful runs divided by the ratio of successful runs, standard deviations of the
number of function evaluations for successful runs and speedup performance spu, to reach
fstop = 10−10 of lmm-CMA and nlmm-CMA. The ratio of successful runs is denoted
between brackets if it is < 1.0. Results with a constant dimension n = 5 and an increasing
λ are highlighted in grey.

Function n λ ǫ lmm-CMA spu nlmm-CMA spu CMA-ES


fRosen 2 6 291 ± 59 2.7 252 ± 52 3.1 779 ± 236
4 8 776 ± 102 [0.95] 2.8 719 ± 54 [0.85] 3.0 2185 ± 359 [0.95]
5 8 1131 ± 143 2.7 1014 ± 94 [0.90] 3.0 3012 ± 394 [0.90]
5 16 1703 ± 230 [0.95] 2.0 901 ± 64 3.7 3319 ± 409
5 24 2784 ± 263 1.4 1272 ± 90 [0.95] 3.0 3840 ± 256
5 32 3364 ± 221 1.3 1567 ± 159 2.9 4515 ± 275
5 48 4339 ± 223 1.3 1973 ± 144 2.9 5714 ± 297
5 96 6923 ± 322 1.2 3218 ± 132 2.5 7992 ± 428
8 10 2545 ± 233 [0.95] 2.1 2234 ± 202 [0.95] 2.4 5245 ± 644
fSchw 2 6 89 ± 9 4.3 87 ± 7 4.4 385 ± 35
4 8 166 ± 8 5.4 166 ± 6 5.4 897 ± 51
8 10 334 ± 9 6.2 333 ± 9 6.2 2078 ± 138
16 12 899 ± 40 5.9 855 ± 30 6.2 5305 ± 166
fSchw1/4 2 6 556 ± 25 2.4 413 ± 25 3.3 1343 ± 72
4 8 1715 ± 87 1.7 971 ± 36 2.9 2856 ± 135
5 8 2145 ± 69 1.6 1302 ± 31 2.7 3522 ± 136
5 16 3775 ± 137 1.3 1446 ± 31 3.4 4841 ± 127
5 24 5034 ± 142 1.2 1825 ± 45 3.4 6151 ± 252
5 32 6397 ± 174 1.2 2461 ± 43 3.2 7765 ± 227
5 48 8233 ± 190 1.2 3150 ± 58 3.2 10178 ± 202
5 96 11810 ± 177 1.2 4930 ± 94 2.9 14290 ± 252
8 10 4046 ± 127 1.5 2714 ± 41 2.2 5943 ± 133
fNSphere 2 6 0.35 124 ± 14 2.7 109 ± 12 3.1 337 ± 34
4 8 0.25 316 ± 45 2.3 236 ± 19 3.1 739 ± 30
8 10 0.18 842 ± 77 1.8 636 ± 33 2.4 1539 ± 69
16 12 0.13 2125 ± 72 1.3 2156 ± 216 1.3 2856 ± 88
fAck 2 5 302 ± 43 [0.90] 2.6 227 ± 23 3.5 782 ± 114 [0.95]
5 7 1036 ± 620 2.0 704 ± 23 [0.90] 3.0 2104 ± 117 [0.85]
10 10 2642 ± 93 [0.90] 1.4 2066 ± 119 [0.95] 1.8 3787 ± 151 [0.95]
fRast 2 50 898 ± 160 [0.95] 2.7 524 ± 48 [0.95] 4.7 2440 ± 294 [0.75]
5 70 19911 ± 599 [0.15] 0.6 9131 ± 135 [0.15] 1.3 11676 ± 711 [0.50]
5 140 6543 ± 569 [0.80] 1.6 4037 ± 209 [0.60] 2.6 10338 ± 1254 [0.85]
5 280 10851 ± 1008 [0.85] 1.3 4949 ± 425 [0.85] 2.9 14266 ± 1069



Figure 2.2: Speedup of nlmm-CMA (△) and lmm-CMA (□) on (a) fSchw1/4 , (b) fRosen and
(c) fRast for dimension n = 5.

2.3.2 Evaluating lmm-CMA on increasing population size


2.3.2.1 Experimental procedure

The lmm-CMA and the other variants tested in this chapter are evaluated on the objective
functions presented in Table 2.1 corresponding to the functions used in [87] except two
functions: (1) the function fSchw1/4 where we compose the convex quadratic function fSchw
by a strictly increasing mapping $g : x \in \mathbb{R} \mapsto x^{1/4}$, introduced because we suspect that the
results on fSchw are artificial and only reflect the fact that the model used in lmm-CMA
is quadratic and (2) the noisy sphere function fNSphere whose definition has been modified
following the recommendations of [82]. We have followed the experimental procedure in
[87] and performed for each test function 20 independent runs using an implementation
of lmm-CMA based on a java code of CMA-ES1 randomly initialized from initial inter-
vals defined in Table 2.1 and with initial standard deviations σ0 in Table 2.1 and other
standard parameter settings in [71]. The algorithm performance is measured using the
success performance SP1 used in [11]. SP1 is defined as the average number of evaluations
for successful runs divided by the ratio of successful runs, where a run is considered as
successful if it succeeds in reaching fstop = 10−10 . Another performance measure that
might be used was the expected running time ERT [69] which is defined as the number of
function evaluations conducted in all runs (successful and unsuccessful runs) divided by
the ratio of successful runs. In this chapter, we opt for SP1 since the stopping criteria for
unsuccessful runs were not properly tuned which can affect the performance comparison.
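For reference, SP1 can be computed as follows (a small helper in our own notation, not code from [11]):

```python
import numpy as np

def sp1(evaluations, successes):
    """Success performance SP1: average number of evaluations of the
    successful runs divided by the ratio of successful runs."""
    evaluations = np.asarray(evaluations, dtype=float)
    successes = np.asarray(successes, dtype=bool)
    if not successes.any():
        return np.inf
    return evaluations[successes].mean() / successes.mean()

# Example: 3 runs out of 4 reach f_stop; SP1 = mean(800, 1000, 1200) / 0.75.
print(sp1([800, 1000, 1200, 5000], [True, True, True, False]))  # 1333.33
```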
We have reproduced the results for the lmm-CMA presented in [87, Table 3]. Those results
are presented in Table 2.22 .
¹ See http://www.lri.fr/~hansen/cmaes_inmatlab.html.
² Experiments have been performed with k = n(n + 3) + 2 as indicated in [87]. However, we observed some
differences on fRosen and fSchw with this value of k and found out that $k = \frac{n(n+3)}{2} + 1$ allows to obtain the
results presented in [87, Table 3]. We backed up this finding by using the Matlab code provided by Stefan
Kern.

2.3.2.2 Performances of lmm-CMA with increasing population size

In lmm-CMA, a meta-model is accepted if the exact ranking of the µ best individuals


remains unchanged. However, this criterion is more and more difficult to satisfy when the
population size λ and thus µ(= λ/2) increases. We suspect that this can have drastic
consequences on the performances of lmm-CMA. To test our hypothesis we perform tests
for n = 5 on fRosen , fSchw1/4 with λ = 8, 16, 24, 32, 48, 96 and for fRast for λ =
70, 140, 280. The results are presented in Fig. 2.2 and in Table 2.2 (rows highlighted in
grey). On fRosen and fSchw1/4 , we observe, as expected that the speedup with respect to
CMA-ES drops with increasing λ and is approaching 1. On fRast , we observe that the
speedup for λ = 140 is larger than for λ = 280 (respectively equal to 1.6 and 1.3).

2.3.3 A new variant of lmm-CMA

We propose now a new variant of lmm-CMA, the new-local-meta-model CMA-ES (nlmm-


CMA) that tackles the problem detected in the previous section.

2.3.3.1 A new meta-model acceptance criteria

We have seen that requiring the preservation of the exact ranking of the µ best individuals
is too conservative a criterion for measuring the quality of meta-models when population
sizes are larger than the default one. We therefore propose to replace this criterion by the
following one: after building the model and ranking the population with it, a meta-model
is accepted if it succeeds in keeping both the ensemble of the µ best individuals and the
best individual unchanged. In this case, we ignore any change in the rank of each individual
from the best µ individuals, except for the best individual which must be the same, as long
as this individual is still an element of the µ best ensemble. Another criterion is added to
the acceptance of the meta-model: once more than one fourth of the population is evaluated,
the model is accepted if it succeeds in keeping the best individual unchanged. The proposed
procedure is then defined as follows. For a given generation, let us denote individuals of
the current population of CMA-ES by (xi )1≤i≤λ , where λ is the population size. The
following new approximate ranking procedure is then performed:

1. build fˆ (xi ) for all individuals of the current population (xi )1≤i≤λ .

2. rank individuals according to their approximated value fˆ (xi ) and determine the µ
best individuals set and the best individual.

3. evaluate the ninit best individuals with the true objective function and add their
evaluations to the training set.
 
4. for $n_{ic}$ from 1 to $\left\lceil \frac{\lambda - n_{init}}{n_b} \right\rceil$, we:


(a) build fˆ (xi )1≤i≤λ .

(b) rank individuals according to their approximated value fˆ (xi )1 and determine
the µ best individuals set and the best individual.

(c) if less than one fourth of the population is evaluated, the meta-model is accepted
if it succeeds in keeping both the best individual and the ensemble of µ
best individuals unchanged.

(d) if more than one fourth of the population is evaluated, the meta-model is ac-
cepted if it succeeds in keeping the best individual unchanged.

(e) if the meta-model is accepted, we break. If not, we evaluate the nb best un-
evaluated individuals with the true objective function, add their evaluations to
the training set, and loop to step 4, until reaching the acceptance criterion of
the meta-model.

5. if (nic > 2), ninit = min(ninit + nb , λ − nb ) .

6. if (nic < 2), ninit = max(nb , ninit − nb ) .

Considering only changes in the whole parent set, without taking into account the exact
rank of each individual, and setting an upper limit on the number of true objective function
evaluations was first proposed in [13]. The new variant is called nlmm-CMA in the sequel.
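Compared with the lmm-CMA test sketched in Section 2.3.1.2, the relaxed criterion only compares the set of the µ best and the identity of the best; a sketch in the same (our own) notation:

```python
def nlmm_acceptance(prev_ranking, new_ranking, mu, n_evaluated, lam):
    """Meta-model acceptance test of nlmm-CMA (Section 2.3.3.1).

    While at most one fourth of the population has been evaluated, both the
    set of the mu best individuals and the best individual must be unchanged;
    afterwards, only the best individual must be unchanged.
    """
    best_kept = prev_ranking[0] == new_ranking[0]
    if n_evaluated <= lam / 4:
        return best_kept and set(prev_ranking[:mu]) == set(new_ranking[:mu])
    return best_kept

# With mu = 3: exchanging ranks 2 and 3 is now accepted (same set, same best).
assert nlmm_acceptance([1, 2, 3, 4], [1, 3, 2, 4], mu=3, n_evaluated=2, lam=40)
```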

2.3.3.2 Evaluation of nlmm-CMA

The performance results of nlmm-CMA are presented in Table 2.2 together with the ones of
lmm-CMA. Table 2.2 shows that on fRast , the nlmm-CMA speedup is in between 2.5 and 5
instead of 1.5 and 3 for lmm-CMA, and on fAck nlmm-CMA outperforms lmm-CMA with
speedups between 1.5 and 3.5 for nlmm-CMA and between 1.4 and 3 for lmm-CMA. On
these functions, nlmm-CMA is significantly more efficient. For the other tested functions
fRosen , fSchw and fSchw1/4 , nlmm-CMA is marginally more efficient than the standard lmm-
CMA. In Fig. 2.2 and in Table 2.2 (highlighted rows), we evaluate the effect of increasing
λ on nlmm-CMA using the same setting as in Section 2.3.2.2. Using population sizes
larger than the default one, nlmm-CMA improves CMA-ES by a factor between 2.5 and
3.5 for all tested functions fRosen , fSchw1/4 and fRast . Therefore, nlmm-CMA maintains a
significant speedup for λ larger than the default one contrary to lmm-CMA which offers
a speedup approaching to 1 for fRosen and fSchw1/4 and a decreasing speedup (from 1.6 to
1.3) when λ increases (from 140 to 280) for fRast .
¹ Or the true objective function if the individuals have been evaluated on it.


2.3.3.3 Impact of the recombination type

The choice of the recombination type has an important impact on the efficiency of evolution
strategies in general [6] and CMA-ES in particular [74, 71]. In the previous section, all the
runs performed use the default weighted recombination type defined by Eq. (2.5). In the
new variant of lmm-CMA, the meta-model acceptance criterion does not take into account
the exact rank of each individual except the best one. By modifying the meta-model
acceptance criteria of lmm-CMA, a possible accepted meta-model may be a meta-model
that preserves the µ best individuals set and the best individual but generates a ranking
far from the “true” ranking, i.e., the one based on the true objective function. We now
compare nlmm-CMA using weighted recombination where weights are defined in Eq. (2.5)
and intermediate recombination where weights are all equal to 1/µ: nlmm-CMAI . Results
are presented in Table 2.3. The algorithm nlmm-CMA outperforms nlmm-CMAI in all
cases, suggesting that even if the exact ranking is not taken into account for assessing the
quality of the meta-model in nlmm-CMA, this ranking is not random and still carries
enough information to guide CMA-ES.

2.3.3.4 Impact of initial parameters

In the tests presented so far, the initial parameters of the approximate ranking procedure
are defined as follows: $n_{init}$ is initialized at the beginning of the optimization process to
λ, and $n_b$ is set to $\max[1, (\frac{\lambda}{10})]$. Every generation g, the number of initial individuals
evaluated $n_{init}$ is adapted (increased or decreased) depending on the meta-model quality
(steps 5. and 6. in the procedure defined in Section 2.3.3.1). We denote by $n_{init}^{(g)}$ and
$n_{ic}^{(g)}$ the values of $n_{init}$ and $n_{ic}$ respectively at generation g. The number of evaluations
performed every generation g is $(n_{init}^{(g)} + n_{ic}^{(g)} \times n_b)$. We quantify now the impact of the
initial values of $n_{init}$ and $n_b$ on the total cost of the optimization process. The algorithm
nlmm-CMA is compared to a similar version where initial parameters are chosen as small
as possible, i.e., $n_{init}^{(0)}$ and $n_b$ are equal to 1. Moreover, we consider two cases: (1) with
update, denoted nlmm-CMA1 , i.e., where initial parameters are adapted depending on the
iteration cycle number (steps 5. and 6. in the procedure defined in Section 2.3.3.1), and
(2) without update, denoted nlmm-CMA2 , i.e., parameters are equal to 1 during the entire
optimization process (omitting steps 5. and 6. in the procedure defined in Section 2.3.3.1).
We note that in case (1), the number of evaluations for each generation g is $(n_{init}^{(g)} + n_{ic}^{(g)})$.
In case (2), every generation g, the algorithm evaluates $(1 + n_{ic}^{(g)})$ individuals, since $n_{init}^{(g)} = 1$.
The results on different test functions are summarized in Table 2.3.
On the unimodal functions fSchw and fSchw1/4 , setting ninit and nb as small as possible in
every generation is marginally more efficient than the default definition of the initial
parameters in small dimensions, except for dimension n = 8 and λ = 10. On fRosen , nlmm-CMA2 is


the most efficient compared to other approaches, except for dimension n = 8 and λ = 10
which can be justified by a higher number of unsuccessful runs compared to other ap-
proaches. On the multi-modal function fAck , modifying the initial parameter ninit does
not have an important impact on the speedup of lmm-CMA (between 1.5 and 4). However
on fRast , using a small initial ninit decreases considerably the probability of success of the
optimization, from 0.95 to between 0.35 and 0.10 for dimension n = 2 and λ = 50, and
from 0.60 to 0.10 for dimension n = 5 and λ = 140. These results confirm the initial
parameter choice suggested in [87].

Table 2.3: SP1, standard deviations of the number of function evaluations for successful runs and speedup performance spu, to reach
fstop = 10−10 of nlmm-CMA, nlmm-CMAI (intermediate recombination and default initial parameters), nlmm-CMA1 (default recombination,
initial values of ninit and nb set to 1) and nlmm-CMA2 (default recombination type, ninit = 1 and nb = 1 during the whole optimization
process). The ratio of successful runs is denoted between brackets if it is < 1.0.

Function n λ ǫ nlmm-CMA spu nlmm-CMAI spu nlmm-CMA1 spu nlmm-CMA2 spu


fRosen 2 6 252 ± 52 3.1 357 ± 67 2.2 250 ± 80 3.1 229 ± 53 3.4
4 8 719 ± 54 [0.85] 3.0 833 ± 100 2.6 596 ± 55 3.7 575 ± 68 3.8
8 10 2234 ± 202 [0.95] 2.4 2804 ± 256 [0.95] 1.9 2122 ± 133 2.5 2466 ± 207 [0.85] 2.1
fSchw 2 6 87 ± 7 4.4 110 ± 10 3.5 75 ± 8 5.2 73 ± 7 5.3
4 8 166 ± 6 5.4 220 ± 15 4.1 138 ± 6 6.5 136 ± 5 6.6
8 10 333 ± 9 6.2 423 ± 15 4.9 374 ± 16 5.6 380 ± 21 5.5

16 12 855 ± 30 6.2 947 ± 24 5.6 794 ± 27 6.7 786 ± 37 6.8



fSchw1/4 2 6 413 ± 25 3.3 550 ± 29 2.4 411 ± 20 3.3 398 ± 16 3.4
4 8 971 ± 36 2.9 1320 ± 76 2.2 938 ± 32 3.1 909 ± 30 3.1
8 10 2714 ± 41 2.2 2714 ± 257 2.2 2668 ± 40 2.2 2677 ± 36 2.2
fNSphere 2 6 .35 109 ± 12 3.1 135 ± 19 2.5 92 ± 11 3.7 87 ± 9 3.9
4 8 .25 236 ± 19 3.1 306 ± 40 2.4 216 ± 16 3.4 219 ± 16 3.4
8 10 .18 636 ± 33 2.4 788 ± 47 2.0 611 ± 35 2.5 619 ± 45 2.5
16 12 .13 2156 ± 216 1.3 2690 ± 421 1.1 2161 ± 148 1.3 2195 ± 142 1.3
fAck 2 5 227 ± 23 3.5 329 ± 29 [0.85] 2.4 226 ± 21 [0.95] 3.5 208 ± 19 3.8
5 7 704 ± 23 [0.90] 3.0 850 ± 43 [0.90] 2.5 654 ± 35 [0.95] 3.2 652 ± 32 [0.95] 3.2
10 10 2066 ± 119 [0.95] 1.8 2159 ± 58 1.8 2394 ± 52 [0.80] 1.6 1925 ± 44 2.0
fRast 2 50 524 ± 48 [0.95] 4.7 796 ± 68 [0.75] 3.1 569 ± 26 [0.35] 4.3 1365 ± 28 [0.10] 1.8
5 140 4037 ± 209 [0.60] 2.6 5265 ± 313 [0.55] 2.0 13685 ± 257 [0.10] 0.8 7910 ± 82 [0.10] 1.3

2.4 Summary and discussions

In this chapter, we have introduced the stochastic optimizer CMA-ES, as well as an
adaptive penalization with rejection technique in order to handle the optimization constraints.
We have explained that CMA-ES exhibits many invariances, a desirable property as it
implies the generalization of results from one function to a class of functions and thus
confers robustness and wider applicability on the method. In particular, CMA-ES is a rank-based
search algorithm exploiting the objective function only through the relative ranking
of solutions within the population. The rank-based property implies invariance of the
algorithm on the class of functions $\{g \circ f,\ g : \mathbb{R} \to \mathbb{R} \text{ strictly increasing}\}$ for any
$f : \mathbb{R}^n \to \mathbb{R}$.
In order to improve its performance when dealing with costly objective functions, the
CMA-ES algorithm has been combined with local meta-models that are constructed using
points from the archive of solutions–called the training set–evaluated on the (expensive)
original objective function. The quality of the meta-models is appraised using an ap-
proximate ranking procedure that determines if the objective function predicted by the
meta-model is good enough or more points should be evaluated on the original function.
The resulting algorithm is called the local-meta-model CMA-ES (lmm-CMA) [87] (Sec-
tion 2.3.1). In this chapter, the original acceptance criterion for the meta-models proposed
for lmm-CMA has been shown to be too conservative for increasing population sizes (Sec-
tion 2.3.2) and modified in order to maintain a reasonable speed-up when population sizes
larger than the default one are used (Section 2.3.3). The proposed new variant is called
the new-local-meta-model CMA-ES (nlmm-CMA).
In particular, we have investigated in this chapter the performances of the lmm-CMA
algorithm coupling CMA-ES with local meta-models. On fRosen and fSchw1/4 , we have
shown that the speedup of lmm-CMA with respect to CMA-ES drops to one when the
population size λ increases. This phenomenon has been explained by the too restrictive
condition used to stop evaluating new points dedicated to refining the meta-model, namely
requiring that the exact ranking of the µ = λ/2 best solutions is preserved when evaluating
a new solution on the exact objective function. To tackle this problem, we have proposed
to relax the condition to: the set of µ best solutions is preserved and the best individual
is preserved. The resulting new variant, nlmm-CMA outperforms lmm-CMA on the test
functions investigated and the speedup with CMA-ES is between 1.5 and 7. Moreover,
contrary to lmm-CMA it maintains a significant speedup, between 2.5 and 4, when increas-
ing λ on fRosen , fSchw1/4 and fRast . The study of the impact of the recombination weights
has shown that the default weights of CMA-ES are more appropriate than equal weights.
The influence of two parameters, nb and ninit , corresponding to the number of individu-
als evaluated respectively initially and in each iteration cycle has been investigated. We


have seen that setting those parameters to 1 during the whole optimization process can
marginally improve the performances on uni-modal functions and some multi-modal test
functions. However, it increases the likelihood of getting stuck in local minima on the Rastrigin
function, suggesting that the default parameters of lmm-CMA are still a good choice for
nlmm-CMA.

Chapter 3

Well placement optimization with CMA-ES and CMA-ES with meta-models

This chapter is based on the papers [26, 24]. In this chapter, we apply the CMA-ES
algorithm to the well placement problem, with the adaptive penalization with rejection
technique (introduced in Chapter 2) to handle constraints. Because genetic algorithms are
quite often the method of choice in petroleum industry, we first show the improvement
of applying CMA-ES over a GA on the synthetic benchmark reservoir case PUNQ-S3. In
addition, because a reservoir simulation and thus the objective function is expensive, we
apply the nlmm-CMA algorithm introduced in the previous chapter in order to save a
number of evaluations by building a model of the problem. We validate the approach on
the PUNQ-S3 case.
This chapter is structured as follows. Section 3.1 describes the problem formulation.
In Section 3.2, CMA-ES is compared to a genetic algorithm on a synthetic reservoir case to
show the contribution of the proposed optimization method. In Section 3.3, the reduction
of the number of reservoir simulations is addressed by coupling CMA-ES with meta-models
and the contribution of the whole methodology, i.e., CMA-ES with meta-models is demon-
strated on a number of well location and trajectory optimization problems (with unilateral
and multilateral wells).

3.1 The well placement optimization problem formulation

In this section, we describe the well placement optimization problem and explain the
parameterization of the wells.


3.1.1 Objective function

The quality of a well placement decision is evaluated using an objective function that we
aim at maximizing (good solutions have a high objective function value and we aim at
finding the solution with the highest objective function value). The objective function as-
sociated with a well placement problem often evaluates the economic model of the decision
and takes into account different costs such as prices of oil and gas, costs of drilling and
costs of injection and production of water. Another alternative is to use the cumulative oil
production or the barrel of oil equivalent (BOE). In this chapter, the objective function
considered is the net present value NPV. Formally we want to find a vector of parameters
$p_{max}$ such that:

$$ \mathrm{NPV}(p_{max}) = \max_{p} \left\{ \mathrm{NPV}(p) \right\} \,. \qquad (3.1) $$

The NPV of a well configuration and trajectory represented by a vector of parameter


p is calculated using two terms, the expected revenue associated to p, denoted R, and the
drilling and completing cost of p, denoted Cd , which is subtracted from the revenue term, i.e.,:

NPV(p) = R(p) − Cd (p) . (3.2)

The revenue term R is defined by summing the revenues from produced oil over all the
wells, and subtracting the costs associated to produced water and to injected water. A
discount rate –called also an annual percentage rate– is introduced to take into account the
risk and uncertainty and the time value of money, that is oil produced earlier contributes
more to the overall NPV. The detailed formula for the revenue term reads:
$$ R = \sum_{n=1}^{Y} \frac{1}{(1 + \mathrm{APR})^n} \begin{pmatrix} Q_{n,o} \\ Q_{n,g} \\ Q_{n,wa} \end{pmatrix}^{T} \begin{pmatrix} C_{n,o} \\ C_{n,g} \\ C_{n,wa} \end{pmatrix} \,, \qquad (3.3) $$

where $Q_{n,ph}$ is the field production of phase $ph$ (either oil, gas or water, denoted respectively
$o$, $g$, $wa$) at period $n$ and $C_{n,ph}$ is the profit or loss associated to this production. The annual
percentage rate is denoted APR. The integer Y is the number of discount periods (years).
For the drilling and completing cost term Cd , we use the approximate formula used in
[129], which estimates the drilling cost as the sum of two terms: the first term is
proportional, for each lateral, to its diameter times its length multiplied by the logarithm of this
length (taking into account that the cost grows more than linearly in the length); the
second term adds up a fixed cost per junction, i.e.,:

$$ C_d = \sum_{w=1}^{N_w} \left( \sum_{k=0}^{N_{lat}} \left[ A \cdot d_w \cdot \ln(l_w) \cdot l_w \right]_{k,w} \right) + \sum_{m=1}^{N_{jun}} \left[ C_{jun} \right]_m \,, \qquad (3.4) $$


Table 3.1: Constants used to define the net present value (NPV).
Constant Value
Cn,o 60 $ / barrel
Cn,wa -4 $ / barrel
Cn,g 0 $ / barrel
APR 0.1
A 1000
dw 0.1 m
Cjun 105 $

where k = 0 represents the mainbore, k > 0 represents the laterals, lw is the length of
the lateral (in ft), dw is the diameter of the mainbore (in ft), Nw is the number of wells
drilled, Nlat is the number of laterals and A is a constant specific to the considered field
containing conversion factors. Cjun is the cost of milling the junction and Njun is the
number of junctions.
For this chapter, the constants used to define the NPV in Eqs. (3.3) and (3.4) are given
in Table 3.1.
The computation of the NPV of a configuration p requires a prediction of the
quantities of oil, water and gas (Qn,o , Qn,wa , Qn,g ) associated to p in order to compute
the revenue R given by Eq. (3.3). To compute those quantities we use a reservoir simulation,
which represents the time-consuming part of the computation of the NPV objective
function.
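As an illustration, the following sketch evaluates Eqs. (3.2)-(3.4) with the constants of Table 3.1, assuming yearly production profiles are available (in practice they come from the reservoir simulation); the cost of injected water is omitted here for brevity, and all function and argument names are ours.

```python
import numpy as np

def npv(Q_oil, Q_gas, Q_water, laterals, n_junctions,
        C_oil=60.0, C_gas=0.0, C_water=-4.0, apr=0.1,
        A=1000.0, d_w=0.1, C_jun=1e5):
    """NPV sketch following Eqs. (3.2)-(3.4) with the constants of Table 3.1.

    Q_* are yearly field productions (barrels) over the Y discount periods;
    'laterals' is a list of lateral lengths l_w (mainbores and laterals of
    all wells); n_junctions is the total number of milled junctions.
    """
    years = np.arange(1, len(Q_oil) + 1)
    discount = (1.0 + apr) ** -years            # discount factors, Eq. (3.3)
    revenue = np.sum(discount * (np.asarray(Q_oil) * C_oil
                                 + np.asarray(Q_gas) * C_gas
                                 + np.asarray(Q_water) * C_water))
    drilling = sum(A * d_w * np.log(l) * l for l in laterals) \
               + n_junctions * C_jun            # drilling cost, Eq. (3.4)
    return revenue - drilling                   # Eq. (3.2)

# Example: two unilateral wells, 10 years of (synthetic) production data.
Q_o = [2.0e6] * 10; Q_g = [0.0] * 10; Q_w = [5.0e5] * 10
print(npv(Q_o, Q_g, Q_w, laterals=[800.0, 650.0], n_junctions=0))
```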
It is in general needed to impose different constraints on the well configuration to avoid
finding both undrillable wells and wells that violate common engineering practices. The
constraints handled in this thesis are as follows:

• maximum length of wells: lw < Lmax , for each well w to be placed;

• all wells must be inside the reservoir grid: lw = linside , for each well w to be placed,
where linside is the length of the well w inside the reservoir grid.

3.1.2 Well parameterization

In our approach, we want to be able to handle different possible configurations of multi-


lateral wells. An illustrative scheme is given in Fig. 3.1. The terminology used to define
each part of a multilateral well follows the terminology used in [77]. In general, a lateral
can be defined by a line connecting two points. The mainbore is defined through the
trajectory of its contiguous completed segments. Hence, we define a sequence of points
where a deviation occurs (Pd,i )0≤i≤Ns where Ns is the number of segments. The starting
point Pd,0 = P0 of the mainbore called the heel is represented by its Cartesian coordinates



Figure 3.1: An example of a single multilateral well parameterization with two segments
(Ns = 2) and one branch (Nb = 1).

(x0 , y0 , z0 ). Other intermediate points (Pd,i )1≤i≤Ns −1 and the ending point Pd,Ns , called
the toe, are represented by their corresponding spherical coordinates (rd,i , θd,i , ϕd,i )
with respect to the basis $(P_{d,i-1}, u_{d,i}^{r}, u_{d,i}^{\theta}, u_{d,i}^{\varphi})$. We use spherical coordinates because
they allow for straightforward control of the well lengths by imposing a box constraint,
whereas this would need to be handled by imposing a non-linear constraint with Cartesian
coordinates.
The wells are parameterized in a way to handle a number Nb of branches and/or
laterals as well.
The branch or lateral j ∈ [1, · · · , Nb ] is defined by locating its ending point Pb,j (lb,j ,
rb,j , θb,j , ϕb,j ) where (rb,j , θb,j , ϕb,j )1≤j≤Nb represents the spherical coordinates of Pb,j with
respect to the basis $(Q_j, u_{b,j}^{r}, u_{b,j}^{\theta}, u_{b,j}^{\varphi})$, Qj is the starting point of the branch or the lateral
j, and lb,j is the distance along the well between P0 and Qj .
The dimension Dw of the representation of a well denoted by w is as follows:

$$ D_w = 3\,(1 + N_s^w) + 4\,N_b^w \,. \qquad (3.5) $$

Hence, the dimension D of the problem of placing Nw wells (wk )k=1,··· ,Nw is:

$$ D = \sum_{k=1}^{N_w} D_{w_k} \,. \qquad (3.6) $$

An example of a single well parameterization is shown in Fig. 3.1. In this example,


Ns is equal to two and Nb is equal to one. The mainbore is then represented by three
points P0 and (Pd,i )1≤i≤2 . The branch is represented by one point Pb,1 . The corresponding
dimension of the optimization problem is 13.
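The following sketch converts such a parameterization into Cartesian points and computes the well length. For simplicity it expresses every segment in one fixed global basis, whereas the thesis attaches a local basis $(P_{d,i-1}, u_{d,i}^{r}, u_{d,i}^{\theta}, u_{d,i}^{\varphi})$ to each segment; the angle convention used below is also an assumption, and the function names are ours.

```python
import numpy as np

def mainbore_points(p0, segments):
    """Cartesian coordinates of the deviation points (P_d,i) of a mainbore.

    p0: heel coordinates (x0, y0, z0); segments: list of (r, theta, phi)
    per segment, with theta the polar angle from the z-axis and phi the
    azimuth (one common spherical convention).
    """
    pts = [np.asarray(p0, dtype=float)]
    for r, theta, phi in segments:
        step = r * np.array([np.sin(theta) * np.cos(phi),
                             np.sin(theta) * np.sin(phi),
                             np.cos(theta)])
        pts.append(pts[-1] + step)
    return pts

def well_length(pts):
    """Total length along the well, to be checked against L_max."""
    return sum(np.linalg.norm(b - a) for a, b in zip(pts, pts[1:]))

# Example: a two-segment mainbore (Ns = 2) as in Fig. 3.1. Since each r is
# bounded directly, the maximum-length constraint becomes a box constraint.
pts = mainbore_points((0.0, 0.0, 0.0),
                      [(400.0, np.pi / 2, 0.3), (350.0, np.pi / 3, 0.1)])
print(well_length(pts))  # 750.0
```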

3.2 CMA-ES and a real-coded GA for the well placement problem

The choice of a stochastic optimization method was motivated by the ability of this type of
algorithms to deal with non-smooth, non-convex and multi-modal functions. In addition,
stochastic optimization does not require any gradients and can be easily parallelized. So
far, the most popular stochastic approaches for tackling well placement have been genetic
algorithms encoding the real parameters to be optimized as bit-strings. However, it is known
in the stochastic algorithm community that representing real vectors as bit strings leads to
poor performance [122]. Recently, a comparison between binary and real representations
on a well placement problem in a channelized synthetic reservoir model has been made,
showing that the continuous variant outperforms the binary one [33].
This section compares a real-coded GA with CMA-ES on a well placement problem.
To allow a fair comparison, both algorithms are used without parameter tuning. Indeed,
tuning an algorithm requires some extra objective function evaluations that would need to
be taken into account otherwise. Default parameters are used for the CMA-ES algorithm1
and typical parameter values for the GA.

3.2.1 Well placement using CMA-ES

The initial population is drawn from a normal distribution whose mean vector is uniformly
drawn within the reservoir. Parameters were defined according to default settings [71].
The population size λ is an important parameter of CMA-ES [71]. The default popula-
tion size value equals 4+⌊3×ln(D)⌋, where D is the dimension of the problem. Independent
restarts with increasing population size are suggested in [12]. In this thesis, the optimal
tuning of the population size was not addressed. However, due to the difficulty of the
problem at hand, we use a population size greater than the default value.

3.2.2 Well placement using GA

Genetic algorithms [78, 79] are stochastic search algorithms that borrow some concepts
from nature. Similar to CMA-ES, GAs are based on an initial population of individuals.
Each individual represents a possible solution to the problem at hand. Starting with
¹ With the exception of the population size, where the default setting is known to be good for non-rugged
landscapes but needs to be increased otherwise [71].


Table 3.2: GA parameters: the probabilities to apply GA operators, i.e., crossover and
mutation.
Constant Value
crossprob 0.7
mutprob 0.1

an initial population of points called individuals or chromosomes, candidate solutions
evolve at each iteration by selection, mutation and recombination until the stopping
criteria are reached with a satisfactory solution. The correspondence between a solution
candidate solutions evolve by selection, mutation and recombination until reaching the
stopping criteria with a satisfactory solution. The correspondence between a solution
and its representation needs to be defined. In general, simple forms like an array or a
matrix of integer or bit elements are used. In this section, individuals are parameterized
as defined for CMA-ES (see Section 3.1.2). Hence, well coordinates are defined using a
real encoding. Elitism is used to make sure that the best chromosome survives to
the next generation. The operators used are defined as follows:

• The crossover starts with two parent chromosomes and unites them at a point
to create two new elements. The greater a chromosome's fitness rank, the higher the
probability that it will be selected. After selecting the two parents, crossover is applied with a
probability denoted crossprob. To apply the crossover, we randomly draw an index i
between 1 and D and a number c between 0 and 1. Let us denote the two parents by
(x1,j )1≤j≤D and (x2,j )1≤j≤D ; then x1,i ← c × x1,i + (1 − c) × x2,i and x2,i ← c × x2,i
+ (1 − c) × x1,i (both operators are sketched in code after this list).

• The mutation, instead, starts with one individual and randomly changes some of
its components. Mutation is applied to all chromosomes, except the one with the
best fitness value, with a probability of mutation denoted mutprob. In this case, we
randomly draw an index i. Let us denote the selected chromosome by (xj )1≤j≤D ,
then xi ← mini + c × (maxi − mini ), where mini and maxi are the minimum and
the maximum values that can be taken by the ith coordinate of the chromosome and
c is a number randomly drawn between 0 and 1.
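As referenced above, here is a minimal sketch of the two operators (our own implementation of the description, with hypothetical helper names):

```python
import numpy as np

def crossover(x1, x2, rng):
    """Arithmetic one-coordinate crossover as described above."""
    x1, x2 = x1.copy(), x2.copy()
    i = rng.integers(len(x1))   # random index between 1 and D (0-based here)
    c = rng.random()            # random number between 0 and 1
    # both right-hand sides use the original values of x1[i] and x2[i]
    x1[i], x2[i] = c * x1[i] + (1 - c) * x2[i], c * x2[i] + (1 - c) * x1[i]
    return x1, x2

def mutate(x, lower, upper, rng):
    """Uniform one-coordinate mutation within [lower_i, upper_i]."""
    x = x.copy()
    i = rng.integers(len(x))
    x[i] = lower[i] + rng.random() * (upper[i] - lower[i])
    return x

rng = np.random.default_rng(42)
a, b = crossover(np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]), rng)
m = mutate(a, lower=np.zeros(3), upper=np.full(3, 10.0), rng=rng)
```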

The mutation and crossover probabilities are set to typical values (see Table 3.2)1 .
To handle the constraints, the genetic algorithm is combined with the Genocop III
technique (Genetic Algorithm for Numerical Optimization of Constrained Problems) [47].
This procedure maintains two separate populations. The first population called the search
population contains individuals which can be unfeasible. The second population, the
reference population, consists of individuals satisfying all constraints (linear and non-
linear), called reference individuals. Feasible search individuals and reference individuals
¹ A good choice of the crossover probability is said to be in between 0.4 and 0.9 [128, 39], 0.6 and 0.8
[66], 0.6 and 0.95 [51, 55], 0.6 and 0.8 [120]. A good choice of the mutation probability is said to be in
between 0.001 and 0.1 [39, 51, 55], 0.005 and 0.05 [120], 0.05 and 0.1 [128].


are evaluated directly using the objective function. However, unfeasible individuals are
repaired before being evaluated. More details about Genocop III can be found in [97].

Figure 3.2: Elevation (in meters) and geometry of the PUNQ-S3 test case.

3.2.3 Well placement performance

All tests performed in the present chapter are conducted on the PUNQ-S3 test case [54]. PUNQ-S3 is a case taken from a reservoir engineering study on a real field, and qualifies as a small-size industrial reservoir model. The model grid contains 19 cells in the x-direction, 28 cells in the y-direction and 5 cells in the z-direction. The cell size is 180 m in the x and y directions and 18 m in the z-direction. We suppose that the field initially contains no production or injection well. The elevation of the field and its geometry are shown in Fig. 3.2. We plan to drill two wells: one unilateral injector and one unilateral producer. The dimension of the problem is then equal to 12 (= 6 × 2).
To compare the results obtained by CMA-ES and the genetic algorithm, 14 runs were performed for each algorithm. A streamline simulator is used during the optimization. In this comparison, the bottomhole pressure imposed on the producer is fixed to 80 bar, and the bottomhole pressure imposed on the injector is fixed to 6000 bar, which is unrealistically high. This unrealistic value was used only for the sake of comparison between the two optimization methods.

The population size is set to 40 for both algorithms. The stopping criterion is also the same for both methods: a maximum number of iterations equal to 100. The size of the reference population for Genocop III is set to 60. Well lengths are constrained with a maximum well length Lmax = 1000 meters.
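For illustration only, this setup can be sketched with, e.g., Nikolaus Hansen's freely available cma Python package, which implements CMA-ES. The objective below is a toy placeholder: in the actual experiments its role is played by the negated NPV returned by the reservoir simulator, with the constraint-handling technique applied.

import cma  # Nikolaus Hansen's CMA-ES package (pip install cma)

def negative_npv(x):
    """Toy placeholder: in the actual experiments, the reservoir simulator is
    called on the 12 well parameters and -NPV is returned (CMA-ES minimizes),
    after the adaptive penalization with rejection has been applied."""
    return sum((xi - 1.0) ** 2 for xi in x)

x0 = [0.0] * 12      # 6 parameters per well, 2 wells
sigma0 = 0.3         # initial step-size (a problem-dependent choice)
res = cma.fmin(negative_npv, x0, sigma0,
               options={'popsize': 40,     # population size used here
                        'maxiter': 100})   # stopping criterion of the runs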



Figure 3.3: The mean value of NPV (in US dollar) and its corresponding standard devi-
ation for well placement optimization using CMA-ES (solid line) and GA (dashed line).
Fourteen runs are performed for each algorithm. Constraints are handled using an adaptive
penalization with rejection technique for CMA-ES and using Genocop III for GA.


Figure 3.4: The mean number of unfeasible individuals per generation and its corre-
sponding standard deviation using CMA-ES with an adaptive penalization with rejection
technique. Here, we consider only unfeasible individuals far from the feasible domain, i.e.,
resampled individuals.



Figure 3.5: The positions of solution wells found by 14 runs of CMA-ES projected on the top face of the reservoir. Injectors are represented by dashed lines, producers by solid lines.

Fig. 3.3 shows the average performance, with its standard deviation, of the well placement optimization using both algorithms, measured by the overall best objective function value. It is clear that CMA-ES outperforms the GA: the genetic algorithm adds only 40% to the best NPV obtained by a randomly sampled configuration, i.e., in the first generation of the optimization, whereas CMA-ES adds 80%.

Fig. 3.4 shows that CMA-ES handles the constraints used successfully. The number of well configurations resampled, i.e., far from the feasible domain, approaches 0 at the end of the optimization. Hence, after a number of iterations, the majority of the well configurations generated by CMA-ES are either feasible or close to the feasible domain.
Fig. 3.5 shows the positions of the "optimum" wells obtained from 14 runs using CMA-ES. In 11 of the 14 runs performed, CMA-ES identifies the same potential zone in which to place the producer and the injector. This region gives an NPV between $1.99 × 10^10 and



Figure 3.6: The positions of solution wells found by 14 runs of the GA projected on the top face of the reservoir. Injectors are represented by dashed lines, producers by solid lines.


$2.05 × 10^10. In the other three runs, CMA-ES finds each time a different local optimum, with NPV values equal to $1.83 × 10^10, $1.95 × 10^10 and $2.05 × 10^10. Despite the large number of local optima, CMA-ES succeeds in providing satisfactory results in 93% of the runs, if we consider that a run is satisfactory when it gives an NPV greater than or equal to $1.95 × 10^10.
For the genetic algorithm, 14 runs were performed to trace the different "optimum" well configurations in Fig. 3.6. Well configurations are not concentrated in some well-defined regions, and have an NPV mean value equal to $1.68 × 10^10 with a standard deviation equal to 1.06 × 10^9. The GA leads to well configurations dispersed over a large zone. The maximum NPV value obtained by the GA is equal to $1.86 × 10^10, and it corresponds to a well configuration close to one obtained by CMA-ES with an NPV of $2.05 × 10^10.
Results confirm that CMA-ES is able to find, in the majority of the runs, a solution in the same potential region. In 93% of the runs on the considered test case, CMA-ES finds a well configuration with a satisfactory NPV value. However, the GA has difficulty defining this potential region and seems to converge prematurely in different regions. Premature convergence in the GA is most certainly due to the lack of mechanisms that (1) would play the role of the step-size mechanism in CMA-ES, which is able to increase the step-size in linear environments, and (2) would play the role of the covariance matrix adaptation mechanism, which adapts the main search directions (elongating / shrinking certain directions and learning the principal axes of the problem) to solve ill-conditioned problems efficiently. Without this latter mechanism, premature convergence is commonly observed on ill-conditioned problems.

3.3 Application of CMA-ES with meta-models on the PUNQ-S3 case

In this section we apply CMA-ES with meta-models to the well placement optimization problem. The proposed approach is able to handle different possible well configurations as defined in Section 3.1.2. The use of local meta-models instead of a global one is motivated by the fact that we want the algorithm to be able to handle multi-modal functions, or unimodal functions which a global quadratic model would model poorly. In the following, we use the variant nlmm-CMA2 defined in Section 2.3.3.4. For nlmm-CMA2, (1 + n_ic) individuals are evaluated for a given generation, where n_ic is the number of iteration cycles needed to satisfy the meta-model acceptance criterion. In this section, the performance of the approach is demonstrated on two cases.


Figure 3.7: The mean value of NPV (in US dollar) and its corresponding standard deviation
for well placement optimization using CMA-ES with meta-models (solid line) and CMA-
ES (dashed line). Ten runs are performed for each algorithm. Constraints are handled
using an adaptive penalization with rejection technique.

3.3.1 Placement of one unilateral producer and one unilateral injector

In this application, we consider a placement problem of one unilateral injector and one
unilateral producer on the PUNQ-S3 case. Parameters of the problem are the same as for
the example in Section 3.2.3, except for the following differences:

• a commercial reservoir simulator is used to evaluate the field production of each phase instead of the streamline simulator;

• the bottomhole pressure imposed on the producer is fixed to 150 bar;

• the bottomhole pressure imposed on the injector is fixed to 320 bar.

To define the parameters of the meta-model, we choose k, the number of points used to build the meta-model, equal to 100. Meta-models are used once the training set contains at least 160 pairs of points with their evaluations. For each method, i.e., CMA-ES and CMA-ES with local meta-models (lmm-CMA), 10 runs were performed. The evolution of the NPV mean value in terms of the mean number of reservoir simulations is represented in Fig. 3.7.
Fig. 3.7 shows that, for the same number of reservoir simulations, combining CMA-ES with meta-models allows reaching higher NPV values than CMA-ES alone, given a


Figure 3.8: The mean number of reservoir simulations needed to reach a given NPV value
using CMA-ES with meta-models (solid line) and CMA-ES (dashed line). Ten runs are
performed for each algorithm.

restricted budget of reservoir simulations. A better representation is to show the mean number of reservoir simulations needed to reach a certain value of NPV for CMA-ES and for CMA-ES with meta-models (Fig. 3.8). To reach an NPV value of $9 × 10^9, lmm-CMA requires only 659 reservoir simulations, while CMA-ES requires 880 reservoir simulations. If we consider that an NPV equal to $9 × 10^9 is satisfactory, using meta-models reduces the number of reservoir simulations by 25%. For an NPV value equal to $9.6 × 10^9, the use of meta-models reduces the number of reservoir simulations by 19%. Figs. 3.7 and 3.8 highlight the contribution of meta-models in reducing the number of reservoir simulations. Results also show that, in addition to reducing the number of objective function evaluations, the method still succeeds in reaching high NPV values, and results are similar to those obtained by CMA-ES. As for the example in Section 3.2.3, the well placement optimization still succeeds in identifying, in the majority of the runs, the same potential region containing the optimum wells. In the following, we present detailed results obtained by only one of the solution well configurations proposed by lmm-CMA. The selected solution well configuration is denoted optimized config in the sequel. Optimized config is then compared to two configurations designed after some trials so as to represent the decision of a reservoir engineer (denoted config.1 and config.2 ). The locations and trajectories of the considered well configurations are shown in Fig. 3.9.
The engineer's proposed configurations were defined according to the SoPhiH map (Fig. 3.9), which represents the distribution of the hydrocarbon pore volume over the n_layers layers, defined by \sum_{k=1}^{n_layers} (H_k × φ × S_o), where H_k is the gross thickness of layer k, S_o is the oil saturation and φ is the porosity.

Figure 3.9: The SoPhiH map, with solution well configuration obtained using CMA-ES
with meta-models (PROD-O, INJ-O) and two engineer’s proposed well configurations
(PROD-1, INJ-1 and PROD-2, INJ-2).


[Figure 3.10 comprises three panels versus time (days): cumulative oil production (bbl), oil rate (bbl/d) and water cut, for optimized config., config.1 and config.2.]

Figure 3.10: Production curves for an optimized solution using CMA-ES with meta-models
(optimized config.) and two engineer’s proposed configurations (config.1 and config.2 ).


PROD-c and INJ-c denote respectively the producer and the injector corresponding to well configuration c, where c is either 1, 2 or O, referring respectively to config.1, config.2 and optimized config. The engineer's proposed wells are horizontal wells whose producers (PROD-1 = PROD-2) are placed in the top layer (k = 1) and whose injectors are placed in the bottom layer (k = 5). In contrast, the producer and the injector of optimized config are inclined wells placed in layer k = 3. The engineer's proposed producer is placed in the region with the highest SoPhiH value.
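As an illustration, the SoPhiH map can be computed from the gridded properties as follows; this is a minimal sketch assuming the properties are available as 3D NumPy arrays indexed by layer, with made-up property values.

import numpy as np

def sophih_map(H, phi, So):
    """SoPhiH map: sum over the layers k of H_k * phi * S_o, i.e., the
    hydrocarbon pore thickness of each (x, y) column of the grid.
    H, phi, So: arrays of shape (nx, ny, nlayers)."""
    return np.sum(H * phi * So, axis=2)

# example on a 19 x 28 x 5 grid (the PUNQ-S3 dimensions):
H = np.full((19, 28, 5), 18.0)                  # gross layer thickness (m)
phi = np.random.uniform(0.1, 0.3, (19, 28, 5))  # porosity
So = np.random.uniform(0.0, 0.8, (19, 28, 5))   # oil saturation
print(sophih_map(H, phi, So).shape)             # -> (19, 28)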
Fig. 3.10 shows the production curves of the considered well configurations. The cumulative oil production for optimized config during the 11 simulated years equals 205 MMbbl, whereas config.1 offers only 119 MMbbl and config.2 only 102 MMbbl. Therefore, the optimization methodology adds 72% to the best of the considered engineer's proposed well configurations. Optimized config also offers the smallest water cut (0.45, against 0.57 for config.1 and 0.69 for config.2 ).

3.3.2 Placement of one multi-segment producer

In this application, we consider the placement problem of one multi-segment well on the PUNQ-S3 case. We suppose that an injector is already placed in the reservoir; it corresponds to the well denoted INJ-O in Fig. 3.9. We plan to drill a multi-segment well with two completed segments. The dimension of the problem is then equal to 9 (= 6 + 3). The different parameters of the problem are the same as in the example in Section 3.3.1, except for the population size, which is equal to 30. Ten runs were performed with a maximum number of iterations equal to 100.
Fig. 3.11 shows the evolution of the average performance of the well placement, i.e., the NPV mean values and the corresponding standard deviation. Optimizing the placement of one multi-segment producer offers an NPV equal to $1.10 × 10^9 ± 4.37 × 10^7. To reach an NPV mean value of $1.10 × 10^9, the optimization process requires only 504 reservoir simulations.
The positions of the solution wells are shown in Figs. 3.12 and 3.13. In this application, the methodology used succeeds in reaching NPV values greater than $1.09 × 10^9 and in defining an "optimum" well configuration in the same potential region for all the performed runs. Therefore, performing only one run can be conclusive and can ensure convergence to a solution well with a satisfactory NPV.



Figure 3.11: The mean value of NPV (in US dollars) and its corresponding standard deviation for the placement optimization of one multi-segment well using CMA-ES with meta-models. Ten runs are performed.


Figure 3.12: The positions of solution multi-segment producers found by 10 runs of CMA-
ES with meta-models. A zoom on the region containing the solution wells is performed.



Figure 3.13: The positions of solution multi-segment producers found by 10 runs of CMA-
ES with meta-models projected on the top face of the reservoir. A zoom on the region
containing the solution wells is performed.

3.4 Summary and discussions

In this chapter, the stochastic optimization method CMA-ES was applied to a well placement problem. A technique based on adaptive penalization with rejection was developed to handle well placement constraints with CMA-ES. Results showed that this technique ensures that, after a number of iterations, the majority of well configurations generated by CMA-ES are either feasible or close to the feasible domain. The optimization with CMA-ES was compared to a GA, which is the most popular method used in well placement optimization in the literature. Both algorithms were used without parameter tuning, allowing for a direct, fair comparison of the results. Indeed, parameter tuning requires extra function evaluations that should be taken into account when presenting comparison results. In addition, we think that parameter tuning should be done by the designer of the algorithm and not by the user, as it is unrealistic to waste expensive function evaluations on correcting the weaknesses of the design phase. The CMA-ES example shows that providing parameter-free algorithms with robust settings is achievable. CMA-ES was shown to outperform the genetic algorithm on the PUNQ-S3 case by leading to a higher net present value (NPV). Moreover, CMA-ES was shown to be able to define potential regions containing optimal well configurations. On the other hand, the genetic algorithm converged to solutions located in different regions for every performed run. In addition,

those solutions are associated with much smaller NPV values than the solutions found by CMA-ES. On the PUNQ-S3 case, the mean NPV value found by the GA is $1.68 × 10^10, whereas the mean NPV value found by CMA-ES is $2.01 × 10^10. The ability of CMA-ES to find much higher NPV values and to converge to the same region of the search space has been explained by its advanced adaptation mechanism, which allows the algorithm, on ill-conditioned non-separable problems, to adapt its sampling probability distribution in an efficient way.
To tackle the computational issue related to the number of reservoir simulations performed during the optimization, an application of the nlmm-CMA algorithm was demonstrated on the PUNQ-S3 case. The use of meta-models was shown to offer similar results (solution well configurations and the corresponding NPV values) to CMA-ES without meta-models, and moreover to reduce the number of simulations by 19-25% to reach a satisfactory NPV. The comparison of the obtained results with some engineer's proposed well configurations showed that the proposed optimization methodology is able to provide better well configurations in regions that might be difficult for reservoir engineers to determine.

The results presented in this chapter have demonstrated the potentially huge benefit of applying the CMA-ES methodology over more established stochastic techniques for reservoir applications.

Chapter 4

Local-meta-model CMA-ES for partially separable functions

This chapter is based on the paper [23]. In this chapter, we propose a new variant of
the covariance matrix adaptation evolution strategy with local meta-models for optimiz-
ing partially separable functions. We propose to exploit partial separability by building
at each iteration a meta-model for each element function (or sub-function) using a full
quadratic local model. The performance of the proposed algorithm is shown on a number
of mathematical test functions.
This chapter is structured as follows. Section 4.1 defines a general notion of partial
separability. In Section 4.2, we propose a new variant of CMA-ES with meta-models for
partially separable functions. The performance of this variant is evaluated in Section 4.3
on a number of partially separable test functions. The choice of the number of points used
to build the meta-model is also described and the computational cost is discussed.
In the following, we denote the objective function to be optimized by f : R^n → R.

4.1 Partial separability and problem modeling

A function f : R^n → R is partially separable if it can be written as a sum of sub-functions, also called element functions, each depending on a smaller number of variables. Often the particular case where each sub-function depends on a subset of the variables of the original function is defined as partial separability. For instance the Rosenbrock function in Table 4.1 writes:

    f(x) = \sum_{i=1}^{n-1} h(x_i, x_{i+1}) ,    (4.1)

where x = (x_i)_{1≤i≤n} and h(x_i, x_{i+1}) = α(x_i^2 − x_{i+1})^2 + (x_i − 1)^2, and is thus partially separable with each sub-function depending on the subset of variables [(x_i, x_{i+1})]_{i=1,...,n−1}. This particular case of partially separable function is considered for instance in [21, 38, 45].


A more general definition, given in [102], considers that each sub-function can depend on a number of variables that are a linear combination of a subset of variables.

In this thesis we consider a generalization of the previous definitions allowing non-linear combinations of the subset of variables. More precisely, a function f : R^n → R is said to be partially separable if there exists an integer N > 1, a set of integers (n_i)_{1≤i≤N} with n_i < n for all i = 1, ..., N, a set of explicit functions (Φ_i : R^n → R^{n_i})_{1≤i≤N} and a set of functions (f_i : R^{n_i} → R)_{1≤i≤N}, such that f can be written as

    f(x) = \sum_{i=1}^{N} f_i(Φ_i(x)) .

The sub-functions or element functions (f_i)_{1≤i≤N} depend on a number n_i of parameters called element variables. The functions Φ_i will be called mapping functions. Note that the setting of [102] is recovered by taking Φ_i = U_i where U_i is a linear mapping from R^n to R^{n_i}.
For a given partially separable function, there exists “theoretically” an infinite number
of ways to define the element functions and mapping functions. However, one has usually
a restricted knowledge about the structure of the problem that determines the modeling
choice. We can argue that we only know in general that the problem can be decomposed as
a sum of element functions depending on fewer variables, and that there is thus no reason
to encode non-linearity in the variable dependencies. However, a motivating example for
our general definition is the well placement optimization problem, in which we will show in
Chapter 5 that a suitable way to model the objective function is to suppose that the profit
corresponding to a given well depends only on its location and on the distances of this well
to the others. Using the distances between the wells as an element variable implies using
a nonlinear combination of the parameters of the problem (see Chapter 5).
In the well placement problem also, the objective function is computed using numerical software (a reservoir simulator) able to simulate, for a given set of well placements, the quantity of oil, water and gas that can be extracted from each well. Consequently, one has access to the function value of each element function. In the following we will also
assume not only that the function is partially separable but also that one has access to the
function value of each element function. As argued above this assumption is reasonable as
it models the case for the well placement problem. History matching is another problem in
petroleum engineering in which this assumption is reasonable. In history matching prob-
lems, we want to adjust the reservoir model until it closely reproduces the past behavior
of the reservoir (historical production and pressures). For this problem also, we can define
the objective function as a sum of a number of sub-functions defined for each well and
calculated when evaluating the objective function [44].
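This assumption can be made concrete on the Rosenbrock function of Eq. (4.1): a single evaluation returns both the total objective value and the values of all element functions, mimicking a simulator that reports per-well contributions. A minimal Python sketch:

import numpy as np

def rosenbrock_elements(x, alpha=100.0):
    """Partially separable Rosenbrock of Eq. (4.1): returns f(x) together
    with the values of the n-1 element functions f_i(Phi_i(x)), where
    Phi_i(x) = (x_i, x_{i+1})."""
    x = np.asarray(x, dtype=float)
    elements = alpha * (x[:-1] ** 2 - x[1:]) ** 2 + (x[:-1] - 1) ** 2
    return elements.sum(), elements

f_val, f_elems = rosenbrock_elements([0.5, 1.2, -0.3, 0.8])
assert np.isclose(f_val, f_elems.sum())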
Exploiting partial separability or separability is a common approach to enhance the performance of optimization algorithms, in particular when dealing with large-scale optimization. For instance, a trust region algorithm for minimizing partially separable functions

Table 4.1: Test functions. For the block-rotated ellipsoid, Q is a 2 × 2 rotation matrix with each column being a uniformly distributed unit vector.

    Name                        Function
    Rosenbrock                  f^α_{Rosen}(x) = \sum_{i=1}^{n−1} [ α (x_i^2 − x_{i+1})^2 + (x_i − 1)^2 ]
    Rosenbrock^{1/2}            f^α_{Rosen^{1/2}}(x) = \sum_{i=1}^{n−1} [ α (x_i^2 − x_{i+1})^2 + (x_i − 1)^2 ]^{1/2}
    Block-rotated ellipsoid 2D  f^α_{BlockElli-2D}(x, y) = \sum_{i=1}^{2} α^{(i−1)/(n−1)} ((Q × (x, y))_i)^2
    Block-rotated ellipsoid     f^α_{BlockElli}(x) = \sum_{i=1}^{n−1} f^α_{BlockElli-2D}(x_i, x_{i+1})

was proposed in [38]. Separability was also exploited within CMA-ES. A method where
the covariance matrix was constrained to be diagonal has been proposed in [109].

4.2 lmm-CMA for partially separable functions

This section introduces a new algorithm based on nlmm-CMA and exploiting the partial separability of the objective function. This algorithm will be called the partially separable local-meta-model CMA-ES (p-sep lmm-CMA).

In our proposed approach, the partial separability of the objective function is exploited when building the meta-models. The optimization process defined by CMA-ES is not altered. The idea behind exploiting the problem structure when building the meta-model is to improve the quality of the approximate model: the better the quality of the model, the more easily the acceptance criterion can be satisfied, and the fewer evaluations are performed.

Let us consider a partially separable function f. As in Section 4.1, we consider that f has N element functions (f_i)_{1≤i≤N}. With each element function, we associate a mapping function Φ_i such that f(x) = \sum_{i=1}^{N} f_i ∘ Φ_i(x). We suppose that when evaluating a point x on f, we have access to the evaluations (f_i ∘ Φ_i(x))_{1≤i≤N} as well.
In Chapter 2, an approximate function fˆ for a given objective function f is defined using a locally weighted regression based on the training set containing both evaluated points and their values on f. In this chapter, we propose to build a meta-model for each element function f_i, which we denote by fˆ_i. The meta-model fˆ of f is then defined by:

    fˆ = \sum_{i=1}^{N} fˆ_i ∘ Φ_i .    (4.2)

The meta-model fˆi of each element function fi is built in a way quite similar to the


meta-model fˆ of f defined for (n)lmm-CMA in Section 2.3.1.1. The training set is built by storing, for every evaluated point x, Φ_i(x) and its corresponding value on f_i, i.e., f_i(Φ_i(x)). Let us consider an individual q for which Φ_i(q) ∈ R^{n_i} has to be evaluated on the approximate model of f_i. Assuming that the training set contains a sufficient number m_i of elements, we select the k_i ∈ N nearest points (Φ_i(x_j), j = 1, ..., k_i) to Φ_i(q) using the Mahalanobis distance d_i with respect to a matrix C_i, defined for a given point z ∈ R^n as:

    d_i(Φ_i(z), Φ_i(q)) = \sqrt{(Φ_i(z) − Φ_i(q))^T C_i^{−1} (Φ_i(z) − Φ_i(q))} ,    (4.3)

where C_i is an n_i × n_i matrix adapted to the local shape of the landscape of f_i (see below).
Similarly to Section 2.3.1.1, a full quadratic meta-model is used. Using a vector β_i ∈ R^{n_i(n_i+3)/2 + 1}, fˆ_i is defined for a given point z ∈ R^n, for which we denote Φ_i(z) = (ũ_1, ..., ũ_{n_i}), as:

    fˆ_i(Φ_i(z), β_i) = β_i^T z̃_i ,    (4.4)

where z̃_i = (ũ_1^2, ..., ũ_{n_i}^2, ũ_1 ũ_2, ..., ũ_{n_i−1} ũ_{n_i}, ũ_1, ..., ũ_{n_i}, 1)^T. The full quadratic meta-model is built by minimizing the following criterion with respect to β_i:

    B(q) = \sum_{j=1}^{k_i} [ ( fˆ_i(Φ_i(x_j), β_i) − f_i(Φ_i(x_j)) )^2 × K( d_i(Φ_i(x_j), Φ_i(q)) / h ) ] .    (4.5)

K(·) is the kernel weighting function defined, as in Section 2.3.1.1, by K(ζ) = (1 − ζ^2)^2, and h is the bandwidth defined by the distance d_i of the k_i-th nearest neighbor data point to q. For a given element function, k_i must be greater than or equal to k_{i,min} = n_i(n_i+3)/2 + 1. k_i is chosen to be equal to 2 × k_{i,min}. The choice of k_i will be discussed in Section 4.3.3. The sufficient size of the training set, denoted above by m_i, must then be greater than or equal to k_i.
Hence, the approximate function of f, which corresponds to fˆ(x) = \sum_{i=1}^{N} fˆ_i(Φ_i(x)), is incorporated into CMA-ES using the approximate ranking procedure as detailed in Section 2.3.
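A minimal sketch of how each element meta-model fˆ_i can be fitted and queried following Eqs. (4.3)-(4.5) is given below. For readability, the distance is taken with C_i = I (plain Euclidean); in p-sep lmm-CMA the Mahalanobis distance with the adapted matrix C_i is used instead. The function names and the toy usage are assumptions of this sketch.

import numpy as np

def quad_features(u):
    """Full quadratic basis of Eq. (4.4): squares, cross terms, linear terms, 1."""
    n = len(u)
    cross = [u[i] * u[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate([u ** 2, cross, u, [1.0]])

def lwr_predict(q, X, y, k):
    """Estimate f_i at the query q by locally weighted quadratic regression:
    select the k nearest training points (rows of X, values y), weight them
    with K(z) = (1 - z^2)^2 and bandwidth h = distance of the k-th neighbor,
    then solve the weighted least squares of Eq. (4.5)."""
    d = np.linalg.norm(X - q, axis=1)    # Euclidean distances, i.e., C_i = I
    idx = np.argsort(d)[:k]              # k nearest neighbors (assumes k <= len(X))
    h = d[idx[-1]]
    w = (1.0 - (d[idx] / h) ** 2) ** 2   # kernel weights; k-th neighbor gets 0
    Z = np.array([quad_features(x) for x in X[idx]])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * y[idx], rcond=None)
    return quad_features(q) @ beta

# toy usage: fit f(u) = u_1^2 + u_2 from 40 random 2-D points
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (40, 2))
y = X[:, 0] ** 2 + X[:, 1]
print(lwr_predict(np.array([0.5, 0.5]), X, y, k=12))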
It remains now to describe how the matrices (C_i)_{1≤i≤N} are obtained. They are built in an iterative manner. At each iteration, after the approximate ranking procedure, each of the λ candidate solutions, denoted (X_m)_{1≤m≤λ} and sampled according to Eq. (2.2), has either been evaluated on f or has an associated approximate meta-model value given by Eq. (4.2). Thus, for each i, the vectors Φ_i(X_m) ∈ R^{n_i} have either been evaluated on f_i or have an associated estimate of f_i provided by fˆ_i. We then consider the vectors Φ_i(X_m) ∈ R^{n_i} for 1 ≤ m ≤ λ and rank them according to f̃_i, where f̃_i equals f_i if X_m was evaluated on f, and fˆ_i otherwise. The µ best solutions ordered according to f̃_i are used as input variables in Algorithm 1 to update the covariance matrix C_i.


Algorithm 1: CMA-Update(x_1, ..., x_µ)

1. given parameters (ω_i)_{1≤i≤µ}, c_σ, c_c, c_cov, µ_cov, d_σ. Set µ_eff = 1 / \sum_{i=1}^{µ} ω_i^2
2. given m ∈ R^n, p_σ ∈ R^n, p_c ∈ R^n, σ ∈ R and C ∈ R^{n×n} from the last iteration
3. m^− ← m
4. m ← \sum_{i=1}^{µ} ω_i x_i
5. p_σ ← (1 − c_σ) p_σ + \sqrt{c_σ (2 − c_σ) µ_eff} C^{−1/2} (m − m^−)/σ
6. p_c ← (1 − c_c) p_c + \sqrt{c_c (2 − c_c) µ_eff} (m − m^−)/σ
7. C_µ = \sum_{i=1}^{µ} ω_i (x_i − m^−)(x_i − m^−)^T / σ^2
8. σ ← σ × exp( (c_σ / d_σ) ( ‖p_σ‖ / E‖N(0, I)‖ − 1 ) )
9. C ← (1 − c_cov) C + (c_cov / µ_cov) p_c p_c^T + c_cov (1 − 1/µ_cov) C_µ
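A hedged Python transcription of Algorithm 1 follows; the dict-based state and variable names are choices of this illustration, and it assumes C remains symmetric positive definite.

import numpy as np

def cma_update(xs, weights, state, c_sigma, c_c, c_cov, mu_cov, d_sigma):
    """One update of Algorithm 1. xs: array of shape (mu, n) holding the mu
    best solutions as rows; state: dict with keys m, p_sigma, p_c, sigma, C."""
    mu_eff = 1.0 / np.sum(weights ** 2)                        # step 1
    n = state['m'].size
    m_old = state['m']                                         # step 3
    m = weights @ xs                                           # step 4
    y = (m - m_old) / state['sigma']
    evals, B = np.linalg.eigh(state['C'])                      # C^{-1/2}
    C_inv_sqrt = B @ np.diag(evals ** -0.5) @ B.T
    p_sigma = ((1 - c_sigma) * state['p_sigma']                # step 5
               + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * (C_inv_sqrt @ y))
    p_c = ((1 - c_c) * state['p_c']                            # step 6
           + np.sqrt(c_c * (2 - c_c) * mu_eff) * y)
    C_mu = sum(w * np.outer(x - m_old, x - m_old)              # step 7
               for w, x in zip(weights, xs)) / state['sigma'] ** 2
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # E||N(0,I)||
    sigma = state['sigma'] * np.exp(c_sigma / d_sigma           # step 8
                                    * (np.linalg.norm(p_sigma) / chi_n - 1))
    C = ((1 - c_cov) * state['C']                              # step 9
         + c_cov / mu_cov * np.outer(p_c, p_c)
         + c_cov * (1 - 1 / mu_cov) * C_mu)
    return {'m': m, 'p_sigma': p_sigma, 'p_c': p_c, 'sigma': sigma, 'C': C}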


Figure 4.1: (a) Average number of evaluations of p-sep lmm-CMA on f^100_{Rosen} to reach f_stop for varying population sizes λ = γ × λ_default. (b) Average number of evaluations per generation of p-sep lmm-CMA on f^100_{Rosen} for varying population sizes λ = γ × λ_default.

In Algorithm 1, the parameters (ω_i)_{1≤i≤µ}, c_σ, c_c, c_cov, µ_cov, d_σ are chosen with default values as defined in [71]. Initial values for p_σ, p_c and C used in Algorithm 1 are also set to the defaults of [71]. Initial values for m and σ are set to Φ_i(m^(0)) and σ^(0), where m^(0) and σ^(0) are the initial mean vector and step-size of (n)lmm-CMA. The idea behind this adaptation procedure is the same as that of the adaptive encoding proposed in [68]. However, in adaptive encoding, no step-size update is needed, and different normalizations of the weights depending on the step length are introduced. Though we believe that the adaptive encoding update is more robust numerically, it has not been tested in this thesis.


Table 4.2: Modeling of the partially separable functions tested.

    Name                     n_M  N          f_i(u = (u_j)_{1≤j≤n_M})                          Φ_i(v = (v_j)_{1≤j≤n})
    Rosenbrock               2    n − 1      f_i(u) = α(u_1^2 − u_2)^2 + (u_1 − 1)^2           Φ_i(v) = (v_i, v_{i+1})
    Rosenbrock               4    (n − 1)/3  f_i(u) = α(u_1^2 − u_2)^2 + (u_1 − 1)^2           Φ_i(v) = (v_{3i−2}, v_{3i−1},
                                                      + α(u_2^2 − u_3)^2 + (u_2 − 1)^2                   v_{3i}, v_{3i+1})
                                                      + α(u_3^2 − u_4)^2 + (u_3 − 1)^2
    Rosenbrock^{1/2}         2    n − 1      f_i(u) = [α(u_1^2 − u_2)^2 + (u_1 − 1)^2]^{1/2}   Φ_i(v) = (v_i, v_{i+1})
    Block-rotated ellipsoid  2    n − 1      f_i(u) = f^α_{BlockElli-2D}(u_1, u_2)             Φ_i(v) = (v_i, v_{i+1})

4.3 Evaluation of p-sep lmm-CMA

In this section we describe the functions used to evaluate p-sep lmm-CMA. We show
the performance of this method compared to CMA-ES. The optimal bandwidth used to
build the meta-model is also investigated and the computational cost of the approach is
discussed.

4.3.1 Test functions


The p-sep lmm-CMA is evaluated on the partially separable test functions f^1_{Rosen}, f^100_{Rosen}, f^10000_{Rosen}, f^100_{Rosen^{1/2}} and f^10000_{BlockElli} defined in Table 4.1. For the block-rotated ellipsoid, Q is a 2 × 2 rotation matrix sampled uniformly anew for every run performed. The performance of the method is measured using the success performance SP1, defined as the average number of evaluations for successful runs divided by the ratio of successful runs, needed to reach a stopping objective value f_stop = 10^{−10}, except for f^α_{Rosen^{1/2}} for which f_stop = 10^{−5}. We perform 20 independent runs to measure SP1. The runs are randomly initialized in the interval [−5, 5] for f^1_{Rosen}, f^100_{Rosen}, f^10000_{Rosen} and f^100_{Rosen^{1/2}}, and in [−10, 10] for f^10000_{BlockElli}. Each test function is modeled by defining a number N of element functions, a number n_M of element variables for each element function, a set of element functions denoted by f_i : R^{n_M} → R and a set of mapping functions Φ_i : R^n → R^{n_M}, such that f = \sum_{i=1}^{N} f_i ∘ Φ_i. The modeling of each test function is shown in Table 4.2. The block-rotated ellipsoid function is defined using quadratic element functions. For the other tested functions, the defined element functions are not quadratic.

4.3.2 Performance of p-sep lmm-CMA

Results on the test functions are presented in Table 4.3 showing the performance of p-sep
lmm-CMA compared to CMA-ES and to some tests with nlmm-CMA. For each test, by
defining the value of nM , we refer to the corresponding modeling defined in Table 4.2. It is


Table 4.3: Success performance SP1, i.e., the average number of function evaluations for successful runs divided by the ratio of successful runs, standard deviations of the number of function evaluations for successful runs, and speedup spu, to reach f_stop = 10^{−10}, for p-sep lmm-CMA, nlmm-CMA and CMA-ES (for f^100_{Rosen^{1/2}}, f_stop = 10^{−5}). The ratio of successful runs is given in brackets when it is < 1.0. The number of element variables of each element function is denoted by n_M.

    Function             n   n_M  λ   p-sep lmm-CMA        spu   nlmm-CMA            spu   CMA-ES

    f^1_{Rosen}          4   2    8   189 ± 13             5.1   297 ± 20            3.2   964 ± 192
                         8   2    10  308 ± 20             6.5   932 ± 52            2.2   2006 ± 118
                         10  2    10  353 ± 20             6.8   1482 ± 169          1.6   2418 ± 204
                         16  2    12  465 ± 20             8.6   –                   –     4023 ± 310
                         20  2    12  548 ± 34             9.1   –                   –     4978 ± 374
                         32  2    14  755 ± 32             10.3  –                   –     7777 ± 347
                         40  2    15  871 ± 41             11.2  –                   –     9799 ± 602
    f^100_{Rosen}        4   2    8   485 ± 47 [0.80]      4.7   647 ± 67 [0.95]     3.5   2269 ± 254 [0.85]
                         8   2    10  910 ± 71 [0.80]      6.5   2602 ± 264 [0.85]   2.3   5883 ± 727 [0.90]
                         10  2    10  1006 ± 99 [0.95]     7.6   3727 ± 300 [0.90]   2.1   7644 ± 765 [0.95]
                         16  2    12  1834 ± 117 [0.90]    8.6   –                   –     15781 ± 1360 [0.85]
                         16  4    12  7162 ± 1112 [0.95]   2.2   –                   –     15781 ± 1360 [0.85]
                         20  2    12  2533 ± 361 [0.90]    10.4  –                   –     26366 ± 3249 [0.85]
                         32  2    14  4628 ± 144 [0.95]    13.2  –                   –     60948 ± 2668 [0.90]
                         40  2    15  6527 ± 226 [0.95]    15.2  –                   –     99346 ± 3502 [0.85]
    f^10000_{Rosen}      4   2    8   1333 ± 238 [0.95]    5.3   2637 ± 715 [0.90]   2.7   7032 ± 944 [0.90]
                         8   2    10  2745 ± 246           6.6   10287 ± 468 [0.85]  1.8   18216 ± 1683 [0.95]
                         10  2    10  5552 ± 429 [0.75]    4.5   16280 ± 843 [0.85]  1.5   25037 ± 3160 [0.95]
                         16  2    12  10583 ± 398 [0.80]   5.9   –                   –     62903 ± 4441 [0.90]
                         20  2    12  14749 ± 431 [0.90]   6.3   –                   –     93545 ± 6566 [0.95]
    f^100_{Rosen^{1/2}}  4   2    8   544 ± 48 [0.70]      4.8   909 ± 75 [0.75]     2.9   2620 ± 342 [0.95]
                         8   2    10  1008 ± 67 [0.80]     7.0   2549 ± 262 [0.95]   2.8   7006 ± 762
                         10  2    10  1299 ± 178 [0.95]    10.4  4685 ± 518 [0.90]   2.9   13517 ± 1288 [0.75]
                         16  2    12  3346 ± 223 [0.90]    9.9   –                   –     33154 ± 3568 [0.90]
                         20  2    12  6797 ± 878 [0.85]    10.0  –                   –     68136 ± 5363 [0.80]
                         32  2    14  20751 ± 2116 [0.85]  14.6  –                   –     302039 ± 40915 [0.65]
    f^10000_{BlockElli}  4   2    8   226 ± 11             6.6   –                   –     1500 ± 89
                         8   2    10  392 ± 14             8.2   –                   –     3220 ± 196
                         10  2    10  472 ± 17             8.7   –                   –     4093 ± 173
                         16  2    12  670 ± 37             9.8   –                   –     6566 ± 284



Figure 4.2: Success performance SP1 over the dimension of the problem on f^α_{Rosen}, with α = 1, 10^2 and 10^4, for dimensions between 4 and 40. The dimension of the sub-functions n_M equals 2.


clear that exploiting the partial separability within CMA-ES with meta-models improves the performance of CMA-ES, with a speedup between 4.5 and 15.

For element functions with n_M fixed to 2, p-sep lmm-CMA offers an increasing speedup with increasing dimension of the problem, as shown in Fig. 4.2. The algorithm p-sep lmm-CMA performs better with increasing dimension since it breaks the curse of dimensionality when building the meta-model: for a problem of dimension n, building the meta-model is equivalent to building N meta-models of dimension n_M.

Using a greater number of parameters for each separated meta-model decreases the speedup obtained by the approach. On f^100_{Rosen} for dimension 16, the speedup decreases from 8.6 to 2.2 for values of n_M respectively equal to 2 and 4.
At each iteration, at least n_b function evaluations are performed on the true function in order to check the accuracy of the meta-models. The parameter n_b is set to max[1, (λ/10)]. This setting is introduced in order to be able to add a significant amount of information at each iteration by enriching the training set. It is particularly important when dealing with large population sizes: for increasing population sizes λ, i.e., for increasing values of µ, an increasing number of points must be evaluated at each iteration cycle in order to have a significant impact on the ranking of the population.

Moreover, a better setting of n_b would also depend on the dimension of the problem since, for increasing dimensions, i.e., for increasing numbers k (or k_i) of points used to build the meta-model, an increasing number of points must be evaluated at each iteration cycle in order to change the meta-model, and hence the ranking of the population, significantly.

The minimum number n_b of evaluations performed at each iteration limits the speedup that can be achieved by our approach. We show that for some test functions, we are able to reach this maximum speedup of λ/n_b. For f^100_{Rosen} with n = 40 and for f^100_{Rosen^{1/2}} with n = 20, we reach a speedup equal to λ since n_b is equal to 1 in these tests.
Since we reach the maximal speedup allowed by the approach on the Rosenbrock function, we asked ourselves whether the overall number of function evaluations needed to reach a target can be further reduced by increasing the population size λ. The default population size, denoted λ_default, equals 4 + ⌊3 × ln(n)⌋. Fig. 4.1(a) shows the influence of the population size on the performance of p-sep lmm-CMA. We perform 20 independent runs on f^100_{Rosen} for dimensions n = 4, 8, 10 and 16, and n_M = 2, with f_stop = 10^{−10}. The tested population sizes are written as λ = γ × λ_default where γ is between 1 and 10. Tests were performed with similar parameters: n_init initialized to λ_default and n_b equal to max[1, (λ_default/10)]. For all the tests, a training set containing k_i randomly sampled elements is loaded at the beginning of every run in order to use the meta-models from the first generation. Results show that λ = 4 × λ_default gives the minimum number of evaluations to reach f_stop and improves the performance by a factor between 1.5 and 2 over the default

2

Figure 4.3: Average speedup with respect to CMA-ES to reach f_stop with a varying number of points used to build the meta-model, k_i = β × k_{i,min} where k_{i,min} = n_i(n_i+3)/2 + 1. Each point corresponds to 20 runs performed.

population size. For γ > 4, the performance of p-sep lmm-CMA stagnates. We observe in Fig. 4.1(b) that the number of evaluations per generation increases linearly with the population size.

4.3.3 Optimal bandwidth for building partially separated meta-models

Let us consider an element function f_i with a number of element variables n_i. The optimal bandwidth depends on the number of points k_i used to build the meta-model. As shown in Section 4.2, k_i must be greater than or equal to k_{i,min} = n_i(n_i+3)/2 + 1. In this section, we investigate the influence of the choice of k_i on the performance of p-sep lmm-CMA. We perform 20 independent runs on f^α_{Rosen} for α = 1, 10^2, 10^4 and on f^100_{Rosen^{1/2}}, for different dimensions between 4 and 40. Results are shown in Fig. 4.3, where k_i is written as k_i = β × k_{i,min} for β = 1, 2, 3, 4 and 5. We find that for 14 of the 23 tests performed on the test functions with different dimensions, a good estimate of the optimal β is equal to 2. Moreover, for the other tests, choosing a value of β equal to 2 is a

100
reasonable choice since it offers a speedup close to best one found, except for fRosen with
dimensions 10 and 16.

4.3.4 Computational cost

The internal cost of the optimization procedure is dominated by the evaluation of the objective function and the construction of the meta-models.

For p-sep lmm-CMA, building a meta-model consists in finding in the training set the k_i sorted nearest points to the point to be evaluated, and then solving Eq. (4.5). Let us consider a training set of size m. To find and sort the best k_i points, we begin by sorting the first k_i points of the training set using a heapsort algorithm, which has a complexity of k_i log k_i. Then, we compare each of the other (m − k_i) points with the selected k_i points until finding its position, which adds at worst a complexity of (m − k_i) × k_i. Thus, finding and sorting the best k_i points needs O(k_i log k_i + (m − k_i) k_i) = O(m × k_i). According to Section 4.3.3, the optimal bandwidth k_i is equal to n_i(n_i + 3) + 2. Thus, finding and sorting the points needed to evaluate the meta-model costs O(m × n_i^2). Moreover, solving Eq. (4.5) is dominated by a k_i × k_i matrix inversion and thus has a complexity of n_i^6.
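To give an order of magnitude, consider element functions of dimension n_i = 2, as in most of the tests of this chapter. Then

    k_{i,min} = n_i(n_i + 3)/2 + 1 = 6   and   k_i = 2 × k_{i,min} = n_i(n_i + 3) + 2 = 12 ,

so building one element meta-model reduces to solving a 12 × 12 weighted least-squares system, a cost that is negligible compared to one evaluation of an expensive objective function such as a reservoir simulation.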
Let us denote by N_e the number of evaluations of the true objective function and by c_e the cost of one single objective function evaluation. Let us also denote by N_m the number of built meta-models. The complexity of p-sep lmm-CMA is then equal to: N_e c_e + N_m n_i^2 (m + n_i^4).

4.4 Summary and discussions

In this chapter we have investigated exploiting the partial separability of the objective function to enhance the performance of CMA-ES coupled with local meta-models. We have defined p-sep lmm-CMA, a new variant of CMA-ES with meta-models for partially separable functions. In this variant, we build separate meta-models for each element function, instead of building one meta-model for the whole objective function. We have shown that the speedup of p-sep lmm-CMA with respect to CMA-ES is between 4.5 and 15 for the tested functions. For f^100_{Rosen} with dimension 40 and for f^100_{Rosen^{1/2}} with dimension 20, we reach a speedup equal to λ, which corresponds to the theoretical maximum speedup allowed by the approach. In general, the maximum speedup that can be achieved equals λ/n_b, as at least n_b evaluations on the true function are performed at each iteration. We have shown on the standard Rosenbrock function that increasing the population size allows a significant decrease (by a factor between 1.5 and 2) in the number of evaluations needed

to reach a given target. The optimal population size on the Rosenbrock function is shown
to be equal to 4 × λdefault .

Chapter 5

Partially separated meta-models with CMA-ES for well placement optimization

This chapter is based on the paper [25]. In the well placement optimization problem, the objective function (e.g., the NPV) can usually be split into local components referring to each of the wells, which moreover depend in general on a smaller number of principal parameters; the objective function can thus be modeled as a partially separable function. In this chapter, we propose to apply p-sep lmm-CMA (defined in Chapter 4) to the well placement problem, i.e., to exploit the partial separability of the objective function when using CMA-ES coupled with meta-models, by building partially separated meta-models. Thus, different meta-models are built for each well or set of wells, which results in a more accurate modeling. The approach is shown on the PUNQ-S3 case.
This chapter is structured as follows. Section 5.1 defines p-sep lmm-CMA for the
well placement problem. In Section 5.2, we demonstrate the contribution of the proposed
approach in reducing the number of reservoir simulations on the synthetic benchmark
reservoir case PUNQ-S3 [54].

5.1 p-sep lmm-CMA for well placement optimization

In this chapter, we propose to build a meta-model for each well or set of wells to be placed,
instead of one meta-model for all the wells.
In order to apply p-sep lmm-CMA (defined in Chapter 4), we need to define the different
element functions and their corresponding dependencies. As mentioned in Chapter 4, for
a given partially separable function, there exists “theoretically” an infinite number of
ways to define the element functions and mapping functions. However in this chapter, we


propose to investigate building one meta-model for each well (already drilled and to be drilled) approximating its NPV.

Let us consider a reservoir case with a number N_w of wells to be drilled. We suppose that we also have N_wd wells already drilled. We denote by (NPV_i)_{1≤i≤N_w} the NPVs corresponding to the wells to be drilled, and by (NPV_i)_{(N_w+1)≤i≤(N_w+N_wd)} the NPVs corresponding to the wells already drilled.

Therefore, the objective function corresponding to the NPV of the field is equal to the sum of the different element functions corresponding to the NPV of each well, i.e., (NPV_i)_{1≤i≤(N_w+N_wd)}.

Let us denote by {m_1, ..., m_{N_w}} the numbers of parameters defining the positions of the wells to be placed, and by (W_j ∈ R^{m_j})_{1≤j≤N_w} these parameters. Thus, the NPV, as well as the NPV corresponding to each well, depends on (W_j)_{1≤j≤N_w}:

    NPV = \sum_{i=1}^{N_w + N_wd} NPV_i ,    (5.1)

    NPV((W_j)_{1≤j≤N_w}) = \sum_{i=1}^{N_w + N_wd} NPV_i((W_j)_{1≤j≤N_w}) .    (5.2)

As reflected in the previous equation, in general the NPV_i of a given well i depends on all the wells1; however, in order to use p-sep lmm-CMA, we will assume that the NPV_i of a well essentially depends on a smaller number of parameters.

In this chapter, we will assume that the NPV_i of a given well essentially depends on the considered well itself, and that the impact of the other wells is represented only by distances between the considered well and the others. For each well denoted by i, we define the following parameters:

• dpi : the minimum distance between the well i and the other producers;

• dii : the minimum distance between the well i and the other injectors.

The minimum distance between two wells is defined as the minimum Euclidean distance between the trajectories of the two considered wells (a minimal sketch of this computation is given below). In order to compute the meta-model, we now suppose that the NPVs of the wells to be drilled, i.e., (NPV_i)_{1≤i≤N_w}, can be approximated using only the parameters defining the location and trajectory of the considered well together with its corresponding dp_i and di_i. The NPVs of the wells already drilled, i.e., (NPV_i)_{(N_w+1)≤i≤(N_w+N_wd)}, can be approximated using only two parameters: dp_i and di_i.
1
Except when dealing with non-communicating reservoir regions, and if each of the wells has to be placed in one of these regions.
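The sketch below shows how dp_i and di_i can be computed from discretized well trajectories; the polyline representation of a trajectory and the function names are assumptions of this illustration, and it treats well i as a producer.

import numpy as np

def min_well_distance(traj_a, traj_b):
    """Minimum Euclidean distance between two well trajectories, each given
    as an array of 3D points sampled along the well path (shape (m, 3))."""
    diff = traj_a[:, None, :] - traj_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min()

def distance_features(i, producers, injectors):
    """dp_i and di_i for producer i: minimum distances to the other producers
    and to the injectors. Assumes at least two producers and one injector;
    wells are stored as lists of trajectory arrays."""
    others = [p for j, p in enumerate(producers) if j != i]
    dp = min(min_well_distance(producers[i], p) for p in others)
    di = min(min_well_distance(producers[i], w) for w in injectors)
    return dp, di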


Therefore, the built meta-model \widehat{NPV} can be written as follows:

    \widehat{NPV}((W_j)_{1≤j≤N_w}) = \sum_{i=1}^{N_w} \widehat{NPV}_i(W_i, dp_i, di_i) + \sum_{i=N_w+1}^{N_w+N_wd} \widehat{NPV}_i(dp_i, di_i) ,    (5.3)

with (W_i, dp_i, di_i) ∈ R^{m_i} × R × R, and where \widehat{NPV}_i denotes the meta-model approximating NPV_i.
Then, to incorporate the built meta-model \widehat{NPV} into CMA-ES, we use the approximate ranking procedure as described for the variant nlmm-CMA2 defined in Section 2.3.3.4, with only one difference, related to the acceptance criterion of the meta-model: in this case, we use a less conservative criterion in which the meta-model is accepted if it succeeds in keeping only the best well configuration unchanged.

In the next section, we will see how the approach can be applied to a well placement problem and how many function evaluations can be saved in the optimization

5.2 Application of p-sep lmm-CMA on the PUNQ-S3 case

This section shows an application of p-sep lmm-CMA on the benchmark reservoir case PUNQ-S3 [54]. This application is compared to the CMA-ES optimizer and to the variant of CMA-ES with meta-models (nlmm-CMA)1. As shown in the previous examples, the model contains 19 × 28 × 5 grid blocks. The elevation of the field is shown in Fig. 3.2. An injection well denoted I1 is already drilled. Fig. 5.1 represents the SoPhiH map, i.e., the distribution of the hydrocarbon pore volume over the n_layers layers, defined by \sum_{k=1}^{n_layers} (H_k × φ × S_o), where H_k is the gross thickness of layer k, S_o is the oil saturation and φ is the porosity. The location of I1 is also shown in Fig. 5.1; I1 is an inclined well drilled in layer 3.
We propose to drill 3 unilateral producers (denoted P1, P2 and P3) to maximize the NPV. The dimension of the problem is then equal to 6 × 3 = 18. The producer limit bottomhole pressure is fixed to 150 bar, and the injector limit bottomhole pressure is fixed to 320 bar. A maximum length of 1000 m is imposed on the 3 producers to be drilled. The population size λ is set, for all the methods used, to 60. The different optimizers are run with a stopping criterion corresponding to a maximum number of reservoir simulations equal to 1000. The other parameters of the optimization methods were set to their default settings.
As shown in Section 5.1, the built meta-models for the element functions (NPV_i)_{i=1,...,3} will only depend on eight parameters (compared to eighteen if we used all the original variables), and the meta-model for the element function NPV_4 will only depend on a single
1
In this chapter, we use the variant nlmm-CMA2 (defined in Section 2.3.3.4), as used in Chapter 3.


Figure 5.1: The SoPhiH map with the location of the injector already drilled I1.



Figure 5.2: The mean value of NPV (in US dollars) for well placement optimization using CMA-ES with partially separated meta-models, denoted p-sep lmm-CMA (solid line), CMA-ES with meta-models, denoted lmm-CMA (dashed line), and CMA-ES (△). Ten runs are performed for each method.

parameter1:

    \widehat{NPV}(P_1, P_2, P_3) = \widehat{NPV}_1(P_1, dp_1, di_1) + \widehat{NPV}_2(P_2, dp_2, di_2) + \widehat{NPV}_3(P_3, dp_3, di_3) + \widehat{NPV}_4(dp_4) ,    (5.4)

where P_i ∈ R^6 denotes the vector of parameters defining the position of the well Pi.
The number of points used to build the partially separated meta-models, is chosen to be
equal to 90 (according to Section 4.3.3), and the meta-model is used when the training set
(storing the performed evaluations) contains at least 150 elements, i.e., before performing
150 simulations, all the points are evaluated with the true objective function, and the
partially separated meta-model is not used.
Fig. 5.2 shows the average performance of the proposed method, i.e., CMA-ES with
partially separated meta-models (p-sep lmm-CMA). Results are reported together with
those obtained using CMA-ES and CMA-ES with meta-models (nlmm-CMA). The per-
formance of each method is evaluated on ten independent runs, where for each run, we
report the best obtained NPV value after each generation. These values correspond to
true values of the objective function, i.e., obtained with a reservoir simulation2 .
1
Here, we have only a single parameter dp_i, since the only injector in this case is the considered injector itself.
2
The CMA-ES with meta-models method ensures by construction that, at each generation, at least the best

During the first iterations of the optimization, the performance of the 3 algorithms used is equivalent. For p-sep lmm-CMA, the meta-model is used once the training set contains at least 150 performed reservoir simulation results. Therefore, at the beginning of the optimization, the meta-model is not used, which explains the equivalent results of the three optimizers.

For nlmm-CMA, building the meta-model requires more reservoir simulations than for partially separated meta-models. Non-partially-separated meta-models depend on 18 parameters. In the performed runs, the meta-model is built using 300 performed reservoir simulations (k = 300) and used when the training set contains at least 350 objective function evaluations. Hence, before reaching 350 simulations, nlmm-CMA and CMA-ES are equivalent.

Except at the beginning of the optimization, in which all the optimizers are equivalent, it is clear that CMA-ES with partially separated meta-models outperforms the other methods when considering a restricted budget of 1000 reservoir simulations. The context of a restricted budget of simulations is imposed to reflect real applications, in which the number of simulations is generally limited to several hundreds or at most a few thousands, due to the CPU time required by a simulation.
For a given number of reservoir simulations equal to 600, p-sep lmm-CMA is able to find a well configuration with an NPV equal to $1.26 × 10^10, whereas CMA-ES reaches only an NPV equal to $1.17 × 10^10 and nlmm-CMA offers a maximum NPV of only $1.21 × 10^10. In conclusion, with a restricted budget of reservoir simulations, exploiting the partial separability allows reaching greater NPV values than CMA-ES and nlmm-CMA.

To reach an NPV value equal to $1.20 × 10^10, CMA-ES with partially separated meta-models requires 370 reservoir simulations. To reach the same NPV value, using standard meta-models requires 510 reservoir simulations, and using CMA-ES without meta-models requires 930 reservoir simulations. Therefore, using partially separated meta-models saves 60% of the number of reservoir simulations compared to CMA-ES (without meta-models). The contribution of exploiting the partial separability is shown when comparing p-sep lmm-CMA with nlmm-CMA: exploiting the partial separability of the objective function saves 28% of the number of reservoir simulations compared to the CMA-ES with standard meta-models approach.
Fig. 5.3 shows one of the obtained solution well configurations, with an NPV value equal to $1.38 × 10^10. Although each of the performed runs proposes in general a different solution, the majority of the solution well configurations are located in the same regions.
point is evaluated with the true objective function, i.e., each iteration, the best obtained well configuration
is evaluated using a reservoir simulation.


Figure 5.3: The SoPhiH map with the location of the injector already drilled I1, and
solution producers (P1, P2 and P3).


Fig. 5.4 shows a typical optimization process performed using CMA-ES with partially separated meta-models, i.e., with p-sep lmm-CMA: the evolution of the NPV (the best at each generation and the overall best) as well as the evolution of the parameters encoding the three wells.

5.3 Summary and discussions

In this chapter we have shown on the synthetic benchmark reservoir case PUNQ-S3 that using the p-sep lmm-CMA algorithm leads to an important reduction in the number of reservoir simulations (around 60%) compared to the optimizer CMA-ES. The important savings in the number of reservoir simulations are explained by the reduced number of parameters required to build the meta-models of the element functions.

The proposed approach exploiting the partial separability of the objective function can also be combined with any other stochastic optimizer, in order to reduce the computational cost of the optimization.



Figure 5.4: The evolution of the well placement optimization process on the PUNQ-S3 case using CMA-ES with partially separated meta-models, i.e., p-sep lmm-CMA. The three figures depict one of the ten performed runs of p-sep lmm-CMA. In (a), the evolution of the best overall NPV value (in red) and of the best NPV obtained at each generation (in blue) is depicted. In (b), the evolution of the well trajectory parameters is depicted, where each well is plotted using a different color and the parameters form three groups: the group of angles encoding each well is shown in the lower part of the figure (values below 10); the group of well lengths is shown in the intermediate part of the figure (the three curves with values around 500); the group of Cartesian coordinates of the wells is shown in the upper part of the figure. In (c), the evolution of the well trajectory parameters on a log scale is depicted.

Chapter 6

Well placement optimization under geological uncertainty

In the well placement problem, as well as in many other field development optimization problems, geological uncertainty is considered a key source of risk affecting the viability of field development projects. The problem arises when we have multiple possible geological realizations of the reservoir. The multiple realizations are generated using geostatistical techniques and are in general deemed equiprobable. Let us consider an objective function to optimize denoted by f and a number N_r of geological realizations denoted by (R_i)_{i=1,...,N_r}. The key issue here is that for each scenario, i.e., for each well configuration x when optimizing well placement, we have N_r possible values of the objective function, one for each realization; the value corresponding to realization R_i is denoted by f(x, R_i).

This chapter addresses the problem of how to define the objective function to optimize when dealing with uncertainty for well placement, and whether we should perform evaluations on all the possible realizations in order to define the objective function.
This chapter is structured as follows. Section 6.1 provides a detailed literature review
for well placement optimization under geological uncertainty. Section 6.2 defines a new
approach to handle geological uncertainty for well placement using the neighborhood. In
Section 6.3, we demonstrate the contribution of the proposed approach in capturing the
geological uncertainty and in reducing the number of reservoir simulations on the synthetic
benchmark reservoir case PUNQ-S3 [54].

6.1 Optimization under uncertainty: a literature review

The problem of optimization under geological uncertainty shares many similarities with
the problem of optimizing noisy functions.


A function f : Rn → R is said to be noisy if the only measurable value of f on
x ∈ Rn is a random variable that can be written as F(f(x), z), where f is a time-invariant
function and z is a noise often assumed to be normally distributed with a zero mean and
variance σ², denoted by N(0, σ²). The noise can also be defined differently (e.g.,
Cauchy distributed), and can be either additive or multiplicative. A common approach to
optimize noisy functions is to estimate the fitness function by the expected value defined
as follows:

\[ f(x) = \int_{-\infty}^{+\infty} F(f(x), z)\, p(z)\, \mathrm{d}z \,, \tag{6.1} \]

where p(z) is the probability density function of the noise. Thus, a common way to
approximate the expected fitness function is by averaging over a number of random samples:

\[ f(x) \simeq \frac{1}{N_s} \sum_{i=1}^{N_s} F(f(x), z_i) \,, \tag{6.2} \]

where Ns denotes the number of samples, also called the sample size.
In the context of field development optimization under geological uncertainty, we are
dealing with a finite number of realizations, and the measurable fitness values correspond
to the values f (x, Ri )i=1,··· ,Nr . Therefore, the objective function corresponds in general to:

\[ f(x) = \frac{1}{N_r} \sum_{i=1}^{N_r} f(x, R_i) \,. \tag{6.3} \]

However, due to the expensive computational effort required to evaluate the objective
function on one realization Ri, the expected fitness function is often approximated
using fewer samples instead of all the realizations. Thus, one
common way to approximate the expected objective function here is again by averaging
over a number of samples Ns ≤ Nr.
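As an illustration, the following Python sketch (our own code, with a toy stand-in for the expensive reservoir simulation) approximates the expected objective of Eq. (6.3) by averaging over a random subset of Ns ≤ Nr realizations:

    import numpy as np

    rng = np.random.default_rng(0)

    def estimate_objective(f, x, realizations, n_samples):
        """Monte Carlo estimate of Eq. (6.3): average f(x, R_i) over
        n_samples realizations drawn uniformly without replacement."""
        idx = rng.choice(len(realizations), size=n_samples, replace=False)
        return np.mean([f(x, realizations[i]) for i in idx])

    # Toy stand-in for one reservoir simulation f(x, R):
    f = lambda x, R: -np.sum((x - R) ** 2)
    realizations = [rng.normal(size=2) for _ in range(20)]  # Nr = 20
    print(estimate_objective(f, np.zeros(2), realizations, n_samples=5))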
In the following, we briefly review the existing approaches often used in optimization
under uncertainty. On the one hand, we review the approaches defined by the optimization
community, mainly to cope with noise, but that can be extended to the different field
development optimization problems under geological uncertainty. On the other hand, we review
the approaches already applied in the petroleum community to cope with geological
uncertainty.

6.1.1 Optimization community

This section summarizes the different ways to handle uncertainty within the evolutionary
optimization community. A detailed overview of the existing approaches addressing
uncertainties in evolutionary optimization is presented in [84]. Let us suppose in this
section that the function f to optimize is a noisy function. The approaches to handle
uncertainty can be mainly divided into two categories.

6.1.1.1 Explicit Averaging

Using the mean of several samples for each individual The simplest and most
common way to address the uncertainty issue is to define the objective function for each
point by averaging over a number of samples (Eq. (6.2)). Increasing the sample size Ns
reduces the variance of the estimate of the objective function.
In general, the objective function is defined using an averaged sum of a constant sample
size. In this case, for each single evaluation of the expected objective function, one needs
to evaluate the objective function on Ns samples.
In the context of costly objective functions, depending on the number of samples, there
is a compromise between the computational cost of the optimization and the accuracy of
the estimation of the objective function. Increasing (respectively, decreasing) the number
of samples tends to improve (respectively, worsen) the accuracy of the estimated objective
function, but on the other hand it tends also to increase (respectively, reduce) the com-
putational cost of the optimization. The idea of using an adapted sample size during the
optimization was first proposed in [3, 4]. In [4], it is shown that adapting the number of
samples performs better than using constant sample sizes, and it is suggested to increase
the sample size with the generation number and to use a higher number of samples for
individuals with higher estimated variance. An other way to adapt the sample size is
based on an individual’s probability to be among a number of the best individuals [121].
Recently, an other approach relying on the rank based selection operators was proposed in
[73]. In [76], an adaptive uncertainty handling procedure is proposed, based on selection
races [93].
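A minimal sketch of the generation-dependent schedule suggested in [4] follows; the linear ramp below is our own arbitrary choice, not the schedule of the original paper:

    def sample_size(generation, n_min=1, n_max=20, ramp=0.5):
        """Sample size growing with the generation counter, capped at
        the number of available realizations n_max."""
        return min(n_max, n_min + int(ramp * generation))

    for g in (0, 10, 20, 40):
        print(g, sample_size(g))  # 1, 6, 11, 20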

Using the neighborhood for each individual An alternative approach to defining
the objective function as an averaged sum of a number of samples (constant or adapted) is
to define the objective function using the neighborhood points already evaluated [106, 29,
28, 27, 112, 113]. The general idea was first suggested in [27], in which it is proposed
to estimate the fitness as a weighted average of the neighborhood with a linearly decreasing
weight function up to some fixed maximum distance. In [106, 28, 29], a locally weighted
regression is used for estimation. This technique is shown to be a good solution to improve
the accuracy of the estimated objective function without increasing the computational cost.


6.1.1.2 Implicit Averaging

When increasing the population size, the probability of obtaining similar points is higher.
Thus, a way to cope with noise is to simply increase the population size [52]. In this case,
with a large population size, the influence of noise on a given point can be reduced due to
the evaluations on other similar points. Conflicting conclusions were reported in the
literature [52, 7, 8, 60] when comparing explicit averaging and implicit averaging.

6.1.2 Petroleum community

Several studies in the literature have addressed the problem of optimization under geological
uncertainty, not only for the well placement problem but also for other field development
optimization problems. Optimization under geological uncertainty in the petroleum
community always considers a finite number of realizations Nr and models the objective
function following Eq. (6.3). In the following, we briefly review the approaches to handle
uncertainty in optimization within the petroleum community.
To the best of our knowledge, all the studies that consider a number Nr of possible
realizations of the reservoir use the approach “using the mean of several samples for each
individual”. Moreover, all the studies in the literature, except the approach proposed in
[126] that will be detailed later, perform Nr reservoir simulations for every single objective
function evaluation. Although they share this similarity, the proposed approaches
introduce different formulations of the objective function.
In [116, 115, 103, 35], the objective function is formulated as the expected value of
the net present value over all the realizations, as shown in Eq. (6.3). In [35], the authors
tackle the problem of closed-loop production optimization using the optimizer EnOpt
[37, 36], which is applied to the geological model ensemble updated by either EnKF [49] or
EnRML [62].
In [129, 2, 5], multiple geostatistical realizations of the reservoir are considered in the
formulation of the objective function:

\[ f(x) = \frac{1}{N_r} \sum_{i=1}^{N_r} f(x, R_i) + r\sigma \,, \tag{6.4} \]

where r ∈ R is the risk factor and σ is the standard deviation of f on x over the realizations,
defined as follows:

\[ \sigma = \sqrt{\frac{1}{N_r} \sum_{i=1}^{N_r} \big( f(x, R_i) - \langle f(x) \rangle \big)^2} \,, \tag{6.5} \]

where:

\[ \langle f(x) \rangle = \frac{1}{N_r} \sum_{i=1}^{N_r} f(x, R_i) \,. \tag{6.6} \]


The term rσ in Eq. (6.4) is used to take into account the decision maker's attitude
toward risk. A positive r indicates a risk-prone attitude, a negative r indicates a risk-averse
attitude, and r = 0 indicates a risk-neutral attitude. This formulation is close to
the formulations defined in [64, 104] using utility functions.
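In code, the risk-weighted formulation of Eqs. (6.4)-(6.6) reduces to a mean plus r times a 1/Nr-normalized standard deviation over the per-realization values; the sketch below is our own illustration:

    import numpy as np

    def risk_weighted_objective(values, r):
        """Eq. (6.4): mean over the Nr realizations plus r times the
        standard deviation of Eq. (6.5) (1/Nr normalization: ddof=0).
        r < 0 encodes a risk-averse attitude, r > 0 a risk-prone one."""
        values = np.asarray(values, dtype=float)
        return values.mean() + r * values.std(ddof=0)

    npvs = [9.1e9, 8.7e9, 9.6e9, 8.9e9]           # f(x, R_i) on 4 realizations
    print(risk_weighted_objective(npvs, r=-1.0))  # penalizes the spread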
In [10], a more general formulation of the objective function is defined as follows. A
genetic algorithm is used, in which at each iteration only a predefined percentage of the
individuals, chosen according to a set of scenario attributes, is simulated. For the simulated
individuals, the authors in [10] propose to perform again Nr reservoir simulations for each
well configuration x in order to evaluate the values of f(x, Ri) on all realizations and
then to derive the cumulative distribution function cdf{f} on x. From this distribution,
the values of f10(x), f50(x) and f90(x) are determined. The value f10 on x denotes the
value of f on x corresponding to a probability of 0.1, i.e., there is a probability 0.1 that
the value of f on x will be less than f10(x). The value f10(x) can be written as
cdf{f}^{-1}(0.1). The values f50(x) and f90(x) are defined in a way similar to f10(x). The
objective function is then formulated as follows:

\[ f(x) = r_{10}\, f^{10}(x) + r_{50}\, f^{50}(x) + r_{90}\, f^{90}(x) \,, \tag{6.7} \]

where the parameters r10, r50 and r90 are defined according to the decision maker's attitude
toward risk. A risk-neutral attitude corresponds to the case where (r10, r50, r90) = (0,
1, 0), which may be similar to the definition in Eq. (6.3). However, a risk-averse investor
tends to increase the value of r10, and a risk-prone investor tends to increase the value of
r90.
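The percentile formulation of Eq. (6.7) can be sketched as follows (our own code; the empirical percentiles play the role of cdf{f}^{-1}(0.1), cdf{f}^{-1}(0.5) and cdf{f}^{-1}(0.9)):

    import numpy as np

    def percentile_objective(values, r10, r50, r90):
        """Eq. (6.7): weighted sum of the empirical 10th, 50th and 90th
        percentiles of the f(x, R_i) values over the realizations."""
        f10, f50, f90 = np.percentile(values, [10, 50, 90])
        return r10 * f10 + r50 * f50 + r90 * f90

    npvs = [9.1e9, 8.7e9, 9.6e9, 8.9e9, 9.3e9]
    print(percentile_objective(npvs, 0.0, 1.0, 0.0))  # risk-neutral: median
    print(percentile_objective(npvs, 0.5, 1.0, 0.0))  # risk-averse variant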
Another way to formulate the objective function under geological uncertainty is to
optimize the worst case scenario using a min-max problem formulation [30]. This approach
is used in [5] to optimize smart well controls.
The only approach that selects a subset of samples instead of all the realizations
is the one defined in [126]. The approach is based on so-called retrospective optimization
[34, 127] and divides the problem into a number of subproblems, where the initial solution
of the current subproblem is simply the solution returned by the previous subproblem.
Each point to be evaluated is approximated by the average over a number of realizations,
where the number of selected realizations is increased from subproblem to subproblem.
The approach then implies defining a sequence of sample sizes. The example shown in [126]
considers a well placement problem on 104 permeability and porosity realizations and
therefore defines subproblems with a sequence {20, 15, 10, 5} of iterations and a sequence
{1, 5, 16, 21, 104} of sample sizes. Although the authors suggest further testing of the
overall framework to determine the appropriate sequence of sample sizes, an answer could
be the work on automatically adapting the sample sizes already proposed in [121, 73], which is however
still demanding in terms of the number of objective function evaluations.
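The retrospective scheme can be paraphrased as the loop below, a sketch under our own assumptions: `optimize` and `estimate` stand for a hypothetical inner-optimizer interface and for the sample-average estimate of Eq. (6.3), respectively:

    def retrospective_optimization(optimize, estimate, x0, schedule):
        """Solve a sequence of subproblems; `schedule` is a list of
        (n_iterations, n_samples) pairs with n_samples increasing, and
        each subproblem is warm-started from the previous solution."""
        x = x0
        for n_iters, n_samples in schedule:
            f = lambda p, k=n_samples: estimate(p, n_samples=k)
            x = optimize(f, x, n_iters)  # hypothetical interface
        return x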

6.2 Well placement under uncertainty with CMA-ES using the neighborhood

This section proposes a new approach to handle geological uncertainty for well placement.
The proposed approach focuses on reducing the uncertainty by using the objective function
evaluations of already evaluated individuals in the neighborhood. We thus
propose to apply an approach based on using the neighborhood of each individual.
We define a CMA-ES optimizing an estimated fitness defined on a given point using a
weighted average of a small number of evaluations on the considered point and a number of
evaluations already performed on the neighborhood (up to some fixed maximum distance)
with a decreasing weight function depending on the Mahalanobis distance with respect
to the covariance matrix C defined by CMA-ES. Although considering a Mahalanobis
distance with respect to σ 2 C is suspected to be a better choice (since we are using a fixed
maximum distance to select the neighbors), it has not been tested in this thesis.
Let us consider a well placement optimization problem with a number of wells (pro-
ducers and/or injectors) to be placed. Let us denote by n the dimension of the problem,
i.e., the number of parameters needed to encode the wells to be placed. The wells to be
placed can be parameterized as defined in Section 3.1.2. Without loss of generality, we will
consider in the sequel the NPV as the objective function that we aim to optimize, unless
otherwise explicitly stated. Thus, we want to find a vector of parameters pmax,R ∈ Rn such
that:

\[ \mathrm{NPV}^R(p_{\max,R}) = \max_{p} \mathrm{NPV}^R(p) \,, \tag{6.8} \]

where NPVR is the averaged sum of the NPVs of a given well configuration represented
by a vector of parameters p over all the realizations:

\[ \mathrm{NPV}^R(p) = \frac{1}{N_r} \sum_{i=1}^{N_r} \mathrm{NPV}(p, R_i) \,. \tag{6.9} \]

In the proposed approach, we define a so-called estimated objective function that will be
optimized instead of the true objective function NPVR defined in Eq. (6.9). The estimated
function will be denoted in the sequel by NPVE. Thus, in the proposed approach, contrary
to what is shown in Eq. (6.8), we will try to find the vector of parameters pmax,E ∈ Rn
such that:

\[ \mathrm{NPV}^E(p_{\max,E}) = \max_{p} \mathrm{NPV}^E(p) \,. \tag{6.10} \]


The simplest case in which solving Eq. (6.8) is equivalent to solving Eq. (6.10) is when
NPVE is a monotonic transformation of NPVR. However, in this thesis, we do not aim to
define an estimated objective function NPVE such that we can prove that Eq. (6.10) is
equivalent to Eq. (6.8). Our aim is that, by solving Eq. (6.10), we can propose good points
with high NPVR values (see below for the definition of NPVE).
To optimize NPVE, we propose to use the CMA-ES optimizer. During the optimization
process, we build a database, also called training set, in which, after every performed
reservoir simulation for a given point x on a realization R, we store the point x together
with its corresponding evaluation NPV(x, R).
It remains now to define the estimated objective function NPVE for a given point (well
configuration) denoted by a vector of parameters p:

1. At the beginning of the optimization, and until reaching a given number Nsim of
performed reservoir simulations, we define a number of reservoir simulations Ns1 (≤ Nr)
to be performed on p, and a set of Ns1 randomly drawn integers
\( \{j_1, \ldots, j_{N_s^1}\} \subset \{1, \ldots, N_r\} \). We then perform Ns1 reservoir simulations on p on the realizations
\( (R_i)_{i = j_1, \ldots, j_{N_s^1}} \), and we add each of the obtained simulation results (p, NPV(p, Ri))
to the training set.

The estimated objective function on the point p reads as follows:

\[ \mathrm{NPV}^E(p) = \frac{1}{N_s^1} \sum_{i=1}^{N_s^1} \mathrm{NPV}(p, R_{j_i}) \,. \tag{6.11} \]

In this case, the evaluation of NPVE requires a number Ns1 of reservoir simulations.

2. If more than Nsim reservoir simulations have been performed, we perform the following
steps.

We begin by defining a number of reservoir simulations Ns2 (≤ Nr) to be performed on
p, and a set of randomly drawn integers \( \{j_1, \ldots, j_{N_s^2}\} \subset \{1, \ldots, N_r\} \). We then perform
Ns2 reservoir simulations on p on the realizations \( (R_i)_{i = j_1, \ldots, j_{N_s^2}} \), and we add
each of the obtained simulation results (p, NPV(p, Ri)) to the training set.

We also define a maximum number of neighbor points Nn,max ∈ N that can be used
in the definition of NPVE. We then select at most the Nn,max nearest points to p
from the training set. Here, we select only the points with a distance less than or equal
to a given fixed distance of selection denoted by dmax. We denote by Nn the number
of selected points and by (xi)1≤i≤Nn the selected points.¹ The distance used for this
purpose is the Mahalanobis distance with respect to the current covariance matrix
C of CMA-ES, defined for two given points z1 ∈ Rn and z2 ∈ Rn by
\( d_C(z_1, z_2) = \sqrt{(z_1 - z_2)^T C^{-1} (z_1 - z_2)} \).

¹For each selected point xi from the training set, we have a corresponding evaluation on a given realization.
For the sake of notational simplicity, we will denote the corresponding stored evaluation by NPV(xi, Ri),
although it is not necessarily evaluated on realization Ri.

The estimated objective function on p reads as follows:

\[ \mathrm{NPV}^E(p) = \frac{1}{S} \left[ \sum_{i=1}^{N_s^2} p_i\, \mathrm{NPV}(p, R_{j_i}) + \sum_{i=1}^{N_n} \tilde{p}_i\, \mathrm{NPV}(x_i, R_i) \right] \,, \tag{6.12} \]

where \( p_i = 1 \), \( \tilde{p}_i = 1 - \left( \frac{d_C(x_i, p)}{d_{\max}} \right)^2 \) and \( S = \sum_{i=1}^{N_s^2} p_i + \sum_{i=1}^{N_n} \tilde{p}_i \). In this case, the
evaluation of NPVE requires only a number Ns2 of reservoir simulations.
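A minimal numpy sketch of the estimator of Eq. (6.12) is given below (our own code, not the thesis implementation; C stands for the current CMA-ES covariance matrix, and the training set is represented by plain arrays of stored points and NPV values):

    import numpy as np

    def mahalanobis(z1, z2, C_inv):
        """d_C(z1, z2) = sqrt((z1 - z2)^T C^{-1} (z1 - z2))."""
        d = z1 - z2
        return float(np.sqrt(d @ C_inv @ d))

    def npv_estimate(p, new_npvs, archive_points, archive_npvs, C,
                     d_max, n_neighbors_max):
        """Eq. (6.12): weighted average of the Ns2 fresh simulations on p
        (weight p_i = 1) and of at most Nn,max archived neighbors within
        Mahalanobis distance d_max (weight 1 - (d/d_max)^2)."""
        C_inv = np.linalg.inv(C)
        dists = np.array([mahalanobis(x, p, C_inv) for x in archive_points])
        nearest = np.argsort(dists)[:n_neighbors_max]  # Nn,max nearest points
        keep = nearest[dists[nearest] <= d_max]        # ... within d_max only
        w = 1.0 - (dists[keep] / d_max) ** 2           # neighbor weights
        num = np.sum(new_npvs) + np.sum(w * np.asarray(archive_npvs)[keep])
        S = len(new_npvs) + np.sum(w)
        return num / S

With Ns2 = 1, each call to the estimator costs a single new reservoir simulation; the archived neighbors are what turns this single noisy sample into a usable estimate.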

The parameters Nsim, Ns1, Ns2 and Nn,max are not meant to be left to the user's choice.
Typical values are Nn,max = 2 × Nr, Nsim = 2 × Nr, Ns1 = 1 and Ns2 = 1. A user's choice
is the maximum distance of selection for the neighborhood dmax, which is a problem-dependent
constant. An investigation of the impact of the choice of dmax will be briefly
shown in the next section through some examples.
An estimated standard deviation can also be included in the formulation of the estimated
objective function NPVE. In this case, the estimated objective function, which will
not be tested in this chapter, can be formulated as follows:

\[ \mathrm{NPV}^E(p) = m^E + r\,\sigma^E(p) \,, \tag{6.13} \]

where:

\[ m^E = \frac{1}{S} \left[ \sum_{i=1}^{N_s^2} p_i\, \mathrm{NPV}(p, R_{j_i}) + \sum_{i=1}^{N_n} \tilde{p}_i\, \mathrm{NPV}(x_i, R_i) \right] \,, \tag{6.14} \]

and

\[ \sigma^E(p) = \sqrt{ \frac{1}{S} \left[ \sum_{i=1}^{N_s^2} p_i \big( \mathrm{NPV}(p, R_{j_i}) - m^E \big)^2 + \sum_{i=1}^{N_n} \tilde{p}_i \big( \mathrm{NPV}(x_i, R_i) - m^E \big)^2 \right] } \,. \tag{6.15} \]
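Reusing the weights of Eq. (6.12), the estimated mean and standard deviation of Eqs. (6.14) and (6.15) reduce to plain weighted statistics, as in the short sketch below (again our own illustration):

    import numpy as np

    def weighted_mean_std(values, weights):
        """m^E and sigma^E of Eqs. (6.14)-(6.15): weighted mean and the
        matching (1/S)-normalized weighted standard deviation."""
        values = np.asarray(values, dtype=float)
        weights = np.asarray(weights, dtype=float)
        S = weights.sum()
        m = np.sum(weights * values) / S
        sigma = np.sqrt(np.sum(weights * (values - m) ** 2) / S)
        return m, sigma

    m, sigma = weighted_mean_std([9.1e9, 8.8e9, 9.4e9], [1.0, 0.6, 0.3])
    print(m + (-1.0) * sigma)  # Eq. (6.13) with a risk-averse r = -1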

6.3 Application of CMA-ES using the neighborhood approach on the PUNQ-S3 case

In this section, we apply CMA-ES with the neighborhood approach, which we will
call in the sequel the “using the neighborhood” approach, to the well placement problem
on the benchmark reservoir case PUNQ-S3 [54]. As shown in previous examples in
Chapters 3 and 5, the model contains 19 × 28 × 5 grid blocks, and the elevation and the
geometry of the field are shown in Fig. 3.2. We consider 20 geological realizations that will



Figure 6.1: The evolution of the well placement optimization process on the PUNQ-S3 case
using CMA-ES with the “using the mean of samples” approach. The best mean value of
the NPV over the 20 possible realizations, i.e., NPVR is shown. Three runs are performed.

again be denoted by (Ri)i=1,...,20. Each realization defines one possible porosity map and
one possible permeability map. In these examples, the number of realizations Nr is then
equal to 20.
We plan to drill two wells: one unilateral injector and one unilateral producer. The
dimension of the problem is then equal to 12(= 6 × 2). In all the following applications,
we use CMA-ES as an optimization algorithm with a population size equal to 40.
As a reference approach, we perform three independent runs in which we optimize
the objective function NPVR as defined in Eq. (6.9). In this reference approach, we
perform 20 reservoir simulations for each well configuration to be evaluated. The reference
approach will be called in the sequel the “using the mean of samples” approach. Fig. 6.1
shows the evolution of the best mean value of NPVR, i.e., the NPV over the 20 possible
realizations, for the three performed runs. The “using the mean of samples” approach is
shown to be able to reach a mean value of NPVR equal to $9 × 10^9 using 15200 reservoir
simulations. It is also able to reach a mean value of NPVR equal to $9.3 × 10^9 using 31200
reservoir simulations and a mean value of NPVR equal to $9.5 × 10^9 using 44400 reservoir
simulations.
To evaluate the “using the neighborhood” approach, we use typical values of the param-
eters Nsim , Ns1 , Ns2 and Nn,max as defined in Section 6.2, i.e., Nn,max = 2×Nr , Nsim = 2×Nr ,
Ns1 = 1 and Ns2 = 1. We begin by choosing the maximum distance of selection for the
neighborhood dmax equal to 4000.
Fig. 6.2 shows the evolution of the optimization process for three independent runs



Figure 6.2: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach, for three independent
runs in (a), (b) and (c). The evolutions of the best estimated objective function, i.e.,
NPVE are drawn with green lines. The evaluations on the true objective function over the
20 possible realizations, i.e., NPVR are depicted with red crosses. The maximum distance
of selection for the neighborhood dmax is equal to 4000.

Figure 6.3: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach for eight independent
runs. (a) shows the evolution of the evaluations on NPVR . (b) shows the evolution of the
best found evaluation on NPVR . The maximum distance of selection for the neighborhood
dmax is equal to 4000.


of CMA-ES with the “using the neighborhood” approach. The evolutions of the best
estimated objective function, i.e., NPVE, are drawn with green lines. During the optimization
process, each new overall best point found on NPVE is evaluated on NPVR. The
evaluations performed on NPVR are depicted with red crosses. Fig. 6.2 shows that when
optimizing NPVE, we are able to propose good points according to NPVR (points with
an NPVR greater than $9 × 10^9). Moreover, NPVR tends to increase with an increasing
number of performed reservoir simulations.
Fig. 6.2(c) shows a particular run in which the best NPVE value found at the first
generation is equal to $9.7 × 10^9. This value is calculated according to Eq. (6.11), and thus
using only one single reservoir simulation (with one single random realization).
Indeed, with a single reservoir simulation to evaluate one point, the estimated objective
function cannot in general propose a good point according to NPVR. Consequently, the
best point found at the first generation according to NPVE has a “bad” NPVR value
equal to $5.8 × 10^9. Thus, the optimization process does not propose a new overall best
point to be evaluated on NPVR for 112 iterations. This behavior can
be avoided either by evaluating points on NPVR more often¹ or simply by using more
reservoir simulations for each point to be evaluated at the beginning of the optimization,
i.e., choosing Ns1 ≥ 2.
We show in Fig. 6.3 the performance of eight independent runs of CMA-ES with the
“using the neighborhood” approach. Fig. 6.3(a) shows the evolution of the evaluations
performed on NPVR. The evaluated points correspond to the best overall points found
during the optimization process of NPVE. Fig. 6.3(b) shows the evolution of the best
evaluation performed on NPVR. Seven of the eight performed runs (88%) are
able to reach an NPVR value greater than $9 × 10^9, using a mean number of reservoir
simulations equal to 2851. Consequently, the reduction of the number of reservoir simulations
to reach an NPVR greater than $9 × 10^9 when using the “using the neighborhood”
approach compared to the “using the mean of samples” approach is equal to 81%. Six
of the eight performed runs (75%) are able to reach a value of NPVR greater than
$9.3 × 10^9, using a mean number of reservoir simulations equal to 4307, which corresponds to a
reduction of 86% in the number of reservoir simulations compared to the “using the mean of
samples” approach. However, only two of the eight performed runs
(25%) are able to reach a value of NPVR greater than $9.5 × 10^9. The mean number
of reservoir simulations required to reach this value is 6160. Consequently, the reduction
of the number of reservoir simulations to reach an NPVR greater than $9.5 × 10^9 when
compared to the “using the mean of samples” approach is again equal to 86%.
¹For example, one can evaluate the best point found according to NPVE on NPVR at each iteration.



Figure 6.4: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the mean of samples” approach and the “using the
neighborhood” approach. The evolution of the best found evaluation on NPVR for the
“using the neighborhood” approach is drawn with red lines. The evolution of the best
found evaluation on NPVR for the “using the mean of samples” approach is drawn with
blue lines. Three independent runs are performed for each approach. For the “using the
neighborhood” approach, the maximum distance of selection for the neighborhood dmax is
equal to 4000.


Figure 6.5: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach for four independent
runs. (a) shows the evolution of the evaluations on NPVR . (b) shows the evolution of the
best found evaluation on NPVR . The maximum distance of selection for the neighborhood
dmax is equal to 3000.



Figure 6.6: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using the neighborhood” approach for four independent
runs. (a) shows the evolution of the evaluations on NPVR . (b) shows the evolution of the
best found evaluation on NPVR . The maximum distance of selection for the neighborhood
dmax is equal to 6000.

Three runs of CMA-ES with the “using the neighborhood” approach are shown together
with the three performed runs of CMA-ES with the “using the mean of samples” approach
in Fig. 6.4. Results show that, although the “using the neighborhood” approach does not
guarantee finding the best values of NPVR found by the “using the mean of samples”
approach, the number of reservoir simulations is reduced significantly, by more than 81%.
The impact of the choice of the maximum distance of selection for the neighborhood
dmax is shown in Figs. 6.5 and 6.6. Comparing the results in Figs. 6.5, 6.3 and 6.6 (with
dmax = 3000, 4000 and 6000) shows that the approach is not very sensitive to the choice
of dmax .
In the sequel, we compare the “using the neighborhood” approach with another approach
in which the estimated objective function to be optimized is equal to an evaluation
on a randomly chosen realization. This approach is called the “using one realization”
approach. In this approach, we also evaluate on NPVR only the new overall best points
found on the estimated objective function. Figs. 6.7 and 6.8 show the evolution of the
optimization process for three independent runs of CMA-ES with the “using one realization”
approach. In Fig. 6.7, the evolutions of the best estimated objective function are again
drawn with green lines. If we compare the “using the neighborhood” and the “using one
realization” approaches through Figs. 6.2 and 6.7, it is clear that, contrary to the “using the
neighborhood” approach, which is shown to be able to capture the geological uncertainty,
the “using one realization” approach is not able to propose good points with
high NPVR. The three performed runs with the “using one realization” approach are not
able to reach an NPVR value greater than $9 × 10^9.



Figure 6.7: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using one realization” approach, for three independent runs
in (a), (b) and (c). The evolutions of the best estimated objective function (equal to an
evaluation on a randomly chosen realization) are drawn with green lines. The evaluations
on the true objective function over the 20 possible realizations, i.e., NPVR are depicted
with blue crosses.



Figure 6.8: The evolution of the well placement optimization process on the PUNQ-S3
case using CMA-ES with the “using one realization” approach. The best mean value of
the NPV over the 20 possible realizations, i.e., NPVR is shown. Three runs are performed.

6.4 Summary and discussions

In this chapter, we have defined a new approach to handle geological uncertainty for well
placement, using the objective function evaluations of already evaluated individuals in the
neighborhood. The proposed approach is compared to a reference approach using the
mean of samples for each individual. We have shown on the synthetic benchmark reservoir
case PUNQ-S3 that, although the proposed approach does not guarantee always finding
the best values found by the reference approach, the number of reservoir simulations is
reduced significantly, by more than 81%.

Chapter 7

Conclusions and perspectives

7.1 Conclusions

In this thesis, we have contributed to the research area of optimizing well placement
(locations and trajectories of the wells to be drilled) by addressing the following challenges
(presented in Section 1.3):

(I) The non-smoothness, the multi-modality, the non-convexity and the high dimen-
sionality of the objective function;

(II) The expensive cost of the objective function;

(III) The geological uncertainty handling problem.

The problem (I) was addressed in Chapter 3 by applying the stochastic optimizer CMA-
ES. We have shown that CMA-ES outperforms the genetic algorithm on the PUNQ-S3 case
by leading to a higher net present value (NPV). Moreover, CMA-ES was shown to be able
to define potential regions containing optimal well configurations. The ability of CMA-ES
to find much higher NPV values and to converge to the same region of the search space has
been explained by its advanced adaptation mechanism, which allows the algorithm, on ill-conditioned
non-separable problems, to adapt its sampling probability distribution in an
efficient way.
The problem (II) was addressed by defining two new algorithms aiming at reducing
the number of objective function evaluations, based on meta-models whose underlying
idea is to replace some (true) function evaluations during the optimization process by the
function values given by the meta-model. Meta-models can be considered as a computa-
tionally cheaper replacement of the objective function. This consideration is justified by
the context of costly objective function for the well placement problem. The new-local-
meta-model CMA-ES, denoted nlmm-CMA (Chapter 2) was proposed in order to mitigate
some defects of the already existing local-meta-model CMA-ES (lmm-CMA) when dealing
with large population sizes. The partially separable local-meta-model CMA-ES, denoted


p-sep lmm-CMA (Chapter 4), was proposed, leading to an important speedup compared
to the standard CMA-ES when dealing with partially separable functions. Exploiting the
partial separability of the objective function is motivated by the well placement problem,
in which the reservoir simulations can output the production of each single well (assumed
to depend on fewer parameters) although the objective is to maximize the production
of all the wells. The proposed algorithms (nlmm-CMA and p-sep lmm-CMA) were then
applied to the well placement problem in Chapters 3 and 5. Results demonstrated
the potential huge benefit of the proposed algorithms in reducing the number of
objective function evaluations on the well placement problem, which can be extended
to other reservoir applications. On the benchmark reservoir case PUNQ-S3, nlmm-CMA
was shown to offer results similar to those of CMA-ES without meta-models (solution well
configurations and the corresponding NPV values) while reducing the number of simulations
by around 20% to reach a satisfactory NPV. The use of p-sep lmm-CMA was also
shown on PUNQ-S3 to lead to an important reduction of the number of reservoir simulations,
of around 60% compared to CMA-ES.
Finally, the problem (III) was addressed in Chapter 6 by defining a new approach to
optimize under geological uncertainty with a reduced number of reservoir simulations. The
approach uses the objective function evaluations of already simulated well configurations
in the neighborhood of each well configuration. The proposed approach is shown to be able
to capture the geological uncertainty using a reduced number of reservoir simulations. On
the benchmark reservoir case PUNQ-S3, the proposed approach is able to reduce signifi-
cantly the number of reservoir simulations by more than 80% compared to the reference
approach using all the possible realizations for each well configuration. Research in
the area of reducing the number of reservoir simulations when optimizing under geological
uncertainty has been relatively scarce, and it is expected that an efficient algorithm such
as the proposed one should provide renewed interest in this area.

7.2 Perspectives

Several extensions to the present research can be mentioned. For the two algorithms nlmm-CMA
and p-sep lmm-CMA, defined respectively in Chapters 2 and 4, we have only focused
on local quadratic meta-models. However, other types of meta-models could be used, like
kriging and radial basis functions, as we have no a priori reason to believe that quadratic
meta-models are the best models to use for practical purposes.
In Chapter 4, when defining p-sep lmm-CMA, we have shown on \( f_{\mathrm{Rosen}}^{100} \) that using a
population size λ = 4 × λ_default gives the minimum number of evaluations and improves
the performance by a factor between 1.5 and 2 over the default population size. Future


investigations are needed to define the optimal population size depending on the dimension
of the problem and the dimension of the sub-problems, over a wide range of test functions.
In Chapter 5 when applying p-sep lmm-CMA on the well placement problem, we have
assumed that each element function depends on the parameters defining the considered
well and the minimum distances between the considered well and the other wells. However,
further studies are useful in order to define the best parameters for each element function
and thus improve the accuracy of the partially separated meta-models.
Moreover, the work in Chapters 4 and 5 was motivated by the fact that we need to
exploit more information on the well placement problem and to incorporate
this information into the original algorithm in order to obtain substantial improvements.
Chapters 4 and 5 deal with the partial separability of the objective function. Another
approach could be to exploit some a priori information such as well allocation factors and
connectivity, using the work developed in [41].
The problem of optimization under geological uncertainty remains an open research
problem. The approach proposed in Chapter 6 can be considered as an initial work to
define a robust optimization algorithm handling the uncertainty with a restricted budget
of function evaluations. This work can be extended and enhanced by numerous means,
mainly by using an adaptive strategy to define the parameters of the algorithm. Although
we suspect that the values used for the parameters defining the number of performed reservoir
simulations Ns1 and Ns2 (equal to one) can be a good choice, we believe that a procedure
to adapt the number of function evaluations can improve the performance of the approach
and generalize it to a wider range of problems. Indeed, the adaptation of Ns1 and Ns2
aims at controlling the quality of the estimated function, and thus at deciding whether
the estimated function is sufficiently accurate or whether one needs more evaluations on other
realizations to improve its quality. In Chapter 6, the proposed
approach is compared to a reference approach using the mean of samples for each individual.
However, more comparisons with recent competing approaches are recommended, in
particular with the approach in [73].
Finally, applying the developed methods to other Geosciences optimization problems
(e.g., history matching, production optimization) is suggested since we believe that many
other Geosciences problems could be successfully handled with the developed methods.

Nomenclature

Included in the following list are common symbols employed throughout the thesis. Other,
more specific symbols are defined as used.

C covariance matrix of a multivariate normal distribution


Cd cost of drilling
dii minimum distance between the well i and the other injectors
dpi minimum distance between the well i and the other producers
Hk gross thickness of a layer with an index k
k number of points used to build the meta-model
ki number of points used to build the sub-meta-model (approximating an element
function) with an index i
Lmax maximum well length
m mean of a multivariate normal distribution
n dimension of the problem
nb number of individuals evaluated on the meta-model each iteration cycle
nic number of iteration cycles needed to satisfy the meta-model acceptance criterion
ninit number of initial individuals evaluated on the meta-model
nlayers number of layers in the reservoir
nM number of element variables for an element function
Njun number of junctions
Nlat number of laterals
Nr number of geological realizations
Ns number of samples (sample size)
Nw number of wells
Nwd number of wells already drilled
R revenue
Ri geological realization with an index i
So oil saturation

Symbols

λ number of individuals created each generation (population size)


µ number of individuals of the current generation used to create the individuals
of the next generation
φ porosity
σ step-size; standard deviation

Abbreviations

APR Annual Percentage Rate


BOE Barrel of Oil Equivalent
CPU Central Processing Unit
NPV Net Present Value

Superscripts

E estimated
R true
(g) at generation g

Résumé en Français

(extended abstract, translated into English)

The state of the art of reservoir management has recently been strongly influenced by
technological development. Nowadays, drilling technologies have made great progress, in
particular directional drilling. Consequently, reservoir engineers benefit from the use of
different well configuration architectures, namely vertical and horizontal architectures, as
well as more complex ones, in order to improve reservoir productivity.
The environments, areas and conditions in which oil and gas fields are currently being
discovered are much more complex and difficult to exploit. On the one hand, existing
fields are increasingly depleted, and consequently more marginal. Unless there is a way to
optimize their productivity and take corrective measures, it would be difficult to justify
continuing to invest in the production of these existing fields for economic reasons [14].
On the other hand, new discoveries also need an optimal production scheme to be
economically viable.
One of the most important problems that must be addressed in order to maximize
the net present value of a given project is to decide in an optimal way where to drill the
wells. A well placement decision considerably affects hydrocarbon recovery, and hence the
net present value of the project. In general, such a decision is difficult to make since an
optimal placement depends on a large number of parameters such as reservoir heterogeneities,
faults and the fluids in place. Moreover, the complexity of well configurations, e.g.,
non-conventional wells, implies

[Figure .1 diagram labels: P0(x0, y0, z0); Pd,1(rd,1, θd,1, ϕd,1); Pd,2(rd,2, θd,2, ϕd,2); Q1; Pb,1(lb,1, rb,1, θb,1, ϕb,1).]

Figure .1: An example of the parameterization of a multilateral well with two segments
and one branch.

additional difficulties, such as the concentration of investments and the difficulty of well
intervention.¹
The well placement decision is formulated as an optimization problem:

• the objective function to optimize, which is estimated using a reservoir simulator,
evaluates the economics of the project;

• the problem parameters encode the positions of the different wells (comprising their
locations and trajectories).

We define the location of a given well by the position of its starting point, and we
define the trajectory of a given well by the positions of its main bore and of the different
laterals (if any).
If the number of wells to be placed as well as their types (injector or producer) are
fixed, the parameters encoding the well positions are real numbers, and the objective
function f is a function of a subset of Rn, where n, corresponding to the number of
parameters, equals the sum of the numbers of parameters needed to encode the position
of each well to be placed. An example of the parameterization of a multilateral well is
shown in Fig. .1.
¹Drilling a well generally costs between 1 million and 30 million dollars.

Formally, we seek to find a vector of parameters pmax ∈ Rn such that:

\[ f(p_{\max}) = \max_{p} \{ f(p) \} \,, \tag{.1} \]

where p denotes the vector of parameters to optimize, encoding the positions and
trajectories of the well configurations.
The main objective of this thesis is to propose a procedure to solve the well placement
optimization problem, in particular the locations and trajectories of the wells (as defined
in Eq. (.1)). The proposed procedure must offer the maximum net present value while
using a technically affordable number of reservoir simulations.
This implies the following challenges, namely:

(I) The non-smoothness, the multi-modality, the non-convexity and the high dimensionality
of the objective function;

(II) The expensive cost of the objective function;

(III) The geological uncertainty handling problem.

Considering the state of the art in optimization, the choice of the CMA-ES algorithm [74]
seems a priori suited to tackle problem (I). Indeed, CMA-ES is recognized as one of the
most powerful derivative-free optimizers for continuous optimization [70]. CMA-ES is both
a fast and robust local search algorithm, exhibiting linear convergence on wide classes of
functions, and a global search algorithm, when restarting the algorithm while increasing
the population size. CMA-ES, unlike most other evolutionary algorithms, is a quasi-parameter-free
algorithm.¹
In the petroleum industry, CMA-ES had been applied, to the best of our knowledge,
in only two studies prior to this work: a characterization of fracture conductivities for
well test inversion [32], and a well placement optimization, but using simple attributes
(e.g., productivity indices) [43]. A more recent application to well placement optimization
was published in [116, 115].
To tackle problem (II), we propose to study the coupling of the CMA-ES optimizer
with surrogates (or meta-models). In this context, we seek to define an efficient variant
of CMA-ES coupled with meta-models, able to significantly reduce the number of reservoir
simulations. Moreover, we aim to exploit
¹Only the population size is suggested to be tuned by the user, in order to account for the ruggedness
of the objective function landscape.

knowledge about the optimization problem, in particular the so-called partial separability
of the objective function, in order to further reduce the number of reservoir simulations.
Finally, to tackle problem (III), we aim to define an approach (for CMA-ES) able to
capture the geological uncertainty at a significantly reduced cost in reservoir simulations.
In this context, we seek to define an approach that uses a reduced number of reservoir
simulations (typically one) for each well configuration, instead of performing reservoir
simulations on all the possible geological realizations.

Summary of contributions

In the following, we present a summary of the contributions of this thesis.

We addressed problem (I), related to the non-smoothness, the multi-modality, the
non-convexity and the high dimensionality of the objective function in the well placement
problem, and we presented:

A first successful application of CMA-ES to the well placement problem.
(Results published in [26, 24])
We propose a new methodology for the optimization of well positions and trajectories.
This population-based methodology is a stochastic search algorithm called “Covariance
Matrix Adaptation - Evolution Strategy” (CMA-ES) [74]. We propose to use a new
adaptive penalization technique with rejection to handle constraints.
Since genetic algorithms (GAs) are very often the method of choice in the petroleum
industry, we show the contribution of applying CMA-ES compared to a GA on the synthetic
benchmark reservoir case PUNQ-S3 [54]. To allow a fair comparison, both algorithms are
used without parameter tuning on the problem: standard settings are used for the GA
and default settings for CMA-ES. We show that our new approach outperforms the
genetic algorithm (see Fig. .2): in general, it leads both to a higher net present value
(NPV) and to a significant reduction of the number of reservoir simulations needed to
reach a good well configuration.

After this application of CMA-ES to the well placement problem, we addressed
problem (II), related to the expensive cost of the objective function, and we proposed
two new algorithms:


Figure .2: Mean NPV value (in dollars) and the corresponding standard deviation for a
well placement problem using CMA-ES (solid curve) and the GA (dashed curve). Fourteen
runs are performed for each algorithm.

A new variant of CMA-ES with local meta-models. (Results published in [22])
The local-meta-model CMA-ES (lmm-CMA) [87], which couples local quadratic meta-models
with CMA-ES, is investigated. The dependence of the algorithm on the population
size is analyzed, and the limits of the approach for population sizes larger than the default
one are demonstrated. A new variant for deciding when the meta-model is accepted is
proposed, called the new-local-meta-model CMA-ES (nlmm-CMA).

A new variant of CMA-ES with local meta-models for partially separable
functions. (Results published in [23])
We propose a new variant of CMA-ES with local meta-models for the optimization
of partially separable functions, called the partially separable local-meta-model CMA-ES
(p-sep lmm-CMA). We propose to exploit the partial separability by building, at each
iteration, a meta-model for each element function (or sub-function) using a full local
quadratic model. Our results demonstrate that exploiting the partial separability leads
to an important speedup compared to the standard CMA-ES algorithm. We show on the
tested functions that the speedup increases with the dimension, for a fixed dimension of
the element functions. On the standard Rosenbrock function, the maximum speedup of λ is

Figure .3: Mean NPV value (in dollars) and the corresponding standard deviation for a
well placement problem using CMA-ES with meta-models (solid curve) and CMA-ES
(dashed curve). Ten runs are performed for each algorithm.

reached in dimension 40 using element functions of dimension 2, where λ is the population
size. We also show that larger speedups can be reached by increasing the population size.

We then applied the two proposed new algorithms to the well placement problem:

A significant reduction of the number of reservoir simulations for the well
placement problem. (Results published in [26, 24, 25])
We propose to apply CMA-ES with local meta-models (nlmm-CMA) to the well
placement problem, where, for each configuration of the population, an approximate
quadratic convex model is built using the true objective function evaluations collected
during the optimization process. Coupling CMA-ES with meta-models leads to a significant
improvement, of around 20% on the synthetic benchmark reservoir case PUNQ-S3
(see Fig. .3).
Moreover, we also propose to apply p-sep lmm-CMA to the well placement problem,
building partially separated meta-models for each


Figure .4: Mean NPV value (in dollars) for a well placement problem using CMA-ES with
partially separated meta-models, i.e., p-sep lmm-CMA (solid curve), CMA-ES with
meta-models, i.e., lmm-CMA (dashed curve), and CMA-ES (△). Ten runs are performed for
each algorithm.

well or set of wells, which results in more accurate modeling. Our results
show that, by exploiting the partial separability of the objective function, we obtain
a significant decrease of the number of reservoir simulations needed to find the “optimal”
well configuration, considering a restricted budget of reservoir simulations (see Fig. .4).

A new approach to handle geological uncertainty for the well placement
problem.
We propose a new approach to handle geological uncertainty for the well placement
problem with a reduced number of reservoir simulations. We propose to use a single
realization together with the neighborhood of each well configuration in order to estimate
its objective function, instead of using multiple realizations. The approach is applied on
the synthetic benchmark reservoir case PUNQ-S3 and is shown to be able to capture the
geological uncertainty using a reduced number of reservoir simulations (see Fig. .5).


Figure .5: The evolution of the best NPV (in dollars) obtained for a well placement
problem using CMA-ES with an approach using the mean of samples (blue curve), and
with the proposed approach using the neighborhood (red curve). Three runs are shown
for each algorithm.

Perspectives

Several extensions of the work presented in this thesis can be mentioned. For the
two algorithms nlmm-CMA and p-sep lmm-CMA, defined respectively in Chapters 2
and 4, we have only focused on local quadratic meta-models. However, other types of
meta-models could be used, such as kriging and radial basis functions, since we have
no a priori reason to believe that quadratic meta-models are the best models to use in
practice.
In Chapter 4, when defining p-sep lmm-CMA, we showed on the function \( f_{\mathrm{Rosen}}^{100} \)
that a population size λ = 4 × λ_default gives the minimum number of evaluations, and
thus improves the performance of the algorithm by a factor between 1.5 and 2 compared
to using the default population size. Further investigations are needed to define the
optimal population size as a function of the dimension of the problem and the dimension
of the sub-problems, over a wide range of test functions.
In Chapter 5, when applying p-sep lmm-CMA to the well placement problem, we
assumed that each element function depends on the parameters of the considered well,
as well as on the minimum distances between the considered well and the other wells.
However, further studies are useful in order to define the best parameters for each element
function, and thus to improve the accuracy of the partially separated meta-models.
Moreover, the work in Chapters 4 and 5 was motivated by the fact that we need to
exploit as much information as possible about the well placement problem, and that we
need to incorporate this information into the original algorithm in order to obtain
significant improvements. Chapters 4 and 5 deal with the partial separability of the objective
function. Another approach could consist in exploiting some a priori information such as
well allocation factors and connectivity, using the work developed in [41].
The problem of optimization under geological uncertainty remains an open research
problem. The approach proposed in Chapter 6 can be considered as initial work toward
defining a robust optimization algorithm able to handle uncertainty with a restricted
budget of objective function evaluations. This work can be extended in numerous ways,
notably by using an adaptive strategy to define the parameters of the algorithm. Although
we suspect that the values of the parameters used to define the number of performed
reservoir simulations Ns1 and Ns2 (equal to one) can represent a good choice, we consider
that an adaptive procedure for the number of objective function evaluations can improve
the performance of the approach, and thus generalize it to a wider range of problems.
Indeed,

l’adaptation de Ns1 et Ns2 , vise à contrôler la qualité de la fonction objectif estimée, et donc
à décider si la fonction estimée est suffisamment précise ou si des évaluations sur d’autres
réalisations sont nécessaires pour améliorer la qualité de la fonction estimée.
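
For concreteness, a minimal sketch of the neighborhood-based estimation of Chapter 6, with the number of reservoir simulations per configuration fixed to one, could look as follows; the names (simulate, radius, archive) and the way the realization is drawn are illustrative assumptions, not the exact procedure.

import numpy as np

def neighborhood_estimate(x, archive, simulate, realizations, rng, radius):
    # Simulate the new configuration on a single randomly drawn realization
    # (one reservoir simulation), instead of averaging over all realizations.
    x = np.asarray(x, dtype=float)
    m = rng.integers(len(realizations))
    archive.append((x, simulate(x, realizations[m])))
    # Estimate the objective by averaging the stored values of all evaluated
    # configurations lying within a given radius of x (x itself included).
    vals = [v for (xa, v) in archive if np.linalg.norm(xa - x) <= radius]
    return float(np.mean(vals))

Since neighboring configurations are simulated on different realizations, the neighborhood average samples the geological uncertainty at the cost of a single reservoir simulation per candidate.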
Finally, applying the developed methods to other optimization problems in Geosciences (e.g., history matching, production optimization) is suggested, since we believe that many other problems in Geosciences could be successfully addressed with the methods developed in this thesis.

References

[1] S. I. Aanonsen, A. L. Eide, L. Holden, and J. O. Aasen. Optimizing reservoir


performance under uncertainty with application to well location. In SPE Annual
Technical Conference and Exhibition, number SPE 30710, October 1995.

[2] I. Aitokhuehi, L. J. Durlofsky, V. Artus, B. Yeten, and K. Aziz. Optimization of ad-


vanced well type and performance. In 9th European Conference on the Mathematics
of Oil Recovery (ECMOR IX). EAGE, August 2004.

[3] A. Aizawa and B. W. Wah. Dynamic control of genetic algorithms in a noisy environ-
ment. In Proceedings of the Fifth International Conference on Genetic Algorithms,
pages 48–55, 1993.

[4] A. Aizawa and B. W. Wah. Scheduling of genetic algorithms in a noisy environment.


Evolutionary Computation, 2:97–122, 1994.

[5] A. H. Alhuthali, A. Datta-Gupta, B. Yuen, and J. P. Fontanilla. Optimizing smart


well controls under geologic uncertainty. Journal of Petroleum Science and Engi-
neering, 73(1-2):107 – 121, 2010.

[6] D. V. Arnold. Optimal weighted recombination. In A. H. Wright, M. D. Vose,


K. A. De Jong, and L. M. Schmitt, editors, Foundations of Genetic Algorithms,
volume 3469 of Lecture Notes in Computer Science, pages 215–237. Springer Berlin
/ Heidelberg, 2005.

[7] D. V. Arnold and H.-G. Beyer. Efficiency and mutation strength adaptation of
the (µ/µI , λ)-ES in a noisy environment. In M. Schoenauer, K. Deb, G. Rudolph,
X. Yao, E. Lutton, J. Merelo, and H.-P. Schwefel, editors, Parallel Problem Solving
from Nature PPSN VI, volume 1917 of Lecture Notes in Computer Science, pages
39–48. Springer Berlin / Heidelberg, 2000.

[8] D. V. Arnold and H.-G. Beyer. Local performance of the (µ/µI , λ)-ES in a noisy
environment. Foundations of Genetic Algorithms 6, pages 127–141, 2001.


[9] L. Arnold, A. Auger, N. Hansen, and Y. Ollivier. Information-Geometric Optimiza-


tion Algorithms: A Unifying Picture via Invariance Principles. ArXiv e-prints, June
2011.

[10] V. Artus, L. J. Durlofsky, J. Onwunalu, and K. Aziz. Optimization of nonconven-


tional wells under uncertainty using statistical proxies. Computational Geosciences,
10:389–404, 2006.

[11] A. Auger and N. Hansen. Performance evaluation of an advanced local search evo-
lutionary algorithm. In IEEE Congress on Evolutionary Computation, pages 1777–
1784, 2005.

[12] A. Auger and N. Hansen. A restart CMA evolution strategy with increasing popula-
tion size. In Proceedings of the IEEE Congress on Evolutionary Computation, pages
1769–1776. IEEE Press, 2005.

[13] A. Auger and N. Hansen. Approximate evolution strategy using stochastic ranking.
In IEEE Congress on Evolutionary Computation, pages 745–752, 2006.

[14] T. Babadagli. Development of mature oil fields - a review. Journal of Petroleum


Science and Engineering, 57(3-4):221 – 246, 2007.

[15] T. Bäck and H.-P. Schwefel. An overview of evolutionary algorithms for parameter
optimization. Evolutionary Computation, 1:1–23, 1993.

[16] W. Bangerth, H. Klie, V. Matossian, M. Parashar, and M. F. Wheeler. An auto-


nomic reservoir framework for the stochastic optimization of well placement. Cluster
Computing, 8(4):255–269, 2005.

[17] W. Bangerth, H. Klie, M. F. Wheeler, P. L. Stoffa, and M. K. Sen. On optimization


algorithms for the reservoir oil well placement problem. Computational Geosciences,
10(3):303–319, 2006.

[18] P. Bayer and M. Finkel. Evolutionary algorithms for the optimization of advective
control of contaminated aquifer zones. Water Resources Research, 40(6), 2004.

[19] B. L. Beckner and X. Song. Field development planning using simulated annealing -
optimal economic well scheduling and placement. In SPE annual technical conference
and exhibition, number SPE 30650, October 1995.

[20] A. J. Booker, J. E. Dennis, P. D. Frank, D. B. Serafini, V. Torczon, and M. W.


Trosset. A rigorous framework for optimization of expensive functions by surrogates.
Structural and Multidisciplinary Optimization, 17(1):1–13, 1999.


[21] A. Bouaricha and J. J. Morè. Impact of partial separability on large-scale optimiza-


tion. Computational Optimization and Applications, 7(1):27–40, 1997.

[22] Z. Bouzarkouna, A. Auger, and D. Y. Ding. Investigating the local-meta-model


CMA-ES for large population sizes. In C. Di Chio, S. Cagnoni, C. Cotta, M. Ebner,
A. Ekárt, A. Esparcia-Alcazar, C.-K. Goh, J. Merelo, F. Neri, M. Preuß, J. Togelius,
and G. Yannakakis, editors, Applications of Evolutionary Computation, volume 6024
of Lecture Notes in Computer Science, pages 402–411. Springer Berlin / Heidelberg,
2010.

[23] Z. Bouzarkouna, A. Auger, and D. Y. Ding. Local-meta-model CMA-ES for partially


separable functions. In Proceedings of the 13th annual conference on Genetic and
evolutionary computation, GECCO ’11, pages 869–876, New York, NY, USA, 2011.
ACM.

[24] Z. Bouzarkouna, D. Y. Ding, and A. Auger. Using evolution strategy with meta-
models for well placement optimization. In 12th European Conference on the Math-
ematics of Oil Recovery (ECMOR XII). EAGE, 2010.

[25] Z. Bouzarkouna, D. Y. Ding, and A. Auger. Partially separated meta-models with


evolution strategies for well placement optimization. In SPE EUROPEC/EAGE
Annual Conference and Exhibition, number SPE 143292, 2011.

[26] Z. Bouzarkouna, D. Y. Ding, and A. Auger. Well placement optimization with the
covariance matrix adaptation evolution strategy and meta-models. Computational
Geosciences, 16(1):75–92, 2011.

[27] J. Branke. Creating robust solutions by means of evolutionary algorithms. In Pro-


ceedings of the 5th International Conference on Parallel Problem Solving from Na-
ture, PPSN V, pages 119–128, London, UK, 1998. Springer-Verlag.

[28] J. Branke and C. Schmidt. Faster convergence by means of fitness estimation. Soft
Comput., 9(1):13–20, January 2005.

[29] J. Branke, C. Schmidt, and H. Schmeck. Efficient fitness estimation in noisy envi-
ronments. In Proceedings of Genetic and Evolutionary Computation, pages 243–250,
2001.

[30] R. Brayton, S. Director, G. Hachtel, and L. Vidigal. A new algorithm for statis-
tical circuit design based on quasi-newton methods and function splitting. IEEE
Transactions on Circuits and Systems, 26(9):784–794, 1979.


[31] C. G. Broyden. The convergence of a class of double-rank minimization algorithms.


Journal of the Institute of Mathematics and Its Applications, 6:76–90, 1970.

[32] J. Bruyelle and A. Lange. Automated characterization of fracture conductivities from


well tests. In SPE EUROPEC/EAGE annual conference and exhibition, number SPE
121172, June 2009.

[33] A. Y. Bukhamsin, M. M. Farshi, and K. Aziz. Optimization of multilateral well


design and location in a real field using a continuous genetic algorithm. In SPE/DGS
Saudi Arabia Section Technical Symposium and Exhibition, number SPE 136944,
April 2010.

[34] H. Chen and B. Schmeiser. Stochastic root finding via retrospective approximation.
IIE Transactions, 33(3):259–275, 2001.

[35] Y. Chen. Ensemble-based closed-loop production optimization. PhD thesis, University


of Oklahoma, 2008.

[36] Y. Chen and D. S. Oliver. Ensemble-based closed-loop optimization applied to Brugge


field. In SPE reservoir simulation symposium, number SPE 118926, February 2009.

[37] Y. Chen, D. S. Oliver, and D. Zhang. Efficient ensemble-based closed-loop production


optimization. SPE Journal, 14(4):634–645, 2009.

[38] B. Colson and P. L. Toint. Optimizing partially separable functions without deriva-
tives. Optimization Methods and Software, 20(4-5):493–508, 2005.

[39] A. Cordoba, R. Vilar, A. Lavrov, A. B. Utkin, and A. Fernandes. Multi-objective


optimisation of lidar parameters for forest-fire detection on the basis of a genetic
algorithm. Optics & Laser Technology, 36(5):393 – 400, 2004.

[40] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297,


1995.

[41] P. S. da Cruz, R. N. Horne, and C. V. Deutsch. The quality map: A tool for
reservoir uncertainty quantification and decision making. SPE Reservoir Evaluation
& Engineering, 7(1):6–14, 2004.

[42] L. Damp and L. F. Gonzalez. Optimisation of the nose of a hypersonic vehicle


using dsmc simulation and evolutionary optimisation. In 5th AIAA ASSC Space
Conference, September 2005.


[43] D. Y. Ding. Optimization of well placement using evolutionary algorithms. In SPE


EUROPEC/EAGE annual conference and exhibition, number SPE 113525, June
2008.

[44] D. Y. Ding and F. Mckee. Using partial separability of the objective function for
gradient-based optimizations in history matching. In SPE reservoir simulation sym-
posium, number SPE 140811, February 2011.

[45] N. Durand and J.-M. Alliot. Genetic crossover operator for partially separable func-
tions. In Proceedings of the third annual Genetic Programming Conference, 1998.

[46] T. A. El-Mihoub, A. A. Hopgood, L. Nolle, and A. Battersby. Hybrid genetic algo-


rithms: A review. Engineering Letters, 13(2):124–137, 2006.

[47] A. A. Emerick, E. Silva, B. Messer, L. F. Almeida, D. Szwarcman, M. A. C. Pacheco,


and M. M. B. R. Vellasco. Well placement optimization using a genetic algorithm
with nonlinear constraints. In SPE reservoir simulation symposium, number SPE
118808, February 2009.

[48] M. T. M. Emmerich, K. C. Giannakoglou, and B. Naujoks. Single- and multiobjective


evolutionary optimization assisted by gaussian random field metamodels. IEEE
Transactions on Evolutionary Computation, 10(4):421–439, 2006.

[49] G. Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model


using Monte Carlo methods to forecast error statistics. J. Geophys. Res, 99:10143–
10162, 1994.

[50] C. L. Farmer, J. M. Fowkes, and N. I. M. Gould. Optimal well placement. In 12th


European Conference on the Mathematics of Oil Recovery (ECMOR XII). EAGE,
2010.

[51] K. P. Ferentinos, K. G. Arvanitis, and N. Sigrimis. Heuristic optimization methods


for motion planning of autonomous agricultural vehicles. J. of Global Optimization,
23(2):155–170, June 2002.

[52] J. M. Fitzpatrick and J. J. Grefenstette. Genetic algorithms in noisy environments.


Machine Learning, 3(2):101–120, 1988.

[53] R. Fletcher. A new approach to variable metric algorithms. Computer Journal,


13:317–322, 1970.

[54] F. J. T. Floris, M. D. Bush, M. Cuypers, F. Roggero, and A.-R. Syversveen. Meth-


ods for quantifying the uncertainty of production forecasts: A comparative study.
Petroleum Geoscience, 7:87–96, 2001.


[55] D. B. Fogel. An introduction to simulated evolutionary optimization. IEEE Trans-


actions on Neural Networks, 5:3–14, 1994.

[56] L. J. Fogel. Autonomous automata. Industrial research, 4:14–19, 1962.

[57] F. Forouzanfar, G. Li, and A. C. Reynolds. A two-stage well placement optimiza-


tion method based on adjoint gradient. In SPE annual technical conference and
exhibition, number SPE 135304, September 2010.

[58] A. I. Forrester and A. J. Keane. Recent advances in surrogate-based optimization.


Progress in Aerospace Sciences, 45(1-3):50 – 79, 2009.

[59] S. Gelly, S. Ruette, and O. Teytaud. Comparison-based algorithms are robust and
randomized algorithms are anytime. Evolutionary Computation, 15(4):411–434, De-
cember 2007.

[60] H.-G. Beyer. Towards a theory of ‘evolution strategies’: Some asymptotical results from the (1, +λ)-theory. Evolutionary Computation, 1:165–188, 1993.

[61] D. Goldfarb. A family of variable metric updates derived by variational means.


Mathematics of Computing, 24:23–26, 1970.

[62] Y. Gu and D. S. Oliver. An iterative ensemble Kalman filter for multiphase fluid
flow data assimilation. SPE Journal, 12:438–446, 2007.

[63] B. Guyaguler and R. N. Horne. Optimization of well placement. Journal of Energy


Resources Technology, 122(2):64–70, 2000.

[64] B. Guyaguler and R. N. Horne. Uncertainty assessment of well placement opti-


mization. In SPE annual technical conference and exhibition, number SPE 87663,
September-October 2001.

[65] B. Guyaguler, R. N. Horne, L. Rogers, and J. J. Rosenzweig. Optimization of


well placement in a Gulf of Mexico waterflooding project. In SPE annual technical
conference and exhibition, number SPE 63221, October 2000.

[66] P. Hajela and E. Lee. Genetic algorithms in truss topological optimization. Inter-
national Journal of Solids and Structures, 32(22):3341 – 3357, 1995.

[67] M. Handels, M. J. Zandvliet, D. R. Brouwer, and J. D. Jansen. Adjoint-based well-


placement optimization under production constraints. In SPE reservoir simulation
symposium, number SPE 105797, February 2007.


[68] N. Hansen. Adaptive encoding: How to render search coordinate system invariant.
In G. Rudolph et al., editors, Parallel Problem Solving from Nature - PPSN X,
volume 5199 of Lecture Notes in Computer Science, pages 205–214. Springer Berlin
/ Heidelberg, 2008.

[69] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization


benchmarking 2009: Experimental setup. Research Report 6828, INRIA, 2009.

[70] N. Hansen, A. Auger, R. Ros, S. Finck, and P. Pošík. Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In GECCO ’10: Proceedings of the 12th annual conference companion on Genetic and evolutionary computation, pages 1689–1696, New York, NY, USA, 2010. ACM.

[71] N. Hansen and S. Kern. Evaluating the CMA evolution strategy on multimodal test
functions. In X. Yao, E. Burke, J. A. Lozano, J. Smith, J. J. Merelo-Guervós, J. A.
Bullinaria, J. Rowe, P. Tino, A. Kabán, and H.-P. Schwefel, editors, Parallel Prob-
lem Solving from Nature - PPSN VIII, volume 3242 of Lecture Notes in Computer
Science, pages 282–291. Springer Berlin / Heidelberg, 2004.

[72] N. Hansen, S. D. Müller, and P. Koumoutsakos. Reducing the time complexity of the
derandomized evolution strategy with covariance matrix adaptation. Evolutionary
Computation, 11(1):1–18, 2003.

[73] N. Hansen, A. S. P. Niederberger, L. Guzzella, and P. Koumoutsakos. A method for


handling uncertainty in evolutionary optimization with an application to feedback
control of combustion. IEEE Transactions on Evolutionary Computation, 13(1):180–
197, 2009.

[74] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution


strategies. Evolutionary Computation, 9(2):159–195, 2001.

[75] N. Hansen, R. Ros, N. Mauny, M. Schoenauer, and A. Auger. Impacts of Invari-


ance in Search: When CMA-ES and PSO Face Ill-Conditioned and Non-Separable
Problems. Applied Soft Computing, 11:5755–5769, 2011.

[76] V. Heidrich-Meisner and C. Igel. Hoeffding and bernstein races for selecting policies
in evolutionary direct policy search. In Proceedings of the 26th Annual International
Conference on Machine Learning, ICML ’09, pages 401–408, New York, NY, USA,
2009. ACM.

[77] A. D. Hill, D. Zhu, and M. J. Economides. Multilateral Wells. Society of Petroleum


Engineers, 2008.


[78] J. H. Holland. Outline for a logical theory of adaptive systems. J. ACM, 9(3):297–
314, 1962.

[79] J. H. Holland. Adaptation in natural and artificial systems. University of Michigan


Press, 1975.

[80] R. Hooke and T. A. Jeeves. ”Direct search” solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8:212–229, 1961.

[81] A. Jameson. Aerodynamic design via control theory. Journal of Scientific Comput-
ing, 3(3):233–260, 1988.

[82] M. Jebalia, A. Auger, and N. Hansen. Log linear convergence and divergence of the
scale-invariant (1+1)-ES in noisy environments. Algorithmica, 59(3):425–460, 2011.
10.1007/s00453-010-9403-3.

[83] Y. Jin. A comprehensive survey of fitness approximation in evolutionary computa-


tion. Soft Computing, 9(1):3–12, 2005.

[84] Y. Jin and J. Branke. Evolutionary optimization in uncertain environments - a


survey. IEEE Trans. Evolutionary Computation, 9(3):303–317, 2005.

[85] K. P. Norrena and C. V. Deutsch. Automatic determination of well placement subject to geostatistical and economic constraints. In SPE International Thermal Operations and Heavy Oil Symposium and International Horizontal Well Technology Conference, number SPE 78996, November 2002.

[86] J. Kennedy and R. Eberhart. Particle swarm optimization. In IEEE international


conference on neural networks, pages 1942–1948, 1995.

[87] S. Kern, N. Hansen, and P. Koumoutsakos. Local meta-models for optimization using
evolution strategies. In T. Runarsson, H.-G. Beyer, E. Burke, J. Merelo-Guervós,
L. Whitley, and X. Yao, editors, Parallel Problem Solving from Nature - PPSN IX,
volume 4193 of Lecture Notes in Computer Science, pages 939–948. Springer Berlin
/ Heidelberg, 2006.

[88] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing.


Science, 220(4598):671–680, 1983.

[89] D. Kourounis, D. Voskov, and K. Aziz. Adjoint methods for multicomponent flow
simulation. In 12th European Conference on the Mathematics of Oil Recovery (EC-
MOR XII). EAGE, 2010.


[90] D. G. Krige. A statistical approach to some basic mine valuation problems on the
Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society, 52:119–
139, 1951.

[91] H. Langouët. Optimisation sans dérivées sous contraintes. PhD thesis, Université
Nice - Sophia Antipolis, June 2011.

[92] C. Li and P. H. Heinemann. A comparative study of three evolutionary algorithms


for surface acoustic wave sensor wavelength selection. Sensors and Actuators B:
Chemical, 125(1):311 – 320, 2007.

[93] O. Maron and A. W. Moore. The racing algorithm: Model selection for lazy learners.
Artificial Intelligence Review, 11:193–225, 1997. 10.1023/A:1006556606079.

[94] N. S. Mera. Passive gamma tomography reconstruction of layered structures in


nuclear waste vaults. Inverse Problems, 23(1):385 – 403, 2007.

[95] Z. Michalewicz. Heuristic methods for evolutionary computation techniques. Journal


of Heuristics, 1(2):177–206, 1996.

[96] Z. Michalewicz, D. Dasgupta, R. G. L. Riche, and M. Schoenauer. Evolutionary al-


gorithms for constrained engineering problems. Computers & Industrial Engineering
Journal, 30(4):851–870, 1996.

[97] Z. Michalewicz and G. Nazhiyath. Genocop III: a co-evolutionary algorithm for nu-
merical optimization problems with nonlinear constraints. In D. B. Fogel, editor,
Proceedings of the Second IEEE International Conference on Evolutionary Compu-
tation, pages 647–651, Piscataway, NJ, USA, 1995. IEEE Press.

[98] G. Montes, P. Bartolome, and A. L. Udias. The use of genetic algorithms in well
placement optimization. In SPE Latin American and Caribbean Petroleum Engi-
neering Conference, number SPE 69439, March 2001.

[99] A. Morales, H. Nasrabadi, and D. Zhu. A modified genetic algorithm for horizontal
well placement optimization in gas condensate reservoirs. In SPE annual technical
conference and exhibition, number SPE 135182, September 2010.

[100] L. Nakajima and D. J. Schiozer. Horizontal well placement optimization using quality
map definition. In Canadian International Petroleum Conference, number 2003-053.
Petroleum Society of Canada, June 2003.

[101] J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308–313, 1965.


[102] J. Nocedal. Large scale unconstrained optimization. In The state of the art in
numerical analysis, pages 311–338. Oxford University Press, 1996.

[103] J. Onwunalu and L. J. Durlofsky. Application of a particle swarm optimization algo-


rithm for determining optimum well location and type. Computational Geosciences,
14(1):183–198, 2010.

[104] U. Ozdogan and R. N. Horne. Optimization of well placement under time-dependent


uncertainty. SPE Reservoir Evaluation & Engineering, 9(2):135–145, 2006.

[105] Y. Pan and R. N. Horne. Improved methods for multivariate optimization of field de-
velopment scheduling and well placement design. In SPE annual technical conference
and exhibition, number SPE 49055, September 1998.

[106] I. Pänke, J. Branke, and Y. Jin. Efficient search for robust solutions by means of evo-
lutionary algorithms and fitness approximation. IEEE Transactions on Evolutionary
Computation, 10(4):405–420, 2006.

[107] M. J. D. Powell. The NEWUOA software for unconstrained optimization without


derivatives. In Large Scale Nonlinear Optimization, pages 255–297, 2006.

[108] I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzip-


ien der biologischen Evolution. PhD thesis, Technical University of Berlin, 1971.

[109] R. Ros and N. Hansen. A simple modification in CMA-ES achieving linear time
and space complexity. In G. Rudolph et al., editors, Parallel Problem Solving from
Nature - PPSN X, 10th International Conference Dortmund, Germany, September
13-17, 2008, Proceedings, volume 5199 of Lecture Notes in Computer Science, pages
296–305. Springer, 2008.

[110] F. Rosenblatt. The Perceptron: A probabilistic model for information storage and
organization in the brain. Psychological Review, 65:386–408, 1958.

[111] T. P. Runarsson. Constrained evolutionary optimization by approximate ranking


and surrogate models. In X. Yao, E. Burke, J. A. Lozano, J. Smith, J. J. Merelo-
Guervós, J. A. Bullinaria, J. Rowe, P. Tino, A. Kabán, and H.-P. Schwefel, editors,
Parallel Problem Solving from Nature - PPSN VIII, volume 3242 of Lecture Notes
in Computer Science, pages 401–410. Springer Berlin / Heidelberg, 2004.

[112] Y. Sano and H. Kita. Optimization of noisy fitness functions by means of genetic
algorithms using history of search. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao,
E. Lutton, J. Merelo, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature PPSN VI, volume 1917 of Lecture Notes in Computer Science, pages 571–
580. Springer Berlin / Heidelberg, 2000.

[113] Y. Sano and H. Kita. Optimization of noisy fitness functions by means of genetic algorithms using history of search with test of estimation. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC ’02), pages 360–365, 2002.

[114] P. Sarma and W. H. Chen. Efficient well placement optimization with gradient-based
algorithms and adjoint models. In SPE intelligent energy conference and exhibition,
number SPE 112257, February 2008.

[115] R. Schulze-Riegert, M. Bagheri, M. Krosche, N. Kück, and D. Ma. Multiple-objective


optimization applied to well path design under geological uncertainty. In SPE reser-
voir simulation symposium, number SPE 141712, February 2011.

[116] R. Schulze-Riegert, M. Dong, K. L. Heskestad, M. Krosche, H. Mustafa,


K. Stekolschikov, and M. Bagheri. Well path design optimization under geologi-
cal uncertainty: Application to a complex north sea field. In SPE Russian Oil and
Gas Conference and Exhibition, number SPE 136288, October 2010.

[117] H.-P. Schwefel. Evolutionsstrategie und numerische Optimierung. PhD thesis, Tech-
nical University of Berlin, 1975.

[118] D. F. Shanno. Conditioning of quasi-newton methods for function minimization.


Mathematics of Computing, 24:647–656, 1970.

[119] J. C. Spall. Multivariate stochastic approximation using a simultaneous perturba-


tion gradient approximation. IEEE Transactions on Automatic Control, 37:332–341,
1992.

[120] M. Srinivas and L. M. Patnaik. Adaptive probabilities of crossover and mutation in


genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 24:656–
667, 1994.

[121] P. Stagge. Averaging efficiently in the presence of noise. In A. Eiben, T. Bäck,


M. Schoenauer, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature
- PPSN V, volume 1498 of Lecture Notes in Computer Science, pages 188–197.
Springer Berlin / Heidelberg, 1998.

[122] P. D. Surry and N. J. Radcliffe. Real representations. In Foundations of Genetic


Algorithms 4. Morgan Kaufmann, 1996.


[123] R. Thangaraj, M. Pant, A. Abraham, and P. Bouvry. Particle swarm optimization:


Hybridization perspectives and experimental illustrations. Applied Mathematics and
Computation, 217(12):5208–5226, 2011.

[124] V. Černý. Thermodynamical approach to the traveling salesman problem: An


efficient simulation algorithm. Journal of Optimization Theory and Applications,
45(1):41–51, 1985.

[125] S. Vlemmix, G. J. P. Joosten, D. R. Brouwer, and J. D. Jansen. Adjoint-based


well trajectory optimization in a thin oil rim. In SPE EUROPEC/EAGE annual
conference and exhibition, number SPE 121891, June 2009.

[126] H. Wang, D. E. Ciaurri, and L. J. Durlofsky. Optimal well placement under uncer-
tainty using a retrospective optimization framework. In SPE Reservoir Simulation
Symposium, number SPE 141950, February 2011.

[127] H. Wang and B. W. Schmeiser. Discrete stochastic optimization using linear inter-
polation. In Proceedings of the 40th Conference on Winter Simulation, WSC ’08,
pages 502–508. Winter Simulation Conference, 2008.

[128] L. C. Wrobel and P. Miltiadou. Genetic algorithms for inverse cathodic protection
problems. Engineering Analysis with Boundary Elements, 28(3):267 – 277, 2004.

[129] B. Yeten, L. J. Durlofsky, and K. Aziz. Optimization of nonconventional well type,


location and trajectory. SPE Journal, 8:200–210, 2003.

[130] M. J. Zandvliet, M. Handels, G. M. van Essen, D. R. Brouwer, and J. D. Jansen.


Adjoint-based well-placement optimization under production constraints. SPE Jour-
nal, 13(4):392–399, 2008.

[131] H. Zhao, C. Chen, S. Do, G. Li, and A. C. Reynolds. Maximization of a dynamic


quadratic interpolation model for production optimization. In SPE reservoir simu-
lation symposium, number SPE 141317, February 2011.

[132] D. I. Zubarev. Pros and cons of applying proxy-models as a substitute for full
reservoir simulations. In SPE Annual Technical Conference and Exhibition, number
SPE 124815, October 2009.

Abstract: The amount of hydrocarbon recovered can be considerably increased by finding optimal place-
ment of non-conventional wells. For that purpose, the use of optimization algorithms, where the objective
function is evaluated using a reservoir simulator, is needed. Furthermore, for complex reservoir geologies with
high heterogeneities, the optimization problem requires algorithms able to cope with the non-regularity of the
objective function. The goal of this thesis was to develop an efficient methodology for determining optimal well
locations and trajectories, that offers the maximum asset value using a technically feasible number of reservoir
simulations.
In this thesis, we show a successful application of the Covariance Matrix Adaptation - Evolution Strategy
(CMA-ES) which is recognized as one of the most powerful derivative-free optimizers for continuous optimiza-
tion. Furthermore, in order to reduce the number of reservoir simulations (objective function evaluations), we
design two new algorithms. First, we propose a new variant of CMA-ES with meta-models, called the new-
local-meta-model CMA-ES (nlmm-CMA), improving over the already existing variant of the local-meta-model
CMA-ES (lmm-CMA) on most benchmark functions, in particular for population sizes larger than the default
one. Then, we propose to exploit the partial separability of the objective function in the optimization process
to define a new algorithm called the partially separable local-meta-model CMA-ES (p-sep lmm-CMA), leading
to an important speedup compared to the standard CMA-ES.
In this thesis, we also apply the developed algorithms (nlmm-CMA and p-sep lmm-CMA) to the well placement problem to show, through several examples, a significant reduction of the number of reservoir simulations needed to find optimal well configurations. The proposed approaches are shown to be promising when considering a restricted budget of reservoir simulations, which is the context imposed in practice.
Finally, we propose a new approach to handle geological uncertainty for the well placement optimization
problem. The proposed approach uses only one realization together with the neighborhood of each well
configuration in order to estimate its objective function instead of using multiple realizations. The approach
is illustrated on a synthetic benchmark reservoir case, and is shown to be able to capture the geological
uncertainty using a reduced number of reservoir simulations.

Résumé: La quantité d’hydrocarbures récupérés peut être considérablement augmentée si un placement optimal des puits non conventionnels à forer peut être trouvé. Pour cela, l’utilisation d’algorithmes d’optimisation,
où la fonction objectif est évaluée en utilisant un simulateur de réservoir, est nécessaire. Par ailleurs, pour des
réservoirs avec une géologie complexe avec des hétérogénéités élevées, le problème d’optimisation nécessite des
algorithmes capables de faire face à la non-régularité de la fonction objectif. L’objectif de cette thèse est de
développer une méthodologie efficace pour déterminer l’emplacement optimal des puits et leurs trajectoires,
qui offre la valeur liquidative maximale en utilisant un nombre techniquement abordable de simulations de
réservoir.
Dans cette thèse, nous montrons une application réussie de l’algorithme “Covariance Matrix Adaptation -
Evolution Strategy” (CMA-ES) qui est reconnu comme l’un des plus puissants optimiseurs sans-dérivés pour
l’optimisation continue. Par ailleurs, afin de réduire le nombre de simulations de réservoir (évaluations de la
fonction objectif), nous concevons deux nouveaux algorithmes. Premièrement, nous proposons une nouvelle
variante de la méthode CMA-ES avec des méta-modèles, appelé le nouveau-local-méta-modèle CMA-ES (nlmm-
CMA), améliorant la variante déjà existante de la méthode local-méta-modèle CMA-ES (lmm-CMA) sur la
plupart des fonctions de benchmark, en particulier pour des tailles de population plus grandes que celle par
défaut. Ensuite, nous proposons d’exploiter la séparabilité partielle de la fonction objectif durant le processus
d’optimisation afin de définir un nouvel algorithme appelé la partiellement séparable local-méta-modèle CMA-
ES (p-sep lmm-CMA), conduisant à une réduction importante en nombre d’évaluations par rapport à la
méthode CMA-ES standard.
Dans cette thèse, nous appliquons également les algorithmes développés (nlmm-CMA et p-sep lmm-CMA) sur
le problème de placement des puits pour montrer, à travers plusieurs exemples, une réduction significative du
nombre de simulations de réservoir nécessaire pour trouver la configuration optimale des puits. Les approches
proposées se sont révélées prometteuses en considérant un budget restreint de simulations de réservoir, qui est le
contexte imposé dans la pratique.
Enfin, nous proposons une nouvelle approche pour gérer l’incertitude géologique pour le problème
d’optimisation de placement des puits. L’approche proposée utilise seulement une réalisation, ainsi que le
voisinage de chaque configuration, afin d’estimer sa fonction objectif au lieu d’utiliser multiples réalisations.
L’approche est illustrée sur un cas de réservoir de benchmark, et se révèle être en mesure de capturer
l’incertitude géologique en utilisant un nombre réduit de simulations de réservoir.
