Thesis
Presented with a view to obtaining the PhD Degree in Mathematics (LMD)
Option: OPTIMIZATION
by
GUESSOUM Nabila
In this thesis, we address a global optimization problem in which the objective function is assumed to be Lipschitz continuous with an unspecified Lipschitz constant. Drawing inspiration from the recently introduced BIRECT (BIsection of RECTangles) algorithm, we present a novel diagonal partitioning and sampling strategy. Our framework, named BIRECT-V (V for vertices), seamlessly combines bisection with a two-point sampling approach. In the initial hyper-rectangle, these two points are strategically positioned at 1/3 and 1 along the main diagonal. Unlike many DIRECT-type algorithms, for which vertex-based objective function evaluations are unsuitable for bisection, our strategy, when paired with bisection, provides a more comprehensive description of the objective function. However, this strategy may create new sampling points that coincide with existing ones at shared vertices, necessitating additional evaluations of the objective function and increasing the number of function evaluations per iteration.
To overcome this challenge, we propose a modification of the original optimization domain in order to obtain a reliable approximation of the global solution. Empirical investigations demonstrate that this modification significantly improves the performance of the BIRECT-V algorithm. Our proposal shows promise as a global optimization algorithm, particularly when compared to the original BIRECT and two popular DIRECT-type algorithms across a battery of test problems. It notably excels in scenarios involving high-dimensional problems.
In this thesis, we address a global optimization problem in which the objective function is assumed to be Lipschitz continuous with an unspecified Lipschitz constant.
To overcome this challenge, we propose a modification of the original optimization domain in order to obtain a reliable approximation of the global solution. Empirical investigations demonstrate that this modification significantly improves the performance of the BIRECT-V algorithm. Our proposal shows promise as a global optimization algorithm, particularly compared to the original BIRECT algorithm and two popular DIRECT-type algorithms on a set of test problems. It notably excels in scenarios involving high-dimensional problems.
In this thesis, we address a global optimization problem in which the objective function is assumed to be Lipschitz continuous with an unknown Lipschitz constant.
Inspired by the recently introduced BIRECT (BIsection of RECTangles) algorithm, we present a new diagonal partitioning and sampling strategy. Our framework, named BIRECT-V (V for vertices), seamlessly combines bisection with two-point sampling. In the initial hyper-rectangle, these two points are strategically placed at 1/3 and 1 along the main diagonal. Unlike many other DIRECT-type algorithms, which find vertex-based objective function evaluations unsuitable for bisection, our strategy, when used with bisection, provides a more comprehensive understanding of the objective function. However, this strategy may create new sampling points that coincide with existing points at shared vertices, requiring additional objective function evaluations and increasing the number of function evaluations per iteration.
To overcome this challenge, we propose a modification of the original optimization domain to obtain a reliable approximation of the global solution. Empirical investigations show that this modification significantly improves the performance of the BIRECT-V algorithm. Our proposal shows promise as a global optimization algorithm, particularly when compared with the original BIRECT and two well-known DIRECT-type algorithms on a set of test problems. It performs notably well on high-dimensional problems.
Acknowledgments
First and foremost, I express my gratitude to the Almighty Allah for granting me the
courage and patience to complete this work.
This work would not have been as comprehensive and could not have come to fruition without the assistance and guidance of Prof. CHITER LAKHDAR. I thank him for the exceptional quality of his guidance, his patience, his rigor, his valuable advice, and his availability during the preparation of this thesis.
I also extend my respect and gratitude to the members of the jury who honored me by
evaluating this work.
My deepest gratitude goes to my dear parents for their assistance, support, and guidance. I would also like to thank my husband for his unwavering patience and support, which have been more than indispensable to me.
A special thank you to the professors who taught me throughout my academic journey and illuminated my life with the light of knowledge.
Finally, I extend my thanks in advance to those who will kindly read or listen to me and,
above all, help me progress.
Dedication
To the two noblest and dearest people in the world: my late Father and my Mother, who sacrificed the most beautiful years of their lives to see me succeed and supported me until the end.
Table of Contents
Abstract ii
Résumé i
Acknowledgments iv
Introduction 1
0.1 Context and Research Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.2 Object of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.3 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.4 Research Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.5 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.6 Structure of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.7 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 The DIRECT Algorithm 16
2.1 Lipschitz Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 The one-dimensional DIRECT Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Area-Dividing Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Potentially optimal intervals (POIs) . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 The one-dimensional DIRECT Algorithm . . . . . . . . . . . . . . . . . . . . . . 22
2.2.4 The multidimensional DIRECT algorithm . . . . . . . . . . . . . . . . . . . . . . 22
2.2.5 Strengths of the DIRECT algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.6 Weaknesses of the DIRECT algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Overview of DIRECT-type derivative-free methods . . . . . . . . . . . . . . . . . . . . . 28
2.3.1 Classification of DIRECT Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Bibliography 72
5 Appendices 82
.1 Key characteristics of the Hedar test problems [18] . . . . . . . . . . . . . . . . . . . . . 83
.2 Global minimizer found by the BIRECT-V algorithm using Hedar test problems
[18] with modified domain from Table 3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
.3 Derivative-free optimization software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
List of Figures
2.2.1 The division strategy for both Piyavskii and DIRECT algorithms . . . . . . . . . . . 20
2.2.2 Division of a hypercube. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.3 Illustration of the main operations of BIRECT in n = 2. The gray-colored hyper-rectangles highlight those that are potentially optimal. They are selected in the first phase of the current iteration and divided in the second phase of the same iteration. Each subsequent iteration is obtained by sampling two diagonal points and dividing them along the longest direction, starting with the lowest index if multiple equally long longest directions exist. The blue points represent newly sampled points where the objective function is evaluated, while for the green points, the function values from previous iterations are reused. . . 32
2.3.4 The maximum distance between the sampled points (solid blue circles) and
an unsampled point (open red circle) is higher in the ADC method than in the
BIRECT method. (Color figure online). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.5 The maximum distance between the sampled points (solid blue circles) and
an unsampled point (open red circle) is higher in the ADC method than in the
BIRECT method. (Color figure online). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.1 Description of the initialization and the first three iterations of the new sampling scheme on the Branin test problem. Each iteration is performed by sampling two new points (blue) issued from the old ones (red) and bisecting potentially optimal hyper-rectangles (shown in gray) along the coordinate (branching variable x_br, 1 ≤ br ≤ n) having the largest side length (d^i_br, where d^i_j = b^i_j − a^i_j, j = 1, . . . , n), considering first the coordinate directions with the smallest index j (if more than one coordinate may be chosen). . . 49
3.2.2 Illustration of the selection, sampling, and partitioning schemes from iteration 4 to iteration 5 on the Branin test problem: a situation where two adjacent hyper-rectangles share the same vertex. After bisection of the lower-left hyper-rectangle in iteration 4, the newly created point coincides exactly with the one in the adjacent hyper-rectangle. This point is marked with a circle in iteration 5. . . 50
3.2.3 Geometric interpretation of potentially optimal hyper-rectangles using the BIRECT-V algorithm on the Branin test function at the seventh iteration: (right side) POHs correspond to the lower-right convex hull of the points marked in blue (left side). The positions of the six points (values of f (x)) obtained in BIRECT can be clearly distinguished. We observe three sampled points at which the objective function has been re-evaluated. . . 53
3.3.4 Iteration progress of the BIRECT-Vl algorithm on the left-hand side and BIRECT-V
on the right-hand side while solving the Ackley (No. 3, n =10, from Table 3.3)
test problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
List of Tables
3.1 Preliminary results during the first run of the BIRECT-V algorithm. . . . . . . . . . 57
3.2 BIRECT-Vl and BIRECT-V versus BIRECT and BIRECT-l. . . . . . . . . . . . . . . 61
3.3 Comparison between BIRECT-Vl, BIRECT-V, BIRECT-l, BIRECT, DIRECT-l,
and DIRECT algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
List of Algorithms
1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2 Main steps of the DIRECT Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Code of BIRECT algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4 BIRECT-V algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Introduction
The field of global optimization algorithms has seen significant advancements and innovative adaptations since the seminal work of Jones et al. in 1993 [23], when they introduced
the pioneering DIRECT (DIvide RECTangles) algorithm. This thesis traces the evolutionary
trajectory of these optimization methods, aiming to address the inherent limitations of the
original DIRECT approach.
The original DIRECT algorithm, while groundbreaking, had some recognized limitations. One key limitation was the potential to overly confine the search to local regions, which could lead to excessive function evaluations if not controlled. To address this, researchers emphasized selecting an appropriately sized algorithmic parameter, denoted as ε, to strike a balance between exploration and exploitation. Another approach, proposed by Jones in 2001 [24], involved an alternating strategy in which DIRECT alternated between global and local optimization phases. However, the effectiveness of this alternation depended on the predefined quantity of function evaluations in the global phase. Previous literature introduced two-phase schemes that seamlessly balanced local and global information within a single optimization procedure.
Various extensions of the DIRECT algorithm have been suggested, with a focus on improving the selection of potentially optimal hyper-rectangles and on modifying partitioning and sampling strategies. Unlike most DIRECT-type algorithms, which use a central sampling strategy, some algorithms, such as BIRECT (BIsecting RECTangles) [45] and ADC (adaptive diagonal curves), have explored alternative approaches. BIRECT-V (V for vertices) stands out among them, introducing a novel diagonal partitioning and sampling scheme. It combines bisection with the sampling of two strategically positioned points along the main diagonal of the initial hyper-rectangle. Additionally, both DISIMPL versions employ distinct partitioning strategies.
The objective of this thesis is to introduce an innovative diagonal partitioning and sampling approach, extending the recently developed BIRECT (BIsection of RECTangles) algorithm. This research addresses global optimization problems characterized by Lipschitz-continuous objective functions, where the Lipschitz constant is unknown.
0.3 Research Objectives
1. Develop and propose a novel algorithm named BIRECT-V for global optimization.
2. Extend the BIRECT-V algorithm to create a variation called BIRECT-Vl to further enhance its performance.
5. Evaluate the impact of domain modification on the performance of the BIRECT-V algorithm.
6. Explore the applicability of the proposed approach in handling global optimization problems involving Lipschitz continuous functions subject to linear constraints, addressing a challenging and relevant problem.
0.4 Research Tasks
1. Develop the BIRECT-V algorithm, incorporating bisection and diagonal vertex sampling.
3. Implement numerical experiments and collect data for the numerical comparison of BIRECT-V and BIRECT-Vl with other optimization methods.
4. Analyze the results of the numerical comparison to highlight the advantages of the proposed approach.
6. Explore the applicability of the proposed approach to address global optimization problems involving Lipschitz continuous functions and linear constraints.
0.5 Methodology
The proposed approach builds upon the BIRECT algorithm's core principles while introducing a refined diagonal partitioning and sampling strategy. By strategically modifying the optimization domain, we aim to reduce the likelihood of redundant evaluations and improve the overall efficiency of the optimization process. This enhancement is motivated by the need to strike a balance between exploration and exploitation, ensuring that the algorithm converges to the global solution effectively.
0.6 Structure of the Dissertation
Chapter 1 provides an overview of the research context, outlines the problem statement, discusses the motivation behind the research, and presents the research's aims and objectives. It also addresses the research questions, describes the research methods employed, and discusses the approbation of the research results.
Finally, at the end of the dissertation, the main results of this thesis are summarized.
0.7 Publications
CHAPTER 1
State of the Art
1.1 Introduction
The resolution of global optimization problems has become a central topic in scientific research and engineering. It encompasses various fields such as computer science, mathematics, industrial engineering, and many others. This growing importance arises from the need to find optimal solutions in various contexts, whether optimizing industrial processes, logistics planning, or solving complex problems in artificial intelligence. Advances in this field continue to significantly impact numerous sectors and contribute to improving efficiency, performance, and decision-making in various practical applications.
Furthermore, humans naturally seek to enhance their daily lives and pursue perfection.
They continually strive to minimize expenses, rent, or vehicle consumption, while attempting to maximize their assets.
modeling real-life problems using cost functions and various optimization techniques.
Today, global optimization has become essential for solving a wide range of problems,
whether in industry or other sectors. Indeed, we have witnessed rapid growth in research employing global optimization methods in recent years. This trend can be observed across all scientific domains.
In this chapter, we will explore the general definitions of global optimization methods, which can be divided into two main categories: deterministic and non-deterministic (stochastic) methods.
1.2 Problem Statement
The problem we are studying here is the search for the global minimum of a real function f of n real variables x1, x2, . . . , xn. Such problems frequently arise in applications.
min_{x ∈ D} f (x),    (1.1)

where D ⊂ Rn is the feasible region over which the minimum is sought.
Most of the time, the global optimization problem is defined only as a minimization
problem because the maximization of the real-valued function fˆ(x ) is equivalent to
the minimization of the function :
f (x ) ≡ (−1) × fˆ(x ).
Besides finding the global minimum f ∗ , one or all global minimizers x ∗ such that f (x ∗ ) =
f ∗ are supposed to be found.
1.3 Classification of Global Optimization Problems
In the domain of global optimization, we can categorize problems into two distinct
types : optimization with constraints and unconstrained optimization. When the so-
lution space X is not equivalent to Rn , we refer to the problem (1.1) as a constrained
minimization problem. Conversely, if X = Rn, it is termed an unconstrained minimization problem. It is essential to note that, for a problem to possess a global minimum, additional conditions on the function f are often required, including the requirement that the function grows towards infinity as |x| approaches infinity, expressed as lim_{|x|→+∞} f (x) = +∞.
To grasp the significant challenges inherent in global optimization problems and the
computational expenses associated with their resolution, it is crucial to recognize that
standard techniques in nonlinear optimization typically, at best, identify local minima.
Furthermore, there is no local criterion to discern whether a local solution is indeed
a global minimum. Consequently, conventional optimization approaches that rely on
derivatives, gradients, subgradients, and similar tools are generally ill-suited to locating or identifying global optima. It is important to emphasize that, in general, a local
minimum differs from a global minimum.
Hence, researchers in the field of global optimization have introduced specialized methods whose primary objective is not merely to find local minima but, more importantly, to determine which among them constitutes the global minimum. This determination heavily relies on the inherent structural characteristics of the given problem.
It is worth noting that these classifications are not mutually exclusive, and many optimization problems can fall into multiple categories based on different criteria. The choice of classification depends on the specific problem, its attributes, and the optimization objectives. Researchers and practitioners utilize these classifications to select appropriate solution methods and algorithms tailored to the problem's unique nature.
1.4 Classification of Global Optimization Methods
1.4.1 Stochastic Methods
These methods provide the global minimum of f with a probability close to 1. While, in general, they may not yet fully satisfy practical needs, their applications continue to improve and expand. They are used in the following cases:
1- The objective function is not known analytically but is known through observable
numerical evaluations subject to random perturbation.
There are stochastic methods based on pure random search, involving the evaluation of the function f at a finite set of points in D drawn randomly and independently. These algorithms are particularly useful when f is not given analytically, or when no information about f is available other than its value at every point in D; this is known as the "Black Box Situation."
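As an illustration, pure random search in this black-box setting can be sketched in a few lines; the test function, bounds, and sample budget below are illustrative choices, not part of any specific published method.

```python
import random

def pure_random_search(f, bounds, n_samples=10000, seed=0):
    """Evaluate f at points drawn uniformly and independently from the
    box `bounds` = [(a1, b1), ..., (an, bn)]; keep the best value seen.
    Only function values are used, so this works in the black-box case."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(a, b) for a, b in bounds]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Black-box sphere function on [-5, 5]^2, whose minimum is 0 at the origin.
x_star, f_star = pure_random_search(lambda x: sum(t * t for t in x),
                                    [(-5.0, 5.0), (-5.0, 5.0)])
```

With a fixed seed the search is reproducible; increasing `n_samples` tightens the approximation only slowly, which is the known weakness of pure random search.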
Some stochastic methods involve deterministic procedures with a local strategy, such as the multi-start method, which relies entirely on local optimization techniques. It involves multiple local descents starting from points drawn randomly, independently and uniformly from D.
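A minimal sketch of the multi-start idea follows; the local descent routine here is a deliberately simple coordinate search standing in for whatever local optimizer one would use in practice, and all parameter values are illustrative.

```python
import random

def local_descent(f, x0, step=0.5, tol=1e-6, max_iter=2000):
    """Very simple derivative-free local descent: try moves of size `step`
    along each coordinate and halve the step when nothing improves."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = f(y)
                if fy < fx:          # accept only strict improvements
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5
    return x, fx

def multi_start(f, bounds, n_starts=20, seed=1):
    """Multi-start: independent local descents from points drawn uniformly
    and independently from the box `bounds`; return the best result found."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(a, b) for a, b in bounds]
        x, fx = local_descent(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Double-well example: global minima at x = -1 and x = 1 with value 0.
best_x, best_f = multi_start(lambda x: (x[0] ** 2 - 1.0) ** 2, [(-5.0, 5.0)])
```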
Another type of stochastic algorithm, known as the "random line algorithm," seeks the
global minimum using a deterministic method applied to curves generated randomly.
As a result, the algorithm transforms a multidimensional global optimization problem
into a one-dimensional problem.
Other stochastic methods are based on partitioning the feasible set D into a finite number of subsets, followed by the application of a global optimization algorithm (deterministic or stochastic) successively on randomly drawn subsets, based on a distribution concentrated on the elements of the subdivision.
Among the techniques based on pure random search, we also mention Markovian algorithms, which represent a modification of these techniques. These techniques include adaptive elements that allow simulating a law dependent on the previous point x_k and the value f (x_k) to generate x_{k+1}. One of the best-known Markovian algorithms is the "simulated annealing" algorithm.
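The Markovian structure just described can be sketched as follows; the Gaussian proposal, the geometric cooling schedule, and all parameter values are illustrative assumptions, not the canonical simulated annealing settings.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995,
                        n_iter=5000, seed=2):
    """Markovian random search: propose x_{k+1} near x_k, always accept
    improvements, accept uphill moves with probability exp(-delta/T),
    and cool the temperature T slowly."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        y = [xi + rng.gauss(0.0, step) for xi in x]   # proposal near x_k
        fy = f(y)
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling                                   # slow cooling
    return best_x, best_f

# Double-well example started far from both minima.
best_x, best_f = simulated_annealing(lambda x: (x[0] ** 2 - 1.0) ** 2, [4.0])
```

At high temperature uphill moves are accepted often (exploration); as T shrinks the chain behaves almost greedily (exploitation).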
A second class of stochastic methods comprises methods based on solving stochastic differential equations. These methods aim to find global minima of a function f defined on Rn and sufficiently regular by considering the stochastic differential equation of Itô:

dX(t) = −∇f (X(t)) dt + ε(t) dW(t),

where W(t) is a standard Wiener process. For a function ε(t) that slowly tends to 0 as t approaches ∞, the solution of the Itô equation is a recurrent diffusion process whose stationary distribution concentrates on the set of global minimizers of f. The method therefore attempts to obtain a global minimizer of f by observing, as t → ∞, a trajectory of the diffusion computed numerically.
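One common way to compute such a trajectory numerically is an Euler-Maruyama discretization of the Itô equation dX(t) = −∇f (X(t)) dt + ε(t) dW(t); the logarithmic-type noise schedule and all parameter values below are illustrative assumptions.

```python
import math
import random

def langevin_minimize(grad_f, x0, eps0=1.0, dt=0.01, n_steps=20000, seed=3):
    """Euler-Maruyama discretization of
        dX(t) = -grad f(X(t)) dt + eps(t) dW(t),
    where the noise level eps(t) slowly tends to 0 so that the diffusion
    concentrates near the global minimizers of f."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, n_steps + 1):
        eps = eps0 / math.log(math.e + k * dt)   # slowly decreasing noise
        g = grad_f(x)
        x = [xi - gi * dt + eps * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x

# Quadratic example f(x) = x^2 (so grad f(x) = 2x): the trajectory is
# pulled towards the minimizer at 0 while the noise slowly fades.
x_end = langevin_minimize(lambda x: [2.0 * x[0]], [5.0])
```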
Stochastic algorithms provide the global minimum with a probability close to 1 when a large number of evaluations of f (x) are conducted. They are relatively straightforward to implement on a computer but tend to be less efficient. However, these methods are deployed when deterministic approaches prove inadequate for solving global optimization problems.
Stochastic methods encounter particular challenges when confronted with equality constraints and when managing a substantial number of variables, typically exceeding a dozen. Their primary limitation is their tendency to miss the global solution, and there is no assurance regarding the quality of the resulting solution. Additionally, these methods can be computationally intensive in terms of CPU time when compared to deterministic counterparts.
1.4.2 Heuristic Methods
These methods exhibit several key characteristics. They provide approximate solutions that may not be optimal but are usually acceptable for practical purposes. Heuristics are designed for efficiency, making them suitable for large-scale or time-sensitive problems. They are often tailored to specific optimization challenges, leveraging problem-specific knowledge and rules to guide the search for solutions. Heuristics employ diverse search strategies, including greedy algorithms, local search, simulated annealing, genetic algorithms, and more, to navigate the solution space effectively. Many heuristics use iterative refinement, progressively improving candidate solutions.
Heuristic methods, while providing practical solutions to complex problems, come with inherent limitations. First and foremost, they do not guarantee the discovery of optimal solutions, as their primary aim is to deliver reasonably good solutions quickly, often without reaching the best possible outcome. Additionally, some heuristics are sensitive to initial conditions, making them less reliable due to the potential for significantly different solutions arising from slight variations in initial configurations. Evaluating the quality of heuristic solutions can be challenging, given the absence of a clear optimality criterion. Moreover, heuristics are problem-specific, and adapting them to new domains can be time-consuming. They also lack theoretical guarantees regarding performance, making it difficult to predict their effectiveness. In some cases, heuristics may not converge to a solution or could run indefinitely. Computational intensity, the risk of getting stuck in local optima, parameter tuning, and limited generalizability are further drawbacks. Despite these limitations, heuristics remain valuable for solving complex real-world problems where obtaining exact solutions is impractical, with their effectiveness depending on problem characteristics and practitioner expertise.
1.4.3 Deterministic Global Methods
In this category of methods, randomness is not a factor. That is, to solve a problem,
the algorithm will always behave in the same way and provide the same result. These
algorithms can be classified based on the types of problems they can handle, including
linear programs, convex problems, quadratic problems, polynomial problems, or more
general cases.
These techniques generally have the advantage of not requiring starting points. More importantly, they provide a deterministic answer regarding the quality of the solutions found: Is the optimum local or global? What is the degree of certainty? And so on. This precision is significantly important because it is often much less costly to find a solution than to prove that it is indeed the global optimum. Deterministic methods employ two techniques: reduction and partition.
The simplest theoretical scheme for dimension reduction is called "multi-step reduction." It is based on the representation:

min_{x ∈ X} f (x) = min_{x1 ∈ [0,1]} min_{x2 ∈ [0,1]} · · · min_{xn ∈ [0,1]} f (x1, x2, . . . , xn),

where f is a continuous function over the unit cube X = [0, 1]^n. This scheme can be used to transform the initial global optimization problem over a cube into multiple one-dimensional global optimization problems, although the number of these problems is generally very large.
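A tiny numerical illustration of the nested representation in two dimensions follows, solving each inner one-dimensional problem on a uniform grid; the grid size and test function are illustrative choices.

```python
def grid_min_1d(g, n=201):
    """Approximate the minimum of g over [0, 1] on a uniform grid
    (illustration only; any 1-D global method could be substituted)."""
    return min(g(i / (n - 1)) for i in range(n))

def multistep_min_2d(f, n=201):
    """Multi-step reduction over the unit square:
    min_{x1, x2} f(x1, x2) = min_{x1} ( min_{x2} f(x1, x2) )."""
    return grid_min_1d(lambda x1: grid_min_1d(lambda x2: f(x1, x2), n), n)

# f has its minimum 0 at (0.5, 0.25); both coordinates lie on the grid.
val = multistep_min_2d(lambda x1, x2: (x1 - 0.5) ** 2 + (x2 - 0.25) ** 2)
```

Note the cost: each outer grid point triggers a full inner minimization, which is exactly why the number of one-dimensional problems grows so quickly with the dimension.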
Methods Based on Partition and Elimination Techniques (Branch and Bound)
Many problems, especially optimization problems, have a finite (or countable) set of solutions. In principle, it is possible to enumerate all these solutions and select the one that suits our needs. However, the major drawback of this approach is the prohibitively large number of solutions; enumerating them is not practical.
The branch-and-bound method (progressive evaluation and separation procedure) aims to enumerate these solutions intelligently by exploiting certain properties of the problem. This technique allows for the elimination of partial solutions that do not lead to the desired solution, often making it possible to find the sought-after solution in reasonable time. Of course, in the worst case, this method still requires explicit enumeration of all problem solutions.
To achieve this, the branch and bound method employs a function that places bounds
on certain solutions, either excluding them or retaining them as potential solutions.
The performance of the branch and bound method depends, among other things, on
the quality of this function.
General Algorithm:
For convenience, the execution of the branch-and-bound method is represented through a tree structure. The root of this tree represents the set of all solutions to the considered problem. Below, we summarize the branch-and-bound method for minimization problems.
The method starts by considering the initial problem with its set of solutions, referred to as the root. Procedures for lower and upper bounds are applied to the root. If these two bounds are equal, then an optimal solution is found, and the process terminates. Otherwise, the set of solutions is divided into two or more subproblems, which become children of the root.
The method is then applied recursively to these subproblems, generating a tree structure. If an optimal solution is found for a subproblem, it is feasible but not necessarily optimal for the original problem. Since it is feasible, it can be used to eliminate all its descendants: if the lower bound of a node exceeds the value of an already known solution, it can be asserted that the global optimal solution cannot be contained in the subset of solutions represented by this node. The search continues until all nodes are either explored or eliminated.
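The scheme above can be sketched for minimizing a Lipschitz function on an interval; the lower bound f(midpoint) − L·(width)/2 is one concrete choice of bounding function, and the test function, Lipschitz constant, and tolerance are illustrative.

```python
import heapq
import math

def branch_and_bound(f, a, b, L, tol=1e-3):
    """Branch-and-bound for minimizing an L-Lipschitz f on [a, b].
    A node is an interval [lo, hi] with lower bound f(mid) - L*(hi-lo)/2;
    nodes that cannot beat the incumbent by more than `tol` are pruned."""
    c = 0.5 * (a + b)
    best_x, best_f = c, f(c)
    heap = [(best_f - L * (b - a) / 2.0, a, b)]   # (lower bound, lo, hi)
    while heap:
        lb, lo, hi = heapq.heappop(heap)
        if lb >= best_f - tol:          # prune: eliminate this node
            continue
        mid = 0.5 * (lo + hi)           # branch into two children
        for u, v in ((lo, mid), (mid, hi)):
            m = 0.5 * (u + v)
            fm = f(m)
            if fm < best_f:             # update the incumbent solution
                best_x, best_f = m, fm
            child_lb = fm - L * (v - u) / 2.0
            if child_lb < best_f - tol:
                heapq.heappush(heap, (child_lb, u, v))
    return best_x, best_f

# Classical 1-D test function f(x) = sin(x) + sin(10x/3) on [2.7, 7.5],
# with global minimum about -1.8996 near x = 5.1457; since |f'| <= 1 + 10/3,
# L = 4.5 is a valid Lipschitz constant.
x_min, f_min = branch_and_bound(lambda x: math.sin(x) + math.sin(10.0 * x / 3.0),
                                2.7, 7.5, L=4.5)
```

When the method terminates, every remaining region has been either subdivided or pruned, so the returned value is within `tol` of the global minimum.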
1.5 Conclusions
In conclusion, global optimization is a vital field for finding the best solutions across diverse applications. It distinguishes itself from local optimization by seeking absolute extrema. Global optimization problems can be constrained or unconstrained, with additional conditions often required for global solutions.
Global optimization methods fall into two main categories: stochastic methods, which provide high-probability solutions, and deterministic methods, which guarantee convergence to the global minimum.
Stochastic methods are suitable for complex problems, while heuristic methods prioritize practicality. Deterministic methods offer precision but can be computationally expensive.
The choice of method depends on the problem and resources available. Researchers
continually refine these methods to address real-world challenges effectively.
In the upcoming chapters, we will explore these methods further, examining their strengths and weaknesses. This chapter has equipped you with insights into the world of global optimization and its real-world applications.
CHAPTER 2
The DIRECT Algorithm
vertex. However, the DIRECT algorithm simplifies this complexity by requiring evaluations solely at the midpoints rather than the vertices. To ensure sampling at the center points, mesh refinement involves dividing the interval into thirds, and evaluations are performed at the midpoints of the left and right thirds. The initial center point simply becomes the midpoint of the middle interval, where the objective function is already known. Notably, the algorithm does not rely on a known Lipschitz constant to determine which hyper-rectangles to sample next. Instead, it selects a set of potentially optimal hyper-rectangles with the lowest objective values and the greatest expected decrease in objective function values. For termination, since the Lipschitz constant remains unknown, the number of algorithm iterations is pre-specified.
In this chapter, we delve into the DIRECT algorithm, which was conceived by D. R. Jones, C. D. Perttunen, and B. E. Stuckman [23] in 1993. The name "DIRECT" stems from its fundamental principle of DIviding RECTangles, a core aspect of the algorithm. Before delving into the intricacies of the DIRECT algorithm, we establish the abstract problems that this algorithm aims to address. Additionally, we provide an overview of Lipschitz continuous functions and their properties, which serve as foundational concepts for many classical methods in Lipschitz optimization. However, we also highlight the limitations of these classical approaches. Towards the end of the chapter, we introduce the one-dimensional variant of the DIRECT algorithm and elucidate how it can be extended to handle problems in higher dimensions.
| f (x) − f (y)| ≤ L · ‖x − y‖    (2.1.1)

Here, L is the Lipschitz constant, and ‖x − y‖ represents the distance between points x and y in the domain of the function. This property ensures that the function does not change too rapidly, and it provides a useful condition for certain optimization problems.
The Lipschitz optimization problem typically involves finding the minimum or maximum of a Lipschitz continuous objective function subject to certain constraints. The Lipschitz continuity property can be useful in optimization because it provides a bound on how much the function can vary, which can guide the optimization process.
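A simple consequence of inequality (2.1.1) is that sampled function values yield a lower estimate of the Lipschitz constant; the helper below is an illustrative sketch, not part of any specific algorithm.

```python
import math

def estimate_lipschitz(f, points):
    """Empirical lower estimate of the Lipschitz constant: every valid L
    must satisfy L >= |f(x) - f(y)| / ||x - y|| for each pair of sampled
    points, so the largest observed slope is a lower bound on the true
    constant (the true constant may still be larger)."""
    est = 0.0
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
            if dist > 0.0:
                est = max(est, abs(f(x) - f(y)) / dist)
    return est

# For f(x) = 2*x1 the true Lipschitz constant is 2, and the largest
# observed slope recovers it exactly from collinear samples.
L_hat = estimate_lipschitz(lambda x: 2.0 * x[0],
                           [(0.0,), (0.25,), (0.5,), (1.0,)])
```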
Both Piyavskii and DIRECT are global optimization algorithms that divide the search space into smaller regions. Piyavskii focuses on maximizing the reduction in the objective function's value when dividing, while DIRECT divides into three equal parts perpendicularly to the dimension with the smallest w_i value. The choice of algorithm depends on the specific characteristics of the optimization problem and the available information about the objective function. The problems of the Piyavskii algorithm are addressed by the DIRECT algorithm, where DIRECT stands for DIviding RECTangles. This algorithm was developed by D. R. Jones, C. D. Perttunen, and B. E. Stuckman [23].
To explain this algorithm, we first assume that we know a Lipschitz constant and drop this requirement later. We start with inequality (2.1.1) in its one-dimensional formulation :

|f(x) − f(x′)| ≤ L|x − x′|  ∀x, x′ ∈ [a, b].

Let c = (a + b)/2 and set x′ = c in (2.1.1). Then, for all x ∈ [a, b] :
For x ∈ [a, c] :
f(c) + L(x − c) ≤ f(x) ≤ f(c) − L(x − c),
For x ∈ [c, b] :
f(c) − L(x − c) ≤ f(x) ≤ f(c) + L(x − c).
These two inequalities define a region in which the graph of the function is contained.
Furthermore, we get a lower bound D(a, b) for f in [a, b] by letting x = a or x = b in these inequalities. This means we evaluate the bounding function at the two endpoints of the interval :

D(a, b) = f(c) − L(b − a)/2.
The bounding area is large, and therefore needs to be refined. To do this, we need to
sample the function at more points and divide the search area. The next section explains
how DIRECT does this.
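The lower bound D(a, b) derived above translates directly into code; this is a minimal sketch with a hypothetical function name:

```python
def lower_bound(f, a, b, L):
    """Lipschitz lower bound for min f on [a, b], from one midpoint
    evaluation: D(a, b) = f(c) - L * (b - a) / 2, with c = (a + b) / 2."""
    c = (a + b) / 2.0
    return f(c) - L * (b - a) / 2.0
```

For f(x) = x on [0, 2] with L = 1 the bound equals 0, which here coincides with the true minimum of f on the interval.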
The DIRECT algorithm divides the original interval into three sub-intervals of equal length and assesses the function at the midpoints of these new intervals. As one of these midpoints coincides with the midpoint of the original interval, only two additional evaluations of the function are needed. This distinct feature sets DIRECT apart from the Piyavskii algorithm, which divides the original interval into two sub-intervals of different lengths.
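The trisection step can be sketched as follows (an illustrative helper, not the thesis's notation); note that the middle midpoint coincides with the midpoint of the parent interval, so only the two outer midpoints require new function evaluations:

```python
def trisect(a, b):
    """Split [a, b] into three equal subintervals and return them
    together with their midpoints; the middle midpoint equals (a + b)/2,
    so the parent's function value can be re-used there."""
    h = (b - a) / 3.0
    subintervals = [(a, a + h), (a + h, a + 2 * h), (a + 2 * h, b)]
    midpoints = [(lo + hi) / 2.0 for lo, hi in subintervals]
    return subintervals, midpoints
```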
Galperin [11, 12] and Shen et al. [56] have also proposed evaluating the function at the interval’s midpoint but then dividing the interval there. Similarly, Gourdin et al. [13] advocate the same division strategy as DIRECT. They extend this approach to the N-dimensional case, a topic we will discuss in Section 2.2.4. However, their method differs from DIRECT in how they decide which hyperrectangle to divide next.
The advantage of dividing the interval as done in the DIRECT algorithm is that it requires evaluation only at the midpoint of the interval. This concept readily extends to higher dimensions without an exponential increase in the number of function evaluations, as hyperrectangles in higher dimensions still have a single center. In contrast, if we were to extend the idea of evaluating the function at the endpoints of intervals to higher dimensions, we would need to evaluate the function at all corners, resulting in an exponential increase in function calls with increasing dimensionality. Moreover, the DIRECT strategy is not dependent on the Lipschitz constant since the points at which the function is evaluated are always midpoints. This property holds significance for subsequent discussions.
A visual representation of the two different dividing strategies employed by DIRECT and the Piyavskii algorithm is depicted in Figure 2.2.1. In the diagram, ’×’ marks represent the points where the function is evaluated, while ’|’ marks indicate the endpoints of the intervals.
FIGURE 2.2.1 – The division strategy for both the Piyavskii and DIRECT algorithms.
2.2.2 Potentially optimal intervals (POIs)
The DIRECT algorithm effectively selects intervals for division without requiring prior
knowledge of the Lipschitz constant, denoted as L . Initially, we assume the value of L
is known.
In cases where multiple intervals have the same length, we only consider the intervals with the smallest function value at their midpoint. Such intervals, for which a rate of variation K̃ as in Definition 2.2.1 below exists, hold the potential for the most substantial function improvement if the Lipschitz constant were K̃.
Definition 2.2.1 Let ε > 0 be a positive constant, and let f_min be the current best value of the function. An interval j, with midpoint c_j and half-length d_j, is said to be potentially optimal if there exists a rate of variation K̃ > 0 such that

f(c_j) − K̃ d_j ≤ f(c_i) − K̃ d_i  for all intervals i,   (2.2.2)
f(c_j) − K̃ d_j ≤ f_min − ε |f_min|.   (2.2.3)
In this definition, inequality (2.2.2) expresses the decision to choose intervals that promise the greatest improvement in the function’s value if f were Lipschitz with constant K̃. Inequality (2.2.3) ensures that we have the potential for a significant decrease in f within the considered interval.
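Definition 2.2.1 can be tested computationally by interval arithmetic on the admissible rates K̃: the inequalities against smaller intervals bound K̃ from below, those against larger intervals bound it from above. The sketch below assumes `fs` holds the midpoint values f(c_i) and `ds` the interval measures d_i; the helper name is our own:

```python
def is_potentially_optimal(j, fs, ds, eps=1e-4):
    """Check whether interval j satisfies Definition 2.2.1: does some
    rate K >= 0 satisfy f_j - K d_j <= f_i - K d_i for all i, and
    f_j - K d_j <= fmin - eps * |fmin|?"""
    fmin = min(fs)
    fj, dj = fs[j], ds[j]
    lo, hi = 0.0, float("inf")        # admissible range for K
    for fi, di in zip(fs, ds):
        if di < dj:
            lo = max(lo, (fj - fi) / (dj - di))
        elif di > dj:
            hi = min(hi, (fi - fj) / (di - dj))
        elif fi < fj:                 # same size, strictly better value
            return False
    if lo > hi:
        return False
    if hi == float("inf"):            # no larger interval: any large K works
        return True
    return fj - hi * dj <= fmin - eps * abs(fmin)
```

With two intervals of measures 1.0 and 0.5 and midpoint values 0.0 and 1.0, only the first (larger and better) interval is potentially optimal.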
Let [a_1, b_1] be the initial interval, and set an initial accuracy threshold ε > 0. Initialize f_min as the current best function value and start the iteration. At each step the interval bounds are updated as a_2 ← c_2, or else b_1 ← c_1.
In the following, we describe how to extend the DIRECT algorithm to the multi-dimensional case. We first assume that P is a hyperrectangle in R^n. The main difference in this technique, compared to the one-dimensional case, is the procedure for dividing the search space. For better understanding, we start with a hypercube and then describe the more general case of a hyperrectangle.
Dividing in higher dimensions
Division of a hypercube :
Let c be the center point of a hypercube. Evaluate the function at the points c ± δei ,
where δ equals 1/3 of the side length of the cube, and ei is the i -th Euclidean base vector.
Define wi as
wi = min{ f (c + δei ), f (c − δei )}.
Divide the hypercube in the order given by the wi , starting with the lowest wi .
We divide the hypercube first perpendicular to the direction with the lowest w_i. Then we divide the remaining volume perpendicular to the direction of the second lowest w_i, and so on until the hypercube is divided in all directions. The division we create with this strategy puts c in the center of a hypercube with side length δ. Let b = arg min_{i=1,...,N} { f(c + δe_i), f(c − δe_i) }. Then b will be the center of a hyperrectangle with one side of length δ and N − 1 sides of length 3δ. This means that, of the new hyperrectangles, the one with the smallest function value at its center has the largest volume of all new hyperrectangles.
Division of a hyperrectangle :
Moving forward to Figure 2.2.2 b, we encounter the subsequent step within the algorithm. Here, DIRECT proceeds to partition the shaded region. In the second box of Figure 2.2.2 b, you can observe the specific locations where DIRECT samples the function, while the third box illustrates the single division of the rectangle.
Continuing to Figure 2.2.2 c, we delve into the third stage of the algorithm within this particular example. In this step, DIRECT is set to divide the two shaded rectangles. One of these rectangles happens to be a square, thus it undergoes a double division, as explained earlier. Conversely, the larger area takes the form of a rectangle and undergoes a single division operation.
Consequently, we have successfully described the process and presented it as the "Divide algorithm". The rectangle division procedure is summarized in Algorithm 1.
Algorithm 1 Divide Algorithm
1: Identify the set I of dimensions, the maximum side length d , and set δ = d/3.
2: Evaluate f at the points c ± δei , where i ∈ I .
3: Calculate w and divide according to the values of w .
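Steps 1-3 of the Divide algorithm can be sketched as follows; only the ordering of the coordinate directions is returned, and the function name is our own:

```python
def division_order(f, c, side):
    """Sample f at c +/- delta*e_i (delta = side/3) and return the
    coordinate indices sorted by
    w_i = min(f(c + delta*e_i), f(c - delta*e_i)),
    i.e. the order in which the hypercube is divided."""
    delta = side / 3.0
    w = []
    for i in range(len(c)):
        plus, minus = list(c), list(c)
        plus[i] += delta
        minus[i] -= delta
        w.append(min(f(plus), f(minus)))
    return sorted(range(len(c)), key=lambda i: w[i])
```

For f(x) = x_1^2 + 2 x_2^2 around the origin, coordinate 1 has the lower w and is divided first.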
The only remaining component needed to extend the DIRECT algorithm to n dimensions is the definition of potentially optimal hyperrectangles. The selection procedure at the initial step is trivial ; there exists only one candidate. To make the selection of potentially optimal hyper-rectangles in future iterations, DIRECT assesses the goodness based on the lower bound estimates for the objective function f(c_i) over each hyper-rectangle. The requirement of potential optimality is stated formally in Definition 2.2.2.
Definition 2.2.2 Let ε > 0 be a positive constant, and let f_min be the current best value of the function. A hyper-rectangle j is said to be potentially optimal if there exists a rate of variation K̃ such that

f(c_j) − K̃ δ_j ≤ f(c_i) − K̃ δ_i,  ∀i,   (2.2.4)
f(c_j) − K̃ δ_j ≤ f_min − ε |f_min|,   (2.2.5)

where

δ_i = (1/2) ‖b − a‖_2.   (2.2.6)

Here, ‖b − a‖ represents the Euclidean norm of the vector (b − a), where a and b are vectors defining the bounds of the hyper-rectangle.
The only difference between this definition and Definition 2.2.1 is that, instead of the
interval length in the one-dimensional case, we need another measure of size, denoted
as δi , for the hyperrectangle. Jones et al. [3] choose δ to be the distance from the center
of the hyperrectangle to its vertices. Many extensions and modifications of the DIRECT
algorithm can be found in the literature.
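The size measure δ_i of Eq. (2.2.6), the distance from the center of [a, b] to a vertex, can be computed as in this minimal sketch (helper name is ours):

```python
import math

def rect_measure(a, b):
    """delta_i = 0.5 * ||b - a||_2: half the diagonal length of the
    hyper-rectangle [a, b], i.e. the center-to-vertex distance."""
    return 0.5 * math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))
```

For a 3-by-4 box the diagonal has length 5, so the measure is 2.5.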
EndWhile
Return : f_min, x_min and algorithmic performance measures.
The DIRECT algorithm offers several notable strengths, making it a valuable tool for global optimization. Firstly, it operates as a versatile black-box optimization algorithm, negating the need for prior knowledge of gradients or specific characteristics of the objective function. This versatility enables its application across a wide spectrum of optimization problems. Secondly, DIRECT provides a strong guarantee of global convergence, assuming the continuity of the objective function. This assurance is particularly valuable when dealing with intricate, multi-modal objective functions.
A distinctive feature of DIRECT is its adaptive approach to balance local and global exploration. This balance is achieved without the need for explicit hyperparameters. DIRECT dynamically selects rectangles with lower bounds for specific Lipschitz constants, with smaller constants favoring local search and larger ones favoring global exploration. This adaptability enhances its robustness, allowing it to perform effectively across problems of varying complexity without manual parameter tuning.
Moreover, DIRECT exhibits a unique property wherein sampled points tend to cluster around global minima. This clustering phenomenon can be leveraged to identify promising regions for initiating local optimization algorithms to further refine solutions.
Despite its notable strengths, the DIRECT algorithm does exhibit certain weaknesses. One significant limitation is the reliance on a user-defined parameter, namely the desired accuracy (ε). This parameter, although intuitive in its interpretation as the desired level of accuracy, can impact performance. If set too conservatively, it may lead to overly cautious, locally focused searches, potentially overlooking valuable global optima. Conversely, if set too aggressively, it might waste computational resources exploring negligible improvements. Striking the right balance for ε can be a non-trivial task, and the algorithm’s sensitivity to this parameter poses a challenge.
In conclusion, while DIRECT offers valuable capabilities for global optimization, users
must be mindful of its sensitivity to the ε parameter, its reliance on function continuity,
and potential scalability challenges in high-dimensional spaces when considering its
application to specific optimization tasks.
The DIRECT algorithm, originally introduced in [23], has undergone numerous variations and extensions over the years. These adaptations and extensions aim to overcome the inherent limitations of the original DIRECT algorithm and enhance its effectiveness in solving practical global optimization problems. In this section, we present a comprehensive survey of significant modifications and extensions derived from the DIRECT algorithm, as documented in the existing literature.
Numerous articles have explored ways to modify and extend the DIRECT algorithm, introducing innovative features to improve the selection of potentially optimal hyperrectangles (POHs). These proposals often focus on novel partitioning and sampling techniques, which are instrumental in enhancing the algorithm’s performance.
Table ?? in this section serves as a valuable reference, summarizing the range of algorithms included in the DIRECTGO toolbox for box-constrained global optimization. While most of these algorithms adhere to the trisection approach for n-dimensional POHs, it’s important to note that ADC, BIRECT, and both DISIMPL versions deviate from this convention by employing alternative partitioning strategies.
The remainder of this section will provide a detailed exploration of these various algorithms and their unique contributions to the field of global optimization. In doing so, we aim to offer a comprehensive overview of the advancements made in the realm of DIRECT-based optimization techniques.
These variations can be categorized into several groups based on their key characteris-
tics and strategies. We’ll discuss each group separately :
Adaptive Diagonal Curves (ADC) : An adaptive diagonal curves (ADC) based algorithm with a new two-phase technique balancing local and global information was introduced in [51]. Independently of dimensionality, the ADC algorithm evaluates the objective function at two vertices a_i^k and b_i^k of the main diagonal. Notice that up to 2^n hyper-rectangles can share the same vertex, leading (in a long sequence) to a smaller number of sampled points than the total number of hyper-rectangles in the current partition. Furthermore, as in the revised version of DIRECT [24], ADC trisects each selected POH along just one of the longest dimensions. Such a diagonal scheme potentially obtains more comprehensive information about the objective function than center-based sampling, which sometimes may take many iterations to find the solution. For example, a hyper-rectangle containing the optimum but with a bad function value at the midpoint becomes undesirable for further selection. The ADC algorithm intuitively reduces this chance for both sampling points in the hyper-rectangle containing the optimum solution.
Bisecting Rectangles (BIRECT) : In the domain of derivative-free optimization, Paulavičius et al. introduced the BIRECT algorithm as a notable variant of the well-established DIRECT method [45]. What sets BIRECT apart is its distinctive approach to sampling points within rectangles and its preference for bisection over trisection. In BIRECT, each rectangle is carefully sampled at two points, strategically positioned at 1/3 and 2/3 along a diagonal within the rectangle (see Figure 2.3.3). This innovative sampling strategy addresses a fundamental issue encountered in DIRECT : the potential selection of rectangles with unfavorable center-point function values. By sampling two points per rectangle, BIRECT significantly reduces the likelihood of selecting rectangles where both points align with regions of poor function values, thereby promoting more favorable selections in the algorithm’s decision-making process. To illustrate the division and sampling approach of BIRECT, consider an example where n = 2. For the complete description, we refer to Paulavičius et al. (2018).
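BIRECT's two sampling points of a box [a, b], at 1/3 and 2/3 along the main diagonal, can be computed as in this minimal sketch (helper name is ours):

```python
def birect_sample_points(a, b):
    """Return the two BIRECT sampling points of the box [a, b],
    located at 1/3 and 2/3 along its main diagonal."""
    p1 = [ai + (bi - ai) / 3.0 for ai, bi in zip(a, b)]
    p2 = [ai + 2.0 * (bi - ai) / 3.0 for ai, bi in zip(a, b)]
    return p1, p2
```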
Another key advantage of BIRECT lies in its ability to establish tighter Lipschitzian lower bounds. In comparison to alternative methods like ADC, BIRECT minimizes the maximum distance between an unsampled point within a rectangle and the closest of the two sampled points (see Figure 2.3.4). Consequently, it offers more precise estimates of function values within the rectangle, enhancing the efficiency of the optimization process.
FIGURE 2.3.4 – The maximum distance between the sampled points (solid blue circles) and an
unsampled point (open red circle) is higher in the ADC method than in the BIRECT method.
(Color figure online).
However, it’s worth noting that the full impact of BIRECT, especially in comparison to other methods like ADC, remains an active area of research. The trade-off between the advantages brought by BIRECT’s dual-point sampling and any potential disadvantages, such as a lower ratio of sampled points to rectangles, continues to be a subject of investigation.
br = min{ arg max_{j=1,...,n} { d_j^i = |b_j^i − a_j^i| } }.

Set f_min^{m+1} = min{ f(l^{m+1}), f(u^{m+1}) }, f_min^{m+2} = min{ f(l^{m+2}), f(u^{m+2}) } ;
Set f_min = min{ f_min, f_min^{m+1}, f_min^{m+2} }, x_min = arg min_{x ∈ {x_min, x_min^{m+1}, x_min^{m+2}}} f(x) ;
// Dividing
Divide D̄^i into two new hyper-rectangles D̄^{m+1} and D̄^{m+2} ;
Update δ^{m+1} and δ^{m+2} ;
Set k = k + 1 and I_k = I_k ∪ {m + 1, m + 2} ;
Return : f_min, x_min and algorithmic performance measures : m, k and p_e.
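The bisection rule used above (split along the longest side, breaking ties in favor of the smallest coordinate index) can be sketched as follows; the helper name is our own:

```python
def bisect_longest(a, b):
    """Bisect the box [a, b] along its longest side (smallest index on
    ties) and return the two child boxes as (a, b) pairs."""
    d = [bi - ai for ai, bi in zip(a, b)]
    br = d.index(max(d))              # first occurrence = smallest index
    mid = (a[br] + b[br]) / 2.0
    left_b, right_a = list(b), list(a)
    left_b[br] = mid
    right_a[br] = mid
    return (list(a), left_b), (right_a, list(b))
```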
DISIMPL (DISIMPL-C and DISIMPL-V) :
DISIMPL [42] departs from the hyper-rectangle approach and uses simplicial partitions. Initially, it divides the hyper-cube into n! simplices using combinatorial vertex triangulation. DISIMPL offers two distinct sampling strategies : DISIMPL-C evaluates the objective function at the geometric center of the simplex, while DISIMPL-V evaluates it at all unique vertices of the simplex. Since the total number of initial simplices grows rapidly with the dimension, DISIMPL can effectively be used only for small box-constrained problems. However, the DISIMPL approach is promising (among all DIRECT-type methods) for symmetric optimization problems [42, 66] and problems with linear constraints [67].
MrDIRECT and MrDIRECT075 :
Both algorithms cycle through these levels using a “W-cycle” : 21011012. The main difference between the proposed algorithms is that MrDIRECT uses a fixed ε = 10^−4 value at all levels, while MrDIRECT075 follows the above-mentioned rules.
Aggressive DIRECT :
In [1], the authors relaxed the selection criteria of POHs and proposed an aggressive version of the DIRECT algorithm. Aggressive DIRECT’s main idea is to select and divide at least one hyper-rectangle from each group of different diameters (δ_i^k) having the lowest function value. Therefore, using Aggressive DIRECT in the situation presented in Fig. 2, a hyper-rectangle with the smallest measure δ_i^k would also be selected and divided. The aggressive version performs more function evaluations per iteration than other DIRECT-type methods. From the optimization point of view, such an approach seems less favorable since it “wastes” function evaluations by exploring unnecessary (non-potentially optimal) hyper-rectangles. However, such a strategy is much more appealing in a parallel environment, as was shown in [15, 16, 17, 71]. Note that the authors did not specify which hyper-rectangle should be selected from the same group (δ_i^k) if more than one with identical objective function values exist. Thus, the hyper-rectangle with the larger index value was selected in our implementations.
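The aggressive selection rule, including the larger-index tie-break described above, can be sketched as follows (illustrative helper, names are ours):

```python
def aggressive_select(fs, diams):
    """From each group of hyper-rectangles with equal diameter, pick the
    index with the lowest function value, breaking ties by taking the
    larger index; return the selected indices in ascending order."""
    best = {}
    for i, (fv, d) in enumerate(zip(fs, diams)):
        if d not in best or fv <= fs[best[d]]:   # '<=' keeps larger index on ties
            best[d] = i
    return sorted(best.values())
```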
3. Measure-Based Variations
DIRECT-m :
In [7], the authors concluded that the original DIRECT algorithm is sensitive to the objective function’s additive scaling. Additionally, the algorithm does not operate well when the objective function values are sufficiently large. To overcome this, the authors proposed a scaling of function values by subtracting the median (f_median) of the collected function values. DIRECT-m replaces equation (2.2.5) in Definition 2.2.2 with :

f(c_j) − K̃ δ_j ≤ f_min − ε |f_min − f_median|.

Similarly, in [21], the authors extended the same idea in DIRECT-a to reduce the objective function’s additive scaling. Instead of the median value, the authors proposed to use the average value (f_average) at each iteration.
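Assuming the modified condition keeps the form f(c_j) − K̃δ_j ≤ f_min − ε|f_min − f_ref|, with f_ref the median (DIRECT-m) or the average (DIRECT-a) of the collected values (this exact right-hand side is our reading of the description above), the threshold can be computed as:

```python
import statistics

def scaled_threshold(fvals, eps=1e-4, reference="median"):
    """Right-hand side fmin - eps * |fmin - fref| of the scaled
    potential-optimality condition; fref is the median (DIRECT-m)
    or the average (DIRECT-a) of the collected function values.
    The exact formula is an assumption, not taken from the thesis."""
    fmin = min(fvals)
    if reference == "median":
        fref = statistics.median(fvals)
    else:
        fref = statistics.fmean(fvals)
    return fmin - eps * abs(fmin - fref)
```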
glbSolve-sym :
Another extension of the DIRECT algorithm was proposed in [14]. The authors introduced glbSolve-sym (glbSolve-sym2) as DIRECT extensions for symmetric Lipschitz continuous functions. When solving symmetric optimization problems, there exist equivalent subregions in the hyper-rectangle. The algorithm determines which hyper-rectangles can be safely discarded, considering the problem’s symmetrical nature, and avoids exploration over equivalent subregions.
PLOR :
In the PLOR algorithm [22], the set of POHs is reduced to just two, corresponding to the first and last point on the Pareto front (see the right panel in Fig. 2). Therefore, only the hyper-rectangles with the lowest function value and with the largest measure, breaking ties in favor of a better center-point function value, are selected.
DIRECT-GL :
In DIRECT-GL, the set of POHs is enlarged using a two-step selection scheme : first with an emphasis on the global search, selecting an extended set of the best candidates from each diameter group, and then with an emphasis on the local search, selecting the hyper-rectangles closest to the current minimum point.
These extensions and variations of the DIRECT algorithm address its limitations and enhance its performance in solving real-world global optimization problems. Each method introduces unique strategies and modifications to improve the selection of potentially optimal hyper-rectangles and the overall optimization process.
FIGURE 2.3.5 – The maximum distance between the sampled points (solid blue circles) and an
unsampled point (open red circle) is higher in the ADC method than in the BIRECT method.
(Color figure online).
2.4 Conclusions
In this chapter, we have discussed various global optimization methods and examined their respective advantages and disadvantages. It’s important to note that the scope of this work does not permit an exhaustive exploration of all global optimization methods. Specifically, we have highlighted the DIRECT algorithm, which excels in efficiently discovering global optima for objective functions characterized by multimodality and Lipschitz continuity, all without relying on derivative information. This is achieved through a process of iteratively subdividing the search space and judiciously selecting promising regions, making it well-suited for a wide array of optimization problems. A strength of the DIRECT-l algorithm is its capability to efficiently find global optima for Lipschitz-continuous objective functions in a manner that mitigates the computational cost associated with the original DIRECT algorithm.
It’s worth emphasizing that determining the superiority of one method over another is a task highly dependent on the specific problem, its context, and the desired objectives.
CHAPTER 3
BIRECT-V Algorithm
3.1 Introduction
While most DIRECT-type algorithms don’t find evaluating the objective function at vertices suitable for bisection, our approach defies this trend, leading to a more comprehensive understanding of the objective function. However, newly introduced sampling points might coincide with existing ones at shared vertices, necessitating additional objective function evaluations and consequently increasing the number of evaluations per iteration. To surmount this issue, we propose a modification
to the original optimization domain that yields a good approximation of the global solution. Experimental investigations underscore the positive impact of this modification on the performance of the BIRECT-V algorithm. Our proposed methodology showcases promise as a global optimization algorithm, outperforming the original BIRECT and two popular DIRECT-type algorithms across a range of test problems, particularly excelling in high-dimensional scenarios.
As such, the primary aim of this chapter revolves around the novel sampling scheme (sampling at vertices) integrated into the BIRECT-V algorithm framework, without introducing additional parameters. This chapter delves into the exploration of this fresh approach, discussing both its merits and limitations.
In this section we start by giving a description of the principle of the sampling and division strategies retained from the original BIRECT algorithm. Then we introduce our suggested method with emphasis on the sampling strategy. We conclude this section with an illustration of this new scheme.
As we have seen in Subsection 2.3, the original BIRECT (BIsection of RECTangles) algorithm, as formulated by Paulavičius et al. [45], employs a diagonal space-partitioning technique and comprises two primary procedures : diagonal sampling and hyper-rectangle bisection. The algorithm’s initial step involves scaling the original search space D to the unit hyper-cube D̄, in which all variables are normalized.
Selection rule
P_k = {D̄^i : i ∈ I_k},
After the initial covering, BIRECT-V moves to the subsequent iterations by partitioning potentially optimal hyper-rectangles and evaluating the objective function f(x) at their new sampling points.
New sampling points are obtained by adding to and subtracting from the previous (old) ones a distance equal to the half-side length of the branching coordinate. This way, old sampled points from the previous iterations are re-used in descendant subregions.
A vital aspect of the algorithm is how the selected hyper-rectangles D̄^i, i ∈ I_k, are divided. For every potentially optimal hyper-rectangle the set of the maximum coordinates (edges) is computed, and every potentially optimal hyper-rectangle is bisected (divided into halves of equal size) along the coordinate (branching variable x_br, 1 ≤ br ≤ n) having the largest side length (d_br^i), by first considering the coordinate directions with the lowest index j (if more coordinates may be chosen), where function values are more promising [70] :

br = min{ arg max_{1≤j≤n} d_j^i = |b_j^i − a_j^i| },   (3.2.4)
In this subsection, we present the basic idea of the new sampling scheme in a more general setting. An illustration is given on a two-dimensional example in Fig. 3.2.1 and Fig. 3.2.2. Since our new method is based on the original BIRECT algorithm, BIRECT-V follows the same hyper-rectangle selection and subdivision procedures, while the sampling is performed in a different way.
In the initialization phase, BIRECT-V normalizes the search domain to an n-dimensional unit hyper-rectangle D̄_0^1, and evaluates the objective function f(x) at two different diagonal points : the “third” point t^i = (t_1^i, ..., t_n^i) = (1/3, ..., 1/3)^T and the “vertex” point v^i = (v_1^i, ..., v_n^i) = (1, ..., 1)^T. The scaled hyper-rectangle is considered as the only trivial selected POH.
In the succeeding iterations, POHs are selected and bisected in essentially the same way as in BIRECT, with the change that in inequalities (3.2.1) and (3.2.2) the sampled points l^i and u^i are replaced by t^i = l^i and v^i = u^i + (1/3)‖b^i − a^i‖, respectively, using the same measure of the hyper-rectangle given by Eq. (3.2.3).
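In a general box [a, b], the two BIRECT-V sampling points sit one third along the main diagonal and at the upper vertex; a minimal sketch (the helper name is ours):

```python
def birect_v_points(a, b):
    """Return the 'third' point t (1/3 along the main diagonal of [a, b])
    and the 'vertex' point v (the upper vertex b) used by BIRECT-V."""
    t = [ai + (bi - ai) / 3.0 for ai, bi in zip(a, b)]
    v = list(b)
    return t, v
```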
Selected POHs are divided only along the coordinate (branching variable x_br, 1 ≤ br ≤ n) having the largest side length (d_br^i), by first considering the coordinate directions with the smallest index j (if more coordinates may be chosen). This restriction guarantees that the hyper-rectangle will shrink in every dimension. Potentially optimal hyper-rectangles are shown on the left side of Fig. 3.2.3, and correspond to the lower-right convex hull of the set of points.
We now formalize our sampling and partitioning schemes in a more general case. Suppose that at iteration k, D̄_k^i = [a^i, b^i] = {x ∈ D̄ : 0 ≤ a_j^i ≤ x_j ≤ b_j^i ≤ 1, j = 1, ..., n}, i ∈ I_k, is a hyper-cube.
Since all the variables x_j, j = 1, ..., n, of D̄_k^i have the same side lengths (d_j^i = b_j^i − a_j^i, j = 1, ..., n), D̄_k^i is bisected (divided into halves) across the middle point (1/2)(a_1^i + b_1^i) of the coordinate direction with the smallest index (x_j, j = 1) into two
FIGURE 3.2.1 – Description of the initialization and the first three iterations of the new sampling scheme on the Branin test problem. Each iteration is performed by sampling two new points (blue) issued from the old ones (red) and bisecting potentially optimal hyper-rectangles (shown in gray) along the coordinate (branching variable x_br, 1 ≤ br ≤ n) having the largest side length (d_br^i, where d_j^i = b_j^i − a_j^i, j = 1, ..., n), by first considering the coordinate directions with the smallest index j (if more coordinates may be chosen).
FIGURE 3.2.2 – Illustration of the selection, sampling and partitioning schemes from iteration 4 to 5 on the Branin test problem. A situation where two adjacent hyper-rectangles share the same vertex. After bisection of the lower-left hyper-rectangle in iteration 4, the newly created point falls exactly on the one in the adjacent hyper-rectangle. This point is marked with a circle in iteration 5.
hyper-rectangles D̄_k^{i+1} and D̄_k^{i+2} of equal side lengths (see Fig. 3.2.1, iteration 1 for illustration).
After D̄_k^i is bisected, the first iteration is performed by sampling two new points issued from the old ones.
The new point t^{i+2} is obtained by adding or subtracting one third of the side length, d_br^i/3, to the coordinate of the branching variable in the old point. Similarly, the new point v^{i+1} is obtained from the old one by subtracting or adding the whole side length d_br^i, while keeping all the remaining coordinates issued from t^i and v^i unchanged.
In the case where D̄_k^i is a hyper-rectangle, new sampled points are obtained, after identifying the branching variable (br), by adding or subtracting the required side length from the coordinate on which we branch, pursuant to the following rule :

If t_br^i < v_br^i, then

t_br^{i+2} = t_br^i + d_br^i/3,  and  v_br^{i+1} = v_br^i − d_br^i ;   (3.2.5)

otherwise,

t_br^{i+1} = t_br^i − d_br^i/3,  and  v_br^{i+2} = v_br^i + d_br^i.   (3.2.6)
Correspondingly,

t^{i+2} = (t_1^i, ..., t_br^i ± d_br^i/3, ..., t_n^i), where d_br^i = |b_br^i − a_br^i|,

and

v^{i+1} = (v_1^i, ..., v_br^i ∓ d_br^i, ..., v_n^i).
Each descendant hyper-rectangle D̄_k^{i+1} and D̄_k^{i+2} retains one sampled point, t^i and v^i respectively, from its ancestor D̄_k^i. At the same time, old sampling points are re-used in the descendant hyper-rectangles as t^{i+1} = t^i and v^{i+2} = v^i. More precisely :
t^{i+1} = t^i = (t_1^i, ..., t_n^i)
 = (a_1^i + (1/3)(b_1^i − a_1^i), ..., a_n^i + (1/3)(b_n^i − a_n^i))
 = (a_1^{i+1} + (2/3)(b_1^{i+1} − a_1^{i+1}), ..., a_n^{i+1} + (1/3)(b_n^{i+1} − a_n^{i+1})),

and

v^{i+2} = v^i = (v_1^i, ..., v_n^i)
 = (a_1^i + (b_1^i − a_1^i), ..., a_n^i + (b_n^i − a_n^i))
 = (a_1^{i+2} + (b_1^{i+2} − a_1^{i+2}), ..., a_n^{i+2} + (b_n^{i+2} − a_n^{i+2})).
The BIRECT-V algorithm continues in this way by sampling two new points in
each potentially optimal hyper-rectangle, by adding and subtracting the required
side-length from the old points, and bisecting through the longest coordinate until
some stopping rule is satisfied. After subdivision, each rectangle resulting from the
previous iteration retains one point from its predecessor.
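Rules (3.2.5)-(3.2.6) for generating the two new sampling points from the old ones can be sketched as follows; `br` is the 0-based branching coordinate, `d_br` its side length, and the helper name is our own:

```python
def birect_v_new_points(t, v, br, d_br):
    """Apply rules (3.2.5)-(3.2.6): if t[br] < v[br], the new 'third'
    point moves up by d_br/3 and the new 'vertex' point down by d_br;
    otherwise the signs are reversed. All other coordinates are kept."""
    t_new, v_new = list(t), list(v)
    if t[br] < v[br]:
        t_new[br] = t[br] + d_br / 3.0
        v_new[br] = v[br] - d_br
    else:
        t_new[br] = t[br] - d_br / 3.0
        v_new[br] = v[br] + d_br
    return t_new, v_new
```

Fed with the illustration's data for D̄_2^3 below (t^3 = (2/3, 1/3), v^3 = (1, 1), branching coordinate j = 2, d_2^3 = 1), this reproduces t^7 = (2/3, 2/3) and v^6 = (1, 0).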
Notice that the sampled points t^{i+1} and v^{i+1} in D̄_k^{i+1} belong to the same diagonal (see Fig. 3.2.1 for illustration). This is a straightforward consequence of Theorem 1 in [45]. The same conclusion holds for hyper-rectangle D̄_k^{i+2}.
Finally, let us emphasize that, in contrast to the naming convention used in [45], where the sampling points are called “lower” (l) and “upper” (u), we differentiate the two points belonging to the same hyper-rectangle by labeling them “third” t and “vertex” v. In this way, the two points are always identifiable throughout the optimization process, whether they act as “lower” or “upper” points.
It is also important to stress again that our new sampling scheme is distinctive in how new sampled points are created, using different side-lengths, in contrast to DIRECT-type algorithms with diagonal sampling strategies, which use the same side-lengths.
FIGURE 3.2.3 – Geometric interpretation of potentially optimal hyper-rectangles using the BIRECT-V algorithm on the Branin test function in the seventh iteration (right side) ; POHs correspond to the lower-right convex hull of the points marked in blue (left side). The positions of six points (values of f(x)) obtained as in BIRECT can be clearly distinguished. We observe three sampled points at which the objective function has been re-evaluated.
Illustration
Let t^1 = (t_1^1, t_2^1) = (1/3, 1/3)^T and v^1 = (v_1^1, v_2^1) = (1, 1)^T denote two points lying on the main diagonal (see initialization in Fig. 3.2.1) of hyper-rectangle D̄_0^1 = [a^1, b^1] = [a_1^1, b_1^1] × [a_2^1, b_2^1].
Without loss of generality, we restrict our illustration to two iterations only ; the other situations are analogous. In (Fig. 3.2.1, iteration 2), D̄_2^3 and D̄_2^4 are POHs. For hyper-rectangle D̄_2^3, there is only one longest side (coordinate j = 2), with side length d_2^3 = 1. Therefore, using the rule in Eq. (3.2.5), the new sampling points t^7 and v^6 are expressed as follows :
t^7 = (t_1^7, t_2^7) = (t_1^3, t_2^3 + d_2^3/3) = (2/3, 2/3),

v^6 = (v_1^6, v_2^6) = (v_1^3, v_2^3 − d_2^3) = (1, 0).
For hyper-rectangle D̄_2^4, we use the second rule, given by Eq. 3.2.6. The new sam-
pling points are located at (see Fig. 3.2.1, iteration 2):
t^8 = (t_1^8, t_2^8) = (t_1^4 − d_1^4/3, t_2^4) = (1/6, 2/3),

v^9 = (v_1^9, v_2^9) = (v_1^4 + d_1^4, v_2^4) = (1/2, 1).
The main steps of the BIRECT-V algorithm are shown in Algorithm 4, where the inputs
are the problem (f), the optimization domain (D), and the stopping criteria: the requi-
red tolerance (ε_pe), the maximal number of function evaluations (M_max), and the
maximal number of iterations (K_max). BIRECT-V returns the best value of the objective
function found (f_min) and the corresponding point (x_min), as well as the algorithmic
performance measures: percent error (p_e), number of function evaluations (m), and
number of iterations (k) after termination.
3.3.1 Implementation
As the BIRECT-V algorithm is based on the original BIRECT algorithm, we use the
same measure of the size of a hyper-rectangle.
Note that in the DIRECT algorithm, this size is measured by the Euclidean distance
from the center to a corner, while in DIRECT-l it corresponds to the infinity norm,
which allows the algorithm to collect more hyper-rectangles of the same size.
In BIRECT-Vl, the number of potentially optimal hyper-rectangles in each group to be
further divided is reduced to at most one.
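The two size measures mentioned above can be sketched as follows (a minimal illustration with hypothetical helper names; DIRECT-l's infinity-norm measure is shown up to a constant factor):

```python
def direct_size(lower, upper):
    # DIRECT: Euclidean distance from the center of the hyper-rectangle
    # to a corner, i.e. half the length of its main diagonal.
    return 0.5 * sum((u - l) ** 2 for l, u in zip(lower, upper)) ** 0.5

def direct_l_size(lower, upper):
    # DIRECT-l: infinity norm (longest side), so hyper-rectangles that
    # differ only in their shorter sides fall into the same size group.
    return max(u - l for l, u in zip(lower, upper))

# Two boxes with the same longest side but different shorter sides:
a = ((0.0, 0.0), (1.0, 0.5))
b = ((0.0, 0.0), (1.0, 0.25))
```

Under the infinity norm both boxes have size 1 and fall into one group, from which at most one candidate is divided; the Euclidean measure distinguishes them, producing more size groups with fewer members each.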
Algorithm 4 The BIRECT-V algorithm
1: BIRECT-V (f, D, opt);
Input: Objective function: f, search space: D, tolerance: ε_pe, maximal
number of function evaluations: M_max, and maximal number of iterations: K_max;
Output: Global minimum: f_min, global minimizer: x_min, and performance mea-
sures: m, k, and p_e (if needed);
FIGURE 3.3.4 – Iteration progress of the BIRECT-Vl algorithm (left-hand side) and
BIRECT-V (right-hand side) while solving the Ackley test problem (No. 3, n = 10, from
Table 3.3).
TABLE 3.1 – Preliminary results during the first run of the BIRECT-V algorithm.
Table 1 lists the test problems from [18] used in this comparison, which
consist in total of 54 global optimization test problems with dimensions varying
from n = 2 to n = 10, with the main attributes: problem number, problem name, di-
mension (n), feasible domain (D), number of local minima, and known minimum
(f*). Note that these problems can also be found in [68], and in a more detailed
version in [62] and related up-to-date versions.
Some of these test problems have several variants (e.g., Bohachevsky, Hartmann,
Shekel), while others (Ackley, Dixon and Price, Levy, Rastrigin, Rosenbrock, Schwe-
fel, Sphere, Sum squares, Zakharov) can be tested for different dimensionalities.
Finally, notice that it may occasionally happen that, in the initial steps of the algo-
rithm, the sampling is performed near the global minimizer. In this particular si-
tuation, the feasible domain was modified in the same way as in [45], i.e., the upper
bound was increased. For clarity, the modified test problems are marked with an
asterisk.
The implementation and comparison of the newly introduced scheme with the origi-
nal BIRECT and other DIRECT-type algorithms were performed in the MATLAB
programming language, using MATLAB R2016a on an EliteBook with the following
hardware configuration: Intel Core i5-6300U CPU @ 2.5 GHz, 8 GB of memory,
running the Windows 10 operating system (64-bit). Potentially optimal hyper-rectangles
are identified using a modified Graham's scan algorithm. In our implementation, the
output values are rounded to 10 decimal places. A test problem is considered solved if
an algorithm returns a value of the objective function within a 10^{-4} error, or a
minimizer x_min that achieves a comparable value in [63].
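The selection of potentially optimal hyper-rectangles picks, among the points (size, best function value), those on the lower-right convex hull (cf. Fig. 3.2.3). The following minimal sketch uses a monotone-chain lower hull as a stand-in for the modified Graham's scan used in the implementation; the function name and the omission of the ε-based tolerance test are our simplifications.

```python
def lower_right_hull(points):
    """Given (size, f) pairs, return the lower-right convex hull: the
    candidates from the point with smallest f up to the largest size."""
    best = {}
    for s, f in points:                 # keep the best f per size group
        if s not in best or f < best[s]:
            best[s] = f
    pts = sorted(best.items())          # sort by increasing size
    hull = []
    for p in pts:                       # monotone-chain lower hull
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if it does not lie strictly below segment
            # hull[-2] -> p (cross product test)
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    i_min = min(range(len(hull)), key=lambda i: hull[i][1])
    return hull[i_min:]                 # keep only the lower-RIGHT part

pohs = lower_right_hull(
    [(1, 5.0), (2, 3.0), (2, 4.0), (3, 4.0), (4, 2.0), (5, 6.0)]
)  # -> [(4, 2.0), (5, 6.0)]
```

Only the hyper-rectangles on this lower-right frontier, from the incumbent minimum to the largest size, are candidates for further subdivision.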
The algorithms were stopped either when a point x̄ (also denoted x_min) was gene-
rated such that the following stopping criterion was satisfied:

p_e = (f(x̄) − f*)/|f*| ≤ 10^{-4},  if f* ≠ 0,        (3.3.7)
p_e = f(x̄) ≤ 10^{-4},              if f* = 0,

(where f* is the known global optimum), or when the number of function evalua-
tions exceeded the prescribed limit of 500,000. (The maximum number of iterations
was set to 100,000, although in principle it should be unlimited.)
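In code, criterion (3.3.7) amounts to the following check (a sketch; `stop_rule` is a hypothetical name):

```python
def stop_rule(f_best, f_star, tol=1e-4):
    """Percent-error stopping test (3.3.7): relative error when the known
    optimum f* is nonzero, absolute value of f(x_bar) otherwise."""
    if f_star != 0:
        pe = (f_best - f_star) / abs(f_star)
    else:
        pe = f_best
    return pe <= tol
```

For example, with f* = 1 a best-found value of 1.00005 passes the test (p_e = 5 × 10^{-5}), whereas 1.01 does not.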
The comparison is based on two criteria: the best-found function value f(x̄) and
the number of function evaluations (f.eval.). For each test problem, the average
and median numbers of function evaluations are shown at the bottom of each
table. The best number of function evaluations is shown in bold font in Table 3.3.
The number of iterations and the execution time (measured in seconds) are reported
only in Tables 3.1 and 3.2 at https://data.mendeley.com/datasets/x9fpc9w7wh.
3.3.2 Discussion
In this subsection, we discuss the efficiency of the newly introduced BIRECT-V al-
gorithm and compare it with the original BIRECT, BIRECT-l (see [46, 45]), and two
DIRECT-type algorithms. In Table 3.1, we report the results obtained by BIRECT-V
and BIRECT-Vl when the algorithm runs in the usual way, without additional
parameters.
Table 3.2 reports the results when the best-found objective function value
f(x̄) obtained by the BIRECT algorithm is used as the known optimal (minimal) value
(f*). Table 3.3 summarizes and compares the experimental results for all tested
algorithms in the case where the original domain (D) was modified. The results
related to this comparison are also presented in Table 2.
First, it is easy to observe from Table 3.1 that our proposed partitioning scheme
most often requires more function evaluations than BIRECT and BIRECT-l, and for
certain test problems does not reach a minimum function value comparable to that
obtained by BIRECT, which makes the comparison unfavorable to our results. This is
the case, for example, for the Ackley test problems (No. 1–3).
At the same time, it requires fewer function evaluations than the DIRECT and DIRECT-l
algorithms. Moreover, the median value is smallest for BIRECT-Vl (921.000), com-
pared to the BIRECT-V (1681.000), DIRECT-l (1752), and DIRECT (3810) algorithms.
On the other hand, our framework gives better results in terms of the best
(minimum) function value for almost all instances compared to both versions of
BIRECT; overall, the BIRECT-V algorithm attains the best objective function value
on approximately 61.11% of the problems (33 out of 54). To confirm the
above-mentioned fact, it can be seen from Table 3.2 that the situation changes
completely when the best-found objective function value f(x̄) obtained by the BIRECT
algorithm is used as the known optimal (minimal) value (f*). Both the BIRECT-V and
BIRECT-Vl algorithms give, on average, significantly better results than the
original BIRECT and BIRECT-l algorithms.
The same is observed, especially for some problems (in the n = 10 cases) such as the
Michalewicz (No. 26) and Zakharov (No. 54) test problems, while others reached
exactly the known optimal (minimal) value (f*). This is the case for the following
test problems: Perm (No. 27), Power Sum (No. 30), Rastrigin (No. 31–33), and Zakha-
rov (No. 52, 53). These results are confirmed by comparing the value of the global
minimizer x_min from the libraries ([18], [68], [64]) with the value of x̄ generated by
the algorithm (see Table 2).
More precisely, for the Michalewicz problem (No. 26), we found x(10) =
[1.57079632679490]; for Perm (No. 27), the global minimizer found is x_min = [1, 2, 3, 4];
for Power Sum (No. 30), the global minimum is 0, attained at [2, 1, 3, 2]; for the Ras-
trigin (No. 32) and Zakharov (No. 52–53) test problems, the global minimum is 0,
attained at x_min = 0. This situation arises occasionally when, at the early
stages of the sampling process, the algorithm samples near a global optimum. Mo-
reover, for some test problems, e.g., Dixon and Price (No. 13), Michalewicz (No. 25),
Powell (No. 29), Schwefel (No. 39), and Trid (No. 51), we observed, as previously
pointed out, an excessive number of function evaluations. In these cases, we observe
the following situations:
Notice that these situations are typical for diagonal-based algorithms as well as for
DIRECT-type algorithms; a detailed review can be found in [25]. Let us illustrate
these situations in the case of our sampling strategy. Assume that a global mi-
nimum lies near one of the two sampled points located at 1/3 and 2/3 along one
of the diagonals of a hyper-rectangle. This situation favors BIRECT, since it
samples one of these two points per hyper-rectangle. The BIRECT-V algorithm,
however, may produce many unnecessary samples of the objective function at
vertices before this optimum is reached: every vertex can be shared by up to 2^n
hyper-rectangles, at each of which the function is re-evaluated. In this case, the
algorithm takes significantly longer than usual to find a good solution close
to the global optimum. This can be observed from the results in Table 3.3,
where the two algorithms reach approximately the same best function value
in some situations.
In the opposite scenario, i.e., if the global optimum is located at a vertex
of a hyper-rectangle, the effect on BIRECT-V is the reverse of the previous situation:
as the optimization proceeds, BIRECT-V requires fewer function evaluations than
BIRECT, since many adjacent hyper-rectangles can share the same vertex.
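Since a vertex can be shared by up to 2^n adjacent hyper-rectangles, each extra evaluation at an already-sampled vertex is pure overhead. One way to quantify (or avoid) this cost, offered here as an illustration rather than as part of BIRECT-V itself, is to memoize the objective on rounded coordinates:

```python
def make_cached(f, decimals=10):
    """Wrap an objective so repeated evaluations at a shared vertex reuse
    the stored value (illustrative sketch only; rounding to 10 decimals
    mirrors the output precision used in the implementation above)."""
    table, counter = {}, {"evals": 0}
    def wrapped(x):
        key = tuple(round(c, decimals) for c in x)
        if key not in table:
            counter["evals"] += 1     # a true evaluation of f
            table[key] = f(x)
        return table[key]
    wrapped.counter = counter
    return wrapped

# In 2-D, a vertex such as (1, 0) can belong to up to 2**2 = 4 rectangles;
# four requests then trigger only a single true evaluation:
g = make_cached(lambda x: sum(c * c for c in x))
values = [g((1.0, 0.0)) for _ in range(4)]
```

The gap between the number of requests and `counter["evals"]` is exactly the re-evaluation overhead described in the text.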
In contrast to the previous situations, the same objective function value can be at-
tained at many different points of the feasible domain, as in the case of the Branin
test problem (No. 9), where x_min = [3.13965, 2.275] for BIRECT-V, while for BIRECT,
x_min = [9.42383, 2.471]. This situation is common for multimodal problems (having
multiple global minima), symmetric problems, and (convex) quadratic test pro-
blems. Here, BIRECT-V requires fewer function evaluations, although this leads to a
much larger set of selected potentially optimal hyper-rectangles having the same
size and objective function value.
For the problems where BIRECT-V most often failed to converge, we suggested a
modification of the original optimization domain in order to obtain a good approxima-
tion reasonably close to the real (known) global optimum. With this modification, the
BIRECT-V algorithm performs better than in the original results. Clearly, this
strategy does not fully resolve the issue, but it allows the algo-
rithm to avoid unnecessary sampling of the objective function at vertices and
considerably reduces the number of function evaluations.
It should be stressed that we did not adopt any specific rule or known method for
modifying the optimization domain; we simply modified the domain slightly until
we found a minimizer close to the known solution, or at least to the one obtained
by BIRECT. For example, for the Schwefel problem (No. 39), we obtained x_min =
[420.9635416667] for BIRECT-V and x_min = [420.9686279297] for BIRECT-Vl; the
domain was modified up to [−500, 700]^10, see [62, 61, 63].
Note that some results reported in Table 3.3 and Table 2 could be improved further;
e.g., for Ackley problems 1, 2, and 3, the result could be improved to f(x̄) =
1.27161957 × 10^{-5}, with global minimizer x_min = [0.0000031789, ...]. It also
appears that some problems are sensitive to the domain modification, while others
do not really require such a modification.
From Table 3.3, the numerical results show that the BIRECT-Vl and BIRECT-V
algorithms produce the best results in terms of the best-found objective function
value, in about 89% of cases (48 out of 54) for BIRECT-Vl and 87% (47 out of 54) for
BIRECT-V. On the other hand, we observe that the number of function evaluations
is most often smallest for BIRECT (about 33 out of 54 test problems) and BIRECT-l
(30 out of 54) when compared to BIRECT-V and BIRECT-Vl, respectively, in
particular for the test problems having the same minimum value.
3.4 Conclusions
This chapter proposed a new diagonal partitioning strategy for global optimization
problems: a modification of the BIRECT algorithm based on bisection and a novel
sampling scheme, contrary to most DIRECT-type algorithms, for which the eva-
luation of the objective function at the vertices of hyper-rectangles is not suitable
for bisection. The newly introduced BIRECT-V and its variant BIRECT-Vl were
compared against BIRECT, BIRECT-l [45, 46], and two DIRECT-type algorithms.
The experimental results revealed that the new sampling scheme gives signifi-
cantly better results for almost all test problems, particularly when the feasible do-
main is modified.
Finally, the results could also be extended to other test problems from [61]. All
these observations may be considered as future research directions.
TABLE 3.3 – Comparison between the BIRECT-Vl, BIRECT-V, BIRECT-l, BIRECT, DIRECT-l, and DIRECT algorithms. Each row lists the problem number followed by, for each algorithm in this order, the best function value found and the number of function evaluations.
1 1.52 × 10−5 218 1.52 × 10−5 260 2.54 × 10−5 176 2.54 × 10−5 202 7.53 × 10−5 135 7.53 × 10−5 255
2 1.52 × 10−5 524 1.52 × 10−5 2728 2.54 × 10−5 454 2.54 × 10−5 1268 7.53 × 10−5 1777 7.53 × 10−5 8845
3 1.52 × 10−5 1280 1.52 × 10−5 137040 2.54 × 10−5 874 2.54 × 10−5 47792 3.57445 > 500000 7.53 × 10−5 80927
4 8.77 × 10−5 640 8.77 × 10−5 1034 9.17 × 10−5 436 9.17 × 10−5 436 9.29 × 10−5 247 9.29 × 10−5 655
5 1.83 × 10−6 254 3.17 × 10−5 524 4.02 × 10−5 468 4.02 × 10−5 476 3.09 × 10−6 205 3.09 × 10−5 327
6 1.53 × 10−6 252 1.53 × 10−6 284 3.35 × 10−5 472 3.35 × 10−5 478 2.58 × 10−6 233 2.58 × 10−5 345
7 2.88 × 10−6 284 2.88 × 10−6 282 3.68 × 10−5 474 3.67 × 10−5 480 8.21 × 10−5 573 8.21 × 10−5 693
8 2.99 × 10−6 300 2.99 × 10−6 334 6.10 × 10−5 188 6.10 × 10−5 194 6.58 × 10−5 215 6.58 × 10−5 295
9 0.39791 656 0.39791 492 0.39790 242 0.39790 242 0.39789 159 0.39789 195
10 9.82 × 10−5 2320 9.82 × 10−5 1910 9.82 × 10−5 794 9.82 × 10−5 794 3.83 × 10−5 3379 6.08 × 10−5 6585
11 4.01 × 10−5 810 4.01 × 10−5 784 4.84 × 10−5 722 4.84 × 10−5 722 5.32 × 10−5 485 6.25 × 10−5 513
12 7.57 × 10−5 10872 7.57 × 10−5 8446 7.15 × 10−5 4060 7.15 × 10−5 4060 6.45 × 10−5 54843 6.45 × 10−5 19661
13 7.02 × 10−5 35492 7.60 × 10−5 50922 9.52 × 10−5 162862 9.52 × 10−5 164826 0.66667 > 500000 5.79 × 10−5 372619
14 −0.99999 180 −0.99999 1082 −0.99999 480 −0.99999 16420 −0.99999 6851 −0.99999 32845
15 3.00000 28 3.00000 28 3.00019 274 3.00019 274 3.00009 115 3.00009 191
16 4.61 × 10−7 8456 4.61 × 10−7 9162 7.76 × 10−7 5106 7.76 × 10−7 5106 4.84 × 10−6 8379 4.84 × 10−6 9215
17 −3.86245 200 −3.86245 208 −3.86242 352 −3.86242 352 −3.86245 111 −3.86245 199
18 −3.32214 542 −3.32214 542 −3.32206 764 −3.32206 764 −3.32207 295 −3.32207 571
19 −1.03154 202 −1.03154 334 −1.03154 190 −1.03154 334 −1.03162 137 −1.03162 321
20 9.03 × 10−6 136 9.03 × 10−6 154 9.09 × 10−5 152 9.09 × 10−5 152 2.10 × 10−5 77 2.10 × 10−5 105
21 1.83 × 10−5 454 1.83 × 10−5 558 1.83 × 10−5 660 1.83 × 10−5 1024 3.65 × 10−5 359 3.65 × 10−5 705
22 3.54 × 10−5 1182 3.54 × 10−5 7440 3.55 × 10−5 1698 3.55 × 10−5 7904 9.62 × 10−5 5297 6.23 × 10−5 5589
23 2.71 × 10−5 148 2.71 × 10−5 208 2.71 × 10−5 90 2.71 × 10−5 94 3.81 × 10−5 71 3.81 × 10−5 107
24 −1.80130 184 −1.80130 314 −1.80118 126 −1.80118 126 −1.80127 45 −1.80127 69
25 −4.68744 8430 −4.68744 7472 −4.68736 101942 −4.68736 73866 −4.68721 26341 −4.68721 13537
26 −8.60559 > 500000 −7.55576 > 500000 −7.32661 > 500000 −7.32661 > 500000 −7.84588 > 500000 −7.87910 > 500000
27 0.00132 > 500000 0.00189 > 500000 0.00203 > 500000 0.00203 > 500000 0.04054 > 500000 0.04355 > 500000
28 4.59 × 10−5 2786 4.59 × 10−5 1678 4.86 × 10−5 1832 4.86 × 10−5 2114 6.52 × 10−5 32331 9.02 × 10−5 14209
29 9.00 × 10−5 2872 9.00 × 10−5 3072 9.71 × 10−5 92884 9.71 × 10−5 99514 0.02488 > 500000 0.02142 > 500000
30 0.00000 204 9.97 × 10−5 40788 9.00 × 10−5 1718 9.00 × 10−5 10856 0.03524 > 500000 0.00215 > 500000
31 4.81 × 10−5 774 4.81 × 10−5 958 4.81 × 10−5 154 4.81 × 10−5 180 2.30 × 10−5 1727 2.30 × 10−5 987
32 1.29 × 10−5 9126 1.29 × 10−5 11008 1.18 × 10−5 472 1.18 × 10−5 1394 4.97479 > 500000 4.97479 > 500000
33 1.98 × 10−5 124 1.98 × 10−5 1454 2.36 × 10−5 1250 2.36 × 10−5 40254 4.97479 > 500000 9.94967 > 500000
34 9.65 × 10−5 698 9.65 × 10−5 718 9.65 × 10−5 242 9.65 × 10−5 242 9.65 × 10−5 285 9.65 × 10−5 1621
35 2.41 × 10−5 2444 2.41 × 10−5 2972 2.41 × 10−5 1494 2.41 × 10−5 1700 5.75 × 10−5 2703 8.80 × 10−5 20025
36 3.05 × 10−5 19134 3.05 × 10−5 31430 5.42 × 10−5 4590 5.42 × 10−5 10910 8.29 × 10−5 74071 8.29 × 10−5 174529
37 1.37 × 10−7 492 1.37 × 10−7 564 5.64 × 10−5 210 5.64 × 10−5 236 2.88 × 10−5 341 2.88 × 10−5 255
38 3.42 × 10−7 24272 3.42 × 10−7 16704 6.41 × 10−5 1422 6.41 × 10−5 7210 7.21 × 10−5 322039 7.21 × 10−5 31999
39 1.77 × 10−8 1492 1.77 × 10−8 86306 1.30 × 10−6 58058 1.30 × 10−6 315960 1269.34444 > 500000 1187.63199 > 500000
40 −10.15234 6618 −10.15234 5866 −10.15234 1200 −10.15234 147 −10.152234 155
41 −10.40201 2298 −10.40201 2604 −10.402269 1224 −10.40269 1180 −10.40196 141 −10.40196 145
42 −10.53544 2498 −10.53545 3324 −10.53618 1158 −10.53618 1140 −10.53539 139 −10.53539 145
43 −186.72944 806 −186.72945 1684 −186.72441 2114 −186.72441 1780 −186.72153 2043 −186.72153 2967
44 1.15 × 10−5 112 1.15 × 10−5 190 1.15 × 10−5 108 1.15 × 10−5 118 8.74 × 10−5 91 8.74 × 10−5 209
45 2.87 × 10−5 392 2.87 × 10−5 1400 2.87 × 10−5 288 2.87 × 10−5 712 7.49 × 10−5 465 9.39 × 10−5 4653
46 5.74 × 10−5 1054 5.74 × 10−5 27566 5.74 × 10−5 784 5.74 × 10−5 16974 9.63 × 10−5 2057 6.32 × 10−5 99123
47 8.74 × 10−5 248 8.74 × 10−5 280 7.94 × 10−6 226 7.94 × 10−6 244 3.53 × 10−5 77 3.52 × 10−5 107
48 3.97 × 10−5 1354 3.97 × 10−5 1776 3.97 × 10−5 836 3.97 × 10−5 1034 7.19 × 10−5 411 7.19 × 10−5 833
49 9.35 × 10−5 3394 9.35 × 10−5 9244 9.11 × 10−6 3366 9.11 × 10−6 7688 7.76 × 10−6 1809 7.76 × 10−5 8133
50 −49.99788 1312 −49.99788 1662 −49.99512 1138 −49.99512 1506 −49.99525 8731 −49.99525 5693
51 −209.98779 3114 −209.98779 11878 −209.98007 24716 −209.98007 30100 −209.92644 > 500000 −209.98085 90375
52 2.88 × 10−5 156 2.88 × 10−5 162 2.88 × 10−5 338 2.88 × 10−5 502 7.95 × 10−5 209 7.95 × 10−5 237
53 6.43 × 10−5 3810 6.43 × 10−5 4060 6.44 × 10−5 27364 6.44 × 10−5 20974 0.11921 > 500000 9.71 × 10−5 316827
54 2.607286 > 500000 2.607286 > 500000 9.41133 > 500000 9.41133 > 500000 16.47703 > 500000 28.96394 > 500000
CHAPTER 4
This thesis delved into the extensive field of global optimization, with a specific fo-
cus on methods for solving global optimization problems, especially in the context
of black-box functions. Throughout our journey, we explored the historical evolu-
tion of global optimization methods, highlighting algorithms like BIRECT, a key
player in our study.
The BIRECT algorithm served as our starting point, and we examined it in de-
tail, revealing crucial aspects of its Diagonal Partitioning Strategy and Sampling
Scheme, which significantly improved its efficiency and precision.
Significantly, our research also shed light on the DIRECT algorithm, a well-established
pillar of global optimization. By analyzing its history and fundamental principles,
we observed that despite its undeniable advantages, it has limitations when tack-
ling complex problems; notably, DIRECT may show performance gaps on large-scale
problems.
It is precisely in response to these challenges that the research community has un-
dertaken essential modifications to enhance the DIRECT algorithm. Researchers
have developed variants and innovative approaches to address these weaknesses.
In this context, our contribution in the form of the BIRECT-V algorithm has taken
a central place. BIRECT-V distinguishes itself by its bold use of sampling at points
located at 1/3 and 1 along the main diagonal of the initial hyperrectangle. Unlike
most DIRECT algorithms, our approach includes the evaluation of the objective
function at the vertices for bisection, providing a deeper understanding of the
objective function. The results of BIRECT-V have been exceptional, demonstrating
its ability to compete with other DIRECT-type algorithms while offering a fresh
perspective on dealing with large-scale problems.
In summary, this thesis contributes to the extensive body of knowledge in the field
of global optimization by highlighting significant advances made to overcome the
weaknesses of the DIRECT algorithm. Our comparison with DIRECT has demonstrated
the distinct advantages of our approaches, thus paving the way for ongoing
improvements in global optimization. This research underscores the continued
importance of innovation and exploration in this field and hopes to catalyze fur-
ther research aimed at solving complex optimization problems more effectively.
Bibliography
[1] Baker, C.A., Watson, L.T., Grossman, B., Mason, W.H., Haftka, R.T. : Parallel
Global Aircraft Configuration Design Space Exploration. Nova Science Publi-
shers, Inc., USA, 79–96.
[2] Custódio, A.L., Rocha, H., Vicente, L.N. : Incorporating minimum Frobenius
norm models in direct search. Computational Optimization and Applica-
tions. (2010), 46(2), 265-278. DOI :10.1007/s10589-009-9283-0.
[3] Dennis, J.E., Torczon, V. : Direct search methods on parallel machines. SIAM
Journal on Optimization. (1991), 1, 448–474.
[4] Di Serafino, D., Liuzzi, G., Piccialli, V., Riccio, F., Toraldo, G. : A modified DI-
viding RECTangles algorithm for a problem in astrophysics. Journal of Opti-
mization Theory and Applications. (2011), 151(1), 175-190. DOI :10.1007/
s10957-011-9856-9.
[5] Finkel, D.E. : Global optimization with the Direct algorithm. Ph.D. thesis,
North Carolina State University (2005).
[7] Finkel, D.E., Kelley, C.T. : Additive scaling and the DIRECT algorithm.
Journal of Global Optimization. (2006), 36(4), 597-608. DOI :10.1007/
s10898-006-9029-9.
[8] Floudas, C.A. : Deterministic Global Optimization : Theory, Methods and Ap-
plications. Nonconvex Optimization and Its Applications, vol. 37. Springer,
Boston, MA (1999). https://doi.org/10.1007/978-1-4757-4949-6.
[9] Gablonsky, J.M. : Modifications of the Direct algorithm. Ph.D. thesis, North
Carolina State University (2001).
[10] Gablonsky, J.M., Kelley, C.T. : A locally-biased form of the DIRECT algo-
rithm. Journal of Global Optimization. (2001), 21(1), 27-37. DOI :10.1023/A:
1017930332101.
[11] Galperin, E. A. : The beta-algorithm. Journal of Mathematical Analysis and
Applications, 126 :455–468, 1987.
[14] Grbić, R., Nyarko, E.K., Scitovski, R. : A modification of the DIRECT method
for Lipschitz global optimization for a symmetric function. Journal of Glo-
bal Optimization 57, 4 (2013), 1193–1212. https://doi.org/10.1007/
s10898-012-0020-3.
[15] He, J., Verstak, A., Sosonkina, M., Watson, L.T. : Performance modeling and
analysis of a massively parallel DIRECT–Part 2. The International Journal
of High Performance Computing Applications 23, 1 (2009), 29–41. https:
//doi.org/10.1177/1094342008098463.
[16] He, J., Verstak, A., Sosonkina, M., Watson, L.T. : Performance modeling and
analysis of a massively parallel DIRECT–part 1. The International Journal
of High Performance Computing Applications 23, 1 (2009), 14–28. https:
//doi.org/10.1177/1094342008098462.
[17] He, J., Watson, L.T., Sosonkina, M. : VTDIRECT95 : Serial and Pa-
rallel Codes for the Global Optimization Algorithm DIRECT. ACM Trans. Math.
Softw. 36, 3, Article 17 (July 2009), 24 pages. https://doi.org/10.1145/
1527286.1527291.
[18] Hedar, A. : Test functions for unconstrained global optimization. http:
//www-optima.amp.i.kyotou.ac.jp/member/student/hedar/
Hedar_files/TestGO.htm (2005). Accessed 23 August 2006.
[19] Horst, R., Pardalos, P.M., Thoai, N.V. : Introduction to Global Optimization.
Nonconvex Optimization and Its Application. Kluwer Academic Publishers
(1995).
[22] Mockus, J., Paulavičius, R., Rusakevičius, D., Šešok, D., Žilinskas, J. : Ap-
plication of Reduced-set Pareto-Lipschitzian Optimization to truss optimiza-
tion. Journal of Global Optimization 67, 1-2 (2017), 425–450. https://doi.
org/10.1007/s10898-015-0364-6.
[23] Jones, D.R., Perttunen, C.D., Stuckman, B.E. : Lipschitzian optimization wi-
thout the Lipschitz constant. Journal of Optimization Theory and Applica-
tion. (1993), 79(1), 157-181. DOI :10.1007/BF00941892.
[24] Jones, D.R. : The Direct global optimization algorithm. In : C.A. Floudas, P.M.
Pardalos (eds.) The Encyclopedia of Optimization, pp. (2001), 431-440. Klu-
wer Academic Publishers, Dordrect (2001).
[25] Jones, D.R., Martins, J.R.R.A. : The DIRECT algorithm : 25 years later. Journal
of Global Optimization 79, 521–566 (2021). https://doi.org/10.1007/
s10898-020-00952-6.
[26] Ma, K., Rios, L. M., Bhosekar, A., Sahinidis, N., V., Rajagopalan, S. : Branch-
and-Model : a derivative-free global optimization algorithm. Computatio-
nal Optimization and Applications. (2023), https://doi.org/10.1007/
s10589-023-00466-3.
[27] Kvasov, D.E., Sergeyev, Y.D. : Lipschitz gradients for global optimization in a
one-point-based partitioning scheme. Journal of Computational and Applied
Mathematics. (2012), 236(16), 4042-4054. DOI :10.1016/j.cam.2012.02.
020.
[28] Liberti, L., Kucherenko, S. : Comparison of deterministic and stochas-
tic approaches to global optimization. International Transactions in
Operational Research 12(3), 263–285 (2005) https://onlinelibrary.
wiley.com/doi/pdf/10.1111/j.1475-3995.2005.00503.x.https:
//doi.org/10.1111/j.1475-3995.2005.00503.x.
[29] Liu, Q. : Linear scaling and the DIRECT algorithm. Journal of Global Op-
timization 56 (2013), 1233–1245. Issue 3. https://doi.org/10.1007/
s10898-012-9952-x.
[30] Liu, Q., Cheng, W. : A modified DIRECT algorithm with bilevel partition.
Journal of Global Optimization. (2014), 60(3), 483-499. DOI :10.1007/
s10898-013-0119-1.
[31] Liu, H., Xu, S.,Wang, X.,Wu, J., Song, Y. : A global optimization algorithm for
simulation-based problems via the extended DIRECT scheme. Eng. Optim.
(2015), 47(11), 1441–1458. DOI :10.1080/0305215X.2014.971777.
[32] Liu, Q., Zeng, J., Yang, G. : MrDIRECT : a multilevel robust DIRECT algorithm
for global optimization problems. Journal of Global Optimization. (2015),
62(2), 205-227. DOI :10.1007/s10898-014-0241-8.
[33] Liuzzi, G., Lucidi, S., Piccialli, V. : A direct-based approach exploiting lo-
cal minimizations for the solution for large-scale global optimization pro-
blems. Computational Optimization and Applications. (2010), 45(2), 353-375.
DOI :10.1007/s10589-008-9217-2.
[34] Liuzzi, G., Lucidi, S., Piccialli, V. : A DIRECT-based approach exploiting local
minimizations for the solution of large-scale global optimization problems.
Computational Optimization and Applications. (2010), 45, 353-375. DOI :10.
1007/s10589-008-9217-2.
[35] Liuzzi, G., Lucidi, S., Piccialli, V. : A partition-based global optimization algo-
rithm. Journal of Global Optimization. (2010), 48(1), 113-128. DOI :10.1007/
s10898-009-9515-y.
[36] Liuzzi, G., Lucidi, S., Piccialli, V. : Exploiting derivative-free local searches in
direct-type algorithms for global optimization. Computational Optimization
and Applications pp. (2014), 1-27. DOI :10.1007/s10589-015-9741-9.
[38] Piyavskii, S.A. : An algorithm for finding the absolute minimum of a func-
tion. Theory of Optimal Solutions 2, 13–24 (1967). https://doi.org/
10.1016/0041-5553(72)90115-2. in Russian. DOI :10.1080/13928619.
2006.9637758.
[39] Shubert, B.O. : A sequential method seeking the global maximum of a func-
tion. SIAM Journal on Numerical Analysis 9, 379–388 (1972). https://doi.
org/10.1137/0709036.
[40] Pintér, J.D. : Global Optimization in Action : Continuous and Lipschitz Opti-
mization : Algorithms, Implementations and Applications. Nonconvex Op-
timization and Its Applications, vol. 6. Springer, Berlin, Germany (1996).
https://doi.org/10.1007/978-1-4757-2502-5.
[41] Paulavičius, R., Žilinskas, J., Grothey, A. : Parallel branch and bound for glo-
bal optimization with combination of Lipschitz bounds. Optimization Me-
thods and Software. (2011), 26(3), 487-498. DOI :10.1080/10556788.2010.
551537 .
[42] Paulavičius, R., Žilinskas, J. : Simplicial Lipschitz optimization without the
Lipschitz constant. Journal of Global Optimization 59, 1 (2013), 23–40.
https://doi.org/10.1007/s10898-013-0089-3.
[43] Paulavičius, R., Žilinskas, J. : Simplicial Global Optimization. SpringerBriefs
in Optimization. Springer New York, New York, NY (2014). DOI :10.1007/
978-1-4614-9093-7.
[44] Paulavičius, R., Sergeyev, Y.D., Kvasov, D.E., Žilinskas, J. : Globally-biased
DISIMPL algorithm for expensive global optimization. Journal of Glo-
bal Optimization 59, 2-3 (2014), 545–567. https://doi.org/10.1007/
s10898-014-0180-4.
[45] Paulavičius, R., Chiter, L., Žilinskas, J. : Global optimization based on
bisection of rectangles, function values at diagonals, and a set of Lip-
schitz constants. J. Glob. Optim. (2018), 71(1), 5–20. DOI :10.1007/
s10898-016-0485-6.
[46] Paulavičius, R., Sergeyev, Y.D., Globally-biased BIRECT algorithm with local
accelerators for expensive global optimization, Expert Systems with Applica-
tions. November 2019.
[47] Paulavičius, R., Sergeyev, Y.D., Kvasov, D.E., Žilinskas, J. : Globally-biased BI-
RECT algorithm with local accelerators for expensive global optimization. Ex-
pert Systems with Applications 144 (2020), 11305. https://doi.org/10.
1016/j.eswa.2019.113052.
[48] Sergeyev, Y.D. : On convergence of "divide the best" global optimization algo-
rithms. Optimization. (1998), 44(3), 303-325.
[51] Sergeyev, Y.D., Kvasov, D.E. : Global search based on diagonal partitions and a
set of Lipschitz constants. SIAM Journal on Optimization. (2006), 16(3), 910-
937. DOI :10.1137/040621132.
[52] Sergeyev, Y.D., Kvasov, D.E. : Diagonal Global Optimization Methods. FizMat-
Lit, Moscow (2008). In Russian.
[53] Sergeyev, Y.D., Kvasov, D.E. : On deterministic diagonal methods for sol-
ving global optimization problems with Lipschitz gradients. In : Optimi-
zation, Control, and Applications in the Information Age, 130, pp . Sprin-
ger International Publishing Switzerland. (2015), 315-334. DOI :10.1007/
978-3-319-18567-5-16.
[54] Sergeyev, Y.D., Kvasov, D.E. : Lipschitz global optimization. In : Cochran, J.J.,
Cox, L.A., Keskinocak, P., Kharoufeh, J.P., Smith, J.C. (eds.) Wiley Encyclopedia
of Operations Research and Management Science (in 8 Volumes) vol. 4, pp.
2812–2828. John Wiley and Sons, New York, NY, USA (2011).
[57] Stripinis, L., Paulavičius, R., Žilinskas, J. : Improved scheme for selection of
potentially optimal hyperrectangles in DIRECT. Optim. Lett. (2018), 12(7),
1699–1712. DOI :10.1007/s11590-017-1228-4.
[58] Stripinis, L., Paulavičius, R., Žilinskas, J. : Penalty functions and two-step se-
lection procedure based DIRECT-type algorithm for constrained global op-
timization. Struct. Multidiscip. Optim. (2019), 59(6), 2155–2175. DOI :10.
1007/s00158-018-2181-2.
[59] Stripinis, L. : Parallel DIRECT-type algorithms for generally constrai-
ned global optimization problems in MATLAB. https://github.com/
blockchain-group/pDIRECT-GLce.
[60] Stripinis, L., Žilinskas, J., Casado, L.G., Paulavičius, R. : On MATLAB ex-
perience in accelerating DIRECT-GLce algorithm for constrained global
optimization through dynamic data structures and parallelization. Appl.
Math. Comput. 390 (2021), 1–17. https://doi.org/10.1016/j.amc.
2020.125596.
[61] Stripinis, L., Paulavičius, R. : DIRECTGOLib - DIRECT Global Optimization
test problems Library, v1.1. Zenodo (2022). https://doi.org/10.5281/
zenodo.6491951.
[62] Stripinis, L., Paulavičius, R.: DIRECTGO: A new DIRECT-type MATLAB toolbox for derivative-free global optimization. GitHub (2022). https://github.com/blockchain-group/DIRECTGO.
[63] Stripinis, L., Paulavičius, R.: DIRECTGO: A new DIRECT-type MATLAB toolbox for derivative-free global optimization. arXiv (2022). https://arxiv.org/abs/2107.0220.
[64] Stripinis, L., Paulavičius, R.: Lipschitz-inspired HALRECT algorithm for derivative-free global optimization. https://doi.org/10.48550/arXiv.2205.03015.
[65] Stripinis, L., Paulavičius, R.: An extensive numerical benchmark study of deterministic vs. stochastic derivative-free global optimization algorithms. https://doi.org/10.48550/ARXIV.2209.05759.
[66] Stripinis, L., Paulavičius, R.: An empirical study of various candidate selection and partitioning techniques in the DIRECT framework. J. Glob. Optim. 1–31 (2022). https://doi.org/10.1007/s10898-022-01185-5.
[67] Stripinis, L., Paulavičius, R.: Experimental study of excessive local refinement reduction techniques for global optimization DIRECT-type algorithms. Mathematics 10, 3760 (2022). https://doi.org/10.3390/math10203760.
[71] Watson, L.T., Baker, C.A.: A fully-distributed parallel global search algorithm. Engineering Computations 18(1/2), 155–169 (2001). https://doi.org/10.1108/02644400110365851.
Appendices
.1 Key characteristics of the Hedar test problems [18]
No.  n   Domain D              Best point found                                                         Known minimizer x*
11   2   [−10, 10.4554]^2      [1.0033203125, −0.7069335937]                                            x_i = 2^(−(2^i − 2)/2^i)
12   5   [−10.40, 12.301]^5    [1.006, 0.709, 0.594, 0.545, 0.523]                                      x_i = 2^(−(2^i − 2)/2^i)
13   10  [−10, 12]^10          [1.002, 0.708, 0.595, 0.544, 0.521, 0.510, 0.505, 0.502, 0.501, 0.501]   x_i = 2^(−(2^i − 2)/2^i)
14   2   −−                    [3.1412760417, …]                                                        [π, π]
15   2   −−                    [0.0000000000, −1.0000000000]                                            [0, −1]
16   2   −−                    [0.0006357829, −0.0010172526]                                            [0]
23   2   −−                    [0.0260416667, …]                                                        [0]
27   4   −−                    [1, 2, 3, 4]                                                             [1, 2, 3, 4]
31   2   −−                    [−0.0003483073]                                                          [0]
32   5   [−5.12, 5.30]^5       [0.0001139323, …]                                                        [0]
33   10  [−5.12, 5.12]^10      [0.0001000000, …]                                                        [0]
34   2   −−                    [1.0009765625, …]                                                        [1]
35   5   −−                    [0.9997558594, …]                                                        [1]
36   10  [−5, 10.1]^10         [0.9998168945, …]                                                        [1]
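The closed-form expression x_i = 2^(−(2^i − 2)/2^i) listed for problems 11–13 matches the known global minimizer of the Dixon–Price function in Hedar's set, and can be checked numerically against the tabulated best points. A minimal sketch (the helper name below is ours, introduced only for this illustration):

```python
# Evaluate the closed-form minimizer x_i = 2^(-(2^i - 2)/2^i) tabulated
# for problems 11-13. The helper name is ours, not from the thesis.

def known_minimizer(n):
    """Return x with x_i = 2^(-(2^i - 2) / 2^i), i = 1, ..., n."""
    return [2.0 ** (-(2.0 ** i - 2.0) / 2.0 ** i) for i in range(1, n + 1)]

x = known_minimizer(5)
# x ≈ [1.0, 0.7071, 0.5946, 0.5453, 0.5221], consistent with the tabulated
# best points [1.006, 0.709, 0.594, 0.545, 0.523] for problem 12.
```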
The objective of this appendix is to evaluate the current status of software implementations of the previously discussed algorithms. These implementations have been in development or have been released since 1998. They share the common capability of effectively handling black-box functions in a non-intrusive manner: the optimization solvers can operate without access to, or modification of, the source code of the optimization problem being addressed.
The following table (Table ??) provides an overview of the considered solvers, accompanied by their key characteristics. Each solver's handling of variable bounds is indicated in the 'Bounds' column, with potential entries being:
"optional" for solvers that allow bounds but do not necessitate them.
Additionally, the 'Constraints' column reflects the solvers' capacity to manage linear algebraic constraints and general constraints available solely as black-box executables, without an explicit functional representation. This appendix will proceed to delve into each solver's details.
ASA
BOBYQA
CMA-ES
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [55] is a genetic algorithm implemented in multiple languages, including C, Matlab, and Python. Mutation is performed by a perturbation with expected value zero and a covariance matrix which is iteratively updated to guide the search towards areas with expected lower objective values [56].
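The mutation-and-adaptation idea described above can be illustrated with a deliberately simplified evolution-strategy step. This sketch is ours: it only mimics the principle (zero-mean Gaussian mutation, covariance re-estimated from the selected offspring) and omits the actual rank-μ and evolution-path updates of CMA-ES [55, 56].

```python
# Simplified (mu, lambda)-ES step: offspring are the mean plus zero-mean
# Gaussian perturbations; the covariance is re-estimated from the best
# offspring to bias future mutations toward descent directions.
# Illustration only -- NOT the actual CMA-ES update of [55, 56].
import numpy as np

def es_step(f, mean, cov, n_offspring=20, n_parents=5):
    rng = np.random.default_rng(0)  # fixed seed for a reproducible sketch
    # Mutation: perturbations with expected value zero and covariance `cov`.
    offspring = rng.multivariate_normal(mean, cov, size=n_offspring)
    ranked = offspring[np.argsort([f(x) for x in offspring])]
    parents = ranked[:n_parents]          # selection of the best offspring
    new_mean = parents.mean(axis=0)
    # Crude covariance re-estimation from the selected steps (a stand-in
    # for CMA-ES's rank-mu / evolution-path machinery).
    steps = parents - mean
    new_cov = steps.T @ steps / n_parents + 1e-12 * np.eye(len(mean))
    return new_mean, new_cov

sphere = lambda x: float(x @ x)
m, C = np.array([3.0, -2.0]), np.eye(2)
for _ in range(30):
    m, C = es_step(sphere, m, C)
# The mean contracts toward the sphere minimum at the origin.
```

Selecting only the best offspring before re-estimating the covariance is what biases subsequent mutations toward promising directions, which is the adaptation mechanism sketched in the paragraph above.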
DAKOTA solvers
Design Analysis Kit for Optimization and Terascale Applications (DAKOTA) [45] is a project at Sandia National Laboratories. DAKOTA's initial scope was to create a toolkit of black-box optimization methods. The scope was later expanded to include additional optimization methods and other engineering applications, including design of experiments and nonlinear least squares. DAKOTA contains a collection of optimization software packages featuring the COLINY library [124], which includes, among others, the following solvers that we tested:
1. DAKOTA/EA: an implementation of various genetic algorithms;
DFO
PSWARM
SID-PSM
SID-PSM [41] is a Matlab implementation of a pattern search method with the poll
step guided by simplex derivatives [40]. The search step relies on the optimization
of quadratic surrogate models [39]. SID-PSM is designed to solve unconstrained
and constrained problems.
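The poll step that SID-PSM refines can be shown in its basic coordinate-search form. The skeleton below is a generic pattern search (ours), without the simplex-derivative ordering of the poll directions or the surrogate-based search step that SID-PSM adds.

```python
# Generic pattern-search skeleton: poll the 2n coordinate directions around
# the incumbent; on an improving trial, move; on a failed poll, halve the
# step. SID-PSM additionally orders the poll directions using simplex
# derivatives and precedes the poll with a quadratic-surrogate search step.
import numpy as np

def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    fx, n = f(x), len(x)
    directions = np.vstack([np.eye(n), -np.eye(n)])  # 2n coordinate directions
    for _ in range(max_iter):
        if step < tol:
            break  # mesh fine enough: stop
        improved = False
        for d in directions:  # poll around the incumbent
            trial = x + step * d
            ft = f(trial)
            if ft < fx:
                x, fx, improved = trial, ft, True
                break  # opportunistic poll: accept the first improvement
        if not improved:
            step *= 0.5  # unsuccessful poll: refine the step size
    return x, fx

x, fx = pattern_search(lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2, [0.0, 0.0])
# Converges to the minimizer (1, -2) of this quadratic.
```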
SNOBFIT
TOMLAB solvers
2. TOMLAB/LGO [107]: a suite of global and local nonlinear solvers [106] that implements a combination of Lipschitzian-based branch-and-bound with deterministic and stochastic local search (several versions of LGO are available, for instance under Maple, and may offer different features than TOMLAB/LGO);
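The Lipschitzian bound that such branch-and-bound schemes exploit is easy to state in one dimension: if |f(x) − f(y)| ≤ L|x − y| on [a, b], then evaluating f at the endpoints gives min f ≥ (f(a) + f(b))/2 − L(b − a)/2. A minimal sketch follows; note that L is assumed known here, whereas not requiring it is precisely the point of DIRECT-type methods studied in this thesis.

```python
# One-dimensional Lipschitz lower bound used by Lipschitzian
# branch-and-bound: with |f(x) - f(y)| <= L|x - y| on [a, b],
#   min f over [a, b]  >=  (f(a) + f(b)) / 2  -  L * (b - a) / 2.
# L is assumed known in this sketch.

def lipschitz_lower_bound(fa, fb, a, b, L):
    return 0.5 * (fa + fb) - 0.5 * L * (b - a)

# Example: f(x) = |x| on [-1, 2] with L = 1.
a, b, L = -1.0, 2.0, 1.0
bound = lipschitz_lower_bound(abs(a), abs(b), a, b, L)
# The true minimum on [-1, 2] is f(0) = 0, and the bound equals 0 here.
```

A branch-and-bound scheme discards any subinterval whose lower bound exceeds the best function value found so far, and bisects the rest.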
In addition to the above solvers, for which detailed results are presented in the sequel, we experimented with several solvers for which results are not presented here. These solvers were:
1. PRAXIS: an implementation of the minimization algorithm of Brent [25];