Learning and Intelligent Optimization For Material Design Innovation
Roberto Battiti · Dmitri E. Kvasov · Yaroslav D. Sergeyev (Eds.)
LNCS 10556
Learning and
Intelligent Optimization
11th International Conference, LION 11
Nizhny Novgorod, Russia, June 19–21, 2017
Revised Selected Papers
Lecture Notes in Computer Science 10556
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7407
Roberto Battiti · Dmitri E. Kvasov · Yaroslav D. Sergeyev (Eds.)
Learning and
Intelligent Optimization
11th International Conference, LION 11
Nizhny Novgorod, Russia, June 19–21, 2017
Revised Selected Papers
Editors
Roberto Battiti
University of Trento, Trento, Italy
and Lobachevsky University of Nizhny Novgorod, Russia
Yaroslav D. Sergeyev
University of Calabria, Rende, Italy
and Lobachevsky University of Nizhny Novgorod, Russia
This volume edited by R. Battiti, D.E. Kvasov, and Y.D. Sergeyev contains peer-reviewed papers from the 11th International Conference on Learning and Intelligent Optimization (LION-11) held in Nizhny Novgorod, Russia, during June 19–21, 2017.
The LION-11 conference has continued the successful series of the constantly
expanding and worldwide recognized LION events (LION-1: Andalo, Italy, 2007;
LION-2 and LION-3: Trento, Italy, 2008 and 2009; LION-4: Venice, Italy, 2010;
LION-5: Rome, Italy, 2011; LION-6: Paris, France, 2012; LION-7: Catania, Italy,
2013; LION-8: Gainesville, USA, 2014; LION-9: Lille, France, 2015; LION-10:
Ischia, Italy, 2016). This edition was organized by the Lobachevsky University of
Nizhny Novgorod, Russia, as one of the key events of the Russian Science Foundation
project No. 15-11-30022 "Global Optimization, Supercomputing Computations, and
Applications." Like its predecessors, the LION-11 international meeting explored
advanced research developments in such interconnected fields as mathematical
programming, global optimization, machine learning, and artificial intelligence. Russia has
a long tradition in optimization theory, computational mathematics, and intelligent
learning techniques (in particular, cybernetics and statistics), therefore, the location of
LION-11 in Nizhny Novgorod was an excellent occasion to meet researchers and
consolidate research and personal links.
More than 60 participants from 15 countries (Austria, Belgium, France, Germany,
Hungary, Italy, Lithuania, Portugal, Russia, Serbia, Switzerland, Taiwan, Turkey, UK,
and USA) took part in the LION-11 conference. Four plenary lecturers shared their
current research directions with the LION-11 participants:
– Renato De Leone, Camerino, Italy: "The Use of Grossone in Optimization: A Survey and Some Recent Results"
– Nenad Mladenovic, Belgrade, Serbia: "Less Is More Approach in Heuristic Optimization"
– Panos Pardalos, Gainesville, USA: "Quantification of Network Dissimilarities and Its Practical Implications"
– Julius Žilinskas, Vilnius, Lithuania: "Deterministic Algorithms for Black Box Global Optimization"
Moreover, three tutorials were also presented during the conference:
– Adil Erzin, Novosibirsk, Russia: "Some Optimization Problems in the Wireless Sensor Networks"
– Mario Guarracino, Naples, Italy: "Laplacian-Based Semi-supervised Learning"
– Yaroslav Sergeyev, University of Calabria, Italy, and Lobachevsky University of Nizhny Novgorod, Russia: "Numerical Computations with Infinities and Infinitesimals"
A total of 20 long papers and 15 short papers were accepted for publication in this
LNCS volume after thorough peer reviewing (up to three review rounds for some
manuscripts) by the members of the LION-11 Program Committee and independent
reviewers. These papers describe advanced ideas, technologies, methods, and appli-
cations in optimization and machine learning. This volume also contains the paper
of the winner (Francesco Romito, Rome, Italy) of the second edition of the
Generalization-Based Contest in Global Optimization (GENOPT: http://genopt.org).
The editors thank all the participants for their dedication to the success of LION-11
and are grateful to the reviewers for their valuable work. The support of the
Springer LNCS editorial staff is greatly appreciated.
The editors express their gratitude to the organizers and sponsors of the LION-11
international conference: Lobachevsky University of Nizhny Novgorod, Russia;
Russian Science Foundation; EnginSoft Company, Italy; NTP Truboprovod, Russia;
and the International Society of Global Optimization. Their support was essential for
the success of this event.
General Chair
Yaroslav Sergeyev University of Calabria, Italy and Lobachevsky
University of Nizhny Novgorod, Russia
Steering Committee
Roberto Battiti (Head) University of Trento, Italy and Lobachevsky
University of Nizhny Novgorod, Russia
Holger Hoos University of British Columbia, Canada
Youssef Hamadi École Polytechnique, France
Mauro Brunato University of Trento, Italy
Thomas Stützle Université Libre de Bruxelles, Belgium
Christian Blum Spanish National Research Council
Martin Golumbic University of Haifa, Israel
Marc Schoenauer Inria Saclay, Île-de-France
Xin Yao University of Birmingham, UK
Benjamin Wah The Chinese University of Hong Kong
and University of Illinois, USA
Yaroslav Sergeyev University of Calabria, Italy and Lobachevsky
University of Nizhny Novgorod, Russia
Panos Pardalos University of Florida, USA
Additional Reviewers
Long Papers
A New Local Search for the p-Center Problem Based on the Critical
Vertex Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Daniele Ferone, Paola Festa, Antonio Napoletano,
and Mauricio G.C. Resende
GENOPT Paper
Short Papers
1 Introduction
Often, there are many ways to solve a given problem. However, not all of these
are equally good. In the Algorithm Design Problem [2,14] (ADP) we are to
find the best way to solve a given problem, e.g., using the least computational
resources (time, memory, etc.), and/or maximizing the quality of the solutions
obtained. In a sense, it is the problem of how to best solve a given problem.
To date, algorithms for many real-world problems are most commonly
designed following a manual, ad hoc, trial-and-error approach, making algorithm
design a tedious and costly process, often leading to mediocre results. Recently,
Programming by Optimization [8] (PbO) was proposed as an alternative design
paradigm. In PbO, difficult choices are deliberately left open at design time, thus
programming a family of algorithms (design space), rather than a single algorithm.
Subsequently, optimization methods are applied to automatically determine
the best algorithm instance (design) for a specific use-case. Often, the latter
Steven Adriaensen is funded by a Ph.D. grant of the Research Foundation Flanders
(FWO).
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 3–17, 2017.
https://doi.org/10.1007/978-3-319-69404-7_1
The crux is that even though an execution e might have been obtained using
some design c, it might as well (with some likelihood) have been generated
using a different design c′. As such, the observed desirability of e does not only
provide information about the performance of c, but also about that of c′. In
this work we describe how Importance Sampling (IS) can be used to combine all
performance observations relevant to an algorithm into a consistent estimator
of its performance, assuming we are able to compute the (relative) likelihood of
generating a given execution using a given design.
The remainder of this paper is structured as follows. First, in Sect. 2, we
formally define the ADP and related concepts such as the design space, the
desirability of an execution, and algorithm performance. Subsequently, in Sect. 3.1, we
summarize contemporary approaches to performance estimation in PbO, before
introducing our IS approach in Sect. 3.2, discussing its benefits in Sect. 4, and
examining its theoretical feasibility in Sect. 5. Finally, we discuss some challenges
encountered when trying to implement IS in practice, and how we have
addressed these in the implementation of a Proof of Concept, in Sect. 6, which
we validate experimentally in Sect. 7, before concluding in Sect. 8.
Let C be the set of alternative algorithm instances considered, i.e. the design
space. Let D denote a distribution over X, the set of possible inputs (e.g. problem
instances to be solved and budgets available for doing so). Let E be the execution
space, i.e. the set of all possible executions of any c ∈ C on any x ∈ X.
Let Pr(e|c) denote the likelihood that executing algorithm instance c on an
input x ∼ D results in an execution e, and f : E → ℝ a function quantifying
the desirability of an execution. We define algorithm performance as a function
o : C → ℝ:

o(c) = Σ_{e∈E} Pr(e|c) f(e),   (1)

i.e. every algorithm instance in the design space can be viewed as a distribution
over executions, whose performance corresponds to the expectation of f over
this distribution (as illustrated in Fig. 2). Remark that executing an algorithm
instance (on x ∼ D) corresponds to sampling from its corresponding distribution.
The objective in the ADP is to find c* = arg max_{c∈C} o(c).
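Definition (1) can be sketched numerically on a tiny, purely illustrative execution space (the space, designs, and desirability values below are our own toy assumptions, not from the paper):

```python
# Toy sketch of definition (1): a design c induces a distribution Pr(e|c) over
# a finite execution space E; its performance o(c) is the expectation of the
# desirability f(e) under that distribution. All names are illustrative.

E = ["fast", "slow", "fail"]                    # a tiny execution space
f = {"fast": 1.0, "slow": 0.5, "fail": 0.0}     # desirability of each execution

def o(pr_given_c):
    """o(c) = sum over e in E of Pr(e|c) * f(e)."""
    return sum(pr_given_c[e] * f[e] for e in E)

c1 = {"fast": 0.7, "slow": 0.2, "fail": 0.1}    # Pr(.|c1)
c2 = {"fast": 0.3, "slow": 0.6, "fail": 0.1}    # Pr(.|c2)

print(round(o(c1), 2))  # 0.8
print(round(o(c2), 2))  # 0.6
```

Here c1 would be preferred, since o(c1) > o(c2); solving the ADP means finding such a maximizer over the whole design space.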
[Figs. 1–2: schematic of the mapping from designs c1, c2 via Pr(e|c) over the execution space E, and the desirability f(e), to the performance space]
Thus far, optimizers of choice for PbO have been algorithm configurators,
i.e. Programming by Configuration (PbC). Here, alternative designs are represented
as configurations, and configurators search the space of configurations C
for one maximizing performance. As such, the ADP is treated as an Algorithm
Configuration Problem [10] (ACP). Our formulation above is very similar and
can in fact be seen as a simplified specialization of the ACP. The most relevant
difference, in the context of this paper, lies in our definition of o. As argued in [2],
PbC treats algorithm evaluation as a (stochastic) black-box function C → ℝ.
Above, we define this mapping as a consequence of algorithm execution (see
Fig. 1), i.e. our choice of design c affects execution e in a particular way (Pr),
which in turn relates to its observed desirability (f). The performance estimation
technique proposed in this paper exploits this feature, assuming Pr (and f) to
be computable (black-box) functions. In Sect. 5 we argue why this assumption
is reasonable in the context of PbO.
[Figs. 3–5: execution distributions Pr(e|c) and desirability f(e) for two designs c and c′]
where G(e) is the likelihood of generating e using G. While the estimate for each
design c is based on the same E′, the weight function w_c will be different, weighing
performance observations according to their relevance to c; e.g. in Fig. 6, observations
on the left- and right-hand side are more relevant for c1 and c2, respectively.
ô(c) is a consistent estimate of o(c) as long as w_c(e) is bounded, ∀e ∈ E. In practice,
it is also important that we can actually compute G(e) for any e. We will
discuss the choice of G in more detail in Sect. 6.1, but for now it should be clear
that both conditions are met if G is some mixture of all c ∈ C, i.e.

G(e) = Σ_{c∈C} α(c) Pr(e|c),   (3)

where α(c) > 0 is the likelihood that c is used. Note that w_c(e) ≤ α(c)⁻¹ holds.¹

¹ Furthermore, remark that if distributions do not overlap, w_c(e) = α(c)⁻¹ and ô = ō.
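A minimal sketch of this mixture-based IS estimate, on an illustrative discrete execution space of our own making (the designs, desirabilities, and mixing weights below are assumptions for the example, not the paper's):

```python
import random

# Sketch of an IS estimate with a mixture proposal: executions are drawn from
# G(e) = sum_c alpha(c) * Pr(e|c), and every design reuses ALL of them via the
# weights w_c(e) = Pr(e|c) / G(e). All concrete values are illustrative.

E = ["a", "b", "c"]
f = {"a": 1.0, "b": 0.5, "c": 0.0}
designs = {
    "c1": {"a": 0.6, "b": 0.3, "c": 0.1},
    "c2": {"a": 0.2, "b": 0.5, "c": 0.3},
}
alpha = {"c1": 0.5, "c2": 0.5}          # mixing weights, alpha(c) > 0

def G(e):
    return sum(alpha[c] * designs[c][e] for c in designs)

def is_estimate(samples, c):
    """Consistent estimate of o(c) from samples drawn from G."""
    return sum(designs[c][e] / G(e) * f[e] for e in samples) / len(samples)

rng = random.Random(0)
samples = rng.choices(E, weights=[G(e) for e in E], k=20000)
# true values: o(c1) = 0.75 and o(c2) = 0.45; both are estimated from the
# SAME shared sample set rather than from separate runs per design
print(round(is_estimate(samples, "c1"), 2))
print(round(is_estimate(samples, "c2"), 2))
```

Because alpha(c) > 0 for every design, the weights are bounded by alpha(c)⁻¹, which is what makes the estimate consistent.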
[Fig. 6: desirability f(e), execution distributions Pr(e|c) for c1 and c2, and the resulting weight functions w_c]
4 Envisioned Benefits

In this section we will discuss the benefits of using IS (2) to estimate algorithm
performance in an ADP setting. In addition, we will illustrate some of these
experimentally for the abstract setup shown in Figs. 2, 3, 4, 5 and 6. Here, our
design space consists of two normal distributions, c1 = N(μ1, 1) and c2 = N(μ2, 1).
Our objective is to determine which of these has the greatest mean, based on
samples E′ generated by alternately sampling each. To make this more challenging,
we add uniform white noise in [−1, 1] to these observations. Results shown
are averages of 1000 independent runs, generating 1000 samples each. Obviously,
this particular instance is not representative of the ADP. Nonetheless, we would
like to argue that the observations in this section generalize to the full ADP
setting; e.g. computing Pr requires us to actually know μ, i.e. o; this, however, is a
peculiarity of this simple setup and is definitely not the case in general (see also
Sect. 5). As IS treats Pr as a black box, this fact did not affect the generality of
our results. For the critical reader, an analogous argument is made in [1] using
a somewhat more realistic ADP, benchmark 1 (see Sect. 7.1), as a running
example.
First, IS increases data-efficiency. By using a single performance observation
in the estimation of many designs, we amortize the cost of obtaining it, and will
need fewer evaluations to obtain similarly accurate estimates. Figure 7 illustrates
this, comparing estimation errors |ō(c2) − o(c2)| (dashed line) and |ô(c2) − o(c2)|
(full lines), respectively, after x evaluations, for multiple setups with different
Δ = μ2 − μ1. Clearly, if Δ is large, the distributions for c1 and c2 do not overlap,
i.e. samples from c1 are not relevant for c2 and ô ≈ ō. However, the lower Δ, the
greater the overlap. As Δ approaches 0, observations become equally relevant
for both designs, and only half the evaluations are needed to obtain similarly
accurate estimates.
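The two-Gaussian setup can be sketched as follows (the details of sampling and weighting are our reconstruction of the described experiment, under illustrative parameter choices):

```python
import math
import random

# Sketch of the toy setup of Sect. 4 (our reconstruction): the two designs are
# N(mu1, 1) and N(mu2, 1), samples are drawn alternately from each, f adds
# uniform noise in [-1, 1], and IS reuses every sample for both designs via
# equal-mixture weights.

def pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def is_estimates(data, mu1, mu2):
    """IS estimates of both designs' performance from the shared samples."""
    est = {}
    for mu in (mu1, mu2):
        total = 0.0
        for x, fx in data:
            w = pdf(x, mu) / (0.5 * pdf(x, mu1) + 0.5 * pdf(x, mu2))
            total += w * fx
        est[mu] = total / len(data)
    return est

rng = random.Random(42)
mu1, mu2 = 0.0, 0.5
data = []
for i in range(1000):
    x = rng.gauss(mu1 if i % 2 == 0 else mu2, 1.0)   # alternate designs
    data.append((x, x + rng.uniform(-1.0, 1.0)))     # noisy desirability f(e)

est = is_estimates(data, mu1, mu2)
# estimation errors of the two designs are correlated, so the ordering
# est[mu1] < est[mu2] stabilizes faster than with independent averages
```

Because both estimates share the same samples and weights, their errors are correlated, which is exactly the effect the following paragraph exploits for ranking similar designs.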
Related, yet arguably more important in an optimization setting, is that we
can determine more quickly which of two similar designs performs better, i.e. have a
more reliable gradient to guide the search. As similar designs share performance
observations, their estimation errors will be correlated, i.e. the error on relative
performance will be smaller. In the extreme case where distributions fully
overlap (e.g. Δ = 0), performance estimates are the same. Figure 8 illustrates
this, comparing the fraction of the runs for which ō(c1) < ō(c2) (dashed lines)
and ô(c1) < ô(c2) (full lines) holds after x evaluations, for different Δ. For high Δ
values, both perform well, as o(c1) ≪ o(c2). However, for Δ approaching 0,
designs become more similar, and the independent estimate of c1 is frequently
better, even after many evaluations. However, ô(c1) < ô(c2) holds, even for small
Δ, after only a few evaluations. In summary, using IS we generally expect to need
fewer evaluations to solve a given ADP where at least two or more designs overlap.
Thus far we have discussed why one would use IS estimates, as opposed to
independent sample averages. But how about using regression model predictions
instead? Similar to IS, these allow one to generalize observations beyond the
design used to obtain them, as such improving data-efficiency. The main difference
is that IS is model-free. By choosing a regression model, one introduces
model bias, i.e. makes prior assumptions about what the fitness landscape (most
likely) looks like. Clearly, the specific assumptions are model-specific. As parametric
models (e.g. linear, quadratic) typically impose very strict constraints,
mainly nonparametric models (e.g. Random Forests [5], Gaussian Processes [13])
are used in general ADP settings. While nonparametric models are more flexible,
they nonetheless hinge on the assumption that the performance of similar
designs is correlated (i.e. smoothness), which is in essence reasonable; however,
the key issue is that this similarity measure is defined in representation space.
Small changes in representation (e.g. a configuration) might result in large
performance differences, while large changes may not.
Using IS estimates, similar designs will also have similar estimates, but rather
than being based on similarity in representation space, they are based on similarity
in execution space. In fact, we can derive similarity measures for designs
based on the overlap of their corresponding distributions of executions. This is
interesting in analysis, but can also be used in solving the ADP, e.g. to maintain
the diversity in a population of designs. One way to measure overlap would be
by computing the Bhattacharyya Coefficient [4].
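For discrete execution distributions the Bhattacharyya coefficient [4] is straightforward to compute; a short sketch (the two example distributions are our own illustration):

```python
import math

# Overlap of two designs' execution distributions via the Bhattacharyya
# coefficient: BC(p, q) = sum over e of sqrt(p(e) * q(e)).
# BC = 1 means identical distributions, BC = 0 means disjoint support.

def bhattacharyya(p, q):
    return sum(math.sqrt(p[e] * q[e]) for e in p)

c1 = {"a": 0.6, "b": 0.3, "c": 0.1}   # illustrative Pr(.|c1)
c2 = {"a": 0.2, "b": 0.5, "c": 0.3}   # illustrative Pr(.|c2)

print(round(bhattacharyya(c1, c1), 6))  # 1.0 (full overlap)
print(round(bhattacharyya(c1, c2), 2))  # 0.91
```

A high coefficient indicates that the two designs largely share performance observations, so their IS estimates will be strongly correlated.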
To conclude this section, the use of the proposed IS approach does not exclude
the use of regression models. Both approaches are complementary, as different
executions might have a similar desirability, something one could learn based on
correlations in performance observations.
5 Theoretical Feasibility

where x is the input used, a_i the decision made in the i-th choice point, and σ_i
the context in which it was encountered. Computing (4) requires D(x) and the
environment's contribution to be known explicitly, which is not always the case in
practice. Luckily, it can be shown (proof in [1]) that we can ignore these factors
in practice, i.e. use P̃r(e|c) = Π_{i=1}^{n} π_c(σ_i, a_i) instead to compute w_c(e),
as doing so scales both the numerator, Pr(e|c), and the denominator, G(e), by the
same design-independent factor, and as such leaves their ratio, w_c(e), correct. In
summary, to compute ô, the ability to compute π_c (∀c ∈ C) and to store the
decisions, their contexts, and f(e) for every e ∈ E′ suffices in general.
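This argument can be sketched concretely (policy names and probabilities below are hypothetical): storing only the (context, action) pairs of each execution suffices to compute the weights, because any design-independent factor cancels in the ratio.

```python
# Sketch of weight computation from stored choice-point traces (names ours):
# the relative likelihood of an execution under design c is the product of the
# policy probabilities pi_c(context, action); the design-independent factors
# D(x) and the environment's contribution cancel in Pr(e|c) / G(e).

def rel_likelihood(trace, policy):
    """Product over choice points of pi_c(context, action)."""
    p = 1.0
    for context, action in trace:
        p *= policy(context, action)
    return p

def weight(trace, policy_c, policies, alpha):
    """w_c(e) for a mixture proposal with mixing weights alpha."""
    num = rel_likelihood(trace, policy_c)
    den = sum(a * rel_likelihood(trace, p) for p, a in zip(policies, alpha))
    return num / den

# two hypothetical designs choosing action "x" with different probabilities
pi1 = lambda ctx, a: 0.8 if a == "x" else 0.2
pi2 = lambda ctx, a: 0.4 if a == "x" else 0.6
trace = [(0, "x"), (1, "y")]          # stored (context, action) pairs
w1 = weight(trace, pi1, [pi1, pi2], [0.5, 0.5])
print(round(w1, 6))  # 0.8
```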
in fact practical. First, in Sect. 6.1, we discuss some challenges encountered when
using IS estimates in practice, in an ADP setting, and how we have addressed
them in the implementation of a Proof of Concept (PoC).

While performance estimation is key, it is also only one piece of the puzzle.
The ADP is a search problem, i.e. we must search the design space for the design
maximizing performance. In Sect. 6.2 we describe the high-level search strategy
used in our PoC. As a whole, our PoC resembles a simple SMBO-like framework,
using importance sampling estimates, in place of regression model predictions,
to guide its search. To complement this description and facilitate reproduction,
the source code of our PoC is made publicly available.³
which is the standard deviation of the distribution over E′, where the relative
likelihood of drawing e ∈ E′ is given by w_c(e). Remark that if designs are disjoint,
both (5) and (6) reduce to their sample-average equivalents (the sample mean and
standard deviation).
following design choices: given our observations thus far, which design should
we evaluate next? On which input? What is our incumbent? Decisions made for
all of these are critical in the realization of a framework competitive with the
state of the art. As this paper is about performance estimation, which is largely
orthogonal to these other decisions, doing so was not our main objective; e.g. to
keep it simple, our PoC currently only supports optimization on a single input.⁴
In the remainder of this section we briefly discuss the decisions made for the
other two, followed by a detailed description of the high-level search strategy as
a whole.
EI(c) = (ô(c) − ô(c_inc)) · Φ((ô(c) − ô(c_inc)) / unc(c)) + unc(c) · φ((ô(c) − ô(c_inc)) / unc(c)),   (7)

where φ and Φ are the standard normal density and cumulative distribution
functions. EI will be high for designs estimated to perform well and for those
with high estimated uncertainty; and as such this criterion offers an automatic
balance between exploration and exploitation.
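This is the classic expected-improvement criterion of Jones et al. [11]; a short sketch (the numeric inputs are illustrative):

```python
import math

# Expected improvement (7): trades off a design's estimated performance gain
# over the incumbent against the uncertainty of that estimate.

def phi(z):   # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):   # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(o_c, o_inc, unc):
    z = (o_c - o_inc) / unc
    return (o_c - o_inc) * Phi(z) + unc * phi(z)

# a design estimated slightly WORSE than the incumbent but very uncertain can
# have a higher EI than one estimated slightly better but with low uncertainty:
print(round(expected_improvement(0.90, 1.0, 0.50), 3))  # 0.153
print(round(expected_improvement(1.02, 1.0, 0.01), 3))  # 0.02
```

This is how the criterion automatically balances exploration (high uncertainty) against exploitation (high estimated performance).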
the other hand, we often lack the prior knowledge to manually select C_pool ⊆ C.
Therefore we adapt C_pool dynamically, with |C_pool| ≤ PSize (default 10). At line 4
we initialize C_pool to be empty. Each iteration, at lines 6–7, we sample NProp
(default 100) designs (C_prop) from P, a distribution over C, to be considered
for inclusion in C_pool. Here, P is an equal mixture of two distributions passed as
arguments to the framework, which can be used to inject heuristic information:
Global Prior (GP): A distribution over C, allowing the user to encode prior
knowledge about where good designs are most likely located in C.
Local Prior (LP): A distribution over C, conditioned on cinc , allowing the
user to encode prior knowledge of how cinc can most likely be improved.
Note that our search strategy only interacts with C through these distributions,
making it design-representation independent. In particular, it does not assume
designs to be configurations, let alone make any assumptions about the type of
parameters. At line 8 we update c_inc to be the design with the greatest lb in
C_prop ∪ {c_inc}. Having generated C_prop, we must decide whether to include them
in C_pool or not. At line 9 we update C_pool as the PSize designs from C_prop ∪ C_pool
with maximal EI. At line 10 we select the design to be evaluated next as the
design in C_pool ∪ {c_inc} maximizing EI; we evaluate this design at line 11 by
executing it on the given input x, and we update E′ and C (i.e. M) accordingly at
line 12. At line 13, we update c_inc w.r.t. the new M. Finally, after MaxEvals
iterations, we return c_inc at line 15.
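The loop described above can be condensed into a runnable sketch (line comments refer to the authors' pseudocode; all helper names are our stand-ins, and the scoring functions below are an oracle placeholder rather than real IS estimates):

```python
import random

# Condensed sketch of the high-level search strategy: propose designs from a
# prior, keep a small pool by EI, evaluate the EI-maximizer, track incumbent.

def search(sample_prior, evaluate, ei, lb, x, max_evals=50,
           p_size=10, n_prop=100, seed=0):
    rng = random.Random(seed)
    pool = []                                     # line 4: C_pool starts empty
    inc = sample_prior(rng)                       # some initial incumbent
    for _ in range(max_evals):
        prop = [sample_prior(rng) for _ in range(n_prop)]        # lines 6-7
        inc = max(prop + [inc], key=lb)                          # line 8
        pool = sorted(prop + pool, key=ei, reverse=True)[:p_size]  # line 9
        best = max(pool + [inc], key=ei)                         # line 10
        evaluate(best, x)                 # lines 11-12: execute, update model
        inc = max(pool + [inc], key=lb)                          # line 13
    return inc                                                   # line 15

# toy demo: designs are real numbers, true performance peaks at 3.0; the
# scoring functions here are an exact oracle stand-in for ô, EI and lb
score = lambda d: -(d - 3.0) ** 2
best = search(lambda r: r.uniform(0.0, 6.0), lambda d, x: None,
              score, score, x=None)
print(round(best, 1))  # close to 3.0, the optimum
```

In the real PoC the `ei` and `lb` scores would come from the IS estimates (5)–(7) and the proposal distribution from the global/local priors described above.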
7 Experiments

In this section, we briefly evaluate the PoC, described previously, experimentally.
We detail our setup in Sect. 7.1 and discuss our results in Sect. 7.2.

to the high similarity between designs, making it an ideal use-case for the proposed
IS approach. Remark that the optimal design is c*: c*_i = 1, ∀i, with
o(c*) = 20. Also, executions are cheap, allowing us to repeat our experiments
multiple times to filter out the noise in our observations. While this benchmark
exhibits various other specific features, making it trivial to solve, the only
information required/exploited by the IS approach is Pr and f, which it treats
as black-box functions. In order to compute these, we stored the number of
iterations performed (#it), and the sum of rewards received (sum_r), for each
execution. The latter equals f(e), while the former suffices to compute Pr as

Pr(e|c) = (1 − c_{#it+1}) · Π_{i=1}^{#it} c_i,   for 0 ≤ #it ≤ 19,
Pr(e|c) = Π_{i=1}^{20} c_i,                      for #it = 20.
inputs: rng, c; outputs: #it, sum_r;
sum_r = 0;
i = 1;
while i ≤ 20 do ...   (choice point)

[Fig.: ô(inc) over evaluations for PoC (disc.), PoC (cont.) and SMAC (disc.), with the optimum o(c*) = 20 marked]
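The listing and the likelihood formula above can be made executable (our reconstruction of the benchmark, with an illustrative unit reward per iteration; note the code's 0-indexed `c[it]` corresponds to the paper's c_{#it+1}):

```python
import random

# Runnable sketch of the benchmark: at each of up to 20 choice points the
# policy continues with probability c[i]; pr() follows the displayed formula.

def run(rng, c):
    it, sum_r = 0, 0.0
    while it < 20:
        if rng.random() >= c[it]:   # choice point: stop with prob 1 - c[it]
            break
        it += 1
        sum_r += 1.0                # illustrative unit reward per iteration
    return it, sum_r

def pr(it, c):
    """Pr(e|c) of an execution that performed `it` iterations."""
    p = 1.0
    for i in range(it):
        p *= c[i]
    if it < 20:
        p *= 1.0 - c[it]
    return p

c = [0.5] * 20
it, sum_r = run(random.Random(1), c)
probs = sum(pr(k, c) for k in range(21))
print(round(probs, 6))  # 1.0: the 21 execution likelihoods sum to one
```

Storing only (#it, sum_r) per execution thus suffices to reconstruct both f(e) and Pr(e|c) for any design c, which is all the IS estimator needs.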
c* within the first 2000 evaluations (the worst run needed 7573 evaluations). In the
continuous setting, both SMAC and our PoC performed (only) slightly worse
than in the discretized setup.

Note that the results for ParamILS are worse than those reported in [2] and
that our PoC performs worse than WB-PURS described therein. This is because
only deterministic policies, i.e. c_i ∈ {0, 1} ∀i, were considered⁵ in [2], and therefore
the design space was much smaller (2²⁰ vs. 11²⁰), yet included c*.

Finally, note that the run times differed for all three frameworks. A run using our
PoC, ParamILS and SMAC took on average about 5, 20 and 30 minutes, respectively,
on our machine. However, as a single evaluation takes virtually no time, comparing
optimizers on this benchmark based on actual run times would be unfair, and would
bias results towards those optimizers having the lowest overhead (e.g. fewest IO
operations), which is furthermore very machine dependent. The actual overhead
per evaluation was small for all frameworks (30–180 ms), and as such negligible,
as long as an evaluation takes at least a few seconds, which is typically the case
in more realistic ADP settings.
8 Conclusion

⁵ As WB-PURS does not support stochastic policies, it was not included as a baseline.
An IS Approach to the Estimation of Algorithm Performance 17
References
1. Supplementary Material (2017). http://ai.vub.ac.be/node/1566
2. Adriaensen, S., Nowé, A.: Towards a white box approach to automated algorithm design. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 554–560 (2016)
3. Ansótegui, C., Sellmann, M., Tierney, K.: A gender-based genetic algorithm for the automatic configuration of algorithms. In: Gent, I.P. (ed.) CP 2009. LNCS, vol. 5732, pp. 142–157. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04244-7_14
4. Bhattacharyya, A.: On a measure of divergence between two multinomial populations. Sankhyā Indian J. Stat. 7, 401–406 (1946)
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. Denny, M.: Introduction to importance sampling in rare-event simulations. Eur. J. Phys. 22(4), 403 (2001)
7. Hesterberg, T.: Weighted average importance sampling and defensive mixture distributions. Technometrics 37(2), 185–194 (1995)
8. Hoos, H.H.: Programming by optimization. Commun. ACM 55(2), 70–80 (2012)
9. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 507–523. Springer, Heidelberg (2011). doi:10.1007/978-3-642-25566-3_40
10. Hutter, F., Hoos, H.H., Leyton-Brown, K., Stützle, T.: ParamILS: an automatic algorithm configuration framework. J. Artif. Intell. Res. 36(1), 267–306 (2009)
11. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13(4), 455–492 (1998)
12. López-Ibáñez, M., Dubois-Lacoste, J., Stützle, T., Birattari, M.: The irace package, iterated race for automatic algorithm configuration. Technical report, Université Libre de Bruxelles (2011)
13. Rasmussen, C.E.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
14. Rice, J.R.: The algorithm selection problem. Adv. Comput. 15, 65–118 (1976)
15. Van Laarhoven, P.J., Aarts, E.H.: Simulated annealing. In: van Laarhoven, P.J.M., Aarts, E.H.L. (eds.) Simulated Annealing: Theory and Applications, pp. 7–15. Springer, Dordrecht (1987). doi:10.1007/978-94-015-7744-1_2
Test Problems for Parallel Algorithms
of Constrained Global Optimization
1 Introduction

One of the general approaches to studying and comparing multiextremal
optimization algorithms is based on applying these methods to solve a set of test
problems, selected at random from a certain specially constructed class. In this case,
each test problem can be viewed as a random function created by a special
generator. Using multiextremal optimization algorithms with large samples of such
problems allows the operating characteristics of the methods to be evaluated (the
likelihood of properly identifying the global optimizer within a given number of
iterations), thus characterizing the efficiency of each particular algorithm.
The generator for one-dimensional problems was suggested by Hill [1]. These
test functions are typical for many engineering problems; they are particularly
reminiscent of reduced stress functions in problems with multiple concentrated
loads (see [2] for example). Another widely known class of one-dimensional test
problems is produced using a generator developed by Shekel [3].
A special GLOBALIZER software suite [4] was developed to study various
one-dimensional algorithms with random samples of functions produced by
the Hill and Shekel generators. A comprehensive description of this system, its
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 18–33, 2017.
https://doi.org/10.1007/978-3-319-69404-7_2
capabilities and example uses is provided in [5]. It should be noted that the Hill
functions were successfully used in the design of a one-dimensional constrained
problem generator with controlled measure of a feasible domain [6].
Another generator for random samples of two-dimensional test functions,
successfully used in the studies by a number of authors, was developed and
investigated in [7–10]. A generator for functions with arbitrary dimensionality
was suggested in [11]. It was used to study certain multidimensional algorithms
as described in [12–15]. Well-known collections of test problems for constrained
global optimization algorithms were proposed in [16,17].
All of these generators produce the function to be optimized. In the case
when the dimensionality is greater than two, the optimization process itself
cannot be clearly observed. In this regard, it is interesting to examine a different
approach, initiated in [5]. In this approach, the objective function appears as
a solution to a certain supporting approximation problem, which allows the
nature of the best current estimate and the final result to be observed, regardless
of the number of variables. Complicating the problem statement (including
non-convex constraints) results in an increase of its dimensionality. In fact, the
proposed generator produces an approximation problem to which the objective
function is related.
2 Problem Statement

m·ü = −e·u + F,   (1)

where m > 0 is the particle's mass, u(t) is the particle's current position at a
moment of time t ≥ 0, −e·u is an attractive force affecting the particle, and F is an
external force applied to the particle along the u axis (control action). It is assumed
that the control action F is a function of time and is represented as

F = m Σ_{i=1}^{n} A_i sin(ω_i t + φ_i).

Here n > 0 is the dimensionality of the vectors A, ω, φ and determines the number of
frequencies in the control action.

Substituting ω₀² = e/m, Eq. (1) is reduced to the known equation of forced
oscillations

ü + ω₀² u = Σ_{i=1}^{n} A_i sin(ω_i t + φ_i).   (2)
The problem is to find the natural frequency, the control action, and the initial
conditions such that:
1. at t ∈ [a, b] the particle deviates from the position q0 by no more than ε > 0;
2. at t = t1, t2, t3, the particle deviates from positions q1, q2, q3, respectively,
by no more than ε > 0;
3. at t = t3 the particle's speed is maximized.
This problem statement can be interpreted as follows: the trajectory of the particle's
movement u(t) shall pass within a "tube", then through the three "windows",
with maximum slope in the last of the windows. The illustration in Fig. 1
shows a graph of the function u(t) of the solution to problem (2), which passes
through the tube and windows shown in the chart by dashed lines.
Solving the optimization problem (4) and finding the vectors ω, c, the solution
to the original problem (2) can be written in accordance with the following
relationships:

u₀ = Σ_{i=0}^{n} c_{2i+2},   u̇₀ = Σ_{i=0}^{n} c_{2i+1} ω_i,
A_i = (ω₀² − ω_i²) √(c²_{2i+1} + c²_{2i+2}),   1 ≤ i ≤ n,   (5)
φ_i = arcsin( c_{2i+2} / √(c²_{2i+1} + c²_{2i+2}) ),   1 ≤ i ≤ n.
As follows from formula (3), the parameters c enter the equation's solution
linearly, and the parameters ω non-linearly. Given the constraints (4),
this allows the problem to be reformulated and c to be found without using a
numerical optimization method.
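The trial solution can be sketched as follows (our reconstruction of the form referred to as formula (3), consistent with relations (5); note the code uses 0-indexed `c[2i]`, `c[2i+1]` for the paper's c_{2i+1}, c_{2i+2}):

```python
import math

# Sketch of the trial solution u(t, omega, c): a linear combination of sines
# and cosines at the frequencies omega_0..omega_n; the coefficients c enter
# linearly, the frequencies omega non-linearly. Values below are illustrative.

def u(t, omega, c):
    """u(t) = sum_i ( c[2i] * sin(omega_i * t) + c[2i+1] * cos(omega_i * t) )."""
    total = 0.0
    for i, w in enumerate(omega):
        total += c[2 * i] * math.sin(w * t) + c[2 * i + 1] * math.cos(w * t)
    return total

omega = [1.0, 0.3, 0.7]                  # natural frequency first, then control
c = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]       # 2 coefficients per frequency
print(u(0.0, omega, c))  # 1.0: at t = 0 only the cosine terms contribute
```

This form makes the linear/non-linear split explicit: for any fixed omega, u is linear in c, which is what the least-squares reformulation below exploits.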
Let's consider a set of points (τ_j, u_j), 0 ≤ j ≤ m, with the coordinates defined
as follows:

τ_j = a + jh,   u_j = q0,   0 ≤ j ≤ m − 3,   (6)
τ_{m−2} = t1,   u_{m−2} = q1,
τ_{m−1} = t2,   u_{m−1} = q2,
τ_m = t3,   u_m = q3,

where h = (b − a)/(m − 3), i.e. the first m − 3 points are located at equal
distances in the center of the "tube", and the other three align with the centers of
the "windows".
The requirement is that the trajectory of the particle u(t) passes near the
points (τ_j, u_j), 0 ≤ j ≤ m. If the measure of deviation from the points is defined
as the sum of the squared deviations

Δ(c, ω) = Σ_{j=0}^{m} [u_j − u(τ_j, ω, c)]²,

then the parameters c can be found (given fixed values of ω) by solving the least
squares problem
According to the least squares method, the solution to problem (7) can be
obtained by solving a system of linear algebraic equations in the unknown
c, which can be done, e.g., by Gaussian elimination.
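This inner step can be sketched in code (a reconstruction under our assumed sin/cos basis; for fixed ω the normal equations AᵀA c = Aᵀu are solved by Gaussian elimination, as the text suggests):

```python
import math

# For fixed omega, the coefficients c minimizing the sum of squared deviations
# solve the normal equations A^T A c = A^T u; we solve them by Gaussian
# elimination with partial pivoting. Basis and test values are illustrative.

def basis(t, omega):
    row = []
    for w in omega:
        row += [math.sin(w * t), math.cos(w * t)]
    return row

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]          # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_c(points, omega):
    """Least-squares c for fixed omega from (tau_j, u_j) pairs."""
    rows = [basis(t, omega) for t, _ in points]
    n = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * uj for r, (_, uj) in zip(rows, points)) for i in range(n)]
    return solve(AtA, Atb)

# sanity check: recover known coefficients from exactly generated data
omega = [0.9, 1.7]
true_c = [1.0, -0.5, 0.3, 0.2]
pts = [(t * 0.37, sum(b * ci for b, ci in zip(basis(t * 0.37, omega), true_c)))
       for t in range(12)]
c_fit = fit_c(pts, omega)   # recovers true_c up to numerical error
```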
It should also be considered that the components of the frequency vector ω can
be placed in ascending order, so as to avoid duplicate solutions corresponding to
vectors with similar components in a different order. In addition, it is natural
to assume that the frequencies in the control action must not just be ordered but
differ by a certain positive value, as an actual physical device can only generate
ω_{i−1}(1 + δ) − ω_i(1 − δ) ≤ 0,   1 ≤ i ≤ n.   (8)

ω_{i−1}(1 + δ) − ω_i(1 − δ) ≤ 0,   1 ≤ i ≤ n,
|u(t_i, ω, c*(ω)) − q_i| ≤ ε,   i = 1, 2, 3,
max_{t∈[a,b]} u(t, ω, c*(ω)) − min_{t∈[a,b]} u(t, ω, c*(ω)) ≤ 2ε,

where c*(ω) is determined from (7), and the number of constraints depends on
the number of frequencies n in the control action.
Figure 1 shows the trajectory u(t) corresponding to the solution of problem
(9) with parameters a = 1, b = 10, t1 = 13, t2 = 16.65, t3 = 18, q0 = q3 = 0,
q1 = 7.65, q2 = −9.86. The solution is given with three significant digits.
The numeric experiments described below used a generator based on the approximation
problem from Sect. 2. Evidently, a variation in any parameter of the
original problem (2) changes the optimization problem (9), so it is sufficient
to vary just a few of them.
The centers of the first two windows, i.e., the pairs (t1, q1) and (t2, q2),
were chosen as the parameters determining a specific problem statement.
The values q1 and q2 were chosen independently and uniformly from the ranges
Test Problems for Parallel Algorithms of Constrained Global Optimization 23
[1, 10] and [−10, −1], respectively. The values t1 and t2 were dependent: first, the
value t1 was chosen from the range [b + 1, t3 − 2]; then, the value t2 was chosen
from the range [t1 + 1, t3 − 1]. All other parameters in problem (2) were fixed:
a = 1, b = 10, t3 = 18, q0 = q3 = 0, ε = 0.3. The parameter δ from (8) was set
to 0.05. The number of points in the additional grid (6) for solving the least-squares
problem (7) was set to 20. The one-dimensional maximization
and minimization problems from (9) were solved by scanning over a uniform grid of 100
nodes within the interval [a, b].
An important feature determining the existence of a feasible solution for the
problem under consideration is the number of frequencies and the range of frequency variation in the
vector ω. If the range is too small, or the number of frequencies is insufficient,
the feasible domain of problem (9) will be empty. In the experiments carried
out, the number of frequencies was chosen as n = 3, which corresponds to
ω = (ω_0, ω_1, ω_2, ω_3), while the frequency variation range was set from 0.01
to 2, i.e., ω_i ∈ [0.01, 2], 0 ≤ i ≤ 3.
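The sampling scheme just described can be sketched as follows; function and field names are illustrative, and the ranges for q2, t1 and t2 follow the restored minus signs above:

```python
import random

# Sketch of the test-problem generator: the window centers (t1, q1) and
# (t2, q2) are drawn at random, while all other parameters of problem (2)
# stay fixed.  All names are illustrative.
def generate_problem(seed=None):
    rng = random.Random(seed)
    a, b, t3 = 1.0, 10.0, 18.0
    q1 = rng.uniform(1.0, 10.0)           # q1 ~ U[1, 10]
    q2 = rng.uniform(-10.0, -1.0)         # q2 ~ U[-10, -1]
    t1 = rng.uniform(b + 1.0, t3 - 2.0)   # t1 depends on b and t3
    t2 = rng.uniform(t1 + 1.0, t3 - 1.0)  # t2 depends on t1 and t3
    return {"a": a, "b": b, "t": (t1, t2, t3), "q": (0.0, q1, q2, 0.0),
            "eps": 0.3, "delta": 0.05, "n_freq": 3, "omega_range": (0.01, 2.0)}
```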
Let us note some important properties of the problems produced by the generator
under consideration.
Remark 1. The constraints of problem (9) differ in the time required
to verify them. For example, checking each of the first n constraints in (9) (let us
call these constraints geometric),
$$\omega_{i-1}(1+\delta) - \omega_i(1-\delta) \le 0, \quad 1 \le i \le n,$$
requires performing only three operations with real numbers. Testing the other constraints
(we will call them the main constraints),
$$|u(t_i, \omega, c^*(\omega)) - q_i| \le \varepsilon, \quad i = 1, 2, 3,$$
requires first solving the least-squares problem (7), and is therefore far more time-consuming.
Suppose that the objective function ϕ(y) (henceforth denoted by g_{m+1}(y)) and
the left-hand sides g_j(y), 1 ≤ j ≤ m, of the constraints satisfy the Lipschitz condition
$$|g_j(y_1) - g_j(y_2)| \le L_j \|y_1 - y_2\|, \quad y_1, y_2 \in D, \quad 1 \le j \le m+1.$$
The issues around the numerical construction of a Peano-type curve and the corresponding
theory are considered in detail in [5,19]. Here we can just state that
the numerically computed curve (evolvent) is an approximation of the theoretical
Peano curve with a precision of at least 2^{−m} for each coordinate (the parameter
m is called the evolvent density).
Let us introduce the classification of points x from the search domain [0, 1]
using the index ν = ν(x). This index is determined by the conditions
$$g_j(y(x)) \le 0, \quad 1 \le j \le \nu - 1, \qquad g_\nu(y(x)) > 0,$$
where the last inequality is omitted if ν = m + 1. This classification generates the function
$$f(y(x)) = g_{\nu(x)}(y(x)),$$
which is defined and computed along the interval [0, 1]. Its value at a point
x is either the value of the left-hand side of the constraint violated at this point (in
the case ν ≤ m) or the value of the objective function (in the case
ν = m + 1). Therefore, determining the value f(y(x)) reduces to a sequential
computation of the values
$$g_j(y(x)), \quad 1 \le j \le \nu = \nu(x),$$
i.e., the next value g_{j+1}(y(x)) is only computed if g_j(y(x)) ≤ 0. The computation
process is completed either when the inequality g_j(y(x)) > 0 becomes
true or when the value ν(x) = m + 1 is reached.
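The sequential scheme above can be sketched directly; constraint and function names are illustrative:

```python
# Minimal sketch of the index scheme: the constraints g_1, ..., g_m are
# evaluated one after another and evaluation stops at the first violated one;
# if all are satisfied, the objective g_{m+1} is evaluated.
def trial(y, constraints, objective):
    """Return the trial result (nu, z) at the point y."""
    for j, g in enumerate(constraints, start=1):
        z = g(y)
        if z > 0:                  # first violated constraint found
            return j, z            # nu <= m
    return len(constraints) + 1, objective(y)   # all satisfied: nu = m + 1
```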
The procedure called a trial at the point x automatically results in determining
the index ν for this point. The pair of values
$$\nu = \nu(x), \qquad z = g_\nu(y(x)),$$
produced by the trial at the point x ∈ [0, 1], is called the trial result.
A serial index algorithm for solving one-dimensional constrained optimization
problems (12) is described in detail in [6]. This algorithm belongs to the class of
characteristical algorithms (see [7]). It can be parallelized using the approach
described in [7] for solving unconstrained global optimization problems. Let us
briefly describe the rules of the resulting parallel index algorithm (PIA).
Suppose we have p ≥ 1 computational elements (e.g., CPU cores), which
can be used to run p trials simultaneously. In the first iteration of the method,
p trials are run in parallel at various random points x^i ∈ (0, 1), 1 ≤ i ≤ p.
Suppose n ≥ 1 iterations of the method have been completed, as a result
of which trials were carried out at k = k(n) points x^i, 1 ≤ i ≤ k. Then the
points x^{k+1}, ..., x^{k+p} of the search trials in the next, (n + 1)-th, iteration are
determined according to the rules below.
Classify the trial points by their indices into the sets
$$I_\nu = \{i : 1 \le i \le k,\ \nu = \nu(x_i)\}, \quad 1 \le \nu \le m + 1. \qquad (15)$$
For each set I_ν, compute the lower estimate of the corresponding Lipschitz constant
$$\mu_\nu = \max\left\{ \frac{|z_i - z_j|}{(x_i - x_j)^{1/N}} : i, j \in I_\nu,\ j < i \right\}. \qquad (17)$$
If the set I_ν contains fewer than two elements, or μ_ν from (17) equals zero, then
set μ_ν = 1.
4. For all non-empty sets I_ν, 1 ≤ ν ≤ m + 1, determine the values
$$z^*_\nu = \begin{cases} -R_\nu, & \nu < M, \\ \min\{g_\nu(x_i) : i \in I_\nu\}, & \nu = M, \end{cases} \qquad (18)$$
where M is the maximum current value of the index, and the vector
$$R = (R_1, \ldots, R_m), \qquad (19)$$
with positive coordinates, is called the reserve vector and is used as a parameter
of the algorithm.
5. For each interval (x_{i−1}, x_i), 1 ≤ i ≤ k + 1, calculate the characteristic R(i):
$$R(i) = \Delta_i + \frac{(z_i - z_{i-1})^2}{(r_\nu \mu_\nu)^2 \Delta_i} - 2\,\frac{z_i + z_{i-1} - 2z^*_\nu}{r_\nu \mu_\nu}, \quad \nu = \nu(x_{i-1}) = \nu(x_i),$$
$$R(i) = 2\Delta_i - 4\,\frac{z_i - z^*_\nu}{r_\nu \mu_\nu}, \quad \nu(x_{i-1}) < \nu(x_i) = \nu,$$
$$R(i) = 2\Delta_i - 4\,\frac{z_{i-1} - z^*_\nu}{r_\nu \mu_\nu}, \quad \nu = \nu(x_{i-1}) > \nu(x_i),$$
where Δ_i = (x_i − x_{i−1})^{1/N}, and the values r_ν > 1, 1 ≤ ν ≤ m + 1, are used
as parameters of the algorithm.
6. Reorder the characteristics R(i), 1 ≤ i ≤ k + 1, from highest to lowest,
$$R(t_1) \ge R(t_2) \ge \ldots \ge R(t_k) \ge R(t_{k+1}), \qquad (20)$$
and choose the p largest characteristics with interval numbers t_j, 1 ≤ j ≤ p.
7. Carry out p new trials in parallel at the points x^{k+j}, 1 ≤ j ≤ p, calculated by
the formulae
$$x^{k+j} = \frac{x_{t_j} + x_{t_j - 1}}{2}, \quad \nu(x_{t_j - 1}) \ne \nu(x_{t_j}),$$
$$x^{k+j} = \frac{x_{t_j} + x_{t_j - 1}}{2} - \operatorname{sign}(z_{t_j} - z_{t_j - 1})\,\frac{1}{2 r_\nu}\left[\frac{|z_{t_j} - z_{t_j - 1}|}{\mu_\nu}\right]^N, \quad \nu(x_{t_j - 1}) = \nu(x_{t_j}) = \nu.$$
The algorithm stops if the condition Δ_{t_j} ≤ ε becomes true for at least one
number t_j, 1 ≤ j ≤ p; here ε > 0 has the order of magnitude of the desired
coordinate accuracy.
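To make steps 5–7 concrete, the following sketch implements one iteration of the scheme for the simplest unconstrained one-dimensional case (m = 0, N = 1, a single index set with one r and one μ); it is a simplified illustration under those assumptions, not the full PIA:

```python
# Simplified sketch of one PIA iteration for the unconstrained case (m = 0,
# N = 1): estimate mu from divided differences, compute the characteristic
# R(i) of every interval, and place p new trial points in the p best intervals.
def pia_iteration(points, values, p, r=2.0):
    pts = sorted(zip(points, values))
    xs = [x for x, _ in pts]
    zs = [z for _, z in pts]
    mu = max(abs(zs[i] - zs[i - 1]) / (xs[i] - xs[i - 1])
             for i in range(1, len(xs))) or 1.0      # rule (17); mu = 1 if zero
    z_star = min(zs)                                 # rule (18) for nu = M

    def R(i):                                        # rule 5, equal-index case
        d = xs[i] - xs[i - 1]
        return (d + (zs[i] - zs[i - 1]) ** 2 / ((r * mu) ** 2 * d)
                - 2.0 * (zs[i] + zs[i - 1] - 2.0 * z_star) / (r * mu))

    best = sorted(range(1, len(xs)), key=R, reverse=True)[:p]
    sign = lambda v: (v > 0) - (v < 0)
    # rule 7: shift the midpoint towards the smaller of the two endpoint values
    return [(xs[i] + xs[i - 1]) / 2.0
            - sign(zs[i] - zs[i - 1]) * abs(zs[i] - zs[i - 1]) / (2.0 * r * mu)
            for i in best]
```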
Let us formulate the conditions for algorithm convergence. For this, in addition
to the exact solution y* of the problem (10), we will also consider the ε-reserved
solution y^ε, determined by the conditions
$$g_j(y^\varepsilon) \le -\varepsilon_j, \quad 1 \le j \le m, \qquad g_{m+1}(y^\varepsilon) = \min\{g_{m+1}(y) : y \in D,\ g_j(y) \le -\varepsilon_j,\ 1 \le j \le m\},$$
where ε_1, ..., ε_m are positive numbers (reserves for each constraint). Let us also
introduce the set
$$Y_\varepsilon = \{y \in D : g_j(y) \le 0,\ 1 \le j \le m,\ g_{m+1}(y) \le g_{m+1}(y^\varepsilon)\}$$
of all feasible points of the problem (10) which are no worse (in terms of the
objective function value) than the ε-reserved solution.
Using this notation, the convergence conditions can be formulated as the
theorem below.
p    n(p)      s(p)
1    241239    –
2    94064     2.56
4    45805     5.27
8    22628     10.66
The results show that the speedup is greater than the number of cores used
(hyper-acceleration). This situation is explained by the fact that the algorithm
performs an adaptive estimation of the behavior of the objective function (calculating
the lower bounds for the Lipschitz constant (17) and the current minimum
value (18)). For example, if the Lipschitz constant is estimated better in the parallel
version, then the parallel algorithm using p cores can be accelerated by more
than a factor of p.
6 Conclusion
In summary, we note that the method proposed in this work for generating
multidimensional constrained global optimization problems allows:
– clear visualization of the best current estimate and the final solution to the
problem, regardless of the number of variables;
– increasing the dimensionality of the optimization problem being addressed by
varying the original approximation problem;
– controlling the feasible domain by adding extra non-convex constraints.
References
1. Hill, J.D.: A search technique for multimodal surfaces. IEEE Trans. Syst. Sci. Cybern. 5(1), 2–8 (1969)
2. Toropov, V.V.: Simulation approach to structural optimization. Struct. Optim. 1, 37–46 (1989)
3. Shekel, J.: Test functions for multimodal search technique. In: Proceedings of the 5th Princeton Conference on Information Science Systems, pp. 354–359. Princeton University Press, Princeton (1971)
4. Strongin, R.G., Gergel, V.P., Tropichev, A.V.: Globalizer. Investigation of minimizing sequences generated by global search algorithms for univariate functions. User's guide. Nizhny Novgorod University Press, Nizhny Novgorod (1995)
5. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-convex Constraints: Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
6. Barkalov, K.A., Strongin, R.G.: A global optimization technique with an adaptive order of checking for constraints. Comput. Math. Math. Phys. 42(9), 1289–1300 (2002)
7. Grishagin, V.A., Sergeyev, Y.D., Strongin, R.G.: Parallel characteristical algorithms for solving problems of global optimization. J. Glob. Optim. 10(2), 185–206 (1997)
8. Sergeyev, Y.D., Grishagin, V.A.: Sequential and parallel algorithms for global optimization. Optim. Methods Softw. 3, 111–124 (1994)
9. Sergeyev, Y.D., Grishagin, V.A.: Parallel asynchronous global search and the nested optimization scheme. J. Comput. Anal. Appl. 3(2), 123–145 (2001)
10. Gergel, V., Grishagin, V., Gergel, A.: Adaptive nested optimization scheme for multidimensional global search. J. Glob. Optim. 66(1), 35–51 (2016)
11. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Y.D.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
12. Sergeyev, Y.D., Kvasov, D.E.: Global search based on efficient diagonal partitions and a set of Lipschitz constants. SIAM J. Optim. 16(3), 910–937 (2006)
13. Lera, D., Sergeyev, Y.D.: Lipschitz and Hölder global optimization using space-filling curves. Appl. Numer. Math. 60(1–2), 115–129 (2010)
14. Paulavičius, R., Sergeyev, Y., Kvasov, D., Žilinskas, J.: Globally-biased DISIMPL algorithm for expensive global optimization. J. Glob. Optim. 59(2–3), 545–567 (2014)
15. Sergeyev, Y.D., Kvasov, D.E.: A deterministic global optimization using smooth diagonal auxiliary functions. Commun. Nonlinear Sci. Numer. Simul. 21(1–3), 99–111 (2015)
16. Floudas, C.A., Pardalos, P.M.: A Collection of Test Problems for Constrained Global Optimization Algorithms. LNCS, vol. 455. Springer, Heidelberg (1990). doi:10.1007/3-540-53032-0
17. Floudas, C.A., et al.: Handbook of Test Problems in Local and Global Optimization. Springer, New York (1999). doi:10.1007/978-1-4757-3040-1
18. Pardalos, P.M., Phillips, A.T., Rosen, J.B.: Topics in Parallel Computing in Mathematical Programming. Science Press, New York (1992)
19. Sergeyev, Y.D., Strongin, R.G., Lera, D.: Introduction to Global Optimization Exploiting Space-Filling Curves. Springer, New York (2013). doi:10.1007/978-1-4614-8042-6
20. Gergel, V.P., Kuzmin, M.I., Solovyov, N.A., Grishagin, V.A.: Recognition of surface defects of cold-rolling sheets based on method of localities. Int. Rev. Autom. Control 8(1), 51–55 (2015)
21. Modorskii, V.Y., Gaynutdinova, D.F., Gergel, V.P., Barkalov, K.A.: Optimization in design of scientific products for purposes of cavitation problems. In: Simos, T.E. (ed.) ICNAAM 2015. AIP Conference Proceedings, vol. 1738 (2016). Article No. 400013
22. Grishagin, V.A.: Operating characteristics of some global search algorithms. Probl. Stoch. Search 7, 198–206 (1978). (in Russian)
Automatic Configuration of Kernel-Based Clustering: An Optimization Approach
1 Introduction
In this paper we consider a machine learning model based on clustering and Support
Vector Machine (SVM) classification, and its application to leakage localization in an
urban water distribution network.
Machine-learning-based approaches for analytically localizing leaks in a Water
Distribution Network (WDN) have been proposed, mostly based on the idea that leaks
can be detected by correlating actual modifications in flow and pressure within the
WDN to the output of a simulation model whose parameters are set to evaluate the
effect induced by a leak in a specific location and with a specific severity [1–6].
This paper is based on an analytical framework that uses: (1) extensive simulation
of leaks for data generation; (2) kernel-based clustering to group leaks implying similar
variations in pressure and flow; (3) classification learning (i.e., SVM) to discover the
relation linking variations in pressure and flow to a limited set of probably leaky pipes.
The main contribution of this paper is the formulation of the algorithm configuration
problem and the proposal of a global optimization strategy to solve it.
Automatic configuration of machine learning algorithms has been attracting
growing attention [7]: the use of default parameters, as pointed out in [8], can result in
poor generalization performance. Grid search, the most widely used strategy for
hyperparameter optimization, is hardly feasible for more than two or three parameters. For
these reasons optimization methods have been widely proposed. Classical optimization
cannot be used, as the performance measures are typically black-box and/or
multimodal functions whose derivatives are not available. Consequently, global optimization
is now widely accepted as the main computational framework for hyperparameter
optimization and algorithm configuration.
Global optimization methods [9–11] fall into three large families. The first is partitional
methods, which offer global convergence properties and guaranteed accuracy estimation
of the global solutions, e.g., in the case of Lipschitz global optimization [12, 13].
A significant application of these methods to hyperparameter optimization is given, e.g.,
in [14] for the case of SVM regression and in [15] for the case of signal processing. The
second family is Random Search [16–18] and the related metaheuristics, like simulated
annealing, evolutionary/genetic algorithms, and multistart & clustering, largely applied in
global optimization. Random Search has recently received fresh attention from the
machine learning community and is increasingly considered as a baseline for global
optimization, as in Hyperband [19, 20], which considers randomly sampled configurations,
relies on a principled early stopping rule to evaluate each configuration, and
compares its performance to Bayesian Optimization (BO).
BO represents the third family, which came to dominate the landscape of hyperparameter
optimization in the machine learning community [21–23]. First proposed in
[24], BO is a sequential model-based approach: its key ingredients are a probabilistic
model, which captures our beliefs given the evaluations already performed, and an
acquisition function, which computes the utility of each candidate point for the next
evaluation of f. The Bayesian framework is particularly useful when evaluations of f are
costly, no derivatives are available, and/or f is nonconvex/multimodal.
Very relevant to the authors' activity is that BO can be applied to unusual design
spaces which involve categorical or conditional variables. This capability makes BO the
natural solution when not only the hyperparameters but also the specific algorithm itself
has to be automatically selected, in the so-called algorithm configuration [25].
The rest of the paper is organized as follows: Sect. 2 describes the hydraulic
simulation model, the general algorithmic structure and the formulation of the performance
measure of the clustering. Section 3 is devoted to the sequential model-based
optimization for the hyperparameter optimization and the description of the software
environment utilized. Section 4 analyzes the computational performance of different
strategies, while Sect. 5 contains concluding remarks.
36 A. Candelieri et al.
In this paper the reference application domain is an urban WDN. Water network design
and optimization of the operations (pump scheduling) have received a lot of attention in
the Operations Research literature [26].
The elements of the network are subject to failures, which are not uncommon given the
typically old age of the infrastructure. Breakages of pipes, in particular, generate bursts
and leaks which can inhibit the supply service of the network (or a subnetwork) and induce
a substantial loss of water, with an economic loss (non-revenue water), water quality
problems and an unnecessary increase in energy costs.
The state of the WDN is usually monitored by a number of sensors which record
flows and pressures. When a leak occurs, sensors record a variation from normal
operating values. We named the vector of deviations the signature of the leak.
The main aim of the proposed machine learning approach is to use this signature
to identify the location of the leak. Therefore, the basic idea is to move between the
physical space (pipes of the WDN) and the space of leak signatures to infer a
possible relation, both direct and inverse, between causes (leaks on pipes) and effects
(variations in flow and pressure at the monitoring points).
Although data could be gathered by looking at historical leakage events, they would be
too sparse and of poor quality. Thus, hydraulic simulation software is used to generate
a wide set of data emulating the data from sensors according to different leakage
scenarios in the WDN, consisting in placing, in turn, a leak on each pipe and varying
its severity in a given range. Our choice of simulator is EPANET 2.0, widely used for
modeling WDNs and downloadable for free from the Environmental Protection
Agency web site (http://www.epa.gov/nrmrl/wswrd/dw/epanet.html).
In this paper we focus on optimizing, through Sequential Model-Based Optimization
(SMBO) [27], a set of design variables which are hyperparameters of a
machine-learning-based system which, given a new leak signature, infers its location as
a limited set of probably leaky pipes. Learning is performed on a dataset obtained as
vectors of N components, where N is the overall number of sensors (N = Np + Nf, with
Np the number of pressure sensors and Nf the number of flow sensors); this is the Input
Space. According to Fig. 1, learning is performed in two stages: one unsupervised,
aimed at grouping together similar signatures to reduce the number of different
effects, and one supervised, aimed at estimating the group of signatures which the
signature of a real leak belongs to. This allows retrieving only the scenarios related
to the signatures in that cluster and, therefore, the leaky pipes associated to those
signatures.
The basic idea of the analytical framework has been presented in [28–30], where
Spectral Clustering (SC) is used for the unsupervised learning phase and Support
Vector Machine (SVM) classification is used for the supervised learning phase. The
cluster assignment provided to each instance (i.e., signature) is used as the label to train the
SVM classifier. While the clustering is used to model the direct relation from a leak
scenario (i.e., leaky pipe and leak severity) to a group of similar signatures, the SVM
inverts this relation. Thus, when a reading from the sensors is acquired, the variations with
respect to the faultless WDN model are computed and the resulting signature is given
Automatic Configuration of Kernel-Based Clustering 37
as input to the trained SVM, which assigns the most probable cluster the signature
belongs to. Finally, the pipes related to the scenarios in that cluster are selected as
potentially leaky pipes.
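The two-stage pipeline can be sketched end to end on synthetic signatures; plain k-means stands in for the kernel variant here, the data are artificial, and scikit-learn is assumed to be available, so this is an illustration of the flow rather than the authors' exact setup:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Stage 1 (unsupervised): cluster leak signatures.  Stage 2 (supervised): train
# an SVM on the cluster labels so it can map a new signature back to the most
# probable cluster.  Synthetic 4-sensor signatures around three centers.
rng = np.random.default_rng(0)
signatures = np.vstack([rng.normal(m, 0.1, size=(30, 4)) for m in (0.0, 1.0, 2.0)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(signatures)
svm = SVC(kernel="rbf").fit(signatures, labels)   # inverse relation: signature -> cluster

# A new reading near the second center should be assigned to that cluster.
new_signature = rng.normal(1.0, 0.1, size=(1, 4))
predicted_cluster = svm.predict(new_signature)[0]
```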
Although SC proved to be effective, it is computationally expensive; in this
study, we propose to replace SC with a kernel k-means algorithm and apply SMBO to
optimally tune its hyperparameters. This allows us to reduce the induced computational
burden and to infer the similarity measure (even a non-linear one) directly from data,
implicitly through the kernel trick, instead of defining a priori a similarity measure for
weighting the edges of the affinity graph. It has already been reported in the literature that an
equivalence exists between SC and kernel k-means [31]. In particular, a Radial Basis
Function (RBF) kernel has been chosen, characterized by its hyperparameter r. The
other hyperparameter is, naturally, the number k of clusters.
Moreover, it is important to highlight that the adoption of RBF kernels in global
optimization algorithms has also been investigated and compared with statistical
models [32].
2.1 Notation
The WDN can be represented through a graph G = ⟨V, E⟩, where:
– V is the set of junctions, comprising consumption points, reservoirs, tanks and emitters,
as well as simple junctions between two pipes;
– E is the set of links, which are pipes, pumps or valves.
Let Ē ⊆ E denote the set of pipes within the set of links. Furthermore, let us denote with
$$A = \{a_1, \ldots, a_l\}$$
the set of severities of the (simulated) leaks. Then, the set of Leakage Scenarios
can be defined as
$$S = \bar{E} \times A,$$
where s_{e,a} ∈ S is related to a leak on the pipe e ∈ Ē with severity a ∈ A.
Finally, the signature of a leak, that is, the effect induced by a specific leakage
scenario, is defined as
$$x_{e,a} = f(s_{e,a}),$$
with f(·) a function computing, through EPANET, the expected variations of pressure
and flow at the sensor locations.
Clustering is performed on signatures; therefore every cluster C_k is a set of
signatures (i.e., similar effects due to different leaks):
$$C_k = \{x_{e,a} \in \mathbb{R}^N : x_{e,a} = f(s_{e,a}),\ e \in \bar{E},\ a \in A\}, \qquad C_i \cap C_j = \emptyset,\ \forall\, i \ne j.$$
Many clustering quality measures are given in the literature [33], divided between
internal measures, basically related to inter- and intra-cluster distances, and external
measures, which need a set of labeled examples and treat the clustering as a classification
problem. External measures are domain specific, and therefore would be more effective in our
case, but they cannot be used because, as remarked above, historical data are sparse,
unbalanced and of poor quality. For these reasons we have defined an ad hoc composite
index to address the leakage localization objective. In particular, to obtain an effective
and efficient localization of possible leaks in a WDN, clusters have to satisfy the
following properties:
– the set of pipes identified as candidate leaky pipes must be as limited as possible, for every cluster;
– the signatures of the scenarios associated to a given pipe should be spread among
different clusters as little as possible.
We refer to the first property as compactness and to the second one as a proxy of
accuracy.
Before presenting the final index, the two following sets have to be defined:
$$\bar{E}^k = \{e \in \bar{E} : \exists\, a \in A : x_{e,a} = f(s_{e,a}) \in C_k\}$$
and
$$S^{k,a} = \{s_{e,a} \in S,\ e \in \bar{E} : x_{e,a} = f(s_{e,a}) \in C_k\}.$$
The index for evaluating the fitness of the clustering is the composition of two different
measures:
$$I = \frac{I_C + I_P}{2},$$
where I_C measures the compactness of the clusters in terms of the number of pipes
identified as probably leaky with respect to the overall number of pipes in the WDN,
while I_P measures a sort of accuracy, i.e., the chance that the leak is in the set of pipes
identified as leaky rather than on other pipes.
More in detail, the two measures are computed as follows:
$$I_C = \frac{\sum_{k=1}^{K} I_C^k\,|\bar{E}^k|}{\sum_{k=1}^{K} |\bar{E}^k|}, \qquad \text{where} \qquad I_C^k = \frac{|\bar{E}| - |\bar{E}^k|}{|\bar{E}| - 1},$$
and
$$I_P = \operatorname{avg}_k\, I_P^k, \qquad \text{where} \qquad I_P^k = \frac{1}{|A|}\,\frac{\sum_{a \in A} |S^{k,a}|}{|\bar{E}^k|}.$$
Let us suppose that C_k contains only the signatures associated to all the scenarios
related to only one pipe e* ∈ Ē; then:
– |Ē^k| = 1, because Ē^k = {e*};
– |S^{k,a}| = 1, because ∀a ∈ A, S^{k,a} = {s ∈ S : ∃! e* ∈ Ē : x = f(s_{e*,a}) ∈ C_k};
– I_P^k = (1/|A|) · |A| / |Ē^k| = 1.
On the other side, let us suppose that C_k contains signatures associated to |A|
different pipes with |A| different severities; then:
– |Ē^k| = |A|, by the hypothesis;
– |S^{k,a}| = 1, because ∀a ∈ A, S^{k,a} = {s ∈ S : ∃! e* ∈ Ē : x = f(s_{e*,a}) ∈ C_k};
– Σ_{a∈A} |S^{k,a}| = |A|;
– I_P^k = (1/|A|) · |A| / |Ē^k| = 1/|A| < 1.
In general, |Ē^k| can be greater than |A|; in the worst case |Ē^k| ≫ |A|, and thus I_P^k ≪ 1.
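Under the definitions above, the index I can be computed directly from a clustering expressed as sets of (pipe, severity) scenarios. This is a hedged sketch with illustrative names, assuming every scenario carries exactly one severity and falls into exactly one cluster:

```python
# Sketch of the fitness index I = (I_C + I_P) / 2: `clusters` maps a cluster id
# to the set of scenarios (e, a) whose signatures fell into that cluster.
def fitness_index(clusters, n_pipes, severities):
    Ek = {k: {e for e, _ in sc} for k, sc in clusters.items()}   # pipes per cluster
    # compactness: I_C^k = (|E| - |E^k|) / (|E| - 1), weighted by |E^k|
    IC = (sum((n_pipes - len(E)) / (n_pipes - 1) * len(E) for E in Ek.values())
          / sum(len(E) for E in Ek.values()))
    # accuracy proxy: I_P^k = (1/|A|) * sum_a |S^{k,a}| / |E^k|; with one
    # severity per scenario, sum_a |S^{k,a}| is just the cluster's size.
    IPk = [len(sc) / (len(severities) * len(Ek[k])) for k, sc in clusters.items()]
    return (IC + sum(IPk) / len(IPk)) / 2.0
```

The two limiting cases worked out above (one pipe per cluster vs. |A| pipes per cluster) can be checked directly against this sketch.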
Fig. 2. The District Metered Area (DMA) Neptun of the urban WDN in Timisoara, Romania:
the WDN case study considered in this paper
Neptun consists of 335 junctions (92 of which are consumption points) and 339 pipes. In
the proposed approach, EPANET is used to simulate a wide set of leakage scenarios,
consisting in placing, in turn, a leak on each pipe and varying its severity in a given
range.
At the end of each run, EPANET provides the pressures and flows at each junction and
pipe, respectively, and, therefore, also at the monitoring points (i.e., sensors). Variations
in pressure and flow induced by a leak (i.e., the signature of the leak) are computed with
respect to the simulation of the faultless WDN. Finally, a dataset is obtained having as
many instances as the number of pipes times the number of discharge coefficient values.
The kernel k-means algorithm maps data into a feature space where it can
discover clusters that are non-linearly separable in the input space. This provides a major
advantage over standard k-means, and allows us to cluster points whenever we are given a
positive definite matrix of similarity values.
A general weighted kernel k-means objective is mathematically equivalent to a
weighted graph partitioning objective [31]; this equivalence has an important consequence:
in cases where eigenvector computation is prohibitive, kernel k-means eliminates
the need for the eigenvector computation required by graph partitioning.
Given a set of vectors x_1, x_2, ..., x_n, the kernel k-means objective can be written as the
minimization of
$$\sum_{k=1}^{K} \sum_{x_i \in C_k} \|\Phi(x_i) - m_k\|^2,$$
where Φ(·) is a (non-linear) function mapping the vectors x_i from the Input Space to the
Feature Space, and m_k is the centroid of cluster C_k.
Expanding ‖Φ(x_i) − m_k‖² in the objective function, one obtains
$$\Phi(x_i)\cdot\Phi(x_i) - \frac{2\sum_{x_j \in C_k} \Phi(x_i)\cdot\Phi(x_j)}{|C_k|} + \frac{\sum_{x_j, x_l \in C_k} \Phi(x_j)\cdot\Phi(x_l)}{|C_k|^2}.$$
Therefore, only inner products are used in the computation of the Euclidean distance
between every vector and the centroid of the cluster C_k. As a conclusion, given a
kernel matrix K, where K_{ij} = Φ(x_i)·Φ(x_j), these distances can be computed without
knowing explicit representations of Φ(x).
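The kernel-only distance computation can be sketched as follows; names are illustrative, and a linear kernel is used in the check because there the feature-space distance must agree with the ordinary Euclidean distance to the centroid:

```python
import numpy as np

# Squared feature-space distance ||Phi(x_i) - m_k||^2 computed from the kernel
# matrix alone, K[i, j] = Phi(x_i) . Phi(x_j), using the expansion above.
def sq_dist_to_centroid(K, i, members):
    members = list(members)
    n = len(members)
    return (K[i, i]
            - 2.0 * K[i, members].sum() / n
            + K[np.ix_(members, members)].sum() / n ** 2)

def rbf_kernel(X, sigma):
    """RBF kernel matrix exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))
```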
Although the overall analytical leakage localization is composed of two learning stages,
the focus of this paper is on the optimization of the hyperparameters of the first, unsupervised,
learning phase (i.e., kernel k-means clustering).
Clustering of leak signatures is aimed at grouping together similar effects induced
by different (simulated) leaks. This allows implementing the second, supervised, learning
phase with a limited number of labels (i.e., the number of clusters instead of the
number of pipes of the WDN). The previous Fig. 1 summarizes the overall machine-learning-based
leakage localization approach, where kernel k-means clustering
replaces the Spectral Clustering used in the preliminary papers.
Since the focus of this paper is to replace the original SC phase with kernel-based
k-means, just the two following hyperparameters are taken into account:
– the number k of clusters, a discrete decision variable (i.e., an integer);
– the value of the RBF kernel's parameter r, a continuous decision variable.
The SVM hyperparameters are not part of the optimization process in this paper.
This section summarizes the results obtained. The experimental setting consists of:
– Grid search vs. SMBO (using both GP and RF to generate the surrogate of the
objective function): k = 3, ..., 13 and r in [0.00001, 0.1], with 70 values of r
equally distributed in the range, are used for the grid search;
– maximization of the index I used to measure the performance of the clustering output
with respect to the leakage localization properties (as defined in Sect. 2.1);
– a termination criterion based on a limit on the number of function evaluations: 770
function evaluations (equal to the number of configurations in the grid),
where 220 are used as the initial design.
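A minimal version of the RF-based SMBO loop can be sketched as follows; it uses a plain greedy acquisition (predicted mean only, no uncertainty term) on a finite candidate set and assumes scikit-learn is available, so it is a simplified stand-in for the actual SMBO machinery, with illustrative names:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# SMBO sketch: evaluate an initial design, then repeatedly refit a random-
# forest surrogate and evaluate the candidate with the best predicted value.
def smbo_rf(objective, candidates, n_init=5, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    evaluated = [int(i) for i in rng.choice(len(candidates), n_init, replace=False)]
    y = [objective(candidates[i]) for i in evaluated]
    for _ in range(n_iter):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(np.array([candidates[i] for i in evaluated]), np.array(y))
        remaining = [i for i in range(len(candidates)) if i not in evaluated]
        if not remaining:
            break
        pred = model.predict(np.array([candidates[i] for i in remaining]))
        nxt = remaining[int(np.argmin(pred))]        # greedy acquisition
        evaluated.append(nxt)
        y.append(objective(candidates[nxt]))
    best = int(np.argmin(y))
    return candidates[evaluated[best]], y[best]
```

To maximize the clustering index I, one would simply minimize −I.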
The following Table 1 summarizes the results obtained:
Table 1. Results: best performance, hyperparameter configuration, time and iterations for the three approaches (Grid Search, GP- and RF-based SMBO)

Approach     | I (best seen) | k* | r*        | Time (sec) | Last iteration with improvement
Grid search  | 0.516         | 3  | 0.0000100 | 6165.03    | NA
GP           | 0.505         | 3  | 0.0055322 | 9704.88    | 388
RF           | 0.556         | 3  | 0.0000112 | 12317.92   | 687
The table reports:
– the best seen value of the performance index I over the 770 function evaluations;
– the values of the hyperparameters, k* and r*, associated to the best seen;
– the overall execution time (sec), computed as the total over all 770 evaluations;
– the last iteration with an improvement of the clustering performance index I.
SMBO using RF proved to be the most effective strategy. In particular, it was able
to identify a hyperparameter configuration of the kernel k-means outside the grid
and associated to the highest value of the clustering performance index I.
On the contrary, SMBO using GP was not able to converge to a better hyperparameter
configuration than the one identified by the grid search. This was probably due
to the nature of k; indeed, RF is usually preferred to GP in the case of categorical
variables.
The following Fig. 3 compares the convergence of the different approaches. SMBO
with GP converges very quickly: after 388 evaluations of the objective function no more
improvements are obtained, even if the best seen value of I is lower than the one
obtained through the grid search.
It is important to highlight that SMBO with RF provides the following benefits:
– it is able to find a hyperparameter configuration outperforming grid search as well
as SMBO with GP in terms of effectiveness (clustering performance index I);
Fig. 3. Best seen over the iterations for SMBO with RF, SMBO with GP, and the best value
obtained over the grid search (independent of the iterations)
5 Conclusions
Any performance comparison between Bayesian Optimization and other global optimization
strategies can only be platform and problem dependent, and is thus difficult to
generalize: in [19] it is stated that random search offers a simple, parallelizable and
theoretically sound launching point, while Bayesian Optimization may offer improved
empirical accuracy, but its selection models are intrinsically sequential and thus
difficult to parallelize. The main problem is that Bayesian Optimization scales poorly
with the number of dimensions: in [46] it is stated that the approach is restricted to
problems of moderate dimensions, up to 10, and a random embedding is
proposed to identify a work space of a much smaller number of dimensions. Other
proposals to scale Bayesian Optimization to higher dimensions are in [19, 47].
The computational results reported in this paper substantiate the known fact that the
Bayesian framework is suitable for objective functions which are costly to evaluate and
black box. Moreover, these methods can be applied to unusual design spaces which involve
categorical or conditional inputs, and are therefore able to deal with such diverse
domains as A/B testing, recommender systems, reinforcement learning, environmental
monitoring, sensor networks, preference learning and interactive interfaces.
The next activities will leverage the capability of RF-based SMBO to handle conditional
parameters as well; we will also consider the optimization of the whole machine
learning pipeline, moving towards an automatic algorithm configuration setting.
References
1. Xia, L., Xiao-dong, W., Xin-hua, Z., Guo-jin, L.: Bayesian theorem based on-line leakage detection and localization of municipal water supply network. Water Wastewater Eng. 12 (2006)
2. Sivapragasam, C., Maheswaran, R., Venkatesh, V.: ANN-based model for aiding leak detection in water distribution networks. Asian J. Water Environ. Pollut. 5(3), 111–114 (2007)
3. Xia, L., Guo-jin, L.: Leak detection of municipal water supply network based on the cluster-analysis and fuzzy pattern recognition. In: 2010 International Conference on E-Product E-Service and E-Entertainment (ICEEE), vol. 1(5), pp. 7–9 (2010)
4. Lijuan, W., Hongwei, Z., Hui, J.: A leak detection method based on EPANET and genetic algorithm in water distribution systems. In: Wu, Y. (ed.) Software Engineering and Knowledge Engineering: Theory and Practice. AISC, vol. 114, pp. 459–465. Springer, Heidelberg (2012). doi:10.1007/978-3-642-03718-4_57
5. Nasir, A., Soong, B.H., Ramachandran, S.: Framework of WSN based human centric cyber physical in-pipe water monitoring system. In: 11th International Conference on Control, Automation, Robotics and Vision, pp. 1257–1261 (2010)
6. Soldevila, A., Fernandez-Canti, R.M., Blesa, J., Tornil-Sin, S., Puig, V.: Leak localization in water distribution networks using Bayesian classifiers. J. Process Control 55, 1–9 (2017)
7. Franzin, A., Cáceres, L.P., Stützle, T.: Effect of Transformations of Numerical Parameters in Automatic Algorithm Configuration. IRIDIA Technical Report 2017-006 (2017)
8. Bagnall, A., Cawley, G.C.: On the Use of Default Parameter Settings in the Empirical Evaluation of Classification Algorithms. arXiv:1703.06777v1 [cs.LG] (2017)
Automatic Configuration of Kernel-Based Clustering 47
28. Candelieri, A., Soldi, D., Archetti, F.: Cost-effective sensors placement and leak localization
- the Neptun pilot of the ICeWater project. J. Water Supply: Res. Technol. AQUA 64(5),
567–582 (2015)
29. Candelieri, A., Soldi, D., Conti, D., Archetti, F.: Analytical leakages localization in water
distribution networks through spectral clustering and support vector machines. The ICeWater
approach. Procedia Eng. 89, 1080–1088 (2014)
30. Candelieri, A., Archetti, F., Messina, E.: Improving leakage management in urban water
distribution networks through data analytics and hydraulic simulation. WIT Trans. Ecol.
Environ. 171, 107–117 (2013)
31. Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means: spectral clustering and normalized cuts.
In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 551–556 (2004)
32. Žilinskas, A.: On similarities between two models of global optimization: statistical models
and radial basis functions. J. Global Optim. 48(1), 173–182 (2010)
33. Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive
comparative study of cluster validity indices. Pattern Recogn. 46(1), 243–256 (2013)
34. Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel
eigenvalue problem. Neural Comput. 10, 1299–1319 (1998)
35. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive
black-box functions. J. Global Optim. 13(4), 455–492 (1998)
36. Horn, D., Wagner, T., Biermann, D., Weihs, C., Bischl, B.: Model-based multi-objective
optimization: taxonomy, multi-point proposal, toolbox and benchmark. In: Gaspar-Cunha,
A., Henggeler Antunes, C., Coello, C.C. (eds.) EMO 2015. LNCS, vol. 9018, pp. 64–78.
Springer, Cham (2015). doi:10.1007/978-3-319-15934-8_5
37. Ginsbourger, D., Le Riche, R., Carraro, L.: Kriging is well-suited to parallelize optimization.
In: Tenne, Y., Goh, C.K. (eds.) Computational Intelligence in Expensive Optimization
Problems. ALO, vol. 2, pp. 131–162. Springer, Heidelberg (2010). doi:10.1007/978-3-642-
10701-6_6
38. Bischl, B., Wessing, S., Bauer, N., Friedrichs, K., Weihs, C.: MOI-MBO: multiobjective
infill for parallel model-based optimization. In: Pardalos, P.M., Resende, M.G.C.,
Vogiatzis, C., Walteros, J.L. (eds.) LION 2014. LNCS, vol. 8426, pp. 173–186. Springer,
Cham (2014). doi:10.1007/978-3-319-09584-4_17
39. Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyperparameter
optimization. In: Advances in Neural Information Processing Systems, pp. 2546–2554
(2011)
40. Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M.: mlrMBO: A Modular
Framework for Model-Based Optimization of Expensive Black-Box Functions. arXiv
preprint arXiv:1703.03373 (2017)
41. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection
and hyperparameter optimization of classification algorithms. In: Proceedings of
ACM SIGKDD, pp. 847–855 (2013)
42. Lang, M., Kotthaus, H., Marwedel, P., Weihs, C., Rahnenführer, J., Bischl, B.: Automatic
model selection for high-dimensional survival analysis. J. Stat. Comput. Simul. 85(1), 62–76
(2015)
43. Horn, D., Bischl, B.: Multi-objective parameter configuration of machine learning
algorithms using model-based optimization. In: 2016 IEEE Symposium Series on
Computational Intelligence (SSCI), pp. 1–8 (2016)
44. Richter, J., Kotthaus, H., Bischl, B., Marwedel, P., Rahnenführer, J., Lang, M.: Faster
model-based optimization through resource-aware scheduling strategies. In: Festa, P.,
Sellmann, M., Vanschoren, J. (eds.) LION 2016. LNCS, vol. 10079, pp. 267–273. Springer,
Cham (2016). doi:10.1007/978-3-319-50349-3_22
45. Kvasov, D.E., Sergeyev, Y.D.: Deterministic approaches for solving practical black-box
global optimization problems. Adv. Eng. Softw. 80, 58–66 (2015)
46. Wang, Z., Zoghi, M., Hutter, F., Matheson, D., De Freitas, N.: Bayesian optimization in high
dimensions via random embeddings. In: AAAI Press/International Joint Conferences on
Artificial Intelligence (2013)
47. Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F.: Fast Bayesian Optimization of
Machine Learning Hyperparameters on Large Datasets. arXiv:1605.07079 (2017)
Solution of the Convergecast Scheduling
Problem on a Square Unit Grid When
the Transmission Range is 2
Adil Erzin
1 Introduction
In wireless sensor networks (WSNs), the data collected by the sensors should
be delivered to an analytical center. The process of transferring the packets
of information from the sensors to such a center, a base station (BS), is called
data aggregation. The aggregation time (or latency), i.e., the period during
which the data from all sensors reach the BS, is the most important criterion in
quick-response networks. The shorter the aggregation time, the more effectively
the WSN can react to possible events.
The synthesis of the network through which the data are transmitted is, as a rule,
carried out subject to the criterion of minimum communication power consump-
tion [1-3]. As a result, the constructed communication graph (CG) is highly sparse.
Consequently, not all vertices (sensors) can transmit the collected data directly
to the BS. Packets from most vertices travel through other vertices,
and the path from some vertex to the BS may consist of a large number of edges.
In formulations of the aggregation problem, the volume of the transmitted
data is usually not taken into account. So the packets are considered to be of
equal length for all vertices of the CG, and each packet is transmitted along
any edge during one time round (slot).
In the majority of wireless networks, an element (vertex or node) cannot
transmit and receive packets at the same time (half-duplex mode), and a vertex
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 50–63, 2017.
https://doi.org/10.1007/978-3-319-69404-7_4
cannot receive more than one packet simultaneously. Moreover, due to the need for
energy saving, each sensor sends its packet once during the aggregation period
(session). This means that the packets are transmitted along the edges of a
desired aggregation tree (AT) rooted at the BS, and an arbitrary vertex in the AT
must first receive the packets from all its children, and only then can it send the
aggregated packet to its parent node. So, we assume that the arcs of the AT are
oriented towards the root (BS), and if the AT is known, then the partial order
on the set of arcs of the AT is known too.
In most WSNs, the sensors use a common radio frequency to transmit
the messages. So, if more than one transmitter is working in a sensor's
reception area, then (due to the radio wave interference phenomenon) the receiver
cannot get any correct data packet. Such a situation is called a conflict or a
collision.
In the conflict-free data aggregation problem it is necessary to find an aggre-
gation tree and a conflict-free schedule of minimal length [4-6]. This problem is
known as the Convergecast Scheduling Problem (CSP), and it is NP-hard even in
the case when the AT is given [7].
If the AT is known, it is possible to construct a graph of conflicts (GC) as
follows. Each node in the GC is associated with an arc in the AT. Two vertices in
the GC are linked by an edge if the simultaneous transmission along the respective
arcs in the AT implies a conflict. There is an arc going from node i to
node j in the GC if the end of arc i coincides with the beginning of arc
j in the AT.
It is obvious that the CSP in the case of a given AT reduces to the problem
of mixed coloring of the GC [8], which is also NP-hard and is stated as follows. Let
a mixed graph G = (V, A ∪ E) with vertex set V, arc set A, and edge set
E be given. Graph G is k-colorable if there exists a function f : V → {1, . . . , k}
such that if two vertices i and j are joined by an edge (i, j) ∈ E, then f(i) ≠ f(j),
and if there is an arc (i, j) ∈ A, then f(i) < f(j). In the mixed graph coloring
problem it is required to find the minimal k for which such a k-coloring exists.
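The mixed coloring problem just stated can be made concrete with a small brute-force sketch (exponential time, suitable only for tiny instances; the function names and the example graph are ours, not part of the paper):

```python
from itertools import product

def is_valid(f, arcs, edges):
    """Check the mixed-coloring constraints for a coloring f (vertex -> color)."""
    return (all(f[i] != f[j] for i, j in edges) and
            all(f[i] < f[j] for i, j in arcs))

def mixed_chromatic_number(vertices, arcs, edges):
    """Smallest k admitting f : V -> {1..k} with f(i) != f(j) on edges and
    f(i) < f(j) on arcs; None if no k works (e.g. A contains a directed cycle)."""
    n = len(vertices)
    for k in range(1, n + 1):
        for colors in product(range(1, k + 1), repeat=n):
            f = dict(zip(vertices, colors))
            if is_valid(f, arcs, edges):
                return k
    return None

# Arc 1 -> 2 forces f(1) < f(2); edge {2, 3} forces f(2) != f(3).
print(mixed_chromatic_number([1, 2, 3], arcs=[(1, 2)], edges=[(2, 3)]))  # 2
```

On the example, two colors suffice (e.g. f = (1, 2, 1)), while a directed cycle in A would make the instance infeasible and the function would return None.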
The problem of conflict-free data aggregation has been intensively studied
by both theoreticians and practitioners [1-3, 6-8, 10, 11]. A number of heuristic
algorithms have been proposed to construct approximate solutions [4-6, 9-12]. For
some of them, guaranteed accuracy bounds in terms of the degree and radius
of the CG were found [13]. To assess the quality of the other heuristics,
numerical experiments were carried out [4, 9].
The literature also addresses special cases of the problem, for example,
when the conflicts occur only between the children of a common parent in the AT
[9]. Such a situation occurs when the sensors use different radio frequencies for data
transmission [12]. In this case, if the AT is given, the problem can be solved in poly-
nomial time, but when the AT is a desired tree the problem remains NP-hard [14].
In [15] a special case of the CG in the form of a unit square grid, in which
a sensor is located at each node and the transmission range of each sensor is 1, is
considered. A simple polynomial algorithm for constructing an optimal solution
to this problem was proposed. In [7] a similar problem in the case when the
Fig. 1. (a) CG and AT (bold lines); (b) Feasible schedule of length 6. (Color figure
online)
Fig. 2. (a) CG and AT; (b) Graph of Conflicts (GC); (c) Feasible coloring. (Color
figure online)
for example, when the CG is the unit square grid and the transmission distance
is 1 [15]. In this paper, we are interested in the problem when the CG is also a
unit square grid, but the transmission distance is 2 (in the L1 metric). In [7], for the
(n + 1) × (m + 1) grid with the BS at the origin (0, 0) and a transmission
distance equal to 2, an algorithm for constructing a schedule whose length
(when n and m are even) equals (n + m)/2 + 3 is proposed.
Fig. 3. (a) Grid graph; (b) Conflict (infeasible) transmissions; (c) Conflict-free (feasi-
ble) transmissions. (Color figure online)
The restrictions from the previous section naturally have to be met as well. For
example, in Fig. 3b the unacceptable transmissions are displayed (the transmis-
sions shown by arrows of the same color cannot be performed simultaneously),
and in Fig. 3c the conflict-free transmissions are indicated (the arrows of the
same color), which can be performed simultaneously.
Definition 1. The distance from a vertex to the BS is the minimum number
of sequential transmissions needed for a packet from this vertex to reach the BS.
Property 1. If at least two vertices in an arbitrary graph are at distance R
from the BS, then the aggregation time cannot be less than R + 1.
Since in the considered grid three vertices are at distance D from the BS,
the following is obvious.
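This count can be verified computationally. The sketch below (our own illustration, not part of the paper) builds the (n + 1) × (m + 1) grid with edges between nodes at L1 distance at most 2, runs a breadth-first search from the BS at the origin, and counts the vertices at the maximum distance D; for even n and m there are exactly three of them, namely (n, m), (n − 1, m) and (n, m − 1), and D = (n + m)/2.

```python
from collections import deque

def distances(n, m, rng=2):
    """BFS distances from the BS (0, 0) in the (n+1) x (m+1) grid,
    where one transmission may cover an L1 distance up to rng."""
    dist = {(0, 0): 0}
    q = deque([(0, 0)])
    while q:
        i, j = q.popleft()
        for di in range(-rng, rng + 1):
            for dj in range(-rng, rng + 1):
                if 0 < abs(di) + abs(dj) <= rng:
                    v = (i + di, j + dj)
                    if 0 <= v[0] <= n and 0 <= v[1] <= m and v not in dist:
                        dist[v] = dist[(i, j)] + 1
                        q.append(v)
    return dist

n, m = 6, 4                        # even side lengths
dist = distances(n, m)
D = max(dist.values())
far = [v for v, d in dist.items() if d == D]
print(D, len(far))                 # prints: 5 3
```

With Property 1, the three farthest vertices are exactly what drives the lower bound argument of the proof that follows.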
Proof. We first consider the different (up to symmetry) transmissions from the
vertex (n, m) at moment (time round) 1 (Fig. 4). Let us encode each
possible action (transmission) as (t = a; b), where a is a time round and b
is the number/code of the allowable transmission. For example, when (t = 1; 0),
the vertex (n, m) in Fig. 4 is silent (does not send a packet), and if (t = 1; 1), it
transfers the packet to another blue vertex which is also located at distance
D from the BS (the origin).
In the figures, a node with a red circle cannot transmit a packet during
the moments 1, 2, . . . , t, but a vertex with a green circle can transmit at time
round t. For example, in the case (t = 1; 1) the receiver may hear 6 extra vertices
besides the sender. Thus, these vertices (5 yellow and one blue) must be silent,
and in Fig. 4 they have red circles.
Fig. 4. Possible transmissions from the vertex (n, m) at moment 1. (Color figure
online)
While proving the statement, we considered all possible cases (more
than 100,000), but, of course, they cannot all be presented in a single paper, so
here we describe only a few of them as examples. Thus, we present not a
complete proof, but only an illustration of it. The analysis of all cases is planned
to be placed on the Internet in the foreseeable future.
Case (t = 1; 0). Let us consider the possible actions of the vertex (n − 1, m) at
time round 1 (Fig. 5), and then consider the case (t = 1; 0.0.0) in detail, when
all blue nodes (at distance D) are silent at moment 1 (Fig. 6).
The possible actions of the vertex (n, m) at time round 2 are shown in
Fig. 7. Consider the case (t = 2; 0), when the node (n, m) is silent, in
detail, and consider the behavior
Fig. 5. Possible transmissions from the node (n − 1, m) at moment 1, when vertex
(n, m) is silent.
Fig. 6. Possible transmissions from the node (n, m − 1) at moment 1, when vertices
(n, m) and (n − 1, m) are silent. (Color figure online)
Fig. 7. Possible transmissions from the node (n, m) at moment 2, when all blue
nodes were silent at moment 1. (Color figure online)
of the vertex (n − 1, m) at time round 2. If it is silent, then after two time
rounds there remain at least two vertices at distance D which have not started
transmitting. Therefore, by Property 1, the length of the schedule cannot be less
than 2 + D + 1 = D + 3 (Fig. 8).
Fig. 8. Possible transmissions from the node (n − 1, m) at moment 2, when all blue
nodes were silent at moment 1 and vertex (n, m) is silent at moment 2. (Color figure
online)
Let us consider the case (t = 2; 0.1). Since the vertex (n, m) is silent, then,
according to Property 1, both vertices (n − 1, m) and (n, m − 1) must transmit.
The transmission cases are shown in Fig. 9.
Fig. 9. Possible transmissions from the node (n, m − 1) at moment 2 in the case (t =
2; 0.1).
Fig. 10. Possible transmissions from the node (n, m) at moment 3 in the case (t =
2; 0.1.1).
Fig. 13. Possible transmissions from the node (n, m − 1) in the case (t = 2; 1).
Fig. 14. Possible transmissions from the node (n − 1, m − 1) at moment 3 in the
case (t = 2; 1.1).
The author has considered all possible transmissions from the vertices at a
distance of at least D − 2 from the BS (the origin), and it is shown that the
length of the schedule cannot be less than D + 3, which proves the lemma.
Fig. 15. Cases (t = 1; 3), (t = 2; 1.1) and (t = 3; 0). (Color figure online)
Fig. 16. Possible transmissions from the node (n − 1, m − 1) at moment 4. (Color
figure online)
3.2 Algorithm A
In this section, we present a version of the pseudo-code of the algorithm A
proposed in [7]. For convenience, we call the set of vertices with identical
ordinates a layer.
Algorithm A.
Step 1. Set t = 1.
Send from all even vertices (0, m − 1), (2, m − 1), . . . , (n, m − 1) at the layer
m − 1 the packets up a distance of 1 to the corresponding vertices at the layer
m.
Send from all vertices at the layer m − 3 the packets down a distance of
2, i.e., to the corresponding nodes at the layer m − 5.
Send from all even vertices (0, 1), (2, 1), . . . , (n, 1) at the layer 1 the packets
up a distance of 1 to the corresponding vertices at the layer 2.
Step 2. Set t = 2.
Send from all odd vertices (1, m − 1), (3, m − 1), . . . , (n − 1, m − 1) at the
layer m − 1 the packets up a distance of 1 to the corresponding vertices at
the layer m.
Send from all vertices at the layer m − 5 the packets down a distance of
2, i.e., to the corresponding vertices at the layer m − 7.
Send from all odd vertices (1, 1), (3, 1), . . . , (n − 1, 1) at the layer 1 the
packets up a distance of 1 to the corresponding nodes at the layer 2.
Step 3. Set t = t + 1 and k = m − 2(t − 3).
Send from all vertices at the layer k the packets down a distance of 2 to the
corresponding vertices at the layer k − 2.
Send from all vertices at the layer k − 2t − 1 the packets down a distance
of 2 to the corresponding nodes at the layer k − 2t − 3.
If k − 2t − 1 > 3, then go to Step 3.
Note that the vertical aggregation is carried out with time complexity O(m),
and the horizontal aggregation with time complexity O(n). There-
fore, the complexity of algorithm A is O(n + m).
Algorithm A returns a schedule whose length is D + 3. From Lemma 1 we
know that the aggregation time cannot be less than D + 3. Hence, the following
theorem holds.
4 Conclusion
In this paper we found an exact lower bound for the length of the conflict-free
schedule of data aggregation in the unit square grid (a unit disk graph in the L1 metric)
when the transmission range of each vertex is 2. This lower bound coincides with
the length of the schedule constructed by algorithm A in polynomial time.
Consequently, the polynomial-time algorithm A constructs an optimal schedule, the
length of which is L(n, m) = (n + m)/2 + 3.
References
1. Erzin, A., Plotnikov, R.: Using VNS for the optimal synthesis of the communication
tree in wireless sensor networks. Electron. Notes Discrete Math. 47, 21–28 (2015)
2. Erzin, A., Plotnikov, R., Mladenovic, N.: Variable neighborhood search variants
for min-power symmetric connectivity problem. Comput. Oper. Res. 78, 557–563
(2017)
3. Plotnikov, R., Erzin, A., Mladenovic, N.: Variable neighborhood search-based
heuristics for min-power symmetric connectivity problem in wireless networks.
In: Kochetov, Y., Khachay, M., Beresnev, V., Nurminski, E., Pardalos, P. (eds.)
DOOR 2016. LNCS, vol. 9869, pp. 220–232. Springer, Cham (2016). doi:10.1007/
978-3-319-44914-2_18
4. De Souza, E., Nikolaidis, I.: An exploration of aggregation convergecast scheduling.
Ad Hoc Netw. 11, 2391–2407 (2013)
5. Malhotra, B., Nikolaidis, I., Nascimento, M.A.: Aggregation convergecast schedul-
ing in wireless sensor networks. Wirel. Netw. 17, 319–335 (2011)
6. Cheng, C.-T., Tse, C.K., Lau, F.C.M.: A delay-aware data collection network struc-
ture for wireless sensor networks. IEEE Sens. J. 11(3), 699–710 (2011)
7. Erzin, A., Pyatkin, A.: Convergecast scheduling problem in case of given aggre-
gation tree. The complexity status and some special cases. In: 10th International
Symposium on Communication Systems, Networks and Digital Signal Processing,
Article 16, 6 p. IEEE-Xplore, Prague (2016)
8. Hansen, P., Kuplinsky, J., De Werra, D.: Mixed graph colorings. Math. Methods
Oper. Res. 45, 145–160 (1997)
9. Incel, O.D., Ghosh, A., Krishnamachari, B., Chintalapudi, K.: Fast data collection
in tree-based wireless sensor networks. IEEE Trans. Mob. Comput. 11(1), 86–99
(2012)
10. Wang, P., He, Y., Huang, L.: Near optimal scheduling of data aggregation in wire-
less sensor networks. Ad Hoc Netw. 11, 1287–1296 (2013)
11. Li, H., Hua, Q.-S., Wu, C., Lau, F.C.M.: Minimum-Latency Aggregation Schedul-
ing in Wireless Sensor Networks under Physical Interference Model. HKU CS Tech-
nical Report TR-2010-07 (2010)
12. Ghods, F., Yousefi, H., Mohammad, A., Hemmatyar, A., Movaghar, A.: MC-MLAS:
multi-channel minimum latency aggregation scheduling in wireless sensor networks.
Comput. Netw. 57, 3812–3825 (2013)
13. Xu, X., Li, X.-Y., Mao, X., Tang, S., Wang, S.: A delay-efficient algorithm for data
aggregation in multihop wireless sensor networks. IEEE Trans. Parallel Distrib.
Syst. 22, 163–175 (2011)
14. Slater, P.J., Cockayne, E.J., Hedetniemi, S.T.: Information dissemination in trees.
SIAM J. Comput. 10(4), 692–701 (1981)
15. Gagnon, J., Narayanan, L.: Minimum latency aggregation scheduling in wireless
sensor networks. In: Gao, J., Efrat, A., Fekete, S.P., Zhang, Y. (eds.) ALGOSEN-
SORS 2014. LNCS, vol. 8847, pp. 152–168. Springer, Heidelberg (2015). doi:10.
1007/978-3-662-46018-4_10
A GRASP for the Minimum Cost SAT Problem
1 Introduction
Propositional Satisfiability (SAT) and its derivations are well-known problems
in logic and optimization, and belong to the special class of NP-complete prob-
lems [11]. Besides playing a special role in the theory of complexity, they often
arise in applications, where they are used to model complex problems whose
solution is of particular interest.
One such case surfaces in logic supervised learning. Here, we have a dataset of
samples, each represented by a finite number of logic variables, and a particular
extension of the classic SAT problem, the Minimum Cost Satisfiability Problem
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 64–78, 2017.
https://doi.org/10.1007/978-3-319-69404-7_5
1 Algorithm GRASP()
2   x* ← Nil;
3   z(x*) ← +∞;
4   while a stopping criterion is not satisfied do
5     Build a greedy randomized solution x;
6     x ← LocalSearch(x);
7     if z(x) < z(x*) then
8       x* ← x;
9       z(x*) ← z(x);
10  return x*
clause can be covered only by a single literal x due to the choices made in
previous iterations, then x is selected to cover the clause. Otherwise, if there
are no clauses covered by only a single literal, the addition of literals to the
solution takes place according to a penalty function penalty(), which greedily
sorts all the candidate literals, as described below.
Let c(x) be the cost of the literal x and cr(x) be the number of clauses yet to be
covered that contain x. We then compute:

penalty(x) = (c(x) + cr(x̄)) / cr(x).   (1)
This penalty function evaluates both the benefits and disadvantages that
can result from the choice of one literal rather than another. The benefits are
proportional to the number of uncovered clauses that the chosen literal could
cover, while the disadvantages are related to both the cost of the literal and
the number of uncovered clauses that could be covered by its complement x̄. The
smaller the penalty penalty(x), the more favorable the literal x. According to the
GRASP scheme, the selection of the literal to add is not purely greedy: a
Restricted Candidate List (RCL) is created with the most promising elements,
and an element is randomly selected among them. Concerning the tuning of the
parameter α, whose task is to adjust the greediness of the construction phase,
we performed an extensive analysis over a set of ten different random seeds.
Such testing showed that a nearly totally greedy setup (α = 0.1) allowed the
algorithm to attain better-quality solutions in smaller running times.
Let |C| = m be the number of clauses. Since |X| = 2n, in the worst-case
scenario the while loop (line 3) of the construct-solution function
pseudo-coded in Fig. 2 runs m times, and in each run the most expensive opera-
tion is the construction of the RCL. Therefore, the total computational
complexity is O(m · n).
In the local search phase, the algorithm uses a 1-exchange (flip) neighborhood
function, where two solutions are neighbors if and only if they differ in at most one
component. Therefore, if there exists a better solution x′ that differs in only one
literal from the current solution x, the current solution is set to x′ and the proce-
dure restarts. If such a solution does not exist, the procedure ends and returns the
68 G. Felici et al.
1 Function construct-solution(C, X, α)
  /* C is the set of uncovered clauses */
  /* X is the set of candidate literals */
2   s ← ∅;
3   while C ≠ ∅ do
4     if a clause c ∈ C can be covered only by a single x ∈ X then
5       s ← s ∪ {x};
6       X ← X \ {x, x̄};
7       C ← C \ {c | x ∈ c};
8     else
9       compute penalty(x) ∀ x ∈ X;
10      th ← min_{x∈X} {penalty(x)} + α · (max_{x∈X} {penalty(x)} − min_{x∈X} {penalty(x)});
11      RCL ← { x ∈ X : penalty(x) ≤ th };
12      x ← rand(RCL);
13      s ← s ∪ {x};
14      X ← X \ {x, x̄};
15      C ← C \ {c | x ∈ c};
16  return s
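For concreteness, here is a runnable transcription of the construction phase in Python. It is a sketch under our own data representation (clauses as sets of signed integers, with −x denoting the complement of x, and a hypothetical per-variable cost map), not the authors' C++ implementation:

```python
import random

def construct_solution(clauses, cost, alpha=0.1, rng=random):
    """Greedy randomized construction: cover all clauses, choosing literals
    from an RCL built on penalty(x) = (c(x) + cr(-x)) / cr(x).
    Assumes every clause stays coverable (no infeasibility handling here)."""
    C = [set(cl) for cl in clauses]           # uncovered clauses
    X = {lit for cl in C for lit in cl}       # candidate literals
    s = set()
    while C:
        forced = [cl for cl in C if len(cl & X) == 1]
        if forced:                            # a clause with a single cover left
            x = next(iter(forced[0] & X))
        else:
            def cr(lit):                      # uncovered clauses containing lit
                return sum(1 for cl in C if lit in cl)
            cand = [lit for lit in X if cr(lit) > 0]
            pen = {lit: (cost.get(abs(lit), 1) + cr(-lit)) / cr(lit)
                   for lit in cand}
            lo, hi = min(pen.values()), max(pen.values())
            th = lo + alpha * (hi - lo)       # line 10 of the pseudo-code
            rcl = [lit for lit in cand if pen[lit] <= th]
            x = rng.choice(rcl)
        s.add(x)
        X -= {x, -x}
        C = [cl for cl in C if x not in cl]
    return s

random.seed(1)
sol = construct_solution([{1, 2}, {-1, 3}, {2, 3}], cost={1: 5, 2: 1, 3: 1})
print(sol)  # a cover avoiding the expensive variable 1; here {2, 3}
```

With α = 0.1 the RCL contains only literals whose penalty lies within 10% of the min-max range, i.e., a nearly greedy choice, matching the tuning reported above.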
current solution. The local search procedure also re-establishes feasibility
if the current solution does not cover all clauses. During our experimen-
tation we tested the one-flip local search using two different neighborhood explo-
ration strategies: first improvement and best improvement. With the former strat-
egy, the current solution is replaced by the first improving solution found in its
neighborhood; such an improving solution is then used as a starting point for the next
local exploration. On the other hand, with the best improvement strategy, the cur-
rent solution x is replaced with the solution x′ ∈ N(x) corresponding to the greatest
improvement in terms of objective function value; x′ is then used as a starting point
for the next local exploration. Our results showed that the first improvement strat-
egy is slightly faster, as expected, while attaining solutions of the same quality as
those given by the best improvement strategy. On this basis, we selected
first improvement as the exploration strategy in our testing phase.
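The two strategies differ only in whether the scan of the neighborhood stops at the first improving flip. A compact sketch (our own illustration on a generic 0-1 objective, not the paper's exact routine):

```python
def one_flip_local_search(x, z, first_improvement=True):
    """Descend over the 1-exchange (flip) neighborhood of bit vector x,
    minimizing objective z, until no flip improves."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        best_i, best_val = None, z(x)
        for i in range(len(x)):
            x[i] ^= 1                     # tentatively flip one component
            val = z(x)
            x[i] ^= 1                     # undo the flip
            if val < best_val:
                best_i, best_val = i, val
                if first_improvement:
                    break                 # accept the first improving move
        if best_i is not None:
            x[best_i] ^= 1                # commit the chosen move
            improved = True
    return x

# Hypothetical objective: total weight of the components set to 1,
# so the optimum is the all-zeros vector.
w = [3, 1, 4, 1, 5]
def z(x):
    return sum(wi * xi for wi, xi in zip(w, x))

print(one_flip_local_search([1, 0, 1, 1, 0], z))  # [0, 0, 0, 0, 0]
```

Setting first_improvement=False turns the same loop into best improvement: the full neighborhood is scanned and the move with the greatest decrease is committed.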
The first step to be performed in order to properly represent the random vari-
able X with a theoretical distribution consists in an empirical observation of
the algorithm. Examining the objective function values obtained at the end
of each iteration, and counting the respective frequencies, it is possible to
select a promising parametric family of distributions. Afterwards, by means of
a Maximum Likelihood Estimation (MLE), see for example [22], a choice is made
regarding the parameters characterizing the best-fitting distribution of the cho-
sen family.
In order to carry out the empirical analysis of the objective function value
obtained in a generic iteration of GRASP, which results in a first guess
concerning the parametric family of distributions, we represent the data obtained
in the following way.
Let I be a fixed instance and F the set of solutions obtained by the algo-
rithm up to the current iteration, and let Z be the multiset of the objective
function values associated to F. Since we are dealing with a minimization
problem, it is harder to find good-quality solutions, whose cost is small in
terms of objective function, than expensive ones. This means that during
the analysis of the values in Z we expect to find a higher concentration of
elements between the mean value and max(Z). In order to represent the
values in Z with a positive distribution function that presents higher frequencies
in a right neighborhood of zero and a single tail which decays for growing values
of the random variable, we perform a reflection of the data in Z by means of the
following transformation:

z′ = max(Z) − z, ∀ z ∈ Z. (2)
The distribution of z′ in our instances then has a very recog-
nizable behaviour. A representative of such a distribution is given in Fig. 3, where
the histograms of absolute and relative frequencies of z′ are plotted. It is easy
to observe that the gamma distribution family represents a reasonable educated
guess for our random variable.
Once we have chosen the gamma distribution family, we estimate its parame-
ters by performing an MLE. In order to accomplish the estimation, we collect an initial
sample of solution values and execute on-line a function, developed in R (whose
pseudo-code is reported in Fig. 4), which carries out the MLE and returns the
characteristic shape and scale parameters, k and θ, which pinpoint the specific
distribution of the gamma family that best fits the data.
1 Function fitting-data(Z)
  /* Z is the initial sample of the objective function values */
2   foreach z ∈ Z do
3     z ← max(Z) − z;
4   {k, θ} ← MLE(Z, gamma);
5   return {k, θ}
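Outside of R, the same fitting step can be sketched with the Python standard library. We use a method-of-moments estimate (k = mean²/variance, θ = variance/mean) as a simple stand-in for the MLE performed by the authors' R routine; the synthetic sample and the parameter values are our own assumptions:

```python
import random
from statistics import mean, variance

def fit_gamma_moments(sample):
    """Method-of-moments estimates of the gamma shape k and scale theta;
    on real objective values, apply the reflection of Eq. (2) first."""
    m, v = mean(sample), variance(sample)
    return m * m / v, v / m

random.seed(7)
# Synthetic sample drawn from Gamma(shape=2, scale=3) for illustration.
Z = [random.gammavariate(2.0, 3.0) for _ in range(5000)]
k, theta = fit_gamma_moments(Z)
print(k, theta)  # moment estimates; the true parameters are (2, 3)
```

Moment matching is less statistically efficient than MLE but needs no external packages, which makes it convenient for a quick sanity check of the fitted family.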
(a) let q be a user-defined positive integer, and let Z be the sample of initial
solution values obtained by the GRASP in the first q iterations;
(b) the fitting-data procedure, whose input is Z, is called one-off to esti-
mate the shape and scale parameters, k and θ, of the best-fitting gamma distri-
bution;
(c) every time the incumbent is improved, the improve-probability proce-
dure (pseudo-code in Fig. 5) is performed and the probability p of further
improvements is computed. If p is less than or equal to a given threshold, the
stopping criterion is satisfied. For the purpose of determining p, we have used
the function pgamma of the R package stats.
1 Function improve-probability(k, θ, z*)
  /* z* is the value of the incumbent */
2   p ← pgamma(z*, shape = k, scale = θ);
3   return p
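Outside of R, the quantity pgamma computes is the regularized lower incomplete gamma function, p = P(X ≤ z*) for X ~ Gamma(k, θ). The following self-contained series implementation is our own sketch, adequate for the moderate arguments involved here; the paper itself simply calls pgamma from the R package stats:

```python
import math

def pgamma(x, shape, scale=1.0):
    """P(X <= x) for X ~ Gamma(shape, scale), via the standard series
    gamma(s, x)/Gamma(s) = x^s e^{-x} * sum_{n>=0} x^n / (s(s+1)...(s+n))."""
    if x <= 0:
        return 0.0
    x = x / scale
    # n = 0 term of the series: x^s e^{-x} / Gamma(s + 1)
    term = math.exp(shape * math.log(x) - x - math.lgamma(shape + 1))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (shape + n)   # ratio of consecutive series terms
        total += term
    return min(total, 1.0)

# For shape 1 the gamma distribution is exponential, giving a closed-form check.
print(round(pgamma(1.0, 1.0, 1.0), 5))  # 0.63212, i.e. 1 - e^{-1}
```

Such closed-form cases (exponential for shape 1, Erlang for integer shapes) are handy for validating the series before using it on the fitted (k, θ).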
5 Results
Our GRASP has been implemented in C++ and compiled with gcc 5.4.0 with
the flag -std=c++14. All tests were run on a cluster of nodes, connected by
10-Gigabit InfiniBand technology, each of them with two Intel Xeon
E5-4610v2 @ 2.30 GHz processors.
We performed two different kinds of experimental tests. In the first one, we
compared the algorithm with different solvers proposed in the literature, without
the use of the probabilistic stop. In particular, we used: the Z3 solver, freely available
from Microsoft Research [19]; the bsolo solver, kindly provided by its authors [12];
MiniSat+ [5], available at the web page http://minisat.se/; and PWBO, available
at the web page http://sat.inesc-id.pt/pwbo/index.html. The aim of this first set
of computational experiments is the evaluation of the quality of the solutions
obtained by our algorithm within a certain time limit. More specifically, the
stopping criterion for GRASP, bsolo and PWBO is a time limit of 3 h; for Z3
and MiniSat+ it is the reaching of an optimal solution.
Z3 is a satisfiability modulo theories (SMT) solver from Microsoft Research
that generalizes boolean satisfiability by adding equality reasoning, arithmetic,
fixed-size bit-vectors, arrays, quantifiers, and other useful first-order theories.
Z3 integrates a modern backtracking-based search algorithm for solving the CNF-
SAT problem, namely the DPLL algorithm; in addition it provides standard search
pruning methods, such as two-watched literals, lemma learning using conflict
clauses, and phase caching for guiding case splits, and performs non-chronological
backtracking.
bsolo [12,13] is an algorithmic scheme resulting from the integration of sev-
eral features of SAT algorithms in a branch-and-bound procedure to solve the
binate covering problem. It incorporates the most important characteristics of
branch-and-bound and SAT algorithms: bounding and reduction techniques from
the former, and search pruning techniques from the latter. In particular, it incor-
porates the search pruning techniques of the Generic seaRch Algorithm for SAT
proposed in [14].
MiniSat+ [5,24] is a minimalistic implementation of a Chaff-like SAT solver
based on the two-literal watch scheme for fast boolean constraint propagation
[18] and conflict-clause-driven learning [14]. In fact, the MiniSat solver provides
a mechanism which allows minimizing the conflict clauses.
PWBO [15-17] is a Parallel Weighted Boolean Optimization solver. The
algorithm uses two threads in order to simultaneously estimate a lower and an
upper bound, by means of an unsatisfiability-based procedure and a linear search,
respectively. Moreover, learned clauses are shared between threads during the
search.
For testing, we have initially considered the datasets used to test feature
selection methods in [2], where an extensive description of the generation pro-
cedure can be found. Such a testbed is composed of 4 types of problems (A, B, C, D),
for each of which 10 random repetitions have been generated. Problems of type
A and B are of moderate size (100 positive examples, 100 negative examples, 100
logic features), but differ in the form of the formula used to classify the samples
into the positive and negative classes (the formula being more complex for B
than for A). Problems of type C and D are much larger (200 positive exam-
ples, 200 negative examples, 2500 logic features), and D has a more complex
generating logic formula than C.
Table 1 reports both the value of the solutions and the time needed to achieve
them (in the case of GRASP, averaged over ten runs).1 For problems of
moderate size (A and B), the results show that GRASP finds an optimal solution
whenever one of the exact solvers converges. Moreover, GRASP is very fast in
finding the optimal solution, although here it runs for the full allotted time before
stopping the search. For larger instances (C and D), GRASP always provides a
solution within the bounds, while two of the other tested solvers fail to do so,
and the two that are successful (bsolo, PWBO) always obtain values of inferior
quality.
The second set of experimental tests was performed to evaluate the impact
of the probabilistic stopping rule. To do so, we chose five different values for the
threshold, two distinct sizes for the set Z of initial solutions, and executed GRASP
using ten different random seeds, imposing a maximum number of iterations as
the stopping criterion. This experimental setup yielded, for each instance and for
each threshold value, 20 executions of the algorithm. For these runs, the data
collected were: the number of executions in which the probabilistic stopping rule
was triggered (stops), the mean value of the objective function of the best solution
found (z), and the average computational time needed (t). To carry out the
evaluation of the stopping rule, we also executed the algorithm using only the
maximum number of iterations as the stopping criterion, for each instance and
for each random seed. For this second setup, the data collected are, as for the
first one, the objective function value of the best solution found (z) and the
average computational time needed (t). For the sake of comparison, we considered
the percentage gaps between the results collected with and without the probabilistic
stopping rule. The second set of experimental tests is summarized in Table 2 and
in Fig. 7. For each pair of columns (3,4), (6,7), (9,10), (12,13), the table reports the
percentage of loss in terms of objective function value and the percentage of gain
in terms of computation time obtained using the probabilistic stopping criterion,
respectively. The analysis of the gaps shows how the probabilistic stop yields little
or no change in the objective function value while bringing dramatic improvements
in the total computational time.
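The rule described above can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: in the spirit of Ribeiro et al. [21], it estimates the probability that the next constructed solution improves the incumbent, here under an assumed normal approximation of the distribution of solution values; all names (`grasp_with_probabilistic_stop`, `norm_cdf`) and parameter defaults are hypothetical.

```python
import math
import random

def norm_cdf(x, mu, sigma):
    # Standard normal CDF evaluated via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def grasp_with_probabilistic_stop(construct, local_search, cost,
                                  threshold=1e-3, min_samples=50,
                                  max_iterations=10000, seed=0):
    """GRASP loop that stops early once the estimated probability of
    improving the incumbent drops below `threshold` (sketch of a
    probabilistic stopping rule in the style of Ribeiro et al.)."""
    random.seed(seed)
    values, best, best_val = [], None, float("inf")
    for it in range(max_iterations):
        sol = local_search(construct())
        val = cost(sol)
        values.append(val)
        if val < best_val:
            best, best_val = sol, val
        if len(values) >= min_samples:
            mu = sum(values) / len(values)
            var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
            sigma = math.sqrt(var) or 1e-12
            # Estimated probability that the next solution beats the incumbent.
            if norm_cdf(best_val, mu, sigma) < threshold:
                break
    return best, best_val, it + 1
```

On a toy minimization problem where each constructed solution is an independent random value, the loop typically stops long before the iteration limit once the incumbent falls into the lower tail of the fitted distribution.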
1 For missing values, the algorithm was not able to find the optimal solution in 24 h.
A GRASP for the Minimum Cost SAT Problem 73
Table 2. Percentage gaps in objective function value (%-gap z) and in computation time (%-gap t(s)) obtained with the probabilistic stopping rule, for each threshold value and instance.

threshold  inst %-gap z %-gap t(s)  inst %-gap z %-gap t(s)  inst %-gap z %-gap t(s)  inst %-gap z %-gap t(s)
5e-02  A1 -0.0 83.1   B1 -2.1 87.1  C1 -6.6 76.0  D1 -5.0 79.3
1e-02  A1 -0.0 83.1   B1 -2.1 87.1  C1 -6.6 76.1  D1 -5.0 79.3
5e-03  A1 -0.0 83.0   B1 -2.1 87.1  C1 -5.0 74.8  D1 -4.9 78.7
1e-03  A1 -0.0 2.5    B1 -2.1 87.1  C1 -3.8 70.7  D1 -1.7 58.9
5e-04  A1 -0.0 -15.3  B1 -2.1 87.2  C1 -2.6 70.2  D1 -1.2 49.0
1e-04  A1 -0.0 -11.8  B1 -0.5 86.1  C1 -1.3 52.5  D1 -0.2 31.6
5e-02  A2 -0.0 84.0   B2 -0.7 87.0  C2 -3.5 76.0  D2 -0.1 79.1
1e-02  A2 -0.0 84.1   B2 -0.7 87.0  C2 -3.5 76.2  D2 -0.1 79.1
5e-03  A2 -0.0 83.6   B2 -0.7 86.9  C2 -3.5 76.7  D2 -0.1 79.1
1e-03  A2 -0.0 84.0   B2 -0.7 87.0  C2 -1.9 76.4  D2 -0.1 79.1
5e-04  A2 -0.0 84.9   B2 -0.7 87.0  C2 -1.9 76.1  D2 -0.1 75.7
1e-04  A2 -0.0 57.9   B2 -0.1 71.3  C2 -1.9 65.2  D2 -0.1 53.5
5e-02  A3 -0.0 83.4   B3 -2.7 87.0  C3 -2.7 76.3  D3 -1.8 75.2
1e-02  A3 -0.0 83.8   B3 -2.7 87.0  C3 -2.1 73.0  D3 -1.8 75.2
5e-03  A3 -0.0 82.9   B3 -2.7 87.0  C3 -1.7 68.0  D3 -1.7 74.8
1e-03  A3 -0.0 8.3    B3 -2.6 86.6  C3 -0.6 40.9  D3 -0.8 38.5
5e-04  A3 -0.0 -1.6   B3 -2.0 84.1  C3 -0.0 28.3  D3 -0.5 19.1
1e-04  A3 -0.0 -6.8   B3 -0.7 58.4  C3 -0.0 9.9   D3 -0.3 14.5
5e-02  A4 -0.0 86.4   B4 -2.3 86.9  C4 -4.3 78.8  D4 -2.2 75.0
1e-02  A4 -0.0 6.4    B4 -2.3 86.9  C4 -3.3 68.0  D4 -2.2 70.9
5e-03  A4 -0.0 3.5    B4 -2.3 86.9  C4 -2.2 63.9  D4 -2.2 66.8
1e-03  A4 -0.0 1.4    B4 -2.3 87.0  C4 -1.0 51.2  D4 -2.0 41.0
5e-04  A4 -0.0 5.6    B4 -2.3 86.9  C4 -0.8 48.6  D4 -1.2 29.1
1e-04  A4 -0.0 6.4    B4 -0.6 74.8  C4 -0.3 38.1  D4 -1.2 18.9
5e-02  A5 -0.0 87.6   B5 -0.7 86.6  C5 -2.6 79.7  D5 -5.6 75.2
1e-02  A5 -0.0 12.2   B5 -0.7 86.6  C5 -1.5 71.5  D5 -4.9 75.1
5e-03  A5 -0.0 12.5   B5 -0.7 86.6  C5 -0.4 68.1  D5 -4.9 75.2
1e-03  A5 -0.0 12.4   B5 -0.7 86.6  C5 -0.2 53.2  D5 -4.7 67.6
5e-04  A5 -0.0 12.3   B5 -0.6 86.3  C5 -0.0 46.8  D5 -3.8 60.0
1e-04  A5 -0.0 12.5   B5 -0.1 19.0  C5 -0.0 33.2  D5 -3.3 49.8
5e-02  A6 -0.9 87.2   B6 -0.8 86.6  C6 -3.3 79.9  D6 -7.9 76.0
1e-02  A6 -0.9 87.2   B6 -0.8 86.6  C6 -2.0 70.5  D6 -5.9 74.8
5e-03  A6 -0.9 87.2   B6 -0.8 86.6  C6 -1.3 65.4  D6 -5.0 74.0
1e-03  A6 -0.8 87.1   B6 -0.7 86.3  C6 -0.2 49.6  D6 -2.5 71.1
5e-04  A6 -0.5 86.8   B6 -0.1 72.1  C6 -0.2 39.9  D6 -2.5 71.2
1e-04  A6 -0.0 66.1   B6 -0.0 7.6   C6 -0.0 36.6  D6 -2.5 67.3
5e-02  A7 -0.0 87.5   B7 -3.1 86.2  C7 -3.8 74.4  D7 -6.5 75.5
1e-02  A7 -0.0 11.7   B7 -3.1 86.2  C7 -2.4 65.7  D7 -5.3 72.1
5e-03  A7 -0.0 11.7   B7 -3.1 86.2  C7 -1.9 60.7  D7 -4.0 68.0
1e-03  A7 -0.0 11.3   B7 -3.1 86.2  C7 -0.8 43.0  D7 -2.8 61.2
5e-04  A7 -0.0 11.5   B7 -3.0 86.0  C7 -0.0 36.4  D7 -2.2 60.6
1e-04  A7 -0.0 11.4   B7 -0.8 75.8  C7 -0.0 14.0  D7 -2.2 57.4
5e-02  A8 -0.0 88.1   B8 -1.5 86.7  C8 -3.6 73.9  D8 -11.5 76.2
1e-02  A8 -0.0 88.1   B8 -1.5 86.7  C8 -3.3 74.7  D8 -6.7 73.4
5e-03  A8 -0.0 88.1   B8 -1.5 86.7  C8 -3.3 74.4  D8 -6.7 73.4
1e-03  A8 -0.0 16.4   B8 -1.2 86.4  C8 -3.3 73.7  D8 -4.4 68.2
5e-04  A8 -0.0 16.6   B8 -0.8 74.5  C8 -3.2 65.6  D8 -3.4 67.9
1e-04  A8 -0.0 16.5   B8 -0.0 7.8   C8 -2.2 60.5  D8 -2.4 64.9
5e-02  A9 -0.0 88.0   B9 -1.9 85.9  C9 -4.1 75.3  D9 -2.1 75.2
1e-02  A9 -0.0 88.0   B9 -1.9 85.9  C9 -2.7 74.8  D9 -2.1 75.2
5e-03  A9 -0.0 88.0   B9 -1.9 85.9  C9 -1.1 74.4  D9 -2.1 75.2
1e-03  A9 -0.0 16.0   B9 -1.9 85.9  C9 -1.1 66.6  D9 -2.1 75.2
5e-04  A9 -0.0 16.0   B9 -1.7 84.9  C9 -0.2 56.5  D9 -2.1 67.7
1e-04  A9 -0.0 15.9   B9 -0.5 45.2  C9 -0.2 55.7  D9 -1.9 60.4
5e-02  A10 -0.0 83.3  B10 -0.3 87.7  C10 -0.4 76.3  D10 -7.1 73.7
1e-02  A10 -0.0 75.4  B10 -0.3 87.6  C10 -0.4 76.2  D10 -6.9 73.8
5e-03  A10 -0.0 0.5   B10 -0.3 87.7  C10 -0.3 67.9  D10 -6.4 73.1
1e-03  A10 -0.0 -5.4  B10 -0.3 87.6  C10 -0.3 48.0  D10 -4.5 62.0
5e-04  A10 -0.0 -4.8  B10 -0.0 87.4  C10 -0.3 48.0  D10 -4.3 57.3
1e-04  A10 -0.0 -4.7  B10 -0.0 35.7  C10 -0.2 27.0  D10 -3.1 38.6
[Fig. 7 consists of three panels plotted against the threshold values 0.05, 0.01, 0.005, 0.001, 5e-04, and 1e-04: the number of stops recorded (out of 20 runs), the %-gap in objective function value, and the %-gap in computation time.]
Fig. 7. Comparison of objective function values and computation times obtained with
and without the probabilistic stopping rule for different threshold values.
The gaps show that even with the highest threshold, the difference in solution
quality is extremely small, with a single worst case of 11.5% for instance D8
and a very promising average gap slightly below 2%. As expected, as the threshold
value decreases, the solutions obtained with and without the probabilistic stopping
rule align with each other, and the negative gaps accordingly shrink in magnitude
to approximately 1%. The third boxplot shows the gaps obtained in the computation
times. The analysis of these gaps is the key to realistically appraising the
actual benefit provided by the probabilistic stopping rule. Observing
the reported results, it is possible to note that even in the case of the smallest
threshold, i.e., using the strictest probabilistic stopping criterion, the recorded
stops yield an average time discount close to 40%.
A more direct display of these time gaps is obtained by considering
the total time discount in seconds: with the smallest threshold we experienced a
time discount of 4847.6 s over the 11595.9 total seconds needed for the execution
without the probabilistic stop. Analyzing in the same fashion the values obtained
under the largest threshold, we observed an excellent average discount of just over
80%, which, quantified in seconds, amounts to an astonishing total discount of
8919.64 s over the 11595.9 total seconds registered for the execution without the
probabilistic stop.
6 Conclusions
In this paper, we have investigated a strategy for a GRASP heuristic that solves
large-sized MinCostSAT instances. The method adopts a straightforward implementation
of the main ingredients of the heuristic, but proposes a new probabilistic stopping
rule. Experimental results show that, for instances belonging to a particular class
of MinCostSAT problems, the method performs very well and the new stopping
rule provides a very effective way to reduce the number of iterations of the
algorithm without any significant decay in the value of the objective
function.
The work presented has to be considered preliminary, but it clearly indicates
several research directions that we intend to pursue: the refinement of the
dynamic estimate of the probability distribution of the solutions found by the
algorithm, comparative testing on instances of larger size, and the extension
to other classes of problems. Last, but not least, attention will be directed toward
the incorporation of the proposed heuristic into methods that are specifically
designed to extract logic formulas from data, and toward testing the performance
of the proposed algorithm in this setting.
Acknowledgements. This work has been realized thanks to the use of the S.Co.P.E.
computing infrastructure at the University of Napoli FEDERICO II.
References
1. Arisi, I., D'Onofrio, M., Brandi, R., Felsani, A., Capsoni, S., Drovandi, G., Felici, G., Weitschek, E., Bertolazzi, P., Cattaneo, A.: Gene expression biomarkers in the brain of a mouse model for Alzheimer's disease: mining of microarray data by logic classification and feature selection. J. Alzheimer's Dis. 24(4), 721–738 (2011)
2. Bertolazzi, P., Felici, G., Festa, P., Fiscon, G., Weitschek, E.: Integer programming models for feature selection: new extensions and a randomized solution algorithm. Eur. J. Oper. Res. 250(2), 389–399 (2016)
3. Bertolazzi, P., Felici, G., Weitschek, E.: Learning to classify species with barcodes. BMC Bioinform. 10(14), S7 (2009)
4. Cestarelli, V., Fiscon, G., Felici, G., Bertolazzi, P., Weitschek, E.: CAMUR: knowledge extraction from RNA-seq cancer data through equivalent classification rules. Bioinformatics 32(5), 697–704 (2016)
5. Eén, N., Sörensson, N.: Translating pseudo-Boolean constraints into SAT. J. Satisf. Boolean Model. Comput. 2, 1–26 (2006)
6. Felici, G., Truemper, K.: A MINSAT approach for learning in logic domains. INFORMS J. Comput. 14(1), 20–36 (2002)
7. Feo, T.A., Resende, M.G.C.: Greedy randomized adaptive search procedures. J. Glob. Optim. 6(2), 109–133 (1995)
8. Festa, P., Resende, M.G.C.: An annotated bibliography of GRASP - part I: algorithms. Int. Trans. Oper. Res. 16(1), 1–24 (2009)
9. Festa, P., Resende, M.G.C.: An annotated bibliography of GRASP - part II: applications. Int. Trans. Oper. Res. 16(2), 131–172 (2009)
10. Fu, Z., Malik, S.: Solving the minimum-cost satisfiability problem using SAT based branch-and-bound search. In: 2006 IEEE/ACM International Conference on Computer Aided Design, pp. 852–859, November 2006
11. Garey, M.R., Johnson, D.S.: Computers and Intractability, vol. 29. W.H. Freeman, New York (2002)
12. Manquinho, V.M., Marques-Silva, J.P.: Search pruning techniques in SAT-based branch-and-bound algorithms for the binate covering problem. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 21(5), 505–516 (2002)
13. Manquinho, V.M., Flores, P.F., Silva, J.P.M., Oliveira, A.L.: Prime implicant computation using satisfiability algorithms. In: Ninth IEEE International Conference on Tools with Artificial Intelligence, 1997 Proceedings, pp. 232–239. IEEE (1997)
14. Marques-Silva, J.P., Sakallah, K.A.: GRASP: a search algorithm for propositional satisfiability. IEEE Trans. Comput. 48(5), 506–521 (1999)
15. Martins, R., Manquinho, V., Lynce, I.: Clause sharing in deterministic parallel maximum satisfiability. In: RCRA International Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (2012)
16. Martins, R., Manquinho, V.M., Lynce, I.: Clause sharing in parallel MaxSAT. In: Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, pp. 455–460. Springer, Heidelberg (2012). doi:10.1007/978-3-642-34413-8_44
17. Martins, R., Manquinho, V.M., Lynce, I.: Parallel search for maximum satisfiability. AI Commun. 25(2), 75–95 (2012)
18. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: engineering an efficient SAT solver. In: Proceedings of the 38th Annual Design Automation Conference, DAC 2001, pp. 530–535. ACM, New York (2001)
19. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78800-3_24
20. Pipponzi, M., Somenzi, F.: An iterative algorithm for the binate covering problem. In: Proceedings of the European Design Automation Conference, EDAC 1990, pp. 208–211, March 1990
21. Ribeiro, C.C., Rosseti, I., Souza, R.C.: Probabilistic stopping rules for GRASP heuristics and extensions. Int. Trans. Oper. Res. 20(3), 301–323 (2013)
22. Scholz, F.W.: Maximum likelihood estimation (2004)
23. Servit, M., Zamazal, J.: Heuristic approach to binate covering problem. In: Proceedings of the European Conference on Design Automation, pp. 123–129, March 1992
24. Sörensson, N., Eén, N.: MiniSat v1.13 - a SAT solver with conflict-clause minimization. Technical report (2005)
25. Truemper, K.: Design of Logic-Based Intelligent Systems. Wiley-Interscience, Wiley (2004)
26. Villa, T., Kam, T., Brayton, R.K., Sangiovanni-Vincentelli, A.L.: Explicit and implicit algorithms for binate covering problems. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 16(7), 677–691 (1997)
27. Weitschek, E., Felici, G., Bertolazzi, P.: MALA: a microarray clustering and classification software. In: 2012 23rd International Workshop on Database and Expert Systems Applications, pp. 201–205, September 2012
28. Weitschek, E., Fiscon, G., Felici, G.: Supervised DNA barcodes species classification: analysis, comparisons and results. BioData Min. 7(1), 4 (2014)
29. Weitschek, E., Lo Presti, A., Drovandi, G., Felici, G., Ciccozzi, M., Ciotti, M., Bertolazzi, P.: Human polyomaviruses identification by logic mining techniques. Virol. J. 9(1), 58 (2012)
A New Local Search for the p-Center Problem
Based on the Critical Vertex Concept
1 Introduction
The p-center problem is one of the best-known discrete location problems, first
introduced in the literature in 1964 by Hakimi [13]. It consists of locating p facilities
and assigning clients to them in order to minimize the maximum distance
between a client and the facility to which the client is assigned (i.e., the closest
facility). Needless to say, this problem arises in many different real-world
contexts, whenever one designs a system for public facilities, such as schools or
emergency services.
Formally, we are given a complete undirected edge-weighted bipartite graph
G = (V ∪ U, E, c), where V is the set of potential facilities, U is the set of users,
and each distance cij represents the length of a shortest path between vertices
i and j (cii = 0); hence, the triangle inequality is satisfied.
In 1979, Kariv and Hakimi [16] proved that the problem is NP-hard, even in
the case where the input instance has a simple structure (e.g., a planar graph
of maximum vertex degree 3). In 1970, Minieka [20] designed the first exact
method for the p-center problem, viewed as a series of set covering problems. His
algorithm iteratively chooses a threshold r for the radius and checks whether all
clients can be covered within distance r using no more than p facilities. If so, the
threshold r is decreased; otherwise, it is increased. Inspired by Minieka's idea,
in 1995 Daskin [3] proposed a recursive bisection algorithm that systematically
reduces the gap between upper and lower bounds on the radius. More recently,
in 2010 Salhi and Al-Khedhairi [26] proposed a faster exact approach based on
Daskin's algorithm that obtains tighter upper and lower bounds by incorporating
information from a three-level heuristic that uses a variable neighborhood
strategy in the first two levels and, at the third level, a perturbation mechanism
for diversification purposes.
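The radius-search scheme just described can be sketched roughly as follows. This is an illustrative sketch, not the cited algorithms: the feasibility check below is a greedy set-cover heuristic (Minieka's and Daskin's methods treat the covering subproblem exactly), so the returned radius is only an upper bound on the optimum, and all names are hypothetical.

```python
def coverable(dist, r, p):
    """Greedy check: can all clients be covered within radius r by at
    most p centers?  (Heuristic; may answer "no" on feasible radii.)"""
    n = len(dist)
    uncovered = set(range(n))
    centers = 0
    while uncovered and centers < p:
        # Pick the vertex covering the most uncovered clients within r.
        best = max(range(n),
                   key=lambda c: sum(1 for u in uncovered if dist[c][u] <= r))
        uncovered -= {u for u in uncovered if dist[best][u] <= r}
        centers += 1
    return not uncovered

def p_center_radius(dist, p):
    """Bisection over the sorted set of pairwise distances, in the
    spirit of Minieka's covering formulation and Daskin's bisection."""
    radii = sorted({d for row in dist for d in row})
    lo, hi = 0, len(radii) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if coverable(dist, radii[mid], p):
            hi = mid          # feasible: try a smaller radius
        else:
            lo = mid + 1      # infeasible: radius must grow
    return radii[lo]
```

Since the optimal radius is always one of the pairwise distances, bisecting over that sorted set needs only O(log |E|) feasibility checks.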
Recently, several facility location problems similar to the p-center problem have
been used to model scenarios arising in financial markets. The main steps in
applying such techniques are the following: first, the considered financial market
is described via a correlation matrix of stock prices; second, the matrix is modeled
as a graph, where stocks and the correlation coefficients between them are represented
by nodes and edges, respectively. With this idea, Goldengorin et al. [11] used the
p-median problem to analyze stock markets. Another interesting area where these
problems arise is manufacturing systems, with the aim of lowering production costs [12].
Due to the computational complexity of the p-center problem, several approximation
and heuristic algorithms have been proposed for solving it. By exploiting
the relationship between the p-center problem and the dominating set problem
[15,18], nice approximation results were proved. With respect to inapproximability
results, Hochbaum and Shmoys [15] proposed a 2-approximation algorithm
for the problem with triangle inequality, showing that, for any factor smaller
than 2, the existence of such an approximation algorithm would imply that P = NP.
Although interesting in theory, approximation algorithms are often outperformed
in practice by more straightforward heuristics with no particular performance
guarantees. Local search is the main ingredient of most of the heuristic
algorithms that have appeared in the literature. In conjunction with various techniques
for escaping local optima, these heuristics provide solutions that in practice beat
the theoretical approximation bound for the problem, and they derive from ideas
used to solve the p-median problem, a similar NP-hard problem [17]. Given a set
F of m potential facilities, a set U of n users (or customers), a distance function
d : U × F → R, and a constant p ≤ m, the p-median problem is to determine
a subset of p facilities to open so as to minimize the sum of the distances from
each user to its closest open facility. For the p-median problem, in 2004 Resende
and Werneck [25] proposed a multistart heuristic that hybridizes GRASP with
Path-Relinking as both an intensification and a post-optimization phase. In 1997,
Hansen and Mladenović [14] proposed three heuristics: Greedy, Alternate, and
Interchange (vertex substitution). To select the first facility, Greedy solves a
1-center problem. The remaining p − 1 facilities are then iteratively added, one at
a time, and at each iteration the location that most reduces the maximum cost
is selected. In [5], Dyer and Frieze suggested a variant where the first center is
chosen at random. In the first iteration of Alternate, facilities are located at p
vertices chosen in V, clients are assigned to the closest facility, and the 1-center
problem is solved for each facility's set of clients. During the subsequent iterations,
the process is repeated with the new locations of the facilities until no more
changes in assignments occur. As for the Interchange procedure, a certain pattern
of p facilities is initially given. Then, facilities are moved iteratively, one by
one, to vacant sites with the objective of reducing the total (or maximum) cost.
This local search process stops when no movement of a single facility decreases
the value of the objective function. A multistart version of Interchange was also
proposed, where the process is repeated a given number of times and the best
solution is kept. The combination of Greedy and Interchange has been most
often used for solving the p-median problem. In 2003, Mladenović et al. [21]
adapted it to the p-center problem and proposed a Tabu Search (TS) and a
Variable Neighborhood Search (VNS), i.e., a heuristic that uses the history of
the search in order to construct a new solution and a competitor that is not
history sensitive, respectively. The TS is designed by extending Interchange to
the chain-interchange move, while in the VNS a perturbed solution is obtained
from the incumbent by a k-interchange operation and Interchange is used to
improve it. If a better solution than the incumbent is found, the search is recentered
around it. In 2011, Davidović et al. [4] proposed a Bee Colony algorithm, a
random search population-based technique in which an artificial system is composed
of a number of precisely defined agents, also called individuals or artificial bees.
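A bare-bones version of the Interchange (vertex substitution) scheme described above might look as follows, assuming every vertex is both a client and a candidate site and that `dist` is a full distance matrix; the first-improvement policy and all names are illustrative choices, not those of [14].

```python
def cost(P, dist):
    """p-center objective: maximum distance from any vertex to its
    closest open facility in P."""
    return max(min(dist[i][f] for f in P) for i in range(len(dist)))

def interchange(P, dist):
    """Plain vertex-substitution local search: repeatedly swap one open
    facility with one vacant site while the swap strictly reduces the
    objective, stopping at the first local optimum."""
    P = set(P)
    improved = True
    while improved:
        improved = False
        for i in list(P):
            for j in range(len(dist)):
                if j in P:
                    continue
                Q = (P - {i}) | {j}
                if cost(Q, dist) < cost(P, dist):
                    P, improved = Q, True   # accept the first improving swap
                    break
            if improved:
                break
    return P
```

Because only strict improvements are accepted, the loop terminates at the first plateau, which is exactly the limitation the critical-vertex local search of this paper is designed to overcome.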
To the best of our knowledge, most of the research effort devoted to
the development of metaheuristics for this problem has been put into the design
of efficient local search procedures. The purpose of this article is to propose a new
local search and to show that its performance is better than that of the best-known
local search proposed in the literature (Mladenović et al.'s [21] local search based on
the VNS strategy), both in terms of solution quality and convergence speed.
The remainder of the paper is organized as follows. In Sect. 2, a GRASP construction
procedure is described. In Sect. 3, we introduce the new concept of
critical vertex with the related definitions and describe a new local search algorithm.
Computational results presented in Sect. 4 empirically demonstrate that our local
search is capable of obtaining better results than the best-known local search,
and they are validated by a statistical significance test. Concluding remarks are
made in Sect. 5.
and a local search phase. For a comprehensive study of GRASP strategies and
their variants, the reader is referred to the survey papers by Festa and Resende
[9,10], as well as to their annotated bibliography [8].
Starting from a partial solution made of 1 ≤ randElem ≤ p facilities randomly
selected from V, our GRASP construction procedure iteratively selects
the remaining p − randElem facilities in a greedy randomized fashion. The greedy
function takes into account the contribution to the objective function achieved
by selecting a particular candidate element. In more detail, given a partial solution
P, |P| < p, for each i ∈ V \ P, we compute w(i) = C(P ∪ {i}). The pure
greedy choice would consist in selecting the vertex with the smallest greedy function
value. This procedure instead computes the smallest and the largest greedy
function values:
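A sketch of this construction is given below. Since the exact formulas for combining the smallest and largest greedy values are not reproduced here, the sketch assumes the standard value-based restricted candidate list (RCL) of GRASP as a plausible completion; the parameter `alpha` and all names are illustrative, not the paper's notation.

```python
import random

def grasp_construction(V, p, C, rand_elem, alpha, rng=random):
    """Greedy randomized construction sketch for the p-center problem.
    `C` maps a set of facilities to its objective value C(P); `alpha`
    controls greediness (0 = pure greedy, 1 = pure random).  The RCL
    rule below is an assumption, not the paper's exact formula."""
    P = set(rng.sample(sorted(V), rand_elem))      # random partial solution
    while len(P) < p:
        candidates = [i for i in V if i not in P]
        w = {i: C(P | {i}) for i in candidates}    # greedy function w(i)
        w_min, w_max = min(w.values()), max(w.values())
        # Value-based RCL: candidates whose w(i) is close enough to w_min.
        rcl = [i for i in candidates
               if w[i] <= w_min + alpha * (w_max - w_min)]
        P.add(rng.choice(rcl))                     # randomized greedy choice
    return P
```

With alpha = 0 this reduces to the pure greedy choice mentioned above; larger values trade greediness for diversification across restarts.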
In the following, we will denote by maxP = |{i ∈ V : d(i, P) = C(P)}| the
number of vertices whose distance from their closest facility in P, denoted d(i, P),
equals the objective function value corresponding to solution P. We also define
the comparison operator <cv, and we will say that P′ <cv P if and only if
maxP′ < maxP.
Fig. 2. An example of how the local search works. In this case, the algorithm switches
from solution P to solution P′. In P′, a new facility y3′ is selected in place of y3 in
P; y3′ attracts one of the critical vertices from the neighborhood of the facility y1.
Although the cost of the two solutions is the same, the algorithm selects the new
solution P′ because maxP′ < maxP.
84 D. Ferone et al.
The main idea of our plateau surfer local search is to use the concept of
critical vertex to escape from plateaus, moving to solutions that have either
a better cost than the current solution or equal cost but fewer critical vertices.
Figure 2 shows a simple application of the algorithm, while in Figs. 3 and 4, for
four benchmark instances, both Mladenović's local search and our local search
are applied once, taking as input the same starting feasible solution. It is evident
that both procedures make the same first moves. However, as soon as a
plateau is met, Mladenović's local search ends, while our local search is able to
escape from the plateau, moving to other solutions with the same cost value,
Fig. 3. Plateau escaping. The behavior of our plateau surfer local search (in red) compared
with Mladenović's (in blue). Both algorithms work on the same instances,
taking as input the same starting solution. Filled red dots and empty blue circles indicate
the solutions found by the two algorithms. Mladenović's local search stops as soon
as the first plateau is met. (Color figure online)
Fig. 4. Plateau escaping. The behavior of our plateau surfer local search (in red) compared
with Mladenović's (in blue) on two other different instances. (Color figure
online)
and restarting the procedure from a new solution that can lead to a strict cost
function improvement.
Let us analyze in more detail the behavior of our local search, whose pseudocode
is reported in Fig. 5. The main part of the algorithm consists in the portion of
the pseudocode that goes from line 7 to line 14. Starting from an initial solution
P, the algorithm tries to improve the solution by replacing a facility i ∈ P with a
vertex j ∉ P. Clearly, this swap is stored as an improving move if the new solution
P′ = P \ {i} ∪ {j} is strictly better than P according to the cost function C. If C(P′)
is better than the current cost C(P), then P′ is also compared with the incumbent
Fig. 5. Pseudocode of the plateau surfer local search algorithm based on the critical
vertex concept.
solution, and if it is the best solution found so far, the incumbent is updated and
the swap that led to this improvement is stored (lines 9–11).
Otherwise, the algorithm checks whether it is possible to reduce the number of
critical vertices. If the new solution P′ is such that P′ <cv P, then the algorithm
checks whether P′ is the best solution found so far (line 12), the value counting the
number of critical vertices in a solution is updated (line 13), and the current swap
is stored as an improving move (line 14).
To study the computational complexity of our local search, let n = |V| and
p = |P| be the number of vertices in the graph and the number of facilities in
a solution, respectively. The loops at lines 3 and 7 are executed p and n times,
respectively. The update of the solution takes O(n). In conclusion, the total
complexity is O(p · n²).
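The swap loop and the <cv acceptance test can be sketched as follows. This is a simplified re-implementation of the idea rather than the paper's pseudocode: it recomputes costs from scratch at every swap (the O(p · n²) bound above relies on incremental updates), and all names are hypothetical.

```python
def closest_dist(i, P, dist):
    return min(dist[i][f] for f in P)

def cost(P, dist):
    return max(closest_dist(i, P, dist) for i in range(len(dist)))

def n_critical(P, dist):
    """Number of critical vertices: vertices whose distance to their
    closest facility equals the objective value C(P)."""
    c = cost(P, dist)
    return sum(1 for i in range(len(dist)) if closest_dist(i, P, dist) == c)

def plateau_surfer(P, dist):
    """Sketch of the critical-vertex local search: accept a swap if it
    strictly improves the cost, or keeps the cost equal while lowering
    the number of critical vertices (the <cv comparison)."""
    P = set(P)
    improved = True
    while improved:
        improved = False
        for i in list(P):
            for j in range(len(dist)):
                if j in P:
                    continue
                Q = (P - {i}) | {j}
                better = cost(Q, dist) < cost(P, dist)
                plateau = (cost(Q, dist) == cost(P, dist)
                           and n_critical(Q, dist) < n_critical(P, dist))
                if better or plateau:
                    P, improved = Q, True
                    break
            if improved:
                break
    return P
```

Termination follows because each accepted move strictly decreases the pair (cost, number of critical vertices) in lexicographic order.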
4 Experimental Results
In this section, we describe our computational experience with the local search
proposed in this paper. We compared it with the local search proposed by
Mladenović et al. [21], by embedding both in a GRASP framework.
The algorithms were implemented in C++ and compiled with gcc 5.2.1 under
Ubuntu with the -std=c++14 flag. The stopping criterion is maxTime = 0.1n + 0.5p.
All tests were run on a cluster of nodes, connected by 10 Gigabit InfiniBand
technology, each with two Intel Xeon E5-4610 v2 @ 2.30 GHz processors.
Table 1 summarizes the results on a set of ORLIB instances, originally introduced
in [1]. It consists of 40 graphs with numbers of vertices ranging from 100 to
900, each with a suggested value of p ranging from 5 to 200. Each vertex is both
a user and a potential facility, and distances are given by shortest path lengths.
Tables 2 and 3 report the results on the TSP set of instances. These are simply sets
of points on the plane. Originally proposed for the traveling salesman problem,
they are available from the TSPLIB [24]. Each vertex can be either a user or
an open facility. We used the Mersenne Twister random number generator by
Matsumoto and Nishimura [19]. Each algorithm was run with 10 different seeds,
and the minimum (min), average (E), and variance (σ²) values are listed in each
table. The second-to-last column lists the %-Gap between average solutions. To
investigate more deeply the statistical significance of the results obtained by the two
local searches, we performed the Wilcoxon test [2,27].
Generally speaking, the Wilcoxon test is a ranking method that applies well
in the case of a number of paired comparisons leading to a series of differences,
some of which may be positive and some negative. Its basic idea is to substitute
the scores 1, 2, 3, . . . , n for the actual numerical data, in order to obtain a rapid
approximate idea of the significance of the differences in experiments of this
kind.
More formally, let A1 and A2 be two algorithms, I1, . . . , Il be l instances of
the problem to solve, and let Ai(Ij) be the value of the solution obtained by
algorithm Ai (i = 1, 2) on instance Ij (j = 1, . . . , l). For each j = 1, . . . , l, the
Wilcoxon test computes the difference Δj = A1(Ij) − A2(Ij), and the differences
are sorted by their absolute values |Δj| in non-decreasing order. Accordingly,
starting with a smallest rank equal to 1, it assigns to each difference Δj a
non-decreasing rank Rj. Ties receive a rank equal to the average of the sorted
positions they span. Then, the following quantities are computed:

W+ = Σ_{j=1,...,l : Δj>0} Rj,
W− = Σ_{j=1,...,l : Δj<0} Rj.
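The rank sums W+ and W− described above can be computed as in the sketch below; the handling of zero differences (discarded) and ties (average ranks) follows one common convention, matching the description in the text. The function name is illustrative.

```python
def wilcoxon_rank_sums(a_vals, b_vals):
    """Compute the Wilcoxon signed-rank sums W+ and W- for paired
    results of two algorithms.  Zero differences are discarded and
    tied absolute differences receive the average of their ranks."""
    diffs = [a - b for a, b in zip(a_vals, b_vals) if a != b]
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied absolute differences.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2.0 + 1.0   # average of tied positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

By construction W+ + W− equals n(n + 1)/2, where n is the number of non-zero differences, which gives a quick sanity check on any implementation.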
Under the null hypothesis that A1(Ij) and A2(Ij) have the same median
value, it should hold that W+ = W−. If the p-value associated with the experiment
is less than an a priori fixed significance level α, then the null hypothesis
is rejected and the difference between W+ and W− is considered significant.
The last column of each table lists the p-values where the %-Gap is significant;
all the values are less than α = 0.01. This outcome of the Wilcoxon test further
confirms that our local search performs better than the local search proposed
by Mladenović et al.
5 Concluding Remarks
In this paper, we presented a new local search heuristic for the p-center problem,
whose potential applications appear in telecommunications, in transportation
logistics, and whenever one must design a system to organize some sort of
public facility, such as, for example, schools or emergency services.
The computational experiments show that the proposed local search is capable
of reducing the number of locally optimal solutions by using the concept of critical
vertex, and that it improves upon the results of the best local search for the problem.
Future lines of work will focus on a deeper investigation of the robustness
of our proposal by applying it to further instances coming from financial markets
and manufacturing systems.
Acknowledgements. This work has been realized thanks to the use of the S.Co.P.E.
computing infrastructure at the University of Napoli FEDERICO II.
References
1. Beasley, J.: A note on solving large p-median problems. Eur. J. Oper. Res. 21,
270–273 (1985)
2. Coffin, M., Saltzman, M.: Statistical analysis of computational tests of algorithms
and heuristics. INFORMS J. Comput. 12(1), 24–44 (2000)
3. Daskin, M.: Network and Discrete Location: Models, Algorithms, and Applications.
Wiley, New York (1995)
4. Davidovic, T., Ramljak, D., Selmic, M., Teodorovic, D.: Bee colony optimization
for the p-center problem. Comput. Oper. Res. 38(10), 1367–1376 (2011)
5. Dyer, M., Frieze, A.: A simple heuristic for the p-centre problem. Oper. Res. Lett.
3(6), 285–288 (1985)
6. Feo, T., Resende, M.: A probabilistic heuristic for a computationally difficult set
covering problem. Oper. Res. Lett. 8, 67–71 (1989)
7. Feo, T., Resende, M.: Greedy randomized adaptive search procedures. J. Global
Optim. 6, 109–133 (1995)
8. Festa, P., Resende, M.: GRASP: an annotated bibliography. In: Ribeiro, C.,
Hansen, P. (eds.) Essays and Surveys on Metaheuristics, pp. 325–367. Kluwer
Academic Publishers, London (2002)
9. Festa, P., Resende, M.: An annotated bibliography of GRASP - part I: algorithms.
Int. Trans. Oper. Res. 16(1), 1–24 (2009)
10. Festa, P., Resende, M.: An annotated bibliography of GRASP - part II: applications.
Int. Trans. Oper. Res. 16(2), 131–172 (2009)
11. Goldengorin, B., Kocheturov, A., Pardalos, P.M.: A pseudo-boolean approach to
the market graph analysis by means of the p-median model. In: Aleskerov, F.,
Goldengorin, B., Pardalos, P.M. (eds.) Clusters, Orders, and Trees: Methods and
Applications. SOIA, vol. 92, pp. 77–89. Springer, New York (2014). doi:10.1007/
978-1-4939-0742-7_5
12. Goldengorin, B., Krushinsky, D., Pardalos, P.M.: Application of the PMP to cell
formation in group technology. In: Goldengorin, B., Krushinsky, D., Pardalos, P.M.
(eds.) Cell Formation in Industrial Engineering. SOIA, vol. 79, pp. 75–99. Springer,
New York (2013). doi:10.1007/978-1-4614-8002-0_3
92 D. Ferone et al.
13. Hakimi, S.: Optimum locations of switching centers and the absolute centers and
medians of a graph. Oper. Res. 12(3), 450–459 (1964)
14. Hansen, P., Mladenovic, N.: Variable neighborhood search for the p-median. Locat.
Sci. 5(4), 207–226 (1997)
15. Hochbaum, D., Shmoys, D.: A best possible heuristic for the k-center problem.
Math. Oper. Res. 10(2), 180–184 (1985)
16. Kariv, O., Hakimi, S.: An algorithmic approach to network location problems.
Part I: the p-centers. SIAM J. Appl. Math. 37(3), 513–538 (1979)
17. Kariv, O., Hakimi, S.: An algorithmic approach to network location problems.
Part II: the p-medians. SIAM J. Appl. Math. 37(3), 539–560 (1979)
18. Martinich, J.S.: A vertex-closing approach to the p-center problem. Nav. Res.
Logist. 35(2), 185–201 (1988)
19. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed
uniform pseudo-random number generator. ACM Trans. Model. Comput.
Simul. 8(1), 3–30 (1998)
20. Minieka, E.: The m-center problem. SIAM Rev. 12(1), 138–139 (1970)
21. Mladenovic, N., Labbe, M., Hansen, P.: Solving the p-center problem with tabu
search and variable neighborhood search. Networks 42(April), 48–64 (2003)
22. Mladenovic, N., Urosevic, D., Pérez-Brito, D., García-González, C.G.: Variable
neighbourhood search for bandwidth reduction. Eur. J. Oper. Res. 200(1), 14–27 (2010)
23. Pardo, E.G., Mladenovic, N., Pantrigo, J.J., Duarte, A.: Variable formulation search
for the cutwidth minimization problem. Appl. Soft Comput. 13(5), 2242–2252
(2013)
24. Reinelt, G.: TSPLIB - a traveling salesman problem library. ORSA J. Comput.
3(4), 376–384 (1991)
25. Resende, M., Werneck, R.: A hybrid heuristic for the p-median problem. J. Heuristics
10(1), 59–88 (2004)
26. Salhi, S., Al-Khedhairi, A.: Integrating heuristic information into exact methods:
the case of the vertex p-centre problem. J. Oper. Res. Soc. 61(11), 1619–1631
(2010)
27. Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1(6), 80–83
(1945)
An Iterated Local Search Framework with
Adaptive Operator Selection for Nurse Rostering
1 Introduction
One of the main motivations for the development of hyper-heuristics was to
develop search algorithms that can operate with a certain degree of generality [1,2].
Hyper-heuristics can be considered to be high-level general approaches
that are able to select or generate low-level heuristics, whilst restricting
the need to use domain knowledge [3]. In particular, selection hyper-heuristics
choose heuristics from a predefined set of low-level heuristics within a framework,
where the aim is to determine a sequence of perturbations that provide efficient
solutions for a given problem. On the other hand, the idea behind generative
hyper-heuristics is to develop new heuristics based on the basic components of
the input low-level heuristics [3].
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 93–108, 2017.
https://doi.org/10.1007/978-3-319-69404-7_7
94 A. Gretsista and E.K. Burke
Sect. 4.2. Finally, Sect. 5 concludes with a summary of the experimental findings
of this work and provides some pointers for future work.
at hand. After the initial solution has been obtained, each iteration of HHILS performs five
main steps. There is a set S that includes the k available perturbation low-level
search heuristics, S = {llh_1, llh_2, . . . , llh_k}. HHILS exploits an action selection
method (ActionSelection(str)) to predict and select the most fitting perturbation
low-level search heuristic from S for the next step (line 4, see Sect. 3.2
for details). Having selected the perturbation low-level heuristic (selected_llh), a
new solution, s_tmp, is generated by applying the selected_llh perturbation heuristic
to the current solution s_cur (line 5). After the perturbation, the new solution
(s_tmp) is refined by a local search method, ApplyLocalSearch(s_tmp) (line 6). The
local search procedure utilizes a set of greedy local search heuristics that are
applied in an iterative way. More specifically, given a list L of the available
local search heuristics, L = {l1, l2, . . .}, at each repetition a local search heuristic is
selected uniformly at random and applied to the current solution. If the
selected local search heuristic is not able to provide a better position for
s_tmp, it is excluded from L, and another local search heuristic is selected. This
continues until an improved solution has been produced. By the end of the iterative
local search process, the result will be the incumbent solution (s_tmp). In
the next step, HHILS decides the best solution, between s_cur and s_tmp, to
use for the next iteration through the AcceptanceCriterion(s_cur, s_tmp) procedure
(line 7). Here, we have adopted a Simulated Annealing acceptance rule
to allow worsening moves to be accepted with a certain probability. The acceptance
probability can be calculated as p = e^((f(s_cur) − f(s_tmp))/(T · Δi)), where f(s_cur) and
f(s_tmp) are the objective values of the incumbent (s_cur) and temporary (s_tmp)
solutions, T is the temperature value with T ∈ R (here fixed to T = 2), and
Δi is the mean improvement of the improving iterations [15]. The Δi value essentially
normalizes the objective value difference by a quantity that is not problem
dependent. In the last step, HHILS assigns a score to the utilized low-level perturbation
heuristic (selected_llh) based on its performance (improvement of the
incumbent) through the credit assignment module CreditAssignment(selected_llh)
(line 8).
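The five steps above can be sketched in Python as follows (a hedged illustration, not the authors' implementation: the callables select_llh and credit_assign, the solution representation, and the mean-improvement normalizer mean_impr are placeholders for the components described in the text):

```python
import math
import random

def hhils_iteration(s_cur, f, S, L, select_llh, credit_assign,
                    T=2.0, mean_impr=1.0):
    """One HHILS iteration: perturb, refine via local search, accept, score."""
    llh = select_llh(S)                 # step 1: action selection over S
    s_tmp = llh(s_cur)                  # step 2: perturb the incumbent
    pool = list(L)                      # step 3: iterated local search
    while pool:
        ls = random.choice(pool)        # uniform random choice from L
        s_new = ls(s_tmp)
        if f(s_new) < f(s_tmp):
            s_tmp = s_new               # improving move: keep refining
        else:
            pool.remove(ls)             # non-improving heuristic is excluded
    improved = f(s_tmp) < f(s_cur)
    # step 4: Simulated Annealing acceptance; worsening moves pass with
    # probability p = exp((f(s_cur) - f(s_tmp)) / (T * mean_impr))
    if improved or random.random() < math.exp(
            (f(s_cur) - f(s_tmp)) / (T * mean_impr)):
        s_cur = s_tmp
    credit_assign(llh, improved)        # step 5: credit assignment
    return s_cur
```

The exclusion of non-improving local searchers mirrors the text: a heuristic that fails to improve s_tmp is dropped from the pool and another is drawn until no candidate remains.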
Four different perturbation strategies are used here, one from the mutation
category and three from the ruin-and-recreate category. The mutation heuristic (HM)
randomly un-assigns shifts based on an intensity parameter, respecting the feasibility
of the solution. The three ruin-and-recreate heuristics (HR1–HR3) are
all inspired by the one proposed in [20]. HR1 unassigns all shifts of random
employees from the schedule and recreates the schedule by prioritizing the objectives
related to weekdays and then those related to weekends. Then, greedy procedures are
used to satisfy the remaining objectives. A hill climbing procedure is employed
to improve the quality of the roster. HR1 destroys the solution by removing
the schedule of a medium number of employees. HR2 adopts a similar procedure
accepting a greater change to the solution, proportional to the number of
employees in the schedule, while HR3 slightly perturbs the solution by removing
the shifts from only one employee. Five different local searchers are also adopted,
LS1–LS5. The first three use different neighborhood operators from the literature,
i.e., vertical, horizontal and new swaps respectively, while the last two
local searchers follow a variable depth search strategy with different neighborhood
operators (LS4: vertical and new; LS5: vertical, horizontal and new) [21].
A detailed description of all the employed low-level heuristics can be found in
the documentation of HyFlex [22].
In order to assess the quality of the last action performed each time, a credit
assignment module has been employed. The most conventional way to determine
the impact of each move and assign a credit to it is to associate the search move
with the solution improvement caused by its application. To this end, we can
calculate the credit of an action based on the improvement of the incumbent
solution weighted by the effort paid to improve it.
More precisely, HHILS rewards each low-level perturbation heuristic according
to its ability to improve the incumbent solution, normalized by the total time
spent to achieve this improvement. Let S = {llh_1, llh_2, . . . , llh_k} be the set of k
available low-level perturbation heuristics and t_llhi be the execution time consumed
by action llh_i ∈ S to search the solution space. The total time consumed
by action llh_i can be calculated according to t_llhi = t_llhi + t_llhi^sp, where t_llhi^sp is the
execution time consumed by action llh_i for the current iteration. The reward r_llhi
of action llh_i can, therefore, be calculated according to the following equation:
r_llhi = (1 + improvement_llhi) / t_llhi, where improvement_llhi simply counts the number of
times action llh_i improves the incumbent solution (i.e., f(s_tmp) < f(s_cur), where
s_cur is the incumbent solution and s_tmp is the solution produced by the search
operations at the current iteration). Notice that the value 1 in the numerator
is responsible for assigning non-zero rewards to actions that have not yet led
where α ∈ (0, 1] is the adaptation rate, which can amplify the influence of the
most recent rewards over their history (here α is fixed to 0.1) [19,24].
Notice also that the credit assignment module could employ any reward value
that can be measured during the search process to score the applied search operation.
Representative examples of such rewards are the fitness improvement [19],
the ranking of successful movements [19,25], and landscape analysis measures [26].
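A minimal sketch of the reward and credit bookkeeping described above (hedged: the update rule using the adaptation rate α is reconstructed as a standard exponential-forgetting rule, since the original update equation lies on a page not reproduced here):

```python
def reward(improvements, total_time):
    # r_llh = (1 + improvements) / t_llh: the "+1" keeps the reward non-zero
    # for heuristics that have not yet improved the incumbent solution.
    return (1.0 + improvements) / total_time

def update_credit(credit, r, alpha=0.1):
    # Assumed exponential-forgetting update with adaptation rate alpha in
    # (0, 1]: recent rewards are amplified over the historical credit value.
    return (1.0 - alpha) * credit + alpha * r
```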
4 Experimental Results
We first present the experimental setup of this study (Sect. 4.1), which includes
details about the environment used, the considered problem instances, the proposed
as well as the state-of-the-art hyper-heuristics, and the parameter configurations
of all considered algorithms. We then proceed with the presentation of
1 More details can be found at http://www.cs.nott.ac.uk/tec/NRP/.
Table 2. Summary statistics of the normalized objective values, regret, and d_avg
metrics of all hyper-heuristics across all considered problem instances.
Fig. 1. Frequency graphs of the strategies selected by the adaptive HHILS variants for
all problem instances, averaged over all simulations.
5 Conclusions
In this study, we proposed a simple and effective Iterated Local Search based
selection hyper-heuristic framework that adopts the adaptive operator selection
paradigm to successfully address a wide variety of nurse rostering problem
instances. It employs an action selection model to select different perturbation
strategies and a credit assignment module to appropriately score them. The proposed
framework is able to adopt any action selection model and credit assignment
mechanism available in the literature. In this study, we have tested six
different action selection models, resulting in new competitive hyper-heuristics.
The high-level nature of the framework makes it widely applicable to new or
unseen problem instances/domains without requiring further modifications.
The adaptive characteristics of the proposed framework are investigated by
comparing it with its non-adaptive variants, while its performance is evaluated
through comparisons with 8 state-of-the-art hyper-heuristics on 39 different
nurse rostering problem instances. The experimental results suggest that the proposed
framework performs significantly better than the state-of-the-art hyper-heuristics.
The proposed adaptive mechanisms seem to be effective across the
majority of the problem instances, with the Adaptive Pursuit and the simple
proportional action selection models being able to learn and identify the most
promising perturbation strategies. The remaining three considered action selection
models perform similarly to uniform selection, which indicates that
they are not able to identify the best performing perturbation strategy. However,
even simple random selection performs significantly better than the majority
of the state-of-the-art algorithms. Therefore, further experimentation and analysis
of the adaptive strategies on more nurse rostering problem instances have to
be performed to draw safe conclusions about their behavior. Future work will
also include comparisons with specialized state-of-the-art heuristics developed
for nurse rostering problems.
References
1. Burke, E.K., Kendall, G., Newall, J., Hart, E., Ross, P., Schulenburg, S.: Hyper-heuristics:
an emerging direction in modern search technology. In: Glover, F.,
Kochenberger, G.A. (eds.) Handbook of Metaheuristics, pp. 457–474. Springer,
Boston (2003). doi:10.1007/0-306-48056-5_16
2. Burke, E.K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., Qu,
R.: Hyper-heuristics: a survey of the state of the art. J. Oper. Res. Soc. 64(12),
1695–1724 (2013)
3. Burke, E.K., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., Woodward, J.R.: A
classification of hyper-heuristic approaches. In: Gendreau, M., Potvin, J.Y. (eds.)
Handbook of Metaheuristics. International Series in Operations Research & Management
Science, vol. 146, pp. 449–468. Springer, Boston (2010). doi:10.1007/
978-1-4419-1665-5_15
4. Burke, E.K., Causmaecker, P.D., Berghe, G.V., Landeghem, H.V.: The state of the
art of nurse rostering. J. Sched. 7(6), 441–499 (2004)
5. Ernst, A.T., Jiang, H., Krishnamoorthy, M., Sier, D.: Staff scheduling and rostering:
a review of applications, methods and models. Eur. J. Oper. Res. 153(1), 3–27
(2004)
6. Asta, S., Özcan, E., Curtois, T.: A tensor based hyper-heuristic for nurse rostering.
Knowl. Based Syst. 98, 185–199 (2016)
7. Lü, Z., Hao, J.K.: Adaptive neighborhood search for nurse rostering. Eur. J. Oper.
Res. 218(3), 865–876 (2012)
8. Rae, C., Pillay, N.: Investigation into an evolutionary algorithm hyperheuristic for
the nurse rostering problem. In: Proceedings of the 10th International Conference
on the Practice and Theory of Automated Timetabling, PATAT 2014, pp. 527–532 (2014)
9. Anwar, K., Awadallah, M.A., Khader, A.T., Al-Betar, M.A.: Hyper-heuristic approach
for solving nurse rostering problem. In: 2014 IEEE Symposium on Computational
Intelligence in Ensemble Learning (CIEL), pp. 1–6, December 2014
10. Burke, E.K., Curtois, T.: New approaches to nurse rostering benchmark instances.
Eur. J. Oper. Res. 237(1), 71–81 (2014)
11. Bai, R., Burke, E., Kendall, G., Li, J., McCollum, B.: A hybrid evolutionary approach
to the nurse rostering problem. IEEE TEVC 14(4), 580–590 (2010)
12. Burke, E.K., Li, J., Qu, R.: A hybrid model of integer programming and variable
neighbourhood search for highly-constrained nurse rostering problems. Eur.
J. Oper. Res. 203(2), 484–493 (2010)
13. Kheiri, A., Keedwell, E.: A sequence-based selection hyper-heuristic utilising a hidden
Markov model. In: Proceedings of the 2015 Annual Conference on Genetic and
Evolutionary Computation, GECCO 2015, pp. 417–424. ACM, New York (2015)
14. Chan, C.Y., Xue, F., Ip, W.H., Cheung, C.F.: A hyper-heuristic inspired by pearl
hunting. In: Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, pp. 349–353.
Springer, Heidelberg (2012). doi:10.1007/978-3-642-34413-8_26
15. Adriaensen, S., Brys, T., Nowé, A.: Fair-share ILS: a simple state-of-the-art iterated
local search hyperheuristic. In: Proceedings of the 2014 Conference on Genetic and
Evolutionary Computation, GECCO 2014, pp. 1303–1310. ACM (2014)
16. Mısır, M., Verbeeck, K., Causmaecker, P., Berghe, G.: An intelligent hyper-heuristic
framework for CHeSC 2011. In: Hamadi, Y., Schoenauer, M. (eds.)
LION 2012. LNCS, pp. 461–466. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-34413-8_45
17. CHeSC 2011 (2011). http://www.asap.cs.nott.ac.uk/external/chesc2011/
18. Battiti, R., Brunato, M., Mascia, F.: Reactive Search and Intelligent Optimization.
Operations Research/Computer Science Interfaces, vol. 45. Springer, Boston (2008).
doi:10.1007/978-0-387-09624-7
19. Fialho, A.: Adaptive operator selection for optimization. Ph.D. thesis, Université
Paris-Sud XI, Orsay, France, December 2010
20. Burke, E.K., Curtois, T., Post, G., Qu, R., Veltman, B.: A hybrid heuristic ordering
and variable neighbourhood search for the nurse rostering problem. Eur. J. Oper.
Res. 188(2), 330–341 (2008)
21. Burke, E.K., Curtois, T., Qu, R., Vanden Berghe, G.: A time predefined variable
depth search for nurse rostering. INFORMS J. Comput. 25(3), 411–419 (2013)
22. CHeSC 2014: The second cross-domain heuristic search challenge (2014). http://
www.hyflex.org/chesc2014/, http://www.hyflex.org/. Accessed 25 Mar 2015
23. Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning, 1st edn. MIT
Press, Cambridge (1998)
24. Thierens, D.: Adaptive strategies for operator allocation. In: Lobo, F., Lima, C.,
Michalewicz, Z. (eds.) Parameter Setting in Evolutionary Algorithms. SCI, vol. 54,
pp. 77–90. Springer, UK (2007). doi:10.1007/978-3-540-69432-8_4
25. Epitropakis, M.G., Tasoulis, D.K., Pavlidis, N.G., Plagianakos, V.P., Vrahatis,
M.N.: Tracking particle swarm optimizers: an adaptive approach through multinomial
distribution tracking with exponential forgetting. In: 2012 IEEE Congress on
Evolutionary Computation (CEC), pp. 1–8 (2012)
26. Muñoz, M.A., Sun, Y., Kirley, M., Halgamuge, S.K.: Algorithm selection for black-box
continuous optimization problems: a survey on methods and challenges. Inf.
Sci. 317, 224–245 (2015)
27. Fialho, A., Costa, L.D., Schoenauer, M., Sebag, M.: Analyzing bandit-based adaptive
operator selection mechanisms. Ann. Math. Artif. Intell. 60(1–2), 25–64 (2010)
28. Karafotias, G., Hoogendoorn, M., Eiben, A.E.: Why parameter control mechanisms
should be benchmarked against random variation. In: 2013 IEEE Congress on
Evolutionary Computation (CEC), pp. 349–355, June 2013
29. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed
bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
30. Banerjea-Brodeur, M.: Selection hyper-heuristics for healthcare scheduling. Ph.D.
thesis, University of Nottingham, UK, June 2013
31. Asta, S., Özcan, E., Parkes, A.J.: Batched mode hyper-heuristics. In: Nicosia, G.,
Pardalos, P. (eds.) LION 2013. LNCS, vol. 7997, pp. 404–409. Springer, Heidelberg
(2013). doi:10.1007/978-3-642-44973-4_43
32. Ochoa, G., et al.: HyFlex: a benchmark framework for cross-domain heuristic search.
In: Hao, J.-K., Middendorf, M. (eds.) EvoCOP 2012. LNCS, vol. 7245, pp. 136–147.
Springer, Heidelberg (2012). doi:10.1007/978-3-642-29124-1_12
33. Hollander, M., Wolfe, D.A., Chicken, E.: Nonparametric Statistical Methods, 3rd
edn. Wiley, Hoboken (2013)
Learning a Reactive Restart Strategy
to Improve Stochastic Search
1 Introduction
Restarted search has become an integral part of combinatorial search algorithms.
Even before heavy-tailed runtime distributions were found to explain the massive
variance in search performance [1], restarts were commonly used in local search
as a search diversification technique [2].
Fixed-schedule restart strategies were studied theoretically in [3]. For SAT
and constraint programming solvers, practical studies followed. For example,
one study found that there is hardly any difference between theoretically optimal
schedules and simple geometrically growing limits [4]. SAT solvers used
geometrically growing limits for quite some time before the community largely
adopted theoretically optimal schedules (whereby the optimality guarantees are
based on assumptions that actually do not hold for clause-learning solvers, where
consecutive restarts are not independent). Audemard and Simon [5] argued that
fixed schedules are suboptimal for SAT solvers and designed adaptive restart
strategies for one SAT solver specifically.
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 109–123, 2017.
https://doi.org/10.1007/978-3-319-69404-7_8
110 S. Kadioglu et al.
2 Restart Strategies
Nowadays, stochastic search algorithms and randomized search heuristics are
frequently restarted: if a run does not conclude within a pre-determined limit,
we restart the algorithm. This was shown to help avoid heavy-tailed runtime
distributions [1]. Due to the added complexity of designing an appropriate restart
strategy for a given target algorithm, the two most common techniques are
to either restart with a certain probability at the end of each iteration, or to
employ a fixed schedule of restarts.
Some theoretical results exist on how to construct optimal restart strategies.
For example, Luby et al. [3] showed that, for Las Vegas algorithms with known
run time distribution, there is an optimal stopping time in order to minimize the
expected running time. They also showed that, if the distribution is unknown,
there is a universal sequence of running times which is the optimal restarting
strategy up to constant factors.
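The universal sequence of Luby et al. (1, 1, 2, 1, 1, 2, 4, 1, 1, 2, . . .) can be generated with the standard recursive formulation sketched below (shown for illustration; in practice each term is still multiplied by a base time unit):

```python
def luby(i):
    """i-th term (1-indexed) of the universal restart sequence of Luby et al.:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ..."""
    # Find the smallest k with i <= 2^k - 1.
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:      # end of a block: emit 2^(k-1)
        return 1 << (k - 1)
    # Otherwise the sequence repeats its own prefix.
    return luby(i - (1 << (k - 1)) + 1)
```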
Fewer results are known for the optimization case. Martí [6] and Lourenço
et al. [7] present practical approaches, and a recent theoretical result is presented
by Schoenauer et al. [8]. Particularly for the satisfiability problem, several studies
make an empirical comparison of a number of restart policies [9,10].
Quite often, classical optimization algorithms are deterministic and thus cannot
be improved by restarts. This also appears to hold for certain popular modern
solvers, such as IBM ILOG CPLEX. However, characteristics can change when
memory constraints or parallel computations are encountered. This was the initial
idea of Lalla-Ruiz and Voß [11], who investigated different mathematical
programming formulations to provide different starting points for the solver.
Many other modern optimization algorithms, while also working mostly deterministically,
have some randomized component, for example by choosing a random
starting point. Two very typical uses for an algorithm with time budget t are to
(a) use all of time t for a single run of the algorithm (single-run strategy), or (b) to
make k runs of the algorithm, each with running time t/k (multi-run
strategy).
Extending these two classical strategies, Fischetti et al. [12] investigated the
use of the following Bet-and-Run strategy with a total time limit t:
Phase 1 performs k runs of the algorithm for some (short) time limit t1 with
t1 ≤ t/k.
Phase 2 uses the remaining time t2 = t − k · t1 to continue only the best run from
the first phase until timeout.
Note that the multi-run strategy of restarting from scratch k times is a special
case obtained by choosing t1 = t/k and t2 = 0, and the single-run strategy corresponds to
k = 1; thus, it suffices to consider different parameter settings of the Bet-and-Run
strategy to also cover these two strategies.
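The two phases can be sketched as follows, assuming a hypothetical black-box solver interface with run(limit) and resume(state, limit) (both the interface and the minimization assumption are illustrative, not from the cited works):

```python
def bet_and_run(solver, k, t1, t):
    """Phase 1: k independent runs of length t1; Phase 2: continue only the
    best of them for the remaining t2 = t - k * t1 (minimization assumed)."""
    assert t1 <= t / k
    states = [solver.run(t1) for _ in range(k)]        # phase 1: k short runs
    best = min(states, key=lambda st: st.objective)    # bet on the best run
    return solver.resume(best, t - k * t1)             # phase 2: continue it
```

Choosing t1 = t/k (so t2 = 0) recovers the multi-run strategy, and k = 1 the single-run strategy, as noted above.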
Fischetti et al. [12] experimentally studied such a Bet-and-Run strategy for
mixed-integer programming. They explicitly introduce diversity in the starting
conditions of the used MIP solver (IBM ILOG CPLEX) by directly accessing
internal mechanisms. In their experiments, k = 5 performed best.
Recently, Friedrich et al. [13] investigated a comprehensive range of Bet-and-Run
strategies on the traveling salesperson problem and the minimum vertex
cover problem. Their best strategy was Restarts_40^1%, which in the first phase
does 40 short runs with a time limit that is 1% of the total time budget and
then uses the remaining 60% of the total time budget to continue the best run
of the first phase. They investigated the use of the universal sequence of Luby
et al. [3] as well, using various choices of t1; however, it turned out to be inferior.
A theoretical analysis is provided by Lissovoi et al. [14], who investigated
Bet-and-Run for a family of pseudo-Boolean functions, consisting of a plateau
and a slope, as an abstraction of real fitness landscapes with promising and
deceptive regions. The authors showed that a Bet-and-Run with non-trivial k
and t1 is necessary to find the global optimum efficiently. Also, they showed that
the choice of t1 is linked to properties of the function, and they provided a fixed-budget
analysis to guide the selection of the Bet-and-Run parameters to maximise
expected fitness after t = k · t1 + t2 fitness evaluations.
The goal of our present research is to address the two challenges encountered
in previous works: the need to set k and t1 in the case of Bet-and-Run, and the
general issue of inflexibility in previous approaches. Our framework can decide
online whether (i) the current run should be continued, (ii) the best run so far
should be continued, or (iii) a completely new run should be started.
4.1 Features
new run. One complication arises. Namely, for different instances, the objective
function values observed will generally operate on vastly different scales.
However, to learn strategies offline, we need to compute weights, and these need
to work with all kinds of instances. Consequently, rather than taking average
and projected objective function values at face value, we first normalize them.
In particular, we consider the three initial values (best found in the initial time
interval for the current and best runs, and the running average of the best found for all new runs)
and normalize them between 0 and 1. That is, we shift and scale these values in
such a way that the smallest will be 0, the largest will be 1, and the last will be
somewhere between 0 and 1. Analogously, we normalize the trajectory values.
On top of the six features thus computed, we also use the percentage of
overall time that has already elapsed, the percentage of overall time afforded in
the beginning where all we do is restart a new run every time, and the time a
new run will be given as a percentage of the total time left. In total we thus arrive at
nine features.
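The min-max normalization described above can be sketched as follows (an illustrative helper, not the authors' code):

```python
def normalize(values):
    """Shift and scale so the smallest value maps to 0 and the largest to 1;
    used for the objective-value features so that weights learned offline can
    transfer across instances whose objectives live on different scales."""
    lo, hi = min(values), max(values)
    if hi == lo:                     # degenerate case: all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```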
Now, to compute the score for each of the three possibilities (continue the current
run, continue the best run so far, and start a new run) we compute the following
function:
p_k(f) = 1 / (1 + exp(w_0^k + Σ_i f_i · w_i^k)),
where k ∈ {1, 2, 3} marks whether the function computes the score for continuing
the current run, continuing the best run, or starting a new run, and f ∈ R^9
is the feature vector that characterizes our search experience so far. Note that
p_k(f) ∈ (0, 1): the function approaches 0 when the weighted sum inside the
exponential goes to infinity, and approaches 1 when that sum approaches minus
infinity. Finally, note that we require a total of 30 weights to define the three
functions. These weights will be learned later by a parameter tuner to achieve
superior runtime behavior.
Given the weights w_i^k with k ∈ {1, 2, 3} and i ∈ {0, . . . , 9}, we can now define
the framework within which we can embed any black-box optimization solver.1
1 We say black-box because we do not need to know anything about the inner workings
of the solver. However, we make two assumptions. First, that we can set a time limit
at which the solver stops, and that we can add more time and continue the
interrupted computation later. Second, that whenever the solver stops, it reports
when it found the first solution, when it found the best solution so far,
and what the quality of the best solution found so far is.
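The scoring function and the proportional choice among the three options might be sketched as follows (hedged: the option labels and the weight layout, bias w_0 first followed by the nine feature weights, are illustrative assumptions):

```python
import math
import random

def score(weights, features):
    # p_k(f) = 1 / (1 + exp(w_0^k + sum_i f_i * w_i^k)); weights[0] is the
    # bias w_0^k and weights[1:] align with the nine features.
    z = weights[0] + sum(f * w for f, w in zip(features, weights[1:]))
    return 1.0 / (1.0 + math.exp(z))

def choose_option(all_weights, features):
    """Pick 'continue current run', 'continue best run', or 'start new run'
    with probability proportional to the three scores."""
    scores = [score(w, features) for w in all_weights]
    return random.choices(["current", "best", "new"], weights=scores, k=1)[0]
```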
After the first phase ends, we initialize the features based on the search
experience so far. Then, we enter the main phase. Based on the given weights and
the current features, we compute scores at each step for the three options for
continuing the computation. We then choose randomly, proportionally to
these scores, whether we continue the best run so far, continue the last run, or
begin a new run.
No matter which choice is made, we always keep the best solution found so far up to
date. When we choose to start a new run, we also update the running average of
the times it takes to find a first solution, as well as the incremental time interval
that results from this running average times the factor r. Finally, we update the
features and continue until the time has run out.
The last ingredient needed to apply this framework in practice is a method for
learning the weights w. Based on a training set of instances, we compute weights
that result in superior performance using the gender-based genetic algorithm tuner
GGA [19], following the same general approach for tuning hyper-parameterized
search methods as introduced in [17].
5 Experimental Analysis
We now present our numerical analysis. First, we briefly introduce the combinatorial
optimization problems, the solvers, and the instances used in our experiments.
Second, we describe our comprehensive data collection, which allows us
to conduct our investigations completely offline, that is, without the need to
run any additional experiments. Third, we present the results of our investigations,
which show the effectiveness of our online method.
5.4 Results
Following the training of Hyper on two thirds of the instances (per problem
domain), we are left with 38 of the 115 TSP instances and 28 of the 86 MVC
instances. We use these to compare the performance of the following investigated
approaches:
1. Single: the solver is run once with a random seed, allowing it to run for the
total time given;
2. Restarts: the solver is restarted from scratch whenever a preset time limit
is reached, and this loop is repeated until time is up;
3. Luby: restarts based on the fixed Luby sequence [3], where one Luby time
unit is based on five times the time the first run needs to produce the first
solution;
4. Bet-and-Run: the previously described bet-and-run strategy by Friedrich
et al. [13];
5. Hyper: our trained hyper-parameterized bet-and-run restart strategy, as
described above.
We will analyze the outcomes using several criteria. First, we compare the
performance gaps achieved with respect to the optimal solution possible within
the time budget.2 Second, we consider the number of times an approach is able
to find the best possible solution. Third, we compare the amount of time needed
in order to compute the nal results.
To start off, Tables 1 and 2 show the results of the individual solvers across
the sets of 38 and 28 instances. Note that we are using the problem domain
names TSP and MVC instead of the respective solvers to facilitate reading.
We observe that the number of times the best possible solution is found
increases with increasing time budget. Note that this is not automatic, as the best
possible solution is itself defined per time limit! The
fact that the relative gap nevertheless decreases is therefore a reflection of the fact
that the best restart can actually find the best solution rather quickly. With
increasing time limits, the restarted approaches thus have more of a buffer to find
this best quality solution as well.
Next, we find that Single and Restarts are clearly outperformed by the
other three approaches across both problem domains and across all total time
budgets. On TSP, Hyper achieves less than half the performance gap of Bet-and-Run
when the total time budget is only 100 s. This advantage for Hyper
becomes more and more pronounced as the budget increases to 5,000 s. For this time
limit, Hyper has a six-times lower average gap than Bet-and-Run, which
is a marked improvement. At the same time, Bet-and-Run can find the best
solutions in only 67% of the runs, whereas Hyper's success rate is 84%.
MVC can be seen as a little more challenging in our setting, as the computation
time budgets were rather short and FastVC encountered significant
initialization times on some of the large instances. As a consequence, the number
of times where no solution has been produced by the various approaches is higher
than for TSP; however, this number decreases with increasing time budget.
On MVC, Hyper and Bet-and-Run are really close in terms of average performance
gap; however, there is an advantage for Hyper in the number of times the
best possible solutions are found. In practice this is still a substantial improvement.
2 This best possible solution is the best solution provided within the given time limit
by any of the 10,000 runs we conducted.
118 S. Kadioglu et al.
Table 1. TSP results. Shown are time in seconds and the performance gap from the best
possible solution within the respective time limit. "solutions" and "no solutions" refer
to the number of times the approach has produced any solution at all. "best found" lists
the number of times the best possible solution was found given 380 runs (38 instances ×
10 independent runs). Highlighted in dark blue and light blue are the best and second-
best average approaches.
Single
  budget   solutions   no solutions   best found   avg. gap   avg. time
  100      380         0              234          0.1415     12
  200      378         2              239          0.1368     21
  500      380         0              266          0.0885     95
  1000     380         0              266          0.0877     105
  2000     380         0              266          0.0762     165
  5000     380         0              266          0.0596     290

Restarts
  budget   solutions   no solutions   best found   avg. gap   avg. time
  100      380         0              252          0.0689     21
  200      380         0              255          0.0618     35
  500      380         0              259          0.0519     61
  1000     380         0              261          0.0474     98
  2000     380         0              261          0.0457     154
  5000     380         0              258          0.0435     268

Luby
  budget   solutions   no solutions   best found   avg. gap   avg. time
  100      380         0              296          0.0274     19
  200      380         0              299          0.0189     32
  500      380         0              309          0.0135     75
  1000     380         0              317          0.0108     127
  2000     380         0              318          0.0090     229
  5000     380         0              322          0.0070     476

Bet-and-Run
  budget   solutions   no solutions   best found   avg. gap   avg. time
  100      380         0              244          0.0487     5
  200      380         0              245          0.0473     6
  500      380         0              246          0.0444     8
  1000     380         0              248          0.0436     13
  2000     380         0              251          0.0429     22
  5000     380         0              256          0.0419     49

Hyper
  budget   solutions   no solutions   best found   avg. gap   avg. time
  100      380         0              295          0.0216     15
  200      380         0              302          0.0142     26
  500      380         0              307          0.0132     57
  1000     380         0              307          0.0090     87
  2000     380         0              319          0.0077     178
  5000     380         0              321          0.0066     322
Learning a Reactive Restart Strategy to Improve Stochastic Search 119
[Fig. 1: pie charts comparing Hyper against Single, Restarts, Luby, and Bet-and-Run
on the quality gap, per time budget, for TSP and MVC.]
Fig. 1. Statistical comparison of Hyper with the other approaches using the Wilcoxon
rank-sum test (significance level p = 0.05). The approaches are compared based on the
quality gap to the best possible solution (smaller is better).
The colors have the following meaning: green indicates that Hyper is statistically
better, red indicates that Hyper is statistically worse, light gray indicates that both
performed identically, and dark gray indicates that the differences were statistically
insignificant. We have chosen pie charts on purpose because they allow for a quick
qualitative comparison of results. (Color figure online)
Table 2. MVC results. Shown are time in seconds and the performance gap from the best
possible solution within the respective time limit. "solutions" and "no solutions" refer
to the number of times the approach has produced any solution at all. "best found"
lists the number of times the respective best possible solution has been found given
280 runs (28 instances × 10 independent runs). Highlighted in dark blue and light blue
are the best and second-best average approaches.
Single
  budget   solutions   no solutions   best found   avg. gap   avg. time
  5        211         69             74           0.1097     3
  10       223         57             76           0.4558     5
  20       254         26             80           0.6181     9
  50       264         16             98           0.2273     19

Restarts
  budget   solutions   no solutions   best found   avg. gap   avg. time
  5        228         52             76           0.1111     3
  10       252         28             82           0.4140     6
  20       268         12             88           0.6128     12
  50       278         2              101          0.1802     25

Luby
  budget   solutions   no solutions   best found   avg. gap   avg. time
  5        228         52             80           0.1064     3
  10       252         28             91           0.3907     6
  20       268         12             91           0.5767     11
  50       278         2              114          0.1032     23

Bet-and-Run
  budget   solutions   no solutions   best found   avg. gap   avg. time
  5        228         52             65           0.0800     3
  10       252         28             79           0.3328     5
  20       268         12             90           0.4721     9
  50       278         2              105          0.0390     18

Hyper
  budget   solutions   no solutions   best found   avg. gap   avg. time
  5        228         52             75           0.0781     3
  10       252         28             87           0.3309     5
  20       268         12             104          0.4710     9
  50       278         2              119          0.0385     19
Interestingly, our results differ from [13], where Luby-based restarts per-
formed not as well as Restarts, whereas in our study Bet-and-Run is out-
performed by Luby on TSP. This might be due to a different approach to
setting t_init and because we use a larger instance set for TSP. Independently of
this, Hyper outperforms both.
Table 3. One-sided Wilcoxon rank-sum test to test whether the quality gap distri-
bution of Hyper is shifted to the left of that of the other approaches. Shown are the
p-values.
6 Conclusion
We introduced the idea of learning reactive restart strategies for combinato-
rial search algorithms. We compared this new approach (Hyper) with other
approaches, among them a very recent Bet-and-Run approach that had been
assessed comprehensively on TSP and MVC instances. Across both domains,
Hyper resulted in markedly better average solution qualities, and it exhibited
significantly increased rates of hitting the best possible solution.
As the investigated problem domains are structurally very different, we
expect our approach to generalize to other problem domains as well, such as
continuous and multi-objective optimization problems.
Future work will focus on the development of other runtime features as a
basis for making restart decisions.
References
1. Gomes, C.P., Selman, B., Crato, N., Kautz, H.A.: Heavy-tailed phenomena in satisfiability and constraint satisfaction problems. J. Autom. Reason. 24(1), 67–100 (2000)
2. Hoos, H.H.: Stochastic local search - methods, models, applications. Ph.D. thesis, TU Darmstadt (1998)
3. Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. Inf. Process. Lett. 47(4), 173–180 (1993)
4. Wu, H., van Beek, P.: On universal restart strategies for backtracking search. In: Bessiere, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 681–695. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74970-7_48
5. Audemard, G., Simon, L.: Refining restart strategies for SAT and UNSAT. In: Milano, M. (ed.) CP 2012. LNCS, vol. 7514, pp. 118–126. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33558-7_11
6. Martí, R.: Multi-start methods. In: Glover, F., Kochenberger, G.A. (eds.) Handbook of Metaheuristics, pp. 355–368 (2003)
7. Lourenço, H.R., Martin, O.C., Stützle, T.: Iterated local search: framework and applications. In: Gendreau, M., Potvin, J.Y. (eds.) Handbook of Metaheuristics. International Series in Operations Research & Management Science, vol. 146, pp. 363–397. Springer, Boston (2010). doi:10.1007/978-1-4419-1665-5_12
8. Schoenauer, M., Teytaud, F., Teytaud, O.: A rigorous runtime analysis for quasi-random restarts and decreasing stepsize. In: Hao, J.-K., Legrand, P., Collet, P., Monmarché, N., Lutton, E., Schoenauer, M. (eds.) EA 2011. LNCS, vol. 7401, pp. 37–48. Springer, Heidelberg (2012). doi:10.1007/978-3-642-35533-2_4
9. Biere, A.: Adaptive restart strategies for conflict driven SAT solvers. In: Kleine Büning, H., Zhao, X. (eds.) SAT 2008. LNCS, vol. 4996, pp. 28–33. Springer, Heidelberg (2008). doi:10.1007/978-3-540-79719-7_4
10. Huang, J.: The effect of restarts on the efficiency of clause learning. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 2318–2323 (2007)
11. Lalla-Ruiz, E., Voß, S.: Improving solver performance through redundancy. J. Syst. Sci. Syst. Eng. 25(3), 303–325 (2016)
12. Fischetti, M., Monaci, M.: Exploiting erraticism in search. Oper. Res. 62(1), 114–122 (2014)
13. Friedrich, T., Kötzing, T., Wagner, M.: A generic bet-and-run strategy for speeding up stochastic local search. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 801–807 (2017)
14. Lissovoi, A., Sudholt, D., Wagner, M., Zarges, C.: Theoretical results on bet-and-run as an initialisation strategy. In: Genetic and Evolutionary Computation Conference (GECCO) (2017, accepted for publication)
15. Stützle, T., López-Ibáñez, M.: Automatic (offline) configuration of algorithms. In: Genetic and Evolutionary Computation Conference (GECCO), pp. 795–818 (2016)
16. Bezerra, L.C.T., López-Ibáñez, M., Stützle, T.: Automatic component-wise design of multiobjective evolutionary algorithms. IEEE Trans. Evol. Comput. 20(3), 403–417 (2016)
17. Ansótegui, C., Pon, J., Tierney, K., Sellmann, M.: Reactive dialectic search portfolios for MaxSAT. In: AAAI Conference on Artificial Intelligence (2017, accepted for publication)
18. Kadioglu, S., Sellmann, M.: Dialectic search. In: Gent, I.P. (ed.) CP 2009. LNCS, vol. 5732, pp. 486–500. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04244-7_39
19. Ansótegui, C., Sellmann, M., Tierney, K.: A gender-based genetic algorithm for the automatic configuration of algorithms. In: Gent, I.P. (ed.) CP 2009. LNCS, vol. 5732, pp. 142–157. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04244-7_14
20. Hartigan, J.A.: Bounding the maximum of dependent random variables. Electron. J. Stat. 8(2), 3126–3140 (2014)
21. Polacek, M., Doerner, K.F., Hartl, R.F., Kiechle, G., Reimann, M.: Scheduling periodic customer visits for a traveling salesperson. Eur. J. Oper. Res. 179, 823–837 (2007)
22. Applegate, D.L., Bixby, R.E., Chvátal, V., Cook, W.J.: The Traveling Salesman Problem: A Computational Study. Princeton University Press, Princeton (2011)
23. Applegate, D.L., Cook, W.J., Rohe, A.: Chained Lin-Kernighan for large traveling salesman problems. INFORMS J. Comput. 15(1), 82–92 (2003)
24. Cook, W.: The Traveling Salesperson Problem: Downloads (Website) (2003). http://www.math.uwaterloo.ca/tsp/concorde/downloads/downloads.htm. Accessed 21 Dec 2016
25. Reinelt, G.: TSPLIB - a traveling salesman problem library. ORSA J. Comput. 3(4), 376–384 (1991). Instances: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp/. Accessed 21 Dec 2016
26. Gomes, F.C., Meneses, C.N., Pardalos, P.M., Viana, G.V.R.: Experimental analysis of approximation algorithms for the vertex cover and set covering problems. Comput. Oper. Res. 33(12), 3520–3534 (2006)
27. Abu-Khzam, F.N., Langston, M.A., Shanbhag, P., Symons, C.T.: Scalable parallel algorithms for FPT problems. Algorithmica 45(3), 269–284 (2006)
28. Cai, S.: Balance between complexity and quality: local search for minimum vertex cover in massive graphs. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 747–753 (2015). Code: http://lcs.ios.ac.cn/~caisw/MVC.html. Accessed 21 Dec 2016
29. Cai, S., Su, K., Luo, C., Sattar, A.: NuMVC: an efficient local search algorithm for minimum vertex cover. J. Artif. Intell. Res. 46(1), 687–716 (2013)
30. Ansótegui, C., Malitsky, Y., Samulowitz, H., Sellmann, M., Tierney, K.: Model-based genetic algorithms for algorithm configuration. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 733–739 (2015)
Efficient Adaptive Implementation of the Serial
Schedule Generation Scheme Using
Preprocessing and Bloom Filters
1 Introduction
job. The only requirement for π is to respect the precedence relations; otherwise
SSGS produces a feasible schedule for any permutation π of jobs. The pseudo-
code of SSGS is given in Algorithm 1, and its two subroutines find and update
in Algorithms 2 and 3.
Commonly, the objective of RCPSP is to find a schedule that minimises
the makespan, i.e. the time required to complete all jobs; however, other objec-
tive functions are also considered in the literature. We say that an objective
function of a scheduling problem is regular if advancing the start time of a
job cannot worsen the solution's objective value. Typical objective functions of
RCPSP, including makespan, are regular. If the scheduling problem has a reg-
ular objective function, then SSGS is guaranteed to produce active solutions, i.e.
solutions that cannot be improved by changing t_j for a single j ∈ J. Moreover,
it was shown [8] that for any active schedule S there exists a permutation π
for which SSGS will generate S. Since any optimal solution is active, search-
ing in the space of feasible permutations is sufficient to solve the problem.
2.1 Initialisation of A
The initialisation of A in line 1 of Algorithm 1 iterates through T slots, where T is
the upper bound on the makespan. It was noted in [1] that instead of initialising
128 D. Karapetyan and A. Vernitski
A at every execution of SSGS, one can reuse this data structure between the
executions. To correctly initialise A, at the end of SSGS we restore A_{t,r} for each
r ∈ R and each slot where some job was scheduled: A_{t,r} ← c_r for r ∈ R and
t = 1, 2, ..., M, where M is the makespan of the solution. Since M ≤ T and
usually M is much smaller than T, this notably improves the performance of SSGS [1].
The function find(j, t_0, I, A) finds the earliest slot feasible for scheduling job
j. Its conventional implementation (Algorithm 2) takes O(T |R|) time, where T
is the upper bound of the time horizon. Our enhanced implementation of find
(Algorithm 4), first proposed in [1], has the same worst-case complexity but is
more efficient in practice. It is inspired by the Knuth-Morris-Pratt substring
search algorithm. Let t_j be the assumed starting time of job j. To verify if it is
feasible, we need to test sufficiency of resources in slots t_j, t_j + 1, ..., t_j + d_j − 1.
Unlike the conventional implementation, our enhanced version tests these slots
in reversed order. The order makes no difference if the slot is feasible, but
otherwise testing in reversed order allows us to skip some slots; in particular, if
slot t is found to have insufficient resources, then we know that the earliest feasible t_j is at
least t + 1.
A further speed-up, which was not discussed in the literature before, is to
avoid re-testing slots with sufficient resources. Consider the point when we
find that the resources in slot t are insufficient. By that time we know that the
resources in t + 1, t + 2, ..., t_j + d_j − 1 are sufficient. Our heuristic is to remember
that the earliest slot t_test to be tested in future iterations is t_j + d_j.
paper, we also use a non-hash-based approach, and to our knowledge, our paper
is the first in which the structure of Bloom filters is chosen dynamically accord-
ing to the statistical properties of the data, with the purpose of improving the
speed of an optimisation algorithm.
In general, Bloom filters can be defined as a way of using data, and they
are characterised by two aspects: first, all data is represented by short binary
arrays of a fixed length (perhaps with a loss of accuracy); second, the process of
querying data involves only bitwise comparison of binary arrays (which makes
querying data very fast).
We represent both each job's resource consumption and the resource availability
at each time slot by binary arrays of a fixed length; we call these binary arrays
Bloom filters. Our Bloom filters consist of bits which we call resource level
bits. Each resource bit, denoted u_{r,k}, r ∈ R, k ∈ {1, 2, ..., c_r}, represents k units
of resource r (see details below). Let U be the set of all possible resource bits.
A Bloom filter structure is an ordered subset L ⊆ U; see Fig. 2 for an example.
Suppose that a certain Bloom filter structure L is fixed. Then we can introduce
B^L(j), the Bloom filter of job j, and B^L(t), the Bloom filter of time slot t, for
each j and t. Each B^L(j) and B^L(t) consists of |L| bits defined as follows: if u_{r,k}
is the i-th element of L then

    B^L(j)_i = 1 if v_{j,r} ≥ k, and 0 otherwise;
    B^L(t)_i = 1 if A_{t,r} ≥ k, and 0 otherwise.
Fig. 2. Example of a Bloom filter structure for a problem with 3 resources, each having
capacity 4.
The length |L| of a Bloom filter is limited to reduce space requirements and,
more importantly for our application, to speed up Bloom filter tests. Note that if
|L| is small (such as 32 or 64 bits) then we can exploit efficient bitwise operators
implemented by all modern CPUs; then each Bloom filter test takes only one
CPU operation. We set |L| = 32 in our implementation. While obeying this
constraint, we aim at minimising the number of false positives, because false
positives slow down the implementation.
Fig. 3. Example of a Bloom filter structure for a problem with 3 resources, each having
capacity 4, with L = U (the twelve bits u_{1,1}, ..., u_{1,4}, u_{2,1}, ..., u_{3,4}).
where D_k^r is the probability that a randomly chosen job needs exactly k units
of resource r, and E_k^r is the probability that a certain slot, when we examine
it for scheduling a job, has exactly k units of resource r available. The proba-
bility distribution D^r is produced from the RCPSP instance data during pre-
processing.¹ The probability distribution E^r is obtained empirically during the
run of SSGS_data (see Sect. 2.2); each time resource sufficiency is tested within
SSGS_data, its availability is recorded.
While a positive result of a Bloom filter test generally requires further verification
using the full data, in some circumstances its correctness can be guaranteed. In
particular, if for some r ∈ R and j ∈ J we have u_{r,k} ∈ L and v_{j,r} = k, then the
Bloom filter result, whether positive or negative, does not require verification.
Another observation is that updating B^L(t) in update can be done in O(|R_j|)
operations instead of O(|R|) operations. Indeed, instead of computing B^L(t)
from scratch, we can exploit our structure of Bloom filters. We update each bit
related to resources r ∈ R_j but keep the other bits intact. With some trivial
pre-processing, this requires only O(|R_j|) CPU operations.
We also note that if |R_j| = 1, i.e. job j uses only one resource, then Bloom
filters will not speed up the find function for that job and, hence, in such cases
we use the standard find function specialised for one resource (see Sect. 2.2).
¹ In the multi-mode extension of RCPSP, this distribution depends on the selected modes and
hence needs to be obtained empirically, similarly to how we obtain E^r.
to be the most appropriate approach, as the input data may vary significantly
between executions of the algorithm. Our case is different in that the most
crucial input data (the RCPSP instance) does not change between executions
of SSGS. Thus, during the first few executions of SSGS, we can test how each
implementation performs, and then select the faster one. This is a simple yet
effective control mechanism which we call Hybrid.
Hybrid is entirely transparent for the metaheuristic; the metaheuristic simply
calls SSGS whenever it needs to evaluate a candidate solution and/or generate
a schedule. The Hybrid control mechanism then intelligently decides each
time which implementation of SSGS to use, based on information learnt during
previous runs.
An example of how Hybrid performs is illustrated in Fig. 4. In the first exe-
cution, it uses SSGS_data to collect the data required for both SSGS_BF and SSGS_NBF. For
the next few executions, it alternates between SSGS_BF and SSGS_NBF, measuring
the time each of them takes. During this stage, Hybrid counts how many times
SSGS_BF was faster than the next execution of SSGS_NBF. Then we use the sign
test [4] to compare the implementations. If the difference is significant (we use
a 5% significance level for the sign test), then we stop alternating the implemen-
tations and in future use only the faster one. Otherwise we continue alternating
the implementations, but for at most 100 executions. (Without such a limitation,
there is a danger that the alternation will never stop if the implementations
perform similarly; since there are overheads associated with the alternation and
time measurement, it is better to pick one of the implementations and use it in
future executions.)
Fig. 4. Stages of the Hybrid control mechanism. Each square shows one execution of
SSGS, and the text inside describes which implementation of SSGS is used. SSGS_data
is always used in the first execution. The next few executions (at most 100) alternate
between SSGS_BF and SSGS_NBF, with each execution being timed. Once the sign test shows
a significant difference between the SSGS_BF and SSGS_NBF implementations, the faster one
is used for the rest of the executions. After 10,000 executions, previously collected data is
erased and adaptation starts from scratch.
134 D. Karapetyan and A. Vernitski
Our decision to use the sign test is based on two considerations: first, it is very
fast, and second, it works for distributions which are not normal. This makes
our approach different from [12], where the distributions of runtimes are assumed
to be normal. (Note that in our experiments we observed that the distribution
of running times of an SSGS implementation resembles a Poisson distribution.)
As pointed out in this and the previous sections, the optimal choices of parameters of
the SSGS implementations mostly depend on the RCPSP instance, which does
not change throughout the metaheuristic run; however, the solution also affects the
performance. It should be noted, though, that metaheuristics usually apply only
small changes to the solution at each iteration, and hence solution properties tend
to change relatively slowly over time. Consequently, we assume that parameters
chosen in one execution of SSGS are likely to remain efficient for some further
executions. Thus, Hybrid restarts every 10,000 executions, by which we mean
that all the internal data collected by SSGS is erased and learning starts from
scratch; see Fig. 4. This periodicity of restarts is a compromise between accuracy
of choices and overheads, and it was shown to be practical in our experiments.
5 Empirical Evaluation
of that feature. The rest of the features (or generator parameters) are then set
as follows: number of jobs 120, number of resources 4, maximum duration of a
job 10, network complexity 1, resource factor 0.75, and resource strength 0.1.
These values correspond to some typical settings used in PSPLIB. For formal
definitions of the parameters we refer to [11].
For each combination of the instance generator settings, we produce 50
instances using different random generator seed values, and in each of our exper-
iments the metaheuristic solves each instance once. The runtime of SSGS is then
taken as the overall time spent on solving those 50 instances, over the 50,000,000
SSGS executions in total. The metaheuristic overheads are
relatively small and are ignored.
From the results reported in Fig. 5 one can see that our implementations of
SSGS are generally significantly faster than SSGS_conv, but the performance of each
implementation varies with the instance features. In some regions of the instance
space SSGS_BF outperforms SSGS_NBF, whereas in other regions SSGS_NBF outperforms
SSGS_BF. The difference in running times is significant, up to a factor of two in our
experiments. At the same time, Hybrid is always close to the best of SSGS_BF and
SSGS_NBF, which shows the efficiency of our algorithm selection approach. In fact, when
SSGS_BF and SSGS_NBF perform similarly, Hybrid sometimes outperforms both; this
behaviour is discussed below.
Another observation is that SSGS_NBF is always faster than SSGS_conv (always
below the 100% mark), which is not surprising; indeed, SSGS_NBF improves the per-
formance of both find and update. In contrast, SSGS_BF is sometimes slower than
SSGS_conv; on some instances, the speed-up of the find function is outweighed
by overheads in both find and update. Most important, though, is that Hybrid
outperforms SSGS_conv in each of our experiments, by 8 to 68%, averaging 43%.
In other words, within a fixed time budget, an RCPSP metaheuristic employ-
ing Hybrid will be able to run around 1.8 times more iterations than if it used
SSGS_conv.
To verify that Hybrid exhibits adaptive behaviour and does not just stick
to whichever implementation has been chosen initially, we recorded the imple-
mentation it used in every execution for several problems; see Fig. 6. For this
experiment, we produced three instances: the first instance has standard parameters
except that Resource Strength is 0.2; the second instance has standard parameters except
that Resource Factor is 0.45; the third instance has standard parameters except that Maxi-
mum Job Duration is 20. These parameter values were selected such that the two
SSGS implementations would be competitive and, therefore, switching between
them would be a reasonable strategy. One can see that the switches occur several
times throughout the run of the metaheuristic, indicating that Hybrid adapts to
the changes of the solution. For comparison, we disabled the adaptiveness and mea-
sured the performance when only the implementation chosen initially is used throughout
all iterations; the results are shown in Fig. 6. We conclude that Hybrid benefits
from its adaptiveness.
[Fig. 5: plots of the relative runtime (%) of the SSGS implementations against instance
features, including the number of jobs (up to 1,000) and the number of resources (2 to 10).]
Fig. 5. These plots show how the performance of the SSGS implementations depends on
various instance features. The vertical axis gives the runtime of each implementation rela-
tive to SSGS_conv. (The SSGS_conv graph would be a horizontal line y = 100%.)
[Fig. 6: timeline of 1,000,000 SSGS executions for three instances, showing the alternation
between SSGS_BF and SSGS_NBF; Hybrid's relative times are 99%, 96%, and 95%.]
Fig. 6. This diagram shows how Hybrid switches between SSGS implementations while
solving three problem instances. The number on the right shows the time spent by
Hybrid compared with the time that would be needed if only the implementation
chosen at the start were used for all iterations.
While we have only discussed SSGS for the simple RCPSP, our ideas can
easily be applied to RCPSP extensions. We expect some of these ideas to work
particularly well in multi-project RCPSP, where the overall number of resources
is typically large but only a few of them are used by each job.
References
1. Asta, S., Karapetyan, D., Kheiri, A., Özcan, E., Parkes, A.J.: Combining Monte-Carlo and hyper-heuristic methods for the multi-mode resource-constrained multi-project scheduling problem. Inf. Sci. 373, 476–498 (2016)
2. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
3. Broder, A., Mitzenmacher, M.: Network applications of Bloom filters: a survey. Internet Math. 1(4), 485–509 (2004)
4. Cohen, L., Holliday, M.: Practical Statistics for Students: An Introductory Text. Paul Chapman Publishing Ltd., London (1996)
5. Guo, H.: Algorithm selection for sorting and probabilistic inference: a machine learning-based approach. Ph.D. thesis, Kansas State University (2003)
6. Kayaturan, G.C., Vernitski, A.: A way of eliminating errors when using Bloom filters for routing in computer networks. In: Fifteenth International Conference on Networks, ICN 2016, pp. 52–57 (2016)
7. Kim, J.-L., Ellis Jr., R.D.: Comparing schedule generation schemes in resource-constrained project scheduling using elitist genetic algorithm. J. Constr. Eng. Manag. 136(2), 160–169 (2010)
8. Kolisch, R.: Serial and parallel resource-constrained project scheduling methods revisited: theory and computation. Eur. J. Oper. Res. 90(2), 320–333 (1996)
9. Kolisch, R., Hartmann, S.: Heuristic algorithms for the resource-constrained project scheduling problem: classification and computational analysis. In: Węglarz, J. (ed.) Project Scheduling, pp. 147–178. Springer, Boston (1999). doi:10.1007/978-1-4615-5533-9_7
10. Kolisch, R., Hartmann, S.: Experimental investigation of heuristics for resource-constrained project scheduling: an update. Eur. J. Oper. Res. 174(1), 23–37 (2006)
11. Kolisch, R., Sprecher, A.: PSPLIB - a project scheduling problem library: OR software - ORSEP operations research software exchange program. Eur. J. Oper. Res. 96(1), 205–216 (1997)
12. Lau, J., Arnold, M., Hind, M., Calder, B.: Online performance auditing: using hot optimizations without getting burned. SIGPLAN Not. 41(6), 239–251 (2006)
13. Tarkoma, S., Rothenberg, C.E., Lagerspetz, E.: Theory and practice of Bloom filters for distributed systems. IEEE Commun. Surv. Tutor. 14(1), 131–155 (2012)
Interior Point and Newton Methods in Solving
High Dimensional Flow Distribution Problems
for Pipe Networks
1 Introduction
In this paper, the optimal (steady-state) flow distribution problem in a pipe network is
considered. From a mathematical point of view, this is a sparse large-scale
convex optimization problem.
Several approaches to finding a steady-state solution exist. A comprehensive sur-
vey of methods is presented in [6]. Our paper uses an approach similar to [2], but
also includes sparse matrix techniques in order to efficiently solve high dimen-
sional flow distribution problems.
The aim of this paper is the application of new techniques for solving the con-
sidered problem.
2 Problem Statement
Let us consider a pipe network given by its incidence matrix
A ∈ R^{(n+1)×m}, where m is the number of oriented edges and n + 1 is the number
c Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 139149, 2017.
https://doi.org/10.1007/978-3-319-69404-7_10
140 O.O. Khamisov and V.A. Stennikov
of nodes. The vector of nodal flow rates Q ∈ R^{n+1}, with Σ_{i=1}^{n+1} Q_i = 0, has components
Q_i > 0 for sources, Q_i < 0 for sinks, and Q_i = 0 for the connection nodes.
The flow distribution problem has the following statement [5]:

    Ax = Q,    (2)

where x is the vector of unknown flows on the edges that have to be found: x_i > 0
if the flow coincides with the direction of edge i, otherwise x_i < 0. The function F
is a twice differentiable convex function, which will be described below.
System (2) describes the first Kirchhoff law. Since A is an incidence matrix,
we have rank A = n; therefore it is possible to exclude one arbitrary row and
get the equivalent system

    Ax = Q,    (3)

where A ∈ R^{n×m} and Q ∈ R^n are the new matrix and right-hand-side vector
obtained by this exclusion. This is done to avoid working with a singular matrix.
The objective function is defined in the following way [2,5]:

    F(x) = Σ_{i=1}^{m} ( S_i |x_i|^{β_i} / β_i − H_i x_i ),    (4)
For problem (1) with the linear constraints (3), the Newton Method can be
applied under certain conditions, which are considered below.
If a line contains a pump, x_i must be nonnegative. Let I_1 ⊂ {1, ..., m} be the
set of lines with pumps and without flow limitations. Additionally, flow values on
some lines i ∈ I_2 ⊂ {1, ..., m}, which can also contain pumps, must be bounded,
with upper bound ᾱ_i > 0. Note that I_1 ∩ I_2 = ∅. Therefore, the following inequality
constraints on the flows have to be taken into consideration:

    x_i ≥ 0, i ∈ I_1,    α_i ≤ x_i ≤ ᾱ_i, i ∈ I_2.    (5)
3 Newton Method
Let us consider problem (1) and (3). The Lagrange function for this system has
the following form:
Due to the convexity of the objective function, equivalence of the Lagrange func-
tion gradient in x to zero is a sucient minimum condition. Therefore Newton
Method is applied to solve the equation
L
= 0.
x
At step (k + 1) of the Newton Method, the vector x^{k+1} is calculated according to
the following formula [8]:

    (x^{k+1}; λ^{k+1}) = (x^k; λ^k) − [∇²L(x^k, λ^k)]^{−1} ∇L(x^k, λ^k),

where

    ∇L(x, λ) = ( ∇F(x) + Aᵀλ ;  Ax − Q ),
    ∇²L(x, λ) = ( ∇²F(x)  Aᵀ ;  A  0 ).
The scaled system takes the form A xˢ = Qˢ, with correspondingly scaled flow
constraints xᵢˢ for i ∈ I_1 and i ∈ I_2, and with Fˢ(xˢ) denoting the scaled
objective. Numerical experiments show the following. When the scaling factors
are set to 0.001 and 0.1, all constants S_i, H_i and Q_i are approximately of the
same order. In this case we obtain an essential acceleration in computations,
as can be seen in Table 1.
Ω = {x | x_i ≥ ε_{i,1}, i ∈ I_1;  x_i ∈ [α_i + ε_i, ᾱ_i − ε_i], ε_i = ε_{i,2}(ᾱ_i − α_i), i ∈ I_2},    (8)

where ε_{i,1} > 0 for i ∈ I_1 and ε_{i,2} > 0 for i ∈ I_2. This set is chosen so that every
point of it strictly satisfies (5). The obtained projection is taken as the initial point
for the interior (feasible) point search and the Interior Point Method. The algorithm
is formalized as Algorithm 5. The results of the algorithm's work are presented in
Table 3; its comparison with the use of the Interior Point Method is given in Table 4.
8 Numerical Results

The algorithms were coded in C++. The results of numerical experiments are given
in Tables 1, 2, 3 and 4. Computations were made on a PC with an Intel Core i7 /
2.4 GHz / 16 GB.
Here the following notations are used: n denotes the number of nodes in the
system (rows in A) and m the number of edges in the system (columns in A); the
column "Scaling" indicates whether x is scaled according to (7) or not; "Iter." is
the number of iterations; "CG iter." is the number of Conjugate Gradient iterations
averaged over all Newton Method steps; "IPS iter." is the number of iterations of
the interior (and feasible) point search (Algorithm 2); "Time" is the time in
seconds; "NZ (%)" is the number of nonzero entries relative to all entries in the
matrices A and L^k, where L^k is the Cholesky factor obtained from B^k (the
nonzero structures of L^k and B^k are always the same at any iteration, since only
the diagonal matrix D^k changes); "B^k calc." is the number of evaluations of
B^k and numerical factorizations; "Total time" is the full computational time.
In Table 1 the problem without inequality constraints (5) is considered. Therefore,
the Newton Method by itself is sufficient to find an optimal solution (theoretically
it can fail because of the matrix being indefinite or singular, but such cases
did not occur in the numerical experiments). Additionally, the Cholesky factorization
can be done only once, due to the fact that the matrices D^k and B^k are the same for
Interior Point and Newton Methods in Solving High Dimensional Flow 147
Table 1. Problem without inequality constraints (5) with and without scaling
Table 2. Problem without inequality constraints (5). Newton Method and Dikin Interior Point Method
all iterations. As can be seen, scaling (7) allows the number of iterations to be
reduced for both the Newton Method and the Interior Point Method.
In Table 2 the Newton Method and the Interior Point Method with scaling are
compared. As can be seen, the Interior Point Method works much slower, because
the density of L^k is higher than the density of A; therefore, the Cholesky
decomposition together with the solution of triangular systems with L^k requires
more computations than the computations with A.
In Table 3 problem (1), (3), (5) is considered and the combined method
(Algorithm 5) is used. First, the Newton Method is applied to find a solution of
problem (1), (3). Then the obtained point is projected onto the shrunk set, as
described in Sect. 7 (here ε_1 = 10, ε_2 = 0.3). Since in the interior point search
and in the Interior Point Method the matrix B^k changes every several iterations,
the number of evaluations of B^k is presented in the column "B^k calc." both for
the interior point search and for the Interior Point Method.
In Table 4 problem (1), (3), (5) is considered. The combined method
(Algorithm 5) is compared with the usage of the Interior Point Method only. As
can be seen, with the growth of the problem size, the performance difference
increases in favor of the combined method.
Table 4. Problem with inequality constraints (5). Solution with the Newton Method
and without it
9 Conclusion
References
1. Dikin, I.I.: Interior Point Method in Linear and Nonlinear Programming. Krasand,
Moscow (2010). (in Russian)
2. Novitskiy, N.N., Dikin, I.I.: Calculation of feasible pipeline network operating con-
ditions by the interior-point method. Bull. Russ. Acad. Sci. Energy (5) (2003). (in
Russian)
3. Dikin, I.I.: Iterative solution of problems of linear and quadratic programming.
Sov. Math. Dokl. 8, 674–675 (1967)
4. Vanderbei, R.J.: Linear Programming: Foundations and Extensions, 4th edn.
Springer, Heidelberg (2014)
5. Merenkov, A.P., Khasilev, V.Y.: Theory of Hydraulic Networks. Nauka, Moscow
(1985). (in Russian)
6. Farhat, I.A., El-Hawary, M.E.: Optimization methods applied for solving the short-
term hydrothermal coordination problem. Electr. Power Syst. Res. 79, 1308–1320
(2009)
7. Nocedal, J., Wright, S.: Numerical Optimization. Springer, Heidelberg (2006)
8. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press,
Cambridge (2004)
9. Van der Vorst, H.A.: Iterative Krylov Methods for Large Linear Systems. Cam-
bridge Monographs on Applied and Computational Mathematics, vol. 13, 2nd edn.
Cambridge University Press, Cambridge (2003)
10. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadel-
phia (2003)
11. Pissanetsky, S.: Sparse Matrix Technology. Academic Press, New York (1984)
12. Davis, T.A.: Direct Methods for Sparse Linear Systems. SIAM, Philadelphia
(2006)
13. Gilbert, J.R., Ng, E.G., Peyton, B.W.: An Efficient Algorithm to Compute Row and
Column Counts for Sparse Cholesky Factorization. Oak Ridge National Laboratory
(1992)
Hierarchical Clustering and Multilevel
Refinement for the Bike-Sharing Station
Planning Problem
1 Introduction
Many large cities around the world have already built bike sharing systems
(BSS), and many more are considering introducing one or extending an existing
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 150–165, 2017.
https://doi.org/10.1007/978-3-319-69404-7_11
Hierarchical Clustering and Multilevel Refinement for the BSSPP 151
one. These systems consist of rental stations around the city or a certain part
of it where customers can rent and return bikes. A rental station has a specific
number of parking slots where a bike can be taken from or returned to. In
contrast to bike-rental systems, BSSs encourage a short-term usage of bikes. As
bikes are typically returned at a different station than they have been taken
from, a need for active rebalancing arises, as the demand for bikes to rent and
parking slots to return bikes is not equally distributed among the stations.
Finding a good combination of station locations and building these stations
in the right size is crucial when planning a BSS, as these stations directly
determine the satisfied customer demand in terms of bike trips, the arising
rebalancing effort, and the resulting fixed and variable costs. Stations close
to public transport, business parks, or large housing developments will likely face
a high demand, whereas stations in more sparsely inhabited areas will probably
face a lower demand. However, the station density and the connectedness of the
actual regions to be covered also play crucial roles. A solitary station that is far
from any other station will most likely not fulfill much demand. Moreover, a
clever choice of station locations might also exploit the natural demands and
customer flows in order to keep the rebalancing effort and associated costs
reasonable.
As BSSs are usually implemented in rather large cities, the problem of finding
optimal locations for rental stations and sizing these stations appropriately
is challenging and hardly manageable manually. Thus, there is a need for
computational techniques supporting this decision-making. Besides fixed costs
for building the system, an integrated approach should also estimate maintenance
and rebalancing costs over a certain time horizon such that the overall costs
for the operator can be approximated more precisely. It is further important to
consider the customer demands in a time-dependent way because there usually
exist a morning peak and an afternoon peak due to commuters, people going to
work, and students. Between these peaks, the demand of the system is usually
somewhat lower. We refer to this problem as the Bike Sharing Station Planning
Problem (BSSPP). The objective we consider here is to determine, for a specified
total-cost budget and a separate fixed-cost budget, a selection of locations where
rental stations of an also to be determined size should be erected in order to
maximize the actually fulfilled customer demand.
In this work, we first concentrate on how to efficiently model the BSSPP such
that we can also deal with very large instances with thousands of considered
geographical cells for customers and potential station locations. To this end we
propose to utilize a hierarchical clustering to express the estimated potential
customer demand on it. We will then describe a linear programming (LP) based
method to evaluate candidate solutions, and finally present a first novel multilevel
refinement heuristic (MLR), based on mixed integer linear programming (MIP),
to approach the optimization problem.
In Sect. 2 we discuss related work. Section 3 defines the BSSPP formally, also
introducing the hierarchical clustering. Sections 3.3 and 3.4 describe LP models
for determining the actually fulfilled customer demand for a candidate solution
and estimating the required rebalancing effort, respectively. The MLR is then
152 C. Kloimüllner and G.R. Raidl
2 Related Work

There already exists some work that tries to find optimal station locations for
BSSs, although mostly considering different aspects. To the best of our knowledge,
Yang et al. [12] were the first who considered the problem in 2010. They
relate the problem to hub location problems, a special variant of the well-known
facility location problem, and propose a mathematical model for it. The considered
objective is to minimize the walking distance of prospective customers,
fixed costs, and a penalty for uncovered demands. The authors solve the problem
by a heuristic approach in which a first part of the algorithm tries to identify the
locations of rental stations and a second, inner part tries to find shortest paths
between origin and destination pairs. The authors illustrate their approach by a
small example consisting of 11 candidate cells for bike stations.
Lin et al. [6] propose a mixed integer non-linear programming model, solve
a small example instance with 11 candidate stations by the commercial solver
LINGO, and furthermore provide a sensitivity analysis. Martinez et al. [8]
develop approaches for a case study within Lisbon having 565 prospective
candidate stations. They propose a hybrid approach consisting of a heuristic part
utilizing a mixed integer linear programming (MIP) formulation. Locations as
well as the fleet dimension are optimized, e-bikes are also considered, and
rebalancing requirements are estimated.
Lin et al. [7] propose a heuristic algorithm for solving the hub location inventory
problem arising in the BSSPP. They do not only optimize station locations,
but their algorithm also identifies where to build bike lanes. As a subproblem
they have to determine the travel patterns of the customers, i.e., solve a flow
problem for a given configuration. They illustrate their approach on a small
example consisting of 11 candidate locations for stations. Saharidis et al. [9]
propose a MIP formulation which minimizes unmet demands and walking distance
for prospective customers. They test their approach in a case study for the city
center of Athens having 50 candidate cells for stations. Chen et al. [1] provide a
mathematical non-linear programming model and solve the problem utilizing an
improved immune algorithm. They define three different types of rental stations
depending on their location (e.g., near a metro station or supermarkets). Their
aim is that stations in residential areas have enough bikes available such that
the morning peak can be managed and that stations near metro lines or important
places have enough free parking slots available to manage incoming bikes
during the morning peak. They provide a case study for a particular metro line
of Nanjing city including 10 district stations and 31 residential stations. In [2]
Chen and Sun aim at satisfying a given demand and minimizing the travel times
of the users. The authors propose an integer programming model which they
solve with the LINGO solver. A computational analysis is provided on a small
example. Frade et al. [3] describe an approach for a case study of the city of
Coimbra, Portugal. They present a compact MIP model which they solve using
the XPRESS solver. Their objective is to maximize the demand covered by the
BSS under budget constraints. They also include the net revenue in their
mathematical model, which reduces the costs incurred by building the BSS. Their
single test instance consists of only 29 cells, or traffic zones, as they call them.
Hu et al. [5] also present a case study for a BSS along a metro line. They aim at
minimizing the total costs incurred by building particular BSS stations. In their
computational study they consider three scenarios, each consisting of ten possible
station candidates. They solve the proposed MIP model by the LINGO solver.
Last but not least, Gavalas et al. [4] summarize diverse algorithmic approaches
for the design and management of vehicle-sharing systems.
We conclude that all previous works on computational optimization
approaches for designing BSSs only consider rather small scenarios. Most
previous work accomplishes the optimization with compact mathematical models
that are directly solved by a MIP solver. Such methods, however, are clearly
unsuited for tackling large realistic scenarios of cities with up to 2000 cells or
more. In the following, we therefore propose a novel multilevel refinement
heuristic based on a hierarchical clustering of the demand data.
3 Problem Formalization
The considered geographical area is partitioned into cells. Let S be the set of
cells where a BSS station may potentially be located (station cells), and let V be
the set of cells where some positive travel demand (outgoing, ingoing, or both)
from prospective customers of the BSS exists (customer cells).
To handle such a large number of cells effectively, we consider a hierarchical
abstraction as crucial in order to represent and model the further data in a
meaningful and relatively compact form. To this end, we expect a hierarchical
clustering of all customer cells V as input.
This hierarchical clustering is given in the form of a rooted tree with the
inner nodes corresponding to clusters and the leaves corresponding to the cells.
All cells have the same depth, which is equal to the height of the tree, denoted by
h. Let C = C_0 ∪ · · · ∪ C_h be the set of all tree nodes, with C_d corresponding to
the subset of nodes at depth d = 0, . . . , h. C_0 = {0} contains only the root node
0 representing the single cluster with all cells, while C_h = V. Let super(p) ∈ C
be the immediate predecessor (parent cluster) of a node p ∈ C \ C_0, and
sub(p) ⊆ C be the set of immediate successors (children) of a cluster p ∈ C \ C_h.
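This tree structure with super(·) and sub(·) can be sketched as follows (a minimal illustration with a hypothetical parent-map encoding, not part of the paper):

```python
# Hypothetical encoding of the hierarchical clustering: each node maps to its
# parent; the leaves (customer cells) all lie at the same depth h.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}  # root 0; leaves 3..6; h = 2

def super_(p):
    """Immediate predecessor (parent cluster) of node p."""
    return parent[p]

def sub(p):
    """Immediate successors (children) of cluster p."""
    return [q for q, par in parent.items() if par == p]

def depth(p):
    return 0 if p == 0 else 1 + depth(parent[p])

# The customer cells V are exactly the nodes without children.
V = [p for p in [0, *parent] if not sub(p)]
print(V)
```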
As the travel demand of potential users varies over time, we are given a (small)
set of periods T = {1, . . . , τ} for a typical day for which the planning shall
be done. The estimated existing travel demand occurring in each period t ∈ T
from/to any cell v ∈ V is given by a weighted directed graph G^t = (C, A^t).
All relevant outgoing travel demand at a cell v is represented by outgoing arcs
(v, p) ∈ A^t with p ∈ C and corresponding values (weights) d^t_{v,p} > 0, i.e., (v, p)
represents all expected trips from v to any cell represented by p in period t that
might ideally be satisfied, and d^t_{v,p} indicates the expected number of these trips.
Moreover, for each time period t ∈ T we are given its duration, denoted by t_period,
and a global parameter τ_rent which defines the average duration of
a single trip performed by some user of the BSS.
The following conditions must hold to keep this graph as compact and
meaningful as possible: the target node p of an arc (v, p) must not be a predecessor
of v in the cluster tree. Self-loops (v, v), however, are allowed and important
to model demand where the destination corresponds to the origin. Arcs
representing a negligible demand, i.e., below a certain threshold, shall be avoided.
Consequently, if there is an arc (v, p), no further arc (v, q) is allowed to any node
q being a successor or a predecessor of p.
All estimated ingoing travel demand for each cell v ∈ V is given
correspondingly by arcs (p, v) ∈ A^t with p ∈ C and demand values d^t_{p,v} > 0, and
corresponding conditions must hold.
Furthermore, it is an important property that ingoing and outgoing demands
have to be consistent. Let us denote by V(p) the subset of all cells from V
contained in cluster p ∈ C, i.e., the leaves of the subtree rooted at p, and by
C(p) the subset of all nodes q ∈ C that are part of the subtree rooted at p,
including p and V(p). For any p ∈ C \ V it must hold that

\sum_{(v,q) ∈ A^t | v ∈ V(p), q ∉ C(p)} d^t_{v,q} ≥ \sum_{(q,v) ∈ A^t | q ∈ C(p), v ∉ V(p)} d^t_{q,v} (1)

and

\sum_{(q,v) ∈ A^t | q ∉ C(p), v ∈ V(p)} d^t_{q,v} ≥ \sum_{(v,q) ∈ A^t | v ∉ V(p), q ∈ C(p)} d^t_{v,q}. (2)
Condition (1) ensures that the total demand originating at the leaves of the subtree
rooted at p and leading to a destination outside of the subtree is never less than
the total ingoing demand at all cells outside the subtree originating from some
cluster inside the subtree. Condition (2) provides the symmetric condition for the
total ingoing demand at the leaves of the subtree. Furthermore, for the root node
p = 0, inequalities (1) and (2) must hold with equality.
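These consistency conditions can be checked on a toy instance as follows (all data hypothetical: root 0, clusters 1 and 2, cells 3..6; two trips 3→5 and 4→5 whose outgoing side is aggregated to cluster 2 and whose ingoing side is aggregated to cluster 1):

```python
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
cells = {3, 4, 5, 6}

def subtree(p):
    """C(p): all nodes of the subtree rooted at p."""
    out, changed = {p}, True
    while changed:
        changed = False
        for q, par in parent.items():
            if par in out and q not in out:
                out.add(q)
                changed = True
    return out

# Demand arcs for one period (hypothetical weights).
d = {(3, 2): 2.0, (4, 2): 1.0, (1, 5): 3.0}

def consistent(p):
    """Check conditions (1) and (2) for cluster p."""
    C = subtree(p)
    V = C & cells
    out1 = sum(w for (a, b), w in d.items() if a in V and b not in C)
    in1 = sum(w for (a, b), w in d.items()
              if a in C and b in cells and b not in V)
    in2 = sum(w for (a, b), w in d.items() if a not in C and b in V)
    out2 = sum(w for (a, b), w in d.items()
               if a in cells and a not in V and b in C)
    return out1 >= in1 and in2 >= out2

print(all(consistent(p) for p in (0, 1, 2)))
```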
For each customer cell v ∈ V, we are given a (typically small) set S(v) ⊆ S
of station cells in the vicinity by which v's demand may be (partly) fulfilled.
Furthermore, let a_{v,s} ∈ (0, 1], v ∈ V, s ∈ S(v), be an attractiveness value
indicating the expected proportion of demand from v (ingoing as well as outgoing)
that can at most be fulfilled with a sufficiently sized station at s. These
attractiveness values will be determined primarily based on the walking distances
among the stations (the value will typically decrease roughly exponentially with
the distance), but in general an arbitrary distance decay model can be used. If there
is a one-to-one correspondence of cells in V and S, then for each v ∈ V, v ∈ S(v)
and a_{v,v} = 1 will typically hold.
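A simple instance of such a distance decay model might look as follows (the exponential form and the decay constant are assumptions for illustration; the paper leaves the concrete model open):

```python
import math

def attractiveness(walk_dist_m, decay_m=120.0):
    """Hypothetical distance-decay model yielding a_{v,s} in (0, 1]:
    roughly exponentially decreasing with the walking distance in meters."""
    return math.exp(-walk_dist_m / decay_m)

# The station in v's own cell has full attractiveness a_{v,v} = 1.
print(attractiveness(0.0))
```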
For the costs of building a station we consider here only a (strongly)
simplified linear model, but we distinguish fixed costs for building the station and
initially buying the bikes, variable costs for maintaining the station and the
respective bikes, and costs for performing the rebalancing. Let b_fix and b_var be
the average fixed and variable costs per bike slot, and let b_reb be the average
costs for rebalancing one bike per day over the whole planning horizon. The fixed
costs for a station in cell s ∈ S with x_s slots are then fixcost(s) = b_fix x_s, and
the total costs are totalcost(s) = b_fix x_s + b_var x_s + b_reb Q_x(s), where Q_x(s)
denotes an estimate of the number of bikes that need to be redistributed from
station s to some other station. We assume here that the size of each station,
i.e., the number of its slots, can be freely chosen from 0 (i.e., no station is built)
up to some maximum cell-dependent capacity z_s ∈ ℕ. The determination of the
rebalancing effort for a given candidate solution will be described in Sect. 3.4.
We remark that this cost model is only a first, very rough estimate. Considering
location-dependent costs, costs for building a station that are independent of
the number of slots, and a more restricted selection of station sizes is left for
future research.
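This linear cost model can be written down directly (a sketch; the function name is ours, and the default constants are the estimates reported later in Sect. 5):

```python
def station_costs(x_s, Q_s, b_fix=1750.0, b_var=1000.0, b_reb=1095.0):
    """fixcost and totalcost for a station with x_s slots and an estimated
    Q_s rebalanced bikes per day over the planning horizon (EUR)."""
    fixcost = b_fix * x_s
    totalcost = fixcost + b_var * x_s + b_reb * Q_s
    return fixcost, totalcost

# A 10-slot station from which 2.5 bikes per day must be redistributed:
print(station_costs(10, 2.5))
```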
We assume that a total budget B^tot_max is given, as well as a budget
B^fix_max < B^tot_max for the sum of all fixed costs only; both must not be
exceeded in a feasible solution.
3.2 Objective

The goal is to maximize the expected total number of journeys in the system,
i.e., the total demand that actually can be fulfilled each day over all time
periods, considering the available budgets B^tot_max and B^fix_max.
Let D(x, t) be the total demand fulfilled by solution x in time period t ∈ T,
and let Q_x(s) be the required rebalancing effort arising at each station s ∈ S with
x_s ≠ 0 in terms of the number of bikes to be moved to some other station. The
calculation of these values will be considered separately in Sects. 3.3 and 3.4.
The BSSPP can then be stated as the following MIP.
max \sum_{t ∈ T} D(x, t) (3)

s.t. \sum_{s ∈ S} (b_fix x_s + b_var x_s + b_reb Q_x(s)) ≤ B^tot_max (4)

\sum_{s ∈ S} b_fix x_s ≤ B^fix_max (5)

x_s ∈ {0, . . . , z_s} ∀s ∈ S (6)
Inequality (4) calculates the total costs over all stations and ensures that the
total budget is not exceeded, while inequality (5) restricts the fixed costs
over all stations by the respective budget.
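To make the interplay of the two budgets concrete, here is a brute-force toy version of model (3)-(6) with stand-in demand and rebalancing functions (all data hypothetical; the paper evaluates real candidate solutions via the LP models of Sects. 3.3 and 3.4 and solves the model with MIP techniques):

```python
from itertools import product

# Two candidate station cells with at most 3 slots each (hypothetical data).
stations = ["a", "b"]
z = {"a": 3, "b": 3}
b_fix, b_var, b_reb = 2.0, 1.0, 0.5
B_tot, B_fix = 14.0, 9.0
Q = lambda x: 0.2 * sum(x.values())        # stand-in rebalancing effort Q_x
D = lambda x: min(2 * x["a"], 5) + x["b"]  # stand-in fulfilled demand D(x)

best, best_x = -1.0, None
for xa, xb in product(range(z["a"] + 1), range(z["b"] + 1)):
    x = {"a": xa, "b": xb}
    fixed = b_fix * (xa + xb)                        # cf. (5)
    total = fixed + b_var * (xa + xb) + b_reb * Q(x)  # cf. (4)
    if fixed <= B_fix and total <= B_tot and D(x) > best:
        best, best_x = D(x), x
print(best_x, best)
```

The fixed-cost budget alone would allow four slots in one cell, but the saturating demand makes spreading them across both cells optimal here.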
To determine the overall fulfilled demand for a specific given solution x and
a certain time slot t ∈ T, we first make the following local definitions. Let
S′ = {s ∈ S | x_s ≠ 0} correspond to the set of cells where a station actually
is located, and let V′ = {v ∈ V | S(v) ∩ S′ ≠ ∅} be the set of customer cells whose
demand can possibly (partly) be fulfilled, as at least one station exists in the
neighborhood. Moreover, let C′ = {p ∈ C | V(p) ∩ V′ ≠ ∅} be the set of all nodes
in the hierarchical clustering representing relevant customer cells, i.e., cells whose
demand can possibly be fulfilled. The set S′(v) = S(v) ∩ S′, v ∈ V′, refers to
the existing stations that might fulfill part of v's demand, and V′(p) = V(p) ∩
V′, p ∈ C′, denotes the existing customer cells contained in cluster p. C′(p)
refers to the subset of all nodes q ∈ C′ that are part of the subtree rooted at
p, including p and V′(p), and G′ = (C′, A′) with A′ = {(p, q) ∈ A^t | p, q ∈ C′} is
then the correspondingly reduced demand graph.
In the following we use the variables u, v, w for referencing customer cells in V′,
the variables p, q for referencing cluster nodes in C′ (which might possibly also be
customer cells), the variable s for station cells in S′, and α, β for arbitrary nodes in
C′ ∪ S′.
We further define for each arc in A′ corresponding to a specific demand an
individual flow network depending on the kind of the arc:
on each other and the stations' capacities. A weighting factor ω is used to adjust
the number of trips which can be performed in time period t by using only a
single bike. The following LP is used to compute the total satisfied demand:
D(x, t) = max \sum_{(v,p) ∈ A′ | v ∈ V′} \sum_{(v,s) ∈ A_f^{v,p}} f_{v,s}^{v,p} (7)

s.t. \sum_{(v,s) ∈ A_f^{v,p}} f_{v,s}^{v,p} ≤ d_{v,p}^t ∀(v, p) ∈ A′ | v ∈ V′ (8)

\sum_{(s,v) ∈ A_f^{p,v}} f_{s,v}^{p,v} ≤ d_{p,v}^t ∀(p, v) ∈ A′ | v ∈ V′ (9)

f_{u,s}^{u,v} = \sum_{s′ ∈ S′(v)} f_{s,s′}^{u,v} ∀(u, v) ∈ A′ | u, v ∈ V′, s ∈ S′(u) (10)

\sum_{s′ ∈ S′(u)} f_{s′,s}^{u,v} = f_{s,v}^{u,v} ∀(u, v) ∈ A′ | u, v ∈ V′, s ∈ S′(v) (11)

f_{v,s}^{v,p} = f_{s,p}^{v,p} ∀(v, p) ∈ A′ | v ∈ V′, p ∈ C′ \ V′, s ∈ S′(v) (12)

f_{p,s}^{p,v} = f_{s,v}^{p,v} ∀(p, v) ∈ A′ | v ∈ V′, p ∈ C′ \ V′, s ∈ S′(v) (13)

x_s ≥ \sum_{(p,q) ∈ A′} \sum_{(α,s) ∈ A_f^{p,q}} f_{α,s}^{p,q} − \sum_{(p,q) ∈ A′} \sum_{(s,β) ∈ A_f^{p,q}} f_{s,β}^{p,q} + \frac{ω τ_rent}{t_period} \sum_{(p,q) ∈ A′} \sum_{(α,s) ∈ A_f^{p,q}} f_{α,s}^{p,q} ∀s ∈ S′ (14)

x_s ≥ \sum_{(p,q) ∈ A′} \sum_{(s,β) ∈ A_f^{p,q}} f_{s,β}^{p,q} − \sum_{(p,q) ∈ A′} \sum_{(α,s) ∈ A_f^{p,q}} f_{α,s}^{p,q} + \frac{ω τ_rent}{t_period} \sum_{(p,q) ∈ A′} \sum_{(s,β) ∈ A_f^{p,q}} f_{s,β}^{p,q} ∀s ∈ S′ (15)

H_p^{out} = \sum_{(q,v) ∈ A′ | q ∈ C′(p) \ V′, v ∉ V′(p)} \sum_{(q,s) ∈ A_f^{q,v}} f_{q,s}^{q,v} ∀p ∈ C′ \ V′ (16)

H_p^{in} = \sum_{(v,q) ∈ A′ | v ∉ V′(p), q ∈ C′(p) \ V′} \sum_{(s,q) ∈ A_f^{v,q}} f_{s,q}^{v,q} ∀p ∈ C′ \ V′ (17)
0 ≤ f_{s,v}^{p,v} ≤ a_{v,s} d_{p,v}^t ∀(p, v) ∈ A′ | v ∈ V′, (s, v) ∈ A_f^{p,v} (25)

0 ≤ f_{α,β}^{p,q} ≤ d_{p,q}^t ∀(p, q) ∈ A′, (α, β) ∈ A_f^{p,q} | α, β ∉ V′ (26)

F_p^{in}, F_p^{out} ≥ 0 ∀p ∈ C′ \ V′ (27)

H_p^{in}, H_p^{out} ≥ 0 ∀p ∈ C′ \ V′ (28)
Objective function (7) maximizes the total outgoing flow over all v ∈ V′, i.e.,
the fulfilled demand. Note that this also corresponds to the total ingoing flow
over all v. Inequalities (8) limit the total flow leaving v ∈ V′, for each demand
(v, p) ∈ A′ | v ∈ V′, to d^t_{v,p}. Inequalities (9) do the same w.r.t. ingoing demands.
Equalities (10) and (11) provide the flow conservation at the source and destination
stations s for (u, v) ∈ A′ with u, v ∈ V′. Equalities (12) provide the flow
conservation at the source station in case of an arc (v, p) ∈ A′ towards a cluster node
p, while (13) provide the flow conservation at the destination station in case
of an arc (p, v) ∈ A′ originating at a cluster node p. Inequalities (14) and (15)
provide the capacity limitations at each station s ∈ S′: the capacity must cover the
accumulated demand occurring at the particular station, including a compensation
term for large values of ingoing as well as outgoing demand. The fraction t_period/τ_rent
represents the number of trips which can ideally be performed in period t using
a single bike. The weighting factor ω is used to adjust this value such that it
better reflects reality, as in the real world the bike trips are not likely to be
distributed optimally over the whole time period. Equalities (16) compute the
total outgoing flow from the leaves of the subtree rooted at p to any cluster which
is not part of the subtree rooted at p. Equalities (17) compute the total ingoing
flow for each cluster node p by considering the ingoing flow from any v ∈ V′ for
which p is not a predecessor into every cluster of the subtree rooted at p.
Inequalities (18) ensure that there must not be more ingoing flow to clusters of the
subtree rooted at p than there is outgoing flow from the leaves contained in the
subtree rooted at p. Equality (19) ensures that at the top level, i.e., at the root
node 0, the outgoing flow from leaf nodes to cluster nodes and the ingoing flow
from cluster nodes to leaf nodes are balanced, i.e., of the same amount.
Inequalities (21)–(23) state the corresponding constraints for the outgoing flow
instead of the ingoing flow. Equations (24) and (25) provide the domain definitions
for the flow variables from/to a cell v to/from a neighboring station s by considering
the demand weighted by the factor a_{v,s}. For all remaining flow variables, (26)
provide the domain definitions based on the demands. The remaining variables
are just restricted to be non-negative in (27) and (28).
Q_x(s) = min \sum_{t ∈ T} (r^+_{t,s} + r^−_{t,s}) (29)

s.t. y_{t,s} + r^+_{t,s} ≥ D^acc_{t,s} ∀t ∈ T (30)

x_s ≥ y_{t,s} − D^acc_{t,s} − r^−_{t,s} ∀t ∈ T (31)

y_{t+1,s} = y_{t,s} − D^acc_{t,s} + r^+_{t,s} − r^−_{t,s} ∀t ∈ T \ {τ} (32)

y_{1,s} = y_{τ,s} − D^acc_{τ,s} + r^+_{τ,s} − r^−_{τ,s} (33)

0 ≤ y_{t,s} ≤ x_s ∀t ∈ T (34)

0 ≤ r^+_{t,s} ≤ D^acc_{t,s} ∀t ∈ T (35)

0 ≤ r^−_{t,s} ≤ D^acc_{t,s} ∀t ∈ T (36)
Objective function (29) minimizes the number of rebalanced bikes, i.e., the
number of bikes r^+_{t,s} that have to be delivered and the number of bikes r^−_{t,s}
that have to be picked up. Inequalities (30) compute the number of bikes that
have to be delivered.
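The stock recursion underlying this model can be illustrated by a greedy single-station sketch (it produces a feasible, not necessarily minimal, rebalancing plan; the paper computes the minimum by solving the LP, and D^acc here is treated as a given net-demand value per period):

```python
def greedy_rebalancing(x_s, y_start, d_acc):
    """Feasible rebalancing plan for one station with x_s slots:
    d_acc[t] is the accumulated net demand of period t; bikes are delivered
    (r_plus) or picked up (r_minus) so that 0 <= y <= x_s always holds."""
    y, moved = y_start, 0.0
    for d in d_acc:
        r_plus = max(0.0, d - y)         # bikes that must be delivered
        r_minus = max(0.0, y - d - x_s)  # bikes that must be picked up
        y = y - d + r_plus - r_minus     # stock recursion, cf. (32)
        moved += r_plus + r_minus
    return moved, y

moved, y_end = greedy_rebalancing(x_s=10, y_start=5, d_acc=[8, -6, 4, -2])
print(moved, y_end)
```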
Clearly, practical instances of the problem are far too large to be approached
by a direct exact MIP approach. However, basic constructive techniques
or metaheuristics with simple, classical neighborhoods are also unlikely to yield
reasonable results when making decisions on a low level without considering crucial
relationships on higher abstraction levels, i.e., a more global view. Classical local
search techniques on the natural variable domains concerning decisions for
individual stations may only fine-tune a solution but are hardly able to overcome
bad solutions in which larger regions need to be either supplied with new stations
or where many stations need to be removed. We therefore have a strong
need for a technique that also exploits a higher-level view, deciding for larger
areas about the supply of stations in principle. Multilevel refinement strategies
can provide this point of view.
In multilevel refinement strategies [11] the whole problem is iteratively
coarsened (aggregated) until a certain problem size is reached that can be reasonably
handled by some exact or heuristic optimization technique. After obtaining a
solution at this highest abstraction level, the solution is iteratively extended to
the previous lower-level problem instance and possibly refined by some local
search, until a solution to the problem at the lowest level, i.e., the original
problem instance, is obtained. For a general discussion and the generic
framework we refer to the work of Walshaw [10].
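The generic scheme can be sketched as follows (toy stand-ins for the four problem-specific procedures; this is not the paper's MLR, only the control flow it instantiates):

```python
def multilevel_refine(problem, small_enough, coarsen, solve, extend, refine):
    """Generic multilevel refinement: coarsen until small, solve the coarsest
    instance, then extend and locally refine level by level."""
    levels = [problem]
    while not small_enough(levels[-1]):
        levels.append(coarsen(levels[-1]))      # P^0, P^1, ..., P^L
    solution = solve(levels[-1])                # initialization at coarsest level
    for lvl in reversed(levels[:-1]):
        solution = refine(lvl, extend(lvl, solution))
    return solution

# Toy instance: the "problem" is a list of unit demands; coarsening merges pairs.
demo = multilevel_refine(
    problem=list(range(16)),
    small_enough=lambda p: len(p) <= 4,
    coarsen=lambda p: [p[i:i + 2] for i in range(0, len(p), 2)],
    solve=lambda p: p,                           # trivial coarse "solution"
    extend=lambda lvl, sol: [x for pair in sol for x in pair],
    refine=lambda lvl, sol: sol,                 # no-op local search
)
print(len(demo))
```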
To apply multilevel refinement to the BSSPP we essentially have to decide how
to realize the procedures for coarsening an instance into the next higher level,
solving a reasonably small instance, and extending a solution to a solution at
the next lower level. In the following, we denote all problem instance data at
level l by an additional superscript l. By P^l we generally refer to the problem at
level l of the MLR algorithm described here.
4.1 Coarsening

We have to derive the more abstract problem instance P^{l+1} from a given instance
P^l. Naturally, we can exploit the already existing customer cell cluster hierarchy
for the coarsening. Remember that all customer cells always appear in the cluster
hierarchy at the same level. We coarsen the problem by considering the
customer cells and the station cells separately.
Coarsening of Customer Cells. The main strategy for coarsening the customer
cells is to merge cells having the same parent cluster together with their parent.
This means V^{l+1} = C_{h−l−1}, i.e., each cluster node at depth h − l − 1
corresponds to a customer cell at level l + 1 representing the merged set of customer
nodes contained in it. The hierarchical clustering of P^{l+1} becomes
C^{l+1} = C_0 ∪ · · · ∪ C_{h−l−1}. Remember that we already defined
the function super(p) to return the parent cluster of some node p; therefore
super(p^l) : C^l → C^{l+1} also returns the cluster of C^{l+1} into which cluster
p^l ∈ C^l is merged. The new demand graph G^{t,l+1} = (C^{l+1}, A^{t,l+1}) consists
of the arc set A^{t,l+1} = \bigcup_{(p^l,q^l) ∈ A^{t,l}} {(super(p^l), super(q^l))}. This
demand graph may again contain self-loops, but it is still simple, i.e., multiple arcs
from A^{t,l} may map to the same single arc in A^{t,l+1}, and the respective
demand values are merged.
Considering an arc (p^{l+1}, q^{l+1}) ∈ A^{t,l+1}, its associated demand is thus

d^{t,l+1}_{p^{l+1}, q^{l+1}} = \sum_{(p^l, q^l) ∈ A^{t,l} | p^{l+1} = super(p^l), q^{l+1} = super(q^l)} d^{t,l}_{p^l, q^l}. (37)
Note that the conditions for a valid demand graph and valid demand values
stated in inequalities (1) and (2) still hold when aggregating in this way,
since the total ingoing and outgoing demand at each cluster p ∈ C^{l+1} (including
the demands from and to all existing subnodes) stays the same.
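The aggregation (37) amounts to summing weights over all arcs whose endpoints map to the same parent clusters, e.g. (hypothetical toy data, with `sup` playing the role of super(·)):

```python
from collections import defaultdict

sup = {3: 1, 4: 1, 5: 2, 6: 2, 1: 0, 2: 0}        # parent map of the tree
d_l = {(3, 5): 2.0, (3, 6): 1.0, (4, 5): 0.5}     # level-l demand values

# Merge demands of arcs whose endpoints share parents, cf. (37); this may
# also create self-loops when both endpoints map to the same cluster.
d_next = defaultdict(float)
for (p, q), w in d_l.items():
    d_next[(sup[p], sup[q])] += w

print(dict(d_next))
```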
S^{l+1}(v^{l+1}) = {s^{l+1} ∈ \bigcup_{v^l ∈ sub(v^{l+1})} super(S^l(v^l)) | a_{v^{l+1}, s^{l+1}} > γ} (38)
4.2 Initialization

The initial problem is coarsened until we reach some level l where it can
be reasonably solved, as it is then small enough. In our experiments with binary
clustering trees we stop the coarsening when the clustering tree
has no more than 2^5 = 32 leaf nodes, or in other words, at a height of five.
For initializing the solution at the coarsest level we utilize a MIP model. In
this model, the objective stated in Sect. 3.2, the demand calculation for every
time period stated in Sect. 3.3, and the rebalancing LP model stated in Sect. 3.4
are put together. By solving this model we obtain an optimal solution for the
coarsest level, which forms the basis for proceeding with the next step of the
algorithm, the extension, to derive more detailed solutions step by step.
4.3 Extension

In the extension step we derive from a solution x^{l+1} at level l + 1 a solution x^l
at level l, i.e., we have to decide for each aggregated station s^{l+1} ∈ S^{l+1} with
x^{l+1}_{s^{l+1}} > 0 slots how these slots should be realized by the respective
underlying station cells sub(s^{l+1}) at level l. We do this in a way such that the
globally fulfilled demand is again maximized by solving the following MIP.
max \sum_{t ∈ T} D(x^l, t) (40)

s.t. \sum_{s^l ∈ S^l} (b_fix x^l_{s^l} + b_var x^l_{s^l} + b_reb Q_{x^l}(s^l)) ≤ B^tot_max (41)

\sum_{s^l ∈ S^l} b_fix x^l_{s^l} ≤ B^fix_max (42)

\sum_{s^l ∈ sub(s^{l+1})} x^l_{s^l} ≤ x^{l+1}_{s^{l+1}} ∀s^{l+1} ∈ S^{l+1} (43)

x^l_{s^l} ∈ {0, . . . , z^l_{s^l}} ∀s^l ∈ S^l (44)
The objective (40) maximizes the total satisfiable demand. Inequality (41)
restricts the maximum total budget, whereas inequality (42) restricts the maximum
fixed budget. Inequalities (43) bound the total number of slots over the station
nodes s^l ∈ sub(s^{l+1}). The number of parking slots x^l_{s^l} in each cell is
restricted by the maximum number of parking slots allowed in this cell (44).
5 Computational Results

For our experiments we created seven different benchmark sets¹, each one
containing 20 different random instances. We consider instances with 200, 300, 500,
800, 1000, 1500, and 2000 customer cells, where each customer cell is also a
possible location for a station to be built. Customer cells are aligned on a grid in

¹ https://www.ac.tuwien.ac.at/files/resources/instances/bsspp/lion17.bz2
the plane and Euclidean distances have been calculated, based on which a hierarchical
clustering with the complete-linkage method was computed. Demands
among the leaf nodes were chosen randomly, considering the pairwise distance
between customer cells, and demands below a certain threshold have been aggregated
upwards in the clustering tree such that the demand graphs get sparser.
Only cells within 200 m walking distance are considered to be in the vicinity of
a customer cell, and the respective attractiveness values are chosen randomly but in
correlation with the distances. We set the maximum station size to z_s = 40 for
all cells in all test cases. For slot costs we set b_fix = 1750 € and b_var = 1000 €,
which are reasonable estimates in the Vienna area gathered from real BSSs. The
cost for rebalancing a single bike has been estimated at 3 € per bike and per day.
When projecting this cost to the optimization horizon, e.g., one year, we get
b_reb = 365·3 = 1095 €. For the coarsening of attractiveness values,
we set the corresponding parameter to 0, and for adjusting the number of trips
which can be performed in a particular time period t ∈ T by using only a single
bike we set the respective parameter to 1.2. Each instance contains four time periods which we selected
as follows: 4:30 am to 8:00 am, 8:00 am to 12:00 noon, 12:00 noon to 6:15 pm,
and 6:15 pm to 4:30 am. The duration of each time period t ∈ T has been set
accordingly, and the average trip duration has been set to t_rent = 10 min.
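The complete-linkage preprocessing used to build the station hierarchy can be sketched in a few lines. This naive agglomeration over Euclidean distances is a stand-in for the authors' implementation, with invented names:

```python
import math

def complete_linkage(points, n_clusters):
    # Each cluster starts as one customer cell; repeatedly merge the two
    # clusters with the smallest complete-linkage distance (the maximum
    # pairwise Euclidean distance between their members).
    clusters = [[i] for i in range(len(points))]

    def d(a, b):
        return math.dist(points[a], points[b])

    def linkage(ca, cb):
        return max(d(a, b) for a in ca for b in cb)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dij = linkage(clusters[i], clusters[j])
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

The quadratic inner search is deliberately simple; for the instance sizes used here (up to 2000 cells) a production implementation would use a distance-matrix update scheme instead.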
All algorithms are implemented in C++ and have been compiled with gcc 4.8.
For solving the LPs and MIPs we used Gurobi 7.0. All experiments were executed
as single threads on an Intel Xeon E5540 2.53 GHz Quad Core processor.
Table 1 summarizes the obtained results. For every instance set we state the
name containing the number of nodes, the number of different instances we
have tested on (#runs), the maximum total budget (B^tot_max), and the maximum
fixed budget (B^fix_max). For the proposed MLR, we list the average objective value
(obj), i.e., the expected fulfilled demand in terms of the number of journeys, the
average number of coarsening levels (#coarsen), the median time (time), and
the average total costs (totcost) as well as the average fixed costs (fixcost) for
building the number of slots in the solution. Most importantly, it can be seen
that the proposed MLR scales very well to large instances of up to 2000 customer
cells.
Instance                                            MLR
Name        #runs  B^tot_max [€]  B^fix_max [€]  obj        #coarsen  time [s]  totcost [€]   fixcost [€]
BSSPP 200     20     200,000.00     130,000.00    9,651.98     3         46.2      198,000.00    126,000.00
BSSPP 300     20     350,000.00     250,000.00   10,951.79     5         60.8      349,250.00    222,250.00
BSSPP 500     20     500,000.00     350,000.00   16,057.78     6        121.6      497,750.00    316,750.00
BSSPP 800     20     850,000.00     550,000.00   28,862.21     6        263.9      849,750.00    540,750.00
BSSPP 1000    20   1,000,000.00     700,000.00   28,967.58     8        346.7      998,250.00    635,250.00
BSSPP 1500    20   1,500,000.00   1,000,000.00   41,208.19     8        574.5    1,498,475.00    953,575.00
BSSPP 2000    20   2,000,000.00   1,300,000.00   55,892.06     8        803.4    1,999,250.00  1,272,250.00
Average                                           27,370.22    6.3                 912,960.71    580,975.00
164 C. Kloimüllner and G.R. Raidl
Decomposition Descent Method for Limit
Optimization Problems
Igor Konnov
1 Introduction
We first consider the general optimization problem, which consists in finding
the minimal value of some function p over the corresponding feasible set X. For
brevity, we write this problem as

  min p(x) subject to x ∈ X.    (1)

Its solution set will be denoted by X* and the optimal value of the function by
p*, i.e.,

  p* = inf_{x∈X} p(x).

In order to develop efficient solution methods for this problem we should exploit
certain additional information about its properties, which are related to some
classes of applications.
In what follows, we denote by R^s the real s-dimensional Euclidean space, all
elements of such spaces being column vectors represented by lower-case Roman
letters in boldface, e.g. x. For any vectors x and y of R^s, we denote by ⟨x, y⟩
their scalar product, i.e.,

  ⟨x, y⟩ = xᵀy = Σ_{i=1}^{s} x_i·y_i,
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 166–179, 2017.
https://doi.org/10.1007/978-3-319-69404-7_12
Decomposition Descent Method 167
and by ‖x‖ the Euclidean norm of x, i.e., ‖x‖ = √⟨x, x⟩. Next, we define for
brevity M = {1, . . . , n}; |A| will denote the cardinality of a finite set A. As
usual, R will denote the set of real numbers, R̄ = R ∪ {+∞}.

Let us consider a partition of the N-dimensional space, i.e.,

  𝒩 = ⋃_{i=1}^{n} 𝒩_i,

where 𝒩 = {1, . . . , N}, N = |𝒩|, N_i = |𝒩_i|, and 𝒩_i ∩ 𝒩_j = ∅ if i ≠ j.
This means that any point x = (x_1, . . . , x_N) ∈ R^N is represented by x =
(x_1, . . . , x_n), where x_i = (x_j)_{j∈𝒩_i} ∈ R^{N_i} for i ∈ M. The simplest case where
N_i = 1 for all i ∈ M and n = N corresponds to the scalar coordinate partition.
Rather recently, partially decomposable optimization problems were paid significant
attention due to their various big data applications; see e.g. [1–3] and
the references therein. In these problems, the cost function and feasible set are
specialized as follows:

(A1) It holds that (5), where the X_i are non-empty, convex, and closed sets in R^{N_i}
for i = 1, . . . , n.

Also, we suppose that

  φ(x) = Σ_{i=1}^{n} φ_i(x_i),    (9)
(b) If the cost function is convex, then each solution of MVI (11) solves problem (10).

In what follows, we denote by X^0 the solution set of MVI (11) and call it the
set of stationary points of problem (10); cf. (6).
Fix λ > 0. For each point x ∈ X we can define y(x) = (y_1(x), . . . , y_n(x)) ∈ X
such that

  Σ_{i=1}^{n} ⟨g_i(x) + λ(y_i(x) − x_i), y_i − y_i(x)⟩ + Σ_{i=1}^{n} [φ_i(y_i) − φ_i(y_i(x))] ≥ 0    (12)
  ∀ y_i ∈ X_i, for i = 1, . . . , n.

This MVI gives a necessary and sufficient optimality condition for the optimization
problem

  min_{y ∈ X_1×...×X_n} Σ_{i=1}^{n} φ̃_i(x, y_i),    (13)

where

  φ̃_i(x, y_i) = ⟨g_i(x), y_i⟩ + 0.5·λ·‖x_i − y_i‖² + φ_i(y_i)    (14)

for i = 1, . . . , n. Under the above assumptions each φ̃_i(x, ·) is strongly convex,
hence problem (13)–(14) (or (12)) has the unique solution y(x), thus defining
the single-valued mapping x ↦ y(x). Observe that all the components of y(x)
can be found independently, i.e., (13)–(14) is equivalent to n independent
optimization problems of the form

  min_{y_i ∈ X_i} φ̃_i(x, y_i),  i = 1, . . . , n.

(a) x = y(x) ⟺ x ∈ X^0;
(b) The mapping x ↦ y(x) is continuous on X.

Set δ(x) = ‖x − y(x)‖; then δ²(x) = Σ_{i=1}^{n} δ_i²(x), where δ_i(x) = ‖x_i − y_i(x)‖.
From Lemma 2 we conclude that the value δ(x) can serve as an accuracy measure
for MVI (11).
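Since problems (13)–(14) decompose, each component of y(x) can be computed in closed form in simple cases. A sketch for the scalar coordinate partition with X_i = R and the illustrative choice φ_i(y) = μ|y| (not fixed by the paper): the subproblem minimizer is a soft-thresholded gradient step. Names and parameters are ours:

```python
def prox_step(x, grad, lam=1.0, mu=0.1):
    # Componentwise y_i(x) solving  min_y  g_i*y + 0.5*lam*(x_i - y)^2 + mu*|y|
    # (scalar blocks, X_i = R): shift by the gradient, then soft-threshold.
    y = []
    for xi, gi in zip(x, grad):
        v = xi - gi / lam              # minimizer ignoring the mu*|y| term
        t = mu / lam                   # soft-threshold level
        y.append(max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0))
    return y
```

The accuracy measure δ(x) is then just the norm of the difference between `x` and `prox_step(x, grad)`.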
We also need a descent property from [6, Lemma 2.1]: for the corresponding
descent direction d, the directional derivative at x along d is then at most
−λ‖y_i(x) − x_i‖².
and go to Step 3. Otherwise (i.e., when δ_s(x^k) < ε for all s ∈ M) go to Step 2.
Step 2: Set z = x^k and stop.
Step 3: Determine m as the smallest number in Z_+ such that

  lim_{k→∞} λ_k = 0.
Suppose that the sequence {x^k} is infinite. Since the set M is finite, there is an
index i_k = i which is repeated infinitely often. Take the corresponding subsequence
{k_s}; then, without loss of generality, we can suppose that the subsequence {x^{k_s}}
converges to a point x̄, besides δ_{i_{k_s}}(x^{k_s}) = ‖d^{k_s}_i‖, and we have
Using the mean value theorem (see e.g. [11, Theorem 2.3.7]), we obtain the
corresponding estimate for some g^{k_s} = ∇f(t^{k_s}), t^{k_s} = x^{k_s} + (λ_{k_s}/γ)θ_{k_s}d^{k_s}, θ_{k_s}
∈ (0, 1). By taking the limit s → ∞ we have
Condition (A2) means that the limit set-valued mapping G at any point
is approximated by a sequence of gradients {G_l}. In fact, if G is the Clarke
subdifferential of a locally Lipschitz function f, it can always be approximated
by a sequence of gradients within condition (A2); see [12,13]. Observe also
that if there is a subsequence y^{l_s} ∈ X with {y^{l_s}} → ȳ, then (A2) implies
{G_{l_s}(y^{l_s})} → g ∈ G(ȳ), and the same is true for (A3). At the same time, the
non-differentiability of the functions f or h is not obligatory; the main property
is the existence of the approximation sequences indicated in (A2) and (A3).
So, we replace MVI (7) with a sequence of MVIs: Find a point z^l ∈ X =
X_1 × . . . × X_n such that

  Σ_{i=1}^{n} ⟨G_{l,i}(z^l), y_i − z^l_i⟩ + Σ_{i=1}^{n} [h_{l,i}(y_i) − h_{l,i}(z^l_i)] ≥ 0    (17)
  ∀ y_i ∈ X_i, for i = 1, . . . , n;

where we use the partition of G_l which corresponds to that of the space R^N, i.e.,
G_l = (G_{l,1}, . . . , G_{l,n}). Similarly, we set

  h_l(x) = Σ_{i=1}^{n} h_{l,i}(x_i).
(C1) For each fixed l = 1, 2, . . ., the function f_l(x) + h_l(x) is coercive on the
set X, that is, {f_l(w^k) + h_l(w^k)} → +∞ if {w^k} ⊂ X, ‖w^k‖ → ∞ as
k → ∞.
(C2) There exist a number ε̃ > 0 and a point v ∈ X such that for any sequences
{u^l} and {d^l} satisfying the conditions

  u^l ∈ X, {‖u^l‖} → +∞, {d^l} → 0,

it holds that

  lim inf_{l→∞} { ⟨G_l(u^l) + d^l, v − u^l − d^l⟩ + [h_l(v) − h_l(u^l − d^l)] } ≤ −ε̃.
Clearly, (C1) gives a custom coercivity condition for each function f_l(x) +
h_l(x), which provides existence of solutions of each particular problem (17).
Obviously, (C1) holds if X is bounded. At the same time, (C2) gives a similar
coercivity condition for the whole sequence of these problems approximating the
limit MVI (7). It also holds if X is bounded. In the unbounded case, (C2) is
weaker than the following coercivity condition:

  ‖v − u^l − d^l‖⁻¹ ( ⟨G_l(u^l) + d^l, v − u^l − d^l⟩ + [h_l(v) − h_l(u^l − d^l)] ) → −∞ as l → ∞.

Similar conditions are also usual for penalty-type methods; see e.g. [14,15]. We
therefore conclude that conditions (C1) and (C2) are not restrictive.
The whole decomposition method for the non-stationary MVI (7) has a two-level
iteration scheme where each stage of the upper level invokes Algorithm
(DDS) with different parameters.

Method (DNS). Choose a point z^0 ∈ X and a sequence {ε_l} → +0.
At the l-th stage, l = 1, 2, . . ., we have a point z^{l−1} ∈ X and a number ε_l.
Set

  f(x) = f_l(x),  φ(x) = h_l(x),

apply Algorithm (DDS) with x^0 = z^{l−1} and ε = ε_l, and obtain a point z^l = z as its
output.
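A minimal numerical sketch of this two-level scheme (names are ours; φ ≡ 0, λ = 1, and a plain fixed-point inner loop stand in for the full Algorithm (DDS)):

```python
def dds(x, grad_f, lam, eps, max_iter=1000):
    # Inner loop: move x toward y(x) (with phi = 0, y(x) is a plain
    # gradient step) until the accuracy measure ||x - y(x)|| < eps.
    for _ in range(max_iter):
        y = [xi - gi / lam for xi, gi in zip(x, grad_f(x))]
        if max(abs(xi - yi) for xi, yi in zip(x, y)) < eps:
            break
        x = y
    return x

def dns(z, stage_grads, eps0=1.0, theta=0.5):
    # Outer loop: stage l applies (DDS) to the l-th approximating
    # problem f_l, starting from z_{l-1}, with tolerance eps_l -> 0.
    eps = eps0
    for grad_fl in stage_grads:
        z = dds(z, grad_fl, lam=1.0, eps=eps)
        eps *= theta
    return z
```

Each stage restarts the inner method from the previous stage's output, exactly the warm-start structure the text describes.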
We now establish the main convergence result.
(iii) the sequence {z^l} generated by Method (DNS) has limit points, and all these
limit points are solutions of MVI (7);
(iv) if f is convex, then all the limit points of {z^l} belong to X*.
Proof. We first observe that (C1) implies that each problem (17) has a solution,
since the cost function

  f_l(x) + h_l(x)

is coercive; hence the level set

  X_l(x^0) = { y ∈ X | f_l(y) + h_l(y) ≤ f_l(x^0) + h_l(x^0) }

is bounded, the problem

  min_{x∈X} (f_l(x) + h_l(x))

has a solution, and so has MVI (17) due to Lemma 1. Hence, assertion (i) is true.
Next, from Proposition 1 we now have that assertion (ii) is also true.

By (ii), the sequence {z^l} is well-defined and (12) implies

  ⟨G_l(z^l) + λ(y_l(z^l) − z^l), y − y_l(z^l)⟩ + [h_l(y) − h_l(y_l(z^l))] ≥ 0  ∀ y ∈ X.   (18)

Here and below, for brevity, we set g^l = G_l(z^l), z̄^l = y_l(z^l), and d^l = λ(y_l(z^l) − z^l).

Take a subsequence {l_s} such that

  lim_{s→∞} { ⟨g^{l_s} + d^{l_s}, v − z̄^{l_s}⟩ + [h_{l_s}(v) − h_{l_s}(z̄^{l_s})] }
    = lim inf_{l→∞} { ⟨g^l + d^l, v − z̄^l⟩ + [h_l(v) − h_l(z̄^l)] },

a contradiction. Therefore, the sequence {z^l} is bounded and has limit points.
Let z̄ be an arbitrary limit point of {z^l}, i.e.

  z̄ = lim_{s→∞} z^{l_s}.

Fix an arbitrary point y ∈ X; then, using (18), (19), and (A3), we have

  ⟨ḡ, y − z̄⟩ + [h(y) − h(z̄)] = lim_{s→∞} { ⟨g^{l_s}, y − z̄^{l_s}⟩ + [h_{l_s}(y) − h_{l_s}(z̄^{l_s})] }
    = lim_{s→∞} { ⟨g^{l_s} + d^{l_s}, y − z̄^{l_s}⟩ + [h_{l_s}(y) − h_{l_s}(z̄^{l_s})] } ≥ 0,
  Ax = b,

may give very inexact approximations. In order to enhance its properties, one
can utilize a family of regularized problems of the form (20), where h(x) = ε‖x‖²
or h(x) = ε‖x‖₁ = ε·Σ_{i=1}^{n} |x_i|, and ε > 0 is a parameter. Note that the
non-smooth regularization term additionally yields sparse solutions with a rather
small number of non-zero components; see e.g. [2,17].
176 I. Konnov
The second instance is the basic machine learning problem which is called
the linear support vector machine. It consists in finding the optimal partition
of the feature space R^n by using some given training sequence x^i, i = 1, . . . , l,
where each point x^i has a binary label y_i ∈ {−1, +1} indicating the class. We
have to find a separating hyperplane. Usually, its parameters are found from the
solution of the optimization problem

  min_{w∈R^n} (1/p)·‖w‖_p^p + C·Σ_{i=1}^{l} L(⟨w, x^i⟩; y_i),    (21)

where L is a loss function and C > 0 is a penalty parameter. The usual choice
is L(z; y) = max{0, 1 − yz} and p is either 1 or 2; see e.g. [1,5] for more details.
Observe that the data of the observation points x^i can again be inexact or even
non-stationary.
Next, taking p = 2, we can rewrite this problem as

  min_{w,ξ} 0.5·‖w‖² + C·Σ_{i=1}^{l} ξ_i,

subject to

  1 − y_i⟨w, x^i⟩ ≤ ξ_i,  ξ_i ≥ 0,  i = 1, . . . , l.

Its dual has the quadratic programming format:

  max_{0≤α_i≤C, i=1,...,l} Σ_{i=1}^{l} α_i − 0.5·Σ_{s=1}^{l} Σ_{t=1}^{l} (α_s y_s)(α_t y_t)⟨x^s, x^t⟩.    (22)
Observe that all these problems fall into format (1), (3)–(5) and that they can
be treated as limit problems.
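For p = 2 and the hinge loss, the primal objective in (21) is straightforward to evaluate; a minimal sketch (function names are ours):

```python
def svm_primal_objective(w, X, y, C):
    # 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*<w, x_i>)   (problem (21), p = 2)
    reg = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(max(0.0, 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi)))
                for xi, yi in zip(X, y))
    return reg + C * hinge
```

A decomposition method of the kind studied here would operate on the dual (22), whose box constraints 0 ≤ α_i ≤ C split coordinatewise.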
5 Computational Experiments
In order to evaluate the computational properties of the proposed method we
carried out a preliminary series of test experiments. For simplicity, we took only
unconstrained test problems of form (1), (3)–(5), where X = R^N, with the single-dimensional
(coordinate) partition of the space, i.e., we set n = N and N_i = 1 for
i = 1, . . . , n. In all the experiments, we took the limit function f to be convex
and quadratic.

The main goal was to compare (DNS) with the usual (splitting) gradient descent
method (GNS for short); see [18]. It calculates all the components for the direction
finding procedure. Both methods used the same line-search strategy and
were applied sequentially to each non-stationary problem (23), with the rule
ε_{l+1} = νε_l for changing the perturbation. In (GNS), this change occurs after
satisfying the inequality

  δ_l(x) = ‖x − y_l(x)‖ ≤ ε_l.

In both methods, we chose the rule ε_{l+1} = νε_l with ν = 0.5. Similarly, for
the limit problem, we set

  δ(x) = ‖x − y(x)‖,

where y(x) is the unique solution of the problem

  min_y { ⟨f′(x), y⟩ + 0.5·λ·‖x − y‖² + h(y) }.

We took δ(x^k) as accuracy measure for solving the limit problem, chose the
accuracy 0.1, took the same starting point z_j^0 = j|sin(j)| for j = 1, . . . , n, and
set λ = 1 for both methods. The methods were implemented in Delphi with
double precision arithmetic.
In the first two series, we set h ≡ 0 and took versions with exact line-search. In
the first series, we took the elements a_ij = sin(i/j)·cos(ij) and b_i = (1/i)·sin(i).
The results are given in Table 1. In the second series, we took the elements a_ij =
1/(i + j) + 2·sin(i/j)·cos(ij)/j and b_i = n·sin(i). The results are given in Table 2.

In the third series, we took the elements a_ij = 1/(i + j) + 2·sin(i/j)·cos(ij)/j
and b_i = n·sin(i) as above, but also chose

  h(x) = Σ_{i=1}^{N} |x_i|.

So, the cost function is non-smooth. Here we took versions with the Armijo
line-search. The results are given in Table 3, where (cl) now denotes the total
number of calculations of partial derivatives of f_l. Overall, (DNS) showed
rather stable and rapid convergence, with an explicit advantage over (GNS) once
the dimensionality exceeded 20.
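The test data of the first series can be generated directly from the stated formulas, assuming 1-based indices i, j as the notation suggests (function name is ours):

```python
import math

def series1_problem(n):
    # First-series data: a_ij = sin(i/j)*cos(i*j), b_i = (1/i)*sin(i),
    # with indices i, j = 1..n (1-based, as in the paper's formulas).
    A = [[math.sin(i / j) * math.cos(i * j) for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    b = [math.sin(i) / i for i in range(1, n + 1)]
    return A, b
```

The second and third series differ only in the formulas for a_ij, b_i and in the added non-smooth term h.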
Table 1. The numbers of iterations (it) and partial derivative calculations (cl)

          (GNS)          (DNS)
          it     cl      it     cl
N = 2      2      4       2      7
N = 5     10     50      19     64
N = 10    17    170      67    235
N = 20    36    720     194    688
N = 40   105   4200     734   2935
N = 80   228  18240    3500  11241
N = 100  201  20100    4641  16473
Table 2. The numbers of iterations (it) and partial derivative calculations (cl)

          (GNS)          (DNS)
          it     cl      it     cl
N = 2      6     12       2      7
N = 5     10     50      12     49
N = 10    19    190      25    139
N = 20    38    760      55    349
N = 40    72   2880     119    871
N = 80   164  13120     309   2461
N = 100  252  25200     414   3279
Table 3. The numbers of iterations (it) and partial derivative calculations (cl)

          (GNS)          (DNS)
          it     cl      it     cl
N = 2     13     26       7     23
N = 5     16     80      32     84
N = 10    16    160      76    245
N = 20    40    800     257    982
N = 40    72   2880     485   1923
N = 80   135  10800    1127   4286
N = 100  188  18800    1374   6075
6 Conclusions
We described a new class of coordinate-wise descent splitting methods for limit
decomposable composite optimization problems involving set-valued mappings
and non-smooth functions. The method is based on selective coordinate variations
Acknowledgement. The results of this work were obtained within the state assignment
of the Ministry of Science and Education of Russia, project No. 1.460.2016/1.4.
In this work, the author was also supported by the Russian Foundation for Basic Research,
project No. 16-01-00109, and by grant No. 297689 from the Academy of Finland.
References
1. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition.
Data Mining Knowl. Disc. 2, 121–167 (1998)
2. Cevher, V., Becker, S., Schmidt, M.: Convex optimization for big data. IEEE Signal
Process. Mag. 31, 32–43 (2014)
3. Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex
big data optimization. IEEE Trans. Sig. Process. 63, 1874–1889 (2015)
4. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable
minimization. Math. Progr. 117, 387–423 (2010)
5. Richtarik, P., Takac, M.: Parallel coordinate descent methods for big data optimization.
Math. Program. 156, 433–484 (2016)
6. Konnov, I.V.: Sequential threshold control in descent splitting methods for decomposable
optimization problems. Optim. Meth. Softw. 30, 1238–1254 (2015)
7. Alart, P., Lemaire, B.: Penalization in non-classical convex programming via variational
convergence. Math. Program. 51, 307–331 (1991)
8. Cominetti, R.: Coupling the proximal point algorithm with approximation methods.
J. Optim. Theor. Appl. 95, 581–600 (1997)
9. Salmon, G., Nguyen, V.H., Strodiot, J.J.: Coupling the auxiliary problem principle
and epiconvergence theory for solving general variational inequalities. J. Optim.
Theor. Appl. 104, 629–657 (2000)
10. Konnov, I.V.: An inexact penalty method for nonstationary generalized variational
inequalities. Set-Valued Variat. Anal. 23, 239–248 (2015)
11. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
12. Ermoliev, Y.M., Norkin, V.I., Wets, R.J.B.: The minimization of semicontinuous
functions: mollifier subgradients. SIAM J. Contr. Optim. 33, 149–167 (1995)
13. Czarnecki, M.-O., Rifford, L.: Approximation and regularization of Lipschitz functions:
convergence of the gradients. Trans. Amer. Math. Soc. 358, 4467–4520 (2006)
14. Gwinner, J.: On the penalty method for constrained variational inequalities. In:
Hiriart-Urruty, J.-B., Oettli, W., Stoer, J. (eds.) Optimization: Theory and Algorithms,
pp. 197–211. Marcel Dekker, New York (1981)
15. Blum, E., Oettli, W.: From optimization and variational inequalities to equilibrium
problems. The Math. Stud. 63, 127–149 (1994)
16. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer
Academic Publishers, Dordrecht (1996)
17. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc.
Ser. B 58, 267–288 (1996)
18. Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain non-convex
minimization problems. Int. J. Syst. Sci. 12, 989–1000 (1981)
RAMBO: Resource-Aware Model-Based
Optimization with Scheduling for Heterogeneous
Runtimes and a Comparison with Asynchronous
Model-Based Optimization
1 Introduction
Efficient global optimization of expensive black-box functions is of interest to
many fields of research. In the engineering industry, computationally expensive
models have to be optimized; for machine learning, hyperparameters have to
be tuned; and for computer experiments in general, expensive algorithms have
parameters that have to be optimized to obtain a well-performing algorithm
configuration. The problems of global optimization can usually be modeled by a
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 180–195, 2017.
https://doi.org/10.1007/978-3-319-69404-7_13
RAMBO: Resource-Aware Model-Based Optimization 181
where Φ is the distribution function and φ the density function of the standard normal
distribution, and y_min is the best observed function value so far. Alternatively,
the comparably simpler lower confidence bound criterion

  LCB(x) = ŷ(x) − λ·ŝ(x)

is used, where ŷ(x) denotes the posterior mean and ŝ(x) the posterior standard
deviation of the regression model at point x. Before entering the iterative process,
some points initially have to be pre-evaluated. These points are generally chosen
in a space-filling manner to uniformly cover the input space. The optimization
usually stops after a target objective value is reached or a predefined budget is
exhausted [22,23].
Multi-point proposals derive not only one single point x* from a surrogate model,
but q points x*_1, . . . , x*_q simultaneously. The q proposed points must be sufficiently
different from each other to avoid multiple evaluations with the same
configuration. For this reason Hutter et al. [11] introduced the criterion
processing units. The busy evaluations have to be taken into account by the
surrogate model to avoid that new point proposals are identical or very similar
to pending evaluations. Here, the Kriging believer approach [9] can be applied to
block these regions. Another theoretically well-founded way to impute pending
values is the expected EI (EEI) [8,13,21]. The unknown value of f(x_busy) is integrated
out by calculating the expected value of y_busy via Monte Carlo sampling,
which is, similar to qEI, computationally demanding. For each Monte Carlo iteration,
values y_{1,busy}, . . . , y_{κ,busy} are drawn from the posterior distribution of the
surrogate regression model at x_{1,busy}, . . . , x_{κ,busy}, with κ denoting the number
of pending evaluations. These values are combined with the set of already known
evaluations and used to fit the surrogate model. The EEI can then simply be
calculated by averaging the individual expected improvement values that are
formed by each Monte Carlo sample:

  EEI(x) = (1/n_sim)·Σ_{i=1}^{n_sim} EI_i(x)    (2)
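As a rough illustration of (2), the following sketch averages EI over Monte Carlo imputations of a single pending evaluation. It is our simplification, not the paper's implementation: the surrogate refit per draw is omitted, and only the incumbent is updated with the imputed value.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ei(mu, sigma, y_min):
    # Expected improvement (minimization) at a point with posterior
    # mean mu and standard deviation sigma.
    z = (y_min - mu) / sigma
    return (y_min - mu) * norm_cdf(z) + sigma * norm_pdf(z)

def expected_ei(mu, sigma, y_min, pending_mu, pending_sigma,
                n_sim=200, seed=0):
    # Monte Carlo EEI as in (2), simplified: each draw imputes the one
    # pending evaluation from its posterior and updates the incumbent;
    # the per-draw surrogate refit of the full method is omitted here.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        y_busy = rng.gauss(pending_mu, pending_sigma)
        total += ei(mu, sigma, min(y_min, y_busy))
    return total / n_sim
```

Because an imputed pending value can only lower the incumbent, the sketch's EEI is never larger than the plain EI at the same point.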
other. Therefore, we introduce an order that steers the search more towards
promising areas. We give the highest priority to the point x*_j that was proposed
using the smallest value of λ_j. We define the priority for each point as p_j := −λ_j.
The goal of our scheduling strategy is to reduce the CPU idle time on the workers
while acquiring the feedback of the workers in the shortest possible time to avoid
model update delay. The set of points proposed by the multi-point infill criterion
forms the set of jobs J = {1, . . . , q} that we want to execute on the available
CPUs K = {1, . . . , m}. For each job the estimated runtime is given by t_j and the
corresponding priority is given by p_j. To reduce idle times caused by evaluations
of jobs with a low priority, the maximal runtime for each MBO iteration is defined
by the runtime of the job with the highest priority. Lower-prioritized jobs have
to subordinate. At the same time, we want to maximize the profit, given by the
priorities, of parallel job executions for each model update. To solve this problem,
we apply the 0-1 multiple knapsack algorithm by interfacing the R package
adagio for global optimization routines [5]. Here the knapsacks are the available
CPUs and their capacity is the maximally allowed computing time, defined by
the runtime of the job with the highest priority. The items are the jobs J, their
weights are the estimated runtimes t_j, and their values are the priorities p_j. The
capacity for each CPU is accordingly t_{j*}, with j* := arg max_j p_j. To select the
best subset of jobs the algorithm maximizes the profit

  Q = Σ_{j∈J} Σ_{k∈K} p_j·c_kj,

which is the sum of priorities of the selected jobs, under the restriction of the
capacity

  Σ_{j∈J} t_j·c_kj ≤ t_{j*}   ∀ k ∈ K

per CPU. The decision variable c_kj ∈ {0, 1} indicates whether job j is assigned
to CPU k; each job may be assigned to at most one CPU:

  Σ_{k∈K} c_kj ≤ 1   ∀ j ∈ J,  c_kj ∈ {0, 1}.
As the job with the highest priority defines the time bound t_{j*}, it is mapped to
the first CPU k = 1 exclusively, and single jobs with higher runtimes are directly
discarded. Then the knapsack algorithm is applied to assign the remaining candidates
in J to the remaining m − 1 CPUs. This leads to the best subset of J that
can be run in parallel, minimizing the delay of the model update. If a CPU is left
without a job, we query the surrogate model for a job with an estimated runtime
smaller than or equal to t_{j*} to fill the gaps. For a useful scheduling, the set of
candidates should have considerably more candidates q than available CPUs. This
knapsack scheduling is a direct enhancement of the first fit scheduling strategy
presented in [19].
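The assignment described above can be mimicked with a greedy first-fit-decreasing stand-in (the paper solves an exact 0-1 multiple knapsack via the adagio package; everything below is an illustrative simplification with invented names):

```python
def schedule(jobs, m):
    # jobs: list of (runtime t_j, priority p_j); m: number of CPUs.
    # The highest-priority job fixes the capacity t_{j*} and gets CPU 0
    # exclusively; remaining jobs are packed by decreasing priority
    # onto the other CPUs wherever they still fit.
    jstar = max(range(len(jobs)), key=lambda j: jobs[j][1])
    cap = jobs[jstar][0]
    load = [0.0] * m
    assign = [[] for _ in range(m)]
    assign[0].append(jstar)
    load[0] = cap
    order = sorted((j for j in range(len(jobs)) if j != jstar),
                   key=lambda j: -jobs[j][1])
    for j in order:
        t = jobs[j][0]
        if t > cap:
            continue  # single jobs longer than the bound are discarded
        for k in range(1, m):
            if load[k] + t <= cap:
                assign[k].append(j)
                load[k] += t
                break
    return assign
```

Unlike the exact knapsack solution, this greedy packing can leave profit on the table, but it reproduces the key idea: the model-update delay is bounded by the runtime of the highest-priority job.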
4 Numerical Experiments
In our experiments, we consider two categories of synthetic functions to ensure
a fair comparison in a disturbance-free environment. They are implemented in
the R package smoof [6]:
1. Functions with a smooth surface: rosenbrock(d) and bohachevsky(d) with
dimension d = 2, 5. They are likely to be fitted well by the surrogate.
2. Highly multimodal functions: ackley(d) and rastrigin(d) (d = 2, 5). We
expect that surrogate models can have problems to achieve a good fit here.
As these are illustrative test functions, they have no significant runtime. As
a resort, we also use these functions to simulate runtime behavior. First, we
combine two functions: one determines the number of seconds it would take
to calculate the objective value of the other function. E.g., for the combination
rastrigin(2).rosenbrock(2) it would require rosenbrock(2)(x) seconds
to retrieve the objective value rastrigin(2)(x) for an arbitrarily proposed point
x. Technically, we just sleep rosenbrock(2)(x) seconds before returning the
objective. We simulate the runtime with either rosenbrock(d) or rastrigin(d)
and analyze all combinations of our four objective functions, except where the
objective and the time function are identical.
A prerequisite for this approach is the unification of the input space. Thus,
we simply mapped values from the input space of the objective function to the
input space of the time function. The output of the time functions is scaled to
return values between 5 min and 60 min.
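The sleep-based coupling of objective and time function can be sketched as a wrapper (names are ours; the paper's scaling of runtimes to 5-60 minutes is abstracted into a `scale` hook):

```python
import time

def simulated_expensive(objective, time_fn, scale=lambda s: s):
    # Couple two test functions: time_fn(x), passed through `scale`,
    # gives the simulated runtime in seconds; objective(x) gives the
    # value returned to the optimizer after sleeping that long.
    def f(x):
        time.sleep(scale(time_fn(x)))
        return objective(x)
    return f
```

With `scale` set to the identity and the 5-60 min normalization applied beforehand, this reproduces the rastrigin(2).rosenbrock(2)-style combinations described above.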
We examine the capability of the considered optimization strategies to minimize
functions with highly heterogeneous runtimes within a limited time budget.
To do this, we measure the distance between the best found point at time t and
a predefined target value. We call this measure accuracy. In order to make this
measure comparable across different objective functions, we scale the function
values to [0, 1], with zero being the target value. It is defined as the best y reached
by any optimization method after the complete time budget. The upper bound
1 is the best y found in the initial design (excluding the initial runs of smac),
which is identical for all algorithms per given problem. Both values are averaged
over the 10 replications.

If an algorithm needs 2 h to reach an accuracy of 0.5, this means that within
2 h half of the way to 0 has been accomplished, after starting at 1. We compare
the differences between optimizers at the three accuracy levels 0.5, 0.1 and 0.01.
The optimizations are repeated 10 times and conducted on m = 4 and m = 16
CPUs. We allow each optimization to run for 4 h on 4 CPUs and for 2 h on
16 CPUs in total, which includes all computational overhead and idling. All
computations were performed on a Docker Swarm cluster using the R package
batchtools [18]. The initial design is generated by Latin hypercube sampling
with n = 4·d points, and all of the following optimizers start with the same
design in the respective repetition:
rs: Random search, serving as base-line.
qLCB: Synchronous approach using qLCB, where in each iteration q = m
points are proposed.
ei.bel: Synchronous approach using Kriging believer, where in each iteration
m points are proposed.
asyn.eei: Asynchronous approach using EEI (100 Monte Carlo iterations).
asyn.ei.bel: Asynchronous Kriging believer approach.
rambo: Synchronous approach using qLCB with our new scheduling approach,
where in each iteration q = 8·m candidates are proposed.
qLCB and ei.bel are implemented in the R package mlrMBO [3], which builds
upon the machine learning framework mlr [2]. asyn.eei, asyn.ei.bel and
rambo are also based on mlrMBO. We use a Kriging model from the package
DiceKriging [20] with a Matérn-5/2 kernel for all approaches above and add a
188 H. Kotthaus et al.
nugget effect of 10⁻⁸·Var(y), where y denotes the vector of all observed function
outcomes. Additionally, we compare our implementations to:
Fig. 1. Residuals of the runtime prediction over time for the rosenbrock(5)
and rastrigin(5) time functions on 4 CPUs and bohachevsky(5) as objective function.
Positive values indicate an overestimated runtime and negative values an underestimation.
Figure 2 shows boxplots for the time required to reach the three different accuracy
levels in 10 repetitions within a budget of 4 h real time on 4 CPUs (upper part) and
¹ Hutter, F., Ramage, S.: Manual for SMAC version v2.10.03-master. Department
of Computer Science, UBC (2015). www.cs.ubc.ca/labs/beta/Projects/SMAC/
v2.10.03/manual.pdf
Table 1. Ranking for accuracy levels 0.5, 0.1, 0.01 averaged over all problems with
rosenbrock() time function on 4 and 16 CPUs with a time budget of 4 h and 2 h,
respectively.
2 h on 16 CPUs (lower part). The faster an optimizer reaches the desired accuracy
level, the lower the displayed box and the better the approach. If an algorithm did
not reach an accuracy level within the time budget, we impute with the respective
time budget (4 h or 2 h) plus a penalty of 1000 s.
Table 1 lists the aggregated ranks over all objective functions, grouped by
algorithm, accuracy level, and number of CPUs. For this computation, the algo-
rithms are ranked w.r.t. their performance for each replication and problem
before they are aggregated with the mean. If there are ties (e.g. if an accuracy
level was not reached), all values obtain the worst possible rank.
The benchmarks indicate an overall advantage of our proposed resource-aware
MBO algorithm (rambo): On average, rambo reaches the accuracy levels
first in 2 of 3 setups on 4 CPUs and is always fastest on 16 CPUs. rambo is
closely followed by the asynchronous variant asyn.eei on 4 CPUs, but the lead
becomes clearer on 16 CPUs. In comparison to the conventional synchronous
algorithms (ei.bel, qLCB), rambo as well as asyn.eei and asyn.ei.bel reach
the given accuracy levels in shorter time. This is especially true for objective
functions that are hard to model (ackley(), rastrigin()) by the surrogate,
as seen in Fig. 2. The simpler asyn.ei.bel performs better than asyn.eei on
16 CPUs. Except for smac, all presented MBO methods outperform the base-line rs
on almost all problems and accuracy levels. The bad average results for smac
are partly due to its low performance on the 5d problems and probably because
of the disadvantage of using a random forest as a surrogate on purely numerical
problems. A recent benchmark in [3] was able to demonstrate the competitive
performance of the Kriging-based EGO approach. On 16 CPUs smac performs
better than rs and comparably to ei.bel.
For a thorough analysis of the optimization, Fig. 3 visualizes, as an example,
the mapping of the parallel point evaluations (jobs) for all MBO algorithms on
16 CPUs for the 5d versions of the problems. Each gray box represents the
computation of a job on the respective CPU. For the synchronous MBO algorithms
190 H. Kotthaus et al.
[Figure 2: boxplots in four panels — 2d (4 CPUs), 5d (4 CPUs), 2d (16 CPUs), 5d (16 CPUs) — with execution time in hours on the y-axis and accuracy levels 0.5, 0.1, 0.01 per objective function on the x-axis.]
Fig. 2. Accuracy level vs. execution time for different objective functions using time
function rosenbrock() (lower is better).
(rambo, qLCB, ei.bel), the vertical lines indicate the end of an MBO iteration.
For asyn.eei, red boxes indicate that the CPU is occupied with the point pro-
posal. The necessity of a resource estimation for jobs with heterogeneous run-
times becomes obvious, as qLCB and ei.bel can cause long idle times by queuing
jobs with largely differing runtimes together. The knapsack scheduling in rambo
manages to clearly reduce this idle time. This effect of efficient resource utiliza-
tion increases with the number of CPUs. rambo reaches nearly the same effective
resource utilization as the asynchronous asyn.ei.bel algorithm and smac (see
Fig. 3) and at the same time reaches the accuracy level fastest on 16 CPUs.
The Monte Carlo approach asyn.eei generates a high computational over-
head, as indicated by the red boxes, which reduces the effective number of evalu-
ations. Idling occurs because the calculation of the EEI is encouraged to wait for
ongoing EEI calculations in order to include their proposals. This overhead addi-
tionally increases with the number of already evaluated points. asyn.ei.bel and
smac have a comparably low overhead and thus basically no idle time. This seems
to be an advantage for asyn.ei.bel on 16 CPUs, where it performs better on
average than its complex counterpart asyn.eei.
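The intuition behind resource-aware scheduling — pack proposed points with heterogeneous predicted runtimes onto the available CPUs so that no CPU idles long at the synchronization barrier — can be illustrated with a greedy longest-processing-time-first stand-in. The paper's actual knapsack formulation may differ; all names below are ours:

```python
import heapq

def schedule_lpt(predicted_runtimes, n_cpus):
    """Greedy LPT assignment of candidate evaluations to CPUs: always give the
    next-longest job to the least-loaded CPU, balancing per-CPU loads so that
    synchronization idle time stays small."""
    loads = [(0.0, c) for c in range(n_cpus)]   # (current load, cpu id)
    heapq.heapify(loads)
    assignment = {c: [] for c in range(n_cpus)}
    for job, t in sorted(enumerate(predicted_runtimes), key=lambda x: -x[1]):
        load, c = heapq.heappop(loads)
        assignment[c].append(job)
        heapq.heappush(loads, (load + t, c))
    per_cpu = [sum(predicted_runtimes[j] for j in assignment[c]) for c in range(n_cpus)]
    idle = max(per_cpu) - min(per_cpu)  # worst-case wait until the slowest CPU finishes
    return assignment, idle

runtimes = [120, 30, 45, 110, 60, 95, 20, 40]   # predicted seconds per proposed point
assignment, idle = schedule_lpt(runtimes, n_cpus=4)
print(idle)  # -> 15
```

Queuing the same jobs in proposal order, as a runtime-oblivious synchronous scheduler effectively does, can leave a much larger gap between the fastest and slowest CPU.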
[Figure 3: per-CPU mapping of jobs over time (hours, 0.0-2.0) on 16 CPUs, one panel per algorithm: asyn.eei, asyn.ei.bel, RAMBO, ei.bel, qLCB, and smac.]
Summing up, if the resource estimation used in rambo has a high
quality, rambo clearly outperforms the considered synchronous MBO algorithms
qLCB, ei.bel, and smac. This indicates that the resource utilization obtained
by the scheduling in rambo leads to faster and better results, especially when
the number of available CPUs increases. On average, rambo performs better than
all considered asynchronous approaches.
Table 2. Ranking for accuracy levels 0.5, 0.1, 0.01 averaged over all problems with
rastrigin() time function on 4 and 16 CPUs with a time budget of 4 h and 2 h,
respectively.
[Figure 4: boxplots in four panels — 2d (4 CPUs), 5d (4 CPUs), 2d (16 CPUs), 5d (16 CPUs) — with execution time in hours on the y-axis and accuracy levels 0.5, 0.1, 0.01 per objective function on the x-axis.]
Fig. 4. Accuracy level vs. execution time for different objective functions using time
function rastrigin() (lower is better).
smac cannot compete with the Kriging-based optimizers. Overall, rambo
appears unable to outperform the asynchronous MBO methods on 4 CPUs,
as unreliable runtime estimates likely lead to suboptimal scheduling decisions.
5 Conclusion
We benchmarked our knapsack-based resource-aware parallel MBO algorithm
rambo against popular synchronous and asynchronous MBO approaches on a set
of illustrative test functions for global optimization methods. Our new approach
was able to outperform SMAC and the default synchronous MBO approach qLCB
on the continuous benchmark functions. On setups with high runtime estimation
quality, it converged faster to the optima than the competing MBO algorithms on
average. This indicates that the resource utilization obtained by our new
approach improves MBO, especially when the number of available CPUs increases.
On setups with low runtime estimation quality, the asynchronous Kriging-based
approaches performed best on 4 CPUs, and only the simplified asynchronous
Kriging believer kept its lead on 16 CPUs. Unreliable estimates likely lead to sub-
optimal scheduling decisions for rambo. While the asynchronous Kriging believer
approach, SMAC, and rambo benefited from an increasing number of CPUs, the
overhead of the asynchronous approach based on EEI increased.
If the runtime of point proposals is predictable, we suggest our new rambo
approach for parallel MBO with high numbers of available CPUs. Even though
the runtime estimation quality is obviously hard to determine in advance, for real
applications like hyperparameter optimization for machine learning methods,
predictable runtimes can be assumed. Our results also suggest that on some setups
the choice of the infill criterion determines which parallelization strategy can
reach a better performance. For future work, a criterion that assigns an infill
value to a set of candidates that can be scheduled without causing long idle
times appears promising. Furthermore, we want to include the memory con-
sumption measured by the traceR [16,17] tool into our scheduling decisions for
experiments with high memory demands.
References
1. Ansotegui, C., Malitsky, Y., Samulowitz, H., Sellmann, M., Tierney, K.: Model-
based genetic algorithms for algorithm configuration. In: International Joint Con-
ference on Artificial Intelligence, pp. 733-739 (2015)
2. Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E.,
Casalicchio, G., Jones, Z.M.: mlr: Machine learning in R. J. Mach. Learn. Res.
17(170), 1-5 (2016)
3. Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M.: mlrMBO: A
modular framework for model-based optimization of expensive black-box functions.
arXiv pre-print (2017). http://arxiv.org/abs/1703.03373
4. Bischl, B., Wessing, S., Bauer, N., Friedrichs, K., Weihs, C.: MOI-MBO: multi-
objective infill for parallel model-based optimization. In: Pardalos, P.M., Resende,
M.G.C., Vogiatzis, C., Walteros, J.L. (eds.) LION 2014. LNCS, vol. 8426, pp. 173-
186. Springer, Cham (2014). doi:10.1007/978-3-319-09584-4_17
5. Borchers, H.: adagio: Discrete and Global Optimization Routines (2016). R package
version 0.6.5. https://CRAN.R-project.org/package=adagio
6. Bossek, J.: smoof: Single and Multi-Objective Optimization Test Functions (2016).
R package version 1.4. https://CRAN.R-project.org/package=smoof
7. Chevalier, C., Ginsbourger, D.: Fast computation of the multi-points expected
improvement with applications in batch selection. In: Nicosia, G., Pardalos, P.
(eds.) LION 2013. LNCS, vol. 7997, pp. 59-69. Springer, Heidelberg (2013).
doi:10.1007/978-3-642-44973-4_7
8. Ginsbourger, D., Janusevskis, J., Le Riche, R.: Dealing with asynchronicity in par-
allel Gaussian process based global optimization. In: 4th International Conference
of the ERCIM WG on Computing & Statistics (ERCIM 2011), pp. 127 (2011)
9. Ginsbourger, D., Le Riche, R., Carraro, L.: Kriging is well-suited to parallelize opti-
mization. In: Tenne, Y., Goh, C.K. (eds.) Computational Intelligence in Expensive
Optimization Problems, pp. 131-162. Springer, Heidelberg (2010). doi:10.1007/
978-3-642-10701-6_6
10. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization
for general algorithm configuration. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol.
6683, pp. 507-523. Springer, Heidelberg (2011). doi:10.1007/978-3-642-25566-3_40
11. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Parallel algorithm configuration. In:
Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, pp. 55-70. Springer,
Heidelberg (2012). doi:10.1007/978-3-642-34413-8_5
12. Janusevskis, J., Le Riche, R., Ginsbourger, D.: Parallel expected improvements
for global optimization: summary, bounds and speed-up. Technical report (2011).
https://hal.archives-ouvertes.fr/hal-00613971
13. Janusevskis, J., Le Riche, R., Ginsbourger, D., Girdziusas, R.: Expected
improvements for the asynchronous parallel global optimization of expensive
functions: potentials and challenges. In: Hamadi, Y., Schoenauer, M. (eds.)
LION 2012. LNCS, pp. 413-418. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-34413-8_37
14. Jones, D.R.: A taxonomy of global optimization methods based on response sur-
faces. J. Global Optim. 21(4), 345-383 (2001)
15. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive
black-box functions. J. Global Optim. 13(4), 455-492 (1998)
16. Kotthaus, H., Korb, I., Lang, M., Bischl, B., Rahnenfuhrer, J., Marwedel, P.: Run-
time and memory consumption analyses for machine learning R programs. J. Stat.
Comput. Simul. 85(1), 14-29 (2015)
17. Kotthaus, H., Korb, I., Marwedel, P.: Performance analysis for parallel R pro-
grams: towards efficient resource utilization. Technical report 01/2015, Department
of Computer Science 12, TU Dortmund University (2015)
18. Lang, M., Bischl, B., Surmann, D.: batchtools: Tools for R to work on batch sys-
tems. J. Open Source Softw. 2(10) (2017)
19. Richter, J., Kotthaus, H., Bischl, B., Marwedel, P., Rahnenfuhrer, J., Lang, M.:
Faster model-based optimization through resource-aware scheduling strategies. In:
Festa, P., Sellmann, M., Vanschoren, J. (eds.) LION 2016. LNCS, vol. 10079, pp.
267-273. Springer, Cham (2016). doi:10.1007/978-3-319-50349-3_22
20. Roustant, O., Ginsbourger, D., Deville, Y.: DiceKriging, DiceOptim: two R pack-
ages for the analysis of computer experiments by Kriging-based metamodeling and
optimization. J. Stat. Softw. 51(1), 1-55 (2012)
21. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine
learning algorithms. In: Advances in Neural Information Processing Systems, vol.
25, pp. 2951-2959. Curran Associates, Inc. (2012)
22. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-Convex Con-
straints. Kluwer Academic Publishers, Dordrecht (2000)
23. Zhigljavsky, A., Zilinskas, A.: Stochastic Global Optimization. Springer, New York
(2008). doi:10.1007/978-0-387-74740-8
A New Constructive Heuristic for the No-Wait
Flowshop Scheduling Problem
Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL - Centre de Recherche en
Informatique Signal et Automatique de Lille, 59000 Lille, France
lucien.mousin@ed.univ-lille1.fr,
{me.kessaci,clarisse.dhaenens}@univ-lille1.fr
1 Introduction
where i ∈ {2, ..., n}. Note that C1(π) is the sum of the processing times of the
first scheduled job on the m machines and is therefore not affected by the
delay. Hence, the makespan Cmax(π) of a sequence π can be computed
from Eq. (1) with a complexity of O(n).
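The linear-time evaluation can be sketched via the minimal start lags between consecutive jobs, a standard no-wait identity (the paper's recurrence (1) is stated on completion times; the function names here are ours):

```python
def start_lag(p_i, p_j):
    """Minimal start-time lag of job j after job i in a no-wait flowshop:
    lag = max over machines r of (sum_{k<=r} p_i[k] - sum_{k<r} p_j[k])."""
    lag, acc_i, acc_j = 0, 0, 0
    for r in range(len(p_i)):
        acc_i += p_i[r]
        lag = max(lag, acc_i - acc_j)
        acc_j += p_j[r]
    return lag

def makespan(seq, p):
    """Cmax of a sequence: accumulated start lags plus the full processing
    time of the last scheduled job."""
    start = 0
    for a, b in zip(seq, seq[1:]):
        start += start_lag(p[a], p[b])
    return start + sum(p[seq[-1]])

p = [[2, 3], [1, 2]]          # two jobs on two machines
print(makespan([0, 1], p))    # -> 7
```

With the pairwise lags precomputed once in O(n²m), each subsequent sequence evaluation is a single O(n) pass.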
The experiments we will present involve local search methods at different steps.
These methods need a neighborhood operator to move from one solution to another.
Such neighborhood operators are specific to the problem and, more precisely, to
the representation of a solution. In this work, we use a permutation representa-
tion and a neighborhood structure based on the insertion operator [14]. For a
sequence π, it consists in inserting the job at position i into position k (i ≠ k);
the jobs between positions i and k are shifted. The size of this neighborhood, i.e., the
number of neighboring solutions, is (n − 1)². Two sequences π and π′ are said to
be neighbors when they differ by exactly one insertion move. It is very inter-
esting to note that, exploiting the characteristics of the NWFSP, the makespan
of π′ can be directly computed from the makespan of π with a complexity of
O(1) [11].
2.2 State-of-the-Art
Many heuristics and metaheuristics have been proposed to solve scheduling prob-
lems and in particular the No-Wait Flowshop variant. The literature review pro-
posed here focuses on constructive heuristics, as this is the scope of the article.
First, let us note that the well-known NEH (Nawaz, Enscore, Ham) heuristic,
proposed for the classical permutation FSP [10], has been successfully applied to
the no-wait variant. Moreover, constructive heuristics have been designed specif-
ically for the NWFSP. We may cite, among others, BIH (Bianco et al. [3]), BH
(Bertolissi [2]) and RAJ (Rajendran [12]). These heuristics define the order in
which jobs are considered regarding one criterion (e.g., the decreasing order of the
sum of the processing times of each job i, Σ_m p_{i,m}), but this may be improved with
A New Constructive Heuristic for the NWFSP 199
other job orders, even random ones. Indeed, in some contexts, it has been shown
that repeating a random construction of solutions may be more efficient than
applying a constructive heuristic [1].
All these heuristics construct good-quality solutions that can be further
improved, for example by a neighborhood-based metaheuristic. In this paper,
we focus on NEH and BIH as they provide high-quality solutions and are con-
sidered references for this problem. These two heuristics will be used later in
the article for comparison.
The principle of the NEH heuristic is to iteratively build a sequence of jobs J.
First, the n jobs of J are sorted by decreasing sums of processing times. Then,
at each iteration, the first remaining unscheduled job is inserted into the partial
sequence at the position minimizing the partial makespan. Algorithm 1 presents NEH.
Algorithm 1. NEH
Data: Set J of n jobs; π, the sequence of scheduled jobs
π = ∅;
Sort the set J by decreasing sums of processing times;
for k = 1 to n do
    Insert job Jk in π at the position which minimizes the partial makespan.
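Algorithm 1 can be sketched compactly as follows; the makespan routine below is a plain lag-based evaluator, not the accelerated O(1) update of [11], and all names are ours:

```python
def nwfsp_makespan(seq, p):
    """No-wait flowshop makespan via start lags (O(n*m) here; O(n) with
    precomputed pairwise lags)."""
    def lag(pi, pj):
        best, ai, aj = 0, 0, 0
        for r in range(len(pi)):
            ai += pi[r]
            best = max(best, ai - aj)
            aj += pj[r]
        return best
    start = sum(lag(p[a], p[b]) for a, b in zip(seq, seq[1:]))
    return start + sum(p[seq[-1]])

def neh(p):
    """NEH: sort jobs by decreasing total processing time, then insert each
    job at the position of the partial sequence minimizing partial makespan."""
    jobs = sorted(range(len(p)), key=lambda j: -sum(p[j]))
    seq = []
    for j in jobs:
        cands = [seq[:k] + [j] + seq[k:] for k in range(len(seq) + 1)]
        seq = min(cands, key=lambda s: nwfsp_makespan(s, p))
    return seq

p = [[2, 3], [1, 2], [4, 1]]
s = neh(p)
print(s, nwfsp_makespan(s, p))  # -> [1, 0, 2] 8
```

On this tiny instance the greedy construction already finds the optimum; on larger instances it generally does not, which is what motivates the improvements studied below.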
Algorithm 2. BIH
Data: Set J of n jobs; π, the sequence of scheduled jobs
π = ∅;
while J ≠ ∅ do
    Find the job k ∈ J which can be inserted in the sequence π such that the
    partial makespan is minimized. Let h be the best insertion position of job k
    in the sequence π;
    Insert job k at position h in the sequence π;
    Remove k from the set J;
Most constructive heuristics are based on the best-insertion principle. Indeed,
the principle of the heuristic is to increase, at each iteration, the
problem size by one. It starts with a sub-problem of size one (only one job to
schedule), then increases the size of the problem to solve (two jobs to schedule,
three jobs, ...) until the size of the initial problem is reached.
In order to understand the dynamics of such a construction, we analyzed the
structure of consecutively constructed sub-sequences and compared them to optimal
sequences. Obviously, this can be done only for the first steps, as it is impossible to
enumerate all sequences, and find the optimal one, when the problem size
is too large. Our proposition is to extend the observations made on sub-problems
to the whole problem, in order to provide better solutions. This approach can be
compared to Streamlined Constraint Reasoning, where the solution space
is partitioned into sub-spaces whose structures are analyzed to better solve the
whole problem [5].
Figure 1 gives an example starting from a sub-problem P8 of size 8, where 8
jobs are scheduled in the optimal order to minimize the makespan. Then, follow-
ing the constructive principle, job 9 is inserted at the position that minimizes the
makespan, leading to the sub-problem P9 of size 9. This sequence corresponds
to the one given by the NEH strategy. When comparing the obtained sequence with
the optimal solution of P9, we can observe that they are very close. Indeed, only
two improving re-insertions (of jobs 7 and 2) are needed to obtain
this optimal solution from the NEH solution.
Fig. 1. Example of the evolution of the structure of the optimal solution from a sub-
problem P8 of size 8 to a sub-problem P9 of size 9.
Algorithm 3. IBI
Data: Set J of n jobs; π, the sequence of scheduled jobs; a criterion to sort J;
cycle, the number of iterations without iterative improvement.
π = ∅;
Sort the set J according to the criterion;
for k = 1 to n do
    Insert job Jk in π at the position which minimizes the partial makespan;
    if k ≡ 0 [cycle] then
        Perform an iterative improvement from π.
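Algorithm 3 can be sketched along the same lines; here the iterative improvement is a plain first-improvement re-insertion descent, which may differ from the paper's exact procedure, and all names are ours:

```python
def ibi(p, cycle=1, order=None):
    """IBI sketch: NEH-style insertion plus an iterative improvement
    (re-insertion local search) every `cycle` insertions. `order` is the
    initial sort; by default increasing total processing time."""
    def cmax(seq):
        start = 0
        for a, b in zip(seq, seq[1:]):
            ai = aj = best = 0
            for r in range(len(p[a])):
                ai += p[a][r]
                best = max(best, ai - aj)
                aj += p[b][r]
            start += best
        return start + sum(p[seq[-1]]) if seq else 0

    def best_insertion(seq, j):
        return min((seq[:k] + [j] + seq[k:] for k in range(len(seq) + 1)), key=cmax)

    def improve(seq):  # first-improvement re-insertion until a local optimum
        better = True
        while better:
            better = False
            for i in range(len(seq)):
                cand = best_insertion(seq[:i] + seq[i + 1:], seq[i])
                if cmax(cand) < cmax(seq):
                    seq, better = cand, True
        return seq

    jobs = order or sorted(range(len(p)), key=lambda j: sum(p[j]))
    seq = []
    for k, j in enumerate(jobs, 1):
        seq = best_insertion(seq, j)
        if k % cycle == 0:
            seq = improve(seq)
    return seq

p = [[2, 3], [1, 2], [4, 1]]
print(ibi(p))  # -> [1, 0, 2]
```

Setting `cycle=n` recovers a single improvement pass at the end of the construction, the cheapest variant studied in Table 2.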
experimental analysis shows that the sequence built in phase (i) is not far from
the optimum of the sub-problem, it is known that the exploration of the neigh-
borhood becomes more and more time-consuming, and the optimum more and more
difficult to reach, as the problem size increases. In order to control the
time-performance of IBI, the cycle parameter allows the iterative improvement
to be executed only every fixed number of iterations.
As the initial sort of IBI may impact its performance, as well as the cycle
parameter, the performance of IBI is evaluated with respect to these two parameters.
This part first presents the experimental protocol used, then compares several
versions of the IBI algorithm using several initial sorts, and finally analyses the
impact of the cycle parameter.
IBI: Initial Sort. Greedy heuristics, such as IBI, consider jobs in a given order.
For example, in NEH, jobs are ordered by decreasing sum of processing times.
The specificity of the NWFSP provides other useful information, such as the GAP
between two jobs on each machine, which represents the idle time of the machine
between these two jobs [9]. This measure does not depend on the schedule. Hence,
for each machine, the GAP between each pair of jobs can be computed independently
of the solving. The sum of GAPs of a pair of jobs is the sum of their
GAPs over all the machines. To obtain a single value per job, we propose to
define the total GAP of a job as the sum of its sums of GAPs. Hence, a high
value of the total GAP for a job indicates that the job has no good match in the
schedule. On the other hand, a low total GAP indicates that the job fits well
with the others. In order to evaluate the impact of the initial sort and to define
the most efficient one for the IBI heuristic, we experiment with several initial sorts,
based on either (i) the total GAP or (ii) the sum of processing times, in decreasing
or increasing order for both.
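One plausible formalization of the total GAP is sketched below; the exact GAP definition of [9] may differ in details (in particular, we sum both orientations of each pair), and all names are ours:

```python
def pair_gap(p_i, p_j):
    """Per-machine idle time induced when job j directly follows job i
    in a no-wait schedule."""
    lag, ai, aj = 0, 0, 0                  # minimal start lag of j after i
    for r in range(len(p_i)):
        ai += p_i[r]
        lag = max(lag, ai - aj)
        aj += p_j[r]
    gaps, done_i, before_j = [], 0, 0
    for r in range(len(p_i)):
        done_i += p_i[r]
        gaps.append(lag + before_j - done_i)  # start of j on r minus end of i on r
        before_j += p_j[r]
    return gaps

def total_gap(p):
    """Total GAP of each job: sum over all partner jobs of the summed
    per-machine gaps. A high value means the job matches the others poorly."""
    n = len(p)
    return [sum(sum(pair_gap(p[i], p[j])) + sum(pair_gap(p[j], p[i]))
                for j in range(n) if j != i) for i in range(n)]

p = [[2, 3], [1, 2], [5, 1]]
print(total_gap(p))  # -> [5, 5, 6]: the third job fits worst with the others
```

Sorting jobs by this single value gives the GAP-based initial orders (IncrMeanGAP, DecrMeanGAP) compared in Table 1.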
Table 1. Average RPD obtained on each size of Taillard's instances for different sorts:
No Sort, Decreasing Mean GAP (DecrMeanGAP), Increasing Mean GAP (IncrMean-
GAP), decreasing sum of processing times (DecrPI) and increasing sum of process-
ing times (IncrPI). RPD values in bold indicate algorithms outperforming the other
ones according to the statistical Friedman test.
Table 1 provides a detailed comparison of the different initial sorts on the 110
Taillard's instances. This table shows that, when a significant difference is observed,
the best initial sort considers the sum of processing times. Ordering jobs in
increasing or decreasing order has no significant influence. Thus, in the following,
we choose the increasing sum of processing times as the initial sort of IBI.
IBI: Cycle. The use of an iterative improvement at each iteration can be time-
expensive. In order to minimize the execution time, we introduce a cycle: a
sequence of x iterations without any iterative improvement.
In Table 2, we study the impact of the cycle size on both the quality
of the solutions and the execution time.
These results show that the quality decreases with a large cycle size, although
the construction is obviously faster. As the objective of such a constructive
heuristic is to provide a solution as good as possible in a very reasonable time,
and as the time required here stays reasonable even for large instances, we propose not to use
204 L. Mousin et al.
Table 2. Average RPD and time (in milliseconds) obtained on Taillard's instances for
different cycle sizes. RPD values in bold indicate algorithms outperforming the
other ones according to the Friedman test. A cycle of size x indicates that the iterative
improvement is executed every x iterations; thus, a cycle of size n indicates that the
iterative improvement is executed only once, at the end of the construction.
Instance  | cycle = 1     | cycle = 2     | cycle = 5     | cycle = 10    | cycle = n
          | RPD   Time    | RPD   Time    | RPD   Time    | RPD   Time    | RPD   Time
20 x 5    | 1.38  0.77    | 1.24  0.55    | 1.43  0.27    | 1.78  0.20    | 1.62  0.18
20 x 10   | 1.77  0.83    | 1.82  0.52    | 1.85  0.26    | 1.49  0.20    | 1.69  0.19
20 x 20   | 1.15  0.77    | 1.35  0.47    | 1.71  0.25    | 1.74  0.19    | 1.68  0.16
50 x 5    | 3.04  12.16   | 3.25  6.76    | 3.46  3.52    | 3.62  2.49    | 3.94  1.74
50 x 10   | 2.57  11.55   | 2.49  6.69    | 2.75  3.62    | 3.07  2.39    | 3.26  1.48
50 x 20   | 2.50  11.73   | 2.37  6.67    | 2.58  3.58    | 2.65  2.42    | 3.14  1.59
100 x 5   | 4.12  95.74   | 4.22  55.96   | 4.42  30.22   | 4.65  19.40   | 5.29  9.62
100 x 10  | 3.27  93.43   | 3.27  54.56   | 3.30  29.19   | 3.45  18.78   | 4.17  9.71
100 x 20  | 2.89  92.90   | 2.93  53.60   | 3.07  28.35   | 3.16  18.28   | 3.56  9.88
200 x 10  | 3.87  755.09  | 3.91  435.40  | 4.07  225.52  | 4.09  146.26  | 4.74  55.04
200 x 20  | 2.99  746.62  | 3.10  431.33  | 3.11  222.80  | 3.18  146.26  | 3.93  57.83
any cycle, that is to say, to execute the iterative improvement at each step of the
construction.
In conclusion of these experiments, we fix, for the remainder of this
work, the following IBI parameters: an initial sort based on the increasing sum of
processing times and no cycle (i.e., a cycle of size 1).
4 Experiments
The aim of this section is to analyze the efficiency of the proposed IBI method in
two situations. First, IBI alone is compared to other constructive heuristics
of the literature; second, these different heuristics are used as the
initialization of a classical local search. Throughout this section, the same experimental
protocol as before is used, and the parameters of IBI are those resulting from
the experiments of Sect. 3.3.
2. BIH is a classical heuristic for the NWFSP and very efficient as, according to
some preliminary experiments, it offers the best performance among several
tested heuristics.
Table 3 shows the comparative study between IBI and the two other construc-
tive heuristics. Both RPD and computation time are given to exhibit the gain in
quality with regard to the time. NEH and BIH are deterministic; hence, both
values for an instance size are averages computed from the ten RPD values obtained
for the 10 instances. On the other hand, the IBI values correspond to
averages over the 30 runs and the 10 instances of each instance size.
Table 3. RPD and time (in milliseconds) obtained on Taillard's instances for NEH, BIH
and IBI (average values). RPD values in bold indicate algorithms statistically outper-
forming the other ones according to the Friedman test.
Solutions built by IBI have a better quality (lower RPD) than those built by
NEH or BIH; this has been statistically verified with the Friedman test. Besides,
IBI performed better than NEH and BIH on 106 of the 110
available instances. The counterpart of this performance is the computation
time required by IBI compared to the two other heuristics. However,
this time remains reasonable, as less than one second is required even for the larger
instances. IBI is thus an efficient alternative to the classical heuristics used to build
good-quality solutions.
an initialization phase, it has been combined with a Tabu Search (TS). This
local search is widely used to solve flowshop problems [6,7,16] and is thus a good
candidate to show the pertinence of building solutions with IBI rather than NEH
or BIH. The three heuristics are combined with TS. In order to be fair, all
combined approaches have the same running time, fixed to 1000 · n ms (with n
the number of jobs). These experiments aim at checking whether the quality reached
justifies the use of IBI as initialization, or whether its higher execution time
penalizes its use.
Table 4 presents the average RPD values among the 30 runs
for each combined approach and each instance size. Compared to those of
Table 3, the results in this table first indicate that the Tabu Search manages to
improve the solutions produced by the different heuristics. They also indicate that IBI
used as an initialization phase gets better results than the others, even if the
difference is not always statistically significant. In particular, for instances with a
high number of jobs to schedule, IBI shows a very good performance.
Table 4. Average RPD on Taillard's instances for NEH+TS, BIH+TS and IBI+TS.
RPD values in bold indicate algorithms statistically outperforming the other ones
according to the Friedman test.
Fig. 2. Evolution of the average makespan over 30 runs on the 7th Taillard instance
(200 jobs, 20 machines) for the three combined approaches. IBI+TS is represented by
squares, NEH+TS by crosses and BIH+TS by circles.
In conclusion, these experiments show the good performance of IBI in both
roles: as a constructive heuristic and as the initialization of a metaheuristic. In
particular, they show the contribution of the local improvement during the
construction of the solution.
This work presents IBI, a new heuristic to minimize the makespan for the No-
Wait Flowshop Scheduling Problem. IBI has been designed from the analysis of
existing heuristics of the literature as well as of the structure of optimal solutions.
Indeed, IBI is an improvement of the widely used NEH heuristic, where an
iterative improvement procedure is added after each insertion of a job.
Even if this additional procedure increases the computation time compared to
other constructive heuristics of the literature (NEH and BIH), the improvement
in the quality of the final solution is noticeable and has been statistically
validated with the Friedman test.
We then analyzed whether this extra computation time is justified in terms of
the quality obtained, by evaluating the capacity of IBI to provide a good solution.
IBI and the other heuristics have been used as the initialization of a metaheuristic,
i.e., the initial solution of the approach is a solution built by a heuristic. The
experiments on the Taillard instances show that IBI helps the metaheuristic to
be more efficient, and that the extra time needed to build the initial solution is not a
drawback.
References
1. Balas, E., Carrera, M.C.: A dynamic subgradient-based branch-and-bound proce-
dure for set covering. Oper. Res. 44(6), 875-890 (1996)
2. Bertolissi, E.: Heuristic algorithm for scheduling in the no-wait flow-shop. J. Mater.
Process. Technol. 107(1-3), 459-465 (2000)
3. Bianco, L., Dell'Olmo, P., Giordani, S.: Flow shop no-wait scheduling with sequence
dependent setup times and release dates. INFOR Inf. Syst. Oper. Res. 37(1), 3-19
(1999)
4. Gilmore, P.C., Gomory, R.E.: Sequencing a one state-variable machine: a solvable
case of the traveling salesman problem. Oper. Res. 12(5), 655-679 (1964)
5. Gomes, C., Sellmann, M.: Streamlined constraint reasoning. In: Wallace, M. (ed.)
CP 2004. LNCS, vol. 3258, pp. 274-289. Springer, Heidelberg (2004). doi:10.1007/
978-3-540-30201-8_22
6. Grabowski, J., Pempera, J.: The permutation flow shop problem with blocking. A
Tabu Search approach. Omega 35(3), 302-311 (2007)
7. Grabowski, J., Wodecki, M.: A very fast Tabu Search algorithm for the permutation
flow shop problem with makespan criterion. Comput. Oper. Res. 31(11), 1891-1909
(2004)
8. Kadioglu, S., Malitsky, Y., Sellmann, M.: Non-model-based search guidance for
set partitioning problems. In: Hoffmann, J., Selman, B. (eds.) Proceedings of
the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Ontario,
Canada, 22-26 July 2012. AAAI Press (2012)
9. Nagano, M.S., Araujo, D.C.: New heuristics for the no-wait flowshop with sequence-
dependent setup times problem. J. Braz. Soc. Mech. Sci. Eng. 36(1), 139-151
(2013)
10. Nawaz, M., Enscore, E.E., Ham, I.: A heuristic algorithm for the m-machine, n-job
flow-shop sequencing problem. Omega 11(1), 91-95 (1983)
11. Pan, Q.-K., Wang, L., Tasgetiren, M.F., Zhao, B.-H.: A hybrid discrete particle
swarm optimization algorithm for the no-wait flow shop scheduling problem with
makespan criterion. Int. J. Adv. Manuf. Technol. 38(3-4), 337-347 (2007)
12. Rajendran, C.: A no-wait flowshop scheduling heuristic to minimize makespan. J.
Oper. Res. Soc. 45(4), 472-478 (1994)
13. Rock, H.: The three-machine no-wait flow shop is NP-complete. J. ACM 31(2),
336-345 (1984)
14. Schiavinotto, T., Stutzle, T.: A review of metrics on permutations for search land-
scape analysis. Comput. Oper. Res. 34, 3143-3153 (2007)
15. Taillard, E.: Benchmarks for basic scheduling problems. Eur. J. Oper. Res. 64(2),
278-285 (1993)
16. Wang, C., Li, X., Wang, Q.: Accelerated tabu search for no-wait flowshop schedul-
ing problem with maximum lateness criterion. Eur. J. Oper. Res. 206(1), 64-72
(2010)
17. Wismer, D.A.: Solution of the flowshop-scheduling problem with no intermediate
queues. Oper. Res. 20(3), 689-697 (1972)
Sharp Penalty Mappings for Variational
Inequality Problems
Evgeni Nurminski(B)
1 Introduction
Variational inequalities (VI) have become one of the common tools for represent-
ing many problems in physics, engineering, economics, computational biology,
computerized medicine, to name but a few, which extend beyond optimization;
see [1] for an extensive review of the subject. Apart from the mathematical
problems connected with the characterization of solutions and the development of
appropriate algorithmic tools to find them, modern problems offer signif-
icant implementation challenges due to their non-linearity and large scale. This
leaves just a few options for algorithm development, as it occurs in other
related fields such as convex feasibility (CF) problems [2] as well. One of these
options is to use fixed point iteration methods with various attraction properties
toward the solutions, which have low memory requirements and simple, easily
parallelized iterations. These schemes are quite popular for convex optimization
and CF problems, but they need certain modifications to be applied to VI prob-
lems. The idea of the modification can be related to some approaches put forward
for convex optimization and CF problems in [3-5], which are becoming known
as the superiorization technique (see also [6] for a general description).
From the point of view of this approach, the optimization problem
is conceptually divided into the feasibility problem x ∈ X and the second-
stage optimization or VI problem. Then these tasks can be considered, to a
certain extent, separately, which makes it possible to use their specifics to apply
the most suitable algorithms for the feasibility and optimization/VI parts.
The problem is to combine these algorithms in a way which provides the
solution of the original problems (1) or (2). As it turns out, these two tasks can
be merged together under rather reasonable conditions, which basically require
the feasibility algorithm to be resilient with respect to diminishing perturbations
and the second-stage algorithm to be something like globally convergent over
the feasible set or its small expansions.
Needless to say, this general idea meets many technical difficulties; one of
them is to balance the feasibility and optimization/VI steps in an intelligent way.
If the optimization steps are essentially smaller than the feasibility steps, then it is
possible to prove general convergence results [3,4] under rather mild conditions.
However, it appears that this requirement for the optimization steps to be smaller
(in fact, even vanishing compared to the feasibility steps) slows down the overall
optimization in (1) considerably.
This can be seen in the textbook penalty function method for (1), which
consists in the solution of an auxiliary problem of the kind
where φ_X(x) = 0 for x ∈ X and φ_X(x) > 0 otherwise. The term εf(x) can
be considered as a perturbation of the feasibility problem min_x φ_X(x), and for
classical smooth penalty functions the penalty parameter ε > 0 must tend to
zero to guarantee convergence of x_ε to the solution of (1). Definitely, this makes
the objective function f(x) less influential in the solution process of (1) and hinders
the optimization.
To overcome this problem, exact penalty functions ϕ_X(·) can be used, which provide the exact solution of (1) for small enough ε > 0 under rather mild conditions. The price for this conceptual simplification of the solution of (2) is the inevitable non-differentiability of the penalty function ϕ_X(x) and the corresponding worsening of convergence rates, for instance for gradient-like methods (see [8,9] for a comparison). Nevertheless, the idea has a certain appeal, keeping in mind the successes of non-differentiable optimization, and similar approaches, with the necessary modifications, were used for VI problems starting from [10] and followed by [11–14], among others. In these works penalty functions were introduced and their gradient fields direct the iterations to feasibility.
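The contrast between classical smooth penalties and exact (sharp) penalties can be seen on a one-dimensional toy problem (the problem, the grid search and the symbols below are illustrative assumptions, not the paper's formulation): with the smooth penalty dist(x, X)^2 the minimizer is biased by ε/2 and ε must vanish, while with the exact penalty dist(x, X) the minimizer coincides with the constrained solution for every small ε > 0.

```python
import numpy as np

# Toy problem: minimize f(x) = x over X = [1, 2]; the solution is x* = 1.
f = lambda x: x
dist = lambda x: np.maximum(np.maximum(1.0 - x, x - 2.0), 0.0)  # distance to X

xs = np.linspace(-1.0, 3.0, 400001)  # fine grid for a brute-force argmin
for eps in (0.5, 0.1, 0.02):
    smooth = dist(xs) ** 2 + eps * f(xs)   # classical smooth penalty
    sharp = dist(xs) + eps * f(xs)         # exact (sharp) penalty
    x_smooth = xs[np.argmin(smooth)]       # analytically 1 - eps/2: biased
    x_sharp = xs[np.argmin(sharp)]         # exactly 1 for any eps in (0, 1)
    print(round(x_smooth, 3), round(x_sharp, 3))
```

The smooth penalty needs ε → 0 to remove the bias, which is precisely what makes the objective "less influential" in the text above; the sharp penalty recovers the exact solution with ε fixed.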
212 E. Nurminski
This problem has its roots in convex optimization: for F(x) = ∇f(x), VI (5) is the geometrical formalization of the optimality conditions for (1).
If F is monotone, then the pseudo-variational inequality (PVI) problem (6) can be considered instead of (5). For simplicity we assume that both problems (5) and (6) have unique and hence coinciding solutions.
To have more freedom in developing iteration methods for problem (6), we introduce the notions of oriented and strongly oriented mappings according to the following definitions.
4 Iteration Algorithm
After construction of the mapping F, oriented toward the solution x* of (6) on the whole space E except an ϵ-neighborhood of x*, we can use it in an iterative manner like

x^{k+1} = x^k − λ_k f^k,  f^k ∈ F(x^k),  k = 0, 1, . . . ,   (23)

where {λ_k} is a certain prescribed sequence of step-size multipliers, to get a sequence {x^k}, k = 0, 1, . . . , which hopefully converges under some conditions at least to the set X_ϵ = x* + ϵB of approximate solutions of (5).
For technical reasons, however, it is convenient to guarantee from the very beginning the boundedness of {x^k}, k = 0, 1, . . .. Possibly the simplest way to do so is to insert into the simple scheme (23) a safety device which enforces a restart if the current iterate x^k goes too far. This prevents the algorithm from diverging due to the run-away effect, and it can easily be shown that it keeps the sequence of iterates {x^k} bounded.
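A minimal numpy sketch of scheme (23) with the restart safety device follows (the operator, the restart radius and the step sizes are illustrative assumptions; the paper's penalized VI operator is replaced by a simple monotone map with a known zero):

```python
import numpy as np

def penalized_iterations(F, x0, radius=10.0, n_iter=500):
    """Scheme (23): x_{k+1} = x_k - lam_k * f_k, with a restart safety device.

    If an iterate leaves the ball of the given radius around the start,
    the method restarts from x0, which keeps the sequence bounded.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for k in range(n_iter):
        lam = 1.0 / (k + 2)            # lam_k -> +0 and sum of lam_k diverges
        x = x - lam * F(x)
        if np.linalg.norm(x - x0) > radius:
            x = x0.copy()              # restart: run-away effect prevented
    return x

# Toy monotone operator F(x) = x - x*, an illustrative stand-in for the
# penalized VI operator; its zero is x* = (1, 2).
x_star = np.array([1.0, 2.0])
x_end = penalized_iterations(lambda x: x - x_star, np.zeros(2))
print(np.round(x_end, 2))  # close to x_star
```

With the divergent-series step sizes the error contracts by the factor (1 − λ_k) at each step, so the iterates approach x* slowly but surely; the restart device never triggers here, it only guards against divergence.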
Thus the final form of the algorithm is shown as Algorithm 1, assuming that the set X, the operator F and the sharp penalty mapping P_X satisfy the conditions of Lemma 1. We prove convergence of Algorithm 1 under the common assumptions on step sizes: λ_k → +0 as k → ∞ and Σ_{k=1}^{K} λ_k → ∞ as K → ∞. This is not the most efficient way to control the algorithm, but at the moment we are interested mostly in the very fact of convergence.
is a certain interval [w_l, w_u] ⊂ R_+, and the statement of the theorem means that w_u ≤ ϵ².
To prove this, assume the contrary, that is, w_u > ϵ². Then there exists a sub-sequence {x^{k_s}, s = 1, 2, . . . } such that lim_{s→∞} W(x^{k_s}) = w̄ > ϵ². Without loss of generality we may assume that lim_{s→∞} x^{k_s} = x̄ and of course x̄ ∉ X_ϵ. Therefore f(x̄ − x*) > 0 for any f ∈ F(x̄), and by upper semi-continuity of F there exist δ > 0 and θ > 0 such that F(x)(x − x*) ≥ δ for all x ∈ x̄ + 4θB. Again without loss of generality we may assume that θ < (√w̄ − ϵ)/4, so
(x̄ + 4θB) ∩ (x* + ϵB) = ∅.
For s large enough, x^{k_s} ∈ x̄ + θB; let us assume that for all t > k_s the sequence {x^t, t > k_s} ⊂ x^{k_s} + θB ⊂ x̄ + 2θB.
Then

W(x^{t+1}) ≤ W(x^t) − λ_t δ

for all t > k_s and s large enough that sup_{t>k_s} λ_t < δ/C². Summing up the last inequalities from t = k_s to t = T − 1, we obtain

W(x^T) ≤ W(x^{k_s}) − δ Σ_{t=k_s}^{T−1} λ_t.   (31)
Passing to the limit as s → ∞ (along a suitable subsequence with limit point x̃), we obtain W(x̃) ≤ W(x̄) − δ/K < W(x̄). Also W(x̃) > ϵ², as x̃ ∈ x̄ + 4θB, which does not intersect x* + ϵB. To save on notation, denote W(x̄) = w̄ and W(x̃) = w̃.
In other words, assuming that w̄ > ϵ², we constructed another limit point w̃ of the sequence {W(x^k)} such that ϵ² < w̃ < w̄. It follows that the sequence {W(x^k)} crosses any sub-interval [w', w''] ⊂ (w̃, w̄) infinitely many times in both the up and down directions, and hence there exist sub-sequences {p_s, s = 1, 2, . . . } and {q_s, s = 1, 2, . . . } such that p_s < q_s and
Then, summing the corresponding inequalities from t = p_s to t = q_s − 1, we conclude that for any s there is an index t_s, p_s < t_s < q_s, such that the corresponding estimate holds for all s large enough. This contradicts (37) and therefore proves the theorem.
5 Conclusions
In this paper we define and use a sharp penalty mapping to construct an iteration algorithm converging to approximate solutions of monotone variational inequalities. Sharp penalty mappings are analogues of the gradient fields of exact penalty functions, but they need not be potential mappings. Three examples of sharp penalty mappings are given, one of which seems to be new. The algorithm consists in the recursive application of a penalized variational inequality operator, scaled by step-size multipliers which satisfy the classical divergent-series condition. As for the practical value of these results, it is generally believed that the conditions on the step-size multipliers used in this theorem result in rather slow convergence, of the order O(k^{−1}). However, the convergence rate can be improved by different means, following the example of non-differentiable optimization. A promising direction is, for instance, least-norm adaptive regulation, suggested probably first by Fiacco and McCormick [16] as early as 1968 and studied in more detail in [17] for convex optimization problems. With some modification it can easily be used for VI problems as well. Experiments show that under favorable conditions it produces step multipliers decreasing as a geometric progression, which gives linear convergence of the algorithm. This may explain the success of [7], where a geometric progression for step multipliers was independently suggested and tested in practice.
References
1. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2003)
2. Bauschke, H., Borwein, J.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38, 367–426 (1996)
Sharp Penalty Mappings 221
Andrei Orlov
1 Introduction
As is well known, problems with a hierarchical structure arise in investigations of complex control systems [9], with bilevel optimization being the most popular modeling tool [7]. According to Pang [17], a distinguished expert in optimization, the development of methods for solving various problems with a hierarchical structure is one of the three challenges faced by optimization theory and methods in the 21st century. This paper investigates one of the classes of bilevel problems, with a convex quadratic upper level goal function and a quadratic lower level goal function under linear constraints. The lower level goal function has a bilinear component. The task is to find an optimistic solution in the situation when the actions of the lower level may be coordinated with the interests of the upper level [7].
During more than three decades of intensive investigation of bilevel optimization problems, many methods for finding optimistic solutions have been proposed by different authors (see the surveys [6,8]). Nevertheless, as far as we can conclude from the available literature, few results published so far deal with the numerical solution of even test bilevel problems of high dimension (for example, up to 100 variables at each level for linear bilevel problems [19]). Most frequently, authors consider just illustrative examples with dimension up to 10 (see [14,18]), and only the works [1,5,10,12] present some results on solving nonlinear bilevel problems of dimension up to 30 at each level.
c Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 222234, 2017.
https://doi.org/10.1007/978-3-319-69404-7_16
A Nonconvex Optimization Approach to Quadratic Bilevel Problems 223
At the same time, our group has experience in solving linear bilevel problems with up to 500 variables at each level [11] and quadratic-linear bilevel problems of dimension up to (150 × 150) [23]. Here we generalize our methods to problems with quadratic goal functions at each level.
For this purpose, we use the most common approach to addressing a bilevel problem: its reduction to a single-level one by replacing the lower level problem with its optimality conditions (the so-called KKT-approach) [7,23]. Then, using the penalty method, the resulting problem with a nonconvex feasible set is reduced to a series of problems with a nonconvex goal function under linear constraints [7,23]. The latter turn out to be d.c. optimization problems (with goal functions that can be represented as the difference of two convex functions) [20,21], which can be addressed by means of the Global Search Theory developed by Strekalovsky [20,21]. In contrast to generally accepted global optimization methods such as branch-and-bound based techniques, approximation methods and the like, this theory employs a reduction of the nonconvex problem to a family of simpler problems, with the possibility of applying classic convex optimization methods.
In accordance with the Global Search Theory, this paper aims at constructing specialized local and global search methods for finding optimistic solutions to the problems under study. Illustrative examples taken from the literature are used to demonstrate that the approach proposed for the numerical solution of quadratic bilevel problems performs rather well.
Note that despite the bilinear component in its goal function, for a fixed x ∈ X the lower level problem

(1/2)⟨y, D1 y⟩ + ⟨xQ + d1, y⟩ → min_y,  A1 x + B1 y ≤ b1,   (FP(x))

remains convex.
By replacing the lower level problem in (QBP) with its optimality conditions (KKT), we obtain the following mathematical optimization problem:

F(x, y) → min_{x,y,v},  Ax ≤ b,
D1 y + d1 + xQ + vB1 = 0,  v ≥ 0,  A1 x + B1 y ≤ b1,   (DCC)
⟨v, A1 x + B1 y − b1⟩ = 0.
It is clear that (DCC) is a problem with a nonconvex feasible set, because the nonconvexity is generated by the last equality constraint (the complementarity constraint). The following theorem on the equivalence between (QBP) and (DCC) is valid.
Theorem 1 [7]. For the pair (x*, y*) to be a global solution to (QBP), it is necessary and sufficient that there exists a vector v* ∈ IR^q such that the triple (x*, y*, v*) is a global solution to (DCC).
This theorem reduces the search for an optimistic solution to the bilevel problem (QBP) to solving Problem (DCC). Note that direct handling of the nonconvex set in this problem is quite challenging, which is why we use the penalty method to reduce it to a series of nonconvex problems with a convex feasible set.
It is easy to see that the complementarity constraint can be written in the equivalent form ⟨v, b1 − A1 x − B1 y⟩ = 0. Then (x, y, v) ∈ D, where

D := {(x, y, v) | Ax ≤ b, D1 y + d1 + xQ + vB1 = 0, v ≥ 0, A1 x + B1 y ≤ b1},

and the penalized problem takes the form

F(x, y) + μ⟨v, b1 − A1 x − B1 y⟩ → min_{x,y,v},  (x, y, v) ∈ D,   (DC(μ))

where μ > 0 is a penalty parameter. For a fixed μ this problem belongs to the class of d.c. minimization problems [20,21] with a convex feasible set. In what
follows, we show that the goal function of (DC(μ)) can be represented as a difference of two convex functions. Let (x(μ), y(μ), v(μ)) be a solution to (DC(μ)) for some μ. Denote r[μ] := ⟨v(μ), b1 − A1 x(μ) − B1 y(μ)⟩ and formulate the following result on the connection between the solutions to (DCC) and (DC(μ)).
Proposition 1 [2,7]
(i) Let, for some μ > 0, the equality r[μ] = 0 hold for a solution (x(μ), y(μ), v(μ)) to Problem (DC(μ)). Then the triple (x(μ), y(μ), v(μ)) is a solution to Problem (DCC).
(ii) For all values of the parameter μ exceeding some threshold μ̄ > 0, the function r[μ] vanishes, so that the triple (x(μ), y(μ), v(μ)) is a solution to Problem (DCC).
V-procedure
Step 0. Set s := 1, v^s := v^0.
Step 1. Using a suitable quadratic programming method, find the (δ_s/2)-solution (x^{s+1}, y^{s+1}) to the problem

(1/2)⟨x, Cx⟩ + ⟨c, x⟩ + (1/2)⟨y, Dy⟩ + ⟨d, y⟩ − μ⟨v^s A1, x⟩ − μ⟨v^s B1, y⟩ → min_{x,y},   (QP(v^s))
Ax ≤ b,  A1 x + B1 y ≤ b1,  D1 y + d1 + xQ + v^s B1 = 0.
226 A. Orlov
Step 2. Find the (δ_s/2)-solution v^{s+1} to the following LP problem:

⟨b1 − A1 x^{s+1} − B1 y^{s+1}, v⟩ → min_v,   (LP(x^{s+1}, y^{s+1}))
D1 y^{s+1} + d1 + x^{s+1} Q + vB1 = 0,  v ≥ 0.
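The alternating structure of the V-procedure, exact block minimization in (x, y) for fixed v and then in v for fixed (x, y), can be sketched on a toy objective with a bilinear coupling (the objective and its constants are illustrative assumptions, and closed-form block minimizers stand in for the QP and LP solvers):

```python
import numpy as np

def v_procedure_toy(a=0.8, b=0.9, c=1.0, n_steps=30):
    """Alternating minimization in the spirit of the V-procedure above.

    Toy objective with a bilinear coupling term (an illustrative assumption):
        Phi(u, v) = (u - a)^2 + (v - b)^2 + c*u*v,  u, v in [0, 1].
    Each step minimizes Phi exactly in one block while the other is fixed,
    so the sequence of objective values is nonincreasing and converges.
    """
    phi = lambda u, v: (u - a) ** 2 + (v - b) ** 2 + c * u * v
    u, v = 1.0, 1.0
    values = [phi(u, v)]
    for _ in range(n_steps):
        u = min(max(a - c * v / 2.0, 0.0), 1.0)   # argmin over u, v fixed
        v = min(max(b - c * u / 2.0, 0.0), 1.0)   # argmin over v, u fixed
        values.append(phi(u, v))
    return u, v, values

u, v, values = v_procedure_toy()
assert all(x >= y - 1e-12 for x, y in zip(values, values[1:]))  # nonincreasing
print(round(u, 3), round(v, 3))
```

The limit is a point where neither block can be improved with the other fixed, i.e. a critical point in the partial-minimization sense used in the text.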
Theorem 2
(i) Let the sequence {δ_s} be such that δ_s > 0, s = 1, 2, . . ., and Σ_{s=1}^{∞} δ_s < +∞. Then the number sequence {Φ_s := Φ_μ(x^s, y^s, v^s)} of goal function values generated by the V-procedure converges.
(ii) If (x^s, y^s, v^s) → (x̄, ȳ, v̄), then the accumulation point (x̄, ȳ, v̄) satisfies the inequalities

Φ_μ(x̄, ȳ, v̄) ≤ Φ_μ(x, y, v̄)  ∀(x, y) ∈ D(v̄),   (1)
Φ_μ(x̄, ȳ, v̄) ≤ Φ_μ(x̄, ȳ, v)  ∀v ∈ D(x̄, ȳ),   (2)

where D(v̄) := {(x, y) | (x, y, v̄) ∈ D}, D(x̄, ȳ) := {v | (x̄, ȳ, v) ∈ D}, and Φ_μ is the goal function of (DC(μ)).
Definition 1. The triple (x̄, ȳ, v̄) satisfying (1) and (2) is said to be a critical point of Problem (DC(μ)). If the inequalities (1) and (2) are satisfied with a certain accuracy, we refer to this point as approximately critical.
It can be shown (see, e.g., [23]) that if we use, for example, the inequality

Φ_s − Φ_{s+1} ≤ τ,   (3)

where τ is a given accuracy, as a stopping criterion for the V-procedure, then after a finite number of iterations of the local search method we arrive at an approximately critical point. Recall that such a definition of critical points is quite advantageous when performing a global search in problems with a bilinear structure [22,23]. The next section describes the basic elements of the global search for Problem (DC(μ)).
As is well known, a local search does not, in general, provide a global solution in nonconvex problems of even moderate dimension [20–22]. Therefore, in what follows we discuss a procedure for escaping critical points obtained during the local search. The procedure is based on the Global Optimality Conditions (GOCs) developed by Strekalovsky for d.c. minimization problems [20,21]. To build the global search procedure, first of all we need an explicit d.c. representation of the goal function of (DC(μ)). We employ a representation based on the known property of the scalar product, which defines the function h(z, u, w) (see (4)) and the point (x̂, ŷ, v̂) (see (5)).
Note that the definition of h(·) (see (4)) makes it possible to solve Problem (U_i) analytically. Let (z^i_0, u^i_0, w^i_0) be the approximate solution to this problem.
5 Computational Simulation
To illustrate the operability of the newly constructed local and global search algorithms for finding optimistic solutions to quadratic bilevel problems, a few examples of moderate dimension were taken from the available literature. Note that here they have the form (QBP), with constant components excluded from the goal functions at the upper and lower levels.
Example 1 ([18])

F(x, y) = x^2 − 10x + 4y^2 + 4y → min_{x,y},
x ∈ X = {x ∈ IR^1 | x ≥ 0},
y ∈ Y*(x) = Arg min{y^2 − 2y | y ∈ Y(x)},
Y(x) = {y ∈ IR^1 | −3x + y ≤ −3, x − 0.5y ≤ 4, x + y ≤ 7, y ≥ 0}.

It is well known that the global optimistic solution to this problem, with the optimal goal function value F* = −9, is achieved at the point (x*, y*) = (1, 0).
Example 2 ([1])

F(x, y) = x^2 − 10x + 4y^2 + 4y → min_{x,y},
x ∈ X = {x ∈ IR^1 | x ≥ 0},
y ∈ Y*(x) = Arg min{y^2 − 2y − 1.5xy | y ∈ Y(x)},
Y(x) = {y ∈ IR^1 | −3x + y ≤ −3, x − 0.5y ≤ 4, x + y ≤ 7, y ≥ 0}.
Interestingly, this problem differs from the previous one only by the bilinear component in the lower level goal function. Obviously, this does not result in nonconvexity of the problem at the lower level, because for a fixed x the bilinear component becomes linear. Moreover, the global optimistic solution to this problem, with the optimal goal function value F* = −9, is achieved at the same point (x*, y*) = (1, 0) as above.
Example 3 ([4])

F(x, y) = (1/2)x^2 − x + (1/2)y^2 → min_{x,y},
x ∈ X = {x ∈ IR^1 | x ≥ 0},
y ∈ Y*(x) = Arg min{(1/2)y^2 − xy | y ∈ Y(x)},
Y(x) = {y ∈ IR^1 | x − y ≤ 1, x + y ≤ λ, −x − y ≤ 1}.
This problem has the form of the kernel problems used for generating quadratic bilevel problems by means of the method constructed in [4]. Depending on the value of the parameter λ, the kernel problems are divided into several classes. Even though the problems discussed above differ from each other only by a single parameter, they all have different properties and a different number of local and global solutions (for more detail, refer to [4]).
Example 4 ([14])

F(x, y) = −7x1 + 4x2 + y1^2 + y3^2 − y1 y3 − 4y2 → min_{x,y},
x ∈ X = {x ∈ IR^2 | x1 + x2 ≤ 1, x1, x2 ≥ 0},
y ∈ Y*(x) = Arg min{y1 − 3x1 y1 + y2 + x2 y2 + (1/2)y1^2 + (1/2)y2^2 + y3^2 + y1 y2 | y ∈ Y(x)},
Y(x) = {y ∈ IR^3 | x1 − 2x2 + 2y1 + y2 − y3 + 2 ≤ 0, y1, y2, y3 ≥ 0}.
Apparently, this problem is more complex than the previous ones, because the dimension of the upper level variable is 2, whereas the dimension of the lower level variable is 3. The optimal value of the global optimistic solution is approximately F* = 0.6426, achieved at the point (x*, y*) = (0.609, 0.391, 0, 0, 1.828).
First, for each of the problems we write down its single-level equivalent. Note that the dimension of the single-level problem exceeds that of the original problem by exactly the number of constraints at the lower level of the bilevel problem. Further, we penalize the complementarity constraint in each problem and afterwards apply the local and global search methods described above.
The software implementing the methods developed was coded in MATLAB 7.11.0.584 R2010b [13]. The auxiliary linear and convex quadratic programming problems were solved by the standard MATLAB subroutines linprog and quadprog with the default settings [13]. To run the software, we used a computer with an Intel Core i5-2400 processor (3.1 GHz) and 4 GB RAM.
To construct feasible starting points for the local search method, we used the projection of the chosen infeasible point (x^0, y^0, v^0) = (0, 0, 0) onto the feasible set D, obtained by solving the following problem:

(1/2)‖(x, y, v) − (x^0, y^0, v^0)‖^2 → min_{x,y,v},  (x, y, v) ∈ D.   (PR(x^0, y^0, v^0))
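As a minimal illustration of such a projection start (restricted to the equality part of D only; the matrix and vector below are illustrative assumptions), the Euclidean projection onto an affine set has a closed form:

```python
import numpy as np

def project_affine(z0, A, b):
    """Euclidean projection of z0 onto {z : A z = b}.

    This handles only the equality-constrained core of a projection problem
    like (PR(x0, y0, v0)); projecting onto a full polyhedral set with
    inequalities would additionally require a QP solver such as quadprog.
    """
    # z* = z0 - A^T (A A^T)^{-1} (A z0 - b); the minimum-norm least-squares
    # solve below computes the correction A^T (A A^T)^{-1} (A z0 - b).
    correction, *_ = np.linalg.lstsq(A, A @ z0 - b, rcond=None)
    return z0 - correction

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
z = project_affine(np.zeros(3), A, b)
print(np.round(z, 3), np.allclose(A @ z, b))
```

Starting from the origin this returns the minimum-norm point of the affine set, which is exactly the projection of the zero vector used as the infeasible start above.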
Here e_l ∈ IR^{m+n} and e_j ∈ IR^q are the Euclidean basis vectors of the corresponding dimension, and (x̄, ȳ, v̄) is the current critical point. These sets proved to be the most efficient when solving the test problems. Computational results are given in Table 1 with the following notation:
No is the number of the example;
Dir is the most efficient direction set, which delivered the global optimistic solution for the given problem;
Loc stands for the number of start-ups of the local search procedure required to find the approximate global solution to the problem;
LP is the number of LP problems solved during the operation of the program;
QP stands for the number of auxiliary convex quadratic problems solved;
GIt is the number of iterations of the global search method (the number of improved critical points obtained during the operation of the program);
(x*, y*) is an optimal solution to the bilevel problem;
v* is the optimal value of the auxiliary variables;
F* = Φ* is the optimal value of the goal functions of Problems (QBP) and (DC(μ));
T is the operating time of the program (in seconds).
First of all, note that the values of the parameters μ, τ, and M specified above happened to be insufficient to solve Problem 4. To find the global optimistic solution to it, we needed the values μ = 15, τ = 0.1, M = 3. On the other hand, we managed to somewhat improve the approximate solution compared with the results from [14]. This involved 6 global search iterations and about 1 s of operating time.
Analysis of the rest of the results in the table shows that in all one-dimensional problems the known global optimistic solutions at each level were found in less than 0.3 s, which required between 1 and 3 iterations of the global search method (for GIt = 1, the solution was obtained already at the local search stage).
As expected, Problem 4 happened to be more complex than the others, primarily due to its dimension. This is attested by the values in the columns Loc, LP, QP, and GIt, which can be considered as a complexity measure for the problem under study.
Therefore, computational experiments demonstrated that the Global Search Theory performs well when applied to bilevel problems of quadratic optimization, whereby varying the parameters of the algorithm opens up great prospects for solving both simple and more complex problems.
6 Conclusion
This paper proposes a novel approach to solving quadratic bilevel problems based on their reduction to parametric problems of d.c. minimization with a subsequent application of the Global Search Theory [20,21]. Specialized local and global search methods have been constructed to find optimistic solutions to bilevel problems. The methods have proved to be efficient in solving test problems of moderate dimension.
Further research suggests extending the range and dimension of the problems. For this purpose, it is planned to implement a special method for generating bilevel optimization test problems from [4].
The results of numerical testing, as well as our previous computational experience [11,15,16,22,23], allow us to expect that the proposed approach will prove effective in solving quadratic bilevel problems of high dimension (probably, up to 100 × 100), with the supplementary possibility of exploiting modern software packages for solving auxiliary LP and convex quadratic problems (IBM CPLEX, FICO Xpress, etc.).
Acknowledgments. This work has been supported by the Russian Science Foundation (Project no. 15-11-20015).
References
1. Bard, J.F.: Convex two-level optimization. Math. Prog. 40, 15–27 (1988)
2. Bazaraa, M.S., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, New York (1979)
3. Bonnans, J.-F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theoretical and Practical Aspects. Springer, Heidelberg (2006)
4. Calamai, P., Vicente, L.: Generating quadratic bilevel programming test problems. ACM Trans. Math. Softw. 20, 103–119 (1994)
5. Colson, B., Marcotte, P., Savard, G.: A trust-region method for nonlinear bilevel programming: algorithm and computational experience. Comput. Optim. Appl. 30, 211–227 (2005)
6. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153, 235–256 (2007)
7. Dempe, S.: Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht (2002)
8. Dempe, S.: Bilevel programming. In: Audet, C., Hansen, P., Savard, G. (eds.) Essays and Surveys in Global Optimization, pp. 165–193. Springer, Boston (2005)
9. Dempe, S., Kalashnikov, V.V., Perez-Valdes, G.A., Kalashnykova, N.: Bilevel Programming Problems: Theory, Algorithms and Applications to Energy Networks. Springer, Heidelberg (2015)
10. Etoa, J.B.E.: Solving quadratic convex bilevel programming problems using a smoothing method. Appl. Math. Comput. 217, 6680–6690 (2011)
11. Gruzdeva, T.V., Petrova, E.G.: Numerical solution of a linear bilevel problem. Comput. Math. Math. Phys. 50, 1631–1641 (2010)
12. Gumus, Z.H., Floudas, C.A.: Global optimization of nonlinear bilevel programming problems. J. Glob. Optim. 20, 1–31 (2001)
13. MATLAB – The language of technical computing. http://www.mathworks.com/products/matlab/
14. Muu, L.D., Quy, N.V.: A global optimization method for solving convex quadratic bilevel programming problems. J. Glob. Optim. 26, 199–219 (2003)
15. Orlov, A.V.: Numerical solution of bilinear programming problems. Comput. Math. Math. Phys. 48, 225–241 (2008)
16. Orlov, A.V., Strekalovsky, A.S.: Numerical search for equilibria in bimatrix games. Comput. Math. Math. Phys. 45, 947–960 (2005)
17. Pang, J.-S.: Three modeling paradigms in mathematical programming. Math. Prog. Ser. B 125, 297–323 (2010)
18. Pistikopoulos, E.N., Dua, V., Ryu, J.-H.: Global optimization of bilevel programming problems via parametric programming. In: Floudas, C.A., Pardalos, P.M. (eds.) Frontiers in Global Optimization, pp. 457–476. Kluwer Academic Publishers, Dordrecht (2004)
19. Saboia, C.H., Campelo, M., Scheimberg, S.: A computational study of global algorithms for linear bilevel programming. Numer. Algorithms 35, 155–173 (2004)
20. Strekalovsky, A.S.: Elements of Nonconvex Optimization. Nauka, Novosibirsk (2003). [in Russian]
21. Strekalovsky, A.S.: On solving optimization problems with hidden nonconvex structures. In: Rassias, T.M., Floudas, C.A., Butenko, S. (eds.) Optimization in Science and Engineering, pp. 465–502. Springer, New York (2014). doi:10.1007/978-1-4939-0808-0_23
22. Strekalovsky, A.S., Orlov, A.V.: Bimatrix Games and Bilinear Programming. FizMatLit, Moscow (2007). [in Russian]
23. Strekalovsky, A.S., Orlov, A.V., Malyshev, A.V.: On computational search for optimistic solutions in bilevel problems. J. Glob. Optim. 48, 159–172 (2010)
An Experimental Study of Adaptive Capping
in irace
1 Introduction
Algorithm configuration is the task of finding parameter settings (a configuration) of a target algorithm that achieve high performance for a given class of problem instances [6,8]. The appropriate choice of parameter settings is often crucial for obtaining good performance, particularly when dealing with computationally challenging (e.g., NP-hard) problems. This choice usually depends on the set or distribution of problem instances to be solved as well as on the execution environment. Therefore, using appropriately chosen parameter values is not only essential for reaching peak performance, but also for conducting fair performance comparisons between different algorithms for the same problem.
Traditionally, algorithm configuration has been performed manually, relying on experience and intuition about the behaviour of a given algorithm. However, typical manual configuration processes are time-consuming and tedious; furthermore, they often leave the performance potential of a given target algorithm
c Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 235250, 2017.
https://doi.org/10.1007/978-3-319-69404-7_17
236 L.P. Caceres et al.
Fig. 1. Illustration of the 1st and 2nd iterations of a run of irace using T^first = 3, T^each = 1 and T^new = 1.
first statistical test is applied, and the configurations that perform significantly worse than the best ones are eliminated. This elimination test is performed every T^each instances until the termination criterion of the iteration is met. The surviving configurations at the end of the iteration (elite configurations) are used to update a probabilistic model from which new configurations are sampled. The set of configurations evaluated in the next iteration is composed of the elite configurations and newly sampled configurations (non-elite ones). Algorithm 1 shows the pseudo-code of the race performed at each iteration of irace. Instances are evaluated following an execution order that is built by interleaving new and old instances (procedure generateInstancesList in line 1).
More precisely, the instance list includes T^new previously unseen instances, followed by the list of previously evaluated instances (I^old), and finally, enough new instances to complete the race. I^old is randomly shuffled to avoid a bias that could result from always using the same instance order. A race may terminate even before evaluating all instances in I^old (e.g. when a minimum number of configurations is reached), and, as a result, each elite configuration may be evaluated on only some instances in I^old. When the race finishes, irace therefore memorises which configuration has been evaluated on which instance. (Line 7 updates the elite status, and line 10 tests this condition.) In line 6, the configurations are evaluated on instance I[i] with a maximum execution time of b^max. If an elite configuration was already previously evaluated on I[i] (i.e., I[i] ∈ I^old), its result on that instance is read from memory. When the statistical elimination test is applied, only non-elite configurations (Θ^new) may be eliminated; elite ones are kept until they become non-elite. A configuration becomes non-elite once all instances in I^old on which it was previously evaluated have been seen in the race. Finally, the race returns the best configurations found, which will become elite in the next iteration. For more details about irace, see [17].
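The instance-ordering logic just described can be sketched as follows (a hypothetical Python helper mimicking the behaviour of generateInstancesList, not irace's actual R implementation; the names and signature are assumptions):

```python
import random

def generate_instances_list(i_old, new_instances, t_new, race_length, seed=42):
    """T_new unseen instances first, then the shuffled previously-seen
    instances I_old, then enough further new instances to fill the race."""
    rng = random.Random(seed)
    fresh = list(new_instances)
    order = fresh[:t_new]                  # T_new previously unseen instances
    old = list(i_old)
    rng.shuffle(old)                       # avoid bias from a fixed order
    order += old
    remaining = race_length - len(order)
    order += fresh[t_new:t_new + max(remaining, 0)]
    return order[:race_length]

i_old = ["i1", "i2", "i3"]
new = [f"n{k}" for k in range(1, 6)]
lst = generate_instances_list(i_old, new, t_new=1, race_length=6)
print(lst)
```

Placing the old instances early means elite configurations, whose results on them are cached, cost nothing to re-evaluate there, while the shuffle prevents a fixed instance order from biasing the eliminations.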
ParamILS [13] is an iterated local search [19] procedure that searches in a parameter space defined by categorical parameters only; to configure numerical parameters with ParamILS, these need to be discretised. ParamILS uses a first-improvement local search algorithm that explores, in random order, the one-exchange neighbourhood of the current configuration.
There are two versions of ParamILS, BasicILS and FocusedILS, which differ in the number of instances evaluated when comparing two configurations [13]. BasicILS compares configurations by evaluating them on a fixed number N of instances, while FocusedILS varies the number of instances according to the quality of the configurations to be tested. The number of instances used in the comparison is adjusted based on the dominance criterion, by which a configuration j is dominated by a configuration i if (1) i has been evaluated on at least as many instances as j, and (2) the aggregated performance of i is better than or equal to that of j on the N_j instances on which j has been evaluated. When no dominance can be established between two configurations, the number of instances seen by the configuration with fewer instances evaluated is increased until both configurations have been evaluated on the same number of instances. The execution of a configuration on each instance is always bounded by a defined maximum execution time (cut-off time).
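A minimal sketch of the dominance check described above (assuming minimisation, mean aggregation, and a common instance ordering so that the first N_j results of i correspond to the instances j has seen; these are simplifying assumptions, not the exact ParamILS code):

```python
def dominates(results_i, results_j):
    """Configuration i dominates j if i has been evaluated on at least as
    many instances as j and i's mean performance on the first N_j
    instances is at least as good as j's mean performance."""
    n_j = len(results_j)
    if len(results_i) < n_j or n_j == 0:
        return False
    mean = lambda xs: sum(xs) / len(xs)
    return mean(results_i[:n_j]) <= mean(results_j)

# i seen on 3 instances, j on 2; i is at least as good on j's instances
assert dominates([1.0, 2.0, 10.0], [2.0, 3.0])
assert not dominates([1.0], [2.0, 3.0])   # i evaluated on fewer instances
```

When neither configuration dominates the other, FocusedILS evaluates the one with fewer results on additional instances until the counts match, as described in the text.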
The adaptive capping technique further bounds the execution of a configuration by using the running time of good configurations as a bound that is often lower than the user-specified cut-off time. Using this technique can significantly reduce the time wasted in the evaluation of poorly performing configurations. Adaptive capping adjusts the bound on running time according to the number of instances to be used in the comparison, and for this reason it can be sensitive to the ordering of the given instances. There are two types of adaptive capping: trajectory-preserving and aggressive capping [13]. The first of these bounds the running time of new configurations using the performance of the currently best configuration of each ParamILS iteration as a reference, while the second additionally uses the performance of the overall best configuration multiplied by a factor, set to two by default, for bounding. This factor controls the aggressiveness of the capping strategy. Further details on adaptive capping can be found in [13].
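The bound computation can be sketched as follows (a simplified illustration assuming a total-time bound over the first instances of the comparison; the helper name and signature are hypothetical):

```python
def adaptive_cap(incumbent_times, n_instances, cutoff, bound_multiplier=2.0):
    """A challenger compared on the first n_instances instances may spend at
    most the time the reference configuration needed on them, scaled by the
    aggressiveness factor, and never more than the user cut-off per instance."""
    reference_total = sum(incumbent_times[:n_instances])
    return min(bound_multiplier * reference_total, cutoff * n_instances)

# The incumbent needed 3 + 5 s on the first two instances; with factor 2 a
# challenger is cut off after 16 s in total instead of 2 * 300 s.
print(adaptive_cap([3.0, 5.0, 40.0], 2, 300.0))  # 16.0
```

Because the bound depends on the incumbent's times on exactly the instances used in the comparison, reordering the instances changes the cap, which is the sensitivity to instance ordering mentioned above.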
5 Experiments
In this section, we study the impact of introducing the previously described capping procedure into irace. We compare the performance of the final configurations obtained by elitist irace and iracecap using different settings.
Lingeling [14]. 300 s cut-off time, 172 800 s total configuration budget, and training and testing sets of 299 and 302 SAT instances, respectively. These instances were obtained from the 2014 Configurable SAT Solver Competition (CSSC) [14].
Spear [13]. 300 s cut-off time, 172 800 s total configuration budget, and training and testing sets of 302 SAT-encoded software verification instances each.
The instance files for these scenarios are also available from the Algorithm Configuration Library (AClib) [15]. AClib specifies a cut-off time of 10 000 s for the CPLEX scenarios, which stems from their initial use in conjunction with the CPLEX auto-tuning tool. Following the experiments in [11, Sect. 5], we use a cut-off time of 300 s.¹ Another minor difference is that we used version 12.4 of CPLEX, which was installed on our system, while AClib proposes to use version 12.6. However, there is no obvious reason to suspect that the particular version of CPLEX should affect our conclusions on the effect of capping inside irace, and we do not directly compare to results for the original AClib scenarios. Moreover, although both irace and SMAC are able to handle non-discrete parameter spaces, for ParamILS all parameters have to be discretised, with all possible values specified explicitly in the scenario definition. There is some evidence that the use of non-discrete parameter spaces, where possible, leads to improved results [12], thus giving an advantage to both irace and SMAC over ParamILS, unrelated to the capping mechanism, which is the focus of our comparison presented in Sect. 6. To avoid this bias, we only consider variants of the scenarios where all parameters are discretised and explicitly specified.
In all our experiments, we used the t-test to eliminate configurations within
irace, as previously recommended for running time minimisation [21]. The com-
parisons presented in the following are based on 20 independent runs of all
configuration procedures; multiple independent configurator runs are performed
due to the inherent randomness of the configuration procedures and the con-
figuration scenarios. The experiments were run on one core of a dual-processor
2.1 GHz AMD Opteron system with 16 cores per CPU, 16 MB cache and 64 GB
RAM, running Cluster Rocks 6.2, which is based on CentOS 6.2.
In our empirical analysis of iracecap, we use mean running time as the per-
formance criterion to be optimised by irace. Runs that time out due to reaching
the cut-off time are then counted at this maximum cut-off time. In the litera-
ture, unsuccessful runs are often penalised more strongly, effectively computing
the number of timed-out runs multiplied by a penalty factor pf plus the mean
computation time of the successfully terminated runs. In fact, the penalty fac-
tor pf converts the bi-objective problem of minimising the number of timed-out
runs and the mean time of successful runs into a single-objective problem. In this
section, runs of irace attempt to minimise mean running time (with pf = 1),
and we therefore assessed the performance of the resulting target algorithm
configurations using this performance metric. In the supplementary material,
we additionally present results for evaluating configurations using pf = 10 and
pf = 100. In the literature, pf = 10 is commonly used and referred to as PAR10;
consequently, in Sect. 6, all configurator runs and target algorithm evaluations
are performed using PAR10 scores.

¹ A higher cut-off time, as used in AClib, would be detrimental for configuration pro-
cedures such as iracecap, as time-outs would very strongly impact the number of con-
figurations that can be evaluated. On the other hand, there are various techniques,
such as early termination of ongoing runs or the initial use of smaller maximum
cut-off times, to address this problem. In the literature, the use of smaller cut-off
times has been suggested as a possible remedy [12, footnote 9].

An Experimental Study of Adaptive Capping in irace 243
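The penalised metric described above can be sketched as follows (`par_score` is an illustrative helper of ours, not part of irace):

```python
def par_score(times, cutoff, pf=10):
    """Penalised average runtime: timed-out runs count as pf * cutoff.

    pf = 1 gives the plain mean with time-outs counted at the cut-off
    time; pf = 10 yields the PAR10 score commonly used in the literature.
    """
    penalised = [t if t < cutoff else pf * cutoff for t in times]
    return sum(penalised) / len(penalised)

times = [10.0, 50.0, 300.0]  # third run hit the 300 s cut-off
par1 = par_score(times, cutoff=300.0, pf=1)    # mean, time-outs at cut-off
par10 = par_score(times, cutoff=300.0, pf=10)  # PAR10
```

Note how a single time-out dominates the PAR10 score, which is precisely the bi-objective trade-off the penalty factor encodes.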
We first compare the results obtained by elitist irace and iracecap, using their
respective default settings. Table 1 presents performance statistics over the 20
runs of both irace versions. The implementation of the proposed capping proce-
dure proves to be beneficial for the scenarios used in these experiments. For the
Regions 100, Regions 200, Corlat, and Spear scenarios, the results obtained by
iracecap are significantly better than those of elitist irace, while for the Lingeling
scenario, the results are not significantly different (however, iracecap still achieves
a better mean than irace).
Table 1. Summary statistics of the distribution of observed mean running time and
percentage of timed-out evaluations of 20 runs of iracecap and elitist irace (irace) on
test sets for the various configuration scenarios. We show the first and third quartiles
(q25 and q75, respectively), the median, the mean, the standard deviation (sd) and the
coefficient of variation (sd/mean). Wilcoxon test p-values are reported in the last line.
Statistically significantly better results (at α = 0.05) are indicated in bold-face and
lowest mean running times in italics.
of the iteration (bars). (For results on all other scenarios, see Figure A.2.) The
capping procedure selects more configurations for elimination than the statistical
test in all stages of the search, while the statistical test is mainly able to eliminate
configurations in the initial phases of the search. As the race progresses, capping
quickly becomes the mechanism mainly responsible for eliminating configurations,
illustrating the importance of introducing it into irace.
Fig. 2. Mean percentage of configurations selected for elimination by the capping proce-
dure and the statistical test (solid and dashed lines, respectively), and mean percentage
of initial configurations that become elite configurations at the end of the iteration
(bars). Means obtained across 20 independent runs of iracecap on the Regions 200 and
Spear scenarios.
Table 2. Statistics over 20 independent runs of iracecap and irace: mean number of
iterations performed (iterations), mean number of instances used in the evaluation
(instances), mean overall sampled configurations (candidates), mean elite configura-
tions per iteration (elites) and mean total executions (executions).
In what follows, we examine in more detail the impact of some specific parameter
settings of iracecap on its performance. For the sake of conciseness, we only
discuss overall trends; detailed results can be found in the supplementary material [20].
Instance order. The order of the instances may introduce a bias in irace when
the configuration scenario involves a heterogeneous instance set. By default, irace
shuffles the order of the training instances. Without this shuffling, irace evalu-
ates the instances in the order provided by the user. Since the set of previously
used instances is evaluated in every iteration, elitist irace randomly permutes the
order of previously seen instances (I_old) before each iteration to further avoid
any bias that the previous order may introduce. Table A.2 compares the results
obtained by iracecap with and without this instance reshuffling. For most bench-
mark scenarios, disabling instance reshuffling produces better mean results and
fewer timed-out runs; for Regions 200 and Lingeling, these differences are sta-
tistically significant. The main exception is the Spear scenario, where reshuffling
leads to much improved results; this is probably due to the fact that this scenario
contains a very heterogeneous instance set.

These results suggest that the impact of reshuffling depends on the given
configuration scenario; we conjecture that for more heterogeneous instance sets,
reshuffling the instance set becomes increasingly important. Investigating this
conjecture in detail is an interesting direction for future work.
Confidence level of statistical test. The dominance criterion eliminates more
configurations than the statistical test. Lowering the confidence level of the sta-
tistical test should lead to an even higher elimination rate of the latter and
possibly improve the efficacy of the overall configuration process. We explored
this possibility by lowering the confidence level in iracecap from its default value
of 0.95 to 0.75. Table A.3 in the supplementary material shows the impact of
this change. The effects on the elimination of configurations can be observed
in Figure A.5 in the supplementary material. As expected, the statistical test
eliminates more configurations when setting the confidence level to 0.75. This
also results in a small increase in the overall number of configurations evaluated
and a reduction of the mean number of elite configurations (see Table A.4 in the
supplementary material). The more aggressive test slightly improved the perfor-
mance for three scenarios, yielding significantly better results for Regions 200. In
contrast, a confidence level of 0.75 results in slightly worse performance on the
Spear scenario, indicating that the eliminations performed with lower confidence
can be premature.

If we completely disable statistical testing (confidence level 1.0), the perfor-
mance of iracecap improves on Regions 100 and Regions 200, as seen in Table A.5
in the supplementary material. This suggests that the statistical test can prema-
turely eliminate configurations based on an incorrect criterion. Despite this, we
still recommend keeping the default confidence level of 0.95 as a safeguard that
may be useful for configuration scenarios with properties very different from
those we test here.
We compare the results obtained by iracecap with two other automatic configu-
rators available in the literature, ParamILS and SMAC. Both have been widely
used in the literature for running time minimisation. SMAC and ParamILS, as
well as irace, were run using default settings. We chose not to include instance
features in the configuration process and to use only fully discretised configuration
spaces; this was done to isolate as much as possible the impact of the new cap-
ping mechanism in irace, and to examine whether it would become competitive
with other configurators that already used this technique. Considering features
or non-discrete parameter spaces would introduce additional factors that are
likely to affect performance beyond the impact of capping. Nevertheless, SMAC
can also use instance features in the configuration process, which may improve
its results; therefore, the results obtained here should be considered with caution
for those scenarios where these features are available. Yet, identifying
how much of the improvement is due to instance features or due to differences
in the capping methods between SMAC and other configurators would require
a more extensive analysis that is left for future research. Additionally, SMAC
and irace can handle real-valued parameters and, as already shown for SMAC in
[12], doing so may further improve performance.

² Setting T_new to 0 may be beneficial for scenarios with a very large cut-off time,
as used by default in AClib for the CPLEX scenarios. This should help to aggres-
sively bound the running time at the start of each race, by using the running times
of the elite configurations, thus avoiding the high cost of evaluating possibly poor
configurations with a very large cut-off time.
As mentioned previously, we ran iracecap, SMAC and ParamILS using the
PAR10 evaluation on the scenarios described in Sect. 5. Table 3 shows the mean
PAR10 execution times obtained from 20 runs of the configurators. In the online
supplementary material, we present results with other penalty factors from
{1, 10, 100}. The table shows the p-values obtained from the Wilcoxon signed-
rank test comparing the performance of the two configurators with the lowest
mean PAR10 score. iracecap obtains the statistically significantly lowest mean
on the Regions 200, Corlat, and Lingeling scenarios, while SMAC obtains the
statistically significantly lowest mean on the Spear scenario. On the Regions
100 scenario, iracecap obtains the lowest mean performance value, though its
performance is not statistically different from that of ParamILS.

It is known that trajectory-based local search methods, such as ParamILS,
can exhibit high performance variability over multiple independent runs due
to search stagnation. A common practice for dealing with this situation, and
for reducing the overall wall-clock time of the configuration process by means
Table 3. Statistics over the mean PAR10 performance and percentage of timed-out
instances from 20 runs of iracecap, SMAC and ParamILS. Wilcoxon test p-values (sig-
nificance level α = 0.05). Significantly better results are shown in bold-face and best
mean values in italics.

Table 4. Statistics over the mean PAR10 performance for the best-out-of-ten runs
sampled from the 20 original runs of iracecap, SMAC and ParamILS. Wilcoxon test
p-values (α = 0.05). Significantly better results are shown in bold-face and best mean
values in italics.
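The best-out-of-ten statistics of Table 4 can be simulated from a set of observed configurator runs (an illustrative sketch; the run scores below are made-up numbers, not data from the tables):

```python
import random

def best_of_k_samples(run_scores, k=10, n_samples=1000, seed=0):
    """Estimate the distribution of 'best of k parallel runs'.

    Repeatedly draws k of the observed configurator runs (e.g. test-set
    PAR10 scores of 20 independent runs) without replacement and records
    the best (lowest) score, mimicking the common practice of running k
    configurator copies in parallel and keeping the best configuration.
    """
    rng = random.Random(seed)
    return [min(rng.sample(run_scores, k)) for _ in range(n_samples)]

# Hypothetical PAR10 scores of 20 independent configurator runs.
scores = [120.0, 95.0, 300.0, 80.0, 150.0, 110.0, 99.0, 210.0, 132.0, 88.0,
          101.0, 97.0, 250.0, 85.0, 140.0, 115.0, 93.0, 180.0, 125.0, 90.0]
best10 = best_of_k_samples(scores, k=10)
```

The resulting distribution is much less dispersed than the original one, which is why best-of-k protocols mask the stagnation-induced variability of trajectory-based methods.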
7 Conclusions

improvements in the efficacy of irace for running time minimisation, and our new
iracecap configurator reaches state-of-the-art performance on prominent configu-
ration scenarios. This considerably broadens the range of configuration scenarios
on which irace should be seen as one of the methods of choice.

In future work, it would be interesting to explore which characteristics of
a configuration scenario make it particularly amenable to different variants of
adaptive capping. Furthermore, we would like to investigate under which cir-
cumstances iracecap performs better (or worse) than other state-of-the-art con-
figurators, notably SMAC [12], ParamILS [13] and GGA++ [1]. We see this as
an important step towards automatic selection of the configurator expected to
perform best on a given scenario. This could improve the state of the art in auto-
matic algorithm configuration and further boost the appeal of the programming
by optimisation (PbO) software design paradigm [9], which crucially depends on
maximally effective configurators.
References
1. Ansótegui, C., Malitsky, Y., Samulowitz, H., Sellmann, M., Tierney, K.: Model-
based genetic algorithms for algorithm configuration. In: IJCAI 2015, pp. 733–739.
IJCAI/AAAI Press, Menlo Park (2015)
2. Ansótegui, C., Sellmann, M., Tierney, K.: A gender-based genetic algorithm for the
automatic configuration of algorithms. In: Gent, I.P. (ed.) CP 2009. LNCS, vol.
5732, pp. 142–157. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04244-7_14
3. Babić, D., Hutter, F.: Spear theorem prover. In: SAT 2008: Proceedings of the SAT
2008 Race (2008)
4. Balaprakash, P., Birattari, M., Stützle, T.: Improvement strategies for the F-Race
algorithm: sampling design and iterative refinement. In: Bartz-Beielstein, T., Blesa
Aguilera, M.J., Blum, C., Naujoks, B., Roli, A., Rudolph, G., Sampels, M. (eds.)
HM 2007. LNCS, vol. 4771, pp. 108–122. Springer, Heidelberg (2007). doi:10.1007/
978-3-540-75514-2_9
5. Biere, A.: Yet another local search solver and Lingeling and friends entering the SAT
competition 2014. In: Belov, A., et al. (eds.) Proceedings of SAT Competition 2014.
Science Series of Publications B, vol. B-2014-2, pp. 39–40. University of Helsinki
(2014)
6. Birattari, M.: The Problem of Tuning Metaheuristics as Seen from a Machine
Learning Perspective. Ph.D. thesis, Université Libre de Bruxelles, Belgium (2004)
7. Birattari, M., Yuan, Z., Balaprakash, P., Stützle, T.: F-Race and iterated F-Race: an
overview. In: Bartz-Beielstein, T., Chiarandini, M., Paquete, L., Preuss, M. (eds.)
Experimental Methods for the Analysis of Optimization Algorithms, pp. 311–336.
Springer, Heidelberg (2010). doi:10.1007/978-3-642-02538-9_13
8. Hoos, H.H.: Automated algorithm configuration and parameter tuning. In:
Hamadi, Y., Monfroy, E., Saubion, F. (eds.) Autonomous Search, pp. 37–71.
Springer, Heidelberg (2012). doi:10.1007/978-3-642-21434-9_3
1 Introduction

Let X be a Banach space with norm ‖·‖. Let E be a convex function defined
on X. The problem of convex optimization is to find an approximate solution to
the problem

E(x) → min, x ∈ X. (1)

Many problems in machine learning can be reduced to the problem (1) with
E as a loss function [1]. In many real applications it is required that the optimal
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 251–262, 2017.
https://doi.org/10.1007/978-3-319-69404-7_18
252 S.P. Sidorov and S.V. Mironov
solution x* of (1) should have a simple structure, e.g. be a finite linear combi-
nation of elements from a dictionary D in X. In other words, x* should be a
sparse element with respect to the dictionary D in X. Of course, one can sub-
stitute the requirement of sparsity by a constraint on cardinality (i.e. a limit
on the number of elements from the dictionary D used in linear combinations
constructing a solution of the problem (1)). However, in many cases
optimization problems with cardinality-type constraints are NP-complete.
For this reason, practitioners and researchers in real applications choose to use
greedy methods. By design, greedy algorithms are capable of producing sparse
solutions.

A set of elements D from the space X is called a dictionary (see, e.g. [15]) if
each element g ∈ D has norm bounded by one, ‖g‖ ≤ 1, and the closure of span
D is X, i.e. span D = X. A dictionary D is called symmetric if −g ∈ D for every
g ∈ D. In this paper we assume that the dictionary D is symmetric.
As was pointed out above, practitioners and researchers would like to find
solutions of the optimization problem (1) that are sparse with respect to the
dictionary D, i.e. they seek to solve problem (1) subject to the constraint that
the solution is a sparse (m-term) linear combination of elements of D; we refer
to this sparse-constrained problem as problem (2).

One of the obvious choices among constructive methods for finding best
m-term approximations are greedy algorithms. The design of greedy algorithms
allows us to obtain sparse solutions with respect to D. Perhaps the Frank-Wolfe
method [2], which is also known as the conditional gradient method [3], is one
of the most prominent algorithms for finding optimal solutions of constrained
convex optimization problems. Important contributions to the development of
Frank-Wolfe type algorithms can be found in [4–6]. The paper [5] provides general
primal-dual convergence results for Frank-Wolfe-type algorithms by extending
the duality concept presented in the work [4]. Recent convergence results for
greedy algorithms can be found in the works [7–14, 16–18, 20].
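As an illustration of the Frank-Wolfe/conditional gradient scheme mentioned above, the following sketch minimises a smooth convex function over the unit simplex; the objective, step size and iteration count are illustrative choices of ours, not taken from any of the cited papers:

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iter=200):
    """Minimal Frank-Wolfe (conditional gradient) sketch on the unit simplex.

    Each iteration solves the linear minimisation oracle over the simplex
    vertices (pick the coordinate with the smallest partial derivative)
    and takes a relaxation step towards that vertex, so the iterate stays
    a convex combination of few vertices, i.e. it is sparse by construction.
    """
    x = x0.copy()
    for m in range(1, n_iter + 1):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # LMO: best vertex of the simplex
        lam = 2.0 / (m + 2.0)          # standard O(1/m) step size
        x = (1 - lam) * x + lam * s    # relaxation step, stays feasible
    return x

# Example: minimise E(x) = ||x - c||^2 over the simplex, c in its interior.
c = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(lambda x: 2 * (x - c), np.array([1.0, 0.0, 0.0]))
```

Note that the iterate after m steps uses at most m + 1 dictionary elements (simplex vertices), which is exactly the sparsity property exploited by the greedy algorithms discussed here.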
This paper examines two weak relaxed greedy algorithms,
the Weak Relaxed Greedy Algorithm (WRGA(co)) and
the Weak Relaxed Greedy Algorithm with Error (WRGA(δ)),
for finding solutions of convex optimization problems, which are sparse with
respect to some dictionary, in Banach spaces. Primal convergence results for
the weak relaxed greedy algorithms were obtained in [15, 18]. In this paper,
extending the ideas of [4, 5], we apply the notion of the duality
gap to weak relaxed greedy algorithms to obtain dual convergence estimates
for sparse-constrained convex optimization problems of type (2). In contrast to
Duality Gap Analysis of Weak Relaxed Greedy Algorithms 253
The line-search step of Algorithm 1 finds the best point lying on the line
segment between the current point G_{m−1} and φ_m.

Let Ω := {x ∈ X : E(x) ≤ E(0)} and suppose that Ω is bounded. As it
turns out, the convergence analysis of greedy algorithms essentially depends on
a measure of non-linearity of the objective function E over the set Ω, which can
be described via the modulus of smoothness of the function E.

Recall that the modulus of smoothness of the function E on the bounded
set Ω can be defined as

ρ(E, u) = (1/2) sup_{x∈Ω, ‖y‖=1} |E(x + uy) + E(x − uy) − 2E(x)|, u > 0. (4)
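As a sanity check of definition (4), the modulus of smoothness can be estimated numerically. For E(x) = ‖x‖² (Euclidean norm) one computes E(x + uy) + E(x − uy) − 2E(x) = 2u²‖y‖², so ρ(E, u) = u² on any bounded set Ω. A small Monte-Carlo sketch (a helper of ours, with illustrative parameters):

```python
import numpy as np

def modulus_of_smoothness(E, u, dim=3, n_trials=2000, radius=1.0, seed=0):
    """Monte-Carlo estimate of (4):
    rho(E, u) = 1/2 * sup_{x in Omega, ||y|| = 1} |E(x+uy) + E(x-uy) - 2E(x)|,
    with Omega approximated by a box of the given radius."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_trials):
        x = rng.uniform(-radius, radius, dim)
        y = rng.normal(size=dim)
        y /= np.linalg.norm(y)            # unit vector, ||y|| = 1
        val = 0.5 * abs(E(x + u * y) + E(x - u * y) - 2 * E(x))
        best = max(best, val)
    return best

# For the Euclidean E(x) = ||x||^2 the second difference is 2 u^2 ||y||^2
# for every x and y, so the estimate equals u^2 up to rounding error.
E = lambda x: float(np.dot(x, x))
est = modulus_of_smoothness(E, u=0.5)
```

Such an E has modulus of smoothness of power type q = 2, the best case covered by the convergence theorems below.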
The paper [18] notes that the values of E may not be computable exactly in many
real application problems. Moreover, very often the exact optimal value of λ in
the problem

inf_{0≤λ≤1} E((1 − λ)G_{m−1} + λφ_m) (6)

in Step 2 of WRGA(co) cannot be found. Therefore, the paper [18] examines
the weak relaxed greedy algorithm with error (Algorithm 2), which is a slightly
Proposition 1 shows that the duality gap g(G) bounds the difference between
the current approximation E(G) and the optimal value E(x*).
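Assuming, as in [5], that the duality gap is defined by (7) as g(G) := sup_{s∈D} ⟨−E′(G), s − G⟩ (the display of (7) is not reproduced in this excerpt), the bound of Proposition 1 follows from convexity in one line:

```latex
% Convexity of E gives E(x^*) \ge E(G) + \langle E'(G),\, x^* - G \rangle, hence
E(G) - E(x^*) \;\le\; \langle -E'(G),\, x^* - G \rangle
  \;\le\; \sup_{s \in A_1(\mathcal{D})} \langle -E'(G),\, s - G \rangle
  \;=\; \sup_{s \in \mathcal{D}} \langle -E'(G),\, s - G \rangle \;=\; g(G),
% where the last equality of suprema uses that A_1(D) is the closed convex
% hull of the symmetric dictionary D, so a linear functional attains its
% supremum over A_1(D) at an element of D.
```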
The duality gap g can be computed from the derivative E′(G_{m−1}) on every
iteration of both Algorithms 1 and 2: if the linearized problem at the greedy
gradient step for the element G_{m−1} has optimal solution φ_m, then the element
φ_m provides a certificate for the current duality gap.

Such certificates of the approximation quality of the current iteration can be
used as a stopping criterion, or to verify the numerical stability of an optimizer.
Theorems 1 and 2 give upper estimates of the primal errors for WRGA(co) and
WRGA(δ), respectively. In the next subsections we obtain dual estimates
for the algorithms in terms of the duality gap g.
and

E(G_m) = inf_{0≤λ≤1} E(G_{m−1} + λ(φ_m − G_{m−1})). (10)

⟨−E′(G_{m−1}), φ_m − G_{m−1}⟩ ≥ t_m sup_{s∈D} ⟨−E′(G_{m−1}), s − G_{m−1}⟩ = t_m g(G_{m−1}). (12)
s_{m_0} / s_M = (Σ_{k=1}^{m_0} t_k) / (Σ_{k=1}^{M} t_k) ≥ m_0 p / M, or
s_{[θM]+1} / s_M ≥ ([θM] + 1) p / M ≥ θ p.
g(G_m) ≤ C_2 s_M^{1−q}, (16)

where C_2 := C_2(q, γ) := (min{1, C_1(q, γ)})^{1−q} and depends only on M, q, γ, θ.
The proof relies on the following facts:

1. t_k ≤ 1, k = 1, 2, . . .;
2. s_{[θM]+1}^{1−q} ≤ (θ p)^{1−q} s_M^{1−q} (Lemma 2);
3. since [θM] + 1 ≤ m ≤ M, we have s_{[θM]+1} ≤ s_m ≤ s_M, and consequently
s_{[θM]+1}^{1−q} ≥ s_m^{1−q} ≥ s_M^{1−q}.

E(G_{m+1}) − E* ≤ E(G_m) − E* − γ 2^{q+1} C_2 s_M^{q(1−q)} + γ 2^{q+1} (θ p)^{q(1−q)} s_M^{2(1−q)}, (19)
where E* := inf_{x∈A_1(D)} E(x). Let us write the chain of inequalities for all m_0
from [θM] + 1 to M; then

E(G_M) − E* ≤ E(G_{m_0}) − E* − (M − m_0) s_M^{1−q} ε_1
≤ C_2 s_M^{1−q} − (M(1 − θ) − 1) s_M^{1−q} ε_1 = s_M^{1−q} [C_2 − (M(1 − θ) − 1) ε_1], (20)

where

ε_1 := γ 2^{q+1} C_2 s_M^{−(q−1)^2} − γ 2^{q+1} (θ p)^{q(1−q)} s_M^{1−q}.

Let us take any θ satisfying

γ 2^{q+1} C_2 s_M^{−(q−1)^2} > C_2 / (M(1 − θ) − 1) + γ 2^{q+1} (θ p)^{q(1−q)} s_M^{1−q};

then we obtain

E(G_M) − E* < 0,

which is impossible. The smallest such value of θ leads to a better estimate
in (14). To ensure that θ is smallest, we can choose the parameter θ as follows:

θ := arg min_{0≤θ≤1} [C_2 / (M(1 − θ) − 1) + γ 2^{q+1} (θ p)^{q(1−q)} s_M^{1−q}].
Duality Gap Analysis of Weak Relaxed Greedy Algorithms 259
G_m = (1 − λ_m) G_{m−1} + λ_m φ_m.

‖G_{m−1} − φ_m‖ ≤ C_1 + 1 =: C_0.

where

ε_2 := M^{1−q} C(E, q, γ) − 2 (C_0)^q γ θ^{q(1−q)} M^{(1−q)(q−1)} − δ / M^{1−q}.

If we take any θ satisfying

M^{1−q} C(E, q, γ) > C(E, q, γ) M^{1−q} / (M(1 − θ) − 1) + 2 (C_0)^q γ θ^{q(1−q)} M^{(1−q)(q−1)} + δ / M^{1−q},

then we get E(G_m) − E* < 0, which is impossible. We are interested in obtaining
a better value of the constant in (24). The smallest θ can be attained if we
choose θ as follows:

θ := arg min_{0≤θ≤1} [C(E, q, γ) M^{1−q} / (M(1 − θ) − 1) + 2 (C_0)^q γ θ^{q(1−q)} M^{(1−q)(q−1)} + δ / M^{1−q}].
4 Conclusion

Theorems 1 and 2, cited in Sect. 2, show that the primal errors of the weak relaxed
greedy algorithms are small and depend heavily on geometric properties of the
objective function E. On the other hand, the paper [5] remarks that very often
both the optimal value E* and the constant in the modulus of smoothness of E
are unknown, and therefore estimates of the quality of the current approximation
to the optimal solution are much in demand. Following the ideas of [5], we defined
the notion of the duality gap by the equality (7). The values of the duality gap are
computed on each iteration of both WRGA(co) and WRGA(δ) at the greedy
gradient step, and they are therefore inherent upper bounds on the primal errors, i.e.
the differences between the values of the objective function at the current and optimal
points at each step. We obtained dual convergence estimates for the weak relaxed greedy
algorithms in Theorems 3 and 4.

Acknowledgments. This work was supported by the Russian Foundation for Basic
Research under Grant 16-01-00507. We thank the reviewers for their very helpful
suggestions and comments.
References
1. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach.
Learn. 8(3–4), 231–358 (2015)
2. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist.
Quart. 3, 95–110 (1956)
3. Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Comput.
Math. Math. Phys. 6(5), 1–50 (1966)
4. Clarkson, K.L.: Coresets, sparse greedy approximation, and the Frank-Wolfe algo-
rithm. ACM Trans. Algorithms 6(4), 1–30 (2010)
5. Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In:
Proceedings of the 30th International Conference on Machine Learning (ICML 2013),
pp. 427–435 (2013)
6. Freund, R.M., Grigas, P.: New analysis and results for the Frank-Wolfe method.
Math. Program. 155(1), 199–230 (2016)
7. Friedman, J.: Greedy function approximation: a gradient boosting machine. Ann.
Stat. 29(5), 1189–1232 (2001)
8. Davis, G., Mallat, S., Avellaneda, M.: Adaptive greedy approximation. Constr.
Approx. 13, 57–98 (1997)
9. Zhang, Z., Schwartz, S., Wagner, L., Miller, W.: A greedy algorithm for aligning
DNA sequences. J. Comput. Biol. 7(1–2), 203–214 (2000)
10. Huber, P.J.: Projection pursuit. Ann. Statist. 13, 435–525 (1985)
11. Jones, L.: On a conjecture of Huber concerning the convergence of projection pur-
suit regression. Ann. Statist. 15, 880–882 (1987)
12. Barron, A.R., Cohen, A., Dahmen, W., DeVore, R.A.: Approximation and learning
by greedy algorithms. Ann. Stat. 36(1), 64–94 (2008)
13. DeVore, R.A., Temlyakov, V.N.: Some remarks on greedy algorithms. Adv. Com-
put. Math. 5, 173–187 (1996)
1 Introduction

Business Rules (BR) are a "programming for non-programmers" paradigm that
is often used by large corporations to store industrial process knowledge formally.
BR replace the two most abstract concepts of programming, namely loops and
function calls, by means of an implicit outer loop and meta-variables used within
a set of easy-to-manage if-then type instructions. BR interpreters are imple-
mented by all BR management systems, e.g. [14]. BR programs are often used
by corporations to encode their policies and empirical knowledge: given some
technical input, they produce a decision, often in the form of a YES/NO output.
Corporations often require their internal processes to perform according to a
prescribed statistical behavior, which could be imposed by strategy or
by law. This required behavior is typically independent of the BR input data.
The problem is then to parametrize the BR program so that it behaves as pre-
scribed on average, while still providing meaningful YES/NO answers on given
inputs.
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 263–276, 2017.
https://doi.org/10.1007/978-3-319-69404-7_19
264 O. Wang and L. Liberti
1.1 Preliminaries

We formally represent a BR program as an ordered list of sentences of the form:

if cond(p, x) then
    x ← act(p, x)
end if

where p is a control parameter vector (with c components) which encodes a
possible tuning of the program (e.g. thresholds which can be adjusted by the
user), x ∈ X ⊆ R^d is a variable vector representing intermediate and final stages
of the computation, cond is a boolean function, and act a function with values in X.
We call such a sentence a rule, an expression cond(p, x) a condition, and an
instruction x ← act(p, x) an action; an action indicates a modification of the value
of x. We write the final value of the variable x as x_f = P(p, q), where P represents
the BR program and q is an input parameter vector representing a problem instance
and equal to the initial value of x. Although in general BR programs may have
Controlling Some Statistical Properties of Business Rules Programs 265
any type of output, we consider only integer outputs, since BR programs are
mostly used to take discrete decisions. We remark that p, x are symbolic vectors
(rather than numeric vectors) since their components are decision variables.

BR programs are executed in an external loop construct which is transparent
to the user. Without getting into the details of BR semantics, the loop executes
a single action from a BR whose condition is True at each iteration. Which BR is
executed depends on a conflict resolution strategy of varying complexity. De
Sainte-Marie et al. [23] describe typical operational semantics, including conflict
resolution strategies, for industrial BR management systems. In this paper, the
list of rules is ordered and, at each iteration, the loop executes the first BR of the
list whose condition evaluates to True. The loop only terminates once every
condition of the BRs is False. We proved in [29] that there is a universal
BR program which can simulate any Turing Machine (TM), which makes the
BR language Turing-complete.
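The execution loop just described can be sketched in a few lines of Python (an illustrative interpreter of ours; the names and structure are not those of any particular BR management system):

```python
def run_br_program(rules, q, max_iter=100):
    """Sketch of the BR execution loop described above.

    `rules` is an ordered list of (cond, act) pairs; each iteration
    executes the action of the FIRST rule whose condition holds, and
    the loop terminates once every condition is False.
    """
    x = q  # the variable is initialised with the input instance
    for _ in range(max_iter):
        for cond, act in rules:   # ordered conflict resolution
            if cond(x):
                x = act(x)
                break             # restart from the first rule
        else:
            return x              # no condition fired: terminate
    raise RuntimeError("iteration bound exceeded")

# Toy program with one parametrized rule: halve x while it is at least p.
p = 10
rules = [(lambda x: x >= p, lambda x: x // 2)]
out = run_br_program(rules, q=100)   # 100 -> 50 -> 25 -> 12 -> 6
```

The threshold `p` plays the role of the control parameter vector: changing it changes the statistical behavior of the program over a set of inputs Q.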
We consider the problem where the q ∈ Q are the past, known instances
of the BR program, and the outputs P(p, q) of those instances are divided into
N evenly sized intervals [H_0, H_1], . . . , [H_{N−1}, H_N], forming a quantized output
distribution. Denoting by ν_1(p), . . . , ν_N(p) the numbers of outputs in these categories,
we can formalize the problem as:

min_{p,x} ‖p − p_0‖_1  subject to  C(ν_1(p), . . . , ν_N(p))  (1)
{1/3, 1/3, 1/3}, but it is currently {1/4, 1/4, 1/2}. Our aim is in each case to
adjust p, e.g. by modifying the income level, so that the BR program satisfies the
bank's goal regarding automatic loan treatment. This adjustment of parameters
could be required after a change of internal or external conditions, for example.
The first scenario can be formulated as:

min_{p,x} ‖p − p_0‖_1  subject to  E_{q∈Q} P_1(p, q) ≤ g  (2)

where P_1 has an output in {0, 1}, g ∈ [0, 1] is the desired maximum percentage of 1
outputs, the q ∈ Q are the past known instances of the BR program, ‖·‖_1 is
the L1 norm, p, q must satisfy the semantics of the BR program P(p, q) when
executed within the loop of a BR interpreter, and E is the usual notation for the
expected value.
Similarly, the second scenario, where P_2 has an output in {1, . . . , N} and
the desired output distribution is as close to uniform as possible, can be for-
malized as:

min_{p,x} ‖p − p_0‖_1  subject to  |ν_s(p) − ν_t(p)| ≤ 1  for all s, t ∈ {1, . . . , N}  (3)

Note that the solution to this problem is not always a truly uniform distribu-
tion, simply because there is no guarantee that m is divisible by N. However,
it will always be as close as possible to a uniform distribution, since the con-
straint imposes that every output value is reached by either floor(m/N) or
ceil(m/N) data points. Again, we use whole numbers (of outputs in a given
interval) instead of frequencies in order to be able to employ integer decision variables.
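The quantized output distribution and the near-uniformity constraint of (3) can be sketched as follows (illustrative helpers of ours; the interval-boundary handling is a simplification):

```python
import bisect

def quantize_outputs(outputs, H):
    """Count outputs falling into the intervals [H0,H1], ..., [H_{N-1},H_N]."""
    N = len(H) - 1
    nu = [0] * N
    for v in outputs:
        # index of the interval containing v (boundary points are assigned
        # to the lower interval here; a simplifying convention of ours)
        i = min(max(bisect.bisect_left(H, v) - 1, 0), N - 1)
        nu[i] += 1
    return nu

def near_uniform(nu):
    """Constraint of problem (3): all pairwise counts differ by at most 1,
    i.e. every interval receives floor(m/N) or ceil(m/N) outputs."""
    return max(nu) - min(nu) <= 1

# m = 7 outputs over N = 3 intervals: the best achievable split is 2/2/3.
nu = quantize_outputs([1, 2, 4, 5, 7, 8, 9], H=[0, 3, 6, 9])
```

With m = 7 and N = 3, the counts can only be floor(7/3) = 2 or ceil(7/3) = 3, so a distribution such as (2, 2, 3) satisfies the constraint while (3, 1, 3) does not.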
Such problems could be solved heuristically by treating P_1 or P_2 as a black-
box, or by replacing it with a simplified model, such as e.g. a low-degree
polynomial. We approach this problem as in [30]: we model the algorithmic
dynamics of the BR by means of MIP constraints, with a view to solving those
equations with an off-the-shelf solver. That this should be possible at all in full
generality stems from the fact that Mathematical Programming (MP) is itself
Turing-complete [16].

We make a number of simplifying assumptions in order to obtain a practi-
cally useful methodology, based on solving a Mixed-Integer Linear Programming
(MILP) reformulation of these equations using a solver such as CPLEX [13]:

1. We suppose Q is small enough that solving the MILP is (relatively) compu-
tationally cheap.
2. We assume finite BR programs with a known bound (n − 1) on the number
of iterations of the loop for any input q (industrial BR programs often have
a low value of n relative to the number of rules). This in turn implies that
the values taken by x during the execution of the BR program are bounded.
We assume that M − 1 is an upper bound on the absolute values of all p, q,
and x, as well as of any other values appearing in the BR program. It serves as
a "big M" for the MP described in the rest of the paper.
3. We assume that the conditions and actions of the BR program give rise to
constraints for which an exact MILP reformulation is possible. In order to
have a linear model, each BR must thus be linear, i.e. have the form:

if L ≤ x ≤ G then
    x ← Ax + B
end if

with L, G, B ∈ R^d and A ∈ {0, 1}^{d×d}. In general, A_{h,k} may have values in R
if it is not a parameter and x_h only takes integer values.
We follow the formalism used in [30] pertaining to Business Rules (BR) programs
and their statistical behavior.

Business Rules (also known as Production Rules) are well studied as a knowl-
edge representation system [8,10,18], originating as a psychological model of
human behavior [20,21]. They have further been used to encode expert systems,
such as MYCIN [6,27], EMYCIN [6,25], OPS5 [5,11], or more recently ODM [14]
or OpenRules [22]. On the business side of things, they have been defined both
broadly and narrowly in many different ways [12,15,24]. We consider Business Rules as
a computational tool, which to the best of our knowledge has not been explored
in depth before.

Supervised Learning is also a well-studied field of Machine Learning, with
many different formulations [3,17,26,28]. A popular family of algorithms for the
classification problem uses Association Rules [1,19]. Such Rule Learning is not
to be confused with the problem treated in this article, which is more a regres-
sion problem than a classification problem. There exist many other algorithms
for Machine Learning, from simple linear regression to neural networks [2] and
support vector machines [9]. When the learner does not have as many known
output values as it has items in the training set, the problem is known as Semi-
Supervised Learning [7]. Similarly, there has been research into machine learning
when the matching of the known output values to the inputs is not certain [4].
A previous paper has started to explore the learning problem when the known
information does not match a single input [30].
In the rest of this paper, we concatenate indices so that (L_r)_k = L_{rk}, (G_r)_k =
G_{rk}, (A_r)_{h,k} = A_{rhk} and (B_r)_k = B_{rk}. We assume that rules are feasible,
i.e. ∀r ∈ R, ∀k ∈ D, L_{rk} ≤ G_{rk}. In the rest of this section, we suppose that the
dimension of p is c = 1, making p a scalar, and that p takes the place of A_{111}.
Similar sets of constraints exist for when the parameter p takes the place of
a scalar in B_r, L_r or G_r. Additional parameters correspond to additional constraints
that mirror the ones used for the first parameter.
This formalization is taken from [30], in which we have also proved that the
set of constraints described in Fig. 1 models the execution of such a BR program.
The iterations of the execution loop are indexed by i ∈ I = {1, . . . , n}, where n − 1
is the upper bound on the number of iterations; the final value of x corresponds
to iteration n. We use an auxiliary binary variable y_{ir} with the property: y_{ir} = 1
iff the rule R_r is executed at iteration i. The other auxiliary binary variables y^U_{ir}
and y^L_{ir} are used to enforce this property.
We denote by (C1), (C2), etc. the constraints related to the evolution of the execution
and by (IC1), (IC2), etc. the constraints related to the initial conditions of
the BR program:
(C1) represents the evolution of the value of the variable x
(C2) represents the property that at most one rule is executed per iteration
(C3) represents the fact that a rule whose condition is False cannot be executed
(C4)–(C6) represent the fact that only the first rule whose condition is True
can be executed
(IC1) through (IC3) represent the initial value of a
(IC4) represents the initial value of x.
The Mixed-Integer Program from Fig. 2 models the problem from Eq. 1. We
index the instances in Q with j ∈ J = {1, . . . , m}. We also limit ourselves to
solutions which result in computations that terminate in fewer than n − 1 rule
executions. As modifying the parameter means modifying the BR program, the
assumptions made regarding the finiteness of the program might not be verified
otherwise.
We note O = {1, . . . , N}, such that ∀t ∈ O, ν_t = card{j ∈ J | x^1_{n,j} ∈
[H_{t−1}, H_t]}. We enforce this definition of ν_t by using an auxiliary binary variable
s_{tj} with the property: s_{tj} = 1 iff x^1_{n,j} ∈ [H_{t−1}, H_t]. The other auxiliary binary
variables s^U_{tj} and s^L_{tj} are used to enforce this property.
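The quantized distribution can be computed directly from the final outputs. The following sketch is an illustration of the counts themselves, not of the MILP encoding; it assumes half-open bins (with the last bin closed) to avoid double counting at boundaries, whereas the paper writes the intervals as closed.

```python
def quantized_distribution(outputs, H):
    """Given the final outputs of the m instances and bin boundaries
    H_0 <= H_1 <= ... <= H_N, return nu_t = card{j : x_j in [H_{t-1}, H_t]}
    for t = 1..N, using half-open bins (last bin closed) so that each
    output is counted exactly once."""
    N = len(H) - 1
    nu = [0] * N
    for x in outputs:
        for t in range(1, N + 1):
            if H[t - 1] <= x < H[t] or (t == N and x == H[N]):
                nu[t - 1] += 1
                break
    return nu

print(quantized_distribution([0.5, 1.5, 1.7, 3.0], [0, 1, 2, 3]))  # -> [1, 2, 1]
```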
The constraints are mostly similar to the ones in Fig. 1. We simply add the
goal of minimizing the variation of the parameter value and the constraints
C(ν_1(p), . . . , ν_N(p)) from Eq. 1. The new constraints are:
(C7) represents the need for the computation to have terminated after n − 1
executions
(C8)–(C12) represent the definition of ν_1, . . . , ν_N
(IC4′) represents (IC4) with an additional index j.
That solving the MIP in Fig. 2 also solves the original Eq. 1 is a direct consequence
of the fact that the constraints in Fig. 1 simulate P(p, q). The proof is
simple, since (C8) through (C12) trivially represent the definition of ν_1, . . . , ν_N.
A similar MIP can be obtained when p has values in different parts of the BRs,
from which a more complex MILP is obtained for when p is non-scalar. However,
this formulation is still quite abstract, as it depends heavily on the form of C.
In fact, it can almost always be simplified given a particular constraint over the
quantized distribution, as we see in the rest of this section.
(C11), (C12), (C13), (C14) and (C15) represent the linearization of (C1)
from Fig. 1
(C8) represents the goal from Eq. 2, that is, a constraint over the average of
the final values of x. It replaces C(ν_1, . . . , ν_N) and all the constraints used to
define ν_t in the MIP from Fig. 2.
The MILP from Fig. 3 finds a value of p that satisfies Eq. 2. This is again
derived from the fact that Fig. 1 simulates a BR program, and from the trivial
proof that (C11), (C12), (C13), (C14) and (C15) represent the linearization
of (C1).
As before, we exhibit in Fig. 4 a MILP that solves Eq. 3. Any constraint numbered
as before fulfills the same role. The additional constraints are:
(C8′) through (C10′) represent the adaptation of (C8) through (C10) to the
relevant case of integer outputs
(C13) represents the equivalent of C from Eq. 3.
This MILP is obviously equivalent to solving Eq. 3, since it is for the most part
a straight linearization of the MIP in Fig. 2.
[Figure: two plots of solving time (s), on scales up to 5000 s and up to 600 s respectively, versus the number of control parameters c in {5, 7, 9, 11, 13, 15}.]
Fig. 6. Average solution time over non-trivial solvable P2 for varying values of c.
the size of the BR program: but this can currently be said of most MILPs. This
issue, which certainly requires more work, can possibly be tackled by pursuing
some of the following ideas: more effective BB-based or formulation-based
heuristics (also called mat-heuristics in the literature), cut generation based on
problem structure, and decomposition. The latter, specifically, looks promising,
as the structure of the BR program is, up to the extent provided by automatic
translation based on parsing trees, carried over to the resulting MILP.
Other avenues of research are in extending this statistical learning approach
in other directions, e.g. learning other moments, or given quantiles in continuous
distributions. Statistical goal learning problems are an apparently unexplored
area of ML that has eminently practical applications.
References
1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Buneman, P., Jajodia, S. (eds.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207–216. ACM, New York (1993)
2. Atiya, A.: Learning Algorithms for Neural Networks. Ph.D. thesis, California Institute of Technology, Pasadena, CA (1991)
3. Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B., Vishwanathan, S.: Predicting Structured Data (Neural Information Processing). The MIT Press, Cambridge (2007)
4. Brodley, C., Friedl, M.: Identifying mislabeled training data. J. Artif. Intell. Res. 11, 131–167 (1999)
5. Brownston, L., Farrell, R., Kant, E., Martin, N.: Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming. Addison-Wesley, Boston (1985)
6. Buchanan, B., Shortliffe, E. (eds.): Rule Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project (The Addison-Wesley Series in Artificial Intelligence). Addison-Wesley, Boston (1984)
7. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. The MIT Press, Cambridge (2010)
8. Clancey, W.: The epistemology of a rule-based expert system: a framework for explanation. Artif. Intell. 20(3), 215–251 (1983)
9. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
10. Davis, R., Buchanan, B., Shortliffe, E.: Production rules as a representation for a knowledge-based consultation program. Artif. Intell. 8(1), 15–45 (1977)
11. Forgy, C.: OPS5 User's Manual. Department of Computer Science, Carnegie-Mellon University, Pittsburgh (1981)
12. Knolmayer, G., Herbst, H.: Business rules. Wirtschaftsinformatik 35(4), 386–390 (1993)
13. IBM: ILOG CPLEX 12.2 User's Manual. IBM (2010)
14. IBM: Operational Decision Manager 8.8 (2015)
15. Kolber, A., et al.: Defining business rules - what are they really? Project Report 3, The Business Rules Group (2000)
16. Liberti, L., Marinelli, F.: Mathematical programming: Turing completeness and applications to software analysis. J. Comb. Optim. 28(1), 82–104 (2014)
17. Liu, T.Y.: Learning to rank for information retrieval. Found. Trends Inf. Retr. 3(3), 225–331 (2009)
18. Lucas, P., Gaag, L.V.D.: Principles of Expert Systems. Addison-Wesley, Boston (1991)
19. Malioutov, D.M., Varshney, K.R.: Exact rule learning via boolean compressed sensing. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning (ICML 2013). JMLR: Workshop and Conference Proceedings, vol. 28, pp. 765–773. JMLR, Brookline (2013)
20. Newell, A.: Production systems: models of control structures. In: Chase, W. (ed.) Visual Information Processing. Proceedings of the Eighth Annual Carnegie Symposium on Cognition, pp. 463–526. Academic Press, New York (1973)
21. Newell, A., Simon, H.: Human Problem Solving. Prentice-Hall, Upper Saddle River (1972)
22. OpenRules Inc.: OpenRules User Manual, Monroe (2015)
23. Paschke, A., Hallmark, G., De Sainte Marie, C.: RIF production rule dialect, 2nd edn. W3C recommendation, W3C (2013). http://www.w3.org/TR/2013/REC-rif-prd-20130205/
24. Ross, R.: Principles of the Business Rule Approach. Addison-Wesley, Boston (2003)
25. Scott, A., Bennett, J., Peairs, M.: The EMYCIN Manual. Department of Computer Science, Stanford University, Stanford (1981)
26. Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison (2009)
27. Shortliffe, E.: Computer-Based Medical Consultations: MYCIN. Elsevier, New York (1976)
28. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
29. Wang, O., Ke, C., Liberti, L., de Sainte Marie, C.: The learnability of business rules. In: International Workshop on Machine Learning, Optimization, and Big Data (MOD 2016) (2016)
30. Wang, O., Liberti, L., D'Ambrosio, C., de Sainte Marie, C., Ke, C.: Controlling the average behavior of business rules programs. In: Alferes, J.J., Bertossi, L., Governatori, G., Fodor, P., Roman, D. (eds.) RuleML 2016. LNCS, vol. 9718, pp. 83–96. Springer, Cham (2016). doi:10.1007/978-3-319-42019-6_6
GENOPT Paper
Hybridization and Discretization Techniques
to Speed Up Genetic Algorithm and Solve
GENOPT Problems
Francesco Romito
1 Introduction
One of the fundamental principles in our world is the search for an optimal state.
Technological progress and the expansion of knowledge constantly bring to light the
real issues that need to be explained in quantitative terms, solving global optimization
problems as in [13]. Very often the analytical expressions of the model are missed or
are not easily represented. These problems are known in the literature as black box.
The GENOPT [4] challenge allows contestants to evaluate the algorithms on ran-
domized functions, created through suitable generators [5], provided as a binary library,
in order to be treated just as black box problems. In accord with No Free Lunch
Theorems for Optimization [6], an algorithm could reveal a positive result for a specic
benchmark function, whereas the use of randomized functions provides a major and
faithful overview of its robustness.
In the literature there are now countless derivative-free approaches for global optimization.
For instance, a Deterministic Particle Swarm Optimization has recently been proposed
in [7] in order to better explore the search space. Another example is
given in [8], where a DIRECT-type algorithm is hybridized with a derivative-free local
minimization to globally solve optimization problems. Other well-known metaheuristics,
like Simulated Annealing [9], Tabu Search [10], Random Optimization [11],
and Ant Colony Optimization [12], are widely available in the literature and used successfully
on many real-world applications. Moreover, thorough overviews of global optimization,
with topics such as stochastic global optimization, partitioning methods, bounding
procedures, convergence studies and complexity, can be found in [13–16].
This paper focuses on one of the most popular classes of algorithms belonging to
evolutionary computation, namely Genetic Algorithms (GAs). Moving away from the
classical scheme [17, 18], the GA has been used as an internal procedure of a larger
scheme of global search and has also been successfully modified to reduce the probabilistic
component that typically characterizes it.
In particular, after the preliminary Sect. 2 about useful concepts and a look at the
complexity of the problem involved, in Sect. 3 a novel algorithmic scheme for global
optimization is presented. Ad hoc discretization techniques have been successfully
interlaced with the classical GA search operations, performing a better and wider
search on the feasible domain. Moreover, a highly efficient scheme of a GA hybridized
with local searches is described. Section 4 details the tuning process and the results
obtained during the GENOPT contest, a special session of the Learning and Intelligent
Optimization Conference (LION 11: June 19–21, 2017, Nizhny Novgorod, Russia).
Finally, in Sect. 5 a conclusive overview of the work is drawn, providing several lines
for further research.
2 Preliminary Concepts
f(x*) = min{f(x) : x ∈ D},   (1)
D = {x ∈ R^N : lb_i ≤ x_i ≤ ub_i, 1 ≤ i ≤ N}.

A point x̄ ∈ D is an acceptable approximation of the solution when

f(x̄) ≤ f(x*) + ε,  x̄ ∈ D,  ε = 10^{-5}.   (2)
The complexity of solving problem (1) with an ε-approximated solution is at least
exponential for any algorithm that works in the black-box model with a generic
non-convex function. On this subject, a useful result due to Vavasis [19], as a special
case of a theorem established by Nemirovsky and Yudin [20], is explanatory:
Theorem 1. Let F(k, p) be the class of k-times differentiable functions on D whose kth
derivative is bounded by p as follows: at any point x ∈ D and for any unit vector u,
Hybridization and Discretization Techniques 281
|d^k/dt^k f(x + tu)| ≤ p.   (3)

Let A be any minimization algorithm that works in the black-box model (evaluating
f and its derivatives). Assume that for any function f ∈ F(k, p), A is guaranteed to output
a point that satisfies inequality (2). Then there exists a function f ∈ F(k, p) such that
algorithm A will run on f for at least a number of steps given by

c (p/ε)^{n/k},   (4)

with c being a suitable positive constant.
As mentioned before, what emerges is that, even assuming bounds on the
derivatives, the complexity of solving a global minimization problem increases
exponentially with the problem's dimension.
Usually problem (1) cannot be solved by an exhaustive search algorithm in an
efficient time. The next section will introduce an approach based on the idea of
search-space reduction, to lead the search towards the most promising area.
Subsection 3.1 describes the modified Genetic Algorithm (GA),
while in Subsect. 3.2 a novel Bounding Restart (BR) technique is described. Additionally,
addressing the need to improve the convergence speed, an overall scheme with
derivative-free Local Searches (LS) is presented in Subsect. 3.3.
Fig. 1. Graphic view of the points (blue squares) generated with DiagPI in a 3D box. (Color
figure online)
Algorithm 2. DiagPI
1. for j = 1, …, N
2.–3.   [assignment statements not recovered from the original listing]
4. end for
5. for i = 1, …, Pop
6.   for j = 1, …, N
7.     [assignment statement not recovered from the original listing]
8.   end for
9. end for
Fig. 2. Graphic view of the points (red squares) generated with AxialPI in a 3D box. (Color
figure online)
The DiagPI and AxialPI routines make it possible to create, through the crossover operator during
the iterations of the GA (main generations loop), a multi-dimensional mesh with the generated
points as nodes (Fig. 3 provides an example in a 3D box with a vertex in the origin
of the coordinate axes).
The AxialPI routine distributes the points of the population along the coordinate axes of the
feasible hyperinterval (Fig. 2); the result is Algorithm 3.
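The published listing for AxialPI could not be recovered here; the following is a hypothetical reconstruction of the behaviour described above, with all details (point spacing, axis assignment) assumed rather than taken from the paper: each generated point varies one coordinate along its axis while the other coordinates sit at the centre of the box.

```python
import numpy as np

def axial_pi(lb, ub, pop):
    """Hypothetical AxialPI sketch: place `pop` initial points on the
    coordinate axes of the box [lb, ub], i.e. each point varies one
    coordinate while the others sit at the box centre."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    n = len(lb)
    centre = (lb + ub) / 2
    pts = []
    for i in range(pop):
        j = i % n                           # axis assigned to this point
        t = (i // n + 1) / (pop // n + 1)   # fraction along that axis
        p = centre.copy()
        p[j] = lb[j] + t * (ub[j] - lb[j])
        pts.append(p)
    return np.array(pts)

pts = axial_pi([0, 0, 0], [1, 1, 1], pop=6)
print(pts.shape)   # -> (6, 3)
```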
Fig. 3. Mesh of all possible points generated by the crossover (grey) through the recombination
of the initial ones (AxialPI in red, DiagPI in blue). (Color figure online)
This mesh becomes denser around the most promising areas of the
feasible domain D. In particular, after every selection phase in a GA iteration, a
single-point crossover operator has been adopted, without recombination of blocks, to
generate and place new points P^Son_i(j), i = 1, …, Pop, j = 1, …, N, in the mesh.
Equations (5) and (6) and Fig. 4 show how it works:

P^Son_1 = (P^Father_1(1), …, P^Father_1(k − 1), P^Father_2(k), …, P^Father_2(N)),   (5)
P^Son_2 = (P^Father_2(1), …, P^Father_2(k − 1), P^Father_1(k), …, P^Father_1(N)).   (6)
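Equations (5) and (6) amount to swapping the two parents' tails at a cut point k; a minimal sketch, using the paper's 1-based gene indexing:

```python
def single_point_crossover(p1, p2, k):
    """Single-point crossover per Eqs. (5)-(6): each son takes genes
    1..k-1 from one father and genes k..N from the other."""
    son1 = p1[:k - 1] + p2[k - 1:]
    son2 = p2[:k - 1] + p1[k - 1:]
    return son1, son2

s1, s2 = single_point_crossover([1, 1, 1, 1], [2, 2, 2, 2], k=3)
print(s1, s2)   # -> [1, 1, 2, 2] [2, 2, 1, 1]
```

In the GA the cut point k would be drawn at random per mating; it is passed explicitly here for reproducibility.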
To focus the search on the most promising area, the bounding step at the generic iteration k is carried out through the following
equations:

LB_k = (ub + lb)/2 − (ub − lb)/(2 · CF^expCF),   CF ∈ R, CF = const > 1, expCF ∈ N,   (7)
UB_k = (ub + lb)/2 + (ub − lb)/(2 · CF^expCF).   (8)
CF is a convergence factor that has a high impact on reducing the bounds. The
reduction is managed by increasing expCF, the exponent of CF, by one unit per iteration.
After updating the lower and upper bounds, the reduced set is centred on the best point x_k
currently known. Let Ctrasl_k be the difference between x_k and the centre of the kth
reduced hyperinterval; then the set can be recentred as follows:

Ctrasl_k = x_k − (UB_k + LB_k)/2,   (9)
LB_{k+1} = max(lb, LB_k + Ctrasl_k),   (10)
At the kth BR reduction cycle, assuming expCF = k, the reduced feasible space is:

Space_k = (ub − lb)/CF^k > δ,   δ = machine precision.   (12)
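One bounding step can be sketched as follows. The clipping of UB against the original upper bound is an assumption here (the symmetric counterpart of Eq. 10, presumably the missing Eq. 11, was not recovered from the text):

```python
import numpy as np

def bounding_step(lb, ub, x_best, cf, exp_cf):
    """One BR bounding step (Eqs. 7-10): shrink [lb, ub] symmetrically by
    the factor cf**exp_cf, then recentre the reduced box on the incumbent
    x_best, clipping against the original bounds."""
    lb, ub, x_best = map(np.asarray, (lb, ub, x_best))
    half = (ub - lb) / (2 * cf ** exp_cf)
    LB = (ub + lb) / 2 - half                 # Eq. (7)
    UB = (ub + lb) / 2 + half                 # Eq. (8)
    shift = x_best - (UB + LB) / 2            # Eq. (9)
    LB = np.maximum(lb, LB + shift)           # Eq. (10)
    UB = np.minimum(ub, UB + shift)           # assumed symmetric clipping
    return LB, UB

LB, UB = bounding_step([0.0], [8.0], x_best=[1.0], cf=2.0, exp_cf=1)
print(LB, UB)   # -> [0.] [3.]
```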
Fig. 5. Overview of a bounding step of BR. In red, P_i, the best solution currently known. (Color
figure online)
The second step of BR concerns the restart of the GA inside the reduced space.
Denoting by D_k the kth reduced hyperinterval, the overall algorithmic scheme is the
following:
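The original listing of the overall scheme was not recovered in this text; the following hedged sketch only illustrates the loop just described (search inside the current box, shrink and recentre, restart), with uniform random sampling standing in for the inner GA:

```python
import random

def gabr(f, lb, ub, n_samples=200, cf=2.0, max_cycles=30, tol=1e-9):
    """Hedged sketch of the GA + Bounding Restart loop: an inner search
    (random sampling standing in for the GA) runs inside the current box,
    which is then shrunk by cf per cycle and recentred on the incumbent,
    clipped against the original bounds."""
    lo, hi = list(lb), list(ub)
    best_x, best_f = None, float("inf")
    for _ in range(max_cycles):
        for _ in range(n_samples):                       # inner "GA" stand-in
            x = [random.uniform(a, b) for a, b in zip(lo, hi)]
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
        width = [(b - a) / cf for a, b in zip(lo, hi)]   # reduced extent
        if max(width) < tol:
            break
        # recentre the reduced box on the incumbent, clipped to [lb, ub]
        lo = [max(a0, c - w / 2) for a0, c, w in zip(lb, best_x, width)]
        hi = [min(b0, c + w / 2) for b0, c, w in zip(ub, best_x, width)]
    return best_x, best_f

random.seed(1)
x, fx = gabr(lambda x: sum(xi * xi for xi in x), [-5.0, -5.0], [5.0, 5.0])
print(round(fx, 8))
```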
Table 1. Number of the problems solved by the algorithms with and without LS
Id | Type           | Type-details         | Dim | GABRLS tasks solved | GABR tasks solved
 0 | GKLS           | Non-differentiable   | 10  |   92 |  92
 1 | GKLS           | Non-differentiable   | 30  |   59 |  40
 2 | GKLS           | Differentiable       | 10  |   73 |  74
 3 | GKLS           | Differentiable       | 30  |   45 |  27
 4 | GKLS           | Twice differentiable | 10  |   67 |  72
 5 | GKLS           | Twice differentiable | 30  |   43 |  34
 6 | High condition | Rosenbrock           | 10  |  100 |   3
 7 | High condition | Rosenbrock           | 30  |   71 |   0
 8 | High condition | Rastrigin            | 10  |  100 |  99
 9 | High condition | Rastrigin            | 30  |  100 |   0
10 | High condition | Zakharov             | 10  |  100 | 100
11 | High condition | Zakharov             | 30  |    2 |   0
12 | Composite      |                      | 10  |  100 |   7
13 | Composite      |                      | 30  |  100 |   0
14 | Composite      |                      | 10  |  100 |   3
15 | Composite      |                      | 30  |  100 |   0
16 | Composite      |                      | 10  |  100 |  69
17 | Composite      |                      | 30  |  100 |   0
Total                                            1452 |  620
integrated as LS with default parameters because of the slightly better results on several
classes of functions. In particular, FMINCON and NM-SDBOX showed equal performance
in identifying a local optimum on GKLS (classes 0, …, 5), whereas FMINCON
outperformed NM-SDBOX on all remaining classes.
The second high-level strategy was to tune the population size and the number of
generations of the GA. These parameters have an impact on the search performance and are
crucial to balance efficiency (CPU time and convergence speed) and effectiveness (fast
identification of the neighbourhood of the global optimum). Smaller values quickly lead to
lower-quality solutions, while larger values allow the identification of more promising areas but
slow down the search.
The main aim was to find the smallest values that improve efficiency
and solve the maximum number of tasks. The setting of these parameters appeared more
sensitive on the GKLS classes than on the other classes, so a mixed strategy has been
implemented. In particular, two settings have been adopted as starting points of the tuning,
to solve the GKLS functions and all other classes of functions, respectively.
Table 2 reports the selection phase of the two starting settings of population and
generations, indexed by scenarios.
Table 2. Selection of the starting setting of the GA. The best ones are highlighted.
The final amounts of Population and Generations are refined for each class of
GENOPT problems, so small changes have been made with respect to the two guideline
scenarios selected.
The other high-level parameters of the GABRLS algorithm are self-tuned or constant.
After every selection phase of the GA, carried out through the efficient and well-known
Tournament Selection [25], a Random Mutation [26] with fixed rate has been integrated,
as usual, inside the crossover operator to ensure that the probability of reaching
any point in the search space is never equal to zero. Table 3 reports the starting value
and the updating rule of the most important parameters.
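A minimal sketch of the two operators named above, in generic textbook form rather than the authors' implementation (the tournament size and mutation rate are illustrative assumptions):

```python
import random

def tournament_select(pop, fitness, k=2):
    """Pick the best of k randomly drawn individuals
    (lower fitness wins, as in minimization)."""
    contenders = random.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: fitness[i])]

def random_mutation(ind, lb, ub, rate=0.05):
    """With fixed probability `rate`, replace a gene with a uniform random
    value in [lb_j, ub_j], so every point of the box stays reachable."""
    return [random.uniform(lb[j], ub[j]) if random.random() < rate else g
            for j, g in enumerate(ind)]

pop = [[0.0], [1.0], [2.0]]
fit = [3.0, 1.0, 2.0]
print(tournament_select(pop, fit, k=3))   # k = len(pop): always the best -> [1.0]
```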
The total number of solved problems was 1605 out of 1800, almost 90%. Table 4
presents the number of solved problems for each class of functions.
5 Conclusion
In this work, a novel algorithmic scheme for global optimization is presented. Several
discretization techniques to place initial points and to bound the search space are
described. The numerical results show that the overall scheme with local searches steps
up the effectiveness in locating optimal solutions within a high precision range.
Further research will involve improving the exploratory geometry through the
use of different bounding procedures. Moreover, algorithmic schemes for
multi-objective optimization and constrained optimization will be investigated.
References
1. Serani, A., Fasano, G., Liuzzi, G., Lucidi, S., Iemma, U., Campana, E.F., Stern, F., Diez, M.: Ship hydrodynamic optimization by local hybridization of deterministic derivative-free global algorithms. Appl. Ocean Res. 59, 115–128 (2016)
2. Liuzzi, G., Lucidi, S., Piccialli, V., Sotgiu, A.: A magnetic resonance device designed via global optimization techniques. Math. Program. 101(2), 339–364 (2004)
3. Kvasov, D.E., Sergeyev, Y.D.: Deterministic approaches for solving practical black-box global optimization problems. Adv. Eng. Softw. 80, 58–66 (2015)
4. Battiti, R., Sergeyev, Y.D., Brunato, M., Kvasov, D.E.: GENOPT 2016: design of a generalization-based challenge in global optimization. In: Sergeyev, Y.D., Kvasov, D.E., Dell'Accio, F., Mukhametzhanov, M.S. (eds.) AIP Conference Proceedings, vol. 1776, no. 060005. AIP Publishing (2016)
5. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Y.D.: Algorithm 829: software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
6. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997)
7. Diez, M., Serani, A., Leotardi, C., Campana, E.F., Fasano, G., Gusso, R.: Dense orthogonal initialization for deterministic PSO: ORTHOinit+. In: Tan, Y., Shi, Y., Niu, B. (eds.) ICSI 2016. LNCS, vol. 9712, pp. 322–330. Springer, Cham (2016). doi:10.1007/978-3-319-41000-5_32
8. Di Pillo, G., Liuzzi, G., Lucidi, S., Piccialli, V., Rinaldi, F.: A DIRECT-type approach for derivative-free constrained global optimization. Comput. Optim. Appl. 65(2), 361–397 (2016)
9. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
10. Pierre, S., Houto, F.: A tabu search approach for assigning cells to switches in cellular mobile networks. Comput. Commun. 25(5), 464–477 (2002)
11. Baba, N.: Convergence of a random optimization method for constrained optimization problems. J. Optim. Theor. Appl. 33(4), 451–461 (1981)
12. Dorigo, M., Blum, C.: Ant colony optimization theory: a survey. Theoret. Comput. Sci. 344(2–3), 243–278 (2005)
13. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-Convex Constraints: Sequential and Parallel Algorithms. Nonconvex Optimization and Its Applications, vol. 45. Springer, New York (2000). doi:10.1007/978-1-4615-4677-1
14. Zhigljavsky, A., Žilinskas, A.: Stochastic Global Optimization. Springer Optimization and Its Applications. Springer, New York (2008). doi:10.1007/978-0-387-74740-8
15. Paulavičius, R., Žilinskas, J.: Simplicial Global Optimization. SpringerBriefs in Optimization. Springer, New York (2014). doi:10.1007/978-1-4614-9093-7
16. Locatelli, M., Schoen, F.: Global Optimization: Theory, Algorithms, and Applications. Society for Industrial and Applied Mathematics, Philadelphia (2013)
17. Whitley, D.: A genetic algorithm tutorial. Stat. Comput. 4(2), 65–85 (1994)
18. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Reading (1989)
19. Vavasis, S.A.: Complexity issues in global optimization: a survey. In: Horst, R., Pardalos, P.M. (eds.) Handbook of Global Optimization. Springer, Boston (1995). doi:10.1007/978-1-4615-2025-2_2
20. Nemirovskii, A., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, Chichester (1983). Translated from the Russian, with a preface by Dawson, E.R.
21. Sergeyev, Y.D., Kvasov, D.E.: Global search based on efficient diagonal partitions and a set of Lipschitz constants. SIAM J. Optim. 16(3), 910–937 (2006)
22. Liuzzi, G., Lucidi, S., Piccialli, V.: A DIRECT-based approach exploiting local minimizations for the solution of large-scale global optimization problems. Comput. Optim. Appl. 45(2), 353–375 (2010)
23. Lucidi, S., Sciandrone, M.: A derivative-free algorithm for bound constrained optimization. Comput. Optim. Appl. 21(2), 119–142 (2002)
24. https://it.mathworks.com/help/optim/ug/fminunc.html#References
25. Goldberg, D.E., Kalyanmoy, D.: A comparative analysis of selection schemes used in genetic algorithms. Found. Genetic Algorithms 1, 69–93 (1991)
26. Herrera, F., Lozano, M., Verdegay, J.L.: Tackling real-coded genetic algorithms: operators and tools for behavioural analysis. Artif. Intell. Rev. 12(4), 265–319 (1998)
Short Papers
Identification of Discontinuous Thermal Conductivity Coefficient Using Fast Automatic Differentiation
1 Introduction
The classical heat equation is often used in the description and mathematical modeling
of many thermal processes. The density of the substance, its specific thermal capacity,
and the thermal conductivity coefficient appearing in this equation are assumed to be
known functions of the coordinates and temperature. Additional boundary value conditions
make it possible to determine the dynamics of the temperature field in the
substance under examination.
However, the substance properties are not always known. It often happens that the
thermal conductivity coefficient depends only on the temperature, and this dependence
is not known. In this case, the problem arises of determining the dependence of the thermal
conductivity coefficient on the temperature based on experimental measurements of the
temperature field. This problem also arises when a complex thermal process
should be described by a simplified mathematical model. For example, in studying and
modeling heat propagation in complex porous composite materials, where the
radiation heat transfer plays a considerable role, both the convective and radiative heat
transfer must be taken into account. The thermal conductivity coefficients in this case
typically depend on the temperature. To estimate these coefficients, various models of
the medium are used. As a result, one has to deal with a complex nonlinear model that
describes the heat propagation in the composite material (see [1]). However, another
approach is possible: a simplified model is constructed in which the radiative heat
transfer is not taken into account, but its effect is modeled by an effective thermal
conductivity coefficient that is determined based on empirical data.
Determining the thermal conductivity of a substance is an important problem, and it
has been studied for a long time. This is confirmed by a large number of publications
(e.g., see [2–5]).
In [6] the inverse coefficient problems are considered in a new formulation. They are
studied theoretically, and new numerical methods for their solution are developed. In
that work the case of a continuous thermal conductivity coefficient is considered.
In this paper we consider the problem studied in [6] for the case of a discontinuous
thermal conductivity coefficient. The consideration is based on the Dirichlet problem
for the one-dimensional unsteady-state heat equation. The inverse coefficient problem
is reduced to a variational problem. A linear combination of the mean-root-square
deviations of the temperature distribution field and of the heat flux from the empirical data
on the left boundary of the domain is used as the objective functional. An algorithm for
the numerical solution of the inverse coefficient problem is proposed. It is based on the
modern approach of Fast Automatic Differentiation, which has made it possible to solve a
number of difficult optimal control problems for dynamic systems.
A layer of material of width L is considered. The temperature of this layer at the initial
time is given. It is also known how the temperature on the boundary of this layer
changes with time. The distribution of the temperature field at each moment of time is
described by the following initial boundary value (mixed) problem:

ρC ∂T(x, t)/∂t − ∂/∂x ( K(T) ∂T(x, t)/∂x ) = 0,   (x, t) ∈ Q,   (1)
T(x, 0) = w₀(x),   0 ≤ x ≤ L,   (2)
T(0, t) = w₁(t),  T(L, t) = w₂(t),   0 ≤ t ≤ H.   (3)
The density of the material ρ and its heat capacity C are known functions of the
coordinate and/or temperature.
If the dependence of the coefficient of the convective thermal conductivity K(T) on
the temperature T is known, then we can solve the mixed problem (1)–(3) to find the
distribution of the temperature T(x, t) in Q. Problem (1)–(3) will be further called the
direct problem.
If the dependence of the coefficient of the convective thermal conductivity of the
material on the temperature is not known, it is of interest to determine this dependence.
A possible statement of this problem (it is classified as an identification problem of the
model parameters) is as follows: find the dependence K(T) on T under which the
temperature field T(x, t) obtained by solving problem (1)–(3) is close to the field
Y(x, t), which itself is obtained empirically. The quantity
Φ(K(T)) = ∫₀^H ∫₀^L [T(x, t) − Y(x, t)]² μ(x, t) dx dt
        + β ∫₀^H [ K(T(0, t)) ∂T/∂x (0, t) − P(t) ]² dt   (4)

can be used as the measure of difference between these functions. Here β ≥ 0 is a given
number, μ(x, t) ≥ 0 is a given weighting function, and P(t) is the known heat flux on
the left boundary of the domain. Thus, the optimal control problem is to find the
optimal control K(T) and the corresponding optimal solution T(x, t) of problem (1)–(3)
that minimizes functional (4).
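For a grid of temperature samples, functional (4) can be approximated by simple quadrature. The following rectangle-rule sketch is illustrative only (the array names and the discretization choice are assumptions, not the authors' scheme, which is described later):

```python
import numpy as np

def objective(T, Y, mu, K_left, dTdx_left, P, dx, dt, beta):
    """Rectangle-rule discretization of functional (4): a weighted misfit of
    the temperature field plus a penalty on the left-boundary heat-flux
    mismatch. T, Y, mu are sampled on the same (x, t) grid; K_left,
    dTdx_left, P are sampled at x = 0 over time."""
    field_term = np.sum((T - Y) ** 2 * mu) * dx * dt
    flux_term = np.sum((K_left * dTdx_left - P) ** 2) * dt
    return field_term + beta * flux_term

T = np.ones((3, 4)); Y = np.zeros((3, 4)); mu = np.ones((3, 4))
print(objective(T, Y, mu, np.ones(4), np.ones(4), np.ones(4),
                dx=0.5, dt=0.25, beta=1.0))   # -> 1.5
```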
Optimal control problems similar to this one are typically solved numerically using
a descent method, which requires the gradient of functional (4) to be known. The
unknown function K(T) was approximated by a continuous piecewise linear function.
If the input data of the problem are such that the desired coefficient of thermal
conductivity is a fairly smooth function, then the identification problem
can be solved by the method proposed in [6].
This work presents an algorithm for solving the problem of identifying a discontinuous
thermal conductivity coefficient, together with some numerical results. The proposed
algorithm, as in [6], is based on the numerical solution of the problem of
minimizing the cost functional (4). The gradient descent method was used. It is well
known that it is very important for gradient methods to determine accurate values of
the gradients. For this reason, we used the efficient approach of Fast Automatic Differentiation
that enables us to determine, with machine precision, the gradient of the cost
function subject to equality constraints (see [7]).
To numerically solve the mixed problem (1)–(3), the domain Q = {0 < x < L,
0 < t ≤ H} is decomposed by the grid lines {x̃_i}, i = 0, …, I, and {t̃_j}, j = 0, …, J,
into rectangles.
298 A.F. Albu et al.
At each node (x̃_i, t̃_j) of Q, characterized by the pair of indices (i, j), all the functions are
determined by their values at the point (x̃_i, t̃_j) (e.g., T(x̃_i, t̃_j) = T_i^j). In each rectangle,
the thermal balance must be preserved.
The temperature interval [a, b] (the interval of interest) is partitioned by the points
T₀ = a, T₁, T₂, …, T_N = b into N = 2m + 1 parts (they can be of equal or of different
lengths). Each point T_n (n = 0, …, N) is assigned a number k_n = K(T_n). The function
K(T), which needs to be found, is approximated by a continuous piecewise linear
function with nodes at the points (T_n, k_n), n = 0, …, N, so that

K(T) = k_{n−1} + (k_n − k_{n−1})/(T_n − T_{n−1}) · (T − T_{n−1})   for T_{n−1} ≤ T ≤ T_n,  n = 1, …, N.
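The piecewise-linear interpolation of K(T) above can be sketched as follows (a direct transcription of the formula, with constant extrapolation outside [a, b] as an added assumption):

```python
def K(T, nodes, kvals):
    """Continuous piecewise-linear thermal conductivity: nodes
    T_0 < ... < T_N with values k_n = K(T_n), linear on each
    segment [T_{n-1}, T_n]; held constant outside [T_0, T_N]."""
    if T <= nodes[0]:
        return kvals[0]
    for n in range(1, len(nodes)):
        if T <= nodes[n]:
            w = (T - nodes[n - 1]) / (nodes[n] - nodes[n - 1])
            return kvals[n - 1] + w * (kvals[n] - kvals[n - 1])
    return kvals[-1]

print(K(0.5, [0.0, 1.0, 2.0], [1.0, 3.0, 3.0]))   # midpoint of first segment -> 2.0
```

A narrow segment between two nodes with very different k-values is exactly how the smoothed jump in the numerical example below is represented.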
Φ(K(T)) = F = Σ_{j=1}^{J} Σ_{i=1}^{I−1} (T_i^j − Y_i^j)² μ_i^j h_i s^j
  + β Σ_{j=1}^{J} [ (σ/(2h₀)) (K(T₀^j) + K(T₁^j)) (T₁^j − T₀^j)
  + ((1 − σ)/(2h₀)) (K(T₀^{j−1}) + K(T₁^{j−1})) (T₁^{j−1} − T₀^{j−1})
  − (ρ₀ C₀ h₀/(2 s^j)) (T₀^j − T₀^{j−1}) − P^j ]² s^j.
To illustrate the efficiency of the proposed algorithm, the variational problem (1)–(4)
was considered with the following parameters:

L = 1,  H = 1,  ρ(x) = C(x) = 1,
w₀(x) = 2x,  w₁ = 0,  w₂ = 2,
μ(x, t) = 1,  β = 0,  a = 0,  b = 2.
It should be noted that in this example the thermal conductivity coefficient being identified was a continuous function, although it contained a narrow domain smoothing the jump (its width is $2/N$).
The numerous numerical results showed that the proposed algorithm is efficient and stable and allows us to restore the thermal conductivity with high accuracy.
Acknowledgments. This work was supported by the Russian Foundation for Basic Research
(project no. 17-07-00493a).
References
1. Alifanov, O.M., Cherepanov, V.V.: Mathematical simulation of high-porosity fibrous materials and determination of their physical properties. High Temp. 47, 438–447 (2009)
2. Kozdoba, L.A., Krukovskii, P.G.: Methods for Solving Inverse Thermal Transfer Problems. Naukova Dumka, Kiev (1982). [in Russian]
3. Alifanov, O.M.: Inverse Problems of Heat Transfer. Mashinostroenie, Moscow (1988). [in Russian]
4. Marchuk, G.I.: Adjoint Equation and the Analysis of Complex Systems. Nauka, Moscow (1992). [in Russian]
5. Samarskii, A.A., Vabishchevich, P.N.: Computational Heat Transfer. Editorial URSS, Moscow (2003). [in Russian]
6. Zubov, V.I.: Application of fast automatic differentiation for solving the inverse coefficient problem for the heat equation. Comput. Math. Math. Phys. 56(10), 1743–1757 (2016)
7. Evtushenko, Y.G.: Computation of exact gradients in distributed dynamic systems. Optim. Methods Softw. 9, 45–75 (1998)
Comparing Two Approaches for Solving
Constrained Global Optimization Problems
1 Introduction
In the present paper, the methods for solving the global optimization problems
with non-convex constraints
are considered. The objective function as well as the constraint ones are supposed
to satisfy Lipschitz condition with Lipschitz constants unknown a priori. The
analytical formulae of the problem functions may be unknown, i.e. these ones
may be dened by an algorithm for computing the function values in the search
domain (so called black-box-functions). Moreover, it is suggested that even a
single computing of a problem function value may be a time-consuming operation
since in the applied problems it is related to the necessity of numerical modeling
(see, for example, [14]).
The penalty function method is one of the most popular numerical methods for solving problems of this kind. The idea of the method is simple and quite universal; therefore, the method has found wide application in solving various practical problems. A detailed description of the method can be found, for example, in [5].
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 301–306, 2017.
https://doi.org/10.1007/978-3-319-69404-7_22
302 K. Barkalov and I. Lebedev
2 Index Method
A novel approach to the minimization of multiextremal functions with non-convex constraints (called the index method of accounting for the constraints) has been developed in [6–9]. The approach is based on a separate accounting for each constraint of the problem and is not related to the use of penalty functions. Moreover, employing the continuous single-valued Peano curve $y(x)$ (evolvent) mapping the unit interval $[0, 1]$ of the $x$-axis onto the $N$-dimensional domain (2), it is possible to find the minimum in (1) by solving a one-dimensional problem presented in [6]. The algorithm is very flexible and allows an efficient parallelization for shared memory as well as for distributed memory systems and for accelerators [10–14].
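The core bookkeeping behind the separate accounting for constraints can be illustrated with a small Python sketch (ours, not the full index algorithm of [6–9]): each point is assigned the index of the first violated constraint, and only that single function value is computed, so later constraints and the objective are never evaluated at infeasible points:

```python
def index_and_value(y, constraints, objective):
    """Index scheme illustration: check the constraints one by one and stop
    at the first violated one.  Returns (nu, z), where nu is the 1-based
    index of the first violated constraint and z its value, or
    (m + 1, objective(y)) when y is feasible.  Skipping the remaining
    functions saves expensive black-box evaluations."""
    for j, g in enumerate(constraints, start=1):
        z = g(y)
        if z > 0:            # constraint g_j(y) <= 0 is violated
            return j, z
    return len(constraints) + 1, objective(y)
```

A point with a larger index is treated as "better" than any point with a smaller index, which is how the one-dimensional search orders its trials.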
3 Results of Experiments
Let us compare solving constrained global optimization problems using the penalty function method (PM) and the index method (IM). The penalty function was taken in the form
$$G(y) = \max_{1 \le j \le m}\left\{0,\; g_j(y)\right\}^2,$$
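For reference, the penalty approach reduces the constrained problem to an unconstrained one by minimizing $F(y) = \varphi(y) + P\,G(y)$ with $G$ as above. A minimal Python sketch (the helper name and the fixed penalty coefficient `P` are illustrative assumptions; in practice `P` must be tuned):

```python
def penalized(objective, constraints, P):
    """Penalty function method sketch: build
    F(y) = phi(y) + P * G(y), where
    G(y) = max_{1<=j<=m} {0, g_j(y)}**2 vanishes on the feasible set.
    P > 0 is the penalty coefficient the user has to choose."""
    def F(y):
        G = max(0.0, *(g(y) for g in constraints)) ** 2
        return objective(y) + P * G
    return F
```

Note that every evaluation of `F` computes all constraint values, which is exactly the overhead the index method avoids.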
Table 1. Averaged numbers of iterations (penalty function method).

      N = 2          N = 3            N = 4                 N = 5
      Simple  Hard   Simple  Hard     Simple     Hard       Simple      Hard
0.2   1639    1507   56149   60651    160198(5)  119764(2)  305957(1)   488587(14)
0.4   1534    1967   67145   74860    158547     127832     396480(4)   434328(2)
0.6   1201    1514   92854   101240   138550     143818     373617(8)   561447(9)
0.8   1277    1287   108121  148479   130372     145592     488791(12)  646538(24)
Table 2. Averaged numbers of iterations (index method).

      N = 2          N = 3            N = 4                 N = 5
      Simple  Hard   Simple  Hard     Simple     Hard       Simple      Hard
0.2   447     911    14719   20120    59680      66551(1)   391325(2)   188797(12)
0.4   465     1800   11951   17427    71248      86899      339327(1)   151998(3)
0.6   403     1988   7366    12853    58451      92007      316648      179013(4)
0.8   371     4292   4646    8702     33621      54405      309844      124952
Table 3. Numbers of computations of the constraint and objective functions (k1, k2, k3) for the four-dimensional problems.

      Simple                  Hard
      k1      k2      k3      k1      k2      k3
0.2   59680   20445   4401    66551   24210   6316
0.4   71248   28527   6784    86899   39682   12615
0.6   58451   31508   9505    92007   52560   19853
0.8   33621   21411   10446   54405   36838   22202
The average values presented demonstrate that solving the specified problems with the index method requires fewer iterations than with the penalty function method. At the same time, the separate accounting for the constraints in the index method also results in fewer computations of the problem function values. The numbers of computations of the values of the constraint functions $g_1(y)$, $g_2(y)$ and of the objective function $\varphi(y)$ ($k_1$, $k_2$, and $k_3$, respectively) are presented in Table 3 for the four-dimensional problems.
4 Conclusion
Concluding, let us note that the index method for solving constrained global optimization problems considered in the present work:
– is based on the global search algorithm, which is not inferior in convergence speed to other well-known algorithms;
– allows solving the initial problem directly, without the use of penalty functions (thus, the issues of selecting the penalty coefficient and of solving a series of unconstrained problems with different penalty coefficients are eliminated);
– allows solving problems in which the values of the problem functions are not defined everywhere (for example, the objective function values are undefined outside the feasible domain);
– speeds up the process of solving constrained optimization problems (due to an essential reduction of the total number of computations of the problem function values).
The last statement has been supported by numerically solving several hundred test problems.
References
1. Famularo, D., Pugliese, P., Sergeyev, Y.D.: A global optimization technique for checking parametric robustness. Automatica 35, 1605–1611 (1999)
2. Kvasov, D.E., Menniti, D., Pinnarelli, A., Sergeyev, Y.D., Sorrentino, N.: Tuning fuzzy power-system stabilizers in multi-machine systems by global optimization algorithms based on efficient domain partitions. Electr. Power Syst. Res. 78(7), 1217–1229 (2008)
3. Kvasov, D.E., Sergeyev, Y.D.: Deterministic approaches for solving practical black-box global optimization problems. Adv. Eng. Softw. 80, 58–66 (2015)
4. Modorskii, V.Y., Gaynutdinova, D.F., Gergel, V.P., Barkalov, K.A.: Optimization in design of scientific products for purposes of cavitation problems. In: Simos, T.E. (ed.) ICNAAM 2015. AIP Conference Proceedings, vol. 1738 (2016). Article No. 400013
5. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 2nd edn. Wiley, New York (1993)
6. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-convex Constraints: Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
7. Sergeyev, Y.D., Famularo, D., Pugliese, P.: Index branch-and-bound algorithm for Lipschitz univariate global optimization with multiextremal constraints. J. Glob. Optim. 21(3), 317–341 (2001)
8. Barkalov, K.A., Strongin, R.G.: A global optimization technique with an adaptive order of checking for constraints. Comput. Math. Math. Phys. 42(9), 1289–1300 (2002)
9. Strongin, R.G., Sergeyev, Y.D.: Global optimization: fractal approach and non-redundant parallelism. J. Glob. Optim. 27(1), 25–50 (2003)
10. Barkalov, K., Ryabov, V., Sidorov, S.: Parallel scalable algorithms with mixed local-global strategy for global optimization problems. In: Hsu, C.-H., Malyshkin, V. (eds.) MTPP 2010. LNCS, vol. 6083, pp. 232–240. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14822-4_26
11. Barkalov, K.A., Gergel, V.P.: Multilevel scheme of dimensionality reduction for parallel global search algorithms. In: Proceedings of the 1st International Conference on Engineering and Applied Sciences Optimization - OPT-i 2014, pp. 2111–2124 (2014)
12. Barkalov, K., Gergel, V., Lebedev, I.: Use of Xeon Phi coprocessor for solving global optimization problems. In: Malyshkin, V. (ed.) PaCT 2015. LNCS, vol. 9251, pp. 307–318. Springer, Cham (2015). doi:10.1007/978-3-319-21909-7_31
13. Barkalov, K., Gergel, V.: Parallel global optimization on GPU. J. Glob. Optim. 66(1), 3–20 (2016)
14. Barkalov, K., Gergel, V., Lebedev, I.: Solving global optimization problems on GPU cluster. In: Simos, T.E. (ed.) ICNAAM 2015. AIP Conference Proceedings, vol. 1738 (2016). Article No. 400006
15. Gergel, V., Grishagin, V., Gergel, A.: Adaptive nested optimization scheme for multidimensional global search. J. Glob. Optim. 66(1), 35–51 (2016)
16. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Y.D.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
17. Sergeyev, Y.D., Kvasov, D.E.: Global search based on efficient diagonal partitions and a set of Lipschitz constants. SIAM J. Optim. 16(3), 910–937 (2006)
18. Paulavicius, R., Sergeyev, Y., Kvasov, D., Zilinskas, J.: Globally-biased DISIMPL algorithm for expensive global optimization. J. Glob. Optim. 59(2–3), 545–567 (2014)
19. Sergeyev, Y.D., Kvasov, D.E.: A deterministic global optimization using smooth diagonal auxiliary functions. Commun. Nonlinear Sci. Numer. Simul. 21(1–3), 99–111 (2015)
20. Gergel, V.: An approach for generating test problems of constrained global optimization. In: Battiti, R., Kvasov, D., Sergeyev, Y. (eds.) LION 2017. LNCS, vol. 10556, pp. 314–319. Springer, Cham (2017). doi:10.1007/978-3-319-69404-7_24
Towards a Universal Modeller
of Chaotic Systems
Erik Berglund(B)
1 Introduction
The class of dynamical systems that are non-linear and highly sensitive to initial conditions (commonly called chaotic) appears in many places in nature, and it has even been suggested that chaos plays a part in biological intelligence [6] and perception [7]. Since deterministic systems can be chaotic (indeed, this is where the phenomenon was first discovered), chaos is of interest to computer scientists.
Chaotic systems have been a subject of study in the machine learning community. Much, if not most, of this effort has been aimed at predicting the time evolution of chaotic systems based on starting conditions and a manually constructed model of the chaotic system. This prediction is, by the very nature of chaotic systems, extremely difficult, and the predictive value of the model quickly diminishes as the prediction time period increases.
Instead, this paper presents a machine learning system that, by being trained on the output of a dynamical system, can replicate some of the fundamental properties of that system without necessarily attempting to predict its evolution, thus creating a tool that can model chaotic behaviour without knowing the underlying rules of the chaotic system.
The algorithm is trained on four different deterministic dynamical processes, of which one is chaotic and the others are not, and the chaotic properties of the system are measured.
The rest of this document is laid out as follows: Sect. 2 discusses previous work and gives background information, Sect. 3 gives details of the machine learning system, Sect. 4 describes the experimental setup, and Sect. 5 lists the results. Section 6 contains concluding remarks and discussion.
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 307–313, 2017.
https://doi.org/10.1007/978-3-319-69404-7_23
308 E. Berglund
2 Previous Work
Chaotic time series and chaos in neural networks have been studied by several researchers. One prominent example is CSA [4], which has been shown to be able to solve combinatorial tasks like the Travelling Salesperson Problem (TSP) efficiently. Unlike in the present work, the chaos is inherent in the network model, not learned. CSA has inspired other approaches, for example [13], which also contains a good review of similar methods.
Time-series processing with SOM-type algorithms has been studied before, with the aim of predicting time series [3,8].
Another avenue of research is the role of chaotic neural networks in memory formation and retrieval [5].
The nomenclature of SOM algorithms with recurrent connections is confusing. In other ML algorithms, recurrence usually means that the output of the algorithm is fed back as part of the input. The Recurrent SOM [11], on the other hand, is in essence a SOM with leaky integrators on the inputs. When recurrent connections in SOM algorithms were first investigated [12], the term recursive was used instead.
Self-Organising Maps are a form of unsupervised learning, where a set of nodes indexed by $i \in [1, n]$ are each associated with a weight vector $w_i$. Training is iterative and proceeds as a discrete set of steps in which training data is presented to the network and the weight vectors are then updated, until some predetermined condition is met.
In the standard SOM the magnitude of the weight update depends on the iteration number or time. For the PLSOM [2] and PLSOM2 [1] algorithms, on the other hand, the weight update is a function of what can be inferred about the state of the network.
3 Learning Algorithm
Recursion in SOM-type neural networks works the same way as recurrence in other neural network algorithms: the output at time $t - 1$ forms part of the input at time $t$. The particular method used here is adopted from [12].
The output is defined as a vector of length $n$, where $n$ is the number of nodes, that corresponds to the excitation of a given node:
$$k_i = e^{-\left[\alpha\,\|x(t-1)-w_{x,i}\| + (1-\alpha)\,\|y(t-1)-w_{y,i}\|\right]} \qquad (1)$$
In (1), $\alpha$ is a tuning parameter that determines how much influence the new input has relative to the recurrent connections, $k$ is a vector of length $n$, $e$ is Euler's constant, $y$ is the output vector, $w_y$ is the part of the weight vector that corresponds to the output (that is, the recurrent weights), and $w_x$ is the part of the weight vector that corresponds to the input. Following the calculation of $k$, the updated value of $y$ is found by scaling and translating $k$ so that each element lies in the range $[0, 1]$ according to (2).
$$y_i(t) = \frac{k_i - \min(k)}{\max(k) - \min(k)} \qquad (2)$$
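One excitation/update step following (1)–(2) can be sketched in Python as below; the function name, list-based vectors, Euclidean norms, and the guard against a constant $k$ vector are our assumptions, not details from the paper:

```python
import math

def rsom_step(x, y_prev, weights_x, weights_y, alpha):
    """One recursive-SOM excitation step in the spirit of (1)-(2):
    k_i = exp(-[alpha*||x(t-1)-w_x,i|| + (1-alpha)*||y(t-1)-w_y,i||]),
    then y is k rescaled to span [0, 1].  x is the current input vector,
    y_prev the previous output vector."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    k = [math.exp(-(alpha * dist(x, wx) + (1 - alpha) * dist(y_prev, wy)))
         for wx, wy in zip(weights_x, weights_y)]
    k_min, k_max = min(k), max(k)
    span = (k_max - k_min) or 1.0   # guard: all nodes equally excited
    y = [(ki - k_min) / span for ki in k]
    return k, y
```

By construction the best-matching node always gets output 1 and the worst 0, so the recurrent input stays in a fixed range regardless of how the weights evolve.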
4 Experimental Setup
The following time series are used as training input:
1. A sine wave with wavelength 200.
2. The sum of two sine waves, with wavelengths 200 and 61.5.
3. A simulated Mackey-Glass [9] sequence, consisting of 511 sine waves with different wavelengths and amplitudes added together to resemble the frequency response of the real Mackey-Glass sequence.
4. A chaotic Mackey-Glass series.
All time series were scaled and translated to span the interval [0, 1]. The Fourier coefficients of the simulated Mackey-Glass series are indistinguishable from those of the real Mackey-Glass series, as the simulated series was created through an inverse Fourier transform of the real series.
The Mackey-Glass time series is given by (4):
$$\frac{dx}{dt} = a\,\frac{x_\tau}{1 + x_\tau^{n}} - b\,x \qquad (4)$$
Here $x_\tau$ represents the value of the variable at time $t - \tau$. In the present work the following values are used: $\tau = 17$, $a = 0.2$, $b = 0.1$, and $n = 10$.
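A forward-Euler sketch of generating such a series in Python (the step size `dt`, the initial value `x0`, and the constant initial history are illustrative choices of ours, not taken from the paper):

```python
from collections import deque

def mackey_glass(n_samples, tau=17, a=0.2, b=0.1, n=10, dt=1.0, x0=1.2):
    """Forward-Euler integration of the Mackey-Glass delay equation (4):
    dx/dt = a * x_tau / (1 + x_tau**n) - b * x,
    where x_tau is the value of x at time t - tau."""
    delay_len = int(tau / dt)
    history = deque([x0] * delay_len, maxlen=delay_len)  # stores delayed values
    x, out = x0, []
    for _ in range(n_samples):
        x_tau = history[0]                # oldest stored value = x(t - tau)
        x += dt * (a * x_tau / (1.0 + x_tau ** n) - b * x)
        history.append(x)                 # maxlen drops the oldest entry
        out.append(x)
    return out
```

The generated samples would then be rescaled to span [0, 1] as described above before being fed to the map.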
Before each test, the RPLSOM2 weights were initialised to a random state and the map was trained with 50000 samples from one of the time series. The map has 100 nodes arranged in a 10 × 10 grid. The generalisation factor was set to 17, and $\alpha$ was set to 0.6. Each experiment was repeated 1000 times for each time series.
4.1 Repetition
One characteristic of chaotic systems is that they have very long repetition periods. Therefore, the repetition period for the input sequences was measured. The repetition period is defined as the number of samples one must draw from the series before the last 100 samples are repeated anywhere in any of the previous samples.
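This definition can be checked mechanically. The following Python sketch (our own; the floating-point tolerance and the direct, quadratic search are illustrative simplifications) draws samples until the last `window` values match an earlier contiguous stretch of the sequence:

```python
def repetition_period(stream, window=100, max_samples=200000, tol=1e-9):
    """Return the number of samples drawn from `stream` (an iterator of
    floats) when the last `window` samples first reappear anywhere
    earlier in the sequence; None if no repetition is found within
    max_samples.  Direct O(n^2 * window) check, for illustration only."""
    seen = []
    for count, v in enumerate(stream, start=1):
        seen.append(v)
        if count > max_samples:
            return None
        if count >= 2 * window:
            tail = seen[-window:]
            # earlier occurrence must end before the tail starts
            for start in range(count - 2 * window + 1):
                if all(abs(a - b) <= tol
                       for a, b in zip(seen[start:start + window], tail)):
                    return count
    return None
```

For a pure sine wave of wavelength 200 the first such match occurs after 300 samples (the tail at positions 200-299 equals positions 0-99), while a chaotic series keeps producing new windows for much longer.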
After training, the map was put into idle mode for 200000 iterations. The weight vector of the winning node then describes a one-dimensional time series, which was checked for repetition.
where x(t) is the sample drawn from the time series at time t. This results in a
trajectory of vectors in 3-space.
The fractal dimension was estimated using the information dimension, based on 15000 samples drawn from the trained RPLSOM2 in idle mode. Any output sequence with fewer than 100 unique points was discarded, since this gives the information dimension computation too little data to produce a meaningful result. Few unique points indicate a stable orbit or a stable point, which would imply a non-fractal dimension. The numbers of discarded output sequences for each input time series are given in Table 1.
Table 1. Number of output sequences discarded because of too few unique points.
The Lyapunov exponent was calculated using the excitation vector from (2) of the RPLSOM2 and the numerical algorithm described in [10]. The perturbation value used was $d_0 = 10^{-12}$, and the algorithm was run in idle mode for 300 iterations before 15000 iterations were sampled and averaged to compute the Lyapunov exponent estimate.
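A sketch of such a two-trajectory estimate in Python, in the spirit of the numerical algorithm in [10]; the `step` interface, the collision guard, and the test maps are our assumptions, with the parameter defaults mirroring the values quoted above:

```python
import math

def lyapunov_estimate(step, x0, d0=1e-12, n_transient=300, n_iter=15000):
    """Largest-Lyapunov-exponent estimate: iterate a reference and a
    perturbed trajectory, accumulate log(d1/d0) after every step, and
    renormalize the separation back to d0.  `step` maps a state to the
    next state; a positive result indicates sensitive dependence."""
    x = x0
    for _ in range(n_transient):           # let transients die out
        x = step(x)
    y, total, used = x + d0, 0.0, 0
    for _ in range(n_iter):
        x, y = step(x), step(y)
        d1 = abs(y - x)
        if d1 == 0.0:                      # trajectories collided: re-perturb
            y = x + d0
            continue
        total += math.log(d1 / d0)
        used += 1
        y = x + d0 * (y - x) / d1          # rescale separation to d0
    return total / used                    # mean exponent per iteration
```

Applied to the logistic map at $r = 4$ this yields an estimate near $\ln 2 \approx 0.693$, and a negative value for any contracting map, which matches the sign convention used in Tables 2 and 3.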
5 Results
As can be seen from Table 2, the mean Lyapunov exponent for the map trained
with a chaotic time series is clearly less negative. This becomes even clearer from
Table 3, which shows the percentage of maps with positive Lyapunov exponents.
The connection between the chaos of the input sequence and the behaviour
of the map is also evident in the number of iterations before repeat, see Table 4.
6 Conclusion
It was observed that RPLSOM2 networks trained with a chaotic time series exhibit, to a significant degree, the following characteristics:
– Longer repetition periods.
– A higher Lyapunov exponent.
– A higher probability of having a positive Lyapunov exponent.
– A higher fractal dimension.
when compared to networks that have been trained on non-chaotic but otherwise similar time sequences. This is consistent with chaotic behaviour.
This is the first instance in which the chaotic behaviour of a network's output after training depends on its training input.
References
1. Berglund, E.: Improved PLSOM algorithm. Appl. Intell. 32(1), 122–130 (2010)
2. Berglund, E., Sitte, J.: The parameterless self-organizing map algorithm. IEEE Trans. Neural Netw. 17(2), 305–316 (2006)
3. Chappell, G.J., Taylor, J.G.: The temporal Kohonen map. Neural Netw. 6(3), 441–445 (1993)
4. Chen, L., Aihara, K.: Chaotic simulated annealing by a neural-network model with transient chaos. Neural Netw. 8(6), 915–930 (1995)
5. Crook, N., Scheper, T.O.: A novel chaotic neural network architecture. In: ESANN 2001 Proceedings, pp. 295–300, April 2001
6. Freeman, W.J.: Chaos in the brain: possible roles in biological intelligence. Int. J. Intell. Syst. 10(1), 71–88 (1995)
7. Freeman, W.J., Barrie, J.M.: Chaotic oscillations and the genesis of meaning in cerebral cortex. In: Buzsaki, G., Llinas, R., Singer, W., Berthoz, A., Christen, Y. (eds.) Temporal Coding in the Brain. NEUROSCIENCE. Springer, Heidelberg (1994). doi:10.1007/978-3-642-85148-3_2
8. Koskela, T., Varsta, M., Heikkonen, J., Kaski, K.: Recurrent SOM with local linear models in time series prediction. In: 6th European Symposium on Artificial Neural Networks, pp. 167–172. D-facto Publications (1998)
9. Mackey, M.C., Glass, L.: Oscillation and chaos in physiological control systems. Science 197, 287 (1977)
10. Sprott, J.C.: Chaos and Time-Series Analysis. Oxford University Press, Oxford (2003)
11. Varstal, M., Millan, J.R., Heikkonen, J.: A recurrent self-organizing map for temporal sequence processing. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) ICANN 1997. LNCS, vol. 1327, pp. 421–426. Springer, Heidelberg (1997). doi:10.1007/BFb0020191
12. Voegtlin, T.: Recursive self-organizing maps. Neural Netw. 15(8–9), 979–991 (2002)
13. Wang, L., Li, S., Tian, F., Fu, X.: A noisy chaotic neural network for solving combinatorial optimization problems: stochastic chaotic simulated annealing. IEEE Trans. Syst. Man Cybern. Part B 34(5), 2119–2125 (2004)
An Approach for Generating Test Problems
of Constrained Global Optimization
Victor Gergel(B)
1 Introduction
In the present paper, methods for generating global optimization test problems with non-convex constraints are considered. The problem functions are supposed to satisfy the Lipschitz condition
$$|g_i(y') - g_i(y'')| \le L_i\,\|y' - y''\|, \quad y', y'' \in D, \; 1 \le i \le m + 1,$$
with the Lipschitz constants $L_i$ unknown a priori. The analytical formulae of the problem functions may be unavailable, i.e., they may be defined only by an algorithm for computing the function values in the search domain (so-called black-box functions). It is supposed that even a single computation of a problem function value may be a time-consuming operation, since it is related to the necessity of numerical modeling in the applied problems (see, for example, [1–4]).
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 314–319, 2017.
https://doi.org/10.1007/978-3-319-69404-7_24
An Approach for Generating Test Problems 315
The evaluation of the efficiency of developed methods is one of the key problems in optimization theory and applications. Unfortunately, it is difficult to obtain theoretical estimates in many cases. As a result, in most cases the methods are compared by carrying out computational experiments on solving sets of test optimization problems. In order to obtain a reliable evaluation of the efficiency of the methods, the sets of test problems should be diverse and representative enough. The problem of the choice of test problems has been considered in many works (see, for example, [5–8]). Unfortunately, in many cases the proposed sets contain a small number of test problems, and it is difficult to obtain problems with the desired properties. The most important drawback is that, as a rule, constraints are absent in the proposed test problems (or the constraints are relatively simple: linear, convex, etc.).
A novel approach is proposed for generating any number of global optimization problems with non-convex constraints, for performing multiple computational experiments in order to obtain a reliable evaluation of the efficiency of the developed optimization algorithms. When generating the test problems, the necessary number of constraints and the desired fraction of the feasible domain relative to the whole search domain can be specified. In addition, the locations of the global minimizers in the generated problems are known a priori, which essentially simplifies the evaluation of the results of the computational experiments.
where $g_{ij}(y) = \sin(\pi i y_1)\sin(\pi j y_2)$, $h_{ij}(y) = \cos(\pi i y_1)\cos(\pi j y_2)$, $y = (y_1, y_2) \in \mathbb{R}^2$, $0 \le y_1, y_2 \le 1$, and the coefficients $A_{ij}, B_{ij}, C_{ij}, D_{ij}$ are taken uniformly from the interval $[-1, 1]$.
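One common form of this test-function class (the Grishagin type [9]) combines the two trigonometric sums under a square root. The following Python sketch assumes that form; the function name, the factor $\pi$, and the default number of harmonics `K` are our choices for illustration:

```python
import math, random

def random_trig_function(K=7, seed=0):
    """Random multiextremal test function of the Grishagin type on
    [0, 1]^2: coefficients A, B, C, D are drawn uniformly from [-1, 1],
    and f(y) = -sqrt(u^2 + v^2) with two trigonometric double sums."""
    rng = random.Random(seed)
    def coeffs():
        return {(i, j): rng.uniform(-1.0, 1.0)
                for i in range(1, K + 1) for j in range(1, K + 1)}
    A, B, C, D = coeffs(), coeffs(), coeffs(), coeffs()

    def f(y1, y2):
        u = v = 0.0
        for i in range(1, K + 1):
            for j in range(1, K + 1):
                g = math.sin(math.pi * i * y1) * math.sin(math.pi * j * y2)
                h = math.cos(math.pi * i * y1) * math.cos(math.pi * j * y2)
                u += A[i, j] * g + B[i, j] * h
                v += C[i, j] * g - D[i, j] * h
        return -math.sqrt(u * u + v * v)
    return f
```

Fixing the seed makes each generated function reproducible, which is exactly what a generator of test problems needs for repeated experiments.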
Let us consider a scheme for constructing the generator GCGen (Global Constrained optimization problem Generator), which allows one to generate test global optimization problems with $m$ constraints. Obviously, one can generate $m + 1$ functions, of which the first $m$ can be considered as the constraints and the $(m+1)$-th function as the objective one. However, in this case the conditional global minimizer of the objective function is unknown, and its preliminary estimation (for example, by scanning over a uniform grid) will be time-consuming. At the same time, one could not control the size of the feasible domain. In particular, the constraints might be incompatible, and the feasible domain might be empty.
Below, rules are proposed which allow formulating constrained global optimization problems such that:
– one can control the size of the feasible domain with respect to the whole domain of parameter variation;
– the global minimizer of the objective function, taking the constraints into account, is known a priori;
– the global minimizer of the objective function without accounting for the constraints lies outside the feasible domain (with the purpose of simulating the behavior of the constraints and the objective function in applied constrained optimization problems).
The rules defining the operation of the generator of constrained global optimization problems with the properties listed above consist in the following.
1. Let us generate $m + 1$ functions $f_j(y)$, $y \in D$, $1 \le j \le m + 1$, by some generating scheme (for instance, by using formula (3)). The constraints will be constructed on the basis of the first $m$ functions; the $(m+1)$-th function will serve for the construction of the objective function.
2. In order to know the global minimizer of the constrained problem a priori, let us make it coincide with the global minimizer of the unconstrained problem. To do so, let us perform a linear transformation of coordinates so that the global minimizers $y_j^*$, $1 \le j \le m$, of the constraint functions transit into the minimizer $y_{m+1}^*$ of the objective function. In this way, the functions $f_j(y)$, $1 \le j \le m$, with the same extremum point are constructed.
3. In order to control the size of the feasible domain, let us construct an auxiliary function (a combined constraint) and compute its values in the nodes of a uniform grid in the domain $D$; the number of grid nodes in the conducted experiments should be big enough (in our experiments it was $\min\{10^7, 10^{2N}\}$). Then, let us find the maximum and minimum values $H_{\max}$ and $H_{\min}$ of the function $H(y)$ in the grid nodes and construct a characteristic $s(i)$: the number of points in which the values of $H(y)$ fall into the range
$$\left[H_{\min},\; H_{\min} + i\,\frac{H_{\max} - H_{\min}}{100}\right], \quad 1 \le i \le 100.$$
Then, the functions
$$f_j(y) - q, \qquad q = H_{\min} + i\,\frac{H_{\max} - H_{\min}}{100}, \quad 1 \le j \le m,$$
where
$$\varphi(y) = f_{m+1}(y) - \max_{1 \le j \le m}\left\{0,\; f_j(y) - q\right\},$$
where $h_{\max}$ and $h_{\min}$ are the maximum and minimum values of the function $f_{m+1}(y)$, respectively.
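The grid-histogram step in rule 3 can be sketched in Python as below; the helper name `choose_level`, the target-fraction interface, and returning the smallest suitable level are our assumptions about a natural way to use the characteristic $s(i)$:

```python
def choose_level(H_values, target_fraction):
    """Pick the threshold q = H_min + i*(H_max - H_min)/100 (smallest i,
    1 <= i <= 100) such that at least `target_fraction` of the grid
    values of the combined constraint H(y) fall below q.  The shifted
    constraints f_j(y) - q <= 0 then cut out a feasible region occupying
    roughly that fraction of the search domain."""
    H_min, H_max = min(H_values), max(H_values)
    n = len(H_values)
    for i in range(1, 101):
        q = H_min + i * (H_max - H_min) / 100.0
        if sum(1 for v in H_values if v <= q) / n >= target_fraction:
            return q
    return H_max
```

Because the level is chosen from the empirical distribution of $H$ on the grid, the feasible fraction is controlled without any analytical knowledge of the generated constraints.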
4 Conclusion
This paper considers a method for generating global optimization test problems with non-convex constraints that allows:
– controlling the size of the feasible domain with respect to the whole domain of parameter variation;
– knowing a priori the conditional global minimizer of the objective function;
– placing the unconditional global minimizer of the objective function outside the feasible domain (to simulate the constraints and objective function in applied optimization problems).
References
1. Famularo, D., Pugliese, P., Sergeyev, Y.D.: A global optimization technique for checking parametric robustness. Automatica 35, 1605–1611 (1999)
2. Kvasov, D.E., Menniti, D., Pinnarelli, A., Sergeyev, Y.D., Sorrentino, N.: Tuning fuzzy power-system stabilizers in multi-machine systems by global optimization algorithms based on efficient domain partitions. Electr. Power Syst. Res. 78(7), 1217–1229 (2008)
3. Kvasov, D.E., Sergeyev, Y.D.: Deterministic approaches for solving practical black-box global optimization problems. Adv. Eng. Softw. 80, 58–66 (2015)
4. Modorskii, V.Y., Gaynutdinova, D.F., Gergel, V.P., Barkalov, K.A.: Optimization in design of scientific products for purposes of cavitation problems. In: Simos, T.E. (ed.) ICNAAM 2015. AIP Conference Proceedings, vol. 1738 (2016). Article No. 400013
5. Floudas, C.A., et al.: Handbook of Test Problems in Local and Global Optimization. Kluwer Academic Publishers, Dordrecht (1999)
6. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Y.D.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM TOMS 29(4), 469–480 (2003)
7. Ali, M.M., Khompatraporn, C., Zabinsky, Z.B.: A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J. Glob. Optim. 31(4), 635–672 (2005)
8. Addis, B., Locatelli, M.: A new class of test functions for global optimization. J. Glob. Optim. 38(3), 479–501 (2007)
9. Grishagin, V.A.: Operating characteristics of some global search algorithms. Probl. Stat. Optim. 7, 198–206 (1978). [in Russian]
10. Gergel, V., Grishagin, V., Gergel, A.: Adaptive nested optimization scheme for multidimensional global search. J. Glob. Optim. 66(1), 35–51 (2016)
11. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-Convex Constraints: Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
12. Sergeyev, Y.D., Famularo, D., Pugliese, P.: Index branch-and-bound algorithm for Lipschitz univariate global optimization with multiextremal constraints. J. Glob. Optim. 21(3), 317–341 (2001)
13. Barkalov, K.A., Strongin, R.G.: A global optimization technique with an adaptive order of checking for constraints. Comput. Math. Math. Phys. 42(9), 1289–1300 (2002)
14. Barkalov, K., Gergel, V., Lebedev, I.: Use of Xeon Phi coprocessor for solving global optimization problems. In: Malyshkin, V. (ed.) PaCT 2015. LNCS, vol. 9251, pp. 307–318. Springer, Cham (2015). doi:10.1007/978-3-319-21909-7_31
15. Barkalov, K., Gergel, V.: Parallel global optimization on GPU. J. Glob. Optim. 66(1), 3–20 (2016)
Global Optimization Using Numerical
Approximations of Derivatives
1 Introduction
The global optimization problem is the problem of finding the minimum value of a function $\varphi(x)$:
$$\varphi(x^*) = \min\{\varphi(x) : x \in [a, b]\}. \qquad (1)$$
For the numerical solution of problem (1), optimization methods usually generate a sequence of points $y^k$ which converges to the global optimum $x^*$.
Suppose that the optimized function $\varphi(x)$ is multiextremal. Also assume that $\varphi(x)$ satisfies the Lipschitz condition
$$|\varphi(x_2) - \varphi(x_1)| \le L\,|x_2 - x_1|, \quad x_1, x_2 \in [a, b], \qquad (2)$$
where $L > 0$ is the Lipschitz constant. In addition to (2), also assume that the first derivative of the optimized function $\varphi(x)$ satisfies the Lipschitz condition
$$|\varphi'(x_2) - \varphi'(x_1)| \le L_1\,|x_2 - x_1|, \quad x_1, x_2 \in [a, b]. \qquad (3)$$
where
$$x_i^{\min} = \frac{\varphi(x_{i-1}) + m(x_i - x_{i-1}) + m\,x_i}{m}.$$
Rule 5. Find the interval $(x_{t-1}, x_t)$ with the minimal characteristic $R(t)$. In the case when several intervals satisfy (6), the interval with the minimal number $t$ is taken for certainty.
Rule 6. Compute the point of the next trial $x^{k+1}$ according to
$$x^{k+1} = \begin{cases} x_t^{\min}, & x_t^{\min} \in [x_{t-1}, x_t],\\ x_{t-1}, & \varphi_{\min}(x_{t-1}) \le \varphi_{\min}(x_t),\\ x_t, & \varphi_{\min}(x_{t-1}) > \varphi_{\min}(x_t). \end{cases}$$
As mentioned above, the derivative may be unknown, or its values may be time-consuming to compute. In this paper, a modification of the AGMD based on numerical differentiation is proposed.
The following relations are used for numerical estimation of the values of the first derivative:
$$\varphi'_i = \frac{\varphi(x_i) - \varphi(x_{i-1})}{x_i - x_{i-1}},$$
Global Optimization Using Numerical Approximations of Derivatives 323
approximation of the values of the first derivative at the left, right, and center points by three values of the function (here $h_i = x_i - x_{i-1}$, $\lambda_i = h_i / h_{i-1}$, and $H_i = h_i + h_{i+1}$):
$$\varphi'_0 = \frac{1}{H_1}\left[-(2 + \lambda_2)\,\varphi(x_0) + \frac{(1 + \lambda_2)^2}{\lambda_2}\,\varphi(x_1) - \frac{1}{\lambda_2}\,\varphi(x_2)\right],$$
$$\varphi'_k = \frac{1}{H_{k-1}}\left[\lambda_k\,\varphi(x_{k-2}) - \frac{(1 + \lambda_k)^2}{\lambda_k}\,\varphi(x_{k-1}) + \frac{1 + 2\lambda_k}{\lambda_k}\,\varphi(x_k)\right],$$
and, for $1 \le i \le k - 1$,
$$\varphi'_i = \frac{1}{H_i\,\lambda_{i+1}}\left[-\lambda_{i+1}^2\,\varphi(x_{i-1}) + (\lambda_{i+1}^2 - 1)\,\varphi(x_i) + \varphi(x_{i+1})\right],$$
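Under one consistent reading of these formulas (with $h_i = x_i - x_{i-1}$, $\lambda_i = h_i/h_{i-1}$, $H_i = h_i + h_{i+1}$), the three-point estimates can be sketched in Python as below; a useful sanity check is that such formulas are exact for quadratic functions:

```python
def derivative_estimates(xs, fs):
    """Three-point first-derivative estimates on a non-uniform grid
    x_0 < x_1 < ... < x_k (k >= 2), given function values fs[i] = f(xs[i]).
    Left, interior, and right formulas; exact for quadratics."""
    k = len(xs) - 1
    h = [None] + [xs[i] - xs[i - 1] for i in range(1, k + 1)]  # h[i] = x_i - x_{i-1}
    d = [0.0] * (k + 1)
    # left endpoint
    lam, H = h[2] / h[1], h[1] + h[2]
    d[0] = (-(2 + lam) * fs[0] + (1 + lam) ** 2 / lam * fs[1]
            - fs[2] / lam) / H
    # interior points
    for i in range(1, k):
        lam, H = h[i + 1] / h[i], h[i] + h[i + 1]
        d[i] = (-lam ** 2 * fs[i - 1] + (lam ** 2 - 1) * fs[i]
                + fs[i + 1]) / (H * lam)
    # right endpoint
    lam, H = h[k] / h[k - 1], h[k - 1] + h[k]
    d[k] = (lam * fs[k - 2] - (1 + lam) ** 2 / lam * fs[k - 1]
            + (1 + 2 * lam) / lam * fs[k]) / H
    return d
```

For $f(x) = x^2$ on the non-uniform grid $\{0, 1, 3, 4\}$ this reproduces $f'(x) = 2x$ at every node, which confirms that the coefficients above sum to zero and cancel the quadratic term.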
Table 2. The results of comparison of different schemes for setting the reliability parameter.

r        1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  Average
1.0      17  14  67  12  18  28  14  43  16  –   29  16  20  16  63  25  –   11  15  24  24.89
1.5      21  15  66  16  22  27  16  64  20  –   28  25  2   17  38  32  –   14  17  28  27.17
2.0      23  18  77  20  24  28  20  72  22  21  30  30  29  28  35  38  47  16  18  28  31.2
r + d/k  18  14  49  16  18  24  17  47  16  16  30  21  25  17  34  30  32  15  15  25  23.95
4 Conclusion
In the framework of the proposed approach to solving global optimization problems, the AGMND-3 algorithm showed results close to those of AGMD. The method with numerical derivatives looks even more effective, since each trial in AGMD includes a calculation of both the function and its derivative.
For further research, it is necessary to continue the computational experiments on higher-dimensional optimization problems, as well as to provide a theoretical basis for AGMND.
References
1. Gergel, V.P.: A method of using derivatives in the minimization of multiextremum functions. Comput. Math. Math. Phys. 36(6), 729–742 (1996)
2. Sergeyev, Y.D., Mukhametzhanov, M.S., Kvasov, D.E., Lera, D.: Derivative-free local tuning and local improvement techniques embedded in the univariate global optimization. J. Optim. Theor. Appl. 171(1), 186–208 (2016)
3. Sergeyev, Y.D.: Global one-dimensional optimization using smooth auxiliary functions. Math. Program. 81(1), 127–146 (1998)
4. Strongin, R.G., Sergeyev, Y.D.: Global Optimization with Non-Convex Constraints: Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
5. Strongin, R.G.: Numerical Methods in Multiextremal Problems: Information-Statistical Algorithms. Nauka, Moscow (1978). (in Russian)
6. Barkalov, K., Gergel, V.P.: Parallel global optimization on GPU. J. Global Optim. 66(1), 3–20 (2016)
7. Gergel, V.P., Kuzmin, M.I., Solovyov, N.A., Grishagin, V.A.: Recognition of surface defects of cold-rolling sheets based on method of localities. Int. Rev. Automat. Control 8(1), 51–55 (2015)
8. Barkalov, K., Gergel, V., Lebedev, I.: Use of Xeon Phi coprocessor for solving global optimization problems. In: Malyshkin, V. (ed.) PaCT 2015. LNCS, vol. 9251, pp. 307–318. Springer, Cham (2015). doi:10.1007/978-3-319-21909-7_31
9. Paulavicius, R., Zilinskas, J.: Advantages of simplicial partitioning for Lipschitz optimization problems with linear constraints. Optim. Lett. 10(2), 237–246 (2016)
10. Paulavicius, R., Sergeyev, Y.D., Kvasov, D.E., Zilinskas, J.: Globally-biased DISIMPL algorithm for expensive global optimization. J. Global Optim. 59(2–3), 545–567 (2014)
11. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2008)
Global Optimization Challenges in Structured
Low Rank Approximation
2 Optimization Challenges
2.1 Challenge 1: Selecting f
where a₁, …, a_r are some real numbers with a_r ≠ 0. The model (4) includes, as a special case, the model of a sum of exponentially damped sinusoids:

s_n = Σ_{ℓ=1}^{q} a_ℓ exp(−d_ℓ n) sin(2π ω_ℓ n + φ_ℓ),  n = 1, …, N, (5)
where d(·, ·) is a distance on R^{L×K} × R^{L×K} and A is the set of all L × K Hankel matrices of rank r. This is the optimization problem (1).
The optimization problems (3) and (1) are equivalent if the distance function in (3) and d(·, ·) in (1) are such that
Further challenges arise when observations of the time series Y are classified as exact or missing. In the former case the observation would require infinite weight, whilst in the latter the observation would require zero weight. Both cases give rise to difficulties in computing with infinite and infinitesimal quantities, and there is a need for a methodology that allows one to represent infinite and infinitesimal numbers by a finite number of symbols in order to execute arithmetical operations. There is great potential for the use of grossone and the Infinity Computer (see [11] for more details on these topics).
One of the earliest known approaches to obtaining a solution of (1) is the so-called Cadzow iterations, which alternate projections of matrices, starting at a given structured matrix, onto the set of matrices of rank r (by performing a singular value decomposition) and onto the set of Hankel matrices (by diagonal averaging). Despite the fact that Cadzow iterations guarantee convergence to the set A, they can easily be shown to be sub-optimal (see [1]). They remain popular due to their simplicity. One Cadzow iteration corresponds to the technique known as singular spectrum analysis, which has been an area of research developed by the PI (see for example [1]). The main recent contributions to finding a solution of (1) are described below.
1. Structured total least norm. Proposed by Park et al. [9], this class of methods is aimed at rank reduction of a given Hankel matrix by 1 (that is, r = L − 1).
2. Fitting a sum of damped sinusoids [10]. Methods in this category parameterize the vector of observations as a sum of damped sinusoids and use the set of unknown parameters as a feasible domain. This has been discussed earlier.
3. Local optimization methods starting at an existing approximation. Markovsky and co-authors [8] have developed methodology and software to locally improve an existing solution (or approximate solution) of (1).
In summary, all of these methods suffer from a number of flaws [3]: (i) the rank of the matrix can only be reduced by one; (ii) they are based on local optimization and may not move significantly from the initial approximation; and (iii) none has guaranteed convergence to the global optimum. Additionally (and importantly), the focus in the literature has been on the case when the distance f in (1) is taken to be the Frobenius norm (that is, Q and R being identity matrices).
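The Cadzow alternating projections mentioned above fit in a few lines of code. The sketch below is ours (function names and parameters are illustrative, not taken from the cited works): project onto the set of matrices of rank at most r by a truncated SVD, then back onto the Hankel set by diagonal averaging, and repeat.

```python
import numpy as np

def hankel(y, L):
    """L x K Hankel matrix of the series y, K = len(y) - L + 1."""
    K = len(y) - L + 1
    return np.array([y[i:i + K] for i in range(L)])

def cadzow(y, L, r, iters=50):
    """Alternating projections between the rank-<=r matrices and the
    Hankel matrices, returning the series of the last Hankel projection."""
    X = hankel(np.asarray(y, dtype=float), L)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]         # rank-r truncation
        Lr, Kr = X.shape                        # diagonal averaging
        y_hat = np.zeros(Lr + Kr - 1)
        cnt = np.zeros(Lr + Kr - 1)
        for i in range(Lr):
            for j in range(Kr):
                y_hat[i + j] += X[i, j]
                cnt[i + j] += 1
        y_hat /= cnt
        X = hankel(y_hat, Lr)
    return y_hat
```

A series that is already the diagonal sum of a rank-r Hankel matrix (e.g. a single exponential for r = 1) is a fixed point of the iteration.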
References
1. Gillard, J.: Cadzow's basic algorithm, alternating projections and singular spectrum analysis. Stat. Interface 3(3), 335–343 (2010)
2. Gillard, J., Zhigljavsky, A.: Analysis of structured low rank approximation as an optimization problem. Informatica 22(4), 489–505 (2011)
3. Gillard, J., Zhigljavsky, A.: Optimization challenges in the structured low rank approximation problem. J. Global Optim. 57(3), 733–751 (2013)
4. Gillard, J.W., Kvasov, D.: Lipschitz optimization methods for fitting a sum of damped sinusoids to a series of observations. Stat. Interface 10(1), 59–70 (2017)
5. Gillard, J., Zhigljavsky, A.: Stochastic algorithms for solving structured low-rank matrix approximation problems. Commun. Nonlinear Sci. Numer. Simul. 21(1), 70–88 (2015)
6. Gillard, J., Zhigljavsky, A.: Weighted norms in subspace-based methods for time series analysis. Numer. Linear Algebra Appl. 23(5), 947–967 (2016)
7. Gillis, N., Glineur, F.: Low-rank matrix approximation with weights or missing data is NP-hard. SIAM J. Matrix Anal. Appl. 32(4), 1149–1165 (2011)
8. Markovsky, I.: Low Rank Approximation: Algorithms, Implementation, Applications. Springer, London (2012)
9. Park, H., Zhang, L., Rosen, J.B.: Low rank approximation of a Hankel matrix by structured total least norm. BIT Numer. Math. 39(4), 757–779 (1999)
10. Sergeyev, Y.D., Kvasov, D.E., Mukhametzhanov, M.S.: On the least-squares fitting of data by sinusoids. In: Pardalos, P.M., Zhigljavsky, A., Zilinskas, J. (eds.) Advances in Stochastic and Deterministic Global Optimization, Chap. 11. SOIA, vol. 107, pp. 209–226. Springer, Cham (2016). doi:10.1007/978-3-319-29975-4_11
11. Sergeyev, Y.D.: Numerical computations and mathematical modelling with infinite and infinitesimal numbers. J. Appl. Math. Comput. 29(1–2), 177–195 (2009)
A D.C. Programming Approach
to Fractional Problems
1 Introduction
We consider the following problem of fractional optimization [1,10]:

(Pf)    f(x) := Σ_{i=1}^{m} ψ_i(x)/φ_i(x) ↓ min_x,  x ∈ S,
Theorem 1. [5] Suppose that in Problem (Pf) ψ_i(x) > 0, φ_i(x) > 0, and the assumption (H1) is satisfied. In addition, let there exist a vector α⁰ = (α⁰₁, …, α⁰_m) ∈ K ⊂ IR^m at which the nonnegativity condition (H(α⁰)) holds. Besides, suppose that in Problem (P_{α⁰}) the following equality takes place:

V(α⁰) := inf_x { Σ_{i=1}^{m} [ψ_i(x) − α⁰_i φ_i(x)] : x ∈ S } = 0. (1)
Let us emphasize that the algorithm for solving Problem (Pf) of fractional optimization consists of three basic stages: the (a) local and (b) global searches in Problem (P_α) with a fixed vector parameter α, and (c) the method for finding the vector parameter α at which the optimal value of Problem (P_α) is zero.
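For a single ratio (m = 1), stage (c) is the classical Dinkelbach scheme [3]: given the current parameter value, minimize ψ(x) − αφ(x) and update α to the ratio at the minimizer, until the optimal value reaches zero. A toy sketch over a finite candidate set standing in for S (the functions and the grid below are ours, purely illustrative):

```python
def dinkelbach(psi, phi, candidates, tol=1e-10, max_iter=100):
    """Minimize psi(x)/phi(x) (phi > 0) over a finite candidate set by
    driving the optimal value of x -> psi(x) - alpha*phi(x) to zero."""
    x_best = candidates[0]
    alpha = psi(x_best) / phi(x_best)
    for _ in range(max_iter):
        x_best = min(candidates, key=lambda x: psi(x) - alpha * phi(x))
        v = psi(x_best) - alpha * phi(x_best)   # value of V(alpha)
        if abs(v) < tol:
            break
        alpha = psi(x_best) / phi(x_best)
    return x_best, alpha
```

For psi(x) = x² + 1 and phi(x) = x on [0.5, 3] the minimal ratio is 2, attained at x = 1.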
T. Gruzdeva and A. Strekalovsky
Proposition 1. [6] Let the pair (x*, α*) ∈ IR^n × IR^m be a solution to the following problem:

Σ_{i=1}^{m} α_i ↓ min_{(x,α)},  x ∈ S,  ψ_i(x)/φ_i(x) ≤ α_i,  i = 1, …, m. (2)

Then ψ_i(x*)/φ_i(x*) = α*_i, i = 1, …, m.
Corollary 1. For any solution (x*, α*) ∈ IR^n × IR^m to the problem (2), the point x* will be a solution to Problem (Pf).

The inequality constraints in the problem (2) can be replaced by the equivalent constraints ψ_i(x) − α_i φ_i(x) ≤ 0, i = 1, …, m, since φ_i(x) > 0 ∀x ∈ S. This yields the following problem with m nonconvex constraints:

(P)    f₀(x, α) := Σ_{i=1}^{m} α_i ↓ min_{(x,α)},  x ∈ S,  f_i(x, α) := ψ_i(x) − α_i φ_i(x) ≤ 0,  i = 1, …, m.
We intend to solve this problem using the exact penalization approach for d.c. optimization developed in [13]. Therefore, we introduce the penalized problem. It can be readily seen that the penalized function θ_σ(·) is a d.c. function. The theory enables us to construct an algorithm which consists of two principal stages: (a) a local search, which provides an approximately critical point; (b) procedures of escaping from critical points.
Actually, since σ > 0, θ_σ(x) = G_σ(x) − H_σ(x), where

H_σ(x) := h₀(x) + σ Σ_{i∈I} h_i(x),

G_σ(x) := θ_σ(x) + H_σ(x) = g₀(x) + σ max{ Σ_{i∈I} h_i(x);  max_{i=1,…,m} [ g_i(x) + Σ_{j∈I, j≠i} h_j(x) ] },

so it is clear that G_σ(·) and H_σ(·) are convex functions.
Let the Lagrange multipliers associated with the constraints and corresponding to the point z^k, k ∈ {1, 2, …}, be denoted by λ := (λ₁, …, λ_m) ∈ IR^m.

Global search scheme

Step 1. Using the local search method from [14], find a critical point z^k in (P).
Step 2. Set σ_k := Σ_{i=1}^{m} λ_i. Choose a number β: inf(G_σ, S) ≤ β ≤ sup(G_σ, S). Choose an initial β₀ = G_σ(z^k), ζ_k = θ_σ(z^k).
Step 3. Construct a finite approximation

R_k(β) = { v¹, …, v^{N_k} | H_σ(v^i) = β + ζ_k, i = 1, …, N_k, N_k = N_k(β) }.

Step 4. Find a δ_k-solution u^i of the following linearized problem:

(PL_i)    G_σ(x) − ⟨∇H_σ(v^i), x⟩ ↓ min_x,  x ∈ S.
According to Corollary 1, the point z resulting from the global search strategy will be a solution to the original fractional program. It should be noted that, in contrast to the approach from Sect. 2, the parameters α_i will be found simultaneously with the solution vector x.
4 Computational Simulations
The two approaches described above for solving the fractional programs (Pf) via d.c. optimization problems were successfully tested. The algorithm based on the method for solving the equation V(α) = 0 from Sect. 2 (the F1-algorithm) and the algorithm based on the global search scheme from Sect. 3 (the F2-algorithm) were applied to an extended set of test examples with various starting points. Several instances of fractional problems from [2,7–9] with a small number of variables and a small number of terms in the sum were used for the computational experiments. Additionally, randomly generated fractional problems with linear or quadratic functions in the numerators and the denominators of the ratios, with up to 200 variables and 200 terms in the sum, were successfully solved. All computational experiments were performed on an Intel Core i7-4790K CPU, 4.0 GHz. All convex auxiliary problems (linearized problems) arising at the steps of the F1- and F2-algorithms were solved by the software package IBM ILOG CPLEX 12.6.2.
Table 1 presents the results of some comparative computational testing of the two approaches (the F1- and F2-algorithms) and employs the following designations: name is the test example name; n is the number of variables (the problem's dimension); m is the number of terms in the sum; f(x⁰) is the value of the goal function of Problem (Pf) at the starting point; f(z) is the value of the function at the solution provided by the algorithms; it is the number of iterations of the F1- or F2-algorithm; Time stands for the CPU time of computing (seconds).
Observe that one iteration of the F1-algorithm and one iteration of the F2-algorithm differ in processing time and, therefore, cannot be compared directly. In the F1-algorithm it denotes the number of times we varied the parameter α, while in the F2-algorithm it stands for the number of iterations of the global search in solving the nonconvex Problem (P).
Computational experiments showed that solving the fractional program should combine the two approaches. For example, we can use the solution to Problem (P) to search for the parameter α that reduces the optimal value function of Problem (P_α) to zero.
5 Conclusions
In this paper, we showed how fractional programs can be solved by applying the Global Search Theory of d.c. optimization. The methods developed were justified and tested on an extended set of problems with linear or quadratic functions in the numerators and denominators of the ratios.
Acknowledgments. This work has been supported by the Russian Science Foundation, Project No. 15-11-20015.
References
1. Bugarin, F., Henrion, D., Lasserre, J.-B.: Minimizing the sum of many rational functions. Math. Prog. Comput. 8, 83–111 (2016)
2. Chun-feng, W., San-yang, L.: New method for solving nonlinear sum of ratios problem based on simplicial bisection. Syst. Eng. Theory Pract. 33(3), 742–747 (2013)
3. Dinkelbach, W.: On nonlinear fractional programming. Manage. Sci. 13, 492–498 (1967)
4. Freund, R.W., Jarre, F.: Solving the sum-of-ratios problem by an interior-point method. J. Global Optim. 19(1), 83–102 (2001)
5. Gruzdeva, T.V., Strekalovsky, A.S.: An approach to fractional programming via d.c. optimization. AIP Conf. Proc. 1776, 090010 (2016)
6. Gruzdeva, T.V., Strekalovsky, A.S.: An approach to fractional programming via d.c. constraints problem: local search. In: Kochetov, Y., Khachay, M., Beresnev, V., Nurminski, E., Pardalos, P. (eds.) DOOR 2016. LNCS, vol. 9869, pp. 404–417. Springer, Cham (2016). doi:10.1007/978-3-319-44914-2_32
7. Ma, B., Geng, L., Yin, J., Fan, L.: An effective algorithm for globally solving a class of linear fractional programming problems. J. Softw. 8(1), 118–125 (2013)
8. Pandey, P., Punnen, A.P.: A simplex algorithm for piecewise-linear fractional programming problems. Eur. J. Oper. Res. 178, 343–358 (2007)
9. Raouf, O.A., Hezam, I.M.: Solving fractional programming problems based on swarm intelligence. J. Ind. Eng. Int. 10, 56–66 (2014)
10. Schaible, S., Shi, J.: Fractional programming: the sum-of-ratios case. Optim. Methods Softw. 18, 219–229 (2003)
11. Strekalovsky, A.S.: On solving optimization problems with hidden nonconvex structures. In: Rassias, T.M., Floudas, C.A., Butenko, S. (eds.) Optimization in Science and Engineering, pp. 465–502. Springer, New York (2014). doi:10.1007/978-1-4939-0808-0_23
12. Strekalovsky, A.S.: Elements of Nonconvex Optimization. Nauka, Novosibirsk (2003). (in Russian)
13. Strekalovsky, A.S.: On the merit and penalty functions for the d.c. optimization. In: Kochetov, Y., Khachay, M., Beresnev, V., Nurminski, E., Pardalos, P. (eds.) DOOR 2016. LNCS, vol. 9869, pp. 452–466. Springer, Cham (2016). doi:10.1007/978-3-319-44914-2_36
14. Strekalovsky, A.S.: On local search in d.c. optimization problems. Appl. Math. Comput. 255, 73–83 (2015)
Objective Function Decomposition
in Global Optimization
Oleg V. Khamisov(B)
1 Introduction
In this paper we consider global optimization problems in which the objective functions are explicitly given and can be represented as compositions of some other functions. Many practical problems can be formulated in such a form [8,9,11]. An approach similar to the one described below was suggested earlier in [4,7,13]. In [5] an equivalent approach was suggested for utility functions, i.e. for the case when the objective composite function has some monotonicity properties.
are practically tractable. The latter means, for example, a possibility to solve Eq. (3) in x for a given y subject to the inclusion x ∈ X by an efficient algorithm. Solving problem (1) corresponds to finding an optimal and feasible point simultaneously. In problem (2)–(3) the optimality (i.e. minimization in (2)) and feasibility (i.e. determining x for a given y in (3)) stages are separated: they are performed in different spaces. What is exactly done in the reduction of problem (1) to problem (2)–(3), and what is understood under objective function decomposition in this paper, is moving some complexity from the objective function to the constraints, i.e. moving a part of the difficulty from the optimality stage to the feasibility stage. The motivation of such a decomposition is a desire to distribute the difficulty of the initial problem between the objective and the constraints more or less uniformly. It is necessary to mention that the structure of the objective function is given; we just use it. In this case we perform an explicit decomposition. There are many cases when we need to discover a good (or efficient) decomposition. In the latter case the decomposition is implicit.
In practical minimization of F in (2) it is quite often necessary to localize a global minimum in some compact subset of R^p. Define the values

\underline{y}_i ≤ y_i ≤ \overline{y}_i,  i = 1, …, p, (7)

y ∈ Y. (8)

Since (8) is a reformulation of the feasibility stage constraint (3), the inclusion y ∈ Y will be referred to as the induced constraint.
3 Agreed Decomposition
We will say that the composite objective function g has agreed variable decom-
position
g(x) = F (f1 (x1 ), . . . , fp (xp )), (9)
where xi X i Rni , i = 1, . . . , p, X 1 . . . X p = X and n1 + . . . + np = n.
Conversely, we will say that the function g has disagreed variable decomposition
if g is still representable in the form (9) and xi Rni , ni < n, i = 1. . . . , p, but
n1 + . . . + np > n.
X = X₁ × X₂,  X₁ = X₂ = [−10, 10].

It is obvious to set F(y₁, y₂) = y₁y₂.
\underline{f}_1 = \underline{f}_2 = −12.87088549,  \overline{f}_1 = \overline{f}_2 = 14.50800793.
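When the outer function is multiplicative, F(y₁, y₂) = y₁y₂, the induced problem over the box of attainable y-values is bilinear, so its minimum over the box is attained at one of the four vertices. A minimal sketch (the bounding routine and the test bounds below are ours, purely illustrative):

```python
def value_bounds(f, a, b, grid=10001):
    """Crude lower/upper bounds of f over [a, b] via a uniform grid."""
    vals = [f(a + (b - a) * i / (grid - 1)) for i in range(grid)]
    return min(vals), max(vals)

def box_min_product(b1, b2):
    """Minimum of F(y1, y2) = y1*y2 over the box b1 x b2; since F is
    bilinear, it suffices to check the four vertices."""
    return min(y1 * y2 for y1 in b1 for y2 in b2)
```

For example, with y₁ ∈ [−2, 3] and y₂ ∈ [−4, 5] the minimum of y₁y₂ is −12, attained at the vertex (3, −4).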
φ(y) ≤ 0, (15)

\underline{y}_i ≤ y_i ≤ \overline{y}_i,  i = 1, …, p. (16)

Such a reduction is effective when p < n (or even p ≪ n) and the inner optimization problem for calculating the values of φ can be effectively solved. The most appropriate example here is given by linear functions f_i and small p, say, p ≤ 10. Then the complexity is concentrated in a nonconvex problem in p variables, and it can
For fixed (y₁, y₂) the system (17) always has a unique solution in (x₁, x₂). Then

F(y₁, y₂) = (1 + y₁² − \sqrt{3y₁² − 20y₁ + 36})(30 + y₂² − \sqrt{3y₂² − 16y₂ + 18}).
5 Conclusion
An objective function decomposition in global optimization was discussed. Due to decomposition we can obtain a reduction in solution difficulty. The suggested approach can be considered as a starting decomposition scheme depending on the properties of F. Types of decomposition are generated by different classes of functions F. Among the well-known classes we mention multiplicative functions, sum-of-ratios functions and so on. Other types of the function F can be used.
References
1. Bromberg, M., Chang, T.C.: A function embedding technique for a class of global optimization problems via one-dimensional global optimization. In: Proceedings of the 28th IEEE Conference on Decision and Control, vol. 1–3, pp. 2451–2556 (1989)
2. Hansen, P., Jaumard, B.: Lipschitz optimization. In: Pardalos, P.M., Horst, R. (eds.) Handbook of Global Optimization, pp. 407–494. Kluwer Academic Publishers, Dordrecht (1995)
3. Hansen, P., Jaumard, B., Lu, S.H.: An analytical approach to global optimization. Math. Program. 52(1), 227–254 (1991)
4. Hamed, A.S.E.-D., McCormick, G.P.: Calculations of bounds on variables satisfying nonlinear equality constraints. J. Glob. Optim. 3, 25–48 (1993)
5. Horst, R., Thoai, N.V.: Utility function programs and optimization over the efficient set in multiple-objective decision making. JOTA 92(3), 605–631 (1997)
6. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Heidelberg (1996). doi:10.1007/978-3-662-03199-5
7. McCormick, G.P.: Attempts to calculate global solutions of problems that may have local minima. In: Lootsma, F. (ed.) Numerical Methods for Nonlinear Optimization, pp. 209–221. Academic Press, London, New York (1972)
8. Pardalos, P.M.: An open global optimization problem on the unit sphere. J. Glob. Optim. 6, 213 (1995)
9. Pardalos, P.M., Shalloway, D., Xue, G.: Optimization methods for computing global minima of nonconvex potential energy functions. J. Glob. Optim. 4, 117–133 (1994)
10. Paulavicius, R., Zilinskas, J.: Simplicial Global Optimization. SpringerBriefs in Optimization. Springer, New York (2014). doi:10.1007/978-1-4614-9093-7
11. Pinter, J.: Global Optimization in Action. Kluwer Academic Publishers, Dordrecht (1996)
12. Sergeyev, Y.D., Strongin, R.G., Lera, D.: Introduction to Global Optimization Exploiting Space-Filling Curves. SpringerBriefs in Optimization. Springer, New York (2013). doi:10.1007/978-1-4614-8042-6
13. Sniedovich, M., Macalalag, E., Findlay, S.: The simplex method as a global optimizer: a C-programming perspective. J. Glob. Optim. 4, 89–109 (1994)
14. Strekalovsky, A.S.: On solving optimization problems with hidden nonconvex structures. In: Rassias, T.M., Floudas, C.A., Butenko, S. (eds.) Optimization in Science and Engineering, pp. 465–502. Springer, New York (2014). doi:10.1007/978-1-4939-0808-0_23
15. Strongin, R.G., Sergeev, Y.D.: Global Optimization with Non-convex Constraints: Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
16. Tuy, H.: D.C. optimization: theory, methods and algorithms. In: Pardalos, P.M., Horst, R. (eds.) Handbook of Global Optimization, pp. 149–216. Kluwer Academic Publishers, Dordrecht (1995)
Projection Approach Versus Gradient Descent
for Networks Flows Assignment Problem
1 Introduction
A huge number of different practical problems are solved by means of models of network flows assignment. The most remarkable among them are road networks, power grids and pipe networks [6,7]. The task is to estimate the network flows assignment profile according to the demands between all source-sink pairs. Generally, there are many source-sink pairs in a network (multicommodity networks). In a multicommodity network, flows from different commodities load common arcs simultaneously and influence the volume delays of each other.
In this paper we show that the projection approach is a more appropriate technique for coping with the network flows assignment problem than gradient descent. Moreover, the zig-zagging behavior of gradient descent in the neighborhood of the equilibrium solution is clarified.
The network of parallel routes consists of two nodes (source and sink) and n alternative arcs (routes). The demand between the source and the sink is F. The demand F is to be assigned among the n routes: F = Σ_{i=1}^{n} f_i, f_i ≥ 0, i = 1, …, n. The link performance function is a smooth non-decreasing function: t_i ∈ C¹(R₊), t_i(x) − t_i(y) ≥ 0 when x ≥ y ≥ 0, x, y ∈ R₊, i = 1, …, n, where R₊ is the non-negative orthant. Moreover, it is assumed that t_i(x) ≥ 0 for x ≥ 0 and t_i(x)/x > 0 for x > 0, i = 1, …, n.
From a mathematical perspective, the link-route and link-node formulations are equivalent for the network of parallel routes. In such a case, the network flows assignment problem can be expressed as follows:

f* = arg min_f Σ_{i=1}^{n} ∫₀^{f_i} t_i(u) du, (1)

subject to

Σ_{i=1}^{n} f_i = F, (2)

f_i ≥ 0,  i = 1, …, n. (3)
According to the results obtained in [4], there exists an explicit projection operator to cope with the problem (1)–(3). For the sake of convenience let us introduce additional notations:

a_i(f_i) := t_i(f_i) − t_i′(f_i) f_i,  b_i(f_i) := t_i′(f_i),  i = 1, …, n,
f* = π(f*).
3. Compute f^{k+1}:

f_i^{k+1} = (1 / b_i(f_i^k)) · ( (F + Σ_{s=1}^{m_{k+1}} a_s(f_s^k)/b_s(f_s^k)) / (Σ_{s=1}^{m_{k+1}} 1/b_s(f_s^k)) − a_i(f_i^k) ),  i = 1, …, m_{k+1},

f_i^{k+1} = 0,  i = m_{k+1} + 1, …, n.
4. Termination criterion:

Σ_{i=1}^{m_{k+1}−1} | t_i(f_i^{k+1}) − t_{i+1}(f_{i+1}^{k+1}) | < ε.
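For linear delay functions t_i(f) = c_i + d_i f the notations above give a_i = c_i and b_i = d_i, and if every route carries positive flow, one pass of the projection step already returns the equilibrium in closed form. A sketch under that assumption (ours, not the authors' code):

```python
def parallel_route_equilibrium(c, d, F):
    """Equilibrium flows on n parallel routes with linear delays
    t_i(f) = c_i + d_i*f, assuming all n routes carry positive flow."""
    inv_sum = sum(1.0 / di for di in d)
    num = F + sum(ci / di for ci, di in zip(c, d))
    return [num / (di * inv_sum) - ci / di for ci, di in zip(c, d)]
```

For c = (1, 2), d = (1, 1) and F = 3 this gives the flows (2, 1), and both routes have the equal travel time 3, as an equilibrium requires.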
4 Simulation Results
The zig-zagging behavior of such a widely used gradient descent method as the Frank-Wolfe algorithm became apparent already in the 1970s [3,5]. Those discussions were intuitive. Here we investigate it in detail on the example of a simple network of parallel routes. Assume that the volume delay functions for this network are defined as follows:

t_i(f_i) = c_i + d_i f_i,  i = 1, …, n.
subject to (2) and (3). Let y^k be its solution, and p^k = y^k − f^k the resulting search direction.
2. Find a step length l_k which solves the problem min { T(f^k + l p^k) | 0 ≤ l ≤ 1 }, where T is the objective function (1).
3. Let f^{k+1} = f^k + l_k p^k and R_{k+1} = { f_i^{k+1} | f_i^{k+1} > 0.1, i = 1, …, n } be the set of used routes.
4. If

Σ_{i,j∈R_{k+1}} | t_i(f_i^{k+1}) − t_j(f_j^{k+1}) | < ε,
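For the linear delays above, the objective (1) is quadratic and the step length admits an exact line search, so the whole scheme can be sketched compactly (a hypothetical implementation of the steps above, ours, not the authors' code):

```python
def frank_wolfe(c, d, F, iters=50):
    """Frank-Wolfe for the parallel-routes problem with delays
    t_i(f) = c_i + d_i*f: all-or-nothing subproblem + exact line search."""
    n = len(c)
    f = [0.0] * n
    f[0] = F                                  # all-or-nothing start
    for _ in range(iters):
        t = [c[i] + d[i] * f[i] for i in range(n)]
        j = t.index(min(t))                   # currently cheapest route
        y = [0.0] * n
        y[j] = F
        p = [y[i] - f[i] for i in range(n)]   # search direction
        den = sum(d[i] * p[i] * p[i] for i in range(n))
        num = -sum(t[i] * p[i] for i in range(n))
        l = 0.0 if den == 0 else max(0.0, min(1.0, num / den))
        f = [f[i] + l * p[i] for i in range(n)]
    return f
```

On the two-route instance c = (1, 2), d = (1, 1), F = 3 the method reaches the equilibrium flows (2, 1); on larger instances the all-or-nothing direction keeps flipping between extreme points, which illustrates the zig-zagging behavior discussed in the text.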
Acknowledgement. The first author was jointly supported by a grant from the Russian Science Foundation (Project No. 17-71-10069).
References
1. Dafermos, S.C., Sparrow, F.T.: The traffic assignment problem for a general network. J. Res. Nat. Bur. Stan. 73B, 91–118 (1969)
2. Dafermos, S.-S.C.: Traffic assignment and resource allocation in transportation networks. PhD thesis, Johns Hopkins University, Baltimore, MD (1968)
3. Holloway, C.A.: An extension of the Frank and Wolfe method of feasible directions. Math. Program. 6, 14–27 (1973)
4. Krylatov, A.Y.: Network flow assignment as a fixed point problem. J. Appl. Ind. Math. 10(2), 243–256 (2016)
5. Meyer, G.G.L.: Accelerated Frank-Wolfe algorithms. SIAM J. Control 12, 655–663 (1974)
6. Patriksson, M.: The Traffic Assignment Problem: Models and Methods. Dover Publications, Inc., Mineola (2015)
7. Popov, I., Krylatov, A., Zakharov, V., Ivanov, D.: Competitive energy consumption under transmission constraints in a multi-supplier power grid system. Int. J. Syst. Sci. 48(5), 994–1001 (2017)
8. Sheffi, Y.: Urban Transportation Networks: Equilibrium Analysis with Mathematical Programming Methods. Prentice-Hall, Inc., Englewood Cliffs (1985)
An Approximation Algorithm for Preemptive
Speed Scaling Scheduling of Parallel Jobs
with Migration
1 Introduction
Energy consumption of computing devices is an important issue nowadays [9]. A popular technology to reduce energy usage is dynamic speed scaling, where a processor may vary its speed dynamically. Running a job at a slower speed is more energy efficient; however, it takes longer and may affect the performance. One branch of the algorithmic and complexity studies in this area is devoted to revising classical scheduling problems with dynamic speed scaling (see, e.g., [1,4,6,7,9,13,14] and others).
In our paper we consider a basic speed scaling scheduling problem for parallel jobs. We are given a set J = {1, …, n} of parallel jobs to be executed on m parallel speed-scalable processors. Each job j ∈ J is associated with a release date r_j, a deadline d_j and a processing volume (work) W_j. Moreover, job j ∈ J simultaneously requires exactly size_j processors at each time point when it is in process. Such jobs are called rigid jobs [8].
We distinguish two variants of the problem. The first variant (the non-migratory variant) allows the preemption of the jobs but not their migration. This means that a job may be interrupted and resumed later on the same subset of size_j processors, but it is not allowed to continue its execution on a different subset of size_j processors. In the second variant (the migratory variant) both the preemption and the migration of jobs are allowed.
The standard homogeneous model of speed scaling is considered: when a processor runs at a speed s, the rate at which energy is consumed (i.e., the power) equals s^α, where α > 1 is a constant.
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 351–357, 2017.
https://doi.org/10.1007/978-3-319-69404-7_30
A. Kononov and Y. Kovalenko
2 Related Research
For the preemptive single-processor setting, Yao et al. [14] developed a polyno-
mial time algorithm, that outputs a minimum energy schedule. The preemptive
multiprocessor scheduling of single-processor jobs has been widely studied, see
e.g. [1,4,6,13]. The authors proposed exact polynomial algorithms for this prob-
lem with migration. The works [1,4,13] are based on dierent reductions of the
problem to maximum ow problems. As far as we know, the algorithm presented
in [13] has the best running time among the above-mentioned algorithms.
Albers et al. [3] studied the preemptive problem on parallel processors where the migration of jobs among processors is disallowed. They showed that if all jobs have unit work and deadlines are agreeable, an optimal schedule can be computed in polynomial time. At the same time, the general speed scaling scheduling problem with unit-work jobs was proved to be NP-hard, even on two processors. A common rule to design algorithms for problems without migration is to first define some strategy that assigns jobs to processors, and then schedule the assigned jobs separately on each processor. Using this rule, Albers et al. [3] presented an α^α 2^{4α}-approximation algorithm for instances with agreeable deadlines, and a 2(2 − 1/m)^α-approximation algorithm for instances with common release dates, or common deadlines. Greiner et al. [10] showed that any β-approximation algorithm for parallel processors with migration can be transformed into a β B_α-approximation algorithm for parallel processors without migration, where B_α is the α-th Bell number. The result holds when α ≤ m.
Bampis et al. [5] considered the problem on heterogeneous processors with preemption. They assume that each processor i has its own power function, s^{α(i)}, and job characteristics are processor-dependent. For the case where job migrations are allowed, an algorithm has been proposed that returns a solution within an additive error ε in time polynomial in the problem size and in 1/ε. They also developed an approximation algorithm of ratio (1 + ε) \tilde{B}_α for the problem without migration, where \tilde{B}_α is the generalized Bell number [5]. Recently, Albers et al. [2] proposed a faster combinatorial algorithm based on flows for preemptive scheduling of jobs whose density is lower bounded by a small constant, and where migration is allowed.
3 Our Result
Here we consider the speed scaling scheduling problem of rigid jobs with migration and present a (2 − 1/m)^{α−1}-approximation algorithm for this problem.
Our algorithm consists of two stages. At the first stage we solve an auxiliary min-cost max-flow problem in order to obtain a lower bound on the minimal energy consumption and an assignment of the jobs to time intervals. At this stage we follow the approach proposed in [13]. Then, at the second stage, we determine the speeds of the jobs and schedule them separately within each time interval.
The first stage. Due to the convexity of the speed-to-power function, the energy consumption is minimized if each job j is processed with a fixed speed s_j, which does not change during the processing of the job. Therefore, we can formulate the problem with the variables p_j = W_j/s_j, where p_j is treated as the actual processing time of job j ∈ J. The objective function is written as follows:

F = Σ_{j=1}^{n} size_j · p_j · (W_j / p_j)^α.
u(s, j) = +∞,  (s, j) ∈ A_s,

u(j, I_k) = size_j · |I_k|,  (j, I_k) ∈ A_0, (1)

u(I_k, t) = m · |I_k|,  (I_k, t) ∈ A_t, (2)

where |I_k| denotes the length of the interval I_k.
We denote by x(u, v) the amount of flow on an arc (u, v). Note that p_j = x(s, j)/size_j defines the total duration of job j and p_{j,I_k} = x(j, I_k)/size_j specifies the processing time of job j in the interval I_k. Let the cost of the flow x(s, j) be x(s, j) · (W_j size_j / x(s, j))^α, which is a convex function with respect to x(s, j). The cost of the flow on all other arcs is set to zero. Then the considered problem reduces to finding a maximum s–t flow in G = (V, A) that minimizes the total cost

Σ_{j=1}^{n} x(s, j) · (W_j size_j / x(s, j))^α.
Proof. Let l be the last job in the preemptive list-schedule (if there are several such jobs, we choose the one with the smallest value size_j), and let C_l be its completion time (the length of the schedule). We consider two cases: (I) size_l > m/2 and (II) size_l ≤ m/2.
Case (I): size_l > m/2. According to the preemptive size_j-list-scheduling algorithm we obtain that exactly one job is executed at each time moment, and

size_j ≥ size_l ≥ (m+1)/2 ≥ ((m+0.5)(m−0.5) + 0.25) / (2(m−0.5)) = m²/(2m−1) = m/(2 − 1/m)

for all j ∈ J. It follows that Σ_{j∈J} p_{j,I} size_j ≥ (m/(2 − 1/m)) Σ_{j∈J} p_{j,I}. From (2) we get (m/(2 − 1/m)) Σ_{j∈J} p_{j,I} ≤ m, and therefore Σ_{j∈J} p_{j,I} ≤ 2 − 1/m.
Case (II): size_l ≤ m/2. We claim that, at every point in time during the schedule, either job l is undergoing processing on some size_l processors, or job l is not executed and at least (m − size_l + 1) processors are busy. Therefore, the total load of all processors Σ_{j∈J} p_{j,I} size_j is at least p_{l,I} size_l + (C_l − p_{l,I})(m − size_l + 1). Suppose that C_l > 2 − 1/m; then we get the inequality

Σ_{j∈J} p_{j,I} size_j > p_{l,I} size_l + (2 − 1/m − p_{l,I})(m − size_l + 1)
= m + (1 − p_{l,I})(m − 2 size_l + 1) + (size_l − 1)/m ≥ m,

which leads to a contradiction.
As shown in [11], the approximation ratio of 2 − 1/m for the preemptive sizej-list-scheduling algorithm is tight even if sizej = 1 for all jobs. As a result, the energy consumption is increased by a factor of at most (2 − 1/m)^(α−1) when we put the resulting schedule inside the interval I. Now we show that the approximation ratio of our algorithm cannot be improved even if we use an exact algorithm for makespan minimization at the second stage.
4 Conclusion
We study the energy minimization problem of scheduling rigid jobs on m speed-scalable processors. For the migratory case of the problem we propose a strongly polynomial time approximation algorithm based on a reduction to the min-cost max-flow problem. The algorithm has approximation ratio (2 − 1/m)^(α−1), and this bound is tight. Our result can be generalized to the case of job-dependent energy consumption, when each job j has its own constant αj > 1. For this case our algorithm obtains a (2 − 1/m)^(ᾱ−1)-approximate solution, where ᾱ = max_{j∈J} αj.
References
1. Albers, S., Antoniadis, A., Greiner, G.: On multi-processor speed scaling with migration. J. Comput. Syst. Sci. 81, 1194–1209 (2015)
2. Albers, S., Bampis, E., Letsios, D., Lucarelli, G., Stotz, R.: Scheduling on power-heterogeneous processors. In: Kranakis, E., Navarro, G., Chavez, E. (eds.) LATIN 2016. LNCS, vol. 9644, pp. 41–54. Springer, Heidelberg (2016). doi:10.1007/978-3-662-49529-2_4
3. Albers, S., Müller, F., Schmelzer, S.: Speed scaling on parallel processors. In: 19th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2007, pp. 289–298. ACM (2007)
4. Angel, E., Bampis, E., Kacem, F., Letsios, D.: Speed scaling on parallel processors with migration. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 128–140. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32820-6_15
5. Bampis, E., Kononov, A., Letsios, D., Lucarelli, G., Sviridenko, M.: Energy efficient scheduling and routing via randomized rounding. In: FSTTCS, pp. 449–460 (2013)
6. Bingham, B.D., Greenstreet, M.R.: Energy optimal scheduling on multiprocessors with migration. In: International Symposium on Parallel and Distributed Processing with Applications, ISPA 2008, pp. 153–161. IEEE (2008)
7. Cohen-Addad, V., Li, Z., Mathieu, C., Milis, I.: Energy-efficient algorithms for non-preemptive speed-scaling. In: Bampis, E., Svensson, O. (eds.) WAOA 2014. LNCS, vol. 8952, pp. 107–118. Springer, Cham (2015). doi:10.1007/978-3-319-18263-6_10
8. Drozdowski, M.: Scheduling for Parallel Processing. Springer-Verlag, London (2009)
9. Gerards, M.E.T., Hurink, J.L., Hölzenspies, P.K.F.: A survey of offline algorithms for energy minimization under deadline constraints. J. Sched. 19, 3–19 (2016)
10. Greiner, G., Nonner, T., Souza, A.: The bell is ringing in speed-scaled multiprocessor scheduling. In: 21st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2009, pp. 11–18. ACM (2009)
11. Johannes, B.: Scheduling parallel jobs to minimize the makespan. J. Sched. 9, 433–452 (2006)
12. Kononov, A., Kovalenko, Y.: On speed scaling scheduling of parallel jobs with preemption. In: Kochetov, Y., Khachay, M., Beresnev, V., Nurminski, E., Pardalos, P. (eds.) DOOR 2016. LNCS, vol. 9869, pp. 309–321. Springer, Cham (2016). doi:10.1007/978-3-319-44914-2_25
An Approximation Algorithm for Preemptive Speed Scaling Scheduling 357
13. Shioura, A., Shakhlevich, N., Strusevich, V.: Energy saving computational models with speed scaling via submodular optimization. In: Proceedings of the Third International Conference on Green Computing, Technology and Innovation (ICGCTI 2015), pp. 7–18 (2015)
14. Yao, F., Demers, A., Shenker, S.: A scheduling model for reduced CPU energy. In: 36th Annual Symposium on Foundations of Computer Science, FOCS 1995, pp. 374–382 (1995)
Learning and Intelligent Optimization
for Material Design Innovation
1 Introduction
Materials design is crucial for the long-lasting success of any technological sector, and every technology is founded upon a particular set of materials. This is why the pressure to develop new high-performance materials for use as high-tech structural and functional components has become greater than ever. Although the demand for new materials is constantly growing, experimental materials design involves high costs and time-consuming synthesis procedures. Consequently, simulation technologies have become essential for material design innovation [1]. Naturally, the research community strongly supports the advancement of simulation technologies, as they represent a massive platform for the further development of scientific methods and techniques. Computational material design innovation is thus a new paradigm in which the usual route of materials selection is enhanced by concurrent materials design simulations and computational applications [19].
Designing new materials is a multi-dimensional problem in which multiple design criteria need to be satisfied. Consequently, material design innovation requires advanced multiobjective optimization (MOO) [13] and decision-support tools [12]. In addition, the performance and behavior of new materials must be predicted in
Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 358–363, 2017.
https://doi.org/10.1007/978-3-319-69404-7_31
Learning and Intelligent Optimization for Material Design Innovation 359
different design scenarios and conditions [2]. In fact, predictive analytics and MOO algorithms are the essential computational tools for tailoring the atomic-scale structures, chemical compositions, and microstructures of materials for desired mechanical properties such as high strength, high toughness, high thermal and ionic conductivity, and high irradiation and corrosion resistance [7]. By manipulating atomic-scale dislocation, phase transformation, diffusion, and soft vibrational modes, the material behavior in plasticity, fracture, thermal transport, and mass transport at the macroscopic level can be predicted and optimized accurately [17]. Therefore, the framework of a predictive simulation-based optimization of advanced materials, which is yet to be realized, represents a central challenge within material simulation technology [9]. Consequently, material design innovation faces the ever-growing need for a computational toolbox that allows the development of tailor-made molecules and materials through the optimization of materials behavior [10]. The goal of such a toolbox is to provide insight into the properties of materials associated with their design, synthesis, processing, characterization, and utilization [19].
requires machine learning and optimization combined [4]. Furthermore, a great deal of understanding of big data and prediction technologies for the microstructure behavior of existing materials, as well as the ability to test the behavior of new materials at the atomic, microscopic, and mesoscopic scales, is needed to confidently modify materials properties [7]. Numerical analysis further allows efficient experiments with entirely new materials and molecules [20]. Basic machine learning technologies, such as artificial neural networks [21], genetic algorithms [9], Bayesian probabilities and machine learning [8], data mining of spectral decompositions [7], refinement and optimization by cluster expansion [20], structure map analysis and neural networks [1], and support vector machines [19], have recently been used for this purpose.
To be perfected, computational materials design innovation needs to improve dramatically and put crucial components in place. To be precise, data mining, efficient codes, big data technologies, advanced machine learning techniques, intelligent and interactive MOO, open and distributed networks of repositories, fast and effective descriptors, and strategies to transfer knowledge to practical implementations are the research gaps to be addressed [6]. In fact, the current solvers rely on a single algorithm only and address limited scales of the design problems [17]. In addition, there is a lack of reliable visualization tools to better involve engineers in the design loop [11]. The absence of robust design, the lack of post-processing tools for multicriteria decision-making, and the lack of big data tools for the effective handling of huge materials databases are further research gaps reported in the literature [8]. To conclude, the process of computational material design innovation requires a set of up-to-date solvers covering a wide range of problems. A further problem with the current open-source software toolboxes, reported in [6], is that they require a concrete specification of the mathematical model, and the modeling solution is neither flexible nor adaptive. This is a reason why the traditional computational tools for materials design have not been realistic or effective. Consequently, the vision of this work is to propose an interactive toolbox in which the solver determines the optimal choices via visualization tools, as demonstrated in [5]. Ultimately, the purpose is to construct a knowledge-based virtual test laboratory to simultaneously optimize hybrid materials microstructure systems, e.g. textile composites. Whether building atomistic, continuum mechanics, or multiscale models, the toolbox can provide a platform to arrange the appropriate solver according to the problem at hand. Such a platform contributes to the advancement of innovative materials databases, leading to innovative materials design with optimal functionality.
3 LION as a Solver
The complex body of information in computational materials design requires the most recent advancements in machine learning and MOO to scale to the complex and multiobjective nature of optimal materials design problems [10]. From this perspective, materials design can be seen as a high-potential research area and a continuous source of challenging problems for LION. In the LION way [3], every individual design task, according to the problem at hand, can be modeled on the basis of the solvers within the toolbox. To obtain a design model, the methodology does not ask to specify a model, but experiments with the current system. The appropriate
model is created in the toolbox and is further used to identify a better solution in a learning cycle. The methodology is based on transferring data to knowledge to optimal decisions through the LION way, i.e. a workflow that is referred to as prescriptive analytics [4]. In addition, an efficient big data application [18] can be integrated to build models and extract knowledge. Consequently, a large database containing the properties of existing and hypothetical materials is interrogated in the search for materials with the desired properties. The knowledge is exploited to automate the discovery of improving solutions, i.e. connecting insight to decisions and actions [17]. As a result, massively parallelized multiscale materials modeling tools that expand atomistic-simulation-based predictive capability are established, leading to the rational design of a variety of innovative materials and applications.
The solvers integrated within LION include several algorithms for data mining, machine learning, and predictive analytics, which are tuned by cross-validation. These solvers provide the ability to learn from data and are empowered by reactive search optimization (RSO) [4], i.e. the intelligent optimization tool that is integrated into the solver. The LION way fosters research and development in intelligent optimization and Reactive Search. Reactive Search stands for the integration of sub-symbolic machine learning techniques into local search heuristics for solving complex optimization problems via an internal online feedback loop for the self-tuning of critical parameters [3, 12]. In fact, RSO is an effective building block for solving complex discrete and continuous optimization problems, as it can escape local minima traps. Further, cooperating RSO coordinates a collection of interacting solvers, which is adapted in an online manner to the characteristics of the problem. LIONsolver [4], LIONoso (a non-profit version of LIONsolver), and Grapheur [5] are the software implementations of the LION way, which can be customized for different usage contexts in materials design. These implementations have been used for solving a number of real-life problems, including materials selection [18], engineering design [14, 15], computational mechanics [13], and robotics [16].
To evaluate the effectiveness of the LION way, the case study of textile composites design with MOO presented by Milani (2011) is reconsidered using Grapheur. This case study describes a novel application of the LION way dealing with the decision conflicts often seen among design criteria in composite materials design [18]. In this case study it is necessary to explore optimal design options by simultaneously analyzing materials properties across a multitude of disciplines, design objectives, and scales. The complexity increases when considering the fact that the design objective functions are not mathematically available, and the designer must be in the optimization loop to evaluate the mesomanufacturing scales of the draping behavior of textile composites. The case study has a relatively large-scale decision space of electrical, mechanical, weight, cost, and environmental attributes.
To solve the problem, an interactive MOO model is created with Grapheur. With the aid of the 7D visualization graph, the designer in the loop formulates and systematically compares different alternatives against the large sets of design criteria to
362 A. Mosavi and T. Rabczuk
Fig. 1. 7D visualization graph for MOO and post-processing: the interactive MOO toolset of Grapheur for exploring trade-offs and simultaneously screening the mesomanufacturing scales; the multi-disciplinary property values of candidate materials are supplied from [12].
5 Conclusions
References
1. Artrith, N., Urban, A.: An implementation of artificial neural-network potentials for atomistic materials simulations. Comput. Mater. Sci. 114, 135–150 (2016)
2. Bayer, F.A.: Robust economic model predictive control using stochastic information. Automatica 74, 151–161 (2016)
3. Battiti, R., Brunato, M.: The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy (2015)
4. Brunato, M., Battiti, R.: Learning and intelligent optimization: one ring to rule them all. Proc. VLDB Endow. 6, 1176–1177 (2013)
5. Brunato, M., Battiti, R.: Grapheur: a software architecture for reactive and interactive optimization. In: Blum, C., Battiti, R. (eds.) LION 2010. LNCS, vol. 6073, pp. 232–246. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13800-3_26
6. Ceder, G.: Opportunities and challenges for first-principles materials design and applications to Li battery materials. Mater. Res. Soc. Bull. 35, 693–701 (2010)
7. Fischer, C.: Predicting crystal structure by merging data mining with quantum mechanics. Nat. Mater. 5, 641–646 (2006)
8. Jain, A.: A high-throughput infrastructure for density functional theory calculations. Comput. Mater. Sci. 50, 2295–2310 (2011)
9. Jóhannesson, G.H.: Combined electronic structure and evolutionary search approach to materials design. Phys. Rev. Lett. 88, 255–268 (2002)
10. Lencer, D.: A map for phase-change materials. Nat. Mater. 7, 972–977 (2008)
11. Mosavi, A.: Decision-making software architecture; the visualization and data mining assisted approach. Int. J. Inf. Comput. Sci. 3, 12–26 (2014)
12. Milani, A.: Multiple criteria decision making with life cycle assessment for material selection of composites. Express Polym. Lett. 5, 1062–1074 (2011)
13. Mosavi, A., Vaezipour, A.: Reactive search optimization; application to multiobjective optimization problems. Appl. Math. 3, 1572–1582 (2012)
14. Mosavi, A.: A multicriteria decision making environment for engineering design and production decision-making. Int. J. Comput. Appl. 69, 26–38 (2013)
15. Mosavi, A.: Decision-making in complicated geometrical problems. Int. J. Comput. Appl. 87, 22–25 (2014)
16. Mosavi, A., Varkonyi, A.: Learning in robotics. Int. J. Comput. Appl. 157, 8–11 (2017)
17. Mosavi, A., Rabczuk, T., Varkonyi-Koczy, A.R.: Reviewing the novel machine learning tools for materials design. In: Luca, D., Sirghi, L., Costin, C. (eds.) INTER-ACADEMIA 2017: Recent Advances in Technology Research and Education. Advances in Intelligent Systems and Computing, vol. 660, pp. 50–58. Springer, Cham (2018). doi:10.1007/978-3-319-67459-9_7
18. Mosavi, A., et al.: Multiple criteria decision making integrated with mechanical modeling of draping for material selection of textile composites. In: Proceedings of the 15th European Conference on Composite Materials, Venice, Italy (2012)
19. Saito, T.: Computational Materials Design, vol. 34. Springer Science & Business Media, Heidelberg (2013)
20. Stucke, D.P., Crespi, V.H.: Predictions of new crystalline states for assemblies of nanoparticles. Nano Lett. 3, 1183–1186 (2003)
21. Sumpter, B.G., Noid, D.W.: On the design, analysis, and characterization of materials using computational neural networks. Annu. Rev. Mater. Sci. 26, 223–277 (1996)
Statistical Estimation in Global Random Search
Algorithms in Case of Large Dimensions
1 Introduction
where c0 and α are some positive constants. The value of c0 is not important, but the value of α is essential. The coefficient α is called the tail index, and its value is usually known, as discussed below.
Let η be a random variable with c.d.f. F(t), and let y1,n ≤ ... ≤ yn,n be the order statistics for the sample Y. By construction, f* is the lower endpoint of the random variable η.
One of the most important results in the theory of extreme order statistics states (see e.g. [3, Sect. 2.3]) that if (1) holds, then the c.d.f. F(t) belongs to the domain of attraction of the Weibull distribution with density ψα(t) = α·t^(α−1)·exp{−t^α}, t > 0. This distribution has only one parameter, the tail index α.
In PRS we can usually have enough knowledge about f(·) to get the exact value of the tail index α. In particular, the following statement holds: if the global minimizer x* of f(·) is unique and f(·) is locally quadratic around x*, then the representation (1) holds with α = d/2. However, if the global minimizer x* of f(·) is unique and f(·) is not locally quadratic around x*, then the representation (1) may hold with α = d. See [4] for a comprehensive description of the related theory.
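The role of the tail index can be illustrated by a short simulation; a sketch under the simplifying assumption F(t) = t^α on [0, 1] (so that f* = 0 and y1,n − f* is of order n^(−1/α)):

```python
import random

# For F(t) = t**alpha near f* = 0 the record y_{1,n} behaves like
# n**(-1/alpha): with alpha = d/2, a large dimension d makes the
# record approach f* extremely slowly (the curse of dimensionality).

def record_after_n(n, alpha, rng):
    # Inverse-c.d.f. sampling: if U ~ Uniform(0, 1), then U**(1/alpha) ~ F.
    return min(rng.random() ** (1.0 / alpha) for _ in range(n))

rng = random.Random(0)
small_alpha = record_after_n(10_000, 1.0, rng)    # fast convergence
large_alpha = record_after_n(10_000, 10.0, rng)   # slow convergence
```

With the same budget of 10,000 evaluations, the record for α = 10 remains orders of magnitude farther from the minimum than for α = 1.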
The result that α has the same order as d when d is large implies the phenomenon called the curse of dimensionality. Let us first illustrate this curse of dimensionality with a simple numerical example.
366 A. Pepelyshev et al.
3 Numerical Examples
We investigate the minimization problem with the objective function f(x) = e1^T x, where e1 = (1, 0, ..., 0)^T, and the set X is the unit ball: X = {x ∈ R^d : ||x|| ≤ 1}. The minimal value is f* = −1, and the global minimizer x* = (−1, 0, ..., 0)^T is located at the boundary of X. Consider the PRS algorithm with points xj generated from the uniform distribution PU on X.
Let us give some numerical values. In a simulation with n = 10^3 and d = 20, we obtained y1,n = −0.6435, y2,n = −0.6107, y3,n = −0.6048 and y4,n = −0.6021. In a simulation with n = 10^5 and d = 20, we obtained y1,n = −0.7437, y2,n = −0.7389, y3,n = −0.7323 and y4,n = −0.726. In Fig. 1 we depict the differences yk,n − f* for k = 1, 4, 10 and n = 10^3, ..., 10^13, where the horizontal axis has a logarithmic scale. We can see that the difference yk,n − y1,n is much smaller than the difference y1,n − f*; this demonstrates that the problem of estimating the minimal value of f is very hard.
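The experiment can be reproduced with a short Monte-Carlo sketch (a uniform point in the unit ball is obtained by normalizing a Gaussian vector and applying a radial correction; function names are illustrative):

```python
import math
import random

# Pure random search for f(x) = e_1^T x = x_1 on the unit ball in R^d;
# the true minimum is f* = -1, attained at x* = (-1, 0, ..., 0).

def uniform_in_ball(d, rng):
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = rng.random() ** (1.0 / d)          # radius correction for the ball
    return [r * v / norm for v in g]

def prs_records(d, n, k, seed=0):
    """Return the k smallest observed values y_{1,n} <= ... <= y_{k,n}."""
    rng = random.Random(seed)
    values = sorted(uniform_in_ball(d, rng)[0] for _ in range(n))
    return values[:k]

records = prs_records(d=20, n=10_000, k=4)
# Even the best record stays far from f* = -1 when d = 20.
```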
Fig. 1. Differences y1,n − f* (solid), y4,n − f* (dashed) and y10,n − f* (dotted), where yk,n, k = 1, 4, 10, are records of evaluations of the function f(x) = e1^T x at points x1, ..., xn with uniform distribution in the unit hyperball in dimension d = 20 (left) and d = 50 (right).
Fig. 2. The difference y1,n − f* (left) and y10,n − y1,n (right) for n = 10^6 (solid) and n = 10^10 (dashed), where yj,n is the j-th record of evaluations of the function f(x) = e1^T x at points x1, ..., xn with uniform distribution in the unit hyperball in dimension d; d varies in [5, 250].
Estimation in Random Search in Large Dimensions 367
f̂n,k = (1/Ck,α) Σ_{i=1}^k (ui/Γ(i + 2/α)) · yi,n,   (2)

where

Ck,α = Σ_{i=1}^k 1/i for α = 2, and
Ck,α = (1 − 2/α)^(−1) · (Γ(k + 1)/Γ(k + 2/α) − Γ(2/α)/Γ(1 + 2/α)) for α ≠ 2.

If the representation (1) holds, then for given k and α and as n → ∞, the estimator f̂n,k is a consistent and asymptotically unbiased estimator of f*, and its asymptotic mean squared error E(f̂n,k − f*)^2 has the maximum possible rate of convergence in the class of all consistent estimators, including the maximum likelihood estimator of f*, as shown in [4, Chap. 7]. This mean squared error has the following asymptotic form:

Ck,α ≈ 1/k + 2(ψ(k) − 1 + 1/k)/(αk),   (4)

for large α, where ψ(x) = Γ′(x)/Γ(x) is the psi-function. The quality of this approximation is illustrated in Figs. 3 and 4.
In the practice of global optimization, the standard estimator of f* is the current record y1,n = min_{i=1,...,n} f(xi). Its asymptotic mean squared error is
Fig. 3. The exact expression of Ck,α (solid) and the approximation (4) (dashed) for k = 2 (left) and k = 10 (right); α varies in [5, 50].
Fig. 4. The exact expression of Ck,α (solid) and the approximation (4) (dashed) for α = 4 (left) and α = 7 (right); k varies in [2, 25].
rules. But do we lose much by choosing the points at random? We claim that if the dimension d is large, then the use of quasi-random points instead of purely random ones does not bring any advantage. Let us illustrate this using some simulation experiments.
Using simulation studies we now investigate the performance of the PRS algorithm with P = PU and with quasi-random points generated from the Sobol low-dispersion sequence. We examine the minimization problem with the objective function f(x) = Σ_{s=1}^d (xs − |cos(s)|)^2 and the set X = [0, 1]^d in dimension d = 15. In this problem, the global minimum f* = 0 is attained at the internal point x* = (|cos(1)|, ..., |cos(d)|). For each run of the PRS algorithm, we generate n points and compute the records y1,n and y2,n, for n = 10^3, 10^4, 10^5, 10^6.
Fig. 6. Boxplot of records y1,n for 500 runs of the PRS algorithm with points generated
from the Sobol low-dispersion sequence (left) and the uniform distribution (right),
d = 15.
We repeat this procedure 500 times and show the obtained records as boxplots
in Fig. 6.
We can see that the performance of the PRS algorithm with points generated from the Sobol low-dispersion sequence and from the uniform distribution is very similar. We also note that the variability of y1,n is larger than the variability of y4,n, and the difference y10,n − y4,n has a small variability.
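The comparison can be reproduced with a short script; the sketch below uses a hand-rolled Halton sequence as a simple stand-in for the Sobol generator (the qualitative picture, similar records for pseudo- and quasi-random points in moderate dimension, is the same):

```python
import math
import random

# PRS records for f(x) = sum_s (x_s - |cos(s)|)**2 over [0, 1]^d with
# pseudo-random points versus quasi-random (Halton) points.

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def halton_point(i, d):
    point = []
    for base in PRIMES[:d]:
        f, r, n = 1.0, 0.0, i
        while n > 0:
            f /= base
            r += f * (n % base)
            n //= base
        point.append(r)
    return point

def f(x):
    # s runs from 1 to d, matching the target point (|cos(1)|, ..., |cos(d)|)
    return sum((xs - abs(math.cos(s))) ** 2 for s, xs in enumerate(x, start=1))

def record(points):
    return min(f(p) for p in points)

d, n = 15, 2000
rng = random.Random(1)
rec_random = record([[rng.random() for _ in range(d)] for _ in range(n)])
rec_halton = record([halton_point(i + 1, d) for i in range(n)])
# In dimension 15 the two records are of the same order of magnitude.
```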
Acknowledgements. The work of the first author was partially supported by the SPbSU project No. 6.38.435.2015 and the RFBR project No. 17-01-00161. The work of the third author was supported by the Russian Science Foundation, project No. 15-11-30022 "Global optimization, supercomputing computations, and applications".
References
1. Zhigljavsky, A.: Mathematical Theory of Global Random Search. Leningrad University Press (1985). (In Russian)
2. Zhigljavsky, A.: Branch and probability bound methods for global optimization. Informatica 1(1), 125–140 (1990)
3. Zhigljavsky, A., Žilinskas, A.: Stochastic Global Optimization. Springer, New York (2008)
4. Zhigljavsky, A.: Theory of Global Random Search. Kluwer Academic Publishers, Boston (1991)
5. Zhigljavsky, A., Hamilton, E.: Stopping rules in k-adaptive global random search algorithms. J. Global Optim. 48(1), 87–97 (2010)
6. Žilinskas, A., Zhigljavsky, A.: Branch and probability bound methods in multi-objective optimization. Optim. Lett. 10(2), 341–353 (2016). doi:10.1007/s11590-014-0777-z
A Model of FPGA Massively Parallel
Calculations for Hard Problem of Scheduling
in Transportation Systems
1 Introduction
The reported study was funded by RFBR according to the research project 18-07-01078a.
where t0 ≥ max{t, ti}, and the minimal overall penalty is W* = W_0^min(0, ∅). Thus, the optimization task can be represented by a recurrent hierarchy of subtasks. The solution to each task is based on the solutions to all its subtasks.
The expression (t, S) is called the system state, and each expression W_k^min(t, S) needs to be calculated only once and is saved for later usage.
The algorithm time costs are estimated by counting the overall number of unique states (t, S). The value of t can be limited by some maximum moment T (some discretization can be chosen for the required accuracy), and the overall number of unique states will be 2^n · T. The requirements for RAM usage will be exponential, too.
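The state space (t, S) can be made concrete with a toy bitmask dynamic program; the sketch below solves a simplified single-processor variant with linear completion-time penalties (the penalty model is an illustrative assumption, not the exact cost structure of the original problem):

```python
from functools import lru_cache

# Toy DP over states (t, S), where S is a bitmask of already-serviced
# jobs: pick the next job, pay weight_j * completion_time_j, and memoize
# each state once, mirroring "each W_k^min(t, S) is calculated only once".

def min_total_penalty(durations, weights):
    n = len(durations)

    @lru_cache(maxsize=None)
    def best(t, S):
        if S == (1 << n) - 1:          # all jobs serviced
            return 0.0
        return min(
            weights[j] * (t + durations[j])
            + best(t + durations[j], S | (1 << j))
            for j in range(n) if not S & (1 << j)
        )

    return best(0, 0)
```

On a 3-job instance with durations (1, 2, 3) and weights (3, 2, 1) the exhaustive DP returns 15.0 (the jobs ordered by decreasing weight/duration ratio); the memoization table is what grows exponentially with n.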
The heterogeneous approach considered in this work uses the classical von Neumann system connected to a special accelerator. Generally, it can be one of several types, such as a GPU, an FPGA, or a specialized processor. In this work, we consider the possibility of using an FPGA as a coprocessor to accelerate the solving of the optimization task. We suggest a model of massively parallel calculations for a part of the original problem.
For example, let us consider the original problem with n = 22 and T = 32. The process of the DP algorithm consists of a continuous calculation of W_k^min(t, S) values for all possible system states (t, S) for the stage k from n down to 0. Each W_k^min is calculated on the basis of the previously calculated values. The structure of system states realizes the recurrent hierarchy. As shown in Fig. 1(a), the number of states for processing depends on its depth in the hierarchy (S_k^h marks the set of processed jobs with the serial number h for the stage k). The CPU calculation time h for each stage is shown in Fig. 1(b). Most of the time is spent on calculating the stages with the number k close to n/2. Hence, for a parallel realization we can calculate each stage separately and use all the resources.
Fig. 1. (a) DP states hierarchy by stages; (b) time in seconds needed for solving each stage.
A Model of FPGA Massively Parallel Calculations for Hard Problem 373
Fig. 2. (a) ALU DP hierarchy example for m = 4, exact schedules are shown; (b) one ALU.
1. The first part generates an ALU for each possible state (t, S), for t from 0 to T and S ∈ Z.
2. The second part generates the buses W_k^min(t, S) for each ALU and connects the result registers to all inputs of the other ALUs that require them.
374 M. Reznikov and Y. Fedosenko
This model for sizes up to m = 11 has been synthesized for the FPGA Virtex-7 565T (see [10]) with a 300 MHz clock, providing one solution to a subtask per clock cycle with pipelining.
The heterogeneous calculation process works in the following two cycles.
1. In the beginning of the solving process, the FPGA calculates all W_{n−m}^min values for the stage (n − m). Each value is provided as the solution to a subtask of size m.
2. The CPU gets these values from the FPGA, stores them in memory tables, and then starts the software-based dynamic programming calculation process from stage (n − m).
As a result of this approach, the CPU skips the first m stages of the common DP algorithm calculations. If the FPGA coprocessor is able to solve a problem with size equal to half of the original problem size n, the whole calculation time can be reduced almost by a factor of two. Increasing m leads to a further reduction of the calculation time. The problem solution time for a heterogeneous system containing an FPGA-based solver for schedule size m is presented in Fig. 3. The time required for the problem solution using a modern 4-core CPU is shown as a reference. In particular, the time for m = 11 was taken from the performance estimation of the heterogeneous system prototype. The time for m > 11 is the result of mathematical modelling using a simulator. The original problem size is n = 22 and T = 32.
Fig. 3. Processing time h of the original problem in seconds for different FPGA-based solver
schedule size m compared with a 4-core CPU implementation.
The considered approach allows a linear scaling of the DP algorithm for solving the original problem. However, the maximum size m of the problem which can be solved is significantly limited by the available FPGA resources. Taking into account the fact that the latest generation of FPGAs contains more resources than the devices used in this work, we can conclude that coprocessors can provide a significant reduction of the solving time. By using several FPGA devices it is possible to provide a linear scalability of the solving algorithm.
On the other hand, this model demonstrates how FPGAs or, more generally, application-specific integrated circuits (ASICs) can be used for fast solving of NP-hard discrete optimization problems.
4 Conclusions
As the main result of this work, a special model of a heterogeneous architecture with FPGA massively parallel calculations was suggested. The model is based on a decomposition of the original problem using the dynamic programming method and provides the following benefits:
– a 2–4 times solving time reduction with the single-FPGA architecture in comparison to the CPU-based realization,
– linear scalability if more FPGA devices are used.
References
1. Kogan, D.I., Fedosenko, Yu.S.: Optimal servicing strategy design problems for stationary objects in a one-dimensional working zone of a processor. Autom. Remote Control 71(10), 2058–2069 (2010)
2. Kogan, D.I., Fedosenko, Yu.S.: The discretization problem: analysis of computational complexity, and polynomially solvable subclasses. Discrete Math. Appl. 6(5), 435–447 (1996)
3. Sammarra, M., Cordeau, J.-F., Laporte, G., Monaco, M.F.: A tabu search heuristic for the quay crane scheduling problem. J. Sched. 10(4), 327–336 (2007)
4. Bellman, R.E., Dreyfus, S.E.: Applied Dynamic Programming. Princeton University Press, Princeton (1962)
5. Cuenca, J., Giménez, D., Martínez, J.: Heuristics for work distribution of a homogeneous parallel dynamic programming scheme on heterogeneous systems. Parallel Comput. 31(7), 711–735 (2005)
6. Gergel, V.P.: High-Performance Computing for Multi-Processor Multi-Core Systems. MGU Press, Moscow (2010). (In Russian)
7. Brodtkorb, A.R., Dyken, C., Hagen, T.R., Hjelmervik, J.M., Storaasli, O.O.: State-of-the-art in heterogeneous computing. Scientific Programming 18(1), 1–33 (2010)
8. Madhavan, A., Sherwood, T., Strukov, D.: Race logic: a hardware acceleration for dynamic programming algorithms. In: ISCA 2014, Minnesota, USA, pp. 517–528 (2014)
9. Tao, F., Zhang, L., Laili, Y.: Job shop scheduling with FPGA-based F4SA. In: Tao, F., Zhang, L., Laili, Y. (eds.) Configurable Intelligent Optimization Algorithm. SSAM, pp. 333–347. Springer, Cham (2015). doi:10.1007/978-3-319-08840-2_11
10. 7 Series FPGAs Data Sheet: Overview, March 2017. https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf
Accelerating Gradient Descent with Projective
Response Surface Methodology
Alexander Senov
1 Introduction
two drawbacks: they are memory- and time-consuming, and their performance depends on the adequacy of the chosen surrogate.
Additionally, the so-called multi-step optimization methods also use history (see, e.g., [8]). Many of them use only a fixed amount of history; e.g., the two-step Heavy-ball method uses only the two past steps. Others use a parametrized amount of history, like the multi-step quasi-Newton method [3], but in a slightly different way (e.g., not using the projection trick, or without explicit quadratic approximation).
In this paper we propose a new method which is essentially an incorporation
of the quadratic response surface methodology into the gradient descent algorithm.
To reduce the memory footprint of the quadratic polynomial we use a projection
trick. The general idea of the proposed algorithm is to use a sequence of points
obtained from the gradient descent iterations as follows:
1. K1 consecutive points are used to train an incremental principal component
analysis algorithm which produces an orthogonal projection matrix P.
2. K2 consecutive points are used to collect a training set in the low-dimensional
space obtained with P.
3. A quadratic polynomial is fitted to the collected training set, and the argument
of the polynomial minimum, mapped back to the original space, is used as the
next point estimate.
These steps are executed iteratively, producing one additional point every K1 +
K2 gradient descent iterations. We consider the gradient descent algorithm as an
example, but it may easily be replaced with some other zero-order or first-order
iterative optimization method.
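The three steps above can be sketched in code as follows. This is a simplified, hypothetical implementation (the function name and parameter defaults are ours): plain batch PCA via SVD stands in for the incremental PCA, only the q = 1 quadratic fit is shown, and the surrogate jump is accepted only when it improves f:

```python
import numpy as np

def projective_rsm_gd(f, grad, x0, step=0.1, T=20, K1=10, K2=5, q=1):
    """Gradient descent with a projective quadratic surrogate (sketch).

    Every K1 + K2 iterations: the first K1 iterates define the top-q
    principal directions P, the next K2 iterates are projected by P and
    a quadratic polynomial is fitted to them; its minimizer is mapped
    back to R^d and used as an extra point estimate.
    """
    x = np.asarray(x0, dtype=float)
    pca_buf, fit_buf = [], []
    for _ in range(T):
        x = x - step * grad(x)                 # ordinary gradient step
        if len(pca_buf) < K1:
            pca_buf.append(x.copy())
            continue
        fit_buf.append(x.copy())
        if len(fit_buf) < K2:
            continue
        X = np.array(pca_buf)
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        P = Vt[:q]                             # orthogonal projection, q x d
        Z = np.array(fit_buf) @ P.T            # low-dimensional training set
        y = np.array([f(v) for v in fit_buf])
        if q == 1:                             # 1-D quadratic fit only
            a, b_c, _ = np.polyfit(Z[:, 0], y, 2)
            if a > 1e-12:                      # use the fit only if convex
                z_star = np.array([-b_c / (2.0 * a)])
                # backward projection: x = P^T z + (I - P^T P) x_bar
                cand = P.T @ z_star + (np.eye(x.size) - P.T @ P) @ mean
                if f(cand) < f(x):
                    x = cand                   # accept the surrogate jump
        pca_buf, fit_buf = [], []
    return x
```

Note that with q = 1 the quadratic fit needs at least three points, which matches the K2-related effect discussed in Sect. 4.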
The paper is organized as follows. In Sect. 2 we describe the proposed
algorithm, which improves gradient descent by using the projective quadratic
response surface methodology. In Sect. 3 we provide the theoretical motivation
behind it. Further, in Sect. 4 we report a case study on modelled data and discuss
its results. Finally, Sect. 5 concludes the paper.
2 Algorithm Description
A pseudo-code of the proposed algorithm (Algorithm 1) is given in Fig. 1. It has
the following parameters: f : R^d → R, the function to be optimized; ∇x f : R^d → R^d,
the function gradient or its approximation (in the case of a zero-order method);
the step size; d ∈ N+, the original space dimensionality; q ∈ N+, the projective
space dimensionality; T ∈ N+, the number of iterations; IncrPCA, an incremental
PCA algorithm; K1 ∈ N+, the number of points for incremental PCA fitting;
K2 ∈ N+, the number of points for surrogate construction.
We use x̄ to denote the sample mean vector. One might be confused by the
backward projection step transforming z in the low-dimensional space to x in the
high-dimensional space (line 18 in Algorithm 1). This transformation is motivated by
Proposition 1 (Sect. 3). As for the choice of principal component analysis as a
tool for the orthogonal projection construction, it is rather motivated by practice
378 A. Senov
and intuition: the first q principal components have the largest possible variance.
The more information (in terms of variance) we keep from the original points,
the more accurate our approximation will be.
As one can see, the proposed modification utilizes O(K1 qd) operations and
O(qd) memory at step (1), O(K2 qd) operations and O(K2 q) memory at step (2),
and O(K2 q^2 + q^3) operations and O(K2 q + q^2) memory at step (3). Hence, this
modification adds at most O(qd + q^3) in the number of operations and
O(q^2 + qd + K2 q) in the memory consumption per single gradient descent iter-
ation.
3 Theoretical Background
In this section we provide the theoretical motivation behind the proposed algorithm.
Proof sketches are given in Appendix A.
Proposition 1. Let {xt}, t = 1, ..., K, be a set of points in R^d with the sample
mean x̄, let P ∈ R^{q×d} be an orthogonal projection matrix (P P^T = I), and let
z ∈ R^q. Then

argmin_{x ∈ R^d : P x = z} Σ_{t=1}^K ||xt − x||_2^2 = P^T z + (I − P^T P) x̄.

Proposition 2. Let f(x) = x^T A x + b^T x + c with A ≻ 0 and let
v = (I − P^T P) x̄. Then

argmin_z f(P^T z + v) = −(1/2) P A^{−1} b.

Proposition 3. Consider a function f(x) = x^T A x + b^T x + c, where A ∈ R^{d×d},
b ∈ R^d, c ∈ R and A ≻ 0, an orthogonal projection matrix P ∈ R^{q×d}, q < d, a
sequence of gradient descent estimates {xt}_{t=1}^K ⊂ R^d, their projections
{zt}_{t=1}^K ⊂ R^q, zt = P xt, and the corresponding function values {yt}_{t=1}^K,
yt = f(xt). Let x̃ = −(1/2) P^T P A^{−1} b + (I − P^T P) x̄. Then

||argmin_x f(x) − x̃||_2^2 = ||(I − P^T P) ((1/2) A^{−1} b + x̄)||_2^2.
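This identity for the quadratic case can also be checked numerically. The sketch below uses a randomly generated positive definite A, vector b, orthogonal projection P, and a stand-in mean vector x̄ (all values illustrative), and compares both sides for x̃ = (I − P^T P) x̄ − (1/2) P^T P A^{−1} b:

```python
import numpy as np

# Numerical sanity check of the Proposition 3 identity:
# ||argmin f - x_tilde||^2 = ||(I - P^T P)(A^{-1} b / 2 + x_bar)||^2
# for f(x) = x^T A x + b^T x + c with A positive definite.
rng = np.random.default_rng(0)
d, q = 6, 2

M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)             # random positive definite matrix
b = rng.normal(size=d)
x_star = -0.5 * np.linalg.solve(A, b)   # unconstrained minimizer of f

Q, _ = np.linalg.qr(rng.normal(size=(d, q)))
P = Q.T                                 # orthogonal projection: P @ P.T = I_q
x_bar = rng.normal(size=d)              # stand-in for the sample mean

I = np.eye(d)
x_tilde = (I - P.T @ P) @ x_bar - 0.5 * P.T @ P @ np.linalg.solve(A, b)

lhs = np.linalg.norm(x_star - x_tilde) ** 2
rhs = np.linalg.norm((I - P.T @ P) @ (0.5 * np.linalg.solve(A, b) + x_bar)) ** 2
assert np.isclose(lhs, rhs)
```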
4 A Case Study
First, we describe the modelling strategy. For modelling purposes we use the
following default values: f(x) = x^T I_d x − 1_d^T x, where d is a variable parameter;
d = 10, T = 50, x0 ∼ U[0, 1]^d, the step size is 10^{−5}; q = 1, K1 = 10, K2 = 5. We
vary the parameters T, d, K1 and K2 independently (with the other parameters fixed)
in the following ranges: T ∈ {25, 30, 50, 100}; d ∈ {5, 10, 20, 50, 100};
K1 ∈ {2, 5, 10, 20}; K2 ∈ {2, 5, 10, 20}. For each combination of parameter values
we execute the gradient descent algorithm and the proposed algorithm with the
same initial estimate x0 10^3 times and calculate their errors as the Euclidean
distance from the real optimum point to the algorithm estimate. Then we calculate
the ratio of times when the proposed algorithm error was less than the gradient
descent error.
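The comparison statistic just described can be computed as follows (a minimal sketch; the function name is ours, and the inputs are the per-run error lists of the two methods):

```python
import numpy as np

def win_ratio(errors_proposed, errors_gd):
    """Fraction of paired runs in which the proposed algorithm's error
    (Euclidean distance to the true optimum) is strictly smaller than
    the plain gradient descent error, as in the experiment above."""
    a = np.asarray(errors_proposed, dtype=float)
    b = np.asarray(errors_gd, dtype=float)
    return float(np.mean(a < b))
```

For example, win_ratio([0.1, 0.3, 0.2], [0.2, 0.2, 0.5]) evaluates to 2/3: the proposed method wins in two of the three paired runs.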
Table 1 contains the results of the experiment described above. The reported
ratio monotonically decreases as T increases, since the proposed algorithm works
well at the start but fails to build a surrogate when the gradient descent oscillates
near the optimum point. Further, there is no evident dependency on q since, for
this particular function, the gradient descent estimates lie on a straight line and
thus are perfectly described by a single dimension. The situation with the parameter
K1 is similar to that of the parameter q and may be explained by the same reason.
Finally, the huge difference in quality between K2 = 2 and K2 = 3 is explained by
the fact that one needs at least three points in a one-dimensional space to construct
a second-order polynomial.
Table 1. Results of the performed numerical experiments: the ratio of times when
the proposed algorithm error was less than the gradient descent error.
5 Conclusion
We propose a modification to gradient descent based on the quadratic
response surface methodology with a projection trick. We show that the proposed
modification can provide the best optimum approximation with respect
to the considered projection. The synthetic case study shows that the modified
gradient descent can be superior to the original one in terms of the
optimum point estimation error. This modification may be used with other zero-
or first-order iterative optimization methods, thus improving their performance.
A Proofs
Proof (of Proposition 1).

x* = argmin_{x ∈ R^d : P x = z} Σ_{t=1}^K ||xt − x||_2^2
= argmin_{x ∈ R^d : P x = z} Σ_{t=1}^K ||P^T P xt + (I − P^T P) xt − P^T z − (I − P^T P) x||_2^2.

Under the constraint P x = z only the component (I − P^T P) x is free, and setting
the corresponding gradient to zero gives

2 (I − P^T P) Σ_{t=1}^K xt − 2K (I − P^T P) x = 0,

hence (I − P^T P) x* = (I − P^T P) x̄ and x* = P^T z + (I − P^T P) x̄.

Proof (of Proposition 2). Setting the gradient of z → f(P^T z + v) to zero yields

z* = −(P A P^T)^{−1} P (A (I − P^T P) x̄ + (1/2) b)
= −(1/2) P A^{−1} b + P (I − P^T P) x̄ = −(1/2) P A^{−1} b,

where the last equality uses P (I − P^T P) = 0.

Proof (of Proposition 3). From Propositions 1 and 2, x̃ = (I − P^T P) x̄ −
(1/2) P^T P A^{−1} b. Hence

||argmin_x f(x) − x̃||_2^2 = ||−(1/2) A^{−1} b − x̃||_2^2
= ||−(1/2) A^{−1} b + (1/2) P^T P A^{−1} b − (I − P^T P) x̄||_2^2
= ||(I − P^T P) ((1/2) A^{−1} b + x̄)||_2^2.
References
1. Box, G.E., Draper, N.R., et al.: Empirical Model-Building and Response Surfaces. John Wiley & Sons, New York (1987)
2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)
3. Ford, J., Moghrabi, I.: Multi-step quasi-Newton methods for optimization. J. Comput. Appl. Math. 50(1–3), 305–323 (1994)
4. Forrester, A., Keane, A.: Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 45(1), 50–79 (2009)
5. Granichin, O., Volkovich, V., Toledano-Kitai, D.: Randomized Algorithms in Automatic Control and Data Mining. Springer, Heidelberg (2015)
6. Granichin, O.N.: Stochastic approximation search algorithms with randomization at the input. Autom. Remote Control 76(5), 762–775 (2015)
7. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, New York (2004)
8. Polyak, B.T.: Introduction to Optimization. Translations Series in Mathematics and Engineering. Optimization Software (1987)
Emmental-Type GKLS-Based Multiextremal
Smooth Test Problems
with Non-linear Constraints
1 Introduction
Global optimization problems with and without constraints attract great attention
of researchers from both theoretical and practical viewpoints (see, e.g.,
[3,17,20] for derivative-free global optimization, [5,9,13,18] for real-life
engineering problems, [2,17] for parallel global optimization, etc.).
Let us consider the following constrained problem:

f* = f(x*) = min_{x ∈ D} f(x), D = {(x1, ..., xN) : ai ≤ xi ≤ bi, i = 1, ..., N} ⊂ R^N, (1)

with p constraints

gj(x) ≤ 0, j = 1, ..., p, (2)

where the functions f(x) and gj(x), j = 1, ..., p, satisfy the Lipschitz condition
over the hyperinterval D:

|f(x′) − f(x″)| ≤ L ||x′ − x″||,
|gj(x′) − gj(x″)| ≤ Lj ||x′ − x″||, j = 1, ..., p, x′, x″ ∈ D, (3)

where L and Lj, 0 < L < ∞ and 0 < Lj < ∞, j = 1, ..., p, are the Lipschitz
constants for the functions f(x) and gj(x) over the hyperinterval D, respectively
(hereafter || · || denotes the Euclidean norm).
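Since only Lipschitz continuity is assumed, a crude empirical lower bound on such a constant can be obtained by sampling pairs of points in D; the sketch below is illustrative (the function, box, and parameter names are ours, not part of the test-problem generator):

```python
import numpy as np

def lipschitz_lower_bound(f, lo, hi, n_pairs=2000, seed=0):
    """Crude empirical lower bound on the Lipschitz constant of f over
    the box [lo, hi]: the largest observed slope
    |f(x') - f(x'')| / ||x' - x''|| over random pairs of points.
    A sampled maximum can only under-estimate the true constant L."""
    rng = np.random.default_rng(seed)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    best = 0.0
    for _ in range(n_pairs):
        x1 = lo + (hi - lo) * rng.random(lo.shape)
        x2 = lo + (hi - lo) * rng.random(lo.shape)
        dist = np.linalg.norm(x1 - x2)
        if dist > 1e-12:
            best = max(best, abs(f(x1) - f(x2)) / dist)
    return best
```

For instance, for f(x) = 3 x1 on the unit square the true constant equals 3, and the estimate approaches it from below as the number of sampled pairs grows.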
© Springer International Publishing AG 2017
R. Battiti et al. (Eds.): LION 2017, LNCS 10556, pp. 383–388, 2017.
https://doi.org/10.1007/978-3-319-69404-7_35
384 Y.D. Sergeyev et al.
There exists a huge number of methods for solving (1)–(3) (see, e.g.,
[11,16,21,22]). These methods often have a completely different nature and their
numerical comparison can be very difficult (see, e.g., [14,15] for a numerical
comparison of metaheuristic and deterministic unconstrained global optimization
algorithms). In [19], a new tool called operational zones for an efficient numerical
comparison of constrained and unconstrained global optimization algorithms
of different nature has been proposed. To use it, classes of test problems are
required. On the one hand, there exist many generators of test problems for
global and local optimization (see, e.g., [23] for landscape generators, [12] for
a multidimensional assignment problem generator, [4,7] for a wide analysis
of different test classes and generators, and [1,10,16] for different classes and
generators of unconstrained test problems). On the other hand, collections of test
problems are usually used in the framework of continuous constrained global
optimization (see, e.g., [6]) due to the absence of test classes and generators for
this type of problems. This paper introduces a new class of test problems with
non-linear constraints, known minimizers, and parameterizable difficulty, where
both the objective function and the constraints are continuously differentiable.
Let us consider the unconstrained GKLS class of test problems with continuously
differentiable objective functions proposed in [8]. Test functions in this class are
generated by defining a convex quadratic function systematically distorted by
cubic polynomials in order to introduce local minima. The objective function
f(x) of the GKLS class is constructed by modifying a paraboloid Z,

Z(x) = ||x − T||^2 + t, (4)

with the minimum value t at the point T ∈ int(D), where int(D) denotes the
interior of D, in such a way that the resulting function f(x) has m, m ≥ 2,
local minimizers: the point T from (4) and points Mi ∈ int(D), Mi ≠ T, Mi ≠
Mj, i, j = 2, ..., m, i ≠ j. The paraboloid Z is modified by cubic polynomials
Ci(x) within balls Si (not necessarily entirely contained in D) around each
point Mi, i = 2, ..., m (with M1 = T being the vertex of the paraboloid and M2
being the global minimizer of the problem).
Each class contains 100 test problems and is defined by the following parameters:
the problem dimension, the number of local minima, the value of the global minimum,
the radius of the attraction region of the global minimizer, and the distance from the
global minimizer to the vertex of the paraboloid. An example of a test problem
with 30 local minima is presented in Fig. 1a.
Emmental-Type GKLS-Based Multiextremal Smooth Test Problems 385
Fig. 1. Original GKLS test function (a) and Emmental-type GKLS-based test function
(b) do not coincide even without constraints.
The first-type constraints are constructed as follows. First, the local minimizer
(not the global one) M_imin nearest to the vertex of the paraboloid is taken
as follows:

imin = arg min_{i=3,...,m} {||Mi − M1|| − ρi}, (7)

where ρi is the radius of the ball Si (the index i starts from i = 3 since in the
original GKLS classes M2 is the global minimizer). Then, the polynomial C_imin
around M_imin is modified in such a way that its minimum value is set to −2|f*|,
where f* is the optimal value of the original unconstrained GKLS problem. The
global minimizer of the Emmental-type unconstrained test problem differs from
the global minimizer of the original GKLS test problem due to this modification
(see Fig. 1b). The ball with the radius r1 = ||M1 − M_imin|| − ρ_imin and the center
at the vertex of the paraboloid, i.e., G1 = M1, is taken as the first constraint.
Second, in order to guarantee that the global minimizer of the constrained
test problem does not coincide with the global minimizer of the unconstrained
modified problem, the ball with the radius r2 = ρ_imin and the center G2 = M_imin
is taken as the second constraint.
Then, in order to guarantee that the global minimizer M2 of the constrained
test problem is placed on the boundary, the ball with the radius r3 = (1/2)ρ2 and
the center at the point G3 = (r3 / (2||M1 − M2||)) M1 + (1 − r3 / (2||M1 − M2||)) M2
is taken as the third constraint (see Fig. 2a).
If p1 > 3, where p1 is the number of the first-type constraints, p1 ≤ p, then
the fourth constraint is constructed symmetrically to the third one with respect
to the global minimizer M2, i.e., r4 = r3 and G4 = M2 + (M2 − G3) (see Fig. 2b).
The last p1 − 4 first-type constraints are taken as p1 − 4 different random
balls Sj, j ∈ {3, ..., m}, j ≠ imin, i.e., ri = ρ_j(i) and Gi = M_j(i), i = 5, ..., p1.
The p2 = p − p1 constraints of the second type, where 0 ≤ p2 ≤ 2^N, are built
as follows. Random vertices c^j = (c^j_1, ..., c^j_N), c^j_i ∈ {ai, bi}, j = 1, ..., p2, of D are
taken. Then, for each taken vertex c^j, the nearest local or global minimizer M_i(j)
is found. The (p1 + j)-th constraint is built as a ball with the center G_{p1+j} = c^j
and the radius r_{p1+j} = ||c^j − M_i(j)||, j = 1, ..., p2 (see Fig. 2c).
The presented Emmental-GKLS class of test problems consists of 100 smooth
objective functions with the same characteristics as the original GKLS-based
test functions, with the global minimum located at a random point with a random
radius of its region of attraction. Moreover, it is built using p constraints,
including p1 constraints of the first type, i.e., the constraints related to the local
minima, with 3 ≤ p1 ≤ m, and p2 = p − p1 constraints of the second type, i.e.,
the constraints related to the vertices of D, with 0 ≤ p2 ≤ 2^N. The admissible
regions for (p1, p2) = (3, 0), (4, 0), (4, 2), and (20, 2) are presented in Fig. 2.
We can conclude that in the obtained test class:
– The global minimizer x* of the constrained problem is known and is placed on
the boundary of the admissible region at a point different from the global
minimizer of the unconstrained Emmental-type GKLS problem.
– The global minimizer of the unconstrained Emmental-type GKLS problem
is known. It coincides with the local minimizer nearest to the vertex of the
paraboloid of the original unconstrained GKLS test problem, and differs from
the global minimizer of the constrained test problem.
– The difficulty of the constrained problem can be easily changed. The simplest
domain has p1 = 3 and p2 = 0 constraints and the hardest one has p1 =
m + 1 and p2 = 2^N constraints (notice that the search domain can be simply
connected, biconnected or multiconnected). It should be noticed that the
global minimizer cannot be an isolated point and is always accessible from a
feasible region having a positive volume.
– The constraints are non-linear and satisfy the Lipschitz condition over the
hyperinterval D.
Fig. 2. The admissible region and the level curves of the objective function of the
Emmental-type GKLS-based test problem with (a) p1 = 3, p2 = 0; (b) p1 = 4, p2 = 0;
(c) p1 = 4, p2 = 2; (d) p1 = 20, p2 = 2. The admissible region in (d) contains 3 disjoint
subregions. The vertex of the paraboloid is also marked, and the global minimizer of
the constrained problem is indicated by +.
References
1. Addis, B., Locatelli, M.: A new class of test functions for global optimization. J. Global Optim. 38, 479–501 (2007)
2. Barkalov, K., Gergel, V., Lebedev, I.: Solving global optimization problems on GPU cluster. In: Simos, T.E. (ed.) ICNAAM 2015: 13th International Conference of Numerical Analysis and Applied Mathematics, vol. 1738, p. 400006. AIP Conference Proceedings (2016)
3. Barkalov, K.A., Strongin, R.G.: A global optimization technique with an adaptive order of checking for constraints. Comput. Math. Math. Phys. 42(9), 1289–1300 (2002)
4. Beasley, J.E.: Obtaining test problems via internet. J. Global Optim. 8(4), 429–433 (1996)
5. Famularo, D., Pugliese, P., Sergeyev, Y.D.: A global optimization technique for checking parametric robustness. Automatica 35, 1605–1611 (1999)