Computational Intelligence in Expensive Optimization Problems (2010)
ISBN 978-3-642-10700-9
e-ISBN 978-3-642-10701-6
DOI 10.1007/978-3-642-10701-6
Adaptation, Learning, and Optimization
ISSN 1867-4534
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper
987654321
springer.com
Preface
Optimization is an essential part of research in both science and engineering. In many cases the research goal is the outcome of an optimization problem, for example, improving a vehicle's aerodynamics or a metal alloy's tensile strength.
Motivated by industrial demands, the process of design in science and engineering has undergone a major transformation. Advances in intelligent computing paradigms and the introduction of massive computing power have facilitated a move away from paper-based analytical systems towards digital models and computer simulations. Computer-aided design optimization is now involved in a wide range of design applications, ranging from large transatlantic airplanes to micro-electro-mechanical systems.
With the development of more powerful optimization techniques, the research community is continually seeking new optimization challenges and aiming to solve increasingly complicated problems. An emerging class of such challenging problems is that of expensive optimization problems, in which high computational cost can arise from:

- Resource-intensive evaluations of the objective function: such problems arise when using computer experiments, i.e., when a computer simulation replaces a real-world laboratory experiment during the optimization process. Such simulations can be prohibitively expensive, requiring anywhere from minutes to hours of evaluation time for each candidate solution. Moreover, there is typically no analytic expression for the objective function or its derivatives, requiring derivative-free optimization algorithms. Examples include wing-shape optimization and electronic circuit design.
- Very high-dimensional problems: in problems with hundreds or thousands of variables, the curse of dimensionality implies that locating an optimum can be intractable due to the sheer size of the search space. Examples include scheduling problems and image analysis.
On top of these difficulties, real-world optimization problems may exhibit additional challenges such as a complicated and non-smooth landscape, multiple optima and discontinuities. Under such conditions classical optimization methods may perform poorly or may even fail to obtain a satisfactory solution within the allocated resources (such as computer time). To
circumvent this, researchers turn to computational intelligence methods such as agent-based algorithms, fuzzy logic and artificial neural networks. Such methods have been shown to perform well in challenging scenarios, and they can often handle a wide variety of problems when little or no a priori knowledge is available. These nature- and biologically-inspired techniques are capable of learning the problem features during the optimization, which can improve their performance and provide a better final solution.
However, the application of computational intelligence methods to expensive optimization problems is not straightforward. Their robustness, also referred to as the exploration-exploitation trade-off, implies that they do not exploit domain knowledge efficiently, and this can impair their convergence. For example, an evolutionary algorithm may require many thousands of function evaluations to obtain a satisfactory solution, which is unacceptable when each function evaluation requires hours of computer run-time. This necessitates exploring various methods to bridge the gaps before computational intelligence can be applied effectively to expensive problems.
Computational intelligence in expensive optimization problems is a recent and emerging field which has received increasing attention in the last decade. This edited book represents the first endeavor to provide a snapshot of the current state-of-the-art in the field, covering both theory and practice. This edition consists of chapters contributed by leading researchers in the field, demonstrating the different methodologies and practices for handling the high computational cost of today's applications. The book is intended for a wide readership and can be read by engineers, researchers, senior undergraduates and graduates who are interested in the development of computational intelligence techniques for expensive optimization problems.
This book is divided into 3 parts:
I Techniques for resource-intensive problems
II Techniques for high-dimensional problems
III Real-world applications
Part I considers various methods to reduce the evaluation time, such as using models (also known as surrogate models or meta-models, which are computationally cheaper approximations of the true expensive function) and parallelization. This part starts with two surveys of the current state-of-the-art. Shi and Rasheed survey a wide range of model-assisted algorithms, including frameworks for model management in single-objective optimization, while Santana-Quintero et al. survey fitness approximation in multi-objective algorithms. Giannakoglou and Kampolis propose a flexible parallel multilevel evolutionary algorithm (EA) framework in which each level can employ a different model, search algorithm or parametrization. They demonstrate the performance of their approach on real-world expensive
aerodynamic shape optimization problems. Koziel and Bandler describe another approach which uses models of different fidelity, the space-mapping method, to accelerate the optimization search. They apply their method to electronic circuit design. In a related study, Takahama and Sakai propose methods for model management which assess the model accuracy and decide when a model needs to be improved. They implement their method in a differential evolution framework. Ginsbourger et al. parallelize the Efficient Global Optimization (EGO) algorithm, which uses Kriging models and the expected improvement criterion; they propose statistical criteria for selecting multiple sites to evaluate in each iteration. Guimaraes et al. propose a memetic algorithm for expensive design optimization problems. Their algorithm identifies promising regions, and candidates from these regions are assessed with a higher-fidelity model and given more weight by the algorithm. Ochoa also employs statistical criteria and proposes using Estimation of Distribution Algorithms (EDAs) to reduce the number of function evaluations. The study describes several approaches such as Boltzmann estimation and the shrinkage EDAs. Also within the evolutionary computing framework, Fonseca et al. explore the use of similarity-based models (a nearest-neighbour approach) to extend the number of generations of an evolutionary algorithm in expensive optimization problems. Nakayama et al. and Bird and Li address the issues of expensive dynamic optimization problems. Nakayama et al. describe a model-predictive control algorithm for dynamic and expensive multi-objective optimization problems in which they use a support-vector regression model. Bird and Li, on the other hand, suggest a specialized particle swarm optimization (PSO) algorithm with least-squares regressors. The regressors locally approximate the objective function landscape and accelerate the convergence of the PSO to local optima.
In Part II, researchers explore sophisticated operators, such as those utilizing domain knowledge or self-adapting during the search, to combat the curse of dimensionality. Caponio et al. implement a memetic algorithm which combines differential evolution (DE) with an adaptive local search which scales the DE vector, along with other algorithmic enhancements. Carvalho and Ferreira tackle the electric network distribution problem, which is a large-scale combinatorial problem; they propose several hybrid Lamarckian evolutionary algorithms with specialized operators. dos Santos et al. tackle the traveling salesman problem (TSP) and propose a reinforcement-learning metaheuristic for a specialized parallel hybrid EA. They show that performance can be improved by using multiple search trajectories. Süral et al. also focus on the TSP and the TSP with backhauls, and propose several evolutionary algorithms with specialized crossover and mutation operators. They show that utilizing domain knowledge improves the algorithms' performance. Cococcioni et al. study multi-objective genetic Takagi-Sugeno fuzzy systems in high-dimensional problems, which pose a challenge to such multi-objective EAs; they propose two enhancements to the multi-objective EA to accelerate the search. Davis-Moradkhan and Browne propose a specialized
Overall, the chapters in this volume discuss a wide range of topics which reflect the broad spectrum of computational intelligence in expensive optimization problems. The chapters highlight both the current achievements and challenges, and point to promising future research avenues in this exciting field.
September 2009
Yoel Tenne
Chi-Keong Goh
Acknowledgement To Reviewers
Dudy Lim
Passi Luukka
Pramod Kumar Meher
Hirotaka Nakayama
Ferrante Neri
Thai Dung Nguyen
Alberto Ochoa
Yew-Soon Ong
Khaled Rasheed
Tapabrata Ray
Abdellah Salhi
Vui Ann Shim
Ofer M. Shir
Dimitri Solomatine
Sanjay Srivastava
Janusz Starzyk
Stephan Stilkerich
Haldun Süral
Mohamed B. Trabia
Massimiliano Vasile
Lingfeng Wang
Chee How Wong
Contents

[Most of the table of contents was lost in extraction; the surviving entries are reproduced below, without page numbers.]

3.6 Assessment of Multilevel-Hierarchical Optimization
3.7 Optimization of an Annular Cascade
3.8 Conclusions
References

15 (Haldun Süral, Nur Evin Özdemirel, İlter Önder, Meltem Sönmez Turan)
15.1 The Traveling Salesman Problem
15.1.1 Conventional TSP Heuristics
15.1.2 Metaheuristics for the TSP
15.1.3 Evolutionary TSP Algorithms
15.1.4 The TSP with Backhauls
15.1.5 Outline
15.2 The First Evolutionary Algorithm for the TSP
15.2.1 Generating Offspring from the Union Graph
15.2.2 Nearest Neighbor Crossover (NNX)
15.2.3 Greedy Crossover (GX)
15.2.4 Combined Use of the Crossover Operators
15.2.5 Proposed Mutation Operators
15.2.6 Other Settings of the First Evolutionary Algorithm
15.2.7 Computational Results for the TSP
15.3 The Second Evolutionary Algorithm for the TSP and the TSPB
15.3.1 More Than Two Parents and Multiple Offspring
15.3.2 Improved Mutation Operators
15.3.3 Computational Results for the TSP
15.3.4 Computational Results for the TSPB
15.4 Conclusion
References

Index
Part I
Chapter 1

L. Shi
Applied Research, McAfee
e-mail: liang_shi@mcafee.com

K. Rasheed
Computer Science Department, University of Georgia
e-mail: Khaled@uga.edu

Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 3-28.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com

1.1 Introduction

In recent years, evolutionary algorithms (EAs) have been applied to many real-world application domains and have gained much research interest. EAs have proved to be powerful tools for optimization problems and are therefore used in a wide range of real-world applications, especially in engineering design domains. In such domains, the so-called fitness functions are sometimes discontinuous, non-differentiable, noisy and ambiguous, with many local optima. EAs have been found to perform better than conventional optimizers such as sequential quadratic programming and simulated annealing [2, 3, 4, 73].
Many challenges still arise in the application of EAs to real-world domains.
For engineering design problems, a large number of objective evaluations may be
required in order to obtain near-optimal solutions. Moreover, the search space can be
complex, with many constraints and a small feasible region. However, determining
the fitness of each point may involve the use of a simulator or analysis code that takes
an extremely long time to execute. Therefore it would be difficult to be cavalier about
the number of objective evaluations used for an optimization [5, 6]. For tasks like
art design and music composition, no explicit fitness function exists; experienced
human users are needed to do the evaluation. A human's ability to deal with a large number of evaluations is limited, as humans easily tire. Another challenge is that the environment of an EA can be noisy, which means that the exact fitness cannot be determined and an approximate fitness must be assigned to each individual. Averaging the fitness over repeated evaluations to cope with the noise requires even more evaluations. For such problems, surrogate-assisted evolution methods based on fitness approximation are preferable, as they can approximate the exact fitness at a much lower computational cost. A good fitness approximation method can still lead the EA to optimal or near-optimal solutions and is also tolerant of noise [7, 71].
In this chapter we extend the discussion of fitness approximation by introducing more concepts in this area and by presenting recent developments. We focus on three main aspects of fitness approximation: the types of approximation methods, their working styles, and their management schemes.
Among fitness approximation methods, instance-based learning methods, machine learning methods and statistical learning methods are the most popular. Instance-based and machine learning methods include fitness inheritance, radial basis function models, the K-nearest-neighbor method, clustering techniques, and neural network methods. Statistical learning methods, also known as functional models, such as polynomial models, Kriging models, and support vector machines, are all widely used for fitness approximation in EAs. Comparative studies among these methods are presented in this chapter.
For the working styles of the fitness approximation, we discuss both direct and
indirect fitness replacement strategies. The direct fitness replacement method is to
use the approximate fitness to directly replace the original exact fitness during the
course of the EA process. Thus individuals mostly have the approximate fitness
during the optimization. The indirect fitness replacement method is to use the approximate fitness only for some but not all processes in the EA, such as population
initialization and EA operators. Individuals have the exact fitness during most if not
all of the optimization process.
With fitness approximation in EAs, the quality of the approximate model is always a concern, owing to the lack of training data and the often high dimensionality of the problem. Obtaining a perfect approximate model is not possible in such cases. Usually the original fitness function is used alongside the approximate model
to solve this problem. The original fitness function can either correct some or all individuals' fitness in some generations, or improve the approximate model by supplying the exact fitness. This is called the management of the fitness approximation, or evolution control. In this chapter, different management methods of approximate fitness are presented, including online fitness update, offline model training, online model update, hierarchical models, and model migration. At the end of this chapter, two real-world expensive optimization applications of surrogate-assisted EAs are presented.
Fitness inheritance techniques are one of the main subclasses of fitness approximation techniques. One such technique simply assigns the fitness of a new solution
(child) based on the average fitness of its parents or a weighted average based on
how similar the child is to each parent [12]. To deal with a noisy fitness function,
a resampling method combined with a simple average fitness inheritance method is
used to reduce the computational cost in [15]. Another approach is to divide the population into building blocks according to certain schemata. Under this approach, an
individual obtains its fitness from the average fitness of all the members in its building block [13]. More sophisticated methods such as conditional probability tables
and decision trees are used in [14] for fitness inheritance.
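As an illustrative sketch of the weighted-average variant (the inverse-distance weighting is one plausible similarity measure, not necessarily the exact scheme of [12]; the function name is ours):

```python
import numpy as np

def inherited_fitness(child, parent_a, parent_b, fit_a, fit_b, eps=1e-12):
    """Assign the child a fitness inherited from its parents, weighting each
    parent's fitness by the child's similarity (inverse distance) to it.
    An equidistant child simply receives the parents' average fitness."""
    w_a = 1.0 / (np.linalg.norm(child - parent_a) + eps)
    w_b = 1.0 / (np.linalg.norm(child - parent_b) + eps)
    return (w_a * fit_a + w_b * fit_b) / (w_a + w_b)
```

The appeal of such schemes is that the child's "evaluation" costs only a distance computation instead of an expensive simulation run.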
Fig. 1.1 Fitness approximation methods. FI: Fitness Inheritance; KNN: K-Nearest Neighbors; RBF: Radial Basis Functions; NN: Neural Networks; DT: Decision Tree; PM: Polynomial Model; SVM: Support Vector Machines
1.2.1.2 Radial Basis Function Models

The Radial Basis Function (RBF) model is another instance-based learning method. RBF networks can also be viewed as a type of neural network. Since the RBF model is a very popular technique for fitness approximation in EAs [3, 19, 31], it merits an introduction separate from standard multilayer neural networks.
An RBF network consists of an input layer with the same number of input units as the problem dimension, a single hidden layer of k nonlinear processing units, and an output layer of linear weights w_i (Fig. 1.2). The size of the hidden layer (k) can be equal to the sample size if the sample size is small. For larger samples, k is usually smaller than the sample size to avoid excessive computation; such a network is called a generalized RBF network. The output y(x) of the RBF network is given as a linear combination of a set of radial basis functions:

y(x) = w_0 + \sum_{i=1}^{k} w_i \varphi_i(\|x - c_i\|) \quad (1.1)

where w_0 and w_i are the unknown coefficients to be learned. The term \varphi_i(\|x - c_i\|), also called the kernel, represents the i-th radial basis function; it evaluates the distance between the input x and the center c_i. For the generalized RBF network, the centers c_i are also unknown and have to be learned by other methods such as the k-means method.

Typical choices for the kernel include linear splines, cubic splines, multiquadrics, thin-plate splines, and Gaussian kernels. The Gaussian kernel is the most commonly used in practice, having the form:

\varphi_i(\|x - c_i\|) = \exp\left( -\frac{\|x - c_i\|^2}{2\sigma^2} \right) \quad (1.2)

A detailed and comprehensive description of RBF networks can be found in [32].
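As a minimal sketch of Eqs. (1.1)-(1.2) (not from the chapter; the function names and the choice of using every sample point as a center are our assumptions), fitting the weights reduces to a linear solve:

```python
import numpy as np

def fit_rbf(X, y, sigma=1.0):
    """Interpolating RBF network per Eqs. (1.1)-(1.2): every sample point
    doubles as a center c_i (so k equals the sample size), with a Gaussian
    kernel and a constant bias term w_0."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))
    A = np.hstack([np.ones((n, 1)), Phi])      # column for w_0, then kernels
    w, *_ = np.linalg.lstsq(A, y, rcond=None)  # [w_0, w_1, ..., w_k]
    return w, X.copy()

def rbf_predict(w, centers, x, sigma=1.0):
    """Evaluate Eq. (1.1) at a single point x."""
    d2 = ((x - centers) ** 2).sum(axis=-1)
    return w[0] + np.exp(-d2 / (2.0 * sigma ** 2)) @ w[1:]
```

Because the Gaussian kernel matrix is positive definite, this fit reproduces the training data exactly when k equals the sample size.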
Clustering Techniques

A common approach is to divide the population into several clusters and then build an approximate model for each cluster. The motivation is that multiple approximate models are believed to utilize more local information about the search space and to fit the original fitness function better than a single global model [5, 20, 21].
1.2.2.2 Multilayer Perceptron Neural Networks

Multilayer Perceptron Neural Networks (MLPNNs) usually utilize the back-propagation algorithm. MLPNNs have proven to be powerful tools for fitness approximation. An MLPNN model is generally used to accelerate convergence by replacing the original fitness function [34, 36]. In engineering design domains and drug design, MLPNNs have been used to reduce the evaluation times of complex fitness functions [35, 37, 38]. In [68], MLPNNs are used as surrogates to speed up an expensive blade-design optimization problem.
A simple feed-forward MLPNN with one input layer, one hidden layer and one output layer can be expressed as:

y(x) = \sum_{j=1}^{K} w_j f\left( \sum_{i=1}^{n} w_{ij} x_i + \theta_j \right) + \theta_0 \quad (1.3)

where n is the number of input neurons (usually equal to the problem dimension), K is the number of nodes in the hidden layer, and the function f is called the activation function. The structure of a feed-forward MLPNN is shown in Fig. 1.3. W and \theta are the unknown weights and biases to be learned. The most commonly used activation function is the logistic function, which has the form:

f(x) = \frac{1}{1 + \exp(-cx)} \quad (1.4)
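Equation (1.3) with the logistic activation (1.4) amounts to the following forward pass (a sketch; the function name and weight shapes are our assumptions):

```python
import numpy as np

def mlp_forward(x, W_in, theta, w_out, theta0, c=1.0):
    """Eq. (1.3): y(x) = sum_j w_j * f(sum_i w_ij x_i + theta_j) + theta_0,
    with the logistic activation f of Eq. (1.4)."""
    f = lambda z: 1.0 / (1.0 + np.exp(-c * z))  # Eq. (1.4)
    hidden = f(W_in @ x + theta)                # K hidden-unit outputs
    return float(w_out @ hidden + theta0)
```

Training would then fit W_in, theta, w_out and theta0 by back-propagation on the sampled exact-fitness data; only the forward evaluation is shown here.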
1.2.2.3 Other Machine Learning Techniques

Other machine learning techniques have also been applied for fitness approximation in EAs. An individual's fitness can be estimated from its neighbors using the K-nearest-neighbor algorithm [22]. The screening technique has been used for pre-selection [2, 5]. The Decision Tree (DT) is another machine learning technique, used in [14].
Polynomial Models

Polynomial models (PM) are sometimes called response surfaces. A commonly used quadratic polynomial model has the form:

\hat{F}(x) = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1, j=1}^{n, n} a_{ij} x_i x_j \quad (1.5)
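Fitting the quadratic response surface (1.5) reduces to ordinary least squares on an augmented feature set; a sketch (helper names are ours, and the i <= j indexing folds the symmetric cross terms together):

```python
import numpy as np

def quad_features(X):
    """Design matrix for Eq. (1.5): a constant term, the linear terms x_i,
    and all quadratic products x_i * x_j with i <= j."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def fit_response_surface(X, y):
    """Least-squares estimate of the coefficients a_0, a_i, a_ij."""
    a, *_ = np.linalg.lstsq(quad_features(X), y, rcond=None)
    return a

def rs_predict(a, x):
    """Evaluate the fitted quadratic response surface at point(s) x."""
    return quad_features(np.atleast_2d(x)) @ a
```

With enough well-spread samples, a function that really is quadratic is recovered exactly, which is why the model is cheap to build and to analyze algebraically.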
Kriging Models
The Kriging model consists of two component models, which can be mathematically expressed as:

y(x) = f(x) + Z(x) \quad (1.6)

where f(x) represents a global model and Z(x) is the realization of a stationary Gaussian random function that creates a localized deviation from the global model. Typically f(x) is a polynomial, and in many cases it is as simple as an underlying constant \mu, in which case equation (1.6) becomes:

y(x) = \mu + Z(x) \quad (1.7)

The resulting (ordinary Kriging) predictor at a point x is:

\hat{y}(x) = \hat{\mu} + r^T(x) R^{-1} (y - \hat{\mu} \mathbf{1}) \quad (1.8)

where R is the n \times n matrix of correlations R(x^i, x^j) between the sampled data points and y is the vector of their observed responses. The correlation vector between x and the sampled data points is expressed as:

r^T(x) = \left[ R(x, x^1), R(x, x^2), \ldots, R(x, x^n) \right]^T \quad (1.10)
Estimation of the parameters is often carried out using the generalized least squares
method or the maximum likelihood method. Detailed implementations can be found
in [24, 25].
In addition to the approximate values, the Kriging method can also provide accuracy information about the fitting in the form of confidence intervals for the estimated
values without additional computational cost. In [6, 28], a Kriging model is used to
build the global models because it is believed to be a good solution for fitting complex surfaces. A Kriging model is used to pre-select the most promising solutions in
[29]. In [26, 27, 30], a Kriging model is used to accelerate the optimization or reduce
the expensive computational cost of the original fitness function. In [67], a Kriging
model with a pattern search technique is used to approximate the original expensive function. In [70], a Gaussian process method is used for landscape search in a
multi-objective optimization problem, with promising performance. One disadvantage of the Kriging method is that it is sensitive to the problem's dimension: the computational cost becomes unacceptable when the dimension of the problem is high.
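A bare-bones ordinary-Kriging sketch built on Eq. (1.7) and the correlation vector of Eq. (1.10), assuming a Gaussian correlation function R(x, x') = exp(-θ‖x - x'‖²) with θ fixed rather than estimated by maximum likelihood (names and simplifications are ours):

```python
import numpy as np

def kriging_fit(X, y, theta=1.0):
    """Ordinary Kriging per Eq. (1.7): a constant global mean mu plus a
    Gaussian-correlated residual. theta is held fixed here instead of being
    fitted by maximum likelihood, which keeps the sketch to two solves."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    R = np.exp(-theta * d2) + 1e-10 * np.eye(n)   # tiny jitter for stability
    ones = np.ones(n)
    mu = ones @ np.linalg.solve(R, y) / (ones @ np.linalg.solve(R, ones))
    alpha = np.linalg.solve(R, y - mu * ones)     # reused by every prediction
    return X.copy(), alpha, mu

def kriging_predict(centers, alpha, mu, x, theta=1.0):
    """Predict via mu + r^T(x) R^{-1} (y - mu 1), with r(x) as in Eq. (1.10)."""
    r = np.exp(-theta * ((x - centers) ** 2).sum(axis=-1))
    return mu + r @ alpha
```

The O(n^3) solve over the sample correlation matrix is the cost that makes Kriging struggle as the dimension, and hence the required sample size, grows.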
1.2.3.3 Support Vector Machines

The SVM model is primarily a classifier that performs classification tasks by constructing hyper-planes in a multidimensional space to separate cases with different class labels. Contemporary SVM models support both regression and classification tasks and can handle multiple continuous and categorical variables. A detailed description of SVM models can be found in [40, 41]. SVM models compare favorably
to many other approximation models because they are not sensitive to local optima,
their optimization process does not depend on the problem dimensions, and overfitting is seldom an issue. Applications of SVM for fitness approximation can be
found in [42]. The regression SVM is used for constructing approximate models.
There are two types of regression SVMs: epsilon-SVM (ε-SVM) regression and nu-SVM regression. The ε-SVM regression model is more commonly used for fitness approximation, where the linear ε-insensitive loss function is defined by:

L_\varepsilon(x, y, f) = |y - f(x)|_\varepsilon = \max(0, |y - f(x)| - \varepsilon) \quad (1.11)

The regression function takes the form:

f(x) = w^T \varphi(x) + b \quad (1.12)

and the model is trained by solving the standard ε-SVR problem:

\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \quad (1.13)

subject to

y_i - w^T \varphi(x_i) - b \le \varepsilon + \xi_i \quad (1.14)

w^T \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^* \quad (1.15)

\xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N \quad (1.16)
where \varphi(x) is called the kernel function. It may take the form of linear, polynomial, Gaussian, RBF or sigmoid functions. The RBF is by far the most popular
choice of kernel type used in SVMs. This is mainly because of their localized and
finite responses across the entire range of the real x-axis. This optimization problem
can be solved by using quadratic programming techniques.
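The ε-insensitive loss (1.11) itself is a one-liner; as a sketch (function name is ours):

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps=0.1):
    """Eq. (1.11): residuals inside the eps tube incur zero loss;
    outside it the loss grows linearly with the excess error."""
    return np.maximum(0.0, np.abs(np.asarray(y) - np.asarray(f_x)) - eps)
```

The flat region of this loss is what produces the sparse set of support vectors that makes ε-SVR regression attractive as a surrogate.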
formula is used to decide whether to continue using the same type of model or
switch to the next at any time. Fig. 1.9 shows the evolution path.
The neural network model and the polynomial model were compared in [46, 72]. The study in [46] concluded that the performance of the two types of approximation was comparable in terms of the number of function evaluations required to build the approximations and the number of undetermined parameters associated with them; however, the polynomial model had a much lower construction cost.
In [72], after evaluating both methods in several applications, the authors concluded
that both of them can perform comparably for modest data. In [43], a quadratic polynomial model was found to be the best method among the polynomial model, RBF
network, and the Quick-prop neural network when the models were built for regions
created by clustering techniques. The authors were in favor of the polynomial model
because they found that it formed approximations more than an order of magnitude
faster than the other methods and did not require any tuning of parameters. The
authors also pointed out that the polynomial approximation was in a mathematical form which could be algebraically analyzed and manipulated, as opposed to the
black-box results that neural networks give.
The Kriging model and the neural network model were compared using benchmark problems in [47]. However, no clear conclusion was drawn about which model
is better. Instead, the author showed that optimization with a meta-model could lead
to degraded performance. Another comparison was presented in [45] between the
polynomial model and the Kriging model. By testing these two models on a real-world engineering design problem, the author found that the polynomial and Kriging
approximations yielded comparable results with minimal difference in predictive capability. Comparisons between several approximate models were presented in [44],
which compared the performance of the polynomial model, the multivariate adaptive
splines model, the RBF model, and the Kriging model using 14 test problems with
different scales and nonlinearities. Their conclusion was that the polynomial model
is the best for low-order nonlinear problems, and the RBF model is the best for dealing with high-order nonlinear problems (details shown in Table 1.1). In [59], four
types of approximate models - Gaussian Process (Kriging), RBF, Polynomial model
and Extreme Learning Machine Neural Network (ELMNN) - were compared on
artificial unconstrained benchmark domains. Polynomial Models (PM) were found
to be the best for final solution quality and RBF was found to be the best when
considering correlation coefficients between the exact fitness and estimated fitness.
Table 1.2 shows the performance ranks of these four models in terms of the quality
of the final solution.
So far, different approximate models have been compared based on their performance, but the word "performance" itself has not been clearly defined. This is because
the definition of performance may depend on the problem to be addressed, and multiple criteria need to be considered. Model accuracy is probably the most important
criterion, since approximate models with a low accuracy may lead the optimization
process to local optima. Model accuracy also should be based on new sample points
instead of the training data set points. The reason for this is that for some models
such as the neural network, overfitting is a common problem. In the case of overfitting, the model works very well on training data, yielding good model accuracy,
Table 1.2 Final quality ranks for the Kriging, PM, RBF and ELMNN approximate models in [59]

Benchmark domain   Kriging   PM   RBF   ELMNN
Ackley                2       1    4      3
Griewank              3       1    2      4
Rosenbrock            1       3    2      4
Step                  3       1    2      4
but may perform poorly on new sample points. The optimization process could
easily go in the wrong direction if it is assisted by a model suffering from overfitting. There are other important criteria to be considered, including robustness, efficiency, and time spent on model construction and updating. A fair comparison would
consider the model accuracy as well as all of these criteria.
It is difficult to draw a clear conclusion on which model is best, for the reasons stated above. Still, the polynomial model seems to be the best choice for a local model when dealing with local regions or clusters and enough sample points are available [43]. In such cases, the fitting problem usually has low-order nonlinearity, and the polynomial model is the best candidate according to [44]. The polynomial model is also believed to perform best on problems with noise [44]. For high-order nonlinear problems, the RBF model is believed to be the best: it is the least sensitive to sample size and the most robust [44]. The RBF model is therefore a good choice for a global model, with or without many samples. In [72], NN is found to perform significantly better than PM when the search space is very complex and the parameters are correctly set.
The SVM model is a powerful fitting tool that belongs to the class of kernel
methods. Because of the beneficial features of SVMs stated above, the SVM model
becomes a good choice for constructing a global model, especially for problems
with high dimension and many local optima, provided that a large sample of points
exists.
can be generated by selecting the best individual from a number of uniformly distributed random individuals in the design space according to the approximate fitness
[5, 43, 49].
Approximate fitness can also be used for crossover or mutation in a similar manner, through a technique known as Informed Operators [5, 17, 43, 49]. Under this
approach, the approximate models are used to evaluate candidates only during the
crossover and/or mutation process. After the crossover and/or mutation process,
the exact fitness is still computed for the newly created candidate solutions. Using
the approximate fitness indirectly in the form of Informed Operators - rather than direct evaluation - is expected to keep the optimization moving toward the true global
optima and to reduce the risk of convergence to suboptimal solutions because each
individual in the population is still assigned its exact fitness [49]. Experimental results have shown that a surrogate-assisted, informed-operator-based multi-objective GA can outperform state-of-the-art multi-objective GAs on several benchmark problems [5]. Informed Operators also make it easy to use surrogates adaptively, as the
number of candidates can be adaptively determined. Some of the informed operators
used in [49] are explained as follows:
Informed initialization: Approximate fitness is used for population pre-selection. Instead of generating a random initial population, each individual of the initial population is generated by selecting the best of a number of uniformly distributed random individuals in the design space, according to the approximate fitness.
Informed mutation: To perform informed mutation, several random mutations
of the base point are generated. The mutation with the best approximate fitness
value is returned as the result.
Informed crossover: Two parents are selected at random according to the usual
selection strategy. These two parents are not changed in the course of the
informed crossover operation. Several crossovers are conducted by randomly
selecting a crossover method, randomly selecting its internal parameters and
applying it to the two parents to generate a potential child. The surrogate is used
to evaluate every potential child, and the best child is selected as the outcome.
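A minimal sketch of the informed mutation and crossover operators just described (the surrogate, the mutation operator, and the crossover methods are hypothetical stand-ins; a real implementation would plug in the problem's own operators):

```python
import random

def informed_mutation(base, surrogate, mutate, n_candidates=5):
    """Generate several random mutations of the base point and return
    the one with the best (lowest) approximate fitness."""
    candidates = [mutate(base) for _ in range(n_candidates)]
    return min(candidates, key=surrogate)

def informed_crossover(parent_a, parent_b, surrogate, crossover_methods,
                       n_candidates=5):
    """Apply several randomly chosen crossovers to the same two (unchanged)
    parents; the surrogate picks the best resulting child."""
    children = [random.choice(crossover_methods)(parent_a, parent_b)
                for _ in range(n_candidates)]
    return min(children, key=surrogate)

# toy usage: minimize sum of squares; here the "surrogate" is exact for brevity
surrogate = lambda x: sum(v * v for v in x)
mutate = lambda x: [v + random.gauss(0, 0.5) for v in x]
uniform_xover = lambda a, b: [random.choice(pair) for pair in zip(a, b)]

child = informed_crossover([1.0, 2.0], [-1.0, 0.5], surrogate, [uniform_xover])
best_mut = informed_mutation([1.0, 2.0], surrogate, mutate)
```

Only the returned child or mutant would then receive an exact fitness evaluation, which is what keeps the operator "informed" without replacing exact evaluation.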
individuals in some or all generations. There are two categories of Evolution Control methods: fixed evolution control and adaptive evolution control. Fixed evolution control can be individual-based or generation-based. In individual-based evolution control, only some selected individuals are evaluated with the exact fitness function; the individuals can be selected at random or by some strategy, e.g., selecting the best individual (according to the surrogate) for evolution control. In generation-based evolution control, all individuals in a selected generation are evaluated with the original fitness function; the generations can be selected at random or with a fixed frequency. Adaptive evolution control adjusts the frequency of control according to the fidelity of the surrogates.
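Generation-based fixed evolution control with a fixed frequency can be sketched as follows (the exact and surrogate fitness callables are hypothetical stand-ins):

```python
def evaluate_population(population, generation, exact_fitness, surrogate,
                        control_frequency=5):
    """Generation-based fixed evolution control: every `control_frequency`-th
    generation is evaluated with the exact (expensive) fitness function;
    all other generations use the cheap surrogate."""
    use_exact = (generation % control_frequency == 0)
    fit = exact_fitness if use_exact else surrogate
    return [fit(ind) for ind in population], use_exact

# toy usage
exact = lambda x: (x - 1.0) ** 2
cheap = lambda x: x * x          # deliberately biased surrogate
pop = [0.0, 0.5, 1.0, 2.0]
fits0, used_exact0 = evaluate_population(pop, 0, exact, cheap)  # controlled generation
fits3, used_exact3 = evaluate_population(pop, 3, exact, cheap)  # surrogate generation
```

An adaptive variant would replace the fixed `control_frequency` with a value updated from the observed surrogate error on the controlled generations.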
of the aircraft, and drymass, which provides a rough approximation of the cost of
building the aircraft. In summary, the problem has 12 parameters and 37 inequality
constraints and only 0.6% of the search space is evaluable.
Fig. 1.15 shows a performance comparison in this domain. Each curve in the figure shows the average of 15 runs of GADO starting from random initial populations.
The experiments were done once for each surrogate - Least Squares PM (LS), QuickProp NN (QP) and RBF - plus one run without the surrogate-assisted informed operators altogether, with all other parameters kept the same. Fig. 1.15 demonstrates
the performance with each of the three surrogate-assisted methods as well as performance with no approximation at all (the solid line). The figure plots the average
(over the 15 runs) of the best measure of merit found so far in the optimization as
a function of the number of iterations. The figure shows that all surrogate-assisted
methods are better than the plain GADO and the LS approximation method gave the
best performance in all stages of the search in this domain.
Fig. 1.15 Comparison of the four GA methods in the supersonic aircraft design domain (GADO_Aircraft, GADO_Aircraft_LS, GADO_Aircraft_QP, GADO_Aircraft_RBF). The horizontal axis shows the number of fitness function evaluations (500 to 4000) and the vertical axis shows the fitness value (roughly 170 to 240)
h − b ≤ 0                    (1.20)
6000 − Pc(x) ≤ 0             (1.21)
δ(x) − 0.25 ≤ 0              (1.22)
0.125 ≤ h ≤ 10               (1.23)
0.1 ≤ l, t, b ≤ 10           (1.24)
References
[1] Abuthinien, M., Chen, S., Hanzo, L.: Semi-blind joint maximum likelihood channel estimation and data detection for MIMO systems. IEEE Signal Processing Letters 15, 202-205 (2008)
[2] Rasheed, K.: GADO: A genetic algorithm for continuous design optimization. Technical Report DCS-TR-352, Department of Computer Science, Rutgers University. Ph.D. Thesis (1998)
[3] Ong, Y.S., Nair, P.B., Keane, A.J., Wong, K.W.: Surrogate-Assisted Evolutionary Optimization Frameworks for High-Fidelity Engineering Design Problems. In: Jin, Y. (ed.) Knowledge Incorporation in Evolutionary Computation. Studies in Fuzziness and Soft Computing, pp. 307-332. Springer, Heidelberg (2004)
[4] Schwefel, H.-P.: Evolution and Optimum Seeking. Wiley, Chichester (1995)
[5] Chafekar, D., Shi, L., Rasheed, K., Xuan, J.: Multi-objective GA optimization using reduced models. IEEE Trans. on Systems, Man, and Cybernetics, Part C 9(2), 261-265 (2005)
[6] Chung, H.-S., Alonso, J.J.: Multi-objective optimization using approximation model-based genetic algorithms. Technical report 2004-4325, AIAA (2004)
[25] Williams, C.K.I., Rasmussen, C.E.: Gaussian Processes for regression. In: Touretzky,
D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing
Systems, vol. 8. MIT Press, Cambridge (1996)
[42] Llorà, X., Sastry, K., Goldberg, D.E., Gupta, A., Lakshmi, L.: Combating User Fatigue in iGAs: Partial Ordering, Support Vector Machines, and Synthetic Fitness. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pp. 1363-1370 (2005)
[43] Rasheed, K., Ni, X., Vattam, S.: Comparison of Methods for Developing Dynamic Reduced Models for Design Optimization. Soft Computing Journal 9(1), 29-37 (2005)
[44] Jin, R., Chen, W., Simpson, T.W.: Comparative studies of metamodeling techniques under multiple modeling criteria. Technical report 2000-4801, AIAA (2000)
[45] Simpson, T., Mauery, T., Korte, J., Mistree, F.: Comparison of response surface and Kriging models for multidisciplinary design optimization. Technical report 98-4755, AIAA (1998)
[46] Carpenter, W., Barthelemy, J.-F.: A comparison of polynomial approximation and artificial neural nets as response surfaces. Technical report 92-2247, AIAA (1992)
[47] Willmes, L., Baeck, T., Jin, Y., Sendhoff, B.: Comparing neural networks and Kriging for fitness approximation in evolutionary optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 663-670 (2003)
[48] Branke, J., Schmidt, C.: Fast convergence by means of fitness estimation. Soft Computing Journal 9(1), 13-20 (2005)
[49] Rasheed, K., Hirsh, H.: Informed operators: Speeding up genetic-algorithm-based design optimization using reduced models. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000), pp. 628-635 (2000)
[50] Biles, J.A.: GenJam: A genetic algorithm for generating jazz solos. In: Proceedings of the International Computer Music Conference, pp. 131-137 (1994)
[51] Zhou, Z.Z., Ong, Y.S., Nair, P.B., Keane, A.J., Lum, K.Y.: Combining Global and Local Surrogate Models to Accelerate Evolutionary Optimization. IEEE Transactions on Systems, Man and Cybernetics, Part C 37(1), 66-76 (2007)
[52] Sefrioui, M., Periaux, J.: A hierarchical genetic algorithm using multiple models for optimization. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 879-888. Springer, Heidelberg (2000)
[53] Skolicki, Z., De Jong, K.: The influence of migration sizes and intervals on island models. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pp. 1295-1302 (2005)
[54] Rasheed, K., Hirsh, H.: Learning to be selective in genetic-algorithm-based design optimization. Artificial Intelligence in Engineering, Design, Analysis and Manufacturing 13, 157-169 (1999)
[55] Hidovic, D., Rowe, J.E.: Validating a model of colon colouration using an evolution strategy with adaptive approximations. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3103, pp. 1005-1016. Springer, Heidelberg (2004)
[56] Ziegler, J., Banzhaf, W.: Decreasing the number of evaluations in evolutionary algorithms by using a meta-model of the fitness function. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E.P.K., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 264-275. Springer, Heidelberg (2003)
[57] Jin, Y., Branke, J.: Evolutionary optimization in uncertain environments: A survey. IEEE Transactions on Evolutionary Computation 9(3), 303-317 (2005)
[58] Ziegler, J., Banzhaf, W.: Decreasing the number of evaluations in evolutionary algorithms by using a meta-model of the fitness function. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E.P.K., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 264-275. Springer, Heidelberg (2003)
[59] Lim, D., Ong, Y.S., Jin, Y., Sendhoff, B.: A Study on Metamodeling Techniques, Ensembles, and Multi-Surrogates in Evolutionary Computation. In: Genetic and Evolutionary Computation Conference, London, UK, pp. 1288-1295. ACM Press, New York (2007)
[60] Shi, L., Rasheed, K.: ASAGA: An Adaptive Surrogate-Assisted Genetic Algorithm. In: Genetic and Evolutionary Computation Conference (GECCO 2008), pp. 1049-1056. ACM Press, New York (2008)
[61] Regis, R.G., Shoemaker, C.A.: Local Function Approximation in Evolutionary Algorithms for the Optimization of Costly Functions. IEEE Transactions on Evolutionary Computation 8(5), 490-505 (2004)
[62] Zerpa, L.E., Queipo, N.V., Pintos, S., Salager, J.-L.: An Optimization Methodology of Alkaline-Surfactant-Polymer Flooding Processes Using Field Scale Numerical Simulation and Multiple Surrogates. Journal of Petroleum Science and Engineering 47, 197-208 (2005)
[63] Lundstrom, D., Staffan, S., Shyy, W.: Hydraulic Turbine Diffuser Shape Optimization by Multiple Surrogate Model Approximations of Pareto Fronts. Journal of Fluids Engineering 129(9), 1228-1240 (2007)
[64] Zhou, Z., Ong, Y.S., Lim, M.H., Lee, B.S.: Memetic Algorithm Using Multi-surrogates for Computationally Expensive Optimization Problems. Soft Computing 11(10), 957-971 (2007)
[65] Goel, T., Haftka, R.T., Shyy, W., Queipo, N.V.: Ensemble of Surrogates. Structural and Multidisciplinary Optimization 33, 199-216 (2007)
[66] Sastry, K., Lima, C.F., Goldberg, D.E.: Evaluation Relaxation Using Substructural Information and Linear Estimation. In: Proceedings of the 8th Annual Genetic and Evolutionary Computation Conference (2006)
[67] Torczon, V., Trosset, M.: Using approximations to accelerate engineering design optimization. NASA/CR-1998-208460 (ICASE Report No. 98-33) (1998)
[68] Pierret, S., Braembussche, R.A.V.: Turbomachinery Blade Design Using a Navier-Stokes Solver and ANN. Journal of Turbomachinery (ASME) 121(2) (1999)
[69] Goel, T., Vaidyanathan, R., Haftka, R.T., Shyy, W., Queipo, N.V., Tucker, K.: Response surface approximation of Pareto optimal front in multi-objective optimization. Computer Methods in Applied Mechanics and Engineering (2007)
[70] Knowles, J.: ParEGO: A Hybrid Algorithm with On-Line Landscape Approximation for Expensive Multiobjective Optimization Problems. IEEE Transactions on Evolutionary Computation 10(1) (February 2005)
[71] Giannakoglou, K.C.: Design of optimal aerodynamic shapes using stochastic optimization methods and computational intelligence. Progress in Aerospace Sciences 38(1) (2000)
[72] Shyy, W., Papila, N., Vaidyanathan, R., Tucker, K.: Global design optimization for aerodynamics and rocket propulsion components. Progress in Aerospace Sciences 37 (2001)
[73] Quagliarella, D., Periaux, J., Poloni, C., Winter, G. (eds.): Genetic Algorithms and Evolution Strategies in Engineering and Computer Science: Recent Advances and Industrial Applications, ch. 13, pp. 267-288. John Wiley and Sons, West Sussex (1997)
[74] Gelsey, A., Schwabacher, M., Smith, D.: Using modeling knowledge to guide design space search. In: Fourth International Conference on Artificial Intelligence in Design 1996 (1996)
Part II
Chapter 2
Abstract. Evolutionary algorithms have been very popular for solving multi-objective optimization problems, mainly because of their ease of use and their wide applicability. However, multi-objective evolutionary algorithms (MOEAs) tend to consume a large number of objective function evaluations in order to achieve a reasonably good approximation of the Pareto front. This is a major concern when attempting to use MOEAs for real-world applications, since we can normally afford only a fairly limited number of fitness function evaluations in such cases. Despite these concerns, relatively few efforts to reduce the computational cost of MOEAs have been reported in the literature. Only relatively recently have researchers developed techniques to achieve an effective reduction of fitness function evaluations by exploiting knowledge acquired during the search. In this chapter, we analyze the different proposals currently available in the specialized literature for dealing with expensive functions in evolutionary multi-objective optimization. Additionally, we review some real-world applications of these methods, which can be seen as case studies in which such techniques led to a substantial reduction in the computational cost of the MOEA adopted. Finally, we also indicate some potential paths for future research in this area.
2.1 Introduction
In many disciplines, optimization problems naturally involve two or more objectives that we aim to minimize simultaneously and that are normally in conflict with each other. Such problems are called multi-objective, and their solution
Luis V. Santana-Quintero, Alfredo Arias Montano, Carlos A. Coello Coello
CINVESTAV-IPN (Evolutionary Computation Group)
Departamento de Computacion
Av. IPN No. 2508, Col. San Pedro Zacatenco
Mexico, D.F. 07360, Mexico
e-mail: lvspenny@hotmail.com, aarias@computacion.cs.cinvestav.mx, ccoello@cs.cinvestav.mx
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 29-59.
springerlink.com
Springer-Verlag Berlin Heidelberg 2010
gives rise not to one, but to a set of solutions representing the best possible trade-offs among the objectives (the so-called Pareto optimal set). In the absence of user preferences, all the solutions contained in the Pareto optimal set are equally good. When plotted in objective function space, the contents of the Pareto optimal set produce the so-called Pareto front.
Evolutionary algorithms (EAs) have become a popular search engine for solving
multi-objective optimization problems [17, 21], mainly because they are very easy
to use and have a wide applicability. However, multi-objective evolutionary algorithms (MOEAs) normally require a significant number of objective function evaluations, in order to achieve a reasonably good approximation of the Pareto front,
even when dealing with problems of low dimensionality. This is a major concern
when attempting to use MOEAs for real-world applications, since in many of them,
we can only afford a fairly limited number of fitness function evaluations.
Despite these concerns, relatively few efforts to reduce the computational cost of MOEAs have been reported in the literature, and several of them focus only on algorithmic complexity (see for example [36]), where little can be done because of the theoretical bounds related to nondominance checking [45].
Only relatively recently have researchers developed techniques to achieve a reduction of fitness function evaluations by exploiting knowledge acquired during the search [42]. Knowledge of past evaluations can also be used to
build an empirical model that approximates the fitness function to optimize. This
approximation can then be used to predict promising new solutions at a smaller evaluation cost than that of the original problem [40, 42]. Current functional approximation models include Polynomials (response surface methodologies [30, 65]), neural
networks (e.g., multi-layer perceptrons (MLPs) [33, 34, 62]), radial-basis function
(RBF) networks [60, 77, 83], support vector machines (SVMs) [4, 71], Gaussian
processes [6, 78], and Kriging [24, 66] models. Other authors have adopted fitness
inheritance [67] or cultural algorithms [46] for the same purposes.
In this chapter, several possible schemes are described in which the use of knowledge from past solutions can help to guide the search for new solutions, with particular emphasis on MOEAs. The remainder of this chapter is organized as follows. In Section 2.2, we present basic concepts related to multi-objective optimization. Then, in Section 2.3, we discuss several schemes that incorporate knowledge into the fitness evaluations of an evolutionary algorithm, providing a brief explanation of the surrogate models that have been used to approximate the fitness function. Next, in Section 2.4, some selected research works are discussed. Such works are related to real-world engineering optimization problems, and can be considered case studies in which the use of the described techniques led to a substantial reduction in the computational cost of the MOEA adopted. Finally, in Section 2.5, our conclusions and some potential paths for future research in this area are indicated.
gi(x) ≤ 0,  i = 1, ..., m
hj(x) = 0,  j = 1, ..., p

f(x) = [f1(x), f2(x), ..., fk(x)]

In other words, we aim to determine, from among the set S of all vectors (points) which satisfy the constraints, those that yield the optimum values for all the k objective functions simultaneously. The constraints define the feasible region S, and any point x in the feasible region is called a feasible point.
A vector u = (u1, ..., uk) is said to dominate a vector v = (v1, ..., vk) if and only if ui ≤ vi for all i ∈ {1, ..., k} and ui < vi for at least one i.
A solution xu ∈ S (where S is the feasible region) is said to be Pareto optimal
Fig. 2.1 Mapping of the Pareto optimal solutions to the objective function space
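Assuming minimization, the dominance relation defined above translates directly into a small predicate:

```python
def dominates(u, v):
    """True if objective vector u dominates v (minimization): u is no worse
    than v in every objective and strictly better in at least one."""
    return all(ui <= vi for ui, vi in zip(u, v)) and \
           any(ui < vi for ui, vi in zip(u, v))

# toy usage on two-objective vectors
assert dominates((1.0, 2.0), (1.5, 2.0))      # better in f1, equal in f2
assert not dominates((1.0, 3.0), (2.0, 2.0))  # incomparable vectors
```

The Pareto optimal set is then exactly the set of feasible points not dominated by any other feasible point.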
[Figure: taxonomy of approximation techniques: problem approximation via simulations; functional approximation via meta-models; evolutionary approximation via clusters and fitness inheritance]
2.3.1 Surrogates
In many practical engineering problems, we have black-box objective functions
whose algebraic definitions are not known. In order to construct an approximation function, it is required to have a set of sample points that help us to build a
meta-model of the problem. The objective of such a meta-model is to reduce the total number of evaluations performed on the real objective functions, while maintaining
a reasonably good quality in the results obtained. Thus, such a meta-model is used to
predict promising new solutions at a smaller evaluation cost than that of the original
problem.
The accuracy of the surrogate model relies on the number of samples provided
in the search space, as well as on the selection of the appropriate model to represent
the objective functions. There exist a variety of techniques for constructing surrogate
models (see for example [79]). One example is least-square regression using loworder polynomials, also known as response surface methods. Comparisons of several
surrogate modeling techniques have been presented by Giunta and Watson [27] and
by Jin et al. [39].
A surrogate model is built when the objective functions are to be estimated. This local model is built using a set of data points that lie in the local neighborhood of the design. Since surrogate models will probably be built thousands of times during the search, computational efficiency becomes a major issue in their construction process.
In [43], Knowles and Nakayama present a survey of meta-modeling approaches for solving specific problems. The authors discuss how to model each objective function and how to improve the Pareto approximation set using a trade-off method proposed by Nakayama et al. [56]. In multi-objective optimization problems, the trade-off method tries to satisfy an aspiration level at the k-th iteration, with the help of a trade-off operator which changes the k-th level if the decision maker (DM) is not satisfied with the solution. The authors thus combine the satisficing trade-off method with meta-modeling to support the DM in reaching a final solution with a low number of fitness function evaluations. They use ν-Support Vector Regression [57] as their meta-model and include two real-world multi-objective optimization problems, also using a Radial Basis Function Network with a Genetic Algorithm to search for the optimal value of the predicted objective function [58]. The proposed approach obtains good solutions within 1/10 or less of the analysis time of a conventional optimization approach based on a quasi-Newton method with approximated differentials.
course, possible. For the case of quadratic polynomials, the response surface is described as follows:

y = β0 + Σ_{i=1..n} βi xi + Σ_{i,j=1..n, i≤j} βij xi xj    (2.1)

where n is the number of variables and β0, βi, βij are the coefficients to be calculated. To estimate the unknown coefficients of the polynomial model, both the least squares method (LSM) and the gradient method can be used, but either requires at least as many samples of the real objective function as there are coefficients in order to obtain good results.
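Estimating the coefficients of the quadratic model in (2.1) by least squares is a linear problem; a minimal sketch for n = 2 variables with noise-free samples (the true response used here is made up for illustration):

```python
import numpy as np

def quadratic_design_matrix(X):
    # columns: 1, x1, x2, x1^2, x1*x2, x2^2 -- the terms of (2.1) for n = 2
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 2))   # 20 samples >= 6 coefficients, as required
y = 3 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]  # true (quadratic) response

beta, *_ = np.linalg.lstsq(quadratic_design_matrix(X), y, rcond=None)
# with noise-free data, beta recovers [3, 2, -1, 0, 0.5, 0] up to rounding
```

With fewer samples than coefficients the system is underdetermined, which is exactly the sample-size requirement stated above.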
y(x) = a(x) + b(x)

where a(x) represents the average long-range behavior and the expected value of the true function. This function can be modeled in various ways, such as with polynomials or with trigonometric series:

a(x) = a0 + Σ_{i=1..n} Σ_{j=1..L} aij (xi)^j

The second term is a localized deviation term: b(x) is a Gaussian random function with zero mean and non-zero covariance that represents a localized deviation from the global model. This function represents a short-distance influence of every data point over the global model:

b(x) = Σ_{n=1..N} bn K(h(x, x^n)),  with  h(x, x^n) = ‖ (xi − xi^n) / (xi^max − xi^min) ‖

where xi^min and xi^max are the lower and upper bounds of the search space and xi^n denotes the i-th component of the data point x^n. However, the shape of K(h) has a strong influence on the resulting statistical model. That is the reason why it is said that Kriging is used as an estimator or an interpolator.
The RBF model is a function f : R^d → R of the form

f(x) = Σ_{i=1..n} λi g(x − xi) = Σ_{i=1..n} λi φ(‖x − xi‖)    (2.2)

where ‖x − xi‖ is the Euclidean distance between the points x and xi. So, f becomes a function in the finite-dimensional space spanned by the basis functions

gi : x ↦ g(‖x − xi‖)

Now, let us suppose that we already know the values of a certain function H : R^d → R and use the xj as centers in equation (2.2). If we want to force the function f to take the values fj at those points, then

∀ j ∈ {1, ..., n}:  fj = f(xj) = Σ_{i=1..n} λi φ(‖xj − xi‖)

In these equations, only the λi are unknown, and the equations are linear in their unknowns. Therefore, we can write these equations in matrix form:

⎡ φ(0)         φ(‖x1−x2‖)  ...  φ(‖x1−xn‖) ⎤ ⎡ λ1 ⎤   ⎡ f1 ⎤
⎢ φ(‖x2−x1‖)  φ(0)         ...  φ(‖x2−xn‖) ⎥ ⎢ λ2 ⎥ = ⎢ f2 ⎥
⎢     ...          ...      ...     ...     ⎥ ⎢ ...⎥   ⎢ ...⎥
⎣ φ(‖xn−x1‖)  φ(‖xn−x2‖)  ...  φ(0)        ⎦ ⎣ λn ⎦   ⎣ fn ⎦    (2.3)

Typical choices for the basis function g(x) include linear splines, cubic splines, multiquadrics, thin-plate splines and Gaussian functions, as shown in Table 2.1.
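Solving the linear system (2.3) for the weights λi yields an interpolating RBF model; a minimal sketch with a Gaussian basis function (the width parameter is an assumption):

```python
import numpy as np

def fit_rbf(X, f_vals, width=1.0):
    """Solve the system (2.3): Phi @ lam = f, with a Gaussian basis phi."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * width ** 2))   # Phi[j, i] = phi(||x_j - x_i||)
    return np.linalg.solve(Phi, f_vals)

def predict_rbf(x, X, lam, width=1.0):
    # equation (2.2): f(x) = sum_i lam_i * phi(||x - x_i||)
    d2 = ((X - x) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2)) @ lam

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
f_vals = np.array([0.0, 1.0, 1.0, 2.0])   # samples of f(x) = x1 + x2
lam = fit_rbf(X, f_vals)
# the interpolant reproduces the data exactly at the centers
assert abs(predict_rbf(np.array([1.0, 1.0]), X, lam) - 2.0) < 1e-9
```

Because the Gaussian kernel matrix of distinct centers is positive definite, the system (2.3) always has a unique solution.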
Fig. 2.3 A graphical representation of an MLP network with one hidden layer (X: input layer; W: weights; Y: output layer)
y = f( Σ_{i=1..n} wi ai + b )

where ai are the inputs of the neuron and wi is the weight associated with the i-th input. The nonlinear function f is called the activation function, as it determines the activation level of the neuron.
In Figure 2.3, we show an MLP network with one layer of linear output neurons and one layer of nonlinear neurons between the input and output neurons. The middle layers are usually called hidden layers.
To learn a mapping R^n → R^m with an MLP, its architecture should be as follows: n input nodes and m output nodes, with one or more hidden layers. The number of nodes in each hidden layer is generally a design decision.
2.3.5.1 Training an ANN

The training error is defined as

J(W) = ½ Σ_{i=1..N} Σ_{k=1..c} (tki − zki)² = ½ Σ_{i=1..N} ‖ti − zi‖²

where ti and zi are the i-th target and the i-th network output vectors of length c, respectively; W represents all the weights in the network. The backpropagation learning rule is based on gradient descent. The weights are initialized with random values, and are changed in a direction that reduces the error following the rule

Wnew = Wold − η ∂J/∂W

The weight update for the hidden-to-output weights is given by

ΔWkj = η (tk − zk) f′(netk) yj

and the input-to-hidden weights learning rule is

ΔWji = η xi f′(netj) Σ_{k=1..n} wkj δk

where η is the learning rate, i, j, k are the corresponding node indexes for each layer, and netj is the inner product of the input layer with the weights wji at the hidden unit.
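The update rules above correspond to the following one-hidden-layer training loop (a minimal numpy sketch: sigmoid hidden units, a linear output unit, and an illustrative target function; the learning rate and layer sizes are chosen arbitrarily):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(64, 2))
t = (0.5 * X[:, 0] + 0.3 * X[:, 1]).reshape(-1, 1)  # illustrative target

W1 = rng.normal(0.0, 0.5, size=(2, 8))   # input-to-hidden weights
W2 = rng.normal(0.0, 0.5, size=(8, 1))   # hidden-to-output weights
eta = 0.2                                # learning rate

for _ in range(5000):
    y_hidden = sigmoid(X @ W1)                        # forward pass, f(net_j)
    z = y_hidden @ W2                                 # linear output unit
    delta_out = z - t                                 # output error, -(t_k - z_k)
    # hidden-to-output update: eta * (t_k - z_k) * y_j, averaged over the batch
    grad_W2 = y_hidden.T @ delta_out / len(X)
    # input-to-hidden update: backpropagate through f'(net_j) = y(1 - y)
    delta_hidden = (delta_out @ W2.T) * y_hidden * (1.0 - y_hidden)
    grad_W1 = X.T @ delta_hidden / len(X)
    W2 -= eta * grad_W2
    W1 -= eta * grad_W1

mse = float(np.mean((sigmoid(X @ W1) @ W2 - t) ** 2))
```

The two matrix products are batched forms of the per-weight rules ΔWkj and ΔWji given above; after training, `mse` is far below the variance of the target.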
at the same time is as flat as possible. Let us suppose we are given training data {(xt, yt)}, t = 1, ..., N, where yt ∈ R. Then f(x) is given by

f(x) = ⟨w, x⟩ + b  with  w ∈ R^d, x ∈ R^d, b ∈ R

where ⟨·,·⟩ denotes the dot product in R^d. A small w means that the regression is flat. One way to ensure this is to minimize the norm ‖w‖² = ⟨w, w⟩. The problem can be written as a convex optimization problem:

minimize    ½ ‖w‖²
subject to  yi − ⟨w, xi⟩ − b ≤ ε
            ⟨w, xi⟩ + b − yi ≤ ε        (2.4)

One can introduce two slack variables ξi, ξi*, for positive and negative deviations, ξi ≥ 0 and ξi* ≥ 0, where ξi > 0 corresponds to a point for which ⟨w, xi⟩ + b > yi + ε and ξi* > 0 corresponds to a point for which ⟨w, xi⟩ + b < yi − ε (as in Figure 2.4):

minimize    C Σ_{i=1..l} (ξi + ξi*) + ½ ‖w‖²
subject to  yi − ⟨w, xi⟩ − b ≤ ε + ξi
            ⟨w, xi⟩ + b − yi ≤ ε + ξi*
            ξi, ξi* ≥ 0                  (2.5)

The constant C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than ε are tolerated. The ε-insensitive loss function [80] (see equation (2.6)) means that we tolerate errors up to ε and also that errors beyond that value have a linear rather than a quadratic effect. This error function is therefore more tolerant to noise and is thus more robust:

|ζ|ε = 0 if |ζ| ≤ ε;  |ζ| − ε otherwise.    (2.6)

Figure 2.4 shows a plot of the ε-insensitive loss function. Note that only the points outside the shaded region contribute to the cost of the function. It turns out that in most cases, the optimization problem defined by equation (2.5) can be solved more easily in its dual formulation. The dual formulation also provides the capability of extending SVM to nonlinear functions using a standard dualization method based on Lagrange multipliers, as described by Fletcher [25]. So, introducing multipliers ai, ai*, μi, μi* and, for simplicity, writing ti for the target values and yi for the model outputs, we have:

L = C Σ_{i=1..N} (ξi + ξi*) + ½ ‖w‖² − Σ_{i=1..N} (μi ξi + μi* ξi*) − Σ_{i=1..N} ai (ε + ξi + yi − ti) − Σ_{i=1..N} ai* (ε + ξi* − yi + ti)    (2.7)
Then, we can substitute for y(x) using the linear model equation y(x) = w^T φ(x) + b and set the derivatives of the Lagrangian with respect to w, b, ξi and ξi* to zero, giving:

∂L/∂w = 0  ⇒  w = Σ_{i=1..N} (ai − ai*) φ(xi)    (2.8)

∂L/∂b = 0  ⇒  Σ_{i=1..N} (ai − ai*) = 0          (2.9)

∂L/∂ξi = 0  ⇒  ai + μi = C                       (2.10)

∂L/∂ξi* = 0  ⇒  ai* + μi* = C                    (2.11)

Using these results to eliminate the corresponding variables from the Lagrangian, we see that the dual problem involves maximizing

L(a, a*) = −½ Σ_{i=1..N} Σ_{j=1..N} (ai − ai*)(aj − aj*) k(xi, xj) − ε Σ_{i=1..N} (ai + ai*) + Σ_{i=1..N} (ai − ai*) ti    (2.12)

with respect to ai and ai*, where k(xi, xj) = φ(xi)^T φ(xj) is the kernel function. So, the problem becomes a constrained maximization problem with the box constraints

0 ≤ ai ≤ C,  0 ≤ ai* ≤ C

And the predictions for new inputs can be made using

y(x) = Σ_{i=1..N} (ai − ai*) k(x, xi) + b    (2.13)

The support vectors are those data points that contribute to the predictions given by equation (2.13); in other words, those for which either ai ≠ 0 or ai* ≠ 0. These are points that lie on the boundary of the ε-tube or outside the tube. All points within the tube have ai = ai* = 0.
2.3.7 Clustering
Clustering is the unsupervised classification of patterns into groups (or clusters).
The clustering problem has been addressed in many contexts and by researchers in
many disciplines [35].
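In surrogate-assisted optimization, clustering is typically used to group the population so that only cluster representatives need exact evaluation; a minimal k-means sketch (numpy; initialization and iteration counts are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
                           axis=1)
        # move each non-empty cluster's center to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# two well-separated blobs standing in for a population; the cluster
# representatives (centers) could be the only points evaluated exactly
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
               rng.normal(5.0, 0.1, size=(10, 2))])
labels, centers = kmeans(X, k=2)
```

The remaining individuals can then inherit or interpolate fitness from their cluster's representative, which is one common way clusters enter surrogate-assisted EAs.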
parents. This saves one fitness function evaluation, and is based on the assumption
of similarity of an offspring to its parents.
Fitness inheritance must not always be applied, since the algorithm needs to use the true fitness function several times in order to obtain enough information to guide the search. The fraction of evaluations in which fitness inheritance is applied is called the inheritance proportion. If the inheritance proportion is 1, the algorithm is likely to converge prematurely [8].
It is important to mention that some researchers consider this mechanism of little use in complex or real-world problems, arguing that it has only been applied to easy problems. For example, Ducheyne et al. [23] tested the original fitness inheritance scheme on a standard binary genetic algorithm and the Zitzler-Deb-Thiele (ZDT) [84] multi-objective test problems, concluding that fitness inheritance was not useful when dealing with difficult shapes of the Pareto front.
Other authors, however, have successfully applied fitness inheritance to the ZDT
and other (more complicated) test problems (see for example [67]).
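The basic mechanism can be sketched as follows: with probability equal to the inheritance proportion, an offspring simply averages its parents' objective vectors instead of calling the expensive evaluator (the exact fitness callable is a hypothetical stand-in):

```python
import random

def evaluate_offspring(child, parent_fitnesses, exact_fitness,
                       inheritance_proportion=0.5, rng=random.Random(0)):
    """Fitness inheritance: with probability `inheritance_proportion` the child
    inherits the average of its parents' objective vectors; otherwise the
    expensive exact fitness is computed. A proportion of 1 would never call
    the exact function and risks premature convergence [8]."""
    if rng.random() < inheritance_proportion:
        f1, f2 = parent_fitnesses
        return [(a + b) / 2.0 for a, b in zip(f1, f2)], True   # inherited
    return exact_fitness(child), False                         # exact

# toy usage with a two-objective fitness
exact = lambda x: [x[0] ** 2, (x[0] - 2) ** 2]
fit, inherited = evaluate_offspring([1.0], ([1.0, 4.0], [3.0, 2.0]), exact,
                                    inheritance_proportion=1.0)
```

Weighted variants use each parent's distance to the child instead of a plain average; the proportion parameter is what the convergence warning above refers to.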
Structural Optimization: The aeronautical/aerospace design philosophy focuses on the design of structures with minimum weight that are strong enough
to withstand certain design loads. These two objectives are conflicting in nature and, therefore, the aim of structural optimization is to find the best possible
compromise between them. Typical applications for this type of problems comprise structural shape and topology optimization, robust structural design and
structural weight optimization.
Multidisciplinary Design Optimization: Aeronautical/aerospace design has a multidisciplinary nature, since in many practical design applications two or more disciplines are involved, each with specific performance requirements to meet.
Typical applications for this type of problems are the aeroelastic applications
in which aerodynamics and structural engineering are the interacting disciplines.
For all the optimization problems indicated above, the objective function evaluations are routinely performed using complex computational simulations: CFD (Computational Fluid Dynamics) for aerodynamic problems, CAA (Computational Aero-Acoustics) for aero-acoustic problems, CSM (Computational Structural Mechanics, by means of Finite Element Method software) for structural optimization problems, or a combination of them for multidisciplinary design optimization problems. By their nature, these computational simulations have a high computational cost, since they solve, in an iterative manner, the set of partial differential equations governing the physics of the problem. Evaluating the objective functions for the kinds of problems indicated above can therefore take from minutes to hours for a single candidate solution, depending on the fidelity of the simulation.
Nowadays, MOEAs have gained popularity in the aeronautical/aerospace industries and are considered a mature and reliable numerical optimization tool, since they provide designers not with a single design solution but with a set of solutions from which the tradeoff between the competing objectives can be assessed. This can help decision makers select a compromise design according to their own preferences. Given the high computational cost of the simulations and the population-based nature of MOEAs, the use of hybrid methods or meta-models is a natural choice to reduce the computational cost of the design optimization process, as indicated by some representative research works described next.
Lee et al. [49, 50] made use of a generic framework for multidisciplinary design
and optimization [31] to explore the application of a robust MOEA-based algorithm
for improving the aerodynamic and radar cross section characteristics of a UCAV (Unmanned Combat Aerial Vehicle). In both applications, two disciplines are considered, the first concerning aerodynamic efficiency and the second dealing with the visual and radar signature of the UCAV airplane. The evolutionary algorithm employed corresponds to the HAPMOEA indicated above. In this case, the
minimization of three objective functions is considered: (i) inverse of the lift/drag
ratio at ingress condition, (ii) inverse of the lift/drag ratio at cruise condition, and
(iii) frontal area. The problem has, approximately, 100 decision variables, and the
first two objective functions are evaluated using a potential flow solver (FLO22)
coupled to the FRICTION code to obtain the viscous drag. The use of these last
two codes approximates the Navier-Stokes flow solution, considerably reducing the
computational cost. The evolutionary system evaluates a total of 1600 candidate solutions, from which a Pareto set containing 30 members is obtained. From these
nondominated solutions, a single compromise solution is obtained. The authors
reported a solution time of 200 hours on a single processor.
competing objectives are considered: i) combustion length, ii) injector face temperature, iii) injector wall temperature, and iv) injector tip temperature. In this research,
the NSGA-IIa (referred to as archiving NSGA-II [22]) and a local search strategy called ε-constraint are adopted to generate a solution set that is used to approximate the Pareto optimal front by a response surface method (RSM). Once
the Pareto optimal solutions are obtained, a clustering technique is used to select
representative tradeoff design solutions.
Pagano et al. [61] presented an application for three-dimensional aerodynamic shape optimization, particularly the aerodynamic shape of an aircraft propeller. The aim of this multiobjective optimization is to improve the performance of an existing propeller. The authors considered two conflicting objectives: (i) minimize the noise emission level, and (ii) maximize the aerodynamic propeller efficiency. For this industrial problem, several disciplines are considered and, therefore, the objective function evaluations involve: (a) aerodynamics, (b) structural behavior, and (c) aeroacoustics. For each of these, specialized computer simulation codes are employed. Every
calculation comprises an iterative coupling procedure (fluids-structures-acoustics)
among these simulation codes in order to evaluate a more realistic operating condition. As a consequence, the optimization process becomes computationally demanding. In order to reduce the burden of this high computational cost, the authors
made use of design of experiment techniques (DOE), and a quadratic response surface method (RSM) for efficiently exploring the design space. The geometry for the
propeller blade is parameterized using a total of 14 design variables. The optimization problem contains constraints on the geometry design variables and on propeller
shaft power at two flight conditions: takeoff and cruise. The evolutionary algorithm employed is NSEA+ (Nondominated Sorting Evolutionary Algorithm), as implemented in the OPTIMUS commercial code adopted by the authors. The population size for the evolutionary algorithm is set
to 20 individuals, and the optimization is run using the DOE and RSM methods.
Afterwards, the Pareto front solutions obtained are evaluated using the high fidelity
simulation codes. The authors indicated that a total of 340 designs were evaluated
using high fidelity simulations. From them, approximately 20 Pareto solutions were
obtained, all of them being better than the reference design in the two objectives
considered.
distribution. The blade geometry is defined by eight design parameters, but only two
of them are varied during the optimization process. The evolutionary algorithm used
in this research corresponds to a multiobjective version of the differential evolution
algorithm previously implemented by the same author and described in [63]. In order
to cope with the associated calculation time of the CFD simulations required to evaluate the objective functions, the authors used a hybrid neural network comprised of
10 individual single-hidden-layer feed-forward networks. The optimization is run
with a small population size of 10 individuals and during 25 generations.
Arabnia and Ghaly [2] presented a strategy that makes use of multi-objective
evolutionary algorithms for aerodynamic shape optimization of turbine stages in
three-dimensional fluid flow. The NSGA [74] is used and coupled to an artificial
neural network (ANN) based response surface method (RSM) in order to reduce the
overall computational cost. The blade geometry, both for rotor and stator blades, is
based on the E/TU-3 turbine which is used as a reference design to compare the
optimization results to. The multi-objective optimization consists of finding the best
distribution of 2D blade sections in the radial and circumferential directions. For
this, a quadratic rational Bézier curve, with 5 control points, is used for each of the
two blades. The objective functions to be optimized include: (i) maximization of
isentropic efficiency for the stage, and (ii) minimization of the streamwise vorticity.
Both objective functions are evaluated using a 3D CFD flow simulation with constraints on: (1) inlet total pressure and temperature, (2) exit pressure, (3) axial chord
and spacing, (4) inlet and exit flow angles, and (5) mass flow rate. The authors noted
that one CFD simulation took approximately 10 hours. Therefore they resorted to an
ANN based RSM. The ANN model with backpropagation, containing a single hidden layer with 50 nodes, was trained and tested with 23 CFD simulations, sampling
the design space using the Latin hypercube sampling technique. The optimization
process used the ANN model to estimate the objective functions, and the constraints
values as well. The population size used in the NSGA was set to 50 individuals, and
was run for 150 generations. Finally, the Pareto solutions were evaluated with the
CFD flow simulation. From their results, the authors indicated that they obtained
design solutions which were better in comparison to the reference turbine design.
Indeed, they attained a 1.2% improvement in stage efficiency, which is remarkable
considering the small number of design variables used in the optimization process.
Alonso et al. [1] described a procedure for the multi-objective optimization
design of a generic supersonic aircraft. The competing design objectives considered were two: i) maximization of aircraft range, and ii) minimization of the perceived loudness of the ground boom signature. Constraints were set for aircrafts
structural integrity, take-off field length and landing field length. The objective
functions were evaluated using CFD with various fidelity (approximation) levels. In this work, the authors made use of a neural network (NN) based response
surface method. The prototype for the NN is a single hidden layer perceptron
with sigmoid activation functions, providing a general nonlinear model, which is
useful for the high nonlinearities present in the objective function landscapes associated with this problem. The neural network was trained with 300 sampled
design solutions, obtained with low fidelity simulations in order to reduce the
computational cost. In their optimization cycle, the authors used high fidelity simulations only in promising regions of the design space, for local exploration. The
problem comprised 10 design variables and the NSGA-II [22] was used as the search
engine with a population size of 64 and was run for 1000 generations using the
surrogate-based objective function.
from the Pareto front obtained using the kriging model and evaluated with the CFD
tool. In their research, the authors reported difficulties in obtaining a converged
Pareto front (there existed large discrepancies between the approximated and the real
Pareto fronts). They attributed this behavior to the large number of variables in the
design problem, and to the associated difficulties in obtaining an accurate kriging
model for these situations. In order to alleviate this situation, they performed an
ANOVA (Analysis of Variance) test to find the variables that contributed the most
to the objective function values. After this test, they presented results with a reduced
kriging surrogate model, employing only 7 variables. The authors argued that they
obtained a similar design with this reduced kriging model at a considerably lower
computational effort.
Jeong et al. [38] investigated the improvement of the lateral dynamic characteristics of a lifting-body type re-entry vehicle in transonic flight condition. The problem
was posed as a multi-objective optimization problem in which two objectives were
minimized: (i) derivative of the yawing moment, and (ii) derivative of the rolling
moment. Due to the geometry of the lifting body and the operating flow condition
of interest, namely high Mach number and strong vortex formation, the evaluation of
the objectives was done by means of a full Navier-Stokes CFD simulation. Since the
objectives were derivatives, multiple flow solutions were required to determine their
values in a discrete manner through the use of finite differencing techniques. This
considerably increased the total computational time due to the large number of calls to the CFD code. The optimization problem considered 4 design variables, and two
solutions were sought: the first one without constraints, and the second one constraining the L/D ratio of the lifting-body type reentry vehicle. The authors used the
EGOMOP (Efficient Global Optimization for Multi-Objective Problems) algorithm
developed by Jeong et al. [37]. This algorithm was built upon ideas borrowed from the EGO and ParEGO algorithms of Jones et al. [41] and Knowles [42],
respectively. EGOMOP adopts the use of the kriging model as a response surface
model, for predicting the function value and its uncertainty. For the exploration of
the Pareto solutions, Fonseca's MOGA [26] was used. The initial kriging model was built using the Latin hypercube sampling method to uniformly cover the design space, and the model was continuously updated.
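The role the kriging model plays in such methods, namely predicting both a function value and the uncertainty of that prediction, can be sketched with a minimal zero-mean Gaussian-process interpolator. The Gaussian correlation function, unit process variance, and fixed hyperparameter below are illustrative simplifications, not the model of the cited work:

```python
import numpy as np

def kriging_fit(X, y, theta=1.0, nugget=1e-10):
    """Fit a simple zero-mean GP with a Gaussian correlation function."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    K = np.exp(-theta * d2) + nugget * np.eye(len(X))    # correlation matrix
    return X, y, np.linalg.inv(K), theta

def kriging_predict(model, x):
    """Return the predicted mean and variance at a single point x."""
    X, y, K_inv, theta = model
    k = np.exp(-theta * ((X - x) ** 2).sum(-1))  # correlation with samples
    mean = float(k @ K_inv @ y)
    var = float(max(0.0, 1.0 - k @ K_inv @ k))   # prior variance 1, reduced by data
    return mean, var

# Toy 1-D data: the model interpolates its samples with near-zero
# uncertainty, and reports high uncertainty far away from them --
# exactly the information EGO-style methods exploit.
X = np.array([[0.0], [0.5], [1.0]])
y = np.sin(2 * np.pi * X[:, 0])
model = kriging_fit(X, y)
m_near, v_near = kriging_predict(model, np.array([0.5]))  # v_near ~ 0
m_far, v_far = kriging_predict(model, np.array([5.0]))    # v_far ~ 1
```

EGO-type methods combine `mean` and `var` into an expected-improvement criterion to decide where the next expensive evaluation is placed.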
Voutchkov et al. [81] used the NSGA-II [22] to perform a robust structural design
of a simplified FEM jet engine model. This application aimed at finding the best jet
engine structural configuration minimizing the variation of reaction forces under a range of external loads, the engine's mass, and the engine's fuel consumption.
These objectives are competing with each other and, therefore, the authors used a
multi-objective optimization technique to explore the design space looking for tradeoffs among them. The evaluation of the structural response was done in parallel by
means of finite element simulations. The FEM model comprised a set of 22 groups
of shell elements. The thicknesses of 15 of these groups were considered as the design variables. Computational time was reduced by using a kriging-based response
surface method. The optimization problem was posed as a MOP, comprising four objectives (all to be minimized): (i) standard deviation of the internal reaction forces,
(ii) mean value of the internal reaction forces, (iii) the engine's mass, and (iv) mean value
of the specific fuel consumption. The first two objectives were computed over 200
external load variations. The authors noted that for this class of problem, which involves a huge number of combinations of loads and finite element thicknesses, the multiobjective optimization problem would take on the order of one year of computational time on a single 1 GHz CPU. They also indicated that by using the surrogate model and
parallel processing, the optimization time was reduced to about 26 hours in a cluster
with 30 PEs (processing elements).
Todoroki and Sekishiro [75, 76] proposed a new optimization method for composite structural components. This approach is based on the use of a multi-objective
genetic algorithm coupled to a kriging model, in order to reduce the number of
objective function evaluations, and to a FBB (Fractal Branch and Bound) method
for the stacking sequence optimization needed in laminated composite structures. The problem consisted of two objectives: (i) minimize the structural weight of a hat-stiffened wing panel, subject to buckling load constraints, and (ii) maximize the
probability of satisfying a predefined buckling load. The variables for the problem
are a set of mixed real/discrete variables. Real variables correspond to the stiffener
geometry definition, while discrete variables correspond to the number of plies for
the composite panel. Constraints were imposed on the dimensions of the stiffener,
but they were automatically satisfied by the definition of the variables' ranges. The authors noted that the buckling load constraint demanded a large computational cost, since it required an FEM (Finite Element Method) analysis. For this reason a kriging model
was adopted and initialized with sampling points obtained by the LHS (Latin Hypercube Sampling) technique. The optimization cycle consisted of two layers: the upper layer, driven by the multi-objective genetic algorithm and the kriging model, performed the optimization of the structural dimensions, while in the lower
layer, the stacking sequences of the stiffener and panels were optimized by means
of the FBB method. The evolutionary algorithm was run for 300 generations with
a population of 100 individuals, and every 50 generations some nondominated solutions were evaluated with the FEM model, in order to update the kriging model.
The authors obtained a Pareto front that was discontinuous. Also, from the results
obtained, a comparison of different designs was made. The solution obtained with
the evolutionary algorithm was 3% heavier than a previous design obtained with
a conventional method (deterministic), but obtained after only 301 FEM analyses
compared to the tens of thousands required by the conventional method.
Choi et al. [11] used the NSGA-II [22] in the solution of a multidisciplinary
supersonic business jet design. In this case, the disciplines involved were (i) aerodynamics and, (ii) aeroacoustics. The main objective of this particular problem was
to obtain a compromise design having good aerodynamic performance while minimizing the intensity of the sonic boom signature at the ground level. Multiobjective optimization was used to obtain tradeoffs among the following objectives: (i)
the aircraft drag coefficient, (ii) initial pressure raise (boom overpressure), and (iii)
ground perceived noise level. All the objectives were minimized. The geometry of
the aircraft was defined by 17 design variables, involving the modification of the
wing planform, its position along the fuselage, and some cross sections and camber of the fuselage. To evaluate the objective functions, a high fidelity Euler
solution was obtained with a very fine grid close to the aircraft's surface. In order to
reduce the computational time required for the optimization cycle, a kriging model
was employed. Its initial definition was formed with a Latin hypercube sampling of
the design space with 232 initial solutions, including both feasible and infeasible
candidates. Following a kriging based optimization cycle, the Pareto optimal solutions were evaluated with high fidelity simulation tools and used to update the
kriging model. In the example, constraints were imposed on some geometry parameters, and on the aircrfats operational conditions. No special constraint-handling
mechanism was adopted other than discarding the solution candidates that did not
satisfy the constraints, which were mostly geometrical. From their results, the authors noted that after the first design cycle using the kriging based NSGA-II, 59
feasible solutions were obtained. It is important to note that all the solutions obtained were better in both objectives compared to a base design. Another important
issue in this particular application was that the kriging model did not perform as
well as in other applications. The reason for this behavior was the highly nonlinear physics involved in the two disciplines considered, which consequently required
more design cycles in the optimization.
In related work, Chung and Alonso [12] and Chung et al. [13] solved the same
previously defined multidisciplinary problem, but using the μ-GA (micro-genetic) algorithm of Coello Coello and Toscano Pulido [15, 16]. This change was aimed at reducing the total number of function evaluations during the optimization process. The μ-GA used a population size of 3 to 6 individuals and an external file to
keep track of the nondominated solutions obtained so far. In the study reported in
[12], the design cycles were performed using a kriging model. Two design cycles
were executed, each consisting of 150 candidate solutions, using the Latin hypercube sampling technique applied around a base design in the first cycle. For the
second cycle, the sampling was applied around the best solution obtained in the
previous cycle. The authors reported that they obtained a very promising Pareto
front estimation with only 300 function evaluations. In the second study, reported in [13], the authors proposed and tested the GEMOGA (Gradient Enhanced Multiobjective Genetic Algorithm). The basic idea of this algorithm is to enhance the Pareto
solutions with a gradient based search. One important feature of the algorithm
is that gradient information is obtained from the kriging model. With this, the
computational cost is not considerably increased.
Kumano et al. [44] used Fonseca's MOGA [26] for the multidisciplinary design
optimization of wing shape for a small jet aircraft. In this study, four objectives
were considered: (i) drag at the cruise condition, (ii) drag divergence between cruising and off-design condition, (iii) pitching moment at the cruising condition, and
(iv) structural weight of the main wing. All these objectives were minimized. In this
study, the optimization process was also performed by means of a kriging model,
and the model was continuously updated after a prescribed number of iterations (defined by the user), adding new nondominated points obtained from the
optimization steps.
The weights of the linear combinations were determined through a training procedure. The number of neurons involved was taken as the number of individuals in
the training set. The first training set was formed with all the solutions obtained
from the first two generations. Afterwards, the objective functions were approximated with the ANN-RBF model, and the training set was updated by adding a 30%
of exactly evalauted individuals per generation. With this technique the authors
obtained similar design solutions with approximately 60% less computational cost.
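A radial-basis-function surrogate of the kind described above can be fitted by solving a linear system for the output-layer weights, with one basis function centered on each training point; the Gaussian basis and its width below are illustrative assumptions rather than the cited configuration:

```python
import numpy as np

def rbf_fit(X, y, width=1.0):
    """One Gaussian neuron per training point; solve for the linear
    output-layer weights so the network interpolates the data."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * width ** 2))
    w = np.linalg.solve(Phi, y)
    return X, w, width

def rbf_predict(model, x):
    """Evaluate the fitted RBF network at a single point x."""
    X, w, width = model
    phi = np.exp(-((X - x) ** 2).sum(-1) / (2 * width ** 2))
    return float(phi @ w)

# Fit on a few samples of a stand-in expensive function; the surrogate
# reproduces its training data exactly (interpolation).
f = lambda x: np.sin(3 * x)
X = np.array([[0.0], [0.4], [0.8], [1.2]])
model = rbf_fit(X, f(X[:, 0]))

# Adding a newly exact-evaluated individual and refitting mimics the
# per-generation training-set update described above.
X2 = np.vstack([X, [[0.6]]])
model2 = rbf_fit(X2, f(X2[:, 0]))
```

Because the network grows with the training set, the per-generation update (adding 30% exactly evaluated individuals) steadily refines the approximation near the regions the search visits.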
Kampolis and Giannakoglou [7] solved the inverse design of an isolated airfoil at
two operating conditions. For this design problem, two reference airfoils and their operating conditions were defined (these solutions can be seen as the extremes of
the Pareto front), and a MOEA was used to find the tradeoff solutions between them.
The MOEA adopted was SPEA-2 [85]. In their approach, the authors proposed the
use of a radial basis function meta-model.
Use of Multiple Approximation Models: Most authors report the use of a single approximation model. However, it may be worth exploring the combination
of several of them for exploiting either their global or their local nature. This
idea has been explored in the past, for example, by Mack et al. [55], by using a
combination of polynomial response surface methods and radial basis functions,
for performing global sensitivity analysis and shape optimization of bluff bodies.
Also, Glaz et al. [28] adopted three approximation models, namely polynomial,
kriging, and radial basis functions. This combined approach used a weighted estimation from the different models to reduce the vibration of a helicopter rotor blade. To the authors' best knowledge, no similar combination of approaches has been reported when using MOEAs.
Automatic Switching: Considering that every approximation model has particular properties in terms of global or local accuracy, and that the selection of the
best approximation method to use for a particular application can also be considered a difficult task, one promising research area is to develop mechanisms
that automatically switch from one approximation method to another as the optimization process is executed. For example, a global approximation method (i.e., coarse-grained) could be used for exploration of the design
space, while a more locally accurate method (i.e., fine-grained) might be used for
solution exploitation.
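No established algorithm is implied by this research direction; purely as an illustration, a naive switching rule could select the surrogate based on how much of the evaluation budget has already been spent (the rule, the threshold, and the stand-in models below are all hypothetical):

```python
def pick_surrogate(progress, global_model, local_model, threshold=0.5):
    """Toy switching rule: use the coarse global model early in the run
    (exploration) and the locally accurate model later (exploitation).

    `progress` is the fraction of the evaluation budget already spent;
    the 0.5 threshold is an arbitrary illustrative choice. A real
    mechanism would more likely monitor surrogate error instead.
    """
    return global_model if progress < threshold else local_model

# Stand-in surrogates: a coarse trend vs. a locally refined fit.
global_model = lambda x: 2.0 * x                # coarse global trend
local_model = lambda x: 2.0 * x + 0.1 * x ** 2  # refined local model
assert pick_surrogate(0.2, global_model, local_model) is global_model
assert pick_surrogate(0.8, global_model, local_model) is local_model
```

A more realistic trigger would compare cross-validation errors of the candidate models on the exactly evaluated archive, but the control flow stays the same.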
Sampling Techniques: The accuracy of the approximation highly depends on
the sampling and updating technique used. In most cases, the initial sampling is
defined by Latin hypercube sampling, aiming to cover the design space as uniformly as possible. This can be considered a general-purpose technique. Another possibility
is to use application-dependent sampling techniques, where the initial sampling
design points are selected on the basis of reference or similar solutions. One
example of this sort of situation is reported by Chung et al. [13] and by Chung and
Alonso [12], where the initial approximation models are built around a reference
design in decision variable space.
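The Latin hypercube construction mentioned above is straightforward: stratify each dimension into N equal bins, place one jittered sample per bin, and permute the bin order independently per dimension. This is the basic variant, without the distance-based optimization some authors add:

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng=None):
    """Basic Latin hypercube sample in the unit cube [0, 1)^n_dims."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # One point per stratum: jitter inside each of the n_samples bins...
    u = (np.arange(n_samples)[:, None]
         + rng.random((n_samples, n_dims))) / n_samples
    # ...then permute the bin order independently in every dimension,
    # so the strata are decoupled across dimensions.
    for d in range(n_dims):
        u[:, d] = rng.permutation(u[:, d])
    return u

X = latin_hypercube(10, 3)
# Every dimension gets exactly one sample in each of the 10 equal bins.
assert all(sorted((X[:, d] * 10).astype(int).tolist()) == list(range(10))
           for d in range(3))
```

Samples in the unit cube are then rescaled to the actual decision-variable bounds; for the application-dependent variant discussed above, the same construction would simply be applied to a narrowed box around the reference design.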
References
[1] Alonso, J., LeGresley, P., Pereyra, V.: Aircraft Design Optimization. Mathematics and Computers in Simulation 79, 1948–1958 (2008)
[2] Arabnia, M., Ghaly, W.: A strategy for multi-objective shape optimization of turbine
stages in three-dimensional flow. In: 12th AIAA/ISSMO Multidisciplinary Analysis and
Optimization Conference, Victoria, British Columbia, Canada (2008)
[3] Beachkofski, B.K., Grandhi, R.V.: Improved Distributed Hypercube Sampling. In: 43rd
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Denver, CO, USA (2002)
[4] Bhattacharya, M., Lu, G.: A dynamic approximate fitness based hybrid EA for optimization problems. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1879–1886 (2003)
[5] Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, UK
(1995)
[6] Bueche, D., Schraudolph, N., Koumoutsakos, P.: Accelerating evolutionary algorithms
with Gaussian process fitness function models. IEEE Transactions on Systems, Man, and Cybernetics, Part C 35(2), 183–194 (2005)
[7] Kampolis, I.C., Giannakoglou, K.C.: A multilevel approach to single- and multiobjective aerodynamic optimization. Computer Methods in Applied Mechanics and Engineering 197, 2963–2975 (2008)
[8] Chen, J.H., Goldberg, D., Ho, S.Y., Sastry, K.: Fitness inheritance in multi-objective
optimization. In: Proceedings of Genetic and Evolutionary Computation Conference.
Morgan Kaufmann, San Francisco (2002)
[9] Chiba, K., Obayashi, S.: Data mining for multidisciplinary design space of regional-jet
wing. AIAA Journal of Aerospace Computing, Information, and Communication 4(11),
1019–1036 (2007)
[10] Chiba, K., Obayashi, S., Nakahashi, K., Morino, H.: High-Fidelity Multidisciplinary
Design Optimization of Wing Shape for Regional Jet Aircraft. In: Coello Coello, C.A.,
Hernandez Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 621–635.
Springer, Heidelberg (2005)
[11] Choi, S., Alonso, J.J., Chung, H.S.: Design of a low-boom supersonic business jet using
evolutionary algorithms and an adaptive unstructured mesh method. In: AIAA Paper
2004-1758, 45th AIAA/ASME/ASCE/AHS/ASC Structure, Structural Dynamics and
Materials Conference, Palm Springs, CA, USA (2004)
[12] Chung, H.S., Alonso, J.J.: Multiobjective optimization using approximation modelbased genetic algorithms. In: AIAA Paper 2004-4325, 10th AIAA/ISSMO Symposium
on Multidisciplinary Analysis and Optimization, Albany, New York, USA (2004)
[13] Chung, H.S., Choi, S., Alonso, J.J.: Supersonic business jet design using a knowledgebased genetic algorithm with an adaptive, unstructured grid methodology. In: AIAA Paper 2003-3791, 21st Applied Aerodynamics Conference, Orlando, Florida, USA (2003)
[14] Cinnella, P., Congedo, P.M.: Optimal Airfoil Shapes for Viscous Transonic Flows of
Dense Gases. In: AIAA Paper 2006-3881, 36th AIAA Fluid Dynamics Conference and
Exhibit, San Francisco, California, USA (2006)
[15] Coello Coello, C.A., Toscano Pulido, G.: A Micro-Genetic Algorithm for Multiobjective Optimization. In: Zitzler, E., Deb, K., Thiele, L., Coello Coello, C.A., Corne, D.W.
(eds.) EMO 2001. LNCS, vol. 1993, pp. 126–140. Springer, Heidelberg (2001)
[16] Coello Coello, C.A., Toscano Pulido, G.: Multiobjective Optimization using a MicroGenetic Algorithm. In: Spector, L., Goodman, E.D., Wu, A., Langdon, W., Voigt, H.M.,
Gen, M., Sen, S., Dorigo, M., Pezeshk, S., Garzon, M.H., Burke, E. (eds.) Proceedings
of the Genetic and Evolutionary Computation Conference (GECCO 2001), pp. 274–282. Morgan Kaufmann Publishers, San Francisco (2001)
[17] Coello Coello, C.A., Lamont, G.B., Van Veldhuizen, D.A.: Evolutionary Algorithms for
Solving Multi-Objective Problems, 2nd edn. Springer, New York (2007)
[18] Congedo, P.M., Corre, C., Cinnella, P.: Airfoil Shape Optimization for Transonic Flows
of Bethe-Zeldovich-Thompson Fluids. AIAA Journal 45(6), 1303–1316 (2007)
[19] Costa, M., Minisci, E.: MOPED: A multi-objective parzen-based estimation of distribution algorithm for continuous problems. In: Fonseca, C.M., Fleming, P.J., Zitzler, E.,
Deb, K., Thiele, L. (eds.) EMO 2003. LNCS, vol. 2632, pp. 282–294. Springer, Heidelberg (2003)
[20] D'Angelo, S., Minisci, E.A.: Multi-objective evolutionary optimization of subsonic airfoils by kriging approximation and evolutionary control. In: 2005 IEEE Congress on Evolutionary Computation (CEC 2005), Edinburgh, Scotland, vol. 2, pp. 1262–1267 (2005)
[21] Deb, K.: Multi-Objective Optimization using Evolutionary Algorithms. John Wiley &
Sons, Chichester (2001)
[22] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective
Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
[23] Ducheyne, E.I., De Baets, B., De Wulf, R.: Is Fitness Inheritance Useful for Real-World
Applications? In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.)
EMO 2003. LNCS, vol. 2632, pp. 31–42. Springer, Heidelberg (2003)
[38] Jeong, S., Suzuki, K., Obayashi, S., Kurita, M.: Improvement of nonlinear lateral characteristics of lifting-body type reentry vehicle using optimization algorithm. In: AIAA
Paper 2007-2893, AIAA Infotech@Aerospace 2007 Conference and Exhibit, Rohnert
Park, California, USA (2007)
[39] Jin, R., Chen, W., Simpson, T.: Comparative studies of metamodeling techniques under
multiple modeling criteria. Tech. Rep. 2000-4801, AIAA (2000)
[40] Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation.
Soft Computing 9(1), 3–12 (2005)
[41] Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 455–492 (1998)
[42] Knowles, J.: ParEGO: A hybrid algorithm with on-line landscape approximation for
expensive multiobjective optimization problems. IEEE Transactions on Evolutionary
Computation 10(1), 50–66 (2006)
[43] Knowles, J., Nakayama, H.: Meta-modeling in multiobjective optimization. In: Branke,
J., Deb, K., Miettinen, K., Slowinski, R. (eds.) Multiobjective Optimization-Interactive
and Evolutionary Approaches, pp. 245–284. Springer, Heidelberg (2008)
[44] Kumano, T., Jeong, S., Obayashi, S., Ito, Y., Hatanaka, K., Morino, H.: Multidisciplinary design optimization of wing shape for a small jet aircraft using kriging model.
In: AIAA Paper 2006-932, 44th AIAA Aerospace Science Meeting and Exhibit, Reno,
Nevada, USA (2006)
[45] Kung, H., Luccio, F., Preparata, F.: On finding the maxima of a set of vectors. Journal
of the Association for Computing Machinery 22(4), 469–476 (1975)
[46] Becerra, R.L., Coello Coello, C.A.: Solving Hard Multiobjective Optimization Problems Using ε-Constraint with Cultured Differential Evolution. In: Runarsson, T.P., Beyer, H.-G.,
Burke, E.K., Merelo-Guervos, J.J., Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS,
vol. 4193, pp. 543–552. Springer, Heidelberg (2006)
[47] Langer, H., Puhlhofer, T., Baier, H.: A multi-objective evolutionary algorithm with integrated response surface functionalities for configuration optimization with discrete
variables. In: AIAA Paper 2004-4326, 10th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization Conference, Albany, New York, USA (2004)
[48] Lee, D., Gonzalez, L., Periaux, J., Srinivas, K.: Robust design optimisation using multiobjective evolutionary algorithms. Computers & Fluids 37, 565–583 (2008)
[49] Lee, D., Gonzalez, L., Srinivas, K., Periaux, J.: Robust evolutionary algorithms for
UAV/UCAV aerodynamic and RCS design optimisation. Computers & Fluids 37, 547–564 (2008)
[50] Lee, D.S., Gonzalez, L.F., Srinivas, K., Auld, D.J., Wong, K.C.: Aerodynamics/rcs
shape optimisation of unmanned aerial vehicles using hierarchical asynchronous parallel evolutionary algorithms. In: AIAA Paper 2006-3331, 24th AIAA Applied Aerodynamics Conference, San Francisco, California, USA (2006)
[51] Lee, D.S., Gonzalez, L.F., Srinivas, K., Periaux, J.: Multi-objective robust design optimisation using hierarchical asynchronous parallel evolutionary algorithms. In: AIAA
Paper 2007-1169, 45th AIAA Aerospace Science Meeting and Exhibit, Reno, Nevada,
USA (2007)
[52] Lian, Y., Liou, M.S.: Multiobjective Optimization Using Coupled Response Surface
Model and Evolutionary Algorithm. In: AIAA Paper 2004-4323, 10th AIAA/ISSMO
Multidisciplinary Analysis and Optimization Conference, Albany, New York, USA
(2004)
[53] Lian, Y., Liou, M.S.: Multi-Objective Optimization of a Transonic Compressor Blade Using Evolutionary Algorithm. In: AIAA Paper 2005-1816, 46th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics & Materials Conference, Austin, Texas, USA (2005)
[54] Lian, Y., Liou, M.S.: Multi-objective Optimization of Transonic Compressor Blade Using Evolutionary Algorithm. Journal of Propulsion and Power 21(6), 979–987 (2005)
[55] Mack, Y., Goel, T., Shyy, W., Haftka, R., Queipo, N.: Multiple Surrogates for the Shape
Optimization of Bluff Body-Facilitated Mixing. In: AIAA Paper 2005333, 43rd AIAA
Aerospace Sciences Metting and Exhibit, Reno, Nevada, USA (2005)
[56] Nakayama, H., Sawaragi, Y.: Satisficing trade-off method for multi-objective programming. In: Grauer, M., Wierzbicki, A. (eds.) Interactive Decision Analysis, pp. 113122.
Springer, Heidelberg (1984)
[57] Nakayama, H., Yun, Y.: Support vector regression based on goal programming and
multi-objective programming. In: IEEE World Congress on Computational Intelligence
(2006)
[58] Nakayama, H., Inoue, K., Yoshimori, Y.: Approximate optimization using computational intelligence and its application to reinforcement of cable-stayed bridges. In: Zha,
X., Howlett, R. (eds.) Integrated Intelligent Systems for Engineering Design, pp. 289
304. IOS Press, Amsterdam (2006)
[59] Obayashi, S., Sasaki, D.: Self-organizing map of pareto solutions obtained from multiobjective supersonic wing design. In: AIAA Paper 20020991, 40th Aerospace Science
Meeting and Exhibit, Reno, Nevada, USA (2002)
[60] Ong, Y.S., Nair, P.B., Keane, A.J., Wong, K.W.: Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In: Jin, Y. (ed.)
Knowledge Incorporation in Evolutionary Computation. Studies in Fuzziness and Soft
Computing, pp. 307332. Springer, Heidelberg (2004)
[61] Pagano, A., Federico, L., Barbarino, M., GUida, F., Aversano, M.: Multi-objective
Aeroacoustic Optimization of an Aircraft Propeller. In: 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Victoria, British Columbia Canada
(2008)
[62] Pierret, S.: Turbomachinery blade design using a Navier-Stokes solver and artificial
neural network. ASME Journal of Turbomachinery 121(3), 326332 (1999)
[63] Rai, M.M.: Robust Optimal Aerodynamic Design Using Evolutionary Methods and
Neural Networks. In: AIAA Paper 2004-778, 42nd AIAA Aerospace Science Meeting
and Exhibit, Reno, Nevada, USA (2004)
[64] Rai, M.M.: Robust Optimal Design With Differential Evolution. In: AIAA Paper 20044588, 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Albany, New York, USA (2004)
[65] Rasheed, K., Ni, X., Vattam, S.: Comparison of methods for developing dynamic reduced models for design optimization. Soft Computing 9(1), 2937 (2005)
[66] Ratle, A.: Accelerating the convergence of evolutionary algorithms by fitness landscape
approximation. In: Eiben, A.E., Back, T., Schoenauer, M., Schwefel, H.-P. (eds.) PPSN
1998. LNCS, vol. 1498, pp. 8796. Springer, Heidelberg (1998)
[67] Reyes Sierra, M., Coello Coello, C.A.: A Study of Fitness Inheritance and Approximation Techniques for Multi-Objective Particle Swarm Optimization. In: 2005 IEEE
Congress on Evolutionary Computation (CEC 2005), vol. 1, pp. 6572. IEEE Service
Center, Edinburgh (2005)
[68] Sacks, J., Welch, W., Mitchell, T., Wynn, H.: Design and analysis of computer experiments (with discussion). Statistical Science 4, 409435 (1989)
59
[69] Sasaki, D., Obayashi, S., Nakahashi, K.: Navier stokes optimzation of supersonic wings
with four objectives using evolutionary algorithms. In: AIAA Paper 20012531, 15th
AIAA Computational Fluid Dynamics Confernce, Anaheim, CA, USA (2001)
[70] Sasaki, D., Obayashi, S., Nakahashi, K.: Navier-Stokes Optimization of Supersonic
Wings with Four Objectives Using Evolutionary Algorithms. Journal of Aircraft 39(4),
621629 (2002)
[71] Abboud, K., Schoenauer, M.: Surrogate deterministic mutation: Preliminary results. In:
Collet, P., Fonlupt, C., Hao, J.-K., Lutton, E., Schoenauer, M. (eds.) EA 2001. LNCS,
vol. 2310, pp. 104115. Springer, Heidelberg (2002)
[72] Smith, R.E., Dike, B.A., Stegmann, S.A.: Fitness inheritance in genetic algorithms. In:
SAC 1995: Proceedings of the 1995 ACM symposium on Applied computing, pp. 345
350. ACM Press, New York (1995)
[73] Song, W., Keane, A.J.: Surrogate-based aerodynamic shape optimization of a civil aircraft engine nacelle. AIAA Journal 45(10), 2652574 (2007)
[74] Srinivas, N., Deb, K.: Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation 2(3), 221248 (1994)
[75] Todoroki, A., Sekishiro, M.: Dimensions and laminates optimization of hat-stiffened
composite panel with buckling load constraint using multi-objective ga. In: AIAA Paper
20072880, AIAA infotech@Aerospace 2007 Conference and Exhibit, Rohnert Park,
California, USA (2007)
[76] Todoroki, A., Sekishiro, M.: Modified efficient global optimization for a hat-stiffened
composite panel with buckling constraint. AIAA Journal 46(9), 22572264 (2008)
[77] Ulmer, H., Streicher, F., Zell, A.: Model-assisted steady-state evolution strategies. In:
Cantu-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., OReilly, U.-M., Beyer, H.G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A.,
Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO
2003. LNCS, vol. 2723, pp. 610621. Springer, Heidelberg (2003)
[78] Ulmer, H., Streichert, F., Zell, A.: Evolution startegies assisted by gaussian processes
with improved pre-selection criterion. In: Proceedings of IEEE Congress on Evolutionary Computation, pp. 692699 (2003)
[79] Vapnik, V.: Statistical Learning Theory. Wiley, Chichester (1998)
[80] Vapnik, V.N.: The Nature of Statistical Learning. Springer, Heidelberg (1995)
[81] Voutchkov, I., Keane, A.J., Fox, R.: Robust structural design of a simplified jet engine
model, using multiobjective optimization. AIAA Paper 20067003, Portsmouth, Virginia, USA (2006)
[82] Williams, C.K.I., Rasmussen, C.E.: Gaussian processes for regression. In: Touretzky,
D.S., Mozer, M.C., Hasselmo, M.E. (eds.) Advances in Neural Information Processing
Systems, vol. 8. MIT Press, Cambridge (1996)
[83] Won, K., Ray, T.: Performance of kriging and cokriging based surrogate models within
the unified framework for surrogate assisted optimization. In: Congress on Evolutionary
Computation, pp. 15771585. IEEE, Los Alamitos (2004)
[84] Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms:
Empirical Results. Evolutionary Computation 8(2), 173195 (2000)
[85] Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolutionary Algorithm. In: Giannakoglou, K., Tsahalis, D., Periaux, J., Papailou, P., Fogarty, T.
(eds.) EUROGEN 2001. Evolutionary Methods for Design, Optimization and Control
with Applications to Industrial Problems, Athens, Greece, pp. 95100 (2002)
Chapter 3
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 61–84.
© Springer-Verlag Berlin Heidelberg 2010
3.1 Introduction
The term computationally expensive optimization problems refers to design or optimization applications that require an excessive number of calls to costly evaluation software in order to locate the optimal solution(s). Typical examples are optimizations in which the evaluation of candidate solutions calls for the numerical solution of partial differential equations or relies on Monte Carlo techniques to account for uncertainties. Without loss of generality, we will restrict ourselves to design-optimization problems
with aerodynamic performance criteria. Therefore, the evaluation software might
be any Computational Fluid Dynamics (CFD) code. Depending on the selected flow
model, the problem dimension (2D or 3D) and the complexity of the flow domain
(affecting the computational grid size, if such a grid is needed), the cost of running
the CFD code may range from a few minutes to several hours on many CPUs.
In aerodynamic shape optimization, the use of either gradient-based methods or global search metaheuristics is steadily increasing [48, 82]. Though they usually appear as rival methods, they can be hybridized to create more efficient optimization methods (see also [58], chapter 16). In the present chapter, an evolutionary algorithm (EA, [4, 34, 57]) is the key search method. EAs are assisted by metamodels and fitness inheritance, hybridized with gradient-based methods and structured as multilevel search algorithms. They are also used in distributed search schemes, properly adapted for use on multiprocessor platforms, and may become Grid-enabled.
EAs are gradient-free methods that may accommodate any ready-to-use analysis software, even a commercial off-the-shelf one, without access to its source code. In aerodynamic optimization problems, however, they become computationally demanding if (at least some of) the add-on features discussed in this chapter or elsewhere in this book are not used. This is due to the high number of candidate solutions that must be evaluated. For EAs to become routine industrial tools, much focus has been placed on methods that reduce the number of evaluations required and, thus, the CPU cost. To this end, most of the existing papers rely on surrogate evaluation models or metamodels, i.e. evaluation methods of lower accuracy and CPU cost. The so-called metamodel-assisted EAs (MAEAs, [26, 67]) use both the exact and costly problem-specific evaluation model and the approximate and computationally cheap metamodel, according to coupling schemes to be discussed below.
In the so-called MAEAs with offline trained metamodels [7, 8, 20, 25, 35, 38, 41, 62, 71, 73], the metamodel is trained in advance, i.e. separately from the evolution, which relies exclusively on it. The problem-specific tool is used to evaluate a number of selected samples on which the metamodel is trained, as well as to cross-check the outcome of the metamodel-based optimization.
In MAEAs with online trained metamodels [6, 18, 19, 26, 33, 38, 44, 66, 77, 83], the metamodel(s) and the problem-specific model are used in an interleaving way during the evolution. The metamodels may be local (valid over only a part of the design space) or global (valid over the entire design space). The more frequently metamodels are used in place of the exact model, the greater the gain in CPU cost.
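The pre-screening logic behind such schemes (evaluate all offspring on the cheap metamodel, re-evaluate only the most promising ones on the exact model) can be sketched as follows. This is a minimal illustration under assumed names and a simple ranking rule, not the chapter's exact algorithm.

```python
import numpy as np

def prescreen_and_rank(f_exact, f_meta, pop, n_exact):
    """One selection step of a metamodel-assisted EA (sketch).

    All offspring are first screened on the cheap metamodel f_meta;
    only the n_exact most promising ones are re-evaluated on the
    costly problem-specific model f_exact (minimization assumed).
    """
    approx = np.array([f_meta(x) for x in pop])      # cheap pass on all
    best = np.argsort(approx)[:n_exact]              # promising subset
    exact = {int(i): f_exact(pop[i]) for i in best}  # costly pass on few
    # rank members: exact fitness where available, metamodel value otherwise
    order = sorted(range(len(pop)), key=lambda i: exact.get(i, approx[i]))
    return [pop[i] for i in order]

# toy usage: sphere function with a noisy surrogate of it
rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x ** 2))
f_hat = lambda x: f(x) + rng.normal(0.0, 0.1)
pop = [rng.normal(size=3) for _ in range(20)]
ranked = prescreen_and_rank(f, f_hat, pop, n_exact=3)
```

Only `n_exact` calls to the expensive model are made per generation, which is the source of the CPU cost gain discussed in the text.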
[Figure: MAEA flowchart — Initialization, Evaluations, New Population, Evolution Operations, Pre-Evaluation, Selection Based on Approximate Fitness]
The Gaussian activation function is

\[
G\big(\|x - c^{(k)}\|_2,\, r_k\big) = \exp\!\left(-\frac{\|x - c^{(k)}\|_2^2}{r_k^2}\right) \qquad (3.1)
\]

and the network response is

\[
\hat{y}(x) = \sum_{k=1}^{K} w_k \, G\big(\|x - c^{(k)}\|_2,\, r_k\big) \qquad (3.2)
\]

Therefore, assuming K centers c^{(k)} and T > K training patterns x^{(t)}, the network training requires the solution of the following system of linear equations

\[
\sum_{k=1}^{K} w_k \, G\big(\|x^{(t)} - c^{(k)}\|_2,\, r_k\big) = y^{(t)}, \qquad t = 1, \dots, T
\]

(where y^{(t)} are the known responses) using the least-squares algorithm.
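The least-squares training of the weights w_k amounts to solving the overdetermined linear system above; a minimal sketch (function names and the toy data are illustrative assumptions):

```python
import numpy as np

def train_rbf(X, y, C, r):
    """Least-squares training of the Gaussian RBF weights w_k.

    X : (T, N) training patterns, y : (T,) known responses,
    C : (K, N) centers, r : (K,) radii, with T > K as in the text.
    """
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # ||x^(t)-c^(k)||^2
    G = np.exp(-d2 / r[None, :] ** 2)                    # T x K design matrix
    w, *_ = np.linalg.lstsq(G, y, rcond=None)            # least-squares solve
    return w

def rbf_predict(x, C, r, w):
    """Network response (eq. 3.2) at a single point x."""
    d2 = ((x[None, :] - C) ** 2).sum(-1)
    return float(np.exp(-d2 / r ** 2) @ w)

# toy usage: fit a 1D quadratic with K=5 centers and T=40 patterns
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(40, 1))
y = X[:, 0] ** 2
C = np.linspace(-1, 1, 5)[:, None]
r = np.full(5, 0.7)
w = train_rbf(X, y, C, r)
```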
The selection of the RBF centers is carried out as in [44]. It is based on self-organizing maps (SOMs, [22, 36]) and an iterative scheme with both unsupervised and supervised learning. During the unsupervised learning, the SOMs classify the training patterns into K clusters. Each cluster gives a single RBF center c^{(k)} and the corresponding radius r_k, through heuristics based on distances between the centers [5, 23, 36, 47, 60]. During the supervised learning, the synaptic weights are calculated by minimizing the approximation error over the training set.
Herein, a variant of RBF networks enhanced by Importance Factors (IFs, denoted by I_n, n = 1, ..., N), as proposed in [29], is used. The modified network incorporates the I_n factors, which quantify how much the network response is affected by each design variable. The higher the I_n value, the higher the response sensitivity with respect to the nth input variable. A weighted norm, defined by

\[
\|x - c^{(k)}\|_{wei} = \sqrt{\sum_{n=1}^{N} I_n \big(x_n - c_n^{(k)}\big)^2},
\]

is used in place of the standard Euclidean distance.
The importance factors are computed from the derivatives of the response with respect to the design variables,

\[
I_n = \frac{\left|\frac{\partial y}{\partial x_n}\right|^{(b)}}{\sum_{i=1}^{N} \left|\frac{\partial y}{\partial x_i}\right|^{(b)}} \qquad (3.3)
\]
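A sketch of the importance factors and the IF-weighted distance follows; averaging the sampled derivatives over several points is an assumption of this sketch, as are all names.

```python
import numpy as np

def importance_factors(grads):
    """Importance factors I_n in the spirit of eq. (3.3).

    grads : (B, N) array of dy/dx_n sampled at B points; the factors
    normalize the mean absolute sensitivity of each input variable,
    so they are non-negative and sum to one.
    """
    s = np.abs(grads).mean(axis=0)
    return s / s.sum()

def weighted_dist2(x, c, I):
    """Squared IF-weighted distance ||x - c||_wei^2 used by the network."""
    return float((I * (x - c) ** 2).sum())

# toy usage: y = 3*x1 + 0.1*x2, so x1 dominates the sensitivity
g = np.array([[3.0, 0.1], [3.0, 0.1]])
I = importance_factors(g)
```

Distances along insensitive directions are thus shrunk, so the RBF response varies mostly along the design variables that actually matter.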
Fig. 3.2 Redesign of the RAE2822 airfoil. Convergence history plotted in terms of the number of evaluations on the problem-specific model (left). C_P distribution on the optimal airfoil (right)
Fig. 3.3 Design of an axial compressor cascade airfoil. Pareto front approximations using a conventional EA and two MAEAs (left). The contour and pressure coefficient (C_P) distribution on the Pareto front member marked with A (right)
Fig. 3.4 The three modes of the multilevel (herein L = 2) algorithm at a glance: (a) Multilevel Evaluation, (b) Multilevel Search and (c) Multilevel Parameterization
(a) Multilevel Evaluation: In this mode, each level is associated with an evaluation model of different CPU cost and fidelity. The problem-specific (high-fidelity) evaluation model is employed on the high level. One- or two-way inter-level migrations can be used. In one-way migrations, a small number of best-performing individuals is directed upwards, with no feedback at all. On the high level, immigrants replace badly performing and/or randomly selected population members (assuming that an EA or a MAEA is used on both levels). In the two-way migration scheme, promising individuals from the high level may also move downwards, to stimulate better search in their neighborhood.
The multilevel evaluation algorithm is often configured with different population sizes per level, usually a large population on the low level and a small one on the high level, to compensate for the high CPU cost per evaluation and to synchronize better with the low level. The best choice certainly depends on the CPU cost ratio and the number of processors used.
The multilevel evaluation mode is appropriate for aerodynamic shape optimization problems. For instance, in flows dominated by viscous effects, either a Navier–Stokes solver coupled with wall functions on a coarse grid or an integral boundary layer method can be employed on the low level. The high level must rely on a model with the desired accuracy, such as a Navier–Stokes solver with a low-Reynolds number turbulence model and a much finer grid. Alternatively, the same CFD tool running with different grids and/or different convergence criteria can be used on the two levels.
(b) Multilevel Search: In this mode, each level is associated with a different search technique (EA, conjugate gradient, Sequential Quadratic Programming (SQP), etc., [63]). Stochastic search techniques, such as EAs, are preferably used on the low level, to adequately explore the design space. On the high level, the refinement of promising solutions can be carried out through gradient-based methods or stochastic, individual-based methods (such as simulated annealing). The migration of promising solutions is preferably bidirectional, to accentuate the exploration capabilities of the low-level EAs.
The coupling of stochastic, population-based methods and gradient-based algorithms is not new. In the literature, hybrid optimization methods are mostly restricted to SOO problems or to MOO ones in which the objectives are aggregated into a single function. In [39], a genuine multilevel search method for MOO problems was proposed. In this method, on the low level, the EA or MAEA computes approximations to the Pareto front using a known scalar utility assignment (SPEA2, [86]). On the high level, the scalar utility gradient is computed for a few selected non-dominated solutions and a descent algorithm is used to improve them with respect to all objectives. This is carried out using the chain rule, after replacing the derivative (delta function) of the non-differentiable SPEA2 utility function (terms that involve the Heaviside function) with a differentiable approximation.
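For illustration, one standard differentiable surrogate for such Heaviside terms is the logistic approximation (this specific choice is an assumption of this sketch; the method only requires some smooth approximation):

\[
H(x) \;\approx\; H_\varepsilon(x) = \frac{1}{1 + e^{-x/\varepsilon}},
\qquad
\frac{dH_\varepsilon}{dx} = \frac{1}{\varepsilon}\, H_\varepsilon(x)\,\big(1 - H_\varepsilon(x)\big),
\]

which keeps the utility gradient well defined everywhere and recovers the step function as \(\varepsilon \to 0\).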
If a gradient-based search is used on the high level, the gradient of the objective function must be computed or approximated. To this end, in aerodynamic optimization, the adjoint approach can be used at about the cost of an additional flow solution [2, 69, 70].
(c) Multilevel Parameterization: The third mode associates a different set of design variables with each level. On the low level, a problem with just a few design variables is solved. On the high level, the detailed problem parameterization is used. To support migrations, (exact or approximate) transformations between the different parameterizations must be available. All immigrants must be transformed to the parameterization scheme of the destination level. When working with NURBS curves or surfaces, knot insertion and removal properties and formulas [72] can be used to switch between levels with different parameterizations.
In constrained problems, the term different parameterization may also imply that the constraints are handled differently on each level. For instance, on the low level, constraints may be relaxed or even ignored, thus allowing even infeasible but promising solutions to be sent to the high level.
The three modes can be used either separately or all together. As mentioned above, the term HDMAEA denotes an optimization method of multilevel structure which may accommodate distributed search on all or some of its levels. In a HDMAEA, all levels using DEAs or DMAEAs regularly perform intra-level (i.e. inter-deme) migrations, in addition to the inter-level ones. If a level has not yet reached the generation (or iteration, in gradient-based methods) marked for inter-level migration whereas the other level has, the one ahead suspends its evolution and waits for synchronization. The same holds for the intra-level migration between demes. When ineffective inter-level migrations occur (i.e. when all immigrants perform worse than the destination-level individuals) for a user-defined number of consecutive generations/iterations, the evolution on the lower level terminates.
Fig. 3.5 The distributed hierarchical EA (DHEA) with one metamodel (E0) and two problem-specific tools (E1, E2). Each of the three demes applies its own evolution operators and the E0, E1, E2 evaluation passes; migration links the demes
Fig. 3.6 Design of a 2D transonic compressor cascade using a HDMAEA (multilevel evaluation): Pareto front obtained at the cost of 311 cost units; airfoil shapes and iso-Mach number contours for three Pareto members. From [43]
Fig. 3.8 Design of a compressor stator cascade. Convergence plot of a multilevel parameterization algorithm and a single-level MAEA. From [39]
pressure rise, subject to airfoil-thickness-related constraints. The flow conditions were M_out,is = 0.45, an inlet flow angle of 47°, and Re_C = 8.41 × 10^5. Local metamodels (E0), a high-fidelity CFD model (E2) and a low-fidelity one (E1, based on the same iterative solution methods with relaxed convergence criteria) were used. The CPU cost ratio of E1 and E2 was about 0.1 : 1.
In this case, three algorithms are compared:
1. A single-level (15, 60) EA using E2-based evaluations.
2. A DHEA with three (5, 20) demes and two evaluation passes (on E1 and E2) within each deme. During the first pass, the 20 offspring were evaluated on E1 and only the top two of them were re-evaluated on E2.
3. A 3 × (5, 20) DHMAEA with three evaluation passes per deme. Upon completion of the 20 E0-based evaluations, the 10 best among them were re-evaluated on E1 and the 3 best of those on E2. All metamodels were trained on the fly, using previously evaluated (on E1) neighbors.
In both distributed algorithms, the migration operator was employed every 8 generations by exchanging two individuals between any pair of demes. The two emigrants of each deme were selected after ranking the 20 population members in terms of their fitness, irrespective of the evaluation tool used. In each destination deme, the immigrants replaced the worst-performing members evaluated on the same or a lower-fidelity model.
The three algorithms are compared in terms of the hypervolume indicator [87] in fig. 3.9, which quantifies the part of the objective space (up to a user-defined point) dominated by the front; larger indicator values correspond to better Pareto front approximations. The combined use of metamodels and hierarchical schemes achieves the best performance. The same figure also shows the Pareto front approximation (at the cost of 1000 CPU cost units) computed by the DHMAEA.
Fig. 3.9 Two-objective compressor cascade airfoil design: Evolution of the hypervolume indicator for the three tested algorithms (left) and the Pareto front approximation computed by the DHMAEA (right)
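For two objectives, the hypervolume indicator reduces to a sweep over the front sorted by the first objective; a minimal sketch (minimization form, function name illustrative):

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective front (minimization) w.r.t. a
    user-defined reference point `ref`, in the spirit of [87].

    Sums the rectangles dominated by successive non-dominated points,
    assuming all points lie within the box bounded by ref.
    """
    pts = sorted(front)                 # ascending in the first objective
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                # skip dominated points
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

# toy usage: two non-dominated points against reference (1, 1)
hv = hypervolume_2d([(0.2, 0.6), (0.5, 0.3)], ref=(1.0, 1.0))
```

Larger values indicate a front that dominates more of the objective space, which is exactly the comparison made in fig. 3.9.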
Fig. 3.10 Optimization of an annular cascade: Reference airfoil (continuous line), its control points polygon (dashed line) and the design variable bounds (left). Reference (dashed) and optimal (continuous) blade airfoil contours (right)
Fig. 3.11 Optimization of an annular cascade: Views of the hybrid fine grid used for E2-based evaluations (left) and view of the surface grid over the reference blade (right)
walls satisfied the usual constraint on the non-dimensional distance y+ from the wall (y+ < 1). All subsequent layers of elements were arranged using a geometrical progression law. A layer of pyramids was used to interface the hexahedral structured-like layers and the tetrahedra filling the inner part of the domain. The same grid generation procedure was also used to support the low-fidelity tool E1, with different parameters, however. E1 used the Spalart–Allmaras turbulence model with wall functions and a coarser hybrid unstructured grid of about 600,000 nodes. The blade surface was discretized using a 400 × 85 grid; larger values of the progression ratio and of the distance of the first node layer off the wall were used. Table 3.1 compares the basic features of the fine (for E2) and coarse (for E1) grids.
This optimization was carried out on a cluster of 25 nodes (2 × Quad-Core Xeon at 2.0 GHz, with 16 GB RAM each). Each population member was evaluated in parallel on a single node. The wall-clock time required for a single E2-based evaluation was about 6 hours. The same evaluation on E1 required about 1.2 hours. Below, one CPU cost unit is assigned to each evaluation on E2 and 0.2 units to each one on E1. Note that even if an evaluation failed quite early (e.g. during grid generation), it was assigned the full CPU cost. These figures are also summarized in table 3.1.
To compare the modeling accuracy of E1 and E2, the reference cascade was first analyzed using both tools. The computed radial distributions of the circumferentially mass-averaged total pressure on the (same) exit plane are compared in fig. 3.12 (left). Differences between the two models are reflected in the PLC_ave values, which were equal to 0.1498 (based on E1) and 0.1189 (on E2).
A (20, 60) hierarchical EA with a single population was used. Fitness inheritance and RBF networks supported the IPE process. During the first generation, all individuals were evaluated on E1 and only the top 6 of them were re-evaluated on E2. In the second generation, the fitness inheritance technique was activated. The most promising members of the population (between 5 and 10 of them) were re-evaluated
on E1 and only the best of them on E2. Once 100 individuals previously evaluated on E1 were archived in the database, RBF networks were used in place of fitness inheritance. The hierarchical EA was stopped at 80 CPU cost units.
Fig. 3.12 (right) illustrates the convergence of the optimization algorithm. The mass-averaged PLC values of the reference and optimal airfoils were found equal to 0.1189 and 0.0843, respectively. The gain from using the hierarchical algorithm is clear, since a MAEA based merely on E2 would have been unable to locate the optimal solution at the cost of 80 CPU cost units.
Table 3.2 Optimization of an annular cascade: Analysis of the overall CPU cost
Fig. 3.14 Optimization of an annular cascade: Total pressure loss fields on transversal cross-sections located at 0.782 C_ax (top) and 1.145 C_ax (bottom) downstream of the blade leading edge. The reference (left) and optimal (right) blades are included
3.8 Conclusions
Several algorithms capable of reducing the number of evaluations and the wall-clock time of EA-based optimization were presented. They were based on:
• Metamodels, which replace as many calls to the problem-specific evaluation model as possible. RBF networks have been used, assisted by fitness inheritance during the first few generations. However, any other artificial neural network, such as a multilayer perceptron, Gaussian processes or even polynomial regression, could have been used instead [18, 19, 25].
• Distributed search, in the form of intercommunicating demes, which is particularly advantageous if the design is carried out on multiprocessor platforms.
• Hierarchical schemes, splitting the search into levels based on evaluation tools of different fidelity and CPU cost, on different coarse and fine parameterizations, or involving gradient-based search for refinement.
These ingredients can be combined, very efficiently indeed, in various ways. In this chapter we presented hierarchical distributed search (where the levels are independently structured in demes) and distributed hierarchical search (where the hierarchy is implicit to each deme). It was beyond the scope of this chapter to compare the two aforementioned schemes; from several tests, any such conclusion has been seen to be case dependent. However, it was clearly demonstrated that the combination of the above schemes leads to a considerable economy in CPU cost. Moreover, in this work, an EA served as the base search method; the presented hierarchical framework, as well as the IPE technique, can be directly extended to any other stochastic search technique, such as evolution strategies with covariance matrix adaptation [3] or particle swarm optimization [51].
References
1. Alba, E., Tomassini, M.: Parallelism and evolutionary algorithms. IEEE Trans. on Evolutionary Computation 6(5) (2002)
2. Asouti, V., Zymaris, A., Papadimitriou, D., Giannakoglou, K.: Continuous and discrete adjoint approaches for aerodynamic shape optimization with low Mach number preconditioning. Int. J. for Numerical Methods in Fluids 57(10), 1485–1504 (2008)
3. Auger, A., Hansen, N.: A restart CMA evolution strategy with increasing population size. In: CEC 2005, UK, vol. 2, pp. 1769–1776 (2005)
4. Back, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)
5. Benoudjit, N., Archambeau, C., Lendasse, A., Lee, J., Verleysen, M.: Width optimization of the Gaussian kernels in radial basis function networks. In: ESANN 2002, Bruges, pp. 425–432 (2002)
6. Branke, J., Schmidt, C.: Faster convergence by means of fitness estimation. Soft Computing - A Fusion of Foundations, Methodologies & Applications 9(1), 13–20 (2005)
7. Buche, D., Schraudolph, N., Koumoutsakos, P.: Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans. on Systems, Man & Cybernetics - Part C: Applications & Reviews 35(2), 183–194 (2005)
8. Bull, L.: On model-based evolutionary computation. Soft Computing - A Fusion of Foundations, Methodologies & Applications 3(2), 76–82 (1999)
9. Cantu-Paz, E.: A survey of parallel genetic algorithms. Calculateurs Paralleles, Reseaux et Systemes Repartis 10(2), 141–171 (1998)
10. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)
11. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation 6(2), 182–197 (2002)
12. Desideri, J., Janka, A.: Hierarchical parameterization for multilevel evolutionary shape optimization with application to aerodynamics. In: EUROGEN 2003, Barcelona (2003)
13. Doorly, D.J., Peiro, J.: Supervised parallel genetic algorithms in aerodynamic optimisation. AIAA Paper 1997-1852 (1997)
14. Drela, M., Giles, M.: Viscous-inviscid analysis of transonic and low Reynolds number airfoils. AIAA J. 25(10), 1347–1355 (1987)
15. Ducheyne, E., De Baets, B., De Wulf, R.: Fitness inheritance in multiple objective evolutionary algorithms: A test bench and real-world evaluation. Applied Soft Computing 8(1), 337–349 (2008)
16. Duvigneau, R., Chaigne, B., Desideri, J.: Multi-level parameterization for shape optimization in aerodynamics and electromagnetics using a particle swarm optimization algorithm. Tech. Rep. RR-6003, INRIA, France (2006)
17. Eby, D., Averill, R., Punch III, W., Goodman, E.: Evaluation of injection island GA performance on flywheel design optimization. In: Proceedings of the 3rd Conf. on Adaptive Computing in Design & Manufacturing, pp. 121–136. Springer, Heidelberg (1998)
32. Giotis, A., Giannakoglou, K.: Single- and multi-objective airfoil design using genetic algorithms and artificial intelligence. In: EUROGEN 1999, Jyvaskyla (1999)
33. Giotis, A., Giannakoglou, K., Periaux, J.: A reduced-cost multi-objective optimization method based on the Pareto front technique, neural networks and PVM. In: ECCOMAS 2000, Barcelona (2000)
34. Goldberg, D.: Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Reading (1989)
35. Greenman, R., Roth, K.: Minimizing computational data requirements for multi-element airfoils using neural networks. AIAA Paper 1999-0258 (1999)
36. Haykin, S.: Neural Networks - A Comprehensive Foundation, 2nd edn. Prentice Hall, Englewood Cliffs (1999)
37. Herrera, F., Lozano, M., Moraga, C.: Hierarchical distributed genetic algorithms. Int. J. of Intelligent Systems 14(9), 1099–1121 (1999)
38. Jin, Y., Olhofer, M., Sendhoff, B.: A framework for evolutionary optimization with approximate fitness functions. IEEE Trans. on Evolutionary Computation 6(5), 481–494 (2002)
39. Kampolis, I., Giannakoglou, K.: A multilevel approach to single- and multiobjective aerodynamic optimization. Computer Methods in Applied Mechanics & Engineering 197, 2963–2975 (2008)
40. Kampolis, I., Giannakoglou, K.: Distributed evolutionary algorithms with hierarchical evaluation. Engineering Optimization (to appear, 2009)
41. Kampolis, I., Karangelos, E., Giannakoglou, K.: Gradient-assisted radial basis function networks: theory and applications. Applied Mathematical Modelling 28(13), 197–209 (2004)
42. Kampolis, I., Papadimitriou, D., Giannakoglou, K.: Evolutionary optimization using a new radial basis function network and the adjoint formulation. Inverse Problems in Science & Engineering 14(4), 397–410 (2006)
43. Kampolis, I., Zymaris, A., Asouti, V., Giannakoglou, K.: Multilevel optimization strategies based on metamodel-assisted evolutionary algorithms, for computationally expensive problems. In: CEC 2007, Singapore (2007)
44. Karakasis, M., Giannakoglou, K.: On the use of metamodel-assisted, multi-objective evolutionary algorithms. Engineering Optimization 38(8), 941–957 (2006)
45. Karakasis, M., Giotis, A., Giannakoglou, K.: Inexact information aided, low-cost, distributed genetic algorithms for aerodynamic shape optimization. Int. J. for Numerical Methods in Fluids 43(10-11), 1149–1166 (2003)
46. Karakasis, M., Koubogiannis, D., Giannakoglou, K.: Hierarchical distributed evolutionary algorithms in shape optimization. Int. J. for Numerical Methods in Fluids 53(3), 455–469 (2007)
47. Karayiannis, N., Mi, G.: Growing radial basis neural networks: Merging supervised and unsupervised learning with network growth techniques. IEEE Trans. on Neural Networks 8(6), 1492–1506 (1997)
48. Keane, A., Nair, P.: Computational Approaches for Aerospace Design - The Pursuit of Excellence. John Wiley & Sons, Ltd., Chichester (2005)
49. Knowles, J., Corne, D.: M-PAES: A memetic algorithm for multiobjective optimization. In: CEC 2000, pp. 325–332. IEEE Press, Los Alamitos (2000)
50. Lambropoulos, N., Koubogiannis, D., Giannakoglou, K.: Acceleration of a Navier-Stokes equation solver for unstructured grids using agglomeration multigrid and parallel processing. Computer Methods in Applied Mechanics & Engineering 193, 781–803 (2004)
51. Langdo, W., Poli, R.: Evolving problems to learn about particle swarm and other optimisers. In: CEC 2005, UK, pp. 8188 (2005)
52. Liakopoulos, P., Kampolis, I., Giannakoglou, K.: Grid-enabled, hierarchical distributed
metamodel-assisted evolutionary algorithms for aerodynamic shape optimization. Future
Generation Computer Systems 24, 701708 (2008)
53. Lim, D., Ong, Y.S., Jin, Y., Sendhoff, B., Lee, B.S.: Efficient hierarchical parallel genetic
algorithms using grid computing. Future Generation Computer Systems 23(4), 658670
(2007)
54. Lin, S.C., Punch, W., Goodman, E.: Coarse-grain parallel genetic algorithms: categorization and new approach. In: 6th IEEE Symposium on Parallel & Distributed Processing,
Dallas, pp. 2837 (1994)
55. Massie, M., Chun, B., Culler, D.: The Ganglia distributed monitoring system: Design,
implementation, and experience. Parallel Computing 30(7) (2004)
56. Mathioudakis, K., Papailiou, K., Neris, N., Bonhommet, C., Albrand, G., Wenger, U.:
An annular cascade facility for studying tip clearance effects in high speed flows. In:
XIII ISABE Conf., Chattanooga, TN (1997)
57. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn.
Springer, Heidelberg (1996)
58. Michalewicz, Z., Fogel, D.: How to Solve it: Modern Heuristics, 2nd edn. Springer, Heidelberg (2004)
59. Montero, R., Huedo, E., Llorente, I.: A framework for adaptive execution on grids. J. of Software - Practice & Experience 34, 631–651 (2004)
60. Moody, J., Darken, C.: Fast learning in networks of locally-tuned processing units. Neural Computation 1(2), 281–294 (1989)
61. Muyl, F., Dumas, L., Herbert, V.: Hybrid method for aerodynamic shape optimization in automotive industry. Computers & Fluids 33(5-6), 849–858 (2004)
62. Nakayama, H., Inoue, K., Yoshimori, Y.: Approximate optimization using computational
intelligence and its application to reinforcement of cable-stayed bridges. In: ECCOMAS
2004, Jyvaskyla (2004)
63. Nocedal, J., Wright, S.: Numerical Optimization. Springer, Heidelberg (1999)
64. Nowostawski, M., Poli, R.: Parallel genetic algorithm taxonomy. In: KES 1999, pp. 88–92 (1999)
65. Ong, Y., Lum, K., Nair, P.: Hybrid evolutionary algorithm with Hermite radial basis function interpolants for computationally expensive adjoint solvers. Computational Optimization & Applications 39(1), 97–119 (2008)
66. Ong, Y.S., Lum, K.Y., Nair, P., Shi, D., Zhang, Z.K.: Global convergence of unconstrained and bound constrained surrogate-assisted evolutionary search in aerodynamic shape design. In: CEC 2003, Canberra, vol. 3, pp. 1856–1863 (2003)
67. Ong, Y.S., Nair, P., Keane, A.: Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA J. 41(4), 687–696 (2003)
68. Ong, Y.S., Lim, M., Zhu, N., Wong, K.: Classification of adaptive memetic algorithms: A comparative study. IEEE Trans. on Systems Man & Cybernetics - Part B 36, 141–152 (2006)
69. Papadimitriou, D., Giannakoglou, K.: A continuous adjoint method with objective function derivatives based on boundary integrals for inviscid and viscous flows. Computers & Fluids 36, 325–341 (2007)
70. Papadimitriou, D., Giannakoglou, K.: Total pressure loss minimization in turbomachinery cascades using a new continuous adjoint formulation. J. of Power & Energy (Part A) 221, 865–872 (2007)
71. Papadrakakis, M., Lagaros, N.D., Tsompanakis, Y.: Structural optimization using evolution strategies and neural networks. Computer Methods in Applied Mechanics & Engineering 156(1-4), 309–333 (1998)
72. Piegl, L., Tiller, W.: The NURBS Book, 2nd edn. Springer, Heidelberg (1997)
73. Pierret, S., Van den Braembussche, R.: Turbomachinery blade design using a Navier–Stokes solver and artificial neural network. ASME J. of Turbomachinery 121(2), 326–332 (1999)
74. Poggio, T., Girosi, F.: Networks for approximation and learning. Proceedings of the IEEE 78(9), 1481–1497 (1990)
75. Politis, E., Giannakoglou, K., Papailiou, K.: High-speed flow in an annular cascade with tip clearance: Numerical investigation. ASME Paper 98-GT-247 (1998)
76. Poloni, C., Giurgevich, A., Onesti, L., Pediroda, V.: Hybridization of a multiobjective genetic algorithm, a neural network and a classical optimizer for a complex design problem in fluid dynamics. Computer Methods in Applied Mechanics & Engineering 186(2), 403–420 (2000)
77. Ratle, A.: Optimal sampling strategies for learning a fitness model. In: CEC 1999, Washington, DC, vol. 3, pp. 2078–2085 (1999)
78. Sefrioui, M., Periaux, J.: A hierarchical genetic algorithm using multiple models for optimization. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 879–888. Springer, Heidelberg (2000)
79. Smith, R., Dike, B., Stegmann, S.: Fitness inheritance in genetic algorithms. In: SAC 1995, pp. 345–350. ACM, New York (1995)
80. Spalart, P., Allmaras, S.: A one-equation turbulence model for aerodynamic flows. AIAA
Paper 92-0439 (1992)
81. Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: the Condor experience. Concurrency - Practice and Experience 17(2-4), 323–356 (2005)
82. Thevenin, D., Janiga, G.: Optimization and Computational Fluid Dynamics. Springer,
Heidelberg (2008)
83. Ulmer, H., Streichert, F., Zell, A.: Evolution strategies assisted by Gaussian processes with improved pre-selection criterion. In: CEC 2003, Canberra, vol. 1, pp. 692–699 (2003)
84. Zhou, Z., Ong, Y.S., Lim, M., Lee, B.: Memetic algorithm using multi-surrogates for computationally expensive optimization problems. Soft Computing 11(10), 957–971 (2007)
85. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evolutionary algorithm. Tech. Rep. 103, ETH, Computer Engineering & Communication Networks Lab. (TIK), Zurich (2001)
86. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization. In: EUROGEN 2001, CIMNE, Barcelona, pp. 19–26 (2001)
87. Zitzler, E., Brockhoff, D., Thiele, L.: The hypervolume indicator revisited: On the design of Pareto-compliant indicators via weighted integration. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 862–876. Springer, Heidelberg (2007)
Chapter 4
Knowledge-Based Variable-Fidelity
Optimization of Expensive Objective Functions
through Space Mapping
Slawomir Koziel and John W. Bandler
Abstract. The growing complexity of engineering modeling and design problems demands effective strategies for optimization of computationally expensive objective functions. To this end, we focus on knowledge-based, variable-fidelity optimization of expensive functions through a tried and tested, yet still rapidly evolving art called space mapping optimization. Fitting into the arena of surrogate-based optimization, space-mapping optimization is a model-driven optimization process where the model is an iteratively updated surrogate derived from a valid, low-fidelity or physics-based coarse model. Space mapping takes several forms. Here, we present and formulate the original input space mapping concept, as well as the more recent implicit and output space mapping concepts. Corresponding surrogate models are presented, classified, and discussed. A proposed optimization flow is explained. Then we illustrate both input space mapping and implicit space mapping through the space mapping optimization of a simple, technology-free wedge-cutting problem. We also present tuning space mapping, a powerful methodology, but one that requires extra engineering knowledge of the problem under investigation. To confirm our work, we select representative examples from the fields of microwave and antenna engineering, including filter and antenna designs.
4.1 Introduction
True of all branches of engineering, the escalating complexity of modeling and
design problems drives today's demand for effective strategies for optimization
Slawomir Koziel
Engineering Optimization & Modeling Center, School of Science and Engineering,
Reykjavik University, Kringlunni 1, IS-103, Reykjavik, Iceland
e-mail: koziel@ru.is
John W. Bandler
Simulation Optimization Systems Research Laboratory, Department of Electrical and
Computer Engineering, McMaster University, 1280 Main Street West, L8S 4K1 Ontario,
Canada
e-mail: bandler@mcmaster.ca
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 85109.
c Springer-Verlag Berlin Heidelberg 2010
springerlink.com
Within the sphere of space mapping, Robinson et al. [44] treat design problems when the coarse and fine models are defined over different design spaces and
mappings are required over these spaces.
In a related field, Rayas-Sanchez [41] reviews the state of the art in electromagnetics-based design and optimization using artificial neural networks. He surveys conventional modeling approaches along with typical enhancing and knowledge-based techniques. Rayas-Sanchez reviews strategies for design exploiting knowledge, including neural space-mapping methods.
Space mapping technology emerged in 1994 [9] out of competitive necessity.
Full-wave electromagnetic solvers had long been accepted for validating microwave
designs obtained through equivalent circuit models. While the idea of employing
electromagnetic solvers for direct optimal design attracted microwave
engineers, electromagnetic solvers are notoriously CPU-intensive. As originally
construed, they also suffered from non-differentiable response evaluation and nonparameterized design variables that were often discrete in the parameter space,
etc. Such characteristics are unfriendly to classical gradient optimization algorithms. Thus, state-of-the-art successful interconnection of electromagnetic solvers
with powerful optimization techniques still insufficiently addressed the microwave
community's ambitions for automated electromagnetics-based design optimization.
The original idea of space mapping [9] was to map designs from optimized circuit models to corresponding electromagnetic models. A parameter extraction
step calibrated the circuit solver against the electromagnetic simulator in order to
minimize observed discrepancies between the two simulations. The circuit model
(surrogate) was then updated through extracted parameters and made ready for
subsequent classical optimization.
Bandler et al. [11] review the space mapping and space-mapping-based surrogate modeling concepts and their applications in various engineering design optimization problems. They present a mathematical motivation and place space mapping
into the context of classical optimization. Recent work in space mapping includes a
trust-region approach [5], neural space mapping [6] and implicit space mapping [12].
Parameter extraction is an essential sub-problem used to align the surrogate (an enhanced coarse model) with the fine model. In a 2006 review, Bandler et al. [13] show
that all the existing space mapping approaches can be viewed as particular cases of
one, generic formulation of space mapping.
Space mapping demonstrably addresses the engineer's need for validated, high-fidelity designs when classical optimization algorithms threaten hundreds of costly
simulations, and perhaps days or weeks of CPU time. The methodology exploits
underlying fast-to-compute, low-fidelity surrogate models, which are ubiquitous
in engineering practice. Space mapping takes the high-fidelity simulator out of
the classical optimization loop, instead exploiting the iterative enhancement of the
available low-fidelity surrogates. Space mapping optimization algorithms enjoy a
desirable feature: they usually provide excellent designs after only a handful of
high-fidelity simulations. The methodology follows the traditional experience and
intuition of the engineer, yet is amenable to mathematical treatment. It enjoys
immediate recognition by the experienced engineering designer.
88
A key to the success of space mapping, i.e., that it yields satisfactory solutions
after a few fine model evaluations, is the recommended physical nature of the coarse
model. Other surrogate-model-based methods [14, 18, 24, 39, 46] exploit functional
surrogates obtained from direct approximation of the available fine model data and,
therefore, cannot compete with space mapping in terms of computational efficiency.
In implicit space mapping, preassigned parameters not used in the optimization
process can change in the coarse model. In output space mapping, we transform
the response of the coarse model. Other exciting developments include surrogates
that interpolate fine models simulated on a structured grid, frequency mappings, and
the recent concept of tuning space mapping. The latest review by Koziel et al. [32]
places various related concepts contextually into the history of design optimization
and modeling of microwave circuits.
Space mapping optimization [21, 29] belongs to the class of surrogate-based optimization methods [14] that generate a sequence of approximations to the objective
function and manages the use of these approximations as surrogates for optimization.
Space mapping methodology continues to provide success in diverse areas [4, 17,
20, 22, 25, 27, 28, 33, 42, 43, 49, 50, 51]: electronic, photonic, radio frequency, antenna, microwave, and magnetic systems; civil, mechanical, and aerospace
engineering structures, including automotive crashworthiness design [43].
In this chapter, we present and formulate the original input space mapping
concept, as well as the more recent implicit and output space mapping concepts.
We present, classify and discuss corresponding surrogate models. A proposed optimization flow is explained. Then we illustrate both input space mapping and
implicit space mapping through the space mapping optimization of a simple wedge-cutting problem. We also present tuning space mapping, a powerful methodology,
but one that requires extra engineering knowledge of the problem under investigation. Throughout, we select representative examples from the fields of microwave
and antenna engineering, including filter and antenna designs.
The fine model design problem can be stated as

    xf* = arg min_x H(f(x))    (4.1)

where f denotes the response of the fine model and H is the merit function encoding the design specifications. Space mapping replaces direct optimization of the fine model by the iteration

    x(i+1) = arg min_x H(s(i)(x))    (4.2)

where the surrogate model at iteration i is obtained from a generic surrogate s(x, p) as

    s(i)(x) = s(x, p(i))    (4.3)

with the model parameters determined through the parameter extraction process

    p(i) = arg min_p Σ_{k=0,...,i} wi.k ||f(x(k)) − s(x(k), p)||    (4.4)

Typically, x(0) = arg min_x H(c(x)), i.e., it is the optimal solution of the coarse model c, which is the best initial design we normally have at our disposal.
Usually, the algorithm is terminated when it converges (i.e., ||x(i) − x(i−1)|| and/or ||f(x(i)) − f(x(i−1))|| are smaller than user-defined values) or when the maximum number of iterations (or fine model evaluations) is exceeded.
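The loop of (4.2)–(4.4) can be sketched in a few lines. The models below are hypothetical stand-ins (a linear "fine" model, a linear coarse model, and a single-point parameter extraction), chosen only so that the sketch is self-contained and cheap to run; they are not from this chapter.

```python
import numpy as np

def space_mapping(f, s, extract, H, x0, grid, max_iter=20, tol=1e-3):
    """Generic SM loop: parameter extraction (4.4), then surrogate
    optimization (4.2); stops when successive iterates nearly coincide."""
    x = x0
    for _ in range(max_iter):
        p = extract(x, f(x))                    # align surrogate with f at x
        x_new = grid[np.argmin(H(s(grid, p)))]  # optimize the cheap surrogate
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical toy models (not from the chapter): the fine model is a
# shifted and scaled version of the coarse model, a favorable case for SM.
f = lambda x: 2.5 * x + 1.0            # "expensive" fine model response
c = lambda x: 2.0 * x                  # cheap coarse model
s = lambda x, q: c(x + q)              # input-SM surrogate s(x, q) = c(x + q)
extract = lambda x, fx: fx / 2.0 - x   # q solving c(x + q) = f(x) exactly
H = lambda r: np.abs(r - 6.0)          # merit: drive the response to level 6
grid = np.linspace(0.0, 5.0, 50001)

x0 = grid[np.argmin(H(c(grid)))]       # x(0): optimum of the coarse model
x_sm = space_mapping(f, s, extract, H, x0, grid)
# x_sm lands close to 2.0, where f(x) = 6, after a handful of f-evaluations
```

Each pass through the loop costs exactly one evaluation of f; all remaining work involves only the cheap surrogate, which is the cost profile the chapter emphasizes.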
When embedded in a trust region framework, the surrogate optimization step (4.2) is constrained to a neighborhood of the current iterate:

    x(i+1) = arg min_{x, ||x − x(i)|| ≤ δ(i)} H(s(i)(x))    (4.5)

where δ(i) denotes the trust region radius at iteration i, which is updated at every iteration using classical rules [16].
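One common variant of those classical update rules (the constants below are conventional choices, not taken from the chapter) grows or shrinks the radius according to the ratio of actual to predicted reduction of the merit function:

```python
def update_trust_radius(delta, rho, shrink=0.25, expand=2.0,
                        eta1=0.25, eta2=0.75, delta_max=10.0):
    """One classical trust-region radius update: rho is the ratio of the
    actual merit-function reduction to the reduction predicted by the
    surrogate; the constants are conventional values, assumed here."""
    if rho < eta1:                      # surrogate predicted poorly: shrink
        return shrink * delta
    if rho > eta2:                      # surrogate predicted well: expand
        return min(expand * delta, delta_max)
    return delta                        # acceptable prediction: keep radius
```

In such schemes the step to x(i+1) is additionally rejected, and the surrogate re-optimized with the smaller radius, whenever the fine model objective actually increased.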
It should be emphasized that space mapping is not a general-purpose approach.
The existence of the computationally cheap and sufficiently accurate coarse model
is an important prerequisite of our technique. If such a coarse model does exist, the
space mapping method is able to yield a satisfactory design after a few evaluations
of the high-fidelity model, which is a dramatic reduction of the computational cost
of the optimization compared to other methods. Otherwise, space mapping cannot
be used or will not be efficient.
Fig. 4.1 Wedge-cutting problem [11]: (a) the fine model, and (b) the coarse model
of a piece of this rectangle, determined by the length x, so that we have c(x) = Hx. Here, we assume that H = 5. The starting point of SM optimization is the coarse model optimal solution x(0) = 20. The fine model at x(0) is f(x(0)) = 125. For illustration purposes we will solve our problem using the simplest version of the input space mapping and then using implicit space mapping.
4.2.5.1 Input Space Mapping
We use the following setup for the input space mapping approach. The generic surrogate model is given by s(x, p) = s(x, q) = c(x + q). The weighting factors in the parameter extraction process (4.4) are given by wi.k = 1 for k = i and wi.k = 0 otherwise. Thus, the surrogate model can be written in short as

    s(i)(x) = c(x + q(i))    (4.6)

where

    q(i) = arg min_q ||f(x(i)) − c(x(i) + q)||    (4.7)

In this simple case, (4.7) has an analytical solution given by q(i) = f(x(i))/H − x(i).
Figure 4.2 shows the first four iterations of the SM algorithm solving the wedge
cutting problem. This particular input space mapping approach is both simple and
direct, yet it converges to an acceptable result (from an engineering point of view)
in a remarkably small number of iterations. It is clearly knowledge-based, since
the coarse model is a physical approximation to the fine model, and the iteratively
updated coarse model attempts to align itself with the fine model. The optimization
process mimics a learning process derived from intuition.
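Numerically, the input-SM iteration is a two-line update. In the sketch below, c(x) = Hx with H = 5, x(0) = 20, and f(x(0)) = 125 follow the text; the fine model itself is a hypothetical quadratic wedge area, chosen only so that f(20) = 125 as reported, since the chapter does not give its exact form.

```python
H = 5.0
target = H * 20.0                 # x(0) = 20 is the coarse optimum: area 100

def f(x):
    """Hypothetical fine model (wedge area); satisfies f(20) = 125."""
    return H * x + 0.0625 * x * x

x = 20.0                          # start from the coarse model optimum x(0)
for i in range(15):
    q = f(x) / H - x              # parameter extraction, Eq. (4.7)
    x = target / H - q            # optimum of s(i)(x) = c(x + q), Eq. (4.6)
# x approaches the point where f(x) = 100, i.e. x = 40*(sqrt(2) - 1)
```

The first extraction gives q(0) = 125/5 − 20 = 5, so x(1) = 15; subsequent iterations home in on the design whose fine-model area equals the target.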
4.2.5.2 Implicit Space Mapping
We use the following setup for the implicit space mapping approach. The generic surrogate model is given by s(x, p) = s(x, H) = ci(x, H), where ci(x, H) = Hx. The weighting factors in the parameter extraction process (4.4) are, as before, wi.k = 1 for k = i and wi.k = 0 otherwise. The surrogate model can be restated as

    s(i)(x) = ci(x, H(i)) = H(i) x    (4.8)
Fig. 4.2 Input space mapping solving the wedge cutting problem [11]
Fig. 4.3 Implicit space mapping solving the wedge cutting problem
where

    H(i) = arg min_H ||f(x(i)) − ci(x(i), H)|| = arg min_H ||f(x(i)) − Hx(i)||    (4.9)
In this simple case, (4.9) has an analytical solution H(i) = f(x(i))/x(i). Figure 4.3 shows the first four iterations of the SM algorithm solving the wedge cutting problem. This indirect, implicit space mapping approach is as simple as input space mapping and also converges to an acceptable result (from an engineering point of view) in a few iterations. Since our target is area, the H and the x in the Hx of the rectangle are of equal significance as design parameters in the coarse model. The physical approximation remains valid and this optimization process also mimics a learning process derived from intuition. In effect, we are recalibrating the coarse model against measurements of the fine model after each change to the fine model.
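The implicit version simply swaps the update formula. With the same hypothetical fine model as in the input-SM sketch (f(x) = 5x + 0.0625x², chosen only so that f(20) = 125), each iteration re-extracts the preassigned parameter H and re-optimizes the surrogate H(i)x:

```python
target = 100.0                        # area of the coarse optimum: c(20) = 5*20

def f(x):
    """Hypothetical fine model (wedge area); satisfies f(20) = 125."""
    return 5.0 * x + 0.0625 * x * x

x = 20.0                              # x(0): coarse model optimum
for i in range(10):
    H_i = f(x) / x                    # parameter extraction, Eq. (4.9)
    x = target / H_i                  # optimum of s(i)(x) = H(i) * x
# x again approaches 40*(sqrt(2) - 1), where f(x) = 100
```

On this toy model the implicit update contracts faster than the input-SM update, but both reach an engineering-acceptable design within a few fine model evaluations.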
The response f(x) of the fine model is the modulus of the so-called transmission coefficient, |S21|, evaluated at 41 frequency points equally spaced over the frequency band 2 GHz to 6 GHz. Evaluation time on a Pentium D 3.4 GHz processor is about 14 minutes.
Empirical modeling is central to engineering theory and practice. Electrical and
electronics engineers, in particular, have a strong tradition of developing libraries of
fast models already validated by electromagnetic analysis or physical experiment.
Thus, users of commercial circuit solvers such as Agilent ADS [1] enjoy a rich and
vast library of empirical elements that they can call upon in their quest to formulate
suitable coarse models.
Here, a suitable coarse model, Fig. 4.5, is a circuit equivalent of the structure
in Fig. 4.4, consisting of circuit-theory-based models of microstrips. The coarse
model is implemented in Agilent ADS. Evaluation of the coarse model takes a few
milliseconds. It should be noted that both fine and coarse models describe basically
the same physical phenomena.
The design specifications are |S21| ≥ −3 dB for 3.8 GHz to 4.2 GHz, and |S21| ≤ −20 dB for 2.0 GHz to 3.2 GHz and for 4.8 GHz to 6.0 GHz. Thus, the merit function H in this case is a minimax function defined over these specifications.
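A typical minimax merit function of this kind returns the worst specification violation over the frequency sweep, negative when every specification is met with margin. The sketch below encodes the filter specifications quoted above under that assumption; the exact expression used by the authors is not reproduced here.

```python
import numpy as np

def minimax_merit(freq_ghz, s21_db):
    """Worst violation of the CCDBR filter specs over the sweep:
    |S21| >= -3 dB in 3.8-4.2 GHz (passband),
    |S21| <= -20 dB in 2.0-3.2 GHz and 4.8-6.0 GHz (stopbands).
    A negative return value means every spec is satisfied with margin."""
    viol = np.full(len(freq_ghz), -np.inf)
    passband = (freq_ghz >= 3.8) & (freq_ghz <= 4.2)
    stopband = ((freq_ghz >= 2.0) & (freq_ghz <= 3.2)) | \
               ((freq_ghz >= 4.8) & (freq_ghz <= 6.0))
    viol[passband] = -3.0 - s21_db[passband]     # below -3 dB is a violation
    viol[stopband] = s21_db[stopband] + 20.0     # above -20 dB is a violation
    return viol.max()
```

For example, a response at −2 dB in the passband and −30 dB in the stopbands gives a merit value of −1 dB, matching the sign convention of the objective values reported later in this section.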
Fig. 4.6 Second-order CCDBR filter: fine model (solid line) and coarse model response
(dashed line) at the starting point x(0) . Design specifications are marked using horizontal
lines
Fig. 4.7 Second-order CCDBR filter: fine model response (solid line) and the response of the
surrogate model c(x + q(0) ) (dashed line) at the starting point x(0)
Fig. 4.8 Second-order CCDBR filter: fine model response (solid line) and the response of the
surrogate model c(x + q(0) ) (dashed line) at x(1) , the optimum of the surrogate model s(0)
What is even more important, this good match is maintained away from x(0), as illustrated in Fig. 4.8, which shows the fine model response and the response of the surrogate model c(x + q(0)) at x(1), the optimum of the surrogate model s(0). This means that the space mapping surrogate not only exhibits good approximation capability but also has excellent generalization (prediction) capability. It should be emphasized that the space mapping surrogate is established using fine model data at just a single point (design). This is only possible because the coarse model encodes substantial knowledge about the physical phenomena described by the fine model.
SM optimization is accomplished after five iterations. Fig. 4.9 shows the fine model response at the final solution, x(5) = [3.344 4.820 1.092 0.052]T mm; the corresponding minimax objective function value is −1.4 dB. Table 4.1 compares the computational efficiency of the SM algorithm and direct optimization using Matlab's fminimax routine [37]. SM optimization is about 16 times faster than direct optimization. Note that the SM optimization time is slightly larger than the total fine model evaluation time (6 × 14 min = 84 min), which is because of some overhead related to multiple evaluations of the surrogate model (cf. (4.2) and (4.4)).
Fig. 4.9 Second-order CCDBR filter: optimized fine model response. Design specifications
are marked using horizontal lines
Table 4.1 Second-order CCDBR filter: SM optimization versus direct optimization

Optimization Procedure    Fine Model Evaluations    Total Time
Direct optimization       106                       24 h 45 min
SM optimization           6                         1 h 31 min
Fig. 4.10 Geometry of a stacked probe-fed printed double annular ring antenna [26]
Fig. 4.11 Double annular ring antenna: fine model (solid line) and coarse model response
(dashed line) at the starting point x(0) . Design specifications are marked using a horizontal
line
Vector d(i) is calculated as d(i) = f(x(i)) − c(x(i), [εr1(i) εr2(i)]T) after the vector [εr1(i) εr2(i)]T is known.
The starting point for SM optimization is the optimal solution of the coarse model, x(0) = [9.228 8.722 30.723 34.127 18.211]T mm. Figure 4.11 shows the responses of the fine and coarse models at x(0). The fine model minimax objective function value at the initial design is +8.2 dB.
SM optimization is accomplished after three iterations (four fine model evaluations). Fig. 4.12 shows the fine model response at the final solution, x(3) = [10.674 7.809 28.462 32.504 19.682]T mm, as well as the response of the surrogate model s(2)(x(3)); the fine model minimax objective function value is −0.2 dB. Note that the fine model response in Fig. 4.12 corresponds to an almost optimum design in a minimax sense. The SM optimization time is 5 hours 58 minutes and is larger than the total fine model evaluation time (4 × 78 minutes = 5 hours 12 minutes), which is because of the overhead related to multiple evaluations of the surrogate model.
Fig. 4.12 Double annular ring antenna: fine model (solid line) and surrogate model response
(dashed line) at the final design x(3) . Design specifications are marked using a horizontal line
Direct optimization of the fine model in this example was not attempted. With a
simulation time of 1 hour and 18 minutes per system analysis, direct optimization
would require about a week, which is not acceptable.
4.3.3 Discussion
In this section we have studied two representative examples taken from electromagnetics-based microwave engineering design. We see that a major obstacle in executing the optimizations is the high computational cost of full-wave electromagnetic simulation by commercial solvers. Space mapping effectively replaces
the direct optimization of the high-fidelity model by iterative re-optimization and
updating of the faster surrogate based on the problem-specific knowledge embedded in the underlying coarse model. Thus, space mapping optimization shifts the
CPU burden from the slower simulator to the faster simulator.
∇ × H = jωD + J,   D = εE
∇ × E = −jωB,   B = μH
∇ · B = 0,   ∇ · D = ρ
test) at the current iteration point, and tuning parameters (typically implemented
through circuit elements inserted into tuning ports). The tunable parameters are adjusted so that the model satisfies the original design specifications. The conceptual
illustration of the tuning model is shown in Fig. 4.13. The procedure is invasive in
the sense that the structure may need to be cut. The fine model simulator must allow
such cuts and allow tuning elements to be inserted.
A certain relation (not necessarily analytical) between the parameters of the tuning model and the design variables is assumed, so that the new design is obtained
by translating the adjusted parameters into the corresponding design variable values
using this very relation.
The tuning model t(i) is first aligned with the fine model at the current design x(i):

    xt.0(i) = arg min_xt ||f(x(i)) − t(i)(xt)||    (4.10)
In the next step, we optimize t(i) to have it meet the design specifications. We obtain the optimal values of the tuning parameters xt.1(i) as follows:

    xt.1(i) = arg min_xt H(t(i)(xt))    (4.11)
Having xt.1(i) we perform the calibration procedure to determine changes in the design variables that yield the same change in the calibration model response as that caused by xt.1(i) − xt.0(i), where xt.0(i) are the initial values of the tuning parameters (normally zero). We first adjust the SM parameters p(i) of the calibration model c to obtain a match with the fine model response at x(i):

    p(i) = arg min_p ||f(x(i)) − c(x(i), p, xt.0(i))||    (4.12)
The calibration model is then optimized with respect to the design variables in order to obtain the next iteration point x(i+1):

    x(i+1) = arg min_x ||t(i)(xt.1(i)) − c(x, p(i), xt.0(i))||    (4.13)
Note that we use xt.0(i) in (4.12), which corresponds to the state of the tuning model after performing the alignment procedure (4.10), and xt.1(i) in (4.13), which corresponds to the optimized tuning model (cf. (4.11)). Thus, (4.12) and (4.13) allow us to find the change of design variable values x(i+1) − x(i) necessary to compensate the effect of changing the tuning parameters from xt.0(i) to xt.1(i).
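A scalar walk-through of (4.10)–(4.13) may help fix the bookkeeping. Everything below is a hypothetical stand-in: the "fine" model is a cheap linear function, the tuning model adds a single additive tuning element to the fine response at the current design, and the calibration model is c(x, p, xt) = p·x + xt; the merit function drives the response to a target level.

```python
target = 7.0                        # desired response level; H(r) = |r - target|
f = lambda x: 2.0 * x + 1.0         # hypothetical "fine" model

x = 2.0                             # initial design x(0)
for i in range(20):
    fx = f(x)
    xt0 = 0.0                       # (4.10): with xt = 0 the tuning model
    t = lambda xt: fx + xt          #         t(xt0) already equals f(x)
    xt1 = target - fx               # (4.11): xt minimizing |t(xt) - target|
    p = fx / x                      # (4.12): match c(x, p, xt0) = p*x to f(x)
    x = t(xt1) / p                  # (4.13): x solving p*x = t(xt1)
# x approaches 3, where f(x) = target
```

Each pass costs one fine model evaluation (the fx line); the tuning, extraction, and calibration steps involve only the cheap models, mirroring the cost profile described in Sect. 4.4.3.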
TSM exploits the standard SM optimization, classical circuit and electromagnetic (EM) theory, as well as the engineer's expertise. For example, in a physics-based
simulation according to classical EM theory, design parameters such as physical
length and width of a microstrip line can be mapped to a tuning component such as
a capacitor. The calibration process then transfers the tuning parameters to physical
design parameters, which can be achieved by taking advantage of classical theory
and engineering experience. Still, the TSM algorithm can be seen as a specialized
case of a standard SM. On the other hand, TSM allows greater flexibility in terms of
the surrogate model which may, in general, involve any relation between the tuning
parameters and design variables.
Fig. 4.14 Box-section Chebyshev bandpass filter: geometry [34], and places for inserting
the tuning ports (denoted as white rectangles; the numbers correspond to the terminals of
S-parameter component S28P of the tuning model shown in Fig. 4.15)
Fig. 4.15 Box-section Chebyshev bandpass filter: tuning model (Agilent ADS)
circuit simulator [1]. Tuning elements connected to the appropriate ports complete
the tuning model within the ADS simulator.
Specifically in our example, the tuning model is constructed by dividing the polygons corresponding to the length parameters L1 to L5 in the middle and inserting the
tuning ports at the new cut edges. Its S28P data file (a 28-port S-parameter matrix) is
then loaded into the S-parameter component in Agilent ADS [1]. The circuit-theory
coupled-line components and capacitor components are designated as tuning elements and are inserted into each pair of tuning ports (Fig. 4.15). The lengths of the
imposed coupled-lines and the capacitances of the capacitors are assigned to be the
tuning parameters, so that we have xt = [Lt1 Lt2 Lt3 Lt4 Lt5 Ct1 Ct2 ]T (Ltk are in
mil, Ctk in pF).
The calibration model is a circuit equivalent model implemented in ADS and
shown in Fig. 4.16. It contains the same tuning elements as the tuning model. It
basically mimics the division of the coupled-lines performed while preparing t. The
Fig. 4.16 Box-section Chebyshev bandpass filter: calibration model (Agilent ADS)
Fig. 4.17 Box-section Chebyshev bandpass filter: the coarse (dashed line) and fine (solid
line) model response at the initial design. Design specifications are marked using horizontal
lines
calibration model also contains six (implicit) SM parameters that will be used as parameters p in the calibration process (4.12), (4.13). These parameters are p = [εr1 εr2 εr3 εr4 εr5 H]T, where εrk is the dielectric constant of the microstrip line segment of length Lk (cf. Fig. 4.14), and H is the substrate height of the filter. Initial values of these parameters are [3.63 3.63 3.63 3.63 3.63 20]T.
The misalignment between the fine and tuning model responses with the tuning elements set to zero is negligible, so that xt.0(0) = [0 0 0 0 0 0 0]T was used throughout. The values of the tuning parameters at the optimal design of the tuning model are xt.1(0) = [85.2 132.5 5.24 1.13 15.24 0.169 0.290]T. Note that some of the parameters take negative values, which is permitted in ADS. The values of the preassigned parameters obtained in the first calibration phase (4.12) are p(0) = [3.10 6.98 4.29 7.00 6.05 17.41]T.
Figure 4.17 shows the coarse and fine model responses at the initial design, whereas Fig. 4.18 shows the fine model response after just one TSM iteration with x(1) = [1022 398 46 56 235 4 10]T mil (the corresponding minimax objective function value is −1.8 dB).
Fig. 4.18 Box-section Chebyshev bandpass filter: fine model response at the design found
after one iteration of the TSM algorithm. Design specifications are marked using horizontal
lines
4.4.3 Summary
We have considered simulation-based tuning within the scope of space mapping. In
our TSM approach, which is significantly more knowledge-intensive than regular
space mapping, we construct a tuning model directly by cutting into the fine model
and connecting tuning elements to the resulting internal ports of the structure. Relevant parameters or preassigned parameters of these auxiliary elements are chosen to
be tunable and are varied to match the tuning model to the fine model. This process
takes little CPU effort as the tuning model is typically implemented within a circuit
simulator. An updated tuning model is then available for design prediction. The prediction is fed back to the fine model simulator after simple calibration. The process is
repeated until the fine model response is sufficiently close to the design target.
4.5 Conclusions
Ultimately, every engineer seeks high-fidelity, but cheap, solutions to computationally expensive design problems. So much the better if, without loss of fidelity or sacrifice of optimality, the design process can be cast as a simple nonlinear optimization of black-box functions. However, to ensure low cost, a tight limit on the number of high-fidelity simulation runs used to evaluate expensive objective functions and constraints is mandatory; otherwise, such optimization problems can become computationally intractable. In our situation, pure classical methods of optimization are likely to perform poorly or fail, since we limit the number of high-fidelity function simulations to only a handful. However, space mapping heavily exploits classical methods in the optimization of the underlying surrogates.
Input space mapping requires expert knowledge and usually deals with relatively few free optimization variables, but its parameter extraction step can be a difficult nonlinear optimization problem to solve. Expertise is helpful in implicit space mapping because of the many possibly available preassigned parameters. In output space mapping, engineering expertise may be somewhat less necessary, but the process can involve a large number of variables; on the other hand, the parameter extraction step might not require coarse model re-simulation. The new tuning space mapping approach is an effective simulator-based approach but requires significantly more expertise to execute.
Essential to overall success, we believe, is a suitable combination of (1) classical optimization algorithms, (2) computational intelligence, (3) fast physics-based surrogates, and (4) the designer's engineering expertise. Our contribution to space mapping exploits these necessary ingredients.
Chapter 5
Abstract. In this chapter, a rough approximation model, which is an approximation model with low accuracy and without a learning process, is presented in order to reduce the number of function evaluations effectively. Although the approximation errors between the true function values and the approximation values are not small, the rough model can estimate the order relation of solutions with fair accuracy. Utilizing this property, we have proposed the estimated comparison method, in which function evaluations are omitted when the order relation of solutions can be judged from the approximation values. In the method, a parameter for the error margin is introduced to avoid incorrect judgment, and a parameter based on the congestion of solutions is introduced to avoid omitting promising solutions. In order to improve the stability and efficiency of the method, we propose adaptive control of the margin parameter and the congestion parameter according to the success rate of the judgment. The advantage of these improvements is shown by comparing the results obtained by Differential Evolution (DE), DE with the estimated comparison method, adaptively controlled DE with the estimated comparison method, and particle swarm optimization on various types of benchmark functions.
5.1 Introduction
Evolutionary computation has been successfully applied to various fields of science and engineering, and evolutionary algorithms (EAs) have proved to be powerful function optimization algorithms. However, EAs need a large number of function
Tetsuyuki Takahama
Hiroshima City University, 3-4-1 Ozukahigashi, Asaminami-ku, Hiroshima, Japan
e-mail: takahama@info.hiroshima-cu.ac.jp
Setsuko Sakai
Hiroshima Shudo University, 1-1-1 Ozukahigashi, Asaminami-ku, Hiroshima, Japan
e-mail: setuko@shudo-u.ac.jp
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 111–129.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2010
evaluations before an acceptable solution can be found. Recently, optimization problems have tended to become larger and the cost of a function evaluation higher. It is therefore necessary to develop more efficient optimization algorithms that reduce the number of function evaluations.
An effective way to reduce function evaluations is to build an approximation model of the objective function and to solve the optimization problem using the approximation values [6]. However, building a high-quality approximation model is very difficult and time-consuming. Moreover, the proper approximation model depends on the problem to be optimized, and it is difficult to design a general-purpose approximation model with high accuracy.
In order to overcome these difficulties, we have proposed the estimated comparison method [17, 20]. In the method, an approximation model with low accuracy and without a learning process, i.e., a rough approximation model, is utilized to reduce the number of function evaluations effectively. The approximation errors between the true function values and the values estimated by the rough approximation model will not be small. However, the rough model can estimate with fair accuracy whether the function value of one solution is smaller than that of another, and can therefore be used to compare solutions. Thus, the estimated comparison, which compares solutions using the rough approximation values, can be defined. In the estimated comparison, the approximation values are compared first. When one value is judged to be sufficiently worse than the other, the estimated comparison returns the estimated result without evaluating the objective function. When it is difficult to judge the result of the comparison from the approximation values, true values are obtained by evaluating the objective function and the estimated comparison returns the true result based on them. By using the estimated comparison, the evaluation of the objective function is sometimes omitted and the number of function evaluations can be reduced.
Two parameters, an error margin parameter and a congestion parameter, are introduced in the estimated comparison. The error margin parameter is used to allow for approximation error and to avoid incorrect judgment. If the error margin is too large, judgment becomes overly cautious and the objective function is evaluated often to obtain true values; as a result, the efficiency of optimization cannot be improved much. If the margin parameter is too small, the evaluation of the objective function is often omitted, but the judgment would often be incorrect; as a result, the optimization process might be led in a wrong direction. In this study, we propose to control the margin parameter adaptively based on the success rate of the comparison.
The congestion parameter is used to avoid blocking the search in new directions. If the congestion around a solution is low, the solution lies in a new, not-yet-visited area and it is difficult to estimate its true function value. It is then important to evaluate the function and confirm whether the solution is good or not. If the congestion parameter is too large, solutions in new areas will always be evaluated; as a result, the efficiency of optimization cannot be improved much. If the congestion parameter is too small, the search in new areas would often be blocked; as a result, the optimization process might slow down. In this study, we propose to control the congestion parameter adaptively based on the success rate.
In this chapter, a potential model is used as the rough approximation model. The potential model can estimate the function value of a point from some other points without a learning process and can be used as a general-purpose rough approximation model. Differential Evolution (DE) [2, 13, 14, 15, 19] is used as the optimization algorithm, and the estimated comparison is introduced in the survivor selection phase of DE. The advantage of these improvements is shown by comparing the results obtained by DE, DE with the estimated comparison method, adaptively controlled DE with the estimated comparison method, and particle swarm optimization on various types of benchmark functions.
The rest of this chapter is organized as follows: Section 2 briefly describes evolutionary algorithms using approximation models. Section 3 describes the potential model as a rough approximation model, defines the estimated comparison using the potential model, and proposes the adaptive control of its parameters. The adaptive DE with the estimated comparison method is proposed in Section 4. Section 5 presents experimental results on various benchmark problems. Section 6 describes a comparative study between the adaptive method and particle swarm optimization. Finally, Section 7 concludes with a brief summary of this chapter and a few remarks.
Radial Basis Function (RBF) network models [5, 7, 8, 9] are often used. In most approximation models, the model parameters are learned by the least squares method, a gradient method, the maximum likelihood method, and so on. In general, learning model parameters is a time-consuming process, especially to obtain a model with higher accuracy or a model of a large function, such as a function with many dimensions.
Evolutionary algorithms with approximation models can be classified into several types:
All individuals have only approximation values. A very high quality approximation model is built and the objective function is optimized using approximation values only. It is possible to reduce function evaluations greatly. However, these methods can be applied only to well-understood objective functions, not to general problems.
Some individuals have approximation values and others have true values. Methods of this type are called evolution control approaches and can be classified into individual-based and generation-based control [6]. Individual-based control means that good individuals (or randomly selected individuals) use true values and the others use approximation values in each generation [7, 8]. Generation-based control means that all individuals use true values once in a fixed number of generations and use approximation values in the other generations [8, 9]. In these approaches the approximation model should be accurate, because approximation values are compared with true values. Also, it is known that even approximation models with high accuracy sometimes generate a false optimum or hide a true optimum, and individuals may converge to a false optimum while they are optimized using the approximation model over some generations. Thus, these approaches are much affected by the quality of the approximation model, and it is difficult to utilize rough approximation models.
All individuals have true values. Some methods of this type are called surrogate approaches. In the surrogate approaches, an estimated optimum is searched for using an approximation model, usually a local one. The estimated optimum is evaluated to obtain its true value and also to improve the approximation model [1, 5, 10]. If the true value is good, the point is included as an individual. In these approaches, rough approximation models might be used, because approximation values are compared with other approximation values. These approaches are less affected by the approximation model than the evolution control approaches. However, they contain a process of optimization that uses the approximation model only; if this process is repeated many times, they too are much affected by the quality of the approximation model.
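The generation-based evolution control described above can be sketched in a few lines. This is only an illustrative toy: the objective, the biased approximation model, and the selection scheme are all assumptions made for the sketch, not a method from the chapter.

```python
import random

# Generation-based evolution control: every `period`-th generation, all
# individuals pay for true (expensive) evaluations; otherwise a cheap,
# slightly biased approximation is used.

def true_f(x):
    return sum(v * v for v in x)          # expensive objective (toy)

def approx_f(x):
    return sum(v * v for v in x) + 0.05   # cheap, biased model (toy)

def evolve(pop, generations=30, period=5):
    true_evals = 0
    for g in range(generations):
        use_true = (g % period == 0)
        scored = []
        for x in pop:
            fx = true_f(x) if use_true else approx_f(x)
            true_evals += use_true
            scored.append((fx, x))
        scored.sort(key=lambda t: t[0])
        # keep the better half, refill with mutated copies
        parents = [x for _, x in scored[: len(pop) // 2]]
        pop = parents + [[v + random.gauss(0, 0.1) for v in x] for x in parents]
    return pop, true_evals

random.seed(0)
pop, n = evolve([[random.uniform(-1, 1) for _ in range(3)] for _ in range(10)])
```

Only generations 0, 5, 10, ... incur true evaluations here, which is exactly why the model must be accurate: in the in-between generations the search is steered by the approximation alone.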
The estimated comparison method can judge whether a new individual is worth evaluating with the true function or not. It can also employ the error margin parameter, which allows for approximation error, and the congestion parameter, which accepts promising solutions in a new direction, when the comparison is carried out. Thus it is not much affected by the approximation model. The reduction of function evaluations achieved by the estimated comparison method is not as large as that of other optimization methods using high-accuracy approximation models. However, the estimated comparison method does not need the learning process of the approximation model, which is often time-consuming and needs much effort to tune the learning parameters. The estimated comparison method is a fast and easy-to-use approach and can be applied to a wide range of problems, from low or medium computation cost to high computation cost. We therefore consider the estimated comparison method to be a more general-purpose method than other methods based on high-quality approximation models.
U_o(y) = Σ_i f(x_i) / d(x_i, y)^p   (5.5)

U_c(y) = Σ_i 1 / d(x_i, y)^p   (5.6)

f̂(y) = U_o(y) / U_c(y)   (5.7)

For a population member x_i itself, the parent point is omitted from the sums:

U_o(x_i) = Σ_{j≠i} f(x_j) / d(x_j, x_i)^p   (5.8)

U_c(x_i) = Σ_{j≠i} 1 / d(x_j, x_i)^p   (5.9)

f̂(x_i) = U_o(x_i) / U_c(x_i)   (5.10)
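The potential model of Eqs. (5.5)-(5.10) is an inverse-distance-weighted average of the known function values, and it needs no learning step. A minimal sketch (function and argument names are chosen here for illustration):

```python
# Potential model: f^(y) = U_o(y) / U_c(y), where U_o weights known values
# f(x_i) by 1/d(x_i, y)^p and U_c is the sum of the weights themselves.

def approx(y, points, values, p=2.0, skip=None):
    """Estimate f(y) from known (points, values). `skip` omits the parent
    index, as Eqs. (5.8)-(5.9) require when approximating a member itself."""
    num = den = 0.0
    for i, (x, fx) in enumerate(zip(points, values)):
        if i == skip:
            continue
        d = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
        if d == 0.0:
            return fx                  # y coincides with a known point
        w = 1.0 / d ** p               # 1 / d(x_i, y)^p
        num += w * fx                  # U_o contribution
        den += w                       # U_c contribution
    return num / den                   # f^(y) = U_o(y) / U_c(y)
```

For example, a point halfway between known points with values 0 and 4 gets the weight-balanced estimate 2.0.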
It should be noted that the parent point x_i (the term with j = i) is omitted from the right-hand sides of U_o and U_c in (5.8) and (5.9). If the parent point were not omitted, the approximation value at the parent point would coincide with its true value. As a result, the difference between the precision of the approximation at the parent point and at the child point would become large, and it would be difficult to compare the approximation values.
The estimated comparison judges whether the child point is better than the parent point. In the comparison, a reference value z indicating the accuracy level of the approximation model, the error margin parameter δ (δ ≥ 0), and the congestion parameter γ (0 ≤ γ ≤ 1) are introduced. The estimated comparison can be defined as follows:
better(x'_i, x_i, z) {
  if(U_c(x'_i) ≤ γ U_c(x_i) || f̂(x'_i) ≤ f̂(x_i) + δz) {
    Evaluate x'_i;
    if(f(x'_i) < f(x_i)) return yes;
  }
  return no;
}
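The same decision rule can be sketched in Python. The exact form of the congestion test is a reconstruction and therefore an assumption here: the child is truly evaluated when its approximate value is within the margin δ·z of the parent's, or when it lies in a sparsely visited region (low congestion); otherwise the expensive evaluation is skipped and the parent survives.

```python
# Sketch of the estimated comparison. f is the true (expensive) objective,
# f_hat the rough approximation, u_c the congestion measure; delta and gamma
# are the margin and congestion parameters, z the accuracy reference value.

class EstimatedComparison:
    def __init__(self, f, f_hat, u_c, z, delta=0.2, gamma=0.4):
        self.f, self.f_hat, self.u_c = f, f_hat, u_c
        self.z, self.delta, self.gamma = z, delta, gamma
        self.true_evals = 0

    def better(self, child, parent):
        promising = self.f_hat(child) <= self.f_hat(parent) + self.delta * self.z
        unexplored = self.u_c(child) <= self.gamma * self.u_c(parent)
        if promising or unexplored:
            self.true_evals += 1            # pay for one true evaluation
            return self.f(child) < self.f(parent)
        return False                        # evaluation skipped, keep parent

ec = EstimatedComparison(f=lambda x: x * x,
                         f_hat=lambda x: x * x + 0.1,   # biased rough model
                         u_c=lambda x: 1.0, z=1.0)
```

With this toy setup a clearly promising child triggers a true evaluation, while a child whose estimate is far worse than the parent's is rejected without one.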
In this case, the recommended range of the margin parameter δ is [0.05, 0.5].
When the rate is good, the margin parameter δ and the congestion parameter γ are decreased toward their minimum values δ_min and γ_min, respectively. If the success rate is small, the accuracy is low and the margin parameter and the congestion parameter should be increased. Thus, when the rate is Bad, δ and γ are increased by a factor of 1.5, and when the rate is Very Bad, δ and γ are set to their maximum values δ_max and γ_max, respectively.
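A minimal sketch of this adaptive control, with the caveat that the rate thresholds and the bounds for the congestion parameter are illustrative assumptions; the chapter only fixes the "increase by 1.5" rule and the clamping to minimum/maximum values.

```python
# Adaptive control of the margin (delta) and congestion (gamma) parameters
# from the success rate of the estimated comparison.

DELTA_MIN, DELTA_MAX = 0.05, 0.5   # recommended margin range
GAMMA_MIN, GAMMA_MAX = 0.1, 0.9    # assumed bounds for illustration

def adapt(delta, gamma, success_rate, bad=0.1, very_bad=0.05):
    if success_rate < very_bad:        # "Very Bad": jump to the maximum
        return DELTA_MAX, GAMMA_MAX
    if success_rate < bad:             # "Bad": increase by a factor of 1.5
        return min(1.5 * delta, DELTA_MAX), min(1.5 * gamma, GAMMA_MAX)
    return delta, gamma                # otherwise keep (or decrease) values
```

For example, a very low success rate snaps both parameters to their maxima, making the comparison maximally cautious until the approximation becomes trustworthy again.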
Adaptive DE with the estimated comparison (based on DE/rand/1/exp) {
  Initialize the population P = {x_1, x_2, ..., x_N} and evaluate it;
  while(termination condition is not satisfied) {
    for(i = 1; i <= N; i++) {
      Select p1, p2, p3 from [1, N] randomly so that i, p1, p2 and p3 differ;
      x_new = x_i;
      j = select randomly from [1, n];
      k = 1;
      do {
        x_new,j = x_p1,j + F (x_p2,j − x_p3,j);
        j = (j + 1) % n;
        k++;
      } while(k ≤ n && u(0, 1) < CR);
      // estimated comparison
      if(better(x_new, x_i, z)) x_i = x_new;
    }
    Update the margin parameter δ and the congestion parameter γ adaptively;
  }
}
where u(0, 1) is a uniform random number generator on [0, 1].
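The DE/rand/1/exp trial-vector construction in the pseudocode above can be written directly in Python. This is a sketch of the standard operator (parameter values and variable names are illustrative): starting at a random index j, consecutive components are replaced while uniform draws stay below CR, for at most n components.

```python
import random

# DE/rand/1/exp: exponential crossover of the mutant x_p1 + F (x_p2 - x_p3)
# into a copy of the parent pop[i].

def de_rand_1_exp(pop, i, F=0.7, CR=0.9, rng=random):
    n = len(pop[i])
    p1, p2, p3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    child = list(pop[i])               # copy, so the parent is untouched
    j = rng.randrange(n)
    k = 1
    while True:
        child[j] = pop[p1][j] + F * (pop[p2][j] - pop[p3][j])
        j = (j + 1) % n
        k += 1
        if not (k <= n and rng.random() < CR):
            break
    return child
```

In the adaptive algorithm, the child produced here is then passed to the estimated comparison instead of being evaluated unconditionally.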
In this study, current population P is used as the set of solutions which have
known objective values. As the search process progresses, the area where individuals exist may become elliptical. In order to handle such a case, the normalized
distance is introduced, in which the distance is normalized by the width of each
dimension in the current population P.
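The normalization just described, Eq. (5.14), divides each coordinate difference by the range of that coordinate in the current population. A minimal sketch:

```python
# Normalized distance of Eq. (5.14): coordinate differences are scaled by
# the population's per-dimension width, so elongated (elliptical) clouds of
# individuals are handled sensibly.

def normalized_distance(x, y, population):
    total = 0.0
    for j in range(len(x)):
        col = [ind[j] for ind in population]
        width = max(col) - min(col) or 1.0   # guard a degenerate dimension
        total += ((x[j] - y[j]) / width) ** 2
    return total ** 0.5
```

The guard for a zero-width dimension is an added safety assumption, not part of the chapter's definition.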
d(x, y) = sqrt( Σ_{j=1}^{n} ( (x_j − y_j) / (max_{x_i ∈ P} x_{ij} − min_{x_i ∈ P} x_{ij}) )^2 )   (5.14)

The following test functions are used:
f1: Sphere function

f(x) = Σ_{i=1}^{n} x_i^2   (5.15)

This function is unimodal and has the minimum value 0 at (0, 0, ..., 0).
f2: Generalized Rosenbrock (star type) function

f(x) = Σ_{i=2}^{n} [ 100 (x_1 − x_i^2)^2 + (x_i − 1)^2 ]

This function is unimodal with a steep surface and has the minimum value 0 at (1, 1, ..., 1).
f3: Ill-scaled generalized Rosenbrock (star type) function

f(x) = Σ_{i=2}^{n} [ 100 (x_1 − (i x_i)^2)^2 + (i x_i − 1)^2 ]

This function is unimodal and ill-scaled with a steep surface and has the minimum value 0 at (1, 1/2, ..., 1/n).
f4: Generalized Rastrigin function

f(x) = Σ_{i=1}^{n} [ x_i^2 − 10 cos(2π x_i) + 10 ]

This function is multimodal with a very bumpy surface and has the minimum value 0 at (0, 0, ..., 0).
f5: Ackley function

f(x) = 20 + e − 20 exp( −0.2 sqrt( (1/n) Σ_{i=1}^{n} x_i^2 ) ) − exp( (1/n) Σ_{i=1}^{n} cos(2π x_i) ),   −32 ≤ x_i ≤ 32

This function is multimodal with a bumpy surface and has the minimum value 0 at (0, 0, ..., 0).
f6: Griewank function

f(x) = (1/4000) Σ_{i=1}^{n} x_i^2 − Π_{i=1}^{n} cos( x_i / sqrt(i) ) + 1,   −600 ≤ x_i ≤ 600

This function is multimodal with a less bumpy surface and has the minimum value 0 at (0, 0, ..., 0).
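For reference, the standard forms of these benchmarks are easy to implement. Note the hedge: the star-type Rosenbrock variant below follows the chapter's description (all terms coupled to x_1) but its exact coefficients are a reconstruction, and the ill-scaled f3 simply rescales its variables by the index.

```python
import math

def sphere(x):                                   # f1
    return sum(v * v for v in x)

def rosenbrock_star(x):                          # f2 (reconstructed form)
    return sum(100.0 * (x[0] - v * v) ** 2 + (v - 1.0) ** 2 for v in x[1:])

def rastrigin(x):                                # f4
    return sum(v * v - 10.0 * math.cos(2 * math.pi * v) + 10.0 for v in x)

def ackley(x):                                   # f5
    n = len(x)
    return (20.0 + math.e
            - 20.0 * math.exp(-0.2 * math.sqrt(sum(v * v for v in x) / n))
            - math.exp(sum(math.cos(2 * math.pi * v) for v in x) / n))

def griewank(x):                                 # f6
    s = sum(v * v for v in x) / 4000.0
    prod = math.prod(math.cos(v / math.sqrt(i + 1)) for i, v in enumerate(x))
    return s - prod + 1.0
```

All of these attain their minimum value 0 at the optima stated above, which makes them convenient sanity checks for any optimizer.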
Figure 5.1 shows the graphs of functions f2 , f4 , f5 and f6 in case of n = 2.
Table 5.2 Features of test functions

Function  modality    surface               dependency of variables  ill-scale
f1        unimodal    smooth                —                        —
f2        unimodal    steep                 strong                   —
f3        unimodal    steep                 strong                   strong
f4        multimodal  bumpy (large bumps)   —                        —
f5        multimodal  bumpy                 —                        —
f6        multimodal  bumpy (small bumps)   —                        —
Fig. 5.1 Graphs of functions f2, f4, f5 and f6 (n = 2)
Table 5.3 Comparison among adaptive DE with the estimated comparison method, DE with the estimated comparison method, and DE

Func. Method      eval        success     fail        rate(%)  reduce(%)
f1    adaptive    36,256.68   13,224.08   22,950.88   36.56    57.67
      (0.2, 0.4)  46,324.32   14,649.48   31,592.96   31.68    45.92
      (0.1, 0.2)  41,344.04   14,157.88   27,104.32   34.31    51.73
      DE          85,656.16   15,357.96   70,216.20   17.95    0
f2    adaptive    586,018.92  65,552.36   520,384.56  11.19    25.85
      (0.2, 0.4)  578,136.68  58,460.00   519,594.72  10.11    26.84
      (0.1, 0.2)  617,305.88  87,315.84   529,908.20  14.15    21.89
      DE          790,263.36  45,467.08   744,714.28  5.75     0
f3    adaptive    586,919.64  66,579.04   520,258.72  11.35    28.23
      (0.2, 0.4)  582,880.12  59,880.64   522,917.60  10.27    28.73
      (0.1, 0.2)  620,157.24  86,223.28   533,852.04  13.91    24.17
      DE          817,802.48  45,482.80   772,237.68  5.56     0
f4    adaptive    304,002.68  22,342.08   281,578.76  7.35     45.42
      (0.2, 0.4)  313,046.68  23,243.68   289,721.12  7.43     43.79
      (0.1, 0.2)  295,009.04  22,895.68   272,031.48  7.76     47.03
      DE          556,960.80  24,504.44   532,374.36  4.40     0
f5    adaptive    69,175.84   23,521.36   45,572.80   34.04    55.61
      (0.2, 0.4)  83,019.20   25,245.48   57,691.84   30.44    46.72
      (0.1, 0.2)  75,401.76   24,669.00   50,651.08   32.75    51.61
      DE          155,823.64  26,636.68   129,104.96  17.10    0
f6    adaptive    56,982.00   19,982.08   36,918.08   35.12    56.05
      (0.2, 0.4)  71,336.48   21,822.72   49,431.88   30.63    44.98
      (0.1, 0.2)  63,453.12   21,104.56   42,266.72   33.30    51.06
      DE          129,658.04  22,888.72   106,687.32  17.66    0
Here "adaptive" means the adaptive DE with the estimated comparison method using the potential model, "DE" means the original DE/rand/1/exp, and the other rows mean DE with the estimated comparison method using the fixed values (δ, γ) of the margin and congestion parameters specified in the table. The columns labeled "eval", "success", "fail" and "rate" show, on average, the total number of evaluations until a near-optimal solution is found, the number of successful evaluations in which the child solution is better than the parent solution, the number of failed evaluations, and the success rate, respectively. The column "reduce" shows how much the number of function evaluations is reduced compared with DE.
The function f1 is a unimodal and smooth function, and it is easy to approximate. The adaptive DE with the estimated comparison method achieved the best result and reduced the number of function evaluations by 57.67% compared with DE.
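The "reduce" column is simply the relative saving in total evaluations versus plain DE, and the quoted 57.67% can be reproduced from the "eval" column of Table 5.3:

```python
# reduce = 100 * (1 - eval_method / eval_DE), using the f1 row of Table 5.3.

eval_f1 = {"adaptive": 36_256.68, "(0.2,0.4)": 46_324.32,
           "(0.1,0.2)": 41_344.04, "DE": 85_656.16}

def reduction(method, table):
    return 100.0 * (1.0 - table[method] / table["DE"])

print(round(reduction("adaptive", eval_f1), 2))   # 57.67
```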
The functions f2 and f3 are unimodal but steep functions, and they are difficult to approximate. The DE with the estimated comparison method using the fixed parameters (margin, congestion) = (0.2, 0.4) reduced function evaluations by 26.84% (f2) and 28.73% (f3) compared with DE and achieved the best result. The adaptive DE with the estimated comparison method reduced function evaluations by 25.85% (f2) and 28.23% (f3); it achieved the second best result, with a reduction rate almost the same as the best.
The function f4 is a multimodal and very bumpy function, and it is difficult to approximate. The DE with the estimated comparison method using the fixed parameters (margin, congestion) = (0.1, 0.2) reduced function evaluations by 47.03% compared with DE and achieved the best result. The adaptive DE with the estimated comparison method reduced function evaluations by 45.42%; it achieved the second best result, with a reduction rate almost the same as the best.
The function f5 is a multimodal and bumpy function, and it is not so difficult to approximate. The adaptive DE with the estimated comparison method reduced function evaluations by 55.61% compared with DE and achieved the best result.
The function f6 is a multimodal and less bumpy function, and it is fairly easy to approximate. The adaptive DE with the estimated comparison method reduced function evaluations by 56.05% compared with DE and achieved the best result.
These results show that both the adaptive and the fixed-parameter DE with the estimated comparison method are far better than the original DE on all problems. They also show that the adaptive DE with the estimated comparison method could reduce function evaluations by 25% to 57% without hand-tuning the margin and congestion parameters, achieving better or almost the same results as the estimated comparison method with hand-tuned parameters.
Figures 5.2, 5.3, 5.4, 5.5, 5.6 and 5.7 show semi-logarithmic plots of the best function values (left figures) and of the adaptively controlled values of the margin and congestion parameters (right figures) over the number of function evaluations for the functions f1, f2, f3, f4, f5 and f6, respectively. Note that the graphs end at different points, because some runs terminate earlier than others when a near-optimal solution is found. In the figures of the best values, thick solid lines, thin solid lines and dotted lines show the optimization process of the adaptive, (0.2, 0.4) and (0.1, 0.2) DE with the estimated comparison method, respectively, and chain lines show that of DE. In the figures of the parameter values, solid lines and dotted lines show the margin and congestion values controlled by the adaptive DE with the estimated comparison method.
It is clear that both the adaptive and the fixed-parameter DE with the estimated comparison method can find better solutions faster than DE. It is also clear that the adaptive control adjusts the parameter values properly and dynamically. The function f1 can be approximated easily: as shown in the right of Fig. 5.2, the parameter values are small and are often decreased to their minimum values. The approximation of the functions f2 and f3 is very difficult: as shown in the right of Figs. 5.3 and 5.4, the parameter values are large and are often increased to their maximum values. The approximation of f4 is difficult because the function is multimodal; however, once the valley of the function surface that contains the minimum has been found, f4 effectively becomes unimodal and the approximation becomes easy. As shown in the right of Fig. 5.5, the parameter values are large in the early stage of the search process and become small in the last stage. The approximation of f5 and f6 is not so difficult: as shown in the right of Figs. 5.6 and 5.7, the parameter values mostly stay in the lower part of the range [0.01, 0.1], below the fixed-parameter settings, so more function evaluations can be skipped than in the fixed cases. The parameter values for f6 are often lower than those for f5, because f6 is less bumpy than f5. Thus, the parameter values are properly controlled according to the difficulty of approximating the objective function.

Fig. 5.2 Best function values (left) and adaptively controlled parameter values (right) over the number of function evaluations for f1

Fig. 5.3 Best function values (left) and adaptively controlled parameter values (right) over the number of function evaluations for f2

Fig. 5.4 Best function values (left) and adaptively controlled parameter values (right) over the number of function evaluations for f3
Fig. 5.5 Best function values (left) and adaptively controlled parameter values (right) over the number of function evaluations for f4

Fig. 5.6 Best function values (left) and adaptively controlled parameter values (right) over the number of function evaluations for f5

Fig. 5.7 Best function values (left) and adaptively controlled parameter values (right) over the number of function evaluations for f6
5.6 Discussion
In this section, the adaptive DE with the estimated comparison is compared with
particle swarm optimization (PSO), because PSO is known as a fast and efficient optimization algorithm. In this study, the standard Particle Swarm Optimization 2007
Table 5.4 Comparison between adaptive DE with the estimated comparison method and
SPSO-07

Func.  Method     success   eval          best
f1     adaptive   100%      36,256.68     0.001
       SPSO-07    100%      38,956.00     0.001
f2     adaptive   100%      586,018.92    0.001
       SPSO-07      0%      -             2.528
f3     adaptive   100%      586,919.64    0.001
       SPSO-07      0%      -             2.534
f4     adaptive   100%      304,002.68    0.001
       SPSO-07      0%      -             67.059
f5     adaptive   100%      69,175.84     0.001
       SPSO-07    100%      72,016.00     0.001
f6     adaptive   100%      56,982.00     0.001
       SPSO-07     92%      61,721.74     0.002
5.7 Conclusions
We proposed to utilize a rough approximation model, which is an approximation
model with low accuracy and without a learning process, in order to reduce the number
of function evaluations in a wide range of problems, from low- or medium-computation-cost
problems to high-computation-cost problems. We proposed the estimated
comparison method, in which the function evaluation of a solution is skipped when
the goodness of the solution can be judged from its approximation value. Also,
we proposed to control the margin parameter and the congestion parameter adaptively
in the estimated comparison method. Through the optimization of various
types of test problems, it was shown that the estimated comparison method is very
effective in reducing function evaluations. Also, it was shown that, without tuning the
parameters, the adaptive DE with the estimated comparison method can improve
the optimization process and reduce the number of function evaluations by about 25%
to 57% compared with DE.
In the future, we will apply the estimated comparison method to constrained
optimization problems using the constrained Differential Evolution (DE) [16, 19].
We have shown some results of constrained optimization using the estimated comparison
method and DE in [18]. We plan to apply the estimated comparison method
to other evolutionary algorithms such as particle swarm optimization. Also, we
will apply the estimated comparison method to real-world problems, and test the
performance of the method.
References
1. Büche, D., Schraudolph, N.N., Koumoutsakos, P.: Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35(2), 183–194 (2005)
2. Chakraborty, U.K. (ed.): Advances in Differential Evolution. Springer, Heidelberg (2008)
3. Clerc, M.: Standard PSO 2007 (2007), http://www.particleswarm.info/
4. Giunta, A., Watson, L.: A comparison of approximation modeling techniques: Polynomial versus interpolating models. Tech. Rep. 98-4755, AIAA (1998)
5. Guimarães, F.G., Wanner, E.F., Campelo, F., Takahashi, R.H., Igarashi, H., Lowther, D.A., Ramírez, J.A.: Local learning and search in memetic algorithms. In: Proc. of the 2006 IEEE Congress on Evolutionary Computation, Vancouver, BC, Canada, pp. 9841–9848 (2006)
6. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9, 3–12 (2005)
7. Jin, Y., Sendhoff, B.: Reducing fitness evaluations using clustering techniques and neural network ensembles. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 688–699. Springer, Heidelberg (2004)
8. Jin, Y., Olhofer, M., Sendhoff, B.: On evolutionary optimization with approximate fitness functions. In: Proc. of the Genetic and Evolutionary Computation Conference, pp. 786–792. Morgan Kaufmann, San Francisco (2000)
9. Jin, Y., Olhofer, M., Sendhoff, B.: A framework for evolutionary optimization with approximate fitness functions. IEEE Trans. on Evolutionary Computation 6(5), 481–494 (2002)
10. Ong, Y.S., Zhou, Z., Lim, D.: Curse and blessing of uncertainty in evolutionary algorithm using approximation. In: Proc. of the 2006 IEEE Congress on Evolutionary Computation, Vancouver, BC, Canada, pp. 9833–9840 (2006)
11. Shyy, W., Tucker, P.K., Vaidyanathan, R.: Response surface and neural network techniques for rocket engine injector optimization. Tech. Rep. 99-2455, AIAA (1999)
12. Simpson, T.W., Mauery, T.M., Korte, J.J., Mistree, F.: Comparison of response surface and kriging models in the multidisciplinary design of an aerospike nozzle. Tech. Rep. 98-4758, AIAA (1998)
13. Storn, R., Price, K.: Minimizing the real functions of the ICEC 1996 contest by differential evolution. In: Proc. of the International Conference on Evolutionary Computation, pp. 842–844 (1996)
14. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 341–359 (1997)
15. Takahama, T., Sakai, S.: Constrained optimization by the ε constrained differential evolution with gradient-based mutation and feasible elites. In: Proc. of the 2006 IEEE Congress on Evolutionary Computation, pp. 308–315 (2006)
16. Takahama, T., Sakai, S.: Constrained optimization by the ε constrained differential evolution with dynamic ε-level control. In: Chakraborty, U. (ed.) Advances in Differential Evolution. Springer, Heidelberg (2008)
17. Takahama, T., Sakai, S.: Reducing function evaluations in differential evolution using rough approximation-based comparison. In: Proc. of the 2008 IEEE Congress on Evolutionary Computation, pp. 2307–2314 (2008)
18. Takahama, T., Sakai, S.: Efficient constrained optimization by the ε constrained differential evolution using an approximation model with low accuracy. Transactions of the Japanese Society for Artificial Intelligence 24(1), 34–45 (2009) (in Japanese)
19. Takahama, T., Sakai, S., Iwane, N.: Solving nonlinear constrained optimization problems by the ε constrained differential evolution. In: Proc. of the 2006 IEEE Conference on Systems, Man, and Cybernetics, pp. 2322–2327 (2006)
20. Takahama, T., Sakai, S., Hara, A.: Reducing the number of function evaluations in differential evolution by estimated comparison method using an approximation model with low accuracy. IEICE Trans. on Information and Systems J91-D(5), 1275–1285 (2008) (in Japanese)
Chapter 6
6.1 Introduction
6.1.1 Motivations: Efficient Optimization Algorithms for
Expensive Computer Experiments
Beyond both established frameworks of derivative-based descent and stochastic
search algorithms, the rise of expensive optimization problems creates the need for
new specific approaches and procedures. The word "expensive", which refers to
price and/or time issues, implies severely restricted budgets in terms of objective
function evaluations. Such limitations contrast with the computational burden
typically associated with stochastic search techniques, like genetic algorithms. Furthermore,
the latter evaluations provide no differential information in a majority
of expensive optimization problems, whether the objective function originates from
physical or from simulated experiments. Hence there exists a strong motivation for
developing derivative-free algorithms, with a particular focus on their optimization
performance within a drastically limited number of evaluations. Investigating and
implementing adequate strategies constitutes a contemporary challenge at the interface
between Applied Mathematics and Computational Intelligence, especially when it
comes to reducing optimization durations by efficiently taking advantage of parallel
computation facilities.
The primary aim of this chapter is to address parallelization issues for the optimization
of expensive-to-evaluate simulators, such as are increasingly encountered in
engineering applications like car crash tests, nuclear safety, or reservoir forecasting.
More specifically, the work presented here takes place in the frame of metamodel-based
design of computer experiments, in the sense of [42]. Even though the results
and discussions might be extended to a more general scope, we restrict ourselves here
for clarity to single-objective optimization problems for deterministic codes. The
simulator is seen as a black-box function y with a d-dimensional vector of inputs and a
scalar output, the latter being often obtained as a combination of several responses.
Metamodels, also called surrogate models, are simplified representations of y. They
can be used for predicting values of y outside the initial design, or for visualizing the
influence of each variable on y [27, 43]. They may also guide further sampling decisions
for various purposes, such as refining the exploration of the input space in
preferential zones or optimizing the function y [22]. Classical surrogates include
radial basis functions [37], interpolation splines [52], neural nets [8] (deterministic
metamodels), or linear and non-linear regression [2], and Kriging [7] (probabilistic
metamodels). We concentrate here on the advantages of probabilistic metamodels
for parallel exploration and optimization, with a particular focus on the virtues of
Kriging.
A History Going from Experiments to Theory: CI methods very often originate from empirical computing experiments, in particular from experiments that
mimic natural processes (e.g., neural networks [4], ant colony optimization [5],
simulated annealing [25]). Later on, as researchers use and analyze them, theory
develops and their mathematical content grows. A good example is provided
by the evolutionary algorithms [9], which have progressively mixed the genetic
metaphor and stochastic optimization theory.
An Indirect Problem Representation: In standard evolutionary optimization
methods, knowledge about the cost function takes the indirect form of a set of
well-performing points, known as the current population. Such a set of points is an
implicit, partial representation of a function. In fuzzy methods, the probability
density functions of the uncertain variables are averaged out. Such indirect
representations make it possible to work with few mathematical assumptions and have
broadened the range of applicability of CI methods.
Parallelized Decision Process: Most CI approaches are inherently parallel. For
example, the evolutionary or particle swarm optimization [24] methods process
sets of points in parallel. Neural networks have an internal parallel structure.
Today, parallelism is crucial for taking advantage of increasingly distributed
computing capacity. The parallel decision making possibilities are related to the
indirect problem representations (through sets of points, distributions) and to the
use of randomness in the decision process.
Heuristics: Implicit problem representations and the empirical genesis of the CI
methods rarely allow mathematical proofs of the methods' properties. Most CI
methods are thus heuristics.
Kriging has recently gained popularity among several research communities related
to CI, ranging from Data Mining [16] and Bayesian Statistics [34, 48] to Machine
Learning [39], where it is linked to Gaussian Process Regression [53] and Kernel
Methods [12]. Recent works [17, 30, 31] illustrate the practical relevance of Kriging
to approximate computer codes in application areas such as aerospace engineering or materials science. Indeed, probabilistic metamodels like Kriging seem to be
particularly adapted for the optimization of black-box functions, as analyzed and
illustrated in the excellent article [20]. The current Chapter is devoted to the optimization of black-box functions using a Kriging metamodel [14, 22, 49, 51]. Let
us now stress some essential relationships between Kriging and CI by revisiting the
above list of features.
A History from Field Studies to Mathematical Statistics: Kriging comes from
the earth sciences [29, 33], and has been progressively developed since the 1950s
along with the discipline called geostatistics [23, 32]. Originally aimed at estimating
natural resources in mining applications, it has later been adapted to address very
general interpolation and approximation problems [42, 43]. The word
"Kriging" comes from the name of a mining engineer, Prof. Daniel G. Krige,
who was a pioneer in the application of mathematical statistics to the study of
new gold mines using a limited number of boreholes [29].
Section 6.3 (The Multi-points Expected Improvement) consists in the presentation of the q-EI criterion continuing the work initiated in [47], its explicit
calculation when q = 2, and the derivation of estimates of the latter criterion in
the general case, relying on Monte-Carlo simulations of Gaussian vectors.
Section 6.4 (Approximated q-EI maximization) introduces two heuristic strategies, KB and CL, to circumvent the computational complexity of a direct q-EI
maximization. These strategies are tested on a classical test case, and CL is found
to be a very promising competitor for approximated q-EI maximization.
Section 6.5 (Towards Kriging-based Parallel Optimization: Conclusion and Perspectives) gives a summary of the obtained results as well as some related practical
recommendations, and finally suggests what the authors think are the research perspectives
to be addressed most urgently in order to extend this work.
The appendix 6.6 is a short but dense introduction to GP for machine learning, with an emphasis on the foundations of both Simple Kriging and Ordinary
Kriging by GP conditioning.
Some Notations: y : x ∈ D ⊂ R^d ↦ y(x) ∈ R refers to the objective function, where
d ∈ N\{0} is the number of input variables and D is the set in which the inputs vary,
most of the time assumed to be a compact and connex¹ subset of R^d. At first, y is
known at a Design of Experiments X = {x^1, ..., x^n}, where n ∈ N is the number of
initial runs or experiments, and each x^i (1 ≤ i ≤ n) is hence a d-dimensional vector
(x^i_1, ..., x^i_d). We denote by Y = {y(x^1), ..., y(x^n)} the set of observations made by
evaluating y at the points of X. The data (X, Y) provide the information on which
the metamodeling of y is initially based, with an accuracy that depends on n, the
geometry of X, and the regularity of y. The OK mean predictor and prediction variance
are denoted by the functions m_OK(·) and s²_OK(·). The random process implicitly
underlying OK is denoted by Y(·), in accordance with the notations of eq. (6.35)
presented in the appendix. The symbol | is used for conditioning, together with the
classical symbols for probability and expectation, respectively P and E.

¹ Connexity is sometimes untenable in practical applications; see e.g. [46] for a treatment of
disconnected feasible regions.
m_{OK}(x) = \left[ c(x) + \frac{1 - \mathbf{1}_n^T \Sigma^{-1} c(x)}{\mathbf{1}_n^T \Sigma^{-1} \mathbf{1}_n}\, \mathbf{1}_n \right]^T \Sigma^{-1} Y,   (6.1)

s^2_{OK}(x) = \sigma^2 - c(x)^T \Sigma^{-1} c(x) + \frac{\left(1 - \mathbf{1}_n^T \Sigma^{-1} c(x)\right)^2}{\mathbf{1}_n^T \Sigma^{-1} \mathbf{1}_n},   (6.2)

where c(x) := (c(Y(x), Y(x^1)), ..., c(Y(x), Y(x^n)))^T, and Σ and σ² are defined following
the assumptions² and notations given in appendix 6.6. Classical properties of OK
include that ∀i ∈ [1, n], m_OK(x^i) = y(x^i) and s²_OK(x^i) = 0; therefore
[Y(x)|Y(X) = Y] is interpolating. Note that [Y(x^a)|Y(X) = Y] and [Y(x^b)|Y(X) = Y]
are dependent random variables, where x^a and x^b are arbitrary points of D, as we
will develop later.
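As an illustration, eqs. (6.1)-(6.2) translate directly into code. The sketch below is ours, not the authors': it assumes a squared-exponential covariance kernel with fixed hyperparameters sigma2 and theta (illustrative choices; the chapter estimates such hyperparameters by likelihood maximization), and the function names are hypothetical.

```python
import numpy as np

def sq_exp_cov(A, B, sigma2=1.0, theta=0.3):
    """Covariance c(Y(a), Y(b)) between the rows of A and B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma2 * np.exp(-d2 / (2.0 * theta ** 2))

def ok_predict(x, X, Y, sigma2=1.0, theta=0.3):
    """Ordinary Kriging mean (eq. 6.1) and variance (eq. 6.2) at a single point x."""
    n = X.shape[0]
    Sigma = sq_exp_cov(X, X, sigma2, theta) + 1e-10 * np.eye(n)  # jitter for stability
    Si = np.linalg.inv(Sigma)
    c = sq_exp_cov(x[None, :], X, sigma2, theta).ravel()         # the vector c(x)
    ones = np.ones(n)
    lam = (1.0 - ones @ Si @ c) / (ones @ Si @ ones)             # OK correction term
    m = (c + lam * ones) @ Si @ Y                                # eq. (6.1)
    s2 = sigma2 - c @ Si @ c + lam ** 2 * (ones @ Si @ ones)     # eq. (6.2)
    return m, max(s2, 0.0)
```

On any design point the classical properties above hold: the mean interpolates the observation and the variance vanishes.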
The OK metamodel of the Branin-Hoo function (cf. eq. 6.25) is plotted in fig. 6.1.
The OK interpolation (upper middle) is based on only 9 observations. Even if
the shape is reasonably respected (lower middle), the contour of the mean shows an
artificial optimal zone (upper middle, around the point (6, 2)). In other respects, the
variance does not depend on the observations³ (eq. 6.2). Note its particular shape,
due to the anisotropy of the covariance kernel estimated by likelihood maximization.
In modern interpretations [39], deriving the OK equations is often based on the assumption
that y is a realization of a random process Y with unknown constant mean and
known covariance (see [1] or [12] for a review of classical covariance kernels). Here
we follow the derivation of 6.6.4, which has the advantage of delivering a Gaussian
posterior distribution:

Y(x) \,|\, Y(X) = Y \;\sim\; \mathcal{N}\left(m_{OK}(x),\; s^2_{OK}(x)\right).   (6.3)

Note that both a structure selection and a parametric estimation are made in practice:
one often chooses a generalized exponential kernel with plugged-in maximum
likelihood covariance hyperparameters, i.e. without taking the estimation variance
into account [22]. This issue is sometimes addressed using a full Bayesian treatment,
as can be found in [43], or more recently in [15, 34, 39]. Rephrasing eq. 6.3,
under the latter GP assumptions, the random variable Y(x) knowing the values
of {y(x^1), ..., y(x^n)} follows a Gaussian distribution whose mean and variance are
respectively E[Y(x)|Y(X) = Y] = m_OK(x) and Var[Y(x)|Y(X) = Y] = s²_OK(x). In
fact, as shown in the appendix (cf. eq. 6.38), one can even get much more than these
marginal conditional distributions; Y(·)|Y(X) = Y constitutes a random process
² An extension of the Kriging equations to the framework of covariance non-stationary processes
[35] is straightforward but beyond the scope of the present work.
³ A phenomenon known as homoskedasticity of the Kriging variance with respect to the
observations [7].
Fig. 6.1 Ordinary Kriging of the Branin-Hoo function (function, Kriging mean value and
variance, from left to right). The design of experiments is a 3 × 3 factorial design. The covariance
is an anisotropic squared exponential with parameters estimated by Gaussian likelihood
maximization [7]
Y(\cdot) \,|\, Y(X) = Y \;\sim\; \mathrm{GP}\left(m_{OK}(\cdot),\; c_{OK}(\cdot, \cdot)\right),   (6.4)

with conditional covariance kernel

c_{OK}(x, x') = c(x, x') - c(x)^T \Sigma^{-1} c(x') + \frac{\left(1 - \mathbf{1}_n^T \Sigma^{-1} c(x)\right)\left(1 - \mathbf{1}_n^T \Sigma^{-1} c(x')\right)}{\mathbf{1}_n^T \Sigma^{-1} \mathbf{1}_n}.   (6.5)
This new kernel c_OK is not stationary, even if c is. In other respects, knowledge
of m_OK and c_OK is the first step towards performing conditional simulations of Y knowing
the observations Y(X) = Y, which is easily feasible at any new finite design of
experiments, whatever the dimension of the inputs. This will enable the computation
of any multi-points sampling criterion, such as those proposed in the forthcoming section
about parallelization.
metamodel in [44, 45] or [20]. The latter analyzes why directly optimizing a deterministic metamodel (like a spline, a polynomial, or the Kriging mean) is dangerous,
and does not even necessarily lead to a local optimum. Kriging-based sequential
optimization strategies (as developed in [22], and commented in [20]) address the
issue of converging to non (locally) optimal points, by taking the Kriging variance
term into account (hence encouraging the algorithms to explore outside the already
visited zones). Such algorithms produce one point at each iteration that maximizes a
figure of merit based upon [Y (x)|Y (X) = Y]. In essence, the criteria balance Kriging
mean prediction and uncertainty.
6.2.2.1
A fundamental flaw of minimizing m_OK is that it takes no account of the uncertainty
associated with the prediction. At the opposite extreme, it is possible to define the next
optimization iterate as the least known point in D,

x' = \arg\max_{x \in D} s_{OK}(x).   (6.6)

This procedure defines a sequence of points x' which will fill the space D and hence
ultimately locate a global optimum. Yet, since no use is made of the previously obtained
Y information (see formula 6.2 for s²_OK), there is no bias in favor of high-performance
regions. Maximizing the uncertainty is inefficient in practice.
6.2.2.3
The most general formulation for compromising between the exploitation of previous
simulations brought by m_OK and the exploration based on s_OK is the multicriteria
problem

\min_{x \in D} m_{OK}(x) \quad \text{and} \quad \max_{x \in D} s_{OK}(x).   (6.7)

Let P denote the Pareto set of solutions⁴. Finding one (or many) elements in P
remains a difficult problem since P typically contains an infinite number of points.
A comparable approach, called DIRECT, although not based on OK, is described in
[21]: the metamodel is piecewise linear and the uncertainty measure is a distance to
already known points. The space D is discretized and the Pareto-optimal set defines
areas where the discretization is refined. The method becomes computationally expensive
as the number of iterations and dimensions increases. Note that [3] proposes
several parallelized versions of DIRECT.
6.2.2.4
Among the numerous criteria presented in [20], the probability of getting an improvement
of the function with respect to the past evaluations seems to be one
of the most fundamental. It is defined for every x ∈ D as the probability for the
random variable Y(x) to be below the currently known minimum
min(Y) = min{y(x^1), ..., y(x^n)}, conditional on the observations at the design of
experiments:

PI(x) := P\left(Y(x) \le \min(Y(X)) \,\middle|\, Y(X) = Y\right)   (6.8)
       = \Phi\!\left(\frac{\min(Y) - m_{OK}(x)}{s_{OK}(x)}\right),   (6.9)

where Φ is the Gaussian cumulative distribution function, and the last equality follows
from eq. 6.3. The threshold min(Y) is sometimes replaced by some arbitrary target
T ∈ R, as evoked in [38]. PI is known to provide a very local search whenever
the value of T is equal or close to min(Y). Taking several T's is a remedy proposed
by [20] to force global exploration. Of course, this new degree of freedom is also
one more parameter to fit. In other respects, PI has also been successfully used as a
pre-selection criterion in GP-assisted evolution strategies [49], where it was pointed
out that PI performs well but has a tendency to sample in unexplored areas. We argue
that the chosen covariance structure plays a crucial role in such matters, depending
on whether the Kriging mean overshoots the observations or not. The next criterion
presented, the expected improvement, is less sensitive to such issues since it
explicitly integrates both the Kriging mean and variance.
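As a quick sketch, eq. (6.9) makes PI a one-liner; m and s below stand for the OK mean and standard deviation at x, fmin for min(Y), and the function name is ours (pure standard library, no external dependency assumed):

```python
from math import erf, sqrt

def probability_of_improvement(m, s, fmin):
    """PI of eqs. (6.8)-(6.9): Phi((fmin - m) / s)."""
    if s <= 0.0:
        return 0.0  # already-visited site: the conditional law is degenerate
    u = (fmin - m) / s
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))  # standard normal CDF
```

Consistent with fig. 6.2, PI is close to 1/2 wherever the predicted mean equals the current minimum and the site is still uncertain.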
6.2.2.5
The Expected Improvement,

EI(x) := E\left[\left(\min(Y) - Y(x)\right)^+ \,\middle|\, Y(X) = Y\right],   (6.10)

additionally takes into account the magnitude of the improvements. EI measures
how much improvement is expected by sampling at x. Indeed, the improvement
will be 0 if y(x) is above min(Y), and min(Y) − y(x) otherwise. Knowing the conditional
distribution of Y(x), it is possible to calculate EI in closed form:
EI(x) = \left(\min(Y) - m_{OK}(x)\right) \Phi\!\left(\frac{\min(Y) - m_{OK}(x)}{s_{OK}(x)}\right) + s_{OK}(x)\, \phi\!\left(\frac{\min(Y) - m_{OK}(x)}{s_{OK}(x)}\right),   (6.11)

where φ stands for the probability density function of the standard normal law
N(0, 1).
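The closed form (6.11) is just as short in code. This is our sketch (hypothetical names; m and s are the OK mean and standard deviation at x, fmin = min(Y)):

```python
from math import erf, exp, pi, sqrt

def norm_pdf(u):
    """Standard normal density phi."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def norm_cdf(u):
    """Standard normal CDF Phi."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def expected_improvement(m, s, fmin):
    """EI of eq. (6.11): (fmin - m) * Phi(u) + s * phi(u) with u = (fmin - m) / s."""
    if s <= 0.0:
        return max(fmin - m, 0.0)  # no uncertainty left, e.g. at a visited site
    u = (fmin - m) / s
    return (fmin - m) * norm_cdf(u) + s * norm_pdf(u)
```

At a visited site EI is null, while any positive s makes it strictly positive, in line with the properties discussed in the text.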
Proof of 6.11:

EI(x) = E\left[\left(\min(Y) - Y(x)\right) \mathbf{1}_{Y(x) \le \min(Y)} \,\middle|\, Y(X) = Y\right]
      = \int_{-\infty}^{\min(Y)} \left(\min(Y) - t\right) f_{\mathcal{N}(m_{OK}(x),\, s^2_{OK}(x))}(t)\, dt
      = \int_{-\infty}^{\frac{\min(Y) - m_{OK}(x)}{s_{OK}(x)}} \left(\min(Y) - m_{OK}(x) - s_{OK}(x)\, u\right) \phi(u)\, du
      = \left(\min(Y) - m_{OK}(x)\right) \Phi\!\left(\frac{\min(Y) - m_{OK}(x)}{s_{OK}(x)}\right) + s_{OK}(x)\, \phi\!\left(\frac{\min(Y) - m_{OK}(x)}{s_{OK}(x)}\right),

using the substitution u = (t - m_{OK}(x))/s_{OK}(x) and \int_{-\infty}^{a} u\,\phi(u)\, du = -\phi(a). □
EI represents a trade-off between promising and uncertain zones. This criterion has
important properties for sequential exploration: it is null at the already visited sites,
and positive everywhere else, with a magnitude that increases with the Kriging
variance and with decreasing Kriging mean (EI maximizers are indeed part of the
Pareto front of (s_OK, m_OK)). Such features are usually demanded of global optimization
procedures (see [21] for instance). EI and the probability of improvement
are compared in fig. 6.2.
6.2.2.6
SUR (Stepwise Uncertainty Reduction) has been introduced in [11] and extended to global optimization in [50, 51].
By modeling y using the process Y's conditional law Y(x)|Y, it is possible to define
x*|Y, the conditional law of Y's global minimizer x*, and its density p_{x*|Y}(x). The
uncertainty about the location of x* is measured as the entropy of p_{x*|Y}(x), H(x*|Y).
H(x*|Y) diminishes as the distribution of x*|Y gets more peaked. Conceptually, the
SUR strategy for global optimization chooses as next iterate the point that best specifies
the location of the optimum,

x' = \arg\min_{x \in D} H\left(x^* \,\middle|\, Y, Y(x)\right).   (6.12)
Fig. 6.2 PI and EI surfaces of the Branin-Hoo function (same design of experiments, Kriging
model, and covariance parameters as in fig. 6.1). Maximizing PI leads to sampling near the
good points (associated with low observations) whereas maximizing EI leads here to sampling
between the good points. By construction, both criteria are null at the design of experiments,
but the probability of improvement is very close to 1/2 in a neighborhood of the point(s) where
the function takes its current minimum
6.2.2.7
EGO [22] relies on the EI criterion. Starting with an initial Design X (typically a
Latin Hypercube), EGO sequentially visits the current global maximizer of EI (say
the first visited one if there is more than one global maximizer) and updates the OK
metamodel at each iteration, including hyperparameter re-estimation:

1. Evaluate y at X, set Y = y(X), and estimate the covariance parameters
   of Y by MLE (Maximum Likelihood Estimation)
2. While stopping criterion not met:
   a. Compute x' = argmax_{x∈D} EI(x), set X = X ∪ {x'} and Y = Y ∪ {y(x')}
   b. Re-estimate the covariance parameters by MLE
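To make the loop concrete, here is a minimal runnable sketch of EGO on a one-dimensional toy problem. Two simplifications are ours and not part of the original algorithm: the covariance hyperparameters are kept fixed instead of being re-estimated by MLE at each iteration (step 2b), and the EI maximization of step 2a is done over a finite grid of candidates. All function names are illustrative.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def _phi(u):
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def _Phi(u):
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def _ok(x, X, Y, s2=1.0, th=0.3):
    """OK mean and std at scalar x (assumed squared-exponential kernel)."""
    k = lambda a, b: s2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * th ** 2))
    Si = np.linalg.inv(k(X, X) + 1e-8 * np.eye(len(X)))  # jitter for stability
    c = k(np.array([x]), X).ravel()
    o = np.ones(len(X))
    lam = (1.0 - o @ Si @ c) / (o @ Si @ o)
    m = (c + lam * o) @ Si @ Y
    v = max(s2 - c @ Si @ c + lam ** 2 * (o @ Si @ o), 0.0)
    return m, sqrt(v)

def _ei(x, X, Y):
    """Closed-form EI of eq. (6.11) at candidate x."""
    m, s = _ok(x, X, Y)
    if s <= 0.0:
        return 0.0
    u = (min(Y) - m) / s
    return (min(Y) - m) * _Phi(u) + s * _phi(u)

def ego(y, X0, n_iter, grid):
    """Steps 1-2 of the EGO loop (fixed hyperparameters, grid-based EI search)."""
    X = list(X0)
    Y = [y(x) for x in X]
    for _ in range(n_iter):
        xs = max(grid, key=lambda x: _ei(x, np.array(X), np.array(Y)))
        X.append(xs)
        Y.append(y(xs))  # evaluate y at the current EI maximizer
    return X, Y
```

On y(x) = (x − 0.3)² with initial design {−1, 0, 1}, a few iterations drive the best observed value well below the best initial observation.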
After having been developed in [22, 47], EGO has inspired contemporary works
in the optimization of expensive-to-evaluate functions. For instance, [19] exposes some
EGO-based methods for the optimization of noisy black-box functions like stochastic
simulators. [18] focuses on multiple numerical simulators with different levels
of fidelity, and introduces the so-called augmented EI criterion, integrating possible
heterogeneity in the simulation times. Moreover, [26] proposes an adaptation to
multi-objective optimization, [17] proposes an original multi-objective adaptation of
EGO for physical experiments, and [28] focuses on robust criteria for multiobjective
constrained optimization with applications to laminating processes.
In all, one major drawback of the EGO-like algorithms discussed so far is that
they do not allow parallel evaluations of y, which is desirable for costly simulators
(e.g. a crash-test simulation run typically lasts 24 hours). This was already pointed
out in [47], where the multi-points EI was defined but not further developed. Here
we continue this work by making the latter multi-points EI (q-EI) explicit, and by proposing
two classes of heuristic strategies meant to approximately optimize the q-EI, and
hence (almost) simultaneously deliver an arbitrary number of points without intermediate
evaluations of y. In particular, we analytically derive the 2-EI, and explain
in detail how to take advantage of statistical interpretations of Kriging to consistently
compute q-EI by simulation when q > 2, which happens to provide quite a
general template for designing Kriging-based parallel evaluation strategies dedicated
to optimization or other purposes.
EI(x^{n+1}, ..., x^{n+q}) := E\left[\max\left\{\left(\min(Y) - Y(x^{n+1})\right)^+, ..., \left(\min(Y) - Y(x^{n+q})\right)^+\right\} \,\middle|\, Y(X) = Y\right]   (6.14)

Hence, the q-EI may be seen as the regular EI applied to the random variable
min(Y(x^{n+1}), ..., Y(x^{n+q})). We thus have to deal with a minimum of dependent random
variables. Fortunately, eq. 6.4 provides us with the exact joint distribution of
the q unknown responses conditional on the observations:

\left[\left(Y(x^{n+1}), ..., Y(x^{n+q})\right) \,\middle|\, Y(X) = Y\right] \sim \mathcal{N}\left(\left(m_{OK}(x^{n+1}), ..., m_{OK}(x^{n+q})\right), S_q\right)   (6.15)

where the elements of the conditional covariance matrix S_q are (S_q)_{i,j} = c_OK(x^{n+i},
x^{n+j}) (see eq. 6.5). We now propose two different ways to evaluate the criterion of
eq. 6.14, depending on whether q = 2 or q ≥ 3.
We will now show that the 2-EI can be developed as a sum of two 1-EIs, plus a
correction term involving 1- and 2-dimensional Gaussian cumulative distributions.
Fig. 6.3 1-EI (lower left) and 2-EI (right) functions associated with a monodimensional
quadratic function y(x) = 4(x − 0.45)², known at X = {−1, −0.5, 0, 0.5, 1}. The OK metamodel
has here a cubic covariance with parameters σ² = 10, scale = 0.9
Before all, some classical results of conditional calculus allow us to make precise the
dependence between Y(x^{n+1}) and Y(x^{n+2}) conditional on Y(X) = Y, and to fix some
additional notations. For i, j ∈ {1, 2} (i ≠ j), we note:
\sigma_i := s_{OK}(x^{n+i}) = \sqrt{\mathrm{Var}\left[Y(x^{n+i}) \mid Y(X) = Y\right]},
c_{1,2} := \rho_{1,2}\,\sigma_1\sigma_2 := \mathrm{cov}\left[Y(x^{n+1}), Y(x^{n+2}) \mid Y(X) = Y\right],
m_{i|j} := E\left[Y(x^{n+i}) \mid Y(X) = Y, Y(x^{n+j})\right] = m_i + c_{1,2}\,\sigma_j^{-2}\left(Y(x^{n+j}) - m_j\right),
s^2_{i|j} := \sigma_i^2 - c_{1,2}^2\,\sigma_j^{-2} = \sigma_i^2\left(1 - \rho_{12}^2\right),   (6.16)

where m_i := m_{OK}(x^{n+i}).
At this stage we are in a position to compute EI(x^{n+1}, x^{n+2}) in four steps. From now
on, we replace the complete notation Y(x^{n+i}) by Y_i and omit the conditioning on
Y(X) = Y for the sake of clarity.
Step 1.

EI(x^{n+1}, x^{n+2}) = E\left[\left(\min(Y) - \min(Y_1, Y_2)\right) \mathbf{1}_{\min(Y_1,Y_2) \le \min(Y)}\right]
= E\left[\left(\min(Y) - \min(Y_1,Y_2)\right) \mathbf{1}_{\min(Y_1,Y_2) \le \min(Y)} \left(\mathbf{1}_{Y_1 \le Y_2} + \mathbf{1}_{Y_2 \le Y_1}\right)\right]
= E\left[\left(\min(Y) - Y_1\right) \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_1 \le Y_2}\right] + E\left[\left(\min(Y) - Y_2\right) \mathbf{1}_{Y_2 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right]

Since both terms of the last sum are similar (up to a permutation between x^{n+1} and
x^{n+2}), we restrict our attention to the first one. Using \mathbf{1}_{Y_1 \le Y_2} = 1 - \mathbf{1}_{Y_2 \le Y_1}⁵, we
get:

E\left[\left(\min(Y) - Y_1\right) \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_1 \le Y_2}\right] = E\left[\left(\min(Y) - Y_1\right) \mathbf{1}_{Y_1 \le \min(Y)} \left(1 - \mathbf{1}_{Y_2 \le Y_1}\right)\right]
= EI(x^{n+1}) - E\left[\left(\min(Y) - Y_1\right) \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right]
= EI(x^{n+1}) + B(x^{n+1}, x^{n+2}),

where B(x^{n+1}, x^{n+2}) = E\left[\left(Y_1 - \min(Y)\right) \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right]. Informally, B(x^{n+1}, x^{n+2})
is the opposite of the improvement brought by Y_1 when Y_2 ≤ Y_1, which hence does not
contribute to the two-points improvement. Our aim in the next steps will be to
give an explicit expression for B(x^{n+1}, x^{n+2}).

Step 2.

B(x^{n+1}, x^{n+2}) = E\left[Y_1 \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right] - \min(Y)\, E\left[\mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right]

⁵ This expression should be written \mathbf{1}_{Y_2 < Y_1}, but since we work with continuous random
variables, it suffices that their correlation is ≠ 1 for the expression to be exact ({Y_1 = Y_2}
is then negligible). We implicitly make this assumption in the following.
The two terms of this sum require some attention. We compute them in detail in the
next two steps.

Step 3. Using a key property of conditional calculus⁶, we obtain

E\left[Y_1 \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right] = E\left[Y_1 \mathbf{1}_{Y_1 \le \min(Y)}\, E\left[\mathbf{1}_{Y_2 \le Y_1} \mid Y_1\right]\right],

and the fact that Y_2 | Y_1 ∼ N(m_{2|1}(Y_1), s²_{2|1}(Y_1)) (all conditional on the observations)
leads to the following:

E\left[\mathbf{1}_{Y_2 \le Y_1} \mid Y_1\right] = \Phi\!\left(\frac{Y_1 - m_{2|1}}{s_{2|1}}\right) = \Phi\!\left(\frac{Y_1 - m_2 - \frac{c_{1,2}}{\sigma_1^2}\left(Y_1 - m_1\right)}{\sigma_2\sqrt{1 - \rho_{12}^2}}\right).

Back to the main term, and using again the normal decomposition Y_1 = m_1 + σ_1 N_1
with N_1 ∼ N(0, 1), we get:

E\left[Y_1 \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right] = E\left[\left(m_1 + \sigma_1 N_1\right) \mathbf{1}_{N_1 \le \beta_1}\, \Phi\!\left(\gamma_1 N_1 + \alpha_1\right)\right],

where \beta_1 = \frac{\min(Y) - m_1}{\sigma_1}, \quad \gamma_1 = \frac{\sigma_1 - \rho_{12}\sigma_2}{\sigma_2\sqrt{1 - \rho_{12}^2}}, \quad \alpha_1 = \frac{m_1 - m_2}{\sigma_2\sqrt{1 - \rho_{12}^2}}.   (6.17)

And since

\int_{-\infty}^{\beta_1} u\, \phi(u)\, \Phi\!\left(\gamma_1 u + \alpha_1\right) du = -\phi(\beta_1)\Phi\!\left(\gamma_1\beta_1 + \alpha_1\right) + \frac{\gamma_1}{\sqrt{1 + \gamma_1^2}}\, \phi\!\left(\frac{\alpha_1}{\sqrt{1 + \gamma_1^2}}\right) \Phi\!\left(\sqrt{1 + \gamma_1^2}\,\beta_1 + \frac{\gamma_1\alpha_1}{\sqrt{1 + \gamma_1^2}}\right),

which follows from u^2 + \left(\gamma_1 u + \alpha_1\right)^2 = \left(1 + \gamma_1^2\right)\left(u + \frac{\gamma_1\alpha_1}{1 + \gamma_1^2}\right)^2 + \frac{\alpha_1^2}{1 + \gamma_1^2} and the
change of variable v = \sqrt{1 + \gamma_1^2}\left(u + \frac{\gamma_1\alpha_1}{1 + \gamma_1^2}\right), we obtain:

E\left[N_1 \mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right] = -\phi(\beta_1)\Phi\!\left(\gamma_1\beta_1 + \alpha_1\right) + \frac{\gamma_1}{\sqrt{1 + \gamma_1^2}}\, \phi\!\left(\frac{\alpha_1}{\sqrt{1 + \gamma_1^2}}\right) \Phi\!\left(\sqrt{1 + \gamma_1^2}\,\beta_1 + \frac{\gamma_1\alpha_1}{\sqrt{1 + \gamma_1^2}}\right).
Step 4. We then compute the term E\left[\mathbf{1}_{Y_1 \le \min(Y)} \mathbf{1}_{Y_2 \le Y_1}\right] = E\left[\mathbf{1}_{X \le \min(Y)} \mathbf{1}_{Z \le 0}\right], where
(X, Z) := (Y_1, Y_2 - Y_1) follows a 2-dimensional Gaussian distribution with expectation
M = (m_1, m_2 - m_1) and covariance matrix

\Gamma := \begin{pmatrix} \sigma_1^2 & c_{1,2} - \sigma_1^2 \\ c_{1,2} - \sigma_1^2 & \sigma_1^2 + \sigma_2^2 - 2c_{1,2} \end{pmatrix}.   (6.18)

The final result relies on the fact that E\left[\mathbf{1}_{X \le \min(Y)} \mathbf{1}_{Z \le 0}\right] = CDF(M, \Gamma)(\min(Y), 0),
where CDF stands for the bi-Gaussian cumulative distribution function.
Figure 6.3 represents the 1-EI and the 2-EI contour plots associated with a deterministic
polynomial function known at 5 points. 1-EI advises here to sample between the
good points of X. The 2-EI contour illustrates some general properties:
2-EI is symmetric and its diagonal equals 1-EI, as can easily be seen by coming
back to the definitions. Roughly said, 2-EI is high whenever the two points have high
1-EI and are reasonably distant from one another (precisely, in the sense of the metric
used in OK). Additionally, maximizing 2-EI selects here the two best local optima
of 1-EI (x_1 = 0.3 and x_2 = 0.7). This is not a general fact. The next example illustrates
for instance how 2-EI maximization can yield two points located around (but
different from) 1-EI's global optimum whenever 1-EI has one single peak of great
magnitude (see fig. 6.4).
\forall k \in [1, n_{sim}], \quad M_k = \left(m_{OK}(x^{n+1}), ..., m_{OK}(x^{n+q})\right) + \left[S_q^{1/2} N_k\right]^T, \quad N_k \sim \mathcal{N}(0_q, I_q) \text{ i.i.d.}   (6.19)
Fig. 6.4 1-point EI (lower left) and 2-points EI (right) functions associated with a monodimensional linear function y(x) = 3x, known at X = {−1, −0.5, 0, 0.5, 1}. The OK metamodel has here a cubic covariance with parameters σ² = 10, scale = 1.4
\widehat{q\text{-}EI} := \frac{1}{n_{sim}} \sum_{k=1}^{n_{sim}} \max\left\{\min(Y) - \min(M_k),\; 0\right\},   (6.20)
and the Central Limit Theorem (CLT) can finally be used to control the precision of
the Monte Carlo approximation as a function of n_sim (see [40] for details concerning
the variance estimation).
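Following eqs. (6.19)-(6.20), the Monte Carlo estimation of q-EI takes only a few lines. In this sketch (names and defaults are ours), mean and cov denote the OK conditional mean vector and covariance matrix S_q at the q candidate points, and a Cholesky factor plays the role of S_q^{1/2}:

```python
import numpy as np

def q_ei_mc(mean, cov, fmin, nsim=100_000, seed=0):
    """Monte-Carlo estimate of q-EI: average of max(fmin - min(M_k), 0)."""
    mean = np.asarray(mean, dtype=float)
    rng = np.random.default_rng(seed)
    # S_q^{1/2} as a Cholesky factor, with a small jitter for numerical safety
    L = np.linalg.cholesky(np.asarray(cov, dtype=float) + 1e-12 * np.eye(len(mean)))
    N = rng.standard_normal((nsim, len(mean)))   # the N_k of eq. (6.19)
    M = mean + N @ L.T                           # simulated conditional responses
    return np.maximum(fmin - M.min(axis=1), 0.0).mean()  # eq. (6.20)
```

For q = 1 the estimate agrees with the closed-form EI of eq. (6.11), which gives a convenient sanity check; per the CLT, the estimation error shrinks at rate 1/sqrt(n_sim).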
Instead of searching for the globally optimal vector (x'^{n+1}, x'^{n+2}, ..., x'^{n+q}), an intuitive
way of replacing it by a sequential approach is the following: first look for
the next best single point x^{n+1} = argmax_{x∈D} EI(x), then feed the model and look
for x^{n+2} = argmax_{x∈D} EI(x), and so on. Of course, the value y(x^{n+1}) is not known
at the second step (else we would be in a real sequential algorithm, like EGO).
Nevertheless, we have two pieces of information at our disposal: the site x^{n+1} is assumed
to have already been visited at the previous iteration, and [Y(x^{n+1})|Y = Y(X)]
has a known distribution. More precisely, the latter is [Y(x^{n+1})|Y(X) = Y] ∼
N(m_OK(x^{n+1}), s²_OK(x^{n+1})). Hence, the second site x^{n+2} can be computed as:
x^{n+2} = argmax_{x∈D} ∫_R E[(min(Y(X)) − Y(x))^+ | Y(X) = Y, Y(x^{n+1}) = u] f_{Y(x^{n+1})|Y(X)=Y}(u) du   (6.23)
and the same procedure can be applied iteratively to deliver q points, computing ∀ j ∈ [1, q − 1]:

x^{n+j+1} = argmax_{x∈D} ∫_{u∈R^j} E[(min(Y(X)) − Y(x))^+ | Y(X) = Y, Y(x^{n+1}) = u_1, ..., Y(x^{n+j}) = u_j] f_{Y(X_{1:j})|Y(X)=Y}(u) du,   (6.24)
where f_{Y(X_{1:j})|Y(X)=Y} is the multivariate gaussian density of the OK conditional distribution at (x^{n+1}, ..., x^{n+j}). Although eq. 6.24 is a sequentialized version of the q-points expected improvement maximization, it doesn't completely fulfill our objectives: there is still a multivariate gaussian density to integrate, which seems to be a typical curse in such problems dealing with dependent random vectors. We now present two classes of heuristic strategies meant to circumvent the computational complexity encountered in eq. 6.24.
The Kriging Believer strategy replaces the conditional knowledge about the responses at the sites chosen within the last iterations by deterministic values equal to the expectation of the Kriging predictor. Keeping the same notations as previously, the strategy can be summed up as follows:

Algorithm 1. The Kriging Believer algorithm: an approximate solution of the multipoints problem (x'^{n+1}, x'^{n+2}, ..., x'^{n+q}) = argmax_{X'∈D^q} [EI(X')]
1: Function KB(X, Y, q)
2: for i ← 1, q do
3:   x^{n+i} = argmax_{x∈D} EI(x)
4:   X = X ∪ {x^{n+i}}
5:   Y = Y ∪ {m_OK(x^{n+i})}
6: end for
Let us now consider a sequential strategy in which the metamodel is updated (still without hyperparameter re-estimation) at each iteration with a value L exogenously fixed by the user, here called a "lie". The strategy referred to as the Constant Liar consists in lying with the same value L at every iteration: maximize EI (i.e. find x^{n+1}), update the model as if y(x^{n+1}) = L, and so on, always with the same L ∈ R:
Algorithm 2. The Constant Liar algorithm: another approximate solution of the multipoints problem (x'^{n+1}, x'^{n+2}, ..., x'^{n+q}) = argmax_{X'∈D^q} [EI(X')]
1: Function CL(X, Y, L, q)
2: for i ← 1, q do
3:   x^{n+i} = argmax_{x∈D} EI(x)
4:   X = X ∪ {x^{n+i}}
5:   Y = Y ∪ {L}
6: end for
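As a rough illustration of Algorithm 2, the sketch below runs the Constant Liar loop on a 1-d toy problem. It replaces the chapter's Ordinary Kriging metamodel by a zero-mean Simple Kriging model with a gaussian covariance, and the inner EI maximization by a grid search over a candidate set; all names and parameter values are illustrative assumptions:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def gauss_cov(a, b, sigma2=1.0, theta=10.0):
    # stationary gaussian covariance c(h) = sigma^2 exp(-theta h^2), 1-d inputs
    return sigma2 * np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

def expected_improvement(mean, sd, y_min):
    # closed-form 1-point EI for a gaussian predictor N(mean, sd^2)
    out = np.zeros_like(mean)
    for i in range(len(mean)):
        if sd[i] > 1e-12:
            u = (y_min - mean[i]) / sd[i]
            big_phi = 0.5 * (1.0 + erf(u / sqrt(2.0)))
            small_phi = exp(-0.5 * u * u) / sqrt(2.0 * pi)
            out[i] = sd[i] * (u * big_phi + small_phi)
        else:
            out[i] = max(y_min - mean[i], 0.0)
    return out

def constant_liar(X, Y, L_lie, q, candidates):
    # Algorithm 2 with a grid search standing in for "argmax EI"
    X, Y = list(X), list(Y)
    cand = np.asarray(candidates, dtype=float)
    chosen = []
    for _ in range(q):
        Xa, Ya = np.array(X), np.array(Y)
        K = gauss_cov(Xa, Xa) + 1e-10 * np.eye(len(Xa))  # jitter for stability
        k = gauss_cov(cand, Xa)
        w = np.linalg.solve(K, k.T)                      # kriging weights
        mean = w.T @ Ya                                  # zero-mean SK predictor
        var = np.maximum(1.0 - np.sum(k * w.T, axis=1), 0.0)  # sigma^2 = 1
        crit = expected_improvement(mean, np.sqrt(var), min(Y))
        x_new = float(cand[int(np.argmax(crit))])
        chosen.append(x_new)
        X.append(x_new)                                  # feed the design...
        Y.append(L_lie)                                  # ...with the lie y = L
    return chosen
```

With L = min(Y), the lie suppresses EI around each chosen site, so successive maximizers naturally spread out, which is the exploratory behavior discussed below.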
y_BH(x1, x2) = (x2 − 5.1/(4π²) x1² + (5/π) x1 − 6)² + 10 (1 − 1/(8π)) cos(x1) + 10   (6.25)
y_BH has three global minimizers (−3.14, 12.27), (3.14, 2.27), (9.42, 2.47), and the global minimum is approximately equal to 0.4. The variables are normalized by the transformation x1' = (x1 + 5)/15 and x2' = x2/15. The initial design of experiments is a 3 × 3 complete factorial design X9 (see fig. 6.5), thus Y = y_BH(X9). Ordinary Kriging is applied with a stationary, anisotropic, gaussian covariance function

∀ h = (h1, h2) ∈ R²,  c(h) = σ² e^{−θ1 h1² − θ2 h2²}   (6.26)
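Assuming the classical form of the Branin-Hoo function (eq. 6.25), the test function and the normalization to the unit square described above can be written as (function names are ours):

```python
from math import pi, cos

def branin_hoo(x1, x2):
    # classical Branin-Hoo function on [-5, 10] x [0, 15] (eq. 6.25)
    return (x2 - 5.1 / (4 * pi**2) * x1**2 + 5 / pi * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * pi)) * cos(x1) + 10

def to_unit_square(x1, x2):
    # normalization used in the chapter: x1' = (x1 + 5) / 15, x2' = x2 / 15
    return (x1 + 5) / 15, x2 / 15
```

At each of the three minimizers the cosine term equals −1 and the squared term vanishes, so the minimum value is 10/(8π) ≈ 0.398, consistent with the "approximately 0.4" quoted above.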
Fig. 6.5 (Left) Contour of the y_BH function with the design X9 (small black points) and the first 6 points given by the heuristic strategy CL[min(y_BH(X9))] (large bullets). (Right) Histogram of 10^4 Monte Carlo simulated values of the improvement brought by the 6-points CL[min(y_BH(X9))] strategy. The corresponding estimates of 6-points PI and EI are given above
the good and the very good designs. The q-EI is a more selective measure thanks to taking the magnitude of possible improvements into account. Nevertheless, q-EI overestimates the improvement associated with all designs considered here. This effect (already pointed out in [47]) can be explained by considering both the high value of σ² estimated from Y and the small difference between the minimal value reached at X9 (9.5) and the actual minimum of y_BH (0.4).
We finally compared CL[min], CL[max], latin hypercube (LHS) and uniform random designs (UNIF) in terms of q-EI values, with q ∈ [1, 10]. For every q ∈ [1, 10], we sampled 2000 q-element designs of each type (LHS and UNIF) and compared the obtained empirical distributions of q-points Expected Improvement to the q-points Expected Improvement estimates associated with the first q points of both CL strategies.
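A random LHS design like those used in this comparison can be generated as follows; the chapter does not specify its LHS sampler, so this is a minimal sketch of the standard construction (one point per stratum of each coordinate):

```python
import random

def latin_hypercube(q, d, rng=random):
    """One random q-point Latin Hypercube design in [0, 1]^d:
    each of the q strata of each coordinate contains exactly one point."""
    strata = [list(range(q)) for _ in range(d)]
    for s in strata:
        rng.shuffle(s)                  # random pairing of strata across dims
    return [tuple((strata[j][i] + rng.random()) / q for j in range(d))
            for i in range(q)]
```

Drawing 2000 such designs for each q and evaluating their q-EI by the Monte Carlo estimator gives the empirical distributions summarized in fig. 6.6.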
Table 6.1 Multipoints PI, EI, and actual improvements for the first 2, 6, and 10 iterations of the heuristic strategies CL[min(Y)], CL[mean(Y)], CL[max(Y)], and Kriging Believer (here min(Y) = min(y_BH(X9))). q-PI and q-EI are evaluated by Monte Carlo simulations (Eq. 6.20, n_sim = 10^4)
(Table rows: PI and EI for the first 2, 6, and 10 points; actual improvement for the first 6 and 10 points.)
Fig. 6.6 Comparison of the q-EI associated with the first q points (q ∈ [1, 10]) given by the constant liar strategies (min and max), 2000 q-points designs uniformly drawn for every q, and 2000 q-points LHS designs taken at random for every q
As can be seen in fig. 6.6, CL[max] (light bullets) and CL[min] (dark squares) offer very good q-EI results compared to random designs, especially for small values of q. By definition, both start with the 1-EI global maximizer, which ensures a q-EI at least equal to 83 for all q ≥ 1. Both associated q-EI series then seem to converge to threshold values, almost reached for q ≥ 2 by CL[max] (which dominates CL[min] when q = 2 and q = 3) and for q ≥ 4 by CL[min] (which dominates CL[max] for all q such that 4 ≤ q ≤ 10). The random designs have less promising
q-EI expected values. Their q-EI distributions are quite dispersed, which can be seen for instance by looking at the 10%–90% interpercentile ranges represented in fig. 6.6 by thin solid lines (respectively dark and light for UNIF and LHS designs). Note in particular that the q-EI distribution of the LHS designs seems globally better than that of the uniform designs. Interestingly, the best designs ever found among the UNIF designs (dark dotted lines) and among the LHS designs (light dotted lines) almost match CL[max] when q ∈ {2, 3} and CL[min] when 4 ≤ q ≤ 10. We haven't yet observed a design sampled at random that clearly provides better q-EI values than the proposed heuristic strategies.
Other computational intelligence optimizers, e.g. evolutionary algorithms [9], address the exploration/exploitation trade-off implicitly through the choice of parameters such as the population size and the mutation probability.
designs on the basis of such estimates is not straightforward, and crucially depends on both n and d. Hence some greedy alternative problems were considered: four heuristic strategies, the Kriging Believer and three Constant Liars, have been proposed and compared that aim at maximizing q-EI while remaining numerically tractable. It has been verified on a classical test case that the CL strategies provide q-EI values comparable with those of the best Latin Hypercube and uniform designs of experiments found by simulation. This simple application illustrated a central practical conclusion of this work: considering a set of candidate designs of experiments, provided for instance by heuristic strategies, it is always possible, whatever n and d, to evaluate and rank them using estimates of q-EI or related criteria, thanks to conditional Monte Carlo simulation.
Perspectives include of course the development of synchronous parallel EGO variants delivering a set of q points at each iteration. The tools presented in this chapter may constitute building blocks of these algorithms, as has very recently been illustrated on a successful 6-dimensional test case in the thesis [13]. An R package covering that subject is in an advanced stage of preparation and should be released soon [41]. In the longer term, the scope of the work presented in this chapter, and not only its modest original contributions, could be broadened. Although the considered methods may seem essentially restricted to the Ordinary Kriging metamodel and to the use of an optimization criterion meant to obtain q points in parallel, several degrees of freedom can be exploited in order to address more general problems. First, any probabilistic metamodel potentially providing joint distributions could do as well (regression models, smoothing splines, etc.). Second, the final goal of the newly generated design might be to improve the global accuracy of the metamodel, to learn a quantile, to fill the space, etc.: the work done here with the q-EI and associated strategies is just a particular case of what one can do with the flexibility offered by probabilistic metamodels and all possible decision-theoretic criteria. To finish with two challenging issues of Computational Intelligence, the following perspectives seem particularly relevant on both sides of the interface with this work:
- CI methods are needed to maximize the q-EI criterion, whose inputs live in a (q × d)-dimensional space, and whose evaluation is noisy, with tunable fidelity depending on the chosen n_sim values;
- q-EI and related criteria are now available to help pre-select good points in metamodel-assisted evolution strategies, in the flavour of [10].
Acknowledgements: This work was conducted within the frame of the DICE (Deep Inside Computer Experiments) Consortium between ARMINES, Renault, EDF, IRSN, ONERA, and Total S.A. The authors wish to thank X. Bay, R. T. Haftka, B. Smarslok, Y. Richet, O. Roustant, and V. Picheny for their help and rich comments. Special thanks to the R project people [6] for developing and spreading such useful free software. David moved to Neuchâtel University (Switzerland) for a postdoc, and he gratefully thanks the Mathematics Institute and the Hydrogeology Department for letting him spend time on the revision of the present chapter.
6.6 Appendix
6.6.1 Gaussian Processes for Machine Learning
A real-valued random process (Y(x))_{x∈D} is called a Gaussian Process (GP) whenever all its finite-dimensional distributions are gaussian. Consequently, for all n ∈ N and for every set X = {x1, ..., xn} of n points of D, there exist a vector m ∈ R^n and a symmetric positive semi-definite matrix Σ ∈ M_n(R) such that (Y(x1), ..., Y(xn)) is a gaussian vector, following a multigaussian probability distribution N(m, Σ). More specifically, for all i ∈ [1, n], Y(xi) ~ N(E[Y(xi)], Var[Y(xi)]), where E[Y(xi)] is the i-th coordinate of m and Var[Y(xi)] is the i-th diagonal term of Σ. Furthermore, all couples (Y(xi), Y(xj)), i, j ∈ [1, n], i ≠ j, are multigaussian with a covariance Cov[Y(xi), Y(xj)] equal to the off-diagonal term of Σ indexed by i and j.
A random process Y is said to be first order stationary if its mean is a constant, i.e. if ∃ μ ∈ R such that ∀ x ∈ D, E[Y(x)] = μ. A first order stationary process Y is said to be second order stationary if there exists furthermore a function of positive type, c : D − D → R, such that for all pairs (x, x') ∈ D², Cov[Y(x), Y(x')] = c(x − x'). We then have the following expression for the covariance matrix of the observations at X:
Σ = | σ²          c(x1 − x2)   ...   c(x1 − xn) |
    | c(x2 − x1)  σ²           ...   c(x2 − xn) |
    | ...         ...          ...   ...        |
    | c(xn − x1)  c(xn − x2)   ...   σ²         |   (6.27)
where σ² := c(0). Second order stationary processes are sometimes called weakly stationary. A major feature of GPs is that their weak stationarity is equivalent to strong stationarity: if Y is a weakly stationary GP, the probability law of the random variable Y(x) doesn't depend on x, and the joint distribution of (Y(x1), ..., Y(xn)) is the same as the distribution of (Y(x1 + h), ..., Y(xn + h)), whatever the set of points {x1, ..., xn} ∈ D^n and the vector h such that {x1 + h, ..., xn + h} ∈ D^n. To sum up, a stationary GP is entirely defined by its mean μ and its covariance function c(·). The classical framework of Kriging for Computer Experiments is to make predictions of a costly simulator y at a new set of sites Xnew = {x^{n+1}, ..., x^{n+q}} (most of the time, q = 1), on the basis of the collected observations at the initial design X = {x1, ..., xn}, and under the assumption that y is one realization of a stationary GP Y with known covariance function c (in theory). Simple Kriging (SK) assumes a known mean μ ∈ R. In Ordinary Kriging (OK), μ is estimated.
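As a small illustration, a weakly stationary GP can be simulated at finitely many points by building Σ as in eq. (6.27) and drawing from N(μ1, Σ). The helper below is a sketch under the assumption of 1-d inputs (names are ours):

```python
import numpy as np

def sample_stationary_gp(points, mu, c, n_draws=1, rng=None, jitter=1e-10):
    """Draw n_draws realizations of a stationary GP with constant mean mu and
    covariance function c (a function of the lag h) at the given 1-d points."""
    gen = np.random.default_rng(rng)
    x = np.asarray(points, dtype=float)
    Sigma = c(x[:, None] - x[None, :])        # Sigma_ij = c(x_i - x_j), eq. (6.27)
    L = np.linalg.cholesky(Sigma + jitter * np.eye(len(x)))
    Z = gen.standard_normal((n_draws, len(x)))
    return mu + Z @ L.T                       # each row ~ N(mu 1, Sigma)
```

With many draws, the empirical mean approaches μ and the empirical variance of each coordinate approaches σ² = c(0), as the definitions require.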
Key properties of Gaussian vectors include that the orthogonal projection of a Gaussian vector onto a linear subspace is still a Gaussian vector, and that the orthogonality of two subvectors V1, V2 of a centered Gaussian vector V (i.e. Σ_cross := E[V2 V1^T] = 0) is equivalent to their independence. We now express the conditional expectation E[V1|V2]. E[V1|V2] is by definition such that V1 − E[V1|V2] is independent of V2. E[V1|V2] is thus fully characterized as the orthogonal projection onto the vector space spanned by the components of V2, solving the so-called normal equations:
E[(V1 − E[V1|V2]) V2^T] = 0   (6.29)

Assuming linearity of E[V1|V2] in V2, i.e. E[V1|V2] = A V2 (A ∈ M_n(R)), a straightforward development of eq. (6.29) gives the matrix equation Σ_cross^T = A Σ_{V2}, and hence A = Σ_cross^T Σ_{V2}^{-1} is a suitable solution provided Σ_{V2} is full rank. We conclude that

E[V1|V2] = Σ_cross^T Σ_{V2}^{-1} V2   (6.30)
Σ_{V1|V2} = E[(V1 − E[V1|V2])(V1 − E[V1|V2])^T | V2] = E[(V1 − A V2)(V1 − A V2)^T]
          = Σ_{V1} − A Σ_cross − Σ_cross^T A^T + A Σ_{V2} A^T = Σ_{V1} − Σ_cross^T Σ_{V2}^{-1} Σ_cross   (6.31)
Now consider the case of a non-centered random vector V = (V1, V2) with mean m = (m1, m2). The conditional distribution V1|V2 can be obtained by coming back to the centered random vector V − m. We then find that E[V1 − m1 | V2 − m2] = Σ_cross^T Σ_{V2}^{-1} (V2 − m2), and hence E[V1|V2] = m1 + Σ_cross^T Σ_{V2}^{-1} (V2 − m2).
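These conditioning formulas, shifted by the means as just described, can be checked numerically. The helper below follows the chapter's convention Σ_cross = E[(V2 − m2)(V1 − m1)^T]; the function name is ours:

```python
import numpy as np

def condition_gaussian(m1, m2, S1, S2, S_cross, v2):
    """Conditional distribution of V1 given V2 = v2 for a gaussian vector
    (V1, V2), with S_cross = E[(V2 - m2)(V1 - m1)^T] as in the chapter."""
    A = np.linalg.solve(S2, S_cross)          # S2^{-1} S_cross
    cond_mean = m1 + A.T @ (v2 - m2)          # eq. (6.30), shifted by the means
    cond_cov = S1 - S_cross.T @ A             # eq. (6.31)
    return cond_mean, cond_cov
```

A convenient check: if V1 = B V2 + e with V2 ~ N(0, I) and independent noise e ~ N(0, D), the formulas must return E[V1|V2 = v2] = B v2 and conditional covariance D.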
Σ_tot = | Σ           Σ_cross |
        | Σ_cross^T   Σ_new   |

      = | σ²               c(x1 − x2)        ...   |
        | c(x2 − x1)       σ²                ...   |
        | ...              ...               ...   |
        | c(x^{n+q} − x1)  c(x^{n+q} − x2)   ...  σ² |
We can directly apply eq. (6.30) and eq. (6.31) to derive the Simple Kriging equations:

[Y(Xnew) | Y(X) = Y] ~ N(m_SK(Xnew), Σ_SK(Xnew))   (6.34)

with m_SK(Xnew) = E[Y(Xnew) | Y(X) = Y] = μ 1_q + Σ_cross^T Σ^{-1} (Y − μ 1_n)
and Σ_SK(Xnew) = Σ_new − Σ_cross^T Σ^{-1} Σ_cross.

When q = 1, Σ_cross = c(x^{n+1}) = Cov[Y(x^{n+1}), Y(X)] and the covariance matrix reduces to s²_SK(x) = σ² − c(x^{n+1})^T Σ^{-1} c(x^{n+1}), which is called the Kriging Variance. Note that when μ is constant but not known in advance, it is not mathematically correct to sequentially estimate μ and plug the estimate in the Simple Kriging equations. Ordinary Kriging addresses this issue.
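A direct transcription of the Simple Kriging equations, for 1-d inputs and a known constant mean, reads as follows (an illustrative sketch, not the chapter's code):

```python
import numpy as np

def simple_kriging(X, Y, x_new, mu, c):
    """Simple Kriging predictor with known constant mean mu and stationary
    covariance function c of the lag (1-d inputs for readability)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Sigma = c(X[:, None] - X[None, :])     # covariance matrix of the observations
    k = c(x_new - X)                       # Cov[Y(x_new), Y(X)]
    w = np.linalg.solve(Sigma, k)          # kriging weights Sigma^{-1} c(x_new)
    m_sk = mu + w @ (Y - mu)               # kriging mean m_SK(x_new)
    s2_sk = float(c(0.0) - k @ w)          # kriging variance s^2_SK(x_new)
    return float(m_sk), max(s2_sk, 0.0)
```

Two characteristic properties are easy to verify: at a design point the predictor interpolates the data with zero variance, and far from the data it reverts to the prior mean μ with variance σ² = c(0).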
Y(x) = μ + ε(x)
This decomposition relies on the law of total covariance, Cov[A, B] = E[Cov[A, B | C]] + Cov[E[A | C], E[B | C]], valid for all A, B, C such that all terms exist. We then get for all couples of points (x^{n+i}, x^{n+j}) (i, j ∈ [1, q]):

Cov[Y(x^{n+i}), Y(x^{n+j}) | Y(X) = Y]
= E[ Cov[Y(x^{n+i}), Y(x^{n+j}) | Y(X) = Y, μ] ] + Cov[ E[Y(x^{n+i}) | Y(X) = Y, μ], E[Y(x^{n+j}) | Y(X) = Y, μ] ]   (6.37)

The left term, Cov[Y(x^{n+i}), Y(x^{n+j}) | Y(X) = Y, μ], is the conditional covariance under the Simple Kriging model. The right term is the covariance between μ + c(x^{n+i})^T Σ^{-1} (Y − μ 1_n) and μ + c(x^{n+j})^T Σ^{-1} (Y − μ 1_n), conditional on the observations Y(X) = Y. Using eq. (6.36), we finally obtain:

Cov[Y(x^{n+i}), Y(x^{n+j}) | Y(X) = Y]
= Cov_SK[Y(x^{n+i}), Y(x^{n+j}) | Y(X) = Y]
  + Cov[ c(x^{n+i})^T Σ^{-1} Y + μ (1 − c(x^{n+i})^T Σ^{-1} 1_n), c(x^{n+j})^T Σ^{-1} Y + μ (1 − c(x^{n+j})^T Σ^{-1} 1_n) ]
= c(x^{n+i} − x^{n+j}) − c(x^{n+i})^T Σ^{-1} c(x^{n+j}) + (1 − c(x^{n+i})^T Σ^{-1} 1_n)(1 − c(x^{n+j})^T Σ^{-1} 1_n) / (1_n^T Σ^{-1} 1_n)   (6.38)
References
1. Abrahamsen, P.: A review of Gaussian random fields and correlation functions, 2nd edn. Tech. Rep. 917, Norwegian Computing Center, Oslo (1997)
2. Antoniadis, A., Berruyer, J., Carmona, R.: Régression non linéaire et applications. Economica, Paris (1992)
3. Baker, C., Watson, L.T., Grossman, B., Mason, W.H., Haftka, R.T.: Parallel global aircraft configuration design space exploration. Practical Parallel Computing, 79–96 (2001)
4. Bishop, C.: Neural Networks for Pattern Recognition. Oxford Univ. Press, Oxford (1995)
5. Blum, C.: Ant colony optimization: introduction and recent trends. Physics of Life Reviews 2, 353–373 (2005)
6. R Development Core Team: R: A language and environment for statistical computing (2006), http://www.R-project.org
7. Cressie, N.: Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics (1993)
8. Dreyfus, G., Martinez, J.M.: Réseaux de neurones. Eyrolles (2002)
9. Eiben, A., Smith, J.: Introduction to Evolutionary Computing. Springer, Heidelberg (2003)
10. Emmerich, M., Giannakoglou, K., Naujoks, B.: Single- and multiobjective optimization assisted by Gaussian random field metamodels. IEEE Transactions on Evolutionary Computation 10(4), 421–439 (2006)
11. Geman, D., Jedynak, B.: An active testing model for tracking roads in satellite images. Tech. rep., Institut National de Recherches en Informatique et Automatique (INRIA) (December 1995)
12. Genton, M.: Classes of kernels for machine learning: a statistics perspective. Journal of Machine Learning Research 2, 299–312 (2001)
13. Ginsbourger, D.: Multiples métamodèles pour l'approximation et l'optimisation de fonctions numériques multivariables. PhD thesis, École Nationale Supérieure des Mines de Saint-Étienne (2009)
14. Ginsbourger, D., Le Riche, R., Carraro, L.: A multipoints criterion for parallel global optimization of deterministic computer experiments. In: Non-Convex Programming 2007 (2007)
15. Goria, S.: Évaluation d'un projet minier: approche bayésienne et options réelles. PhD thesis, École des Mines de Paris (2004)
16. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2001)
17. Henkenjohann, N., Göbel, R., Kleiner, M., Kunert, J.: An adaptive sequential procedure for efficient optimization of the sheet metal spinning process. Qual. Reliab. Engng. Int. 21, 439–455 (2005)
18. Huang, D., Allen, T., Notz, W., Miller, R.: Sequential Kriging optimization using multiple fidelity evaluations. Structural and Multidisciplinary Optimization 32, 369–382 (2006)
19. Huang, D., Allen, T., Notz, W., Zheng, N.: Global optimization of stochastic black-box systems via sequential Kriging meta-models. Journal of Global Optimization 34, 441–466 (2006)
20. Jones, D.: A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21, 345–383 (2001)
21. Jones, D., Pertunen, C., Stuckman, B.: Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Application 79(1) (1993)
22. Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 455–492 (1998)
23. Journel, A.: Fundamentals of geostatistics in five lessons. Tech. rep., Stanford Center for Reservoir Forecasting (1988)
24. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: IEEE Intl. Conf. on Neural Networks, vol. 4, pp. 1942–1948 (1995)
25. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220, 671–680 (1983)
26. Knowles, J.: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation (2005)
27. Koehler, J., Owen, A.: Computer experiments. Tech. rep., Department of Statistics, Stanford University (1996)
28. Kracker, H.: Methoden zur Analyse von Computerexperimenten mit Anwendung auf die Hochdruckblechumformung. Master's thesis, Dortmund University (2006)
29. Krige, D.: A statistical approach to some basic mine valuation problems on the Witwatersrand. J. of the Chem., Metal. and Mining Soc. of South Africa 52(6), 119–139 (1951)
30. Martin, J., Simpson, T.: A Monte Carlo simulation of the Kriging model. In: 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Albany, NY, AIAA-2004-4483, August 30 – September 2 (2004)
31. Martin, J., Simpson, T.: Use of Kriging models to approximate deterministic computer models. AIAA Journal 43(4), 853–863 (2005)
32. Matheron, G.: Principles of geostatistics. Economic Geology 58, 1246–1266 (1963)
33. Matheron, G.: La théorie des variables régionalisées et ses applications. Tech. rep., Centre de Morphologie Mathématique de Fontainebleau, École Nationale Supérieure des Mines de Paris (1970)
34. O'Hagan, A.: Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290–1300 (2006)
35. Paciorek, C.: Nonstationary Gaussian processes for regression and spatial modelling. PhD thesis, Carnegie Mellon University (2003)
36. Ponweiser, W., Wagner, T., Biermann, D., Vincze, M.: Multiobjective optimization on a limited budget of evaluations using model-assisted S-metric selection. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 784–794. Springer, Heidelberg (2008)
37. Praveen, C., Duvigneau, R.: Radial basis functions and Kriging metamodels for aerodynamic optimization. Tech. rep., INRIA (2007)
38. Queipo, N., Verde, A., Pintos, S., Haftka, R.: Assessing the value of another cycle in surrogate-based optimization. In: 11th Multidisciplinary Analysis and Optimization Conference, AIAA (2006)
39. Rasmussen, C., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
40. Ripley, B.: Stochastic Simulation. John Wiley and Sons, New York (1987)
41. Roustant, O., Ginsbourger, D., Deville, Y.: The DiceKriging package: Kriging-based metamodeling and optimization for computer experiments. In: The UseR! Conference, Agrocampus-Ouest, Rennes, France (2009)
42. Sacks, J., Welch, W., Mitchell, T., Wynn, H.: Design and analysis of computer experiments. Statistical Science 4(4), 409–435 (1989)
43. Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments. Springer, Heidelberg (2003)
44. Sasena, M.: Flexibility and efficiency enhancements for constrained global design optimization with Kriging approximations. PhD thesis, University of Michigan (2002)
45. Sasena, M.J., Papalambros, P., Goovaerts, P.: Exploration of metamodeling sampling criteria for constrained global optimization. Journal of Engineering Optimization (2002)
46. Sasena, M.J., Papalambros, P.Y., Goovaerts, P.: Global optimization of problems with disconnected feasible regions via surrogate modeling. In: Proceedings of the 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, GA (2002)
47. Schonlau, M.: Computer experiments and global optimization. PhD thesis, University of Waterloo (1997)
48. Schonlau, M., Welch, W., Jones, D.: A data-analytic approach to Bayesian global optimization. In: Proceedings of the A.S.A. (1997)
49. Ulmer, H., Streichert, F., Zell, A.: Evolution strategies assisted by Gaussian processes with improved pre-selection criterion. Tech. rep., Center for Bioinformatics Tübingen, ZBIT (2003)
50. Villemonteix, J.: Optimisation de fonctions coûteuses: modèles gaussiens pour une utilisation efficace du budget d'évaluations: théorie et pratique industrielle. PhD thesis, Université Paris-Sud XI, Faculté des Sciences d'Orsay (2008)
51. Villemonteix, J., Vazquez, E., Walter, E.: An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization 44(4), 509–534 (2009)
52. Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
53. Williams, C., Rasmussen, C.: Gaussian processes for regression. In: Advances in Neural Information Processing Systems, vol. 8 (1996)
Chapter 7
min_{x ∈ F_x} f(x)   (7.1)
in which x is the vector of optimization variables, f(·) : R^{n_x} → R^{n_f} gathers the objective functions, and F_x is the feasible set, mathematically defined by the constraint functions. This optimization problem requires an adequate optimization algorithm to search for the best solutions. During the search process, the evaluation of each possible design demands the solution of the partial differential equations that describe the laws that physically govern its behavior. When nonlinearities and accuracy requirements are incorporated, the computational cost becomes even more relevant. In coupled problems, the analysis step may also involve the association of different analysis software packages, as in [2, 41], which further increases the computational cost. This synthesis-analysis cycle is shown in the inner loop of the flowchart in Figure 7.1. When the prototype is optimized, its performance is evaluated to verify whether it meets
[Figure: flowchart of the CAD process. START → definition of specifications and requirements → definition of the prototype device/system (objectives, constraints, search space, and convergence criteria; computational geometric model) → synthesis by an optimization technique coupled with performance evaluation by numerical analysis, iterated until converged → if the requirements are matched, STOP; if out of resources, FAILURE; otherwise return to an earlier design step.]
Fig. 7.1 Flowchart description of the computer-aided design process
the requirements defined in the beginning of the CAD process. If it does not meet the requirements, we can either give up, due to lack of resources, and rethink the design specifications, or return to step 2 to modify the prototype, hence modifying the objective and constraint functions and the search space.
For the sake of simplicity, the software responsible for solving the analysis problem will be treated as an expensive black box, of which we know only the outputs (the behavior it produces) due to different inputs. From the optimization point of view, it does not matter what is inside this black box, as long as it provides consistent outputs. In some problems, all objective and constraint functions are functions of the black-box output value. In this case, the time to evaluate one individual in a population-based evolutionary algorithm is simply the time consumed by the black box. In other cases, only some of these functions depend on the black-box software, while the others are analytically defined and usually inexpensive.
Evolutionary Algorithms (EAs) play an important role in the solution of
complex CAD problems [10, 17, 31], because they can deal with problems that are
discontinuous, nonconvex, multimodal, noisy, and present correlations or higher-order dependencies among variables. Moreover, these algorithms can explore several regions of the search space simultaneously in a single run and evolve a population of candidate solutions that represent sub-optimal solutions or trade-off solutions in the context of multiobjective problems. The bottom line is that the same principles in nature that are responsible for the diversity and complexity of biological systems can be applied to the solution of complex design problems in engineering. Thanks to the work of many researchers since the 1950s, and contributions of the last two decades that helped to popularize and mature these techniques, evolutionary algorithms are today an important computational intelligence methodology for complex designs and an established field of research [4, 6, 9].
Memetic Algorithms (MAs) [24, 25], a term initially coined by Pablo Moscato in 1989¹, represent a particular class of EAs that employ local search methods inside their cycle. MAs first appeared for combinatorial optimization problems [14, 23, 25], but it did not take long for continuous search space versions of such algorithms to appear [18, 22, 27]. MAs for multiobjective problems have also been proposed, see [19, 39]. Any successful global search meta-heuristic, including evolutionary techniques, should find a good balance between its exploitation and exploration components². MAs approach the exploration-exploitation balance by trying to find a good association of global search mechanisms and local search operators, which can favor the optimization process as a whole. In fact, some deterministic search methods, although capable of local search only, present fast and precise convergence properties that surpass those of EAs, which, in contrast, have poorer convergence and accuracy. There are very specialized methods for treating constraints in optimization problems, including equality constraints, that rely on assumptions favoring their utilization as local search techniques [5, 37]. All these characteristics suggest the use of hybrid strategies, in order to enhance the local search properties of typical EAs.
In the first versions of MAs, the local search method is applied to each individual generated by the reproductive operators, leading the offspring to its closest local optimum. Such MAs have presented good performance in some contexts, since the search space to be explored by the global search side of the hybrid algorithm is then greatly reduced. This reduction has a price to be paid, namely the computational cost of the local search, which limits the application of MAs in many expensive-to-evaluate CAD problems involving continuous-variable search spaces. Later versions of MAs have relaxed this definition, by not applying the local search to all individuals and sometimes not applying the local search in all generations. The specification of the local search intensity and frequency is referred to as the balance between global and local search [15]. Some approaches relax the accuracy of the local optimizer [18], and other strategies relax the requirement that the local search be applied up to local optimality: only a local enhancement is required [20]. In
¹ Moscato was inspired by the concept of memes as proposed by Richard Dawkins in 1976 [7].
² Exploration is related to the global search capability of the algorithm, while exploitation means spending more search effort in the most promising regions.
this chapter, we follow the taxonomy presented by Krasnogor in [20], and adopt the term Memetic Algorithm for evolutionary algorithms that employ any local search method on some or all individuals of the population, but this is not a consensus, and different opinions exist [27, 29].
This chapter deals with Approximation-Based Memetic Algorithms (A-MAs), a category of MAs that employ approximation techniques within the local search phase, using samples produced by the algorithm itself in previous generations. These A-MAs have arisen recently to deal specifically with expensive-to-evaluate black-box functions in continuous-variable search spaces, in order to reduce the computational overhead of the local search phase while still benefiting from the principles of MAs [11, 12, 28, 38, 39, 42]. The use of approximations, or surrogate models, in evolutionary optimization is well discussed in the literature [16], but the employment of local approximations, especially in the context of MAs, is rather recent [36]. The algorithms in this particular class of MAs rely strongly on the sampling properties of population-based EAs, whose heuristic operators tend to concentrate more samples in the most promising regions of the search space. Given the computational effort spent to obtain each of these samples, it is a good idea to store them and use some of this information to build local approximations of these black-box functions for the local search operator. With these computationally less expensive-to-evaluate local approximations in hand, the local search can be used to enhance some individuals of the population. By using approximations, the local search is not exact, but this local search engine can potentially increase the convergence properties of typical EAs without the high computational cost, in terms of function evaluations, required when applying the local search method directly to the real functions. Note that, unlike usual MAs, the local search phase in A-MAs is intended to be as inexpensive as possible in terms of the number of function evaluations, usually not requiring any additional evaluation of the black-box functions.
We propose a methodology for the local search in A-MAs that is based on Radial Basis Function (RBF) approximations [13, 30] of the expensive black-box functions. The auxiliary local search problem based on these RBF approximations is then solved by the Sequential Quadratic Programming (SQP) method [8], which is a fast and accurate method for constrained mono-objective problems, with quite predictable behavior. This procedure locally enhances one individual of the population, making the algorithm an MA according to [20].
Additionally, we combine this framework of A-MAs with another strategy for treating expensive optimization problems: we perform the whole optimization varying the accuracy with which a given candidate solution is evaluated by the expensive black-box function, rather than using the same accuracy for all evaluations. We allocate more computational effort to evaluating the candidate solutions around the best regions and also as the evolutionary search converges. When high accuracy is required for the final solution of the CAD problem, we can reduce the overall computational cost by varying the accuracy (and thus the computational cost) of the objective function when analyzing points tested by the algorithm during the optimization process. This has an impact on the approximation methodology used in the local search.
The chapter proceeds to the formal analysis of A-MAs. This analysis is divided into two main parts. The first part investigates the effect of the local search operators on the global convergence properties of evolutionary algorithms via Markov chain theory [26]. The second part studies the computational cost of A-MAs: the computational complexity of the approximation-based local search operators is derived and expressions for the overhead of the local search are presented. The chapter concludes with the illustration of the methodology in the design of electromagnetic devices, in which the evaluation of candidate solutions requires the numerical solution of partial differential equations, making it an expensive optimization problem.
The optimization problem considered is

min_{x ∈ S} f(x)
subject to: g(x) ≤ 0
            h(x) = 0   (7.2)

The inequality constraints g(·) : R^{n_x} → R^{n_g} and the equality constraints h(·) : R^{n_x} → R^{n_h} define the feasible region:

F_x = S ∩ ( ∩_{i=1}^{n_g} G_i ) ∩ ( ∩_{j=1}^{n_h} H_j )   (7.3)

where G_i = {x : g_i(x) ≤ 0} and H_j = {x : h_j(x) = 0}.
Algorithm 7.1. General structure of the evolutionary algorithm (recoverable lines):

Data: population size μ, offspring size λ, maximum archive size, search space S, objective and constraint functions f(·), g(·), h(·).
Result: Estimate(s) of the optimal solution set in the archive population A_k.
1  k ← 0  /* Generation counter */
2  P_k = {p_k^(1), ..., p_k^(μ)} ← Initialize population(μ, S)
3  A_k ← Initialize archive  /* Keep the best solution(s) */
4  P_k ← Evaluate fitness(P_k, f, g, h)
5  A_k ← Update(A_k, P_k)
   // Reproduction comprises selection and variation
6  Q_k = {q_k^(1), ..., q_k^(λ)} ← Reproduction(P_k, A_k)
   ⋮
The substitution operator creates the next parent population. In genetic algorithms, the offspring simply substitute for the previous parent population. In evolution strategies, substitution schemes such as ES(μ+λ) or ES(μ,λ) are employed [4]. Other deterministic substitution schemes are used in differential evolution and evolutionary programming algorithms.
The main point expressed in this general structure is that the basic evolutionary operators, as well as the generic evolutionary structure, are essentially preserved in the structure of A-MAs. The local search phase is described in lines 11–15, after the fitness assignment of the offspring population Q_k. It introduces the following parameters:
- the interval of generations at which the local search is applied, denoted by n_L ≥ 0. For example, if n_L = 0, the local search operator is applied at every generation; if n_L = 4, it is applied every four generations.
- the number of individuals in the Q_k population that will be subject to local search.
There is no general rule for choosing the values of these parameters; they are related to the frequency and the intensity of the local search in the memetic algorithm. The value of n_L cannot be too small, because the local search is more efficient when new information is available, that is, when new samples have been generated. It cannot be too big either, because the local search has to be applied in a number of generations sufficient to produce some effect on the performance of the algorithm. The number of individuals selected for local search should be less than the population size. An intuitive argument is that the approximation-based local search operators rely on the samples available, and therefore would not work properly if applied to all individuals, simply because there are not enough samples to generate local approximations around each individual. Therefore, the local search in A-MAs should be applied to a few individuals. In the next section we provide a formal argument based on convergence properties.
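As an illustration, the two parameters could gate the local search phase as follows (a minimal sketch; the function names and the minimization convention are assumptions, not from the chapter):

```python
def local_search_due(k, n_L):
    """True at generations where the local search runs: every generation
    when n_L = 0, and every n_L generations otherwise."""
    return n_L == 0 or k % n_L == 0

def select_for_local_search(offspring, fitness, n_ls):
    """Pick the n_ls best offspring for local search (minimization assumed);
    n_ls should stay well below the population size, since each selected
    individual needs enough surrounding samples for its local approximation."""
    ranked = sorted(range(len(offspring)), key=lambda i: fitness[i])
    return [offspring[i] for i in ranked[:n_ls]]
```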
In this section, we discuss the treatment of expensive optimization problems by combining two strategies. First, we do not perform the whole optimization with high accuracy for all evaluations, but only for some of them: the black-box functions are evaluated with varying accuracy instead of a fixed accuracy throughout the process. The second strategy is the employment of local RBF approximations within the local search operator of A-MAs. The approximations should be flexible, given that the points in the data were not evaluated with the same accuracy.
In this way, the evolutionary algorithm can specify the values of the accuracy parameters. The search process can then dedicate more computational effort when evaluating some solutions in the search space S. The parameters may vary with time and/or within the population at a given generation. For example, as time increases, the parameters are selected so as to increase the accuracy of the numerical method. Moreover, in a given generation, we can dedicate more effort and time to the solutions in the most promising regions, while solutions generated by exploratory mutations are evaluated with less computational effort. These general ideas in the utilization of varying-accuracy cost functions lead to the specification of the following general rules or heuristics:
1. Initially, the base accuracy value assumes the first (coarsest) value in the list.
2. During the first generations, the algorithm is in its exploration phase: it samples the search space in a more exploratory way and the diversity of the population is high. As the number of generations increases, we expect the population to have converged toward a given optimum, and we can therefore increase the accuracy of the evaluations. Hence, after T successive generations, we change the base value to the next (finer) value in the list, next(·).
3. Individuals in better regions of the search space should be evaluated with higher accuracy, while we should not spend much computational effort evaluating individuals in poor regions. We use the fitness values to decide which individuals are evaluated with higher accuracy: the offspring of the N best individuals are evaluated using next(·), while the remaining ones are evaluated using the current base value. This follows from the principle of heredity, which states that offspring tend to be similar to their parents.
4. Each offspring individual changed by mutation is evaluated using the previous (coarser) value, prev(·), thus with less effort.
This approach does introduce some noise into the objective function, but evolutionary algorithms are relatively robust in the presence of noise. Moreover, as the algorithm converges toward a given optimum, this noise is reduced, because more and more points around this optimum are evaluated with the same accuracy.
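The four heuristics above can be sketched as follows; the accuracy list is the one used later for the FEM mesh density, and the helper names next_val/prev_val are hypothetical:

```python
ACCURACY = [1.0, 0.5, 0.2, 0.1]  # coarsest (cheapest) first

def next_val(a):
    """Next (finer) accuracy in the list; saturates at the finest value."""
    i = ACCURACY.index(a)
    return ACCURACY[min(i + 1, len(ACCURACY) - 1)]

def prev_val(a):
    """Previous (coarser) accuracy; saturates at the coarsest value."""
    i = ACCURACY.index(a)
    return ACCURACY[max(i - 1, 0)]

def accuracy_for(parent_rank, mutated, base, n_best):
    """Rules 3 and 4: offspring of the N best parents get finer accuracy,
    mutated offspring get coarser accuracy, everyone else uses the base."""
    if mutated:                # rule 4
        return prev_val(base)
    if parent_rank < n_best:   # rule 3
        return next_val(base)
    return base
# Rule 2: after T successive generations, base = next_val(base).
```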
reasonable to believe that the local approximations will improve as the search progresses, making the solutions achieved by indirect local search (i.e., using approximations) arbitrarily close to those that would be obtained by direct local search.
In the context of CAD optimization problems using A-MAs, we point out some important requirements [11]:
- The time spent in the evaluation step is dominant in the whole process;
- The time spent generating the approximations should be as small as possible;
- The total time spent in the optimization process when using the hybrid algorithm with approximation-based local search must be less than that spent when using the standard algorithm.
The first observation is often true in complex CAD problems; it implies that some additional complexity in the algorithm operations is justifiable. The second requirement is important because the time needed to generate and evaluate the approximations must be small in comparison with the time consumed in evaluating solutions directly. Finally, the third requirement means that the hybrid algorithm must converge with fewer evaluations than the standard algorithm, or at least provide a better solution for the same number of evaluations. We detail next how the approximations are generated and how the local search is performed. We consider constrained mono-objective problems only.
All evaluations performed by the evolutionary algorithm are stored in a global data set:

D_k = { ( x^(i); f(x^(i)); g_1(x^(i)), ..., g_{n_g}(x^(i)); h_1(x^(i)), ..., h_{n_h}(x^(i)) ) }_{i=1}^{N_k}   (7.4)
For the local search around a selected point z, we consider a hyperbox neighborhood

V(z, δ) = { x : z_i − δ(x_i^+ − x_i^−) ≤ x_i ≤ z_i + δ(x_i^+ − x_i^−), i = 1, ..., n_x }   (7.6)

where x_i^+ and x_i^− are the upper and lower bounds of the i-th variable. Considering the 2-norm, we have an ellipsoidal neighborhood

V(z, δ) = { x : (x − z)^T Σ (x − z) ≤ 1 }   (7.7)

in which the matrix Σ is given by:

Σ_ij = 1 / [δ(x_i^+ − x_i^−)]²  if i = j,   Σ_ij = 0  if i ≠ j   (7.8)

The parameter 0 < δ < 1 defines the size of the local neighborhood with respect to the parameter range. This parameter is usually set to a small value, typically 0.1.
Algorithm 7.2 shows how the local data set L is assembled from D. Identical points and points closer than a threshold do not enter the local data set. The elimination of similar points is important because they cause numerical ill-conditioning in approximation techniques such as neural networks and RBF models.
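The assembly of the local data set could be sketched as follows (a simplification: a Euclidean ball stands in for the neighborhood V(z, ·), and all names are illustrative):

```python
import numpy as np

def assemble_local_data(points, values, z, radius, min_dist):
    """Collect points of the global data set D that lie inside a ball of
    given radius around z, discarding points closer than min_dist to an
    already kept point: near-duplicates cause ill-conditioning when
    fitting RBF or neural-network models."""
    kept_pts, kept_vals = [], []
    for x, y in zip(points, values):
        if np.linalg.norm(x - z) > radius:
            continue  # outside the local neighborhood
        if any(np.linalg.norm(x - p) < min_dist for p in kept_pts):
            continue  # near-duplicate of a kept point
        kept_pts.append(np.asarray(x, dtype=float))
        kept_vals.append(y)
    return np.array(kept_pts), np.array(kept_vals)
```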
The approximations are generated to fit the data in L. We employ RBF approximations of the form:

p(x) = Σ_{i=1}^{m} v_i r_i(‖x − c_i‖) = r^T v   (7.9)

where x is the vector of optimization variables, c_i is the center of the radial basis function r_i(‖x − c_i‖) : R^{n_x} → R, and v is the vector of parameters of the RBF model.
There are many types of RBF available [13, 30]; we have selected the multiquadric function in our methodology. For training the RBF approximation, we adopt the following error cost function:

C(v) = Σ_i w_ii e_i² = e^T W e   (7.10)
where e_i is the error between the RBF model and the desired value in the local data set L. The weighted squared error is used because some points in the data are evaluated with more accuracy than others; these points should therefore have a greater weight in the RBF model. This problem can be solved with the Weighted Least Squares (WLS) method:

v = (R^T W R)^{-1} R^T W y   (7.11)
where R is the matrix with the values of the radial basis functions and y is the vector
with the desired outputs.
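A sketch of multiquadric RBF training by WLS following (7.9)–(7.11); placing the centers at the data points and the value of the shape parameter are assumptions:

```python
import numpy as np

def multiquadric(r, c=1.0):
    """Multiquadric basis sqrt(r^2 + c^2); the shape parameter c is assumed."""
    return np.sqrt(r**2 + c**2)

def fit_rbf_wls(X, y, w):
    """Fit p(x) = sum_i v_i r_i(||x - c_i||), eq. (7.9), with centers at the
    data points, by weighted least squares, eq. (7.11):
    v = (R^T W R)^-1 R^T W y. The weights w reflect the evaluation accuracy
    of each sample, as in the cost function (7.10)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    R = multiquadric(dists)          # design matrix of basis responses
    W = np.diag(w)
    return np.linalg.solve(R.T @ W @ R, R.T @ W @ y)

def rbf_predict(X_centers, v, x):
    """Evaluate the fitted RBF model at a point x."""
    return multiquadric(np.linalg.norm(X_centers - x, axis=-1)) @ v
```

With centers at the (distinct) data points and equal weights, the model interpolates the training data exactly, since the multiquadric design matrix is nonsingular.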
Each nonlinear expensive-to-evaluate function in the optimization problem is locally approximated using the RBF model trained with the WLS method. One advantage of this approach is that it is a fast way to obtain the approximations, as required
by the observations made before. With the approximations in hand, we can define
the local search problem as:
min_x  f̂(x)
subject to: x ∈ F̂_x
            x ∈ V(z, δ)   (7.12)

in which V(z, δ) is the local neighborhood, f̂(·) is the approximation for the objective function, and F̂_x is the approximated feasible set generated by the approximated constraints.
Observe that the local search problem (7.12) has an additional constraint in comparison with the original optimization problem: the local problem is restricted to the region V(z, δ), that is, the neighborhood of the individual selected for local search. This problem can be easily and directly solved by the Sequential Quadratic Programming (SQP) method [8]. The SQP method is an extension of Newton's method for unconstrained optimization to constrained problems. The method replaces the objective function with its quadratic approximation and the constraint functions with linear approximations. A nonlinear problem in which the objective function is quadratic and the constraints are linear is called a Quadratic Program (QP). The SQP method solves a QP at each iteration until a satisfactory solution is found. The local convergence properties of SQP are well understood. Therefore, although the solution of problem (7.12) is obtained numerically, it is found by a fast and accurate method, allowing us to call it a semi-analytical solution. Furthermore, since the expressions of the RBF approximations are known, their quadratic and linear models are easily and analytically obtained and used within the SQP procedure. We summarize the local search operator in Algorithm 7.3.
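As an illustration of solving the local problem on the surrogates, SciPy's SLSQP implementation can stand in for the SQP method of [8] (a sketch; the neighborhood is represented by box bounds, and all names are assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def local_search(z, f_hat, g_hat, box):
    """Solve the local problem (7.12) on the surrogates: minimize f_hat
    inside box bounds around z (standing in for V(z, .)) subject to the
    surrogate inequality constraint g_hat(x) <= 0."""
    cons = [{'type': 'ineq', 'fun': lambda x: -g_hat(x)}]  # SLSQP wants >= 0
    res = minimize(f_hat, z, method='SLSQP', bounds=box, constraints=cons)
    return res.x if res.success else z  # keep z if the local step fails
```

For instance, minimizing (x1 − 1)² + (x2 − 1)² under x1 + x2 ≤ 1.5 from z = (0, 0) moves the individual to approximately (0.75, 0.75).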
This section presented the approximation-based local search operator and its use in evolutionary algorithms. The local enhancement of some individuals in A-MAs can be achieved in an indirect way, using a local representation of the problem based on the knowledge acquired by the algorithm along successive generations. This indirect local search via local approximations reduces the computational cost associated with the exploitation component of memetic algorithms. However, it is important to analyze the theoretical effects of the proposed local search on the global convergence properties of evolutionary algorithms. This issue is addressed in the next section, where we also discuss the computational cost of the proposed methodology.
s_i = (s_i^A; s_i^P), where s_i^A is the part of the state associated with A_n, and s_i^P is the part associated with P_n. After the archive update function, the individuals represented by s_i^A are the best individuals in the state s_i.
Before proceeding, we give some useful definitions:
Definition 1. A state s_j is said to be accessible from the state s_i if there exists k < ∞ such that P{X_{n+k} = s_j | X_n = s_i} = t_ij^(k) > 0, where t_ij^(k) is the element (i, j) of the matrix T^k, and we write s_i → s_j.
Definition 2. A state s_j is said to be a communicating state with the state s_i if s_i → s_j and s_j → s_i, and we write s_i ↔ s_j.
Definition 3. A Markov chain {X_n ∈ X : n ∈ N} is irreducible if its state space is a communicating class, that is, all its states are communicating states.
The last definition means that it is possible to get to any state from any state in an irreducible Markov chain; therefore, every state will be visited in finite time regardless of the initial state [26]. Due to the use of the archive population A_n, some states are not communicating states, because of elitism: states containing worse solutions in A_n cannot be reached from states containing better solutions in A_n. These intermediate states in the search process are called transient states, since s_i → s_j but not s_j → s_i.
Let S* be the set of optimal solutions for the optimization problem. It can represent: (i) all the global optima, if there is more than one; (ii) all the global and local optima (if the algorithm is designed to find them); or (iii) all global Pareto-optimal solutions in a multi-objective context. We adopt the following notation:
Definition 4. Those states that represent an A_n whose elements belong to S* are called essential states; they form the set E. Those states that represent an A_n whose elements do not belong to S* are called inessential states, hence they are intermediate states; they form the set I.
From [33], we know that P{X_n ∈ I} → 0 as n → ∞. Since all essential states represent archive populations that are optimal, whose meaning depends on the context, the algorithm will finally converge to one of the essential states:

P{A_n ⊆ S*} = P{X_n ∈ E} = 1 − P{X_n ∈ I}   (7.13)

which becomes equal to one as n → ∞, and we say that the algorithm is globally convergent. The probability that the archive population A_n has converged to a subset of the optimal solution set is equal to one when time goes to infinity. Therefore, we say that the online population P_n locates the optimal solutions, while the offline population A_n converges to the optimal solutions.
All essential states should be accessible from any inessential state. So, to improve the solutions in A_n, the following transition must be possible:

(s_i^A; s_i^P) → (s_i^A; s_j^P) → (s_j^A; s_j^P)   (7.14)

where s_j^P represents a population with better solutions than those in s_i^A.
Observe that the archive population does not undergo the selection and variation steps; the last transition is performed by the update operator. Therefore, s_j^P must be accessible from s_i^P in order to validate the complete transition. Thus, although the complete transition matrix is not irreducible, the transition matrix associated with the online population P_n must be irreducible, in order to guarantee that the online population will visit all states of the search space in finite time.
Let G be the transition matrix for Pn . It is a function of the operators of the algorithm during one iteration, i.e., it is obtained by the product of transition matrices
associated with the selection and variation steps (crossover and mutation in the case
of genetic algorithms, for instance). Since we will analyze the product of stochastic
matrices, it is important to provide some helpful definitions and properties.
Definition 5. A matrix A is said to be positive if a_ij > 0 for all i, j.
Definition 6. A matrix A is said to be non-negative if a_ij ≥ 0 for all i, j.
Definition 7. A square matrix A is said to be diagonal-positive if all elements of its diagonal are positive.
Definition 8. A stochastic square matrix A, representing the transition probabilities of a Markov chain process with state space X, is said to be irreducible if, for all s_i, s_j ∈ X, there exists n ∈ N such that a_ij^(n) > 0.
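Irreducibility of a finite-state transition matrix can be checked numerically; a small sketch using the standard criterion that a non-negative N×N matrix A is irreducible iff (I + A)^(N−1) is positive:

```python
import numpy as np

def is_irreducible(A):
    """Check irreducibility of a non-negative N x N matrix via the
    criterion: A is irreducible iff (I + A)^(N-1) has all entries > 0."""
    N = A.shape[0]
    M = np.linalg.matrix_power(np.eye(N) + A, N - 1)
    return bool((M > 0).all())
```

For example, the two-state cyclic chain [[0, 1], [1, 0]] is irreducible, while the identity matrix (two absorbing states) is not.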
7.3.1.1
In this section, we analyze the case when a local search is explicitly used in the
evolutionary cycle. In the canonical hybrid algorithm, see Algorithm 7.1, the local
search is performed before the selection and variation steps. Thus, assuming that the
product SCM is irreducible, we need to analyze what happens with the product:

H = LG = L(SCM)   (7.16)

where L is the transition matrix associated with the local search phase, G is the transition matrix associated with the global search algorithm, and H is the transition matrix associated with the hybrid algorithm, which results from the global-local search interaction.
Considering the Lamarckian approach for hybrid evolutionary algorithms, the population is modified by the local search. If the local search operator is deterministic, we can say that L is at least row-allowable (every row has at least one positive element). Based on these characteristics, we can state the following:
Theorem 1. If SCM is positive, the local search is deterministic, and L is at least row-allowable, then H is also positive and thus irreducible. Therefore, the archive population will converge to the solution set and the hybrid algorithm is globally convergent.
Proof. Let D = SCM. Since L is row-allowable, for each row i there is at least one k such that l_ik is positive; since D is positive, it follows that

h_ij = Σ_k l_ik d_kj > 0  for all i, j.
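A small numerical illustration of Theorem 1 (the matrices below are arbitrary examples, not from the chapter):

```python
import numpy as np

# If D = SCM is positive and L is row-allowable (at least one positive
# entry per row), then H = L D is positive, hence irreducible.
rng = np.random.default_rng(0)
D = rng.uniform(0.1, 1.0, size=(4, 4))   # positive matrix ...
D /= D.sum(axis=1, keepdims=True)        # ... made row-stochastic
L = np.zeros((4, 4))
L[range(4), [2, 0, 3, 1]] = 1.0          # deterministic local search map
H = L @ D
assert (H > 0).all()                     # every transition has positive prob.
```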
However, when G = SCM is irreducible but not positive, due to a mutation operator that produces a non-positive but irreducible transition matrix, and L is row-allowable, we cannot state that H is irreducible. Therefore, we cannot prove global convergence in general. This situation can be understood with an illustrative example, see Figure 7.2. In this figure, we consider that the population is concentrated in the region of attraction of a local optimum, which is not the global optimum. Using a mutation with compact support, the population is able to escape from the local minimum in n > 1 steps. But if the local search is applied to all individuals at every generation, the population will never escape from the local minimum in this situation.
Nevertheless, it is possible to guarantee convergence if the number of individuals selected for local search is smaller than the population size. The result is stated in the following theorem:
Theorem 2. If the number of individuals selected for local search is smaller than the population size, SCM is irreducible, and L is at least row-allowable, then H is also irreducible (but not positive in general). Therefore, the archive population will converge to the solution set and the hybrid algorithm is globally convergent.
Fig. 7.2 (A) The population of an evolutionary algorithm at a given iteration is stuck in
the region of attraction of a local minimum. (B) A mutation operator with compact support
produces new solutions inside the same region of attraction, as shown in the figure. In the next
evolutionary cycle, the local search will concentrate the population, including the mutated
individuals, around the local minimum. The population has no chance of escaping the local
minimum in this situation
Proof. In this case, there is a positive probability that the individual(s) not selected for local search escape(s) from the local minimum in n > 1 steps, due to the irreducible mutation. Therefore, there exists n such that h_ij^(n) > 0, and hence the product H = L(SCM) is irreducible when the number of individuals selected for local search is smaller than the population size.
Finally, we need to analyze the situation when the local search operator is not applied at every generation, but at every constant number of generations, let us say, at every n_L generations. The transition matrix becomes:

H = L (SCM)(SCM) ··· (SCM)   (n_L times)   (7.17)

and then:

H^n = (L G^{n_L})^n   (7.18)
Therefore, given the considerations above, the hybrid algorithm using an explicit local search phase is globally convergent as long as the non-hybrid algorithm is globally convergent. The local search phase will not affect this property, except when the mutation is irreducible but not positive and the local search is applied to all individuals.
This global convergence analysis by means of Markov chain theory allows us to state only whether the algorithm is globally convergent under the very general criterion considered previously; it says nothing about the convergence rate of the algorithm. For example, the random search algorithm, using uniform sampling in the search space, is globally convergent under the Markov chain analysis, because its transition matrix is positive. As long as we use an archive population to store the best solutions, random search is globally convergent.
Nonetheless, establishing the global convergence property of an algorithm is an obvious prerequisite for calculating its convergence rate. Moreover, strategies like random search and simple enumeration are very inefficient in practice. The transition matrices L, S, and C do not affect the global convergence of the algorithm, which rests on the mutation transition matrix M; they may, however, affect its convergence time. As a consequence of the No Free Lunch theorems [40], an evolutionary algorithm can outperform random search for a given class of problems when the local search and crossover operators exploit some knowledge of that class of problems, or implicitly rely on some common structure of these problems. Simple random search, on the other hand, does not exploit any structure at all. Evolutionary algorithms are far from being random search methods, because the matrices L, S, and C can improve the performance of the algorithm relative to simple random search for a specific class of problems.
T_G = N_G (λ t_e + t_r)   (7.20)

T_H = N_H (λ t_e + t_r + (ν/n_L) t_ls)   (7.21)

where T_G and T_H are, respectively, the average total times consumed by the standard EA and by the MA to converge to the solution within a given accuracy; N_G and N_H are, respectively, the average numbers of generations required by the standard EA and by the MA to converge⁵; λ is the number of solutions evaluated per generation; t_e is the time consumed to evaluate one solution; t_r is the time consumed by the reproduction step; t_ls is the computational time consumed by the local search of one individual; and ν is the number of individuals subjected to local search at every n_L generations.
⁵ N_G and N_H can represent the theoretical mean time in the Markov chain or the sample mean over a given number of independent runs.
By imposing the condition T_H < T_G and assuming t_e >> t_r, which is usually the case in CAD problems, we have

N_H < N_G · t_e / ( t_e + (ν/(λ n_L)) t_ls ) = N_G · 1 / ( 1 + (ν/(λ n_L)) (t_ls/t_e) )   (7.22)

Based on this relation, we observe that even if the hybrid algorithm converges in fewer generations (on average) than the basic algorithm, T_H can be greater. The more significant the amount (ν/(λ n_L)) t_ls compared to t_e, the smaller N_H must be in comparison to N_G to satisfy the initial condition T_H < T_G.
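A numerical sketch of the bound in (7.22); the amortization of the local search cost over the λ evaluations of a generation, and the value λ = 40 in the note below, are assumptions:

```python
def generation_bound(te, tls, n_L, n_ls, lam):
    """Right-hand side factor of (7.22): the hybrid algorithm pays off only
    if N_H < N_G * te / (te + overhead), where the local-search cost of
    n_ls individuals every n_L generations is amortized over the lam
    evaluations performed per generation (amortization assumed)."""
    overhead = (n_ls / (lam * n_L)) * tls
    return te / (te + overhead)
```

With t_e = 3.6 s, t_ls = 1.4 s, n_L = 4, five individuals locally searched, and an assumed offspring size of 40, the factor evaluates to about 0.988, i.e., the local search overhead is negligible.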
When using an indirect local search, by means of approximated functions, we can have t_ls << t_e, that is, the time to evaluate a solution is dominant in the problem. In this case, if N_H is even slightly smaller than N_G, the initial condition is satisfied and the additional complexity introduced by the methodology is justifiable. This relation emphasizes the class of problems in which the proposed methodology can be considered helpful.
This section showed that, as long as the EA is globally convergent under Markov chain analysis, the local search operator preserves this characteristic. Moreover, the approximation-based local search operator preserves the polynomial complexity of standard evolutionary algorithms; the additional complexity in the memetic algorithm is not dramatic and is acceptable in many applications. The final relations show under which conditions the hybrid algorithm is advantageous: the proposed methodology is most interesting for expensive optimization problems, in which the evaluation time t_e of a single solution is very large compared with the time required for performing the normal operations of the algorithm, including the local search. This scenario, commonly found in CAD problems, is the one in which the additional complexity of the memetic algorithm is justifiable.
The methodology is first evaluated on three analytical test functions:

f1(x) = Σ_{i=1}^{n_x − 1} [ 100 (x_i² − x_{i+1})² + (1 − x_i)² ],  −2 ≤ x_k ≤ 2, ∀k   (7.23)

f2(x) = 2.6164 + (1/n_x) Σ_{i=1}^{n_x} 0.01 [ (x_i + 0.5)^4 − 30 x_i² − 20 x_i ],  −6 ≤ x_k ≤ 6, ∀k   (7.24)

f3(x) = 10 n_x + Σ_{i=1}^{n_x} [ x_i² − 10 cos(2π x_i) ],  −5 ≤ x_k ≤ 5, ∀k   (7.25)
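The three analytical test functions in code form, implemented under the reading that (7.23) is the Rosenbrock function, (7.24) a scaled, shifted quartic, and (7.25) the Rastrigin function:

```python
import numpy as np

def f1(x):  # (7.23): Rosenbrock, -2 <= x_k <= 2
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[:-1]**2 - x[1:])**2 + (1.0 - x[:-1])**2))

def f2(x):  # (7.24): shifted scaled quartic, -6 <= x_k <= 6
    x = np.asarray(x, dtype=float)
    return 2.6164 + float(np.mean(0.01 * ((x + 0.5)**4 - 30.0 * x**2 - 20.0 * x)))

def f3(x):  # (7.25): Rastrigin, -5 <= x_k <= 5
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))
```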
Fig. 7.3 Mean convergence for the GA and its memetic version for (A) f1(x), (B) f2(x), and (C) f3(x)
Table 7.1 Results for the analytical functions

Problem   Algorithm   Success rate   N (mean generations)
f1(x)     GA          22.5%          171.9
          A-MA        67.5%          105.0
f2(x)     GA          67.5%          106.8
          A-MA        95.0%          21.5
f3(x)     GA          7.5%           186.9
          A-MA        20.0%          166.5
The mean convergence curves for f1 and f2 present different slopes over a wider range of generations, showing that the local search has an important impact on the search process. This is because f1 is unimodal and f2, although multimodal, has a low to medium degree of multimodality. The basins of attraction in f2 are fairly convex and present no obstacle to a gradient-based local search. This is why the performance in f2 is even better than in f1: the non-convexity of f1 poses some difficulty for the approximation, due to the gradual slope of the banana-shaped region, and also for the local search.
In these analytical problems, the local search was the most time-consuming step. The overhead of the local search is very high in this case, because the evaluation of the objective functions is very fast. Therefore, in these problems, although the memetic algorithms converged in fewer iterations, they took much more time to do so. Nonetheless, we can estimate the minimal value of t_e that would make the memetic algorithms converge in less time, using the relation in (7.22). For each analytical problem, we have monitored the mean time in seconds taken to perform the local search of one individual, t_ls, and we can use the values in Table 7.1 as rough estimates of N_H and N_G. Using these values in (7.22), we get t_e,min = 71 ms for f1(x), t_e,min = 12 ms for f2(x), and t_e,min = 310 ms for f3(x). Therefore, if the time to evaluate f1, f2, and f3 were at least some tenths of a second, the memetic algorithm would have converged in less time than the genetic algorithm (on average), showing that the overhead of the local search is not that significant in expensive optimization problems.
Fig. 7.4 Finite Element Method (FEM) model of the SMES device
min_{x ∈ S} f(x)
subject to: g(x) = B_max − 4.92 T ≤ 0
            h(x) = |E − 180 MJ| / 180 MJ = 0   (7.26)

where B_stray_i is the magnetic flux density evaluated at one of the 21 measurement points, B_max is the largest value of the flux density, related to the quench condition (it cannot be greater than 4.92 T), and E is the energy value obtained by the FEM.
The accuracy parameter is the mesh density (and thus the number of nodes in the mesh) for the FEM. Fig. 7.5 shows the convergence of the energy value with the number of nodes. The list of values for the accuracy parameter is {1.0, 0.5, 0.2, 0.1}. Table 7.2 shows how the number of nodes and the computation time vary with the accuracy parameter.
In order to compare instances of EAs and MAs in this problem, we use the mean convergence given by

c(n) = log [ f(x_{A_n}) / f(x_{A_1}) ]

where x_{A_n} is the best solution in the archive population at generation n.
We applied two instances of the generic structure in Algorithm 7.1, with and without the approximation-based local search. The parameters of the local search are: 5 individuals selected for local search, n_L = 4, and N_L = 400. One instance is a Genetic Algorithm (GA) with
6
The square root is not present in the original formulation [3]. We use it here just to make
the objective function smoother.
Fig. 7.5 Convergence with respect to the mesh density (energy value)
Table 7.2 Varying the mesh density

accuracy   nodes   te (s)   te (relative)
1.0        1024    0.7      1
0.5        3777    1.0      1.5
0.2        22998   3.6      5
0.1        91246   16.0     23
real-biased crossover [35], Gaussian mutation, and a mutation rate of 0.1. The other is a Differential Evolution Algorithm (DEA) [34], with convex crossover for the mutant vectors, Gaussian mutation, and a mutation rate of 0.1. In Fig. 7.6 we illustrate the mean convergence of the GA, the DEA, and their memetic versions. One can see that the DEA performed better than the GA in this problem. Also, both algorithms presented a higher convergence rate when using the approximation-based local search.
In the experiments in Fig. 7.6, all evaluations were performed with the same accuracy for all individuals (accuracy value 0.2 and t_e = 3.6 s). Also, t_ls was about 1.4 s on average and t_r was less than 1 ms on average, so t_e >> t_r. Using these values in (7.22), we get N_H < 0.988 N_G, showing a negligible overhead for the local search. Therefore, if the hybrid algorithm takes fewer generations, it takes less time to converge in this problem. Using direct local search, i.e., without approximations, the time to perform the local search of one individual, t_ls, becomes twice the time to evaluate the whole population in this problem, greatly increasing the overhead of the local search phase. Although the MA with direct local search usually converges in fewer generations than the A-MA, its local search is very expensive, increasing the total time of the optimization process.
Fig. 7.6 (A) Mean convergence for the GA and its memetic version. (B) Mean convergence
for the DEA and its memetic version
References
1. Agapie, A.: Genetic algorithms: minimal conditions for convergence. In: Hao, J.-K., Lutton, E., Ronald, E., Schoenauer, M., Snyers, D. (eds.) AE 1997. LNCS, vol. 1363, pp. 183–193. Springer, Heidelberg (1998)
2. Almaghrawi, A., Lowther, D.A.: Heterogeneity and loosely coupled systems in electromagnetic device analysis. IEEE Transactions on Magnetics 44(6), 1402–1405 (2008)
3. Alotto, P., Kuntsevitch, A.V., Magele, C., Molinari, G., Paul, C., Preis, K., Repetto, M., Richter, K.R.: Multiobjective optimization in magnetostatics: a proposal for benchmark problems. IEEE Transactions on Magnetics 32(3), 1238–1241 (1996)
4. Bäck, T.: Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York (1996)
5. Coello, C.A.C.: Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Computer Methods in Applied Mechanics and Engineering 191(11-12), 1245–1287 (2002)
6. Coello, C.A.C.: Evolutionary multi-objective optimization: a historical view of the field. IEEE Computational Intelligence Magazine 1(1), 28–36 (2006)
7. Dawkins, R.: Memes: the new replicators. In: The Selfish Gene, ch. 11. Oxford University Press, Oxford (1976)
8. Fletcher, R.: Practical Methods of Optimization, 2nd edn. John Wiley & Sons, Chichester (1987)
9. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Design. John Wiley & Sons, New York (1997)
10. Graham, I.J., Case, K., Wood, R.L.: Genetic algorithms in computer-aided design. Journal of Materials Processing Technology 117(1-2), 216–221 (2001)
11. Guimarães, F.G., Wanner, E.F., Campelo, F., Takahashi, R.H.C., Igarashi, H., Lowther, D.A., Ramírez, J.A.: Local learning and search in memetic algorithms. In: Proceedings of the IEEE Congress on Evolutionary Computation, IEEE World Congress on Computational Intelligence, pp. 2936–2943. IEEE Press, Los Alamitos (2006)
12. Guimarães, F.G., Campelo, F., Igarashi, H., Lowther, D.A., Ramírez, J.A.: Optimization of cost functions using evolutionary algorithms with local learning and local search. IEEE Transactions on Magnetics 43(4), 1641–1644 (2007)
13. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice-Hall, Englewood Cliffs (2008)
14. Ishibuchi, H., Murata, T.: A multi-objective genetic local search algorithm and its application to flowshop scheduling. IEEE Transactions on Systems, Man, and Cybernetics, Part C 28(3), 392–403 (1998)
15. Ishibuchi, H., Yoshida, T., Murata, T.: Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Transactions on Evolutionary Computation 7(2), 204–223 (2003)
16. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing: A Fusion of Foundations, Methodologies and Applications 9(1), 3–12 (2005)
17. Johnson, J.M., Rahmat-Samii, Y.: Genetic algorithms in engineering electromagnetics. IEEE Antennas and Propagation Magazine 39(4), 7–21 (1997)
18. Kimura, S., Konagaya, A.: High dimensional function optimization using a new genetic local search suitable for parallel computers. In: Proceedings of the IEEE Congress on Evolutionary Computation, vol. 1, pp. 335–342. IEEE Press, Los Alamitos (2003)
19. Knowles, J.D., Corne, D.W.: Memetic algorithms for multiobjective optimization: issues, methods and prospects. In: Hart, W.E., Krasnogor, N., Smith, J.E. (eds.) Recent Advances in Memetic Algorithms. Studies in Fuzziness and Soft Computing, vol. 166, pp. 313–352. Springer, Heidelberg (2004)
20. Krasnogor, N., Smith, J.: A tutorial for competent memetic algorithms: model, taxonomy, and design issues. IEEE Transactions on Evolutionary Computation 9(5), 474–488 (2005)
21. Lowther, D.A.: Automating the design of low frequency electromagnetic devices: a sensitive issue. COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering 22(3), 630–642 (2003)
22. Lozano, M., Herrera, F., Krasnogor, N., Molina, D.: Real-coded memetic algorithms with crossover hill-climbing. Evolutionary Computation 12(3), 273–302 (2004)
23. Merz, P., Freisleben, B.: A comparison of memetic algorithms, tabu search, and ant
colonies for the quadratic assignment problem. In: Proceedings of the IEEE Congress
on Evolutionary Computation, vol. 3, pp. 20632070. IEEE Press, Los Alamitos (1999)
24. Moscato, P.: On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Tech. Rep. C3P Report 826, Caltech Concurrent Computation Program (1989)
25. Moscato, P.: Memetic algorithms: a short introduction. In: Corne, D., Dorigo, M., Glover,
F. (eds.) New Ideas in Optimization. Advanced Topics in Computer Science, pp. 219
234. McGraw-Hill, New York (1999)
26. Norris, J.R.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (1997)
27. Ong, Y.S., Keane, A.J.: Meta-lamarckian learning in memetic algorithms. IEEE Transactions on Evolutionary Computation 8(2), 99110 (2004)
28. Ong, Y.S., Nair, P.B., Keane, A.J.: Evolutionary optimization of computationally
expensive problems via surrogate modeling. American Institute of Aeronautics and Astronautics Journal 41(4), 687696 (2003)
29. Ong, Y.S., Lim, M.H., Zhu, N., Wong, K.W.: Classification of adaptive memetic algorithms: a comparative study. IEEE Transactions on Systems, Man, and Cybernetics
Part B: Cybernetics 36(1), 141152 (2006)
30. Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks.
Neural Computation 3(2), 246257 (1991)
191
31. Renner, G., Ekart, A.: Genetic algorithms in computer aided design. Computer-Aided
Design 35(8), 709726 (2003)
32. Rudolph, G.: Convergence properties of canonical genetic algorithms. IEEE Transactions
on Neural Networks 5(1), 96101 (1994)
33. Seneta, E.: Non-negative Matrices and Markov Chains. Springer Series in Statistics.
Springer, Heidelberg (1981)
34. Storn, R.M., Price, K.V.: Differential evolution: A simple and efficient adaptive scheme
for global optimization over continuous spaces. Journal of Global Optimization 11, 341
359 (1997)
35. Takahashi, R.H.C., Vasconcelos, J.A., Ramrez, J.A., Krahenbuhl, L.: A multiobjective
methodology for evaluating genetic operators. IEEE Transactions on Magnetics 39(3),
13211324 (2003)
36. Wang, G.G., Shan, S.: Review of metamodeling techniques in support of engineering
design optimization. Journal of Mechanical Design 129(4), 370380 (2007)
37. Wanner, E.F., Guimaraes, F.G., Takahashi, R.H.C., Saldanha, R.R., Fleming, P.J.: Constraint quadratic approximation operator for treating equality constraints with genetic
algorithms. In: Proceedings of the IEEE Congress on Evolutionary Computation, vol. 3,
pp. 22552262. IEEE Press, Los Alamitos (2005)
38. Wanner, E.F., Guimaraes, F.G., Takahashi, R.H.C., Ramrez, J.A.: Hybrid genetic algorithms using quadratic local search operators. COMPEL: The International Journal for
Computation and Mathematics in Electrical and Electronic Engineering 26(3), 773787
(2007)
39. Wanner, E.F., Guimaraes, F.G., Takahashi, R.H.C., Fleming, P.J.: Local search with
quadratic approximations into memetic algorithms for optimization with multiple criteria. Evolutionary Computation Journal 16(2), 185224 (2008)
40. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1(1), 6782 (1997)
41. Zhou, P., Fu, W.N., Lin, D., Stanton, S., Cendes, Z.J.: Numerical modeling of magnetic
devices. IEEE Transactions on Magnetics 40(4), 18031809 (2004)
42. Zhou, Z., Ong, Y.S., Lim, M.H.: Memetic algorithm using multi-surrogates for computationally expensive optimization problems. Soft Computing (11), 957971 (2007)
Chapter 8
Abstract. The chapter claims that the search distributions of Estimation of Distribution Algorithms (EDAs) contain much information that can be extracted with the help of modern statistical techniques to create powerful strategies for expensive optimization. For example, it shows how the regularization of some parameters of the EDAs' probabilistic models can yield dramatic improvements in efficiency. In this context a new class, Shrinkage EDAs, based on shrinkage estimation is presented. Also, a novel mutation operator based on a regularization of the entropy is discussed.
Another key contribution of the chapter is the development of a new surrogate fitness model based on the search distributions. With this method the evolution starts
in the fitness landscape, switches to the log-probability landscape of the model and
then backtracks to continue in the original landscape if the optimum is not found.
For the sake of completeness the chapter reviews other techniques for improving
the sampling efficiency of EDAs. The theoretical presentation is accompanied by
numerical simulations that support the main claims of the chapter.
8.1 Introduction
Nowadays, the optimization of computationally expensive black-box functions has
become a task of great practical importance that requires new theoretical developments. In particular, in the area of evolutionary algorithms (EAs) several approaches
have been studied so as to reduce the number of function evaluations.
The main message of this chapter is: the area of Estimation of Distribution Algorithms (EDAs) [12, 18, 26, 27, 37], a state-of-the-art branch of EAs, can play a
leadership role in expensive optimization. However, it is important to recognize that
dealing with expensive optimization problems within the framework of EDAs is
Alberto Ochoa
Institute of Cybernetics, Mathematics and Physics, Calle 15 No. 551, CP 10400,
Ciudad Habana, Cuba
e-mail: ochoa@icmf.inf.cu
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 193218.
c Springer-Verlag Berlin Heidelberg 2010
springerlink.com
194
A. Ochoa
a very complex issue that has to be attacked from several different directions at
the same time. Therefore, our main point is the following affirmation: the search distributions contain a great deal of information that can be extracted with the help of modern statistical techniques and used to elaborate powerful strategies for expensive optimization.
To begin with, it is crucial to understand that in EDAs the distribution is the
central concept, as opposed to populations and individuals. We believe that it is
from this understanding that these new research opportunities will arise.
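To make "the distribution is the central concept" concrete, consider the following minimal univariate EDA in the spirit of UMDA. This is an illustrative sketch, not an algorithm from this chapter: the object that evolves is the vector of marginals; the population exists only to re-estimate it.

```python
import random

def umda_onemax(n=20, pop_size=100, trunc=0.3, generations=30, seed=1):
    """Minimal UMDA sketch: the evolving object is the vector of
    univariate marginals p, not the population."""
    rng = random.Random(seed)
    p = [0.5] * n                                # the search distribution
    for _ in range(generations):
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        pop.sort(key=sum, reverse=True)          # OneMax fitness = number of ones
        selected = pop[: max(1, int(trunc * pop_size))]
        # re-estimate the marginals from the selected set
        p = [sum(x[i] for x in selected) / len(selected) for i in range(n)]
        if sum(pop[0]) == n:                     # optimum reached
            break
    return p

marginals = umda_onemax()
print(sum(marginals))  # on OneMax the marginals drift toward 1
```

Everything discussed in this chapter (selection, model building, sampling, mutation, regularization) acts on `p` or on its multivariate generalizations.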
In [32] several developments of this idea related to both the entropy of the search
distributions and the entropy of the optimization problems were presented. Two
interesting issues were raised: maximum entropy and entropic mutation, and the
important role of the Boltzmann distribution in the theory of EDAs was stressed.
The influence of the quality of the utilized probabilistic model on the performance of an EDA algorithm was another issue investigated in [32]. It was made clear that building a good model is important, but at the same time it was recognized that "good" does not always mean the model with the most information.
Throughout the chapter the above-mentioned issues are revisited, and other results, either unpublished or published elsewhere, are presented. Our goal is to highlight the main
directions that can be taken today to boost the capabilities of current EDAs to cope
with expensive problems.
The outline of the chapter is as follows. First, a brief introduction to Estimation of Distribution Algorithms (EDAs) is given in Sect. 8.2. Then, we review
three different perspectives on a basic problem: enhancing the sampling efficiency
of EDAs. This occurs in Sect. 8.3, which is divided into three subsections: improving selection methods (Sect. 8.3.1), accelerating the convergence (Sect. 8.3.2) and
building statistically efficient sampling distributions (Sect. 8.3.3).
Section 8.4 presents the first of the two main contributions of this chapter: a method
and an algorithm that do not evaluate all visited search points. The evolution occurs
in two different landscapes, in one of them the fitness is estimated. The method is
called Partial Evaluation with Backtracking from Log-Probability Landscapes.
Section 8.5 is devoted to regularization, an important new area of research
in EDAs. We will show how the regularization of the probabilistic models used in
EDAs can yield dramatic improvements in efficiency. In Sect. 8.5.1 we show how
to design mutation operators that do not destroy the learned distributions. This technique, called Linear Entropic Mutation (LEM), increases the entropy of
the probabilistic models and produces higher success rates and fewer
function evaluations. It is a natural operator for EDAs because it mutates distributions instead of single individuals. Finally, in Sect. 8.5.2 the second main contribution of our work is presented: an efficient technique for accurate model building with
small populations. With this we are opening the doors of EDAs research to a new
class of efficient algorithms based on shrinkage estimation. We propose the name
Shrinkage Estimation of Distribution Algorithms (SEDA) for
the whole class. We give a brief introduction of the SEDAs by means of a detailed
discussion of one of its members: SEDAmn (an algorithm that uses the multivariate
normal distribution).
The chapter includes an appendix dedicated to the B-functions, a new benchmark function model especially designed for EDAs. Some of the functions used in
our simulations are of this type. Finally, the conclusions of the chapter are given.
Definition 1. The Boltzmann distribution of a function f(x) with parameter β is defined as

\[
p_{\beta,f}(x) = \frac{e^{\beta f(x)}}{Z_f(\beta)} = \frac{e^{\beta f(x)}}{\sum_{y} e^{\beta f(y)}}
\]
We also use Z_{β,f}, but to simplify the notation β and f can be omitted. If we follow the usual definition of the Boltzmann distribution, then −f(x) is called the free energy and 1/β the temperature of the distribution. The parameter β is usually called the inverse temperature.
Closely related to the Boltzmann distribution is Boltzmann selection:
Definition 2. Given a distribution p(x) and a selection parameter γ, Boltzmann selection calculates a new distribution according to

\[
p_s(x) = \frac{p(x)\, e^{\gamma f(x)}}{\sum_{y} p(y)\, e^{\gamma f(y)}}
\]
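Definition 2 is straightforward to implement over a finite support. The sketch below is our illustration: `gamma` stands for the selection parameter, and the toy points and fitness are assumptions.

```python
import math

def boltzmann_select(points, p, f, gamma):
    """Boltzmann selection (Definition 2): reweight the distribution p over
    `points` by exp(gamma * f(x)) and renormalize."""
    w = [pi * math.exp(gamma * f(x)) for pi, x in zip(p, points)]
    z = sum(w)
    return [wi / z for wi in w]

# toy example: three points, uniform prior, fitness equal to the point itself
points = [0.0, 1.0, 2.0]
ps = boltzmann_select(points, [1 / 3, 1 / 3, 1 / 3], lambda x: x, gamma=1.0)
print(ps)  # probability mass shifts toward the fittest point
```

With gamma = 0 the distribution is returned unchanged; larger gamma concentrates the mass on the best points, which is the annealing knob used throughout the chapter.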
Table 8.1 Cells contain %Success, Gc, #Evaluations

N     f = 1         f = 4          f = 12
400   83, 7, 2689   93, 6, 2546    96, 6, 2408
450   94, 6, 2844   99, 6, 2645    99, 6, 2614
500   99, 6, 3025   100, 6, 2920   100, 6, 2750
Table 8.2 Probabilistic Elitism. PADA with Goldberg's Deceptive3. (See [42], page 94)

Elit-size  Prob-Elit-size  Gc            %Success  #Evaluations
0          0               6.75 ± 1.43   79        3375
150        0               7.04 ± 1.46   100       2614
300        0               9.07 ± 2.29   100       2114
450        0               14.90 ± 4.91  20        -
300        5               8.05 ± 1.60   96        1945
300        10              7.36 ± 1.68   98        1835
Table 8.3 The impact of using maximum-entropy search distributions. (See [34])

N     %Success  Gc         %SuccessME  GcME
200   0         -          2           8.5 ± 0.7
600   8         9.7 ± 1.5  69          7.4 ± 1.1
800   10        8.7 ± 3.2  90          7.0 ± 1.2
5000  92        7.2 ± 1.2  100         5.8 ± 0.9
grant them the right to be in the next population as it is done with the best current
individuals? The term probabilistic elitism was chosen for this method in [42].
Table 8.2 shows an interesting experiment about the synergy between traditional and probabilistic elitism. The algorithm used is PADA, the Polytree Approximation Distribution Algorithm [43], one of the pioneer EDAs. The algorithm learns polytree Bayesian networks from the selected populations. It makes independence tests to construct the skeleton and to orient some edges; then a BIC score guides a hill-climbing local search to orient the remaining edges.
Notice that increasing the classic elitism decreases the number of function evaluations, but this has an upper bound beyond which the efficacy (%Success) of the
algorithm drops (see Elitism = 450). Adding the five or ten best configurations of
the search distributions accelerates the convergence of the algorithm and produces
an additional reduction of the number of evaluations. As a result of the combined
use of the methods 1540 function evaluations were saved. Simple, but effective! It
is worth noting from the point of view of expensive optimization (see Sect. 8.4.2),
that the probabilistic elite can be included in the new population without being
evaluated.
When it is possible to build a junction tree for a discrete problem, the probabilistic elite (the M most probable configurations) can be computed with Nilsson's algorithm [28]. Alternatively, the algorithm introduced in [45], which uses max-marginals instead of a junction tree, can be used for arbitrary discrete graphical models. For some continuous problems it is also possible to sample zones of high probability with similar results, for example, around the mean of the multivariate normal distribution.
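In the simplest case of a fully factorized (univariate) model, the probabilistic elite can be computed without junction trees at all. The sketch below is our illustration only, not Nilsson's algorithm: it enumerates the M most probable configurations of independent Bernoulli marginals by a best-first search over bit flips, ordered by log-probability loss.

```python
import heapq, math

def top_m_configurations(p, m):
    """M most probable configurations of independent Bernoulli marginals p
    (each 0 < p_i < 1). The MPC sets each bit to its more probable value;
    other configurations are reached by flipping bits, each flip costing
    |log p_i - log (1 - p_i)| in log-probability."""
    mpc = [1 if pi >= 0.5 else 0 for pi in p]
    cost = sorted((abs(math.log(pi) - math.log(1 - pi)), i)
                  for i, pi in enumerate(p))
    heap = [(0.0, ())]              # (total flip cost, flip indices into `cost`)
    out = []
    while heap and len(out) < m:
        c, idxs = heapq.heappop(heap)
        conf = mpc[:]
        for j in idxs:
            conf[cost[j][1]] ^= 1
        out.append(conf)
        last = idxs[-1] if idxs else -1
        if last + 1 < len(cost):
            # extend the flip set, or replace its last element (k-smallest
            # subset-sums enumeration: children never cost less than parents)
            heapq.heappush(heap, (c + cost[last + 1][0], idxs + (last + 1,)))
            if idxs:
                heapq.heappush(heap, (c - cost[last][0] + cost[last + 1][0],
                                      idxs[:-1] + (last + 1,)))
    return out

elite = top_m_configurations([0.9, 0.8, 0.6], m=3)
print(elite)  # [[1, 1, 1], [1, 1, 0], [1, 0, 1]]
```

Such an elite can be injected into the next population without any true fitness evaluations, which is exactly the point made above.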
We look at probabilistic elitism as an acceleration method that compensates
for the delay introduced by classic elitism and the regularization techniques that will
be discussed later in this chapter.
Table 8.4

Algorithm  N    %Success  Gc     #Evaluations
UMDA       100  98.70     11.98  1198.78 ± 3.01
MMHC-EDA   250  95.30     10.97  2742.65 ± 9.61
TPDA-EDA   250  98.60     10.54  2635.65 ± 8.64
BOA(k=1)   280  97.80     10.83  3034.19 ± 11.48
BOA(k=3)   700  97.80     11.09  7765.13 ± 27.16
EBNA       150  97.10     28.85  4328.53 ± 36.37
Table 8.5 Minimization in the hard class of random polytree B-functions, BF2B30s4-1245. The cells contain the success rate

N     UMDA  BOA(k=1)  BOA(k=2)  BOA(k=3)  EBNA   MMHC-EDA
300   3.33  46.67     23.33     13.33     30.00  40.00
500   3.33  50        56.67     43.33     26.67  53.33
1000  3.33  43.33     50        50        40.00  73.33
2000  3.33  43.33     50        53.33     43.33  100
In summary, we believe that MMHC-EDA is the best candidate for our purposes.
It seems to obtain accurate approximations of the search distributions, which is
highly convenient for the technique we will introduce in the next section.
Table 8.6 Partial Evaluation of the OneMax function with the Boltzmann Univariate Marginal Distribution Algorithm (BUMDA). (See [42], page 83)

GPE  Gc    %Success  #Evaluations  %Estimated
1    4.77  48        5770          83
2    4.95  80        5950          66
3    5.47  95        6470          54
4    5.97  100       6070          43
-    7.86  100       8860          0
Denoting by ε_{β,f}(x) the error of the log-probability fitness model,

\[
\varepsilon_{\beta,f}(x) = \frac{1}{\beta}\,\log\frac{p_{a,\beta,f}(x)}{p_{\beta,f}(x)}
\]

and taking the expectation with respect to the approximating distribution, we get

\[
\bigl\langle \varepsilon_{\beta,f}(x) \bigr\rangle_{p_a(x)} = \frac{1}{\beta}\, D\bigl(p_{a,\beta,f}(x)\,\|\,p_{\beta,f}(x)\bigr) \tag{8.2}
\]

The error expectation equals the product of the temperature times the Kullback-Leibler divergence between the two distributions. Notice that the last two equations further explain why the function approximation is better toward the end of the run: β is larger.
One interesting observation, which deserves more study, is the fact that for this
function the search in the log-probability space seems easier than in the original
fitness landscape. In [24] the authors reported a similar behaviour of the UMDA
with the Saw function. According to these authors, the rugged fitness landscape
of the function is implicitly transformed into a fairly smooth landscape in a space
whose coordinates are the univariate probabilities. The algorithm performs a gradient ascent on the transformed landscape and easily gets to the optimum. In this
regard the apparent merit of our PE technique is that it explicitly performs the
optimization in the probability space. These issues are the subject of ongoing
research.
Now the obvious question is whether or not the method can be applied to cases different from those covered by the Factorization theorem. Equation (8.2) gives us a hint: we can apply the PE method if our approximating distribution is close enough to the Boltzmann distribution. This is valid for both discrete and real-valued problems. In [46] a variant of Boltzmann selection with an annealing schedule for real variables was reported. The algorithm proposed by these authors computes the multivariate normal distribution that minimizes the Kullback-Leibler divergence to the Boltzmann search distribution.
The TLN function and C code to work with it are available from the author.
Table 8.7 Optimization of the B-function TLN with 50 variables. The reported averages are computed over 100 runs. Algorithm: MMHC, N = 3000, Elit = 0, Gmax = 20, τ = 0.3

PEstart  Mean Genc  #Evaluations  %Success
3        6.77       6005.8        9
4        6.45       9004.5        42
5        6.66       12003.7       83
6        6.59       15002.5       93
7        6.49       17937.7       94
Table 8.8 Optimization of the function Trap5 with 30 variables. The reported averages are computed over 20 successful runs. Algorithm: MMHC, N = 4000, Elit = 0, Gmax = 12, GPE = 2

Selection             Partial Evaluation  #Evaluations  %Success
Truncation (τ = 0.3)  yes                 19360         60
Truncation (τ = 0.3)  no                  21201         100
Boltzmann (N = Ns)    yes                 11611         100
Boltzmann (N = Ns)    no                  16401         100
the application of the method starts from. Using this algorithm, we have confirmed
the predictions made in the previous paragraph.
At this point we conclude that the log-probability fitness model can be effective
with truncation selection at least for the investigated function. Unfortunately, this is
not always the case as the following example shows.
We investigate our method with the Trap function. It is a separable deceptive problem proposed in [5]. Its global maximum is located at the point (1, 1, . . . , 1). Given the function

\[
\mathrm{trap}(u) =
\begin{cases}
k, & \text{for } u = k \\
k - 1 - u, & \text{otherwise}
\end{cases}
\]

the function (we use k = 5) is defined as follows:

\[
\mathrm{Trap}(\mathbf{x}) = \sum_{i=1}^{m} \mathrm{trap}\bigl(x_{ki-k+1} + x_{ki-k+2} + \dots + x_{ki}\bigr) \tag{8.3}
\]

This function has been used extensively in the testing of EDAs and genetic algorithms, and solving it has proven to be quite challenging in the absence of correct knowledge of its dependence structure.
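The function is a direct transcription of the two equations above:

```python
def trap(u, k=5):
    """Deceptive trap on the number of ones u in a block of size k:
    k at u == k, otherwise k - 1 - u (so u = 0 is the deceptive attractor)."""
    return k if u == k else k - 1 - u

def trap_k(x, k=5):
    """Additively decomposable Trap (Eq. 8.3): sum of traps over
    consecutive non-overlapping k-bit blocks."""
    assert len(x) % k == 0
    return sum(trap(sum(x[i:i + k]), k) for i in range(0, len(x), k))

print(trap_k([1] * 30))  # 30: the global maximum for n = 30, k = 5
print(trap_k([0] * 30))  # 24: the deceptive attractor (k - 1 per block)
```

Within each block every single-bit improvement away from all-zeros decreases the fitness, which is why algorithms that miss the block structure are led away from the optimum.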
Table 8.8 reveals several interesting issues. The first thing to notice is the significant reduction in the success rate when PE is applied with truncation selection. Besides, the decrease in the number of function evaluations is smaller than the one obtained with Boltzmann selection. In the latter case, a reduction of about 5000
evaluations is achieved with the same 100% success. Notice that the best value is
about 46% of the maximum number of function evaluations for truncation selection.
It is worth noting that the results shown in Table 8.8 were computed for a population size less than the critical value for 30 variables. For truncation selection the critical value (97% success in 100 runs) is equal to 5000 and the average number of function evaluations in this case is 23042. We have found that with Boltzmann selection and PE we get a 100% success and an average of 11500 function evaluations. This means that the average saving is 51%. We also present, in Fig. 8.1, the histograms of the number of function evaluations per optimization run. As can be seen, nearly 90% of the runs are below 15000 function evaluations when PE is combined with Boltzmann selection. In contrast, more than 95% of the runs without PE require more than 20000 function evaluations, and 70% of the PE runs need only half of this amount.
Algorithm 1. MMHC-EDA + Partial Evaluation with Backtracking from Log-Probability Landscapes (PEBLPL)

Set t ← 1
Randomly generate N configurations
while stop criteria are not valid do
  one-MMHC-EDA-step(f(x)) {// Evolution in fitness landscape}
  if stop criteria are not valid AND (t ≥ GPE) then
    save(t, current Population)
    while stop criteria are not valid do
      one-MMHC-EDA-step(log p(x)) {// Evolution in log-probability landscape}
    end while
    if optima are not found then
      restore(t, Population) {// Backtracking}
    end if
  end if
end while

function one-MMHC-EDA-step(g(x))
  Evaluate the current population with the input function g(x)
  According to a selection method build the selected set SS
  Find the structure of the probability model BN = MMHC(SS)
  Estimate the parameters of pss(x, t) using BN and SS
  Generate N new configurations from p(x, t + 1) = pss(x, t)
  Set t ← t + 1
endfunction
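The control flow of Algorithm 1 can be prototyped compactly. The sketch below is a toy stand-in, not the chapter's implementation: a univariate model replaces MMHC, OneMax replaces the expensive fitness, the switch to the model landscape happens once, and the helper names (`sample`, `fit_model`) are ours. Only true fitness calls are counted; generations spent in the log-probability landscape are free.

```python
import math
import random

def peblpl_onemax(n=30, pop_size=100, g_pe=3, g_max=30, seed=7):
    """Toy run of the PEBLPL control flow on OneMax."""
    rng = random.Random(seed)
    state = {"evals": 0}

    def f(x):                          # the (imagined expensive) fitness
        state["evals"] += 1
        return sum(x)

    def sample(p):                     # draw a population from the model
        return [[1 if rng.random() < pi else 0 for pi in p]
                for _ in range(pop_size)]

    def fit_model(population, g):      # select under g, re-estimate marginals
        sel = sorted(population, key=g, reverse=True)[: pop_size // 3]
        return [min(0.95, max(0.05, sum(x[i] for x in sel) / len(sel)))
                for i in range(n)]

    def solved(population):
        return max(sum(x) for x in population) == n

    p = [0.5] * n
    P = sample(p)
    for t in range(1, g_max + 1):
        if solved(P):
            return state["evals"], True
        p = fit_model(P, f)            # evolution in the fitness landscape
        P = sample(p)
        if t == g_pe:                  # switch to the model's landscape
            saved = [x[:] for x in P]
            logp = lambda x: sum(math.log(p[i] if x[i] else 1 - p[i])
                                 for i in range(n))
            for _ in range(g_max):     # evolution in log-probability landscape
                if solved(P):
                    return state["evals"], True
                p = fit_model(P, logp)
                P = sample(p)
            P = saved                  # backtrack: optimum not found
    return state["evals"], False

evals, ok = peblpl_onemax()
print(ok, evals)
```

When the model is accurate at the switch point, the optimum is reached in the free log-probability phase and only a few hundred true evaluations are spent; otherwise the run backtracks and continues on the fitness landscape, mirroring the safety net of Algorithm 1.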
According to the results shown in this section, we cannot yet say under what conditions it is possible to use truncation selection with the PE scheme proposed here. We only know so far that for some functions this is possible. Obviously, this topic needs more research. At this point, it is important to say that with the TLN function the PE also works with Boltzmann selection. This is another confirmation
Fig. 8.1 Histograms of the number of function evaluations per optimization run of the Trap function. N = 5000. Top: truncation selection without PE. Bottom: Boltzmann selection with PE
of the theoretical ideas developed in Sect. 8.4.2 and of the important role of the
Boltzmann distribution in the theory/practice of EDAs.
It is worth noting that the proposed Partial Evaluation scheme is beneficial when the actual fitness evaluation is expensive, in which case the above costs are indeed negligible and the model developed in this section is valid.
We discuss two examples. In the first one the entropy of the search distribution
is the parameter to regularize. Due to space constraints, we give just a small introduction to the topic and leave the details for a forthcoming publication. In the
second example, the covariance matrix of a multivariate normal search distribution
is regularized. In this case we give more details and introduce a new EDA algorithm.
Table 8.9 Entropic mutation of the search distributions. UMDA with OneMax function for fixed population size

α             0    0.05  0.1  0.125  0.2
%Success      17   74    97   100    99
#Evaluations  300  465   425  412    450
Table 8.10 A Bayesian EDA and the LEM mutation for Goldberg's Deceptive3 function, n = 18

N               %Success  #Evaluations
500 (α = 0)     86        2211
600 (α = 0)     88        2815
700 (α = 0)     96        3186
800 (α = 0)     100       3306
420 (α = 0.08)  100       2600
420 (α = 0.10)  100       2450
420 (α = 0.12)  100       2400
420 (α = 0.14)  100       2300
420 (α = 0.20)  100       2200
Table 8.9 presents the results of a small experiment. For a fixed small population size, which is not enough to obtain a good success rate without mutation, it is possible to boost the efficacy of the Univariate Marginal Distribution Algorithm by increasing the mutation intensity α.
The approach was called linear entropic mutation (LEM) in [32]. The LEM acts as a regularizer of the entropy of the system and computes a convex sum of the current and the maximum entropy with the regularization parameter α. In this way the distribution is shrunk toward the maximum entropy distribution. It turns out that this process can be interpreted as a mutation process insofar as it increases the level of uncertainty or randomness in the system. For multivariate discrete systems the following definition introduces the LEM.
Definition 3. Let p(x₁, x₂, . . . , xₙ) and p_α(x₁, x₂, . . . , xₙ) denote a discrete joint probability mass and its LEM-mutation with mutation intensity α. If H(X) and H_α(X) are their respective entropy values, then the following holds:

\[
H_{\alpha}(X) = (1 - \alpha)\, H(X) + \alpha\, H_{\max}(X) \tag{8.4}
\]

where H_max(X) denotes the maximum entropy of the system.
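For a single binary marginal, Eq. (8.4) can be realized numerically: move the probability toward 0.5 until the entropy reaches the prescribed convex combination. This univariate sketch is our own illustration (the chapter's LEM is defined for the joint distribution); it exploits the monotonicity of the binary entropy on [0, 1/2] and solves for the mutated probability by bisection.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def lem_marginal(p, alpha):
    """Linear entropic mutation of one binary marginal: find p' between p and
    0.5 with H(p') = (1 - alpha) * H(p) + alpha * Hmax, Hmax = 1 bit."""
    target = (1 - alpha) * h2(p) + alpha * 1.0
    lo, hi = min(p, 1 - p), 0.5        # search on the lower branch of h2
    for _ in range(80):                # bisection: h2 is increasing on [0, 1/2]
        mid = (lo + hi) / 2
        if h2(mid) < target:
            lo = mid
        else:
            hi = mid
    q = (lo + hi) / 2
    return q if p <= 0.5 else 1 - q    # keep the marginal on its original side

p2 = lem_marginal(0.9, alpha=0.5)
print(p2)  # shrunk toward 0.5; its entropy hits the convex-sum target
```

Note that the mutated marginal stays on the same side of 0.5, so the learned preference of the model is preserved while its certainty is reduced.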
just to show that the LEM is another regularization technique that can reduce the
number of function evaluations.
Goldberg's Deceptive3 function with 18 variables is optimized with an EDA
that learns a Bayesian network using a scoring metric. All the results are averages
over 30 runs.
Table 8.10 shows that, without mutation, more than 3000 function evaluations are needed to achieve a success rate larger than 95%. In contrast, for a fixed population size equal to 420 and for all the mutation intensities shown (starting at 0.08) a 100% success is obtained with many fewer function evaluations. For α = 0.2, the table shows a saving of more than 30%!
Shrinkage estimation gives EDAs the ability to build better models of the search distributions under small populations. Having better models is important for implementing partial evaluation strategies. Our main claim is that the synergy between shrinkage estimation, small populations and partial evaluation offers a great research opportunity with regard to expensive optimization problems.
Due to space constraints, the complex issues of the combination of the above-mentioned methods are not discussed in the chapter. We recall that the material presented in Sect. 8.3 is relevant to the small population issue. Hereafter, we concentrate on the impact of shrinkage estimation alone.
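The idea behind SEDAmn can be illustrated with a shrinkage estimator of the covariance matrix. The sketch below uses a fixed shrinkage intensity toward the diagonal target; this is a simplification in the spirit of Ledoit-Wolf-type estimators [15], not the exact SEDAmn estimator, whose details are not reproduced here.

```python
import random

def shrinkage_covariance(samples, lam=0.2):
    """MLE covariance shrunk toward its diagonal target:
    S*[i][j] = S[i][j] if i == j else (1 - lam) * S[i][j].
    With fewer samples than dimensions this keeps the estimate usable;
    a fixed `lam` is a simplification (Ledoit-Wolf chooses it from data)."""
    m, n = len(samples), len(samples[0])
    mean = [sum(x[i] for x in samples) / m for i in range(n)]
    S = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / m
          for j in range(n)] for i in range(n)]
    return [[S[i][j] if i == j else (1 - lam) * S[i][j] for j in range(n)]
            for i in range(n)]

rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(4)]  # 4 samples, 5 dims
S_star = shrinkage_covariance(data, lam=0.3)  # off-diagonals damped by 0.7
```

Damping the off-diagonal entries pulls the spurious correlations that a tiny sample inevitably produces toward zero, which is precisely what makes small-population model building feasible.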
Table 8.11

F(x)        Best Fv^a  #Evals^a  Best Fv^b  #Evals^b  Best Fv^c
Sphere      10^-6      > 200000  10^-8      < 3000    > 1000
Rosenbrock  48.7       > 270000  48.5       < 1500    > 150
Ackley      10^-6      > 280000  10^-8      < 2500    > 1.2
Griewangk   10^-6      > 170000  10^-8      < 2500    > 6 × 10^-5

^a Best result from [13] (EMNAglobal and others); ^b SEDAmn with shrinkage; ^c SEDAmn with MLE covariance estimation.
Table 8.12 Scaling of SEDAmn. Averages over 20 runs. N = 50 and error = 10^-6

F(x)       n = 50        n = 100       n = 500
Ackley     5060 ± 96     6720 ± 103.2  11880 ± 97.7
Griewangk  5525 ± 110    8140 ± 251.4  14050 ± 282.8
Rastrigin  6575 ± 761.4  8500 ± 1149   13540 ± 1069
covariance estimation and the maximum number of function evaluations that appears in the corresponding fifth column.
Now, and just to get an idea of the scaling of SEDAmn, we fix the population size N = 50 and report in Table 8.12 the average number of function evaluations needed for getting an error of 10^-6 in 20 runs. We show the results for 50, 100 and 500 variables and the functions Ackley, Griewangk and Rastrigin. Note the linear dependence.
In our opinion, the experiments of this section clearly show that SEDAmn is a
new powerful EDA algorithm that offers us a significant reduction in the number of
function evaluations with respect to existing EDAs.
Appendix
B-Functions: A Random Class of Benchmark Functions
The random Boltzmann function model was introduced in [32] and later extended in
[33] and investigated in [30], where it is called B-function model. The model allows
the explicit codification of probabilistic information, which is very convenient for
the study of Estimation of Distribution Algorithms.
The function derived from p(x), called here a B-function, is an additively decomposable unimodal non-negative function with minimum at x_mpc.
The above definition can be modified to deal with multimodal and real-valued problems. It also says that whenever we have a distribution we can construct a B-function. For example, the Alarm B-function is built by using as p(x) the famous ALARM Bayesian network [1]. We also have shown elsewhere how to build such a distribution given collections of certain types of probabilistic constraints.
The following properties tell us why B-functions are an excellent benchmark for evolutionary optimization.

- The minimum of a B-function is always zero, thus the stopping criterion of the optimization algorithm is easy to implement.
- The computation of the most probable configuration (discrete variables), which is necessary for the definition of the function, can be accomplished in polynomial time for a large class of distributions [28, 45].
- The random generation of B-function instances (graph and parameters) can be accomplished efficiently [32].
- A naming convention to facilitate referencing of some B-function instances and subclasses can be easily implemented. Alternatively, less standard B-function instances and subclasses can be distributed as files.
- It is straightforward to control problem size, structural and parametric complexity to test scalability.
- There is no need to construct functions by concatenating small subfunctions.
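The construction is easy to reproduce on a toy scale. The sketch below builds a B-function from independent Bernoulli marginals as f(x) = log p(x_mpc) − log p(x); the chapter's B-functions use polytrees and Bayesian networks, so the independence assumption here is only a simplification for illustration.

```python
import math
from itertools import product

def b_function(p_marginals):
    """Toy B-function from independent Bernoulli marginals (0 < p_i < 1):
    f(x) = log p(x_mpc) - log p(x) is non-negative, additively decomposable,
    and has f(x_mpc) = 0 at the most probable configuration."""
    def logp(x):
        return sum(math.log(pi if b else 1 - pi)
                   for b, pi in zip(x, p_marginals))
    mpc = tuple(1 if pi >= 0.5 else 0 for pi in p_marginals)
    return lambda x: logp(mpc) - logp(x), mpc

f, mpc = b_function([0.9, 0.2, 0.7])
print(mpc, f(mpc))  # (1, 0, 1) 0.0
```

The first listed property is visible immediately: the minimum is exactly zero at the most probable configuration, so a run can stop as soon as a zero-fitness point appears.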
In [33] we introduced a naming mechanism (and a program available from the author) to facilitate working with and referencing certain subclasses of B-functions. We show it here for the case of boolean polytree B-functions:

BF2Bn₂sn₃-d₁d₂d₃d₄[n₄]   (8.6)

The above notation stands for a function with n₂ boolean variables. The dependence structure is given by a polytree (restricted Bayesian network) with maximum number of parents equal to n₃. The digits d₁, . . . , d₄ have the following meaning. The mutual information of any pair of adjacent variables in the dependency graph of the function lies in the interval 0.1 · [d₁, d₂]. The digits d₃ and d₄ constrain the univariate entropies. In fact, the univariate probabilities lie in the interval [p_min, p_max] = 0.1 · [d₃, d₄] + 0.05 or in [1 − p_max, 1 − p_min]. Finally, the optional parameter n₄, which is a natural number not exceeding 10⁹ − 1, is a random seed that
References
1. Beinlich, I., Chavez, H.R., Cooper, G.: The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks. In: Artificial Intelligence in Medical Care, pp. 247-256 (1989)
2. Blanco, R., Lozano, J.A.: Empirical comparison of Estimation of Distribution Algorithms in combinatorial optimization. In: Larranaga, P., Lozano, J.A. (eds.) Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Dordrecht (2002)
3. Brown, L.E., Tsamardinos, I., Aliferis, C.F.: A comparison of novel and state-of-the-art polynomial Bayesian network learning algorithms. In: AAAI, pp. 739-745 (2005)
4. Chiba, K., Jeong, S., Shigeru, O., Morino, H.: Data mining for multidisciplinary design space of regional-jet wing. In: Proceedings of the 2005 IEEE Congress on Evolutionary Computation CEC 2005, pp. 2333-2340 (2005)
5. Deb, K., Goldberg, D.E.: Analyzing deception in trap functions. In: Foundations of Genetic Algorithms, vol. 2, pp. 93-108 (1993)
6. Eby, D., Averill, R.C., Punch, W.F.I., Goodman, E.D.: Evaluation of injection island GA performance on flywheel design optimization. In: Proceedings of the Third Conference on Adaptive Computing in Design and Manufacturing (1998)
7. Groppo, T.: Learning Bayesian networks skeleton: a comparison between TPDA and PMMS algorithm. Master's thesis, Universite Claude Bernard Lyon I (2006)
8. Ireland, C.T., Kullback, S.: Contingency tables with given marginals. Biometrika 55, 179-188 (1968)
9. Jaynes, E.T.: Information theory and statistical mechanics. Physical Review 106, 620-630 (1957)
10. Jirousek, R., Preucil, S.: On the effective implementation of the iterative proportional fitting procedure. Computational Statistics and Data Analysis 19, 177-189 (1995)
11. Kim, H.S., Cho, S.B.: An efficient genetic algorithm with less fitness evaluation by clustering. In: Proceedings of the 2001 IEEE Congress on Evolutionary Computation, pp. 887-894 (2001)
12. Larranaga, P., Lozano, J.A. (eds.): Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Dordrecht (2002)
13. Larranaga, P., Lozano, J.A., Miqueles, T., Bengoetxea, E.: Experimental results in function optimization with EDAs in continuous domain. In: Larranaga, P., Lozano, J.A. (eds.) Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Dordrecht (2002)
14. Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)
15. Ledoit, O., Wolf, M.: Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance 10, 603-621 (2003)
16. Lewis, P.M.: Approximating probability distributions to reduce storage requirements. Information and Control 2, 214-225 (1959)
17. Liang, K.H., Yao, X., Newton, C.: Evolutionary search of approximated n-dimensional landscapes. International Journal of Knowledge-Based Intelligent Engineering Systems 4(3), 172-183 (2000)
18. Lozano, J., Larranaga, P., Inza, I., Bengoetxea, E. (eds.): Towards a New Evolutionary Computation. Advances on Estimation of Distribution Algorithms. Studies in Fuzziness and Soft Computing. Springer, Heidelberg (2006)
19. Madera, J.: Hacia una Generacion Eficiente de Algoritmos Evolutivos con Estimacion de Distribuciones: Pruebas de (In)dependencia + Paralelismo. PhD thesis, Instituto de Cibernetica, Matematica y Fisica, La Habana (2009) (in Spanish; adviser: A. Ochoa)
20. Madera, J., Ochoa, A.: Un EDA basado en aprendizaje Max-Min con escalador de colinas. Tech. Rep. 396, ICIMAF (2006)
21. Madera, J., Ochoa, A.: Algoritmos Evolutivos con Estimacion de Distribuciones Bayesianas basados en pruebas de (in)dependencia. Tech. Rep. 474, ICIMAF (2008)
22. Mahnig, T., Muhlenbein, H.: Comparing the adaptive Boltzmann selection schedule SDS to truncation selection. In: Third International Symposium on Adaptive Systems, ISAS 2001, Evolutionary Computation and Probabilistic Graphical Models, La Habana, pp. 121-128 (2001)
23. Meyer, C.H.: Korrektes Schliessen bei unvollstandiger Information. PhD thesis, Fernuniversitat Hagen (1998) (in German)
24. Muhlenbein, H., Mahnig, T.: Mathematical analysis of evolutionary algorithms for optimization. In: Proceedings of the Third International Symposium on Adaptive Systems,
pp. 166185 (2001)
25. Muhlenbein, H., Mahnig, T.: Evolutionary Optimization and the Estimation of Search
Distributions. Journal of Approximate Reasoning 31(3), 157192 (2002)
26. Muhlenbein, H., Paas, G.: From Recombination of Genes to the Estimation of Distributions I. Binary Parameters. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P.
(eds.) PPSN 1996. LNCS, vol. 1141, pp. 178187. Springer, Heidelberg (1996)
27. Muhlenbein, H., Mahnig, T., Ochoa, A.: Schemata, Distributions and Graphical Models
in Evolutionary Optimization. Journal of Heuristics 5(2), 213247 (1999)
28. Nilsson, D.: An Efficient Algorithm for Finding the M most Probable Configuration in
Bayesian Networks. Statistics and Computing 2, 159173 (1998)
29. Ochoa, A.: Linear Entropic Mutation (submitted for publication) (2009)
30. Ochoa, A.: The Random Class of B-functions (unpublished) (2009)
31. Ochoa, A., Soto, M.: Partial Evaluation in Genetic Algorithms. In: IEA/AIE 1997: Proceedings of the 10th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pp. 217222. Goose Pond Press, Atlanta (1997)
218
A. Ochoa
32. Ochoa, A., Soto, M.: Linking Entropy to Estimation of Distribution Algorithms. In:
Lozano, J., Larranaga, P., Inza, I., Bengoetxea, E. (eds.) Towards a new Evolutionary
Computation. Advances on Estimation of Distribution Algorithms. Studies in Fuzziness
and Soft Computing, pp. 138. Springer, Heidelberg (2006)
33. Ochoa, A., Soto, M.: On the Performance of the Bayesian Optimization Algorithm with
B-functions. Tech. Rep. 383, ICIMAF (2006)
34. Ochoa, A., Hons, R., Soto, M., Muehlenbein, H.: A Maximum Entropy Approach to
Sampling in EDA-the Single Connected case. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds.)
CIARP 2003. LNCS, vol. 2905, pp. 683690. Springer, Heidelberg (2003)
35. Ong, Y.S., Nair, P.B., Keane, A.J.: Evolutionary optimization of computationally
expensive problems via surrogate modeling. American Institute of Aeronautics and Astronautics Journal 41(4), 687696 (2003)
36. Pelikan, M., Goldberg, D.E., Cantu-Paz, E.: BOA: The Bayesian optimization algorithm.
In: Banzhaf, W., Daida, J., Eiben, A.E., Garzon, M.H., var, V.H., Jakiela, M., Smith, R.E.
(eds.) Proceedings of the Genetic and Evolutionary Computation Conference GECCO
1999, Orlando, FL, vol. 1, pp. 525532. Morgan Kaufmann Publishers, San Francisco
(1999)
37. Pelikan, M., Sastry, K., Cant-Paz, E. (eds.): Scalable Optimization via Probabilistic Modeling. Studies in Computational Intelligence, vol. 33. Springer, Heidelberg (2006)
38. Sastry, K., Pelikan, M., Goldberg, D.E.: Efficiency Enhancement of Estimation of Distribution Algorithms. In: Pelikan, M., Sastry, K., Cant-Paz, E. (eds.) Scalable Optimization
via Probabilistic Modeling. Studies in Computational Intelligence, vol. 33, pp. 161186.
Springer, Heidelberg (2006)
39. Schafer, J., Strimmer, K.: A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Statistical Applications in Genetics
and Molecular Biology 4(1) (2005)
40. Shannon, C.E.: A Mathematical Theory of Communication. Bell System Technical Journal 27, 379423 (1948)
41. Smith, R.E., Dike, B., Stegmann, S.: Fitness inheritance in genetic algorithms. In:
Proceedings of the 1995 ACM Symposium on Applied Computing, SAC 1995,
pp. 345350 (1995)
42. Soto, M.: Un estudio sobre los algoritmos evolutivos basados en redes bayesianas simplemente conectadas y su costo de evaluacion. PhD thesis, Instituto de Cibernetica,
Matematica y Fsica, La Habana (in Spanish). Adviser: A. Ochoa (2003)
43. Soto, M., Ochoa, A.: A Factorized Distribution Algorithm based on polytrees. In:
Congress on Evolutionary Computation, CEC 2000, California, pp. 232237 (2000)
44. Soto, M., Ochoa, A., Rodrguez-Ojea, L.: An Empirical Stuty on Oversampled Selection
in Estimation of Distribution Algorithms. Tech. Rep. 494, ICIMAF (2008)
45. Yanover, C., Weiss, Y.: Finding the M most Probable Configurations in Arbitrary Graphical Models. In: Thrun, S., Saul, L., Schollkopf, B. (eds.) Advances in Neural Information
Processing Systems, vol. 16. MIT Press, Cambridge (2004)
46. Yunpeng, C., Xiaomin, S., Peifa, J.: Probabilistic Modeling for Continuos EDA with
Boltzmann Selection and Kullback-Liebler Divergence. In: Proceedings of GECCO Conference (2006)
47. Zhou, Z.Z., Ong, Y.S., Nair, P.B., Keane, A.J., Lum, K.Y.: Combining global and local
surrogate models to accelerate evolutionary optimization. IEEE Transactions on Systems, Man and Cybernetics-Part C 37(1), 6676 (2007)
Chapter 9

Abstract. In this chapter we propose a surrogate-assisted framework for expensive single- and multi-objective evolutionary optimization under a fixed budget of computationally intensive evaluations. The framework uses similarity-based surrogate models and individual-based model management with pre-selection. In contrast to existing frameworks, where surrogates are used to improve the performance of evolutionary operators or as local search tools, here we use them to allow for an augmented number of generations to evolve solutions. The introduction of the surrogates into the evolutionary cycle is controlled by a single parameter, which is related to the number of generations performed by the evolutionary algorithm. Numerical experiments are conducted in order to assess the applicability and the performance in constrained and unconstrained, single- and multi-objective optimization problems. The results show that the present framework is an attractive alternative for improving the final solutions under a fixed budget of expensive evaluations.
9.1 Introduction
Several problems of interest in science and engineering are, or can advantageously be, formulated as optimization problems. However, modern problems have led to the development of increasingly complex and computationally expensive simulation models. When the optimization algorithm involves the repeated use of expensive simulations to evaluate the candidate solutions, the computational cost of such
L.G. Fonseca
National Laboratory for Scientific Computing (LNCC), Petrópolis, RJ, Brazil
e-mail: goliatt@lncc.br

H.J.C. Barbosa
National Laboratory for Scientific Computing (LNCC), Petrópolis, RJ, Brazil
e-mail: hcbm@lncc.br

A.C.C. Lemonge
Department of Applied and Computational Mechanics,
Federal University of Juiz de Fora (UFJF), Juiz de Fora, MG, Brazil
e-mail: afonso.lemonge@ufjf.edu.br
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 219–248.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
The general constrained multi-objective optimization problem can be written as

\[
\begin{aligned}
\min\;& f_i(x), \quad i = 1, \ldots, n_{obj}, \\
\text{subject to}\;& g_j(x) \le 0, \quad j = 1, \ldots, n_i, \\
& x \in S, \quad x^L \le x \le x^U,
\end{aligned}
\tag{9.1}
\]

where f_i(x) is the i-th objective function to be minimized, n_obj is the number of objectives, n is the number of design variables, S is the search space bounded by x^L ≤ x ≤ x^U, and n_i is the number of inequality constraints. The feasible region is defined by S and the n_i inequality constraints g_j(x).

We have multi-objective (MO) optimization when n_obj ≥ 2. Single-objective (SO) optimization (n_obj = 1) is a special case of the formulation above. Also, in the absence of constraints (n_i = 0) we have the single- and multi-objective unconstrained optimization problems.
In MO optimization, a set of solutions representing the trade-off among the different objectives, rather than a unique optimal solution, is sought. This set of solutions is also known as the Pareto optimal set, and these solutions are also termed noninferior, admissible, or efficient solutions [20]. The corresponding objective vectors of these solutions are termed nondominated, and each objective component of any nondominated solution in the Pareto optimal set can only be improved by degrading at least one of its other objective components [58]. The concepts of Pareto dominance and Pareto optimality form the basis of solution quality. Pareto dominance is defined by

\[
x^1 \prec_P x^2 \ (x^1 \text{ Pareto-dominates } x^2) \;:\Longleftrightarrow\;
f_i(x^1) \le f_i(x^2) \ \forall i \in \{1, \ldots, n_{obj}\}
\ \text{ and } \ f_j(x^1) < f_j(x^2) \ \text{for at least one } j.
\tag{9.2}
\]
9.3.1.1
Fitness Inheritance
The fitness inheritance procedure was first proposed by Smith et al. [56], and since then has been applied to several problems [6, 12, 13, 49, 52, 63] and algorithms [38, 45]. In fitness inheritance, all the individuals in the initial population have their fitness values obtained via the fitness function. Thereafter, the fitness of a fraction of the individuals in the subsequent populations is inherited from their parents. The remaining individuals are evaluated using the original fitness function (referred to as the simulation model).

The inheritance procedure is described as follows. Consider an individual x^h generated by the evolutionary operators (crossover and mutation) from the parents x^{p_1} and x^{p_2}. The surrogate evaluation is given by:
\[
\hat{f}(x^h) =
\begin{cases}
f(x^{p_i}), & \text{if } d(x^h, x^{p_i}) = 0,\ i = 1 \text{ or } 2, \\
\dfrac{s_1 f(x^{p_1}) + s_2 f(x^{p_2})}{s_1 + s_2}, & \text{otherwise,}
\end{cases}
\tag{9.3}
\]

where d(·,·) is a distance measure and s_i = s(x^h, x^{p_i}) is the similarity between the offspring and its i-th parent.
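The inheritance rule of Eq. (9.3) can be sketched as follows. The function name and the inverse-distance similarity are illustrative assumptions, since the text does not fix a particular similarity measure:

```python
import math

def inherit_fitness(child, parents, fitnesses, u=1.0):
    """Similarity-based fitness inheritance (a sketch of Eq. (9.3)).

    `parents` holds the two parent vectors and `fitnesses` their exact
    fitness values.  If the child coincides with a parent, that parent's
    fitness is reused; otherwise a similarity-weighted average of the two
    parents' fitnesses is returned, with similarity taken as inverse
    Euclidean distance raised to the smoothing exponent u."""
    dists = [math.dist(child, p) for p in parents]
    for d, f in zip(dists, fitnesses):
        if d == 0.0:                       # exact match: inherit directly
            return f
    weights = [(1.0 / d) ** u for d in dists]
    return sum(w * f for w, f in zip(weights, fitnesses)) / sum(weights)
```

A child coinciding with a parent inherits that parent's value; a child equidistant from both parents receives the plain average of their fitnesses.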
Fig. 9.1 Illustration of the Fitness Imitation procedure. The individuals inside the dotted circles belong to the same group. The representative individual, denoted by a black square, is evaluated by the exact function. The remaining individuals are evaluated by a surrogate model, their predicted fitness being calculated according to the distance to the representative individual
Inheriting fitness is substantially cheaper than a standard fitness function evaluation [8, 51]; in fact, the inheritance procedure may be orders of magnitude less expensive. However, this approach introduces some noise into the search process and may adversely affect the final solution found [13].
9.3.1.2
Fitness Imitation
In Fitness Imitation [24], the individuals are clustered into several groups, and several clustering techniques can be used to perform this task [28]. Then, only the individual that represents each cluster is evaluated using the fitness function. The choice of the representative individual can be made either deterministically or randomly [35]. The fitness values of the other individuals in the same cluster are estimated from the representative individual based on a similarity measure. If a new individual to be evaluated does not belong to any cluster, it is evaluated by the original function. The term Fitness Imitation is used in contrast to Fitness Inheritance.
An illustration of the Fitness Imitation procedure is depicted in Figure 9.1.
Examples of applications of this procedure can be found in [3, 28, 35].
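A minimal sketch of Fitness Imitation, assuming precomputed clusters whose first member acts as the representative; the distance-penalised estimate for cluster mates is an illustrative choice, since the text only requires the estimate to follow some similarity measure:

```python
import math

def imitation_evaluate(clusters, f):
    """Fitness Imitation sketch (minimisation).

    One representative per cluster (here: its first member) is evaluated
    with the exact function f; the remaining members receive an estimate
    derived from the representative.  Returns the estimates and the
    number of exact evaluations spent."""
    estimates, exact_calls = {}, 0
    for cluster in clusters:
        rep = cluster[0]
        f_rep = f(rep)                      # single exact evaluation per cluster
        exact_calls += 1
        estimates[tuple(rep)] = f_rep
        for x in cluster[1:]:
            # members far from the representative get a pessimistic
            # estimate that grows with the distance to it
            estimates[tuple(x)] = f_rep + math.dist(x, rep)
    return estimates, exact_calls
```

With three individuals in two clusters, only two exact evaluations are performed, regardless of the cluster sizes.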
9.3.1.3
Nearest Neighbors
The nearest neighbors surrogate model (k-NN) is a simple and transparent surrogate model in which the approximations are built from a set D that stores previously evaluated individuals (samples).
The idea of using k-NN to assist an evolutionary algorithm was explored in [46,
47], where the aim was to reduce the number of exact function evaluations needed
during the search. Here we use the surrogate to extend the generations, and to guide
the search towards improved solutions.
Given an offspring x^h, the corresponding value \(\hat{f}(x^h) \approx f(x^h)\) to be assigned to x^h is

\[
\hat{f}(x^h) =
\begin{cases}
f(x^{I_j}), & \text{if } x^h = x^{I_j} \text{ for some } x^{I_j} \in D, \\
\dfrac{\sum_{j=1}^{k} s(x^h, x^{I_j})^u \, f(x^{I_j})}{\sum_{j=1}^{k} s(x^h, x^{I_j})^u}, & \text{otherwise,}
\end{cases}
\tag{9.4}
\]

where x^{I_1}, …, x^{I_k} are the k nearest neighbors of x^h in the database D, s(·,·) is a similarity measure, and u is a smoothing exponent.
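Eq. (9.4) can be sketched directly; the inverse-distance similarity and the function name are illustrative assumptions:

```python
import math

def knn_surrogate(xh, database, k=2, u=2.0):
    """k-NN surrogate following Eq. (9.4).

    `database` maps points (tuples) to exact fitness values.  Similarity
    is taken as the inverse Euclidean distance (one possible choice of s);
    u is the smoothing exponent."""
    if tuple(xh) in database:               # first branch of Eq. (9.4)
        return database[tuple(xh)]
    # k nearest neighbours of xh in the database
    nearest = sorted(database, key=lambda p: math.dist(xh, p))[:k]
    weights = [(1.0 / math.dist(xh, p)) ** u for p in nearest]
    return sum(w * database[p] for w, p in zip(weights, nearest)) / sum(weights)
```

A query on a stored point returns its exact value; a query midway between two stored points returns their weighted average.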
Fig. 9.2 Pre-selection (PS) management procedure. p_sm is the fraction of individuals evaluated by the original model, λ is the population size, f and \(\hat{f}\) are the original and surrogate functions, N_f is the current number of exact evaluations, and \(N_{\hat{f}}\) is the current number of surrogate evaluations
In the model management used here, only a fraction 0 < p_sm ≤ 1 of the population is evaluated by the time-consuming original model. We implement a pre-selection (PS) [19] strategy, where the surrogate model is used to decide which individuals will be evaluated by the original function. This procedure is described as follows: first, using evolutionary operators, individuals in the offspring population G_t are generated from parents in the parent population P_t. The offspring population G_t is then entirely evaluated by the surrogate model and ranked in decreasing order of quality. Based upon this rank, the p_sm·λ highest-ranked individuals (according to the surrogate model \(\hat{f}\)) are evaluated by the original model, and the remaining (1 − p_sm)·λ individuals in G_t keep the objective function values predicted by the surrogate model \(\hat{f}\). The PS model management procedure is shown in Figure 9.2.
In the PS model management it is not necessary that the surrogate model approximate the objective function closely. It is sufficient that the ranking of the individuals in the offspring population be similar to the ranking that would be obtained using the simulation model.
Fig. 9.3 Similarity-based surrogate-assisted GA (SBSM-GA). Pseudo-code for single- (SBSM-SOGA) or multi-objective (SBSM-MOGA) optimization. P_t is the parent population, G_t is the offspring population, λ is the population size, f and \(\hat{f}\) are the original and surrogate functions, N_{f,max} is the maximum number of exact evaluations, N_f is the current number of exact evaluations, and \(N_{\hat{f}}\) is the number of surrogate evaluations
In the constrained case, each individual has a measure of constraint violation associated with it. The population is sorted in order to establish a ranking. Individuals are then selected for reproduction in such a way
that better performing solutions have a higher probability of being selected. The
genetic material contained in the chromosome of such parent individuals is then
recombined and mutated, by means of crossover and mutation operators, giving rise
to offspring which will form a new generation of individuals. Finally, the whole
process is repeated until a termination criterion is attained.
Elitism is applied in the parent population update procedure (line 13): some individuals of the parent population are saved to the offspring population before the
new parent population is created. In the single-objective version (SBSM-SOGA),
the best ranked individual of the parent population Pt is copied to the offspring
population Gt .
In single-objective constrained optimization (n_i ≠ 0), we use the constraint handling technique presented in [10] to guide the search toward the (feasible) optimum.
The individuals are ranked according to a pair-wise comparison procedure, where
the following criteria are enforced:
1. when two feasible solutions are compared, the one with better objective function
value is chosen,
2. when a feasible and an infeasible solution are compared, the feasible solution is chosen, and
3. when two infeasible solutions are compared, the one with smaller constraint
violation is chosen.
The constraint violation is given by \(\sum_{j=1}^{n_i} \max(0, g_j(x))^2\).
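The three pairwise comparison criteria of [10], together with the violation measure above, can be sketched as follows (the function names are illustrative):

```python
def violation(x, constraints):
    """Total constraint violation: sum_j max(0, g_j(x))^2 for g_j(x) <= 0."""
    return sum(max(0.0, g(x)) ** 2 for g in constraints)

def better(x1, x2, f, constraints):
    """Pairwise comparison implementing the three criteria of the text
    (minimisation): two feasible solutions compare by objective value,
    feasible beats infeasible, and two infeasible solutions compare by
    total constraint violation.  Returns the preferred individual."""
    v1, v2 = violation(x1, constraints), violation(x2, constraints)
    if v1 == 0.0 and v2 == 0.0:              # criterion 1
        return x1 if f(x1) <= f(x2) else x2
    if (v1 == 0.0) != (v2 == 0.0):           # criterion 2
        return x1 if v1 == 0.0 else x2
    return x1 if v1 <= v2 else x2            # criterion 3
```

Because the comparison never mixes objective values with violation values, no penalty coefficient has to be tuned.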
Table 9.1 Algorithmic parameter settings for the SBSM-GA (single- and multi-objective optimization)

Population size (λ): single-objective problems: λ = 40; multi-objective problems: λ = 50.
Representation: floating-point coding; vectors of real numbers.
Operators: single-objective problems: uniform mutation [34], heuristic, one- and two-point crossover [23], rank-based selection [57], and elitism (best individual copied to the next generation); multi-objective problems: uniform mutation, heuristic, one- and two-point crossover, rank-based selection (fast non-dominated sorting and crowding distance [11]), and elitism (parent and offspring populations mixed and sorted in order to create the next generation).
Stop criterion: maximum number of exact evaluations, given by N_{f,max}.
Crossover probability (p_c): p_{c,heu} = 0.54 (heuristic), p_{c,1p} = 0.18 (one-point), and p_{c,2p} = 0.18 (two-point).
Mutation rate (p_m): p_m = 0.02.
Database size (η): η ∈ {λ, 2λ, 5λ, 15λ}, i.e. DPR = η/λ ∈ {1, 2, 5, 15}.
Database update: replace the oldest individual. Only individuals evaluated by the original function can replace individuals in the database D.
Surrogate model: nearest neighbors (k-NN).
Number of neighbors (k): k ∈ {1, 2, 5, 10, 15}.
Model management: individual-based pre-selection (PS) [19]. At each generation, the offspring population G_t is entirely evaluated by the surrogate model and ranked in decreasing order of quality. The p_sm·λ highest-ranked individuals (according to the surrogate model \(\hat{f}\)) are evaluated by the original model, and the remaining (1 − p_sm)·λ individuals in G_t keep the objective function values predicted by the surrogate model \(\hat{f}\).
Fraction p_sm: p_sm ∈ [0.05, 1.00]. The parameter p_sm defines the fraction of individuals evaluated by the original model; p_sm = 1 means the standard GA (no surrogates) with N_G = N_{f,max}/λ generations. As we must ensure at least one individual evaluated by the original model in each generation, we have p_sm ≥ 1/λ. The performance of the SBSM-GA is to be compared to the Standard GA (p_sm = 1).
Performance measures: single-objective problems: the value of the objective function and, for constrained problems, also the number of runs leading to feasible final solutions; multi-objective problems: Generational Distance [59], Maximum Spread [33], and Spacing [20].
Number of runs: 50.

DPR: database-size to population-size ratio, DPR = η/λ.
As the surrogate evaluations are introduced into the Standard GA, errors due to the surrogate model evaluations are also introduced, which may adversely affect the quality of the final solutions. On the other hand, the extra surrogate evaluations allow for a longer period in which to search for improved solutions. There is a trade-off between the noise introduced by the surrogate models and the beneficial impact of increasing the number of generations. We recall that, given a budget of N_{f,max} exact evaluations, as the parameter p_sm decreases, the number of generations increases according to N_G = N_{f,max}/(p_sm·λ).
It is assumed that for complex real-world applications the cost of a surrogate model evaluation is negligible when compared to that of a simulation; hence the total computational time will be only slightly increased by the extra surrogate evaluations.
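The budget trade-off above amounts to one line of arithmetic, sketched here with an illustrative function name:

```python
import math

def generations(nf_max, pop_size, psm):
    """Number of generations attainable under a budget of nf_max exact
    evaluations when only a fraction psm of each population of size
    pop_size is evaluated exactly: N_G = nf_max / (psm * pop_size)."""
    exact_per_gen = max(1, math.ceil(psm * pop_size))  # at least one exact call
    return nf_max // exact_per_gen
```

With the single-objective setting of Table 9.1 (pop_size = 40, nf_max = 1000), p_sm = 1 gives 25 generations (the Standard GA), while p_sm = 0.05 stretches the same budget over 500 generations.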
In this first experiment, we study the impact of increasing the number of neighbors, given a fixed database size (fixed DPR), in order to choose an appropriate neighborhood size. Under a fixed DPR = η/λ = 2, the experiments were conducted
Table 9.2 Single-objective minimization problems. The maximum number of simulations is N_{f,max}, the lower and upper bounds are x^L and x^U, n is the number of design variables, and f* is the optimal objective function value

#    Objective function                                                               N_{f,max}  n   [x^L, x^U]         f*
F01  \(\sum_{i=1}^{n} x_i^2\)                                                         1000       10  [−5.12, 5.12]      0
F02  \(\sum_{i=1}^{n} (\lfloor x_i + 0.5 \rfloor)^2\)                                 1000       10  [−100, 100]        0
F03  \(\sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\frac{x_i}{\sqrt{i}} + 1\)  1600  10  [−600, 600]        0
F04  \(-20 e^{-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}} - e^{\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)} + 20 + e\)  1000  10  [−32.768, 32.768]  0
F05  \(\sum_{i=1}^{n-1} \left[100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2\right]\)          2000       10  [−5.12, 5.12]      0
F06  \(\sum_{i=1}^{n} i x_i^4 + U(0,1)\)                                              1000       10  [−4.28, 4.28]      0
F07  \(\sum_{i=1}^{n} (x_i^2 - 10\cos(2\pi x_i) + 10)\)                               2000       10  [−10, 10]          0
F08  \(-\sum_{i=1}^{n} x_i \sin(\sqrt{|x_i|}) + 418.982887272433\, n\)                1000       10  [−500, 500]        0
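Two of the Table 9.2 benchmarks, written out for reference; the function names are illustrative:

```python
import math

def f01_sphere(x):
    """F01 from Table 9.2: the sphere function, with minimum 0 at the origin."""
    return sum(xi ** 2 for xi in x)

def f07_rastrigin(x):
    """F07 from Table 9.2: the (highly multimodal) Rastrigin function,
    with minimum 0 at the origin."""
    return sum(xi ** 2 - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)
```

Both return 0 at the origin, but F07 adds a cosine modulation that creates a regular grid of local minima, which is what makes it a harder target for the surrogate.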
230
Table 9.3 Constrained minimization problems The number of design variables is n. The
constraints read g j = g j (x) 0, j = 1, . . . , ni
#
Objective function
G02
ni=1 cos4 (x
ni=1 cos2 (xi )
i )2
n
i=1 ix2i
G04
5.3578547x23 + 0.8356891x1
+x2 37.293239x1 40792.141
G06
G07
G08
G09
G10
sin (2 x1 ) sin (2 x2 )
x31 (x1 +x2 )
(x1 10)2 + 5(x2 12)2 +
xqb43 + 3(x4 11)2 +
10x56 + x26 + x47
4x6 x7 10x6 8x7
x1 + x2 + x3
Constraints
g1 = 2x1 + 2x2 + x10 + x11 10
g2 = 2x1 + 2x3 + x10 + x12 10
g3 = 2x3 + 2x2 + x12 + x11 10
g4 = 8x1 + x10
g5 = 8x2 + x11
g6 = 8x3 + x12
g7 = 2x4 x5 + x10
g8 = 2x6 x7 + x11
g9 = 2x8 x9 + x12
g1 = 0.75 ni=1 xi
g2 = ni=1 xi 7.5n
g1 = 85.334407 + 0.0056858x2 x5 +
0.0006262x1x4 0.0022053x3 x5 92
g2 = 85.334407 0.0056858x2 x5
0.0006262x1x4 + 0.0022053x3 x5 0
g3 = 80.51249 + 0.0071317x2 x5 +
0.0029955x1 x2 + 0.0021813x23 110
g4 = 80.51249 0.0071317x2 x5
0.0029955x1 x2 0.0021813x23 90
g5 = 9.300961 + 0.0047026x3 x5 +
0.0012547x1 x3 + 0.0019085x3 x4 25
g6 = 9.300961 0.0047026x3 x5
0.0012547x1 x3 0.0019085x3 x4 20
g1 = (x1 5)2 (x2 5)2 + 100
g2 = (x1 6)2 (x2 5)2 + 82.81
g1 = 105 + 4x1 + 5x2 + 3x7 + 9x8
g2 = 10x1 8x2 17x7 + 2x8
g3 = 8x1 + 2x2 + 5x9 2x10 12
g4 = 3(x1 2)2 + 4(x2 3)2 + 2x23 7x4 120
g5 = 5x21 + 8x2 + (x3 6)2 2x4 40
g6 = x21 + 2(x2 2)2 2x1 x2 + 14x5 6x6
g7 = 0.5(x1 8)2 + 2(x2 4)2 + 3x25 x6 30
g8 = 3x1 + 6x2 + 12(x9 8)2 7x10
g1 = x21 x2 + 1
g2 = 1 x1 + (x2 4)2
g1 = 127 + 2x21 + 3x42 + x3 + 4x24 + 5x5
g2 = 282 + 7x1 + 3x2 + 10x23 + x4 x5
g3 = 196 + 23x1 + x22 + 6x26 8x7
g4 = 4x21 + x22 3x1 x2 + 2x23 + 5x6 11x7
g1 = 1 + 0.0025(x4 + x6 )
g2 = 1 + 0.0025(x5 + x7 x4 )
g3 = 1 + 0.01(x8 x5 )
g4 = x1 x6 + 833.33252x4 + 100x1 83333.333
g5 = x2 x7 + 1250x5 + x2 x4 1250x4
g6 = x3 x8 + 1250000 + x3 x5 2500x5
13
20
10
231
Table 9.4 Bound constraints for single-objective constrained optimization problems. The
maximum number of simulations is N f ,max and f is the optimal objective function value
Function
G01
G02
G04
G06
G07
G08
G09
G10
Bound constraints
0 xi 1 (i = 1, . . . , 9),
0 xi 100 (i = 10, 11, 12),
0 x13 1
0 xi 10 (i = 1, . . . , n) n = 20
78 x1 102,
33 x2 45,
27 xi 45(i = 3, 4, 5)
13 x1 100,
0 x2 100
10 xi 10(i = 1, . . . , 10)
0 x1 , x2 10
10 xi 10(i = 1, . . . , 7)
100 x1 10000,
1000 xi 10000(i = 2, 3),
10 xi 1000(i = 4, ..., 8)
N f ,max
600
15
1200
0.80355
6000 30665.539
2400 6961.81388
1000 24.3062091
8000 0.095825
800 680.6300573
3000
7049.3307
for k ∈ {1, 2, 5, 10, 15} neighbors, and the averaged fitness over 50 runs was used as the performance measure.

The neighborhood size affects the surrogate model in such a way that a small neighborhood leads to estimates very close to the data in the database D, while a larger neighborhood tends to smooth the surrogate output, resulting in estimates close to the mean of the data in D [4].
The results for the SBSM-SOGA applied to the single-objective optimization problems in Tables 9.2 and 9.3, for different values of p_sm and using 1, 2, 5, 10, and 15 neighbors, are shown in Figures 9.4 and 9.5. In Figure 9.5, for each test problem, the average of the objective function over 50 runs is displayed. The average was calculated considering only the feasible runs, i.e. those producing a final solution which does not violate the constraints in Eq. (9.1).

For all the unconstrained functions except F08, increasingly better results are obtained as p_sm decreases. For those functions, it is possible to use very small values of p_sm. In this set of experiments we set p_sm = 0.05, although we may use any p_sm > 1/40 = 0.025, as described in Table 9.1. The results obtained for function F08 show that improvements with respect to the Standard GA are obtained for p_sm values below a certain threshold, and the maximum improvement (compared to the Standard GA) was obtained when p_sm > 0.20.

The same trend with respect to the number of neighbors and the parameter p_sm is observed for all unconstrained functions. We observe that the extra evaluations performed by the surrogate are beneficial to the evolutionary search, and improved results are obtained when the number of generations increases.

From the results obtained for function G08, we can see that reducing p_sm no longer improves the final results, which means that the noise introduced by the
Fig. 9.4 Averaged fitness for different values of p_sm, with DPR = 2, using 1, 2, 5, 10, and 15 neighbors in the surrogate model shown in Eq. (9.4). Panels (a)–(h) correspond to functions F01–F08
Fig. 9.5 Averaged fitness for different values of p_sm, with DPR = 2, using 1, 2, 5, 10, and 15 neighbors in the surrogate model shown in Eq. (9.4). Panels (a)–(h) correspond to functions G01, G02, G04, G06, G07, G08, G09, and G10
surrogate model affects the search in a negative way. Function G08 corresponds to a complex landscape which could not be well approximated by the surrogate model. Although fast and simple, the k-NN surrogate model has limited capability to approximate complex mappings in \(\mathbb{R}^n\), which, as an inner-product space, allows for other calculus-based approximations. However, when the search occurs in a generic metric space, k-NN may be one of the few available alternatives.

As observed for function G08, the constraints make the problem harder for the SBSM-SOGA, since more approximations are involved (objective function and constraints) and the use of surrogates may lead the evolutionary process to poorer regions of the search space.

The results displayed in Figure 9.5, except for function G06 and for G08 (where no improvements were obtained), show that the number of neighbors does not significantly affect the performance of the SBSM-SOGA for the set of functions considered here.
Table 9.5 shows the number of feasible runs for the SBSM-GA. The results were obtained using k = 2 neighbors and DPR = 2 to build the surrogate in Eq. (9.4). We observe that the introduction of the surrogate does not affect the number of feasible runs, except in test problem G06, where a slight decrease occurs. In G01 and G10, the SBSM-GA increased the number of feasible runs.
Table 9.5 Constrained optimization problems: number of runs that produce a final feasible solution with respect to the parameter p_sm. The results were obtained using 2 neighbors and DPR = 2 to build the surrogate in Eq. (9.4)

p_sm   G01  G02  G04  G06  G07  G08  G09  G10
1      12   50   50   50   46   50   50   20
0.9    13   50   50   49   42   50   50   15
0.8    25   50   50   49   45   50   50   20
0.7    29   50   50   47   49   50   50   20
0.6    43   50   50   48   50   50   50   15
0.5    48   50   50   49   49   50   50   27
0.4    50   50   50   47   50   50   50   28
0.3    50   50   50   47   50   50   50   33
0.2    50   50   50   48   50   50   50   39
0.1    50   50   50   46   50   50   50   41
0.05   50   50   50   47   50   50   50   40
In frameworks that use surrogates as local search tools or to enhance operators, the improvements are directly related to the surrogate models. In this set of experiments, the contribution of the surrogates to the evolutionary search is indirect: the surrogates allow for an extended number of generations (although with inexact evaluations), which gives the GA a longer period to evolve solutions.
9.4.1.2
Influence of the Database Size
In this section, a study of the impact of the database size on the evolutionary process is performed. Based on the experiments presented in the previous section, we set k = 2 neighbors.
\[
GD = \frac{1}{N_{PF}} \sqrt{\sum_{j=1}^{N_{PF}} d_j^2},
\tag{9.5}
\]

where N_{PF} is the number of individuals in the evolved Pareto front PFE and d_j is the Euclidean distance (in the objective space) between an individual j in PFE and its nearest individual in the true Pareto front PFT. The generational distance in Eq. (9.5) measures the convergence to the true Pareto front, and lower values of GD are better.
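The Generational Distance of Eq. (9.5) can be sketched directly; the function name is an illustrative assumption:

```python
import math

def generational_distance(evolved, true_front):
    """Generational Distance following Eq. (9.5): for each evolved
    objective vector, take the Euclidean distance to its nearest
    neighbour on the true Pareto front, then combine the distances as
    sqrt(sum d_j^2) / N_PF.  Lower is better; 0 means the evolved front
    lies exactly on the true front."""
    dists_sq = []
    for p in evolved:
        d = min(math.dist(p, q) for q in true_front)  # nearest true point
        dists_sq.append(d * d)
    return math.sqrt(sum(dists_sq)) / len(evolved)
```

Note that GD only measures convergence: an evolved front consisting of a single point sitting on the true front still scores a perfect 0, which is why spread metrics are reported alongside it.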
[Figure: averaged fitness for different values of p_sm using 2 neighbors and DPR = 1, 2, 5, and 15; panels (a)–(h) correspond to functions F01–F08]
[Figure: averaged fitness for different values of p_sm using 2 neighbors and DPR = 1, 2, 5, and 15; panels (a)–(h) correspond to functions G01, G02, G04, G06, G07, G08, G09, and G10]
Multi-objective test functions MF01–MF08 (objective functions; domain [x_L, x_U]; maximum number of function evaluations N_f,max):

MF01: f1(x) = x1;  f2(x) = g(x)[1 − √(f1(x)/g(x))];  g(x) = 1 + 9(Σ_{i=2}^{n} x_i)/(n − 1).  n = 30, x_i ∈ [0, 1], N_f,max = 1000.

MF02: f1(x) = x1;  f2(x) = g(x)[1 − (f1(x)/g(x))²];  g(x) = 1 + 9(Σ_{i=2}^{n} x_i)/(n − 1).  n = 30, x_i ∈ [0, 1], N_f,max = 1000.

MF03: f1(x) = x1;  f2(x) = g(x)[1 − √(f1(x)/g(x)) − (f1(x)/g(x)) sin(10π f1(x))];  g(x) = 1 + 9(Σ_{i=2}^{n} x_i)/(n − 1).  n = 30, x_i ∈ [0, 1], N_f,max = 1000.

MF04: f1(x) = x1;  f2(x) = g(x)[1 − √(f1(x)/g(x))];  g(x) = 1 + 10(n − 1) + Σ_{i=2}^{n} (x_i² − 10 cos(4π x_i)).  n = 10, x1 ∈ [0, 1], x_i ∈ [−5, 5] for i = 2, …, 10, N_f,max = 1000.

MF05: f1(x) = 0.5 x1 x2 (1 + g(x));  f2(x) = 0.5 x1 (1 − x2)(1 + g(x));  f3(x) = 0.5 (1 − x1)(1 + g(x));  g(x) = 1000 + 100 Σ_{i=3}^{n} [(x_i − 0.5)² − cos(20π(x_i − 0.5))].  n = 12, x_i ∈ [0, 1], N_f,max = 2000.

MF06: f1(x) = 1 − exp(−4x1) sin⁶(6π x1);  f2(x) = g(x)[1 − (f1(x)/g(x))²];  g(x) = 1 + 9[(Σ_{i=2}^{n} x_i)/(n − 1)]^0.25.  n = 10, x_i ∈ [0, 1], N_f,max = 1000.

MF07: f1(x) = cos(π x1/2) cos(π x2/2)(1 + g(x));  f2(x) = cos(π x1/2) sin(π x2/2)(1 + g(x));  f3(x) = sin(π x1/2)(1 + g(x));  g(x) = Σ_{i=3}^{n} (x_i − 0.5)².  n = 12, x_i ∈ [0, 1], N_f,max = 1000.

MF08: the same objectives as MF07, with g(x) = 1000 + 100 Σ_{i=3}^{n} [(x_i − 0.5)² − cos(20π(x_i − 0.5))].  n = 12, x_i ∈ [0, 1], N_f,max = 1400.
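As a concrete illustration, the first of the multi-objective test functions listed above (a ZDT1-type problem, here labeled MF01) can be evaluated in a few lines; the square root in f2 follows the standard ZDT1 definition and is an assumption where the printed formula is ambiguous.

```python
import numpy as np

def mf01(x):
    """MF01 (ZDT1-type): two objectives over x in [0, 1]^30."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2

# On the Pareto-optimal front x_2 = ... = x_n = 0, so g = 1 and f2 = 1 - sqrt(f1).
f1, f2 = mf01(np.array([0.25] + [0.0] * 29))
```

With f1 = 0.25 on the true front, g = 1 and f2 = 1 − √0.25 = 0.5, which matches the front shape used by the GD/MS/S metrics below.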
The Maximum Spread (MS) is used to measure how well the true Pareto front PF_T is covered by the evolved Pareto front PF_E. A larger value of MS indicates that a larger area of PF_T is covered by PF_E. The MS is given as

MS = √( (1/n_obj) Σ_{i=1}^{n_obj} [ (min(f_i^max, F_i^max) − max(f_i^min, F_i^min)) / (F_i^max − F_i^min) ]² ),

where f_i^max and f_i^min are the maximum and minimum of the i-th objective in the evolved Pareto front, respectively, and F_i^max and F_i^min are the maximum and minimum of the i-th objective in the true Pareto front, respectively.
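The MS computation can be sketched directly from the extrema of the two fronts; clipping the evolved extent against the true extent via min/max follows the usual Maximum Spread definition and is an assumption where the printed formula is incomplete.

```python
import numpy as np

def maximum_spread(evolved, true_front):
    """Maximum Spread (MS): per-objective fraction of the true front's extent
    covered by the evolved front, aggregated over objectives (1.0 = full cover)."""
    E = np.asarray(evolved, dtype=float)
    T = np.asarray(true_front, dtype=float)
    f_max, f_min = E.max(axis=0), E.min(axis=0)   # evolved-front extrema
    F_max, F_min = T.max(axis=0), T.min(axis=0)   # true-front extrema
    ratio = (np.minimum(f_max, F_max) - np.maximum(f_min, F_min)) / (F_max - F_min)
    return float(np.sqrt(np.mean(ratio ** 2)))

true_pf = np.array([[0.0, 1.0], [0.5, 0.3], [1.0, 0.0]])
ms_full = maximum_spread(true_pf, true_pf)   # evolved front spans the whole true front
```

An evolved front identical to the true front yields MS = 1, while a front collapsed to a single interior point yields MS = 0.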
Constrained multi-objective test functions MG01–MG06 (objective functions; constraints; domain; maximum number of function evaluations N_f,max):

MG01: f1 = −2x1 + x2, f2 = 2x1 + x2; constraints: g1 = −x1 + x2 − 1 ≤ 0, g2 = x1 + x2 − 7 ≤ 0; 0 ≤ x1 ≤ 5, 0 ≤ x2 ≤ 3 (n = 2); N_f,max = 1000.

MG02: f1 = (x1 − 2)² + (x2 − 1)² + 2, f2 = 9x1 − (x2 − 1)²; constraints: g1 = x1² + x2² − 225 ≤ 0, g2 = x1 − 3x2 + 10 ≤ 0; x_i ∈ [−20, 20] (n = 2); N_f,max = 800.

MG03: f1 = x1, f2 = x2; constraints: g1 = 1 − x1² − x2² + 0.1 cos(16 arctan(x1/x2)) ≤ 0, g2 = (x1 − 0.5)² + (x2 − 0.5)² − 0.5 ≤ 0; x_i ∈ [0, π] (n = 2); N_f,max = 4000.

MG04: f1 = −[25(x1 − 2)² + (x2 − 2)² + (x3 − 1)² + (x4 − 4)² + (x5 − 1)²], f2 = Σ_{i=1}^{6} x_i²; constraints: g1 = x1 + x2 − 2 ≥ 0, g2 = 6 − x1 − x2 ≥ 0, g3 = 2 − x2 + x1 ≥ 0, g4 = 2 − x1 + 3x2 ≥ 0, g5 = 4 − (x3 − 3)² − x4 ≥ 0, g6 = (x5 − 3)² + x6 − 4 ≥ 0; 0 ≤ x1 ≤ 10, 0 ≤ x2 ≤ 10, 1 ≤ x3 ≤ 5, 0 ≤ x4 ≤ 6, 1 ≤ x5 ≤ 5, 0 ≤ x6 ≤ 10 (n = 6); N_f,max = 800.

MG05: f1 = x1, f2 = x2, f3 = x3; constraint: g1 = −1 + Σ_{i=1}^{n} x_i²; x_i ∈ [0, 1]; N_f,max = 1200.

MG06: f1 = (1/10) Σ_{i=1}^{10} x_i, f2 = (1/10) Σ_{i=11}^{20} x_i, f3 = (1/10) Σ_{i=21}^{30} x_i; constraints: g1 = 1 − f3 − 4f1 ≤ 0, g2 = 1 − f3 − 4f2 ≤ 0, g3 = 1 − 2f3 − f1 − f2 ≤ 0; x_i ∈ [0, 1] (n = 30); N_f,max = 2000.
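One of the constrained problems above can be sketched the same way; the geometry of MG03 matches the well-known TNK problem, and writing both constraints so that g ≤ 0 means feasible is an assumption about the sign convention.

```python
import numpy as np

def mg03(x1, x2):
    """MG03 (TNK-type geometry assumed): objectives f1 = x1, f2 = x2;
    constraints returned so that g <= 0 means feasible."""
    f = (x1, x2)
    g1 = 1.0 - x1**2 - x2**2 + 0.1 * np.cos(16.0 * np.arctan(x1 / x2))
    g2 = (x1 - 0.5)**2 + (x2 - 0.5)**2 - 0.5
    return f, (g1, g2)

# (1, 1) lies outside the inner wavy ring (g1 < 0) and on the boundary of the
# disc constraint (g2 = 0).
_, (g1, g2) = mg03(1.0, 1.0)
```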
The metric of Spacing (S) shows how the nondominated solutions are distributed along the evolved Pareto front and is given as

S = (1/d̄) √( (1/N_PF) Σ_{k=1}^{N_PF} (d̄ − d_k)² ),   d̄ = (1/N_PF) Σ_{j=1}^{N_PF} d_j,   (9.7)

where N_PF is the number of individuals in the evolved Pareto front PF_E and d_i is the Euclidean distance (in the objective space) between an individual i in the evolved Pareto front PF_E and its nearest individual in the true Pareto front PF_T.
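Equation (9.7) can be sketched in a few lines; normalizing the standard deviation of the distances by their mean d̄ is one common variant of Spacing and is an assumption here.

```python
import numpy as np

def spacing(evolved, true_front):
    """Spacing (S): normalized standard deviation of the distances d_k from
    each evolved-front point to its nearest true-front point."""
    E = np.asarray(evolved, dtype=float)
    T = np.asarray(true_front, dtype=float)
    # d_k: Euclidean distance from evolved point k to the nearest true-front point
    d = np.array([np.min(np.linalg.norm(T - e, axis=1)) for e in E])
    d_bar = d.mean()
    return float(np.sqrt(np.mean((d_bar - d) ** 2)) / d_bar)

true_pf = np.array([[0.0, 1.0], [1.0, 0.0]])
s_uniform = spacing(true_pf + 0.1, true_pf)   # every point shifted equally -> S = 0
```

A uniformly shifted front has identical distances d_k, so S = 0; uneven clustering along the front increases S.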
9.4.2.1
In this section we analyze the impact of the number of neighbors on the evolutionary search, given a fixed database size. According to the database replacement policy, the oldest individual is always chosen to be replaced. However, by removing solutions according to age, we may inevitably discard some important information. In order to alleviate this effect, we enlarge the training set for MO problems and set the database size to 15 times the population size, which corresponds to DPR = 15. This value of
Fig. 9.8 Generational Distance (GD) indicator: surrogate-assisted multi-objective optimization using DPR=15 and k = {1, 2, 5, 10, 15} neighbors
Fig. 9.9 Generational Distance (GD) indicator: surrogate-assisted multi-objective optimization using DPR=15 and k = {1, 2, 5, 10, 15} neighbors
[Figure: Maximum Spread (MS) indicator, surrogate-assisted multi-objective optimization using DPR = 15 and k = {1, 2, 5, 10, 15} neighbors; panels (a) MF01, (b) MF03, (c) MF05, (d) MF08, (e) MG01, (f) MG02, (g) MG03, (h) MG06]
Fig. 9.11 Spacing (S): surrogate-assisted multi-objective optimization using DPR=15 and
k = {1, 2, 5, 10, 15} neighbors
DPR leads to a better convergence to the true Pareto front, according to the performance metrics, and the results are not significantly affected by the number of neighbors used.
In the nearest neighbor approximation model no training procedure is required, and prediction involves only finding the nearest neighbors in an archive of previously evaluated individuals. For a fixed number of expensive simulations, the overall cost of the surrogate-assisted procedure is only slightly increased, since the computational cost of the extra surrogate evaluations is negligible compared with that of the expensive simulations.
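The archive-based prediction just described is short enough to sketch; averaging the fitness of the k neighbors is one simple aggregation choice and an assumption here (a distance-weighted average is equally plausible).

```python
import numpy as np

def knn_predict(x, archive_X, archive_y, k=2):
    """Nearest-neighbor fitness prediction: no training phase, just a lookup
    in the archive of previously (expensively) evaluated individuals."""
    archive_X = np.asarray(archive_X, dtype=float)
    archive_y = np.asarray(archive_y, dtype=float)
    dist = np.linalg.norm(archive_X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    return float(archive_y[nearest].mean())  # average fitness of the k neighbors

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
pred = knn_predict([0.9], X, y, k=2)   # neighbors at x = 1.0 and x = 0.0
```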
The framework presented here seems to be a simple and effective way to tackle
single- and multi-objective unconstrained or constrained expensive optimization
problems. Additionally, the proposed framework can be easily extended to other
population-based metaheuristics, such as Differential Evolution, Ant Colony
Optimization and Particle Swarm Optimization.
References
1. Acar, E., Rais-Rohani, M.: Ensemble of metamodels with optimized weight factors. Struct. Multidisc. Optim. 37(3), 279–294 (2009)
2. Aha, D.W.: Editorial. Artif. Intell. Rev. 11(1–5), 1–6 (1997); special issue on lazy learning
3. Akbarzadeh-T, M.R., Davarynejad, M., Pariz, N.: Adaptive fuzzy fitness granulation for evolutionary optimization. International Journal of Approximate Reasoning 49(3), 523 (2008)
4. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46(3), 175–185 (1992)
5. Blanning, R.W.: The sources and uses of sensitivity information. Interfaces 4(4), 32–38 (1974)
6. Bui, L.T., Abbass, H.A., Essam, D.: Fitness inheritance for noisy evolutionary multi-objective optimization. In: GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pp. 779–785. ACM, New York (2005)
7. Bull, L.: On model-based evolutionary computation. Soft Computing 3(2), 76–82 (1999)
8. Chen, J.H., Goldberg, D.E., Ho, S.Y., Sastry, K.: Fitness inheritance in multi-objective optimization. In: GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 319–326. Morgan Kaufmann Publishers Inc., San Francisco (2002)
9. Coello, C.A.C., Lamont, G.B., Veldhuizen, D.A.V.: Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, Norwell (2002)
10. Deb, K.: An Efficient Constraint Handling Method for Genetic Algorithms. Computer Methods in Applied Mechanics and Engineering 186(2/4), 311–338 (2000)
11. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
12. Ducheyne, E., De Baets, B., de Wulf, R.: Is fitness inheritance useful for real-world applications? In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.) EMO 2003. LNCS, vol. 2632, pp. 31–42. Springer, Heidelberg (2003)
13. Ducheyne, E., Baets, B.D., Wulf, R.D.: Fitness inheritance in multiple objective evolutionary algorithms: A test bench and real-world evaluation. Applied Soft Computing 8(1), 337–349 (2007)
14. El-Beltagy, M., Nair, P., Keane, A.: Metamodeling techniques for evolutionary optimization of computationally expensive problems: promises and limitations. In: Proceedings of Genetic and Evolutionary Conference, pp. 196–203. Morgan Kaufmann, Orlando (1999)
15. Emmerich, M., Giannakoglou, K., Naujoks, B.: Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. Evolutionary Computation 10(4), 421–439 (2006)
16. Emmerich, M.T.M.: Single- and multi-objective evolutionary design optimization assisted by Gaussian random field metamodels. PhD thesis, Technische Universitaet Dortmund (2005)
17. Ferrari, S., Stengel, R.F.: Smooth function approximation using neural networks. IEEE Transactions on Neural Networks 16(1), 24–38 (2005)
18. Forrester, A.I., Keane, A.J.: Recent advances in surrogate-based optimization. Progress in Aerospace Sciences 45, 50–79 (2009)
19. Giannakoglou, K.C.: Design of optimal aerodynamic shapes using stochastic optimization methods and computational intelligence. Progress in Aerospace Sciences 38(1), 43–76 (2002)
20. Goh, C.K., Tan, K.C.: A competitive-cooperative coevolutionary paradigm for dynamic multiobjective optimization. IEEE Transactions on Evolutionary Computation 13(1), 103–127 (2009)
21. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Co., Reading (1989)
22. Grefenstette, J., Fitzpatrick, J.: Genetic search with approximate fitness evaluations. In: Proceedings of the International Conference on Genetic Algorithms and Their Applications, pp. 112–120 (1985)
23. Herrera, F., Lozano, M., Verdegay, J.L.: Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis. Artificial Intelligence Review 12(4), 265–319 (1998)
24. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing Journal 9(1), 3–12 (2005)
25. Jin, Y., Branke, J.: Evolutionary optimization in uncertain environments – a survey. IEEE Transactions on Evolutionary Computation 9(3), 303–317 (2005)
26. Jin, Y., Olhofer, M., Sendhoff, B.: A framework for evolutionary optimization with approximate fitness functions. IEEE Transactions on Evolutionary Computation 6(5), 481–494 (2002)
27. Kecman, V.: Learning and soft computing: support vector machines, neural networks, and fuzzy logic models. Complex adaptive systems. MIT Press, Cambridge (2001)
28. Kim, H.S., Cho, S.B.: An efficient genetic algorithm with less fitness evaluation by clustering. In: Proceedings of the 2001 Congress on Evolutionary Computation, vol. 2, pp. 887–894 (2001)
29. Knowles, J.: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation 10(1), 50–66 (2006)
30. Kybic, J., Blu, T., Unser, M.: Generalized sampling: a variational approach – Part I: Theory. IEEE Transactions on Signal Processing 50(8), 1965–1976 (2002)
31. Kybic, J., Blu, T., Unser, M.: Generalized sampling: a variational approach – Part II: Applications. IEEE Transactions on Signal Processing 50(8), 1977–1985 (2002)
32. Lim, D., Ong, Y., Jin, Y., Sendhoff, B.: A study on metamodeling techniques, ensembles, and multi-surrogates in evolutionary computation. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1288–1295. ACM Press, New York (2007)
33. Lim, D., Jin, Y., Ong, Y.S., Sendhoff, B.: Generalizing surrogate-assisted evolutionary computation. IEEE Transactions on Evolutionary Computation (2008) (in press)
34. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn. Springer, Heidelberg (1996)
35. Mota, F., Gomide, F.: Fuzzy clustering in fitness estimation models for genetic algorithms and applications. In: IEEE International Conference on Fuzzy Systems, pp. 1388–1395 (2006) ISBN: 0-7803-9488-7
36. Myers, R.H., Montgomery, D.C.: Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley Series in Probability and Statistics. John Wiley & Sons Inc., New York (2002)
37. Ong, Y., Nair, P., Keane, A.: Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA Journal 41(4), 687–696 (2003)
38. Pilato, C., Tumeo, A., Palermo, G., Ferrandi, F., Lanzi, P.L., Sciuto, D.: Improving evolutionary exploration to area-time optimization of FPGA designs. Journal of Systems Architecture 54(11), 1046 (2008)
39. Praveen, C., Duvigneau, R.: Low cost PSO using metamodels and inexact pre-evaluation: Application to aerodynamic shape design. Computer Methods in Applied Mechanics and Engineering 198(9-12), 1087–1096 (2009)
40. Queipo, N., Arevalo, C., Pintos, S.: The integration of design of experiments, surrogate modeling, and optimization for thermoscience research. Engineering with Computers 20, 309–315 (2005)
41. Queipo, N.V., Haftka, R.T., Shyy, W., Goela, T., Vaidyanathana, R., Tucker, P.K.: Surrogate-based analysis and optimization. Progress in Aerospace Sciences 41(1), 1–28 (2005)
42. Rasheed, K., Vattam, S., Ni, X.: Comparison of methods for using reduced models to speed up design optimization. In: Proceedings of Genetic and Evolutionary Computation Conference, pp. 1180–1187. Morgan Kaufmann, New York (2002)
43. Rasheed, K., Ni, X., Vattam, S.: Comparison of methods for developing dynamic reduced models for design optimization. Soft Computing Journal 9, 29–37 (2005)
44. Regis, R.G., Shoemaker, C.A.: Local function approximation in evolutionary algorithms for the optimization of costly functions. IEEE Trans. Evolutionary Computation 8(5), 490–505 (2004)
45. Reyes-Sierra, M., Coello, C.A.C.: A study of fitness inheritance and approximation techniques for multi-objective particle swarm optimization. In: The 2005 IEEE Congress on Evolutionary Computation, vol. 1, pp. 65–72 (2005)
46. Runarsson, T.: Approximate evolution strategy using stochastic ranking. In: Yen, G.G., Wang, L., Bonissone, P., Lucas, S.M. (eds.) IEEE World Congress on Computational Intelligence, Vancouver, Canada (2006)
47. Runarsson, T.P.: Constrained Evolutionary Optimization by Approximate Ranking and Surrogate Models. In: Yao, X., Burke, E.K., Lozano, J.A., Smith, J., Merelo-Guervos, J.J., Bullinaria, J.A., Rowe, J.E., Tino, P., Kaban, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 401–410. Springer, Heidelberg (2004)
48. Runarsson, T.P., Yao, X.: Stochastic Ranking for Constrained Evolutionary Optimization. IEEE Transactions on Evolutionary Computation 4(3), 284–294 (2000)
49. Salami, M., Hendtlass, T.: A fast evaluation strategy for evolutionary algorithms. Applied Soft Computing 2, 156–173 (2003)
50. Sanchez, E., Pintos, S., Queipo, N.: Toward an optimal ensemble of kernel-based approximations with engineering applications. Structural and Multidisciplinary Optimization, 1–15 (2007)
51. Sastry, K., Goldberg, D.E., Pelikan, M.: Don't evaluate, inherit. Tech. Rep. IlliGAL Report No. 2001013, Illinois Genetic Algorithms Laboratory (IlliGAL), Department of General Engineering, University of Illinois at Urbana-Champaign (2001)
52. Sastry, K., Pelikan, M., Goldberg, D.E.: Efficiency enhancement of genetic algorithms via building-block-wise fitness estimation. In: Congress on Evolutionary Computation, CEC 2004, pp. 720–727 (2004)
53. Schmidt, M., Lipson, H.: Coevolution of fitness predictors. IEEE Transactions on Evolutionary Computation 12(6), 736–749 (2008)
54. Shepard, D.: A two-dimensional interpolation function for irregularly-spaced data. In: Proceedings of the 1968 23rd ACM National Conference, pp. 517–524. ACM Press, New York (1968)
55. Sironen, S., Kangas, A., Maltamo, M., Kalliovirta, J.: Localization of growth estimates using non-parametric imputation methods. Forest Ecology and Management 256, 674–684 (2008)
56. Smith, R.E., Dike, B.A., Stegmann, S.A.: Fitness inheritance in genetic algorithms. In: SAC 1995: Proceedings of the 1995 ACM Symposium on Applied Computing, pp. 345–350. ACM Press, New York (1995)
57. Sokolov, A., Whitley, D., Barreto, A.M.S.: A note on the variance of rank-based selection strategies for genetic algorithms and genetic programming. Genetic Programming and Evolvable Machines 8(3), 221–237 (2007)
58. Srinivas, N., Deb, K.: Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation 2(3), 221–248 (1994)
59. Van Veldhuizen, D.A., Lamont, G.B.: Evolutionary computation and convergence to a Pareto front. In: Koza, J.R. (ed.) Late Breaking Papers at the Genetic Programming 1998 Conference, University of Wisconsin, Madison, Wisconsin, USA. Stanford University Bookstore, Stanford, CA, USA (1998)
60. Wanner, E.F., Guimaraes, F.G., Takahashi, R.H.C., Lowther, D.A., Ramirez, J.A.: Multiobjective memetic algorithms with quadratic approximation-based local search for expensive optimization in electromagnetics. IEEE Transactions on Magnetics 44(6), 1126–1129 (2008)
61. Yang, D., Flockton, S.J.: Evolutionary algorithms with a coarse-to-fine function smoothing. In: IEEE International Conference on Evolutionary Computation, vol. 2, pp. 657–662 (1995)
62. Zhang, J., Yim, Y.S., Yang, J.: Intelligent selection of instances for prediction functions in lazy learning algorithms. Artif. Intell. Rev. 11(1–5), 175–191 (1997)
63. Zheng, X., Julstrom, B.A., Cheng, W.: Design of vector quantization codebooks using a genetic algorithm. In: Proceedings of 1997 IEEE International Conference on Evolutionary Computation, Piscataway, NJ, pp. 525–530 (1997)
64. Zhou, Z., Ong, Y.S., Nair, P.B.: Hierarchical surrogate-assisted evolutionary optimization framework. In: Congress on Evolutionary Computation, pp. 1586–1593. IEEE, Los Alamitos (2004)
Chapter 10
Abstract. When function forms in mathematical models cannot be given explicitly in terms of design variables, the values of the functions are usually obtained by numerical/real experiments. Since those experiments are often expensive, it is important to develop techniques for finding a solution with as few experiments as possible. To this end, model predictive optimization methods aim to find an optimal solution in parallel with predicting the function forms in the mathematical models. Sequential approximate optimization and metamodeling are alternative terms for the same idea. So far, several kinds of methods have been developed for this purpose; among them, the response surface method, design of experiments, the Kriging method, active learning methods and methods using computational intelligence are well known. However, those methods mainly address static optimization. For dynamic optimization problems, model predictive control has been developed along a similar idea. This chapter discusses multi-objective model predictive control problems and proposes a method using computational intelligence, such as support vector regression.
Keywords: multi-objective optimization, satisficing trade-off method, model
predictive control, support vector regression.
Hirotaka Nakayama
Konan University, 8-9-1 Okamoto, Higashinada, Kobe 658-8501, Japan
e-mail: nakayama@konan-u.ac.jp
Yeboon Yun
Kagawa University, 2217-20 Hayashicho, Takamatsu 761-0396, Japan
e-mail: yun@eng.kagawa-u.ac.jp
Masakazu Shirakawa
Toshiba Corporation, 2-4 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
e-mail: masakazu1.shirakawa@toshiba.co.jp
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 249–264.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
10.1 Introduction
In many practical problems such as engineering design, function forms in mathematical models cannot be given explicitly in terms of design variables; the values of the functions are usually obtained by numerical/real experiments. Since those experiments are often expensive, it is important to develop techniques for finding a solution with as few experiments as possible. Model predictive optimization (or sequential approximate optimization, SAO, depending on the literature) has been developed extensively for this purpose in recent years [13, 18, 19, 22].
In this chapter, we consider model predictive optimization problems under a dynamic environment with multiple objectives. For the prediction of function forms, we apply techniques of computational intelligence such as support vector regression and radial basis function networks. For optimization with multiple objectives, the satisficing trade-off method, which was developed by one of the authors in the 1980s, is applied along with a meta-heuristic optimization method such as genetic algorithms. It will be shown how model prediction using computational intelligence, combined with an interactive multi-objective optimization technique, works well for multi-objective model predictive control problems.
where the constraint set X may be represented by gi(x) ≤ 0, i = 1, …, m. The identification of f and X (or gi, i = 1, …, m) is called modeling. We assume that those functions exist due to some physical rule, although their explicit function forms cannot be known in terms of the design variables x. Such situations are common, in particular, in engineering design problems. Under this circumstance, we try to obtain approximate functions f̂ (and ĝi, i = 1, …, m, if necessary). The approximation of objective/constraint functions based on several observations is called metamodeling, in the sense of making a model of the model.
Now, our aim is to construct a good metamodel in the sense that
i) we can obtain an approximate optimal solution x̂* through the metamodel with the property

| f(x̂*) − f(x*) | ≤ ε1,

where x* and x̂* minimize f and f̂, respectively, and ε1 is a given small positive number,
ii) the total number of observations is as small as possible,
f(z) = Σ_{i=1}^{n} w_i z_i + b,

(CSVR)_P:
minimize over (w, b, ξ, ξ*):  (1/2)‖w‖₂² + (C/ℓ) Σ_{i=1}^{ℓ} (ξ_i + ξ_i*)
subject to  (wᵀz_i + b) − y_i ≤ ε + ξ_i,  i = 1, …, ℓ,
            y_i − (wᵀz_i + b) ≤ ε + ξ_i*,  i = 1, …, ℓ,
            ξ_i, ξ_i* ≥ 0,

where C is a trade-off parameter between the norm of w and ξ_i (ξ_i*).
The dual formulation to the problem (CSVR)_P using the kernel function K(x, x′) = zᵀz′ = φ(x)ᵀφ(x′) is given by

(CSVR)_D:
maximize over (α, α*):  −(1/2) Σ_{i,j=1}^{ℓ} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) + Σ_{i=1}^{ℓ} (α_i − α_i*) y_i − ε Σ_{i=1}^{ℓ} (α_i + α_i*)
subject to  Σ_{i=1}^{ℓ} (α_i − α_i*) = 0,
            0 ≤ α_i ≤ C/ℓ,  0 ≤ α_i* ≤ C/ℓ,  i = 1, …, ℓ.

In νSVR, the width ε of the insensitive band is itself treated as a variable:

(νSVR)_P:
minimize over (w, b, ε, ξ, ξ*):  (1/2)‖w‖₂² + C(νε + (1/ℓ) Σ_{i=1}^{ℓ} (ξ_i + ξ_i*))
subject to  (wᵀz_i + b) − y_i ≤ ε + ξ_i,  i = 1, …, ℓ,
            y_i − (wᵀz_i + b) ≤ ε + ξ_i*,  i = 1, …, ℓ,
            ε, ξ_i, ξ_i* ≥ 0,

where C and 0 < ν ≤ 1 are trade-off parameters between the norm of w and ε and ξ_i (ξ_i*).
The corresponding dual formulation is

(νSVR)_D:
maximize over (α, α*):  −(1/2) Σ_{i,j=1}^{ℓ} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) + Σ_{i=1}^{ℓ} (α_i − α_i*) y_i
subject to  Σ_{i=1}^{ℓ} (α_i − α_i*) = 0,
            Σ_{i=1}^{ℓ} (α_i + α_i*) ≤ Cν,
            0 ≤ α_i ≤ C/ℓ,  0 ≤ α_i* ≤ C/ℓ,  i = 1, …, ℓ.

Combining CSVR and νSVR, we can derive another formulation, which may be defined as μ-νSVR:

(μ-νSVR)_P:
minimize over (w, b, ε, ξ, ξ̂):  (1/2)‖w‖₂² + νε + μ(ξ + ξ̂)
subject to  wᵀz_i + b ≤ y_i + ε + ξ,  i = 1, …, ℓ,
            y_i ≤ wᵀz_i + b + ε + ξ̂,  i = 1, …, ℓ,
            ε, ξ, ξ̂ ≥ 0,

where ξ (ξ̂) denotes the maximum outer deviation from the ε-band, and ν and μ are trade-off parameters between the norm of w and ε and ξ (ξ̂), respectively. Its dual is

(μ-νSVR)_D:
maximize over (α, α*):  −(1/2) Σ_{i,j=1}^{ℓ} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) + Σ_{i=1}^{ℓ} (α_i − α_i*) y_i
subject to  Σ_{i=1}^{ℓ} (α_i − α_i*) = 0,
            Σ_{i=1}^{ℓ} (α_i + α_i*) ≤ ν,
            Σ_{i=1}^{ℓ} α_i ≤ μ,  Σ_{i=1}^{ℓ} α_i* ≤ μ,
            α_i ≥ 0,  α_i* ≥ 0,  i = 1, …, ℓ.

In this formulation, however, at least one of ε and ξ (or ξ̂) vanishes at the solution, according to whether ν ≥ 2μ or ν ≤ 2μ. Therefore, μ-νSVR may be reduced to the following formulation, simply called μSVR (or, similarly, νSVR, with ξ replaced by ε):

(μSVR)_P:
minimize over (w, b, ξ):  (1/2)‖w‖₂² + μξ
subject to  wᵀz_i + b − y_i ≤ ξ,  i = 1, …, ℓ,
            y_i − wᵀz_i − b ≤ ξ,  i = 1, …, ℓ,
            ξ ≥ 0,

whose dual is

(μSVR)_D:
maximize over (α, α*):  −(1/2) Σ_{i,j=1}^{ℓ} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) + Σ_{i=1}^{ℓ} (α_i − α_i*) y_i
subject to  Σ_{i=1}^{ℓ} (α_i − α_i*) = 0,
            Σ_{i=1}^{ℓ} (α_i + α_i*) ≤ μ,
            α_i ≥ 0,  α_i* ≥ 0,  i = 1, …, ℓ.

It has been observed that μSVR and μ-νSVR provide the smallest number of support vectors while keeping a reasonable error rate, compared with CSVR and νSVR. That is, μSVR and μ-νSVR are promising for sparse approximation, which makes the computation less expensive. The fact that μSVR and μ-νSVR yield good function approximation with reasonable accuracy and with fewer support vectors is important in practice in engineering design.
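A minimal sketch of the μSVR idea for a linear model in one variable: minimize (1/2)w² + μξ subject to every residual being bounded by the single slack ξ. Solving this small problem with SciPy's SLSQP is purely illustrative; the toy data, the value of μ, and the solver choice are assumptions, not the chapter's setup.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: y is roughly linear in x.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 15)
y = 2.0 * X + 0.05 * rng.standard_normal(15)

mu = 100.0  # trade-off between ||w|| and the maximum deviation xi

def objective(p):               # p = (w, b, xi)
    w, _, xi = p
    return 0.5 * w ** 2 + mu * xi

cons = []
for a, t in zip(X, y):
    # |w*a + b - t| <= xi, split into two one-sided constraints (>= 0 form)
    cons.append({"type": "ineq", "fun": lambda p, a=a, t=t: p[2] - (p[0] * a + p[1] - t)})
    cons.append({"type": "ineq", "fun": lambda p, a=a, t=t: p[2] + (p[0] * a + p[1] - t)})
cons.append({"type": "ineq", "fun": lambda p: p[2]})  # xi >= 0

res = minimize(objective, x0=np.array([0.0, 0.0, 3.0]),
               constraints=cons, method="SLSQP")
w, b, xi = res.x
```

Because a single ξ bounds all residuals, the fit is a Chebyshev-type (max-deviation) regression, which is exactly what makes the resulting models sparse in support vectors.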
[Figure: the set f(X) in the objective space (f1, f2), illustrating Pareto values f(x̂) on its boundary]
Since there may be many Pareto solutions in practice, the final decision should be made among them taking the total balance over all criteria into account. This is a problem of value judgment of the decision maker (DM). The balancing over criteria is usually called trade-off. The set of Pareto values (namely, the values of objective functions corresponding to Pareto solutions) is called the Pareto frontier. If we can visualize the Pareto frontier, the DM can easily make a trade-off analysis on the basis of the shown Pareto frontier. In recent years, studies aimed at generating the Pareto frontier have been developed with the help of meta-heuristic algorithms such as evolutionary algorithms and particle swarm optimization (see, for example, [1, 2, 5, 11, 12]).
On the other hand, it is not so easy to understand the trade-off relation of a Pareto frontier with more than three dimensions. Since the 1970s, interactive multi-objective programming techniques have been developed in order to overcome this difficulty: those methods search for a solution in an interactive way with the DM while making a trade-off analysis on the basis of the DM's value judgment (see, for example, [1]). Among them, the aspiration level approach is now recognized to be effective in many practical problems. As one of the aspiration level approaches, one of the authors proposed the satisficing trade-off method [20].
Suppose that we have objective functions f(x) := (f1(x), …, fr(x))ᵀ to be minimized over x ∈ X ⊂ Rⁿ. In the satisficing trade-off method, the aspiration level at the k-th iteration, f̄ᵏ, is modified as follows:

f̄ᵏ⁺¹ = T ∘ P(f̄ᵏ).
Here, the operator P selects the Pareto solution nearest in some sense to the given aspiration level f̄ᵏ. The operator T is the trade-off operator, which changes the k-th aspiration level f̄ᵏ if the DM does not compromise with the shown solution P(f̄ᵏ). Of course, since P(f̄ᵏ) is a Pareto solution, there exists no feasible solution which makes all criteria better than P(f̄ᵏ), and thus the DM has to trade off among criteria if he wants to improve some of them. Based on this trade-off, a new aspiration level is decided as T ∘ P(f̄ᵏ). A similar process is continued until the DM obtains an agreeable solution.
The operation which gives a Pareto solution P(f̄ᵏ) nearest to f̄ᵏ is performed by some auxiliary scalar optimization:

minimize over x:  max_{1≤i≤r} w_i ( f_i(x) − f̄_i ) + α Σ_{i=1}^{r} w_i f_i(x),
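The auxiliary scalarization can be sketched as follows; taking the weights as w_i = 1/(f̄_i − f_i*), with f_i* an ideal value, and using a small augmentation coefficient α are common choices in the satisficing trade-off method and are assumptions here.

```python
import numpy as np

def satisficing_scalarization(f, aspiration, ideal, alpha=1e-4):
    """Auxiliary scalar function of the satisficing trade-off method:
    weighted max-term plus a small augmentation term."""
    f = np.asarray(f, dtype=float)
    f_bar = np.asarray(aspiration, dtype=float)
    w = 1.0 / (f_bar - np.asarray(ideal, dtype=float))   # assumed weight choice
    return float(np.max(w * (f - f_bar)) + alpha * np.sum(w * f))

# A point that improves on the aspiration level scores lower:
z_at = satisficing_scalarization([6.0, 10.0], [6.0, 10.0], [3.0, 5.0])
z_better = satisficing_scalarization([5.0, 9.0], [6.0, 10.0], [3.0, 5.0])
```

Minimizing this function over x drives the search toward the Pareto solution nearest (in the weighted Chebyshev sense) to the aspiration level.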
[Figure: movement of the aspiration level f̄ᵏ to f̄ᵏ⁺¹ toward the Pareto frontier in the (f1, f2) plane]
minimize  J = φ[x(T)] + ∫₀ᵀ F(x(t), u(t), t) dt
subject to  ẋ(t) = f(x(t), u(t), t),  x(0) = x₀.   (10.1)

If the function forms in the above model are explicitly given, then we can apply techniques from optimal control theory. However, we assume that some of the function forms, in particular the dynamic system equation (10.1), cannot be given explicitly. Under this circumstance, we predict some of the future states x(t+1), …, x(t+p1) for given u(t+1), …, u(t+p2), where the prediction period p1 and the control period p2 are given (p1 ≥ p2). Our aim is to decide the optimal control sequence u(t) over [0, T].
Suppose that the problem considered in this section has multiple objectives
J = (J1, …, Jr)ᵀ.
For example, those objectives may be the energy consumption, constraints on the terminal state, the terminal time (T) itself, and so on.
For predicting the future state, we apply a support vector regression technique, namely the μSVR introduced in the previous section. It has been observed that μSVR provides fewer support vectors than other SVRs.
In order to reach the final decision for these multi-objective problems, we apply the satisficing trade-off method [20], which is an aspiration level based method. The algorithm is summarized as follows:
Step 1. Predict the model f by using μSVR based on the past states and controls
(x(k−q), x(k−q+1), …, x(k), u(k−q), u(k−q+1), …, u(k−1)), k = q, …, t,
where q represents the depth of sampling of the training data and x(0) = x₀ (denote by f̂ the predicted function of f).
Step 2. Decide a control u*(t) at the time t by using a genetic algorithm:
(i) Generate randomly N individuals of control sequences:
u_j(t), u_j(t+1), …, u_j(t+p2−1), j = 1, 2, …, N,
and set u_j(t+i) = u_j(t+p2−1) for i ≥ p2, generally.
(ii) Predict the next state vector x_j(k+1) for each control sequence from the present time t to the time t+p1:
x_j(k+1) − x_j(k) := f̂(x_j(k), u_j(k)), k = t, t+1, …, t+p1−1.
(iii) Evaluate each individual of control sequence by the scalarized value
z_j = max_{1≤i≤r} w_i ( J_i^j − J̄_i ) + α Σ_{i=1}^{r} w_i J_i^j,
where J_i^j is the i-th objective predicted for the j-th control sequence, w_i = 1/(J̄_i − J_i*), J̄_i is the aspiration level, and J_i* is an ideal value of the i-th objective function.
(iv) Evaluating the individuals of control sequences by the value of z_j, generate new individuals of control sequences through natural selection and genetic operators (for details, see [5]).
(v) Repeat (ii)–(iv) until a stop condition, for example on the number of iterations, holds.
Step 3. Adopt the control u*(t) attaining min_{j=1,…,N} z_j, apply it at the time t, and set t := t + 1; return to Step 1.
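The prediction-and-selection loop of Step 2 can be sketched with plain random sampling standing in for the GA (an assumption made here for brevity; the chapter evolves the sequences with a real-coded GA). The toy model f_hat, the scalarization z_of, and the set-point are illustrative only.

```python
import numpy as np

def best_control_sequence(f_hat, x_t, p1, p2, n_pop, u_lo, u_hi, z_of, rng):
    """Simplified Step 2: sample n_pop random control sequences, roll the
    predicted model forward p1 steps via x_{k+1} = x_k + f_hat(x_k, u_k),
    and keep the sequence minimizing z_j."""
    best_u, best_z = None, np.inf
    for _ in range(n_pop):
        u = rng.uniform(u_lo, u_hi, p2)
        x = np.asarray(x_t, dtype=float)
        traj = []
        for k in range(p1):
            uk = u[min(k, p2 - 1)]     # hold the last control beyond the control period
            x = x + f_hat(x, uk)
            traj.append(x.copy())
        z = z_of(np.asarray(traj))
        if z < best_z:
            best_u, best_z = u, z
    return best_u, best_z

# Toy predicted model: scalar state driven toward a set-point of 1.0.
f_hat = lambda x, u: np.array([u - 0.1 * x[0]])
z_of = lambda traj: (traj[-1, 0] - 1.0) ** 2      # terminal tracking error
u_star, z_star = best_control_sequence(f_hat, [0.0], p1=4, p2=3, n_pop=200,
                                       u_lo=-1.0, u_hi=1.0, z_of=z_of,
                                       rng=np.random.default_rng(1))
```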
The example treated here is described by a linear dynamic system

ẋ(t) = Ax(t) + Bu(t),  y(t) = Cx(t) + Du(t).   (10.2)

The above equation (10.2) for discrete time can be represented by

x(t+1) = Ax(t) + Bu(t),  y(t) = Cx(t) + Du(t),   (10.3)

where

A = [ −1.2822      0    0.98      0 ;
           0       0       1      0 ;
      −5.4293      0  −1.8366     0 ;
       −128.2  128.2       0      0 ],

B = ( −0.3, 0, −17, 0 )ᵀ,

C = [      0      1   0   0 ;
           0      0   0   1 ;
      −128.2  128.2   0   0 ],

D = ( 0, 0, 0 )ᵀ.
A decision maker may change her/his aspiration level from the one at the previous time t − 1.
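A minimal simulation step for the model (10.3) is sketched below; the negative entries of the matrices are assumptions based on the standard aircraft altitude-control model, and the reading of the output from the updated state is a design choice of this sketch.

```python
import numpy as np

# State-space matrices of the model (10.3); sign pattern assumed from the
# standard aircraft altitude-control example.
A = np.array([[-1.2822,   0.0,  0.98,   0.0],
              [  0.0,     0.0,  1.0,    0.0],
              [ -5.4293,  0.0, -1.8366, 0.0],
              [-128.2,  128.2,  0.0,    0.0]])
B = np.array([-0.3, 0.0, -17.0, 0.0])
C = np.array([[   0.0,   1.0, 0.0, 0.0],
              [   0.0,   0.0, 0.0, 1.0],
              [-128.2, 128.2, 0.0, 0.0]])
D = np.zeros(3)

def step(x, u):
    """One step of x(t+1) = A x(t) + B u(t); y = C x + D u read off the new state."""
    x_next = A @ x + B * u
    return x_next, C @ x_next + D * u

x1, y1 = step(np.zeros(4), 1.0)   # one unit elevator-angle step from rest
```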
The objectives are

minimize  J1 = Σ_{k=0}^{t+p1} ( y2(k)/H − 1 )²,
minimize  J2 = Σ_{k=0}^{t+p1} ( y1(k)/0.349 )².
Now, we optimize the input (i.e., the elevator angle) without knowing the explicit form of the dynamic equation (10.3), and set the target altitude H = 400 m. The prediction period p1 and the control period p2 are 5 sec and 1.5 sec, respectively, and the depth of sampling of the training data is q = 1. The terminal time T is given by 20 sec, and the sampling period for the discretization of the dynamics is 0.5 sec. In the GA, we use BLX-α with α = 0.25 (blend crossover, BLX), which is well known as a real-coded GA [6]. The number N of individuals is 100 and the iteration number is 100. Each constraint is treated by the penalty method (other techniques to deal with constraints in GA can be found in [2, 28]).
Case 1
Suppose that we are at the time t = 5, and consider two situations, without and with turbulence, for 5 seconds from the present time t = 5 (owing to turbulence, the observed altitude may differ considerably from the predicted one). The aspiration level is given by J̄1 = 6.0, J̄2 = 10.0, and the ideal point by J1* = 3.0, J2* = 5.0.
Fig. 10.3 shows the solutions obtained by the satisficing trade-off method with the predicted model; the plotted symbols mark the aspiration level, the solution without turbulence, and the solution with turbulence. Fig. 10.4 and Fig. 10.5 show the responses corresponding to the obtained solutions. Comparing Fig. 10.5 with Fig. 10.4, because of the turbulence there are relatively strong fluctuations in controlling the elevator angle u.
In this case, one may see that the time for the altitude increase to attain 400 m becomes longer, because the comfort of the passengers is considered relatively more important. However, since the upper bound of the altitude rate y3 is not constrained, the pitch angle y1 may take the value of its upper bound during the transient state. Thus, in the following Case 2, we consider the case in which the upper bound of the altitude rate y3 is 30 m/sec.
Fig. 10.3 Solutions obtained by the satisficing trade-off method in Case 1, plotted in the (J1, J2) plane together with the aspiration level and the solutions with and without turbulence
Case 2
The aspiration level is given by J̄1 = 8.0, J̄2 = 8.0, and the ideal point by J1* = 4.0, J2* = 4.0. We show the solutions obtained by the satisficing trade-off method with the predicted model in Fig. 10.6. Fig. 10.7 and Fig. 10.8 show the responses corresponding to the obtained solutions.
Fig. 10.6 Solutions obtained by the satisficing trade-off method in Case 2, plotted in the (J1, J2) plane together with the aspiration level and the solutions with and without turbulence
Comparing the results of Case 1 and Case 2, it is seen that in Case 2 the pitch angle y1 during the transient state is smaller than its upper bound due to the limitation of the altitude rate y3. Consequently, as seen from Fig. 10.7 and Fig. 10.8, there are strong fluctuations in controlling the elevator angle u from t = 10 to t = 15, in order to attain the target altitude of 400 m as fast as possible. Moreover, the curve of the pitch angle y1 in Fig. 10.8 is not as smooth as the one in Fig. 10.7.
References
1. Branke, J., Deb, K., Miettinen, K., Slowinski, R.: Multiobjective Optimization: Interactive and Evolutionary Approaches. Springer, Heidelberg (2008)
2. Coello, C.A.C.: Theoretical and Numerical Constraint-Handling Techniques used with Evolutionary Algorithms: A Survey of the State of the Art. Computer Methods in Applied Mechanics and Engineering 191(11-12), 1245–1287 (2002)
3. Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learning 20, 273–297 (1995)
4. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
5. Deb, K.: Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, Ltd., Chichester (2001)
6. Eshelman, L.J., Schaffer, J.D.: Real-Coded Genetic Algorithms and Interval-Schemata. In: Foundations of Genetic Algorithms, vol. 2, pp. 187–202 (1993)
7. Erenguc, S.S., Koehler, G.J.: Survey of Mathematical Programming Models and Experimental Results for Linear Discriminant Analysis. Managerial and Decision Economics 11, 215–225 (1990)
8. Freed, N., Glover, F.: Simple but Powerful Goal Programming Models for Discriminant Problems. European Journal of Operational Research 7, 44–60 (1981)
9. Gal, T., Stewart, T.J., Hanne, T.: Multicriteria Decision Making: Advances in MCDM Models, Algorithms, Theory, and Applications. Kluwer Academic Publishers, Dordrecht (1999)
10. Glover, F.: Improved Linear Programming Models for Discriminant Analysis. Decision Sciences 21, 771–785 (1990)
11. Goh, C.K., Tan, K.C.: An Investigation on Noisy Environments in Evolutionary Multiobjective Optimization. IEEE Transactions on Evolutionary Computation 11(3), 354–381 (2007)
12. Goh, C.K., Ong, Y.S., Tan, K.C., Teoh, E.J.: An Investigation on Evolutionary Gradient Search for Multi-objective Optimization. In: IEEE Congress on Evolutionary Computation, pp. 3741–3746 (2008)
13. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization 13, 455–492 (1998)
14. Maciejowski, J.M.: Predictive Control with Constraints. Pearson Education Limited, London (2002)
15. Miettinen, K.M.: Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Dordrecht (1999)
16. Myers, R.H., Montgomery, D.C.: Response Surface Methodology: Process and Product Optimization using Designed Experiments. Wiley, Chichester (1995)
17. Nakayama, H.: Aspiration Level Approach to Interactive Multi-objective Programming and its Applications. In: Pardalos, P.M., Siskos, Y., Zopounidis, C. (eds.) Advances in Multicriteria Analysis, pp. 147–174. Kluwer Academic Publishers, Dordrecht (1995)
18. Nakayama, H., Arakawa, M., Sasaki, R.: Simulation-based Optimization for Unknown Objective Functions. Optimization and Engineering 3, 201–214 (2002)
19. Nakayama, H., Arakawa, M., Washino, K.: Optimization for Black-box Objective Functions. In: Pardalos, P.M., Tseveendorj, I., Enkhbat, R. (eds.) Optimization and Optimal Control, pp. 185–210. World Scientific, Singapore (2003)
20. Nakayama, H., Sawaragi, Y.: Satisficing Trade-off Method for Multiobjective Programming. In: Grauer, M., Wierzbicki, A. (eds.) Interactive Decision Analysis, pp. 113–122. Springer, Heidelberg (1984)
21. Nakayama, H., Yun, Y.: Generating Support Vector Machines using Multiobjective Optimization and Goal Programming. In: Multi-objective Machine Learning. Studies in Computational Intelligence. Springer, Heidelberg (2006)
22. Nakayama, H., Yun, Y., Yoon, M.: Sequential Approximate Multiobjective Optimization using Computational Intelligence. Springer Series on Vector Optimization (to appear, 2009)
23. Radcliffe, N.J.: Forma Analysis and Random Respectful Recombination. In: Proceedings of the Fourth International Conference on Genetic Algorithms, pp. 222–229 (1991)
24. Sawaragi, Y., Nakayama, H., Tanino, T.: Theory of Multiobjective Optimization. Academic Press, London (1985)
25. Schölkopf, B., Smola, A.J.: New Support Vector Algorithms. NeuroCOLT2 Technical Report Series, NC2-TR-1998-031 (1998)
26. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)
27. Steuer, R.: Multiple Criteria Optimization: Theory, Computation, and Application. Wiley, Chichester (1986)
28. Runarsson, T.P., Yao, X.: Stochastic Ranking for Constrained Evolutionary Optimization. IEEE Transactions on Evolutionary Computation 4(3), 284–294 (2000)
29. Vapnik, V.N.: Statistical Learning Theory. John Wiley & Sons, New York (1998)
30. Wierzbicki, A.P., Makowski, M., Wessels, J.: Model-based Decision Support Methodology with Environmental Applications. Kluwer Academic Publishers, Dordrecht (2000)
Chapter 12
Abstract. This chapter proposes a novel algorithm for handling the high dimensionality of large scale problems. The proposed algorithm, here denoted Differential Evolution for Large Scale problems (DELS), is a Differential Evolution (DE) based Memetic Algorithm with self-adaptive control parameters and automatic population size reduction, which employs within its framework a local search on a variation operator. The local search algorithm is applied to the scale factor in order to generate high quality solutions. Due to its structure, the computational cost of the local search is independent of the number of dimensions characterizing the problem, and it is thus a suitable component for large scale problems. The proposed algorithm has been compared with a standard DE and two other modern DE based metaheuristics on a varied set of test problems. Numerical results show that the DELS is an efficient and robust algorithm for highly multivariate optimization, and that applying the local search to the scale factor is beneficial in terms of solution quality, convergence speed, and algorithmic robustness.
12.1 Introduction
Computationally expensive optimization problems can be classified into two categories: problems which require a long calculation time for each objective function
evaluation and problems which require a very large number of objective function
evaluations for detecting a reasonably good candidate solution. The problems belonging to the latter category, which are the focus of this chapter, are usually
Andrea Caponio
Department of Electrotechnics and Electronics, Technical University of Bari, Italy
e-mail: caponio@deemail.poliba.it
Anna V. Kononova
Centre for CFD, School of Process, Environmental and Materials Engineering,
University of Leeds, LS2 9JT, UK
e-mail: pmak@leeds.ac.uk
Ferrante Neri
Department of Mathematical Information Technology, University of Jyväskylä,
FI-40014 Jyväskylä, Finland
e-mail: ferrante.neri@jyu.fi
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 297–323. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
characterized by a vast decision space which is strictly related to the high dimensionality of the function. Optimization problems characterized by a high number
of variables are also known as large scale optimization problems, or briefly Large
Scale Problems (LSPs).
The detection of an efficient solver for LSPs can be a very valuable achievement
in applied science and engineering since in many applications a high number of design variables may be of interest for an accurate problem description. For example,
in structural optimization an accurate description of complex spatial objects might
require the formulation of a LSP; similarly such a situation also occurs in scheduling problems, see [19]. Another important example of a class of real-world LSPs is
the inverse problem of chemical kinetics studied in [13] and [14].
Unfortunately, when an exact method cannot be applied, LSPs can turn out to be
very difficult to solve. As a matter of fact, due to high dimensionality, algorithms
which perform a neighborhood search (e.g. Hooke-Jeeves Algorithm) might require
an unreasonably high number of fitness evaluations at each step of the search while
population based algorithms are likely to either prematurely converge to suboptimal
solutions, or stagnate due to an inability to generate new promising search directions. In other words, many metaheuristics that perform well for problems characterized by a low dimensionality, e.g. Evolutionary Algorithms (EAs), can often fail
to find good near-optimal solutions to high-dimensional problems. The deterioration in the algorithmic performance as the dimensionality of the search space increases is commonly known as the curse of dimensionality, see [38].
Since the employment of optimization algorithms can lead to a prohibitively high
computational cost of the optimization run without the detection of a satisfactory
result, it is crucially important to detect an algorithmic solution that allows good
results by performing a relatively low amount of objective function evaluations. In
the literature various studies have been carried out and several algorithmic solutions
have been proposed. In [15], a modified Ant Colony Optimizer (ACO) has been
proposed for large scale optimization problems. Some other papers propose a technique, namely cooperative coevolution, originally defined in [27] and subsequently
developed in other works, see e.g. [17] and [36]. The concept of cooperative coevolution is to decompose an LSP into a set of low-dimensional problems which can be solved separately and then recombined in order to compose the solution of the original problem. It is obvious that if the objective function (fitness function) is separable then the problem decomposition can be trivial, while for nonseparable functions the problem decomposition can turn out to be a very difficult task. However,
some techniques for performing the decomposition of nonseparable functions have
been developed, see [26]. Recently, cooperative coevolution procedures have been
successfully integrated within Differential Evolution (DE) frameworks for solving
LSPs, see [35], [39], [41], [24] and [40].
It should be remarked that a standard DE can be inefficient for solving LSPs, see
[6]. However, thanks to its simple structure and flexibility, the DE framework can be easily modified to become an efficient solver of high-dimensional problems. Besides
the examples of DE integrating cooperative coevolution, other DE based algorithms
for LSPs have been proposed. In [30] the opposition based technique is proposed for
handling the high dimensionality. This technique consists of generating extra points that are symmetric to those belonging to the original population; see details in [31].
In [22] a Memetic Algorithm (MA) (see for the definitions e.g. [20], [12], and [25])
which integrates a simplex crossover within the DE framework has been proposed
in order to solve LSPs, see also [23]. In [5], on the basis of the studies carried out
in [1], [4], and [2], a DE for LSPs has been proposed. The algorithm proposed in [5] performs a probabilistic update of the control parameters of the DE variation operators and
a progressive size reduction of the population size. Although the theoretical justifications of the success of this algorithm are not fully clear, the proposed approach
seems to be extremely promising for various problems. In [21], a memetic algorithm
which hybridizes the self-adaptive DE described in [2] and a local search applied
to the scale factor in order to generate candidate solutions with a high performance
has been proposed. Since the local search on the scale factor (or scale factor local
search) is independent of the dimensionality of the problem, the resulting memetic
algorithm offered a good performance for relatively large scale problems, see [21].
This chapter proposes a novel memetic algorithm which integrates the potential
of the scale factor local search within the self-adaptive DE with automatic reduction of the population size in order to guarantee a high performance, in terms of
convergence speed and solution detection, for large scale problems.
The rest of this chapter is organized in the following way. Section 12.2 describes
the algorithmic components characterizing the proposed algorithm. Section 12.3
shows the numerical results and highlights the performance of the proposed algorithm with respect to a standard DE and two modern DE variants. Section 12.4 gives
the conclusion of this work.
generated. With F ∈ [0, 1+[ it is here meant that the scale factor should be a positive value which cannot be much greater than 1, see [28]. While there is no theoretical upper limit for F, effective values are rarely greater than 1.0. The mutation scheme shown in eq. (12.1) is also known as DE/rand/1. Although the DE/rand/1 mutation has been employed in this chapter, it is important to mention that other variants of the mutation rule have been proposed in the literature, see [29]:
DE/best/1: x_off = x_best + F (x_s − x_t)
DE/cur-to-best/1: x_off = x_i + F (x_best − x_i) + F (x_s − x_t)
DE/best/2: x_off = x_best + F (x_s − x_t) + F (x_u − x_v)
DE/rand/2: x_off = x_r + F (x_s − x_t) + F (x_u − x_v)
where x_best is the solution with the best performance among the individuals of the population, and x_u and x_v are two additional randomly selected individuals.
Then, to increase exploration, each gene of the provisional offspring x'_off is switched with the corresponding gene of x_i with a uniform probability CR ∈ [0, 1] and the final offspring x_off is generated:

x_off,j = x_i,j      if rand(0, 1) < CR
          x'_off,j   otherwise                        (12.2)

where rand(0, 1) is a random number between 0 and 1, and j is the index of the gene under examination, from 1 to n, with n the length of each individual.
generate S_pop individuals of the initial population randomly;
while budget condition
    for i = 1 : S_pop
        compute f(x_i);
    end-for
    for i = 1 : S_pop
        ** mutation **
        select three individuals x_r, x_s, and x_t;
        compute x'_off = x_t + F (x_r − x_s);
        ** crossover **
        x_off = x'_off;
        for j = 1 : n
            generate rand(0, 1);
            if rand(0, 1) < CR
                x_off,j = x_i,j;
            end-if
        end-for
        ** selection **
        if f(x_off) ≤ f(x_i)
            x_i = x_off;
        end-if
    end-for
end-while
Fig. 12.1 DE pseudocode
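The pseudocode of Fig. 12.1 can be sketched in Python roughly as follows; the sphere objective, the bounds, and the parameter values are illustrative choices, not settings taken from this chapter.

```python
import random

def de_rand_1_bin(f, n, s_pop=50, F=0.7, CR=0.3, budget=20000, bounds=(-5.12, 5.12)):
    """Standard DE of Fig. 12.1: DE/rand/1 mutation, the chapter's crossover
    convention (a gene reverts to the parent with probability CR), and
    one-to-one selection."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(s_pop)]
    fit = [f(x) for x in pop]
    evals = s_pop
    while evals < budget:
        for i in range(s_pop):
            # mutation: three mutually distinct individuals, all different from i
            r, s, t = random.sample([j for j in range(s_pop) if j != i], 3)
            off = [pop[t][j] + F * (pop[r][j] - pop[s][j]) for j in range(n)]
            # crossover: each gene reverts to the parent gene with probability CR
            off = [pop[i][j] if random.random() < CR else off[j] for j in range(n)]
            f_off = f(off)
            evals += 1
            if f_off <= fit[i]:              # one-to-one selection
                pop[i], fit[i] = off, f_off
            if evals >= budget:
                break
    best = min(range(s_pop), key=lambda k: fit[k])
    return pop[best], fit[best]

sphere = lambda x: sum(v * v for v in x)     # illustrative objective
x_best, f_best = de_rand_1_bin(sphere, n=10)
```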
F_i = { F_l + F_u · rand_1,   if rand_2 < τ_1
        −F_i,                 if rand_3 < τ_3
        F_i,                  otherwise                      (12.5)

CR_i = { rand_4,   if rand_5 < τ_2
         CR_i,     otherwise                                 (12.6)

where rand_j, j ∈ {1, 2, 3, 4, 5, 6}, are uniform random values between 0 and 1; τ_1, τ_2, and τ_3 are constant values which represent the probabilities that the parameters are updated; F_l and F_u are constant values which represent the minimum value that F_i could take and the maximum variable contribution to F_i, respectively. The sign inversion in the scale factor described in eq. (12.5) can be seen as the exploitation of a crude approximation of the gradient information in order to generate offspring along the most promising search directions. The newly calculated values of F_i and CR_i are then used for generating the offspring. Mutation, crossover, and selection are performed as shown in Subsection 12.2.1 for a standard DE.
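The self-adaptive update described above can be sketched as follows; the τ constants and the exact placement of the sign inversion are assumptions in the spirit of [2] and [5], not values quoted from this chapter.

```python
import random

# Assumed constants for the sketch (not taken from the chapter).
F_L, F_U = 0.1, 0.9                 # minimum value of F and its variable contribution
TAU1, TAU2, TAU3 = 0.1, 0.1, 0.05   # update / sign-inversion probabilities (assumed)

def update_parameters(F_i, CR_i):
    """Return the scale factor and crossover rate used to generate offspring i."""
    if random.random() < TAU1:      # with probability tau1, resample F_i
        F_i = F_L + F_U * random.random()
    if random.random() < TAU3:      # with probability tau3, invert the sign of F_i
        F_i = -F_i                  # crude gradient-like exploitation
    if random.random() < TAU2:      # with probability tau2, resample CR_i
        CR_i = random.random()
    return F_i, CR_i
```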
Thus, the total budget of the algorithm is divided into N_s periods, each period being characterized by a population size value S_pop^k (for k = 1 we obtain the initial population size). Each period is composed of N_g^k generations which are calculated in the following way:

N_g^k = T_b / (N_s · S_pop^k) + r^k        (12.7)
where r^k is a constant non-negative value which takes a positive value when T_b is not divisible by N_s. In this case r^k extra generations are performed. The population reduction is simply carried out by halving the population size at the beginning of the new stage, see [1]. In other words, for k = 1, 2, ..., N_s − 1, S_pop^{k+1} = S_pop^k / 2.
In this way, the population size is progressively reduced during the optimization
process until the final budget is reached. The concept behind this strategy is to focus the search in progressively smaller search spaces in order to inhibit DE stagnation in high-dimensional environments. During the early stages of the optimization process, the search
requires a highly explorative search rule, i.e. a large population size, in order to explore a large portion of the decision space. During the optimization, the search space
is progressively narrowed by decreasing the population size and thus exploiting the
promising search directions previously detected. Although the number of stages and
the population size values remain arbitrary issues defined by the algorithmic designer, the idea seems to lead to a fairly robust algorithmic behavior for the setting
proposed in [41] and seems to be very promising for LSPs, as highlighted in [1].
The last topic to be clarified is the selection rule employed every time a population reduction occurs. At the end of each stage, i.e. at each N_g^k generations for k = 2, 3, ..., N_s, the population is divided into two sub-populations on the basis of the indices: the first containing the solutions x_1, ..., x_{S_pop^k / 2}, and the second containing the solutions x_{S_pop^k / 2 + 1}, x_{S_pop^k / 2 + 2}, ..., x_{S_pop^k}. The selection applies the one-to-one spawning, typical of the DE logic, to the two sub-populations, analogous to the selection among parent and offspring individuals in a standard DE scheme. In other words, the individuals x_i and x_{S_pop^k / 2 + i} are pairwise compared for i = 1, 2, ..., S_pop^k / 2, and the individuals having the most promising fitness value are retained for the subsequent generation.
For the sake of clarity, it should be remarked that in order to guarantee a proper functioning of the population reduction mechanism, populations should never undergo sorting of any kind.
search is not applied to all the coordinates of the individual but to its scale factor F_i.
The main idea is that the update of the scale factor and thus generation of the offspring is, with a certain probability, controlled in order to guarantee a high quality
solution which can take on a key role in subsequent generations, see also [18].
Local search in the scale factor space can be seen as the minimization over the
variable Fi of fitness function f in the direction given by xr and xs and modified by
the crossover. More specifically, at first the scale factor local search determines those
genes which are undergoing crossover by means of the standard criterion explained
in eq. (12.2), then it attempts to find the scale factor value which guarantees an
offspring with the best performance. Thus, for given values of xt , xr , xs , and the set
of design variables to be swapped during the crossover operation, the scale factor
local search attempts to solve the following minimization problem:
min_{F_i ∈ [−1, 1]} f(F_i).        (12.8)
For the sake of clarity, the procedure describing the fitness function is shown in
Fig. 12.2.
insert F_i;
compute x'_off = x_t + F_i (x_r − x_s);
perform the swapping of the genes and generate, in a crossover fashion, x_off;
compute f(F_i) = f(x_off);
return f(F_i);

Fig. 12.2 Local search fitness function f(F_i), pseudocode
the step size h is halved. The local search is stopped when a budget condition is
exceeded. For the sake of completeness the pseudo-code of the Scale Factor HillClimb (SFHC) is shown in Fig. 12.3.
insert F_i;
initialize h;
while budget condition
    compute f(F_i − h), f(F_i), and f(F_i + h);
    select the point with the best performance F_i*;
    if F_i* == F_i
        h = h/2;
    end-if
    F_i = F_i*;
end-while

Fig. 12.3 Scale Factor Hill-Climb (SFHC) pseudocode
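The SFHC translates almost line by line into Python; the quadratic test fitness below is an illustrative stand-in for the true f(F_i) of Fig. 12.2.

```python
def scale_factor_hill_climb(f, F_i, h=0.1, budget=40):
    """Scale Factor Hill-Climb (Fig. 12.3): a one-dimensional hill-climb over
    the scale factor, halving the step size h when no neighbour improves."""
    evals = 0
    while evals + 3 <= budget:               # budget condition
        candidates = [F_i - h, F_i, F_i + h]
        values = [f(c) for c in candidates]
        evals += 3
        F_star = candidates[values.index(min(values))]
        if F_star == F_i:
            h /= 2.0                         # refine the step size
        F_i = F_star
    return F_i

# illustrative 1-D fitness standing in for f(F_i); minimum at F = 0.37
F_opt = scale_factor_hill_climb(lambda F: (F - 0.37) ** 2, F_i=0.5)
```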
Table 12.1 Test problems

Test Problem | Analytic Expression | Decision Space
Ackley | 20 + e − 20 exp(−0.2 √(Σ_{i=1}^n x_i² / n)) − exp((1/n) Σ_{i=1}^n cos(2π x_i) x_i) | [−1, 1]^n
Ellipsoid | Σ_{i=1}^n (Σ_{j=1}^i x_j)² | [−65.536, 65.536]^n
Griewangk | ‖x‖² / 4000 − Π_{i=1}^n cos(x_i / √i) + 1 | [−600, 600]^n
Michalewicz | −Σ_{i=1}^n sin(x_i) [sin(i x_i² / π)]^20 | [0, π]^n
Parallel Axis | Σ_{i=1}^n i x_i² | [−5.12, 5.12]^n
Rastrigin | 10n + Σ_{i=1}^n (x_i² − 10 cos(2π x_i)) | [−5.12, 5.12]^n
Rosenbrock | Σ_{i=1}^{n−1} [100 (x_{i+1} − x_i²)² + (1 − x_i)²] | [−2.048, 2.048]^n
Schwefel | −Σ_{i=1}^n x_i sin(√|x_i|) | [−500, 500]^n
Tirronen | 3 exp(−‖x‖² / (10n)) − 10 exp(−8‖x‖²) + (2.5/n) Σ_{i=1}^n cos(5 x_i (1 + i mod 2) cos(‖x‖)) | [−10, 5]^n
For the sake of clarity, the pseudocode highlighting the working principle of the
DELS integrating the scale factor local search is given in Fig. 12.4.
for i = 1 : S_pop / 2
    if f(x_{S_pop/2 + i}) < f(x_i)
        x_i = x_{S_pop/2 + i};
    end-if
end-for
halve S_pop and p_ls;
update N_g = T_b / (N_s · S_pop) + r_k;
end-if
end-while

Fig. 12.4 DELS pseudocode
4. The proposed DELS has the same parameter setting as the jDEdynNP-F. In addition, the SFHC has been run with a budget of 40 fitness evaluations and an initial step size h = 0.1.
The experiments have been performed for n = 100, n = 500, and n = 1000. The total budget for all the algorithms has been set equal to 1.5 × 10^5, 3 × 10^6, and 6 × 10^6 fitness evaluations, respectively. Regarding jDEdynNP-F and DELS, the initial population size S_pop^1 is set equal to n. Regarding DE and SACPDE, the population size S_pop has been set in order to keep constant the amount of generations for all the algorithms considered in this study. More specifically, DE and SACPDE have been run with a population size S_pop = 27 for the 100 dimension case, S_pop = 134 for the 500 dimension case, and S_pop = 267 for the 1000 dimension case.
For each test problem, each algorithm performed the optimization process on 30
independent runs.
Regarding the rotated test problems, in order to perform a fair comparison and an
analysis of the robustness, a rotation matrix has been generated for each problem
and for each run. Then, all the algorithms considered in this study have been run
with the same set of rotated problems.
Table 12.2 Average final fitness values ± standard deviations in 100 dimensions

Test Problem | DE | SACPDE | jDEdynNP-F | DELS
Ackley | 9.41E+00 ± 1.26E+00 | 4.54E+00 ± 1.14E+00 | 8.87E-05 ± 8.97E-05 | 3.21E-05 ± 4.56E-05
Rotated Ackley | 7.87E+00 ± 1.67E+00 | 5.83E+00 ± 1.50E+00 | 6.49E-01 ± 6.04E-01 | 3.57E-01 ± 5.77E-01
Ellipsoid | 1.02E+03 ± 4.20E+02 | 3.81E+03 ± 1.87E+03 | 4.37E+03 ± 9.65E+02 | 3.91E+03 ± 9.11E+02
Rotated Ellipsoid | 8.86E+02 ± 3.90E+02 | 3.80E+03 ± 1.32E+03 | 4.07E+03 ± 7.58E+02 | 3.98E+03 ± 9.97E+02
Griewangk | 1.02E-01 ± 1.74E-01 | 2.76E-01 ± 6.66E-01 | 1.21E-02 ± 2.12E-02 | 2.18E-02 ± 4.05E-02
Rotated Griewangk | 2.24E-02 ± 3.31E-02 | 7.45E-04 ± 2.34E-03 | 6.85E-03 ± 8.11E-03 | 2.45E-03 ± 2.42E-03
Michalewicz | -4.60E+01 ± 5.86E+00 | -8.99E+01 ± 1.13E+00 | -8.86E+01 ± 1.15E+00 | -9.16E+01 ± 2.49E+00
Rotated Michalewicz | -8.16E+00 ± 6.48E-01 | -7.98E+00 ± 4.31E-01 | -8.28E+00 ± 9.11E-01 | -1.09E+01 ± 1.37E+00
Parallel | 2.74E-04 ± 6.36E-04 | 4.39E-18 ± 6.96E-18 | 3.76E-08 ± 4.92E-08 | 2.13E-08 ± 3.05E-08
Rotated Parallel | 5.13E+00 ± 3.82E+00 | 1.50E+00 ± 8.23E-01 | 6.44E+00 ± 3.98E+00 | 3.40E+00 ± 2.10E+00
Rastrigin | 2.28E+02 ± 8.02E+01 | 3.74E+01 ± 8.29E+00 | 1.28E+01 ± 5.39E+00 | 1.27E+01 ± 4.92E+00
Rotated Rastrigin | 1.64E+02 ± 5.15E+01 | 1.89E+02 ± 4.76E+01 | 3.05E+02 ± 7.78E+01 | 1.96E+02 ± 6.22E+01
Rosenbrock | 2.24E+02 ± 5.59E+01 | 1.82E+02 ± 5.26E+01 | 1.15E+02 ± 2.72E+01 | 1.45E+02 ± 4.55E+01
Rotated Rosenbrock | 1.34E+02 ± 5.59E+01 | 1.25E+02 ± 5.52E+01 | 9.71E+01 ± 1.30E+00 | 9.69E+01 ± 1.43E+00
Schwefel | 1.60E+04 ± 2.19E+03 | 4.46E+03 ± 1.27E+03 | 1.10E+03 ± 3.03E+02 | 9.76E+02 ± 4.30E+02
Rotated Schwefel | 1.99E+04 ± 3.78E+03 | 1.64E+04 ± 1.03E+03 | 1.74E+04 ± 1.21E+03 | 1.59E+04 ± 2.00E+03
Tirronen | -1.75E+00 ± 1.17E-01 | -2.48E+00 ± 1.32E-02 | -2.47E+00 ± 7.98E-03 | -2.49E+00 ± 6.16E-03
Rotated Tirronen | -7.82E-01 ± 6.42E-02 | -1.03E+00 ± 1.00E-01 | -1.01E+00 ± 1.57E-01 | -1.61E+00 ± 2.08E-01
Results in Table 12.2 show that the proposed DELS obtained the best results for 10 problems out of the 18 considered in the 100 dimension benchmark. Thus, the DELS clearly seems to be the most efficient algorithm in terms of final solutions. In the remaining 8 test problems the DELS is, in any case, never outperformed by far and still demonstrates a competitive performance. For example, with the Griewangk function, although the DELS does not seem to have a very promising behavior, it reaches satisfactory results anyway.
In order to prove the statistical significance of the results, the Student's t-test has been applied according to the description given in [34] for a confidence level of 0.95. Final values obtained by the DELS have been compared to the final value returned by each algorithm used as a benchmark. Table 12.3 shows the results of the test. Indicated with "+" is the case when the DELS statistically outperforms, for the corresponding test problem, the algorithm mentioned in the column; indicated with "=" is the case when the pairwise comparison leads to success of the t-test, i.e. the two algorithms have the same performance; indicated with "−" is the case when the DELS is outperformed.
Table 12.3 Results of the Student's t-test in 100 dimensions for DE, SACPDE, and jDEdynNP-F over the 18 test problems of Table 12.1 ("+": the DELS statistically outperforms the algorithm in the column, "=": same performance, "−": the DELS is outperformed)
The t-test results listed in Table 12.3 show that the DELS loses the comparison in
only 4 cases out of the 54 comparisons carried out, i.e. the DELS loses in only 7.4%
of the pairwise comparisons. In addition, it should be remarked that the scale factor
local search never reduces the performance of the jDEdynNP-F framework, as the
right hand column of Table 12.3 proves.
In addition to the t-test, the Friedman test has also been performed, see [34]. In a nutshell, the Friedman test is the non-parametric equivalent of the repeated-measures
ANOVA. Under the null-hypothesis, it states that all the algorithms are equivalent. If
the hypothesis is rejected, the algorithms have a different performance. Details of the
test can be found in [34] and the application of this test in the context of algorithm
comparisons is described in [11]. In this study, rotated and non-rotated problems
have been treated separately, and in both cases the level of significance has been set to 0.05. We can conclude that the probability that the algorithms under analysis have the same performance for non-rotated problems is 0, while the probability that this event happens for rotated problems is 1.4618 × 10^−8, i.e. this event is very unlikely.
In order to carry out a numerical comparison of the convergence speed performance, for each test problem, the average final fitness value returned by the best performing algorithm G has been considered. Subsequently, the average fitness value at the beginning of the optimization process J has also been computed. The threshold value THR = J + 0.95 (G − J) has then been calculated. The value THR represents 95% of the decay in the fitness value of the algorithm with the best performance. If an algorithm succeeds during a certain run to reach the value THR, the run is said to be successful. For each test problem, the average amount of fitness evaluations n̄_e required, for each algorithm, to reach THR has been computed. Subsequently, the Q-test (Q stands for Quality) described in [10] has been applied. For each test problem and each algorithm, the Q measure is computed as:

Q = n̄_e / R        (12.9)

where the robustness R is the percentage of successful runs. It is clear that, for each test problem, the smallest value equals the best performance in terms of convergence speed. The value Inf means that R = 0, i.e. the algorithm never reached the THR.
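Eq. (12.9) can be computed as follows; representing a failed run by None and averaging n̄_e over the successful runs only are implementation assumptions of this sketch.

```python
def q_measure(run_evals):
    """Q = n_e / R (eq. 12.9): n_e is the average number of fitness evaluations
    needed to reach THR (averaged over the successful runs) and R is the
    fraction of successful runs; a failed run is represented by None."""
    successes = [e for e in run_evals if e is not None]
    if not successes:
        return float("inf")          # R = 0: THR was never reached
    n_e = sum(successes) / len(successes)
    R = len(successes) / len(run_evals)
    return n_e / R

# e.g. three successful runs out of four: n_e = 2000, R = 0.75
q = q_measure([1500, 2000, 2500, None])
```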
Table 12.4 shows the Q values in 100 dimensions. The best results are highlighted
in bold face.
Results in Table 12.4 show that the best performance values in terms of Q-measure are distributed among the considered algorithms. In other words, there is no clear best algorithm in terms of Q-measure. The jDEdynNP-F seems to have a slightly lower performance than the other algorithms. It is important to notice that the DELS has a very robust behaviour compared to the other algorithms considered in this study. As a matter of fact, as shown in Table 12.4, the DELS is the only algorithm whose Q-measure never takes the Inf value. This means that the DELS is always able to detect candidate solutions with a high performance and is never consistently outperformed by the other algorithms. We can conclude that in the 100 dimension case the proposed DELS tends either to have an excellent performance with respect to the other algorithms (e.g. Rotated Schwefel) or is anyway competitive with the other algorithms (e.g. Rastrigin).
In order to graphically show the behaviour of the algorithms, some examples of
the average performance are plotted against the number of fitness evaluations (for
some of the test problems listed in Table 12.1) and represented in Figure 12.5.
Table 12.4 Q values in 100 dimensions

Test Problem | DE | SACPDE | jDEdynNP-F | DELS
Ackley | Inf | Inf | 3.27E+02 | 4.61E+02
Rotated Ackley | Inf | Inf | 6.83E+02 | 6.51E+02
Ellipsoid | 9.94E+01 | 1.43E+02 | 3.90E+02 | 5.40E+02
Rotated Ellipsoid | 1.16E+02 | 1.79E+02 | 4.09E+02 | 5.92E+02
Griewangk | 7.35E+01 | 3.96E+01 | 1.39E+02 | 1.99E+02
Rotated Griewangk | 6.58E+01 | 4.04E+01 | 1.39E+02 | 1.92E+02
Michalewicz | Inf | 3.19E+03 | Inf | 1.80E+03
Rotated Michalewicz | Inf | Inf | Inf | 1.45E+04
Parallel | 6.67E+01 | 3.63E+01 | 1.39E+02 | 1.83E+02
Rotated Parallel | 6.39E+01 | 3.94E+01 | 1.72E+02 | 2.22E+02
Rastrigin | Inf | 3.40E+02 | 6.61E+02 | 8.63E+02
Rotated Rastrigin | 5.09E+02 | 2.06E+03 | 9.94E+03 | 1.74E+03
Rosenbrock | 1.96E+01 | 2.07E+01 | 1.06E+02 | 1.34E+02
Rotated Rosenbrock | 1.45E+01 | 2.12E+01 | 1.04E+02 | 1.35E+02
Schwefel | Inf | Inf | 7.38E+02 | 9.57E+02
Rotated Schwefel | Inf | Inf | Inf | 4.53E+03
Tirronen | Inf | 3.88E+02 | 7.13E+02 | 7.84E+02
Rotated Tirronen | Inf | Inf | Inf | 1.49E+04
Fig. 12.5 Performance trends in 100 dimensions: average fitness values f of DE, SACPDE, jDEdynNP-F, and DELS plotted against the number of fitness evaluations; panel (b) shows the Schwefel function
Table 12.5 Average final fitness values ± standard deviations in 500 dimensions

Test Problem | DE | SACPDE | jDEdynNP-F | DELS
Ackley | 5.02E+00 ± 5.56E-01 | 2.09E+00 ± 3.75E-01 | 1.11E-12 ± 8.86E-13 | 6.01E-13 ± 5.64E-14
Rotated Ackley | 3.97E+00 ± 3.58E-01 | 2.88E+00 ± 2.44E-01 | 7.85E-13 ± 1.08E-13 | 7.13E-13 ± 5.08E-14
Ellipsoid | 6.14E+04 ± 5.47E+03 | 1.12E+04 ± 1.35E+03 | 8.06E+03 ± 5.77E+02 | 8.86E+03 ± 1.10E+03
Rotated Ellipsoid | 6.18E+04 ± 1.66E+04 | 1.04E+04 ± 1.03E+03 | 7.42E+03 ± 1.09E+03 | 9.50E+03 ± 1.07E+03
Griewangk | 3.58E-02 ± 4.08E-02 | 1.72E-02 ± 3.44E-02 | 4.97E-15 ± 2.03E-15 | 5.52E-15 ± 1.78E-15
Rotated Griewangk | 2.31E-02 ± 2.01E-02 | 1.84E-10 ± 7.26E-11 | 2.57E-09 ± 2.60E-09 | 1.40E-08 ± 1.33E-08
Michalewicz | -2.42E+02 ± 8.51E+00 | -4.57E+02 ± 2.66E+00 | -4.46E+02 ± 2.65E+00 | -4.69E+02 ± 8.75E-01
Rotated Michalewicz | -1.24E+01 ± 8.41E-01 | -1.21E+01 ± 5.86E-01 | -1.21E+01 ± 4.41E-01 | -1.44E+01 ± 2.74E+00
Parallel | 3.18E-03 ± 1.53E-03 | 4.67E-52 ± 7.24E-53 | 3.95E-42 ± 5.64E-42 | 2.87E-38 ± 2.79E-38
Rotated Parallel | 5.58E+01 ± 2.11E+01 | 4.22E+00 ± 3.69E+00 | 3.94E+00 ± 1.59E+00 | 2.95E+00 ± 1.14E+00
Rastrigin | 4.50E+02 ± 6.74E+01 | 9.95E-01 ± 1.41E+00 | 1.27E+00 ± 2.54E+00 | 5.60E+00 ± 4.63E+00
Rotated Rastrigin | 3.67E+02 ± 3.56E+01 | 7.59E+02 ± 3.26E+01 | 7.14E+02 ± 5.32E+01 | 9.21E+02 ± 1.76E+02
Rosenbrock | 6.14E+02 ± 5.40E+01 | 6.16E+02 ± 8.47E+01 | 4.91E+02 ± 3.25E+01 | 4.97E+02 ± 3.17E+01
Rotated Rosenbrock | 5.08E+02 ± 2.82E+01 | 5.17E+02 ± 2.96E+01 | 4.91E+02 ± 3.57E-01 | 4.92E+02 ± 5.15E-01
Schwefel | 9.23E+04 ± 3.21E+03 | 3.26E+02 ± 1.78E+02 | 8.88E+01 ± 5.92E+01 | 6.36E-03 ± 2.84E-07
Rotated Schwefel | 1.79E+05 ± 1.55E+03 | 9.60E+04 ± 5.26E+02 | 9.14E+04 ± 1.52E+03 | 8.77E+04 ± 4.66E+03
Tirronen | -1.78E+00 ± 4.99E-02 | -2.49E+00 ± 1.96E-03 | -2.49E+00 ± 3.91E-03 | -2.50E+00 ± 1.86E-03
Rotated Tirronen | -4.06E-01 ± 2.42E-02 | -8.72E-01 ± 4.14E-02 | -9.57E-01 ± 4.99E-02 | -1.41E+00 ± 1.64E-01
Table 12.6 Results of the Student's t-test in 500 dimensions for DE, SACPDE, and jDEdynNP-F over the 18 test problems ("+": the DELS statistically outperforms the algorithm in the column, "=": same performance, "−": the DELS is outperformed)
Table 12.7 Q values in 500 dimensions

Test Problem | DE | SACPDE | jDEdynNP-F | DELS
Ackley | Inf | Inf | 5.00E+03 | 5.33E+03
Rotated Ackley | Inf | Inf | 5.73E+03 | 6.22E+03
Ellipsoid | 8.50E+03 | 1.46E+03 | 6.63E+03 | 7.05E+03
Rotated Ellipsoid | 9.46E+03 | 1.46E+03 | 6.59E+03 | 7.00E+03
Griewangk | 8.92E+02 | 4.01E+02 | 1.56E+03 | 1.71E+03
Rotated Griewangk | 9.64E+02 | 3.73E+02 | 1.64E+03 | 1.66E+03
Michalewicz | Inf | 2.84E+04 | 4.86E+04 | 2.29E+04
Rotated Michalewicz | Inf | Inf | Inf | 9.55E+04
Parallel | 8.96E+02 | 3.83E+02 | 1.50E+03 | 1.65E+03
Rotated Parallel | 8.98E+02 | 4.01E+02 | 1.76E+03 | 1.91E+03
Rastrigin | 2.30E+04 | 1.08E+04 | 1.85E+04 | 1.67E+04
Rotated Rastrigin | 8.48E+03 | 2.98E+04 | 2.73E+04 | 9.42E+04
Rosenbrock | 2.01E+02 | 2.17E+02 | 1.11E+03 | 1.17E+03
Rotated Rosenbrock | 1.79E+02 | 1.99E+02 | 1.10E+03 | 1.09E+03
Schwefel | Inf | 1.45E+04 | 1.96E+04 | 2.07E+04
Rotated Schwefel | Inf | Inf | Inf | 5.21E+04
Tirronen | Inf | 1.51E+04 | 1.98E+04 | 1.64E+04
Rotated Tirronen | Inf | Inf | Inf | 5.38E+04
314
[Figure: performance trends of DE, SACPDE, jDEdynNP-F, and DELS (fitness value f versus fitness evaluations, ×10^6); caption lost in the extraction.]
Table 12.8 Average final fitness values ± standard deviations in 1000 dimensions
[Table body: columns DE, SACPDE, jDEdynNP-F, DELS; rows Ackley through Rotated Tirronen as above. The mean ± standard deviation entries are fused in the extraction and cannot be reliably reconstructed.]
[Table: pairwise statistical comparison ('+'/'=') of DELS against DE, SACPDE, and jDEdynNP-F in 1000 dimensions. The mapping of symbols to problems cannot be reliably reconstructed from the extraction.]
[Table: Q-test results in 1000 dimensions for DE, SACPDE, jDEdynNP-F, and DELS ('Inf' marks problems where the target was never reached). The row mapping cannot be reliably reconstructed from the extraction.]
[Figure panels: (a) Michalewicz, (b) (label lost), (c) Rastrigin; fitness value f versus fitness evaluations (×10^6) for DE, SACPDE, jDEdynNP-F, and DELS.]
Fig. 12.7 Performance trends in 1000 dimensions
[Figure: further performance trends, including panel (a) Schwefel; fitness value f versus fitness evaluations (×10^6) for DE, SACPDE, jDEdynNP-F, and DELS; caption lost in the extraction.]
the scheme has, at each stage of the optimization process, a limited set of exploratory moves, and if these moves are not enough to generate new promising solutions, the search can be heavily compromised. Clearly, the risk of DE stagnation is higher for larger decision spaces and worsens as the number of dimensions of the problem increases. A large decision space (in terms of dimensions) requires a wide range of possible moves to enhance the capability of detecting new promising solutions.
Experimental observations from Figs. 12.5(c), 12.6(b), and 12.7(b) show that for a complex fitness landscape (Rotated Schwefel) DE is heavily affected by the curse of dimensionality. It can be observed that for 100 dimensions the DE performance is competitive with the other algorithms, for 500 variables the performance is poor, and for 1000 variables DE stagnates early and detects completely unsatisfactory solutions.
In order to enhance the performance of DE by widening the range of its search moves, a randomization of the scale factor is proposed in [7], [8], and [9]. Although this operation seems beneficial for DE in some specific cases (noisy problems), in our opinion it leads to excessive random search within the decision space, possibly slowing the optimization down significantly in high-dimensional problems. Conversely, the probabilistic update of the scale factor proposed in [2] seems to be an effective alternative for handling complex and multivariate functions. As a matter of fact, SACPDE tends to outperform the standard DE on a regular basis for most of the test problems analyzed in this chapter.
The inversion of the scale factor described in equation (12.5) and proposed in [5] can be seen as a single-step local search which detects the most promising search directions on the basis of an estimate of the gradient. Thus, for high-dimensional problems, the limited set of DE moves is enlarged by means of a randomized update of the scale factor and by a knowledge-based correction of this parameter during the algorithmic search. The scale factor local search, originally proposed in [21] and here applied to LSPs, is a further step in this direction. As mentioned above, the scale factor local search is independent of the number of variables and is thus suitable for highly multivariate problems. In addition, the SFHC integrated into the framework has the crucial role of offering DE an alternative move: the selection of the most suitable scale factor for a specific offspring generation. This move should lead towards the generation of promising offspring, significantly contributing to the search for more promising solutions during the subsequent generations. This effect can be easily visualized in Fig. 12.6(c), where the improvements appear to be due not only to the application of the local search but also (and mainly) to the presence of the individuals generated during the local search while the global search is performed.
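As an illustrative sketch (not the chapter's exact SFHC: the hill-climb step size, the number of trials, and the rand/1/bin mutation strategy are assumptions introduced here), the idea of searching over the scale factor alone can be written as:

```python
import random

def sfhc_offspring(population, i, fitness, F0=0.7, CR=0.9, step=0.1, trials=3):
    """Scale Factor Hill-Climb sketch: search over the scalar F only.

    For one target individual i, generate rand/1/bin offspring with the
    current F, then hill-climb F itself (a one-dimensional search,
    independent of the problem dimensionality) and keep the best
    offspring found.
    """
    n = len(population[0])
    a, b, c = random.sample([x for x in range(len(population)) if x != i], 3)

    def offspring(F):
        mutant = [population[a][d] + F * (population[b][d] - population[c][d])
                  for d in range(n)]
        jrand = random.randrange(n)
        return [mutant[d] if (random.random() < CR or d == jrand)
                else population[i][d] for d in range(n)]

    best_F, best = F0, offspring(F0)
    for _ in range(trials):                 # hill-climb on F alone
        for cand_F in (best_F - step, best_F + step):
            trial = offspring(cand_F)
            if fitness(trial) < fitness(best):
                best_F, best = cand_F, trial
    return best
```

The point of the sketch is that the local search operates on a single scalar, so its cost does not grow with the number of problem variables.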
Finally, the population size reduction proposed in [1] plays a different, but nevertheless important, role. Although this component does not explicitly offer alternative search moves, it progressively narrows the space where the search is performed by eliminating the individuals characterized by a poor performance. This makes the algorithm more exploitative and thus reduces the risk of stagnation. In other words, this component does not help to detect the global optimum in a LSP but is fundamental for quickly improving upon the obtained results after the exploratory procedure is complete. To give an analogy, the progressive reduction in the population size is similar to the progressive increase in selection pressure in Genetic Algorithms. Following a different analogy, this mechanism is similar to a cascade algorithm composed of as many algorithms as the number of stages Ns, see equation (12.7). The search of each algorithm is progressively focused on a smaller decision space once promising search directions have been detected.
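The staged behaviour can be sketched as follows; the equal-length stages and the halving rule are illustrative assumptions, since equation (12.7) is not reproduced in this excerpt:

```python
def reduction_schedule(initial_np, n_stages, total_evals):
    """Sketch of a staged population-size schedule: the evaluation budget
    is split into n_stages equal stages and the population is halved at
    each stage boundary (halving is an illustrative choice, not the
    chapter's exact rule)."""
    stage_len = total_evals // n_stages
    schedule = []
    np_cur = initial_np
    for s in range(n_stages):
        schedule.append((s * stage_len, np_cur))   # (start evaluation, pop size)
        np_cur = max(4, np_cur // 2)               # rand/1 needs at least 4 individuals
    return schedule
```

For example, `reduction_schedule(100, 4, 100000)` yields four stages with population sizes 100, 50, 25, and 12, each acting like one algorithm in the cascade described above.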
The combination of these algorithmic components appears to be very beneficial
for LSPs and helpful in improving the performance of a standard DE.
12.4 Conclusion
This chapter proposes a novel Computational Intelligence algorithm for real-valued parameter optimization in high dimensions. The proposed algorithm employs a local search on the scale factor of a DE framework in order to control the generation of high-performance offspring solutions. The DE framework also includes self-adaptive parameter control and automatic re-sizing of the population.
It should be remarked that the proposed memetic algorithm performs the local search on the scale factor, and thus on a single parameter, regardless of the dimensionality of the problem. This kind of hybridization seems to be very efficient in enhancing offspring generation and to have a dramatic impact on stagnation prevention in the Differential Evolution framework. More specifically, the improved solutions seem to be beneficial in refreshing the genotypes and assisting the global search during the optimization process.
Numerical results show that the algorithmic behaviour in 100 dimensions is very promising and that the scale factor local search leads to good results in terms of robustness over various optimization problems. The results in 500 and 1000 dimensions show that the standard DE is much affected by the curse of dimensionality. SACPDE and jDEdynNP-F achieve good performance in various cases but fail, for some test problems, to detect competitive solutions. On the contrary, the proposed DELS appears to be competitive in all the problems analyzed in this chapter, as the Q-tests prove. To be specific, for some test problems DELS displays performance competitive with the other algorithms considered in this study, while in other cases it significantly outperforms them.
In summary, the scale factor local search in DE frameworks seems to be a powerful component for handling LSPs and appears to be very promising in terms of robustness, notwithstanding the complexity of the fitness landscape and the high dimensionality characterizing the problem. In this sense, the proposed logic can potentially be very useful for various real-world applications.
Acknowledgement
This work is supported by the IEEE Computational Intelligence Society, Walter J. Karplus Grant, and by the Academy of Finland, Akatemiatutkija 00853, Algorithmic Design Issues in Memetic Computing. The second author would also like to acknowledge the financial support from the FP6 Marie Curie EST project COFLUIDS, contract number MEST-CT-2005-020327.
References
1. Brest, J., Maucec, M.S.: Population size reduction for the differential evolution algorithm. Applied Intelligence 29(3), 228–247 (2008)
Chapter 13
13.1 Introduction
Electric power distribution network planning and operation has been the subject of much of the research conducted on network optimization in power systems since the 90s. The problem has been formulated in many different ways, but its solution always relies on computationally expensive optimization approaches [1]–[6]. Realistic formulations lead to large-scale combinatorial problems where the objective function and constraints cannot be expressed analytically. We have been working on several instances of the problem since the early 90s and have succeeded in deploying industrial applications that solve such problems with evolutionary-based algorithms since 1997 [7]–[10]. In the following we state the network planning and operation problems, focusing on the problem aspects that lead to the main optimization difficulties.
Pedro M.S. Carvalho · Luis A.F.M. Ferreira
Instituto Superior Tecnico, Technical University of Lisbon,
Av. Rovisco Pais, 1049-001 Lisbon
e-mail: pcarvalho@ist.utl.pt, lmf@ist.utl.pt
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 325–343.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2010
Both network planning and operation problems are hard optimization problems
but for different reasons. Distribution network planning consists in choosing a new
distribution system from a set of possible distribution systems so as to meet the
expected load profile in a better way, more reliably, and with fewer losses. The new
distribution system is a plan, a plan to be carried out by project services if the plan
comprises acquisition or installation of new equipment or a plan to be carried out
by dispatch services if the plan involves only changes in the network configuration
(i.e., switching operations). If one can define criteria to measure the goodness of a
plan, then there will be one plan that ranks higher than the others; that plan will be the optimal distribution plan. Finding such a plan is computationally difficult because:
1. The number of possible plans is very large, as distribution networks have thousands of nodes and thousands of branches, and new investments in one network area have a considerable impact on neighboring areas (see Fig. 13.1 for an illustration of a medium-voltage distribution network);
2. The criterion to measure the goodness of a plan is complex. Plan analysis involves complex judgment that is usually impossible to express analytically; e.g., criteria involve reliability and security analyses which must be carried out by simulation for each candidate plan.
Distribution network operation consists in several different network dispatch activities. The most computationally demanding activity is to follow contingency conditions by attenuating the effects of the contingency and leading the distribution system to a satisfactory point of operation (when possible). For a given contingency, the problem consists in selecting and sequencing a set of switching operations to restore power in a secure and prompt manner. The problem is dynamic. The switching operations require a substantial computational effort to be sequenced in a secure and optimal manner. The sequence must be investigated in order to keep the network radial, minimize customer outage costs, and avoid violating voltage and branch capacity constraints during the several reconfiguration stages [11], [12]. The dynamic restoration problem can be addressed in two phases: (i) in the first phase a network optimization approach finds the post-contingency final configuration; and (ii) in the second phase an optimal sequencing approach finds the order of the switching operations that best changes the original network configuration into the post-contingency final configuration. Finding the post-contingency final configuration is a hard optimization problem because the network optimization objective is twofold: finding a secure configuration is not enough - one must find one that does not involve many switching operations, in order to be able to promptly restore power to the maximum number of customers [1].
Both planning and operation problem solutions rely upon network optimization. Network optimization is a broad area of research. In this chapter we address a particular type of network optimization: radial network optimization. The chapter is organized as follows. In Sect. 13.2 we formulate the planning and operation problems as network optimization problems. In Sect. 13.3 we present the evolutionary solution approach together with the necessary framework to deal effectively with the network topology constraints. In Sect. 13.4 we present application examples and use these to discuss implementation practicalities. Section 13.5 concludes the chapter.

Fig. 13.1 Geographic representation of a medium-voltage distribution network. Circles represent substations (feeding points) and triangles low-voltage transformation stations (load points)
In normal operation, each node of the graph is connected to a single power delivery point through a single path. The operating network configuration is radial and
connected. Thus, from the topology perspective the operating configuration of the
network can be represented by a spanning-tree T of the graph G. See Fig. 13.2 for
an illustration of the relationship between the electrical network topology and the
graph concepts.
Fig. 13.2 Schematic representation of an electrical network (upper diagram), and its correspondent graph and spanning tree solution (lower diagram). The figure shows a small-scale
network with two power delivery points (busbars a and i), six load points (busbars b, c, d, e, g,
and h), and a connecting point (busbar f ). The dashed lines identify the network branches
not used for power flow purposes. In the graph representation, the two delivery points are represented by a single node (the tree root a–i), the spanning-tree arcs are represented by
solid lines, and the co-tree arcs are represented by dashed lines
Many operating configurations can be found for the same network infra-structure.
The co-tree arcs of the graph can be used to change the operating topology so as
to improve its performance. It may also happen that even the optimal operating
topology will not be satisfactory. In such case, the network infra-structure must be
upgraded or expanded, thus leading to investment costs in new cables and switching
busbars. The problem of finding the optimal operating network configuration can be
formulated as in the following:
(P)    min f(T),   T a spanning tree of G
Where,
f : Operating cost and investment cost function;
T: Spanning tree of G;
G: Graph of the physical network infra-structure.
Problem (P) falls into general network problem formulations. It can be stated as connecting all nodes by selecting a spanning-tree T (out of graph G) to minimize f. The specificity of this problem is that the objective function is non-analytical. The operating costs must involve at least efficiency and reliability costs.
Efficiency is determined by computing the electrical losses in the network cables (Joule losses) and transformers (Joule, Foucault, and hysteresis losses), which must be obtained after finding the network node voltages and branch currents. The voltages and currents depend on loads and are obtained by running an AC power flow given the network configuration. The AC power flow problem is non-linear and is usually solved by Newton-like algorithms.
Reliability must be obtained by simulation analysis of possible faults [2, 3].
Faults in distribution networks trigger one or more breakers upstream of the fault. The opening of a breaker is followed by a sequence of switching actions to isolate the fault and restore power to the downstream load points (customers). Some switching actions are automatic, others are manual; the manual switching can be remote-controlled or operated on site. Some switching actions may cause additional
customer interruptions. Reliability is a measure of the interruption impact. Several
reliability indices can be defined. A popular index is the value of the expected Energy Not Supplied (ENS). Such value is a function of (i) the number of faults, (ii)
the chosen sequence of switching actions and their operating times, and (iii) the load
demand of the customers interrupted during the sequence. The function is complex.
Instead of expressing it analytically or even formulating it mathematically, we describe it as follows.
A fault in a line-section leads to the automatic opening of the feeder breaker. If
the fault is fugitive, the feeder is usually reclosed successfully in very short time and,
therefore, the energy that is not supplied is negligible (very short interruptions are
usually not included in system average reliability indices). If the fault is persistent,
then either the breaker reopens the feeder or some line-section upstream automatic
device isolates the fault from the feeder, in which case the breaker recloses. The
breaker or the automatic device stays open until the faulted line-section is isolated.
After fault isolation, the breaker or the automatic device may be closed so as to
supply the customers upstream from the fault.
After fault isolation, the network may be reconfigured to feed the customers downstream of the fault. When normally open devices exist, (one of) these may be closed to feed the downstream customers. However, if the downstream demand is high, the backup circuit prompted by closing the normally open device may not
be able to feed all customers without circuit overloads. If overloads appear, some
demand must be discarded. If the backup circuit cannot feed all customers, the discarded demand stays out of service during fault repair time. After repairing the
cable or line, the network may return to its original configuration without additional
interruptions.
The expected value of the ENS is obtained by summing the contributions of every
switching step, for every possible line-section fault, and multiplying the result by the
line-section fault probability.
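The computation just described can be sketched as follows; the data layout (a per-fault probability and per-step interrupted demand and duration) is an assumption introduced for illustration, since the chapter deliberately avoids a formal expression:

```python
def expected_ens(faults):
    """Expected Energy Not Supplied, as described above: for each possible
    line-section fault, sum the energy not supplied over its switching
    steps (interrupted demand x step duration) and weight the result by
    the line-section fault probability.

    `faults` maps a line-section id to (fault_probability, steps), where
    each step is (interrupted_demand_kW, duration_h)."""
    return sum(p * sum(demand * hours for demand, hours in steps)
               for p, steps in faults.values())
```

In practice the per-step contributions come from the fault simulation, not from a static table; the sketch only shows how they are combined.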
Synthetically, the objective function f can be defined by the sum:
f (T) = i(T) + e(T) + r(T)
(13.1)
Where,
i: Investment cost
e: Efficiency cost (from AC power flow losses)
r: Reliability cost (from fault simulation)
Both e and r are strongly dependent on the configuration T as losses are quadratic
functions of branch-currents and ENS is strongly dependent on branch failure rates
and neighbor switching capabilities.
Problem (P) can become more complex [6]. Increased complexity results from: (i) node information (load, for instance) being considered a stage-dependent variable, in which case the problem becomes dynamic, as decisions must be scheduled to form a sequence of network solutions; and (ii) information for future stages being considered uncertain, which happens when important future investments stand a chance of being impossible to realize, or some important future information is unknown. If uncertainty is also considered, the problem becomes a stochastic-dynamic problem.
The more complex versions of the network optimization problem can, however, be decoupled into sequences of (P)-like problems [13]. Here, we will address the solution of (P)-like problems with evolutionary algorithms.
The binary array representation is very poor, as it defines a solution domain much larger than the space of spanning trees. Note that the domain of the binary array representation has size 2^m, where m is the number of graph arcs, and the space of spanning-trees is much smaller than that [15]. In such circumstances, the canonical recombination operators would hardly find a feasible solution in the binary string domain.
Another popular operator, proposed by several authors under different names, is the so-called edge-set encoding [16]–[18], which consists in superimposing both parents to create a subgraph of G and then randomly generating two spanning trees of that subgraph as possible offspring. This is a simple idea that guarantees offspring feasibility, but it is not effective. The tree-generating process (Prim-like) is time-consuming and too random, which leads to slow convergence to the optimum. We state such operator in the following under the name of Tree Generation.
TG Recombination. Let H, Ti, and Tii be subgraphs of G, and let G be defined by the pair (A, N), where A is the set of arcs and N the set of nodes.
Step 1: Build H = Ti ∪ Tii = (Ai ∪ Aii, N) as the subgraph of G with the arcs of the two spanning-trees only.
Step 2: Randomly generate the offspring trees Ti and Tii as spanning-trees of H.
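As a sketch, the TG operator can be written as follows; the arc representation and the Prim-like randomized growth are illustrative assumptions (the operator is stated here only to be criticized, so details are not fixed by the text):

```python
import random

def tg_recombination(arcs_i, arcs_ii, nodes, root):
    """Tree Generation recombination sketch: superimpose the two parents'
    arc sets (Step 1: H = Ti U Tii) and grow one random spanning tree of
    the union by Prim-like randomized growth (Step 2)."""
    union = list(set(arcs_i) | set(arcs_ii))          # arcs of H
    adj = {n: [] for n in nodes}
    for u, v in union:
        adj[u].append(v)
        adj[v].append(u)
    tree, visited = [], {root}
    frontier = [(root, v) for v in adj[root]]
    while frontier:
        # pick a random frontier arc, so the grown tree is a random one
        u, v = frontier.pop(random.randrange(len(frontier)))
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v))
        frontier.extend((v, w) for w in adj[v] if w not in visited)
    return tree                                        # one random offspring
```

Running it twice yields the two offspring; the randomness of the growth is exactly what the text criticizes as "too random".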
In the following we propose a more natural, problem-related representation of spanning trees. In our approach we propose to recombine by interchanging paths between solutions. The idea is to take the information to be interchanged between solutions as sub-networks of each solution. As sub-networks, connectivity and radiality can be ensured and meaningful information gets propagated along generations [5].
The main idea behind our recombination approach will be presented together with the theoretical results that allow going into the implementation details. We start by defining the spanning tree genotype space as a partially ordered set of nodes (A, ≤), i.e., a set where:
(i) a ≤ a;
(ii) a ≤ b and b ≤ a implies a = b;
(iii) a ≤ b and b ≤ c implies a ≤ c, for every a, b, c ∈ A [20].
An element a is called the direct precedent of the element b in A iff: (i) a ≠ b; (ii) a ≤ b; (iii) there is no element c ∈ A such that a ≤ c and c ≤ b. The relation is denoted by b ← a. Similarly, an element b is called the direct follower of an element a in A iff a is one of its direct precedents. The elements of the set are the nodes of the spanning-tree, and the order relation a ≤ b denotes that node a precedes node b on the path from a to b. Spanning trees have a specific property as partially ordered sets: each tree element is directly preceded by one and just one element; an exception is made for the first element (the root), which is not preceded.
Then, we define possible changes as changes that do not violate the order as defined in properties (i)–(iii). We call these consistent changes. Take Lemma 1 to identify non-consistent changes.
Lemma 1. A direct-precedence change b ← a taken over a tree-ordered set T violates order (i)–(iii) iff b ≤ a.
Proof. Sufficiency: If b ≤ a there exists in T a direct ordered sequence like a ← x ← y ← … ← b. A change b ← a forces a circulation a ← x ← y ← … ← b ← a, and thus an order violation (property (ii)).
Necessity: If b ≤ a does not apply, either (1) a ≤ b, or (2) no order exists between a and b. In case (1), a change b ← a eliminates the order relationship between every x : a ≤ x ≤ b and y : b ≤ y by eliminating the existing b-precedence. The order of the x-elements is not changed: the x-elements remain as followers of a. The same applies to the y-elements: they remain as followers of b and, by the change b ← a, also followers of a. In case (2), a change b ← a forces b to become a follower of a, and thus every y : b ≤ y becomes a follower of a, instead of being a follower of the existing p(b).
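Lemma 1 translates directly into a simple ancestor check; the predecessor-map representation below is an assumption introduced for illustration:

```python
def is_consistent(precedent, a, b):
    """Lemma 1 as a test: the direct-precedence change b <- a (make a the
    direct precedent of b) violates the tree order iff b <= a, i.e. iff b
    already precedes a. `precedent` maps each node to its direct
    precedent; the root maps to None."""
    node = a
    while node is not None:        # walk from a up to the root
        if node == b:
            return False           # b <= a: the change would close a cycle
        node = precedent[node]
    return True
```

On the tree Tii of the worked example below, the change 2 ← 1 passes this check, while a change such as 8 ← 6 would fail, since 8 precedes 6 in Tii.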
Lemma 1 allows classifying direct-precedence changes as consistent or non-consistent. When consistent, direct-precedence changing is a simple way to change tree information while guaranteeing network radiality and connectivity. Simplicity is important but is not enough: the information to be interchanged should also be meaningful. One simple and meaningful information structure of a network is a path between two nodes of the spanning-tree. We propose to interchange path information between solutions as a recombination operator.
Paths can be interchanged between spanning trees if they do not enclose inconsistencies. A path is not just a set; it is a partially ordered set, and thus precedence-change consistency must be tested orderly. Path precedence relationships must be submitted and tested, starting from the path's smallest element to the largest one, in the order defined by the path itself. The following algorithm summarizes the path interchange approach:
Path Interchange Algorithm. (Submit a path P to a tree T.)
Name a as the path's smallest element. Denote by F(x) the set of direct followers of x in the path P. Consider a set E of tree elements, and start by setting it to E = F(a).
Step 1: …
Step 2: …
Fig. 13.3 Two spanning trees Ti and Tii (upper and lower figures) and the complementary
paths between nodes 1 and 6, respectively Pii and Pi (in dashed line)
Submit Pi = {2 ← 1, 6 ← 2} to Tii = {5 ← 1, 8 ← 1, 4 ← 5, 6 ← 8, 7 ← 8, 2 ← 6, 3 ← 7}. Node 1 is the path Pi's smallest element, and F(1) = {2}. The change 2 ← 1 is a consistent change in Tii, as node 2 is also a descendant of node 1 in Tii. Note that 6 ← 2 is not consistent in Tii at this stage. By updating the tree with 2 ← 1 one gets Tii = {5 ← 1, 8 ← 1, 4 ← 5, 6 ← 2, 7 ← 8, 2 ← 1, 3 ← 7}, in which 2 ← 6 changes to 6 ← 2. So the change of the second element of the path is no longer necessary. The result of the path submission is shown in Fig. 13.4.
Now submit Pii = {8 ← 1, 6 ← 8} to Ti = {2 ← 1, 5 ← 1, 4 ← 2, 3 ← 5, 6 ← 2, 7 ← 3, 8 ← 7}. Element 1 is the smallest element of the path Pii, and F(1) = {8}. The change 8 ← 1 is consistent, since 8 ≤ 1 does not hold in Ti. Changing the precedence results in the tree update {2 ← 1, 5 ← 1, 4 ← 2, 3 ← 5, 6 ← 2, 7 ← 3, 8 ← 1}. The follower of 8 in Pii is 6, F(8) = {6}. The change 6 ← 8 is again a consistent one, as there is no order relationship between node 6 and node 8 in Ti. The change results in the spanning-tree {2 ← 1, 5 ← 1, 4 ← 2, 3 ← 5, 6 ← 8, 7 ← 3, 8 ← 1}. The result of the path submission is shown in Fig. 13.4.
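A simplified sketch of the submission loop follows. It re-attaches each consistent change directly and omits the chain-reversal bookkeeping visible in the first submission of the worked example, so it approximates the operator rather than implementing it faithfully; for the second submission above it produces the same final tree:

```python
def submit_path(precedent, path):
    """Submit a path P, given as a list of (b, a) direct-precedence
    relations ordered from the path's smallest element upward, to a tree
    given as a precedent map (root maps to None). Each change b <- a is
    tested with Lemma 1 and applied only if consistent. Simplified: the
    chapter's operator also reverses the intermediate precedence chain,
    which this sketch omits."""
    tree = dict(precedent)             # work on a copy of the parent tree
    for b, a in path:
        if tree.get(b) == a:
            continue                   # relation already holds
        node, consistent = a, True
        while node is not None:        # Lemma 1: inconsistent iff b <= a
            if node == b:
                consistent = False
                break
            node = tree[node]
        if consistent:
            tree[b] = a                # re-attach b under a
    return tree
```

Submitting Pii = {8 ← 1, 6 ← 8} to Ti with this sketch reproduces the final spanning-tree {2 ← 1, 5 ← 1, 4 ← 2, 3 ← 5, 6 ← 8, 7 ← 3, 8 ← 1} of the example.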
Fig. 13.4 Two descendant spanning trees that result from path interchange between node 1
and node 6 and the branches that were lost in each spanning tree (dotted line). The figures
represent the changed spanning trees Ti and Tii (upper and lower figures). The originals are
represented in Fig. 13.3
Lamarckian Hybrid
Make t = 0;
Initialize the population p(0) at random;
Evaluate p(0);
Repeat Steps 1 to 5 (until close to genetic saturation):
Step 1: t ← t + 1
Step 2: Select the fittest from p(t − 1) to build p(t)
Step 3: Recombine p(t) and mutate p(t)
Step 4: Improve p(t) with local heuristics
Step 5: Evaluate p(t)
Some implementation difficulties spring out when trying to implement Step 4. The difficulties lie in deciding which solutions will be improved and to what extent they will be improved. To answer this, note that: (i) if too many solutions are to be improved, the GA process will become too slow (and what would be the merit of improving very bad solutions if they will barely survive after all); and (ii) if solutions are improved to a great extent, the population will lose diversity, as solutions will tend to be very much alike (each one similar to its closest local optimum).
Some authors have proposed empirical rules for undertaking Lamarckian steps (e.g., the Rule of 10); others have proposed theories for coordinating global and local search [19]. Here, we present a very simple but effective approach to coordinating the local search effort with the global search effort. We call it Diversity Driven hybridization. Despite being very simple to implement, the presented coordination approach observes population diversity and relative solution quality. The approach is summarized in the following two steps.
Diversity Driven Hybridization
Step 1: Given a population of solutions to be improved (Step 4), identify the subset of solutions that have at least one clone (a copy) in p(t). Name this set q(t) and remark that p(t) \ q(t) has the same genetic material as p(t).
Step 2: Use local search to improve some solutions of q(t). Randomly choose (i) the solutions of q(t) to be improved, e.g., with a fixed probability, as well as (ii) the number of local improvements to make in each solution.
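The two steps can be sketched as follows; the fixed improvement probability and the clone-group bookkeeping are illustrative choices (the depth of each local improvement, also randomized in Step 2, is folded into the `local_search` callable here):

```python
import random

def diversity_driven_improve(population, local_search, p_improve=0.5, rng=random):
    """Diversity Driven hybridization sketch: local search effort is spent
    only on q(t), the clone copies, so that p(t) \\ q(t) keeps the full
    genetic material of p(t). Each clone copy is improved with a fixed
    probability, as in Step 2."""
    seen = set()
    new_pop = []
    for sol in population:
        key = tuple(sol)
        if key not in seen:
            seen.add(key)              # one representative stays untouched,
            new_pop.append(sol)        # so every genotype survives
        elif rng.random() < p_improve: # sol is in q(t): a clone copy
            new_pop.append(local_search(sol))
        else:
            new_pop.append(sol)
    return new_pop
```

Because only duplicates are touched, the step can never reduce the number of distinct genotypes in the population, which is the diversity argument made above.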
Where,
f : Operating and investment cost function
y : Co-tree arc of Y
Y : Fundamental cycle of the graph G
G : Graph of the physical network infra-structure
Problem (Q) is very simple when compared to (P). The fundamental cycles of the graph G with respect to a spanning tree T, defined by the co-tree arcs, represent the operating network open-loops. For a single such cycle, the subproblem objective function is often convex in the space of possible co-tree arcs [20], and can be solved approximately very easily.
Two questions seem pertinent about the proposed hybridization procedure: (i) what should be the number of fundamental cycle changes to be operated in each solution: a single one, y ∈ Y, or a generalized operation, y ∈ G \ T (single vs. multiple changes); and (ii) why should local optimization be performed on non-diverse solutions only. The rationale is the following:
1. Local optimization must not perform exhaustive modifications on each solution, as that would lead to a dramatic lack of diversity;
2. Above-average solutions are more likely to get copies in the descendant generation, since selection is a competitive mechanism;
3. Local modifications performed on above-average solutions are therefore more likely to propagate to the descendants.
The proposed hybridization has the additional advantage of guaranteeing local optimality, which is a very important aspect for industry applications. See the following
result where optimality is related to genetic saturation.
Lemma 2. Genetic convergence guarantees (Q)-optimality.
Proof. Genetic convergence presumes the stable absence of diversity, i.e., p(t) = q(t). If a solution is not (Q)-optimal, then it is possible to perform a single loop reconfiguration to improve it, and, once improved, selection will guarantee its propagation to the descendants. So, for a non-empty random subset q, (Q)-suboptimal solutions are unstable.
Knowing how to change is not an easy task. We present the results of our experience
and discuss their limitations.
We start by solving a network operation problem. We chose to optimize the operating configuration of the distribution network represented in Fig. 13.1. The network has nine substations, 1613 nodes, of which 768 are load points, and 1667 branches, organized into 35 different feeders (sub-trees). The network has a 300 MVA installed capacity for a peak load of 185 MVA. High-voltage losses are 1 GWh/year and medium-voltage losses are 6 GWh/year. Yearly operational costs include losses costs and reliability costs, amounting to about 1 M€/year.
We start from the actual configuration and generate a population of 120 random solutions by undertaking a random number of fundamental cycle changes over the actual spanning-tree configuration. Changes are made sequentially for each individual configuration. The generated population has a cost distribution that varies between 0.9 M€ and 3.4 M€ (see Fig. 13.5).
Fig. 13.5 Operating cost distribution for generations 1, 5, and 10. In generation number 1 most of the individuals have costs around 1 M€; in generation number 5 there is already a large distribution of the costs, some still with high values but many already below 800 k€; and after 10 generations most are already below 650 k€. The initial cost, corresponding to the actual configuration, is shown in the figure as a circle
Then, we select the better solutions from this population with binary tournaments
without elitism and recombine 80% of the selected configurations by interchanging
paths between spanning-trees. We do not mutate. From the set of recombined configurations we identify clones (repeated solutions) and modify these by undertaking fundamental cycle changes to solve (Q)-like subproblems. The modified population is then evaluated by computing f before going again into the selection process.
This sequence continues for 20 generations, until genetic saturation is achieved (see Fig. 13.6). The result obtained has a total operating cost of 650 k€, which represents a cost reduction of 35%. This has been possible in so few generations because the operators are very effective for the problem at hand. Parameters are also important for effectiveness. How many paths did we exchange when recombining? How many cycle changes did we undertake when improving clones? These are important questions that experience can answer.
Fig. 13.6 Generation evolution of the operating cost and the corresponding number of repeated individuals that are locally optimized by solving Q-subproblems
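The generation loop just described (binary tournament selection without elitism, 80% recombination, no mutation, clone repair instead) can be sketched as follows. This is only an illustrative skeleton: `recombine` and `cycle_change` stand for the problem-specific spanning-tree operators and are placeholders here.

```python
import random
from collections import Counter

def evolve_generation(pop, cost, recombine, cycle_change, p_cross=0.8, rng=random):
    # binary tournament selection, no elitism
    selected = []
    for _ in range(len(pop)):
        a, b = rng.choice(pop), rng.choice(pop)
        selected.append(a if cost(a) <= cost(b) else b)
    # recombine ~80% of the selected pairs by exchanging paths
    offspring = []
    for i in range(0, len(selected) - 1, 2):
        x, y = selected[i], selected[i + 1]
        if rng.random() < p_cross:
            x, y = recombine(x, y)
        offspring += [x, y]
    offspring += selected[len(offspring):]  # odd leftover, if any
    # no mutation: clones are modified by fundamental-cycle changes instead
    counts = Counter(offspring)
    offspring = [cycle_change(s) if counts[s] > 1 else s for s in offspring]
    return sorted(offspring, key=cost)  # evaluate f before the next selection
```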
Before answering the questions on the parameters used, let us solve a network planning problem, i.e., an optimization problem that also involves investment possibilities. That problem is similar to the operation problem but harder. It is harder because the investment costs significantly increase the spanning-tree building block cost variance, the so-called collateral noise [21]. A small change, either in a path or in a cycle, might be responsible for a big difference in performance. That makes the evolutionary algorithm's job much more difficult.
For the planning problem, we use the same distribution network as before (the network represented in Fig. 13.1), but now with some of the nodes and some of the branches as new possible investments. The starting investment plan involves investment costs of 330 k€, high-voltage losses of 1 GWh/year and medium-voltage
losses of 8 GWh/year. Part of the network shows serious under-voltage and over-current problems, which are penalized. Penalties related to electrical constraints amount to 315 k€. Yearly operational costs include losses costs and reliability costs, amounting to about 2 M€/year.
Like before, we start from the actual configuration plan and generate a population of 120 random solutions. The generated population has a cost distribution that varies between 1.6 M€ and 7.2 M€ (see Fig. 13.7). The evolutionary algorithm now evolves for 37 generations until genetic saturation is achieved (see Fig. 13.8). The process is longer than before (for the operation problem) and also more sensitive to the algorithm parameters. We address this in the following.
The evolution of the process depends on the classical GA parameters, such as
crossover probability, population size, etc., but also on other parameters that are
required by our specific approach. These parameters are (i) the number of paths exchanged, np, when recombining two individuals, and (ii) the number of cycle changes, nc, undertaken in each repeated solution (for each clone).
Fig. 13.7 Operating and investment cost distribution for generations 1, 10, 20 and 35. In generation number 1 most of the individuals have costs around 2 M€; in generation number 10 there is already a large number of individuals with costs around 1.6 M€; and after 20 generations most are already around 1.35 M€. The population saturates at an optimum cost below 1.2 M€ after generation number 33
Fig. 13.8 Generation evolution of the investment and operating cost and the corresponding
number of repeated individuals that are locally optimized by solving Q-subproblems
Our experience with distribution networks leads to the conclusion that these numbers should be random but bounded. Upper and lower bounds can be defined for the number of paths to be exchanged, say npU and npL, and for the number of cycles to be changed, say ncU and ncL. The bounds can be defined as functions of the number of feeders, nf (independent sub-trees). Simple functions can be used with good results. In the cases presented previously we used the bounds given in Table 13.1.
Table 13.1 Bounds for the number of paths to be exchanged and the number of cycles to be changed

Name    Value
npL
npU     0.2 nf
ncL
ncU     0.15 nf
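Drawing the two random, bounded counts can be sketched as below. Only the upper-bound fractions of nf survive in the text, so the lower bounds of 1 used here are an assumption, and the function name is illustrative.

```python
import random

def random_counts(n_f, rng=random):
    """np: paths to exchange, nc: cycle changes; random but bounded, with
    upper bounds 0.2*n_f and 0.15*n_f (lower bounds of 1 are an assumption)."""
    np_u = max(1, round(0.2 * n_f))   # paths upper bound
    nc_u = max(1, round(0.15 * n_f))  # cycles upper bound
    return rng.randint(1, np_u), rng.randint(1, nc_u)
```

For the 35-feeder network used above this gives at most 7 path exchanges and 5 cycle changes per operation.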
13.5 Summary
In this chapter we have presented an evolutionary approach to the electric power
distribution network planning and operation problems. The problems have been formulated as large-scale optimization problems and addressed by specially designed evolutionary hybrids. The designed evolutionary operators and hybridization process
have been presented and their role in optimality discussed. Application examples
have been provided to support the discussion and illustrate critical implementation
practicalities.
References
1. Carvalho, P.M.S., Ferreira, L.A.F.M., Barruncho, L.M.F.: Optimization Approach to Dynamic Restoration of Distribution Systems. International Journal of Electrical Power & Energy Systems 29(3), 222–229 (2007)
2. Carvalho, P.M.S., Ferreira, L.A.F.M.: Distribution Quality of Service and Reliability Optimal Design: Individual Standards and Regulation Effectiveness. IEEE Transactions on Power Systems 20(4) (November 2005)
3. Carvalho, P.M.S., Ferreira, L.A.F.M.: Urban Distribution Network Investment Criteria for Reliability Adequacy. IEEE Transactions on Power Systems 19(2) (May 2004)
4. Carvalho, P.M.S., Ferreira, L.A.F.M.: On the Robust Application of Loop Optimization Heuristics in Distribution Operations Planning. IEEE Transactions on Power Systems 17(4), 1245–1249 (2002)
5. Carvalho, P.M.S., Ferreira, L.A.F.M., Barruncho, L.M.F.: On Spanning-Tree Recombination in Evolutionary Large-Scale Network Problems: Application to Electrical Distribution Planning. IEEE Transactions on Evolutionary Computation 5(6), 613–630 (2001)
6. Carvalho, P.M.S., Ferreira, L.A.F.M., Lobo, F.G., Barruncho, L.M.F.: Optimal Distribution Network Expansion Planning Under Uncertainty by Evolutionary Decision Convergence. International Journal of Electrical Power & Energy Systems, Special Issue on PSCC 1996, 20(2), 125–129 (1998)
7. Ferreira, L.A.F.M., Carvalho, P.M.S., Barruncho, L.M.F.: An Evolutionary Approach to Decision-Making in Distribution Systems. In: Proceedings of the 14th International Conference and Exhibition on Electricity Distribution (CIRED 1997), Birmingham, UK (1997)
8. Carvalho, P.M.S., Ferreira, L.A.F.M., Barruncho, L.M.F.: Hybrid Evolutionary Approach to the Distribution Minimum-Loss Network Configuration. In: Proceedings of the Simulated Evolution And Learning (SEAL 1998) Special Session, Canberra, Australia (1998)
9. Ferreira, L.A.F.M., Carvalho, P.M.S., Jorge, L.A., Grave, S.N.C., Barruncho, L.M.F.: Optimal Distribution Planning by Evolutionary Computation: How to Make it Work. In: Proceedings of the Transmission and Distribution Conference and Exposition (TDCE 2001), Atlanta, USA (2001)
10. Mira, F., Jorge, L.A., Quaresma, E., Ferreira, L.A.F.M., Carvalho, P.M.S.: New Technologies for Distribution Planning: Optimal Design for Efficiency, Reliability and New Regulation Criteria. In: Proceedings of XI ERIAC CIGRE, Paraguay (2005)
11. Carvalho, P.M.S., Ferreira, L.A.F.M., Rojao, T.L.: Dynamic Programming for Optimal Sequencing of Operations in Distribution Networks. In: Proc. 15th Power System Computation Conf. (PSCC 2005), Liège, Belgium (2005)
12. Carvalho, P.M.S., Carvalho, F.J.D., Ferreira, L.A.F.M.: Dynamic Restoration of Large-Scale Distribution Network Contingencies: Crew Dispatch Assessment. In: IEEE Power Tech 2007, Lausanne, Switzerland (2007)
13. Carvalho, P.M.S., Ferreira, L.A.F.M., Lobo, F.G., Barruncho, L.M.F.: Distribution Network Expansion Planning Under Uncertainty: A Hedging Algorithm in an Evolutionary Approach. IEEE Transactions on Power Delivery 15(1), 412–416 (2000)
14. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn. Springer, New York (1996)
15. Behzad, M., Chartrand, G.: Introduction to the Theory of Graphs. Allyn and Bacon Inc., Boston (1971)
16. Walters, G.A., Smith, D.K.: Evolutionary Design Algorithm for Optimal Layout of Tree Networks. Eng. Optimization 24(4), 261–281 (1995)
17. Dengiz, B., Altiparmak, F., Smith, A.E.: Local Search Genetic Algorithm for Optimal Design of Reliable Networks. IEEE Trans. Evolutionary Computation 1(3), 179–188 (1997)
18. Raidl, G.R., Julstrom, B.A.: Edge Sets: An Effective Evolutionary Coding of Spanning Trees. IEEE Transactions on Evolutionary Computation 7(3), 225–239 (2003)
19. Goldberg, D.E., Voessner, S.: Optimizing Global-Local Search Hybrids. IlliGAL Report No. 99001 (January 1999)
20. Civanlar, S., Grainger, J.J., Yin, H., Lee, S.S.: Distribution Feeder Reconfiguration for Loss Reduction. IEEE Trans. Power Delivery 4(2), 1217–1223 (1988)
21. Goldberg, D.E., Deb, K., Clark, J.H.: Genetic Algorithms, Noise, and the Sizing of Populations. Complex Systems 6, 333–362 (1992)
Chapter 14
Abstract. Many problems formerly considered intractable have been satisfactorily resolved using approximate optimization methods called metaheuristics. These methods use a non-deterministic approach that finds good solutions, despite not ensuring the determination of the global optimum. The success of a metaheuristic depends on its capacity to alternate properly between exploration and exploitation of the solution space. During the process of searching for better solutions, a metaheuristic can be guided to regions of promising solutions by acquiring information about the problem under study. In this study this is done through the use of reinforcement learning. The performance of a metaheuristic can also be improved by using multiple search trajectories, which act competitively and/or cooperatively; this can be accomplished using parallel processing. Thus, in this paper we propose a hybrid parallel implementation of the GRASP metaheuristic and the genetic algorithm, using reinforcement learning, applied to the symmetric traveling salesman problem.
14.1 Introduction
Modeling and resolving complex problems in the world we live in is not an easy
task, given that there are some situations in which it is impossible to build a detailed
João Paulo Queiroz dos Santos · Jorge Dantas de Melo
Adrião Duarte Dória Neto
Department of Automation and Control, Federal University of Rio Grande do Norte
e-mail: {jxpx,jdmelo,adriao}@dca.ufrn.br
Rafael Marrocos Magalhães
Department of Exact Sciences, Federal University of Paraíba
e-mail: rafael@ccae.ufpb.br
Francisco Chagas de Lima Júnior
Department of Computing, State University of Rio Grande do Norte,
College of Science and Technology Mater Christi
e-mail: lima@dca.ufrn.br
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 345–369.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
model for the problem, owing to its high complexity. On the other hand, a process of simplifying such a model leads to the loss of relevant information that may compromise its quality. In addition to the inherent difficulty of building models for these problems, a characteristic of the resolution phase is the need for large-scale computational processing, which, in most cases, leads to these problems being considered intractable. In this context, researchers have dedicated themselves to the development of techniques aimed at facilitating modeling and, mainly, at resolving these problems [13], [11] and [10].
A widely used approach for solving intractable problems has been the use of so-called metaheuristics, which are strategies based on heuristic procedures, mainly applicable to optimization problems, that perform a simplified stochastic search in the solution space [12]. Despite achieving good results without an exhaustive search, metaheuristics do not ensure obtaining the optimal solution of the problem.
The great challenge of a metaheuristic is to maintain the equilibrium between
exploration and exploitation processes. Exploration (or diversification) is used to
allow the solution to escape from the so-called local minima, whereas exploitation
(or intensification) is used to improve the quality of the solution locally, in search of
the overall optimum.
Resolving the dilemma of when to explore and when to exploit is not an easy
task. Thus, many researchers have been involved in seeking improvements that help
the metaheuristics in the exploration and/or exploitation process. In this context a very interesting study [7] was conducted using reinforcement learning, specifically the Q-learning algorithm, as an exploration/exploitation strategy for the GRASP metaheuristic and the genetic algorithm, applied to the traveling salesman problem (TSP). In addition to the explore-or-exploit dilemma, another aspect to consider is the large number of possible solutions that problems such as the TSP present.
This high dimension of the universe of solutions of problems like the TSP generates a large processing demand, which may be met by the use of architectures with parallel processing capacity, able to increase, by some orders of magnitude, the processing power available in single-processor architectures.
The use of parallel processing promotes the development of new algorithms and opens possibilities for exploring aspects of the problem not approached in the usual architectures, such as competition and cooperation [5].
Based on the success obtained by the aforementioned techniques, and motivated by the difficulties of complex problems in the real world, this study proposes the development of hybrid parallel methods using reinforcement learning, the GRASP metaheuristic, and genetic algorithms.
With the use of these techniques together, better efficiency in obtaining solutions is expected. In this case, instead of using the Q-learning algorithm of reinforcement learning only as a technique to generate the initial metaheuristic solution, we intend to use it cooperatively/competitively with the other strategies in a parallel implementation, which will be described in detail in what follows.
where s_t is the current state, a_t is the action performed in state s_t, r_t is the reinforcement signal received after executing a_t in s_t, s_{t+1} is the next state, γ is the discount factor (0 ≤ γ < 1), and α (0 < α < 1) is the learning coefficient. The function Q(s, a) is the value associated to the state-action pair (s, a) and represents how good the choice of this action is in maximizing the accumulated reward function, designated by:

R = Σ_{k=0}^{∞} γ^k r_{t+k+1}    (14.2)
A widely used technique for this choice is the so-called ε-greedy exploration, which consists of choosing the action associated to the highest Q-value with probability 1 − ε + ε/|A(s)|, where |A(s)| corresponds to the number of possible actions that can be executed starting from s. Q-learning was the first reinforcement learning method to display strong evidence of convergence. Watkins [17] showed that if each pair (s, a) is visited an infinite number of times, the Q-value function Q(s, a) will converge with probability one to Q*, for α sufficiently small. As long as the optimal Q-value function Q* is known, an optimal choice of actions can be made according to the expression:

a*(s) = argmax_a Q*(s, a)    (14.3)
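A minimal sketch of the Q-learning update and the ε-greedy choice described above, using a dictionary-backed Q-table; function and variable names are illustrative, not the authors' implementation:

```python
import random

def epsilon_greedy(Q, state, actions, eps, rng=random):
    """With probability eps explore (random action); otherwise exploit
    (the action with the highest Q-value in `state`)."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, s, a, r, s_next, actions_next, alpha=0.1, gamma=0.9):
    """One step of the Q-learning rule:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions_next), default=0.0)
    q = Q.get((s, a), 0.0)
    Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
```

Unvisited pairs default to a Q-value of 0.0, which plays the role of an optimistic-neutral initialization here.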
14.3.1 GRASP-Learning
The GRASP metaheuristic needs to work with good initial solutions. Considering this dependence, the use of the Q-learning algorithm is proposed here as a constructor of initial solutions, in substitution for the partially greedy algorithm generally used.
As its state-transition rule, the Q-learning algorithm uses the ε-greedy strategy mentioned previously, defined by:

π(s') = { a_random                  if v < ε
        { argmax_{a'} Q(s', a')     otherwise    (14.4)

where v is a random value with uniform probability distribution over [0, 1], ε (0 ≤ ε ≤ 1) is the parameter that defines the exploration rate (the lower the value of ε, the lower the probability of making a random choice of action), and a_random is an action randomly chosen amongst the possible actions that can be executed in state s'.
As already mentioned, the Q-learning algorithm will be used as a constructor of initial solutions for the GRASP metaheuristic. Each iteration of the algorithm therefore intends to construct a solution of good quality, since Q-learning will explore the knowledge of the environment (the solution space of the problem) through the use of the matrix of rewards. The matrix of rewards is generated using the distance matrix of each TSP instance, and is computed in the following form:
r(s', a') = M_i / d_ij    (14.5)

where d_ij corresponds to the distance between cities i and j, which compose a route and are represented in the model by the states s and s', respectively, while M_i is the average distance from city i to all the other cities.
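Equation (14.5) can be sketched as below, assuming the distances are given as a full square matrix; the function name is illustrative:

```python
def reward_matrix(dist):
    """r(i, j) = M_i / d_ij, where M_i is the mean distance from city i
    to all the other cities; edges shorter than average get rewards above 1."""
    n = len(dist)
    rewards = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m_i = sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        for j in range(n):
            if j != i:
                rewards[i][j] = m_i / dist[i][j]
    return rewards
```

Normalizing by M_i makes rewards comparable across cities whose typical distances differ in scale.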
The control between exploitation and exploration is made by the parameter ε of the transition rule described in (14.4). The higher the value of ε, the more rarely the Q-learning algorithm will make use of the knowledge of the environment; the lower the value of ε, the less random the choice of actions.
The basic idea of the GRASP-Learning method is to make use of the information contained in the matrix of Q-values as a kind of adaptive memory that allows repeating the good decisions made in previous iterations and avoiding those that were not interesting. Thus, considering for example the traveling salesman problem, the method uses, in each GRASP iteration, the state-action pairs Q(s, a) stored in the matrix of Q-values to decide which visits are promising for the traveling salesman.
The ε-greedy policy is used with the objective of guaranteeing a certain level of randomness, thus avoiding the construction of locally optimal solutions. Fig. 14.1 presents an overview of the GRASP-Learning metaheuristic.
14.3.2 Genetic-Learning
The Genetic-Learning algorithm uses the idea of introducing knowledge of the environment through reinforcement learning. The main focus of this method is to explore the search space in an efficient way by learning the environment of the problem, using the Q-learning algorithm with a genetic algorithm (GA). Making use of a genetic algorithm, the solution search space of a problem can be explored adequately through the generation of an initial population of high fitness in relation to the objective function. Therefore, the genetic algorithm considered here has its initial population generated by the Q-learning algorithm.
Another modification in this method occurs in the crossover operator of the GA. In this operator, one of the parents is taken from the population improved by the action of the operators of the current generation, as in the traditional GA, while the other is generated by the Q-learning algorithm, without any action from the genetic operators. The other operators (selection and mutation) are implemented in the traditional way. Fig. 14.2 presents an overview of the cooperative Genetic-Learning algorithm.
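The modified crossover can be sketched as follows. The text does not specify which crossover variant is used, so the order-preserving fill below is an assumption, chosen because it keeps the child a valid permutation; `q_builder` is a placeholder for the Q-learning tour constructor.

```python
import random

def learning_crossover(parent, q_builder, cut=None, rng=random):
    """One parent comes from the GA population; the other is a fresh tour
    built by Q-learning.  Keep a prefix of the GA parent and fill the rest
    of the cities in the order the Q-built tour visits them."""
    other = q_builder()  # tour generated by Q-learning, no genetic operators
    if cut is None:
        cut = rng.randint(1, len(parent) - 1)
    head = list(parent[:cut])
    tail = [c for c in other if c not in head]  # preserve the Q-tour's order
    return head + tail
```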
The following section describes the parallel hybrid implementation of the GRASP-Learning and Genetic-Learning metaheuristics.
14.4.1 Methodology
As explained in Section 14.2, genetic algorithms work with populations of solutions, GRASP provides a locally optimal solution, and Q-learning a table of Q-values that enables building solutions starting from any point in the table. All the algorithms involved are iterative; that is, the quality of their solutions tends to improve with an increase in the number of iterations.
For a better understanding of the proposal, consider the schematic diagram in Figure 14.3, which shows the interaction structure between the algorithms. In this scheme the critic has the task of managing the quality of the solutions generated by each of the algorithms, i.e., when necessary it replaces a bad solution with a better one.
14.4.1.1
Using the table of Q-values, one or more solutions are generated as follows. Choose a starting city s_0 for the TSP and refer to the table of values Q(s, a), obtaining the best value Q*(s_0, a) = max_{a ∈ A(s_0)} Q(s_0, a), where A(s_0) is the set of all possible actions from the state s_0. Choose the city indicated by the choice of action a as the next one in the TSP route. Repeat the process until all the cities have been visited a single time,
Fig. 14.3 Cooperation scheme between the Q-learning, GRASP and Genetic algorithms
thus generating a solution S_q for the TSP. For simplicity, consider that only one solution is generated.
For GRASP, solution S_q will be used as the initial solution, instead of the one obtained in the construction phase, and will be improved using local search in a new iteration of the algorithm.
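The route construction just described can be sketched as below, again assuming a dictionary-backed Q-table; the function name is illustrative:

```python
def tour_from_q(Q, n_cities, start=0):
    """Build a TSP tour by always moving to the unvisited city with the
    best Q-value from the current one, until every city is visited once."""
    tour, visited, s = [start], {start}, start
    while len(tour) < n_cities:
        candidates = [c for c in range(n_cities) if c not in visited]
        s = max(candidates, key=lambda c: Q.get((s, c), 0.0))  # best Q-value
        tour.append(s)
        visited.add(s)
    return tour
```

Restricting the argmax to unvisited cities is what guarantees each city appears in the tour exactly once.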
14.4.1.2
At each iteration of the algorithm, choose the individual of the population with the
highest suitability value (best fitness); this individual will be a solution Sg . Let Rg
be the cost of the route associated to Sg .
For GRASP, solution S_g will be used as the initial solution, substituting the solution that would be generated in the construction phase; likewise, this solution will be improved using local search in the next iteration of the algorithm.
For Q-learning, the Q-values table must be updated based on the information available in solution S_g. It should be remembered that a value Q(s, a) represents an estimate of the total expected return that will be obtained when in state s and choosing action a. In the case of the TSP this value represents a cost estimate of the cycle starting from s and having as the next city visited the one indicated by action a. Similarly, solution S_g has a cost associated to its cycle, namely R_g. Thus, a series of pairs (s, a) can be established from this solution, corresponding to the order in which the cities were visited, and the values Q(s, a) can be updated as follows:
Q(s, a) = δ (Q(s, a) + R_g),   0 < δ < 1    (14.6)
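Applying this update along a tour can be sketched as below; the pairing of consecutive cities, including the edge that closes the cycle, follows the text, while the function and parameter names are illustrative:

```python
def inject_solution(Q, tour, route_cost, delta=0.2):
    """Update rule of the form Q(s,a) <- delta * (Q(s,a) + R) for every
    consecutive (s, a) pair of the cycle, including the closing edge."""
    for s, a in zip(tour, tour[1:] + tour[:1]):
        Q[(s, a)] = delta * (Q.get((s, a), 0.0) + route_cost)
```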
14.4.1.3
At each iteration of the algorithm, take the solution obtained by the GRASP local search, S_G. Let R_G be the cycle cost associated to S_G. For Q-learning, use the same procedure described in the previous item, represented by equation (14.6), replacing R_g by R_G.
It is important to observe that in a parallel application each of the algorithms constitutes an independent task, executed at its own pace. This means that communication between tasks cannot be synchronous; otherwise the execution of one of the tasks would be blocked while it waits for results from another. Problems with asynchronous communication can, however, be mitigated with a specific parameter setting: each algorithm executes a number of steps that is approximately proportional to those of the other algorithms. For example, if the GRASP algorithm runs ten times faster than the genetic algorithm for one step, then its number of steps is adjusted to be proportionally greater. This solution reduces problems with asynchronous communication and avoids idle time in the tasks.
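The proportional-step setting can be sketched with a small hypothetical helper; in the experiments these counts are tuned empirically per instance:

```python
def steps_per_round(step_times, base=1.0):
    """Given each task's wall time per step, return how many steps each
    should run per communication round so all finish at about the same time."""
    slowest = max(step_times.values())
    return {name: max(1, round(base * slowest / t))
            for name, t in step_times.items()}
```

So a GRASP step that is ten times faster than a GA step gets ten steps per round, and the two tasks reach the exchange point together.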
Since there are three algorithms involved in the parallel implementation, it can be
supposed that three distinct tasks are needed for such a purpose. If we consider that
a parallel architecture has considerably more than 3 processing elements, it follows
that the efficiency of the implementation will be compromised, given that several of
these elements will be idle during the execution of the application.
To avoid this idleness and to make use of the entire potential of the parallel architecture, multiple parameterizations of the algorithms will be used in this study. We understand multiple parameterizations as the execution of the same algorithm with different behaviors.
In the case of genetic algorithms, this corresponds to associating to some parallel
tasks, instances of genetic algorithms with different population behaviors; that is,
different mutation rates, crossover and selection mechanisms.
For GRASP, different lengths can be used for the restricted list of candidates and
different local search mechanisms.
In the case of Q-learning, the possibilities of multiple parameterizations are associated to the choices of the parameters involved in updating the Q-values, namely the learning coefficient α and the discount factor γ presented in equation (14.1).
It should be pointed out that the use of multiple parameterizations, in addition to allowing a more efficient use of the parallel architecture, will make the communication structure between the tasks more complex, given that two different instances of the same algorithm will be able to exchange information. The cost of these communications, in terms of processing time, will be analyzed during the implementation to avoid compromising performance.
{ max_{a'} Q(i, a') + Q(i, a)    if p_ij(a) = 0
{ r_ij                           if p_ij(a) = 1    (14.8)
Methodology
The computing architecture used was a dedicated network of computers forming a Beowulf-type cluster. This architecture is composed of nine computers with the following configuration: Intel Core 2 Duo processor with a 2.33 GHz clock, 2 GB of RAM, an 80 GB hard disk, and a Gigabit Ethernet network card. The operating system used on the machines is GNU/Linux Ubuntu version 8.04. The library used for parallel programming was OpenMPI version 1.2.5, an implementation of the message passing interface (MPI) library. MPI is based on many different programs running and exchanging information simultaneously. The Zabbix software, version 1.4.2, was used for performance monitoring. The algorithms were implemented in the C++ programming language and the analysis was done using MATLAB.
For the Traveling Salesman Problem (TSP), the TSPLIB¹ repository was used; this library presents various instances for many different variants of the TSP. Eight TSP
¹ The instances of the TSP used were obtained from the TSPLIB site, hosted under the domain http://ftp.zib.de/pub/Packages/mp-testdata/tsp/tsplib/tsplib.html
instances were evaluated in this parallel implementation: gr17, bays29, gr48, berlin52, eil76 and a280. The information about the instances used in the experiment is presented in Table 14.1.
Table 14.1 Best known objective function value for each instance

Instance    Best value
gr17        2085.00
bays29      2020.00
gr48        5046.00
berlin52    7542.00
eil76       538.00
a280        2579.00

14.5.2.2 Serial Execution
This experiment performs the serial implementation of the algorithm, where all algorithms were executed on the same machine, that is, on the same processing node. Table 14.2 presents the results of the serial execution for each of the eight instances evaluated. This experiment was conducted for comparison purposes with the parallel implementation and for the measurement of speedup gain and efficiency in terms of time and quality of results.
Table 14.2 Serial execution time and Objective Function values for each instance

Instance    Objective Function    Time in seconds
bays29      2020                  220
swiss42     1284                  577
berlin52    7705                  952
eil76       591                   1390
gr120       8789                  3308
ch150
si175
a280

14.5.2.3 Parallel Execution
The parallel implementation was done by distributing the algorithm parts over the nodes of the cluster: one node with the genetic algorithm, another with the GRASP algorithm, and another dedicated to reinforcement learning. The communication between the algorithm nodes was done as shown in Figure 14.3. For each evaluated instance, thirty executions were generated for a statistical analysis of the implementation.
Table 14.3 shows the parameters used in the implementation of the algorithms for this set of experiments. The number of individuals in the population for the genetic algorithm was one hundred (100) elements, with a crossover rate equal to 0.7 and a mutation rate equal to 0.2, while for the Q-learning algorithm the parameters were adjusted as follows: α = 0.8, ε = 0.01, γ = 1 and δ = 0.2, the last being the actualization parameter of the Q-values with solutions from GRASP and the genetic algorithm. Considering the fact that the execution of the genetic algorithm (GA) is slower than that of the other algorithms, the numbers of executions of GRASP and of Q-learning are higher than that of the genetic algorithm; these quantities are expressed in the columns (QL/GA) and (GRASP/GA), meaning the number of executions of the Q-learning algorithm per genetic algorithm iteration and of GRASP per GA iteration, respectively. The Rand index means an exchange of positions in the current solution in order to escape a local minimum or to diversify the current solution; the Communications index is the number of iterations after which the algorithms exchange information.
Table 14.3 Parameters used for each instance

Instance    Iterations    (QL/GA)    (GRASP/GA)    Rand    Communications
bays29      20            8          28            5       10
swiss42     40            7          18            5       20
berlin52    50            7          18            5       20
eil76       50            8          10            5       20
gr120       70            8          7             5       30
ch150       100           10         6             6       50
si175       100           12         5             5       50
a280        100           12         5             6       50
The graphs shown in Figures 14.4 to 14.11 present, for each instance, the behavior of the objective function value achieved in each test in comparison with the best known value of the objective function (normalized); in addition, bands one standard deviation above and below are plotted for a better visual perception of the quality of the solutions. The limits of the plot area were chosen in a range of 20% around the average objective function value in each experiment.
Figure 14.4 shows a homogeneous result across all executions; in this case the objective function reached the known optimal value in all thirty executions. Figures 14.5 to 14.10 show a very stable behavior in all executions, which is confirmed by the standard deviations, less than 2% in all cases. The least homogeneous behavior, with a proportionally larger variance, is seen in Figure 14.11; this is probably caused by the complexity of this instance and by the execution parameters, which were selected empirically for this implementation.
Table 14.4 presents a compilation of statistical data for these experiments. The first line shows the average objective function values obtained over the thirty runs for each instance studied. The second line shows the standard deviation of each instance relative to the mean objective function value; the third line presents the same standard deviation as a percentage, for a better understanding of the data. The fourth line shows the optimal (best) value of the objective function found in the literature and in the TSPLIB database for each instance.
Fig. 14.4-14.7 Objective function value versus run number, with upper and lower standard deviation tracks, for the 29-city (bays29), 42-city (swiss42), 52-city (berlin52), and 76-city (eil76) instances
The fifth line shows the distance between the mean value and the known optimum
values, which is shown again in the last line in percentages.
From the last row of Table 14.4 it is possible to interpret that the result of
executing the bays29 instance is on average 100% near (in the proximity sense)
the best known objective function value, that the 42-city instance is 99.45% near the
best known value, and so on, as shown in the graph of Figure 14.12 and in Table 14.5,
where values closer to 100% represent better solutions. The maximum and minimum
proximities are obtained from the average proximity through the percentage standard
deviation shown in Table 14.4. On average, more than half of the instances have an
objective function proximity greater than 90% of the optimal function value.
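The proximity figures of Table 14.5 can be reproduced from the mean and optimal objective function values of Table 14.4; a minimal sketch (the function name is an assumption for illustration):

```python
def proximity(mean_of, optimal_of):
    """Mean proximity (%): 100 minus the percentage distance between
    the average and the best known objective function values."""
    return 100.0 - 100.0 * (mean_of - optimal_of) / optimal_of

# eil76 in Table 14.4: mean O.F. 590.80, optimal O.F. 538.00
print(round(proximity(590.80, 538.00), 2))  # 90.19, the eil76 entry of Table 14.5
```

The maximum and minimum proximities then follow by adding or subtracting the percentage standard deviation of Table 14.4.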
Fig. 14.8-14.11 Objective function value versus run number, with upper and lower standard deviation tracks, for the 120-city (gr120), 150-city (ch150), 175-city (si175), and 280-city (a280) instances
Instance                  bays29  swiss42  berlin52   eil76    gr120    ch150     si175     a280
O.F. Mean Value          2020.00  1280.00   7678.70  590.80  8638.50  8723.10  23936.90  4608.70
Standard Deviation          0.00     8.70     52.10    6.30   168.60   164.70    240.60   183.40
Standard Deviation (%)      0.00     0.68      0.67    1.07     1.95     1.88      1.00     3.90
Optimal O.F.             2020.00  1273.00   7542.00  538.00  6942.00  6528.00  21407.00  2579.00
O.F. Mean Dist.             0.00     7.10    136.70   52.80  1696.50  2195.17   2529.90  2029.70
O.F. Mean Dist. (%)         0.00     0.50      1.80    9.80    24.40    33.60     11.80    78.70
O.F. = Objective Function.
Instance                 bays29  swiss42  berlin52  eil76  gr120  ch150  si175   a280
Mean proximity (%)       100.00    99.44     98.19  90.19  75.56  66.37  88.18  21.30
Maximum proximity (%)    100.00   100.00     98.87  91.26  77.51  68.26  89.19  25.28
Minimum proximity (%)    100.00    98.77     97.51  89.12  73.62  64.48  87.18  17.32
Fig. 14.12 Mean percentage distance to the optimal objective function value for each instance
14.5.2.4
Figure 14.13 shows the normalized distances between the values obtained in the
experiment and the optimum values found in the literature, where a behavior
similar to that of the parallel experiment without time limitation can be observed.
Fig. 14.13 Normalized distance between the mean Objective Function values evaluated and the
optimal Objective Function value known
14.5.2.5
Table 14.7 shows the parameter settings used for the implementation of the algorithms in this set of
experiments. For both the Genetic Algorithm and Q-Learning the same parameters as in the
previous experiment were used, except for one parameter value, which was set differently
for each group (multiparameterized): changes in this value are of fundamental importance
for the initial solutions provided by Q-Learning, making it possible to keep producing good
solutions and to further improve them. Two groups of structures were created; the value was
set to 0.2 for Group 1 and to 0.3 for Group 2.
Instance   iterations  (QL/GA)  (GRASP/GA)  Rand  Communications
bays29          20.00     5.00       20.00  5.00           10.00
swiss42         40.00     5.00       16.00  5.00           20.00
berlin52        50.00     5.00       13.00  5.00           20.00
eil76           50.00     6.00        8.00  5.00           20.00
gr120           70.00     7.00        7.00  5.00           30.00
ch150          100.00     8.00        6.00  6.00           50.00
si175          100.00     8.00        5.00  5.00           50.00
When comparing Table 14.3 with Table 14.7, there is a decrease of the
values in the (QL/GA) and (GRASP/GA) columns. This is due to the cooperation and
competition between the groups: the same solution quality as in the previous
architecture can be obtained with fewer iterations because of the greater diversity.
Table 14.8 presents the results obtained with this implementation. As in the previous
tables, Table 14.8 gives statistical data on the objective function values
achieved: their absolute and percentage standard deviations, the best known optimal
value, and the absolute and percentage distances between the known optimum and the
obtained values.
Instance                  bays29  swiss42  berlin52   eil76    gr120    ch150     si175     a280
O.F. Mean Value          2020.00  1273.00   7695.90  589.40  8560.00  8674.30  23789.00  4519.00
Standard Deviation          0.00     0.00     43.05    5.80   157.08   181.30    202.30   131.80
Standard Deviation (%)      0.00     0.00      0.56    0.98     1.81     2.10      0.85     2.90
Optimal O.F.             2020.00  1273.00   7542.00  538.00  6942.00  6528.00  21407.00  2579.00
O.F. Optimal Dist.          0.00     0.00    153.90   51.40  1618.00  2146.30   2381.70  1940.00
O.F. Optimal Dist. (%)      0.00     0.00      2.10    9.50    23.30    32.80     11.10    75.20
O.F. = Objective Function.
From Table 14.8 it can be observed that, in addition to the bays29 instance, the
swiss42 instance also reaches an average Objective Function value at zero percent
distance from the optimal value, i.e., all executions of the experiment reach the best
known solution value.
14.5.2.6 Analysis of Performance
The performance analysis in this section concerns only data processing
and time; it does not consider the quality of the results. A final consideration of
the results is presented in the following subsection (collective analysis). The
comparison includes the execution times of the serial experiments (using only one
machine), the parallel experiments (using three machines per experiment),
and the parallel group experiments (using six machines per experiment). Table 14.9
contains the execution times in seconds for each instance in each type of test:
serial, parallel, and in groups. The same information is expressed graphically in
Figure 14.14 for better visual perception.
Instance              bays29  swiss42  berlin52    eil76    gr120    ch150    si175      a280
Serial Time (sec.)    220.00   577.00    952.00  1390.00  3308.00  7082.00  9975.00  20300.00
Parallel Time (sec.)  112.00   302.00    470.00   790.00  2030.00  4500.00  6660.00  13050.00
Group Time (sec.)     115.00   307.00    464.00   750.00  2041.00  4237.00  5487.00  13413.00
In all cases, the serial execution time is longer than the parallel execution time
and the parallel group time. The parallel group experiment has times similar to those
of the simple parallel implementation; a shorter time is readily apparent for the eil76,
ch150 and si175 instances, while the simple parallel implementation stands out only
for the a280 instance.
The statistical measures commonly used for evaluating the performance of parallel
software are the speedup and the efficiency [5]. The speedup is obtained by
expression (14.9):

    speedup = Ts / Tp                                   (14.9)

where Ts is the best serial execution time and Tp the best parallel execution time;
in this analysis the average parallel execution time is used as Tp. The efficiency
can be calculated as shown in expression (14.10):

    efficiency = Ts / (p Tp)                            (14.10)
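Expressions (14.9) and (14.10) can be computed directly. The example values below are taken from Table 14.9 (berlin52, p = 3 machines); they may differ in the last digit from Table 14.10, which uses averaged times.

```python
def speedup(t_serial, t_parallel):
    # Expression (14.9): ratio of serial to parallel execution time
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    # Expression (14.10): speedup divided by the number of processors p
    return speedup(t_serial, t_parallel) / p

# berlin52 from Table 14.9: 952 s serial, 470 s parallel on p = 3 machines
print(round(speedup(952.0, 470.0), 2))        # 2.03
print(round(efficiency(952.0, 470.0, 3), 2))  # 0.68
```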
Fig. 14.14 Execution time per instance for the sequential, parallel, and parallel group implementations
where again Ts is the serial time, Tp the parallel time, and p the number of processors
used in the experiment. Thus p equals three in the parallel experiments
and six in the group parallel experiments.
Table 14.10 gives the speedup and efficiency values for each experiment on
each instance examined. Since the characteristics of an instance cannot be evaluated
from its number of cities alone, it cannot be assumed that increasing the number
of cities produces a linear increase in processing time; several factors must be
considered, such as the parameter values of each algorithm and the distribution of
the cities (the distances between them).
Instance             swiss42  berlin52  eil76  gr120  ch150  si175  a280
Parallel Speedup        1.91      2.02   1.73   1.62   1.57   1.49  1.55
Parallel Efficiency     0.63      0.67   0.57   0.54   0.52   0.49  0.51
Group Speedup           1.87      2.05   1.85   1.62   1.67   1.81  1.51
Group Efficiency        0.31      0.34   0.30   0.27   0.27   0.30  0.25
Figure 14.15 shows the curves representing the speedup of the parallel and
parallel group implementations, for purposes of comparison. Figure 14.16 shows the
efficiency curves of the parallel and parallel group experiments. It is possible to
observe that the speedup remains almost constant across all instances, at the cost of
just a slight reduction in efficiency, seen in Figure 14.16, caused by the additional
processor nodes of the parallel group experiment.
Fig. 14.15 Speedup of the parallel and parallel group implementations for each instance

Fig. 14.16 Efficiency of the parallel and parallel group implementations for each instance
14.5.2.7 Collective Analysis
Tables 14.11 and 14.12 present a qualitative comparison among all the experiments in
this work. The first table presents, for each experiment (serial, parallel, time-limited
parallel, and group parallel), the best average objective function values;
the last line gives the best known objective function value.
Table 14.12 presents the percentage distance between the average O.F. values
obtained in each experiment and the known optimal value. From these Tables 14.11
Table 14.11 Comparison between the objective function values obtained in all implementations

Instance             bays29  swiss42  berlin52   eil76    gr120    ch150     si175     a280
Serial              2020.00  1284.00   7705.00  591.00  8789.00  9163.00  23795.00  4682.00
Parallel            2020.00  1280.00   7678.00  590.00  8638.00  8723.00  23936.00  4608.00
Limited Parallel    2020.00  1285.00   7751.00  598.00  8735.00  8896.00  24090.00  4732.00
Parallel Group      2020.00  1273.00   7695.00  589.00  8560.00  8674.00  23788.00  4519.00
Optimal O.F. Value  2020.00  1273.00   7542.00  538.00  6942.00  6528.00  21407.00  2579.00
Table 14.12 Percentage distance between the average objective function values achieved
by the implementations and the known optimal value

Instance              bays29  swiss42  berlin52  eil76  gr120  ch150  si175   a280
Serial (%)              0.00     0.86      2.16   9.85  26.60  40.36  11.15  81.54
Parallel (%)            0.00     0.55      1.80   9.66  24.43  33.62  11.81  78.67
Limited Parallel (%)    0.00     0.94      2.77  11.15  25.82  36.27  12.53  83.48
Group Parallel (%)      0.00     0.00      2.02   9.47  23.30  32.87  11.12  75.22
Fig. 14.17 Distance between the average values of O.F. obtained in each experiment (serial, parallel, time-limited parallel, and group parallel) and the best known O.F. value
and 14.12 it can be seen that there is some proximity between the data obtained in each
type of experiment for every instance evaluated. As expected, the time-limited
parallel execution shows a slight reduction in the quality of its results when compared
to the other executions; even in the worst case, the a280 instance, the difference
relative to the other executions is only about 8% with respect to the known optimal value.
The graph of Figure 14.17 was created from these data. In this chart the values
closer to zero are better, because they correspond to shorter distances between the
values obtained and the optimum values known. It is possible to see that the experiment
that contributes the best average solutions over all instances is the parallel
group communication, the only exception being the berlin52 instance, where the
parallel execution without time limitation has a slightly higher performance.
14.6 Conclusions
The computational results presented in this work show that the cooperative and
competitive approaches achieved satisfactory results, both with cooperation and
competition between the algorithms and with cooperation and competition between groups;
in both of the instances tested for this purpose, bays29 and swiss42, the global
optimum was found in all executions, and the remaining instances obtained good results,
as presented.
Furthermore, a performance analysis of the proposed approach was carried out,
showing good efficiency and speedup for the implementations performed. The new
parallel implementation developed here reduced the execution time by increasing the
number of processor nodes. The modular implementation of the algorithms and of the
communication infrastructure makes it possible to create differentiated solutions with
good adaptability and high scalability, which can be used for problems of high
dimensionality.
References
1. White, A., Mann, J., Smith, G.: Genetic algorithms and network ring design. Annals of
Operational Research 86, 347-371 (1999)
2. Darrell, W.: A genetic algorithm tutorial. Statistics and Computing 4(2), 65-85 (1994)
3. Fang, H.: Genetic algorithms in timetabling and scheduling. PhD thesis, Department of
Artificial Intelligence, University of Edinburgh, Scotland (1994)
4. Feo, T., Resende, M.: Greedy randomized adaptive search procedures. Journal of Global
Optimization 6, 109-133 (1995)
5. Foster, I.: Designing and Building Parallel Programs: Concepts and Tools for Parallel
Software Engineering. Addison-Wesley Longman Publishing Co., Inc., Boston (1995)
6. Karp, R.: On the computational complexity of combinatorial problems. Networks 5,
45-68 (1975)
7. Lima Junior, F.C., Melo, J.D., Doria Neto, A.D.: Using the Q-learning algorithm for
initialization of the GRASP metaheuristic and genetic algorithm. In: IEEE International
Joint Conference on Neural Networks, Orlando, FL, USA, pp. 1243-1248 (2007)
8. Prais, M., Ribeiro, C.C.: Reactive GRASP: an application to a matrix decomposition
problem in TDMA traffic assignment. Journal on Computing 12(3), 164-176 (2000)
9. Randy, H., Haupt, S.E.: Practical Genetic Algorithms, 2nd edn. Wiley Interscience,
Chichester (1998)
10. Reinelt, G.: The Traveling Salesman: Computational Solutions for TSP Applications,
vol. 840, pp. 187-199. Springer, Heidelberg (1994)
11. Resende, M., Ribeiro, C.: GRASP with Path-Relinking: Recent Advances and
Applications, pp. 29-63. Springer, Heidelberg (2005)
12. Resende, M.G.C., de Sousa, J.P., Viana, A. (eds.): Metaheuristics: Computer
Decision-Making. Kluwer Academic Publishers, Norwell (2004)
13. Ribeiro, C.: Essays and Surveys in Metaheuristics. Kluwer Academic Publishers,
Norwell (2002)
14. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge
(1998)
15. Thangiah, S.: Vehicle routing with time windows using genetic algorithms. In:
Application Handbook of Genetic Algorithms, New Frontiers, vol. II, pp. 253-277 (1995)
16. Vazquez, M., Whitley, L.D.: A comparison of genetic algorithms for the static job shop
scheduling problem. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M.,
Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 303-312. Springer,
Heidelberg (2000)
17. Watkins, C.: Learning from delayed rewards. PhD thesis, University of Cambridge,
England (1989)
Chapter 15
An Evolutionary Approach for the TSP and the TSP with Backhauls
Ilter Onder, Haldun Sural, Nur Evin Ozdemirel, and Meltem Sonmez Turan
Abstract. This chapter presents an evolutionary approach for solving the traveling salesman problem (TSP) and the TSP with backhauls (TSPB). We propose two
evolutionary algorithms for solving the difficult TSPs. Our focus is on developing
evolutionary operators based on conventional heuristics. We rely on a set of detailed computational experiments and statistical tests for developing an effective
algorithm.
The chapter starts with a careful survey of the algorithms for the TSP and the
TSPB, with a special emphasis on crossover and mutation operators and applications on benchmark test instances. The second part addresses our first evolutionary
algorithm. We explore the use of two tour construction heuristics, nearest neighbor and greedy, in developing new crossover operators. We focus on preserving the
edges in the union graph constructed by edges of the parent tours. We let the heuristics exploit the building blocks found in this graph. This way, new solutions can
inherit good blocks from both parents. We also combine the two crossover operators together in generating offspring to explore the potential gain due to synergy.
In addition, we make use of 2-edge exchange moves as the mutation operator to
incorporate more problem specific information in the evolution process. Our reproduction strategy is based on the generational approach. Experimental results indicate
that our operators are promising in terms of both solution quality and computation
time.
In the third part of the chapter, we present the second evolutionary algorithm developed. This part can be thought of as an enhancement of the first algorithm. A
Ilter Onder
Graduate School of Informatics,
Middle East Technical University, 06531, Ankara, Turkey
e-mail: ilteronder@gmail.com
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 371-396.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
common practice with such algorithms is to generate one child or two children from
two parents. In the second implementation, we investigate the preservation of good
edges available in more than two parents and generate multiple children. We use the
steady-state evolution as a reproduction strategy this time and test the replacement
of the worst parent or the worst population member to find the better replacement
strategy. Our two mutation operators try to eliminate the longest and randomly selected edges and a third operator makes use of the cheapest insertion heuristic. The
algorithm is finalized after conducting a set of experiments for best parameter settings and testing on larger TSPLIB instances. The second evolutionary algorithm is
also implemented for solving randomly generated instances of the TSPB. Our experiments reveal that the algorithm is significantly better than the competitors in the
literature. The last part concludes the chapter.
Keywords: Networks-Graphs, Traveling Salesman Problems, Evolutionary Algorithms, Crossover Operator, Mutation Operator, Heuristics.
An Evolutionary Approach for the TSP and the TSP with Backhauls
the TSP. It seems that the idea of using edges rather than the position or order of nodes
is more promising.
Among the edge preserving operators, EAX, proposed by Nagata and Kobayashi
[8], seems to be particularly promising. The deviation from the optimal is less than 0.03% on
21 instances from the TSPLIB [12] with 100 ≤ n ≤ 3000. In generating an offspring,
[8] starts with the union graph constructed with all the edges from the two parents. They
preserve edges from the union graph, but their detection of good edges or segments
in the graph seems to be limited. Chen [13], on the other hand, assumes that the
edges that are common to the parents lead to good solutions and concentrates on the
intersection graph of parental edges.
Jung and Moon [14] devise the natural crossover (NX), where the parent tours are
partitioned randomly, and partitions from different parents are merged to give partial
tours. The partial tours are then merged using the shortest edges. They argue that
the results they present are better than those of EAX and faster than those of the
distance preserving crossover (DPX). They also report that EAX showed poorer
performance than in the original paper [8]. The Lin-Kernighan heuristic is used to
improve the results of NX, and a deviation of 0.085% is obtained for a problem
instance with 11849 cities.
Merz [15] proposes a new edge recombination operator (NEX), where the probabilities
of inheriting an edge from the parents and of selecting an edge from the complete
graph can be adjusted. The results are comparable with the results of EAX for small
problems. Ray et al. [16] present a crossover to improve the tours generated with
the nearest neighbor heuristic. They propose fragmentation of the tours generated with
the heuristic and connecting the fragments using the shortest possible edges.
Considering conventional heuristics as hill climbing methods, the combination
of conventional heuristics and EAs seems a promising approach for solving the TSP,
since EAs are also able to find the hills for the conventional heuristics to climb.
matrix in which large numbers are added to the inter-cluster distances. In the second
heuristic, GENIUS constructs the linehaul and backhaul tours separately and then
connects these tours. The third heuristic is similar to the second one, except that the
depot is not included in the beginning. The fourth heuristic is cheapest insertion coupled
with US for improving the solutions, and the fifth one is GENI coupled with
Or-opt improvement moves. The last heuristic uses cheapest insertion incorporating
Or-opt. [18] reports that the first heuristic is the best, with best results
3-4% larger than the lower bound on average. Mladenovic and Hansen [20] improve
GENIUS for solving the TSPB by incorporating variable neighborhood search (VNS).
VNS is a random neighborhood search mechanism in which the neighborhood size
is increased until an improving move has been found. [20] reports that GENIUS
coupled with VNS (G+VNS) is better than the original GENIUS by an average of
0.40%, with an increase of 30% in computation time.
Ghaziri and Osman [21] use an artificial neural network (SOFM) and demonstrate
that SOFM coupled with 2-opt (SOFM*) can improve the solution quality. Their test
results are comparable to those of the methods that transform the TSPB into the TSP.
The only EA to solve the clustered TSP, developed by Potvin and Guertin [22],
uses the edge recombination crossover (ER) and 2-opt as a mutation operator. ER is
used to preserve the inter-cluster edges in the first phase and the intra-cluster edges
in the second phase. The 2-opt mutation operator is applied within clusters. The
results are better than those of GENIUS.
15.1.5 Outline
Our aim in this chapter is to illustrate that conventional TSP heuristics can be used
as effective crossover and mutation operators. We restrict the crossover operators
to the union graph of the parents in an attempt to preserve parental edges. We
describe two EAs and report their computational results in Sect. 15.2 and Sect. 15.3.
The first EA uses the nearest neighbor and greedy heuristics for crossover and 2-edge
exchange for mutation. We choose these heuristics for illustrative purposes, but others
can also be considered. We also explore the combined use of multiple crossover
operators and make use of the generational evolution approach. The second EA focuses
on the nearest neighbor crossover and explores the generation of multiple offspring from
more than two parents. The mutation operators used are 2-edge exchange, to eliminate
the longest or random edges, and node insertion. The second EA takes a steady-state
evolution approach. Considering that the TSP materializes with side constraints in
practice, we implement the second EA to solve the TSP with backhauls using the
modified cost matrix and report our computational results. Sect. 15.4 concludes the
chapter.
to the tour in this manner. The nodes adjacent to 6 are 8, 3, 2 and 9, all of which
are already in the tour. In this case, from the complete graph, we choose 7 among
the unvisited nodes 4, 5 and 7. Then 5 and 4 are added to the tour. Note that all
edges except 6-7 and 4-11 are taken from the union graph. The fitness value of the new
tour is 31, which is less than the fitness values of both parents. In Figure 15.1.a, most
of the edges in the tour are short, except the last inserted edge 4-11. Notice that,
for instance, removing edges 4-11 and 1-2 and inserting edges 1-11 and 2-4 would
improve the tour, which can be achieved with a 2-edge exchange.
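The move just described, removing two edges and reconnecting the tour by reversing the segment between them, is the standard 2-edge exchange. A minimal sketch on a generic tour (the function name and indices are illustrative, not the authors' code):

```python
def two_edge_exchange(tour, i, j):
    """Remove edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]),
    then reconnect by reversing the segment tour[i+1..j]."""
    assert 0 <= i < j < len(tour) - 1
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

# removing edges (1,2) and (3,4), inserting (1,3) and (2,4):
print(two_edge_exchange([0, 1, 2, 3, 4, 5], 1, 3))  # [0, 1, 3, 2, 4, 5]
```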
[Distance matrix of the 14-node example instance]
Given a starting node, there is only one tour a deterministic NNX can generate,
unless there is a tie in selection. We have also tried a stochastic version of
NNX, where one of the edges incident to the current node is selected probabilistically,
with a selection probability inversely proportional to the length of the
edge. Edge selection in this version of NNX is similar to the heuristic crossover
[7], which preserves 60% of parental edges. Stochastic NNX has a potential advantage
over deterministic NNX: given two parents and a starting node, it can produce
different offspring because of randomization. Hence, it is possible to increase
the population diversity and the portion of the search space covered. Our pilot runs
have shown that stochastic NNX indeed provides higher diversification, resulting in
slower population convergence at the expense of longer computation times. In solution
quality, however, stochastic NNX has proven to be significantly inferior to the
deterministic version. Therefore we used only the deterministic version in further
experimentation.
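The deterministic NNX described above can be sketched as follows: build the union graph of the parents' edges, then repeatedly follow the shortest union-graph edge to an unvisited node, falling back to the complete graph when every union-graph neighbor has been visited. This is an illustrative reconstruction under assumed data structures, not the authors' implementation.

```python
def nnx(parent1, parent2, dist, start):
    """Deterministic nearest neighbor crossover on the union graph."""
    n = len(parent1)
    union = {v: set() for v in parent1}
    for tour in (parent1, parent2):
        for a, b in zip(tour, tour[1:] + tour[:1]):  # tour edges, incl. closing edge
            union[a].add(b)
            union[b].add(a)
    child, visited, current = [start], {start}, start
    while len(child) < n:
        candidates = [v for v in union[current] if v not in visited]
        if not candidates:  # all union-graph neighbors visited:
            candidates = [v for v in union if v not in visited]  # complete graph
        current = min(candidates, key=lambda v: dist[current][v])
        child.append(current)
        visited.add(current)
    return child

# toy instance: 5 cities on a line, dist[i][j] = |i - j|
dist = [[abs(i - j) for j in range(5)] for i in range(5)]
print(nnx([0, 1, 2, 3, 4], [0, 2, 1, 4, 3], dist, start=0))  # [0, 1, 2, 3, 4]
```

The stochastic variant would replace the `min` selection with a random draw weighted by inverse edge length.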
operator generates one offspring from each pair. We use this method because we
wish to isolate the effect of our evolutionary operators. We do not want the selection
pressure to interfere with this effect; therefore, we prefer to use a neutral
selection scheme.
6. Replacement: With the newly generated offspring, the population size is temporarily
doubled. For replacement, we sort parents and offspring together according to
their fitness values. We then carry the best half of these chromosomes to the next
generation.
7. Stopping conditions: We stop our EA if the average fitness is exactly the same in
two consecutive generations. In addition to this condition, we also use an upper
bound of 500 on the number of generations, which is large enough when we
consider the convergence behavior of our EA in Figure 15.2 for an instance with 52
cities.
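Steps 6 and 7 above admit a compact sketch; the function names and the toy fitness are illustrative assumptions, not the authors' code.

```python
def next_generation(parents, offspring, fitness):
    """Replacement (step 6): pool parents and offspring, keep the best
    half (lowest fitness, i.e. shortest tours, first)."""
    pool = sorted(parents + offspring, key=fitness)
    return pool[:len(parents)]

def stop(avg_history, generation, limit=500):
    """Stopping conditions (step 7): identical average fitness in two
    consecutive generations, or the generation bound reached."""
    same = len(avg_history) >= 2 and avg_history[-1] == avg_history[-2]
    return same or generation >= limit

# toy chromosomes scored by identity, just to show the best-half rule
print(next_generation([5, 3], [4, 1], fitness=lambda x: x))  # [1, 3]
```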
Fig. 15.2 Convergence behavior of the EA on an instance with 52 cities
the other hand, [25], [26], and [27] are some of the studies that come up with very
good solutions when the initial population is generated using a heuristic.
DB is the percent deviation of the best solution found throughout the evolution in a
single EA run. We also use DA to see whether or not the population has converged
at the end of the run; if DB and DA are close, then convergence has been achieved.
CT includes the initial population generation time. NG gives us an idea about the
convergence behavior of the EA. We can observe, for instance, whether GX leads to
faster convergence than NNX in terms of the number of generations.
We test 25 problem instances in total, ranging in size from 52 to 1748 cities. The 24
problem instances selected from the TSPLIB have symmetric integer Euclidean
distances. The additional problem, tr81, includes the cities in Turkey. Note that only an
upper bound is available for tr81, and therefore the percent deviation of its EA
solutions is computed using this bound. We replicate our EA runs 30 times for small
problems with n ≤ 226 and 10 times for larger problems. The algorithm is coded in
ANSI C and runs on a Pentium IV 1600 MHz machine with 256 MB RAM running
RedHat Linux 8.0.
Table 15.3 includes the averages of the performance measures over 30 replications
of the 10 problem instances with n ≤ 226, where the initial population size p is 50.
When we compare the two crossovers, NNX yields better solution quality than GX and
takes a much shorter CT. Both M1 and M2 improve the solution quality over the no-mutation
case. M2, mutating all offspring, improves NNX more than M1, which mutates
only the best. For GX, M2 performs only slightly better than M1. M2 coupled with
NNX takes longer than M1, as expected. With GX, however, M1 results in
a longer CT than M2 because of the slower convergence of the GX-M1 combination. The
hybrid initial population leads to slightly better solution quality than the random population.
Edge preservation from the union graph is 97% for NNX and 92% for GX without
mutation. Mutation reduces these figures by 2-5%. The largest NG values are
Table 15.3 Average results over 30 replications of the 10 small problem instances where
p = 50

Crossover        Mutation  IP     DB     DA     NG      CT
NNX              NoM       R    3.10   5.71  45.39    0.38
                           H    4.82   5.74  33.52    2.95
                 M1        R    1.67   2.30  40.21    0.62
                           H    1.57   2.13  36.09    3.55
                 M2        R    0.55   0.67  53.37    5.52
                           H    0.55   0.73  43.53    8.11
GX               NoM       R   12.54  15.54  17.35   48.23
                           H    7.19  12.58  16.37   54.27
                 M1        R    4.36   7.48  60.44  208.70
                           H    3.67   7.04  48.44  178.65
                 M2        R    3.30   4.91  26.30   82.79
                           H    3.01   4.80  25.83   90.58
50% NNX 50% GX   NoM       R    8.15   8.86  42.50   73.25
                           H    5.53   7.99  38.47   75.67
                 M1        R    1.92   2.90  66.04  113.81
                           H    1.68   3.06  61.21  112.77
                 M2        R    1.76   4.32  19.25   26.40
                           H    1.61   4.16  20.68   34.19
90% NNX 10% GX   NoM       R    7.23   7.44  41.16   13.39
                           H    5.19   6.03  34.93   14.95
                 M1        R    1.84   2.35  55.60   19.14
                           H    1.67   2.38  46.93   20.16
                 M2        R    0.51   0.67  37.13   19.26
                           H    0.48   0.63  37.24   21.95
95% NNX 5% GX    NoM       R    6.69   7.03  41.23    6.74
                           H    5.06   5.98  33.04    8.93
                 M1        R    1.77   2.39  52.62   10.03
                           H    1.41   2.43  44.33   11.30
                 M2        R    0.49   0.62  37.15   11.58
                           H    0.44   0.58  36.19   14.88

Problem instances: berlin52, tr81, eil101, bier127, ch130, pr136, ch150, u159, kroa200, and
pr226.
observed when mutating only the best offspring. As expected, NG is higher for the
random initial population than for the hybrid one.
In implementing our multi-crossover approach, we tried mixing NNX and GX
in various ratios. We started with 50% NNX and 50% GX and observed that both
the solution quality and the computation time are in between those obtained with the
individual operators. As NNX yields better solution quality in shorter time compared
to GX, we decided to increase the contribution of NNX. After experimenting with
90% and 95% NNX, we found that the best results are obtained with the latter. With
95% NNX and 5% GX, M2, and the hybrid initial population, the EA's results deviate
from the optimal by only 0.44%. The same figure is 0.55% with NNX only. This
shows that using multiple crossover operators instead of a single one can indeed
bring advantages in terms of solution quality. Our observation is also consistent
with the results reported by Potvin and Bengio [22]. The results for the 10 instances
are given in Table 15.4 for the best mixture of crossovers with p = 100. When p is
increased from 50 to 100, the average deviation reduces to 0.34%, which is quite a
satisfactory result in 25 seconds.
Table 15.4 Average results over 30 replications of the 10 small problem instances with 95%
NNX and 5% GX where p = 100

                         NoM                       M1                        M2
Problem   IP     DB     DA    NG    CT     DB    DA    NG    CT     DB    DA    NG    CT
berlin52  R    1.22   1.91  35.9   0.7   0.07  0.45  39.0   0.8   0.00  0.00  14.6   0.8
          H    0.92   1.61  27.5   0.7   0.00  0.00  32.3   0.7   0.00  0.00  13.2   0.8
tr81      R    3.35   3.57  42.5   2.5   0.97  1.45  51.7   3.1   0.48  0.53  34.5   4.6
          H    3.48   3.80  34.2   2.7   0.84  1.21  50.6   3.6   0.47  0.48  37.7   5.8
eil101    R    6.60   6.75  62.1  12.5   2.10  2.64  83.8   9.0   0.93  1.06  69.9  15.6
          H    3.78   3.93  63.0  10.3   0.93  1.37  81.1   8.7   0.82  0.92  60.9  14.3
bier127   R    3.15   3.25  60.2  10.7   0.77  1.14  73.6  13.4   0.26  0.28  28.5   9.6
          H    3.72   3.81  53.0  12.1   0.62  0.93  71.5  15.3   0.28  0.28  27.9  12.0
ch130     R    5.49   5.68  45.8  20.0   2.14  2.49  69.0  23.8   0.76  0.90  53.4  22.6
          H    6.16   6.19  41.2  12.7   2.39  2.50  51.7  18.6   0.85  0.99  47.0  22.3
pr136     R   11.09  11.20  42.9  11.8   4.85  5.78  68.6  19.2   0.41  0.46  64.0  24.6
          H    5.92   5.92  40.4  14.2   2.87  3.52  55.3  18.4   0.37  0.49  67.1  35.2
ch150     R    3.18   3.26  44.0  15.3   0.28  0.42  58.8  21.1   0.26  0.40  26.3  15.8
          H    3.44   3.51  36.3  19.0   0.37  0.44  51.9  23.5   0.24  0.35  26.9  20.7
u159      R    6.45   6.80  42.6  14.9   0.81  0.97   6.5  23.5   0.04  0.14  31.6  19.6
          H    6.95   6.98  33.5  17.8   1.03  1.28  53.0   2.2   0.00  0.04  25.4  21.1
kroa200   R    8.37   8.43  42.6  33.3   1.68  1.78  85.7  72.4   0.33  0.43  64.1  84.3
          H    9.12   9.27  41.2  45.8   1.78  2.23  89.6  87.3   0.32  0.43  65.6  98.6
pr226     R    5.06   5.14  60.6  62.4   0.72  1.10  72.8  77.3   0.01  0.01  40.5  64.8
          H    4.26   4.27  54.4  76.4   0.79  1.21  70.5  93.0   0.01  0.07  40.2  21.6
Average   R    5.40   5.60  47.9  18.4   1.44  1.82  61.0  26.4   0.35  0.42  42.7  26.2
          H    4.78   4.93  42.5  21.2   1.16  1.47  60.8  27.1   0.34  0.41  41.2  25.2
386
H. Sural et al.
Table 15.5 Average results over 10 replications of the four larger problem instances where
p = 100
Crossover        Mutation |  DB    DA     NG      CT
95% NNX, 5% GX   NoM      | 8.43  8.46   74.5   494.0
                 M1       | 3.90  4.66  120.9   972.4
                 M2       | 2.01  2.74  105.2   964.0
100% NNX         NoM      | 4.73  7.03   92.4     4.5
                 M1       | 3.25  4.61  133.8    11.7
                 M2       | 2.13  3.05  102.1   206.3
are quite comparable with those for NNX only. Moreover, the multi-crossover approach requires significantly longer computation times than NNX. Therefore, we
decided to run the EA using only NNX and a random initial population for solving
the larger test problems.
The final test bed includes 15 instances, where 318 ≤ n ≤ 1748, including the
four instances in the preliminary test. The average results over 10 replications are
given in Table 15.6. Coupled with M1, NNX achieves an average deviation of 4.9%
from the optimal in about 65 seconds. M2 requires more CT and slightly improves
this deviation.
Table 15.6 Average results over 10 replications of the 15 larger problem instances with NNX
only where p = 100
          |        NoM        |        M1         |        M2
Problem   |  DB     DA    CT  |  DB     DA    CT  |  DB     DA      CT
lin318    |  4.44   6.67    3 |  1.87   2.60    8 |  2.01   2.93   105
fl417     |  4.93   6.67    5 |  2.93   4.73   14 |  1.83   3.34   210
pr439     |  4.38   6.76    4 |  3.44   3.68   10 |  1.48   1.56   240
pcb442    |  5.15   8.00    6 |  4.75   7.42   15 |  3.18   4.37   270
rat575    |  5.30   7.94   11 |  5.16   7.15   21 |  4.07   6.03   345
p654      |  7.47  10.50   11 |  3.09   7.68   27 |  2.22   4.70   360
d657      |  7.95  10.03   12 |  5.16   7.35   39 |  4.70   6.57   390
u724      |  6.35   8.29   15 |  5.14   6.64   39 |  3.87   5.42   720
rat783    |  8.40  10.38   21 |  5.84   6.75   46 |  5.33   7.46   480
u1060     |  9.56  13.81   38 |  6.68   9.06   93 |  7.44  10.19  1080
vm1084    | 10.01  12.55   24 |  5.77  10.58   65 |  6.52   9.19  1470
pcb1173   |  9.47  13.26   36 |  3.00   5.52   97 |  8.01  10.27  1230
nrw1379   | 10.60  13.26   63 |  7.35  10.41  181 |  6.65   9.09  1800
u1432     | 10.79  13.08   65 |  6.89   9.95  117 |  6.08   8.12  3030
vm1748    |  9.31  11.16   65 |  7.05   9.99  203 |  7.09   9.78  4215
Average   |  7.61  10.16 25.3 |  4.94   7.30 65.0 |  4.70   6.60 1063.0
Finally, we present Table 15.7 for the comparison of the EA with the two metaheuristics in the literature, Meta-RaPS [6] and ESOM [5]. It can be seen from the
table that the EA outperforms both heuristics on the 10 benchmark TSPs. The table
also reports CPU times for each competitor.
Table 15.7 Results for the first EA and two metaheuristics in the literature
          |  EA-M1       |  EA-M2       |  Meta-RaPS^2  |  ESOM^3
Problem   |  DB    CT^1  |  DB    CT^1  |  DB     CT^2  |  DB    CT^3
eil101    | 0.93     8.7 | 0.82    14.3 |  NA      NA   | 3.43    NA
bier127   | 0.62    15.3 | 0.28    12.0 | 0.90     48   | 1.70    NA
pr136     | 2.87    18.4 | 0.37    35.2 | 0.39     73   | 4.31    NA
kroa200   | 1.78    87.3 | 0.32    98.6 | 1.07    190   | 2.91    NA
pr226     | 0.79    93   | 0.01    21.6 | 0.23    357   |  NA     NA
lin318    | 1.87     8   | 2.01   105   |  NA      NA   | 2.89    NA
pr439     | 3.44    10   | 1.48   240   | 3.30   2265   |  NA     NA
pcb442    | 4.75    15   | 3.18   270   |  NA      NA   | 7.43    NA
pcb1173   | 3.00    97   | 8.01  1230   |  NA      NA   | 9.87   200
vm1748    | 7.05   203   | 7.09  4215   |  NA      NA   | 7.27   475
1 Pentium 4, 1.6 GHz. 2 AMD Athlon, 900 MHz. 3 SUN Ultra 5/270.
NA: Not available. EA results are the best results given in Table 15.4 (15.6) for small (larger)
problem instances.
15.3 The Second Evolutionary Algorithm for the TSP and the
TSPB
The second EA can be perceived as an enhancement of the first algorithm based
on our previous experimental results. Combined use of NNX and GX slightly improves the solution quality, but NNX is much faster than GX. Hence, we focus on
NNX and want to explore generating multiple offspring from more than two parents in an attempt to preserve parental edges while keeping the population diverse.
Also, considering that the 2-edge exchange mutation limits diversity and takes significant computation time, we propose faster mutation operators that will increase
population diversity.
All combinations of the above parameters are replicated 30 times for each problem. Analysis of variance results show that all parameters and two-way interactions
(except the P×C interaction) have a significant effect at α = 0.05. According to the two-way interaction plots, the best settings are P=2, C=10, random parent selection and
RP replacement. It is interesting that, although generating multiple offspring brings
a significant improvement, using more than two parents has an adverse effect. This
is probably because as the number of parents increases the union graph resembles
the complete graph. The tours NNX generates on this graph become similar to those
that would be generated with the deterministic nearest neighbor heuristic, resulting
in premature convergence. For generating diversified offspring, using two parents
seems to provide a good balance between the edges inherited from the parents and
the edges borrowed from the complete graph.
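The NNX construction described above can be sketched as a nearest-neighbor walk restricted to the union graph of the two parents' edges, borrowing the nearest unvisited city from the complete graph when the walk gets stuck. The data structures and names below are our own illustration, not the authors' code:

```python
import random

def nnx(parent1, parent2, dist, rng=random):
    """Nearest-neighbor crossover sketch on the union graph of two parent
    tours (lists of city ids); dist[i][j] is the distance matrix."""
    n = len(parent1)
    # adjacency of the union graph: neighbors of each city in either parent
    adj = {c: set() for c in parent1}
    for tour in (parent1, parent2):
        for i, c in enumerate(tour):
            adj[c].update((tour[i - 1], tour[(i + 1) % n]))
    current = rng.choice(parent1)
    tour, visited = [current], {current}
    while len(tour) < n:
        candidates = [c for c in adj[current] if c not in visited]
        if not candidates:  # walk is stuck: fall back to the complete graph
            candidates = [c for c in parent1 if c not in visited]
        current = min(candidates, key=lambda c: dist[tour[-1]][c])
        tour.append(current)
        visited.add(current)
    return tour
```

With two parents, most candidate edges come from the union graph, which is what preserves parental edges while the occasional complete-graph fallback injects diversity.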
We have also tried generating the initial population with the nearest neighbor heuristic and selecting edges in NNX by alternating between the parents (similar to the
AB cycles used in EAX [8]), with no further improvement.
Analysis of variance results indicate that P=C and LEM have a significant effect on
solution quality, as does their two-way interaction, at α = 0.01. According to the
two-way interaction plots, the best settings are P=C=2, the LEM2/NIM2 combination,
and RMH replacement. Note that RMH with two parents is equivalent to RP with
two parents given in Sect. 15.3.1.
Set population size to n for small problems, to 200 for larger problems
Generate initial population randomly
for generation = 1 to NG
    for pair = 1 to 10
        Select two parents at random
        for offspring = 1 to 10
            Generate an offspring using NNX on the union graph of the parents
            If the offspring is better than its worse parent, go to M
        end for
    end for
    M: Apply REM2 or NIM2 with equal probabilities to the offspring
    Replace the worse parent with the mutated offspring
end for
Fig. 15.3 The pseudocode of the second EA
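A runnable rendering of one generation of the pseudocode in Fig. 15.3 can look as follows; the operator implementations are passed in as functions, and all names and signatures are our own illustration:

```python
def second_ea_generation(population, fitness, nnx_offspring, mutations, rng):
    """One generation of the second EA (cf. Fig. 15.3), minimizing fitness.

    nnx_offspring(p1, p2) generates one NNX offspring; mutations is a list
    of mutation operators (e.g. REM2, NIM2) chosen with equal probability."""
    for _ in range(10):                          # up to 10 parent pairs
        i, j = rng.sample(range(len(population)), 2)
        worse = i if fitness(population[i]) >= fitness(population[j]) else j
        for _ in range(10):                      # up to 10 NNX offspring
            child = nnx_offspring(population[i], population[j])
            if fitness(child) < fitness(population[worse]):
                # step M: mutate the improving offspring, replace worse parent
                population[worse] = rng.choice(mutations)(child)
                return population
    return population
```

The generation ends as soon as an offspring beats its worse parent; if ten pairs each fail ten times, the population is left unchanged for that generation.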
This algorithm is run for the larger problem instances, and the average results
over 10 (5) replications for instances with n ≤ 2000 (n > 2000) are summarized
in Table 15.9. The algorithm achieves an average deviation of 2.3% from the optimal. As expected, CT increases as n increases. However, the DBR and DAR values do
not increase monotonically. Instance characteristics seem to make a difference, and
problems nrw1379, u2152 and pr2392 prove to be more difficult to solve.
Table 15.8 Results of the second EA over 30 replications for the eight small problem
instances
          | NoM with RP
Problem   | DBR   DAR
berlin52  | 0.00  1.24
eil101    | 0.79  4.32
bier127   | 0.61  1.74
ch130     | 1.64  3.12
ch150     | 1.23  1.95
u159      | 1.30  5.09
kroa200   | 1.25  1.83
pr226     | 0.86  1.44
Average   | 0.96  2.59
Table 15.9 Results of the second EA for the 17 larger problem instances
Problem   |   NG    | DBR   DAR      CT
lin318    | 10,000  | 0.92  1.36    33.1
fl417     | 10,000  | 0.95  2.03    43.0
pr439     | 10,000  | 1.33  1.48    33.9
pcb442    | 10,000  | 2.58  3.44    66.7
rat575    | 10,000  | 1.64  2.23    93.4
p654      | 10,000  | 0.91  1.19    92.7
d657      | 10,000  | 3.32  5.17   127.5
u724      | 10,000  | 2.33  3.52   118.8
rat783    | 10,000  | 2.07  3.35   140.7
u1060     | 40,000  | 1.92  2.78  1333.8
vm1084    | 40,000  | 2.53  3.29   993.2
pcb1173   | 40,000  | 3.88  4.48  1191.9
nrw1379   | 40,000  | 3.81  6.63  4775.3
u1432     | 40,000  | 2.25  3.17  1999.7
vm1748    | 40,000  | 3.13  3.59  1710.4
u2152     | 60,000  | 1.52  1.98  7623.5
pr2392    | 60,000  | 4.67  7.31 13242.6
Average   | 26,471  | 2.34  3.35  1977.7
of the test instances. We generate 30 instances for each (n, r) pair, where r is the
backhaul fraction. Note that r = 0.1 indicates an instance with 10% backhauls.
We follow [17] to ensure that all linehauls precede all backhauls: in transforming the TSPB into a TSP, we add a very large value to the distance between each pair of
linehaul and backhaul customers. Using the modified distance matrix forces the
second EA to visit all linehauls prior to all backhauls.
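The distance-matrix transformation can be sketched as follows; the way the large constant is chosen is our own (any value exceeding the length of every feasible tour works), and the function name is illustrative:

```python
def tspb_to_tsp(dist, linehauls, backhauls, big_m=None):
    """Penalize every linehaul-backhaul edge so any optimal TSP tour
    crosses between the two sets only twice, i.e. visits all linehauls
    before all backhauls (cf. the transformation described above)."""
    if big_m is None:
        big_m = sum(map(sum, dist)) + 1     # safely dominates any tour length
    d = [row[:] for row in dist]            # do not modify the input matrix
    for i in linehauls:
        for j in backhauls:
            d[i][j] += big_m
            d[j][i] += big_m
    return d
```

Since a cyclic tour must cross between the linehaul and backhaul sets at least twice, every tour pays at least 2·big_m; minimizing the penalized length therefore minimizes the number of crossings, which yields one contiguous linehaul block followed by one backhaul block.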
Table 15.10 presents the averages over 30 randomly generated instances. Column GENI gives the results of the construction heuristic, column GENIUS presents
Table 15.10 Average results of the second EA over five replications and various metaheuristics in the literature
n     r    |    GENI   GENIUS    G+VNS     SOFM    SOFM*  |  EA DBR |  EA DAR
100   0.1  |  1012.5   994.12   987.11  1043.56   996.13  |  994.18 |  994.57
      0.2  |  1068.7  1047.01  1044.66  1072.3   1052.71  | 1052.06 | 1052.30
      0.3  | 1109.66  1088.09  1085.34  1108.54  1092.07  | 1089.04 | 1089.85
      0.4  | 1125.63  1106.69  1102.29  1131.83  1106.97  | 1100.51 | 1101.02
      0.5  | 1133.87  1114.34  1108.68  1123.29  1112.37  | 1103.16 | 1104.31
Average    | 1090.07  1070.05  1065.62  1095.90  1072.05  | 1067.79 | 1068.41
200   0.1  | 1418.63  1387.22  1378.8   1436.12  1381.15  | 1381.17 | 1381.85
      0.2  | 1498.83  1470.95  1464.88  1489.91  1462.32  | 1467.57 | 1467.85
      0.3  | 1550.52  1525.26  1519.93  1545     1523.71  | 1502.41 | 1502.99
      0.4  | 1585.76  1555.26  1548.73  1554.69  1551.48  | 1521.97 | 1522.78
      0.5  | 1586.93  1554.13  1546.97  1553.9   1549.61  | 1534.64 | 1535.90
Average    | 1528.13  1498.56  1491.86  1515.92  1493.65  | 1481.55 | 1482.27
300   0.1  | 1720.82  1683.76  1675.82  1702.6   1680.93  | 1668.96 | 1671.28
      0.2  | 1824.62  1784.8   1782.62  1787.14  1784.9   | 1773.81 | 1774.43
      0.3  | 1886.48  1854.86  1849.05  1877.15  1854.3   | 1830.29 | 1832.32
      0.4  | 1903.29  1874.43  1865.75  1876.49  1866.84  | 1868.45 | 1869.54
      0.5  | 1927.34  1892.2   1887.35  1891.39  1888.92  | 1873.38 | 1874.08
Average    | 1852.51  1818.01  1812.12  1826.95  1815.18  | 1802.98 | 1804.33
500   0.1  | 2197.16  2158.79  2156.61  2168.59  2161.07  | 2139.06 | 2142.44
      0.2  | 2342.99  2297.11  2292.04  2310.7   2297.35  | 2279.95 | 2285.57
      0.3  | 2409.8   2370.45  2363.16  2398.49  2376.73  | 2342.80 | 2347.55
      0.4  | 2443.12  2399.35  2388.07  2441.94  2397.06  | 2392.52 | 2398.11
      0.5  | 2464.11  2418.2   2405.55  2428.72  2410.81  | 2405.93 | 2411.29
Average    | 2371.44  2328.78  2321.09  2349.69  2328.60  | 2312.05 | 2316.99
1000  0.1  | 3099.17  3042.6   3029.76  3083.59  3048.69  | 3035.92 | 3044.57
      0.2  | 3281.34  3232.65  3213.61  3279.02  3228.44  | 3212.79 | 3216.77
      0.3  | 3366.07  3314.8   3302.93  3327.04  3312.38  | 3323.03 | 3328.28
      0.4  | 3451.02  3387.43  3366.23  3392.17  3371.28  | 3358.36 | 3366.75
      0.5  | 3455.69  3388.16  3379.67  3408.41  3386.50  | 3398.27 | 3403.74
Average    | 3330.66  3273.13  3258.44  3298.05  3269.46  | 3265.67 | 3272.02
Overall    | 2034.56  1997.71  1989.83  2017.30  1995.79  | 1986.01 | 1988.81
Table 15.11 Average CPU times in seconds for the second EA and various metaheuristics in
the literature
n     r    | GENIUS^1  G+VNS^1   SOFM^2  SOFM*^2     EA^3
100   0.1  |     4.7      5.4     23.5     23.7     20.5
      0.2  |     4.8      5.8     22.1     22.8     18.5
      0.3  |     4.8      5.5     21.2     21.6     17.9
      0.4  |     5.1      5.4     27.6     28.0     17.3
      0.5  |     4.4      5.6     23.1     23.8     17.2
Average    |     4.8      5.5     23.5     24.0     18.3
200   0.1  |    36.3     31.6     61.1     61.7     38.0
      0.2  |    32.4     35.9     63.6     64.3     38.7
      0.3  |    31.7     30.7     71.3     71.9     35.8
      0.4  |    31.2     38.8     72.9     73.8     35.6
      0.5  |    39.6     43.1     62.1     62.6     35.2
Average    |    34.2     36.0     66.2     66.9     36.7
300   0.1  |   106.4    109.3    237.9    239.2     68.5
      0.2  |   105.9     87.2    278.3    283.6     65.4
      0.3  |    70.9    100.1    286.3    287.8     68.8
      0.4  |    69.6    105.6    365.0    371.0     68.7
      0.5  |    72.3    101.1    354.0    360.3     66.2
Average    |    85.0    100.7    304.3    308.4     67.5
500   0.1  |   325.6    343.6    732.0    749.7    317.4
      0.2  |   289.5    248.1    729.0    751.9    300.5
      0.3  |   317.7    383.3    798.0    821.5    321.4
      0.4  |   374.0    326.1    802.0    834.2    343.1
      0.5  |   405.9    472.2    852.0    872.0    295.4
Average    |   342.5    354.7    782.6    805.9    315.6
1000  0.1  |  1130.3   1417.9   1398     1428.1   1933.7
      0.2  |  1211.1   1637.2   1423     1495.3   1921.4
      0.3  |  1019.8   1643.1   1412     1432.1   1829.1
      0.4  |  1302.8   1898.3   1435     1470.5   1776.9
      0.5  |  1324.6   1762.6   1402     1440     1832.9
Average    |  1197.7   1671.8   1414     1453.2   1858.8
Overall    |   332.9    433.7    518.1    531.7    459.4
the results when the US improvement procedure is applied after GENI, and column G+VNS is
for the variable neighborhood search applied to GENIUS [20]. The neighborhood
is formed by node exchange moves: a node is deleted from the tour and inserted
at a point that improves the tour length. Columns SOFM and SOFM* display the
results of the self-organizing feature map type neural network algorithms in [21].
SOFM* corresponds to the procedure where the SOFM solutions are improved with
2-opt.
The last two columns represent the results of our second EA in its final form.
The EA is run for 10,000 generations for problem instances with n = 100, 200, 300, and
for 20,000 (30,000) generations for instances with n = 500 (1000). Column EA
DBR in Table 15.10 presents, averaged over the 30 problem instances, the best of the
five replications, and EA DAR is the average over the five replications of the
30 instances.
A paired t-test on the difference between EA DAR and each of GENI, GENIUS,
SOFM, SOFM* and G+VNS indicates that the EA is better than the first four competitors at α = 0.01. The overall average is also better than that of G+VNS, but the difference is too small to yield a statistically significant result (p-value 0.122).
When n ≥ 500, the test on the difference between EA DAR and GENI, GENIUS,
SOFM, SOFM* and G+VNS indicates that the EA is statistically the best algorithm
among all alternatives. The p-value for the G+VNS versus EA DAR comparison is
0.011.
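The paired comparison above boils down to a t statistic computed on the per-instance differences between two result columns. A minimal stdlib sketch (the data in the test are illustrative, not values from Table 15.10; significance would be read from a t table):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic of a paired test on two matched result lists
    (e.g. EA DAR vs. G+VNS per-instance averages)."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))
```

A large positive (negative) t indicates that the first (second) method produces systematically larger values over the matched instances.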
The results on the random TSPB instances indicate that our second algorithm
with REM2/NIM2 mutation is better than three of the competitors and is comparable to
G+VNS. The application of the EA is simple, and explicit constraint handling is eliminated,
as the algorithm can effectively find good solutions just through the necessary
modifications of the distance matrix.
Table 15.11 reports CPU times for the various metaheuristics. Although each competitor uses a different machine, there are no significant differences in CPU times
for problems of a given size.
15.4 Conclusion
We presented two EAs using conventional heuristics to solve TSPs. The first EA
uses nearest neighbor and greedy heuristics as crossover operators. Their application
on the union graph resembles the implementation of classical heuristics on a candidate graph of k-nearest neighbors of each node. The mutation operator makes use
of the well known 2-edge exchange heuristic. The two crossovers are also used in
a combined manner and performed better than a single operator alone, but required
more computation time for larger problems. The EA solutions are significantly better than those obtained by conventional heuristics and by some recent metaheuristics
in the literature. The second EA uses only NNX and three new mutation operators
based on longest or random edge elimination and node insertion. Solution quality is
further improved with this EA in reasonable computation times.
Considering that in practice the TSP materializes with complicating side constraints,
we tested our second EA on randomly generated TSPB benchmark sets. To the best
of our knowledge there are no published results on the TSPB solved using an EA.
Our EA in general outperforms other metaheuristics for the TSPB. Our experience
shows that it might be easier to incorporate constraint handling into this form of
EA than into the specialized and sophisticated classical TSP algorithms.
For future research, NNX can be further improved to solve large problems
in shorter time. When offspring construction on the union graph
fails, the search of the complete graph for the shortest edge can be improved: a k-nearest neighbor candidate graph can be used to speed up the search, as suggested
by Yang [26]. We believe that these efforts will take us one step closer to bridging the
gap between operations research and EAs, because the proposed approach can also
be generalized for solving other expensive combinatorial optimization problems.
References
1. Traveling Salesman Problem, http://www.tsp.gatech.edu/ (last access: July 2009)
2. Gutin, G., Punnen, A.P.: The Traveling Salesman Problem and Its Variations (Combinatorial Optimization). Kluwer Academic, Dordrecht (2002)
3. Jünger, M., Reinelt, G., Rinaldi, G.: The Traveling Salesman Problem. In: Monma, C.L., Ball, M.O., Magnanti, T., Nemhauser, G. (eds.) Network Models. Handbook on Operations Research and Management Science, vol. 7, pp. 225–230. Elsevier Science, Amsterdam (1995)
4. Johnson, D.S., McGeoch, L.A.: The Traveling Salesman Problem: A Case Study in Local Optimization. In: Aarts, E.H.L., Lenstra, J.K. (eds.) Local Search in Combinatorial Optimization, pp. 215–310. John Wiley, New York (1997)
5. Leung, K.S., Jin, H.D., Xu, Z.B.: An Expanding Self-Organizing Neural Network for the Traveling Salesman Problem. Neurocomputing 62, 267–292 (2004)
6. DePuy, G.W., Moraga, R.J., Whitehouse, G.E.: Meta-RaPS: A Simple and Effective Approach for Solving the Traveling Salesman Problem. Transportation Research Part E 41(1), 115–130 (2005)
7. Michalewicz, Z., Fogel, D.B.: How to Solve It: Modern Heuristics. Springer, New York (2000)
8. Nagata, Y., Kobayashi, S.: Edge Assembly Crossover: A High-power Genetic Algorithm for the Traveling Salesman Problem. In: Bäck, T. (ed.) Proceedings of the Seventh International Conference on Genetic Algorithms, pp. 450–457. Morgan Kaufmann, San Mateo (1997)
9. Schmitt, L.J., Amini, M.M.: Performance Characteristics of Alternative Genetic Algorithmic Approaches to the Traveling Salesman Problem Using Path Representation: An Empirical Study. European Journal of Operational Research 108(3), 551–570 (1998)
10. Xiaoming, D., Runmin, Z., Rong, S., Rui, F., Shao, H.: Convergence Properties of Non-Crossover Genetic Algorithm. In: Proceedings of the Fourth World Congress on Intelligent Control and Automation, pp. 1822–1826 (2002)
11. Potvin, J.-Y.: Genetic Algorithms for the Traveling Salesman Problem. Annals of Operations Research 63, 339–370 (1996)
12. Reinelt, G.: TSPLIB – A Traveling Salesman Problem Library. INFORMS Journal on Computing 3(4), 376–384 (1991)
13. Chen, S.Y.: Is the Common Good? A New Perspective Developed in Genetic Algorithms. PhD thesis, Carnegie Mellon University, Pittsburgh, USA (1999)
14. Jung, S., Moon, B.R.: Toward Minimal Restriction of Genetic Encoding and Crossovers for the Two-Dimensional Euclidean TSP. IEEE Transactions on Evolutionary Computation 6(6), 557–565 (2002)
15. Merz, P.: A Comparison of Memetic Recombination Operators for the Traveling Salesman Problem. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 472–479. Morgan Kaufmann, San Francisco (2002)
16. Ray, S.S., Bandyopadhyay, S., Pal, S.K.: New Operators of Genetic Algorithms for Traveling Salesman Problem. In: Proceedings of the 17th International Conference on Pattern Recognition, pp. 497–500. IEEE Computer Society, Washington (2004)
17. Chisman, J.A.: The Clustered Traveling Salesman Problem. Computers and Operations Research 2(2), 115–119 (1975)
18. Gendreau, M., Hertz, A., Laporte, G.: The Traveling Salesman Problem with Backhauls. Computers and Operations Research 23(5), 501–508 (1996)
19. Gendreau, M., Hertz, A., Laporte, G.: New Insertion and Post-Optimization Procedures for the Traveling Salesman Problem. Operations Research 40, 1086–1094 (1992)
20. Mladenović, N., Hansen, P.: Variable Neighborhood Search. Computers and Operations Research 24, 1097–1100 (1997)
21. Ghaziri, H., Osman, I.H.: A Neural Network Algorithm for the Traveling Salesman Problem with Backhauls. Computers and Industrial Engineering 44(2), 267–281 (2003)
22. Potvin, J.-Y., Bengio, S.: The Vehicle Routing Problem with Time Windows. Part II: Genetic Search. INFORMS Journal on Computing 8, 619–632 (1996)
23. Reinelt, G.: The Traveling Salesman Problem: Computational Solutions for TSP Applications. Springer, Berlin (1994)
24. Baraglia, R., Hidalgo, J.I., Perego, R.: A Hybrid Heuristic for the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation 5(6), 613–622 (2001)
25. Merz, P., Freisleben, B.: Genetic Local Search for the TSP: New Results. In: Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 159–164 (1997)
26. Yang, R.: Solving Large Traveling Salesman Problems with Small Population. In: Genetic Algorithms in Engineering Systems: Innovations and Applications Conference, pp. 157–162 (1997)
27. Tsai, H.K., Yang, J.M., Tsai, Y.F., Kao, C.Y.: A Heterogeneous Selection Genetic Algorithm for Traveling Salesman Problems. Engineering Optimization 35(3), 297–311 (2003)
28. Mühlenbein, H.: Parallel Genetic Algorithms, Population Genetics and Combinatorial Optimization. In: Proceedings of the Third International Conference on Genetic Algorithms, pp. 416–421. Morgan Kaufmann, San Francisco (1989)
Chapter 16
Abstract. Multi-objective genetic Takagi-Sugeno (TS) fuzzy systems use multi-objective evolutionary algorithms to generate a set of fuzzy rule-based systems of
the TS type with different trade-offs between, generally, complexity/interpretability
and accuracy. The application of these algorithms requires a large number of TS
system generations and evaluations. When we deal with high dimensional data sets,
these tasks can be very time-consuming, thus making an adequate exploration of the
search space very problematic. In this chapter, we propose two techniques to speed
up generation and evaluation of TS systems. The first technique aims to speed up the
identification of the consequent parameters of the TS rules, one of the most time-consuming phases in TS generation. The application of this technique produces as a
side-effect a decoupling of the rules in the TS system. Thus, modifications in a rule
do not affect the other rules. Exploiting this property, the second technique proposes
to store specific values used in the parents, so as to reuse them in the offspring and
to avoid wasting time. We show the advantages of the proposed method in terms
of computing time saving and improved search space exploration through two examples of multi-objective genetic learning of compact and accurate TS-type fuzzy
systems for a high dimensional data set in the regression and time series forecasting
domains.
16.1 Introduction
Multi-objective Genetic Fuzzy Systems (MGFSs) [18] are interesting multi-objective
computational intelligent methods successfully employed in regression [3, 4, 8, 9,
10], classification [19, 20, 21], data mining [7, 22] and control [32], which are receiving increasing attention in the research community [12]. MGFSs extend Genetic
Marco Cococcioni · Beatrice Lazzerini · Francesco Marcelloni
Dipartimento di Ingegneria dell'Informazione, University of Pisa,
Largo Lucio Lazzarino 1, 56122 Pisa, Italy
e-mail: m.cococcioni@iet.unipi.it, b.lazzerini@iet.unipi.it,
f.marcelloni@iet.unipi.it
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 397422.
c Springer-Verlag Berlin Heidelberg 2010
springerlink.com
Fuzzy Systems (GFSs) [16] in that more than one objective can be optimized; this
provides a set of equally optimal solutions which approximate the Pareto optimal
front. By visually inspecting the obtained front the user can select his preferred
trade-off among the different objectives (an example of a typically used pair of objectives is complexity/interpretability and accuracy).
In most existing MGFSs, Fuzzy Rule-Based Systems (FRBSs) are optimized
through a multi-objective evolutionary algorithm. While such systems have been
widely studied and applied to low dimensional data sets, more research is still required to make them practical for high dimensional data sets [16, 25]. By High
Dimensional Data Sets (HDDSs) we mean data sets having a high number of input features and, sometimes, a huge number of training data. In this study we will
consider only FRBSs of the Takagi-Sugeno (TS) type (denoted TS systems in the
following), since they are known to be very powerful in regression and control tasks.
However, their use is particularly time consuming, since they consist of if-then rules
where the consequent parts involve crisp functions (typically linear), whose parameters have to be estimated through a sequence of pseudoinversions. When dealing
with HDDSs, the most time consuming task in the evolutionary multi-objective optimization of TS systems resides in the fitness evaluation, and, in particular, in consequent parameters estimation [11].
Most available design methods for TS-type MGFSs are impractical for HDDSs.
Moreover, it is very unlikely that a single strategy can make the optimization efficient. Rather, we think that only a combination of different methods can overcome computational bottlenecks. In the following we list some of the methods that,
once appropriately combined, can achieve significant improvements in the efficient
implementation of TS-type MGFSs for HDDSs: i) fast identification of the TS
systems [11] through fitness approximation [26], ii) reuse of previously computed
quantities in evaluating the fitness of a TS system (activation degrees, etc.), iii)
fitness inheritance [6, 14], iv) landscape modeling to reduce the number of fitness
evaluations [27], v) parallel Multi-Objective Evolutionary Algorithms (MOEAs) on
parallel/multicore/distributed machines [5, 33, 34, 35].
This work aims to extend the results obtained in [11] by considering the integration of the first two methods among those listed above: a fast identification of
consequents of the TS systems and the reuse of previously computed quantities.
The first technique is described in [11] and will be briefly discussed in this chapter. As regards the reuse of previously computed quantities, our simple idea is to
store, for all the rules in the current population, their activation degrees and their
unnormalized weighted outputs. Since fast identification of consequents is based
on a decoupling of rules, modifications in a rule do not affect the consequents of
the other rules. Thus, thanks to the fast identification, we can avoid re-estimating
the consequent parameters for all the rules that are not modified. For instance, the re-estimation can be avoided when we apply crossover followed by no mutation: if the
crossover point is between rules and not within rules, no re-estimation of consequent
parameters is needed for the offspring. Similarly to crossover, we can completely
avoid re-estimation when we apply mutation operators which remove rules from the
rule base. Further, we can limit the re-estimation only to the added or modified rules
when we apply mutation operators which, respectively, add or modify rules. Finally,
we also discuss how the reuse of previously computed parameters can speed up the
evaluation phase of the TS systems.
In the experimental part, we show the benefits of the combined use of the two
methods through two examples of multi-objective genetic learning of compact and
accurate TS systems for HDDSs in a regression problem and a chaotic time series
forecasting problem, respectively. We evaluate the saved time with respect to the
application of the MGFS without the adoption of the two methods and discuss the
advantages that this saved time provides in terms of search space exploration.
The rest of the chapter is organized as follows. Section 16.2 introduces the TS
systems. In Section 16.3, we compute the asymptotic time complexity associated
with the identification of the consequent parameters, and with the evaluation of the
TS systems. Section 16.4 introduces the technique of the fast identification of consequents. In Section 16.5, we discuss how the reuse of computed parameters can
speed up both the identification and the evaluation of the TS systems. Section 16.6
describes the MOEA used in the experiments, which are shown in Section 16.7.
Finally, Section 16.8 draws some conclusions.
The output of the TS system is computed as

y(x) = Σ_{m=1}^{M} w_m(x) y_m(x) / Σ_{h=1}^{M} w_h(x).

By defining

v_m(x) = w_m(x) / Σ_{h=1}^{M} w_h(x),    (16.1)
the output can be rewritten as

y(x) = Σ_{m=1}^{M} v_m(x) y_m(x).    (16.2)

Each rule antecedent is identified by a row of the matrix J of fuzzy-set indices, drawn from

{1, . . . , Q} × · · · × {1, . . . , Q}  (F times).    (16.3)
rectangular [17] matrices, they are generally faster only asymptotically and quite
difficult to implement. On the other hand, the formulas in [11] are still valid even
when using fast multiplication algorithms, but with a lower impact.
In the following, we will briefly recall the procedure and the results shown in
[11]. Without considering the time required to compute Vm (which will be analyzed
later), it is well known that the time complexity, independently of the technique used
to solve the weighted least squares problem, associated with each pseudoinversion
is O(N · (F + 1)²). Estimating the consequent parameters of a rule base made of M
rules will therefore require O(M · N · (F + 1)²) operations.
As regards the time complexity associated with the evaluation of a TS system, assuming that J and P are already known, we first compute the M vectors of activation degrees w_m (w_m = [w_m(x_1), . . . , w_m(x_N)]^T, m = 1, . . . , M), and
then we derive the normalized vectors v_m. Often, it might be more efficient to
work with vectors and matrices instead of using scalar quantities (for instance,
when adopting Matlab [15]). To this aim, it is helpful to first define vectors a f ,q ,
(a f ,q = [A f ,q (x1, f ), . . . , A f ,q (xN, f )]T , f = 1, . . . , F, q = 1, . . . , Q), to represent the
membership degrees of each input variable to the pertinent fuzzy set for the whole
training set. Then we compute all the w_m from the a_{f,q} using the vectorized form

w_m = a_{1,j_{m,1}} ⊙ a_{2,j_{m,2}} ⊙ · · · ⊙ a_{F,j_{m,F}},    (16.4)

where ⊙ denotes the elementwise product.
The complexity for computing each a_{f,q} is O(N), assuming that the complexity for
computing each A_{f,q}(x_{n,f}) is constant, i.e., O(1). For instance, the latter assumption is true when using triangular, trapezoidal or generalized bell MFs. The
use of Gaussian MFs may require significantly higher computation [30], since the Gaussian
is not a rational function. Nevertheless, even for this case, efficient approximate
algorithms exist, which exploit the internal representation of floating point numbers
[31]. In the considered application, the evaluation of the MFs will never be a bottleneck, since in Multi-objective Genetic Rule Learning (MGRL) each a_{f,q} can be
computed just once at the beginning and reused during the optimization process. We
can observe now that computing each w_m from the a_{f,q} takes O(N · F) time. As regards
the v_m, an efficient way to compute them in a vectorized form from the w_m can be obtained
by first building the matrix W = [w_1, . . . , w_M] (W ∈ R^{N×M}) and then computing the vector s = [s_1, . . . , s_N]^T in O(M · N) time, where
s_n = 1 / Σ_{m=1}^{M} w_m(x_n).
Second, we build the matrix S = [s, . . . , s] ∈ R^{N×M} (s repeated M times) and finally compute V = W ⊙ S.
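In NumPy terms (our rendering; the chapter's own experiments use Matlab), the normalization amounts to a single broadcasted product:

```python
import numpy as np

def normalize_activations(W):
    """Compute V from W (W is N x M): s_n is the reciprocal of the n-th
    row sum of W, and V = W * S elementwise, so each row of V sums to 1."""
    s = 1.0 / W.sum(axis=1, keepdims=True)   # the vector s, O(M * N)
    return W * s                              # V = W (elementwise) S
```

Broadcasting replaces the explicit construction of the repeated matrix S, but performs the same O(M · N) work.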
The computation of y = Σ_{m=1}^{M} v_m ⊙ y_m first requires computing each v_m ⊙ y_m,
which has complexity O(N) when starting from v_m and y_m. Thus y is just the sum
of the M vectors v_m ⊙ y_m and has complexity O(M · N). Globally, the complexity
to evaluate the output vector y starting from the a_{f,q} (without considering the time for
computing the p_m) is O(M · N · (F + 1)). Finally, we observe that computing the MSE
between y and o has complexity O(N), since MSE = mean(e), and computing e =
(y − o) ⊙ (y − o) has complexity O(N).
p_m = [X_{i_m}^T X_{i_m}]^{-1} X_{i_m}^T o_{i_m},    (16.6)

where i_m denotes the vector of indices of the N_m training patterns assigned to rule m, and X_{i_m} and o_{i_m} collect the corresponding rows of X and o.
The complexity associated with the estimation of all the consequent parameters, therefore, is Σ_{m=1}^{M} N_m · (F + 1)². Observe that now Σ_{m=1}^{M} N_m ≤ N, and thus the new complexity Σ_{m=1}^{M} N_m · (F + 1)² is lower than, or equal to, N · (F + 1)², which is, in its turn, lower than M · N · (F + 1)². However, we have to include the complexity associated with computing the vectors i_m, which consists of M · N · F operations. Globally, the complexity associated with the estimation of consequent parameters in the fast approach is M · N · F + Σ_{m=1}^{M} N_m · (F + 1)², which is lower than M · N · (F + 1)².
Real data patterns are not uniformly distributed over the input space; as a consequence, some cells could receive fewer training data points than others or no training
points at all. However, this situation can be managed by removing rules with an insufficient number of patterns (for each cell we need to estimate (F + 1) parameters, and
thus we could require at least 4 · (F + 1) patterns). In MGRL
it can be meaningful not to consider a rule if few data points are directly related
to it. We have already shown in [11] that this approach considerably speeds up the
identification of the consequent parameters, without particularly deteriorating the
modeling capabilities of the TS system.
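The pruning heuristic just described can be sketched as follows (the threshold factor of 4 follows the rule of thumb above; the function name and signature are illustrative):

```python
def prune_sparse_rules(counts, F, factor=4):
    """Return the indices of rules backed by enough training patterns.

    counts : pattern counts N_m, one per rule
    F      : number of input variables
    Keeps rule m only if N_m >= factor * (F + 1), since each rule
    requires the estimation of (F + 1) consequent parameters.
    """
    threshold = factor * (F + 1)
    return [m for m, n in enumerate(counts) if n >= threshold]
```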
404
16.5.1.1
As we have already discussed above, when we apply an operator which changes the rule base, the overall matrix P of the consequent parameters has to be recomputed. This also occurs when we apply a mutation operator which removes one or more rules from the rule base. In the case of the fast identification method, since rules are independent of each other, no computation is needed: the consequent parts of the surviving rules do not have to be updated. Thus, the application of the mutation operator which removes rules does not require the execution of the most time-consuming task in TS identification.
16.5.1.3
When we apply a mutation operator which adds rules to the rule base, the overall matrix P of the consequent parameters has to be recomputed in the classical approach: since the rule base has changed, the consequent parameters of all the rules have to be updated. On the contrary, when we adopt the fast identification method, only the consequent parameters of the added rules have to be computed, since the rules already in the rule base do not have to be updated. Thus, the application of the mutation operator which adds rules requires the execution of the most time-consuming task in TS identification only for the added rules.
16.5.1.4
When we apply a mutation operator which modifies some rules in the rule base, the overall matrix P of the consequent parameters has to be recomputed in the classical approach: since the rule base has changed, the consequent parameters of all the rules have to be updated. On the contrary, when we adopt the fast identification method, only the consequent parameters of the modified rules have to be computed, since the unmodified rules do not have to be updated. Thus, the application of the mutation operator which modifies rules requires the execution of the most time-consuming task in TS identification only for the modified rules.
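Because the fast method decouples the rules, a rule's consequent parameters depend only on its own antecedent, so previously computed parameters can simply be cached and looked up when a rule survives a mutation unchanged. A sketch of this reuse mechanism (the estimator callable and class name are illustrative, not the authors' implementation):

```python
class ConsequentCache:
    """Reuse consequent parameters of unmodified rules across mutations.

    estimator : callable mapping an antecedent to its consequent
                parameters p_m (the expensive least-squares step)
    Antecedents serve as cache keys, since p_m depends only on which
    patterns fall in the rule's cell.
    """

    def __init__(self, estimator):
        self.estimator = estimator
        self.store = {}

    def get(self, antecedent):
        key = tuple(antecedent)
        if key not in self.store:           # only new/modified rules pay
            self.store[key] = self.estimator(antecedent)
        return self.store[key]
```

The expensive estimator then runs once per distinct antecedent, regardless of how many offspring inherit the rule.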
that in such cases we can save a factor of (F + 1) in time, which can be very important for high-dimensional problems.
[M_min, ρ_min], where M_min is the minimum number of rules which must be present in a rule base and ρ_min is the minimum of the numbers of rules in c1 and c2, and multiplying this number by (F + 1).
The first mutation operator adds γ rules to the rule base, where γ is randomly chosen in [1, γ_max]. The upper bound γ_max is fixed by the user. If γ + M > M_max, where M_max is the maximum possible number of rules in a generated TS system, then γ = M_max − M. For each rule r_m added to the chromosome, we generate a random number t, t ∈ [1, t_max], which indicates the number of input variables used in the antecedent of the rule. Then, we generate t natural random numbers between 1 and F to determine the input variables which compose the antecedent part of the rule. Finally, for each selected input variable f, we generate a random natural number j_{m,f} between 1 and Q_f, which determines the fuzzy set A_{f,j_{m,f}} to be used in the antecedent of rule r_m.
The second mutation operator randomly changes δ elements of matrix J. The number δ is randomly generated in [1, δ_max]. The upper bound δ_max is fixed by the user. For each element to be modified, a number is randomly generated in [0, Q_f], where f is the input variable corresponding to the selected matrix element (when the value 0 is selected, the condition corresponds to "don't care"). The element is modified only if the constraint on the maximum number t_max of input variables for each rule is satisfied; otherwise, the element maintains its original value.
The third mutation operator removes β rules from the rule base, where β is randomly chosen in [1, β_max]. In the experiments, we used β_max = min(β̄_max, M − M_min), where β̄_max is fixed by the user, and M and M_min are, respectively, the number of rules of the individual and the minimum number of rules allowed for all individuals.
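The rule-adding operator can be sketched as follows. This is an illustrative rendering of the description above, not the authors' code: the rule base is represented as a list of antecedent rows (one entry per input variable, 0 meaning "don't care"), and all names are hypothetical.

```python
import random

def add_rules_mutation(J, F, Q, g_max, t_max, M_max, rng=random):
    """First mutation operator: add up to g_max random rules to J.

    J : list of antecedent rows, one per rule; entry 0 = "don't care"
    F : number of input variables
    Q : list of Q_f, number of fuzzy sets for each variable f
    """
    g = rng.randint(1, g_max)
    g = min(g, M_max - len(J))            # respect the maximum rule-base size
    for _ in range(g):
        row = [0] * F                     # start from all "don't care"
        t = rng.randint(1, t_max)         # number of conditions in this rule
        for f in rng.sample(range(F), t): # t distinct input variables
            row[f] = rng.randint(1, Q[f]) # fuzzy set A_{f,j} for variable f
        J.append(row)
    return J
```

Passing an explicit `random.Random` instance as `rng` makes the operator reproducible across trials, mirroring the seeded runs used in the experiments.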
We start with two randomly generated solutions. At each iteration, the application of the crossover and mutation operators produces two new solutions z1 and z2 from two solutions c1 and c2 randomly picked from the archive. If the archive contains a unique solution, c1 and c2 correspond to this unique solution. We experimentally verified that the random extraction of the current solutions from the archive allows us to extend the set of non-dominated solutions contained in the archive, so as to obtain a better approximation of the Pareto front. In this chapter, we use two alternative stopping criteria, based on the number of epochs G and on the elapsed time ETtot, respectively. Figure 16.1 shows the flow-chart of the (2+2)M-PAES which uses the fast identification of TS-type FRBSs and the reuse. In the following, we will sometimes refer to the fast method with reuse simply as the fast method. When the reuse is not exploited, we will always refer to it as the fast with no reuse method.
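The (2+2)M-PAES main loop just described can be sketched with all problem-specific pieces stubbed out. The callables below (`init`, `vary`, `evaluate`, `update_archive`) are placeholders for the operators and archive maintenance of the actual algorithm; this skeleton only shows the control flow and the two stopping criteria:

```python
import random
import time

def paes_2plus2(init, vary, evaluate, update_archive,
                max_epochs=None, max_seconds=None):
    """Skeleton of the (2+2)M-PAES loop with both stopping criteria.

    vary(c1, c2) applies crossover/mutation and returns (z1, z2);
    update_archive keeps the set of non-dominated solutions.
    """
    archive = [evaluate(s) for s in init()]   # two random initial solutions
    start, epoch = time.time(), 0
    while True:
        if max_epochs is not None and epoch >= max_epochs:
            break                             # epoch-based criterion (G)
        if max_seconds is not None and time.time() - start >= max_seconds:
            break                             # time-based criterion (ETtot)
        # pick c1, c2 at random; they coincide if the archive has one entry
        c1, c2 = random.choice(archive), random.choice(archive)
        z1, z2 = vary(c1, c2)
        archive = update_archive(archive, [evaluate(z1), evaluate(z2)])
        epoch += 1
    return archive
```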
[Flow-chart, Fig. 16.1. Start: choose the number Q_f of fuzzy sets for each input variable X_f and the membership function type; then compute all vectors a_{f,q}. Randomly generate the antecedent parts J of two initial solutions; for each solution, compute the indexes i_m from a_{f,q}, perform the fast estimation of the consequents p_m using formula (16.6), compute w_m, v_m, s, y, the MSE and the complexity, store the vectors w_m and v_m for possible reuse, and update the archive with the two solutions. Then, until the stopping criterion is met: randomly pick two solutions c1 and c2 from the archive (they could be the same solution when the archive size is smaller than 2); for each new solution z1 and z2, let z_old and z_new be the set of rules inherited from the parents and the set of new rules, respectively; for the rules in z_old, reuse w_m and v_m; for the rules in z_new, compute the indexes i_m from a_{f,q}, fast-estimate the consequents p_m using formula (16.6), and compute and store w_m and v_m for possible reuse; finally, for each solution, compute s, y, the MSE and the complexity.]

Fig. 16.1 Flow-chart of the (2+2)M-PAES which uses fast identification and reuse
epochs basis. Each of the four experiments has been repeated eight times, changing the seed of the random number generator, thus causing the generation of different random training and test sets and different evolutions of the MOEA.
Table 16.2 Parameters used in the (2+2)M-PAES execution for the regression problem

Parameter               Value
archive size            50
max                     10
max                     20
Mmin                    1
Mmax                    64
max                     3
max                     5
crossover probability   1
mutation probability    0.4
as a measure of accuracy. Table 16.2 summarizes the parameters used for the execution of the (2+2)M-PAES. As regards the mutation probability, the first, second and third mutation operators are applied with probabilities 0.8, 0.05 and 0.15, respectively, whenever mutation is applied. Here, there is a bias towards rule adding, since the probability of generating a solution which will be added to the archive is higher when removing than when adding rules.
16.7.1.1
In the first experiment, we have used the execution time as stopping criterion. We
have set the maximum amount of time to 1800 sec (30 min). We have repeated
the experiment for eight trials using different randomly extracted training and test
sets and different MOEA evolutions. Figures 16.2 and 16.3 show the trend of the
Pareto fronts after approximately 450, 900, 1350, and 1800 seconds on the training
set for the classical (figure on the left) and fast (figure on the right) methods for two
randomly selected trials.
For the sake of brevity, we do not show the trends for all the trials. On the other
hand, these trends are similar to the ones reported in Figs. 16.2 and 16.3. First of
all, we can observe that, independently of the trial, the fast method executes a much
higher number of epochs on equal time, thus achieving good Pareto fronts. Actually,
at each epoch considered in the figures, the approximated Pareto fronts obtained by
the fast method dominate the ones achieved by the classical approach. Further, the
MSEs of the most accurate solutions generated by the fast method are quite close to
the MSEs shown in Table 16.1, thus suggesting that the fast method achieves very
good MSEs just after 1800 seconds. On the other hand, the MSEs of the most accurate solutions generated by the classical method are quite far from the MSEs shown
in Table 16.1. Finally, we observe that the intervals of complexity of the Pareto
fronts generated by the classical method are much wider than the ones generated
by the fast method. This is quite normal since we start the execution of the (2+2)M-PAES from two solutions that contain the maximum number of rules and conditions.
[Plot omitted: legend lists epoch 42 (453.95 s), epoch 125 (900.03 s), epoch 304 (1351.58 s), epoch 468 (1803.27 s); axes Complexity vs. MSE.]

Fig. 16.2 Trends of the approximated Pareto fronts obtained within 30 minutes on the training set using classical (left) and fast (right) methods for a sample trial (regression problem)
[Plot omitted: legend lists epoch 27 (452.15 s), epoch 80 (903.45 s), epoch 218 (1351.38 s), epoch 367 (1800.95 s); axes Complexity vs. MSE.]

Fig. 16.3 Trends of the approximated Pareto fronts obtained within 30 minutes on the training set using classical (left) and fast (right) methods for another sample trial (regression problem)
During the evolutionary process, the complexity of the rules decreases, as testified by the fast method and by the second experiment on the dataset at hand. Figure 16.4 shows the final Pareto fronts achieved on the test set by the classical (figure on the left) and fast (figure on the right) methods for all eight trials. We can observe that the Pareto fronts obtained by the fast method outperform those obtained by the classical method, thus testifying to the good generalization properties of the TS systems generated by the fast method.
Table 16.3 shows the average results obtained by the classical method and by the fast method executed both without and with reuse. Here, G is the average number of epochs over the eight trials and Mtot is the average total number of rules generated and evaluated during the 30 minutes.
[Plot omitted: one Pareto front per trial (Trials 1–8); axes Complexity vs. MSE.]

Fig. 16.4 Final Pareto fronts for each trial on the test set obtained using classical (left) and fast (right) methods after 30 minutes (regression problem)
Table 16.3 Results averaged on the eight trials after 30 minutes on the regression problem

Method               G         Mtot        sec/rule   MSE^best_TR   MSE^best_TS
Classical            466.0     10,309.5    0.17475    0.16404       0.43340
Fast with no reuse   1,950.5   45,566.4    0.03777    0.08163       0.28819
Fast (with reuse)    6,066.6   115,390.4   0.01375    0.07209       0.25273
In the second experiment on the regression dataset, we have used the number of
epochs as stopping criterion. We have set the maximum number G of epochs to 2500
and have repeated the experiment for eight trials using different randomly extracted
training and test sets. Figures 16.5 and 16.6 show the trend of the Pareto fronts
after 625, 1250, 1875, and 2500 epochs on the training set for the classical (figure
on the left) and fast (figure on the right) methods and for two randomly selected
trials, respectively. We can observe that, independently of the trial, the accuracies
of the solutions in the Pareto fronts obtained by the classical and the fast methods
are quite similar. This points out that the use of the fast method does not affect the
performance of the generated TS. After the same number of epochs the solutions
in the corresponding Pareto fronts do not differ from each other considerably. Of
course, the Pareto fronts of the fast method have been obtained in a much shorter
time.
Figure 16.7 shows the final Pareto fronts achieved on the test sets of the eight
trials by the classical (figure on the left) and fast (figure on the right) methods. We
can observe that the Pareto fronts obtained by the fast method and by the classical method are comparable, thus again testifying to the good generalization properties of the TS systems generated by the fast method.
Table 16.4 shows the average results obtained by the classical method and by the fast method executed both without and with reuse. Here, ETtot is the average elapsed time after 2500 epochs. We can observe that the fast identification of the consequent parameters allows considerably reducing the average elapsed time, from 6580 seconds for the classical method to 2166 seconds for the fast method with no reuse. Further, the adoption of the reuse decreases the average elapsed time to 643 seconds. It is interesting to observe that the average MSE of the best solutions achieved by the fast method is lower than the average MSE of the best solutions generated by the classical method, thus further testifying that the techniques proposed to speed up the fitness computation do not affect the accuracy of the final solutions. Concluding, this second experiment has pointed out how speeding up the generation and evaluation of the TS systems allows reducing the execution times by approximately 90% without deteriorating the accuracy of the solutions.
[Plot omitted: legend lists epoch 625 (2095.16 s), epoch 1250 (3723.45 s), epoch 1875 (5351.75 s), epoch 2500 (7011.78 s); axes Complexity vs. MSE.]

Fig. 16.5 Approximated Pareto fronts obtained on the training set using classical (left) and fast (right) methods after 625, 1250, 1875, and 2500 epochs for a sample trial (regression problem)
[Plot omitted: legend lists epoch 625 (1508.78 s), epoch 1250 (2876.70 s), epoch 1875 (4244.63 s), epoch 2500 (5744.19 s); axes Complexity vs. MSE.]

Fig. 16.6 Approximated Pareto fronts obtained on the training set using classical (left) and fast (right) methods after 625, 1250, 1875, and 2500 epochs for another sample trial (regression problem)
where K1, K2, K3 and K4 are the Runge-Kutta coefficients. The method has been applied starting from an initial condition x[0] randomly generated within the interval [0, 1] and considering x(t) = 0 for all t < 0.

Once the 67,499 samples have been generated, according to [24] we have removed the first 10,099 data points to avoid the transient portion of the data, thus obtaining 57,400 values. From these values, we have generated a dataset of 55,000 samples of the format (x[k − 1800], x[k − 1200], x[k − 600], x[k], x[k + 600]), in order to predict x[k + 600] from the past values x[k − 1800], x[k − 1200], x[k − 600], and x[k] (2400 samples have been discarded to completely separate the test set from the training set). The first 50,000 samples have been used for training while the
[Plot omitted: one Pareto front per trial (Trials 1–8); axes Complexity vs. MSE.]

Fig. 16.7 Final Pareto fronts for each trial on the test set obtained using classical (left) and fast (right) methods after 2500 epochs (regression problem)
Table 16.4 Results averaged on the eight trials after 2500 epochs on the regression problem

Method               ETtot (sec)   Mtot     sec/rule   MSE^best_TR   MSE^best_TS
Classical            6,580         40,359   0.16306    0.1180        0.3346
Fast with no reuse   2,166         55,497   0.03904    0.0762        0.2753
Fast (with reuse)    643           55,497   0.01159    0.0762        0.2753
remaining 5,000 for test. The problem at hand can be viewed as a regression problem of the form o = f(X1, X2, X3, X4), where X1 = x[k − 1800], X2 = x[k − 1200], X3 = x[k − 600], X4 = x[k], and o = x[k + 600].
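The windowing step that turns the sampled series into (input, target) pairs of the format above can be sketched as follows. The function name and parameters are illustrative; only the index bookkeeping is shown (the series itself would come from the Runge-Kutta integration):

```python
import numpy as np

def build_forecasting_dataset(x, lag=600, n_lags=3):
    """Build (X, o) pairs of the form
    o = x[k + lag],  X = (x[k - 3*lag], x[k - 2*lag], x[k - lag], x[k]).

    x : 1-D array holding the sampled time series
    """
    first = n_lags * lag               # earliest k with all past lags available
    last = len(x) - lag                # latest k with x[k + lag] available
    rows = []
    for k in range(first, last):
        past = [x[k - i * lag] for i in range(n_lags, 0, -1)]
        rows.append(past + [x[k], x[k + lag]])
    data = np.array(rows)
    return data[:, :-1], data[:, -1]   # inputs X, target o
```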
Unlike in [24], we have added six more input variables X5, ..., X10, uniformly sampled in the domain [0, 1]^6 for the training and test sets, in order to make the dataset more difficult to deal with, thus obtaining the problem o = f(X1, ..., X10). Again, the six added input variables constitute a sort of noise with respect to the function, since the output does not depend on them. We have used seven fuzzy sets for all inputs (Q_f = 7) and uniform partitions composed of Gaussian membership functions. This dataset is similar to the one used in [24], but with some differences, namely: a higher sampling rate (1 instead of 0.01), a smaller integration step (0.01 instead of 0.1), a higher number of fuzzy sets on each input variable (7 instead of 2), and the presence of the six added fictitious input variables. As shown in the following, we have achieved good MSEs, though higher than those found in [24] by Jang using the well-known ANFIS neuro-fuzzy system with 4 inputs, 2 fuzzy sets per input, 16 rules and 64 conditions (don't care is not used therein). The reasons are the following: i) we used a bigger dataset (and thus one more difficult to deal with), having both a higher number of samples and a higher number of inputs; ii) we limit ourselves to considering systems with lower complexity (only one condition per rule instead of four and a maximum of 30 conditions in total instead of
64); iii) we do not perform membership function optimization. On the other hand, it has to be said that running the ANFIS system with membership function optimization on the problem at hand would require too much time, even to perform only one iteration step.
Above, we have explained how to generate the training and test sets for the time series forecasting problem. Thus, we are now ready to carry out the experiments. Here, we have repeated on the problem at hand the two experiments carried out on the regression dataset. Again, in the first experiment, we show how, for equal execution time, the fast method with reuse obtains Pareto fronts which dominate the ones obtained by the classical method. In the second experiment, we point out how, for an equal number of epochs, the fast method with reuse achieves Pareto fronts comparable with the ones obtained by the classical method, while saving approximately 96.5% of the time.

The objective functions used here are the same as those used for the regression problem (number of conditions and MSE). The parameters used for the execution of the (2+2)M-PAES are the same as those shown in Table 16.2, with the exception of the maximum number of rules Mmax, which has been set to 30 instead of 64, and the maximum number of conditions per rule (1 instead of 3). As regards the mutation probability, we used the same values as for the regression problem in both experiments.
16.7.2.1
Here, we repeated the first experiment carried out on the regression problem. Thus, we have used the execution time as stopping criterion and have set the maximum amount of time to 1800 sec. We have repeated the experiment eight times using different randomly extracted training and test sets (obtained by using a different starting point x[0]). Figures 16.8 and 16.9 show the trend of the Pareto fronts after approximately 450, 900, 1350, and 1800 seconds on the training set for the classical (figure on the left) and fast (figure on the right) methods for two of the eight trials, randomly selected. We can observe that, independently of the trial, the fast method executes a much higher number of epochs in the same time, thus achieving better Pareto fronts for this problem as well.

Figure 16.10 shows the final Pareto fronts achieved on the test set by the classical (figure on the left) and fast (figure on the right) methods. We can observe that the Pareto fronts obtained by the fast method outperform those obtained by the classical method, thus testifying to the good generalization properties of the TS systems generated by the fast method.
Table 16.5 shows the average results obtained by the classical method and by the fast method executed both without and with reuse. We can observe that the fast identification of the consequent parameters allows increasing the average number of rules generated and evaluated during the 30 minutes from 10,706.9 for the classical method to 38,337.3 for the fast method. Further, the adoption of the reuse increases this average number of rules to 76,670.9. The tangible result of this decrease in the time needed to generate and evaluate the rules is the increase in
[Plot omitted: legend lists epoch 154 (450.95 s), epoch 340 (900.25 s), epoch 547 (1350.66 s), epoch 788 (1800.83 s); axes Complexity vs. MSE.]

Fig. 16.8 Trends of the approximated Pareto fronts obtained within 30 minutes on the training set using classical (left) and fast (right) methods for a sample trial (time series forecasting problem)
[Plot omitted: legend lists epoch 121 (450.45 s), epoch 277 (901.70 s), epoch 494 (1352.27 s), epoch 725 (1800.02 s); axes Complexity vs. MSE.]

Fig. 16.9 Trends of the approximated Pareto fronts obtained within 30 minutes on the training set using classical (left) and fast (right) methods for another sample trial (time series forecasting problem)
accuracy achieved by the fast method, thanks to the higher number of epochs. Concluding, this experiment has pointed out how speeding up the generation and evaluation of the TS systems allows executing a larger number of epochs, thus improving the accuracy of the solutions.
16.7.2.2
[Plot omitted: one Pareto front per trial (Trials 1–8); axes Complexity vs. MSE.]

Fig. 16.10 Final Pareto fronts for each trial on the test set obtained using classical (left) and fast (right) methods after 30 minutes (time series forecasting problem)
Table 16.5 Results averaged on the eight trials after 30 minutes on the time series forecasting problem

Method               G         Mtot       sec/rule
Classical            837.3     10,706.9   0.16503
Fast with no reuse   2,638.4   38,337.3   0.04387
Fast (with reuse)    5,265.3   76,670.9   0.02076
[Plot omitted: legend lists epoch 625 (500.74 s), epoch 1250 (995.08 s), epoch 1875 (1482.19 s), epoch 2500 (2155.75 s); axes Complexity vs. MSE.]

Fig. 16.11 Approximated Pareto fronts obtained on the training set using classical (left) and fast (right) methods after 625, 1250, 1875, and 2500 epochs for a sample trial (time series forecasting problem)
[Plot omitted: legend lists epoch 625 (613.75 s), epoch 1250 (1222.92 s), epoch 1875 (1842.67 s), epoch 2500 (2495.66 s); axes Complexity vs. MSE.]

Fig. 16.12 Approximated Pareto fronts obtained on the training set using classical (left) and fast (right) methods after 625, 1250, 1875, and 2500 epochs for another sample trial (time series forecasting problem)
[Plot omitted: one Pareto front per trial (Trials 1–8); axes Complexity vs. MSE.]

Fig. 16.13 Final Pareto fronts for each trial on the test set obtained using classical (left) and fast (right) methods after 2500 epochs (time series forecasting problem)
Table 16.6 Results averaged on the eight trials after 2500 epochs on the time series forecasting problem

Method               ETtot (sec)   Mtot       sec/rule
Classical            2,562.7       15,837.5   0.16181
Fast with no reuse   1,682.0       38,337.3   0.04387
Fast (with reuse)    89.1          17,286.6   0.00515
experiment for eight trials using different randomly extracted training and test sets
and different MOEA evolutions. Figures 16.11 and 16.12 show the trend of the
Pareto fronts after 625, 1250, 1875, and 2500 epochs on the training set for the classical (figure on the left) and fast (figure on the right) methods and for two out of the
eight trials.
In Fig. 16.13 (which provides the final Pareto fronts achieved on the test set for each of the eight trials by the classical and fast methods), we can observe that, even if the classical method seems slightly better, we can consider the average Pareto fronts approximately equivalent. Of course, the Pareto fronts of the fast method have been obtained in a much shorter time in this case as well.
Table 16.6 shows the average results obtained by the classical method and by the fast method executed both without and with reuse. We can observe that the fast identification of the consequent parameters allows considerably reducing the average elapsed time, from 2562.7 seconds for the classical method to 1682.0 seconds for the fast method with no reuse. Moreover, the adoption of the reuse further decreases the average elapsed time to 89.1 seconds. Thus, using the fast method we have been able to save approximately 96.5% of the time. It is interesting to observe that the average MSE of the best solutions achieved by the fast method is lower than the average MSE of the best solutions generated by the classical method on the training set, while they are comparable on the test set.

Even for the time series forecasting problem, we can conclude that speeding up the generation and evaluation of the TS systems allows reducing the execution times without significantly deteriorating the accuracy of the solutions.
16.8 Conclusions
In this chapter, we have shown a possible roadmap towards the efficient design of multi-objective genetic Takagi-Sugeno fuzzy systems for high-dimensional problems. We have proposed a method to speed up the identification of the consequent parameters of the TS rules. This method produces, as a side effect, a decoupling of the rules. Thus, during the evolutionary process, possible modifications in a rule do not affect the other rules, and therefore we can avoid re-estimating the parameters of all the rules which are not modified. Exploiting this observation, we have discussed how, by simply storing and reusing previously computed parameters, we can further speed up the evolutionary process. In the experimental part, we have shown the advantages of applying the efficient approach proposed in this chapter on both regression and time series forecasting problems. Results have highlighted that, on average, the approach allowed saving approximately 90% and 96.5% of the execution times, respectively. For equal execution times, the proposed approach performs a significantly higher number of epochs and thus better explores the search space, providing better Pareto fronts.
References
1. Angelov, P.P., Filev, D.P.: An approach to online identification of Takagi-Sugeno fuzzy models. IEEE Trans. on Systems, Man and Cybernetics, Part B: Cybernetics 34(1), 484–498 (2004)
2. Babuska, R.: Fuzzy modeling for control. Kluwer Academic Publishers, Boston (1998)
3. Botta, A., Lazzerini, B., Marcelloni, F.: Context adaptation of Mamdani fuzzy rule-based systems. International Journal of Intelligent Systems 23(4), 397–418 (2008)
4. Botta, A., Lazzerini, B., Marcelloni, F., Stefanescu, D.C.: Context adaptation of fuzzy systems through a multi-objective evolutionary approach based on a novel interpretability index. Soft Computing 13(5), 437–449 (2009)
5. Branke, J., Schmeck, H., Deb, K.: Parallelizing multi-objective evolutionary algorithms: Cone separation. In: Proc. of the IEEE Congress on Evolutionary Computation 2004 (CEC 2004), Portland, Oregon, USA, June 19-23, pp. 1952–1957 (2004)
6. Bui, L.T., Abbass, H.A., Essam, D.: Fitness inheritance for noisy evolutionary multi-objective optimization. In: Proc. of the 2005 Conference on Genetic and Evolutionary Computation, Washington, D.C., USA, June 25-29, pp. 779–785 (2005)
7. Chen, C.-H., Hong, T.-P., Tseng, V.S., Chen, L.-C.: A multi-objective genetic-fuzzy data mining algorithm. In: Proc. of the IEEE International Conference on Granular Computing, Hangzhou, China, August 26-28, pp. 115–120 (2008)
8. Cococcioni, M., Corsini, G., Lazzerini, B., Marcelloni, F.: Approaching the ocean color problem using fuzzy rules. IEEE Trans. on Systems, Man and Cybernetics, Part B: Cybernetics 34(3), 1360–1373 (2004)
9. Cococcioni, M., Corsini, G., Lazzerini, B., Marcelloni, F.: Solving the ocean color inverse problem by using evolutionary multi-objective optimization of neuro-fuzzy systems. International Journal of Knowledge-Based and Intelligent Engineering Systems 12(5-6), 339–355 (2008)
10. Cococcioni, M., Ducange, P., Lazzerini, B., Marcelloni, F.: A Pareto-based multi-objective evolutionary approach to the identification of Mamdani fuzzy systems. Soft Computing 11(11), 1013–1031 (2007)
11. Cococcioni, M., Lazzerini, B., Marcelloni, F.: Fast multiobjective genetic rule learning using an efficient method for Takagi-Sugeno fuzzy systems identification. In: Proc. of the 8th Int. Conference on Hybrid Intelligent Systems (HIS 2008), Barcelona, Spain, pp. 272–277 (2008)
12. Cococcioni, M.: The Evolutionary Multiobjective Optimization of Fuzzy Rule-Based Systems Bibliography Page (2009), http://www2.ing.unipi.it/g000502/emofrbss.html
13. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation 9, 251–280 (1990)
14. Ducheyne, E.I., De Baets, B., De Wulf, R.R.: Fitness inheritance in multiple objective evolutionary algorithms: A test bench and real-world evaluation. Applied Soft Computing 8, 337–349 (2008)
15. Getreuer, P.: Writing fast Matlab code (2009), http://www.mathworks.com/matlabcentral/fileexchange/5685
16. Herrera, F.: Genetic fuzzy systems: Taxonomy, current research trends and prospects. Evolutionary Intelligence 1, 27–46 (2008)
17. Huang, X., Pan, V.Y.: Fast rectangular matrix multiplication and applications. Journal of Complexity 14, 257–299 (1998)
18. Ishibuchi, H.: Multiobjective genetic fuzzy systems: review and future research directions. In: Proc. of FUZZ-IEEE 2007, London, UK, July 23-26, pp. 1–6 (2007)
19. Ishibuchi, H., Murata, T., Turksen, I.B.: Selecting linguistic classification rules by two-objective genetic algorithms. In: Proc. of the 1995 IEEE International Conference on Systems, Man and Cybernetics, Vancouver, BC, Canada, vol. 2, pp. 1410–1415 (1995)
20. Ishibuchi, H., Murata, T., Turksen, I.B.: Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems. Fuzzy Sets and Systems 89(2), 135–150 (1997)
21. Ishibuchi, H., Nakashima, T., Murata, T.: Three-objective genetics-based machine learning for linguistic rule extraction. Information Sciences 136(1-4), 109–133 (2001)
22. Ishibuchi, H., Nojima, Y.: Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. International Journal of Approximate Reasoning 44(1), 4–31 (2007)
23. Ishibuchi, H., Yamamoto, T.: Interpretability issues in fuzzy genetics-based machine learning for linguistic modelling. In: Lawry, J., Shanahan, J.G., Ralescu, A.L. (eds.) Modelling with Words. LNCS, vol. 2873, pp. 209–228. Springer, Heidelberg (2003)
24. Jang, J.-S.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. on Systems, Man and Cybernetics 23(3), 665–685 (1993)
25. Jin, Y.: Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement. IEEE Trans. on Fuzzy Systems 8(2), 212–223 (2000)
26. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9, 3–12 (2005)
27. Knowles, J.D.: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Trans. on Evolutionary Computation 10(1), 50–66 (2006)
28. Knowles, J.D., Corne, D.W.: Approximating the nondominated front using the Pareto archived evolution strategy. Evolutionary Computation 8(2), 149–172 (2000)
29. Mackey, M.C., Glass, L.: Oscillation and chaos in physiological control systems. Science 197, 287–289 (1977)
30. Nauck, D.D.: GNU Fuzzy. In: Proc. of FUZZ-IEEE 2007, London, UK, pp. 1–6 (2007)
31. Schraudolph, N.N.: A fast, compact approximation of the exponential function. Neural Computation 11, 853–862 (1999)
32. Soukkou, A., Khellaf, A., Leulmi, S.: Multiobjective optimisation of robust Takagi-Sugeno fuzzy neural controller with hybrid learning algorithm. Int. Journal of Modelling, Identification and Control 2(4), 332–346 (2007)
33. Streichert, F., Ulmer, H., Zell, A.: Parallelization of Multi-objective Evolutionary Algorithms Using Clustering Algorithms. In: Coello Coello, C.A., Hernandez Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 92–107. Springer, Heidelberg (2005)
34. Tan, K.C., Yang, Y.J., Goh, C.K.: A distributed cooperative coevolutionary algorithm for multiobjective optimization. IEEE Trans. on Evolutionary Computation 10(5), 527–549 (2006)
35. Van Veldhuizen, D.A., Zydallis, J.B., Lamont, G.B.: Considerations in engineering parallel multiobjective evolutionary algorithms. IEEE Trans. on Evolutionary Computation 7(2), 144–173 (2003)
36. Yen, J., Gillespie, L.W., Gillespie, C.W.: Improving the interpretability of TSK fuzzy
models by combining global learning and local learning. IEEE Trans. on Fuzzy
Syst. 6(4), 530537 (1998)
Chapter 17
Abstract. In many real-world network problems several objectives have to be optimized simultaneously. To solve such problems it is often appropriate to use the multi-criterion minimum spanning tree (MCMST) model, a combinatorial optimization problem that has been shown to be NP-Hard. No polynomial-time algorithm is known that finds the Pareto front for all instances of the MCMST problem, so researchers have developed deterministic and evolutionary algorithms. However, these exhibit a number of shortcomings, such as lack of scalability and large CPU times. The hybridised Knowledge-based Evolutionary Algorithm (KEA) has therefore been proposed; it avoids the limitations of previous algorithms through its speed, its scalability to more than 500 nodes in the bi-criterion case and to the multi-criterion case, and its ability to find both the supported and the non-supported optimal solutions. KEA is faster and more efficient than NSGA-II in terms of spread and number of solutions found. The only weakness of KEA is the dominated middle of its Pareto front. To overcome this deficiency, a number of modifications have been tested, including KEA-M, KEA-G and KEA-W. Experimental results show that when time is expensive KEA is preferable to all other algorithms tested.
Madeleine Davis-Moradkhan
School of Systems Engineering, University of Reading, Pepper Lane, Reading RG6 6AY, UK
e-mail: mdmtig@yahoo.co.uk

Will Browne
Senior Lecturer, Victoria University of Wellington, New Zealand
e-mail: will.browne@ecs.vuw.ac.nz

Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 423–452.
© Springer-Verlag Berlin Heidelberg 2010, springerlink.com

17.1 Introduction

The multi-criterion minimum spanning tree (MCMST) problem is a combinatorial optimization problem that has attracted attention in recent years due to its applications in many real-world problems, in particular in designing networks (computer, communication, electrical, pipeline, etc.). This problem is also interesting for theoretical reasons, as it has been shown to be NP-Hard [1]. Existing algorithms applied
to this problem exhibit a number of shortcomings such as lack of scalability and
large CPU times. A 500-node MCMST problem has a search space of 500^498 feasible spanning trees (by Cayley's formula, a complete graph on n nodes has n^(n-2) spanning trees), which accounts for the difficulty of finding good solutions in a practical time.
\[
\min z_1(x) = \sum_{j=1}^{m} c_{1j} x_j, \quad
\min z_2(x) = \sum_{j=1}^{m} c_{2j} x_j, \quad
\ldots, \quad
\min z_p(x) = \sum_{j=1}^{m} c_{pj} x_j,
\qquad \text{subject to } x \in X.
\]
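For instances small enough to enumerate, this formulation can be attacked directly: generate every spanning tree, evaluate each objective, and keep the non-dominated cost vectors. The following Python sketch illustrates the idea for a bi-criterion instance; the function name and the (u, v, c1, c2) edge encoding are our own, and this brute-force approach is, as noted later in the chapter, only practical for very small graphs:

```python
from itertools import combinations

def pareto_spanning_trees(n, edges):
    """Enumerate all spanning trees of a small graph and keep the
    non-dominated ones under two additive edge costs.
    `edges` is a list of (u, v, c1, c2) tuples; nodes are 0..n-1."""
    def is_spanning_tree(subset):
        # union-find: n-1 acyclic edges on n nodes form a spanning tree
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        for u, v, _, _ in subset:
            ru, rv = find(u), find(v)
            if ru == rv:          # cycle -> not a tree
                return False
            parent[ru] = rv
        return True

    trees = []
    for subset in combinations(edges, n - 1):
        if is_spanning_tree(subset):
            z1 = sum(e[2] for e in subset)
            z2 = sum(e[3] for e in subset)
            trees.append((z1, z2, subset))

    # keep only the Pareto-optimal cost vectors
    front = [t for t in trees
             if not any(o[0] <= t[0] and o[1] <= t[1] and
                        (o[0] < t[0] or o[1] < t[1]) for o in trees)]
    return sorted({(z1, z2) for z1, z2, _ in front})
```

On a toy triangle graph, `pareto_spanning_trees(3, [(0, 1, 1, 3), (1, 2, 2, 2), (0, 2, 3, 1)])` returns the three mutually non-dominated cost vectors of its three spanning trees.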
Figure 17.1 shows the weighted graph G on five nodes and two of its spanning trees, each minimizing one of the criteria. The numbers in brackets indicate the values of the two costs (criteria) corresponding to each edge of G.
Fig. 17.1 Graph G and two of its spanning trees, which are the extreme points of the Pareto
front, each minimizing one of the criteria
there does not exist another solution $x$ such that $z_i(x) \le z_i(x')$ for all $i = 1, \ldots, p$, and $z_k(x) < z_k(x')$ for at least one $k$.
The set of all Pareto optimal solutions is called the efficient set or the Pareto front
(PF) [9]. There are two types of efficient solution [42], as shown in Figure 17.2.
Formally the supported efficient solutions are minima for a convex combination of
criteria. The non-supported efficient solutions cannot be computed by minimizing
a convex combination of criteria, whatever the considered strictly positive weights
[14]. The supported efficient solutions are situated on the convex hull of the feasible region, and are relatively easy to find. The non-supported efficient solutions, situated in the segment formed by two consecutive supported solutions and the corresponding local nadir point, are difficult to discover, and most algorithms do not find them. This is the case in the weighted-sum methods, where the optimal solutions found by varying the weights associated with each criterion are all supported efficient solutions.
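The weighted-sum limitation is easy to see in code: scalarizing the two costs with a weight and running a standard single-objective MST algorithm (Kruskal's, here) can only ever return solutions on the convex hull of the front. A minimal Python sketch, with function name and toy graph of our own:

```python
def weighted_sum_mst(n, edges, lam):
    """Kruskal's algorithm on the scalarized edge cost
    lam*c1 + (1-lam)*c2.  Edges are (u, v, c1, c2) tuples."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    z1 = z2 = 0
    for u, v, c1, c2 in sorted(edges, key=lambda e: lam * e[2] + (1 - lam) * e[3]):
        ru, rv = find(u), find(v)
        if ru != rv:                    # edge joins two components
            parent[ru] = rv
            z1, z2 = z1 + c1, z2 + c2
    return (z1, z2)

# Sweeping the weight over (0, 1) traces out supported efficient points only;
# non-supported points in the "dents" of the front are never produced.
edges = [(0, 1, 1, 3), (1, 2, 2, 2), (0, 2, 3, 1)]
supported = {weighted_sum_mst(3, edges, k / 10) for k in range(1, 10)}
```

On this toy graph, the sweep collects only the two extreme cost vectors, no matter how finely the weights are varied.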
Fig. 17.2 Points A, B, C, D and E are the supported efficient solutions. H is the local nadir
point corresponding to D and E. All the points situated inside a segment, for example segment
DEH, are non-supported efficient solutions. Thus, points F and G are non-supported efficient
solutions
No polynomial time algorithm is known to find the PF for all instances of the
MCMST problem. For small complete graphs (n ≤ 10) exhaustive search has been
used to find the true PF [25]. However, the exhaustive search is impractical with
the present speed of computers for complete graphs having more than 10 or 11
nodes. Authors have therefore turned to deterministic heuristics and evolutionary
algorithms that succeed only in calculating an approximate Pareto set (APS). Most
of the existing algorithms are applicable to the bi-objective case and have been tested
using simple or planar graphs with few edges.
A number of deterministic algorithms for the bi-criterion case have been proposed for use in decision aid or for validating evolutionary algorithms, all of which
are restricted to two criteria, have been tested on small instances and have not been
validated with an exhaustive search. Hamacher and Ruhe [21] are among the pioneers to propose an approximate algorithm for the bi-criterion MST that consists of
two phases. Although their algorithm has been criticized as inefficient and incapable
of producing the true PF [42], their 2-phase methodology has been adopted successfully by many researchers not only for the MCMST problem, but also in other
problems such as the Traveling Salesman Problem [39], the Quadratic Assignment
Problem [23] and the max-ordering problem [15]. Other 2-phase heuristics have
been proposed for the MCMST problem by Anderson et al. [1], Ramos et al. [42],
Steiner and Radzik [47].
In order to measure the efficacy of their evolutionary algorithm, some authors
have proposed simple deterministic heuristics, based on Kruskal's algorithm [31] or on Prim's algorithm [40]. These algorithms, applicable to two criteria, include the
Enumeration method of Zhou and Gen [50], which was shown to be incorrect [25],
the mc-Prim proposed by Knowles [25], and the mc-Kruskal by Arroyo et al. [2]. In
both mc-Prim and mc-Kruskal a parameter controls the number of solutions that will
be calculated; thus they will not produce all the supported Pareto optimal solutions
if their number exceeds this parameter.
Recently the Extreme Point Deterministic Algorithm (EPDA) has been proposed
[10] that improves upon previous algorithms as it finds both supported and nonsupported efficient solutions for more than two criteria. EPDA is validated against
the exhaustive search algorithm EXS, based on the method proposed by Christofides
[8], using benchmark instances generated by algorithms suggested by Knowles [25]
which consist of complete graphs having three different cost functions.
When the number of nodes is large, deterministic algorithms are slow to converge and become impractical; probabilistic algorithms can then be used to find a near-optimal solution in less time. The few evolutionary algorithms proposed in the literature have been tested on small instances with two criteria, and the majority find only the supported efficient solutions.
Zhou and Gen [50] appear to have been the first to suggest applying the first version of the Non-dominated Sorting Genetic Algorithm (NSGA) [46] to the bi-criterion MST problem. This algorithm was criticized [25] for failing to calculate non-dominated solutions.
The first Evolution Strategy (ES) for the bi-criterion MST, called the Archived Elitist Steady-State EA (AESSEA), was proposed by Knowles and Corne [28] and Knowles [25]; it is a (μ + 1)-ES and a population-based variant of the Pareto-Archived ES (PAES) [27].
Bosman and Thierens [5] have proposed and tested the Multi-objective Mixture-based Iterated Density Estimation Evolutionary Algorithm (MIDEA) in two different multi-objective problem domains: real-valued continuous and binary combinatorial problems. The second domain includes the MCMST problem. Experimental
results, presented in [5] and [6], comparing MIDEA with Non-dominated Sorting Genetic Algorithm (NSGA-II) [13] and Strength Pareto Evolutionary Algorithm
(SPEA) [48] show that MIDEA is at least comparable with the other two. NSGA-II
is shown to be the most competitive in terms of front occupation and the Average
Front Distance, whereas SPEA has a better spread than NSGA-II.
An algorithm applying the Greedy Randomized Adaptive Search Procedure (mc-GRASP) has been suggested by Arroyo et al. [2], who argue that the randomization
helps to obtain non-supported efficient solutions as well as supported ones.
More recently, other genetic algorithms have been proposed by Han and Wang
[22] and by Chen et al. [7], which have been tested on small graphs in two criteria.
Guo et al. [20] have presented a particle swarm algorithm that is expected to work
for more than two criteria, but no results have been reported for the multi-criterion
case.
Evolutionary algorithms [29], [30] and particle swarm algorithms [19] have been
suggested for the bi-objective degree-constrained MCMST problem. Moreover, special types of bi-objective MST problems have been addressed [33], [36], where a
constraint in a single objective MST has been transformed into a second objective.
In order to overcome the limitations of previous algorithms, the novel, fast and scalable Knowledge-based Evolutionary Algorithm (KEA) is proposed. Unlike its predecessors, KEA is designed to achieve all of the following:

- to be applicable to more than two criteria;
- to calculate both the supported and the non-supported Pareto optimal solutions;
- to be fast, so that it can be used for large graphs with more than 500 nodes.

The main features of KEA include:

- The application of deterministic approaches to calculate the extreme points of the Pareto front. These are used to produce the initial population, comprising an elite set of parents.
- An elitist evolutionary search that attempts to find the remaining Pareto optimal points by applying a knowledge-based mutation operator. The domain knowledge is based on the k-best approaches in deterministic methods.
- Marking schemes that reduce the re-evaluation of solutions, and cut-off points that eliminate the dominated regions of the search space.
Experimental results are obtained from hard benchmark instances of the problem
that are generated using the algorithms proposed by Knowles [25] for complete
graphs. KEA is verified and validated against the exhaustive search algorithm EXS,
based on the method suggested by Christofides [8]. Comparative results with an
adapted version of Non-dominated Sorting Genetic Algorithm (NSGA-II) [13] and
with the Extreme Point Deterministic Algorithm (EPDA) [10] are reported. It is
shown that the strength and superiority of KEA is due to the domain knowledge
that is autonomously incorporated, making it efficient, fast and scalable.
Because of its speed and efficiency, KEA has much potential for rendering the
MCMST model applicable to real world problems arising in diverse systems:
1. Analysis and design of various networks [3], for example, large scale telecommunication networks [16], distributed computing networks [38], spatial networks
for commodity distribution (gas pipelines or train tracks) [18] and processor networks [32].
2. Hardware design [17], for example, connectionist architectures for analog and
VLSI circuits [37].
3. Data collection and information retrieval, for example, file mirroring/transfer,
bit-compression for information retrieval and minimizing message passing [4].
graphs: m = n(n−1)/2, and p is the number of criteria. Eps is the domain-dependent precision parameter: two cost values are considered equal if their absolute difference is less than Eps. The size of the PF is controlled by varying the value of Eps. The crossover and mutation probabilities are equal to zero and one, respectively. Finally, g is the number of generations.
The Total Costs of the new ST are compared with the Total Costs of the non-dominated STs on the APS. If the new ST is not dominated, it is added to the APS. If the new ST dominates one or more STs on the APS, then those are discarded from it. Once all the possible edges (r, s) have been found to replace the edge (u, v) and the corresponding STs are created, MSTi is restored by inserting the edge (u, v). Then another edge of MSTi is considered for replacement. The procedure is repeated until all edges of MSTi have been mutated one by one.
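The edge-exchange step just described can be sketched as follows: deleting a tree edge splits the spanning tree into two components, and every graph edge that reconnects them yields a neighbouring spanning tree whose costs would then be tested against the archive. This is an illustrative reconstruction under our own function name and edge encoding, not the authors' code:

```python
def edge_exchange_mutations(n, tree_edges, all_edges):
    """For each edge of the spanning tree: remove it, find the component
    of node 0 in the broken tree, and reconnect with every alternative
    graph edge crossing the cut.  Each exchange yields a neighbouring
    spanning tree.  Edges are (u, v, c1, c2) tuples; nodes are 0..n-1."""
    neighbours = []
    for removed in tree_edges:
        rest = [e for e in tree_edges if e != removed]
        adj = {}
        for u, v, *_ in rest:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
        comp, stack = {0}, [0]          # flood-fill one side of the cut
        while stack:
            x = stack.pop()
            for y in adj.get(x, []):
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        for cand in all_edges:
            # an edge with exactly one endpoint in `comp` crosses the cut
            if cand != removed and ((cand[0] in comp) != (cand[1] in comp)):
                neighbours.append(rest + [cand])   # a new spanning tree
    return neighbours
```

Each returned edge set is itself a spanning tree, so its total cost vector can be compared directly against the APS as in the procedure above.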
The advantage of KEA constructing the APS from all the extreme points is threefold:
a) Genotypic neighbours of the extreme MSTs are more likely to be non-dominated
than other solutions. Therefore, they form an elite initial population superior to a
randomly generated population.
%FO = (FO / TTC) × 100 shows the efficiency of an algorithm. An efficient algorithm will derive its APS with few calculations, whilst avoiding the infeasible region, resulting in a large %FO. Effectiveness is demonstrated by testing with the three cost functions and increasing problem sizes.
Parameter tuning tests were performed with benchmark complete graphs having
20, 30, 50 and 100 nodes in order to determine the best values for decision parameters. These tests were performed with two different values for the precision
Table 17.2 (100 nodes) Mean and standard deviation of the S-measure, the CPU times in seconds and the values of other parameters, averaged over 10 runs of KEA and NSGA2 on graphs with 100 nodes and three different cost types. g = number of generations, P = population size for NSGA2. TTC measures the total number of STs created, of which a number equal to the front occupation, FO, are placed in the APS. The values of %FO show the non-dominated portion of the TTC. Standard deviations are given in parentheses.

| Parameter            | Correlated Costs | Anti-Correlated Costs | Random Costs     |
|----------------------|------------------|-----------------------|------------------|
| Eps                  | 0.0001           | 0.0001                | 0.0001           |
| g KEA                | 3 K              | 20 K                  | 20 K             |
| g, P NSGA2           | 1000, 1.5 K      | 500, 5 K              | 500, 5 K         |
| S-value KEA          | 25.1 (0.1)       | 2,051 (2.5)           | 7,705 K (17,184) |
| S-value NSGA2        | 17.7 (1.1)       | 1,878.7 (7.9)         | 7,07 K (26,700)  |
| Mann-Whitney z-value | -3.74            | -3.74                 | -4.35            |
| Significance level   | > 99%            | > 99%                 | > 99%            |
| CPU sec. EPDA        | 78.2             | 2,407                 | 2,021            |
| CPU sec. KEA         | 37.1 (3)         | 698.3 (59.8)          | 515.5 (51.9)     |
| CPU sec. NSGA2       | 1,197 (21.9)     | 9,079.8 (52.4)        | 8,852.9 (33.6)   |
| TTC EPDA             | 412 K            | 5,696 K               | 4,643            |
| TTC KEA              | 383 K (31 K)     | 3,759 K (350 K)       | 3,063 (199 K)    |
| TTC NSGA2            | 1,942 K (55 K)   | 2,938 K (44 K)        | 21,837 K (19 K)  |
| FO EPDA              | 1,339            | 4,453                 | 4,805            |
| FO KEA               | 1,315 (61.2)     | 4,255 (124.1)         | 4,430 (139.8)    |
| FO NSGA2             | 501 (74.3)       | 1,550 (308.3)         | 1,860 (208.2)    |
| %FO EPDA             | 0.32%            | 0.08%                 | 0.10%            |
| %FO KEA              | 0.34% (0.20)     | 0.11% (0.04)          | 0.14% (0.07)     |
| %FO NSGA2            | 0.03% (0.13)     | 0.05% (0.69)          | 0.01% (1.1)      |
Table 17.3 Median and inter-quartile range (IQR) values of the additive epsilon indicator obtained after 10 evaluations of KEA and NSGA2 on graphs with 100 nodes and different cost types. Lower values indicate better performance. Eps = 0.0001. All z-values are significant at the > 99% level.

| Cost Type       | I(KEA) Median | I(KEA) IQR | I(NSGA2) Median | I(NSGA2) IQR | z-value |
|-----------------|---------------|------------|-----------------|--------------|---------|
| Anti-Correlated | 2.19          | 0.17       | 3.40            | 0.18         | -12.22  |
| Correlated      | -0.36         | 0.17       | 1.18            | 0.35         | -12.22  |
| Random          | 154.40        | 43.24      | 301.23          | 12.20        | -12.22  |
Figure 17.4 compares the best attainment surface plots obtained by the three algorithms in the case of random costs. Since the median and the worst attainment surfaces are close to the best, they are not plotted, for clarity. Figure 17.4 shows that the middle section of the surface plot of KEA is dominated by EPDA, and that NSGA2 mostly dominates the middle portion of KEA and is almost superimposed on the middle section of EPDA. Nonetheless, NSGA2 and KEA are incomparable
Fig. 17.4 Best attainment surface plots by KEA and NSGA2 compared with the attainment surface of EPDA for a graph with 100 nodes and random costs (NSGA2: P = 5,000, g = 500, crossover-prob. = 0.9, mutation-prob. = 0.01; KEA: g = 20,000). EPDA FO = 4,805, KEA FO = 4,317, NSGA2 FO = 2,030, Eps = 0.0001
in terms of the I-indicator. The attainment surface plots showed that in all instances
NSGA2 converges to a small area in the middle of the objective space, due to the
way NSGA2 explores the decision space (similar to Knowles [26]).
In practical situations with a short time limit, KEA dominates both EPDA and
NSGA2. To illustrate this point, the attainment surface plots for the random cost
instance are shown in Figure 17.5, where the running times of EPDA and NSGA2
have been limited to that of KEA. In this figure KEA dominates both algorithms
in the middle section of the APS, which is not the case for an unlimited execution
time. Where time is not a critical issue and the middle sections are important, neither
KEA nor NSGA2 is recommended as EPDA dominates both.
Fig. 17.5 Attainment surface plots for a graph with 100 nodes and random costs, where the execution time has been limited to that of KEA. Approximate CPU = 515.5 sec. (KEA: g = 20,000), resolution = 60, total no. of test points = 120 (NSGA2: P = 5,000, crossover-prob. = 0.9, mutation-prob. = 0.01). EPDA FO = 2,782, approximate KEA FO = 4,317, NSGA2 FO = 21, Eps = 0.0001
Fig. 17.6 Plots showing CPU times of EPDA, KEA and NSGA2 vs. no. of nodes of graphs with correlated costs, Eps = 0.01
Fig. 17.7 Plots showing CPU times of EPDA, KEA and NSGA2 vs. no. of nodes of graphs with anti-correlated costs, Eps = 0.01
Fig. 17.8 Plots showing CPU times of EPDA, KEA and NSGA2 vs. no. of nodes of graphs with random costs, Eps = 0.01
are common to all the algorithms (EXS, EPDA, KEA and NSGA2). Therefore their
experimental complexities can be compared on the basis of the CPU times.
Fig. 17.9 Attainment surface plots for a graph with 200 nodes and random costs (KEA: g = 60,000), resolution = 60. The middle of the attainment surface of KEA is dominated by that of EPDA, while the tails are almost identical. Eps = 0.01
it was not tested on these graphs. In the case of graphs with 150 nodes, the S-measures of KEA were very close to EPDA's. This suggests that the APSs of the two algorithms were very close, although the positive I-indicators mean that they were incomparable. However, since the I(EPDA) indicators were smaller, EPDA dominated KEA in a weak sense. This weak dominance of EPDA over KEA is due to the dominated middle part of KEA's front. The Mann-Whitney rank sum test indicated that the I values are significantly different in all instances at the 99% level of confidence.
In some 200-node instances, KEA obtained better values for the S-measure than EPDA. Moreover, although both I-indicators were positive, KEA was better in a weak sense as its I-indicator was smaller. In the case of random costs with 200 nodes, the S-measure of KEA was close to EPDA's, suggesting that the APS discovered by KEA was relatively close to the APS of EPDA. The strength of KEA lies in the fact that it was able to find this APS in a third of the time taken by EPDA.
The best attainment surface for KEA is compared with EPDA in Figure 17.9,
which confirms that the APSs calculated by the two algorithms are very close except
for the middle section.
Table 17.4 (Three criteria) Values of the S and I indicators and the CPU times in seconds, averaged over 10 runs of KEA compared with EPDA on graphs with 10, 50 and 100 nodes and random costs in three criteria. Standard deviations are given in parentheses.

| Parameter                  | 10 Nodes       | 50 Nodes         | 100 Nodes           |
|----------------------------|----------------|------------------|---------------------|
| Eps                        | 0.0001         | 9                | 9                   |
| g KEA                      | 1 K            | 10 K             | 30 K                |
| S-value EPDA               | 9 M            | 3.296 M          | 22,575 M            |
| S-value KEA                | 9.5 M (198 K)  | 2,988 M (135 M)  | 24,088 M (1,185 M)  |
| I(EPDA, KEA) Median (IQR)  | 0 (0)          | 69.57 (26.13)    | 170.75 (68.81)      |
| I(KEA, EPDA) Median (IQR)  | 2.66 (5.96)    | 82.03 (28.81)    | 139.67 (36.85)      |
| Mann-Whitney z-value       | -3.36          | -1.85            | -3.74               |
| Significance level         | > 99%          | 95%              | > 99%               |
| CPU sec. EPDA              | 2.9            | 933              | 86 K                |
| CPU sec. KEA               | 0.65 (0.08)    | 887.2 (51.79)    | 23 K (1,797.1)      |
| TTC EPDA                   | 25 K           | 14,807 K         | 342,811 K           |
| TTC KEA                    | 23 K (885.8)   | 13,466 K (531 K) | 205,800 K (8,657 K) |
| FO EPDA                    | 628            | 1,743            | 4,265               |
| FO KEA                     | 624.5 (3.2)    | 1,700.1 (55.6)   | 3,400.9 (113.9)     |
| %FO EPDA                   | 2.54%          | 0.01%            | 0.001%              |
| %FO KEA                    | 2.73% (0.36%)  | 0.01% (0.01)     | 0.002% (0.001%)     |
100 nodes are compared with those obtained by EPDA in Table 17.4. It can be seen that KEA is much faster than EPDA, in particular for n = 100. The %FO shows that the computational effort of the two algorithms is almost identical. In the tri-criterion case, there is no clear recommendation for the more suitable algorithm, as the I-indicator is positive in all cases. The S-measure suggests that KEA is better for the 10- and 100-node instances.
The attainment surface obtained by EPDA and the best attainment surface obtained by KEA for the graph with 100 nodes are shown in Figure 17.10. To aid
visualization, the resolution is reduced so that only 84 points are shown on the attainment surfaces. Similar to the bi-criterion case, the APS determined by KEA and
EPDA are close, but EPDA partly dominates KEA.
Fig. 17.10 Attainment surface plot obtained by EPDA and the best attainment surface plot obtained by KEA for a graph with 100 nodes, random costs and three criteria (KEA: g = 30,000), resolution = 14 (only 84 points are projected for clarity), Eps = 9
Fig. 17.11 The approximate Pareto set obtained by KEA-M after 20,000 generations for a graph with random costs and 100 nodes, which shows two gaps and still-dominated solutions, Eps = 0.0001
as seen in Figure 17.11, these solutions are unable to eliminate all the dominated
solutions and leave gaps in the final APS.
In KEA-G all the supported solutions are found and placed on the APS along
with the extreme points in the first phase; and then the other phases are performed
as in KEA. However, this algorithm does not perform well in verification tests as
it is unable to calculate the complete APS. The reason for this failure is that the
supported solutions dominate and consequently eliminate many of the intermediate
solutions whose non-dominated offspring are never produced.
Finally, KEA-W [11] is proposed, which includes the search for non-dominated solutions from the middle point of the APS as well as the extreme points. The inclusion of the middle point, found by the geometric method of Ramos et al. [42], ensures that the middle section of the APS is not dominated. Verification tests against EXS demonstrated that KEA-W is capable of finding all the optimal solutions in graphs with four to ten nodes in all the instances tested. In particular, KEA-W found the solution that was missed by both EPDA and KEA in the case of the graph with random costs and ten nodes.
Further experiments were carried out with graphs having 20, 30, 50 and 100 nodes and three different cost types, comparing KEA-W with EPDA, KEA and NSGA2. Table 17.5 shows the statistical results for KEA-W averaged over ten runs, compared with those obtained by KEA, NSGA2 and EPDA. KEA-W has a larger S-value in all instances compared with KEA and NSGA2, whilst the S-values obtained
Table 17.5 (100 nodes, bi-criterion) Mean and standard deviation of the S-measure and other parameters and the CPU times in seconds, averaged over 10 runs of KEA-W compared to those obtained by KEA, EPDA and NSGA2 on graphs with 100 nodes and three different cost types. Standard deviations are given in parentheses.

| Parameter        | Correlated Costs | Anti-Correlated Costs | Random Costs     |
|------------------|------------------|-----------------------|------------------|
| Eps              | 0.0001           | 0.0001                | 0.0001           |
| g KEA-W          | 3 K              | 20 K                  | 20 K             |
| g KEA            | 3 K              | 20 K                  | 20 K             |
| g, P NSGA2       | 1000, 1.5 K      | 500, 5 K              | 500, 5 K         |
| S-value EPDA     | 25.3             | 2,073.5               | 7,810            |
| S-value KEA-W    | 25.3 (0.004)     | 2,073.6 (0.16)        | 7,807 K (1,172)  |
| S-value KEA      | 25.3 (0.1)       | 2,063.6 (6.7)         | 7,727 K (30,977) |
| S-value NSGA2    | 17.7 (1.1)       | 1,878.7 (7.9)         | 1,077 K (26,700) |
| CPU sec. EPDA    | 78.2             | 2,407                 | 2,021            |
| CPU sec. KEA-W   | 874 (209)        | 27 K (8 K)            | 11 K (2 K)       |
| CPU sec. KEA     | 37.1 (3)         | 698.3 (59.8)          | 515.5 (51.9)     |
| CPU sec. NSGA2   | 1,197 (21.9)     | 9,079.8 (52.4)        | 8,852.9 (33.6)   |
| TTC EPDA         | 412 K            | 5,696 K               | 4,643 K          |
| TTC KEA-W        | 56 M (5 M)       | 235 M (25 M)          | 294 M (22 M)     |
| TTC KEA          | 383 K (31 K)     | 3,759 K (350 K)       | 3,063 K (199 K)  |
| TTC NSGA2        | 1,942 K (55 K)   | 2,938 K (44 K)        | 2,837 K (19 K)   |
| FO EPDA          | 1,339            | 4,453                 | 4,805            |
| FO KEA-W         | 1,457 (40)       | 5,236 (86)            | 5,824 (207)      |
| FO KEA           | 1,315 (61.2)     | 4,255 (124.1)         | 4,430 (139.8)    |
| FO NSGA2         | 501 (74.3)       | 1,550 (308.3)         | 1,860 (208.2)    |
| %FO EPDA         | 0.32%            | 0.08%                 | 0.10%            |
| %FO KEA-W        | 0.003% (0.001)   | 0.002% (0.0003)       | 0.002% (0.0001)  |
| %FO KEA          | 0.34% (0.20)     | 0.11% (0.04)          | 0.14% (0.07)     |
| %FO NSGA2        | 0.03% (0.13)     | 0.05% (0.69)          | 0.01% (1.1)      |
by EPDA and KEA-W are very close. The inclusion of the mid-point of the PF has
increased the accuracy of KEA-W compared with that of KEA as well as its front
occupation. However, this precision has been achieved at the cost of large CPU time,
which is due to the large total number of STs calculated (T TC). The presence of an
additional search point results in the unnecessary re-evaluation of many solutions.
The median and the inter-quartile range (IQR) values of the I-indicator are shown in Table 17.6, where lower values indicate better performance. The positive I-indicators show that KEA-W and KEA are incomparable in all instances. However, since the I(KEA-W, KEA) values are smaller than the I(KEA, KEA-W) values, it can be concluded that, in a weaker sense, KEA-W is better in more than 50% of the runs. The same conclusion can be drawn when comparing KEA-W with EPDA. In the case of NSGA2, the median scores of KEA-W are negative in all instances tested, indicating
Table 17.6 Median and inter-quartile range (IQR) values of the additive epsilon indicator obtained after 10 evaluations of KEA-W, KEA and NSGA2 on graphs with 100 nodes, Eps = 0.0001

| I-values                | Correlated Costs | Anti-Correlated Costs | Random Costs |
|-------------------------|------------------|-----------------------|--------------|
| I(KEA-W, KEA) Median    | 0.001            | 0.013                 | 1.219        |
| I(KEA-W, KEA) IQR       | 0.010            | 0.014                 | 2.201        |
| I(KEA, KEA-W) Median    | 0.046            | 2.361                 | 162.398      |
| I(KEA, KEA-W) IQR       | 0.019            | 0.791                 | 36.196       |
| I(KEA-W, NSGA2) Median  | -0.381           | -0.088                | -2.434       |
| I(KEA-W, NSGA2) IQR     | 0.161            | 0.087                 | 6.879        |
| I(NSGA2, KEA-W) Median  | 1.179            | 3.398                 | 301.227      |
| I(NSGA2, KEA-W) IQR     | 0.352            | 0.184                 | 12.203       |
| I(KEA-W, EPDA) Median   | 0.008            | 0.078                 | 7.019        |
| I(KEA-W, EPDA) IQR      | 0.010            | 0.026                 | 0.700        |
| I(EPDA, KEA-W) Median   | 0.010            | 0.168                 | 8.912        |
| I(EPDA, KEA-W) IQR      | 0.000            | 0.000                 | 0.000        |
Fig. 17.12 Attainment surface plots obtained by KEA-W, KEA and NSGA2 for a graph with 100 nodes and anti-correlated costs, resolution = 60, total number of test points = 120, Eps = 0.0001
Fig. 17.13 Attainment surface plots obtained by KEA-W, KEA and NSGA2 for a graph with 100 nodes and random costs, resolution = 60, total number of test points = 120, Eps = 0.0001
that it is better than NSGA2 in a strict sense on more than 50% of runs. Therefore,
the additive epsilon indicator confirms the superiority of KEA-W over KEA, EPDA
and NSGA2 in all three instances.
Figures 17.12 and 17.13 compare the best attainment surface plots obtained by the three algorithms, KEA-W, KEA and NSGA2, for the random and anti-correlated costs with Eps = 0.0001. These figures show that the middle section of each surface plot of KEA is dominated by KEA-W. NSGA2 mostly dominates the middle portion of KEA and is almost superimposed on the middle section of KEA-W. However, the tails of KEA-W dominate those of NSGA2. Therefore, the attainment surface plots also attest to the superiority of KEA-W over KEA and NSGA2.
In order to discover how expensive these algorithms are if only an approximate PF is sought, all four algorithms were executed for a limited time and their attainment surface plots are compared in Figure 17.14. The solutions of NSGA2 are dominated by the other three techniques when the CPU time is limited. EPDA diverges away from the middle of the APS, so it is dominated by KEA, which diverges less. At the middle point of the APS, KEA-W dominates all the algorithms, but diverges on either side of the APS, where it is dominated by KEA and EPDA. Thus, when time is limited, a member of the KEA family can be selected depending on the part of the APS of interest.
Fig. 17.14 Attainment surface plots for a graph with 100 nodes and random costs, where the execution time has been limited to approximately 515 sec. EPDA FO = 2,782, approximate KEA FO = 4,317, NSGA2 FO = 21, KEA-W FO = 1,629, resolution = 60, total no. of test points = 120, Eps = 0.0001
One of the advantages of KEA is that the size of its APS is flexible. In most algorithms the APS is a vector with a fixed size, so when the APS is full, newly found solutions can only be admitted by discarding others. This may be a disadvantage, since valuable information is lost. Given present computational memory capacity, a large APS should not cause computational problems; if in special applications there is a risk of memory becoming saturated, the size of the APS can be tailored accordingly. KEA controls the size of the APS in two ways: by varying the precision parameter, Eps, different sizes of APS can be obtained; furthermore, solutions with equal fitness functions are deleted from the APS.
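The Eps mechanism can be sketched in a few lines: treating two cost vectors as equal whenever every coordinate differs by less than Eps, and keeping a single representative per such group, lets the archive shrink as Eps grows. A hypothetical illustration (the function name and data are our own):

```python
def prune_by_precision(points, eps):
    """Keep one representative of every group of cost vectors that are
    'equal' at precision eps: a point is retained only if it differs from
    each already-kept point by at least eps in some coordinate."""
    kept = []
    for p in sorted(points):
        if all(any(abs(a - b) >= eps for a, b in zip(p, q)) for q in kept):
            kept.append(p)
    return kept

# A coarser precision yields a smaller approximate Pareto set:
pts = [(1.0, 2.0), (1.00005, 2.00004), (3.0, 1.0)]
```

With eps = 0.0001 the first two points collapse into a single archive entry, whereas with a much finer eps all three survive, which is exactly the size-control behaviour described above.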
Most algorithms proposed in the literature have been tested with integer costs. To the best of our knowledge, only Kumar et al. [33], who have used real-valued data, have made observations about the effects of the precision parameter, Eps. Other researchers have fixed the precision parameter and therefore have not noted that with different values of the precision parameter they could have obtained different-sized APSs in the objective space. It is emphasized that the precision parameter affects the Pareto front only in the objective space; the number of optimal solutions in the decision space is not affected, regardless of the value of the precision parameter. Nevertheless, since algorithms are compared by their APSs in the objective space, it is important to report the value of the precision parameter used, to enable meaningful comparisons with future algorithms.
The only weakness of KEA lies in its dominated solutions in the middle of the APS in certain instances. This is because these points are the farthest away from the initial ancestors, the end points of the PF. Moreover, unlike EPDA, which systematically mutates all the STs in the APS, KEA selects the STs randomly. This has a number of important effects: 1) KEA is substantially faster, since not all STs are mutated; 2) at each iteration, dominated STs are eliminated, which again speeds up future processing, but at the cost that the non-dominated offspring of these STs cannot be produced; 3) some existing non-dominated STs may not be selected to produce new non-dominated offspring. The last two effects result in the dominated middle section of the APS.
17.6 Conclusions
A fast knowledge-based Evolutionary Algorithm, called KEA, was presented for
the multi-criterion minimum spanning tree problem. KEA was validated and tested
using hard benchmark instances of the problem generated with algorithms from
literature. The verification tests in the bi-criterion case against an exhaustive search
algorithm, for complete graphs having four to ten nodes and three different cost
functions, showed that KEA is capable of finding the true Pareto fronts.
KEA was compared with an adapted NSGA-II (NSGA2), on complete graphs
with 20, 30, 50 and 100 nodes and three different cost functions. The approximate
Pareto sets calculated by a deterministic algorithm (EPDA) were used as reference.
It was shown that KEA outperforms NSGA2 in terms of speed, spread and front
occupation. Further experiments with larger graphs of up to 200 nodes in two criteria
and up to 100 nodes in three criteria showed that it can obtain approximate Pareto
sets that are almost as good as those obtained by EPDA in less time. The speed of
KEA was demonstrated on problems with up to 500 nodes. The main advantages of
KEA over its predecessors include its scalability to more than two criteria; its ability
to calculate both the supported and the non-supported Pareto optimal solutions; its exploration of regions of the search space that other algorithms do not reach, thus finding evenly distributed Pareto fronts; and its speed, making it applicable to problems with more
than 500 nodes.
The only deficiency of KEA is that the middle section of its approximate Pareto
sets for large graphs is dominated. In order to overcome this deficiency, a number
of modifications were tested. It was found that KEA-W obtains the best results at
the cost of large CPU times. Therefore when time is expensive and constitutes a
limiting factor, KEA is still preferable over all the algorithms tested.
Although KEA has been tailored for the MCMST problem, its underlying philosophy is anticipated to be applicable to other domains where the calculation of extreme points is possible.
Acknowledgements
We are sincerely indebted to the reviewers for their comments and suggestions, which have improved the quality of this chapter. Our thanks are also due to Dr. J. Knowles for making available on his website the software that calculates the attainment surfaces plotted with gnuplot.
References
1. Anderson, K., Jornsten, A.K., Lind, M.: On bicriterion minimal spanning trees: An approximation. Comput. Oper. Res. 23, 1171–1182 (1996)
2. Arroyo, J.E.C., Vieira, P.S., Vianna, D.S.: A GRASP algorithm for the multi-criteria minimum spanning tree problem. In: Second Brazilian Symp. on Graphs (GRACO 2005), Rio de Janeiro, Brazil (2005); also the Sec. Multidiscip. Conf. on Sched.: Theory and Apps., New York (2005)
3. Balakrishnan, A., Magnanti, T.L., Mirchandani, P.: Heuristics, LPs and trees on trees: Network design analyses. Ops. Res. 44, 478–496 (1996)
4. Bookstein, A., Klein, S.T.: Compression of correlated bit-vectors. Inf. Syst. 16, 110–118 (1996)
5. Bosman, P.A.N., Thierens, D.: Multi-objective optimization with diversity preserving mixture-based iterated density estimation evolutionary algorithms. Int. J. App. Reas. 31, 259–289 (2002)
6. Bosman, P.A.N., Thierens, D.: A thorough documentation of obtained results on real-valued continuous and combinatorial multi-objective optimization problems using diversity preserving mixture-based iterated density estimation evolutionary algorithms. Tech. Rep., Institute of Information and Computing Sciences, Utrecht University, The Netherlands (2002)
7. Chen, G., Chen, S., Guo, W., Chen, H.: The multi-criteria minimum spanning tree problem based genetic algorithm. Inf. Sci. 177, 5050–5063 (2007)
8. Christofides, N.: Graph Theory: An Algorithmic Approach. Academic Press Inc., London (1975)
9. Coello Coello, C.A.: A short tutorial on evolutionary multi-objective optimization. In: Zitzler, E., Deb, K., Thiele, L., Coello Coello, C.A., Corne, D.W. (eds.) EMO 2001. LNCS, vol. 1993, pp. 21–40. Springer, Heidelberg (2001)
10. Davis-Moradkhan, M., Browne, W.: A hybridized evolutionary algorithm for multicriterion minimum spanning tree problem. In: Proc. 8th Int. Conf. Hybrid Intell. Sys. (HIS 2008), pp. 290–295 (2008)
11. Davis-Moradkhan, M., Browne, W., Grindrod, P.: Extending evolutionary algorithms to discover tri-criterion and non-supported solutions for the minimum spanning tree problem. In: Proc. Genet. Evol. Comput. (GECCO 2009), Montreal, Canada, pp. 1829–1830 (2009)
12. Deb, K., Pratap, A., Agrawal, S., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)
13. Deb, K., Pratap, A., Agrawal, S., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 181–197 (2002)
14. Ehrgott, M., Gandibleux, X.: Multiple Criteria Optimization: State of the Art. Annotated Bibliographic Surveys. Kluwer's International Series (2002)
15. Ehrgott, M., Skriver, A.J.: Solving biobjective combinatorial max-ordering problems by ranking methods and a two-phase approach. Eur. J. Oper. Res. 147, 657–664 (2003)
16. Flores, S.D., Cegla, B.B., Caceres, D.B.: Telecommunication network design with parallel multi-objective evolutionary algorithms. In: Proc. IFIP/ACM Lat. Am. Conf.: Towards Lat. Am. Agenda Network Res., La Paz, Bolivia (2003)
17. Gabow, H.N.: Two algorithms for generating weighted spanning trees in order. SIAM J. Comput. 6, 139–150 (1977)
18. Gastner, M.T., Newman, M.E.J.: Shape and efficiency in spatial distribution networks. J. Stat. Mech. Theory & Exp. 1, 1015–1023 (2006)
19. Goldbarg, E.F.G., De Souza, G.R., Goldbarg, M.C.: Particle swarm optimization for the bi-objective degree-constrained minimum spanning tree. In: Proc. IEEE Congr. Evol. Comput., Vancouver, BC, Canada, pp. 1527–1534 (2006)
20. Guo, W., Chen, G., Feng, X., Yu, L.: Solving multi-criteria minimum spanning tree problem with discrete particle swarm optimization. In: Proc. 3rd Int. Conf. Nat. Comput. (ICNC 2007), pp. 471–478 (2007)
21. Hamacher, H.W., Ruhe, G.: On spanning tree problems with multiple objectives. An. Oper. Res. 52, 209–230 (1994)
22. Han, L., Wang, Y.: A novel genetic algorithm for multi-criteria minimum spanning tree problem. In: Hao, Y., Liu, J., Wang, Y.-P., Cheung, Y.-m., Yin, H., Jiao, L., Ma, J., Jiao, Y.-C. (eds.) CIS 2005. LNCS (LNAI), vol. 3801, pp. 297–302. Springer, Heidelberg (2005)
23. López-Ibáñez, M., Paquete, L., Stützle, T.: On the design of ACO for the biobjective quadratic assignment problem. In: Dorigo, M., Birattari, M., Blum, C., Gambardella, L.M., Mondada, F., Stützle, T. (eds.) ANTS 2004. LNCS, vol. 3172, pp. 214–225. Springer, Heidelberg (2004)
24. Katoh, N., Ibaraki, T., Mine, H.: An algorithm for finding K minimum spanning trees. SIAM J. Comput. 10, 247–255 (1981)
25. Knowles, J.D.: Local search and hybrid evolutionary algorithms for Pareto optimization. Ph.D. Dissertation R8840, Department of Comput. Sci., University of Reading, Reading, UK (2002)
26. Knowles, J.D.: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multi-objective optimization problems. IEEE Trans. Evol. Comput. 10, 50–66 (2006)
27. Knowles, J.D., Corne, D.W.: Approximating the non-dominated front using the Pareto archived evolution strategy. Evol. Comput. 8, 149–172 (2000)
28. Knowles, J.D., Corne, D.W.: A comparison of encodings and algorithms for multiobjective minimum spanning tree problems. In: Proc. Congr. Evol. Comput. (CEC 2001), pp. 544–551. IEEE Press, Los Alamitos (2001)
29. Knowles, J.D., Corne, D.W.: A comparative assessment of memetic, evolutionary, and constructive algorithms for the multi-objective d-MST problem. In: Proc. Genet. Evol. Comput. Conf. (Workshop WOMA II). Available on author's website (2001), http://dbk.ch.umist.ac.uk/klowles/
30. Knowles, J.D., Corne, D.W.: Benchmark problem generators and results for the multiobjective degree-constrained minimum spanning tree problem. In: Proc. Genet. Evol. Comput. Conf. (GECCO 2001), pp. 424–431. Morgan Kaufmann Publishers, San Francisco (2001)
31. Kruskal Jr., J.B.: On the shortest spanning subtree of a graph and the travelling salesman problem. Proc. Amer. Math. Soc. 7, 48–50 (1956)
32. Kumar, R., Banerjee, N.: Multicriteria network design using evolutionary algorithms. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O'Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 2179–2190. Springer, Heidelberg (2003)
33. Kumar, R., Singh, P.K., Chakrabarti, P.P.: Multiobjective EA approach for improved quality of solutions for spanning tree problem. In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 811–825. Springer, Heidelberg (2005)
34. Laumanns, M., Thiele, L., Zitzler, E.: An efficient, adaptive parameter variation scheme for metaheuristics based on the epsilon-constraint method. Eur. J. Oper. Res. 169, 932–942 (2006)
35. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer, U.S.A. (1992)
36. Neumann, F., Wegener, I.: Minimum spanning trees made easier via multi-objective optimization. In: Beyer, H.-G. (ed.) Proc. Genet. Evol. Comput. (GECCO 2005), pp. 763–769 (2005)
37. Ng, H.S., Lam, K.P., Tai, W.K.: Analog and VLSI implementation of connectionist network for minimum spanning tree problems. In: Proc. IEEE Reg. 10 Int. Conf. Microelectron. & VLSI (TENCON 1995), Hong Kong, pp. 137–140 (1995)
38. Obradovic, N., Peters, J., Ruzic, G.: Multiple communication trees of optimal circulant graphs with two chord lengths. Tech. Rep. SFU-CMPT-TR-2004-04, School of Comput. Sci., Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada (2004)
39. Paquete, L., Stützle, T.: A two-phase local search for the biobjective traveling salesman problem. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.) EMO 2003. LNCS, vol. 2632, pp. 479–493. Springer, Heidelberg (2003)
40. Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36, 1389–1401 (1957)
41. Raidl, G.R., Julstrom, B.A.: Edge sets: An effective evolutionary coding of spanning trees. IEEE Trans. Evol. Comput. 7, 225–239 (2003)
42. Ramos, R.M., Alonso, S., Sicilia, J., González, C.: The problem of the optimal bi-objective spanning tree. Eur. J. Oper. Res. 111, 617–628 (1998)
Chapter 18
David Shilane (e-mail: dshilane@stanford.edu)
Richard H. Liang (e-mail: rhliang@berkeley.edu)
Sandrine Dudoit (e-mail: sandrine@stat.berkeley.edu)

Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 453–484.
© Springer-Verlag Berlin Heidelberg 2010

18.1 Introduction

Many statistical inference methods rely on selection procedures to estimate a parameter of the joint distribution of the data structure X = (W, Y) that consists of explanatory variables W = (W1, . . . , WJ), J ∈ Z+, and a scalar outcome Y. The parameter of interest often takes the form of a functional relationship between the outcome and explanatory variables, as in the regression setting's estimation of E[Y | W], the conditional expectation of the outcome given a set of covariates. In loss-based estimation,
the parameter of interest is defined as the risk minimizer for a user-supplied loss
function. Risk optimization is particularly challenging for high-dimensional estimation problems because it requires searching over large parameter spaces to accommodate general regression functions with possibly higher-order interactions among
explanatory variables. We will show that the size of the parameter space in polynomial regression grows at a doubly exponential rate. In light of this expensive
optimization problem, we seek new procedures that rely upon computational intelligence to provide accurate statistical estimates. In this study, we propose an evolutionary algorithm (EA) for risk optimization that may be applied in the context of
loss-based estimation with cross-validation.
Statistical estimation in regression settings may consider a variety of approaches.
For a fixed regression function relating the outcome and explanatory variables, a
least squares approach [12] may be employed when the sample size is large relative to the number of explanatory variables. When the sample size is insufficient
for the dimension of the problem, sparse regression procedures such as the Lasso
[19], Least Angle Regression [10], or the Dantzig Selector [4] may be applied as
shrinkage procedures. By including only a small number of explanatory variables as
main effects, these estimators aim to produce interpretable models of the outcome.
By contrast, other estimators focus solely on their predictive value; these include
Classification and Regression Trees [3], Random Forests [2], Multivariate Adaptive Regression Splines [13, 14], and neural network approaches [17]. Loss-based estimation with cross-validation seeks to balance the goals of producing interpretable models and reliable estimates with predictive value by selecting among a variety of candidate estimators in terms of their fit on an independent validation set. Like shrinkage procedures, loss-based estimation with cross-validation results in some degree of variable selection but also allows for the exploration of higher-order variable interactions. Furthermore, cross-validation can be shown to be an asymptotically optimal selection procedure in terms of the sample size [7, 15]. Because of its strong theoretical properties and its utility in producing reliable and interpretable estimates, we will rely upon cross-validation as a general method for estimation and propose a new procedure for risk optimization to be used within this context.
The proposed methodology is motivated by the general road map for statistical loss-based estimation using cross-validation of van der Laan and Dudoit [15] and Dudoit and van der Laan [7]. Risk optimization may be considered a subproblem of this road map. Sinisi and van der Laan [16] introduced a general Deletion/Substitution/Addition (DSA) algorithm for generating candidate estimators that seek to minimize empirical risk over subspaces demarcated by basis size (Section 18.3.1). However, Wolpert and Macready [20] have shown that no single optimization algorithm can competitively solve all problems; therefore, we are interested in generating complementary risk optimization algorithms for use in estimator selection procedures. Within the estimation road map [7, 15], this project seeks to analyze
the size of the parameter space for a polynomial regression function in terms of the
number of explanatory variables, the maximum number of interacting variables, and
either the polynomial degree or the variable degree. It also introduces an EA to generate candidate estimators and minimize empirical risk within parameter subspaces.
Relying upon V-fold cross-validation to select an optimal parameter subspace, the
ψ = argmin_{ψ′ ∈ Ψ} Θ(ψ′, P),   P ∈ M.   (18.1)
It is assumed that the loss function is specified such that the parameter of interest
minimizes the risk function. For instance, in regression, the parameter of interest is
the regression function ψ(W) = E[Y | W], which minimizes risk for the L2 loss function L(X, ψ) = (Y − ψ(W))^2. We can then define the optimal risk over the parameter space as:

θ_opt = min_{ψ ∈ Ψ} Θ(ψ, P).   (18.2)

Given a sample X1, . . . , Xn with empirical distribution Pn, the risk of a candidate ψ is estimated by the empirical risk

Θ(ψ, Pn) = (1/n) Σ_{i=1}^{n} L(Xi, ψ).   (18.3)
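For concreteness, the empirical risk of Eq. (18.3) under the L2 loss can be computed as below; the helper names are ours and the data are toy values, not the chapter's simulation models.

```python
def empirical_risk(psi, data, loss):
    """Average loss of a candidate estimator psi over n observations (Eq. 18.3)."""
    return sum(loss(x, psi) for x in data) / len(data)

def l2_loss(x, psi):
    """L2 loss L(X, psi) = (Y - psi(W))^2 for an observation X = (W, Y)."""
    w, y = x
    return (y - psi(w)) ** 2

# Toy data generated from Y = 2*W1 with no noise, so the true regression
# function attains zero risk while a biased estimator does not.
data = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
true_psi = lambda w: 2.0 * w[0]
biased_psi = lambda w: 2.0 * w[0] + 1.0
```

Here `empirical_risk(true_psi, data, l2_loss)` is 0, while `biased_psi` incurs a constant squared error of 1 at every observation.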
The general road map for loss-based estimation [7, 8, 15] contains three steps:
1. Define the parameter of interest. This parameter is the value that minimizes risk
for a user-supplied loss function.
2. Generate candidate estimators. The parameter space is divided based upon a
sieve of increasing dimensionality into subspaces whose union approximates the
complete parameter space. Within each subspace, a candidate estimator is chosen
to minimize empirical risk.
3. Apply cross-validation. Select the optimal estimator among the candidates produced in Step 2 using cross-validation.
3. Data points Xi = (Wi, Yi), i ∈ {1, . . . , n}, from the learning set Xn are randomly assigned to a class in {1, . . . , V} such that each class contains an approximately equal number of observations. Let Q = (q1, . . . , qn) refer to the data's class assignments.
4. For each fold v ∈ {1, . . . , V}:
   a. Assign data points to the training set:
      Tn(v) = {Xi : qi ≠ v}.   (18.4)
   b. Assign the remaining data points to the validation set:
      Vn(v) = {Xi : qi = v}.   (18.5)
   c. For each subspace k ∈ {1, . . . , K}:
      i. Select the candidate estimator ψk,n minimizing empirical risk on the training set Tn(v).
      ii. Compute the validation set risk Θ(ψk,n, Pn^{V(v)}), where Pn^{V(v)} represents the empirical distribution on the validation set Vn(v).
5. Calculate the mean cross-validated risk for each subspace and store it in the vector

   (θ1^CV, . . . , θK^CV) = ( (1/V) Σ_{v=1}^{V} Θ(ψ1,n, Pn^{V(v)}), . . . , (1/V) Σ_{v=1}^{V} Θ(ψK,n, Pn^{V(v)}) ).   (18.6)

6. Select the subspace that minimizes mean cross-validated risk:

   kn = argmin_{k ∈ {1,...,K}} θk^CV.   (18.7)

7. Finally, search within the parameter subspace kn for the estimate ψn minimizing empirical risk Θ(ψn, Pn) on the learning set data Xn.
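The cross-validation selection in Steps 3–7 can be sketched in a few lines; the fold assignment, the squared-error loss, and the two toy candidate fitters (a constant fit and a simple linear fit, standing in for subspaces k = 1, 2) are illustrative assumptions, not the chapter's implementation.

```python
import random

def v_fold_cv(data, fitters, V=5, seed=0):
    """Steps 3-7: assign folds, fit each candidate subspace on the training
    folds, score it on the held-out fold, pick the subspace with smallest
    mean cross-validated risk, and refit on the full learning set."""
    rng = random.Random(seed)
    q = [i % V for i in range(len(data))]   # class assignments Q, balanced
    rng.shuffle(q)
    mean_risk = []
    for fit in fitters:                      # one fitter per subspace k
        risks = []
        for v in range(V):
            train = [x for x, qi in zip(data, q) if qi != v]   # Eq. (18.4)
            valid = [x for x, qi in zip(data, q) if qi == v]   # Eq. (18.5)
            psi = fit(train)
            risks.append(sum((y - psi(w)) ** 2 for w, y in valid) / len(valid))
        mean_risk.append(sum(risks) / V)     # Eq. (18.6)
    k_star = min(range(len(fitters)), key=lambda k: mean_risk[k])  # Eq. (18.7)
    return k_star, fitters[k_star](data)     # final refit on the learning set

def fit_constant(train):
    c = sum(y for _, y in train) / len(train)
    return lambda w: c

def fit_linear(train):
    # Closed-form simple linear regression on a scalar covariate.
    n = len(train)
    mx = sum(w for w, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((w - mx) ** 2 for w, _ in train)
    b = sum((w - mx) * (y - my) for w, y in train) / sxx
    return lambda w: my + b * (w - mx)
```

On data drawn from a linear model, the procedure selects the linear subspace because its validation risk is smaller in every fold.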
Steps 4(c)i and 7 of the above procedure rely upon searching a parameter subspace
for the estimator that minimizes empirical risk when applied to the specified (training or learning) data set. An exhaustive search of the parameter subspace may be
employed when doing so is computationally tractable. However, in estimation problems over general regression functions with possibly higher-order interactions, the
parameter space can grow complex and large (Section 18.3.3) for even a moderate number of explanatory variables. We therefore require a search algorithm to
minimize risk within a parameter subspace in the allotted computational time. The
DSA [16] is one candidate search algorithm; Section 18.4 will introduce a class of
evolutionary algorithms as an alternative procedure for risk minimization.
the choice of the link function (e.g. logit or probit) mapping the selected basis functions to the outcome variable, and the constraints that limit the way in which explanatory variables may interact. Much as in Sinisi and van der Laan [16], the proposed estimator selection procedure may be applied to any estimation setting, including but not limited to robust and weighted regression, censored data structures, and generalized linear models for any choice of link function h : R → R. Because of its approximation capabilities [18], we will focus on the parameter space consisting of the set of polynomial combinations of the explanatory variables with real-valued coefficients. In this parametrization, the set of basis functions consists of all monomial
functions that can be expressed in terms of an exponent vector d = (d1 , . . . , dJ ) as
φ = W^d = W1^d1 · · · WJ^dJ.   (18.8)
A subset of these basis functions is denoted

Φ = {φi1, . . . , φik},   (18.9)

with cardinality |Φ| = k referred to as the basis size. Given the link function h, a size k set of basis functions Φ, and a (k + 1)-dimensional vector β = (β0, β1, . . . , βk) of real-valued coefficients, a regression function for Φ has the form

ψ = h( β0 + Σ_{i : φi ∈ Φ} βi φi ).   (18.10)
In seeking an estimate ψn that minimizes true risk, we will first search for the optimal set of basis functions Φn and then subsequently seek an optimal estimate βn of β. Estimating β given Φn is a standard regression problem that is solved in closed form for linear regression and with numeric optimization methods for non-linear regression. Given Φn and βn, the estimate ψn is defined as:

ψn = h( β0n + Σ_{i : φi ∈ Φn} βin φi ).   (18.11)
18.3.2 Constraints

At the user's discretion, constraints may be imposed on the set of basis functions Φ. These constraints may take the form of limits on the interaction order and the polynomial or variable degree. The interaction order constraint, which limits the number of variables that interact in a basis function, may be stated as:

Σ_{j=1}^{J} 1{dj > 0} ≤ S;   S ∈ Z+.   (18.12)

The polynomial degree constraint limits the total degree of each monomial:

Σ_{j=1}^{J} dj ≤ D;   dj ∈ Z+, j ∈ {1, . . . , J}.   (18.13)
The variable degree constraint instead bounds the degree of each individual variable:

dj ≤ D0, ∀ j ∈ {1, . . . , J}, with Σ_{j=1}^{J} dj ≥ 1.   (18.14)
Although the constraints (18.12), (18.13), and (18.14) are not required, they allow
the researcher to restrict attention to a particular subset of the class of chosen basis
functions. By default, the interaction order S can be no greater than min(J, D) under
constraint (18.13) and is limited to J under (18.14). The extreme cases S = 1 and
either S = min(J, D) or S = J correspond, respectively, to constraints allowing no
interactions and interactions of any order.
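A predicate testing whether an exponent vector respects these constraints might look like this; the function name and argument convention are ours, not the chapter's.

```python
def satisfies_constraints(d, S, D=None, D0=None):
    """Check an exponent vector d against the interaction order constraint
    (18.12) and either the polynomial degree (18.13) or variable degree
    (18.14) constraint, whichever bound is supplied."""
    if sum(1 for dj in d if dj > 0) > S:      # (18.12): at most S interacting variables
        return False
    if D is not None and sum(d) > D:          # (18.13): total degree at most D
        return False
    if D0 is not None:                        # (18.14): each dj <= D0, total degree >= 1
        if any(dj > D0 for dj in d) or sum(d) < 1:
            return False
    return True
```

For example, the monomial W1 * W2^2 (exponent vector (1, 2, 0)) satisfies S = 2 and D = 3 but violates the stricter interaction bound S = 1.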
That is, when no variables may interact, the set of possible basis functions consists of all choices of a single variable W j , j {1, . . . , J}, raised to a power
d j {1, . . . , D}.
When the parameter space is instead restricted by the variable degree constraint (18.14), the number of basis functions is I0:

I0 = Σ_{s=1}^{S} C(J, s) · D0^s.   (18.16)
under the constraint (18.13), and so the lower bound on I is 2^min(J,D). Likewise, when no interaction order constraint is imposed under the constraint (18.14), then S = J, and I0 is bounded below by D0^J. Because all interactions are allowed when S = J, the summation for I0 in (18.16) may be expressed as a polynomial in D0 of degree J. Therefore, when S = J, I0 is both Ω(D0^J) and O(D0^J), which jointly imply a tight bound on I0. The latter bound may also be used as a loose upper bound when S < J. Furthermore, because any basis function allowed under the constraints (18.12) and (18.13) is also permitted under (18.12) and (18.14) when D = D0, this upper bound on I0 is also a trivial upper bound on I. Therefore, I is O(D^J). Finally, because the size of the parameter space is 2^I or 2^I0, these quantities are respectively bounded below by functions of order 2^(2^S) and 2^(D0^S). Likewise, upper bounds of O(2^(D^J)) and O(2^(D0^J)) may also be established, where the former is a trivial bound, and the latter is a loose bound that is only tight in the extreme case of S = J. These results are proved in the Appendix and summarized in Table 18.1.
Table 18.1 Size of the parameter space under the interaction order constraint (18.12) and either the polynomial degree constraint (18.13) or the variable degree constraint (18.14)

              | Polynomial Degree Constraint (18.13) | Variable Degree Constraint (18.14)
Upper Bound   | O(2^(D^J))                           | O(2^(D0^J))
Lower Bound   | Ω(2^(2^S))                           | Ω(2^(D0^S))
Because the size of the parameter space is at least of a doubly exponential order
of the number of variables J and the polynomial degree bound D or variable degree bound D0 when the interaction order is not constrained, even moderate degree
constraints imposed on a small number of variables may result in an intractable parameter space to search. In this setting, significant computation may be required to
obtain a reliable estimate of the parameter of interest. Figure 18.1 depicts the growth
of log(I) and log(I0 ) as the polynomial degree bound D and the variable degree
bound D0 increase in an estimation setting with J = 11 variables and no interaction
order constraint; i.e. S = min(J, D) for the constraint (18.13), and S = J for the constraint (18.14). The approximately linear growth on the logarithmic scale confirms
that the values I and I0 are exponential functions of their respective degree bounds.
The value I is consistently smaller than I0 because the polynomial degree constraint
(18.13) restricts the parameter space to a subset of that specified by the variable degree constraint (18.14). Furthermore, the maximum value of S under the constraint
(18.14) is J, whereas S is constrained to min(J, D) J under the constraint (18.13).
Fig. 18.1 The natural logarithm of the numbers of basis functions I and I0 as a function of the polynomial degree bound D and the variable degree bound D0, respectively, for J = 11 variables and no interaction order constraint. For the constraint (18.13), we have S = min(J, D), and for the constraint (18.14), this value is S = J
Fig. 18.2 The natural logarithm of the number of basis functions I0 as a function of the interaction order bound S for J = 11 variables with the variable degree constraint (18.14) specified by D0 = 5
Therefore, for a fixed level of the interaction order bound S, the polynomial degree
constraint (18.13) always results in a smaller parameter space than that specified by
the variable degree constraint (18.14) when D = D0 .
Figure 18.2 plots the growth of log(I0 ) as a function of the interaction order
bound S for an estimation setting including J = 11 variables and degree bound D0 =
5, which corresponds to Model 5 presented in Section 18.5. In practice, S is often
chosen according to scientific insight for the problem at hand. However, the choice
of S can also be used to effectively prune the parameter space to a manageable size.
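Equation (18.16) makes it easy to tabulate how quickly the space grows; this sketch uses exact integer arithmetic, and the closed form quoted in the comment for S = J follows from the binomial theorem.

```python
from math import comb

def num_basis_functions(J, S, D0):
    """I0 from Eq. (18.16): sum over interaction orders s of C(J, s) * D0^s."""
    return sum(comb(J, s) * D0 ** s for s in range(1, S + 1))

# With S = J all interactions are allowed and I0 = (D0 + 1)^J - 1 by the
# binomial theorem, so the parameter space size 2**I0 is doubly exponential.
i0 = num_basis_functions(J=11, S=11, D0=5)
```

For the setting of Fig. 18.2 (J = 11, D0 = 5, S = J), this gives I0 = 6^11 − 1, i.e. over 3.6 × 10^8 basis functions before the subset count 2^I0 is even considered.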
distribution for either the full learning set or a cross-validation training set. A candidate optimum of this fitness function is given by an individual consisting of a genotype vector e = (e1, . . . , ekJ) ∈ (R+)^kJ and a corresponding phenotype vector:

d = d(e) = ( [d(e1, . . . , eJ)], [d(eJ+1, . . . , e2J)], . . . , [d(e(k−1)J+1, . . . , ekJ)] ) ∈ (R+)^kJ.   (18.17)

Each block of J phenotypic components d(e(j−1)J+1), . . . , d(ejJ) serves as the exponent vector of a particular basis function, and the k basis functions collectively specify a subset of the form (18.9) that maps to a candidate optimum ψn. An individual's fitness is given by the risk Θ(ψn, P) of its associated estimate ψn. Although
the user may proceed by directly specifying a phenotype vector, a data structure
including both a genotype and a phenotype allows for a greater variety of evolutionary information to be stored in an individual. For instance, a continuous genotype
may be used to break ties in the phenotype when an interaction order constraint is
imposed. In this setting, the elements of the genotype vector e may belong to the
positive real numbers R+ , the elements of the phenotype vector d may be limited
to the set of positive integers Z+, and any function with domain R+ and range Z+ may be used to map from an individual's genotype to its phenotype. In the procedure of Section 18.4.1, we choose this function according to the selected degree
and interaction order constraints. When an interaction order constraint is imposed, a
continuous genotype structure allows some genes to maintain a genotype while remaining dormant in terms of phenotype. In this scenario, if a gene whose phenotype
previously interacted mutates, a gene of dormant phenotype may immediately take
its place as one of the at most S interacting phenotypic components of a given basis
function. Furthermore, a data structure incorporating both a genotype and a phenotype generalizes the EA so that it may be tailored to a particular constraint profile
(e.g. those of Section 18.3.2) solely through the choice of the phenotype function.
Starting from a random initial population, EAs typically generate subsequent
populations of individuals in generations of offspring created from existing parents via iterations of evolutionary mechanisms. Although other mechanisms may be
used, each generation of the proposed EA consists of a reproduction, mutation, and
selection phase, and these mechanisms collectively create and evaluate new individuals for quality in terms of fitness. After allowing the population to evolve for
G ∈ Z+ generations, the individual with optimum observed fitness is retained as the algorithm's result, which specifies an estimate ψn with an associated risk given by Θ(ψn, P).
18.4.1 Proposed EA
The following EA is used to optimize risk within a parameter subspace of size k ∈ Z+ based on monomial basis functions of the J explanatory variables under the interaction order constraint (18.12) and either the polynomial degree constraint (18.13) or the variable degree constraint (18.14). A schematic diagram of this
algorithm is depicted in Figure 18.3. Each step of the algorithm is first summarized
here and then further elucidated below.
Fig. 18.3 Schematic diagram for the proposed EA risk optimization procedure
2. Calculate the individual's phenotype vector d from its genotype e. This computation differs depending on which degree constraint is used. However, this step is
the only location within the EA for which the procedure differs depending upon
the constraint. This is an additional advantage of a data structure that includes
both a genotype and a phenotype.
For the polynomial degree constraint (18.13): Within each block of J genes of
the genotype vector e, begin with the variable of highest rank (corresponding
to the largest gene value). Assign the minimum of the floor of this gene value
and the remaining polynomial degree as the variable's phenotype within the
block. Repeat this procedure on each variable in order of its gene rank until the
monomial is of degree D or the interaction order constraint (18.12) is binding.
For the variable degree constraint (18.14): Within each block of J genes, compute the phenotype by assigning the floor of the genotype for each of the at
most S interacting variables. (Because all gene values are within the interval (0, D0 + 1), the floor function ensures that no phenotype exceeds D0 .) All
non-interacting variables receive phenotype 0. This computation may be performed via the following equation:
d = ⌊e⌋1{r ≤ S} = ( ⌊e1⌋1{r1 ≤ S}, . . . , ⌊ekJ⌋1{rkJ ≤ S} ),   (18.19)

where rm denotes the rank of gene m within its block of J genes, rank 1 corresponding to the largest gene value.
In order to ensure that the resulting monomials are all of degree at least 1 under
each of the above degree constraints, the variable of highest rank within each
block of J genes may receive a phenotype of 1 when all gene values within the
block are less than 1.
3. Given a phenotype vector d, an individual has an associated subset of basis functions Φn. Calculate from Φn the corresponding estimate ψn according to (18.11).
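Under the variable degree constraint, the genotype-to-phenotype map of Eq. (18.19) for one block of J genes can be sketched as follows; `phenotype_block` is our name for this step, not the chapter's.

```python
import math

def phenotype_block(e_block, S):
    """Map one block of J gene values to an exponent vector (Eq. 18.19):
    floor each gene, zero out all but the S highest-ranked genes, and force
    a degree of at least 1 when every kept floor is zero."""
    J = len(e_block)
    order = sorted(range(J), key=lambda j: -e_block[j])   # rank 1 = largest gene
    rank = {j: r + 1 for r, j in enumerate(order)}
    d = [math.floor(e) if rank[j] <= S else 0 for j, e in enumerate(e_block)]
    if sum(d) == 0:
        d[order[0]] = 1   # highest-ranked variable receives phenotype 1
    return d
```

With S = 2, genes (2.7, 0.3, 1.9) yield exponents (2, 0, 1): the middle gene ranks third and stays dormant even though its genotype is retained for future mutations.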
Initialization: The user may specify the number Z ≥ 4, Z ∈ Z+, of individuals in the initial population. Recall that the genotype vector e for an individual has length kJ. Initialization consists of generating genotype vectors for each individual from a (kJ)-variate uniform distribution on (0, D + 1)^kJ or (0, D0 + 1)^kJ.
Selection: Given a population of individuals, each with an associated estimate ψn, we will rank individuals according to fitness via the following procedure:
1. Compute the empirical risk Θ(ψn, Pn), where Pn is the empirical distribution with respect to either a cross-validation training data set Tn or the learning set Xn.
2. Rank existing individuals in order of increasing empirical risk. Select the 2⌊Z/4⌋ individuals with smallest empirical risk for reproduction. We will refer to the ranked population as (e[1], . . . , e[Z]), with each e[z] mapping to the gene vector e for the individual with the zth smallest risk.
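The ranking-and-truncation step above can be sketched as follows, with the risk supplied as a callable; the helper name and list representation are illustrative.

```python
def select_parents(population, risk):
    """Rank individuals by empirical risk (smallest first) and keep the
    2*floor(Z/4) fittest for reproduction; because the best-ranked individual
    is always kept, the cumulatively optimal individual survives."""
    Z = len(population)
    ranked = sorted(population, key=risk)   # (e[1], ..., e[Z])
    return ranked[: 2 * (Z // 4)]

# With Z = 8 individuals scored by an identity risk, the 4 smallest survive.
survivors = select_parents([3.0, 1.0, 4.0, 1.5, 9.0, 2.0, 6.0, 5.0], risk=lambda e: e)
```

Since Z ≥ 4 is required at initialization, at least two parents are always selected, which keeps the elitist property described next intact.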
An individual may be considered cumulatively optimal at generation g if its associated estimate ψn has a smaller risk than that of any other individual produced in the first g generations. The proposed selection mechanism is elitist in the sense that the cumulatively optimal individual is always selected at each generation. (Indeed, if the cumulatively optimal individual at generation g is not selected at generation
estimate of globally optimal risk within the size k parameter subspace. Because of
this convergence, and because cross-validation is an asymptotically optimal procedure for selecting the basis size kn as a function of the sample size n, the proposed
estimator selection procedure asymptotically converges in risk to the parameter of
interest as n and G tend toward infinity.
When information is known about the risk surface for an estimator selection application, it may be incorporated into the design of an appropriate optimization algorithm. However, the risk surface topology is typically unknown, so we are unable to
provide any general bounds on the rate at which convergence to the global optimum
is achieved. Indeed, it is possible that an EA will not improve upon full enumeration
in terms of the rate of asymptotic convergence; however, because an EA evolves
the population according to the risk surface, it generally outperforms random search
in practical settings with limited computational resources available. Similarly, it is
difficult to provide a priori guidelines on how the EA's computational parameters should be tuned to facilitate optimization over the specific problem's unknown risk surface. However, the resulting estimator is typically most sensitive to the mutation parameters. Although allowing for a larger number of mutating genes risks devolving the EA into a random search, one must take into account that not all mutations will affect the choice of basis functions because of the interaction order constraint.
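The chapter's mutation operator and its tuning parameters are defined before this excerpt; as a generic sketch of the trade-off described above, one can redraw each gene from the initialization distribution with a per-gene probability (names and the redraw distribution are assumptions, not the chapter's exact operator):

```python
import numpy as np

def mutate(e, D0, p_mutate, rng=None):
    """Generic mutation sketch: each gene is independently redrawn from
    the uniform initialization distribution on (0, D0 + 1) with
    probability p_mutate. Because of the interaction order constraint,
    a mutated gene changes the phenotype only if it alters which
    variables survive the within-block ranking."""
    rng = np.random.default_rng() if rng is None else rng
    e = np.asarray(e, dtype=float).copy()
    mask = rng.random(e.shape) < p_mutate
    e[mask] = rng.uniform(0.0, D0 + 1, size=int(mask.sum()))
    return e
```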
$$Y_1^e = Y_1 + \varepsilon; \quad \varepsilon \sim N(0,1); \quad \varepsilon \perp W; \qquad (18.21)$$

$$Y_2 = W_2 W_4; \quad Y_2^e = Y_2 + \varepsilon; \quad \varepsilon \sim N(0,1); \quad \varepsilon \perp W; \qquad (18.22)$$

$$Y_3 = W_2 W_4 W_6^2 + W_8 W_{11}; \quad Y_3^e = Y_3 + \varepsilon; \quad \varepsilon \sim N(0,1); \quad \varepsilon \perp W; \qquad (18.23)$$

$$Y_5 = W_1 W_2^2 W_3^2 + W_1 W_2 W_3^2 W_4 + W_3^3 + W_5^4; \quad Y_5^e = Y_5 + \varepsilon; \quad \varepsilon \sim N(0,1); \quad \varepsilon \perp W. \qquad (18.24)$$
Each model $Y_a$ was subject to the variable degree constraint (18.14) with bounds of
D1 = D2 = 2, D3 = 4, and D5 = 5. No interaction order constraint was imposed, so
S = J = 11 by default. The total number of basis functions for each setting, which is
given by the formula for I0 in (18.16), is shown in Table 18.2. Figure 18.1 also shows
the growth in I0 as a function of D0 with J = 11 variables and no interaction order
constraint. Model 5 comprises the largest parameter space. Figure 18.2 shows how
this parameter space can be pruned by introducing an interaction order constraint.
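These counts follow directly from the formula for $I_0$, restated in (18.29) of the Appendix; with S = J the sum collapses via the Binomial Theorem. A quick check in Python (function name illustrative):

```python
from math import comb

def n_basis_functions(J, D0, S=None):
    """I0 under the variable degree constraint (18.14) with interaction
    order bound S: sum_{s=1}^{S} C(J, s) * D0**s. With S = J this
    collapses to (D0 + 1)**J - 1 by the Binomial Theorem."""
    S = J if S is None else S
    return sum(comb(J, s) * D0 ** s for s in range(1, S + 1))

# Reproduce the Table 18.2 counts for J = 11 with no interaction constraint:
assert n_basis_functions(11, 2) == 3 ** 11 - 1 == 177_146
assert n_basis_functions(11, 4) == 5 ** 11 - 1 == 48_828_124
assert n_basis_functions(11, 5) == 6 ** 11 - 1 == 362_797_055
```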
The above experiment was repeated for a total of B = 193 trials. Given a subset
of basis functions, the parameter vector in (18.10) was estimated using Ordinary
Least Squares (OLS) linear regression. Candidate basis sizes were restricted to a
maximum value of K = 5. V-fold cross-validation was conducted with V = 5 and the
default values of all non-specified computational parameters. Both the EA presented
in Section 18.4 and the DSA algorithm of [16] were used to estimate the parameter
of interest $\psi_a = E[Y_a \mid W]$ for each of the above random and non-random models $a \in \{1, 2, 3, 5\}$. However, the DSA used the polynomial degree constraint (18.13),
and the EA relied upon the variable degree constraint (18.14). On each trial, the EA
and DSA algorithms each performed separate random assignments of data to their
respective training sets T and validation sets V . Furthermore, because the DSA
algorithm includes a stopping criterion based upon relative improvement in risk,
the trials do not involve the same number of model fits. In general, the DSA was
allowed to run until its stopping criterion was triggered, and the EA was run for
the generation limits specified in Table 18.2. However, for the deterministic models,
the EA was allowed to halt its search if an estimate with zero risk (with a roundoff error tolerance of $10^{-15}$) was located. If all V searches of a parameter subspace
with basis size k located estimators that attained a validation set risk of zero, then
no parameter subspaces of larger basis size were searched. Likewise, the EA also
halted if the learning set search located an estimate with zero empirical risk. The
current software implementation of the EA also allows for an exhaustive search of
a parameter subspace if doing so is more computationally efficient than running the
EA for the specified number of generations. For a population of size Z, the proposed
EA fits a total of T regression estimates over G generations of evolution, where T is
given by:
$$T = Z + 2G\lfloor Z/4 \rfloor. \qquad (18.25)$$
The EA first fits regression estimates for each of the Z individuals in the initial population. At each generation, a total of $2\lfloor Z/4 \rfloor$ offspring are created. Because regression estimates are computationally costly, the value of T may be reduced if individuals within the population specify the same candidate solution. Similarly, when $\binom{I}{k} \le T$ or $\binom{I_0}{k} \le T$ for constraints (18.13) or (18.14), respectively, an exhaustive search of the size k parameter subspace is more computationally efficient
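The fit budget (18.25) and the exhaustive-search shortcut can be sketched as follows (illustrative Python; function names are assumptions, and the comparison against the binomial coefficient follows the criterion just described):

```python
from math import comb

def fit_budget(Z, G):
    """Total regression fits per (18.25): Z initial individuals plus
    2*floor(Z/4) offspring in each of G generations."""
    return Z + 2 * G * (Z // 4)

def use_exhaustive_search(n_basis, k, Z, G):
    """Enumerate the size-k subspace outright whenever it contains no
    more candidate subsets than the EA would fit anyway."""
    return comb(n_basis, k) <= fit_budget(Z, G)

# With the diabetes settings of Table 18.6 (Z = 20, G = 10,000), the
# budget is 20 + 2 * 10,000 * 5 = 100,020 fits.
```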
Table 18.2 Variable degree bound $D_0$ and number of basis functions $I_0$ for each simulation model

Model   D0   I0
1       2    177,146
1e      2    177,146
2       2    177,146
2e      2    177,146
3       4    48,828,124
3e      4    48,828,124
5       5    362,797,055
5e      5    362,797,055
$$\mathrm{sensitivity}(\Psi, \Psi_n) = \frac{|\Psi \cap \Psi_n|}{|\Psi|}; \qquad (18.26)$$

$$\mathrm{specificity}(\Psi, \Psi_n) = \frac{|\Psi \cap \Psi_n|}{|\Psi_n|}. \qquad (18.27)$$
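Reading (18.26)–(18.27) with $\Psi$ as the set of true basis functions and $\Psi_n$ as the selected set (an assumption consistent with the surrounding discussion), a direct set-based implementation is:

```python
def sensitivity(true_basis, selected_basis):
    """Fraction of the true basis functions recovered, per (18.26)."""
    true_basis, selected_basis = set(true_basis), set(selected_basis)
    return len(true_basis & selected_basis) / len(true_basis)

def specificity(true_basis, selected_basis):
    """Fraction of the selected basis functions that are proper, per (18.27)."""
    true_basis, selected_basis = set(true_basis), set(selected_basis)
    return len(true_basis & selected_basis) / len(selected_basis)

# Model 5 example from the text: all 4 true terms plus 1 improper term
# gives sensitivity 1.0 and specificity 0.8 (the Table 18.4 median).
```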
The cross-validated risk is the minimum component of the vector (18.6) of mean validation set risks. The empirical learning set risk is the risk of the selected estimator
on the learning set Xn . The empirical test set risk is the risk of the selected estimator on a test set of 1 million observations generated independently of the learning
set from the same distribution. Because we seek to minimize risk, smaller quantities
are preferred for the cross-validated, empirical learning set, and empirical test set
risks. When independent and identical trials are conducted for a given algorithm on
separate data sets, we can combine the results in terms of a performance metric.
We are primarily concerned with the distribution of each type of risk in general and
the median value in particular. The results of the simulation study are contained in
Tables 18.3–18.5 and Figures 18.4–18.9. These figures contain notched boxplots,
and evidence of a significant performance difference between the DSA and EA is
noted when the notches of the respective boxplots fail to overlap [5].
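The notch rule of [5] can be made concrete: each notch is roughly the median $\pm\, 1.57 \cdot \mathrm{IQR}/\sqrt{n}$, and non-overlapping notches are read as evidence of a significant difference in medians. A Python sketch:

```python
import numpy as np

def notch_interval(x):
    """McGill-style boxplot notch: median +/- 1.57 * IQR / sqrt(n),
    a rough 95% interval for the median (see Chambers et al. [5])."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    half = 1.57 * (q3 - q1) / np.sqrt(x.size)
    return med - half, med + half

def notches_overlap(x, y):
    """Two samples are read as significantly different, in the informal
    boxplot sense used here, when their notches fail to overlap."""
    lo_x, hi_x = notch_interval(x)
    lo_y, hi_y = notch_interval(y)
    return hi_x >= lo_y and hi_y >= lo_x
```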
The simulation's sensitivity results for the random and non-random models are
summarized in Table 18.3. For non-random models, the EA consistently produces
a sensitivity of 1 for estimating E[Ya |W ] on Models 1, 2, and 3. Although results
are variable for Model 5, the median sensitivity is 1. Meanwhile, the DSA produces
strong results for some models but not for others. In the random models, the results
are more varied. The EA successfully locates the proper basis functions for Model
1e and at least one true basis function for Models 3e and 5e but does not locate the
proper term for Model 2e. The DSA performs similarly to the EA on Models 1e, 2e,
and 5e, but it does not locate any true basis functions for Model 3e.
Table 18.3 Six number summaries for sensitivity measurements in the simulation study

Model          Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
Model 1 EA     1.00   1.00     1.00    1.00   1.00     1.00
Model 1 DSA    1.00   1.00     1.00    1.00   1.00     1.00
Model 1e EA    0.00   0.50     1.00    0.81   1.00     1.00
Model 1e DSA   0.00   1.00     1.00    0.88   1.00     1.00
Model 2 EA     1.00   1.00     1.00    1.00   1.00     1.00
Model 2 DSA    0.00   0.00     0.00    0.00   0.00     0.00
Model 2e EA    0.00   0.00     1.00    0.58   1.00     1.00
Model 2e DSA   0.00   0.00     0.00    0.00   0.00     0.00
Model 3 EA     1.00   1.00     1.00    1.00   1.00     1.00
Model 3 DSA    0.00   0.00     0.00    0.00   0.00     0.00
Model 3e EA    0.00   0.00     0.00    0.23   0.50     1.00
Model 3e DSA   0.00   0.00     0.00    0.00   0.00     0.00
Model 5 EA     0.50   1.00     1.00    0.89   1.00     1.00
Model 5 DSA    0.50   0.50     0.50    0.50   0.50     0.50
Model 5e EA    0.00   0.00     0.25    0.18   0.25     0.75
Model 5e DSA   0.00   0.00     0.25    0.23   0.25     0.50
The specificity results are displayed in Table 18.4. The EA consistently selects
only proper basis functions for Models 1, 2, and 3, with 1 improper term and 4
correct terms typically selected for Model 5. The DSA includes both proper and
improper terms for Models 1 and 5 but trails the EA in specificity on all non-random
models. However, for random models, the DSA appears to perform better than the
EA on Models 1e and 5e, equally on Model 2e, and worse on Model 3e.
Table 18.4 Six number summaries for specificity measurements in the simulation study

Model          Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
Model 1 EA     1.00   1.00     1.00    1.00   1.00     1.00
Model 1 DSA    0.40   0.40     0.40    0.44   0.40     1.00
Model 1e EA    0.00   0.25     0.40    0.36   0.40     1.00
Model 1e DSA   0.00   0.50     1.00    0.79   1.00     1.00
Model 2 EA     1.00   1.00     1.00    1.00   1.00     1.00
Model 2 DSA    0.00   0.00     0.00    0.00   0.00     0.00
Model 2e EA    0.00   0.00     0.20    0.15   0.20     1.00
Model 2e DSA   0.00   0.00     0.00    0.00   0.00     0.00
Model 3 EA     0.67   1.00     1.00    1.00   1.00     1.00
Model 3 DSA    0.00   0.00     0.00    0.00   0.00     0.00
Model 3e EA    0.00   0.00     0.00    0.11   0.20     1.00
Model 3e DSA   0.00   0.00     0.00    0.00   0.00     0.00
Model 5 EA     0.40   0.80     0.80    0.78   1.00     1.00
Model 5 DSA    0.40   0.40     0.40    0.40   0.40     0.50
Model 5e EA    0.00   0.00     0.20    0.16   0.20     0.75
Model 5e DSA   0.00   0.00     0.33    0.34   0.50     1.00
Table 18.5 Six number summaries for basis size error measurements in the simulation experiments

Model          Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
Model 1 EA     0.00    0.00     0.00    0.00    0.00     0.00
Model 1 DSA    0.00    3.00     3.00    2.77    3.00     3.00
Model 1e EA    0.00    2.00     3.00    2.60    3.00     3.00
Model 1e DSA   0.00    0.00     0.00    0.37    1.00     3.00
Model 2 EA     0.00    0.00     0.00    0.00    0.00     0.00
Model 2 DSA    1.00    1.00     1.00    1.27    1.00     4.00
Model 2e EA    0.00    3.00     4.00    3.47    4.00     4.00
Model 2e DSA   0.00    1.00     1.00    1.26    1.00     4.00
Model 3 EA     0.00    0.00     0.00    0.01    0.00     1.00
Model 3 DSA    3.00    3.00     3.00    3.00    3.00     3.00
Model 3e EA    0.00    2.00     3.00    2.47    3.00     3.00
Model 3e DSA   -1.00   0.00     1.00    1.33    2.00     3.00
Model 5 EA     0.00    0.00     1.00    0.61    1.00     1.00
Model 5 DSA    0.00    1.00     1.00    0.99    1.00     1.00
Model 5e EA    -1.00   0.00     1.00    0.69    1.00     1.00
Model 5e DSA   -2.00   -2.00    -1.00   -0.97   0.00     1.00
Table 18.5 shows the basis size error. The error is standardized across models
by subtracting the true basis size from the selected size on each trial, so an error
of zero is desirable. In terms of median performance, the EA consistently selects
the appropriate basis size on all non-random models but occasionally overestimates
on Model 5. The DSA consistently overestimates the basis size for all non-random
models. However, for random models, the EA appears to overestimate the true basis
size while the DSA either produces a smaller overestimate (Models 2e and 3e),
selects the appropriate size (Model 1e), or underestimates the basis size (Model 5e).
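The standardization behind Table 18.5 is simply the signed difference between selected and true basis size. For example, Model 3 has two true basis functions per (18.23), so the constant DSA error of 3.00 corresponds to selecting five terms:

```python
def basis_size_error(selected_size, true_size):
    """Standardized basis size error from Table 18.5: selected minus
    true, so 0 is ideal, positive values overestimate the basis size,
    and negative values underestimate it."""
    return selected_size - true_size

# Model 3 (two true terms): a fit that selects five terms scores +3.
```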
The performance difference between the EA and DSA becomes clear when we
compare the two procedures in terms of risk. Figure 18.4 (non-random models)
and Figure 18.5 (random models) summarize the cross-validated risk for the estimates produced in the simulation study. Both procedures consistently locate the
appropriate set of basis functions in the cross-validation stage of estimator selection on Model 1, but the EA produces a smaller median cross-validated risk for the
other seven models studied. Furthermore, the EA consistently locates an estimate
Fig. 18.4 Cross-validated risk of estimates produced by the EA and DSA algorithms for the non-random simulation models
Fig. 18.5 Cross-validated risk of estimates produced by the EA and DSA algorithms for the random simulation models
Fig. 18.6 Empirical learning set risk of estimates produced by the EA and DSA algorithms for the non-random simulation models
Fig. 18.7 Empirical learning set risk of estimates produced by the EA and DSA algorithms for the random simulation models
Fig. 18.8 Empirical test set risk of estimates produced by the EA and DSA algorithms for the non-random simulation models
Fig. 18.9 Empirical test set risk of estimates produced by the EA and DSA algorithms for the random simulation models
of Figure 18.9. For the random models, the true risk (the risk of the true regression function) is given by the variance of the residual vector $\varepsilon$, which is 1 in this case because the residuals were generated from standard Normal random variables. For the simulation study, any increase above 1 in the test set risk can be attributed to a bias introduced by the selection of improper basis functions by the EA or DSA. Across all simulations, it appears that the EA's estimates produce a test set risk that exhibits greater variance than that of the DSA.
The cross-validated risk, empirical learning set risk, and empirical test set risk
are all estimates of the true risk for a given estimate of the regression function. However, it is well known that the empirical learning set risk tends to underestimate
true risk [7]. In the figures mentioned above, the median empirical learning set risk
for the simulation results is smaller than the corresponding median cross-validated
risk or empirical test set risk in each of the random models studied for both the
EA and DSA. (In many of the non-random models, each median is zero.) In general, we prefer the empirical test set risk to the cross-validated risk in assessing an
estimate's quality; because the test set data are not used in the estimator selection
process, the resulting estimate cannot over-fit to the test set data. Although the median cross-validated and empirical test set risks were both close to the true risk of
1, the cross-validated risk exhibits significantly greater variability across trials than
the corresponding empirical test set risk on each model. Therefore, the empirical
test set risk appears to estimate true risk more reliably than the cross-validated risk
in the random simulation models.
overnight, which was considered the maximum acceptable search time for the study.
In total, this required 5.7 hours of computation.
Table 18.6 Tuning parameter values for the EA estimator selection algorithm applied to the
diabetes data set of Efron et al [10]
Basis Sizes      V   D0/D   S   Population Size, Z   CV Generations, G   Learning Set Generations, G
{0, 1, ..., 8}   5   3      3   20                   5,000               10,000
Figure 18.10 displays the cross-validated risk for each candidate basis size
considered by the EA. During the cross-validation phase, the estimator selection
algorithm selected a basis size of 8, the maximum considered. Figure 18.11 plots
empirical learning set risk as a function of generation in the learning set risk optimization within the size 8 parameter subspace. Because the cumulatively optimal
individual is retained at each generation, risk decreases monotonically as a function
of generation. Somewhat after the 8,000th generation, the EA located an estimate
that was not improved upon in the subsequent generations. The estimator selection
procedure results in the OLS coefficient estimates contained in Table 18.7. Ordinarily, these coefficients are accompanied by estimated standard errors, t-statistics, and
p-values for testing the null hypothesis of a zero coefficient. However, such inferences can only be drawn through a model of the underlying distribution of the estimator, which is currently an open problem for estimator selection procedures such
as those considered in this paper. Similarly, Table 18.8 shows the regression coefficient estimates obtained by the DSA. The basis function including the S5 serum
measurement was selected by both the EA and DSA, but otherwise the selected basis
functions differed in terms of degree, order of interaction, and coefficient estimates.
Most of the basis functions selected by the EA contain higher powers, a maximal
order of interaction, and generally large coefficient estimates. In contrast, the DSA
produced an estimate with no higher powers assigned to any variable, relatively few
variable interactions, and smaller coefficient estimates that produce a simpler interpretation for the effect of each variable. It is possible that the EA would also produce
a more meaningful estimate if the polynomial degree constraint (18.13) were used
in place of the variable degree constraint (18.14), which would limit the parameter
space to a subspace of that considered here. However, at the time of this analysis,
the software implementation of the EA for the polynomial degree constraint (18.13)
was not yet available.
Fig. 18.11 Empirical learning set risk as a function of generation in risk optimization on
estimates of size 8. The circled region contains the generation at which the final estimate was
located by the EA
Table 18.7 EA regression coefficient estimates for disease progression in the diabetes study
Int.   S5   Age:S6   Sex²:BMI
Table 18.8 DSA regression coefficient estimates of disease progression in the diabetes study
Int. BMI
S5
S3
BP
SEX BMI : BP AGE : SEX
5.84 525.98 549.76 315.14 295.34 255.71 3910.98 3913.89
test set risks were calculated on a total of B = 100 bootstrap samples produced from
the test set data. Although the learning and test sets were identical to those used
by Durbin et al [9], the specific bootstrap test set samples previously used were not
available. However, the bootstrap test sets generated in this analysis are i.i.d. observations produced from the sampling technique of Durbin et al [9]. The results are
displayed in Table 18.9. In terms of mean test set risk, both the EA and the DSA
improved upon the performance of all estimators considered by Durbin et al [9]. In
particular, the EA's estimate resulted in a mean test set risk that improved upon all previous results by approximately 7.9%. Moreover, the DSA's estimate improved
upon that of the EA by approximately 10.2%. Figure 18.12 displays a notched boxplot of bootstrap test set risk for each estimator selection algorithm. These results
may be directly compared to those contained in [9], which are reproduced in Figure
18.13 with the permission of the authors. Because its notches do not overlap with
those of any other estimator, it appears that the DSA significantly outperforms all
other estimators considered for this particular problem. The EA's and DSA's 95%
confidence intervals for test set risk appear to be wider than those of the other estimators studied. It is possible that the proposed EA produces a greater variability
in its estimates on account of its stochastic mechanisms in the reproduction and
mutation stages.
Table 18.9 Empirical test set risk of several estimator selection procedures on the diabetes
data of Efron et al [10] based upon B = 100 bootstrap samples of the diabetes data test set
of 50 observations. The EA and DSA results are compared in terms of risk to a number of estimators tested in Durbin et al [9] on the diabetes data. The table shows the mean bootstrap test set risk and 95% risk confidence interval for each estimator based on 100 bootstrap samples from the test set. Confidence intervals were produced from normal theory according to estimates of the mean and standard deviation for each estimator's risk. The third column compares each procedure's mean risk ratio to that of the EA, and the final column shows which covariates were included in each algorithm's selected estimator. It appears that all results obtained by Durbin et al [9] were at least 7.9% larger in test set risk than that obtained
from the EA, and the DSA subsequently improved on the EA by approximately 10.2%
Estimator    Ratio   Covariates
lm           1.079   (all)
LARS (CV)    1.109   Sex, BMI, BP, S1–S3, S5, S6
polymars     1.119   Sex, BMI, BP, S3, S5, S6
LARS (Cp)    1.131   Sex, BMI, BP, S2, S3, S5, S6
full nnet    1.204   (all)
nnet-DSA     1.208   (all)
rpart        1.251   BMI, BP, S2, S3, S5, S6
DSA          0.898   Sex, BMI, BP, S3, S5
EA           1.000   Age, Sex, BMI, BP, S1, S3–S6
Fig. 18.12 Boxplots of bootstrap test set risk of the EA and DSA estimates obtained from the diabetes data based upon B = 100 bootstrap samples of the test set. These results may be directly compared to those obtained by Durbin et al [9] in Figure 18.13
Fig. 18.13 Empirical test set risk of several estimator selection procedures on the diabetes
data based upon B = 100 bootstrap samples of the diabetes data test set of 50 observations.
This figure was originally produced by Durbin et al [9] and is reproduced here with the
permission of the authors. These results may be directly compared to those of the EA and
DSA, which are displayed in Figure 18.12
18.7 Conclusion
In light of the size of parameter spaces for the constraint profiles characterized in Section 18.3, estimator selection procedures operating according to the general road map for loss-based estimation must be able to search quickly and effectively for candidate estimators minimizing empirical risk within parameter subspaces. EAs and
similar stochastic optimization algorithms provide an aggressive approach to risk
optimization and are sufficiently flexible to offer high-quality estimates in a wide
variety of settings. The results of the simulation study and diabetes analysis establish the proposed EA as a competitive alternative to other procedures. Because the
No Free Lunch Theorem [20] shows that no single algorithm can always outperform
all others, the proposed EA may be used as a complement to the DSA as a general
tool for estimator selection in regression settings. The EA is an attractive alternative because its computational parameters can be adapted to the problem at hand,
and its modular design allows for variations of its evolutionary mechanisms without
requiring significant changes in the overall software implementation. Furthermore,
the EA converges asymptotically in generation to the global optimum within the
size k parameter subspace to be searched. It should be noted that asymptotic convergence does not ensure that a global optimum will be reached in the allotted time.
However, the EA performed competitively in the simulations and data analysis, and
its asymptotic convergence property and elitist selection mechanism indicate that
further computations would only improve the quality of its results.
While the DSA search algorithm shifts between parameter subspaces of different basis size, the EA independently searches each subspace. This separation allows
for parallel computing techniques to simultaneously search different parameter subspaces on additional processors and also allows the user to tune computational parameters like the population size, mutation probability, and number of generations
according to the size of the subspace. Although the EA described is designed to
search a parameter space consisting of polynomial regression functions, the proposed methodology applies to general parameterizations (e.g. histogram regression
and neural networks), which is an appealing feature of both the EA and the DSA.
The results obtained in this study come with a few caveats: first, in general the EA required significantly more time to produce its estimates than the DSA. This time difference may be attributed to the DSA's implementation in the C programming language, which is significantly faster than R. Although this project illustrates the EA's utility, it also demonstrates the need to improve the algorithm's speed in subsequent software packages. Future implementations of the EA may also apply parallel computing techniques to simultaneously search distinct subspaces or training sets in the cross-validation phase. However, because statistical estimation typically occurs at the end of a lengthy study, these computations are not especially time-sensitive, and in many cases it is reasonable to allow several hours or days for this task.
Because the DSA was treated as a black box in the simulations, a comparison
to the EA in terms of the number of model fits required to obtain an estimate of a
given quality is currently unavailable. However, the simulation results suggest that
the DSA is vulnerable to local optima. Unlike EAs, the DSA's risk-optimizing search
482
procedure is deterministic for a given split of the data into training and validation
sets. Future versions of the DSA may consider introducing a stochastic component
akin to the EA's mutation mechanism to work in concert with its existing elitist
selection procedure. If the proposed augmentation ensures that all estimators within
a parameter subspace form a single communicating class, then this modified DSA
would asymptotically converge in time to the global optimum.
Additionally, estimator selection software packages may provide the researcher
with the opportunity to include particular basis functions in all candidate estimates
so that known causal relationships remain fixed while searching for additional factors that contribute to a quantity of interest. When the researcher wishes to compare
results from a large number of distinct algorithms, an arbitrary number of alternative search procedures may be generated by varying the EA's tuning parameters such
as the mutation probability. For a particular problem, an additional cross-validation
procedure may be used to select among candidate mutation probabilities or other
tuning parameters. Finally, the variability of the EA's results may be investigated as
a function of generation to guide the choice of these computational parameters.
18.8 Appendix
Section 18.3.3 analyzed the size of the parameter space for polynomial regression
under the interaction order constraint (18.12) and the polynomial degree constraint
(18.13) or the variable degree constraint (18.14). We wish to substantiate the conclusions summarized in Table 18.1.
Under constraint (18.13), the number of basis functions is given by the value of
I (18.15), which can be bounded below as follows:
$$I = \sum_{s=1}^{S} \binom{J}{s}\left[1 + \sum_{d=s+1}^{D} \sum_{k=1}^{\min(s,\,d-s)} \binom{s}{k}\binom{d-s-1}{k-1}\right] \;\ge\; \sum_{s=1}^{S} \binom{J}{s} \;\ge\; \sum_{s=1}^{S} \binom{S}{s} \;=\; 2^S - 1 \;\ge\; 2^{S-1}. \qquad (18.28)$$
The first equality restates (18.15), and the first inequality follows because all terms in the nested summations are positive. Under constraint (18.13), then $S \le \min(J, D) \le J$, and $\binom{S}{s} \le \binom{J}{s}$ for all $s \in \{1, \ldots, S\}$, so the second inequality holds. The final equality is a direct consequence of the Binomial Theorem. Therefore, I is bounded below by a function of order $\Omega(2^S)$. For the extreme case of $S = \min(J, D)$, then $I = \Omega(2^S) = \Omega(2^{\min(J,D)})$, which is an exponential function of the number of variables J and the polynomial degree bound D. We then turn our attention to the case of $I_0$ under the variable degree constraint (18.14). We can bound $I_0$ from below as follows:
$$I_0 = \sum_{s=1}^{S} \binom{J}{s} D_0^s \;\ge\; \sum_{s=1}^{S} \binom{S}{s} D_0^s \;=\; (D_0 + 1)^S - 1 \;>\; D_0^S = \Omega(D_0^S). \qquad (18.29)$$
The first equality restates (18.16), and the first inequality follows because $J \ge S$ when constraint (18.14) is imposed. The next equality follows from the Binomial Theorem, and the remaining polynomial is of degree S. In the extreme case of S = J, the number of basis functions is then $\Omega(D_0^J)$. Because the summation in (18.29) is solved in a closed form and results in a polynomial when S = J, this asymptotic lower bound is also an asymptotic upper bound, and both are tight [6]. Therefore the number of basis functions is both $\Omega(D_0^J)$ and $O(D_0^J)$ when S = J. Because S is maximized, this upper bound is an overall upper bound on the number of basis functions under the variable degree constraint (18.14). Furthermore, because the set of basis functions under the polynomial degree constraint (18.13) is a subset of those under the variable degree constraint (18.14) when $D = D_0$, the number of basis functions I is trivially bounded above by a function of order $O(D^J)$. Likewise, the value $I_0$ for constraint (18.14) is loosely bounded above by a function of order $O(D_0^J)$ that becomes tight if S = J.
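The counting identity in (18.28) and the bound chain in (18.29) can be checked numerically for small cases against a brute-force enumeration of degree vectors. A Python sketch, with illustrative function names, assuming the basis counts I and $I_0$ as defined in (18.15)–(18.16):

```python
from math import comb
from itertools import product

def I_formula(J, D, S):
    """Basis count under the polynomial degree constraint (18.13):
    choose s <= S interacting variables, then distribute total degree
    d <= D among them, every chosen variable receiving degree >= 1."""
    total = 0
    for s in range(1, S + 1):
        inner = 1  # d = s: every chosen variable has degree exactly 1
        for d in range(s + 1, D + 1):
            inner += sum(comb(s, k) * comb(d - s - 1, k - 1)
                         for k in range(1, min(s, d - s) + 1))
        total += comb(J, s) * inner
    return total

def I_bruteforce(J, D, S):
    """Directly enumerate degree vectors with total degree <= D and
    between 1 and S nonzero entries."""
    return sum(1 for g in product(range(D + 1), repeat=J)
               if 1 <= sum(v > 0 for v in g) <= S and sum(g) <= D)

def I0_formula(J, D0, S):
    """Basis count under the variable degree constraint (18.14), per (18.16)."""
    return sum(comb(J, s) * D0 ** s for s in range(1, S + 1))

for J, D, S in [(3, 3, 2), (4, 4, 3), (4, 2, 2)]:
    assert I_formula(J, D, S) == I_bruteforce(J, D, S)
    assert I_formula(J, D, S) >= 2 ** S - 1                  # (18.28) lower bound

for J in range(2, 7):
    for D0 in range(1, 5):
        for S in range(2, J + 1):
            assert I0_formula(J, D0, S) >= (D0 + 1) ** S - 1  # since J >= S
            assert (D0 + 1) ** S - 1 > D0 ** S                # so I0 = Omega(D0**S)
            assert I0_formula(J, D0, J) == (D0 + 1) ** J - 1  # closed form at S = J
```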
The size of the parameter space is $2^I$ or $2^{I_0}$ in the constraint profiles of Section 18.3. By applying the previous bounds for $I$ and $I_0$ to the parameter space analysis, we arrive at the conclusions summarized in Table 18.1. It should be noted that the order functions $\Omega$ and $O$ imply that the bounds can be stated as a constant times the given function. In expressing the size of the parameter space in terms of the number of basis functions under different constraint profiles, the constant for the order of the size of the parameter space differs from that for the order of the number of basis functions.
Acknowledgments
The authors wish to thank Ron Peled, Cathy Tuglus, Burke Bundy, Mark van der Laan, and
the anonymous reviewers for their helpful suggestions. Blythe Durbin provided information
about the design of her previous diabetes analysis to facilitate a fair comparison of the proposed method to other predictors. Richard Liang gratefully acknowledges the support of the
Natural Sciences and Engineering Research Council (NSERC) of Canada.
References
1. Bäck, T.: Evolutionary Algorithms in Theory and Practice. Oxford University Press, Oxford (1996)
2. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
3. Breiman, L., Friedman, J.H., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. Chapman and Hall, Boca Raton (1984)
4. Candès, E., Tao, T.: The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35(6), 2313–2351 (2007)
5. Chambers, J.M., Cleveland, W.S., Tukey, P.A.: Graphical Methods for Data Analysis. Duxbury Press (1983)
6. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge (1990)
7. Dudoit, S., van der Laan, M.J.: Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Statistical Methodology 2(2), 131–154 (2005)
8. Dudoit, S., van der Laan, M.J., Keles, S., Molinaro, A.M., Sinisi, S.E., Teng, S.L.: Loss-based estimation with cross-validation: Applications to microarray data analysis. In: Piatetsky-Shapiro, G., Tamayo, P. (eds.) Microarray Data Mining. SIGKDD Explorations, vol. 5, pp. 56–68. ACM, New York (2003), http://www.acm.org/sigs/sigkdd/explorations/issue5-2.htm
9. Durbin, B., Dudoit, S., van der Laan, M.J.: Optimization of the architecture of neural networks using a Deletion/Substitution/Addition algorithm. Tech. Rep. 170, Division of Biostatistics, University of California, Berkeley (2005), www.bepress.com/ucbbiostat/paper170
10. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Annals of Statistics 32(4), 407–499 (2004)
11. Fogel, D.B.: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Los Alamitos (2005)
12. Freedman, D.A.: Statistical Models: Theory and Practice, 2nd edn. Cambridge University Press, Cambridge (2009)
13. Friedman, J.H.: Multivariate adaptive regression splines. The Annals of Statistics 19(1), 1–141 (1991)
14. Friedman, J.H.: Fast sparse regression and classification. Tech. rep., Department of Statistics, Stanford University (2008), http://www-stat.stanford.edu/~jfh/
15. van der Laan, M.J., Dudoit, S.: Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive ε-net estimator: Finite sample oracle inequalities and examples. Tech. Rep. 130, Division of Biostatistics, University of California, Berkeley (2003), www.bepress.com/ucbbiostat/paper130
16. Sinisi, S.E., van der Laan, M.J.: Deletion/substitution/addition algorithm in learning with applications in genomics. Statistical Applications in Genetics and Molecular Biology 3(1), Article 18 (2004), www.bepress.com/sagmb/vol3/iss1/art18
17. Specht, D.F.: A general regression neural network. IEEE Transactions on Neural Networks 2(6), 568–576 (1991)
18. Stoll, M.: Introduction to Real Analysis. Addison-Wesley, Reading (2000)
19. Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society 58(1), 267–288 (1996)
20. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1(1), 67–82 (1997)
Part III
Real-World Applications
Chapter 19
Abstract. Multiple-input multiple-output (MIMO) technologies are capable of substantially improving the achievable system capacity, coverage and/or quality of service. The system's ability to approach the MIMO capacity depends heavily on the designs of the MIMO receiver and/or transmitter, which are generally expensive optimisation tasks. Hence, researchers and engineers have endeavoured to develop efficient optimisation techniques that can solve practical MIMO designs with affordable costs. In this contribution, we demonstrate that particle swarm optimisation (PSO) offers an efficient means for aiding MIMO transceiver designs. Specifically, we consider PSO-aided semi-blind joint maximum likelihood channel estimation and data detection for the MIMO receiver, and we investigate PSO-based minimum bit-error-rate multiuser transmission for MIMO systems. In both of these MIMO applications, the PSO-aided approach attains an optimal design solution with a significantly lower complexity than the existing state-of-the-art scheme.
19.1 Introduction
Multiple-input multiple-output (MIMO) technologies are widely adopted in practice to improve the system's achievable capacity, coverage and/or quality of service [14, 15, 30, 32, 33, 41, 42, 43, 45]. The designs of the MIMO receiver and/or transmitter critically influence the system's ability to approach the MIMO capacity. MIMO transceiver designs, which are typically expensive optimisation tasks, have motivated researchers and engineers to develop efficient optimisation techniques that can attain optimal MIMO designs at affordable costs. Particle swarm optimisation (PSO), as an advanced optimisation tool, offers an efficient means of aiding MIMO transceiver designs. PSO [25] is a population-based stochastic optimisation technique inspired by the social behaviour of bird flocking and fish schooling. The algorithm commences with the random initialisation of a swarm of individuals, referred to as particles, within the problem's search space. It then endeavours to find
S. Chen W. Yao H.R. Palally L. Hanzo
School of Electronics and Computer Science,
University of Southampton, Southampton SO17 1BJ, UK
e-mail: {sqc,wy07r,hrp1v07,lh}@ecs.soton.ac.uk
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 487–511.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
a global optimal solution by gradually adjusting the trajectory of each particle toward its own best location and toward the best position of the entire swarm at each evolutionary optimisation step. The PSO method is popular owing to its simplicity of implementation, its ability to converge rapidly to a reasonably good solution, and its robustness against local minima. The PSO method has been successfully applied to wide-ranging optimisation problems [10, 12, 13, 16, 18, 26, 27, 35, 37, 38]. In particular, many research works have applied PSO techniques to multiuser detection (MUD) [11, 17, 28, 29, 36]. In this contribution we consider PSO-aided MIMO transceiver designs. Specifically, we develop the PSO-aided semi-blind joint maximum likelihood (ML) channel estimation and data detection for MIMO receivers, and we investigate the PSO-based minimum bit error rate (MBER) multiuser transmission (MUT) for MIMO systems.
In a MIMO receiver, if the channel state information (CSI) is available, optimal ML data detection can be performed using, for example, the optimised hierarchy reduced search algorithm (OHRSA) aided detector [2], which is an advanced extension of the complex sphere decoder [34]. Accurately estimating a MIMO channel, however, is a challenging task, and a high proportion of training symbols is required to obtain a reliable least square channel estimate (LSCE), which considerably reduces the achievable system throughput. Although blind joint ML channel estimation and data detection does not reduce the achievable system throughput, it suffers from the drawbacks of excessively high computational complexity and inherent estimation and decision ambiguities [40]. An interesting scheme for semi-blind joint ML channel estimation and data detection has been proposed in [1], in which the joint ML channel estimation and data detection optimisation is decomposed into two levels. At the upper level a population-based optimisation algorithm known as the repeated weighted boosting search (RWBS) algorithm [7] searches for an optimal channel estimate, while at the lower level the OHRSA detector [2] recovers the transmitted data. Joint ML channel estimation and data detection is achieved by iteratively exchanging information between the RWBS-aided channel estimator and the OHRSA data detector. The scheme is semi-blind as it employs a few training symbols, approximately equal in number to the rank of the MIMO system, to provide an initial LSCE for aiding the convergence of the RWBS channel estimator. The employment of a minimum training overhead has the additional benefit of avoiding the ambiguities inherent in pure blind joint channel estimation and data detection. This study advocates the PSO-aided alternative for semi-blind joint ML channel estimation and data detection. We will demonstrate that this PSO-aided scheme compares favourably with the existing state-of-the-art RWBS-based method, in terms of both performance and complexity.
In the downlink of a space-division multiple-access (SDMA) induced MIMO system, mobile terminal (MT) receivers are incapable of cooperatively performing sophisticated MUD. In order to facilitate the employment of low-complexity, high-power-efficiency single-user receivers, the transmitted signals have to be preprocessed at the base station (BS), leading to the appealing concept of multiuser transmission (MUT) [50], provided that accurate downlink CSI is available at the transmitter. The assumption that the downlink channel impulse response (CIR) is
known at the BS may be deemed valid in time division duplex (TDD) systems, where the uplink and downlink signals are transmitted at the same frequency, provided that the co-channel interference is also similar at the BS and the MTs. MUT-aided transmit preprocessing may hence be deemed attractive when the channel's coherence time is longer than the transmission burst interval. However, for frequency division duplex (FDD) systems, where the uplink and downlink channels are expected to be different, CIR feedback from the MTs' receivers to the BS transmitter is necessary [51]. Most MUT techniques are designed based on the minimum mean-square-error (MMSE) criterion [44, 51]. Since the achievable bit error rate (BER) is the ultimate system performance indicator, interest in minimum BER (MBER) based MUT techniques has increased recently [21, 39]. The optimal MBER-MUT design is a constrained nonlinear optimisation [21, 39], and the sequential quadratic programming (SQP) algorithm [31] is typically used to obtain the precoder's coefficients for the MBER-MUT [21, 23, 39]. In practice, the computational complexity of the SQP-based MBER-MUT solution can be excessive for high-rate systems [23] and is therefore difficult to implement practically. In this contribution, the PSO algorithm is invoked to find the precoder's coefficients for the MBER-MUT, in order to reduce the computational complexity to a practically acceptable level. Our results obtained in [52] have demonstrated that the PSO-aided MBER-MUT design imposes a much lower computational complexity than the existing SQP-based MBER-MUT design.
The rest of this contribution is structured as follows. In Section 19.2, the PSO
algorithm is presented. Section 19.3 is devoted to the development of the PSO-aided
semi-blind joint ML scheme, while Section 19.4 derives the PSO-assisted optimal
MBER-MUT scheme. Our conclusions are then offered in Section 19.5.
Throughout our discussions we adopt the following notational conventions. Boldface capitals and lower-case letters stand for complex-valued matrices and vectors of appropriate dimensions, respectively, while I_K and 1_{K×L} denote the K×K identity matrix and the K×L matrix of unity elements, respectively. The (p, q)th element h_{p,q} of H is also denoted by H|_{p,q}. Furthermore, (·)^T and (·)^H represent the transpose and Hermitian operators, respectively, while ‖·‖ and |·| denote the norm and the magnitude operators, respectively. E[·] denotes the expectation operator, while ℜ[·] and ℑ[·] represent the real and imaginary parts, respectively. Finally, j = √−1.
Consider the generic optimisation task

U^opt = arg min_U F(U)    (19.1)
s.t.  U ∈ 𝕌^{N×M},    (19.2)

where F(·) is the cost function of the optimisation problem, U is an N×M complex-valued parameter matrix to be optimised, and
490
S. Chen et al.
𝕌 = [−U_max, U_max] + j [−U_max, U_max]    (19.3)

defines the search range for each element of U. The flowchart of the PSO algorithm is given in Fig. 19.1. A swarm of particles, {U_i^{(l)}}_{i=1}^S, representing potential solutions, is evolved in the search space 𝕌^{N×M}, where S is the swarm size and the index l denotes the iteration step. The details of the algorithm are now explained.
Fig. 19.1 Flowchart of the PSO algorithm: initialise the particles {U_i^{(0)}}_{i=1}^S (l = 0); update the velocities V_i^{(l)}, modifying any velocity that approaches zero or falls out of limits; update the positions U_i^{(l)}, modifying any position that falls out of bounds; evaluate the costs {F(U_i^{(l)})}_{i=1}^S and update {Pb_i^{(l)}}_{i=1}^S and Gb^{(l)}; on termination, output the solution Gb
a) The swarm initialisation. Set l = 0 and generate the initial particles, {U_i^{(0)}}_{i=1}^S, in the search space 𝕌^{N×M} in a prescribed way. Typically, the initial particles are randomly generated.
b) The swarm evaluation. For each particle U_i^{(l)}, compute its associated cost F(U_i^{(l)}). Each particle U_i^{(l)} remembers its best position visited so far, denoted as Pb_i^{(l)}, which provides the cognitive information. Every particle also knows the best position visited so far among the entire swarm, denoted as Gb^{(l)}, which provides the social information. The cognitive information {Pb_i^{(l)}}_{i=1}^S and the social information Gb^{(l)} are updated at each iteration:
For (i = 1; i ≤ S; i++)
    If (F(U_i^{(l)}) < F(Pb_i^{(l)}))  Pb_i^{(l)} = U_i^{(l)};
End for;
i* = arg min_{1≤i≤S} F(Pb_i^{(l)});
If (F(Pb_{i*}^{(l)}) < F(Gb^{(l)}))  Gb^{(l)} = Pb_{i*}^{(l)};
c) The swarm update. Each particle updates its velocity and position according to

V_i^{(l+1)} = ω V_i^{(l)} + c1 φ1 (Pb_i^{(l)} − U_i^{(l)}) + c2 φ2 (Gb^{(l)} − U_i^{(l)})    (19.4)
U_i^{(l+1)} = U_i^{(l)} + V_i^{(l+1)}    (19.5)

where ω is the inertia weight, c1 and c2 are the two empirically chosen acceleration coefficients, while φ1 = rand() and φ2 = rand() denote two random variables uniformly distributed in (0, 1).
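As a concrete illustration, the update (19.4)-(19.5) can be sketched in a few lines of NumPy; the function and variable names here are our own illustrative choices, not code from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_update(U, V, Pb, Gb, omega, c1, c2):
    """One swarm-update step per (19.4)-(19.5) for a complex-valued particle U."""
    phi1, phi2 = rng.random(), rng.random()  # uniform random variables in (0, 1)
    V_new = omega * V + c1 * phi1 * (Pb - U) + c2 * phi2 * (Gb - U)  # (19.4)
    U_new = U + V_new                                                # (19.5)
    return U_new, V_new
```

Note that with ω = 0 a particle whose position coincides with both Pb and Gb acquires zero velocity and stays put, which is one motivation for the zero-velocity reinitialisation discussed below.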
In order to avoid excessive roaming of particles beyond the search space [18], a velocity space 𝕍^{N×M} with

𝕍 = [−V_max, V_max] + j [−V_max, V_max]    (19.6)

is imposed. Each element of V_i^{(l+1)} is checked against these limits, and any element whose velocity reaches zero is reinitialised:
If (ℜ[V_i^{(l+1)}|_{p,q}] > V_max)   ℜ[V_i^{(l+1)}|_{p,q}] = V_max;
If (ℜ[V_i^{(l+1)}|_{p,q}] < −V_max)  ℜ[V_i^{(l+1)}|_{p,q}] = −V_max;
If (ℑ[V_i^{(l+1)}|_{p,q}] > V_max)   ℑ[V_i^{(l+1)}|_{p,q}] = V_max;
If (ℑ[V_i^{(l+1)}|_{p,q}] < −V_max)  ℑ[V_i^{(l+1)}|_{p,q}] = −V_max;
If (ℜ[V_i^{(l+1)}|_{p,q}] == 0)
    If (rand() < 0.5)
        ℜ[V_i^{(l+1)}|_{p,q}] = v·V_max;
    Else
        ℜ[V_i^{(l+1)}|_{p,q}] = −v·V_max;
    End if;
Else if (ℑ[V_i^{(l+1)}|_{p,q}] == 0)
    If (rand() < 0.5)
        ℑ[V_i^{(l+1)}|_{p,q}] = v·V_max;
    Else
        ℑ[V_i^{(l+1)}|_{p,q}] = −v·V_max;
    End if;
End if;

where v = rand() is another uniform random variable in (0, 1).
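A NumPy sketch of this velocity check (our own illustrative code, with the assumed helper name `limit_velocity`): out-of-range real and imaginary parts are clamped to ±V_max and, slightly simplifying the element-wise else-if rule above, this sketch reinitialises every exactly-zero part.

```python
import numpy as np

rng = np.random.default_rng(1)

def limit_velocity(V, V_max):
    """Clamp Re/Im parts of V to [-V_max, V_max]; reinitialise zero parts to +/- v*V_max."""
    Vr = np.clip(V.real, -V_max, V_max)
    Vi = np.clip(V.imag, -V_max, V_max)
    v = rng.random()                                      # v = rand() in (0, 1)
    sr = np.where(rng.random(Vr.shape) < 0.5, 1.0, -1.0)  # random sign per component
    si = np.where(rng.random(Vi.shape) < 0.5, 1.0, -1.0)
    Vr = np.where(Vr == 0.0, sr * v * V_max, Vr)
    Vi = np.where(Vi == 0.0, si * v * V_max, Vi)
    return Vr + 1j * Vi
```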
Similarly, each U_i^{(l+1)} is checked to ensure that it stays inside the search space 𝕌^{N×M}. This can be done, for example, with the rule:
If (ℜ[U_i^{(l+1)}|_{p,q}] > U_max)   ℜ[U_i^{(l+1)}|_{p,q}] = U_max;
If (ℜ[U_i^{(l+1)}|_{p,q}] < −U_max)  ℜ[U_i^{(l+1)}|_{p,q}] = −U_max;
If (ℑ[U_i^{(l+1)}|_{p,q}] > U_max)   ℑ[U_i^{(l+1)}|_{p,q}] = U_max;
If (ℑ[U_i^{(l+1)}|_{p,q}] < −U_max)  ℑ[U_i^{(l+1)}|_{p,q}] = −U_max;
An alternative rule is, if a particle is outside the search space, it is moved back inside
the search space randomly, rather than forcing it to stay at the border as the previous
rule does. That is,
If (ℜ[U_i^{(l+1)}|_{p,q}] > U_max)   ℜ[U_i^{(l+1)}|_{p,q}] = rand()·U_max;
If (ℜ[U_i^{(l+1)}|_{p,q}] < −U_max)  ℜ[U_i^{(l+1)}|_{p,q}] = −rand()·U_max;
If (ℑ[U_i^{(l+1)}|_{p,q}] > U_max)   ℑ[U_i^{(l+1)}|_{p,q}] = rand()·U_max;
If (ℑ[U_i^{(l+1)}|_{p,q}] < −U_max)  ℑ[U_i^{(l+1)}|_{p,q}] = −rand()·U_max;
    (19.7)
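The two position-handling rules can be sketched as follows (illustrative NumPy code with our own function names; the sign convention on the lower border is our reading of the rule):

```python
import numpy as np

rng = np.random.default_rng(2)

def clamp_position(U, U_max):
    """First rule: force out-of-range real/imag parts to stay at the border."""
    return np.clip(U.real, -U_max, U_max) + 1j * np.clip(U.imag, -U_max, U_max)

def random_reposition(U, U_max):
    """Alternative rule (19.7): move out-of-range parts back inside at a random point."""
    Ur, Ui = U.real.copy(), U.imag.copy()
    hi = Ur > U_max;  Ur[hi] = rng.random(hi.sum()) * U_max    # random point in (0, U_max)
    lo = Ur < -U_max; Ur[lo] = -rng.random(lo.sum()) * U_max   # random point in (-U_max, 0)
    hi = Ui > U_max;  Ui[hi] = rng.random(hi.sum()) * U_max
    lo = Ui < -U_max; Ui[lo] = -rng.random(lo.sum()) * U_max
    return Ur + 1j * Ui
```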
The maximum number of iterations, I_max, is generally determined by experiment. In our experiments we choose the optimal swarm size S to minimise the total complexity C of (19.7).
It was reported in [35] that a time-varying acceleration coefficient (TVAC) mechanism enhances the performance of PSO. In this TVAC mechanism [35], c1 for the cognitive component is reduced from 2.5 to 0.5, while c2 for the social component is increased from 0.5 to 2.5, during the iterative procedure according to

c1 = (0.5 − 2.5)·l/I_max + 2.5
c2 = (2.5 − 0.5)·l/I_max + 0.5    (19.8)

The rationale for this TVAC mechanism is that, at the initial stages, a large cognitive component and a small social component help particles to explore the search space better and to avoid local minima, while in the later stages a small cognitive component and a large social component help particles to converge quickly to a global minimum.
We also experimented with an alternative TVAC mechanism, in which c1 varies from 0.5 to 2.5 and c2 changes from 2.5 to 0.5 during the iterative procedure according to

c1 = (2.5 − 0.5)·l/I_max + 0.5
c2 = (0.5 − 2.5)·l/I_max + 2.5    (19.9)
Which TVAC mechanism to choose is decided by empirical performance in our applications.
Several choices of the inertia weight ω can be considered, including the zero inertia weight ω = 0, a constant inertia weight, or a random inertia weight ω = rand(). In our applications, empirical experience suggests that ω = 0 is appropriate. An appropriate value of the control factor used in reinitialising zero velocities, found empirically for our applications, is 0.1.
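The two TVAC schedules (19.8) and (19.9) are simply opposite linear ramps, which can be captured by one helper (illustrative Python; `tvac` is our own name):

```python
def tvac(l, I_max, start, end):
    """Linearly ramp an acceleration coefficient from `start` (l = 0) to `end` (l = I_max)."""
    return (end - start) * l / I_max + start

# Mechanism (19.8): c1 = tvac(l, I_max, 2.5, 0.5), c2 = tvac(l, I_max, 0.5, 2.5)
# Mechanism (19.9): c1 = tvac(l, I_max, 0.5, 2.5), c2 = tvac(l, I_max, 2.5, 0.5)
```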
(19.10)
p(Y | X, H) = 1/(2πσ_n²)^{n_R L} · exp( −(1/(2σ_n²)) Σ_{k=1}^{L} ‖y(k) − H x(k)‖² )    (19.12)
(X̂, Ĥ) = arg min_H min_X J_ML(X, H).    (19.15)
At the inner-level optimisation we can use the optimised hierarchy reduced search
algorithm (OHRSA) based ML detector [2] to find the ML data estimate for the
given channel. The detailed implementation of the OHRSA-aided ML detector can
be found in [2] and will not be repeated here. In order to guarantee a joint ML estimate, the search algorithm used at the outer or upper-level optimisation should be
capable of finding a global optimal channel estimate efficiently. A joint ML solution
is achieved with the following iterative loop.
Outer-level Optimisation: A search algorithm searches the MIMO channel parameter space to find a global optimal estimate Ĥ by minimising the mean square error (MSE)

J_MSE(Ĥ) = J_ML(X̂(Ĥ), Ĥ),    (19.16)

where X̂(Ĥ) denotes the ML estimate of the transmitted data for the given channel Ĥ.
Inner-level Optimisation: Given Ĥ, the OHRSA detector finds the ML estimate of the transmitted data and feeds back the ML metric J_MSE(Ĥ) to the upper level.
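The two-level structure can be made concrete with a small sketch (illustrative Python; `ml_detect` is a stand-in for the OHRSA detector, not the chapter's implementation):

```python
import numpy as np

def j_ml(X, H, Y):
    """ML metric over the frame: sum_k ||y(k) - H x(k)||^2, with columns as time samples."""
    return float(np.sum(np.abs(Y - H @ X) ** 2))

def j_mse(H_hat, Y, ml_detect):
    """(19.16): the cost of a channel estimate is the ML metric at the detected data X(H)."""
    X_hat = ml_detect(H_hat, Y)   # inner-level optimisation (data detector)
    return j_ml(X_hat, H_hat, Y)  # fed back to the outer-level channel search
```

An outer-level optimiser (here, PSO) then simply minimises `j_mse` over candidate channel matrices.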
Pure blind joint data and channel estimation converges very slowly and suffers
from an inherent permutation and scaling ambiguity problem [40]. To resolve this
permutation and scaling ambiguity, a few training symbols are employed to provide
an initial least square channel estimate (LSCE) for aiding the outer-level search algorithm. Let the number of training symbols be K, and denote the available training data as Y_K = [y(1) y(2) ··· y(K)] and X_K = [x(1) x(2) ··· x(K)]. The LSCE based on {Y_K, X_K} is readily given by
Ĥ_LSCE = Y_K X_K^H (X_K X_K^H)^{−1}.    (19.17)
To maintain the system throughput, we only use the minimum number of training
symbols, namely, K = nT , which is equal to the rank of the MIMO system. The
training symbol matrix XK should be designed to yield the optimal estimation performance [4]. Specifically, XK is designed to have nT orthogonal rows. This yields
the most efficient estimate and removes the need for matrix inversion.
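For instance, with K = n_T orthogonal training rows, X_K X_K^H is a (scaled) identity, so (19.17) reduces to a correlation without any real matrix inversion (illustrative NumPy sketch with made-up channel values):

```python
import numpy as np

n_T, n_R, K = 2, 2, 2
X_K = np.eye(n_T, K).astype(complex)            # orthogonal training rows: X_K X_K^H = I
H_true = np.array([[1.0 + 1.0j, 0.5 + 0.0j],
                   [0.0 + 0.2j, 1.0 + 0.0j]])   # example n_R x n_T channel
Y_K = H_true @ X_K                              # noiseless received training block

# (19.17): H_LSCE = Y_K X_K^H (X_K X_K^H)^{-1}
G = X_K @ X_K.conj().T                          # equals I here, so the inverse is trivial
H_lsce = Y_K @ X_K.conj().T @ np.linalg.inv(G)
```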
In previous work [1], we applied the repeated weighted boosting search (RWBS) algorithm [7] to perform the outer-level optimisation search of the joint ML iterative loop. The results shown in [1] demonstrate that the RWBS-aided semi-blind joint ML scheme performs well and is efficient in terms of its convergence speed. In this contribution, we show that by invoking the PSO method as the outer-level search algorithm, further performance enhancement can be achieved in terms of reduced complexity. The cost function for the PSO algorithm to optimise in this case is F(Ĥ) = J_MSE(Ĥ), with the dimensions of the search space specified by N = n_R and M = n_T.
In Step a) The swarm initialisation, the initial particles are chosen as Ĥ_1^{(0)} = Ĥ_LSCE and

Ĥ_i^{(0)} = Ĥ_LSCE + h (1_{n_R×n_T} + j 1_{n_R×n_T}), 2 ≤ i ≤ S,    (19.18)

where h is a uniformly distributed random variable defined in the range [−α, α]. An appropriate value for α is determined by experiment.
In Step c) The swarm update, we adopt the zero inertia weight ω = 0 and the TVAC mechanism (19.9). Any particle wandering outside the search space is forced back to stay at the border of the search space. These provisions were found empirically to be appropriate for this application.
Let C_OHRSA(L) be the complexity of the OHRSA algorithm for decoding the L-symbol data matrix X, and let N_OHRSA be the number of calls to the OHRSA algorithm required by the PSO algorithm to converge. Then the complexity of the proposed semi-blind method is expressed as

C = N_OHRSA · C_OHRSA(L),    (19.19)

where C_OHRSA(L) is given in [46], and N_OHRSA = S · I_max, with I_max being the maximum number of iterations and S the swarm size. It can be seen that the
(19.20)
Fig. 19.2 Mean channel error averaged over 50 different channel realisations as a function of α after 1000 OHRSA evaluations, for two values of Eb/No (15 and 20 dB) and two values of L (50 and 100)
which lay between 2 and 3 standard deviations of the true tap distribution. We also set the velocity limit to V_max = 1.0, which was confirmed in simulation to be a suitable value for this application. The control factor used in reinitialising zero velocities was found empirically to be 0.1. The optimal value of the control parameter α in the channel population initialisation (19.18) was first found empirically. Fig. 19.2 shows the MCE performance after 1000 OHRSA evaluations over a range of α values. It can be seen from Fig. 19.2 that the optimal value of α in this case was 0.15. This value of α was used in all the other simulations.
Fig. 19.3 depicts the BER performance of the PSO-based semi-blind scheme having a frame length of L = 100 after 1000 OHRSA evaluations and averaging over 50 different channel realisations, in comparison with the performance of the training-based OHRSA detector having K = 4, 8 and 16 training symbols for the LSCE, respectively, as well as with the case of perfect channel knowledge. It can be observed from Fig. 19.3 that, for the training-based scheme to achieve the same BER performance as the PSO-aided semi-blind one having only 4 pilot symbols, the number of training symbols had to be more than 16. This example was identical to the MIMO system investigated in [1]. The BER performance of the PSO-based semi-blind scheme depicted in Fig. 19.3 was slightly better than the BER of the RWBS-based semi-blind scheme shown in [1]. Moreover, the performance of the PSO-aided scheme was achieved after 1000 OHRSA evaluations, while the performance of the RWBS-based scheme reported in [1] was obtained after 1200 OHRSA evaluations. Thus, for this 4×4 MIMO benchmark, the computational saving achieved by the
Fig. 19.3 BER of the PSO-aided semi-blind scheme with frame length L = 100 after 1000 OHRSA evaluations, averaged over 50 different channel realisations, in comparison with the training-based cases using 4, 8 and 16 pilot symbols as well as the case of perfect channel knowledge
Fig. 19.4 Mean square error convergence performance of the PSO aided semi-blind scheme
averaged over 50 different channel realisations for different values of Eb /No and L
(19.21)
Figs. 19.4 and 19.5 depict the convergence performance of the proposed PSO-aided semi-blind joint ML channel estimation and data detection scheme, averaged over 50 different channel realisations, in terms of the MSE and MCE, respectively, for different SNR values as well as for the two frame lengths L = 50 and 100. It can be seen from Fig. 19.4 that the MSE converged to the noise floor. The MCE performance shown in Fig. 19.5 was slightly better, and converged faster, than the results obtained by the RWBS-based semi-blind joint ML scheme shown in [1].
Fig. 19.5 Mean channel error convergence performance of the PSO aided semi-blind scheme
averaged over 50 different channel realisations for different values of Eb /No and L
The downlink of the SDMA system is specified by its channel matrix H, which is given by

H = [h_1 h_2 ··· h_{n_R}],    (19.24)

where h_m = [h_{1,m} h_{2,m} ··· h_{n_T,m}]^T, 1 ≤ m ≤ n_R, is the mth user's spatial signature. The channel taps h_{i,m}, for 1 ≤ i ≤ n_T and 1 ≤ m ≤ n_R, are independent of each other and obey the complex-valued Gaussian distribution with E[|h_{i,m}|²] = 1. At the receiver, the reciprocal of the scaling factor, namely β^{−1}, is used to scale the received signal to ensure unity-gain transmission, and the baseband model of the system can be described as
y(k) = β^{−1} (H^T β C x(k) + n(k)) = H^T C x(k) + β^{−1} n(k),    (19.25)

where n(k) = [n_1(k) n_2(k) ··· n_{n_R}(k)]^T is the channel additive white Gaussian noise vector, n_m(k), 1 ≤ m ≤ n_R, is a complex-valued Gaussian random process with zero mean and E[|n_m(k)|²] = 2σ_n² = N_o, and y(k) = [y_1(k) y_2(k) ··· y_{n_R}(k)]^T denotes the received signal vector. Note that y_m(k), 1 ≤ m ≤ n_R, constitutes sufficient statistics for the mth MT to detect the transmitted data symbol x_m(k). The SNR of the downlink is defined as SNR = E_b/N_o, where E_b = E_T/(n_T log2 M) is the energy per bit per antenna for M-ary modulation. In our case, M = 4.
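A toy simulation of the downlink model (19.25) can be sketched as follows (illustrative NumPy code; the identity precoder and all numeric values are placeholders, not an actual MUT design):

```python
import numpy as np

rng = np.random.default_rng(3)
n_T, n_R, L = 4, 4, 100

# Rayleigh fading taps with E[|h_{i,m}|^2] = 1
H = (rng.standard_normal((n_T, n_R)) + 1j * rng.standard_normal((n_T, n_R))) / np.sqrt(2)

C = np.eye(n_T, n_R).astype(complex)   # placeholder precoder
beta = 1.0                             # transmit scaling factor (placeholder value)
sigma_n2 = 0.01                        # sigma_n^2, so N_o = 2 sigma_n^2

# QPSK symbols with unit energy, one column per time sample k
x = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), size=(n_T, L))
n = np.sqrt(sigma_n2) * (rng.standard_normal((n_R, L)) + 1j * rng.standard_normal((n_R, L)))

# (19.25): y(k) = H^T C x(k) + beta^{-1} n(k), stacked over k = 1..L
Y = H.T @ C @ x + n / beta
```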
Table 19.1 Computational complexity per iteration of the two MBER-MUT designs for QPSK signalling, where n_T is the number of transmit antennas, n_R the number of mobile terminals, M = 4 the size of the symbol constellation, and S the swarm size

Algorithm | Flops
SQP | n_R (8 n_T² n_R² + 6 n_T n_R + 6 n_T + 8 n_R + 4) M^{n_R} + O(8 n_T³ n_R³) + 8 n_T² n_R² + 16 n_T n_R² + 8 n_T² n_R + 12 n_T n_R + 6 n_R² − 2 n_T² + n_T − 2 n_R + 11
PSO | ((16 n_T n_R + 7 n_R + 6 n_T + 1) M^{n_R} + 20 n_T n_R + 2) S + 8
(19.29)
(19.31)
With an appropriately chosen penalty factor, the MBER-MUT design (19.29) can be obtained as the solution of the following unconstrained optimisation:

C_TxMBER = arg min_C F(C).    (19.32)

The value of the penalty factor is linked to the value of the SNR. Since the BS has knowledge of the downlink SNR, it is not difficult to assign an appropriate value. The dimensions of the search space for the PSO optimisation are specified by N = n_T and M = n_R.
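Although the BER expression (19.31) is not repeated here, the penalty construction itself is generic: the penalty factor times the constraint violation is added to the objective. A hypothetical Python sketch, assuming the constraint is a total transmit power budget E_T (an assumption on our part, as is the `ber_est` stand-in):

```python
import numpy as np

def penalised_cost(C, ber_est, E_T, lam):
    """F(C) = estimated BER + lam * violation of an assumed power budget E_T."""
    power = float(np.sum(np.abs(C) ** 2))   # total power of the precoder coefficients
    return ber_est(C) + lam * max(0.0, power - E_T)
```

Feasible precoders are left unpenalised, while any power excess is charged at rate lam, steering the PSO back toward the feasible region.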
In Step a) The swarm initialisation, we set C_1^{(0)} = C_TxMMSE, the MMSE-MUT solution, and randomly generate the rest of the initial particles, {C_i^{(0)}}_{i=2}^S, in the search space 𝕌^{n_T×n_R}.
In Step c) The swarm update, we adopt the zero inertia weight ω = 0 and the TVAC mechanism (19.8). If a particle wanders outside the search space, it is moved back inside the search space randomly, rather than being forced to stay at the border of the search space. These measures were found empirically to be appropriate for this application.
The computational complexity per iteration of the PSO-aided MBER-MUT scheme is also listed in Table 19.1. We will demonstrate that the PSO-aided MBER-MUT design imposes a considerably lower complexity than the SQP-based MBER-MUT design. This is owing to the fact that the designed PSO algorithm is very efficient in searching the precoder's parameter space for an optimal solution, as demonstrated in the following simulation study.
dimension to each channel tap h_{i,m} to represent channel estimation error. The BERs of the MMSE-MUT and the PSO-based MBER-MUT under this channel estimation error are also plotted in Fig. 19.6. It can be seen that the PSO-aided MBER-MUT design was no more sensitive to channel estimation error than the MMSE-MUT design. The convergence performance and computational requirements of the PSO-aided MBER-MUT design were investigated, using the SQP-based MBER-MUT counterpart as the benchmark. Fig. 19.7 compares the convergence performance of the SQP-based and PSO-aided MBER-MUT schemes, operating at the SNR values of Eb/No = 10 dB and 15 dB, respectively.
At the SNR of 10 dB, it can be seen from Fig. 19.7 that the SQP algorithm converged to the MBER-MUT solution after 100 iterations, while the PSO counterpart arrived at the same MBER-MUT solution after 20 iterations. Fig. 19.8 shows the computational complexities required by the SQP-based and PSO-aided MBER-MUT designs, respectively, to arrive at the MBER-MUT solution, in terms of (a) the total number of operations (Flops) and (b) the total run time (seconds) recorded. In deriving the number of operations required by the SQP algorithm, we approximated O(8 n_T³ n_R³) by 8 n_T³ n_R³. It can be observed from Fig. 19.8 (a) that the SQP-based algorithm needed 229,351,100 Flops to converge to the MBER-MUT solution, while the PSO-aided algorithm converged to the same MBER-MUT solution at a cost of 34,561,760 Flops. Therefore, the PSO-aided MBER-MUT design imposed an approximately seven times lower complexity than the SQP counterpart
Fig. 19.6 BER versus SNR performance of the PSO-aided MBER-MUT communicating
over flat Rayleigh fading channels using nT = 4 transmit antennas to support nR = 4 QPSK
MTs, in comparison with the benchmark MMSE-MUT
for this scenario. From Fig. 19.8 (b), it can be seen that the SQP-based design required 1730.6 seconds to converge to the optimal MBER-MUT solution, while the PSO-aided design needed only 257.3 seconds to arrive at the same optimal MBER-MUT solution. This also confirms that the PSO-aided MBER-MUT scheme was approximately seven times faster than the SQP-based counterpart in this case.
From Fig. 19.7 it can also be seen that, at the SNR of 15 dB, the SQP-based algorithm converged after 140 iterations, which required a total cost of 321,091,540 Flops, while the PSO-aided scheme achieved convergence after 40 iterations, which required a total cost of 63,541,120 Flops. Thus, the PSO-aided design imposed an approximately five times lower complexity than the SQP counterpart in this scenario.
Further investigation showed that the convergence results obtained for SNR < 10 dB were similar to the case of SNR = 10 dB, while the convergence results obtained for SNR > 15 dB agreed with the case of SNR = 15 dB. Thus, we may conclude that for this 4×4 MIMO benchmark the PSO-aided MBER-MUT design imposed an approximately five to seven times lower complexity than the SQP-based MBER-MUT counterpart.
Finally, we show why the choice of the swarm size S = 20 was optimal in this application. Fig. 19.9 illustrates the convergence performance and the total required complexity of the PSO-aided algorithm with the different swarm sizes of S = 10, 20, 30 and 40 at the SNR value of 15 dB. It is clear that S = 10 was too small
Fig. 19.7 Convergence performance of the SQP-based and PSO-aided MBER-MUT schemes
for the system employing nT = 4 transmit antennas to support nR = 4 QPSK MTs over flat
Rayleigh fading channels at Eb /No = 10 dB and 15 dB, respectively
Fig. 19.8 Complexity comparison of the SQP-based and PSO-aided MBER-MUT schemes
for the system employing nT = 4 transmit antennas to support nR = 4 QPSK MTs over flat
Rayleigh fading channels at Eb /No = 10 dB, in terms of (a) number of FLOPs, and (b) run
time (seconds)
Fig. 19.9 Convergence performance (a) and required total complexity (b) of the PSO-aided
MBER-MUT scheme with different swarm sizes for the system employing nT = 4 transmit
antennas to support nR = 4 QPSK MTs over flat Rayleigh fading channels at Eb /No = 15 dB
for the algorithm to converge to the optimal MBER-MUT solution in this case. The results of Fig. 19.9 also show that with S = 20 the algorithm took 40 iterations to converge at a cost of 63,541,120 Flops, and with S = 30 it needed 27 iterations at a cost of 64,335,276 Flops, while with S = 40 the algorithm required only 25 iterations to converge but its cost was 79,426,200 Flops. Thus the choice of S = 20 led to the lowest computational cost for the algorithm to converge in this application.
19.5 Conclusions
State-of-the-art MIMO transceiver designs impose expensive optimisation problems, which require the application of sophisticated and advanced optimisation techniques, such as evolutionary computation methods, in order to achieve the optimal performance offered by MIMO technologies at a practically affordable cost. In this contribution, we have demonstrated that PSO provides an efficient tool for aiding MIMO transceiver designs. Specifically, we have applied the PSO algorithm to semi-blind joint ML channel estimation and data detection for MIMO receivers, which offers a significant complexity saving over the existing state-of-the-art RWBS-based scheme. Furthermore, we have employed PSO to design the MBER-MUT scheme for the downlink of an SDMA-induced MIMO system, which imposes a much lower computational complexity than the available SQP-based MBER-MUT design.
The Communication Research Group at the University of Southampton has long been actively engaged in research on state-of-the-art MIMO transceiver designs using various powerful evolutionary computation methods. In particular, we have extensive experience of using the genetic algorithm [3, 5, 6, 22, 24, 53] and ant colony optimisation [47, 48, 49] for MUD designs. Further research is warranted to investigate various evolutionary computation methods in benchmark MIMO designs and to study their performance-complexity trade-offs, with the aim of providing useful guidelines for aiding practical MIMO system designs.
References
1. Abuthinien, M., Chen, S., Hanzo, L.: Semi-blind joint maximum likelihood channel estimation and data detection for MIMO systems. IEEE Signal Processing Letters 15, 202–205 (2008)
2. Akhtman, J., Wolfgang, A., Chen, S., Hanzo, L.: An optimized-hierarchy-aided approximate Log-MAP detector for MIMO systems. IEEE Trans. Wireless Communications 6(5), 1900–1909 (2007)
3. Alias, M.Y., Chen, S., Hanzo, L.: Multiple antenna aided OFDM employing genetic algorithm assisted minimum bit error rate multiuser detection. IEEE Trans. Vehicular Technology 54(5), 1713–1721 (2005)
4. Biguesh, M., Gershman, A.B.: Training-based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals. IEEE Trans. Signal Processing 54(3), 884–893 (2006)
5. Chen, S., Wu, Y., McLaughlin, S.: Genetic algorithm optimisation for blind channel identification with higher-order cumulant fitting. IEEE Trans. Evolutionary Computation 1(4), 259–266 (1997)
6. Chen, S., Wu, Y.: Maximum likelihood joint channel and data estimation using genetic algorithms. IEEE Trans. Signal Processing 46(5), 1469–1473 (1998)
7. Chen, S., Wang, X.X., Harris, C.J.: Experiments with repeating weighted boosting search for optimization in signal processing applications. IEEE Trans. System, Man and Cybernetics, Part B 35(4), 682–693 (2005)
8. Chen, S., Hanzo, L., Ahmad, N.N., Wolfgang, A.: Adaptive minimum bit error rate beamforming assisted receiver for QPSK wireless communication. Digital Signal Processing 15(6), 545–567 (2005)
9. Chen, S., Livingstone, A., Du, H.-Q., Hanzo, L.: Adaptive minimum symbol error rate beamforming assisted detection for quadrature amplitude modulation. IEEE Trans. Wireless Communications 7(4), 1140–1145 (2008)
10. Das, S., Konar, A.: A swarm intelligence approach to the synthesis of two-dimensional IIR filters. Engineering Applications of Artificial Intelligence 20(8), 1086–1096 (2007)
11. El-Mora, H.H., Sheikh, A.U., Zerguine, A.: Application of particle swarm optimization algorithm to multiuser detection in CDMA. In: Proc. 16th IEEE Int. Symp. Personal, Indoor and Mobile Radio Communications, Berlin, Germany, September 11-14, vol. 4, pp. 2522–2526 (2005)
12. Fang, W., Sun, J., Xu, W.-B.: Design IIR digital filters using quantum-behaved particle swarm optimization. In: Proc. 2nd Int. Conf. Natural Computation, Part II, Xian, China, September 24-28, pp. 637–640 (2006)
13. Feng, H.-M.: Self-generation RBFNs using evolutional PSO learning. Neurocomputing 70(1-3), 41251 (2006)
14. Foschini, G.J.: Layered space-time architecture for wireless communication in a fading environment when using multiple antennas. Bell Labs Tech. J. 1(2), 41–59 (1996)
15. Foschini, G.J., Gans, M.J.: On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Communications 6(3), 311–335 (1998)
16. Guerra, F.A., Coelho, L.S.: Multi-step ahead nonlinear identification of Lorenz's chaotic system using radial basis function neural network with learning by clustering and particle swarm optimization. Chaos, Solitons and Fractals 35(5), 967–979 (2008)
17. Guo, Z., Xiao, Y., Lee, M.H.: Multiuser detection based on particle swarm optimization algorithm over multipath fading channels. IEICE Trans. Communications E90-B(2), 421–424 (2007)
18. Guru, S.M., Halgamuge, S.K., Fernando, S.: Particle swarm optimisers for cluster formation in wireless sensor networks. In: Proc. 2005 Int. Conf. Intelligent Sensors, Sensor Networks and Information Processing, Melbourne, Australia, December 5-8, pp. 319–324 (2005)
19. Hanzo, L., Münster, M., Choi, B.J., Keller, T.: OFDM and MC-CDMA for Broadband Multi-User Communications, WLANs and Broadcasting. John Wiley and IEEE Press, Chichester (2003)
20. Hanzo, L., Ng, S.X., Keller, T., Webb, W.: Quadrature Amplitude Modulation: From Basics to Adaptive Trellis-Coded, Turbo-Equalised and Space-Time Coded OFDM, CDMA and MC-CDMA Systems. John Wiley and IEEE Press, Chichester (2004)
21. Hjørungnes, A., Diniz, P.S.R.: Minimum BER prefilter transform for communications systems with binary signaling and known FIR MIMO channel. IEEE Signal Processing Letters 12(3), 234–237 (2005)
510
S. Chen et al.
22. Hua, W.: Interference Suprression in Single- and Multi-Carrier CDMA Systems. PhD
Thesis, School of Electronics and Computer Science, University of Southampton,
Southampton, UK (2005)
23. Irmer, R.: Multiuser Transmission in Code Division Multiple Access Mobile Communication Systems. PhD Thesis, Technique University of Dresden, Dresden, Germany
(2005)
24. Jiang, M.: Hybrid Multi-user OFDM Uplink Systems Using Multiple Antennas. PhD
Thesis, School of Electronics and Computer Science, University of Southampton,
Southampton, UK (2005)
25. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proc. 1995 IEEE Int. Conf.
Neural Networks, Perth, Australia, November 27-December 1, vol. 4, pp. 19421948
(1995)
26. Kennedy, J., Eberhart, R.: Swarm Intelligence. Morgan Kaufmann, San Francisco (2001)
27. Leong, W.-F., Yen, G.G.: PSO-based multiobjective optimization with dynamic population size and adaptive local archives. IEEE Trans. Systems, Man and Cybernetics, Part
B 38(5), 12701293 (2008)
28. Liu, H., Li, J.: A particle swarm optimization-based multiuser detection for receivediversity-aided STBC systems. IEEE Signal Processing Letters 15, 2932 (2008)
29. Lu, Z., Yan, S.: Multiuser detector based on particle swarm algorithm. In: Proc. 6th IEEE
CAS Symp. Emerging Technologies: Frontiers of Mobile and Wireless Communication,
Shanghai, China, May 31-June 2, vol. 2, pp. 783786 (2004)
30. Marzetta, T.L., Hochwald, B.M.: Capacity of a mobile multiple-antenna communication
link in Rayleigh flat fading. IEEE Trans. Information Theory 45(1), 139157 (1999)
31. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
32. Paulraj, A., Nabar, R., Gore, D.: Introduction to Space-Time Wireless Communications.
Cambridge University Press, Cambridge (2003)
33. Paulraj, A.J., Gore, D.A., Nabar, R.U., Bolcskei, H.: An overview of MIMO communications A key to gigabit wireless. Proc. IEEE 92(2), 198218 (2004)
34. Pham, D., Pattipati, K.R., Willet, P.K., Luo, J.: An improved complex sphere decoder for
V-BLAST systems. IEEE Signal Processing Letters 11(9), 748751 (2004)
35. Ratnaweera, A., Halgamuge, S.K., Watson, H.C.: Self-organizing hierarchical particle
swarm optimizer with time-varying acceleration coefficients. IEEE Trans. Evolutionary
Computation 8(3), 240255 (2004)
36. Soo, K.K., Siu, Y.M., Chan, W.S., Yang, L., Chen, R.S.: Particle-swarm-optimizationbased multiuser detector for CDMA communications. IEEE Trans. Vehicular Technology 56(5), 30063013 (2007)
37. Sun, J., Xu, W.-B., Liu, J.: Training RBF neural network via quantum-behaved particle
swarm optimization. In: King, I., Wang, J., Chan, L.-W., Wang, D. (eds.) ICONIP 2006.
LNCS, vol. 4233, pp. 11561163. Springer, Heidelberg (2006)
38. Sun, T.-Y., Liu, C.-C., Tsai, T.-Y., Hsieh, S.-T.: Adequate determination of a band of
wavelet threshold for noise cancellation using particle swarm optimization. In: Proc.
CEC 2008, Hong Kong, China, June 1-6, pp. 11681175 (2008)
39. Tan, S.: Minimum Error Rate Beamforming Transceivers. PhD Thesis, School of Electronics and Computer Science, University of Southampton, Southampton, UK (2008)
40. Tang, L., Liu, R.W., Soon, V.C., Huang, Y.F.: Indeterminacy and identifiability of blind
identification. IEEE Trans. Circuits and Systems 38(5), 499509 (1991)
41. Telatar, I.E.: Capacity of multi-antenna Gaussian channels. European Trans. Telecommunications 10(6), 585595 (1999)
19
511
42. Tse, D., Viswanath, P.: Fundamentals of Wireless Communication. Cambridge University Press, Cambridge (2005)
43. Vandenameele, P., van Der Perre, L., Engels, M.: Space Division Multiple Access for
Wireless Local Area Networks. Kluwer, Boston (2001)
44. Vojcic, B.R., Jang, W.M.: Transmitter precoding in synchronous multiuser communications. IEEE Trans. Communications 46(10), 13461355 (1998)
45. Winters, J.H.: Smart antennas for wireless systems. IEEE Personal Communications 5(1), 2327 (1998)
46. Wolfgang, A.: Single-Carrier Time-Domain Space-Time Equalization Algorithms for the
SDMA Uplink. PhD Thesis, School of Electronics and Computer Science, University of
Southampton, Southampton, UK (2007)
47. Xu, C., Yang, L.-L., Hanzo, L.: Ant-colony-based multiuser detection for MC DSCDMA systems. In: Proc. VTC 2007-Fall, Baltimore, USA, September 30-October 2,
pp. 960964 (2007)
48. Xu, C., Yang, L.-L., Maunder, R.G., Hanzo, L.: Near-optimum soft-output ant-colonyoptimization based multiuser detection for the DS-CDMA. In: Proc. ICC 2008, Beijing,
China, pp. 795799 (2008)
49. Xu, C., Hu, B., Yang, L.-L., Hanzo, L.: Ant-colony-based multiuser detection for multifunctional antenna array assisted MC DS-CDMA systems. IEEE Trans. Vehicular Technology 57(1), 658663 (2008)
50. Yang, L.-L.: Design of linear multiuser transmitters from linear multiuser receivers. In:
Proc. ICC 2007, Glasgow, UK, June 24-28, pp. 52585263 (2007)
51. Yang, D., Yang, L.-L., Hanzo, L.: Performance of SDMA systems using transmitter preprocessing based on noisy feedback of vector-quantized channel impulse responses. In:
Proc. VTC2007-Spring, Dublin, Ireland, pp. 21192123 (2007)
52. Yao, W., Chen, S., Tan, S., Hanzo, L.: Particle swarm optimisation aided minimum bit
error rate multiuser transmission. In: Proc. ICC 2009, Dresden, Germany, 5 pages (2009)
53. Yen, K.: Genetic Algorithm Assisted CDMA Multiuser Detection. PhD Thesis, School
of Electronics and Computer Science, University of Southampton, Southampton, UK
(2001)
Chapter 20
Abstract. This chapter analyzes the main challenges in the application of simulation optimization to the design of engine components, with particular reference to
the combustion chamber of a Direct Injection Diesel engine evaluated via Computational Fluid Dynamic (CFD) codes.
20.1 Presentation
This chapter analyzes the main challenges in the application of simulation optimization to the design of engine components, with particular reference to the combustion chamber of a Direct Injection Diesel engine evaluated via Computational
Fluid Dynamic (CFD) codes.
The chapter starts with a description of the advantages of simulation optimization with respect to traditional trial-and-error approaches. The state of the art and the recent spread of such techniques into industry will be considered. Then, the specific challenges of optimizing an internal combustion engine are analyzed: the large computational time required for the fluid dynamic simulation of the engine (depending on the resolution of the computational mesh used to represent the fluid domain), the necessity to take into account several operating conditions (each requiring time-expensive simulations), the interaction between fluid and solid structure (requiring the combination of CFD and FEM), etc. Moreover, if the design parameters include the geometrical features of the engine (e.g. the shape of the combustion chamber), the computational three-dimensional mesh has to be parameterized so
that it can be automatically generated according to the selected values of the design
parameters. This is particularly challenging when using commercial CFD codes that
usually have a specific pre-processor with specific requirements in terms of grid
quality, volumes connection, boundary conditions, etc. Constraints, restrictions and
Teresa Donateo
Department of Engineering for Innovation, via per Arnesano, 73100 Lecce, Italy
e-mail: teresa.donateo@unisalento.it

Y. Tenne and C.-K. Goh (Eds.): Computational Intelligence in Expensive Optimization Problems, ALO 2, pp. 513-541. © Springer-Verlag Berlin Heidelberg 2010
limits that the designer must meet due to norms, regulations and functionalities are
additional challenges in the design process. Another aspect to be considered is the
multi-objective nature of the problem (the main goals to be achieved are the reduction of emissions, the containment of fuel consumption and CO2 emissions, etc)
that has been addressed in different ways by the main research centers involved in
this kind of application.
Among the available optimization methods, Genetic Algorithms (GAs) are usually chosen in this application for their robustness and their capability to deal with
multi-objective optimizations. Furthermore, they are simple to use and to combine
with existing simulation code without significant modifications and their efficiency
is independent of the nature of the problem. Another advantage of genetic algorithms is their implicitly parallel nature, which makes it possible to easily exploit the computational capability of the high-performance multi-processor servers now available not only in academic but also in industrial research centres. The advantages and drawbacks of distributing the computational load on inter-regional computing grids, such as the southern Italy SPACI Grid, will also be discussed.
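The implicit parallelism mentioned above can be illustrated with a short sketch: every individual of a generation is evaluated independently, so the expensive simulations can be farmed out to a pool of workers. The `evaluate` function below is a cheap placeholder standing in for a full CFD run, and all names are illustrative rather than taken from the chapter's actual tool chain:

```python
from multiprocessing import Pool

def evaluate(individual):
    # Placeholder fitness: in the real application this would launch
    # one expensive CFD simulation for the candidate design.
    return sum(x * x for x in individual)

def evaluate_generation(population, workers=4):
    # Each individual is independent, so the whole generation can be
    # evaluated concurrently on a multi-processor server or a grid node.
    with Pool(workers) as pool:
        return pool.map(evaluate, population)

if __name__ == "__main__":
    population = [[0.1 * i, 0.2 * i] for i in range(8)]
    fitness = evaluate_generation(population)
```

Because the algorithm only needs the fitness values at synchronization points (selection), the wall-clock time per generation approaches the time of a single simulation when enough processors are available.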
Finally, a test case will be presented, describing the application of simulation optimization to the design of a common rail direct injection diesel engine for automotive applications based on multi-objective genetic algorithms and CFD analysis with
respect to four different engine operating conditions. Note that a multi-disciplinary
approach is a key aspect to successfully apply the method. The achievements shown
in the test case are the results of the collaboration with both automotive companies
(CVIT - Bosch Research Center, Bari, Italy, Prototipo Group) and high performance
computation academic research centres (HPC- Lecce). Moreover, the methodology
described in the chapter has been extensively applied for the design of commercial
diesel engines since 2003, so that a large amount of data is available. The results have also been validated experimentally by building and testing the optimized pistons developed with the described method.
20.2 Introduction
In direct injection diesel engines, the combustion chamber is represented by the
space defined, at each time, by the cylinder head and walls and the piston crown.
Since the combustion process and the emission formation mechanisms are strongly affected by the flow field produced by the chamber shape, the optimization of the bowl profile is a strategic way to fulfill present-day and future regulations on pollutant emissions and on greenhouse gases, which depend on fuel consumption. A symmetrical cavity, called the bowl, is usually present in the piston to allow fuel to be injected, mixed with air and burned. This type of combustion system was first introduced in 1934 by a Swiss company named Adolph Saurer [1] and then adopted by several companies, such as Scania, Volvo, PSA, British Leyland, IVECO and many others. Later, the increase of fuel injection pressure and the use of multiple injections improved efficiency and reduced emissions. These results led to the development of high-pressure electronically-controlled injection systems such as unit injectors
and common rail systems. Due to the high flexibility of common rail in fulfilling
arbitrary injection strategies, cars equipped with direct injection diesel engines and
Common Rail injection systems are now widespread in the automotive market.
The necessity to fulfill increasingly stringent emission limits led designers to abandon the traditional trial-and-error approach in the design of the combustion chamber and to search for innovative solutions. Since the 1990s, work has been done, mainly through experiments but also with the help of CFD simulations, to study the effect of the combustion chamber on engine performance and emissions by comparing a small number (two or three) of combustion chamber designs ([2], [3], [4], [5], [6], [7], [8]).
In the meantime, the academic world, and in particular the ERC Research Center at the University of Wisconsin-Madison [9] and the CREA Research Center of
the University of Salento [10], introduced the use of simulation optimization in the
design of engine combustion chambers. In these first applications, the optimization
was performed by combining a standard genetic algorithm with the KIVA code, an open-source three-dimensional engine simulation program. These works underlined
one of the difficulties of optimizing the combustion chamber, that is the necessity to
take into account a very large number of input parameters. The two main operating
parameters of the engine are its torque and speed, which change continuously as the vehicle is driven, according to the vehicle route, the user's driving style, etc. For this reason, to assess the performance of a vehicle in terms of consumption and emissions, standard driving cycles are used. A driving cycle consists of a particular profile of velocity versus time that has to be executed under controlled conditions. Only vehicles that fulfill specific emission regulations along a standard driving cycle can be introduced onto the market. The specification of the driving cycle changes from nation to nation and depends on the mass or the rated power of the vehicle.
In the preliminary design of the vehicle, however, the engine is not tested over the entire driving cycle but only over a certain number of operating conditions (torque and speed couples), named modes, that are assumed to be representative of the cycle. The effect of a particular chamber shape on emission levels changes significantly from one mode to another [6]. Thus, the combustion chamber
optimization has to be performed according to several modes. Senecal et al. [11] applied the KIVA-GA optimization method to optimize the chamber for two operating modes, while de Risi et al. [12] considered up to four modes. Moreover, the performance
of an engine with a particular combustion chamber shape depends on the control
strategy chosen for the engine in terms of injection strategy (injection pressure,
number of injections per cycle, quantity of fuel injected in each injection pulse),
EGR (exhaust gas recirculation), boost pressure, etc. This problem is usually approached by separating the design parameters that are common across all modes
(compression ratio, combustion chamber profile, etc.) from those that can vary from
mode to mode (EGR, injection profile, boost pressure, etc.) and so can be considered independent. To completely optimize engine performance and pollutant
emissions, both common and independent parameters should be considered in
the optimization. This approach, used by Reitz et al. [9], would find an absolutely
optimized configuration of the input parameters but it is unlikely to explain why this
configuration works well. Moreover, with this approach the complexity of the system strongly increases when different modes are taken into account, since each independent parameter has to be counted as many times as the number of modes. For this reason, the optimizations of engine geometry and of control parameters were kept separate at the CREA. De Risi et al. [13] developed a two-step optimization
methodology where both steps are based on genetic algorithms and CFD simulations. Firstly, the combustion chamber shape is optimized for a fixed injection strategy. Then, the shape of the combustion chamber is kept constant and the injection
strategy is changed to identify the response of the chambers selected in the first step
to different injection strategies.
In [14], the Kriging response surface model is adopted to save computational time by limiting the number of CFD simulations to be performed. The Kriging method is used to develop an approximation model that is coupled with an optimization method to find the optimum. In this way, the computational time is cut by 95%. However, when a response surface model (RSM) is used in optimization, the global optimum can be missed because the function values estimated by the RSM include errors at unsampled points.
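The surrogate idea can be sketched in a few lines. The code below is a toy illustration, not the model of [14]: a basic radial-basis kriging predictor is fitted to a handful of expensive samples, and the optimum is then searched on the cheap predictor; the interpolation error between samples is precisely why the true global optimum can be missed.

```python
import numpy as np

def fit_kriging(X, y, theta=10.0):
    # Gaussian correlation model: solve R w = y once at fitting time.
    d = X[:, None, :] - X[None, :, :]
    R = np.exp(-theta * np.sum(d ** 2, axis=2))
    w = np.linalg.solve(R + 1e-10 * np.eye(len(X)), y)
    return X, w, theta

def predict(model, x):
    # Cheap prediction: correlate the query point with the stored samples.
    X, w, theta = model
    r = np.exp(-theta * np.sum((X - x) ** 2, axis=1))
    return float(r @ w)

# "Truth" standing in for an expensive CFD objective, sampled 5 times.
def expensive(x):
    return (x[0] - 0.3) ** 2

X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
y = np.array([expensive(x) for x in X])
model = fit_kriging(X, y)

# Optimize on the surrogate: 1001 cheap calls instead of 1001 CFD runs.
grid = np.linspace(0.0, 1.0, 1001)
best = min(grid, key=lambda v: predict(model, np.array([v])))
```

The predictor reproduces the sampled values exactly but only approximates the objective between samples, which is why surrogate-based searches are usually refined by re-sampling the true function near the predicted optimum.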
Nowadays, the use of multi-objective optimization combined with CFD codes is widespread in the automotive industry, which has become one of the main users of complex simulation and optimization tools. However, industry searches for user-friendly optimization tools that are easy to couple with existing commercial CFD codes. This is the reason for the success of optimization environments like ModeFRONTIER [15] and iSIGHT [16], in which different optimization methods (genetic algorithm, simulated annealing, etc.) can be chosen by the user. However, it is important to stress the difference between optimizing the engine in terms of operating conditions (injection strategy, EGR level, etc.) and in terms of combustion chamber shape. In the first case, in fact, the design parameters are usually easy to modify since they are given as input to commercial CFD codes by means of ASCII files. On the other hand, the shape of the combustion chamber is given through a complex three-dimensional computational mesh that cannot easily be described by input parameters and automatically generated. For this reason, the next section will analyze the automatic generation of meshes for open-source and commercial CFD codes.
As explained above, a symmetrical cavity, called the bowl, is usually present in the piston to allow fuel to be injected, mixed with air and burned. Note that the squish region grows and shrinks as the piston moves downward and upward between the top dead center (TDC) and the bottom dead center (BDC), while the bowl region always keeps the same shape and size. The minimum height of the squish region depends on the compression ratio of the engine and is called the squish height. To simulate the behavior of the combustion chamber with a CFD code, a moving 3D computational mesh that describes the combustion space following the piston movement is used.
In initial applications of simulation optimization to the combustion chamber, the
engine was simulated with different versions of an open source CFD code named
KIVA ([18],[19]). In the first investigation at the ERC [9] the bowl geometry was
defined by three input variables (bowl diameter, bowl depth and central crown height
of the piston), allowing only open chamber profiles to be investigated. Later, a more general chamber profile generation tool was developed at the ERC [20], where the
chamber profile is defined according to the parameters shown in figure 20.2. This is
performed with an automated grid generator named Kwickgrid that uses a reduced
set of input parameters when compared with the standard KIVA grid generation code
[18]. Kwickgrid uses up to five parameters to define the overall piston bowl shape
and up to eight variables to generate Bezier curves that describe the desired piston
design and make it smooth for practical application. An approximate, iterative methodology is used to maintain the compression ratio and the user-determined mesh size until a convergence criterion is reached.
In 2002, Senecal et al. [11] presented a general geometry parametrization where
six parameters were used to define the bowl shape. By increasing or decreasing
the number of parameters defining the profile it is possible to include more or less
wiggles in the bowl profile. Once the bowl profile is determined, a grid generation program G-Smooth is used to create the mesh. A wide range of bowl shapes
can be obtained with this technique but some of them are unsuitable for practical
application. Senecal et al. [11] stress the importance of keeping the same mesh resolution for all geometries generated in the search. This ensures that differences in
design performance are due to changes in geometry (and in other possible independent parameters) and not to changes in mesh resolution. As in the approach of the ERC center, bore, stroke and compression ratio are kept constant. The compression ratio set by the user is maintained for all designs by changing the squish height.
In the investigation of Wakisaka et al. [21], the shape of the combustion chamber is defined with 10 design variables (see figure 20.3). The injection angle is also considered as a design variable because diesel engine combustion largely depends on it.
In the investigations performed at the CREA, the parametric schematization of figure 20.5 was initially considered. Note that this schematization does not allow more than one inflexion point in the bowl profile. In the method of de Risi
et al. [10] the volume of the bowl obtained with a particular combination of the parameters is calculated as the algebraic sum of six volumes $V_j$, $j = 1, \ldots, 6$, as in figure 20.4. Once the six volumes have been calculated, the bowl volume is given by:

$$V_{bowl} = \sum_{j=1}^{5} V_j - V_6 \qquad (20.1)$$
Since the volumes $V_j$ with $j = 2, \ldots, 6$ depend on the position of point O, a nonlinear equation in $x_0$ is obtained, which is solved with a standard iterative numerical procedure. Note that in this way the desired bowl volume (i.e. compression ratio) is achieved by moving point O in the horizontal direction and not by changing the squish height. Therefore the bowl-to-squish ratio, which strongly affects the flow field, does not change when the parameters of the bowl are changed. The bowl profile obtained is then processed by a tool named Meshmaker that automatically writes the simple input file required by the KIVA standard preprocessor, named K3prep, in order to generate a three-block structured mesh. The mesh consists of the three blocks represented in figure 20.5. The first two blocks define the squish region while the third block describes the bowl. The spatial resolution is set equal for all chambers and the number of divisions along the x and z axes is automatically calculated for each chamber according to both engine size (bore, stroke and squish) and bowl depth and width. K3prep automatically adapts the shape of the computational cells of the third region to follow the profile of the bowl, trying to avoid cells with a bad aspect ratio. If this is not achievable, depending on the shape of the profile, an error message is given by K3prep and the computational mesh is not generated.
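The iterative procedure for $x_0$ amounts to one-dimensional root finding: the bowl volume is a monotonic function of the horizontal position of point O, so a simple bisection drives it to the target value required by the compression ratio. The `bowl_volume` function below is an invented stand-in, not the actual six-volume decomposition of [10]:

```python
def solve_x0(bowl_volume, v_target, lo, hi, tol=1e-9):
    # Bisection on g(x0) = bowl_volume(x0) - v_target; assumes the
    # volume grows monotonically as point O moves outward.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bowl_volume(mid) < v_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy volume model (any monotonic function of x0 works the same way).
x0 = solve_x0(lambda x: 3.0 * x, v_target=1.2, lo=0.0, hi=1.0)
```

Bisection is robust here because each volume evaluation is purely geometric and cheap, so the handful of iterations needed for convergence costs nothing compared with a single CFD run.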
All the examples illustrated so far refer to open-source mesh generators, where structured meshes are used and the implementation of the automatic generation of the chamber profile is quite simple. On the contrary, the implementation becomes more complex when using commercial CFD codes, which usually consider unstructured meshes and have complex pre-processors with specific requirements in terms of grid quality, volume connections, boundary conditions, etc. Moreover, they only accept input files with specific formats (IGES, Patran, STL, etc.). Thus, automatically writing the chamber profile for them is quite challenging. Recently, the CREA research center extended the optimization method to the use of commercial CFD codes in collaboration with the Nardo Technical Center, Prototipo group [15]. In this application, the simulation of the engine was performed with the CFX
software, and the procedure of figure 20.6 was developed to automatically generate a computational mesh for this software. Moreover, the effect of the combustion chamber was evaluated not only with respect to its fluid-dynamic behavior but also with respect to the thermal and pressure stresses on the piston head. This could be done thanks to the capability of the CFX software of performing both CFD analysis of the fluid domain (combustion chamber) and FEM analysis of the piston (solid domain). The flow-chart of the optimization process is shown in figure 20.6.
The method has been implemented in the optimization environment ModeFrontier. The input files Meshparam.txt and Enginemode.ccl contain the geometrical
parameters of the bowl and the operating conditions of the engine, respectively.
These files are automatically generated in the optimization environment according
to the design parameters selected by the optimization algorithm. The geometrical
parameters defining the bowl are the input of the automatic profile and mesh generator (Meshmaker X). The engine operating conditions (rpm, injected fuel, injection strategy, EGR, etc.) are used as boundary conditions for both the thermo-fluid
dynamics analysis and the structural simulation with CFX. The results of the simulations are post-processed with CFX-Post and the main outputs are written in two
ASCII files that are returned to the optimization environment. The first one is named Cfdout.txt and contains the emission levels (mainly soot and NOx) and the fuel consumption of the engine with the proposed combustion chamber for the operating mode in Enginemode.ccl. The second one (Femout.txt) contains information about the capability of the piston to sustain the predicted thermal load, which will be used as a constraint in the optimization process.
The bowl profile is obtained as the union of two cubic Bezier curves (AB and BC) and the vertical lines OA and CD, as illustrated in figure 20.7. Points 1, 2, 3 and 4 are control points that define the direction and the slope of the curves. By moving points A, B, C, D, 1, 2, 3 and 4, a large variety of combustion chamber profiles can be obtained. However, there are some constraints in the building of the profile. Point D can move only in the horizontal direction (x axis), zD = 0, and its x coordinate must be the same as that of point C (xC = xD), while point A has to belong to the vertical axis z (xA = 0). The two Bezier curves have point B in common, thus the slope at that point has to be the same for the two curves. The constant volume of the chamber is achieved by adjusting the coordinates of point B. In particular, point B is moved parallel to the bisector of the angle defined by the prolongations of segments A1 and C4 until the volume of the chamber is equal (with a tolerance εv) to the value chosen by the user. The numerical values of the parameters of figure 20.7, described in table 20.1, are contained in the input file Meshparam.txt.
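The geometric machinery behind this construction is compact: sample each cubic Bezier curve from its four control points and estimate the bowl volume as a solid of revolution of the profile around the z axis. The control points below are invented for illustration; in the real tool they come from Meshparam.txt and obey the constraints listed above.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=100):
    # Sample a cubic Bezier curve at n points, Bernstein form:
    # B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3.
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def revolution_volume(profile):
    # Axisymmetric bowl: V = | integral of pi * x(z)^2 dz | along the
    # profile, x being the radius and z the axial coordinate.
    x, z = profile[:, 0], profile[:, 1]
    y = np.pi * x ** 2
    return abs(np.sum(0.5 * (y[1:] + y[:-1]) * (z[1:] - z[:-1])))

# Degenerate check: a straight vertical "curve" at radius 1 from z = 0
# to z = 2 must give the volume of a cylinder of radius 1 and height 2.
profile = cubic_bezier(np.array([1.0, 0.0]), np.array([1.0, 0.7]),
                       np.array([1.0, 1.4]), np.array([1.0, 2.0]))
```

Matching the target volume then reduces to moving point B along the prescribed direction and re-evaluating `revolution_volume` until the difference from the user-selected value falls below the tolerance εv.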
Once the profile has been generated, an unstructured computational mesh is defined for the fluid domain according to the resolution selected by the user (also included in Meshparam.txt) and written in a PATRAN file to be read by the CFX solver. Meshmaker X also sets the boundary conditions on the mesh and checks for the fulfillment of the CFX preprocessor requirements in terms of grid uniformity, cell shape, etc. Thanks to the combined CFD-FEM evaluation, chambers that are suitable from the CFD point of view but not able to sustain the thermal load are penalized in the optimization process. Details on the construction of the solid domain for the FEM analysis are not reported here for the sake of brevity. Note that the
Table 20.1 Input parameters for Meshmaker-X

Variable   Description
zA         z coordinate of point A
x1         x coordinate of point 1
z1         z coordinate of point 1
x2         horizontal distance between points 2 and B
x3         horizontal distance between points 3 and B
s2,3       slope of the line 2-3-B
x4         x coordinate of point 4
z4         z coordinate of point 4
xC         x coordinate of point C
zC         z coordinate of point C
xB         initial value of the x coordinate of point B
ΔxB        movement of point B along the x direction
n1         number of points defining the first Bezier curve (AB)
n2         number of points defining the second Bezier curve (BC)
volf       selected volume of the bowl
εv         maximum acceptable error on the bowl volume
computational time required for the FEM analysis of the piston is negligible with
respect to the CFD simulation of the combustion chamber.
$$f = \frac{1000}{\left( \dfrac{NO_x + HC}{NO_{x,m} + HC_m} \right)^2 + \dfrac{ISFC}{ISFC_0} + \dfrac{WHEAT}{WHEAT_0}} \qquad (20.2)$$

where $NO_x + HC$ are the nitrogen oxide and hydrocarbon emissions lumped together, $NO_{x,m} + HC_m$ are the target values to be achieved to fulfill the specific engine emission regulation, $ISFC$ is the indicated specific fuel consumption and $ISFC_0$ is the same quantity calculated from the simulation of the baseline operating case. $WHEAT$ is the total cylinder wall heat transfer and $WHEAT_0$ is the corresponding value for the baseline configuration.
A similar approach was also used by Senecal et al. [11] to optimize the chamber with respect to two operating modes A and B: the merit function was calculated with respect to each mode ($f_A$ and $f_B$) and the two values were then combined with the following expression:

$$f = \left[ 0.5 \left( \frac{1}{f_A} + \frac{1}{f_B} \right) \right]^{-1} \qquad (20.3)$$
Compared with simply averaging the individual merit function values, this expression has the property of weighting the overall merit value more heavily by the lower of $f_A$ and $f_B$. In this way, the formulation does not allow a design with a high value of $f_A$ and a very low value of $f_B$ to falsely obtain a high value of the overall merit function. The individual merit function values from both operating conditions must be reasonable to obtain a high value of $f$.
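This weighting behavior is easy to check numerically. The sketch below assumes the combination is the inverse of the averaged reciprocals (a harmonic-mean form consistent with the description): a lopsided pair of mode merits scores far below its plain average.

```python
def combined_merit(f_a, f_b):
    # Inverse of the averaged reciprocals: dominated by the lower value.
    return 1.0 / (0.5 * (1.0 / f_a + 1.0 / f_b))

balanced = combined_merit(500.0, 500.0)   # equal merits are preserved
lopsided = combined_merit(990.0, 10.0)    # ~19.8, far below the plain average of 500
```

A design that performs well in only one operating mode is therefore heavily penalized, even if its arithmetic-mean merit would look attractive.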
Starting from 2008, the ERC group [23] considered a multi-objective approach using the NSGA-II optimization algorithm. In [14] the EGOMOP (Efficient Global Optimization for Multi-Objective Problems) approach is used, in which each objective function is converted into its Expected Improvement and this value is used as the fitness in the multi-objective optimization problem. The k-means clustering method is adopted for the efficient selection of additional sample points in high-dimensional problems: not all the Pareto points are used as additional sample points, but only a reduced number is selected. In this study, the optimization was performed with four objective functions: soot, NO, CO and thermal efficiency.
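For a single objective, the Expected Improvement of a Gaussian surrogate prediction has a closed form; a minimal sketch for minimization is given below. Here `mu` and `sigma` are the surrogate mean and standard deviation at a candidate point and `y_best` is the best value sampled so far; these are the standard EGO quantities, not symbols taken from [14].

```python
import math

def expected_improvement(mu, sigma, y_best):
    # EI for minimization: E[max(y_best - Y, 0)] with Y ~ N(mu, sigma^2).
    if sigma <= 0.0:
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (y_best - mu) * cdf + sigma * pdf
```

Points with a poor predicted mean but a large predictive uncertainty can still receive a high EI, which is what makes the criterion suitable for choosing where to spend additional expensive simulations.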
In the approach followed at the CREA, the optimization goals (soot, NOx, HC and Indicated Mean Effective Pressure, IMEP) are kept separate and a fitness component is evaluated for each of them:

$$F_1 = \sum_{i=1}^{N_{modes}} F_{1p}(i)\, w_1(i) \left( \frac{NO_{x,0}}{NO_x} \right)_i \qquad (20.4)$$

$$F_2 = \sum_{i=1}^{N_{modes}} F_{2p}(i)\, w_2(i) \left( \frac{soot_0}{soot} \right)_i \qquad (20.5)$$

$$F_3 = \sum_{i=1}^{N_{modes}} F_{3p}(i)\, w_3(i) \left( \frac{HC_0}{HC} \right)_i \qquad (20.6)$$

$$F_4 = \sum_{i=1}^{N_{modes}} F_{4p}(i)\, w_4(i) \left( \frac{IMEP}{IMEP_0} \right)_i \qquad (20.7)$$
Note that in this case the optimization of the combustion chamber is treated as a maximization problem. In these equations, subscript 0 refers to the values obtained with the baseline configuration, while $w_j(i)$, $j = 1, \ldots, 4$, represents the weight of mode $i$ for the definition of objective $j$ (1 = NOx, 2 = soot, 3 = HC and 4 = IMEP); $N_{modes}$ is the total number of modes considered in the application and $F_{jp}(i)$, $j = 1, \ldots, 4$, is the value of the penalty function for the fitness component $j$ calculated on mode $i$. Penalty functions are used to introduce inequality constraints in the optimization.
T. Donateo
This approach also allows secondary optimization goals to be taken into account as constraints instead of goals, so that the complexity of the problem is not excessively increased by considering more than four fitness components. More details on this approach are given in the last section of this chapter, which describes a case study performed with this approach.
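Each fitness component of eqs. 20.4-20.7 is a weighted sum, over the operating modes, of a penalized baseline-to-current ratio. A minimal Python sketch (illustrative names, not the CREA code; the penalty and weight vectors are assumed to be given per mode):

```python
def fitness_component(penalty, weight, baseline, current, maximize_output=False):
    """One fitness component of eqs. 20.4-20.7, summed over the modes.
    Emissions use the ratio baseline/current (lower emissions -> higher
    fitness); IMEP uses current/baseline (maximize_output=True)."""
    total = 0.0
    for i in range(len(weight)):
        ratio = current[i] / baseline[i] if maximize_output else baseline[i] / current[i]
        total += penalty[i] * weight[i] * ratio
    return total

# e.g. halving NOx on both of two equally weighted, unpenalized modes:
F1 = fitness_component([1.0, 1.0], [0.5, 0.5], baseline=[2.0, 4.0], current=[1.0, 2.0])
```

Because the ratios are taken against the baseline, all four components equal roughly 1 for the baseline chamber and grow as the design improves, which is consistent with treating the problem as a maximization.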
and NOx emissions and increasing IMEP by changing the injection pressure and the advance of the single injection. This problem is called the Diesel dilemma since retarding the injection decreases NOx while soot increases. On the other hand, increasing the injection pressure improves soot emissions and IMEP but worsens NOx emissions. GA-CREA was compared with three optimization tools available in the ModeFrontier environment: MOGAII, MOSA and MACK.
For the quantitative analysis of the optimization algorithms, four metrics available in the literature [26] were considered and applied to the Pareto fronts obtained with GA-CREA, MOGAII, MOSA and MACK. The results of this test showed that GA-CREA performs very well in terms of distribution and definition of the Pareto front. Even if the number of points on the Pareto front is lower than in the case of MOGAII and MOSA, only a small percentage of these points are dominated by the solutions found with MOGAII, MOSA and MACK.
Another genetic algorithm named HiPeGEO (High Performance Genetic Algorithm for Engine Optimization) was developed by the CREA in collaboration with the High Performance Computing group at the University of Salento. The HiPeGEO algorithm will be described in the test case section since it was specifically developed for that application.
that differences in the performance of the analyzed chambers could derive not only from the chamber specification but also from the architecture used to simulate that specific chamber. A possible way to solve this problem is to use scaled objective functions, as in eqs. 20.4-20.7, where the baseline data used for the comparison are calculated on the same architecture where the current design is being evaluated. A second difficulty arises when commercial codes are used. In this case, the possibility to distribute the computational load over the grid is limited by the number of available licenses. Thus, each resource of the grid has to be connected with the license manager server and ask for the availability of a license to run an evaluation with the simulation code.
Engine and injection system specification:

                                          unit     value
Displacement                              [cm3]    420
Compression ratio                         -        17.2
Intake valve closing, crank angle BTDC    [deg]    134
Injection system                          -        Common Rail
Holes diameter                            [mm]     0.145
Number of holes                           -        7
Range of variation of the design parameters (table 20.3):

         unit     lower limit    upper limit
Xe       [mm]     15.0           34.0
alfa     [deg]    -90            90
beta     [deg]    45             90
r        [mm]     2.0            14.0
E        [mm]     1.0            5.0
For the present application, bore, stroke, squish volume and compression ratio
were kept constant and equal to the baseline configuration chosen for reference.
Therefore, the bowl volume was the same for all the analyzed combustion chambers.
The simulation of the engine behavior when changing the bowl shape was performed
with the CFD code KIVA3V-CREA while the optimization was performed with the
HiPeGEO algorithm, specifically developed for this application.
(Table 20.4, operating modes; the column headings were lost in extraction)

mode 1: 1500, 4.3, 0.5, 0
mode 2: 2000, 8.0, 0.5, 0
mode 3: 3000, 25.0, 0, 0.5
mode 4: 5300, 20.5, 0, 0.5
The IMEP was considered as the main objective at full load, but it has to be taken into account at low load and speed too. In fact, IMEP values can be very low, the injected fuel being the same, if the completeness of the combustion process is somehow prevented. In spite of their low performance, these chambers could be considered good solutions by the GA because they produce very low levels of NOx. For this reason, a penalty function was used to penalize chamber configurations with low IMEP values at low speed and load. If the current chamber gives an IMEP value higher than the baseline configuration, the penalty function is set equal to 1 and no penalization is given to the chamber. Otherwise, the chamber is slightly penalized if the reduction of IMEP is less than 8%, while the penalization is much higher when the reduction is greater than 8% with respect to the baseline case. The same criterion was applied at full load to penalize chamber configurations with soot emissions higher than a prefixed threshold value; in the present investigation the value soot_ths = 0.78 g/kg_f was considered (baseline value at operating mode 4). Mathematically, the penalty functions used in this test case can both be described in the following way:
F_p(i) = \begin{cases} 1 & p \ge 1.0 \\ 1.0 + \log(p)/10 & 0.92 \le p < 1.0 \\ p + 0.07 & p < 0.92 \end{cases}    (20.8)
where p is the penalty parameter, i.e. p = IMEP/IMEP_0 at modes 1 and 2 (applied to fitness components 1 to 3) and p = soot_ths/soot at modes 3 and 4 (applied to the fourth fitness component). The choice of this formulation derives from experience in the automotive industry, according to which variations up to 8% can be regained by re-mapping the engine.
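Eq. 20.8 translates to a small piecewise function. A sketch follows; note that the base of the logarithm is not stated in the source (natural log is assumed here), and the function name is illustrative:

```python
import math

def penalty_function(p):
    """Piecewise penalty of eq. 20.8: no penalty at or above the baseline
    (p >= 1), a mild logarithmic penalty down to an 8% loss, and a much
    stronger linear penalty below that."""
    if p >= 1.0:
        return 1.0
    elif p >= 0.92:
        return 1.0 + math.log(p) / 10.0  # natural log assumed
    else:
        return p + 0.07
```

At modes 1 and 2 the argument is p = IMEP/IMEP_0, at modes 3 and 4 it is p = soot_ths/soot, as stated in the text; the three branches join almost continuously at p = 1 and p = 0.92.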
20.6.3 HiPeGEO
In the HiPeGEO algorithm, the micro-GA technique proposed by Coello and Pulido [29] is implemented. HiPeGEO automatically executes the preprocessor, the solver and the postprocessor of KIVA3V-CREA, as described in the next subparagraphs. HiPeGEO has been developed specifically for this application and uses Grid Technologies to reduce computational time. The complexity of the system is hidden from the final user, who accesses the Grid through a user-friendly web interface to solve a specific optimization problem. The flowchart of HiPeGEO is shown in fig. 20.8.
HiPeGEO is structured on two levels: an external level, where a certain number of macro-iterations are performed, and an internal one, which is represented by a micro-GA cycle. The external iteration uses a large population divided into a replaceable and a non-replaceable portion (the latter generated randomly only once at the beginning of
the optimization). The non-replaceable portion is never updated and represents the source of diversity for the micro-GA cycles.
At each iteration, a micro population is randomly extracted from both the replaceable and non-replaceable portions and a micro-GA cycle is performed until the
nominal convergence is reached. Then, the external memory containing the Pareto
front (initially empty) is upgraded (as described in the elitism module section) with
the nominal solution of the micro-GA and a new iteration is started. The size of the
external memory is limited (filter block of figure 20.8) to a threshold value selected
by the user with the clustering procedure described in the clustering module section.
The micro-GA cycle is performed through the generation, fitness, rank, selection, crossover, mutation, nominal convergence and elitism modules described next.
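The two-level structure just described can be sketched as follows. This is a schematic of the external (macro-iteration) level only; run_micro_ga and update_external_memory are hypothetical stand-ins for the modules described in the next subsections:

```python
import random

def hipegeo_outer_loop(n_iterations, replaceable, non_replaceable, micro_size,
                       run_micro_ga, update_external_memory):
    """External level of the micro-GA: at each macro-iteration a micro
    population is drawn from both portions, a micro-GA cycle is run to
    nominal convergence, and its nominal solution upgrades the external
    memory holding the current Pareto front."""
    external_memory = []  # Pareto front, initially empty
    for _ in range(n_iterations):
        pool = replaceable + non_replaceable  # non-replaceable portion never updated
        micro_population = random.sample(pool, micro_size)
        nominal_solution = run_micro_ga(micro_population)
        update_external_memory(external_memory, nominal_solution)
    return external_memory
```

Keeping the non-replaceable portion frozen is the design choice that lets a very small micro population converge quickly without losing global diversity.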
20.6.3.1 Generation Module
The generation module provides a list of chromosomes such as:

xe=0.361762 alfa=0.695257 beta=1.547302 r=0.361379 em=1.108602
xe=0.863116 alfa=4.800000 beta=0.695257 r=0.937501 em=0.361379
xe=0.361762 alfa=1.326748 beta=4.800000 r=0.361379 em=0.863116
xe=1.547302 alfa=4.800000 beta=4.800000 r=4.800000 em=0.408767
For each gene, a suitability analysis is performed to verify that the corresponding
geometrical parameter is included in the range of variation of table 20.3.
20.6.3.2 Fitness Module
The fitness module calculates the fitness values corresponding to a list of chromosomes. For the test case, the fitness module executes four runs with the KIVA3V code (one for each mode of table 20.4) and produces a list of chromosomes and their fitness values as follows:
xe=0.361762 alfa=0.695257 beta=1.547302 r=0.361379 em=1.108602 fit1=15.766697 fit2=0.001982 fit3=0.404997 fit4=0.078914
xe=0.863116 alfa=4.800000 beta=0.695257 r=0.937501 em=0.361379 fit1=7.673714 fit2=0.001166 fit3=0.106391 fit4=0.160870
xe=0.361762 alfa=1.326748 beta=4.800000 r=0.361379 em=0.863116 fit1=0.000000 fit2=0.000000 fit3=0.000000 fit4=0.000000
xe=1.547302 alfa=4.800000 beta=4.800000 r=4.800000 em=0.408767 fit1=18.446617 fit2=0.002777 fit3=0.000000 fit4=0.210047
.....
the maximum cell size was set equal to 2.2 mm, 1.4 mm and 2 deg after a sensitivity analysis performed to test the influence of grid resolution on KIVA3V results [13].
If the K3PREP pre-processor is able to generate a structured mesh with a good
aspect ratio for the candidate chamber, an output file named itape17, containing
geometric solution information, is passed to the KIVA3V application which simulates the combustion process and generates the fitness file, named fort.65. On the
contrary, if a good mesh cannot be obtained, the corresponding chamber profile is
excluded from the optimization process. Other input files are needed for KIVA3V
execution, such as itape5erc, itapei, itape5, etc., containing initial conditions, constants of spray combustion models and so on.
At the end of each KIVA3V-CREA run, the emission levels and the IMEP are stored and used at the end of the four runs to calculate the penalty functions and the fitness functions (eqs. 20.4-20.7).
20.6.3.3 Rank Module
The rank module provides a list of rank values for the individuals belonging to a population to be ranked, as required by the selection module. To rank individuals, the approach developed by Fonseca [30] has been followed in HiPeGEO. In particular, the rank r(j) of an individual j is defined by the number of fitness vectors by which F(j) is dominated, increased by 1. If F(x) is the fitness vector associated with solution x, F(y) is the fitness vector of individual y, and the goal is the maximization of all the fitness components of F, then F(y) is said to dominate F(x) if condition 20.9 is verified:

\forall i\, \left( F_i(x) \le F_i(y) \right) \ \wedge\ \exists i\, \left( F_i(x) < F_i(y) \right)    (20.9)
If a vector is not dominated by any other, it is called non-dominated or non-inferior and the corresponding design is said to belong to the Pareto front.
In this way the Pareto solutions, which are the best individuals in a multi-objective problem, have a rank equal to one, while the worst solution has a rank equal to the population size (N). The pseudo-code of the rank module for the case study, characterized by 4 fitness functions to be maximized, is the following:
For i = 1 to N
    Rank(i) = 1
    For m = 1 to N
        If ((F1(i)-F1(m) <= 0) and (F2(i)-F2(m) <= 0) and
            (F3(i)-F3(m) <= 0) and (F4(i)-F4(m) <= 0)) then
            If ((F1(i)-F1(m) < 0) or (F2(i)-F2(m) < 0) or
                (F3(i)-F3(m) < 0) or (F4(i)-F4(m) < 0)) then
                ' individual i is dominated by individual m
                Rank(i) = Rank(i) + 1
            End If
        End If
    Next m
Next i
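The pseudo-code above translates directly to Python for an arbitrary number of fitness components to be maximized (an illustrative translation, not the HiPeGEO source):

```python
def fonseca_rank(fitness):
    """Rank of individual i = 1 + number of fitness vectors by which its
    vector is dominated (Fonseca's scheme, all components maximized)."""
    n = len(fitness)
    rank = [1] * n
    for i in range(n):
        for m in range(n):
            all_leq = all(a <= b for a, b in zip(fitness[i], fitness[m]))
            any_lt = any(a < b for a, b in zip(fitness[i], fitness[m]))
            if all_leq and any_lt:  # individual i is dominated by m
                rank[i] += 1
    return rank
```

Non-dominated (Pareto) individuals receive rank 1; an individual dominated by every other one receives rank N.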
20.6.3.4 Selection Module
The selection module provides a list of pairs of selected chromosomes according to the rank selection method. Generally speaking, the rank selection method is preferable to the roulette wheel when there are large differences among the fitness
values of a population. In fact, with the rank method all the chromosomes have a chance to be selected, even those with all-zero fitness values because they were unable to generate a good-quality mesh. Of course, the rank method is particularly suitable for this application, where the solutions are ranked in a multi-objective fashion.
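A possible rank-based selection rule (an illustrative scheme consistent with the description above, not necessarily the exact HiPeGEO one) assigns each individual a selection weight that decreases with its rank but never reaches zero:

```python
import random

def rank_select_pair(population, ranks):
    """Select a pair of parents with probability decreasing with rank.
    Even the worst-ranked individuals (e.g. all-zero fitness from a
    failed mesh) keep a nonzero chance of being selected."""
    worst = max(ranks)
    weights = [worst - r + 1 for r in ranks]  # best rank -> largest weight
    return random.choices(population, weights=weights, k=2)
```

Because the weights depend only on the ordering and not on the raw fitness values, large fitness gaps (including zeros) cannot starve any individual of selection pressure, which is the advantage over the roulette wheel noted in the text.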
20.6.3.5 Crossover and Mutation Module
The crossover and mutation module provides a list of offspring generated after the crossover and mutation operations. Like the generation module, it verifies the suitability of the offspring. The selection of the crossover and mutation techniques to be applied is a crucial factor. For the present case study, the uniform crossover has been chosen since experience has shown its stability compared with other crossover methods and its efficiency when applied to chromosomes with few genes. The initial crossover probability Pci is specified by the user (as a parameter of the HiPeGEO application). After the first iteration, the probability changes taking into account the degree of stability of the Pareto front and the number of iterations already executed.
The crossover probability P_c(i), referred to the ith iteration, is given by (20.10):

P_c(i) = \begin{cases} P_{ci} & i = 1 \\ P_{ci} - \left( a\, S_{pf}(i) + b\, i \right) & i > 1 \end{cases}    (20.10)
where a and b are two positive coefficients selected by the user and S_pf(i) measures the stability of the Pareto front.
The distance between the fronts at the ith and (i-1)th iterations, d(i), is defined as the average of the Euclidean distances between couples of points with the same index j belonging to the two fronts (20.11):

d(i) = \frac{1}{N(i)} \sum_{j=1}^{N(i)} \left\| P_j(i) - P_j(i-1) \right\|    (20.11)
If \Delta N(i) is the difference between the number of points belonging to the two fronts, and \bar{d} is the optimal distance (fixed by the user) between two fronts in order to consider them very similar, then S_pf(i) is given by eq. 20.12:

S_{pf}(i) = \begin{cases} 0 & \Delta N(i) \ne 0 \ \text{or}\ d(i) > \bar{d} \\ 1 - d(i)/\bar{d} & \Delta N(i) = 0,\ d(i) \le \bar{d} \end{cases}    (20.12)
The mutation probability P_m(i) at the ith iteration is given by (20.13):

P_m(i) = P_{mi} + c\, \frac{i}{N_{max}}    (20.13)
where P_mi is the initial mutation probability and c is a positive coefficient, both selected by the user. When the user specifies P_mi and the coefficient c, the system verifies that P_m(i) belongs to the [0, 1] interval for all i, that is c\, i / N_{max} \le (1 - P_{mi}) for all i, and hence c \le (1 - P_{mi}), N_max being the maximum value of i.
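The front-stability measure (eqs. 20.11-20.12) and the adaptive mutation probability (eq. 20.13) can be sketched as below. The exact form of eqs. 20.10 and 20.12 is partly garbled in the source, so the stability definition here is an assumption, and all names are hypothetical:

```python
import math

def front_distance(front_new, front_old):
    """Eq. 20.11: average Euclidean distance between points with the same
    index j on the fronts of two consecutive iterations (equal sizes)."""
    total = 0.0
    for p, q in zip(front_new, front_old):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return total / len(front_new)

def stability(front_new, front_old, d_bar):
    """Eq. 20.12 as read here: zero unless the two fronts have the same
    number of points and are closer than the user-fixed distance d_bar."""
    if len(front_new) != len(front_old):
        return 0.0
    d = front_distance(front_new, front_old)
    return 1.0 - d / d_bar if d <= d_bar else 0.0

def mutation_probability(p_mi, c, i, n_max):
    """Eq. 20.13: mutation probability growing linearly with iteration i.
    The constraint c <= 1 - p_mi keeps it within [0, 1]."""
    return p_mi + c * i / n_max
```

Note how the stability measure approaches 1 only when two consecutive fronts coincide, while the mutation probability grows with the iteration counter regardless of the front behaviour.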
20.6.3.6 Nominal Convergence Module
The nominal convergence module provides the nominal solution of the micro-GA if
the convergence has been achieved.
In the present work, both convergence criteria suggested by Coello and Pulido
have been implemented. The module verifies the similarity among the chromosomes
belonging to the new population provided as input. For each geometric parameter
(gene), a range of similarity is fixed by the user. The nominal convergence can be
considered reached when the difference among the same genes is within the range
of similarity for all analyzed individuals. Once the similarity criterion is satisfied, a
representative individual is selected as nominal solution. However, if convergence
is not reached after a fixed number of cycles, the micro-GA execution stops and the
individuals are ranked to select the nominal solution. If two or more solutions have
the same rank, one of them is randomly selected.
The nominal solution is then copied in the external memory.
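The similarity criterion described above can be sketched as follows (an illustrative check with hypothetical names; each individual is a tuple of genes and similarity_range gives the user-fixed range per gene):

```python
def nominal_convergence(population, similarity_range):
    """Convergence is reached when, for every gene, the spread of its
    values across all individuals is within the range of similarity
    fixed by the user for that gene."""
    n_genes = len(population[0])
    for g in range(n_genes):
        values = [ind[g] for ind in population]
        if max(values) - min(values) > similarity_range[g]:
            return False
    return True
```

When this check succeeds, a representative individual can be taken as the nominal solution, as described in the text.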
20.6.3.7 Elitism Module
Clustering Module
After each iteration, the clustering module verifies whether the number of solutions stored in the external memory, which contains the Pareto front, exceeds the maximum size (n) defined by the user during the submission of the optimization process. When this happens, it is necessary to exclude some individuals, preserving a uniform
the injected mass is the same for all the chambers at the same operating mode, an
increase of IMEP by 15% corresponds to reducing fuel consumption by the same
amount. To better analyze the results, a clustering process has been performed on the
Pareto front with the same algorithm described in the nominal convergence module
section.
20.6.5.1
The final Pareto front has been clustered in five groups and the outcome of the clustering process is shown in fig. 20.10. The data in fig. 20.10 can be used to analyze the effect of the combustion chamber not only on each output parameter but also on each operating condition.
Note that the first four combustion chambers (C1-C4), very deep and with a narrow throat, allow a strong reduction of NOx emissions on almost all the operating modes, while the fifth chamber is worse than the baseline chamber at high load (modes 3 and 4). On the other hand, soot emissions seem to be more affected by the mode. In fact, the behavior of the selected chambers changes with engine speed and load. In particular, chambers C1 and C2 (with a narrow throat and a large bottom) perform better than the baseline chamber at modes 1 and 2. At high load and speed (mode 3 and mode 4) a very large and shallow chamber is required to reduce particulate (see the results of chamber C5). Chambers C3 and C4 are characterized by very high values of soot emission on all modes.
As far as IMEP is concerned, note that this output value is weakly influenced by the combustion chamber shape at modes 1 and 2. The best results are obtained with chamber C5, which is very similar to the baseline configuration except for the shape of the throat lip and the height of the central protrusion. Among the five chambers of fig. 20.10, chambers C1 and C5 can be said to guarantee the best compromise among the optimization goals; for this reason they were selected for further investigation so as to explain the influence of bowl geometry on emissions and performance. The results of this investigation can be found in [12]. The choice between C1 and
C5 depends on the relative importance given by the user to the emissions in the urban driving cycle (modes 1 and 2) with respect to high-load conditions. In the first case chamber C1 is to be preferred, while C5 is the best solution at high speed and load.
20.6.6 Conclusions
This chapter analyzes different approaches described in the literature for the optimization of direct injection diesel combustion chambers by means of evolutionary algorithms and Computational Fluid Dynamics (CFD) codes. The chapter focused principally on the investigations of two academic research centers: the ERC at the University of Wisconsin-Madison and the CREA at the University of Salento, Italy. The different approaches of these centers were compared in terms of the specification of the algorithm used for the optimization, the kind of parameterization of the chamber profile, the treatment of the competitive goals, etc. Finally, a test case developed at the CREA is presented to illustrate how these aspects have been practically addressed in the design of a common rail diesel engine piston in collaboration with a major European automotive company.
From the analysis of the test case presented here and from the application of
the method to several other engine optimization problems, the following general
considerations can be drawn.
The key factor in the optimization of a complex system like a diesel engine is, of course, the capability of the simulation model to capture the behavior of the engine. Even if this aspect is not directly connected with the optimization process, it must always be kept in mind when addressing the optimization of an engine. In particular, the optimization process must always be preceded by a validation of the model with appropriate experimental data.
Of course, a compromise must be sought between accuracy and computational load. The behavior of an engine when changing its control parameters (like EGR, injection rate, etc.) can be captured with acceptable accuracy even with simplified models like ANNs and 1-D simulation codes. On the other hand, when the goal of the optimization is to change the design parameters (spray orientation with respect to the cylinder axis, compression ratio, squish ratio, bowl profile), simple models cannot be used, since these aspects can only be addressed by complex CFD codes. Experience suggests that small differences in the bowl parameters (and often in the bowl profile) can produce very large differences in the behavior of the engine in terms of pollutant emissions.
Moreover, the optimization of the engine design requires the development of specific mesh generator tools for each CFD simulation code. In the literature, the level of complexity of these tools has increased over time to allow a larger variety in the shape of the bowl, a better quality of the mesh in terms of resolution and aspect ratio, the possibility to generate specific file formats like IGES, PATRAN, etc., and the necessity to also simulate the solid domain with FEM codes.
Another proof of the particular challenges in optimizing the design parameters has been obtained at the CREA when one of the chambers optimized for a specific
engine was tested on a second engine. When adapted to a slightly larger cylinder diameter and a smaller compression ratio, that bowl profile produced higher emission levels than the baseline chamber of the second engine. Note that this was not due to a lack of accuracy of the simulation code, since it was able to predict the behavior of the two engines without any tuning of the models. Thus, the optimization had to be repeated for the second engine to obtain a reduction of emissions for that engine.
The validation of the CFD model also includes a sensitivity study to assess the lowest mesh resolution compatible with the desired accuracy, and this resolution has to be preserved for all the investigated chamber designs. Once the model has been validated, it is preferable to orient the optimization to the improvements achievable with respect to a baseline case and not to the achievement of specific target values of the goals. In fact, even if validated, the model always introduces some errors in the evaluation of the engine behavior when changing the boundary conditions with respect to those considered in the validation. Thus, the absolute values of the output variables can be meaningless. On the other hand, a good engine model is expected to capture the trend of the output when changing the value of the design variables. For this reason, the author suggests comparing the output of each design with that of the baseline conditions as simulated with the same simulation code and on the same computing architecture. In this way, it is also possible to directly monitor the optimization process by analyzing in real time the improvement achieved with respect to the initial configuration.
An experimental validation of the method is also useful at the end of the optimization. For the test case presented in this chapter, one of the optimized chambers was built and tested at operating conditions similar to those used in the optimization. The experimental results showed that the optimized chamber was effective in reducing soot and HC emissions, and the measured reduction of soot was higher than the calculated one (up to 50% for mode 2). As far as NOx emissions are concerned, a better NOx-soot trade-off was also obtained. However, the experimental validation also revealed another important aspect to be kept in mind. As discussed in the introduction, the author chose to keep the control parameters constant, in particular the injection strategy, and to focus the optimization only on the design parameters of the combustion chamber. When tested with injection strategies different from those used in the optimization, the optimized chamber didn't perform better than the baseline one with the same strategy. This stresses the strong interaction between the air flow field (generated by the chamber profile) and the spray distribution (resulting from the injection specification) in the pollutant formation mechanisms.
In the case of mechanical components like the engine piston, it is also mandatory to consider not only the fluid dynamic behavior but also the mechanical response of the component to the stresses generated by the pressure flow field. A shape of the bowl that is able to reduce emissions but strongly decreases the mechanical resistance of the piston cannot be accepted. The mechanical resistance is one of the secondary output values that can be included in the optimization as a penalty function if it can be evaluated with a specific model. These results of the optimization also highlight the importance of keeping the fitness components separate. In fact, in the choice of the final configuration to be built, a preference has been given
to the reduction of soot (or particulate), since the NOx emissions could be controlled with the use of EGR. Some months after the construction of the optimized chamber, the massive introduction of particulate filters in the diesel automotive market completely changed the optimization scenario, making the reduction of NOx preferable. With the multi-objective approach followed at the CREA, the solution for the new scenario was already available without the necessity to perform a new optimization.
A final consideration can be drawn with respect to the criterion used for the choice of the final configuration. In this investigation, a clustering of the Pareto solutions was performed with respect to the design parameters (genotype) in order to group the chambers with similar geometric characteristics. This approach allowed the effect of the overall combustion chamber aspect to be analyzed with respect to each optimization goal. However, as underlined before, the output values can change significantly even for chambers belonging to the same cluster. From this point of view, it is better to use Multi-Criteria Decision-Making techniques to perform the final choice of the chamber with respect to the output values (phenotype).
References
[1] Saurer, H.: Improvements in and relating to internal combustion engines of the liquid
fuel injection type. Patent N. GB421101 (December 1934)
[2] Heywood, J.B.: Internal Combustion Engine Fundamentals. McGraw-Hill, New York
(1988)
[3] Tsao, K.C., Dong, Y., Xu, Y.: Investigation of flow field and fuel spray in a direct-injection diesel engine via kiva-ii program. SAE Technical Paper 901616 (1990)
[4] Zhang, L., Ueda, T., Takatsuki, T., Yokota, K.: A study of the effect of chamber geometries on flame behavior in a di diesel engine. SAE Technical Paper 952515 (1995)
[5] Mahakul, B., Bolis, D.A., Crane, G.E.: Deep angle injection nozzle and piston having
complementary combustion bowl. Patent N. US5868112 (February 1999)
[6] De Risi, A., Manieri, D., Laforgia, D.: A theoretical investigation on the effects of combustion chamber geometry and engine speed on soot and nox emissions. In: ASME-ICE 1999, Book No. G1127A, vol. 33-1, pp. 51-59 (1999)
[7] Kidoguchi, Y., Sanda, M., Miwa, K.: Experimental and theoretical optimization of combustion chamber and fuel distribution for the low emission di diesel engine. In: 2001
ICE Spring Technical Conference, ASME 2001, vol. ICE-36-2 (2001)
[8] Lisbona, M.G., Olmo, L., Rindone, G.: Analysis of the effect of combustion bowl geometry of a di diesel engine on efficiency and emissions. In: Desantes, J.-M., Whitelaw,
J.H., Payri, F. (eds.) Thermo- and Fluid-dynamic Processes in Diesel Engines: Selected
Papers from the THIESEL 2000 Conference Held in Valencia, Spain, September 13-15
(2000)
[9] Senecal, P.K., Reitz, R.D.: Simultaneous reduction of engine emissions and fuel
consumption using genetic algorithms and multidimensional spray and combustion
modeling. SAE Technical Paper 2000-01-1890 (2000)
[10] De Risi, A., Donateo, T., Laforgia, L.: Optimization of the combustion chamber of direct
injection diesel engines. SAE Technical Paper 2003-10-1064 (2003)
[11] Senecal, P.K., Pomraning, E., Richards, K.: Multi-mode genetic algorithm optimization of combustion chamber geometry for low emissions. SAE Technical Paper 2002-01-0958 (2002)
[12] De Risi, A., Donateo, T., Laforgia, D., Aloisio, G., Blasi, E.: An evolutionary methodology for the design of a d.i. combustion chamber for diesel engines. In: THIESEL 2004
Conference on Thermo-and Fluid-Dynamic Processes in Diesel Engines (2004)
[13] De Risi, A., Donateo, T., Laforgia, D.: A new advanced approach to design diesel engines. International Journal of Vehicle Design 41(1/2/3/4), 165-187 (2006)
[14] Jeong, S., Minemura, Y., Obayashi, S.: Optimization of combustion chamber for diesel engine using kriging model. Journal of Fluid Science and Technology 1(2), 138-146 (2006)
[15] De Risi, A., Donateo, T., Nobile, F., Vadacca, G., Vedruccio, D.: Fluid dynamics and structural behavior of optimized combustion chamber profiles. In: International Conference on CAE and Computational Technologies for Industry, Mestre, Italy, October 16-17 (2008)
[16] Padula, S.L., Korte, J.J., Dunn, H.J., Salas, A.O.: Multidisciplinary optimization branch
experience using isight software. In: 1999 International iSIGHT Users Conference
(1999)
[17] Carrozza, R., Donateo, T., Laforgia, D.: Effect of the combustion chamber profile on
the in-cylinder flow field in a direct injection diesel engine. In: 61 Congresso Nazionale
ATI, Perugia (2006)
[18] Amsden, A.A., O'Rourke, P.J., Butler, T.D.: Kiva ii - a computer program for chemically reactive flows with sprays. Los Alamos National Labs, LA-11560-MS (1989)
[19] Amsden, A.A.: Kiva 3 - a kiva program with block-structured mesh for complex geometries. Los Alamos National Labs (1989)
[20] Genzale, C., Wickman, D., Reitz, R.D.: An advanced optimization methodology for
understanding the effects of piston bowl design in late injection low-temperature
diesel combustion. In: International Multidimensional Engine Modeling Users Group
Meeting (2006)
[21] Wakisaka, T., Takeuchi, S., Imamura, F., Ibaraki, K., Isshiki, T.: Numerical analysis of diesel spray impinging on combustion chamber walls by means of a discrete droplet liquid-film model. In: Proceedings of COMODIA, pp. 462-492 (1998)
[22] Subramaniam, M., Ruman, M., Reitz, R.D.: Reduction of emissions and fuel consumption in a 2- stroke direct injection engine with multidimensional modelling and an evolutionary search technique. SAE Technical Paper 2003-01-0544 (2003)
[23] Shi, Y., Reitz, R.D.: Assessment of optimization methodologies to study the effects of
bowl geometry, spray targeting and swirl ratio for a heavy-duty diesel engine operated
at high-load. SAE Technical paper 2008-01-0949 (2008)
[24] De Risi, A., Donateo, T., Laforgia, D.: Optimization of high pressure common rail
electro-injector using genetic algorithms. SAE Technical Paper 2001-10-1980 (2003)
[25] De Risi, A., Donateo, T., Laforgia, D.: Choosing an evolutionary algorithm to optimize diesel engines. In: International Conference on CAE and Computational Technologies for Industry, Lecce, Italy (2005)
[26] Lee, S., Von Allmen, P., Fink, W., Petropoulos, A.E., Terrile, R.J.: Comparison of multiobjective genetic algorithms in optimizing q-law low-thrust orbit transfers. In: GECCO
2005, June 25-29 (2005)
[27] Luna, F., Nebro, A.J., Alba, E.: Observations in using grid-enabled technologies for solving multi-objective optimization problems. Parallel Computing 32(5), 377-393 (2006)
20
541
[28] De Risi, A., Donateo, T., Zurlo, S., Laforgia, D.: 3d simulation and experimental validation of high egr-phcci combustion. In: Proceedings of ICE-Capri 2007 (2007)
[29] Coello, C.A., Pulido, G.T.: Multiobjective optimization using micro-genetic algorithm. In: Genetic and Evolutionary Computation Conference (GECCO 2001), pp. 274-282. Morgan Kaufmann, San Francisco (2001)
[30] Fonseca, C.M., Fleming, P.J.: Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization. In: Fifth International Conference on Genetic Algorithms, pp. 416-423 (1993)
Chapter 21
Abstract. This chapter presents the design of a space mission at a preliminary stage, when uncertainties are high. At this particular stage, an insufficient consideration of uncertainty could lead to a wrong decision on the feasibility of the mission. Contrary to the traditional margin approach, the methodology presented here explicitly introduces uncertainties in the design process. The overall system design is then optimised, minimising the impact of uncertainties on the optimal value of the design criteria. Evidence Theory, used as the framework to model uncertainties, is presented in detail. Although its use in the design process would greatly improve the quality of the design, it increases significantly the computational cost of any multidisciplinary optimisation. Therefore, two approaches to tackle an Optimisation Problem Under Uncertainties are proposed: (a) a direct solution through a multi-objective optimisation algorithm and (b) an indirect solution through a clustering algorithm. Both methods are presented, highlighting the techniques used to reduce the computational time. It will be shown in particular that the indirect method is an attractive alternative when the complexity of the problem increases.
21.1 Introduction
In the early phase of the design of a space mission, it is generally desirable to investigate as many feasible alternative solutions as possible. At this particular stage, an insufficient consideration of uncertainty could lead to a wrong decision on the feasibility of the mission. Traditionally, a system margin approach is used in order to take into account the inherent uncertainties related to the computation of the
system budgets. The reliability of the mission is then independently computed in
parallel. An iterative, though integrated, process between the solution design and
the reliability assessment should finally converge to an acceptable solution. This
chapter describes a way to model uncertainties and introduce them explicitly in the
design process. The overall system design is then optimised, minimising the impact
of uncertainties on the optimal value of the design criteria. The minimisation of the
Massimiliano Vasile · Nicolas Croisard
University of Glasgow, Department of Aerospace Engineering,
James Watt Building, Glasgow, G12 8QQ, United Kingdom
e-mail: {m.vasile,n.croisard}@aero.gla.ac.uk
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 543–570.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
In these two approaches the solution of the OUU problem is addressed through a global optimisation procedure based on evolutionary computation. In particular, in the direct approach the OUU problem is directly solved in an attempt to reconstruct the set of all the Pareto optimal solutions (or a good approximation of it) that maximise the Belief and optimise the cost functions for all the disciplines. The indirect approach is proposed to mitigate the computational cost related to the use of Evidence Theory (in particular the exponential growth of the combinations of uncertainty intervals). The indirect approach first tries to find the sets of solutions for which the system budgets are within some required values, then it intersects these sets with the intervals of uncertainty for the design parameters. The resulting set is a superset of the Pareto optimal one (i.e. it contains the Pareto one).
The preliminary design is here performed by using reduced models for trajectory analysis and system design. When a reduced model is not available, we propose the use of surrogate models made of Kriging response surfaces (e.g. evaluating the shape of a heat shield may require running a CFD code, in which case every variation in shape would require minutes to hours of computational time). Although the methodology is of general applicability to mission design problems, it is here intended for the preliminary combined design of interplanetary trajectories in which system-level parameters play a significant role, such as low-thrust or low-thrust multi-gravity-assist transfers. The chapter will present the solution of the OUU problem associated with the robust design of a real space mission.
the parameters defining the characteristics of spacecraft subsystems are not known a priori and their value cannot be computed because it depends on other unknown parameters. Therefore their value has to be first estimated on the basis of previous experience or educated guesses by a group of experts. The uncertainty associated with those parameters is therefore epistemic.
The classical way to treat uncertainty is through probability theory. A probability density function is well suited to mathematically model aleatory uncertainties, as long as enough data (experimental for instance) are available [1]. Even then, the analyst still has to assume the distribution function and estimate its parameters. Moreover, Bae et al. [3] pointed out that aleatory uncertainty could in fact be epistemic uncertainty when insufficient data are available to construct a probability distribution. In this situation, alternative distributions can represent the uncertainty, as the mean, variance and shape are unknown [16].
However, probability fails to represent epistemic uncertainties because there is no reason to prefer one distribution function over another [23]. Indeed, probability applies only if one can identify a sample of independent, identically-distributed observations of the phenomenon of interest [24]. When uncertainties are expressed by means of intervals, based on expert opinion or rare experimental data, as is the case in space mission design, this representation becomes questionable. As pointed out by Helton et al. [15], there is a large conceptual difference between saying that all that is known about a quantity is that its value belongs to an interval [a,b] and saying that the probability distribution of that quantity is uniform on [a,b]. The latter statement, in fact, implies an additional piece of knowledge on where the value of that quantity is located in [a,b].
A few modern theories exist to better represent epistemic uncertainties, without the need to make additional assumptions. These include for example interval analysis [13, 19], Possibility Theory [29], Fuzzy Set Theory [11], the Theory of Paradoxical Reasoning [26] and the Theory of Clouds [22]. Evidence Theory is an extension of Possibility and Fuzzy Set Theory [18]; therefore we propose the use of Evidence Theory in the framework of preliminary space mission design.
\[ m(E) > 0, \quad \forall E \in 2^U \tag{21.1} \]
\[ m(E) = 0, \quad \forall E \notin 2^U \tag{21.2} \]
\[ m(\emptyset) = 0 \tag{21.3} \]
\[ \sum_{E \in 2^U} m(E) = 1 \tag{21.4} \]
Therefore, the BPA is a function that maps the power set into [0, 1]. The elements of 2^U are solely defined by their associated BPA being strictly positive, and are commonly called focal elements (FE).
While the set of subsets of the finite sample space in probability theory constitutes a σ-algebra, the power set 2^U does not. This fundamentally distinguishes Evidence Theory from probability theory. Unlike in probability theory, unions and intersections of subsets of U are not necessarily included in the power set. This means that evidence on the event {A or B} or {A and B} does not imply or require information on either of the events {A} and {B}. Moreover, the complement of an element of U is not necessarily in the power set. While P(Ā) = 1 − P(A) is true in probability theory, it does not hold in Evidence Theory.
Therefore, the power set 2^U and the BPA are less structured than their counterparts in probability theory. They aim at representing all and only the pieces of information available to the analyst. This characteristic is fundamental when the analyst needs to make decisions based on poor or incomplete information.
When more than one parameter is considered uncertain (e.g. u1 and u2), the power set is composed of the Cartesian products of all the elements of the power sets of each parameter's frame of discernment: 2^(U1,U2) = 2^U1 × 2^U2. Thus the BPA of a given element of 2^(U1,U2) is the product of the BPAs of the two corresponding focal elements:
\[ \forall (FE_1, FE_2) \in 2^{U_1} \times 2^{U_2}, \quad m_{12}(FE_1 \times FE_2) = m_1(FE_1)\, m_2(FE_2) \tag{21.5} \]
The number of focal elements increases exponentially with the number of uncertain parameters and the number of focal elements of their respective power sets. If N parameters are considered uncertain and n_k represents the number of focal elements of the power set of the kth uncertain parameter, the total number of focal elements is given by:
\[ n_{FE} = \prod_{k=1}^{N} n_k \tag{21.6} \]
This expression is based on the assumption that the different uncertain parameters are independent. This chapter considers that this assumption holds true. For the case of dependent parameters, which is beyond the scope of the present publication, the reader can refer to the work of Ferson et al. [12].
It is worth mentioning that, in general, the pieces of evidence can come from
different sources and need to be combined. As highlighted in [23], the results of an
uncertainty analysis can strongly depend on which combination method is chosen
for use. The choice of the combination rule should be driven principally by the
context of the information to be combined. However, in this chapter we consider
only the case in which the sources of evidence have already been combined.
Given a proposition A, Belief and Plausibility are computed from the BPA as:
\[ Bel(A) = \sum_{FE \subseteq A,\; FE \in 2^U} m(FE) \tag{21.7} \]
\[ Pl(A) = \sum_{FE \cap A \neq \emptyset,\; FE \in 2^U} m(FE) \tag{21.8} \]
Thus, all the propositions with a non-null intersection with the set A contribute to the Pl value, while only the propositions included in A contribute to the Bel value. For example, Fig. 21.1 represents a BPA structure of two uncertain parameters u1 and u2. Parameter u1 can belong to any of the four intervals [a1, b1], [b1, c1], [c1, d1] and [d1, e1], while parameter u2 can belong to the three intervals [a2, b2], [b2, c2] and [c2, d2]. Thus there is a total of twelve focal elements FE1, ..., FE12. Let us define the proposition A as the area within the dashed curve C. Only the focal elements FE1, FE6 and FE10 (gray in the figure) are entirely included in C. In addition, FE2, FE3, FE5, FE7, FE9 and FE11 are partly inside C (dotted in the figure), therefore only partially implying the proposition A. The belief and plausibility of A are then:
\[ Bel(A) = m(FE_1) + m(FE_6) + m(FE_{10}) \]
\[ Pl(A) = Bel(A) + m(FE_2) + m(FE_3) + m(FE_5) + m(FE_7) + m(FE_9) + m(FE_{11}) \]
Belief and Plausibility are related by:
\[ Bel(A) + Bel(\bar{A}) \leq 1 \tag{21.9} \]
\[ Pl(A) + Pl(\bar{A}) \geq 1 \tag{21.10} \]
\[ Pl(A) + Bel(\bar{A}) = 1 \tag{21.11} \]
Fig. 21.1 Belief and Plausibility of proposition A in a given BPA structure of two uncertain
parameters
Fig. 21.2 Interpretation of the relation between Belief, Plausibility and uncertainty
where Ā represents the complement of A. The first two relations show that, contrary to probability, the belief (resp. plausibility) assigned to an event does not uniquely determine the belief (resp. plausibility) of its complement. The last relation means that Pl accounts for the uncertainty, while Bel does not (cf. Fig. 21.2).
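As a concrete illustration of Eqs. (21.7) and (21.8) and of the relation Pl(A) + Bel(Ā) = 1, the following sketch computes Bel and Pl of a box-shaped proposition over a small, made-up two-parameter BPA structure:

```python
# Illustrative 2-parameter BPA structure (intervals and masses are made up).
u1 = [((0.0, 1.0), 0.25), ((1.0, 2.0), 0.75)]
u2 = [((0.0, 1.0), 0.5), ((1.0, 2.0), 0.5)]
focal_elements = [((i1, i2), m1 * m2) for i1, m1 in u1 for i2, m2 in u2]

def included(box, A):
    """box is entirely inside A; boxes are tuples of (lo, hi) intervals."""
    return all(a_lo <= lo and hi <= a_hi
               for (lo, hi), (a_lo, a_hi) in zip(box, A))

def intersects(box, A):
    """box and A overlap with non-empty interior."""
    return all(lo < a_hi and a_lo < hi
               for (lo, hi), (a_lo, a_hi) in zip(box, A))

def bel(A):
    """Eq. (21.7): mass of the focal elements entirely included in A."""
    return sum(m for box, m in focal_elements if included(box, A))

def pl(A):
    """Eq. (21.8): mass of the focal elements intersecting A."""
    return sum(m for box, m in focal_elements if intersects(box, A))

A = ((0.0, 1.5), (0.0, 2.0))        # proposition: u1 <= 1.5
A_compl = ((1.5, 2.0), (0.0, 2.0))  # its complement within the frame
print(bel(A), pl(A))                # 0.25 1.0  (Bel <= Pl)
print(pl(A) + bel(A_compl))         # 1.0: Pl(A) + Bel(complement) = 1
```

Note that every focal element straddling the boundary of A counts towards Pl but not towards Bel, which is exactly the gap between the two measures shown in Fig. 21.2.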
Given the system function f : U → Y, a threshold y* ∈ Y and the set
\[ Y^* = \{ y \in Y \mid y \leq y^* \} \tag{21.12} \]
the cumulative and complementary cumulative Belief and Plausibility functions are defined as:
\[ \mathrm{CBF}: \; \forall y^* \in Y, \; Bel_Y(y \leq y^*) = Bel_U\left(f^{-1}(Y^*)\right) \tag{21.13} \]
\[ \mathrm{CCBF}: \; \forall y^* \in Y, \; Bel_Y(y > y^*) = Bel_U\left(f^{-1}(\bar{Y}^*)\right) \tag{21.14} \]
\[ \mathrm{CPF}: \; \forall y^* \in Y, \; Pl_Y(y \leq y^*) = Pl_U\left(f^{-1}(Y^*)\right) \tag{21.15} \]
\[ \mathrm{CCPF}: \; \forall y^* \in Y, \; Pl_Y(y > y^*) = Pl_U\left(f^{-1}(\bar{Y}^*)\right) \tag{21.16} \]
and:
\[ Bel_U\left(f^{-1}(Y^*)\right) = \sum_{FE \in 2^U,\; \forall u \in FE,\, f(u) \leq y^*} m(FE) \tag{21.17} \]
\[ Bel_U\left(f^{-1}(\bar{Y}^*)\right) = \sum_{FE \in 2^U,\; \forall u \in FE,\, f(u) > y^*} m(FE) \tag{21.18} \]
\[ Pl_U\left(f^{-1}(Y^*)\right) = \sum_{FE \in 2^U,\; \exists u \in FE,\, f(u) \leq y^*} m(FE) \tag{21.19} \]
\[ Pl_U\left(f^{-1}(\bar{Y}^*)\right) = \sum_{FE \in 2^U,\; \exists u \in FE,\, f(u) > y^*} m(FE) \tag{21.20} \]
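The cumulative functions can be sketched directly from these sums. The example below assumes, as discussed later for convex system functions, that the extrema of f over each interval box are attained at its vertices; the BPA structure and f are illustrative:

```python
from itertools import product

# Illustrative BPA structure over two uncertain parameters.
u1 = [((0.0, 1.0), 0.75), ((1.0, 2.0), 0.25)]
u2 = [((0.0, 1.0), 0.5), ((1.0, 3.0), 0.5)]
fes = [((i1, i2), m1 * m2) for i1, m1 in u1 for i2, m2 in u2]

f = lambda u: u[0] + u[1]   # illustrative system function, monotone in each u

def cbf(y_star):
    """Cumulative belief (Eq. 21.17): focal elements with max f <= y*."""
    total = 0.0
    for box, m in fes:
        fmax = max(f(v) for v in product(*box))   # vertex enumeration
        if fmax <= y_star:
            total += m
    return total

def cpf(y_star):
    """Cumulative plausibility (Eq. 21.19): focal elements with min f <= y*."""
    total = 0.0
    for box, m in fes:
        fmin = min(f(v) for v in product(*box))
        if fmin <= y_star:
            total += m
    return total

for y in (1.0, 2.0, 3.0, 5.0):
    print(y, cbf(y), cpf(y))   # CBF(y) <= CPF(y), both nondecreasing
```

The two staircase curves produced this way are exactly the CBF/CPF pairs sketched in Fig. 21.3, with the true (unknown) cumulative distribution bounded between them.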
Fig. 21.3 Examples of cumulative belief and plausibility functions (left) and complementary
cumulative belief and plausibility functions (right)
become quickly prohibitive as the number of uncertain parameters and the number of intervals per parameter increase. In fact, the total number of focal elements n_FE grows exponentially, according to Eq. (21.6), with the number of uncertain parameters N; in particular, if the number of focal elements is the same for every uncertain parameter, n_FE = n^N. Furthermore, in order to identify the focal elements included in (or intersecting) f⁻¹(Y*), the maximum of f over every focal element in 2^U has to be computed and compared to y*. If the system function is convex, this maximum lies at one of the vertices of the focal element; otherwise, an optimisation problem has to be solved over every focal element. Therefore, a generic algorithm that attempted the direct calculation of the Belief and Plausibility values starting from the calculation of the focal elements would have a computational cost that increases exponentially with the number of uncertain parameters.
In the literature, there exist some approximation methods that attempt a reduction of the number of focal elements, such as the (k-l-x)-approximation [27] or the D1-approximation [4], in order to improve the speed of computation of the cumulative functions. Other, more recent methods instead try to reduce the number of uncertain parameters by evaluating their impact on the value of the cumulative function through a sampling approach [15]. All these techniques could be used as a preprocessing stage to simplify the computation of the cumulative functions by approximating them.
In this chapter, however, we address the OUU without any a priori reduction of the number of focal elements or uncertain parameters. Note that existing approaches like the one proposed by Agarwal et al. [1] address a reliability optimisation problem in which the belief in the satisfaction of the constraints has to be higher than a given value. In this case the construction of the entire CBF is not required; furthermore, in the works of Agarwal et al. no specific technique to mitigate the exponential growth of the computational cost is considered.
\[ \max_{d \in D} CBF_d(y^*) \tag{21.21} \]
The subscript (·)_d highlights the dependency of the CBF value on the design vector d. Without loss of generality, we assume here that the cost function is minimised. Note that the use of Belief corresponds to a strict requirement on the actual feasibility of the mission. On the other hand, if the analyst were interested in the possibility of having the mission in some conditions, then Plausibility should be used in Eq. (21.21) instead of Belief.
Although the solution to problem (21.21) gives a measure of the maximum confidence in the proposition f < y*, it does not give a measure of the best achievable system budget. The simultaneous optimisation of the CBF and of f can be formulated as a bi-objective optimisation problem, such that:
\[ \max_{y^* \in Y,\, d \in D} CBF_d(y^*), \qquad \min_{y^* \in Y,\, d \in D} y^* \tag{21.22} \]
Fig. 21.4 Typical solution of the optimisation under uncertainty problem (dashed). The CBFs of two of the dominating designs are also represented
The latter point is particularly interesting because it defines the optimality of one set (the entire CBF curve) over another. According to this principle, the optimality of a design point can be redefined by saying that a design point d1 dominates another design point d2 if every point in the image space corresponding to d1 is better (lower y* and higher CBF) than every point in the image space corresponding to d2. As described in the next section, this definition of optimality will lead to a particular formulation of the OUU.
\[ \min_{d \in D} y^*_d(bl(1)), \quad \min_{d \in D} y^*_d(bl(2)), \quad \ldots, \quad \min_{d \in D} y^*_d(bl(n_{bel})) \tag{21.23} \]
where y*_d(bl(k)) = min(y* | CBF_d(y*) = bl(k)) corresponds to the minimal threshold for which the design d is at belief bl(k). All the n_bel minimal thresholds for a
given design are known as soon as the entire belief curve is computed, which is done each time a design is selected.
In the case of the bi-objective approach, the solution vector is x = [d, y*]. We first rank all the focal elements according to their BPA and compute a complete belief curve for a randomly selected design point. This curve, called CBF_opt, represents the current best estimate of the optimal CBF. We then start the optimisation process. When evaluating an agent a_i, corresponding to a pair [d, y*] (i.e. a new design point and a new threshold), we use Algorithm 1 to efficiently compute the belief associated with it. Because we ranked the focal elements, we add them up starting from the ones with the highest BPA value. If the focal elements with the highest BPA values are discarded because f is above y*, and the sum of all the remaining BPAs would not allow the current CBF_opt value for that particular y* to be improved, then we stop the computation of the belief value associated with that particular solution vector. This is done via the achievable belief (variable achBel in Algorithm 1), which tracks the value of the maximum belief that can be achieved during the computation. Furthermore, once a value is assigned to the threshold y*, the maximisation of the system function f over each focal element is stopped as soon as a value is found above the threshold.
Finally, if the Bel(d, y*) value associated with a pair [d, y*] is better than CBF_opt(y*), we compute the minimum threshold y*_min such that Bel(y*_min) = Bel(d, y*) and update CBF_opt. This guided search for the optimal belief curve is summarised in Algorithm 1. Note that the multiobjective optimisation algorithm had to be slightly modified to make CBF_opt available throughout the computation. Such a modification does not affect the performance of the optimiser. The computational cost of Algorithm 1 is dictated by n_FE, the number of focal elements. In fact, a maximum of n_FE optimisation problems need to be solved every time a design point is evaluated.
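The pruning idea can be sketched as follows. The focal elements, system function and incumbent value below are illustrative, and the inner maximisation over a focal element is replaced by a cheap vertex evaluation:

```python
def belief_with_cutoff(fes, f_max_over, y_star, best_so_far):
    """Sketch of the early-stopping belief computation, in the spirit of
    Algorithm 1. `fes` is a list of (focal_element, bpa) pairs sorted by
    decreasing BPA; `f_max_over(fe)` returns the maximum of the system
    function over the focal element (in general an inner optimisation).
    The achievable belief starts at 1 and drops every time a focal
    element is discarded; once it falls below the incumbent value
    `best_so_far`, the computation is aborted."""
    bel, ach_bel = 0.0, 1.0
    for fe, m in fes:
        if f_max_over(fe) <= y_star:
            bel += m
        else:
            ach_bel -= m           # this mass can never contribute
            if ach_bel < best_so_far:
                return None        # cannot improve the current CBF_opt
    return bel

# Toy usage with 1-D interval focal elements and f(u) = u**2.
fes = sorted([((0.0, 1.0), 0.5), ((1.0, 2.0), 0.25), ((2.0, 3.0), 0.25)],
             key=lambda p: -p[1])
fmax = lambda fe: max(fe[0] ** 2, fe[1] ** 2)   # vertex maximum for convex f
print(belief_with_cutoff(fes, fmax, 4.0, 0.0))  # full belief: 0.75
print(belief_with_cutoff(fes, fmax, 0.5, 0.9))  # pruned early: None
```

Ranking by BPA makes the pruning effective: high-mass rejections collapse the achievable belief quickly, so hopeless candidates are abandoned after very few inner maximisations.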
In the case of the multi-belief approach, the solution vector is simply x = d. For each selected design vector the complete belief curve is computed. Though this is more computationally expensive than computing a single belief value, it has the benefit of having only the design vector as optimisation variable. Therefore, each design needs to be evaluated once and only once. Additionally, in Algorithm 1, the known maxima of f over all focal elements evaluated during the loop are lost. Thus, although the information was available, it is not used to identify whether the current design is dominating for lower belief levels (or, equivalently, lower thresholds). By computing the whole belief curve instead, we preserve this information.
A more elegant implementation of this approach would consist in redefining the dominance index. If the classical Pareto dominance index:
\[ I_i = \left| \left\{ j \;\middle|\; CBF_{d_j}(y^*_j) > CBF_{d_i}(y^*_i) \wedge y^*_j < y^*_i, \; j = 1, \ldots, n_{pop}, \; j \neq i \right\} \right| \tag{21.24} \]
is used to define the Pareto optimality of a design vector d_i, where |·| denotes the cardinality of a set, the optimiser cannot correctly evaluate the local Pareto optimality of a point in the CBF–y* plane, since for each design there is a whole curve of points in the CBF–y* plane. If the Pareto dominance index were instead defined as in Eq. (21.25)
\[ I_i = n_{bel} - \left| \left\{ k \in [1, n_{bel}] \;\middle|\; \forall j \in [1, n_{pop}], \; y^*_i(bl(k)) \leq y^*_j(bl(k)) \right\} \right| \tag{21.25} \]
then a design with a dominance index lower than n_bel dominates all the others for at least one of the belief levels bl, therefore leading to the same result as the formulation of Eq. (21.23) with the standard dominance index (Eq. (21.24)).
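A sketch of this redefined dominance index, assuming the minimal thresholds y*_j(bl(k)) have already been computed for every design and belief level; the population data below are hypothetical:

```python
def dominance_index(thresholds, i):
    """Multi-belief dominance index in the spirit of Eq. (21.25):
    thresholds[j][k] is the minimal threshold y*_j(bl(k)) of design j at
    belief level k. The index counts the belief levels at which design i
    is NOT the best; an index lower than n_bel therefore means that
    design i dominates at one belief level at least."""
    n_pop, n_bel = len(thresholds), len(thresholds[0])
    best_levels = sum(
        1 for k in range(n_bel)
        if all(thresholds[i][k] <= thresholds[j][k] for j in range(n_pop))
    )
    return n_bel - best_levels

# Hypothetical population of 3 designs evaluated at 4 belief levels.
thr = [[1.0, 2.0, 3.0, 4.0],    # design 0: best at the lowest level
       [1.5, 1.8, 2.5, 4.5],    # design 1: best at the middle levels
       [2.0, 2.5, 3.5, 3.9]]    # design 2: best at the highest level
print([dominance_index(thr, i) for i in range(3)])   # [3, 2, 3]: all < 4
```

Every design here has an index below n_bel = 4, so each one is best at some belief level and all three would be retained, mirroring the multi-objective formulation of Eq. (21.23).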
Fig. 21.5 Illustration of the cluster method with 3 focal elements FE1, FE2 and FE3. The proposition f < y* is true only within the subdomains s1, s2 and s3. Two examples of design points d1 and d2 are given
function can then be cheaply computed by adding the mass of the focal elements included in any element of S_y*:
\[ \widehat{CBF}_d(y^*) = \sum_{s \in S_{y^*}} \; \sum_{(d, FE) \subseteq s} m(FE) \tag{21.26} \]
Fig. 21.5 illustrates the proposed method. In this example, there are only three focal elements FE1, FE2 and FE3. The set of subdomains where the system function verifies the proposition is S_y* = {s1, s2, s3}. Two different design points d1 and d2 are represented. Their respective approximations of the CBF are:
\[ \widehat{CBF}_{d_1}(y^*) = m(FE_1); \qquad \widehat{CBF}_{d_2}(y^*) = m(FE_2) + m(FE_3) \]
Algorithm. To compute the approximation of the CBF function, the set S_y* of subdomains is computed for increasing values of the threshold until a belief of 1 is found. At each step, sample points verifying the proposition f(d, u) < y* are identified, then classified in clusters. The points of a given cluster define one subdomain s_i of S_y*. Then, the design maximising the approximation of CBF(y*) is selected. The algorithm used here is described in Algorithm 2.
To speed up the computation, Axis-Aligned Boxes (AAB) are used. Each subdomain s_i is associated with its outer AAB (also called the Axis-Aligned Boundary Box) and an inner AAB. If s_i is defined by the set of points (x_1, x_2, ..., x_p) of R^L, then its axis-aligned boundary box oAAB(s_i) is defined as:
\[ \mathrm{oAAB}(s_i) = \prod_{j=1}^{L} \left[ \min_{1 \leq k \leq p} x_k^{(j)}, \; \max_{1 \leq k \leq p} x_k^{(j)} \right] \tag{21.27} \]
Algorithm 2. Indirect computation of the CBF (main loop):

    /* Main loop */
    while Bel_max < 1 do
        /* Update the threshold */
        y* ← y* + step;
        /* New sampling points */
        X_new ← {[d, u] | (y* − step) < f(d, u) ≤ y*};
        /* Update the set of valid sampled points */
        X ← {X, X_new};
        /* Identify the valid subdomains */
        Partition the sample points X into clusters;
        foreach cluster do
            Compute the associated convex hull;
            Compute the oAAB and an iAAB;
        endfch
        /* Find the design point giving the highest CBF */
        [CBF_opt(y*), d_opt] ← max_{d ∈ D} CBF_d(y*);
    end
The inner AAB is an axis-aligned box that is contained within the subdomain s_i. As opposed to the outer AAB, the definition of the inner AAB is not unique. It has been chosen here to centre the inner AAB on the barycentre of the sample points defining s_i and to maximise its relative size such that it remains within s_i.
The idea behind the inner and outer AABs is that it is extremely cheap to check whether a focal element is outside or inside an AAB. The focal elements that are outside the outer AAB are guaranteed not to be within f⁻¹(Y*) and the ones inside the inner AAB are guaranteed to be within f⁻¹(Y*). Once this selection process is done, only
the focal elements that do not fall in either of those categories need to be checked to compute CBF(y*).
Convex hull. In order to identify whether any of the remaining focal elements fulfils the proposition f(d, u) < y*, ∀u ∈ FE, one only needs to check whether its vertices are within the same subdomain s_i. In our implementation s_i is the convex hull of the sample points of the ith cluster. If v is a point of R^p, we have:
\[ v \in s_i \iff \exists \lambda \in \mathbb{R}_+^p \;\middle|\; \left( v = \sum_{k=1}^{p} \lambda^{(k)} x_k \right) \wedge \left( \sum_{k=1}^{p} \lambda^{(k)} = 1 \right) \tag{21.28} \]
Thus, phase 1 of the revised simplex method, used to find a feasible solution to a linear programming problem, has been implemented in order to determine whether or not such a vector exists [5, 6].
It is important to highlight that in this method no assumptions are made on the convexity of the system function f. Only the subdomains s_i are considered convex, which is reasonable in the practical application related to space design. Another advantage of this method is that it identifies all the locally optimal design regions, and thus various classes of interesting designs (as in the direct solution). Finally, the global optimum is likely to be found using a simple local optimiser, starting for instance from the barycentre of each cluster.
Pixelisation. A more efficient possibility to identify the subdomains s_i is based on the partition into pixels of the Cartesian product of the uncertain parameter domain and the design domain. This pixelisation technique replaces the use of the convex hull to identify the subdomains s_i.
It is done by first creating the list of the pixels containing sample points verifying the proposition f(d, u) < y*, then pruning this list by eliminating the pixels containing at least one sample point violating the proposition. It can be proven that this operation is polynomial in the number of dimensions and the number of subdivisions of each dimension. A focal element is thereafter said to be valid if all the pixels intersecting it are included in some s_i.
The quality of this approximation technique depends on the quality of the sampling of the uncertain and design space and on the number and size of the pixels. It is here implicitly assumed that the set of points that satisfy the proposition f(d, u) < y* is finite and can be covered with a finite set of pixels (a reasonable assumption for the problems of interest). The larger the pixels, the lower the accuracy of the coverage and the faster the algorithm. However, this technique has a main advantage over the convex hull one, as it can represent even very non-convex subdomains s_i. Moreover, as the design domain is discretised, a fixed number of possible design vectors is accessible. Therefore, one can consider testing them all to identify the best one(s). If not, an optimiser working with binary variables can be used to solve the OUU.
Finally, since the number of pixels is at most equal to the number of admissible sample points in S_y*, it does not grow exponentially if an efficient sampling procedure is used. The sampling algorithm needs to be run only once per value of the threshold and, therefore, unlike in the direct approach, is independent of the number of focal elements.
\[ m_{SEP}^{wet} = m_{tank} + m_{array} + m_{rad} + m_{harness} + m_{PPU} + m_{thrusters} + m_{xenon} \tag{21.29} \]
In this equation, the subsystems considered are the tanks (m_tank), the solar arrays (m_array), the radiator (m_rad), the harness equipment (m_harness), the power processing subsystem (m_PPU), the thrusters (m_thrusters) and finally the propellant required to perform the low-thrust transfer (m_xenon). The expressions of all these quantities are given in the following subsections.
21.5.1.1 Propellant Mass
The mass of xenon is estimated from the ΔV budget using the rocket equation:
\[ m_{xenon} = m_{TLO} \left( 1 - e^{-\frac{\Delta V}{I_{SP}\, g_0}} \right) \tag{21.30} \]
where m_TLO is the trans-lunar orbit mass, i.e. the wet mass of the spacecraft just after the Earth–Moon system escape (specific to this mission, m_TLO = 2400 kg), g0 is the gravitational acceleration (g0 = 9.80665 m/s²), ΔV is the delta-V budget for the SEP transfer from the Earth–Moon system escape to the Mercury capture (in m s⁻¹) and I_SP is the mean specific impulse of the SEP transfer, given in seconds by Eq. (21.31):
\[ \bar{I}_{SP} = 0.989\, I_{SP}^{maxT} \tag{21.31} \]
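Eqs. (21.30) and (21.31) combine into a one-line propellant estimate. Only m_TLO and g0 below come from the text; the ΔV and specific impulse at maximum thrust are illustrative inputs:

```python
import math

def xenon_mass(delta_v, isp_max_thrust, m_tlo=2400.0, g0=9.80665):
    """Propellant mass from the rocket equation (Eq. 21.30), with the
    mean specific impulse taken as 98.9% of the specific impulse at
    maximum thrust (Eq. 21.31). delta_v in m/s, isp in seconds."""
    isp_mean = 0.989 * isp_max_thrust
    return m_tlo * (1.0 - math.exp(-delta_v / (isp_mean * g0)))

# Illustrative call; roughly the order of magnitude of a deep-space SEP leg.
print(round(xenon_mass(delta_v=7000.0, isp_max_thrust=5650.0), 1))
```

Because the exponent is small for high-Isp electric propulsion, the propellant mass stays a modest fraction of m_TLO, which is why the trade between power, thrust and Isp dominates the budget rather than the propellant itself.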
Fig. 21.6 Kriging surrogate of the deep space ΔV for the low-thrust mission of BepiColombo
Delta-V budget. The delta-V budget is composed of the deep space ΔV (cf. below), the ΔV for the second Lunar Gravity Assist (40 m s⁻¹), the ΔV for SAA control (100 m s⁻¹), the ΔV for flyby navigation (260 m s⁻¹), the ΔV for other navigation (280 m s⁻¹) and the contingency (+5% of the deep space ΔV).
The deep space ΔV is an essential quantity for any optimisation of spacecraft design. Indeed, it has a direct impact on the propellant mass (cf. Eq. (21.30)) and the tank mass (cf. Eq. (21.32)). In the frame of the BepiColombo test case, this value is computationally expensive to obtain and cannot be computed fully automatically. Therefore, it is not feasible to include it in the model as it is. In order to overcome this issue, a surrogate model has been built, based on 180 different transfers previously computed for various values of P_1AU, the power to be generated by the solar arrays at 1 Astronomical Unit (AU), and T_max, the maximum thrust. The surrogate significantly reduces the computational time, but at the expense of accuracy. For this study, Kriging has been selected via the DACE package [21], with a first order polynomial regression model and an exponential correlation model (cf. Fig. 21.6).
21.5.1.2
Tank. The mass of the tank is directly proportional to the mass of propellant:
\[ m_{tank} = k_{tank}\, m_{xenon} \tag{21.32} \]
where k_tank is the specific ratio of the tank subsystem (k_tank = 11%).
Solar arrays. The required area of the solar arrays is given by Eq. (21.33):
\[ A = \frac{P_{1AU}}{\eta_p\, G_s}\, \lambda_A \tag{21.33} \]
where ρ_SA is the specific mass/area ratio of the solar arrays (ρ_SA = 2.89 kg/m²), m⁰_array is the unavoidable structural mass of the solar arrays and λ_SA is the mass margin of the solar array subsystem:
\[ m_{array} = \left( m^0_{array} + \rho_{SA}\, A \right) \lambda_{SA} \tag{21.34} \]
The specific impulse and the maximum power are modelled as functions of the thrust T:
\[ I_{SP} = b_2 T^2 + b_1 T + b_0 \tag{21.35} \]
\[ P = c\, (a_1 T + a_0) \tag{21.36} \]
Radiator. The mass of the radiator is a piecewise function of the dissipated power P_dis:
\[ m_{rad} = \lambda_{rad} \left( c_0 + c_1 \frac{P_{dis}}{P_{dis}^{lim}} \right) \quad \text{if } P_{dis} \leq P_{dis}^{lim} \tag{21.37} \]
\[ m_{rad} = \lambda_{rad} \left( c_2 + c_3 \frac{P_{dis}}{P_{dis}^{lim}} + c_4 \left( \frac{P_{dis}}{P_{dis}^{lim}} \right)^2 \right) \quad \text{otherwise} \tag{21.38} \]
where c_0, c_1, c_2, c_3 and c_4 are constants and λ_rad is the mass margin for the radiator (λ_rad = 1.15).
Table 21.1 Margins used in the system model

    Subsystem                        Margin       Value
    ΔV contingency                   ΔV           +5%
    Area of the solar arrays         λ_A          1.20
    Mass of the solar arrays         λ_SA         1.10
    Mass of the radiator             λ_rad        1.15
    Mass of the harness subsystem    λ_harness    1.20

Harness. The mass of the harness subsystem is given by:
\[ m_{harness} = \left( m^0_{harness} + \kappa_{harness}\, P_{max} \right) \lambda_{harness} \tag{21.39} \]
where m⁰_harness is the fixed mass of the harness subsystem, κ_harness is the specific mass/power ratio of the harness subsystem (κ_harness = 1.3763 × 10⁻³ kg/W) and λ_harness is the mass margin for the harness subsystem (λ_harness = 1.2).
Power Processing Unit. The BepiColombo mission is designed with 4 power processing units (PPU). The mass of each of them is estimated using an equation linear in the maximum power P_max (cf. Eq. (21.36)) and the square of the mean specific impulse (cf. Eq. (21.31)).
Thrusters. Finally, the mass of the thrusters and the associated components varies with the technology used and also with the number of thrusters necessary to achieve the required thrust:
\[ m_{thrusters} = m^0_{thrusters} + n_{thruster}\, m_{nominal} \tag{21.40} \]
where m⁰_thrusters is the fixed mass of the thruster subsystem, m_nominal is the nominal mass of one thruster and n_thruster is the number of thrusters installed aboard the spacecraft (n_thruster = 2).
Remark. The simple model presented here makes it possible to estimate the mass of the main subsystems of a low-thrust spacecraft with only three inputs: (i) the power to be generated by the solar arrays at 1 AU, P_1AU; (ii) the maximum thrust, T_max; and (iii) the specific impulse at maximum thrust, I_SP^maxT. Moreover, margins are conventionally used to take into account uncertainties in this modelling; we therefore report them in Table 21.1.
Table 21.2 BPA structure of the three uncertain parameters

    Parameter           Lower bound      Upper bound      Basic probability assignment
    η_p                 0.18959          0.195            0.05
                        0.195            0.205            0.15
                        0.205            0.215            0.25
                        0.215            0.22751          0.55
    ρ_SA [kg/m²]        2.89             3.00             0.10
                        3.00             3.10             0.15
                        3.10             3.25             0.35
                        3.25             3.3105           0.40
    κ_harness [kg/W]    1.3763 × 10⁻³    1.4500 × 10⁻³    0.05
                        1.4500 × 10⁻³    1.5500 × 10⁻³    0.25
                        1.5500 × 10⁻³    1.6000 × 10⁻³    0.30
                        1.6000 × 10⁻³    1.6515 × 10⁻³    0.40
systems are available to the designer, and their performance varies significantly, directly impacting the values of η_p and ρ_SA. Similarly, the specific mass/power ratio of the harness subsystem depends on the technology used but also on the internal configuration of the spacecraft, which is unknown at the preliminary stage of the spacecraft design.
System margins are classically used to compensate for uncertainties. As we are aiming here at crystallising the uncertainties with Evidence Theory, we selected parameters as uncertain when they were associated with a system margin. In our example, these are λ_A, λ_SA and λ_harness; therefore they are set to 0 for the OUU problem. Note that the BPA structure is such that the effect of the 3 parameters being considered as uncertain is artificially equivalent to applying the default system margins. The consequence is that the optimal design of the OUU is the same as the deterministic one. This is obviously not generally the case, but it helps here to better comprehend the results.
564
wet )
0.8
CBF l(mSEP
0.6
0.4
0.2
870
872
874
876
mSEP
878
wet
880
882
[kg]
884
886
888
Fig. 21.7 Optimal solution of the OUU problem for the BepiColombo test case. An example of a solution found is also shown (dashed), along with the error area between the two curves
Fig. 21.8 Location of the optimal design points for the OUU - BepiColombo test case
Two classes of optimal solutions can be identified in terms of P_1AU: 4,650 or 4,800 W. The optimal maximum thrust is clearly 230 mN and the specific impulse at maximum thrust lies between 5639 and 5655 seconds.
Fig. 21.9 Influence of the number of agents on the performance of NSGA-II for the BepiColombo test case
mutation: 10 and 25. As for any test involving evolutionary algorithms, the setting of the optimiser parameters is tricky and can significantly affect the results. We set the probabilities and distribution indices so as to balance convergence speed and global exploration. The most significant parameter, however, is clearly the size of the population. We set it to 20 agents after running some preliminary tests for up to 100,000 function evaluations. Fig. 21.9 shows that for our selection of probabilities and distribution indices, the optimal population size is around 20.
The BPA structure defined for the BepiColombo test case is composed of 64 adjacent focal elements (cf. Table 21.2). As we do not assume convexity of the system function m_SEP^wet here, a local optimiser¹ is used to identify the maximum of the system function over each focal element:
\[ \max_{u \in FE} m_{SEP}^{wet}(d, u) \tag{21.41} \]
Table 21.4 gives the percentage of times, over 100 runs, that an approach finds solutions within both classes or within at least one class. Once again, both approaches give similar results. It is interesting to note that even though the bi-objective approach gives worse results than the multi-belief approach, it finds solutions in both classes more often. Indeed, as the bi-objective approach associates a design with a given threshold, it does not guarantee that nearly optimal designs are found for the whole range of thresholds, thus leading to a higher error area.
Fig. 21.10 Solutions found for the OUU with only 100,000 system function evaluations (BepiColombo test case): optimal Pareto front, multi-belief and bi-objective approaches
Table 21.3 Mean value and variance of the normalised error area for the OUU BepiColombo
test case for 100 runs
    n_eval        Bi-Objective                   Multi-Belief
                  mean          variance         mean          variance
    100,000       2.39 × 10⁻¹   5.23 × 10⁻²      2.36 × 10⁻¹   4.78 × 10⁻²
    500,000       9.26 × 10⁻³   2.37 × 10⁻⁵      9.85 × 10⁻³   1.63 × 10⁻⁵
    1,000,000     5.27 × 10⁻³   2.53 × 10⁻⁶      3.24 × 10⁻³   3.00 × 10⁻⁶
Table 21.4 Percentage for which solutions have been found over 100 runs in both classes and in at least one class, for the case of BepiColombo

Number of system        Bi-Objective                Multi-Belief
function evaluations    both classes  one class     both classes  one class
100,000                 2%            20%           0%            2%
500,000                 94%           99%           58%           100%
1,000,000               100%          100%          79%           100%
Fig. 21.11 Approximation found with the indirect approaches for the OUU with only 100,000 system function samples (BepiColombo test case)
Fig. 21.12 Variation of the number of designs evaluated in the direct approach versus the number of focal elements. The number of system function evaluations has been fixed to 100,000
global solution. However, both the clustering and the pixelisation give a reasonably good approximation of the Pareto front.
Unlike the direct solution, the complexity of the indirect one does not increase with the number of focal elements. Indeed, only the focal elements that lie between the outer and inner axis-aligned boxes need to be checked. Moreover, the number of sample points needed to gather the same information increases polynomially with the number of dimensions; it does not depend on the number of focal elements in any way. Fig. 21.12 shows the number of design points that the direct approach can test with 100,000 function evaluations. As the number of focal elements increases, the result of the direct approach naturally decreases in quality. On the contrary, an increase in the number of focal elements has no effect on the indirect approach.
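The culling of focal elements described above can be sketched as follows; the box representation and the function names are illustrative assumptions, not the authors' implementation:

```python
# Sketch of the culling step in the indirect approach: an axis-aligned focal
# element needs checking only if it intersects the outer box and is not fully
# contained in the inner box. Boxes are lists of (lo, hi) intervals, one per
# uncertain parameter, so the cost is linear in the number of dimensions.

def intersects(a, b):
    return all(alo < bhi and blo < ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def contained(a, b):
    # True if box a lies fully inside box b
    return all(blo <= alo and ahi <= bhi for (alo, ahi), (blo, bhi) in zip(a, b))

def elements_to_check(focal_elements, inner, outer):
    return [fe for fe in focal_elements
            if intersects(fe, outer) and not contained(fe, inner)]

# 2-D example with hypothetical interval structures:
inner = [(0.4, 0.6), (0.4, 0.6)]
outer = [(0.2, 0.8), (0.2, 0.8)]
elements = [[(0.0, 0.1), (0.0, 0.1)],      # outside the outer box: skipped
            [(0.45, 0.55), (0.45, 0.55)],  # inside the inner box: skipped
            [(0.3, 0.5), (0.3, 0.5)]]      # straddles the band: checked
print(len(elements_to_check(elements, inner, outer)))  # -> 1
```

Only the third element survives the filter, so the work done is driven by the band between the two boxes rather than by the total number of focal elements.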
21.7 Conclusions
In this chapter, we presented a way to model the uncertainties inherent to preliminary space mission design. Evidence Theory was introduced to represent adequately both aleatory and epistemic uncertainties. The associated robust design problem was formulated as a multi-objective optimisation problem, and two solution approaches were proposed: a direct and an indirect one. The direct approach solves the multi-objective optimisation problem directly (in this chapter we used a population-based multi-objective genetic algorithm). It was tested on two different interpretations of the optimisation under uncertainty problem; in both cases, however, the computational time increased exponentially with the number of uncertain parameters. Therefore, an indirect approach was devised to contain the computational cost required to optimise the belief and plausibility functions. The indirect approach provided good approximations of the belief and plausibility curves with a computational complexity that remained polynomial in the number of uncertain parameters. It can therefore be used to produce a first estimate of the optimal solution and to narrow down the design and uncertain domains. The direct approach, instead, can be used on the reduced domains for more accurate results.
Acknowledgements
The authors would like to thank Mr. Stephen Kemble of EADS Astrium UK for providing the
reduced system models and plenty of useful suggestions on how to model the design process.
References
1. Agarwal, H., Renaud, J.E., Preston, E.L.: Trust region managed reliability based design optimization using evidence theory. In: Collection of Technical Papers - AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, vol. 5, pp. 3449-3463 (2003)
2. Agarwal, H., Renaud, J.E., Preston, E.L., Padmanabhan, D.: Uncertainty quantification using evidence theory in multidisciplinary design optimization. Reliability Engineering and System Safety 85(1-3), 281-294 (2004)
3. Bae, H.R., Grandhi, R.V., Canfield, R.A.: Uncertainty quantification of structural response using evidence theory. In: Collection of Technical Papers - AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, vol. 4, pp. 2135-2145 (2002)
4. Bauer, M.: Approximation algorithms and decision making in the Dempster-Shafer theory of evidence - an empirical study. International Journal of Approximate Reasoning 17(2-3), 217-237 (1997)
5. Bunday, B.D.: Basic Linear Programming. Hodder Arnold (1984)
6. Dantzig, G.B.: Linear Programming and Extensions. Princeton University Press, Princeton (1965)
7. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182-197 (2002)
8. Dempster, A.P.: New methods for reasoning towards posterior distributions based on sample data. The Annals of Mathematical Statistics 37(2), 355-374 (1966)
9. Dempster, A.P.: Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist. 38, 325-339 (1967)
10. Du, X., Wang, Y., Chen, W.: Methods for robust multidisciplinary design. In: AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference and Exhibit, 41st, Atlanta, GA, no. 1785 in AIAA 2000 (2000)
11. Dubois, D., Prade, H.: Fuzzy sets, probability and measurement. European Journal of Operational Research 40(2), 135-154 (1989)
12. Ferson, S., Nelsen, R.B., Hajagos, J., Berleant, D.J., Zhang, J., Tucker, W.T., Ginzburg, L.R., Oberkampf, W.L.: Dependence in probabilistic modeling, Dempster-Shafer theory, and probability bounds analysis. Tech. Rep. SAND2004-3072, Sandia National Laboratories (2004)
13. Hayes, B.: A lucid interval. American Scientist 91(6), 484-488 (2003)
14. Helton, J.C.: Uncertainty and sensitivity analysis in the presence of stochastic and subjective uncertainty. Journal of Statistical Computation and Simulation 57, 3-76 (1997)
15. Helton, J.C., Johnson, J., Oberkampf, W.L., Storlie, C.: A sampling-based computational strategy for the representation of epistemic uncertainty in model predictions with evidence theory. Computer Methods in Applied Mechanics and Engineering 196(37-40), 3980-3998 (2007)
16. Hoffman, F.O., Hammonds, J.S.: Propagation of uncertainty in risk assessments: The need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability. Risk Analysis 14(5), 707-712 (1994)
17. Kemble, S.: Interplanetary Mission Analysis and Design. Springer Praxis Books, Heidelberg (2006)
18. Klir, G.J., Smith, R.M.: On measuring uncertainty and uncertainty-based information: Recent developments. Annals of Mathematics and Artificial Intelligence 32(1-4), 5-33 (2001)
19. Kreinovich, V., Xiang, G., Starks, S.A., Longpre, L., Ceberio, M., Araiza, R., Beck, J., Kandathi, R., Nayak, A., Torres, R., Hajagos, J.G.: Towards combining probabilistic and interval uncertainty in engineering calculations: Algorithms for computing statistics under interval uncertainty, and their computational complexity. Reliable Computing 12, 471-501 (2006)
20. Limbourg, P.: Multi-objective optimization of problems with epistemic uncertainty. In: Coello Coello, C.A., Hernandez Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 413-427. Springer, Heidelberg (2005)
21. Lophaven, S.N., Nielsen, H.B., Sondergaard, J.: DACE: a Matlab kriging toolbox. Tech. Rep. IMM-TR-2002-12, Technical University of Denmark (2002)
22. Neumaier, A.: Clouds, fuzzy sets, and probability intervals. Reliable Computing 10(4), 249-272 (2004)
23. Oberkampf, W., Helton, J.C.: Investigation of evidence theory for engineering applications. In: 4th Non-Deterministic Approaches Forum, AIAA, vol. 1569 (2002)
24. Pate-Cornell, M.E.: Uncertainties in risk analysis: Six levels of treatment. Reliability Engineering and System Safety 54, 95-111 (1996)
25. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
26. Smarandache, F., Dezert, J.: An introduction to the DSm theory for the combination of paradoxical, uncertain and imprecise sources of information. In: 13th International Congress of Cybernetics and Systems (2005)
27. Tessem, B.: Approximations for efficient computation in the theory of evidence. Artif. Intell. 61(2), 315-329 (1993)
28. Vasile, M.: Robust mission design through evidence theory and multiagent collaborative search. Annals of the New York Academy of Sciences 1065, 152-173 (2005)
29. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 100(suppl. 1), 9-34 (1999)
Chapter 22
Abstract. This chapter focuses on a design methodology that aids in the design and development of complex engineering systems. The methodology consists of simulation, optimization and decision making. A framework is presented in which modelling, multi-objective optimization and multi-criteria decision-making techniques are used to design an engineering system. Due to the complexity of the designed system, a three-step design process is suggested. In the first step, multi-objective optimization using a genetic algorithm is applied. In the second step, a multi-attribute decision-making process based on linguistic variables is suggested, to help the designer express preferences. In the last step, fine tuning of a selected few variants is performed. The methodology is named the Progressive Design Methodology (PDM). As a case study, the method is applied to the design of a permanent magnet brushless DC motor drive and the results are compared with experimental values.
22.1 Introduction
The design of complex engineering systems, such as electrical drives and power electronics, requires the application of knowledge from several disciplines of engineering (electrical, mechanical, thermal) [1, 2, 3]. The interdisciplinary nature of complex systems presents challenges associated with modelling, simulation, computation time and the integration of models from different disciplines. There is a need to develop design methods that can model different degrees of collaboration and help resolve the conflicts between different disciplines. In order to
Praveen Kumar
Indian Institute of Technology, Department of Electronics and Communication Engineering
Guwahati 781039 Assam India
e-mail: praveen_kumar@iitg.ernet.in
Pavol Bauer
Delft University of Technology, Mekelweg 4 2628 CD Delft The Netherlands
e-mail: P.Bauer@ewi.tudelft.nl
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 571-607.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
of feasible solutions is selected. In the final step a detailed model of the variants selected from the previous set is developed and the design variables of the system are fine-tuned. At the end, the final design of the system is selected.
The aims of this chapter are to:
- Present a framework where optimisation and decision making are employed to accelerate and improve the design of complex systems.
- Support the formulation of the optimisation problem, partly by supporting the selection of optimisation parameters, but also by supporting the formulation of the objective functions. The design problem is often multiobjective in nature; it is therefore natural to formulate it as a multiobjective optimisation problem.
- Develop a framework in which system-level simulation models can be composed from sub-system models in different disciplines.
- Formalise a multi-domain modelling paradigm that allows the models to evolve with the design process, increasing in detail as the design progresses.
An ideal condition in the design of an engineering system would be if all the objectives and constraints could be expressed by a simple model. However, in practical design problems this is seldom the case due to the complexity of the system. Hence, a trade-off has to be made between the complexity of the model and the time to compute it. A complex model will enable us to represent all the objectives and constraints of the system but will be computationally intensive. On the other hand, a simple model will be computationally inexpensive but will limit the scope of objectives and constraints that can be expressed. In order to overcome this problem, PDM consists of three main phases:
- the synthesis phase,
- the intermediate analysis phase,
- the final design phase.
Since in the first step (synthesis phase) of PDM the detailed knowledge of the system is unavailable, the optimization process is exhaustive. If complex models are
used in this stage then the computational burden will be overwhelming. In order to facilitate the initial optimization process, only those objectives and constraints are considered that can be expressed by simple mathematical models of the system. In the synthesis process a set of feasible solutions (Pareto optimal solutions) is obtained, Fig. 22.1 and Fig. 22.2. The important task in engineering design is to generate various design alternatives and then to make a preliminary decision to select a design, or a set of designs, that fulfils a set of criteria. Hence the engineering design decision problem is a multi-criteria decision-making problem. In the conceptual stages of design, the design engineer faces the greatest uncertainty in the product attributes and requirements (e.g., dimensions, features, materials, and performance). Because the evolution of the design is greatly affected by decisions made during the conceptual stage, these decisions have a considerable impact on overall cost. In the intermediate analysis process the multi-criteria decision-making process is carried
Fig. 22.1 Set of Pareto optimal solutions for an optimisation problem with two objectives
Fig. 22.2 Set of Pareto optimal solutions for an optimisation problem with three objectives
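The Pareto optimal sets sketched in Fig. 22.1 and Fig. 22.2 can be extracted from any finite set of sampled designs with a simple dominance filter. A minimal sketch, assuming all objectives are to be minimised (function names are illustrative):

```python
def dominates(a, b):
    # a dominates b: no worse in every objective and strictly better in one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # keep the points that no other point dominates
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# two-objective example: (4, 4) is dominated by (2, 2) and is discarded
print(pareto_front([(1, 5), (2, 2), (3, 1), (4, 4)]))  # -> [(1, 5), (2, 2), (3, 1)]
```

Population-based algorithms such as NSGA-II apply this dominance relation generation after generation instead of filtering once at the end, but the relation itself is the same.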
out. This step is a screening process where the large number of solutions obtained from the first step is subjected to a process of elimination. In order to achieve the elimination, additional constraints are taken into consideration. The constraints considered here are those that cannot be expressed explicitly in mathematical terms, such as the manufacturability of an embodiment of the system. In the final design phase a detailed model of the system is developed. After having executed the synthesis phase, a better understanding of the system is obtained and it is possible to develop a detailed model of the system. In this phase all the objectives and constraints that could not be considered in the synthesis phase are taken into consideration. Exhaustive optimisation is not carried out in this phase; rather, fine tuning of the variables is performed in order to satisfy all the objectives and constraints.
The implementation of the above steps is shown in Fig. 22.3, from which it can be seen that the six steps involved in the synthesis phase are not executed in a purely sequential manner. After the sensitivity analysis has been done and a set of independent design variables (IDV) has been identified, the designer has to decide if the set of IDV obtained is appropriate to proceed with the modelling process. The decision about the appropriateness of the set of IDV can be made based on previous experience or discussions with other experts. If the set of IDV is not sufficient then it is prudent to go back to the system requirement analysis and perform the loop again. This loop can be repeated until a satisfactory set of IDV is identified. Similarly, after the model of the system to be designed (target system) is developed, it is important to check that the model includes the system boundaries and the set of IDV. In reality the selection of variables and the development of the model have to be done iteratively, since each depends on the other. The choice of variables has an influence on the modelling, and the modelling process itself will influence the variables needed. The details of each of the above steps are given in the following subsections.
Fig. 22.3 Steps in the synthesis phase of Progressive Design Methodology (PDM)
- Economic criterion/criteria: In engineering system design problems the economic criterion involves total capital cost, annual cost, annual net profit, return on investment, cost-benefit ratio or net present worth.
- Technological criterion/criteria: The technological criterion involves production time, production rate, and manufacturability.
- Performance criterion/criteria: The performance criterion is directly related to the performance of the engineering system, such as torque, losses, speed, mass, etc.
In the synthesis phase of PDM the performance criteria are taken into consideration because they can be expressed explicitly in the mathematical model of the system. The economic and technological criteria are suitable for the intermediate analysis and final design phases of PDM because by then detailed knowledge about the engineering system's performance and dimensions is available.
models, and hence are suited for MOOP. The main limitation of analytical models is that certain approximations are required to develop them. However, in certain cases where approximations cannot be made and a very deep insight into the system is required, numerical simulation methods such as the Finite Element Method (FEM), Computational Fluid Dynamics (CFD), etc. have to be adopted. The main drawback of numerical models is that they are computationally intensive and are not suitable for an exhaustive optimisation process.
A detailed discussion about the suitability of the models is given later in this chapter.
With this in view, an a posteriori based optimisation method is used in PDM. In principle any a posteriori based multiobjective optimisation algorithm, such as NSGA-II [28], SPEA2 [29], etc., can be used in PDM. In this work the NBGA [30] was used. Choosing a suitable solution from the Pareto optimal set forms the second phase of PDM and is described in Section 22.4.
The general problem is thus a Multi Criteria Decision-Making (MCDM) problem, where the designer is to choose the highest performing design configuration from the available set of design alternatives and each design is judged by several, possibly competing, performance criteria or variables. An MCDM problem is expressed as:
          C_1   C_2  ...  C_n
    A_1 | x_11  x_12 ... x_1n |
D = A_2 | x_21  x_22 ... x_2n |                                (22.1)
    ... | ...   ...  ...  ... |
    A_m | x_m1  x_m2 ... x_mn |

w = {w_1, w_2, ..., w_n}                                       (22.2)
where A_i, i = 1, ..., m are the possible alternatives; C_j, j = 1, ..., n are the criteria with which alternative performances are measured; x_ij is the performance score of alternative A_i with respect to attribute C_j; and w_j is the relative importance of attribute C_j. The alternative performance rating x_ij can be crisp, fuzzy, and/or linguistic. The linguistic approach is an approximation technique in which the performance ratings are represented as linguistic variables [38, 39, 40]. The classical MCDM problem consists of two phases:
- an aggregation phase, in which the performance values with respect to all the criteria are combined into a collective performance value for each alternative;
- an exploitation phase, in which the collective performance values are used to obtain a rank ordering, sorting or choice among the alternatives.
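The two phases can be illustrated with crisp scores and a weighted sum, which is only one admissible aggregation operator (the chapter itself uses linguistic operators later); the matrix values below are invented for illustration:

```python
# Two-phase MCDM sketch on a crisp decision matrix x[i][j] with weights w[j]:
# aggregation (weighted sum per alternative), then exploitation (rank ordering).

def aggregate(x, w):
    # collective performance value of each alternative
    return [sum(wj * xij for wj, xij in zip(w, row)) for row in x]

def rank(scores):
    # indices of the alternatives, best (highest collective score) first
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

x = [[0.7, 0.2, 0.9],   # alternative A1 scored on criteria C1..C3
     [0.4, 0.8, 0.6],   # A2
     [0.9, 0.5, 0.3]]   # A3
w = [0.5, 0.3, 0.2]     # relative importance of C1..C3

scores = aggregate(x, w)
print(rank(scores))  # -> [2, 0, 1]: A3 best, then A1, then A2 (0-based)
```

Swapping `aggregate` for a min-, max- or averaging operator changes the collective values but leaves the exploitation phase untouched, which is why the two phases are treated separately.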
The various parts of the intermediate analysis phase of PDM are:
Averaging operators are all those functions lying between the maximum and the minimum.
For linguistic weighted information, the aggregation operators mentioned above have to be modified for linguistic variables, and can be placed under two categories [55]: Linguistic Weighted Disjunction (LWD) and Linguistic Weighted Conjunction (LWC). Fig. 22.10 shows the detailed classification of the linguistic aggregation operators. In the following, the mathematical formulation of LWD and LWC is given. In order to illustrate each of the above mentioned linguistic aggregation operators, the following example is considered [56]. Example: for each alternative an expert is required to provide his/her opinion in terms of elements from the following scale:

S = {OU(S7), VH(S6), H(S5), M(S4), L(S3), VL(S2), N(S1)}    (22.4)
where OU stands for Outstanding, VH for Very High, H for High, M for Medium, L for Low, VL for Very Low and N for None. The expert provides an opinion on a set of five criteria. An example set of criteria for an electrical drive can be:
- C1 = Mass of the motor (minimum mass 100 gram, maximum mass 800 gram)
- C2 = Cost of the electrical drive (minimum cost 10 Euros, maximum cost 80 Euros)
- C3 = Losses in the electrical drive (minimum loss 10 watts, maximum loss 80 watts)
- C4 = Electrical time constant (minimum time constant 0.1 milliseconds, maximum time constant 0.8 milliseconds)
- C5 = Moment of inertia of the motor (minimum moment of inertia 1, maximum moment of inertia 8)
The performance of each alternative is also defined in terms of the scale shown above. The scale is evenly distributed, and the scale for each criterion is given in Table 22.1 [55]. The problem is to select a drive that has the lowest losses, lowest cost, lowest mass, a low electrical time constant and a low moment of inertia. The motor is to be used in a hand-held drill. For this application the mass of the motor and its cost are very important, because a lighter motor with a low cost will be most preferred; hence these two criteria are given Very High (VH) importance. The efficiency of the motor is of moderate importance for this application and is given Medium (M) importance. The electrical time constant and the moment of inertia of the rotor matter for the dynamic behaviour of the motor but are not very important in a hand-held drill, and are given Low (L) and Very Low (VL) importance respectively. The importance of each criterion is shown in Table 22.2. The performance of each alternative on all the criteria is also shown in Table 22.2; the numerical value is given in brackets. The aggregation of the weighted information using Linguistic
Weighted Conjunction (LWC) is defined as follows:

f = LWC [(w_1, a_1), ..., (w_m, a_m)]    (22.5)

where LWC = MIN_{i=1,...,m} Max(Neg(w_i), a_i) and m is the number of criteria. An example of LWC is the Kleene-Dienes Linguistic Implication Function LI1 [58]:

LI1(w, a) = Max(Neg(w), a)    (22.6)
Table 22.1 The relation between numerical values and linguistic variables

Term     C1 (gram)  C2 (Euros)  C3 (watts)  C4 (ms)  C5
N (S1)   100-200    10-20       10-20       0.1-0.2  1-2
VL (S2)  200-300    20-30       20-30       0.2-0.3  2-3
L (S3)   300-400    30-40       30-40       0.3-0.4  3-4
M (S4)   400-500    40-50       40-50       0.4-0.5  4-5
H (S5)   500-600    50-60       50-60       0.5-0.6  5-6
VH (S6)  600-700    60-70       60-70       0.6-0.7  6-7
OU (S7)  700-800    70-80       70-80       0.7-0.8  7-8

Table 22.2 Importance of each criterion and performance of the five alternatives (numerical values in brackets)

Criterion  Importance  Alt. 1    Alt. 2    Alt. 3   Alt. 4    Alt. 5
C1         VH          M(425)    M(460)    H(572)   OU(720)   H(550)
C2         VH          L(34)     OU(75)    M(47)    M(45)     M(46)
C3         M           OU(77)    VH(64)    VH(64)   H(53)     H(55)
C4         L           VH(0.65)  VH(0.67)  H(0.53)  VH(0.66)  OU(0.74)
C5         VL          OU(7.6)   H(5.6)    OU(7.8)  H(5.8)    VH(6.5)
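The evenly distributed scale of Table 22.1 amounts to uniform binning of each criterion's range. A minimal sketch, assuming seven equal-width bins (the function name is illustrative):

```python
S = ["N", "VL", "L", "M", "H", "VH", "OU"]  # S1..S7, increasing order

def to_linguistic(x, lo, hi):
    # map a numerical value in [lo, hi] onto the evenly distributed 7-term scale
    width = (hi - lo) / len(S)
    k = min(int((x - lo) / width), len(S) - 1)  # clamp the upper boundary
    return S[k]

print(to_linguistic(425, 100, 800))  # -> M, matching C1 = M(425) in Table 22.2
print(to_linguistic(75, 10, 80))     # -> OU, matching C2 = OU(75)
```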
Based on the example given in Table 22.2, the net performance of the first alternative based on LI1 is:

f1 = MIN [LI1(VH, M), LI1(VH, L), LI1(M, OU), LI1(L, VH), LI1(VL, OU)]
   = MIN [M, L, OU, VH, OU] = L    (22.7)
Hence on the basis of LI1 the final score of all the alternatives is [L, M, M, M, M].
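Using the importance weights and performance values of Table 22.2, the LI1-based aggregation can be reproduced in a few lines; the function names are our own:

```python
S = ["N", "VL", "L", "M", "H", "VH", "OU"]   # S1..S7 from eq. (22.4)
idx = {s: i for i, s in enumerate(S)}

def neg(s):
    # negation on a 7-term scale: Neg(S_i) = S_{8-i}
    return S[len(S) - 1 - idx[s]]

def li1(w, a):
    # Kleene-Dienes implication: LI1(w, a) = Max(Neg(w), a)
    return max(neg(w), a, key=idx.get)

def lwc(weights, perf):
    # f = MIN_i LI1(w_i, a_i), eq. (22.5)
    return min((li1(w, a) for w, a in zip(weights, perf)), key=idx.get)

weights = ["VH", "VH", "M", "L", "VL"]       # importance of C1..C5
alternatives = [["M", "L", "OU", "VH", "OU"],
                ["M", "OU", "VH", "VH", "H"],
                ["H", "M", "VH", "H", "OU"],
                ["OU", "M", "H", "VH", "H"],
                ["H", "M", "H", "OU", "VH"]]
print([lwc(weights, a) for a in alternatives])  # -> ['L', 'M', 'M', 'M', 'M']
```

The output matches the final scores [L, M, M, M, M] quoted above and the LI1 column of Table 22.3.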
The total scores of all five alternatives under the different aggregation operators are summarised in Table 22.3. From these results the following conclusions can be drawn:
- The choice of linguistic aggregation operator can influence the results of the intermediate analysis process.
Table 22.3 Result of total score of all the alternatives using different aggregation operators

Alternative  Min LD1  Nilpotent LD2  Weakest LD3  Kleene-Dienes LI1  Gödel LI2  Fodor LI3  Łukasiewicz LI4
1            M        M              M            L                  L          L          M
2            VH       VH             VH           M                  M          M          H
3            H        H              VL           M                  M          M          M
4            VH       VH             VH           M                  M          M          H
5            H        H              L            M                  M          M          H
- The linguistic weighted disjunction aggregation operators in general give an optimistic average value to the alternatives. The weakest linguistic disjunction gives the least optimistic value to the alternatives.
- The linguistic weighted conjunction aggregation operators in general give a pessimistic average value to the alternatives.
- Of all the conjunction operators, Łukasiewicz's implication operator gives the least pessimistic final score to all the alternatives.
- The disjunction aggregation operators are suitable if it is required to select a set of as many alternatives as possible. This situation can arise in the initial design phase, when the designer wants to include as many alternatives as possible for further investigation.
- If, in the initial design process, the number of alternatives is large and there is limited capability, in terms of manpower and computing power, to investigate each alternative, then the linguistic weighted conjunction operators are preferred.
multiobjective optimization algorithms an initial set of feasible solutions is generated. Since multiobjective optimization is employed and detailed knowledge of the system is not available, it is prudent to use simple low fidelity models of the system. The advantage of low fidelity models is that they are computationally less intensive and hence suitable for multiobjective optimization. Suitable low fidelity models are analytical models; in many situations it is possible to develop an analytical model of the system by making suitable assumptions. However, if analytical models are not possible, then simple numerical models of the system should be used in the synthesis phase of PDM. In the intermediate analysis phase of PDM the selection is performed. The central challenge of this phase is to select, from the set of solutions obtained in the synthesis phase, a subset of suitable solutions. The selection process involves evaluating the available alternatives. In PDM the alternatives are evaluated based on criteria that cannot be expressed mathematically, such as the manufacturability of the system. In order to achieve this, judgmental models are used. Judgmental models are formed by the deductions and assessments contained in the mind of an expert. In the intermediate analysis the expert evaluates each alternative based on judgmental models and assigns preferences based on linguistic variables, and the entire multi-attribute decision making is carried out (chapter 2). After the selection process a small set of suitable solutions is generated. The final analysis phase of PDM involves the tuning process, in which the system performance criteria are improved by varying system parameters. In order to achieve this, a high fidelity model of the system to be designed is developed. Each alternative obtained after the intermediate analysis phase is evaluated using the high fidelity model and tuning of the system is performed. The high fidelity models can be developed using finite element methods (FEM), computational fluid dynamics (CFD), etc. These models are computationally intensive but are closer to the actual system, and are suitable for the final analysis phase of PDM. In the next section the PDM is applied to the design of a BLDC motor.
The aim of the problem is to design a motor with a cogging torque of less than 20 mNm, maximum efficiency, minimum mass and a trapezoidal back emf.
- Inverter: full bridge voltage source inverter
- Motor topology: inner rotor with surface mount magnets
- Phase connection: the phases are connected in star
The additional constraints of the motor are:
- Outer stator diameter: 40 mm
- Max. length: 50 mm
- Air gap length: 0.2 mm
- Maximum input voltage: 50 V
In the synthesis phase of PDM only a simple model of the BLDC drive is developed. However, determining parameters like the cogging torque and the shape of the back emf requires detailed analytical models or FEM models. The mass and efficiency of the motor can be calculated with relative ease compared to the cogging torque and back emf shape. Hence the objectives considered in the synthesis phase are:
- Minimise the mass
- Maximise the efficiency
Variable name                                                                     Symbol   Min. value  Max. value  Units
Number of poles                                                                   Np       2           10          -
Number of slots                                                                   Ns       3           15          -
Length of the motor                                                               Lmot     1           15          mm
Ratio of inner diameter of motor to outer diameter                                dido     0.1         0.7         -
Ratio of magnet angle to pole pitch                                               m        0.1         1           -
Height of the magnet                                                              hm       1           3           mm
Remanence field of the permanent magnet                                           Br       0.5         1.2         T
Maximum allowable field density in the lamination material for linear operation   Bfe      0.5         2           T
Number of turns in the coils of the motor                                         Nturns   1           100         -
Input voltage                                                                     Vdc      10          50          V
Motor Model
In this section a simple design methodology for the surface mounted BLDC motor is given [57]. To develop this model certain assumptions have been made. The
assumptions made are:
The general configuration of the motor is shown in Fig. 22.11. The motor design
equations are developed in detail in [57].
22.7.4.2
The schematic of the typical voltage source inverter is shown in Fig. 22.12. The coupled circuit equations of the stator windings in terms of the motor electrical parameters are

[V] = [R][i] + [L] d[i]/dt + [e]    (22.12)
where

[V] = [Va, Vb, Vc]    (22.13)

      | Rph  0    0   |
[R] = | 0    Rph  0   |    (22.14)
      | 0    0    Rph |

[i] = [ia, ib, ic]    (22.15)

      | Lph  0    0   |
[L] = | 0    Lph  0   |    (22.16)
      | 0    0    Lph |

[e] = [ea, eb, ec]    (22.17)
where Rph and Lph are the phase resistance and phase inductance values respectively, defined earlier, and Va, Vb and Vc are the input voltages to phases a, b and c respectively. The induced emfs ea, eb and ec and the phase resistance Rph and phase inductance Lph are determined from the motor model described above. The electromagnetic torque is given by

Te = (ea·ia + eb·ib + ec·ic) / ωm    (22.18)

where ωm is the mechanical speed of the motor. The analytical solution of eq. (22.12) follows the lines of the work of Nucera et al. [58].
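A minimal numerical sketch of eqs. (22.12) and (22.18): a forward-Euler step of the decoupled phase equations (diagonal [R] and [L]) and the torque computation. All parameter values below are illustrative assumptions, not the motor designed in this chapter:

```python
# Forward-Euler step of eq. (22.12) per phase: di/dt = (V - R*i - e) / L,
# valid because [R] and [L] are diagonal, so the three phases decouple.

def euler_step(i, V, e, R_ph, L_ph, dt):
    return [ik + dt * (Vk - R_ph * ik - ek) / L_ph
            for ik, Vk, ek in zip(i, V, e)]

def torque(e, i, wm):
    # eq. (22.18): Te = (ea*ia + eb*ib + ec*ic) / wm
    return sum(ek * ik for ek, ik in zip(e, i)) / wm

i = [0.0, 0.0, 0.0]        # phase currents ia, ib, ic (A)
V = [24.0, -12.0, -12.0]   # applied phase voltages (V), illustrative
e = [2.0, -1.0, -1.0]      # back emfs (V), illustrative
i = euler_step(i, V, e, R_ph=0.5, L_ph=1e-3, dt=1e-5)
print(torque(e, i, wm=100.0))  # instantaneous torque after one step (Nm)
```

In practice the back emfs change with rotor position, so a full drive simulation couples this electrical step with the mechanical equation of motion; the sketch only shows how the two printed equations interact.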
perform the screening process certain parameters are required. Each solution obtained in the previous section is evaluated based on the values of these parameters. The application of the various steps of the intermediate analysis is explained in the following subsection.
- Stack length
- Losses
- Mass
- Electrical time constant
- Inertia of the rotor
- Ratio of inner diameter of stator to outer diameter
- Number of turns
- Switching frequency
- Width of the tooth
- Thickness of the stator yoke
- Input voltage
- Area of slots
The losses and mass of the motor are the primary parameters. A motor with the smallest losses and smallest mass is preferable. However, as can be seen from the results of the previous section, as the mass increases the losses decrease. Hence in the intermediate analysis both are considered for the screening purpose. The electrical time constant of the motor has a direct influence on the dynamic performance: a motor with a lower time constant has a better dynamic response than a motor with a higher electrical time constant. Similarly, the inertia of the rotor is an important parameter because it influences the dynamic performance of the motor; a motor with high inertia will accelerate slowly compared to a motor with lower inertia. The ratio of inner diameter of stator to outer diameter is considered because it has an influence on the end turn of the winding. The switching frequency has an impact on the performance of the motor: a higher switching frequency results in lower torque ripple but higher switching losses, while a lower switching frequency results in higher torque ripple but lower switching losses. The magnetic loading and the mechanical aspects determine the width of the tooth. If the tooth is too thin then it may not be able to withstand the mechanical forces acting on it; hence in this analysis a tooth with higher thickness is preferred. The thickness required for the stator yoke depends on the magnetic loading of the machine as well as on the mechanical properties. If the number of pole pairs is small, the allowable magnetic loading and the mechanical loading often determine the thickness of the stator yoke. However, if the number of pole pairs is high enough, the stator yoke may be thin if it is sized according to the allowed magnetic loading; the mechanical constraints may thus determine the minimum thickness of the stator yoke. In the decision making process, the smaller the thickness of the stator yoke the better: a smaller yoke thickness reduces the mass of the steel lamination required. The area of the slot is considered as an objective because it influences the winding. A slot with a smaller area is difficult to wind; hence in this analysis a larger slot area is preferred.
The linguistic term set is denoted as

S = {s_0, s_1, . . . , s_T},  (22.20)

where s_a < s_b if a < b. The linguistic term set in addition satisfies the following conditions:

Negation operator: Neg(s_i) = s_j, j = T − i (T + 1 is the cardinality),  (22.21)
Max operator: max(s_i, s_j) = s_i if s_i ≥ s_j,  (22.22)
Min operator: min(s_i, s_j) = s_i if s_i ≤ s_j.  (22.23)
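The ordered-label operations of (22.21)–(22.23) can be sketched in a few lines of Python. This is a minimal illustration assuming a seven-label term set (T = 6); the label names used here are illustrative, not taken from the chapter.

```python
# Linguistic term set S = {s_0, ..., s_T} with T = 6; the label names
# are illustrative assumptions, not the chapter's.
T = 6
S = ["N", "VL", "L", "ML", "M", "H", "VH"]

def neg(i):
    """Negation operator of (22.21): Neg(s_i) = s_j with j = T - i."""
    return T - i

def lmax(i, j):
    """Max operator of (22.22): max(s_i, s_j) = s_i if s_i >= s_j."""
    return i if i >= j else j

def lmin(i, j):
    """Min operator of (22.23): min(s_i, s_j) = s_i if s_i <= s_j."""
    return i if i <= j else j

print(S[neg(1)], S[lmax(2, 4)], S[lmin(2, 4)])  # -> H M L
```

Note that the negation is an involution on the label indices: applying it twice returns the original label.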
[Table: linguistic importance and preferred direction assigned to each criterion; the entries range over the labels N, VL, L, M, H, VH]
[Table: design variables (dido, Br, Bfe, hm, Vdc, Fsw, wt, wy, Ns, Nm) and computed performance values of the six candidate motors; the column layout was lost in extraction]
the better it is. The area of the slot is given a high importance and a higher value of the slot area is preferred. The results of the multicriteria decision for the motors are given in Table 22.6.
[Table: parameters of the final motor design (Lmotor, dido, Br, Bfe, hm, Vdc, Fsw, wt, wy, Ns, Nm); the value-to-column mapping was scrambled in extraction]
Fig. 22.20 Power vs. Speed characteristics: Comparison between simulation and experimental values
Fig. 22.21 Current vs. Speed characteristics: Comparison between simulation and experimental values
Fig. 22.22 Torque vs. Speed characteristics: Comparison between simulation and experimental values
Fig. 22.23 Cogging torque comparison between simulations and experimental values
22.10 Conclusions
In this chapter the progressive design methodology (PDM) is proposed. This methodology is suitable for designing complex systems, such as electrical drives and power electronics, from the conceptual stage to the final design. The main aspects of PDM discussed are as follows:

• PDM allows effective and efficient practices and techniques to be used from the start of the project.
• PDM ensures that the components of the system are compatible with one another.
• The computation time required for optimisation is reduced, as the bulk of the optimisation is done in the synthesis phase, where the models of the components of the target system are simple.
• The experience of design engineers and production engineers is included in the intermediate analysis, thus ensuring that the target system is feasible to manufacture.

In PDM the decision making factor is critical, as proper decisions about dimensions, features, materials, and performance in the conceptual stage will ensure a robust and optimal design of the system. The different stages of PDM are explained using the example of the design of a BLDC motor, and the results are validated by experiments. It is shown that using PDM an optimal design of the motor can be obtained that meets the performance requirements.
References
1. Balling, R.J., Sobieszczanski, J.S.: Optimization of Coupled Systems: A Critical Overview of Approaches. AIAA Journal 34, 6–17 (1996)
2. Lewis, K., Mistree, F.: Collaborative, Sequential and Isolated Decisions in Design. ASME Journal of Mechanical Design 120, 643–652 (1998)
3. Sobieszczanski, J.S., Haftka, R.T.: Multidisciplinary Design Optimization. Structural Optimization 14, 1–23 (1997)
4. Alexandrov, N.M., Lewis, R.M.: Analytical and Computational Aspects of Multidisciplinary Design. AIAA Journal 40, 301–309 (2002)
5. Sobieszczanski, J.S.: Optimization by Decomposition: A Step from Hierarchic to Non-Hierarchic Systems. In: Second NASA/USAF Symposium on Recent Advances in Multidisciplinary Analysis and Optimization (1988)
6. Sobieszczanski, J.S., Agte, J.S., Sandusky, R.R.J.: Bilevel Integrated System Synthesis. AIAA Journal 38, 164–172 (2000)
7. Sobieszczanski, J.S., Altus, T.D., Philips, M., Sandusky, R.: Bilevel Integrated System Synthesis (BLISS) for Concurrent and Distributed Processing. In: 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization (2002)
8. Tappeta, R., Nagendra, S., Renaud, J.E., Badhrinath, K.: Concurrent Sub-Space Optimization (CSSO) Code Using iSIGHT. Technical Report 97CRDD188, GE (1998)
9. Marczyk, J.: Stochastic Multidisciplinary Improvement: Beyond Optimization. In: Proceedings of the 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Long Beach, USA (2000)
10. Egorov, N., Kretinin, G.V., Leshchenko, I.A.: Stochastic Optimization of Parameters and Control Laws of Aircraft Gas-Turbine Engines – A Step to a Robust Design. In: Inverse Problems in Engineering Mechanics III, pp. 345–353 (2002)
11. Koch, P.N., Wujek, B., Golovidov, O.: A Multi-Stage, Parallel Implementation of Probabilistic Design Optimization in an MDO Framework. In: Proceedings of the 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Long Beach, USA (2000)
12. Tong, M.T., Wujek, B., Golovidov, O.: A Probabilistic Approach to Aeropropulsion System Assessment. In: Proceedings of ASME TURBOEXPO, Munich, Germany (2000)
13. Booker, A., Dennis, J., Frank, P., Serafini, D., Torczon, V., Trosset, M.: A Rigorous Framework for Optimization of Expensive Functions by Surrogates. Structural Optimization 17, 1–13 (1999)
14. Audet, C., Dennis, J., Moore, D.W., Booker, A., Frank, P.D.: A Surrogate-Model-Based Method for Constrained Optimization. In: Proceedings of the 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Long Beach, USA (2000)
15. Audet, C., Dennis, J.: A Pattern Search Filter Method for Nonlinear Programming without Derivatives. SIAM Journal on Optimization 14, 980–1010 (2004)
16. Dasgupta, S.: The Structure of Design Processes. Advances in Computers 28, 1–67 (1989)
17. Shakeri, C.: Discovery of Design Methodologies for the Integration of Multi-disciplinary Design Problems. Mechanical Engineering, Worcester Polytechnic Institute (1998)
18. Systems Engineering Manual Version 3.1, The Federal Aviation Administration
19. Chong, E.P.K., Zak, S.H.: An Introduction to Optimization, 2nd edn. John Wiley & Sons, Chichester (2001)
20. Buede, D.M.: The Engineering Design of Systems: Models and Methods. Wiley Interscience, Hoboken (1999)
21. Hwang, S.P.C., Yoon, K.: Mathematical Programming with Multiple Objectives: A Tutorial. Computers & Operations Research 7, 5–31 (1980)
22. Steuer, R.: Multiple Criteria Optimization: Theory, Computation and Application. John Wiley & Sons, Chichester (1986)
23. Das, I., Dennis, J.E.: A Closer Look at Drawbacks of Minimizing Weighted Sums of Objectives for Pareto Set Generation in Multi-criteria Optimization Problems. Structural Optimization 14, 63–69 (1997)
24. Krus, P., Palmberg, J.O., Lohr, F., Backlund, G.: The Impact of Computational Performance on Optimisation in Aircraft Design. Presented at I MECH E, AEROTECH 1995, Birmingham, UK (1995)
25. Thurston, D.: A Formal Method for Subjective Design Evaluation with Multiple Attributes. Research in Engineering Design 3, 105–122 (1991)
26. Steuer, R., Choo, E.-U.: An Interactive Weighted Tchebycheff Procedure for Multiple Objective Programming. Mathematical Programming 26, 326–344 (1983)
27. Benayoun, R., Montgolfier, J.D., Tergny, J., Laritchev, O.: Linear Programming with Multiple Objective Functions: Step Method (STEM). Mathematical Programming, 366–375 (1971)
28. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A Fast Elitist Nondominated Sorting Genetic Algorithm for Multi-objective Optimization. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)
29. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolutionary Algorithm (2001)
30. Kumar, P., Gospodaric, D., Bauer, P.: Improved Genetic Algorithm Inspired by Biological Evolution. Soft Computing – A Fusion of Foundations, Methodologies and Applications 11, 923–941 (2006)
31. Scott, M.J., Antonsson, E.K.: Arrow's Theorem and Engineering Design Decision Making. Research in Engineering Design 11, 218–228 (1999)
32. Costa, B., Vincke, P.: Multiple Criteria Decision Aid: An Overview. Readings in Multiple Criteria Decision Aid, 3–14 (1990)
33. Carlsson, C., Fuller, R.: Fuzzy Multiple Criteria Decision Making: Recent Developments. Fuzzy Sets and Systems 78, 139–153 (1996)
34. Ribeiro, R.A.: Fuzzy Multiple Attribute Decision Making: A Review and New Preference Elicitation Techniques. Fuzzy Sets and Systems 78, 155–181 (1996)
35. Roubens, M.: Fuzzy Sets and Decision Analysis. Fuzzy Sets and Systems 90, 199–206 (1997)
36. Martinez, L., Liu, J., et al.: A Fuzzy Model for Design Evaluation Based on Multiple-Criteria Analysis in Engineering Systems. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 14, 317–336 (2006)
37. Whitney, D.E.: Manufacturing by Design. Harvard Business Review 66, 83–91 (1988)
38. Zadeh, L.A.: The Concept of a Linguistic Variable and Its Applications to Approximate Reasoning, Part III. Information Sciences 9, 43–80 (1975)
39. Zadeh, L.A.: The Concept of a Linguistic Variable and Its Applications to Approximate Reasoning, Part I. Information Sciences 8, 301–357 (1975)
40. Zadeh, L.A.: The Concept of a Linguistic Variable and Its Applications to Approximate Reasoning, Part II. Information Sciences 8, 357–377 (1975)
41. Bordogna, G., Pasi, G.: A Fuzzy Linguistic Approach Generalizing Boolean Information Retrieval: A Model and Its Evaluation. Journal of the American Society for Information Science 44, 70–82 (1993)
42. Bonissone, P.P.: A Fuzzy Sets Based Linguistic Approach. Approximate Reasoning in Decision Analysis, 329–339 (1986)
43. Bonissone, P.P., Decker, K.S.: Selecting Uncertainty Calculi and Granularity: An Experiment in Trading-off Precision and Complexity. In: Uncertainty in Artificial Intelligence, pp. 217–247 (1986)
44. Bordogna, G., Fedrizzi, M., Pasi, G.: A Linguistic Modelling of Consensus in Group Decision Making Based on OWA Operators. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 27, 126–132 (1997)
45. Delgado, M., Verdegay, J.L., Vila, M.A.: Linguistic Decision Making Models. International Journal of Intelligent Systems 7, 479–492 (1992)
46. Herrera, F., Herrera-Viedma, E.: Linguistic Decision Analysis: Steps for Solving Decision Problems under Linguistic Information. Fuzzy Sets and Systems 115, 67–82 (2000)
47. Torra, V.: Negation Functions Based Semantics for Ordered Linguistic Labels. International Journal of Intelligent Systems 11, 975–988 (1996)
48. Herrera, F., Verdegay, J.L.: A Linguistic Decision Process in Group Decision Making. Group Decision and Negotiation 5, 165–176 (1996)
49. Grabisch, M., Murofushi, T., Sugeno, M.: Fuzzy Measures and Integrals. Physica-Verlag, Heidelberg (1999)
50. Dubois, D., Prade, H.: On the Use of Aggregation Operations in Information Fusion Processes. Fuzzy Sets and Systems 142, 143–161 (2004)
51. Klement, E.P., Mesiar, R., Pap, E.: Triangular Norms. Kluwer Academic Publishers, Dordrecht (2000)
52. Silvert, W.: A Class of Operations on Fuzzy Sets. IEEE Transactions on Systems, Man, and Cybernetics 9 (1979)
53. Calvo, T., Baets, B.D., Fodor, J.: The Functional Equations of Alsina and Frank for Uninorms and Nullnorms. Fuzzy Sets and Systems 120, 15–24 (2001)
54. Yager, R., Fodor, J.: Structure of Uninorms. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 5, 411–427 (1997)
55. Herrera, F., Herrera-Viedma, E.: Aggregation Operators for Linguistic Weighted Information. IEEE Transactions on Systems, Man, and Cybernetics 27, 646–656 (1997)
56. Carlsson, C., Fuller, R.: On Fuzzy Screening Systems. Presented at EUFIT 1995, Aachen, Germany (1995)
57. Hanselman, D.C.: Brushless Permanent Magnet Motor Design, 2nd edn. The Writers' Collective (2003)
58. Nucera, R.R., Sudhoff, S.D., Krause, P.C.: Computation of Steady State Performance of an Electronically Commutated Motor. IEEE Transactions on Industry Applications 25, 110–111 (1989)
Chapter 23
The singular and plural of abbreviations are spelled the same in this chapter.
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 609–635.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
fact, the exact evaluation of network reliability belongs to the class of NP-hard problems [2]; that is, there exists no polynomial-time algorithm to calculate the reliability of a given network exactly. Moreover, even if we knew the reliability of every candidate network, the RNDP itself could not be solved exactly in polynomial time.
To address realistic RNDPs, many algorithms have been studied and proposed in the literature. They can be classified into three categories: (i) enumeration-based approaches, (ii) heuristic approaches, and (iii) computational intelligence. The enumeration-based approaches attempt to evaluate all the possible candidate solutions to find the best one. To avoid exhaustive enumeration, reduction techniques such as branch and bound methods [1] must be involved. The reduction technique restructures the search space so that scanning a limited portion of the search space can yield the optimal solution. It has been shown that, even with a well-designed reduction technique, the enumeration-based approaches are applicable to small networks only.
The heuristic approaches find a sub-optimal solution of the RNDP by exploring the search space using problem-specific trial-and-error mechanisms. The heuristic approaches never guarantee discovery of the optimal solution, but they are practical choices for finding a satisfactory solution in an acceptable time. Typical examples of the classical heuristic-based approaches are greedy heuristics [3], cross-entropy methods [4], simulated annealing [5, 6, 7], and tabu search [8, 9].
Algorithms based on computational intelligence can be regarded as a branch of the heuristic approaches. However, computational intelligence simulates a distinctive search process based on learning, adaptation, and evolution mechanisms. The Genetic Algorithm (GA) is the most widely used optimization technique based on computational intelligence [10, 11]. Mimicking the natural evolution process, the GA maintains a population of candidate solutions by applying selection, crossover, and mutation operators iteratively. The goal of the GA process is to adapt the population to the fitness landscape of the RNDP in order to find a good sub-optimal solution. More recently, it has often been attempted to hybridize the GA with problem-specific local search algorithms to achieve better solution quality [12]. Such a hybrid GA is called genetic local search or a memetic algorithm [13, 14].
This chapter surveys up-to-date research efforts for the RNDP and proposes a new GA hybridized with an Ant Colony System (ACS). The ACS is a heuristic inspired by the behavior of real ants, which establish the shortest path between the nest and a food source [15, 16]. To combine the GA and the ACS, the proposed heuristic algorithm incorporates a Multi-Ring Encoding (MRE), which encodes a candidate network as a union of rings. The MRE has three distinctive advantages for the RNDP. First, it can represent every possible two-edge-connected network. Second, it is free from the expensive algorithms required to repair disconnected or unreliable candidate networks generated by the GA. Third, the MRE allows incorporating a local search heuristic dedicated to ring optimization. In the proposed hybrid heuristic, the GA works as a high-level heuristic evolving a population of multi-ring-encoded individuals with special genetic operators. On the other hand, the ACS fine-tunes each ring by trying to connect the nodes in other possible orders.
This chapter is organized as follows. Section 23.2 describes the mathematical formulation of the RNDP and suggests ways of handling its two objectives, cost and reliability. Section 23.3 classifies reliability metrics and introduces the issues regarding reliability evaluation and estimation. Section 23.4 outlines previous works related to this study. Section 23.5 reviews existing encoding methods developed for the network design problem and discusses the advantages of the proposed MRE. Section 23.6 explains each procedure of the proposed hybrid heuristic in detail. Section 23.7 discusses numerical results comparing the proposed hybrid heuristic to an existing exact algorithm and a genetic local search.
A candidate solution is represented by the binary vector

x = (x_1, x_2, . . . , x_e), x_k ∈ {0, 1},  (23.1)

where x_k = 1 if edge k is included in the sub-network and x_k = 0 otherwise.
Let C(x) and R(x) be the cost and reliability of the candidate solution x. Then, the
objective of the basic RNDP is to find x that forms a connected sub-network of G
such that C(x) is minimized while R(x) is maximized. If other performance criteria
like capacity or transmission delay are specified, they act as additional constraints
of the RNDP.
It is clear that cost and reliability are conflicting objectives: adding edges to a certain network will make it more reliable but more expensive, and vice versa. This implies that the RNDP can be regarded as a bi-objective optimization problem having multiple solutions, where an improvement in cost sacrifices reliability. Such solutions are called Pareto optimal solutions [17]. Much research has been performed to establish the theory and applications of multi-objective optimization. In particular, the GA has enjoyed great success in addressing multi-objective optimization problems: the population-based search paradigm of the GA provides a simple but efficient way to approximate the Pareto optimal solutions in a single GA run [18].
A simple way to make the RNDP single-objective is to treat one of the objectives as a constraint. This study assumes that the minimum reliability requirement Rmin is predetermined and C(x) should be minimized accordingly. Hence, the RNDP is formulated as:

Given: G = (N, E), c_k and p_k for every edge k ∈ E, and Rmin
Over: x = (x_1, . . . , x_e), x_k ∈ {0, 1}
Minimize: C(x)
Subject to: R(x) ≥ Rmin
Table 23.1 Number of nodes n versus the size of the RNDP search space (2^e candidate networks, with e = n(n−1)/2) and the number of spanning trees n^(n−2)

n     2^e             n^(n−2)
5     1024            125
10    3.52 × 10^13    1.00 × 10^8
20    1.57 × 10^57    2.62 × 10^23
With a slight modification, the proposed hybrid heuristic can be applied to the opposite case, where the available budget is prescribed and R(x) should be maximized.

The cost of the edge k is given as c_k ∈ ℝ₊ and C(x) is the sum of the total edge cost:

C(x) = Σ_{k=1}^{e} c_k x_k.

This study assumes that nodes are invulnerable (perfectly reliable) but the edge k may fail with a probability of q_k ∈ [0, 1]. The operating probability is hence p_k = 1 − q_k. The probabilistic metric for all-terminal reliability (ATR) is used as R(x). Refer to the next section to see how R(x) is evaluated or estimated from x and p_k.
The RNDP belongs to the class of NP-hard problems. As mentioned above, the exhaustive search space of the RNDP has a cardinality of 2^e. This grows faster than exponentially as the network size n increases, since e = n(n − 1)/2 for a complete graph. Table 23.1 shows the relationship between the number of nodes and the associated search space size of the RNDP. Even for a small network with n = 10, it is impractical to check the cost and reliability of every candidate solution to pick the best one. Instead, we should rely on heuristic methods, which yield good sub-optimal solutions in an acceptable time.
For the sake of simplicity, suppose again that all the edges have the same failure probability q. Then, we have Pr[F_1] = q^{d_1}. For the other terms on the right-hand side of (23.2), we have

Pr[F_i ∩ F_1^c ∩ F_2^c ∩ ⋯ ∩ F_{i−1}^c] ≥ q^{d_i} ∏_{j=1}^{i−1} (1 − q^{d_j − 1})

for i ∈ {2, . . . , n}, as G(x) may have an edge between node i and j ∈ {1, . . . , i − 1}. Given d_i, we may have a tighter lower bound as follows:

Pr[F_i ∩ F_1^c ∩ F_2^c ∩ ⋯ ∩ F_{i−1}^c] ≥ q^{d_i} ∏_{j=1}^{min(d_i, i−1)} (1 − q^{d_j − 1}) ∏_{j=min(d_i+1, i)}^{i−1} (1 − q^{d_j}).  (23.3)

Substituting these lower bounds into (23.2) yields an upper bound on the ATR:

R(x) ≤ 1 − q^{d_1} − Σ_{i=2}^{n} q^{d_i} ∏_{j=1}^{min(d_i, i−1)} (1 − q^{d_j − 1}) ∏_{j=min(d_i+1, i)}^{i−1} (1 − q^{d_j}).  (23.4)

This upper bound can be calculated in a time polynomial in n, but it sometimes deviates considerably from the actual ATR.
Another simple but intuitive estimation method is Monte Carlo simulation (MCS). Given G(x) = (N, E(x)) and p_k for k ∈ E(x), the MCS randomly generates a number of edge operating scenarios and checks whether the nodes remain connected under each scenario. An edge operating scenario is generated by sampling a uniform random number u ∈ [0, 1] for each edge k; if this number is greater than p_k, the edge is removed. For every scenario, a connectivity test such as breadth-first search is performed. The ratio of the operating scenarios that maintain the node connectivity becomes the reliability measure of G(x).
As the number of MCS iterations grows, a more precise reliability estimate can be expected. The computational load of the MCS for precise estimation grows slightly faster than linearly with the network size, but it is much heavier than that of the bound-based method. For this reason, the bound-based method is generally used for screening out unreliable candidate networks, while the MCS is applied to obtain more precise reliability estimates of the low-cost candidate solutions having high ATR bounds.
Under certain conditions that make reliability evaluation straightforward, the RNDP can be easily solved even for a big n. For example, if all the e edges in E have an identical operating probability p and Rmin ≤ p^{n−1}, the RNDP is reduced to the minimum spanning tree problem, whose objective is to find the shortest-length spanning tree connecting all the nodes in N. Greedy heuristics such as Prim's algorithm [22] can solve the minimum spanning tree problem in polynomial time. If Rmin > p^{n−1} and Rmin ≤ p^n + np^{n−1}(1 − p), the RNDP becomes the Travelling Salesman Problem (TSP) of finding the shortest-length Hamiltonian cycle visiting all the nodes in N [23]. The TSP is NP-hard and thus exact algorithms work only for a small network. However, many approximation algorithms have been proposed to find a good sub-optimal solution in an acceptable time. When all the edges in E have identical cost, theoretical clues obtained from previous works can help us to find a good solution of the RNDP even when Rmin is large. For example, [19] showed that the network with the largest number of spanning trees maximizes the ATR when p is high. However, these kinds of exact methods cannot cope with a realistic RNDP whose Rmin is big and whose c_k and p_k differ for different k ∈ E. The heuristic methods surveyed in the next section are practical approaches for the realistic RNDP.
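To illustrate the minimum spanning tree reduction mentioned above, Prim's algorithm can be sketched in a few lines with a priority queue. This is a generic textbook sketch, not code from the chapter; node labels and the example costs are assumptions.

```python
import heapq

def prim_mst(n, cost):
    """Minimum spanning tree via Prim's algorithm.

    n    -- number of nodes 0..n-1
    cost -- dict mapping undirected edges (i, j) with i < j to costs
    """
    adj = {v: [] for v in range(n)}
    for (i, j), c in cost.items():
        adj[i].append((c, j))
        adj[j].append((c, i))
    in_tree = {0}
    tree_edges = []
    heap = [(c, 0, j) for c, j in adj[0]]   # (edge cost, from, to)
    heapq.heapify(heap)
    while heap and len(in_tree) < n:
        c, i, j = heapq.heappop(heap)
        if j in in_tree:
            continue                        # edge would create a cycle
        in_tree.add(j)
        tree_edges.append((min(i, j), max(i, j)))
        for c2, k in adj[j]:
            if k not in in_tree:
                heapq.heappush(heap, (c2, j, k))
    return tree_edges

cost = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 3.0, (2, 3): 1.0}
print(sorted(prim_mst(4, cost)))  # -> [(0, 1), (1, 2), (2, 3)]
```

The resulting tree achieves reliability p^{n−1} when every edge operates with probability p, which is why the reduction applies only for Rmin ≤ p^{n−1}.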
Although the ACO has broadened its application area to various engineering problems such as routing, quadratic assignment, and scheduling, it especially outperforms other heuristic algorithms in dealing with the TSP. The ACO technique has rarely been applied to general network design problems [15]. This is because an ant itself is incapable of building a generic network, in which a node may have three or more adjacent edges. An intuitive way to utilize the ACO for network design problems is to make it generate an initial Hamiltonian cycle and then augment the cycle by adding edges until the given network constraints are satisfied. This approach is not recommended, since it cannot synthesize every possible connected network. As a part of a network design heuristic, however, the ACO may play an important role in building the minimum-cost path or cycle. From this motivation, we propose the hybrid heuristic in Section 23.6.
sub-problem. Likewise, Li et al. hybridized the ACO and the GA to handle task mapping in multi-core-based systems, where the task scheduling and task assignment problems are combined [38].
Fig. 23.1 example rings — Ring 1: (1, 2, 3); Ring 2: (1, 3, 4, 7); Ring 3: (5, 6, 7)
such operators are the nearest neighbor algorithm, Clarke-Wright algorithm [49],
Christofides algorithm [50], 2-Opt [51], 3-Opt [52], double-bridge move [53], and
Lin-Kernighan algorithm [54].
Under the MRE, a two-edge-connected candidate network can be represented by a set of rings Y = {r_1, . . . , r_L}, where r_l for l ∈ {1, . . . , L} is a simple ring visiting m_l distinct nodes in N. The candidate network is formed by taking the union of all the edges composing r_1, . . . , r_L. For example, Fig. 23.1 illustrates a seven-node network represented by three rings. Define a node whose degree is three or more as a bridge node of a candidate network. Under the MRE, a bridge node acts as a junction point of two or more rings. The network in Fig. 23.1 has three bridge nodes: 1, 3, and 7.
Given a candidate network G(x), we can establish its contraction model G̃(x) as follows:
1. Count the degree of every node in G(x) to decide the bridge node set Ñ.
Fig. 23.2 Contraction model of the network depicted in Fig. 23.1 (black circles: bridge
nodes; white circles: non-bridge nodes; solid lines: contracted edges composed of a single
edge; dashed lines: contracted edges composed of two or more edges)
Table 23.2 List of contracted edges in Fig. 23.2
k  Nodes connected  p_k       q_k       r_k
1  1, 3             0.95      0.05      0
2  1, 7             0.95      0.05      0
3  1, 2, 3          0.9025    0.095     0.0025
4  3, 4, 7          0.9025    0.095     0.0025
5  5, 6, 7          0.857375  0.135375  0.00725
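The probabilities of the contracted edges listed above follow from series composition of their constituent edges, assuming every original edge operates with probability 0.95. Interpreting the three states as "all constituent edges operate" (p_k), "exactly one fails" (q_k), and "two or more fail" (r_k) reproduces the tabulated numbers; this interpretation is an inference from the values, not an explicit statement in the chapter.

```python
def contracted_probs(m, p=0.95):
    """Three-state probabilities of a contracted edge formed by m
    original edges in series, each operating with probability p:
    p_k = all m edges operate, q_k = exactly one fails,
    r_k = two or more fail."""
    q = 1.0 - p
    p_k = p ** m                    # all operate
    q_k = m * q * p ** (m - 1)      # exactly one of the m edges fails
    r_k = 1.0 - p_k - q_k           # remaining probability mass
    return p_k, q_k, r_k

print(contracted_probs(2))  # contracted edges of two original edges
print(contracted_probs(3))  # contracted edge of three original edges
```

For m = 2 this gives (0.9025, 0.095, 0.0025) and for m = 3 (0.857375, 0.135375, 0.00725), matching the table.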
Fig. 23.3 ATR estimation errors averaged over 20 independent MCS runs performed for
original network in Fig. 23.1 and its contraction model in Fig. 23.2 (dotted line: original
network; solid line: contraction model)
Fig. 23.3. It is seen that the reliability estimate obtained from the contraction model converges to the actual ATR much faster. The difference in convergence speed will be even bigger if the given network has many nodes and only a limited portion of them are bridge nodes. Note that the contraction model is also useful in evaluating other network constraints such as throughput or transmission delay, though this is beyond the scope of this study.
g ← 0;
Initialize(Pg);
FOR EACH I ∈ Pg DO
  Repair(I);
  Evaluate(I);
END FOR
REPEAT
  I1, I2 ← Select Parents(Pg);
  I3, I4 ← Generate Offspring(I1, I2);
  Mutate(I3, I4);
  LSACS(I3, I4);
  Evaluate(I3, I4);
  Pg+1 ← Select(Pg, I3, I4);
  g ← g + 1;
UNTIL g ≥ gT
The pseudo code of the HGA is outlined in Table 23.3, where g denotes the
generation index of the HGA, gT the termination generation, Pg the population of
individuals at g, I1 and I2 two parent individuals, and I3 and I4 two offspring individuals. At the initial generation, the HGA creates P individuals and evaluates them.
At every subsequent generation, I1 and I2 are selected from the current population
Pg to generate I3 and I4 . The offspring I3 and I4 undergo mutation and LSACS
operations. After evaluating the offspring, the HGA decides which individual will
survive to the next generation. In the following subsections, each step of the HGA
is discussed in detail.
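The main loop described above can be sketched as a generic Python skeleton. The operator bodies are deliberately left as function parameters, since their implementations (MRE crossover, mutation, LSACS, and so on) are problem-specific; the function name and signature are assumptions, not the chapter's code.

```python
import random

def hga(pop_size, generations, init, repair, evaluate, select_parents,
        crossover, mutate, lsacs, survive, seed=0):
    """Skeleton of the HGA main loop of Table 23.3; every operator is
    supplied as a function by the caller."""
    rng = random.Random(seed)
    population = [repair(init(rng)) for _ in range(pop_size)]
    fitness = [evaluate(ind) for ind in population]
    for _ in range(generations):
        p1, p2 = select_parents(population, fitness, rng)  # Select Parents
        c1, c2 = crossover(p1, p2, rng)                    # Generate Offspring
        c1, c2 = mutate(c1, rng), mutate(c2, rng)          # Mutate
        c1, c2 = lsacs(c1), lsacs(c2)                      # LSACS
        off_fit = [evaluate(c1), evaluate(c2)]             # Evaluate
        population, fitness = survive(population, fitness, [c1, c2], off_fit)
    return population, fitness
```

Plugging in trivial operators (e.g. integers as individuals with fitness equal to their value) is a convenient way to check that the loop improves the population monotonically when the survivor selection is elitist.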
The initial pheromone value is

τ_0 = 1 / (n Σ_{k=1}^{e} c_k).
Denote by V the set of nodes visited by the rings in Y, by W a temporary node set used to create a new ring, and by N\V the relative complement of V in N. The initialization procedure for Y is as follows:
[Steps 1–7 of the initialization procedure did not survive extraction.]
The nearest neighbor algorithm is a simple heuristic to find a short ring r. Its
procedure is as follows:
1. Select a random starting node in W as the current node and mark it as visited.
2. Add the edge connecting the current node to the nearest unvisited node to r.
3. Mark the nearest node as visited and make it the current node.
4. If all the nodes in W are visited, add the edge connecting the current node to the starting node and stop; otherwise, go to Step 2.
Usually, the output of the nearest neighbor algorithm is not the shortest ring and can
be improved by other heuristics.
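The four steps above can be sketched as follows. This is a minimal illustration, not the chapter's code; it assumes edge lengths are Euclidean distances between assumed node coordinates.

```python
import math
import random

def nearest_neighbor_ring(W, coords, rng=random):
    """Nearest neighbor heuristic of Section 23.6.1: builds a ring over
    the node set W; coords maps a node to an assumed (x, y) position
    that defines the edge lengths."""
    def dist(a, b):
        (xa, ya), (xb, yb) = coords[a], coords[b]
        return math.hypot(xa - xb, ya - yb)

    unvisited = set(W)
    current = rng.choice(sorted(unvisited))   # Step 1: random start node
    ring = [current]
    unvisited.remove(current)
    while unvisited:                          # Steps 2-3
        nxt = min(unvisited, key=lambda j: dist(current, j))
        ring.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return ring   # Step 4: the edge ring[-1]-ring[0] closes the ring

coords = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
print(nearest_neighbor_ring([1, 2, 3, 4], coords, random.Random(0)))
```

The returned list gives the visiting order; the closing edge from the last node back to the first is implicit.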
23.6.2 Repair
The ring set Y of a newly created individual I undergoes a repair procedure, where its connectivity is tested and restored. Even when every node in N is visited by one or more rings in Y, the network represented by Y may be disconnected due to the presence of disconnected rings. The breadth-first search is used to check the connectivity of a given individual. If the individual is disconnected, it is repaired by adding new rings. To build a new ring, the repair procedure picks a disconnected node i from N and builds a ring traversing i using the ring creation method used for the initialization procedure. The repair procedure is summarized as follows:
1. Check if the network represented by Y is connected. If so, stop; otherwise, go to the next step.
2. Choose the size of a new ring r as a random integer u ∈ {3, . . . , n}.
3. Make W an empty set.
4. Pick a disconnected node i ∈ N and add it to W.
5. Choose u − 1 random nodes other than i from N and add them to W.
6. Apply the nearest neighbor algorithm to the nodes in W to obtain r.
7. Add r to Y and go to Step 1.
Note that a network represented by the MRE is two-edge-connected once it is
connected. So, the connectivity repair algorithm also repairs two-edge-connectivity.
f(x) = { 1 / C(x)                                          if R(x) ≥ Rmin,
       { 1 / (C(x) + n (Rmin − R(x)) Σ_{k=1}^{e} c_k)      otherwise.        (23.5)
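Equation (23.5) translates directly into code. The argument names in this sketch are illustrative:

```python
def fitness(cost, reliability, r_min, n, total_edge_cost):
    """Penalised fitness of (23.5): feasible networks are ranked by
    1/C(x); infeasible ones pay a penalty proportional to their
    reliability shortfall."""
    if reliability >= r_min:
        return 1.0 / cost
    return 1.0 / (cost + n * (r_min - reliability) * total_edge_cost)

# A feasible network always outranks an infeasible one of equal cost.
f_ok = fitness(100.0, 0.95, 0.90, 10, 500.0)
f_bad = fitness(100.0, 0.80, 0.90, 10, 500.0)
print(f_ok > f_bad)  # -> True
```

The penalty term n(Rmin − R(x)) Σ c_k scales with the total edge cost, so an infeasible solution can never score better than any feasible one of comparable cost.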
Ring pool: (1, 4, 3), (1, 2, 4, 5), (1, 2, 3), (2, 3, 5, 4)
I3 and I4, respectively. Fig. 23.4 illustrates the crossover operation of multi-ring-encoded networks.
23.6.5 Mutation
The mutation operator comprises three sub-operators dedicated to the MRE: the ring-merging, ring-splitting, and ring-resizing operators. The first two operators change the size of Y, while the last one changes the number of nodes visited by the rings. Since the node order of each ring will be fine-tuned by the LSACS, no permutation-based mutation operators such as 2-Opt are used here. Moreover, no mutation operation is performed on the pheromone matrix.
The three sub-operators are applied one by one. The ring-merging operation is performed with a probability pRM ∈ [0, 1]. It randomly selects two rings ra and rb in Y and applies the nearest neighbor algorithm to the nodes visited by ra and rb to create a new ring, which replaces both ra and rb in Y. The network represented by the mutated individual is generally cheaper but less reliable than the original one.
The ring-splitting operator is carried out with a probability pRS ∈ [0, 1]. It randomly selects one ring rc ∈ Y whose size is greater than three and replicates it as rd. Let the number of nodes visited by rd be nd. Given a uniform random integer u ∈ {1, . . . , nd − 3}, the ring-splitting operator removes u + 1 consecutive edges from rd and places an edge between the two disconnected end nodes to make rd connected. The ring-splitting operator ends by adding rd to Y. As a consequence, a shortcut is placed between two nodes visited by rc, making the network more expensive but more reliable.
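The edge-removal step of the ring-splitting operator can be sketched as an index computation on the node list of a ring. This sketch assumes a particular interpretation of the operator: the u nodes isolated by the removal are dropped from the new ring (they remain covered by the original ring rc), and the shortcut connects the two end nodes of the surviving path.

```python
def split_ring(ring, start, u):
    """Ring-splitting sketch: from a copy of `ring` (nodes in visiting
    order), remove u + 1 consecutive edges beginning at the edge
    ring[start]-ring[start+1], then close the remaining path with a
    shortcut between its two end nodes."""
    m = len(ring)
    # The surviving path runs from ring[start + u + 1] around to ring[start].
    return [ring[(start + u + 1 + k) % m] for k in range(m - u)]

print(split_ring([1, 2, 3, 4, 5, 6], start=0, u=2))  # -> [4, 5, 6, 1]
```

Here edges (1, 2), (2, 3), and (3, 4) are removed, nodes 2 and 3 drop out of the new ring, and the shortcut edge (1, 4) closes the remaining path 4-5-6-1.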
The ring-resizing operator is applied to every ring r ∈ Y with a probability pRR ∈ [0, 1]. Let Nr be the set of nodes visited by the ring r. The ring-resizing operator either augments or diminishes r with an equal probability of 0.5. When augmenting r, the ring-resizing operator randomly chooses a node i in N\Nr and takes the nearest
[Figure: ring merging, ring splitting, and ring resizing applied to Y = {(1, 2, 4, 5), (2, 3, 5, 4)}]
rings in other possible ways. If the new ring set generated by the ants achieves a better fitness, it replaces Y, as described in Steps 2 and 3. In Step 4, a ring randomly chosen from Y passes through a further fine-tuning stage, named the 2-Opt operation, to generate each of the b ring sets [51]. The 2-Opt operator randomly picks two edges in r and removes them to obtain two separate paths. The paths are reconnected in the other possible way by reversing the node sequence of one path. An example of the 2-Opt operation is illustrated in Fig. 23.6. If this modification improves the fitness, the modified ring replaces r, as shown in Steps 5 and 6. Step 7 updates the pheromone matrix with the updated ring set. More details on each step are described in the later part of this subsection.
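A 2-Opt move on a ring stored as a node list amounts to reversing the segment between the two removed edges. This is a generic sketch of the move, not the chapter's implementation:

```python
def two_opt_move(ring, a, b):
    """2-Opt move: remove edges (ring[a], ring[a+1]) and
    (ring[b], ring[b+1]) with a < b, then reconnect the two resulting
    paths in the other possible way by reversing the segment between
    them."""
    return ring[:a + 1] + ring[a + 1:b + 1][::-1] + ring[b + 1:]

# Removes edges (2, 3) and (5, 6); adds edges (2, 5) and (3, 6).
print(two_opt_move([1, 2, 3, 4, 5, 6], 1, 4))  # -> [1, 2, 5, 4, 3, 6]
```

The move preserves the node set of the ring, so only the edge lengths (and hence the cost) change.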
The proposed LSACS differs from the standard ACS in two aspects. First, the LSACS optimizes multiple rings at the same time: given Y = {r1, . . . , rL}, the LSACS creates L ants, which share the same pheromone matrix H. Second, the LSACS performs a single iteration of the ant algorithm, while the standard ACS performs multiple iterations, each of which creates multiple ant tours and updates the pheromone matrix with the best tour. For the proposed HGA, a single iteration is enough, since the pheromone matrix evolves as a part of the HGA individual. The multi-generation operation of the HGA effectively simulates the multi-iteration operation of the standard ACS.
To generate each of the a ring sets, the LSACS assigns L ants to N_1, . . . , N_L, where N_l for l ∈ {1, . . . , L} represents the set of nodes visited by r_l. The daemon of each ant generates a tour (ring) from N_l using an algorithm identical to the nearest-neighbor algorithm described in Section 23.6.1 except for the way the next node is chosen from the current node. Instead of taking the nearest node as the next node, the ant daemon uses the pseudo-random-proportional rule to choose the next node. Given the current node i, the unvisited node set U, H, and the edge cost c_{i,j} for j ∈ U, the daemon builds an ant decision table whose element corresponding to j ∈ U is formulated as:

a_{i,j} = τ_{i,j} c_{i,j}^β / Σ_{m∈U} τ_{i,m} c_{i,m}^β ,

where β < 0 is a tunable parameter representing the relative importance of c_{i,j} over τ_{i,j}. With a probability p_A ∈ [0, 1], the daemon chooses the next node j such that
τ_{i,j} = (1 − ρ) τ_{i,j} + ρ τ_0 ,    (23.6)

where ρ ∈ (0, 1) is a tunable parameter. The local pheromone update increases solution diversity by encouraging subsequent ants to generate a new tour that has not emerged so far.
A global pheromone update is performed for the updated ring set Y. If Y contains an edge between nodes i and j, the associated pheromone is updated as

τ_{i,j} = (1 − α) τ_{i,j} + α f ,

where α ∈ (0, 1) is a tunable parameter and f is the fitness of the updated Y. The higher the fitness of Y, the more pheromone is deposited over the edges in Y.
Note that the LSACS never destroys the connectivity of the network represented by
the offspring.
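As a sketch of these rules (the symbols β, ρ, α, and τ_0 are my reading of notation lost in extraction; the exploitation branch of the choice rule follows the standard ACS argmax, since the original sentence is cut at the page break):

```python
import random

def next_node(i, U, tau, c, beta=-3.0, pA=0.9, rng=random):
    """Pseudo-random-proportional choice of the next node.  The decision
    value is a[j] = tau[(i, j)] * c[(i, j)] ** beta with beta < 0, so
    cheap, pheromone-rich edges score highest."""
    a = {j: tau[(i, j)] * c[(i, j)] ** beta for j in U}
    if rng.random() < pA:
        return max(a, key=a.get)              # exploit the best-scored edge
    total = sum(a.values())                   # otherwise roulette-wheel sampling
    r, acc = rng.random() * total, 0.0
    for j in sorted(U):
        acc += a[j]
        if acc >= r:
            return j
    return j

def local_update(tau, i, j, rho=0.1, tau0=1.0):
    # eq. (23.6): pull the visited edge back toward tau0 for diversity
    tau[(i, j)] = (1.0 - rho) * tau[(i, j)] + rho * tau0

def global_update(tau, edges, f, alpha=0.1):
    # deposit pheromone proportional to the fitness f of the updated Y
    for (i, j) in edges:
        tau[(i, j)] = (1.0 - alpha) * tau[(i, j)] + alpha * f
```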
                         LSGA [10]                     HGA
Network encoding         Edge encoding                 Multi-ring encoding
MCS                      Based on original network     Based on contraction model
Repair algorithm         Greedy edge augmentation      Random ring augmentation
Crossover                Uniform edge crossover        Ring swapping
Local search heuristic   Randomized greedy mutation    Local search ant colony system
the steady-state version of the LSGA was implemented so that it would breed two offspring at every generation. This choice was based on the observation that the number of offspring mainly affects the computation time of both algorithms. Third, the two algorithms applied the MCS to the best individual in the current population and to the offspring individuals whose initial fitness, evaluated with Jan's upper bound, is higher than that of the best individual.
On the other hand, we retained the core features of the LSGA, such as the encoding method, repair algorithm, and genetic operators. The edge encoding of the LSGA represented a candidate network with an e-dimensional binary vector (23.1). To repair a candidate network violating the two-edge-connectivity constraint, the LSGA used a greedy edge augmentation procedure, which added the least-cost edges to connect the nodes of degree one. The uniform crossover operator of the LSGA ensured that each offspring is two-edge-connected and contains a least-cost spanning tree of its parents. The mutation operator implemented a randomized greedy local search algorithm. If every node in N had a degree of two, an edge was randomly chosen and added to the network. If the node degrees were greater than two for all the nodes, an expensive edge was removed from the network as long as the two-edge-connectivity was maintained. The LSGA carried out the MCS for the original network while the HGA used the contraction model for the MCS runs.
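As a sketch of the repair step (only the degree-one rule described above; the full two-edge-connectivity test is outside this fragment, and the names are mine):

```python
def repair(n, edges, c):
    """Greedy edge augmentation: while some node has degree one, attach it
    to another node via the least-cost edge not yet in the network."""
    E = {frozenset(e) for e in edges}
    deg = lambda v: sum(1 for e in E if v in e)
    while True:
        ones = [v for v in range(n) if deg(v) == 1]
        if not ones:
            return sorted(tuple(sorted(e)) for e in E)
        v = ones[0]
        w = min((u for u in range(n) if u != v and frozenset((v, u)) not in E),
                key=lambda u: c[v][u])       # cheapest absent edge at v
        E.add(frozenset((v, w)))
```

For example, repairing the path 0-1-2-3 adds the cheapest edge incident to a degree-one endpoint, here closing the path into a ring.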
Both algorithms used P = 50, g_T = 5000, = 3, p_C = 0.7, and 10000 MCS iterations, which gave good results during the experiment. The HGA used p_RM = 0.3, p_RS = 0.3, p_RR = 0.1, a = b = 2, β = −3, p_A = 0.9, and ρ = α = 0.1. The mutation rate and drop rate of the LSGA were chosen as 0.3 and 0.6, respectively, as suggested in [10].
The BBA is an exact algorithm that attempts to find the optimal solution of the RNDP by going through a limited portion of the search space using a special arrangement of candidate solutions. The BBA works only for the case where all the edges in E have the same operating probability p. All the candidate solutions of the RNDP are grouped into e − n + 2 sets S_{n−1}, S_n, . . . , S_e, where e = n(n − 1)/2 and S_l represents the set of candidate solutions composed of l edges. Using (23.4), the BBA approximates the maximum ATR achievable by S_l, denoted by R_l. Since R_{n−1} < R_n < . . . < R_e, the BBA first determines l such that R_{l−1} < R_min and R_l ≥ R_min to reduce the search space to S_l ∪ · · · ∪ S_e.
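Because the bound sequence is increasing, the pruning step reduces to locating the first feasible edge count; a sketch (function and variable names are mine):

```python
from bisect import bisect_left

def first_feasible_size(R, Rmin, n):
    """R[k] is the approximate maximum ATR achievable with (n - 1 + k)
    edges, so the list is increasing.  Return the smallest edge count l
    with R_l >= Rmin; candidate sets with fewer edges can be pruned."""
    k = bisect_left(R, Rmin)         # first index with R[k] >= Rmin
    if k == len(R):
        return None                  # no feasible edge count at all
    return (n - 1) + k
```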
RNDP (n)   Edge operating probability (p)   Minimum reliability requirement (R_min)
7          0.9                              0.95
10         0.9                              0.95
20         0.95                             0.95
30         0.97                             0.95
40         0.975                            0.95
50         0.98                             0.95
70         0.985                            0.95
100        0.99                             0.95
Table 23.6 Average solution quality obtained from the single BBA run and 30 LSGA and HGA runs (better results are highlighted in boldface)

RNDP (n)   BBA     LSGA     HGA
7          2.623   2.913    2.625
10         N/A     4.763    4.318
20         N/A     5.712    5.190
30         N/A     7.206    6.749
40         N/A     8.425    7.582
50         N/A     9.220    8.315
70         N/A     10.974   10.133
100        N/A     13.638   11.490
and obtained a P-value less than 0.001 for every RNDP instance. This verifies that the HGA significantly outperforms the LSGA in solution quality. Further investigation of the experimental results revealed that the LSGA tended to converge prematurely at early generations because its greedy repair and mutation operators are biased toward nearby local optima. The diversity of the LSGA crossover operator was limited because every edge of an offspring always comes from one of its parents. Moreover, the edge-wise greedy heuristic was not very helpful for fine-tuning the paths comprising a candidate network. For example, the edge-wise greedy heuristic cannot mimic the 2-Opt operation. On the other hand, the HGA exhibited strong solution diversity. In fact, the crossover operator of the HGA is able to generate any two-edge-connected network regardless of what its parents look like, thanks to the ring swapping and augmentation mechanisms. It was also observed that the mutation and LSACS operators of the HGA worked properly to fine-tune the ring topologies comprising a candidate network.
Table 23.7 lists the statistics of the CPU seconds obtained from the three algorithms. Between the LSGA and HGA, we could not judge which algorithm generally ran faster. We only observed that the CPU seconds of the LSGA varied widely, while the CPU seconds of the HGA were roughly proportional to n. As mentioned before, the CPU seconds are mainly determined by the number of MCS runs performed for the offspring individuals. Once all the individuals in the current population represent the same network topology, the LSGA keeps generating similar offspring. Therefore, the LSGA will experience no MCS run if the offspring never outperform the best individual in future generations. On the other hand, if the reliability estimate of the best individual is slightly lower than R_min, the offspring will outperform the best individual and undergo the MCS procedure, which may adjust the fitness of the offspring below that of the best individual. This could be repeated for all the remaining generations, wasting CPU seconds. The consequence of this irregular behavior of the LSGA is also illustrated in Table 23.8, which shows the ratio of computation time spent performing MCS during the runs of the two algorithms. The non-MCS computational load was roughly proportional to n for the HGA, but no such tendency was observed for the LSGA. It is also seen that the HGA required more computational
Table 23.7 CPU seconds obtained from the single BBA run and 30 LSGA and HGA runs (better results are highlighted in boldface)

RNDP (n)   BBA   LSGA mean   LSGA std. dev.   HGA mean   HGA std. dev.
7          311   121         130              28         12
10         N/A   74          156              40         23
20         N/A   174         394              77         27
30         N/A   12          13               58         20
40         N/A   64          193              111        49
50         N/A   57          138              114        39
70         N/A   312         1110             258        100
100        N/A   227         794              389        49
Table 23.8 Average ratios of CPU seconds spent to perform MCS during the LSGA and HGA runs

RNDP (n)   LSGA   HGA
7          0.98   0.86
10         0.98   0.88
20         0.97   0.82
30         0.64   0.58
40         0.89   0.61
50         0.81   0.41
70         0.93   0.42
100        0.71   0.16
resources than the LSGA to perform non-MCS routines. This implies that the special genetic operators and the LSACS dedicated to the MRE are more expensive than their conventional counterparts.
Fig. 23.7 depicts the best solutions obtained from the 30 runs of the LSGA and HGA for the RNDP instance with n = 10. Even with the naked eye, we could see that the HGA yielded a better solution. This is even more impressive if we consider the number of evaluated candidate solutions. With P = 50, g_T = 5000, and a = b = 2, the 30 HGA runs evaluated at most 1.5 × 10^6 distinct candidate solutions. This is tiny compared with the exhaustive search space size of 3.52 × 10^13.
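The exhaustive figure follows directly from the edge encoding: with n = 10 there are e = n(n − 1)/2 = 45 candidate edges, and each network either includes or excludes every edge, giving 2^45 topologies. A quick check:

```python
n = 10
e = n * (n - 1) // 2            # 45 possible edges for n = 10
search_space = 2 ** e           # each edge is either present or absent
assert e == 45
assert search_space == 35184372088832   # about 3.52e13
evaluated = 1.5e6
fraction = evaluated / search_space     # roughly 4e-8 of the space
```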
The results of the computer experiments verified that the RNDP is a computationally expensive problem. As an enumeration-based approach, the BBA worked for the RNDP only when n is very small. Even for a moderate network size with n = 10, the BBA did not terminate within six hours of CPU time. On the other hand, the counterpart methods based on computational intelligence handled realistic RNDP instances effectively. The numerical results also showed that the search capability of the traditional edge-represented GA is limited compared with that of the HGA, which could handle
Fig. 23.7 Best solutions obtained from the LSGA and HGA for the RNDP instance with
n = 10
the RNDP more efficiently. This suggests that a proper hybridization may achieve a synergistic alliance of two or more computational-intelligence-based heuristics. Moreover, we showed that a problem-specific representation method such as the MRE plays an important role in improving the solution quality and CPU seconds of population-based heuristics.
References
1. Jan, R.-H., Hwuang, F.-J., Chen, S.-T.: Topological optimization of a communication network subject to a reliability constraint. IEEE Transactions on Reliability 42(1), 63–70 (1993)
2. Johnson, D.S., Lenstra, J.K., Kan, A.H.G.R.: The complexity of the network design problem. Networks 8, 279–285 (1978)
3. Aggarwal, K.K., Chopra, Y.C., Bajwa, J.S.: Topological layout of links for optimising the overall reliability in a computer communication system. Microelectronics and Reliability 22(3), 347–351 (1982)
4. De Boer, P.-T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Annals of Operations Research 134(1), 19–67 (2005)
5. Pierre, S., Hyppolite, M.-A., Bourjolly, J.-M., Dioume, O.: Topological design of computer communication networks using simulated annealing. Engineering Applications of Artificial Intelligence 8(1), 61–69 (1995)
6. Randall, M., McMahon, G., Sugden, S.: A simulated annealing approach to communication network design. Journal of Combinatorial Optimization 6(1), 55–65 (2002)
7. Jayaraman, V., Ross, A.: A simulated annealing methodology to distribution network design and management. European Journal of Operational Research 144(3), 629–645 (2003)
8. Glover, F., Lee, M., Ryan, J.: Least-cost network topology design for a new service: An application of a tabu search. Annals of Operations Research 33(5), 351–362 (1991)
9. Pedersen, M.B., Crainic, T.G., Madsen, O.B.G.: Models and tabu search metaheuristics for service network design with asset-balance requirements. Transportation Science (2008), doi:10.1287/trsc.1080.0234
10. Dengiz, B., Altiparmak, F., Smith, A.E.: Local search genetic algorithm for optimal design of reliable networks. IEEE Transactions on Evolutionary Computation 1(3), 179–188 (1997)
11. Gen, M., Kumar, A., Kim, J.R.: Recent network design techniques using evolutionary algorithms. Int. J. Production Economics 98(2), 251–261 (2005)
12. Poorzahedy, H., Rouhania, O.M.: Hybrid meta-heuristic algorithms for solving network design problem. European Journal of Operational Research 175(2), 707–721 (2006)
13. Gang, P., Iimura, I., Nakayama, S.: An evolutionary multiple heuristic with genetic local search for solving TSP. International Journal of Information Technology 14(2), 1–11 (2008)
14. Krasnogor, N., Smith, J.: A tutorial for competent memetic algorithms: model, taxonomy, and design issues. IEEE Transactions on Evolutionary Computation 9(5), 474–488 (2005)
15. Randall, M., Tonkes, E.: Solving network synthesis problems using ant colony optimisation. In: Monostori, L., Vancza, J., Ali, M. (eds.) IEA/AIE 2001. LNCS (LNAI), vol. 2070, pp. 1–10. Springer, Heidelberg (2001)
16. Dorigo, M., Caro, G.D., Gambardella, L.M.: Ant algorithms for discrete optimization. Artificial Life 5(2), 137–172 (1999)
17. Deb, K.: Multi-objective optimization using evolutionary algorithms. John Wiley and Sons, Chichester (2001)
18. Konak, A., Coit, D.W., Smith, A.E.: Multi-objective optimization using genetic algorithms: A tutorial. Reliability Engineering & System Safety 91(9), 992–1007 (2006)
19. Weichenberg, G.E., Chan, V.W.S., Medard, M.: High-reliability architectures for networks under stress. In: Proc. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies (2004)
20. Ball, M., Van Slyke, R.M.: Backtracking algorithms for network reliability analysis. Ann. Discrete Math. 1, 49–64 (1977)
21. Lomonosov, M.V., Polesskii, V.P.: Lower bound of network reliability. Problems of Information Transmission 8, 118–123 (1972)
22. Prim, R.C.: Shortest connection networks and some generalizations. Bell System Technical Journal 36, 1389–1401 (1957)
23. Lawler, E.L., Lenstra, J.K., Kan, A.H.G.R., Shmoys, D.B.: The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, Chichester (1985)
24. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor (1975)
25. Kumar, A., Pathak, R.M., Gupta, Y.P.: Genetic algorithm based reliability optimization for computer network expansion. IEEE Transactions on Reliability 44(1), 63–72 (1995)
26. Pierre, S., Legault, G.: A genetic algorithm for designing distributed computer network topologies. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 28(2), 249–258 (1998)
27. Chou, H., Premkumar, G., Chu, C.-H.: Genetic algorithms for communications network design: an empirical study of the factors that influence performance. IEEE Transactions on Evolutionary Computation 5(3), 236–249 (2001)
28. Marseguerra, M., Zio, E., Podofillini, L., Coit, D.W.: Optimal design of reliable network systems in presence of uncertainty. IEEE Transactions on Reliability 54(2), 243–253 (2005)
29. Cordon, O., Viana, I.F., Herrera, F., Moreno, L.: A new ACO model integrating evolutionary computation concepts: the best-worst ant system. In: Proc. Second International Workshop on Ant Algorithms, Brussels, Belgium, pp. 22–29 (2000)
30. Gong, D., Ruan, X.: A hybrid approach of GA and ACO for TSP. In: Proc. 5th World Congress on Intelligent Control and Automation, pp. 2068–2072 (2004)
31. Pilat, M.L., White, T.: Using genetic algorithms to optimize ACS-TSP. In: Proc. 3rd Int. Workshop on Ant Algorithms, Brussels, Belgium, pp. 282–287 (2002)
32. Tseng, L.-Y., Liang, S.-C.: A hybrid metaheuristic for the quadratic assignment problem. Computational Optimization and Applications 34(1), 85–113 (2006)
33. Acan, A.: GAACO: A GA + ACO Hybrid for Faster and Better Search Capability. In: Proc. 3rd Int. Workshop on Ant Algorithms, Brussels, Belgium, pp. 15–26 (2002)
34. Lee, Z.-J.: A hybrid algorithm applied to travelling salesman problem. In: Proc. IEEE International Conference on Networking, Sensing and Control, pp. 237–242 (2004)
35. Tseng, L.-Y., Chen, S.-C.: A hybrid metaheuristic for the resource-constrained project scheduling problem. European Journal of Operational Research 175(2), 707–721 (2006)
36. Lei, C.: A MCM interconnect test generation optimization scheme based on ant algorithm and genetic algorithm. In: Proc. 6th Intern. Conf. Electronic Packaging Technology, pp. 710–713 (2005)
37. Silva, C.A., Faria, J.M., Abrantes, P., Sousa, J.M.C., Surico, M., Naso, D.: Concrete delivery using a combination of GA and ACO. In: Proc. 44th IEEE Conf. Decision and Control and European Control Conference, pp. 7633–7638 (2005)
38. Li, M., Wang, H., Li, P.: Tasks mapping in multi-core based system: hybrid ACO&GA approach. In: Proc. 5th Intern. Conf. on ASIC, pp. 335–340 (2003)
39. Prüfer, H.: Neuer Beweis eines Satzes über Permutationen. Arch. Math. Phys. 27, 742–744 (1918)
40. Krishnamoorthy, M., Ernst, A.T., Sharaiha, Y.M.: Comparison of algorithms for the degree constrained minimum spanning tree (Tech. Rep.). CSIRO Mathematical and Information Sciences, Clayton, Australia
41. Abuali, F.N., Wainwright, R.L., Schoenefeld, D.A.: Determinant factorization: a new encoding scheme for spanning trees applied to the probabilistic minimum spanning tree problem. In: Proc. 6th Intern. Conf. Genetic Algorithms, pp. 470–477. Morgan Kaufmann, San Mateo (1995)
42. Rothlauf, F., Goldberg, D.E., Heinzl, A.: Network random keys: a tree network representation scheme for genetic and evolutionary algorithms (Tech. Rep. No. 8/2000). University of Bayreuth, Germany
43. Raidl, G.R., Julstrom, B.A.: A weighted coding in a genetic algorithm for the degree-constrained minimum spanning tree problem. In: Proc. 2000 ACM Symp. Applied Computing, pp. 440–445 (2000)
44. Cheriton, D., Tarjan, R.E.: Finding minimum spanning trees. SIAM J. Comput. 5(4), 724–742 (1976)
45. Lee, Y., Chiu, S.Y., Sanchez, J.: A branch and cut algorithm for the Steiner ring star problem. International Journal of Management Science 4, 21–34 (1998)
46. Song, Y., Wool, A., Yener, B.: Combinatorial design of multi-ring networks with combined routing and flow control. Computer Networks 41(2), 247–267 (2003)
47. Resendo, L.C., Pedro, J.M., Ribeiro, M.R.N., Pires, J.J.O.: ILP approaches to study interconnection strategies for multi-ring networks in the presence of traffic grooming
48. Won, J.-M., Karray, F.: A genetic algorithm with cycle representation and contraction digraph model for guideway network design of personal rapid transit. In: Proc. 2007 IEEE Cong. Evolutionary Computation, pp. 2405–2412 (2007)
49. Clarke, G., Wright, J.W.: Scheduling of vehicles from a central depot to a number of delivery points. Operations Research 12, 568–581 (1964)
50. Christofides, N.: Worst-case analysis of a new heuristic for the travelling salesman problem. Report No. 388, GSIA, Carnegie-Mellon University, Pittsburgh, PA
51. Croes, G.A.: A method for solving traveling salesman problems. Operations Research 6, 791–812 (1958)
52. Bock, F.: An algorithm for solving traveling-salesman and related network optimization problems. Unpublished manuscript associated with talk presented at the 14th ORSA National Meeting (1958)
53. Jung, S., Moon, B.-R.: Toward minimal restriction of genetic encoding and crossovers for the two-dimensional Euclidean TSP. IEEE Transactions on Evolutionary Computation 6(6), 557–565 (2002)
54. Lin, S., Kernighan, B.: An effective heuristic algorithm for the traveling salesman problem. Operations Research 21, 498–516 (1973)
55. Johnson, D.S., McGeoch, L.A.: The traveling salesman problem: a case study in local optimization. In: Aarts, E.H.L., Lenstra, J.K. (eds.) Local Search in Combinatorial Optimization, pp. 215–310. Wiley, New York (1997)
56. Sayoud, H., Takahashi, K., Vaillant, B.: Designing communication networks topologies using steady-state genetic algorithms. IEEE Communication Letters 5(3), 113–115 (2001)
57. Davis, L.: Applying adaptive algorithms to epistatic domains. In: Proc. Intern. Joint Conf. Artificial Intelligence, pp. 162–164 (1985)
58. Goldberg, D.E., Lingle, R.: Alleles, loci and the TSP. In: Proc. 1st Intern. Conf. Genetic Algorithm and Their Applications, pp. 154–159 (1985)
Chapter 24
Abstract. A dynamic synapse neural network (DSNN) for speech recognition system input filtering and the genetic algorithm (GA) used to optimize the DSNN parameters are presented. DSNNs are trained to respond to a target word (TW) said
by one female speaker or by 8 male and 8 female speakers. The response of the
single-speaker trained DSNNs to all 16 speakers is similar to the 16-speaker-trained
DSNN responses. TW training results in an ordering of the expected responses to
the 9 words of the non-TW set. The ordering determined by single-speaker training
matches the ordering determined by multi-speaker training; and in many instances,
the single-speaker trained DSNN output matches the multi-speaker trained DSNN
output. While searching the parameter space to best solve the isolated word recognition task, the GA implicitly searched the input space to find the input subset best
describing the separatrix between TWs and non-TWs. Computation is decreased by
concentrating optimization on this subset. The GA adapts as knowledge of this subset is learned. The GA begins as a random search, becoming a steady-state GA and then a simple elitist GA over the course of optimization.
one of a limited number of menu TWs was most likely said. This task is not trivial. Menus contain a variable number of items, and each TW is of variable length and placement within the recorded sound. The acoustic environment is unknown; e.g., the noise and reverberation characteristics are unknown. Each example word is a function, but the exact form of the function is altered by a large number of factors: for example, gender, age, accent (both regional and foreign-language induced), current state of health or physical exertion, volume of speech (whisper, normal speech, or yelling), familiarity with the recording device, and experience with the language and syntax of the VCI. Finally, the functions that represent each menu item are not of uniform distance from each other. Controlling for these factors still results in a variable input signal, since human speech is variable. Therefore, a statistical and nonlinear approach to IWR is required that generally begins with the analysis of many thousands of example inputs. To make this problem tractable, engineered VCIs are designed to solve the IWR task under specific operating restrictions. Our own auditory system, however, is capable of learning new items after very few presentations and is able to generalize that learning to new speakers, new acoustic settings, an altered vocalization, a novel accent, different ambient noise conditions, etc. Because traditionally engineered VCIs are not robust in the face of the general problem, whereas our own auditory system is, there is much interest in determining the mechanisms that allow for human hearing.
Human hearing occurs in two steps. The first step happens between the ear and the cerebral cortex. Sound passes through a non-linear spectral/temporal filter with many thousands of channels, expressing many different characteristics, with several clearly defined processing centers (nuclei), each of which has a complicated circuitry allowing for intranuclear channel mixing; also, each nucleus communicates with the others to effect internuclear processing [42]. The second step happens within the cerebral cortex, where auditory imaging studies are used to correlate neural activity in particular places with speech intelligibility [7, 10]. The cerebral cortex hierarchically processes speech. Higher processing centers feed back to lower processing centers to allow for context-sensitive sound filtering, both intracortically [57] and subcortically [16]. Unfortunately, the high degree of complexity, and an inability to observe much of it, render reverse engineering the auditory system difficult. Instead, a forward approach is taken: to model aspects of auditory neuron function, to use these models to support IWR tasks, and then to ascertain how well the model supports the task. In this chapter we present one such model and the optimization algorithms required for studying it. The application presented here is a bio-mimetic neural engineering model of sub-cortical auditory filter behavior.
A genetic algorithm (GA) has served to greatly speed up model development and testing. The GA is an optimization tool that is intuitive to apply, is model-independent, and is well known within the application area of natural computation [3]. The GA requires a list of the system's free parameters and their ranges, and an equation that scores how well a particular individual (a complete set of valid parameter values) solves for a stated objective. The GA is a recursive algorithm that generates sequential sets of individuals (populations) that are increasingly better at solving for the objective. It begins by scoring a population using the objective function and mapping the scores
to fitness values. Then, based on each individual's relative fitness, it is cloned, killed, or mated to reproduce more individuals, generating a new population and allowing the loop to be re-run. Each new loop is a generation. In our case, the objective function measures how well a neural model may be used to classify a set of words. Whether an individual is cloned or killed depends only on whether it is one of the best or one of the worst of the current generation at solving the IWR task. However, at the heart of all GAs is a fitness-weighted random process: reproduction. During reproduction the bit representations of corresponding parameter values from different individuals have a high probability of being split and recombined (crossover) to form new individuals. After new individuals are formed there is a small probability that any single bit will be flipped (mutated). This loop is illustrated in Figure 24.1.
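A minimal, generic version of this loop (not the authors' exact operator set: the biased word selection and the adaptive z, r schedule of Fig. 24.1 are omitted, and the operators are passed in as assumed callables) might look like:

```python
import random

def evolve(score, new_individual, crossover, mutate, K=40, generations=50,
           rng=random):
    """Minimal sketch of the generational loop described above: score the
    population, clone the best, kill the worst, and refill the population
    via fitness-weighted reproduction (crossover then mutation)."""
    pop = [new_individual(rng) for _ in range(K)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        elite, parents = ranked[:2], ranked[:-2]   # clone best two, kill worst two
        weights = [score(p) + 1e-9 for p in parents]
        children = []
        while len(children) < K - len(elite):
            a, b = rng.choices(parents, weights=weights, k=2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return max(pop, key=score)
```

On a toy onemax problem (maximize the number of ones in a bit vector) this loop reliably drives the population toward the all-ones individual.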
[Fig. 24.1 flowchart: a micro-word environment is built by biased selection from a percentage of the entire word corpus (a function of best score and generation); the objective function scores each individual, and an individual whose score beats the best score is retested on the entire corpus and its score reset; the fitness function computes f = Score − ⟨Score⟩; the selection operator reduces the current generation of (K + r) individuals to (K − z + r) by discarding z individuals; reproduction (crossover and mutation, with mutation a function of best score and generation) restores K individuals.]
Fig. 24.1 The genetic algorithm used in this work is pictured above. At all times, K = 40; initially, z = 40 and r = 32, but as evolution progresses the values of z and r decrease, falling to 1. The grey boxes are controlled by the GA. The functionality of these boxes is explained in many places, e.g. [17, 33]. The functionality of the white boxes is explained in the text. Objective function: Sect. 24.4.2, biased selection of words: Sect. 24.4.1, and mutation and z: Sect. 24.4.4
in the model (Table 24.1). The computational overhead of the IWR problem during optimization lies almost exclusively in filtering many different sound examples many thousands of times. The first generation of these models was optimized by an intuitive hand-tweaking approach taking many months to finally converge on a solution. They separated only four command words, and they incorporated simplistic approximations of biological dynamics that could be expressed in terms of linear functions and threshold devices. They also used a circuit construction algorithm that made optimization for new menu items a super-exponential task in terms of model complexity and number of input words. Nevertheless, these IWR systems were more noise-robust than state-of-the-art speech recognition systems [25, 26, 27]. By redefining the IWR task as a set of binary operations (TW vs. all others), further simplifying the incorporated dynamics, and using a GA with the objective of having greater integrated TW output on a particular output lead vs. the other, the time for parameter optimization was reduced to approximately two weeks, allowing larger IWR systems to be designed [15, 35]. However, the simple dynamics used were not useful for elucidating principles of human hearing, and although they could be programmed for technological applications [13], they offered no insight toward neural engineering (culturing) of biological hearing systems. Such insight requires that the designed systems be constructed with models that mimic neural parts as opposed to abstractions of neural functionality. The application reported here moves toward addressing these concerns, while also decreasing optimization time to approximately two days per model.
Biological synapses consist of complicated dynamic processes that constrain and support complex non-linear functional transformations that generally have no analytic description. The synapses used here incorporate models of both N-methyl-D-aspartate (NMDA) and α-amino-3-hydroxy-5-methyl-4-isoxazole propionate (AMPA) protein function [49]. This results in synapses resembling those found in the central nervous system. To optimize network performance, we developed an objective function that allows the GA to simultaneously conduct an implicit search of the input space and an explicit search of the model parameter space to:
1. find the smallest subset of input words that represents the separatrix between TWs and non-TWs, and
2. find the model parameter values that best solve the IWR task on the entire word corpus, while
3. calling the objective function as few times as possible.
Simultaneous search is necessary because these two goals are inter-dependent. Each choice of input subset may imply different best parameter values, and vice versa: each choice of parameter values best classifies a different input subset. Three measures of parameter fitness were calculated with the goal of applying evolutionary force in the direction of increasing information capacity, increasing accuracy, decreasing non-target-word decision cost, and increasing network stability. The measurement weightings were determined by programmer experience. Fitness scores fell along a gradient of less to more useful network behaviors, with the lowest scores applied to no-machines, and the highest scores associated with networks that
both classify the training set and also filter out a portion of the non-TWs in the training set. The mutation rate and input subset were independently adjusted throughout training in order to control the degree to which optimization was directed and to focus optimization near the target/non-target word separatrix. Overtraining was implicitly prevented by the use of the output PDFs across all networks, and by our preference for training with historically difficult-to-classify input (a function of the Biased Selector).
Non-TWs could be discerned in two ways: either by correctly classifying them as non-TW or by filtering out (not responding to) the non-TW input sound. Training toward a filtering operation was accomplished by scaling the non-target-class decisions used to fill the confusion matrix by a cost, or discernment, factor. Because of the preponderance of non-target-class input, the calculated information capacity continued to increase after 100% classification was reached, while optimization continued toward filtering out non-target-class input, because the ratio of TWs to non-TWs classified increased.
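As a toy illustration of the cost-scaled tally (the class labels, the `None` convention for a filtered-out input, and the value of the discernment factor are all hypothetical, not the chapter's):

```python
def cost_scaled_tally(decisions, cost=0.5):
    """Illustrative tally: each decision is (truth, prediction), where
    prediction is 'TW', 'non-TW', or None when the network filtered out
    (did not respond to) the input.  Non-target-class decisions are
    scaled by the discernment factor `cost` before filling the matrix."""
    t = {"tp": 0.0, "fn": 0.0, "fp": 0.0, "tn": 0.0, "filtered": 0.0}
    for truth, pred in decisions:
        if truth == "TW":
            t["tp" if pred == "TW" else "fn"] += 1.0
        elif pred is None:
            t["filtered"] += cost            # non-TW input filtered out
        elif pred == "non-TW":
            t["tn"] += cost                  # correctly classified non-TW
        else:
            t["fp"] += cost                  # non-TW mistaken for the TW
    return t
```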
IWR analysis usually begins with the assumption that speech intelligibility depends upon recognizing particular patterns in the sound; analysis of the isolated speech is in terms of one or more of these classifiable sequences (usually spectral patterns or phonemes). We did not use these patterns for determining input encoding, network processing, or output analysis. We lumped nine of the ten input classes into a single Negative Id class. Furthermore, we used integrated output pulse trains (we did not account for output dynamics) to score networks during training. Nevertheless, training resulted in a set of networks with discernible temporal (dynamic) output pattern clusters useful for input classification. These patterns are presented.
We show temporal output responses to filtered and pulse-encoded isolated speech of DSNNs constrained by equations approximating the functionality of NMDA and AMPA proteins and by the architecture of Figure 24.2. Emphasis is on dynamic synapse neural network (DSNN) formalism, optimization, and response visualization; i.e., the acoustic temporal processing task is converted into a spatial pattern recognition task. DSNNs have synapses consisting of a dynamic presynapse, generating an amplitude-modulated pulse train for input to a single dynamic postsynapse. Each postsynapse is a variable resistor with instantaneous current proportional to a potential affected by multiple postsynapses, which are passively connected to each other and to a single cell body via a dendritic branch. The NMDA and AMPA equations are state-transition models affecting the instantaneous postsynaptic membrane conductance. We show these networks have three cluster types of trained I/O: first, multiple (distinct temporal, spectral, and phoneme description) input speech classes that elicit a similar average temporal output (n → 1); second, input speech classes that elicit a distinct average temporal output response (1 → 1); and, most often, multiple input speech classes that elicit a gradient of average temporal output responses ([a . . . n] → [A . . . N]).
ANNs are designed to provide efficient computation for well-defined tasks in regression analysis, classification, and data processing. Biological neural networks, by which ANNs are inspired, are living tissue that has evolved physiological I/O
Fig. 24.2 DSNN architecture. Isolated speech samples were decomposed into seven continuous waveforms that were passed through an adaptive pulse encoder (oval shaded AP). There are two output neurons, one with synapses trained to be responsive to a specific TW (Positive ID), the other with synapses trained to be responsive to all other input (Negative ID). Further detail is discussed throughout the chapter; signal flow through the Branch is illustrated in figure 24.4
descriptions that are interpretable as computation. The defined nature of the tasks for which ANNs are designed has allowed for the development of analytical tools for interpreting the computational ability of ANNs; there is as yet no such set of rules for constructing, developing, or interpreting the evolution of biological neural networks. The application herein is an open problem in both ANN and natural computation research: generalization from single-speaker training data to multiple-speaker performance. We compare the averaged temporal output of single-speaker-trained networks against the output of networks trained using sixteen speakers. The resulting output response clusters, and in many cases the actual average temporal output, were similar for both sets of trained networks. Speaker-specific training resulted in speaker-independent functionality. Our networks have the minimum architecture required to embody the speech recognition task and so are a suitable test of the computational ability supported by the synaptic dynamics, relatively independent of other factors. We thus make progress toward determining rules for understanding how biological synapses may solve waveform classification problems. Our network is contrasted with the dorsal cochlear nucleus (DCN), which provides complex signal-envelope processing necessary for human speech perception.
Corpus. Our aim is to generate pulse-trains typical of those seen on auditory nerve
afferent fibers.
The inner ear is a large-scale parallel distributed processor with approximately 35,000 output channels [34]. The operation of the inner ear is to pulse-encode the presence of features in the filtered sound, with the purpose of supporting subsequent segregation of relevant from irrelevant sound sources [5, 14]. One successful body of work, Uysal et al. [48], begins with an attempt to model the inner ear output. Leaky integrate-and-fire neurons (LIFs), each preceded by a synapse (HC, [46]), are used to classify vowels. Their 7850 synapses are arranged in parallel and do not interact. Output is the order of neuron firing; the LIFs, however, receive summated input from multiple HCs. The HC output represents probable firing of auditory nerve fibers. The result is that appropriately thresholded LIF firing is more likely when a quorum of HC outputs is high, allowing them to act as synchrony detectors. Because each LIF is associated with a characteristic HC center frequency, the likely mix of formant frequencies emerges, allowing for vowel classification.
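The synchrony-detection idea behind this scheme can be sketched as a summating LIF: the neuron fires only when several channels pulse together. The neuron model, window, and all parameter values below are illustrative stand-ins, not the published model of [48].

```python
import numpy as np

def lif_synchrony_detector(hc_pulses, threshold=4.0, tau=5.0, dt=0.1):
    """Leaky integrate-and-fire neuron summating hair-cell (HC) channel
    input; it fires only when a quorum of channels is active together.

    hc_pulses: array of shape (n_channels, n_steps) holding 0/1 pulses.
    All parameter values here are illustrative, not from the chapter.
    """
    v = 0.0
    spikes = []
    summed = hc_pulses.sum(axis=0)       # summated input from multiple HCs
    for t, i_in in enumerate(summed):
        v += dt * (-v / tau) + i_in      # leak plus instantaneous input
        if v >= threshold:               # quorum of HCs pushes v past threshold
            spikes.append(t)
            v = 0.0                      # reset after firing
    return spikes

# A coincident burst across 5 channels fires the LIF; a lone pulse does not.
n_steps = 100
pulses = np.zeros((5, n_steps))
pulses[:, 50] = 1.0                      # synchronous event on all channels
pulses[0, 10] = 1.0                      # isolated event on one channel
print(lif_synchrony_detector(pulses))
```

With these toy numbers the lone event at step 10 decays away below threshold, while the synchronous event at step 50 fires the neuron.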
However, the hearing ability of cochlear implant patients [30], and of normal-hearing subjects listening to spectrally limited speech [11] or to speech-envelope modulated noise [12, 41], is proof that the input filter bank need not duplicate the number (or form) of the filters of a normal ear in order to support word recognition (nor, therefore, to research the neural complexity required for word recognition). Auditory fibers respond to sound with a wide variety of characteristics that have been well described using models derived from statistical systems identification [40] and experimentally validated (e.g., prediction of tuning curve amplitude dependence, phase locking, and onset/offset response to pulsed tones [9, 53]). This analysis has led to a high-level biophysical model of the acoustic transform that is descriptive through the entire spectral sensitivity of the inner ear [23, 53], and that has been validated in several vertebrate acoustic preparations [24, 38, 54]. This model describes inner ear acoustic waveform transduction as a filter followed by a pulse generator.
For our filter bank, we opted to use a Daubechies-4 level-6 wavelet decomposition, which results in seven channels of spectrally filtered continuous waveforms for input to the pulse encoder [37]. We mimicked two response characteristics of the pulse encoder: onset responses, and phase locking to the filtered sound. The input pulse encoding is a running threshold,

$$\theta(t) = \begin{cases} k\,\theta & \text{if } s > \theta \text{ (an input pulse is also generated)} \\ c\,\theta & \text{otherwise} \end{cases} \qquad (24.1)$$

where $s(t)$ is one component of the wavelet-filtered input, $k$ is a constant $> 1$, and $c$ is the exponential decay per time step that satisfies

$$k\, y\, \exp(-T_d/\tau) = y \qquad (24.2)$$

where $y$ is the value of $\theta(t)$ when a pulse is generated, $T_d$ is the desired average inter-pulse interval duration, and $\tau$ must be calculated for a given value of $k$. $T_d = 10$ msec, resulting in an average pulse rate in response to noise of 100 per second.
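A minimal sketch of the running-threshold encoder of equations 24.1 and 24.2 might look like the following; the parameter values and the choice of initial threshold are illustrative assumptions, not the chapter's calibrated values.

```python
import numpy as np

def pulse_encode(s, k=1.5, tau=20.0, dt=0.1, theta0=None):
    """Running-threshold pulse encoder sketch (equations 24.1-24.2).
    A pulse is emitted whenever the channel signal s exceeds the
    threshold theta; theta is then raised by the factor k (> 1) and
    otherwise decays by c = exp(-dt/tau) per time step. Parameter
    values and the initial threshold are illustrative assumptions.
    """
    c = np.exp(-dt / tau)
    theta = theta0 if theta0 is not None else float(np.abs(s).mean())
    pulses = np.zeros_like(s)
    for i, x in enumerate(s):
        if x > theta:
            pulses[i] = 1.0   # emit a pulse and raise the threshold
            theta *= k
        else:
            theta *= c        # exponential decay toward re-arming
    return pulses

# Pulses phase-lock to the positive excursions of a noisy sinusoid.
t = np.arange(0.0, 100.0, 0.1)          # 100 msec at 0.1 msec steps
rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 0.025 * t) + 0.2 * rng.standard_normal(t.size)
p = pulse_encode(s)
```

Because the threshold starts positive and is only ever scaled by positive factors, pulses can occur only on positive excursions of the signal, which is what produces the phase locking shown in Figure 24.3.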
Fig. 24.3 Phase locking. The graphs on the left are peri-stimulus time histograms of the pulse encoder response to a sinusoidal input with added noise over 256 cycles. The sinusoid is 24.41 Hz. Noise is pseudo-Gaussian to approximately 2 kHz. The SNR of the signals, 10 log10(Es/En), is 4 (black), 1 (light grey), -2 (medium grey), and -4 (dark grey). Three cycles at SNR equal to -4, 1, and 4 are shown in the graphs to the right. Average pulse rate per cycle is approximately 4 for all SNR levels, corresponding to a pulse rate of approximately 100 Hz
event sequences [36]. Insofar as we can assume a good neural information processor is one that transforms classes of input event sequences into classes of output event sequences, where each output class is a set of similar sequences distinguishable from any given sequence of a different output class, the optimal sequence-passing feature of a single synapse renders it an unlikely candidate for information processing on two counts. First, there is no guarantee that each input class is a set of event sequences with low enough variability that each sequence is approximately equal to any other in the class. The nonlinear nature of a synapse therefore implies there is no guarantee that each input sample maps to an output sequence near enough to the average of a recognized output class to declare passage of a meaningful message. Second, the output of a synapse is interpreted by a neuron as it generates action potentials, and the action potential sequence presumably carries the passed information. However, this sequence has no negative bit: the output sequence consists only of yes events and the absence of yes events (periods of silence). Isolated synapses are good filters, but a network is mandatory to study the role of synaptic dynamics in information processing. The network described in this section is a relatively stiff system of ordinary differential equations. However, we were able to estimate the dynamics of most of this system explicitly using a forward Euler approach (and a small time step, approximately 1/10 msec). The postsynapse, however, required implicit solution using Heun's method [21, chapter 14].
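The two schemes named above can be contrasted on a toy problem. The test equation, step size, and horizon below are our choices for illustration, not the chapter's system.

```python
import math

def heun_step(f, y, t, dt):
    """One step of Heun's method: a forward-Euler predictor followed by a
    trapezoidal corrector averaging the two slope estimates. A generic
    sketch of the scheme named in the text, not the chapter's code."""
    k1 = f(t, y)
    y_pred = y + dt * k1                 # predictor (forward Euler)
    k2 = f(t + dt, y_pred)
    return y + 0.5 * dt * (k1 + k2)      # corrector (second order)

# Compare against plain forward Euler on the linear test problem y' = -5 y.
f = lambda t, y: -5.0 * y
dt, y_euler, y_heun = 0.1, 1.0, 1.0
for i in range(10):                      # integrate to t = 1
    t = i * dt
    y_euler += dt * f(t, y_euler)        # forward Euler update
    y_heun = heun_step(f, y_heun, t, dt)
exact = math.exp(-5.0)
print(y_euler, y_heun, exact)
```

On this moderately fast decay the second-order corrector tracks the exact solution more closely than forward Euler at the same step size, which is the motivation for using it on the stiffer postsynaptic equations.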
Synapses are the elementary structural and functional unit for constructing neural circuits [43]. As such, it is the definition of synaptic operation that ultimately limits both the physical structure and the information processing ability of a given neural circuit. A reasonably general mathematical description of the synaptic function is

$$\begin{aligned} u_j(t) &= f(u_j, \varepsilon_{<t;j}, v_{int}, v_d) \\ v_{int}(t) &= g(\varepsilon_{<t;int}) \\ v_d(t) &= h(\{u_j\}, v_d) \end{aligned} \qquad (24.3)$$

In equation 24.3, $u_j$ is the weight of the $j$th synapse associated with a small patch of dendritic branch. Weight is the real-number value describing the strength, or ability to transmit a signal, across the synapse. $\varepsilon_{<t;j}$ and $\varepsilon_{<t;int}$ are the event sequences (the mathematical equivalent of action potentials in neural tissue) that represent the input to the synapse; the subscripted times index each event in the sequences.¹ $v_d$ is the dendritic branch voltage. The function $f$ describes the $j$th synapse transform, the function $g$ describes the effect of an interneuron, and the function $h$ describes the function of the dendritic branch. Referring to figure 24.2, synapses (see
¹ The notation $\varepsilon_{<t}$ (read as "the sequence of events at times less than $t$") is equivalent to the notation $\varepsilon_\tau$ with $\tau = t - T(\varepsilon)$, where $T(\varepsilon)$ is the list of event times. Networks not containing dendritic branch function are notated by replacing $v_d$ with $v_{soma}$, the somatic potential. Networks with no interneurons have $v_{int} = 0$, while networks with arrays of interneuronal interference replace $v_{int}$ with a vector $\vec v_{int}$.
figure inset) in our DSNNs are lumped into three groups of seven, $j \in \{1, \ldots, 7\}$, with the synaptic group connecting the input to the interneuron having $v_{int}(t) = 0$, i.e., having no interneuronal connectivity onto itself.
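The coupled structure of equation 24.3 can be sketched as a generic update loop. The concrete `f`, `g`, and `h` below are placeholder toys standing in for equations 24.5 through 24.10; only the calling structure reflects the text.

```python
def network_step(u, v_d, t, events_j, events_int, f, g, h, dt):
    """One integration step of the generic synapse system of equation 24.3.
    u: list of synaptic weights u_j; v_d: dendritic branch voltage;
    events_j[j] / events_int: event times (the epsilon sequences).
    f, g, h are the synapse, interneuron, and branch functions."""
    v_int = g(events_int, t)                          # interneuron effect
    u_new = [uj + dt * f(uj, events_j[j], v_int, v_d, t)
             for j, uj in enumerate(u)]               # per-synapse update
    v_d_new = v_d + dt * h(u_new, v_d)                # dendritic branch
    return u_new, v_d_new

# Toy instantiation: each recent event nudges its synapse, the branch leaks.
f = lambda uj, ev, v_int, v_d, t: (
    sum(1.0 for s in ev if t - 1.0 < s <= t) - 0.1 * uj - v_int)
g = lambda ev, t: 0.0                                 # no interneuron: v_int = 0
h = lambda u, v_d: sum(u) - 0.5 * v_d                 # leaky summation
u, v_d = [0.0] * 7, 0.0
for k in range(100):
    u, v_d = network_step(u, v_d, k * 0.1, [[2.0]] * 7, [], f, g, h, 0.1)
print(round(v_d, 3))
```

Setting `g` to return zero reproduces the no-interneuron case described above for the input-to-interneuron synaptic group.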
network which supports an excitation/inhibition algorithm, Difference(t), for the purpose of speech recognition.

$$\begin{aligned} u_j(t) &= f(u_j, \varepsilon_{<t;j}, v_{int}, v_d) \\ v_{int}(t) &= g(\varepsilon_{<t;int}) \\ v_d(t) &= h(\{u_j\}, v_d) \end{aligned} \qquad (24.4)$$
The presynaptic terminal stores a limited supply of transmitter ready for release; after release, it takes time to replenish this store via uptake, transport, and metabolic mechanisms. Time is also required for intracellular buffers and membrane pump mechanisms to restore [Ca2+] to the resting value after action-potential-induced influx. During that time the residual calcium adds to any new calcium entering the terminal, resulting in the new incoming action potential releasing more neurotransmitter than the previous one did. This effect is mediated by a reaction-diffusion equation that can saturate. This process is called facilitation, and we describe a fast ($f_{fast}$) and a slow ($f_{slow}$) component of facilitation in our presynapses.²
Table 24.1 Pre- and post-synapse parameter values or ranges. $K_x$ are unitless; $\tau$ values are in msec. The NMDA gate has a rise time of 3.5 msec and a decay time of 8.27 msec.

Fast Facilitation; Slow Facilitation; Interneuron Inhibition: $\tau_m = 1.372$; $E_{reversal} = -140$
AMPA: $\tau_{o \to c} = 0.56$; $\tau_{c \to *} = 15.76$
NMDA: $\tau_1 = 35.0$; $\tau_2 = 5.0$; $\tau_5 = 1.0$; $\tau_6 \in (0.5, 50)$; $\tau_7 = 1.0$; $\tau_8 = 3.75$; $\tau_9 = 1.5$; $\tau_{10} \in (1, 15)$; $\tau_{11} = 250.0$
Synapses have processes that can reduce the width and height of incoming action potentials. These processes reduce the amount of calcium entering the presynapse during the action potential time course, which has the subsequent effect of reducing the amount of neurotransmitter released in response to the action potential, as well as reducing the amount of residual calcium remaining at the time of the next action potential. We do not model these modulations, as they are relatively insignificant.³ The primary process responsible for adjusting the amount of calcium entry into the presynapse (up to 85% of the total possible modulation in guinea pig and rat hippocampal CA3-CA1 synapses) is a GABA-ergic type-B synapse [47]. We describe this using a negative-going alpha function initiated at the time of an interneuron action potential ($v_{int}$). The constraints discussed here are applied to equation 24.4, resulting in the more descriptive equations 24.5 through 24.10.
$$f_{fast}(t) = \varepsilon_{<t;j}(t) * K_{f,fast}\, \exp(-t/\tau_{f,fast}) \qquad (24.5)$$

$$f_{slow}(t) = \varepsilon_{<t;j}(t) * K_{f,slow}\, \exp(-t/\tau_{f,slow}) \qquad (24.6)$$

$$v_{int}(t) = \varepsilon_{<t;int}(t) * K_{int}\, \exp(-t/\tau_{int,slow}) - \varepsilon_{<t;int}(t) * K_{int}\, \exp(-t/\tau_{int,fast}) \qquad (24.7)$$

$$A_r(t) = 1 + f_{fast} + f_{slow} - v_{int} \qquad (24.8)$$

$$u_j(t) = \begin{cases} 0 & \text{if } (A_r < \theta_R) \\ A_r & \text{if } (A_r \geq \theta_R) \text{ AND } (A_r < N_{store}) \\ 0 & \text{if } (A_r \geq \theta_R) \text{ AND } (N_{store} < u_j) \end{cases} \qquad (24.9)$$

$$\dot N_{store}(t) = K_N - C_N\, N_{store}(t) \qquad (24.10)$$

³ These contributions can be modeled by setting the value of $K_r$ in equation 24.8 to a function modeling the action-potential shape modulation processes and attenuating each input impulse accordingly.
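The facilitation-and-thresholded-release rule can be sketched numerically as follows. Interneuron inhibition ($v_{int}$) and store depletion are omitted, and every constant is an illustrative assumption rather than a value from Table 24.1.

```python
import numpy as np

def presynapse_release(event_times, t_grid, K_fast=0.15, tau_fast=20.0,
                       K_slow=0.04, tau_slow=200.0, theta_R=1.2):
    """Sketch of equations 24.5-24.9 with v_int = 0. Each input event
    contributes decaying fast and slow facilitation (the convolution of
    the event train with K*exp(-t/tau)); release u occurs only where the
    release variable A_r exceeds the threshold theta_R. All parameter
    values are illustrative assumptions."""
    f_fast = np.zeros_like(t_grid)
    f_slow = np.zeros_like(t_grid)
    for s in event_times:                 # sum of impulse responses = convolution
        d = t_grid - s
        m = d >= 0
        f_fast[m] += K_fast * np.exp(-d[m] / tau_fast)
        f_slow[m] += K_slow * np.exp(-d[m] / tau_slow)
    A_r = 1.0 + f_fast + f_slow           # equation 24.8 with v_int = 0
    u = np.where(A_r >= theta_R, A_r, 0.0)  # thresholded release (eq. 24.9)
    return u

t = np.arange(0.0, 100.0, 1.0)            # msec grid
u_single = presynapse_release([10.0], t)
u_train = presynapse_release([10.0, 12.0, 14.0, 16.0], t)
```

With these constants a lone event stays just below the release threshold, while a rapid train facilitates past it: residual facilitation from each event adds to the next, which is the behavior the prose attributes to residual calcium.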
Fig. 24.4 Diagram of a single synapse embedded in the DSNN of Figure 24.2. To the upper left is one of seven input sequences ($\varepsilon_{<t;j}$ of equation 24.4). The remaining six input sequences have identical synapses onto the dendritic branch; only the postsynaptic membrane of these connections is drawn (bottom). To the upper right is the interneuronal input ($v_{int}(t)$ of equation 24.4). The circled Glutamate is $u_j(t)$ in equation 24.4. The Dendritic Branch equation is $h(\{u_j\}, v_d)$ in equation 24.4. The $k_x$ correspond to the $\tau_x$ of Table 24.1
Variables in equations 24.5 to 24.10 are defined consistently with the presynaptic biology presented above. The convolution operation is indicated by an asterisk. $\theta_R$ is an imposed threshold for synaptic transmitter release. Replenishment of the neurotransmitter store is a linear function ($+K_N$); however, the store has an exponential cycling decay ($-C_N N_{store}$), i.e., vesicles that are ready for release but do not get released may become absorbed back into the intracellular matrix.
Analytic study of isolated synapses [4] and synaptic networks [55] reveals them to be highly complex processes. However, for simplicity of presentation, the complexity of the postsynaptic model is described herein by figure 24.4, where several interacting processes are drawn: AMPA, NMDA, and Dendritic Branch. All three processes are simplified functional descriptions of their biological counterparts [51, 52], rather than accurate kinetic representations. Both AMPA and NMDA are described by state transition models; the amount of material represented is conserved. At equilibrium all AMPA is in the AMPA*, or activated, state. An impulse of glutamate results in a percentage of AMPA* making a transition to AMPA_OPEN, changing the variable resistance of the postsynaptic membrane. The fraction of AMPA_OPEN quickly falls as it is converted to AMPA_Closed, which is then, subject to a slower time constant, converted once again to AMPA*.
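The conserved three-state AMPA cycle just described can be sketched with an Euler integration. The kinetic form and the rate values below are our reading of the cycle (loosely echoing Table 24.1), not the chapter's exact model.

```python
def ampa_step(star, open_, closed, glu, dt, k_oc=0.56, k_cstar=1 / 15.76):
    """One Euler step of a conserved three-state AMPA sketch:
    AMPA* --glutamate--> AMPA_OPEN --k_oc--> AMPA_Closed --k_c*--> AMPA*.
    Rates and the mass-action form are illustrative assumptions."""
    to_open = glu * star        # glutamate opens a fraction of AMPA*
    to_closed = k_oc * open_    # the open state decays quickly
    to_star = k_cstar * closed  # the closed state recovers slowly
    star += dt * (to_star - to_open)
    open_ += dt * (to_open - to_closed)
    closed += dt * (to_closed - to_star)
    return star, open_, closed

star, open_, closed = 1.0, 0.0, 0.0     # equilibrium: all in AMPA*
for i in range(2000):
    glu = 5.0 if i == 0 else 0.0        # impulse of glutamate at t = 0
    star, open_, closed = ampa_step(star, open_, closed, glu, 0.01)
print(round(star + open_ + closed, 6))  # prints 1.0: total is conserved
```

The three per-step fluxes cancel pairwise, so the total amount of channel is conserved exactly, which is the defining property of the state transition models in the text.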
average integrated output and were misclassified (non-TW input producing low-energy output was filtered out of consideration), and on TW samples that produced low output energy (as these would lower the threshold of discernment, resulting in some non-TW misclassifications being counted that would otherwise have been filtered out of consideration). These points are discussed below.
Choosing an example reduces to generating a number from the uniform distribution on $[1, \sum_{j=1}^{N} n_j]$ and matching that number to the appropriate interval. The percentage of TW and non-TW examples chosen for input to a given DSNN increases throughout training as a function of the highest fitness score attained. If two generations passed without improvement, the number of samples increased by one seventh of the difference between the current number and the total number of example words of that class in the corpus. The purpose is to concentrate optimization at the separatrix of TW and non-TW class samples, rather than to optimize parameters against a small set of examples that is assumed a priori to best describe the input classes. If twenty-one generations passed without improvement, the entire optimization was restarted and the number of samples chosen to fill the training set for both TW and non-TW word classes was reduced to 3.
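The selection and subset-growth rules above can be sketched directly. The integer arithmetic and the minimum-growth guard in `grow_subset` are our assumptions about details the prose leaves open.

```python
import random

def pick_example(counts):
    """Pick an example index with probability proportional to its count
    n_j, by drawing a uniform integer on [1, sum(n_j)] and matching it
    to the appropriate cumulative interval (the selection step above)."""
    r = random.randint(1, sum(counts))
    cum = 0
    for j, n in enumerate(counts):
        cum += n
        if r <= cum:
            return j
    raise AssertionError("unreachable")

def grow_subset(current, total):
    """After two stagnant generations, grow the training subset by one
    seventh of the gap between its current size and the corpus size.
    Integer division and the guard of at least 1 are assumptions."""
    return current + max(1, (total - current) // 7)

random.seed(1)
counts = [5, 1, 94]                      # class 2 dominates the draw
draws = [pick_example(counts) for _ in range(1000)]
frac = draws.count(2) / 1000             # close to 0.94 by construction
```

Starting from the restart size of 3 with a 260-word class, the subset grows 3 → 39 → 70 → ..., approaching the full corpus geometrically.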
to yield the corresponding pairs, $E_i^+$, $E_i^-$. Finally, the classification result for each $w_i$ is $\mathrm{sign}(E_i^+ - E_i^-)$, so that $\{w_i\}$ corresponds to an ordered list of length $I$.

$$\text{Training Subset} = \{\langle w_i,\, \varepsilon^+,\, \varepsilon^-,\, E^+,\, E^- \rangle\}_i \qquad (24.11)$$

We generate four histograms from the E-sequence data: $H_+$ and $H_-$ over the TW samples, and $\bar H_+$ and $\bar H_-$ over the non-TW samples, with mean values $m_+$, $m_-$, $\bar m_+$, and $\bar m_-$ respectively. It is useful for the immediate discussion to define $\sigma_r$ for each histogram as, for example, $\sigma_r = \mathrm{rms}(E_{-,i} - m_-)$. We also used the sequence in 24.11 to generate the Confusion Matrix [22], and used standard operations on the Confusion Matrix for determining how well a particular DSNN was able to classify $\{w_i\}$. However, during training the contribution of the non-TWs to the matrix was normalized.⁴
$$\text{Normalized (Cost Weighted) Confusion Matrix, } P = \begin{pmatrix} P_{(+,+)} & P_{(+,-)} \\ P_{(-,+)} & P_{(-,-)} \end{pmatrix}, \text{ such that,}$$

for each TW $w_i$,

$$P_{(+,\,\mathrm{sign}(E_i^+ - E_i^-))} \mathrel{+}= 1, \qquad (24.12)$$

and for each non-TW $w_i$, with $\theta^+ = \min[E_i^+ \,|\, \text{TW } w_i]$,

$$P_{(-,\,\mathrm{sign}(E_i^+ - E_i^-))} \mathrel{+}= \begin{cases} E_i^+/\theta^+ & \text{if } E_i^+ < \theta^+ \\ 1 & \text{otherwise.} \end{cases} \qquad (24.13)$$
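One reading of the normalization in equations 24.12 and 24.13 can be made concrete as follows. The row/column convention (rows: true class, columns: decision sign) and the definition of $\theta^+$ as the minimum $E^+$ over true TW samples are our interpretive assumptions.

```python
def cost_weighted_confusion(samples):
    """Fill a 2x2 cost-weighted confusion matrix: rows are true class
    (TW, non-TW), columns are the decision sign(E+ - E-). Non-TW
    contributions are scaled by E+/theta+ whenever their positive-output
    energy falls below theta+ (the minimum E+ over true TW samples), so
    nearly-filtered non-TWs count less toward the tabulation.

    samples: list of (is_tw, E_plus, E_minus) tuples. The conventions
    here are one reading of equations 24.12-24.13, not a verbatim port.
    """
    P = [[0.0, 0.0], [0.0, 0.0]]
    theta_plus = min(ep for is_tw, ep, em in samples if is_tw)
    for is_tw, ep, em in samples:
        col = 0 if ep - em > 0 else 1     # 0: decided TW, 1: decided non-TW
        if is_tw:
            P[0][col] += 1.0              # TW decisions count fully
        else:
            w = ep / theta_plus if ep < theta_plus else 1.0
            P[1][col] += w                # discernment-scaled non-TW decision
    return P

samples = [(True, 10.0, 2.0), (True, 8.0, 3.0),
           (False, 2.0, 5.0), (False, 9.0, 1.0)]
P = cost_weighted_confusion(samples)
print(P)    # [[2.0, 0.0], [1.0, 0.25]]
```

In the toy corpus the low-energy non-TW contributes only 0.25 of a decision, illustrating how filtered-out non-TWs are discounted during training.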
Perfect filtering (discrimination of TWs by comparing $E_i^+$ to an absolute threshold) and classification⁵ (discrimination according to $\mathrm{sign}(E_i^+ - E_i^-)$) are achieved if training accomplishes the following two-part goal:

goal 1: Tight clustering of the output energies within each histogram, with $\sigma_r$ very small for every histogram (24.14, 24.15);

goal 2: Complete separation of the non-TW histogram $\bar H_+$ from the TW histogram $H_+$, with $\bar m_+ \ll m_+$ (24.16, 24.17).
⁴ The Confusion Matrix used here is a tabulation of all decisions, rather than a matrix of true- and false-positive and -negative rates.
⁵ If the populations satisfy both $\min[H_+] - \max[\bar H_+] > 0$ and $\max[H_-] - \min[\bar H_-] < 0$, perfect classification is guaranteed. However, this constraint is much stricter than necessary, because classification is determined from individual samples $\langle w_i, \varepsilon^+, \varepsilon^-, E^+, E^- \rangle_i$. We also note, anecdotally, that generally, $\forall i$, $E^+ \geq E^-$, but that the magnitude of $E^{+,-}$ depends on speaker-specific input factors, for example the sex or dialect of the speaker. Therefore, the strict population constraint on $E^+$ and $E^-$ necessary to guarantee perfect classification is not a useful measure of network performance, as it is almost never realizable.
$$(z^+_{min}, z^+_{max}) = (\min[E_i^+ \,|\, w_i],\ \max[E_i^+ \,|\, w_i])$$
$$(z^-_{min}, z^-_{max}) = (\min[E_i^- \,|\, w_i],\ \max[E_i^- \,|\, w_i]), \qquad z = \frac{z^+ + z^-}{2} \qquad (24.18)$$

$$C_{f/f} = \begin{pmatrix} 25 & 1 \\ 0 & 234 \end{pmatrix} \qquad (24.19)$$

$$P_{f/\forall} = \begin{pmatrix} 251 & 175 \\ 700 & 3270.0 \end{pmatrix}, \qquad C_{f/\forall} = \begin{pmatrix} 240 & 164 \\ 60 & 3655 \end{pmatrix} \qquad (24.20)$$

$$P_{\forall/\forall} = \begin{pmatrix} 386 & 31 \\ 116.4 & 2922.4 \end{pmatrix}, \qquad C_{\forall/\forall} = \begin{pmatrix} 384 & 29 \\ 108 & 3606 \end{pmatrix} \qquad (24.21)$$
Fig. 24.5 Example population output statistics used for fitness scoring. The four PDFs on the left are the output of training on a single female speaker; the four PDFs on the right are for training using all 16 speakers. The Confusion Matrices are given in the boxed equations 24.20 and 24.21
$$\mathrm{score}_P = \begin{cases} b_p \cdot \dfrac{P_{(-,-)}}{\Sigma P} & \text{if } \Sigma^+ P = 0, \text{ and} \\[2ex] a_p \cdot \dfrac{P_{(+,+)}}{\Sigma^+ P} + a_p \cdot \dfrac{P_{(+,+)}}{\Sigma_+ P} + b_p \cdot \dfrac{P_{(-,-)}}{\Sigma P} & \text{otherwise.} \end{cases} \qquad (24.22)$$

In equation 24.22 the sum of all output represented in $P$ is $\Sigma P$, the sum of all TW output is $\Sigma_+ P$, and the sum of all output that is classified as in-class is $\Sigma^+ P$. Specific entries in $P$ are notated by the subscripted pair. Equations 24.18, 24.19, and 24.22 are summed by the Objective Function of figure 24.1 to determine the fitness score of each DSNN. The range of these functions, and therefore the balance of their importance toward parameter optimization, was determined by adjusting the values of $b_{(H^-)}$, $a_{(H^-)}$, $b_{(H^+)}$, $a_{(H^+)}$, $b_p$, and $a_p$.
Score ranges shown in Figure 24.6: Perfect Classifier [125, 350]; Yes-machine [25, 300]; No-machine [-25, 250]; Perfect No-machine [-175, 150].
Fig. 24.6 Score values as they relate to network function. Ordinate = $\mathrm{score}_{H^-} + \mathrm{score}_{H^+}$; abscissa = $\mathrm{score}_P$; fitness = abscissa + ordinate. Perfect No-machines are networks with $\Sigma P = 0$; No-machines are networks with $\Sigma^+ P / \Sigma P \approx 0$; Yes-machines are networks with $P_{(-,-)}/\Sigma P \approx 0$ and $P_{(-,+)}/\Sigma P \approx 1$; Classifiers are networks that have most weight in $P$ on the diagonal; a Perfect Classifier has strictly diagonal $P$. The grey area of the plane represents the networks that can be generated; the light grey areas are highly unlikely. The range of score for any given type of network is given in the table to the top right; there is a bound on the lower right side of the Perfect No-machine and the Yes-machine score-spaces. The circled numbers are the scores of idealized networks and where those networks lie in the plane; the graphs boxed to the bottom right are their ideal behavior (explained in the text). The graphs represent relative placement of $H_-$ and $\bar H_-$ (dotted), $H_+$ (grey) and $\bar H_+$ (black). The score ranges depicted here are for: $a_{H^-} = 50$, $b_{H^-} = 50$, $a_{H^+} = 125$, $b_{H^+} = 100$, $a_P = 50$, $b_P = 100$
Fig. 24.7 Convergence progress for 7 DSNNs trained on all speakers to the target word repeat, overlaid. Every generated DSNN Micro-Environment score is plotted with a grey dot. Those DSNNs that merit testing on the complete Word Environment have their score plotted with a black asterisk. The stratification of score values is explained in section 24.4.3. First-generation micro-scores are not tabulated; second-generation micro-scores do not initiate macro-testing
unlikely areas of the Perfect No-machine and the Yes-machine score spaces is a result of the rightward evolutionary force along the $\mathrm{score}_P$ axis: as the separation of $H_+$ and $\bar H_+$ increases, the increasing value of $P_{(+,+)}/\Sigma^+ P$ requires that the histograms become increasingly separated, so that both $\mathrm{score}_P$ and $\mathrm{score}_H$ are expected to increase together.
The familiar operations on confusion matrices [22] are,

precision $= P_{(+,+)} / \Sigma^+ P$
sensitivity (true positive rate) $= P_{(+,+)} / \Sigma_+ P$
specificity (true negative rate) $= P_{(-,-)} / \Sigma_- P$
accuracy $= (P_{(+,+)} + P_{(-,-)}) / \Sigma P$
The weights applied to the row of the Confusion Matrix in equation 24.13 render some of the calculations above approximate. However, insofar as we can accept the above equations, the $\mathrm{score}_P$ function is a measure which balances the relative importance of precision, sensitivity, specificity, and (due to the fractional weighting of the overwhelmingly represented non-target class) accuracy: exactly what one would expect from a task-specific measure of classifier performance.
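These four operations translate directly into code. The 2x2 layout assumed here (rows: true TW / non-TW, columns: decided TW / non-TW) is the convention used throughout this section; the sample matrix is the all-speakers matrix $C_{\forall/\forall}$ as read from equation 24.21.

```python
def confusion_metrics(P):
    """Standard operations on a 2x2 confusion matrix P (rows: true
    TW / non-TW; columns: decided TW / non-TW), per the formulas above."""
    col_plus = P[0][0] + P[1][0]              # sum of positive decisions
    total = sum(sum(row) for row in P)        # sum of all decisions
    return {
        "precision":   P[0][0] / col_plus,
        "sensitivity": P[0][0] / (P[0][0] + P[0][1]),
        "specificity": P[1][1] / (P[1][0] + P[1][1]),
        "accuracy":    (P[0][0] + P[1][1]) / total,
    }

# The all-speakers training matrix of equation 24.21, as one example.
C = [[384, 29], [108, 3606]]
m = confusion_metrics(C)
print({k: round(v, 3) for k, v in m.items()})
```

Note how accuracy is dominated by the large non-target class, which is exactly the imbalance the cost-weighted P matrix compensates for during training.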
GO = go, NO = no, RB = rubout, HP = help, ER = erase, EN = enter, YS = yes, RP = repeat
Fig. 24.8 Population output statistics of DSNN_RP broken down into target-word and non-target-word subclasses: for training on a single female speaker (top four plots), for that same DSNN with sample words from all speakers applied to it (middle row), and for training on all 16 speakers (bottom four plots). This data is also pictured in figure 24.5. Here, the height of each histogram is indicated by shade intensity. The input labels for each histogram are printed on the ordinate; the abscissa corresponds to the response energy relative to $\min[E_i^+ \,|\, w_i]$ (see equation 24.11). The columns of graphs are, respectively, $H_+$, $H_-$, $\mathrm{pdf}(E^+ - E^-)$, and $\mathrm{pdf}(E^+ + E^-)$
approximately 4160 samples, including the 260 used during training). The pattern of output energy versus input word remains similar. However, we note that the interval containing $E^+$ for non-TWs is increased, and the increased number of samples allows us to visualize more than one optimal output class; in this case, the word stop. From the classification results in table 24.2, middle row, the word stop is correctly classified at greater than 99%, indicating that this network has both optimal Positive ID input and optimal Negative ID input. Classification of all non-TWs is
Table 24.2 Positive-classification rates for the DSNNs of Figure 24.8. The true input word is given at the top of each column; the target word is RP. First row: training data for the single-female-speaker DSNN. Second row: validation of the single-speaker DSNN on all 16 speakers. Third row: training data for a DSNN trained using examples from all 16 speakers. The P-matrices used during training corresponding to the first and third rows are given in equations 24.20 and 24.21. The last row is the average positive-classification rate for each subclass during validation of 15 DSNNs, each one having been trained with a different speaker. The data given in the last row indicate that regardless of which of the 16 speakers' data sets is chosen to train the initial DSNN, the input words filtered will be the same, and the classification task will be reduced to the same subset of non-target words
                          ST    GO    RB    HP    SP   NO    YS    ER    EN     RP
Single female (train)     -     -     -     -     -    -     -     -     -      96.2
Single female, all 16     0.21  0.34  0.2   0.5   0.7  4.3   2     2.4   4.3    57.8
All 16 speakers (train)   -     -     -     -     -    -     3.6   11.1  11.1   93.0
Avg. of 15 DSNNs (valid.) -     -     0.16  0.24  -    1.56  4.95  6.18  14.31  70.44
validated above 95%, but noticeable misclassification rates are generated for four of the non-target words, the remaining non-target words being classified at a near-100% rate. The classification rate for repeat has fallen to 57.8%; certainly, adjusting the classification threshold leftward would better balance the error rates for the TW and non-TW classes, but here we are interested in output patterns more than in classification statistics according to equation 24.11 with an added variable threshold (Receiver Operating Characteristic).
The third row of plots is for a DSNN trained on all word samples. In some sense, these plots represent the ideal response to which validation data ought to be compared; unfortunately, training these systems is intolerably slow. The DSNN pictured in row one was achieved overnight on a desktop PC, whereas DSNNs trained on all 4140 input samples require two to four weeks of dedicated processing time on seven comparably equipped computers. Indeed, one of the motivations for this work was to determine algorithms for decreasing the compute- and real-time required to optimize biologically based small-set speech classifiers. Row three of figure 24.8 demonstrates an important output characteristic: by defining classification in terms of equation 24.11, the optimal filtering properties of a DSNN do not necessarily coincide with the actual classification abilities. This is evidenced by the output energy for the words enter and erase, both of which have an 11% error rate (Table 24.2). The word enter produces a relatively significant total output energy, but the word erase has significantly depressed output relative to all other word input (Row 3, Pane 4).
The result shown in figure 24.8 and table 24.2 is typical, and indicative of the following general patterns found in most of our trained DSNNs. First, optimization toward the TW versus a set of nine non-TWs is in effect optimization toward distinguishing the TW from a subset of the non-TWs. Some non-TWs are filtered out (rendering their classification a moot issue); others are easily distinguished from the TW. Therefore it is possible to limit the optimization task to a corpus of manageable
Fig. 24.9 Average temporal output for a DSNN trained on a single female speaker (Top
half: blue thick lines), average temporal output of a DSNN trained on all speakers (Bottom
half: blue thick lines), versus average temporal output for the single speaker trained DSNN
responding to all sixteen speakers (thin orange to brown lines). Time is in msec
magnitude positive prelude as compared to $RP_{f/\forall}(YS)$; $RP_{f/f}(EN)$ is very similar to $RP_{f/\forall}(EN)$, with the exception of a positive-going feature that peaks just prior to 600 msec in $RP_{f/f}(EN)$ and that is averaged out of $RP_{f/\forall}(EN)$; this same feature is found in the $RP_{f/f}(RB)$ response, but not in $RP_{f/\forall}(RB)$. As expected, words with pronunciation requiring a mid-word full stop produce an oscillating response (rubout and
repeat); enter is spoken with at least a partial stop, and it also produces an oscillating response. We expected, but could not measure, the accumulation of error due to syllable sequences: e.g., different speakers vocalize the syllable sequence at different rates, so that the average of all speaker output at any given time after the start of vocalization includes the response to multiple syllables.
The bottom four frames of figure 24.9 compare $RP_{f/\forall}(\cdot)$ to the average responses of a DSNN trained on all sixteen speakers ($RP_{\forall/\forall}(\cdot)$). We note that the single-speaker training required two days of optimization time (individual 4679; score = 344.65 training; score = 215.3 validation), whereas the sixteen-speaker training required seven parallel GAs running for approximately two weeks on seven identical computers (individual 1048 of one of the parallel GAs; score = 266.51). There are three dissimilarities in these responses: the positive prelude in $RP_{f/\forall}(ER)$ is practically missing, whereas it is quite prominent in $RP_{\forall/\forall}(ER)$; the positive prelude for $RP_{f/\forall}(YS)$ is truncated by nearly 30% relative to $RP_{\forall/\forall}(YS)$; and, possibly significant, $RP_{f/\forall}(RP)$ is an entirely different waveform than $RP_{\forall/\forall}(RP)$.
The isolated results displayed in figure 24.9 typify a very important general theme we see in all trained DSNNs. There are three cluster-types of trained I/O: first, multiple (distinct temporal, spectral, and phoneme description) input speech classes that elicit similar average temporal output ($n \to 1$; e.g., $RP(ST) = RP(SP)$); second, input speech classes that elicit a distinct average temporal output response ($1 \to 1$; e.g., $RP_{\forall/\forall}(RP)$); but, most often, multiple input speech classes that elicit a gradient of average temporal output response ($[a \ldots n] \to [A \ldots N]$). The most obvious example of the last cluster type is displayed in the sequence $RP_{f/\forall}(RB) \to RP_{f/\forall}(EN) \to RP_{f/\forall}(RP)$. In such a case the waveform describing the expected, canonical response is very similar; however, the magnitude (and/or very-low-frequency modulation) of the response differs. Less obvious from the responses graphed here (but seen in other DSNNs) are the sequences in critical-point movement, breadth of the negative-going lobes, and the height of the sharp positive prelude.
Taken altogether, what we find is that any given optimization results in a small subset of words being distinguishable from the others based on output energy (previous section) and temporal response; single DSNNs as defined in figure 24.2 and constrained by dynamic operation as described in section 24.3.2 are weak classifiers with regard to capacity, or total number of distinguishable input classes. However, most words elicit a response that tends toward a canonical output; our experience suggests that these DSNNs are very good ordering operators on the input set. Their output can support system-level (multiple-DSNN) probabilistic classification schemes. In particular, in graphing the output responses, we found that there was an obvious order of the responses in terms of expected magnitude, critical point, and breadth of waveform, and, most importantly, that the expected sequences were word-dependent but consistent between single-speaker and multiple-speaker trained DSNNs. This suggests that our DSNNs are capable of quickly forming the basis for a probabilistic speech recognizer, without making a priori assumptions regarding the input or desirable output waveforms.
Fig. 24.10 The expected output of all DSNNs is shown here for each input word. The input word label is given to the right of each set of graphs. Graph order is the same for each set, from bottom to top: EN, HP, RP, ER, RB, YS, NO, SP, GO, ST. For example, the lower left set of color graphs are EN(YS), HP(YS), RP(YS), ER(YS), ... ST(YS). The functions graphed here are for DSNNs trained on all available word samples. The third row of each plot corresponds to the x-y graph of the same label in the lower half of figure 24.9; as in that figure, event averages are for t = 20 msec, and averages near zero are not plotted
Not only does each TW indicate a unique separatrix, it also indicates a unique pattern of non-target subclass separation. An example is the response of DSNN(ST) and DSNN(SP) to the words start and stop. DSNN_EN, DSNN_HP, DSNN_RP, DSNN_ER, DSNN_GO, and DSNN_ST have indistinguishable responses to the words start and stop (rows 1-3, 9, 10 of ST and SP); and DSNN_ER, DSNN_YS, and DSNN_NO have similar responses (rows 4, 6, and 7 of ST and SP, relative color magnitude difference between 600–700 msec). The distinguishing characteristics of these responses are found in two DSNNs: DSNN_RB, which has no significant response to the word start but does respond to stop (row 5), and DSNN_SP, which has a bimodal response to stop but only a unimodal response to start.
Likewise, DSNN(GO) and DSNN(NO) are remarkably similar; however, eight of the 10 trained DSNNs have a response to NO that begins with a prelude to the main system response, and five of these preludes are short-duration negative bursts. Although three of these expected preludes are of modest magnitude, the sum total of the prelude response is a significant identifying feature of DSNN(NO).
DSNN(RP) and DSNN(ER) are somewhat similar; the onset of response (< 400 msec) indicates an expected magnitude difference of approximately 8 events greater for DSNN(ER). After this initial phase, there is a cessation of activity on seven of the ten DSNNs of DSNN(RP), while at the same time DSNN(ER) is sparsely active with a strong negative response from RP(ER). This is followed by a 100 msec phase where, again, the two responses are very similar except in magnitude; during this time period DSNN(RP) is greater. These responses finish with a 100 msec period during which EN(ER) and YS(ER) show strong activity, whereas there is little to no activity from DSNN(RP).
Similar observation reveals the significant differences between any given two spatiotemporal response patterns. These differences indicate that the overall expected output is not necessarily as important as where and when the output occurs for distinguishing the response of one word versus another, an expected conclusion when using dynamic neural networks to analyze a signal. What was unexpected was the minimal amount of information required to train these differences: namely, no information regarding the composition of the non-TW subset.
24.6 Discussion
Pattern classification tasks often begin with an analysis of the input signal. In particular, the first concern is to determine useful signal components or parameters of the signal (nonlinear filtering [20]) that can be measured or catalogued in such a way that the signals are more easily classified after measurement than before. Research in speech perception (human pattern classification of speech) has identified over 50 such measures [18]. A standard approach toward building VCIs is to analyze input speech signals in terms of one or more known measures, with the aim of mapping the input sound as a function of time into a sequence of meaningful speech. In contrast to this approach, one may assume only that the input signal contains classifiable components (or that the input is a set of classifiable signals), and then after
Acknowledgements. This work was supported by grants from the USA Office of Naval Research, and the USA Department of Defense/DARPA. Thanks to Shivani Pandya, Dr. Sageev
George, and Dr. Hassan Namarvar, for computer programming of the encoded input files and
much of the software framework within which our DSNNs run.
References
1. Arnott, R.H., Wallace, M.N., Shackleton, T.M., Palmer, A.R.: Onset neurones in the anteroventral cochlear nucleus project to the dorsal cochlear nucleus. JARO 5, 153–170 (2004)
2. Balakrishnan, V., Trussel, L.: Synaptic inputs to granule cells of the dorsal cochlear nucleus. Journal of Neurophysiology 99, 208–219 (2008)
3. Ballard, D.H.: An Introduction To Natural Computation. MIT Press, Cambridge (1999)
4. Berger, T.W., Eriksson, J.L., Ciarolla, D.A., Sclabassi, R.J.: Nonlinear systems analysis of the hippocampal perforant path-dentate projection. II. Effects of random train stimulation. Journal of Neurophysiology 60, 1077–1094 (1988)
5. Bregman, A.: Auditory scene analysis: the perceptual organization of sound. MIT Press, Cambridge (1990)
6. Cerf, R.: Asymptotic convergence of genetic algorithms. Adv. Appl. Prob. 30, 521–550 (1998)
7. Davis, M.H., Johnsrude, I.S.: Hierarchical processing in spoken language comprehension. Journal of Neuroscience 23(8), 3423–3431 (2003)
8. Davis, T.E., Principe, J.C.: A simulated annealing like convergence theory for the simple genetic algorithm. In: Proceedings of the Fourth International Conference on Genetic Algorithms, pp. 174–181. Morgan Kaufmann, San Mateo (1991)
9. de Boer, E.: Correlation studies applied to the frequency resolution of the cochlea. Journal of Auditory Research 7, 209–217 (1967)
10. Demonet, J.-F., Thierry, G., Cardebat, D.: Renewal of the Neurophysiology of Language: functional neuroimaging. Physiol. Rev. 85, 49–95 (2005)
11. Dorman, M., Loizou, P., Rainey, D.: Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs. J. Acoust. Soc. Am. 102, 2403–2411 (1997)
12. Dudley, H.: Remaking speech. J. Acoust. Soc. Am. 11, 169–177 (1939)
13. Dibazar, A., Song, D., Yamada, W.M., Berger, T.W.: Speech recognition based on fundamental functional principles of the brain. In: Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 2004(4), pp. 3071–3075 (2004)
14. Fay, R.R., Popper, A.N.: Evolution of hearing in vertebrates: the inner ears and processing. Hearing Research 149, 1–10 (2000)
15. George, S.T.: The Use of Dynamic Synapse Neural Networks for Speech Processing Tasks. PhD Thesis. Department of Biomedical Engineering, University of Southern California, Los Angeles, CA 90089 (2007)
16. Di Girolamo, S., Napolitano, B., Alessandrini, M., Bruno, E.: Experimental and clinical aspects of the efferent auditory system. Acta Neurochir. Suppl. 97(2), 419–424 (2007)
17. Goldberg, D.E.: The Design of Innovation: Lessons from and for competent genetic algorithms. Kluwer Academic Publishers, Boston (2002)
18. Greenberg, S.: The origins of speech intelligibility in the real world. In: Proc. of the ESCA Workshop on Robust Speech Recognition for Unknown Communication Channels, pp. 23–32 (1997)
19. Griffiths, T.D., Bates, D., Rees, A., Witton, C., Gholkar, A., Green, G.G.R.: Sound movement detection deficit due to a brainstem lesion. J. Neurol. Neurosurg. Psych. 62, 522–526 (1997)
20. Haykin, S.: Modern Filters. Macmillan Publishing Co., New York (1989)
21. Koch, C., Segev, I. (eds.): Methods In Neuronal Modeling: from ions to networks, 2nd edn. MIT Press, Cambridge (1998)
22. Kohavi, R., Provost, F. (eds.): Glossary to The Special Issue on Applications of Machine Learning and Knowledge Discovery Process. Machine Learning, vol. 30, pp. 271–274 (1998)
23. Lewis, E.R., Henry, K.R., Yamada, W.M.: Tuning and timing of excitation and inhibition in primary auditory nerve fibers. Hearing Research 171, 13–31 (2002)
24. Lewis, E.R., Henry, K.R., Yamada, W.M.: Tuning and timing in the gerbil ear: Wiener-kernel analysis. Hearing Research 174, 206–221 (2002)
25. Liaw, J.-S., Berger, T.W.: Dynamic Synapse: A New Concept of Neural Representation and Computation. Hippocampus 6, 591–600 (1996)
26. Liaw, J.-S., Berger, T.W.: Robust Speech Recognition With Dynamic Synapses. In: IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on Neural Networks, Neural Networks Proceedings, 1998, vol. 3, pp. 2175–2179 (1998), doi:10.1109/IJCNN.1998.687197
27. Liaw, J.-S., Berger, T.W.: Dynamic Synapse: Harnessing the computing power of synaptic dynamics. Neurocomputing 26-27, 199–206 (1999)
28. Liberman, C., Guinan, J.: Feedback control of the auditory periphery: antimasking effects of middle ear muscles vs. olivocochlear efferents. J. Commun. Disord. 31, 471–483 (1998)
29. Loiselle, S., Rouat, J., Pressnitzer, D., Thorpe, S.: Exploration of rank order coding with spiking neural networks for speech recognition. In: Proceedings of International Joint Conference on Neural Networks 2005, vol. 4, pp. 2076–2080 (2005), ISBN: 0-7803-9048-2/05
30. Loizou, P.C.: Mimicking the human ear. IEEE Signal Processing Magazine, 101–130 (September 1998)
31. On the number of channels needed to understand speech. J. Acoust. Soc. Am. 106(4), 2097–2103 (1999)
32. N-Methyl-D-Aspartate receptors at Parallel Fiber Synapses in the Dorsal Cochlear Nucleus. Journal of Neurophysiology 76(3), 1639–1656
33. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1996)
34. Gacek, R.R., Rasmussen, G.L.: Fiber analysis of the statoacoustic nerve of guinea pig, cat and monkey. Anat. Rec. 139, 455–463 (1961)
35. Namarvar, H.H., Liaw, J.-S., Berger, T.W.: A New Dynamic Synapse Neural Network for Speech Recognition. In: Proceedings International Joint Conference on Neural Networks 2001, vol. 4, pp. 2985–2990 (2001), doi:10.1109/IJCNN.2001.938853
36. Natschlager, T., Maass, W.: Computing the optimally fitted spike train for a synapse. Neural Computation 13, 2477–2494 (2001)
37. Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge (2000)
38. Recio-Spinoso, A., Temchin, A.N., van Dijk, P., Fan, Y.H., Ruggero, M.A.: Wiener-Kernel Analysis of Responses to Noise of Chinchilla Auditory-Nerve Fibers. Journal of Neurophysiology 93, 3615–3634 (2005)
39. Rigal, L., Truffet, L.: A new genetic algorithm specifically based on mutation and selection. Adv. Appl. Prob. 39, 141–161 (2007)
40. Schetzen, M.: The Volterra and Wiener Theories of Nonlinear Systems. John Wiley and Sons, New York (1980)
41. Shannon, R., Zeng, F.-G., Kamath, V., Wygonski, J., Ekelid, M.: Speech recognition with primarily temporal cues. Science 270, 303–304 (1995)
42. Shepherd, G.M. (ed.): The Synaptic Organization of the Brain. Oxford University Press, Oxford (1998)
43. Shepherd, G.M., Koch, C.: Introduction to synaptic circuits. In: Shepherd, G.M. (ed.) The Synaptic Organization of the Brain. Oxford University Press, Inc., Oxford (1998)
44. Shouval, H.Z., Bear, M.F., Cooper, L.N.: A unified model of NMDA Receptor-dependent bidirectional synaptic plasticity. PNAS 99(16), 10831–10836 (2002)
45. Stevens, C.F.: Neurotransmitter release at central synapses. Neuron 40, 381–388 (2003)
46. Sumner, C.J., Lopez-Poveda, E.A., O'Mard, L.P., Meddis, R.: Adaptation in a revised inner-hair cell model. J. Acoust. Soc. America 113(2), 893–901 (2003)
47. Thiels, E., Barrionuevo, G., Berger, T.W.: Induction of long-term depression in hippocampus in vivo requires postsynaptic inhibition. Journal of Neurophysiology 72, 3009–3016 (1994)
48. Uysal, I., Sathyendra, H., Harris, J.: A biologically plausible system approach for noise robust vowel recognition. In: 49th IEEE International Midwest Symposium on Circuits and Systems, MWSCAS 2006, vol. 1, pp. 245–249 (2006), doi:10.1109/MWSCAS.2006.382043
49. Watkins, J.C., Jane, D.E.: The glutamate story. British Journal of Pharmacology 147, S100–S108 (2006)
50. Wu, L.-G., Saggau, P.: Presynaptic inhibition of elicited neurotransmitter release. Trends In Neuroscience 20(5), 204–212 (1997)
51. Xie, X., Berger, T.W., Barrionuevo, G.: Isolated NMDA receptor-mediated synaptic responses express both LTP and LTD. Journal of Neurophysiology 67, 1009–1013 (1992)
52. Xie, X., Barrionuevo, G., Berger, T.W.: Differential expression of short-term potentiation by AMPA and NMDA receptors. Learning and Memory 3, 115–123 (1996)
53. Yamada, W.Y.: Second-Order Wiener Kernel Analysis of Auditory Afferent Axons of the North American Bullfrog and Mongolian Gerbil Responding To Noise. PhD Thesis, University of California at Berkeley. Committee in charge: Edwin R. Lewis, Kenneth Henry, Erv Hafter, and Geoffrey Owen (1997)
54. Yamada, W.M., Lewis, E.R.: Predicting the temporal response of non-phase-locked bullfrog auditory units to complex acoustic waveforms. Hearing Research 130(1-2), 155–170 (1999)
55. Yeckel, M.F., Berger, T.W.: Feedforward excitation of the hippocampus by entorhinal afferents: Redefinition of the role of the trisynaptic pathway. Proceedings of the National Academy of Sciences 87, 5832–5836 (1990)
56. Young, E.: Cochlear Nucleus. In: Shepherd, G. (ed.) The Synaptic Organization of the Brain, 4th edn. Oxford University Press, Oxford (1998)
57. Zatorre, R.J., Gandour, J.T.: Neural specializations for speech and pitch: moving beyond the dichotomies. Phil. Trans. Royal Soc. B 363, 1087–1104 (2008)
58. Zucker, R.S., Regehr, W.G.: Short-term synaptic plasticity. Annual Reviews in Physiology 64, 355–405 (2002)
Chapter 25
Abstract. Automatic image registration is a fundamental task in medical image processing, and significant advances have occurred in the last decade. However, one major problem with advanced registration techniques is their high computational cost. Due to this constraint, these methods have found limited application in clinical situations where real-time or near real-time execution is required, e.g., intra-operative imaging, or where high volumes of data need to be processed periodically. High performance in image registration can be achieved by reduction in data and search spaces. However, to obtain a significant increase in performance, these approaches must be complemented with parallel processing. Parallel processing is associated with expensive supercomputers and computer clusters that are unaffordable for most public medical institutions. This chapter will describe how to take advantage of an existing computational infrastructure and achieve high-performance image registration in a practical and affordable way. More specifically, it will outline the implementation of a fast and robust Internet subtraction service, using a distributed evolutionary algorithm and a service-oriented architecture.
25.1 Introduction
In image processing, the interest often lies not only in analyzing one image but also
in comparing or combining the information present in different images. In this context, image registration can be defined as the process of aligning images so that corresponding features can be related. The term image registration is also used to refer
to the alignment of images with a computer model or the alignment of features in an
image with locations in physical space. For this reason image registration is one of
the fundamental tasks within image processing: by determining the transformation
required to align two images, registration enables specialists to make quantitative
comparisons.
Gabriel Manana Guichon · Eduardo Romero Castro
National University, Carrera 45 N 26-85, Bogota, Colombia
e-mail: {gjmananag,edromero}@unal.edu.co
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 671–700.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
From an operational point of view, image registration is an optimization problem and its goal is to produce, as output, an optimal geometrical transformation that
aligns corresponding points of the two given views. Image registration has applications in many fields, including remote sensing, astro- and geophysics, computer
vision, and medical imaging. The field of application to be addressed in this chapter
is medical imaging, and in this field, this transformation is generally used as input to
another system that can be, for instance, a fusion system or a subtraction system. For
a complete overview of the different image acquisition systems and the relevance of
registration in medical image interpretation and analysis, you may refer to Hajnal et
al. [29], and references therein.
In many clinical scenarios, images of the same or different modalities may be acquired, and it is the responsibility of the diagnostician to combine or fuse the image information to draw useful clinical conclusions. Without an imaging system, this
generally requires mental compensation for changes in patient position, the sensors
used, or even the chemicals involved. An image registration system aligns the images and so establishes correspondence between features present in different images,
allowing the monitoring of subtle changes in size or intensity over time or across a
population. It also allows establishing correspondence between images and physical
space in image guided interventions. In many applications a rigid transformation,
i.e., translations and rotations only, is enough to describe the spatial relationship
between two images. However, there are many other applications where non-rigid
transformations are required to describe this spatial relationship adequately.
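A 2-D rigid transformation is simply a rotation composed with a translation; the following short sketch (plain Python, with arbitrary illustrative parameter values) makes the defining property, distance preservation, concrete:

```python
import math

def rigid_transform(points, theta, tx, ty):
    """Apply a 2-D rigid transformation: rotate by theta, then
    translate by (tx, ty). Rigid maps preserve all distances."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Check on a unit square: every pairwise distance survives the mapping.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = rigid_transform(square, math.pi / 6, 2.0, -1.0)
assert abs(dist(moved[0], moved[2]) - dist(square[0], square[2])) < 1e-12
```

A non-rigid map would fail this check, which is exactly the distinction drawn above.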
In terms of the algorithms used, the current tendency is to use automatic algorithms (i.e., no user interaction) [8], which require the application of advanced image registration techniques, all characterized by their high computational cost. Due to this constraint, these methods have found limited application in clinical situations where real-time or near real-time execution is required, e.g., intra-operative imaging or image-guided surgery. High performance in image registration can be achieved by reduction in data space, as well as reduction in solution search space. These techniques can significantly decrease the registration time without compromising registration accuracy. Nonetheless, to obtain a significant increase in performance,
these approaches must be complemented with parallel processing. The problem is
that parallel processing has always been associated with extremely expensive supercomputers, unaffordable for most medical institutions in developing countries. This
chapter will describe our experience in achieving high performance in an affordable
way, i.e., taking advantage of an existing computational infrastructure. More specifically, it will outline how this can be done by using open source software tools that
are readily available. This will be illustrated by the use of a real case study: an online
subtraction radiography service that employs distributed evolutionary algorithms for
automatic registration.
The chapter is organized as follows. Section 25.2, Background, briefly describes
the main aspects behind medical image registration and presents a general overview
of available options to attain high-performance computing for scientific research.
Next, section 25.3, A Grid Computing Framework for Medical Imaging, presents
our experience in building a scalable computing framework for medical imaging,
25.2 Background
As introduced in the previous section, the task of image registration is to find an
optimal geometric transformation between corresponding image data. The image
registration problem can be stated in just a few words: given a reference and a template image, find an appropriate geometric transformation such that the transformed
template becomes similar to the reference. However, though the problem is easy to
express, it is hard to solve. In practice, the concrete types of the geometric transformation, as well as the notions of optimal and corresponding depend on the specific
application. In this section we summarize the main aspects involved in the registration process and review recent trends in high-performance computing (HPC).
In general terms, registration can be posed as the optimization of a cost function of the form

$$J[T] = D(X, Y_T) + \alpha\, S[T], \qquad (25.1)$$
where the first term characterizes the similarity between the images and the second term characterizes the cost associated with particular deformations. From a probabilistic point of view, the cost function in eq. (25.1) can be explained in a Bayesian context. In this framework, the similarity measure can be viewed as a likelihood term which expresses the probability of a match between the two images, and the second term can be interpreted as a prior which represents a priori knowledge about the expected deformation. This term only plays a role in non-rigid registration and is usually ignored in the case of rigid registration.
Several approaches can be used to optimize this function. They range from standard numerical methods to evolutionary methods, including some hybrid approaches. No matter what method is used, this always implies an iterative process whose computational cost is so high that it prevents most applications from performing appropriately in real-time situations. One possible way to solve this issue is to devise faster algorithms. Another is to exploit the intrinsic parallelism that most methods convey.
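To make the optimization view concrete, here is a deliberately minimal toy, not the chapter's method: two 1-D signals, a sum-of-squared-differences dissimilarity, and an exhaustive search over integer translations standing in for the iterative optimizer (all names and signal values below are made up for illustration):

```python
def ssd(a, b):
    """Sum-of-squared-differences dissimilarity between equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def shift(signal, t):
    """Translate a 1-D signal by t samples, padding with zeros."""
    n = len(signal)
    return [signal[i - t] if 0 <= i - t < n else 0.0 for i in range(n)]

def register_translation(reference, template, max_shift=10):
    """Exhaustively evaluate cost(t) = ssd(reference, shift(template, t))
    and keep the best shift: the simplest possible search loop."""
    best_t, best_cost = 0, float("inf")
    for t in range(-max_shift, max_shift + 1):
        c = ssd(reference, shift(template, t))
        if c < best_cost:
            best_t, best_cost = t, c
    return best_t

reference = [0, 0, 0, 1, 2, 3, 2, 1, 0, 0]
template  = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0]
assert register_translation(reference, template) == 2
```

Real registration replaces the exhaustive loop with the iterative optimizers discussed next, but the structure (transform, measure, update) is the same.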
Medical image registration spans numerous applications, and there is a large number of different techniques reported in the literature. What follows is an attempt to classify the different techniques and categorize them based upon some criteria; for a complete analysis, please refer to, e.g., [2]. Maintz and Viergever [25] originally proposed a nine-dimensional scheme that can be condensed into the following eight criteria [22]: image dimensionality, registration basis, geometrical transformation, degree of interaction, optimization procedure, image acquisition modalities, subject, and object.
Image dimensionality refers to the number of geometrical dimensions of the image spaces involved, which in medical applications are typically two- and three-dimensional, but may include time as a fourth dimension. For spatial registration, there are 2D/2D, 3D/3D, and the more complex 2D/3D registration (e.g., CT/X-ray).
The registration basis is the aspect of the two images used to perform the registration. In this category, registration can be classified into extrinsic and intrinsic methods. Registration methods that are based upon the attachment of markers are termed extrinsic methods; in contrast, those which rely on anatomic features only are termed intrinsic. When there are no known correspondences as input, intensity patterns in the two views are used for alignment. This basis, known as intensity- or voxel-based, has become in recent years the most widely used registration basis in medical imaging. Here there are two distinct approaches: the first reduces the image gray-value content to a representative set of scalars and orientations (e.g., principal axes and moments based methods); the second uses the full image pixel content throughout the registration process. In general, intensity-based methods are more complex, yet more flexible.
The category geometrical transformation refers to the mathematical forms of the geometrical mapping used to align points in one space with those in the other. These include rigid transformations, which preserve all distances, i.e., transformations that preserve the straightness of lines (and hence planarity of surfaces) and all angles
between straight lines. Images are rotated and translated in two or three dimensions in the matching process, but not deformed in any way. This is ideal for most fusion applications, and accounts for differences such as patient positioning. Registration problems that are limited to rigid transformations are called rigid registration problems. In deformable or non-rigid registration, images are stretched to take into account complex motions, such as breathing, and any changes in the shape of the body or organs which may occur following surgery, for example. Non-rigid transformations are important not only for applications to non-rigid anatomy, but also for inter-patient registration of rigid anatomy and intra-patient registration of rigid anatomy, in those cases where there are non-rigid distortions caused by the image acquisition procedure. These include scaling transformations, with a special case when the scaling is isotropic, known as similarity transformations; the more general affine transformations, which preserve the straightness of lines and planarity of surfaces, as well as parallelism, but change the angles between lines; the even more general projective transformations, which preserve the straightness of lines and planarity of surfaces, but not parallelism; perspective transformations, a subset of the projective transformations, required for images obtained by techniques such as X-ray, endoscopy or microscopy; and finally curved transformations, which do not preserve the straightness of lines. Each type of transformation contains as special cases the ones described before it, e.g., rigid transformations are a special type of non-rigid transformations, and so on. Transformations that are applied to the whole image are called global, while transformations that are applied to subsections of the image are called local. Rigid, affine and projective transformations are generally global, and curved transformations are more or less local, depending upon the underlying physical model used.
Degree of interaction refers to the degree of intervention of a human operator
in the registration algorithm. The fully automatic algorithm, which requires no user
interaction and represents the ideal situation, is a central focus of the subtraction
service presented in this chapter.
The optimization procedure is the method by which the function that measures
the alignment of the images is maximized. Depending upon the mathematical approach to registration used, i.e., parametric or non-parametric, the optimization
method will try to find an optimum of some function defined on the parameter
space, or will try to come up with an appropriate measure, both for the similarity
of the images as well as for the likelihood of a non-parametric transformation. The
more common situation here is that in which a global extremum is sought among
many local ones by means of iterative search. In parametric registration, popular
techniques include traditional numerical methods like Powell's method [36], Downhill Simplex [24], and gradient descent methods, as well as evolutionary methods like genetic algorithms [27], simulated annealing [39], and differential evolution [30].
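Among the evolutionary methods listed, differential evolution is easy to sketch. The fragment below is a minimal DE/rand/1/bin loop, not the distributed implementation described in this chapter, applied to a made-up "registration" cost: recovering a 2-D translation by minimizing squared error (bounds are not re-enforced after mutation, for brevity):

```python
import random

def de_minimize(f, bounds, pop_size=20, F=0.7, CR=0.9, gens=150, seed=42):
    """Minimal DE/rand/1/bin: mutate with a scaled difference of two
    random members, crossover against the current member, keep the better."""
    rng = random.Random(seed)
    d = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(d)  # guarantee at least one mutated gene
            trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                     if (rng.random() < CR or j == j_rand) else pop[i][j]
                     for j in range(d)]
            fc = f(trial)
            if fc <= cost[i]:          # greedy selection
                pop[i], cost[i] = trial, fc
    best = min(range(pop_size), key=lambda i: cost[i])
    return pop[best], cost[best]

# Toy cost: recover the translation (1.5, -2.0).
target = (1.5, -2.0)
f = lambda t: (t[0] - target[0]) ** 2 + (t[1] - target[1]) ** 2
best, best_cost = de_minimize(f, [(-5, 5), (-5, 5)])
assert best_cost < 1e-3
```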
Modalities refers to the means by which the images to be registered are acquired.
Two-dimensional images are acquired, e.g., by X-ray projections captured on film
or digitally, and three-dimensional images are typically acquired by tomographic
The IBM Roadrunner, number one in the TOP500 list as of the end of 2008: about 130
million dollars (http://en.wikipedia.org/wiki/IBM_Roadrunner).
Fig. 25.1 The easiest way to integrate heterogeneous computing resources is not to recreate
them as homogeneous elements, but to provide a layer that allows them to communicate
despite their differences. This software layer is commonly known as middleware
This layer is responsible for hiding distribution and the heterogeneity of the various hardware components, operating systems and communication protocols. At its
most basic level, middleware is nothing but a way of abstracting access to a resource through the use of an Application Programming Interface (API). Despite
their benefits, distributed systems can be notoriously difficult to build. Perhaps the
most obvious complexity is the variety of machine architectures and software platforms over which a distributed application must function. In the past, developing
a distributed application entailed porting it to every platform it would run on, as
well as managing the distribution of platform-specific code to each machine. Most
of the computers on the University campus are used in academic-related tasks and use a variety of operating systems, a fact that clearly indicated the necessity of a
platform-independent middleware. Additionally, the computing grid would be part
of a bigger system³ that uses a service-oriented architecture (SOA), so it should act
as another service.
Another important aspect to be considered was related to the available computing infrastructure. Contrary to what happens with computers in a cluster, which are dedicated and under a single administration domain, the computers on the campus are shared and belong to many different domains. This meant that the computers would frequently enter and leave the grid at random, and therefore the middleware to use should allow us to build a loosely coupled system, in space (network addresses) and time (synchronization).
The system now includes services for data mining, machine learning, simulation and visualization, and image analysis. For detailed information please refer to
http://www.bioingenium.unal.edu.co/
Java, Jini and JavaSpaces are trademarks of Sun Microsystems Inc.
or taken out from the space, they transform themselves into standalone applications.
This not only solves the problem of code distribution, but also gives us a powerful
mechanism for building parallel computing servers. The application pattern used for this is known as the Replicated-Worker pattern [9], and involves a master process
that divides a problem into smaller tasks and puts them into a space. The workers
take and execute these tasks, and write the results back into the space. It is then
the responsibility of the master to collect the task results and combine them into a
meaningful overall solution.
It is worth pointing out a couple of important characteristics of this pattern. First, each worker process may execute many tasks; as soon as one task is computed, a worker can take another task from the space and execute it. In this way, the replicated-worker pattern automatically balances the load: workers compute tasks in direct relation to their availability and capacity to do the work. Second, the type of applications that fit into the replicated-worker pattern scales naturally: more workers can be added and the computation speeds up, without rewriting the code. The
Appendix section shows how the JavaSpaces API and the replicated-worker pattern
can be used to build a generic worker node.
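The Appendix's version uses the JavaSpaces API; purely as an illustration of the pattern itself, the sketch below replaces the space with a thread-safe queue and uses a made-up sum-of-squares task (none of this is the chapter's actual worker code):

```python
import threading
import queue

def worker(tasks, results):
    """A replicated worker: repeatedly take a task from the 'space',
    compute it, and write the result back."""
    while True:
        task = tasks.get()
        if task is None:                 # poison pill: no more work
            break
        task_id, payload = task
        results.put((task_id, sum(x * x for x in payload)))

def master(data, n_workers=4, chunk=5):
    """The master divides the problem into tasks, puts them into the
    'space', and combines the partial results into the overall answer."""
    tasks, results = queue.Queue(), queue.Queue()
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    for tid, c in enumerate(chunks):
        tasks.put((tid, c))
    for _ in range(n_workers):           # one pill per worker
        tasks.put(None)
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(results.get()[1] for _ in chunks)

assert master(list(range(10))) == sum(x * x for x in range(10))
```

The load-balancing property described above falls out naturally: a fast worker simply takes more tasks from the shared queue than a slow one.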
Another important issue that has to be addressed when implementing a distributed system is the following. So far, we have discussed several tools that Jini provides and that help obtain fault tolerance in the worker nodes. However, the set of Jini services runs on a single server computer, a situation known as a single point of failure (SPOF). This means that if this single server computer fails for some reason, the whole computing grid goes down. To avoid this situation, we have embedded our computing grid service, along with Jini, in a layered application (JEE). The service is then run in a cluster of six application servers using two open source frameworks: the application server JBoss [26] and the clustering tool Terracotta [41].
\[
\begin{pmatrix} u \\ v \\ w \end{pmatrix} =
\begin{pmatrix} a_1 & a_2 & d_x \\ a_3 & a_4 & d_y \\ a_5 & a_6 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} u/w \\ v/w \end{pmatrix}
\tag{25.2}
\]
Therefore, the new coordinates (x', y') of a pixel (x, y) in the template image are
given by x' = (a_1 x + a_2 y + d_x)/(a_5 x + a_6 y + 1) and y' = (a_3 x + a_4 y + d_y)/(a_5 x + a_6 y + 1).
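Eq. (25.2) translates directly into code; the following sketch (class and method names are ours) applies the eight-parameter transformation to a pixel:

```java
public class ProjectiveTransform
{
    // The parameters a1..a6, dx, dy of Eq. (25.2): the chromosome of one individual.
    final double a1, a2, a3, a4, a5, a6, dx, dy;

    ProjectiveTransform(double a1, double a2, double a3, double a4,
                        double a5, double a6, double dx, double dy)
    {
        this.a1 = a1; this.a2 = a2; this.a3 = a3; this.a4 = a4;
        this.a5 = a5; this.a6 = a6; this.dx = dx; this.dy = dy;
    }

    // Maps a template-image pixel (x, y) to its new coordinates (x', y').
    double[] apply(double x, double y)
    {
        double u = a1 * x + a2 * y + dx;
        double v = a3 * x + a4 * y + dy;
        double w = a5 * x + a6 * y + 1.0;
        return new double[] { u / w, v / w };
    }

    public static void main(String[] args)
    {
        // The null transformation (a1 = a4 = 1, all else 0) leaves pixels unchanged.
        ProjectiveTransform id = new ProjectiveTransform(1, 0, 0, 1, 0, 0, 0, 0);
        double[] p = id.apply(12.0, 7.0);
        System.out.println(p[0] + " " + p[1]);   // 12.0 7.0
    }
}
```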
\(Y_T = Y[T^{-1}(\cdot)]\).
We now have to find the intensities that a given point of T(y') simultaneously takes
in X and Y_T. Since we are dealing with continuous spatial transformations, points of
the grid T(y') do not, in general, transform to points of the grid x. So, in order to
define the joint probability density function of the images, we used the interpolation
approach explained below, discarding the points of T(y') that do not have eight
neighbors in x. If we denote by T'(y') the subset of accepted points and by \(\tilde{X}\) the
interpolation of X, we can define the image pair as the following couple:
\[
Z_T : T'(y') \to A^2, \qquad \omega \mapsto \big(\tilde{X}(\omega),\, Y[T^{-1}(\omega)]\big),
\]
and, in a similar way as we did for a single image in Eq. (25.3), their joint probability
density function as:
\[
P_T(i,j) = \frac{\operatorname{Card}\{\omega \in T'(y') : \tilde{X}(\omega) = i,\ Y[T^{-1}(\omega)] = j\}}{\operatorname{Card}\, T'(y')}.
\tag{25.4}
\]
The equation
\[
\operatorname{Var}(Y) = \operatorname{Var}[E(Y|X)] + E_X[\operatorname{Var}(Y|X = x)]
\tag{25.5}
\]
expresses the fact that the variance can be decomposed as a sum of two energy terms:
a first term Var[E(Y|X)], the variance of the conditional expectation, which measures the part of Y that is predicted by X; and a second term E_X[Var(Y|X = x)],
the conditional variance, which stands for the part of Y that is functionally
independent of X.
Now, based on the previous equation that can be seen as an energy conservation equation, we can define the correlation ratio as the measure of the functional
dependence between two random variables:
\[
\eta(Y|X) = \frac{\operatorname{Var}[E(Y|X)]}{\operatorname{Var}(Y)}.
\]
Unlike the correlation coefficient, which measures the linear dependence between
two variables, the correlation ratio measures the functional dependence. The correlation ratio takes on values between 0 and 1, where a value near 1 indicates high
functional dependence. Then, for a given transformation T, in order to compute
η(Y_T|X) we can use the following equation:
\[
1 - \eta(Y_T|X) = \frac{E_X[\operatorname{Var}(Y_T|X = x)]}{\operatorname{Var}(Y_T)},
\]
that by means of Eq. (25.4) and Eq. (25.5) can be expressed as:
\[
1 - \eta(Y_T|X) = \frac{1}{\sigma^2} \sum_i \sigma_i^2\, P_{x,T}(i),
\]
where
\[
\sigma^2 = \sum_j j^2 P_y(j) - m^2, \qquad m = \sum_j j\, P_y(j),
\]
and
\[
\sigma_i^2 = \frac{1}{P_x(i)} \sum_j j^2 P(i,j) - m_i^2, \qquad
m_i = \frac{1}{P_x(i)} \sum_j j\, P(i,j).
\]
The correlation ratio measures the similarity between two images, and since it is
assumed to be maximal when the images are correctly aligned, it will be used to
compute the fitness of the individuals that make up the algorithm population.
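As a sketch of how such a fitness could be computed, the following routine derives η(Y|X) from a joint intensity histogram, following the decomposition 1 − η = E_X[Var(Y|X = x)]/Var(Y); the class name and the toy histogram are ours:

```java
public class CorrelationRatio
{
    // Computes the correlation ratio from a joint histogram h[i][j] of
    // intensity pairs (i, j), i.e. unnormalized counts of P(i, j).
    static double eta(long[][] h)
    {
        long n = 0; double m = 0, m2 = 0;
        for (long[] row : h)
            for (int j = 0; j < row.length; j++) {
                n += row[j];
                m  += (double) j * row[j];
                m2 += (double) j * j * row[j];
            }
        m /= n; m2 /= n;
        double var = m2 - m * m;                 // Var(Y)

        double condVar = 0;                      // E_X[Var(Y|X = x)]
        for (long[] row : h) {
            long ni = 0; double mi = 0, mi2 = 0;
            for (int j = 0; j < row.length; j++) {
                ni += row[j];
                mi  += (double) j * row[j];
                mi2 += (double) j * j * row[j];
            }
            if (ni == 0) continue;
            mi /= ni; mi2 /= ni;
            condVar += (mi2 - mi * mi) * ni / n; // weighted by P_x(i)
        }
        return 1.0 - condVar / var;
    }

    public static void main(String[] args)
    {
        // Y is a deterministic function of X: perfect functional dependence.
        long[][] aligned = { { 5, 0 }, { 0, 5 } };
        System.out.println(eta(aligned));        // 1.0
    }
}
```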
\[
\tilde{X}(x) = \sum_{\omega} c(\omega)\, \beta^3(x - \omega),
\]
which involves integer shifts of the central B-spline β³. The parameters of the spline
are the coefficients c. In the case of images with regular grids, they are calculated at
the beginning of the procedure by recursive filtering. A third-order approximation
was used in the present work.
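For illustration, the central cubic B-spline kernel and the resulting one-dimensional interpolation sum can be written as follows (a sketch with our own names; it assumes the coefficients c have already been obtained by the recursive prefiltering mentioned above):

```java
public class CubicBSpline
{
    // Central cubic B-spline kernel: support [-2, 2].
    static double beta3(double x)
    {
        double a = Math.abs(x);
        if (a < 1.0) return 2.0 / 3.0 - a * a + a * a * a / 2.0;
        if (a < 2.0) { double t = 2.0 - a; return t * t * t / 6.0; }
        return 0.0;
    }

    // One-dimensional interpolation at x from precomputed spline coefficients c,
    // summing integer shifts of the central B-spline.
    static double interpolate(double[] c, double x)
    {
        double s = 0.0;
        for (int k = 0; k < c.length; k++) s += c[k] * beta3(x - k);
        return s;
    }

    public static void main(String[] args)
    {
        System.out.println(beta3(0.0));   // 0.6666666666666666
        System.out.println(beta3(2.5));   // 0.0
    }
}
```

In two dimensions the same kernel is applied separably along each axis.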
the genome), and the candidates are called individuals or phenotypes. Traditionally,
individuals are represented as binary strings but, as we shall see, real-number encoding is also possible. The evolution usually starts from a population of randomly
generated individuals and proceeds in generations. In each generation, the fitness of
every individual in the population is evaluated; multiple individuals are stochastically selected from the current population, then recombined and mutated to form a new
population, which is used in the next iteration of the algorithm.
Commonly, the algorithm terminates when an adequate fitness level has been
achieved, a maximum number of iterations has been reached, or, as in our case, the
available computational time is exhausted.
Despite their computational cost, evolutionary algorithms have been chosen over
standard numerical methods because of their strong immunity to local extrema, their
intrinsic parallelism and robustness, as well as their ability to cope with large and
irregular search spaces. In this section we compare two simple evolutionary algorithms categorized as parallel iterative [28]: a Genetic Algorithm (GA) and Differential Evolution (DE). Genetic algorithms are attributed to Holland (1975) [27]
and Goldberg (1989) [7], while evolution strategies were developed by Rechenberg
(1973) [21] and Schwefel (1995) [19]. A good and diverse set of GA examples is
synthesized in Chambers [31], while a practical approach to Differential Evolution
can be found in [30]. Both approaches mimic Darwinian evolution and attempt to
evolve better solutions through recombination, mutation, and selection. However,
some distinctions do exist. DE is very effective in continuous function optimization, in part because it uses real encoding and arithmetic operators.
Since GAs generally encode parameters as binary strings and manipulate them with
logical operators, they are better suited to combinatorial optimization.
Upon analyzing the most relevant works in this area, it can be concluded that
the most crucial aspects are the selection of the coding scheme and the design
of the fitness function. All seem to agree that, for this kind of optimization problem,
real-number encoding performs better than both binary and Gray encoding [34]. Accordingly, for the problem at hand, in both evolutionary algorithms the chromosome
has been coded as eight floating-point numbers representing the set of parameters
used in the projective transformation. The initial population includes an individual
that is either the null transformation or the center-of-mass transformation, according
to their respective fitness. The rest of the population is generated randomly within
the search space.
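A possible initialization along these lines can be sketched as follows; the search bounds and names are illustrative, not the chapter's actual values:

```java
import java.util.Random;

public class PopulationInit
{
    // A chromosome: the eight projective parameters (a1..a6, dx, dy).
    static final double[] NULL_TRANSFORM = { 1, 0, 0, 1, 0, 0, 0, 0 };

    static double[][] init(int size, double[] lo, double[] hi, long seed)
    {
        Random rnd = new Random(seed);
        double[][] pop = new double[size][8];
        pop[0] = NULL_TRANSFORM.clone();          // seed individual
        for (int i = 1; i < size; i++)
            for (int g = 0; g < 8; g++)           // uniform within the search bounds
                pop[i][g] = lo[g] + rnd.nextDouble() * (hi[g] - lo[g]);
        return pop;
    }

    public static void main(String[] args)
    {
        // Illustrative bounds around the identity transformation.
        double[] lo = { 0.8, -0.2, -0.2, 0.8, -1e-3, -1e-3, -100, -100 };
        double[] hi = { 1.2,  0.2,  0.2, 1.2,  1e-3,  1e-3,  100,  100 };
        double[][] pop = init(120, lo, hi, 42);
        System.out.println(pop.length + " " + pop[0][0]);   // 120 1.0
    }
}
```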
The fitness of each individual, indicating the similarity between the transformed
image and the reference image, is then computed using the correlation ratio previously described. Selection in the GA is performed as follows. The fittest ten percent
of the population is selected to be part of the next generation, a mechanism known as exploitation. The rest of the individuals are the result of either crossover (pc = 0.85
in our implementation) or random selection. In the case of crossover, the parents
of each new offspring are selected by tournament (of size 5% of the population)
from the current population. Finally, leaving unmodified the individuals selected by
elitism, new candidate individuals are mutated with
a predetermined probability (pm = 0.21), known as the exploration characteristic.
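The tournament step can be sketched as follows (a minimal illustration; the class name and the toy fitness values are ours):

```java
import java.util.Random;

public class GaSelection
{
    // Tournament selection: pick t individuals at random, return the fittest.
    static int tournament(double[] fitness, int t, Random rnd)
    {
        int best = rnd.nextInt(fitness.length);
        for (int k = 1; k < t; k++) {
            int c = rnd.nextInt(fitness.length);
            if (fitness[c] > fitness[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args)
    {
        Random rnd = new Random(7);
        double[] fitness = new double[120];
        for (int i = 0; i < fitness.length; i++) fitness[i] = rnd.nextDouble();

        int t = fitness.length / 20;      // tournament size: 5% of the population
        int parent = tournament(fitness, t, rnd);
        System.out.println(parent >= 0 && parent < 120);   // true
    }
}
```

Two such tournaments yield the two parents of each offspring produced by crossover.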
Fig. 25.4 Timing profile for the parallel iterative algorithm showing the percentage time
required for each operation
Table 25.1 Some combinations of rotation, scaling, and translation applied to the set of synthetic images
[Table 25.1 data: twelve combinations with rotation angle a ∈ {1, 10} degrees, translations Tx, Ty ∈ {10, 100} pixels, and scale factor SF ∈ {0.8, 1.2}.]
[Table data: recovered parameters (a, Tx, Ty, SF), correlation ratio (CR) and percentage error for the twelve test transformations; the errors range from 10.7% to 25.9%.]
[Table data: recovered parameters (a, Tx, Ty, SF), correlation ratio (CR) and percentage error for the twelve test transformations; the errors range from 2.5% to 10.4%.]
[Table data: recovered parameters (a, Tx, Ty, SF), correlation ratio (CR) and percentage error for the twelve test transformations; the errors range from 2.0% to 7.9%.]
obtained, and efficiency, in terms of execution time and use of resources. All algorithms were coded in the same programming language and use the same routine to
compute the correlation ratio between the transformed and reference images. For
this comparison, the three algorithms were also executed ten times for each pair of
radiographs. A summary of the results obtained is presented in Table 25.5.
As expected, the Downhill Simplex method proved very sensitive to the
initial parameters and did not always converge to the global optimum. While in some
executions it obtained better results than the EAs, in others it produced
meaningless values, which is reflected in the low overall accuracy shown in Table 25.5. Again, the DE algorithm consistently outperformed the GA, and for that
reason it is the algorithm currently used in production. It is also important to note
[Table 25.5]
        DS      GA      DE
        0.63    0.81    0.83
        52      50      48
        1       120     120
that the computational grid, used to run the EAs, only uses the free CPU cycles of
the computers that make it up.
Fig. 25.5 shows a pair of radiographs to be subtracted (top row). The bottom row
displays the subtraction without geometric correction on the left and with correction on
the right. The null intensity level is shifted to 128 in order to make tissue changes easily
observable. In this particular example, the match is precise
enough to allow objective measurements, despite the fact that in the second radiograph the fifth tooth (from left to right) is nearly hidden. The small spot, possibly
an artifact, that appears in both images shows up in white in the resulting image,
indicating that new tissue developed. In this image it can also be observed that a
difference appears at the root of the third tooth, corresponding to new tissue
developed after treatment. These changes are impossible to observe in the raw difference image (bottom left). Similarly, in that image the bone pattern is blurred and
impossible to recognize, while in the resulting image the trabecular bone pattern is
clear. For the entire set of test images, matching has been visually assessed by two
experts in the field. They judged that the alignment was sufficiently accurate to allow
objective measurements while maintaining acceptable computation times.
For the GA, 4580 experiments were performed in order to guarantee a complete
analysis of the parameter space. An experiment is the execution of the algorithm
with a particular set of images and parameters, i.e. population size, tournament size
and genetic operator probabilities. In this task the grid became an essential tool and
allowed us to achieve a second level of parallelism. The first analysis was conducted
to determine two basic parameters of the algorithm: the population size and the selection
scheme used to choose the parents for crossover. The experiments showed that the
optimum population size for this problem is 120. Two common selection options are
tournament selection and elitism. In tournament selection of size N, N individuals
are selected at random and the fittest is chosen. Elitism is a particular case of tournament selection in which the size of the tournament equals the size of the population,
so the best individual is always preserved. For this problem, tournament selection of
size 12 is the best option for selecting the parents of a new generation. The other parameters analyzed were the crossover and mutation probabilities; the combination
that yielded the best results was 0.85 and 0.21, respectively.
Another advantage of the DE algorithm over the GA is that it uses only two parameters: the scale factor F, which controls the rate at which the population evolves,
Fig. 25.5 The upper row shows the two images to subtract. Bottom row shows the subtracted
images: left without geometrical correction and right after automatic correction
and the uniform crossover probability Cr . This makes the analysis of the parameter
space simpler and therefore tuning of the algorithm becomes easier. The values
found for the DE algorithm are F = 0.5 and Cr = 0.5.
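The chapter does not spell out the DE variant; assuming the classic DE/rand/1/bin scheme of [30], one evolution step for a single individual could be sketched as follows (names are ours):

```java
import java.util.Random;

public class DeStep
{
    // Builds one DE/rand/1/bin trial vector for target index i.
    static double[] trial(double[][] pop, int i, double F, double Cr, Random rnd)
    {
        int n = pop.length, d = pop[i].length;
        int r1, r2, r3;                      // three distinct individuals, all != i
        do { r1 = rnd.nextInt(n); } while (r1 == i);
        do { r2 = rnd.nextInt(n); } while (r2 == i || r2 == r1);
        do { r3 = rnd.nextInt(n); } while (r3 == i || r3 == r1 || r3 == r2);

        double[] u = pop[i].clone();
        int jrand = rnd.nextInt(d);          // at least one gene comes from the mutant
        for (int j = 0; j < d; j++)
            if (j == jrand || rnd.nextDouble() < Cr)
                u[j] = pop[r1][j] + F * (pop[r2][j] - pop[r3][j]);
        // The trial vector replaces pop[i] only if its fitness is not worse.
        return u;
    }

    public static void main(String[] args)
    {
        Random rnd = new Random(3);
        double[][] pop = new double[10][8];
        for (double[] x : pop) for (int j = 0; j < 8; j++) x[j] = rnd.nextDouble();
        double[] u = trial(pop, 0, 0.5, 0.5, rnd);
        System.out.println(u.length);        // 8
    }
}
```

With F = Cr = 0.5 as reported in the text, every step blends roughly half of the target's genes with a scaled difference of two other population members.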
Fig. 25.6 Overall architecture of the subtraction radiography service, showing the protocols
used for communication between neighbouring components
to the computing grid (HPC). The cluster is based on the Rocks cluster distribution
[13] and Sun Grid Engine [37], and uses peer-to-peer technologies - a replicated,
distributed, transactional tree-structured cache - to avoid the appearance of a single point of failure (HA).
The subtraction service provides two modes of operation: an interactive mode
and a batch mode. In the first mode, the user loads the images to register, and interactively drags, rotates and scales the template image to align it manually. Once
registered, the images can be subtracted and the difference visualized. Since this is
a lightweight operation, it is carried out completely on the client side. However, the
user can choose to register the images automatically. In this case, the images are uploaded to the server (if not already there), and then registered by the aforementioned
distributed algorithm. The parameters of the projective transformation are then sent
back to the client application for visualization. With the help of the local server
and database, the interactive mode keeps working even without an active Internet
connection, provided the images reside in the local machine. This is what happens
most of the time, either because the images were produced on the local machine,
or because they were previously downloaded from the server. The synchronization
process is the responsibility of the so-called proxy element, with which the client application actually communicates: it uploads the locally digitized images to the
server, and downloads the images stored on the server to the corresponding client
computers. Figure 25.7 shows the graphical user interface of the service.
In practice, the service is mostly used in the second, or batch, mode. In this mode
the set of radiographs taken daily is digitized and uploaded to the server,
where the radiographs are registered automatically by the same evolutionary algorithm. The
job of the master process, executed in the cluster, is to generate the initial populations and send them to the computing grid for evaluation. Since the fitness of each
Fig. 25.7 Graphical user interface for the radiography subtraction service
individual can be evaluated independently from the others, this task is performed in
parallel on the grid. Once evaluated, each population is collected by the corresponding master process that applies the genetic operators (mutation, recombination, selection), produces a new population and sends it again to the grid for evaluation. The
process repeats until the stop conditions are met. The optimal transformations are
then stored in the server database and sent to the client applications for visualization.
The use of Java and Internet technologies in medical imaging is not new. These
technologies have been used in radiology teaching files, to access information in
multimedia integrated picture archiving and communication systems (PACS), and
for teleradiology purposes. However, all known approaches seem to assume the existence of a reliable and stable Internet connection, and this is not always possible.
chapter and have been omitted. However, it is worth noticing that these are essential
aspects that in research and clinical environments have to be properly addressed in
order to provide secure and reliable medical imaging services.
To demonstrate the applicability of the computing grid in a real situation, we
have presented the case study of automatic digital subtraction. In this setting, we
evaluated two evolutionary algorithms as the search strategy to solve an expensive
optimization problem: finding the global maximum of an unknown function that
measures the similarity between two given images. In this evaluation, i.e., for this
particular problem, differential evolution proved faster and more reliable
than the genetic algorithm. The global structure of the algorithm is iterative but,
since the individuals in a population can be evaluated independently of each other,
the most time-consuming stage of the algorithm is computed in parallel. This, and
the simple yet powerful JavaSpaces API, allowed us to easily implement the
devised solution.
The high computational cost of the evolutionary algorithm in use was addressed
by developing a distributed implementation that exploits the computational power of a set of personal computers arranged in a low-cost computational
grid. Since it can be deployed over an existing computational infrastructure, this approach can be affordably implemented in institutions with a low budget, such as public
and university hospitals.
The proposed service-oriented model for medical imaging is feasible and useful
in research and clinical scenarios, and is used daily in the School of Dental Medicine.
The implemented framework allows doctors to use up-to-date medical imaging techniques and high-performance computing power in routine clinical studies, by means
of a standard web browser and without specialized training. Furthermore, the framework allows new services to be obtained from the integration of existing services
with different dynamics, such as 2D/3D/4D and video processing tools.
We are currently working on evolving the architecture in use towards a cloud computing model, in which integrated services over the Internet satisfy the clinician's computing needs. Regarding the algorithms used in
registration, current and future work is related to further exploring hybrid evolutionary algorithms such as those presented in [4, 16, 35], their possible application to 3D
curvature-based registration [23], as well as their distribution and parallelization.
Acknowledgements. The authors would like to give special thanks to Professors Fabio A.
Gonzalez, German J. Hernandez, Luis F. Nino, and Mark J. Duffy for their invaluable help
and advice.
Appendix
This section shows the use of the Command pattern and JavaSpaces to build a
generic worker node. The Command pattern was first introduced by Gamma et al. [10] in
object-oriented software design, and is used in a variety of domains. In our case it
was used to create a worker application capable of servicing requests from any master
process; in other words, to build a generic worker. In this context, to implement the
Command pattern, a class must implement the following interface:
public interface Task
{
    Result execute();
}
To benefit from this pattern, a master process has to break the job at hand into
TaskEntry objects and write them to the space. The TaskEntry class, given as
an example here, implements the Entry interface (a tagged JavaSpaces interface,
without methods) and the execute() method declared in the Task interface described before.
public class TaskEntry implements Task, Entry
{
public Result execute()
{
. . .
}
}
In practice, the routines that generate tasks and collect results run concurrently in two separate execution threads. This is possible because the process is asynchronous: as soon as tasks are generated and written into
the space, worker nodes can start computing them without waiting for the whole
mapping process, thereby speeding up the overall computation.
A simplified version of the generic worker would then look like:
public class Worker
{
public void run()
{
for ( ;; )
{
Task t = takeTask();
Result r = t.execute();
writeResult( r );
}
}
}
Again, in practice, the actual Worker class spawns several worker threads to
compute multiple tasks concurrently, depending upon the number of processors
(and cores) available. Internally, the method takeTask() calls the JavaSpaces
method take() which is blocking, therefore releasing the processor where the
worker thread is running.
For the sake of clarity we have also omitted some important details such as those
regarding the use of transactions, leases, job priorities and caches. Transactions are
used to guarantee all-or-nothing semantics for a master process: if it writes a
number of tasks into the space, then it must receive the same number of results (complete success) or none (complete failure). That is, to avoid partial failure, workers
compute every task under a transaction. If the task is completed successfully, then
the worker writes the result into the space and commits the transaction, otherwise,
the transaction is cancelled and the task is returned to the space. These semantics are
provided by a two-phase commit protocol that is performed by the Jini transaction
manager (for detailed information on this protocol, please refer to the Jini Transaction Specification, http://www.jini.org/transactions). Using transactions, the pseudo code for the Worker class would now be:
public class Worker
{
    public void run()
    {
        for ( ;; )
        {
            createTransaction();
            try
            {
                Task t = takeTask();
                Result r = t.execute();
                writeResult( r );
                commitTransaction();
            }
            catch ( Exception e )
            {
                cancelTransaction();
            }
        }
    }
}
The use of transactions alone does not guarantee that partial failure will not occur.
Let us suppose a worker node takes a task from the space and starts computing it.
Then, if some component fails (e.g., the worker application, the operating system,
the computer is shut down or disconnected from the net) before the task finishes
execution, the transaction will not be committed, nor cancelled, and the result will
never be written into the space, causing the master process to wait indefinitely. The
solution to this situation lies in the use of leases.
Leasing in JavaSpaces provides a way of allocating resources for a fixed period
of time, after which the resource is freed. In the context of our worker class, when it
calls the JavaSpaces take() method to fetch a task object for execution, it supplies
a lease parameter that specifies the amount of time it will hold the task object.
If the task is executed before the lease runs out, the result is written into the space
and the operation finishes. Otherwise, the lease keeps being renewed until the task
finishes. If the lease is not renewed, an indication that something went wrong in the
worker node, the task is returned to the space by the Jini lease manager (for detailed
information on leasing, please refer to the Jini Leasing Specification,
http://www.jini.org/leasing), so another worker node can compute it.
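The renew-until-finished behaviour can be illustrated without the actual Jini lease manager by a background thread that keeps pushing an expiry time forward while the task runs; all names here are ours, a stand-in for the real Jini API:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class LeaseRenewal
{
    // Runs a task under an illustrative "lease": an expiry time that a scheduled
    // renewer extends every leaseMs/2. Returns true if the lease was still valid
    // when the task finished (i.e., renewal kept pace with the computation).
    static boolean runUnderLease(Runnable task, long leaseMs)
    {
        AtomicLong expiry = new AtomicLong(System.currentTimeMillis() + leaseMs);
        ScheduledExecutorService renewer =
            Executors.newSingleThreadScheduledExecutor();
        renewer.scheduleAtFixedRate(
            () -> expiry.set(System.currentTimeMillis() + leaseMs),
            leaseMs / 2, leaseMs / 2, TimeUnit.MILLISECONDS);
        try {
            task.run();                                       // the actual computation
            return System.currentTimeMillis() < expiry.get(); // lease still valid?
        } finally {
            renewer.shutdownNow();   // worker dies => no more renewals => lease lapses
        }
    }

    public static void main(String[] args)
    {
        // A task lasting several lease periods still finishes under a valid lease.
        boolean ok = runUnderLease(() -> {
            try { Thread.sleep(500); } catch (InterruptedException e) { }
        }, 200);
        System.out.println(ok);   // true
    }
}
```

If the worker crashes, the renewer dies with it, the expiry passes, and the task can be handed to another worker, which mirrors the Jini behaviour described above.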
References
1. Petersson, A., Ekberg, E.C., Nilner, M.: An evaluation of digital subtraction radiography
for assessment of changes in position of the mandibular condyle. Dentomaxillofacial
Radiology 27, 230–235 (1998)
2. Farag, A.A., Yamany, S.M., Nett, J., Moriarty, T., El-Baz, A., Hushek, S., Falk, R.: Medical Image Registration: Theory, Algorithm, and Case Studies in Surgical Simulation,
Chest Cancer, and Multiple Sclerosis, ch. 1, vol. 3, pp. 1–46. Kluwer Academic/Plenum
Publishers, New York (2005)
3. Apache. Apache river (2007), http://www.apache.org/river
4. Grosan, C., Abraham, A., Ishibuchi, H.: Hybrid Evolutionary Algorithms. Springer, Heidelberg (2007)
5. Gelernter, D.: Generative communication in Linda. ACM Transactions on Programming
Languages and Systems 7(1), 80–112 (1985)
6. Rueckert, D.: Non-rigid Registration: Concepts, Algorithms and Applications. Biomedical Engineering, ch. 13, pp. 281–301. CRC Press, Florida (2001)
7. Goldberg, D.A.: Genetic algorithms in search, optimization and machine learning.
Addison-Wesley Professional, Reading (1989)
8. Hawkes, D.J.: Registration Methodology: Introduction. Biomedical Engineering, ch. 2,
pp. 11–38. CRC Press, Florida (2001)
9. Freeman, E., Hupfer, S., Arnold, K.: JavaSpaces Principles, Patterns, and Practice. Prentice Hall PTR, Englewood Cliffs (1999)
10. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Elements of Reusable Object-Oriented
Software. Addison-Wesley Professional, Reading (1994)
11. Berman, F., Fox, G., Hey, A.J.G.: Grid Computing: Making The Global Infrastructure a
Reality. Wiley, Chichester (2003)
12. Manana, G., Romero, E., Gonzalez, F.: A grid computing approach to subtraction radiography. In: IEEE International Conference on Image Processing, pp. 3225–3228 (2006)
13. Rocks Group. Rocks clusters (2008), http://www.rocksclusters.org/
14. Grondahl, H., Grondahl, K.: Subtraction radiography for the diagnosis of periodontal
bone lesions. Oral Surgery 55, 208–213 (1983)
15. Lester, H., Arridge, S.R.: A survey of hierarchical non-linear medical image registration.
Pattern Recognition 32(1), 129–149 (1999)
16. Talbi, H., Batouche, M.: Hybrid particle swarm with differential evolution for multimodal image registration. In: IEEE International Conference on Industrial Technology,
pp. 1567–1572 (2004)
17. Cordon, H.F., Damas, S., Santamaría, J.: A CHC evolutionary algorithm for 3D image
registration. In: De Baets, B., Kaynak, O., Bilgic, T. (eds.) IFSA 2003. LNCS, vol. 2715,
pp. 440–441. Springer, Heidelberg (2003)
18. Gómez García, H.F., González Vega, A., Hernández Aguirre, A., Marroquín Zaleta, J.L.,
Coello Coello, C.A.: Robust multiscale affine 2D-image registration through evolutionary strategies. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas,
J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 740–748. Springer, Heidelberg (2002)
19. Schwefel, H.P.: Evolution and Optimum Seeking: The Sixth Generation. Wiley-Interscience, New York (1995)
20. De Falco, I., Della Cioppa, A., Maisto, D., Tarantino, E.: Differential evolution as a
viable tool for satellite image registration. Applied Soft Computing 8, 1453–1462 (2008)
21. Rechenberg, I.: Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien
der biologischen Evolution, pp. 305–324. Frommann-Holzboog, Stuttgart (1973)
22. Beutel, J., Sonka, M., Kundel, H.L., Fitzpatrick, J.M., Van Metter, R.L.: Medical Image
Processing and Analysis, vol. 2, pp. 447–513. SPIE Press, Bellingham (2000)
23. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press,
Oxford (2004)
24. Nelder, J.A., Mead, R.: A simplex method for function minimization. The Computer
Journal 7(4), 308–313 (1965)
25. Maintz, J.B.A., Viergever, M.A.: An overview of medical image registration methods.
In: Symposium of the Belgian Hospital Physicists Association, SBPH/BVZF (1997)
26. JBoss. Jboss application server (2008), http://www.jboss.org/jbossas
27. Holland, J.H.: Adaptation in natural and artificial systems: An Introductory Analysis with
Applications to Biology, Control, and Artificial Intelligence. MIT Press, Massachusetts
(1992)
28. Bahi, J.M., Contassot-Vivier, S., Couturier, R.: Parallel Iterative Algorithms: From Sequential to Grid Computing. Chapman & Hall/CRC, Boca Raton (2008)
29. Hajnal, J.V.: Introduction. Biomedical Engineering, ch. 1, pp. 1–8. CRC Press, Florida
(2001)
30. Price, K.V., Storn, R.M., Lampinen, J.A.: Differential Evolution: A Practical Approach
to Global Optimization. Springer, Heidelberg (2005)
31. Chambers, L.: The practical handbook of genetic algorithms: Applications, 2nd edn.
Chapman & Hall/CRC, Boca Raton (2000)
32. Davis, L.: Handbook of genetic algorithms, 2nd edn. Chapman & Hall/CRC, Boca Raton
(2000)
33. Eshelman, L.J.: Real-coded genetic algorithms and interval schemata, vol. 2, pp. 187–202. Morgan Kaufmann Publishers, Bellingham (1993)
34. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Optimization. Wiley-Interscience, Hoboken (2000)
35. Lozano, M., García-Martínez, C.: Hybrid metaheuristics with evolutionary algorithms
specializing in intensification and diversification: Overview and progress report. Computers & Operations Research (in press, 2009)
36. Powell, M.: An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal 7(2), 155–162 (1964)
37. Sun Microsystems. Grid engine (2008), http://gridengine.sunsource.net/
38. Viola, P., Wells III, W.: Alignment by maximization of mutual information. International
Journal of Computer Vision 24, 137–154 (1997)
39. Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.: Optimization by simulated annealing.
Science 220(4598), 671–680 (1983)
40. Sterling, T., Becker, D.: Beowulf (2008), http://www.beowulf.org/
41. Terracotta. Terracotta (2008), http://www.terracotta.org/
42. Pennec, X., Roche, A., Malandain, G., Ayache, N.: Multimodal image registration by
maximization of the correlation ratio (1998),
http://hal.archives-ouvertes.fr/
43. Yuan, X., Zhang, J., Buckles, B.P.: Evolution strategies based image registration via
feature matching. Information Fusion 5, 269–282 (2004)
Chapter 26
26.1 Introduction
High-Level Synthesis (HLS) [8] is concerned with the design and implementation of
digital circuits starting from a behavioral description, a set of goals and constraints,
and a library of different types of resources. HLS typically consists of three steps:
scheduling, resource allocation and controller synthesis. Scheduling
assigns each operation to one or more clock cycles (or control steps) for execution. Resource allocation assigns the operations and the produced values to
the hardware components and interconnects them using connection elements. Finally, controller synthesis provides the logic to issue data-path operations, based
on the control flow. Unfortunately, it is non-trivial to solve these problems as they
Christian Pilato Daniele Loiacono Antonino Tumeo Fabrizio Ferrandi Pier Luca Lanzi
Donatella Sciuto
Politecnico di Milano, Dipartimento di Elettronica ed Informazione
{pilato,loiacono,tumeo,ferrandi,lanzi,sciuto}@elet.polimi.it
Y. Tenne and C.-K. Goh (Eds.): Computational Intel. in Expensive Opti. Prob., ALO 2, pp. 701723.
© Springer-Verlag Berlin Heidelberg 2010
springerlink.com
quality of the solutions with respect to the design objectives and overall execution
time of the exploration, are presented and discussed for each technique.
sizing and convergence time and to enhance speedups (see [31] for further details).
The surrogate can be either endogenous [34] or exogenous [2, 19, 23]. Fitness inheritance [34] is one of the most promising endogenous approaches to evaluation
relaxation: the fitness of some proportion of individuals in the population is inherited from the parents. Sastry et al. [33] use a model based on least-squares fitting,
applied in particular to the extended compact genetic algorithm (eCGA [16]). Chen et
al. [6] present their studies on fitness inheritance in multi-objective optimization as a
weighted average of parent fitness, decomposed into the n different objectives. Recent
studies investigated the impact of fitness inheritance on real-world applications [11]
and different exploration algorithms [30]. Exogenous surrogates are typically used
in engineering applications [2, 10] and consist of developing a simplified model
of the real problem to provide an inexpensive surrogate of the fitness function. In
particular, for HLS several simplified models for area and timing have been proposed
in the literature. In [26], simple metrics are proposed to drive the optimization algorithms, even if some elements are not correctly considered (e.g., steering logic or the
effects of optimizations performed by the logic synthesis tools). In [3] the area is
estimated with a linear regression approach that is also able to model the effects of
the logic optimizations. Unfortunately, most of the proposed models provide poor
guidance to the optimization process, as they do not take into account the resource
binding and the interconnections [5]. In this work we focus on data-flow applications that involve only area models; however, we refer the interested reader to [4, 22]
for timing estimation models.
relatively low rate (e.g., Pm = 10%) and is applied as follows: each gene is modified with probability Pm, changing the corresponding binding information. Crossover is a reproduction technique that mates two parent chromosomes and produces two offspring chromosomes. Given two chromosomes, a standard single-point crossover is applied with a high probability (e.g., Pc = 90%). The crossover mechanism mixes the binding information of the two parent solutions.
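The two operators just described can be sketched as follows. This is a minimal, hypothetical sketch (chromosomes as lists of binding genes, each with its own set of admissible alternatives), not the authors' implementation:

```python
import random

PM = 0.10  # per-gene mutation probability (the "relatively low rate" above)
PC = 0.90  # crossover probability

def mutate(chromosome, alternatives, pm=PM):
    """Modify each gene with probability pm by picking another binding."""
    return [random.choice(alternatives[i]) if random.random() < pm else gene
            for i, gene in enumerate(chromosome)]

def crossover(parent_a, parent_b, pc=PC):
    """Standard single-point crossover producing two offspring."""
    if random.random() >= pc or len(parent_a) < 2:
        return parent_a[:], parent_b[:]  # no crossover: copy the parents
    cut = random.randint(1, len(parent_a) - 1)
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])
```

Exchanging whole gene segments keeps every offspring a valid binding, since each gene still holds one of the admissible alternatives for its operation.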
Table 26.1 Examples of computational effort for the complete synthesis of common benchmarks for high-level synthesis
Benchmark   HLS time (s)   Logic synthesis time (s)   Total time (s)
arf              0.35             100.72                  101.07
bandpass         0.99              28.50                   29.49
chemical         0.37             122.03                  122.40
dct              1.02             133.70                  134.72
dct wang         1.04             124.45                  125.49
dist             0.96             248.54                  249.50
ewf              0.39             121.35                  121.74
fir              0.07              43.61                   43.80
paulin           0.06              32.19                   32.25
pr1              0.84             121.70                  122.54
pr2              1.19             176.08                  177.27
tseng            0.05              14.00                   14.05
Avg.             0.61             105.57                  106.18
#FF = Σ_{R ∈ A.DataPath.Registers} R.Area

#LUT = Σ_{F ∈ A.DataPath.FunctionalUnits} F.Area

Fig. 26.2 Simplified model to estimate area occupation for the structural design A
#FF_DataPath = Σ_{R ∈ A.DataPath.Registers} (2 · sizeof(R) + 2)

#LUT_FU = Σ_{F ∈ A.DataPath.FunctionalUnits} (5 · F.Area + 5)

#LUT_MUX = Σ_{M ∈ A.DataPath.Mux} (…)

Fig. 26.3 Linear regression model to estimate area occupation for the structural design A
of the functional units, so its value is still the sum of the area values of the functional units. The other three parts (FSM, MUX, Glue) are obtained with a regression-based approach:

- the FSM contribution is due to the combinational logic used to compute the output and the next state;
- the MUX contribution is due to the number and size of the multiplexers used in the data-path;
- the Glue contribution is due to the logic that enables writing into the flip-flops and to the logic used for the interaction between the controller and the data-path.

The model is then specialized for the particular vendor's tools and devices by using a linear regression approach similar to [3]; properly adapted, it yields an accurate estimation of the design objectives. The main drawback is that, each time the designer changes the experimental setup, an initial tuning phase is required, which can be time-consuming and error-prone.
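The tuning phase can be sketched as an ordinary least-squares fit. This is a hedged sketch under assumed names: the feature extraction and the function signatures are hypothetical, and the real calibration depends on the reports of the vendor synthesis tool:

```python
import numpy as np

def fit_area_model(features, measured):
    """Fit linear coefficients (plus intercept) mapping design features
    (e.g., register bits, functional-unit area, mux count) to the area
    reported by the synthesis tool, via ordinary least squares."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, measured, rcond=None)
    return coef  # last entry is the intercept

def predict_area(coef, feature_vec):
    """Evaluate the fitted model on one design's feature vector."""
    return float(np.dot(coef[:-1], feature_vec) + coef[-1])
```

Once the coefficients are fitted on a training set of synthesized designs, predicting the area of a new candidate costs a single dot product instead of a full logic synthesis run.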
#FF_FSM = log2(A.FSM.NumControlStates)

#FF_DataPath = Σ_{R ∈ A.DataPath.Registers} sizeof(R)

#LUT_FU = Σ_{F ∈ A.DataPath.FunctionalUnits} F.Area

#LUT_MUX = Σ_{M ∈ A.DataPath.Mux} (…)

Fig. 26.4 Model used to estimate area occupation for the FPGA design A using Xilinx ISE ver. 10.1 and targeting a Virtex XC2VP30 FPGA device
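A model in the style of Fig. 26.4 can be evaluated mechanically on a structural description. In the sketch below the dict layout is a hypothetical stand-in for the design A, and the per-mux LUT cost (inputs × bits) is an assumption introduced here for illustration, not part of the published model:

```python
import math

def estimate_area(design):
    """Evaluate an area model in the style of Fig. 26.4 on a design dict."""
    # state FFs for a binary state encoding (ceiling assumed here)
    ff_fsm = math.ceil(math.log2(design["fsm"]["num_control_states"]))
    ff_dp = sum(reg["size"] for reg in design["datapath"]["registers"])
    lut_fu = sum(fu["area"] for fu in design["datapath"]["functional_units"])
    # assumed cost: an n-input, b-bit multiplexer takes roughly n * b LUTs
    lut_mux = sum(m["inputs"] * m["bits"] for m in design["datapath"]["muxes"])
    return {"FF": ff_fsm + ff_dp, "LUT": lut_fu + lut_mux}
```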
[Fig. 26.5: scatter plot of estimated area versus actual area for the Simplified Model and the Linear Regression Model, both axes ranging from 0 to 16,000]
26.5.2.1 Accuracy of Models
Figure 26.5 presents the validation of the models described in the previous section. The dashed line represents the ideal situation, where the estimated values are equal to the real ones, obtained with an actual synthesis on the target device. Square dots represent the values associated with the first model, where only functional units and registers are considered. Round dots, instead, represent the values obtained with the linear regression model. We validated the models on a data set composed of 73 designs that represent different architectures of the benchmarks described in [7] and shown in Table 26.1.
The plot shows that the simplified model systematically underestimates the real
values. This happens because the contribution due to multiplexers and steering logic
Table 26.2

                   Simplified                        Linear Regression
                            #Pareto Points                    #Pareto Points
Benchmark   Area      NSGA-II DSE  Synthesis   Area      NSGA-II DSE  Synthesis
arf         63,010        7            6       60,341        9            9
dct         111,392       8            6       107,808      14           14
dct wang    115,167      11            9       110,198      16           16
dist        158,716      14           13       157,049      20           20
ewf         73,969       10            9       72,634       13           13
pr1         72,990        9            8       70,978       12           12
pr2         162,130      17           14       154,503      19           19
is not considered. On the other hand, the model based on linear regression approximates the real values with good accuracy. In particular, the simplified model shows an average error of 43.39±20.00%, with a maximum error of 73.35%. The model based on linear regression, instead, has an average error of 2.22±2.20%, with a maximum error of 11.85%. Thus, we can confirm that the linear regression model is able to accurately estimate all the area contributions of a structural description and that it can be effectively integrated in the proposed methodology to drive the exploration algorithm.
26.5.2.2
[Plots comparing the Pareto curves obtained with the Simplified Model and the Linear Regression Model; panels include (b) pr2, (c) ewf, and (d) dist]
Fig. 26.6 Examples of the Pareto curves obtained using the different models

The linear regression model systematically outperforms the simplified one in terms of quality of the Pareto-optimal set. In Fig. 26.6(c), for large designs, the model that considers only functional units and registers obtains better results. In fact, in this region of the design space, the impact of the multiplexers is limited (about 15-20%) and a fitness function focused only on functional components and registers is more suitable to drive the exploration algorithm. In the other region of the space, where the designs contain few functional units, the multiplexers have a larger impact (about 70-75%) and the fitness function that takes their occupation into account obtains better results. Finally, in Fig. 26.6(d), the multiplexers are not so relevant for the design. As a result, the two models are almost equivalent, as also shown by the similar values in Table 26.2.
overall number of evaluations rather than reducing the time required for a single evaluation (i.e., the synthesis steps, HLS or logic synthesis). Interpolation is usually much less time-consuming, so we can save some of the time required for a complete synthesis.

Note that this technique is less problem-dependent than solution modeling. In fact, to build the model, the designer has to identify the relevant features of the design solutions, synthesize the related hardware descriptions and establish a correspondence. On the contrary, fitness inheritance is based only on the definition of the chromosome encoding and on the fitness of previously evaluated individuals.

However, to produce an effective surrogate, some aspects need to be taken into account carefully. In particular, we focused our attention on the percentage of individuals to be estimated, on which parents to choose and on how to combine their fitness. We present and discuss these aspects in Section 26.6.1, and then compare the quality of some different solutions in Section 26.6.2.

Provided a proper analysis of these aspects, the results show that fitness inheritance is able to consistently reduce the execution time of the whole methodology. We also demonstrate that, if the parameters are not correct, the method can even degrade, rather than improve, the performance of the exploration algorithm.
d_{i,j} = ( Σ_{k=1}^{N} δ(x_{i,k}, x_{j,k}) ) / N        (26.3)

where δ(x_{i,k}, x_{j,k}) equals 1 when the k-th genes of Ind_i and Ind_j differ, and 0 otherwise.
This function is normalized by the size N of the chromosome, so its value is always between 0 and 1. The distance d_{i,j} measures the similarity of two individuals: if they are totally different (no gene matches), the value is 1; if the two individuals are identical, the value is 0. Only individuals that are considered neighbors in this space are kept for the fitness estimation. We call r the maximum distance an individual may have in order to be kept. The name r recalls the term radius, since the region delimited by this value can be imagined as an N-dimensional hyper-sphere centered at the individual Ind_i. All the individuals Ind_j ∈ S having distance smaller than the radius r can be considered as points inside this hyper-sphere. Therefore, all these individuals are considered for estimation and the distance value is modified as follows:
d_{i,j} = { d_{i,j}   if d_{i,j} ≤ r
          { 1         if d_{i,j} > r        (26.4)
where all the individuals outside the hyper-sphere are treated as points at infinite distance and are not considered for estimation. To perform the estimation, we require a minimum number of points in this region. If there are not enough points, there is not sufficient local information for an estimate, so the individual is actually evaluated. If there are enough points, instead, the estimation is performed on the selected set S of points, as follows:
Fit_i^z = ( Σ_{j∈S} Fit_j^z · f(1 − d_{i,j}) ) / ( Σ_{j∈S} g(1 − d_{i,j}) )        (26.5)
for each objective z. Fit_j^z is the value of the objective z for the individual Ind_j, and (1 − d_{i,j}) is used as a measure of closeness between individuals. f and g are functions that change the contribution of the two terms. We formulated the term (1 − d_{i,j}) in this way since the distance d_{i,j} does not go to infinity, but takes a value between 0 and 1. Therefore, we treat a distance equal to 1 as an infinite distance (i.e., no contribution to the fitness). As explained above, this weighted average is computed for all the objectives considered in the optimization. The resulting value is then returned to the genetic algorithm, which can then proceed. A flag is also associated with the individual Ind_i to record that the fitness has been estimated rather than actually evaluated. This allows the algorithm to identify the estimated individuals when needed.

In particular, in the last generation the fitness of all the individuals is checked. Individuals that have already been evaluated are skipped, while the estimated individuals are actually evaluated. Thus, when the exploration ends, all the individuals on which the final non-dominated set is computed have a real fitness value associated with them.
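The whole estimation step can be sketched as follows. This is a simplified sketch, not the authors' code: individuals are hypothetical (chromosome, fitness-vector) pairs, and the quadratic weighting of Eq. 26.7 is used as the combining function:

```python
def distance(chrom_a, chrom_b):
    """Eq. 26.3: fraction of non-matching genes, always in [0, 1]."""
    return sum(1 for ga, gb in zip(chrom_a, chrom_b) if ga != gb) / len(chrom_a)

def inherit_fitness(candidate, evaluated, r=0.20, min_points=10):
    """Estimate the multi-objective fitness of `candidate` from the evaluated
    individuals inside the radius r (Eq. 26.4); return None when fewer than
    `min_points` neighbours exist, meaning a real evaluation is needed."""
    neighbours = []
    for chrom, fitness in evaluated:
        d = distance(candidate, chrom)
        if d <= r:                      # points outside are "infinitely far"
            neighbours.append((d, fitness))
    if len(neighbours) < min_points:
        return None
    estimate = []
    for z in range(len(neighbours[0][1])):          # one value per objective
        num = sum(fit[z] * (1 - d) ** 2 for d, fit in neighbours)
        den = sum((1 - d) ** 2 for d, _ in neighbours)
        estimate.append(num / den)                  # weighted average (Eq. 26.7)
    return estimate
```

A caller would attach the returned vector to the individual together with an "estimated" flag, so that the last generation can re-evaluate those individuals for real.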
Table 26.3 Comparison of the weighting functions: fitness evaluations and execution time

                        w/o inheritance        Ancestors                          Parents
Benchmark  Model        #Eval.  Exec.Time(s)   #Eval.  Exec.Time(s)   diff(%)    #Eval.  Exec.Time(s)   diff(%)
arf        linear        9,639   1,064.65       4,721   1,211.68     +13.81%      7,567     888.66     -16.53%
           quadratic                            5,048   1,327.19     +24.65%      8,160     801.86     -24.68%
           exponential                          5,330   1,374.83     +29.13%      9,027     985.59      -7.42%
dct        linear       11,150   3,677.55       7,263   5,074.26     +37.98%      7,758   2,699.86     -26.58%
           quadratic                            7,758   5,492.04     +49.34%      7,131   2,360.44     -35.81%
           exponential                          7,319   5,135.87     +39.65%      7,867   2,625.69     -28.60%
dct wang   linear       10,837   3,470.16       6,945   4,906.20     +41.38%      7,348   2,385.12     -37.27%
           quadratic                            7,479   5,456.37     +57.24%      7,758   2,549.29     -26.53%
           exponential                          6,160   4,083.51     +17.68%      7,312   2,689.09     -22.51%
dist       linear       12,683   3,907.81       8,315   6,181.55     +58.18%      7,812   3,402.46     -12.93%
           quadratic                            7,607   5,254.36     +34.46%      7,358   3,048.29     -21.99%
           exponential                          8,376   5,995.74     +53.43%      7,801   3,590.28      -8.13%
ewf        linear        9,575   1,165.55       6,218   2,074.99     +78.03%      6,518     814.60     -30.11%
           quadratic                            6,392   2,127.26     +82.51%      6,418     790.00     -32.22%
           exponential                          6,256   2,095.66     +79.80%      6,578     889.94     -23.65%
pr1        linear        9,773   2,542.35       8,358   5,514.68    +116.91%      7,548   2,058.11     -19.05%
           quadratic                            6,834   3,879.37     +52.59%      6,912   1,878.11     -26.13%
           exponential                          6,681   3,594.25     +41.48%      7,154   1,958.15     -22.98%
pr2        linear       10,610   4,044.71       6,423   4,718.46     +16.66%      6,958   3,589.25     -11.26%
           quadratic                            6,930   5,086.70     +25.76%      7,198   3,578.02     -11.54%
           exponential                          6,937   5,119.57     +26.57%      7,277   3,547.66     -12.59%
Avg. diff                                                            +46.53%                           -21.82%
Std. Dev.                                                             25.90%                             8.81%
Section 26.5.2.2. In all the experiments, the fitness evaluation uses the linear regression model. In Section 26.6.2.1 we present, discuss, and compare different functions to weight the fitness contributions of the evaluated individuals. In Section 26.6.2.2 we apply fitness inheritance both to the ancestors and to the parents and compare the results. Finally, we analyze the effects of different inheritance percentages (p_i) and distance rates (r).
26.6.2.1 Weighting Functions
The first model is computed as follows:

Fit_i^z = ( Σ_{j∈S} Fit_j^z · (1 − d_{i,j}) ) / ( Σ_{j∈S} (1 − d_{i,j}) )        (26.6)

where the fitness values of the evaluated individuals are linearly combined with the related closeness values 1 − d_{i,j} from the candidate individual Ind_i. The second model is computed as follows:
Fit_i^z = ( Σ_{j∈S} Fit_j^z · (1 − d_{i,j})^2 ) / ( Σ_{j∈S} (1 − d_{i,j})^2 )        (26.7)
where the quadratic function in (1 − d_{i,j}) is used to increase the weight of the distance, similarly to the physics equations for gravity or magnetism. However, we adopt a
Table 26.4 Comparison of the weighting functions: quality of the results

                        w/o inheritance    Ancestors                  Parents
Benchmark  Model        Area      #Pareto  Area      DSE   Synth.     Area      DSE   Synth.
arf        linear        63,157      -      63,756    9     9          63,633   11    11
           quadratic                        63,633   11    11          62,729   12    12
           exponential                      65,097    9     9          64,941   12    12
dct        linear       113,526     14     113,151   14    14         113,732   12    12
           quadratic                       111,598   11    10         113,732   12    12
           exponential                     114,550   13    11         113,469   10    10
dct wang   linear       112,868     16     113,351   12    12         112,782   15    15
           quadratic                       114,778   17    17         112,484   14    14
           exponential                     114,389   13    13         113,911   14    14
dist       linear       169,487     20     168,955   18    17         170,731   19    19
           quadratic                       171,706   18    17         170,578   18    18
           exponential                     169,708   21    20         167,900   19    19
ewf        linear        72,634     13      76,946   14    12          73,245   13    13
           quadratic                        75,503   13    12          74,366   11    11
           exponential                      77,000   13    13          74,184   13    13
pr1        linear        75,405     12      76,286   11    10          75,168   11    11
           quadratic                        76,878   11    10          75,083   11    11
           exponential                      76,580   12    11          75,000   12    12
pr2        linear       156,906     19     158,110   19    18         154,903   18    18
           quadratic                       161,812   21    19         154,186   19    19
           exponential                     162,195   22    20         160,800   20    20
proportion with (1 − d)^2 and not (1/d)^2, which allows dealing with infinite distances as described above. The last model is computed as:
Fit_i^z = ( Σ_{j∈S} Fit_j^z · (e^{1−d_{i,j}} − 1) ) / ( Σ_{j∈S} (e^{1−d_{i,j}} − 1) )        (26.8)
where the distance is exponentially weighted, emphasizing even more the contribution of the nearest individuals to the fitness estimation of Ind_i. These functions have been applied both to the ancestors and to the parents. The distance rate has been set to r = 0.20 and the inheritance rate to p_i = 0.5. In the former case, the set S of individuals considered grows generation by generation, while in the latter case its size is constant and related to the size of the population. When the ancestors are used, the inheritance model analyzes all the elements of the set for the distance calculation, and the time required for fitness inheritance can exceed the time required by the function evaluation itself. Thus, in this case, fitness inheritance reduces the number of evaluations, but may also degrade the overall execution time of the methodology. On the contrary, if the methodology is applied only to the parents, both the number of evaluations and the execution time of the methodology are significantly reduced. Since fewer individuals are available for computing the inheritance information (see Eq. 26.4), the number of evaluations is larger than with the ancestors. Table 26.3 reports the number of evaluations and the overall execution time.
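The three weighting schemes (Eqs. 26.6-26.8) differ only in the function applied to the closeness 1 − d_{i,j}. The side-by-side sketch below is hypothetical illustration code, not the authors' implementation:

```python
import math

# Weight as a function of closeness c = 1 - d; at d = 1 every scheme gives 0,
# which mirrors treating points outside the radius as infinitely far away.
WEIGHTS = {
    "linear":      lambda c: c,                # Eq. 26.6
    "quadratic":   lambda c: c ** 2,           # Eq. 26.7
    "exponential": lambda c: math.exp(c) - 1,  # Eq. 26.8
}

def weighted_estimate(neighbours, scheme="quadratic"):
    """neighbours: (distance, fitness) pairs for a single objective."""
    w = WEIGHTS[scheme]
    den = sum(w(1 - d) for d, _ in neighbours)
    return sum(fit * w(1 - d) for d, fit in neighbours) / den
```

Moving from the linear to the quadratic to the exponential weight progressively shifts the estimate toward the closest neighbours, which is exactly the trade-off the experiments below evaluate.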
Table 26.5 Comparison of different inheritance rates: quality of the exploration, fitness evaluations and execution time

                 w/o inheritance                        w inheritance                              diff
Benchmark  pi    Area     #Pareto  #Eval.  Exec.(s)     Area      DSE  Synth.  #Eval.  Exec.(s)    (%)
arf        0.20  63,157      -      9,639  1,064.65      64,088   12   12      7,681     834.00   -21.66%
           0.30                                          62,724   11   11      8,852     961.30    -9.71%
           0.40                                          62,990   11   11      8,812     981.43    -7.82%
           0.50                                          63,239   11   11      8,259     913.65   -14.18%
           0.55                                          62,820   10   10      8,160     888.66   -16.53%
           0.60                                          64,275   12   12      8,702     965.97    -9.27%
           0.70                                          63,654   11   11      8,655     961.38    -9.70%
dct        0.20  113,526    14     11,150  3,677.55     113,487   14   14      8,158   3,248.20   -11.67%
           0.30                                         113,909   16   16      7,789   2,874.11   -21.85%
           0.40                                         113,104   15   15      7,441   2,581.47   -29.80%
           0.50                                         113,732   12   12      7,131   2,360.44   -35.81%
           0.55                                         112,223   11   11      7,062   2,236.98   -39.17%
           0.60                                         114,051   16   16      7,325   2,514.02   -31.64%
           0.70                                         111,080   12   12      7,587   2,636.42   -28.31%
dct wang   0.20  112,868    16     10,837  3,470.16     113,952   13   13      8,258   3,025.20   -12.82%
           0.30                                         111,487   14   14      8,126   2,854.01   -17.76%
           0.40                                         113,536   15   15      7,887   2,741.36   -21.00%
           0.50                                         112,484   14   14      7,758   2,549.29   -26.54%
           0.55                                         114,283   15   15      7,747   2,569.10   -25.97%
           0.60                                         112,842   17   17      7,854   2,698.47   -22.24%
           0.70                                         113,706   16   16      7,981   2,747.11   -20.84%
dist       0.20  169,487    20     12,683  3,907.81     166,060   17   17      7,414   3,658.10    -6.39%
           0.30                                         161,432   17   17      7,401   3,698.43    -5.36%
           0.40                                         167,804   18   18      7,333   3,154.01   -19.29%
           0.50                                         170,578   18   18      7,358   3,048.29   -21.99%
           0.55                                         167,801   17   17      7,441   3,341.22   -14.50%
           0.60                                         167,472   19   19      7,551   3,418.99   -12.51%
           0.70                                         170,151   16   16      7,547   3,507.67   -10.24%
ewf        0.20  72,634     13      9,575  1,165.55      74,623   13   13      9,147     847.10   -27.32%
           0.30                                          74,143   12   12      7,765     858.36   -26.36%
           0.40                                          73,609   13   13      7,010     802.19   -31.17%
           0.50                                          74,366   11   11      6,418     790.00   -32.22%
           0.55                                          74,053   11   11      6,211     767.41   -34.16%
           0.60                                          73,234   12   12      6,478     789.23   -32.29%
           0.70                                          73,023   13   13      6,441     796.59   -31.66%
pr1        0.20  75,405     12      9,773  2,542.35      75,308   13   13      8,012   2,236.39   -12.03%
           0.30                                          74,309    9    9      7,477   2,056.47   -19.11%
           0.40                                          75,319   10   10      7,087   1,969.78   -22.52%
           0.50                                          75,083   11   11      6,912   1,878.11   -26.13%
           0.55                                          74,888    9    9      6,898   1,789.56   -29.61%
           0.60                                          75,045   10   10      7,101   1,941.02   -23.65%
           0.70                                          74,831   10   10      7,011   1,867.53   -26.54%
pr2        0.20  156,906    19     10,610  4,044.71     160,628   20   20      8,101   3,856.12    -4.66%
           0.30                                         150,630   22   22      7,812   3,785.54    -6.41%
           0.40                                         158,903   21   21      7,485   3,696.36    -8.61%
           0.50                                         154,186   19   19      7,198   3,578.02   -11.54%
           0.55                                         154,241   15   15      7,012   3,547.10   -12.30%
           0.60                                         155,714   21   21      7,025   3,423.11   -15.37%
           0.70                                         159,150   21   21      6,894   3,326.98   -17.74%
Avg. diff                                                                                         -20.43%
Std. Dev.                                                                                           9.13%
Finally, Table 26.4 compares the quality of the results with the different weighting functions. As in Section 26.5.2.2, the area delimited by the approximated
Pareto-optimal curve gives a qualitative evaluation of the explorations. The results
show that the quadratic function is the most efficient solution to weight the fitness
Table 26.6 Comparison of different distance rates: quality of the results

                 w/o inheritance     w inheritance
Benchmark  r     Area     #Pareto    Area      DSE  Synth.
arf        0.10  63,157      -        63,549   10   10
           0.20                       63,626   11   11
           0.25                       63,307   11   11
           0.50                       62,906   11   11
dct        0.10  113,526    14       112,626   14   14
           0.20                      113,116   13   13
           0.25                      112,630   17   17
           0.50                      113,234   14   14
dct wang   0.10  112,868    16       113,347   13   13
           0.20                      115,027   15   15
           0.25                      111,391   17   17
           0.50                      112,979   13   13
dist       0.10  169,487    20       171,159   19   19
           0.20                      168,305   16   16
           0.25                      168,195   17   17
           0.50                      169,162   19   19
ewf        0.10  72,634     13        73,156   14   14
           0.20                       73,302   12   12
           0.25                       71,693   11   11
           0.50                       75,009   13   13
pr1        0.10  75,405     12        76,607   13   13
           0.20                       75,372   11   11
           0.25                       76,214   12   12
           0.50                       77,654   12   12
pr2        0.10  156,906    19       157,005   19   19
           0.20                      154,814   18   18
           0.25                      158,718   21   21
           0.50                      153,866   18   18
contributions. In fact, this function emphasizes the individuals closer to the candidate more than the linear function does. Compared with the exponential function, which (strongly) emphasizes only very similar individuals, it also considers more distant contributions (still inside the radius).
26.6.2.2 Parameter Analysis
In this section, different inheritance rates (p_i) and different distance rates (r) are studied. The parameters of the GA are the same as those used in Section 26.5.2.2. In all the experiments, the fitness evaluation uses the linear regression model and exploits inheritance on the parents with the quadratic weighting function.

Table 26.5 shows the results of explorations where fitness inheritance is applied with different inheritance rates. Note that values of p_i between 0.40 and 0.55 provide a good trade-off between the quality of the exploration and the related execution time. The reason is that, with lower values, few individuals are chosen for inheritance. On the other hand, with higher values, the number of actually evaluated individuals is limited; when there are not enough similar individuals (at least 10), we fall back from fitness inheritance to the HLS flow and the area model for the evaluation. Therefore, the execution time is not reduced as expected. The results obtained in our experiments
are also consistent with the optimal proportion for inheritance derived in [32], defined as follows:

0.54 ≤ p_i ≤ 0.558        (26.9)
Finally, Table 26.6 reports the results obtained with p_i = 0.5 while changing the distance rate r. Almost all the considered rates give good results; however, values between 0.20 and 0.25 perform best. In fact, with lower values, limited information is available for inheritance, while, with higher values, additional noise is introduced in the interpolation.
26.7 Conclusions

In this work, we presented an evolutionary approach to the HLS design space exploration problem based on NSGA-II, a multi-objective evolutionary algorithm. We exploited two orthogonal techniques, surrogate fitness and fitness inheritance, to reduce the time required by the expensive solution evaluations. The fitness surrogate was computed with a linear regression model that takes into account the contributions of all the components of the design (e.g., interconnections or glue logic) and the effects of the optimizations introduced by the logic synthesis tool: replacing the logic synthesis process with such a surrogate model saves a large amount of computational time. Fitness inheritance was used to reduce the number of evaluations, by evaluating only a fixed portion of the population. We validated our approach on several benchmarks, and our results suggest that both proposed techniques speed up the evolutionary search without degrading its performance. To the best of our knowledge, this is the first framework for HLS design space exploration that exploits at the same time a surrogate fitness model and a fitness inheritance scheme.
References

1. Araujo, S.G., Mesquita, A.C., Pedroza, A.: Optimized Datapath Design by Evolutionary Computation. In: IWSOC: International Workshop on System-on-Chip for Real-Time Applications, pp. 6-9 (2003)
2. Barthelemy, J.F.M., Haftka, R.T.: Approximation concepts for optimum structural design - a review. Structural Optimization 5, 129-144 (1993)
3. Brandolese, C., Fornaciari, W., Salice, F.: An area estimation methodology for FPGA based designs at SystemC-level. In: DAC: Design Automation Conference, pp. 129-132. ACM, New York (2004)
4. Chaiyakul, V., Wu, A.C.H., Gajski, D.D.: Timing models for high-level synthesis. In: EURO-DAC 1992: European Design Automation Conference, pp. 60-65. IEEE Computer Society Press, Los Alamitos (1992)
5. Chen, D., Cong, J.: Register binding and port assignment for multiplexer optimization. In: ASP-DAC: Asia South Pacific Design Automation Conference, pp. 68-73 (2004)
6. Chen, J.H., Goldberg, D.E., Ho, S.Y., Sastry, K.: Fitness inheritance in multi-objective optimization. In: GECCO: Genetic and Evolutionary Computation Conference, pp. 319-326 (2002)
7. Cordone, R., Ferrandi, F., Santambrogio, M.D., Palermo, G., Sciuto, D.: Using speculative computation and parallelizing techniques to improve scheduling of control based designs. In: ASPDAC: Asia South Pacific Design Automation Conference, pp. 898-904. ACM, Yokohama (2006)
8. De Micheli, G.: Synthesis and Optimization of Digital Circuits. McGraw-Hill, New York (1994)
9. Deb, K., Agrawal, S., Pratab, A., Meyarivan, T.: A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849-858. Springer, Heidelberg (2000)
10. Dennis, J., Torczon, V.: Managing approximate models in optimization. In: Alexandrov, N., Hussani, M. (eds.) Multidisciplinary design optimization: State-of-the-art, pp. 330-347. SIAM, Philadelphia (1997)
11. Ducheyne, E., Baets, B.D., Wulf, R.D.: Is fitness inheritance useful for real-world applications? (2003)
12. Ferrandi, F., Lanzi, P.L., Palermo, G., Pilato, C., Sciuto, D., Tumeo, A.: An evolutionary approach to area-time optimization of FPGA designs. In: ICSAMOS: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 145-152 (2007)
13. Grefenstette, J.J., Fitzpatrick, J.M.: Genetic search with approximate function evaluation. In: International Conference on Genetic Algorithms, pp. 112-120. Lawrence Erlbaum Associates, Inc., Mahwah (1985)
14. Grewal, G., O'Cleirigh, M., Wineberg, M.: An evolutionary approach to behavioural-level synthesis. In: CEC: IEEE Congress on Evolutionary Computation, pp. 264-272. ACM Press, New York (2003)
15. Gu, Z., Wang, J., Dick, R.P., Zhou, H.: Unified incremental physical-level and high-level synthesis. IEEE Trans. on CAD of Integrated Circuits and Systems 26(9), 1576-1588 (2007)
16. Harik, G.: Linkage Learning via Probabilistic Modeling in the ECGA (1999)
17. Huband, S., Hingston, P.: An evolution strategy with probabilistic mutation for multi-objective optimisation. In: IEEE Congress on Evolutionary Computation, CEC 2003, pp. 2284-2291. IEEE Press, Piscataway (2003)
18. Hwang, C.T., Lee, J.H., Hsu, Y.C.: A formal approach to the scheduling problem in high level synthesis. IEEE Trans. on CAD of Integrated Circuits and Systems 10(4), 464-475 (1991)
19. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput. 9(1), 3-12 (2005)
20. Kollig, P., Al-Hashimi, B.: Simultaneous scheduling, allocation and binding in high level synthesis. Electronics Letters 33(18), 1516-1518 (1997)
21. Krishnan, V., Katkoori, S.: A genetic algorithm for the design space exploration of datapaths during high-level synthesis. IEEE Trans. Evolutionary Computation 10(3), 213-229 (2006)
22. Kuehlmann, A., Bergamaschi, R.A.: Timing analysis in high-level synthesis. In: ICCAD: International Conference on Computer-Aided Design, pp. 349-354. IEEE Computer Society Press, Los Alamitos (1992)
23. Llora, X., Sastry, K., Goldberg, D.E., Gupta, A., Lakshmi, L.: Combating user fatigue in iGAs: partial ordering, support vector machines, and synthetic fitness. In: GECCO: Conference on Genetic and Evolutionary Computation, pp. 1363-1370. ACM Press, New York (2005)
24. Mandal, C., Chakrabarti, P.P., Ghose, S.: Design space exploration for data path synthesis. In: International Conf. on VLSI Design, pp. 166-170 (1996)
25. Mandal, C., Chakrabarti, P.P., Ghose, S.: GABIND: a GA approach to allocation and binding for the high-level synthesis of data paths. IEEE Transactions on Very Large Scale Integration Systems 8(6), 747-750 (2000)
26. Meribout, M., Motomura, M.: Efficient metrics and high-level synthesis for dynamically reconfigurable logic. IEEE Trans. Very Large Scale Integr. Syst. 12(6), 603-621 (2004)
27. Palesi, M., Givargis, T.: Multi-objective design space exploration using genetic algorithms. In: CODES: International Symposium on Hardware/Software Codesign, pp. 67-72. ACM, New York (2002)
28. Paulin, P.G., Knight, J.P.: Force-directed scheduling for the behavioral synthesis of ASICs. IEEE Trans. on CAD of Integrated Circuits and Systems 8(6), 661-679 (1989)
29. Pilato, C., Palermo, G., Tumeo, A., Ferrandi, F., Sciuto, D., Lanzi, P.L.: Fitness inheritance in evolutionary and multi-objective high-level synthesis. In: IEEE Congress on Evolutionary Computation, pp. 3459-3466 (2007)
30. Reyes-Sierra, M., Coello, C.: A study of fitness inheritance and approximation techniques for multi-objective particle swarm optimization, pp. 65-72 (2005)
31. Sastry, K.: Evaluation-relaxation schemes for genetic and evolutionary algorithms. Master's thesis, General Engineering Department, University of Illinois at Urbana-Champaign, Urbana, IL (2001)
32. Sastry, K., Goldberg, D.E., Pelikan, M.: Don't evaluate, inherit. In: GECCO: Genetic and Evolutionary Computation Conference, pp. 551-558. Morgan Kaufmann, San Francisco (2001)
33. Sastry, K., Lima, C.F., Goldberg, D.E.: Evaluation relaxation using substructural information and linear estimation. In: GECCO 2006, pp. 419-426. ACM, Seattle (2006)
34. Smith, R.E., Dike, B.A., Stegmann, S.A.: Fitness inheritance in genetic algorithms. In: SAC: Symposium on Applied Computing, pp. 345-350. ACM Press, New York (1995)
35. Stok, L.: Data Path Synthesis. Integration, the VLSI Journal 18(1), 1-71 (1994)
36. Teich, J., Blickle, T., Thiele, L.: An evolutionary approach to system level synthesis. In: CODES Workshop, p. 167 (1997)
37. Wanner, E., Guimaraes, F., Takahashi, R., Fleming, P.: A quadratic approximation-based local search procedure for multiobjective genetic algorithms. In: IEEE Congress on Evolutionary Computation, CEC 2006, pp. 938-945 (2006), doi:10.1109/CEC.2006.1688411
38. Zitzler, E.: Evolutionary algorithms for multiobjective optimization: Methods and applications. PhD thesis, ETH Zürich (1999)
39. Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation 8(2), 173-195 (2000)