CONVERGENCE OF HYBRID SPACE MAPPING ALGORITHMS
$s_k^{w_k}(h)$, where the minimization is usually subject to a trust region.
In the algorithm $w_0 = 1$ and $w_k$ is a non-increasing function of $k$. The principle is that if $c \circ p_k$ is doing badly in approximating $f$ then $w_k$ is decreased, and thereby the algorithm gradually switches to using the Taylor model $f_k$ of $f$.
2.1. Details of the SMTA
The trust region at $x_k$ is the set
$$T_k = \{x \in \mathbb{R}^n \mid \|x - x_k\| \le \delta_k\}, \quad (5)$$
where $\|\cdot\|$ is a suitable norm in $\mathbb{R}^n$ and $\delta_k > 0$.
At the current iterate $x_k$, the tentative step $h_k \in \mathbb{R}^n$ is a solution to
$$h_k \in \arg\min_h H\big(s_k^{w_k}(h)\big) \quad \text{s.t.}\ x_k + h \in T_k. \quad (6)$$
The quality of a given tentative step $h_k$ is measured by
$$S_k(h_k) \equiv H\big(s_k^{w_k}(h_k)\big) - H(f(x_k)).$$
148 MADSEN AND SØNDERGAARD
$S_k(h_k)$ may be interpreted as a measure of the ability of the model $s_k^{w_k}$ to predict a decrease in $F$. Notice, however, that $S_k(0)$ is not necessarily 0; i.e., the model does not necessarily interpolate at $x_k$.
If $h_k$ is acceptable then we use $x_k + h_k$ as the next iterate; otherwise we maintain $x_k$ as the best iterate found. The acceptance of the step is based on the following criteria:
If the predicted decrease is non-positive,
$$S_k(h_k) \ge 0, \quad (7)$$
then the step is rejected. Otherwise the step is accepted if $F$ decreases sufficiently:
$$F(x_k) - F(x_k + h_k) \ge \varepsilon_1 [-S_k(h_k)], \quad (8)$$
where $0 < \varepsilon_1 < 1$.
In each iteration the local bound $\delta_k$ and the transition parameter $w_k$ are adjusted as follows:
If (7) is true then $\delta_{k+1} = \delta_k$; otherwise $\delta_{k+1}$ depends on the ratio between the actual and the predicted decrease. If
$$F(x_k) - F(x_k + h_k) \le \varepsilon_2 [-S_k(h_k)], \quad (9)$$
with $\varepsilon_1 < \varepsilon_2 < 1$, is satisfied then $\delta_{k+1} = K_1 \delta_k$ with $0 < K_1 < 1$.
If
$$F(x_k) - F(x_k + h_k) \ge \varepsilon_3 [-S_k(h_k)], \quad (10)$$
with $\varepsilon_2 < \varepsilon_3 < 1$, is satisfied then $\delta_{k+1} = K_2 \delta_k$ with $K_2 > 1$.
If neither of the conditions (9) or (10) is satisfied then we let $\delta_{k+1} = \delta_k$.
The transition parameter $w_k$ is chosen as follows: Initially $w_0 = 1$. If $x_k + h_k$ is not accepted then we wish to move weight towards the Taylor model, and therefore we let
$$w_{k+1} = K_3 w_k \min\{\delta_{k+1}, 1\}, \quad (11)$$
where $0 \le K_3 < 1$. Otherwise $w_{k+1} = w_k$.
In order to ensure convergence, however, we need $w_k \to 0$ for $k \to \infty$. Therefore we also apply (11) if it has not been used during the previous $n$ iterations.
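To make the bookkeeping concrete, the acceptance criteria and update rules (7)-(11) can be sketched as a single function. This is our own minimal sketch: the function name and the default values of eps1..eps3, K1..K3 are illustrative assumptions (the algorithm only requires $0 < \varepsilon_1 < \varepsilon_2 < \varepsilon_3 < 1$, $0 < K_1 < 1$, $K_2 > 1$, $0 \le K_3 < 1$), and the extra rule that (11) is also applied after $n$ iterations without use is omitted.

```python
# Sketch of the step acceptance (7)-(8) and the updates (9)-(11).
# Default parameter values are our own illustrative choices.

def update_parameters(S_k, actual_decrease, delta_k, w_k,
                      eps1=0.01, eps2=0.25, eps3=0.75,
                      K1=0.5, K2=2.0, K3=0.5):
    """One application of criteria (7)-(8) and updates (9)-(11).

    S_k             : S_k(h_k) = H(s_k^{w_k}(h_k)) - H(f(x_k))
    actual_decrease : F(x_k) - F(x_k + h_k)
    Returns (accepted, delta_next, w_next).
    """
    predicted_decrease = -S_k
    if predicted_decrease <= 0:
        # criterion (7): reject the step, keep the local bound
        accepted, delta_next = False, delta_k
    else:
        # criterion (8): sufficient actual decrease
        accepted = actual_decrease >= eps1 * predicted_decrease
        if actual_decrease <= eps2 * predicted_decrease:      # (9): shrink
            delta_next = K1 * delta_k
        elif actual_decrease >= eps3 * predicted_decrease:    # (10): expand
            delta_next = K2 * delta_k
        else:
            delta_next = delta_k
    if not accepted:
        # (11): move weight towards the Taylor model
        w_next = K3 * w_k * min(delta_next, 1.0)
    else:
        w_next = w_k
    return accepted, delta_next, w_next
```

Note how a rejected step both shrinks $\delta$ via (9) and damps $w$ via (11); this is the mechanism that gradually switches the algorithm to the Taylor model when the mapped coarse model predicts poorly.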
2.2. Summary of the SMTA

0. Given $\delta_0$, $B_0 = I$, $w_0 = 1$, $k = 0$. Find $x_0$ as a solution to $\min_x H(c(x))$
1. Evaluate $f(x_0)$
2. Find $p(x_0)$ by solving (2)
3. Find $h_k$ by solving (6)
4. Evaluate $f(x_k + h_k)$
5. Find $p(x_k + h_k)$ by solving (2)
6. If (7) is false and (8) is true then let $x_{k+1} = x_k + h_k$; otherwise $x_{k+1} = x_k$
7. Update $\delta$, $w$, $B$ and $D$ (only if $w < 1$)
8. Let $k = k + 1$
9. If not converged then go to 3
Steps 0, 1 and 2 are an initial phase where a (local) optimizer of $H(c(x))$ is found, together with the initial translation $p(x_0)$ in the linear approximation (3) to the space mapping. Note that if (7) is true after Step 3, then the algorithm can skip to Step 8, letting $x_{k+1}$, $\delta_{k+1}$, $B_{k+1}$ and $D_{k+1}$ take the values from iteration $k$ and updating $w_k$ using (11). Hence we avoid the evaluation of $f(x_k + h_k)$ and $f'(x_k + h_k)$ in this case.
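The steps above can be sketched as a runnable loop. The sketch below is entirely our own illustration: the space mapping machinery is stubbed out ($w_k = 0$ throughout, no coarse model $c$, no $B$ or $D$ updates), so the loop degenerates to a plain trust-region iteration on the Taylor model $f_k(h) = f(x_k) + f'(x_k)h$, subproblem (6) is "solved" by crude 1-D sampling, and all constants are assumed values.

```python
import numpy as np

# Minimal runnable sketch of the main loop (Steps 3, 4, 6, 7, 8, 9 above),
# with the space mapping part stubbed out (w_k = 0 throughout).

def smta_skeleton(H, f, fprime, x0, delta0=1.0,
                  eps1=0.01, eps2=0.25, eps3=0.75, K1=0.5, K2=2.0,
                  max_iter=60):
    x, delta = float(x0), float(delta0)
    F = lambda z: H(f(z))
    for _ in range(max_iter):
        fx, Jx = f(x), fprime(x)
        # Step 3: approximate solution of (6) on a grid over [-delta, delta]
        hs = np.linspace(-delta, delta, 201)
        model = np.array([H(fx + Jx * h) for h in hs])
        h = hs[model.argmin()]
        S = model.min() - H(fx)                 # S_k(h_k)
        if S >= 0:                              # criterion (7): skip to Step 8
            continue
        actual = F(x) - F(x + h)                # Step 4
        if actual >= eps1 * (-S):               # Step 6, criterion (8)
            x = x + h
        if actual <= eps2 * (-S):               # Step 7, rule (9)
            delta *= K1
        elif actual >= eps3 * (-S):             # Step 7, rule (10)
            delta *= K2
    return x

# Toy instance: f(x) = (x - 1, 2(x - 1)), H = max |.|, so F(x) = 2|x - 1|
x_star = smta_skeleton(H=lambda y: np.max(np.abs(y)),
                       f=lambda x: np.array([x - 1.0, 2.0 * (x - 1.0)]),
                       fprime=lambda x: np.array([1.0, 2.0]),
                       x0=3.0)
```

With the hybrid part restored ($w_k > 0$), Step 3 would minimize $H(s_k^{w_k}(h))$ instead and Step 7 would also update $w$ via (11); the control flow is otherwise unchanged.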
3. Proof of convergence

We show that the SMTA satisfies the usual convergence condition for trust region methods. In the proof we do not need the actual updating scheme (11) of the weights $\{w_k\}$; we only need property (12) below. Similarly, we do not need any properties of the SMA; we only need that $c$ is bounded (Assumption A2 below). Thus the proof covers a class of algorithms, including those presented in Bakr et al. (2001) and Pedersen (2001). Probably other pre-processing techniques will fit into this framework as well.
Throughout this section we use the notations $\{x_k\}$, $\{h_k\}$, $\{\delta_k\}$ and $\{w_k\}$ as they are defined in Section 2.1. It follows from the definition of the weights that they satisfy
$$w_k = \min\{\delta_{k-1}, 1\} \cdot o(1), \quad (12)$$
where $o(1) \to 0$ for $k \to \infty$.
We make the following assumptions:

A1: For each point $z \in \mathbb{R}^n$ there exists a neighbourhood $N(z) \subseteq \mathbb{R}^n$ such that
$$f(x + h) = f(x) + f'(x)h + o(\|h\|), \quad x \in N(z),\ h \in \mathbb{R}^n,$$
where the $o$ term is uniform for $x \in N(z)$; $f'$ is continuous on $\mathbb{R}^n$.
A2: $c$ is bounded in the region of interest.
A3: $H$ is a convex function on $\mathbb{R}^m$.
A4: $\{x_k\}$ stays in a bounded region of $\mathbb{R}^n$.
3.1. Prerequisites

The convexity of $H$ implies that it satisfies a local Lipschitz condition. $H$ is not necessarily differentiable. However, we can define a generalized gradient, which is a set of points (rather than always a single point). Since a function which is Lipschitz on a finite dimensional space is differentiable almost everywhere, the generalized gradient can be defined as follows; see Clarke (1975) or Clarke (1983), Theorem 2.5.1:

Definition 1. The generalized gradient of $H$ at $x$, denoted $\partial H(x)$, is the convex hull of the set of all limits of the form $\lim H'(x + e_i)$, where $H$ is differentiable at $x + e_i$ and $e_i \to 0$ as $i \to \infty$:
$$\partial H(x) \equiv \operatorname{conv}\Big\{\lim_{e_i \to 0} H'(x + e_i)\Big\}.$$

It is easily seen (Clarke (1983), Proposition 2.1.1, and Madsen (1985), Proposition 2.1) that $\partial H(x)$ is non-empty and compact.
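As a concrete instance of Definition 1 (our own illustration, not part of the original text), take $H(y) = |y|$ on $\mathbb{R}$:

```latex
% H(y) = |y| is differentiable except at y = 0, with H'(y) = -1 for y < 0
% and H'(y) = +1 for y > 0.  Sequences e_i -> 0 therefore yield exactly two
% limits of H'(0 + e_i), namely -1 and +1, and Definition 1 gives
\partial H(0) = \operatorname{conv}\{-1, +1\} = [-1, 1],
% the subdifferential of the absolute value at 0 known from convex analysis.
```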
Since $H$ is convex the generalized gradient coincides with what is called the subdifferential in convex analysis. The convexity of $H$ also implies the existence of a directional derivative $H'_e(y)$ for any $y, e \in \mathbb{R}^m$, $e \ne 0$:
$$H'_e(y) \equiv \lim_{t \downarrow 0} \frac{H(y + te) - H(y)}{t}.$$
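A small numerical illustration (our own example, with an assumed choice of $H$): for the convex, nonsmooth function $H(y) = \max_i y_i$ the one-sided limit above exists in every direction, even at kinks where $H$ is not differentiable.

```python
import numpy as np

# At y = (1, 1) both coordinates of H(y) = max_i y_i attain the maximum,
# so H'_e(y) = max(e_1, e_2): the one-sided derivative exists but depends
# nonlinearly on the direction e, reflecting the kink of H at y.

def directional_derivative(H, y, e, t=1e-8):
    # one-sided finite-difference approximation of H'_e(y)
    return (H(y + t * e) - H(y)) / t

H = lambda y: np.max(y)
y = np.array([1.0, 1.0])

d1 = directional_derivative(H, y, np.array([1.0, -2.0]))   # close to max(1, -2) = 1
d2 = directional_derivative(H, y, np.array([-1.0, -2.0]))  # close to max(-1, -2) = -1
```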
Now consider the composite function $F = H \circ f$. Since $f$ is smooth, $F$ is well defined. A stationary point of $F$ is defined as follows:

Definition 2. $x$ is a stationary point of $F$ if
$$0 \in \partial F(x).$$

Using the convexity of $H$ and the smoothness of $f$ we can obtain the following characterization of a stationary point (Madsen (1985), Proposition 2.15):

Proposition 1. Let $x \in \mathbb{R}^n$. Then
$$0 \in \partial F(x) \iff F'_h(x) \ge 0 \ \text{for every direction}\ h \in \mathbb{R}^n,\ h \ne 0.$$
Below we shall use the following first order approximation $\Delta F$ to a change in $F$:

Definition 3. Let $x, h \in \mathbb{R}^n$. Define
$$\Delta F(x; h) \equiv H\big(f(x) + f'(x)h\big) - H(f(x)).$$
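A quick numerical sanity check (our own illustration, with assumed $f$ and $H$) that $\Delta F$ really is a first order approximation: by A1 and the local Lipschitz continuity of $H$ we have $F(x+h) - F(x) - \Delta F(x;h) = o(\|h\|)$, and for the smooth $f$ below the residual is in fact of order $\|h\|^2$.

```python
import numpy as np

# Definition 3 versus the true change in F = H o f, for the assumed choices
# f(x) = (x1^2, x1 x2) and H(y) = |y1| + |y2| (convex).

f = lambda x: np.array([x[0]**2, x[0] * x[1]])
fprime = lambda x: np.array([[2.0 * x[0], 0.0],
                             [x[1], x[0]]])       # Jacobian f'(x)
H = lambda y: np.abs(y).sum()
F = lambda x: H(f(x))

def delta_F(x, h):
    # Definition 3: Delta F(x; h) = H(f(x) + f'(x) h) - H(f(x))
    return H(f(x) + fprime(x) @ h) - H(f(x))

x = np.array([1.0, 2.0])
h = np.array([1e-5, 2e-5])
err = abs((F(x + h) - F(x)) - delta_F(x, h))      # of order |h|^2 here
```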
We shall use the following two properties of $\Delta F$:

Proposition 2. For $x, h \in \mathbb{R}^n$ we have
$$\Delta F(x; th) = t F'_h(x) + o(t), \quad t > 0.$$

Proof: Madsen (1985), Proposition 2.9.

Proposition 3. For $x, h \in \mathbb{R}^n$ and $0 \le t \le 1$ we have
$$\Delta F(x; th) \le t \, \Delta F(x; h).$$

Proof: The result is a simple consequence of Assumptions A1 and A3.
3.2. Proof of convergence

The technicalities of the convergence proof for the SMTA are contained in the following three lemmas.

Lemma 1. Let $x \in \mathbb{R}^n$ be a non-stationary point. Then there exist positive numbers $d_1$, $d_2$ and $\sigma_1$ such that for $x_k \in \mathbb{R}^n$,
$$\|x_k - x\| \le d_1 \ \text{and}\ \delta_k \le d_2 \implies S_k(h_k) \le -\sigma_1 \delta_k \quad \text{if } k \ge \bar{k}.$$
Proof: Since $x$ is a non-stationary point there exists, by Proposition 1, a direction $h$ such that $F'_h(x) < 0$. Then it follows from Proposition 2 that there exists a point $x + d$, $d = th$, $t > 0$, such that $\Delta F(x; d) < 0$.
Define $d_k$ by
$$x_k + d_k = x + d.$$
If $x_k \to x$ then $d_k \to d$. Therefore it follows from the uniform continuity of $f$, $f'$ and $H$ that there exist a neighbourhood $N(x)$ of $x$ and a number $\beta > 0$ such that
$$\Delta F(x_k; d_k) \le -\beta < 0 \quad (13)$$
when $x_k \in N(x)$. Define $d_1 > 0$ and $d_2 > 0$ such that if $\|x_k - x\| \le d_1$ then $x_k \in N(x)$ and $\|d_k\| \ge d_2$.
Let $h_k^t \in \mathbb{R}^n$ be a solution to (4) subject to the trust region:
$$h_k^t \in \arg\min_h H(f_k(h)) \quad \text{s.t.}\ x_k + h \in T_k. \quad (14)$$
Suppose $\delta_k \le d_2$. Let $t_k = \delta_k / \|d_k\|$ and $q_k = t_k d_k$. Then $x_k + q_k \in T_k$, and since $H(f_k(h)) = \Delta F(x_k; h) + F(x_k)$ we can use the definition of $h_k^t$ and Proposition 3 to obtain
$$\Delta F\big(x_k; h_k^t\big) \le \Delta F(x_k; q_k) \le t_k \, \Delta F(x_k; d_k). \quad (15)$$
Since $\{d_k\}$ is bounded away from 0, this implies the existence of $\lambda > 0$, independent of $k$, such that
$$\Delta F\big(x_k; h_k^t\big) \le -\lambda \delta_k \quad (16)$$
(using (13)). Since $H$ is locally Lipschitz, there exists a constant $K$ such that
$$\big|\Delta F\big(x_k; h_k^t\big)\big| \le K \big\|h_k^t\big\| \le K \delta_k. \quad (17)$$
Now let $u_k = \|h_k^t\| / \|d_k\|$. The arguments showing (15) imply
$$\Delta F\big(x_k; h_k^t\big) \le u_k \, \Delta F(x_k; d_k).$$
Using (13) and the definition of $\Delta F$, this implies
$$H\big(f_k\big(h_k^t\big)\big) \le -\beta u_k + H(f(x_k)). \quad (18)$$
Because of (17), property (12), A2, and the boundedness away from 0 of $\{\|d_k\|\}$, we have the following inequalities for all sufficiently large values of $k$, say $k \ge \bar{k}$:
$$\big|w_k H(f(x_k))\big| \le \tfrac{\beta}{4} u_k, \qquad w_k \big|H\big(c\big(p_k\big(h_k^t\big)\big)\big)\big| \le \tfrac{\beta}{4} u_k, \qquad w_k \le \tfrac{1}{4}.$$
Therefore, using the convexity of $H$ and (18),
$$\begin{aligned}
H\big(s_k^{w_k}\big(h_k^t\big)\big) &\le w_k H\big(c\big(p_k\big(h_k^t\big)\big)\big) + (1 - w_k) H\big(f_k\big(h_k^t\big)\big) \\
&\le \tfrac{\beta}{4} u_k + (1 - w_k)\big({-\beta u_k} + H(f(x_k))\big) \\
&= \tfrac{\beta}{4} u_k - \beta u_k + H(f(x_k)) + \beta w_k u_k - w_k H(f(x_k)) \\
&\le -\tfrac{\beta}{4} u_k + H(f(x_k)).
\end{aligned}$$
Since $h_k$ minimizes $H \circ s_k^{w_k}$ subject to the trust region, it follows that
$$\begin{aligned}
S_k(h_k) &= H\big(s_k^{w_k}(h_k)\big) - H(f(x_k)) \\
&\le H\big(s_k^{w_k}\big(h_k^t\big)\big) - H(f(x_k)) \\
&\le -\tfrac{\beta}{4} u_k = -\tfrac{\beta}{4} \big\|h_k^t\big\| / \|d_k\| \le -\tfrac{\beta \lambda}{4K} \, \delta_k / \|d_k\| \le -\sigma_1 \delta_k,
\end{aligned}$$
where we used $\|h_k^t\| \ge (\lambda/K)\delta_k$ (which follows from (16) and (17)) and the boundedness above of $\{\|d_k\|\}$ (since $d_k \to d$), which proves Lemma 1.
Lemma 2. Let $x \in \mathbb{R}^n$ be a non-stationary point. Let $d_1$ be defined as in Lemma 1. For every $d_3 > 0$ there exists $\sigma_2 > 0$ such that
$$\|x_k - x\| \le d_1 \ \text{and}\ \delta_k \ge d_3 \implies S_k(h_k) \le -\sigma_2 \quad \text{if } k \ge \bar{k}.$$

Proof: Let $d_4 = \min\{d_2, d_3\}$, $d_2$ being defined as in Lemma 1. Suppose $\tilde{h}_k$ is generated by (6) from $x_k$ with the trust region bound $\delta_k = d_4$. Then it follows from Lemma 1 that, for $k \ge \bar{k}$,
$$S_k(\tilde{h}_k) \le -\sigma_1 d_4.$$
Suppose $h_k$ is generated from $x_k$ by (6) with the local bound $\delta_k \ge d_3$. Then
$$S_k(h_k) \le S_k(\tilde{h}_k) \le -\sigma_1 d_4,$$
which proves Lemma 2 with $\sigma_2 = \sigma_1 d_4$.
Lemma 3. If $\{x_k\}$ is convergent then the limit point is stationary.

Proof: Suppose $x_k \to x$ for $k \to \infty$ and that $x$ is non-stationary.
From Assumptions A1-A3 and (12) we obtain
$$\begin{aligned}
F(x_k + h_k) &= H\big(f_k(h_k) + o(\|h_k\|)\big) \\
&= H\big(s_k^{w_k}(h_k) - w_k\big(c(p_k(h_k)) - f_k(h_k)\big) + o(\|h_k\|)\big) \\
&= H\big(s_k^{w_k}(h_k)\big) + \delta_k \, o(1) + o(\|h_k\|) \\
&= H\big(s_k^{w_k}(h_k)\big) + \delta_k \, o(1).
\end{aligned}$$
Therefore
$$F(x_k) - F(x_k + h_k) = -S_k(h_k) + \delta_k \, o(1). \quad (19)$$
Let $d_2$ be defined as in Lemma 1. Using Lemma 1 if $\delta_k \le d_2$ and Lemma 2 if $\delta_k > d_2$, we find from (19)
$$F(x_k) - F(x_k + h_k) = -S_k(h_k)(1 + o(1)).$$
Therefore Lemma 1 and the rules (7), (8) and (10) for accepting a new point and for adjusting $\delta_k$ imply the existence of $\bar{\delta} > 0$, with $\bar{\delta} \le d_2$, such that for sufficiently large $k$,
$$\delta_k \le \bar{\delta} \implies x_{k+1} = x_k + h_k \ \text{and}\ \delta_{k+1} = K_2 \delta_k, \quad (20)$$
where $K_2 > 1$.
Equation (20) implies that if $\delta_k \le \bar{\delta}$, then $\delta_{k+1} > \delta_k$. Hence the sequence of local bounds must be bounded away from zero,
$$\delta_k \ge d_5 > 0 \quad \text{for } k = 0, 1, 2, 3, \ldots \quad (21)$$
Therefore an infinite number of proposals $(x_k + h_k)$ are accepted by the algorithm, because otherwise we would have $\delta_k \to 0$ for $k \to \infty$ (using the fact that the bound is decreased linearly when a point is not accepted; see (9)). Furthermore, when a proposal $(x_k + h_k)$ is accepted and $k$ is sufficiently large, then (8), Lemma 2 and (21) imply
$$F(x_k) - F(x_k + h_k) \ge \varepsilon_1 [-S_k(h_k)] \ge \varepsilon_1 \sigma_2 > 0.$$
Since the sequence of function values $F(x_k)$ is non-increasing we obtain $F(x_k) \to -\infty$ for $k \to \infty$. This is a contradiction since $x_k \to x$ and $F$ is continuous at $x$. Thus the assumption is wrong: $x$ is a stationary point.
The following theorem extends the result of Lemma 3.

Theorem 1. Let $S$ be the set of stationary points of (1), $S = \{x \in \mathbb{R}^n \mid 0 \in \partial F(x)\}$. Let $d(x_k, S)$ be the distance between $x_k$ and $S$. Then
$$d(x_k, S) \to 0 \quad \text{for } k \to \infty.$$

Proof: Suppose $d(x_k, S) \not\to 0$. Then infinitely many points $x_k$ must be bounded away from $S$, and hence Assumption A4 implies that the sequence $\{x_k\}$ must have a cluster point $x$ which is not stationary. According to Lemma 3, $\{x_k\}$ cannot converge to $x$. Thus, for all small $\varepsilon > 0$, infinitely many iterates $x_k$ must have a distance less than $\varepsilon$ from $x$, and infinitely many must have a distance larger than $2\varepsilon$ from $x$. Let $\varepsilon > 0$ be chosen smaller than $d_1/2$, $d_1$ being the bound used in Lemmas 1 and 2. Then we shall prove that if $\|x_k - x\| < \varepsilon$ and $\|x_{k+p} - x\| > 2\varepsilon$ we have
$$F(x_k) - F(x_{k+p}) \ge \mu > 0, \quad (22)$$
$\mu$ being independent of $k$ and $p$. Since (22) holds for infinitely many values of $k$, and since the sequence $\{F(x_k)\}$ is non-increasing, we obtain that $F(x_k) \to -\infty$ for $k \to \infty$, which contradicts that $F$ is continuous at $x$ and that a subsequence of $\{x_k\}$ converges to $x$. Therefore the result follows as a consequence of (22).
Equation (22) is proved by the following argument: Consider
$$F(x_k) - F(x_{k+p}) = \sum_{j=k}^{k+p-1} \big[F(x_j) - F(x_{j+1})\big], \quad (23)$$
for $k \ge \bar{k}$, with $\bar{k}$ as in Lemma 1. The terms of the sum are non-negative. Suppose that $x_{j+1} \ne x_j$, i.e. the increment $h_j$ is accepted, and suppose further that $\|x_j - x\| \le 2\varepsilon$, so that we can use Lemmas 1 and 2 (since $2\varepsilon < d_1$). We obtain from (8) and these lemmas
$$F(x_j) - F(x_{j+1}) \ge \varepsilon_1 \big[-S_j(h_j)\big] \ge
\begin{cases}
\varepsilon_1 \sigma_1 \delta_j & \text{if } \delta_j \le d_2, \\
\varepsilon_1 \sigma_2 & \text{otherwise.}
\end{cases} \quad (24)$$
Equation (22) now follows from (23) using (24): Let $A_k$ be the index set corresponding to the terms in (23) with $x_j \ne x_{j+1}$. If, for all of these terms, we have $\delta_j \le d_2$, then
$$\sum_{j \in A_k} \big[F(x_j) - F(x_{j+1})\big] \ge \varepsilon_1 \sigma_1 \sum_{j \in A_k} \delta_j \ge \varepsilon_1 \sigma_1 \sum_{j \in A_k} \|h_j\|. \quad (25)$$
The last sum exceeds $\varepsilon$ since $h_j = x_{j+1} - x_j$ for $j \in A_k$, $x_j = x_{j+1}$ for $j \notin A_k$, and since $\|x_{k+p} - x_k\| > \varepsilon$. Thus the sum in (23) exceeds $\varepsilon_1 \sigma_1 \varepsilon$ when $\delta_j \le d_2$ for all $j \in A_k$. If the last condition is not fulfilled then there exists $j \in A_k$ with
$$F(x_j) - F(x_{j+1}) \ge \varepsilon_1 \sigma_2,$$
so in that case the sum exceeds a positive number which is independent of $k$ and $p$. Thus we have proved (22) with
$$\mu = \min\{\varepsilon_1 \sigma_1 \varepsilon,\ \varepsilon_1 \sigma_2\}.$$
This completes the proof of Theorem 1.
4. Conclusion

We have considered the problem of minimizing functions of the type $H \circ f$, where $f: \mathbb{R}^n \to \mathbb{R}^m$ is smooth and $H: \mathbb{R}^m \to \mathbb{R}$ is convex. It is proved that the hybrid space mapping algorithm described in Bakr et al. (2001) has guaranteed global convergence to the set of stationary points of $H \circ f$. The proof covers a class of hybrid algorithms, including those presented in Bakr et al. (2001) and Pedersen (2001).
References

M. H. Bakr, J. W. Bandler, K. Madsen, and J. Søndergaard, "Review of the space mapping approach to engineering optimization and modelling," Optimization and Engineering, vol. 1, pp. 241-276, 2000.
M. H. Bakr, J. W. Bandler, K. Madsen, and J. Søndergaard, "An introduction to the space mapping technique," Optimization and Engineering, vol. 2, pp. 369-384, 2001.
J. W. Bandler, R. M. Biernacki, S. H. Chen, P. A. Grobelny, and R. H. Hemmers, "Space mapping technique for electromagnetic optimization," IEEE Trans. Microwave Theory Tech., vol. 42, pp. 2536-2544, 1994.
C. G. Broyden, "A class of methods for solving non-linear simultaneous equations," Math. Comp., vol. 19, pp. 577-593, 1965.
F. H. Clarke, "Generalized gradients and applications," Trans. Amer. Math. Soc., vol. 205, pp. 247-262, 1975.
F. H. Clarke, Optimization and Nonsmooth Analysis, Canadian Mathematical Society Series of Monographs and Advanced Texts, John Wiley & Sons, 1983, pp. 1-308.
R. Fletcher, "A model algorithm for composite nondifferentiable optimization problems," in D. C. Sorensen and R. J.-B. Wets, eds., Nondifferentiable and Variational Techniques in Optimization, Mathematical Programming Study, vol. 17, North-Holland: Amsterdam, 1982.
J. Hald and K. Madsen, "Combined LP and quasi-Newton methods for non-linear L1 optimization," SIAM J. Num. Anal., vol. 22, pp. 369-384, 1985.
K. Madsen, "An algorithm for minimax solution of overdetermined systems of non-linear equations," J. Inst. Math. Appl., vol. 16, pp. 321-328, 1975.
K. Madsen, "Minimization of non-linear approximation functions," Dr. Techn. Thesis, Institute for Numerical Analysis, Technical University of Denmark, 1985, pp. 1-141.
F. Ø. Pedersen, "Advances on the space mapping optimization method," Master Thesis, IMM-THESIS-2001-35, Informatics and Mathematical Modelling, Technical University of Denmark, 2001.
L. N. Vicente, "Space mapping: Models, sensitivities, and trust-regions methods," to appear in Optimization and Engineering, 2003.
R. S. Womersley and R. Fletcher, "An algorithm for composite nonsmooth optimization problems," J. Opt. Theo. Applns., vol. 48, pp. 493-523, 1986.