Nothing Special   »   [go: up one dir, main page]

Bilevel Programming - A Survey: Dempe@math - Tu-Freiberg - de

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

Chapter 1

BILEVEL PROGRAMMING - A SURVEY


Stephan Dempe
Technical University Bergakademie Freiberg, Germany
dempe@math.tu-freiberg.de
Abstract Bilevel programming problems are hierarchical optimization problems
where an objective function is to be minimized over the graph of the
solution set mapping of a second parametric optimization problem. It
is the aim of the paper to give a survey for this living research area
indicating main recent approaches to solve such problems and to de-
scribe optimality conditions as well as to touch main particularities of
this problem class.
1. Introduction
Bilevel programming problems are hierarchical mathematical prob-
lems where the set of all variables is partitioned between vectors x and
y. Given y, the vector x is to be chosen as an optimal solution x = x(y)
of an optimization problem parametrized in y:
x(y) (y) := Argmin
x
f(x, y) : g(x, y) 0. (1.1)
This problem is the so-called lower level problem or the followers prob-
lem. Here, f : R
n
R
m
R, g : R
n
R
m
R
p
. Equality constraints
can be added without serious problems. Sometimes, integrality condi-
tions are added to the lower level problem. The point-to-set mapping
: R
m
2
R
n
is the solution set mapping of the problem (1.1). The
solution x(y) is called the rational reaction of the follower on the leaders
choice y. Knowing this reaction, the bilevel problem reads as follows:
min
y
F(x, y) : G(y) 0, x (y). (1.2)
This problem is the leaders problem or the bilevel problem. Quotation
marks have been added in (1.2) due to an unclear denition of the ob-
2
jective function value F(x, y) from the leaders point-of-view (who has
control over y only) if the set of optimal solutions of (1.1) does not re-
duce to a singleton. Throughout the paper it is assumed that all the
functions F, G
j
, f, g
i
are suciently smooth.
Bilevel programming has many potential applications in very dierent
elds, as transportation, economics, ecology, engineering and others, see
Dempe (2003). One problem in chemistry from Dempe (2002) is the fol-
lowing. If a certain mixture y of chemical substances is put into a chem-
ical reactor running with a certain temperature T and a given pressure
p a chemical equilibrium x arises. This equilibrium depends on y, T, p
but in general it cannot be inuenced directly by the engineer. One can
compute this equilibrium by solving a convex optimization problem
N

i=1
c
i
(p, T)x
i
+
G

i=1
x
i
ln
x
i
z
min
x
z =
G

j=1
x
j
, Ax = Ay, x 0,
(1.3)
where G N denotes the number of gaseous and N the total number of
reacting substances. Each row of the matrix A corresponds to a chem-
ical element, each column to a substance. Hence, a column gives the
amount of the dierent elements in the substances; x is the vector of the
masses of the substances in the resulting chemical equilibrium whereas y
denotes the initial masses of substances put into the reactor. The value
of c
i
(p, T) gives the chemical potential of a substance which depends
on the pressure p and on the temperature T. Let x(p, T, y) denote the
(unique) optimal solution of this problem. Then, since there exists some
desire on the result of the chemical reactions, another problem arises.
Namely, the question of how to determine the mixture of substances,
the temperature and the pressure such that the resulting chemical equi-
librium has a desired quality as containing a large amount of needed
substances and a small amount of others. This goal can be expressed as
another optimization problem whose feasible set depends on the optimal
solution of the problem (1.3):
c, x) min
p,T,y
(p, T, y) Y, x = x(p, T, y).
This latter problem is the bilevel programming problem.
Bilevel programming has been the topic of a large number of pa-
pers including masters and PhD thesis, at least two monographs Bard
(1998); Dempe (2002) and edited volumes Anandalingam and Friesz
(1992); Migdalas et al. (1998), see the annotated bibliography Dempe
Bilevel Programming - A Survey 3
(2003). These publications present both theoretical results as well as
solution approaches and a large number of applications.
2. Nonunique lower level optimal solutions
In the denition of the bilevel programming problem (1.2) quotation
marks have been used to express the ambiguity in the denition of the
problem in case of multiple optimal solutions in the lower level problem
(1.1) for some parameter values. This is illustrated by the following
example:
Example 1.1 (Lucchetti at al. (1987)) Let the lower level prob-
lem be given as
(y) = Argmin
x
xy : 0 x 1
and consider the bilevel problem
min
y
x
2
+y
2
: x (y), 0 y 1.
Then, evaluating the lower level problem and inserting the optimal so-
lution of this problem into the objective function of the upper level one
results in
(y) =

0, y > 0,
1, y < 0,
[0, 1], y = 0.
F(x(y), y)

= y
2
, y > 0,
= 1 +y
2
, y < 0,
[0, 1], y = 0.
Unclear is the function value of the function y F(x(y), y) at the point
y = 0. The inmal function value of F(x(y), y) is equal to zero but
this value is attained only if F(x(0), 0) = 0. This situation is called the
optimistic position in what follows. If this is not the case then the bilevel
problem has no solution.
Note that in problem (1.2) the leader is not allowed to force the follower
to take the one or the other of his optimal solutions. Hence, the leader
cannot predict the true value of his objective function until the follower
has communicated his choice. To overcome this ambiguity at least two
approaches have been suggested.
2.1 The optimistic position
The rst one is the optimistic position. The leader can use this ap-
proach if he supposes that the follower is willing to support him, i.e. that
4
the follower will select a solution x(y) (y) which is a best one from
the point-of-view of the leader. Let

o
(y) := min
x
F(x, y) : x (y) (1.4)
denote the optimistic objective function value in the upper level. Then,
the optimistic position of the bilevel programming problem reduces to
min
o
(y) : G(y) 0. (1.5)
A pair (x(y), y) such that y solves (1.5) and x(y) solves (1.4) for y = y
is an optimal solution for the problem
min
x,y
F(x, y) : G(y) 0, x (y), (1.6)
too, and vice versa, provided that the latter one has an optimal solution.
Most of the contributions to bilevel programming treat this problem.
One of the reasons can be that this problem has an optimal solution
under quite reasonable assumptions.
To formulate an existence result for this problem a regularity condition
is needed. For a problem
min(x) : (x) 0, (x) = 0 (1.7)
with equality and inequality constraints (,
i
,
j
: R
n
R, i =
1, . . . , p, j = 1, . . . , q) the Mangasarian-Fromowitz constraint qualica-
tion at a point x
0
reads as
(MFCQ) there exists a direction d with

i
(x
0
)d < 0 i :
i
(x
0
) = 0,

j
(x
0
)d = 0 j
and the gradients
j
(x
0
) j are linearly independent.
Theorem 1.2 (Dempe (2002)) If
(C) the set (x, y) : g(x, y) 0, G(y) 0 is non-empty and compact
and, for each y with G(y) 0, the (MFCQ) is satised, then problem
(1.6) has an optimal solution.
Related results can be found in Harker and Pang (1988).
2.2 The pessimistic position
The optimistic position seems not to be possible without any trouble
at least in the cases when cooperation is not allowed respectively not
Bilevel Programming - A Survey 5
possible or when the followers seriousness of keeping the agreement is
not granted. Then, one way out of this unpleasant situation for the
leader is to bound the damage resulting from an undesirable selection of
the follower. This leads to the problem
min
p
(y) : G(y) 0, (1.8)
where

p
(y) := max
x
F(x, y) : x (y) (1.9)
denotes the worst upper level objective function value achievable on the
set of optimal solutions of the lower level problem.
Theorem 1.3 (Dempe (2002)) Let the point-to-set mapping () be
lower semicontinuous at all points y with G(y) 0 and let assumption
(C) be satised. Then, problem (1.8) has an optimal solution.
Here, the point-to-set mapping () is lower semicontinuous at a point
y = y
0
if, for each open set A with (y
0
) A ,= there is an open
neighborhood B of y
0
with (y) A ,= for all y B.
Similar results can be found in the paper Lucchetti at al. (1987). Note
that the assumptions in this theorem are much more restrictive than
in Theorem 1.2. The bilevel programming problem has an implicitly
determined set of feasible solutions and, at least looking at (1.5) or
(1.8), it is a nonconvex optimization problem with a nondierentiable
objective function. Hence local optimal solutions may occur. To dene
the notion of a locally optimal solution or of a stationary solution usual
tools from optimization applied to the problems (1.5) or (1.8) can be
used.
Definition 1.4 A point (x
0
, y
0
) with G(y
0
) 0 and x
0
(y
0
) is
called a locally optimistic (pessimistic) optimal solution of the bilevel
programming problem (1.1), (1.2) if
F(x
0
, y
0
) F(x, y
0
) (resp. F(x
0
, y
0
) F(x, y
0
)) x (y
0
)
and there is an > 0 such that

o
(y
0
)
o
(y) (resp.
p
(y
0
)
p
(y)) y : G(y) 0, |y y
0
| < .
2.3 Weak solutions
If the lower level problem is nonconvex (for xed parameter value y)
the computation of a globally optimal solution in the lower level problem
can be computationally intractable (especially tracing the global solution
set mapping for varying parameter). In this case it can be considered as
6
being helpful to modify the bilevel problem such that a locally optimal
solution in the lower level problem is searched for instead of a globally
optimal solution. But, as it is shown in an example in Vogel (2002), this
can completely change the existence of an optimal solution of the bilevel
programming problem.
Example 1.5 (Vogel (2002)) Consider the bilevel problem
min
y
(x + 1)
2
: 3 y 2, x
s
(y),
with the lower level problem
min
x
x
3
3x : x y.
Then, if
s
(y) denotes the set of global optimal solutions of the last
problem and if the optimistic approach is used, then an optimal solution
is y

= 2, an optimal solution in the pessimistic approach does not


exist. But, if
s
(y) denotes the set of locally optimal solutions of the
lower level problem the following results are obtained:

s
o
(y) := minF(x, y) : x
s
(y) =

(y + 1)
2
, if y [3, 1) [1, 2]
4, if y [1, 1)
and

s
p
(y) := maxF(x, y) : x
s
(y) =

(y + 1)
2
, if y (1, 2]
4, if y [3, 1]
Hence, inf
s
o
(y) = 0 and an optimal solution of this problem does not
exist. On the other hand, inf
s
p
(y) = 4 and all points y [3, 1] are
optimal solutions. The reason for this behavior is it that the point-to-set
mapping of locally optimal solutions of a parametric optimization prob-
lem is generally not upper semicontinuous if the convexity assumption is
dropped in the assumptions of Theorem 1.2.
To circumvent this unpleasant situation, Vogel (2002) has dened a
weaker notion of an optimistic and a pessimistic solutions. For this,
let the point-to-set mapping
s
:= cl
s
be dened via the closure of
the graph of the point-to-set mapping
s
:
grph
s
:= cl grph
s
.
Consider the problem
min
y
F(x, y) : G(y) 0, x
s
(y) (1.10)
Bilevel Programming - A Survey 7
and dene

o
(y) := inf
x
s
(y)
F(x, y)
F
o
:= inf
y:G(y)0

o
(y)

p
(y) := sup
x
s
(y)
F(x, y)
F
p
:= inf
y:G(y)0

p
(y),
where
s
is a point-to-set mapping dened by solutions of the lower
level problem (e.g. local optimal solutions or global optimal solutions or
stationary points).
Definition 1.6 (Vogel (2002)) Consider the problem (1.10).
1 A point y with G(y) 0 is a weak optimistic solution of the
bilevel programming problem if there is some x
s
(y) such that
F(x, y) = F
o
.
2 A point y with G(y) 0 is a weak pessimistic solution of the bilevel
programming problem if there is some x
s
(y) and a sequence
y
k

k=1
dom
s
such that lim
k
y
k
= y and lim
k

p
(y
k
) = F
p
as well as F(x, y) = F
p
.
The denitions of weak optimistic and pessimistic solutions are dierent
since, if a weak optimistic solution exists, it can be shown that the
additional property similar to the one formulateded for the pessimistic
solution is satised. This additional property is needed to guarantee
that the obtained solution is not an isolated one which is far from all
feasible solutions for slightly perturbed problems.
Theorem 1.7 (Vogel (2002)) Let (x, y) : x
s
(y), G(y) 0 be
nonempty and bounded. Then, the bilevel programming problem (1.10)
has a weak optimistic and a weak pessimistic solutions.
The assumption of this theorem is satised if
s
(y) denotes the set of
globally or locally optimal solutions or the set of Fritz John points and
also if it denotes the set of generalized critical points (Guddat et al.
(1990)) of the lower level problem (1.1).
3. Relations to other problems
The bilevel programming problem is closely related to other optimiza-
tion problems which is often used to solve this problem. In the papers
8
Audet et al. (1997); Frangioni (1995) it is shown that every mixed-
discrete optimization problem can be formulated as bilevel programming
problem. This of course implies ^Thardness of bilevel programming.
The latter is also shown in
Theorem 1.8 (Deng (1998)) For any > 0 it is ^Thard to nd a
feasible solution to the linear bilevel programming problem with no more
than times the optimal value.
Related results can also be found in Hansen et al. (1992).
The relations to bicriterial optimization have been investigated e.g. in
the papers Fliege and Vicente (2003); Haurie et al. (1990); Marcotte
and Savard (1991). On the one hand it is easy to see that at least one
feasible point of the bilevel programming problem (1.1), (1.6) is Pareto
optimal for the problem
F(x, y)
f(x, y)

min
x,y

G(y) 0, g(x, y) 0.
But this, in general, is not true for a (local) optimal solution of the bilevel
problem. Hence, attempts to solve the bilevel programming problem
via bicriterial optimization with the ordering cone R
2
+
will in general
not work. On the other hand, Fliege and Vicente (2003) shows that
bicriterial optimization can indeed be used to prove optimality for the
bilevel programming problem. But, for doing so, another more general
ordering cone has to be used. Closely related to bilevel programming
problems are also the problems of minimizing a function over the ecient
set of some multicriterial optimization problem (see F ulop (1993); Muu
(2000)).
One tool often used to reformulate the bilevel programming problem
as an one-level problem are the Karush-Kuhn-Tucker conditions. If a
regularity condition is satised for the lower level problem (1.1), then
the Karush-Kuhn-Tucker conditions are necessary optimality conditions.
They are also sucient in the case when (1.1) is a convex optimization
problem in the xvariables for xed parameters y. This suggests to
replace problem (1.1), (1.6) by
F(x, y) min
x,y,
G(y) 0

x
f(x, y) +

x
g(x, y) = 0
g(x, y) 0, 0,

g(x, y) = 0.
(1.11)
The relations between (1.1), (1.4), (1.5) and (1.11) are highlighted in
the following theorem.
Bilevel Programming - A Survey 9
Theorem 1.9 (Dempe (2002)) Consider the optimistic bilevel pro-
gramming problem (1.1), (1.4), (1.5) and assume that, for each xed
y, the lower level problem (1.1) is a convex optimization problem for
which (MFCQ) is satised for each xed y and all feasible points. Then,
each local optimal solution for the problem (1.1), (1.4),(1.5) corresponds
to a local optimal solution for problem (1.11).
Note that the opposite implication is not true in general. This can be
seen in the following example.
Example 1.10 Consider the simple linear bilevel programming problem
min
y
y : x (y), 1 y 1,
where
(y) := Argmin
x
xy : 0 x 1
at the point (x, y) = (0, 0). Then, this point is a local minimum
of problem (1.11), i.e. there exists an open neighborhood W

(0, 0) =
(, ) (, ) with 0 < < 1 such that y 0 for all (x, y) W

(0, 0)
with x (y) and 1 y 1. The simple reason for this is it that there
is no < x < with x (y) for y < 0 since (y) = 1 for y < 0.
But a closer look at the denition of a local optimistic optimal solution
shows that y
0
= 0 is not a local optimistic optimal solution since it is
not a local minimum of the function
o
(y) = y.
It is a rst implication of these considerations that the problems (1.1),
(1.4), (1.5) and (1.1), (1.6) are not equivalent if local optimal solutions
are considered and a second one that not all local optimal solutions
of the problem (1.11) correspond in general to local optimal solutions
of the problem (1.1), (1.4), (1.5). It should be noted that, under the
assumptions of Theorem 1.9 and if the optimal solutions of the lower
level problem are strongly stable in the sense of Kojima (1980), then
problems (1.1), (1.2) and (1.11) are equivalent.
The following example from Mirrlees (1999) shows that this result is
no longer valid if the convexity assumption is dropped.
Example 1.11 Consider the problem
min
x,y
(y 2) + (x 1)
2
: x (y)
where (y) is the set of optimal solutions of the following unconstrained
optimization problem on the real axis:
y exp(x + 1)
2
exp(x 1)
2
min
x
10
Then, the necessary optimality conditions for the lower level problem are
y(x + 1) exp(x + 1)
2
+ (x 1) exp(x 1)
2
= 0
which has three solutions for 0.344 y 2.903. The global optimum of
the lower level problem is uniquely determined for all y ,= 1 and it has
a jump at the point y = 1. Here the global optimum of the lower level
problem can be found at the points x = 0.957. The point (x
0
; y
0
) =
(0.957; 1) is also the global optimum of the bilevel problem.
But if the lower level problem is replaced with its necessary optimal-
ity conditions and the necessary optimality conditions for the resulting
problem are solved then three solutions: (x, y) = (0.895; 1.99), (x, y) =
(0.42; 2.19), (x, y) = (0.98; 1.98) are obtained. Surprisingly, the global
optimal solution of the bilevel problem is not obtained with this approach.
The reason for this is it that the problem
min(y2)+(x1)
2
: y(x+1) exp(x+1)
2
+(x1) exp(x1)
2
= 0
has a much larger feasible set than the bilevel problem. And this feasible
set has no jump at the point (x, y) = (0.957; 1) but is equal to a certain
connected curve in R
2
. And on this curve the objective function has no
stationary point at the optimal solution of the bilevel problem.
A surprising result is also the dependence of bilevel programming prob-
lems on irrelevant constraints, cf. Macal and Hurter (1997). This means
that dropping some lower level constraints which are not active at an
optimal solution (x
0
, y
0
) of the bilevel programming problem can change
the problem drastically such that (x
0
, y
0
) will not remain optimal.
Moreover, the location of constraints is essential. Moving one con-
straint from the lower to the upper levels (or vice versa) generally
changes the problem signicantly; it is even possible that one of the
problems has an optimal solution whereas the other has not.
4. Optimality conditions
4.1 Implicit functions approach
The formulation of optimality conditions for bilevel programming
problems usually starts with a suitable reformulation of the problem as
a one-level one. First conditions are based on strong stability of optimal
solutions for problem (1.1) and replace the implicit constraint x (y)
by the implicitly determined function x = x(y) with x(y) = (y).
Let L(x, y, ) := f(x, y) +

g(x, y) denote the Lagrange function for


problem (1.1) and
(x, y) := 0 :
x
L(x, y, ) = 0, g(x, y) 0,

g(x, y) = 0
Bilevel Programming - A Survey 11
denote its set of regular Lagrange multipliers.
Theorem 1.12 (Kojima (1980)) Consider problem (1.1) and let x
0
be a locally optimal solution of this problem at y = y
0
. Assume that
(MFCQ) and
(SSOC) for all
0
(x
0
, y
0
) and for all d ,= 0 with

x
g
i
(x
0
, y
0
)d = 0 i J(
0
) := j :
0
j
> 0
the inequality d

2
xx
L(x
0
, y
0
,
0
)d > 0 holds
are satised. Then, the solution x
0
is strongly stable, i.e. there exist open
neighborhoods U of x
0
and V of y
0
and a uniquely determined function
x : V U being the unique locally optimal solution of (1.1) in U for all
y V.
If the assumptions of Theorem 1.12 are satised for x
0
(y
0
) and
problem (1.1) is a convex optimization problem for xed y then, problem
(1.1), (1.2) can locally equivalently be replaced with
min
y
F(x(y), y) : G(y) 0. (1.12)
Using the chain rule, necessary and sucient optimality conditions can
now be derived, provided it is possible to compute, say, a directional
derivative for the function x(y) in the point y
0
.
Theorem 1.13 (Ralph and Dempe (1995)) Consider problem (1.1)
and let x
0
be a locally optimal solution of this problem at y = y
0
. Assume
that (MFCQ), (SSOC) together with
(CRCQ) there exists an open neighborhood W of (x
0
, y
0
) such that, for
each subset I I(x
0
, y
0
) := j : g
j
(x
0
, y
0
) = 0 the family of gradient
vectors
x
g
i
(x, y) : i I has the same rank on W
are satised at (x
0
, y
0
). Then, the by Theorem 1.12 locally uniquely
determined function x(y) is directionally dierentiable at the point y =
y
0
with the directional derivative in direction r being the unique optimal
solution of the problem
0.5d

2
xx
L(x
0
, y
0
, )d +d

2
yx
L(x
0
, y
0
, )r min
d

x
g
i
(x
0
, y
0
)

= 0, if i J()
0, if i I(x
0
, y
0
) J()
for all Argmax

y
L(x
0
, y
0
, )r : (x
0
, y
0
).
12
Denote the directional derivative of x(y) at y = y
0
in direction r by
x

(y
0
; r). Then, necessary and sucient optimality conditions for the
bilevel programming problem can be formulated.
Theorem 1.14 (Dempe (1992)) Consider the problem (1.1), (1.2) at
a point (x
0
, y
0
) with G(y
0
) 0, x
0
(y
0
) and let (MFCQ), (SSOC)
and (CRCQ) be satised for the lower level problem. Moreover assume
that (1.1) is a convex optimization problem parametrized in y. Then,
1 if y
0
is a locally optimal solution of this problem then the following
optimization problem has the optimal objective function value zero:
min
,r

x
F(x
0
, y
0
)x

(y
0
; r) +
y
F(x
0
, y
0
)r
G
i
(y
0
) , i : G
i
(y
0
) = 0
|r| 1.
(1.13)
2 If the optimal function value v of the problem

x
F(x
0
, y
0
)x

(y
0
; r) +
y
F(x
0
, y
0
)r min
r
G
i
(y
0
) 0, i : G
i
(y
0
) = 0
|r| = 1.
is greater than zero (v > 0) then y
0
is a strict local optimal solution
of the problem (1.1), (1.2), i.e. for each 0 < z < v there is > 0
such that
F(x, y) F(x
0
, y
0
) +z|y y
0
|
for all (x, y) with G(y) 0, x (y), |y y
0
| .
Put T(y) := F(x(y), y). If the (MFCQ) is satised at the point y
0
for
the problem (1.12) then the necessary optimality condition of rst order
in Theorem 1.14 means that
T

(y
0
; r) 0 r satisfying G
i
(y
0
)r 0, i : G
i
(y
0
) = 0.
This property is usually called Bouligand stationarity (or B-stationarity)
of the point y
0
.
4.2 Using the KKT conditions
If the Karush-Kuhn-Tucker conditions are applied to replace the lower
level problem by a system of equations and inequalities, problem (1.11)
is obtained. The Example 1.11 shows that it is possible to obtain neces-
sary optimality conditions for the bilevel programming problem by this
Bilevel Programming - A Survey 13
approach only in the case when the lower level problem is a convex para-
metric one and also only using the optimistic position. But even in this
case this is not so easy since the familiar regularity conditions are not
satised for this problem.
Theorem 1.15 (Scheel and Scholtes (2000)) For problem (1.11)
the Mangasarian-Fromowitz constraint qualication (MFCQ) is violated
at every feasible point.
To circumvent the resulting diculties for the construction of Karush-
Kuhn-Tucker type necessary optimality conditions for the bilevel pro-
gramming problem, in Scheel and Scholtes (2000) a nonsmooth version
of the KKT reformulation of the bilevel programming problem in the
optimistic case is constructed:
F(x, y) min
x,y,
G(y) 0

x
L(x, y, ) = 0
ming(x, y), = 0.
(1.14)
Here, for a, b R
n
, the formula mina, b = 0 is understood component
wise.
For problem (1.14) the following generalized variant of the linear in-
dependence constraint qualication can be dened (Scholtes and Stohr
(2001)):
(PLICQ) The piecewise linear independence constraint qualication
is satised for the problem (1.14) at a point (x
0
, y
0
,
0
) if the gra-
dients of all the vanishing components of the constraint functions
G(y),
x
L(x, y, ), g(x, y), are linearly independent.
Problem (1.14) can be investigated by considering the following patch-
work of nonlinear programs for xed sets I:
F(x, y) min
x,y,
G(y) 0

x
L(x, y, ) = 0
g
i
(x, y) = 0, for i I
g
i
(x, y) 0, for i , I

i
= 0, for i , I

i
0, for i I.
(1.15)
Then, the piecewise linear independence constraint qualication is valid
for problem (1.14) at some point (x
0
, y
0
,
0
) if and only if it is satised
for each of the problems (1.15) for all sets J(
0
) I I(x
0
, y
0
).
14
The following theorem says that the (PLICQ) is generically satised.
For this dene the set
H
l
B
= (F, G, f, g) C
l
(R
m+n
, R
1+s+1+p
) : (PLICQ) is satised
at each feasible point of (1.14) with ||

B
for an arbitrary constant 0 < B < , ||

= max[
i
[ : 1 i p is
the L

-norm of a vector R
p
and l 2.
Theorem 1.16 (Scholtes and St ohr (2001)) For 2 k l, the
set H
l
B
is open in the C
k
-topology (Hirsch (1994)). Moreover, for l > m,
the set H
l
B
is also dense in the C
k
-topology for all 2 k l.
Now, after this excursion to regularity, the description of necessary op-
timality conditions for the bilevel programming problem with convex
lower level problems using the optimistic position is continued. For the
origin of the following theorem for mathematical programs with equilib-
rium constraints see Scheel and Scholtes (2000). There a relaxation of
problem (1.11) is considered:
F(x, y) min
x,y,,

x
L(x, y, , ) = 0
g
i
(x, y) = 0 for g
i
(x
0
, y
0
) = 0

i
= 0 for
0
i
= 0
g
i
(x, y) 0 for g
i
(x
0
, y
0
) < 0

i
0 for
0
i
> 0
G(y) 0.
(1.16)
In the following theorem, the following more restrictive regularity con-
dition than (MFCQ) is needed
(SMFCQ) The strict Mangasarian-Fromowitz constraint qualication
(SMFCQ) is satised at x
0
for problem (1.7) if there exists a Lagrange
multiplier (, ),
0,

(x
0
) = 0, (x
0
) +

(x
0
) +

(x
0
) = 0,
as well as a direction d satisfying

i
(x
0
)d < 0, for each i with
i
(x
0
) =
i
= 0,

i
(x
0
)d = 0, for each i with
i
> 0,

j
(x
0
)d = 0, for each j
and
i
(x
0
) :
i
> 0)
x

j
(x
0
) : j = 1, . . . , q are linearly indepen-
dent.
Bilevel Programming - A Survey 15
Note that this condition is implied by (PLICQ).
Theorem 1.17 Let (x
0
, y
0
,
0
) be a local minimizer of problem (1.14)
and use z
0
= (x
0
, y
0
).
If the (MFCQ) is valid for problem (1.16) at (x
0
, y
0
,
0
), then there
exist multipliers (, , , ) satisfying
F(z
0
) +

(0,
y
G(y
0
)) +(
x
L(z
0
,
0
)) +

g(z
0
) = 0

x
g(z
0
) = 0
g
i
(z
0
)
i
= 0, i

0
i

i
= 0, i

i
0, i K

G(y
0
) = 0,
0,
where K = i : g
i
(x
0
, y
0
) =
0
i
= 0.
If the (SMFCQ) is fullled for the problem (1.16), then there exist
unique multipliers (, , , ) solving the last system of equations
and inequalities with
i

i
0, i K being replaced by

i
0,
i
0, i K.
For related optimality conditions see e.g. Flegel and Kanzow (2002);
Flegel and Kanzow (2003).
5. Solution algorithms
5.1 Implicit function approach
Again, to solve the bilevel programming problem, it is reformulated
as a one-level problem. The rst approach again uses the implicit deter-
mined solution function of the convex lower level problem x(y) provided
this function is uniquely determined. If the assumptions (C), (MFCQ),
(SSOC), and (CRCQ) are satised at every point y with G(y) 0, then
the resulting problem
minF(x(y), y) : G(y) 0
has an objective function being piecewise continuously dierentiable (see
Ralph and Dempe (1995)). The pieces of the solution function x(y) are
obtained by replacing some of the active inequalities g
i
(x, y) 0, i I
in the lower level problem by equations g
i
(x, y) = 0, i I, where
J(
0
) := i :
0
i
> 0 I I(x(y
0
), y
0
) := j : g
j
(x(y
0
), y
0
) = 0
16
and
0
is a Lagrange multiplier vector in the lower level problem cor-
responding to the optimal solution x(y
0
) for y = y
0
. If the constraints
g
i
(x, y) 0 in problem (1.1) are locally replaced by g
i
(x, y) = 0, i I,
the resulting lower level problems are
min
x
f(x, y) : g
i
(x, y) = 0, i I. (1.17)
If the gradients
x
g
i
(x(y
0
), y
0
) : i I are moreover linearly indepen-
dent (which can be guaranteed for small sets I J(
0
) with
0
being
a vertex in (x(y
0
), y
0
)), then the optimal solution function x
I
() of the
problem (1.17) is dierentiable (Fiacco (1983)). Let 1 denote the family
of all index sets determined by the above two demands for all vertices

0
(x(y
0
), y
0
).
Theorem 1.18 (Dempe and Pallaschke (1997)) Consider problem
(1.1) at the point (x(y
0
), y
0
) and let (MFCQ), (SSOC) and (CRCQ) be
satised there. If the condition
(FRR) For each vertex
0
(x(y
0
), y
0
) the matrix


2
xx
L(x(y
0
), y
0
,
0
)

x
g
J(
0
)
(z
0
)
2
yx
L(x(y
0
), y
0
,
0
)

x
g
I(x(y
0
),y
0
)
(x(y
0
), y
0
) 0
y
g
I(x(y
0
),y
0
)
(x(y
0
), y
0
)

has full row rank n +[I(x(y


0
), y
0
)[
is valid, then the generalized derivative of the function x() at the point
y = y
0
in the sense of Clarke (1983) is
x(y
0
) = conv

II
x
I
(y
0
).
Using this formula, a bundle algorithm (cf. Outrata at al. (1998)) can
be derived to solve the resulting problem (1.12).
Since the full description of bundle algorithms is rather lengthy, the
interested reader is referred e.g. to Outrata at al. (1998). Repeating the
results in Schramm (1989) (cf. also Outrata at al. (1998)) the following
result is obtained:
Theorem 1.19 (Dempe (2002)) If the assumptions (C), (MFCQ),
(CRCQ), (SSOC), and (FRR) are satised for the convex lower level
problem (1.1) at all points (x, y), x (y), G(y) 0, and the sequence
of iteration points (x(y
k
), y
k
,
k
)

k=1
in the bundle algorithm remains
bounded, then this algorithm computes a sequence (x(y
k
), y
k

k
)

k=1
having at least one accumulation point (x(y
0
), y
0
,
0
) with
0
x
F(x(y
0
), y
0
)x(y
0
) +
y
F(x(y
0
), y
0
).
Bilevel Programming - A Survey 17
If assumption (FRR) is not satised, then the point (x(y
0
), y
0
) is pseu-
dostationary in the sense of Mikhalevich et al. (1987).
Hence, under suitable assumptions the bundle algorithm computes a
Clarke stationary point. Such points are in general not Bouligand sta-
tionary.
5.2 A smoothing method
If the lower level problem (1.1) is a convex prarametric problem for
which, at every feasible parameter value y with G(y) 0, a constraint
qualication is satised, then the optimistic problem (1.6) can be re-
placed equivalently by the problem (1.11). By Theorem 1.9 every opti-
mistic optimal solution of the bilevel programming problem correspon-
des to an optimal solution of each of the problems (1.11). To solve this
problem several authors (e.g. Fukushima and Pang (1999)) use a NCP
function approach to replace the complementarity constraints. This re-
sults in the nondierentiable problem
F(x, y) min
x,y,
G(y) 0

x
L(x, y, ) = 0
(g
i
(x, y),
i
) = 0, i = 1, . . . , p,
(1.18)
where a function (, ) satisfying
(a, b) = 0 a 0, b 0, ab = 0
is called a NCP function. Examples and properties of NCP functions can
be found in the book of Geiger and Kanzow (2003). NCP functions are
inherently nondierentiable, and algorithms solving problem (1.18) use
smoothed NCP functions. Fukushima and Pang (1999) use the function

(a, b) = a +b

a
2
+b
2
+, > 0
and solve the resulting problems
F(x, y) min
x,y,
G(y) 0

x
L(x, y, ) = 0

(g
i
(x, y),
i
) = 0, i = 1, . . . , p,
(1.19)
for 0 with suitable standard algorithms. Hence selecting an
arbitrary sequence
k

k=1
they compute a sequence of solutions
18
(x
k
, y
k
,
k
)

k=1
and investigate the properties of the accumulation
points of this sequence.
To formulate their convergence result, the assumption of week nonde-
generacy is needed. To formulate this assumption consider the Clarke
derivative of the function (g
i
(x, y),
i
). This Clarke derivative exists
and is contained in the set
C
i
(x, y, ) = r =
i
(g
i
(x, y), 0)+
i
(0, 0, 1) : (1
i
)
2
+(1
i
)
2
1.
Let the point (x, y, ) be an accumulation point of the sequence
(x
k
, y
k
,
k
)

k=1
. It is then easy to see that, for each i such that
g
i
(x, y) =
i
= 0 any accumulation point of the sequence

k
(g
i
(x
k
, y
k
),
k
i
)

k=1
belongs to C
i
(x, y, ), hence is of the form
r =
i
(g
i
(x, y), 0) +
i
(0, 0, 1)
with (1
i
)
2
+ (1
i
)
2
1. Then, it is said that the se-
quence (x
k
, y
k
,
k
)

k=1
is asymptotically weakly nondegenerate, if in
this formula neither
i
nor
i
vanishes for any accumulation point of
(x
k
, y
k
,
k
)

k=1
. Roughly speaking this means that both g
i
(x
k
, y
k
) and

k
i
approach zero in the same order of magnitude (see Fukushima and
Pang (1999)).
Theorem 1.20 (Fukushima and Pang (1999)) Let for each point
(x
k
, y
k
,
k
) the necessary optimality conditions of second order for prob-
lem (1.19) be satised. Suppose that the sequence (x
k
, y
k
,
k
)

k=1
con-
verges to some (x, y, ) for k . If the (PLICQ) holds at the limit
point and the sequence (x
k
, y
k
,
k
)

k=1
is asymptotically weakly non-
degenerate, then (x, y, ) is a Bouligand stationary solution for problem
(1.11).
5.3 SQP methods
In recent times several authors have reported (in view of the violated
regularity condition rather surprisingly) a good behavior of SQP meth-
ods for solving mathematical programs with equilibrium constraints (see
Anitescu (2002); Fletcher et al. (2002); Fletcher and Leyer (2002)).
To scetch these results consider a bilevel programming problem (1.6)
with a convex parametric lower level problem (1.1) and assume that a
regularity assumption is satised for each xed parameter value y with
G(y) 0. Then, by Theorem 1.9, a locally optimal solution of the bilevel
programming problem corresponds to a locally optimal for the problem
Bilevel Programming - A Survey 19
(1.11). Consequently, in order to compute local minima of the bilevel
problem, problem (1.11) can be solved.
In doing this, Anitescu (2002) uses the elastic mode approach in a se-
quantial quadratic programming algorithm solving (1.11). This means
that if a quadratic programming problem minimizing a quadratic ap-
proximation of the objective function of problem (1.11) subject to a
linear approximation of the constraints of this problem has a feasible so-
lution with bounded Lagrange multipliers then the solution of this prob-
lem is used as a search direction. And if not, a regularized quadratic
programming problem is used to compute this search direction.
For simplicity, this idea is described for problem (1.7). Then this
means that the following problem is used to compute this seach direction:
(x)d +d

Wd min
d

i
(x) +(x)d 0, i = 1, . . . , p

j
(x) +(x)d = 0, j = 1, . . . , q.
Here, W can be the Hessian matrix of the Lagrange function of the prob-
lem (1.7) or another positive denite matrix approximating this Hessian.
If this problem has no feasible solution or unbounded Lagrange multi-
pliers the solution of problem (1.7) (or accordingly the solution process
for the problem (1.11)) with the sequential quadratic programming ap-
proach is replaced by the solution of the following problem by the same
approach:
min
x,
(x)+c :
i
(x) , i = 1, . . . , p,
j
(x) , j = 1, . . . , q,
where c is a suciently large constant. This is the elastic mode SQP
method.
To implement the idea of Anitescu (2002) assume that the problem
(1.16) satises the (SMFCQ) and that the quadratic growth condition at
a point x = x
0
(QGC) There exists > 0 satisfying
max(x) (x
0
),
i
(x), i, [
j
(x)[, j |x x
0
|
for all x in some open neighborhood of x
0
is valid for problem (1.11) at a locally optimal solution of this problem.
Theorem 1.21 (Anitescu (2002)) If the above two assumtions are
satised then the elastic mode sequantial quadratic programming algo-
rithm computes a locally optimal solution of the problem (1.11) provided
it is started suciently close to that solution and the constant c is suf-
ciently large.
20
Using stronger assumptions Fletcher et al. (2002) have even been able
to prove local Q-quadratic convergence of sequential quadratic program-
ming algorithms to solutions of (1.11).
6. Discrete bilevel programming
If discreteness demands are added to the lower or upper levels of a
bilevel programming problem the investigation becomes more dicult
and the number of references is rather small, see Dempe (2003). With
respect to the existence of optimal solutions the location of the dis-
creteness demand is important Vicente et al. (1996). Most dicult is
the situation when the lower level problem is a parametric discrete one
and the upper level problem is a continuous problem. Then the graph
of the solution set mapping () is in general neither closed nor open.
The situation is more or less similar to the continuous case if the lower
level problem is a continuous parametric optimization problem or if the
discreteness demands are situated in both levels.
One way to solve discrete optimization problems (and also bilevel
programming problems) is branch-and-bound. Here a second diculty
for the solution of discrete bilevel programming problems appeares: the
usual fathomimg procedure becomes wrong, see Moore and Bard, (1990).
If the integrality conditions in both levels are dropped at the beginning
and are introduced via the branching procedure, then a global optimal
solution of the relaxed problem, which occasionally proves to be feasible
for the bilevel problem is in general not an optimal solution for the bilevel
programming problem.
Known solution methods include one being based on the investigation
of the solution set mapping if the lower level problem is a right-hand side
parametrized Boolean knapsack problem and another one using cutting
planes in the discrete lower level problem with parameters in the objec-
tive function only (see Dempe (2002)).
To describe a third approach consider a linear bilevel programming
problem with upper level discreteness demands only:
c

1
x +d

1
y max
x,y
A
1
x b
1
, x 0, integer
where y solves
d

2
y max
y
A
2
x +B
2
y = b
2
y 0
(1.20)
Then, an idea of White and Anandalingam (1993) can be used to trans-
form this problem into a mixed discrete optimization problem. For this,
Bilevel Programming - A Survey 21
apply the Karush-Kuhn-Tucker conditions to the lower level problem.
This transforms problem (1.20) into
c

1
x +d

1
y max
x,y,
A
1
x b
1
, x 0, integer
B

2
b
2
,
A
2
x +B
2
y = b
2
y 0, y

(B

2
d
2
) = 0.
(1.21)
Now use a penalty function approach to get rid of the complementarity
constraint resulting in the problem
c

1
x +d

1
y Ky

(B

2
d
2
) max
x,y,
A
1
x b
1
, x 0, integer
A
2
x +B
2
y = b
2
, y 0
B

2
b
2
.
(1.22)
By application of the results in White and Anandalingam (1993) the
following is obtained:
Theorem 1.22 Assume that problem (1.22) has an optimal solution for
some positive K
0
. Then, the problem (1.22) describes an exact penalty
function approach for problem (1.20), i.e. there is a number K

such that
the optimal solutions of the problems (1.22) and (1.20) for all K K

coincide.
This idea has been used in Dempe and Kalashnikov, (2002) to solve
an application problem in gas industry. Moreover, the implications of
a movement of the discreteness condition from the lower to the upper
level problems has been touched there.
7. Conclusion
In the paper a survey of results in bilevel programming has been given.
It was not the intention of the author to give a detailled description of
one or two results but rather to give an overview over dierent directions
of research and to describe some of the challenges of this topic. Since
bilevel programming is a very living area a huge number of questions
remain open. These include optimality conditions as well as solution
algorithms for problems with nonconvex lower level problems, discrete
bilevel programming problems in every context, many questions related
to the investigation of pessimistic bilevel programming problems to call
only some of them. Also, one implication from ^T-hardness often used
in theory is it that such problems should also be solved with approxima-
tion algorithms which, if possible, should be complemented by a bound
22
on the accuracy of the computed solution. One example for such an
approximation algorithm can be found in Marcotte (1986) but in gen-
eral the description of such algorithms is a challenging task for future
research.
References
G. Anandalingam and T. Friesz (eds.). Hierarchical optimization. Annals
of Operations Research, vol. 24, 1992.
M. Anitescu. On solving mathematical programs with complementarity
constraints as nonlinear programs. Technical Report Nr. ANL/NCS-
P864-1200, Department of Mathematics, University of Pittsburgh,
2002.
C. Audet, P. Hansen, B. Jaumard and G. Savard. Links between linear
bilevel and mixed 0-1 programming problems, Journal of Optimization
Theory and Applications, 93: 273-300, 1997.
J.F. Bard. Practical Bilevel Optimization: Algorithms and Applications.
Kluwer Academic Publishers, Dordrecht, 1998.
F.H. Clarke. Optimization and Nonsmooth Analysis. John Wiley & Sons,
New York, 1983.
S. Dempe. A necessary and a sucient optimality condition for bilevel
programming problems. Optimization, 25: 341354, 1992.
S. Dempe. Foundations of Bilevel Programming. Kluwer Academie Pub-
lishers, Dordrecht, 2002.
S. Dempe. Annotated Bibliography on Bilevel Programming and Math-
ematical Programs with Equilibrium Constraints. Optimization, 52:
333-359, 2003.
S. Dempe and V. Kalashnikov. Discrete bilevel programming: application
to a gas shippers problem. Preprint Nr. 2002-02, TU Bergakademie
Freiberg, Fakultat f ur Mathematik und Informatik, 2002.
S. Dempe and D. Pallaschke. Quasidierentiability of optimal solutions
in parametric nonlinear optimization. Optimization, 40: 1-24, 1997.
X. Deng. Complexity issues in bilevel linear programming. In: A.
Migdalas, P.M. Pardalos and P. Varbrand (eds.), Multilevel Opti-
mization: Algorithms and Applications, pp. 149-164, Kluwer Academic
Publishers, Dordrecht, 1998.
A. V. Fiacco. Introduction to Sensitivity and Stability Analysis in Non-
linear Programming. Academic Press, New York, 1983.
M. L. Flegel and C. Kanzow. Optimality conditions for mathematical
programs with equilibrium constraints: Fritz John and Abadie-type ap-
proaches. Report, Universitat W urzburg, Germany, 2002.
Bilevel Programming - A Survey 23
M. L. Flegel and C. Kanzow. A Fritz John approach to rst order op-
timality conditions for mathematical programs with equilibrium con-
straints. Optimization, 52: 277-286, 2003.
R. Fletcher and S. Leyer. Numerical experience with solving MPECs as
NLPs. Numerical Analysis Report Nr. NA/210, Department of Math-
ematics, University of Dundee, UK, 2002.
R. Fletcher, S. Leyer, D. Ralph and S. Scholtes. Local convergence of
SQP methods for mathematical programs with equilibrium constraints.
Numerical Analysis Report NA/209, Department of Mathematics,
University of Dundee, UK, 2002.
J. Fliege and L. N. Vicente. A bicriteria approach to bilevel optimization,
Technical Report, Fachbereich Mathematik, Universitat Dortmund,
Germany, 2003.
A. Frangioni. On a new class of bilevel programming problems and its
use for reformulating mixed integer problems, European Journal of
Operational Research, 82: 615-646, 1995.
M. Fukushima and J.-S. Pang. Convergence of a smoothing continua-
tion method for mathematical programs with complementarity con-
straints. In: M. Thera and R. Tichatschke (eds.), Ill-posed Varia-
tional Problems and Regularization Techniques. Number 477 in Lec-
ture Notes in Economics and Mathematical Systems. Berlin et al.:
Springer, 1999.
J. F ulop. On the equivalence between a linear bilevel programming prob-
lem and linear optimization over the ecient set. Working Paper No.
WP 93-1, Laboratory of Operations Research and Decision Systems,
Computer and Automation Institute, Hungarian Academy of Sciences,
1993.
C. Geiger and C. Kanzow. Theorie und Numerik restringierter Opti-
mierungsaufgaben. Berli et al.: Springer, 2002.
J. Guddat, F. Guerra Vasquez and H.Th. Jongen. Parametric Optimiza-
tion: Singularities, Pathfollowing and Jumps. John Wiley & Sons,
Chichester and B.G. Teubner, Stuttgart, 1990.
P. Hansen, B. Jaumard and G. Savard. New branch-and-bound rules for
linear bilevel programming. SIAM Journal on Scientic and Statistical
Computing. 13: 11941217, 1992.
P.T. Harker and J.-S. Pang. Existence of optimal solutions to mathe-
matical programs with equilibrium constraints. Operations Research
Letters, 7: 61-64, 1988.
A. Haurie, G. Savard and D. White. A note on: an ecient point al-
gorithm for a linear two-stage optimization problem. Operations Re-
search, 38: 553555, 1990.
M. W. Hirsch. Dierential Topology, Springer-Verlag, Berlin et al., 1994.
24
M. Kojima. Strongly stable stationary solutions in nonlinear programs.
In: S.M. Robinson (ed.), Analysis and Computation of Fixed Points,
pp. 93138, Academic Press, New York, 1980.
R. Lucchetti, F. Mignanego and G. Pieri, Existence theorem of equilib-
rium points in Stackelberg games with constraints. Optimization, 18:
857-866, 1987.
C.M. Macal and A.P. Hurter. Dependence of bilevel mathematical pro-
grams on irrelevant constraints. Computers and Operations Research,
24: 1129-1140, 1997.
P. Marcotte. Network design problem with congestion eects: a case of
bilevel programming. Mathematical Programming, 34: 142162, 1986.
P. Marcotte and G. Savard. A note on the Pareto optimality of solutions
to the linear bilevel programming problem. Computers and Operations
Research, 18: 355359, 1991.
A. Migdalas, P.M. Pardalos and P. Varbrand (eds.) Multilevel Optimiza-
tion: Algorithms and Applications. Kluwer Academic Publishers, Dor-
drecht, 1998.
V. S. Mikhalevich and A. M. Gupal and V. I. Norkin. Methods of Non-
convex Optimization. Nauka, Moscow, 1987 (in Russian).
J. A. Mirrlees. The theory of moral hazard and unobservable bevaviour:
part I. Review of Economic Studies, 66: 3-21, 1999.
J. Moore and J. F. Bard. The mixed integer linear bilevel programming
problem. Operations Research, 38: 911921, 1990.
L. D. Muu. On the construction of initial polyhedral convex set for op-
timization problems over the ecient set and bilevel linear programs.
Vietnam Journal of Mathematics, 28: 177-182, 2000.
J. Outrata and M. Kocvara and J. Zowe. Nonsmooth Approach to Op-
timization Problems with Equilibrium Constraints. Kluwer Academic
Publishers, Dordrecht, 1998.
D. Ralph and S. Dempe. Directional Derivatives of the Solution of a
Parametric Nonlinear Program. Mathematical Programming, 70: 159-
172, 1995.
H. Scheel and S. Scholtes. Mathematical programs with equilibrium con-
straints: stationarity, optimality, and sensitivity. Mathematics of Op-
erations Research, 25: 1-22, 2000.
S. Scholtes and M. Stohr. How stringent is the linear independence as-
sumption for mathematical programs with stationarity constraints?.
Mathematics of Operations Research, 26: 851-863, 2001.
H. Schramm. Eine Kombination von bundle- und trust-region-Verfahren
zur Losung nichtdierenzierbarer Optimierungsprobleme. Bayreuther
Mathematische Schriften, Bayreuth, No. 30, 1989.
Bilevel Programming - A Survey 25
L. N. Vicente, G. Savard and J. J. Judice. The discrete linear bilevel
programming problem. Journal of Optimization Theory and Applica-
tions, 89: 597-614, 1996.
S. Vogel. Zwei-Ebenen-Optimierungsaufgaben mit nichtkonvexer Ziel-
funktion in der unteren Ebene: Pfadverfolgung und Spr unge. PhD the-
sis, Technische Universitat Bergakademie Freiberg, 2002.
D. J. White and G. Anandalingam. A penalty function approach for
solving bi-level linear programs. Journal of Global Optimization, 3:
397-419, 1993.

You might also like