Nothing Special   »   [go: up one dir, main page]

Global Convergence of A Trust-Region Sqp-Filter Algorithm For General Nonlinear Programming

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

SIAM J. OPTIM.

Vol. 13, No. 3, pp. 635659

c 2002 Society for Industrial and Applied Mathematics




GLOBAL CONVERGENCE OF A TRUST-REGION SQP-FILTER


ALGORITHM FOR GENERAL NONLINEAR PROGRAMMING
ROGER FLETCHER , NICHOLAS I. M. GOULD , SVEN LEYFFER ,

PHILIPPE L. TOINT , AND ANDREAS WACHTER


Abstract. A trust-region SQP-lter algorithm of the type introduced by Fletcher and Leyer
[Math. Program., 91 (2002), pp. 239269] that decomposes the step into its normal and tangential
components allows for an approximate solution of the quadratic subproblem and incorporates the
safeguarding tests described in Fletcher, Leyer, and Toint [On the Global Convergence of an SLPFilter Algorithm, Technical Report 98/13, Department of Mathematics, University of Namur, Namur,
Belgium, 1998; On the Global Convergence of a Filter-SQP Algorithm, Technical Report 00/15,
Department of Mathematics, University of Namur, Namur, Belgium, 2000] is considered. It is proved
that, under reasonable conditions and for every possible choice of the starting point, the sequence of
iterates has at least one rst-order critical accumulation point.
Key words. nonlinear optimization, sequential quadratic programming, lter methods, convergence theory
AMS subject classications. 90C30, 65K05
PII. S1052623499357258

1. Introduction. We analyze an algorithm for solving optimization problems


where a smooth objective function is to be minimized subject to smooth nonlinear
constraints. No convexity assumption is made. More formally, we consider the problem
(1.1)

minimize f (x)
subject to cE (x) = 0,
cI (x) 0,

where f is a twice continuously dierentiable real valued function of the variables


x Rn and cE (x) and cI (x) are twice continuously dierentiable functions from Rn
into Rm and from Rn into Rp , respectively. Let c(x)T = (cE (x)T cI (x)T ).
The class of algorithms that we discuss belongs to the class of trust-region methods
and, more specically, to that of lter methods introduced by Fletcher and Leyer [18],
in which the use of a penalty function, a common feature of the large majority of the
algorithms for constrained optimization, is replaced by the introduction of a so-called
lter.
A global convergence theory for an algorithm of this class is proposed by Fletcher,
Leyer, and Toint in [19], in which the objective function is locally approximated by a
linear function, leading, at each iteration, to the (exact) solution of a linear program.
Received by the editors June 14, 1999; accepted for publication (in revised form) March 12, 2002;
published electronically November 6, 2002.
http://www.siam.org/journals/siopt/13-3/35725.html
Department of Mathematics, University of Dundee, Dundee, Scotland (etcher@maths.dundee.
ac.uk, sleyer@maths.dundee.ac.uk).
Computational Science and Engineering Department, Rutherford Appleton Laboratory, Chilton,
Oxfordshire OX11 0QX, England (gould@rl.ac.uk).
Department of Mathematics, University of Namur, Namur, Belgium (philippe.toint@fundp.ac.
be).
Mathematical Sciences Department, IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598. Current address: Department of Chemical Engineering, Carnegie Mellon
University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (andreasw@andrew.cmu.edu).

635


FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

636

This algorithm therefore mixes the use of the lter with sequential linear programming (SLP). This approach was generalized by the same authors in [20], where the
objective function is approximated by a quadratic model, which results in a sequential
quadratic programming (SQP) technique in which each quadratic program must be
solved globally. In this paper, we again consider approximating the objective function
by a quadratic model, but, at variance with the latter reference, the method discussed
here does not require the global solution of the associated nonconvex quadratic programming (QP) subproblem, which is known to be a theoretically dicult processit
is known to be NP hard (see Murty and Kabadi [26]). The algorithm analyzed here
also has a dierent mechanism for deciding on the compatibility of this subproblem
and allows for an approximate subproblem solution.
2. A class of trust-region SQP-lter algorithms.
2.1. An approximate SQP framework. SQP methods are iterative. At a
given iterate xk , they implicitly apply Newtons method to solve (a local version of)
the rst-order necessary optimality conditions by solving the QP subproblem QP(xk )
given by
(2.1)

minimize fk + gk , s + 12 s, Hk s


subject to cE (xk ) + AE (xk )s = 0,
cI (xk ) + AI (xk )s 0,
def

where fk = f (xk ), gk = g(xk ) = x f (xk ), where AE (xk ) and AI (xk ) are the Jacobians of the constraint functions cE and cI at xk , and where Hk is a symmetric
matrix. We will not immediately be concerned about how Hk is obtained, but we will
return to this point in section 3. Assuming that a suitable matrix Hk can be found,
the solution of QP(xk ) then yields a step sk . If sk = 0, then xk is rst-order critical
for problem (1.1).
Unfortunately, due to the locally convergent nature of Newtons iteration, the
step sk may not always be very good. One possible way to cope with this diculty
is to dene an appropriate merit function whose value decreases with the goodness
of sk , which is where penalty functions typically play a role. A trust-region or a
linesearch method is then applied to minimize this merit function, ensuring global
convergence under reasonable assumptions. However, as one of our objectives is to
avoid penalty functions (and the need to update the associated penalty parameter),
we instead consider a trust-region approach that will not use any penalty function.1
In such an approach, the objective function of QP(xk ) is intended to be only of local
interest; that is, we restrict the step sk in the norm to ensure that xk + sk remains in
a trust-region centered at xk . In other words, we replace QP(xk ) by the subproblem
TRQP(xk , k ) given by
(2.2)

minimize mk (xk + s)
subject to cE (xk ) + AE (xk )s = 0,
cI (xk ) + AI (xk )s 0,
and
s k

for some (positive) value of the trust-region radius k , where we have dened
(2.3)

1
mk (xk + s) = fk + gk , s + s, Hk s
2

1 Recently, W
achter and Biegler [31] have proposed a linesearch variant of the ideas described in
this paper.

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

637

and where   denotes the Euclidean norm. This latter choice is purely for ease
of exposition. We could equally use a family of iteration-dependent norms  k , so
long as we require that all members of the family are uniformly equivalent to the
Euclidean norm. The interested reader may verify that all subsequent developments
can be adapted to this more general case by introducing the constants implied by this
uniform equivalence wherever needed.
Remarkably, most early SQP algorithms assume that an exact local solution of
QP(xk ) or TRQP(xk , k ) is found, although attempts have been made by Dembo
and Tulowitzki [8] and Murray and Prieto [25] to design conditions under which
an approximate solution of the subproblem is acceptable. We revisit this issue in
what follows, and start by noting that the step sk may be viewed as the sum of two
distinct components, a normal step nk , such that xk + nk satises the constraints
of TRQP(xk , k ), and a tangential step tk , whose purpose is to obtain reduction of
the objective functions model while continuing to satisfy those constraints. This
framework is therefore similar in spirit to the composite-step SQP methods pioneered
by Vardi [30], Byrd, Schnabel, and Shultz [5], and Omojokun [27], and later developed
by several authors, including Biegler, Nocedal, and Schmid [1], El-Alem [12, 13], Byrd,
Gilbert, and Nocedal [3], Byrd, Hribar, and Nocedal [4], Bielschowsky and Gomes [2],
Liu and Yuan [23], and Lalee, Nocedal, and Plantenga [22]. More formally, we write
(2.4)

sk = nk + tk

and assume that


(2.5)
(2.6)

cE (xk ) + AE (xk )nk = 0,

cI (xk ) + AI (xk )nk 0,

sk  k ,

and
(2.7)

cE (xk ) + AE (xk )sk = 0,

cI (xk ) + AI (xk )sk 0.

Of course, this is a strong assumption, since in particular (2.5) or (2.6)/(2.7) may not
have a solution. We shall return to this possibility shortly. Given our assumption,
there are many ways to compute nk and tk . For instance, we could compute nk from
(2.8)

nk = Pk [xk ] xk ,

where Pk is the orthogonal projector onto the feasible set of QP(xk ). In what follows,
we do not make any specic choice for nk , but we shall make the assumptions that
nk exists when the maximum violation of the nonlinear constraints at the kth iterate
def
k = (xk ), where


(2.9)
(x) = max 0, max |ci (x)|, max ci (x)
iE

iI

is suciently small, and that nk is then reasonably scaled with respect to the values
of the constraints. In other words, we assume that
(2.10)

nk exists and nk  usc k whenever k n

for some constants usc > 0 and n > 0. This assumption is also used by Dennis,
El-Alem, and Maciel [9] and Dennis and Vicente [11] in the context of problems only


FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

638

involving equality constraints. We can interpret it in terms of the constraint functions


themselves and the geometry of the boundary of the feasible set. For instance, if we
dene the linearized feasible set at x by
def

L(x) = {v Rn | cE (x) + AE (x)(v x) = 0, cI (x) + AI (x)(v x) 0}


and assume that, at every limit point x of the sequence of iterates, the relative
interior of the linearized constraints ri{L(x )} is nonempty and that the active set
settles, in that A(xki ) = A(x ) for suciently large ki with limi xki = x , we know,
by applying a continuity argument, that the feasible set of QP(xk ) is nonempty for
such a k, which implies that Pk is well dened and that a normal step nk of the form
(2.8) exists. Furthermore, if the singular values of the Jacobian of constraints active
at x , AA(x ) (x ), are nonzero, those of AA(x ) (xk ) must be bounded away from zero
by continuity in a neighborhood of x . Since only the constraints active at x can be
active in a suciently small neighborhood of this limit point, this in turn guarantees
that (2.10) holds for the normal step

1
ATA(x ) (xk ) AA(x ) (xk )ATA(x ) (xk ) cA(x ) (xk )
for all k suciently large, provided that the sequence of iterates remains bounded,
because this latter assumption ensures that xk must be arbitrarily close to a least one
limit point of the sequence {xk } for such k. Thus we see that (2.10) does not impose
conditions on the constraints or the normal step itself that are unduly restrictive.
Having dened the normal step, we are in position to use it if it falls within the
trust-region, that is, if nk  k . In this case, we write
(2.11)

xNk = xk + nk

and observe that nk satises the constraints of TRQP(xk , k ) and thus also of QP(xk ).
It is crucial to note, at this stage, that such an nk may fail to exist because the
constraints of QP(xk ) may be incompatible, in which case Pk is undened, or because
all feasible points for QP (xk ) may lie outside the trust-region.
Let us continue to consider the case where this problem does not arise, and a
normal step nk has been found with nk  k . We then have to nd a tangential
step tk , starting from xNk and satisfying (2.6) and (2.7), with the aim of decreasing the
value of the objective function. As always in trust-region methods, this is achieved by
computing a step that produces a sucient decrease in mk , which is to say that we
wish mk (xNk ) mk (xk + sk ) to be suciently large. Of course, this is only possible
if the maximum size of tk is not too small, which is to say that xNk is not too close to
the trust-region boundary. We formalize this supposition by replacing our condition
that nk  k with the stronger requirement that
(2.12)

nk  k min[1, k ]

for some (0, 1], some > 0, and some (0, 1). If condition (2.12) does
not hold, we assume that the computation of tk is unlikely to produce a satisfactory
decrease in mk , and proceed just as if the feasible set of TRQP(xk , k ) were empty. If
nk can be computed and (2.12) holds, we shall say that TRQP(xk , k ) is compatible.
In this case at least a sucient model decrease seems possible, which we state in the
form of a familiar Cauchy-point condition. In order to formalize what we mean, we

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

639

recall that the feasible set of QP(xk ) is convex, and we can therefore introduce the
measure











k = 
(2.13)
min
gk + Hk nk , t
AE (xk )t=0


cI (xk )+AI (xk )(nk +t)0



t 1
(see Conn et al. [6]), which we will use to deduce rst-order criticality for our problem
(see Lemma 3.2). Note that this function is zero if the origin is a rst-order critical
point of the tangential problem
(2.14)

minimize gk + Hk nk , t + 12 Hk t, t


subject to AE (xk )t = 0,
cI (xk ) + AI (xk )(nk + t) 0,

which is, up to the constant term 12 nk , Hk nk , equivalent to QP(xk ) with s = nk + t.


Our sucient decrease condition is then to require that, whenever TRQP(xk , k ) is
compatible,


k
mk (xNk ) mk (xNk + tk ) tmd k min
(2.15)
, k
k
for some constant tmd > 0, where k = 1 + Hk . We know from Toint [29] and
Conn et al. [6] that this condition holds if the model reduction exceeds that which
would be obtained at the generalized Cauchy point, that is, the point resulting from
a backtracking curvilinear search along the projected gradient path from xNk , that is,
xk () = Pk [xNk x mk (xNk )].
This technique therefore provides an implementable algorithm for computing a step
that satises (2.15) (see Gould, Hribar, and Nocedal [21] for an example in the case
where c(x) = cE (x), or Toint [29] and More and Toraldo [24] for the case of bound
constraints), but, of course, reduction of mk beyond that imposed by (2.15) is often
possible and desirable if fast convergence is sought. Also note that the minimization
problem of the right-hand side of (2.13) would reduce to a linear programming problem
if we had chosen to use a polyhedral norm in its denition at iteration k.
Let us now return to the case where TRQP(xk , k ) is not compatible, that is,
when the feasible set determined by the constraints of QP(xk ) is empty, or the freedom
left to reduce mk within the trust-region is too small in the sense that (2.12) fails. In
this situation, solving TRQP(xk , k ) is most likely pointless, and we must consider an
alternative. We base this on the intuitive observation that, if (xk ) is suciently small
and the true nonlinear constraints are locally compatible, the linearized constraints
should also be compatible, since they approximate the nonlinear constraints (locally)
correctly. Furthermore, the feasible region for the linearized constraints should then
be close enough to xk for there to be some room to reduce mk , at least if k is
large enough. If the nonlinear constraints are locally incompatible, we have to nd
a neighborhood where this is not the case, since the problem (1.1) does not make
sense in the current one. We thus rely on a restoration procedure, whose aim is to
produce a new point xk + rk for which TRQP(xk + rk , k+1 ) is compatible for some
k+1 > 0we will actually need another condition which we will discuss shortly.


FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

640

The idea of the restoration procedure is to (approximately) solve


(2.16)

min (x),

xRn

perhaps starting from xk , the current iterate. This is a nonsmooth problem, but we
know that there exist methods, possibly of trust-region type (such as that suggested
by Yuan [32]), which can be successfully applied to solve it. Thus we will not describe
the restoration procedure in detail. Note that we have chosen here to reduce the
innity norm of the constraint violation, but we could equally well consider other
norms, such as 1 or 2 , in which case the methods of Fletcher and Leyer [17] or
of El-Hallabi and Tapia [14] and Dennis, El-Alem, and Williamson [10], respectively,
can be considered. Of course, this technique only guarantees convergence to a rstorder critical point of the chosen measure of constraint violation, which means that,
in fact, the restoration procedure may fail as this critical point may not be feasible
for the constraints of (1.1). However, even in this case, the result of the procedure
is of interest because it typically produces a local minimizer of (x), or of whatever
other measure of constraint violation we choose for the restoration, yielding a point
of locally least infeasibility.
There is no easy way to circumvent this drawback, as it is known that nding a
feasible point or proving that no such point exists is a global optimization problem
and can be as dicult as the optimization problem (1.1) itself. We therefore accept
two possible outcomes of the restoration procedure: either the procedure fails in that
it does not produce a sequence of iterates converging to feasibility, or a point xk + rk
is produced such that (xk + rk ) is as small as we wish. We will shortly see that this
is all we need.
2.2. The notion of a lter. Having computed a step sk = nk + tk (or rk ), we
still need to decide whether the trial point xk + sk (or xk + rk ) is any better than
xk as an approximate solution to our original problem (1.1). We shall use a concept
borrowed from multicriteria optimization. We say that a point x1 dominates a point
x2 whenever
(x1 ) (x2 ) and f (x1 ) f (x2 ).
Thus, if iterate xk dominates iterate xj , the latter is of no real interest to us since xk
is at least as good as xj on account of both feasibility and optimality. All we need to
do now is to remember iterates that are not dominated by any other iterates using a
structure called a lter. A lter is a list F of pairs of the form (i , fi ) such that either
i < j or fi < fj
for i = j. We thus aim to accept a new iterate xi only if it is not dominated by
any other iterate in the lter. In the vocabulary of multicriteria optimization, this
amounts to building elements of the ecient frontier associated with the bicriteria
problem of reducing infeasibility and the objective function value.
Figure 2.1 illustrates the concept of a lter by showing the pairs (k , fk ) as black
dots in the (, f )-space. Each such pair is called the (, f )-pair associated with xk .
The lines radiating from each (, f )-pair indicate that any iterate whose associated
(, f )-pair occurs above and to the right of that of a given lter point is dominated
by this (, f )-pair.
While the idea of not accepting dominated trial points is simple and elegant,
it needs to be rened a little in order to provide an ecient algorithmic tool. In

641

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

f (x)

(x)

r
r

Fig. 2.1. A lter with four pairs.

particular, we do not wish to accept xk + sk if its (, f )-pair is arbitrarily close to that


of xk or that of a point already in the lter. Thus we set a small margin around
the border of the dominated part of the (, f )-space in which we shall also reject trial
points. Formally, we say that a point x is acceptable for the lter if and only if
(2.17)

(x) (1 )j or f (x) fj j for all (j , fj ) F

for some (0, 1). In Figure 2.1, the set of acceptable points corresponds to the set
of (, f )-pairs below the thin line. We also say that x is acceptable for the lter and
xk if (2.17) holds with F replaced by F (k , fk ). We thus consider moving from
xk to xk + sk only if xk + sk is acceptable for the lter and xk .
As the algorithm progresses, we may want to add a (, f )-pair to the lter. If an
iterate xk is acceptable for F, we do this by adding the pair (k , fk ) to the lter and
by removing from it every other pair (j , fj ) such that both
(2.18)

j k and fj j fk k .

Only entries whose envelope is dominated by a new entry are thus removed from
the lter. As a consequence, the margin of the lter never decreases, and it can be
shown that, for all innite subsequences of points added to the lter, lim ki = 0 (see
Lemma 3.3). We also refer to this operation as adding xk to the lter, although,
strictly speaking, it is the (, f )-pair which is added.
We conclude this section by noting that, if a point xk is in the lter or is acceptable
for the lter, then any other point x such that
(x) (1 )k and f (x) fk k
is also acceptable for the lter and xk .

642

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

2.3. An SQP-lter algorithm. We have now discussed the main ingredients of


the class of algorithms we wish to consider, and we are now ready to dene it formally
as Algorithm 2.1 below. A owchart of the algorithm is given as an appendix; see
Figure A.1.
Algorithm 2.1: SQP-Filter Algorithm.
Step 0: Initialization. Let an initial point x0 , an initial trust-region radius
0 > 0, and an initial symmetric matrix H0 be given, as well as constants
0 < 0 < 1 1 2 , 0 < 1 2 < 1, (0, 1), (0, 1),
(0, 1], > 0, (0, 1), > 1/(1 + ), and tmd (0, 1]. Compute
f (x0 ) and c(x0 ). Set F = and k = 0.
Step 1: Test for optimality. If k = k = 0, stop.
Step 2: Ensure compatibility. Attempt to compute a step nk . If TRQP
(xk , k ) is compatible, go to Step 3. Otherwise, include xk in the lter and compute a restoration step rk for which TRQP(xk + rk , k+1 ) is
compatible for some k+1 > 0, and xk + rk is acceptable for the lter. If
this proves impossible, stop. Otherwise, dene xk+1 = xk + rk and go to
Step 7.
Step 3: Determine a trial step. Compute a step tk and set sk = nk + tk .
Step 4: Tests to accept the trial step.
Evaluate c(xk + sk ) and f (xk + sk ).
If xk + sk is not acceptable for the lter and xk , set xk+1 = xk ,
choose k+1 [0 k , 1 k ], set nk+1 = nk , increment k by one,
and go to Step 2.
If
(2.19)

mk (xk ) mk (xk + sk ) k

and
(2.20)

def

k =

f (xk ) f (xk + sk )
< 1 ,
mk (xk ) mk (xk + sk )

again set xk+1 = xk , choose k+1 [0 k , 1 k ], set nk+1 = nk ,


increment k by one, and go to Step 2.
Step 5: Test to include the current iterate in the lter. If (2.19) fails,
include xk in the lter F.
Step 6: Move to the new iterate. Set xk+1 = xk + sk and choose k+1 such
that
k+1 [k , 2 k ] if k 2 and (2.19) holds.
Step 7: Update the Hessian approximation. Determine Hk+1 . Increment
k by one and go to Step 1.
As in Fletcher and Leyer [18, 17], one may choose = 2. (Note that the choice
= 1 is always possible because > 0.) Reasonable values for the constants might
then be
= 104 ,

0 = 0.1, 1 = 0.5, 2 = 2, 1 = 0.01, 2 = 0.9,


= 0.7, = 100, = 0.01, = 104 , and tmd = 0.01,

but it is too early to know if these are even close to the best possible choices.

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

643

Observe rst that, by construction, every iterate xk must be acceptable for the
lter at the beginning of iteration k, irrespective of the possibility that it might be
added to the lter later. Also note that the restoration step rk cannot be zero, that
is, restoration cannot simply entail enlarging the trust-region radius to ensure (2.12),
even if nk exists. This is because xk is added to the lter before rk is computed,
and xk + rk must be acceptable for the lter which now contains xk . Also note
that the restoration procedure cannot be applied on two successive iterations, since
the iterate xk + rk produced by the rst of these iterations leads to a compatible
TRQP(xk+1 , k+1 ) and is acceptable for the lter.
For the restoration procedure in Step 2 to succeed, we have to evaluate whether
TRQP(xk + rk , k+1 ) is compatible for a suitable value of k+1 . This requires that a
suitable normal step be computed which successfully passes the test (2.12). Of course,
once this is achieved, this normal step may be reused at iteration k + 1. Thus we shall
require that the normal step calculated to verify compatibility of TRQP(xk +rk , k+1 )
should actually be used as nk+1 .
As it stands, the algorithm is not specic about how to choose k+1 during
a restoration iteration. On one hand, there is an advantage to choosing a large
k+1 , since this allows a large step and, one hopes, good progress. On the other
hand, it may be unwise to choose it to be too large, as this may possibly result in
a large number of unsuccessful iterations, during which the radius is reduced, before
the algorithm can make any progress. A possible choice might be to restart from
the radius obtained during the restoration iteration itself, if it uses a trust-region
method. Reasonable alternatives would be to use the average radius observed during
past successful iterations, or to apply the internal doubling strategy of Byrd, Schnabel,
and Shultz [5] to increase the new radius, or even to consider the technique described
by Sartenaer [28]. However, we recognize that numerical experience with the algorithm
is too limited at this stage to make denite recommendations.
The role of condition (2.19) may be interpreted as follows. If this condition fails,
then one may think that the constraint violation is signicant and that one should
aim to improve on this situation in the future by inserting the current point into the
lter. Fletcher, Leyer, and Toint [19] use the term of -step in such circumstances
to indicate that the main preoccupation is to improve feasibility. On the other hand,
if condition (2.19) holds, then the reduction in the objective function predicted by the
model is more signicant than the current constraint violation, and it is thus appealing
to let the algorithm behave as if it were unconstrained. Fletcher and Leyer [18] use
the term f -step to denote the step generated, in order to reect the dominant role
of the objective function f . In this case, it is important that the predicted decrease
in the model be realized by the actual decrease in the function, which is why we also
require that (2.20) not hold. In particular, if the iterate xk is feasible, then (2.10)
implies that xk = xNk , and we obtain that
(2.21)

k = 0 mk (xNk ) mk (xk + sk ) = mk (xk ) mk (xk + sk ).

As a consequence, the lter mechanism is irrelevant if all iterates are feasible, and
the algorithm reduces to a classical unconstrained trust-region method. Another
consequence of (2.21) is that no feasible iterate is ever included in the lter, which
is crucial in allowing nite termination of the restoration procedure. Indeed, if the
restoration procedure is required at iteration k of the lter algorithm and produces a
sequence of points {xk,j } converging to feasibility, there must be an iterate xk,j for

644
which

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER




def
k,j = (xk,j ) min (1 )kmin ,
k+1 min[1, k+1 ]
usc

for any given k+1 > 0, where


kmin =

min i > 0

iZ, ik

and
Z = {k | xk is added to the lter}.
Moreover, k,j must eventually be small enough to ensure, using our assumption on
the normal step, the existence of a normal step nk,j from xk,j . In other words, the
restoration iteration must eventually nd an iterate xk,j which is acceptable for the
lter and for which the normal step exists and satises (2.12), i.e., an iterate xj which
is both acceptable and compatible. As a consequence, the restoration procedure will
terminate in a nite number of steps, and the lter algorithm may then proceed. Note
that the restoration step may not terminate in a nite number of iterations if we do
not assume the existence of the normal step when the constraint violation is small
enough, even if this violation converges to zero (see Fletcher, Leyer, and Toint [19],
for an example).
Notice also that (2.19) ensures that the denominator of k in (2.20) will be strictly
positive whenever k is. If k = 0, then xk = xNk , and the denominator of (2.20) will
be strictly positive unless xk is a rst-order critical point because of (2.15).
The reader may have observed that Step 6 allows a relatively wide choice of the
new trust-region radius k+1 . While the stated conditions appear to be sucient
for the theory developed below, one must obviously be more specic in practice. For
instance, one may wish to distinguish, at this point in the algorithm, the cases where
(2.19) fails or holds. If (2.19) fails, the main eect of the current iteration is not
to reduce the model (which makes the value of k essentially irrelevant), but rather
to reduce the constraint violation (which is taken care of by inserting the current
iterate into the lter at Step 5). In this case, Step 6 imposes no further restriction on
k+1 . In practice, it may be reasonable not to reduce the trust-region radius, because
this might cause too small steps towards feasibility or an unnecessary restoration
phase. However, there is no compelling reason to increase the radius either, given
the compatibility of TRQP(xk ,k ). A reasonable strategy might then be to choose
k+1 = k . If, on the other hand, (2.19) holds, the emphasis of the iteration is then
on reducing the objective function, a case akin to unconstrained minimization. Thus
a more detailed rule of the type

[1 k , 2 k ]
if k [1 , 2 ),
k+1
if k 2
[k , 2 k ]
seems reasonable in these circumstances.
Finally, we recognize that (2.15) may be dicult to verify in practice, since it
may be expensive to compute xNk and Pk when the dimension of the problem is large.
3. Convergence to rst-order critical points. We now prove that our algorithm generates a globally convergent sequence of iterates. In the following analysis,
we concentrate on the case in which the restoration iteration always succeeds. If this

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

645

is not the case, then it usually follows that the restoration phase has converged to an
approximate solution of the feasibility problem (2.16) and we can conclude that (1.1)
is locally inconsistent. For the purpose of our analysis, we shall consider
S = {k | xk+1 = xk + sk },
the set of (indices of) successful iterations, and
R = {k | nk does not satisfy (2.10)

or nk  > k min[1, k ]},

the set of restoration iterations. In order to obtain our global convergence result, we
will use the following assumptions.
AS1. f and the constraint functions cE and cI are twice continuously dierentiable.
AS2. There exists umh > 1 such that
Hk  umh 1 for all k.
AS3. The iterates {xk } remain in a closed, bounded domain X Rn .
If, for example, Hk is chosen as the Hessian of the Lagrangian function
(x, y) = f (x) + yE , cE (x) + yI , cI (x)
at xk , in that
(3.1)

Hk = xx f (xk ) +

[yk ]i xx ci (xk ),

iEI

where [yk ]i denotes the ith component of the vector of Lagrange multipliers ykT =
T
T
(yE,k
yI,k
), then we see from AS1 and AS3 that AS2 is satised when these multipliers
remain bounded. The same is true if the Hessian matrices in (3.1) are replaced by
bounded approximations.
A rst immediate consequence of AS1AS3 is that there exists a constant ubh > 1
such that, for all k,
(3.2)

|f (xk + sk ) mk (xk + sk )| ubh 2k .

A proof of this property, based on Taylor expansion, may be found, for instance, in
Toint [29]. A second important consequence of our assumptions is that AS1 and AS3
together directly ensure that, for all k,
(3.3)

f min f (xk ) and 0 k max

for some constants f min and max > 0. Thus the part of the (, f )-space in which the
(, f )-pairs associated with the lter iterates lie is restricted to the rectangle
A0 = [0, max ] [f min , ].
We also note the following simple consequence of (2.10) and AS3.
Lemma 3.1. Suppose that Algorithm 2.1 is applied to problem (1.1). Suppose also
that (2.10) and AS3 hold and that
k n .


FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

646

Then there exists a constant lsc > 0 independent of k such that


lsc k nk .

(3.4)
Proof. First dene
def

Vk = {j E | k = |cj (xk )|}

{j I | k = cj (xk )},

which is the subset of most-violated constraints. From the denitions of k in (2.9)


and of the normal step in (2.5) we obtain, using the CauchySchwarz inequality, that
(3.5)

k |x cj (xk ), nk | x cj (xk ) nk 

for all j Vk . But AS3 ensures that there exists a constant lsc > 0 such that
def

max max x cj (x) =

jEI xX

1
.
lsc

We then obtain the desired conclusion by substituting this bound into (3.5).
Our assumptions and the denition of k in (2.13) ensure that k and k can be
used (together) to measure criticality for problem (1.1).
Lemma 3.2. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1 and AS3 hold, and that there exists
a subsequence {ki } such that, for any i, ki  R with
(3.6)

lim ki = 0 and lim ki = 0.

Then every limit point of the subsequence {xki } is a rst-order critical point for problem (1.1).
Proof. Consider x , a limit point of the sequence {xki }, whose existence is ensured
by AS3, and assume that {k } {ki } is the index set of a subsequence such that {xk }
converges to x . The fact that k  R implies that nk satises (2.10) for suciently
large  and converges to zero, because {k } converges to zero and the second part of
this condition. As a consequence, we deduce from (2.11) that {xNk } also converges to
x . Since the minimization problem occurring in the denition of k (in (2.13)) is
convex, we then obtain from classical perturbation theory (see, for instance, Fiacco
[15, pp. 1417], AS1, and the rst part of (3.6) that











 = 0.
min
g
,
t

 A (x )t=0


E


cI (x )+AI (x )t0



t 1
This in turn guarantees that x is rst-order critical for problem (1.1).
We start our analysis by examining what happens when an innite number of
iterates (that is, their (, f )-pairs) are added to the lter.
Lemma 3.3. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose that AS1 and AS3 hold and that |Z| = . Then
lim k = 0.

k
kZ

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

647

Proof. Suppose, for the purpose of obtaining a contradiction, that there exists an
innite subsequence {ki } Z such that
ki %

(3.7)

for all i and for some % > 0. At each iteration ki , the (, f )-pair associated with xki ,
that is (ki , fki ), is added to the lter. This means that no other (, f )-pair can be
added to the lter at a later stage within the square
[ki %, ki ] [fki %, fki ]
or with the intersection of this square with A0 . Note that this holds, even if (ki , fki )
is later removed from the lter, since the rule for removing entries, (2.18), ensures
that the envelope never shrinks. Now observe that the area of each of these squares
is 2 %2 . As a consequence, the set A0 {(, f )|f f } is completely covered by at
most a nite number of such squares, for any choice of f f min . Since the pairs
(ki , fki ) keep on being added to the lter, this implies that fki tends to innity when
i tends to innity. Let us assume, without loss of generality, that fki+1 fki for all i
suciently large. But (2.17) and (3.7) then imply that
ki+1 (1 )ki ki %,
and therefore that ki converges to zero, which contradicts (3.7). Hence this latter
assumption is imposssible and the conclusion follows.
We next examine the size of the constraint violation before and after an iteration
where restoration did not occur.
Lemma 3.4. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1 and AS3 hold, that k  R, and
that nk satises (3.4). Then
k ubt 1+
k

(3.8)
and
(3.9)

(xk + sk ) ubt 2k

for some constant ubt 0.


Proof. Since k  R, we have from (3.4) and (2.12) that
(3.10)

lsc k nk  1+
,
k

which gives (3.8). Now, the ith constraint function at xk + sk can be expressed as
1
ci (xk + sk ) = ci (xk ) + ei , Ak sk  + sk , xx ci (k )sk 
2
for i E I, where we have used AS1 and the mean-value theorem and where k
belongs to the segment [xk , xk + sk ]. Using AS3, we may bound the Hessian of the
constraint functions, and we obtain from (2.7), the CauchySchwarz inequality, and
(2.6) that
|ci (xk + sk )|

1
max xx ci (x) sk 2 1 2k
2 xX


FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

648
if i E, or

1
max xx ci (x) sk 2 1 2k
2 xX

ci (xk + sk )
if i I, where we have dened
def

1 =

1
max max xx ci (x).
2 iEI xX

This gives the desired bound with


ubt = max[1 , /lsc ].
We next assess the model decrease when the trust-region radius is suciently
small.
Lemma 3.5. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.12), and (2.15) hold, that
k  R, that
k %

(3.11)
for some % > 0, and that

(3.12)

ubg
k min
, 2
umh
umh

1
1+

tmd %
4ubg

def

= m ,

def

where ubg = maxxX x f (x). Then


mk (xk ) mk (xk + sk )

1
tmd %k .
2

Proof. We rst note that, by (2.15), AS2, (3.11), and (3.12),




k
N
mk (xk ) mk (xk + sk ) tmd k min
(3.13)
, k tmd %k .
umh
Now
1
mk (xNk ) = mk (xk ) + gk , nk  + nk , Hk nk ,
2
and therefore, using the CauchySchwarz inequality, AS2, (2.12), and (3.12),
|mk (xk ) mk (xkN )|

nk  gk  + 12 Hk  nk 2


ubg nk  + 12 umh nk 2
2(1+)
ubg 1+
+ 12 umh 2 2 k
k
2ubg 1+
k
12 tmd %k .

We thus conclude from this last inequality and (3.13) that the desired conclusion
holds.
We continue our analysis by showing, as the reader has grown to expect, that
iterations have to be very successful when the trust-region radius is suciently small.

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

649

Lemma 3.6. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.15), and (3.11) hold, that
k  R, and that


(1 2 )tmd % def
(3.14)
= .
k min m ,
2ubh
Then
k 2 .
Proof. Using the denition of k in (2.20), (3.2), Lemma 3.5, and (3.14), we nd
that
|k 1| =

ubh 2k
|f (xk + sk ) mk (xk + sk )|
1
1 2 ,
|mk (xk ) mk (xk + sk )|
2 tmd %k

from which the conclusion immediately follows.


Note that this proof could easily be extended if the denition of k in (2.20) were
altered to be of the form
(3.15)

def

k =

f (xk ) f (xk + sk ) + k
,
mk (xk ) mk (xk + sk )

provided that k is bounded above by a multiple of 2k . We will comment in section 4


why such a modication might be of interest (see also section 10.4.3 of Conn, Gould,
and Toint [7]).
Now, we also show that the test (2.19) will always be satised when the trustregion radius is suciently small.
Lemma 3.7. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.12), (2.15), and (3.11)
hold, that k  R, that nk satises (3.4), and that


1

tmd % (1+)1 def


(3.16)
= f .
k min m ,
2
ubt
Then
mk (xk ) mk (xk + sk ) k .
Proof. This directly results from the inequalities
(1+)

k
ubt k

1
tmd %k mk (xk ) mk (xk + sk ),
2

where we have successively used Lemma 3.4, (3.16), and Lemma 3.5.
We may also guarantee a decrease in the objective function, large enough to ensure
that the trial point is acceptable with respect to the (, f )-pair associated with xk , so
long as the constraint violation is itself suciently small.
Lemma 3.8. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.15), (3.11), and (3.14)
hold, that k  R, that nk satises (3.4), and that
(3.17)

k ubt

2 tmd %
2

1+

def

= .

650

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

Then
f (xk + sk ) f (xk ) k .
Proof. Applying Lemmas 3.43.6which is possible because of (3.11), (3.14),
k  R, and the fact that nk satises (3.4)and (3.17), we obtain that
f (xk ) f (xk + sk ) 2 [mk (xk ) mk (xk + sk )]
12 2 tmd %k
1
 1+

k
12 2 tmd % ubt
k ,
and the desired inequality follows.
We now establish that if the trust-region radius and the constraint violation are
both small at a noncritical iterate xk , TRQP(xk , k ) must be compatible.
Lemma 3.9. Suppose that Algorithm 2.1 is applied to problem (1.1) and that nite
termination does not occur. Suppose also that AS1AS3, (2.10), and (3.11) hold, that
(2.15) holds for k
/ R, and that

1

1
2
1
0 (1 ) 1 def
k min 0 ,
(3.18)
,
= R .

usc ubt
Suppose furthermore that k > 0 and that
(3.19)

k min[ , n ].

Then k  R.
Proof. Because k n , we know from (2.10) and Lemma 3.1 that nk satises
(2.10) and (3.4). Moreover, since k , we have that (3.17) also holds. Assume,
for the purpose of deriving a contradiction, that k R, that is,
(3.20)

nk  > 1+
,
k

where we have used (2.12) and the fact that k 1 because of (3.18). In this
case, the mechanism of the algorithm then ensures that k 1  R. Now assume that
iteration k 1 is unsuccessful. Because of Lemmas 3.6 and 3.8, which hold at iteration
k 1  R because of (3.18), the fact that k = k1 , (2.10), and (3.17), we obtain
that
(3.21)

k1 2 and f (xk1 + sk1 ) f (xk1 ) k1 .

Hence, given that xk1 is acceptable for the lter at the beginning of iteration k 1,
if this iteration is unsuccessful, it must be because xk1 + sk1 is not acceptable for
the lter and xk1 , which in turn can happen only if
(xk1 + sk1 ) > (1 )k1 = (1 )k
because of (3.21) (see the last paragraph of section 2.2). But Lemma 3.4 and the
mechanism of the algorithm then imply that
(1 )k ubt 2k1

ubt 2
.
02 k

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

651

Combining this last bound with (3.20) and (2.10), we deduce that
< nk  usc k
1+
k

usc ubt
2
) k

02 (1

and hence that


02 (1 )
.
usc ubt

1
>
k

Since this last inequality contradicts (3.18), our assumption that iteration k 1 is
unsuccessful must be false. Thus iteration k 1 is successful and k = (xk1 + sk1 ).
We then obtain from (3.20), (2.10), and (3.9) that
< nk  usc k usc ubt 2k1
1+
k

usc ubt 2
k ,
02

which is again impossible because of (3.18) and because (1 ) < 1. Hence our
initial assumption (3.20) must be false, which yields the desired conclusion.
We now distinguish two mutually exclusive cases. For the rst, we consider what
happens if there is an innite subsequence of iterates belonging to the lter.
Lemma 3.10. Suppose that Algorithm 2.1 is applied to problem (1.1) and that
nite termination does not occur. Suppose also that AS1AS3, (2.10) hold, and (2.15)
holds for k
/ R. Suppose furthermore that |Z| = . Then there exists a subsequence
{kj } Z such that
(3.22)

lim kj = 0

and
lim kj = 0.

(3.23)

Proof. Let {ki } be any innite subsequence of Z. We observe that (3.22) follows
from Lemma 3.3. Suppose now that
ki %2 > 0

(3.24)

for all i and some %2 > 0. Suppose furthermore that there exists %3 > 0 such that, for
all i i0 ,
ki %3 .

(3.25)

Observe rst that (3.22) and (2.10) ensure that


lim nki  = 0.

(3.26)

Thus (3.25) ensures that (2.12) holds for suciently large i and thus ki  R for such
i. Now, as we noted in the proof of Lemma 3.5,
1
|mki (xki ) mki (xNki )| ubg nki  + umh nki 2 ,
2
which in turn, with (3.26), yields that
(3.27)

lim [mki (xki ) mki (xNki )] = 0.

652

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

We also deduce from (2.15) and AS2 that




(3.28)

mki (xki ) mki (xki


N

%2
+ ski ) tmd %2 min
, %3
umh

def

= > 0.

We now decompose the model decrease in its normal and tangential components, that
is,
mki (xki ) mki (xki + ski ) = mki (xki ) mki (xNki ) + mki (xNki ) mki (xki + ski ).
Substituting (3.27) and (3.28) into this decomposition, we nd that
(3.29)

lim inf [mki (xki ) mki (xki + ski )] > 0.


i

We now observe that, because xki is added to the lter at iteration ki , we know from
the mechanism of the algorithm that either iteration ki R or (2.19) must fail. Since
we have already shown that ki  R, (2.19) must fail for i suciently large, that is,
(3.30)

mki (xki ) mki (xki + ski ) < ki .

Combining this bound with (3.29), we nd that ki is bounded away from zero for
i suciently large, which is impossible in view of (3.22). We therefore deduce that
(3.25) cannot hold and obtain that there is a subsequence {k } {ki } for which
lim k = 0.

We now restrict our attention to the tail of this subsequence, that is, to the set of
indices k > 0 that are large enough to ensure that (3.16), (3.17), and (3.18) hold,
which is possible by denition of the subsequence and because of (3.22). For these
indices, we may therefore apply Lemma 3.9 and deduce that iteration k  R for 
suciently large. Hence, as above, (3.30) must hold for  suciently large. However,
we may also apply Lemma 3.7, which contradicts (3.30), and therefore (3.24) cannot
hold, yielding the desired result.
Thus, if an innite subsequence of iterates is added to the lter, Lemma 3.2
ensures that there exists a limit point which is a rst-order critical point. Our remaining analysis then naturally concentrates on the possibility that there may be no
such innite subsequence. In this case, no further iterates are added to the lter for k
suciently large. In particular, this means that the number of restoration iterations,
|R|, must be nite. In what follows, we assume that k0 0 is the last iteration for
which xk0 1 is added to the lter.
Lemma 3.11. Suppose that Algorithm 2.1 is applied to problem (1.1), that nite
termination does not occur, and that |Z| < . Suppose also that AS1AS3 and (2.10)
hold and that (2.15) holds for k
/ R. Then we have that
(3.31)

lim k = 0.

Furthermore, nk satises (3.4) for all k k0 suciently large.


Proof. Consider any successful iterate with k k0 . Since xk is not added to the
lter, it follows from the mechanism of the algorithm that k 1 holds and thus
that
(3.32)

f (xk ) f (xk+1 ) 1 [mk (xk ) mk (xk + sk )] 1 k 0.

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

653

Thus the objective function does not increase for all successful iterations with k k0 .
But AS1 and AS3 imply (3.3), and therefore we must have, from the rst part of this
statement, that
(3.33)

lim f (xk ) f (xk+1 ) = 0.

kS

The limit (3.31) then immediately follows from (3.32) and the fact that j = k for
all unsuccessful iterations j that immediately follow the successful iteration k, if any.
The last conclusion then results from (2.10) and Lemma 3.1.
We now show that the trust-region radius cannot become arbitrarily small if the
(asymptotically feasible) iterates stay away from rst-order critical points.
Lemma 3.12. Suppose that Algorithm 2.1 is applied to problem (1.1), that nite
termination does not occur, and that |Z| < . Suppose also that AS1AS3 hold and
(2.15) holds for k
/ R. Suppose furthermore that (3.11) hold for all k k0 . Then
there exists a min > 0 such that
k min
for all k.
Proof. Suppose that k1 k0 is chosen suciently large to ensure that (3.19) holds
and that nk satises (2.10) for all k k1 , which is possible because of Lemma 3.11.
Suppose also, for the purpose of obtaining a contradiction, that iteration j is the rst
iteration following iteration k1 for which


F
(1

)
def

j 0 min ,
(3.34)
, k1 = 0 s ,
ubt
where
def

F = min i
iZ

is the smallest constraint violation appearing in the lter. Note also that the inequality
j 0 k1 , which is implied by (3.34), ensures that j k1 + 1 and hence that
j 1 k1 and thus that j 1  R. Then the mechanism of the algorithm and (3.34)
imply that
(3.35)

j1

1
j s ,
0

and Lemma 3.6, which is applicable because (3.34) and (3.35) together imply (3.14)
with k replaced by j 1, then ensures that
(3.36)

j1 2 .

Furthermore, since nj1 satises (2.10), Lemma 3.1 implies that we can apply Lemma 3.4.
This, together with (3.34) and (3.35), gives that
(3.37)

(xj1 + sj1 ) ubt 2j1 (1 )F .

We may also apply Lemma 3.8 because (3.34) and (3.35) ensure that (3.14) holds and
because (3.17) also holds for j 1 k1 . Hence we deduce that
f (xj1 + sj1 ) f (xj1 ) j1 .

654

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

This last relation and (3.37) ensure that xj1 + sj1 is acceptable for the lter and
xj1 . Combining this conclusion with (3.36) and the mechanism of the algorithm, we
obtain that j j1 . As a consequence, and since (2.19) also holds at iteration
j 1, iteration j cannot be the rst iteration following k1 for which (3.34) holds. This
contradiction shows that k 0 s for all k > k1 , and the desired result follows if
we dene
min = min[0 , . . . , k1 , 0 s ].
We may now analyze the convergence of k itself.
Lemma 3.13. Suppose that Algorithm 2.1 is applied to problem (1.1), that nite
termination does not occur, and that |Z| < . Suppose also that AS1AS3, (2.10)
hold, and (2.15) holds for k
/ R. Then
lim inf k = 0.

(3.38)

Proof. We start by observing that Lemma 3.11 implies that the second conclusion
of (2.10) holds for k suciently large. Moreover, as in Lemma 3.11, we obtain (3.32)
and therefore (3.33) for each k S, k k0 . Suppose now, for the purpose of obtaining
a contradiction, that (3.11) holds, and notice that
(3.39) mk (xk ) mk (xk + sk ) = mk (xk ) mk (xNk ) + mk (xNk ) mk (xk + sk ).
Moreover, note, as in Lemma 3.5, that
|mk (xk ) mk (xNk )| ubg nk  + umh nk 2 ,
which in turn yields that
lim [mk (xk ) mk (xNk )] = 0

because of Lemma 3.11 and the second conclusion of (2.10). This limit, together with
(3.32), (3.33), and (3.39), then gives that
(3.40)

lim [mk (xkN ) mk (xk + sk )] = 0.

k
kS

But (2.15), (3.11), AS2, and Lemma 3.12 together imply that for all k k0



mk (xNk ) mk (xk + sk ) tmd k min k , k



 k
(3.41)
% ,
tmd % min umh
min ,
immediately giving a contradiction with (3.40). Hence (3.11) cannot hold and the
desired result follows.
We may summarize all of the above in our main global convergence result.
Theorem 3.14. Suppose that Algorithm 2.1 is applied to problem (1.1) and
that nite termination does not occur. Suppose also that AS1, (2.10), AS3, and AS2
hold, and that (2.15) holds for k
/ R. Let {xk } be the sequence of iterates produced
by the algorithm. Then either the restoration procedure terminates unsuccessfully by
converging to an infeasible rst-order critical point of problem (2.16), or there is a
subsequence {kj } for which
lim xkj = x

655

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

and x is a rst-order critical point for problem (1.1).


Proof. Suppose that the restoration iteration always terminates successfully.
From AS3, Lemmas 3.10, 3.11, and 3.13, we obtain that, for some subsequence {kj },
(3.42)

lim kj = lim kj = 0.

The conclusion then follows from Lemma 3.2.


Can we dispense with AS3 to obtain this result? First, this assumption ensures that the objective remains bounded below and the constraint violation remains
bounded above (see (3.3)). This is crucial for the rest of the analysis because the
convergence of the iterates to feasibility depends on this fact. Thus, if AS3 does not
hold, we have to verify that (3.3) holds for other reasons. The second part of this
statement may be ensured quite simply by initializing the lter to (max , ), for
some max > 0 , in Step 0 of the algorithm. This has the eect of putting an upper
bound on the infeasibility of all iterates, which may be useful in practice. However,
this does not prevent the objective function from being unbounded below in
C(max ) = {x Rn | (x) max },
and we cannot exclude the possibility that a sequence of infeasible iterates might both
continue to improve the value of the objective function and satisfy (2.19). If C(max )
is bounded, AS3 is most certainly satised. If this is not the case, we could assume
that
(3.43)

f min f (x) and 0 (x) max for x C(max )

for some value of f min and simply monitor that the values f (xk ) are
reasonablein view of the problem being solvedas the algorithm proceeds. To
summarize, we may replace AS1 and AS3 by the following assumption.
AS4. The functions f and c are twice continuously dierentiable on an open
set containing C(max ), their rst and second derivatives are uniformly bounded on
C(max ), and (3.43) holds.
The reader should note that AS4 no longer ensures the existence of a limit point,
but only that (3.42) holds for some subsequence {kj }. Furthermore, the comments
following the statement of (2.10) no longer apply if limit points at innity are allowed.
4. Conclusion and perspectives. We have introduced a trust-region SQPlter algorithm for general nonlinear programming and have shown this algorithm to
be globally convergent to rst-order critical points. The proposed algorithm diers
from that discussed by Fletcher and Leyer [18], notably because it uses a decomposition of the step in its normal and tangential components and imposes some restrictions
on the length of the former. However, preliminary numerical experiments indicate that
its practical performance is similar to that reported in [18]. Since the performance
of the latter is excellent, the theory developed in this paper provides the reassurance
that lter algorithms also have reasonable convergence properties, which then makes
these methods very attractive.
We are aware, however, that the convergence study is not complete, as we have
not discussed local convergence properties. As it is possible to exhibit examples where
the SQP step increases both the objective function and the constraint violation,2 it is
2 Such an example is provided by Figure 9.3.1 in Fletcher [16], taking the case =
feasible point close to the origin illustrates the eect.

1
4

. Any

656

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER

very likely that such a study will have to introduce second-order corrections (see [16,
section 14.4]) to ensure that the Maratos eect does not take place and that a fast
(quadratic) rate of convergence can be achieved. Moreover, convergence to secondorder critical points also remains, for now, an open question. In this context, the
alternative denition of k presented in (3.15) is also likely to play a role if we choose
Hk according to (3.1). In this case, we might choose

k =
[yk ]i sk , xx ci (xk )sk 
iEI

in order to ensure that the denominator of the fraction dening k is a correct model of
its numerator not only up to rst-order, but also up to second-order. These questions
are the subject of ongoing work.

657

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

Appendix.
initialization (k = 0)

attempt to compute nk

TRQP(xk , k ) compatible?
n

add xk to the lter

compute tk

compute rk and k+1

xk + sk acceptable?
y

n
y

k < 1 and
mk (xk ) mk (xk + sk )

mk (xk ) mk (xk + sk ) k
?

xk+1 = xk + rk

add xk to the lter

xk+1 = xk + sk

increase k k+1

compute Hk+1 and increment k by one


Fig. A.1. F lowchart of Algorithm 2.1.

xk+1 = xk

reduce k
k+1

658

FLETCHER, GOULD, LEYFFER, TOINT, AND WACHTER


REFERENCES

[1] L. T. Biegler, J. Nocedal, and C. Schmid, A reduced Hessian method for large-scale constrained optimization, SIAM J. Optim., 5 (1995), pp. 314347.
[2] R. H. Bielschowsky and F. A. M. Gomes, Dynamical Control of Infeasibility in Nonlinearly
Constrained Optimization, presentation at the Optimization 98 Conference, University of
Coimbra, Portugal, 1998.
[3] R. H. Byrd, J. C. Gilbert, and J. Nocedal, A trust region method based on interior point
techniques for nonlinear programming, Math. Program., 89 (2000), pp. 149186.
[4] R. H. Byrd, M. E. Hribar, and J. Nocedal, An interior point algorithm for large-scale
nonlinear programming, SIAM J. Optim., 9 (2000), pp. 877900.
[5] R. H. Byrd, R. B. Schnabel, and G. A. Shultz, A trust region algorithm for nonlinearly
constrained optimization, SIAM J. Numer. Anal., 24 (1987), pp. 11521170.
[6] A. R. Conn, N. Gould, A. Sartenaer, and Ph. L. Toint, Global convergence of a class of
trust region algorithms for optimization using inexact projections on convex constraints,
SIAM J. Optim., 3 (1993), pp. 164221.
[7] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, MPS-SIAM Ser.
Optim. 1, SIAM, Philadelphia, 2000.
[8] R. S. Dembo and U. Tulowitzki, On the Minimization of Quadratic Functions Subject to
Box Constraints, Working paper, Series B, 71, School of Organization and Management,
Yale University, New Haven, CT, 1983.
[9] J. E. Dennis, Jr., M. El-Alem, and M. C. Maciel, A global convergence theory for general
trust-region based algorithms for equality constrained optimization, SIAM J. Optim., 7
(1997), pp. 177207.
[10] J. E. Dennis, Jr., M. El-Alem, and K. Williamson, A trust-region approach to nonlinear
systems of equalities and inequalities, SIAM J. Optim., 9 (1999), pp. 291315.
[11] J. E. Dennis and L. N. Vicente, On the convergence theory of trust-region-based algorithms
for equality-constrained optimization, SIAM J. Optim., 7 (1997), pp. 927950.
[12] M. El-Alem, Global convergence without the assumption of linear independence for a trustregion algorithm for constrained optimization, J. Optim. Theory Appl., 87 (1995), pp. 563
577.
[13] M. El-Alem, A global convergence theory for Dennis, El-Alem, and Maciels class of trustregion algorithms for constrained optimization without assuming regularity, SIAM J. Optim., 9 (1999), pp. 965990.
[14] M. El-Hallabi and R. A. Tapia, An Inexact Trust-Region Feasible-Point Algorithm for Nonlinear Systems of Equalities and Inequalities, Technical Report TR95-09, Department of
Computational and Applied Mathematics, Rice University, Houston, TX, 1995.
[15] A. V. Fiacco, Introduction to Sensitivity and Stability Analysis in Nonlinear Programming,
Academic Press, London, 1983.
[16] R. Fletcher, Practical Methods of Optimization, 2nd ed., J. Wiley and Sons, Chichester,
England, 1987.
[17] R. Fletcher and S. Leyffer, User Manual for FilterSQP, Numerical Analysis Report
NA/181, Department of Mathematics, University of Dundee, Dundee, Scotland, 1998.
[18] R. Fletcher and S. Leyffer, Nonlinear programming without a penalty function, Math.
Program., 91 (2002), pp. 239269.
[19] R. Fletcher, S. Leyffer, and P. L. Toint, On the Global Convergence of an SLP-Filter
Algorithm, Technical Report 98/13, Department of Mathematics, University of Namur,
Namur, Belgium, 1998.
[20] R. Fletcher, S. Leyffer, and P. L. Toint, On the Global Convergence of a Filter-SQP
Algorithm, Technical Report 00/15, Department of Mathematics, University of Namur,
Namur, Belgium, 2000.
[21] N. I. M. Gould, M. E. Hribar, and J. Nocedal, On the solution of equality constrained
quadratic programming problems arising in optimization, SIAM J. Sci. Comput., 23 (2001),
pp. 13761395.
[22] M. Lalee, J. Nocedal, and T. D. Plantenga, On the implementation of an algorithm for
large-scale equality constrained optimization, SIAM J. Optim., 8 (1998), pp. 682706.
[23] X. Liu and Y. Yuan, A Robust Trust-Region Algorithm for Solving General Nonlinear Programming Problems, presentation at the International Conference on Nonlinear Programming and Variational Inequalities, Hong Kong, 1998.
and G. Toraldo, On the solution of large quadratic programming problems with
[24] J. J. More
bound constraints, SIAM J. Optim., 1 (1991), pp. 93113.
[25] W. Murray and F. J. Prieto, A sequential quadratic programming algorithm using an in-

CONVERGENCE OF A TRUST-REGION SQP-FILTER METHOD

659

complete solution of the subproblem, SIAM J. Optim., 5 (1995), pp. 590640.


[26] K. G. Murty and S. N. Kabadi, Some NP-complete problems in quadratic and nonlinear
programming, Math. Programming, 39 (1987), pp. 117129.
[27] E. O. Omojokun, Trust Region Algorithms for Optimization with Nonlinear Equality and
Inequality Constraints, Ph.D. thesis, University of Colorado, Boulder, CO, 1989.
[28] A. Sartenaer, Automatic determination of an initial trust region in nonlinear programming,
SIAM J. Sci. Comput., 18 (1997), pp. 17881803.
[29] P. L. Toint, Global convergence of a class of trust region methods for nonconvex minimization
in Hilbert space, IMA J. Numer. Anal., 8 (1988), pp. 231252.
[30] A. Vardi, A trust region algorithm for equality constrained minimization: Convergence properties and implementation, SIAM J. Numer. Anal., 22 (1985), pp. 575591.
chter and L. T. Biegler, Global and Local Convergence of Line Search Filter Methods
[31] A. Wa
for Nonlinear Programming, Technical Report CAPD B-01-09, Department of Chemical
Engineering, Carnegie Mellon University, Pittsburgh, PA, 2001.
[32] Y. Yuan, Trust region algorithms for nonlinear programming, in Computational Mathematics
in China, Contemp. Math. 163, Z. C. Shi, ed., AMS, Providence, RI, 1994, pp. 205225.

You might also like