(C) 1990 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL.
Vol. 27, No. 5, pp. 1263-1294, October 1990

LEAST-CHANGE SECANT UPDATES OF NONSQUARE MATRICES*

SAMIH K. BOURJI† AND HOMER F. WALKER‡
Abstract. The notion of least-change secant updates is extended to apply to nonsquare matrices
in a way appropriate for quasi-Newton methods used to solve systems of nonlinear equations that
depend on parameters. Extensions of the widely used least-change secant updates for square matrices
are given. A local convergence analysis for certain paradigm iterations is outlined as motivation for
the use of these updates, and numerical experiments involving these iterations are discussed.
Key words. least-change secant updates, quasi-Newton updates, parameter-dependent systems

AMS(MOS) subject classification. 65H10
1. Introduction. Quasi-Newton methods are very widely used iterative methods
for solving systems of nonlinear algebraic equations. The basic form of a quasi-Newton
method for solving F(x) = 0, F : R^n → R^n, is

(1.1)    x_{k+1} = x_k − B_k^{-1} F(x_k),

in which B_k ≈ F'(x_k) ∈ R^{n×n}, the Jacobian (matrix) of F at x_k. For practical success,
it is usually essential to augment this basic form with procedures for modifying the
step −B_k^{-1}F(x_k) to ensure progress from bad starting points, but we need not consider
such procedures here. For a general reference on all aspects of quasi-Newton methods,
see Dennis and Schnabel [11].
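For illustration (our sketch, not part of the original development), a minimal dense implementation of iteration (1.1) with the first Broyden update might look as follows in Python/NumPy; the test function and starting data are hypothetical.

```python
import numpy as np

def broyden_solve(F, x0, B0, tol=1e-10, max_iter=50):
    """Quasi-Newton iteration (1.1), x_{k+1} = x_k - B_k^{-1} F(x_k),
    with B_k maintained by the first Broyden update."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -Fx)                # quasi-Newton step
        x = x + s
        Fx_new = F(x)
        y = Fx_new - Fx                            # secant information
        B = B + np.outer(y - B @ s, s) / (s @ s)   # least-change secant update
        Fx = Fx_new
        if np.linalg.norm(Fx) < tol:
            break
    return x

# Hypothetical example: F(x) = (x1^2 + x2^2 - 2, x1 - x2), root at (1, 1);
# B0 is the exact Jacobian at the starting point.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 2.0, v[0] - v[1]])
x_star = broyden_solve(F, x0=[1.2, 0.9], B0=np.array([[2.4, 1.8], [1.0, -1.0]]))
```

In practice this basic form would be combined with the globalization procedures mentioned above.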
The most effective quasi-Newton methods are those in which each successive
B_{k+1} is determined as a least-change secant update of its predecessor B_k. As the name
suggests, B_{k+1} is determined as a least-change secant update of B_k by making the least
possible change in B_k (as measured by a suitable matrix norm) which incorporates
current secant information (usually expressed in terms of successive x- and F-values)
and other available information about the structure of F. There are also notable
updates which, strictly speaking, are least-change inverse secant updates, obtained
in an analogous way by making the least possible change in B_k^{-1}. When speaking
generically of least-change secant updates, we intend to include these. In [10], Dennis
and Schnabel precisely formalize the notion of a least-change secant update and show
how the most widely used updates can be derived as least-change secant updates. In
[12], Dennis and Walker show that least-change secant update methods, i.e., quasi-Newton
methods which use least-change secant updates, have desirable convergence
properties in general.
*Received by the editors August 10, 1988; accepted for publication (in revised form) October 5, 1989.

†Department of Mathematics, Weber State College, Ogden, Utah 84408. The work of this author was supported in part by the United States Department of Energy under contract DE-FG02-86ER25018 with Utah State University. Some of the results in this paper first appeared in this author's doctoral dissertation at Utah State University.

‡Department of Mathematics and Statistics, Utah State University, Logan, Utah 84322-3900. The work of this author was supported by United States Department of Energy grant DE-FG02-86ER25018, Department of Defense/Army grant DAAL03-88-K, and National Science Foundation grant DMS-0088995, all with Utah State University.
In this paper we outline general principles and specific formulas which extend
least-change secant updating from the traditional square-matrix context to that of
nonsquare matrices. Our primary purpose is to provide updates which can be used
in quasi-Newton methods for solving nonlinear systems which depend on parameters.
We mention two important areas in which such systems occur and which have strongly
influenced the work presented here. The first is the solution of a nonlinear system by
continuation or homotopy methods, in which the parameter is a continuation parameter used to transform a problem from one which is presumably easy to solve into one
for which the solution is desired but which may be difficult to solve ab initio. The
second is the solution of ordinary differential equations by implicit methods, in which
the parameter is the time variable and a nonlinear system must be solved by a sequence of corrector iterations at each timestep. In this second area, iterative methods
used in the corrector iterations typically must carry over Jacobian information from
one timestep to the next for the sake of efficiency. As a result, the nonlinear systems
arising in this context are appropriately regarded as parametrized by the time variable, rather than as sequences of nonparametric systems, even though time is constant
during each set of corrector iterations.
We consider a parameter-dependent nonlinear system in the form

(1.2)    F(x, λ) = 0,    F : R^n × R^m → R^n,
where x ∈ R^n is an independent variable and λ ∈ R^m is a parameter vector. Our
interest is in iterative methods that are intended to produce approximate solutions
of (1.2) through some range of λ-values. There are many iterative methods that
might be appropriate for this, depending in part on how the λ-values are specified (a
priori or as the iterations proceed, independent of or in conjunction with the x-values,
etc.). We are not concerned here with how successive λ-values might be determined or
with the specific forms of the iterations, but we have in mind methods which require
approximations at current (x, λ)-values of F_x ∈ R^{n×n}, the Jacobian of F with respect
to x, or perhaps of the full Jacobian F' = [F_x, F_λ] ∈ R^{n×(n+m)}, where F_λ ∈ R^{n×m}
is the Jacobian of F with respect to λ, and which for efficiency require Jacobian
information to be retained through at least some changes in λ.
In view of the success of least-change secant updates in maintaining approximate
Jacobians in a quasi-Newton iteration (1.1), it is natural to consider their use in
maintaining approximations of F_x or F' = [F_x, F_λ] in iterations used to solve (1.2).
In order to do so, some sort of extension of the usual formulation of these updates is
clearly required. Indeed, the usual formulation applies only to square matrices, and
F' is nonsquare. Even if we require only an approximation of the square matrix F_x,
it is not clear how to incorporate the information in successive F-values in a secant
update if λ changes as well as x.
To be more specific about this last point, we suppose that (x, λ) and (x_+, λ_+) are
"current" and "next" values and that B is a "current" approximation of F_x. Then the
usual formulation for determining a "next" approximate Jacobian B_+ as a least-change
secant update of B calls for B_+ to satisfy a secant equation

(1.3)    B_+(x_+ − x) = F(x_+, λ_+) − F(x, λ)

as nearly as possible consistent with any structure that may be imposed on B_+. Since

    F(x_+, λ_+) − F(x, λ) ≈ [F_x(x_+, λ_+), F_λ(x_+, λ_+)] (x_+ − x, λ_+ − λ),

this secant equation is clearly inappropriate if λ_+ ≠ λ. One modification of the
usual formulation which may be useful in some circumstances is to replace the right-hand
side of (1.3) with something more appropriate. An ideal replacement would be
F(x_+, λ_+) − F(x, λ) − F_λ(x, λ)(λ_+ − λ) or F(x_+, λ_+) − F(x, λ) − F_λ(x_+, λ_+)(λ_+ − λ),
provided F_λ is easy to obtain. If F_λ is not easy to obtain, then a finite-difference
approximation might be an effective substitute. However, any strategy such as this will
necessarily require extra derivative or function evaluations and so may be undesirably
expensive in some applications.
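As an illustration of the finite-difference substitute just mentioned (our own sketch, with a hypothetical parameter-dependent F), replacing F_λ(x_+, λ_+)(λ_+ − λ) by the difference F(x_+, λ_+) − F(x_+, λ) reduces the modified right-hand side to F(x_+, λ) − F(x, λ), at the cost of one extra F-evaluation:

```python
import numpy as np

# Hypothetical parameter-dependent system F : R^2 x R^1 -> R^2.
def F(x, lam):
    return np.array([x[0]**2 + lam * x[1] - 1.0,
                     np.sin(x[0]) + x[1] - lam])

def modified_secant_rhs(F, x, lam, x_new, lam_new):
    """F(x+, lam+) - F(x, lam) with the parameter contribution removed
    by a finite difference:
      F(x+, lam+) - F(x, lam) - [F(x+, lam+) - F(x+, lam)]
        = F(x+, lam) - F(x, lam).
    Costs one extra evaluation of F."""
    return F(x_new, lam) - F(x, lam)
```

For this particular F, which is linear in lam, the finite difference reproduces the F_λ term exactly.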
In the next section we give extensions to the nonsquare-matrix case of the usual
general formulations of square-matrix least-change secant and inverse secant updates.
With these, the secant information in successive F-values can be used to determine
updates of approximations of F' = [F_x, F_λ] according to least-change criteria. These
extensions incidentally determine updates of approximations of F_x using available
secant information, i.e., without any extra derivative or function evaluations. Having
these general extensions, we derive in §3 extensions of the most widely used specific
formulas for square-matrix least-change secant updates having various properties. In
order to provide some theoretical support for using these updates, we outline in §4 a
local convergence analysis for certain paradigm methods which employ them. In §5,
we report on some numerical experiments.
The focus here is on least-change secant updates of matrices with more columns
than rows; however, our ways of deriving updates using least-change principles readily
apply as well to matrices with more rows than columns, such as occur in nonlinear
least-squares problems. In this connection, we note that Griewank and Sheng [19] have
recently considered Broyden updating for such matrices in the context of nonlinear
least-squares problems.
Although the general updating principles, specific update formulas, and local
convergence analysis given here are for the most part new, there is other work which
is closely related to the developments here. The updates used in the path-following
algorithm of Georg [16] and in the augmented Jacobian matrix algorithm in the homotopy method code HOMPACK [32] are really the same as our extension of the first
Broyden update (3.1.1) below, although these updates are viewed somewhat differently in [16] and [32]. In work which is complementary to that in this paper, Walker
and Watson [31] consider general normal flow and augmented Jacobian algorithms for
underdetermined systems which use the general and specific updates given here and
give a local q-linear and q-superlinear convergence analysis for these algorithms. (See,
e.g., Watson, Billups, and Morgan [32] as well as [31] for a description of the normal
flow and augmented Jacobian algorithms. Also, see, e.g., Ortega and Rheinboldt [25]
for definitions of the various types of convergence referred to here.) In recent work
independent of that here and in [31], Martinez [20] considers Newton-like iterative
methods for underdetermined systems which use very general procedures for updating
approximate Jacobians, and he develops a general local r-linear and r-superlinear
convergence analysis for these methods. He points out as a special case the possibility
of maintaining approximate Jacobians in normal flow algorithms with updates which
are, in our terms, the Frobenius-norm least-change secant updates developed in §2
and §3.1 below. No specific update formulas are given in [20], although experiments
with (sparse) first Broyden updating are discussed; see the additional remarks in §5
below. Finally, we note that in many respects there are strong parallels between the
developments here and those in [12]; this is especially so in our general formulations
of least-change secant and inverse secant updates in §2 and in the local convergence
analysis in §4 and in the Appendix.
Our notational conventions are not strict, but it might be helpful to the reader
to note that we use the following guidelines: Unless otherwise indicated, lowercase
letters denote vectors and scalars, and capital letters denote matrices and operators.
Boldface uppercase letters denote vector spaces, subspaces, and affine subspaces. For
convenience, we set n̄ = n + m. Vectors and matrices with bars are in R^n̄ and R^{n×n̄},
respectively. Without bars, vectors are in R^n or R^m and matrices are in R^{n×n} or
R^{n×m}, unless otherwise indicated. If x ∈ R^n and λ ∈ R^m, then for x̄ = (x, λ) ∈ R^n̄ we
usually write F(x, λ) = F(x̄), etc. Vector components and matrix entries
are indicated by superscripts in parentheses. Distinguished vectors and matrices are
indicated by subscripts, and subscripts on vectors and matrices are inherited by their
components and entries. For example, the kth matrix in a sequence in R^{n×n} might
be denoted by B_k, in which case its ijth entry would be denoted by B_k^{(ij)}. We use
|·| to denote all vector norms and their induced matrix norms, and we use ‖·‖ to
denote a matrix norm associated with an inner product. A projection onto a subspace
or affine subspace which is orthogonal with respect to ‖·‖ is denoted by P with the
subspace or affine subspace appearing as a subscript. If P denotes a projection, then
we set P⊥ = I − P, where I is the identity operator.
2. General least-change secant updates. To define general least-change secant
updates of nonsquare matrices, we suppose that analogues of the usual ingredients
in the square-matrix case are given, viz., the following:
(i) B̄ ∈ R^{n×n̄};
(ii) s̄ ∈ R^n̄ and y ∈ R^n;
(iii) an inner-product norm ‖·‖ on R^{n×n̄};
(iv) an affine subspace A = A_N + S ⊆ R^{n×n̄}, where S is the parallel subspace
and A_N is the element of A normal to S in the ‖·‖-inner product.
It is appropriate (but not necessary) to relate these to the equation-solving context
outlined in the introduction by regarding B̄ ≈ F'(x̄) for some x̄, s̄ = x̄_+ − x̄ for some
x̄_+, and y ≈ F'(x̄)s̄, e.g., y = F(x̄_+) − F(x̄). The affine subspace A presumably
reflects some structure of F', e.g., a particular pattern of sparsity, which is to be
imposed on updates. An appropriate choice of norm ‖·‖ depends on the problem at
hand but seems likely to be the Frobenius norm or a weighted Frobenius norm, as in
the square-matrix case.
We define a general least-change secant update via the approach and notation
of [12, §2]. We set N(s̄) = {M̄ ∈ R^{n×n̄} : M̄s̄ = 0} and Q(y, s̄) = {M̄ ∈ R^{n×n̄} :
M̄s̄ = y} and note that N(s̄) is a subspace of R^{n×n̄} and Q(y, s̄) is an affine subspace
representable as Q(y, s̄) = ys̄^T/(s̄^Ts̄) + N(s̄). We define M(A, Q(y, s̄)) to be the set of
elements of A which are closest to Q(y, s̄) in the norm ‖·‖ if Q(y, s̄) ≠ ∅; otherwise,
we set M(A, Q(y, s̄)) = A. Of course, M(A, Q(y, s̄)) = A ∩ Q(y, s̄) if A ∩ Q(y, s̄) ≠ ∅.
As in Theorem 2.3 of [12], it can be shown that M(A, Q(y, s̄)) is an affine subspace
with parallel subspace S ∩ N(s̄).

Our general definition of a least-change secant update is the following.

DEFINITION 2.1. B̄_+ ∈ R^{n×n̄} is the least-change secant update of B̄ in A with
respect to s̄, y, and the norm ‖·‖ if B̄_+ is the unique solution of

    min_{M̄ ∈ M(A, Q(y,s̄))} ‖M̄ − B̄‖.

We note that B̄_+ = P_{M(A,Q(y,s̄))} B̄, the ‖·‖-orthogonal projection of B̄ onto
M(A, Q(y, s̄)). If A ∩ Q(y, s̄) ≠ ∅, then B̄_+ satisfies the secant equation

(2.1)    B̄_+ s̄ = y.
It can easily be verified that all of the results of [12, §2] for square-matrix least-change
secant updates extend to the present case. Particularly relevant results are the
following. We have

(2.2)    B̄_+ = lim_{k→∞} (P_A P_{Q(y,s̄)})^k B̄
             = P_{S∩N(s̄)} B̄ + (I − P_S P_{N(s̄)})^{-1} A_N + (I − P_S P_{N(s̄)})^{-1} P_S (ys̄^T/(s̄^Ts̄)).

In (2.2), (I − P_S P_{N(s̄)})^{-1} is well defined as an operator on (S ∩ N(s̄))⊥. It follows
that for any Ḡ ∈ M(A, Q(y, s̄)),

(2.3)    B̄_+ = P⊥_{S∩N(s̄)} Ḡ + P_{S∩N(s̄)} B̄,

and for any M̄ ∈ R^{n×n̄},

(2.4)    ‖B̄_+ − M̄‖ ≤ ‖P⊥_{S∩N(s̄)}(Ḡ − M̄)‖ + ‖P_{S∩N(s̄)}(B̄ − M̄)‖.
We also offer a general definition of a least-change inverse secant update of a
nonsquare matrix. Our motivation is the importance of least-change inverse secant
updates in the square-matrix case. For example, the Broyden-Fletcher-Goldfarb-Shanno update [6], [7], [13], [17], [29] is such an update (see [10]) and is generally
regarded as the most successful update for unconstrained optimization problems. The
second Broyden update [5] is another such update (see [10]); although it is not generally
as effective as the first Broyden update [5], there is evidence that it may be competitive
in stiff ODE applications (see [1], [4]). We hope that the updates prescribed here will
be similarly effective, e.g., in optimization problems which depend on parameters or
in stiff ODE problems in which the time variable is taken into account in updating.
The idea underlying our definition is to assume that B̄ is of full rank n, then to
select a set of n columns which constitute a nonsingular matrix, and finally to apply
the least-change principle to the n columns of the inverse of this matrix together with
another m columns derived from B̄. There are several ways in which this idea might
be carried out. We choose a way which is not only natural and straightforward but
also supported by the local convergence analysis in §4.
For convenience, we assume that the first n columns of B̄ constitute a nonsingular
matrix, although we stress that any set of n linearly independent columns of B̄ can
be used instead. We write B̄ = [B, C] for nonsingular B ∈ R^{n×n} and for C ∈ R^{n×m}
and set K̄ = [K, L], where K = B^{-1} and L = −B^{-1}C. We define an update K̄_+ of
K̄ with an eye toward an inverse secant equation

(2.5)    K̄_+ ȳ = s,

where we write s̄ = (s, t) for s ∈ R^n and t ∈ R^m and set ȳ = (y, t). Our motivation
is the following: If we write K̄_+ = [K_+, L_+] and B̄_+ = [B_+, C_+] with K_+ = B_+^{-1}
and L_+ = −B_+^{-1}C_+, then B̄_+ satisfies the secant equation B̄_+s̄ = y if and only if
s = B_+^{-1}(y − C_+t) = K̄_+ȳ. We phrase our definition in terms of K̄ and K̄_+ as
follows.
DEFINITION 2.2. K̄_+ ∈ R^{n×n̄} is the least-change inverse secant update of K̄ in
A with respect to ȳ, s, and the norm ‖·‖ if K̄_+ is the unique solution of

    min_{M̄ ∈ M(A, Q(s,ȳ))} ‖M̄ − K̄‖,

where s̄ = (s, t) for s ∈ R^n and t ∈ R^m and ȳ = (y, t).
To provide additional motivation for this definition, we note that Ypma [34] and
one of the referees have observed that

    [ B  C ]^{-1}   [ B^{-1}  −B^{-1}C ]   [ K  L ]
    [ 0  I ]      = [ 0        I       ] = [ 0  I ],

and from this it is easy to see that K̄_+ of Definition 2.2 can be obtained through a
conventional least-change inverse secant update of a square matrix with an appropriate
choice of an inner-product norm and affine subspace. We also note that from K̄_+ =
[K_+, L_+], we can obtain B̄_+ = [B_+, C_+] by taking B_+ = K_+^{-1} and C_+ = −K_+^{-1}L_+,
although it may be preferred in some applications to maintain K̄_+ rather than B̄_+. If
A ∩ Q(s, ȳ) ≠ ∅, then K̄_+ satisfies (2.5) and B̄_+ satisfies (2.1). We have in any case
that K̄_+ = P_{M(A,Q(s,ȳ))} K̄. Also, inverse analogues of (2.2), (2.3), and (2.4) hold with
B̄_+, B̄, and y in those equations replaced by K̄_+, K̄, and s, respectively.
3. Some specific updates. We now derive extensions to the nonsquare case
of the best-known square-matrix least-change secant updates. These extensions are
obtained from the definitions in §2 by making various choices of A and ‖·‖ which
correspond in each case to the analogous choice made for square matrices. Not surprisingly, many of these updates look very much like their square-matrix counterparts.
Our intention is not only to determine updates which are likely to be important but
also to demonstrate methods of derivation which can be used to obtain other desirable
updates.
We first derive least-change secant updates from Definition 2.1 and then derive
least-change inverse secant updates from Definition 2.2. In each case, we refer to
the update by the corresponding well-known name in the square-matrix case. For
simplicity, we also assume that the matrix to be updated lies in A in each case. This
is typically to be expected in practice. Also, if the matrix were not in A, then we
could obtain its least-change secant update in A by first projecting it onto A in a
straightforward way and then applying the appropriate formula below. For both the
symmetry-preserving updates and the inverse updates, it is necessary to assume that
a subset of n of the columns of the matrix being updated has some special property.
Both for convenience and because it seems most likely to be the case in practice, we
assume that the special property resides in the first n columns. We stress that any
other subset of n columns will do just as well and that even though an assumption is
made about a particular subset of the columns, the updates are derived by applying
the least-change principle to the entire matrix.
3.1. Least-change secant updates. We derive below extensions of the first
Broyden update, the Powell symmetric Broyden (PSB) update [26], the sparse Broyden
(Schubert) update [8], [27], and the Davidon-Fletcher-Powell (DFP) update [9], [14].
At this time we do not have a tractable derivation of an extension of the sparse
symmetric update of Marwil [21] and Toint [30], although such an update has been
derived by Beattie and Weaver-Smith [2] using other methods.
,
We suppose that we are given B 6 Rnxn, nonzero 6 R and y Rn. When
necessary in the following, we write/}
[B, C] for B Rnxn and C Rnxm and
n
m.
for
We
t
and
s
that each update below gives the usual
note
R
R
6
(s, t)
square-matrix update of B when t 0.
The first Broyden update. We take A = S = R^{n×n̄} and ‖·‖ = ‖·‖_F, the
Frobenius norm. We have M(A, Q(y, s̄)) = Q(y, s̄), and so B̄_+ = P_{M(A,Q(y,s̄))} B̄ =
P_{Q(y,s̄)} B̄, where the projection is ‖·‖_F-orthogonal. Orthogonal projection onto an
affine subspace is obtained by adding the normal element of the affine subspace to
orthogonal projection onto the parallel subspace. It is easily verified that the normal
element of Q(y, s̄) is ys̄^T/(s̄^Ts̄) and that orthogonal projection onto the parallel subspace
N(s̄) is given by P_{N(s̄)} M̄ = M̄[I − s̄s̄^T/(s̄^Ts̄)] for M̄ ∈ R^{n×n̄}. It follows that

(3.1.1)    B̄_+ = B̄[I − s̄s̄^T/(s̄^Ts̄)] + ys̄^T/(s̄^Ts̄) = B̄ + (y − B̄s̄)s̄^T/(s̄^Ts̄).
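In code, the extension (3.1.1) is a one-line analogue of the square-matrix Broyden update; the following NumPy sketch (ours, not the authors') produces an update satisfying the secant equation (2.1) by construction:

```python
import numpy as np

def broyden1_update(Bbar, sbar, y):
    """Extension (3.1.1) of the first Broyden update to a nonsquare
    Bbar in R^{n x (n+m)}:
        Bbar+ = Bbar + (y - Bbar sbar) sbar^T / (sbar^T sbar)."""
    return Bbar + np.outer(y - Bbar @ sbar, sbar) / (sbar @ sbar)
```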
The Powell symmetric Broyden update. We now take

    A = S = {M̄ = [M, N] ∈ R^{n×n̄} : M = M^T ∈ R^{n×n}}

and again take ‖·‖ = ‖·‖_F. Instead of characterizing P_{M(A,Q(y,s̄))} as above, we work
with the expression (2.2). With A_N = 0, (2.2) is

(3.1.2)    B̄_+ = P_{S∩N(s̄)} B̄ + (I − P_S P_{N(s̄)})^{-1} P_S (ys̄^T/(s̄^Ts̄)).

For M̄ = [M, N] ∈ R^{n×n̄} with M ∈ R^{n×n} and N ∈ R^{n×m}, we easily have P_S M̄ =
[(M + M^T)/2, N], and P_{N(s̄)} M̄ = M̄[I − s̄s̄^T/(s̄^Ts̄)] as before. Assuming B̄ ∈ A = S,
we use these projections to evaluate the right-hand side of (3.1.2).

First, we have P_{S∩N(s̄)} B̄ = lim_{k→∞} (P_S P_{N(s̄)})^k B̄ (cf. [12, §2]). Straightforward
calculation gives

    P_S P_{N(s̄)} B̄ = B̄ − B̄_1,    B̄_1 = (1/(s̄^Ts̄)) [ (B̄s̄ s^T + s(B̄s̄)^T)/2, B̄s̄ t^T ],

and

    P_S P_{N(s̄)} B̄_1 = (s^Ts/(2s̄^Ts̄)) B̄_1 − (s^TB̄s̄/(2(s̄^Ts̄)^2)) s s̄^T.

Note that P_{N(s̄)}(s s̄^T) = 0, so P_S P_{N(s̄)}(s s̄^T) = 0. It can be shown by induction
that for k = 2, 3, ...,

    (P_S P_{N(s̄)})^k B̄ = B̄ − B̄_1 − Σ_{j=1}^{k−1} (s^Ts/(2s̄^Ts̄))^{j−1} [ (s^Ts/(2s̄^Ts̄)) B̄_1 − (s^TB̄s̄/(2(s̄^Ts̄)^2)) s s̄^T ],

and it follows that

(3.1.3)    P_{S∩N(s̄)} B̄ = B̄ − {B̄s̄ s̄^T + [s(B̄s̄)^T, B̄s̄ t^T]}/(s̄^Ts̄ + t^Tt)
                         + [s^TB̄s̄/((s̄^Ts̄ + t^Tt)(s̄^Ts̄))] s s̄^T.
To evaluate the other term in (3.1.2), we note that (I − P_S P_{N(s̄)})^{-1} =
Σ_{k=0}^∞ (P_S P_{N(s̄)})^k on (S ∩ N(s̄))⊥ and that

(3.1.4)    P_S (ys̄^T/(s̄^Ts̄)) = (1/(s̄^Ts̄)) [ (ys^T + sy^T)/2, yt^T ].

It follows by induction that for k = 1, 2, ...,

(3.1.5)    (P_S P_{N(s̄)})^k P_S (ys̄^T/(s̄^Ts̄)) = (s^Ts/(2s̄^Ts̄))^{k−1} [ (s^Ts/(2s̄^Ts̄)) P_S (ys̄^T/(s̄^Ts̄)) − (s^Ty/(2(s̄^Ts̄)^2)) s s̄^T ],

and (3.1.4) and (3.1.5) give

(3.1.6)    (I − P_S P_{N(s̄)})^{-1} P_S (ys̄^T/(s̄^Ts̄)) = {ys̄^T + [sy^T, yt^T]}/(s̄^Ts̄ + t^Tt)
                                                      − [s^Ty/((s̄^Ts̄ + t^Tt)(s̄^Ts̄))] s s̄^T.

Combining (3.1.2), (3.1.3), and (3.1.6) gives our extension of the PSB update:

(3.1.7)    B̄_+ = B̄ + {(y − B̄s̄)s̄^T + [s(y − B̄s̄)^T, (y − B̄s̄)t^T]}/(s̄^Ts̄ + t^Tt)
                   − [s^T(y − B̄s̄)/((s̄^Ts̄ + t^Tt)(s̄^Ts̄))] s s̄^T.

The structure of this update may be better revealed if we write B̄_+ = [B_+, C_+] for
B_+ ∈ R^{n×n} and C_+ ∈ R^{n×m} and note that (3.1.7) gives

(3.1.8)    B_+ = B + [(y − B̄s̄)s^T + s(y − B̄s̄)^T]/(s̄^Ts̄ + t^Tt)
                   − [s^T(y − B̄s̄)/((s̄^Ts̄ + t^Tt)(s̄^Ts̄))] s s^T,

           C_+ = C + 2(y − B̄s̄)t^T/(s̄^Ts̄ + t^Tt)
                   − [s^T(y − B̄s̄)/((s̄^Ts̄ + t^Tt)(s̄^Ts̄))] s t^T.
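The block formulas (3.1.8) translate directly into code; the following sketch (ours) updates B and C, preserving symmetry of B while enforcing the secant equation:

```python
import numpy as np

def psb_update(B, C, s, t, y):
    """Extension (3.1.8) of the PSB update for Bbar = [B, C] with B = B^T."""
    r = y - B @ s - C @ t                      # y - Bbar sbar
    sigma = s @ s + t @ t                      # sbar^T sbar
    den = sigma + t @ t                        # sbar^T sbar + t^T t
    cf = (s @ r) / (den * sigma)
    B_new = B + (np.outer(r, s) + np.outer(s, r)) / den - cf * np.outer(s, s)
    C_new = C + 2.0 * np.outer(r, t) / den - cf * np.outer(s, t)
    return B_new, C_new
```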
The sparse Broyden update. We now take A and S to be the set of matrices
in R^{n×n̄} having a particular pattern of sparsity, and we again take ‖·‖ = ‖·‖_F. As
in the case of the first Broyden update, we characterize P_{M(A,Q(y,s̄))}.

For i = 1, ..., n, we let s̄_i be the vector obtained by imposing on s̄ the sparsity
pattern of the ith row of matrices in A. We note that if M̄ ∈ A, then M̄ ∈ M(A, Q(y, s̄))
if and only if for i = 1, ..., n, e_i^T M̄ s̄ = e_i^T y whenever s̄_i ≠ 0, where e_i is the ith
standard unit basis vector in R^n. Also, M(A, Q(y, s̄)) is an affine subspace with parallel
subspace A ∩ N(s̄) = S ∩ N(s̄). Then we have for M̄ ∈ A that

    P_{S∩N(s̄)} M̄ = M̄ − Σ_{i=1}^n (s̄_i^T s̄_i)^+ e_i e_i^T M̄ s̄ s̄_i^T

and that the normal to M(A, Q(y, s̄)) is

    Σ_{i=1}^n (s̄_i^T s̄_i)^+ e_i e_i^T y s̄_i^T,

where "+" denotes pseudo-inverse, i.e., for each i, (s̄_i^T s̄_i)^+ = (s̄_i^T s̄_i)^{-1} if s̄_i ≠ 0 and
(s̄_i^T s̄_i)^+ = 0 if s̄_i = 0. It follows that if B̄ ∈ A, then

    B̄_+ = B̄ + Σ_{i=1}^n (s̄_i^T s̄_i)^+ e_i e_i^T (y − B̄s̄) s̄_i^T.
The Davidon-Fletcher-Powell update. We take

    A = S = {M̄ = [M, N] ∈ R^{n×n̄} : M = M^T ∈ R^{n×n}}

as in the case of the PSB update, and we suppose that B̄ = [B, C] ∈ A. As in
the square-matrix case (cf. [10], [12]), we determine an appropriate matrix norm for
updating using a scaling matrix W ∈ R^{n×n}. We ask the reader to bear with our
perhaps curious definitions and assumptions pending the justification given following
them. We set ŷ = y − Ct and assume s^Tŷ > 0. We let W ∈ R^{n×n} be any positive-definite
symmetric matrix such that Ws = ŷ, and define a norm ‖·‖_W on R^{n×n̄} as
follows: For M̄ = [M, N] ∈ R^{n×n̄}, we set

(3.1.9)    ‖M̄‖_W^2 = tr {W^{-1}MW^{-1}M^T + W^{-1}NN^T}.
To justify these definitions and assumptions, we note that in the equation-solving
context outlined earlier, we presumably want an update of the type sought here in
order to reflect positive-definiteness and symmetry of F_x. If B̄ is a good approximation
of F'(x̄) and x̄_+ ≈ x̄ for nearby x̄_+ and x̄, then ŷ ≈ F_x(x̄)s, and so we can
expect s^Tŷ > 0. The tradition is to think of the s, ŷ pair as reflecting a natural scaling
determined by F_x and approximated by W. In principle, we might define ‖·‖_W in
any of several ways which extend the definition used in the square-matrix case. We
have chosen to define ‖·‖_W by (3.1.9) because doing so yields a final update formula
in which W does not appear. As in the square-matrix case, this is crucial; see the
discussion in [12, pp. 971-972].
We determine B̄_+ as a least-change secant update of B̄ in A with respect to s̄, y, and
‖·‖_W. The strategy for doing so is one sometimes used in the square-matrix case (see,
e.g., [10]), viz., first to rescale B̄ using W, then to update the rescaled matrix using
(3.1.7) and (3.1.8), and finally to remove the scale to obtain B̄_+. We let W = JJ^T
be any factorization of W, and for B̄ = [B, C] and B̄_+ = [B_+, C_+] we set

    B̃ = [B̃, C̃] = [J^{-1}BJ^{-T}, J^{-1}C]  and  B̃_+ = [B̃_+, C̃_+] = [J^{-1}B_+J^{-T}, J^{-1}C_+].

We note that B̄_+s̄ = y if and only if B̃_+s̃ = ỹ, where s̃ = (J^Ts, t) and ỹ = J^{-1}y.
Furthermore, ‖B̃_+ − B̃‖_F = ‖B̄_+ − B̄‖_W. It follows that B̄_+ is the least-change secant
update of B̄ in A with respect to s̄, y, and ‖·‖_W if and only if B̃_+ is the least-change
secant update of B̃ in A with respect to s̃, ỹ, and ‖·‖_F. Determining B̃_+ by (3.1.7)
and (3.1.8) and removing the scale gives our extension of the DFP update, in which
ŷ = y − Ct:

(3.1.10)   B̄_+ = B̄ + {(y − B̄s̄)(ŷ, t)^T + [ŷ(y − B̄s̄)^T, (y − B̄s̄)t^T]}/(ŷ^Ts + 2t^Tt)
                   − [s^T(y − B̄s̄)/((ŷ^Ts + 2t^Tt)(ŷ^Ts + t^Tt))] ŷ(ŷ, t)^T.
To further clarify the structure of this update, we write B̄_+ = [B_+, C_+] for B_+ ∈ R^{n×n}
and C_+ ∈ R^{n×m} and obtain from (3.1.10)

    B_+ = B + [(y − B̄s̄)ŷ^T + ŷ(y − B̄s̄)^T]/(ŷ^Ts + 2t^Tt)
            − [s^T(y − B̄s̄)/((ŷ^Ts + 2t^Tt)(ŷ^Ts + t^Tt))] ŷŷ^T,

    C_+ = C + 2(y − B̄s̄)t^T/(ŷ^Ts + 2t^Tt)
            − [s^T(y − B̄s̄)/((ŷ^Ts + 2t^Tt)(ŷ^Ts + t^Tt))] ŷt^T.
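These component formulas translate into the following sketch (ours); note that W itself never appears, only ŷ = y − Ct:

```python
import numpy as np

def dfp_update(B, C, s, t, y):
    """Extension (3.1.10) of the DFP update for Bbar = [B, C], B = B^T.
    Uses yhat = y - C t; the derivation assumes s^T yhat > 0 (so that a
    positive-definite scaling W with W s = yhat exists)."""
    yhat = y - C @ t
    r = y - B @ s - C @ t
    d = yhat @ s + 2.0 * (t @ t)               # yhat^T s + 2 t^T t
    cf = (s @ r) / (d * (yhat @ s + t @ t))
    B_new = B + (np.outer(r, yhat) + np.outer(yhat, r)) / d - cf * np.outer(yhat, yhat)
    C_new = C + 2.0 * np.outer(r, t) / d - cf * np.outer(yhat, t)
    return B_new, C_new
```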
3.2. Least-change inverse secant updates. We now give extensions of several
square-matrix least-change inverse secant updates, viz., the second Broyden update,
a rank-two symmetry-preserving update due to Greenstadt [18], which is the inverse-update
analogue of the PSB update, and the Broyden-Fletcher-Goldfarb-Shanno
(BFGS) update. As we remarked in §2, the BFGS update is generally regarded as
the most successful update for unconstrained optimization, and the second Broyden
update, while generally not as effective as the first, may have some advantages in stiff
ODE applications. In the square-matrix case, the Greenstadt update is generally less
successful than the PSB update. We include it here both for completeness and because
it may prove to have particular advantages on some special problems. It would be
straightforward to prescribe an inverse-update analogue of the sparse Broyden update,
but the lack of applications of such an update makes this seem of doubtful worth.
We assume that we are given K̄ ∈ R^{n×n̄}, nonzero s̄ ∈ R^n̄, and y ∈ R^n. We write
s̄ = (s, t) for s ∈ R^n and t ∈ R^m and set ȳ = (y, t). We also write K̄ = [K, L], where
K ∈ R^{n×n} and L ∈ R^{n×m}, and denote each updated matrix by K̄_+ = [K_+, L_+].
We note that as before each update below gives the usual square-matrix update of K
when t = 0. Each of these updates is an inverse-update analogue of a direct update in
§3.1 and not only can be derived by analogous reasoning but indeed is obtained just
by renaming the ingredients in the appropriate formula in §3.1.
The second Broyden update. As in the case of the first Broyden update, we
take A = S = R^{n×n̄} and ‖·‖ = ‖·‖_F. Then we obtain from (3.1.1) that

(3.2.1)    K̄_+ = K̄ + (s − K̄ȳ)ȳ^T/(ȳ^Tȳ).
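For illustration (our sketch), (3.2.1) in NumPy; by construction K̄_+ satisfies the inverse secant equation (2.5):

```python
import numpy as np

def broyden2_update(Kbar, s, t, y):
    """Extension (3.2.1) of the second Broyden update: with ybar = (y, t),
    Kbar+ = Kbar + (s - Kbar ybar) ybar^T / (ybar^T ybar)."""
    ybar = np.concatenate([y, t])
    return Kbar + np.outer(s - Kbar @ ybar, ybar) / (ybar @ ybar)
```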
The Greenstadt update. As with the PSB and DFP updates, we take

    A = S = {M̄ = [M, N] ∈ R^{n×n̄} : M = M^T ∈ R^{n×n}}

and ‖·‖ = ‖·‖_F. Then (3.1.7) yields

    K̄_+ = K̄ + {(s − K̄ȳ)ȳ^T + [y(s − K̄ȳ)^T, (s − K̄ȳ)t^T]}/(ȳ^Tȳ + t^Tt)
             − [y^T(s − K̄ȳ)/((ȳ^Tȳ + t^Tt)(ȳ^Tȳ))] y ȳ^T,

or

    K_+ = K + [(s − K̄ȳ)y^T + y(s − K̄ȳ)^T]/(ȳ^Tȳ + t^Tt)
            − [y^T(s − K̄ȳ)/((ȳ^Tȳ + t^Tt)(ȳ^Tȳ))] y y^T,

    L_+ = L + 2(s − K̄ȳ)t^T/(ȳ^Tȳ + t^Tt)
            − [y^T(s − K̄ȳ)/((ȳ^Tȳ + t^Tt)(ȳ^Tȳ))] y t^T.
The Broyden-Fletcher-Goldfarb-Shanno update. We again take

    A = S = {M̄ = [M, N] ∈ R^{n×n̄} : M = M^T ∈ R^{n×n}}

and define the norm as a weighted Frobenius norm as follows: Set ŝ = s − Lt ≈
F_x(x̄)^{-1}y, assume ŝ^Ty > 0, and let W ∈ R^{n×n} be any positive-definite symmetric
matrix such that Wy = ŝ. Then for M̄ = [M, N] ∈ R^{n×n̄}, define

    ‖M̄‖_W^2 = tr {W^{-1}MW^{-1}M^T + W^{-1}NN^T}.

From (3.1.10) we have that

(3.2.2)    K̄_+ = K̄ + {(s − K̄ȳ)(ŝ, t)^T + [ŝ(s − K̄ȳ)^T, (s − K̄ȳ)t^T]}/(ŝ^Ty + 2t^Tt)
                   − [y^T(s − K̄ȳ)/((ŝ^Ty + 2t^Tt)(ŝ^Ty + t^Tt))] ŝ(ŝ, t)^T,

or

(3.2.3)    K_+ = K + [(s − K̄ȳ)ŝ^T + ŝ(s − K̄ȳ)^T]/(ŝ^Ty + 2t^Tt)
                   − [y^T(s − K̄ȳ)/((ŝ^Ty + 2t^Tt)(ŝ^Ty + t^Tt))] ŝŝ^T,

           L_+ = L + 2(s − K̄ȳ)t^T/(ŝ^Ty + 2t^Tt)
                   − [y^T(s − K̄ȳ)/((ŝ^Ty + 2t^Tt)(ŝ^Ty + t^Tt))] ŝt^T.
In the square-matrix case, it is often desirable in practice to update approximate
Jacobians directly instead of updating their inverses, e.g., in order to maintain and
update QR or Cholesky factorizations of approximate Jacobians. Direct update formulas can be obtained in a straightforward way from the expressions above via the
Sherman-Morrison-Woodbury formula (see, e.g., [25, p. 50]), and we give such formulas below for the second Broyden update and the BFGS update. Although the direct
update formula for the second Broyden update is easy to obtain, deriving the direct
formulas for the Greenstadt and BFGS updates is quite tedious. We do not discuss
the direct formula for the Greenstadt update here since there appears to be nothing
additional to be learned from it and it seems unlikely to be widely applied. Assuming
K and K_+ are invertible, we set B̄ = [B, C], where B = K^{-1} and C = −K^{-1}L, and
B̄_+ = [B_+, C_+], where B_+ = K_+^{-1} and C_+ = −K_+^{-1}L_+.

For the second Broyden update, an application of the rank-one Sherman-Morrison-Woodbury
formula to K_+ given by (3.2.1) yields

    B_+ = B + (y − B̄s̄)y^TB/(y^TB̄s̄ + t^Tt),

which gives

    C_+ = C + (y − B̄s̄)(y^TC + t^T)/(y^TB̄s̄ + t^Tt).
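These direct formulas can be checked numerically against the inverse update (3.2.1); the following script (our sketch, with random data) verifies that B_+ inverts K_+ and that C_+ = −B_+L_+:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
B = rng.standard_normal((n, n)) + 5.0 * np.eye(n)   # nonsingular first block
C = rng.standard_normal((n, m))
s, t = rng.standard_normal(n), rng.standard_normal(m)
y = rng.standard_normal(n)

K = np.linalg.inv(B)
L = -np.linalg.solve(B, C)                           # Kbar = [K, L]
ybar = np.concatenate([y, t])

# Inverse update (3.2.1) of Kbar = [K, L]:
r_inv = s - K @ y - L @ t                            # s - Kbar ybar
K_plus = K + np.outer(r_inv, y) / (ybar @ ybar)
L_plus = L + np.outer(r_inv, t) / (ybar @ ybar)

# Corresponding direct update of Bbar = [B, C]:
r = y - B @ s - C @ t                                # y - Bbar sbar
d = y @ (B @ s + C @ t) + t @ t                      # y^T Bbar sbar + t^T t
B_plus = B + np.outer(r, B.T @ y) / d                # (y - Bbar sbar) y^T B / d
C_plus = C + np.outer(r, C.T @ y + t) / d            # (y - Bbar sbar)(y^T C + t^T)/d
```

Here B_plus is the inverse of K_plus, and C_plus = −B_plus L_plus.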
For the BFGS update, an application of the rank-two Sherman-Morrison-Woodbury
formula to K_+ given by (3.2.2) and (3.2.3) yields (after a very long computation)

(3.2.4)    B_+ = B + {d_1 [(B̄s̄)^TB^{-1}(B̄s̄)] yy^T − c (B̄s̄)(B̄s̄)^T
                      + 2t^Tt d_1 [y(B̄s̄)^T + (B̄s̄)y^T]}/d,

which gives

(3.2.5)    C_+ = B_+B^{-1} [ C + 2(y − B̄s̄)t^T/d_2 − [y^TB^{-1}(y − B̄s̄)] (B̄s̄)t^T/(d_1 d_2) ],

where

(3.2.6)    d_1 = y^TB^{-1}B̄s̄ + t^Tt,
           d_2 = y^TB^{-1}B̄s̄ + 2t^Tt,
           c = d_2^2 + (y^TB^{-1}y)(t^Tt),
           d = 4(t^Tt)^2 d_1 + [(B̄s̄)^TB^{-1}(B̄s̄)] c.

There are many possible alternatives to these expressions for B_+ and C_+, but these
are as appealing to us from both aesthetic and computational points of view as any
others which we have explored. The apparent computational difficulty of applying
(3.2.5) is mitigated somewhat by the fact that

(3.2.7)    B_+B^{-1} = I + {d_1 [(B̄s̄)^TB^{-1}(B̄s̄)] y(B^{-1}y)^T − c (B̄s̄)(B^{-1}B̄s̄)^T
                           + 2t^Tt d_1 [y(B^{-1}B̄s̄)^T + (B̄s̄)(B^{-1}y)^T]}/d,

and so by using (3.2.7), the application of (3.2.5) involves mainly forming dot products
and linear combinations of vectors and, in particular, no excessive system solving. We
note, however, that forming B^{-1}y and B^{-1}B̄s̄ requires the solution of two systems
involving B, and we have not been able to find any alternative to these expressions
that does not require this.
An alternative which might be preferred is to maintain B and L instead of B
and C. For example, iteration (4.1) in the next section is naturally implemented by
maintaining B in factored form together with L. In any case, C can be recovered
from B and L if desired by C = −BL, and maintaining B and L offers certain
arithmetic advantages. Indeed, from (3.2.3) and (3.2.4) and the identities B̄s̄ = Bŝ
and s − K̄ȳ = ŝ − B^{-1}y, where ŝ = s − Lt, we have

(3.2.8)    B_+ = B + {d_1 (ŝ^TBŝ) yy^T − c (Bŝ)(Bŝ)^T + 2t^Tt d_1 [y(Bŝ)^T + (Bŝ)y^T]}/d,

           L_+ = L + 2(ŝ − B^{-1}y)t^T/d_2 − [y^T(ŝ − B^{-1}y)] ŝt^T/(d_1 d_2),

where

(3.2.9)    d_1 = y^Tŝ + t^Tt,
           d_2 = y^Tŝ + 2t^Tt,
           c = d_2^2 + (y^TB^{-1}y)(t^Tt),
           d = 4(t^Tt)^2 d_1 + (ŝ^TBŝ) c.

The implementation of these expressions requires the solution of only one system
involving B, and considerably less work is required to generate L_+ from L using
(3.2.8) and (3.2.9) than is required to generate C_+ from C using (3.2.5)-(3.2.7).
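The one-solve structure of (3.2.8)-(3.2.9) can be sketched as follows (our illustration, assuming B symmetric positive definite); the single system solved with B is for B^{-1}y:

```python
import numpy as np

def bfgs_direct_update_BL(B, L, s, t, y):
    """Direct BFGS update maintaining B and L, per (3.2.8)-(3.2.9).
    Only one system with B is solved (for B^{-1} y); B is assumed symmetric."""
    shat = s - L @ t                 # shat = s - Lt
    Bs = B @ shat                    # equals Bbar sbar
    u = np.linalg.solve(B, y)        # B^{-1} y, the single system solve
    tt = t @ t
    d1 = y @ shat + tt
    d2 = y @ shat + 2.0 * tt
    c = d2 * d2 + (y @ u) * tt
    d = 4.0 * tt * tt * d1 + (shat @ Bs) * c
    B_new = (B + (d1 * (shat @ Bs) / d) * np.outer(y, y)
               - (c / d) * np.outer(Bs, Bs)
               + (2.0 * tt * d1 / d) * (np.outer(y, Bs) + np.outer(Bs, y)))
    L_new = (L + 2.0 * np.outer(shat - u, t) / d2
               - ((y @ (shat - u)) / (d1 * d2)) * np.outer(shat, t))
    return B_new, L_new
```

A direct numerical check is that B_new inverts the K_+ of (3.2.3) and L_new agrees with the L_+ there.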
4. A local convergence analysis. In this section we outline a general local
convergence analysis for certain iterations for solving parameter-dependent systems
which use the general least-change secant and inverse secant updates defined in §2.
Our objective is to provide some theoretical support for using these general updates
and the specific updates in §3 to augment the motivation provided by the success of
least-change secant updates in the square-matrix case.
Iterative methods used to solve parameter-dependent systems vary widely in practice and are usually structured with particular applications or contexts in mind. Here
we consider certain paradigm iterations which we feel represent in an essential way a
significant class of methods that has not )een treated elsewhere. We have in mind
methods in which some explicit control is exercised over successive parameter values
and, in particular, the way in which parameter values approach final or limiting values.
For treatments of other iterative methods applicable to solving parameter-dependent
systems, especially the normal flow and augmented Jacobian algorithms, see Walker
and Watson [31] and Martinez [20].
Our paradigm iterations and local convergence analysis could have taken several
forms, and we have tried to offer a good compromise of generality, presentability, etc.,
which still serves the primary purpose, viz., to show that if updating is done according to certain principles, then successive approximate Jacobians will approximate the
actual Jacobian increasingly well in the directions which count and desirable convergence properties will follow. The main point of our analysis can be summarized as
follows: If the Jacobian has the structure being imposed on updates and if various
ancillary hypotheses hold, then locally the iterates produced by the paradigm iterations
converge as fast as the parameter convergence will allow, at least up to q-superlinear
convergence.
In our analysis, parameter values are explicitly allowed to approach a limiting
value without necessarily ever reaching it, and the rate at which this limiting value
is approached plays an important role. These aspects of our analysis are essential
not only to establishing the main point summarized above but also to meaningfully
addressing the basic issue, which is whether anything is to be gained by nonsquare-matrix least-change secant updating. If we were to consider only the case in which
parameter values attain some final value after a finite number of steps, then the
analysis would provide no basis for distinguishing between doing nonsquare-matrix
updating at each step and, say, doing no updating at all until the final parameter value
is reached and subsequently doing conventional square-matrix updating; indeed, it
follows from the analysis in [12] that the latter leads to q-superlinear local convergence.
We emphasize that the iterations considered here are intended to serve as paradigms.
However, it should be possible to modify our analysis without much difficulty to apply
to some other iterations which do not fit within the framework given here; see the
discussion in §5 after Algorithm 5.1 for an illustration.
Our analysis closely parallels the general local convergence analysis for square-matrix least-change secant update methods given by Dennis and Walker [12]. It is
not fully as general as the analysis of [12]: For one thing, it does not include cases
in which part of the Jacobian is computed and part is approximated by a matrix
maintained by updating. However, it can easily be extended to include such cases;
they have not been considered here only to simplify the exposition. For the same
reason, our iterations do not include the option of not updating, although it would be
trivial to do so. More seriously, the analysis here does not include cases in which the
norm used to define least-change secant updates varies from one iteration to the next
(the "iteratively rescaled" cases of [12]). These cases are very important and include
the nonsquare extensions of the DFP and BFGS updates given in §3. It might be
considered a serious shortcoming of the analysis given here that these cases are not
included; however, we note below that there are still good heuristic arguments based
on our analysis which support the use of these updates.
Otherwise, we have retained the generality of [12] in our analysis. In particular,
we have allowed for cases in which the Jacobian at the solution of interest is not in
the set of allowable approximants in order to show how the speed of convergence is
affected in these cases. Since the usual secant equation determined by a difference in
F-values is unlikely to be appropriate in these cases, we have also allowed for cases
in which there is a variety of admissible secant equations determined by a "choice
rule." As in [12], we assume a choice rule is given as a function $X$ which for each pair $\hat x, \hat x_+ \in R^{n+m}$ determines a set $X(\hat x, \hat x_+) \subseteq R^n$ of admissible right-hand sides of secant equations.
Following [12], we have carried out most of the difficult technical work underlying
our analysis in an appendix. The results in the Appendix, while used here only to
support the analysis in this section, are rather general and may be of interest in their
own right.
We assume below that $A = A_N + S \subseteq R^{n\times(n+m)}$ is a given affine subspace which reflects structure to be imposed on approximate Jacobians and that $\|\cdot\|$ is a given inner-product norm on $R^{n\times(n+m)}$. As before, we denote all vector norms and their induced matrix norms by $|\cdot|$, and here we assume that particular norms $|\cdot|$ are given on $R^n$ and $R^m$. We also consider various norms on $R^{n+m}$ which are specified in each instance using the norms on $R^n$ and $R^m$.
We consider a nonlinear system (1.2) and assume that a particular solution $\hat x_* = (x_*, \lambda_*)$ is sought satisfying $F(\hat x_*) = 0$. Throughout the sequel we assume the following hypothesis.

BASIC HYPOTHESIS. Let $F$ be differentiable in an open convex neighborhood $N$ of $\hat x_* = (x_*, \lambda_*) \in R^{n+m}$ for which $F(\hat x_*) = 0$, and suppose that $\gamma > 0$ and $p \in (0, 1]$ are such that for $\hat x \in N$,

$$|F'(\hat x) - F'(\hat x_*)| \le \gamma\,|\hat x - \hat x_*|^p,$$

where $|\hat x| = \max\{|x|, |\lambda|\}$ for $\hat x = (x, \lambda) \in R^{n+m}$.
We assume that $\{\lambda_k\}_{k=0,1,\dots}$ is a sequence prescribed in some way which converges to $\lambda_*$, and first consider an iteration which begins with some $\hat x_0 = (x_0, \lambda_{k_0})$ and $\hat B_0 = [B_0, C_0]$ and determines

$$
(4.1)\qquad
\begin{aligned}
x_{k+1} &= x_k - B_k^{-1}\{F(\hat x_k) + C_k(\lambda_{k_0+k+1} - \lambda_{k_0+k})\},\\
\hat x_{k+1} &= (x_{k+1}, \lambda_{k_0+k+1}),\\
\hat B_{k+1} &= [B_{k+1}, C_{k+1}] = (\hat B_k)_+,
\end{aligned}
$$

for $k = 0, 1, \dots$, where $(\hat B_k)_+$ is the least-change secant update of $\hat B_k$ in $A$ with respect to $\hat s_k = \hat x_{k+1} - \hat x_k$, $y_k$, and the norm $\|\cdot\|$. An analogous iteration which uses least-change inverse secant updates is given by (4.11) below. Iteration (4.1) is modeled after an iteration derived in the manner of Newton's method in which

$$
(4.2)\qquad
x_{k+1} = x_k - F_x(\hat x_k)^{-1}\{F(\hat x_k) + F_\lambda(\hat x_k)(\lambda_{k_0+k+1} - \lambda_{k_0+k})\}.
$$

Note that (4.2) reverts to the Newton iteration in $x$ if $\lambda_{k_0+k+1} = \lambda_{k_0+k}$. The index $k_0$ is used in (4.1) mainly for convenience in our local convergence analysis; we elaborate further on the use of $k_0$ in the remark following Theorem 4.1 below. We are not
concerned here with exactly how the Ak’s are prescribed. They may be completely
specified in some way prior to beginning the iterations. However, in some iterations
used in practice, they are determined as the iterations proceed, rather than at the
outset, and it happens in some cases that they often remain constant through a range of
k-values before changing in some specified way. For a clarifying example, see Algorithm
5.1 and the related discussion in §5.
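To make the shape of iteration (4.1) concrete, the following Python sketch runs it on a small parameter-dependent system, with $\hat B_k$ maintained by a nonsquare first Broyden update (the least-change secant update in the unweighted Frobenius norm). The routine names and the one-variable test problem in the usage note are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def broyden1_nonsquare(Bhat, y, shat):
    """Nonsquare first Broyden update: the least-change update of Bhat in the
    Frobenius norm satisfying the secant equation Bhat_+ @ shat = y."""
    d = shat @ shat
    return Bhat if d < 1e-30 else Bhat + np.outer(y - Bhat @ shat, shat) / d

def iteration_41(F, x0, lambdas, Bhat0, nsteps):
    """Sketch of iteration (4.1): Newton-like steps in x along a prescribed
    parameter sequence lambdas, with Bhat_k = [B_k, C_k] approximating
    the Jacobian [F_x, F_lambda] at (x_k, lambda_k)."""
    n = len(x0)
    x, Bhat = np.asarray(x0, float), np.asarray(Bhat0, float)
    for k in range(nsteps):
        lam, lam_next = lambdas[k], lambdas[k + 1]
        B, C = Bhat[:, :n], Bhat[:, n:]
        # x_{k+1} = x_k - B_k^{-1}{F(x_k, lam_k) + C_k (lam_{k+1} - lam_k)}
        x_next = x - np.linalg.solve(B, F(x, lam) + C @ np.atleast_1d(lam_next - lam))
        shat = np.append(x_next - x, lam_next - lam)
        y = F(x_next, lam_next) - F(x, lam)  # traditional secant right-hand side
        Bhat, x = broyden1_nonsquare(Bhat, y, shat), x_next
    return x
```

For example, with $F(x, \lambda) = x^2 - \lambda$ and $\lambda_k = 1 - 2^{-k}$, the iterates track $x = \sqrt{\lambda_k}$ and approach $x_* = 1$.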
The assumption in (4.2) that $F_x$ is invertible might seem troubling to those concerned with homotopy methods since it suggests an assumption that a particular set of $n$ columns of $F'$ is of rank $n$. However, if $F'(\hat x_*)$ itself is of rank $n$, then the variables can be relabeled if necessary so that $F_x$ is invertible near $\hat x_*$. We point out that with this understanding, certain corrector iterations used in predictor-corrector homotopy methods, such as the augmented Jacobian algorithm, the normal flow algorithm, and variations of these which use least-change secant and inverse secant updates (see Watson, Billups, and Morgan [32] and Walker and Watson [31]), fall within the framework of iterations (4.1), (4.2), and (4.11) below. However, our analysis below fails to apply to those iterations because the main results are conditioned on the rate of q-convergence of the $\lambda_k$'s to $\lambda_*$, and nothing can be said about this q-convergence for those iterations. See Walker and Watson [31] for a local q-linear and q-superlinear convergence analysis which applies to those iterations.
Our local convergence results for iteration (4.1) are given in Theorems 4.1 and 4.2 below, which are analogues of Theorems 3.1 and 3.3 of [12]. Theorem 4.1 addresses the local linear convergence of the iteration; Theorem 4.2 draws more refined conclusions about the asymptotic speed of convergence. In Theorem 4.1, we have augmented the local q-linear convergence result which the reader might expect with a statement about the local r-linear convergence of the iteration sequence in the case in which $\lambda_k \to \lambda_*$ r-linearly. This has been done for two reasons: First, r-linear convergence of $\{\lambda_k\}$ to $\lambda_*$ is all that can be expected in many iterations used in practice. Second, the reader might otherwise accuse us of offering only a disguised version of the theory of [12] in cases in which $\lambda_k \to \lambda_*$ q-linearly, but only because $\lambda_k = \lambda_*$ for large $k$.
Our notation and terminology used in association with q-linear and q-superlinear convergence is that of Ortega and Rheinboldt [25, p. 281]: If $\{z_k\}_{k=0,1,\dots}$ is a sequence converging to $z_*$ in a space with norm $|\cdot|$, then $Q_1\{z_k\}$, the linear q-factor of $\{z_k\}_{k=0,1,\dots}$, is defined as

$$
Q_1\{z_k\} =
\begin{cases}
0 & \text{if } z_k = z_*,\ k \ge \text{some } k_0,\\
\limsup_{k\to\infty} |z_{k+1} - z_*|/|z_k - z_*| & \text{if } z_k \ne z_*,\ k \ge \text{some } k_0,\\
\infty & \text{otherwise.}
\end{cases}
$$

We say that $\{z_k\}_{k=0,1,\dots}$ converges q-linearly to $z_*$ in the norm $|\cdot|$ if and only if $Q_1\{z_k\} < 1$ and that $\{z_k\}_{k=0,1,\dots}$ converges q-superlinearly to $z_*$ if and only if $Q_1\{z_k\} = 0$. Note that q-superlinear convergence holds in one norm if and only if it holds in every other norm. Also, in saying that a sequence of matrices is uniformly small or uniformly bounded, we mean that in any matrix norm the corresponding sequence of norm values is uniformly small or uniformly bounded.
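The q-factor above can be estimated numerically from the tail of an error sequence. The following small Python illustration (our own, not from the paper) contrasts a q-linearly convergent sequence with $Q_1 = 1/2$ against a q-superlinearly convergent one:

```python
def q_factor(z, z_star, tail=5):
    """Crude estimate of the linear q-factor Q1{z_k}: the limsup of successive
    error ratios is approximated by the largest of the last `tail` ratios."""
    errs = [abs(zk - z_star) for zk in z]
    ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1) if errs[k] > 0]
    return max(ratios[-tail:])

linear = [2.0 ** -k for k in range(20)]            # Q1 = 1/2: q-linear
superlinear = [2.0 ** -(k * k) for k in range(7)]  # ratios -> 0: q-superlinear
```

Here `q_factor(linear, 0.0)` returns exactly 0.5, while the estimate for the second sequence shrinks as more terms are included, consistent with a zero q-factor.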
THEOREM 4.1. Let $F$ satisfy the basic hypothesis. Let $A$ and $\hat B_* = [B_*, C_*] = P_A(F'(\hat x_*))$ be such that $B_* \in R^{n\times n}$ is invertible and there exists an $r_x$ for which $|I - B_*^{-1}F_x(\hat x_*)| \le r_x < 1$. Assume that $X$ has the property with $A$ that there exists an $\alpha \ge 0$ such that for any $\hat x, \hat x_+$ and any $y \in X(\hat x, \hat x_+)$, we have

(4.3)

for every $E \in M(A, Q(y, \hat s))$, where $\hat s = \hat x_+ - \hat x$ and $\sigma(\hat x, \hat x_+) = \max\{|\hat x - \hat x_*|, |\hat x_+ - \hat x_*|\}$ for $|\hat x| = \max\{|x|, |\lambda|\}$ for $\hat x = (x, \lambda) \in R^{n+m}$. Let $\{\lambda_k\}_{k=0,1,\dots}$ be such that one of the following holds:

(4.4) $|\lambda_k - \lambda_*| \le \beta r_\lambda^k$, $k = 0, 1, \dots$, for some $\beta$ and $r_\lambda \in (0, 1)$;

(4.5) $\lambda_k \to \lambda_*$ q-linearly with $Q_1\{\lambda_k\} = r_\lambda \in [0, 1)$.

Then for any $r$ such that $\max\{r_x, r_\lambda\} < r < 1$, there exist positive constants $\epsilon_r$, $\delta_r$ and an integer $k_r$ such that if $|x_0 - x_*| < \epsilon_r$, $\|\hat B_0 - \hat B_*\| < \delta_r$, and $k_0 \ge k_r$, then iteration (4.1) is well defined for $k = 0, 1, \dots$ and converges to $\hat x_*$ with

(4.6) $|\hat x_k - \hat x_*|' \le r^k\epsilon_r$, $k = 0, 1, \dots$, if (4.4) holds, where $|\hat x|' = \max\{|x|, \Gamma|\lambda|\}$ for $\hat x = (x, \lambda) \in R^{n+m}$;

(4.7) $|\hat x_{k+1} - \hat x_*|' \le r\,|\hat x_k - \hat x_*|'$, $k = 0, 1, \dots$, if (4.5) holds, where $|\hat x|' = \max\{|x|, \Gamma|\lambda|\}$ for $\hat x = (x, \lambda) \in R^{n+m}$ and for a suitable constant $\Gamma$.

Furthermore, $\{\|\hat B_k - \hat B_*\|\}_{k=0,1,\dots}$ and $\{B_k^{-1} - B_*^{-1}\}_{k=0,1,\dots}$ are uniformly small.
Remark. Our reasons for using the index $k_0$ in this analysis are somewhat subtle, and we will outline them in the context of iteration (4.1) and Theorem 4.1. Similar remarks are valid in the context of iteration (4.11) and Theorem 4.3 below. The bulk of the technical work underlying Theorem 4.1 is contained in Theorem A.2 in the Appendix, an examination of the proof of which shows that the $\lambda_k$'s are required, at least eventually, to satisfy one of the following:

(i) an r-linear convergence inequality $|\lambda_k - \lambda_*| \le \beta r^k$, with $\beta$ small, for each $k$;

(ii) a q-linear convergence inequality $|\lambda_{k+1} - \lambda_*| \le r''\,|\lambda_k - \lambda_*|$, with $|\lambda_k - \lambda_*|$ small for each $k$, where $r''$ is near but strictly greater than $Q_1\{\lambda_k\}$ (which may be zero).

Since these conditions hold for sufficiently large $k$ under (4.4) and (4.5), respectively, our use of $k_0$ in conjunction with (4.4) and (4.5) is really an economical way of imposing these conditions.
Proof. Let $N_1 \subseteq R^{n+m}$ and $N_2 \subseteq R^{n\times(n+m)}$ be neighborhoods of $\hat x_*$ and $\hat B_*$, respectively, such that $N_1 \subseteq N$ and the first $n$ columns of any matrix in $N_2$ constitute a nonsingular matrix. Set $\mathcal N = N_1 \times N_1 \times N_2$ and define an update function $U$ on $\mathcal N$ (see the Appendix) by

$$U(\hat x, \hat x_+, \hat B) = \{\hat B_+ : y \in X(\hat x, \hat x_+)\},$$

where $\hat B_+$ is the least-change secant update of $\hat B$ with respect to $\hat s = \hat x_+ - \hat x$, $y \in X(\hat x, \hat x_+)$, and the norm $\|\cdot\|$. It follows immediately from (4.3) and (2.4) that $U$ satisfies the bounded deterioration hypothesis of the Appendix with $\alpha_1 = 0$ and $\alpha_2 = \alpha$ of (4.3), and the theorem follows from Theorem A.2.
THEOREM 4.2. Suppose that the hypotheses of Theorem 4.1 hold and that for some $\hat x_0$ and $\hat B_0$, $\{\hat x_k\}_{k=0,1,\dots}$ is a sequence generated by (4.1) which converges q-linearly to $\hat x_*$ in some norm with $\hat s_k = \hat x_{k+1} - \hat x_k \ne 0$ for all but finitely many $k$. Then

$$
(4.8)\qquad
\lim_{k\to\infty}
\frac{\bigl|(x_{k+1} - x_*) - [I - B_*^{-1}F_x(\hat x_*)](x_k - x_*)
 - B_*^{-1}C_*(\lambda_{k_0+k+1} - \lambda_*)
 + B_*^{-1}[C_* - F_\lambda(\hat x_*)](\lambda_{k_0+k} - \lambda_*)\bigr|}{|\hat x_k - \hat x_*|} = 0
$$

for any norms on $R^n$ and $R^m$. In particular, if $r_x = |I - B_*^{-1}F_x(\hat x_*)|$ and $r_\lambda = Q_1\{\lambda_k\}$, then the following hold:

(4.9) for any $\epsilon > 0$, $\limsup_{k} |\hat x_{k+1} - \hat x_*|'/|\hat x_k - \hat x_*|' \le \max\{r_x, r_\lambda + \epsilon\}$, where $|\hat x|' = \max\{|x|, \Gamma|\lambda|\}$ for $\hat x = (x, \lambda) \in R^{n+m}$ and a suitable constant $\Gamma$ depending on $\epsilon$;

(4.10) if $r_\lambda = 0$, then $\hat x_k \to \hat x_*$ q-superlinearly if and only if the norm-independent condition holds.

In particular, $\hat x_k \to \hat x_*$ q-superlinearly if $\lambda_k \to \lambda_*$ q-superlinearly and $F'(\hat x_*) \in A$.
Proof. From Theorem A.3, (4.8) holds if and only if $\lim_{k\to\infty} |(\hat B_k - \hat B_*)\hat s_k|/|\hat s_k| = 0$, and this latter condition can be shown to hold by a very minor modification of the proof of Theorem 3.3 of [12]. Proposition A.4 gives (4.9) and (4.10).
Theorems 4.1 and 4.2 have immediate consequences for an iteration (4.1) which uses the direct, fixed-scale least-change secant updates derived in §3. There is extensive discussion of the square-matrix analogue of condition (4.3) in [12, pp. 962-964], and this discussion is valid with minor appropriate changes in the present context. It follows in general from this discussion that if $F' \in A$ near $\hat x_*$, then not only does $\hat B_* = F'(\hat x_*)$ but also condition (4.3) is satisfied near $\hat x_*$ for some $\alpha$ when $X(\hat x, \hat x_+) = \{F(\hat x_+) - F(\hat x)\}$, the traditional choice which gives $y_k = F(\hat x_{k+1}) - F(\hat x_k)$ in iteration (4.1). It follows in particular from Theorems 4.1 and 4.2 that if $\lambda_k \to \lambda_*$ q-superlinearly and if $F'(\hat x)$ has the structure imposed on updates for $\hat x$ near $\hat x_*$ in each case in §3, then iteration (4.1) enjoys local q-superlinear convergence when the update is the nonsquare extension of any of the following: the first Broyden update, the Powell symmetric Broyden update, or the sparse Broyden update.
We now establish counterparts of Theorems 4.1 and 4.2 for an analogue of iteration (4.1) which uses least-change inverse secant updates. We continue assuming that $\{\lambda_k\}_{k=0,1,\dots}$ is a prescribed sequence which converges to $\lambda_*$ and consider an iteration which begins with some $\hat x_0 = (x_0, \lambda_{k_0})$ and $\hat K_0 = [K_0, L_0]$ and determines

$$
(4.11)\qquad
\begin{aligned}
x_{k+1} &= x_k - K_kF(\hat x_k) + L_k(\lambda_{k_0+k+1} - \lambda_{k_0+k}),\\
\hat x_{k+1} &= (x_{k+1}, \lambda_{k_0+k+1}),\\
\hat K_{k+1} &= [K_{k+1}, L_{k+1}] = (\hat K_k)_+,
\end{aligned}
$$

for $k = 0, 1, \dots$, where $(\hat K_k)_+$ is the least-change inverse secant update of $\hat K_k$ in $A$ with respect to $\hat s_k = \hat x_{k+1} - \hat x_k$, $y_k$, and $\|\cdot\|$. Theorems 4.3 and 4.4 below are analogous to Theorems 5.1 and 5.2 of [12]. Since we are not treating a "computed part" of approximate Jacobians, there is no compelling need to consider a choice rule for determining a vector $w_k$ as well as $y_k$ at each iteration as in [12, §5]. Here we let $s_k = x_{k+1} - x_k$ play the role of $w_k$ in [12, §5] in determining inverse secant equations. The proofs of Theorems 4.3 and 4.4 are minor modifications of those of Theorems 4.1 and 4.2, and so we omit them.
THEOREM 4.3. Let $F$ satisfy the basic hypothesis and assume that $F_x(\hat x_*)$ is invertible. Let $A$ and $\hat K_* = [K_*, L_*] = P_A([F_x(\hat x_*)^{-1}, -F_x(\hat x_*)^{-1}F_\lambda(\hat x_*)])$ be such that there exists an $r_x$ for which $|I - K_*F_x(\hat x_*)| \le r_x < 1$. Assume that $X$ has the property with $A$ that there exists an $\alpha \ge 0$ such that for any $\hat x, \hat x_+$ and any $y \in X(\hat x, \hat x_+)$, we have the analogue of (4.3) for every $E \in M(A, Q(\hat s, \hat z))$, where $\hat z = (y, \lambda_+ - \lambda)$ and $\sigma(\hat x, \hat x_+) = \max\{|\hat x - \hat x_*|, |\hat x_+ - \hat x_*|\}$ for $|\hat x| = \max\{|x|, |\lambda|\}$ for $\hat x = (x, \lambda) \in R^{n+m}$. Let $\{\lambda_k\}_{k=0,1,\dots}$ be such that one of (4.4) or (4.5) holds. Then for any $r$ such that $\max\{r_x, r_\lambda\} < r < 1$, there exist positive constants $\epsilon_r$, $\delta_r$ and an integer $k_r$ such that if $|x_0 - x_*| < \epsilon_r$, $\|\hat K_0 - \hat K_*\| < \delta_r$, and $k_0 \ge k_r$, then iteration (4.11) is well defined for $k = 0, 1, \dots$ and converges to $\hat x_*$ according to either (4.6), if (4.4) holds, or (4.7), if (4.5) holds. Furthermore, $\{\|\hat K_k - \hat K_*\|\}_{k=0,1,\dots}$ and $\{K_k^{-1} - K_*^{-1}\}_{k=0,1,\dots}$ are uniformly small.
THEOREM 4.4. Suppose that the hypotheses of Theorem 4.3 hold and that for some $\hat x_0$ and $\hat K_0$, $\{\hat x_k\}_{k=0,1,\dots}$ is a sequence generated by (4.11) which converges q-linearly to $\hat x_*$ in some norm with $\hat s_k = \hat x_{k+1} - \hat x_k \ne 0$ for all but finitely many $k$. Suppose further that $\{\hat K_k\}_{k=0,1,\dots}$ and $\{K_k^{-1}\}_{k=0,1,\dots}$ are uniformly bounded and that $\{y_k\}_{k=0,1,\dots}$ satisfies $|\hat K_k\hat z_k - s_k| \le \alpha_k|s_k|$ for each $k$, where $\hat z_k = (y_k, \lambda_{k_0+k+1} - \lambda_{k_0+k})$, $s_k = x_{k+1} - x_k$, and $\lim_{k\to\infty}\alpha_k = 0$. Then

$$
\lim_{k\to\infty}
\frac{\bigl|(x_{k+1} - x_*) - [I - K_*F_x(\hat x_*)](x_k - x_*)
 + L_*(\lambda_{k_0+k+1} - \lambda_*)
 - [L_* + K_*F_\lambda(\hat x_*)](\lambda_{k_0+k} - \lambda_*)\bigr|}{|\hat x_k - \hat x_*|} = 0
$$

for any norms on $R^n$ and $R^m$. In particular, if $r_x = |I - K_*F_x(\hat x_*)|$ and $r_\lambda = Q_1\{\lambda_k\}$, then (4.9) and the following hold:

if $r_\lambda = 0$, then $\hat x_k \to \hat x_*$ q-superlinearly if and only if the norm-independent condition holds.

In particular, $\hat x_k \to \hat x_*$ q-superlinearly if $\lambda_k \to \lambda_*$ q-superlinearly and $[F_x(\hat x_*)^{-1}, -F_x(\hat x_*)^{-1}F_\lambda(\hat x_*)] \in A$.
The discussion following Theorem 4.2 remains valid with a few appropriate changes. In particular, Theorems 4.3 and 4.4 imply that if $\lambda_k \to \lambda_*$ q-superlinearly, then iteration (4.11) is locally and q-superlinearly convergent when the update is the nonsquare extension of the second Broyden update or, if $F_x(\hat x)$ is symmetric for $\hat x$ near $\hat x_*$, the Greenstadt update.

As we remarked at the beginning of this section, the above analysis does not apply to cases in which the norm used to determine least-change secant updates varies from iteration to iteration. In particular, it does not apply to iterations which use the nonsquare extensions of the DFP and BFGS updates given in §3. The counterparts of these iterations in the square-matrix case are the iteratively rescaled least-change secant update methods considered in [12], which include the usual DFP and BFGS methods, and a full local convergence analysis for these methods is given in [12]. However, this local convergence analysis depends critically on Lemma 4.1, p. 969, of that paper, and we cannot adapt that lemma to apply to the present circumstances. To indicate why, we note that in the setting of the extension of the DFP update in §3, the roles of $W_*$ and $v$ in Lemma 4.1 are played by $F_x(\hat x_*)$ and $\bar y = y - Ct$, respectively, where $C$ is part of a given approximate Jacobian $\hat B = [B, C]$ and $\hat s = (s, t) = \hat x_+ - \hat x$
and $y = F(\hat x_+) - F(\hat x)$ for given $\hat x, \hat x_+$. Since $\bar y$ depends on $C$ and $C \ne F_\lambda(\hat x_*)$ in general, we cannot expect an inequality $|\bar y - F_x(\hat x_*)s|_2 \le \alpha\sigma(\hat x, \hat x_+)^p|\hat s|_2$ to hold for all $\hat x$ and $\hat x_+$, where $|\cdot|_2$ is the Euclidean norm. This is to say that we cannot expect an inequality of the form (ii) of the hypotheses of Lemma 4.1 to hold in the context of our extension of the DFP update.
We note, however, that we can construct a local convergence analysis along the lines of that given above which, while not applicable to iterations using our extensions of the DFP and BFGS updates, is applicable to iterations using certain "nearby" updates. These "nearby" updates may be costly or even impractical in some applications. However, we feel that they are worth noting because they use more computed Jacobian information and therefore may result in more effective Jacobian approximations in some cases. A local convergence analysis for iterations using these "nearby" updates is of value in its own right; furthermore, we feel that it provides some heuristic support for using our extensions of the DFP and BFGS updates. We indicate what these "nearby" updates are and how a local convergence analysis for iterations using them can be developed in the DFP context: The "nearby" update is obtained from our extension of the DFP update by taking $y = F(\hat x_+) - F(\hat x)$ for given $\hat x$ and $\hat x_+$ and choosing $\bar y = y - F_\lambda(\hat x_*)t$, $\bar y = y - F_\lambda(\hat x)t$, or $\bar y = y - F_\lambda(\hat x_+)t$ instead of $\bar y = y - Ct$. The first choice is usually impractical, but not always; see, e.g., the homotopy map (5.4) in §5, in which $F_\lambda$ is constant. The second two choices can be obtained through either analytic evaluation or finite differences; either way will require function evaluation work which may be costly in some cases. Note that the evaluation of $F_\lambda(\hat x)t$ or $F_\lambda(\hat x_+)t$ can be done through finite differences of $F$ in the $t$-direction and, hence, does not require the evaluation of all of $F_\lambda$. (If it is reasonable to evaluate all of $F_\lambda$, then we can approximate $F_x$ via traditional square-matrix DFP or BFGS updating in a straightforward way, and this would probably be more effective than using the "nearby" updates discussed here.) For any of these choices of $\bar y$, it can easily be shown that an inequality $|\bar y - F_x(\hat x_*)s|_2 \le \alpha\sigma(\hat x, \hat x_+)^p|\hat s|_2$ holds, and Lemma 4.1 of [12] can be readily adapted to apply to the present situation with $W_* = F_x(\hat x_*)$ and $v = \bar y$. From this point, it is straightforward to adapt the arguments in the proofs of Theorems 4.1 and 4.2 above together with those in the proofs of Theorems 4.2 and 4.3 of [12] to obtain local convergence results analogous to those above for an iteration (4.1) in which $X(\hat x, \hat x_+) = \{F(\hat x_+) - F(\hat x)\}$ and the update is this "nearby" update, i.e., our extension of the DFP update with $\bar y = y - Ct$ replaced by one of the choices $\bar y = y - F_\lambda(\hat x_*)t$, $\bar y = y - F_\lambda(\hat x)t$, or $\bar y = y - F_\lambda(\hat x_+)t$.
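The directional finite difference mentioned above, which supplies $F_\lambda(\hat x)t$ without ever forming the full block $F_\lambda$, can be sketched as follows in Python. The function name and the fixed step size are our own illustrative choices; in practice a step scaled to the data, or a centered difference, may be preferable.

```python
import numpy as np

def Flam_times_t(F, x, lam, t, eps=1e-6):
    """Forward-difference approximation of F_lambda(x, lam) @ t:
    one extra evaluation of F, perturbed in the t-direction of the
    parameter space only."""
    return (F(x, lam + eps * t) - F(x, lam)) / eps
```

For instance, for $F(x, \lambda) = x_1\lambda_1 + \lambda_2^2$ one has $F_\lambda t = x_1 t_1 + 2\lambda_2 t_2$, which the difference quotient reproduces to within $O(\epsilon)$.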
5. Numerical experiments. We report on some simple numerical experiments
that are intended to give some insight into the performance of the updates discussed
in this paper, especially in the context of iterations of the type considered in §4. We
stress that the experimental results given here are by no means intended to provide a
complete or thorough study of iterative methods which use these updates, but rather
are intended to complement experimental results and other work discussed elsewhere.
For perspective, we briefly review other work involving applications of the updates
given here. As noted in the introduction, the first Broyden update (3.1.1) has been
used in an algorithm of Georg [16] and in the augmented Jacobian matrix algorithm
in the homotopy method code HOMPACK [32]. Both of these are predictor-corrector
path-following methods with predictor steps taken in current (approximate) tangent
directions. In Georg’s algorithm, corrector iterations are of the normal flow type,
while the HOMPACK algorithm uses corrector iterations determined by certain augmented Jacobian matrices. (See Walker and Watson [31] as well as [16] and [32] for a
discussion of these iterations.) In Georg’s algorithm, first Broyden updating is done
on both predictor steps and corrector iterations. In the augmented Jacobian matrix
algorithm in HOMPACK, first Broyden updating is done only on the corrector iterations; exact Jacobians are obtained and used at each predictor step. The update used
in HOMPACK is described in [32] as a square-matrix first Broyden update of a certain
augmented Jacobian matrix, but it is essentially the same as the nonsquare-matrix
update (3.1.1).
Martinez [20] augments his r-linear and r-superlinear local convergence analysis for very general Newton-like methods for underdetermined systems with a discussion of applications of these methods to the problem of finding an interior point
of a polytope, which arises in connection with interior point methods for linear programming. In experiments reported in [20], a nonlinear transformation was used to
rephrase this problem as one involving an underdetermined nonlinear system, and
normal flow algorithms with both exact Jacobians and approximate Jacobians maintained by Broyden updating, together with other related algorithms, were applied to
this problem. The Broyden updating was apparently either first Broyden updating
or sparse Broyden updating as given in §3.1 here, although the exact nature of the
updates used is not made clear in [20].
Walker and Watson [31], in addition to giving a local q-linear and q-superlinear convergence analysis for general normal flow and augmented Jacobian algorithms
which use the updates given in this paper, discuss two sets of numerical experiments
involving these algorithms. In the first set, normal flow iterations using the first and
second Broyden updates, (3.1.1) and (3.2.1), respectively, were applied to simple two-variable problems; in the second set, the normal flow and augmented Jacobian matrix
algorithms in HOMPACK, together with variations which use first and second Broyden
updating, were applied to a geometric modeling problem described by Morgan [24].
Other experiments involving use of the updates introduced here in modifications
of algorithms in HOMPACK are reported by Bourji [3]. In these experiments, all
of the specific updates introduced in §3 except the sparse Broyden and Greenstadt
updates were implemented in modifications of the HOMPACK augmented Jacobian
matrix algorithm, and the resulting algorithms were tested on five problems obtained
by parametrizing problems from the test set of Moré, Garbow, and Hillstrom [23] with
simple homotopy maps. The first and second Broyden updates, (3.1.1) and (3.2.1),
respectively, and the PSB update (3.1.7) were tested on all five problems.
In the first problem, no subset of n columns of the Jacobian constituted a symmetric matrix; however, the PSB update was still tested on this problem because the
(n- 1) x (n- 1) principal submatrix of the Jacobian was symmetric. The second
through fifth problems amounted to parametrized nonlinear least-squares problems,
and the DFP and BFGS updates, (3.1.10) and (3.2.2), respectively, as well as the first
and second Broyden and PSB updates were tested on these. We refer the reader to [3]
for more details of these experiments and summarize the results here. As measured by
numbers of function and Jacobian evaluations, the performances and rankings of the
algorithms using the different updates varied considerably from problem to problem.
However, the algorithm using the first Broyden update clearly performed best overall;
only in one of fifteen trials did another algorithm, the one using the PSB update, take
fewer function or Jacobian evaluations. The algorithm which performed second best
overall was the one using the PSB update, and the algorithms using the other updates
often, but not in every case, performed considerably worse than those using the first
Broyden and PSB updates. The algorithm using the DFP update was perhaps third
best and outperformed the PSB-update algorithm in one trial. Each algorithm failed
in at least one instance, and there was one trial (involving the Rosenbrock function)
in which only the algorithms using the second Broyden, DFP, and BFGS updates succeeded and another trial (involving the extended Rosenbrock function) in which only
the algorithms using the DFP and BFGS updates succeeded. However, there were
two trials in which only the BFGS-update algorithm failed and three trials in which
only the DFP-update algorithm failed. In evaluating these results, it should be kept
in mind that there are many things going on in the sophisticated homotopy method
codes in HOMPACK, e.g., procedures for selecting stepsizes, determining when to
reevaluate Jacobians, etc. The first Broyden update, which performed most successfully in these trials, is the update used in the unmodified augmented Jacobian matrix
algorithm in HOMPACK and thus is the one for which the code is "tuned." In view
of this, we feel that it would be premature to dismiss the other updates on the basis
of these experiments.
As a complement to the above work, we give here the results of experiments
involving the application of a simple path-following algorithm to a realistic problem.
The object is to explore in some depth the performance of one of the updates given
here--the first Broyden update (3.1.1)--in comparison to that of various alternatives
involving numerically evaluated Jacobians and traditional square-matrix first Broyden
updating.
The problem of interest is the elastica problem described by Watson and Wang
[33]. In this problem, a large planar deflection of a thin rod, or elastica, subject to
terminal loads x(1), x(2) and moment x(3) is modeled by
$$EI\,\frac{d\theta}{ds} = x(2)\,\xi - x(1)\,\eta + x(3),$$

where $EI$ is the flexural rigidity, $\theta$ is the local angle of inclination at the point $(\xi, \eta)$, and $s$ is the arc length along the elastica. For simplicity, we take $EI = 1$ and assume that the elastica is of unit length with the left endpoint clamped horizontally at the origin. This gives the initial-value problem

$$
(5.1)\qquad
\frac{d\theta}{ds} = x(2)\,\xi - x(1)\,\eta + x(3),\qquad
\frac{d\xi}{ds} = \cos\theta,\qquad
\frac{d\eta}{ds} = \sin\theta,\qquad
\theta(0) = \xi(0) = \eta(0) = 0.
$$

We denote the solution by $\xi(s,x)$, $\eta(s,x)$, $\theta(s,x)$, where $x = (x(1), x(2), x(3))^T \in R^3$.

The problem is to determine $x_*$ so that the right endpoint of the elastica has a given location and angle of inclination, i.e., so that $x = x_*$ satisfies

$$
(5.2)\qquad
f(x) = \begin{pmatrix}\xi(1,x)\\ \eta(1,x)\\ \theta(1,x)\end{pmatrix} = a
$$

for a specified vector $a$ giving the right endpoint location and angle of inclination.
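For reference, the right-endpoint map $f$ of (5.2) can be evaluated by integrating the initial-value problem (5.1). The Python sketch below uses a fixed-step classical Runge-Kutta scheme; the integrator, step count, and routine name are our own illustrative choices and are not prescribed by Watson and Wang [33].

```python
import numpy as np

def f(x, nsteps=400, EI=1.0):
    """Integrate the elastica IVP (5.1) from s = 0 to s = 1 with classical RK4
    and return the right-endpoint data (xi(1,x), eta(1,x), theta(1,x))."""
    def ode(u):
        theta, xi, eta = u
        return np.array([(x[1] * xi - x[0] * eta + x[2]) / EI,
                         np.cos(theta), np.sin(theta)])
    u = np.zeros(3)              # theta(0) = xi(0) = eta(0) = 0
    h = 1.0 / nsteps
    for _ in range(nsteps):
        k1 = ode(u); k2 = ode(u + 0.5 * h * k1)
        k3 = ode(u + 0.5 * h * k2); k4 = ode(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    theta1, xi1, eta1 = u
    return np.array([xi1, eta1, theta1])
```

With $x = 0$ there are no loads, the rod stays straight, and $f(0) = (1, 0, 0)^T$; a pure moment $x = (0, 0, c)$ bends the rod into a circular arc with $\theta(1) = c$.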
Because of the sensitivity of the elastica to end conditions, especially for more complicated shapes which require large forces and torques to achieve, solving (5.2) can be
quite challenging for globalized Newton-like methods such as those in MINPACK [22];
see [33]. Homotopy methods seem to fare better. Our strategy here is to choose a homotopy map $F(x, \lambda)$ such that $F(x, 0) = 0$ is easy to solve and $F(x, 1) = f(x) - a$, and then to track the zero curve of $F$ as $\lambda$ goes from 0 to 1. For a given $F$, the simple
algorithm for following this curve which we used in our tests is the following.
ALGORITHM 5.1. Given $x_0$ such that $F(x_0, 0) = 0$, choose nstep, the number of $\lambda$-increments between 0 and 1, and set $h = 1/\text{nstep}$, $x = x_0$, and $\lambda = 0$.

For $i = 1, \dots, \text{nstep}$, do:

1. Predict: Given $(x, \lambda)$ and $\hat B = [B, C] \approx F'(x, \lambda)$, overwrite $x \leftarrow x - B^{-1}\{F(x, \lambda) + Ch\}$ and $\lambda \leftarrow \lambda + h$.

2. Correct: Given $\epsilon > 0$ and initial $(x, \lambda)$ and $B \approx F_x(x, \lambda)$, overwrite $x \leftarrow x - B^{-1}F(x, \lambda)$ until $|F(x, \lambda)| \le \epsilon$.
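In Python, Algorithm 5.1 can be sketched as follows for a scalar parameter $\lambda$, using Strategy 1 below (nonsquare first Broyden updating on predictor steps and corrector iterations alike) to determine the successive approximate Jacobians. All routine names and the homotopy in the usage note are our own illustrative choices, not part of the paper.

```python
import numpy as np

def broyden_update(Bhat, F, x, lam, x_new, lam_new):
    """Nonsquare first Broyden update of Bhat ~= F'(x, lam) with the traditional
    secant right-hand side y = F(x_new, lam_new) - F(x, lam)."""
    shat = np.append(x_new - x, lam_new - lam)
    d = shat @ shat
    if d < 1e-30:
        return Bhat
    y = F(x_new, lam_new) - F(x, lam)
    return Bhat + np.outer(y - Bhat @ shat, shat) / d

def algorithm_51(F, Fprime, x0, nstep=10, eps=1e-10, max_corr=50):
    """Sketch of Algorithm 5.1: nstep equal lambda-increments, each predictor
    step followed by corrector iterations at fixed lambda until |F| <= eps."""
    n = len(x0)
    x, lam = np.asarray(x0, float), 0.0
    h = 1.0 / nstep
    Bhat = np.asarray(Fprime(x, lam), float)   # initial Jacobian [B, C]
    for _ in range(nstep):
        # Predict: x <- x - B^{-1}{F(x, lam) + C h}, lam <- lam + h.
        B, C = Bhat[:, :n], Bhat[:, n]
        x_new, lam_new = x - np.linalg.solve(B, F(x, lam) + C * h), lam + h
        Bhat = broyden_update(Bhat, F, x, lam, x_new, lam_new)
        x, lam = x_new, lam_new
        # Correct at fixed lam (Strategy 1 keeps updating Bhat here as well).
        for _ in range(max_corr):
            if np.linalg.norm(F(x, lam)) <= eps:
                break
            x_new = x - np.linalg.solve(Bhat[:, :n], F(x, lam))
            Bhat = broyden_update(Bhat, F, x, lam, x_new, lam)
            x = x_new
    return x
```

For instance, with the convex homotopy $F(x, \lambda) = (1-\lambda)x + \lambda(x^3 + x - 10)$ and $x_0 = 0$, the algorithm tracks the zero curve from $\lambda = 0$ to the root $x_* = 2$ of $x^3 + x = 10$.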
What is unspecified in Algorithm 5.1 is the manner in which successive $\hat B$'s and $B$'s are determined. If they are determined by least-change secant or inverse secant updating at each step and if the corrector convergence test is passed after a finite number of iterations for each $i <$ nstep, then this algorithm has the form of iteration (4.1) or (4.11), respectively, with an added convergence test when $i =$ nstep. With this updating, a reasonable modification of the analysis in §4 and in the Appendix applies to a reasonable modification of Algorithm 5.1 as follows: Suppose $F(x_0, \lambda_0) = 0$ for some $(x_0, \lambda_0)$ and in Algorithm 5.1 we initialize $h = (1 - \lambda_0)/\text{nstep}$, $x = x_0$, and $\lambda = \lambda_0$ and take $\hat B = F'(x_0, \lambda_0)$ at the initial predictor step. Suppose also that we use some fixed $\epsilon$ in the corrector iterations for $i <$ nstep and take $\epsilon = 0$ for $i =$ nstep, so that the algorithm proceeds indefinitely once $\lambda = 1$. Suppose finally that for $(x, \lambda)$ near $(x_*, 1)$, $F'(x, \lambda)$ has the structure (if any) which is imposed on updates and $F_x(x, \lambda)$ is nonsingular. If $(x_0, \lambda_0)$ is sufficiently near $(x_*, 1)$, then the iterates are well defined, $h$ is small, and furthermore, using bounded deterioration (see the proof of Theorem 4.1 and the Appendix), we can verify that approximate Jacobians remain near their actual values for a given finite number of predictor steps and corrector iterations. It follows that if $(x_0, \lambda_0)$ is sufficiently near $(x_*, 1)$, then the number of corrector iterations in step 2 is no greater than a prescribed bound for each $i <$ nstep, and therefore that the sequence of $\lambda$-values used in the predictor steps and corrector iterations satisfies an r-linear convergence inequality (4.4) with $\lambda_* = 1$ and $\beta = |\lambda_0 - \lambda_*|$. With this, an inspection of the analysis in §4 and the Appendix shows that if $(x_0, \lambda_0)$ is sufficiently near $(x_*, 1)$ and the appropriate ancillary hypotheses of the theorems in §4 are satisfied, then the iterate sequence $\{\hat x_k = (x_k, \lambda_k)\}_{k=0,1,\dots}$ satisfies the r-linear convergence inequality of (4.6). After a finite number of iterations, we have $i =$ nstep and $\lambda = 1$ for all remaining iterations, and since approximate Jacobians are still near their true values if $(x_0, \lambda_0)$ is sufficiently near $(x_*, 1)$, the convergence is ultimately q-superlinear.
Algorithm 5.1 is quite unsophisticated compared to other homotopy algorithms.
For one thing, it uses a preset number of equally spaced $\lambda$-increments, rather than determining variable $\lambda$-increments in some intelligent way as the algorithm proceeds.
Perhaps more seriously, because a prescribed value of $\lambda$ is used in determining the next value of $x$ at each step, the algorithm is likely to get into trouble if $F_x$ is singular
along the zero curve. This lack of sophistication is by design, however. The object of
the experiments here is to give insight into the merits of various ways of determining
successive B’s and B’s, and the simplicity of Algorithm 5.1 lends itself to this. Also,
maintaining constant A-values through the corrector iterations allows options for traditional square-matrix first Broyden updating in determining successive B’s and so is
useful for comparative testing.
In our experiments, we tested seven different strategies for determining the successive B̂'s and B's in Algorithm 5.1. These are as follows:

Strategy 1. Take B̂ = F′(x_0, 0) initially, and then maintain B̂ = [B, C] and thereby B through nonsquare first Broyden updating on all subsequent predictor steps and corrector iterations. See the remarks below.

Strategy 2. Take B̂ = [B, C] = F′(x_0, 0) initially, and then use this initial C = F_λ(x_0, 0) in all subsequent predictor steps while maintaining B through square-matrix first Broyden updating on all corrector iterations.

Strategy 3. Take B̂ = [B, C] = F′(x_0, 0) initially, and then reevaluate C = F_λ(x, λ) at each predictor step and maintain B through square-matrix first Broyden updating on all predictor steps as well as all corrector iterations, replacing the right-hand side of the secant equation (1.3) with the "ideal replacement" F(x_+, λ_+) - F(x, λ) - F_λ(x, λ)(λ_+ - λ) when updating on the predictor steps. See the remarks below.

Strategy 4. Take B̂ = F′(x, λ) at each predictor step, and at each set of corrector iterations take B = F_x(x, λ) initially and then maintain B through square-matrix first Broyden updating on the subsequent corrector iterations.

Strategy 5. Take B̂ = F′(x, λ) at each predictor step, and at each set of corrector iterations take B = F_x(x, λ) initially and then use this B throughout the subsequent corrector iterations.

Strategy 6. Take B̂ = F′(x, λ) at each predictor step and B = F_x(x, λ) at each corrector iteration.

Strategy 7. Take B̂ = [B, C] = F′(x_0, 0) initially, and then use this B̂ and B in all subsequent predictor steps and corrector iterations.
Remarks. The ordering of these strategies very roughly reflects an increasing dependence on derivative evaluations and a decreasing dependence on updating. The total chord-method Strategy 7 is included mainly to note that it was quite unsuccessful in the experiments discussed here, as might be expected. The partial chord-method Strategy 5 also resulted in failure in the two particular trials reported below, but it showed some success in other trials, while Strategy 7 nearly always led to failure. In Strategy 1, the nonsquare first Broyden updating of course reduces to square-matrix first Broyden updating on the corrector iterations because λ does not change. The updating in Strategy 3 is described as square-matrix first Broyden updating, but it can also be regarded as nonsquare-matrix updating in which the last column of F′, i.e., F_λ, is computed while the remainder is approximated through Frobenius-norm least-change secant updating into the subspace of matrices the last column of which is zero. Thus Strategies 1 and 3 can be regarded as the strategies which employ nonsquare-matrix updating. As noted in §4, the discussion in §4 does not explicitly include cases in which part of the Jacobian is computed while the remainder is approximated by updating, but it would be straightforward to extend it to do so.
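As a concrete sketch (ours, not code from the paper), the nonsquare first Broyden update underlying Strategies 1 and 3 makes the least Frobenius-norm change to B̂ subject to the secant equation B̂_+ ŝ = y, which for m = 1 is the following rank-one formula:

```python
import numpy as np

def nonsquare_broyden_update(B_hat, xh_old, xh_new, F_old, F_new):
    """Nonsquare first Broyden update of B_hat ~ F'(x, lambda).

    Makes the least Frobenius-norm change to the n x (n+1) matrix B_hat
    subject to the secant equation  B_hat_new @ s = y, where
    s = xh_old-to-xh_new step in R^{n+1} and y = F_new - F_old in R^n.
    """
    s = xh_new - xh_old
    y = F_new - F_old
    return B_hat + np.outer(y - B_hat @ s, s) / (s @ s)
```

On a corrector iteration λ is unchanged, so the last component of s is zero; the update then leaves the C-column of B̂ = [B, C] untouched and reduces to the square-matrix first Broyden update of B, as the remarks above observe.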
In the experiments reported here, we made a particular choice of F and in each of a number of trials counted the function evaluations, Jacobian evaluations, and corrector iterations required by the above strategies for the implementation of Algorithm 5.1. Evaluating the function F required the evaluation of f in (5.2). This was done by numerically solving (5.1) using subroutine RKF45 of Shampine and Watts (see Shampine, Watts, and Davenport [28]), which we obtained from the Forsythe, Malcolm, and Moler [15] collection of subroutines. Derivatives of F were obtained by forward-difference approximations, which provided adequate accuracy. The function evaluations required for these derivative approximations were included in the overall function evaluation counts; thus these counts really provide a measure of overall function-evaluation work, including that required for derivative evaluation. The evaluation of either F′ or F_x counted as a Jacobian evaluation; the evaluation of F_λ alone did not. Thus the Jacobian evaluation counts reflect the number of matrix factorizations "from scratch" which were required, as well as the number of function evaluations
which arose from Jacobian evaluations. (As in the square-matrix case, we can update matrix factors or perhaps exercise other options to incorporate the rank-one Broyden update without obtaining a new factorization "from scratch.") In the trials reported here, we chose a = (0, π/2, π)^T, for which the solution of (5.2) is x_* = (0, 0, π)^T, i.e., the elastica is a semicircle. All computing reported here was done in double precision on a Sun 4/280S using the Sun Microsystems Fortran 1.1 compiler.
Our particular choice of F is

(5.3)    F(x, λ) = λ[f(x) - a] + (1 - λ)(x - x_0).

This choice appears to be not very well behaved. In particular, we saw evidence in our testing of bifurcation of the zero curve of F, at least for the values of x_0 which were used in the tests reported here. Although we did not verify bifurcation with certainty, there were at least sharp changes in the direction of the zero curve for values of λ in [.5, .6] and [.9, 1], which resulted in relatively large numbers of corrector iterations for all strategies around the middle and last predictor steps. Still, for our testing we prefer (5.3) to an apparently better behaved alternative suggested in [33], viz.,

(5.4)    F(x, λ) = f(x) - (1 - λ)f(x_0).

First, the bad behavior of the zero curve of F given by (5.3) creates challenges for the methods being tested; second, if F were given by (5.4), then F_λ would be constant and there would be no need for nonsquare updating.
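The point about (5.4) can be checked directly: its λ-derivative is f(x_0), independent of (x, λ), whereas for (5.3) it is f(x) - a - (x - x_0), which varies along the zero curve. A minimal numerical illustration, with a hypothetical f, a, and x_0 standing in for the elastica data of this section:

```python
import numpy as np

# Hypothetical stand-ins for f, a, and x0 (not the elastica data of (5.1)-(5.2)).
f = lambda x: np.array([x[0]**2 + x[1], np.sin(x[1])])
a = np.array([1.0, 0.5])
x0 = np.array([0.2, -0.3])

F53 = lambda x, lam: lam * (f(x) - a) + (1 - lam) * (x - x0)   # homotopy (5.3)
F54 = lambda x, lam: f(x) - (1 - lam) * f(x0)                  # homotopy (5.4)

def F_lambda(F, x, lam, h=1e-6):
    """Forward-difference approximation of the lambda-derivative of F."""
    return (F(x, lam + h) - F(x, lam)) / h

x1, x2 = np.array([0.7, 0.1]), np.array([-0.4, 0.9])
# For (5.4), the lambda-derivative is f(x0) at every (x, lambda):
print(F_lambda(F54, x1, 0.3), F_lambda(F54, x2, 0.8))
# For (5.3), it is f(x) - a - (x - x0), which changes with x:
print(F_lambda(F53, x1, 0.3), F_lambda(F53, x2, 0.8))
```

With (5.4) the C-column of B̂ could simply be fixed at f(x_0), which is why that homotopy would not exercise the nonsquare updates being tested.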
The results of two typical trials with F given by (5.3) are given in Tables 1 and 2 below. In both trials, we took x_0 = (-.4, .4, 3)^T. In the first trial, we also took nstep = 10 and used tolerances ε_i = 10^{-1} for i < nstep and ε_i = 10^{-5} for i = nstep, taking the attitude that the only point of interest on the zero curve was an accurate point at the final step. In the second trial, we took a greater number of steps and used tighter tolerances, choosing nstep = 20 and ε_i = 10^{-2} for i < nstep and ε_i = 10^{-6} for i = nstep. A maximum number of 20 corrector iterations was allowed before convergence failure was declared.
TABLE 1
Results of trial 1 for F given by (5.3).

Strategy                 1     2     3     4     5     6     7
Corrector Iterations     36    45    22    15    *     7     *
Function Evaluations     51    60    46    78    *     79    *
Jacobian Evaluations     1     1     1     14    *     17    *

* corrector convergence failure at the 6th step
It is evident in these trials that the strategies which use more computed derivative information require (perhaps significantly) fewer corrector iterations, but these strategies may require (perhaps significantly) more function evaluation effort, not to mention matrix arithmetic, and that updating, either alone or in conjunction with computed derivative information, offers (perhaps significant) advantages. The success of Strategy 4 and the failure of Strategy 5 show the usefulness of updating on the corrector iterations. The success of Strategies 1-3 shows the effectiveness of updating
TABLE 2
Results of trial 2 for F given by (5.3).

Strategy                 1     2     3     4     5     6     7
Corrector Iterations     53    65    38    23    *     18    *
Function Evaluations     78    90    82    151   *     173   *
Jacobian Evaluations     1     1     1     29    *     38    *

* corrector convergence failure at the 11th step
in maintaining Jacobian approximations over the full range of λ-values, even if the updating is only traditional square-matrix first Broyden updating on the corrector iterations. However, comparing the results for Strategy 2 with those for Strategies 1 and 3 shows the importance of updating on the predictor steps as well as on the corrector iterations, i.e., the importance of nonsquare-matrix updating. It happened in a number of other trials not reported here that Strategies 1 and 3 succeeded while Strategy 2 failed. For example, for x_0 = (-.5, .5, 3)^T, nstep = 20, ε_i = 10^{-2} for i < nstep, and ε_i = 10^{-6} for i = nstep, Strategy 2 led to corrector convergence failure at the eleventh step while Strategies 1 and 3 succeeded. A comparison of the results for Strategies 1 and 3 reinforces the conventional wisdom that computing part of the Jacobian results in more effective Jacobian approximations. However, this partial computation of the Jacobian requires additional function evaluation work in proportion to the number of predictor steps. Strategy 3 has an overall function-evaluation advantage over Strategy 1 with the ten predictor steps in trial 1, but the advantage is reversed with the twenty predictor steps in trial 2.
Appendix. We now give analogues of the results in the Appendix of [12] which are suitable for application to the local convergence analysis given in §4. We assume as in §4 that F: R^{n+m} → R^n satisfies the basic hypothesis near x̂_* = (x_*, λ_*) and that {λ_k}_{k=0,1,...} is a sequence which converges to λ_*. Our convention regarding norms is as in §4, i.e., ‖·‖ denotes a particular inner product norm on R^{n×(n+m)} and |·| denotes all of the following: particular norms on R^n and R^m, various norms on R^{n+m} which are specified in each instance, and the matrix norms induced by these vector norms.
Our interest is in a very general iteration which begins with some x̂_0 = (x_0, λ_{k_0}) and B̂_0 = [B_0, C_0] and determines

(A.1)
x_{k+1} = x_k - B_k^{-1}{F(x̂_k) + C_k(λ_{k_0+k+1} - λ_{k_0+k})},
x̂_{k+1} = (x_{k+1}, λ_{k_0+k+1}),
B̂_{k+1} = [B_{k+1}, C_{k+1}] ∈ U(x̂_k, x̂_{k+1}, B̂_k),

for k = 0, 1, .... In (A.1), U is an update function, the values of which are subsets of R^{n×(n+m)} and which we assume is defined in a neighborhood N = N̂_1 × N̂_1 × N_2 of (x̂_*, x̂_*, B̂_*) ∈ R^{n+m} × R^{n+m} × R^{n×(n+m)} for some B̂_*, where N̂_1 ⊆ N̂ and N_2 is such that the first n columns of any matrix in N_2 constitute a nonsingular matrix. Usually, but not always, we also assume that B̂_* is near F′(x̂_*) and that U satisfies the following hypothesis.
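As an illustration only (our sketch, not the algorithm tested in §5), iteration (A.1) can be run on a hypothetical small problem with m = 1, taking U to be the nonsquare first Broyden update and repeating each λ-value through a few corrector iterations:

```python
import numpy as np

# Hypothetical test problem with n = 2, m = 1 (not the elastica problem of §5):
# F(x, lambda) = x - (lambda, lambda^2), so the zero curve is x(lambda) = (lambda, lambda^2).
def F(xh):
    x, lam = xh[:2], xh[2]
    return x - np.array([lam, lam**2])

def update(Bh, xh_old, xh_new):
    # Nonsquare first Broyden update: least change subject to B_new @ s = y.
    s = xh_new - xh_old
    y = F(xh_new) - F(xh_old)
    return Bh + np.outer(y - Bh @ s, s) / (s @ s)

lambdas = np.linspace(0.0, 1.0, 11)
xh = np.array([0.0, 0.0, 0.0])            # x-hat_0 = (x_0, lambda_0)
Bh = np.array([[1.0, 0.0, -1.0],          # B-hat_0 = [B_0, C_0] = F'(x-hat_0)
               [0.0, 1.0,  0.0]])

for lam_next in lambdas[1:]:
    # Predictor step of (A.1): lambda changes, so the C-column enters the step.
    B, C = Bh[:, :2], Bh[:, 2]
    x_new = xh[:2] - np.linalg.solve(B, F(xh) + C * (lam_next - xh[2]))
    xh_new = np.concatenate([x_new, [lam_next]])
    Bh, xh = update(Bh, xh, xh_new), xh_new
    # Corrector iterations: lambda is held fixed, so the update reduces to the
    # square-matrix first Broyden update of B.
    for _ in range(20):
        if np.linalg.norm(F(xh)) <= 1e-12:
            break
        x_new = xh[:2] - np.linalg.solve(Bh[:, :2], F(xh))
        xh_new = np.concatenate([x_new, [xh[2]]])
        Bh, xh = update(Bh, xh, xh_new), xh_new

print(xh[:2])   # near (1, 1), the point on the zero curve at lambda = 1
```

The predictor/corrector structure here mirrors Algorithm 5.1: the same λ-value repeated with increment zero gives the corrector iterations, which is exactly how (A.1) accommodates both kinds of step through one λ-sequence.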
BOUNDED DETERIORATION HYPOTHESIS. Let α_1 and α_2 be nonnegative constants such that for each (x̂, x̂_+, B̂) ∈ N, every B̂_+ ∈ U(x̂, x̂_+, B̂) satisfies

‖B̂_+ - B̂_*‖ ≤ [1 + α_1 σ(x̂, x̂_+)^p] ‖B̂ - B̂_*‖ + α_2 σ(x̂, x̂_+)^p

for σ(x̂, x̂_+) = max{|x̂ - x̂_*|, |x̂_+ - x̂_*|}, where |x̂| = max{|x|, |λ|} for x̂ = (x, λ) ∈ R^{n+m}.
Proposition A.1 below is a standard elementary technical result which we use in obtaining the local convergence theorems which follow.

PROPOSITION A.1. Under the basic hypothesis, we have

|F(x̂) - F′(x̂_*)(x̂ - x̂_*)| ≤ (γ/(1 + p)) |x̂ - x̂_*|^{1+p}

for x̂ ∈ N̂, where |x̂| = max{|x|, |λ|} for x̂ = (x, λ) ∈ R^{n+m}.

Proof. Setting x̂(t) = x̂_* + t(x̂ - x̂_*), we have

|F(x̂) - F′(x̂_*)(x̂ - x̂_*)| = |∫_0^1 [F′(x̂(t)) - F′(x̂_*)](x̂ - x̂_*) dt| ≤ ∫_0^1 γ t^p |x̂ - x̂_*|^{1+p} dt = (γ/(1 + p)) |x̂ - x̂_*|^{1+p}. ∎
Theorems A.2 and A.3 and Proposition A.4 below establish the local linear convergence of iteration (A.1) and characterize the speed of convergence of the iteration sequence. They are counterparts in our analysis of Theorems A2.1 and A3.1 of [12].

THEOREM A.2. Let F satisfy the basic hypothesis and let U satisfy the bounded deterioration hypothesis with respect to B̂_* = [B_*, C_*], where B_* ∈ R^{n×n} is an invertible matrix with

|I - B_*^{-1}F_x(x̂_*)| ≤ r_x < 1.

Let {λ_k}_{k=0,1,...} be such that one of the following holds:

(A.2)  |λ_k - λ_*| ≤ β r_λ^k, k = 0, 1, ..., for some β and r_λ ∈ (0, 1);

(A.3)  λ_k → λ_* q-linearly with Q{λ_k} = r_λ ∈ [0, 1).

Then for any r such that max{r_x, r_λ} < r < 1, there exist positive constants ε_r, δ_r, and an integer k_r such that if |x_0 - x_*| < ε_r, ‖B̂_0 - B̂_*‖ < δ_r, and k_0 ≥ k_r, then iteration (A.1) is well defined for k = 0, 1, ..., and converges to x̂_* with

(A.4)  |x̂_k - x̂_*| ≤ r^k ε_r, k = 0, 1, ..., if (A.2) holds, where |x̂| = max{|x|, |λ|} for x̂ = (x, λ) ∈ R^{n+m};

(A.5)  |x̂_{k+1} - x̂_*| ≤ r |x̂_k - x̂_*|, k = 0, 1, ..., if (A.3) holds, where |x̂| = |x| + Γ|λ| for x̂ = (x, λ) ∈ R^{n+m} and for a suitable constant Γ.

Furthermore, {B̂_k - B̂_*}_{k=0,1,...} and {B_k^{-1} - B_*^{-1}}_{k=0,1,...} are uniformly small.

Proof. Since norms on R^{n×(n+m)} are equivalent, requiring ‖·‖ to be small is equivalent to requiring |·| to be small. For convenience, then, we impose smallness requirements on ‖·‖ rather than on |·|. The following may be helpful: If we are given x̂ = (x, λ), λ_+, and B̂ = [B, C] with B invertible, then for x_+ = x - B^{-1}{F(x̂) + C(λ_+ - λ)}, we have

(A.6)
x_+ - x_* = [I - B^{-1}F_x(x̂_*)](x - x_*) + B^{-1}[C - F_λ(x̂_*)](λ - λ_*)
            - B^{-1}C(λ_+ - λ_*) - B^{-1}[F(x̂) - F′(x̂_*)(x̂ - x̂_*)].

We first assume that (A.2) holds and establish all conclusions except (A.5). Take |x̂| = max{|x|, |λ|} for x̂ = (x, λ) ∈ R^{n+m}. Choose positive ε_r, δ, and an integer k_r such that (α_1 δ + α_2) ε_r^p/(1 - r^p) < δ and such that if ‖B̂ - B̂_*‖ ≤ δ and k ≥ k_r, then B^{-1} exists and

|I - B^{-1}F_x(x̂_*)| ε_r + {|B^{-1}[C - F_λ(x̂_*)]| + |B^{-1}C|} β r_λ^k + |B^{-1}| (γ/(1 + p)) ε_r^{1+p} ≤ r ε_r.
If necessary, further restrict ε_r, δ, and k_r so that if |x - x_*| < ε_r, ‖B̂ - B̂_*‖ < δ, and k ≥ k_r, then (x̂, x̂_+, B̂) ∈ N for x̂ = (x, λ_k) and

x̂_+ = (x - B^{-1}{F(x̂) + C(λ_{k+1} - λ_k)}, λ_{k+1}).

Choose δ_r > 0 such that δ_r + (α_1 δ + α_2) ε_r^p/(1 - r^p) < δ. Let |x_0 - x_*| < ε_r, ‖B̂_0 - B̂_*‖ < δ_r, and k_0 ≥ k_r. We have from (A.6) and the other assumptions above and from Proposition A.1 that

|x̂_1 - x̂_*| ≤ r ε_r.

As an inductive hypothesis, assume that for some k > 0 and for j = 0, ..., k - 1, we have ‖B̂_j - B̂_*‖ < δ and |x̂_{j+1} - x̂_*| ≤ r^{j+1} ε_r. Then σ(x̂_j, x̂_{j+1}) ≤ r^j ε_r for j = 0, ..., k - 1, and by the bounded deterioration hypothesis

‖B̂_{j+1} - B̂_*‖ ≤ [1 + α_1 (r^j ε_r)^p] ‖B̂_j - B̂_*‖ + α_2 (r^j ε_r)^p ≤ ‖B̂_j - B̂_*‖ + (α_1 δ + α_2)(r^j ε_r)^p.

Summing over j = 0, ..., k - 1 gives

(A.7)  ‖B̂_k - B̂_*‖ ≤ ‖B̂_0 - B̂_*‖ + (α_1 δ + α_2) ε_r^p/(1 - r^p) < δ.

Also, as in the case k = 1,

|x_{k+1} - x_*| ≤ |I - B_k^{-1}F_x(x̂_*)| r^k ε_r + {|B_k^{-1}[C_k - F_λ(x̂_*)]| + |B_k^{-1}C_k|} β r_λ^{k_0+k} + |B_k^{-1}| (γ/(1 + p)) (r^k ε_r)^{1+p} ≤ r^{k+1} ε_r.

Then |x̂_{k+1} - x̂_*| ≤ r^{k+1} ε_r, and the induction is complete. It follows from (A.7) and the assumptions on δ that {B̂_k - B̂_*}_{k=0,1,...} and {B_k^{-1} - B_*^{-1}}_{k=0,1,...} are uniformly small, and the theorem is proved in this case.
We now assume that (A.3) holds and establish (A.5). We assume that ε_r, δ_r, and k_r have been chosen such that if |x_0 - x_*| < ε_r, ‖B̂_0 - B̂_*‖ < δ_r, and k_0 ≥ k_r, then iteration (A.1) is well defined for k = 0, 1, ..., (A.4) holds, and {B̂_k - B̂_*}_{k=0,1,...} and {B_k^{-1} - B_*^{-1}}_{k=0,1,...} are uniformly small. This is allowed, for if (A.3) holds, then (A.2) holds for an appropriate β with r_λ replaced by anything larger. Choose r′ and r″ such that max{r_x, r_λ} < r″ < r′ < r. If necessary, further restrict ε_r, δ_r, and k_r so that if |x_0 - x_*| < ε_r, ‖B̂_0 - B̂_*‖ < δ_r, and k_0 ≥ k_r, and if {x̂_k, B̂_k}_{k=0,1,...} is determined by (A.1), then |λ_{k_0+k+1} - λ_*| ≤ r″|λ_{k_0+k} - λ_*|, |I - B_k^{-1}F_x(x̂_*)| ≤ r′, and |B_k^{-1}| (γ/(1 + p)) max{|x_k - x_*|, |λ_{k_0+k} - λ_*|}^p ≤ r - r′ for k = 0, 1, .... Choose Γ ≥ 1 so large that if |x_0 - x_*| < ε_r, ‖B̂_0 - B̂_*‖ < δ_r, and k_0 ≥ k_r, and if {x̂_k, B̂_k}_{k=0,1,...} is determined by (A.1), then

|B_k^{-1}[C_k - F_λ(x̂_*)]| + (|B_k^{-1}C_k| + Γ) r″ ≤ Γ r′, k = 0, 1, ....

Take |x̂| = |x| + Γ|λ| for x̂ = (x, λ) ∈ R^{n+m}.

Suppose that |x_0 - x_*| < ε_r, ‖B̂_0 - B̂_*‖ < δ_r, and k_0 ≥ k_r, and let {x̂_k, B̂_k}_{k=0,1,...} be determined by (A.1). We have from (A.6), the other assumptions, and Proposition A.1 that

|x_{k+1} - x_*| ≤ r′|x_k - x_*| + {|B_k^{-1}[C_k - F_λ(x̂_*)]| + r″|B_k^{-1}C_k|} |λ_{k_0+k} - λ_*| + (r - r′)|x̂_k - x̂_*|,

and so

|x̂_{k+1} - x̂_*| ≤ r′|x_k - x_*| + {|B_k^{-1}[C_k - F_λ(x̂_*)]| + r″|B_k^{-1}C_k| + r″Γ} |λ_{k_0+k} - λ_*| + (r - r′)|x̂_k - x̂_*| ≤ r′|x̂_k - x̂_*| + (r - r′)|x̂_k - x̂_*| = r|x̂_k - x̂_*|.

This completes the proof. ∎
THEOREM A.3. Suppose that F satisfies the basic hypothesis and that {x̂_k}_{k=0,1,...} is a sequence generated by (A.1) which converges q-linearly to x̂_* in some norm with ŝ_k ≡ x̂_{k+1} - x̂_k ≠ 0 for all but finitely many k. If B̂_* = [B_*, C_*] is any matrix such that B_* ∈ R^{n×n} is invertible, then the norm-independent condition

(A.8)  lim_{k→∞} |(B̂_k - B̂_*) ŝ_k| / |ŝ_k| = 0

holds if and only if the norm-independent condition

(A.9)  lim_{k→∞} |[I - B_*^{-1}F_x(x̂_*)](x_k - x_*) - (x_{k+1} - x_*) - B_*^{-1}C_*(λ_{k_0+k+1} - λ_*) + B_*^{-1}[C_* - F_λ(x̂_*)](λ_{k_0+k} - λ_*)| / |x̂_k - x̂_*| = 0

holds.
Proof. We have

(A.10)
(B̂_k - B̂_*) ŝ_k = B_* { [I - B_*^{-1}F_x(x̂_*)](x_k - x_*) - (x_{k+1} - x_*)
                     - B_*^{-1}C_*(λ_{k_0+k+1} - λ_*) + B_*^{-1}[C_* - F_λ(x̂_*)](λ_{k_0+k} - λ_*) }
                 + F′(x̂_*)(x̂_k - x̂_*) - F(x̂_k).

Since x̂_k → x̂_* q-linearly in some norm, it follows from the equivalence of norms on R^{n+m} that for any norm |·| on R^{n+m} there are constants η_1 and η_2 for which

(A.11)  η_1 |x̂_k - x̂_*| ≤ |ŝ_k| ≤ η_2 |x̂_k - x̂_*|, k = 0, 1, ....

Also, we see from Proposition A.1 that

(A.12)  F′(x̂_*)(x̂_k - x̂_*) - F(x̂_k) = o(|x̂_k - x̂_*|)

in any norm. Since B_* is invertible, it follows immediately from (A.10)-(A.12) and norm equivalence that (A.8) holds in some norms on R^n and R^{n+m} if and only if (A.9) holds in some norms on R^n and R^{n+m}. ∎
The conditions of Theorem A.3 have particular consequences which are outlined in Proposition A.4 and the remarks below.

PROPOSITION A.4. If (A.9) holds in some norms on R^n and R^{n+m} and if r_x ≡ |I - B_*^{-1}F_x(x̂_*)| and r_λ ≡ Q{λ_k}, then the following hold:

(A.13) for any ε > 0, lim sup_{k→∞} |x̂_{k+1} - x̂_*| / |x̂_k - x̂_*| ≤ max{r_x, r_λ + ε}, where |x̂| = |x| + Γ|λ| for x̂ = (x, λ) ∈ R^{n+m} and for a suitable constant Γ depending on ε;

(A.14) if r_λ = 0, then x̂_k → x̂_* q-superlinearly if and only if the norm-independent condition

lim_{k→∞} |[I - B_*^{-1}F_x(x̂_*)](x_k - x_*) + B_*^{-1}[C_* - F_λ(x̂_*)](λ_{k_0+k} - λ_*)| / |x̂_k - x̂_*| = 0

holds. In particular, x̂_k → x̂_* q-superlinearly if λ_k → λ_* q-superlinearly and B̂_* = F′(x̂_*).

Proof. If (A.9) holds in some norms, then

(A.15)
|x_{k+1} - x_*| ≤ |I - B_*^{-1}F_x(x̂_*)| |x_k - x_*| + |B_*^{-1}C_*| |λ_{k_0+k+1} - λ_*|
                + |B_*^{-1}[C_* - F_λ(x̂_*)]| |λ_{k_0+k} - λ_*| + o(|x̂_k - x̂_*|)

for any Γ and any norms on R^n, R^m, and R^{n+m}. From (A.15), we see that (A.13) holds by taking |x̂| = |x| + Γ|λ| for x̂ = (x, λ) ∈ R^{n+m}, where Γ is sufficiently large that [|B_*^{-1}C_*| + Γ] r_λ + |B_*^{-1}[C_* - F_λ(x̂_*)]| ≤ Γ(r_λ + ε). If r_λ = 0, then lim_{k→∞} |λ_{k_0+k+1} - λ_*| / |x̂_k - x̂_*| = 0 for any norms on R^m and R^{n+m}, and it follows from (A.9) that

lim_{k→∞} |(x_{k+1} - x_*) - [I - B_*^{-1}F_x(x̂_*)](x_k - x_*) - B_*^{-1}[C_* - F_λ(x̂_*)](λ_{k_0+k} - λ_*)| / |x̂_k - x̂_*| = 0

for any positive Γ. Then we obtain (A.14) by taking |x̂| = |x| + Γ|λ| for x̂ = (x, λ) ∈ R^{n+m}. ∎
We now consider an inverse update analogue of iteration (A.1), viz., an iteration which begins with some x̂_0 = (x_0, λ_{k_0}) and K̂_0 = [K_0, L_0] and determines

(A.16)
x_{k+1} = x_k - K_k F(x̂_k) + L_k(λ_{k_0+k+1} - λ_{k_0+k}),
x̂_{k+1} = (x_{k+1}, λ_{k_0+k+1}),
K̂_{k+1} = [K_{k+1}, L_{k+1}] ∈ U(x̂_k, x̂_{k+1}, K̂_k),

for k = 0, 1, ..., where we assume the update function U is defined in a neighborhood N = N̂_1 × N̂_1 × N_2 of (x̂_*, x̂_*, K̂_*) ∈ R^{n+m} × R^{n+m} × R^{n×(n+m)} for some K̂_*, where N̂_1 ⊆ N̂ and N_2 is such that the first n columns of any matrix in N_2 constitute a nonsingular matrix. The appropriate analogue of the bounded deterioration hypothesis is the following hypothesis.

INVERSE BOUNDED DETERIORATION HYPOTHESIS. Let ᾱ_1 and ᾱ_2 be nonnegative constants such that for each (x̂, x̂_+, K̂) ∈ N, every K̂_+ ∈ U(x̂, x̂_+, K̂) satisfies

‖K̂_+ - K̂_*‖ ≤ [1 + ᾱ_1 σ(x̂, x̂_+)^p] ‖K̂ - K̂_*‖ + ᾱ_2 σ(x̂, x̂_+)^p

for σ(x̂, x̂_+) = max{|x̂ - x̂_*|, |x̂_+ - x̂_*|}, where |x̂| = max{|x|, |λ|} for x̂ = (x, λ) ∈ R^{n+m}.

Theorems A.5 and A.6 below are counterparts for iteration (A.16) of Theorems A.2 and A.3 for iteration (A.1). They are analogous to Theorems A2.2 and A3.2 of [12]. The proof of Theorem A.5 is very similar to the proof of Theorem A.2, and so we omit it.
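By analogy with the direct case, one concrete choice of U for (A.16) (our sketch; the paper's specific inverse updates are developed earlier in the text) is the rank-one least-change update enforcing the inverse secant equation K̂_+ ŷ = s:

```python
import numpy as np

def inverse_broyden_update(K_hat, y_hat, s):
    """Least Frobenius-norm change to K_hat = [K, L] subject to the
    inverse secant equation  K_hat_new @ y_hat = s, where
    y_hat = (y, dlam) lies in R^{n+m} and s = x_plus - x lies in R^n."""
    return K_hat + np.outer(s - K_hat @ y_hat, y_hat) / (y_hat @ y_hat)
```

As in the direct case, when the λ-increment is zero the last component of ŷ vanishes, the L-column of K̂ is left unchanged, and the formula reduces to a square-matrix inverse Broyden update of K.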
THEOREM A.5. Let F satisfy the basic hypothesis and let U satisfy the inverse bounded deterioration hypothesis with respect to K̂_* = [K_*, L_*], where K_* ∈ R^{n×n} is an invertible matrix with

|I - K_* F_x(x̂_*)| ≤ r_x < 1.

Let {λ_k}_{k=0,1,...} be such that either (A.2) or (A.3) holds for some r_λ. Then for any r such that max{r_x, r_λ} < r < 1, there exist positive constants ε_r, δ_r and an integer k_r such that if |x_0 - x_*| < ε_r, ‖K̂_0 - K̂_*‖ < δ_r, and k_0 ≥ k_r, then iteration (A.16) is well defined for k = 0, 1, ..., and converges to x̂_* according to either (A.4), if (A.2) holds, or (A.5), if (A.3) holds. Furthermore, {K̂_k - K̂_*}_{k=0,1,...} and {K_k^{-1} - K_*^{-1}}_{k=0,1,...} are uniformly small.
THEOREM A.6. Suppose that F satisfies the basic hypothesis and that {x̂_k}_{k=0,1,...} is a sequence generated by (A.16) which converges q-linearly to x̂_* in some norm with ŝ_k ≡ x̂_{k+1} - x̂_k ≠ 0 for all but finitely many k and with {K_k}_{k=0,1,...} and {K_k^{-1}}_{k=0,1,...} uniformly bounded. Let K̂_* = [K_*, L_*] be any matrix such that K_* ∈ R^{n×n} is invertible, and suppose that {y_k}_{k=0,1,...} is a sequence satisfying the norm-independent condition

(A.17)  |ŷ_k - ŝ_k| ≤ α_k |ŝ_k|, k = 0, 1, ..., with lim_{k→∞} α_k = 0,

where ŷ_k ≡ (y_k, λ_{k_0+k+1} - λ_{k_0+k}) and s_k ≡ x_{k+1} - x_k. Then the norm-independent condition

(A.18)  lim_{k→∞} |(K̂_k - K̂_*) ŷ_k| / |ŷ_k| = 0

holds if and only if the norm-independent condition

(A.19)  lim_{k→∞} |[I - K_* F_x(x̂_*)](x_k - x_*) - (x_{k+1} - x_*) + L_*(λ_{k_0+k+1} - λ_*) - [L_* + K_* F_λ(x̂_*)](λ_{k_0+k} - λ_*)| / |x̂_k - x̂_*| = 0

holds.
Remark. Clearly, (A.19) is the same as (A.9) with B_* = K_*^{-1} and C_* = -K_*^{-1}L_*. Consequently, if (A.18) and (A.19) hold, then the conclusions of Proposition A.4 hold with B_* = K_*^{-1} and C_* = -K_*^{-1}L_*.

Proof. Since (A.19) is the same as (A.9) with B_* = K_*^{-1} and C_* = -K_*^{-1}L_*, it suffices to show that (A.18) holds if and only if (A.8) holds with B̂_* = [B_*, C_*], B_* = K_*^{-1}, and C_* = -K_*^{-1}L_*. We have

(A.20)  (B̂_k - B̂_*) ŝ_k = (B_* - B_k)(K̂_k ŷ_k - s_k) - B_*(K̂_k - K̂_*) ŷ_k,

where B_k = K_k^{-1} and C_k = -K_k^{-1}L_k, and it follows from (A.17), (A.20), and the boundedness of {K_k}_{k=0,1,...} and {K_k^{-1}}_{k=0,1,...} that (A.8) holds if and only if

(A.21)  lim_{k→∞} |(K̂_k - K̂_*) ŷ_k| / |ŝ_k| = 0

holds. But (A.17) implies that there are positive constants η_1 and η_2 for which η_1 |ŝ_k| ≤ |ŷ_k| ≤ η_2 |ŝ_k|, k = 0, 1, .... It follows that (A.21) is equivalent to (A.18), and the proof is complete. ∎
Acknowledgments. The authors thank Layne T. Watson for participating in
many discussions which were influential in shaping the contents of this paper. We also
thank the referees for their thoughtful reviews.
REFERENCES
[1] P. ALFELD, Two devices for improving the efficiency of stiff ODE solvers, in Proc. 1979 SIGNUM
Meeting on Numerical Ordinary Differential Equations, pp. 24-1-24-3, Report 79-1710,
Computer Science Dept., University of Illinois, Urbana, IL, 1979.
[2] C. A. BEATTIE AND S. WEAVER-SMITH, Secant methods for structural model identification,
Report ICAM-TR-88-0601, Interdisciplinary Center for Applied Mathematics, Virginia
Polytechnic Institute and State University, Blacksburg, VA, 1988.
[3] S. K. BOURJI, Ph.D. thesis, Utah State University, Logan, UT, 1987.
[4] P. N. BROWN, A. C. HINDMARSH, AND H. F. WALKER, Experiments with quasi-Newton methods
in solving stiff ODE systems, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 297-313.
[5] C. G. BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comp.,
19 (1965), pp. 577-593.
[6] C. G. BROYDEN, A new double-rank minimization algorithm, AMS Notices, 16 (1969), p. 670.
[7] C. G. BROYDEN, The convergence of a class of double-rank minimization algorithms, Parts I and II, J. Inst. Math. Appl., 6 (1971), pp. 76-90, 222-236.
[8] C. G. BROYDEN, The convergence of an algorithm for solving sparse nonlinear systems, Math. Comp., 25
(1971), pp. 285-294.
[9] W. C. DAVIDON, Variable metric methods ]or minimization, Report ANL-5990 Rev., Argonne
National Laboratory, Argonne, IL, November, 1959.
[10] J. E. DENNIS, JR., AND R. B. SCHNABEL, Least change secant updates for quasi-Newton methods,
SIAM Rev., 21 (1980), pp. 443-459.
[11] J. E. DENNIS, JR., AND R. B. SCHNABEL, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Automatic Computation, Englewood Cliffs, NJ, 1983.
[12] J. E. DENNIS, JR., AND H. F. WALKER, Convergence theorems for least change secant update
methods, SIAM J. Numer. Anal., 18 (1981), pp. 949-987.
[13] R. FLETCHER, A new approach to variable metric algorithms, Comput. J., 13 (1970), pp. 317-322.
[14] R. FLETCHER AND M. J. D. POWELL, A rapidly convergent descent method for minimization,
Comput. J., 6 (1963), pp. 163-168.
[15] G. E. FORSYTHE, M. A. MALCOLM, AND C. S. MOLER, Computer Methods for Mathematical
Computations, Prentice-Hall Series in Automatic Computation, Englewood Cliffs, NJ,
1977.
[16] K. GEORG, On tracing an implicitly defined curve by quasi-Newton steps and calculating bifurcation by local perturbations, SIAM J. Sci. Statist. Comput., 2 (1981), pp. 35-50.
[17] D. GOLDFARB, A family of variable-metric methods derived by variational means, Math. Comp.,
24 (1970), pp. 23-26.
[18] J. GREENSTADT, Variations on variable metric methods, Math. Comp., 24 (1970), pp. 1-18.
[19] A. GRIEWANK AND L. SHENG, On the Gauss-Broyden method for nonlinear least-squares, Preprint MCS-PS-0988, Mathematics and Computer Science Div., Argonne National Laboratory, Argonne, IL, September, 1988.
[20] J. M. MARTÍNEZ, Quasi-Newton methods for solving underdetermined nonlinear simultaneous equations, Relatório Técnico 21/88, Instituto de Matemática, Estatística e Ciência da Computação, Universidade Estadual de Campinas, Campinas, Brazil, November, 1988.
[21] E. S. MARWIL, Exploiting sparsity in Newton-like methods, Ph.D. thesis, Cornell University, Ithaca, NY, 1978.
[22] J. J. MORÉ, B. S. GARBOW, AND K. E. HILLSTROM, User Guide for MINPACK-1, Report ANL-80-74, Applied Mathematics Div., Argonne National Laboratory, Argonne, IL, August, 1980.
[23] J. J. MORÉ, B. S. GARBOW, AND K. E. HILLSTROM, Testing unconstrained optimization software, ACM Trans. Math. Software, 7 (1981), pp. 17-41.
[24] A. P. MORGAN, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[25] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[26] M. J. D. POWELL, A new algorithm for unconstrained optimization, in Nonlinear Programming, J. B. Rosen, O. L. Mangasarian, and K. Ritter, eds., Academic Press, New York, 1970.
[27] L. K. SCHUBERT, Modification of a quasi-Newton method for nonlinear equations with a sparse Jacobian, Math. Comp., 24 (1970), pp. 27-30.
[28] L. F. SHAMPINE, H. A. WATTS, AND S. M. DAVENPORT, Solving nonstiff ordinary differential equations - the state of the art, SIAM Rev., 18 (1976), pp. 376-411.
[29] D. F. SHANNO, Conditioning of quasi-Newton methods for function minimization, Math. Comp., 24 (1970), pp. 647-656.
[30] PH. L. TOINT, On sparse and symmetric matrix updating subject to a linear equation, Math. Comp., 31 (1977), pp. 954-961.
[31] H. F. WALKER AND L. T. WATSON, Least-change secant update methods for underdetermined systems, Res. Report August/88/41, Mathematics and Statistics Dept., Utah State University; Report 88-28, Computer Science Dept., and Report 88-09-03, Interdisciplinary Center for Applied Mathematics, Virginia Polytechnic Institute and State University, August, 1988.
[32] L. T. WATSON, S. C. BILLUPS, AND A. P. MORGAN, HOMPACK: a suite of codes for globally convergent homotopy algorithms, ACM Trans. Math. Software, 13 (1987), pp. 281-310.
[33] L. T. WATSON AND C. Y. WANG, A homotopy method applied to elastica problems, Internat. J. Solids and Structures, 17 (1981), pp. 29-37.
T. YPMA, private communication, 1988.