
SIAM J. NUMER. ANAL. Vol. 27, No. 5, pp. 1263-1294, October 1990. (c) 1990 Society for Industrial and Applied Mathematics.

LEAST-CHANGE SECANT UPDATES OF NONSQUARE MATRICES*

SAMIH K. BOURJI† AND HOMER F. WALKER‡

Abstract. The notion of least-change secant updates is extended to apply to nonsquare matrices in a way appropriate for quasi-Newton methods used to solve systems of nonlinear equations that depend on parameters. Extensions of the widely used least-change secant updates for square matrices are given. A local convergence analysis for certain paradigm iterations is outlined as motivation for the use of these updates, and numerical experiments involving these iterations are discussed.

Key words. least-change secant updates, quasi-Newton updates, parameter-dependent systems

AMS(MOS) subject classification. 65H10

1. Introduction. Quasi-Newton methods are very widely used iterative methods for solving systems of nonlinear algebraic equations. The basic form of a quasi-Newton method for solving $F(x) = 0$, $F : \mathbf{R}^n \to \mathbf{R}^n$, is

(1.1)  $x_{k+1} = x_k - B_k^{-1}F(x_k)$,

in which $B_k \approx F'(x_k) \in \mathbf{R}^{n\times n}$, the Jacobian (matrix) of $F$ at $x_k$. For practical success, it is usually essential to augment this basic form with procedures for modifying the step $-B_k^{-1}F(x_k)$ to ensure progress from bad starting points, but we need not consider such procedures here. For a general reference on all aspects of quasi-Newton methods, see Dennis and Schnabel [11].

The most effective quasi-Newton methods are those in which each successive $B_{k+1}$ is determined as a least-change secant update of its predecessor $B_k$. As the name suggests, $B_{k+1}$ is determined as a least-change secant update of $B_k$ by making the least possible change in $B_k$ (as measured by a suitable matrix norm) which incorporates current secant information (usually expressed in terms of successive $x$- and $F$-values) and other available information about the structure of $F'$. There are also notable updates which, strictly speaking, are least-change inverse secant updates, obtained in an analogous way by making the least possible change in $B_k^{-1}$. When speaking generically of least-change secant updates, we intend to include these. In [10], Dennis and Schnabel precisely formalize the notion of a least-change secant update and show how the most widely used updates can be derived as least-change secant updates. In [12], Dennis and Walker show that least-change secant update methods, i.e., quasi-Newton methods which use least-change secant updates, have desirable convergence properties in general.

* Received by the editors August 10, 1988; accepted for publication (in revised form) October 5, 1989.
† Department of Mathematics, Weber State College, Ogden, Utah 84408. The work of this author was supported in part by the United States Department of Energy under contract DE-FG0286ER25018 with Utah State University. Some of the results in this paper first appeared in this author's doctoral dissertation at Utah State University.
‡ Department of Mathematics and Statistics, Utah State University, Logan, Utah 84322-3900. The work of this author was supported by United States Department of Energy grant DE-FG0286ER25018, Department of Defense/Army grant DAAL03-88-K, and National Science Foundation grant DMS-0088995, all with Utah State University.

In this paper we outline general principles and specific formulas which extend least-change secant updating from the traditional square-matrix context to that of nonsquare matrices.
Our primary purpose is to provide updates which can be used in quasi-Newton methods for solving nonlinear systems which depend on parameters. We mention two important areas in which such systems occur and which have strongly influenced the work presented here. The first is the solution of a nonlinear system by continuation or homotopy methods, in which the parameter is a continuation parameter used to transform a problem from one which is presumably easy to solve into one for which the solution is desired but which may be difficult to solve ab initio. The second is the solution of ordinary differential equations by implicit methods, in which the parameter is the time variable and a nonlinear system must be solved by a sequence of corrector iterations at each timestep. In this second area, iterative methods used in the corrector iterations typically must carry over Jacobian information from one timestep to the next for the sake of efficiency. As a result, the nonlinear systems arising in this context are appropriately regarded as parametrized by the time variable, rather than as sequences of nonparametric systems, even though time is constant during each set of corrector iterations.

We consider a parameter-dependent nonlinear system in the form

(1.2)  $F(x, \lambda) = 0, \qquad F : \mathbf{R}^n \times \mathbf{R}^m \to \mathbf{R}^n$,

where $x \in \mathbf{R}^n$ is an independent variable and $\lambda \in \mathbf{R}^m$ is a parameter vector. Our interest is in iterative methods that are intended to produce approximate solutions of (1.2) through some range of $\lambda$-values. There are many iterative methods that might be appropriate for this, depending in part on how the $\lambda$-values are specified (a priori or as the iterations proceed, independent of or in conjunction with the $x$-values, etc.). We are not concerned here with how successive $\lambda$-values might be determined or with the specific forms of the iterations, but we have in mind methods which require approximations at current $(x,\lambda)$-values of $F_x \in \mathbf{R}^{n\times n}$, the Jacobian of $F$ with respect to $x$, or perhaps of the full Jacobian $F' = [F_x, F_\lambda] \in \mathbf{R}^{n\times(n+m)}$, where $F_\lambda \in \mathbf{R}^{n\times m}$ is the Jacobian of $F$ with respect to $\lambda$, and which for efficiency require Jacobian information to be retained through at least some changes in $\lambda$.

In view of the success of least-change secant updates in maintaining approximate Jacobians in a quasi-Newton iteration (1.1), it is natural to consider their use in maintaining approximations of $F_x$ or $F' = [F_x, F_\lambda]$ in iterations used to solve (1.2). In order to do so, some sort of extension of the usual formulation of these updates is clearly required. Indeed, the usual formulation applies only to square matrices, and $F'$ is nonsquare. Even if we require only an approximation of the square matrix $F_x$, it is not clear how to incorporate the information in successive $F$-values in a secant update if $\lambda$ changes as well as $x$.

To be more specific about this last point, we suppose that $(x,\lambda)$ and $(x_+,\lambda_+)$ are "current" and "next" values and that $B$ is a "current" approximation of $F_x$. Then the usual formulation for determining a "next" approximate Jacobian $B_+$ as a least-change secant update of $B$ calls for $B_+$ to satisfy a secant equation

(1.3)  $B_+(x_+ - x) = F(x_+, \lambda_+) - F(x, \lambda)$

as nearly as possible, consistent with any structure that may be imposed on $B_+$. Since

$F(x_+,\lambda_+) - F(x,\lambda) \approx [F_x(x_+,\lambda_+), F_\lambda(x_+,\lambda_+)]\begin{pmatrix} x_+ - x \\ \lambda_+ - \lambda \end{pmatrix},$

this secant equation is clearly inappropriate if $\lambda_+ \ne \lambda$. One modification of the usual formulation which may be useful in some circumstances is to replace the right-hand side of (1.3) with something more appropriate. An ideal replacement would be $F(x_+,\lambda_+) - F(x,\lambda) - F_\lambda(x,\lambda)(\lambda_+ - \lambda)$ or $F(x_+,\lambda_+) - F(x,\lambda) - F_\lambda(x_+,\lambda_+)(\lambda_+ - \lambda)$, provided $F_\lambda$ is easy to obtain. If $F_\lambda$ is not easy to obtain, then a finite-difference approximation might be an effective substitute. However, any strategy such as this will necessarily require extra derivative or function evaluations and so may be undesirably expensive in some applications.
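The finite-difference substitute mentioned above might be formed as in the following minimal sketch (Python with NumPy; the function names, interface, and forward-difference step are our own illustrative assumptions, not anything prescribed in the text).

```python
import numpy as np

def replacement_rhs(F, x, lam, x_plus, lam_plus, eps=1e-6):
    """Hypothetical sketch: form the 'ideal replacement' right-hand side
    F(x+, lam+) - F(x, lam) - F_lam(x, lam)(lam+ - lam) for the secant
    equation B+ (x+ - x) = y, approximating the directional derivative
    F_lam(x, lam)(lam+ - lam) by one extra forward difference of F."""
    dlam = lam_plus - lam
    # F_lam(x, lam) dlam ~ (F(x, lam + eps*dlam) - F(x, lam)) / eps
    Flam_dlam = (F(x, lam + eps * dlam) - F(x, lam)) / eps
    return F(x_plus, lam_plus) - F(x, lam) - Flam_dlam
```

Note the extra evaluation F(x, lam + eps*dlam); this is exactly the additional function-evaluation cost referred to above.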
In the next section we give extensions to the nonsquare-matrix case of the usual general formulations of square-matrix least-change secant and inverse secant updates. With these, the secant information in successive $F$-values can be used to determine updates of approximations of $F' = [F_x, F_\lambda]$ according to least-change criteria. These extensions incidentally determine updates of approximations of $F_x$ using available secant information, i.e., without any extra derivative or function evaluations. Having these general extensions, we derive in §3 extensions of the most widely used specific formulas for square-matrix least-change secant updates having various properties. In order to provide some theoretical support for using these updates, we outline in §4 a local convergence analysis for certain paradigm methods which employ them. In §5, we report on some numerical experiments.

The focus here is on least-change secant updates of matrices with more columns than rows; however, our ways of deriving updates using least-change principles readily apply as well to matrices with more rows than columns, such as occur in nonlinear least-squares problems. In this connection, we note that Griewank and Sheng [19] have recently considered Broyden updating for such matrices in the context of nonlinear least-squares problems.

Although the general updating principles, specific update formulas, and local convergence analysis given here are for the most part new, there is other work which is closely related to the developments here. The updates used in the path-following algorithm of Georg [16] and in the augmented Jacobian matrix algorithm in the homotopy method code HOMPACK [32] are really the same as our extension of the first Broyden update (3.1.1) below, although these updates are viewed somewhat differently in [16] and [32]. In work which is complementary to that in this paper, Walker and Watson [31] consider general normal flow and augmented Jacobian algorithms for underdetermined systems which use the general and specific updates given here and give a local q-linear and q-superlinear convergence analysis for these algorithms. (See, e.g., Watson, Billups, and Morgan [32] as well as [31] for a description of the normal flow and augmented Jacobian algorithms. Also, see, e.g., Ortega and Rheinboldt [25] for definitions of the various types of convergence referred to here.) In recent work independent of that here and in [31], Martinez [20] considers Newton-like iterative methods for underdetermined systems which use very general procedures for updating approximate Jacobians, and he develops a general local r-linear and r-superlinear convergence analysis for these methods. He points out as a special case the possibility of maintaining approximate Jacobians in normal flow algorithms with updates which are, in our terms, the Frobenius-norm least-change secant updates developed in §2 and §3.1 below. No specific update formulas are given in [20], although experiments with (sparse) first Broyden updating are discussed; see the additional remarks in §5 below.
Finally, we note that in many respects there are strong parallels between the developments here and those in [12]; this is especially so in our general formulations of least-change secant and inverse secant updates in §2 and in the local convergence analysis in §4 and in the Appendix.

Our notational conventions are not strict, but it might be helpful to the reader to note that we use the following guidelines: Unless otherwise indicated, lowercase letters denote vectors and scalars, and capital letters denote matrices and operators. Boldface uppercase letters denote vector spaces, subspaces, and affine subspaces. For convenience, we set $\bar n = n + m$. Vectors and matrices with bars are in $\mathbf{R}^{\bar n}$ and $\mathbf{R}^{n\times\bar n}$, respectively. Without bars, vectors are in $\mathbf{R}^n$ or $\mathbf{R}^m$ and matrices are in $\mathbf{R}^{n\times n}$ or $\mathbf{R}^{n\times m}$, unless otherwise indicated. If $x \in \mathbf{R}^n$ and $\lambda \in \mathbf{R}^m$, then for $\bar x = \binom{x}{\lambda} \in \mathbf{R}^{\bar n}$ we usually write $\bar x = (x,\lambda)$, $F(x,\lambda) = F(\bar x)$, etc. Vector components and matrix entries are indicated by superscripts in parentheses. Distinguished vectors and matrices are indicated by subscripts, and subscripts on vectors and matrices are inherited by their components and entries. For example, the $k$th matrix in a sequence in $\mathbf{R}^{n\times\bar n}$ might be denoted by $\bar B_k$, in which case its $ij$th entry would be denoted by $\bar B_k^{(ij)}$. We use $|\cdot|$ to denote all vector norms and their induced matrix norms, and we use $\|\cdot\|$ to denote a matrix norm associated with an inner product. A projection onto a subspace or affine subspace which is orthogonal with respect to $\|\cdot\|$ is denoted by $P$ with the subspace or affine subspace appearing as a subscript. If $P$ denotes a projection, then we set $P^\perp \equiv I - P$, where $I$ is the identity operator.

2. General least-change secant updates. To define general least-change secant updates of nonsquare matrices, we suppose that analogues of the usual ingredients in the square-matrix case are given, viz., the following:

(i) $\bar B \in \mathbf{R}^{n\times\bar n}$;
(ii) $\bar s \in \mathbf{R}^{\bar n}$ and $y \in \mathbf{R}^n$;
(iii) an inner-product norm $\|\cdot\|$ on $\mathbf{R}^{n\times\bar n}$;
(iv) an affine subspace $\mathbf{A} = \bar A_N + \mathbf{S} \subseteq \mathbf{R}^{n\times\bar n}$, where $\mathbf{S}$ is the parallel subspace and $\bar A_N$ is the element of $\mathbf{A}$ normal to $\mathbf{S}$ in the $\|\cdot\|$-inner product.

It is appropriate (but not necessary) to relate these to the equation-solving context outlined in the introduction by regarding $\bar B \approx F'(\bar x)$ for some $\bar x$, $\bar s = \bar x_+ - \bar x$ for some $\bar x_+$, and $y \approx F'(\bar x)\bar s$, e.g., $y = F(\bar x_+) - F(\bar x)$. The affine subspace $\mathbf{A}$ presumably reflects some structure of $F'$, e.g., a particular pattern of sparsity, which is to be imposed on updates. An appropriate choice of norm $\|\cdot\|$ depends on the problem at hand but seems likely to be the Frobenius norm or a weighted Frobenius norm, as in the square-matrix case.

We define a general least-change secant update via the approach and notation of [12, §2]. We set $\mathbf{N}(\bar s) \equiv \{\bar M \in \mathbf{R}^{n\times\bar n} : \bar M\bar s = 0\}$ and $\mathbf{Q}(y,\bar s) \equiv \{\bar M \in \mathbf{R}^{n\times\bar n} : \bar M\bar s = y\}$ and note that $\mathbf{N}(\bar s)$ is a subspace of $\mathbf{R}^{n\times\bar n}$ and $\mathbf{Q}(y,\bar s)$ is an affine subspace representable as $\mathbf{Q}(y,\bar s) = y\bar s^T/\bar s^T\bar s + \mathbf{N}(\bar s)$. We define $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$ to be the set of elements of $\mathbf{A}$ which are closest to $\mathbf{Q}(y,\bar s)$ in the norm $\|\cdot\|$ if $\mathbf{Q}(y,\bar s) \ne \emptyset$; otherwise, we set $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s)) = \mathbf{A}$. Of course, $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s)) = \mathbf{A}\cap\mathbf{Q}(y,\bar s)$ if $\mathbf{A}\cap\mathbf{Q}(y,\bar s) \ne \emptyset$. As in Theorem 2.3 of [12], it can be shown that $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$ is an affine subspace with parallel subspace $\mathbf{S}\cap\mathbf{N}(\bar s)$. Our general definition of a least-change secant update is the following.

DEFINITION 2.1. $\bar B_+ \in \mathbf{R}^{n\times\bar n}$ is the least-change secant update of $\bar B$ in $\mathbf{A}$ with respect to $y$, $\bar s$, and the norm $\|\cdot\|$ if $\bar B_+$ is the unique solution of

$\min_{\bar M\in\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))} \|\bar M - \bar B\|.$

We note that $\bar B_+ = P_{\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))}\bar B$, the $\|\cdot\|$-orthogonal projection of $\bar B$ onto $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$.
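For the Frobenius norm, the projection onto $\mathbf{Q}(y,\bar s)$ alone has a simple closed form, which the following sketch illustrates numerically (the names and random test data are our own illustration, not notation from the text).

```python
import numpy as np

def project_onto_Q(M_bar, s_bar, y):
    """Frobenius-orthogonal projection onto Q(y, s_bar) = {M : M s_bar = y}:
    add the minimum-norm correction (y - M s_bar) s_bar^T / (s_bar^T s_bar)."""
    r = y - M_bar @ s_bar
    return M_bar + np.outer(r, s_bar) / (s_bar @ s_bar)

rng = np.random.default_rng(0)
n, m = 4, 2
M_bar = rng.standard_normal((n, n + m))
s_bar = rng.standard_normal(n + m)
y = rng.standard_normal(n)
# The projected matrix lies in Q(y, s_bar):
assert np.allclose(project_onto_Q(M_bar, s_bar, y) @ s_bar, y)
```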
If $\mathbf{A}\cap\mathbf{Q}(y,\bar s) \ne \emptyset$, then $\bar B_+$ satisfies the secant equation

(2.1)  $\bar B_+\bar s = y.$

It can easily be verified that all of the results of [12, §2] for square-matrix least-change secant updates extend to the present case. Particularly relevant results are the following. We have

(2.2)  $\bar B_+ = \lim_{k\to\infty}\left(P_{\mathbf{A}}P_{\mathbf{Q}(y,\bar s)}\right)^k\bar B,$

and it follows that for any $\bar G \in \mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$,

(2.3)  $\bar B_+ = P_{\mathbf{S}\cap\mathbf{N}(\bar s)}\bar B + P^\perp_{\mathbf{S}\cap\mathbf{N}(\bar s)}\bar G,$

and consequently, for any $\bar M \in \mathbf{R}^{n\times\bar n}$,

(2.4)  $\|\bar B_+ - \bar M\| \le \|P_{\mathbf{S}\cap\mathbf{N}(\bar s)}(\bar B - \bar M)\| + \|P^\perp_{\mathbf{S}\cap\mathbf{N}(\bar s)}(\bar G - \bar M)\|.$

In (2.2), the operator $(I - P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^{-1}$ that arises in evaluating the limit is well defined as an operator on $(\mathbf{S}\cap\mathbf{N}(\bar s))^\perp$; this is used in §3.1 below.

We also offer a general definition of a least-change inverse secant update of a nonsquare matrix. Our motivation is the importance of least-change inverse secant updates in the square-matrix case. For example, the Broyden-Fletcher-Goldfarb-Shanno update [6], [7], [13], [17], [29] is such an update (see [10]) and is generally regarded as the most successful update for unconstrained optimization problems. The second Broyden update [5] is another such update (see [10]); although it is not generally as effective as the first Broyden update [5], there is evidence that it may be competitive in stiff ODE applications (see [1], [4]). We hope that the updates prescribed here will be similarly effective, e.g., in optimization problems which depend on parameters or in stiff ODE problems in which the time variable is taken into account in updating.

The idea underlying our definition is to assume that $\bar B$ is of full rank $n$, then to select a set of $n$ columns which constitute a nonsingular matrix, and finally to apply the least-change principle to the $n$ columns of the inverse of this matrix together with another $m$ columns derived from $\bar B$. There are several ways in which this idea might be carried out. We choose a way which is not only natural and straightforward but also supported by the local convergence analysis in §4. For convenience, we assume that the first $n$ columns of $\bar B$ constitute a nonsingular matrix, although we stress that any set of $n$ linearly independent columns of $\bar B$ can be used instead. We write $\bar B = [B, C]$ for nonsingular $B \in \mathbf{R}^{n\times n}$ and for $C \in \mathbf{R}^{n\times m}$ and set $\bar K = [K, L]$, where $K = B^{-1}$ and $L = -B^{-1}C$. We define an update $\bar K_+$ of $\bar K$ with an eye toward an inverse secant equation

(2.5)  $\bar K_+\bar y = s,$

where we write $\bar s = (s,t)$ for $s \in \mathbf{R}^n$ and $t \in \mathbf{R}^m$ and set $\bar y = (y,t)$. Our motivation is the following: If we write $\bar K_+ = [K_+, L_+]$ and $\bar B_+ = [B_+, C_+]$ with $K_+ = B_+^{-1}$ and $L_+ = -B_+^{-1}C_+$, then $\bar B_+$ satisfies the secant equation $\bar B_+\bar s = y$ if and only if $s = K_+(y - C_+t) + \ldots$, i.e., if and only if $\bar K_+\bar y = s$. We phrase our definition in terms of $\bar K$ and $\bar K_+$ as follows.

DEFINITION 2.2. $\bar K_+ \in \mathbf{R}^{n\times\bar n}$ is the least-change inverse secant update of $\bar K$ in $\mathbf{A}$ with respect to $s$, $\bar y$, and the norm $\|\cdot\|$ if $\bar K_+$ is the unique solution of

$\min_{\bar M\in\mathbf{M}(\mathbf{A},\mathbf{Q}(s,\bar y))} \|\bar M - \bar K\|,$

where $\bar s = (s,t)$ for $s \in \mathbf{R}^n$ and $t \in \mathbf{R}^m$ and $\bar y = (y,t)$.

To provide additional motivation for this definition, we note that Ypma [34] and one of the referees have observed that

$\begin{pmatrix} B & C \\ 0 & I \end{pmatrix}^{-1} = \begin{pmatrix} K & L \\ 0 & I \end{pmatrix},$

and from this it is easy to see that $\bar K_+$ of Definition 2.2 can be obtained through a conventional least-change inverse secant update of a square matrix with an appropriate choice of an inner-product norm and affine subspace. We also note that from $\bar K_+ = [K_+, L_+]$, we can obtain $\bar B_+ = [B_+, C_+]$ by taking $B_+ = K_+^{-1}$ and $C_+ = -K_+^{-1}L_+$, although it may be preferred in some applications to maintain $\bar K_+$ rather than $\bar B_+$. If $\mathbf{A}\cap\mathbf{Q}(s,\bar y) \ne \emptyset$, then $\bar K_+$ satisfies (2.5) and $\bar B_+$ satisfies (2.1). We have in any case that $\bar K_+ = P_{\mathbf{M}(\mathbf{A},\mathbf{Q}(s,\bar y))}\bar K$.
Also, inverse analogues of (2.2), (2.3), and (2.4) hold with $\bar B_+$, $\bar B$, and $y$ in those equations replaced by $\bar K_+$, $\bar K$, and $s$, respectively.

3. Some specific updates. We now derive extensions to the nonsquare case of the best-known square-matrix least-change secant updates. These extensions are obtained from the definitions in §2 by making various choices of $\mathbf{A}$ and $\|\cdot\|$ which correspond in each case to the analogous choice made for square matrices. Not surprisingly, many of these updates look very much like their square-matrix counterparts. Our intention is not only to determine updates which are likely to be important but also to demonstrate methods of derivation which can be used to obtain other desirable updates.

We first derive least-change secant updates from Definition 2.1 and then derive least-change inverse secant updates from Definition 2.2. In each case, we refer to the update by the corresponding well-known name in the square-matrix case. For simplicity, we also assume that the matrix to be updated lies in $\mathbf{A}$ in each case. This is typically to be expected in practice. Also, if the matrix were not in $\mathbf{A}$, then we could obtain its least-change secant update in $\mathbf{A}$ by first projecting it onto $\mathbf{A}$ in a straightforward way and then applying the appropriate formula below.

For both the symmetry-preserving updates and the inverse updates, it is necessary to assume that a subset of $n$ of the columns of the matrix being updated has some special property. Both for convenience and because it seems most likely to be the case in practice, we assume that the special property resides in the first $n$ columns. We stress that any other subset of $n$ columns will do just as well and that even though an assumption is made about a particular subset of the columns, the updates are derived by applying the least-change principle to the entire matrix.

3.1. Least-change secant updates. We derive below extensions of the first Broyden update, the Powell symmetric Broyden (PSB) update [26], the sparse Broyden (Schubert) update [8], [27], and the Davidon-Fletcher-Powell (DFP) update [9], [14]. At this time we do not have a tractable derivation of an extension of the sparse symmetric update of Marwil [21] and Toint [30], although such an update has been derived by Beattie and Weaver-Smith [2] using other methods.

We suppose that we are given $\bar B \in \mathbf{R}^{n\times\bar n}$, nonzero $\bar s \in \mathbf{R}^{\bar n}$, and $y \in \mathbf{R}^n$. When necessary in the following, we write $\bar B = [B, C]$ for $B \in \mathbf{R}^{n\times n}$ and $C \in \mathbf{R}^{n\times m}$, and $\bar s = (s,t)$ for $s \in \mathbf{R}^n$ and $t \in \mathbf{R}^m$. We note that each update below gives the usual square-matrix update of $B$ when $t = 0$.

The first Broyden update. We take $\mathbf{A} = \mathbf{S} = \mathbf{R}^{n\times\bar n}$ and $\|\cdot\| = \|\cdot\|_F$, the Frobenius norm. We have $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s)) = \mathbf{Q}(y,\bar s)$, and so $\bar B_+ = P_{\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))}\bar B = P_{\mathbf{Q}(y,\bar s)}\bar B$, where the projection is $\|\cdot\|_F$-orthogonal. Orthogonal projection onto an affine subspace is obtained by adding the normal element of the affine subspace to orthogonal projection onto the parallel subspace. It is easily verified that the normal element of $\mathbf{Q}(y,\bar s)$ is $y\bar s^T/\bar s^T\bar s$ and that orthogonal projection onto the parallel subspace $\mathbf{N}(\bar s)$ is given by $P_{\mathbf{N}(\bar s)}\bar M = \bar M[I - \bar s\bar s^T/\bar s^T\bar s]$ for $\bar M \in \mathbf{R}^{n\times\bar n}$. It follows that

(3.1.1)  $\bar B_+ = \bar B\left[I - \frac{\bar s\bar s^T}{\bar s^T\bar s}\right] + \frac{y\bar s^T}{\bar s^T\bar s} = \bar B + \frac{(y - \bar B\bar s)\bar s^T}{\bar s^T\bar s}.$
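In computational terms, (3.1.1) is the usual rank-one Broyden correction applied to the full $n\times\bar n$ matrix; a minimal sketch follows (Python with NumPy; the names are illustrative).

```python
import numpy as np

def broyden1_nonsquare(B_bar, s_bar, y):
    """Extension (3.1.1) of the first Broyden update: the Frobenius-norm
    least change in B_bar (n x (n+m)) satisfying B_bar_plus s_bar = y."""
    r = y - B_bar @ s_bar
    return B_bar + np.outer(r, s_bar) / (s_bar @ s_bar)
```

When the parameter part $t$ of $\bar s = (s,t)$ is zero, only the first $n$ columns change, and the formula reduces to the square-matrix first Broyden update of $B$.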
The Powell symmetric Broyden update. We now take $\mathbf{A} = \mathbf{S} = \{\bar M = [M,N] \in \mathbf{R}^{n\times\bar n} : M = M^T \in \mathbf{R}^{n\times n}\}$ and again take $\|\cdot\| = \|\cdot\|_F$. Instead of characterizing $P_{\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))}$ as above, we work with the expression (2.2). With $\bar A_N = 0$, so that $P_{\mathbf{A}} = P_{\mathbf{S}}$, iterating $P_{\mathbf{S}}P_{\mathbf{Q}(y,\bar s)}$ in (2.2) and summing the resulting geometric series gives

(3.1.2)  $\bar B_+ = P_{\mathbf{S}\cap\mathbf{N}(\bar s)}\bar B + (I - P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^{-1}P_{\mathbf{S}}\left(\frac{y\bar s^T}{\bar s^T\bar s}\right).$

For $\bar M = [M,N] \in \mathbf{R}^{n\times\bar n}$ with $M \in \mathbf{R}^{n\times n}$ and $N \in \mathbf{R}^{n\times m}$, we easily have $P_{\mathbf{S}}\bar M = [(M + M^T)/2, N]$, and $P_{\mathbf{N}(\bar s)}\bar M = \bar M[I - \bar s\bar s^T/\bar s^T\bar s]$ as before. Assuming $\bar B \in \mathbf{A} = \mathbf{S}$, we use these projections to evaluate the right-hand side of (3.1.2). First, we have $P_{\mathbf{S}\cap\mathbf{N}(\bar s)}\bar B = \lim_{k\to\infty}(P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^k\bar B$ (cf. [12, §2]). Setting $\hat y \equiv \bar B\bar s$, a straightforward calculation and an induction argument give

(3.1.3)  $P_{\mathbf{S}\cap\mathbf{N}(\bar s)}\bar B = \bar B - \frac{\hat y\bar s^T + [s\hat y^T, \hat y t^T]}{\bar s^T\bar s + t^Tt} + \frac{s^T\hat y}{(\bar s^T\bar s)(\bar s^T\bar s + t^Tt)}\, s\bar s^T.$

To evaluate the other term in (3.1.2), we note that $(I - P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^{-1} = \sum_{k=0}^\infty (P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^k$ on $(\mathbf{S}\cap\mathbf{N}(\bar s))^\perp$. Computing $(P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^kP_{\mathbf{S}}(y\bar s^T/\bar s^T\bar s)$ by induction ((3.1.4) and (3.1.5)) and summing the resulting series gives

(3.1.6)  $(I - P_{\mathbf{S}}P_{\mathbf{N}(\bar s)})^{-1}P_{\mathbf{S}}\left(\frac{y\bar s^T}{\bar s^T\bar s}\right) = \frac{y\bar s^T + [sy^T, yt^T]}{\bar s^T\bar s + t^Tt} - \frac{s^Ty}{(\bar s^T\bar s)(\bar s^T\bar s + t^Tt)}\, s\bar s^T.$

Combining (3.1.2), (3.1.3), and (3.1.6) gives our extension of the PSB update:

(3.1.7)  $\bar B_+ = \bar B + \frac{(y - \hat y)\bar s^T + [s(y - \hat y)^T, (y - \hat y)t^T]}{\bar s^T\bar s + t^Tt} - \frac{s^T(y - \hat y)}{(\bar s^T\bar s)(\bar s^T\bar s + t^Tt)}\, s\bar s^T, \qquad \hat y \equiv \bar B\bar s.$

The structure of this update may be better revealed if we write $\bar B_+ = [B_+, C_+]$ for $B_+ \in \mathbf{R}^{n\times n}$ and $C_+ \in \mathbf{R}^{n\times m}$ and note that (3.1.7) gives

(3.1.8)
$B_+ = B + \frac{(y - \hat y)s^T + s(y - \hat y)^T}{\bar s^T\bar s + t^Tt} - \frac{s^T(y - \hat y)}{(\bar s^T\bar s)(\bar s^T\bar s + t^Tt)}\, ss^T,$
$C_+ = C + \frac{2(y - \hat y)t^T}{\bar s^T\bar s + t^Tt} - \frac{s^T(y - \hat y)}{(\bar s^T\bar s)(\bar s^T\bar s + t^Tt)}\, st^T.$
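A sketch of (3.1.8) follows; it forms the correction through an intermediate multiplier vector (our own bookkeeping, algebraically equivalent to (3.1.8)), which keeps the first block symmetric and enforces $B_+s + C_+t = y$.

```python
import numpy as np

def psb_nonsquare(B, C, s, t, y):
    """Nonsquare PSB update (3.1.7)-(3.1.8): minimal Frobenius-norm correction
    [E1, E2] with E1 symmetric and E1 s + E2 t = y - Bs - Ct."""
    r = y - B @ s - C @ t
    a, b = s @ s, t @ t                    # so s_bar^T s_bar = a + b
    gamma = (s @ r) / (a + b)
    mu = (2.0 * r - gamma * s) / (a + 2.0 * b)   # denominator = s_bar^T s_bar + t^T t
    E1 = 0.5 * (np.outer(mu, s) + np.outer(s, mu))
    E2 = np.outer(mu, t)
    return B + E1, C + E2
```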
The sparse Broyden update. We now take $\mathbf{A}$ and $\mathbf{S}$ to be the set of matrices in $\mathbf{R}^{n\times\bar n}$ having a particular pattern of sparsity, and we again take $\|\cdot\| = \|\cdot\|_F$. As in the case of the first Broyden update, we characterize $P_{\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))}$. For $i = 1,\ldots,n$, we let $\bar s_i$ be the vector obtained by imposing on $\bar s$ the sparsity pattern of the $i$th row of matrices in $\mathbf{A}$. We note that if $\bar M \in \mathbf{A}$, then $\bar M \in \mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$ if and only if for $i = 1,\ldots,n$, $e_i^T\bar M\bar s = e_i^Ty$ whenever $\bar s_i \ne 0$, where $e_i$ is the $i$th standard unit basis vector in $\mathbf{R}^n$. Also, $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$ is an affine subspace with parallel subspace $\mathbf{A}\cap\mathbf{N}(\bar s) = \mathbf{S}\cap\mathbf{N}(\bar s)$. Then we have for $\bar M \in \mathbf{A}$ that

$P_{\mathbf{S}\cap\mathbf{N}(\bar s)}\bar M = \bar M - \sum_{i=1}^n (\bar s_i^T\bar s_i)^+\, e_ie_i^T\bar M\bar s\,\bar s_i^T$

and that the normal element of $\mathbf{M}(\mathbf{A},\mathbf{Q}(y,\bar s))$ is

$\sum_{i=1}^n (\bar s_i^T\bar s_i)^+\, e_ie_i^Ty\,\bar s_i^T,$

where "$+$" denotes the pseudo-inverse, i.e., for each $i$, $(\bar s_i^T\bar s_i)^+ = (\bar s_i^T\bar s_i)^{-1}$ if $\bar s_i \ne 0$ and $(\bar s_i^T\bar s_i)^+ = 0$ if $\bar s_i = 0$. It follows that if $\bar B \in \mathbf{A}$, then

$\bar B_+ = \bar B + \sum_{i=1}^n (\bar s_i^T\bar s_i)^+\, e_ie_i^T(y - \bar B\bar s)\,\bar s_i^T.$

The Davidon-Fletcher-Powell update. We take $\mathbf{A} = \mathbf{S} = \{\bar M = [M,N] \in \mathbf{R}^{n\times\bar n} : M = M^T \in \mathbf{R}^{n\times n}\}$ as in the case of the PSB update, and we suppose that $\bar B = [B,C] \in \mathbf{A}$. As in the square-matrix case (cf. [10], [12]), we determine an appropriate matrix norm for updating using a scaling matrix $W \in \mathbf{R}^{n\times n}$. We ask the reader to bear with our perhaps curious definitions and assumptions pending the justification given following them. We set $\hat y \equiv y - Ct$ and assume $s^T\hat y > 0$. We let $W \in \mathbf{R}^{n\times n}$ be any positive-definite symmetric matrix such that $Ws = \hat y$, and define a norm $\|\cdot\|_W$ on $\mathbf{R}^{n\times\bar n}$ as follows: For $\bar M = [M,N] \in \mathbf{R}^{n\times\bar n}$, we set

(3.1.9)  $\|\bar M\|_W^2 \equiv \mathrm{tr}\{W^{-1}MW^{-1}M^T + W^{-1}NN^T\}.$

To justify these definitions and assumptions, we note that in the equation-solving context outlined earlier, we presumably want an update of the type sought here in order to reflect positive-definiteness and symmetry of $F_x$. If $\bar x_+ \approx \bar x$ and $C$ is a good approximation of $F_\lambda(\bar x)$, then $\hat y = y - Ct$ is a good approximation of $F_x(\bar x)s$, and so we can expect $s^T\hat y > 0$. The tradition is to think of the $s$, $\hat y$ pair as reflecting a natural scaling determined by $F$ and approximated by $W$. In principle, we might define $\|\cdot\|_W$ in any of several ways which extend the definition used in the square-matrix case. We have chosen to define $\|\cdot\|_W$ by (3.1.9) because doing so yields a final update formula in which $W$ does not appear. As in the square-matrix case, this is crucial; see the discussion in [12, pp. 971-972].

We determine $\bar B_+$ as a least-change secant update of $\bar B$ in $\mathbf{A}$ with respect to $y$, $\bar s$, and $\|\cdot\|_W$. The strategy for doing so is one sometimes used in the square-matrix case (see, e.g., [10]), viz., first to rescale $\bar B$ using $W$, then to update the rescaled matrix using (3.1.7) and (3.1.8), and finally to remove the scale to obtain $\bar B_+$. We let $W = JJ^T$ be any factorization of $W$, and for $\bar B = [B,C]$ and $\bar B_+ = [B_+,C_+]$ we set $[\hat B, \hat C] \equiv [J^{-1}BJ^{-T}, J^{-1}C]$ and $[\hat B_+, \hat C_+] \equiv [J^{-1}B_+J^{-T}, J^{-1}C_+]$. We note that $\bar B_+\bar s = y$ if and only if $[\hat B_+, \hat C_+]\,\bar\sigma = J^{-1}y$, where $\bar\sigma \equiv (J^Ts, t)$. Furthermore, $\|\bar B_+ - \bar B\|_W = \|[\hat B_+, \hat C_+] - [\hat B, \hat C]\|_F$, so $\bar B_+$ is the least-change secant update of $\bar B$ in $\mathbf{A}$ with respect to $y$, $\bar s$, and $\|\cdot\|_W$ if and only if $[\hat B_+, \hat C_+]$ is the least-change secant update of $[\hat B, \hat C]$ in $\mathbf{A}$ with respect to $J^{-1}y$, $\bar\sigma$, and $\|\cdot\|_F$. Determining $[\hat B_+, \hat C_+]$ by (3.1.7) and (3.1.8) and removing the scale gives our extension of the DFP update:

(3.1.10)  $\bar B_+ = \bar B + \frac{(y - \bar B\bar s)\bar{\hat y}^T + [\hat y(y - \bar B\bar s)^T, (y - \bar B\bar s)t^T]}{\bar{\hat y}^T\bar s + t^Tt} - \frac{s^T(y - \bar B\bar s)}{(\bar{\hat y}^T\bar s)(\bar{\hat y}^T\bar s + t^Tt)}\,\hat y\,\bar{\hat y}^T,$

where $\hat y = y - Ct$ and $\bar{\hat y} \equiv (\hat y, t)$. To further clarify the structure of this update, we write $\bar B_+ = [B_+, C_+]$ for $B_+ \in \mathbf{R}^{n\times n}$ and $C_+ \in \mathbf{R}^{n\times m}$ and obtain from (3.1.10)

$B_+ = B + \frac{(y - \bar B\bar s)\hat y^T + \hat y(y - \bar B\bar s)^T}{\bar{\hat y}^T\bar s + t^Tt} - \frac{s^T(y - \bar B\bar s)}{(\bar{\hat y}^T\bar s)(\bar{\hat y}^T\bar s + t^Tt)}\,\hat y\hat y^T,$

$C_+ = C + \frac{2(y - \bar B\bar s)t^T}{\bar{\hat y}^T\bar s + t^Tt} - \frac{s^T(y - \bar B\bar s)}{(\bar{\hat y}^T\bar s)(\bar{\hat y}^T\bar s + t^Tt)}\,\hat y t^T.$
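For concreteness, a sketch of the component form of (3.1.10) follows (the names are illustrative; it assumes $s^T\hat y > 0$ and, as promised, never references $W$).

```python
import numpy as np

def dfp_nonsquare(B, C, s, t, y):
    """Nonsquare DFP update in the component form obtained from (3.1.10).
    y_hat = y - C t plays the role of F_x(x) s; assumes s @ y_hat > 0."""
    y_hat = y - C @ t
    r = y - B @ s - C @ t
    d1 = y_hat @ s + t @ t          # = y_hat_bar^T s_bar
    d2 = d1 + t @ t                 # = y_hat_bar^T s_bar + t^T t
    coef = (s @ r) / (d1 * d2)
    B_plus = (B + (np.outer(r, y_hat) + np.outer(y_hat, r)) / d2
                - coef * np.outer(y_hat, y_hat))
    C_plus = C + 2.0 * np.outer(r, t) / d2 - coef * np.outer(y_hat, t)
    return B_plus, C_plus
```

One can check directly that the correction to $B$ is symmetric and that $B_+s + C_+t = y$, as the secant equation requires.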
3.2. Least-change inverse secant updates. We now give extensions of several square-matrix least-change inverse secant updates, viz., the second Broyden update, a rank-two symmetry-preserving update due to Greenstadt [18], which is the inverse-update analogue of the PSB update, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update. As we remarked in §2, the BFGS update is generally regarded as the most successful update for unconstrained optimization, and the second Broyden update, while generally not as effective as the first, may have some advantages in stiff ODE applications. In the square-matrix case, the Greenstadt update is generally less successful than the PSB update. We include it here both for completeness and because it may prove to have particular advantages on some special problems. It would be straightforward to prescribe an inverse-update analogue of the sparse Broyden update, but the lack of applications of such an update makes this seem of doubtful worth.

We assume that we are given $\bar K \in \mathbf{R}^{n\times\bar n}$, nonzero $\bar s \in \mathbf{R}^{\bar n}$, and $y \in \mathbf{R}^n$. We write $\bar s = (s,t)$ for $s \in \mathbf{R}^n$ and $t \in \mathbf{R}^m$ and set $\bar y = (y,t)$. We also write $\bar K = [K,L]$, where $K \in \mathbf{R}^{n\times n}$ and $L \in \mathbf{R}^{n\times m}$, and denote each updated matrix by $\bar K_+ = [K_+, L_+]$. We note that as before each update below gives the usual square-matrix update of $K$ when $t = 0$. Each of these updates is an inverse-update analogue of a direct update in §3.1 and not only can be derived by analogous reasoning but indeed is obtained just by renaming the ingredients in the appropriate formula in §3.1.

The second Broyden update. As in the case of the first Broyden update, we take $\mathbf{A} = \mathbf{S} = \mathbf{R}^{n\times\bar n}$ and $\|\cdot\| = \|\cdot\|_F$. Then we obtain from (3.1.1) that

(3.2.1)  $\bar K_+ = \bar K + \frac{(s - \bar K\bar y)\bar y^T}{\bar y^T\bar y}.$

The Greenstadt update. As with the PSB and DFP updates, we take $\mathbf{A} = \mathbf{S} = \{\bar M = [M,N] \in \mathbf{R}^{n\times\bar n} : M = M^T \in \mathbf{R}^{n\times n}\}$ and $\|\cdot\| = \|\cdot\|_F$. Then (3.1.7) yields

$\bar K_+ = \bar K + \frac{(s - \bar K\bar y)\bar y^T + [y(s - \bar K\bar y)^T, (s - \bar K\bar y)t^T]}{\bar y^T\bar y + t^Tt} - \frac{y^T(s - \bar K\bar y)}{(\bar y^T\bar y)(\bar y^T\bar y + t^Tt)}\, y\bar y^T,$

or

$K_+ = K + \frac{(s - \bar K\bar y)y^T + y(s - \bar K\bar y)^T}{\bar y^T\bar y + t^Tt} - \frac{y^T(s - \bar K\bar y)}{(\bar y^T\bar y)(\bar y^T\bar y + t^Tt)}\, yy^T,$

$L_+ = L + \frac{2(s - \bar K\bar y)t^T}{\bar y^T\bar y + t^Tt} - \frac{y^T(s - \bar K\bar y)}{(\bar y^T\bar y)(\bar y^T\bar y + t^Tt)}\, yt^T.$

The Broyden-Fletcher-Goldfarb-Shanno update. We again take $\mathbf{A} = \mathbf{S} = \{\bar M = [M,N] \in \mathbf{R}^{n\times\bar n} : M = M^T \in \mathbf{R}^{n\times n}\}$ and define the norm as a weighted Frobenius norm as follows: Set $\hat s \equiv s - Lt \approx F_x(\bar x)^{-1}y$, assume $\hat s^Ty > 0$, and let $W \in \mathbf{R}^{n\times n}$ be any positive-definite symmetric matrix such that $Wy = \hat s$. Then for $\bar M = [M,N] \in \mathbf{R}^{n\times\bar n}$, define $\|\bar M\|_W^2 \equiv \mathrm{tr}\{W^{-1}MW^{-1}M^T + W^{-1}NN^T\}$. From (3.1.10) we have that

(3.2.2)  $\bar K_+ = \bar K + \frac{(s - \bar K\bar y)\bar{\hat s}^T + [\hat s(s - \bar K\bar y)^T, (s - \bar K\bar y)t^T]}{\bar{\hat s}^T\bar y + t^Tt} - \frac{y^T(s - \bar K\bar y)}{(\bar{\hat s}^T\bar y)(\bar{\hat s}^T\bar y + t^Tt)}\,\hat s\,\bar{\hat s}^T,$

where $\bar{\hat s} \equiv (\hat s, t)$, or

(3.2.3)
$K_+ = K + \frac{(s - \bar K\bar y)\hat s^T + \hat s(s - \bar K\bar y)^T}{\bar{\hat s}^T\bar y + t^Tt} - \frac{y^T(s - \bar K\bar y)}{(\bar{\hat s}^T\bar y)(\bar{\hat s}^T\bar y + t^Tt)}\,\hat s\hat s^T,$
$L_+ = L + \frac{2(s - \bar K\bar y)t^T}{\bar{\hat s}^T\bar y + t^Tt} - \frac{y^T(s - \bar K\bar y)}{(\bar{\hat s}^T\bar y)(\bar{\hat s}^T\bar y + t^Tt)}\,\hat s t^T.$

In the square-matrix case, it is often desirable in practice to update approximate Jacobians directly instead of updating their inverses, e.g., in order to maintain and update QR or Cholesky factorizations of approximate Jacobians. Direct update formulas can be obtained in a straightforward way from the expressions above via the Sherman-Morrison-Woodbury formula (see, e.g., [25, p. 50]), and we give such formulas below for the second Broyden update and the BFGS update. Although the direct update formula for the second Broyden update is easy to obtain, deriving the direct formulas for the Greenstadt and BFGS updates is quite tedious. We do not discuss the direct formula for the Greenstadt update here since there appears to be nothing additional to be learned from it and it seems unlikely to be widely applied.

Assuming $K$ and $K_+$ are invertible, we set $\bar B = [B,C]$, where $B = K^{-1}$ and $C = -K^{-1}L$, and $\bar B_+ = [B_+,C_+]$, where $B_+ = K_+^{-1}$ and $C_+ = -K_+^{-1}L_+$. For the second Broyden update, an application of the rank-one Sherman-Morrison-Woodbury formula to $\bar K_+$ given by (3.2.1) yields

$B_+ = B + \frac{(y - \bar B\bar s)y^TB}{y^T\bar B\bar s + t^Tt},$

which gives

$C_+ = C + \frac{(y - \bar B\bar s)(y^TC + t^T)}{y^T\bar B\bar s + t^Tt}.$

For the BFGS update, an application of the rank-two Sherman-Morrison-Woodbury formula to $\bar K_+$ given by (3.2.2) and (3.2.3) yields (after a very long computation)

(3.2.4)  $B_+ = B - [B\hat s - y,\ B\hat s]\,G^{-1}[B\hat s - y,\ B\hat s]^T,$

which gives

(3.2.5)  $C_+ = B_+B^{-1}\left[C - \frac{2(B\hat s - y)t^T}{d_2} + \frac{y^T\hat s - y^TB^{-1}y}{d_1d_2}\,(B\hat s)t^T\right],$

where

(3.2.6)  $G = \begin{pmatrix} \hat s^TB\hat s - \dfrac{(y^T\hat s)^2 + (y^TB^{-1}y)\,t^Tt}{d_1} & \hat s^TB\hat s + 2t^Tt \\[1mm] \hat s^TB\hat s + 2t^Tt & \hat s^TB\hat s \end{pmatrix}, \qquad d_1 = y^T\hat s + t^Tt, \quad d_2 = y^T\hat s + 2t^Tt.$

There are many possible alternatives to these expressions for $B_+$ and $C_+$, but these are as appealing to us from both aesthetic and computational points of view as any others which we have explored. The apparent computational difficulty of applying (3.2.5) is mitigated somewhat by the fact that

(3.2.7)  $B_+B^{-1} = I - [B\hat s - y,\ B\hat s]\,G^{-1}[\hat s - B^{-1}y,\ \hat s]^T,$

and so by using (3.2.7), the application of (3.2.5) involves mainly forming dot products and linear combinations of vectors and, in particular, no excessive system solving. We note, however, that forming $B^{-1}y$ and $\hat s = B^{-1}\bar B\bar s$ requires the solution of two systems involving $B$, and we have not been able to find any alternative to these expressions that does not require this.

An alternative which might be preferred is to maintain $B$ and $L$ instead of $B$ and $C$. For example, iteration (4.1) in the next section is naturally implemented by maintaining $B$ in factored form together with $L$. In any case, $C$ can be recovered from $B$ and $L$ if desired by $C = -BL$, and maintaining $B$ and $L$ offers certain arithmetic advantages. Indeed, from (3.2.3) and (3.2.4) and the identities $K = B^{-1}$ and $s - \bar K\bar y = \hat s - B^{-1}y$, we have

(3.2.8)  $B_+ = B - [B\hat s - y,\ B\hat s]\,G^{-1}[B\hat s - y,\ B\hat s]^T, \qquad L_+ = L + \frac{2(\hat s - B^{-1}y)t^T}{d_2} - \frac{y^T(\hat s - B^{-1}y)}{d_1d_2}\,\hat s t^T,$

where now $\hat s = s - Lt$ is available directly and, as in (3.2.6),

(3.2.9)  $d_1 = y^T\hat s + t^Tt, \qquad d_2 = y^T\hat s + 2t^Tt.$

The implementation of these expressions requires the solution of only one system involving $B$ (that for $B^{-1}y$), and considerably less work is required to generate $L_+$ from $L$ using (3.2.8) and (3.2.9) than is required to generate $C_+$ from $C$ using (3.2.5)-(3.2.7).
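A sketch of (3.2.1) and of its direct form follows (illustrative names; the direct form assumes the denominator $y^T\bar B\bar s + t^Tt$ is nonzero).

```python
import numpy as np

def broyden2_nonsquare(K_bar, s, t, y):
    """Second Broyden update (3.2.1): least change in K_bar = [K, L] with
    K_bar_plus y_bar = s, where y_bar = (y, t)."""
    y_bar = np.concatenate([y, t])
    r = s - K_bar @ y_bar
    return K_bar + np.outer(r, y_bar) / (y_bar @ y_bar)

def broyden2_direct(B, C, s, t, y):
    """Direct form of the same update for B_bar = [B, C], obtained from
    (3.2.1) by the Sherman-Morrison formula."""
    Bs_bar = B @ s + C @ t
    r = y - Bs_bar
    denom = y @ Bs_bar + t @ t
    return (B + np.outer(r, B.T @ y) / denom,
            C + np.outer(r, C.T @ y + t) / denom)
```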
4. A local convergence analysis. In this section we outline a general local convergence analysis for certain iterations for solving parameter-dependent systems which use the general least-change secant and inverse secant updates defined in §2. Our objective is to provide some theoretical support for using these general updates and the specific updates in §3 to augment the motivation provided by the success of least-change secant updates in the square-matrix case.

Iterative methods used to solve parameter-dependent systems vary widely in practice and are usually structured with particular applications or contexts in mind. Here we consider certain paradigm iterations which we feel represent in an essential way a significant class of methods that has not been treated elsewhere. We have in mind methods in which some explicit control is exercised over successive parameter values and, in particular, the way in which parameter values approach final or limiting values. For treatments of other iterative methods applicable to solving parameter-dependent systems, especially the normal flow and augmented Jacobian algorithms, see Walker and Watson [31] and Martinez [20].

Our paradigm iterations and local convergence analysis could have taken several forms, and we have tried to offer a good compromise of generality, presentability, etc., which still serves the primary purpose, viz., to show that if updating is done according to certain principles, then successive approximate Jacobians will approximate the actual Jacobian increasingly well in the directions which count, and desirable convergence properties will follow. The main point of our analysis can be summarized as follows: If the Jacobian has the structure being imposed on updates and if various ancillary hypotheses hold, then locally the iterates produced by the paradigm iterations converge as fast as the parameter convergence will allow, at least up to q-superlinear convergence.

In our analysis, parameter values are explicitly allowed to approach a limiting value without necessarily ever reaching it, and the rate at which this limiting value is approached plays an important role. These aspects of our analysis are essential not only to establishing the main point summarized above but also to meaningfully addressing the basic issue, which is whether anything is to be gained by nonsquare-matrix least-change secant updating. If we were to consider only the case in which parameter values attain some final value after a finite number of steps, then the analysis would provide no basis for distinguishing between doing nonsquare-matrix updating at each step and, say, doing no updating at all until the final parameter value is reached and subsequently doing conventional square-matrix updating; indeed, it follows from the analysis in [12] that the latter leads to q-superlinear local convergence.

We emphasize that the iterations considered here are intended to serve as paradigms. However, it should be possible to modify our analysis without much difficulty to apply to some other iterations which do not fit within the framework given here; see the discussion in §5 after Algorithm 5.1 for an illustration.

Our analysis closely parallels the general local convergence analysis for square-matrix least-change secant update methods given by Dennis and Walker [12]. It is not fully as general as the analysis of [12]: For one thing, it does not include cases in which part of the Jacobian is computed and part is approximated by a matrix maintained by updating. However, it can easily be extended to include such cases; they have not been considered here only to simplify the exposition. For the same reason, our iterations do not include the option of not updating, although it would be trivial to do so.
More seriously, the analysis here does not include cases in which the norm used to define least-change secant updates varies from one iteration to the next (the "iteratively rescaled" cases of [12]). These cases are very important and include the nonsquare extensions of the DFP and BFGS updates given in §3. It might be considered a serious shortcoming of the analysis given here that these cases are not included; however, we note below that there are still good heuristic arguments based on our analysis which support the use of these updates.

Otherwise, we have retained the generality of [12] in our analysis. In particular, we have allowed for cases in which the Jacobian at the solution of interest is not in the set of allowable approximants, in order to show how the speed of convergence is affected in these cases. Since the usual secant equation determined by a difference in $F$-values is unlikely to be appropriate in these cases, we have also allowed for cases in which there is a variety of admissible secant equations determined by a "choice rule." As in [12], we assume a choice rule is given as a function $X$ which for each pair $\bar x, \bar x_+ \in \mathbf{R}^{\bar n}$ determines a set $X(\bar x, \bar x_+) \subseteq \mathbf{R}^n$ of admissible right-hand sides of secant equations. Following [12], we have carried out most of the difficult technical work underlying our analysis in an appendix. The results in the Appendix, while used here only to support the analysis in this section, are rather general and may be of interest in their own right.

We assume below that $\mathbf{A} = \bar A_N + \mathbf{S} \subseteq \mathbf{R}^{n\times\bar n}$ is a given affine subspace which reflects structure to be imposed on approximate Jacobians and that $\|\cdot\|$ is a given inner-product norm on $\mathbf{R}^{n\times\bar n}$. As before, we denote all vector norms and their induced matrix norms by $|\cdot|$, and here we assume that particular norms $|\cdot|$ are given on $\mathbf{R}^n$ and $\mathbf{R}^m$. We also consider various norms on $\mathbf{R}^{\bar n}$ which are specified in each instance using the norms on $\mathbf{R}^n$ and $\mathbf{R}^m$. We consider a nonlinear system (1.2) and assume that a particular $\bar x_* = (x_*, \lambda_*)$ is sought satisfying $F(\bar x_*) = 0$. Throughout the sequel we assume the following hypothesis.

BASIC HYPOTHESIS. Let $F$ be differentiable in an open convex neighborhood $\bar\Omega$ of $\bar x_* = (x_*, \lambda_*) \in \mathbf{R}^{\bar n}$ for which $F(\bar x_*) = 0$, and suppose that $\gamma > 0$ and $p \in (0,1]$ are such that for $\bar x \in \bar\Omega$,

$|F'(\bar x) - F'(\bar x_*)| \le \gamma\,|\bar x - \bar x_*|^p,$

where $|\bar x| \equiv \max\{|x|, |\lambda|\}$ for $\bar x = (x,\lambda) \in \mathbf{R}^{\bar n}$.

We assume that $\{\lambda_k\}_{k=0,1,\ldots}$ is a sequence prescribed in some way which converges to $\lambda_*$, and we first consider an iteration which begins with some $\bar x_0 = (x_0, \lambda_{k_0})$ and $\bar B_0 = [B_0, C_0]$ and determines

(4.1)
$x_{k+1} = x_k - B_k^{-1}\{F(\bar x_k) + C_k(\lambda_{k_0+k+1} - \lambda_{k_0+k})\}, \qquad \bar x_{k+1} = (x_{k+1}, \lambda_{k_0+k+1}),$
$y_k \in X(\bar x_k, \bar x_{k+1}), \qquad \bar B_{k+1} = [B_{k+1}, C_{k+1}] = (\bar B_k)_+$

for $k = 0,1,\ldots$, where $(\bar B_k)_+$ is the least-change secant update of $\bar B_k$ in $\mathbf{A}$ with respect to $y_k$, $\bar s_k = \bar x_{k+1} - \bar x_k$, and the norm $\|\cdot\|$. An analogous iteration which uses least-change inverse secant updates is given by (4.11) below. Iteration (4.1) is modeled after an iteration derived in the manner of Newton's method in which

(4.2)  $x_{k+1} = x_k - F_x(\bar x_k)^{-1}\{F(\bar x_k) + F_\lambda(\bar x_k)(\lambda_{k_0+k+1} - \lambda_{k_0+k})\}.$

Note that (4.2) reverts to the Newton iteration in $x$ if $\lambda_{k_0+k+1} = \lambda_{k_0+k}$. The index $k_0$ is used in (4.1) mainly for convenience in our local convergence analysis; we elaborate further on the use of $k_0$ in the remark following Theorem 4.1 below.
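The shape of iteration (4.1) may be clearer in code. The sketch below uses the traditional choice $y_k = F(\bar x_{k+1}) - F(\bar x_k)$ and, purely for illustration, the first Broyden update (3.1.1) as the least-change update; choice rules, safeguards, and stopping tests are omitted, and all names are our own.

```python
import numpy as np

def iteration_4_1(F, x0, lams, B, C):
    """Paradigm iteration (4.1): x_{k+1} = x_k - B_k^{-1} {F(x_k, lam_k)
    + C_k (lam_{k+1} - lam_k)}, followed by a least-change secant update
    of B_bar_k = [B_k, C_k] (here: first Broyden, (3.1.1))."""
    x = x0
    n = len(x0)
    for k in range(len(lams) - 1):
        lam, lam_next = lams[k], lams[k + 1]
        dlam = lam_next - lam
        x_next = x - np.linalg.solve(B, F(x, lam) + C @ dlam)
        # Secant data: s_bar = (x_{k+1} - x_k, dlam), y = F-difference.
        y = F(x_next, lam_next) - F(x, lam)
        s_bar = np.concatenate([x_next - x, dlam])
        r = y - B @ (x_next - x) - C @ dlam
        E = np.outer(r, s_bar) / (s_bar @ s_bar)
        B, C = B + E[:, :n], C + E[:, n:]
        x = x_next
    return x, B, C
```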
We are not concerned here with exactly how the $\lambda_k$'s are prescribed. They may be completely specified in some way prior to beginning the iterations. However, in some iterations used in practice, they are determined as the iterations proceed, rather than at the outset, and it happens in some cases that they often remain constant through a range of $k$-values before changing in some specified way. For a clarifying example, see Algorithm 5.1 and the related discussion in §5.

The assumption in (4.2) that $F_x$ is invertible might seem troubling to those concerned with homotopy methods, since it suggests an assumption that a particular set of $n$ columns of $F'$ is of rank $n$. However, if $F'(\bar x_*)$ itself is of rank $n$, then the variables can be relabeled if necessary so that $F_x$ is invertible near $\bar x_*$. We point out that with this understanding, certain corrector iterations used in predictor-corrector homotopy methods, such as the augmented Jacobian algorithm, the normal flow algorithm, and variations of these which use least-change secant and inverse secant updates (see Watson, Billups, and Morgan [32] and Walker and Watson [31]), fall within the framework of iterations (4.1), (4.2), and (4.11) below. However, our analysis below fails to apply to those iterations because the main results are conditioned on the rate of q-convergence of the $\lambda_k$'s to $\lambda_*$, and nothing can be said about this q-convergence for those iterations. See Walker and Watson [31] for a local q-linear and q-superlinear convergence analysis which applies to those iterations.

Our local convergence results for iteration (4.1) are given in Theorems 4.1 and 4.2 below, which are analogues of Theorems 3.1 and 3.3 of [12]. Theorem 4.1 addresses the local linear convergence of the iteration; Theorem 4.2 draws more refined conclusions about the asymptotic speed of convergence. In Theorem 4.1, we have augmented the local q-linear convergence result which the reader might expect with a statement about the local r-linear convergence of the iteration sequence in the case in which $\lambda_k \to \lambda_*$ r-linearly. This has been done for two reasons: First, r-linear convergence of $\{\lambda_k\}$ to $\lambda_*$ is all that can be expected in many iterations used in practice. Second, the reader might otherwise accuse us of offering only a disguised version of the theory of [12] in cases in which $\lambda_k \to \lambda_*$ q-linearly, but only because $\lambda_k = \lambda_*$ for large $k$.

Our notation and terminology used in association with q-linear and q-superlinear convergence is that of Ortega and Rheinboldt [25, p. 281]: If $\{z_k\}_{k=0,1,\ldots}$ is a sequence converging to $z_*$ in a space with norm $|\cdot|$, then $Q_1\{z_k\}$, the linear q-factor of $\{z_k\}_{k=0,1,\ldots}$, is defined as

$Q_1\{z_k\} = \begin{cases} 0 & \text{if } z_k = z_* \text{ for all } k \ge \text{some } k_0,\\ \limsup_{k\to\infty} |z_{k+1} - z_*|/|z_k - z_*| & \text{if } z_k \ne z_* \text{ for all } k \ge \text{some } k_0,\\ +\infty & \text{otherwise}. \end{cases}$

We say that $\{z_k\}_{k=0,1,\ldots}$ converges q-linearly to $z_*$ in the norm $|\cdot|$ if and only if $Q_1\{z_k\} < 1$, and that $\{z_k\}_{k=0,1,\ldots}$ converges q-superlinearly to $z_*$ if and only if $Q_1\{z_k\} = 0$. Note that q-superlinear convergence holds in one norm if and only if it holds in every other norm. Also, in saying that a sequence of matrices is uniformly small or uniformly bounded, we mean that in any matrix norm the corresponding sequence of norm values is uniformly small or uniformly bounded.

THEOREM 4.1. Let $F$ satisfy the basic hypothesis. Let $\mathbf{A}$ and $\bar B_* = [B_*, C_*] \equiv P_{\mathbf{A}}(F'(\bar x_*))$ be such that $B_* \in \mathbf{R}^{n\times n}$ is invertible and there exists an $r_x$ for which $|I - B_*^{-1}F_x(\bar x_*)| \le r_x < 1$. Assume that $X$ has the property with $\mathbf{A}$ that there exists an $\alpha \ge 0$ such that for any $\bar x, \bar x_+ \in \bar\Omega$ and any $y \in X(\bar x, \bar x_+)$, we have

(4.3)  $\|P^\perp_{\mathbf{S}\cap\mathbf{N}(\bar s)}(\bar M - \bar B_*)\| \le \alpha\,\sigma(\bar x, \bar x_+)^p$
be such that one of the ,, following holds: r, k O, 1,..., for some and r (0, 1); ]Ak q-linearly with Q{Ak} r e [0, 1). Ak Then for any r such that max {rx, rn} < r < 1, there exist positive constants and an integer kr such that if ]xo- x, < er, < 5r, and ko kr, then and converges to 2, with iteration (4.1) is well defined for k O, 1, (4.6) 2 2, rker, k O, 1,..., if (4.4) holds, where 2] max {x, } for (x,) R; g r]k k O, 1,..., if (4.5) holds, where (4.7) ]k+ (x, ) e Rn and for a suitable constant F. for (4.4) (4.5) ..., o- , , , ,, Furthermore, { k } k=O,,.., and { B; B, } k=o,i,.., are uniformly small. Remark. Our reasons for using the index k0 in this analysis are somewhat subtle, and we will outline them in the context of iteration (4.1) and Theorem 4.1. Similar remarks are valid in the context of iteration (4.11) and Theorem 4.3 below. The bulk of the technical work underlying Theorem 4.1 is contained in Theorem A.2 in the Appendix, an examination of the proof of which shows that the A’s are required, at least eventually, to satisfy one of the following: small for each k; with (i) an r-linear convergence inequality ]Ak ,[ g a q-linear converge..ce inequality (ii) r"- with ksmall for each k, where r" is near but strictly greater than QI{Ak} (which may be +- , ,r, r,, zero). Since these conditions hold for sufficiently large k under (4.4) and (4.5), respectively, our use of k0 in conjunction with (4.4) and (4.5) is really an economical way of imposing , these conditions. Proof. Let N Rn and N2 Rnxn be neighborhoods of and B,, respectively, N fl and the first n columns of any matrix in N2 constitute a nonsingular matrix. Set N N x N x N2 and define an update function U on N (see the such that Appendix) by .u e x(z,z+)}, v(z,z+, where + is the least-change secant update of with respect to X(, +), and the norm [[. ][. It follows immediately from (4.3) and (2.4) that U satisfies of 0 and 2 the bounded deterioration hypothesis of the Appendix with al (4.3), and the theorem follows from Theorem A.2. THEOREM 4.2. Suppose that the hypotheses of Theorem 4.1 hold and that for some o and Bo, {k}k=0,,... is a sequence generated by (4.1) which converges q-linearly to , in some norm with gk k+ k # 0 for all but finitely many k. Then lim [I- B,-F(,)](x x,) (x+ x,) (a.s) for B,-C,(X0++ ,) / + B, IV, on Rn and Rn. In paicular, ff rz any norms Qi { Ak } then the following hold: (4.9) for any e > 0, lik [k+ x] + F]A] for , /lz, z,l 0 ]I- B,- Fz(,)] and rx max{rz, r + e}, where ] ,[/]k n constant F depending on suitable a and (x, ) e R for LEAST-CHANGE SECANT UPDATES OF NONSQUARE MATRICES (4.10) if r O, then 2k --* 2, q-superlinearly if and only if the 1279 norm-independent condition holds. - In particular, 2k 2, q-superlinearly if Ak A, q-superlinearly and F1(2,) E A. Proof. From Theorem A.3, (4.8) holds if and only if limk_ I(/k --/,)1/11 0, and this latter condition can be shown to hold by a very minor modification of the proof of Theorem 3.3 of [12]. Proposition A.4 gives (4,9) and (4.10). Theorems 4.1 and 4.2 have immediate consequences for an iteration (4.1) which uses the direct, fixed-scale least-change secant updates derived in 3. There is extensive discussion of the square-matrix analogue of condition (4.3) in [12, pp. 962964], and this discussion is valid with minor appropriate changes in the present context. 
It follows in general from this discussion that if $F' \in \mathbf{A}$ near $\bar x_*$, then not only does $\bar B_* = F'(\bar x_*)$ but also condition (4.3) is satisfied near $\bar x_*$ for some $\alpha$ when $X(\bar x, \bar x_+) = \{F(\bar x_+) - F(\bar x)\}$, the traditional choice, which gives $y_k = F(\bar x_{k+1}) - F(\bar x_k)$ in iteration (4.1). It follows in particular from Theorems 4.1 and 4.2 that if $\lambda_k \to \lambda_*$ q-superlinearly and if $F'(\bar x)$ has the structure imposed on updates for $\bar x$ near $\bar x_*$ in each case in §3, then iteration (4.1) enjoys local q-superlinear convergence when the update is the nonsquare extension of any of the following: the first Broyden update, the Powell symmetric Broyden update, or the sparse Broyden update.

We now establish counterparts of Theorems 4.1 and 4.2 for an analogue of iteration (4.1) which uses least-change inverse secant updates. We continue assuming that $\{\lambda_k\}_{k=0,1,\ldots}$ is a prescribed sequence which converges to $\lambda_*$ and consider an iteration which begins with some $\bar x_0 = (x_0, \lambda_{k_0})$ and $\bar K_0 = [K_0, L_0]$ and determines

(4.11)
$x_{k+1} = x_k - K_kF(\bar x_k) + L_k(\lambda_{k_0+k+1} - \lambda_{k_0+k}), \qquad \bar x_{k+1} = (x_{k+1}, \lambda_{k_0+k+1}),$
$y_k \in X(\bar x_k, \bar x_{k+1}), \qquad \bar K_{k+1} = [K_{k+1}, L_{k+1}] = (\bar K_k)_+$

for $k = 0,1,\ldots$, where $(\bar K_k)_+$ is the least-change inverse secant update of $\bar K_k$ in $\mathbf{A}$ with respect to $s_k$, $\bar y_k = (y_k, \lambda_{k_0+k+1} - \lambda_{k_0+k})$, and the norm $\|\cdot\|$, with $\bar s_k = \bar x_{k+1} - \bar x_k = (s_k, t_k)$. Theorems 4.3 and 4.4 below are analogous to Theorems 5.1 and 5.2 of [12]. Since we are not treating a "computed part" of approximate Jacobians, there is no compelling need to consider a choice rule for determining a vector $w_k$ as well as $y_k$ at each iteration as in [12, §5]. Here we let $s_k = x_{k+1} - x_k$ play the role of $w_k$ in [12, §5] in determining inverse secant equations. The proofs of Theorems 4.3 and 4.4 are minor modifications of those of Theorems 4.1 and 4.2, and so we omit them.

THEOREM 4.3. Let $F$ satisfy the basic hypothesis and assume that $F_x(\bar x_*)$ is invertible. Let $\mathbf{A}$ and $\bar K_* = [K_*, L_*] \equiv P_{\mathbf{A}}([F_x(\bar x_*)^{-1}, -F_x(\bar x_*)^{-1}F_\lambda(\bar x_*)])$ be such that there exists an $r_x$ for which $|I - K_*F_x(\bar x_*)| \le r_x < 1$. Assume that $X$ has the property with $\mathbf{A}$ that there exists an $\alpha \ge 0$ such that for any $\bar x, \bar x_+ \in \bar\Omega$ and any $y \in X(\bar x, \bar x_+)$, we have

$\|P^\perp_{\mathbf{S}\cap\mathbf{N}(\bar y)}(\bar M - \bar K_*)\| \le \alpha\,\sigma(\bar x, \bar x_+)^p$

for every $\bar M \in \mathbf{M}(\mathbf{A},\mathbf{Q}(s,\bar y))$, where $\bar y = (y, \lambda_+ - \lambda)$ and $\sigma(\bar x, \bar x_+) = \max\{|\bar x - \bar x_*|, |\bar x_+ - \bar x_*|\}$ for $|\bar x| = \max\{|x|, |\lambda|\}$, $\bar x = (x,\lambda) \in \mathbf{R}^{\bar n}$. Let $\{\lambda_k\}_{k=0,1,\ldots}$ be such that one of (4.4) or (4.5) holds. Then for any $r$ such that $\max\{r_x, r_\lambda\} < r < 1$, there exist positive constants $\epsilon_r$, $\delta_r$ and an integer $k_r$ such that if $|x_0 - x_*| \le \epsilon_r$, $\|\bar K_0 - \bar K_*\| \le \delta_r$, and $k_0 \ge k_r$, then iteration (4.11) is well defined for $k = 0,1,\ldots$ and converges to $\bar x_*$ according to either (4.6), if (4.4) holds, or (4.7), if (4.5) holds. Furthermore, $\{\bar K_k - \bar K_*\}_{k=0,1,\ldots}$ and $\{K_k^{-1} - K_*^{-1}\}_{k=0,1,\ldots}$ are uniformly small.

THEOREM 4.4. Suppose that the hypotheses of Theorem 4.3 hold and that for some $\bar x_0$ and $\bar K_0$, $\{\bar x_k\}_{k=0,1,\ldots}$ is a sequence generated by (4.11) which converges q-linearly to $\bar x_*$ in some norm with $\bar s_k = \bar x_{k+1} - \bar x_k \ne 0$ for all but finitely many $k$. Suppose further that $\{\bar K_k\}_{k=0,1,\ldots}$ and $\{K_k^{-1}\}_{k=0,1,\ldots}$ are uniformly bounded and that $\{y_k\}_{k=0,1,\ldots}$ satisfies $|\bar K_*\bar y_k - s_k| \le \alpha_k|\bar s_k|$ for each $k$, where $\bar y_k = (y_k, \lambda_{k_0+k+1} - \lambda_{k_0+k})$, $s_k = x_{k+1} - x_k$, and $\lim_{k\to\infty} \alpha_k = 0$. Then

$\lim_{k\to\infty} \frac{\left|[I - K_*F_x(\bar x_*)](x_k - x_*) - (x_{k+1} - x_*) + L_*(\lambda_{k_0+k+1} - \lambda_*) - [L_* + K_*F_\lambda(\bar x_*)](\lambda_{k_0+k} - \lambda_*)\right|}{|\bar x_k - \bar x_*|} = 0$

for any norms on $\mathbf{R}^n$ and $\mathbf{R}^m$. In particular, if $r_x = |I - K_*F_x(\bar x_*)|$ and $r_\lambda = Q_1\{\lambda_k\}$, then (4.9) and the following hold: if $r_\lambda = 0$, then $\bar x_k \to \bar x_*$ q-superlinearly if and only if the norm-independent condition

$\lim_{k\to\infty} |[I - K_*F_x(\bar x_*)](x_k - x_*)|/|\bar x_k - \bar x_*| = 0$

holds. In particular, $\bar x_k \to \bar x_*$ q-superlinearly if $\lambda_k \to \lambda_*$ q-superlinearly and $[F_x(\bar x_*)^{-1}, -F_x(\bar x_*)^{-1}F_\lambda(\bar x_*)] \in \mathbf{A}$.

The discussion following Theorem 4.2 remains valid with a few appropriate changes.
In particular, Theorems 4.3 and 4.4 imply that if $\lambda_k \to \lambda_*$ q-superlinearly, then iteration (4.11) is locally and q-superlinearly convergent when the update is the nonsquare extension of the second Broyden update or, if $F_x(\bar x)$ is symmetric for $\bar x$ near $\bar x_*$, the Greenstadt update.

As we remarked at the beginning of this section, the above analysis does not apply to cases in which the norm used to determine least-change secant updates varies from iteration to iteration. In particular, it does not apply to iterations which use the nonsquare extensions of the DFP and BFGS updates given in §3. The counterparts of these iterations in the square-matrix case are the iteratively rescaled least-change secant update methods considered in [12], which include the usual DFP and BFGS methods, and a full local convergence analysis for these methods is given in [12]. However, this local convergence analysis depends critically on Lemma 4.1, p. 969, of that paper, and we cannot adapt that lemma to apply to the present circumstances. To indicate why, we note that in the setting of the extension of the DFP update in §3, the roles of $W_*$ and $v$ in Lemma 4.1 are played by $F_x(\bar x_*)$ and $\hat y = y - Ct$, respectively, where $C$ is part of a given approximate Jacobian $\bar B = [B,C]$, $\bar s = (s,t) = \bar x_+ - \bar x$, and $y = F(\bar x_+) - F(\bar x)$ for given $\bar x$ and $\bar x_+$. Since $\hat y$ depends on $C$ and $C \ne F_\lambda(\bar x_*)$ in general, we cannot expect an inequality $|\hat y - F_x(\bar x_*)s|_2 \le \alpha\,\sigma(\bar x, \bar x_+)^p|s|_2$ to hold for all $\bar x$ and $\bar x_+$, where $|\cdot|_2$ is the Euclidean norm. This is to say that we cannot expect an inequality of the form (ii) of the hypotheses of Lemma 4.1 to hold in the context of our extension of the DFP update.

We note, however, that we can construct a local convergence analysis along the lines of that given above which, while not applicable to iterations using our extensions of the DFP and BFGS updates, is applicable to iterations using certain "nearby" updates. These "nearby" updates may be costly or even impractical in some applications. However, we feel that they are worth noting because they use more computed Jacobian information and therefore may result in more effective Jacobian approximations in some cases. A local convergence analysis for iterations using these "nearby" updates is of value in its own right; furthermore, we feel that it provides some heuristic support for using our extensions of the DFP and BFGS updates.

We indicate what these "nearby" updates are and how a local convergence analysis for iterations using them can be developed in the DFP context: The "nearby" update is obtained from our extension of the DFP update by taking $y = F(\bar x_+) - F(\bar x)$ for given $\bar x$ and $\bar x_+$ and choosing $\hat y = y - F_\lambda(\bar x_*)t$, $\hat y = y - F_\lambda(\bar x)t$, or $\hat y = y - F_\lambda(\bar x_+)t$ instead of $\hat y = y - Ct$. The first choice is usually impractical, but not always; see, e.g., the homotopy map (5.4) in §5, in which $F_\lambda$ is constant. The second two choices can be obtained through either analytic evaluation or finite differences; either way will require function-evaluation work which may be costly in some cases. Note that the evaluation of $F_\lambda(\bar x)t$ or $F_\lambda(\bar x_+)t$ can be done through finite differences of $F$ in the $t$-direction and, hence, does not require the evaluation of all of $F_\lambda$. (If it is reasonable to evaluate all of $F_\lambda$, then we can approximate $F_x$ via traditional square-matrix DFP or BFGS updating in a straightforward way, and this would probably be more effective than using the "nearby" updates discussed here.) For any of these choices of $\hat y$, it can easily be shown that an inequality $|\hat y - F_x(\bar x_*)s|_2 \le \alpha\,\sigma(\bar x, \bar x_+)^p|s|_2$ holds, and Lemma 4.1 of [12] can be readily adapted to apply to the present situation with $W_* = F_x(\bar x_*)$ and $v = \hat y$. From this point, it is straightforward to adapt the arguments in the proofs of Theorems 4.1 and 4.2 above together with those in the proofs of Theorems 4.2 and 4.3 of [12] to obtain local convergence results analogous to those above for an iteration (4.1) in which $X(\bar x, \bar x_+) = \{F(\bar x_+) - F(\bar x)\}$ and the update is this "nearby" update, i.e., our extension of the DFP update with $\hat y = y - Ct$ replaced by one of the choices $\hat y = y - F_\lambda(\bar x_*)t$, $\hat y = y - F_\lambda(\bar x)t$, or $\hat y = y - F_\lambda(\bar x_+)t$.
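The directional finite differences mentioned above might be formed as in the following sketch (illustrative names and step size; the forward-difference choice is our assumption).

```python
import numpy as np

def yhat_nearby(F, x, lam, x_plus, lam_plus, eps=1e-6):
    """Form y = F(x+, lam+) - F(x, lam) and the 'nearby' choices
    y_hat = y - F_lam(x, lam) t and y_hat = y - F_lam(x+, lam+) t,
    with F_lam(.) t approximated by a finite difference of F in the
    t-direction, so that all of F_lam is never needed."""
    t = lam_plus - lam
    y = F(x_plus, lam_plus) - F(x, lam)
    Flam_t = (F(x, lam + eps * t) - F(x, lam)) / eps
    Flam_plus_t = (F(x_plus, lam_plus + eps * t) - F(x_plus, lam_plus)) / eps
    return y - Flam_t, y - Flam_plus_t
```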
For any of these choices of 9, it can easily be shown that an inequality 19- Fx(,)812 ga(5:,+)P1812 holds, and Lemma 4.1 of [12] can be readily adapted to apply to the present situation with W, Fz(,) and v ). From this point, it is straightforward to adapt the arguments in the proofs of Theorems 4.1 and 4.2 above together with those in the proofs of Theorems 4.2 and 4.3 of [12] to obtain local convergence results analogous to those above for an iteration (4.1) in which X(,+) {F(+)- F()} and the update is this "nearby" update, y Ct replaced by one of the choices i.e., our extension of the DFP update with t y- F (,)t, t y- Fx()t, or y- Fx (+)t. - 5. Numerical experiments. We report on some simple numerical experiments that are intended to give some insight into the performance of the updates discussed in this paper, especially in the context of iterations of the type considered in 4. We stress that the experimental results given here are by no means intended to provide a complete or thorough study of iterative methods which use these updates, but rather are intended to complement experimental results and other work discussed elsewhere. For perspective, we briefly review other work involving applications of the updates given here. As noted in the introduction, the first Broyden update (3.1.1) has been used in an algorithm of Georg [16] and in the augmented Jacobian matrix algorithm in the homotopy method code HOMPACK [32]. Both of these are predictor-corrector path-following methods with predictor steps taken in current (approximate) tangent directions. In Georg’s algorithm, corrector iterations are of the normal flow type, while the HOMPACK algorithm uses corrector iterations determined by certain augmented Jacobian matrices. (See Walker and Watson [31] as well as [16] and [32] for a 1282 SAMIH K. BOUI:tJI AND HOMEI:t F. WALKER discussion of these iterations.) In Georg’s algorithm, first Broyden updating is done on both predictor steps and corrector iterations. In the augmented Jacobian matrix algorithm in HOMPACK, first Broyden updating is done only on the corrector iterations; exact Jacobians are obtained and used at each predictor step. The update used in HOMPACK is described in [32] as a square-matrix first Broyden update of a certain augmented Jacobian matrix, but it is essentially the same as the nonsquare-matrix update (3.1.1). Martinez [20] augments his local r-linear and r-superlinear local convergence analysis for very general Newton-like methods for underdetermined systems with a discussion of applications of these methods to the problem of finding an interior point of a polytope, which arises in connection with interior point methods for linear programming. In experiments reported in [20], a nonlinear transformation was used to rephrase this problem as one involving an underdetermined nonlinear system, and normal flow algorithms with both exact Jacobians and approximate Jacobians maintained by Broyden updating, together with other related algorithms, were applied to this problem. The Broyden updating was apparently either first Broyden updating or sparse Broyden updating as given in 3.1 here, although the exact nature of the updates used is not made clear in [20]. Walker and Watson [31], in addition to giving a local q-linear and q-superlinear local convergence analysis for general normal flow and augmented Jacobian algorithms which use the updates given in this paper, discuss two sets of numerical experiments involving these algorithms. 
In the first set, normal flow iterations using the first and second Broyden updates, (3.1.1) and (3.2.1), respectively, were applied to simple two-variable problems; in the second set, the normal flow and augmented Jacobian matrix algorithms in HOMPACK, together with variations which use first and second Broyden updating, were applied to a geometric modeling problem described by Morgan [24].

Other experiments involving use of the updates introduced here in modifications of algorithms in HOMPACK are reported by Bourji [3]. In these experiments, all of the specific updates introduced in §3 except the sparse Broyden and Greenstadt updates were implemented in modifications of the HOMPACK augmented Jacobian matrix algorithm, and the resulting algorithms were tested on five problems obtained by parametrizing problems from the test set of Moré, Garbow, and Hillstrom [23] with simple homotopy maps. The first and second Broyden updates, (3.1.1) and (3.2.1), respectively, and the PSB update (3.1.7) were tested on all five problems. In the first problem, no subset of $n$ columns of the Jacobian constituted a symmetric matrix; however, the PSB update was still tested on this problem because the $(n-1)\times(n-1)$ principal submatrix of the Jacobian was symmetric. The second through fifth problems amounted to parametrized nonlinear least-squares problems, and the DFP and BFGS updates, (3.1.10) and (3.2.2), respectively, as well as the first and second Broyden and PSB updates were tested on these. We refer the reader to [3] for more details of these experiments and summarize the results here.

As measured by numbers of function and Jacobian evaluations, the performances and rankings of the algorithms using the different updates varied considerably from problem to problem. However, the algorithm using the first Broyden update clearly performed best overall; only in one of fifteen trials did another algorithm, the one using the PSB update, take fewer function or Jacobian evaluations. The algorithm which performed second best overall was the one using the PSB update, and the algorithms using the other updates often, but not in every case, performed considerably worse than those using the first Broyden and PSB updates. The algorithm using the DFP update was perhaps third best and outperformed the PSB-update algorithm in one trial. Each algorithm failed in at least one instance, and there was one trial (involving the Rosenbrock function) in which only the algorithms using the second Broyden, DFP, and BFGS updates succeeded and another trial (involving the extended Rosenbrock function) in which only the algorithms using the DFP and BFGS updates succeeded. However, there were two trials in which only the BFGS-update algorithm failed and three trials in which only the DFP-update algorithm failed.

In evaluating these results, it should be kept in mind that there are many things going on in the sophisticated homotopy method codes in HOMPACK, e.g., procedures for selecting stepsizes, determining when to reevaluate Jacobians, etc. The first Broyden update, which performed most successfully in these trials, is the update used in the unmodified augmented Jacobian matrix algorithm in HOMPACK and thus is the one for which the code is "tuned." In view of this, we feel that it would be premature to dismiss the other updates on the basis of these experiments.
As a complement to the above work, we give here the results of experiments involving the application of a simple path-following algorithm to a realistic problem. The object is to explore in some depth the performance of one of the updates given here--the first Broyden update (3.1.1)--in comparison to that of various alternatives involving numerically evaluated Jacobians and traditional square-matrix first Broyden updating.

The problem of interest is the elastica problem described by Watson and Wang [33]. In this problem, a large planar deflection of a thin rod, or elastica, subject to terminal loads x(1), x(2) and moment x(3) is modeled by

EI dθ/ds = x(2)ξ − x(1)η + x(3),

where EI is the flexural rigidity, θ is the local angle of inclination at the point (ξ, η), and s is the arc length along the elastica. For simplicity, we take EI = 1 and assume that the elastica is of unit length with the left endpoint clamped horizontally at the origin. This gives the initial-value problem

(5.1)  dθ/ds = x(2)ξ − x(1)η + x(3),  dξ/ds = cos θ,  dη/ds = sin θ,  θ(0) = ξ(0) = η(0) = 0.

We denote the solution by ξ(s, x), η(s, x), θ(s, x), where x = (x(1), x(2), x(3))ᵀ ∈ R³. The problem is to determine x_* so that the right endpoint of the elastica has a given location and angle of inclination, i.e., so that x = x_* satisfies

(5.2)  f(x) ≡ (ξ(1, x), η(1, x), θ(1, x))ᵀ = a

for a specified vector a giving the right endpoint location and angle of inclination. Because of the sensitivity of the elastica to end conditions, especially for more complicated shapes which require large forces and torques to achieve, solving (5.2) can be quite challenging for globalized Newton-like methods such as those in MINPACK [22]; see [33]. Homotopy methods seem to fare better. Our strategy here is to choose a homotopy map F(x, λ) such that F(x, 0) = 0 is easy to solve and F(x, 1) = f(x) − a, and then to track the zero curve of F as λ goes from 0 to 1. For a given F, the simple algorithm for following this curve which we used in our tests is the following.

ALGORITHM 5.1. Given x₀ such that F(x₀, 0) = 0, choose nstep, the number of λ-increments between 0 and 1, and set h = 1/nstep, x = x₀, and λ = 0. For i = 1, ..., nstep, do:
1. Predict: Given (x, λ) and B̂ = [B, C] ≈ F′(x, λ), overwrite x ← x − B⁻¹{F(x, λ) + Ch} and λ ← λ + h.
2. Correct: Given ε > 0 and initial (x, λ) and B ≈ F_x(x, λ), overwrite x ← x − B⁻¹F(x, λ) until |F(x, λ)| ≤ ε.

What is unspecified in Algorithm 5.1 is the manner in which successive B̂'s and B's are determined; one way of realizing the algorithm in code is sketched below.
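To make the structure of the computation concrete, the following sketch shows one way to implement the evaluation of f in (5.2) and the loop of Algorithm 5.1. This is only a minimal illustration under stated assumptions, not the implementation used in our experiments: the experiments reported below were run in Fortran with subroutine RKF45 [28], whereas the sketch uses Python with an off-the-shelf Runge-Kutta integrator and a simple forward-difference Jacobian, and the names elastica_f, fd_jacobian, and follow_path are introduced here purely for illustration. As written, the corrector evaluates B anew at every iteration (Strategy 6 in the list below); the updating strategies replace these Jacobian evaluations by secant updates.

import numpy as np
from scipy.integrate import solve_ivp

def elastica_f(x):
    # f(x) = (xi(1,x), eta(1,x), theta(1,x))^T obtained by integrating
    # the initial-value problem (5.1) with EI = 1.
    def rhs(s, u):
        theta, xi, eta = u
        return [x[1] * xi - x[0] * eta + x[2], np.cos(theta), np.sin(theta)]
    sol = solve_ivp(rhs, (0.0, 1.0), [0.0, 0.0, 0.0], rtol=1e-10, atol=1e-12)
    theta1, xi1, eta1 = sol.y[:, -1]
    return np.array([xi1, eta1, theta1])

def fd_jacobian(F, x, lam, h=1e-7):
    # Forward-difference approximation to F'(x, lam) = [F_x, F_lam].
    F0 = F(x, lam)
    J = np.empty((F0.size, x.size + 1))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp, lam) - F0) / h
    J[:, -1] = (F(x, lam + h) - F0) / h
    return J

def follow_path(F, x, nstep, eps, final_eps, max_corr=20):
    # Algorithm 5.1 with Jacobians evaluated at every predictor step and
    # corrector iteration; secant-updating strategies would replace the
    # fd_jacobian calls below by updates of B-hat = [B, C] or of B.
    lam, h = 0.0, 1.0 / nstep
    for i in range(1, nstep + 1):
        Bhat = fd_jacobian(F, x, lam)           # predictor: Bhat = [B, C]
        B, C = Bhat[:, :-1], Bhat[:, -1]
        x = x - np.linalg.solve(B, F(x, lam) + C * h)
        lam = lam + h
        tol = final_eps if i == nstep else eps
        it = 0
        while np.linalg.norm(F(x, lam)) > tol:  # corrector iterations
            if it == max_corr:
                raise RuntimeError("corrector convergence failure at step %d" % i)
            B = fd_jacobian(F, x, lam)[:, :-1]  # B ~ F_x(x, lam)
            x = x - np.linalg.solve(B, F(x, lam))
            it += 1
    return x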
If the B̂'s and B's are determined by least-change secant or inverse secant updating at each step and if the corrector convergence test is passed after a finite number of iterations for each i < nstep, then this algorithm has the form of iteration (4.1) or (4.11), respectively, with an added convergence test when i < nstep. With this updating, a reasonable modification of the analysis in §4 and in the Appendix applies to a reasonable modification of Algorithm 5.1 as follows: Suppose F(x₀, λ₀) = 0 for some (x₀, λ₀), and in Algorithm 5.1 we initialize h = (1 − λ₀)/nstep, x = x₀, and λ = λ₀ and take B̂ = F′(x₀, λ₀) at the initial predictor step. Suppose also that we use some fixed ε in the corrector iterations for i < nstep and take ε = 0 for i = nstep, so that the algorithm proceeds indefinitely once λ = 1. Suppose finally that for (x, λ) near (x_*, 1), F′(x, λ) has the structure (if any) which is imposed on updates and that F_x(x, λ) is nonsingular.

If (x₀, λ₀) is sufficiently near (x_*, 1), then the iterates are well defined, h is small, and, furthermore, using bounded deterioration (see the proof of Theorem 4.1 and the Appendix), we can verify that approximate Jacobians remain near their actual values for a given finite number of predictor steps and corrector iterations. It follows that if (x₀, λ₀) is sufficiently near (x_*, 1), then the number of corrector iterations in step 2 is no greater than a prescribed bound for each i < nstep, and therefore that the sequence of λ-values used in the predictor steps and corrector iterations satisfies an r-linear convergence inequality (4.4) with λ_* = 1 and β = |λ₀ − λ_*|. With this, an inspection of the analysis in §4 and the Appendix shows that if (x₀, λ₀) is sufficiently near (x_*, 1) and the appropriate ancillary hypotheses of the theorems in §4 are satisfied, then the iterate sequence {x̄_k = (x_k, λ_k)}_{k=0,1,...} satisfies the r-linear convergence inequality of (4.6). After a finite number of iterations, we have i = nstep and λ = 1 for all remaining iterations, and since approximate Jacobians are still near their true values if (x₀, λ₀) is sufficiently near (x_*, 1), the convergence is ultimately q-superlinear.

Algorithm 5.1 is quite unsophisticated compared to other homotopy algorithms. For one thing, it uses a preset number of equally spaced λ-increments, rather than determining variable λ-increments in some intelligent way as the algorithm proceeds. Perhaps more seriously, because a prescribed value of λ is used in determining the next value of x at each step, the algorithm is likely to get into trouble if F_x is singular along the zero curve. This lack of sophistication is by design, however. The object of the experiments here is to give insight into the merits of various ways of determining successive B̂'s and B's, and the simplicity of Algorithm 5.1 lends itself to this. Also, maintaining constant λ-values through the corrector iterations allows options for traditional square-matrix first Broyden updating in determining successive B's and so is useful for comparative testing.

In our experiments, we tested seven different strategies for determining the successive B̂'s and B's in Algorithm 5.1. These are as follows:

Strategy 1. Take B̂ = F′(x₀, 0) initially, and then maintain B̂ = [B, C] and thereby B through nonsquare first Broyden updating on all subsequent predictor steps and corrector iterations. See the remarks below.

Strategy 2. Take B̂ = [B, C] = F′(x₀, 0) initially, and then use this initial C = F_λ(x₀, 0) in all subsequent predictor steps while maintaining B through square-matrix first Broyden updating on all corrector iterations.

Strategy 3. Take B̂ = [B, C] = F′(x₀, 0) initially, and then reevaluate C = F_λ(x, λ) at each predictor step and maintain B through square-matrix first Broyden updating on all predictor steps as well as all corrector iterations, replacing the right-hand side of the secant equation (1.3) with the "ideal replacement" F(x₊, λ₊) − F(x, λ) − F_λ(x, λ)(λ₊ − λ) when updating on the predictor steps. See the remarks below.

Strategy 4. Take B̂ = F′(x, λ) at each predictor step, and at each set of corrector iterations take B = F_x(x, λ) initially and then maintain B through square-matrix first Broyden updating on the subsequent corrector iterations.

Strategy 5. Take B̂ = F′(x, λ) at each predictor step, and at each set of corrector iterations take B = F_x(x, λ) initially and then use this B throughout the subsequent corrector iterations.
Strategy 6. Take B̂ = F′(x, λ) at each predictor step and B = F_x(x, λ) at each corrector iteration.

Strategy 7. Take B̂ = [B, C] = F′(x₀, 0) initially, and then use this B̂ and B in all subsequent predictor steps and corrector iterations.

Remarks. The ordering of these strategies very roughly reflects an increasing dependence on derivative evaluations and a decreasing dependence on updating. The total chord-method Strategy 7 is included mainly to note that it was quite unsuccessful in the experiments discussed here, as might be expected. The partial chord-method Strategy 5 also resulted in failure in the two particular trials reported below, but it showed some success in other trials, while Strategy 7 nearly always led to failure. In Strategy 1, the nonsquare first Broyden updating of course reduces to square-matrix first Broyden updating on the corrector iterations because λ does not change. The updating in Strategy 3 is described as square-matrix first Broyden updating, but it can also be regarded as nonsquare-matrix updating in which the last column of F′, i.e., F_λ, is computed while the remainder is approximated through Frobenius-norm least-change secant updating into the subspace of matrices the last column of which is zero. Thus Strategies 1 and 3 can be regarded as the strategies which employ nonsquare-matrix updating. As noted in §4, the discussion there does not explicitly include cases in which part of the Jacobian is computed while the remainder is approximated by updating, but it would be straightforward to extend it to do so.

In the experiments reported here, we made a particular choice of F and in each of a number of trials counted the function evaluations, Jacobian evaluations, and corrector iterations required by the above strategies for the implementation of Algorithm 5.1. Evaluating the function F required the evaluation of f in (5.2). This was done by numerically solving (5.1) using subroutine RKF45 of Shampine and Watts (see Shampine, Watts, and Davenport [28]), which we obtained from the Forsythe, Malcolm, and Moler [15] collection of subroutines. Derivatives of F were obtained by forward-difference approximations, which provided adequate accuracy. The function evaluations required for these derivative approximations were included in the overall function evaluation counts; thus these counts really provide a measure of overall function evaluation work, including that required for derivative evaluation. The evaluation of either F′ or F_x counted as a Jacobian evaluation; the evaluation of F_λ alone did not. Thus the Jacobian evaluation counts reflect the number of matrix factorizations "from scratch" which were required, as well as the number of function evaluations which arose from Jacobian evaluations. (As in the square-matrix case, we can update matrix factors or perhaps exercise other options to incorporate the rank-one Broyden update without obtaining a new factorization "from scratch.")

In the trials reported here, we chose a = (0, π/2, π)ᵀ, for which the solution of (5.2) is x_* = (0, 0, π)ᵀ, i.e., the elastica is a semicircle. All computing reported here was done in double precision on a Sun 4/280S using the Sun Microsystems Fortran 1.1 compiler. Our particular choice of F is

(5.3)  F(x, λ) = λ[f(x) − a] + (1 − λ)(x − x₀).

(A sketch of this map and of the nonsquare first Broyden update in code is given below.)
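As a concrete illustration of the quantities entering Strategies 1-3, the following sketch gives the map (5.3) and a nonsquare first Broyden update in the same Python setting as the sketch above. A caveat: the update formula here assumes that the first Broyden update (3.1.1), which is stated earlier in the paper, takes the natural nonsquare form B̂₊ = B̂ + (y − B̂s̄)s̄ᵀ/(s̄ᵀs̄), with secant data s̄ = (x₊ − x, λ₊ − λ) and y = F(x₊, λ₊) − F(x, λ); the function names are ours.

import numpy as np

def homotopy_F(x, lam, f, a, x0):
    # The map (5.3): F(x, lam) = lam*(f(x) - a) + (1 - lam)*(x - x0),
    # so that F(x0, 0) = 0 and F(x, 1) = f(x) - a.
    return lam * (f(x) - a) + (1.0 - lam) * (x - x0)

def broyden_update_nonsquare(Bhat, sbar, y):
    # Nonsquare first Broyden update of Bhat = [B, C] in R^{n x (n+1)}:
    # Bhat_+ = Bhat + (y - Bhat sbar) sbar^T / (sbar^T sbar), with
    # sbar = (x_+ - x, lam_+ - lam), y = F(x_+, lam_+) - F(x, lam).
    return Bhat + np.outer(y - Bhat @ sbar, sbar) / (sbar @ sbar)

Note that for (5.3) we have F_λ(x, λ) = f(x) − a − (x − x₀), which varies along the zero curve, so there is genuinely something for the last column C of B̂ to track; for (5.4) below, by contrast, F_λ ≡ f(x₀) is constant.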
This choice appears to be not very well behaved. In particular, we saw evidence in our testing of bifurcation of the zero curve of F, at least for the values of x₀ which were used in the tests reported here. Although we did not verify bifurcation with certainty, there were at least sharp changes in the direction of the zero curve for values of λ in [.5, .6] and [.9, 1], which resulted in relatively large numbers of corrector iterations for all strategies around the middle and last predictor steps. Still, for our testing we prefer (5.3) to an apparently better behaved alternative suggested in [33], viz.,

(5.4)  F(x, λ) = f(x) − (1 − λ)f(x₀).

First, the bad behavior of the zero curve of F given by (5.3) creates challenges for the methods being tested; second, if F were given by (5.4), then F_λ would be constant and there would be no need for nonsquare updating.

The results of two typical trials with F given by (5.3) are given in Tables 1 and 2 below. In both trials, we took x₀ = (−.4, .4, 3)ᵀ. In the first trial, we took nstep = 10 and used tolerances ε = 10⁻¹ for i < nstep and ε = 10⁻⁵ for i = nstep, taking the attitude that the only point of interest on the zero curve was an accurate point at the final step. In the second trial, we took a greater number of steps and used tighter tolerances, choosing nstep = 20 and ε = 10⁻² for i < nstep and ε = 10⁻⁶ for i = nstep. A maximum number of 20 corrector iterations was allowed before convergence failure was declared.

TABLE 1
Results of trial 1 for F given by (5.3).

Strategy                 1    2    3    4    5    6    7
Corrector Iterations    36   45   22   15    *    7    *
Function Evaluations    51   60   46   78    *   79    *
Jacobian Evaluations     1    1    1   14    *   17    *

* corrector convergence failure at the 6th step

TABLE 2
Results of trial 2 for F given by (5.3).

Strategy                 1    2    3    4    5    6    7
Corrector Iterations    53   65   38   23    *   18    *
Function Evaluations    78   90   82  151    *  173    *
Jacobian Evaluations     1    1    1   29    *   38    *

* corrector convergence failure at the 11th step

It is evident in these trials that the strategies which use more computed derivative information require (perhaps significantly) fewer corrector iterations, but these strategies may require (perhaps significantly) more function evaluation effort, not to mention matrix arithmetic, and updating, either alone or in conjunction with computed derivative information, offers (perhaps significant) advantages. The success of Strategy 4 and the failure of Strategy 5 show the usefulness of updating on the corrector iterations. The success of Strategies 1-3 shows the effectiveness of updating in maintaining Jacobian approximations over the full range of λ-values, even if the updating is only traditional square-matrix first Broyden updating on the corrector iterations. However, comparing the results for Strategy 2 with those for Strategies 1 and 3 shows the importance of updating on the predictor steps as well as on the corrector iterations, i.e., the importance of nonsquare-matrix updating. It happened in a number of other trials not reported here that Strategies 1 and 3 succeeded while Strategy 2 failed. For example, for x₀ = (−.5, .5, 3)ᵀ, nstep = 20, ε = 10⁻² for i < nstep, and ε = 10⁻⁶ for i = nstep, Strategy 2 led to corrector convergence failure at the eleventh step while Strategies 1 and 3 succeeded. A comparison of the results for Strategies 1 and 3 reinforces the conventional wisdom that computing part of the Jacobian results in more effective Jacobian approximations. However, this partial computation of the Jacobian requires additional function evaluation work in proportion to the number of predictor steps. Strategy 3 has an overall function-evaluation advantage over Strategy 1 with the ten predictor steps in trial 1, but the advantage is reversed with the twenty predictor steps in trial 2.
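Purely as a usage illustration, a driver in the setting of trial 1 might be assembled from the sketches above as follows. Since the integrator, difference steps, and arithmetic differ from those of the Fortran code used in our tests, the evaluation counts such a driver produces are not comparable to those in Tables 1 and 2.

import numpy as np

a = np.array([0.0, np.pi / 2.0, np.pi])   # right endpoint data: a semicircle
x0 = np.array([-0.4, 0.4, 3.0])           # starting point, as in both trials

def F(x, lam):
    return homotopy_F(x, lam, elastica_f, a, x0)   # the map (5.3)

x_star = follow_path(F, x0.copy(), nstep=10, eps=1e-1, final_eps=1e-5)
print(x_star)   # should approximate x_* = (0, 0, pi)^T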
Appendix. We now give analogues of the results in the Appendix of [12] which are suitable for application to the local convergence analysis given in §4. We assume as in §4 that F: R^{n+m} → R^n satisfies the basic hypothesis near x̄_* = (x_*, λ_*) and that {λ_k}_{k=0,1,...} is a sequence which converges to λ_*. Our convention regarding norms is as in §4, i.e., ‖·‖ denotes a particular inner product norm on R^{n×(n+m)}, and |·| denotes all of the following: particular norms on R^n and R^m, various norms on R^{n+m} which are specified in each instance, and the matrix norms induced by these vector norms.

Our interest is in a very general iteration which begins with some x̄₀ = (x₀, λ_{k₀}) and B̄₀ = [B₀, C₀] and determines

(A.1)  x_{k+1} = x_k − B_k⁻¹{F(x̄_k) + C_k(λ_{k₀+k+1} − λ_{k₀+k})},
       x̄_{k+1} = (x_{k+1}, λ_{k₀+k+1}),
       B̄_{k+1} = [B_{k+1}, C_{k+1}] ∈ U(x̄_k, x̄_{k+1}, B̄_k),

for k = 0, 1, .... In (A.1), U is an update function, the values of which are subsets of R^{n×(n+m)} and which we assume is defined in a neighborhood N = N₁ × N₁ × N₂ of (x̄_*, x̄_*, B̄_*) ∈ R^{n+m} × R^{n+m} × R^{n×(n+m)} for some B̄_*, where N₁ ⊆ R^{n+m} and N₂ is such that the first n columns of any matrix in N₂ constitute a nonsingular matrix. Usually, but not always, we also assume that B̄_* is near F′(x̄_*) and that U satisfies the following hypothesis.

BOUNDED DETERIORATION HYPOTHESIS. Let α₁ and α₂ be nonnegative constants such that for each (x̄, x̄₊, B̄) ∈ N, every B̄₊ ∈ U(x̄, x̄₊, B̄) satisfies

‖B̄₊ − B̄_*‖ ≤ [1 + α₁σ(x̄, x̄₊)ᵖ]‖B̄ − B̄_*‖ + α₂σ(x̄, x̄₊)ᵖ

for σ(x̄, x̄₊) = max{|x̄ − x̄_*|, |x̄₊ − x̄_*|}, where |x̄| = max{|x|, |λ|} for x̄ = (x, λ) ∈ R^{n+m}.

Proposition A.1 below is a standard elementary technical result which we use in obtaining the local convergence theorems which follow.

PROPOSITION A.1. Under the basic hypothesis, we have

|F(x̄) − F′(x̄_*)(x̄ − x̄_*)| ≤ (γ/(1+p)) |x̄ − x̄_*|^{1+p}

for |x̄| = max{|x|, |λ|}, x̄ = (x, λ) ∈ R^{n+m}.

Proof. Setting x̄(t) = x̄_* + t(x̄ − x̄_*), we have

|F(x̄) − F′(x̄_*)(x̄ − x̄_*)| = |∫₀¹ [F′(x̄(t)) − F′(x̄_*)](x̄ − x̄_*) dt| ≤ ∫₀¹ γtᵖ |x̄ − x̄_*|^{1+p} dt = (γ/(1+p)) |x̄ − x̄_*|^{1+p}. □

Theorems A.2 and A.3 and Proposition A.4 below establish the local linear convergence of iteration (A.1) and characterize the speed of convergence of the iteration sequence. They are counterparts in our analysis of Theorems A2.1 and A3.1 of [12].

THEOREM A.2. Let F satisfy the basic hypothesis, and let U satisfy the bounded deterioration hypothesis with respect to B̄_* = [B_*, C_*], where B_* ∈ R^{n×n} is an invertible matrix with |I − B_*⁻¹F_x(x̄_*)| ≤ r_x < 1. Let {λ_k}_{k=0,1,...} be such that one of the following holds:

(A.2)  |λ_k − λ_*| ≤ βr_λᵏ, k = 0, 1, ..., for some β and r_λ ∈ (0, 1);

(A.3)  λ_k → λ_* q-linearly with Q₁{λ_k} ≤ r_λ ∈ [0, 1).

Then for any r such that max{r_x, r_λ} < r < 1, there exist positive constants ε_r, δ_r, and an integer k_r such that if |x₀ − x_*| ≤ ε_r, ‖B̄₀ − B̄_*‖ ≤ δ_r, and k₀ ≥ k_r, then iteration (A.1) is well defined for k = 0, 1, ... and converges to x̄_* with

(A.4)  |x̄_k − x̄_*| ≤ rᵏε_r, k = 0, 1, ..., if (A.2) holds, where |x̄| = max{|x|, |λ|} for x̄ = (x, λ) ∈ R^{n+m};

(A.5)  |x̄_{k+1} − x̄_*| ≤ r|x̄_k − x̄_*|, k = 0, 1, ..., if (A.3) holds, where |x̄| = |x| + Γ|λ| for x̄ = (x, λ) ∈ R^{n+m} and for a suitable constant Γ.

Furthermore, {B̄_k − B̄_*}_{k=0,1,...} and {B_k⁻¹ − B_*⁻¹}_{k=0,1,...} are uniformly small.

Proof. Since norms on R^{n×(n+m)} are equivalent, requiring ‖·‖ to be small is equivalent to requiring any other norm to be small. For convenience, then, we impose smallness requirements on ‖·‖ rather than on |·|. The following may be helpful: If we are given λ₊ and B̄ = [B, C] with B invertible, then for x₊ = x − B⁻¹{F(x̄) + C(λ₊ − λ)}, we have

(A.6)  x₊ − x_* = [I − B⁻¹F_x(x̄_*)](x − x_*) + B⁻¹[C − F_λ(x̄_*)](λ − λ_*) − B⁻¹C(λ₊ − λ_*) − B⁻¹[F(x̄) − F′(x̄_*)(x̄ − x̄_*)].

We first assume that (A.2) holds and establish all conclusions except (A.5). Take |x̄| = max{|x|, |λ|} for x̄ = (x, λ) ∈ R^{n+m}.
Choose positive ε_r, δ, and an integer k_r such that (α₁δ + α₂)ε_rᵖ/(1 − rᵖ) < δ and such that if ‖B̄ − B̄_*‖ ≤ δ and k ≥ k_r, then B⁻¹ exists and

|I − B⁻¹F_x(x̄_*)|ε_r + {|B⁻¹[C − F_λ(x̄_*)]| + |B⁻¹C|}βr_λᵏ + |B⁻¹|(γ/(1+p))ε_r^{1+p} ≤ rε_r.

If necessary, further restrict ε_r, δ, and k_r so that if |x − x_*| ≤ ε_r, ‖B̄ − B̄_*‖ ≤ δ, and k ≥ k_r, then (x̄, x̄₊, B̄) ∈ N for x̄ = (x, λ_k) and x̄₊ = (x − B⁻¹{F(x̄) + C(λ_{k+1} − λ_k)}, λ_{k+1}).

Choose δ_r > 0 such that δ_r + (α₁δ + α₂)ε_rᵖ/(1 − rᵖ) < δ. Let |x₀ − x_*| ≤ ε_r, ‖B̄₀ − B̄_*‖ ≤ δ_r, and k₀ ≥ k_r. We have from (A.6), the other assumptions above, and Proposition A.1 that |x̄₁ − x̄_*| ≤ rε_r. As an inductive hypothesis, assume that for some k > 0 and for j = 0, ..., k − 1, we have ‖B̄_j − B̄_*‖ < δ and |x̄_{j+1} − x̄_*| ≤ r^{j+1}ε_r. Then σ(x̄_j, x̄_{j+1}) ≤ rʲε_r for j = 0, ..., k − 1, and by the bounded deterioration hypothesis we see that

‖B̄_{j+1} − B̄_*‖ ≤ ‖B̄_j − B̄_*‖ + (α₁δ + α₂)(rʲε_r)ᵖ

for j = 0, ..., k − 1. Summing over j = 0, ..., k − 1 gives

(A.7)  ‖B̄_k − B̄_*‖ ≤ ‖B̄₀ − B̄_*‖ + (α₁δ + α₂)ε_rᵖ/(1 − rᵖ) < δ.

Also, as in the case k = 1,

|x_{k+1} − x_*| ≤ |I − B_k⁻¹F_x(x̄_*)|rᵏε_r + {|B_k⁻¹[C_k − F_λ(x̄_*)]| + |B_k⁻¹C_k|}βr_λ^{k₀+k} + |B_k⁻¹|(γ/(1+p))(rᵏε_r)^{1+p} ≤ r^{k+1}ε_r.

Then |x̄_{k+1} − x̄_*| ≤ r^{k+1}ε_r, and the induction is complete. It follows from (A.7) and the assumptions on δ that {B̄_k − B̄_*}_{k=0,1,...} and {B_k⁻¹ − B_*⁻¹}_{k=0,1,...} are uniformly small, and the theorem is proved in this case.

We now assume that (A.3) holds and establish (A.5). We assume that ε_r, δ_r, and k_r have been chosen such that if |x₀ − x_*| ≤ ε_r, ‖B̄₀ − B̄_*‖ ≤ δ_r, and k₀ ≥ k_r, then iteration (A.1) is well defined for k = 0, 1, ..., (A.4) holds, and {B̄_k − B̄_*}_{k=0,1,...} and {B_k⁻¹ − B_*⁻¹}_{k=0,1,...} are uniformly small. This is allowed, for if (A.3) holds, then (A.2) holds for an appropriate β with r_λ replaced by anything larger. Choose r′ and r″ such that max{r_x, r_λ} < r″ < r′ < r. If necessary, further restrict ε_r, δ_r, and k_r so that if |x₀ − x_*| ≤ ε_r, ‖B̄₀ − B̄_*‖ ≤ δ_r, and k₀ ≥ k_r, and if {x̄_k, B̄_k}_{k=0,1,...} is determined by (A.1), then |λ_{k₀+k+1} − λ_*| ≤ r″|λ_{k₀+k} − λ_*|, |I − B_k⁻¹F_x(x̄_*)| ≤ r′, and |B_k⁻¹|(γ/(1+p)) max{|x_k − x_*|, |λ_{k₀+k} − λ_*|}ᵖ ≤ r − r′ for k = 0, 1, .... Choose Γ ≥ 1 so large that if |x₀ − x_*| ≤ ε_r, ‖B̄₀ − B̄_*‖ ≤ δ_r, and k₀ ≥ k_r, and if {x̄_k, B̄_k}_{k=0,1,...} is determined by (A.1), then |B_k⁻¹[C_k − F_λ(x̄_*)]| + (|B_k⁻¹C_k| + Γ)r″ ≤ Γr′ for k = 0, 1, .... Take |x̄| = |x| + Γ|λ| for x̄ = (x, λ) ∈ R^{n+m}.

Suppose that |x₀ − x_*| ≤ ε_r, ‖B̄₀ − B̄_*‖ ≤ δ_r, and k₀ ≥ k_r, and let {x̄_k, B̄_k}_{k=0,1,...} be determined by (A.1). We have from (A.6), the other assumptions, and Proposition A.1 that

|x_{k+1} − x_*| ≤ r′|x_k − x_*| + {|B_k⁻¹[C_k − F_λ(x̄_*)]| + |B_k⁻¹C_k|r″}|λ_{k₀+k} − λ_*| + (r − r′)|x̄_k − x̄_*|,

and so

|x̄_{k+1} − x̄_*| ≤ r′|x_k − x_*| + {|B_k⁻¹[C_k − F_λ(x̄_*)]| + (|B_k⁻¹C_k| + Γ)r″}|λ_{k₀+k} − λ_*| + (r − r′)|x̄_k − x̄_*| ≤ r|x̄_k − x̄_*|. □

THEOREM A.3. Suppose that F satisfies the basic hypothesis and that {x̄_k}_{k=0,1,...} is a sequence generated by (A.1) which converges q-linearly to x̄_* in some norm with s̄_k ≡ x̄_{k+1} − x̄_k ≠ 0 for all but finitely many k. If B̄_* = [B_*, C_*] is any matrix such that B_* ∈ R^{n×n} is invertible, then the norm-independent condition

(A.8)  lim_{k→∞} |(B̄_k − B̄_*)s̄_k| / |s̄_k| = 0

holds if and only if the norm-independent condition

(A.9)  lim_{k→∞} |[I − B_*⁻¹F_x(x̄_*)](x_k − x_*) − (x_{k+1} − x_*) − B_*⁻¹C_*(λ_{k₀+k+1} − λ_*) + B_*⁻¹[C_* − F_λ(x̄_*)](λ_{k₀+k} − λ_*)| / |x̄_k − x̄_*| = 0

holds.

Proof. We have

(A.10)  (B̄_k − B̄_*)s̄_k = B_*{[I − B_*⁻¹F_x(x̄_*)](x_k − x_*) − (x_{k+1} − x_*) − B_*⁻¹C_*(λ_{k₀+k+1} − λ_*) + B_*⁻¹[C_* − F_λ(x̄_*)](λ_{k₀+k} − λ_*)} + F′(x̄_*)(x̄_k − x̄_*) − F(x̄_k).

Since x̄_k → x̄_* q-linearly in some norm, it follows from the equivalence of norms on R^{n+m} that for any norm |·| on R^{n+m} there are constants η₁ and η₂ for which

(A.11)  η₁|x̄_k − x̄_*| ≤ |s̄_k| ≤ η₂|x̄_k − x̄_*|

for all but finitely many k. Also, we see from Proposition A.1 that

(A.12)  F′(x̄_*)(x̄_k − x̄_*) − F(x̄_k) = o(|s̄_k|)

in any norm.
Since B_* is invertible, it follows immediately from (A.10)-(A.12) and norm equivalence that (A.8) holds in some norms on R^n and R^{n+m} if and only if (A.9) holds in some norms on R^n and R^{n+m}. This completes the proof. □

The conditions of Theorem A.3 have particular consequences which are outlined in Proposition A.4 and the remarks below.

PROPOSITION A.4. If (A.9) holds in some norms on R^n and R^{n+m} and if r_x = |I − B_*⁻¹F_x(x̄_*)| and r_λ = Q₁{λ_k}, then the following hold:

(A.13) for any ε > 0, lim sup_{k→∞} |x̄_{k+1} − x̄_*|/|x̄_k − x̄_*| ≤ max{r_x, r_λ + ε}, where |x̄| = |x| + Γ|λ| for x̄ = (x, λ) ∈ R^{n+m} and for a suitable constant Γ depending on ε;

(A.14) if r_λ = 0, then x̄_k → x̄_* q-superlinearly if and only if the norm-independent condition

lim_{k→∞} |[I − B_*⁻¹F_x(x̄_*)](x_k − x_*) + B_*⁻¹[C_* − F_λ(x̄_*)](λ_{k₀+k} − λ_*)| / |x̄_k − x̄_*| = 0

holds. In particular, x̄_k → x̄_* q-superlinearly if λ_k → λ_* q-superlinearly and B̄_* = F′(x̄_*).

Proof. If (A.9) holds in some norms, then

(A.15)  |x_{k+1} − x_*| ≤ |I − B_*⁻¹F_x(x̄_*)||x_k − x_*| + |B_*⁻¹C_*||λ_{k₀+k+1} − λ_*| + |B_*⁻¹[C_* − F_λ(x̄_*)]||λ_{k₀+k} − λ_*| + o(|x̄_k − x̄_*|)

for any Γ and any norms on R^n, R^m, and R^{n+m}. From (A.15), we see that (A.13) holds by taking |x̄| = |x| + Γ|λ| for x̄ = (x, λ) ∈ R^{n+m}, where Γ is sufficiently large that [|B_*⁻¹C_*| + Γ]r_λ + |B_*⁻¹[C_* − F_λ(x̄_*)]| ≤ Γ(r_λ + ε). If r_λ = 0, then lim_{k→∞} |λ_{k₀+k+1} − λ_*|/|x̄_k − x̄_*| = 0 for any norms on R^m and R^{n+m}, and it follows from (A.9) that

lim_{k→∞} |[I − B_*⁻¹F_x(x̄_*)](x_k − x_*) + B_*⁻¹[C_* − F_λ(x̄_*)](λ_{k₀+k} − λ_*) − (x_{k+1} − x_*)| / |x̄_k − x̄_*| = 0

for any positive Γ. Then we obtain (A.14) by taking |x̄| = |x| + Γ|λ| for x̄ = (x, λ) ∈ R^{n+m}. □

We now consider an inverse update analogue of iteration (A.1), viz., an iteration which begins with some x̄₀ = (x₀, λ_{k₀}) and K̄₀ = [K₀, L₀] and determines

(A.16)  x_{k+1} = x_k − K_kF(x̄_k) + L_k(λ_{k₀+k+1} − λ_{k₀+k}),
        x̄_{k+1} = (x_{k+1}, λ_{k₀+k+1}),
        K̄_{k+1} = [K_{k+1}, L_{k+1}] ∈ Ū(x̄_k, x̄_{k+1}, K̄_k),

for k = 0, 1, ..., where we assume the update function Ū is defined in a neighborhood N̄ = N̄₁ × N̄₁ × N̄₂ of (x̄_*, x̄_*, K̄_*) ∈ R^{n+m} × R^{n+m} × R^{n×(n+m)} for some K̄_*, where N̄₁ ⊆ R^{n+m} and N̄₂ is such that the first n columns of any matrix in N̄₂ constitute a nonsingular matrix. The appropriate analogue of the bounded deterioration hypothesis is the following hypothesis.

INVERSE BOUNDED DETERIORATION HYPOTHESIS. Let ᾱ₁ and ᾱ₂ be nonnegative constants such that for each (x̄, x̄₊, K̄) ∈ N̄, every K̄₊ ∈ Ū(x̄, x̄₊, K̄) satisfies

‖K̄₊ − K̄_*‖ ≤ [1 + ᾱ₁σ(x̄, x̄₊)ᵖ]‖K̄ − K̄_*‖ + ᾱ₂σ(x̄, x̄₊)ᵖ

for σ(x̄, x̄₊) = max{|x̄ − x̄_*|, |x̄₊ − x̄_*|}, where |x̄| = max{|x|, |λ|} for x̄ = (x, λ) ∈ R^{n+m}.

Theorems A.5 and A.6 below are counterparts for iteration (A.16) of Theorems A.2 and A.3 for iteration (A.1). They are analogous to Theorems A2.2 and A3.2 of [12]. The proof of Theorem A.5 is very similar to the proof of Theorem A.2, and so we omit it.

THEOREM A.5. Let F satisfy the basic hypothesis, and let Ū satisfy the inverse bounded deterioration hypothesis with respect to K̄_* = [K_*, L_*], where K_* ∈ R^{n×n} is an invertible matrix with |I − K_*F_x(x̄_*)| ≤ r_x < 1. Let {λ_k}_{k=0,1,...} be such that either (A.2) or (A.3) holds for some r_λ. Then for any r such that max{r_x, r_λ} < r < 1, there exist positive constants ε_r, δ_r and an integer k_r such that if |x₀ − x_*| ≤ ε_r, ‖K̄₀ − K̄_*‖ ≤ δ_r, and k₀ ≥ k_r, then iteration (A.16) is well defined for k = 0, 1, ... and converges to x̄_* according to either (A.4), if (A.2) holds, or (A.5), if (A.3) holds. Furthermore, {K̄_k − K̄_*}_{k=0,1,...} and {K_k⁻¹ − K_*⁻¹}_{k=0,1,...} are uniformly small.

THEOREM A.6. Suppose that F satisfies the basic hypothesis and that {x̄_k}_{k=0,1,...} is a sequence generated by (A.16) which converges q-linearly to x̄_* in some norm with s̄_k ≡ x̄_{k+1} − x̄_k ≠ 0 for all but finitely many k and with {K̄_k}_{k=0,1,...} and {K_k⁻¹}_{k=0,1,...} uniformly bounded. Let K̄_* = [K_*, L_*] be any matrix such that K_* ∈ R^{n×n} is invertible, and suppose that {y_k}_{k=0,1,...}
is a sequence satisfying the norm-independent condition

(A.17)  |y_k − s_k| ≤ c_k|s̄_k|, k = 0, 1, ...,

where ȳ_k = (y_k, λ_{k₀+k+1} − λ_{k₀+k}), s_k = x_{k+1} − x_k, and lim_{k→∞} c_k = 0. Then the norm-independent condition

(A.18)  lim_{k→∞} |(K̄_k − K̄_*)ȳ_k| / |ȳ_k| = 0

holds if and only if the norm-independent condition

(A.19)  lim_{k→∞} |[I − K_*F_x(x̄_*)](x_k − x_*) − (x_{k+1} − x_*) + L_*(λ_{k₀+k+1} − λ_*) − [L_* + K_*F_λ(x̄_*)](λ_{k₀+k} − λ_*)| / |x̄_k − x̄_*| = 0

holds.

Remark. Clearly, (A.19) is the same as (A.9) with B_* = K_*⁻¹ and C_* = −K_*⁻¹L_*. Consequently, if (A.18) and (A.19) hold, then the conclusions of Proposition A.4 hold with B_* = K_*⁻¹ and C_* = −K_*⁻¹L_*.

Proof. Since (A.19) is the same as (A.9) with B_* = K_*⁻¹ and C_* = −K_*⁻¹L_*, it suffices to show that (A.18) holds if and only if (A.8) holds with B̄_k = [B_k, C_k], B_k = K_k⁻¹, and C_k = −K_k⁻¹L_k. We have

(A.20)  (B̄_k − B̄_*)s̄_k = (B_* − B_k)(K̄_*ȳ_k − s_k) − B_k(K̄_k − K̄_*)ȳ_k,

and it follows from (A.17), (A.20), and the boundedness of {K̄_k}_{k=0,1,...} and {K_k⁻¹}_{k=0,1,...} that (A.8) holds if and only if

(A.21)  lim_{k→∞} |(K̄_k − K̄_*)ȳ_k| / |s̄_k| = 0

holds. But (A.17) implies that there are positive constants η₁ and η₂ for which η₁|s̄_k| ≤ |ȳ_k| ≤ η₂|s̄_k|, k = 0, 1, .... It follows that (A.21) is equivalent to (A.18), and the proof is complete. □

Acknowledgments. The authors thank Layne T. Watson for participating in many discussions which were influential in shaping the contents of this paper. We also thank the referees for their thoughtful reviews.

REFERENCES

[1] P. ALFELD, Two devices for improving the efficiency of stiff ODE solvers, in Proc. 1979 SIGNUM Meeting on Numerical Ordinary Differential Equations, pp. 24-1 to 24-3, Report 79-1710, Computer Science Dept., University of Illinois, Urbana, IL, 1979.
[2] C. A. BEATTIE AND S. WEAVER-SMITH, Secant methods for structural model identification, Report ICAM-TR-88-0601, Interdisciplinary Center for Applied Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, VA, 1988.
[3] S. K. BOURJI, Ph.D. thesis, Utah State University, Logan, UT, 1987.
[4] P. N. BROWN, A. C. HINDMARSH, AND H. F. WALKER, Experiments with quasi-Newton methods in solving stiff ODE systems, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 297-313.
[5] C. G. BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comp., 19 (1965), pp. 577-593.
[6] C. G. BROYDEN, A new double-rank minimization algorithm, AMS Notices, 16 (1969), p. 670.
[7] C. G. BROYDEN, The convergence of a class of double-rank minimization algorithms, Parts I and II, J. Inst. Math. Appl., 6 (1971), pp. 76-90, 222-236.
[8] C. G. BROYDEN, The convergence of an algorithm for solving sparse nonlinear systems, Math. Comp., 25 (1971), pp. 285-294.
[9] W. C. DAVIDON, Variable metric methods for minimization, Report ANL-5990 Rev., Argonne National Laboratory, Argonne, IL, November, 1959.
[10] J. E. DENNIS, JR., AND R. B. SCHNABEL, Least change secant updates for quasi-Newton methods, SIAM Rev., 21 (1979), pp. 443-459.
[11] J. E. DENNIS, JR., AND R. B. SCHNABEL, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Automatic Computation, Englewood Cliffs, NJ, 1983.
[12] J. E. DENNIS, JR., AND H. F. WALKER, Convergence theorems for least change secant update methods, SIAM J. Numer. Anal., 18 (1981), pp. 949-987.
[13] R. FLETCHER, A new approach to variable metric algorithms, Comput. J., 13 (1970), pp. 317-322.
[14] R. FLETCHER AND M. J. D. POWELL, A rapidly convergent descent method for minimization, Comput. J., 6 (1963), pp. 163-168.
[15] G. E. FORSYTHE, M. A. MALCOLM, AND C. B.
MOLER, Computer Methods for Mathematical Computations, Prentice-Hall Series in Automatic Computation, Englewood Cliffs, NJ, 1977.
[16] K. GEORG, On tracing an implicitly defined curve by quasi-Newton steps and calculating bifurcation by local perturbations, SIAM J. Sci. Statist. Comput., 2 (1981), pp. 35-50.
[17] D. GOLDFARB, A family of variable-metric methods derived by variational means, Math. Comp., 24 (1970), pp. 23-26.
[18] J. GREENSTADT, Variations on variable metric methods, Math. Comp., 24 (1970), pp. 1-18.
[19] A. GRIEWANK AND L. SHENG, On the Gauss-Broyden method for nonlinear least-squares, Preprint MCS-PS-0988, Mathematics and Computer Science Div., Argonne National Laboratory, Argonne, IL, September, 1988.
[20] J. M. MARTÍNEZ, Quasi-Newton methods for solving underdetermined nonlinear simultaneous equations, Relatório Técnico 21/88, Instituto de Matemática, Estatística e Ciência da Computação, Universidade Estadual de Campinas, Campinas, Brazil, November, 1988.
[21] E. S. MARWIL, Exploiting sparsity in Newton-like methods, Ph.D. thesis, Cornell University, Ithaca, NY, 1978.
[22] J. J. MORÉ, B. S. GARBOW, AND K. E. HILLSTROM, User Guide for MINPACK-1, Report ANL-80-74, Applied Mathematics Div., Argonne National Laboratory, Argonne, IL, August, 1980.
[23] J. J. MORÉ, B. S. GARBOW, AND K. E. HILLSTROM, Testing unconstrained optimization software, ACM Trans. Math. Software, 7 (1981), pp. 17-41.
[24] A. P. MORGAN, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[25] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[26] M. J. D. POWELL, A new algorithm for unconstrained optimization, in Nonlinear Programming, J. B. Rosen, O. L. Mangasarian, and K. Ritter, eds., Academic Press, New York, 1970.
[27] L. K. SCHUBERT, Modification of a quasi-Newton method for nonlinear equations with a sparse Jacobian, Math. Comp., 24 (1970), pp. 27-30.
[28] L. F. SHAMPINE, H. A. WATTS, AND S. M. DAVENPORT, Solving nonstiff ordinary differential equations--the state of the art, SIAM Rev., 18 (1976), pp. 376-411.
[29] D. F. SHANNO, Conditioning of quasi-Newton methods for function minimization, Math. Comp., 24 (1970), pp. 647-656.
[30] PH. L. TOINT, On sparse and symmetric matrix updating subject to a linear equation, Math. Comp., 31 (1977), pp. 954-961.
[31] H. F. WALKER AND L. T. WATSON, Least-change secant update methods for underdetermined systems, Res. Report August/88/41, Mathematics and Statistics Dept., Utah State University; Report 88-28, Computer Science Dept., and Report 88-09-03, Interdisciplinary Center for Applied Mathematics, Virginia Polytechnic Institute and State University, August, 1988.
[32] L. T. WATSON, S. C. BILLUPS, AND A. P. MORGAN, HOMPACK: a suite of codes for globally convergent homotopy algorithms, ACM Trans. Math. Software, 13 (1987), pp. 281-310.
[33] L. T. WATSON AND C. Y. WANG, A homotopy method applied to elastica problems, Internat. J. Solids and Structures, 17 (1981), pp. 29-37.
[34] T. YPMA, private communication, 1988.