A new trust region method for unconstrained optimization
Zhen-Jun Shi, Jin-Hua Guo
Abstract
In this paper, we propose a new trust region method for unconstrained optimization problems. The new trust region method can
automatically adjust the trust region radius of the related subproblem at each iteration and has strong global convergence under
mild conditions. We also analyze the global linear convergence and the local superlinear and quadratic convergence rates of the new method.
Numerical results show that the new trust region method is effective and efficient in practical computation.
© 2007 Elsevier B.V. All rights reserved.
MSC: 90C30; 49M37; 65K05
Keywords: Unconstrained optimization; Trust region method; Global convergence; Convergence rate
1. Introduction
Consider an unconstrained optimization problem

min f(x),  x ∈ R^n.    (1)

☆ Corresponding author. College of Operations Research and Management, Qufu Normal University, Rizhao, Shandong 276826, PR China.
Z.-J. Shi, J.-H. Guo / Journal of Computational and Applied Mathematics 213 (2008) 509–520
(suppose that it is not a stationary point), a trial step is usually obtained by solving the following subproblem:

min_{d ∈ R^n} m_k(d) = g_k^T d + (1/2) d^T B_k d,  s.t. ‖d‖ ≤ Δ_k,    (2)

where Δ_k is the trust region radius. A merit function is normally used to test whether the trial step should be accepted or the
trust region radius needs to be adjusted. If the trial step is accepted at some iteration, we call that iteration an
effective iteration; otherwise we call it an ineffective iteration.
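The classical loop around subproblem (2) can be sketched as follows. This is a textbook-style sketch, not the authors' Algorithm A: the Cauchy-point subproblem solver, the radius-update constants, and all names are our own choices for illustration.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimize the quadratic model along -g inside the ball ||d|| <= delta."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    # Unconstrained minimizer along -g if the curvature is positive,
    # otherwise go straight to the boundary.
    tau = min(1.0, gnorm**3 / (delta * gBg)) if gBg > 0 else 1.0
    return -tau * (delta / gnorm) * g

def trust_region(f, grad, hess, x, delta=1.0, mu=0.01, max_iter=200, tol=1e-8):
    """Basic trust region loop with the standard acceptance ratio test."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        B = hess(x)
        d = cauchy_point(g, B, delta)
        pred = -(g @ d + 0.5 * d @ B @ d)   # predicted reduction -m_k(d)
        ared = f(x) - f(x + d)              # actual reduction
        rho = ared / pred if pred > 0 else -1.0
        if rho >= mu:                       # effective iteration: accept the step
            x = x + d
            if rho > 0.75:
                delta *= 2.0
        else:                               # ineffective iteration: shrink radius
            delta *= 0.5
    return x

# Convex quadratic test problem with minimizer at (1, -2)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([-1.0, 3.0])
f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b
hess = lambda x: A
x_star = trust_region(f, grad, hess, np.zeros(2))
```

For a quadratic the model agrees with the objective exactly, so every trial step here is accepted; on general nonlinear problems the ratio test is what distinguishes effective from ineffective iterations.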
In comparison with quasi-Newton methods, trust region methods converge to a point which not only is a stationary
point, but also satisfies second-order necessary conditions. Because of their strong convergence and robustness, trust
region methods have been studied by many authors [3–6,9,17], and some convergence properties are given in the
literature [2,8,10,12,18]. A monograph on trust region methods has been published, which presents diverse trust
region methods and extensive theoretical analysis; see [1] and the references therein.
It is obvious that the trust region radius Δ_k in the above subproblem is independent of g_k and B_k. We do
not know whether the radius Δ_k is suitable for the whole algorithm. This situation can increase the number of
subproblems that must be solved and decrease the efficiency of these methods. In fact, we want to reduce the number of ineffective
iterations so that fewer subproblems need to be solved to reach the same precision.
Sartenaer [11] presented a strategy which can automatically determine an initial trust region radius. The basic idea is
to determine a maximal initial radius through repeated trials in the direction −g_k in order to guarantee sufficient
agreement between the model and the objective function. Zhang et al. [21] presented another strategy for determining
the trust region radius. Their basic idea originates from the following subproblem in artificial neural network
research [19]:
min m_k(d) = g_k^T d + (1/2) d^T B_k d
s.t. −Δ_k ≤ d_i ≤ Δ_k,  i = 1, 2, ..., n,

where Δ_k = c^p ‖g_k‖ / λ_k, 0 < c < 1, λ_k = min(‖B_k‖, 1), and p is a nonnegative integer. Therefore, instead of adjusting
Δ_k, one adjusts p at each iteration. Motivated by this technique, they solved the trust region subproblem (2) with

Δ_k = c^p ‖g_k‖ / ‖B̄_k‖,    (3)

and gave a globally convergent adaptive trust region method [20], where c ∈ (0, 1), p is a nonnegative integer, and
B̄_k = B_k + iI is a positive definite matrix for some nonnegative integer i.
However, their method needs to estimate ‖B_k‖ or ‖B̄_k^{-1}‖ at each iteration, which incurs additional computational
cost. As a result, a simple adaptive trust region method was proposed [14], in which the trust region radius is
computed by the formula

Δ_k = c^p ‖g_k‖^3 / (g_k^T B̄_k g_k),    (4)
where c ∈ (0, 1), B̄_k is a positive definite matrix, and p is a nonnegative integer. An analysis of general adaptive trust
region methods was given in [13,15,16]. Of course, we should try to find more effective and implementable trust region
methods, so as to decrease the number of trust region subproblems solved at each iteration and to improve the numerical
performance of related trust region methods.
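A radius rule of the form (4) is inexpensive to evaluate. The sketch below (our own naming; not code from the paper) illustrates how increasing the integer p shrinks the radius by factors of c:

```python
import numpy as np

def adaptive_radius(g, B_bar, p, c=0.75):
    """Trust region radius of Eq. (4): Delta_k = c**p * ||g||**3 / (g^T B_bar g)."""
    return c**p * np.linalg.norm(g)**3 / (g @ B_bar @ g)

g = np.array([3.0, 4.0])           # ||g|| = 5
B_bar = np.eye(2)                  # then g^T B_bar g = 25 and Delta = c**p * 5
r0 = adaptive_radius(g, B_bar, 0)  # p = 0 gives the unreduced radius
r1 = adaptive_radius(g, B_bar, 1)  # p = 1 shrinks it by the factor c
```

With B̄_k = I the rule reduces to Δ_k = c^p ‖g_k‖, which makes the role of p as the adjusted quantity transparent.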
In this paper, we present a new trust region method for unconstrained optimization problems and develop some
convergence properties under mild conditions. At each iteration, the new method generates a suitable initial trust
region radius automatically based on the current iterative information. In comparison with other trust region methods,
the new trust region model is more consistent with the objective function at the current iterate. The global convergence
and the local superlinear and quadratic convergence rates of the new method are proved under mild conditions.
Numerical results also show that the new trust region method is effective and efficient in practical computation. We
find that the number of trust region subproblems solved is reduced substantially and the efficiency of the trust region
method is greatly improved.
The rest of this paper is organized as follows. In the next section, we introduce the new trust region method and give
some simple properties. In Sections 3 and 4, the global convergence, linear convergence rate, local superlinear and
quadratic convergence rate are investigated. Numerical results are given in Section 5. Conclusions and future research
are summarized in Section 6.
−g_k^T q_k / (‖g_k‖ ‖q_k‖) ≥ τ,    (5)

s_k = −g_k^T q_k / (q_k^T B̄_k q_k),    (6)

in which B̄_k is generated by the procedure q_k^T B̄_k q_k = q_k^T B_k q_k + i‖q_k‖^2, where i is the smallest nonnegative integer such
that

q_k^T B̄_k q_k = q_k^T B_k q_k + i‖q_k‖^2 > 0.

The new trust region method at the current iterate x_k needs to solve the subproblem

min_{d ∈ R^n} m_k(d) = g_k^T d + (1/2) d^T B_k d,  s.t. ‖d‖ ≤ α‖q_k‖.    (7)

Let α_k be the largest α in {s_k, βs_k, β^2 s_k, ...} such that the solution d_k to (7) satisfies

r_k = (f_k − f(x_k + d_k)) / (−m_k(d_k)) ≥ μ.    (8)
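The step-acceptance loop just described can be sketched as follows. This is a hedged sketch only: the Cauchy-point surrogate for the subproblem solver, the choice q_k = −g_k, and all names are our assumptions, not the authors' exact Algorithm A.

```python
import numpy as np

def solve_ball_subproblem(g, B, delta):
    """Crude surrogate for the subproblem solver: Cauchy point in ||d|| <= delta.
    Any exact or approximate solver of the ball-constrained model could be used."""
    gBg = g @ B @ g
    gn = np.linalg.norm(g)
    tau = min(1.0, gn**3 / (delta * gBg)) if gBg > 0 else 1.0
    return -tau * (delta / gn) * g

def step_algorithm_A(f, x, g, B, B_bar, mu=0.01, beta=0.75, max_back=30):
    """One iteration in the spirit of the paper's method: start from the radius
    s_k * ||q_k|| with s_k = -g^T q / (q^T B_bar q), and backtrack alpha over
    {s_k, beta*s_k, beta**2*s_k, ...} until the ratio test is passed."""
    q = -g                                   # one admissible descent choice
    s = -(g @ q) / (q @ B_bar @ q)           # initial scale
    alpha = s
    for _ in range(max_back):
        d = solve_ball_subproblem(g, B, alpha * np.linalg.norm(q))
        pred = -(g @ d + 0.5 * d @ B @ d)    # -m_k(d)
        if pred > 0 and (f(x) - f(x + d)) / pred >= mu:
            return x + d                     # trial step accepted
        alpha *= beta                        # shrink the radius and resolve
    return x

# One accepted step on a convex quadratic
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([-1.0, 3.0])
f = lambda x: 0.5 * x @ A @ x + b @ x
x0 = np.zeros(2)
x1 = step_algorithm_A(f, x0, A @ x0 + b, A, A)
```

Note that the initial radius s_k‖q_k‖ is built from the current g_k and B̄_k, which is exactly the adaptivity the paper advocates.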
∀x, y ∈ Ω.

Moreover, if g(x) is Lipschitz continuous on Ω, then (H1) is sure to hold. Therefore, Assumption (H1) is weaker than
those used in [15,16,20,21].

Remark 2.2. (H2) implies that {B̄_k} is also uniformly bounded, which can be seen from the generating procedure of
B̄_k. In fact, if ‖B_k‖ ≤ M for all k, then ‖B̄_k‖ = ‖B_k + iI‖ ≤ 2M + 1, because any integer i with M < i ≤ M + 1 already gives
q_k^T B̄_k q_k = q_k^T B_k q_k + i‖q_k‖^2 > 0.
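The construction of B̄_k is simple to implement. The sketch below (our own naming) finds the smallest nonnegative integer i of the generating procedure, consistent with the bound i ≤ M + 1 of Remark 2.2:

```python
import numpy as np

def regularize(B, q):
    """Smallest nonnegative integer i with q^T B q + i*||q||^2 > 0,
    as in the paper's construction of B_bar = B + i*I."""
    i = 0
    while q @ B @ q + i * (q @ q) <= 0:
        i += 1
    return B + i * np.eye(len(q)), i

B = np.array([[-2.0, 0.0], [0.0, 1.0]])   # indefinite: q^T B q = -2 for q = e1
q = np.array([1.0, 0.0])
B_bar, i = regularize(B, q)               # i = 3 is needed: -2 + 3*1 = 1 > 0
```

Here ‖B‖ = M = 2 and the loop stops at i = 3 ≤ M + 1, matching the remark.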
3. Global convergence
Theorem 3.1. If (H1) and (H2) hold and ε = 0, then Algorithm A either stops at a stationary point of (1) or generates
an infinite sequence {x_k} such that

lim_{k→∞} (−g_k^T q_k / ‖q_k‖) = 0.    (9)

Proof. Suppose that Algorithm A generates an infinite sequence {x_k} and lim_{k→∞} (−g_k^T q_k / ‖q_k‖) ≠ 0. Then there
exist ε_0 > 0 and an infinite subset K ⊂ {0, 1, 2, ...} such that

−g_k^T q_k / ‖q_k‖ ≥ ε_0,  ∀k ∈ K.    (10)
By Remark 2.2, there exists M_0 > 0 such that

‖B̄_k‖ ≤ M_0,  ∀k.    (11)

Thus

q_k^T B̄_k q_k ≤ M_0 ‖q_k‖^2,  ∀k.    (12)
Let K_1 = {k ∈ K | α_k = s_k} and K_2 = {k ∈ K | α_k < s_k}. Since K_1 ∪ K_2 = K is an infinite subset of {0, 1, 2, ...},
at least one of K_1 and K_2 is infinite; we prove that neither can be infinite and thus obtain a contradiction to (10).

Assume that K_1 is an infinite subset of K. By Lemma 2.3 and (12), we have

f_k − f_{k+1} ≥ −(μ/2) α_k g_k^T q_k = −(μ/2) s_k g_k^T q_k = μ (g_k^T q_k)^2 / (2 q_k^T B̄_k q_k) ≥ μ (g_k^T q_k)^2 / (2 M_0 ‖q_k‖^2),  ∀k ∈ K_1.
Since (H1) implies that {f_k} is bounded below and monotonically decreasing, it follows that

+∞ > Σ_{k=0}^{∞} (f_k − f_{k+1}) ≥ Σ_{k∈K_1} (f_k − f_{k+1}) ≥ (μ/(2M_0)) Σ_{k∈K_1} (g_k^T q_k / ‖q_k‖)^2 ≥ Σ_{k∈K_1} μ ε_0^2 / (2M_0) = +∞,

a contradiction; hence K_1 must be finite. For K_2, Lemma 2.3 similarly gives

+∞ > Σ_{k=0}^{∞} (f_k − f_{k+1}) ≥ Σ_{k∈K_2} (f_k − f_{k+1}) ≥ Σ_{k∈K_2} (μ/2) (−g_k^T q_k / ‖q_k‖) α_k ‖q_k‖ ≥ (μ ε_0 / 2) Σ_{k∈K_2} α_k ‖q_k‖,

which implies

lim_{k∈K_2, k→∞} [α_k ‖q_k‖] = 0.    (13)
(13)
Suppose that d_k′ is an optimal solution of (7) with respect to α = α_k / β (k ∈ K_2). Then, since α_k is the largest
acceptable member of {s_k, βs_k, β^2 s_k, ...}, the following inequality holds:

(f_k − f(x_k + d_k′)) / (−m_k(d_k′)) < μ,  k ∈ K_2,    (14)

while (13) gives

lim_{k∈K_2, k→∞} [(α_k / β) ‖q_k‖] = 0.    (15)

On the other hand, a Taylor expansion together with (H1) and (10) yields

(f_k − f(x_k + d_k′)) / (−m_k(d_k′)) ≥ 1 − O(α_k ‖q_k‖) / ((1/2)(−g_k^T q_k / ‖q_k‖)) ≥ 1 − O(α_k ‖q_k‖) / ((1/2) ε_0) → 1  (k ∈ K_2, k → +∞),

which contradicts (14) for sufficiently large k ∈ K_2. This contradiction shows that there exists no infinite subset
K of {0, 1, 2, ...} such that (10) holds. Therefore, (9) holds. □
Corollary 3.2. If the conditions in Theorem 3.1 hold and q_k satisfies (5), then

lim_{k→∞} ‖g_k‖ = 0.

Proof. By (5),

‖g_k‖ = (‖g_k‖ ‖q_k‖ / (−g_k^T q_k)) · (−g_k^T q_k / ‖q_k‖) ≤ (1/τ)(−g_k^T q_k / ‖q_k‖) → 0  (k → ∞). □
Theorem 3.3. Assume that the conditions in Corollary 3.2 hold and Algorithm A generates an infinite sequence {x_k}
such that x_k → x* (k → +∞), where g(x*) = 0, H(x*) = ∇²f(x*) is a positive definite matrix, and H(x) = ∇²f(x)
is continuous on a neighborhood N(x*, ε) of x*. Then {x_k} converges to x* at least linearly.

Proof. Since H(x*) is a positive definite matrix and H(x) is continuous on N(x*, ε), it follows from the mean value
theorem that there exist L > 0 and ε_0 ∈ (0, ε] such that

‖g(x) − g(y)‖ ≤ L ‖x − y‖,  ∀x, y ∈ N(x*, ε_0).
Let K_1 = {k | α_k = s_k} and K_2 = {k | α_k < s_k}. If k ∈ K_1, we can prove, as in the proof
of Theorem 3.1, that

f_k − f_{k+1} ≥ μ (g_k^T q_k)^2 / (2 M_0 ‖q_k‖^2),  k ∈ K_1.    (16)
In the case of k ∈ K_2, we have α_k / β ≤ s_k. Suppose that d_k′ is an optimal solution of (7) with respect to
α = α_k / β, k ∈ K_2. Then (14) holds. Therefore,

f(x_k + d_k′) − f_k > μ m_k(d_k′),  k ∈ K_2.

Applying the mean value theorem to the left-hand side of the above inequality, there exists θ_k ∈ (0, 1) such that

g(x_k + θ_k d_k′)^T d_k′ > μ [g_k^T d_k′ + (1/2) d_k′^T B_k d_k′] ≥ μ [g_k^T d_k′ − (1/2) d_k′^T B̄_k d_k′],  k ∈ K_2.
Thus

α_k ≥ ((1 − μ) β / (2L + M_0)) · (−g_k^T q_k) / ‖q_k‖^2,  k ∈ K_2.    (17)

By noting that

f_k − f_{k+1} ≥ μ (−m_k(d_k)),

and by Lemma 2.3 and (17), we have

f_k − f_{k+1} ≥ μ β (1 − μ) (g_k^T q_k)^2 / ((4L + 2M_0) ‖q_k‖^2),  k ∈ K_2.    (18)
(18)
By letting

η = min{ μ / (2M_0),  μ β (1 − μ) / (4L + 2M_0) },

we obtain, for all k,

f_k − f_{k+1} ≥ η (g_k^T q_k / ‖q_k‖)^2 = η (g_k^T q_k / (‖g_k‖ ‖q_k‖))^2 ‖g_k‖^2 ≥ η τ^2 ‖g_k‖^2.

By letting η_0 = η τ^2, we have

f_k − f_{k+1} ≥ η_0 ‖g_k‖^2.    (19)

The rest of the proof is similar to the proof of Theorem 4.1 of [13] and is omitted here. □
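The omitted final step can be sketched as follows. This is our hedged reconstruction of the standard argument, assuming f is locally strongly convex near x*, i.e., y^T H(x) y ≥ m ‖y‖^2 for some m > 0 on a neighborhood of x*; the constant m is our notation, not the paper's.

```latex
% Strong convexity gives the Polyak--Lojasiewicz-type bound
%   f(x_k) - f(x^*) \le \|g_k\|^2 / (2m),
% so (19) yields a geometric decrease of f(x_k) - f(x^*):
f(x_{k+1}) - f(x^*) = \bigl(f(x_k) - f(x^*)\bigr) - \bigl(f(x_k) - f(x_{k+1})\bigr)
                  \le \bigl(f(x_k) - f(x^*)\bigr) - \eta_0 \|g_k\|^2
                  \le (1 - 2m\eta_0)\,\bigl(f(x_k) - f(x^*)\bigr).
```

Since 1 − 2mη_0 ∈ (0, 1) when η_0 is sufficiently small, {f_k} converges to f(x*) linearly, and strong convexity transfers the linear rate to ‖x_k − x*‖.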
4. Convergence rate
Theorem 4.1. Assume that the conditions in Corollary 3.2 hold and Algorithm A generates an infinite sequence {x_k}
such that x_k → x* (k → +∞), H(x) = ∇²f(x) is continuous on a neighborhood N(x*, ε) of x*, and H(x*) and B_k
are uniformly positive definite matrices such that

lim_{k→∞} ‖[B_k − H(x*)] q_k‖ / ‖q_k‖ = 0.    (20)

At iteration k, d_k solves

min_{d ∈ R^n} m_k(d) = g_k^T d + (1/2) d^T B_k d,  s.t. ‖d‖ ≤ α_k ‖q_k‖.
By Lemma 2.3 and (6),

−m_k(d_k) ≥ −(1/2) s_k g_k^T q_k = (g_k^T q_k)^2 / (2 q_k^T B̄_k q_k) = (g_k^T q_k)^2 / (2 q_k^T B_k q_k),    (21)

since the positive definiteness of B_k gives B̄_k = B_k. Moreover,

f_k − f(x_k + d_k) = −m_k(d_k) + o(‖d_k‖^2),    (22)

and, because −m_k(d_k) is bounded below by a positive multiple of ‖q_k‖^2 ≥ ‖d_k‖^2, we obtain

o(‖d_k‖^2) / ‖d_k‖^2 → 0  (k → +∞),
Proof. Since B_k = H(x_k) implies that all conditions of Theorem 4.1 hold, we have d_k = q_k → 0 (k → ∞). Therefore,
there exists a k̄ such that

x_k + t q_k ∈ N(x*, ε),  ∀k ≥ k̄, t ∈ [0, 1].    (23)

Because H(x) is Lipschitz continuous on the neighborhood N(x*, ε) of x*, there must exist L(ε) > 0 such that

‖H(x) − H(y)‖ ≤ L(ε) ‖x − y‖,  ∀x, y ∈ N(x*, ε).    (24)

By Theorem 4.1, we know that the new trust region method reduces to Newton's method, x_{k+1} = x_k + q_k = x_k −
H(x_k)^{-1} g_k, for sufficiently large k. Thereby, {x_k} converges to x* quadratically. The remainder of the proof can be found in
[1,3,8]. □
Theorem 4.3. Assume that (H1) and (H2) hold, B_k = H_k, and Algorithm A generates an infinite sequence {x_k}. If {x_k}
converges to x*, then H(x*) is a positive semi-definite matrix, i.e., x* satisfies the second-order necessary condition.

Proof. Denote by λ_k and λ* the smallest eigenvalues of H_k and H(x*), respectively. Let z_k be a normalized eigenvector
(‖z_k‖ = 1) of H_k corresponding to the eigenvalue λ_k with z_k^T g_k ≥ 0; then H_k z_k = λ_k z_k. Suppose that H(x*) is not a
positive semi-definite matrix; then λ* < 0 and thus λ_k < 0 for sufficiently large k.

Because ‖−α_k ‖q_k‖ z_k‖ = α_k ‖q_k‖, the vector −α_k ‖q_k‖ z_k is a feasible solution to (7). Therefore,

m_k(−α_k ‖q_k‖ z_k) = −α_k ‖q_k‖ g_k^T z_k + (1/2) α_k^2 ‖q_k‖^2 z_k^T B_k z_k ≤ (1/2) α_k^2 ‖q_k‖^2 z_k^T B_k z_k
= (1/2) α_k^2 ‖q_k‖^2 z_k^T H_k z_k = (1/2) α_k^2 ‖q_k‖^2 λ_k ‖z_k‖^2 = (1/2) α_k^2 ‖q_k‖^2 λ_k.    (25)

Since x_k → x*,

λ_k → λ* < 0  (k → +∞),    (26)

and d_k is an optimal solution of

min_{d ∈ R^n} m_k(d) = g_k^T d + (1/2) d^T B_k d,  s.t. ‖d‖ ≤ α_k ‖q_k‖,    (27)

so that m_k(d_k) ≤ m_k(−α_k ‖q_k‖ z_k). Moreover,

o(α_k^2 ‖q_k‖^2) / (−(1/2) α_k^2 ‖q_k‖^2 λ_k) → 0  (k → +∞),    (28)
−m_k(d_k) ≥ −(1/2) α_k^2 ‖q_k‖^2 λ_k > 0, and thus

(f(x_k) − f(x_{k+1})) / (−m_k(d_k)) > μ,

for sufficiently large k, which contradicts (27) and then shows that λ* ≥ 0; thus H(x*) is a positive semi-definite
matrix. □
5. Numerical results
In the new trust region method, qk has a wide scope which only needs to satisfy (5), and qk = gk is certainly a
choice which produces the recently proposed adaptive trust region methods [14]. If we take qk = B k1 gk then we can
obtain a new trust region method. The new trust region method with qk = gk and qk = B k1 gk are denoted by TRS
and TRN, respectively. The original trust region method with = 100 and = 0.01 is denoted by TRO [8, p. 68].
We chose the parameters c = 0.75, = 0.75, = 0.01 and = 109 , and {Bk } was modied by BFGS formula.
If we set Bk I and qk = gk we can also obtain another new trust region method denoted by TRI. Zhangs adaptive
trust region method [21] is denoted by TRZ. We chose the 18 test problems and their initial points from the literature
[7], for example, P5 means the No. 5 problem in this literature and so on. The stop criteria is
gk 109 ,
and numerical results are summarized in Table 1.
In Table 1, the three numbers separated by slashes give, respectively, the number of iterations, the number of function
evaluations, and the number of gradient evaluations. P refers to the test problem, and the CPU row reports the total CPU
time for solving all 18 test problems. As can be seen, it is difficult to choose an adequate upper bound for the
trust region radius in the original trust region method, which may lead to a slow
Table 1
The number of iterations, function and gradient evaluations and CPU time

P    n   TRS        TRN        TRI        TRZ        TRO
1    2   25/62/25   17/36/17   28/62/28   26/59/29   36/56/56
2    2   21/42/21   16/34/16   27/58/27   32/32/32   42/57/52
3    2   15/43/15   12/23/12   16/52/16   22/33/22   20/78/60
4    2   15/55/16   15/15/15   17/68/17   34/64/34   28/75/68
5    2   26/46/26   21/26/21   26/62/26   37/87/37   32/78/72
6    2   18/49/18   15/15/15   23/63/23   35/38/35   28/85/78
7    3   43/46/43   36/36/36   52/65/52   61/72/61   48/67/67
8    3   48/62/48   35/35/35   53/58/53   64/68/64   83/75/72
9    3   46/56/46   37/37/37   63/68/63   66/74/66   85/85/81
10   3   28/34/28   23/23/23   25/50/25   56/61/56   65/72/64
11   3   46/62/46   38/38/38   53/57/53   59/78/59   72/83/78
12   3   46/68/46   37/37/37   65/75/65   69/72/69   83/88/83
13   4   56/58/56   45/45/45   68/74/68   81/89/81   91/95/91
14   4   38/45/38   29/29/29   43/48/43   68/78/68   86/92/86
15   4   76/76/76   59/59/59   85/88/85   97/99/97   94/98/94
16   4   29/36/29   18/18/18   43/63/43   78/84/78   93/98/93
17   5   93/98/93   79/79/79   98/98/98   85/96/85   92/123/93
18   6   38/46/38   26/26/26   38/64/38   54/59/54   63/82/78
CPU (s)  118        59         155        124        195
convergent method. In the new trust region method, by contrast, this problem is avoided because the trust region radius
is adjusted automatically at each step.

Table 1 shows that TRN seems to be the best trust region method, as it uses the least total CPU time for solving
all 18 test problems. TRS has the second-best numerical performance. These preliminary numerical experiments show that
the new trust region method proposed in this paper is a promising method for optimization problems.

The numerical results also show that the number of subproblems solved decreases substantially for the same precision.
The reason is that an adequate initial trust region radius is chosen at each step. Indeed, if we choose an adequate
trust region radius at each iteration, then the number of subproblems to be solved is reduced and the efficiency of the
related trust region methods is improved. In Algorithm A, a suitable initial trust region radius is chosen and adjusted
automatically at each iteration; this gives the new trust region method good numerical performance on practical problems.
In particular, when q_k = −B_k^{-1} g_k the new trust region method reduces to a quasi-Newton method, and when
q_k = −H(x_k)^{-1} g_k it reduces to Newton's method. This also shows that the new trust
region method is a promising method.
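Both choices of q_k used in the experiments are descent directions, as condition (5) requires: q_k = −g_k (TRS) trivially, and q_k = −B̄_k^{-1} g_k (TRN) whenever B̄_k is positive definite, since then −g_k^T q_k = g_k^T B̄_k^{-1} g_k > 0. A quick hedged check (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
B_bar = M @ M.T + np.eye(4)          # positive definite by construction
g = rng.standard_normal(4)

q_trs = -g                           # TRS choice
q_trn = -np.linalg.solve(B_bar, g)   # TRN choice

# Descent measure -g^T q / (||g|| * ||q||), i.e. the left-hand side of (5)
cos_trs = -(g @ q_trs) / (np.linalg.norm(g) * np.linalg.norm(q_trs))
cos_trn = -(g @ q_trn) / (np.linalg.norm(g) * np.linalg.norm(q_trn))
```

The TRS direction attains the maximal value 1, while the TRN value is bounded away from zero by the conditioning of B̄_k.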
6. Conclusions and future research
In this paper, we presented a new class of adaptive trust region methods for unconstrained optimization problems
and investigated their global convergence. In these new methods, the trust region radius can be adjusted automatically
according to the current iterative information and is computed by a simple formula. As we have seen, different choices
of q_k determine different adaptive trust region methods. If we take q_k = −g_k, then the new method reduces to the
recently proposed method [14]. If we take q_k = −B_k^{-1} g_k with B_k^{-1} available, then we can obtain some interesting
convergence properties of these new methods, such as superlinear and quadratic convergence rates.
We can choose other q_k to obtain many different adaptive trust region methods. Numerical results show that the
new trust region methods are effective and efficient in practical computation.
For future research, B̄_k and q_k could be chosen by other approaches, and numerical experiments should be conducted
when q_k is taken as other special directions that satisfy (5). Moreover, the main computational task in trust region
methods is solving the quadratic subproblem; how to solve the subproblems efficiently deserves further investigation.
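One standard approach to solving the ball-constrained quadratic subproblem approximately is the Steihaug truncated conjugate gradient method, which we sketch below. This is not part of the paper; it is a well-known solver that could plug into the methods above (all names are ours).

```python
import numpy as np

def boundary_tau(d, p, delta):
    """Positive root of ||d + tau*p|| = delta."""
    a, b, c = p @ p, 2 * (d @ p), d @ d - delta**2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

def steihaug_cg(g, B, delta, tol=1e-10, max_iter=50):
    """Truncated CG (Steihaug) for min g^T d + 0.5 d^T B d s.t. ||d|| <= delta.
    Stops on the boundary when negative curvature or an overstep is detected."""
    d = np.zeros(len(g))
    r = g.copy()              # residual = model gradient at d
    p = -r
    if np.linalg.norm(r) < tol:
        return d
    for _ in range(max_iter):
        pBp = p @ B @ p
        if pBp <= 0:          # negative curvature: follow p to the boundary
            return d + boundary_tau(d, p, delta) * p
        alpha = (r @ r) / pBp
        d_next = d + alpha * p
        if np.linalg.norm(d_next) >= delta:   # step leaves the region
            return d + boundary_tau(d, p, delta) * p
        r_next = r + alpha * (B @ p)
        if np.linalg.norm(r_next) < tol:
            return d_next
        beta = (r_next @ r_next) / (r @ r)
        d, r, p = d_next, r_next, -r_next + beta * p
    return d

g = np.array([2.0, 4.0])
B = np.array([[2.0, 0.0], [0.0, 4.0]])
d_full = steihaug_cg(g, B, 10.0)   # large radius: interior solution -B^{-1} g
d_bdry = steihaug_cg(g, B, 0.5)    # small radius: solution on the boundary
```

For a small radius the method returns a boundary point along the steepest descent direction, which is exactly the regime where cheap approximate solvers pay off.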
Acknowledgments
The first author is very grateful to Prof. Xiang-Sun Zhang of the Chinese Academy of Sciences for many meaningful
discussions on this topic. The authors would like to thank the two anonymous referees and the editor for many valuable
comments and suggestions that greatly improved the paper.
References
[1] A.R. Conn, N.I.M. Gould, Ph.L. Toint, Trust-Region Methods, SIAM, Philadelphia, PA, 2000.
[2] N.Y. Deng, Y. Xiao, F.J. Zhou, Nonmonotonic trust region algorithm, J. Optim. Theory Appl. 76 (1993) 259–285.
[3] R. Fletcher, Practical Methods of Optimization, vol. 1: Unconstrained Optimization, Wiley, New York, 1980.
[4] J.H. Fu, W.Y. Sun, Nonmonotone adaptive trust-region method for unconstrained optimization problems, Appl. Math. Comput. 163 (2005) 489–504.
[5] D.J. Higham, Trust region algorithms and timestep selection, SIAM J. Numer. Anal. 37 (1999) 194–210.
[6] J.J. Moré, Recent developments in algorithms and software for trust region methods, in: A. Bachem, M. Grötschel, B. Korte (Eds.), Mathematical Programming: The State of the Art, Springer, Berlin, 1983, pp. 258–287.
[7] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software 7 (1981) 17–41.
[8] J. Nocedal, S.J. Wright, Numerical Optimization, Springer, New York, 1999.
[9] M.J.D. Powell, Convergence properties of a class of minimization algorithms, in: O.L. Mangasarian, R.R. Meyer, S.M. Robinson (Eds.), Nonlinear Programming, vol. 2, Academic Press, New York, 1975, pp. 1–27.
[10] M.J.D. Powell, On the global convergence of trust region algorithms for unconstrained optimization, Math. Programming 29 (1984) 297–303.
[11] A. Sartenaer, Automatic determination of an initial trust region in nonlinear programming, SIAM J. Sci. Comput. 18 (6) (1997) 1788–1803.
[12] G.A. Schultz, R.B. Schnabel, R.H. Byrd, A family of trust-region-based algorithms for unconstrained minimization with strong global convergence, SIAM J. Numer. Anal. 22 (1985) 47–67.
[13] Z.J. Shi, Convergence of line search methods for unconstrained optimization, Appl. Math. Comput. 157 (2004) 393–405.
[14] Z.J. Shi, H.Q. Wang, A new self-adaptive trust region method for unconstrained optimization, Technical Report, College of Operations Research and Management, Qufu Normal University, 2004.
[15] Z.J. Shi, X.S. Zhang, Convergence analysis of adaptive trust region methods, Technical Report, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 2004.
[16] Z.J. Shi, X.S. Zhang, From line search method to trust region method, Lecture Notes in Operations Research, vol. 5, 2005, pp. 156–170.
[17] W.Y. Sun, Nonmonotone trust region method for solving optimization problems, Appl. Math. Comput. 156 (2004) 159–174.
[18] J.Z. Zhang, C.X. Xu, Trust region dogleg path algorithms for unconstrained minimization, Ann. Oper. Res. 87 (1999) 407–418.
[19] X.S. Zhang, Trust region method in neural network, Acta Math. Appl. Sinica 12 (1) (1996) 1–10.
[20] X.S. Zhang, Z.W. Chen, J.L. Zhang, A self-adaptive trust region method for unconstrained optimization, Oper. Res. Trans. 5 (1) (2001) 53–62.
[21] X.S. Zhang, J.L. Zhang, L.Z. Liao, An adaptive trust region method and its convergence, Sci. China 45 (2002) 620–631.