
Fast Convergence of Inertial Gradient Dynamics with Multiscale Aspects

Journal of Optimization Theory and Applications

Abstract

In this paper, we study the asymptotic behavior, as \( t\rightarrow +\infty \), of the following second-order differential equation in a Hilbert space \( {\mathcal {H}} \):

$$\begin{aligned} \ddot{x}(t)+\gamma (t){\dot{x}}(t)+\beta (t)\Big (\nabla \Phi (x(t))+\epsilon (t)\nabla U(x(t))\Big )=0, \end{aligned}$$

where \( \Phi ,U:{\mathcal {H}}\rightarrow {\mathbb {R}} \) are convex and differentiable, \( \gamma (t) \) is a positive damping coefficient, \( \beta (t) \) is a time-scale coefficient, and \( \epsilon (t) \) is a positive nonincreasing function; \( \gamma (t) \), \( \beta (t) \) and \( \epsilon (t) \) are all continuously differentiable. This system has applications in mechanics and optimization. By properly tuning \( \gamma (t) \) and \( \beta (t) \), we obtain convergence rates for the function values and show that, under suitable conditions, the trajectories converge weakly either to a minimizer of \( \Phi \) or to a common minimizer of \( \Phi \) and U. When \( \epsilon (t) \) tends to 0 as t goes to infinity, and \( \Phi \) or U is strongly convex, the trajectories converge to the unique minimizer of \( \Phi \) or of U, respectively. Finally, some particular cases are examined, and numerical experiments are conducted to illustrate the main results.


Data Availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References

  1. Alvarez, F.: On the minimizing property of a second order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38, 1102–1119 (2000). https://doi.org/10.1137/S0363012998335802


  2. Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Diff. Equ. 263, 5412–5458 (2017). https://doi.org/10.1016/j.jde.2017.06.024


  3. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168, 123–175 (2018). https://doi.org/10.1007/s10107-016-0992-8


  4. Attouch, H., Chbani, Z., Riahi, H.: Fast convex optimization via time scaling of damped inertial gradient dynamics. HAL preprint (2019). https://hal.archives-ouvertes.fr/hal-02138954. Accessed 24 May 2019

  5. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \( \alpha \leqslant 3 \). ESAIM Control Optim. Calc. Var. (2019). https://doi.org/10.1051/cocv/2017083


  6. Attouch, H., Chbani, Z., Riahi, H.: Combining fast inertial dynamics for convex optimization with Tikhonov regularization. J. Math. Anal. Appl. 457, 1065–1094 (2018). https://doi.org/10.1016/j.jmaa.2016.12.017


  7. Attouch, H., Cominetti, R.: A dynamical approach to convex minimization coupling approximation with the steepest descent method. J. Diff. Equ. 128, 519–540 (1996). https://doi.org/10.1006/jdeq.1996.0104


  8. Attouch, H., Czarnecki, M.O.: Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria. J. Diff. Equ. 179, 278–310 (2002). https://doi.org/10.1006/jdeq.2001.4034


  9. Attouch, H., Czarnecki, M.O.: Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J. Diff. Equ. 262, 2745–2770 (2017). https://doi.org/10.1016/j.jde.2016.11.009


  10. Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than \( 1/k^2 \). SIAM J. Optim. 26, 1824–1834 (2016). https://doi.org/10.1137/15M1046095


  11. Boţ, R.I., Csetnek, E.R., László, S.C.: Tikhonov regularization of a second order dynamical system with Hessian driven damping. Math. Program. 189, 151–186 (2021). https://doi.org/10.1007/s10107-020-01528-8


  12. Bolte, J.: Continuous gradient projection method in Hilbert spaces. J. Optim. Theory Appl. 119, 235–259 (2003). https://doi.org/10.1023/B:JOTA.0000005445.21095.02


  13. Bruck, R.E.: Asymptotic convergence of nonlinear contraction semigroups in Hilbert space. J. Funct. Anal. 18, 15–26 (1975). https://doi.org/10.1016/0022-1236(75)90027-0


  14. Ge, B., Zhuge, X., Ren, H.: Convergence rates of damped inertial dynamics from multi-degree-of-freedom system. Optim. Lett. (2022). https://doi.org/10.1007/s11590-022-01855-z


  15. Haraux, A.: Systèmes dynamiques dissipatifs et applications. Masson, Paris (1991)


  16. Jendoubi, M.A., May, R.: On an asymptotically autonomous system with Tikhonov type regularizing term. Archiv der Mathematik 95, 389–399 (2010). https://doi.org/10.1007/s00013-010-0181-6


  17. May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turkish J. Math. 41, 681–685 (2017). https://doi.org/10.3906/mat-1512-28


  18. May, R.: On the strong convergence of the gradient projection algorithm with Tikhonov regularizing term. arXiv preprint (2019). https://doi.org/10.48550/arXiv.1910.07873. Accessed 17 Oct 2019

  19. May, R., Mnasri, C., Elloumi, M.: Asymptotic for a second order evolution equation with damping and regularizing terms. AIMS Math. 6, 4901–4914 (2021). https://doi.org/10.3934/math.2021287


  20. Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Doklady 27, 372–376 (1983)


  21. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, Boston (2004). https://doi.org/10.1007/978-1-4419-8853-9


  22. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5


  23. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013). https://doi.org/10.1007/s10107-012-0629-5


  24. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967). https://doi.org/10.1090/S0002-9904-1967-11761-0


  25. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964). https://doi.org/10.1016/0041-5553(64)90137-5


  26. Polyak, B.T.: Introduction to Optimization. Optimization Software Inc, New York (1987)


  27. Sebbouh, O., Dossal, C., Rondepierre, A.: Convergence rates of damped inertial dynamics under geometric conditions and perturbations. SIAM J. Optim. 30, 1850–1877 (2020). https://doi.org/10.1137/19M1272767


  28. Su, W., Boyd, S., Candes, E.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016). https://doi.org/10.48550/arXiv.1503.01243

  29. Xu, B., Wen, B.: On the convergence of a class of inertial dynamical systems with Tikhonov regularization. Optim. Lett. 15, 2025–2052 (2021). https://doi.org/10.1007/s11590-020-01663-3



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 11201095), the Fundamental Research Funds for the Central Universities (Grant No. 3072022TS2402), the Postdoctoral Research Startup Foundation of Heilongjiang (Grant No. LBH-Q14044), and the Science Research Funds for Overseas Returned Chinese Scholars of Heilongjiang Province (Grant No. LC201502).

Author information

Corresponding author

Correspondence to Bin Ge.

Ethics declarations

Conflicts of Interest

The authors declare no conflict of interest.

Additional information

Communicated by Radu Ioan Bot.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


In what follows, we prove the existence and uniqueness of a global solution to the Cauchy problem associated with the evolution system (1). We will use the following lemma, which has been established in [15, Prop. 6.2.1].

Lemma A.1

Let \( F:I\times {\mathcal {X}}\rightarrow {\mathcal {X}} \) where \( I=[t_0,+\infty ) \) and \( {\mathcal {X}} \) is a Banach space. Assume that

  1. (i)

    for every \( x\in {\mathcal {X}} \), \( F(\cdot ,x)\in L^1_{loc}(I,{\mathcal {X}}) \);

  2. (ii)

    for a.e. \( t\in I \), for every \( x,y\in {\mathcal {X}} \),

    $$\begin{aligned} \left\| F(t,x)-F(t,y)\right\| \leqslant K(t,\left\| x\right\| +\left\| y\right\| )\left\| x-y\right\| \end{aligned}$$

    where \( K(\cdot ,r)\in L^1_{loc}(I), \forall r\in {\mathbb {R}}_+\);

  3. (iii)

    for a.e. \( t\in I\), for every \( x\in {\mathcal {X}} \),

    $$\begin{aligned} \left\| F(t,x)\right\| \leqslant P(t)(1+\left\| x\right\| ), \end{aligned}$$

    where \( P\in L^1_{loc}(I)\).

Then, for every \( s\in I, x\in {\mathcal {X}} \), there exists a unique solution \( u_{s,x}\in W_{loc}^{1,1}(I,{\mathcal {X}}) \) of the Cauchy problem:

$$\begin{aligned} {\dot{u}}_{s,x}(t)=F(t,u_{s,x}(t)) \text { for a.e. } t\in I , \text { and } u_{s,x}(s)=x . \end{aligned}$$

The following theorem gives the existence and uniqueness of a solution to system (1).

Theorem A.1

Suppose that \( \Phi ,U:{\mathcal {H}}\rightarrow {\mathbb {R}} \) are convex, \( {\mathcal {C}}^1 \), with Lipschitz continuous gradients \( \nabla \Phi \) and \( \nabla U \). Assume that \( \beta ,\gamma ,\epsilon :[t_0,+\infty )\rightarrow {\mathbb {R}}_+^* \) are locally integrable. Then, the evolution system (1), with initial condition \( (x(t_0),{\dot{x}}(t_0))=(x_0,{\dot{x}}_0)\in {\mathcal {H}}\times {\mathcal {H}} \), admits a unique global solution \( x:[t_0,+\infty )\rightarrow {\mathcal {H}} \).

Proof

To prove the existence and uniqueness for (1) with initial condition \( (x(t_0),{\dot{x}}(t_0))=(x_0,{\dot{x}}_0) \), we formulate it in the phase space. Set \( I=[t_0,+\infty ) \), and define \( F:I\times {\mathcal {H}}\times {\mathcal {H}}\rightarrow {\mathcal {H}}\times {\mathcal {H}} \) by

$$\begin{aligned} F(t,x,y)=\Big (y,-\gamma (t)y-\beta (t)\big (\nabla \Phi (x)+\epsilon (t)\nabla U(x)\big )\Big ). \end{aligned}$$

We set \( u(t)=(x(t),y(t)) \). The Cauchy problem for (1) can be equivalently formulated as:

$$\begin{aligned} \begin{aligned} {\dot{u}}(t)&=F(t,u(t))\quad \text {for a.e. } t\in I ,\\ u(t_0)&=(x_0,{\dot{x}}_0). \end{aligned} \end{aligned}$$

We are going to verify the three conditions of Lemma A.1.

(i) For each \( (x,y)\in {\mathcal {H}}\times {\mathcal {H}} \), \( F(\cdot ,x,y)\in L^1_{loc}(I,{\mathcal {H}}\times {\mathcal {H}}) \), since the functions \( \beta ,\gamma ,\epsilon \) are locally integrable.

(ii) Denote by \( L_1 \) and \( L_2 \) the Lipschitz constants of \( \nabla \Phi \) and \( \nabla U \), respectively. For every \( u=(x,y), u'=(x',y')\in {\mathcal {H}}\times {\mathcal {H}} \) and a.e. \( t\in I \),

$$\begin{aligned} \begin{aligned}&\left\| F(t,u)-F(t,u')\right\| \\&\quad ={}\left\| y-y'\right\| +\left\| \gamma (t)(y-y')+\beta (t)(\nabla \Phi (x)-\nabla \Phi (x')+\epsilon (t)(\nabla U(x)-\nabla U(x')))\right\| \\&\quad \leqslant {}(1+\gamma (t)+L_1\beta (t)+L_2\beta (t)\epsilon (t))(\left\| x-x'\right\| +\left\| y-y'\right\| )\\&\quad ={}(1+\gamma (t)+L_1\beta (t)+L_2\beta (t)\epsilon (t))\left\| (x,y)-(x',y')\right\| \end{aligned} \end{aligned}$$

Hence, the second condition is verified, since the real function \( t\mapsto 1+\gamma (t)+L_1\beta (t)+L_2\beta (t)\epsilon (t) \) belongs to \( L_{loc}^1(I,{\mathbb {R}}) \).
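The estimate in condition (ii) is easy to spot-check numerically. A minimal sketch with illustrative data not taken from the paper: \( {\mathcal {H}}={\mathbb {R}}^2 \), \( \Phi (x)=\tfrac{1}{2}\left\| x\right\| ^2 \) and \( U(x)=\tfrac{1}{2}\left\| x-b\right\| ^2 \) (so \( L_1=L_2=1 \)), at a fixed time with \( \gamma =2 \), \( \beta =1 \), \( \epsilon =0.5 \), using the sum norm on \( {\mathcal {H}}\times {\mathcal {H}} \):

```python
import numpy as np

# Illustrative (assumed) data: quadratic Phi and U, fixed coefficients.
rng = np.random.default_rng(0)
b = np.array([1.0, -1.0])
gamma, beta, eps, L1, L2 = 2.0, 1.0, 0.5, 1.0, 1.0

def F(x, y):
    # grad Phi(x) = x and grad U(x) = x - b for these quadratics
    return y, -gamma * y - beta * (x + eps * (x - b))

def norm1(p, q):
    # sum norm on the product space H x H
    return np.linalg.norm(p) + np.linalg.norm(q)

K = 1.0 + gamma + L1 * beta + L2 * beta * eps  # the factor in condition (ii)
ok = True
for _ in range(1000):
    x, y, xp, yp = (rng.standard_normal(2) for _ in range(4))
    F1, F2 = F(x, y), F(xp, yp)
    lhs = norm1(F1[0] - F2[0], F1[1] - F2[1])
    ok = ok and lhs <= K * norm1(x - xp, y - yp) + 1e-12
print(ok)  # True: the Lipschitz bound holds at every sampled pair
```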

(iii) For every \( u=(x,y)\in {\mathcal {H}}\times {\mathcal {H}} \) and a.e. \( t\in I \),

$$\begin{aligned} \begin{aligned}&\left\| F(t,u)\right\| \\&\quad ={}\left\| y\right\| +\left\| \gamma (t)y+\beta (t)(\nabla \Phi (x)-\nabla \Phi (x_0)+\nabla \Phi (x_0) \right. \\&\qquad \left. +\epsilon (t)(\nabla U(x)-\nabla U(x_0)+\nabla U(x_0)))\right\| \\&\quad \leqslant {}(1+\gamma (t))\left\| y\right\| +(L_1\beta (t)+L_2\beta (t)\epsilon (t))\left\| x-x_0\right\| \\&\qquad +\beta (t)\left\| \nabla \Phi (x_0)\right\| +\beta (t)\epsilon (t)\left\| \nabla U(x_0)\right\| \\&\quad \leqslant {} v(t)(1+\left\| x\right\| +\left\| y\right\| ), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} v(t)=\max \Big (1+\gamma (t),\; L_1\beta (t)+L_2\beta (t)\epsilon (t),\; \beta (t)(L_1+L_2\epsilon (t))\left\| x_0\right\| +\beta (t)\left\| \nabla \Phi (x_0)\right\| +\beta (t)\epsilon (t)\left\| \nabla U(x_0)\right\| \Big ). \end{aligned}$$

Since \( v(\cdot )\in L_{loc}^1(I,{\mathbb {R}}) \), we conclude that all the conditions of Lemma A.1 are satisfied. Therefore, there exists a unique global solution of system (1) satisfying the initial condition \( (x(t_0),{\dot{x}}(t_0))=(x_0,{\dot{x}}_0) \). \(\square \)
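As an aside, the phase-space reduction used in the proof also gives a direct way to integrate (1) numerically. A minimal sketch with illustrative data not taken from the paper: \( {\mathcal {H}}={\mathbb {R}} \), \( \Phi (x)=\tfrac{1}{2}x^2 \), \( U(x)=\tfrac{1}{2}(x-2)^2 \), \( \gamma (t)=3/t \), \( \beta (t)\equiv 1 \), \( \epsilon (t)=1/t^2 \):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Phase-space form u = (x, y) of system (1) with illustrative (assumed) data:
# Phi(x) = 0.5*x^2, U(x) = 0.5*(x-2)^2, gamma(t) = 3/t, beta(t) = 1,
# eps(t) = 1/t^2; none of these choices come from the paper itself.
def F(t, u):
    x, y = u
    grad_phi, grad_u = x, x - 2.0
    gamma, beta, eps = 3.0 / t, 1.0, 1.0 / t**2
    return [y, -gamma * y - beta * (grad_phi + eps * grad_u)]

sol = solve_ivp(F, (1.0, 200.0), [5.0, 0.0], rtol=1e-8, atol=1e-10)
x_final = sol.y[0, -1]
print(x_final)  # approaches the minimizer 0 of Phi as t grows
```

With this vanishing damping \( \gamma (t)=3/t \) and an integrable perturbation, the trajectory settles near the minimizer of \( \Phi \), consistent with the convergence results of the paper.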

The following lemma can be found in [26, Lemma 3.2].

Lemma A.2

Let \( \delta >0 \), and let \( w:[\delta ,+\infty )\rightarrow {\mathbb {R}} \) be a continuously differentiable function which is bounded from below. If the positive part \( [{\dot{w}}]_+ \) of \( {\dot{w}} \) belongs to \( L^1(\delta ,+\infty ) \), then \( \lim _{t\rightarrow +\infty }w(t) \) exists.

Proof

Since \( [{\dot{w}}]_+ \) belongs to \( L^1(\delta ,+\infty ) \), we have \( \int _{\delta }^{+\infty }{[{\dot{w}}(t)]_+\,\mathrm {d}t}<+\infty \). Set \( \theta (t)=w(t)-\int _{\delta }^{t}{[{\dot{w}}(s)]_+\,\mathrm {d}s} \); then \( {\dot{\theta }}(t)={\dot{w}}(t)-[{\dot{w}}(t)]_+\leqslant 0 \), so \( \theta \) is nonincreasing. Since w is bounded from below and \( \int _{\delta }^{+\infty }{[{\dot{w}}(t)]_+\,\mathrm {d}t} \) is finite, \( \theta \) is also bounded from below, which implies that \( \lim _{t\rightarrow +\infty }\theta (t) \) exists. Therefore,

$$\begin{aligned} \lim _{t\rightarrow +\infty }w(t)=\lim _{t\rightarrow +\infty }\theta (t)+\int _{\delta }^{+\infty }{[{\dot{w}}(t)]_+\mathrm {d}t} \end{aligned}$$

exists. \(\square \)
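A quick numerical sanity check of Lemma A.2, with a hypothetical w not taken from the paper: \( w(t)=\sin (t)/t^2 \) is bounded from below on \( [1,+\infty ) \), and \( [{\dot{w}}]_+\leqslant 1/t^2+2/t^3\in L^1(1,+\infty ) \), so w(t) should admit a limit (here 0):

```python
import numpy as np

# Hypothetical example for Lemma A.2 (not from the paper): w(t) = sin(t)/t^2,
# whose derivative is w'(t) = cos(t)/t^2 - 2*sin(t)/t^3.
def w(t):
    return np.sin(t) / t**2

# Approximate int_1^1000 [w']_+ dt by a trapezoidal sum; the integrand is
# dominated by 1/t^2 + 2/t^3, so the full integral on [1, +inf) is below 3.
ts = np.linspace(1.0, 1000.0, 2_000_001)
vals = np.maximum(np.cos(ts) / ts**2 - 2.0 * np.sin(ts) / ts**3, 0.0)
integral = np.trapz(vals, ts)

print(integral)  # finite, as the lemma requires
print(w(1e6))    # near the limit 0
```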

To establish the weak convergence of the solutions of (1), we shall use Opial’s lemma [24] in its continuous form. This argument was first used in [13] to obtain the convergence of nonlinear contraction semigroups.

Lemma A.3

Let S be a nonempty subset of \( {\mathcal {H}} \) and let \( x:[0,+\infty )\rightarrow {\mathcal {H}} \). Assume that

  1. (i)

    for every \( z\in S \), \( \lim _{t\rightarrow +\infty }\left\| x(t)-z\right\| \) exists;

  2. (ii)

    every weak sequential limit point of x(t) , as \( t\rightarrow +\infty \), belongs to S.

Then, x(t) converges weakly as \( t\rightarrow +\infty \) to a point in S.

We conclude with three lemmas about \( \gamma (t) \) and \( \Gamma (t) \), which were established in [2].

Lemma A.4

Let \( \Gamma \) be defined by (3). If there exists \( m>0 \) such that \( \gamma (t)\leqslant m \) for every \( t\geqslant t_2 \), then \( \Gamma (t)\geqslant \frac{1}{m} \) for every \( t\geqslant t_2 \).

Lemma A.5

Let \( \Gamma \) be defined by (3). If there exist \( t_1\geqslant t_0 \) and \( a\in [0,1) \) such that \( \frac{{\dot{\gamma }}(t)}{\gamma (t)^2}\geqslant -a \) for every \( t\geqslant t_1 \), then \( \int _{t_0}^{+\infty }{\mathrm {e}^{-\int _{t_0}^{t}{\gamma (u)\mathrm {d}u}}\,\mathrm {d}t}<+\infty \) and \( \Gamma (t)\leqslant \frac{1}{(1-a)\gamma (t)} \) for every \( t\geqslant t_1 \).

Lemma A.6

Let \( \Gamma \) be defined by (3). If there exists \( c\in [0,1) \) such that \( \lim _{t\rightarrow +\infty }\frac{{\dot{\gamma }}(t)}{\gamma (t)^2}=-c \), then \( \Gamma (t)\sim \frac{1}{(1-c)\gamma (t)} \) as \( t\rightarrow +\infty \).
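These estimates are easy to probe numerically. Since equation (3) is not reproduced in this appendix, the sketch below *assumes* the definition used in [2], \( \Gamma (t)=\mathrm {e}^{G(t)}\int _t^{+\infty }{\mathrm {e}^{-G(s)}\,\mathrm {d}s} \) with \( G(t)=\int _{t_0}^{t}{\gamma (u)\,\mathrm {d}u} \), and takes \( \gamma (t)=\alpha /t \); then \( {\dot{\gamma }}/\gamma ^2=-1/\alpha =-c \) and Lemma A.6 predicts \( \Gamma (t)\sim t/(\alpha -1) \):

```python
import numpy as np
from scipy.integrate import quad

# Assumed definition of Gamma (eq. (3) is not reproduced here):
# Gamma(t) = exp(G(t)) * int_t^inf exp(-G(s)) ds, G(t) = int_{t0}^t gamma(u) du.
# For gamma(t) = alpha/t we have exp(G(t)) = (t/t0)**alpha.
alpha, t0 = 3.0, 1.0

def Gamma(t):
    tail, _ = quad(lambda s: (s / t0) ** (-alpha), t, np.inf)
    return (t / t0) ** alpha * tail

c = 1.0 / alpha                                   # since gamma'/gamma^2 = -1/alpha
def prediction(t):
    return t / ((1.0 - c) * alpha)                # 1/((1-c)*gamma(t)) = t/(alpha-1)

print(Gamma(50.0), prediction(50.0))  # both close to 25.0
```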

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Ren, H., Ge, B. & Zhuge, X. Fast Convergence of Inertial Gradient Dynamics with Multiscale Aspects. J Optim Theory Appl 196, 461–489 (2023). https://doi.org/10.1007/s10957-022-02124-w

