Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise

  • Chapter
  • First Online:
Trends in Nonlinear and Adaptive Control

Part of the book series: Lecture Notes in Control and Information Sciences ((LNCIS,volume 488))

Abstract

This chapter studies the robustness of reinforcement learning for discrete-time linear stochastic systems with multiplicative noise evolving in continuous state and action spaces. The robustness of policy iteration, one of the popular methods in reinforcement learning, is a longstanding open problem for the stochastic linear quadratic regulator (LQR) problem with multiplicative noise. A solution in the spirit of input-to-state stability is given, guaranteeing that the solutions of the policy iteration algorithm are bounded and enter a small neighborhood of the optimal solution whenever the error in each iteration is bounded and small. In addition, a novel off-policy multiple-trajectory optimistic least-squares policy iteration algorithm is proposed to learn a near-optimal solution of the stochastic LQR problem directly from online input/state data, without explicitly identifying the system matrices. The efficacy of the proposed algorithm is supported by rigorous convergence analysis and numerical results on a second-order example.

Dedicated to Laurent Praly, a beautiful mind


References

  1. Abbasi-Yadkori, Y., Lazic, N., Szepesvari, C.: Model-free linear quadratic control via reduction to expert prediction. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2019)

  2. Agarwal, R.P.: Difference Equations and Inequalities: Theory, Methods, and Applications, 2nd edn. Marcel Dekker, New York (2000)

  3. Athans, M., Ku, R., Gershwin, S.: The uncertainty threshold principle: some fundamental limitations of optimal decision making under dynamic uncertainty. IEEE Trans. Autom. Control 22(3), 491–495 (1977)

  4. Beghi, A., D'Alessandro, D.: Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations. Automatica 34(8), 1031–1034 (1998)

  5. Bertsekas, D.P.: Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9(3), 310–335 (2011)

  6. Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific, Belmont, Massachusetts (2019)

  7. Bian, T., Jiang, Z.P.: Continuous-time robust dynamic programming. SIAM J. Control Optim. 57(6), 4150–4174 (2019)

  8. Bian, T., Wolpert, D.M., Jiang, Z.P.: Model-free robust optimal feedback mechanisms of biological motor control. Neural Comput. 32(3), 562–595 (2020)

  9. Bitmead, R.R., Gevers, M., Wertz, V.: Adaptive Optimal Control: The Thinking Man's GPC. Prentice-Hall, Englewood Cliffs, New Jersey (1990)

  10. Breakspear, M.: Dynamic models of large-scale brain activity. Nat. Neurosci. 20(3), 340–352 (2017)

  11. Bryson, A.E., Ho, Y.C.: Applied Optimal Control: Optimization, Estimation and Control. Taylor & Francis (1975)

  12. Buşoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I.: Reinforcement learning for control: performance, stability, and deep approximators. Annu. Rev. Control 46, 8–28 (2018)

  13. Coppens, P., Patrinos, P.: Sample complexity of data-driven stochastic LQR with multiplicative uncertainty. In: The 59th IEEE Conference on Decision and Control (CDC), pp. 6210–6215 (2020)

  14. Coppens, P., Schuurmans, M., Patrinos, P.: Data-driven distributionally robust LQR with multiplicative noise. In: Learning for Dynamics and Control (L4DC), pp. 521–530. PMLR (2020)

  15. De Koning, W.L.: Infinite horizon optimal control of linear discrete time systems with stochastic parameters. Automatica 18(4), 443–453 (1982)

  16. De Koning, W.L.: Compensatability and optimal compensation of systems with white parameters. IEEE Trans. Autom. Control 37(5), 579–588 (1992)

  17. Drenick, R., Shaw, L.: Optimal control of linear plants with random parameters. IEEE Trans. Autom. Control 9(3), 236–244 (1964)

  18. Du, K., Meng, Q., Zhang, F.: A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: convergence and stabilization. arXiv preprint arXiv:2011.04970 (2020)

  19. Duncan, T.E., Guo, L., Pasik-Duncan, B.: Adaptive continuous-time linear quadratic Gaussian control. IEEE Trans. Autom. Control 44(9), 1653–1662 (1999)

  20. Gravell, B., Esfahani, P.M., Summers, T.: Learning robust controllers for linear quadratic systems with multiplicative noise via policy gradient. IEEE Trans. Autom. Control (2019)

  21. Gravell, B., Esfahani, P.M., Summers, T.: Robust control design for linear systems via multiplicative noise. arXiv preprint arXiv:2004.08019 (2020)

  22. Gravell, B., Ganapathy, K., Summers, T.: Policy iteration for linear quadratic games with stochastic parameters. IEEE Control Syst. Lett. 5(1), 307–312 (2020)

  23. Guo, Y., Summers, T.H.: A performance and stability analysis of low-inertia power grids with stochastic system inertia. In: American Control Conference (ACC), pp. 1965–1970 (2019)

  24. Hespanha, J.P., Naghshtabrizi, P., Xu, Y.: A survey of recent results in networked control systems. Proceedings of the IEEE 95(1), 138–162 (2007)

  25. Hewer, G.: An iterative technique for the computation of the steady state gains for the discrete optimal regulator. IEEE Trans. Autom. Control (1971)

  26. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (2012)

  27. Jiang, Y., Jiang, Z.P.: Adaptive dynamic programming as a theory of sensorimotor control. Biol. Cybern. 108(4), 459–473 (2014)

  28. Jiang, Y., Jiang, Z.P.: Robust Adaptive Dynamic Programming. Wiley, Hoboken, New Jersey (2017)

  29. Jiang, Z.P., Bian, T., Gao, W.: Learning-based control: a tutorial and some recent results. Found. Trends Syst. Control 8(3), 176–284 (2020)

  30. Jiang, Z.P., Lin, Y., Wang, Y.: Nonlinear small-gain theorems for discrete-time feedback systems and applications. Automatica 40(12), 2129–2136 (2004)

  31. Kamalapurkar, R., Walters, P., Rosenfeld, J., Dixon, W.: Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Springer (2018)

  32. Kantorovich, L.V., Akilov, G.P.: Functional Analysis in Normed Spaces. Macmillan, New York (1964)

  33. Kiumarsi, B., Vamvoudakis, K.G., Modares, H., Lewis, F.L.: Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2042–2062 (2018)

  34. Knopp, K.: Theory and Application of Infinite Series, 2nd edn. Dover Publications, New York (1990)

  35. Lai, J., Xiong, J., Shu, Z.: Model-free optimal control of discrete-time systems with additive and multiplicative noises. arXiv preprint arXiv:2008.08734 (2020)

  36. Levine, S., Koltun, V.: Continuous inverse optimal control with locally optimal examples. In: International Conference on Machine Learning (ICML) (2012)

  37. Levine, S., Kumar, A., Tucker, G., Fu, J.: Offline reinforcement learning: tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643 (2020)

  38. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. In: International Conference on Learning Representations (ICLR) (2016)

  39. Ljung, L.: System Identification: Theory for the User, 2nd edn. Prentice Hall PTR, Upper Saddle River (1999)

  40. Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York (2007)

  41. Monfort, M., Liu, A., Ziebart, B.D.: Intent prediction and trajectory forecasting via predictive inverse linear-quadratic regulation. In: AAAI Conference on Artificial Intelligence (AAAI) (2015)

  42. Morozan, T.: Stabilization of some stochastic discrete-time control systems. Stoch. Anal. Appl. 1(1), 89–116 (1983)

  43. Pang, B., Bian, T., Jiang, Z.P.: Robust policy iteration for continuous-time linear quadratic regulation. IEEE Trans. Autom. Control (2020)

  44. Pang, B., Jiang, Z.P.: Robust reinforcement learning: a case study in linear quadratic regulation. In: AAAI Conference on Artificial Intelligence (AAAI) (2020)

  45. Powell, W.B.: From reinforcement learning to optimal control: a unified framework for sequential decisions. arXiv preprint arXiv:1912.03513 (2019)

  46. Praly, L., Lin, S.-F., Kumar, P.R.: A robust adaptive minimum variance controller. SIAM J. Control Optim. 27(2), 235–266 (1989)

  47. Rami, M.A., Chen, X., Zhou, X.Y.: Discrete-time indefinite LQ control with state and control dependent noises. J. Glob. Optim. 23(3), 245–265 (2002)

  48. Åström, K.J., Wittenmark, B.: Adaptive Control, 2nd edn. Addison-Wesley, Reading, Massachusetts (1995)

  49. Sontag, E.D.: Input to state stability: basic concepts and results. In: Nonlinear and Optimal Control Theory, vol. 1932, pp. 163–220. Springer, Berlin (2008)

  50. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, Massachusetts (2018)

  51. Sutton, R.S., Barto, A.G., Williams, R.J.: Reinforcement learning is direct adaptive optimal control. IEEE Control Syst. Mag. 12(2), 19–22 (1992)

  52. Tiedemann, A., De Koning, W.: The equivalent discrete-time optimal control problem for continuous-time systems with stochastic parameters. Int. J. Control 40(3), 449–466 (1984)

  53. Todorov, E.: Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Comput. 17(5), 1084–1108 (2005)

  54. Tu, S., Recht, B.: The gap between model-based and model-free methods on the linear quadratic regulator: an asymptotic viewpoint. In: Conference on Learning Theory (COLT) (2019)

  55. Xing, Y., Gravell, B., He, X., Johansson, K.H., Summers, T.: Linear system identification under multiplicative noise from multiple trajectory data. In: American Control Conference (ACC), pp. 5157–5261 (2020)

  56. Huang, Y., Zhang, W., Zhang, H.: Infinite horizon LQ optimal control for discrete-time stochastic systems. In: The 6th World Congress on Intelligent Control and Automation (WCICA), vol. 1, pp. 252–256 (2006)

Acknowledgements

Confucius once said, "Virtue is not left to stand alone. He who practices it will have neighbors." Laurent Praly, the former PhD advisor of the second-named author, is such a beautiful mind. His vision for, and seminal contributions to, control theory, especially nonlinear and adaptive control, have influenced generations of students, including the authors of this chapter. ZPJ is privileged to have had Laurent as his PhD advisor during 1989–1993 and is very grateful to Laurent for introducing him to the field of nonlinear control. It was under Laurent's close guidance that ZPJ started, in 1991, working on the stability and control of interconnected nonlinear systems, work that laid the foundation for the nonlinear small-gain theory. The research findings presented here are a reflection of Laurent's vision of the relationships between control and learning. We also thank the U.S. National Science Foundation for its continuous financial support.

Author information

Correspondence to Bo Pang.

Appendices

Appendix 1

The following lemma gives the relationship between the operations \({{\,\mathrm{vec}\,}}(\cdot )\) and \({{\,\mathrm{svec}\,}}(\cdot )\).

Lemma 9.5

([40, Page 57]) For \(X\in \mathbb {S}^n\), there exists a unique matrix \(D_n\in \mathbb {R}^{n^2\times \frac{1}{2}n(n+1)}\) with full column rank, such that

$${{\,\mathrm{vec}\,}}(X) = D_n{{\,\mathrm{svec}\,}}(X),\quad {{\,\mathrm{svec}\,}}(X) = D_n^\dagger {{\,\mathrm{vec}\,}}(X).$$

\(D_n\) is called the duplication matrix.
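As a concrete numerical illustration (a minimal numpy sketch, not from the chapter; it assumes the unscaled half-vectorization convention of [40], in which \({{\,\mathrm{svec}\,}}\) stacks the lower triangle of \(X\) columnwise), \(D_n\) can be constructed explicitly and both identities checked:

```python
import numpy as np

def duplication_matrix(n):
    # Build D_n with vec(X) = D_n @ svec(X) for symmetric X, using the
    # unscaled half-vectorization convention of Magnus & Neudecker [40].
    cols = n * (n + 1) // 2
    D = np.zeros((n * n, cols))
    idx, k = {}, 0
    for j in range(n):              # lower triangle, column by column
        for i in range(j, n):
            idx[(i, j)] = k
            k += 1
    for j in range(n):              # vec stacks the columns of X
        for i in range(n):
            D[j * n + i, idx[(max(i, j), min(i, j))]] = 1.0
    return D

def svec(X):
    # Stack the lower triangle of X columnwise (unscaled convention).
    n = X.shape[0]
    return np.concatenate([X[j:, j] for j in range(n)])

n = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
X = A + A.T                         # a symmetric test matrix
D = duplication_matrix(n)
# vec(X) = D_n svec(X)  and  svec(X) = D_n^+ vec(X)
assert np.allclose(X.flatten(order="F"), D @ svec(X))
assert np.allclose(svec(X), np.linalg.pinv(D) @ X.flatten(order="F"))
print("duplication matrix identities hold; rank =", np.linalg.matrix_rank(D))
```

Note that \(D_n\) indeed has full column rank \(\tfrac{1}{2}n(n+1)\), so the pseudoinverse \(D_n^\dagger \) recovers \({{\,\mathrm{svec}\,}}(X)\) exactly.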

Lemma 9.6

([44, Lemma A.3]) Let \(\mathcal {O}\) be a compact set such that \(\rho (O)<1\) for every \(O\in \mathcal {O}\). Then there exist an \(a_0>0\) and a \(0<b_0<1\) such that

$$\Vert O^k\Vert _2\le a_0 b_0^k,\quad \forall k\in \mathbb {Z}_+$$

for any \(O\in \mathcal {O}\).
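The lemma can be illustrated numerically (an illustrative sketch, not a proof; the matrix below is a hypothetical example). Even when \(\Vert O\Vert _2>1\), Schur stability \(\rho (O)<1\) forces the powers under a geometric envelope \(a_0 b_0^k\):

```python
import numpy as np

# A Schur-stable matrix whose spectral norm exceeds 1: powers may grow
# transiently, but eventually decay geometrically, as Lemma 9.6 asserts.
O = np.array([[0.9, 1.0],
              [0.0, 0.8]])
rho = max(abs(np.linalg.eigvals(O)))       # spectral radius = 0.9
b0 = 0.95                                  # any b0 with rho < b0 < 1 works
norms = [np.linalg.norm(np.linalg.matrix_power(O, k), 2) for k in range(60)]
a0 = max(nk / b0**k for k, nk in enumerate(norms))  # smallest valid a0 here
assert rho < b0 < 1
assert all(nk <= a0 * b0**k + 1e-9 for k, nk in enumerate(norms))
print(f"rho = {rho:.2f}; envelope a0 = {a0:.2f}, b0 = {b0}")
```

The compactness assumption in the lemma is what makes a single pair \((a_0,b_0)\) work uniformly over all of \(\mathcal {O}\); the sketch exhibits such a pair for one matrix.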

For \(X\in \mathbb {R}^{n\times n}\), \(Y\in \mathbb {R}^{n\times m}\), \(X+\Delta X\in \mathbb {R}^{n\times n}\), and \(Y + \Delta Y\in \mathbb {R}^{n\times m}\), with X and \(X + \Delta X\) invertible, the following inequality is used repeatedly:

$$\begin{aligned} \begin{aligned}&\Vert X^{-1}Y - (X+\Delta X)^{-1}(Y+\Delta Y)\Vert _F=\left\| X^{-1}Y - X^{-1}(Y+\Delta Y)\right. \\&\left. + X^{-1}(Y+\Delta Y) - (X+\Delta X)^{-1}(Y+\Delta Y)\right\| _F\\&= \Vert -X^{-1}\Delta Y + X^{-1}\Delta X(X + \Delta X)^{-1}(Y+\Delta Y)\Vert _F\\&\le \Vert X^{-1}\Vert _F\Vert \Delta Y\Vert _F + \Vert X^{-1}\Vert _F\Vert (X + \Delta X)^{-1}\Vert _F\Vert (Y+\Delta Y)\Vert _F\Vert \Delta X\Vert _F. \end{aligned} \end{aligned}$$
(9.23)
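A quick numerical spot-check of (9.23) on random data (illustrative only; the dimensions and perturbation sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
X = rng.standard_normal((n, n)) + 5 * np.eye(n)   # well away from singular
Y = rng.standard_normal((n, m))
dX = 0.01 * rng.standard_normal((n, n))
dY = 0.01 * rng.standard_normal((n, m))

# Left-hand side of (9.23): || X^{-1} Y - (X+dX)^{-1} (Y+dY) ||_F
lhs = np.linalg.norm(np.linalg.solve(X, Y)
                     - np.linalg.solve(X + dX, Y + dY), "fro")
# Right-hand side of (9.23), built term by term.
Xi = np.linalg.inv(X)
XdXi = np.linalg.inv(X + dX)
rhs = (np.linalg.norm(Xi, "fro") * np.linalg.norm(dY, "fro")
       + np.linalg.norm(Xi, "fro") * np.linalg.norm(XdXi, "fro")
         * np.linalg.norm(Y + dY, "fro") * np.linalg.norm(dX, "fro"))
assert lhs <= rhs
print(f"perturbation bound (9.23): {lhs:.4e} <= {rhs:.4e}")
```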

Appendix 2

The following property of \(\mathcal {L}_{K}(\cdot )\) is useful.

Lemma 9.7

If K is mean-square stabilizing, then \(\mathcal {L}_K(Y_1)\le \mathcal {L}_K(Y_2)\implies Y_1\ge Y_2\), where \(Y_1,Y_2\in \mathbb {S}^n\).

Proof

Let \(\{x_t\}_{t=0}^\infty \) be the solution of the closed-loop system (9.1) with the controller \(u=-Kx\). Then for any \(t\ge 0\)

$$\begin{aligned} \mathbb {E}[x_{t+1}^TY_1x_{t+1}-x_{t}^TY_1x_{t}]&= \mathbb {E}[x_t^T\mathcal {L}_K(Y_1)x_t]\\&\le \mathbb {E}[x_t^T\mathcal {L}_K(Y_2)x_t]=\mathbb {E}[x_{t+1}^TY_2x_{t+1}-x_{t}^TY_2x_{t}]. \end{aligned}$$

Since K is mean-square stabilizing,

$$-x^T_0Y_1x_0=\sum _{t=0}^\infty \mathbb {E}[ x_t^T\mathcal {L}_K(Y_1)x_t]\le \sum _{t=0}^\infty \mathbb {E}[x_t^T\mathcal {L}_K(Y_2)x_t]=-x^T_0Y_2x_0.$$

The proof is complete because \(x_0\) is arbitrary. \(\square \)

Now we are ready to prove Theorem 9.1.

Proof

(Theorem 9.1) By (9.7) and (9.8), for any \(x\in \mathbb {R}^n\),

$$K_2 \in \displaystyle \mathop {{\text {arg}}\,{\text {min}}}_{K\in \mathbb {R}^{m\times n}} \{x^T\mathcal {H}(P_1,K)x\}.$$

Thus \(\mathcal {H}(P_1,K_2)\le 0\). By definition, \(P_1>0\) and

$$\mathcal {L}_{K_2}(P_1) \le -S - K_2^TRK_2 <0.$$

Then Lemma 9.1 implies that \(K_2\) is mean-square stabilizing. Inserting (9.7) into the above inequality yields \(\mathcal {L}_{K_2}(P_1) \le \mathcal {L}_{K_2}(P_2)\). This implies \(P_1\ge P_2\) by Lemma 9.7. An application of mathematical induction proves the first two items. For the last item, by a theorem on the convergence of a monotone sequence of self-adjoint operators (see [32, Pages 189–190]), \(\lim _{i\rightarrow \infty } P_i\) and \(\lim _{i\rightarrow \infty } K_i\) exist. Letting \(i\rightarrow \infty \) in (9.7) and (9.8), and eliminating \(K_\infty \) in (9.7) using (9.8), we have

$$\begin{aligned} P_\infty = S + \Pi (P_\infty ) - A^TP_\infty B(R + \Sigma (P_\infty ))^{-1}B^TP_\infty A. \end{aligned}$$

The proof is complete by the uniqueness of \(P^*\). \(\square \)
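In the noiseless special case \(\Pi \equiv \Sigma \equiv 0\), the iteration (9.7)–(9.8) reduces to Hewer's policy iteration [25]. The following minimal numpy sketch (with hypothetical second-order system matrices, not from the chapter) illustrates the convergence to the Riccati solution asserted by the theorem:

```python
import numpy as np

# Hewer-style policy iteration for the noiseless special case of
# (9.7)-(9.8): evaluate P_i via a Lyapunov equation, then improve K.
# System matrices below are illustrative only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
S = np.eye(2)                       # state weight
R = np.eye(1)                       # input weight
K = np.array([[1.0, 5.0]])          # a stabilizing initial gain

def evaluate(K):
    # Solve P = S + K^T R K + (A - BK)^T P (A - BK) by vectorization:
    # vec(Ac^T P Ac) = (Ac^T kron Ac^T) vec(P).
    Ac = A - B @ K
    n = A.shape[0]
    M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    q = (S + K.T @ R @ K).flatten(order="F")
    return np.linalg.solve(M, q).reshape(n, n, order="F")

for _ in range(20):
    P = evaluate(K)                                       # policy evaluation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # policy improvement

# The limit P should satisfy the discrete-time algebraic Riccati equation.
res = S + A.T @ P @ A - P - A.T @ P @ B @ np.linalg.solve(
    R + B.T @ P @ B, B.T @ P @ A)
assert np.linalg.norm(res) < 1e-8
print("DARE residual norm:", float(np.linalg.norm(res)))
```

The monotone decrease \(P_1\ge P_2\ge \cdots \) proved above is what guarantees the iteration cannot oscillate; in practice it converges quadratically from any stabilizing initial gain.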

Appendix 3

Proof

(Lemma 9.2) Since \(\mathscr {K}(P^*)\) is mean-square stabilizing, by continuity there exists a \(\bar{\delta }_0>0\) such that \(\mathscr {R}(P_i)\) is invertible and \(\mathscr {K}(P_i)\) is mean-square stabilizing for all \(P_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\). Suppose \(P_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\). Subtracting

$$K_{i+1}^TB^TP^*A+A^TP^*BK_{i+1}-K_{i+1}^T\mathscr {R}(P^*)K_{i+1}$$

from both sides of the GARE (9.4) yields

$$\begin{aligned} \begin{aligned}&\mathcal {L}_{\mathscr {K}(P_i)}(P^*) = - S - \mathscr {K}^T(P_i)R\mathscr {K}(P_i) + \\&(\mathscr {K}(P_i)-\mathscr {K}(P^*))^T\mathscr {R}(P^*)(\mathscr {K}(P_i)-\mathscr {K}(P^*)). \end{aligned} \end{aligned}$$
(9.24)

Subtracting (9.24) from (9.12), we have

$$\begin{aligned} \begin{aligned} P_{i+1}-P^* = -\mathcal {L}^{-1}_{\mathscr {K}(P_i)}\left( ((\mathscr {K}(P_i)-\mathscr {K}(P^*))^T\mathscr {R}(P^*)(\mathscr {K}(P_i)-\mathscr {K}(P^*))\right) . \end{aligned} \end{aligned}$$

Taking norms on both sides of the above equation and applying (9.11) yields

$$\begin{aligned} \begin{aligned} \Vert P_{i+1}-P^* \Vert _F \le \Vert \mathscr {A}(\mathscr {K}(P_i))^{-1}\Vert _2 \Vert \mathscr {R}(P^*)\Vert _F\Vert \mathscr {K}(P_i)-\mathscr {K}(P^*)\Vert _F^2. \end{aligned} \end{aligned}$$

Since \(\mathscr {K}(\cdot )\) is locally Lipschitz continuous at \(P^*\), by continuity of matrix norm and matrix inverse, there exists a \(c_1>0\), such that

$$\Vert P_{i+1}-P^* \Vert _F \le c_1\Vert P_i - P^*\Vert _F^2,\quad \forall P_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*).$$

So for any \(0<\sigma <1\), there exists a \(\bar{\delta }_0\ge \delta _0>0\) with \(c_1\delta _0\le \sigma \). This completes the proof. \(\square \)

Appendix 4

Before the proof of Lemma 9.3, some auxiliary lemmas are first proved. Procedure 9.2 exhibits a singularity if \([\hat{G}_{i}]_{uu}\) in (9.10) is singular, or if the cost (9.2) of \(\hat{K}_{i+1}\) is infinite. The following lemma shows that if \(\Delta G_i\) is small, no singularity occurs. Let \(\bar{\delta }_0\) be as defined in the proof of Lemma 9.2; then \(\delta _0\le \bar{\delta }_0\).

Lemma 9.8

For any \(\tilde{P}_i\in \mathcal {B}_{\delta _0}(P^*)\), there exists a \(d(\delta _0)>0\), independent of \(\tilde{P}_i\), such that \(\hat{K}_{i+1}\) is mean-square stabilizing and \([\hat{G}_{i}]_{uu}\) is invertible, if \(\Vert \Delta G_i\Vert _F\le d\).

Proof

Since \(\mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\) is compact and \(\mathscr {A}(\mathscr {K}(\cdot ))\) is a continuous function, the set

$$\mathcal {S} = \{\mathscr {A}(\mathscr {K}(\tilde{P}_i))\vert \tilde{P}_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\}$$

is also compact. By continuity and Lemma 9.1, for each \(X\in \mathcal {S}\) there exists an \(r(X)>0\) such that \(\rho (Y+I_n\otimes I_n)<1\) for any \(Y\in \mathcal {B}_{r(X)}(X)\). The compactness of \(\mathcal {S}\) implies the existence of an \(\underline{r}>0\) such that \(\rho (Y+I_n\otimes I_n)<1\) for each \(Y\in \mathcal {B}_{\underline{r}}(X)\) and all \(X\in \mathcal {S}\). Similarly, there exists a \(d_1>0\) such that \([\hat{G}_{i}]_{uu}\) is invertible for all \(\tilde{P}_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\), if \(\Vert \Delta G_i\Vert _F\le d_1\). Note that in the policy improvement step of Procedure 9.1 (the policy update step in Procedure 9.2), the improved policy \(\tilde{K}_{i+1}=[\tilde{G}_{i}]_{uu}^{-1}[\tilde{G}_{i}]_{ux}\) (the updated policy \(\hat{K}_{i+1}\)) is a continuous function of \(\tilde{G}_i\) (\(\hat{G}_i\)), and there exists a \(0<d_2\le d_1\) such that \(\mathscr {A}(\hat{K}_{i+1})\in \mathcal {B}_{\underline{r}}(\mathscr {A}(\mathscr {K}(\tilde{P}_i)))\) for all \(\tilde{P}_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\), if \(\Vert \Delta G_i\Vert _F\le d_2\). Thus, Lemma 9.1 implies that \(\hat{K}_{i+1}\) is mean-square stabilizing. Setting \(d=d_2\) completes the proof. \(\square \)

By Lemma 9.8, if \(\Vert \Delta G_i\Vert _F\le d\), the sequence \(\{\tilde{P}_i\}_{i=0}^{\infty }\) satisfies (9.13). For simplicity, we denote \(\mathcal {E}(\tilde{G}_i,\Delta G_i)\) in (9.13) by \(\mathcal {E}_i\). The following lemma gives an upper bound on \(\Vert \mathcal {E}_i\Vert _F\) in terms of \(\Vert \Delta G_i\Vert _F\).

Lemma 9.9

For any \(\tilde{P}_i\in \mathcal {B}_{\delta _0}(P^*)\) and any \(c_2>0\), there exists a \(0<\delta _1^1(\delta _0,c_2)\le d\), independent of \(\tilde{P}_i\), where d is defined in Lemma 9.8, such that

$$\Vert \mathcal {E}_i\Vert _F\le c_3\Vert \Delta G_i\Vert _F<c_2,$$

if \(\Vert \Delta G_i\Vert _F<\delta _1^1\), where \(c_3(\delta _0)>0\).

Proof

For any \(\tilde{P}_i\in \bar{\mathcal {B}}_{\delta _0}(P^*)\), \(\Vert \Delta G_i\Vert _F\le d\), we have from (9.23)

$$\begin{aligned} \Vert \mathscr {K}(\tilde{P}_i) - \hat{K}_{i+1}\Vert _F&\le \Vert [\tilde{G}_{i}]^{-1}_{uu}\Vert _F(1 + \Vert [\hat{G}_{i}]^{-1}_{uu}\Vert _F \Vert [\hat{G}_{i}]_{ux}\Vert _F)\Vert \Delta G_i\Vert _F \nonumber \\&\le c_4(\delta _0,d)\Vert \Delta G_i\Vert _F, \end{aligned}$$
(9.25)

where the last inequality comes from the continuity of the matrix inverse and the extreme value theorem. Define

$$\begin{aligned} \begin{aligned} \check{P}_{i} = \mathcal {L}_{\hat{K}_{i+1}}^{-1}\left( -S-\hat{K}^T_{i+1}R\hat{K}_{i+1}\right) ,\quad \mathring{P}_{i} = \mathcal {L}_{\mathscr {K}(\tilde{P}_i)}^{-1}\left( - S - \mathscr {K}(\tilde{P}_i)^TR\mathscr {K}(\tilde{P}_i)\right) . \end{aligned} \end{aligned}$$

Then by (9.11) and (9.13),

$$\begin{aligned} \begin{aligned} \Vert \mathcal {E}_i\Vert _F&= \Vert {{\,\mathrm{vec}\,}}(\check{P}_{i}- \mathring{P}_{i})\Vert _2, \\ {{\,\mathrm{vec}\,}}(\check{P}_{i})&= \mathscr {A}^{-1}\left( \hat{K}_{i+1}\right) {{\,\mathrm{vec}\,}}\left( -S-\hat{K}^T_{i+1}R\hat{K}_{i+1}\right) , \\ {{\,\mathrm{vec}\,}}(\mathring{P}_{i})&= \mathscr {A}^{-1}\left( \mathscr {K}(\tilde{P}_i)\right) {{\,\mathrm{vec}\,}}\left( - S - \mathscr {K}(\tilde{P}_i)^TR\mathscr {K}(\tilde{P}_i)\right) . \end{aligned} \end{aligned}$$

Define

$$\begin{aligned} \begin{aligned} \Delta \mathscr {A}_i&= \mathscr {A}\left( \mathscr {K}(\tilde{P}_i)\right) - \mathscr {A}\left( \hat{K}_{i+1}\right) ,\quad \Delta b_i = {{\,\mathrm{vec}\,}}\left( \mathscr {K}(\tilde{P}_i)^TR\mathscr {K}(\tilde{P}_i) - \hat{K}_{i+1}^TR\hat{K}_{i+1}\right) . \end{aligned} \end{aligned}$$

Using (9.25), it is easy to check that \(\Vert \Delta \mathscr {A}_i\Vert _F\le c_5\Vert \Delta G_i\Vert _F\), \(\Vert \Delta b_i\Vert _2\le c_6\Vert \Delta G_i\Vert _F\), for some \(c_5(\delta _0,d)>0\), \(c_6(\delta _0,d)>0\). Then by (9.23)

$$\begin{aligned} \begin{aligned} \Vert \mathcal {E}_i \Vert _F&\le \left\| \mathscr {A}^{-1}\left( \hat{K}_{i+1}\right) \right\| _F \left( c_6 + c_5\left\| \mathscr {A}^{-1}\left( \mathscr {K}(\tilde{P}_i)\right) \right\| _F\right. \\&\left. \times \left\| S + \mathscr {K}(\tilde{P}_i)^TR\mathscr {K}(\tilde{P}_i)\right\| _F\right) \Vert \Delta G_i\Vert _F \le c_3(\delta _0)\Vert \Delta G_i\Vert _F, \end{aligned} \end{aligned}$$

where the last inequality comes from the continuity of the matrix inverse and Lemma 9.8. Choosing \(0<\delta ^1_1\le d\) such that \(c_3\delta ^1_1<c_2\) completes the proof. \(\square \)

Now we are ready to prove Lemma 9.3.

Proof

(Lemma 9.3) Set \(c_2=(1-\sigma )\delta _0\) in Lemma 9.9, and let \(\delta _1\) be the \(\delta _1^1\) associated with this \(c_2\). For any \(i\in \mathbb {Z}_+\), if \(\tilde{P}_i\in \mathcal {B}_{\delta _0}(P^*)\), then \([\hat{G}_i]_{uu}\) is invertible, \(\hat{K}_{i+1}\) is mean-square stabilizing and

$$\begin{aligned} \Vert \tilde{P}_{i+1} - P^*\Vert _F&\le \Vert \mathcal {E}_i\Vert _F +\nonumber \left\| \mathcal {L}^{-1}_{\mathscr {K}(\tilde{P}_i)}(S+\tilde{P}^T_iBR^{-1}B^T\tilde{P}_i) - P^* \right\| _F \nonumber \\&\le \sigma \Vert \tilde{P}_i - P^*\Vert _F + c_3\Vert \Delta G_i\Vert _F \end{aligned}$$
(9.26)
$$\begin{aligned}&\le \sigma \Vert \tilde{P}_i - P^*\Vert _F + c_3\Vert \Delta G\Vert _\infty \end{aligned}$$
(9.27)
$$\begin{aligned}&<\sigma \delta _0 + c_3\delta _1<\sigma \delta _0 + c_2 =\delta _0, \end{aligned}$$
(9.28)

where (9.26) and (9.28) are due to Lemmas 9.2 and 9.9, respectively. By induction, (9.26)–(9.28) hold for all \(i\in \mathbb {Z}_+\); thus by (9.27),

$$\begin{aligned} \begin{aligned} \Vert \tilde{P}_{i} - P^*\Vert _F&\le \sigma ^2\Vert \tilde{P}_{i-2}-P^*\Vert _F + (\sigma + 1)c_3\Vert \Delta G\Vert _\infty \\&\le \cdots \le \sigma ^{i}\Vert \tilde{P}_{0}-P^*\Vert _F + (1+ \cdots + \sigma ^{i-1})c_3\Vert \Delta G\Vert _\infty \\&<\sigma ^{i}\Vert \tilde{P}_{0}-P^*\Vert _F + \frac{c_3}{1-\sigma }\Vert \Delta G\Vert _\infty , \end{aligned} \end{aligned}$$

which proves (i) and (ii) in Lemma 9.3. Then (9.25) implies (iii) in Lemma 9.3.

In terms of (iv) in Lemma 9.3, for any \(\epsilon >0\) there exists an \(i_1\in \mathbb {Z}_+\) such that \(\sup \{\Vert \Delta G_i\Vert _F\}_{i=i_1}^\infty <\gamma ^{-1}(\epsilon /2)\). Take \(i_2\ge i_1\). For \(i\ge i_2\), we have by (ii) in Lemma 9.3,

$$\begin{aligned} \begin{aligned} \Vert \tilde{P}_{i}-P^*\Vert _F\le \beta (\Vert \tilde{P}_{i_2}-P^*\Vert _F,i-i_2) + \epsilon /2 \le \beta (c_7,i-i_2) + \epsilon /2, \end{aligned} \end{aligned}$$

where the second inequality is due to the boundedness of \(\tilde{P}_i\). Since \(\lim _{i\rightarrow \infty }\beta (c_7,i-i_2)=0\), there is an \(i_3\ge i_2\) such that \(\beta (c_7,i-i_2)<\epsilon /2\) for all \(i\ge i_3\), which completes the proof. \(\square \)
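The ISS-style estimate in items (i)–(ii) boils down to unrolling a scalar recursion of the form \(e_{i+1}\le \sigma e_i + c_3 d\) with \(0<\sigma <1\); a minimal numerical illustration (the constants below are arbitrary, for intuition only):

```python
# Scalar illustration of the ISS-style bound in Lemma 9.3: the worst-case
# recursion e_{i+1} = sigma*e_i + c3*d unrolls to
#   e_i <= sigma^i * e_0 + c3*d / (1 - sigma),
# i.e. geometric decay of the initial error plus a disturbance-sized offset.
sigma, c3, d, e0 = 0.6, 2.0, 0.05, 10.0
e = e0
for i in range(1, 51):
    e = sigma * e + c3 * d                  # worst-case error propagation
    assert e <= sigma**i * e0 + c3 * d / (1 - sigma) + 1e-12
print(f"error after 50 steps: {e:.6f}; asymptotic bound: {c3 * d / (1 - sigma):.6f}")
```

As in the lemma, the error enters a neighborhood of size proportional to the disturbance \(d\), no matter how large the initial error \(e_0\) is.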

Appendix 5

Notice that all the conclusions of Theorem 9.2 follow from Lemma 9.3 if

$$\delta _2<\min (\gamma ^{-1}(\epsilon ),\delta _1),\quad \tilde{P}_1\in \mathcal {B}_{\delta _0}(P^*)$$

for Procedure 9.2. Thus, the proof of Theorem 9.2 reduces to the proof of the following lemma.

Lemma 9.10

Given a mean-square stabilizing \(\hat{K}_1\), there exist \(0<\delta _2<\min (\gamma ^{-1}(\epsilon ),\delta _1)\), \(\bar{i}\in \mathbb {Z}_+\), \(\alpha _2>0\), and \(\kappa _2>0\), such that \([\hat{G}_i]_{uu}\) is invertible, \(\hat{K}_i\) is mean-square stabilizing, \(\Vert \tilde{P}_i\Vert _F<\alpha _2\), \(\Vert \hat{K}_i\Vert _F<\kappa _2\), \(i=1,\cdots ,\bar{i}\), \(\tilde{P}_{\bar{i}}\in \mathcal {B}_{\delta _0}(P^*)\), as long as \(\Vert \Delta G\Vert _\infty <\delta _2\).

The next two lemmas state that, under certain conditions on \(\Vert \Delta G_i\Vert _F\), each element of \(\{\hat{K}_i\}_{i=1}^{\bar{i}}\) is mean-square stabilizing, each element of \(\{[\hat{G}_i]_{uu}\}_{i=1}^{\bar{i}}\) is invertible, and \(\{\tilde{P}_i\}_{i=1}^{\bar{i}}\) is bounded. For simplicity, in the following we assume \(S>I_n\) and \(R>I_m\); all the proofs still work for any \(S>0\) and \(R>0\), after suitable rescaling.

Lemma 9.11

If \(\hat{K}_i\) is mean-square stabilizing, then \([\hat{G}_i]_{uu}\) is nonsingular and \(\hat{K}_{i+1}\) is mean-square stabilizing, as long as \(\Vert \Delta G_i\Vert _F < a_i\), where

$$\begin{aligned} a_i=\left( m(\sqrt{n} + \Vert \hat{K}_{i}\Vert _2)^2+m(\sqrt{n} + \Vert \hat{K}_{i+1}\Vert _2)^2\right) ^{-1}. \end{aligned}$$

Furthermore,

$$\begin{aligned} \Vert \hat{K}_{i+1}\Vert _F \le 2\Vert R^{-1}\Vert _F(1 + \Vert B^T\tilde{P}_iA\Vert _F). \end{aligned}$$
(9.29)

Proof

By definition,

$$\Vert [\tilde{G}_i]_{uu}^{-1}([\hat{G}_i]_{uu}-[\tilde{G}_i]_{uu})\Vert _F < a_i\Vert [\tilde{G}_i]_{uu}^{-1}\Vert _F.$$

Since \(R>I_m\), the eigenvalues \(\lambda _j([\tilde{G}_i]_{uu}^{-1})\in (0,1]\) for all \(1\le j\le m\). Then by the fact that for any \(X\in \mathbb {S}^m\)

$$\Vert X\Vert _F = \Vert \Lambda _X\Vert _F,\quad \Lambda _X = \mathrm {diag}\{\lambda _1(X),\cdots ,\lambda _m(X)\},$$

we have

$$\begin{aligned} \Vert [\tilde{G}_i]_{uu}^{-1}([\hat{G}_i]_{uu}-[\tilde{G}_i]_{uu})\Vert _F< a_i\sqrt{m} < 0.5. \end{aligned}$$
(9.30)

Thus by [26, Section 5.8], \([\hat{G}_i]_{uu}\) is invertible.

For any \(x\in \mathbb {R}^{n}\) on the unit ball, define

$$\mathcal {X}_{\hat{K}_i} = \left[ \begin{array}{c} I \\ -\hat{K}_i \end{array}\right] xx^T \left[ \begin{array}{cc} I&-\hat{K}_i^T \end{array}\right] .$$

From (9.9) and (9.10) we have

$$x^T\mathcal {H}(\tilde{G}_i,\hat{K}_i)x = \text {tr}(\tilde{G}_i\mathcal {X}_{\hat{K}_i}) = 0,$$

and

$$\text {tr}(\hat{G}_i\mathcal {X}_{\hat{K}_{i+1}}) = \min _{K\in \mathbb {R}^{m\times n}} \text {tr}(\hat{G}_i\mathcal {X}_K).$$

Then

$$\begin{aligned}&\mathrm {tr}(\tilde{G}_i\mathcal {X}_{\hat{K}_{i+1}}) \le \mathrm {tr}(\hat{G}_i\mathcal {X}_{\hat{K}_{i+1}}) + \Vert \Delta G_i\Vert _F \mathrm {tr}(\mathbf {1}\mathbf {1}^T\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs}) \nonumber \\&\le \mathrm {tr}(\hat{G}_i\mathcal {X}_{\hat{K}_{i}}) + \Vert \Delta G_i\Vert _F \mathbf {1}^T\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs}\mathbf {1}\nonumber \\&\le \mathrm {tr}(\tilde{G}_i\mathcal {X}_{\hat{K}_{i}}) + \Vert \Delta G_i\Vert _F \mathbf {1}^T(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}+\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs})\mathbf {1} \nonumber \\&\le \Vert \Delta G_i\Vert _F \mathbf {1}^T(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}+\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs})\mathbf {1}, \end{aligned}$$
(9.31)

where \(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}\) denotes the matrix obtained from \(\mathcal {X}_{\hat{K}_{i}}\) by taking the absolute value of each entry. Thus by (9.31) and the definition of \(\tilde{G}_i\), we have

$$\begin{aligned} x^T\mathcal {L}_{\hat{K}_{i+1}}(\tilde{P}_i)x + \epsilon _1 \le 0, \end{aligned}$$
(9.32)

where

$$\begin{aligned} \begin{aligned} \epsilon _1 = x^T(S+\hat{K}_{i+1}^TR\hat{K}_{i+1})x - \Vert \Delta G_i\Vert _F \mathbf {1}^T(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}+\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs})\mathbf {1}. \end{aligned} \end{aligned}$$

For any x on the unit ball, \(\vert \mathbf {1}^Tx\vert _{abs}\le \sqrt{n}\). Similarly, for any \(K\in \mathbb {R}^{m\times n}\), by the definition of induced matrix norm, \(\vert \mathbf {1}^TKx\vert _{abs}\le \Vert K\Vert _2 \sqrt{m}\). This implies

$$\begin{aligned} \left| \mathbf {1}^T\left[ \begin{array}{c} I \\ -K \end{array}\right] x\right| _{abs} = \left| \mathbf {1}^Tx - \mathbf {1}^TKx\right| _{abs} \le \sqrt{m}(\sqrt{n} + \Vert K\Vert _2), \end{aligned}$$

which means \(\mathbf {1}^T\vert \mathcal {X}_K\vert _{abs}\mathbf {1}\le m(\sqrt{n} + \Vert K\Vert _2)^2\). Thus

$$\Vert \Delta G_i\Vert _F \mathbf {1}^T(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}+\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs})\mathbf {1}<1.$$

Then \(S>I_n\) leads to

$$x^T\mathcal {L}_{\hat{K}_{i+1}}(\tilde{P}_i)x<0$$

for all x on the unit ball. So \(\hat{K}_{i+1}\) is mean-square stabilizing by Lemma 9.1.

By definition,

$$\begin{aligned} \Vert \hat{K}_{i+1}\Vert _F&\le \Vert [\hat{G}_i]_{uu}^{-1}\Vert _F(1 + \Vert B^T\tilde{P}_iA\Vert _F)\nonumber \\&\le \Vert [\tilde{G}_i]_{uu}^{-1}\Vert _F(1-\Vert [\tilde{G}_i]_{uu}^{-1}([\hat{G}_i]_{uu}-[\tilde{G}_i]_{uu})\Vert _F)^{-1}\nonumber (1 + \Vert B^T\tilde{P}_iA\Vert _F) \nonumber \\&\le 2\Vert R^{-1}\Vert _F(1 + \Vert B^T\tilde{P}_iA\Vert _F), \end{aligned}$$
(9.33)

where the second inequality comes from [26, Inequality (5.8.2)], and the last inequality is due to (9.30). This completes the proof. \(\square \)
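The matrix-inversion perturbation bound invoked from [26, Inequality (5.8.2)] in the second step of (9.33) has the form \(\Vert (X+\Delta )^{-1}\Vert \le \Vert X^{-1}\Vert \left( 1-\Vert X^{-1}\Delta \Vert \right) ^{-1}\) whenever \(\Vert X^{-1}\Delta \Vert <1\). A minimal numerical sketch (random matrices, Frobenius norm; illustrative, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
fro = lambda M: np.linalg.norm(M, "fro")

ok = True
for _ in range(200):
    X = 3 * np.eye(3) + rng.normal(size=(3, 3))   # base matrix
    D = 0.05 * rng.normal(size=(3, 3))            # small perturbation
    t = fro(np.linalg.solve(X, D))                # ||X^{-1} D||_F
    if t < 1:                                     # bound's validity condition
        lhs = fro(np.linalg.inv(X + D))
        rhs = fro(np.linalg.inv(X)) / (1 - t)
        ok = ok and lhs <= rhs + 1e-12
print(ok)   # True
```

The bound follows from the Neumann series \((X+\Delta )^{-1} = \sum _{k\ge 0}(-X^{-1}\Delta )^kX^{-1}\) and submultiplicativity of the Frobenius norm.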

Lemma 9.12

For any \(\bar{i}\in \mathbb {Z}_+\), \(\bar{i}>0\), if

$$\begin{aligned} \Vert \Delta G_i\Vert _F< (1+i^2)^{-1}a_i,\quad i=1,\cdots , \bar{i}, \end{aligned}$$
(9.34)

where \(a_i\) is defined in Lemma 9.11, then

$$\begin{aligned} \Vert \tilde{P}_i\Vert _F\le 6\Vert \tilde{P}_1\Vert _F,\quad \Vert \hat{K}_{i}\Vert _F\le C_0, \end{aligned}$$

for \(i=1,\cdots ,\bar{i}\), where

$$ C_0 = \max \left\{ \Vert \hat{K}_1 \Vert _F, 2\Vert R^{-1}\Vert _F\left( 1+6\Vert B^T\Vert _F\Vert \tilde{P}_1\Vert _F\Vert A\Vert _F\right) \right\} .$$

Proof

Inequality (9.32) yields

$$\begin{aligned} \mathcal {L}_{\hat{K}_{i+1}}(\tilde{P}_i) + (S+\hat{K}_{i+1}^TR\hat{K}_{i+1}) - \epsilon _{2,i}I < 0, \end{aligned}$$
(9.35)

where

$$\epsilon _{2,i} = \Vert \Delta G_i\Vert _F \mathbf {1}^T(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}+\vert \mathcal {X}_{\hat{K}_{i+1}}\vert _{abs})\mathbf {1}<1.$$

Inserting (9.9) into the above inequality and using Lemma 9.7, we have

$$\begin{aligned} \tilde{P}_{i+1} < \tilde{P}_{i} + \epsilon _{2,i}\mathcal {L}_{\hat{K}_{i+1}}^{-1}(-I). \end{aligned}$$
(9.36)

With \(S>I_n\), (9.35) yields

$$\begin{aligned} \mathcal {L}_{\hat{K}_{i+1}}(\tilde{P}_i) + (1 - \epsilon _{2,i})I < 0. \end{aligned}$$

Similar to (9.36), we have

$$\begin{aligned} \mathcal {L}_{\hat{K}_{i+1}}^{-1}(-I)<\frac{1}{1-\epsilon _{2,i}}\tilde{P}_i. \end{aligned}$$
(9.37)

Combining (9.36) and (9.37), we obtain

$$\begin{aligned} \tilde{P}_{i+1}<\left( 1+\frac{\epsilon _{2,i}}{1-\epsilon _{2,i}}\right) \tilde{P}_i. \end{aligned}$$

By the definition of \(\epsilon _{2,i}\) and condition (9.34),

$$\frac{\epsilon _{2,i}}{1-\epsilon _{2,i}} \le \frac{1}{i^2},\quad i=1,\cdots ,\bar{i}.$$

Then [34, §28. Theorem 3] yields

$$\tilde{P}_i\le 6\tilde{P}_1,\quad i=1,\cdots ,\bar{i}.$$

An application of (9.29) completes the proof. \(\square \)
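The constant 6 in Lemma 9.12 comes from bounding the partial products \(\prod _{i=1}^{k}(1+1/i^2)\), whose limit is the Euler product \(\sinh \pi /\pi \approx 3.68<6\), so the factor 6 is conservative. A quick numerical confirmation:

```python
import math

# Partial products of (1 + 1/i^2); the infinite product equals sinh(pi)/pi,
# so the uniform factor 6 used in the proof is conservative.
prod = 1.0
for i in range(1, 10**5):
    prod *= 1.0 + 1.0 / i**2
print(prod)            # approx. 3.676, well below 6
assert prod < 6
```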

Now we are ready to prove Lemma 9.10.

Proof

(Lemma 9.10) Consider Procedure 9.2 confined to the first \(\bar{i}\) iterations, where \(\bar{i}\) is a sufficiently large integer to be determined later in this proof. Suppose

$$\begin{aligned} \Vert \Delta G_i\Vert _F<b_{\bar{i}}\triangleq \frac{1}{2m(1+\bar{i}^2)}\left( \sqrt{n} + C_0\right) ^{-2}. \end{aligned}$$
(9.38)

Condition (9.38) implies condition (9.34). Thus \(\hat{K}_i\) is mean-square stabilizing, \([\hat{G}_i]_{uu}\) is invertible, and \(\Vert \tilde{P}_i\Vert _F\) and \(\Vert \hat{K}_i\Vert _F\) are bounded. By (9.9) we have

$$\begin{aligned} \begin{aligned} \mathcal {L}_{\hat{K}_{i+1}}(\tilde{P}_{i+1}-\tilde{P}_{i})&= -S-\hat{K}_{i+1}^TR\hat{K}_{i+1}-\mathcal {L}_{\hat{K}_{i+1}}(\tilde{P}_{i}). \end{aligned} \end{aligned}$$

Letting \(E_i = \hat{K}_{i+1} - \mathscr {K}(\tilde{P}_i)\), the above equation can be rewritten as

$$\begin{aligned} \tilde{P}_{i+1} = \tilde{P}_i - \mathcal {N}(\tilde{P_i}) + \mathcal {L}_{\mathscr {K}(\tilde{P_i})}^{-1}(\mathscr {E}_i), \end{aligned}$$
(9.39)

where \(\mathcal {N}(\tilde{P_i}) = \mathcal {L}_{\mathscr {K}(\tilde{P_i})}^{-1}\circ \mathcal {R}(\tilde{P}_i),\) and

$$\begin{aligned} \begin{aligned} \mathcal {R}(Y)&= \Pi (Y)-Y-A_0^TYB_0(R+\Sigma (Y))^{-1}B_0^TYA_0+S, \\ \mathscr {E}_i&= - E_i^T\mathscr {R}(\tilde{P}_{i+1})E_i + E_i^T\mathscr {R}(\tilde{P}_{i+1})\left( \mathscr {K}(\tilde{P}_{i+1})-\mathscr {K}(\tilde{P}_i)\right) \\&+\left( \mathscr {K}(\tilde{P}_{i+1})-\mathscr {K}(\tilde{P}_i)\right) ^T\mathscr {R}(\tilde{P}_{i+1})E_i. \end{aligned} \end{aligned}$$

Given \(\hat{K}_1\), let \(\mathcal {M}_{\bar{i}}\) denote the set of all possible \(\tilde{P}_i\) generated by (9.39) under condition (9.38). By definition, \(\{\mathcal {M}_j\}_{j=1}^\infty \) is a nondecreasing sequence of sets, i.e., \(\mathcal {M}_1\subset \mathcal {M}_2 \subset \cdots \). Define \(\mathcal {M} = \cup _{j=1}^\infty \mathcal {M}_j\) and \(\mathcal {D} = \{P\in \mathbb {S}^n\ \vert \ \Vert P\Vert _F\le 6\Vert \tilde{P}_1\Vert _F\}\). Then, by Lemma 9.12 and Theorem 9.1, \(\mathcal {M}\subset \mathcal {D}\), \(\mathcal {M}\) is compact, and \(\mathscr {K}(P)\) is stable for any \(P\in \mathcal {M}\).

Now we prove that \(\mathcal {N}(\cdot )\) is Lipschitz continuous on \(\mathcal {M}\). Using (9.11), we have

$$\begin{aligned} \Vert \mathcal {N}(P^1)-\mathcal {N}(P^2)\Vert _F&= \Vert \mathscr {A}^{-1}(\mathscr {K}(P^1)){{\,\mathrm{vec}\,}}(\mathcal {R}(P^1)) -\mathscr {A}^{-1}(\mathscr {K}(P^2)){{\,\mathrm{vec}\,}}(\mathcal {R}(P^2))\Vert _2 \nonumber \\&\le \Vert \mathscr {A}^{-1}(\mathscr {K}(P^1))\Vert _2\Vert \mathcal {R}(P^1) -\mathcal {R}(P^2)\Vert _F +\nonumber \\&\Vert \mathcal {R}(P^2)\Vert _F\Vert \mathscr {A}^{-1}(\mathscr {K}(P^1)) -\mathscr {A}^{-1}(\mathscr {K}(P^2))\Vert _2\nonumber \\&\le L \Vert P^1 -P^2 \Vert _F, \end{aligned}$$
(9.40)

where the last inequality is due to the fact that the matrix inverse, \(\mathscr {A}(\cdot )\), \(\mathscr {K}(\cdot )\), and \(\mathcal {R}(\cdot )\) are locally Lipschitz, and thus Lipschitz on the compact set \(\mathcal {M}\) with some Lipschitz constant \(L>0\).

Define \(\{P_{k\vert i}\}_{k=0}^{\infty }\) as the sequence generated by (9.12) with \(P_{0\vert i}=\tilde{P}_i\). Similar to (9.39), we have

$$\begin{aligned} P_{k+1\vert i} = P_{k\vert i} - \mathcal {N}(P_{k\vert i}), \quad k\in \mathbb {Z}_+. \end{aligned}$$
(9.41)

By Theorem 9.1 and the fact that \(\mathcal {M}\) is compact, there exists \(k_0\in \mathbb {Z}_+\), such that

$$\begin{aligned} \Vert P_{k_0\vert i}-P^*\Vert _F<\delta _0/2, \qquad \forall P_{0\vert i}\in \mathcal {M}. \end{aligned}$$
(9.42)

Suppose

$$\begin{aligned} \Vert \mathcal {L}_{\mathscr {K}(\tilde{P}_{i+j})}^{-1}(\mathscr {E}_{i+j})\Vert _F<\mu ,\qquad j=0,\cdots , \bar{i} - i. \end{aligned}$$
(9.43)

We find an upper bound on \(\Vert P_{k\vert i}-\tilde{P}_{i+k}\Vert _F\). Notice that, by (9.39) and (9.41),

$$\begin{aligned} \begin{aligned} P_{k\vert i} = P_{0\vert i} - \sum _{j=0}^{k-1} \mathcal {N}(P_{j\vert i}), \qquad \tilde{P}_{i+k} = \tilde{P}_i - \sum _{j=0}^{k-1} \mathcal {N}(\tilde{P}_{i+j}) + \sum _{j=0}^{k-1} \mathcal {L}_{\mathscr {K}(\tilde{P}_{i+j})}^{-1}(\mathscr {E}_{i+j}). \end{aligned} \end{aligned}$$

Then (9.40) and (9.43) yield

$$\begin{aligned} \begin{aligned} \Vert P_{k\vert i} - \tilde{P}_{i+k}\Vert _F \le k\mu + \sum _{j=0}^{k-1} L \Vert P_{j\vert i} - \tilde{P}_{i+j}\Vert _F. \end{aligned} \end{aligned}$$

An application of the Gronwall inequality [2, Theorem 4.1.1] to the above inequality implies

$$\begin{aligned} \Vert P_{k\vert i} - \tilde{P}_{i+k}\Vert _F \le k\mu + L\mu \sum _{j=0}^{k-1}j (1+L)^{k-j-1}. \end{aligned}$$
(9.44)
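The closed-form bound (9.44) can be sanity-checked by running the equality case of the recursion \(e_k = k\mu + L\sum _{j<k}e_j\) against the stated formula; the constants below are illustrative placeholders:

```python
# Equality case of the recursion e_k <= k*mu + L*sum_{j<k} e_j, compared
# with the closed-form bound (9.44):
#   e_k <= k*mu + L*mu*sum_{j=0}^{k-1} j*(1+L)^(k-j-1)
# L_const and mu are illustrative placeholders.
L_const, mu = 0.3, 0.01
e = [0.0]                        # e_0 = ||P_{0|i} - P_i|| = 0
for k in range(1, 11):
    e.append(k * mu + L_const * sum(e))
for k in range(len(e)):
    bound = k * mu + L_const * mu * sum(
        j * (1 + L_const) ** (k - j - 1) for j in range(k))
    assert e[k] <= bound + 1e-12
print("bound (9.44) matches the equality case of the recursion")
```

In fact, the closed form is tight: it equals the recursion's equality case term by term.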

By (9.11), the error term in (9.39) satisfies

$$\begin{aligned} \left\| \mathcal {L}_{\mathscr {K}(\tilde{P_i})}^{-1}(\mathscr {E}_i)\right\| _F = \left\| \mathscr {A}^{-1}(\mathscr {K}(\tilde{P}_i)) {{\,\mathrm{vec}\,}}\left( \mathscr {E}_i\right) \right\| _2 \le C_1\Vert \mathscr {E}_i\Vert _F, \end{aligned}$$
(9.45)

where \(C_1\) is a constant and the inequality is due to the continuity of the matrix inverse.

Let \(\bar{i}>k_0\), and set \(k=k_0\), \(i = \bar{i}-k_0\) in (9.44). Then by condition (9.38), Lemma 9.12, (9.43), (9.44), and (9.45), there exists \(i_0\in \mathbb {Z}_+\), \(i_0>k_0\), such that \(\Vert P_{k_0\vert \bar{i}-k_0} -\tilde{P}_{\bar{i}}\Vert _F<\delta _0/2\) for all \(\bar{i}\ge i_0\). Setting \(i = \bar{i}-k_0\) in (9.42), the triangle inequality yields \(\tilde{P}_{\bar{i}}\in \mathcal {B}_{\delta _0}(P^*)\) for \(\bar{i}\ge i_0\). Then in (9.38), choosing \(\bar{i}\ge i_0\) such that \(\delta _2 = b_{\bar{i}}<\min (\gamma ^{-1}(\epsilon ),\delta _1)\) completes the proof. \(\square \)
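In the error-free case (\(\Delta G_i\equiv 0\)), the iteration (9.39) reduces to exact policy iteration. A minimal scalar sketch on a deterministic LQR instance (the coefficients a, b, s, r below are hypothetical, not taken from the chapter) shows the evaluation/improvement loop converging to the Riccati solution:

```python
# Scalar deterministic LQR: x_{t+1} = a x_t + b u_t, cost sum of s x^2 + r u^2.
# Hypothetical coefficients, chosen so that K = 1 is stabilizing.
a, b, s, r = 1.2, 1.0, 1.0, 1.0
K = 1.0                                          # |a - b*K| = 0.2 < 1
for _ in range(20):
    P = (s + r * K**2) / (1 - (a - b * K)**2)    # policy evaluation (Lyapunov)
    K = b * P * a / (r + b**2 * P)               # policy improvement
# P now satisfies the discrete-time Riccati equation up to round-off
residual = a**2 * P - (a * b * P)**2 / (r + b**2 * P) + s - P
print(abs(residual) < 1e-10)   # True
```

Each evaluation step solves the scalar Lyapunov equation in closed form; the improvement step is the scalar analogue of \(\mathscr {K}(\tilde{P}_i)\).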

Appendix 6

For a given \(\hat{K}_1\), let \(\mathcal {K}\) denote the set of control gains (including \(\hat{K}_1\)) generated by Procedure 9.2 with all possible \(\{\Delta G_i\}_{i=1}^\infty \) satisfying \(\Vert \Delta G\Vert _\infty <\delta _2\), where \(\delta _2\) is defined in Theorem 9.2. We first derive the following result.

Lemma 9.13

Under the conditions in Theorem 9.3, there exist \(\bar{L}_0>0\) and \(N_0>0\) such that for any \(\bar{L}\ge \bar{L}_0\) and \(N\ge N_0\), \(\hat{K}_i\in \mathcal {K}\) implies \(\Vert \Delta G_i\Vert _F< \delta _2\), almost surely.

Proof

By definition, in the context of Algorithm 9.1,

$$\begin{aligned} \begin{aligned} \Vert \Delta G_i\Vert _F \le \Vert \hat{Q}_i - Q(\hat{P}_{i,\bar{L}})\Vert _F + \Vert Q(\hat{P}_{i,\bar{L}}) - Q(\tilde{P}_i)\Vert _F + \Vert \tilde{P}_i - \hat{P}_{i,\bar{L}} \Vert _F, \end{aligned} \end{aligned}$$

where \(\tilde{P}_i\) is the unique solution of (9.9) with \(K=\hat{K}_i\). Thus, the task is to prove that each term on the right-hand side of the above inequality is less than \(\delta _2/3\). To this end, we first study \(\Vert \tilde{P}_i - \hat{P}_{i,\bar{L}} \Vert _F\). Defining \(\hat{p}_{i,j}={{\,\mathrm{vec}\,}}(\hat{P}_{i,j})\), by Lemma 9.5, Lines 11 and 12 in Algorithm 9.1 can be rewritten as

$$\begin{aligned} \begin{aligned} \hat{p}_{i,j+1} = \mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i)\hat{p}_{i,j} + \mathcal {T}^2(\hat{\Phi }^\dagger _{N,M},\hat{r}_{N,M},\hat{K}_i), \end{aligned} \end{aligned}$$
(9.46)

where \(\hat{p}_{i,0}\in \mathbb {R}^{n^2}\) and

$$\begin{aligned} \begin{aligned} \mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i)&= \left[ I_n,-\hat{K}^T_i\right] \otimes \left[ I_n,-\hat{K}^T_i\right] D_{(m+n)(m+n+1)/2}\hat{\Phi }^\dagger _{N,M}\hat{\Psi }^2_{N,M}D_n^\dagger , \\ \mathcal {T}^2(\hat{\Phi }^\dagger _{N,M},\hat{r}_{N,M},\hat{K}_i)&= \left[ I_n,-\hat{K}^T_i\right] \otimes \left[ I_n,-\hat{K}^T_i\right] D_{(m+n)(m+n+1)/2}\hat{\Phi }^\dagger _{N,M}\hat{r}_{N,M}. \end{aligned} \end{aligned}$$

Similar derivations applied to (9.20) with \(K=\hat{K}_i\) yield

$$\begin{aligned} \bar{p}_{i,j+1} = \mathcal {T}^1(\Phi _M,\Psi ^2_M,\hat{K}_i)\bar{p}_{i,j} + \mathcal {T}^2(\Phi _M,r_M,\hat{K}_i),\quad \bar{p}_{i,0}\in \mathbb {R}^{n^2}. \end{aligned}$$
(9.47)

Since (9.20) is identical to (9.14), (9.47) is identical to (9.15) with K and \({{\,\mathrm{vec}\,}}(P_{K,j})\) replaced by \(\hat{K}_i\) and \(\bar{p}_{i,j}\) respectively, and

$$\begin{aligned} \mathcal {T}^1(\Phi _M,\Psi _M,\hat{K}_i)=\mathscr {A}(\hat{K}_i) + I_n\otimes I_n,\quad \mathcal {T}^2(\Phi _M,r_M,\hat{K}_i) = {{\,\mathrm{vec}\,}}(S+\hat{K}_i^TR\hat{K}_i). \end{aligned}$$
(9.48)

Since \(\hat{K}_i\in \mathcal {K}\) is mean-square stabilizing, by Lemma 9.4

$$\begin{aligned} \lim _{j\rightarrow \infty } \bar{P}_{i,j} = \tilde{P}_i, \end{aligned}$$
(9.49)

where \(\bar{P}_{i,j} = {{\,\mathrm{vec}\,}}^{-1}(\bar{p}_{i,j})\). By definition and Theorem 9.2, \(\bar{\mathcal {K}}\) is bounded, thus compact. Let \(\mathcal {V}\) be the set of the unique solutions of (9.5) with \(K\in \mathcal {K}\). Then, by Theorem 9.2, \(\mathcal {V}\) is bounded. So \(\mathscr {A}(K)\) is mean-square stable for all \(K\in \bar{\mathcal {K}}\); otherwise, by (9.11) and Lemma 9.1, the boundedness of \(\mathcal {V}\) would be contradicted. Define \(\mathcal {K}_1 = \{\mathscr {A}(K)+I_n\otimes I_n\vert K\in \bar{\mathcal {K}}\}\). Then \(\rho (X)<1\) for any \(X\in \mathcal {K}_1\), and by continuity \(\mathcal {K}_1\) is a compact set. This implies the existence of a \(\delta _3>0\), such that \(\rho (X)<1\) for any \(X\in \bar{\mathcal {K}}_2\), where

$$\mathcal {K}_2 = \{X\vert X\in \mathcal {B}_{\delta _3}(Y),Y\in \mathcal {K}_1\}.$$

Define

$$\begin{aligned} \begin{aligned} \Delta \mathcal {T}^1_{N,M,i}&= \mathcal {T}^1(\Phi _M,\Psi ^2_M,\hat{K}_i) - \mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i), \\ \Delta \mathcal {T}^2_{N,M,i}&= \mathcal {T}^2(\Phi _M,r_M,\hat{K}_i) - \mathcal {T}^2(\hat{\Phi }^\dagger _{N,M},\hat{r}_{N,M},\hat{K}_i). \end{aligned} \end{aligned}$$

The boundedness of \(\mathcal {K}\), (9.22), and (9.48) imply the existence of \(N_1>0\), such that for any \(N\ge N_1\), any \(\hat{K}_i\in \mathcal {K}\), almost surely

$$\begin{aligned} \mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i)\in \bar{\mathcal {K}}_2, \quad \Vert \mathcal {T}^2(\hat{\Phi }^\dagger _{N,M},\hat{r}_{N,M},\hat{K}_i)\Vert _2< C_9, \end{aligned}$$
(9.50)

where \(C_9>0\) is a constant. Then

$$\rho (\mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i))<1$$

and (9.46) admits a unique stable equilibrium, that is,

$$\begin{aligned} \lim _{j\rightarrow \infty }\hat{P}_{i,j} = \mathring{P}_i \end{aligned}$$
(9.51)

for some \(\mathring{P}_i\in \mathbb {S}^n\). From (9.46), (9.47), (9.49), and (9.51), we have

$$\begin{aligned} \begin{aligned} {{\,\mathrm{vec}\,}}(\tilde{P}_i)&= \left( I_{n^2} - \mathcal {T}^1(\Phi _M,\Psi _M,\hat{K}_i) \right) ^{-1}\mathcal {T}^2(\Phi _M,r_M,\hat{K}_i), \\ {{\,\mathrm{vec}\,}}(\mathring{P}_i)&= \left( I_{n^2} - \mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i) \right) ^{-1}\mathcal {T}^2(\hat{\Phi }^\dagger _{N,M},\hat{r}_{N,M},\hat{K}_i). \end{aligned} \end{aligned}$$
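These fixed points are well defined because the vectorized iteration \(\hat{p}_{i,j+1} = \mathcal {T}^1\hat{p}_{i,j} + \mathcal {T}^2\) converges whenever \(\rho (\mathcal {T}^1)<1\). A generic sketch with placeholder matrices (any \(\mathcal {T}^1\) with spectral radius below one behaves the same way):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
T1 = 0.9 * M / max(abs(np.linalg.eigvals(M)))   # rescale so rho(T1) = 0.9 < 1
T2 = rng.normal(size=4)

p = np.zeros(4)                  # arbitrary initial condition
for _ in range(500):
    p = T1 @ p + T2              # affine iteration, as in (9.46)/(9.47)
p_star = np.linalg.solve(np.eye(4) - T1, T2)    # closed-form fixed point
print(np.allclose(p, p_star))    # True
```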

Thus by (9.23), for any \(N\ge N_1\), any \(\hat{K}_i\in \mathcal {K}\), almost surely

$$\begin{aligned}&\Vert \mathring{P}_i - \tilde{P}_i\Vert _F\le \left\| \left( I_{n^2} - \mathcal {T}^1(\Phi _M,\Psi _M,\hat{K}_i) \right) ^{-1}\right\| _F\left( \Vert \Delta \mathcal {T}^2_{N,M,i} \Vert _F +\right. \\&\left. \left\| \left( I_{n^2} - \mathcal {T}^1(\hat{\Phi }^\dagger _{N,M},\hat{\Psi }^2_{N,M},\hat{K}_i) \right) ^{-1}\right\| _F\left\| \mathcal {T}^2(\hat{\Phi }^\dagger _{N,M},\hat{r}_{N,M},\hat{K}_i)\right\| _2\Vert \Delta \mathcal {T}^1_{N,M,i} \Vert _F\right) \\&\le C_{10} \Vert \Delta \mathcal {T}^2_{N,M,i} \Vert _F + C_{11}\Vert \Delta \mathcal {T}^1_{N,M,i} \Vert _F, \end{aligned}$$

where \(C_{10}\) and \(C_{11}\) are some positive constants, and the last inequality is due to (9.48), (9.50), and the fact that \(\mathcal {K}_1\) and \(\bar{\mathcal {K}}_2\) are compact sets. Then for any \(\epsilon _1>0\), the boundedness of \(\mathcal {K}\) and (9.22) imply the existence of \(N_2\ge N_1\), such that for any \(N\ge N_2\), almost surely

$$\begin{aligned} \Vert \mathring{P}_i - \tilde{P}_i\Vert _F<\epsilon _1/2, \end{aligned}$$
(9.52)

as long as \(\hat{K}_i\in \mathcal {K}\). By Lemma 9.6 and (9.52), for any \(N\ge N_2\) and any \(\hat{K}_i\in \mathcal {K}\),

$$\Vert \mathring{P}_{i}-\hat{P}_{i,j}\Vert _F\le a_0b^j_0\Vert \mathring{P}_i\Vert _F\le a_1b^j_0,$$

for some \(a_0>0\), \(b_0\in (0,1)\), and \(a_1>0\). Therefore, there exists \(\bar{L}_1>0\) such that for any \(\bar{L}\ge \bar{L}_1\), and any \(N\ge N_2\), almost surely

$$\begin{aligned} \Vert \hat{P}_{i,\bar{L}}-\mathring{P}_i\Vert _F<\epsilon _1/2, \end{aligned}$$
(9.53)

as long as \(\hat{K}_i\in \mathcal {K}\). With (9.52) and (9.53), we obtain

$$\begin{aligned} \Vert \hat{P}_{i,\bar{L}}-\tilde{P}_i\Vert _F<\epsilon _1, \end{aligned}$$
(9.54)

almost surely for any \(\bar{L}\ge \bar{L}_1\), any \(N\ge N_2\), as long as \(\hat{K}_i\in \mathcal {K}\). Since \(\epsilon _1\) is arbitrary, we can choose \(\epsilon _1\) such that almost surely

$$\begin{aligned} \Vert \hat{P}_{i,\bar{L}}-\tilde{P}_i\Vert _F<\delta _2/3 \end{aligned}$$

for any \(\bar{L}\ge \bar{L}_1\), any \(N\ge N_2\), as long as \(\hat{K}_i\in \mathcal {K}\).

Second, by definition and (9.54), there exist \(\bar{L}_2\ge \bar{L}_1\) and \(N_3\ge N_2\), such that

$$\begin{aligned} \Vert Q(\hat{P}_{i,\bar{L}}) - Q(\tilde{P}_i)\Vert _F<\delta _2/3 \end{aligned}$$

for any \(\bar{L}\ge \bar{L}_2\), any \(N\ge N_3\), as long as \(\hat{K}_i\in \mathcal {K}\).

Third, since \(\mathcal {V}\) is bounded, \(\hat{P}_{i,\bar{L}}\) is also almost surely bounded by (9.54). Thus, from Line 14 in Algorithm 9.1 and (9.22), there exists \(N_4\ge N_3\), such that

$$\Vert \hat{Q}_{i} - Q(\hat{P}_{i,\bar{L}})\Vert _F<\delta _2/3$$

for any \(N\ge N_4\) and any \(\bar{L}\ge \bar{L}_2\), as long as \(\hat{K}_i\in \mathcal {K}\).

Setting \(N_0 = N_4\) and \(\bar{L}_0 = \bar{L}_2\) yields \(\Vert \Delta G_i\Vert _F<\delta _2\). \(\square \)

Now we are ready to prove the convergence of Algorithm 9.1.

Proof

(Theorem 9.3) Since \(\hat{K}_1\in \mathcal {K}\), Lemma 9.13 implies \(\Vert \Delta G_1\Vert _F<\delta _2\) almost surely. By definition, \(\hat{K}_2\in \mathcal {K}\). Thus, by induction, \(\Vert \Delta G_i\Vert _F<\delta _2\) for all \(i=1,2,\cdots \) almost surely. Then Theorem 9.2 completes the proof. \(\square \)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Pang, B., Jiang, ZP. (2022). Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise. In: Jiang, ZP., Prieur, C., Astolfi, A. (eds) Trends in Nonlinear and Adaptive Control. Lecture Notes in Control and Information Sciences, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-030-74628-5_9
