Abstract
This chapter studies the robustness of reinforcement learning for discrete-time linear stochastic systems with multiplicative noise evolving in continuous state and action spaces. The robustness of policy iteration, one of the popular methods in reinforcement learning, is a longstanding open problem for the stochastic linear quadratic regulator (LQR) problem with multiplicative noise. A solution in the spirit of input-to-state stability is given, guaranteeing that the solutions of the policy iteration algorithm are bounded and enter a small neighborhood of the optimal solution, whenever the error in each iteration is bounded and small. In addition, a novel off-policy multiple-trajectory optimistic least-squares policy iteration algorithm is proposed, to learn a near-optimal solution of the stochastic LQR problem directly from online input/state data, without explicitly identifying the system matrices. The efficacy of the proposed algorithm is supported by rigorous convergence analysis and numerical results on a second-order example.
Dedicated to Laurent Praly, a beautiful mind
References
Abbasi-Yadkori, Y., Lazic, N., Szepesvari, C.: Model-free linear quadratic control via reduction to expert prediction. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2019)
Agarwal, R.P.: Difference Equations and Inequalities: Theory, Methods, and Applications, 2nd edn. Marcel Dekker Inc, New York (2000)
Athans, M., Ku, R., Gershwin, S.: The uncertainty threshold principle: some fundamental limitations of optimal decision making under dynamic uncertainty. IEEE Trans. Autom. Control 22(3), 491–495 (1977)
Beghi, A., D’Alessandro, D.: Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations. Automatica 34(8), 1031–1034 (1998)
Bertsekas, D.P.: Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3), 310–335 (2011)
Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific, Belmont, Massachusetts (2019)
Bian, T., Jiang, Z.P.: Continuous-time robust dynamic programming. SIAM J. Control Optim. 57(6), 4150–4174 (2019)
Bian, T., Wolpert, D.M., Jiang, Z.P.: Model-free robust optimal feedback mechanisms of biological motor control. Neural Comput. 32(3), 562–595 (2020)
Bitmead, R.R., Gevers, M., Wertz, V.: Adaptive Optimal Control: The Thinking Man’s GPC. Prentice-Hall, Englewood Cliffs, New Jersey (1990)
Breakspear, M.: Dynamic models of large-scale brain activity. Nat. Neurosci. 20(3), 340–352 (2017)
Bryson, A.E., Ho, Y.C.: Applied Optimal Control: Optimization, Estimation and Control. Taylor & Francis (1975)
Buşoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I.: Reinforcement learning for control: Performance, stability, and deep approximators. Annu. Rev. Control 46, 8–28 (2018)
Coppens, P., Patrinos, P.: Sample complexity of data-driven stochastic LQR with multiplicative uncertainty. In: The 59th IEEE Conference on Decision and Control (CDC), pp. 6210–6215 (2020)
Coppens, P., Schuurmans, M., Patrinos, P.: Data-driven distributionally robust LQR with multiplicative noise. In: Learning for Dynamics and Control (L4DC), pp. 521–530. PMLR (2020)
De Koning, W.L.: Infinite horizon optimal control of linear discrete time systems with stochastic parameters. Automatica 18(4), 443–453 (1982)
De Koning, W.L.: Compensatability and optimal compensation of systems with white parameters. IEEE Trans. Autom. Control 37(5), 579–588 (1992)
Drenick, R., Shaw, L.: Optimal control of linear plants with random parameters. IEEE Trans. Autom. Control 9(3), 236–244 (1964)
Du, K., Meng, Q., Zhang, F.: A Q-learning algorithm for discrete-time linear-quadratic control with random parameters of unknown distribution: convergence and stabilization. arXiv preprint arXiv:2011.04970 (2020)
Duncan, T.E., Guo, L., Pasik-Duncan, B.: Adaptive continuous-time linear quadratic gaussian control. IEEE Trans. Autom. Control 44(9), 1653–1662 (1999)
Gravell, B., Esfahani, P.M., Summers, T.: Learning robust controllers for linear quadratic systems with multiplicative noise via policy gradient. IEEE Trans. Autom. Control (2019)
Gravell, B., Esfahani, P.M., Summers, T.: Robust control design for linear systems via multiplicative noise. arXiv preprint arXiv:2004.08019 (2020)
Gravell, B., Ganapathy, K., Summers, T.: Policy iteration for linear quadratic games with stochastic parameters. IEEE Control Syst. Lett. 5(1), 307–312 (2020)
Guo, Y., Summers, T.H.: A performance and stability analysis of low-inertia power grids with stochastic system inertia. In: American Control Conference (ACC), pp. 1965–1970 (2019)
Hespanha, J.P., Naghshtabrizi, P., Xu, Y.: A survey of recent results in networked control systems. Proceedings of the IEEE 95(1), 138–162 (2007)
Hewer, G.: An iterative technique for the computation of the steady state gains for the discrete optimal regulator. IEEE Trans. Autom. Control 16(4), 382–384 (1971)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (2012)
Jiang, Y., Jiang, Z.P.: Adaptive dynamic programming as a theory of sensorimotor control. Biolog. Cybern. 108(4), 459–473 (2014)
Jiang, Y., Jiang, Z.P.: Robust Adaptive Dynamic Programming. Wiley, Hoboken, New Jersey (2017)
Jiang, Z.P., Bian, T., Gao, W.: Learning-based control: A tutorial and some recent results. Found. Trends Syst. Control. 8(3), 176–284 (2020)
Jiang, Z.P., Lin, Y., Wang, Y.: Nonlinear small-gain theorems for discrete-time feedback systems and applications. Automatica 40(12), 2129–2136 (2004)
Kamalapurkar, R., Walters, P., Rosenfeld, J., Dixon, W.: Reinforcement learning for optimal feedback control: A Lyapunov-based approach. Springer (2018)
Kantorovich, L.V., Akilov, G.P.: Functional Analysis in Normed Spaces. Macmillan, New York (1964)
Kiumarsi, B., Vamvoudakis, K.G., Modares, H., Lewis, F.L.: Optimal and autonomous control using reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2042–2062 (2018)
Knopp, K.: Theory and Application of Infinite Series, 2nd edn. Dover Publications, New York (1990)
Lai, J., Xiong, J., Shu, Z.: Model-free optimal control of discrete-time systems with additive and multiplicative noises. arXiv preprint arXiv:2008.08734 (2020)
Levine, S., Koltun, V.: Continuous inverse optimal control with locally optimal examples. In: International Conference on Machine Learning (ICML) (2012)
Levine, S., Kumar, A., Tucker, G., Fu, J.: Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643 (2020)
Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. In: International Conference on Learning Representations (ICLR) (2016)
Ljung, L.: System Identification: Theory for the user, 2nd edn. Prentice Hall PTR, Upper Saddle River (1999)
Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York (2007)
Monfort, M., Liu, A., Ziebart, B.D.: Intent prediction and trajectory forecasting via predictive inverse linear-quadratic regulation. In: AAAI Conference on Artificial Intelligence (AAAI) (2015)
Morozan, T.: Stabilization of some stochastic discrete-time control systems. Stoch. Anal. Appl. 1(1), 89–116 (1983)
Pang, B., Bian, T., Jiang, Z.-P.: Robust policy iteration for continuous-time linear quadratic regulation. IEEE Trans. Autom. Control (2020)
Pang, B., Jiang, Z.-P.: Robust reinforcement learning: A case study in linear quadratic regulation. In: AAAI Conference on Artificial Intelligence (AAAI) (2020)
Powell, W.B.: From reinforcement learning to optimal control: A unified framework for sequential decisions. arXiv preprint arXiv:1912.03513 (2019)
Praly, L., Lin, S.-F., Kumar, P.R.: A robust adaptive minimum variance controller. SIAM J. Control Optim. 27(2), 235–266 (1989)
Rami, M.A., Chen, X., Zhou, X.Y.: Discrete-time indefinite LQ control with state and control dependent noises. J. Glob. Optim. 23(3), 245–265 (2002)
Åström, K.J., Wittenmark, B.: Adaptive Control, 2nd edn. Addison-Wesley, Reading, Massachusetts (1995)
Sontag, E.D.: Input to state stability: Basic concepts and results. In: Nonlinear and Optimal Control Theory. Lecture Notes in Mathematics, vol. 1932, pp. 163–220. Springer, Berlin (2008)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, Massachusetts (2018)
Sutton, R.S., Barto, A.G., Williams, R.J.: Reinforcement learning is direct adaptive optimal control. IEEE Control Syst. Mag. 12(2), 19–22 (1992)
Tiedemann, A., De Koning, W.: The equivalent discrete-time optimal control problem for continuous-time systems with stochastic parameters. Int. J. Control 40(3), 449–466 (1984)
Todorov, E.: Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Comput. 17(5), 1084–1108 (2005)
Tu, S., Recht, B.: The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint. In: Annual Conference on Learning Theory (COLT) (2019)
Xing, Y., Gravell, B., He, X., Johansson, K.H., Summers, T.: Linear system identification under multiplicative noise from multiple trajectory data. In: American Control Conference (ACC), pp. 5157–5261 (2020)
Huang, Y., Zhang, W., Zhang, H.: Infinite horizon LQ optimal control for discrete-time stochastic systems. In: The 6th World Congress on Intelligent Control and Automation (WCICA), vol. 1, pp. 252–256 (2006)
Acknowledgements
Confucius once said, “Virtue is not left to stand alone. He who practices it will have neighbors.” Laurent Praly, the former PhD advisor of the second-named author, is such a beautiful mind. His vision about, and seminal contributions to, control theory, especially nonlinear and adaptive control, have influenced generations of students, including the authors of this chapter. ZPJ is privileged to have had Laurent as his PhD advisor during 1989–1993 and is very grateful to Laurent for introducing him to the field of nonlinear control. It was under Laurent’s close guidance that ZPJ started working, in 1991, on the stability and control of interconnected nonlinear systems, work that laid the foundation for the nonlinear small-gain theory. The research findings presented here are a reflection of Laurent’s vision of the relationships between control and learning. We also thank the U.S. National Science Foundation for its continuous financial support.
Appendices
Appendix 1
The following lemma provides the relationship between operations \({{\,\mathrm{vec}\,}}(\cdot )\) and \({{\,\mathrm{svec}\,}}(\cdot )\).
Lemma 9.5
([40, Page 57]) For \(X\in \mathbb {S}^n\), there exists a unique matrix \(D_n\in \mathbb {R}^{n^2\times \frac{1}{2}n(n+1)}\) with full column rank, such that
\(D_n\) is called the duplication matrix.
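Lemma 9.5 is easy to verify numerically. The sketch below, in Python with NumPy, constructs \(D_n\) under the Magnus–Neudecker convention of [40], where the half-vectorization stacks the on-and-below-diagonal entries column by column (conventions that scale off-diagonal entries by \(\sqrt{2}\) differ only by an invertible diagonal factor), and checks both the identity \({{\,\mathrm{vec}\,}}(X)=D_n{{\,\mathrm{svec}\,}}(X)\) and the full-column-rank claim:

```python
import numpy as np

def vech(X):
    """Stack the on-and-below-diagonal entries of X column by column (svec without scaling)."""
    n = X.shape[0]
    return np.concatenate([X[j:, j] for j in range(n)])

def duplication_matrix(n):
    """Build D_n with vec(X) = D_n @ vech(X) for every symmetric X."""
    m = n * (n + 1) // 2
    D = np.zeros((n * n, m))
    col = 0
    for j in range(n):
        for i in range(j, n):
            D[j * n + i, col] = 1.0      # entry (i, j) in column-stacked vec ordering
            if i != j:
                D[i * n + j, col] = 1.0  # its symmetric partner (j, i)
            col += 1
    return D

n = 3
X = np.random.randn(n, n)
X = X + X.T                              # a symmetric test matrix
D = duplication_matrix(n)
assert np.allclose(D @ vech(X), X.flatten(order="F"))    # vec(X) = D_n svec(X)
assert np.linalg.matrix_rank(D) == n * (n + 1) // 2      # full column rank
print("vec(X) = D_n svec(X) verified; rank(D_n) =", np.linalg.matrix_rank(D))
```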
Lemma 9.6
([44, Lemma A.3]) Let \(\mathcal {O}\) be a compact set such that \(\rho (O)<1\) for any \(O\in \mathcal {O}\). Then there exist an \(a_0>0\) and a \(0<b_0<1\), such that
for any \(O\in \mathcal {O}\).
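Lemma 9.6 packages Gelfand's formula \(\Vert O^k\Vert ^{1/k}\rightarrow \rho (O)\) into a uniform geometric envelope. A minimal numerical illustration, with an illustrative non-normal matrix chosen so that \(\Vert O\Vert>1>\rho (O)\) and the transient growth absorbed into \(a_0\) is visible:

```python
import numpy as np

# An illustrative non-normal matrix: spectral radius below 1, norm above 1,
# so ||O^k|| first grows before it decays -- the growth the constant a_0 absorbs.
O = np.array([[0.9, 5.0],
              [0.0, 0.8]])
rho = max(abs(np.linalg.eigvals(O)))
assert rho < 1.0 < np.linalg.norm(O, 2)

b0 = (rho + 1.0) / 2.0       # any decay rate strictly between rho(O) and 1 works
norms = []
Ok = np.eye(2)
for k in range(200):
    norms.append(np.linalg.norm(Ok, 2))
    Ok = Ok @ O

# Gelfand's formula makes sup_k ||O^k|| / b0^k finite; for this matrix the
# transient peak of the ratio occurs well within the first 50 powers.
a0 = max(norms[k] / b0**k for k in range(50))
assert all(norms[k] <= a0 * b0**k + 1e-12 for k in range(200))
print(f"rho = {rho:.3f}, b0 = {b0:.3f}, a0 = {a0:.1f}")
```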
For \(X\in \mathbb {R}^{n\times n}\), \(Y\in \mathbb {R}^{n\times m}\), \(X+\Delta X\in \mathbb {R}^{n\times n}\), \(Y + \Delta Y\in \mathbb {R}^{n\times m}\), supposing X and \(X + \Delta X\) are invertible, the following inequality is repeatedly used:
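The displayed inequality bounds the effect of perturbing both factors of a product \(X^{-1}Y\). Whatever its exact constants, bounds of this kind rest on the exact identity \((X+\Delta X)^{-1}(Y+\Delta Y)-X^{-1}Y=(X+\Delta X)^{-1}\Delta Y-(X+\Delta X)^{-1}\Delta X X^{-1}Y\); the sketch below (with randomly generated illustrative matrices) checks this identity and the resulting norm bound, which is linear in \((\Vert \Delta X\Vert ,\Vert \Delta Y\Vert )\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible by construction
Y = rng.standard_normal((n, m))
dX = 1e-3 * rng.standard_normal((n, n))
dY = 1e-3 * rng.standard_normal((n, m))

Xp = X + dX
lhs = np.linalg.solve(Xp, Y + dY) - np.linalg.solve(X, Y)

# Exact identity behind the perturbation bound:
#   (X+dX)^{-1}(Y+dY) - X^{-1}Y = (X+dX)^{-1} dY - (X+dX)^{-1} dX X^{-1} Y
rhs = np.linalg.solve(Xp, dY) - np.linalg.solve(Xp, dX @ np.linalg.solve(X, Y))
assert np.allclose(lhs, rhs)

# ... and the induced norm bound, linear in (||dX||, ||dY||):
nrm = lambda M: np.linalg.norm(M, 2)
bound = nrm(np.linalg.inv(Xp)) * (nrm(dY) + nrm(dX) * nrm(np.linalg.inv(X)) * nrm(Y))
assert nrm(lhs) <= bound + 1e-12
print(f"error norm {nrm(lhs):.2e} <= bound {bound:.2e}")
```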
Appendix 2
The following property of \(\mathcal {L}_{K}(\cdot )\) is useful.
Lemma 9.7
If K is mean-square stabilizing, then \(\mathcal {L}_K(Y_1)\le \mathcal {L}_K(Y_2)\implies Y_1\ge Y_2\), where \(Y_1,Y_2\in \mathbb {S}^n\).
Proof
Let \(\{x_t\}_{t=0}^\infty \) be the solution of the closed-loop system (9.1) with controller \(u=-Kx\). Then for any \(t\ge 1\)
Since K is mean-square stabilizing,
The proof is complete because \(x_0\) is arbitrary. \(\square \)
Now we are ready to prove Theorem 9.1.
Proof
(Theorem 9.1) By (9.7) and (9.8), for any \(x\in \mathbb {R}^n\),
Thus \(\mathcal {H}(P_1,K_2)\le 0\). By definition, \(P_1>0\) and
Then Lemma 9.1 implies that \(K_2\) is mean-square stabilizing. Inserting (9.7) into the above inequality yields \(\mathcal {L}_{K_2}(P_1) \le \mathcal {L}_{K_2}(P_2)\). This implies \(P_1\ge P_2\) by Lemma 9.7. An application of mathematical induction proves the first two items. For the last item, by a theorem on the convergence of a monotone sequence of self-adjoint operators (see [32, Pages 189–190]), \(\lim _{i\rightarrow \infty } P_i\) and \(\lim _{i\rightarrow \infty } K_i\) exist. Letting \(i\rightarrow \infty \) in (9.7) and (9.8), and eliminating \(K_\infty \) in (9.7) using (9.8), we have
The proof is complete by the uniqueness of \(P^*\). \(\square \)
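Theorem 9.1 is the exact, error-free counterpart of the algorithms studied later: policy evaluation solves a generalized Lyapunov equation, policy improvement is a Newton-type update, and \(P_i\) decreases monotonically to \(P^*\). The sketch below instantiates this on a hypothetical second-order system \(x_{t+1}=(A+\gamma _tC)x_t+Bu_t\) with a single zero-mean, unit-variance multiplicative noise; all matrices are illustrative choices, not taken from the chapter:

```python
import numpy as np

# A hypothetical instance of x_{t+1} = (A + g_t C) x_t + B u_t, where g_t is
# zero-mean, unit-variance multiplicative noise; all matrices are illustrative.
A = np.array([[1.0, 0.3], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)                 # multiplicative-noise direction
S = np.eye(2)                       # state weighting
R = np.array([[1.0]])               # input weighting

def policy_eval(K):
    """Solve the generalized Lyapunov equation
       P = S + K'RK + A_K' P A_K + C' P C,  A_K = A - BK,  by vectorization."""
    AK = A - B @ K
    M = np.kron(AK.T, AK.T) + np.kron(C.T, C.T)
    q = (S + K.T @ R @ K).flatten(order="F")
    return np.linalg.solve(np.eye(4) - M, q).reshape(2, 2, order="F")

def policy_improve(P):
    """Exact policy improvement: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K = np.array([[0.5, 1.0]])          # an initial mean-square stabilizing gain
costs = []
for _ in range(30):
    P = policy_eval(K)
    costs.append(np.trace(P))
    K = policy_improve(P)

# Monotone non-increasing iterates converging to the GARE fixed point:
assert all(costs[i] >= costs[i + 1] - 1e-9 for i in range(len(costs) - 1))
residual = P - (S + A.T @ P @ A + C.T @ P @ C
                - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))
assert np.linalg.norm(residual) < 1e-8
print(f"converged; tr(P*) = {costs[-1]:.4f}")
```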
Appendix 3
Proof
(Lemma 9.2) Since \(\mathscr {K}(P^*)\) is mean-square stabilizing, by continuity there always exists a \(\bar{\delta }_0>0\), such that \(\mathscr {R}(P_i)\) is invertible and \(\mathscr {K}(P_i)\) is mean-square stabilizing for all \(P_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\). Suppose \(P_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\). Subtracting
from both sides of the GARE (9.4) yields
Subtracting (9.24) from (9.12), we have
Taking norms on both sides of the above equation, (9.11) yields
Since \(\mathscr {K}(\cdot )\) is locally Lipschitz continuous at \(P^*\), by continuity of matrix norm and matrix inverse, there exists a \(c_1>0\), such that
So for any \(0<\sigma <1\), there exists a \(\bar{\delta }_0\ge \delta _0>0\) with \(c_1\delta _0\le \sigma \). This completes the proof. \(\square \)
Appendix 4
Before the proof of Lemma 9.3, some auxiliary lemmas are first proved. Procedure 9.2 exhibits a singularity if \([\hat{G}_{i}]_{uu}\) in (9.10) is singular, or if the cost (9.2) of \(\hat{K}_{i+1}\) is infinite. The following lemma shows that if \(\Delta G_i\) is small, no singularity occurs. Let \(\bar{\delta }_0\) be as defined in the proof of Lemma 9.2; then \(\delta _0\le \bar{\delta }_0\).
Lemma 9.8
For any \(\tilde{P}_i\in \mathcal {B}_{\delta _0}(P^*)\), there exists a \(d(\delta _0)>0\), independent of \(\tilde{P}_i\), such that \(\hat{K}_{i+1}\) is mean-square stabilizing and \([\hat{G}_{i}]_{uu}\) is invertible, if \(\Vert \Delta G_i\Vert _F\le d\).
Proof
Since \(\mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\) is compact and \(\mathscr {A}(\mathscr {K}(\cdot ))\) is a continuous function, the set
is also compact. By continuity and Lemma 9.1, for each \(X\in \mathcal {S}\), there exists an \(r(X)>0\) such that \(\rho (Y+I_n\otimes I_n)<1\) for any \(Y\in \mathcal {B}_{r(X)}(X)\). The compactness of \(\mathcal {S}\) implies the existence of an \(\underline{r}>0\), such that \(\rho (Y+I_n\otimes I_n)<1\) for each \(Y\in \mathcal {B}_{\underline{r}}(X)\) and all \(X\in \mathcal {S}\). Similarly, there exists a \(d_1>0\) such that \([\hat{G}_{i}]_{uu}\) is invertible for all \(\tilde{P}_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\), if \(\Vert \Delta G_i\Vert _F\le d_1\). Note that in the policy improvement step of Procedure 9.1 (the policy update step in Procedure 9.2), the improved policy \(\tilde{K}_{i+1}=[\tilde{G}_{i}]_{uu}^{-1}[\tilde{G}_{i}]_{ux}\) (the updated policy \(\hat{K}_{i+1}\)) is a continuous function of \(\tilde{G}_i\) (\(\hat{G}_i\)), and there exists a \(0<d_2\le d_1\), such that \(\mathscr {A}(\hat{K}_{i+1})\in \mathcal {B}_{\underline{r}}(\mathscr {A}(\mathscr {K}(\tilde{P}_i)))\) for all \(\tilde{P}_i\in \mathcal {\bar{B}}_{\bar{\delta }_0}(P^*)\), if \(\Vert \Delta G_i\Vert _F\le d_2\). Thus, Lemma 9.1 implies that \(\hat{K}_{i+1}\) is mean-square stabilizing. Setting \(d=d_2\) completes the proof. \(\square \)
By Lemma 9.8, if \(\Vert \Delta G_i\Vert _F\le d\), the sequence \(\{\tilde{P}_i\}_{i=0}^{\infty }\) satisfies (9.13). For simplicity, we denote \(\mathcal {E}(\tilde{G}_i,\Delta G_i)\) in (9.13) by \(\mathcal {E}_i\). The following lemma gives an upper bound on \(\Vert \mathcal {E}_i\Vert _F\) in terms of \(\Vert \Delta G_i\Vert _F\).
Lemma 9.9
For any \(\tilde{P}_i\in \mathcal {B}_{\delta _0}(P^*)\) and any \(c_2>0\), there exists a \(0<\delta _1^1(\delta _0,c_2)\le d\), independent of \(\tilde{P}_i\), where d is defined in Lemma 9.8, such that
if \(\Vert \Delta G_i\Vert _F<\delta _1^1\), where \(c_3(\delta _0)>0\).
Proof
For any \(\tilde{P}_i\in \bar{\mathcal {B}}_{\delta _0}(P^*)\), \(\Vert \Delta G_i\Vert _F\le d\), we have from (9.23)
where the last inequality comes from the continuity of matrix inverse and the extreme value theorem. Define
Define
Using (9.25), it is easy to check that \(\Vert \Delta \mathscr {A}_i\Vert _F\le c_5\Vert \Delta G_i\Vert _F\), \(\Vert \Delta b_i\Vert _2\le c_6\Vert \Delta G_i\Vert _F\), for some \(c_5(\delta _0,d)>0\), \(c_6(\delta _0,d)>0\). Then by (9.23)
where the last inequality comes from the continuity of matrix inverse and Lemma 9.8. Choosing \(0<\delta ^1_1\le d\) such that \(c_3\delta ^1_1<c_2\) completes the proof. \(\square \)
Now we are ready to prove Lemma 9.3.
Proof
(Lemma 9.3) Let \(c_2=(1-\sigma )\delta _0\) in Lemma 9.9, and \(\delta _1\) be equal to the \(\delta _1^1\) associated with \(c_2\). For any \(i\in \mathbb {Z}_+\), if \(\tilde{P}_i\in \mathcal {B}_{\delta _0}(P^*)\), then \([\hat{G}_i]_{uu}\) is invertible, \(\hat{K}_{i+1}\) is mean-square stabilizing and
where (9.26) and (9.28) are due to Lemmas 9.2 and 9.9. By induction, (9.26) to (9.28) hold for all \(i\in \mathbb {Z}_+\), thus by (9.27),
which proves (i) and (ii) in Lemma 9.3. Then (9.25) implies (iii) in Lemma 9.3.
In terms of (iv) in Lemma 9.3, for any \(\epsilon >0\), there exists an \(i_1\in \mathbb {Z}_+\), such that \(\sup \{\Vert \Delta G_i\Vert _F\}_{i=i_1}^\infty <\gamma ^{-1}(\epsilon /2)\). Take \(i_2\ge i_1\). For \(i\ge i_2\), we have by (ii) in Lemma 9.3,
where the second inequality is due to the boundedness of \(\tilde{P}_i\). Since \(\lim _{i\rightarrow \infty }\beta (c_7,i-i_2)=0\), there is an \(i_3\ge i_2\) such that \(\beta (c_7,i-i_2)<\epsilon /2\) for all \(i\ge i_3\), which completes the proof. \(\square \)
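Lemma 9.3 is an input-to-state stability statement: small bounded per-iteration errors produce bounded iterates that enter a small neighborhood of \(P^*\). Reusing the illustrative second-order system from the earlier sketch, the code below corrupts every policy-improvement step with a small bounded disturbance (standing in for \(\Delta G_i\)) and checks that \(\tilde{P}_i\) stays bounded and settles near \(P^*\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.3], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)
S, R = np.eye(2), np.array([[1.0]])

def policy_eval(K):
    AK = A - B @ K
    M = np.kron(AK.T, AK.T) + np.kron(C.T, C.T)
    q = (S + K.T @ R @ K).flatten(order="F")
    return np.linalg.solve(np.eye(4) - M, q).reshape(2, 2, order="F")

def policy_improve(P):
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Reference optimal solution from exact policy iteration.
K = np.array([[0.5, 1.0]])
for _ in range(50):
    K = policy_improve(policy_eval(K))
P_star = policy_eval(K)

# Inexact policy iteration: each improvement step is corrupted by a small
# bounded disturbance, standing in for the per-iteration error Delta G_i.
K = np.array([[0.5, 1.0]])
errs = []
for _ in range(100):
    P = policy_eval(K)
    errs.append(np.linalg.norm(P - P_star))
    K = policy_improve(P) + 1e-3 * rng.uniform(-1.0, 1.0, size=(1, 2))

# ISS-style behavior: iterates stay bounded and settle in a small
# neighborhood of P* whose size shrinks with the disturbance bound.
assert max(errs) < 100.0
assert all(e < 0.05 for e in errs[10:])
print(f"final distance to P*: {errs[-1]:.2e}")
```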
Appendix 5
Notice that all the conclusions of Theorem 9.2 follow from Lemma 9.3 if
for Procedure 9.2. Thus, the proof of Theorem 9.2 reduces to the proof of the following lemma.
Lemma 9.10
Given a mean-square stabilizing \(\hat{K}_1\), there exist \(0<\delta _2<\min (\gamma ^{-1}(\epsilon ),\delta _1)\), \(\bar{i}\in \mathbb {Z}_+\), \(\alpha _2>0\), and \(\kappa _2>0\), such that \([\hat{G}_i]_{uu}\) is invertible, \(\hat{K}_i\) is mean-square stabilizing, \(\Vert \tilde{P}_i\Vert _F<\alpha _2\), \(\Vert \hat{K}_i\Vert _F<\kappa _2\), \(i=1,\cdots ,\bar{i}\), \(\tilde{P}_{\bar{i}}\in \mathcal {B}_{\delta _0}(P^*)\), as long as \(\Vert \Delta G\Vert _\infty <\delta _2\).
The next two lemmas state that under certain conditions on \(\Vert \Delta G_i\Vert _F\), each element in \(\{\hat{K}_i\}_{i=1}^{\bar{i}}\) is mean-square stabilizing, each element in \(\{[\hat{G}_i]_{uu}\}_{i=1}^{\bar{i}}\) is invertible, and \(\{\tilde{P}_i\}_{i=1}^{\bar{i}}\) is bounded. For simplicity, in the following we assume \(S>I_n\) and \(R>I_m\). All the proofs still work for any \(S>0\) and \(R>0\), by suitable rescaling.
Lemma 9.11
If \(\hat{K}_i\) is mean-square stabilizing, then \([\hat{G}_i]_{uu}\) is nonsingular and \(\hat{K}_{i+1}\) is mean-square stabilizing, as long as \(\Vert \Delta G_i\Vert _F < a_i\), where
Furthermore,
Proof
By definition,
Since \(R>I_m\), the eigenvalues \(\lambda _j([\tilde{G}_i]_{uu}^{-1})\in (0,1]\) for all \(1\le j\le m\). Then by the fact that for any \(X\in \mathbb {S}^m\)
we have
Thus by [26, Section 5.8], \([\hat{G}_i]_{uu}\) is invertible.
For any \(x\in \mathbb {R}^{n}\) on the unit ball, define
and
Then
where \(\vert \mathcal {X}_{\hat{K}_{i}}\vert _{abs}\) denotes the matrix obtained from \(\mathcal {X}_{\hat{K}_{i}}\) by taking the absolute value of each entry. Thus by (9.31) and the definition of \(\tilde{G}_i\), we have
where
For any x on the unit ball, \(\vert \mathbf {1}^Tx\vert _{abs}\le \sqrt{n}\). Similarly, for any \(K\in \mathbb {R}^{m\times n}\), by the definition of induced matrix norm, \(\vert \mathbf {1}^TKx\vert _{abs}\le \Vert K\Vert _2 \sqrt{m}\). This implies
which means \(\mathbf {1}^T\vert \mathcal {X}_K\vert _{abs}\mathbf {1}\le m(\sqrt{n} + \Vert K\Vert _2)^2\). Thus
Then \(S>I_n\) leads to
for all x on the unit ball. So \(\hat{K}_{i+1}\) is mean-square stabilizing by Lemma 9.1.
By definition,
where the second inequality comes from [26, Inequality (5.8.2)], and the last inequality is due to (9.30). This completes the proof. \(\square \)
Lemma 9.12
For any \(\bar{i}\in \mathbb {Z}_+\), \(\bar{i}>0\), if
where \(a_i\) is defined in Lemma 9.11, then
for \(i=1,\cdots ,\bar{i}\), where
Proof
Inequality (9.32) yields
where
Inserting (9.9) into the above inequality, and using Lemma 9.7, we have
With \(S>I_n\), (9.35) yields
Similar to (9.36), we have
From (9.36) to (9.37), we obtain
By definition of \(\epsilon _{2,i}\) and condition (9.34),
Then [34, §28. Theorem 3] yields
An application of (9.29) completes the proof. \(\square \)
Now we are ready to prove Lemma 9.10.
Proof
(Lemma 9.10) Consider Procedure 9.2 confined to the first \(\bar{i}\) iterations, where \(\bar{i}\) is a sufficiently large integer to be determined later in this proof. Suppose
Condition (9.38) implies condition (9.34). Thus \(\hat{K}_i\) is mean-square stabilizing, \([\hat{G}_i]_{uu}\) is invertible, and \(\Vert \tilde{P}_i\Vert _F\) and \(\Vert \hat{K}_i\Vert _F\) are bounded. By (9.9) we have
Letting \(E_i = \hat{K}_{i+1} - \mathscr {K}(\tilde{P}_i)\), the above equation can be rewritten as
where \(\mathcal {N}(\tilde{P_i}) = \mathcal {L}_{\mathscr {K}(\tilde{P_i})}^{-1}\circ \mathcal {R}(\tilde{P}_i),\) and
Given \(\hat{K}_1\), let \(\mathcal {M}_{\bar{i}}\) denote the set of all possible \(\tilde{P}_i\), generated by (9.39) under condition (9.38). By definition, \(\{\mathcal {M}_j\}_{j=1}^\infty \) is a nondecreasing sequence of sets, i.e., \(\mathcal {M}_1\subset \mathcal {M}_2 \subset \cdots \). Define \(\mathcal {M} = \cup _{j=1}^\infty \mathcal {M}_j\), \(\mathcal {D} = \{P\in \mathbb {S}^n\ \vert \ \Vert P\Vert _F\le 6\Vert \tilde{P}_1\Vert _F\}\). Then by Lemma 9.12 and Theorem 9.1, \(\mathcal {M}\subset \mathcal {D}\); \(\mathcal {M}\) is compact; \(\mathscr {K}(P)\) is stable for any \(P\in \mathcal {M}\).
Now we prove that \(\mathcal {N}(\cdot )\) is Lipschitz continuous on \(\mathcal {M}\). Using (9.11), we have
where the last inequality is due to the fact that matrix inversion, \(\mathscr {A}(\cdot )\), \(\mathscr {K}(\cdot )\), and \(\mathcal {R}(\cdot )\) are locally Lipschitz, and thus Lipschitz on the compact set \(\mathcal {M}\) with some Lipschitz constant \(L>0\).
Define \(\{P_{k\vert i}\}_{k=0}^{\infty }\) as the sequence generated by (9.12) with \(P_{0\vert i}=\tilde{P}_i\). Similar to (9.39), we have
By Theorem 9.1 and the fact that \(\mathcal {M}\) is compact, there exists \(k_0\in \mathbb {Z}_+\), such that
Suppose
We find an upper bound on \(\Vert P_{k\vert i}-\tilde{P}_{i+k}\Vert _F\). Notice that from (9.39) to (9.41),
An application of the Gronwall inequality [2, Theorem 4.1.1.] to the above inequality implies
By (9.11), the error term in (9.39) satisfies
where \(C_1\) is a constant and the inequality is due to the continuity of matrix inverse.
Let \(\bar{i}>k_0\), and set \(k=k_0\), \(i = \bar{i}-k_0\) in (9.44). Then by condition (9.38), Lemma 9.12, (9.43), (9.44), and (9.45), there exists an \(i_0\in \mathbb {Z}_+\), \(i_0>k_0\), such that \(\Vert P_{k_0\vert \bar{i}-k_0} -\tilde{P}_{\bar{i}}\Vert _F<\delta _0/2\) for all \(\bar{i}\ge i_0\). Setting \(i = \bar{i}-k_0\) in (9.42), the triangle inequality yields \(\tilde{P}_{\bar{i}}\in \mathcal {B}_{\delta _0}(P^*)\) for \(\bar{i}\ge i_0\). Then in (9.38), choosing \(\bar{i}\ge i_0\) such that \(\delta _2 = b_{\bar{i}}<\min (\gamma ^{-1}(\epsilon ),\delta _1)\) completes the proof. \(\square \)
Appendix 6
For a given \(\hat{K}_1\), let \(\mathcal {K}\) denote the set of control gains (including \(\hat{K}_1\)) generated by Procedure 9.2 with all possible \(\{\Delta G_i\}_{i=1}^\infty \) satisfying \(\Vert \Delta G\Vert _\infty <\delta _2\), where \(\delta _2\) is the one in Theorem 9.2. The following result is first derived.
Lemma 9.13
Under the conditions in Theorem 9.3, there exist \(\bar{L}_0>0\) and \(N_0>0\) such that for any \(\bar{L}\ge \bar{L}_0\) and \(N\ge N_0\), \(\hat{K}_i\in \mathcal {K}\) implies \(\Vert \Delta G_i\Vert _F< \delta _2\), almost surely.
Proof
By definition, in the context of Algorithm 9.1,
where \(\tilde{P}_i\) is the unique solution of (9.9) with \(K=\hat{K}_i\). Thus, the task is to prove that each term on the right-hand side of the above inequality is less than \(\delta _2/3\). To this end, we first study \(\Vert \tilde{P}_i - \hat{P}_{i,\bar{L}} \Vert _F\). Define \(\hat{p}_{i,j}={{\,\mathrm{vec}\,}}(\hat{P}_{i,j})\). By Lemma 9.5, Lines 11 and 12 in Algorithm 9.1 can be rewritten as
where \(\hat{p}_{i,0}\in \mathbb {R}^{n^2}\) and
Similar derivations applied to (9.20) with \(K=\hat{K}_i\) yield
Since (9.20) is identical to (9.14), (9.47) is identical to (9.15) with K and \({{\,\mathrm{vec}\,}}(P_{K,j})\) replaced by \(\hat{K}_i\) and \(\bar{p}_{i,j}\) respectively, and
Since \(\hat{K}_i\in \mathcal {K}\) is mean-square stabilizing, by Lemma 9.4
where \(\bar{P}_{i,j} = {{\,\mathrm{vec}\,}}^{-1}(\bar{p}_{i,j})\). By definition and Theorem 9.2, \(\bar{\mathcal {K}}\) is bounded, thus compact. Let \(\mathcal {V}\) be the set of the unique solutions of (9.5) with \(K\in \mathcal {K}\). Then by Theorem 9.2, \(\mathcal {V}\) is bounded. So \(\mathscr {A}(K)\) is mean-square stable for all \(K\in \bar{\mathcal {K}}\); otherwise, by (9.11) and Lemma 9.1, the boundedness of \(\mathcal {V}\) would be contradicted. Define \(\mathcal {K}_1 = \{\mathscr {A}(K)+I_n\otimes I_n\vert K\in \bar{\mathcal {K}}\}\). Then \(\rho (X)<1\) for any \(X\in \mathcal {K}_1\), and by continuity \(\mathcal {K}_1\) is a compact set. This implies the existence of a \(\delta _3>0\), such that \(\rho (X)<1\) for any \(X\in \bar{\mathcal {K}}_2\), where
Define
The boundedness of \(\mathcal {K}\), (9.22), and (9.48) imply the existence of \(N_1>0\), such that for any \(N\ge N_1\), any \(\hat{K}_i\in \mathcal {K}\), almost surely
where \(C_9>0\) is a constant. Then
and (9.46) admits a unique stable equilibrium, that is,
for some \(\mathring{P}_i\in \mathbb {S}^n\). From (9.46), (9.47), (9.49), and (9.51), we have
Thus by (9.23), for any \(N\ge N_1\), any \(\hat{K}_i\in \mathcal {K}\), almost surely
where \(C_{10}\) and \(C_{11}\) are some positive constants, and the last inequality is due to (9.48), (9.50) and the fact that \(\mathcal {K}_1\) and \(\bar{\mathcal {K}}_2\) are compact sets. Then for any \(\epsilon _1>0\), the boundedness of \(\mathcal {K}\) and (9.22) implies the existence of \(N_2\ge N_1\), such that for any \(N\ge N_2\), almost surely
as long as \(\hat{K}_i\in \mathcal {K}\). By Lemma 9.6 and (9.52), for any \(N\ge N_2\) and any \(\hat{K}_i\in \mathcal {K}\),
for some \(a_0>0\), \(1>b_0>0\) and \(a_1>0\). Therefore there exists a \(\bar{L}_1>0\), such that for any \(\bar{L}\ge \bar{L}_1\), and any \(N\ge N_2\), almost surely
as long as \(\hat{K}_i\in \mathcal {K}\). With (9.52) and (9.53), we obtain
almost surely for any \(\bar{L}\ge \bar{L}_1\), any \(N\ge N_2\), as long as \(\hat{K}_i\in \mathcal {K}\). Since \(\epsilon _1\) is arbitrary, we can choose \(\epsilon _1\) such that almost surely
for any \(\bar{L}\ge \bar{L}_1\), any \(N\ge N_2\), as long as \(\hat{K}_i\in \mathcal {K}\).
Secondly, by definition and (9.54), there exist \(\bar{L}_2\ge \bar{L}_1\) and \(N_3\ge N_2\), such that
for any \(\bar{L}\ge \bar{L}_2\), any \(N\ge N_3\), as long as \(\hat{K}_i\in \mathcal {K}\).
Thirdly, since \(\mathcal {V}\) is bounded, \(\hat{P}_{i,\bar{L}}\) is also almost surely bounded by (9.54). Thus, from Line 14 in Algorithm 9.1 and (9.22), there exists \(N_4\ge N_3\), such that
for any \(N\ge N_4\) and any \(\bar{L}\ge \bar{L}_2\), as long as \(\hat{K}_i\in \mathcal {K}\).
Setting \(N_0 = N_4\) and \(\bar{L}_0 = \bar{L}_2\) yields \(\Vert \Delta G_i\Vert _F<\delta _2\). \(\square \)
Now we are ready to prove the convergence of Algorithm 9.1.
Proof
(Theorem 9.3) Since \(\hat{K}_1\in \mathcal {K}\), Lemma 9.13 implies \(\Vert \Delta G_1\Vert _F<\delta _2\) almost surely. By definition, \(\hat{K}_2\in \mathcal {K}\). Thus \(\Vert \Delta G_i\Vert _F<\delta _2\), \(i=1,2,\cdots \), holds almost surely by mathematical induction. Then Theorem 9.2 completes the proof. \(\square \)
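Theorem 9.3 hinges on the estimation error \(\Delta G_i\) shrinking as the number of trajectories \(N\) grows. Algorithm 9.1 itself works from online input/state data; as a simplified stand-in, the sketch below (reusing the illustrative matrices from the earlier sketches) replaces the exact closed-loop second-moment operator by an average over \(N\) sampled noise realizations and shows the policy-evaluation error decreasing with \(N\):

```python
import numpy as np

A = np.array([[1.0, 0.3], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)                    # multiplicative-noise direction
S, R = np.eye(2), np.array([[1.0]])
K = np.array([[0.5, 1.0]])             # a fixed mean-square stabilizing gain

# Exact policy evaluation (uses the true model), for reference only.
AK = A - B @ K
q = (S + K.T @ R @ K).flatten(order="F")
M = np.kron(AK.T, AK.T) + np.kron(C.T, C.T)
P_exact = np.linalg.solve(np.eye(4) - M, q).reshape(2, 2, order="F")

def mc_policy_eval(N, seed):
    """Policy evaluation with the closed-loop second-moment operator replaced
    by an average over N sampled noise realizations -- a simplified stand-in
    for the multiple-trajectory least-squares step of Algorithm 9.1."""
    g = np.random.default_rng(seed).standard_normal(N)
    M_hat = np.zeros((4, 4))
    for gi in g:
        An = A + gi * C - B @ K        # one sampled closed-loop matrix
        M_hat += np.kron(An.T, An.T)
    M_hat /= N
    return np.linalg.solve(np.eye(4) - M_hat, q).reshape(2, 2, order="F")

err = lambda N: np.mean([np.linalg.norm(mc_policy_eval(N, s) - P_exact)
                         for s in range(5)])
err_small, err_large = err(50), err(20000)
assert err_large < err_small           # more trajectories -> smaller error
print(f"mean error: N=50 -> {err_small:.3e}, N=20000 -> {err_large:.3e}")
```

With everything else fixed, the error decays at the Monte-Carlo rate \(O(1/\sqrt{N})\), which is why a sufficiently large \(N\) pushes \(\Vert \Delta G_i\Vert _F\) below \(\delta _2\) almost surely.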
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Pang, B., Jiang, ZP. (2022). Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise. In: Jiang, ZP., Prieur, C., Astolfi, A. (eds) Trends in Nonlinear and Adaptive Control. Lecture Notes in Control and Information Sciences, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-030-74628-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74627-8
Online ISBN: 978-3-030-74628-5