Abstract
This chapter discusses mixed density reinforcement learning (RL)-based approximate optimal control methods applied to deterministic systems. Such methods typically require a persistence of excitation (PE) condition for convergence. This chapter discusses data-based methods that soften the stringent PE condition by learning via simulation-based extrapolation. The development is based on the observation that, given a model of the system, RL can be implemented by evaluating the Bellman error (BE) at any number of desired points in the state space, thus virtually simulating the system. The sections discuss necessary and sufficient conditions for optimality, regional model-based RL, local state following (StaF) RL, the combination of regional and local model-based RL, and RL with sparse BE extrapolation. Notes on stability are included in each method’s respective section.
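As a rough illustration of the BE-extrapolation idea, the sketch below evaluates the BE at user-selected state-space points and updates a critic weight estimate from those virtual experiences alone. This is a minimal sketch, not the chapter's development: the dynamics, basis, gains, and extrapolation points are all hypothetical choices made for illustration.

```python
# A minimal sketch of model-based Bellman error (BE) extrapolation.
# The dynamics, basis, gains, and extrapolation points below are
# hypothetical choices for illustration, not the chapter's examples.
import numpy as np

# Control-affine model x_dot = f(x) + g(x) u with quadratic cost weights.
f = lambda x: np.array([-x[0] + x[1], -0.5 * x[1]])
g = lambda x: np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

# Jacobian of the polynomial value-function basis sigma(x) = [x1^2, x1*x2, x2^2].
grad_sigma = lambda x: np.array([[2 * x[0], 0.0],
                                 [x[1], x[0]],
                                 [0.0, 2 * x[1]]])

def bellman_error(W, x):
    """Evaluate the BE and its regressor at an arbitrary point x via the model."""
    dsig = grad_sigma(x)                                  # Jacobian of the basis
    u = -0.5 * np.linalg.solve(R, g(x).T @ (dsig.T @ W))  # approximate optimal input
    omega = dsig @ (f(x) + g(x) @ u)                      # BE regressor
    delta = W @ omega + x @ Q @ x + u @ R @ u             # Bellman error
    return delta, omega

# Because the model is known, the BE can be evaluated at any user-selected
# points: the trajectory never has to visit them, which is what relaxes PE.
points = [np.array(p, dtype=float) for p in [(1, 0), (0, 1), (1, 1), (-1, 0.5)]]
W = np.zeros(3)                  # critic weight estimate
k_c, nu, step = 1.0, 0.1, 1e-3   # learning gain, normalization, update step
for _ in range(5000):
    update = np.zeros_like(W)
    for x in points:
        delta, omega = bellman_error(W, x)
        rho = 1 + nu * (omega @ omega)           # normalization
        update -= k_c * delta * omega / rho ** 2
    W += step * update / len(points)
print("critic weights:", W)
```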
Notes
- 1.
The notation \(\nabla _{x}h\left( x,y,t\right) \) denotes the partial derivative of the generic function \(h\left( x,y,t\right) \) with respect to the generic variable x. The notation \(h^{\prime }\left( x,y\right) \) denotes the gradient with respect to the first argument of the generic function \(h\left( \cdot ,\cdot \right) \), e.g., \(h'\left( x,y\right) =\nabla _{x}h\left( x,y\right) \).
- 2.
For notational brevity, unless otherwise specified, the domain of all the functions is assumed to be \(\mathbb {R}_{\ge 0}\), where \(\mathbb {R}_{\ge a}\) denotes the interval \(\left[ a,\infty \right) \). The notation \(\left\| \cdot \right\| \) denotes the Euclidean norm for vectors and the Frobenius norm for matrices.
- 3.
The notation \(I_{n}\) denotes the \(n\times n\) identity matrix.
- 4.
The notation G, \(G_{\sigma }\), and \(G_{\varepsilon }\) is defined as \(G=G\left( x\right) \triangleq g\left( x\right) R^{-1}g^{T}\left( x\right) \), \(G_{\sigma }=G_{\sigma }\left( x\right) \triangleq \sigma ^{\prime }\left( x\right) G\left( x\right) \sigma ^{\prime }\left( x\right) ^{T}\), and \(G_{\varepsilon }=G_{\varepsilon }\left( x\right) \triangleq \varepsilon ^{\prime }\left( x\right) G\left( x\right) \varepsilon ^{\prime }\left( x\right) ^{T}\), respectively.
- 5.
- 6.
The Lipschitz property is exploited here for clarity of exposition. The bound in (5.38) can be easily generalized to \(\left\| Y\left( x\right) \right\| \le L_{Y}\left( \left\| x\right\| \right) \left\| x\right\| \), where \(L_{Y}:\mathbb {R}\rightarrow \mathbb {R}\) is a positive, non-decreasing function.
- 7.
The notation \(\binom{a}{b}\) denotes the combinatorial operation “a choose b”.
- 8.
Similar to NN-based approximation methods such as [1,2,3,4,5,6,7,8], the function approximation error, \(\varepsilon \), is unknown and, in general, infeasible to compute for a given function, since the ideal NN weights are unknown. Since a bound on \(\varepsilon \) is unavailable, the gain conditions in (5.57)–(5.59) cannot be formally verified. However, they can be met by trial and error: increasing the gain \(k_{a2}\), increasing the number of StaF basis functions, and increasing \(\underline{c}\) by selecting more points at which to extrapolate the BE (see the sketch following these notes).
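To make the trial-and-error tuning in Note 8 concrete, the sketch below continues the one after the abstract (reusing `bellman_error`, `W`, and `points`) and estimates an excitation-like level over the extrapolation points. This is an assumption-laden stand-in: the precise quantity bounded by \(\underline{c}\) in (5.57)–(5.59) is defined in the chapter, not here.

```python
# A rough, assumption-laden check of the excitation-like condition that
# replaces PE; reuses bellman_error, W, and points from the earlier sketch.
def excitation_level(W, points, nu=0.1):
    # Smallest eigenvalue of the averaged, normalized regressor outer
    # products; near-zero means the chosen points are not rich enough.
    M = np.zeros((len(W), len(W)))
    for x in points:
        _, omega = bellman_error(W, x)
        rho = 1 + nu * (omega @ omega)
        M += np.outer(omega, omega) / rho ** 2
    return np.linalg.eigvalsh(M / len(points))[0]

# Note 8's trial-and-error tuning: if the level is too small, add or
# re-spread extrapolation points before resorting to larger gains.
print("excitation level:", excitation_level(W, points))
```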
References
Doya, K.: Reinforcement learning in continuous time and space. Neural Comput. 12(1), 219–245 (2000)
Padhi, R., Unnikrishnan, N., Wang, X., Balakrishnan, S.: A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Netw. 19(10), 1648–1660 (2006)
Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans. Syst. Man Cybern. Part B Cybern. 38, 943–949 (2008)
Lewis, F.L., Vrabie, D.: Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 9(3), 32–50 (2009)
Dierks, T., Thumati, B., Jagannathan, S.: Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Netw. 22(5–6), 851–860 (2009)
Mehta, P., Meyn, S.: Q-learning and Pontryagin’s minimum principle. In: Proceedings of the IEEE Conference on Decision and Control, pp. 3598–3605 (2009)
Vamvoudakis, K.G., Lewis, F.L.: Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5), 878–888 (2010)
Zhang, H., Cui, L., Zhang, X., Luo, Y.: Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans. Neural Netw. 22(12), 2226–2236 (2011)
Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K.G., Lewis, F.L., Dixon, W.E.: A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1), 89–92 (2013)
Zhang, H., Cui, L., Luo, Y.: Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans. Cybern. 43(1), 206–216 (2013)
Zhang, H., Liu, D., Luo, Y., Wang, D.: Adaptive Dynamic Programming for Control: Algorithms and Stability. Communications and Control Engineering. Springer, London (2013)
Kaelbling, L., Littman, M., Moore, A.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Vrabie, D.: Online adaptive optimal control for continuous-time systems, Ph.D. dissertation, University of Texas at Arlington (2010)
Vamvoudakis, K.G., Vrabie, D., Lewis, F.L.: Online adaptive algorithm for optimal control with integral reinforcement learning. Int. J. Robust Nonlinear Control 24(17), 2686–2710 (2014)
Kamalapurkar, R., Walters, P., Dixon, W.E.: Model-based reinforcement learning for approximate optimal regulation. Automatica 64, 94–104 (2016)
He, P., Jagannathan, S.: Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints. IEEE Trans. Syst. Man Cybern. Part B Cybern. 37(2), 425–436 (2007)
Zhang, H., Wei, Q., Luo, Y.: A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. 38(4), 937–942 (2008)
Kamalapurkar, R., Rosenfeld, J., Dixon, W.E.: Efficient model-based reinforcement learning for approximate online optimal control. Automatica 74, 247–258 (2016)
Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Model-free Q-learning designs for linear discrete-time zero-sum games with application to \(H_{\infty }\) control. Automatica 43, 473–481 (2007)
Vamvoudakis, K.G., Lewis, F.L.: Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations. Automatica 47, 1556–1569 (2011)
Vamvoudakis, K.G., Lewis, F.L., Hudas, G.R.: Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality. Automatica 48(8), 1598–1611 (2012). http://www.sciencedirect.com/science/article/pii/S0005109812002476
Modares, H., Lewis, F.L., Naghibi-Sistani, M.-B.: Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans. Neural Netw. Learn. Syst. 24(10), 1513–1525 (2013)
Kiumarsi, B., Lewis, F.L., Modares, H., Karimpour, A., Naghibi-Sistani, M.-B.: Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 50(4), 1167–1175 (2014)
Modares, H., Lewis, F.L., Naghibi-Sistani, M.-B.: Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1), 193–202 (2014)
Modares, H., Lewis, F.L.: Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 50(7), 1780–1792 (2014)
Kamalapurkar, R., Walters, P.S., Rosenfeld, J.A., Dixon, W.E.: Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Springer, Berlin (2018)
Singh, S.P.: Reinforcement learning with a hierarchy of abstract models. AAAI Natl. Conf. Artif. Intell. 92, 202–207 (1992)
Atkeson, C.G., Schaal, S.: Robot learning from demonstration. Int. Conf. Mach. Learn. 97, 12–20 (1997)
Abbeel, P., Quigley, M., Ng, A.Y.: Using inaccurate models in reinforcement learning. In: International Conference on Machine Learning, pp. 1–8. ACM, New York (2006)
Deisenroth, M.P.: Efficient Reinforcement Learning Using Gaussian Processes. KIT Scientific Publishing (2010)
Mitrovic, D., Klanke, S., Vijayakumar, S.: Adaptive optimal feedback control with learned internal dynamics models. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol. 264, pp. 65–84. Springer, Berlin (2010)
Deisenroth, M.P., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: International Conference on Machine Learning, pp. 465–472 (2011)
Liberzon, D.: Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, Princeton (2012)
Kirk, D.: Optimal Control Theory: An Introduction. Dover, Mineola (2004)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Konda, V., Tsitsiklis, J.: On actor-critic algorithms. SIAM J. Control Optim. 42(4), 1143–1166 (2004)
Dierks, T., Jagannathan, S.: Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics. In: Proceedings of the IEEE Conference on Decision and Control, Shanghai, China, pp. 6750–6755 (2009)
Vamvoudakis, K.G., Lewis, F.L.: Online synchronous policy iteration method for optimal control. In: Yu, W. (ed.) Recent Advances in Intelligent Control Systems, pp. 357–374. Springer, London (2009)
Dierks, T., Jagannathan, S.: Optimal control of affine nonlinear continuous-time systems. In: Proceedings of the American Control Conference, 2010, pp. 1568–1573 (2010)
Khalil, H.K.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River (2002)
Chowdhary, G.: Concurrent learning for convergence in adaptive control without persistency of excitation, Ph.D. dissertation, Georgia Institute of Technology (2010)
Chowdhary, G., Johnson, E.: A singular value maximizing data recording algorithm for concurrent learning. In: Proceedings of the American Control Conference, 2011, pp. 3547–3552 (2011)
Chowdhary, G., Yucelen, T., Mühlegg, M., Johnson, E.N.: Concurrent learning adaptive control of linear systems with exponentially convergent bounds. Int. J. Adapt. Control Signal Process. 27(4), 280–301 (2013)
Kamalapurkar, R., Andrews, L., Walters, P., Dixon, W.E.: Model-based reinforcement learning for infinite-horizon approximate optimal tracking. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 753–758 (2017)
Kamalapurkar, R., Klotz, J., Dixon, W.E.: Concurrent learning-based online approximate feedback Nash equilibrium solution of N-player nonzero-sum differential games. IEEE/CAA J. Autom. Sin. 1(3), 239–247 (2014)
Luo, B., Wu, H.-N., Huang, T., Liu, D.: Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica (2014)
Yang, X., Liu, D., Wei, Q.: Online approximate optimal control for affine non-linear systems with unknown internal dynamics using adaptive dynamic programming. IET Control Theory Appl. 8(16), 1676–1688 (2014)
Rosenfeld, J.A., Kamalapurkar, R., Dixon, W.E.: The state following (StaF) approximation method. IEEE Trans. Neural Netw. Learn. Syst. 30(6), 1716–1730 (2019)
Lorentz, G.G.: Bernstein Polynomials, 2nd edn. Chelsea Publishing Co., New York (1986)
Rosenfeld, J.A., Kamalapurkar, R., Dixon, W.E.: State following (StaF) kernel functions for function approximation Part I: theory and motivation. In: Proceedings of the American Control Conference, 2015, pp. 1217–1222 (2015)
Deptula, P., Rosenfeld, J., Kamalapurkar, R., Dixon, W.E.: Approximate dynamic programming: combining regional and local state following approximations. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2154–2166 (2018)
Walters, P.S.: Guidance and control of marine craft: an adaptive dynamic programming approach, Ph.D. dissertation, University of Florida (2015)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceeding of the International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323 (2011)
Lee, H., Battle, A., Raina, R., Ng, A.Y.: Efficient sparse coding algorithms. In: Proceedings of the Advances in Neural Information Processing Systems, 2007, pp. 801–808 (2007)
Nivison, S.A., Khargonekar, P.: Improving long-term learning of model reference adaptive controllers for flight applications: a sparse neural network approach. In: Proceedings of the AIAA Guidance, Navigation and Control Conference, Jan. 2017 (2017)
Nivison, S.A., Khargonekar, P.P.: Development of a robust deep recurrent neural network controller for flight applications. In: Proceedings of the American Control Conference, IEEE, 2017, pp. 5336–5342 (2017)
Boureau, Y.-L., LeCun, Y., Ranzato, M.: Sparse feature learning for deep belief networks. In: Proceedings of the Advances in Neural Information Processing Systems, 2008, pp. 1185–1192 (2008)
Nivison, S.A., Khargonekar, P.P.: Development of a robust, sparsely-activated, and deep recurrent neural network controller for flight applications. In: Proceedings of the IEEE Conference on Decision and Control, pp. 384–390. IEEE (2018)
Nivison, S.A., Khargonekar, P.: A sparse neural network approach to model reference adaptive control with hypersonic flight applications. In: Proceedings of the AIAA Guidance, Navigation and Control Conference, 2018, p. 0842 (2018)
Greene, M.L., Deptula, P., Nivison, S., Dixon, W.E.: Sparse learning-based approximate dynamic programming with barrier constraints. IEEE Control Syst. Lett. 4(3), 743–748 (2020)
Nivison, S.A.: Sparse and deep learning-based nonlinear control design with hypersonic flight applications, Ph.D. dissertation, University of Florida (2017)
Walters, P., Kamalapurkar, R., Voight, F., Schwartz, E., Dixon, W.E.: Online approximate optimal station keeping of a marine craft in the presence of an irrotational current. IEEE Trans. Robot. 34(2), 486–496 (2018)
Fan, Q.-Y., Yang, G.-H.: Active complementary control for affine nonlinear control systems with actuator faults. IEEE Trans. Cybern. 47(11), 3542–3553 (2016)
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Greene, M.L., Deptula, P., Kamalapurkar, R., Dixon, W.E. (2021). Mixed Density Methods for Approximate Dynamic Programming. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_5
DOI: https://doi.org/10.1007/978-3-030-60990-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60989-4
Online ISBN: 978-3-030-60990-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)