Mixed Density Methods for Approximate Dynamic Programming

Chapter in: Handbook of Reinforcement Learning and Control

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 325)

Abstract

This chapter discusses mixed density reinforcement learning (RL)-based approximate optimal control methods applied to deterministic systems. Such methods typically require a persistence of excitation (PE) condition for convergence. In this chapter, data-based methods will be discussed to soften the stringent PE condition by learning via simulation-based extrapolation. The development is based on the observation that, given a model of the system, RL can be implemented by evaluating the Bellman error (BE) at any number of desired points in the state space, thus virtually simulating the system. The sections will discuss necessary and sufficient conditions for optimality, regional model-based RL, local (StaF) RL, combining regional and local model-based RL, and RL with sparse BE extrapolation. Notes on stability follow within each method’s respective section.
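
To make the idea of BE extrapolation concrete, the following is a minimal sketch (not the chapter's exact formulation) of evaluating an approximate Bellman error at arbitrary user-selected points using a model of the dynamics. The drift f, control effectiveness g, cost weights Q and R, the polynomial critic basis, and the weight vectors are all illustrative placeholders.

```python
import numpy as np

# Sketch of Bellman error (BE) extrapolation for a control-affine system
# x_dot = f(x) + g(x) u with running cost x'Qx + u'Ru. All models, bases,
# and weights below are illustrative placeholders.

Q = np.eye(2)
R = np.eye(1)

def f(x):
    # Placeholder drift dynamics (assumed known or identified online).
    return np.array([x[1], -x[0] - 0.5 * x[1]])

def g(x):
    # Placeholder control-effectiveness matrix.
    return np.array([[0.0], [1.0]])

def sigma_prime(x):
    # Jacobian of the critic basis sigma(x) = [x1^2, x1*x2, x2^2].
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

def policy(x, W_a):
    # Actor feedback law u = -(1/2) R^{-1} g(x)' sigma'(x)' W_a.
    return -0.5 * np.linalg.solve(R, g(x).T @ sigma_prime(x).T @ W_a)

def bellman_error(x, W_c, W_a):
    # With a model in hand, the BE can be evaluated at ANY state x;
    # no measurement of x_dot along the trajectory is needed.
    u = policy(x, W_a)
    x_dot = f(x) + g(x) @ u
    r = x @ Q @ x + u @ R @ u
    return W_c @ sigma_prime(x) @ x_dot + r

# Extrapolate the BE over sampled points instead of relying on the
# measured trajectory to excite the state space (softening PE).
rng = np.random.default_rng(0)
W_c = W_a = rng.standard_normal(3)
points = rng.uniform(-1.0, 1.0, size=(25, 2))
deltas = np.array([bellman_error(x, W_c, W_a) for x in points])
print("mean |BE| over extrapolation points:", np.abs(deltas).mean())
```

In the methods discussed in this chapter, such extrapolated errors drive the critic and actor weight updates; the local (StaF) variant, for instance, keeps the extrapolation points in a small neighborhood of the current state.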

Notes

  1.

    The notation \(\nabla _{x}h\left( x,y,t\right) \) denotes the partial derivative of generic function \(h\left( x,y,t\right) \) with respect to generic variable x. The notation \(h^{\prime }\left( x,y\right) \) denotes the gradient with respect to the first argument of the generic function, \(h\left( \cdot ,\cdot \right) \), e.g., \(h'\left( x,y\right) =\nabla _{x}h\left( x,y\right) .\)

  2.

    For notational brevity, unless otherwise specified, the domain of all the functions is assumed to be \(\mathbb {R}_{\ge 0}\), where \(\mathbb {R}_{\ge a}\) denotes the interval \(\left[ a,\infty \right) \). The notation \(\left\| \cdot \right\| \) denotes the Euclidean norm for vectors and the Frobenius norm for matrices.

  3.

    The notation \(I_{n}\) denotes the \(n\times n\) identity matrix.

  4.

    The notations \(G\), \(G_{\sigma }\), and \(G_{\varepsilon }\) are defined as \(G=G\left( x\right) \triangleq g\left( x\right) R^{-1}g^{T}\left( x\right) \), \(G_{\sigma }=G_{\sigma }\left( x\right) \triangleq \sigma ^{\prime }\left( x\right) G\left( x\right) \sigma ^{\prime }\left( x\right) ^{T}\), and \(G_{\varepsilon }=G_{\varepsilon }\left( x\right) \triangleq \varepsilon ^{\prime }\left( x\right) G\left( x\right) \varepsilon ^{\prime }\left( x\right) ^{T}\), respectively. A numeric illustration is given after these notes.

  5.

    The subsequent analysis in Sect. 5.3.5 indicates that when a system identifier that satisfies Assumption 5.2 is employed to facilitate online optimal control, the ratio \(\frac{D}{K}\) needs to be sufficiently small to establish set-point regulation and convergence to optimality.

  6.

    The Lipschitz property is exploited here for clarity of exposition. The bound in (5.38) can be easily generalized to \(\left\| Y\left( x\right) \right\| \le L_{Y}\left( \left\| x\right\| \right) \left\| x\right\| \), where \(L_{Y}:\mathbb {R}\rightarrow \mathbb {R}\) is a positive, non-decreasing function.

  7.

    The notation \(\binom{a}{b}\) denotes the combinatorial operation “a choose b”.

  8.

    Similar to NN-based approximation methods such as [1,2,3,4,5,6,7,8], the function approximation error, \(\varepsilon \), is unknown and, in general, infeasible to compute for a given function, since the ideal NN weights are unknown. Since a bound on \(\varepsilon \) is unavailable, the gain conditions in (5.57)–(5.59) cannot be formally verified. However, they can typically be met by trial and error: increase the gain \(k_{a2}\), the number of StaF basis functions, and \(\underline{c}\) (the latter by selecting more points at which to extrapolate the BE). A hypothetical sketch of this procedure is given after these notes.
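
As a minimal numeric illustration of the \(G\)-notation in Note 4 (the \(g(x)\), \(R\), and basis Jacobian \(\sigma'(x)\) below are arbitrary placeholders):

```python
import numpy as np

# Illustrative evaluation of the G-notation; g_x and sigma_prime are
# placeholders with n = 2 states, m = 1 input, and L = 3 basis functions.
R = np.eye(1)
g_x = np.array([[0.0], [1.0]])             # g(x), n x m
sigma_prime = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])       # sigma'(x), L x n

G = g_x @ np.linalg.inv(R) @ g_x.T         # G(x) = g R^{-1} g', n x n
G_sigma = sigma_prime @ G @ sigma_prime.T  # G_sigma(x), L x L
# G_epsilon(x) = eps'(x) G(x) eps'(x)' is formed the same way from the
# gradient of the function approximation error.
```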
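
Read as a procedure, the tuning guidance in Note 8 might look like the following hypothetical sketch; simulate_closed_loop stands in for whatever simulation or experiment the designer uses to judge closed-loop behavior, and the initial values and growth factors are arbitrary:

```python
# Hypothetical trial-and-error loop for the gain conditions in
# (5.57)-(5.59): since a bound on the approximation error is unavailable,
# the gains are grown until the closed loop behaves acceptably.
def tune_gains(simulate_closed_loop, max_rounds=20):
    k_a2, n_staf, n_extrap = 1.0, 4, 10
    for _ in range(max_rounds):
        if simulate_closed_loop(k_a2, n_staf, n_extrap):  # True if stable
            return k_a2, n_staf, n_extrap
        k_a2 *= 2.0      # larger actor gain
        n_staf += 1      # richer StaF basis
        n_extrap *= 2    # more BE extrapolation points (raises c_lower)
    raise RuntimeError("no stabilizing configuration found")
```

Any callable returning a pass/fail stability verdict can be plugged in for simulate_closed_loop.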

References

  1. Doya, K.: Reinforcement learning in continuous time and space. Neural Comput. 12(1), 219–245 (2000)

  2. Padhi, R., Unnikrishnan, N., Wang, X., Balakrishnan, S.: A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Netw. 19(10), 1648–1660 (2006)

  3. Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans. Syst. Man Cybern. Part B Cybern. 38, 943–949 (2008)

  4. Lewis, F.L., Vrabie, D.: Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 9(3), 32–50 (2009)

  5. Dierks, T., Thumati, B., Jagannathan, S.: Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Netw. 22(5–6), 851–860 (2009)

  6. Mehta, P., Meyn, S.: Q-learning and Pontryagin's minimum principle. In: Proceedings of the IEEE Conference on Decision and Control, pp. 3598–3605

  7. Vamvoudakis, K.G., Lewis, F.L.: Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5), 878–888 (2010)

  8. Zhang, H., Cui, L., Zhang, X., Luo, Y.: Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans. Neural Netw. 22(12), 2226–2236 (2011)

  9. Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K.G., Lewis, F.L., Dixon, W.E.: A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1), 89–92 (2013)

  10. Zhang, H., Cui, L., Luo, Y.: Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans. Cybern. 43(1), 206–216 (2013)

  11. Zhang, H., Liu, D., Luo, Y., Wang, D.: Adaptive Dynamic Programming for Control: Algorithms and Stability. Communications and Control Engineering. Springer, London (2013)

  12. Kaelbling, L., Littman, M., Moore, A.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)

  13. Vrabie, D.: Online adaptive optimal control for continuous-time systems, Ph.D. dissertation, University of Texas at Arlington (2010)

  14. Vamvoudakis, K.G., Vrabie, D., Lewis, F.L.: Online adaptive algorithm for optimal control with integral reinforcement learning. Int. J. Robust Nonlinear Control 24(17), 2686–2710 (2014)

  15. Kamalapurkar, R., Walters, P., Dixon, W.E.: Model-based reinforcement learning for approximate optimal regulation. Automatica 64, 94–104 (2016)

  16. He, P., Jagannathan, S.: Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints. IEEE Trans. Syst. Man Cybern. Part B Cybern. 37(2), 425–436 (2007)

  17. Zhang, H., Wei, Q., Luo, Y.: A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. 38(4), 937–942 (2008)

  18. Kamalapurkar, R., Rosenfeld, J., Dixon, W.E.: Efficient model-based reinforcement learning for approximate online optimal control. Automatica 74, 247–258 (2016)

  19. Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Model-free Q-learning designs for linear discrete-time zero-sum games with application to \(H_{\infty }\) control. Automatica 43, 473–481 (2007)

  20. Vamvoudakis, K.G., Lewis, F.L.: Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica 47, 1556–1569 (2011)

  21. Vamvoudakis, K.G., Lewis, F.L., Hudas, G.R.: Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality. Automatica 48(8), 1598–1611 (2012). http://www.sciencedirect.com/science/article/pii/S0005109812002476

  22. Modares, H., Lewis, F.L., Naghibi-Sistani, M.-B.: Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans. Neural Netw. Learn. Syst. 24(10), 1513–1525 (2013)

  23. Kiumarsi, B., Lewis, F.L., Modares, H., Karimpour, A., Naghibi-Sistani, M.-B.: Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 50(4), 1167–1175 (2014)

  24. Modares, H., Lewis, F.L., Naghibi-Sistani, M.-B.: Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1), 193–202 (2014)

  25. Modares, H., Lewis, F.L.: Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 50(7), 1780–1792 (2014)

  26. Kamalapurkar, R., Walters, P.S., Rosenfeld, J.A., Dixon, W.E.: Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Springer, Berlin (2018)

  27. Singh, S.P.: Reinforcement learning with a hierarchy of abstract models. AAAI Natl. Conf. Artif. Intell. 92, 202–207 (1992)

  28. Atkeson, C.G., Schaal, S.: Robot learning from demonstration. Int. Conf. Mach. Learn. 97, 12–20 (1997)

  29. Abbeel, P., Quigley, M., Ng, A.Y.: Using inaccurate models in reinforcement learning. In: International Conference on Machine Learning, pp. 1–8. ACM, New York (2006)

  30. Deisenroth, M.P.: Efficient Reinforcement Learning Using Gaussian Processes. KIT Scientific Publishing (2010)

  31. Mitrovic, D., Klanke, S., Vijayakumar, S.: Adaptive optimal feedback control with learned internal dynamics models. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol. 264, pp. 65–84. Springer, Berlin (2010)

  32. Deisenroth, M.P., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: International Conference on Machine Learning, pp. 465–472 (2011)

  33. Liberzon, D.: Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, Princeton (2012)

  34. Kirk, D.: Optimal Control Theory: An Introduction. Dover, Mineola (2004)

  35. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  36. Konda, V., Tsitsiklis, J.: On actor-critic algorithms. SIAM J. Control Optim. 42(4), 1143–1166 (2004)

  37. Dierks, T., Jagannathan, S.: Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics. In: Proceedings of the IEEE Conference on Decision and Control, Shanghai, China, pp. 6750–6755 (2009)

  38. Vamvoudakis, K.G., Lewis, F.L.: Online synchronous policy iteration method for optimal control. In: Yu, W. (ed.) Recent Advances in Intelligent Control Systems, pp. 357–374. Springer, London (2009)

  39. Dierks, T., Jagannathan, S.: Optimal control of affine nonlinear continuous-time systems. In: Proceedings of the American Control Conference, pp. 1568–1573 (2010)

  40. Khalil, H.K.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River (2002)

  41. Chowdhary, G.: Concurrent learning for convergence in adaptive control without persistency of excitation, Ph.D. dissertation, Georgia Institute of Technology (2010)

  42. Chowdhary, G., Johnson, E.: A singular value maximizing data recording algorithm for concurrent learning. In: Proceedings of the American Control Conference, pp. 3547–3552 (2011)

  43. Chowdhary, G., Yucelen, T., Mühlegg, M., Johnson, E.N.: Concurrent learning adaptive control of linear systems with exponentially convergent bounds. Int. J. Adapt. Control Signal Process. 27(4), 280–301 (2013)

  44. Kamalapurkar, R., Walters, P., Dixon, W.E.: Model-based reinforcement learning for approximate optimal regulation. Automatica 64, 94–104 (2016)

  45. Kamalapurkar, R., Andrews, L., Walters, P., Dixon, W.E.: Model-based reinforcement learning for infinite-horizon approximate optimal tracking. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 753–758 (2017)

  46. Kamalapurkar, R., Klotz, J., Dixon, W.E.: Concurrent learning-based online approximate feedback Nash equilibrium solution of N-player nonzero-sum differential games. IEEE/CAA J. Autom. Sin. 1(3), 239–247 (2014)

  47. Luo, B., Wu, H.-N., Huang, T., Liu, D.: Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica (2014)

  48. Yang, X., Liu, D., Wei, Q.: Online approximate optimal control for affine non-linear systems with unknown internal dynamics using adaptive dynamic programming. IET Control Theory Appl. 8(16), 1676–1688 (2014)

  49. Rosenfeld, J.A., Kamalapurkar, R., Dixon, W.E.: The state following (StaF) approximation method. IEEE Trans. Neural Netw. Learn. Syst. 30(6), 1716–1730 (2019)

  50. Lorentz, G.G.: Bernstein Polynomials, 2nd edn. Chelsea Publishing Co., New York (1986)

  51. Rosenfeld, J.A., Kamalapurkar, R., Dixon, W.E.: State following (StaF) kernel functions for function approximation Part I: theory and motivation. In: Proceedings of the American Control Conference, pp. 1217–1222 (2015)

  52. Deptula, P., Rosenfeld, J., Kamalapurkar, R., Dixon, W.E.: Approximate dynamic programming: combining regional and local state following approximations. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2154–2166 (2018)

  53. Walters, P.S.: Guidance and control of marine craft: an adaptive dynamic programming approach, Ph.D. dissertation, University of Florida (2015)

  54. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)

  55. Lee, H., Battle, A., Raina, R., Ng, A.Y.: Efficient sparse coding algorithms. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 801–808 (2007)

  56. Nivison, S.A., Khargonekar, P.: Improving long-term learning of model reference adaptive controllers for flight applications: a sparse neural network approach. In: Proceedings of the AIAA Guidance, Navigation and Control Conference (2017)

  57. Nivison, S.A., Khargonekar, P.P.: Development of a robust deep recurrent neural network controller for flight applications. In: Proceedings of the American Control Conference, pp. 5336–5342. IEEE (2017)

  58. Boureau, Y.-L., LeCun, Y., Ranzato, M.: Sparse feature learning for deep belief networks. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 1185–1192 (2008)

  59. Nivison, S.A., Khargonekar, P.P.: Development of a robust, sparsely-activated, and deep recurrent neural network controller for flight applications. In: Proceedings of the IEEE Conference on Decision and Control, pp. 384–390. IEEE (2018)

  60. Nivison, S.A., Khargonekar, P.: A sparse neural network approach to model reference adaptive control with hypersonic flight applications. In: Proceedings of the AIAA Guidance, Navigation and Control Conference, p. 0842 (2018)

  61. Greene, M.L., Deptula, P., Nivison, S., Dixon, W.E.: Sparse learning-based approximate dynamic programming with barrier constraints. IEEE Control Syst. Lett. 4(3), 743–748 (2020)

  62. Nivison, S.A.: Sparse and deep learning-based nonlinear control design with hypersonic flight applications, Ph.D. dissertation, University of Florida (2017)

  63. Walters, P., Kamalapurkar, R., Voight, F., Schwartz, E., Dixon, W.E.: Online approximate optimal station keeping of a marine craft in the presence of an irrotational current. IEEE Trans. Robot. 34(2), 486–496 (2018)

  64. Fan, Q.-Y., Yang, G.-H.: Active complementary control for affine nonlinear control systems with actuator faults. IEEE Trans. Cybern. 47(11), 3542–3553 (2016)

Author information

Corresponding author

Correspondence to Warren E. Dixon.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Greene, M.L., Deptula, P., Kamalapurkar, R., Dixon, W.E. (2021). Mixed Density Methods for Approximate Dynamic Programming. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_5
