
Approximate dynamic programming for stochastic N-stage optimization with application to optimal consumption under uncertainty


Abstract

Stochastic optimization problems with an objective function that is additive over a finite number of stages are addressed. Although Dynamic Programming allows one to formally solve such problems, closed-form solutions can be derived only in particular cases. The search for suboptimal solutions via two approaches is addressed: approximation of the value functions and approximation of the optimal decision policies. The approximations take on the form of linear combinations of basis functions containing adjustable parameters to be optimized together with the coefficients of the combinations. Two kinds of basis functions are considered: Gaussians with varying centers and widths and sigmoids with varying weights and biases. The accuracies of such suboptimal solutions are investigated via estimates of the error propagation through the stages. Upper bounds are derived on the differences between the optimal value of the objective functional and its suboptimal values corresponding to the use at each stage of approximate value functions and approximate policies. Conditions under which the number of basis functions required for a desired approximation accuracy does not grow “too fast” with respect to the dimensions of the state and random vectors are provided. As an example of application, a multidimensional problem of optimal consumption under uncertainty is investigated, where consumers aim at maximizing a social utility function. Numerical simulations are provided, emphasizing computational pros and cons of the two approaches (i.e., value-function approximation and optimal-policy approximation) using the above-mentioned two kinds of basis functions. To investigate the dependencies of the performances on dimensionality, the numerical analysis is performed for various numbers of consumers. In the simulations, discretization techniques exploiting low-discrepancy sequences are used. Both theoretical and numerical results give insights into the possibility of coping with the curse of dimensionality in stochastic optimization problems whose decision strategies depend on large numbers of variables.



Notes

  1. The expression “curse of dimensionality”, introduced by Bellman [6], refers to an unacceptably fast growth, with respect to a suitably-defined “dimension” of the problem at hand, of the computational time or memory requirements necessary to solve an instance of the problem itself.

  2. A sigmoid is a bounded measurable function \(\sigma : \mathbb {R}\rightarrow \mathbb {R}\) such that \(\lim_{t \to -\infty} \sigma(t)=0\) and \(\lim_{t \to +\infty} \sigma(t)=1\) [25]; see Sect. 4.

  3. I.e., with a finite number of variables and an infinite number of constraints.

  4. In general, such expectations and maximizations cannot be performed exactly. These additional sources of error are due to the replacements of the conditional expectations by empirical means (computed on the basis of a sufficiently large number of samples) and to the use of nonlinear programming algorithms. Tools from statistical learning theory [28, 59, 85] and deterministic learning (see, e.g., [21] and the references therein) can be used to derive upper bounds on the required number of samples for a desired degree of accuracy. However, in this paper we do not address these issues, so in the following we shall simplify the analysis by assuming a large number of samples and neglecting the error in performing the maximizations and conditional expectations.

  5. To see how \(\tilde{J}_{t}\) is chosen, see the proof of Proposition 5, in particular formula (52).

  6. In particular, for deterministic infinite-horizon problems, in [72] it was shown that high-order differentiability may be lost at unstable stationary points of the dynamical system \(x_{t+1}=g(x_{t})\), where g is the (stationary) optimal policy.

  7. We assume for simplicity of notation that the minimum in (52) is achieved, otherwise one can replace the argmin in (52) by an element \(\tilde{f}_{t}\) of \(\mathcal {F}_{V,t}\) whose value of \(\|\tilde{f}_{t}- \hat{J}_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})}\) is arbitrarily close to its infimum.

References

  1. Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)

  2. Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)

  3. Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)

  4. Anderson, E.J., Nash, P.: Linear Programming in Infinite-Dimensional Spaces. Wiley, New York (1987)

  5. Baird, L.: Residual algorithms: reinforcement learning with function approximation. In: Proc. 12th Int. Conf. on Machine Learning, pp. 30–37 (1995)

  6. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

  7. Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)

  8. Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation—a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)

  9. Benveniste, L.M., Scheinkman, J.A.: On the differentiability of the value function in dynamic models of economics. Econometrica 47, 727–732 (1979)

  10. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont (2005)

  11. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2. Athena Scientific, Belmont (2007)

  12. Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  13. Bertsekas, D.P., Borkar, V.S., Nedic, A.: Improved temporal difference methods with linear function approximation. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 233–257. IEEE Press, New York (2004)

  14. Bhattacharya, R., Majumdar, M.: Random Dynamical Systems: Theory and Applications. Cambridge University Press, Cambridge (2007)

  15. Blume, L., Easley, D., O’Hara, M.: Characterization of optimal plans for stochastic dynamic programs. J. Econ. Theory 28, 221–234 (1982)

  16. Buhmann, M.D.: Radial Basis Functions. Cambridge University Press, Cambridge (2003)

  17. Busoniu, L., Babuska, R., Schutter, B.D., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, Boca Raton (2010)

  18. Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)

  19. Cervellera, C., Gaggero, M., Macciò, D.: Efficient kernel models for learning and approximate minimization problems. Neurocomputing 97, 74–85 (2012)

  20. Cervellera, C., Gaggero, M., Macciò, D., Marcialis, R.: Quasi-random sampling for approximate dynamic programming. In: Proc. 2013 Int. Joint Conf. on Neural Networks, pp. 2567–2574 (2013)

  21. Cervellera, C., Gaggero, M., Macciò, D.: Low-discrepancy sampling for approximate dynamic programming with local approximators. Comput. Oper. Res. 43, 108–115 (2014)

  22. Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)

  23. Cruz-Suárez, H., Montes-de-Oca, R.: Discounted Markov control processes induced by deterministic systems. Kybernetika 42, 647–664 (2006)

  24. Cruz-Suárez, H., Montes-de-Oca, R.: An envelope theorem and some applications to discounted Markov decision processes. Math. Methods Oper. Res. 67, 299–321 (2008)

  25. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989)

  26. Ekeland, I., Turnbull, T.: Infinite-Dimensional Optimization and Convexity. University of Chicago Press, Chicago (1983)

  27. Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)

  28. Farahmand, A., Ghavamzadeh, M., Szepesvári, C., Mannor, S.: Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In: Proc. American Control Conf., pp. 725–730 (2009)

  29. Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)

  30. Gaggero, M., Gnecco, G., Sanguineti, M.: Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results. J. Optim. Theory Appl. 156, 380–416 (2013)

  31. Gaggero, M., Gnecco, G., Sanguineti, M.: Suboptimal policies for stochastic N-stage optimization problems: accuracy analysis and a case study from optimal consumption. In: El Ouardighi, F., Kogan, K. (eds.) Models and Methods in Economics and Management Science. International Series in Operations Research and Management Science, vol. 198, pp. 27–50. Springer, Berlin (2014)

  32. Gelfand, I.M., Fomin, S.V.: Calculus of Variations. Prentice Hall, Englewood Cliffs (1963)

  33. Giulini, S., Sanguineti, M.: Approximation schemes for functional optimization problems. J. Optim. Theory Appl. 140, 33–54 (2009)

  34. Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher’s complexity. Appl. Math. Sci. 2, 153–176 (2008)

  35. Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)

  36. Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)

  37. Gribonval, R., Vandergheynst, P.: On the exponential convergence of matching pursuits in quasi-incoherent dictionaries. IEEE Trans. Inf. Theory 52, 255–261 (2006)

  38. Grisvard, P.: Elliptic Problems in Nonsmooth Domains. Pitman, Boston (1985)

  39. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)

  40. Hernandez-Lerma, O., Lasserre, J.: Approximation schemes for infinite linear programs. SIAM J. Optim. 8, 973–988 (1998)

  41. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1996)

  42. Ito, S., Wu, S.Y., Shiu, T.J., Teo, K.L.: A numerical approach to infinite-dimensional linear programming in L 1 spaces. J. Ind. Manag. Optim. 6, 15–18 (2010)

  43. Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)

  44. Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)

  45. Kainen, P., Kůrková, V., Sanguineti, M.: Minimization of error functionals over variable-basis functions. SIAM J. Optim. 14, 732–742 (2003)

  46. Kainen, P.C., Kůrková, V., Sanguineti, M.: Dependence of computational models on input dimension: tractability of approximation and optimization tasks. IEEE Trans. Inf. Theory 58, 1203–1214 (2012)

  47. Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)

  48. Kuhn, D.: Generalized Bounds for Convex Multistage Stochastic Programs. Springer, Berlin (2005)

  49. Kůrková, V., Sanguineti, M.: Bounds on rates of variable-basis and neural-network approximation. IEEE Trans. Inf. Theory 47, 2659–2665 (2001)

  50. Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)

  51. Kůrková, V., Sanguineti, M.: Error estimates for approximate optimization by the extended Ritz method. SIAM J. Optim. 18, 461–487 (2005)

  52. Kůrková, V., Sanguineti, M.: Approximate minimization of the regularized expected error over kernel models. Math. Oper. Res. 33, 747–756 (2008)

  53. Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)

  54. Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)

  55. Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)

  56. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)

  57. Montrucchio, L.: Lipschitz continuous policy functions for strongly concave optimization problems. J. Math. Econ. 16, 259–273 (1987)

  58. Montrucchio, L.: Thompson metric, contraction property and differentiability of policy functions. J. Econ. Behav. Organ. 33, 449–466 (1998)

  59. Munos, R., Szepesvári, C.: Finite-time bounds for fitted value iteration. J. Mach. Learn. Res. 1, 815–857 (2008)

  60. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)

  61. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)

  62. Philbrick, C.R., Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)

  63. Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numer. 8, 143–195 (1999)

  64. Poggio, T., Smale, S.: The mathematics of learning: dealing with data. Not. Am. Math. Soc. 50, 537–544 (2003)

  65. Powell, W.B.: Approximate Dynamic Programming. Wiley-Interscience, Hoboken (2007)

  66. Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978)

  67. Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)

  68. Rincon-Zapatero, J.P., Santos, M.S.: Differentiability of the value function without interiority assumptions. J. Econ. Theory 144, 1948–1964 (2009)

  69. Sahinidis, N.V.: Optimization under uncertainty: state-of-the-art and opportunities. Comput. Chem. Eng. 28, 971–983 (2004)

  70. Santos, M.S.: Smoothness of policy function in discrete time economic models. Econometrica 59, 1365–1382 (1991)

  71. Santos, M.S.: On high-order differentiability of the policy function. Econ. Theory 3, 565–570 (1993)

  72. Santos, M.S., Vila, J.L.: Smoothness of the policy function in continuous-time economic models: the one-dimensional case. J. Econ. Dyn. Control 15, 741–753 (1988)

  73. Schweitzer, P.J., Seidmann, A.: Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. 110, 568–582 (1985)

  74. Secomandi, N.: Optimal commodity trading with a capacitated storage asset. Manag. Sci. 56, 449–467 (2010)

  75. Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)

  76. Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)

  77. Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)

  78. Sobol’, I.M.: The distribution of points in a cube and the approximate evaluation of integrals. Ž. Vyčisl. Mat. Mat. Fiz. 7, 784–802 (1967)

  79. Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)

  80. Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)

  81. ten Hagen, S., Kröse, B.: Neural Q-learning. Neural Comput. Appl. 12, 81–88 (2003)

  82. Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)

  83. Tsitsiklis, J.N.: Perspectives on stochastic optimization over time. INFORMS J. Comput. 22, 18–19 (2010)

  84. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)

  85. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)

  86. White, D.J.: Markov Decision Processes. Wiley, New York (1993)

  87. Zoppoli, R., Parisini, T., Sanguineti, M.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)


Acknowledgements

The authors would like to thank the anonymous referees for their fruitful comments and suggestions, and for having pointed out the related literature [5, 28, 59, 74] and [82]. Part of the discussion contained in Sect. 1.4 is due to one of the reviewers, to whom we are grateful. G. Gnecco and M. Sanguineti were partially supported by the Progetto di Ricerca di Ateneo 2012 “Models and Computational Methods for High-Dimensional Data”, granted by the University of Genoa.

Author information

Corresponding author

Correspondence to Marcello Sanguineti.

Appendix: Proofs and technical lemmas

Proof of Proposition 5

(i) We use a backward induction argument. For t=N−1,…,0, assume that at stage t+1, \(\tilde{J}_{t+1} \in \mathcal {F}_{V,t+1}\) is such that \(\| J_{t+1}-\tilde{J}_{t+1}\|_{{\mathrm{sup}}(\bar{Y}_{t+1})} \leq {\eta}_{t+1}\) for some \(\eta_{t+1}\geq 0\). In particular, as \(\tilde{J}_{N} = J_{N}\), one has \(\eta_{N}=0\). Let \(\hat{J}_{t}\) be defined as in formula (11). This can be written as \(\hat{J}_{t}=T_{t} \tilde{J}_{t+1}\), where \(T_{t}\) is Bellman’s operator, defined for every bounded and continuous function \(k_{t+1}\) on \(X_{t+1} \times Z_{t+1}\) as

$$(T_{t} k_{t+1}) (x_t,z_t) :=\sup_{y \in \mathcal{D}_t(x_t,z_t)} \Bigl[h_t(x_t, y,z_t) + \beta \mathop {{\mathbb{E}}}_{z_{t+1}|z_t} \bigl\{ k_{t+1}(y,z_{t+1})\bigr\} \Bigr]. $$

Thanks to Assumption 3, it follows by an application of [80, Theorem 9.6] that

$$\begin{aligned} \|J_{t}-\hat{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} =& \bigl\| (T_t J_{t+1})-(T_t \tilde{J}_{t+1})\bigr\| _{{\mathrm{sup}}(\bar{Y}_t)} \\ \leq &\beta \|J_{t+1}-\tilde{J}_{t+1}\|_{{\mathrm{sup}}(\bar{Y}_{t+1})} \leq \beta \eta_{t+1}. \end{aligned}$$
(51)

Before moving to the t-th stage, one has to find an approximation \(\tilde{J}_{t} \in \mathcal {F}_{V,t}\) for \(J_{t}=T_{t} J_{t+1}\). Such an approximation has to be obtained from \(\hat{J}_{t}=T_{t} \tilde{J}_{t+1}\) (which, in general, may not belong to \(\mathcal {F}_{V,t}\)), because \(J_{t}=T_{t} J_{t+1}\) is unknown.

For a bounded and continuous function f on \(\bar{Y}_{t}\) and \(r>0\), denote by \(B_{r}(f)\) the closed ball of radius r in the sup-norm over \(\bar{Y}_{t}\) centered at f, i.e., \(B_{r}(f):=\{g \ \mathrm{bounded} \ \mathrm{and}\ \mathrm{continuous}\ \mathrm{function}\ \mathrm{on}\ \bar{Y}_{t}\,|\, \|f-g\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq r \}\). Let \(J_{t} \in B_{\beta\eta_{t+1}}(\hat{J}_{t})\). By assumption, there exists \(f_{t} \in \mathcal {F}_{V,t}\) such that \(\|J_{t}-f_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq \varepsilon_{t}\), i.e., \(f_{t} \in B_{\varepsilon_{t}}(J_{t}) \cap \mathcal {F}_{V,t}\). Since \(J_{t}\) is unknown, one cannot simply set \(\tilde{J}_{t}:=f_{t}\), but one can define (see Note 7) \(\tilde{J}_{t}\) as any element of the set

$$ \arg\min_{\tilde{f}_t \in \mathcal {F}_{V,t}} \|\hat{J}_t- \tilde{f}_t\|_{{\mathrm{sup}}(\bar{Y}_t)}. $$
(52)

Hence, by the assumption on \(f_{t}\), the definition of \(\tilde{J}_{t}\), and the triangle inequality, we get

$$\|\hat{J}_t-\tilde{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq \|\hat{J}_t-J_t\|_{{\mathrm{sup}}(\bar{Y}_t)} +\|J_t-f_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq \beta \eta_{t+1} + \varepsilon_t. $$

Moreover, by (51) we have

$$\|J_t-\tilde{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq \|J_t-\hat{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} +\|\hat{J}_t-\tilde{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq 2 \beta \eta_{t+1} + \varepsilon_t :=\eta_t. $$

Starting from \(\tilde{J}_{N} = J_{N}\) and making explicit the dependence of \(\eta_{t}\) on \(\varepsilon_{t},\ldots,\varepsilon_{N-1}\), after \(N-t\) iterations (t=0,…,N−1) we obtain

$$ \|J_t-\tilde{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq \eta_t= \varepsilon_t + 2\beta\, \varepsilon_{t+1} + \cdots+(2\beta)^{N-1-t}\,\varepsilon_{N-1}= \sum_{i=0}^{N-1-t}(2\beta)^i \varepsilon_{t+i}. $$
(53)

Then, we obtain \(\|J-\tilde{J}\|_{{\mathrm{sup}}(\bar{Y}_{0})}=\|J_{0}-\tilde{J}_{0}\|_{{\mathrm{sup}}(\bar{Y}_{0})} \leq \sum_{i=0}^{N-1}{(2\beta)^{i}\varepsilon_{i}}\).

(ii) In order to prove (19), we need to find, for each t, suitable nonnegative constants \(\eta_{t}\), \(\delta_{t}\), and \(\chi_{t}\) such that

$$\begin{aligned} \|\tilde{J}_t-J_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq &\eta_t , \\ \|\tilde{J}_{V,t}-J_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq &\delta_t, \\ \|\tilde{J}_{V,t}-\tilde{J}_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq &\chi_t. \end{aligned}$$

In the following we show how one can define such constants in a recursive way and we prove that

$$\delta_0=2 \sum_{t=1}^{N-1} (2\beta)^t \varepsilon_t - 2 \sum_{t=1}^{N-1} \beta^t \varepsilon_t, $$

from which (19) follows. We divide this part of the proof into several steps.

Step 1: stage N−1.

We recall that the policy \(\tilde{g}_{V,N-1}(x_{N-1},z_{N-1})\) is defined as

$$\begin{aligned} & \tilde{g}_{V,N-1}(x_{N-1},z_{N-1}) \\ &\quad :=\arg\max_{y \in \mathcal{D}_{N-1}(x_{N-1},z_{N-1})} \Bigl[h_{N-1}(x_{N-1},y,z_{N-1}) + \beta \mathop {{\mathbb{E}}}_{z_N|z_{N-1}} \bigl\{ \tilde{J}_N (y,z_{N}) \bigr\} \Bigr]. \end{aligned}$$
(54)

Since \(\tilde{J}_{N}=J_{N}\), the policy \(\tilde{g}_{V,N-1}\) is also optimal for the problem (54) with \(\tilde{J}_{N}\) replaced by \(J_{N}\). Then we obtain \(\tilde{J}_{V,N-1}=J_{N-1}\). Concluding, we can set \(\delta_{N-1}:=0\) and \(\chi_{N-1}:=\eta_{N-1}\).

Step 2: stage N−2.

The reoptimized policy \(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2})\) is defined as

$$\begin{aligned} &\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}) \\ &\quad := \arg\max_{y \in \mathcal{D}_{N-2}(x_{N-2},z_{N-2})} \Bigl[h_{N-2}(x_{N-2},y,z_{N-2}) + \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{N-1} (y,z_{N-1}) \bigr\} \Bigr]. \end{aligned}$$
(55)

Now, we evaluate the distance of \(\tilde{J}_{V,N-2}\) from \(J_{N-2}\) in the sup-norm, then we determine \(\delta_{N-2}\). Since \(\tilde{J}_{V,N-2}\) corresponds to the application of the suboptimal policies \(\tilde{g}_{V,N-2}\) and \(\tilde{g}_{V,N-1}\) at stages N−2 and N−1, we get

$$ \tilde{J}_{V,N-2}(x_{N-2},z_{N-2}) \leq J_{N-2}(x_{N-2},z_{N-2}). $$
(56)

Moreover,

$$\begin{aligned} &\tilde{J}_{V,N-2}(x_{N-2},z_{N-2}) \\ &\quad =h_{N-2} \bigl(x_{N-2},\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-2}\bigr) \\ &\qquad {}+ \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{V,N-1} \bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\quad =h_{N-2}\bigl(x_{N-2},\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-2}\bigr) \\ &\qquad {}+ \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{N-1} \bigl( \tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\qquad {}+\beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{V,N-1} \bigl( \tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \\ &\qquad {}-\tilde{J}_{N-1}\bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1} \bigr) \bigr\} \\ &\quad \geq J_{N-2}(x_{N-2},z_{N-2}) - \beta \eta_{N-1} - \beta \chi_{N-1}, \end{aligned}$$
(57)

where the last inequality in (57) is obtained by observing that

$$\begin{aligned} &\bigl\vert \tilde{J}_{V,N-1} \bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \\ &\quad {}-\tilde{J}_{N-1}\bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1} \bigr)\bigr\vert \leq\chi_{N-1}, \\ & h_{N-2}\bigl(x_{N-2},\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-2} \bigr) \\ &\qquad {}+ \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{N-1} \bigl( \tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\quad =(T_{N-2} \tilde{J}_{N-1}) (x_{N-2},z_{N-2}), \\ & h_{N-2}\bigl(x_{N-2},g_{N-2}(x_{N-2},z_{N-2}),z_{N-2}\bigr) \\ &\qquad {}+\beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ J_{N-1}\bigl(g_{N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\quad =(T_{N-2} J_{N-1}) (x_{N-2},z_{N-2})= J_{N-2} (x_{N-2},z_{N-2}) \end{aligned}$$

(where \(T_{N-2}\) is Bellman’s operator at stage N−2), and

$$ \|T_{N-2} \tilde{J}_{N-1}-T_{N-2} J_{N-1}\|_{{\mathrm{sup}}(\bar{Y}_{N-2})} \leq \beta \|\tilde{J}_{N-1}-J_{N-1} \|_{{\mathrm{sup}}(\bar{Y}_{N-1})} \leq \beta \eta_{N-1}, $$
(58)

and the first inequality in (58) follows by an application of the contraction property of Bellman’s operator [80, Theorem 9.6].

Concluding, we obtain \(\|\tilde{J}_{V,N-2}-J_{N-2}\|_{{\mathrm{sup}}(\bar{Y}_{N-2})} \leq \delta_{N-2}:= \beta(\eta_{N-1}+\chi_{N-1})\) and, by an application of the triangle inequality, \(\|\tilde{J}_{V,N-2}-\tilde{J}_{N-2}\|_{{\mathrm{sup}}(\bar{Y}_{N-2})} \leq \chi_{N-2}:=\eta_{N-2}+\delta_{N-2}\).

Step 3: other stages.

Similarly, we get the following recursive formulas for the other stages (t=N−3,…,0): \(\|\tilde{J}_{V,t}-J_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq \delta_{t}:= \beta(\eta_{t+1}+\chi_{t+1})\) and \(\|\tilde{J}_{V,t}-\tilde{J}_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq \chi_{t}:=\eta_{t}+\delta_{t}\).

Step 4: dependence of \(\delta_{0}\) on the errors \(\eta_{0},\ldots,\eta_{N-1}\).
$$ \begin{aligned} &\delta_{N-1}=0, \\ &\chi_{N-1}=\eta_{N-1}, \\ &\delta_{N-2}=\beta(\eta_{N-1}+\chi_{N-1})=2\beta \eta_{N-1}, \\ &\chi_{N-2}=\eta_{N-2}+\delta_{N-2}= \eta_{N-2}+2\beta \eta_{N-1}, \\ &\delta_{N-3}=\beta(\eta_{N-2}+\chi_{N-2})=2\beta \eta_{N-2} + 2 \beta^2 \eta_{N-1}, \\ &\chi_{N-3}=\eta_{N-3}+\delta_{N-3}= \eta_{N-3}+2\beta \eta_{N-2} + 2 \beta^2 \eta_{N-1}, \\ &\delta_{N-4}=\beta(\eta_{N-3}+\chi_{N-3})=2\beta \eta_{N-3} + 2 \beta^2 \eta_{N-2} + 2 \beta^3 \eta_{N-1}, \\ &\vdots \\ &\delta_{0}=2 \beta \eta_1+2 \beta^2 \eta_2 + 2 \beta^3 \eta_3 + \cdots + 2 \beta^{N-1} \eta_{N-1}=2 \sum_{i=1}^{N-1} \beta^i \eta_i. \end{aligned} $$
(59)
Step 5: dependence of \(\delta_{0}\) on the errors \(\varepsilon_{1},\ldots,\varepsilon_{N-1}\).

By (59) and the expressions (53) for \(\eta_{0},\ldots,\eta_{N-1}\), we make explicit the dependence of \(\delta_{0}\) on the errors \(\varepsilon_{1},\ldots,\varepsilon_{N-1}\), as shown in the following:

$$\begin{aligned} \delta_{0} =& 2 \sum_{i=1}^{N-1} \sum_{j=0}^{N-1-i} \beta^i (2\beta)^j \varepsilon_{i+j} \\ =& 2 \bigl[\beta \varepsilon_1 + \bigl(\beta^2 + 2 \beta^2\bigr) \varepsilon_2 + \bigl(\beta^3 + 2 \beta^3 + 4 \beta^3\bigr) \varepsilon_3 + \cdots \\ &{}+ \bigl(\beta^{N-1} + 2 \beta^{N-1} + \cdots + 2^{N-2} \beta^{N-1}\bigr) \varepsilon_{N-1} \bigr] \\ =& 2 \sum_{i=1}^{N-1} \Biggl(\sum_{j=0}^{i-1} 2^j \Biggr) \beta^i \varepsilon_i = 2 \sum_{i=1}^{N-1} \bigl(2^i-1\bigr) \beta^i \varepsilon_i =2 \sum_{i=1}^{N-1} (2 \beta)^i \varepsilon_i - 2 \sum_{i=1}^{N-1} \beta^i \varepsilon_i, \end{aligned}$$

which concludes the proof.  □
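
As a minimal numerical illustration of the error propagation analyzed above, the bounds (53) and (59) can be evaluated for assumed values of the horizon N, the discount factor β, and the per-stage value-function approximation errors \(\varepsilon_{t}\); the values below are placeholders chosen only for the sake of the example.

```python
# Numerical sketch of the bounds in Proposition 5; all values are assumed placeholders.
N = 5               # number of stages (assumed)
beta = 0.9          # discount factor (assumed)
eps = [0.01] * N    # value-function approximation errors eps_0, ..., eps_{N-1} (assumed)

# Bound (53): eta_t = sum_{i=0}^{N-1-t} (2*beta)^i * eps_{t+i}
eta = [sum((2 * beta) ** i * eps[t + i] for i in range(N - t)) for t in range(N)]
print("eta_0 =", eta[0])   # bound of part (i) on the value-function error at stage 0

# Recursion (59): delta_{N-1} = 0, chi_{N-1} = eta_{N-1},
# delta_t = beta * (eta_{t+1} + chi_{t+1}), chi_t = eta_t + delta_t
delta, chi = [0.0] * N, [0.0] * N
chi[N - 1] = eta[N - 1]
for t in range(N - 2, -1, -1):
    delta[t] = beta * (eta[t + 1] + chi[t + 1])
    chi[t] = eta[t] + delta[t]

# Closed form of part (ii): delta_0 = 2*sum_i (2*beta)^i eps_i - 2*sum_i beta^i eps_i, i = 1, ..., N-1
delta0_closed = (2 * sum((2 * beta) ** i * eps[i] for i in range(1, N))
                 - 2 * sum(beta ** i * eps[i] for i in range(1, N)))
print("delta_0 (recursion) =", delta[0], " delta_0 (closed form) =", delta0_closed)
```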

Proof of Proposition 6

We compare the behavior of the two dynamical systems \(x_{t+1}=g_{t}(x_{t},z_{t})\) and \(\tilde{x}_{t+1}=\tilde{g}_{P,t}(\tilde{x}_{t},z_{t})\), for the same initial state \(\tilde{x}_{0}=x_{0}\) and the same sequence of random vectors \(z_{t}\). By the Lipschitz continuity of \(h_{t}\) and \(h_{N}\), we get

$$\begin{aligned} \|J -\tilde{J}_P\|_{{\mathrm{sup}}(\bar{Y}_0)} \leq &\mathop {{\mathbb{E}}}_{z_1,\ldots,z_N|z_0} \Biggl\{ \sum_{t=0}^{N-1} L_{h_t} \beta^t \sqrt{\|x_t-\tilde{x}_t \|^2 + \|x_{t+1}-\tilde{x}_{t+1}\|^2} \\ &{}+ \beta^N L_{h_N} \|x_N - \tilde{x}_N\| \Biggr\} . \end{aligned}$$
(60)

The Lipschitz continuity of \(g_{t}\) and the triangle inequality give

$$\begin{aligned} \|x_{t+1}-\tilde{x}_{t+1}\| =& \bigl\| g_t(x_{t},z_t)-\tilde{g}_{P,t}(\tilde{x}_t,z_t)\bigr\| \\ =& \bigl\| g_t(x_{t},z_t)-g_t( \tilde{x}_t,z_t) +g_t(\tilde{x}_t,z_t)-\tilde{g}_{P,t}(\tilde{x}_t,z_t)\bigr\| \\ \leq& \bigl\| g_t(x_{t},z_t)-g_t( \tilde{x}_t,z_t)\bigr\| + \bigl\| g_t(\tilde{x}_t,z_t)- \tilde{g}_{P,t}(\tilde{x}_t,z_t)\bigr\| \\ \leq& L_{g_t}\|x_t-\tilde{x}_t\| + \bigl\| g_t(\tilde{x}_t,z_t)-\tilde{g}_{P,t}(\tilde{x}_t,z_t)\bigr\| . \end{aligned}$$
(61)

Let \(\tilde{\theta}_{t} :=\|x_{t}-\tilde{x}_{t}\|\). Then \(\tilde{\theta}_{0}=0\), and Eq. (61), together with \(\|g_{t} -\tilde{g}_{P,t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq {\varepsilon}_{t}\), provides \(\tilde{\theta}_{t+1} \leq L_{g_{t}}\tilde{\theta}_{t}+{\varepsilon}_{t}\) for t=0,…,N−1. So, the upper bound \(\|J -\tilde{J}_{P}\|_{{\mathrm{sup}}(\bar{Y}_{0})}\leq \sum_{t=0}^{N-1} \beta^{t} L_{h_{t}} \sqrt{\theta_{t}^{2}+\theta_{t+1}^{2}}+\beta^{N} L_{h_{N}}\theta_{N}\) is obtained by (60) and the definition of \(\theta_{t}\) (which does not depend on the particular sequence of random vectors \(z_{t}\)). As \(\theta_{t}\) is nonnegative for t=0,…,N−1, we have \(\sqrt{\theta_{t}^{2}+\theta_{t+1}^{2}} \leq \theta_{t} +\theta_{t+1}\). This, together with the upper bound \(\|J -\tilde{J}_{P}\|_{{\mathrm{sup}}(\bar{Y}_{0})}\leq \sum_{t=0}^{N-1} \beta^{t} L_{h_{t}} \sqrt{\theta_{t}^{2}+\theta_{t+1}^{2}}+\beta^{N} L_{h_{N}}\theta_{N}\) and the linear difference equation \(\theta_{t+1}=L_{g_{t}}\theta_{t}+{\varepsilon}_{t}\), yields the estimate \(\|J -\tilde{J}_{P}\|_{{\mathrm{sup}}(\bar{Y}_{0})} \leq \sum_{t=0}^{N-1} \varGamma_{t} {\varepsilon}_{t}\).  □
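
Analogously, the error propagation in the proof of Proposition 6 can be illustrated numerically: the policy-approximation errors \(\varepsilon_{t}\) drive the linear recursion \(\theta_{t+1}=L_{g_{t}}\theta_{t}+\varepsilon_{t}\) (with \(\theta_{0}=0\)), and the objective gap is bounded through the Lipschitz constants of the \(h_{t}\). The constants below are assumed placeholders.

```python
import math

# Numerical sketch of the bound in Proposition 6; all constants are assumed placeholders.
N = 5
beta = 0.9
eps = [0.01] * N          # sup-norm policy-approximation errors (assumed)
L_g = [1.1] * N           # Lipschitz constants of the optimal policies g_t (assumed)
L_h = [1.0] * (N + 1)     # Lipschitz constants of h_0, ..., h_{N-1}, h_N (assumed)

# Linear difference equation theta_{t+1} = L_g[t] * theta_t + eps[t], theta_0 = 0.
theta = [0.0] * (N + 1)
for t in range(N):
    theta[t + 1] = L_g[t] * theta[t] + eps[t]

# Bound from the proof, and its looser version obtained via sqrt(a^2 + b^2) <= a + b.
bound_sqrt = (sum(beta ** t * L_h[t] * math.sqrt(theta[t] ** 2 + theta[t + 1] ** 2) for t in range(N))
              + beta ** N * L_h[N] * theta[N])
bound_sum = (sum(beta ** t * L_h[t] * (theta[t] + theta[t + 1]) for t in range(N))
             + beta ** N * L_h[N] * theta[N])
print("bound with square roots:", bound_sqrt, " looser bound:", bound_sum)
```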

The next two lemmas provide smoothness properties of value functions and optimal policies. Lemma 17 (which gives high-order continuity properties) will be used in the proof of Lemma 18, which, in turn, will be exploited in the proofs of Propositions 7 and 8.

Lemma 17

Under Assumption 2, \(J_{t} \in \mathcal {C}^{m}(X_{t} \times Z_{t})\), t=0,…,N, and the optimal policy \(g_{t}\) satisfies \(g_{t} \in \mathcal {C}^{m-1}(X_{t} \times Z_{t}, \mathbb {R}^{d})\), t=0,…,N−1.

Proof of Lemma 17

The regularity conditions stated in Assumption 2 (among these, Assumption 2(iii), which is used only in this part of the proof) guarantee that solving Problem \(\varSigma_{N}^{\mathrm{stoch}}\) is equivalent to solving Bellman’s equations iteratively (this follows, e.g., by an application of [48, Theorem 2.6]). The remainder of the proof proceeds by backward induction, as described in the following.

Suppose that, for t=0,…,N−1 and the value of \(\alpha_{t+1}>0\) given in Assumption 2(iv) for t=N−1 and in Assumption 2(v) for t=0,…,N−2, one knows that \(J_{t+1}(x_{t+1},z_{t+1}) \in \mathcal {C}^{m}(X_{t+1} \times Z_{t+1})\) and is \(\alpha_{t+1}\)-concave in \(x_{t+1}\) (i.e., if one fixes \(z_{t+1}\), then \(J_{t+1}(x_{t+1},z_{t+1})\) is \(\alpha_{t+1}\)-concave as a function of \(x_{t+1}\)). For t=N−1 this is guaranteed by Assumption 2(iv). We show that \(J_{t}(x_{t},z_{t})\) also belongs to \(\mathcal {C}^{m}(X_{t} \times Z_{t})\) and is \(\alpha_{t}\)-concave in its first argument (i.e., the function \(J_{t}(\cdot,z_{t})\) is \(\alpha_{t}\)-concave for every fixed \(z_{t} \in Z_{t}\)), with the same value of \(\alpha_{t}>0\) given in Assumption 2(v).

Let \((x_{t}, z_{t}) \in \operatorname {int}(X_{t} \times Z_{t})\) and \(J^{e}_{t+1}(y,z_{t}):=\mathop {{\mathbb{E}}}_{z_{t+1}|z_{t}} \{J_{t+1}(y,z_{t+1}) \}\). By computing the conditional expected value, it is easy to see that \(J^{e}_{t+1}(y,z_{t})\) is \(\alpha_{t+1}\)-concave in y. As \(J_{t+1} \in \mathcal {C}^{m}(X_{t+1} \times Z_{t+1})\) and \(p_{z_{t+1}|{z_{t}}} \in \mathcal {C}^{m}(Z_{t+1} \times Z_{t})\), by interchanging derivatives and conditional expectations we get \(J^{e}_{t+1} \in \mathcal {C}^{m}(X_{t+1} \times Z_{t})\). Since by hypothesis the optimal policy \(g_{t}\) is interior, the first-order optimality condition

$$ \nabla_2 h_t\bigl(x_t,g_t(x_t,z_t),z_t\bigr)+\beta \nabla_1 J^e_{t+1} \bigl(g_t(x_t,z_t),z_t\bigr)=0 $$
(62)

holds. Then, by the Implicit Function Theorem we get

$$\begin{aligned} \nabla_{x_t} g_t(x_t,z_t) = &-\bigl[\nabla^2_{2,2} \bigl(h_t\bigl(x_t, g_t(x_t,z_t),z_t \bigr) \bigr)+ \beta \nabla^2_{1,1} J^e_{t+1} \bigl(g_t(x_t,z_t),z_t\bigr)\bigr]^{-1} \\ &{}\cdot\nabla^2_{2,1}h_t\bigl(x_t,g_t(x_t,z_t),z_t\bigr) \end{aligned}$$
(63)

and

$$\begin{aligned} \nabla_{z_t} g_t(x_t,z_t) =& -\bigl[\nabla^2_{2,2} \bigl(h_t\bigl(x_t,g_t(x_t,z_t),z_t\bigr) \bigr) + \beta \nabla^2_{1,1} J^e_{t+1}\bigl(g_t(x_t,z_t),z_t\bigr)\bigr]^{-1} \\ &{}\cdot \nabla^2_{2,3}h_t\bigl(x_t,g_t(x_t,z_t),z_t \bigr), \end{aligned}$$
(64)

where the matrix \(\nabla^{2}_{2,2} (h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t}) )+ \beta \nabla^{2}_{1,1} J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\) is non-singular. Indeed, by the concavity of \(h_{t}\) with respect to its second argument (which holds by Assumption 2(v)), the matrix \(\nabla^{2}_{2,2} (h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t}))\) is negative semi-definite, and the matrix \(\nabla^{2}_{1,1} J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\) is negative definite since \(J^{e}_{t+1}\) is \(\alpha_{t+1}\)-concave in the first argument and \(\alpha_{t+1}>0\). Hence, the matrix \(\nabla^{2}_{2,2} (h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t}) )+ \beta \nabla^{2}_{1,1} J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\) is non-singular, and so (63) and (64) are well defined.

By differentiating (63) and (64) up to the order m, we get \(g_{t} \in C^{m-1}(\operatorname {int}(X_{t} \times Z_{t}), \mathbb {R}^{d})\). As the expressions obtained for its partial derivatives up to the order m−1 are bounded and continuous not only on \(\operatorname {int}(X_{t} \times Z_{t})\) but on the whole \(X_{t} \times Z_{t}\), one has \(g_{t} \in C^{m-1}(X_{t} \times Z_{t}, \mathbb {R}^{d})\).

Since \(J_{t}(x_{t},z_{t})=h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t})+\beta J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\), by differentiating this expression we get

$$\begin{aligned} \bigl(\nabla_{x_t} J_t(x_t,z_t) \bigr)^{T} =& \bigl(\nabla_1 h_t \bigl(x_t,g_t(x_t,z_t),z_t\bigr)\bigr)^{T} + \bigl[\nabla_2 h_t\bigl(x_t,g_t(x_t,z_t),z_t\bigr) \\ &{}+ \beta \nabla_1 J^e_{t+1} \bigl(g_t(x_t,z_t),z_t\bigr)\bigr]^{T} \nabla_{x_t} g_t(x_t,z_t) \\ =& \bigl(\nabla_1 h_t\bigl(x_t,g_t(x_t,z_t),z_t \bigr)\bigr)^{T}, \end{aligned}$$
(65)

where the last equality holds by the first-order optimality condition (62). Similarly, we have

$$\begin{aligned} \bigl(\nabla_{z_t} J_t(x_t,z_t) \bigr)^{T} =& \bigl(\nabla_3 h_t \bigl(x_t,g_t(x_t,z_t),z_t \bigr)\bigr)^{T} +\bigl[\nabla_2 h_t \bigl(x_t,g_t(x_t,z_t),z_t\bigr) \\ &{}+ \beta \nabla_1 J^e_{t+1} \bigl(g_t(x_t,z_t),z_t\bigr)\bigr]^{T} \nabla_{z_t} g_t(x_t,z_t) \\ &{}+ \beta \bigl(\nabla_2 J^e_{t+1} \bigl(g_t(x_t,z_t),z_t\bigr)\bigr)^{T} \\ =&\bigl(\nabla_3 h_t\bigl(x_t,g_t(x_t,z_t),z_t \bigr)\bigr)^{T} + \beta \bigl(\nabla_2 J^e_{t+1}\bigl(g_t(x_t,z_t),z_t \bigr)\bigr)^{T}\!. \end{aligned}$$
(66)

By differentiating (65) and (66) up to the order m, we get \(J_{t} \in \mathcal {C}^{m}(\operatorname {int}(X_{t} \times Z_{t}))\). As in the case of the optimal policy, this extends to \(J_{t} \in \mathcal {C}^{m}(X_{t} \times Z_{t})\).

In order to conclude the backward induction step, it remains to show that \(J_{t}\) is \(\alpha_{t}\)-concave in its first argument for the same value of \(\alpha_{t}>0\) in Assumption 2(v). By [58, Proposition 1], for every \(a_{0} \in \mathbb {R}^{d}\) and fixed \(z_{t}\) we get

$$\begin{aligned} a_0^{T}\nabla^2_{1,1} J_t(x_t,z_t)a_0 =&\max _{a_1 \in \mathbb {R}^d} \bigl[(a_0,a_1)^{T} \nabla^2_{(1,2)} h_t\bigl(x_t,g_t(x_t,z_t),z_t \bigr) (a_0,a_1) \\ &{}+ \beta a_1^{T}\nabla^2_{1,1} J^e_{t+1}\bigl(g_t(x_t,z_t),z_t \bigr)a_1 \bigr]. \end{aligned}$$
(67)

By Assumption 2(v) and the concavity of \(J^{e}_{t+1}\) with respect to the first argument, it follows that \(a_{0}^{T}\nabla^{2}_{1,1} J_{t}(x_{t},z_{t})a_{0} \leq -\alpha_{t} \|a_{0}\|^{2}\). As a twice continuously differentiable function is α-concave iff the maximum eigenvalue of its Hessian is at most −α, it follows that \(J_{t}\) is \(\alpha_{t}\)-concave in its first argument.  □
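
The characterization of α-concavity used in the last step (a twice continuously differentiable function is α-concave iff the maximum eigenvalue of its Hessian is at most −α) can be checked numerically at a given point. The following minimal sketch does so for an assumed quadratic example; it is only an illustration of the criterion, with placeholder values, not part of the proof.

```python
import numpy as np

def is_alpha_concave_at(hessian, alpha):
    """True if the largest eigenvalue of the (symmetric) Hessian is <= -alpha."""
    return float(np.max(np.linalg.eigvalsh(hessian))) <= -alpha

# Hessian of an assumed strongly concave quadratic (placeholder values).
H = np.array([[-2.0, 0.5],
              [0.5, -1.5]])
print(is_alpha_concave_at(H, alpha=1.0))   # True: both eigenvalues are below -1
```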

Recall that for \(r>0\) and a positive integer l, the Bessel potential of order r on \(\mathbb {R}^{l}\), denoted by \(G_{r}(x)\), is the inverse Fourier transform of the function \({\hat {G}_{r}}(\omega):= (2\pi)^{-\frac{l}{2}}(1+\|{\omega}\|^{2})^{-r/2}\). It is continuous when r>l [79, p. 132]. For r>0 and 1≤p≤∞, \(\mathcal {B}^{r}_{p}(\mathbb {R}^{l})\) is the Bessel potential space, whose elements are functions u such that \(u=f \ast G_{r}\), where \(f \in \mathcal {L}_{p}(\mathbb {R}^{l})\) [79, p. 134].

For an open set \(\varOmega \subseteq \mathbb {R}^{l}\), a positive integer m, and 1≤p≤∞, by \(\mathcal {W}_{p}^{m}(\varOmega)\) we denote the Sobolev space of functions whose partial derivatives up to the order m are in \(\mathcal {L}_{p}(\varOmega)\).

Lemma 18

Let Assumption 2 be satisfied. Then the following hold.

  • Case 1: m≥2 and 1<p<∞.

    1. (i)

      For every t=0,…,N there exists a function \(\bar{J}^{p}_{t} \in \mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) such that \(J_{t}=\bar{J}^{p}_{t}|_{X_{t} \times Z_{t}}\).

    2. (ii)

      For every t=0,…,N−1 and j=1,…,d there exists a function \(\bar{g}_{t,j}^{p} \in \mathcal {B}^{m-1}_{p}(\mathbb {R}^{d+s})\) such that, for the component g t,j of the optimal policy g t , one has \(g_{t,j}=\bar{g}^{p}_{t,j}|_{X_{t} \times Z_{t}}\).

  • Case 2: m≥2 and p=1.

    1. (iii)

      If m=2, then (i) holds with \(\mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{2}_{1}(\mathbb {R}^{d+s})\).

    2. (iv)

      If m>2 is even, then (i) holds with \(\mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m}_{1}(\mathbb {R}^{d+s})\) and (ii) holds with \(\mathcal {B}^{m-1}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m-2}_{1}(\mathbb {R}^{d+s})\).

    3. (v)

      If m is odd, then (i) holds with \(\mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m-1}_{1}(\mathbb {R}^{d+s})\) and (ii) holds with \(\mathcal {B}^{m-1}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m-1}_{1}(\mathbb {R}^{d+s})\).

Proof of Lemma 18

  1. (i)

By Lemma 17 and the compactness of \(X_{t} \times Z_{t}\), we get \(J_{t} \in \mathcal {C}^{m}(X_{t} \times Z_{t}) \subset \mathcal {W}^{m}_{p}(\operatorname {int}(X_{t} \times Z_{t}))\). As \(X_{t}\) is bounded and convex, by [38, Corollary 1.2.2.3, p. 12] it has a Lipschitz boundary. By Assumption 2(i), (ii), \(X_{t} \times Z_{t}\) has a Lipschitz boundary, too. So, by Sobolev’s extension theorem [79, Theorem 5, p. 181], which holds thanks to the regularity of the boundary, \(J_{t}\) can be extended for every 1≤p≤∞ to the whole \(\mathbb {R}^{d+s}\), yielding a function \(\bar{J}_{t}^{p} \in \mathcal {W}^{m}_{p}(\mathbb {R}^{d+s})\). By the equivalence between Sobolev spaces and Bessel potential spaces [79, Theorem 3, p. 135], which holds for 1<p<∞, we have \(\bar{J}^{p}_{t} \in \mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) which, when restricted to \(X_{t} \times Z_{t}\), proves (i).

  2. (ii)

By Lemma 17, \(g_{t,j} \in \mathcal {C}^{m-1}(X_{t} \times Z_{t}) \subset \mathcal {W}^{m-1}_{p}(\operatorname {int}(X_{t} \times Z_{t}))\). Then the statement follows by arguments similar to those in the last part of the proof of (i).

  3. (iii), (iv), and (v)

    As d+s>1, for l even the inclusion \(\mathcal {W}^{l}_{1}(\mathbb {R}^{d+s}) \subset \mathcal {B}^{l}_{1}(\mathbb {R}^{d+s})\) holds [79, Remark 6.6(b), p. 160]. Then (iii), (iv), and (v) follow by combining this with \(J_{t} \in \mathcal {W}^{m}_{1}(\mathbb {R}^{d+s})\) and \(g_{t,j} \in \mathcal {W}^{m-1}_{1}(\mathbb {R}^{d+s})\).  □

The next lemma, which is proven in [34, Sect. 5] and will be used in the proof of Proposition 7, states an approximation property that holds for functions belonging to the Bessel potential space \(\mathcal {B}^{r}_{1}(\mathbb {R}^{l})\), when r>l.

Lemma 19

[34, Corollary 5.2]

Let l be a positive integer and r>l. Then for every \(f \in \mathcal {B}^{r}_{1}(\mathbb {R}^{l})\) and every positive integer n there exist \(t_{1}, \ldots, t_{n} \in \mathbb {R}^{l}\), \(b_{1},\ldots,b_{n}>0\), \(\delta_{1},\ldots,\delta_{n} \in \{-1,+1\}\), and an absolute constant C>0 such that

$$\sup_{x \in \mathbb {R}^l} \Biggl| f({x})-\frac{K(r,l)}{n} \sum_{i=1}^n \delta_i\, e^{-\frac{\|x-t_i\|^2}{b_i}}\Biggr| \leq C\,K(r,l) \sqrt {\frac{l+3}{n}}, $$

where \(K(r,l):= 2^{-l/2}\,\frac{\varGamma(r/2-l/2)}{\varGamma(r/2)}\|\lambda\|_{\mathcal {L}_{1}(\mathbb {R}^{l})}\) and \(\varGamma(z) := \int_{0}^{\infty}y^{z-1}e^{-y}\,dy\) is the Gamma function.
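
For illustration, the next sketch evaluates a Gaussian approximant of the form appearing in Lemma 19, i.e., \((K(r,l)/n)\sum_{i=1}^{n}\delta_{i}\, e^{-\|x-t_{i}\|^{2}/b_{i}}\) with signs \(\delta_{i}\in\{-1,+1\}\); the centers, widths, and signs below are arbitrary placeholders, since the lemma only asserts that suitable values exist.

```python
import numpy as np

def gaussian_approximant(x, K, centers, widths, signs):
    """(K / n) * sum_i signs[i] * exp(-||x - centers[i]||^2 / widths[i])."""
    n = len(signs)
    sq_dist = np.sum((x[None, :] - centers) ** 2, axis=1)
    return (K / n) * float(np.sum(signs * np.exp(-sq_dist / widths)))

rng = np.random.default_rng(0)
l, n = 3, 16                                  # dimension and number of Gaussian units (assumed)
centers = rng.standard_normal((n, l))         # placeholder centers t_i
widths = rng.uniform(0.5, 2.0, n)             # placeholder widths b_i > 0
signs = rng.choice([-1.0, 1.0], size=n)       # placeholder signs delta_i
print(gaussian_approximant(rng.standard_normal(l), K=1.0,
                           centers=centers, widths=widths, signs=signs))
```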

Proof of Proposition 7

  1. (i)

    Let d+s be even. By Lemma 18, there exists a function \(\bar{J}^{o,1}_{t} \in \mathcal {B}^{d+s+2}_{1}(\mathbb {R}^{d+s})\) such that \(J_{t}=\bar{J}^{o,1}_{t}|_{X_{t} \times Z_{t}}\). Since the order r=d+s+2 of the Bessel potential associated to \(\mathcal {B}^{r}_{1}(\mathbb {R}^{d+s})\) is larger than the number of variables d+s, we conclude the proof by Lemma 19.

For d+s odd, by Lemma 18 there exists a function \(\bar{J}^{o,1}_{t} \in \mathcal {B}^{d+s+1}_{1}(\mathbb {R}^{d+s})\) such that \(J_{t}=\bar{J}^{o,1}_{t}|_{X_{t} \times Z_{t}}\). As the order r=d+s+1 of the Bessel potential associated to \(\mathcal {B}^{r}_{1}(\mathbb {R}^{d+s})\) is larger than the number of variables d+s, we conclude the proof by Lemma 19.

  2. (ii)

It is proven like (i) by replacing m, \(J_{t}\), and \(\bar{J}^{o,1}_{t}\) with m−1, \(g_{t}\), and \(\bar{g}^{o,1}_{t}\), respectively.  □

The next lemma, which is an immediate consequence of [56, Theorem 2.1], will be used in the proof of Proposition 8. Recall that in Sect. 4 we have defined

$$\begin{aligned} \mathcal {S}^\prime :=\biggl\{ &\psi:\mathbb {R}\to \mathbb {R}\ \big|\ \psi\ {\rm nonzero, \ infinitely \ many \ times \ differentiable \ in \ some \ open \ interval}\ \mathcal {I}\subset \mathbb {R}, \\ &{\rm and \ such \ that \ there \ exists} \ c \in \mathcal {I}: \frac{\partial^k \psi}{\partial z^k} \biggm|_{z=c} \neq 0\ \forall k \in \mathbb {N}\biggr\} . \end{aligned}$$

Lemma 20

Let r and l be positive integers and \(\psi \in \mathcal {S}^{\prime}\). Then for every \(f \in \mathcal {C}^{r}(\mathbb {R}^{l})\) and every positive integer n there exist \(a_{1}, \ldots, a_{n} \in \mathbb {R}^{l}\), \(b_{1}, \ldots, b_{n} \in \mathbb {R}\), \(\delta_{1}, \ldots, \delta_{n} \in \mathbb {R}\), and a constant C>0 (we emphasize that C does not depend on f) such that

$$ \sup_{x \in \mathbb {R}^l} \Biggl| f({x})-\sum _{i=1}^n \delta_i \psi\bigl(\langle a_i , x \rangle + b_i\bigr)\Biggr| \leq C n^{-{r}/{l}}. $$
(68)
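
Analogously, the following sketch evaluates a sigmoidal model of the form \(\sum_{i=1}^{n}\delta_{i}\,\psi(\langle a_{i},x\rangle+b_{i})\) appearing in Lemma 20, taking the logistic function as ψ (it satisfies the limit conditions recalled in Note 2); all parameter values are arbitrary placeholders.

```python
import numpy as np

def sigmoidal_model(x, a, b, delta):
    """sum_i delta[i] * psi(<a[i], x> + b[i]), with psi the logistic sigmoid."""
    z = a @ x + b
    return float(np.sum(delta / (1.0 + np.exp(-z))))

rng = np.random.default_rng(0)
l, n = 3, 10                                  # input dimension and number of units (assumed)
a = rng.standard_normal((n, l))               # placeholder weights a_i
b = rng.standard_normal(n)                    # placeholder biases b_i
delta = rng.standard_normal(n) / n            # placeholder outer coefficients delta_i
print(sigmoidal_model(rng.standard_normal(l), a, b, delta))
```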

Proof of Proposition 8

One proceeds as in the proof of Proposition 7, using Lemma 20 instead of Lemma 19 and noting that for r≥⌊l/2⌋+1, the right-hand side of (68) can be replaced by \(Cn^{-1/2}\).  □

Proof of Theorem 9

One combines Proposition 7(i) with Proposition 5(i) and (ii). □

Proof of Theorem 10

Proceeding as in the proof of Lemma 17, we obtain that the t-th value function \(J_{t}\) and the t-th optimal policy \(g_{t}\) are Lipschitz continuous. Hence, for t=0,…,N−1 and any sequence of \(\varepsilon_{t}>0\), Proposition 6(ii) gives

$$ \|J-\tilde{J}_{P, \mathit{GRBFN}}\|_{{\mathrm{sup}}(\bar{Y}_0)} \leq \sum_{t=0}^{N-1} \varGamma_t {\varepsilon}_t $$
(69)

for any sequence of functions \(\gamma_{t} \in \mathcal {G}(n_{t,1},\ldots,n_{t,d}; \bar{Y}_{t}, X_{t+1})\) such that \(\|g_{t} -\gamma_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq {\varepsilon}_{t}\) (t=0,…,N−1).

By Proposition 7(ii), for every t=0,…,N−1 and j=1,…,d, there exists \(C_{t,j}>0\) such that, for every positive integer \(n_{t,j}\), there exist \(\tau_{t,j,1}, \ldots,\tau_{t,j,n_{t,j}} \in \mathbb {R}^{d}\), \(b_{t,j,1}, \ldots, b_{t,j,n_{t,j}}>0\), and \(\delta_{t,j,1}, \ldots, \delta_{t,j,n_{t,j}} \in \mathbb {R}\) for which

$$ \sup_{\bar{y}_t \in \bar{Y}_t} \Biggl\vert g_{t,j}(\bar{y}_t)-\sum_{i=1}^{n_{t,j}} \delta_{t,j,i} \exp \biggl(-\frac{\|\bar{y}_t-\tau_{t,j,i}\|^2}{b_{t,j,i}} \biggr) \Biggr\vert \leq \frac{C_{t,j}}{\sqrt{n_{t,j}}} . $$
(70)

By (16), for every continuous function \(\gamma_{t}=(\gamma_{t,1},\ldots,\gamma_{t,d}): \bar{Y}_{t} \to \mathbb {R}^{d}\) we have

$$ \|\gamma_t\|_{{\mathrm{sup}}(\bar{Y}_t)} \leq \sqrt{\sum_{j=1}^d \Bigl( \sup_{\bar{y}_t \in \bar{Y}_t} \bigl|\gamma_{t,j}(\bar{y}_t)\bigr| \Bigr)^2}. $$
(71)

Since \(g_{t}(\bar{y}_{t}) \in X_{t+1}\) and \(\operatorname {Prj}_{\mathcal {D}_{t}(x_{t},z_{t})}\) is a nonexpansive map, from (70) and (71) we get

$$\begin{aligned} \sup_{\bar{y}_t \in \bar{Y}_t} \Biggl| &g_{t}(\bar{y}_t)-\operatorname {Prj}_{\mathcal {D}_{t}(x_t,z_t)} \Biggl\{ \sum_{i=1}^{n_{t,1}} \delta_{t,1,i} \exp \biggl(-\frac{\|\bar{y}_t-\tau_{t,1,i}\|^2}{b_{t,1,i}} \biggr), \ldots, \\ & \sum_{i=1}^{n_{t,d}} \delta_{t,d,i} \exp \biggl(-\frac{\|\bar{y}_t-\tau_{t,d,i}\|^2}{b_{t,d,i}} \biggr) \Biggr\} \Biggr| \leq \sqrt{\sum_{j=1}^d \frac{C_{t,j}^2}{n_{t,j}}} . \end{aligned}$$
(72)

Hence, for every t=0,…,N−1, there exists \({\tilde{g}}_{t} \in \mathcal {G}(n_{t,1},\ldots,n_{t,d}; \bar{Y}_{t}, X_{t+1})\) such that

$$ \bigl\Vert g_{t}(\bar{y}_t)-{\tilde{g}}_t(\bar{y}_t) \bigr\Vert _{{\mathrm{sup}}(\bar{Y}_t)} \leq \sqrt{\sum_{j=1}^d \frac{C_{t,j}^2}{n_{t,j}}} . $$

We conclude by taking \({\varepsilon}_{t} = \sqrt{\sum_{j=1}^{d} \frac{C_{t,j}^{2}}{n_{t,j}}}\) in (69).  □
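
The policy approximator built in the proof (each component a Gaussian radial-basis-function model as in (70), followed by the projection \(\operatorname{Prj}\) of the d-dimensional output onto the feasible set) can be sketched as follows. Purely for illustration, the feasible set is taken to be a box, so the Euclidean projection reduces to componentwise clipping; in the paper \(\mathcal{D}_{t}(x_{t},z_{t})\) is a general set onto which the projection is nonexpansive.

```python
import numpy as np

def grbf_component(y, centers, widths, coeffs):
    """One policy component: sum_i coeffs[i] * exp(-||y - centers[i]||^2 / widths[i])."""
    sq_dist = np.sum((y[None, :] - centers) ** 2, axis=1)
    return float(np.sum(coeffs * np.exp(-sq_dist / widths)))

def projected_policy(y, params, lower, upper):
    """Evaluate the d Gaussian-network components and project (clip) onto the box [lower, upper]^d."""
    raw = np.array([grbf_component(y, c, w, a) for (c, w, a) in params])
    return np.clip(raw, lower, upper)

rng = np.random.default_rng(1)
dim_y, d, n = 4, 2, 8      # input dimension, number of outputs, basis functions per output (assumed)
params = [(rng.standard_normal((n, dim_y)),    # placeholder centers tau_{t,j,i}
           rng.uniform(0.5, 2.0, n),           # placeholder widths b_{t,j,i}
           rng.standard_normal(n) / n)         # placeholder coefficients delta_{t,j,i}
          for _ in range(d)]
print(projected_policy(rng.standard_normal(dim_y), params, lower=0.0, upper=1.0))
```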

Proofs of Theorems 11 and 12

They are along the lines of the proofs of Theorems 9 and 10, respectively, by exploiting Proposition 8 instead of Proposition 7. □

Proof of Proposition 14

We first derive some conditions on the sets \(A_{t,j}\), then we show that the budget constraints (46) are satisfied if and only if the sets \(A_{t,j}\) are chosen as in Assumption 13 (or are suitable subsets).

As the labour incomes \(y_{t,j}\) and the stochastic interest rates \(r_{t,j}\) are bounded from above, for t=1,…,N we have

$$a_{t,j} \leq a_{0,j}^\mathrm{max} \prod _{k=0}^{t-1}\bigl(1+r_{k,j}^\mathrm{max} \bigr) + \sum_{i=0}^{t-1} y_{i,j} \prod_{k=i}^{t-1}\bigl(1+r_{k,j}^\mathrm{max} \bigr)=a_{t,j}^\mathrm{max} $$

(the upper bound is achieved when all the \(r_{t,j}\) are equal to \(r_{t,j}^{\mathrm{max}}\) and all the consumptions \(c_{t,j}\) are equal to 0), so the corresponding feasible sets \(A_{t,j}\) are bounded from above by \(a_{t,j}^{\mathrm{max}}\). Boundedness from below of each \(A_{t,j}\) follows from the budget constraints (46), which for \(c_{k,j}=0\) (k=t,…,N) are equivalent to

$$ a_{N,j} \geq -y_{N,j} $$
(73)

for t=N and

$$a_{t,j} \prod_{k=t}^{N-1} (1+r_{k,j}) + \sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j} \geq 0, $$

i.e.,

$$ a_{t,j} \geq -\frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j}}{\prod_{k=t}^{N-1} (1+r_{k,j} )} $$
(74)

for t=0,…,N−1. For k=t,…,N−1, one can show that the maximum value of the right-hand side in the lower bound (74) occurs when each r k,j is equal to \(r_{k,j}^{\mathrm{max}}\) (indeed, by computing its partial derivatives with respect to r k,j it is easy to see that they are nonnegative). So, in order to satisfy the budget constraints (46) independently of the specific sequence of stochastic interest rates, the constraints (73) and

$$ a_{t,j} \geq -\frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}^\mathrm{max}) + y_{N,j}}{\prod_{k=t}^{N-1} (1+r_{k,j}^\mathrm{max} )}, \quad t=0,\ldots,N-1, $$
(75)

have to be satisfied. Then the maximal sets \(A_{t}\) that satisfy the budget constraints (46) have the form described in Assumption 13.  □
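
As a numerical illustration of the bounds just derived, the following sketch computes, for a single consumer j with assumed (placeholder) labour incomes and maximal interest rates, the upper bounds \(a_{t,j}^{\mathrm{max}}\) and the lower bounds given by (73) and (75).

```python
# Numerical sketch of the asset bounds in the proof of Proposition 14 (one consumer);
# incomes, initial-asset bound, and maximal interest rates are assumed placeholders.
N = 4
a0_max = 1.0                      # upper bound on the initial asset level (assumed)
y = [0.5, 0.5, 0.5, 0.5, 0.5]     # labour incomes y_0, ..., y_N (assumed)
r_max = [0.05, 0.05, 0.05, 0.05]  # maximal interest rates r_0^max, ..., r_{N-1}^max (assumed)

def growth(lo, hi):
    """Product of (1 + r_k^max) for k = lo, ..., hi (empty product = 1)."""
    p = 1.0
    for k in range(lo, hi + 1):
        p *= 1.0 + r_max[k]
    return p

# Upper bounds a_t^max, t = 1, ..., N.
upper = [a0_max * growth(0, t - 1) + sum(y[i] * growth(i, t - 1) for i in range(t))
         for t in range(1, N + 1)]

# Lower bounds: (75) for t = 0, ..., N-1 and (73) for t = N.
lower = [-(sum(y[i] * growth(i, N - 1) for i in range(t, N)) + y[N]) / growth(t, N - 1)
         for t in range(N)] + [-y[N]]

print("upper bounds a_1^max, ..., a_N^max:", upper)
print("lower bounds for a_0, ..., a_N:", lower)
```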

Proof of Proposition 16

  1. (a)

    Assumption 2(i), (ii), and (iii). These follow immediately from the choices of \(A_{t}\) and \(R_{t}\), and from Assumption 15(i). By construction, the sets \(\bar{A}_{t}\) are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the sets \(\bar{D}_{t}\), since by (48) they are the intersections between \(\bar{A}_{t} \times R_{t} \times \bar{A}_{t+1}\) and the sets \(D_{t}\), which are compact, convex, and have nonempty interiors, too.

  2. (b)

    Assumption 2(iv). Recall that for Problem OCUd we have

    $$h_N(a_N)= u (a_N+y_N ). $$

    Then, \(h_{N} \in \mathcal {C}^{m}(\bar{A}_{N})\) and by Assumption 15(ii) is \(\alpha_{N}\)-concave with \(\alpha_{N}=\alpha\).

  3. (c)

    Assumption 2(v). Recall that for Problem OCUd and t=0,…,N−1, we have

    $$ h_t(a_t,a_{t+1})=u \biggl(\frac{(1+r_t) \circ\,(a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j}). $$

    Then, \(h_{t} \in \mathcal {C}^{m}(\bar{D}_{t})\) by Assumption 15(ii), (iii). As u and \(v_{t,j}\) are twice continuously differentiable, the second part of Assumption 2(v) means that there exists some \(\alpha_{t}>0\) such that, for each fixed \(r_{t}\), the function

    $$u \biggl(\frac{(1+r_t) \circ\,(a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_t \|a_t\|^2 $$

    has negative semi-definite Hessian with respect to the variables a t and a t+1. Assumption 15(ii) and straightforward computations show that, for fixed r t , the function

    $$u \biggl(\frac{(1+r_t) \circ\,(a_t+y_t)-a_{t+1}}{1+r_t} \biggr) $$

    has negative semi-definite Hessian. By Assumption 15(iii), for each j=1,…,d and \(\alpha_{t,j} \in (0,\beta_{t,j}]\), \(v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_{t,j} a_{t,j}^{2}\) has negative semi-definite Hessian, too. So, Assumption 2(v) is satisfied for every \(\alpha_{t} \in (0,\min_{j=1,\ldots,d}\{\beta_{t,j}\}]\).

  4. (d)

    Assumption 2(vi). This is Assumption 15(i), with the obvious replacements of \(X_{t}\) and \(D_{t}\).  □


About this article

Cite this article

Gaggero, M., Gnecco, G. & Sanguineti, M. Approximate dynamic programming for stochastic N-stage optimization with application to optimal consumption under uncertainty. Comput Optim Appl 58, 31–85 (2014). https://doi.org/10.1007/s10589-013-9614-z

  • DOI: https://doi.org/10.1007/s10589-013-9614-z
