Abstract
Stochastic optimization problems with an objective function that is additive over a finite number of stages are addressed. Although Dynamic Programming allows one to formally solve such problems, closed-form solutions can be derived only in particular cases. The search for suboptimal solutions via two approaches is addressed: approximation of the value functions and approximation of the optimal decision policies. The approximations take on the form of linear combinations of basis functions containing adjustable parameters to be optimized together with the coefficients of the combinations. Two kinds of basis functions are considered: Gaussians with varying centers and widths and sigmoids with varying weights and biases. The accuracies of such suboptimal solutions are investigated via estimates of the error propagation through the stages. Upper bounds are derived on the differences between the optimal value of the objective functional and its suboptimal values corresponding to the use at each stage of approximate value functions and approximate policies. Conditions under which the number of basis functions required for a desired approximation accuracy does not grow “too fast” with respect to the dimensions of the state and random vectors are provided. As an example of application, a multidimensional problem of optimal consumption under uncertainty is investigated, where consumers aim at maximizing a social utility function. Numerical simulations are provided, emphasizing computational pros and cons of the two approaches (i.e., value-function approximation and optimal-policy approximation) using the above-mentioned two kinds of basis functions. To investigate the dependencies of the performances on dimensionality, the numerical analysis is performed for various numbers of consumers. In the simulations, discretization techniques exploiting low-discrepancy sequences are used. Both theoretical and numerical results give insights into the possibility of coping with the curse of dimensionality in stochastic optimization problems whose decision strategies depend on large numbers of variables.
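For readers who prefer a concrete picture of the two families of approximators mentioned above, the following minimal sketch (not taken from the paper; all names, shapes, and numerical values are illustrative assumptions) shows a linear combination of Gaussians with adjustable centers and widths and a linear combination of sigmoids with adjustable weights and biases.

```python
import numpy as np

def gaussian_model(y, coeffs, centers, widths):
    """sum_i coeffs[i] * exp(-||y - centers[i]||^2 / widths[i]^2)."""
    d2 = ((y[None, :] - centers) ** 2).sum(axis=1)   # squared distances to the centers
    return float(coeffs @ np.exp(-d2 / widths ** 2))

def sigmoidal_model(y, coeffs, weights, biases):
    """sum_i coeffs[i] * sigma(weights[i] . y + biases[i]), sigma = logistic function."""
    pre = weights @ y + biases                       # inner products plus biases
    return float(coeffs @ (1.0 / (1.0 + np.exp(-pre))))

# Example: models with n = 50 basis functions on a 4-dimensional argument.
rng = np.random.default_rng(0)
n, dim = 50, 4
y = rng.uniform(0.0, 1.0, size=dim)
coeffs = rng.normal(size=n)
print(gaussian_model(y, coeffs, rng.uniform(0.0, 1.0, size=(n, dim)), rng.uniform(0.1, 1.0, size=n)))
print(sigmoidal_model(y, coeffs, rng.normal(size=(n, dim)), rng.normal(size=n)))
```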
Notes
The expression “curse of dimensionality”, introduced by Bellman [6], refers to an unacceptably fast growth, with respect to a suitably-defined “dimension” of the problem at hand, of the computational time or memory requirements necessary to solve an instance of the problem itself.
I.e., with a finite number of variables and an infinite number of constraints.
In general, such expectations and maximizations cannot be performed exactly. These additional sources of error are due to the replacements of the conditional expectations by empirical means (computed on the basis of a sufficiently large number of samples) and to the use of nonlinear programming algorithms. Tools from statistical learning theory [28, 59, 85] and deterministic learning (see, e.g., [21] and the references therein) can be used to derive upper bounds on the required number of samples for a desired degree of accuracy. However, in this paper we do not address these issues, so in the following we shall simplify the analysis by assuming a large number of samples and neglecting the error in performing the maximizations and conditional expectations.
To see how \(\tilde{J}_{t}\) is chosen, see the proof of Proposition 5, in particular formula (52).
In particular, for deterministic infinite-horizon problems, in [72] it was shown that high-order differentiability may be lost at unstable stationary points of the dynamical system \(x_{t+1}=g(x_{t})\), where \(g\) is the (stationary) optimal policy.
References
Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)
Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)
Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)
Anderson, E.J., Nash, P.: Linear Programming in Infinite-Dimensional Spaces. Wiley, New York (1987)
Baird, L.: Residual algorithms: reinforcement learning with function approximation. In: Proc. 12th Int. Conf. on Machine Learning, pp. 30–37 (1995)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)
Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation—a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)
Benveniste, L.M., Scheinkman, J.A.: On the differentiability of the value function in dynamic models of economics. Econometrica 47, 727–732 (1979)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont (2005)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2. Athena Scientific, Belmont (2007)
Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Bertsekas, D.P., Borkar, V.S., Nedic, A.: Improved temporal difference methods with linear function approximation. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 233–257. IEEE Press, New York (2004)
Bhattacharya, R., Majumdar, M.: Random Dynamical Systems: Theory and Applications. Cambridge University Press, Cambridge (2007)
Blume, L., Easley, D., O’Hara, M.: Characterization of optimal plans for stochastic dynamic programs. J. Econ. Theory 28, 221–234 (1982)
Buhmann, M.D.: Radial Basis Functions. Cambridge University Press, Cambridge (2003)
Busoniu, L., Babuska, R., Schutter, B.D., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, Boca Raton (2010)
Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)
Cervellera, C., Gaggero, M., Macciò, D.: Efficient kernel models for learning and approximate minimization problems. Neurocomputing 97, 74–85 (2012)
Cervellera, C., Gaggero, M., Macciò, D., Marcialis, R.: Quasi-random sampling for approximate dynamic programming. In: Proc. 2013 Int. Joint Conf. on Neural Networks, pp. 2567–2574 (2013)
Cervellera, C., Gaggero, M., Macciò, D.: Low-discrepancy sampling for approximate dynamic programming with local approximators. Comput. Oper. Res. 43, 108–115 (2014)
Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)
Cruz-Suárez, H., Montes-de-Oca, R.: Discounted Markov control processes induced by deterministic systems. Kybernetika 42, 647–664 (2006)
Cruz-Suárez, H., Montes-de-Oca, R.: An envelope theorem and some applications to discounted Markov decision processes. Math. Methods Oper. Res. 67, 299–321 (2008)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989)
Ekeland, I., Turnbull, T.: Infinite-Dimensional Optimization and Convexity. University of Chicago Press, Chicago (1983)
Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)
Farahmand, A., Ghavamzadeh, M., Szepesvári, C., Mannor, S.: Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In: Proc. American Control Conf., pp. 725–730 (2009)
Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)
Gaggero, M., Gnecco, G., Sanguineti, M.: Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results. J. Optim. Theory Appl. 156, 380–416 (2013)
Gaggero, M., Gnecco, G., Sanguineti, M.: Suboptimal policies for stochastic N-stage optimization problems: accuracy analysis and a case study from optimal consumption. In: El Ouardighi, F., Kogan, K. (eds.) Models and Methods in Economics and Management Science. International Series in Operations Research and Management Science, vol. 198, pp. 27–50. Springer, Berlin (2014)
Gelfand, I.M., Fomin, S.V.: Calculus of Variations. Prentice Hall, Englewood Cliffs (1963)
Giulini, S., Sanguineti, M.: Approximation schemes for functional optimization problems. J. Optim. Theory Appl. 140, 33–54 (2009)
Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher’s complexity. Appl. Math. Sci. 2, 153–176 (2008)
Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)
Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)
Gribonval, R., Vandergheynst, P.: On the exponential convergence of matching pursuits in quasi-incoherent dictionaries. IEEE Trans. Inf. Theory 52, 255–261 (2006)
Grisvard, P.: Elliptic Problems in Nonsmooth Domains. Pitman, Boston (1985)
Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)
Hernandez-Lerma, O., Lasserre, J.: Approximation schemes for infinite linear programs. SIAM J. Optim. 8, 973–988 (1998)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1996)
Ito, S., Wu, S.Y., Shiu, T.J., Teo, K.L.: A numerical approach to infinite-dimensional linear programming in L1 spaces. J. Ind. Manag. Optim. 6, 15–18 (2010)
Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)
Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)
Kainen, P., Kůrková, V., Sanguineti, M.: Minimization of error functionals over variable-basis functions. SIAM J. Optim. 14, 732–742 (2003)
Kainen, P.C., Kůrková, V., Sanguineti, M.: Dependence of computational models on input dimension: tractability of approximation and optimization tasks. IEEE Trans. Inf. Theory 58, 1203–1214 (2012)
Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)
Kuhn, D.: Generalized Bounds for Convex Multistage Stochastic Programs. Springer, Berlin (2005)
Kůrková, V., Sanguineti, M.: Bounds on rates of variable-basis and neural-network approximation. IEEE Trans. Inf. Theory 47, 2659–2665 (2001)
Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)
Kůrková, V., Sanguineti, M.: Error estimates for approximate optimization by the extended Ritz method. SIAM J. Optim. 18, 461–487 (2005)
Kůrková, V., Sanguineti, M.: Approximate minimization of the regularized expected error over kernel models. Math. Oper. Res. 33, 747–756 (2008)
Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)
Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)
Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)
Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)
Montrucchio, L.: Lipschitz continuous policy functions for strongly concave optimization problems. J. Math. Econ. 16, 259–273 (1987)
Montrucchio, L.: Thompson metric, contraction property and differentiability of policy functions. J. Econ. Behav. Organ. 33, 449–466 (1998)
Munos, R., Szepesvári, C.: Finite-time bounds for fitted value iteration. J. Mach. Learn. Res. 1, 815–857 (2008)
Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
Philbrick, C.R., Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)
Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numer. 8, 143–195 (1999)
Poggio, T., Smale, S.: The mathematics of learning: dealing with data. Not. Am. Math. Soc. 50, 537–544 (2003)
Powell, W.B.: Approximate Dynamic Programming. Wiley-Interscience, Hoboken (2007)
Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978)
Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)
Rincon-Zapatero, J.P., Santos, M.S.: Differentiability of the value function without interiority assumptions. J. Econ. Theory 144, 1948–1964 (2009)
Sahinidis, N.V.: Optimization under uncertainty: state-of-the-art and opportunities. Comput. Chem. Eng. 28, 971–983 (2004)
Santos, M.S.: Smoothness of policy function in discrete time economic models. Econometrica 59, 1365–1382 (1991)
Santos, M.S.: On high-order differentiability of the policy function. Econ. Theory 3, 565–570 (1993)
Santos, M.S., Vila, J.L.: Smoothness of the policy function in continuous-time economic models: the one-dimensional case. J. Econ. Dyn. Control 15, 741–753 (1988)
Schweitzer, P.J., Seidmann, A.: Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. 110, 568–582 (1985)
Secomandi, N.: Optimal commodity trading with a capacitated storage asset. Manag. Sci. 56, 449–467 (2010)
Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)
Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)
Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)
Sobol’, I.M.: The distribution of points in a cube and the approximate evaluation of integrals. Ž. Vyčisl. Mat. Mat. Fiz. 7, 784–802 (1967)
Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)
Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)
ten Hagen, S., Kröse, B.: Neural Q-learning. Neural Comput. Appl. 12, 81–88 (2003)
Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)
Tsitsiklis, J.N.: Perspectives on stochastic optimization over time. INFORMS J. Comput. 22, 18–19 (2010)
Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
White, D.J.: Markov Decision Processes. Wiley, New York (1993)
Zoppoli, R., Parisini, T., Sanguineti, M.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)
Acknowledgements
The authors would like to thank the anonymous referees for their fruitful comments and suggestions, and for having pointed out the related literature [5, 28, 59, 74], and [82]. Part of the discussion contained in Sect. 1.4 is due to one of the reviewers, to whom we are grateful. G. Gnecco and M. Sanguineti were partially supported by the Progetto di Ricerca di Ateneo 2012 “Models and Computational Methods for High-Dimensional Data”, granted by University of Genoa.
Appendix: Proofs and technical lemmas
Proof of Proposition 5
(i) We use a backward induction argument. For \(t=N-1,\ldots,0\), assume that at stage \(t+1\), \(\tilde{J}_{t+1} \in \mathcal {F}_{V,t+1}\) is such that \(\| J_{t+1}-\tilde{J}_{t+1}\|_{{\mathrm{sup}}(\bar{Y}_{t+1})} \leq {\eta}_{t+1}\) for some \(\eta_{t+1}\geq 0\). In particular, as \(\tilde{J}_{N} = J_{N}\), one has \(\eta_{N}=0\). Let \(\hat{J}_{t}\) be defined as in formula (11). It can be written as \(\hat{J}_{t}=T_{t} \tilde{J}_{t+1}\), where \(T_{t}\) is Bellman’s operator, defined for every bounded and continuous function \(k_{t+1}\) on \(X_{t+1}\times Z_{t+1}\) as
Thanks to Assumption 3, it follows by an application of [80, Theorem 9.6] that
Before moving to the \(t\)-th stage, one has to find an approximation \(\tilde{J}_{t} \in \mathcal {F}_{V,t}\) of \(J_{t}=T_{t} J_{t+1}\). Such an approximation has to be obtained from \(\hat{J}_{t}=T_{t} \tilde{J}_{t+1}\) (which, in general, may not belong to \(\mathcal {F}_{V,t}\)), because \(J_{t}=T_{t} J_{t+1}\) itself is unknown.
For a bounded and continuous function \(f\) on \(\bar{Y}_{t}\) and \(r>0\), denote by \(B_{r}(f)\) the closed ball of radius \(r\) in the sup-norm over \(\bar{Y}_{t}\) centered at \(f\), i.e., \(B_{r}(f):=\{g \ \mathrm{bounded} \ \mathrm{and}\ \mathrm{continuous}\ \mathrm{function}\ \mathrm{on}\ \bar{Y}_{t}\,|\, \|f-g\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq r \}\). Let \(J_{t} \in B_{\beta\eta_{t+1}}(\hat{J}_{t})\). By assumption, there exists \(f_{t} \in \mathcal {F}_{V,t}\) such that \(\|J_{t}-f_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq \varepsilon_{t}\), i.e., \(f_{t} \in B_{\varepsilon_{t}}(J_{t}) \cap \mathcal {F}_{V,t}\). Since \(J_{t}\) is unknown, one cannot simply set \(\tilde{J}_{t}:=f_{t}\), but one can define \(\tilde{J}_{t}\) as any element of the set
Hence, by the assumption on \(f_{t}\), the definition of \(\tilde{J}_{t}\), and the triangle inequality, we get
Moreover, by (51) we have
Starting from \(\tilde{J}_{N} = J_{N}\) and making the dependence of \(\eta_{t}\) on \(\varepsilon_{t},\ldots,\varepsilon_{N-1}\) explicit, after \(N-t\) iterations (\(t=0,\ldots,N-1\)) we obtain
Then, we obtain \(\|J-\tilde{J}\|_{{\mathrm{sup}}(\bar{Y}_{0})}=\|J_{0}-\tilde{J}_{0}\|_{{\mathrm{sup}}(\bar{Y}_{0})} \leq \sum_{i=0}^{N-1}{(2\beta)^{i}\varepsilon_{i}}\).
(ii) In order to prove (19), we need to find, for each \(t\), suitable nonnegative constants \(\eta_{t}\), \(\delta_{t}\), and \(\chi_{t}\) such that
In the following we show how one can define such constants in a recursive way and we prove that
from which (19) follows. We divide this part of the proof into several steps.
Step 1: stage \(N-1\).
We recall that the policy \(\tilde{g}_{V,N-1}(x_{N-1},z_{N-1})\) is defined as
$$\begin{aligned} & \tilde{g}_{V,N-1}(x_{N-1},z_{N-1}) \\ &\quad :=\arg\max_{y \in \mathcal{D}_{N-1}(x_{N-1},z_{N-1})} \Bigl[h_{N-1}(x_{N-1},y,z_{N-1}) + \beta \mathop {{\mathbb{E}}}_{z_N|z_{N-1}} \bigl\{ \tilde{J}_N (y,z_{N}) \bigr\} \Bigr]. \end{aligned}$$
(54)
Since \(\tilde{J}_{N}=J_{N}\), the policy \(\tilde{g}_{V,N-1}\) is optimal also for the problem (54) with \(\tilde{J}_{N}\) replaced by \(J_{N}\). Then we obtain \(\tilde{J}_{V,N-1}=J_{N-1}\). In conclusion, we can set \(\delta_{N-1}:=0\) and \(\chi_{N-1}:=\eta_{N-1}\).
Step 2: stage \(N-2\).
The reoptimized policy \(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2})\) is defined as
$$\begin{aligned} &\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}) \\ &\quad := \arg\max_{y \in \mathcal{D}_{N-2}(x_{N-2},z_{N-2})} \Bigl[h_{N-2}(x_{N-2},y,z_{N-2}) + \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{N-1} (y,z_{N-1}) \bigr\} \Bigr]. \end{aligned}$$
(55)
Now, we evaluate the distance of \(\tilde{J}_{V,N-2}\) from \(J_{N-2}\) in the sup-norm, and then we determine \(\delta_{N-2}\). Since \(\tilde{J}_{V,N-2}\) corresponds to the application of the suboptimal policies \(\tilde{g}_{V,N-2}\) and \(\tilde{g}_{V,N-1}\) at the stages \(N-2\) and \(N-1\), we get
$$ \tilde{J}_{V,N-2}(x_{N-2},z_{N-2}) \leq J_{N-2}(x_{N-2},z_{N-2}). $$
(56)
Moreover,
$$\begin{aligned} &\tilde{J}_{V,N-2}(x_{N-2},z_{N-2}) \\ &\quad =h_{N-2} \bigl(x_{N-2},\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-2}\bigr) \\ &\qquad {}+ \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{V,N-1} \bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\quad =h_{N-2}\bigl(x_{N-2},\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-2}\bigr) \\ &\qquad {}+ \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{N-1} \bigl( \tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\qquad {}+\beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{V,N-1} \bigl( \tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \\ &\qquad {}-\tilde{J}_{N-1}\bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1} \bigr) \bigr\} \\ &\quad \geq J_{N-2}(x_{N-2},z_{N-2}) - \beta \eta_{N-1} - \beta \chi_{N-1}, \end{aligned}$$
(57)
where the last inequality in (57) is obtained by observing that
$$\begin{aligned} &\bigl\vert \tilde{J}_{V,N-1} \bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \\ &\quad {}-\tilde{J}_{N-1}\bigl(\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1} \bigr)\bigr\vert \leq\chi_{N-1}, \\ & h_{N-2}\bigl(x_{N-2},\tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-2} \bigr) \\ &\qquad {}+ \beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ \tilde{J}_{N-1} \bigl( \tilde{g}_{V,N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\quad =(T_{N-2} \tilde{J}_{N-1}) (x_{N-2},z_{N-2}), \\ & h_{N-2}\bigl(x_{N-2},g_{N-2}(x_{N-2},z_{N-2}),z_{N-2}\bigr) \\ &\qquad {}+\beta \mathop {{\mathbb{E}}}_{z_{N-1}|z_{N-2}} \bigl\{ J_{N-1}\bigl(g_{N-2}(x_{N-2},z_{N-2}),z_{N-1}\bigr) \bigr\} \\ &\quad =(T_{N-2} J_{N-1}) (x_{N-2},z_{N-2})= J_{N-2} (x_{N-2},z_{N-2}) \end{aligned}$$
(where \(T_{N-2}\) is Bellman’s operator at the stage \(N-2\)), and
$$ \|T_{N-2} \tilde{J}_{N-1}-T_{N-2} J_{N-1}\|_{{\mathrm{sup}}(\bar{Y}_{N-2})} \leq \beta \|\tilde{J}_{N-1}-J_{N-1} \|_{{\mathrm{sup}}(\bar{Y}_{N-1})} \leq \beta \eta_{N-1}, $$
(58)
and the first inequality in (58) follows by an application of the contraction property of Bellman’s operator [80, Theorem 9.6].
Concluding, we obtain \(\|\tilde{J}_{V,N-2}-J_{N-2}\|_{{\mathrm{sup}}(\bar{Y}_{N-2})} \leq \delta_{N-2}:= \beta(\eta_{N-1}+\chi_{N-1})\) and, by an application of the triangle inequality, \(\|\tilde{J}_{V,N-2}-\tilde{J}_{N-2}\|_{{\mathrm{sup}}(\bar{Y}_{N-2})} \leq \chi_{N-2}:=\eta_{N-2}+\delta_{N-2}\).
Step 3: other stages.
Similarly, we get the following recursive formulas for the other stages (\(t=N-3,\ldots,0\)): \(\|\tilde{J}_{V,t}-J_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq \delta_{t}:= \beta(\eta_{t+1}+\chi_{t+1})\) and \(\|\tilde{J}_{V,t}-\tilde{J}_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq \chi_{t}:=\eta_{t}+\delta_{t}\).
Step 4: dependence of \(\delta_{0}\) on the errors \(\eta_{0},\ldots,\eta_{N-1}\).
$$ \begin{aligned} &\delta_{N-1}=0, \\ &\chi_{N-1}=\eta_{N-1}, \\ &\delta_{N-2}=\beta(\eta_{N-1}+\chi_{N-1})=2\beta \eta_{N-1}, \\ &\chi_{N-2}=\eta_{N-2}+\delta_{N-2}= \eta_{N-2}+2\beta \eta_{N-1}, \\ &\delta_{N-3}=\beta(\eta_{N-2}+\chi_{N-2})=2\beta \eta_{N-2} + 2 \beta^2 \eta_{N-1}, \\ &\chi_{N-3}=\eta_{N-3}+\delta_{N-3}= \eta_{N-3}+2\beta \eta_{N-2} + 2 \beta^2 \eta_{N-1}, \\ &\delta_{N-4}=\beta(\eta_{N-3}+\chi_{N-3})=2\beta \eta_{N-3} + 2 \beta^2 \eta_{N-2} + 2 \beta^3 \eta_{N-1}, \\ &\vdots \\ &\delta_{0}=2 \beta \eta_1+2 \beta^2 \eta_2 + 2 \beta^3 \eta_3 + \cdots + 2 \beta^{N-1} \eta_{N-1}=2 \sum_{i=1}^{N-1} \beta^i \eta_i. \end{aligned} $$(59)
Step 5: dependence of \(\delta_{0}\) on the errors \(\varepsilon_{1},\ldots,\varepsilon_{N-1}\).
By (59) and the expressions (53) for \(\eta_{0},\ldots,\eta_{N-1}\), we make the dependence of \(\delta_{0}\) on the errors \(\varepsilon_{1},\ldots,\varepsilon_{N-1}\) explicit, as shown in the following:
$$\begin{aligned} \delta_{0} =& 2 \sum_{i=1}^{N-1} \sum_{j=0}^{N-1-i} \beta^i (2\beta)^j \varepsilon_{i+j} \\ =& 2 \bigl[\beta \varepsilon_1 + \bigl(\beta^2 + 2 \beta^2\bigr) \varepsilon_2 + \bigl(\beta^3 + 2 \beta^3 + 4 \beta^3\bigr) \varepsilon_3 + \cdots \\ &{}+ \bigl(\beta^{N-1} + 2 \beta^{N-1} + \cdots + 2^{N-2} \beta^{N-1}\bigr) \varepsilon_{N-1} \bigr] \\ =& 2 \sum_{i=1}^{N-1} \Biggl(\sum_{j=0}^{i-1} 2^j \Biggr) \beta^i \varepsilon_i = 2 \sum_{i=1}^{N-1} \bigl(2^i-1\bigr) \beta^i \varepsilon_i =2 \sum_{i=1}^{N-1} (2 \beta)^i \varepsilon_i - 2 \sum_{i=1}^{N-1} \beta^i \varepsilon_i, \end{aligned}$$which concludes the proof. □
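The recursions derived in Steps 1–5 are easy to check numerically. The following small sketch (an illustration only, not part of the proof) runs the recursions \(\delta_{t}=\beta(\eta_{t+1}+\chi_{t+1})\), \(\chi_{t}=\eta_{t}+\delta_{t}\) with \(\eta_{t}=\sum_{j=0}^{N-1-t}(2\beta)^{j}\varepsilon_{t+j}\) and compares \(\delta_{0}\) with the closed form obtained above; the horizon, the discount factor, and the errors are arbitrary assumed values.

```python
import numpy as np

N, beta = 6, 0.9
eps = np.random.default_rng(1).uniform(0.0, 0.1, size=N)          # eps_0, ..., eps_{N-1}
eta = np.array([sum((2 * beta) ** j * eps[t + j] for j in range(N - t)) for t in range(N)])

delta, chi = 0.0, eta[N - 1]                                       # stage N-1
for t in range(N - 2, -1, -1):                                     # stages N-2, ..., 0
    delta = beta * (eta[t + 1] + chi)
    chi = eta[t] + delta

closed_form = 2 * sum(((2 * beta) ** i - beta ** i) * eps[i] for i in range(1, N))
print(abs(delta - closed_form) < 1e-12)                            # True
```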
Proof of Proposition 6
We compare the behavior of the two dynamical systems \(x_{t+1}=g_{t}(x_{t},z_{t})\) and \(\tilde{x}_{t+1}=\tilde{g}_{P,t}(\tilde{x}_{t},z_{t})\), for the same initial state \(\tilde{x}_{0}=x_{0}\) and the same sequence of random vectors \(z_{t}\). By the Lipschitz continuity of \(h_{t}\) and \(h_{N}\), we get
The Lipschitz continuity of \(g_{t}\) and the triangle inequality give
Let \(\tilde{\theta}_{t} :=\|x_{t}-\tilde{x}_{t}\|\). Then \(\tilde{\theta}_{0}=0\) and Eq. (61) together with \(\|g_{t} -\tilde{g}_{P,t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq {\varepsilon}_{t}\) provide \(\tilde{\theta}_{t+1} \leq L_{g_{t}}\tilde{\theta}_{t}+{\varepsilon}_{t}\) for \(t=0,\ldots,N-1\). So, the upper bound \(\|J -\tilde{J}_{P}\|_{{\mathrm{sup}}(\bar{Y}_{0})}\leq \sum_{t=0}^{N-1} \beta^{t} L_{h_{t}} \sqrt{\theta_{t}^{2}+\theta_{t+1}^{2}}+\beta^{N} L_{h_{N}}\theta_{N}\) is obtained by (60) and the definition of \(\theta_{t}\) (which does not depend on the particular sequence of random vectors \(z_{t}\)). As \(\theta_{t}\) is nonnegative for \(t=0,\ldots,N-1\), we have \(\sqrt{\theta_{t}^{2}+\theta_{t+1}^{2}} \leq \theta_{t} +\theta_{t+1}\). This, together with the upper bound \(\|J -\tilde{J}_{P}\|_{{\mathrm{sup}}(\bar{Y}_{0})}\leq \sum_{t=0}^{N-1} \beta^{t} L_{h_{t}} \sqrt{\theta_{t}^{2}+\theta_{t+1}^{2}}+\beta^{N} L_{h_{N}}\theta_{N}\) and the linear difference equation \(\theta_{t+1}=L_{g_{t}}\theta_{t}+{\varepsilon}_{t}\), yields the estimate \(\|J -\tilde{J}_{P}\|_{{\mathrm{sup}}(\bar{Y}_{0})} \leq \sum_{t=0}^{N-1} \varGamma_{t} {\varepsilon}_{t}\). □
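A minimal numerical illustration of the argument above (with hypothetical Lipschitz constants and per-stage policy errors) is the following: it propagates the state-error recursion \(\theta_{0}=0\), \(\theta_{t+1}=L_{g_{t}}\theta_{t}+\varepsilon_{t}\) and evaluates the intermediate bound \(\sum_{t=0}^{N-1}\beta^{t}L_{h_{t}}(\theta_{t}+\theta_{t+1})+\beta^{N}L_{h_{N}}\theta_{N}\).

```python
import numpy as np

N, beta = 5, 0.95
L_g = np.full(N, 1.1)              # Lipschitz constants of the optimal policies (assumed)
L_h = np.full(N + 1, 2.0)          # Lipschitz constants of h_0, ..., h_N (assumed)
eps = np.full(N, 1e-2)             # sup-norm policy-approximation errors (assumed)

theta = np.zeros(N + 1)
for t in range(N):
    theta[t + 1] = L_g[t] * theta[t] + eps[t]          # linear difference equation

bound = sum(beta ** t * L_h[t] * (theta[t] + theta[t + 1]) for t in range(N)) \
        + beta ** N * L_h[N] * theta[N]
print(theta, bound)
```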
The next two lemmas provide smoothness properties of value functions and optimal policies. Lemma 17 (which gives high-order continuity properties) will be used in the proof of Lemma 18, which, in turn, will be exploited in the proofs of Propositions 7 and 8.
Lemma 17
Under Assumption 2, \(J_{t} \in \mathcal {C}^{m}(X_{t} \times Z_{t})\), \(t=0,\ldots,N\), and the optimal policy \(g_{t}\) satisfies \(g_{t} \in \mathcal {C}^{m-1}(X_{t} \times Z_{t}, \mathbb {R}^{d})\), \(t=0,\ldots,N-1\).
Proof of Lemma 17
The regularity conditions stated in Assumption 2 (among these, Assumption 2(iii), which is used only in this part of the proof) guarantee that solving Problem \(\varSigma_{N}^{\mathrm{stoch}}\) is equivalent to iteratively solving Bellman’s equations (this follows, e.g., by an application of [48, Theorem 2.6]). The remainder of the proof proceeds by backward induction, as described in the following.
Suppose that, for \(t=0,\ldots,N-1\) and the value of \(\alpha_{t+1}>0\) given in Assumption 2(iv) for \(t=N-1\) and in Assumption 2(v) for \(t=0,\ldots,N-2\), one knows that \(J_{t+1}(x_{t+1},z_{t+1}) \in \mathcal {C}^{m}(X_{t+1} \times Z_{t+1})\) and is \(\alpha_{t+1}\)-concave in \(x_{t+1}\) (i.e., if one fixes \(z_{t+1}\), then \(J_{t+1}(x_{t+1},z_{t+1})\) is \(\alpha_{t+1}\)-concave as a function of \(x_{t+1}\)). For \(t=N-1\) this is guaranteed by Assumption 2(iv). We show that then \(J_{t}(x_{t},z_{t}) \in \mathcal {C}^{m}(X_{t} \times Z_{t})\), too, and that it is \(\alpha_{t}\)-concave in its first argument (i.e., the function \(J_{t}(\cdot,z_{t})\) is \(\alpha_{t}\)-concave for every fixed \(z_{t} \in Z_{t}\)), with the same value of \(\alpha_{t}>0\) given in Assumption 2(v).
Let \((x_{t}, z_{t}) \in \operatorname {int}(X_{t} \times Z_{t})\) and \(J^{e}_{t+1}(y,z_{t}):=\mathop {{\mathbb{E}}}_{z_{t+1}|z_{t}} \{J_{t+1}(y,z_{t+1}) \}\). By computing the conditional expected value, it is easy to see that \(J^{e}_{t+1}(y,z_{t})\) is \(\alpha_{t+1}\)-concave in \(y\). As \(J_{t+1} \in \mathcal {C}^{m}(X_{t+1} \times Z_{t+1})\) and \(p_{z_{t+1}|{z_{t}}} \in \mathcal {C}^{m}(Z_{t+1} \times Z_{t})\), by interchanging derivatives and conditional expectations we get \(J^{e}_{t+1} \in \mathcal {C}^{m}(X_{t+1} \times Z_{t})\). Since by hypothesis the optimal policy \(g_{t}\) is interior, the first-order optimality condition
holds. Then, by the Implicit Function Theorem we get
and
where the matrix \(\nabla^{2}_{2,2} (h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t}) )+ \beta \nabla^{2}_{1,1} J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\) is non-singular. Indeed, by the concavity of \(h_{t}\) with respect to its second argument (which holds by Assumption 2(v)), the matrix \(\nabla^{2}_{2,2} (h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t}))\) is negative semi-definite, and the matrix \(\nabla^{2}_{1,1} J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\) is negative definite since \(J^{e}_{t+1}\) is \(\alpha_{t+1}\)-concave in its first argument and \(\alpha_{t+1}>0\). Hence, the matrix \(\nabla^{2}_{2,2} (h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t}) )+ \beta \nabla^{2}_{1,1} J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\) is negative definite, thus non-singular, and so (63) and (64) are well defined.
By differentiating (63) and (64) up to the order \(m\), we get \(g_{t} \in C^{m-1}(\operatorname {int}(X_{t} \times Z_{t}), \mathbb {R}^{d})\). As the expressions obtained for its partial derivatives up to the order \(m-1\) are bounded and continuous not only on \(\operatorname {int}(X_{t} \times Z_{t})\) but on the whole \(X_{t} \times Z_{t}\), one has \(g_{t} \in C^{m-1}(X_{t} \times Z_{t}, \mathbb {R}^{d})\).
Since \(J_{t}(x_{t},z_{t})=h_{t}(x_{t},g_{t}(x_{t},z_{t}),z_{t})+\beta J^{e}_{t+1}(g_{t}(x_{t},z_{t}),z_{t})\), by differentiation we get
where the last equality holds by the first-order optimality condition (62). Similarly, we have
By differentiating (65) and (66) up to the order \(m\), we get \(J_{t} \in \mathcal {C}^{m}(\operatorname {int}(X_{t} \times Z_{t}))\). As in the case of the optimal policy \(g_{t}\), this extends to \(J_{t} \in \mathcal {C}^{m}(X_{t} \times Z_{t})\).
In order to conclude the backward induction step, it remains to show that \(J_{t}\) is \(\alpha_{t}\)-concave in its first argument for the same value of \(\alpha_{t}>0\) in Assumption 2(v). By [58, Proposition 1], for every \(a_{0} \in \mathbb {R}^{d}\) and \(z_{t}\) fixed we get
By Assumption 2(v) and the concavity of \(J^{e}_{t+1}\) with respect to its first argument, it follows that \(a_{0}^{T}\nabla^{2}_{1,1} J_{t}(x_{t},z_{t})a_{0} \leq -\alpha_{t} \|a_{0}\|^{2}\). As a twice continuously differentiable function is \(\alpha\)-concave if and only if the maximum eigenvalue of its Hessian is at most \(-\alpha\), it follows that \(J_{t}\) is \(\alpha_{t}\)-concave in its first argument. □
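The equivalence used in the last step (a twice continuously differentiable function is \(\alpha\)-concave if and only if the largest eigenvalue of its Hessian is at most \(-\alpha\)) can be illustrated numerically; the quadratic below is a hypothetical example, unrelated to the value functions of the paper.

```python
import numpy as np

# Hessian of the quadratic f(x) = 0.5 * x^T A x; f is alpha-concave exactly for
# alpha <= -lambda_max(A), i.e., as long as A + alpha*I stays negative semi-definite.
A = np.array([[-3.0, 0.5],
              [ 0.5, -2.0]])
alpha = -np.max(np.linalg.eigvalsh(A))                                    # largest admissible alpha
print(alpha, np.max(np.linalg.eigvalsh(A + alpha * np.eye(2))) <= 1e-12)  # ~1.79, True
```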
Recall that for \(r>0\) and a positive integer \(l\), the Bessel potential of order \(r\) on \(\mathbb {R}^{l}\), denoted by \(G_{r}(x)\), is the inverse Fourier transform of the function \({\hat {G}_{r}}(\omega):= (2\pi)^{-\frac{l}{2}}(1+\|{\omega}\|^{2})^{-r/2}\). It is continuous when \(r>l\) [79, p. 132]. For \(r>0\) and \(1\leq p\leq\infty\), \(\mathcal {B}^{r}_{p}(\mathbb {R}^{l})\) is the Bessel potential space, whose elements are functions \(u\) such that \(u=f\ast G_{r}\), where \(f \in \mathcal {L}_{p}(\mathbb {R}^{l})\) [79, p. 134].
For an open set \(\varOmega \subseteq \mathbb {R}^{l}\), a positive integer m, and 1≤p≤∞, by \(\mathcal {W}_{p}^{m}(\varOmega)\) we denote the Sobolev space of functions whose partial derivatives up to the order m are in \(\mathcal {L}_{p}(\varOmega)\).
Lemma 18
Let Assumption 2 be satisfied. Then the following hold.
Case 1: \(m\geq 2\) and \(1<p<\infty\).
(i) For every \(t=0,\ldots,N\) there exists a function \(\bar{J}^{p}_{t} \in \mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) such that \(J_{t}=\bar{J}^{p}_{t}|_{X_{t} \times Z_{t}}\).
(ii) For every \(t=0,\ldots,N-1\) and \(j=1,\ldots,d\) there exists a function \(\bar{g}_{t,j}^{p} \in \mathcal {B}^{m-1}_{p}(\mathbb {R}^{d+s})\) such that, for the component \(g_{t,j}\) of the optimal policy \(g_{t}\), one has \(g_{t,j}=\bar{g}^{p}_{t,j}|_{X_{t} \times Z_{t}}\).
Case 2: \(m\geq 2\) and \(p=1\).
(iii) If \(m=2\), then (i) holds with \(\mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{2}_{1}(\mathbb {R}^{d+s})\).
(iv) If \(m>2\) is even, then (i) holds with \(\mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m}_{1}(\mathbb {R}^{d+s})\) and (ii) holds with \(\mathcal {B}^{m-1}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m-2}_{1}(\mathbb {R}^{d+s})\).
(v) If \(m\) is odd, then (i) holds with \(\mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m-1}_{1}(\mathbb {R}^{d+s})\) and (ii) holds with \(\mathcal {B}^{m-1}_{p}(\mathbb {R}^{d+s})\) replaced by \(\mathcal {B}^{m-1}_{1}(\mathbb {R}^{d+s})\).
Proof of Lemma 18
(i) By Lemma 17 and the compactness of \(X_{t} \times Z_{t}\), we get \(J_{t} \in \mathcal {C}^{m}(X_{t} \times Z_{t}) \subset \mathcal {W}^{m}_{p}(\operatorname {int}(X_{t} \times Z_{t}))\). As \(X_{t}\) is bounded and convex, by [38, Corollary 1.2.2.3, p. 12] it has a Lipschitz boundary. By Assumption 2(i), (ii), \(X_{t} \times Z_{t}\) has a Lipschitz boundary, too. So, by Sobolev’s extension theorem [79, Theorem 5, p. 181], which holds thanks to the regularity of the boundary, \(J_{t}\) can be extended, for every \(1\leq p\leq\infty\), to the whole \(\mathbb {R}^{d+s}\), obtaining a function \(\bar{J}_{t}^{p} \in \mathcal {W}^{m}_{p}(\mathbb {R}^{d+s})\). By the equivalence between Sobolev spaces and Bessel potential spaces [79, Theorem 3, p. 135], which holds for \(1<p<\infty\), we have \(\bar{J}^{p}_{t} \in \mathcal {B}^{m}_{p}(\mathbb {R}^{d+s})\), which, when restricted to \(X_{t} \times Z_{t}\), proves (i).
(ii) By Lemma 17, \(g_{t,j} \in \mathcal {C}^{m-1}(X_{t} \times Z_{t}) \subset \mathcal {W}^{m-1}_{p}(\operatorname {int}(X_{t} \times Z_{t}))\). Then the statement follows by arguments similar to those used in the last part of the proof of (i).
(iii), (iv), and (v) As \(d+s>1\), for \(l\) even the inclusion \(\mathcal {W}^{l}_{1}(\mathbb {R}^{d+s}) \subset \mathcal {B}^{l}_{1}(\mathbb {R}^{d+s})\) holds [79, Remark 6.6(b), p. 160]. Then (iii), (iv), and (v) follow by combining this with \(\bar{J}^{1}_{t} \in \mathcal {W}^{m}_{1}(\mathbb {R}^{d+s})\) and \(\bar{g}^{1}_{t,j} \in \mathcal {W}^{m-1}_{1}(\mathbb {R}^{d+s})\). □
The next lemma, which is proven in [34, Sect. 5] and will be used in the proof of Proposition 7, states an approximation property that holds for functions belonging to the Bessel potential space \(\mathcal {B}^{r}_{1}(\mathbb {R}^{l})\), when r>l.
Lemma 19
[34, Corollary 5.2]
Let \(l\) be a positive integer and \(r>l\). Then for every \(f \in \mathcal {B}^{r}_{1}(\mathbb {R}^{l})\) and every positive integer \(n\) there exist \(t_{1}, \ldots, t_{n} \in \mathbb {R}^{l}\), \(b_{1},\ldots,b_{n}>0\), \(\delta_{1},\ldots,\delta_{n} \in \{-1,+1\}\), and an absolute constant \(C>0\) such that
where \(K(r,l):= 2^{-l/2}\,\frac{\varGamma(r/2-l/2)}{\varGamma(r/2)}\|\lambda\|_{\mathcal {L}_{1}(\mathbb {R}^{l})}\) and \(\varGamma(z) := \int_{0}^{\infty}y^{z-1}e^{-y}\,dy\) is the Gamma function.
Proof of Proposition 7
(i) Let \(d+s\) be even. By Lemma 18, there exists a function \(\bar{J}^{o,1}_{t} \in \mathcal {B}^{d+s+2}_{1}(\mathbb {R}^{d+s})\) such that \(J_{t}=\bar{J}^{o,1}_{t}|_{X_{t} \times Z_{t}}\). Since the order \(r=d+s+2\) of the Bessel potential associated with \(\mathcal {B}^{r}_{1}(\mathbb {R}^{d+s})\) is larger than the number of variables \(d+s\), we conclude the proof by Lemma 19.
For \(d+s\) odd, by Lemma 18 there exists a function \(\bar{J}^{o,1}_{t} \in \mathcal {B}^{d+s+1}_{1}(\mathbb {R}^{d+s})\) such that \(J_{t}=\bar{J}^{o,1}_{t}|_{X_{t} \times Z_{t}}\). As the order \(r=d+s+1\) of the Bessel potential associated with \(\mathcal {B}^{r}_{1}(\mathbb {R}^{d+s})\) is larger than the number of variables \(d+s\), we conclude the proof by Lemma 19.
(ii) It is proven like (i), by replacing \(m\), \(J_{t}\), and \(\bar{J}^{o,1}_{t}\) with \(m-1\), \(g_{t}\), and \(\bar{g}^{o,1}_{t}\), respectively. □
The next lemma, which is an immediate consequence of [56, Theorem 2.1], will be used in the proof of Proposition 8. Recall that in Sect. 4 we have defined
Lemma 20
Let r and l be positive integers and \(\psi \in \mathcal {S}^{\prime}\). Then for every \(f \in \mathcal {C}^{r}(\mathbb {R}^{l})\) and every positive integer n there exist \(a_{1}, \ldots, a_{n} \in \mathbb {R}^{l}\), \(b_{1}, \ldots, b_{n} \in \mathbb {R}\), \(\delta_{1}, \ldots, \delta_{n} \in \mathbb {R}\), and a constant C>0 (we emphasize that C does not depend on f) such that
Proof of Proposition 8
One proceeds as in the proof of Proposition 7, using Lemma 20 instead of Lemma 18 and noting that, for \(r\geq \lfloor l/2 \rfloor +1\), the right-hand side of (68) can be replaced by \(Cn^{-1/2}\). □
Proof of Theorem 9
One combines Proposition 7(i) with Proposition 5(i) and (ii). □
Proof of Theorem 10
Proceeding as in the proof of Lemma 17, we obtain that the \(t\)-th value function \(J_{t}\) and the \(t\)-th optimal policy \(g_{t}\) are Lipschitz continuous. Hence, for \(t=0,\ldots,N-1\) and any sequence of \(\varepsilon_{t}>0\), Proposition 6(ii) gives
for any sequence of functions \(\gamma_{t} \in \mathcal {G}(n_{t,1},\ldots,n_{t,d}; \bar{Y}_{t}, X_{t+1})\) such that \(\|g_{t} -\gamma_{t}\|_{{\mathrm{sup}}(\bar{Y}_{t})} \leq {\varepsilon}_{t}\) (t=0,…,N−1).
By Proposition 7(ii), for every \(t=0,\ldots,N-1\) and \(j=1,\ldots,d\), there exists \(C_{t,j}>0\) such that, for every positive integer \(n_{t,j}\), there exist \(\tau_{t,j,1}, \ldots,\tau_{t,j,n_{t,j}} \in \mathbb {R}^{d}\), \(b_{t,j,1}, \ldots, b_{t,j,n_{t,j}}>0\), and \(\delta_{t,j,1}, \ldots, \delta_{t,j,n_{t,j}} \in \mathbb {R}\) for which
By (16), for every continuous function \(\gamma_{t}: \bar{Y}_{t} \to \mathbb {R}^{d}\) we have
Since \(g_{t}(\bar{y}_{t}) \in X_{t+1}\) and \(\operatorname {Prj}_{\mathcal {D}_{t}(x_{t},z_{t})}\) is a nonexpansive map, from (70) and (71) we get
Hence, for every t=0,…,N−1, there exists \({\tilde{g}}_{t} \in \mathcal {G}(n_{t,1},\ldots,n_{t,d}; \bar{Y}_{t}, X_{t+1})\) such that
We conclude by taking \({\varepsilon}_{t} = \sqrt{\sum_{j=1}^{d} \frac{C_{t,j}^{2}}{n_{t,j}}}\) in (69). □
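As a small illustration of the last step (with hypothetical values for the constants \(C_{t,j}\) and the numbers \(n_{t,j}\) of basis functions), the per-stage errors \(\varepsilon_{t}=\sqrt{\sum_{j=1}^{d}C_{t,j}^{2}/n_{t,j}}\) can be evaluated as follows.

```python
import numpy as np

N, d = 4, 3
C = np.full((N, d), 1.0)                  # placeholder constants C_{t,j} from Proposition 7(ii)
n = np.full((N, d), 100)                  # numbers of basis functions n_{t,j} per component
eps = np.sqrt((C ** 2 / n).sum(axis=1))   # eps_t = sqrt(sum_j C_{t,j}^2 / n_{t,j})
print(eps)                                # each eps_t decreases as O(n_{t,j}^{-1/2})
```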
Proofs of Theorems 11 and 12
They follow along the lines of the proofs of Theorems 9 and 10, respectively, by exploiting Proposition 8 instead of Proposition 7. □
Proof of Proposition 14
We first derive some conditions on the sets \(A_{t,j}\); then we show that the budget constraints (46) are satisfied if and only if the sets \(A_{t,j}\) are chosen as in Assumption 13 (or are suitable subsets thereof).
As the labour incomes \(y_{t,j}\) and the stochastic interest rates \(r_{t,j}\) are bounded from above, for \(t=1,\ldots,N\) we have
(the upper bound is achieved when all the \(r_{t,j}\) are equal to \(r_{t,j}^{\mathrm{max}}\) and all the consumptions \(c_{t,j}\) are equal to 0), so the corresponding feasible sets \(A_{t,j}\) are bounded from above by \(a_{t,j}^{\mathrm{max}}\). Boundedness from below of each \(A_{t,j}\) follows from the budget constraints (46), which for \(c_{k,j}=0\) (\(k=t,\ldots,N\)) are equivalent to
for \(t=N\) and
i.e.,
for \(t=0,\ldots,N-1\). For \(k=t,\ldots,N-1\), one can show that the maximum value of the right-hand side in the lower bound (74) occurs when each \(r_{k,j}\) is equal to \(r_{k,j}^{\mathrm{max}}\) (indeed, by computing its partial derivatives with respect to \(r_{k,j}\) it is easy to see that they are nonnegative). So, in order to satisfy the budget constraints (46) independently of the specific sequence of stochastic interest rates, the constraints (73) and
have to be satisfied. Then the maximal sets \(A_{t}\) that satisfy the budget constraints (46) have the form described in Assumption 13. □
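The upper bounds \(a_{t,j}^{\mathrm{max}}\) can be computed by a simple forward recursion. The sketch below assumes the asset dynamics implied by the consumption term in the proof of Proposition 16, namely \(a_{t+1,j}=(1+r_{t,j})(a_{t,j}+y_{t,j}-c_{t,j})\), with zero consumptions and maximal interest rates; all numerical values are illustrative assumptions.

```python
import numpy as np

N, d = 5, 2
a0 = np.array([1.0, 2.0])               # initial assets a_{0,j}
y = np.full((N, d), 0.5)                # labour incomes y_{t,j} (assumed deterministic here)
r_max = np.full((N, d), 0.05)           # upper bounds r^max_{t,j} on the stochastic rates

a_max = np.zeros((N + 1, d))
a_max[0] = a0
for t in range(N):
    # zero consumption and maximal rates give the largest reachable assets
    a_max[t + 1] = (1.0 + r_max[t]) * (a_max[t] + y[t])
print(a_max)
```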
Proof of Proposition 16
(a) Assumption 2(i), (ii), and (iii). These follow immediately from the choices of \(A_{t}\) and \(R_{t}\), and from Assumption 15(i). By construction, the sets \(\bar{A}_{t}\) are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the sets \(\bar{D}_{t}\), since by (48) they are the intersections between \(\bar{A}_{t} \times R_{t} \times \bar{A}_{t+1}\) and the sets \(D_{t}\), which are compact, convex, and have nonempty interiors, too.
(b) Assumption 2(iv). Recall that for Problem OCUd we have
$$h_N(a_N)= u (a_N+y_N ). $$
Then, \(h_{N} \in \mathcal {C}^{m}(\bar{A}_{N})\) and, by Assumption 15(ii), it is \(\alpha_{N}\)-concave with \(\alpha_{N}=\alpha\).
(c) Assumption 2(v). Recall that for Problem OCUd and \(t=0,\ldots,N-1\), we have
$$ h_t(a_t,a_{t+1})=u \biggl(\frac{(1+r_t) \circ\,(a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j}). $$
Then, \(h_{t} \in \mathcal {C}^{m}(\bar{D}_{t})\) by Assumption 15(ii), (iii). As \(u\) and \(v_{t,j}\) are twice continuously differentiable, the second part of Assumption 2(v) means that there exists some \(\alpha_{t}>0\) such that, for each fixed \(r_{t}\), the function
$$u \biggl(\frac{(1+r_t) \circ\,(a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_t \|a_t\|^2 $$
has negative semi-definite Hessian with respect to the variables \(a_{t}\) and \(a_{t+1}\). Assumption 15(ii) and straightforward computations show that, for fixed \(r_{t}\), the function
$$u \biggl(\frac{(1+r_t) \circ\,(a_t+y_t)-a_{t+1}}{1+r_t} \biggr) $$
has negative semi-definite Hessian (indeed, its argument is an affine function of \((a_{t},a_{t+1})\) for fixed \(r_{t}\), and the composition of a concave function with an affine map is concave). By Assumption 15(iii), for each \(j=1,\ldots,d\) and \(\alpha_{t,j} \in(0,\beta_{t,j}]\), \(v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_{t,j} a_{t,j}^{2}\) has negative semi-definite Hessian, too. So, Assumption 2(v) is satisfied for every \(\alpha_{t} \in(0,\min_{j=1,\ldots,d}\{\beta_{t,j}\}]\).
(d) Assumption 2(vi). This is Assumption 15(i), with the obvious replacements of \(X_{t}\) and \(D_{t}\). □
Cite this article
Gaggero, M., Gnecco, G. & Sanguineti, M. Approximate dynamic programming for stochastic N-stage optimization with application to optimal consumption under uncertainty. Comput Optim Appl 58, 31–85 (2014). https://doi.org/10.1007/s10589-013-9614-z