
IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Foundations of Computer Science
Policy Gradient Based Semi-Markov Decision Problems: Approximation and Estimation Errors
Ngo Anh VIEN, SeungGwan LEE, TaeChoong CHUNG

2010 Volume E93.D Issue 2 Pages 271-279

Abstract

In [1] and [2] we presented a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). We first approximated the gradient of the average reward, and then proposed a simulation-based algorithm, called GSMDP, that estimates this approximate gradient using only a single sample path of the underlying Markov chain. GSMDP was proven to converge with probability 1. In this paper, we give bounds on the approximation and estimation errors of the GSMDP algorithm. The approximation error is the size of the difference between the true gradient and the approximate gradient. The estimation error, the size of the difference between the output of the algorithm and its asymptotic output, arises because the algorithm sees only a finite data sequence.
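
The two error notions in the abstract admit a compact formalization. The notation below is ours, not the paper's: the true gradient of the average reward, its approximation, and the algorithm's output after a sample path of length T, which converges with probability 1 to the approximate gradient (the asymptotic output).

```latex
% Notation (ours, not the paper's):
%   \nabla\bar{\rho}(\theta)        true gradient of the average reward
%   \nabla_\beta\bar{\rho}(\theta)  its approximation
%   \Delta_T                        algorithm output after a path of length T
%                                   (\Delta_T -> \nabla_\beta\bar{\rho}(\theta) w.p. 1)
\underbrace{\bigl\|\, \nabla\bar{\rho}(\theta) - \nabla_{\beta}\bar{\rho}(\theta) \,\bigr\|}_{\text{approximation error}}
\qquad\qquad
\underbrace{\bigl\|\, \Delta_{T} - \nabla_{\beta}\bar{\rho}(\theta) \,\bigr\|}_{\text{estimation error}}
```

To make the single-sample-path idea concrete, here is a minimal GPOMDP-style sketch, not the paper's GSMDP algorithm itself. It assumes a hypothetical tabular-softmax policy and a hypothetical environment interface `env.reset()` / `env.step(action) -> (next_state, reward, tau)`, where the random sojourn time `tau` is what distinguishes a continuous-time SMDP from an ordinary MDP.

```python
import numpy as np

def softmax_policy(theta, state):
    """Action probabilities of a tabular-softmax policy (hypothetical form)."""
    prefs = theta[state]
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def grad_log_policy(theta, state, action):
    """Gradient of log pi_theta(action | state) for the tabular-softmax policy."""
    g = np.zeros_like(theta)
    g[state] = -softmax_policy(theta, state)
    g[state, action] += 1.0
    return g

def single_path_gradient(env, theta, beta=0.95, T=100_000, seed=0):
    """Estimate an approximate average-reward gradient from one sample path.

    Illustrative only: the discount-like parameter beta < 1 biases the
    eligibility trace (the source of the approximation error), and truncating
    the path at a finite horizon T is the source of the estimation error.
    Dividing by accumulated sojourn time is one simple way to express a
    per-unit-time average; the paper's exact estimator may differ.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)    # eligibility trace of score functions
    acc = np.zeros_like(theta)  # accumulated reward-weighted trace
    total_time = 0.0
    state = env.reset()
    for _ in range(T):
        probs = softmax_policy(theta, state)
        action = rng.choice(len(probs), p=probs)
        next_state, reward, tau = env.step(action)  # tau: random sojourn time
        z = beta * z + grad_log_policy(theta, state, action)
        acc += reward * z
        total_time += tau
        state = next_state
    return acc / total_time
```

In this sketch, increasing beta toward 1 shrinks the approximation error at the cost of higher estimator variance, while increasing T shrinks the estimation error; the paper's contribution is to bound both quantities for GSMDP.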

© 2010 The Institute of Electronics, Information and Communication Engineers