Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms instead estimate a policy's value from historical data collected under a different behavior policy, typically by reweighting logged trajectories with importance sampling. Over long horizons, however, the importance weight is a product of many per-step likelihood ratios, so the variance of the estimator grows rapidly with trajectory length.

We propose using policies over temporally extended actions, called options, to address this long-horizon problem. We show theoretically and experimentally that combining options-based policies with importance sampling can significantly improve performance: an option-level policy makes far fewer decisions per trajectory, so its importance weights involve fewer ratios, which reduces the variance of the estimator and increases the length of trajectories for which off-policy evaluation is practical.
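A minimal sketch of the two estimators described above, assuming a generic trajectory format. This is not the paper's code: the function names, policy callables, and the small demo environment are hypothetical stand-ins chosen only to illustrate how option-level weighting shortens the product of ratios, and the covariance testing component of the paper is not shown.

import numpy as np

rng = np.random.default_rng(0)

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary per-step importance sampling.

    Each trajectory is a list of (state, action, reward) triples collected
    under the behavior policy pi_b. The importance weight is a product of
    H per-step ratios, so its variance can grow exponentially in the
    horizon H.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += gamma**t * r
        estimates.append(weight * ret)
    return np.mean(estimates)

def options_is_estimate(trajectories, mu_e, mu_b):
    """Option-level importance sampling.

    Here each trajectory is a list of (state, option, reward) segments:
    the option was chosen at `state`, ran for several primitive steps, and
    `reward` is the discounted return accumulated during the segment.
    When both policies share the same intra-option behavior, the per-step
    ratios inside a segment cancel and only the K option choices need
    reweighting; with K much smaller than H, the weight is a product of
    far fewer ratios and its variance is correspondingly smaller.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, o, r in traj:
            weight *= mu_e(o, s) / mu_b(o, s)
            ret += r  # segment rewards assumed pre-discounted
        estimates.append(weight * ret)
    return np.mean(estimates)

# Hypothetical two-action demo: uniform behavior policy, slightly
# greedy evaluation policy, unit rewards over a 20-step horizon.
pi_b = lambda a, s: 0.5
pi_e = lambda a, s: 0.7 if a == 1 else 0.3
trajs = [[(0, int(rng.integers(2)), 1.0) for _ in range(20)] for _ in range(1000)]
print(is_estimate(trajs, pi_e, pi_b))

Running the demo with longer horizons makes the spread of the per-step weights visible; the option-level estimator avoids this by replacing the 20-factor product with one factor per option choice.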
Z. D. Guo, P. S. Thomas, and E. Brunskill. Using options and covariance testing for long horizon off-policy policy evaluation. In Advances in Neural Information Processing Systems 30 (NIPS), pp. 2489…, 2017.