Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms instead use historical data collected under a different behavior policy, typically reweighting it with importance sampling; over long horizons, however, the variance of importance sampling estimators grows quickly. The paper proposes using policies over temporally extended actions, called options, to address this long-horizon problem, and shows theoretically and experimentally that combining options-based policies with importance sampling can significantly improve OPE performance.
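For reference, a sketch of the standard per-trajectory importance sampling estimator, written in standard notation rather than copied from the paper (pi_e is the evaluation policy, pi_b the behavior policy, H the horizon, and G^(i) the return of trajectory i):

\hat{V}_{\mathrm{IS}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(\prod_{t=0}^{H-1}\frac{\pi_e\big(a_t^{(i)}\mid s_t^{(i)}\big)}{\pi_b\big(a_t^{(i)}\mid s_t^{(i)}\big)}\right) G^{(i)}

The product contains one likelihood ratio per time step, so its variance can grow exponentially with H. When both policies choose among options, the product only ranges over option-selection steps, of which there are far fewer on long-horizon tasks; this is the intuition suggested by the abstract for why option-level importance sampling helps.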
In particular, the authors investigate how options influence the variance of importance sampling estimators, extending the trajectory lengths that off-policy evaluation can handle.
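A minimal toy sketch of that variance effect follows; it is not the paper's algorithm (the covariance-testing component is not shown), and the horizon, option length, and action probabilities are illustrative assumptions. It compares the spread of per-trajectory importance weights when one likelihood ratio is accumulated per primitive action versus one per option choice.

```python
# Toy illustration: per-trajectory importance sampling weights computed at the
# primitive-action level vs. at the option level. Fewer likelihood ratios in
# the product generally means lower variance of the resulting weights.
import numpy as np

rng = np.random.default_rng(0)

H = 100      # primitive-action horizon
K = 10       # option choices per trajectory (each option lasts H // K steps)
n = 10_000   # number of behavior-policy trajectories

# Two actions (or two options) with these selection probabilities.
p_b, p_e = 0.5, 0.6  # behavior vs. evaluation probability of choosing action 0

def is_weights(num_decisions):
    """Product of likelihood ratios over `num_decisions` independent choices."""
    choices = rng.random((n, num_decisions)) < p_b  # sampled from the behavior policy
    ratios = np.where(choices, p_e / p_b, (1 - p_e) / (1 - p_b))
    return ratios.prod(axis=1)

w_primitive = is_weights(H)  # one ratio per primitive action
w_options = is_weights(K)    # one ratio per option choice

print("std of weights, primitive-level:", w_primitive.std())
print("std of weights, option-level:   ", w_options.std())
```

Running this shows the option-level weights concentrating much more tightly around 1 than the primitive-level weights, which is the effect the paper exploits for long-horizon OPE.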
The paper appears as: Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation, in Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 2489 ff.