Our method, temporal-difference search, combines temporal-difference learning with simulation-based search. Like Monte-Carlo tree search, the value function is ...
Feb 21, 2012 · We introduce a new approach to high-performance search in Markov decision processes and two-player games.
Jun 2, 2013 · Our method, TD search, combines TD learning with simulation-based search. Like Monte-Carlo tree search, value estimates are updated by learning online from ...
Abstract. Temporal-difference (TD) learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been ...
We apply temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones. Without any explicit search tree, ...
This work applies temporal-difference search to the game of 9×9 Go, using a million binary features matching simple patterns of stones, and outperformed an ...
Temporal-Difference Search in Computer Go.
Instead of weakly approximating the value of every position, we approximate the value of positions that occur in the subgame starting from now until termination ...
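The idea in the snippets above, learning values online from simulations that start at the current position, can be illustrated with a minimal sketch. It is not the authors' Go implementation. It assumes a toy one-dimensional chain in place of Go, a hypothetical `features` map with one active binary feature per state, a linear value function, TD(0) updates, epsilon-greedy simulation, and a discount `GAMMA`. None of these settings come from the source.

```python
import random

GAMMA = 0.9  # discount factor; an assumption of this toy, not taken from the source


def td_search(start, n_states=7, n_sims=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Simulate episodes from `start` and learn values online with TD(0).

    Toy chain: states 0..n_states-1, both ends terminal,
    reward 1 for stepping onto the right end, 0 otherwise.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_states  # one weight per binary feature

    def features(s):
        # Hypothetical feature map: a single active binary feature per state.
        return [s]

    def value(s):
        if s in (0, n_states - 1):
            return 0.0  # terminal states have no future return
        return sum(weights[i] for i in features(s))

    def backup(s, move):
        # One-step target: immediate reward plus discounted successor value.
        nxt = s + move
        reward = 1.0 if nxt == n_states - 1 else 0.0
        return reward + GAMMA * value(nxt)

    def greedy(s):
        best = max(backup(s, m) for m in (-1, 1))
        return rng.choice([m for m in (-1, 1) if backup(s, m) == best])

    for _ in range(n_sims):
        s = start  # every simulation begins at the current position
        while s not in (0, n_states - 1):
            move = rng.choice((-1, 1)) if rng.random() < epsilon else greedy(s)
            delta = backup(s, move) - value(s)  # TD error
            for i in features(s):
                weights[i] += alpha * delta
            s += move
    return greedy(start), weights


best_move, weights = td_search(start=3)
```

Because every simulation starts at `start`, the learned weights concentrate on states reachable from the current position rather than on the whole state space. No search tree is stored, only the weight vector.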
We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on neural ...