Abstract
In large domains, Monte-Carlo tree search (MCTS) is required to estimate the values of the states as efficiently and accurately as possible. However, the standard update rule in backpropagation assumes a stationary distribution for the returns, and particularly in min-max trees, convergence to the true value can be slow because of averaging. We present two methods, Softmax MCTS and Monotone MCTS, which generalize previous attempts to improve upon the backpropagation strategy. We demonstrate that both methods reduce to finding optimal monotone functions, which we do so by performing Bayesian optimization with a Gaussian process (GP) prior. We conduct experiments on computer Go, where the returns are given by a deep value neural network, and show that our proposed framework outperforms previous methods.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Baier, H., Winnands, M.H.M.: MCTS-minimax hybrids with state evaluations. J. Artif. Intell. Res. 62, 193–231 (2018)
Blumenthal, S., Cohen, A.: Estimation of the larger of two normal means. J. Am. Statist. Assoc. 63(323), 861–876 (1968)
Browne, C.B., et al.: A survey of Monte-Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Chen, Y., et al.: Bayesian optimization in AlphaGo. arXiv preprint https://arxiv.org/abs/1812.06855 (2018)
Coquelin, P.A., Munos, R.: Bandit algorithms for tree search. In: Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pp. 67–74 (2007)
Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: International Conference on Computers and Games, pp. 72–83 (2006)
Fu, M.: Monte-Carlo tree search and minimax combination, MSc thesis, University of Maryland at College Park (2017)
Hashimoto, J., Kishimoto, A., Yoshizoe, K., Ikeda, K.: Accelerated UCT and its application to two-player games. In: van den Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 1–12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31866-5_1
Khandelwal, P., Liebman, E., Niekum, S., Stone, P.: On the analysis of complex backup strategies in Monte-Carlo tree search. In: International Conference on Machine Learning, pp. 1319–1328 (2016)
Kocsis, L., Svepesvári, C.: Bandit based Monte-Carlo planning. In: Proceedings of the 17th European Conference on Machine Learning, pp. 282–293 (2006)
Ramanujan, R., Sabharwal, A., Selman, B.: On adversarial search spaces and sampling-based planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 20, no. 1 (2010)
Ramanujan, R., Sabharwal, A., Selman, B.: On the behavior of UCT in synthetic search spaces. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 21, no. 1 (2011)
Ramanujan, R., Selman, B.: Trade-offs in sampling-based adversarial planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 21, no. 1 (2011)
Rosin, C.: Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61(3), 203–230 (2011)
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)
Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press (2018)
Vien, N.A., Zimmermann, H., Toussaint, M.: Bayesian functional optimization. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Williams, C.K., Rasmussen, C.E.: Gaussian processes for machine learning, vol. 2. MIT press, Cambridge (2006)
Xie, F., Liu, Z.: Backpropagation modification in Monte-Carlo game tree search. In: Third International Symposium on Intelligent Information Technology Application, vol. 2, pp. 125–128 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lim, N., Li, Y. (2021). Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12892. Springer, Cham. https://doi.org/10.1007/978-3-030-86340-1_17
Download citation
DOI: https://doi.org/10.1007/978-3-030-86340-1_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86339-5
Online ISBN: 978-3-030-86340-1
eBook Packages: Computer ScienceComputer Science (R0)