Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12892))

Included in the following conference series:

International Conference on Artificial Neural Networks

2234 Accesses

Abstract

In large domains, Monte-Carlo tree search (MCTS) is required to estimate the values of the states as efficiently and accurately as possible. However, the standard update rule in backpropagation assumes a stationary distribution for the returns, and particularly in min-max trees, convergence to the true value can be slow because of averaging. We present two methods, Softmax MCTS and Monotone MCTS, which generalize previous attempts to improve upon the backpropagation strategy. We demonstrate that both methods reduce to finding optimal monotone functions, which we do so by performing Bayesian optimization with a Gaussian process (GP) prior. We conduct experiments on computer Go, where the returns are given by a deep value neural network, and show that our proposed framework outperforms previous methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

On the Performance of Planning Through Backpropagation

An Efficient Bayesian Network Structure Learning Strategy

Article 12 December 2016

Batch Monte Carlo Tree Search

References

Baier, H., Winnands, M.H.M.: MCTS-minimax hybrids with state evaluations. J. Artif. Intell. Res. 62, 193–231 (2018)
Article MathSciNet Google Scholar
Blumenthal, S., Cohen, A.: Estimation of the larger of two normal means. J. Am. Statist. Assoc. 63(323), 861–876 (1968)
MathSciNet MATH Google Scholar
Browne, C.B., et al.: A survey of Monte-Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Article Google Scholar
Chen, Y., et al.: Bayesian optimization in AlphaGo. arXiv preprint https://arxiv.org/abs/1812.06855 (2018)
Coquelin, P.A., Munos, R.: Bandit algorithms for tree search. In: Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pp. 67–74 (2007)
Google Scholar
Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: International Conference on Computers and Games, pp. 72–83 (2006)
Google Scholar
Fu, M.: Monte-Carlo tree search and minimax combination, MSc thesis, University of Maryland at College Park (2017)
Google Scholar
Hashimoto, J., Kishimoto, A., Yoshizoe, K., Ikeda, K.: Accelerated UCT and its application to two-player games. In: van den Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 1–12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31866-5_1
Chapter Google Scholar
Khandelwal, P., Liebman, E., Niekum, S., Stone, P.: On the analysis of complex backup strategies in Monte-Carlo tree search. In: International Conference on Machine Learning, pp. 1319–1328 (2016)
Google Scholar
Kocsis, L., Svepesvári, C.: Bandit based Monte-Carlo planning. In: Proceedings of the 17th European Conference on Machine Learning, pp. 282–293 (2006)
Google Scholar
Ramanujan, R., Sabharwal, A., Selman, B.: On adversarial search spaces and sampling-based planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 20, no. 1 (2010)
Google Scholar
Ramanujan, R., Sabharwal, A., Selman, B.: On the behavior of UCT in synthetic search spaces. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 21, no. 1 (2011)
Google Scholar
Ramanujan, R., Selman, B.: Trade-offs in sampling-based adversarial planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 21, no. 1 (2011)
Google Scholar
Rosin, C.: Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61(3), 203–230 (2011)
Article MathSciNet Google Scholar
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
Article Google Scholar
Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)
Article Google Scholar
Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)
Google Scholar
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press (2018)
Google Scholar
Vien, N.A., Zimmermann, H., Toussaint, M.: Bayesian functional optimization. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Google Scholar
Williams, C.K., Rasmussen, C.E.: Gaussian processes for machine learning, vol. 2. MIT press, Cambridge (2006)
MATH Google Scholar
Xie, F., Liu, Z.: Backpropagation modification in Monte-Carlo game tree search. In: Third International Symposium on Intelligent Information Technology Application, vol. 2, pp. 125–128 (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Nengli Lim & Yueqin Li

Authors

Nengli Lim
View author publications
You can also search for this author in PubMed Google Scholar
Yueqin Li
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
iMotions A/S, Copenhagen, Denmark
Paolo Masulli
University of Tübingen, Tübingen, Baden-Württemberg, Germany
Sebastian Otte
Universität Hamburg, Hamburg, Germany
Stefan Wermter

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lim, N., Li, Y. (2021). Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12892. Springer, Cham. https://doi.org/10.1007/978-3-030-86340-1_17

Download citation

DOI: https://doi.org/10.1007/978-3-030-86340-1_17
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86339-5
Online ISBN: 978-3-030-86340-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

On the Performance of Planning Through Backpropagation

An Efficient Bayesian Network Structure Learning Strategy

Batch Monte Carlo Tree Search

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

On the Performance of Planning Through Backpropagation

An Efficient Bayesian Network Structure Learning Strategy

Batch Monte Carlo Tree Search

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation