Nothing Special   »   [go: up one dir, main page]

Skip to main content

Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12892))

Included in the following conference series:

  • 2234 Accesses

Abstract

In large domains, Monte-Carlo tree search (MCTS) is required to estimate the values of the states as efficiently and accurately as possible. However, the standard update rule in backpropagation assumes a stationary distribution for the returns, and particularly in min-max trees, convergence to the true value can be slow because of averaging. We present two methods, Softmax MCTS and Monotone MCTS, which generalize previous attempts to improve upon the backpropagation strategy. We demonstrate that both methods reduce to finding optimal monotone functions, which we do so by performing Bayesian optimization with a Gaussian process (GP) prior. We conduct experiments on computer Go, where the returns are given by a deep value neural network, and show that our proposed framework outperforms previous methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Baier, H., Winnands, M.H.M.: MCTS-minimax hybrids with state evaluations. J. Artif. Intell. Res. 62, 193–231 (2018)

    Article  MathSciNet  Google Scholar 

  2. Blumenthal, S., Cohen, A.: Estimation of the larger of two normal means. J. Am. Statist. Assoc. 63(323), 861–876 (1968)

    MathSciNet  MATH  Google Scholar 

  3. Browne, C.B., et al.: A survey of Monte-Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)

    Article  Google Scholar 

  4. Chen, Y., et al.: Bayesian optimization in AlphaGo. arXiv preprint https://arxiv.org/abs/1812.06855 (2018)

  5. Coquelin, P.A., Munos, R.: Bandit algorithms for tree search. In: Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, pp. 67–74 (2007)

    Google Scholar 

  6. Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: International Conference on Computers and Games, pp. 72–83 (2006)

    Google Scholar 

  7. Fu, M.: Monte-Carlo tree search and minimax combination, MSc thesis, University of Maryland at College Park (2017)

    Google Scholar 

  8. Hashimoto, J., Kishimoto, A., Yoshizoe, K., Ikeda, K.: Accelerated UCT and its application to two-player games. In: van den Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 1–12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31866-5_1

    Chapter  Google Scholar 

  9. Khandelwal, P., Liebman, E., Niekum, S., Stone, P.: On the analysis of complex backup strategies in Monte-Carlo tree search. In: International Conference on Machine Learning, pp. 1319–1328 (2016)

    Google Scholar 

  10. Kocsis, L., Svepesvári, C.: Bandit based Monte-Carlo planning. In: Proceedings of the 17th European Conference on Machine Learning, pp. 282–293 (2006)

    Google Scholar 

  11. Ramanujan, R., Sabharwal, A., Selman, B.: On adversarial search spaces and sampling-based planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 20, no. 1 (2010)

    Google Scholar 

  12. Ramanujan, R., Sabharwal, A., Selman, B.: On the behavior of UCT in synthetic search spaces. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 21, no. 1 (2011)

    Google Scholar 

  13. Ramanujan, R., Selman, B.: Trade-offs in sampling-based adversarial planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 21, no. 1 (2011)

    Google Scholar 

  14. Rosin, C.: Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61(3), 203–230 (2011)

    Article  MathSciNet  Google Scholar 

  15. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)

    Article  Google Scholar 

  16. Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)

    Article  Google Scholar 

  17. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)

    Google Scholar 

  18. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press (2018)

    Google Scholar 

  19. Vien, N.A., Zimmermann, H., Toussaint, M.: Bayesian functional optimization. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

    Google Scholar 

  20. Williams, C.K., Rasmussen, C.E.: Gaussian processes for machine learning, vol. 2. MIT press, Cambridge (2006)

    MATH  Google Scholar 

  21. Xie, F., Liu, Z.: Backpropagation modification in Monte-Carlo game tree search. In: Third International Symposium on Intelligent Information Technology Application, vol. 2, pp. 125–128 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Lim, N., Li, Y. (2021). Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12892. Springer, Cham. https://doi.org/10.1007/978-3-030-86340-1_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-86340-1_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86339-5

  • Online ISBN: 978-3-030-86340-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics