Monte Carlo Tree Search Algorithm for SSPs Under the GUBS Criterion

Authors

Gabriel Nunes Crispino Universidade de São Paulo https://orcid.org/0000-0002-7641-8492
Valdinei Freire Universidade de São Paulo https://orcid.org/0000-0003-0330-3931
Karina Valdivia Delgado Universidade de São Paulo https://orcid.org/0000-0002-9120-8987

DOI:

https://doi.org/10.19153/cleiej.27.3.5

Keywords:

Markov Decision Processes, Stochastic Shortest Path, Sequential decision making, Probabilistic planning, Monte Carlo tree search

Abstract

The Stochastic Shortest Path (SSP) is a formalism widely used for modeling goal-oriented probabilistic planning problems. When dead ends, which are states from which goal states cannot be reached, are present in the problem and cannot be avoided, the standard criterion for solving SSPs is not well defined in these scenarios. Because of that, several alternate criteria for solving SSPs with unavoidable dead ends have been proposed in the literature. One of these criteria is GUBS (Goals with Utility-Based Semantics), a criterion that makes trade-offs between probability-to-goal and cost by combining goal prioritization with Expected Utility Theory. GUBS is a good choice for these problems because it is one of the only criteria that are known to maintain the ?-strong probability-to-goal priority property, a property that provides guarantees on how a decision criterion can choose policies without having to preprocess any specific SSP problem. Although there already exist two exact algorithms for solving GUBS, eGUBS-VI and eGUBS-AO*, both are offline and there is no algorithm for solving GUBS in an online manner. In this paper we propose UCT-GUBS, an online approximate algorithm based on UCT (a Monte Carlo tree search algorithm) that solves SSPs under the GUBS criterion. We provide an analysis of an empirical evaluation performed on two probabilistic planning domains (Triangle Tireworld and Navigation) to observe how the probability-to-goal and utility values of the resulting policies compare to the optimal values, and also how the time performance of UCT-GUBS compares to the ones of eGUBS-VI and eGUBS-AO*. Our conclusion is that, like other algorithms, the usage of UCT-GUBS has to be evaluated considering the application requirements and of the problem being solved. Depending on these factors, it can be a good alternative for obtaining policies in an online fashion while, for some problems, also being able to have better time performance than other algorithms

Downloads

Published

2024-08-08

Issue

Vol. 27 No. 3 (2024): CLEI 2023 CLTM and CLTD best works

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

CLEIej is supported by its home institution, CLEI, and by the contribution of the Latin American and international researchers community, and it does not apply any author charges whatsoever for submitting and publishing. Since its creation in 1998, all contents are made publicly accesibly. The current license being applied is a (CC)-BY license (effective October 2015; between 2011 and 2015 a (CC)-BY-NC license was used).