Thompson is a Python package for evaluating the multi-armed bandit problem. In addition to Thompson sampling, it implements the Upper Confidence Bound (UCB) algorithm and a randomized baseline. The three algorithms are:
- Thompson Sampling: A Bayesian approach that maintains probability distributions over the expected rewards of each arm and samples from these distributions to select the next arm to pull.
- Upper Confidence Bound (UCB): A deterministic algorithm that selects arms based on their estimated rewards and the uncertainty in those estimates.
- Randomized Sampling: A baseline method that randomly selects arms without considering their past performance.
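
To make the three selection rules concrete, here is a minimal standard-library sketch. It is not the thompson package's API: the function names (`thompson_select`, `ucb1_select`, `random_select`, `simulate`) are hypothetical, and it assumes binary (0/1) rewards, a uniform Beta(1, 1) prior for Thompson sampling, and the UCB1 variant of the upper confidence bound.

```python
import math
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior; pick the largest draw."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(s + 1, f + 1).
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def ucb1_select(successes, failures, rng=None):
    """Pick the arm with the highest UCB1 index (rng is unused: deterministic)."""
    pulls = [s + f for s, f in zip(successes, failures)]
    # Play every arm once before the confidence bound is defined.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    total = sum(pulls)
    scores = [
        s / n + math.sqrt(2.0 * math.log(total) / n)  # mean + exploration bonus
        for s, n in zip(successes, pulls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def random_select(successes, failures, rng):
    """Baseline: pick an arm uniformly at random, ignoring past results."""
    return rng.randrange(len(successes))


def simulate(select, true_rates, n_rounds=2000, seed=0):
    """Run one bandit simulation and return the total reward collected."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    total_reward = 0
    for _ in range(n_rounds):
        arm = select(successes, failures, rng)
        # Bernoulli reward with the arm's (hidden) success probability.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
            total_reward += 1
        else:
            failures[arm] += 1
    return total_reward
```

For example, `simulate(thompson_select, [0.1, 0.5, 0.8])` returns the number of successes collected over 2000 rounds. Passing `ucb1_select` or `random_select` instead allows a side-by-side comparison on the same arms.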
The multi-armed bandit problem is a classic reinforcement learning problem that exemplifies the exploration-exploitation tradeoff. A fixed, limited set of resources must be allocated among competing choices so as to maximize expected gain, while each choice's properties are only partially known at the time of allocation.
pip install thompson
import thompson as th
The documentation pages provide detailed information about how thompson works, with examples.