Computer Science > Machine Learning

arXiv:2310.16252 (cs)

[Submitted on 25 Oct 2023 (v1), last revised 27 Nov 2023 (this version, v2)]

Title:Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Authors:Arnab Maiti, Ross Boczar, Kevin Jamieson, Lillian J. Ratliff

View PDF

Abstract:We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where $\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.

Comments:	22 pages, 5 figures
Subjects:	Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Cite as:	arXiv:2310.16252 [cs.LG]
	(or arXiv:2310.16252v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.16252

Submission history

From: Arnab Maiti [view email]
[v1] Wed, 25 Oct 2023 00:05:37 UTC (840 KB)
[v2] Mon, 27 Nov 2023 21:33:05 UTC (840 KB)

Computer Science > Machine Learning

Title:Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators