Bandit algorithms with graphical feedback models and privacy awareness
Date
2021-09-27
Authors
Hu, Bingshan
Abstract
This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Unlike the basic MAB setting, where the learning algorithm receives only one observation per interaction, under a graphical feedback model the algorithm may receive more than one observation each time it interacts with the environment. Meanwhile, as in the basic MAB setting, the algorithm incurs regret only from the pulled arm when that arm is not the optimal one. The first theme of this thesis is to derive instance-dependent regret bounds for stochastic bandits under graphical feedback models.

In a basic MAB problem, the learning algorithm can always use the learnt information to make future decisions. If each reward vector encodes information about an individual, such a non-private learning algorithm may "leak" sensitive information associated with individuals. In an MAB problem with privacy awareness, the learning algorithm cannot rely directly on the true learnt information when making future decisions, in order to comply with privacy. What a private learning algorithm promises is that even if an adversary sees the output of the algorithm, the adversary can infer almost nothing about any single individual. The second theme of this thesis covers three variants of private online learning: the private bandit setting, the private full information setting, and the private graphical bandit setting.
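To make the graphical feedback protocol described above concrete, the following minimal Python sketch simulates one round of interaction. The arm means, the feedback graph, and all names are illustrative assumptions, not taken from the thesis.

import random

# Illustrative placeholders: Bernoulli reward means and a feedback
# graph mapping each arm to the arms observed when it is pulled.
means = [0.9, 0.5, 0.4]
graph = {0: [0, 1], 1: [1, 2], 2: [2, 0]}

def play_round(pulled_arm):
    # The environment draws a full reward vector, one reward per arm.
    rewards = [1.0 if random.random() < mu else 0.0 for mu in means]
    # The learner observes the pulled arm and its out-neighbours in
    # the feedback graph, so one pull can yield several observations.
    observations = {arm: rewards[arm] for arm in graph[pulled_arm]}
    # Regret is incurred only for the pulled arm, as in basic MAB.
    instant_regret = max(means) - means[pulled_arm]
    return observations, instant_regret

obs, regret = play_round(pulled_arm=1)
print(obs)     # e.g. {1: 1.0, 2: 0.0}: two observations from one pull
print(regret)  # 0.4: gap between the best arm's mean and arm 1's mean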
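The privacy promise described above is typically formalized via differential privacy. The sketch below shows one standard way to act on noisy statistics rather than the true learnt information: the Laplace mechanism applied to an empirical mean. The function names, parameters, and the [0, 1] reward assumption are illustrative; this is not the mechanism developed in the thesis.

import random

def laplace_noise(scale):
    # Laplace(0, scale) sampled as the difference of two exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_empirical_mean(reward_sum, count, epsilon):
    # Assuming each reward lies in [0, 1], one individual's reward
    # changes the sum by at most 1, so adding Laplace(1/epsilon) noise
    # to the sum makes this single release epsilon-differentially
    # private.
    return (reward_sum + laplace_noise(1.0 / epsilon)) / count

# The learner bases its decisions on the noisy mean, so an adversary
# observing its output can infer almost nothing about any one reward.
noisy_mean = private_empirical_mean(reward_sum=42.0, count=100, epsilon=0.5)
print(noisy_mean)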