
Research article · Open access

(Private) Kernelized Bandits with Distributed Biased Feedback

Published: 02 March 2023

Abstract

In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we are also faced with two practical challenges due to communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of Õ(T^{1-α/2} + √(γ_T T)), where α ∈ (0,1) is the user-sampling parameter one can tune and γ_T denotes the maximum information gain. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to some variants of the state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
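The phase-then-batch structure described in the abstract can be illustrated with a small self-contained sketch (not the paper's DPBE implementation): a toy one-dimensional kernelized bandit in which each phase samples more users to shrink the averaged feedback bias, queries a batch of maximum-variance actions, and then eliminates actions whose upper confidence bound falls below the best lower confidence bound. The reward function `f`, the RBF length-scale, the noise levels, and the confidence multiplier are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ls=0.2):
    """RBF kernel matrix between 1-D arrays of actions a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls**2))

def posterior(X, y, Xs, lam=0.1):
    """Kernel ridge ("GP") posterior mean/variance at candidates Xs, with k(x, x) = 1."""
    K = rbf(X, X) + lam * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    ks = rbf(Xs, X)
    mu = ks @ (Kinv @ y)
    var = np.clip(1.0 - np.einsum("ij,jk,ik->i", ks, Kinv, ks), 1e-12, None)
    return mu, var

# Toy instance: f is the unknown reward; each user adds a zero-mean personal
# bias, so averaging feedback over more sampled users shrinks the bias term.
actions = np.linspace(0.0, 1.0, 30)
f = lambda x: np.exp(-((x - 0.63) ** 2) / 0.02)
active = np.ones(actions.size, dtype=bool)
X_hist, y_hist = [], []

for phase in range(1, 6):
    m = 4**phase                       # users sampled this phase (grows with the phase)
    for _ in range(10):                # batch of queries within the phase
        idx = np.flatnonzero(active)
        if X_hist:
            _, var = posterior(np.array(X_hist), np.array(y_hist), actions[idx])
            x = actions[idx[np.argmax(var)]]        # maximum-variance action
        else:
            x = actions[idx[0]]
        bias = rng.normal(0.0, 0.3, size=m).mean()  # residual user bias after averaging
        noise = rng.normal(0.0, 0.1 / np.sqrt(m))   # averaged observation noise
        X_hist.append(x)
        y_hist.append(f(x) + bias + noise)
    # End of phase: eliminate actions whose UCB falls below the best LCB.
    mu, var = posterior(np.array(X_hist), np.array(y_hist), actions)
    width = 1.0 * np.sqrt(var)         # confidence width (multiplier 1.0 for the sketch)
    active &= (mu + width) >= (mu - width)[active].max()

mu, _ = posterior(np.array(X_hist), np.array(y_hist), actions[active])
best = actions[active][np.argmax(mu)]  # surviving action with the highest estimated reward
```

In the actual DPBE algorithm, the phase length, batch size, and confidence width are set so that the regret bound Õ(T^{1-α/2} + √(γ_T T)) holds, and for the private variants calibrated noise would be added to the aggregated feedback before it reaches the learner; both refinements are omitted from this sketch.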





Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1
March 2023, 749 pages
EISSN: 2476-1249
DOI: 10.1145/3586099
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 02 March 2023 in POMACS Volume 7, Issue 1

Author Tags

  1. bias
  2. communication cost
  3. computation complexity
  4. distributed feedback
  5. kernelized bandits
  6. privacy
  7. regret

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months)171
  • Downloads (Last 6 weeks)35
Reflects downloads up to 10 Nov 2024


Cited By

  • (2024) Towards Understanding and Characterizing the Arbitrage Bot Scam In the Wild. ACM SIGMETRICS Performance Evaluation Review 52(1), 89–90. DOI: 10.1145/3673660.3655088. Online publication date: 13-Jun-2024.
  • (2024) Learning the Optimal Control for Evolving Systems with Converging Dynamics. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8(2), 1–39. DOI: 10.1145/3656007. Online publication date: 29-May-2024.
  • (2024) Towards Understanding and Characterizing the Arbitrage Bot Scam In the Wild. Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 89–90. DOI: 10.1145/3652963.3655088. Online publication date: 10-Jun-2024.
  • (2024) Machine Translation Testing via Syntactic Tree Pruning. ACM Transactions on Software Engineering and Methodology 33(5), 1–39. DOI: 10.1145/3640329. Online publication date: 4-Jun-2024.
  • (2024) Distributed Linear Bandits With Differential Privacy. IEEE Transactions on Network Science and Engineering 11(3), 3161–3173. DOI: 10.1109/TNSE.2024.3362978. Online publication date: May-2024.
  • (2023) (Private) Kernelized Bandits with Distributed Biased Feedback. ACM SIGMETRICS Performance Evaluation Review 51(1), 61–62. DOI: 10.1145/3606376.3593565. Online publication date: 27-Jun-2023.
