
Research article · Open access

(Private) Kernelized Bandits with Distributed Biased Feedback

Published: 02 March 2023

Abstract

In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we are also faced with two practical challenges due to communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of Õ(T^{1-α/2} + √(γ_T T)), where α ∈ (0,1) is the user-sampling parameter one can tune and γ_T denotes the maximum information gain. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to some variants of the state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
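The phase-then-batch structure described in the abstract can be illustrated with a small self-contained sketch (not the paper's DPBE implementation): a toy one-dimensional kernelized bandit in which each phase samples more users to shrink the averaged feedback bias, queries a batch of maximum-variance actions, and then eliminates actions whose upper confidence bound falls below the best lower confidence bound. The reward function `f`, the RBF length-scale, the noise levels, and the confidence multiplier are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ls=0.2):
    """RBF kernel matrix between 1-D arrays of actions a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls**2))

def posterior(X, y, Xs, lam=0.1):
    """Kernel ridge ("GP") posterior mean/variance at candidates Xs, with k(x, x) = 1."""
    K = rbf(X, X) + lam * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    ks = rbf(Xs, X)
    mu = ks @ (Kinv @ y)
    var = np.clip(1.0 - np.einsum("ij,jk,ik->i", ks, Kinv, ks), 1e-12, None)
    return mu, var

# Toy instance: f is the unknown reward; each user adds a zero-mean personal
# bias, so averaging feedback over more sampled users shrinks the bias term.
actions = np.linspace(0.0, 1.0, 30)
f = lambda x: np.exp(-((x - 0.63) ** 2) / 0.02)
active = np.ones(actions.size, dtype=bool)
X_hist, y_hist = [], []

for phase in range(1, 6):
    m = 4**phase                       # users sampled this phase (grows with the phase)
    for _ in range(10):                # batch of queries within the phase
        idx = np.flatnonzero(active)
        if X_hist:
            _, var = posterior(np.array(X_hist), np.array(y_hist), actions[idx])
            x = actions[idx[np.argmax(var)]]        # maximum-variance action
        else:
            x = actions[idx[0]]
        bias = rng.normal(0.0, 0.3, size=m).mean()  # residual user bias after averaging
        noise = rng.normal(0.0, 0.1 / np.sqrt(m))   # averaged observation noise
        X_hist.append(x)
        y_hist.append(f(x) + bias + noise)
    # End of phase: eliminate actions whose UCB falls below the best LCB.
    mu, var = posterior(np.array(X_hist), np.array(y_hist), actions)
    width = 1.0 * np.sqrt(var)         # confidence width (multiplier 1.0 for the sketch)
    active &= (mu + width) >= (mu - width)[active].max()

mu, _ = posterior(np.array(X_hist), np.array(y_hist), actions[active])
best = actions[active][np.argmax(mu)]  # surviving action with the highest estimated reward
```

In the actual DPBE algorithm, the phase length, batch size, and confidence width are set so that the regret bound Õ(T^{1-α/2} + √(γ_T T)) holds, and for the private variants calibrated noise would be added to the aggregated feedback before it reaches the learner; both refinements are omitted from this sketch.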





Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1
March 2023, 749 pages
EISSN: 2476-1249
DOI: 10.1145/3586099
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 02 March 2023 in POMACS Volume 7, Issue 1

Author Tags

  1. bias
  2. communication cost
  3. computation complexity
  4. distributed feedback
  5. kernelized bandits
  6. privacy
  7. regret

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months)171
  • Downloads (Last 6 weeks)35
Reflects downloads up to 10 Nov 2024


Cited By

  • (2024) Towards Understanding and Characterizing the Arbitrage Bot Scam In the Wild. ACM SIGMETRICS Performance Evaluation Review 52(1), 89–90. DOI: 10.1145/3673660.3655088. Online publication date: 13-Jun-2024.
  • (2024) Learning the Optimal Control for Evolving Systems with Converging Dynamics. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8(2), 1–39. DOI: 10.1145/3656007. Online publication date: 29-May-2024.
  • (2024) Towards Understanding and Characterizing the Arbitrage Bot Scam In the Wild. Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 89–90. DOI: 10.1145/3652963.3655088. Online publication date: 10-Jun-2024.
  • (2024) Machine Translation Testing via Syntactic Tree Pruning. ACM Transactions on Software Engineering and Methodology 33(5), 1–39. DOI: 10.1145/3640329. Online publication date: 4-Jun-2024.
  • (2024) Distributed Linear Bandits With Differential Privacy. IEEE Transactions on Network Science and Engineering 11(3), 3161–3173. DOI: 10.1109/TNSE.2024.3362978. Online publication date: May-2024.
  • (2023) (Private) Kernelized Bandits with Distributed Biased Feedback. ACM SIGMETRICS Performance Evaluation Review 51(1), 61–62. DOI: 10.1145/3606376.3593565. Online publication date: 27-Jun-2023.
