Differentially private contextual linear bandits

Authors:

Roshan Shariff,

Or Sheffet

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems

Pages 4301 - 4311

Published: 03 December 2018

Abstract

We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which also depends on a user-provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem.
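
For concreteness, the setting described above is commonly formalized as below. This is a sketch of the usual formulation rather than a quotation from the paper: the symbols $\theta^*$ (unknown parameter vector), $x_t$ (feature vector of the context and chosen action in round $t$), $\mathcal{D}_t$ (set of feature vectors available in round $t$), $\eta_t$ (zero-mean noise), and $R_T$ (regret after $T$ rounds) are notation assumed here for illustration.

```latex
y_t = \langle \theta^*, x_t \rangle + \eta_t,
\qquad
R_T = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{D}_t} \langle \theta^*, x \rangle - \langle \theta^*, x_t \rangle \Bigr)
```

Under this reading, "linear regret" means that $R_T$ grows proportionally to $T$, so the learner's average per-round loss does not vanish.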

We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day t is only revealed to user t and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm [10, 18]. We then apply either Gaussian noise or Wishart noise to achieve joint differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithm for the MAB problem must incur.
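
To illustrate the tree-based algorithm of [10, 18] mentioned above, here is a minimal Python sketch that releases a noisy running sum of vectors, where each release combines only a logarithmic number of independently perturbed partial sums. It is an illustration under stated assumptions, not the authors' implementation: the class name `TreeBasedPrefixSum`, its parameters, and the use of isotropic Gaussian noise with a caller-supplied `noise_std` are all hypothetical choices. In particular, the sketch does not calibrate the noise to any target privacy level, which would depend on the sensitivity of the summed statistic and on the number of tree levels.

```python
import numpy as np


class TreeBasedPrefixSum:
    """Noisy running sums built from a binary tree of partial sums (sketch)."""

    def __init__(self, horizon, dim, noise_std, seed=0):
        # One slot per tree level; enough levels to index every round up to `horizon`.
        self.levels = int(horizon).bit_length()
        self.noise_std = noise_std  # assumed free parameter, not calibrated here
        self.rng = np.random.default_rng(seed)
        self.exact = np.zeros((self.levels, dim))  # exact partial sums per level
        self.noisy = np.zeros((self.levels, dim))  # their perturbed counterparts
        self.t = 0

    def update(self, x):
        """Consume the next vector and return a noisy sum of all vectors so far."""
        self.t += 1
        t = self.t
        if t.bit_length() > self.levels:
            raise ValueError("stream is longer than the declared horizon")
        # Level of the tree node completed in this round: lowest set bit of t.
        i = (t & -t).bit_length() - 1
        # Merge all lower-level partial sums plus the new item into level i.
        self.exact[i] = self.exact[:i].sum(axis=0) + x
        self.exact[:i] = 0.0
        self.noisy[:i] = 0.0
        # Each completed node is perturbed exactly once, with fresh noise.
        noise = self.rng.normal(0.0, self.noise_std, size=self.exact[i].shape)
        self.noisy[i] = self.exact[i] + noise
        # The running sum is the sum of nodes named by the set bits of t.
        active = [j for j in range(self.levels) if (t >> j) & 1]
        return self.noisy[active].sum(axis=0)
```

With `noise_std=0.0` the returned values coincide with the exact running sums, which is a convenient sanity check. With positive noise, each released sum contains noise from at most `levels` nodes, which is the property that makes the tree-based approach attractive for statistics that must be re-released in every round.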

References

[1]

Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24, pages 2312-2320. Curran Associates, Inc., 2011.

[2]

Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Online-to-confidence-set conversions and application to sparse stochastic bandits. In AISTATS, pages 1-9, 2012.

[3]

Naoki Abe, Alan W. Biermann, and Philip M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4):263-293, 2003.

[4]

Rajeev Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, volume 27, pages 1054-1078. Applied Probability Trust, 1995.

[5]

Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. JMLR, 3:397-422, 2003.

[6]

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.

[7]

Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, FOCS '14, pages 464-473, Washington, DC, USA, 2014. IEEE Computer Society. ISBN 978-1-4799-6517-5.

[8]

Donald A Berry and Bert Fristedt. Bandit problems: sequential allocation of experiments. London ; New York : Chapman and Hall, 1985.

[9]

Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography, Lecture Notes in Computer Science, pages 635-658. Springer, Berlin, Heidelberg, November 2016. ISBN 978-3-662-53640-7 978-3-662-53641-4.

[10]

T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. In Automata, Languages and Programming, Lecture Notes in Computer Science, pages 405-417. Springer, Berlin, Heidelberg, July 2010. ISBN 978-3-642-14161-4 978-3-642-14162-1.

[11]

Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. J. Mach. Learn. Res., 12:1069-1109, July 2011. ISSN 1532-4435.

[12]

Wei Chu, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandits with linear payoff functions. In AISTATS, volume 15 of JMLR Proceedings, pages 208-214, 2011.

[13]

Varsha Dani, Thomas Hayes, and Sham Kakade. Stochastic linear optimization under bandit feedback. In 21st Annual Conference on Learning Theory, pages 355-366, January 2008.

[14]

C. Dwork, G. N. Rothblum, and S. Vadhan. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51-60, October 2010.

[15]

Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211-407, August 2014. ISSN 1551-305X, 1551-3068.

[16]

Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, Lecture Notes in Computer Science, pages 486-503. Springer, Berlin, Heidelberg, May 2006. ISBN 978-3-540-34546-6 978-3-540-34547-3.

[17]

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Lecture Notes in Computer Science, pages 265-284. Springer, Berlin, Heidelberg, March 2006. ISBN 978-3-540-32731-8 978-3-540-32732-5.

[18]

Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC '10, pages 715-724, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0050-6.

[19]

Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, STOC '14, pages 11-20, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2710-7.

[20]

Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC '10, pages 705-714, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0050-6.

[21]

Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Conference on Learning Theory, pages 24.1-24.34, June 2012.

[22]

Vishesh Karwa and Salil Vadhan. Finite sample differentially private confidence intervals. arXiv:1711.03908 [cs, math, stat], November 2017.

[23]

Michael Kearns, Mallesh Pai, Aaron Roth, and Jonathan Ullman. Mechanism design in large games: Incentives and privacy. pages 403-410. ACM Press, 2014. ISBN 978-1-4503-2698-8.

[24]

Tor Lattimore and Csaba Szepesvári. The end of optimism? An asymptotic analysis of finite-armed linear bandits. In AISTATS, pages 728-737, 2017.

[25]

B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302-1338, October 2000. ISSN 0090-5364, 2168-8966.

[26]

Nikita Mishra and Abhradeep Thakurta. (Nearly) optimal differentially private stochastic multi-arm bandits. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI'15, pages 592-601, Arlington, Virginia, United States, 2015. AUAI Press. ISBN 978-0-9966431-0-8.

[27]

Seth Neel and Aaron Roth. Mitigating bias in adaptive data gathering via differential privacy. In ICML, pages 3717-3726, 2018.

[28]

Herbert Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 58(5):527-535, September 1952.

[29]

Paat Rusmevichientong and John N. Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395-411, April 2010. ISSN 0364-765X.

[30]

Or Sheffet. Private approximations of the 2nd-moment matrix using existing techniques in linear regression. arXiv:1507.00056 [cs], June 2015.

[31]

Adam Smith and Abhradeep Thakurta. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2733-2741. Curran Associates, Inc., 2013.

[32]

Terence Tao. Topics in Random Matrix Theory, volume 132. American Mathematical Society Providence, RI, 2012.

[33]

Aristide C. Y. Tossou and Christos Dimitrakakis. Algorithms for differentially private multi-armed bandits. In Thirtieth AAAI Conference on Artificial Intelligence, March 2016.

[34]

Aristide C. Y. Tossou and Christos Dimitrakakis. Achieving privacy in the adversarial multi-armed bandit. arXiv:1701.04222 [cs], January 2017.

[35]

Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027 [cs, math], November 2010.

[36]

Fuzhen Zhang. Matrix Theory: Basic Results and Techniques. Universitext. Springer, New York, 2nd edition, 2011. ISBN 978-1-4614-1098-0.

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems

December 2018

11021 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
