Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/3327144.3327342guideproceedingsArticle/Chapter ViewAbstractPublication PagesnipsConference Proceedingsconference-collections
Article
Free access

Differentially private contextual linear bandits

Published: 03 December 2018 Publication History

Abstract

We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which depends also on a user provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem.
We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day t is only revealed to user t and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm [10, 18]. We then apply either Gaussian noise or Wishart noise to achieve joint-differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithms for the MAB problem must incur.

References

[1]
Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24, pages 2312-2320. Curran Associates, Inc., 2011.
[2]
Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Online-to-confidence-set conversions and application to sparse stochastic bandits. In AISTATS, pages 1-9, 2012.
[3]
Naoki Abe, Alan W. Biermann, and Philip M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4):263-293, 2003.
[4]
Rajeev Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem., volume 27, pages 1054-1078. Applied Probability Trust, 1995.
[5]
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. JMLR, 3:397-422, 2003.
[6]
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. JMLR, 47(2-3):235-256, 2002.
[7]
Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, FOCS '14, pages 464-473, Washington, DC, USA, 2014. IEEE Computer Society. ISBN 978-1-4799-6517-5.
[8]
Donald A Berry and Bert Fristedt. Bandit problems: sequential allocation of experiments. London ; New York : Chapman and Hall, 1985.
[9]
Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography, Lecture Notes in Computer Science, pages 635-658. Springer, Berlin, Heidelberg, November 2016. ISBN 978-3-662-53640-7 978-3-662-53641-4.
[10]
T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. In Automata, Languages and Programming, Lecture Notes in Computer Science, pages 405-417. Springer, Berlin, Heidelberg, July 2010. ISBN 978-3-642-14161-4 978-3-642-14162-1.
[11]
Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. J. Mach. Learn. Res., 12:1069-1109, July 2011. ISSN 1532-4435.
[12]
Wei Chu, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandits with linear payoff functions. In AISTATS, volume 15 of JMLR Proceedings, pages 208-214, 2011.
[13]
Varsha Dani, Thomas Hayes, and Sham Kakade. Stochastic linear optimization under bandit feedback. In 21st Annual Conference on Learning Theory, pages 355-366, January 2008.
[14]
C. Dwork, G. N. Rothblum, and S. Vadhan. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51-60, October 2010.
[15]
Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211-407, August 2014. ISSN 1551-305X, 1551-3068.
[16]
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, Lecture Notes in Computer Science, pages 486-503. Springer, Berlin, Heidelberg, May 2006. ISBN 978-3-540-34546-6 978-3-540-34547-3.
[17]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Lecture Notes in Computer Science, pages 265-284. Springer, Berlin, Heidelberg, March 2006. ISBN 978-3-540-32731-8 978-3-540-32732-5.
[18]
Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. InProceedings of the Forty-Second ACM Symposium on Theoryof Computing, STOC '10, pages 715-724, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0050-6.
[19]
Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, STOC '14, pages 11-20, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2710-7.
[20]
Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC '10, pages 705-714, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0050-6.
[21]
Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Conference on Learning Theory, pages 24.1-24.34, June 2012.
[22]
Vishesh Karwa and Salil Vadhan. Finite sample differentially private confidence intervals. arXiv:1711.03908 [cs, math, stat], November 2017.
[23]
Michael Kearns, Mallesh Pai, Aaron Roth, and Jonathan Ullman. Mechanism design in large games: Incentives and privacy. pages 403-410. ACM Press, 2014. ISBN 978-1-4503-2698-8.
[24]
Tor Lattimore and Csaba Szepesvári. The end of optimism? an asymptotic analysis of finite-armed linear bandits. In AISTATS, pages 728-737, 2017.
[25]
B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302-1338, October 2000. ISSN 0090-5364, 2168-8966.
[26]
Nikita Mishra and Abhradeep Thakurta. (Nearly) optimal differentially private stochastic multi-arm bandits. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI'15, pages 592-601, Arlington, Virginia, United States, 2015. AUAI Press. ISBN 978-0-9966431-0-8.
[27]
Seth Neel and Aaron Roth. Mitigating bias in adaptive data gathering via differential privacy. In ICML, pages 3717-3726, 2018.
[28]
Herbert Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 58(5): 527-535, 09 1952.
[29]
Paat Rusmevichientong and John N. Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395-411, April 2010. ISSN 0364-765X.
[30]
Or Sheffet. Private approximations of the 2nd-moment matrix using existing techniques in linear regression. arXiv:1507.00056 [cs], June 2015.
[31]
Adam Smith and Abhradeep Thakurta. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2733-2741. Curran Associates, Inc., 2013.
[32]
Terence Tao. Topics in Random Matrix Theory, volume 132. American Mathematical Society Providence, RI, 2012.
[33]
Aristide C. Y. Tossou and Christos Dimitrakakis. Algorithms for differentially private multi-armed bandits. In Thirtieth AAAI Conference on Artificial Intelligence, March 2016.
[34]
Aristide C. Y. Tossou and Christos Dimitrakakis. Achieving privacy in the adversarial multi-armed bandit. arXiv:1701.04222 [cs], January 2017.
[35]
Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027 [cs, math], November 2010.
[36]
Fuzhen Zhang. Matrix Theory: Basic Results and Techniques. Universitext. Springer, New York, 2nd edition, 2011. ISBN 978-1-4614-1098-0.

Cited By

View all
  1. Differentially private contextual linear bandits

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Guide Proceedings
      NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
      December 2018
      11021 pages

      Publisher

      Curran Associates Inc.

      Red Hook, NY, United States

      Publication History

      Published: 03 December 2018

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)87
      • Downloads (Last 6 weeks)7
      Reflects downloads up to 14 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media