Abstract
We study the design of rating systems that incentivize (more) efficient social learning among self-interested agents. Agents arrive sequentially and are presented with a set of possible actions, each of which yields a positive reward with an unknown probability. A disclosure policy sends messages about the rewards of previously chosen actions to arriving agents. These messages can alter agents' incentives towards exploration: taking potentially sub-optimal actions for the sake of learning more about their rewards. Prior work has made substantial progress with disclosure policies that merely recommend an action to each user, without any other supporting information, and sometimes recommend exploratory actions. All of this work relies heavily on standard, yet very strong, rationality assumptions. However, these assumptions are quite problematic in the context of the motivating applications: recommendation systems such as Yelp, Amazon, or Netflix, and matching markets such as AirBnB. It is unclear whether users would know and understand a complicated disclosure policy announced by the principal, let alone trust the principal to faithfully implement it. (The principal may deviate from the announced policy intentionally, due to insufficient information about the users, or because of bugs in implementation.) Even if the users understand the policy and trust that it was implemented as claimed, they might not react to it rationally, particularly given the lack of supporting information and the possibility of being singled out for exploration. For example, users may find such disclosure policies unacceptable and leave the system.
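To make the interaction protocol concrete, here is a minimal Python sketch of the model above, under illustrative assumptions: rewards are Bernoulli, and the function names (`run_protocol`, `disclose`, `respond`) are ours rather than the paper's. The usage example at the bottom instantiates full-history disclosure with greedy frequentist agents, the combination that risks herding.

```python
from dataclasses import dataclass
from typing import List
import random

@dataclass
class Round:
    agent: int
    action: int
    reward: int

def run_protocol(mu, T, disclose, respond, seed=0):
    """Toy rendering of the sequential protocol.

    mu: Bernoulli mean reward of each action (unknown to agents).
    disclose(t, history): the principal's message to agent t,
        e.g. a subhistory of past (action, reward) pairs.
    respond(message, num_actions): the agent's self-interested
        choice of an action, given only that message.
    """
    rng = random.Random(seed)
    history: List[Round] = []
    for t in range(T):
        message = disclose(t, history)
        action = respond(message, len(mu))
        reward = 1 if rng.random() < mu[action] else 0
        history.append(Round(t, action, reward))
    return history

# Full-history disclosure with greedy (frequentist) agents: each
# agent plays the empirically best action, which can herd on a
# sub-optimal action.
def full_history(t, history):
    return list(history)

def greedy(message, num_actions):
    counts = [0] * num_actions
    totals = [0] * num_actions
    for r in message:
        counts[r.action] += 1
        totals[r.action] += r.reward
    # Unsampled actions are tried first (empirical mean treated as 1).
    means = [totals[a] / counts[a] if counts[a] else 1.0
             for a in range(num_actions)]
    return max(range(num_actions), key=lambda a: means[a])

hist = run_protocol([0.6, 0.5], T=1000, disclose=full_history, respond=greedy)
```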
We study a particular class of disclosure policies that use messages, called unbiased subhistories, consisting of the actions and rewards from a subsequence of past agents. Each subsequence is chosen ahead of time, according to a predetermined partial order on the rounds. We posit a flexible model of frequentist agent response, which we argue is plausible for this class of "order-based" disclosure policies. We measure the performance of a policy by its regret, i.e., the difference in expected total reward between the best action and the policy. A disclosure policy that reveals the full history in each round risks inducing herding behavior among the agents, and typically has regret linear in the time horizon T. Our main result is an order-based disclosure policy that achieves regret Õ(√T). This regret rate is known to be optimal in the worst case over reward distributions, even absent incentives. We also exhibit simpler order-based policies with higher, but still sublinear, regret. These policies can be interpreted as dividing a sublinear number of agents into constant-sized focus groups, whose histories are then revealed to future agents.
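Below is a minimal sketch of the focus-group interpretation, assuming a few details the abstract does not pin down (the group size, a √T default for the number of groups, and optimistic tie-breaking for unsampled actions are our illustrative choices, not the paper's construction):

```python
import random

def simulate_focus_groups(mu, T, group_size=3, num_groups=None, seed=0):
    """Toy simulation of a focus-group-style order-based policy.

    Phase 1: num_groups focus groups of group_size agents each;
    every member sees only the earlier rounds of her own group.
    Phase 2: all remaining agents see the union of the focus-group
    histories and play the empirically best action (frequentist
    response; unsampled actions are tried first).
    """
    rng = random.Random(seed)
    K = len(mu)
    if num_groups is None:
        num_groups = int(T ** 0.5)  # sublinearly many exploring agents

    def best_empirical(counts, totals):
        means = [totals[a] / counts[a] if counts[a] else 1.0 for a in range(K)]
        top = max(means)
        return rng.choice([a for a in range(K) if means[a] == top])

    revealed_counts, revealed_totals = [0] * K, [0] * K
    total_reward, t = 0, 0
    for _ in range(num_groups):                    # phase 1: focus groups
        g_counts, g_totals = [0] * K, [0] * K
        for _ in range(group_size):
            if t >= T:
                break
            a = best_empirical(g_counts, g_totals)  # sees own group only
            r = 1 if rng.random() < mu[a] else 0
            g_counts[a] += 1; g_totals[a] += r
            revealed_counts[a] += 1; revealed_totals[a] += r
            total_reward += r; t += 1
    while t < T:                                   # phase 2: exploitation
        a = best_empirical(revealed_counts, revealed_totals)
        total_reward += 1 if rng.random() < mu[a] else 0
        t += 1
    return max(mu) * T - total_reward  # realized regret vs. best action

print(simulate_focus_groups([0.6, 0.5], T=10_000))
```

Note that phase-2 agents' choices never feed back into the revealed history, so the statistics they see are unaffected by earlier agents' selection behavior; informally, this is the sense in which such subhistories can remain unbiased.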
Helping market participants find what they are looking for, and coordinating their search and exploration behavior in a globally optimal way, is an essential part of market design. This paper continues the line of work on "incentivized exploration": essentially, exploration-exploitation learning in the presence of self-interested users whose incentives are skewed in favor of exploitation. Conceptually, we study the interplay of information design, social learning, and multi-armed bandit algorithms. To the best of our knowledge, this is the first paper in the literature on incentivized exploration (and possibly in the broader literature on "learning and incentives") that attempts to mitigate the limitations of standard economic assumptions. Full version: https://arxiv.org/abs/1811.06026.