DOI: 10.1145/3391403.3399487

Incentivizing Exploration with Selective Data Disclosure

Published: 13 July 2020

Abstract

We study the design of rating systems that incentivize (more) efficient social learning among self-interested agents. Agents arrive sequentially and are presented with a set of possible actions, each of which yields a positive reward with an unknown probability. A disclosure policy sends messages about the rewards of previously-chosen actions to arriving agents. These messages can alter agents' incentives towards exploration, taking potentially sub-optimal actions for the sake of learning more about their rewards. Prior work achieves much progress with disclosure policies that merely recommend an action to each user, without any other supporting information, and sometimes recommend exploratory actions. All this work relies heavily on standard, yet very strong rationality assumptions. However, these assumptions are quite problematic in the context of the motivating applications: recommendation systems such as Yelp, Amazon, or Netflix, and matching markets such as AirBnB. It is very unclear whether users would know and understand a complicated disclosure policy announced by the principal, let alone trust the principal to faithfully implement it. (The principal may deviate from the announced policy either intentionally, or due to insufficient information about the users, or because of bugs in implementation.) Even if the users understand the policy and trust that it was implemented as claimed, they might not react to it rationally, particularly given the lack of supporting information and the possibility of being singled out for exploration. For example, users may find such disclosure policies unacceptable and leave the system.
We study a particular class of disclosure policies that use messages, called unbiased subhistories, consisting of the actions and rewards from a subsequence of past agents. Each subsequence is chosen ahead of time, according to a predetermined partial order on the rounds. We posit a flexible model of frequentist agent response, which we argue is plausible for this class of "order-based" disclosure policies. We measure the performance of a policy by its regret, i.e., the difference in expected total reward between the best action and the policy. A disclosure policy that reveals full history in each round risks inducing herding behavior among the agents, and typically has regret linear in the time horizon T. Our main result is an order-based disclosure policy that obtains regret Õ(√T). This regret is known to be optimal in the worst case over reward distributions, even absent incentives. We also exhibit simpler order-based policies with higher, but still sublinear, regret. These policies can be interpreted as dividing a sublinear number of agents into constant-sized focus groups, whose histories are then revealed to future agents.
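To make the contrast between full-history disclosure and a focus-group style policy concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the policy or agent model analyzed in the paper: agents are assumed to be simple greedy frequentists who pick the action with the highest empirical mean in the subhistory they are shown, unobserved actions are scored with a hypothetical default `prior_mean`, and rewards are Bernoulli. The names `greedy_choice`, `full_history`, `make_focus_groups`, and `simulate` are invented for this example.

```python
import random


def greedy_choice(observed, n_arms, prior_mean=0.5):
    """Pick the arm with the highest empirical mean in the shown subhistory.

    Arms with no observations are scored with prior_mean (an assumption of
    this sketch). Ties go to the lowest arm index.
    """
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in observed:
        sums[arm] += reward
        counts[arm] += 1
    scores = [sums[a] / counts[a] if counts[a] else prior_mean
              for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: scores[a])


def full_history(t):
    """Full disclosure: the agent in round t sees every earlier round."""
    return range(t)


def make_focus_groups(n_groups, group_size):
    """Order-based rule fixed ahead of time, independent of observed rewards.

    The first n_groups * group_size rounds are split into consecutive groups;
    each of those agents sees only earlier rounds of its own group. Every
    later agent sees all focus-group rounds and nothing else.
    """
    phase1 = n_groups * group_size

    def visible(t):
        if t < phase1:
            start = (t // group_size) * group_size
            return range(start, t)
        return range(phase1)

    return visible


def simulate(visible, means, horizon, seed):
    """Run one sequence of agents; return regret against the best arm's mean."""
    rng = random.Random(seed)
    history = []  # history[s] = (arm, reward) chosen in round s
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        observed = [history[s] for s in visible(t)]
        arm = greedy_choice(observed, len(means))
        reward = 1 if rng.random() < means[arm] else 0
        history.append((arm, reward))
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    means = [0.4, 0.6]  # hypothetical reward probabilities
    horizon, seeds = 500, range(20)
    policies = [("full history", full_history),
                ("focus groups", make_focus_groups(10, 5))]
    for name, rule in policies:
        avg = sum(simulate(rule, means, horizon, s) for s in seeds) / len(seeds)
        print(name, "average regret:", round(avg, 2))
```

Under full disclosure in this toy model, a single unlucky early sample of the better arm can keep every later agent on the worse arm, which is the herding effect described above. The focus-group rule gives each group a fresh start, so later agents see several independent short histories. The group count and size used here are arbitrary, and the sketch makes no attempt to reproduce the regret bounds stated in the abstract.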
Helping market participants find whatever they are looking for, and coordinating their search and exploration behavior in a globally optimal way, is an essential part of market design. This paper continues the line of work on "incentivized exploration": essentially, exploration-exploitation learning in the presence of self-interested users whose incentives are skewed in favor of exploitation. Conceptually, we study the interplay of information design, social learning, and multi-armed bandit algorithms. To the best of our knowledge, this is the first paper in the literature on incentivized exploration (and possibly in the broader literature on "learning and incentives") which attempts to mitigate the limitations of standard economic assumptions. Full version: https://arxiv.org/abs/1811.06026.

Information & Contributors

Information

Published In

EC '20: Proceedings of the 21st ACM Conference on Economics and Computation
July 2020
937 pages
ISBN: 9781450379755
DOI: 10.1145/3391403
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. incentivizing exploration
  2. unbiased subhistories

Qualifiers

  • Abstract

Conference

EC '20
EC '20: The 21st ACM Conference on Economics and Computation
July 13 - 17, 2020
Virtual Event, Hungary

Acceptance Rates

Overall Acceptance Rate 664 of 2,389 submissions, 28%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 13
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 10 Nov 2024

Citations

Cited By

  • (2024) Incentivized Exploration via Filtered Posterior Sampling. SSRN Electronic Journal. DOI: 10.2139/ssrn.4733191. Online publication date: 2024.
  • (2024) Incentive-Aware Recommender Systems in Two-Sided Markets. ACM Transactions on Recommender Systems. DOI: 10.1145/3674158. Online publication date: 21-Jun-2024.
  • (2023) Dynamic pricing and learning with Bayesian persuasion. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 59273-59285. DOI: 10.5555/3666122.3668710. Online publication date: 10-Dec-2023.
  • (2023) Bandit social learning under myopic behavior. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 10385-10411. DOI: 10.5555/3666122.3666578. Online publication date: 10-Dec-2023.
  • (2023) Incentivizing exploration with linear contexts and combinatorial actions. Proceedings of the 40th International Conference on Machine Learning, pp. 30570-30583. DOI: 10.5555/3618408.3619675. Online publication date: 23-Jul-2023.
  • (2023) Misalignment, Learning, and Ranking: Harnessing Users Limited Attention. SSRN Electronic Journal. DOI: 10.2139/ssrn.4365381. Online publication date: 2023.
  • (2020) Mechanisms for a No-Regret Agent: Beyond the Common Prior. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 259-270. DOI: 10.1109/FOCS46700.2020.00033. Online publication date: Nov-2020.
  • (n.d.) Aggregative Efficiency of Bayesian Learning in Networks. SSRN Electronic Journal. DOI: 10.2139/ssrn.3914873.
