Abstract
Companies like Google and Microsoft run billions of auctions every day to sell advertising opportunities. Any change to the rules of these auctions can have a tremendous effect on the revenue of the company and the welfare of the advertisers and the users. Therefore, any change requires careful evaluation of its potential impacts. Currently, such impacts are often evaluated by running simulations or small controlled experiments. This, however, misses the important factor that the advertisers respond to changes. Our goal is to build a theoretical framework for predicting the actions of an agent (the advertiser) that is optimizing her actions in an uncertain environment. We model this problem using a variant of the multi-armed bandit setting where playing an arm is costly. The cost of each arm changes over time and is publicly observable. The value of playing an arm is drawn stochastically from a static distribution and is observed by the agent and not by us. We, however, observe the actions of the agent. Our main result is that assuming the agent is playing a strategy with a regret of at most f(T) within the first T rounds, we can learn to play the multi-armed bandits game (without observing the rewards) in such a way that the regret of our selected actions is at most \(O(k^4(f(T)+1)\log (T))\), where k is the number of arms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
See [15] for an attempt to solve this problem by restricting the experiment to small micro-markets. Note that this has the obvious disadvantage of biasing the experiment toward a non-representative set of advertisers and auctions.
- 2.
See [9] for an interesting theoretical treatment of this setting. It turns out that assuming that the advertisers are fully rational and react even to a small change in the auction, even treating a small percentage of each advertiser’s auctions is enough to extrapolate their response to a full treatment. In practice, however, there is too much noise and fluctuation in the system for advertisers to be able to observe and respond to a change that, for example, increases their cost per click by 10% in 1% of their auctions.
- 3.
In our motivating application, the reward can be the profit the advertiser makes if the user clicks on their ad and makes a purchase, or zero otherwise. In this case, the assumption that the reward distribution is static means that the profit per conversion and the conversion probability are fixed over time. This is not entirely accurate, but is a reasonable approximation of the reality, since while these parameters change over time, they tend to change at a slow pace.
References
Athey, S., Nekipelov, D.: A structural model of sponsored search advertising auctions. In: Sixth Ad Auctions Workshop (2010)
Athey, S., Haile, P.A.: Nonparametric approaches to auctions. In: Handbook of Econometrics. Elsevier (2007)
Backstrom, L., Kleinberg, J.: Network bucket testing. In: Proceedings of the 20th International Conference on World Wide Web, pp. 615–624. ACM (2011)
Braverman, M., Mao, J., Schneider, J., Weinberg, M.: Selling to a no-regret buyer. In: Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 523–538 (2018)
Broder, A., Gabrilovich, E., Josifovski, V., Mavromatis, G., Smola, A.: Bid generation for advanced match in sponsored search. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, New York, NY, USA, pp. 515–524 (2011)
Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. CoRR abs/1204.5721 (2012)
Campo, S., Guerre, E., Perrigne, I., Vuong, Q.: Semiparametric Estimation of First-price Auctions with Risk Averse Bidders. Working papers, Centre de Recherche en Economie et Statistique (2003)
Cary, M., et al.: Greedy bidding strategies for keyword auctions. In: Proceedings of the 8th ACM Conference on Electronic Commerce, EC 2007, pp. 262–271. ACM, New York (2007). https://doi.org/10.1145/1250910.1250949
Chawla, S., Hartline, J., Nekipelov, D.: Mechanism design for data science. In: Proceedings of the Fifteenth ACM Conference on Economics and Computation, pp. 711–712. ACM (2014)
Feldman, M., Kalai, A., Tennenholtz, M.: Playing games without observing payoffs. In: ICS, pp. 106–110 (2010)
Guerre, E., Perrigne, I., Vuong, Q.: Optimal nonparametric estimation of first-price auctions. Econometrica 68(3), 525–574 (2000)
Guerre, E., Perrigne, I., Vuong, Q.: Nonparametric identification of risk aversion in first-price auctions under exclusion restrictions. Econometrica 77(4), 1193–1227 (2009)
Jofre-Bonet, M., Pesendorfer, M.: Estimation of a dynamic auction game. Econometrica 71(5), 1443–1489 (2003)
Kuleshov, V., Schrijvers, O.: Inverse game theory: learning utilities in succinct games. In: Markakis, E., Schäfer, G. (eds.) WINE 2015. LNCS, vol. 9470, pp. 413–427. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48995-6_30
Lang, K.J., Andersen, R.: Finding dense and isolated submarkets in a sponsored search spending graph. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 613–622. ACM (2007)
Mizuta, H., Steiglitz, K.: Agent-based simulation of dynamic online auctions. In: 2000 Winter Simulation Conference Proceedings, vol. 2, pp. 1772–1777. IEEE (2000)
Nekipelov, D., Syrgkanis, V., Tardos, E.: Econometrics for learning agents. In: Proceedings of the Sixteenth ACM Conference on Economics and Computation, pp. 1–18. ACM (2015)
Pin, F., Key, P.: Stochastic variability in sponsored search auctions: observations and models. In: Proceedings of the 12th ACM Conference on Electronic Commerce, pp. 61–70 (2011)
Tang, D., Agarwal, A., O’Brien, D., Meyer, M.: Overlapping experiment infrastructure: more, better, faster experimentation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 17–26. ACM (2010)
Varian, H.R.: Position auctions. Int. J. Ind. Organ. 25(6), 1163–1178 (2007)
Xu, H., Gao, B., Yang, D., Liu, T.Y.: Predicting advertiser bidding behaviors in sponsored search by rationality modeling. In: Proceedings of the 22nd International Conference on World Wide Web, May 2013
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Alaei, S., Badanidiyuru, A., Mahdian, M., Yazdanbod, S. (2019). Response Prediction for Low-Regret Agents. In: Caragiannis, I., Mirrokni, V., Nikolova, E. (eds) Web and Internet Economics. WINE 2019. Lecture Notes in Computer Science(), vol 11920. Springer, Cham. https://doi.org/10.1007/978-3-030-35389-6_3
Download citation
DOI: https://doi.org/10.1007/978-3-030-35389-6_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35388-9
Online ISBN: 978-3-030-35389-6
eBook Packages: Computer ScienceComputer Science (R0)