Research Article · Open Access
DOI: 10.1145/3637528.3671555

Offline Reinforcement Learning for Optimizing Production Bidding Policies

Published: 24 August 2024

Abstract

The online advertising market, with thousands of auctions run every second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Advertising platforms therefore typically provide automated agents to their customers, which bid on their behalf for impression opportunities in real time and at scale. Because these proxy agents are owned by the platform but spend advertiser funds, there is a strong practical need to balance the reliability and explainability of the agent with its optimizing power. We propose a generalizable approach to optimizing bidding policies in production environments by learning from real data using offline reinforcement learning. The approach can optimize any differentiable base policy (in practice, a heuristic policy based on principles that the advertiser can easily understand) and requires only data generated by the base policy itself. We use a hybrid agent architecture that combines an arbitrary base policy with a deep neural network, where only the optimized base-policy parameters are eventually deployed and the neural-network component is discarded after training. We demonstrate that this architecture achieves statistically significant performance gains over the default production bidding policy in both simulated and at-scale production bidding environments. Our approach incurs no additional infrastructure, safety, or explainability costs, as it directly optimizes the parameters of existing production routines without replacing them with black-box models such as neural networks.
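To make the hybrid architecture concrete, here is a minimal sketch, not the paper's implementation: the class names, the PI-style pacing base policy, the advantage-weighted training objective, and the synthetic data below are all assumptions for illustration. It shows a differentiable heuristic base policy with learnable parameters combined additively with a neural-network correction during offline training; at deployment only the tuned base-policy parameters are exported and the network is discarded.

```python
# Minimal sketch (assumption, not the paper's code): a hybrid bidding policy
# in which a differentiable heuristic base policy is tuned by offline RL while
# a neural residual head absorbs structure the heuristic cannot express.
# Only the base parameters (kp, ki here) would be deployed; the network is
# discarded after training.
import torch
import torch.nn as nn


class BasePacingPolicy(nn.Module):
    """Hypothetical PI-style budget-pacing controller with learnable gains."""

    def __init__(self):
        super().__init__()
        self.kp = nn.Parameter(torch.tensor(1.0))  # proportional gain
        self.ki = nn.Parameter(torch.tensor(0.1))  # integral gain

    def forward(self, spend_error, spend_error_integral):
        # Bid multiplier driven by the budget-pacing error signal.
        return self.kp * spend_error + self.ki * spend_error_integral


class HybridPolicy(nn.Module):
    """Base policy plus a residual network used only during training."""

    def __init__(self, state_dim=4):
        super().__init__()
        self.base = BasePacingPolicy()
        self.residual = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, state):
        # Convention for this sketch: state[:, 0] is the spend error and
        # state[:, 1] is its running integral.
        base_action = self.base(state[:, 0], state[:, 1])
        return base_action + self.residual(state).squeeze(-1)


# Offline training loop sketch: fit the hybrid policy to logged data with an
# advantage-weighted, behavior-regularized objective (one simple offline-RL
# recipe; the paper may use a different algorithm).
policy = HybridPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(256, 4)       # stand-in for logged states
logged_actions = torch.randn(256)  # stand-in for logged base-policy actions
advantages = torch.randn(256)      # stand-in for estimated advantages

for _ in range(100):
    actions = policy(states)
    # Advantage weighting keeps the learned policy close to the
    # data-generating base policy, as required when learning offline.
    weights = torch.exp(advantages.clamp(max=5.0))
    loss = (weights * (actions - logged_actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Deployment: export only the optimized base-policy parameters.
print({"kp": policy.base.kp.item(), "ki": policy.base.ki.item()})
```

The design point mirrors the abstract: gradients flow through both the base policy and the network, so the interpretable base parameters are optimized jointly with the residual, yet the deployed artifact remains the original, explainable controller.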

Supplemental Material

MP4 File - ads0256-video.mp4
This promotional video introduces a practical approach to optimizing control policies in production environments by learning from real data with offline RL. Our approach does not incur additional infrastructure, safety, or explainability costs, as it directly optimizes the parameters of existing production routines. For details, please check our full paper "Offline Reinforcement Learning for Optimizing Production Bidding Policies".

Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024, 6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. auto-bidding
2. reinforcement learning

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
