Research Article · Open Access
DOI: 10.1145/3637528.3671555

Offline Reinforcement Learning for Optimizing Production Bidding Policies

Published: 24 August 2024

Abstract

The online advertising market, with thousands of auctions run every second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Advertising platforms therefore typically provide automated agents to their customers, which bid on their behalf for impression opportunities in real time and at scale. Because these proxy agents are owned by the platform but spend advertiser funds, there is a strong practical need to balance the reliability and explainability of the agent with its optimizing power. We propose a generalizable approach to optimizing bidding policies in production environments by learning from real data using offline reinforcement learning. The approach can optimize any differentiable base policy (in practice, a heuristic policy based on principles that the advertiser can easily understand) and requires only data generated by the base policy itself. We use a hybrid agent architecture that combines an arbitrary base policy with a deep neural network, where only the optimized base-policy parameters are eventually deployed and the neural-network component is discarded after training. We demonstrate that this architecture achieves statistically significant performance gains over the default production bidding policy in both simulated and at-scale production bidding environments. Our approach incurs no additional infrastructure, safety, or explainability costs, as it directly optimizes the parameters of existing production routines without replacing them with black-box models such as neural networks.
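To make the hybrid architecture concrete, here is a minimal sketch, not the paper's implementation: the class names, the PI-style pacing base policy, the advantage-weighted training objective, and the synthetic data below are all assumptions for illustration. It shows a differentiable heuristic base policy with learnable parameters combined additively with a neural-network correction during offline training; at deployment only the tuned base-policy parameters are exported and the network is discarded.

```python
# Minimal sketch (assumption, not the paper's code): a hybrid bidding policy
# in which a differentiable heuristic base policy is tuned by offline RL while
# a neural residual head absorbs structure the heuristic cannot express.
# Only the base parameters (kp, ki here) would be deployed; the network is
# discarded after training.
import torch
import torch.nn as nn


class BasePacingPolicy(nn.Module):
    """Hypothetical PI-style budget-pacing controller with learnable gains."""

    def __init__(self):
        super().__init__()
        self.kp = nn.Parameter(torch.tensor(1.0))  # proportional gain
        self.ki = nn.Parameter(torch.tensor(0.1))  # integral gain

    def forward(self, spend_error, spend_error_integral):
        # Bid multiplier driven by the budget-pacing error signal.
        return self.kp * spend_error + self.ki * spend_error_integral


class HybridPolicy(nn.Module):
    """Base policy plus a residual network used only during training."""

    def __init__(self, state_dim=4):
        super().__init__()
        self.base = BasePacingPolicy()
        self.residual = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, state):
        # Convention for this sketch: state[:, 0] is the spend error and
        # state[:, 1] is its running integral.
        base_action = self.base(state[:, 0], state[:, 1])
        return base_action + self.residual(state).squeeze(-1)


# Offline training loop sketch: fit the hybrid policy to logged data with an
# advantage-weighted, behavior-regularized objective (one simple offline-RL
# recipe; the paper may use a different algorithm).
policy = HybridPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(256, 4)       # stand-in for logged states
logged_actions = torch.randn(256)  # stand-in for logged base-policy actions
advantages = torch.randn(256)      # stand-in for estimated advantages

for _ in range(100):
    actions = policy(states)
    # Advantage weighting keeps the learned policy close to the
    # data-generating base policy, as required when learning offline.
    weights = torch.exp(advantages.clamp(max=5.0))
    loss = (weights * (actions - logged_actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Deployment: export only the optimized base-policy parameters.
print({"kp": policy.base.kp.item(), "ki": policy.base.ki.item()})
```

The design point mirrors the abstract: gradients flow through both the base policy and the network, so the interpretable base parameters are optimized jointly with the residual, yet the deployed artifact remains the original, explainable controller.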

Supplemental Material

MP4 File - ads0256-video.mp4
This promotional video introduces a practical approach to optimizing control policies in production environments by learning from real data with offline RL. Our approach does not incur additional infrastructure, safety, or explainability costs, as it directly optimizes the parameters of existing production routines. For details, please check our full paper "Offline Reinforcement Learning for Optimizing Production Bidding Policies".

Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024, 6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. auto-bidding
2. reinforcement learning

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
