Interpretable Personalized Experimentation

Authors:

Eytan Bakshy

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 4173 - 4183

https://doi.org/10.1145/3534678.3539175

Published: 14 August 2022

Abstract

Black-box heterogeneous treatment effect (HTE) models are increasingly used to create personalized policies that assign individuals to their optimal treatments. However, they are difficult to understand and can be burdensome to maintain in a production environment. In this paper, we present a scalable, interpretable personalized experimentation system, implemented and deployed in production at Meta. The system works in the multiple-treatment, multiple-outcome setting typical at Meta to (1) learn explanations for black-box HTE models and (2) generate interpretable personalized policies. We evaluate the methods used in the system on publicly available data and Meta use cases, and discuss lessons learnt during the development of the system.
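
To make the idea of an interpretable personalized policy concrete, the following is a minimal sketch, not the system described in the paper. It assumes a black-box HTE model has already produced a predicted outcome for every individual under every treatment, and it distills those predictions into a single-split rule on one feature. The function names `best_arm` and `fit_policy_stump`, the single `age` feature, and the toy numbers are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def best_arm(rows: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (treatment index, summed predicted outcome) for the one
    treatment that maximizes the total prediction over `rows`."""
    n_arms = len(rows[0])
    totals = [sum(row[a] for row in rows) for a in range(n_arms)]
    arm = max(range(n_arms), key=lambda a: totals[a])
    return arm, totals[arm]


def fit_policy_stump(
    x: Sequence[float], preds: Sequence[Sequence[float]]
) -> Tuple[float, int, int, float]:
    """Fit the rule 'if x <= threshold: left_arm else right_arm'.

    x     : one feature value per individual
    preds : preds[i][a] is the (assumed) black-box predicted outcome
            of individual i under treatment a
    Returns (threshold, left_arm, right_arm, total_predicted_outcome).
    """
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both sides of every split are non-empty.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [p for xi, p in zip(x, preds) if xi <= threshold]
        right = [p for xi, p in zip(x, preds) if xi > threshold]
        left_arm, left_value = best_arm(left)
        right_arm, right_value = best_arm(right)
        total = left_value + right_value
        if best is None or total > best[3]:
            best = (threshold, left_arm, right_arm, total)
    return best


# Toy example: ages and hypothetical predictions for two treatments.
age = [18, 22, 25, 40, 47, 60]
predicted = [[0.1, 0.5], [0.2, 0.6], [0.1, 0.4],
             [0.7, 0.3], [0.8, 0.2], [0.9, 0.1]]
threshold, left_arm, right_arm, total = fit_policy_stump(age, predicted)
print(f"if age <= {threshold}: treatment {left_arm}, else: treatment {right_arm}")
```

On this toy data the rule splits at age 32.5, assigning treatment 1 to the younger group and treatment 0 to the older group. A production system would search over many features, grow deeper trees, and handle multiple outcomes; this sketch only shows why the resulting policy can be read as one sentence.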

Index Terms

Interpretable Personalized Experimentation
1. Information systems
  1. World Wide Web
    1. Web searching and information discovery
      1. Personalization

Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2022

5033 pages

ISBN: 9781450393850

DOI: 10.1145/3534678

General Chairs: Aidong Zhang (University of Virginia), Huzefa Rangwala (Amazon/George Mason University)

Copyright © 2022 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2022

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
