DOI: 10.1145/3514094.3534198
Research Article · Public Access

Towards Robust Off-Policy Evaluation via Human Inputs

Published: 27 July 2022

Abstract

Off-policy evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to evaluate policies robustly amid such changes. Existing approaches consider robustness against a large class of shifts that can arbitrarily change any observable property of the environment. This often results in highly pessimistic utility estimates, thereby invalidating policies that might have been useful in deployment. In this work, we address this problem by investigating how domain knowledge can help provide more realistic estimates of policy utilities. We leverage human inputs on which aspects of the environment may plausibly change, and adapt OPE methods to consider shifts only on these aspects. Specifically, we propose a novel framework, Robust OPE (ROPE), which considers shifts on a subset of covariates in the data based on user inputs, and estimates worst-case utility under these shifts. We then develop computationally efficient algorithms for OPE that are robust to the aforementioned shifts for contextual bandits and Markov decision processes, and we theoretically analyze their sample complexity. Extensive experimentation with synthetic and real-world datasets from the healthcare domain demonstrates that our approach not only captures realistic dataset shifts accurately, but also results in less pessimistic policy evaluations.
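To make the idea concrete, here is a minimal sketch, not the authors' ROPE implementation, of the worst-case computation for a contextual bandit: samples are bucketed by the user-flagged covariates, each bucket gets an importance-sampling value estimate, and the policy's value is minimized over bucket marginals inside a KL ball. The KL uncertainty set and all names (rope_worst_case_value, shift_cols, rho) are illustrative assumptions.

```python
# A minimal sketch of robust OPE over a user-chosen covariate subset;
# NOT the paper's ROPE algorithm. The KL uncertainty set and all names
# (rope_worst_case_value, shift_cols, rho) are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def rope_worst_case_value(X, a, r, pi_b, pi_e, shift_cols, rho=0.05):
    """Worst-case off-policy value of a contextual-bandit policy when only
    the covariates in `shift_cols` are allowed to shift.

    X: (n, d) contexts; a: (n,) logged actions; r: (n,) rewards;
    pi_b, pi_e: (n,) behavior / target propensities of the logged actions."""
    v = (pi_e / pi_b) * r                      # per-sample IS value estimates
    groups = {}
    for key, vi in zip(map(tuple, X[:, shift_cols]), v):
        groups.setdefault(key, []).append(vi)  # bucket by shiftable covariates
    m = np.array([np.mean(g) for g in groups.values()])  # value per bucket
    p = np.array([len(g) for g in groups.values()], float)
    p /= p.sum()                               # empirical bucket marginal

    # Worst case of E_q[m] over {q : KL(q || p) <= rho}, via the scalar dual
    #   sup_{lam > 0}  -lam * log E_p[exp(-m / lam)] - lam * rho
    def neg_dual(lam):
        return -(-lam * logsumexp(-m / lam, b=p) - lam * rho)

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e3), method="bounded")
    return -res.fun

# Toy usage: three binary covariates; the user flags only column 1 as
# plausibly shifting, so the adversary reweights just its two buckets.
rng = np.random.default_rng(0)
n = 5000
X = rng.integers(0, 2, size=(n, 3))
a = rng.integers(0, 2, size=n)                 # uniform logging policy
r = X[:, 1] + (a == X[:, 0]) + 0.1 * rng.standard_normal(n)
pi_b = np.full(n, 0.5)
pi_e = np.where(a == X[:, 0], 0.9, 0.1)        # target: match covariate 0
print(rope_worst_case_value(X, a, r, pi_b, pi_e, shift_cols=[1]))
```

Restricting the adversary to the flagged covariates is what keeps the estimate from collapsing to the fully pessimistic worst case over all observables that the abstract describes.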

Published In

AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society
July 2022
939 pages
ISBN:9781450392471
DOI:10.1145/3514094
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adversarial machine learning
  2. dataset shift
  3. human-in-the-loop
  4. policy evaluation
  5. robust learning

Conference

AIES '22: AAAI/ACM Conference on AI, Ethics, and Society
August 1 - 3, 2022
Oxford, United Kingdom

Acceptance Rates

Overall Acceptance Rate 61 of 162 submissions, 38%
