DOI: 10.1145/3368555.3384460
ACM CHIL conference proceedings · Research article · Open access

Explaining an increase in predicted risk for clinical alerts

Published: 02 April 2020

Abstract

Much work aims to explain a model's prediction on a static input. We consider explanations in a temporal setting where a stateful dynamical model produces a sequence of risk estimates given an input at each time step. When the estimated risk increases, the goal of the explanation is to attribute the increase to a few relevant inputs from the past.
While our formal setup and techniques are general, we carry out an in-depth case study in a clinical setting. The goal here is to alert a clinician when a patient's risk of deterioration rises. The clinician then has to decide whether to intervene and adjust the treatment. Given a potentially long sequence of new events since she last saw the patient, a concise explanation helps her to quickly triage the alert.
We develop methods to lift static attribution techniques to the dynamical setting, where we identify and address challenges specific to dynamics. We then experimentally assess the utility of different explanations of clinical alerts through expert evaluation.
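The lifting idea described above can be illustrated with a minimal occlusion-style sketch. Everything below (the event names, weights, and toy sigmoid model) is hypothetical and stands in for a learned stateful risk model; this is not the paper's actual method, only an illustration of attributing a risk increase to a few recent inputs: each new event since the last visit is scored by how much the current risk drops when that single event is removed.

```python
import math

# Hypothetical event weights standing in for a learned risk model.
WEIGHTS = {"creatinine_up": 0.9, "hr_up": 0.4, "note_entered": 0.05}

def risk(events):
    """Toy risk score in [0, 1]: a weighted sum of events through a sigmoid."""
    score = sum(WEIGHTS.get(e, 0.0) for e in events)
    return 1.0 / (1.0 + math.exp(-(score - 1.0)))

def attribute_increase(past_events, new_events):
    """Occlusion-style temporal attribution: score each new event by how
    much the current risk drops when that event alone is removed.
    Returns the total risk increase and a per-event contribution map."""
    baseline = risk(past_events)
    current = risk(past_events + new_events)
    contributions = {}
    for i, e in enumerate(new_events):
        ablated = new_events[:i] + new_events[i + 1:]  # drop one event
        contributions[e] = current - risk(past_events + ablated)
    return current - baseline, contributions

# A patient's risk rises after two new events arrive since the last visit.
increase, contrib = attribute_increase(
    past_events=["hr_up"],
    new_events=["creatinine_up", "note_entered"],
)
# Ranking new events by contribution yields the concise explanation.
ranked = sorted(contrib, key=contrib.get, reverse=True)
```

Events with the largest contributions form the candidate explanation shown to the clinician. The paper's actual techniques lift gradient-based static attribution methods and must cope with the model's hidden state, which this stateless toy ignores.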





    Published In

    CHIL '20: Proceedings of the ACM Conference on Health, Inference, and Learning
    April 2020, 265 pages
    ISBN: 9781450370462
    DOI: 10.1145/3368555

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. dynamics
    2. interpretability
    3. machine learning


    Conference

    ACM CHIL '20

    Acceptance Rates

    Overall acceptance rate: 27 of 110 submissions (25%)


    Article Metrics

    • Downloads (last 12 months): 163
    • Downloads (last 6 weeks): 7

    Reflects downloads up to 01 Dec 2024.

    Cited By

    • (2024) Graph and Structured Data Algorithms in Electronic Health Records: A Scoping Review. Metadata and Semantic Research, pp. 61–73. DOI: 10.1007/978-3-031-65990-4_6. Online publication date: 31 Jul 2024.
    • (2022) Generative time series forecasting with diffusion, denoise, and disentanglement. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 23009–23022. DOI: 10.5555/3600270.3601942. Online publication date: 28 Nov 2022.
    • (2022) Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Scientific Reports, 12(1). DOI: 10.1038/s41598-022-11012-2. Online publication date: 3 May 2022.
    • (2021) Prediction of In-Hospital Cardiac Arrest Using Shallow and Deep Learning. Diagnostics, 11(7):1255. DOI: 10.3390/diagnostics11071255. Online publication date: 13 Jul 2021.
    • (2021) Concurrent imputation and prediction on EHR data using bi-directional GANs. Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 1–9. DOI: 10.1145/3459930.3469512. Online publication date: 1 Aug 2021.
    • (2020) Benchmarking deep learning interpretability in time series predictions. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 6441–6452. DOI: 10.5555/3495724.3496264. Online publication date: 6 Dec 2020.