Background: Machine learning is increasingly used to predict healthcare outcomes, including cost, utilization, and quality.
Objective: We provide a high-level overview of machine learning for healthcare outcomes researchers and decision makers.
Methods: We introduce key concepts for understanding the application of machine learning methods to healthcare outcomes research. We first describe current standards for rigorously learning an estimator, which is an algorithm developed through machine learning to predict a particular outcome. We include steps for data preparation, estimator family selection, parameter learning, regularization, and evaluation. We then compare 3 of the most common machine learning methods: (1) decision tree methods that can be useful for identifying how different subpopulations experience different risks for an outcome; (2) deep learning methods that can identify complex nonlinear patterns or interactions between variables predictive of an outcome; and (3) ensemble methods that can improve predictive performance by combining multiple machine learning methods.
Results: We demonstrate the application of common machine learning methods to a simulated insurance claims dataset. We specifically include statistical code in R and Python for the development and evaluation of estimators for predicting which patients are at heightened risk for hospitalization from ambulatory care-sensitive conditions.
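Illustrative sketch: The workflow summarized above (data preparation, parameter learning, regularization, and held-out evaluation) might look roughly like the following in Python. This is not the code that accompanies the article. It is a minimal standard-library sketch under stated assumptions: the simulated data, the three feature names, the coefficients, the choice of an L2-penalized logistic regression, and the helper names `simulate_claims`, `fit_logistic`, `predict`, and `auc` are all hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_claims(n, seed=0):
    """Hypothetical simulated data: three standardized features and a binary
    hospitalization flag. Coefficients are arbitrary illustration values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(-1, 1)             # standardized age (assumed feature)
        visits = rng.uniform(-1, 1)          # standardized prior visits (assumed)
        chronic = float(rng.random() < 0.3)  # chronic-condition flag (assumed)
        logit = -1.0 + 1.5 * age + 1.0 * visits + 2.0 * chronic
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append(([age, visits, chronic], 1 if rng.random() < p else 0))
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train, l2=0.01, lr=0.5, epochs=300):
    """Parameter learning by batch gradient descent on the logistic loss,
    with an L2 penalty on the weights (not the intercept) as regularization."""
    d = len(train[0][0])
    w, b, n = [0.0] * d, 0.0, len(train)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in train:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """Predicted probability of the outcome for one feature vector."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive case
    scores higher than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Data preparation: simulate, then hold out 30% of rows for evaluation only.
data = simulate_claims(1000, seed=42)
train, test = data[:700], data[700:]

# Learn the estimator on the training rows, evaluate on the held-out rows.
model = fit_logistic(train)
test_auc = auc([predict(model, x) for x, _ in test], [y for _, y in test])
print(f"Held-out AUC: {test_auc:.3f}")
```

The held-out split is the point of the sketch: the estimator is scored only on rows it never saw during learning. A tree-based, deep learning, or ensemble estimator could be swapped in for the logistic model without changing that evaluation structure.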
Conclusions: Outcomes researchers should be aware of key standards for rigorously evaluating an estimator developed through machine learning approaches. Although multiple methods use machine learning concepts, different approaches are best suited for different research problems.
Keywords: claims data; deep learning; elastic net; gradient boosting machine; gradient forest; health services research; machine learning; neural networks; random forest.
Copyright © 2019 ISPOR–The Professional Society for Health Economics and Outcomes Research. Published by Elsevier Inc. All rights reserved.