A Scoping Review of Causal Methods Enabling Predictions Under Hypothetical Interventions

Lin et al. Diagnostic and Prognostic Research (2021) 5:3
https://doi.org/10.1186/s41512-021-00092-9

REVIEW Open Access

Lijing Lin1*, Matthew Sperrin1, David A. Jenkins1,2, Glen P. Martin1 and Niels Peek1,2,3

Abstract
Background: The methods with which prediction models are usually developed mean that neither the parameters
nor the predictions should be interpreted causally. For many applications, this is perfectly acceptable. However,
when prediction models are used to support decision making, there is often a need for predicting outcomes under
hypothetical interventions.
Aims: We aimed to identify published methods for developing and validating prediction models that enable risk
estimation of outcomes under hypothetical interventions, utilizing causal inference. We aimed to identify the main
methodological approaches, their underlying assumptions, targeted estimands, and potential pitfalls and challenges
with using the method. Finally, we aimed to highlight unresolved methodological challenges.
Methods: We systematically reviewed literature published by December 2019, considering papers in the health
domain that used causal considerations to enable prediction models to be used for predictions under hypothetical
interventions. We included both methodologies proposed in statistical/machine learning literature and methodologies
used in applied studies.
Results: We identified 4919 papers through database searches and a further 115 papers through manual searches. Of
these, 87 papers were retained for full-text screening, of which 13 were selected for inclusion. We found papers from
both the statistical and the machine learning literature. Most of the identified methods for causal inference from
observational data were based on marginal structural models and g-estimation.
Conclusions: There are two broad methodological approaches for incorporating prediction under hypothetical
interventions into clinical prediction models: (1) enriching prediction models derived from observational studies with
estimated causal effects from clinical trials and meta-analyses and (2) estimating prediction models and causal effects
directly from observational data. To operationalise a clinical decision support system, these methods need to be
extended to dynamic treatment regimes and to multiple interventions. Techniques for validating ‘causal
prediction models’ are still in their infancy.
Keywords: Clinical prediction models, Statistical modeling, Causal inference, Counterfactual prediction

* Correspondence: lijing.lin@manchester.ac.uk
1 Division of Informatics, Imaging and Data Science, Faculty of Biology,
Medicine and Health, University of Manchester, Manchester Academic Health
Science Centre, Manchester, UK
Full list of author information is available at the end of the article

© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this article are included in the article's Creative Commons
licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons
licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain
permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Introduction
Clinical prediction models (CPMs) aim to predict current diagnostic status or future outcomes in individuals, conditional on covariates [1]. In clinical practice, CPMs may inform patients and their treating physicians of the probability of a diagnosis or a future outcome, which is then used to support decision-making. For example, QRISK [2, 3] computes an individual’s risk of developing cardiovascular disease within the next 10 years, based on their characteristics such as BMI, blood pressure, smoking status, and other risk factors. The National Institute for Health and Care Excellence (NICE) guidelines indicate that anyone with an estimated QRISK above 10% should be considered for statin treatment [4]. These guidelines also state that initially, patients should be encouraged to implement lifestyle changes such as smoking cessation and weight loss. However, guidelines using such clinical prediction models can be problematic for two main reasons. First, there is often a lack of clarity concerning the estimand that a clinical prediction model is targeting [5]. To inform decision making about treatment initiation, one requires predicted risks assuming no treatment is given. This might be achieved by using a ‘treatment-naïve’ cohort (removing all patients who take treatment at baseline) [2, 3] or by incorporating treatment as a predictor variable in the model [6]. However, such approaches do not handle ‘treatment drop-in’: in which patients in the development cohort might start taking treatment post-baseline [7, 8]. One way to attempt to account for this is to censor patients at treatment initiation; however, this assumes that treatment initiation is not informative [9]. Second, these prediction models cannot indicate which of the potential treatment options or lifestyle changes would be best in terms of lowering an individual’s future cardiovascular risk, nor can they quantify the future risk if that individual was given a treatment or lifestyle change [10, 11]. With the lack of randomised treatment assignment such as in the observational studies, simply ‘plugging in’ the hypothetical treatment or intervention via the baseline covariates will rarely, if ever, give the correct hypothetical risks [11]. For example, there may be underadjustment due to residual confounding, or overadjustment of mediators or colliders.

To correctly aid such decision-making, one needs answers to ‘what-if’ questions. As an example, suppose we are interested in statin interventions for primary prevention of CVD and we would like to predict the 10-year risk of CVD with or without statin interventions at an individual level. The methods used to derive CPMs do not allow for the correct use of the model in answering such ‘what-if’ questions, as they select and combine covariates to optimize predictive accuracy, not to predict the outcome distribution under hypothetical interventions [12, 13]. Nevertheless, end-users often mistakenly compare the contribution of individual covariates (in terms of risk predictions) and seek causal interpretation of model parameters [14]. Within a potential outcomes (counterfactual) framework, an emerging class of causal predictive models could enable ‘what-if’ queries to be addressed, specifically calculating the predicted risk under different hypothetical interventions. This enables targeted intervention, allows correct communication to patients and clinicians, and facilitates a preventative healthcare system.

There exists a vast literature on both predictive models and causal inference. While the use of prediction modelling to enrich causal inference is becoming widespread [15], the use of causal thinking to improve prediction modelling is less well studied [16]; however, its potential is acknowledged [13]. Our aim was therefore to identify methods for developing and validating ‘causal prediction’ models that use causal methods to enable risk estimation of outcomes under hypothetical interventions. We aimed to identify the main methodological approaches, their underlying assumptions, targeted estimands, and potential pitfalls and challenges with using the method. Finally, we aimed to highlight unresolved methodological challenges.

Methods
We aimed to identify all studies in which a form of causal reasoning is used to enable predictions for health outcomes under hypothetical interventions. To be clear, we were not interested in causal studies where the methods can solely be used to predict average or conditional causal effects [13].

Due to the available resources for reviewing large volumes of papers, the search was restricted to the health domain. We included both methodologies proposed in statistical/machine learning literature and methodologies used in applied studies. The review process adhered to Arksey and O’Malley’s [17] scoping review framework and the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement [18]. We have also followed recommendations for conducting methodology scoping reviews as suggested in Martin et al. [19].

Search strategy
We systematically reviewed the literature available up to a cut-off date on December 31, 2019. The literature search was conducted in two electronic databases: Ovid MEDLINE and Ovid Embase, and searches were tailored to each database and restricted to English language publications. The search terms were designed by considering the intersection of prediction modelling and causal inference. Pre-existing search filters were utilised where possible such as those for prediction models [20]. Details of the search terms are included in the supplementary
protocol. We were also aware, a priori, of several research groups that have published work on methods in related areas (listed in the supplementary protocol). We manually searched for any relevant recent publications within the past 4 years from these groups. In addition, we conducted a backward citation search, checking the references of identified papers, and a forward citation search using Google Scholar, which discovered papers referencing the identified papers.

Selection of studies
After the initial search, all titles and abstracts of papers identified by the search strategy were screened for eligibility by the lead author (LL). A random 3% were screened by a second reviewer (DJ) to ensure reliability of the screening process. Any discrepancies between the reviewers were resolved through mutual discussion, in consultation with a third reviewer (MS) where needed. The initial eligibility criteria, based on title and abstract screen, were as follows: (1) use causal reasoning in the context of health outcome prediction, specifically enabling prediction under hypothetical interventions; (2) describe original methodological research (e.g. peer reviewed methodological journal); or (3) applied research, which did not develop methodology, but state-of-the-art methodology was employed to address relevant causal prediction questions. We excluded studies that could only be used for causal effect estimation, and studies where standard clinical prediction models were used to infer conditional causal effects, e.g. [21]. However, we did not exclude papers that developed a novel method of allowing prediction under hypothetical intervention, even when the final goal was causal effect estimation. We excluded letters, commentaries, editorials, and conference abstracts with no information to allow assessment of proposed methods.

Extraction
Following the review aims, we extracted information from papers that were included after a full-text screening as follows:

1. Article type (summary/review, theoretical, modelling with application via simulation and/or observed data, purely applied paper);
2. Clinical topic area of analysis (e.g. CVD, HIV, cancer) for papers with application to observed data;
3. Intervention scenarios (single intervention vs multiple interventions); types of outcome and exposure examined (binary, time-to-event, count, continuous, other);
4. Information on targeted estimand and possible validation approaches for the proposed methods inferred by the review authors; stated possible sources of bias;
5. Stated assumptions; methodologies/methods used for the causal effect estimation and outcome prediction; main methodological novelty stated by the authors of identified papers;
6. Reported modelling strengths and limitations, and suggestions for the future work;
7. Availability of software/code.

The completed extraction table is available in Supplementary File 1. Categorisation of papers was carried out during the information extraction phase by synthesising the extracted information.

Results
Our database searches identified 4919 papers. We identified a further 115 papers through checking publications from known research groups, and forward and backward citation searching. Of these, 87 were retained for full-text screening, of which 13 were deemed eligible for final inclusion, as listed in Table 1. The process of study identification, screening, and inclusion is summarised in the PRISMA flowchart (Fig. 1).

The identified papers covered two main intervention scenarios: single intervention [22–27] and repeated interventions over time [8, 28–33], with a nearly equal number of papers addressing average intervention effects (defined by a contrast of means of counterfactual outcomes averaged across the population) [8, 22, 23, 25, 29, 30] and conditional effects (defined as the contrast of covariate-specific means of the outcome under different intervention levels) [24, 26–28, 31–33]. Across the included papers, we identified two broad categories of methodological approaches for developing causal prediction models: (1) enriching prediction models with externally estimated causal effects, such as from meta-analyses of clinical trials and (2) estimating both the prediction model and causal effects from observational data. The majority of the identified papers (10 out of 13) fell into the latter category, which can be further divided according to intervention scenarios and included methods embedded within both statistical and machine learning frameworks. Table 1 describes part of the extracted information on each paper. The complete extraction table is available in Supplementary File 1. In addition, we will illustrate the methods identified using the statin interventions for primary prevention of CVD example introduced previously, explaining each method and showing their differences in terms of targeted questions and corresponding estimand (Table 2).
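
The distinction drawn in the Results between average and conditional intervention effects can be made concrete with a small sketch. The counterfactual risk model below, its numbers, and the function names are hypothetical and chosen only for illustration; they are not taken from any of the reviewed papers.

```python
from statistics import mean

# Hypothetical counterfactual risk model P(Y^(a) = 1 | X = x).
# All numbers are illustrative assumptions, not estimates from any study.
def counterfactual_risk(a, x):
    baseline = 0.20 if x == 1 else 0.08      # risk without intervention (a = 0)
    relative_risk = 0.6 if x == 1 else 0.75  # assumed multiplicative effect of a = 1
    return baseline * relative_risk if a == 1 else baseline

def conditional_effect(x):
    """Contrast of covariate-specific means: E[Y^(1) | X = x] - E[Y^(0) | X = x]."""
    return counterfactual_risk(1, x) - counterfactual_risk(0, x)

def average_effect(population):
    """Contrast of counterfactual means averaged across the population."""
    treated = mean(counterfactual_risk(1, x) for x in population)
    untreated = mean(counterfactual_risk(0, x) for x in population)
    return treated - untreated

# Toy cohort: 40% of individuals carry a binary risk marker (x = 1).
population = [1] * 40 + [0] * 60

if __name__ == "__main__":
    print("conditional effect, x = 1:", conditional_effect(1))
    print("conditional effect, x = 0:", conditional_effect(0))
    print("average effect:", average_effect(population))
```

In this toy setting the conditional effect differs between the two covariate groups, while the average effect is a single number weighted by how common each group is in the population.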
Table 1 Summary of included 13 papers (See Supplementary File 1 for the completed extraction table)

Title: Candido dos Reis, F. J. et al. (2017) An updated PREDICT breast cancer prognostication and treatment benefit prediction model with independent validation, Breast Cancer Research, 19(1), 58
Intervention scenario: Single intervention, Discrete choice, Average effect
Clinical topic area: Breast cancer
Types of outcomes: Survival
Stated assumptions: Generalisability of effect from clinical trial.
Reported limitations: Prediction of non-breast cancer deaths was excellent in the model development data set but could under-predict or over-predict in the validation data sets.
Code availability: Stata code is available from the author on request.

Title: Brunner, F. J. et al. (2019) Application of non-HDL cholesterol for population-based cardiovascular risk stratification: results from the Multinational Cardiovascular Risk Consortium, The Lancet 394.10215: 2173-2183.
Intervention scenario: Single intervention, Discrete choice, Average effect
Clinical topic area: CVD
Types of outcomes: Binary
Stated assumptions: The therapeutic benefit of lipid-lowering intervention investigated in the study is based on a hypothetical model that assumes a stable reduction of non-HDL cholesterol.
Reported limitations: (1) Data limitation in the derivation cohort. (2) Strong clinical assumption that treatment effects are sustained over a much longer term than has been studied in clinical trials.
Code availability: Reported using R but codes not available

Title: Silva, R. (2016), Observational-Interventional Priors for Dose-Response Learning. In Advances in Neural Information Processing Systems 29.
Intervention scenario: Single intervention, Treatment dose (continuous), Conditional effect
Clinical topic area: Infant Health and Development Program (IHDP)
Types of outcomes: Continuous
Stated assumptions: A(1) and additionally: It is possible to collect interventional data such that treatments are controlled
Reported limitations: (1) Computation complexity. (2) Have not discussed at all the important issue of sample selection bias. (3) Generalisability issue.
Code availability: Code available from OLS

Title: Van Amsterdam, W. A. C. et al. (2019). Eliminating biasing signals in lung cancer images for prognosis predictions with deep learning. npj Digital Medicine, 2(1), 1-6.
Intervention scenario: Single intervention, Discrete choice (0/1), Average effect
Clinical topic area: Lung cancer
Types of outcomes: Survival
Stated assumptions: A(1), A(2) and additionally: An image is hypothesized to contain important information for the clinical prediction task. The collider can be measured from the image.
Reported limitations: (1) Provide an example of how deep learning and structural causal models can be combined. Methods combining machine learning with causal inference need to be further developed.
Code availability: Code available from OLS

Title: Alaa, A. M., & Van Der Schaar, M. (2017). Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes. In Advances in Neural Information Processing Systems 30.
Intervention scenario: Single intervention, Discrete choice (0/1), Conditional effect
Clinical topic area: IHDP & Heart transplantation for cardiovascular patients
Types of outcomes: Continuous/Survival times
Stated assumptions: A(2)
Reported limitations: (1) No experiments regarding outcome prediction accuracy. (2) The computational burden is dominated by the $O(n^3)$ matrix inversion (line 13 in Alg. 1).
Code availability: Code available from authors' website.

Title: Arjas, E. (2014) Time to Consider Time, and Time to Predict? Statistics in Biosciences. Springer New York LLC, 6(2), pp. 189-203
Intervention scenario: Single intervention, Discrete choice, Conditional effect
Clinical topic area: Acute middle ear infections
Types of outcomes: Survival
Stated assumptions: A(2) and local independence
Reported limitations: In studies involving real data the computational challenge can become formidable and even exceed what is feasible in practice.
Code availability: NA

Title: Pajouheshnia R. et al. (2020) Accounting for time-dependent treatment use when developing a prognostic model from observational data: A review of methods. Stat Neerl. 74(1).
Intervention scenario: Multiple intervention; Discrete choice (0/1); Average effect
Clinical topic area: Chronic obstructive pulmonary disease (COPD)
Types of outcomes: Survival
Stated assumptions: A(1) and A(2)
Reported limitations: A very strong indication for treatment will result in structural non-positivity leading to biased estimates of treatment-naïve risk.
Code availability: NA

Title: Sperrin, M. et al. (2018) Using marginal structural models to adjust for treatment drop-in when developing clinical prediction models, Statistics in Medicine. John Wiley & Sons, Ltd, 37(28), pp. 4142-4154.
Intervention scenario: Multiple intervention; Discrete choice (0/1); Conditional effect
Clinical topic area: CVD
Types of outcomes: Binary
Stated assumptions: A(1) and A(2)
Reported limitations: (1) Have not modelled statistical interaction between treatment and prognostic factors; (2) Did not explicitly model statin discontinuation; (3) Only consider single treatment.
Code availability: Code available from OLS

Title: Lim, B. et al. (2018). Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks. In Conference on Neural Information Processing Systems 32.
Intervention scenario: Multiple intervention; No restriction on treatment choices; Average effect
Clinical topic area: Cancer growth and treatment responses
Types of outcomes: No restriction
Stated assumptions: A(2)
Reported limitations: NA
Code availability: Code available from OLS

Title: Bica, I. et al. (2020). Estimating Counterfactual Treatment Outcomes over Time through Adversarially Balanced Representations. ICLR 2020
Intervention scenario: Multiple intervention; Discrete treatment choices; Average effect
Clinical topic area: Treatment response in a tumour growth model
Types of outcomes: No restriction
Stated assumptions: A(2)
Reported limitations: Additional theoretical understanding is needed for performing model selection in the causal inference setting with time-dependent treatments and confounders.
Code availability: Code available from authors' website.

Title: Xu, Y. et al. (2016) A Bayesian Nonparametric Approach for Estimating Individualized Treatment-Response Curves. Edited by F. Doshi-Velez et al. PMLR, pp. 282-300.
Intervention scenario: Multiple intervention; Discrete treatment choices; Conditional treatment effect
Clinical topic area: (1) kidney function deterioration in ICU; (2) the effects of diuretics on fluid balance.
Types of outcomes: Continuous
Stated assumptions: A(2)
Reported limitations: NA
Code availability: NA

Title: Soleimani, H. et al. (2017). Treatment-response models for counterfactual reasoning with continuous-time, continuous-valued interventions. In Uncertainty in Artificial Intelligence. Proceedings of the 33rd Conference, UAI 2017.
Intervention scenario: Continuous-time intervention; Continuous-valued treatments; Conditional treatment effect
Clinical topic area: Modelling physiologic signals with EHRs for treatment effects on renal function
Types of outcomes: No restriction
Stated assumptions: A(2), A(3)
Reported limitations: While this approach relies on regularisation to decompose the observed data into shared and signal-specific components, new methods are needed for constraining the model in order to guarantee posterior consistency of the sub-components of this model.
Code availability: NA

Title: Schulam, P., & Saria, S. (2017). Reliable Decision Support using Counterfactual Models. In Advances in Neural Information Processing Systems 30.
Intervention scenario: Continuous-time intervention; Continuous-valued treatments; Conditional treatment effect
Clinical topic area: Applicable to data from EHR but not restricted to such medical settings
Types of outcomes: Continuous-time; no restriction on data type
Stated assumptions: A(2), A(3), A(4)
Reported limitations: (1) The validity of the CGP is conditioned upon a set of assumptions that are, in general, not testable. The reliability of approaches therefore critically depends on the plausibility of those assumptions in light of domain knowledge.
Code availability: Code available from authors' website.

Abbreviations for ‘Stated Assumptions’: A(1) Relevant directed acyclic graphs (DAGs) available, A(2) Identifiability conditions (consistency, exchangeability, and
positivity; or sequential version of consistency, exchangeability, and positivity for time-varying treatments), A(3) Continuous-time exchangeability and A(4) Non-
informative measurement times. Other abbreviations: EHR electronic health record, RCT randomised controlled trial
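
For reference, the identifiability conditions abbreviated as A(2) are commonly written as below for a treatment given at a single time point. This is the standard textbook formulation rather than a statement taken from any one of the included papers; the symbols $A$ (treatment), $X$ (measured covariates) and $Y^{(a)}$ (potential outcome under treatment level $a$) are notational assumptions made here.

```latex
% Point-treatment version of the identifiability conditions (assumed notation):
% A = treatment, X = measured covariates, Y^{(a)} = potential outcome under A = a.
\begin{align*}
\text{Consistency:} \quad & Y = Y^{(a)} \ \text{whenever } A = a, \\
\text{Exchangeability:} \quad & Y^{(a)} \perp\!\!\!\perp A \mid X \ \text{for every } a, \\
\text{Positivity:} \quad & P(A = a \mid X = x) > 0 \ \text{for every } a \text{ and every } x \text{ with } P(X = x) > 0.
\end{align*}
```

The sequential versions for time-varying treatments state the same three conditions at each time point, conditioning on the treatment and covariate history observed up to that time.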

Combining causal effects measured from external information
Three papers [22–24] were identified as developing models with combined information from different sources to address a single treatment effect. Candido dos Reis et al. [22] and Brunner et al. [23] took a two-stage approach, in which treatment effect estimates from external sources such as RCTs and meta-analyses were first identified, then combined with prediction models to allow predictions under treatment. In the statins for CVD example, the method proposed by Candido dos Reis et al. [22] corresponds to developing a CPM including individuals who take statin at baseline, where the coefficient for the statins variable in the model is fixed to the statin effects estimated from trials. Brunner et al. [23] developed a CPM for cardiovascular risk which was then combined with an externally estimated equation of proportional risk reduction per unit LDL cholesterol reduction to aid decision making in lipid-lowering treatment usage.

In addition to the above two-stage approach of borrowing externally estimated causal information into predictive models, a one-stage approach, proposed by Silva [24], was also identified, in which the two sources of data, interventional and observational, were jointly modelled for causal prediction. This approach was applied in a scenario where it is possible to collect interventional data such that treatments were controlled but where sample sizes might be limited. The idea was to transform observational data into informed priors under a Bayesian framework to predict the unbiased dose-response curve under a pre-defined set of interventions, or ‘dose’.

All three approaches above are limited to a single intervention type and to intervening at a single point in time, where, in a considered trial protocol, the intervention may be followed up for a certain length of time during which its choice is maintained (e.g. the initialisation of statin intervention). Approaches that directly apply the externally estimated causal effects into CPMs assume that the estimated causal effects are generalisable to the population in which one wishes to apply the prediction model. Equally, combining individual data from both sources (i.e. the one-stage approach) ignored the issue of sample selection bias, which was highlighted in [24]. Additionally, the one-stage approach can become computationally intensive as the size of the observational data and the number of treatment levels increase.

Fig. 1 PRISMA flow diagram. ‡The majority of papers excluded at this stage did not meet our first inclusion criterion: that is, they did not focus on prediction under hypothetical interventions; these papers either did not use prediction models at all, or only used prediction modelling to enrich causal inference

Estimating both a prediction model and causal effects from observational data
A total of 10 papers discussed modelling predictions under interventions entirely from observational data. Approaches from these papers can be further divided into two categories: (1) methods considering only one intervention at a single time point [25–27], as discussed in Section 3.2.1, and (2) methods allowing time-dependent interventions [8, 28–33], as discussed in Section 3.2.2.

Counterfactual prediction models that consider an intervention at a single point in time
In our running example, this corresponds to a decision at a single time of whether to prescribe statins for CVD prevention. It does not account for whether statins are discontinued or started at any subsequent time.

Related to average treatment effect estimation
Decision-making on whether to intervene on treatment requires an unbiased estimate of the treatment effect at
Table 2 Illustration of methods in different categories using an example of statin intervention in primary prevention of CVD

Approach category: Combining causal effects measured from external information; Two-stage approach
Ref: Candido dos Reis et al. [22]
Targeted estimand: Risk of CVD under intervention of taking or not taking statin at baseline (and, in a considered trial protocol, following-up for a certain length of time during which statin choice is maintained): $E(Y^{(A_0)} \mid X_0)$
Potential pitfalls/challenges: Efficacy/effectiveness gap when translating trial results to routine care. Comparability of trial and observed populations (selection bias).
Exemplary methods/evaluations: Develop a CPM using individuals who take statin at baseline with the coefficient for treatment variable in the model fixed to the statin effects estimated from trials.

Approach category: Combining causal effects measured from external information; Two-stage approach
Ref: Brunner et al. [23]
Targeted estimand: As for Candido dos Reis et al. [22]: $E(Y^{(A_0)} \mid X_0)$
Potential pitfalls/challenges: Inflating the baseline cholesterol for individuals receiving statin by a certain level has assumed that ‘statins had a moderate effect on lipid reduction and was initiated late during lifetime’, and that statins operate only through cholesterol, i.e. ignores any other causal pathways.
Exemplary methods/evaluations: Inflate the baseline cholesterol of individuals receiving statin (e.g. by 30%). Develop a CPM using all individuals. Combine the predicted individual-level CVD risk with an effect equation estimated from trials to get the absolute risk under intervention.

Approach category: Combining causal effects measured from external information; One-stage approach
Ref: Silva [24]
Targeted estimand: Risk of CVD under intervention of taking statin of dosage $a_i$ ($i = 1, \dots, d$) at baseline: $E(Y^{(A_0 = a_i)} \mid X_0)$.
Potential pitfalls/challenges: Sample selection bias between the interventional data and observational data.
Exemplary methods/evaluations: Individual patient data from RCTs and observational clinical data are combined under a Bayesian framework to predict risk under intervention. Use MCMC to approximate the posterior distributions of the parameters in the model.

Approach category: Estimating both a prediction model and causal effects from observational data; Single intervention; Related to average treatment effect estimation
Ref: Van Amsterdam et al. [25]
Targeted estimand: Risk of CVD under intervention of taking/not taking statin at baseline, regardless of future: $E(Y^{(A_0)} \mid X_0)$.
Potential pitfalls/challenges: An over-simplified causal structure can lead to biased estimates of causal effects, e.g. when there exists more than one collider that was not observed but whose information was contained in the prognostic factors.
Exemplary methods/evaluations: Use a CNN to separate the unobserved collider information from other risk factors while using the last layer resembling linear regression to include the treatment variable as a covariate for risk prediction under intervention.

Approach category: Estimating both a prediction model and causal effects from observational data; Single intervention; Related to conditional treatment effect estimation
Ref: Alaa et al. [26]
Targeted estimand: Risk of CVD under intervention of taking/not taking statin at baseline, regardless of future: $E(Y^{(A_0)} \mid X_0)$.
Potential pitfalls/challenges: Without careful examination of causal structure within the variables, biased association between treatment and outcome can be introduced.
Exemplary methods/evaluations: Estimate the outcome curves for the treated samples and untreated samples simultaneously using the signal-in-white-noise model. The estimation of model is done through one loss function, known as the precision in estimating heterogeneous effects (PEHE).

Approach category: Estimating both a prediction model and causal effects from observational data; Single intervention; Related to conditional treatment effect estimation
Ref: Arjas [27]
Targeted estimand: Risk of CVD under intervention of taking/not taking statin at baseline, regardless of future: $E(Y^{(A_0)} \mid H_0)$.
Potential pitfalls/challenges: Potentially biased estimate due to misspecification of intensity functions required in the outcome hazard model.
Exemplary methods/evaluations: Use treatment history and other risk factors measured over time to set up a Bayesian model to estimate the outcome risk intensity function over time. For prediction, given an individual’s measurements up to time $t$, estimate the risk under a single intervention by applying MCMC on the predictive distributions.

Approach category: Estimating both a prediction model and causal effects from observational data; Time-dependent treatments and treatment-confounder feedback; MSMs within a prediction model framework
Ref: Pajouheshnia et al. [8]
Targeted estimand: Risk of CVD under interventions of taking/not taking statin at baseline and/or some other times at the future: $E(Y^{(A(0,K) = 0)} \mid X_0)$
Potential pitfalls/challenges: The effectiveness of bias correction depends on a correct specification of treatment model.
Exemplary methods/evaluations: Assume a causal structure. Estimate treatment censoring probabilities by fitting logistic regression models in each of the follow-up periods and derive time-varying censoring weights. After censoring, develop the prognostic model using a weighted Cox model.

Approach category: Estimating both a prediction model and causal effects from observational data; Time-dependent treatments and treatment-confounder feedback; MSMs within a prediction model framework
Ref: Sperrin et al. [28]
Targeted estimand: Risk of CVD under interventions of taking/not taking statin at baseline and/or some other times at the future: $E(Y^{(A(0,K) = 0)} \mid X_0)$
Potential pitfalls/challenges: The effectiveness of bias correction depends on a correct specification of treatment model. Requires agreement between the prediction model and the set of variables required for conditional exchangeability.
Exemplary methods/evaluations: Assume a causal structure. Collect the baseline prognostic factors, treatments, and treatment confounders at each time point post-baseline. Compute IPTWs using a treatment model; with derived IPTWs, build a logistic regression for outcome prediction under treatments.

Approach category: Estimating both a prediction model and causal effects from observational data; Time-dependent treatments and treatment-confounder feedback; MSMs within a prediction model framework
Ref: Lim et al. [29]
Targeted estimand: Risk of CVD and/or other outcomes of interest (e.g. cholesterol, SBP, etc) under multiple interventions planned for the next $\tau$ timesteps from current time, given an observed history $H_0$: $E(Y_\tau^{(A(0,\tau-1))} \mid H_0)$.
Potential pitfalls/challenges: Requires agreement between the prediction model and the set of variables required for conditional exchangeability.
Exemplary methods/evaluations: With observed treatment, covariate and outcome histories (allowing for multiple treatment options of different forms), develop a propensity network to compute the IPTW and a sequence-to-sequence model that predicts the outcome under a planned sequence of interventions.

Approach category: Estimating both a prediction model and causal effects from observational data; Time-dependent treatments and treatment-confounder feedback; Methods based on balanced representation approach
Ref: Bica et al. [30]
Potential pitfalls/challenges: Potential confounders as no careful examination of causal structure.
Exemplary methods/evaluations: Build a counterfactual recurrent network to predict outcomes under interventions:
1. For the encoder network, use an RNN, with LSTM unit to build treatment invariant representations of the patient history $\Phi(H_t)$ and to predict one-step-ahead outcomes $Y_{t+1}$;
2. For the decoder network, use $\Phi(H_t)$ to initialize the state of an RNN that predicts the counterfactual outcomes for future treatments.

Approach category: Estimating both a prediction model and causal effects from observational data; Time-dependent treatments and treatment-confounder feedback; Methods with g-computation for correcting time-varying confounding
Ref: Xu et al. [31]
Targeted estimand: Cholesterol or other continuous outcome of interest (univariate) at any time $t$ in the future, under a sequence of interventions planned irregularly from current time till $t$, $A_{0,<t}$, given observed history: $E(Y_t^{(A_{0,<t})} \mid H_0)$.
Potential pitfalls/challenges: Potential bias due to strong assumptions on model structure and possible model misspecification.
Exemplary methods/evaluations: With observed treatment/covariate/outcome histories, estimate treatment-response trajectories using a Bayesian nonparametric or semi-parametric approach:
1. Specify models for different components in the generalised mixed-effect model for outcome prediction. These usually include: treatment response, baseline regression (fixed effects), and random effects. For the case where

Approach category: Estimating both a prediction model and causal effects from observational data; Time-dependent treatments and treatment-confounder feedback; Methods with g-computation for correcting time-varying confounding
Ref: Soleimani et al. [32]
Targeted estimand: $E(Y_t^{(A_{0,<t})} \mid H_0)$: same as in the Xu et al. method except that now the outcome $Y$ can be multivariate (e.g.
Lin et al. Diagnostic and Prognostic Research (2021) 5:3 Page 9 of 16

Table 2 Illustration of methods in different categories using an example of statin intervention in primary prevention of CVD
(Continued)
Approach categories Refs Targeted estimand Potential pitfalls/ Exemplary methods/
challenges evaluations
simultaneously predict risk the treatments are
of CVD, cholesterol and continuously-administrated,
SBP) and the treatment can model the treatment re-
be both discrete-time and sponse using LTI dynamic
continuous-time. systems (Soleimani et al).
2. Choose priors for these
models based on expert
domain knowledge.
3. Use maximum a
posteriori (MAP) (Soleimani
et al.) or MCMC (Xu et al.)
to approximate the
posterior distributions of
the parameters in the
proposed model.
Schulam  Þ
ðA Potential bias due to With observed histories,
EðY t 0;<t jY0 ; A
<0 Þ: same as
et al. [33] strong assumptions on jointly model intervention
in the Soleimani et al.
model structure and and outcomes using a
method except that the
possible model marked point process
observations only include
misspecification. Lack of (MPP):
intervention and outcome
effect heterogeneity due 1. Specify models for the
histories.
to omitting baseline components in the MPP
covariates. intensity function: event
model, outcome model,
action (intervention) model.
The parameterization of the
event and action models
can be chosen to reflect
domain knowledge. The
outcome model is
parameterized using a GP.
2. Maximise the likelihood
of observational traces over
a fixed interval to estimate
the parameters.
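
The weighting step that Table 2 lists for the MSM-based approaches can be illustrated at a single time point. The sketch below is a minimal, hypothetical example and not the implementation of any cited paper: the stratum counts, the binary confounder (high baseline cholesterol), and the function names are all assumptions made for illustration, and the treatment model is simply the empirical treatment proportion within each confounder stratum rather than a fitted logistic regression.

```python
# Toy illustration of inverse probability of treatment weighting (IPTW).
# All counts are invented for illustration; they are not from any cited study.

# Each stratum: (x, a, n_people, n_events)
#   x = 1 for high baseline cholesterol (the confounder), a = 1 for statin use.
STRATA = [
    (1, 0, 100, 40),
    (1, 1, 300, 60),
    (0, 0, 450, 45),
    (0, 1, 150, 9),
]


def propensity(x):
    """Empirical P(A = 1 | X = x), i.e. a saturated treatment model."""
    treated = sum(n for (xi, a, n, _) in STRATA if xi == x and a == 1)
    total = sum(n for (xi, _, n, _) in STRATA if xi == x)
    return treated / total


def naive_risk(a):
    """Observed risk among people who actually received treatment a."""
    events = sum(e for (_, ai, _, e) in STRATA if ai == a)
    people = sum(n for (_, ai, n, _) in STRATA if ai == a)
    return events / people


def iptw_risk(a):
    """Weighted risk in the pseudo-population in which everyone receives a."""
    num = den = 0.0
    for x, ai, n, e in STRATA:
        if ai != a:
            continue
        p1 = propensity(x)
        w = 1.0 / (p1 if a == 1 else 1.0 - p1)  # unstabilised weight
        num += w * e
        den += w * n
    return num / den


if __name__ == "__main__":
    for a in (0, 1):
        print(f"a={a}: naive risk={naive_risk(a):.3f}, IPTW risk={iptw_risk(a):.3f}")
```

With these invented counts the naive risks in the untreated and treated groups are nearly identical (about 0.155 and 0.153), because treated people are more often in the high-cholesterol stratum. After weighting, the risks are 0.22 without statins and 0.116 with statins, which matches standardising the stratum-specific risks over the whole population. The time-varying methods in the table repeat this idea at each follow-up period and multiply the weights over time.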

baseline. Assuming that the Directed Acyclic Graph (DAG) that encodes the relationship between all the relevant variables is known, do-calculus [34] provides an indication of whether this can be achieved in the setting of observational data with the required causal assumptions. For example, including a collider in the model (a variable caused by both treatment and outcome) will lead to biased estimates of treatment effects on the outcome. A more complex scenario appears when the collider itself cannot be directly observed, but its information is contained in other prognostic factors. Van Amsterdam et al. [25] proposed a deep learning framework to address this particular scenario. Their goal is to predict survival of lung cancer patients using CT-scan images, in which case factors such as tumor size and heterogeneity are colliders that cannot be directly observed but can be measured from the image. The authors proposed a multi-task prediction scheme embedded in a convolutional neural network (CNN) framework (a non-linear model often used with images) which can simultaneously estimate the outcome and the collider. It used a CNN to separate the unobserved collider information from images while enabling the treatment to be appropriately included as a covariate for risk prediction under intervention. As there is no modelling of interactions between treatment and other covariates, this approach only addresses average treatment effects.
Van Amsterdam et al. [25] have demonstrated that deep learning can in principle be combined with insights from causal inference to estimate unbiased treatment effects for prediction. However, the causal structure applied therein was in its simplest form, and further developments are needed for more realistic clinical scenarios where, e.g., there is confounding for treatment assignment, or a treatment effect modifier exists within the image.

Related to conditional treatment effect estimation
Let Y(a) denote the potential outcome under an intervention a, for example one’s risk of CVD or cholesterol level under the intervention of taking a statin. The conditional treatment effect for subjects with covariate X = x in a population at a single time point is defined as T(x) = E[Y(1) − Y(0) | X = x], and our goal here is to estimate the
counterfactual prediction of E[Y(a) | X = x], a ∈ {0, 1}. In an RCT, given complete randomisation (i.e. a is independent of Y(a) and X) and under consistency, one can estimate E[Y(a) | X = x] by fitting a prediction model to the treated arm (a = 1) and the control arm (a = 0), respectively. The technique is often used in estimating conditional treatment effects [21, 35] or identifying subgroups from RCTs [36, 37], whereas our focus is counterfactual prediction under interventions. In Alaa et al. [26], under a set of assumptions, this technique was adapted for counterfactual prediction with observational data, using a more complex regression model to adjust for selection bias in the observational dataset.
Alaa et al. [26] adopted the standard assumptions of unconfoundedness (or ignorability) and overlap (or positivity), which together are known as the ‘potential outcomes model with unconfoundedness’. Their idea is to use the signal-in-white-noise model for the potential outcomes and estimate two target functions, the treated and the untreated, simultaneously with training data. The estimation is done through one loss function, known as the precision in estimating heterogeneous effects (PEHE), which jointly minimises the error of factual outcomes and the posterior counterfactual variance, in such a way as to adjust for the bias between the treated and untreated groups. The counterfactual prediction for either treated or untreated can then be made through the estimated posterior mean of the two potential outcome functions. Since the ground truth counterfactual outcomes are never available in real-world observational datasets, it is not straightforward to evaluate causal prediction algorithms and compare their performances. A semi-synthetic experimental setup was adopted in [26], where covariates and treatment assignments are real but outcomes are simulated.
For the longitudinal setting where the event history is fully observed, Arjas [27] adopted a marked point process (MPP) framework with a Bayesian non-parametric hazard model to predict the outcome under a single intervention. Point processes are distributions over sequences of time points, and a marked point process is made by attaching a characteristic (a mark) to each point of the process [38]. The idea is to incorporate all observed events in the data, including past treatments, covariates, and outcome of interest, into a single MPP: {(T_n, X_n) : n ≥ 0}, where T_0 ≤ T_1 ≤ ⋯ are the ordered event times and X_n is a description of the event occurring at T_n. The model assumed local independence, i.e. the intensities of events (that is, the probability of an event occurring in an infinitesimal time interval) when considered relative to the histories H_t are locally independent of outcome risk functions in the model. Under this assumption, in order to define a statistical model for the MPP, it suffices to specify the outcome intensities with respect to H_t and there is no need for other event time intensities. Prediction under hypothetical interventions can then be made by evaluating the corresponding predictive probabilities in the Bayesian posterior predictive setting given the data.
Both methods in this subsection can be computationally intensive as the number of observed samples increases. This could be ameliorated using conventional sparse approximations [26, 39]. Both methods are limited to binary interventions, and prediction via treatment effect estimation can only make counterfactual predictions for outcomes with or without intervention.

Counterfactual prediction models that consider time-dependent treatments and treatment-confounder feedback
Papers included in this category [8, 28–33] covered three types of approaches to deal with scenarios where the treatments of interest and confounders vary over time. One example of such confounding is in the sequential treatment assignment setting, where doctors use a set of variable measurements, at the current time or in the past, to determine whether or not to treat, which in turn affects values of these variables at a subsequent time. For example, whether or not statins are taken at a particular time will affect cholesterol, and these subsequent cholesterol levels affect subsequent decisions about statins. The benefit of such approaches is that they allow consideration of a longer term treatment plan, such as comparing taking statins continuously for 10 years from baseline, versus not taking statins for the next 10 years. The assumptions needed for identifying unbiased treatment effects in such scenarios are consistency, positivity, and sequential ignorability.

Marginal structural models (MSMs) within a prediction model framework
Consider in our running example the hypothetical risk of not taking statins for the next 10 years. To account for treatment drop-ins, i.e. treatments initiated post-baseline, one straightforward way is to censor patients at treatment initiation; however, this assumes that treatment initiation is non-informative about the baseline and time-dependent covariates. Pajouheshnia et al. [8] proposed censoring followed by reweighting using inverse probability of censoring weights (IPCW) to solve the issue of informative censoring in estimating treatment-naïve risk. The proposed method derived time-varying censoring weights by estimating the conditional probabilities of treatment initiation and then developed a weighted Cox model in the treatment-naïve pseudo-population. A more flexible way of addressing treatment drop-ins for hypothetical risk prediction is to use MSMs with inverse probability of treatment weighting (IPTW), where a pseudo-population is created such that treatment selection will be unconfounded. Sperrin et al. [28] proposed combining MSM with predictive modelling approaches to adjust for
confounding and generate prediction models that could appropriately estimate risk under the required treatment regimens. Following the classic development of IPTW for an MSM, the proposed methods develop two prediction models: a treatment model for computing the probability of receiving post-baseline treatments, and an outcome prediction model fitted with the derived weights, which includes as predictors these post-baseline treatments as well as terms for any interactions between treatment and other predictors. By carefully defining the required estimand for the target prediction, the proposed framework could estimate risks under a variety of treatment regimens. In the statin example, this means that one can compare CVD risk under a range of different statin treatment plans, although the focus in the paper was on the ‘never takes statins’ hypothetical prediction. As with the approaches described so far in this category, the model only considered a binary treatment (e.g. statins yes/no). The extension to multiple treatment choices for the proposed method is possible in principle, although the underlying causal structure and resulting model may become too complex.
Similarly to [28], Lim et al. [29] adopted the MSM combined with IPTW approach. Instead of using linear or logistic regression models, they embedded the concept into a deep learning framework and proposed a Recurrent Marginal Structural Network (RMSN). The model consisted of (1) a set of propensity networks to compute treatment probabilities used for IPTW and (2) a prediction network used to determine the treatment response for a given set of planned interventions.
The benefit of RMSN is that it can be configured to have multiple treatment choices and outcomes of different forms (e.g. continuous or discrete) using multi-input/multi-output RNNs. This means, in the statin example, one could consider different doses, and indeed consider alternative treatments as well. Treatment sequences can also be evaluated, and no restrictions were imposed on the prediction horizon or number of planned interventions. The use of LSTMs in computing the probabilities required for propensity weighting can also alleviate susceptibility of IPTWs to model misspecification. A drawback is that one needs a rich source of longitudinal data to train the model. Moreover, as is general in deep learning models, they lack a clear interpretation.

Methods based on balanced representation approach
Matching approaches such as MSM or RMSN combined with IPTW above adjust for bias in the treatment assignments by creating a pseudo-population where the probability of treatment assignments does not depend on the time-varying confounders. The balanced representation approach, as proposed by Bica et al. [30], instead aimed for a representation Φ of the patient history H_t = (A_{t−1}, X_t) that was not predictive of treatment assignments. That is, in the case of two treatment assignments at time t, P(Φ(H_t) | A_t = 0) = P(Φ(H_t) | A_t = 1). It can be shown that, in this way, estimation of counterfactual treatment outcomes is unbiased [40]. Bica et al. [30] proposed a counterfactual recurrent network (CRN) to achieve balancing representation and estimate unbiased counterfactual outcomes under a planned sequence of treatments (such as statins). CRN improved the closely related RMSN model proposed by Lim et al. [29] in a way that overcame fundamental problems with IPTW, such as the high variance of the weights. As with RMSN, CRN required hyperparameter tuning. As the counterfactual outcomes were never observed, hyperparameters in both models were optimised based on the error on the factual outcomes in the validation dataset. As noted by the authors in [30], more work on providing theoretical guarantees for the error on the counterfactuals is required.

Methods with g-computation for correcting time-varying confounding
Three papers [31–33] were identified using g-computation to correct time-varying confounding and predict treatment response curves under the potential outcome framework.
Xu et al. [31] developed a Bayesian non-parametric model for estimating conditional treatment response curves under the g-computation formula and provided posterior inference over the continuous response curves. In the statin example, this means that one can estimate cholesterol or any other continuous outcome of interest under a planned sequence of statin treatments (yes/no). The proposed method modelled the potential outcome using a generalized mixed-effects model combining the baseline progression (with no treatment prescribed), the treatment responses over time, and noise. The goal was to obtain posterior inference for the treatment response and predict the potential outcomes given any sequence of treatments conditioned upon past treatments and covariate history. There are two limitations to the model here: (1) it assumes independent baseline progression and treatment response components, and (2) treatment response models rely on the additive treatment effects assumption and a careful choice of priors based on clinical details to be decided by domain experts.
Soleimani et al. [32] extended the approach in Xu et al. [31] in two ways: (1) to the continuous-time setting with continuous-valued treatments and (2) to multivariate outcomes. This means, in the statin example, one could simultaneously predict e.g. risk of CVD, cholesterol, and systolic blood pressure (SBP) under a range of different statin treatment plans (allowing for different doses assigned at different time points). The model is able to capture the dynamic response after the
treatment is initiated or discontinued by using linear time-invariant systems. Despite being a more flexible model than [31], this model did not overcome the two limitations mentioned above.
Schulam and Saria [33] considered another continuous-time setting where both type and timing of actions may be dependent on the preceding outcome. In the statin example, this means both the statin dose and treatment time (initiation or discontinuation) depend on the preceding cholesterol level. Here, one needs to predict how a continuous-time trajectory will progress under sequences of actions. The goal was to model action-outcome traces D ≡ {t_ij, Y_ij, a_ij}_{i,j} for each individual i and irregularly sampled sequences of actions and outcomes. Schulam and Saria [33] proposed a counterfactual Gaussian process (CGP) model to model the trajectory and derived an adjusted maximum likelihood objective that learned the CGP from observational traces. The objective was derived by jointly modeling observed actions and outcomes using a marked point process (MPP). The potential outcome query can therefore be answered with the posterior predictive trajectory of the outcome model. A key limitation in this model is that it could not model heterogeneous treatment effects arising from baseline variables.
Counterfactual prediction models in this section using the g-formula to correct for time-varying confounding are highly flexible and can be adopted for a variety of clinical settings. However, these methods rely on a set of strong assumptions in both discrete-time and continuous-time settings that are generally not testable; for the latter, Schulam and Saria [33] extended Robins’ Sequential No Unobserved Confounders assumption to the continuous-time case and also assumed Non-informative Measurement Times.

Discussion
In this study, we conducted a methodology scoping review, which has identified two main types of causal predictive modelling (methods that allow for prediction under hypothetical interventions), with the main differences between the methods being the source of data from which the causal effects are estimated. We identified that when the causal effects required for the predictions were fully estimated from the observational data, methods are available for predictions under interventions either at a single time point or varying over time. We have collated current approaches within this field and highlighted their advantages and limitations in the review.
There are recent studies that have reviewed methods for causal inference, all with different focuses: methods in the analyses of RCTs [41], methods based on graphical models [42] or DAGs [43], and methods targeting time-varying confounding [44]. Our work differs from these reviews, and, to our knowledge, is the first review to focus on methods enabling predictions under interventions (i.e. counterfactual prediction models). A recent review focused on how time-dependent treatment use should be handled when developing prediction models [8]. This clarified the targeted estimand of the clinical prediction model of interest and considered hypothetical risks under no interventions.
Our search terms, defined from the intersection of prediction modelling filters and causal inference keywords, have been made purposely broad to capture relevant literature, albeit with a high number of false-positives driven by the heterogeneity in language across the fields. This could imply a challenge in devising a potentially more effective search strategy for identifying methodological papers in both fields, a challenge also highlighted in Martin et al. [19].
This review has synthesised a range of methods, embedded within both statistical and machine learning frameworks. These methods rely on the availability of the DAG that encodes the relationship between all the relevant variables, and a series of assumptions that make it possible to estimate counterfactual predictions from observational data. Approaches described here cover a wide range of data settings and clinical scenarios. Careful thought is needed before adopting these methods, and further challenges and gaps for future research remain, which we discuss here.
Methods combining information from different sources, such as RCTs combined with observational data, provide a natural way to enable counterfactual predictions; however, challenges remain when combining these two settings. Their objectives are not necessarily complementary, leading to distinct populations included in each study (of possibly very different sample sizes), different sets of covariates being measured, and some potential measurement bias. Therefore, combining observational studies with RCTs would need more careful consideration, and good global guidance may be required. Harrell and Lazzeroni [45] laid out some initial steps one can follow toward optimal decision making using both RCTs and electronic health record (EHR) data. We also refer the reader to the recent PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement [46, 47], developed to provide guidance for predictive analyses of heterogeneity of treatment effects (HTE) in clinical trials. Predictive HTE analysis aims to express treatment effects in terms of predicted risks and predict which of 2 or more treatments will be better for a particular individual, which aligns closely with our review aim here. However, as motivated by the limitations in the conventional subgroup analyses in RCTs, predictive HTE analysis has focused on regression-based prediction in randomised trials for treatment effect estimation and subgroup identification. Such techniques
can be adapted for the purpose of counterfactual prediction. For example, the predictive modelling used in estimating individualised causal effect in [21, 35] was applied for counterfactual prediction in the included paper [30]. However, as the primary goal in predictive HTE analyses such as [21, 35] is not predicting the counterfactual outcome, we did not include them in our review, which may also be deemed a limitation of this study.
Another obstacle in combining RCTs with observational studies is that, while the estimand for causal inference is clearly defined, the prediction estimand, termed the predictimand by Van Geloven et al. [5], is often unclear in prediction models. There is an emergence of studies arguing that clearly defining the estimand in prediction is important [28, 43]. Despite these challenges, and although relatively little work has been done in combining RCTs with patient observational data, it remains an opportunity to explore the interplay of these two areas, as noted in the recent survey by Bica et al. [48].
Several key challenges arise in dealing with multiple interventions. The term ‘multiple treatments’ has been commonly used throughout the literature, especially when addressing time-varying treatments. However, the same term may refer to very distinct scenarios in different studies, and greater clarity is necessary. The first and most often seen scenario is where multiple values/options are observed for a treatment variable, either at a single time point or over time. Treatments in this setting are indeed ‘multivariate treatments’. Many approaches in this review are designed to deal with multivariate treatments [24, 30–33] or can in principle be extended to this case [25, 28]. However, except for the approach in [24], all methods assume treatment effects from different options to be independent; in [24], interactions between treatment options are modelled through the covariance matrix in the Gaussian process prior. Further methodological development could explore ways to incorporate treatment-treatment interactions into the model.
A second scenario of ‘multiple treatments’ is where there are interventions on several risk factors, which is substantially more complex, but also more realistic. For example, in clinical settings, one could intervene on different risk factors to prevent CVD, and possible interventions include giving antihypertensive drugs or lipid-lowering treatment, lifestyle changes (physical activity, smoking, and alcohol drinking), or a combination of them. As these interventions take effect on different parts of the causal structure for the outcome, changes in one factor may affect others, e.g., weight gain after smoking cessation [49]. Moreover, each clinical intervention scenario will require its own model for identifying treatment effects from observational data [11]. Recent studies on estimating causal effects under multiple interventions have explored methods such as marginal structural Cox models [50] and the parametric g-formula [51]. However, despite its apparent need in clinical practice as in the abovementioned example, there appears to be a lack of models for counterfactual predictions under multiple interventions, and future methodological development is required.
Treatment scenarios addressed so far in this review, both time-fixed and time-varying, are static interventions, i.e. treatment assignment under intervention does not depend on the post-baseline covariates. In contrast to the static intervention is the dynamic treatment strategy, a rule in which treatments are assigned dynamically as a function of previous treatment and covariate history. Methods such as dynamic MSMs introduced by Orellana et al. [52] and independently by Van der Laan and Petersen [53], and variants of structural nested models (SNMs) introduced by Robins [54] were proposed to use observational data to estimate the optimal dynamic treatment regime. Embedding these methods within a clinical prediction framework could enable counterfactual prediction under dynamic treatment allocation and support decision-making on optimal treatment rules, which presents a promising avenue for future research [55].
The most pressing problem to address for predictions under hypothetical interventions is model validation. Validation is a crucial step in prediction modelling (counterfactual or otherwise), but is challenging in the counterfactual space since the counterfactual outcomes are not observable in the validation dataset. The included papers have by-passed this issue by noting that models are fitted based on the error on the factual outcomes in the validation dataset. In this context, handling of treatment in validation of clinical prediction models has received some attention [56]. Pajouheshnia et al. [56] addressed the specific case of validating a prognostic model for treatment-free risk predictions in a validation set where risk-lowering treatments are used. However, further studies are required to extend the potential methods to address the more complex issues in validating counterfactual predictions, such as non-discrete treatment types and non-parametric models as included in this review. While there is emerging research on developing a model validation procedure to estimate the performance of methods for causal effects estimation [57] and sensitivity analysis in causal inference [58], techniques are required to validate the models tailored for counterfactual prediction. Just as domain knowledge is important in causal inference before real-world deployment, it is also important in validating counterfactual prediction, and integrating data generated from RCTs and observational studies and their corresponding models provides a promising way to aid the process [48].
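
The distinction drawn in this Discussion between static interventions and dynamic treatment strategies can be made concrete with a small g-formula calculation over two decision points. The sketch below is hypothetical throughout: the conditional probabilities, the binary time-varying confounder L1 (high cholesterol at follow-up), and the function names are assumptions chosen for illustration, not estimates or code from any cited study. In practice these quantities would be estimated from data, for example with the parametric g-formula.

```python
# Toy g-formula calculation over two decision points.
# All probabilities are hypothetical inputs, not estimates from any study.

def p_high_chol(a0):
    """Assumed P(L1 = 1 | A0 = a0): chance of high cholesterol at follow-up."""
    return 0.3 if a0 == 1 else 0.6


def outcome_risk(a0, l1, a1):
    """Assumed E[Y | A0 = a0, L1 = l1, A1 = a1]: risk of a CVD event."""
    return 0.10 + 0.20 * l1 - 0.03 * a0 - 0.05 * a1


def g_formula_risk(a0, rule):
    """Risk under a strategy: fixed a0, then a1 = rule(l1).

    Standardises the outcome model over the distribution of the
    time-varying confounder L1 that would arise under a0.
    """
    risk = 0.0
    for l1 in (0, 1):
        p_l1 = p_high_chol(a0) if l1 == 1 else 1.0 - p_high_chol(a0)
        risk += outcome_risk(a0, l1, rule(l1)) * p_l1
    return risk


never = g_formula_risk(0, lambda l1: 0)      # static: never treat
always = g_formula_risk(1, lambda l1: 1)     # static: always treat
dynamic = g_formula_risk(1, lambda l1: l1)   # dynamic: continue only if L1 is high
```

Under these assumed inputs the risk is 0.22 for the static ‘never treat’ plan, 0.08 for the static ‘always treat’ plan, and 0.115 for the dynamic rule that continues treatment only when follow-up cholesterol is high. The same function handles both kinds of strategy because a static plan is just a rule that ignores the covariate history.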

Conclusions
Prediction under hypothetical intervention is an emerging topic, with most methodological contributions published after 2015. This is now an active area of research in both the statistics and machine learning communities. Available methods for causal predictive modelling can be divided into two approaches. The first combines data from randomised controlled trials with observational data, while the second approach uses observational data only. We recommend using causal effects from randomised controlled trials where possible, combining these with prediction models estimated from observational data, as this alleviates the required assumptions for the causal contrasts to be unbiased. However, further theoretical guarantees are required regarding triangulating data from multiple sources. As well as the data sources available, the targeted estimand needs careful thought, and a relevant approach for the required estimand should be chosen. For example, marginal structural models can be used if observational data are used to make hypothetical predictions concerning an intervention that is sustained into the future. However, techniques to validate such models, and approaches for hypothetical risks under multiple or dynamic intervention scenarios, are under-investigated.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no conflicts of interest relating to the publication of this work.

Author details
1 Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.
2 NIHR Greater Manchester Patient Safety Translational Research Centre, The University of Manchester, Manchester, UK.
3 NIHR Manchester Biomedical Research Centre, The University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.

Received: 21 July 2020 Accepted: 2 January 2021

References
1. Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating: Springer; 2009. p. 497.
2. Hippisley-Cox J, Coupland C, Vinogradova Y, Robson J, May M, Brindle P. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. BMJ. 2007;335(7611):136. Available from: http://www.ncbi.nlm.nih.gov/pubmed/17615182.
3. Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. BMJ. 2017;357:j2099. Available from: https://www.bmj.com/content/357/bmj.j2099.
4. NICE. Lipid modification: cardiovascular risk assessment and the modification of blood lipids for the primary and secondary prevention of cardiovascular disease. 2014. Available from: https://www.nice.org.uk/guidance/cg181
Supplementary Information 5. van Geloven N, Swanson S, Ramspek C, Luijken K, Van Diepen M, Morris T,
The online version contains supplementary material available at https://doi. et al. Prediction meets causal inference: the role of treatment in clinical
org/10.1186/s41512-021-00092-9. prediction models. Eur J Epidemiol. 2020; Available from: https://doi.org/10.
1007/s10654-020-00636-1.
6. Groenwold RHH, Moons KGM, Pajouheshnia R, Altman DG, Collins GS,
Additional file 1. Debray TPA, et al. Explicit inclusion of treatment in prognostic modeling
Additional file 2. was recommended in observational and randomized settings. J Clin
Epidemiol. 2016;78:90–100 Available from: http://www.ncbi.nlm.nih.gov/
pubmed/27045189.
Acknowledgments 7. Liew SM, Doust J, Glasziou P. Cardiovascular risk scores do not account for
We thank two reviewers for their thoughtful comments and suggestions, the effect of treatment: a review. Heart. 2011;97(9):689–97. Available from:
which helped improve and clarify this manuscrip, and have undoubtedly https://heart.bmj.com/content/97/9/689.
strengthened the final version. 8. Pajouheshnia R, Schuster NA, Groenwold RH, Rutten FH, Moons KG, Peelen
LM. Accounting for time-dependent treatment use when developing a
Authors’ contributions prognostic model from observational data: A review of methods. Statistica
All authors contributed to developing the review protocol. LL conducted the Neerlandica. 2020;74(1):38–51. Available from: https://onlinelibrary.wiley.
literature searches, screening, and data extraction. DAJ conducted the initial com/doi/abs/10.1111/stan.12193.
3% of the abstract screen to ensure reliability of the screening process. MS 9. Lawton M, Tilling K, Robertson N, Tremlett H, Zhu F, Harding K, Oger J, Ben-
contributed to the study selection against inclusion/exclusion criteria. LL and Shlomo Y. A longitudinal model for disease progression was developed and
MS wrote the first draft. The authors discussed, reviewed, and edited the applied to multiple sclerosis. Journal of clinical epidemiology. 2015;68(11):
manuscript and have approved the final version. 1355–65. Available from: https://doi.org/10.1016/j.jclinepi.2015.05.003.
10. Hernán MA, Hsu J, Healy B. A second chance to get causal inference right: a
classification of data science tasks. CHANCE. 2019;32(1):42–9 Available from:
Funding
https://www.tandfonline.com/doi/full/10.1080/09332480.2019.1579578.
This work was funded by UKRI via the Alan Turing Institute under the Health
11. Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting
and Medical Sciences Programme. DAJ is funded by the National Institute for
confounder and modifier coefficients. Am J Epidemiol. 2013;177(4):292–8
Health Research Greater Manchester Patient Safety Translational Research
Available from: http://www.ncbi.nlm.nih.gov/pubmed/23371353.
Centre (NIHR Greater Manchester PSTRC). The views expressed are those of
12. Shmueli G. To Explain or to Predict? Stat Sci. 2010;25(3):289–310 Available
the author(s) and not necessarily those of the NHS, the NIHR, or the
from: https://projecteuclid.org/euclid.ss/1294167961.
Department of Health and Social Care.
13. Dickerman BA, Hernán MA. Counterfactual prediction is not only for causal
inference. Eur J Epidemiol. 2020;4 Available from: http://link.springer.com/1
Availability of data and materials 0.1007/s10654-020-00659-8.
Data sharing is not applicable to this article as no datasets were generated 14. Arnold KF, Davies V, de Kamps M, Tennant PWG, Mbotwa J, Gilthorpe MS.
or analysed during the current study. Reflections on modern methods: generalized linear models for prognosis
and intervention-theory, practice and implications for machine learning. Int
Ethics approval and consent to participate J Epidemiol. 2020; Available from: https://pubmed.ncbi.nlm.nih.gov/323
Not applicable. 80551/.
15. Blakely T, Lynch J, Simons K, Bentley R, Rose S. Reflection on modern methods: when worlds collide—prediction, machine learning and causal inference. Int J Epidemiol. 2020;49(1):338–347. Available from: https://doi.org/10.1093/ije/dyz132.
16. Piccininni M, Konigorski S, Rohmann JL, Kurth T. Directed acyclic graphs and causal thinking in clinical risk prediction modeling. BMC Med Res Methodol. 2020;20(1):179. Available from: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01058-z.
17. Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19–32. Available from: https://doi.org/10.1080/1364557032000119616.
18. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1). Available from: https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/2046-4053-4-1.
19. Martin GP, Jenkins D, Bull L, Sisk R, Lin L, Hulme W, et al. Towards a framework for the design, implementation and reporting of methodology scoping reviews. 2020;127:191. Available from: https://doi.org/10.1016/j.jclinepi.2020.07.014.
20. Geersing GJ, Bouwmeester W, Zuithoff P, Spijker R, Leeflang M, Moons K. Search filters for finding prognostic and diagnostic prediction studies in medline to enhance systematic reviews. PLoS One. 2012;7(2):e32844. Available from: http://www.ncbi.nlm.nih.gov/pubmed/22393453.
21. Nguyen T-L, Collins GS, Landais P, Le Manach Y. Counterfactual clinical prediction models could help to infer individualised treatment effects in randomised controlled trials – an illustration with the international stroke trial. J Clin Epidemiol. 2020;125:47–56. Available from: https://pubmed.ncbi.nlm.nih.gov/32464321/.
22. Candido dos Reis FJ, Wishart GC, Dicks EM, Greenberg D, Rashbass J, Schmidt MK, et al. An updated PREDICT breast cancer prognostication and treatment benefit prediction model with independent validation. Breast Cancer Res. 2017;19(1):58. Available from: http://breast-cancer-research.biomedcentral.com/articles/10.1186/s13058-017-0852-3.
23. Brunner FJ, Waldeyer C, Ojeda F, Salomaa V, Kee F, Sans S, et al. Application of non-HDL cholesterol for population-based cardiovascular risk stratification: results from the Multinational Cardiovascular Risk Consortium. Lancet. 2019;394(10215):2173–83. Available from: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)32519-X/fulltext.
24. Silva R. Observational-interventional priors for dose-response learning. In: Advances in Neural Information Processing Systems 29 (NIPS 2016); 2016. Available from: http://papers.neurips.cc/paper/6107-observational-interventional-priors-for-dose-response-learning.
25. van Amsterdam WAC, Verhoeff JJC, de Jong PA, Leiner T, Eijkemans MJC. Eliminating biasing signals in lung cancer images for prognosis predictions with deep learning. NPJ Digit Med. 2019;2(1):1–6. Available from: https://www.nature.com/articles/s41746-019-0194-x.
26. Alaa AM, van der Schaar M. Bayesian inference of individualized treatment effects using multi-task gaussian processes. In: Advances in Neural Information Processing Systems 30 (NIPS 2017); 2017. Available from: https://papers.nips.cc/paper/6934-bayesian-inference-of-individualized-treatment-effects-using-multi-task-gaussian-processes.
27. Arjas E. Time to consider time, and time to predict? Stat Biosci. 2014;6(2):189–203. Available from: https://link.springer.com/article/10.1007/s12561-013-9101-1.
28. Sperrin M, Martin GP, Pate A, Van Staa T, Peek N, Buchan I. Using marginal structural models to adjust for treatment drop-in when developing clinical prediction models. Stat Med. 2018;37(28):4142–54. Available from: http://doi.wiley.com/10.1002/sim.7913.
29. Lim B. Forecasting treatment responses over time using recurrent marginal structural networks. In: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018); 2018. p. 7494–504. Available from: http://papers.nips.cc/paper/7977-forecasting-treatment-responses-over-time-using-recurrent-marginal-structural-networks.
30. Bica I, Alaa AM, Jordon J, van der Schaar M. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. In: 8th International Conference on Learning Representations (ICLR); 2020. Available from: https://openreview.net/pdf?id=BJg866NFvB.
31. Xu Y, Xu Y, Saria S. A Bayesian nonparametric approach for estimating individualized treatment-response curves. In: Doshi-Velez F, Fackler J, Kale D, Wallace B, Wiens J, editors. Proceedings of the 1st Machine Learning for Healthcare: PMLR; 2016. p. 282–300. Available from: http://proceedings.mlr.press/v56/Xu16.pdf.
32. Soleimani H, Subbaswamy A, Saria S. Treatment-response models for counterfactual reasoning with continuous-time, continuous-valued interventions. In: the 33rd Conference on Uncertainty in Artificial Intelligence (UAI); 2017. Available from: http://auai.org/uai2017/proceedings/papers/266.pdf.
33. Schulam P, Saria S. Reliable decision support using counterfactual Models. In: Advances in Neural Information Processing Systems 30 (NIPS 2017); 2017. p. 1697–708. Available from: https://papers.nips.cc/paper/6767-reliable-decision-support-using-counterfactual-models.
34. Pearl J. Causality: Models, reasoning, and inference, second edition: Cambridge University Press; 2011. p. 1–464.
35. Li J, Zhao L, Tian L, Cai T, Claggett B, Callegaro A, et al. A predictive enrichment procedure to identify potential responders to a new therapy for randomized, comparative controlled clinical studies. Biometrics. 2016;72(3):877–87. Available from: https://pubmed.ncbi.nlm.nih.gov/26689167/.
36. Lamont A, Lyons MD, Jaki T, Stuart E, Feaster DJ, Tharmaratnam K, et al. Identification of predicted individual treatment effects in randomized clinical trials. Stat Methods Med Res. 2018;27(1):142–57. Available from: https://pubmed.ncbi.nlm.nih.gov/26988928/.
37. Cai T, Tian L, Wong PH, Wei LJ. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics. 2016;17(2):249–63. Available from: https://www.lanternpharma.com/.
38. Daley DJ, Vere-Jones D. An introduction to the theory of point processes: Volume I: Elementary, Theory and Methods - Second Edition. Probability and its Applications. 2003.
39. Rasmussen CE, Williams CKI. Gaussian processes for machine learning: the MIT Press; 2006. Available from: www.GaussianProcess.org/gpml.
40. Robins J. Association, causation, and marginal structural models. Synthese. 1999;121(1/2):151–79.
41. Farmer RE, Kounali D, Walker AS, Savović J, Richards A, May MT, et al. Application of causal inference methods in the analyses of randomised controlled trials: a systematic review. Trials. 2018;19(1):23. Available from: https://trialsjournal.biomedcentral.com/articles/10.1186/s13063-017-2381-x.
42. Glymour C, Zhang K, Spirtes P. Review of causal discovery methods based on graphical models. Front Genet. 2019;10(JUN):524. Available from: https://www.frontiersin.org/article/10.3389/fgene.2019.00524/full.
43. Tennant PW, Harrison WJ, Murray EJ, Arnold KF, Berrie L, Fox MP, et al. Use of directed acyclic graphs (DAGs) in applied health research: review and recommendations. medRxiv. 2019. Available from: https://www.medrxiv.org/content/10.1101/2019.12.20.19015511v1.
44. Clare PJ, Dobbins TA, Mattick RP. Causal models adjusting for time-varying confounding—a systematic review of the literature. Int J Epidemiol. 2019;48(1):254–265. Available from: http://www.ncbi.nlm.nih.gov/pubmed/30358847.
45. Frank Harrell, Laura Lazzeroni. EHRs and RCTs: outcome prediction vs. optimal treatment selection [Internet]. 2020 [accessed 2020 Apr 19]. Available from: https://www.fharrell.com/post/ehrs-rcts/.
46. Kent DM, Paulus JK, van Klaveren D, D’Agostino R, Goodman S, Hayward R, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Ann Intern Med. 2020;172(1):35. Available from: https://annals.org/aim/fullarticle/2755582/predictive-approaches-treatment-effect-heterogeneity-path-statement.
47. Kent DM, van Klaveren D, Paulus JK, D’Agostino R, Goodman S, Hayward R, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) statement: explanation and elaboration. Ann Intern Med. 2020;172(1):W1–25. Available from: https://www.acpjournals.org/doi/10.7326/M18-3668.
48. Bica I, Alaa AM, Lambert C, van der Schaar M. From real-world patient data to individualized treatment effects using machine learning: current and future methods to address underlying challenges. Clin Pharmacol Ther. 2020;cpt:1907. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpt.1907.
49. Jain P, Danaei G, Manson JE, Robins JM, Hernán MA. Weight gain after smoking cessation and lifestyle strategies to reduce it. Epidemiology. 2020;31(1):7–14. Available from: http://journals.lww.com/00001648-202001000-00002.
50. Lusivika-Nzinga C, Selinger-Leneman H, Grabar S, Costagliola D, Carrat F. Performance of the marginal structural cox model for estimating individual and joined effects of treatments given in combination. BMC Med Res Methodol. 2017;17(1):1–11. Available from: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-017-0434-1.
51. Vangen-Lønne AM, Ueda P, Gulayin P, Wilsgaard T, Mathiesen EB, Danaei G. Hypothetical interventions to prevent stroke: an application of the parametric g-formula to a healthy middle-aged population. Eur J Epidemiol. 2018;33(6):557–66.
52. Orellana L, Rotnitzky A, Robins J. Generalized marginal structural models for estimating optimal treatment regimes; 2006.
53. van der Laan MJ, Petersen ML. Causal effect models for realistic individualized treatment and intention to treat rules. Int J Biostat. 2007;3(1):Article3. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613338/.
54. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty PJ, editors. Proceedings of the Second Seattle Symposium in Biostatistics. New York, NY: Springer; 2004. p. 189–326.
55. Chakraborty B, Murphy SA. Dynamic treatment regimes. Annu Rev Stat Its Appl. 2014;1(1):447–64. Available from: http://www.annualreviews.org/doi/10.1146/annurev-statistics-022513-115553.
56. Pajouheshnia R, Peelen LM, Moons KG, Reitsma JB, Groenwold RH. Accounting for treatment use when validating a prognostic model: a simulation study. BMC Med Res Methodol. 2017;17(1):103. Available from: https://doi.org/10.1186/s12874-017-0375-8.
57. Alaa AM, van der Schaar M. Validating causal inference models via influence functions. In: 36th International Conference on Machine Learning, ICML 2019; 2019. p. 281–91. Available from: http://proceedings.mlr.press/v97/alaa19a.html.
58. Franks AM, D’Amour A, Feller A. Flexible sensitivity analysis for observational studies without observable implications. J Am Stat Assoc. 2020;115(532):1730–46. Available from: https://doi.org/10.1080/01621459.2019.1604369.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
