Predicting No-Show Appointments in A Pediatric Hospital in Chile Using Machine Learning
https://doi.org/10.1007/s10729-022-09626-z
Abstract
The Chilean public health system serves 74% of the country’s population, and 19% of medical appointments are missed
on average because of no-shows. The national goal is 15%, which coincides with the average no-show rate reported in the
private healthcare system. Our case study, Doctor Luis Calvo Mackenna Hospital, is a public high-complexity pediatric
hospital and teaching center in Santiago, Chile. Historically, it has had high no-show rates, up to 29% in certain medical
specialties. Our objectives are to use machine learning algorithms to predict no-shows of pediatric patients in terms of demographic, social, and historical variables, and to propose and evaluate metrics to assess these models, accounting for the cost-effectiveness of possible intervention strategies to reduce no-shows. We analyze the relationship between a no-show and demographic,
social, and historical variables, between 2015 and 2018, through the following traditional machine learning algorithms:
Random Forest, Logistic Regression, Support Vector Machines, AdaBoost and algorithms to alleviate the problem of class
imbalance, such as RUS Boost, Balanced Random Forest, Balanced Bagging and Easy Ensemble. These class imbalances
arise from the relatively low number of no-shows to the total number of appointments. Instead of the default thresholds used
by each method, we computed alternative ones via the minimization of a weighted average of type I and II errors based
on cost-effectiveness criteria. Of the 395,963 appointments considered, 20.4% were no-shows, with ophthalmology
showing the highest rate among specialties at 29.1%. Patients in the most deprived socioeconomic group according to their
insurance type and commune of residence and those in their second infancy had the highest no-show rate. The history of
non-attendance is strongly related to future no-shows. An 8-week experimental design measured a decrease in no-shows of
10.3 percentage points when using our reminder strategy compared to a control group. Among the variables analyzed, those
related to patients’ historical behavior, the reservation delay from the creation of the appointment, and variables that can
be associated with the most disadvantaged socioeconomic group, are the most relevant to predict a no-show. Moreover, the
introduction of new cost-effective metrics significantly impacts the validity of our prediction models. Using a prototype to
call patients with the highest risk of no-shows resulted in a noticeable decrease in the overall no-show rate.
Keywords No-show patients · Appointments and schedules · Machine learning · Medical informatics · Public health
J. Dunstan et al.

J. Peypouquet
j.g.peypouquet@rug.nl

Extended author information available on the last page of the article.

1 Introduction

With a globally increasing population, efficient use of healthcare resources is a priority, especially in countries where those resources are scarce [21]. One avoidable source of inefficiency stems from patients missing their scheduled appointments, a phenomenon known as no-show [7], which produces a noticeable waste of human and material resources [17]. A systematic review of 105 studies found that Africa has the highest no-show rate (43%), followed by South America (28%), Asia (25%), North America (24%), Europe (19%), and Oceania (13%), with a global average of 23% [11]. In pediatric appointments, no-show rates range between 15% and 30% [11] and tend to increase with the patients' age [33, 44].

To decrease the rate of avoidable no-shows, hospitals can focus their efforts on three main areas:

a) Identifying the causes. The most common one is forgetting the appointment, according to a survey in the United Kingdom [36]. Lacy et al. [26] identified three additional issues: emotional barriers (negative emotions about going to see the doctor were greater than the perceived benefit), perceived disrespect by the health care system, and lack of understanding of the scheduling system. In pediatric appointments, other reasons include caregivers' issues, scheduling conflicts, forgetting, transportation, public health insurance, and financial constraints [11, 19, 23, 39, 44, 49].

b) Predicting patients' behaviour. To this end, researchers have used diverse statistical methods, including logistic regression [5, 20, 22, 40], generalised additive models [43], multivariate methods [5], hybrid methods with Bayesian updating [1], Poisson regression [41], decision trees [12, 13], ensembles [14, 37], and stacking methods [46]. Their efficiency depends on the ability of predictors to compute the probability of no-show for a given patient and appointment. Among adults, those most likely to miss their appointments are younger patients, those with a history of no-show, and those from a lower socioeconomic background, but variables such as the time of the appointment are also relevant [11].

c) Improving non-attendance rates using preventive measures. A review of 26 articles from diverse backgrounds found that patients who received a text notification were 23% less likely to miss their appointment than those who did not [42]. Similar results were obtained for personal phone calls in adolescents [39]. Text messages have been observed to produce outcomes similar to telephone calls, at a lower cost, in both adults [10, 18] and pediatric patients [29].

In terms of implementing mitigation actions, overbooking can maintain an efficient use of resources despite no-shows [2, 25]. However, there is a trade-off between efficiency and service quality. For other strategies, see the work of Cameron et al. [6].

This work is concerned with prediction and prevention in a pediatric setting. This is particularly challenging, as attendance involves patients and their caregivers, who can moreover change over time.

We use machine learning methods to estimate the probability of no-show in pediatric appointments and identify which patients are likely to miss them. This prediction is meant to be used by the hospital to reduce no-show rates through personalised actions. Since public hospitals have scarce resources and a tight budget, we introduce new metrics to account for both the costs and the effectiveness of these actions. This marks a difference with the work presented by Srinivas and Salah [47], which considers standard machine learning metrics, and Berg et al. [2], which balances interventions and opportunity costs, among others.

The paper is organised as follows: Section 2 describes the data and our methodological approach, covering the data description, the machine learning methods, our cost-effectiveness metrics, and the deployment. Results are shown in Section 3, paying particular attention to the metrics we constructed to assess efficiency and to the impact of the use of this platform, measured in an experimental design. Section 4 contains our conclusions and gives directions for future research. Finally, some details concerning the threshold tuning and the balance between type I and II errors are given in the Appendix.

2 Materials and methods

2.1 Data description

Dr. Luis Calvo Mackenna Hospital is a high-complexity pediatric hospital in Santiago. We analysed the schedule of medical appointments from 2015 to 2018, comprising 395,963 entries. It contains socioeconomic information about the patient (commune of residence, age, sex,¹ health insurance) and the appointment (specialty, type of appointment, day of the week, month, hour of the day, reservation delay), as well as the status of the appointment (show/no-show).

Although the hospital receives patients from the whole country, 70.7% of the appointments correspond to patients from the Eastern communes of Santiago (see Fig. 1). Among these communes, the poorest, Peñalolén, exhibits the highest percentage of no-show. Table 1 shows the percentage of appointments, no-shows, and poverty depending on the patients' commune of residence. To measure poverty, we used the Chilean national survey Casen, which uses the multidimensional poverty concept to account for the multiple deprivations faced by poor people at the same time in areas such as education and health, among others [34].

¹ Of the 395,963 appointments, there are 15 from intersex patients and 25 in which sex was marked as undefined. These appointments were not considered to create the model because small group sizes could cause model overfitting.
Fig. 1 Map of the Metropolitan Region and of Santiago, showing the communes of residence of referred patients (scale 0-24 km)
Since Dr. Luis Calvo Mackenna is a pediatric hospital, 99.2% of the appointments correspond to patients whose age at the day of the appointment is under 18 years. The distribution by age group is shown in Table 2.

Most appointments (96.5%) correspond to patients covered by the Public Health Fund FONASA. These patients are classified according to their socioeconomic status into groups A, B, C, and D. The income range for each group and the percentage of appointments at each level are shown in Table 3. During the time this study took place, patients in groups A and B had zero co-payment, while groups C and D had co-payments of 10% and 20%, respectively. As of September 2022, due to new government policies, all patients covered by FONASA have a zero co-payment.

The type of appointment is also an important variable. Table 3 shows the percentage of appointments that correspond to first-time appointments, routine appointments, first-time appointments derived from primary healthcare, and others, along with each type's volume and percentage of no-shows.

Table 1 Location of the referred center, the proportion of patients from the total of appointments, no-show rate and proportion of the population in multidimensional poverty [34]

Referred from          Appts. %   No-show %   Poverty %
Peñalolén              31.1       23.8        26.3
Macul                  12.4       23.5        13.5
Ñuñoa                  8.9        21.9        5.8
Lo Barnechea           4.8        22.4        17.2
Las Condes             4.6        21.3        4.2
Providencia            4.1        20.2        3.4
La Reina               4.1        23.3        7.0
Vitacura               0.5        20.6        3.5
Easter Island          0.2        16.6        21.7
Other communes         11.1       16.7        −
Rest of the country    18.2       13.4        −

Table 2 Appointments at Dr. Luis Calvo Mackenna displayed by age group

Life cycle grouping   Age range           Percentage
Nursling              0-5 months          9.7%
First infancy         6 months-4 years    24.1%
Second infancy        5-11 years          39.2%
Teenagers             12-17 years         26.2%
Young adults          18-25 years         0.8%
Table 3 Distribution of patients by grouping them according to socioeconomic status and type of appointment

Group                  Description                                      Appointments %   No-Show %
Socioeconomic Status
A                      Without income/migrants                          44.1             22.5
B                      Less than US$425                                 22.1             18.9
C                      Between US$425 and US$620                        13.0             18.9
D                      Greater than US$621                              17.3             18.3
Other                  Without health insurance                         2.0              20.4
Private                With private insurance                           1.5              20.4
Type of appointment
1st time appointment   First visit for a certain medical episode        23.1             24.1
Routine appointment    Medical controls that follow 1st appointments    63.7             18.6
1st time derived       Special slots derived from primary healthcare    8.7              26.8
Other                  Mainly medical prescriptions                     4.5              16.6
We analysed specialty consultation referrals both from within the hospital and from primary care providers. The dataset contains appointments from 25 specialties, which are shown in Table 4, along with the corresponding no-show rate. The no-show rate is uneven, and seems to be lower in specialties associated with chronic and life-threatening diseases (e.g. Oncology, Cardiology) than in other specialties (e.g. Dermatology, Ophthalmology).

According to Dantas et al. [11], the patients' no-show history can be helpful in predicting their behavior. In order to determine whether or not to use the complete history, we performed a correlation analysis between no-show and past non-attendance, as a function of the size of the look-back period. We observed that the Pearson correlation grows with the window size (0.09 at six months and 0.11 at 18 months), achieving a maximum correlation using the complete patient history (0.47). Note also that 20.3% of past appointments are missed when looking at time windows of only 12 months. This number grows to 55.2% when the window is 6 months. For these reasons, we decided to consider all available no-show records.

The ultimate aim of this work is to identify which appointments are more likely to be missed. To do so, we developed models that classify patients based on attributes available to the hospital, which are described in Table 5.

2.2 Machine learning methods

Our models predict the probability of no-show for a given appointment. This prediction problem was approached using supervised machine learning (ML) methods, where the label (variable to predict) was the appointment state: show or no-show. All the categorical features in Table 5 were transformed to one-hot encoded vectors. The numerical features (historical no-show and reservation delay) were scaled between 0 and 1.
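As a sketch of this preprocessing step (one-hot encoding of categorical features and min-max scaling of the numerical ones), consider the following minimal Python fragment. The record fields, category lists, and maximum delay are illustrative assumptions, not the hospital's actual schema:

```python
def one_hot(value, categories):
    """Return a one-hot vector for `value` over the fixed list `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(x, lo, hi):
    """Scale x into [0, 1] using bounds observed on the training set."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

# Illustrative appointment record (field names are assumptions).
appointment = {"day_of_week": "Tue",
               "historical_no_show": 0.25,
               "reservation_delay_weeks": 6}

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # appointments run Monday-Friday
MAX_DELAY = 52                              # assumed maximum delay, in weeks

features = (one_hot(appointment["day_of_week"], DAYS)
            + [appointment["historical_no_show"]]  # already between 0 and 1
            + [min_max_scale(appointment["reservation_delay_weeks"], 0, MAX_DELAY)])
```

In practice this is what scikit-learn's `OneHotEncoder` and `MinMaxScaler` automate; fitting the scaler's bounds on the training subset only avoids leaking information from the test subset.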
Table 4 Medical and dental specialties in the dataset

Medical specialties (no-show %): Pulmonology (23.2), Ophthalmology (30.3), Cardiology (14.7), Oncology (4.9), General Surgery (16.9), Otorhinolaryngology (22.7), Plastic Surgery (14.2), Psychiatry (24.0), Dermatology (28.1), Rheumatology (20.9), Endocrinology (22.1), Traumatology (19.9), Gastroenterology (19.3), Urology (19.3), Gynecology (25.1), Genetics (24.5), Hematology (15.8), Pediatrics (22.6), Nephrology (18.4), Infectology (23.7), Neurology (28.3), Parasitology (18.8), Nutrition (27.6)

Dental specialties (no-show %): Pediatric dentistry (24.9), Orthodontics (18.4)

In medical applications, the decisions and predictions of algorithms must be explained, in order to justify their reliability or trustworthiness [28]. Instead of deep learning, we preferred traditional machine learning, since its explanatory character [35] brings insight into the incidence of the variables on the output. This is particularly important because the hospital intends to implement tailored actions to reduce the no-show.

The tested algorithms, listed in Table 6, were implemented in the Python programming language [50]. The distribution of the classes is highly unbalanced, with a ratio of 31:8 between show and no-show. To address the class imbalance, we used algorithms suited for imbalanced learning, implemented in imbalanced-learn [27] and scikit-learn [38]. To handle the problem of class balancing, RUSBoost [45] randomly under-samples the majority class at each iteration of AdaBoost [16], which is a
Table 5 Features of the appointments used by the models

Age (categorical). Age at the day of the appointment, as the position in the life cycle: nursling (0-5 months), first infancy (6 months-4 years), second infancy (5-11 years), teenager (12-17 years), young adult (18-25 years).

Sex (categorical). Sex of the patient: male, female.

Commune of residence (categorical). Location of residence of the patient at the commune level: any of the 346 communes of Chile.

Insurance (categorical). Insurance type: Group A (person without housing or income, or migrant), Group B (monthly income < US$425), Group C (monthly income in [US$425, US$621)), Group D (monthly income > US$621), Provisory Insurance (people without health insurance).

Day of the week (categorical). Day of the week of the appointment: Monday-Friday.

Month (categorical). Month of the appointment: January-December.

Hour of the day (categorical). Hour of the day of the appointment, in ranges of one hour: 8hrs-17hrs.

Reservation delay (numerical). Time in weeks from the creation of the appointment to the appointment itself: 0, 1, 2, ...

Historical no-show (numerical). No-show citations divided by total citations prior to the current appointment: a number between 0 and 1.

Historical no-show by specialty (numerical). No-show citations divided by total citations prior to the current appointment, both restricted to the considered specialty: a number between 0 and 1.

Type of appointment (categorical). Type of the appointment, regardless of its medical specialty: first-time appointment, routine appointment, or first-time appointment derived from primary healthcare (PHC).
well-known boosting algorithm shown to improve the classification performance of weak classifiers. Similarly, the Balanced Random Forest classifier balances the minority class by randomly under-sampling each bootstrap sample [8]. On the other hand, Balanced Bagging re-samples using random under-sampling, over-sampling, or SMOTE to balance each bootstrap sample [4, 32, 51]. The final classifier adapted to imbalanced data was Easy Ensemble, which performs random under-sampling and then trains a learner on each subset of the majority class together with all of the minority training set; the learner outputs are combined for the final decision [30]. In turn, Support Vector Machines construct a hyperplane to separate the data points into classes [9]. Logistic regression [15] is a generalized linear model, widely used to predict no-show [1, 7, 20, 22, 40]. We did not use stacking, because these classifiers are likely to suffer from overfitting when the number of minority class examples is small [48, 52].

Table 6 Machine learning algorithms used in this work

imbalanced-learn: RUS Boost, Balanced Random Forest, Balanced Bagging, Easy Ensemble
scikit-learn: Logistic Regression, Random Forest, Ada Boost, Support Vector Machines

We trained and analyzed prediction models by specialty to ensure that each specialty receives unit-specific insights about the reasons correlated with their patients' no-shows. Also, as shown in Section 3, a single model incorporating specialty information through a series of indicator variables is less accurate than our specialty-based models.

The dataset was split by specialty, and each specialty subset was separated into training and testing subsets. The first subset was used to select optimal hyperparameters (selected via grid search on the values described in Table 7) and to train the machine learning algorithms. Due to computing power constraints, each hyperparameter combination's performance was
assessed using 3-fold cross-validation. The testing subset was used to obtain performance metrics.

The hyperparameters that maximised the metric given by (1 - cost) * effectiveness (see Eq. 6 below) were used to train models using 10-fold cross-validation over the training subset, to assess the best algorithm for each specialty. Then, these combinations of best hyperparameters and algorithms were tuned to optimise their classification thresholds, as explained in the Appendix. The tuple (hyperparameter, algorithm, threshold) constitutes a predictive model. Finally, the best predictive model for each medical specialty is chosen as the one that maximises effectiveness/cost (see Eq. 5 below). See Section 2.3 for more details.

2.3 Cost-effectiveness metrics

Custom metrics were developed to better understand the behavior of the trained models and to assess the efficiency of the system. These metrics balance the effectiveness of the predictions and the cost associated with possible prevention actions. This is particularly relevant in public institutions, which have strong budget limitations.

The use of custom cost-effectiveness metrics has two advantages. Firstly, they account for operational costs and constraints in the hospital's appointment confirmation process, while standard machine learning metrics do not. For instance, the number of calls to be made or SMSs to be sent, the number of telephone operators, etc., all incur costs that the hospital must cover. Secondly, they offer an evident interpretation of the results, since we establish a balance between the expected no-show reduction and the number of actions to be taken. For instance, a statement such as "in order to reduce the no-show in ophthalmology by 30%, we need to contact 40% of daily appointments" can be easily understood by operators and decision-makers.

To construct these metrics, we used the proportion P_C of actions to be carried out, based on model predictions:

P_C = (FP + TP) / N,    (1)

where FP and TP are the number of false and true positives, respectively (analogously for FN and TN), and N = FP + TP + FN + TN is the total number of appointments (for the specialty). This quantity can be seen as a proxy of the cost of actions taken to prevent no-shows.
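The cost-effectiveness quantities of this section can be computed directly from the confusion-matrix counts. A minimal sketch (the function name is ours; the counts in the worked example are chosen to reproduce the P_C = 0.01, P_R = 0.05 illustration discussed later in this section):

```python
def cost_effectiveness(tp, fp, fn, tn):
    """Cost-effectiveness quantities of Section 2.3 from confusion-matrix counts."""
    n = tp + fp + fn + tn
    p_c = (fp + tp) / n       # Eq. 1: proportion of actions (cost proxy)
    nsp_i = (fn + tp) / n     # Eq. 2: existing no-show rate
    nsp_f = fn / n            # Eq. 3: no-show rate if all TP attend
    p_r = 1 - nsp_f / nsp_i   # Eq. 4: equals TP / (FN + TP)
    m1 = p_r / p_c            # Eq. 5: effectiveness / cost
    m2 = p_r * (1 - p_c)      # Eq. 6: effectiveness * (1 - cost)
    return p_c, p_r, m1, m2

# Contacting 1% of appointments (P_C = 0.01) while avoiding 5% of the
# no-shows (P_R = 0.05) yields m1 = 5, as in the example given in the text.
p_c, p_r, m1, m2 = cost_effectiveness(tp=5, fp=5, fn=95, tn=895)
```

Note that only the four counts are needed, so these metrics can be evaluated on the testing subset of any specialty at negligible cost.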
The second quantity used to define our custom metrics is the proportion P_R of no-show reduction, obtained from model predictions. First, let NSP_i be the existing no-show rate, and NSP_f the no-show rate obtained after considering that all TP cases attend their appointment. That is:

NSP_i = (FN + TP) / N,    (2)
NSP_f = FN / N.    (3)

Then, P_R, computed as

P_R = 1 − NSP_f / NSP_i = 1 − FN / (FN + TP) = TP / (FN + TP),    (4)

measures the effectiveness of the prediction. To assess the trade-off between cost and effectiveness, we defined the metrics:

m1 := effectiveness / cost = P_R / P_C,    (5)
m2 := effectiveness · (1 − cost) = P_R · (1 − P_C).    (6)

Here, P_R is the proportion of correctly predicted no-shows from the total actual no-shows, a measure of efficiency. Conversely, P_C corresponds to the proportion of predicted no-shows from the total analyzed appointments, a measure of cost (the number of interventions to be performed). Hence, m1 is the ratio between the proportion of no-shows avoided by the intervention and the proportion of interventions. In turn, m2 is the product (combined effect) of the proportion of no-shows avoided by the intervention and the proportion of predicted shows (appointments not to be intervened).

Thus, a 10% increase in m1 can be produced by a 10% increase of P_R (an increase of correctly predicted no-shows) or a 10% decrease of P_C (a decrease in the number of interventions to be performed). Similarly, a 10% increase of m2 can be produced by a 10% increase of P_R (an increase of correctly predicted no-shows) without performing more interventions, or a 10% increase of 1 − P_C (a decrease in the number of interventions to be performed) without changing P_R.

These two metrics are used to construct and select the best predictive models for each specialty. This decision is supported by the fact that, by construction, both metrics have higher values when the associated model performs better in a (simple) cost-effectiveness sense, and is therefore preferred according to our methodology. Then, since the range of m2 is bounded (it takes values between 0 and 1), we used it as the objective function for hyperparameter optimization, which is an intermediate process to construct our predictive models. On the other hand, since m1 is slightly easier to interpret (but possibly unbounded), we used it to select the best predictive model for each studied medical specialty. An analysis of our classification metrics against the Geometric Mean (GM) and Matthews's Correlation Coefficient (MCC) is shown in the Appendix. This is carried out to analyze the bias of these two metrics in the context of an imbalanced dataset.

Regarding the limitations of the proposed metrics, we noticed that, in some occasional cases, the use of m1 recommended very few actions. Indeed, a few medical appointments with high no-show probability generate a high classification threshold, yielding a high value of m1. For example, when the model recommends confirming only the top 1% of the appointments (i.e., P_C = 0.01), but this also reduces the no-show rate by 5% (i.e., P_R = 0.05), we obtain m1 = 5. To overcome this problem in a heuristic way, and also for practical reasons (values of m2 are bounded), we use the metric m2 for the hyperparameter optimization process. However, we keep m1 to select the best predictive model for each specialty, because it is easier to interpret than m2.

Another approach used in the literature is the comparison of models through costs instead of a cost-effectiveness analysis, for example, the minimization of both the costs of outreaches and the opportunity cost of no-shows. For instance, in the context of overbooking, Berg et al. [2] suggested that the cost function to be minimized could balance the cost of prevention (predicted no-shows multiplied by the cost of intervention) and the cost of no-shows (real no-shows multiplied by the cost of a medical consultation). This approach could be adapted to our context to assess mitigation actions (such as phone calls) through more realistic criteria. However, this is beyond the scope of this research and will be the object of future studies.

2.4 Deployment

We designed a computational platform to implement our predictive models as a web application. The front- and back-end were designed in Python using the Django web framework. The input is a spreadsheet containing the appointment's features, such as patient ID and other personal information, medical specialty, date, and time. This data is processed to generate the features described in Table 5.

For each specialty, the labels of all appointments are predicted using the best predictive model. The appointments are sorted in descending order according to the predicted probability of no-show, along with the patient's contact information. The hospital may then contact the patients with the highest probability of no-show to confirm the appointment.
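The ranking step of the platform can be sketched as follows. This is a pure-Python stand-in: the field names and the contact budget are illustrative assumptions, while the real system reads a spreadsheet and uses the per-specialty best model to produce the probabilities:

```python
def rank_for_confirmation(appointments, budget=0.05):
    """Return the highest-risk appointments, at most a `budget` fraction.

    Sorts by predicted no-show probability in descending order, mirroring
    the platform's ranking of appointments to contact.
    """
    ranked = sorted(appointments, key=lambda a: a["p_no_show"], reverse=True)
    k = max(1, int(len(ranked) * budget))
    return ranked[:k]

# Illustrative appointments with model-predicted no-show probabilities.
appts = [{"patient_id": i, "p_no_show": p}
         for i, p in enumerate([0.10, 0.80, 0.35, 0.55, 0.05])]
to_call = rank_for_confirmation(appts, budget=0.4)  # keeps the two riskiest
```

The `budget` parameter plays the role of P_C: it caps the fraction of appointments the hospital's telephone operators are asked to confirm.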
Cardiology RandomForestClassifier 0.55 0.10 0.16 0.13 0.18 1.76 0.16 0.63
Dermatology RandomForestClassifier 0.56 0.13 0.26 0.21 0.22 1.61 0.19 0.65
Endocrinology RandomForestClassifier 0.54 0.20 0.21 0.14 0.33 1.68 0.27 0.66
Gastroenterology BalancedBaggingClassifier 0.68 0.11 0.19 0.15 0.21 1.90 0.19 0.65
General surgery LogisticRegression 0.67 0.19 0.13 0.08 0.40 2.17 0.33 0.72
Genetics BalancedRandomForestClassifier 0.57 0.18 0.23 0.18 0.24 1.32 0.20 0.57
Gynecology BalancedBaggingClassifier 0.65 0.14 0.24 0.19 0.22 1.54 0.19 0.61
Hematology RandomForestClassifier 0.54 0.16 0.16 0.10 0.38 2.31 0.32 0.73
Infectology RandomForestClassifier 0.57 0.11 0.26 0.21 0.21 1.79 0.18 0.64
Nephrology BalancedBaggingClassifier 0.73 0.11 0.15 0.12 0.23 2.17 0.21 0.69
Neurology BalancedBaggingClassifier 0.64 0.12 0.26 0.20 0.23 1.91 0.20 0.68
Nutrition LogisticRegression 0.65 0.10 0.32 0.27 0.16 1.53 0.14 0.60
Oncology RandomForestClassifier 0.50 0.09 0.04 0.03 0.29 3.26 0.26 0.72
Ophthalmology BalancedRandomForestClassifier 0.65 0.13 0.31 0.24 0.21 1.61 0.18 0.62
Orthodontics BalancedBaggingClassifier 0.63 0.17 0.21 0.11 0.47 2.87 0.40 0.80
Otorhinolaryngology BalancedBaggingClassifier 0.61 0.18 0.22 0.14 0.37 2.07 0.30 0.69
Parasitology BalancedBaggingClassifier 0.72 0.12 0.17 0.12 0.26 2.20 0.23 0.65
Pediatric dentistry BalancedBaggingClassifier 0.67 0.11 0.30 0.24 0.20 1.86 0.18 0.66
Pediatrics BalancedRandomForestClassifier 0.63 0.13 0.25 0.19 0.23 1.75 0.20 0.64
Plastic surgery BalancedRandomForestClassifier 0.67 0.21 0.10 0.05 0.47 2.22 0.37 0.76
Psychiatry RandomForestClassifier 0.56 0.14 0.25 0.19 0.25 1.78 0.21 0.65
Pulmonology BalancedRandomForestClassifier 0.61 0.27 0.17 0.09 0.49 1.85 0.36 0.74
Rheumatology BalancedRandomForestClassifier 0.66 0.11 0.22 0.18 0.16 1.54 0.14 0.60
Traumatology BalancedBaggingClassifier 0.65 0.13 0.18 0.14 0.22 1.71 0.20 0.63
Urology BalancedRandomForestClassifier 0.61 0.13 0.19 0.15 0.23 1.73 0.20 0.63
Fig. 3 Features with the strongest label correlation by specialty. All correlations presented have p-values < 0.001

Table 13 Comparison of no-show rates in control and intervention groups in the experimental design (columns: specialty, no-show rate in the control and intervention groups, and reduction in percentage points)
NSP average of the hospital) and 10.7% for the intervention measuring the predictive power of our methods for remote
group, with a reduction of 10.3 percentage points (p-value∼ consultations using telemedicine. Finally, as said before,
0.002). Table 13 shows the no-show rates in both groups for we use cost-effectiveness metrics to construct and select
the different specialties considered in the study. the best predictive models. These metrics are computed
To interpret these results in terms of metrics m1 and as the proportion of avoided no-shows and the proportion
m2 , first, we use the percentage of no-show of the control of appointments identified as possible no-shows. Although
group as a proxy for the value NSPi . This percentage simple, these metrics were enough for our purposes. They
also coincides with the historical no-show of the hospital, permit us to consider the hospital’s needs where resources
which justifies this decision. We obtained PR = (21.0% − are scarce, and it is not desirable to contact many patients.
10.7%)/21.0% = 0.46 and PC = 247/4, 617 = However, considering other more complex cost metrics
0.05. This can be read as follows: calling the top 5% (such as in Berg et al. [2]) could bring realism to our
of appointments ordered from higher to lowest no-show methodology and can be the object of a future study.
probability generates a 46% decrease in no-shows. Thus, in Some of the limitations of this study are that we work
terms of the metrics, we get m1 = PR /PC = 9.80 and in pediatric settings, and extending our work to adult
m2 = PR (1 − PC ) = 0.47. appointments will require us to train the models again.
We are currently working on that by gathering funding to
study no-shows for adults and combining urban and rural
4 Conclusions, perspectives populations. In addition, this paper shows only the reduction
in no-shows that calling had compared to a control group.
We have presented the design and implementation of machine learning methods applied to the no-show problem in a pediatric hospital in Chile. It is the most extensive work using Chilean data, and among the few in pediatric settings. The novelty of our approach is fourfold:

1. The use of extensive historical data to train machine learning models.
2. The selection of the most suitable machine learning model for each specialty from among various methods.
3. The development of tailored cost-effectiveness metrics to account for possible preventive interventions.
4. The realization of an experimental design to measure the effectiveness of our predictive models in real conditions.

Our results show considerable variability among specialties in terms of the predictive power of the features. Although reservation delay and historical no-shows are consistently strong predictors across most specialties, variables such as the patient's age, the time of day, or the appointment type must not be overlooked.

Future work includes testing the effect of adding weather variables; however, including weather forecasts from external sources poses additional technical implementation challenges. Another interesting line of future research is cheaper forms of contacting patients, such as SMS or WhatsApp messages written by automatic agents.

The implementation of actions based on the results provided by our platform may yield a noticeable reduction of avoidable no-shows. Using a prototype at Dr. Luis Calvo Mackenna Hospital, a phone-call intervention in a subset of medical specialties has resulted in a no-show rate 10.3 percentage points lower. This research is a concrete step towards reducing non-attendance in this healthcare provider. Other actions, such as appointment reminders via phone calls, text messages, or e-mail, special scheduling rules according to patient characteristics, or even arranging transportation for patients from distant communes, could be implemented in the future. However, all these actions rely on good detection of possible no-shows to maximize their effect subject to a limited budget.

Appendix A: Threshold tuning

The optimal classification thresholds were obtained by balancing type I and II errors (defined in Eqs. 7 and 8) for each method, following [22].
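The threshold selection just described can be sketched as a simple grid search over candidate thresholds, minimising a weighted sum of the two error rates. This is an illustrative sketch, not the paper's implementation: the weights `w1` and `w2` stand in for the cost-effectiveness weighting, and the 0.01-step grid is an arbitrary choice.

```python
def type_errors(y_true, p_hat, thr):
    """Type I error: share of attended appointments flagged as no-shows;
    type II error: share of actual no-shows left unflagged."""
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= thr)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < thr)
    n_neg = sum(1 for y in y_true if y == 0)
    n_pos = len(y_true) - n_neg
    return fp / n_neg, fn / n_pos

def best_threshold(y_true, p_hat, w1=1.0, w2=1.0):
    # grid-search the threshold minimising the weighted sum of both errors
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: w1 * type_errors(y_true, p_hat, t)[0]
                                   + w2 * type_errors(y_true, p_hat, t)[1])

# tiny synthetic example: three attended appointments, two no-shows
y = [0, 0, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.8, 0.9]
thr = best_threshold(y, p)
# the chosen threshold separates the two classes, so both errors are zero here
```

Raising `w2` relative to `w1` pushes the threshold down, flagging more appointments as likely no-shows at the price of more false alarms — which is the trade-off the cost-effectiveness criteria are meant to arbitrate.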
Fig. 5 Type I and II errors as a function of the method classification threshold. Threshold p is selected as the minimiser of their weighted sum. [figure]

Table 14 Bias $B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)$ of each performance metric:

GM: $B_{GM} = 0$

MCC: $B_{MCC} = \dfrac{\lambda_{PP}+\lambda_{NN}-1}{2\sqrt{\left[\lambda_{PP}+(1-\lambda_{NN})\frac{1-\delta}{1+\delta}\right]\left[\lambda_{NN}+(1-\lambda_{PP})\frac{1+\delta}{1-\delta}\right]}} - \dfrac{\lambda_{PP}+\lambda_{NN}-1}{2\sqrt{\left[\lambda_{PP}+(1-\lambda_{NN})\right]\left[\lambda_{NN}+(1-\lambda_{PP})\right]}}$

m1: $B_{m_1} = \dfrac{2\lambda_{PP}}{\lambda_{PP}(1+\delta)+(1-\lambda_{NN})(1-\delta)} - \dfrac{2\lambda_{PP}}{\lambda_{PP}+(1-\lambda_{NN})}$

m2: $B_{m_2} = \dfrac{\lambda_{PP}}{2}\left[\lambda_{NN}(1-\delta)-\lambda_{PP}(1+\delta)\right] - \dfrac{\lambda_{PP}}{2}\left(\lambda_{NN}-\lambda_{PP}\right)$
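The bias expressions in Table 14 can be cross-checked numerically by computing each metric directly from a confusion matrix parameterised by (λPP, λNN, δ). The sketch below is an assumption-laden verification, not code from the paper: it takes λPP and λNN to be the true positive and true negative rates, the positive-class prevalence to be (1 + δ)/2, and the MCC to be normalised to [0, 1].

```python
import math

def confusion(lpp, lnn, delta):
    # fractions of the dataset in each confusion-matrix cell,
    # assuming positive-class prevalence (1 + delta) / 2
    p_pos, p_neg = (1 + delta) / 2, (1 - delta) / 2
    tp, fn = lpp * p_pos, (1 - lpp) * p_pos
    tn, fp = lnn * p_neg, (1 - lnn) * p_neg
    return tp, fp, fn, tn

def gm(lpp, lnn, delta):
    # geometric mean of sensitivity and specificity: independent of delta
    return math.sqrt(lpp * lnn)

def mcc_norm(lpp, lnn, delta):
    tp, fp, fn, tn = confusion(lpp, lnn, delta)
    raw = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (raw + 1) / 2  # normalised to [0, 1]

def bias(metric, lpp, lnn, delta):
    # B_mu: metric on the imbalanced data minus metric on balanced data
    return metric(lpp, lnn, delta) - metric(lpp, lnn, 0.0)

def mcc_bias_closed(lpp, lnn, delta):
    # closed form as reconstructed in Table 14
    t = lpp + lnn - 1
    a = math.sqrt((lpp + (1 - lnn) * (1 - delta) / (1 + delta))
                  * (lnn + (1 - lpp) * (1 + delta) / (1 - delta)))
    b = math.sqrt((lpp + (1 - lnn)) * (lnn + (1 - lpp)))
    return t / (2 * a) - t / (2 * b)

lpp, lnn, delta = 0.7, 0.8, 2 * 8 / 31 - 1  # delta value used in Fig. 7
print(bias(gm, lpp, lnn, delta))   # 0.0: GM is unaffected by imbalance
# the closed form agrees with the direct computation
print(abs(bias(mcc_norm, lpp, lnn, delta) - mcc_bias_closed(lpp, lnn, delta)) < 1e-12)
```

Under these assumptions GM has exactly zero bias for any δ, which is why it serves as the unbiased baseline in the heat maps of Fig. 7.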
Fig. 7 Heat maps of the bias of each performance metric with δ = 2 × 8/31 − 1. [figure]

We determined the impact of the imbalance using the bias of each metric, given by B_μ(λ_PP, λ_NN, δ), where λ_PP is the percentage of true positives, λ_NN is the percentage of true negatives, and δ, the imbalance coefficient, is given by 2m_p/m − 1, where m_p is the total number of positive elements and m is the total number of elements.

Table 14 shows the definition of the bias for the Geometric Mean (GM), Matthews's Correlation Coefficient (MCC), and the proposed metrics m1 and m2. The first two were selected as benchmarks, since they are known to perform well on imbalanced datasets [31]. Since the imbalance coefficient δ of our dataset is 2 × 8/31 − 1, the bias depends only on λ_PP and λ_NN. Figure 7 shows the bias of each metric in a heat map. Metrics m1 and m2 have a low bias for most values of the parameters, with m2 showing the best performance. Using both metrics reduces the impact in areas with a high bias.

A.3 ML metrics of the best models for each specialty

Table 15 gives more information about the best model in each specialty.
Table 15 Additional performance metrics of the best model for each medical specialty
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s10729-022-09626-z.

Acknowledgements This work was partly supported by Fondef Grant ID19I10271, Fondecyt grants 11201250, 1181179 and 1201982, and Center for Mathematical Modeling (CMM) BASAL fund FB210005 for center of excellence, all from ANID-Chile; as well as Millennium Science Initiative Program grants ICN17_002 (IMFD) and ICN2021_004 (iHealth).

Declarations

Ethics approval This research was carried out according to international standards on data privacy, and was approved by the Faculty Committee for Ethics and Biosecurity.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Alaeddini A, Yang K, Reddy C, Yu S (2011) A probabilistic model for predicting the probability of no-show in hospital appointments. Health Care Manag Sci 14:146–157
2. Berg BP, Murr M, Chermak D, Woodall J, Pignone M, Sandler RS, Denton BT (2013) Estimating the cost of no-shows and evaluating the effects of mitigation strategies. Med Decis Making 33:976–985. https://doi.org/10.1177/0272989X13478194
3. Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
4. Breiman L (2004) Bagging predictors. Mach Learn 24:123–140
5. Bush R, Vemulakonda V, Corbett S, Chiang G (2014) Can we predict a national profile of non-attendance pediatric urology patients: a multi-institutional electronic health record study. Inform Prim Care 21:132
6. Cameron S, Sadler L, Lawson B (2010) Adoption of open-access scheduling in an academic family practice. Can Fam Physician 56:906–911
7. Carreras-García D, Delgado-Gómez D, Llorente-Fernández F, Arribas-Gil A (2020) Patient no-show prediction: A systematic literature review. Entropy 22
8. Chen C, Breiman L (2004) Using random forest to learn imbalanced data. University of California, Berkeley
9. Cortes C, Vapnik VN (1995) Support-vector networks. Mach Learn 20:273–297
10. da Costa TM, Salomão PL, Martha AS, Pisa IT, Sigulem D (2010) The impact of short message service text messages sent as appointment reminders to patients' cell phones at outpatient clinics in São Paulo, Brazil. Int J Med Inform 79:65–70. https://doi.org/10.1016/j.ijmedinf.2009.09.001
11. Dantas LF, Fleck JL, Oliveira FLC, Hamacher S (2018) No-shows in appointment scheduling – a systematic literature review. Health Policy 122:412–421
12. Denney J, Coyne S, Rafiqi S (2019) Machine learning predictions of no-show appointments in a primary care setting. SMU Data Sci Rev 2:2
13. Devasahay SR, Karpagam S, Ma NL (2017) Predicting appointment misses in hospitals using data analytics. mHealth 3:12
14. Elvira C, Ochoa A, Gonzalvez JC, Mochon F (2018) Machine-learning-based no show prediction in outpatient visits. Int J Interact Multimed Artif Intell 4:29
15. Freedman D (2005) Statistical models: theory and practice. Cambridge University Press, Cambridge
16. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55:119–139. https://doi.org/10.1006/jcss.1997.1504
17. Gupta D, Wang WY (2012) Patient appointments in ambulatory care. In: Handbook of healthcare system scheduling. International series in operations research and management science, vol 168. Springer, New York, pp 65–104. https://doi.org/10.1007/978-1-4614-1734-7_4
18. Gurol-Urganci I, de Jongh T, Vodopivec-Jamsek V, Atun R, Car J (2013) Mobile phone messaging reminders for attendance at healthcare appointments. Cochrane Database of Systematic Reviews
19. Guzek LM, Fadel WF, Golomb MR (2015) A pilot study of reasons and risk factors for "no-shows" in a pediatric neurology clinic. J Child Neurol 30:1295–1299
20. Harvey HB, Liu C, Ai J, Jaworsky C, Guerrier CE, Flores E, Pianykh O (2017) Predicting no-shows in radiology using regression modeling of data available in the electronic medical record. J Am Coll Radiol 14:1303–1309
21. Hu M, Xu X, Li X, Che T (2020) Managing patients' no-show behaviour to improve the sustainability of hospital appointment systems: Exploring the conscious and unconscious determinants of no-show behaviour. J Clean Prod 269:122318
22. Huang Y, Hanauer DA (2014) Patient no-show predictive model development using multiple data sources for an effective overbooking approach. Appl Clin Inform 5:836–860
23. Junod Perron N, Dominicé Dao M, Kossovsky MP, Miserez V, Chuard C, Calmy A, Gaspoz JM (2010) Reduction of missed appointments at an urban primary care clinic: A randomised controlled study. BMC Fam Pract 11:79
24. Kong Q, Li S, Liu N, Teo CP, Yan Z (2020) Appointment scheduling under time-dependent patient no-show behavior. Queuing Theory eJournal
25. Kuo YH, Balasubramanian H, Chen Y (2020) Medical appointment overbooking and optimal scheduling: tradeoffs between schedule efficiency and accessibility to service. Flex Serv Manuf J 32:72–101
26. Lacy NL, Paulman A, Reuter MD, Lovejoy B (2004) Why we don't come: patient perceptions on no-shows. Ann Fam Med 2:541–545
27. Lemaître G, Nogueira F, Aridas CK (2017) Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 18:1–5. http://jmlr.org/papers/v18/16-365.html
28. Li X, Xiong H, Li X, Wu X, Zhang X, Liu J, Bian J, Dou D (2021) Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond. arXiv:2103.10689
29. Lin CL, Mistry N, Boneh J, Li H, Lazebnik R (2016) Text message reminders increase appointment adherence in a pediatric clinic: A randomized controlled trial. Int J Pediatr 2016
30. Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B
Affiliations
J. Dunstan1,2 · F. Villena1 · J.P. Hoyos3 · V. Riquelme1 · M. Royer4 · H. Ramírez1,5 · J. Peypouquet6