Predicting No-Show Appointments in A Pediatric Hospital in Chile Using Machine Learning
https://doi.org/10.1007/s10729-022-09626-z
Abstract
The Chilean public health system serves 74% of the country’s population, and 19% of medical appointments are missed
on average because of no-shows. The national goal is 15%, which coincides with the average no-show rate reported in the
private healthcare system. Our case study, Doctor Luis Calvo Mackenna Hospital, is a public high-complexity pediatric
hospital and teaching center in Santiago, Chile. Historically, it has had high no-show rates, up to 29% in certain medical
specialties. Our objectives are to use machine learning algorithms to predict no-shows of pediatric patients in terms of demographic, social, and historical variables, and to propose and evaluate metrics to assess these models, accounting for the cost-effectiveness of possible intervention strategies to reduce no-shows. We analyze the relationship between a no-show and demographic,
social, and historical variables, between 2015 and 2018, through the following traditional machine learning algorithms:
Random Forest, Logistic Regression, Support Vector Machines, AdaBoost and algorithms to alleviate the problem of class
imbalance, such as RUS Boost, Balanced Random Forest, Balanced Bagging and Easy Ensemble. These class imbalances
arise from the relatively low number of no-shows to the total number of appointments. Instead of the default thresholds used
by each method, we computed alternative ones via the minimization of a weighted average of type I and II errors based
on cost-effectiveness criteria. Of the 395,963 appointments considered, 20.4% were no-shows, with ophthalmology
showing the highest rate among specialties at 29.1%. Patients in the most deprived socioeconomic group according to their
insurance type and commune of residence and those in their second infancy had the highest no-show rate. The history of
non-attendance is strongly related to future no-shows. An 8-week experimental design measured a decrease in no-shows of
10.3 percentage points when using our reminder strategy compared to a control group. Among the variables analyzed, those
related to patients’ historical behavior, the reservation delay from the creation of the appointment, and variables that can
be associated with the most disadvantaged socioeconomic group, are the most relevant to predict a no-show. Moreover, the
introduction of new cost-effective metrics significantly impacts the validity of our prediction models. Using a prototype to
call patients with the highest risk of no-shows resulted in a noticeable decrease in the overall no-show rate.
Keywords No-show patients · Appointments and schedules · Machine learning · Medical informatics · Public health
J. Dunstan et al.

J. Peypouquet
j.g.peypouquet@rug.nl

Extended author information available on the last page of the article.

1 Introduction

With a globally increasing population, efficient use of healthcare resources is a priority, especially in countries where those resources are scarce [21]. One avoidable source of inefficiency stems from patients missing their scheduled appointments, a phenomenon known as no-show [7], which produces a noticeable waste of human and material resources [17]. A systematic review of 105 studies found that Africa has the highest no-show rate (43%), followed by South America (28%), Asia (25%), North America (24%), Europe (19%), and Oceania (13%), with a global average of 23% [11]. In pediatric appointments, no-show rates range between 15% and 30% [11] and tend to increase with the patients' age [33, 44].

To decrease the rate of avoidable no-shows, hospitals can focus their efforts on three main areas:

a) Identifying the causes. The most common one is forgetting the appointment, according to a survey in the United Kingdom [36]. Lacy et al. [26] identified three additional issues: emotional barriers (negative emotions about going to see the doctor were greater than the perceived benefit), perceived disrespect by the health care system, and lack of understanding of the scheduling system. In pediatric appointments, other reasons include caregivers' issues, scheduling conflicts, forgetting, transportation, public health insurance, and financial constraints [11, 19, 23, 39, 44, 49].

b) Predicting patients' behaviour. To this end, researchers have used diverse statistical methods, including logistic regression [5, 20, 22, 40], generalised additive models [43], multivariate methods [5], hybrid methods with Bayesian updating [1], Poisson regression [41], decision trees [12, 13], ensembles [14, 37], and stacking methods [46]. Their efficiency depends on the ability of predictors to compute the probability of no-show for a given patient and appointment. Among adults, those most likely to miss their appointments are younger patients, those with a history of no-show, and those from a lower socioeconomic background, but variables such as the time of the appointment are also relevant [11].

c) Improving non-attendance rates using preventive measures. A review of 26 articles from diverse backgrounds found that patients who received a text notification were 23% less likely to miss their appointment than those who did not [42]. Similar results were obtained for personal phone calls in adolescents [39]. Text messages have been observed to produce outcomes similar to telephone calls, at a lower cost, in both adults [10, 18] and pediatric patients [29].

In terms of implementing mitigation actions, overbooking can maintain an efficient use of resources despite no-shows [2, 25]. However, there is a trade-off between efficiency and service quality. For other strategies, see the work of Cameron et al. [6].

This work is concerned with prediction and prevention in a pediatric setting. This is particularly challenging, as attendance involves patients and their caregivers, who can moreover change over time.

We use machine learning methods to estimate the probability of no-show in pediatric appointments and identify which patients are likely to miss them. This prediction is meant to be used by the hospital to reduce no-show rates through personalised actions. Since public hospitals have scarce resources and a tight budget, we introduce new metrics to account for both the costs and the effectiveness of these actions. This marks a difference with the work presented by Srinivas and Salah [47], which considers standard machine learning metrics, and Berg et al. [2], which balances interventions and opportunity costs, among others.

The paper is organised as follows: Section 2 describes the data and our methodological approach, covering the data description, the machine learning methods, our cost-effectiveness metrics, and the deployment. Results are shown in Section 3, paying particular attention to the metrics we constructed to assess efficiency and to the impact of the use of this platform, measured in an experimental design. Section 4 contains our conclusions and gives directions for future research. Finally, some details concerning the threshold tuning and the balance between type I and II errors are given in the Appendix.

2 Materials and methods

2.1 Data description

Dr. Luis Calvo Mackenna Hospital is a high-complexity pediatric hospital in Santiago. We analysed the schedule of medical appointments from 2015 to 2018, comprising 395,963 entries. It contains socioeconomic information about the patient (commune of residence, age, sex,¹ health insurance) and the appointment (specialty, type of appointment, day of the week, month, hour of the day, reservation delay), as well as the status of the appointment (show/no-show).

Although the hospital receives patients from the whole country, 70.7% of the appointments correspond to patients from the Eastern communes of Santiago (see Fig. 1). Among these communes, the poorest, Peñalolén, exhibits the highest percentage of no-show. Table 1 shows the percentage of appointments, no-shows, and poverty depending on the patients' commune of residence. To measure poverty, we used the Chilean national survey Casen, which uses the multidimensional poverty concept to account for the multiple deprivations faced by poor people at the same time in areas such as education and health, among others [34].

¹ Of the 395,963 appointments, there are 15 from intersex patients and 25 in which sex was marked as undefined. These appointments were not considered to create the model because small group sizes could cause model overfitting.
Fig. 1 Map of the Metropolitan Region and of Santiago, showing the communes of residence of referred patients (scale 0-24 km)
Since Dr. Luis Calvo Mackenna is a pediatric hospital, 99.2% of the appointments correspond to patients whose age at the day of the appointment is under 18 years. The distribution by age group is shown in Table 2.

Most appointments (96.5%) correspond to patients covered by the Public Health Fund FONASA. These patients are classified according to their socioeconomic status into groups A, B, C, and D. The income range for each group and the percentage of appointments at each level are shown in Table 3. During the time this study took place, patients in groups A and B had zero co-payment, while groups C and D had co-payments of 10% and 20%, respectively. As of September 2022, due to new government policies, all patients covered by FONASA have a zero co-payment.

The type of appointment is also an important variable. Table 3 shows the percentage of appointments that correspond to first-time appointments, routine appointments, first-time appointments derived from primary healthcare, and others, along with each type's volume and percentage of no-shows.

Table 1 Location of the referred center, the proportion of patients from the total of appointments, no-show rate and proportion of the population in multidimensional poverty [34]

Referred from          Appts. %   No-show %   Poverty %
Peñalolén              31.1       23.8        26.3
Macul                  12.4       23.5        13.5
Ñuñoa                  8.9        21.9        5.8
Lo Barnechea           4.8        22.4        17.2
Las Condes             4.6        21.3        4.2
Providencia            4.1        20.2        3.4
La Reina               4.1        23.3        7.0
Vitacura               0.5        20.6        3.5
Easter Island          0.2        16.6        21.7
Other communes         11.1       16.7        −
Rest of the country    18.2       13.4        −

Table 2 Appointments at Dr. Luis Calvo Mackenna displayed by age group

Life cycle grouping   Age range           Percentage
Nursling              0-5 months          9.7%
First infancy         6 months-4 years    24.1%
Second infancy        5-11 years          39.2%
Teenagers             12-17 years         26.2%
Young adults          18-25 years         0.8%
Table 3 Distribution of patients by grouping them according to socioeconomic status and type of appointment

Group                  Description                                      Appointments %   No-Show %
Socioeconomic Status
A                      Without income/migrants                          44.1             22.5
B                      Less than US$425                                 22.1             18.9
C                      Between US$425 and US$620                        13.0             18.9
D                      Greater than US$621                              17.3             18.3
Other                  Without health insurance                         2.0              20.4
Private                With private insurance                           1.5              20.4
Type of appointment
1st time appointment   First visit for a certain medical episode        23.1             24.1
Routine appointment    Medical controls that follow 1st appointments    63.7             18.6
1st time derived       Special slots derived from primary healthcare    8.7              26.8
Other                  Mainly medical prescriptions                     4.5              16.6
We analysed specialty consultation referrals both from within the hospital and from primary care providers. The dataset contains appointments from 25 specialties, which are shown in Table 4, along with the corresponding no-show rate. The no-show rate is uneven, and seems to be lower in specialties associated with chronic and life-threatening diseases (e.g. Oncology, Cardiology) than in other specialties (e.g. Dermatology, Ophthalmology).

According to Dantas et al. [11], the patients' no-show history can be helpful in predicting their behavior. In order to determine whether or not to use the complete history, we performed a correlation analysis between no-show and past non-attendance, as a function of the size of the look-back period. We observed that the Pearson correlation grows with the window size (0.09 at six months and 0.11 at 18 months), achieving a maximum correlation using the complete patient history (0.47). Note also that 20.3% of past appointments are missed when looking at time windows of only 12 months. This number grows to 55.2% when the window is 6 months. For these reasons, we decided to consider all available no-show records.

The ultimate aim of this work is to identify which appointments are more likely to be missed. To do so, we developed models that classify patients based on attributes available to the hospital, which are described in Table 5.

2.2 Machine learning methods

Our models predict the probability of no-show for a given appointment. This prediction problem was approached using supervised machine learning (ML) methods, where the label (variable to predict) was the appointment state: show or no-show. All the categorical features in Table 5 were transformed to one-hot encoded vectors. The numerical features (historical no-show and reservation delay) were scaled between 0 and 1.
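As a sketch of this preprocessing step (one-hot encoding of categorical features and min-max scaling of the numerical ones), consider the following minimal Python fragment. The record fields, category lists, and maximum delay are illustrative assumptions, not the hospital's actual schema:

```python
def one_hot(value, categories):
    """Return a one-hot vector for `value` over the fixed list `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(x, lo, hi):
    """Scale x into [0, 1] using bounds observed on the training set."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

# Illustrative appointment record (field names are assumptions).
appointment = {"day_of_week": "Tue",
               "historical_no_show": 0.25,
               "reservation_delay_weeks": 6}

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # appointments run Monday-Friday
MAX_DELAY = 52                              # assumed maximum delay, in weeks

features = (one_hot(appointment["day_of_week"], DAYS)
            + [appointment["historical_no_show"]]  # already between 0 and 1
            + [min_max_scale(appointment["reservation_delay_weeks"], 0, MAX_DELAY)])
```

In practice this is what scikit-learn's `OneHotEncoder` and `MinMaxScaler` automate; fitting the scaler's bounds on the training subset only avoids leaking information from the test subset.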
Table 4 Medical and dental specialties in the dataset

Medical specialties (no-show %): Pulmonology (23.2), Ophthalmology (30.3), Cardiology (14.7), Oncology (4.9), General Surgery (16.9), Otorhinolaryngology (22.7), Plastic Surgery (14.2), Psychiatry (24.0), Dermatology (28.1), Rheumatology (20.9), Endocrinology (22.1), Traumatology (19.9), Gastroenterology (19.3), Urology (19.3), Gynecology (25.1), Genetics (24.5), Hematology (15.8), Pediatrics (22.6), Nephrology (18.4), Infectology (23.7), Neurology (28.3), Parasitology (18.8), Nutrition (27.6)

Dental specialties (no-show %): Pediatric dentistry (24.9), Orthodontics (18.4)

In medical applications, the decisions and predictions of algorithms must be explained, in order to justify their reliability or trustworthiness [28]. Instead of deep learning, we preferred traditional machine learning, since its explanatory character [35] brings insight into the incidence of the variables on the output. This is particularly important because the hospital intends to implement tailored actions to reduce the no-show.

The tested algorithms, listed in Table 6, were implemented in the Python programming language [50]. The distribution of the classes is highly unbalanced, with a ratio of 31:8 between show and no-show. To address the class imbalance, we used algorithms suited for imbalanced learning, implemented in imbalanced-learn [27] and scikit-learn [38]. To handle the problem of class balancing, RUSBoost [45] randomly under-samples the majority class at each iteration of AdaBoost [16], which is a
Table 5 Features of the appointments used by the models

Age (categorical). Age at the day of the appointment, as the position in the life cycle: nursling (0-5 months), first infancy (6 months-4 years), second infancy (5-11 years), teenager (12-17 years), young adult (18-25 years).

Sex (categorical). Sex of the patient: male, female.

Commune of residence (categorical). Location of residence of the patient at the commune level: any of the 346 communes of Chile.

Insurance (categorical). Insurance type: Group A (person without housing or income, or migrant), Group B (monthly income < US$425), Group C (monthly income in [US$425, US$621)), Group D (monthly income > US$621), Provisory Insurance (people without health insurance).

Day of the week (categorical). Day of the week of the appointment: Monday-Friday.

Month (categorical). Month of the appointment: January-December.

Hour of the day (categorical). Hour of the day of the appointment, in ranges of one hour: 8hrs-17hrs.

Reservation delay (numerical). Time in weeks from the creation of the appointment to the appointment itself: 0, 1, 2, ...

Historical no-show (numerical). No-show citations divided by total citations prior to the current appointment: a number between 0 and 1.

Historical no-show by specialty (numerical). No-show citations divided by total citations prior to the current appointment, both restricted to the considered specialty: a number between 0 and 1.

Type of appointment (categorical). Type of the appointment, regardless of its medical specialty: first-time appointment, routine appointment, or first-time appointment derived from primary healthcare (PHC).
well-known boosting algorithm shown to improve the classification performance of weak classifiers. Similarly, the Balanced Random Forest classifier balances the minority class by randomly under-sampling each bootstrap sample [8]. On the other hand, Balanced Bagging re-samples using random under-sampling, over-sampling, or SMOTE to balance each bootstrap sample [4, 32, 51]. The final classifier adapted to imbalanced data was Easy Ensemble, which performs random under-sampling and then trains a learner on each subset of the majority class together with all of the minority training set; the learner outputs are combined for the final decision [30]. In turn, Support Vector Machines construct a hyperplane to separate the data points into classes [9]. Logistic regression [15] is a generalized linear model, widely used to predict no-show [1, 7, 20, 22, 40]. We did not use stacking, because these classifiers are likely to suffer from overfitting when the number of minority class examples is small [48, 52].

Table 6 Machine learning algorithms used in this work

imbalanced-learn: RUS Boost, Balanced Random Forest, Balanced Bagging, Easy Ensemble
scikit-learn: Logistic Regression, Random Forest, Ada Boost, Support Vector Machines

We trained and analyzed prediction models by specialty to ensure that each specialty receives unit-specific insights about the reasons correlated with their patients' no-shows. Also, as shown in Section 3, a single model incorporating specialty information through a series of indicator variables is less accurate than our specialty-based models.

The dataset was split by specialty, and each specialty subset was separated into training and testing subsets. The first subset was used to select optimal hyperparameters (selected via grid search on the values described in Table 7) and to train the machine learning algorithms. Due to computing power constraints, each hyperparameter combination's performance was
assessed using 3-fold cross-validation. The testing subset was used to obtain performance metrics.

The hyperparameters that maximised the metric given by (1 - cost) * effectiveness (see Eq. 6 below) were used to train models using 10-fold cross-validation over the training subset, to assess the best algorithm for each specialty. Then, these combinations of best hyperparameters and algorithms were tuned to optimise their classification thresholds, as explained in the Appendix. The tuple (hyperparameter, algorithm, threshold) constitutes a predictive model. Finally, the best predictive model for each medical specialty is chosen as the one that maximises effectiveness/cost (see Eq. 5 below). See Section 2.3 for more details.

2.3 Cost-effectiveness metrics

Custom metrics were developed to better understand the behavior of the trained models and to assess the efficiency of the system. These metrics balance the effectiveness of the predictions and the cost associated with possible prevention actions. This is particularly relevant in public institutions, which have strong budget limitations.

The use of custom cost-effectiveness metrics has two advantages. Firstly, they account for operational costs and constraints in the hospital's appointment confirmation process, while standard machine learning metrics do not. For instance, the number of calls to be made or SMSs to be sent, the number of telephone operators, etc., all incur costs that the hospital must cover. Secondly, they offer an evident interpretation of the results, since we establish a balance between the expected no-show reduction and the number of actions to be taken. For instance, a statement such as "in order to reduce the no-show in ophthalmology by 30%, we need to contact 40% of daily appointments" can be easily understood by operators and decision-makers.

To construct these metrics, we used the proportion P_C of actions to be carried out, based on model predictions:

P_C = (FP + TP) / N,    (1)

where FP and TP are the number of false and true positives, respectively (analogously for FN and TN), and N = FP + TP + FN + TN is the total number of appointments (for the specialty). This quantity can be seen as a proxy of the cost of actions taken to prevent no-shows.
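The cost-effectiveness quantities of this section can be computed directly from the confusion-matrix counts. A minimal sketch (the function name is ours; the counts in the worked example are chosen to reproduce the P_C = 0.01, P_R = 0.05 illustration discussed later in this section):

```python
def cost_effectiveness(tp, fp, fn, tn):
    """Cost-effectiveness quantities of Section 2.3 from confusion-matrix counts."""
    n = tp + fp + fn + tn
    p_c = (fp + tp) / n       # Eq. 1: proportion of actions (cost proxy)
    nsp_i = (fn + tp) / n     # Eq. 2: existing no-show rate
    nsp_f = fn / n            # Eq. 3: no-show rate if all TP attend
    p_r = 1 - nsp_f / nsp_i   # Eq. 4: equals TP / (FN + TP)
    m1 = p_r / p_c            # Eq. 5: effectiveness / cost
    m2 = p_r * (1 - p_c)      # Eq. 6: effectiveness * (1 - cost)
    return p_c, p_r, m1, m2

# Contacting 1% of appointments (P_C = 0.01) while avoiding 5% of the
# no-shows (P_R = 0.05) yields m1 = 5, as in the example given in the text.
p_c, p_r, m1, m2 = cost_effectiveness(tp=5, fp=5, fn=95, tn=895)
```

Note that only the four counts are needed, so these metrics can be evaluated on the testing subset of any specialty at negligible cost.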
The second quantity used to define our custom metrics is the proportion P_R of no-show reduction, obtained from model predictions. First, let NSP_i be the existing no-show rate, and NSP_f the no-show rate obtained after considering that all TP cases attend their appointment. That is:

NSP_i = (FN + TP) / N,    (2)
NSP_f = FN / N.    (3)

Then, P_R, computed as

P_R = 1 − NSP_f / NSP_i = 1 − FN / (FN + TP) = TP / (FN + TP),    (4)

measures the effectiveness of the prediction. To assess the trade-off between cost and effectiveness, we defined the metrics:

m1 := effectiveness / cost = P_R / P_C,    (5)
m2 := effectiveness · (1 − cost) = P_R · (1 − P_C).    (6)

Here, P_R is the proportion of correctly predicted no-shows from the total actual no-shows, a measure of efficiency. Conversely, P_C corresponds to the proportion of predicted no-shows from the total analyzed appointments, a measure of cost (the number of interventions to be performed). Hence, m1 is the ratio between the proportion of no-shows avoided by the intervention and the proportion of interventions. In turn, m2 is the product (combined effect) of the proportion of no-shows avoided by the intervention and the proportion of predicted shows (appointments not to be intervened).

Thus, a 10% increase in m1 can be produced by a 10% increase of P_R (an increase of correctly predicted no-shows) or a 10% decrease of P_C (a decrease in the number of interventions to be performed). Similarly, a 10% increase of m2 can be produced by a 10% increase of P_R (an increase of correctly predicted no-shows) without performing more interventions, or a 10% increase of 1 − P_C (a decrease in the number of interventions to be performed) without changing P_R.

These two metrics are used to construct and select the best predictive models for each specialty. This decision is supported by the fact that, by construction, both metrics have higher values when the associated model performs better in a (simple) cost-effectiveness sense, and is therefore preferred according to our methodology. Then, since the range of m2 is bounded (it takes values between 0 and 1), we used it as the objective function for hyperparameter optimization, which is an intermediate process to construct our predictive models. On the other hand, since m1 is slightly easier to interpret (but possibly unbounded), we used it to select the best predictive model for each studied medical specialty. An analysis of our classification metrics against the Geometric Mean (GM) and Matthews's Correlation Coefficient (MCC) is shown in the Appendix. This is carried out to analyze the bias of these two metrics in the context of an imbalanced dataset.

Regarding the limitations of the proposed metrics, we noticed that, in some occasional cases, the use of m1 recommended very few actions. Indeed, a few medical appointments with high no-show probability generate a high classification threshold, yielding a high value of m1. For example, when the model recommends confirming only the top 1% of the appointments (i.e., P_C = 0.01), but this also reduces the no-show rate by 5% (i.e., P_R = 0.05), we obtain m1 = 5. To overcome this problem in a heuristic way, and also for practical reasons (values of m2 are bounded), we use the metric m2 for the hyperparameter optimization process. However, we keep m1 to select the best predictive model for each specialty, because it is easier to interpret than m2.

Another approach used in the literature is the comparison of models through costs instead of a cost-effectiveness analysis, for example, the minimization of both the costs of outreaches and the opportunity cost of no-shows. For instance, in the context of overbooking, Berg et al. [2] suggested that the cost function to be minimized could balance the cost of prevention (predicted no-shows multiplied by the cost of intervention) and the cost of no-shows (real no-shows multiplied by the cost of a medical consultation). This approach could be adapted to our context to assess mitigation actions (such as phone calls) through more realistic criteria. However, this is beyond the scope of this research and will be the object of future studies.

2.4 Deployment

We designed a computational platform to implement our predictive models as a web application. The front- and back-end were designed in Python using the Django web framework. The input is a spreadsheet containing the appointment's features, such as patient ID and other personal information, medical specialty, date, and time. This data is processed to generate the features described in Table 5.

For each specialty, the labels of all appointments are predicted using the best predictive model. The appointments are sorted in descending order according to the predicted probability of no-show, along with the patient's contact information. The hospital may then contact the patients with the highest probability of no-show to confirm the appointment.
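The ranking step of the platform can be sketched as follows. This is a pure-Python stand-in: the field names and the contact budget are illustrative assumptions, while the real system reads a spreadsheet and uses the per-specialty best model to produce the probabilities:

```python
def rank_for_confirmation(appointments, budget=0.05):
    """Return the highest-risk appointments, at most a `budget` fraction.

    Sorts by predicted no-show probability in descending order, mirroring
    the platform's ranking of appointments to contact.
    """
    ranked = sorted(appointments, key=lambda a: a["p_no_show"], reverse=True)
    k = max(1, int(len(ranked) * budget))
    return ranked[:k]

# Illustrative appointments with model-predicted no-show probabilities.
appts = [{"patient_id": i, "p_no_show": p}
         for i, p in enumerate([0.10, 0.80, 0.35, 0.55, 0.05])]
to_call = rank_for_confirmation(appts, budget=0.4)  # keeps the two riskiest
```

The `budget` parameter plays the role of P_C: it caps the fraction of appointments the hospital's telephone operators are asked to confirm.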
Cardiology RandomForestClassifier 0.55 0.10 0.16 0.13 0.18 1.76 0.16 0.63
Dermatology RandomForestClassifier 0.56 0.13 0.26 0.21 0.22 1.61 0.19 0.65
Endocrinology RandomForestClassifier 0.54 0.20 0.21 0.14 0.33 1.68 0.27 0.66
Gastroenterology BalancedBaggingClassifier 0.68 0.11 0.19 0.15 0.21 1.90 0.19 0.65
General surgery LogisticRegression 0.67 0.19 0.13 0.08 0.40 2.17 0.33 0.72
Genetics BalancedRandomForestClassifier 0.57 0.18 0.23 0.18 0.24 1.32 0.20 0.57
Gynecology BalancedBaggingClassifier 0.65 0.14 0.24 0.19 0.22 1.54 0.19 0.61
Hematology RandomForestClassifier 0.54 0.16 0.16 0.10 0.38 2.31 0.32 0.73
Infectology RandomForestClassifier 0.57 0.11 0.26 0.21 0.21 1.79 0.18 0.64
Nephrology BalancedBaggingClassifier 0.73 0.11 0.15 0.12 0.23 2.17 0.21 0.69
Neurology BalancedBaggingClassifier 0.64 0.12 0.26 0.20 0.23 1.91 0.20 0.68
Nutrition LogisticRegression 0.65 0.10 0.32 0.27 0.16 1.53 0.14 0.60
Oncology RandomForestClassifier 0.50 0.09 0.04 0.03 0.29 3.26 0.26 0.72
Ophthalmology BalancedRandomForestClassifier 0.65 0.13 0.31 0.24 0.21 1.61 0.18 0.62
Orthodontics BalancedBaggingClassifier 0.63 0.17 0.21 0.11 0.47 2.87 0.40 0.80
Otorhinolaryngology BalancedBaggingClassifier 0.61 0.18 0.22 0.14 0.37 2.07 0.30 0.69
Parasitology BalancedBaggingClassifier 0.72 0.12 0.17 0.12 0.26 2.20 0.23 0.65
Pediatric dentistry BalancedBaggingClassifier 0.67 0.11 0.30 0.24 0.20 1.86 0.18 0.66
Pediatrics BalancedRandomForestClassifier 0.63 0.13 0.25 0.19 0.23 1.75 0.20 0.64
Plastic surgery BalancedRandomForestClassifier 0.67 0.21 0.10 0.05 0.47 2.22 0.37 0.76
Psychiatry RandomForestClassifier 0.56 0.14 0.25 0.19 0.25 1.78 0.21 0.65
Pulmonology BalancedRandomForestClassifier 0.61 0.27 0.17 0.09 0.49 1.85 0.36 0.74
Rheumatology BalancedRandomForestClassifier 0.66 0.11 0.22 0.18 0.16 1.54 0.14 0.60
Traumatology BalancedBaggingClassifier 0.65 0.13 0.18 0.14 0.22 1.71 0.20 0.63
Urology BalancedRandomForestClassifier 0.61 0.13 0.19 0.15 0.23 1.73 0.20 0.63
Fig. 3 Features with the strongest label correlation by specialty. All correlations presented have p-values < 0.001

Table 13 Comparison of no-show rates in control and intervention groups in the experimental design (columns: specialty, no-show rate in the control and intervention groups, and reduction in percentage points)
NSP average of the hospital) and 10.7% for the intervention measuring the predictive power of our methods for remote
group, with a reduction of 10.3 percentage points (p-value∼ consultations using telemedicine. Finally, as said before,
0.002). Table 13 shows the no-show rates in both groups for we use cost-effectiveness metrics to construct and select
the different specialties considered in the study. the best predictive models. These metrics are computed
To interpret these results in terms of metrics m1 and as the proportion of avoided no-shows and the proportion
m2 , first, we use the percentage of no-show of the control of appointments identified as possible no-shows. Although
group as a proxy for the value NSPi . This percentage simple, these metrics were enough for our purposes. They
also coincides with the historical no-show of the hospital, permit us to consider the hospital’s needs where resources
which justifies this decision. We obtained PR = (21.0% − are scarce, and it is not desirable to contact many patients.
10.7%)/21.0% = 0.46 and PC = 247/4, 617 = However, considering other more complex cost metrics
0.05. This can be read as follows: calling the top 5% (such as in Berg et al. [2]) could bring realism to our
of appointments ordered from higher to lowest no-show methodology and can be the object of a future study.
probability generates a 46% decrease in no-shows. Thus, in Some of the limitations of this study are that we work
terms of the metrics, we get m1 = PR /PC = 9.80 and in pediatric settings, and extending our work to adult
m2 = PR (1 − PC ) = 0.47. appointments will require us to train the models again.
We are currently working on that by gathering funding to
study no-shows for adults and combining urban and rural
4 Conclusions, perspectives populations. In addition, this paper shows only the reduction
in no-shows that calling had compared to a control group.
We have presented the design and implementation of machine learning methods applied to the no-show problem in a pediatric hospital in Chile. It is the most extensive work using Chilean data, and among the few in pediatric settings. The novelty of our approach is fourfold:

1. The use of extensive historical data to train machine learning models.
2. The selection of the most suitable machine learning model for each specialty from among various methods.
3. The development of tailored cost-effectiveness metrics to account for possible preventive interventions.
4. The realization of an experimental design to measure the effectiveness of our predictive models in real conditions.

Our results show considerable variability among specialties in terms of the predictive power of the features. Although reservation delay and historical no-shows are consistently strong predictors across most specialties, variables such as the patient's age, the time of day, or the appointment type must not be overlooked.

Future work includes testing the effect of adding weather variables; however, including weather forecasts from external sources poses additional technical implementation challenges. Another interesting line of future research is cheaper forms of contacting patients, such as SMS or WhatsApp messages written by automatic agents.

The implementation of actions based on the results provided by our platform may yield a noticeable reduction of avoidable no-shows. Using a prototype at Dr. Luis Calvo Mackenna Hospital, a phone-call intervention in a subset of medical specialties has resulted in a no-show rate 10.3 percentage points lower. This research is a concrete step towards reducing non-attendance in this healthcare provider. Other actions, such as appointment reminders via phone calls, text messages, or e-mail, special scheduling rules according to patient characteristics, or even arranging transportation for patients from distant communes, could be implemented in the future. However, all these actions rely on good detection of possible no-shows to maximize their effect subject to a limited budget.

Appendix A: Threshold tuning

The optimal classification thresholds were obtained by balancing type I and II errors (defined in Eqs. 7 and 8) for each method, following [22].
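The threshold selection just described can be sketched as a simple grid search over candidate thresholds, minimising a weighted sum of the two error rates. This is an illustrative sketch, not the paper's implementation: the weights `w1` and `w2` stand in for the cost-effectiveness weighting, and the 0.01-step grid is an arbitrary choice.

```python
def type_errors(y_true, p_hat, thr):
    """Type I error: share of attended appointments flagged as no-shows;
    type II error: share of actual no-shows left unflagged."""
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= thr)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < thr)
    n_neg = sum(1 for y in y_true if y == 0)
    n_pos = len(y_true) - n_neg
    return fp / n_neg, fn / n_pos

def best_threshold(y_true, p_hat, w1=1.0, w2=1.0):
    # grid-search the threshold minimising the weighted sum of both errors
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: w1 * type_errors(y_true, p_hat, t)[0]
                                   + w2 * type_errors(y_true, p_hat, t)[1])

# tiny synthetic example: three attended appointments, two no-shows
y = [0, 0, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.8, 0.9]
thr = best_threshold(y, p)
# the chosen threshold separates the two classes, so both errors are zero here
```

Raising `w2` relative to `w1` pushes the threshold down, flagging more appointments as likely no-shows at the price of more false alarms — which is the trade-off the cost-effectiveness criteria are meant to arbitrate.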
Fig. 5 Type I and II errors as a function of the method classification threshold. Threshold p is selected as the minimiser of their weighted sum. [figure]

Table 14 Bias $B_\mu(\lambda_{PP}, \lambda_{NN}, \delta)$ of each performance metric:

GM: $B_{GM} = 0$

MCC: $B_{MCC} = \dfrac{\lambda_{PP}+\lambda_{NN}-1}{2\sqrt{\left[\lambda_{PP}+(1-\lambda_{NN})\frac{1-\delta}{1+\delta}\right]\left[\lambda_{NN}+(1-\lambda_{PP})\frac{1+\delta}{1-\delta}\right]}} - \dfrac{\lambda_{PP}+\lambda_{NN}-1}{2\sqrt{\left[\lambda_{PP}+(1-\lambda_{NN})\right]\left[\lambda_{NN}+(1-\lambda_{PP})\right]}}$

m1: $B_{m_1} = \dfrac{2\lambda_{PP}}{\lambda_{PP}(1+\delta)+(1-\lambda_{NN})(1-\delta)} - \dfrac{2\lambda_{PP}}{\lambda_{PP}+(1-\lambda_{NN})}$

m2: $B_{m_2} = \dfrac{\lambda_{PP}}{2}\left[\lambda_{NN}(1-\delta)-\lambda_{PP}(1+\delta)\right] - \dfrac{\lambda_{PP}}{2}\left(\lambda_{NN}-\lambda_{PP}\right)$
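The bias expressions in Table 14 can be cross-checked numerically by computing each metric directly from a confusion matrix parameterised by (λPP, λNN, δ). The sketch below is an assumption-laden verification, not code from the paper: it takes λPP and λNN to be the true positive and true negative rates, the positive-class prevalence to be (1 + δ)/2, and the MCC to be normalised to [0, 1].

```python
import math

def confusion(lpp, lnn, delta):
    # fractions of the dataset in each confusion-matrix cell,
    # assuming positive-class prevalence (1 + delta) / 2
    p_pos, p_neg = (1 + delta) / 2, (1 - delta) / 2
    tp, fn = lpp * p_pos, (1 - lpp) * p_pos
    tn, fp = lnn * p_neg, (1 - lnn) * p_neg
    return tp, fp, fn, tn

def gm(lpp, lnn, delta):
    # geometric mean of sensitivity and specificity: independent of delta
    return math.sqrt(lpp * lnn)

def mcc_norm(lpp, lnn, delta):
    tp, fp, fn, tn = confusion(lpp, lnn, delta)
    raw = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (raw + 1) / 2  # normalised to [0, 1]

def bias(metric, lpp, lnn, delta):
    # B_mu: metric on the imbalanced data minus metric on balanced data
    return metric(lpp, lnn, delta) - metric(lpp, lnn, 0.0)

def mcc_bias_closed(lpp, lnn, delta):
    # closed form as reconstructed in Table 14
    t = lpp + lnn - 1
    a = math.sqrt((lpp + (1 - lnn) * (1 - delta) / (1 + delta))
                  * (lnn + (1 - lpp) * (1 + delta) / (1 - delta)))
    b = math.sqrt((lpp + (1 - lnn)) * (lnn + (1 - lpp)))
    return t / (2 * a) - t / (2 * b)

lpp, lnn, delta = 0.7, 0.8, 2 * 8 / 31 - 1  # delta value used in Fig. 7
print(bias(gm, lpp, lnn, delta))   # 0.0: GM is unaffected by imbalance
# the closed form agrees with the direct computation
print(abs(bias(mcc_norm, lpp, lnn, delta) - mcc_bias_closed(lpp, lnn, delta)) < 1e-12)
```

Under these assumptions GM has exactly zero bias for any δ, which is why it serves as the unbiased baseline in the heat maps of Fig. 7.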
Fig. 7 Heat maps of the bias of each performance metric with δ = 2 × 8/31 − 1. [figure]

We determined the impact of the imbalance using the bias of each metric, given by B_μ(λ_PP, λ_NN, δ), where λ_PP is the percentage of true positives, λ_NN is the percentage of true negatives, and δ, the imbalance coefficient, is given by 2m_p/m − 1, where m_p is the total number of positive elements and m is the total number of elements.

Table 14 shows the definition of the bias for the Geometric Mean (GM), Matthews's Correlation Coefficient (MCC), and the proposed metrics m1 and m2. The first two were selected as benchmarks, since they are known to perform well on imbalanced datasets [31]. Since the imbalance coefficient δ of our dataset is 2 × 8/31 − 1, the bias depends only on λ_PP and λ_NN. Figure 7 shows the bias of each metric in a heat map. Metrics m1 and m2 have a low bias for most values of the parameters, with m2 showing the best performance. Using both metrics reduces the impact in areas with a high bias.

A.3 ML metrics of the best models for each specialty

Table 15 gives more information about the best model in each specialty.
Table 15 Additional performance metrics of the best model for each medical specialty
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s10729-022-09626-z.

Acknowledgements This work was partly supported by Fondef Grant ID19I10271, Fondecyt grants 11201250, 1181179 and 1201982, and Center for Mathematical Modeling (CMM) BASAL fund FB210005 for center of excellence, all from ANID-Chile; as well as Millennium Science Initiative Program grants ICN17_002 (IMFD) and ICN2021_004 (iHealth).

Declarations

Ethics approval This research was carried out according to international standards on data privacy, and was approved by the Faculty Committee for Ethics and Biosecurity.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Alaeddini A, Yang K, Reddy C, Yu S (2011) A probabilistic model for predicting the probability of no-show in hospital appointments. Health Care Manag Sci 14:146–157
2. Berg BP, Murr M, Chermak D, Woodall J, Pignone M, Sandler RS, Denton BT (2013) Estimating the cost of no-shows and evaluating the effects of mitigation strategies. Med Decis Making 33:976–985. https://doi.org/10.1177/0272989X13478194
3. Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
4. Breiman L (2004) Bagging predictors. Mach Learn 24:123–140
5. Bush R, Vemulakonda V, Corbett S, Chiang G (2014) Can we predict a national profile of non-attendance pediatric urology patients: a multi-institutional electronic health record study. Inform Prim Care 21:132
6. Cameron S, Sadler L, Lawson B (2010) Adoption of open-access scheduling in an academic family practice. Can Fam Physician 56:906–911
7. Carreras-García D, Delgado-Gómez D, Llorente-Fernández F, Arribas-Gil A (2020) Patient no-show prediction: A systematic literature review. Entropy 22
8. Chen C, Breiman L (2004) Using random forest to learn imbalanced data. University of California, Berkeley
9. Cortes C, Vapnik VN (1995) Support-vector networks. Mach Learn 20:273–297
10. da Costa TM, Salomão PL, Martha AS, Pisa IT, Sigulem D (2010) The impact of short message service text messages sent as appointment reminders to patients' cell phones at outpatient clinics in São Paulo, Brazil. Int J Med Inform 79:65–70. https://doi.org/10.1016/j.ijmedinf.2009.09.001
11. Dantas LF, Fleck JL, Oliveira FLC, Hamacher S (2018) No-shows in appointment scheduling – a systematic literature review. Health Policy 122:412–421
12. Denney J, Coyne S, Rafiqi S (2019) Machine learning predictions of no-show appointments in a primary care setting. SMU Data Sci Rev 2:2
13. Devasahay SR, Karpagam S, Ma NL (2017) Predicting appointment misses in hospitals using data analytics. mHealth 3:12
14. Elvira C, Ochoa A, Gonzalvez JC, Mochon F (2018) Machine-learning-based no show prediction in outpatient visits. Int J Interact Multimed Artif Intell 4:29
15. Freedman D (2005) Statistical models: theory and practice. Cambridge University Press, Cambridge
16. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55:119–139. https://doi.org/10.1006/jcss.1997.1504
17. Gupta D, Wang WY (2012) Patient appointments in ambulatory care. In: Handbook of healthcare system scheduling. International series in operations research and management science, vol 168. Springer, New York, pp 65–104. https://doi.org/10.1007/978-1-4614-1734-7_4
18. Gurol-Urganci I, de Jongh T, Vodopivec-Jamsek V, Atun R, Car J (2013) Mobile phone messaging reminders for attendance at healthcare appointments. Cochrane Database of Systematic Reviews
19. Guzek LM, Fadel WF, Golomb MR (2015) A pilot study of reasons and risk factors for "no-shows" in a pediatric neurology clinic. J Child Neurol 30:1295–1299
20. Harvey HB, Liu C, Ai J, Jaworsky C, Guerrier CE, Flores E, Pianykh O (2017) Predicting no-shows in radiology using regression modeling of data available in the electronic medical record. J Am Coll Radiol 14:1303–1309
21. Hu M, Xu X, Li X, Che T (2020) Managing patients' no-show behaviour to improve the sustainability of hospital appointment systems: Exploring the conscious and unconscious determinants of no-show behaviour. J Clean Prod 269:122318
22. Huang Y, Hanauer DA (2014) Patient no-show predictive model development using multiple data sources for an effective overbooking approach. Appl Clin Inform 5:836–860
23. Junod Perron N, Dominicé Dao M, Kossovsky MP, Miserez V, Chuard C, Calmy A, Gaspoz JM (2010) Reduction of missed appointments at an urban primary care clinic: A randomised controlled study. BMC Fam Pract 11:79
24. Kong Q, Li S, Liu N, Teo CP, Yan Z (2020) Appointment scheduling under time-dependent patient no-show behavior. Queuing Theory eJournal
25. Kuo YH, Balasubramanian H, Chen Y (2020) Medical appointment overbooking and optimal scheduling: tradeoffs between schedule efficiency and accessibility to service. Flex Serv Manuf J 32:72–101
26. Lacy NL, Paulman A, Reuter MD, Lovejoy B (2004) Why we don't come: patient perceptions on no-shows. Ann Fam Med 2:541–545
27. Lemaître G, Nogueira F, Aridas CK (2017) Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 18:1–5. http://jmlr.org/papers/v18/16-365.html
28. Li X, Xiong H, Li X, Wu X, Zhang X, Liu J, Bian J, Dou D (2021) Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond. arXiv:2103.10689
29. Lin CL, Mistry N, Boneh J, Li H, Lazebnik R (2016) Text message reminders increase appointment adherence in a pediatric clinic: A randomized controlled trial. Int J Pediatr 2016
30. Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B
Affiliations
J. Dunstan1,2 · F. Villena1 · J.P. Hoyos3 · V. Riquelme1 · M. Royer4 · H. Ramírez1,5 · J. Peypouquet6