Deep Learning for Predicting Appointment No-Shows
M Dashtban
Informatics Research Centre, Henley Business School, University of Reading
m.h.dashtban@pgr.reading.ac.uk

Weizi Li
Informatics Research Centre, Henley Business School, University of Reading
weizi.li@henley.ac.uk
URI: https://hdl.handle.net/10125/59809
ISBN: 978-0-9981331-2-6 Page 3731
(CC BY-NC-ND 4.0)
In this research, we develop a novel method to predict non-attendance based on state-of-the-art machine learning algorithms and big data (both in-hospital and outside-hospital data). The proposed method is an end-to-end deep learning model based on sparse stacked denoising autoencoders (SSDAE), which are among the latest autoencoders introduced in the literature. We adopt the SSDAE for data reconstruction and prediction. Our model first learns a compact representation of the data in which missing values are recovered, resulting in a better data representation. It then predicts the non-attendance event directly with an integrated softmax classification layer. Our approach is demonstrated to be more accurate [...] incorporating the prediction model into hospital systems and daily practices. This research will help hospitals target interventions and messages to patients and reduce the non-attendance rate.

2. Related Work

Existing research on non-attendance mainly focuses on traditional quantitative and qualitative methods for analysing factors and estimating probabilities for population groups.

2.1 Existing methods in analysing non-attendance in hospital appointments

Most research in this domain studies factors contributing to non-attendance, either in a specific specialty or across all appointments in hospitals or general practice. A variety of factors were found to affect patient attendance in a pediatric urology unit [11], pulmonary rehabilitation [12], [13], psychiatry [14]–[16], HIV care [17], primary care [18], and hospital inpatient and outpatient settings [19], by analyzing multiple correlations in hospital administrative databases. A few studies also used surveys and interviews to explore and compare the views of patients and health professionals on the reasons for non-attendance [20]–[23]. Factors relating to inaccessibility, including physical location [24], opening hours and days [25], and barriers such as language, stigma and cultural differences [26], [27], may all be important. However, the interplay between the accessibility of a service and the perceived worthiness of the attendee, or ‘candidacy’, and competing priorities [20], [22], [28], [29] (both self-perceived and as perceived by the service provider) can also lead to differences in how likely particular groups are to ‘get into, through and on’ with services [30]. Morbidity differences can also affect attendance where the illness reduces the ability to navigate access to the health care system [14]. Variation in social and economic circumstances may mean certain times are inconvenient [31] and/or that the perceived importance of the appointment may vary between social groups in and of itself, or in the context of wider life complexities. Within psychiatry, for example, one study found that alcohol and drug users had particularly high non-attendance rates [6], [14].

However, the above studies focused on single disease areas. Studies of single disease areas have produced conflicting results when it comes to designing effective interventions to reduce non-attendance [32]–[35]. This may be due to a reliance on small data sets and limited variables in certain specialty settings. Non-attendance in primary care [18] and in hospital inpatient and outpatient appointments across all specialties [19] has been studied with a focus on single missed appointments. Factors reported to be associated include age, sex, transport logistics, and clinic or practitioner factors such as booking efficiency and the rapport between staff and patients [21], [22], [31], [33], [36], [37]. Williamson et al. [9] and Ellis et al. [1] focused on the patient demographic and practice factors that predict serial missed appointments in general practice. Although those studies considered multiple missed appointments as one of the factors, they used only a limited number of patient variables (age, gender, SIMD, distance, ethnicity, number of consultants per year per patient) and practice variables (SIMD, appointment delay, number of available appointments per patient, average appointment length per patient, urban/rural classification). This has led to limited coverage of personal health, behavioral, environmental and social support information in the prediction model, which therefore cannot reveal the whole spectrum of patterns at the individual level. How the whole spectrum of patterns affects patients’ attendance behavior remains unclear.

Furthermore, those studies use population-based techniques rather than working at an individual patient level. For example, logistic regression is mostly used to predict the probability of non-attendance by fitting numerical or categorical predictor variables in the data to a logit function [1], [38]. The problem with these population-based methods is that they do not differentiate between the behaviors of individual persons and are based on small datasets, which limits the effectiveness of their predictions in practice. At present, little agreement exists on what works in practice to reduce missed appointments [1]. We use a deep learning method to consider a wide range of factors, extract important features and meaningful patterns from a large dataset, and make more accurate predictions at the individual level.

2.2 Deep learning in healthcare

Compared with traditional statistical methods, deep learning methods have attracted many researchers and
institutions in clinical research tasks which are difficult or even impossible to solve with traditional methods [39], [40]. They are more robust in learning knowledge from high-dimensional and high-volume data such as health, socio-economic and environmental information. They have proven competent at identifying patterns and dependencies, in some cases outperforming human experts. Therefore, deep learning methods offer great potential to present the whole picture embedded in large-scale data and reveal unknown structure, to better serve the prediction of non-attendance risk and effective engagement to optimize health resource usage.

Deep learning classification from EPR was initially studied to predict disease progression. For example, Choi et al. [41] applied a recurrent neural network (RNN) to longitudinal time-stamped EPR to predict diagnoses and medications for the subsequent visit by building a generic temporal predictive model that covers observed medical conditions and medication uses, followed by the development of a specific heart failure prediction model. Pham et al. [42] utilized the long short-term memory (LSTM) method to model disease progression and predict future risk. Recently, more attention has been paid to using deep learning methods to predict the risk of readmission. For example, Nguyen et al. [43] and Wang et al. [44] applied convolutional neural network methods to detect and combine predictive local clinical motifs to stratify the risk of readmission. Jamei et al. [45] developed an artificial neural network model to predict the all-cause risk of 30-day hospital readmission, and Xiao et al. [46] developed a hybrid deep learning model that combines topic modelling and RNN to embed clinical concepts in short-term local context and long-term global context to predict readmission. Rajkomar et al. [47] further developed a scalable deep learning model using RNN for prediction across multiple centers without site-specific data harmonization, which was validated on the readmission task. However, to the best of our knowledge, there is no research available on predicting non-attendance risk using deep learning methods.

3. Deep learning model based on sparse stacked denoising autoencoders (SSDAE)

It is well known that hospital systems contain a large amount of missing values [48]. There are several algorithms in the literature to deal with such issues. The simplest way is to replace the missing values with the mean, the median, or some other statistic. This is fast and simple but not effective, as it does not take into account the relations of such missing values with other known/unknown values. In contrast, the SSDAE is an AI solution for reconstructing the whole data instead of recovering each value independently. Additionally, learning highly non-linear and complicated patterns such as the relations among input features is one of the prominent characteristics of SSDAE [49]. To this end, in this paper, the SSDAE was employed for recovering the whole data at the first step (after data preparation from our hospital EPR system).

A denoising autoencoder (DAE) is simply a neural network with one hidden layer that is trained to reconstruct a clean version of the input x from a corrupted/current version x′. This is accomplished by a so-called encoder, a deterministic mapping from an input vector x into a hidden representation y. X is the in-hospital and outside-hospital datasets with the variables used to predict a patient’s non-attendance.

\[ \mathbf{y} = f_{\theta}(\mathbf{x}) = s(\mathbf{W}\mathbf{x} + \mathbf{b}) \]

where the parameter θ is (W, b), W is a weight matrix indicating the weight of each of the contributing variables of patients with non-attendance, and b is an encoding bias vector.

In stacking SSDAE, as demonstrated in Figure 1, the autoencoder layers are placed on top of each other. Each layer is trained independently (‘greedily’) and then stacked on top of the previous one. In denoising autoencoders, the loss function minimizes the reconstruction loss between a clean X and its reconstruction from Y [50]. A decoder is then used to map the latent representation y into a reconstructed (‘repaired’) vector z ∈ [0,1]^d of the same dimension d as the input:

\[ \mathbf{z} = g_{\theta'}(\mathbf{y}) = s(\mathbf{W}'\mathbf{y} + \mathbf{b}') \]

where W′ is a decoding matrix and b′ is a decoding bias vector. The SSDAE can have several layers. To train an SSDAE, each layer is trained on top of the previous one. The training process starts with pre-training the first hidden layer, fed with the training samples as input, then training the second hidden layer with the outputs flowing from the first hidden layer, and so on. This is how autoencoders stack hierarchically to form a deep SSDAE. The parameters of the model θ and θ′ are optimized during the training phase to minimize the average reconstruction error,

\[ \theta^{*}, \theta'^{*} = \arg\min_{\theta, \theta'} L(\mathbf{x}, \mathbf{z}) = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\left(\mathbf{x}^{(i)}, \mathbf{z}^{(i)}\right) \]

where L(x, z) is a loss function and N is the number of data samples in the training set. The reconstruction cross-entropy function is usually used as the loss, as depicted in the equation below:

\[ L_{H}(\mathbf{x}, \mathbf{z}) = -\sum_{k=1}^{d} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right] \]
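As an illustration of the encoder, decoder, and reconstruction cross-entropy defined in this section, the following is a minimal NumPy sketch of one forward pass through a single denoising autoencoder. It is not the implementation used in this work: the logistic sigmoid for s, the masking-noise corruption, the layer sizes, and all function names are assumptions made for the example.

```python
import numpy as np

def sigmoid(a):
    # Assumed choice for the activation s(.)
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    """Encoder: y = s(W x + b)."""
    return sigmoid(W @ x + b)

def decode(y, W_prime, b_prime):
    """Decoder: z = s(W' y + b')."""
    return sigmoid(W_prime @ y + b_prime)

def cross_entropy(x, z, eps=1e-12):
    """Reconstruction cross-entropy L_H(x, z), summed over the d input dimensions."""
    z = np.clip(z, eps, 1.0 - eps)  # guard against log(0)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))

def corrupt(x, drop_prob, rng):
    """Masking noise (assumed corruption scheme): zero each entry with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

rng = np.random.default_rng(0)
d, h = 8, 4  # illustrative input and hidden sizes
W = rng.normal(scale=0.1, size=(h, d))
b = np.zeros(h)
W_prime = rng.normal(scale=0.1, size=(d, h))
b_prime = np.zeros(d)

x = rng.random(d)                   # a clean sample already scaled to [0, 1]
x_corrupted = corrupt(x, 0.3, rng)  # corrupted version x'
y = encode(x_corrupted, W, b)       # hidden representation
z = decode(y, W_prime, b_prime)     # reconstruction in (0, 1)^d
loss = cross_entropy(x, z)          # compared against the clean x, not x'
```

The loss is computed against the clean input rather than the corrupted one, which is what pushes the hidden representation to fill in missing or corrupted entries.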
Figure 1 Non-attendance prediction model integrated with hospital appointment system
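The layer-by-layer stacking described in Section 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the model used in this work: it omits the sparsity penalty, uses plain full-batch gradient descent with masking noise, stores samples as rows (so the encoder is written as X W rather than W x), and the layer sizes, learning rate, and function names are all hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae_layer(X, n_hidden, rng, drop_prob=0.2, lr=0.5, epochs=200):
    """Train one denoising autoencoder on X (rows are samples in [0, 1]).

    Returns only the encoder parameters (W, b); the decoder is discarded
    once the layer has been pre-trained.
    """
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) >= drop_prob)  # masking noise
        Y = sigmoid(X_noisy @ W + b)                      # encode
        Z = sigmoid(Y @ W_dec + b_dec)                    # decode
        # Gradient of the mean cross-entropy w.r.t. the decoder pre-activation
        dZ = (Z - X) / n
        # Back-propagate through the decoder weights and the encoder sigmoid
        dY = (dZ @ W_dec.T) * Y * (1.0 - Y)
        W_dec -= lr * (Y.T @ dZ)
        b_dec -= lr * dZ.sum(axis=0)
        W -= lr * (X_noisy.T @ dY)
        b -= lr * dY.sum(axis=0)
    return W, b

def pretrain_stack(X, layer_sizes, rng):
    """Greedy pre-training: each layer is trained on the previous layer's output."""
    params, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_dae_layer(H, n_hidden, rng)
        params.append((W, b))
        H = sigmoid(H @ W + b)  # clean encoding feeds the next layer
    return params, H

rng = np.random.default_rng(0)
X = rng.random((200, 10))                       # synthetic data in [0, 1]
params, codes = pretrain_stack(X, [8, 6, 4], rng)
```

After pre-training, `codes` would be the input to a classification layer, and the whole stack would then be fine-tuned with supervised backpropagation.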
The softmax layer, in its standard form, computes

\[ \sigma(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^{N} e^{x_k}}, \quad j = 1, \dots, N \]

where x is an N-dimensional vector of real numbers from the previous hidden unit, which the layer transforms into a vector of real numbers in the range (0,1); these are the output probabilities for each class. As is clear from the equation, the outputs are always positive and normalized.

In brief, the training of the model comprises two phases, using the training dataset together with its associated labels. In the former phase, we minimize the difference between the recovered and the ground-truth training dataset (X vs X̂). In the latter phase, the purpose is to optimize the model in terms of supervised prediction performance.

It is worth mentioning that training the model using standard backpropagation algorithms alone usually yields poor performance. To this end, a greedy layerwise unsupervised learning algorithm was proposed by [52] to pre-train the SSDAEs layer by layer in a bottom-up way. Afterwards, the model’s parameters are fine-tuned in a top-down direction with backpropagation to improve the performance. The training procedure of this study briefly involves the following steps, drawn from the algorithms proposed in [52], [53].

Step 1: Minimize the objective function of the first autoencoder over the input data
Step 2: Minimize the second autoencoder’s objective function over the output of the previous layer
Step 3: Iterate through steps 1 and 2
Step 4: Obtain the probability of the no-show patient class based on the output of the last hidden layer
Step 5: Optimize the whole network with backpropagation algorithms

The first three steps are unsupervised, as they aim to minimize the reconstruction error; whereas in the last step, where the outputs generated by the last autoencoder are fed to a softmax layer, all stacked layers are optimized using backpropagation as a whole network. The optimization is performed in a supervised way based on the respective class labels.

Moreover, it is critical to consider that the number of hidden layers can strongly influence the performance of the SSDAE. A very shallow SSDAE can perform poorly, whereas a very deep structure (i.e., with many hidden units) makes the constructed model very complex and adversely affects the performance as well. We pre-evaluated some architectures and found that a three-layer SSDAE works better for our application. Selecting a specific architecture by testing over the whole data is highly resource-intensive and not always applicable, so architectures from shallower (one-layer) to deeper (five-layer) were assessed using only the validation set. The shallower networks resulted in poorer performance as they failed to learn a proper representation, while going much deeper added complexity rather than any improvement. Our empirical observation agrees with [50], which also reported stability of results (error convergence) on the three-layer architecture, especially for sparse types.

4. Non-attendance risk prediction using SSDAE

4.1 Data and variables

The data come from in-hospital data (e.g. EPR) and outside-hospital data (e.g. environmental and socio-economic data). From the EPR, the information of over 150,000 outpatients, spanning around 1.6 million records, was gathered. The information is unevenly distributed over 6 years beginning from 2010 and going through early 2018. There are more records in the most recent years, as the EPR system has been progressively extended. Variables selected from different tables and different databases can be classified into 7 categories: demographic, appointment history, inpatient history, outpatient history, deprivation index, weather information, and health status. Those variables were identified through the literature and focus groups with hospital operation teams.

The inpatient database contains information about in-hospital patients. Nevertheless, we used it to take advantage of possibly available historic health data about newly arriving patients. Such historic health records contain diagnostic codes, which in turn can be used to derive some very informative variables, such as psychiatric variables. If a patient had more than one inpatient record, we use only the record where there was an overlap between the inpatient period and the outpatient appointment time, or a gap of less than 14 days between discharge and the outpatient date. This follows from the focus groups, which indicated that patients may choose not to attend an outpatient appointment if it falls within their inpatient stay or close to their discharge date. It should be noted that some variables are conditional. For instance, length of stay (LOS) is used as an input. The LOS is non-zero if and only if the patient had an immediate inpatient record in the EPR. The zero value is used for every empty element in the resulting table.

After digitization, a normalization procedure was applied to center the data and scale them to the closed range [0,1]. The normalization considerably diminishes the adverse effect of large-scale variables, which would otherwise hinder the network from incorporating small-scale attributes in both the neural networks and the classification models [54]. Besides the input variables, the target variable, which is a binary event (show or no-show), should be constructed. The target vector contains either zero or
one according to the event corresponding to each row of information. It should be mentioned that each patient may have several records, of which a few might be of the positive event type. Hence, the model should predict the patient’s behavior given a specific time point and historic information.

4.2 Model training

For training the model, we use all the information before mid-2017 as the training and validation sets. The remaining records were used as testing data for evaluating the model performance. We used a natural split because the model is going to be run over live data: the most recent data samples were used for testing the model, comprising around 25% of all samples. The remaining samples were divided using stratified random sampling into 15% validation and 85% training sets. Stratified random sampling [55] is essential to maintain the original class distribution in both subsets.

In brief, training the model means minimizing the difference between the input data and the recovered replicate (i.e., the output of the autoencoders) while trying to build an overall high-performance classification model with backpropagation. It is noteworthy that pre-training the SSDAE layers is unsupervised, as no label is used. However, the optimization process is supervised, as we exploit the target vector (i.e., the prepared binary labels indicating attendance vs non-attendance). Our method was implemented and evaluated with SQL Server (for fetching data, preparing tables, some cleansing, etc.), Matlab 2018a (deep learning and machine learning packages) and Jupyter Notebook. The experiments were conducted on a machine with a 4 GHz CPU, 32 GB of RAM, a 1 TB high-speed SSD, and a GTX 1080 Ti graphics card with 11 GB of memory and over 3,600 CUDA cores.

4.3 Evaluation

The evaluation phase consists of three practical stages. In the first stage, the original test data was fed into the previously trained model. The trained model produces the recovered version of the test data fed to it while at the same time producing a probability of the no-show event. One important advantage of this model is its flexibility with respect to missing values or incomplete information, which are widespread in real-world practical applications. Another advantage is that the performance of the final model is considerably better than that of a traditional model with the same time complexity. We should note that there are two kinds of time complexity involved in building practical models: the time complexity of training and that of the final product. This model and similar machine learning models, in comparison with some traditional models such as logistic regression, produce the result in a few milliseconds without any extra effort. We do not need to do anything else once the model is built. One critical benefit of our proposed approach is scalability. Scalability is defined in three different ways: (1) in terms of the number of variables, (2) in terms of the number of samples we can use, and, most importantly, (3) in terms of updating the model over time. We can add new variables to the existing model with the same practice. New variables provide a way to incorporate more information into the model, resulting in a more reliable model for managers.

The proposed method was applied to the test data and its performance compared with commonly used classifiers and representative methods. Five well-known machine learning classifiers were used first to give us some insight into the complexity of the prediction: support vector machines (SVM, with linear kernel), k-nearest neighbours (KNN, K=3), Decision Tree (DT), Naïve Bayes, and Random Forest, based on the parameter settings suggested by [56] for imbalanced high-dimensional data. The random forest classifier was used from the widely used ‘sklearn.ensemble’ library in Python with all the default parameter settings except the number of trees, which was set to 100 instead of its default (10).

Considering parameter settings, it is important to note that we could not feed all the data into the neural network at once, during either the training or the testing phase. Data should be fed into the model in small parts called batches. A batch size of 64 samples was used, as adopted primarily with the Adam optimizer [57] and suggested by the previous work of [45]. Other parameters, such as the sparsity weight, were not altered from their default values. Furthermore, logistic regression is the only method used to predict the probability of non-attendance so far [38]. Hence, it is also included as one of the baseline classifiers to compare with our non-attendance model. Table 1 compares the performance of the proposed method with the baseline classifiers.

In our comparison, we evaluate a similar architecture used previously in healthcare [48], as well as the representative work of [1] and a logistic-regression-based method, both of which belong to statistical modelling. As is evident in Table 1, the proposed method markedly outperforms all the conventional methods in various aspects. It obtains an overall accuracy of almost 70%. Only Deep Patient [48], which has a similar deep learning architecture used for disease prediction, comes near in performance. However, as aforementioned in the introduction, our proposed
Table 1 Performance of different prediction methods (including existing non-attendance methods and representative deep learning methods used in EPR)

Method                                         AUC-ROC  Precision  Recall  Accuracy
Our Method                                     0.71     0.69       0.78    0.69
DeepPatient-SVM [48]                           0.69     0.73       0.67    0.67
Logistic Regression non-attendance model [1]   0.54     0.60       0.52    0.51
Logistic + Bayesian non-attendance model [38]  0.49     0.61       0.45    0.46
Random Forest                                  0.54     0.63       0.46    0.51
SVM                                            0.51     0.58       0.49    0.47
KNN                                            0.17     0.14       0.33    0.17
Naïve Bayes                                    0.33     0.25       0.76    0.32
Decision Tree                                  0.29     0.27       0.69    0.27
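The four measures reported in Table 1 can be computed from predicted no-show probabilities as in the following sketch. The function name, the 0.5 decision threshold, the toy labels and scores, and the choice of no-show as the positive class are assumptions for illustration; the paper does not state how its measures were computed.

```python
import numpy as np

def classification_measures(y_true, y_prob, threshold=0.5):
    """AUC-ROC, precision, recall and accuracy for binary labels (1 = no-show, assumed).

    Both classes must be present in y_true for the AUC to be defined.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)

    # Confusion-matrix counts
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / y_true.size

    # AUC-ROC as the probability that a random positive is scored above
    # a random negative, counting ties as one half.
    pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

    return {"auc_roc": float(auc), "precision": precision,
            "recall": recall, "accuracy": accuracy}

# Toy example with made-up labels and scores
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7, 0.8, 0.1]
measures = classification_measures(y_true, y_prob)
```

An AUC-ROC below 0.5, as for some baselines in Table 1, means the scores rank no-shows below attendances more often than not.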
architecture is more integrative than those methods, which separate the prediction phase from the construction phase.

Regarding the other machine learning classifiers, Decision Tree along with KNN had the lowest performance. Nonetheless, Random Forest and SVM achieved the highest accuracy among the other common methods. The RF classifier, with a higher running time in our experiments, had slightly better performance than SVM. One reason could be that random forests usually do not require much tweaking as long as the number of estimators (the trees) is large enough. We selected the model with 100 trees, which outperforms the other baseline methods.

Furthermore, it is worth noting that these baseline methods were used mostly without fine-tuning of hyper-parameter settings, since it is highly time-consuming to consider various settings and improvement steps. This is one reason for the significant differences in performance among these methods. The other reason is the complexity of the hypothesis space, which directly influences the performance of classifiers. For instance, the decision tree, which is usually expected to perform better on high-dimensional unbalanced datasets [58], [59], obtained poorer accuracy than SVM and Naïve Bayes (although it was run with automatic hyper-parameter optimization in the Matlab toolset). One reason, as aforementioned, is the tweaking issue, and the other could be conflicting rules in the decision surface. In such cases, a proper rule induction method like [60] could significantly help to build a better model, which is outside the scope of this paper. Also, considering that the AUC of some of those models is under 0.5, meaning they actually perform worse than a random classifier, it is essential to note the possibility of having such models. Assume a case in which a model has learnt wrong patterns, thereby generating wrong class labels. Such issues could arise from an improper data split, overfitting on the training set, missing values, class imbalance, and other matters.

The proposed method significantly outperformed these methods in terms of various evaluation metrics. Nevertheless, the fine-tuning procedures and dealing with several free parameters are quite challenging. Perhaps in future, with advancing AI technology, we will see high-scale self-adaptable algorithms. From another viewpoint, more relevant and higher-quality data improve the performance of all current models. We believe the current trends for developing health-care systems in the UK follow a consistent strategy to reduce operational costs, reduce clinical costs, and improve clinical outcomes at the same time. Adopting such intelligent algorithms in high-dimensional health-care applications could potentially contribute to this process.

5. Conclusions

In this study, we presented a novel non-attendance prediction method incorporating a wide spectrum of factors relating to health, socio-economics and environment for improved understanding and prediction of patient behaviors. Our approach is applicable to hospital big data from EPR systems. The proposed approach is an end-to-end deep learning model which adopts the latest architecture of sparse stacked denoising autoencoders (SSDAEs). The SSDAEs were used both for data reconstruction and classification. As for reconstruction, the stacked autoencoders were exploited to deal with the missing
values by recovering them and to provide a dense representation. In the prediction phase, a softmax layer, as used in modern deep learning models, was added to the network. This layer produced the probability of the non-attendance event based on the outputs of the last hidden unit in the SSDAE.

Practically, developing the proposed model required three main phases. First was data preparation, which involved gathering and combining various variables from different data tables and databases in the EPR system. Collating data by itself is not enough. Hence, digitization and normalization were performed on the whole data to obtain a proper input for the model. The data was separated into training, validation and testing sets. A target vector denoting the non-attendance events, containing either zero or one, was created for the subsequent supervised training of the model. The model was trained on the training samples and evaluated on the testing samples. The performance of the model on the test set was compared with other classification models, including logistic regression. The experiments illustrated that the proposed model significantly outperformed other models in terms of important evaluation metrics including AUC-ROC, Precision, Recall, and Accuracy.

The constructed model was finally deployed on the current infrastructure to be connected to a reminder system. The limitation of this research is that only “specialty” covers clinically related information on the consultant’s skill and expertise. More detailed clinical data such as diagnosis, treatment specialty, and attendance type (main procedure) will be included in our future work for better results. We will involve wider features from the hospital database and use methods such as Recursive Feature Elimination (RFE) for feature selection to improve accuracy. Further research will also be conducted to predict more detailed patient attendance behaviors, including attendance, non-attendance without prior notification and non-attendance with prior notification, through multi-class classification.

7. References

[1] D. A. Ellis, R. McQueenie, A. McConnachie, P. Wilson, and A. E. Williamson, “Demographic and practice factors predicting repeated non-attendance in primary care: a national retrospective cohort analysis,” Lancet Public Heal., vol. 2, no. 12, pp. e551–e559, 2017.

[2] Department of Health, “A zero cost way to reduce missed hospital appointments,” 2016.

[3] NHS England, “Quarterly Hospital Activity Data,” 2016.

[4] National Audit Service, “NHS waiting times for elective care in England,” 42, 2014.

[5] A. George and G. Rubin, “Non-attendance in general practice: a systematic review and its implications for access to primary health care,” Fam. Pract., vol. 20, no. 2, pp. 178–184, 2003.

[6] K. Campbell, A. Millard, G. McCartney, and S. McCullough, “Who is least likely to attend? An analysis of outpatient appointment ‘did not attend’ (DNA) data in Scotland,” Edinburgh: NHS Health Scotland, 2015.

[7] J. Clark, “Appointment cancellation options-a new system to help decrease no-show appointments,” IACH Inf., vol. 3, 2006.

[8] C. G. Moore, P. Wilson-Witherspoon, and J. C. Probst, “Time and money: effects of no-shows at a family practice residency clinic,” Fam. Med. City-, vol. 33, no. 7, pp. 522–527, 2001.

[9] A. E. Williamson, D. A. Ellis, P. Wilson, R. McQueenie, and A. McConnachie, “Understanding repeated non-attendance in health services: a pilot analysis of administrative data and full study protocol for a national retrospective cohort,” BMJ Open, vol. 7, no. 2, Feb. 2017.

[10] U.-G. Gerdtham and M. Johannesson, “The relationship between happiness, health, and socio-economic factors: results based on Swedish microdata,” J. Socio. Econ., vol. 30, no. 6, pp. 553–557, 2001.
[14] A. J. Mitchell and T. Selmes, “A comparative survey of missed initial and follow-up appointments to psychiatric specialties in the United Kingdom,” Psychiatr. Serv., vol. 58, no. 6, pp. 868–871, 2007.

[15] A. J. Mitchell and T. Selmes, “Why don’t patients attend their appointments? Maintaining engagement with psychiatric services,” Adv. Psychiatr. Treat., vol. 13, no. 6, pp. 423–434, 2007.

[16] H. Killaspy, S. Banerjee, M. King, and M. Lloyd, “Prospective controlled study of psychiatric out-patient non-attendance: Characteristics and outcome,” Br. J. Psychiatry, vol. 176, no. 2, pp. 160–165, 2000.

[17] S. L. Catz, J. B. McClure, G. N. Jones, and P. J. Brantley, “Predictors of outpatient medical appointment attendance among persons with HIV,” AIDS Care, vol. 11, no. 3, pp. 361–373, 1999.

[18] D. Giunta, A. Briatore, A. Baum, D. Luna, G. Waisman, and F. G. B. de Quiros, “Factors associated with nonattendance at clinical medicine scheduled outpatient appointments in a university general hospital,” Patient Prefer. Adherence, vol. 7, pp. 1163–1170, 2013.

[19] P. Kheirkhah, Q. Feng, L. M. Travis, S. Tavakoli-Tabasi, and A. Sharafkhaneh, “Prevalence, predictors and economic consequences of no-shows,” BMC Health Serv. Res., vol. 16, no. 1, p. 13, 2016.

[20] E. Harte et al., “Reasons why people do not attend NHS Health Checks: a systematic review and qualitative synthesis,” Br J Gen Pr., vol. 68, no. 666, pp. e28–e35, 2018.

[21] V. L. Lawson, P. A. Lyne, J. N. Harvey, and C. E. Bundy, “Understanding why people with type 1 diabetes do not attend for specialist advice: A qualitative analysis of the views of people with insulin-dependent diabetes who do not attend diabetes clinic,” J. Health Psychol., vol. 10, no. 3, pp. 409–423, 2005.

[22] C. Martin, T. Perfect, and G. Mantle, “Non-attendance in primary care: The views of patients and practices on its causes, impact and solutions,” Fam. Pract., vol. 22, no. 6, pp. 638–643, 2005.

[23] M. Husain-Gambles, R. D. Neal, O. Dempsey, D. A. Lawlor, and J. Hodgson, “Missed appointments in primary care: questionnaire and focus group study of health professionals,” Br J Gen Pr., vol. 54, no. 499, pp. 108–113, 2004.

[24] K. E. Lasser, I. L. Mintzer, A. Lambert, H. Cabral, and D. H. Bor, “Missed appointment rates in primary care: the importance of site of care,” J. Health Care Poor

clinic: descriptive analyses of consultations over eight years,” Swiss Med. Wkly., vol. 137, no. 47/48, p. 677, 2007.

[26] W. Franks, N. Gawn, and G. Bowden, “Barriers to access to mental health services for migrant workers, refugees and asylum seekers,” J. Public Ment. Health, vol. 6, no. 1, pp. 33–41, 2007.

[27] F. M. Burns, J. Y. Imrie, J. Nazroo, A. M. Johnson, and K. A. Fenton, “Why the (y) wait? Key informant understandings of factors contributing to late presentation and poor utilization of HIV health and social care services by African migrants in Britain,” AIDS Care, vol. 19, no. 1, pp. 102–108, 2007.

[28] M. D. Woods et al., “Vulnerable groups and access to health care: a critical interpretive review,” Natl. Coord. Cent. NHS Serv. Deliv. Organ RD Retrieved May, vol. 27, p. 2012, 2005.

[29] M. Mackenzie, E. Conway, A. Hastings, M. Munro, and C. O’Donnell, “Is ‘candidacy’ a useful concept for understanding journeys through public services? A critical interpretive literature synthesis,” Soc. Policy Adm., vol. 47, no. 7, pp. 806–825, 2013.

[30] A. Hirst, J. Delvaux, S. Rinne, C. Short, and A. McGregor, “Multiple and complex needs initiative: programme evaluation report,” 2009.

[31] R. D. Neal, M. Hussain-Gambles, V. L. Allgar, D. A. Lawlor, and O. Dempsey, “Reasons for and consequences of missed appointments in general practice in the UK: questionnaire survey and prospective review of medical records,” BMC Fam. Pract., vol. 6, no. 1, p. 47, 2005.

[32] T. N. O. Lehmann, A. Aebi, D. Lehmann, M. B. Olivet, and H. Stalder, “Missed appointments at a Swiss university outpatient clinic,” Public Health, vol. 121, no. 10, pp. 790–799, 2007.

[33] K. M. Nielsen, O. Faergeman, A. Foldspang, and M. L. Larsen, “Cardiac rehabilitation: health characteristics and socio-economic status among those who do not attend,” Eur. J. Public Health, vol. 18, no. 5, pp. 479–483, 2008.

[34] S. B. Cashman, J. A. Savageau, C. A. Lemay, and W. Ferguson, “Patient health status and appointment keeping in an urban community health center,” J. Health Care Poor Underserved, vol. 15, no. 3, pp. 474–488, 2004.

[35] Y. Masuda et al., “Personal features and dropout from diabetic care,” Environ. Health Prev. Med., vol. 11, no. 3, pp. 115–119, 2006.

[36] J. Waller and P. Hodgkin, “Defaulters in general practice: Who are they and what can be done about them?,”
Underserved, vol. 16, no. 3, pp. 475–486, 2005. Fam. Pract., vol. 17, no. 3, pp. 252–253, 2000.
[25] V. Chariatte, P. Michaud, A. Berchtold, C. Akré, and J. [37] A. Murdock, C. Rodgers, H. Lindsay, and T. C. K.
Suris, “Missed appointments in an adolescent outpatient Tham, “Why do patients not keep their appointments?
Prospective study in a gastroenterology outpatient clinic,” J. R. Soc. Med., vol. 95, no. 6, pp. 284–286, 2002.

[38] A. Alaeddini, K. Yang, C. Reddy, and S. Yu, “A probabilistic model for predicting the probability of no-show in hospital appointments,” Health Care Manag. Sci., vol. 14, no. 2, pp. 146–157, 2011.

[39] J. Wu, J. Roy, and W. F. Stewart, “Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches,” Med. Care, vol. 48, no. 6, pp. S106–S113, 2010.

[40] W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare: promise and potential,” Heal. Inf. Sci. Syst., vol. 2, no. 1, p. 3, 2014.

[41] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor ai: Predicting clinical events via recurrent neural networks,” in Machine Learning for Healthcare Conference, 2016, pp. 301–318.

[42] T. Pham, T. Tran, D. Phung, and S. Venkatesh, “Deepcare: A deep dynamic memory model for predictive medicine,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2016, pp. 30–41.

[43] P. Nguyen, T. Tran, N. Wickramasinghe, and S. Venkatesh, “Deepr: A Convolutional Net for Medical Records,” IEEE J. Biomed. Heal. Informatics, vol. 21, no. 1, pp. 22–30, 2017.

[44] H. Wang, Z. Cui, Y. Chen, M. Avidan, A. Ben Abdallah, and A. Kronzer, “Cost-sensitive Deep Learning for Early Readmission Prediction at A Major Hospital,” Canada Proc. BIOKDD, no. 17, 2017.

[45] M. Jamei, A. Nisnevich, E. Wetchler, S. Sudat, and E. Liu, “Predicting all-cause risk of 30-day hospital readmission using artificial neural networks,” PLoS One, vol. 12, no. 7, 2017.

[46] C. Xiao, T. Ma, A. B. Dieng, D. M. Blei, and F. Wang, “Readmission prediction via deep contextual embedding of clinical concepts,” PLoS One, vol. 13, no. 4, 2018.

[47] A. Rajkomar, E. Oren, K. Chen, A. Dai, … N. H. preprint arXiv, and U. 2018, “Scalable and accurate deep learning for electronic health records,” arxiv.org.

[48] R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, “Deep patient: an unsupervised representation to predict the future of patients from the electronic health records,” Sci. Rep., vol. 6, p. 26094, 2016.

[49] H.-I. Suk, S.-W. Lee, D. Shen, and A. D. N. Initiative, “Latent feature representation with stacked auto-encoder for AD/MCI diagnosis,” Brain Struct. Funct., vol. 220, no. 2, pp. 841–859, 2015.

[50] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, no. Dec, pp. 3371–3408, 2010.

[51] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in neural information processing systems, 2012, pp. 341–349.

[52] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.

[53] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in Advances in neural information processing systems, 2007, pp. 153–160.

[54] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.

[55] A. I. Marqués, V. García, and J. S. Sánchez, “On the suitability of resampling techniques for the class imbalance problem in credit scoring,” J. Oper. Res. Soc., vol. 64, no. 7, pp. 1060–1070, 2013.

[56] M. Dashtban and M. Balafar, “Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts,” Genomics, vol. 109, no. 2, pp. 91–107, 2017.

[57] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv Prepr. arXiv:1412.6980, 2014.

[58] L. Rokach and O. Z. Maimon, Data mining with decision trees: theory and applications, vol. 69. World scientific, 2008.

[59] M. Dashtban, M. Balafar, and P. Suravajhala, “Gene selection for tumor classification using a novel bio-inspired multi-objective approach,” Genomics, vol. 110, no. 1, pp. 10–17, 2018.

[60] G. M. Weiss and F. Provost, “Learning when training data are costly: The effect of class distribution on tree induction,” J. Artif. Intell. Res., vol. 19, pp. 315–354, 2003.