Abstract
Simulated allocation models (SAMs) are used to evaluate organ allocation policies. An important component of SAMs is a module that decides whether each potential recipient will accept an offered organ. The objective of this study was to develop and test accept-or-decline classifiers based on several machine-learning methods in an effort to improve the SAM for liver allocation. Feature selection and imbalance correction methods were tested, and the best approaches were identified for application to organ transplant data. Then, we used 2011 liver match-run data to compare classifiers based on logistic regression, support vector machines, boosting, classification and regression trees, and Random Forests. Finally, because the accept-or-decline module will be embedded in a simulation model, we also developed an evaluation tool for comparing the performance of predictors, which we call sample-path accuracy. The Random Forest method resulted in the smallest overall error rate, and boosting techniques had greater accuracy when sensitivity and specificity were simultaneously considered important. Our comparisons show that no method dominates all others on all performance measures of interest. A logistic regression-based classifier is easy to implement and allows for pinpointing the contribution of each feature toward the probability of acceptance. The other methods we tested did not have a similar interpretation. The Scientific Registry of Transplant Recipients decided to use the logistic regression-based accept-decline decision module in the next generation of the liver SAM.
References
Ahn JH, Hornberger JC (1996) Involving patients in the cadaveric kidney transplant allocation process: a decision-theoretic perspective. Manag Sci 42(5):629–641
Alagoz O, Maillart LM, Schaefer AJ, Roberts MS (2007) Determining the acceptance of cadaveric livers using an implicit model of the waiting list. Oper Res 55(1):24–36
Bertsimas D, Chang A (2012) ORC: ordered rules for classification. A discrete optimization approach to associative classification. Research working paper
Bishop CM (2006) Pattern recognition and machine learning. Springer
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Campbell C, Ying Y (2011) Learning with support vector machines. Morgan & Claypool Publishers
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Cover TM, Thomas JA (2012) Elements of information theory. Wiley-Interscience
David I, Yechiali U (1985) A time-dependent stopping problem with application to live organ transplants. Oper Res 33(3):491–504
Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley-Interscience
Gentry S, Chow E, Wickliffe C, Massie A, Leighton T, Snyder JJ, Israni AK, Kasiske BL, Segev D (2013) Relationship of cold ischemia time to estimated transportation time for liver allografts. Research working paper
Harper AM, Taranto SE, Edwards EB, Daily OP (2000) An update on a successful simulation project: the UNOS liver allocation model. In: Proceedings of the Winter Simulation Conference, pp 1955–1962
Howard DH (2002) Why do transplant surgeons turn down organs? A model of the accept/reject decision. J Health Econ 21(6):957–969
Jolliffe I (1982) A note on the use of principal components in regression. J R Stat Soc Ser C (Appl Stat) 31(3):300–303
Kira K, Rendell LA (1992) A practical approach to feature selection. In: Proceedings of the ninth international workshop on machine learning. Morgan Kaufmann Publishers Inc., pp 249–256
Kreke J, Schaefer AJ, Angus DC, Bryce CL, Roberts MS (2002) Incorporating biology into discrete event simulation models of organ allocation. In: Proceedings of the Winter Simulation Conference
Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the fourteenth international conference on machine learning
Leppke S, Leighton T, Zaun D, Chen SC, Skeans M, Israni AK, Snyder JJ, Kasiske BL (2013) Scientific registry of transplant recipients: Collecting, analyzing, and reporting data on transplantation in the United States. Transplant Rev 27:50–56
Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models. McGraw-Hill/Irwin
Pritsker AAB, Daily OP, Pritsker KD (1996) Using simulation to craft a national organ transplantation policy. In: Proceedings of the Winter Simulation Conference. ACM Press, New York, pp 1163–1169
Ratcliffe J, Young T, Buxton M, Eldabi T, Paul R, Burroughs A, Papatheodoridis G, Rolles K (2001) A simulation modelling approach to evaluating alternative policies for the management of the waiting list for liver transplantation. Health Care Manag Sci 4(2):117–124
Sandikci B, Maillart LM, Schaefer AJ, Alagoz O, Roberts MS (2008) Estimating the patient’s price of privacy in liver transplantation. Oper Res 56(6):1393–1410
Sandikci B, Maillart LM, Schaefer AJ, Roberts MS (2011) Alleviating the Patient’s price of privacy through a partially observable waiting list. SSRN electronic journal
Schapire RE, Freund Y (2012) Boosting: foundations and algorithms. MIT Press
Shechter SM (2005) A clinically based discrete-event simulation of end-stage liver disease and the organ allocation process. Med Dec Making 25(2):199–209
Su X, Zenios SA (2005) Patient choice in kidney allocation: a sequential stochastic assignment model. Oper Res 53(3):443–455
Su XM, Zenios SA, Chertow GM (2004) Incorporating recipient choice in kidney transplantation. J Am Soc Nephrol 15(6):1656–1663
Tang Y, Zhang YQ, Chawla NV, Krasser S (2009) SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man Cybern B Cybern 39(1):281–288
Thompson D, Waisanen L, Wolfe R, Merion RM, McCullough K, Rodgers A (2004) Simulating the allocation of organs for transplantation. Health Care Manag Sci 7(4):331–338
Tomek I (1976) Two modifications of CNN. IEEE Trans Syst Man Cybern 6:769–772
Veropoulos K, Campbell C, Cristianini N (1999) Controlling the sensitivity of support vector machines. In: Proceedings of the international joint conference on AI, pp 55–60
Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Artif Intell Res 6:1–34
Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 58(2):109–130
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the twentieth international conference on machine learning (ICML), pp 856–863
Zhao Z, Morstatter F, Sharma S, Alelyani S, Anand A, Liu H (2010) Advancing feature selection research. ASU feature selection repository
Acknowledgements
This work was conducted under the auspices of the Minneapolis Medical Research Foundation, contractor for the Scientific Registry of Transplant Recipients (SRTR), as a deliverable under contract no. HHSH250201000018C (US Department of Health and Human Services, Health Resources and Services Administration, Healthcare Systems Bureau, Division of Transplantation). As a US Government-sponsored work, there are no restrictions on its use. The views expressed herein are those of the authors and not necessarily those of the US Government. The authors thank University of Minnesota colleague Zhi Zhang for assistance with data preparation and interpretation, and SRTR colleagues Jon J. Snyder for valuable contributions to the study; David Schladt, Sally Gustafson, Eugene Shteyn, and Xinyue Wang for assistance with data preparation and interpretation; Susan Leppke for project assistance; Delaney Berrini for manuscript preparation; and Nan Booth for manuscript editing.
Appendix
In this appendix, we explain the methods used in our paper for binary classification. These techniques are also applicable to multi-class problems. In addition, we provide cross-validation results on the training sample and performance comparisons on test data in Tables 6–11. Table 12 shows sample-path SMSE comparisons for different performance criteria. Tables 13 and 15 list the top 50 features selected by the Information Gain feature selection method for 1A and not-1A candidates, respectively. These tables are provided in support of the abridged results presented in the paper. We begin by describing the data coding procedures we used in this study.
All potentially relevant attributes associated with each donor-candidate pair are called features. We coded categorical variables with several categories into binary variables; e.g., race, with 8 basic categories and 3 additional options for missing, multi-racial, and unknown, was coded into 11 binary (0-1) variables. This resulted in an original data set of 554 features from three sources: match-run, candidate, and donor features. We eliminated features that were either empty or not known to the candidates at the time of receiving an offer. For example, only lab values received before the offer date were retained. We also removed certain features that could lead to over-fitting to the existing allocation policy. For example, candidate designation as local, regional, or national relative to the donor OPO was not considered. Similarly, TxC indices were dropped. We instead created several variables that could explain TxC-specific strategies. In particular, we calculated and included the number of TxCs within 100, 200, and 300 miles of the donor hospital to capture the effect of competition among TxCs. We also included travel times between the donor hospital and the TxC where each candidate was located [11].
We coded certain features differently from how they appeared in the original data. For example, surgeons indicated that donor-to-candidate weight, height, and age ratios were important to them in making accept-or-decline decisions. Therefore, we created features consisting of these ratios and removed the redundant individual features.
After the above-mentioned data manipulation steps, our data set consists of 3,140 livers transplanted from deceased donors, with n = 59,510 valid observations (donor-patient pairs with Y/N responses) from 2011 match-run data. Each observation comprises a vector \(x = \{x_1, x_2, \ldots, x_k\}\) of donor and candidate characteristics and a label y that is either 1 (for accept) or -1 (for decline). The parameter k = 376 denotes the number of relevant characteristics. Throughout this appendix, we also use \((x_i, y_i)\) to denote the ith observation. Similarly, the notation \(x_{ij}\) refers to the value of the jth feature in the ith observation. There are four sections in this appendix, which provide, respectively, the mathematical underpinnings of the feature selection methods, the imbalance correction methods, information about the MATLAB implementation of the different methods, and cross-validation and test results.
1.1 Appendix A: Feature selection
As described in the paper, we focus on filter methods [35]. In what follows, we describe each of the four methods we compared in the paper.
1.1.1 A.1 Fisher score
Fisher score measures the dissimilarity of feature values across different classes. Since it treats features one at a time, it cannot account for redundancy among features [10]. It calculates a score for each feature; features with higher scores are preferred because a higher score implies greater differences in feature values across label classes. For a feature \(f_j\), its Fisher score \(FS(f_j)\) is calculated as follows:

$$FS(f_j) = \frac{\sum_{z} n_z \left(\mu_{j,z} - \mu_j\right)^2}{\sum_{z} n_z\, \sigma_{j,z}^2},$$

where

- \(\mu_j\): the mean value of feature \(f_j\) over all classes,
- \(n_z\): the number of samples in the zth class,
- \(\mu_{j,z}\): the mean value of \(f_j\) within class z, and
- \(\sigma_{j,z}^2\): the variance of \(f_j\) values within class z.
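To make the computation concrete, here is a minimal Python sketch of the Fisher score. It is illustrative only; the paper's own implementation used the MATLAB routine fsFisher.m listed in Appendix C.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score of each column of X (n observations by k features).

    Implements the formula above: between-class spread of the feature
    means divided by the pooled within-class variance.
    """
    mu = X.mean(axis=0)                    # overall mean of each feature
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for z in np.unique(y):
        Xz = X[y == z]                     # observations in class z
        n_z = Xz.shape[0]
        num += n_z * (Xz.mean(axis=0) - mu) ** 2
        den += n_z * Xz.var(axis=0)        # n_z * variance within class z
    return num / den                       # higher score = more informative
```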
1.1.2 A.2 Relief
Relief is a weighting method in which the weights are determined by measuring distances between the value of a feature for a particular observation and its values for the nearest same-class and different-class observations [15]. The feature that provides the widest overall gap earns the highest weight. In particular, the score of feature \(f_j\) is calculated as follows:

$$S(f_j) = \sum_{i=1}^{n} \left[\operatorname{diff}\!\left(x_{ij}, f_{NM(x_{ij})}\right) - \operatorname{diff}\!\left(x_{ij}, f_{NH(x_{ij})}\right)\right],$$

where \(x_{ij}\) is the value of the jth feature of the ith instance \(x_i\), and \(f_{NH(x_{ij})}\) and \(f_{NM(x_{ij})}\) are the values of the jth feature of the points nearest to \(x_i\) with the same and a different class label, respectively [35]. The function diff(⋅) is the distance measure, defined as follows. When u and v are nominal variables,

$$\operatorname{diff}(u,v) = \begin{cases} 0 & \text{if } u = v, \\ 1 & \text{if } u \neq v. \end{cases}$$

When u and v are numerical variables,

$$\operatorname{diff}(u,v) = \frac{|u - v|}{N_u},$$

where \(N_u\) is an appropriately sized normalization unit needed to make the difference lie in the interval [0,1].
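A minimal Python sketch of Relief for numerical features follows. It assumes Euclidean distance for finding neighbors and the feature range as the normalization unit \(N_u\); the paper's implementation used the MATLAB routine fsReliefF.m.

```python
import numpy as np

def relief_scores(X, y):
    """Relief scores for numerical features, per the formulas above.

    For each instance, find its nearest hit (same class) and nearest miss
    (different class); features whose values differ more across classes
    than within them earn higher scores.
    """
    n, k = X.shape
    N_u = X.max(axis=0) - X.min(axis=0)    # normalization units
    N_u[N_u == 0] = 1.0                    # guard against constant features
    scores = np.zeros(k)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # a point is not its own neighbor
        hit_idx = np.where(y == y[i])[0]
        miss_idx = np.where(y != y[i])[0]
        nh = hit_idx[np.argmin(d[hit_idx])]    # nearest hit
        nm = miss_idx[np.argmin(d[miss_idx])]  # nearest miss
        scores += np.abs(X[i] - X[nm]) / N_u - np.abs(X[i] - X[nh]) / N_u
    return scores
```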
1.1.3 A.3 Information Gain (IG)
Information Gain measures the dependency between a feature and the class label [8]. It is widely used because it is simple to calculate and easy to interpret. In order to explain Information Gain, we first define the concept of entropy. The entropy R(U) of a random variable \(U \in \mathcal{U}\) with probability mass function p(u) is defined by

$$R(U) = -\sum_{u \in \mathcal{U}} p(u) \log p(u).$$

Entropy R(U) is a measure of the uncertainty of the random variable U. Similarly, the conditional entropy R(U|V) is defined as

$$R(U \mid V) = -\sum_{v \in \mathcal{V}} p(v) \sum_{u \in \mathcal{U}} p(u \mid v) \log p(u \mid v).$$

With these definitions in hand, the Information Gain (IG) of feature \(f_j\) for class label Y is calculated as follows:

$$IG(f_j, Y) = R(Y) - R(Y \mid f_j).$$
For each label class, features are ranked in terms of Information Gain. Similar to Fisher score, IG considers each feature one at a time. Therefore, it cannot eliminate redundant features.
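The following Python sketch computes empirical entropy and Information Gain for discrete features (continuous features would first be discretized). It is illustrative only; the paper used the MATLAB routine fsInfoGain.m.

```python
import numpy as np

def entropy(u):
    """Empirical entropy R(U) of a discrete array, as defined above
    (natural log; the base only rescales scores, not rankings)."""
    _, counts = np.unique(u, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def information_gain(f, y):
    """IG(f, Y) = R(Y) - R(Y | f) for a discrete feature f and labels y."""
    r_cond = 0.0
    for v in np.unique(f):
        mask = (f == v)
        r_cond += mask.mean() * entropy(y[mask])   # p(v) * R(Y | f = v)
    return entropy(y) - r_cond
```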
1.1.4 A.4 Correlation Based Filter (CBF)
This method uses the correlations between features and class labels, and among the features themselves, to eliminate redundant features [34]. There are several ways to measure these correlations. We present Fast CBF, which we used in the paper. In this method, symmetrical uncertainty (SU) is employed to measure correlation. SU is defined as follows:

$$SU(u,v) = 2\left[\frac{IG(u,v)}{R(u) + R(v)}\right],$$

where R(u) and IG(u,v) are defined in Appendix A.3.
The algorithm used by the CBF method is presented in Table 5. At the beginning, CBF chooses a set of features according to the SU between each feature and the class labels. If the SU between a feature and the labels is higher than a predefined threshold (δ), then the feature is selected as a predominant feature, i.e., it belongs to the set \(S_{list}\). All features that are not selected as predominant features are discarded. Then, the feature with the highest SU with the class label becomes the base feature (\(f_b\)). If the SU between a feature \(f_q\) and the base feature \(f_b\) is higher than that between \(f_q\) and the class label, then \(f_q\) is dropped from the list. The next highest feature in the predominant list then becomes the base feature, and the above process is repeated until no additional features can be dropped. All other methods described in this appendix provide a relative rank of each feature, leaving the task of selecting which features to use for tuning classifiers to the analyst. In contrast, CBF selects a certain number of features by itself.
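A Python sketch of the Fast CBF procedure follows, reusing the entropy and information_gain helpers from the previous sketch. The threshold parameter delta corresponds to δ above; its default value here is an arbitrary placeholder, not the value used in the paper (which used fsFCBF.m).

```python
import numpy as np
# assumes entropy() and information_gain() from the previous sketch

def symmetrical_uncertainty(u, v):
    """SU(u, v) = 2 * IG(u, v) / (R(u) + R(v))."""
    return 2.0 * information_gain(u, v) / (entropy(u) + entropy(v))

def fast_cbf(X, y, delta=0.05):
    """Fast CBF as described above: keep features whose SU with the label
    exceeds delta, then drop any feature more correlated with an
    already-selected base feature than with the label itself."""
    k = X.shape[1]
    su_label = np.array([symmetrical_uncertainty(X[:, j], y) for j in range(k)])
    s_list = [j for j in np.argsort(-su_label) if su_label[j] > delta]
    selected = []
    while s_list:
        f_b = s_list.pop(0)            # base feature: highest remaining SU
        selected.append(f_b)
        s_list = [f_q for f_q in s_list
                  if symmetrical_uncertainty(X[:, f_q], X[:, f_b]) < su_label[f_q]]
    return selected
```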
1.2 Appendix B: Treating imbalanced data: under-sampling
Because we have many more observations with the N label, we use under-sampling to balance the number of observations within each label class. It is possible to select at random which N-labeled observations to keep in the training data. However, better methods exist that select important observations. We describe one such approach [17]. In this approach, we remove two kinds of observations: redundant observations and Tomek links. Redundant observations do not hurt the classifier, but they increase the classification-error cost of the N-class labels. In contrast, Tomek links are observations located near the class boundary; see Fig. 6 for a graphical representation. These observations affect the classifier, and their noisiness can lower its accuracy if they are not removed.
Redundant observations are identified by using the One Nearest Neighbor (denoted 1-NN) method. 1-NN is one of the simplest classification methods: it finds the nearest observation to each tagged observation and assigns it the same label as that of its nearest neighbor. For implementing 1-NN, distance is calculated by the Value Difference Metric (VDM) [32]. For a feature \(f_i\) and two of its values u and v, the VDM is

$$\operatorname{vdm}_{f_i}(u, v) = \sum_{c} \left| P_{f_i,u,c} - P_{f_i,v,c} \right|^q = \sum_{c} \left| \frac{N_{f_i,u,c}}{N_{f_i,u}} - \frac{N_{f_i,v,c}}{N_{f_i,v}} \right|^q,$$

where

- \(N_{f_i,u}\): the number of instances in the training set T that have value u for feature \(f_i\),
- \(N_{f_i,u,c}\): the number of instances in T that have value u for feature \(f_i\) and class c,
- c: output class; c ∈ {1,2} for binary classification,
- q: a constant, usually 1 or 2, and
- \(P_{f_i,u,c}\): the conditional probability that the output class is c given that attribute \(f_i\) has value u, i.e., \(P_{f_i,u,c} = P(c \mid u, f_i)\).

Finally, \(P_{f_i,u,c}\) is estimated as

$$P_{f_i,u,c} = \frac{N_{f_i,u,c}}{N_{f_i,u}},$$

and \(N_{f_i,u}\) is the sum of \(N_{f_i,u,c}\) over all classes; that is,

$$N_{f_i,u} = \sum_{c} N_{f_i,u,c}.$$
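A Python sketch of the VDM machinery defined above; the helper names vdm_tables and vdm_distance are ours, for illustration only.

```python
import numpy as np

def vdm_tables(X, y):
    """Precompute P_{f_i,u,c} = N_{f_i,u,c} / N_{f_i,u} for every nominal
    feature f_i and feature value u, as defined above."""
    classes = np.unique(y)
    tables = []
    for i in range(X.shape[1]):
        table = {}
        for u in np.unique(X[:, i]):
            mask = (X[:, i] == u)          # instances with value u
            table[u] = np.array([(mask & (y == c)).sum()
                                 for c in classes]) / mask.sum()
        tables.append(table)
    return tables

def vdm_distance(xa, xb, tables, q=2):
    """VDM between two instances: sum over features of
    sum_c |P_{f_i,u,c} - P_{f_i,v,c}|^q."""
    return sum(np.sum(np.abs(tables[i][xa[i]] - tables[i][xb[i]]) ** q)
               for i in range(len(xa)))
```

With tables = vdm_tables(X, y) built once from the training set, the distance between two rows xa and xb is vdm_distance(xa, xb, tables).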
Tomek links are defined as follows. Let δ(\(x_i\),\(x_j\)) represent the distance between observations \(x_i\) and \(x_j\) with different class labels. The pair (\(x_i\),\(x_j\)) is called a Tomek link if there is no observation \(x_k\) such that δ(\(x_i\),\(x_k\)) < δ(\(x_i\),\(x_j\)) or δ(\(x_k\),\(x_j\)) < δ(\(x_i\),\(x_j\)) [30].
With these definitions in hand, the under-sampling algorithm has the following steps: (i) create a subset S with all the Y-labeled observations, (ii) select one N-labeled observation at random and apply 1-NN, (iii) if the observation is misclassified by 1-NN, then add it to set S; otherwise, remove it from further consideration, (iv) repeat the procedure, selecting each N-labeled observation one by one until all such observations have been considered, and finally (v) among the data points in subset S, find Tomek-link observations and remove them. The resulting set S is the under-sampled training data set.
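The steps above can be sketched in Python as follows. Euclidean distance is substituted for the VDM for brevity, and the choice to remove only the N-labeled member of each Tomek link is an assumption consistent with one-sided selection [17]; neither detail is confirmed by the paper.

```python
import numpy as np

def one_sided_undersample(X, y, rng=np.random.default_rng(0)):
    """Sketch of steps (i)-(v) above with y in {+1 (Y), -1 (N)}."""
    def nn_label(x, idx):
        d = np.linalg.norm(X[idx] - x, axis=1)   # 1-NN among indices idx
        return y[idx[np.argmin(d)]]

    keep = list(np.where(y == 1)[0])             # (i) all Y-labeled points
    negatives = np.where(y == -1)[0].copy()
    rng.shuffle(negatives)
    for i in negatives:                          # (ii)-(iv) 1-NN screening
        if nn_label(X[i], np.array(keep)) != y[i]:
            keep.append(i)                       # misclassified: keep it

    # (v) drop N-labeled members of Tomek links, i.e., mutual nearest
    # neighbors with different class labels.
    keep = np.array(keep)
    D = np.linalg.norm(X[keep][:, None, :] - X[keep][None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.argmin(D, axis=1)
    tomek_neg = {keep[a] for a in range(len(keep))
                 if nn[nn[a]] == a                   # mutual 1-NN
                 and y[keep[a]] != y[keep[nn[a]]]    # opposite labels
                 and y[keep[a]] == -1}               # remove N member only
    kept = np.array([i for i in keep if i not in tomek_neg])
    return X[kept], y[kept]
```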
1.3 Appendix C: MATLAB Code
The MATLAB code we used for feature selection is available from the Feature Selection Group's web site at Arizona State University (featureselection.asu.edu).
- Fisher score: fsFisher.m
- Relief: fsReliefF.m
- CBF: fsFCBF.m
- Information Gain: fsInfoGain.m
We used MATLAB (R2012b) functions from the Statistics Toolbox for the classifiers (details can be found at www.mathworks.com/products/statistics/description4.html).
- Logistic regression: glmfit.m with the option 'binomial'
- SVM: svmtrain.m
- Boosting: fitensemble.m with the options 'AdaBoostM1' and 'GentleBoost'
- RF: treebagger.m
- CART: classregtree.m
1.4 Appendix D: Cross-validation and test results
This appendix contains cross-validation and test results, as well as the top features identified by the Information Gain feature selection method. Many machine-learning models have one or more control parameters, e.g., σ in the kernel for SVM. Before the final evaluation of a model, these control parameters need to be tuned. We divided the data into three groups: training (60%), cross-validation (20%), and test (20%) data sets. We built each model with the training data, tuned the control parameters by intermediate evaluation on the cross-validation set, and performed the final model evaluation on the test data set. Cross-validation helps us not only tune parameters but also prevent over-fitting.
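A minimal sketch of such a partition; the paper does not specify its exact randomization procedure, so the uniform random split below is an assumption.

```python
import numpy as np

def split_60_20_20(n, rng=np.random.default_rng(0)):
    """Randomly partition n observation indices into 60/20/20 training,
    cross-validation, and test index sets."""
    idx = rng.permutation(n)
    n_tr, n_cv = int(0.6 * n), int(0.2 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_cv], idx[n_tr + n_cv:]

train_idx, cv_idx, test_idx = split_60_20_20(59510)   # n from the 2011 data
```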
The tuning objective was to maximize the G-metric, except for RF-RUS (RF with random under-sampling).
In the cross-validation and test result tables,

- CM: classification method
- FS: feature selection method
- # of features: the number of features we selected from the top of the list
- Error rate: percent of observations that were incorrectly classified
- Accuracy: percent of observations that were correctly classified
- Sensitivity: accuracy within the positive class only
- Specificity: accuracy within the negative class only
- G-metric: the square root of the product of sensitivity and specificity (see the sketch following this list)
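For reference, a small Python sketch that computes these quantities from +1/-1 true and predicted label vectors; the function name table_metrics is ours, for illustration.

```python
import numpy as np

def table_metrics(y_true, y_pred):
    """Compute the table metrics defined above from +1/-1 labels."""
    pos, neg = (y_true == 1), (y_true == -1)
    accuracy = np.mean(y_pred == y_true)
    sensitivity = np.mean(y_pred[pos] == 1)    # accuracy on the positive class
    specificity = np.mean(y_pred[neg] == -1)   # accuracy on the negative class
    return {
        "Error rate": 1.0 - accuracy,
        "Accuracy": accuracy,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "G-metric": float(np.sqrt(sensitivity * specificity)),
    }
```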
Because CBF determines the number of features by itself, it selected only 12 and 8 features for 1A and not-1A candidates, respectively. In all other cases, we obtained a rank-ordered list of all features. Information Gain resulted in the best G-metric for all methods in the cross-validation results. Therefore, we used the feature set provided by the Information Gain method to finalize the parameters of each classifier.