ORIGINAL ARTICLE
Cardiac Surgical Mortality
Comparison Among Different Additive Risk-Scoring Models
in a Multicenter Sample
Joan M. V. Pons, MD; Josep A. Espinas, MD; Josep M. Borras, MD;
Victor Moreno, MD; Isaac Martin, MD; Alicia Granados, MD
Objective: To compare the performance of several risk-scoring models in predicting surgical mortality following open heart surgery.
Design: A prospective observational study.
Setting: Seven tertiary cardiac centers (3 private and 4 public and teaching hospitals) in Catalonia (Spain).
Patients: A consecutive sample of 1287 patients who underwent open heart surgery during a 6½-month period (February 14, 1994, to August 31, 1994).
Intervention: None.
Main Outcome Measure: Model discrimination capability was assessed with the c-statistic. A χ² test comparing observed and predicted mortality rates was used as a measure of model calibration. Performance of centers was evaluated through the standardized mortality ratio and by using the center as an indicator variable in a logistic regression model. The agreement among models for individual predictions was tested using weighted κ statistics.
Results: Models developed in other health care contexts showed, as expected, lower c-statistics and inadequate calibration. There were no statistically significant differences among hospitals after adjusting for patients’ baseline risk factors with any of the models. The models also agreed in the standardized ranking of centers. Weighted κ statistics indicated poor agreement among models for individual patient risk prediction.
Conclusions: Models can be a useful tool to compare providers’ performance and to give a more in-depth look at the process of care when appropriately customized to the context. Severity-adjusted models can also play a role in supporting the surgeon’s informed, subjective assessment, but it is inappropriate to use them for individual predictions.
Arch Surg. 1998;133:1053-1057
From the Catalan Agency for Health Technology Assessment (Drs Pons, Espinas, Borras, and Granados), Catalan Institute of Oncology (Drs Borras and Moreno), Biostatistics and Epidemiology Laboratory, Department of Pediatrics, Obstetrics and Preventive Medicine, Faculty of Medicine, Autonomous University of Barcelona (Drs Moreno and Martin), Barcelona, Spain.
SPECIFIC HOSPITAL mortality
rates have received increasing attention as a measure of
health care outcome. However, crude hospital mortality rate is an inaccurate indicator since it
does not consider the severity of illness.
When comparisons are made, it is essential to adjust mortality rates according to
the presence of factors that might determine the risk of an adverse outcome.
Recent data on hospital cardiac surgical mortality have generated both
controversy and confusion among stakeholders in health care systems: consumers, purchasers, and health care providers (hospitals and surgeons).1 On the one
hand, proponents of releasing this kind of
information argue that despite potential inaccuracies, hospitals with very high mortality rates are likely to provide poor quality of care, and that increased consumer
knowledge will lead to a greater demand
for all hospitals to ensure quality of
care.2 On the other hand, there are arguments against public disclosure of
provider-specific mortality rates. Criticism is directed at the quality of data
used, the inaccuracy of models, and the
misunderstanding of this information by
the media. The use of additional indicators that can also be risk-adjusted, such
as perioperative complications, improvements in functional capacity, quality
of life, and patient satisfaction, as well
as cost-benefit analysis, has also been
suggested.3
Several severity measurement tools to
assess surgical risk are now available. They
differ in their classification approach, conceptual foundation, risk factors included, outcome definition, potential reliability, resistance to manipulation, and
availability of documentation.4 The characteristics of the population analyzed and the way a system was developed may affect its applicability to other
health care settings.5 To our knowledge, there are few
published studies that analyze the performance of several predictive models for patients who have undergone
coronary artery bypass graft (CABG) surgery. Two of these
assess the validity of 4 severity-adjusted models in an independent surgical database coming from a single center.6,7 Another study compared the performance of
different CABG providers by use of severity models developed for hospitalized patients.8
The aim of our study is to compare how different
additive risk-scoring models work in a multicenter sample
of patients subjected to open heart surgery. Models were
compared with regard to their calibration and discrimination capability, the assessment of differences among
providers in surgical mortality, and individual patient prediction.
ARCH SURG/ VOL 133, OCT 1998
©1998 American Medical Association. All rights reserved.
Downloaded From: https://jamanetwork.com/ on 04/09/2023
SUBJECTS AND METHODS
The population in this study came from the Catalan Study
(CS) on Open Heart Surgery, and detailed methods have been
described previously.9 All consecutive open heart procedures carried out in adult patients in 7 centers in Barcelona, Spain, each identified by a number, were included during
a 6½-month period (February 14, 1994, to August 31, 1994).
Data were registered on a specifically designed sheet. Overall, 1287 open heart procedures were collected after
excluding heart transplantations (22 cases performed in only
2 hospitals).
Three risk stratification models to predict surgical mortality were selected for our analysis. The CS model came
from the study referred to above.9 The 2 other models were
selected because they encompassed a range of extracorporeal cardiac procedures wider than CABG alone, were additive, and were not based on administrative data sets. The
Parsonnet et al10 method is addressed to acquired adult heart
disease; it stratifies patients into 5 risk categories. Two factors (“catastrophic state” and “rare circumstances”) in the
first version were valued subjectively. In applying this risk
model to our population, we gave a fixed value to these subjective items depending on the presence of any catastrophic state or rare circumstance and on the preoperative subjective risk assessment made by the surgeon. This
subjective assessment, based on clinical judgment and data
available before surgery, used 5 categories of risk. The other
model selected was the Higgins et al11 clinical severity score,
addressed to patients who had undergone CABG surgery
and those who had accompanying procedures. For this
study, we recoded the 9 severity categories used by Higgins et
al into an ordinal scale with 5 risk levels based on similarity in observed mortality rates as shown in Figure 2 of the
study by Higgins et al.
Predicted mortality rates were calculated for each model
according to the original criteria. For the Parsonnet et al
model, the predicted rate for each risk level was calculated by averaging the individual scores within each category. For the Higgins et al model, predicted mortality rates
were estimated from the figures of the original publication because the exact numbers were not reported. For the
CS model, predicted rates were calculated in the validation subsample by applying the observed mortality rates of the
training subsample. A χ² test to compare observed and predicted mortality rates was used as a measure of the calibration of the model. For each model, a c-statistic, which
equals the area under a receiver operating characteristic
curve, was used as a measure of discrimination (a c value
of 0.5 suggests no ability to discriminate, and a value of
1.0 indicates perfect discrimination).12
Differences in centers’ performances were assessed
through the standardized mortality ratio (SMR), which is
the ratio of the observed to the expected mortality rate.
Correlations of centers’ SMR order were assessed by the
Spearman rank correlation coefficient. To test for centers’
homogeneity in surgical mortality, a logistic regression
analysis for each model was used. Finally, the agreement
among models for individual predictions was tested using
a weighted κ statistic in the cross-classification tables generated with each pair of models.
RESULTS
The population characteristics of the Higgins et al and
the CS models are given in Table 1. Unfortunately, data
on the Parsonnet et al model were not available from the
original publication. The most striking inequalities were
related to reoperation rates, the prevalence of chronic obstructive pulmonary disease while taking medication, and
kidney disease. Other more clearly defined factors (demographic, surgical, and creatinine level) did not differ
substantially.
Table 2 gives the sample population where the
model was applied, the score values for each category,
patients’ distribution by risk level, and the observed mortality rate for each of the 5 risk categories. All models combined the highest scores in the worst risk category. Except for the Parsonnet et al model, the first 2 risk levels
composed more than 60% of patients.
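As an illustration of how an additive model assigns a patient to a risk level, the sketch below sums the weights of the risk factors present and maps the total onto the 5 levels. The cut points are the Catalan Study scale values listed in Table 2; the factor names and weights are hypothetical placeholders, since the actual weights appear only in the original publications of each model.

```python
from bisect import bisect_left

# Inclusive upper bounds of risk levels 1 to 4 for the Catalan Study score,
# taken from the scale values in Table 2 (0-10, 11-15, 16-20, 21-30).
# Scores of 31 or more fall in level 5.
CS_UPPER_BOUNDS = [10, 15, 20, 30]

# HYPOTHETICAL factor names and weights, for illustration only.
EXAMPLE_WEIGHTS = {
    "emergency_operation": 8,
    "reoperation": 5,
    "age_70_or_over": 4,
}


def additive_score(present_factors, weights):
    """Sum the weights of the risk factors present in one patient."""
    return sum(weights[factor] for factor in present_factors)


def risk_level(score, upper_bounds):
    """Map a score to a risk level between 1 and len(upper_bounds) + 1."""
    # bisect_left returns the index of the first bound that is >= score,
    # so a score equal to a bound stays in the lower level.
    return bisect_left(upper_bounds, score) + 1


score = additive_score(["emergency_operation", "reoperation"], EXAMPLE_WEIGHTS)
print(score, risk_level(score, CS_UPPER_BOUNDS))
```

The same two functions would serve for the other additive models by swapping in their own cut points and weights.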
In Table 3 we present the c-statistic, predicted and
observed mortality rates in the population selected for
the different models, and the χ² test for calibration. The
highest c-statistic corresponded to the model specifically designed for this population (CS model). Statistically significant differences between observed and predicted mortality rates were seen in the 2 external models.
The Parsonnet et al model underestimated the low risk
level and overestimated the poor, high, and extremely
high risk levels. The Higgins et al model uniformly underestimated the risk through all categories (Figure 1).
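The two accuracy measures reported in Table 3 can be sketched in a few lines. The c-statistic below is computed by direct pair counting, which is equivalent to the area under the receiver operating characteristic curve. The calibration statistic is a Pearson χ² over deaths and survivors in each risk level; the article does not state the exact form of the test used, so that choice is an assumption, and the data in the usage lines are invented.

```python
def c_statistic(scores, died):
    """Area under the ROC curve by direct pair counting.

    Every (death, survivor) pair is compared. A pair is concordant when
    the patient who died had the higher score; tied scores count one half.
    """
    death_scores = [s for s, d in zip(scores, died) if d]
    survivor_scores = [s for s, d in zip(scores, died) if not d]
    if not death_scores or not survivor_scores:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for death_score in death_scores:
        for survivor_score in survivor_scores:
            if death_score > survivor_score:
                concordant += 1.0
            elif death_score == survivor_score:
                concordant += 0.5
    return concordant / (len(death_scores) * len(survivor_scores))


def calibration_chi_square(group_sizes, observed_deaths, predicted_rates):
    """Pearson chi-square of observed vs predicted outcomes per risk level.

    ASSUMPTION: both deaths and survivors contribute a term in each level.
    Predicted rates must lie strictly between 0 and 1.
    """
    statistic = 0.0
    for n, observed, rate in zip(group_sizes, observed_deaths, predicted_rates):
        expected_deaths = n * rate
        expected_survivors = n - expected_deaths
        statistic += (observed - expected_deaths) ** 2 / expected_deaths
        statistic += ((n - observed) - expected_survivors) ** 2 / expected_survivors
    return statistic


# Invented example: risk levels for 6 patients and whether each died.
print(c_statistic([1, 1, 2, 3, 4, 5], [0, 0, 0, 1, 0, 1]))
# Invented example: 2 risk levels with 100 patients each.
print(calibration_chi_square([100, 100], [6, 18], [0.05, 0.20]))
```

Turning the statistic into a P value requires the χ² distribution with the appropriate degrees of freedom, which depends on whether the predicted rates were estimated from the same data.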
There were no statistically significant differences in
mortality among centers when adjusting for any risk
model as given in Table 3. Accordingly, all SMR 95% confidence intervals included the value 1, as shown in
Figure 2. There was an almost-perfect agreement among
models in the order assigned to centers depending on the
SMR. Spearman rank correlation coefficient showed statistically significant values (P ≤ .02) for all the pairwise
comparisons (rs between the CS and the Higgins et al models = 0.89; rs between the Higgins et al and the Parsonnet et al models = 1.00; and rs between the CS and the
Parsonnet et al models = 0.88).
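The center comparison described in this paragraph rests on two calculations: an SMR with its 95% confidence interval for each center, and a Spearman rank correlation between the SMRs that two models assign to the same centers. The following is a minimal sketch. The confidence interval uses a log-scale Poisson approximation, which is an assumption because the article does not say how its intervals were obtained, and the numbers in the usage lines are invented.

```python
import math


def smr_with_ci(observed_deaths, expected_deaths, z=1.96):
    """Standardized mortality ratio with an approximate 95% interval.

    ASSUMPTION: observed deaths are treated as Poisson, giving the
    log-scale interval SMR * exp(+/- z / sqrt(observed)).
    """
    if observed_deaths <= 0 or expected_deaths <= 0:
        raise ValueError("observed and expected deaths must be positive")
    smr = observed_deaths / expected_deaths
    half_width = z / math.sqrt(observed_deaths)
    return smr, smr * math.exp(-half_width), smr * math.exp(half_width)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ranks_x, ranks_y = average_ranks(x), average_ranks(y)
    n = len(ranks_x)
    mean_x, mean_y = sum(ranks_x) / n, sum(ranks_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(ranks_x, ranks_y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in ranks_x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ranks_y))
    return covariance / (spread_x * spread_y)


# Invented example: one center with 12 observed and 15.5 expected deaths.
print(smr_with_ci(12, 15.5))
# Invented example: SMRs of 7 centers under two different models.
print(spearman([0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5],
               [0.7, 0.8, 1.0, 0.9, 1.2, 1.3, 1.6]))
```

An interval that contains 1 corresponds to a center whose mortality does not differ significantly from that expected under the model.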
The weighted κ statistic between the Parsonnet et
al and the CS models was 0.29 (n = 1287); between the
Higgins et al and the CS models, 0.50 (n = 715); and between the Parsonnet et al and the Higgins et al models,
0.40 (n = 715).
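For readers unfamiliar with the weighted κ statistic, the sketch below computes it from the risk levels that two models assign to the same patients. It assumes linear disagreement weights, because the article does not state which weighting scheme was used, and the example levels are invented.

```python
def weighted_kappa(levels_a, levels_b, n_levels=5):
    """Weighted kappa for two ordinal classifications coded 1..n_levels.

    ASSUMPTION: linear disagreement weights |i - j| / (n_levels - 1).
    """
    n = len(levels_a)
    # Cross-classification table: rows are model A, columns are model B.
    table = [[0] * n_levels for _ in range(n_levels)]
    for a, b in zip(levels_a, levels_b):
        table[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in table]
    column_totals = [sum(table[i][j] for i in range(n_levels)) for j in range(n_levels)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = abs(i - j) / (n_levels - 1)
            observed_disagreement += weight * table[i][j]
            # Expected count under independence of the two classifications.
            expected_disagreement += weight * row_totals[i] * column_totals[j] / n
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all patients share one level")
    return 1.0 - observed_disagreement / expected_disagreement


# Invented example: risk levels given to 8 patients by two models.
model_a = [1, 1, 2, 2, 3, 4, 5, 5]
model_b = [1, 2, 2, 3, 3, 3, 4, 5]
print(weighted_kappa(model_a, model_b))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which puts the reported values of 0.29 to 0.50 in context.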
COMMENT
This study underlines the applicability of risk stratification models when assessing cardiac surgical mortality.
Risk models can be used to compare different providers
and to offer a more objective and adjunctive assessment
of patients’ risks, but it is inappropriate to use them to
make individual predictions, or to base clinical decisions only on this assessment.
Risk models applied to our surgical database used
different methods to select risk factors; they included different numbers of variables, assigned different weights
to risk factors selected, and produced different classifications of patients by levels of surgical risk. To avoid any
subjective assessment of risk factors, most of the models try to include factors that can be objectively measured. Nevertheless, some of the variables that contributed the most in the first version of the Parsonnet et al
model were valued subjectively, although in a more recent version the subjective input has been eliminated.13
Except for the CS model, the other models have been developed and validated in a single institution although these
external models have been applied by other institutions
in other settings.6,7
The characteristics of the population where the
models are developed should be a primary criterion
for selecting a model to be applied in other institutions and in other settings.14 Most developed models
on cardiac surgery deal with CABG surgery, the most
common open heart procedure in developed countries. However, international registers have shown heterogeneity in the type of open heart procedures among
countries.15 Therefore, as has been suggested, small
differences in population selection may lead to different combinations of variables being selected for any
predictive model.14 When applying the different models to the same multicenter population, we found that
any model can be used to test for heterogeneity among
centers. The models’ ranking of providers according to
Table 1. Population Characteristics*
Model
Catalan Study9
Characteristics
Sample size, No.
Mean age, y
Age ≥70 y
Sex, female
Diabetes while receiving
medication
Chronic obstructive
pulmonary disease while
receiving medication
Cerebrovascular disease
Prior vascular surgery
Kidney disease
Serum creatinine level,
≥168 µmol/L
Liver disease
Left ventricular dysfunction
(ejection fraction <35)
Emergency operation
Mitral valve disease
Aortic valve operated on
Reoperation
Higgins
et al11
Coronary Artery
Bypass Graft
Procedures
Overall
Population
5051
NA
22.9
20.6
17.2
715
62.53
22.5
19.4
20.9
1287
61.25
22.4
32.3
14.3
7.5
3.5
5.2
6.6
5.6
5.8
3.5
6.9
3.6
9.0
4.4
8.7
3.2
9.3
4.5
3.0
11.3
1.8
8.4
4.1
6.9
3.1
4.3
5.9
18.5
2.9
4.6
8.8
4.5
3.2
20.8
32.7
10.6
*Values except for mean age are in percentages. NA indicates not
available.
Table 2. Risk Levels for Surgical Mortality With Percentage of Patients and Observed Mortality Rate in Each Category

                                          Risk Levels†
Model*                             1       2       3       4       5
Parsonnet et al10 (n = 1287)
  Scale values                    0-4     5-9    10-14   15-19    ≥20
  Cases, %                       30.3    12.4    24.2    12.0    21.2
  Observed mortality, %           5.4     6.9     9.6    13.0    21.0
Higgins et al11 (n = 715)
  Scale values                    0-1     2-4     5-6     7-9     ≥10
  Cases, %                       46.6    30.9     7.8    10.1     4.6
  Observed mortality, %           3.3    13.6    14.3    19.4    36.4
Catalan Study9 (n = 1287)
  Scale values                   0-10    11-15   16-20   21-30    ≥31
  Cases, %                       52.3    15.9    13.5    12.1     6.1
  Observed mortality, %           4.2     7.3    13.2    19.2    54.4

*Scale values indicate the score ranges for each risk level in the different additive models used. The score for an individual patient was the sum of the weights of that patient’s
assigned risk factors. Model references denote the results of applying these models to the current study’s population, not the results of
the external models in the original publications.
†Risk levels indicate levels of increasing risk for surgical mortality according to each model’s definition. For example, 1, 2, 3, 4, and 5 represent good,
fair, poor, high, and extremely high in the Parsonnet et al10 model and low, fair, high, very high, and extremely high in our Catalan Study. The 9-item
severity categories of Higgins et al11 were recoded into an ordinal scale with 5 risk levels.
Table 3. Measures of Model’s Accuracy and Center’s Differences

                                   Mortality Rates, %     χ² Test for       Test for Homogeneity
Model*               c-Statistic   Observed   Expected    Calibration, P    Among Centers, P
Parsonnet et al10       0.67         10.8       12.1         <.001               .40
Higgins et al11         0.68         10.5        2.9         <.001               .10
Catalan Study9          0.76          9.8        9.8          .34                .23

*See asterisk footnote to Table 2 for information about model references.
[Plots for Figures 1 and 2 not reproduced. Axis labels: Predicted Mortality Rate, %; Observed Mortality Rate, %; Hospital; Standardized Mortality Ratio.]
Figure 1. Comparison of observed vs predicted mortality rates by risk level,
obtained by applying the Catalan Study,9 Parsonnet et al,10 and Higgins et al11 models,
respectively, to our multicenter population. The model references
denote the results of applying these models to the current
study’s population, not the results of the external models in the original
publications. Solid lines represent actual mortality rate; dotted lines, perfect
fit for each model.
SMR showed that there was a good concordance in the
order assigned.
To compare centers, predictive accuracy of an external model can be restored by an analytic adjustment
for the differences in mortality prevalence in the 2 different populations.14,16 However, if the analyses of adjusted surgical outcomes have to be interpreted as an indicator that one should look more deeply into a specific
Figure 2. Ranking of centers by standardized mortality ratio, obtained by applying
the Catalan Study,9 Parsonnet et al,10 and Higgins et al11
models, respectively, to our multicenter population. The model
references denote the results of applying these models to the
current study’s population, not the results of the external models in the
original publications. The horizontal bars help the reader see whether
a center has an observed mortality rate that is higher or lower than that
expected and, depending on whether the 95% confidence interval of the
standardized mortality ratio excludes or includes the value 1, whether the
difference is or is not statistically significant.
center situation, models designed specifically for the study
population are needed.
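One simple form that an analytic adjustment for differing mortality prevalence can take is a rescaling of the predicted odds by the ratio of the prevalence odds in the two populations. The sketch below shows that calculation. It is offered as an assumption about the kind of correction meant, not as the procedure of the cited references, and the rates in the usage line are invented.

```python
def odds(probability):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return probability / (1.0 - probability)


def adjust_for_prevalence(predicted_risk, development_prevalence, target_prevalence):
    """Rescale a predicted risk from the development to the target population.

    ASSUMPTION: the risk factors keep the same relative effect in both
    populations, so only the baseline odds of death need shifting.
    """
    adjusted_odds = (
        odds(predicted_risk)
        * odds(target_prevalence)
        / odds(development_prevalence)
    )
    return adjusted_odds / (1.0 + adjusted_odds)


# Invented example: a 4% predicted risk from a model developed where overall
# mortality was 3%, applied in a population where overall mortality is 10%.
print(adjust_for_prevalence(0.04, 0.03, 0.10))
```

Such a shift changes calibration but leaves the rank order of patients, and hence the c-statistic, unchanged.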
This study also points out the limitations of any predictive model when applied to predict individual risk, as
has been the case with severity systems used in patients
in intensive care settings. Using these predictive models
as an adjunct to informed but subjective opinions made
by surgeons is a reasonable and prudent choice, but using them to dictate individual patient decisions does not
seem appropriate.17
There are some limitations to our study. One is
sample size; the number of patients operated on by a
single center in this study falls well below the hundreds
of patients needed to detect meaningful differences in
mortality rates. All predictive models also share another
type of limitation. They cannot adjust for all patient
characteristics that may have, at least in some cases, an
important impact on surgical mortality. Nor can
they account for nonclinical patient factors,
inequalities in the technical and therapeutic resources available in cardiac services, or administrative and managerial differences in practice. In addition, predictive models of surgical mortality cannot
account for the technical skills of surgeons, which can be related
to the learning curve and continual practice. Some risk
assessment models have presented surgical adjusted
mortality as a measure of surgeons’ technical skills, but
this approach is still open to debate, and its potential
ramifications are being scrutinized.18,19 Finally, models
cannot assess another important issue in any procedure:
its appropriateness.
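The sample-size limitation mentioned at the start of this discussion can be made concrete with a standard two-proportion calculation. The sketch below uses the usual normal-approximation formula for comparing two mortality rates. The significance level, power, and the rates in the usage line are illustrative assumptions and do not come from the study.

```python
import math
from statistics import NormalDist


def patients_per_group(rate_a, rate_b, alpha=0.05, power=0.80):
    """Patients needed in each of two groups to detect rate_a vs rate_b.

    Normal-approximation formula for a two-sided comparison of two
    proportions with equal group sizes.
    """
    if rate_a == rate_b:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    pooled = (rate_a + rate_b) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pooled * (1 - pooled))
        + z_power * math.sqrt(rate_a * (1 - rate_a) + rate_b * (1 - rate_b))
    ) ** 2
    return math.ceil(numerator / (rate_a - rate_b) ** 2)


# Illustrative rates only: 10% vs 15% mortality.
print(patients_per_group(0.10, 0.15))
```

Smaller differences between rates drive the required number up quickly, which is why comparisons between individual centers over a 6½-month period have limited power.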
CONCLUSIONS
Our study showed that predictive models can be a useful tool to standardize and to assess the performance of
different providers. Although external models for open
heart surgical risk are not as reliable as the model specifically designed for our study population, the models
agree on the relative SMR values of the
centers. Risk models can play a role as an adjunct to the
informed, although subjective, risk assessment made by
the surgeons, but it is inappropriate to use them to dictate individual patient decisions. A specific approach might
help to assess factors associated with the observed and
expected differential rates. This analysis can provide insight into the process of care and, therefore, improve quality of care.
We are indebted to the cardiac surgeons at Centre Quirúrgic Sant Jordi, Clínica Quirón, Hospital de Barcelona, Hospital Clínic i Provincial, Hospital General de la Vall d’Hebron,
Barcelona, Spain; Hospital Prínceps d’Espanya de Bellvitge, L’Hospitalet, Spain; and Hospital de la Santa Creu i
Sant Pau, Barcelona, for their support and cooperation; to
the following surgeons as representatives of participating
centers: Alejandro Aris, MD, Eduard Castells, MD, Lluïsa
Camera, MD, Josep M. Caralps, MD, Carles Fontanillas,
MD, Francisco Murillo, MD, Jaume Mulet, MD, Marcos
Murtra, MD, Jose Luis Pomar, MD, Félix Rovira, MD, Josep Oriol Sole, MD; to Maria Cardona, MD, as research assistant for the study; to A. Ginel, MD, and J. Montiel, MD,
for their data collection assistance; to Cari Almazan, MD,
Albert J. Jovell, MD, and Laura Sampietro-Colom, MD, for
their helpful comments; and to David Lavine for his assistance in manuscript preparation.
Reprints: Joan M. V. Pons, MD, Catalan Agency for
Health Technology Assessment, Travessera de les Corts 131-159, Pavelló Ave Maria, 08028 Barcelona, Spain (e-mail:
jpons@olimpia.scs.es).
REFERENCES
1. Iezzoni LI, Shwartz M, Restuccia J. The role of severity information in health policy
debates: a survey of state and regional concerns. Inquiry. 1991;28:117-128.
2. Greenfield S, Aronow HU, Elashoff RM, Watanabe D. Flaws in mortality data. JAMA.
1988;260:2253-2255.
3. Kouchoukos NT, Anderson RP, Fosburg RG, et al. Report of the Ad Hoc Committee on Physician-Specific Mortality Rates for Cardiac Surgery. Ann Thorac
Surg. 1993;56:1200-1202.
4. Iezzoni LI, Ash AS, Coffman GA, Moskowitz MA. Predicting in-hospital mortality: a
comparison of severity measurement approaches. Med Care. 1992;30:347-359.
5. Iezzoni LI. Risk and outcomes. In: Iezzoni LI, ed. Risk Adjustment for Measuring
Health Care Outcomes. Ann Arbor, Mich: Health Administration Press; 1994:123.
6. Orr RK, Maini BS, Sottile FD, Dumas EM, O’Mara P. Comparison of four severity-adjusted models to predict mortality after coronary artery bypass graft surgery.
Arch Surg. 1995;130:301-306.
7. Weightman WM, Gibbs NM, Sheminant MR, Thackray NM, Newman MA. Risk
prediction in coronary artery surgery: a comparison of four risk scores. Med J
Aust. 1997;166:408-411.
8. Landon B, Iezzoni LI, Ash AS, et al. Judging hospitals by severity-adjusted mortality rates: the case of CABG surgery. Inquiry. 1996;33:155-166.
9. Pons JMV, Granados A, Espinas JA, Borras JM, Martin I, Moreno V. Assessing
open heart surgery mortality in Catalonia (Spain) through a predictive risk model.
Eur J Cardiothorac Surg. 1997;11:415-423.
10. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk
for evaluating the results of surgery in acquired adult heart disease. Circulation.
1989;79(suppl 1):I3-I11.
11. Higgins TL, Estafanous FG, Loop FD, Beck GJ, Blum JM, Paranandi L. Stratification of morbidity and mortality outcome by preoperative risk factors in coronary artery bypass patients: a clinical severity score. JAMA. 1992;267:2344-2348.
12. Ash AS, Shwartz M. Evaluating the performance of risk-adjustment methods:
dichotomous measures. In: Iezzoni LI, ed. Risk Adjustment for Measuring
Health Care Outcomes. Ann Arbor, Mich: Health Administration Press; 1994:
313-346.
13. Parsonnet V, Bernstein AD, Gera M. Clinical usefulness of risk-stratified outcome analysis in cardiac surgery in New Jersey. Ann Thorac Surg. 1996;61
(suppl 2):S8-S11.
14. Charlson ME, Ales KL, Simon R, Mackenzie R. Why predictive indexes perform
less well in validation studies: is it magic or methods? Arch Intern Med. 1987;
147:2155-2161.
15. Unger F. European survey on cardiac interventions: open heart surgery, PTCA,
cardiac catheterization 1994. Ann Acad Scientarium Art Eur. 1995;12.
16. Poses RM, Cebul RD, Collins M, Fager SS. The importance of disease prevalence in transporting clinical prediction rules. Ann Intern Med. 1986;105:586-591.
17. Lemeshow S, Le Gall J-R. Modeling the severity of illness of ICU patients: a system update. JAMA. 1994;272:1049-1055.
18. Chassin MR, Hannan EL, DeBuono BA. Benefits and hazards of reporting medical outcome publicly. N Engl J Med. 1996;334:394-398.
19. Omoigui NA, Miller DP, Brown KJ, et al. Outmigration for coronary bypass surgery in an era of public dissemination of clinical outcomes. Circulation. 1996;
93:27-33.