ORIGINAL ARTICLE

Cardiac Surgical Mortality: Comparison Among Different Additive Risk-Scoring Models in a Multicenter Sample

Joan M. V. Pons, MD; Josep A. Espinas, MD; Josep M. Borras, MD; Victor Moreno, MD; Isaac Martin, MD; Alicia Granados, MD

Objective: To compare the performance of several risk-scoring models in predicting surgical mortality following open heart surgery.

Design: A prospective observational study.

Setting: Seven tertiary cardiac centers (3 private and 4 public and teaching hospitals) in Catalonia (Spain).

Patients: A consecutive sample of 1287 patients who underwent open heart surgery during a 6½-month period (February 14, 1994, to August 31, 1994).

Intervention: None.

Main Outcome Measure: Model discrimination capability was assessed with the c-statistic. A χ² test comparing observed and predicted mortality rates was used as a measure of model calibration. Performance of centers was evaluated through the standardized mortality ratio and by using the center as an indicator variable in a logistic regression model. The agreement among models for individual predictions was tested using weighted κ statistics.

Results: Models developed in other health care contexts showed, as expected, lower c-statistics and poor calibration. There were no statistically significant differences among hospitals after adjusting for patients' baseline risk factors with any of the different models. The models also agreed in the standardized rank of centers. Weighted κ statistics indicated poor agreement among models for individual patient risk prediction.

Conclusions: Models can be a useful tool to compare providers' performance and to give a more in-depth look at the process of care when appropriately customized to the context. Severity-adjusted models can also play a role in supporting the surgeon's informed but subjective assessment, but it is inappropriate to use them for individual predictions.

Arch Surg.
1998;133:1053-1057

From the Catalan Agency for Health Technology Assessment (Drs Pons, Espinas, Borras, and Granados), Catalan Institute of Oncology (Drs Borras and Moreno), Biostatistics and Epidemiology Laboratory, Department of Pediatrics, Obstetrics and Preventive Medicine, Faculty of Medicine, Autonomous University of Barcelona (Drs Moreno and Martin), Barcelona, Spain.

Specific hospital mortality rates have received increasing attention as a measure of health care outcome. However, the crude hospital mortality rate is an inaccurate indicator since it does not consider the severity of illness. When comparisons are made, it is essential to adjust mortality rates according to the presence of factors that might determine the risk of an adverse outcome.

Recent data on hospital cardiac surgical mortality have generated both controversy and confusion among stakeholders in health care systems: consumers, purchasers, and health care providers (hospitals and surgeons).¹ On the one hand, proponents of releasing this kind of information argue that despite potential inaccuracies, hospitals with very high mortality rates are likely to provide poor quality of care, and that increased consumer knowledge will lead to a greater demand for all hospitals to ensure quality of care.² On the other hand, there are arguments against public disclosure of provider-specific mortality rates. Criticism is directed at the quality of data used, the inaccuracy of models, and the misunderstanding of this information by the media. The use of additional indicators that can also be risk-adjusted, such as perioperative complications, improvements in functional capacity, quality of life, and patient satisfaction, as well as cost-benefit analysis, has also been suggested.³

Several severity measurement tools to assess surgical risk are now available.
They differ in their classification approach, conceptual foundation, risk factors included, outcome definition, potential reliability, resistance to manipulation, and availability of documentation.⁴ The characteristics of the population analyzed and the way a system was developed may affect its applicability to other health care settings.⁵

To our knowledge, there are few published studies that analyze the performance of several predictive models for patients who have undergone coronary artery bypass graft (CABG) surgery. Two of these assess the validity of 4 severity-adjusted models in an independent surgical database coming from a single center.⁶,⁷ Another study compared the performance of different CABG providers by use of severity models developed for hospitalized patients.⁸

The aim of our study is to compare how different additive risk-scoring models work in a multicenter sample of patients subjected to open heart surgery. Models were compared with regard to their calibration and discrimination capability, the assessment of differences among providers in surgical mortality, and individual patient prediction.

ARCH SURG/ VOL 133, OCT 1998 1053 ©1998 American Medical Association. All rights reserved. Downloaded From: https://jamanetwork.com/ on 04/09/2023

SUBJECTS AND METHODS

The population in this study came from the Catalan Study (CS) on Open Heart Surgery and detailed methods have been referred to previously.⁹ All consecutive open heart procedures carried out in adult patients in 7 centers in Barcelona, Spain, identified by a number, were included during a 6½-month period (February 14, 1994, to August 31, 1994). Data were registered on a specifically designed sheet. Overall, there were 1287 open heart procedures collected after excluding heart transplantations (22 cases performed in only 2 hospitals).

Three risk stratification models to predict surgical mortality were selected for our analysis. The CS model came from the study referred to above.⁹ The 2 other models were selected because they encompassed a range of extracorporeal cardiac procedures wider than CABG alone, were additive, and were not based on administrative data sets. The Parsonnet et al¹⁰ method is addressed to acquired adult heart disease; it stratified patients into 5 risk categories. Two factors (“catastrophic state” and “rare circumstances”) in the first version were valued subjectively. In the use of this risk model in our population, we gave a fixed value to these subjective items depending on the presence of any catastrophic state or rare circumstance and on the preoperative subjective risk assessment made by the surgeon. This subjective assessment, based on clinical judgment and data available before surgery, used 5 categories of risk. The other model selected was the Higgins et al¹¹ clinical severity score addressed to patients who had undergone CABG surgery and those who had accompanying procedures. For this study, we recoded the 9 severity categories used by Higgins et al into an ordinal scale with 5 risk levels based on similarity in observed mortality rates as shown in Figure 2 of the study by Higgins et al.

Predicted mortality rates were calculated for each model according to the original criteria. For the Parsonnet et al model, the predicted rate for each risk level was calculated averaging the individual scores within each category. For the Higgins et al model, predicted mortality rates were estimated from the figures of the original publication because the exact numbers were not reported. For the CS model, predicted rates were calculated in the validation subsample applying the observed mortality rates of the training subsample.

A χ² test to compare observed and predicted mortality rates was used as a measure of the calibration of the model. For each model, a c-statistic, which equals the area under a receiver operating characteristic curve, was used as a measure of discrimination (a c value of 0.5 suggests no ability to discriminate, and a value of 1.0 indicates perfect discrimination).¹² Differences in centers’ performances were assessed through the standardized mortality ratio (SMR), which is the ratio of the observed and the expected mortality rates. Correlations of centers’ SMR order were assessed by the Spearman rank correlation coefficient. To test for centers’ homogeneity in surgical mortality, a logistic regression analysis for each model was used. Finally, the agreement among models for individual predictions was tested using a weighted κ statistic in the cross-classification tables generated with each pair of models.

RESULTS

The population characteristics of the Higgins et al and the CS models are given in Table 1. Unfortunately, data on the Parsonnet et al model were not available from the original publication. The most striking inequalities were related to reoperation rates, the prevalence of chronic obstructive pulmonary disease while taking medication, and kidney disease. Other more clearly defined factors (demographic, surgical, and creatinine level) did not differ substantially.

Table 2 gives the sample population where the model was applied, the score values for each category, patients’ distribution by risk level, and the observed mortality rate for each of the 5 risk categories. All models combined the highest scores in the worst risk category. Except for the Parsonnet et al model, the first 2 risk levels composed more than 60% of patients.

In Table 3 we present the c-statistic, predicted and observed mortality rates in the population selected for the different models, and the χ² test for calibration. The highest c-statistic corresponded to the model specifically designed for this population (CS model).
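The c-statistic used here as the discrimination measure can be read as the probability that a randomly chosen patient who died had a higher risk score than a randomly chosen survivor, with ties counted as one half. The following is a minimal sketch of that definition in Python, not the authors' code; the function name `c_statistic` and the example scores and outcomes are hypothetical and are not study data.

```python
def c_statistic(scores, died):
    """Return the c-statistic (area under the ROC curve).

    scores: additive risk scores, higher meaning higher predicted risk.
    died:   matching outcomes, truthy for a surgical death.
    """
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for score_dead in deaths:
        for score_alive in survivors:
            if score_dead > score_alive:
                concordant += 1.0   # the death was ranked as riskier
            elif score_dead == score_alive:
                concordant += 0.5   # a tie counts as half a concordant pair
    return concordant / (len(deaths) * len(survivors))


# Hypothetical scores and outcomes, for illustration only (not study data).
example_scores = [2, 5, 7, 12, 12, 20, 31, 35]
example_died = [0, 0, 0, 0, 1, 0, 1, 1]
print(round(c_statistic(example_scores, example_died), 2))
```

In this toy example 13.5 of the 15 death-survivor pairs are concordant, so the value is 0.9. The pairwise loop is quadratic in the number of patients, which is acceptable for a sample of this size; a rank-based formula would give the same number faster.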
Statistically significant differences between observed and predicted mortality rates were seen in the 2 external models. The Parsonnet et al model underestimated the low risk level and overestimated the poor, high, and extremely high risk levels. The Higgins et al model uniformly underestimated the risk through all categories (Figure 1).

There were no statistically significant differences in mortality among centers when adjusting for any risk model as given in Table 3. Accordingly, all SMR 95% confidence intervals included the value 1, as shown in Figure 2. There was an almost-perfect agreement among models in the order assigned to centers depending on the SMR. Spearman rank correlation coefficient showed statistically significant values (P ≤ .02) for all the pairwise comparisons (rₛ between the CS and the Higgins et al models = 0.89; rₛ between the Higgins et al and the Parsonnet et al models = 1.00; and rₛ between the CS and the Parsonnet et al models = 0.88).

The weighted κ statistic between the Parsonnet et al and the CS models was 0.29 (n = 1287); between the Higgins et al and the CS models, 0.50 (n = 715); and between the Parsonnet et al and the Higgins et al models, 0.40 (n = 715).

COMMENT

This study underlines the applicability of risk stratification models when assessing cardiac surgical mortality. Risk models can be used to compare different providers and to offer a more objective and adjunctive assessment of patients’ risks, but it is inappropriate to use them to make individual predictions, or to base clinical decisions only on this assessment.

Risk models applied to our surgical database used different methods to select risk factors; they included different numbers of variables, assigned different weights to risk factors selected, and produced different classifications of patients by levels of surgical risk.
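How far two models agree when they classify the same patients into 5 ordinal risk levels is what the weighted κ statistic summarizes. The sketch below is one way to compute it from a pair of level assignments. It assumes linear disagreement weights, which is an assumption of this example because the weighting scheme is not stated in the text; the function name `weighted_kappa` is likewise hypothetical.

```python
def weighted_kappa(levels_a, levels_b, n_levels=5):
    """Weighted kappa for two ordinal classifications coded 1..n_levels.

    Assumption of this sketch: linear disagreement weights
    |i - j| / (n_levels - 1); the source does not state the weights used.
    """
    if len(levels_a) != len(levels_b) or not levels_a:
        raise ValueError("need two non-empty sequences of equal length")
    for level in list(levels_a) + list(levels_b):
        if not 1 <= level <= n_levels:
            raise ValueError("risk levels must lie between 1 and n_levels")

    n = len(levels_a)
    # Cross-classification table: rows are model A, columns are model B.
    table = [[0] * n_levels for _ in range(n_levels)]
    for a, b in zip(levels_a, levels_b):
        table[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_levels))
                  for j in range(n_levels)]

    observed = 0.0   # weighted disagreement actually observed
    expected = 0.0   # weighted disagreement expected by chance
    for i in range(n_levels):
        for j in range(n_levels):
            weight = abs(i - j) / (n_levels - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when all patients share one level")
    return 1.0 - observed / expected
```

A value of 1 means the two models place every patient in the same level, 0 means agreement no better than chance, and negative values mean systematic disagreement. Quadratic weights would penalize distant disagreements more heavily and would generally give different values.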
To avoid any subjective assessment of risk factors, most of the models try to include factors that can be objectively measured. Nevertheless, some of the variables that contributed the most in the first version of the Parsonnet et al model were valued subjectively, although in a more recent version the subjective input has been eliminated.¹³

Except for the CS model, the other models have been developed and validated in a single institution, although these external models have been applied by other institutions in other settings.⁶,⁷ The characteristics of the population where the models are developed should be a primary criterion for selecting a model to be applied in other institutions and in other settings.¹⁴

Most developed models on cardiac surgery deal with CABG surgery, the most common open heart procedure in developed countries. However, international registers have shown heterogeneity in the type of open heart procedures among countries.¹⁵ Therefore, as has been suggested, small differences in population selection may lead to different combinations of variables being selected for any predictive model.¹⁴

Table 1. Population Characteristics*
Model: Higgins et al¹¹; Catalan Study⁹ (Coronary Artery Bypass Graft Procedures); Catalan Study⁹ (Overall Population)
Characteristics: Sample size, No.; Mean age, y; Age ≥70 y; Sex, female; Diabetes while receiving medication; Chronic obstructive pulmonary disease while receiving medication; Cerebrovascular disease; Prior vascular surgery; Kidney disease; Serum creatinine level, ≥168 µmol/L; Liver disease; Left ventricular dysfunction (ejection fraction <35); Emergency operation; Mitral valve disease; Aortic valve operated on; Reoperation
5051 NA 22.9 20.6 17.2 715 62.53 22.5 19.4 20.9 1287 61.25 22.4 32.3 14.3 7.5 3.5 5.2 6.6 5.6 5.8 3.5 6.9 3.6 9.0 4.4 8.7 3.2 9.3 4.5 3.0 11.3 1.8 8.4 4.1 6.9 3.1 4.3 5.9 18.5 2.9 4.6 8.8 4.5 3.2 20.8 32.7 10.6
*Values except for mean age are in percentages. NA indicates not available.

Table 2. Risk Levels for Surgical Mortality With Percentage of Patients and Observed Mortality Rate in Each Category

Model*                            Risk Levels†
                                  1        2        3        4        5
Parsonnet et al¹⁰ (n = 1287)
  Scale values                    0-4      5-9      10-14    15-19    ≥20
  Cases, %                        30.3     12.4     24.2     12.0     21.2
  Observed mortality, %           5.4      6.9      9.6      13.0     21.0
Higgins et al¹¹ (n = 715)
  Scale values                    0-1      2-4      5-6      7-9      ≥10
  Cases, %                        46.6     30.9     7.8      10.1     4.6
  Observed mortality, %           3.3      13.6     14.3     19.4     36.4
Catalan Study⁹ (n = 1287)
  Scale values                    0-10     11-15    16-20    21-30    ≥31
  Cases, %                        52.3     15.9     13.5     12.1     6.1
  Observed mortality, %           4.2      7.3      13.2     19.2     54.4

*Scale values indicate the score ranges for each risk level in the different additive models used. The score for an individual patient was the sum of the weights of the risk factors assigned to that patient. Model references denote the results of applying these models to the current study’s population, not the results of the external models in the original publications.
†Risk levels indicate levels of increasing risk for surgical mortality according to each model’s definition. For example, 1, 2, 3, 4, and 5 represent good, fair, poor, high, and extremely high in the Parsonnet et al¹⁰ model and low, fair, high, very high, and extremely high in our Catalan Study. The 9-item severity categories of Higgins et al¹¹ were recoded into an ordinal scale with 5 risk levels.

Table 3. Measures of Model’s Accuracy and Center’s Differences

                                  Mortality Rates, %       χ² Test for       Test for Homogeneity
Model*               c-Statistic  Observed    Expected     Calibration, P    Among Centers, P
Parsonnet et al¹⁰    0.67         10.8        12.1         <.001             .40
Higgins et al¹¹      0.68         10.5        2.9          <.001             .10
Catalan Study⁹       0.76         9.8         9.8          .34               .23

*See asterisk footnote to Table 2 for information about model references.

Figure 1. Comparison of observed vs predicted mortality rates by risk level by applying the Catalan Study,⁹ Parsonnet et al,¹⁰ and Higgins et al¹¹ models, respectively, to our multicenter population. The model references denote the results of applying these models to the current study’s population, not the results of the external models in the original publications. Solid lines represent actual mortality rate; dotted lines, perfect fit for each model.

When applying the different models to the same multicenter population, we found that any model can be used to test for heterogeneity among centers. The models’ ranking of providers according to SMR showed that there was a good concordance in the order assigned.
To compare centers, predictive accuracy of an external model can be restored by an analytic adjustment for the differences in mortality prevalence in the 2 different populations.¹⁴,¹⁶ However, if the analyses of adjusted surgical outcomes have to be interpreted as an indicator that one should look more deeply into a specific center situation, models designed specifically for the study population are needed.

Figure 2. Ranking of centers by standardized mortality ratio and by applying the results of the Catalan Study,⁹ Parsonnet et al,¹⁰ and Higgins et al¹¹ models, respectively, to our multicenter population. The model references denote the results of applying these models to the current study’s population, not the results of the external models in the original publications. The horizontal bars show whether a center’s observed mortality rate is higher or lower than expected and, according to whether the 95% confidence interval of the standardized mortality ratio excludes or includes the value 1, whether the difference is statistically significant.

This study also points out the limitations of any predictive model when applied to predict individual risk, as has been the case with severity systems used in patients in intensive care settings. Using these predictive models as an adjunct to informed but subjective opinions made by surgeons is a reasonable and prudent choice, but using them to dictate individual patient decisions does not seem appropriate.¹⁷

There are some limitations to our study. One is sample size; the number of patients operated on by a single center in this study falls well below the hundreds of patients needed to detect meaningful differences in mortality rates. All predictive models also share another type of limitation.
They cannot adjust for all patient characteristics that may have, at least in some cases, an important impact on surgical mortality. Neither can they consider other patient nonclinical factors or inequalities in technical and therapeutic resources available in cardiac services nor the administrative or managerial differences in practices. Also, specifically for predictive models in surgical mortality, they cannot consider technical skills of surgeons that can be related to the learning curve and continual practice. Some risk assessment models have presented surgical adjusted mortality as a measure of surgeons’ technical skills, but this approach is still open to debate, and its potential ramifications are being scrutinized.¹⁸,¹⁹ Finally, models cannot assess another important issue in any procedure: its appropriateness.

CONCLUSIONS

Our study showed that predictive models can be a useful tool to standardize and to assess the performance of different providers. Although external models for open heart surgical risk are not as reliable as the model specifically designed for our study population, there is an agreement among them in the SMR relative value among centers. Risk models can play a role as an adjunct to the informed, although subjective, risk assessment made by the surgeons, but it is inappropriate to use them to dictate individual patient decisions. A specific approach might help to assess factors associated with the observed and expected differential rates. This analysis can provide insight into the process of care and, therefore, improve quality of care.
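The standardized mortality ratio on which the comparison of centers rests is the ratio of observed to expected mortality, and a center is flagged only when the 95% confidence interval of its SMR excludes 1. The sketch below illustrates that calculation for one center. The expected number of deaths is taken as the sum of the model-predicted risks of the center's patients, and the interval uses a log-scale approximation that treats the observed deaths as a Poisson count; both the interval method and the function name `smr_with_ci` are assumptions of this example, since the text does not state how the intervals were obtained.

```python
import math


def smr_with_ci(observed_deaths, predicted_risks, z=1.96):
    """Return (SMR, lower, upper) for one center.

    observed_deaths: number of deaths observed in the center.
    predicted_risks: model-predicted death probability for each patient.
    Assumption of this sketch: the interval is SMR * exp(+/- z / sqrt(O)),
    a log-scale approximation with O treated as a Poisson count.
    """
    expected_deaths = sum(predicted_risks)
    if observed_deaths <= 0 or expected_deaths <= 0:
        raise ValueError("approximation needs positive observed and expected deaths")
    smr = observed_deaths / expected_deaths
    half_width = z / math.sqrt(observed_deaths)
    return smr, smr * math.exp(-half_width), smr * math.exp(half_width)


# Hypothetical center: 200 patients, each with a predicted risk of 10%,
# and 24 observed deaths (illustrative numbers, not study data).
smr, lower, upper = smr_with_ci(24, [0.10] * 200)
excludes_one = lower > 1.0 or upper < 1.0
print(round(smr, 2), round(lower, 2), round(upper, 2), excludes_one)
```

Because observed and expected rates share the same denominator within a center, the ratio of death counts equals the ratio of mortality rates. In the hypothetical example the SMR is 1.2 but the interval still includes 1, which mirrors the point made above about the number of patients per center limiting the ability to detect differences.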
We are indebted to the cardiac surgeons at Centre Quirúrgic Sant Jordi, Clínica Quirón, Hospital de Barcelona, Hospital Clínic i Provincial, Hospital General de la Vall d’Hebron, Barcelona, Spain; Hospital Prínceps d’Espanya de Bellvitge, L’Hospitalet, Spain; and Hospital de la Santa Creu i Sant Pau, Barcelona, for their support and cooperation; to the following surgeons as representatives of participating centers: Alejandro Aris, MD, Eduard Castells, MD, Lluïsa Camera, MD, Josep M. Caralps, MD, Carles Fontanillas, MD, Francisco Murillo, MD, Jaume Mulet, MD, Marcos Murtra, MD, Jose Luis Pomar, MD, Félix Rovira, MD, Josep Oriol Sole, MD; to Maria Cardona, MD, as research assistant for the study; to A. Ginel, MD, and J. Montiel, MD, for their data collection assistance; to Cari Almazan, MD, Albert J. Jovell, MD, and Laura Sampietro-Colom, MD, for their helpful comments; and to David Lavine for his assistance in manuscript preparation.

Reprints: Joan M. V. Pons, MD, Catalan Agency for Health Technology Assessment, Travessera de les Corts 131-159, Pavelló Ave Maria, 08028 Barcelona, Spain (e-mail: jpons@olimpia.scs.es).

REFERENCES

1. Iezzoni LI, Shwartz M, Restuccia J. The role of severity information in health policy debates: a survey of state and regional concerns. Inquiry. 1991;28:117-128.
2. Greenfield S, Aronow HU, Elashoff RM, Watanabe D. Flaws in mortality data. JAMA. 1988;260:2253-2255.
3. Kouchoukos NT, Anderson RP, Fosburg RG, et al. Report of the Ad Hoc Committee on Physician-Specific Mortality Rates for Cardiac Surgery. Ann Thorac Surg. 1993;56:1200-1202.
4. Iezzoni LI, Ash AS, Coffman GA, Moskowitz MA. Predicting in-hospital mortality: a comparison of severity measurement approaches. Med Care. 1992;30:347-359.
5. Iezzoni LI. Risk and outcomes. In: Iezzoni LI, ed. Risk Adjustment for Measuring Health Care Outcomes. Ann Arbor, Mich: Health Administration Press; 1994:123.
6. Orr RK, Maini BS, Sottile FD, Dumas EM, O’Mara P. Comparison of four severity-adjusted models to predict mortality after coronary artery bypass graft surgery. Arch Surg. 1995;130:301-306.
7. Weightman WM, Gibbs NM, Sheminant MR, Thackray NM, Newman MA. Risk prediction in coronary artery surgery: a comparison of four risk scores. Med J Aust. 1997;166:408-411.
8. Landon B, Iezzoni LI, Ash AS, et al. Judging hospitals by severity-adjusted mortality rates: the case of CABG surgery. Inquiry. 1996;33:155-166.
9. Pons JMV, Granados A, Espinas JA, Borras JM, Martin I, Moreno V. Assessing open heart surgery mortality in Catalonia (Spain) through a predictive risk model. Eur J Cardiothorac Surg. 1997;11:415-423.
10. Parsonnet V, Dean D, Bernstein AD. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation. 1989;79(suppl 1):I3-I11.
11. Higgins TL, Estafanous FG, Loop FD, Beck GJ, Blum JM, Paranandi L. Stratification of morbidity and mortality outcome by preoperative risk factors in coronary artery bypass patients: a clinical severity score. JAMA. 1992;267:2344-2348.
12. Ash AS, Shwartz M. Evaluating the performance of risk-adjustment methods: dichotomous measures. In: Iezzoni LI, ed. Risk Adjustment for Measuring Health Care Outcomes. Ann Arbor, Mich: Health Administration Press; 1994:313-346.
13. Parsonnet V, Bernstein AD, Gera M. Clinical usefulness of risk-stratified outcome analysis in cardiac surgery in New Jersey. Ann Thorac Surg. 1996;61(suppl 2):S8-S11.
14. Charlson ME, Ales KL, Simon R, Mackenzie R. Why predictive indexes perform less well in validation studies: is it magic or methods? Arch Intern Med. 1987;147:2155-2161.
15. Unger F. European survey on cardiac interventions: open heart surgery, PTCA, cardiac catheterization 1994. Ann Acad Scientarium Art Eur. 1995;12.
16. Poses RM, Cebul RD, Collins M, Fager SS. The importance of disease prevalence in transporting clinical prediction rules. Ann Intern Med. 1986;105:586-591.
17. Lemeshow S, Le Gall J-R. Modeling the severity of illness of ICU patients: a system update. JAMA. 1994;272:1049-1055.
18. Chassin MR, Hannan EL, DeBuono BA. Benefits and hazards of reporting medical outcome publicly. N Engl J Med. 1996;334:394-398.
19. Omoigui NA, Miller DP, Brown KJ, et al. Outmigration for coronary bypass surgery in an era of public dissemination of clinical outcomes. Circulation. 1996;93:27-33.