Abstract
In healthcare, the development and deployment of insufficiently fair artificial-intelligence (AI) systems can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, discuss how algorithmic biases (in particular, in data acquisition, genetic variation and intra-observer labelling variability) arise in clinical workflows, and examine the healthcare disparities that result. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and the role of these techniques in the development of AI-based software as a medical device.
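The stratified assessments mentioned above amount to computing a model's error rates separately for each subpopulation and comparing them. The sketch below illustrates this disaggregated evaluation for a binary classifier, reporting per-group true-positive and false-positive rates and the largest between-group gaps (an equalized-odds-style check). The function names and the synthetic data are illustrative, not from the Perspective itself.

```python
# Minimal sketch of a stratified (disaggregated) fairness assessment:
# compare a binary classifier's error rates across subpopulations.
# Data and function names are illustrative.

from collections import defaultdict

def stratified_rates(y_true, y_pred, groups):
    """Per-group true-positive rate (TPR) and false-positive rate (FPR)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == 1:
            counts[g]["tp" if yp == 1 else "fn"] += 1
        else:
            counts[g]["fp" if yp == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
        }
    return rates

def equalized_odds_gap(rates):
    """Largest between-group difference in TPR and in FPR."""
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

if __name__ == "__main__":
    # Synthetic example: group A is over-flagged, group B is under-diagnosed.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    rates = stratified_rates(y_true, y_pred, groups)
    tpr_gap, fpr_gap = equalized_odds_gap(rates)
    print(rates)                 # A: TPR 1.0, FPR 0.5; B: TPR 0.5, FPR 0.0
    print(tpr_gap, fpr_gap)      # 0.5 0.5
```

In a clinical setting, the grouping variable would be a patient attribute of interest (e.g. self-reported race, sex or age band), and a non-zero TPR gap between groups corresponds to the underdiagnosis disparities that stratified audits of chest-radiograph classifiers have reported.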
References
Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy disparities in commercial gender classification. In Conf. on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Pierson, E., Cutler, D. M., Leskovec, J., Mullainathan, S. & Obermeyer, Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat. Med. 27, 136–140 (2021).
Hooker, S. Moving beyond ‘algorithmic bias is a data problem’. Patterns 2, 100241 (2021).
McCradden, M. D., Joshi, S., Mazwi, M. & Anderson, J. A. Ethical limitations of algorithmic fairness solutions in health care machine learning. Lancet Digit. Health 2, e221–e223 (2020).
Mhasawade, V., Zhao, Y. & Chunara, R. Machine learning and algorithmic fairness in public and population health. Nat. Mach. Intell. 3, 659–666 (2021).
Currie, G. & Hawk, K. E. Ethical and legal challenges of artificial intelligence in nuclear medicine. In Seminars in Nuclear Medicine (Elsevier, 2020).
Chen, I. Y. et al. Ethical machine learning in healthcare. Annu. Rev. Biomed. Data Sci. 4, 123–144 (2020).
Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).
Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I. Y. & Ghassemi, M. CheXclusion: fairness gaps in deep chest X-ray classifiers. In BIOCOMPUTING 2021: Proc. Pacific Symposium 232–243 (World Scientific, 2020).
Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y. & Ghassemi, M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 27, 2176–2182 (2021).
Gichoya, J. W. et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit. Health 4, E406–E414 (2022).
Glocker, B., Jones, C., Bernhardt, M. & Winzeck, S. Algorithmic encoding of protected characteristics in chest X-ray disease detection models. EBioMedicine 89, 104467 (2023).
Proposed Regulatory Framework for Modifications to Artificial Intelligence. Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) (US FDA, 2019).
Gaube, S. et al. Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit. Med. 4, 31 (2021).
Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SAMD) Action Plan (US FDA, 2021).
Vyas, D. A. et al. Challenging the use of race in the vaginal birth after cesarean section calculator. Women’s Health Issues 29, 201–204 (2019).
Vyas, D. A., Eisenstein, L. G. & Jones, D. S. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N. Engl. J. Med. 383, 874–882 (2020).
van der Burgh, A. C., Hoorn, E. J. & Chaker, L. Removing race from kidney function estimates. JAMA 325, 2018 (2021).
Diao, J. A. et al. Clinical implications of removing race from estimates of kidney function. JAMA 325, 184–186 (2021). 2021.
Caton, S. & Haas, C. Fairness in machine learning: a survey. Preprint at https://doi.org/10.48550/arXiv.2010.04053 (2020).
Adler, N. E., Glymour, M. M. & Fielding, J. Addressing social determinants of health and health inequalities. JAMA 316, 1641–1642 (2016).
Phelan, J. C. & Link, B. G. Is racism a fundamental cause of inequalities in health? Annu. Rev. Socio 41, 311–330 (2015).
Yehia, B. R. et al. Association of race with mortality among patients hospitalized with coronavirus disease 2019 (COVID-19) at 92 US hospitals. JAMA Netw. Open 3, e2018039 (2020).
Lopez, L., Hart, L. H. & Katz, M. H. Racial and ethnic health disparities related to COVID-19. JAMA 325, 719–720 (2021).
Bonvicini, K. A. LGBT healthcare disparities: what progress have we made? Patient Educ. Couns. 100, 2357–2361 (2017).
Yamada, T. et al. Access disparity and health inequality of the elderly: unmet needs and delayed healthcare. Int. J. Environ. Res. Public Health 12, 1745–1772 (2015).
Moy, E., Dayton, E. & Clancy, C. M. Compiling the evidence: the national healthcare disparities reports. Health Aff. 24, 376–387 (2005).
Balsa, A. I., Seiler, N., McGuire, T. G. & Bloche, M. G. Clinical uncertainty and healthcare disparities. Am. J. Law Med. 29, 203–219 (2003).
Marmot, M. Social determinants of health inequalities. Lancet 365, 1099–1104 (2005).
Maness, S. B. et al. Social determinants of health and health disparities: COVID-19 exposures and mortality among African American people in the United States. Public Health Rep. 136, 18–22 (2021).
Seligman, H. K., Laraia, B. A. & Kushel, M. B. Food insecurity is associated with chronic disease among low-income NHANES participants. J. Nutr. 140, 304–310 (2010).
Thun, M. J., Apicella, L. F. & Henley, S. J. Smoking vs other risk factors as the cause of smoking attributable deaths: confounding in the courtroom. JAMA 284, 706–712 (2000).
Tucker, M. J., Berg, C. J., Callaghan, W. M. & Hsia, J. The Black–White disparity in pregnancy-related mortality from 5 conditions: differences in prevalence and case-fatality rates. Am. J. Public Health 97, 247–251 (2007).
Gadson, A., Akpovi, E. & Mehta, P. K. Exploring the social determinants of racial/ethnic disparities in prenatal care utilization and maternal outcome. In Seminars in Perinatology 41, 308–317 (Elsevier, 2017).
Wallace, M. et al. Maternity care deserts and pregnancy-associated mortality in louisiana. Women’s Health Issues 31, 122–129 (2021).
Burchard, E. G. et al. The importance of race and ethnic background in biomedical research and clinical practice. N. Engl. J. Med. 348, 1170–1175 (2003).
Phimister, E. G. Medicine and the racial divide. N. Engl. J. Med. 348, 1081–1082 (2003).
Bonham, V. L., Green, E. D. & Pérez-Stable, E. J. Examining how race, ethnicity, and ancestry data are used in biomedical research. JAMA 320, 1533–1534 (2018).
Eneanya, N. D., Yang, W. & Reese, P. P. Reconsidering the consequences of using race to estimate kidney function. JAMA 322, 113–114 (2019).
Zelnick, L. R., Leca, N., Young, B. & Bansal, N. Association of the estimated glomerular filtration rate with vs without a coefficient for race with time to eligibility for kidney transplant. JAMA Netw. Open 4, e2034004 (2021).
Chadban, S. J. et al. KDIGO clinical practice guideline on the evaluation and management of candidates for kidney transplantation. Transplantation 104, S11–S103 (2020).
Wesselman, H. et al. Social determinants of health and race disparities in kidney transplant. Clin. J. Am. Soc. Nephrol. 16, 262–274 (2021).
Kanis JA, H. N. M. E. & Johansson, H. A brief history of frax. Arch. Osteoporos. 13, 118 (2018).
Lewiecki, E. M., Wright, N. C. & Singer, A. J. Racial disparities, frax, and the care of patients with osteoporosis. Osteoporos. Int. 31, 2069–2071 (2020).
Civil Rights Act of 1964. Title VII, Equal Employment Opportunities https://en.wikipedia.org/wiki/Civil_Rights_Act_of_1964 (1964)
Griggs v. Duke Power Co. https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co (1971).
Awad, E. et al. The moral machine experiment. Nature 563, 59–64 (2018).
Feller, A., Pierson, E., Corbett-Davies, S. & Goel, S. A computer program used for bail and sentencing decisions was labeled biased against blacks. it’s actually not that clear. The Washington Post (17 October 2016).
Dressel, J. & Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4, eaao5580 (2018).
Char, D. S., Shah, N. H. & Magnus, D. Implementing machine learning in health care—addressing ethical challenges. N. Engl. J. Med. 378, 981–983 (2018).
Bernhardt, M., Jones, C. & Glocker, B. Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms. Nat. Med. 28, 1157–1158 (2022).
Mukherjee, P. et al. Confounding factors need to be accounted for in assessing bias by machine learning algorithms. Nat. Med. 28, 1159–1160 (2022).
Diao, J. A., Powe, N. R. & Manrai, A. K. Race-free equations for eGFR: comparing effects on CKD classification. J. Am. Soc. Nephrol. 32, 1868–1870 (2021).
Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C. & Venkatasubramanian, S. Certifying and removing disparate impact. In Proc. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 259–268 (2015).
Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Adv. Neural Information Processing Systems (2016).
Corbett-Davies, S. & Goel, S. The measure and mismeasure of fairness: a critical review of fair machine learning. Preprint at https://doi.org/10.48550/arXiv.1808.00023 (2018).
Calders, T., Kamiran, F. & Pechenizkiy, M. Building classifiers with independency constraints. In Int. Conf. Data Mining Workshops 13–18 (IEEE, 2009).
Chen, J., Kallus, N., Mao, X., Svacha, G. & Udell, M. Fairness under unawareness: assessing disparity when protected class is unobserved. In Proc. Conf. Fairness, Accountability, and Transparency 339–348 (2019).
Zliobaite, I., Kamiran, F. & Calders, T. Handling conditional discrimination. In 11th Int. Conf. Data Mining 992–1001 (IEEE, 2011).
Dwork, C., Hardt, M., Pitassi, T., Reingold, O. & Zemel, R. Fairness through awareness. In Proc. 3rd Innovations in Theoretical Computer Science Conf. 214–226 (2012).
Pedreshi, D., Ruggieri, S. & Turini, F. Discrimination-aware data mining. In Proc. 14th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining 560–568 (2008).
Angwin, J., Larson, J., Mattu, S. & Kirchner, L. In Ethics of Data and Analytics 254–264 (Auerbach, 2016).
Kleinberg, J., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conf. (ITCS 2017)
Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).
Joseph, M., Kearns, M., Morgenstern, J. H. & Roth, A. Fairness in learning: classic and contextual bandits. In Adv. Neural Information Processing Systems (2016).
Celis, L. E. & Keswani, V. Improved adversarial learning for fair classification. Preprint at https://doi.org/10.48550/arXiv.1901.10443 (2019).
Kamiran, F. & Calders, T. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33, 1–33 (2012).
Calmon, F. P., Wei, D., Vinzamuri, B., Ramamurthy, K. N. & Varshney, K. R. Optimized pre-processing for discrimination prevention. In Proc. 31st Int. Conf. Neural Information Processing Systems 3995–4004 (2017).
Krasanakis, E., Spyromitros-Xioufis, E., Papadopoulos, S. & Kompatsiaris, Y. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In Proc. 2018 World Wide Web Conf. 853–862 (2018).
Jiang, H. & Nachum, O. Identifying and correcting label bias in machine learning. In Int. Conf. Artificial Intelligence and Statistics 702–712 (PMLR, 2020).
Chai, X. et al. Unsupervised domain adaptation techniques based on auto-encoder for non-stationary eeg-based emotion recognition. Comput. Biol. Med. 79, 205–214 (2016).
Kamishima, T., Akaho, S., Asoh, H. & Sakuma, J. Fairness-aware classifier with prejudice remover regularizer. In Joint Eur. Conf. Machine Learning and Knowledge Discovery in Databases 35–50 (Springer, 2012).
Zafar, M. B., Valera, I., Rogriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. In Artificial Intelligence and Statistics 962–970 (PMLR, 2017).
Goel, N., Yaghini, M. & Faltings, B. Non-discriminatory machine learning through convex fairness criteria. In 32nd AAAI Conference on Artificial Intelligence (2018).
Goh, G., Cotter, A., Gupta, M. & Friedlander, M. P. Satisfying real-world goals with dataset constraints. In Adv. Neural Information Processing Systems (2016).
Agarwal, A. et al. A reductions approach to fair classification. In Int. Conf. Machine Learning 60–69 (PMLR, 2018).
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S. & Huq, A. Algorithmic decision making and the cost of fairness. In Proc. 23rd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining 797–806 (2017).
Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J. & Weinberger, K. Q. On fairness and calibration. In Adv. Neural Information Processing Systems (2017).
Chouldechova, A., Benavides-Prado, D., Fialko, O. & Vaithianathan, R. A case study of algorithm assisted decision making in child maltreatment hotline screening decisions. In Conf. Fairness, Accountability and Transparency 134–148 (PMLR, 2018).
Abernethy, J., Awasthi, P., Kleindessner, M., Morgenstern, J. & Zhang, J. Active sampling for min-max fairness. In Int. Conf. Machine Learning 53–65, (PMLR, 2022).
Iosifidis, V. & Ntoutsi, E. Dealing with bias via data augmentation in supervised learning scenarios. In Proc. Int. Workshop on Bias in Information, Algorithms, and Systems (eds. Bates, J. et al.) (2018).
Vodrahalli, K., Li, K. & Malik, J. Are all training examples created equal? An empirical study. Preprint at https://doi.org/10.48550/arXiv.1811.12569 (2018).
Barocas, S. & Selbst, A. D. Big data’s disparate impact. Calif. Law Rev. 104, 671 (2016).
O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Crown, 2016).
Rezaei, A., Liu, A., Memarrast, O. & Ziebart, B. D. Robust fairness under covariate shift. In Proc. AAAI Conf. Artificial Intelligence 35, 9419–9427 (2021).
Alabi, D., Immorlica, N. & Kalai, A. Unleashing linear optimizers for group-fair learning and optimization. In Conf. Learning Theory 2043–2066 (PMLR, 2018).
Kearns, M., Neel, S., Roth, A. & Wu, Z. S. Preventing fairness gerrymandering: auditing and learning for subgroup fairness. In Int. Conf. Machine Learning 2564–2572 (PMLR, 2018).
Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).
Babenko, B. et al. Detection of signs of disease in external photographs of the eyes via deep learning. Nat. Biomed. Eng. 6, 1370–1383 (2022).
Kamishima, T., Akaho, S. & Sakuma, J. Fairness-aware learning through regularization approach. In 2011 IEEE 11th Int. Conf. Data Mining Workshops 643–650 (IEEE, 2011).
Zafar, M. B., Valera, I., Gomez Rodriguez, M. & Gummadi, K. P. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proc. 26th Int. Conf. World Wide Web 1171–1180 (2017).
Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. Learning fair representations. In Int. Conf. Machine Learning 325–333 (PMLR, 2013).
Kim, M., Reingold, O. & Rothblum, G. Fairness through computationally-bounded awareness. In Adv. Neural Information Processing Systems (2018).
Pfohl, S. R., Foryciarz, A. & Shah, N. H. An empirical characterization of fair machine learning for clinical risk prediction. J. Biomed. Inform. 113, 103621 (2021).
Foryciarz, A., Pfohl, S. R., Patel, B. & Shah, N. Evaluating algorithmic fairness in the presence of clinical guidelines: the case of atherosclerotic cardiovascular disease risk estimation. BMJ Health Care Inf. 29, e100460 (2022).
Muntner, P. et al. Potential US population impact of the 2017 ACC/AHA high blood pressure guideline. Circulation 137, 109–118 (2018).
Chen, I., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Adv. Neural Information Processing Systems (2018).
Raji, I. D. & Buolamwini, J. Actionable auditing: investigating the impact of publicly naming biased performance results of commercial AI products. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 429–435 (2019).
Rolf, E., Worledge, T., Recht, B. & Jordan, M. I. Representation matters: assessing the importance of subgroup allocations in training data. In Int. Conf. Machine Learning 9040–9051 (2021).
Zhao, H. & Gordon, G. Inherent tradeoffs in learning fair representations. In Adv. Neural InformationProcessing Systems 32, 15675–15685 (2019).
Pfohl, S. et al. Creating fair models of atherosclerotic cardiovascular disease risk. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 271–278 (2019).
Pfohl, S. R. Recommendations for Algorithmic Fairness Assessments of Predictive Models in Healthcare: Evidence from Large-scale Empirical Analyses. PhD thesis, Stanford Univ. (2021).
Singh, H., Singh, R., Mhasawade, V. & Chunara, R. Fairness violations and mitigation under covariate shift. In Proc. 2021 ACM Conf. Fairness, Accountability, and Transparency 3–13 (2021).
Biswas, A. & Mukherjee, S. Ensuring fairness under prior probability shifts. In Proc. 2021 AAAI/ACM Conf. AI, Ethics, and Society 414–424 (2021).
Giguere, S. et al. Fairness guarantees under demographic shift. In Int. Conf. Learning Representations (2021).
Mishler, A. & Dalmasso, N. Fair when trained, unfair when deployed: observable fairness measures are unstable in performative prediction settings. Preprint at https://doi.org/10.48550/arXiv.2202.05049 (2022).
Duchi, J. & Namkoong, H. Learning models with uniform performance via distributionally robust optimization. Ann. Stat. 49, 1378–1406 (2021).
Hashimoto, T., Srivastava, M., Namkoong, H. & Liang, P. Fairness without demographics in repeated loss minimization. In Int. Conf. Machine Learning 1929–1938 (PMLR, 2018).
Wang, S. et al. Robust optimization for fairness with noisy protected groups. In Adv. Neural InformationProcessing Systems 33, 5190–5203 (2020).
Coston, A. et al. Fair transfer learning with missing protected attributes. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 91–98 (2019).
Schumann, C. et al. Transfer of machine learning fairness across domains. In NeurIPS AI for Social Good Workshop (2019).
Lahoti, P. et al. Fairness without demographics through adversarially reweighted learning. In Adv. Neural Information Processing Systems 33, 728–740 (2020).
Yan, S., Kao, H.-t. & Ferrara, E. Fair class balancing: enhancing model fairness without observing sensitive attributes. In Proc. 29th ACM Int. Conf. Information and Knowledge Management 1715–1724 (2020).
Zhao, T., Dai, E., Shu, K. & Wang, S. Towards fair classifiers without sensitive attributes: exploring biases in related features. In Proc. 15th ACM Int. Conf. Web Search and Data Mining 1433–1442 (2022).
Quinonero-Candela, J., Sugiyama, M., Lawrence, N. D. & Schwaighofer, A. Dataset Shift in Machine Learning (MIT Press, 2009).
Subbaswamy, A., Schulam, P. & Saria, S. Preventing failures due to dataset shift: learning predictive models that transport. In 22nd Int. Conf. Artificial Intelligence and Statistics 3118–3127 (PMLR, 2019).
Subbaswamy, A. & Saria, S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics 21, 345–352 (2020).
Guo, L. L. et al. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci. Rep. 12, 2726 (2022).
Singh, H., Singh, R., Mhasawade, V. & Chunara, R. Fair predictors under distribution shift. In NeurIPS Workshop on Fair ML for Health (2019).
Bernhardt, M., Jones, C. & Glocker, B. Investigating underdiagnosis of ai algorithms in the presence of multiple sources of dataset bias. Nat. Med. 28, 1157–1158 (2022).
Ghosh, A. & Shanbhag, A. FairCanary: rapid continuous explainable fairness. In Proc. AAAI/ACM Conf. AI, Ethics, and Society (2022).
Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks. In Int. Conf. Learning Representations (2020).
Yang, Y., Zhang, H., Katabi, D. & Ghassemi, M. Change is hard: a closer look at subpopulation shift. In Int. Conf. Machine Learning (2023).
Zong, Y., Yang, Y. & Hospedales, T. MEDFAIR: benchmarking fairness for medical imaging. In Int. Conf. Learning Representations (2023).
Lipkova, J. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat. Med. 28, 575–582 (2022).
Tedeschi, P. & Griffith, J. R. Classification of hospital patients as ‘surgical’. Implications of the shift to ICD-9-CM. Med. Care 22, 189–192 (1984).
Heslin, K. C. et al. Trends in opioid-related inpatient stays shifted after the US transitioned to ICD-10-CM diagnosis coding in 2015. Med. Care 55, 918–923 (2017).
Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal 6, pl1 (2013).
Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Wen, D. et al. Characteristics of publicly available skin cancer image datasets: a systematic review. Lancet Digit. Health 4, e64–e74 (2021).
Khan, S. M. et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit. Health 3, e51–e66 (2021).
Mamary, A. J. et al. Race and gender disparities are evident in COPD underdiagnoses across all severities of measured airflow obstruction. Chronic Obstr. Pulm. Dis. 5, 177 (2018).
Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y. & Ghassemi, M. Reply to: ‘potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms’ and ‘confounding factors need to be accounted for in assessing bias by machine learning algorithms’. Nat. Med. 28, 1161–1162 (2022).
Landry, L. G., Ali, N., Williams, D. R., Rehm, H. L. & Bonham, V. L. Lack of diversity in genomic databases is a barrier to translating precision medicine research into practice. Health Aff. 37, 780–785 (2018).
Gusev, A. et al. Atlas of prostate cancer heritability in European and African-American men pinpoints tissue-specific regulation. Nat. Commun. 7, 10979 (2016).
Hinch, A. G. et al. The landscape of recombination in African Americans. Nature 476, 170–175 (2011).
Shriver, M. D. et al. Skin pigmentation, biogeographical ancestry and admixture mapping. Hum. Genet. 112, 387–399 (2003).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015).
Puyol-Anton, E. et al. Fairness in cardiac MR image analysis: an investigation of bias due to data imbalance in deep learning based segmentation. Med. Image Comput. Computer Assist. Intervention 24, 413–423 (2021).
Kraft, S. A. et al. Beyond consent: building trusting relationships with diverse populations in precision medicine research. Am. J. Bioeth. 18, 3–20 (2018).
West, K. M., Blacksher, E. & Burke, W. Genomics, health disparities, and missed opportunities for the nation’s research agenda. JAMA 317, 1831–1832 (2017).
Mahal, B. A. et al. Racial differences in genomic profiling of prostate cancer. N. Engl. J. Med. 383, 1083–1085 (2020).
Shi, Y. et al. A prospective, molecular epidemiology study of EGFR mutations in asian patients with advanced non–small-cell lung cancer of adenocarcinoma histology (PIONEER). J. Thorac. Oncol. 9, 154–162 (2014).
Spratt, D. E. et al. Racial/ethnic disparities in genomic sequencing. JAMA Oncol. 2, 1070–1074 (2016).
Zhang, G. et al. Characterization of frequently mutated cancer genes in chinese breast tumors: a comparison of chinese and TCGA cohorts. Ann. Transl. Med. 7, 179 (2019).
Zavala, V. A. et al. Cancer health disparities in racial/ethnic minorities in the United States. Br. J. Cancer 124, 315–332 (2020).
Zhang, W., Edwards, A., Flemington, E. K. & Zhang, K. Racial disparities in patient survival and tumor mutation burden, and the association between tumor mutation burden and cancer incidence rate. Sci. Rep. 7, 13639 (2017).
Ooi, S. L., Martinez, M. E. & Li, C. I. Disparities in breast cancer characteristics and outcomes by race/ethnicity. Breast Cancer Res. Treat. 127, 729–738 (2011).
Henderson, B. E., Lee, N. H., Seewaldt, V. & Shen, H. The influence of race and ethnicity on the biology of cancer. Nat. Rev. Cancer 12, 648–653 (2012).
Gamble, P. et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun. Med. 1, 1–12 (2021).
Borrell, L. N. et al. Race and genetic ancestry in medicine—a time for reckoning with racism. N. Engl. J. Med. 384, 474–480 (2021).
Martini, R., Newman, L. & Davis, M. Breast cancer disparities in outcomes; unmasking biological determinants associated with racial and genetic diversity. Clin. Exp. Metastasis 39, 7–14 (2022).
Martini, R. et al. African ancestry–associated gene expression profiles in triple-negative breast cancer underlie altered tumor biology and clinical outcome in women of African descent. Cancer Discov. 12, 2530–2551 (2022).
Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1–selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).
Clarke, M. A., Devesa, S. S., Hammer, A. & Wentzensen, N. Racial and ethnic differences in hysterectomy-corrected uterine corpus cancer mortality by stage and histologic subtype. JAMA Oncol. 8, 895–903 (2022).
Yeyeodu, S. T., Kidd, L. R. & Kimbro, K. S. Protective innate immune variants in racial/ethnic disparities of breast and prostate cancer. Cancer Immunol. Res. 7, 1384–1389 (2019).
Yang, W. et al. Sex differences in gbm revealed by analysis of patient imaging, transcriptome, and survival data. Sci. Transl. Med. 11, eaao5253 (2019).
Carrano, A., Juarez, J. J., Incontri, D., Ibarra, A. & Cazares, H. G. Sex-specific differences in glioblastoma. Cells 10, 1783 (2021).
Creed, J. H. et al. Commercial gene expression tests for prostate cancer prognosis provide paradoxical estimates of race-specific risk. Cancer Epidemiol. Biomark. Prev. 29, 246–253 (2020).
Burlina, P., Joshi, N., Paul, W., Pacheco, K. D. & Bressler, N. M. Addressing artificial intelligence bias in retinal diagnostics. Transl. Vis. Sci. Technol. 10, 13 (2021).
Kakadekar, A., Greene, D. N., Schmidt, R. L., Khalifa, M. A. & Andrews, A. R. Nonhormone-related histologic findings in postsurgical pathology specimens from transgender persons: a systematic review. Am. J. Clin. Pathol. 157, 337–344 (2022).
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).
Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for fair and efficient machine learning. In Conf. Fairness, Accountability and Transparency (PMLR, 2018).
Lipton, Z., McAuley, J. & Chouldechova, A. Does mitigating ml’s impact disparity require treatment disparity? In Adv. Neural Information Processing Systems (2018).
Madras, D., Creager, E., Pitassi, T. & Zemel, R. Fairness through causal awareness: learning causal latent-variable models for biased data. In Proc. Conf. Fairness, Accountability, and Transparency 349–358 (2019).
Lohaus, M., Kleindessner, M., Kenthapadi, K., Locatello, F. & Russell, C. Are two heads the same as one? Identifying disparate treatment in fair neural networks. In Adv. Neural Information Processing Systems (2022).
McCarty, C. A. et al. The emerge network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med. Genet. 4, 1–11 (2011).
Gottesman, O. et al. The electronic medical records and genomics (emerge) network: past, present, and future. Genet. Med. 15, 761–771 (2013).
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).
Dehkharghanian, T. et al. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn. Pathol. 18, 1–12 (2023).
Ganin, Y. & Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Int. Conf. Machine Learning 1180–1189 (PMLR, 2015).
Shaban, M. T., Baur, C., Navab, N. & Albarqouni, S. StainGAN: stain style transfer for digital histological images. In 2019 IEEE 16th Int. Symp. Biomedical Imaging (ISBI 2019) 953–956 (IEEE, 2019).
Widmer, G. & Kubat, M. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 69–101 (1996).
Schlimmer, J. C. & Granger, R. H. Incremental learning from noisy data. Mach. Learn. 1, 317–354 (1986).
Lu, J. et al. Learning under concept drift: a review. IEEE Trans. Knowl. Data Eng. 31, 2346–2363 (2018).
Guo, L. L. et al. Systematic review of approaches to preserve machine learning performance in the presence of temporal dataset shift in clinical medicine. Appl. Clin. Inform. 12, 808–815 (2021).
Barocas, S. et al. Designing disaggregated evaluations of AI systems: choices, considerations, and tradeoffs. In Proc. 2021 AAAI/ACM Conf. AI, Ethics, and Society 368–378 (2021).
Zhou, H., Chen, Y. & Lipton, Z. C. Evaluating model performance in medical datasets over time. In Proc. Conf. Health, Inference, and Learning (2023).
Scholkopf, B. et al. On causal and anticausal learning. In Int. Conf. Machine Learning (2012).
Lipton, Z., Wang, Y.-X. & Smola, A. Detecting and correcting for label shift with black box predictors. In Int. Conf. Machine Learning 3122–3130 (PMLR, 2018).
Loupy, A., Mengel, M. & Haas, M. Thirty years of the international banff classification for allograft pathology: the past, present, and future of kidney transplant diagnostics. Kidney Int 101, 678–691 (2022).
Delahunt, B. et al. Gleason and Fuhrman no longer make the grade. Histopathology 68, 475–481 (2016).
Davatchi, F. et al. The saga of diagnostic/classification criteria in Behcet’s disease. Int. J. Rheum. Dis. 18, 594–605 (2015).
Louis, D. N. et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 131, 803–820 (2016).
Bifet, A. & Gavalda, R. Learning from time-changing data with adaptive windowing. In Proc. 2007 SIAM International Conference on Data Mining 443–448 (SIAM, 2007).
Nigenda, D. et al. Amazon SageMaker Model Monitor: a system for real-time insights into deployed machine learning models. In Proc. 28th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (2022).
Miroshnikov, A., Kotsiopoulos, K., Franks, R. & Kannan, A. R. Wasserstein-based fairness interpretability framework for machine learning models. Mach. Learn. 111, 3307–3357 (2022).
American Anthropological Association Executive Board. AAA statement on race. Am. Anthropol. 100, 712–713 (1998).
Oni-Orisan, A., Mavura, Y., Banda, Y., Thornton, T. A. & Sebro, R. Embracing genetic diversity to improve black health. N. Engl. J. Med. 384, 1163–1167 (2021).
Calhoun, A. The pathophysiology of racial disparities. N. Engl. J. Med. 384, e78 (2021).
Sun, R. et al. Don’t ignore genetic data from minority populations. Nature 585, 184–186 (2020).
Lannin, D. R. et al. Influence of socioeconomic and cultural factors on racial differences in late-stage presentation of breast cancer. JAMA 279, 1801–1807 (1998).
Bao, M. et al. It’s COMPASlicated: the messy relationship between RAI datasets and algorithmic fairness benchmarks. In 35th Conf. Neural Information Processing Systems Datasets and Benchmarks (2021).
Hao, M. et al. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inf. 16, 6532–6542 (2019).
Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).
Bonawitz, K. et al. Practical secure aggregation for privacy-preserving machine learning. In Proc. 2017 ACM SIGSAC Conf. Computer and Communications Security 1175–1191 (2017).
Bonawitz, K. et al. Towards federated learning at scale: system design. Proc. Mach. Learn. Syst. 1, 374–388 (2019).
Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018).
Huang, L. et al. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019).
Xu, J. et al. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 5, 1–19 (2021).
Chakroborty, S., Patel, K. R. & Freytag, A. Beyond federated learning: fusion strategies for diabetic retinopathy screening algorithms trained from different device types. Invest. Ophthalmol. Vis. Sci. 62, 85 (2021).
Ju, C. et al. Federated transfer learning for EEG signal classification. In 42nd Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society 3040–3045 (IEEE, 2020).
Li, W. et al. Privacy-preserving federated brain tumour segmentation. In Int. Workshop on Machine Learning in Medical Imaging 133–141 (Springer, 2019).
Kaissis, G. et al. End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nat. Mach. Intell. 3, 473–484 (2021).
Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).
Choudhury, O. et al. Differential privacy-enabled federated learning for sensitive health data. In Machine Learning for Health (ML4H) Workshop at NeurIPS (2019).
Kushida, C. A. et al. Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies. Med. Care 50, S82–S101 (2012).
van der Haak, M. et al. Data security and protection in cross-institutional electronic patient records. Int. J. Med. Inform. 70, 117–130 (2003).
Veale, M. & Binns, R. Fairer machine learning in the real world: mitigating discrimination without collecting sensitive data. Big Data Soc. 4, 2053951717743530 (2017).
Fiume, M. et al. Federated discovery and sharing of genomic data using beacons. Nat. Biotechnol. 37, 220–224 (2019).
Sadilek, A. et al. Privacy-first health research with federated learning. NPJ Digit. Med. 4, 132 (2021).
Duan, R., Boland, M. R., Moore, J. H. & Chen, Y. ODAL: a one-shot distributed algorithm to perform logistic regressions on electronic health records data from multiple clinical sites. In BIOCOMPUTING 2019: Proc. Pacific Symposium 30–41 (World Scientific, 2018).
Sarma, K. V. et al. Federated learning improves site performance in multicenter deep learning without data sharing. J. Am. Med. Inform. Assoc. 28, 1259–1264 (2021).
Silva, S. et al. Federated learning in distributed medical databases: meta-analysis of large-scale subcortical brain data. In 2019 IEEE 16th Int. Symp. Biomedical Imaging 270–274 (IEEE, 2019).
Roy, A. G., Siddiqui, S., Pölsterl, S., Navab, N. & Wachinger, C. BrainTorrent: a peer-to-peer environment for decentralized federated learning. Preprint at https://doi.org/10.48550/arXiv.1905.06731 (2019).
Lu, M. Y. et al. Federated learning for computational pathology on gigapixel whole slide images. Med. Image Anal. 76, 102298 (2022).
Dou, Q. et al. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. NPJ Digit. Med. 4, 60 (2021).
Yang, D. et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multinational data from China, Italy, Japan. Med. Image Anal. 70, 101992 (2021).
Vaid, A. et al. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR Med. Inform. 9, e24207 (2021).
Li, S., Cai, T. & Duan, R. Targeting underrepresented populations in precision medicine: a federated transfer learning approach. Preprint at https://doi.org/10.48550/arXiv.2108.12112 (2023).
Mandl, K. D. et al. The genomics research and innovation network: creating an interoperable, federated, genomics learning system. Genet. Med. 22, 371–380 (2020).
Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. 108, 632–655 (2021).
Liang, J., Hu, D. & Feng, J. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In Int. Conf. Machine Learning 6028–6039 (PMLR, 2020).
Song, L., Ma, C., Zhang, G. & Zhang, Y. Privacy-preserving unsupervised domain adaptation in federated setting. IEEE Access 8, 143233–143240 (2020).
Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).
Peterson, D., Kanani, P. & Marathe, V. J. Private federated learning with domain adaptation. In Federated Learning for Data Privacy and Confidentiality Workshop in NeurIPS (2019).
Peng, X., Huang, Z., Zhu, Y. & Saenko, K. Federated adversarial domain adaptation. In Int. Conf. Learning Representations (2020).
Yao, C.-H. et al. Federated multi-target domain adaptation. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 1424–1433 (2022).
Li, T., Sanjabi, M., Beirami, A. & Smith, V. Fair resource allocation in federated learning. In Int. Conf. Learning Representations (2020).
Mohri, M., Sivek, G. & Suresh, A. T. Agnostic federated learning. In Int. Conf. Machine Learning 4615–4625 (PMLR, 2019).
Ezzeldin, Y. H., Yan, S., He, C., Ferrara, E. & Avestimehr, S. FairFed: enabling group fairness in federated learning. In Proc. AAAI Conf. Artificial Intelligence (2023).
Papadaki, A., Martinez, N., Bertran, M., Sapiro, G. & Rodrigues, M. Minimax demographic group fairness in federated learning. In ACM Conf. Fairness, Accountability, and Transparency 142–159 (2022).
Chen, D., Gao, D., Kuang, W., Li, Y. & Ding, B. pFL-Bench: a comprehensive benchmark for personalized federated learning. In 36th Conf. Neural Information Processing Systems Datasets and Benchmarks Track (2022).
Chai, J. & Wang, X. Self-supervised fair representation learning without demographics. In Adv. Neural Information Processing Systems (2022).
Jiang, M. et al. Fair federated medical image segmentation via client contribution estimation. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 16302–16311 (2023).
Jiang, M., Wang, Z. & Dou, Q. HarmoFL: harmonizing local and global drifts in federated learning on heterogeneous medical images. In Proc. AAAI Conf. Artificial Intelligence 1087–1095 (2022).
Xu, Y. Y., Lin, C. S. & Wang, Y. C. F. Bias-eliminating augmentation learning for debiased federated learning. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 20442–20452 (2023).
Zhao, Y. et al. Federated learning with non-IID data. Preprint at https://doi.org/10.48550/arXiv.1806.00582 (2018).
Konečný, J. et al. Federated learning: strategies for improving communication efficiency. Preprint at https://doi.org/10.48550/arXiv.1610.05492 (2016).
Lin, Y., Han, S., Mao, H., Wang, Y. & Dally, W. J. Deep gradient compression: reducing the communication bandwidth for distributed training. In Int. Conf. Learning Representations (2018).
McMahan, B. et al. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics 1273–1282 (PMLR, 2017).
Li, T. et al. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450 (2020).
Sattler, F., Wiedemann, S., Müller, K.-R. & Samek, W. Robust and communication-efficient federated learning from non-IID data. IEEE Trans. Neural Netw. Learn. Syst. 31, 3400–3413 (2019).
Abay, A. et al. Mitigating bias in federated learning. Preprint at https://doi.org/10.48550/arXiv.2012.02447 (2020).
Luo, Z., Wang, Y., Wang, Z., Sun, Z. & Tan, T. Disentangled federated learning for tackling attributes skew via invariant aggregation and diversity transferring. In Int. Conf. Machine Learning 14527–14541 (PMLR, 2022).
McNamara, D., Ong, C. S. & Williamson, R. C. Costs and benefits of fair representation learning. In Proc. 2019 AAAI/ACM Conference on AI, Ethics, and Society 263–270 (2019).
Madaio, M. A., Stark, L., Wortman Vaughan, J. & Wallach, H. Co-designing checklists to understand organizational challenges and opportunities around fairness in AI. In Proc. 2020 CHI Conf. Human Factors in Computing Systems (2020).
Jung, K. et al. A framework for making predictive models useful in practice. J. Am. Med. Inform. Assoc. 28, 1149–1158 (2021).
Pogodin, R. et al. Efficient conditionally invariant representation learning. In Int. Conf. Learning Representations (2023).
Louizos, C. et al. Causal effect inference with deep latent-variable models. In Adv. Neural Information Processing Systems (2017).
Shi, C., Blei, D. & Veitch, V. Adapting neural networks for the estimation of treatment effects. In Adv. Neural Information Processing Systems (2019).
Yoon, J., Jordon, J. & Van Der Schaar, M. GANITE: estimation of individualized treatment effects using generative adversarial nets. In Int. Conf. Learning Representations (2018).
Rezaei, A., Fathony, R., Memarrast, O. & Ziebart, B. Fairness for robust log loss classification. In Proc. AAAI Conf. Artificial Intelligence 34, 5511–5518 (2020).
Petrović, A., Nikolić, M., Radovanović, S., Delibašić, B. & Jovanović, M. FAIR: Fair adversarial instance re-weighting. Neurocomputing 476, 14–37 (2020).
Sattigeri, P., Hoffman, S. C., Chenthamarakshan, V. & Varshney, K. R. Fairness GAN: generating datasets with fairness properties using a generative adversarial network. IBM J. Res. Dev. 63, 3:1–3:9 (2019).
Xu, D., Yuan, S., Zhang, L. & Wu, X. FairGAN: fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data 570–575 (IEEE, 2018).
Xu, H., Liu, X., Li, Y., Jain, A. & Tang, J. To be robust or to be fair: towards fairness in adversarial training. In Int. Conf. Machine Learning 11492–11501 (PMLR, 2021).
Wadsworth, C., Vera, F. & Piech, C. Achieving fairness through adversarial learning: an application to recidivism prediction. In FAT/ML Workshop (2018).
Adel, T., Valera, I., Ghahramani, Z. & Weller, A. One-network adversarial fairness. In Proc. AAAI Conf. Artificial Intelligence 33, 2412–2420 (2019).
Madras, D., Creager, E., Pitassi, T. & Zemel, R. Learning adversarially fair and transferable representations. In Int. Conf. Machine Learning 3384–3393 (PMLR, 2018).
Chen, X., Fain, B., Lyu, L. & Munagala, K. Proportionally fair clustering. In Proc. 36th Int. Conf. Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) 1032–1041 (PMLR, 2019).
Li, P., Zhao, H. & Liu, H. Deep fair clustering for visual learning. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 9070–9079 (2020).
Hong, J. et al. Federated adversarial debiasing for fair and transferable representations. In Proc. 27th ACM SIGKDD Conf. Knowledge Discovery and Data Mining 617–627 (2021).
Qi, T. et al. FairVFL: a fair vertical federated learning framework with contrastive adversarial learning. In Adv. Neural Information Processing Systems (2022).
Chen, Y., Raab, R., Wang, J. & Liu, Y. Fairness transferability subject to bounded distribution shift. In Adv. Neural Information Processing Systems (2022).
An, B., Che, Z., Ding, M. & Huang, F. Transferring fairness under distribution shifts via fair consistency regularization. In Adv. Neural Information Processing Systems (2022).
Giguere, S. et al. Fairness guarantees under demographic shift. In Int. Conf. Learning Representations (2022).
Schrouff, J. et al. Diagnosing failures of fairness transfer across distribution shift in real-world medical settings. In Adv. Neural Information Processing Systems (2022).
Lipkova, J. et al. Personalized radiotherapy design for glioblastoma: integrating mathematical tumor models, multimodal scans, and Bayesian inference. IEEE Trans. Med. Imaging 38, 1875–1884 (2019).
Cen, L. P. et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat. Commun. 12, 4828 (2021).
Lézoray, O., Revenu, M. & Desvignes, M. Graph-based skin lesion segmentation of multispectral dermoscopic images. In IEEE Int. Conf. Image Processing 897–901 (2014).
Manica, A., Prugnolle, F. & Balloux, F. Geography is a better determinant of human genetic differentiation than ethnicity. Hum. Genet. 118, 366–371 (2005).
Hadad, N., Wolf, L. & Shahar, M. A two-step disentanglement method. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 772–780 (2018).
Achille, A. & Soatto, S. Emergence of invariance and disentanglement in deep representations. J. Mach. Learn. Res. 19, 1947–1980 (2018).
Chen, R. T., Li, X., Grosse, R. & Duvenaud, D. Isolating sources of disentanglement in variational autoencoders. In Adv. Neural Information Processing Systems (2018).
Kim, H. & Mnih, A. Disentangling by factorising. In Int. Conf. Machine Learning 2649–2658 (PMLR, 2018).
Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In Int. Conf. Learning Representations (2017).
Sarhan, M. H., Eslami, A., Navab, N. & Albarqouni, S. Learning interpretable disentangled representations using adversarial VAEs. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data 37–44 (Springer, 2019).
Gyawali, P. K. et al. Learning to disentangle inter-subject anatomical variations in electrocardiographic data. IEEE Trans. Biomed. Eng. (2021).
Bing, S., Fortuin, V. & Rätsch, G. On disentanglement in Gaussian process variational autoencoders. In 4th Symp. Adv. Approximate Bayesian Inference (2021).
Xu, Y., He, H., Shen, T. & Jaakkola, T. S. Controlling directions orthogonal to a classifier. In Int. Conf. Learning Representations (2022).
Cisse, M. & Koyejo, S. Fairness and representation learning. In NeurIPS Invited Talk 2019; https://cs.stanford.edu/~sanmi/documents/Representation_Learning_Fairness_NeurIPS19_Tutorial.pdf (2019).
Creager, E. et al. Flexibly fair representation learning by disentanglement. In Int. Conf. Machine Learning 1436–1445 (PMLR, 2019).
Locatello, F. et al. On the fairness of disentangled representations. In Adv. Neural Information Processing Systems (2019).
Lee, J., Kim, E., Lee, J., Lee, J. & Choo, J. Learning debiased representation via disentangled feature augmentation. In Adv. Neural Information Processing Systems 34, 25123–25133 (2021).
Zhang, Y. K., Wang, Q. W., Zhan, D. C. & Ye, H. J. Learning debiased representations via conditional attribute interpolation. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 7599–7608 (2023).
Tartaglione, E., Barbano, C. A. & Grangetto, M. EnD: entangling and disentangling deep representations for bias correction. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 13508–13517 (2021).
Bercea, C. I., Wiestler, B., Rueckert, D. & Albarqouni, S. FedDis: disentangled federated learning for unsupervised brain pathology segmentation. Preprint at https://doi.org/10.48550/arXiv.2103.03705 (2021).
Ke, J., Shen, Y. & Lu, Y. Style normalization in histology with federated learning. In 2021 IEEE 18th Int. Symp. Biomedical Imaging 953–956 (IEEE, 2021).
Pfohl, S. R., Dai, A. M. & Heller, K. Federated and differentially private learning for electronic health records. In Machine Learning for Health (ML4H) Workshop at NeurIPS (2019).
Xin, B. et al. Private FL-GAN: differential privacy synthetic data generation based on federated learning. In 2020 IEEE Int. Conf. Acoustics, Speech and Signal Processing 2927–2931 (IEEE, 2020).
Rajotte, J.-F. et al. Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary. In Proc. Conf. Information Technology for Social Good 79–84 (2021).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Int. Conf. Machine Learning 1597–1607 (PMLR, 2020).
Shad, R., Cunningham, J. P., Ashley, E. A., Langlotz, C. P. & Hiesinger, W. Designing clinically translatable artificial intelligence systems for high-dimensional medical imaging. Nat. Mach. Intell. 3, 929–935 (2021).
Jacovi, A., Marasović, A., Miller, T. & Goldberg, Y. Formalizing trust in artificial intelligence: prerequisites, causes and goals of human trust in AI. In Proc. 2021 ACM Conf. Fairness, Accountability, and Transparency 624–635 (2021).
Floridi, L. Establishing the rules for building trustworthy AI. Nat. Mach. Intell. 1, 261–262 (2019).
High-Level Expert Group on Artificial Intelligence. Ethics Guidelines for Trustworthy AI (European Commission, 2019).
Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. In Workshop at Int. Conf. Learning Representations (2014).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE Int. Conf. Computer Vision 618–626 (2017).
Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Proc. AAAI Conf. Artificial Intelligence 33, 590–597 (2019).
Sayres, R. et al. Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology 126, 552–564 (2019).
Patro, B. N., Lunayach, M., Patel, S. & Namboodiri, V. P. U-CAM: visual explanation using uncertainty based class activation maps. In Proc. IEEE/CVF Int. Conf. Computer Vision 7444–7453 (2019).
Grewal, M., Srivastava, M. M., Kumar, P. & Varadarajan, S. RADNET: radiologist level accuracy using deep learning for hemorrhage detection in CT scans. In 2018 IEEE 15th Int. Symp. Biomedical Imaging 281–284 (IEEE, 2018).
Arun, N. T. et al. Assessing the validity of saliency maps for abnormality localization in medical imaging. In Medical Imaging with Deep Learning (2020).
Schlemper, J. et al. Attention-gated networks for improving ultrasound scan plane detection. In Medical Imaging with Deep Learning (2018).
Schlemper, J. et al. Attention gated networks: learning to leverage salient regions in medical images. Med. Image Anal. 53, 197–207 (2019).
Mittelstadt, B., Russell, C. & Wachter, S. Explaining explanations in AI. In Proc. Conf. Fairness, Accountability, and Transparency 279–288 (2019).
Kindermans, P.-J. et al. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 267–280 (Springer, 2019).
Kaur, H. et al. Interpreting interpretability: understanding data scientists’ use of interpretability tools for machine learning. In Proc. 2020 CHI Conf. Human Factors in Computing Systems (2020).
Adebayo, J. et al. Sanity checks for saliency maps. In Adv. Neural Information Processing Systems (2018).
Saporta, A. et al. Benchmarking saliency methods for chest X-ray interpretation. Nat. Mach. Intell. 4, 867–878 (2022).
Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).
DeGrave, A. J., Janizek, J. D. & Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619 (2021).
Adebayo, J., Muelly, M., Liccardi, I. & Kim, B. Debugging tests for model explanations. In Adv. Neural Information Processing Systems 33, 700–712 (2020).
Lee, M. K. & Rich, K. Who is included in human perceptions of AI?: Trust and perceived fairness around healthcare AI and cultural mistrust. In Proc. 2021 CHI Conf. Human Factors in Computing Systems (2021).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Int. Conf. Machine Learning 3319–3328 (PMLR, 2017).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st Int. Conf. Neural Information Processing Systems 4768–4777 (2017).
Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021).
Kim, G. B., Gao, Y., Palsson, B. O. & Lee, S. Y. DeepTFactor: a deep learning-based tool for the prediction of transcription factors. Proc. Natl Acad. Sci. USA 118, e2021171118 (2021).
Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).
Qiu, W. et al. Interpretable machine learning prediction of all-cause mortality. Commun. Med. 2, 125 (2022).
Janizek, J. D. et al. Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023-01034-0 (2023).
Wexler, J., Pushkarna, M., Robinson, S., Bolukbasi, T. & Zaldivar, A. Probing ML models for fairness with the What-If tool and SHAP: hands-on tutorial. In Proc. 2020 Conf. Fairness, Accountability, and Transparency 705 (2020).
Lundberg, S. M. Explaining quantitative measures of fairness. In Fair & Responsible AI Workshop @ CHI2020; https://scottlundberg.com/files/fairness_explanations.pdf (2020).
Cesaro, J. & Cozman, F. G. Measuring unfairness through game-theoretic interpretability. In Machine Learning and Knowledge Discovery in Databases: International Workshops of ECML PKDD (2019).
Meng, C., Trinh, L., Xu, N. & Liu, Y. Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Sci. Rep. 12, 7166 (2022).
Panigutti, C., Perotti, A., Panisson, A., Bajardi, P. & Pedreschi, D. FairLens: auditing black-box clinical decision support systems. Inf. Process. Manag. 58, 102657 (2021).
Röösli, E., Bozkurt, S. & Hernandez-Boussard, T. Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model. Sci. Data 9, 24 (2022).
Pan, W., Cui, S., Bian, J., Zhang, C. & Wang, F. Explaining algorithmic fairness through fairness-aware causal path decomposition. In Proc. 27th ACM SIGKDD Conf. Knowledge Discovery and Data Mining 1287–1297 (2021).
Agarwal, C. et al. OpenXAI: towards a transparent evaluation of model explanations. In Adv. Neural Information Processing Systems 35, 15784–15799 (2022).
Zhang, H., Singh, H., Ghassemi, M. & Joshi, S. “Why did the model fail?”: attributing model performance changes to distribution shifts. In Int. Conf. Machine Learning (2023).
Ghorbani, A. & Zou, J. Data Shapley: equitable valuation of data for machine learning. In Int. Conf. Machine Learning 2242–2251 (PMLR, 2019).
Pandl, K. D., Feiland, F., Thiebes, S. & Sunyaev, A. Trustworthy machine learning for health care: scalable data valuation with the Shapley value. In Proc. Conf. Health, Inference, and Learning 47–57 (2021).
Prakash, E. I., Shrikumar, A. & Kundaje, A. Towards more realistic simulated datasets for benchmarking deep learning models in regulatory genomics. In Machine Learning in Computational Biology 58–77 (2022).
Oktay, O. et al. Attention U-Net: learning where to look for the pancreas. In Medical Imaging with Deep Learning (2018).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In Int. Conf. Learning Representations (2020).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Yufei, C. et al. Bayes-MIL: a new probabilistic perspective on attention-based multiple instance learning for whole slide images. In Int. Conf. Learning Representations (2023).
Van Gansbeke, W., Vandenhende, S., Georgoulis, S. & Van Gool, L. Unsupervised semantic segmentation by contrasting object mask proposals. In Proc. IEEE/CVF Int. Conf. Computer Vision 10052–10062 (2021).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Int. Conf. Machine Learning 8748–8763 (PMLR, 2021).
Wei, J. et al. Chain of thought prompting elicits reasoning in large language models. In Adv. Neural Information Processing Systems (2022).
Javed, S. A., Juyal, D., Padigela, H., Taylor-Weiner, A. & Yu, L. Additive MIL: intrinsically interpretable multiple instance learning for pathology. In Adv. Neural Information Processing Systems (2022).
Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).
Bhargava, H. K. et al. Computationally derived image signature of stromal morphology is prognostic of prostate cancer recurrence following prostatectomy in african american patients. Clin. Cancer Res. 26, 1915–1923 (2020).
Curtis, J. R. et al. Population-based fracture risk assessment and osteoporosis treatment disparities by race and gender. J. Gen. Intern. Med. 24, 956–962 (2009).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416 (2018).
Foley, R. N., Wang, C. & Collins, A. J. Cardiovascular risk factor profiles and kidney function stage in the US general population: the NHANES III study. Mayo Clin. Proc. 80, 1270–1277 (2005).
Nevitt, M., Felson, D. & Lester, G. The osteoarthritis initiative. Protocol for the cohort study 1; https://nda.nih.gov/static/docs/StudyDesignProtocolAndAppendices.pdf (2006).
Vaughn, I. A., Terry, E. L., Bartley, E. J., Schaefer, N. & Fillingim, R. B. Racial-ethnic differences in osteoarthritis pain and disability: a meta-analysis. J. Pain. 20, 629–644 (2019).
Rotemberg, V. et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Sci. Data 8, 34 (2021).
Kinyanjui, N. M. et al. Estimating skin tone and effects on classification performance in dermatology datasets. Preprint at https://doi.org/10.48550/arXiv.1910.13268 (2019).
Kinyanjui, N. M. et al. Fairness of classifiers across skin tones in dermatology. In Int. Conf. Medical Image Computing and Computer-Assisted Intervention 320–329 (2020).
Chew, E. Y. et al. The Age-Related Eye Disease Study 2 (AREDS2): study design and baseline characteristics (AREDS2 report number 1). Ophthalmology 119, 2282–2289 (2012).
Joshi, N. & Burlina, P. AI fairness via domain adaptation. Preprint at https://doi.org/10.48550/arXiv.2104.01109 (2021).
Zhou, Y. et al. RadFusion: benchmarking performance and fairness for multi-modal pulmonary embolism detection from CT and EMR. Preprint at https://doi.org/10.48550/arXiv.2111.11665 (2021).
Edwards, N. J. et al. The CPTAC data portal: a resource for cancer proteomics research. J. Proteome Res. 14, 2707–2713 (2015).
Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).
Boag, W., Suresh, H., Celi, L. A., Szolovits, P. & Ghassemi, M. Racial disparities and mistrust in end-of-life care. In Machine Learning for Healthcare Conf. 587–602 (PMLR, 2018).
Prosper, A. E. et al. Association of inclusion of more black individuals in lung cancer screening with reduced mortality. JAMA Netw. Open 4, e2119629 (2021).
National Lung Screening Trial Research Team et al. The National Lung Screening Trial: overview and study design. Radiology 258, 243–253 (2011).
Colak, E. et al. The RSNA pulmonary embolism CT dataset. Radiol. Artif. Intell. 3, e200254 (2021).
Gertych, A., Zhang, A., Sayre, J., Pospiech-Kurkowska, S. & Huang, H. Bone age assessment of children using a digital hand atlas. Comput. Med. Imaging Graph. 31, 322–331 (2007).
Jeong, J. J. et al. The EMory BrEast imaging Dataset (EMBED): a racially diverse, granular dataset of 3.4 million screening and diagnostic mammographic images. Radiol. Artif. Intell. 5, e220047 (2023).
Pollard, T. J. et al. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci. Data 5, 1–13 (2018).
Sheikhalishahi, S., Balaraman, V. & Osmani, V. Benchmarking machine learning models on multicentre eICU critical care dataset. PLoS ONE 15, e0235424 (2020).
El Emam, K. et al. De-identification methods for open health data: the case of the heritage health prize claims dataset. J. Med. Internet Res. 14, e33 (2012).
Madras, D., Pitassi, T. & Zemel, R. Predict responsibly: improving fairness and accuracy by learning to defer. In Adv. Neural Information Processing Systems (2018).
Louizos, C., Swersky, K., Li, Y., Welling, M. & Zemel, R. The variational fair autoencoder. In Int. Conf. Learning Representations (2016).
Raff, E. & Sylvester, J. Gradient reversal against discrimination. Preprint at https://doi.org/10.48550/arXiv.1807.00392 (2018).
Smith, J. W., Everhart, J., Dickson, W., Knowler, W. & Johannes, R. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. Symp. Computer Applications in Medical Care 261–265 (1988).
Sharma, S. et al. Data augmentation for discrimination prevention and bias disambiguation. In Proc. AAAI/ACM Conf. AI, Ethics, and Society 358–364 (Association for Computing Machinery, 2020).
International Warfarin Pharmacogenetics Consortium. Estimation of the warfarin dose with clinical and pharmacogenetic data. N. Engl. J. Med. 360, 753–764 (2009).
Kallus, N., Mao, X. & Zhou, A. Assessing algorithmic fairness with unobserved protected class using data combination. In Proc. 2020 Conf. Fairness, Accountability, and Transparency 110 (Association for Computing Machinery, 2020).
Gross, R. T. Infant Health and Development Program (IHDP): Enhancing the Outcomes of Low Birth Weight, Premature Infants in the United States, 1985-1988 (Inter-university Consortium for Political and Social Research, 1993); https://www.icpsr.umich.edu/web/HMCA/studies/9795
Madras, D., Creager, E., Pitassi, T. & Zemel, R. Fairness through causal awareness: learning causal latent-variable models for biased data. In Proc. Conf. Fairness, Accountability, and Transparency 30, 349–358 (Association for Computing Machinery, 2019).
Weeks, M. R., Clair, S., Borgatti, S. P., Radda, K. & Schensul, J. J. Social networks of drug users in high-risk sites: finding the connections. AIDS Behav. 6, 193–206 (2002).
Kleindessner, M., Samadi, S., Awasthi, P. & Morgenstern, J. Guarantees for spectral clustering with fairness constraints. In Int. Conf. Machine Learning 3458–3467 (2019).
Daneshjou, R. et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci. Adv. 8, eabq6147 (2022).
Garg, S., Balakrishnan, S. & Lipton, Z. C. Domain adaptation under open set label shift. In Adv. Neural Information Processing Systems (2022).
Pham, T. H., Zhang, X. & Zhang, P. Fairness and accuracy under domain generalization. In Int. Conf. Learning Representations (2023).
Barocas, S., Hardt, M. & Narayanan, A. Fairness in machine learning. NeurIPS Tutorial (2017).
Liu, L. T., Simchowitz, M. & Hardt, M. The implicit fairness criterion of unconstrained learning. In Int. Conf. Machine Learning 4051–4060 (PMLR, 2019).
Acknowledgements
All authors are supported in part by the Brigham and Women’s Hospital (BWH) President’s Fund, Mass General Hospital (MGH) Pathology and by National Institute of General Medical Sciences (NIGMS) R35GM138216 (to F.M.). R.J.C. and S.S. were also supported by the National Science Foundation (NSF) Graduate Fellowship. M.Y.L. was also supported by the Siebel Scholars program. T.Y.C. was also supported by the National Institutes of Health National Cancer Institute (NIH-NCI) Ruth L. Kirschstein National Research Service Award T32CA251062. The content is solely the responsibility of the authors and does not reflect the official views of the NIH, NIGMS, NCI or NSF.
Author information
Authors and Affiliations
Contributions
R.J.C. and F.M. drafted the manuscript. All authors contributed to literature review, conceptualization and editing of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary table.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, R.J., Wang, J.J., Williamson, D.F.K. et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng 7, 719–742 (2023). https://doi.org/10.1038/s41551-023-01056-8
This article is cited by
- Machine-learning-based models to predict cardiovascular risk using oculomics and clinic variables in KNHANES. BioData Mining (2024)
- Simulated misuse of large language models and clinical credit systems. npj Digital Medicine (2024)
- Developing an individualized treatment rule for Veterans with major depressive disorder using electronic health records. Molecular Psychiatry (2024)
- A causal perspective on dataset bias in machine learning for medical imaging. Nature Machine Intelligence (2024)
- Enhancing the fairness of AI prediction models by Quasi-Pareto improvement among heterogeneous thyroid nodule population. Nature Communications (2024)