  • Perspective

Algorithmic fairness in artificial intelligence for medicine and healthcare

Abstract

In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows, as well as the healthcare disparities that result. We also review emerging technologies for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
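
As a rough illustration of what an assessment "stratified across subpopulations" can look like in practice, the sketch below computes true-positive and false-positive rates separately for each group and reports the largest gap between groups. It is a minimal example under stated assumptions, not the method used in the article: the function names `stratified_rates` and `max_gap`, the group labels "A" and "B", and the toy labels and predictions are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from collections import defaultdict


def stratified_rates(y_true, y_pred, groups):
    """Return per-group true-positive rate (TPR) and false-positive rate (FPR).

    y_true, y_pred: sequences of 0/1 labels and 0/1 predictions.
    groups: sequence of subgroup identifiers, one per sample.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            # NaN marks a rate that is undefined for this group.
            "tpr": c["tp"] / positives if positives else float("nan"),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


def max_gap(rates, metric):
    """Largest between-group difference for one metric, ignoring undefined values."""
    values = [r[metric] for r in rates.values() if not math.isnan(r[metric])]
    return max(values) - min(values) if values else float("nan")


# Hypothetical toy data: six patients in each of two illustrative groups.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
groups = ["A"] * 6 + ["B"] * 6

rates = stratified_rates(y_true, y_pred, groups)
tpr_gap = max_gap(rates, "tpr")
fpr_gap = max_gap(rates, "fpr")
print(rates)
print("TPR gap:", tpr_gap, "FPR gap:", fpr_gap)
```

In this toy case the model finds three of four true cases in group A but only one of four in group B, so an aggregate accuracy figure would hide a sizeable difference in missed diagnoses. Which metric to stratify, and how large a gap matters, depends on the clinical setting and is left open here.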

Fig. 1: Connecting healthcare disparities and dataset shifts to algorithmic fairness.
Fig. 2: Strategies for mitigating disparate impact.
Fig. 3: Genetic drift as population shift.
Fig. 4: Dataset shifts in the deployment of AI-SaMDs for clinical-grade AI algorithms.
Fig. 5: A decentralized framework that integrates federated learning with adversarial learning and disentanglement.

References

  1. Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy disparities in commercial gender classification. In Conf. on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).

  2. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).

    Article  CAS  PubMed  Google Scholar 

  3. Pierson, E., Cutler, D. M., Leskovec, J., Mullainathan, S. & Obermeyer, Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat. Med. 27, 136–140 (2021).

    Article  CAS  PubMed  Google Scholar 

  4. Hooker, S. Moving beyond ‘algorithmic bias is a data problem’. Patterns 2, 100241 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  5. McCradden, M. D., Joshi, S., Mazwi, M. & Anderson, J. A. Ethical limitations of algorithmic fairness solutions in health care machine learning. Lancet Digit. Health 2, e221–e223 (2020).

    Article  PubMed  Google Scholar 

  6. Mhasawade, V., Zhao, Y. & Chunara, R. Machine learning and algorithmic fairness in public and population health. Nat. Mach. Intell. 3, 659–666 (2021).

    Article  Google Scholar 

  7. Currie, G. & Hawk, K. E. Ethical and legal challenges of artificial intelligence in nuclear medicine. In Seminars in Nuclear Medicine (Elsevier, 2020).

  8. Chen, I. Y. et al. Ethical machine learning in healthcare. Annu. Rev. Biomed. Data Sci. 4, 123–144 (2020).

    Article  Google Scholar 

  9. Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I. Y. & Ghassemi, M. CheXclusion: fairness gaps in deep chest X-ray classifiers. In BIOCOMPUTING 2021: Proc. Pacific Symposium 232–243 (World Scientific, 2020).

  11. Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y. & Ghassemi, M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 27, 2176–2182 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Gichoya, J. W. et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit. Health 4, E406–E414 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Glocker, B., Jones, C., Bernhardt, M. & Winzeck, S. Algorithmic encoding of protected characteristics in chest X-ray disease detection models. EBioMedicine 89, 104467 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  14. Proposed Regulatory Framework for Modifications to Artificial Intelligence. Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) (US FDA, 2019).

  15. Gaube, S. et al. Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit. Med. 4, 31 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  16. Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SAMD) Action Plan (US FDA, 2021).

  17. Vyas, D. A. et al. Challenging the use of race in the vaginal birth after cesarean section calculator. Women’s Health Issues 29, 201–204 (2019).

    Article  PubMed  Google Scholar 

  18. Vyas, D. A., Eisenstein, L. G. & Jones, D. S. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N. Engl. J. Med. 383, 874–882 (2020).

    Article  PubMed  Google Scholar 

  19. van der Burgh, A. C., Hoorn, E. J. & Chaker, L. Removing race from kidney function estimates. JAMA 325, 2018 (2021).

    Article  PubMed  Google Scholar 

  20. Diao, J. A. et al. Clinical implications of removing race from estimates of kidney function. JAMA 325, 184–186 (2021). 2021.

    Article  PubMed  Google Scholar 

  21. Caton, S. & Haas, C. Fairness in machine learning: a survey. Preprint at https://doi.org/10.48550/arXiv.2010.04053 (2020).

  22. Adler, N. E., Glymour, M. M. & Fielding, J. Addressing social determinants of health and health inequalities. JAMA 316, 1641–1642 (2016).

    Article  PubMed  Google Scholar 

  23. Phelan, J. C. & Link, B. G. Is racism a fundamental cause of inequalities in health? Annu. Rev. Socio 41, 311–330 (2015).

    Article  Google Scholar 

  24. Yehia, B. R. et al. Association of race with mortality among patients hospitalized with coronavirus disease 2019 (COVID-19) at 92 US hospitals. JAMA Netw. Open 3, e2018039 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Lopez, L., Hart, L. H. & Katz, M. H. Racial and ethnic health disparities related to COVID-19. JAMA 325, 719–720 (2021).

    Article  CAS  PubMed  Google Scholar 

  26. Bonvicini, K. A. LGBT healthcare disparities: what progress have we made? Patient Educ. Couns. 100, 2357–2361 (2017).

    Article  PubMed  Google Scholar 

  27. Yamada, T. et al. Access disparity and health inequality of the elderly: unmet needs and delayed healthcare. Int. J. Environ. Res. Public Health 12, 1745–1772 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  28. Moy, E., Dayton, E. & Clancy, C. M. Compiling the evidence: the national healthcare disparities reports. Health Aff. 24, 376–387 (2005).

    Article  Google Scholar 

  29. Balsa, A. I., Seiler, N., McGuire, T. G. & Bloche, M. G. Clinical uncertainty and healthcare disparities. Am. J. Law Med. 29, 203–219 (2003).

    Article  PubMed  Google Scholar 

  30. Marmot, M. Social determinants of health inequalities. Lancet 365, 1099–1104 (2005).

    Article  PubMed  Google Scholar 

  31. Maness, S. B. et al. Social determinants of health and health disparities: COVID-19 exposures and mortality among African American people in the United States. Public Health Rep. 136, 18–22 (2021).

    Article  PubMed  Google Scholar 

  32. Seligman, H. K., Laraia, B. A. & Kushel, M. B. Food insecurity is associated with chronic disease among low-income NHANES participants. J. Nutr. 140, 304–310 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Thun, M. J., Apicella, L. F. & Henley, S. J. Smoking vs other risk factors as the cause of smoking attributable deaths: confounding in the courtroom. JAMA 284, 706–712 (2000).

    Article  CAS  PubMed  Google Scholar 

  34. Tucker, M. J., Berg, C. J., Callaghan, W. M. & Hsia, J. The Black–White disparity in pregnancy-related mortality from 5 conditions: differences in prevalence and case-fatality rates. Am. J. Public Health 97, 247–251 (2007).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Gadson, A., Akpovi, E. & Mehta, P. K. Exploring the social determinants of racial/ethnic disparities in prenatal care utilization and maternal outcome. In Seminars in Perinatology 41, 308–317 (Elsevier, 2017).

  36. Wallace, M. et al. Maternity care deserts and pregnancy-associated mortality in louisiana. Women’s Health Issues 31, 122–129 (2021).

    Article  PubMed  Google Scholar 

  37. Burchard, E. G. et al. The importance of race and ethnic background in biomedical research and clinical practice. N. Engl. J. Med. 348, 1170–1175 (2003).

    Article  PubMed  Google Scholar 

  38. Phimister, E. G. Medicine and the racial divide. N. Engl. J. Med. 348, 1081–1082 (2003).

    Article  PubMed  Google Scholar 

  39. Bonham, V. L., Green, E. D. & Pérez-Stable, E. J. Examining how race, ethnicity, and ancestry data are used in biomedical research. JAMA 320, 1533–1534 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  40. Eneanya, N. D., Yang, W. & Reese, P. P. Reconsidering the consequences of using race to estimate kidney function. JAMA 322, 113–114 (2019).

    Article  PubMed  Google Scholar 

  41. Zelnick, L. R., Leca, N., Young, B. & Bansal, N. Association of the estimated glomerular filtration rate with vs without a coefficient for race with time to eligibility for kidney transplant. JAMA Netw. Open 4, e2034004 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Chadban, S. J. et al. KDIGO clinical practice guideline on the evaluation and management of candidates for kidney transplantation. Transplantation 104, S11–S103 (2020).

    Article  PubMed  Google Scholar 

  43. Wesselman, H. et al. Social determinants of health and race disparities in kidney transplant. Clin. J. Am. Soc. Nephrol. 16, 262–274 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  44. Kanis JA, H. N. M. E. & Johansson, H. A brief history of frax. Arch. Osteoporos. 13, 118 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  45. Lewiecki, E. M., Wright, N. C. & Singer, A. J. Racial disparities, frax, and the care of patients with osteoporosis. Osteoporos. Int. 31, 2069–2071 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Civil Rights Act of 1964. Title VII, Equal Employment Opportunities https://en.wikipedia.org/wiki/Civil_Rights_Act_of_1964 (1964)

  47. Griggs v. Duke Power Co. https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co (1971).

  48. Awad, E. et al. The moral machine experiment. Nature 563, 59–64 (2018).

    Article  CAS  PubMed  Google Scholar 

  49. Feller, A., Pierson, E., Corbett-Davies, S. & Goel, S. A computer program used for bail and sentencing decisions was labeled biased against blacks. it’s actually not that clear. The Washington Post (17 October 2016).

  50. Dressel, J. & Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4, eaao5580 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  51. Char, D. S., Shah, N. H. & Magnus, D. Implementing machine learning in health care—addressing ethical challenges. N. Engl. J. Med. 378, 981–983 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  52. Bernhardt, M., Jones, C. & Glocker, B. Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms. Nat. Med. 28, 1157–1158 (2022).

    Article  CAS  PubMed  Google Scholar 

  53. Mukherjee, P. et al. Confounding factors need to be accounted for in assessing bias by machine learning algorithms. Nat. Med. 28, 1159–1160 (2022).

    Article  CAS  PubMed  Google Scholar 

  54. Diao, J. A., Powe, N. R. & Manrai, A. K. Race-free equations for eGFR: comparing effects on CKD classification. J. Am. Soc. Nephrol. 32, 1868–1870 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C. & Venkatasubramanian, S. Certifying and removing disparate impact. In Proc. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 259–268 (2015).

  56. Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Adv. Neural Information Processing Systems (2016).

  57. Corbett-Davies, S. & Goel, S. The measure and mismeasure of fairness: a critical review of fair machine learning. Preprint at https://doi.org/10.48550/arXiv.1808.00023 (2018).

  58. Calders, T., Kamiran, F. & Pechenizkiy, M. Building classifiers with independency constraints. In Int. Conf. Data Mining Workshops 13–18 (IEEE, 2009).

  59. Chen, J., Kallus, N., Mao, X., Svacha, G. & Udell, M. Fairness under unawareness: assessing disparity when protected class is unobserved. In Proc. Conf. Fairness, Accountability, and Transparency 339–348 (2019).

  60. Zliobaite, I., Kamiran, F. & Calders, T. Handling conditional discrimination. In 11th Int. Conf. Data Mining 992–1001 (IEEE, 2011).

  61. Dwork, C., Hardt, M., Pitassi, T., Reingold, O. & Zemel, R. Fairness through awareness. In Proc. 3rd Innovations in Theoretical Computer Science Conf. 214–226 (2012).

  62. Pedreshi, D., Ruggieri, S. & Turini, F. Discrimination-aware data mining. In Proc. 14th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining 560–568 (2008).

  63. Angwin, J., Larson, J., Mattu, S. & Kirchner, L. In Ethics of Data and Analytics 254–264 (Auerbach, 2016).

  64. Kleinberg, J., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conf. (ITCS 2017)

  65. Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).

    Article  PubMed  Google Scholar 

  66. Joseph, M., Kearns, M., Morgenstern, J. H. & Roth, A. Fairness in learning: classic and contextual bandits. In Adv. Neural Information Processing Systems (2016).

  67. Celis, L. E. & Keswani, V. Improved adversarial learning for fair classification. Preprint at https://doi.org/10.48550/arXiv.1901.10443 (2019).

  68. Kamiran, F. & Calders, T. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33, 1–33 (2012).

    Article  Google Scholar 

  69. Calmon, F. P., Wei, D., Vinzamuri, B., Ramamurthy, K. N. & Varshney, K. R. Optimized pre-processing for discrimination prevention. In Proc. 31st Int. Conf. Neural Information Processing Systems 3995–4004 (2017).

  70. Krasanakis, E., Spyromitros-Xioufis, E., Papadopoulos, S. & Kompatsiaris, Y. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In Proc. 2018 World Wide Web Conf. 853–862 (2018).

  71. Jiang, H. & Nachum, O. Identifying and correcting label bias in machine learning. In Int. Conf. Artificial Intelligence and Statistics 702–712 (PMLR, 2020).

  72. Chai, X. et al. Unsupervised domain adaptation techniques based on auto-encoder for non-stationary eeg-based emotion recognition. Comput. Biol. Med. 79, 205–214 (2016).

    Article  PubMed  Google Scholar 

  73. Kamishima, T., Akaho, S., Asoh, H. & Sakuma, J. Fairness-aware classifier with prejudice remover regularizer. In Joint Eur. Conf. Machine Learning and Knowledge Discovery in Databases 35–50 (Springer, 2012).

  74. Zafar, M. B., Valera, I., Rogriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. In Artificial Intelligence and Statistics 962–970 (PMLR, 2017).

  75. Goel, N., Yaghini, M. & Faltings, B. Non-discriminatory machine learning through convex fairness criteria. In 32nd AAAI Conference on Artificial Intelligence (2018).

  76. Goh, G., Cotter, A., Gupta, M. & Friedlander, M. P. Satisfying real-world goals with dataset constraints. In Adv. Neural Information Processing Systems (2016).

  77. Agarwal, A. et al. A reductions approach to fair classification. In Int. Conf. Machine Learning 60–69 (PMLR, 2018).

  78. Corbett-Davies, S., Pierson, E., Feller, A., Goel, S. & Huq, A. Algorithmic decision making and the cost of fairness. In Proc. 23rd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining 797–806 (2017).

  79. Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J. & Weinberger, K. Q. On fairness and calibration. In Adv. Neural Information Processing Systems (2017).

  80. Chouldechova, A., Benavides-Prado, D., Fialko, O. & Vaithianathan, R. A case study of algorithm assisted decision making in child maltreatment hotline screening decisions. In Conf. Fairness, Accountability and Transparency 134–148 (PMLR, 2018).

  81. Abernethy, J., Awasthi, P., Kleindessner, M., Morgenstern, J. & Zhang, J. Active sampling for min-max fairness. In Int. Conf. Machine Learning 53–65, (PMLR, 2022).

  82. Iosifidis, V. & Ntoutsi, E. Dealing with bias via data augmentation in supervised learning scenarios. In Proc. Int. Workshop on Bias in Information, Algorithms, and Systems (eds. Bates, J. et al.) (2018).

  83. Vodrahalli, K., Li, K. & Malik, J. Are all training examples created equal? An empirical study. Preprint at https://doi.org/10.48550/arXiv.1811.12569 (2018).

  84. Barocas, S. & Selbst, A. D. Big data’s disparate impact. Calif. Law Rev. 104, 671 (2016).

    Google Scholar 

  85. O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Crown, 2016).

  86. Rezaei, A., Liu, A., Memarrast, O. & Ziebart, B. D. Robust fairness under covariate shift. In Proc. AAAI Conf. Artificial Intelligence 35, 9419–9427 (2021).

  87. Alabi, D., Immorlica, N. & Kalai, A. Unleashing linear optimizers for group-fair learning and optimization. In Conf. Learning Theory 2043–2066 (PMLR, 2018).

  88. Kearns, M., Neel, S., Roth, A. & Wu, Z. S. Preventing fairness gerrymandering: auditing and learning for subgroup fairness. In Int. Conf. Machine Learning 2564–2572 (PMLR, 2018).

  89. Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).

    Article  PubMed  Google Scholar 

  90. Babenko, B. et al. Detection of signs of disease in external photographs of the eyes via deep learning. Nat. Biomed. Eng. 6, 1370–1383 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Kamishima, T., Akaho, S. & Sakuma, J. Fairness-aware learning through regularization approach. In 2011 IEEE 11th Int. Conf. Data Mining Workshops 643–650 (IEEE, 2011).

  92. Zafar, M. B., Valera, I., Gomez Rodriguez, M. & Gummadi, K. P. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proc. 26th Int. Conf. World Wide Web 1171–1180 (2017).

  93. Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. Learning fair representations. In Int. Conf. Machine Learning 325–333 (PMLR, 2013).

  94. Kim, M., Reingold, O. & Rothblum, G. Fairness through computationally-bounded awareness. In Adv. Neural Information Processing Systems (2018).

  95. Pfohl, S. R., Foryciarz, A. & Shah, N. H. An empirical characterization of fair machine learning for clinical risk prediction. J. Biomed. Inform. 113, 103621 (2021).

    Article  PubMed  Google Scholar 

  96. Foryciarz, A., Pfohl, S. R., Patel, B. & Shah, N. Evaluating algorithmic fairness in the presence of clinical guidelines: the case of atherosclerotic cardiovascular disease risk estimation. BMJ Health Care Inf. 29, e100460 (2022).

    Article  Google Scholar 

  97. Muntner, P. et al. Potential US population impact of the 2017 ACC/AHA high blood pressure guideline. Circulation 137, 109–118 (2018).

    Article  PubMed  Google Scholar 

  98. Chen, I., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Adv. Neural Information Processing Systems (2018).

  99. Raji, I. D. & Buolamwini, J. Actionable auditing: investigating the impact of publicly naming biased performance results of commercial AI products. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 429–435 (2019).

  100. Rolf, E., Worledge, T., Recht, B. & Jordan, M. I. Representation matters: assessing the importance of subgroup allocations in training data. In Int. Conf. Machine Learning 9040–9051 (2021).

  101. Zhao, H. & Gordon, G. Inherent tradeoffs in learning fair representations. In Adv. Neural InformationProcessing Systems 32, 15675–15685 (2019).

  102. Pfohl, S. et al. Creating fair models of atherosclerotic cardiovascular disease risk. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 271–278 (2019).

  103. Pfohl, S. R. Recommendations for Algorithmic Fairness Assessments of Predictive Models in Healthcare: Evidence from Large-scale Empirical Analyses. PhD thesis, Stanford Univ. (2021).

  104. Singh, H., Singh, R., Mhasawade, V. & Chunara, R. Fairness violations and mitigation under covariate shift. In Proc. 2021 ACM Conf. Fairness, Accountability, and Transparency 3–13 (2021).

  105. Biswas, A. & Mukherjee, S. Ensuring fairness under prior probability shifts. In Proc. 2021 AAAI/ACM Conf. AI, Ethics, and Society 414–424 (2021).

  106. Giguere, S. et al. Fairness guarantees under demographic shift. In Int. Conf. Learning Representations (2021).

  107. Mishler, A. & Dalmasso, N. Fair when trained, unfair when deployed: observable fairness measures are unstable in performative prediction settings. Preprint at https://doi.org/10.48550/arXiv.2202.05049 (2022).

  108. Duchi, J. & Namkoong, H. Learning models with uniform performance via distributionally robust optimization. Ann. Stat. 49, 1378–1406 (2021).

    Article  Google Scholar 

  109. Hashimoto, T., Srivastava, M., Namkoong, H. & Liang, P. Fairness without demographics in repeated loss minimization. In Int. Conf. Machine Learning 1929–1938 (PMLR, 2018).

  110. Wang, S. et al. Robust optimization for fairness with noisy protected groups. In Adv. Neural InformationProcessing Systems 33, 5190–5203 (2020).

  111. Coston, A. et al. Fair transfer learning with missing protected attributes. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 91–98 (2019).

  112. Schumann, C. et al. Transfer of machine learning fairness across domains. In NeurIPS AI for Social Good Workshop (2019).

  113. Lahoti, P. et al. Fairness without demographics through adversarially reweighted learning. In Adv. Neural Information Processing Systems 33, 728–740 (2020).

  114. Yan, S., Kao, H.-t. & Ferrara, E. Fair class balancing: enhancing model fairness without observing sensitive attributes. In Proc. 29th ACM Int. Conf. Information and Knowledge Management 1715–1724 (2020).

  115. Zhao, T., Dai, E., Shu, K. & Wang, S. Towards fair classifiers without sensitive attributes: exploring biases in related features. In Proc. 15th ACM Int. Conf. Web Search and Data Mining 1433–1442 (2022).

  116. Quinonero-Candela, J., Sugiyama, M., Lawrence, N. D. & Schwaighofer, A. Dataset Shift in Machine Learning (MIT Press, 2009).

  117. Subbaswamy, A., Schulam, P. & Saria, S. Preventing failures due to dataset shift: learning predictive models that transport. In 22nd Int. Conf. Artificial Intelligence and Statistics 3118–3127 (PMLR, 2019).

  118. Subbaswamy, A. & Saria, S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics 21, 345–352 (2020).

    PubMed  Google Scholar 

  119. Guo, L. L. et al. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci. Rep. 12, 2726 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Singh, H., Singh, R., Mhasawade, V. & Chunara, R. Fair predictors under distribution shift. In NeurIPS Workshop on Fair ML for Health (2019).

  121. Bernhardt, M., Jones, C. & Glocker, B. Investigating underdiagnosis of ai algorithms in the presence of multiple sources of dataset bias. Nat. Med. 28, 1157–1158 (2022).

    Article  CAS  PubMed  Google Scholar 

  122. Ghosh, A. & Shanbhag, A. FairCanary: rapid continuous explainable fairness. In Proc. AAAI/ACM Conf. AI, Ethics, and Society (2022).

  123. Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks. In Int. Conf. Learning Representations (2020).

  124. Yang, Y., Zhang, H., Katabi, D. & Ghassemi, M. Change is hard: a closer look at subpopulation shift. In Int. Conf. Machine Learning (2023).

  125. Zong, Y., Yang, Y. & Hospedales, T. MEDFAIR: benchmarking fairness for medical imaging. In Int. Conf. Learning Representations (2023).

  126. Lipkova, J. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat. Med. 28, 575–582 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Tedeschi, P. & Griffith, J. R. Classification of hospital patients as ‘surgical’. Implications of the shift to ICD-9-CM. Med. Care 22, 189–192 (1984).

    Article  CAS  PubMed  Google Scholar 

  128. Heslin, K. C. et al. Trends in opioid-related inpatient stays shifted after the US transitioned to ICD-10-CM diagnosis coding in 2015. Med. Care 55, 918–923 (2017).

    Article  PubMed  Google Scholar 

  129. Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal 6, pl1 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  131. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

    Article  Google Scholar 

  132. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Wen, D. et al. Characteristics of publicly available skin cancer image datasets: a systematic review. Lancet Digit. Health 4, e64–e74 (2021).

    Article  PubMed  Google Scholar 

  134. Khan, S. M. et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit. Health 3, e51–e66 (2021).

    Article  CAS  PubMed  Google Scholar 

  135. Mamary, A. J. et al. Race and gender disparities are evident in COPD underdiagnoses across all severities of measured airflow obstruction. Chronic Obstr. Pulm. Dis. 5, 177 (2018).

    PubMed  PubMed Central  Google Scholar 

  136. Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y. & Ghassemi, M. Reply to: ‘potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms’ and ‘confounding factors need to be accounted for in assessing bias by machine learning algorithms’. Nat. Med. 28, 1161–1162 (2022).

    Article  CAS  PubMed  Google Scholar 

  137. Landry, L. G., Ali, N., Williams, D. R., Rehm, H. L. & Bonham, V. L. Lack of diversity in genomic databases is a barrier to translating precision medicine research into practice. Health Aff. 37, 780–785 (2018).

    Article  Google Scholar 

  138. Gusev, A. et al. Atlas of prostate cancer heritability in European and African-American men pinpoints tissue-specific regulation. Nat. Commun. 7, 10979 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  139. Hinch, A. G. et al. The landscape of recombination in African Americans. Nature 476, 170–175 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Shriver, M. D. et al. Skin pigmentation, biogeographical ancestry and admixture mapping. Hum. Genet. 112, 387–399 (2003).

    Article  PubMed  Google Scholar 

  141. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  142. Puyol-Anton, E. et al. Fairness in cardiac MR image analysis: an investigation of bias due to data imbalance in deep learning based segmentation. Med. Image Comput. Computer Assist. Intervention 24, 413–423 (2021).

    Google Scholar 

  143. Kraft, S. A. et al. Beyond consent: building trusting relationships with diverse populations in precision medicine research. Am. J. Bioeth. 18, 3–20 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  144. West, K. M., Blacksher, E. & Burke, W. Genomics, health disparities, and missed opportunities for the nation’s research agenda. JAMA 317, 1831–1832 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  145. Mahal, B. A. et al. Racial differences in genomic profiling of prostate cancer. N. Engl. J. Med. 383, 1083–1085 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  146. Shi, Y. et al. A prospective, molecular epidemiology study of EGFR mutations in asian patients with advanced non–small-cell lung cancer of adenocarcinoma histology (PIONEER). J. Thorac. Oncol. 9, 154–162 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. Spratt, D. E. et al. Racial/ethnic disparities in genomic sequencing. JAMA Oncol. 2, 1070–1074 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  148. Zhang, G. et al. Characterization of frequently mutated cancer genes in chinese breast tumors: a comparison of chinese and TCGA cohorts. Ann. Transl. Med. 7, 179 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  149. Zavala, V. A. et al. Cancer health disparities in racial/ethnic minorities in the United States. Br. J. Cancer 124, 315–332 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  150. Zhang, W., Edwards, A., Flemington, E. K. & Zhang, K. Racial disparities in patient survival and tumor mutation burden, and the association between tumor mutation burden and cancer incidence rate. Sci. Rep. 7, 13639 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  151. Ooi, S. L., Martinez, M. E. & Li, C. I. Disparities in breast cancer characteristics and outcomes by race/ethnicity. Breast Cancer Res. Treat. 127, 729–738 (2011).

    Article  PubMed  Google Scholar 

  152. Henderson, B. E., Lee, N. H., Seewaldt, V. & Shen, H. The influence of race and ethnicity on the biology of cancer. Nat. Rev. Cancer 12, 648–653 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  153. Gamble, P. et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun. Med. 1, 1–12 (2021).

    Article  Google Scholar 

  154. Borrell, L. N. et al. Race and genetic ancestry in medicine—a time for reckoning with racism. N. Engl. J. Med. 384, 474–480 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  155. Martini, R., Newman, L. & Davis, M. Breast cancer disparities in outcomes; unmasking biological determinants associated with racial and genetic diversity. Clin. Exp. Metastasis 39, 7–14 (2022).

    Article  CAS  PubMed  Google Scholar 

  156. Martini, R. et al. African ancestry–associated gene expression profiles in triple-negative breast cancer underlie altered tumor biology and clinical outcome in women of African descent. Cancer Discov. 12, 2530–2551 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  157. Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1–selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).

  158. Clarke, M. A., Devesa, S. S., Hammer, A. & Wentzensen, N. Racial and ethnic differences in hysterectomy-corrected uterine corpus cancer mortality by stage and histologic subtype. JAMA Oncol. 8, 895–903 (2022).

  159. Yeyeodu, S. T., Kidd, L. R. & Kimbro, K. S. Protective innate immune variants in racial/ethnic disparities of breast and prostate cancer. Cancer Immunol. Res. 7, 1384–1389 (2019).

  160. Yang, W. et al. Sex differences in GBM revealed by analysis of patient imaging, transcriptome, and survival data. Sci. Transl. Med. 11, eaao5253 (2019).

  161. Carrano, A., Juarez, J. J., Incontri, D., Ibarra, A. & Cazares, H. G. Sex-specific differences in glioblastoma. Cells 10, 1783 (2021).

  162. Creed, J. H. et al. Commercial gene expression tests for prostate cancer prognosis provide paradoxical estimates of race-specific risk. Cancer Epidemiol. Biomark. Prev. 29, 246–253 (2020).

  163. Burlina, P., Joshi, N., Paul, W., Pacheco, K. D. & Bressler, N. M. Addressing artificial intelligence bias in retinal diagnostics. Transl. Vis. Sci. Technol. 10, 13 (2021).

  164. Kakadekar, A., Greene, D. N., Schmidt, R. L., Khalifa, M. A. & Andrews, A. R. Nonhormone-related histologic findings in postsurgical pathology specimens from transgender persons: a systematic review. Am. J. Clin. Pathol. 157, 337–344 (2022).

  165. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

  166. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).

  167. Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).

  168. Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for fair and efficient machine learning. In Conf. Fairness, Accountability and Transparency (PMLR, 2018).

  169. Lipton, Z., McAuley, J. & Chouldechova, A. Does mitigating ML’s impact disparity require treatment disparity? In Adv. Neural Information Processing Systems (2018).

  170. Madras, D., Creager, E., Pitassi, T. & Zemel, R. Fairness through causal awareness: learning causal latent-variable models for biased data. In Proc. Conf. Fairness, Accountability, and Transparency 349–358 (2019).

  171. Lohaus, M., Kleindessner, M., Kenthapadi, K., Locatello, F. & Russell, C. Are two heads the same as one? Identifying disparate treatment in fair neural networks. In Adv. Neural Information Processing Systems (2022).

  172. McCarty, C. A. et al. The eMERGE network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med. Genet. 4, 1–11 (2011).

  173. Gottesman, O. et al. The electronic medical records and genomics (eMERGE) network: past, present, and future. Genet. Med. 15, 761–771 (2013).

  174. Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).

  175. Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).

  176. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  177. Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).

  178. Dehkharghanian, T. et al. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn. Pathol. 18, 1–12 (2023).

  179. Ganin, Y. & Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Int. Conf. Machine Learning 1180–1189 (PMLR, 2015).

  180. Shaban, M. T., Baur, C., Navab, N. & Albarqouni, S. StainGAN: stain style transfer for digital histological images. In 2019 IEEE 16th Int. Symp. Biomedical Imaging (ISBI 2019) 953–956 (IEEE, 2019).

  181. Widmer, G. & Kubat, M. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 69–101 (1996).

  182. Schlimmer, J. C. & Granger, R. H. Incremental learning from noisy data. Mach. Learn. 1, 317–354 (1986).

  183. Lu, J. et al. Learning under concept drift: a review. IEEE Trans. Knowl. Data Eng. 31, 2346–2363 (2018).

  184. Guo, L. L. et al. Systematic review of approaches to preserve machine learning performance in the presence of temporal dataset shift in clinical medicine. Appl. Clin. Inform. 12, 808–815 (2021).

  185. Barocas, S. et al. Designing disaggregated evaluations of AI systems: choices, considerations, and tradeoffs. In Proc. 2021 AAAI/ACM Conf. AI, Ethics, and Society 368–378 (2021).

  186. Zhou, H., Chen, Y. & Lipton, Z. C. Evaluating model performance in medical datasets over time. In Proc. Conf. Health, Inference, and Learning (2023).

  187. Scholkopf, B. et al. On causal and anticausal learning. In Int. Conf. Machine Learning (2012).

  188. Lipton, Z., Wang, Y.-X. & Smola, A. Detecting and correcting for label shift with black box predictors. In Int. Conf. Machine Learning 3122–3130 (PMLR, 2018).

  189. Loupy, A., Mengel, M. & Haas, M. Thirty years of the international Banff classification for allograft pathology: the past, present, and future of kidney transplant diagnostics. Kidney Int. 101, 678–691 (2022).

  190. Delahunt, B. et al. Gleason and Fuhrman no longer make the grade. Histopathology 68, 475–481 (2016).

  191. Davatchi, F. et al. The saga of diagnostic/classification criteria in Behcet’s disease. Int. J. Rheum. Dis. 18, 594–605 (2015).

  192. Louis, D. N. et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 131, 803–820 (2016).

  193. Bifet, A. & Gavalda, R. Learning from time-changing data with adaptive windowing. In Proc. 2007 SIAM International Conference on Data Mining 443–448 (SIAM, 2007).

  194. Nigenda, D. et al. Amazon SageMaker Model Monitor: a system for real-time insights into deployed machine learning models. In Proc. 28th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (2022).

  195. Miroshnikov, A., Kotsiopoulos, K., Franks, R. & Kannan, A. R. Wasserstein-based fairness interpretability framework for machine learning models. Mach. Learn. 111, 3307–3357 (2022).

  196. Board, A. E. AAA statement on race. Am. Anthropol. 100, 712–713 (1998).

  197. Oni-Orisan, A., Mavura, Y., Banda, Y., Thornton, T. A. & Sebro, R. Embracing genetic diversity to improve black health. N. Engl. J. Med. 384, 1163–1167 (2021).

  198. Calhoun, A. The pathophysiology of racial disparities. N. Engl. J. Med. 384, e78 (2021).

  199. Sun, R. et al. Don’t ignore genetic data from minority populations. Nature 585, 184–186 (2020).

  200. Lannin, D. R. et al. Influence of socioeconomic and cultural factors on racial differences in late-stage presentation of breast cancer. JAMA 279, 1801–1807 (1998).

  201. Bao, M. et al. It’s COMPASlicated: the messy relationship between RAI datasets and algorithmic fairness benchmarks. In 35th Conf. Neural Information Processing Systems Datasets and Benchmarks (2021).

  202. Hao, M. et al. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inf. 16, 6532–6542 (2019).

  203. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).

  204. Bonawitz, K. et al. Practical secure aggregation for privacy-preserving machine learning. In Proc. 2017 ACM SIGSAC Conf. Computer and Communications Security 1175–1191 (2017).

  205. Bonawitz, K. et al. Towards federated learning at scale: system design. In Proc. Mach. Learn. Syst. 1, 374–388 (2019).

  206. Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018).

  207. Huang, L. et al. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019).

  208. Xu, J. et al. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 5, 1–19 (2021).

  209. Chakroborty, S., Patel, K. R. & Freytag, A. Beyond federated learning: fusion strategies for diabetic retinopathy screening algorithms trained from different device types. Invest. Ophthalmol. Vis. Sci. 62, 85–85 (2021).

  210. Ju, C. et al. Federated transfer learning for EEG signal classification. In 42nd Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society 3040–3045 (IEEE, 2020).

  211. Li, W. et al. Privacy-preserving federated brain tumour segmentation. In Int. Workshop on Machine Learning in Medical Imaging 133–141 (Springer, 2019).

  212. Kaissis, G. et al. End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nat. Mach. Intell. 3, 473–484 (2021).

  213. Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).

  214. Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).

  215. Choudhury, O. et al. Differential privacy-enabled federated learning for sensitive health data. In Machine Learning for Health (ML4H) Workshop at NeurIPS (2019).

  216. Kushida, C. A. et al. Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies. Med. Care 50, S82–S101 (2012).

  217. van der Haak, M. et al. Data security and protection in cross-institutional electronic patient records. Int. J. Med. Inform. 70, 117–130 (2003).

  218. Veale, M. & Binns, R. Fairer machine learning in the real world: mitigating discrimination without collecting sensitive data. Big Data Soc. 4, 2053951717743530 (2017).

  219. Fiume, M. et al. Federated discovery and sharing of genomic data using beacons. Nat. Biotechnol. 37, 220–224 (2019).

  220. Sadilek, A. et al. Privacy-first health research with federated learning. NPJ Digit. Med. 4, 132 (2021).

  221. Duan, R., Boland, M. R., Moore, J. H. & Chen, Y. ODAL: a one-shot distributed algorithm to perform logistic regressions on electronic health records data from multiple clinical sites. In BIOCOMPUTING 2019: Proc. Pacific Symposium 30–41 (World Scientific, 2018).

  222. Sarma, K. V. et al. Federated learning improves site performance in multicenter deep learning without data sharing. J. Am. Med. Inform. Assoc. 28, 1259–1264 (2021).

  223. Silva, S. et al. Federated learning in distributed medical databases: meta-analysis of large-scale subcortical brain data. In 2019 IEEE 16th International Symposium on Biomedical Imaging 270–274 (IEEE, 2019).

  224. Roy, A. G., Siddiqui, S., Polsterl, S., Navab, N. & Wachinger, C. BrainTorrent: a peer-to-peer environment for decentralized federated learning. Preprint at https://doi.org/10.48550/arXiv.1905.06731 (2019).

  225. Lu, M. Y. et al. Federated learning for computational pathology on gigapixel whole slide images. Med. Image Anal. 76, 102298 (2022).

  226. Dou, Q. et al. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. NPJ Digit. Med. 4, 60 (2021).

  227. Yang, D. et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multinational data from China, Italy, Japan. Med. Image Anal. 70, 101992 (2021).

  228. Vaid, A. et al. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR Med. Inform. 9, e24207 (2021).

  229. Li, S., Cai, T. & Duan, R. Targeting underrepresented populations in precision medicine: a federated transfer learning approach. Preprint at https://doi.org/10.48550/arXiv.2108.12112 (2023).

  230. Mandl, K. D. et al. The genomics research and innovation network: creating an interoperable, federated, genomics learning system. Genet. Med. 22, 371–380 (2020).

  231. Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. 108, 632–655 (2021).

  232. Liang, J., Hu, D. & Feng, J. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In Int. Conf. Machine Learning 6028–6039 (PMLR, 2020).

  233. Song, L., Ma, C., Zhang, G. & Zhang, Y. Privacy-preserving unsupervised domain adaptation in federated setting. IEEE Access 8, 143233–143240 (2020).

  234. Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).

  235. Peterson, D., Kanani, P. & Marathe, V. J. Private federated learning with domain adaptation. In Federated Learning for Data Privacy and Confidentiality Workshop in NeurIPS (2019).

  236. Peng, X., Huang, Z., Zhu, Y. & Saenko, K. Federated adversarial domain adaptation. In Int. Conf. Learning Representations (2020).

  237. Yao, C.-H. et al. Federated multi-target domain adaptation. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 1424–1433 (2022).

  238. Li, T., Sanjabi, M., Beirami, A. & Smith, V. Fair resource allocation in federated learning. In Int. Conf. Learning Representations (2020).

  239. Mohri, M., Sivek, G. & Suresh, A. T. Agnostic federated learning. In Int. Conf. Machine Learning 4615–4625 (PMLR, 2019).

  240. Ezzeldin, Y. H., Yan, S., He, C., Ferrara, E. & Avestimehr, S. FairFed: enabling group fairness in federated learning. In Proc. AAAI Conf. Artificial Intelligence (2023).

  241. Papadaki, A., Martinez, N., Bertran, M., Sapiro, G. & Rodrigues, M. Minimax demographic group fairness in federated learning. In ACM Conf. Fairness, Accountability, and Transparency 142–159 (2022).

  242. Chen, D., Gao, D., Kuang, W., Li, Y. & Ding, B. pFL-Bench: a comprehensive benchmark for personalized federated learning. In 36th Conf. Neural Information Processing Systems Datasets and Benchmarks Track (2022).

  243. Chai, J. & Wang, X. Self-supervised fair representation learning without demographics. In Adv. Neural Information Processing Systems (2022).

  244. Jiang, M. et al. Fair federated medical image segmentation via client contribution estimation. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 16302–16311 (2023).

  245. Jiang, M., Wang, Z. & Dou, Q. HarmoFL: harmonizing local and global drifts in federated learning on heterogeneous medical images. In Proc. AAAI Conf. Artificial Intelligence 1087–1095 (2022).

  246. Xu, Y. Y., Lin, C. S. & Wang, Y. C. F. Bias-eliminating augmentation learning for debiased federated learning. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 20442–20452 (2023).

  247. Zhao, Y. et al. Federated learning with non-IID data. Preprint at https://doi.org/10.48550/arXiv.1806.00582 (2018).

  248. Konečný, J. et al. Federated learning: strategies for improving communication efficiency. Preprint at https://doi.org/10.48550/arXiv.1610.05492 (2016).

  249. Lin, Y., Han, S., Mao, H., Wang, Y. & Dally, W. J. Deep gradient compression: reducing the communication bandwidth for distributed training. In Int. Conf. Learning Representations (2018).

  250. McMahan, B. et al. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics 1273–1282 (PMLR, 2017).

  251. Li, T. et al. Federated optimization in heterogeneous networks. In Proc. Mach. Learn. Syst. 2, 429–450 (2020).

  252. Sattler, F., Wiedemann, S., Muller, K.-R. & Samek, W. Robust and communication-efficient federated learning from non-iid data. IEEE Trans. Neural Netw. Learn. Syst. 31, 3400–3413 (2019).

  253. Abay, A. et al. Mitigating bias in federated learning. Preprint at https://doi.org/10.48550/arXiv.2012.02447 (2020).

  254. Luo, Z., Wang, Y., Wang, Z., Sun, Z. & Tan, T. Disentangled federated learning for tackling attributes skew via invariant aggregation and diversity transferring. In Int. Conf. Machine Learning 14527–14541 (PMLR, 2022).

  255. McNamara, D., Ong, C. S. & Williamson, R. C. Costs and benefits of fair representation learning. In Proc. 2019 AAAI/ACM Conference on AI, Ethics, and Society 263–270 (2019).

  256. Madaio, M. A., Stark, L., Wortman Vaughan, J. & Wallach, H. Co-designing checklists to understand organizational challenges and opportunities around fairness in AI. In Proc. 2020 CHI Conf. Human Factors in Computing Systems (2020).

  257. Jung, K. et al. A framework for making predictive models useful in practice. J. Am. Med. Inform. Assoc. 28, 1149–1158 (2021).

  258. Pogodin, R. et al. Efficient conditionally invariant representation learning. In Int. Conf. Learning Representations (2023).

  259. Louizos, C. et al. Causal effect inference with deep latent-variable models. In Adv. Neural Information Processing Systems (2017).

  260. Shi, C., Blei, D. & Veitch, V. Adapting neural networks for the estimation of treatment effects. In Adv. Neural Information Processing Systems (2019).

  261. Yoon, J., Jordon, J. & Van Der Schaar, M. GANITE: estimation of individualized treatment effects using generative adversarial nets. In Int. Conf. Learning Representations (2018).

  262. Rezaei, A., Fathony, R., Memarrast, O. & Ziebart, B. Fairness for robust log loss classification. In Proc. AAAI Conf. Artificial Intelligence 34, 5511–5518 (2020).

  263. Petrović, A., Nikolić, M., Radovanović, S., Delibašić, B. & Jovanović, M. FAIR: Fair adversarial instance re-weighting. Neurocomputing 476, 14–37 (2020).

  264. Sattigeri, P., Hoffman, S. C., Chenthamarakshan, V. & Varshney, K. R. Fairness GAN: generating datasets with fairness properties using a generative adversarial network. IBM J. Res. Dev. 63, 3:1–3:9 (2019).

  265. Xu, D., Yuan, S., Zhang, L. & Wu, X. FairGAN: fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data 570–575 (IEEE, 2018).

  266. Xu, H., Liu, X., Li, Y., Jain, A. & Tang, J. To be robust or to be fair: towards fairness in adversarial training. In Int. Conf. Machine Learning 11492–11501 (PMLR, 2021).

  267. Wadsworth, C., Vera, F. & Piech, C. Achieving fairness through adversarial learning: an application to recidivism prediction. In FAT/ML Workshop (2018).

  268. Adel, T., Valera, I., Ghahramani, Z. & Weller, A. One-network adversarial fairness. In Proc. AAAI Conf. Artificial Intelligence 33, 2412–2420 (2019).

  269. Madras, D., Creager, E., Pitassi, T. & Zemel, R. Learning adversarially fair and transferable representations. In Int. Conf. Machine Learning 3384–3393 (PMLR, 2018).

  270. Madras, D., Creager, E., Pitassi, T. & Zemel, R. Learning adversarially fair and transferable representations. In Proc. 35th Int. Conf. Machine Learning (eds. Dy, J. & Krause, A.) 3384–3393 (PMLR, 2018).

  271. Chen, X., Fain, B., Lyu, L. & Munagala, K. Proportionally fair clustering. In Proc. 36th Int. Conf. Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) 1032–1041 (PMLR, 2019).

  272. Li, P., Zhao, H. & Liu, H. Deep fair clustering for visual learning. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 9070–9079 (2020).

  273. Hong, J. et al. Federated adversarial debiasing for fair and transferable representations. In Proc. 27th ACM SIGKDD Conf. Knowledge Discovery and Data Mining 617–627 (2021).

  274. Qi, T. et al. FairVFL: a fair vertical federated learning framework with contrastive adversarial learning. In Adv. Neural Information Processing Systems (2022).

  275. Chen, Y., Raab, R., Wang, J. & Liu, Y. Fairness transferability subject to bounded distribution shift. In Adv. Neural Information Processing Systems (2022).

  276. An, B., Che, Z., Ding, M. & Huang, F. Transferring fairness under distribution shifts via fair consistency regularization. In Adv. Neural Information Processing Systems (2022).

  277. Giguere, S. et al. Fairness guarantees under demographic shift. In Int. Conf. Learning Representations (2022).

  278. Schrouff, J. et al. Diagnosing failures of fairness transfer across distribution shift in real-world medical settings. In Adv. Neural Information Processing Systems (2022).

  279. Lipkova, J. et al. Personalized radiotherapy design for glioblastoma: integrating mathematical tumor models, multimodal scans, and Bayesian inference. IEEE Trans. Med. Imaging 38, 1875–1884 (2019).

  280. Cen, L. P. et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat. Commun. 12, 4828 (2021).

  281. Lézoray, O., Revenu, M. & Desvignes, M. Graph-based skin lesion segmentation of multispectral dermoscopic images. In IEEE Int. Conf. Image Processing 897–901 (2014).

  282. Manica, A., Prugnolle, F. & Balloux, F. Geography is a better determinant of human genetic differentiation than ethnicity. Hum. Genet. 118, 366–371 (2005).

  283. Hadad, N., Wolf, L. & Shahar, M. A two-step disentanglement method. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 772–780 (2018).

  284. Achille, A. & Soatto, S. Emergence of invariance and disentanglement in deep representations. J. Mach. Learn. Res. 19, 1947–1980 (2018).

  285. Chen, R. T., Li, X., Grosse, R. & Duvenaud, D. Isolating sources of disentanglement in variational autoencoders. In Adv. Neural Information Processing Systems (2018).

  286. Kim, H. & Mnih, A. Disentangling by factorising. In Int. Conf. Machine Learning 2649–2658 (PMLR, 2018).

  287. Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In Int. Conf. Learning Representations (2017).

  288. Sarhan, M. H., Eslami, A., Navab, N. & Albarqouni, S. Learning interpretable disentangled representations using adversarial VAEs. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data 37–44 (Springer, 2019).

  289. Gyawali, P. K. et al. Learning to disentangle inter-subject anatomical variations in electrocardiographic data. In IEEE Trans. Biomedical Engineering (IEEE, 2021).

  290. Bing, S., Fortuin, V. & Ratsch, G. On disentanglement in Gaussian process variational autoencoders. In 4th Symp. Adv. Approximate Bayesian Inference (2021).

  291. Xu, Y., He, H., Shen, T. & Jaakkola, T. S. Controlling directions orthogonal to a classifier. In Int. Conf. Learning Representations (2022).

  292. Cisse, M. & Koyejo, S. Fairness and representation learning. In NeurIPS Invited Talk 2019; https://cs.stanford.edu/~sanmi/documents/Representation_Learning_Fairness_NeurIPS19_Tutorial.pdf (2019).

  293. Creager, E. et al. Flexibly fair representation learning by disentanglement. In Int. Conf. Machine Learning 1436–1445 (PMLR, 2019).

  294. Locatello, F. et al. On the fairness of disentangled representations. In Adv. Neural Information Processing Systems (2019).

  295. Lee, J., Kim, E., Lee, J., Lee, J. & Choo, J. Learning debiased representation via disentangled feature augmentation. In Adv. Neural Information Processing Systems 34, 25123–25133 (2021).

  296. Zhang, Y. K., Wang, Q. W., Zhan, D. C. & Ye, H. J. Learning debiased representations via conditional attribute interpolation. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 7599–7608 (2023).

  297. Tartaglione, E., Barbano, C. A. & Grangetto, M. EnD: entangling and disentangling deep representations for bias correction. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 13508–13517 (2021).

  298. Bercea, C. I., Wiestler, B., Rueckert, D. & Albarqouni, S. FedDis: disentangled federated learning for unsupervised brain pathology segmentation. Preprint at https://doi.org/10.48550/arXiv.2103.03705 (2021).

  299. Ke, J., Shen, Y. & Lu, Y. Style normalization in histology with federated learning. In 2021 IEEE 18th Int. Symp. Biomedical Imaging 953–956 (IEEE, 2021).

  300. Pfohl, S. R., Dai, A. M. & Heller, K. Federated and differentially private learning for electronic health records. In Machine Learning for Health (ML4H) Workshop at NeurIPS (2019).

  301. Xin, B. et al. Private FL-GAN: differential privacy synthetic data generation based on federated learning. In 2020 IEEE Int. Conf. Acoustics, Speech and Signal Processing 2927–2931 (IEEE, 2020).

  302. Rajotte, J.-F. et al. Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary. In Proc. Conf. Information Technology for Social Good 79–84 (2021).

  303. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Int. Conf. Machine Learning 1597–1607 (PMLR, 2020).

  304. Shad, R., Cunningham, J. P., Ashley, E. A., Langlotz, C. P. & Hiesinger, W. Designing clinically translatable artificial intelligence systems for high-dimensional medical imaging. Nat. Mach. Intell. 3, 929–935 (2021).

  305. Jacovi, A., Marasovic, A., Miller, T. & Goldberg, Y. Formalizing trust in artificial intelligence: prerequisites, causes and goals of human trust in AI. In Proc. 2021 ACM Conf. Fairness, Accountability, and Transparency 624–635 (2021).

  306. Floridi, L. Establishing the rules for building trustworthy AI. Nat. Mach. Intell. 1, 261–262 (2019).

  307. High-Level Expert Group on Artificial Intelligence. Ethics Guidelines for Trustworthy AI (European Commission, 2019).

  308. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. In Workshop at Int. Conf. Learning Representations (2014).

  309. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE Int. Conf. Computer Vision 618–626 (2017).

  310. Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Proc. AAAI Conf. Artificial Intelligence 33, 590–597 (2019).

  311. Sayres, R. et al. Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology 126, 552–564 (2019).

  312. Patro, B. N., Lunayach, M., Patel, S. & Namboodiri, V. P. U-CAM: visual explanation using uncertainty based class activation maps. In Proc. IEEE/CVF Int. Conf. Computer Vision 7444–7453 (2019).

  313. Grewal, M., Srivastava, M. M., Kumar, P. & Varadarajan, S. RADNET: radiologist level accuracy using deep learning for hemorrhage detection in CT scans. In 2018 IEEE 15th Int. Symp. Biomedical Imaging 281–284 (IEEE, 2018).

  314. Arun, N. T. et al. Assessing the validity of saliency maps for abnormality localization in medical imaging. In Medical Imaging with Deep Learning (2020).

  315. Schlemper, J. et al. Attention-gated networks for improving ultrasound scan plane detection. In Medical Imaging with Deep Learning (2018).

  316. Schlemper, J. et al. Attention gated networks: learning to leverage salient regions in medical images. Med. Image Anal. 53, 197–207 (2019).

  317. Mittelstadt, B., Russell, C. & Wachter, S. Explaining explanations in AI. In Proc. Conf. Fairness, Accountability, and Transparency 279–288 (2019).

  318. Kindermans, P.-J. et al. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 267–280 (Springer, 2019).

  319. Kaur, H. et al. Interpreting interpretability: understanding data scientists’ use of interpretability tools for machine learning. In Proc. 2020 CHI Conf. Human Factors in Computing Systems (2020).

  320. Adebayo, J. et al. Sanity checks for saliency maps. In Adv. Neural Information Processing Systems (2018).

  321. Saporta, A. et al. Benchmarking saliency methods for chest X-ray interpretation. Nat. Mach. Intell. 4, 867–878 (2022).

  322. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).

  323. DeGrave, A. J., Janizek, J. D. & Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619 (2021).

  324. Adebayo, J., Muelly, M., Liccardi, I. & Kim, B. Debugging tests for model explanations. In Adv. Neural Information Processing Systems 33, 700–712 (2020).

  325. Lee, M. K. & Rich, K. Who is included in human perceptions of AI?: Trust and perceived fairness around healthcare AI and cultural mistrust. In Proc. 2021 CHI Conf. Human Factors in Computing Systems (2021).

  326. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Int. Conf. Machine Learning 3319–3328 (PMLR, 2017).

  327. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st Int. Conf. Neural Information Processing Systems 4768–4777 (2017).

  328. Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021).

  329. Kim, G. B., Gao, Y., Palsson, B. O. & Lee, S. Y. DeepTFactor: a deep learning-based tool for the prediction of transcription factors. Proc. Natl Acad. Sci. USA 118, e2021171118 (2021).

  330. Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).

  331. Qiu, W. et al. Interpretable machine learning prediction of all-cause mortality. Commun. Med. 2, 125 (2022).

  332. Janizek, J. D. et al. Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023-01034-0 (2023).

  333. Wexler, J., Pushkarna, M., Robinson, S., Bolukbasi, T. & Zaldivar, A. Probing ML models for fairness with the What-If tool and SHAP: hands-on tutorial. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 705 (2020).

  334. Lundberg, S. M. Explaining quantitative measures of fairness. In Fair & Responsible AI Workshop @ CHI2020; https://scottlundberg.com/files/fairness_explanations.pdf (2020).

  335. Cesaro, J. & Cozman, F. G. Measuring unfairness through game-theoretic interpretability. In Machine Learning and Knowledge Discovery in Databases: International Workshops of ECML PKDD (2019).

  336. Meng, C., Trinh, L., Xu, N. & Liu, Y. Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Sci. Rep. 12, 7166 (2022).

  337. Panigutti, C., Perotti, A., Panisson, A., Bajardi, P. & Pedreschi, D. FairLens: auditing black-box clinical decision support systems. Inf. Process. Manag. 58, 102657 (2021).

  338. Röösli, E., Bozkurt, S. & Hernandez-Boussard, T. Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model. Sci. Data 9, 24 (2022).

  339. Pan, W., Cui, S., Bian, J., Zhang, C. & Wang, F. Explaining algorithmic fairness through fairness-aware causal path decomposition. In Proc. 27th ACM SIGKDD Conf. Knowledge Discovery and Data Mining 1287–1297 (2021).

  340. Agarwal, C. et al. OpenXAI: towards a transparent evaluation of model explanations. In Adv. Neural Information Processing Systems 35, 15784–15799 (2022).

  341. Zhang, H., Singh, H., Ghassemi, M. & Joshi, S. “Why did the model fail?”: attributing model performance changes to distribution shifts. In Int. Conf. Machine Learning (2023).

  342. Ghorbani, A. & Zou, J. Data Shapley: equitable valuation of data for machine learning. In Int. Conf. Machine Learning 97, 2242–2251 (2019).

  343. Pandl, K. D., Feiland, F., Thiebes, S. & Sunyaev, A. Trustworthy machine learning for health care: scalable data valuation with the Shapley value. In Proc. Conf. Health, Inference, and Learning 47–57 (2021).

  344. Prakash, E. I., Shrikumar, A. & Kundaje, A. Towards more realistic simulated datasets for benchmarking deep learning models in regulatory genomics. In Machine Learning in Computational Biology 58–77 (2022).

  345. Oktay, O. et al. Attention U-Net: learning where to look for the pancreas. In Medical Imaging with Deep Learning (2018).

  346. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In Int. Conf. Learning Representations (2020).

  347. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

  348. Cui, Y. et al. Bayes-MIL: a new probabilistic perspective on attention-based multiple instance learning for whole slide images. In Int. Conf. Learning Representations (2023).

  349. Van Gansbeke, W., Vandenhende, S., Georgoulis, S. & Van Gool, L. Unsupervised semantic segmentation by contrasting object mask proposals. In Proc. IEEE/CVF Int. Conf. Computer Vision 10052–10062 (2021).

  350. Radford, A. et al. Learning transferable visual models from natural language supervision. In Int. Conf. Machine Learning 8748–8763 (2021).

  351. Wei, J. et al. Chain of thought prompting elicits reasoning in large language models. In Adv. Neural Information Processing Systems (2022).

  352. Javed, S. A., Juyal, D., Padigela, H., Taylor-Weiner, A. & Yu, L. Additive MIL: intrinsically interpretable multiple instance learning for pathology. In Adv. Neural Information Processing Systems (2022).

  353. Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).

  354. Bhargava, H. K. et al. Computationally derived image signature of stromal morphology is prognostic of prostate cancer recurrence following prostatectomy in African American patients. Clin. Cancer Res. 26, 1915–1923 (2020).

  355. Curtis, J. R. et al. Population-based fracture risk assessment and osteoporosis treatment disparities by race and gender. J. Gen. Intern. Med. 24, 956–962 (2009).

  356. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416 (2018).

  357. Foley, R. N., Wang, C. & Collins, A. J. Cardiovascular risk factor profiles and kidney function stage in the US general population: the NHANES III study. In Mayo Clinic Proc. 80, 1270–1277 (Elsevier, 2005).

  358. Nevitt, M., Felson, D. & Lester, G. The osteoarthritis initiative. Protocol for the cohort study 1; https://nda.nih.gov/static/docs/StudyDesignProtocolAndAppendices.pdf (2006).

  359. Vaughn, I. A., Terry, E. L., Bartley, E. J., Schaefer, N. & Fillingim, R. B. Racial-ethnic differences in osteoarthritis pain and disability: a meta-analysis. J. Pain 20, 629–644 (2019).

  360. Rotemberg, V. et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Sci. Data 8, 34 (2021).

  361. Kinyanjui, N. M. et al. Estimating skin tone and effects on classification performance in dermatology datasets. Preprint at https://doi.org/10.48550/arXiv.1910.13268 (2019).

  362. Kinyanjui, N. M. et al. Fairness of classifiers across skin tones in dermatology. In Int. Conf. Medical Image Computing and Computer-Assisted Intervention 320–329 (2020).

  363. Chew, E. Y. et al. The Age-Related Eye Disease Study 2 (AREDS2): study design and baseline characteristics (AREDS2 report number 1). Ophthalmology 119, 2282–2289 (2012).

  364. Joshi, N. & Burlina, P. AI fairness via domain adaptation. Preprint at https://doi.org/10.48550/arXiv.2104.01109 (2021).

  365. Zhou, Y. et al. RadFusion: benchmarking performance and fairness for multi-modal pulmonary embolism detection from CT and EMR. Preprint at https://doi.org/10.48550/arXiv.2111.11665 (2021).

  366. Edwards, N. J. et al. The CPTAC data portal: a resource for cancer proteomics research. J. Proteome Res. 14, 2707–2713 (2015).

  367. Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).

  368. Boag, W., Suresh, H., Celi, L. A., Szolovits, P. & Ghassemi, M. Racial disparities and mistrust in end-of-life care. In Machine Learning for Healthcare Conf. 587–602 (PMLR, 2018).

  369. Prosper, A. E. et al. Association of inclusion of more black individuals in lung cancer screening with reduced mortality. JAMA Netw. Open 4, e2119629 (2021).

  370. National Lung Screening Trial Research Team. et al. The National Lung Screening Trial: overview and study design. Radiology 258, 243–253 (2011).

  371. Colak, E. et al. The RSNA pulmonary embolism CT dataset. Radiol. Artif. Intell. 3, e200254 (2021).

  372. Gertych, A., Zhang, A., Sayre, J., Pospiech-Kurkowska, S. & Huang, H. Bone age assessment of children using a digital hand atlas. Comput. Med. Imaging Graph. 31, 322–331 (2007).

  373. Jeong, J. J. et al. The EMory BrEast imaging Dataset (EMBED): a racially diverse, granular dataset of 3.4 million screening and diagnostic mammographic images. Radiol. Artif. Intell. 5, e220047 (2023).

  374. Pollard, T. J. et al. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci. Data 5, 1–13 (2018).

  375. Sheikhalishahi, S., Balaraman, V. & Osmani, V. Benchmarking machine learning models on multicentre eICU critical care dataset. PLoS ONE 15, e0235424 (2020).

  376. El Emam, K. et al. De-identification methods for open health data: the case of the heritage health prize claims dataset. J. Med. Internet Res. 14, e33 (2012).

  377. Madras, D., Pitassi, T. & Zemel, R. Predict responsibly: improving fairness and accuracy by learning to defer. In Adv. Neural Information Processing Systems (2018).

  378. Louizos, C., Swersky, K., Li, Y., Welling, M. & Zemel, R. The variational fair autoencoder. In Int. Conf. Learning Representations (2016).

  379. Raff, E. & Sylvester, J. Gradient reversal against discrimination. Preprint at https://doi.org/10.48550/arXiv.1807.00392 (2018).

  380. Smith, J. W., Everhart, J., Dickson, W., Knowler, W. & Johannes, R. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. Symp. Computer Applications in Medical Care 261–265 (1988).

  381. Sharma, S. et al. Data augmentation for discrimination prevention and bias disambiguation. In Proc. AAAI/ACM Conf. AI, Ethics, and Society 358–364 (Association for Computing Machinery, 2020).

  382. International Warfarin Pharmacogenetics Consortium. et al. Estimation of the warfarin dose with clinical and pharmacogenetic data. N. Engl. J. Med. 360, 753–764 (2009).

  383. Kallus, N., Mao, X. & Zhou, A. Assessing algorithmic fairness with unobserved protected class using data combination. In Proc. 2020 Conf. Fairness, Accountability, and Transparency 110 (Association for Computing Machinery, 2020).

  384. Gross, R. T. Infant Health and Development Program (IHDP): Enhancing the Outcomes of Low Birth Weight, Premature Infants in the United States, 1985-1988 (Inter-university Consortium for Political and Social Research, 1993); https://www.icpsr.umich.edu/web/HMCA/studies/9795

  385. Madras, D., Creager, E., Pitassi, T. & Zemel, R. Fairness through causal awareness: learning causal latent-variable models for biased data. In Proc. Conf. Fairness, Accountability, and Transparency 30, 349–358 (Association for Computing Machinery, 2019).

  386. Weeks, M. R., Clair, S., Borgatti, S. P., Radda, K. & Schensul, J. J. Social networks of drug users in high-risk sites: finding the connections. AIDS Behav. 6, 193–206 (2002).

  387. Kleindessner, M., Samadi, S., Awasthi, P. & Morgenstern, J. Guarantees for spectral clustering with fairness constraints. In Int. Conf. Machine Learning 3458–3467 (2019).

  388. Daneshjou, R. et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci. Adv. 8, eabq6147 (2022).

  389. Garg, S., Balakrishnan, S. & Lipton, Z. C. Domain adaptation under open set label shift. In Adv. Neural Information Processing Systems (2022).

  390. Pham, T. H., Zhang, X. & Zhang, P. Fairness and accuracy under domain generalization. In Int. Conf. Learning Representations (2023).

  391. Barocas, S., Hardt, M. & Narayanan, A. Fairness in machine learning. NIPS Tutor 1, 2017 (2017).

  392. Liu, L. T., Simchowitz, M. & Hardt, M. The implicit fairness criterion of unconstrained learning. In Int. Conf. Machine Learning 4051–4060 (PMLR, 2019).

Acknowledgements

All authors are supported in part by the Brigham and Women’s Hospital (BWH) President’s Fund, Mass General Hospital (MGH) Pathology and by the National Institute of General Medical Sciences (NIGMS) R35GM138216 (to F.M.). R.J.C. and S.S. were also supported by the National Science Foundation (NSF) Graduate Fellowship. M.Y.L. was also supported by the Siebel Scholars program. T.Y.C. was also supported by the National Institutes of Health National Cancer Institute (NIH-NCI) Ruth L. Kirschstein National Service Award T32CA251062. The content is solely the responsibility of the authors and does not reflect the official views of the NIH, NIGMS, NCI or NSF.

Author information

Contributions

R.J.C. and F.M. drafted the manuscript. All authors contributed to literature review, conceptualization and editing of the manuscript.

Corresponding author

Correspondence to Faisal Mahmood.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary table.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, R.J., Wang, J.J., Williamson, D.F.K. et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. 7, 719–742 (2023). https://doi.org/10.1038/s41551-023-01056-8

  • DOI: https://doi.org/10.1038/s41551-023-01056-8
