Algorithmic fairness in artificial intelligence for medicine and healthcare

Chen, Richard J.; Wang, Judy J.; Williamson, Drew F. K.; Chen, Tiffany Y.; Lipkova, Jana; Lu, Ming Y.; Sahai, Sharifa; Mahmood, Faisal

doi:10.1038/s41551-023-01056-8

Perspective
Published: 28 June 2023

Nature Biomedical Engineering volume 7, pages 719–742 (2023)

Abstract

In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and result in healthcare disparities. We also review emerging technologies for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.

**Fig. 1: Connecting healthcare disparities and dataset shifts to algorithmic fairness.**

**Fig. 2: Strategies for mitigating disparate impact.**

**Fig. 3: Genetic drift as population shift.**

**Fig. 4: Dataset shifts in the deployment of AI-SaMDs for clinical-grade AI algorithms.**

**Fig. 5: A decentralized framework that integrates federated learning with adversarial learning and disentanglement.**

References

Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy disparities in commercial gender classification. In Conf. on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Article CAS PubMed Google Scholar
Pierson, E., Cutler, D. M., Leskovec, J., Mullainathan, S. & Obermeyer, Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat. Med. 27, 136–140 (2021).
Article CAS PubMed Google Scholar
Hooker, S. Moving beyond ‘algorithmic bias is a data problem’. Patterns 2, 100241 (2021).
Article PubMed PubMed Central Google Scholar
McCradden, M. D., Joshi, S., Mazwi, M. & Anderson, J. A. Ethical limitations of algorithmic fairness solutions in health care machine learning. Lancet Digit. Health 2, e221–e223 (2020).
Article PubMed Google Scholar
Mhasawade, V., Zhao, Y. & Chunara, R. Machine learning and algorithmic fairness in public and population health. Nat. Mach. Intell. 3, 659–666 (2021).
Article Google Scholar
Currie, G. & Hawk, K. E. Ethical and legal challenges of artificial intelligence in nuclear medicine. In Seminars in Nuclear Medicine (Elsevier, 2020).
Chen, I. Y. et al. Ethical machine learning in healthcare. Annu. Rev. Biomed. Data Sci. 4, 123–144 (2020).
Article Google Scholar
Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).
Article CAS PubMed PubMed Central Google Scholar
Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I. Y. & Ghassemi, M. CheXclusion: fairness gaps in deep chest X-ray classifiers. In BIOCOMPUTING 2021: Proc. Pacific Symposium 232–243 (World Scientific, 2020).
Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y. & Ghassemi, M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 27, 2176–2182 (2021).
Article CAS PubMed PubMed Central Google Scholar
Gichoya, J. W. et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit. Health 4, E406–E414 (2022).
Article CAS PubMed PubMed Central Google Scholar
Glocker, B., Jones, C., Bernhardt, M. & Winzeck, S. Algorithmic encoding of protected characteristics in chest X-ray disease detection models. EBioMedicine 89, 104467 (2023).
Article PubMed PubMed Central Google Scholar
Proposed Regulatory Framework for Modifications to Artificial Intelligence. Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) (US FDA, 2019).
Gaube, S. et al. Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit. Med. 4, 31 (2021).
Article PubMed PubMed Central Google Scholar
Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SAMD) Action Plan (US FDA, 2021).
Vyas, D. A. et al. Challenging the use of race in the vaginal birth after cesarean section calculator. Women’s Health Issues 29, 201–204 (2019).
Article PubMed Google Scholar
Vyas, D. A., Eisenstein, L. G. & Jones, D. S. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N. Engl. J. Med. 383, 874–882 (2020).
Article PubMed Google Scholar
van der Burgh, A. C., Hoorn, E. J. & Chaker, L. Removing race from kidney function estimates. JAMA 325, 2018 (2021).
Article PubMed Google Scholar
Diao, J. A. et al. Clinical implications of removing race from estimates of kidney function. JAMA 325, 184–186 (2021). 2021.
Article PubMed Google Scholar
Caton, S. & Haas, C. Fairness in machine learning: a survey. Preprint at https://doi.org/10.48550/arXiv.2010.04053 (2020).
Adler, N. E., Glymour, M. M. & Fielding, J. Addressing social determinants of health and health inequalities. JAMA 316, 1641–1642 (2016).
Article PubMed Google Scholar
Phelan, J. C. & Link, B. G. Is racism a fundamental cause of inequalities in health? Annu. Rev. Socio 41, 311–330 (2015).
Article Google Scholar
Yehia, B. R. et al. Association of race with mortality among patients hospitalized with coronavirus disease 2019 (COVID-19) at 92 US hospitals. JAMA Netw. Open 3, e2018039 (2020).
Article PubMed PubMed Central Google Scholar
Lopez, L., Hart, L. H. & Katz, M. H. Racial and ethnic health disparities related to COVID-19. JAMA 325, 719–720 (2021).
Article CAS PubMed Google Scholar
Bonvicini, K. A. LGBT healthcare disparities: what progress have we made? Patient Educ. Couns. 100, 2357–2361 (2017).
Article PubMed Google Scholar
Yamada, T. et al. Access disparity and health inequality of the elderly: unmet needs and delayed healthcare. Int. J. Environ. Res. Public Health 12, 1745–1772 (2015).
Article PubMed PubMed Central Google Scholar
Moy, E., Dayton, E. & Clancy, C. M. Compiling the evidence: the national healthcare disparities reports. Health Aff. 24, 376–387 (2005).
Article Google Scholar
Balsa, A. I., Seiler, N., McGuire, T. G. & Bloche, M. G. Clinical uncertainty and healthcare disparities. Am. J. Law Med. 29, 203–219 (2003).
Article PubMed Google Scholar
Marmot, M. Social determinants of health inequalities. Lancet 365, 1099–1104 (2005).
Article PubMed Google Scholar
Maness, S. B. et al. Social determinants of health and health disparities: COVID-19 exposures and mortality among African American people in the United States. Public Health Rep. 136, 18–22 (2021).
Article PubMed Google Scholar
Seligman, H. K., Laraia, B. A. & Kushel, M. B. Food insecurity is associated with chronic disease among low-income NHANES participants. J. Nutr. 140, 304–310 (2010).
Article CAS PubMed PubMed Central Google Scholar
Thun, M. J., Apicella, L. F. & Henley, S. J. Smoking vs other risk factors as the cause of smoking attributable deaths: confounding in the courtroom. JAMA 284, 706–712 (2000).
Article CAS PubMed Google Scholar
Tucker, M. J., Berg, C. J., Callaghan, W. M. & Hsia, J. The Black–White disparity in pregnancy-related mortality from 5 conditions: differences in prevalence and case-fatality rates. Am. J. Public Health 97, 247–251 (2007).
Article PubMed PubMed Central Google Scholar
Gadson, A., Akpovi, E. & Mehta, P. K. Exploring the social determinants of racial/ethnic disparities in prenatal care utilization and maternal outcome. In Seminars in Perinatology 41, 308–317 (Elsevier, 2017).
Wallace, M. et al. Maternity care deserts and pregnancy-associated mortality in louisiana. Women’s Health Issues 31, 122–129 (2021).
Article PubMed Google Scholar
Burchard, E. G. et al. The importance of race and ethnic background in biomedical research and clinical practice. N. Engl. J. Med. 348, 1170–1175 (2003).
Article PubMed Google Scholar
Phimister, E. G. Medicine and the racial divide. N. Engl. J. Med. 348, 1081–1082 (2003).
Article PubMed Google Scholar
Bonham, V. L., Green, E. D. & Pérez-Stable, E. J. Examining how race, ethnicity, and ancestry data are used in biomedical research. JAMA 320, 1533–1534 (2018).
Article PubMed PubMed Central Google Scholar
Eneanya, N. D., Yang, W. & Reese, P. P. Reconsidering the consequences of using race to estimate kidney function. JAMA 322, 113–114 (2019).
Article PubMed Google Scholar
Zelnick, L. R., Leca, N., Young, B. & Bansal, N. Association of the estimated glomerular filtration rate with vs without a coefficient for race with time to eligibility for kidney transplant. JAMA Netw. Open 4, e2034004 (2021).
Article PubMed PubMed Central Google Scholar
Chadban, S. J. et al. KDIGO clinical practice guideline on the evaluation and management of candidates for kidney transplantation. Transplantation 104, S11–S103 (2020).
Article PubMed Google Scholar
Wesselman, H. et al. Social determinants of health and race disparities in kidney transplant. Clin. J. Am. Soc. Nephrol. 16, 262–274 (2021).
Article PubMed PubMed Central Google Scholar
Kanis JA, H. N. M. E. & Johansson, H. A brief history of frax. Arch. Osteoporos. 13, 118 (2018).
Article PubMed PubMed Central Google Scholar
Lewiecki, E. M., Wright, N. C. & Singer, A. J. Racial disparities, frax, and the care of patients with osteoporosis. Osteoporos. Int. 31, 2069–2071 (2020).
Article CAS PubMed PubMed Central Google Scholar
Civil Rights Act of 1964. Title VII, Equal Employment Opportunities https://en.wikipedia.org/wiki/Civil_Rights_Act_of_1964 (1964)
Griggs v. Duke Power Co. https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co (1971).
Awad, E. et al. The moral machine experiment. Nature 563, 59–64 (2018).
Article CAS PubMed Google Scholar
Feller, A., Pierson, E., Corbett-Davies, S. & Goel, S. A computer program used for bail and sentencing decisions was labeled biased against blacks. it’s actually not that clear. The Washington Post (17 October 2016).
Dressel, J. & Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4, eaao5580 (2018).
Article PubMed PubMed Central Google Scholar
Char, D. S., Shah, N. H. & Magnus, D. Implementing machine learning in health care—addressing ethical challenges. N. Engl. J. Med. 378, 981–983 (2018).
Article PubMed PubMed Central Google Scholar
Bernhardt, M., Jones, C. & Glocker, B. Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms. Nat. Med. 28, 1157–1158 (2022).
Article CAS PubMed Google Scholar
Mukherjee, P. et al. Confounding factors need to be accounted for in assessing bias by machine learning algorithms. Nat. Med. 28, 1159–1160 (2022).
Article CAS PubMed Google Scholar
Diao, J. A., Powe, N. R. & Manrai, A. K. Race-free equations for eGFR: comparing effects on CKD classification. J. Am. Soc. Nephrol. 32, 1868–1870 (2021).
Article CAS PubMed PubMed Central Google Scholar
Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C. & Venkatasubramanian, S. Certifying and removing disparate impact. In Proc. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 259–268 (2015).
Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Adv. Neural Information Processing Systems (2016).
Corbett-Davies, S. & Goel, S. The measure and mismeasure of fairness: a critical review of fair machine learning. Preprint at https://doi.org/10.48550/arXiv.1808.00023 (2018).
Calders, T., Kamiran, F. & Pechenizkiy, M. Building classifiers with independency constraints. In Int. Conf. Data Mining Workshops 13–18 (IEEE, 2009).
Chen, J., Kallus, N., Mao, X., Svacha, G. & Udell, M. Fairness under unawareness: assessing disparity when protected class is unobserved. In Proc. Conf. Fairness, Accountability, and Transparency 339–348 (2019).
Zliobaite, I., Kamiran, F. & Calders, T. Handling conditional discrimination. In 11th Int. Conf. Data Mining 992–1001 (IEEE, 2011).
Dwork, C., Hardt, M., Pitassi, T., Reingold, O. & Zemel, R. Fairness through awareness. In Proc. 3rd Innovations in Theoretical Computer Science Conf. 214–226 (2012).
Pedreshi, D., Ruggieri, S. & Turini, F. Discrimination-aware data mining. In Proc. 14th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining 560–568 (2008).
Angwin, J., Larson, J., Mattu, S. & Kirchner, L. In Ethics of Data and Analytics 254–264 (Auerbach, 2016).
Kleinberg, J., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conf. (ITCS 2017)
Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).
Article PubMed Google Scholar
Joseph, M., Kearns, M., Morgenstern, J. H. & Roth, A. Fairness in learning: classic and contextual bandits. In Adv. Neural Information Processing Systems (2016).
Celis, L. E. & Keswani, V. Improved adversarial learning for fair classification. Preprint at https://doi.org/10.48550/arXiv.1901.10443 (2019).
Kamiran, F. & Calders, T. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33, 1–33 (2012).
Article Google Scholar
Calmon, F. P., Wei, D., Vinzamuri, B., Ramamurthy, K. N. & Varshney, K. R. Optimized pre-processing for discrimination prevention. In Proc. 31st Int. Conf. Neural Information Processing Systems 3995–4004 (2017).
Krasanakis, E., Spyromitros-Xioufis, E., Papadopoulos, S. & Kompatsiaris, Y. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In Proc. 2018 World Wide Web Conf. 853–862 (2018).
Jiang, H. & Nachum, O. Identifying and correcting label bias in machine learning. In Int. Conf. Artificial Intelligence and Statistics 702–712 (PMLR, 2020).
Chai, X. et al. Unsupervised domain adaptation techniques based on auto-encoder for non-stationary eeg-based emotion recognition. Comput. Biol. Med. 79, 205–214 (2016).
Article PubMed Google Scholar
Kamishima, T., Akaho, S., Asoh, H. & Sakuma, J. Fairness-aware classifier with prejudice remover regularizer. In Joint Eur. Conf. Machine Learning and Knowledge Discovery in Databases 35–50 (Springer, 2012).
Zafar, M. B., Valera, I., Rogriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. In Artificial Intelligence and Statistics 962–970 (PMLR, 2017).
Goel, N., Yaghini, M. & Faltings, B. Non-discriminatory machine learning through convex fairness criteria. In 32nd AAAI Conference on Artificial Intelligence (2018).
Goh, G., Cotter, A., Gupta, M. & Friedlander, M. P. Satisfying real-world goals with dataset constraints. In Adv. Neural Information Processing Systems (2016).
Agarwal, A. et al. A reductions approach to fair classification. In Int. Conf. Machine Learning 60–69 (PMLR, 2018).
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S. & Huq, A. Algorithmic decision making and the cost of fairness. In Proc. 23rd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining 797–806 (2017).
Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J. & Weinberger, K. Q. On fairness and calibration. In Adv. Neural Information Processing Systems (2017).
Chouldechova, A., Benavides-Prado, D., Fialko, O. & Vaithianathan, R. A case study of algorithm assisted decision making in child maltreatment hotline screening decisions. In Conf. Fairness, Accountability and Transparency 134–148 (PMLR, 2018).
Abernethy, J., Awasthi, P., Kleindessner, M., Morgenstern, J. & Zhang, J. Active sampling for min-max fairness. In Int. Conf. Machine Learning 53–65, (PMLR, 2022).
Iosifidis, V. & Ntoutsi, E. Dealing with bias via data augmentation in supervised learning scenarios. In Proc. Int. Workshop on Bias in Information, Algorithms, and Systems (eds. Bates, J. et al.) (2018).
Vodrahalli, K., Li, K. & Malik, J. Are all training examples created equal? An empirical study. Preprint at https://doi.org/10.48550/arXiv.1811.12569 (2018).
Barocas, S. & Selbst, A. D. Big data’s disparate impact. Calif. Law Rev. 104, 671 (2016).
Google Scholar
O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Crown, 2016).
Rezaei, A., Liu, A., Memarrast, O. & Ziebart, B. D. Robust fairness under covariate shift. In Proc. AAAI Conf. Artificial Intelligence 35, 9419–9427 (2021).
Alabi, D., Immorlica, N. & Kalai, A. Unleashing linear optimizers for group-fair learning and optimization. In Conf. Learning Theory 2043–2066 (PMLR, 2018).
Kearns, M., Neel, S., Roth, A. & Wu, Z. S. Preventing fairness gerrymandering: auditing and learning for subgroup fairness. In Int. Conf. Machine Learning 2564–2572 (PMLR, 2018).
Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).
Article PubMed Google Scholar
Babenko, B. et al. Detection of signs of disease in external photographs of the eyes via deep learning. Nat. Biomed. Eng. 6, 1370–1383 (2022).
Article CAS PubMed PubMed Central Google Scholar
Kamishima, T., Akaho, S. & Sakuma, J. Fairness-aware learning through regularization approach. In 2011 IEEE 11th Int. Conf. Data Mining Workshops 643–650 (IEEE, 2011).
Zafar, M. B., Valera, I., Gomez Rodriguez, M. & Gummadi, K. P. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proc. 26th Int. Conf. World Wide Web 1171–1180 (2017).
Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. Learning fair representations. In Int. Conf. Machine Learning 325–333 (PMLR, 2013).
Kim, M., Reingold, O. & Rothblum, G. Fairness through computationally-bounded awareness. In Adv. Neural Information Processing Systems (2018).
Pfohl, S. R., Foryciarz, A. & Shah, N. H. An empirical characterization of fair machine learning for clinical risk prediction. J. Biomed. Inform. 113, 103621 (2021).
Article PubMed Google Scholar
Foryciarz, A., Pfohl, S. R., Patel, B. & Shah, N. Evaluating algorithmic fairness in the presence of clinical guidelines: the case of atherosclerotic cardiovascular disease risk estimation. BMJ Health Care Inf. 29, e100460 (2022).
Article Google Scholar
Muntner, P. et al. Potential US population impact of the 2017 ACC/AHA high blood pressure guideline. Circulation 137, 109–118 (2018).
Article PubMed Google Scholar
Chen, I., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Adv. Neural Information Processing Systems (2018).
Raji, I. D. & Buolamwini, J. Actionable auditing: investigating the impact of publicly naming biased performance results of commercial AI products. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 429–435 (2019).
Rolf, E., Worledge, T., Recht, B. & Jordan, M. I. Representation matters: assessing the importance of subgroup allocations in training data. In Int. Conf. Machine Learning 9040–9051 (2021).
Zhao, H. & Gordon, G. Inherent tradeoffs in learning fair representations. In Adv. Neural InformationProcessing Systems 32, 15675–15685 (2019).
Pfohl, S. et al. Creating fair models of atherosclerotic cardiovascular disease risk. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 271–278 (2019).
Pfohl, S. R. Recommendations for Algorithmic Fairness Assessments of Predictive Models in Healthcare: Evidence from Large-scale Empirical Analyses. PhD thesis, Stanford Univ. (2021).
Singh, H., Singh, R., Mhasawade, V. & Chunara, R. Fairness violations and mitigation under covariate shift. In Proc. 2021 ACM Conf. Fairness, Accountability, and Transparency 3–13 (2021).
Biswas, A. & Mukherjee, S. Ensuring fairness under prior probability shifts. In Proc. 2021 AAAI/ACM Conf. AI, Ethics, and Society 414–424 (2021).
Giguere, S. et al. Fairness guarantees under demographic shift. In Int. Conf. Learning Representations (2021).
Mishler, A. & Dalmasso, N. Fair when trained, unfair when deployed: observable fairness measures are unstable in performative prediction settings. Preprint at https://doi.org/10.48550/arXiv.2202.05049 (2022).
Duchi, J. & Namkoong, H. Learning models with uniform performance via distributionally robust optimization. Ann. Stat. 49, 1378–1406 (2021).
Article Google Scholar
Hashimoto, T., Srivastava, M., Namkoong, H. & Liang, P. Fairness without demographics in repeated loss minimization. In Int. Conf. Machine Learning 1929–1938 (PMLR, 2018).
Wang, S. et al. Robust optimization for fairness with noisy protected groups. In Adv. Neural InformationProcessing Systems 33, 5190–5203 (2020).
Coston, A. et al. Fair transfer learning with missing protected attributes. In Proc. 2019 AAAI/ACM Conf. AI, Ethics, and Society 91–98 (2019).
Schumann, C. et al. Transfer of machine learning fairness across domains. In NeurIPS AI for Social Good Workshop (2019).
Lahoti, P. et al. Fairness without demographics through adversarially reweighted learning. In Adv. Neural Information Processing Systems 33, 728–740 (2020).
Yan, S., Kao, H.-t. & Ferrara, E. Fair class balancing: enhancing model fairness without observing sensitive attributes. In Proc. 29th ACM Int. Conf. Information and Knowledge Management 1715–1724 (2020).
Zhao, T., Dai, E., Shu, K. & Wang, S. Towards fair classifiers without sensitive attributes: exploring biases in related features. In Proc. 15th ACM Int. Conf. Web Search and Data Mining 1433–1442 (2022).
Quinonero-Candela, J., Sugiyama, M., Lawrence, N. D. & Schwaighofer, A. Dataset Shift in Machine Learning (MIT Press, 2009).
Subbaswamy, A., Schulam, P. & Saria, S. Preventing failures due to dataset shift: learning predictive models that transport. In 22nd Int. Conf. Artificial Intelligence and Statistics 3118–3127 (PMLR, 2019).
Subbaswamy, A. & Saria, S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics 21, 345–352 (2020).
PubMed Google Scholar
Guo, L. L. et al. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci. Rep. 12, 2726 (2022).
Article CAS PubMed PubMed Central Google Scholar
Singh, H., Singh, R., Mhasawade, V. & Chunara, R. Fair predictors under distribution shift. In NeurIPS Workshop on Fair ML for Health (2019).
Bernhardt, M., Jones, C. & Glocker, B. Investigating underdiagnosis of ai algorithms in the presence of multiple sources of dataset bias. Nat. Med. 28, 1157–1158 (2022).
Article CAS PubMed Google Scholar
Ghosh, A. & Shanbhag, A. FairCanary: rapid continuous explainable fairness. In Proc. AAAI/ACM Conf. AI, Ethics, and Society (2022).
Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks. In Int. Conf. Learning Representations (2020).
Yang, Y., Zhang, H., Katabi, D. & Ghassemi, M. Change is hard: a closer look at subpopulation shift. In Int. Conf. Machine Learning (2023).
Zong, Y., Yang, Y. & Hospedales, T. MEDFAIR: benchmarking fairness for medical imaging. In Int. Conf. Learning Representations (2023).
Lipkova, J. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat. Med. 28, 575–582 (2022).
Article CAS PubMed PubMed Central Google Scholar
Tedeschi, P. & Griffith, J. R. Classification of hospital patients as ‘surgical’. Implications of the shift to ICD-9-CM. Med. Care 22, 189–192 (1984).
Article CAS PubMed Google Scholar
Heslin, K. C. et al. Trends in opioid-related inpatient stays shifted after the US transitioned to ICD-10-CM diagnosis coding in 2015. Med. Care 55, 918–923 (2017).
Article PubMed Google Scholar
Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal 6, pl1 (2013).
Article PubMed PubMed Central Google Scholar
Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
Article Google Scholar
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Article CAS PubMed PubMed Central Google Scholar
Wen, D. et al. Characteristics of publicly available skin cancer image datasets: a systematic review. Lancet Digit. Health 4, e64–e74 (2021).
Article PubMed Google Scholar
Khan, S. M. et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit. Health 3, e51–e66 (2021).
Article CAS PubMed Google Scholar
Mamary, A. J. et al. Race and gender disparities are evident in COPD underdiagnoses across all severities of measured airflow obstruction. Chronic Obstr. Pulm. Dis. 5, 177 (2018).
PubMed PubMed Central Google Scholar
Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I. Y. & Ghassemi, M. Reply to: ‘potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms’ and ‘confounding factors need to be accounted for in assessing bias by machine learning algorithms’. Nat. Med. 28, 1161–1162 (2022).
Article CAS PubMed Google Scholar
Landry, L. G., Ali, N., Williams, D. R., Rehm, H. L. & Bonham, V. L. Lack of diversity in genomic databases is a barrier to translating precision medicine research into practice. Health Aff. 37, 780–785 (2018).
Article Google Scholar
Gusev, A. et al. Atlas of prostate cancer heritability in European and African-American men pinpoints tissue-specific regulation. Nat. Commun. 7, 10979 (2016).
Article CAS PubMed PubMed Central Google Scholar
Hinch, A. G. et al. The landscape of recombination in African Americans. Nature 476, 170–175 (2011).
Article CAS PubMed PubMed Central Google Scholar
Shriver, M. D. et al. Skin pigmentation, biogeographical ancestry and admixture mapping. Hum. Genet. 112, 387–399 (2003).
Article PubMed Google Scholar
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015).
Article PubMed PubMed Central Google Scholar
Puyol-Anton, E. et al. Fairness in cardiac MR image analysis: an investigation of bias due to data imbalance in deep learning based segmentation. Med. Image Comput. Computer Assist. Intervention 24, 413–423 (2021).
Google Scholar
Kraft, S. A. et al. Beyond consent: building trusting relationships with diverse populations in precision medicine research. Am. J. Bioeth. 18, 3–20 (2018).
Article PubMed PubMed Central Google Scholar
West, K. M., Blacksher, E. & Burke, W. Genomics, health disparities, and missed opportunities for the nation’s research agenda. JAMA 317, 1831–1832 (2017).
Article PubMed PubMed Central Google Scholar
Mahal, B. A. et al. Racial differences in genomic profiling of prostate cancer. N. Engl. J. Med. 383, 1083–1085 (2020).
Article PubMed PubMed Central Google Scholar
Shi, Y. et al. A prospective, molecular epidemiology study of EGFR mutations in asian patients with advanced non–small-cell lung cancer of adenocarcinoma histology (PIONEER). J. Thorac. Oncol. 9, 154–162 (2014).
Article CAS PubMed PubMed Central Google Scholar
Spratt, D. E. et al. Racial/ethnic disparities in genomic sequencing. JAMA Oncol. 2, 1070–1074 (2016).
Article PubMed PubMed Central Google Scholar
Zhang, G. et al. Characterization of frequently mutated cancer genes in chinese breast tumors: a comparison of chinese and TCGA cohorts. Ann. Transl. Med. 7, 179 (2019).
Article PubMed PubMed Central Google Scholar
Zavala, V. A. et al. Cancer health disparities in racial/ethnic minorities in the United States. Br. J. Cancer 124, 315–332 (2020).
Article PubMed PubMed Central Google Scholar
Zhang, W., Edwards, A., Flemington, E. K. & Zhang, K. Racial disparities in patient survival and tumor mutation burden, and the association between tumor mutation burden and cancer incidence rate. Sci. Rep. 7, 13639 (2017).
Article PubMed PubMed Central Google Scholar
Ooi, S. L., Martinez, M. E. & Li, C. I. Disparities in breast cancer characteristics and outcomes by race/ethnicity. Breast Cancer Res. Treat. 127, 729–738 (2011).
Article PubMed Google Scholar
Henderson, B. E., Lee, N. H., Seewaldt, V. & Shen, H. The influence of race and ethnicity on the biology of cancer. Nat. Rev. Cancer 12, 648–653 (2012).
Article CAS PubMed PubMed Central Google Scholar
Gamble, P. et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun. Med. 1, 1–12 (2021).
Article Google Scholar
Borrell, L. N. et al. Race and genetic ancestry in medicine—a time for reckoning with racism. N. Engl. J. Med. 384, 474–480 (2021).
Article PubMed PubMed Central Google Scholar
Martini, R., Newman, L. & Davis, M. Breast cancer disparities in outcomes; unmasking biological determinants associated with racial and genetic diversity. Clin. Exp. Metastasis 39, 7–14 (2022).
Article CAS PubMed Google Scholar
Martini, R. et al. African ancestry–associated gene expression profiles in triple-negative breast cancer underlie altered tumor biology and clinical outcome in women of African descent. Cancer Discov. 12, 2530–2551 (2022).
Article CAS PubMed PubMed Central Google Scholar
Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1–selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).
Article CAS PubMed Google Scholar
Clarke, M. A., Devesa, S. S., Hammer, A. & Wentzensen, N. Racial and ethnic differences in hysterectomy-corrected uterine corpus cancer mortality by stage and histologic subtype. JAMA Oncol. 8, 895–903 (2022).
Article PubMed PubMed Central Google Scholar
Yeyeodu, S. T., Kidd, L. R. & Kimbro, K. S. Protective innate immune variants in racial/ethnic disparities of breast and prostate cancer. Cancer Immunol. Res. 7, 1384–1389 (2019).
Article CAS PubMed PubMed Central Google Scholar
Yang, W. et al. Sex differences in gbm revealed by analysis of patient imaging, transcriptome, and survival data. Sci. Transl. Med. 11, eaao5253 (2019).
Article CAS PubMed PubMed Central Google Scholar
Carrano, A., Juarez, J. J., Incontri, D., Ibarra, A. & Cazares, H. G. Sex-specific differences in glioblastoma. Cells 10, 1783 (2021).
Article PubMed PubMed Central Google Scholar
Creed, J. H. et al. Commercial gene expression tests for prostate cancer prognosis provide paradoxical estimates of race-specific risk. Cancer Epidemiol. Biomark. Prev. 29, 246–253 (2020).
Article CAS Google Scholar
Burlina, P., Joshi, N., Paul, W., Pacheco, K. D. & Bressler, N. M. Addressing artificial intelligence bias in retinal diagnostics. Transl. Vis. Sci. Technol. 10, 13 (2021).
Article PubMed PubMed Central Google Scholar
Kakadekar, A., Greene, D. N., Schmidt, R. L., Khalifa, M. A. & Andrews, A. R. Nonhormone-related histologic findings in postsurgical pathology specimens from transgender persons: a systematic review. Am. J. Clin. Pathol. 157, 337–344 (2022).
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).
Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for fair and efficient machine learning. In Conf. Fairness, Accountability and Transparency (PMLR, 2018).
Lipton, Z., McAuley, J. & Chouldechova, A. Does mitigating ML’s impact disparity require treatment disparity? In Adv. Neural Information Processing Systems (2018).
Madras, D., Creager, E., Pitassi, T. & Zemel, R. Fairness through causal awareness: learning causal latent-variable models for biased data. In Proc. Conf. Fairness, Accountability, and Transparency 349–358 (2019).
Lohaus, M., Kleindessner, M., Kenthapadi, K., Locatello, F. & Russell, C. Are two heads the same as one? Identifying disparate treatment in fair neural networks. In Adv. Neural Information Processing Systems (2022).
McCarty, C. A. et al. The eMERGE network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med. Genet. 4, 1–11 (2011).
Gottesman, O. et al. The electronic medical records and genomics (eMERGE) network: past, present, and future. Genet. Med. 15, 761–771 (2013).
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).
Dehkharghanian, T. et al. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn. Pathol. 18, 1–12 (2023).
Ganin, Y. & Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Int. Conf. Machine Learning 1180–1189 (PMLR, 2015).
Shaban, M. T., Baur, C., Navab, N. & Albarqouni, S. StainGAN: stain style transfer for digital histological images. In 2019 IEEE 16th Int. Symp. Biomedical Imaging (ISBI 2019) 953–956 (IEEE, 2019).
Widmer, G. & Kubat, M. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 69–101 (1996).
Schlimmer, J. C. & Granger, R. H. Incremental learning from noisy data. Mach. Learn. 1, 317–354 (1986).
Lu, J. et al. Learning under concept drift: a review. IEEE Trans. Knowl. Data Eng. 31, 2346–2363 (2018).
Guo, L. L. et al. Systematic review of approaches to preserve machine learning performance in the presence of temporal dataset shift in clinical medicine. Appl. Clin. Inform. 12, 808–815 (2021).
Barocas, S. et al. Designing disaggregated evaluations of AI systems: choices, considerations, and tradeoffs. In Proc. 2021 AAAI/ACM Conf. AI, Ethics, and Society 368–378 (2021).
Zhou, H., Chen, Y. & Lipton, Z. C. Evaluating model performance in medical datasets over time. In Proc. Conf. Health, Inference, and Learning (2023).
Schölkopf, B. et al. On causal and anticausal learning. In Int. Conf. Machine Learning (2012).
Lipton, Z., Wang, Y.-X. & Smola, A. Detecting and correcting for label shift with black box predictors. In Int. Conf. Machine Learning 3122–3130 (PMLR, 2018).
Loupy, A., Mengel, M. & Haas, M. Thirty years of the International Banff Classification for Allograft Pathology: the past, present, and future of kidney transplant diagnostics. Kidney Int. 101, 678–691 (2022).
Delahunt, B. et al. Gleason and Fuhrman no longer make the grade. Histopathology 68, 475–481 (2016).
Davatchi, F. et al. The saga of diagnostic/classification criteria in Behçet’s disease. Int. J. Rheum. Dis. 18, 594–605 (2015).
Louis, D. N. et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 131, 803–820 (2016).
Bifet, A. & Gavaldà, R. Learning from time-changing data with adaptive windowing. In Proc. 2007 SIAM Int. Conf. Data Mining 443–448 (SIAM, 2007).
Nigenda, D. et al. Amazon SageMaker Model Monitor: a system for real-time insights into deployed machine learning models. In Proc. 28th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (2022).
Miroshnikov, A., Kotsiopoulos, K., Franks, R. & Kannan, A. R. Wasserstein-based fairness interpretability framework for machine learning models. Mach. Learn. 111, 3307–3357 (2022).
Board, A. E. AAA statement on race. Am. Anthropol. 100, 712–713 (1998).
Oni-Orisan, A., Mavura, Y., Banda, Y., Thornton, T. A. & Sebro, R. Embracing genetic diversity to improve Black health. N. Engl. J. Med. 384, 1163–1167 (2021).
Calhoun, A. The pathophysiology of racial disparities. N. Engl. J. Med. 384, e78 (2021).
Sun, R. et al. Don’t ignore genetic data from minority populations. Nature 585, 184–186 (2020).
Lannin, D. R. et al. Influence of socioeconomic and cultural factors on racial differences in late-stage presentation of breast cancer. JAMA 279, 1801–1807 (1998).
Bao, M. et al. It’s COMPASlicated: the messy relationship between RAI datasets and algorithmic fairness benchmarks. In 35th Conf. Neural Information Processing Systems Datasets and Benchmarks (2021).
Hao, M. et al. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inf. 16, 6532–6542 (2019).
Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).
Bonawitz, K. et al. Practical secure aggregation for privacy-preserving machine learning. In Proc. 2017 ACM SIGSAC Conf. Computer and Communications Security 1175–1191 (2017).
Bonawitz, K. et al. Towards federated learning at scale: system design. In Proc. Mach. Learn. Syst. 1, 374–388 (2019).
Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018).
Huang, L. et al. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019).
Xu, J. et al. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 5, 1–19 (2021).
Chakroborty, S., Patel, K. R. & Freytag, A. Beyond federated learning: fusion strategies for diabetic retinopathy screening algorithms trained from different device types. Invest. Ophthalmol. Vis. Sci. 62, 85–85 (2021).
Ju, C. et al. Federated transfer learning for EEG signal classification. In 42nd Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society 3040–3045 (IEEE, 2020).
Li, W. et al. Privacy-preserving federated brain tumour segmentation. In Int. Workshop on Machine Learning in Medical Imaging 133–141 (Springer, 2019).
Kaissis, G. et al. End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nat. Mach. Intell. 3, 473–484 (2021).
Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).
Choudhury, O. et al. Differential privacy-enabled federated learning for sensitive health data. In Machine Learning for Health (ML4H) Workshop at NeurIPS (2019).
Kushida, C. A. et al. Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies. Med. Care 50, S82–S101 (2012).
van der Haak, M. et al. Data security and protection in cross-institutional electronic patient records. Int. J. Med. Inform. 70, 117–130 (2003).
Veale, M. & Binns, R. Fairer machine learning in the real world: mitigating discrimination without collecting sensitive data. Big Data Soc. 4, 2053951717743530 (2017).
Fiume, M. et al. Federated discovery and sharing of genomic data using beacons. Nat. Biotechnol. 37, 220–224 (2019).
Sadilek, A. et al. Privacy-first health research with federated learning. NPJ Digit. Med. 4, 132 (2021).
Duan, R., Boland, M. R., Moore, J. H. & Chen, Y. ODAL: a one-shot distributed algorithm to perform logistic regressions on electronic health records data from multiple clinical sites. In BIOCOMPUTING 2019: Proc. Pacific Symposium 30–41 (World Scientific, 2018).
Sarma, K. V. et al. Federated learning improves site performance in multicenter deep learning without data sharing. J. Am. Med. Inform. Assoc. 28, 1259–1264 (2021).
Silva, S. et al. Federated learning in distributed medical databases: meta-analysis of large-scale subcortical brain data. In 2019 IEEE 16th Int. Symp. Biomedical Imaging 270–274 (IEEE, 2019).
Roy, A. G., Siddiqui, S., Pölsterl, S., Navab, N. & Wachinger, C. BrainTorrent: a peer-to-peer environment for decentralized federated learning. Preprint at https://doi.org/10.48550/arXiv.1905.06731 (2019).
Lu, M. Y. et al. Federated learning for computational pathology on gigapixel whole slide images. Med. Image Anal. 76, 102298 (2022).
Dou, Q. et al. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. NPJ Digit. Med. 4, 60 (2021).
Yang, D. et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multinational data from China, Italy, Japan. Med. Image Anal. 70, 101992 (2021).
Vaid, A. et al. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR Med. Inform. 9, e24207 (2021).
Li, S., Cai, T. & Duan, R. Targeting underrepresented populations in precision medicine: a federated transfer learning approach. Preprint at https://doi.org/10.48550/arXiv.2108.12112 (2023).
Mandl, K. D. et al. The genomics research and innovation network: creating an interoperable, federated, genomics learning system. Genet. Med. 22, 371–380 (2020).
Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. 108, 632–655 (2021).
Liang, J., Hu, D. & Feng, J. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In Int. Conf. Machine Learning 6028–6039 (PMLR, 2020).
Song, L., Ma, C., Zhang, G. & Zhang, Y. Privacy-preserving unsupervised domain adaptation in federated setting. IEEE Access 8, 143233–143240 (2020).
Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).
Peterson, D., Kanani, P. & Marathe, V. J. Private federated learning with domain adaptation. In Federated Learning for Data Privacy and Confidentiality Workshop in NeurIPS (2019).
Peng, X., Huang, Z., Zhu, Y. & Saenko, K. Federated adversarial domain adaptation. In Int. Conf. Learning Representations (2020).
Yao, C.-H. et al. Federated multi-target domain adaptation. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 1424–1433 (2022).
Li, T., Sanjabi, M., Beirami, A. & Smith, V. Fair resource allocation in federated learning. In Int. Conf. Learning Representations (2020).
Mohri, M., Sivek, G. & Suresh, A. T. Agnostic federated learning. In Int. Conf. Machine Learning 4615–4625 (PMLR, 2019).
Ezzeldin, Y. H., Yan, S., He, C., Ferrara, E. & Avestimehr, S. FairFed: enabling group fairness in federated learning. In Proc. AAAI Conf. Artificial Intelligence (2023).
Papadaki, A., Martinez, N., Bertran, M., Sapiro, G. & Rodrigues, M. Minimax demographic group fairness in federated learning. In ACM Conf. Fairness, Accountability, and Transparency 142–159 (2022).
Chen, D., Gao, D., Kuang, W., Li, Y. & Ding, B. pFL-Bench: a comprehensive benchmark for personalized federated learning. In 36th Conf. Neural Information Processing Systems Datasets and Benchmarks Track (2022).
Chai, J. & Wang, X. Self-supervised fair representation learning without demographics. In Adv. Neural Information Processing Systems (2022).
Jiang, M. et al. Fair federated medical image segmentation via client contribution estimation. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 16302–16311 (2023).
Jiang, M., Wang, Z. & Dou, Q. HarmoFL: harmonizing local and global drifts in federated learning on heterogeneous medical images. In Proc. AAAI Conf. Artificial Intelligence 1087–1095 (2022).
Xu, Y. Y., Lin, C. S. & Wang, Y. C. F. Bias-eliminating augmentation learning for debiased federated learning. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 20442–20452 (2023).
Zhao, Y. et al. Federated learning with non-IID data. Preprint at https://doi.org/10.48550/arXiv.1806.00582 (2018).
Konečný, J. et al. Federated learning: strategies for improving communication efficiency. Preprint at https://doi.org/10.48550/arXiv.1610.05492 (2016).
Lin, Y., Han, S., Mao, H., Wang, Y. & Dally, W. J. Deep gradient compression: reducing the communication bandwidth for distributed training. In Int. Conf. Learning Representations (2018).
McMahan, B. et al. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics 1273–1282 (PMLR, 2017).
Li, T. et al. Federated optimization in heterogeneous networks. In Proc. Mach. Learn. Syst. 2, 429–450 (2020).
Sattler, F., Wiedemann, S., Müller, K.-R. & Samek, W. Robust and communication-efficient federated learning from non-iid data. IEEE Trans. Neural Netw. Learn. Syst. 31, 3400–3413 (2019).
Abay, A. et al. Mitigating bias in federated learning. Preprint at https://doi.org/10.48550/arXiv.2012.02447 (2020).
Luo, Z., Wang, Y., Wang, Z., Sun, Z. & Tan, T. Disentangled federated learning for tackling attributes skew via invariant aggregation and diversity transferring. In Int. Conf. Machine Learning 14527–14541 (PMLR, 2022).
McNamara, D., Ong, C. S. & Williamson, R. C. Costs and benefits of fair representation learning. In Proc. 2019 AAAI/ACM Conference on AI, Ethics, and Society 263–270 (2019).
Madaio, M. A., Stark, L., Wortman Vaughan, J. & Wallach, H. Co-designing checklists to understand organizational challenges and opportunities around fairness in AI. In Proc. 2020 CHI Conf. Human Factors in Computing Systems (2020).
Jung, K. et al. A framework for making predictive models useful in practice. J. Am. Med. Inform. Assoc. 28, 1149–1158 (2021).
Pogodin, R. et al. Efficient conditionally invariant representation learning. In Int. Conf. Learning Representations (2023).
Louizos, C. et al. Causal effect inference with deep latent-variable models. In Adv. Neural Information Processing Systems (2017).
Shi, C., Blei, D. & Veitch, V. Adapting neural networks for the estimation of treatment effects. In Adv. Neural Information Processing Systems (2019).
Yoon, J., Jordon, J. & Van Der Schaar, M. GANITE: estimation of individualized treatment effects using generative adversarial nets. In Int. Conf. Learning Representations (2018).
Rezaei, A., Fathony, R., Memarrast, O. & Ziebart, B. Fairness for robust log loss classification. In Proc. AAAI Conf. Artificial Intelligence 34, 5511–5518 (2020).
Petrović, A., Nikolić, M., Radovanović, S., Delibašić, B. & Jovanović, M. FAIR: Fair adversarial instance re-weighting. Neurocomputing 476, 14–37 (2020).
Sattigeri, P., Hoffman, S. C., Chenthamarakshan, V. & Varshney, K. R. Fairness GAN: generating datasets with fairness properties using a generative adversarial network. IBM J. Res. Dev. 63, 3:1–3:9 (2019).
Xu, D., Yuan, S., Zhang, L. & Wu, X. FairGAN: fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data 570–575 (IEEE, 2018).
Xu, H., Liu, X., Li, Y., Jain, A. & Tang, J. To be robust or to be fair: towards fairness in adversarial training. In Int. Conf. Machine Learning 11492–11501 (PMLR, 2021).
Wadsworth, C., Vera, F. & Piech, C. Achieving fairness through adversarial learning: an application to recidivism prediction. In FAT/ML Workshop (2018).
Adel, T., Valera, I., Ghahramani, Z. & Weller, A. One-network adversarial fairness. In Proc. AAAI Conf. Artificial Intelligence 33, 2412–2420 (2019).
Madras, D., Creager, E., Pitassi, T. & Zemel, R. Learning adversarially fair and transferable representations. In Proc. 35th Int. Conf. Machine Learning (eds. Dy, J. & Krause, A.) 3384–3393 (PMLR, 2018).
Chen, X., Fain, B., Lyu, L. & Munagala, K. Proportionally fair clustering. In Proc. 36th Int. Conf. Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) 1032–1041 (PMLR, 2019).
Li, P., Zhao, H. & Liu, H. Deep fair clustering for visual learning. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 9070–9079 (2020).
Hong, J. et al. Federated adversarial debiasing for fair and transferable representations. In Proc. 27th ACM SIGKDD Conf. Knowledge Discovery and Data Mining 617–627 (2021).
Qi, T. et al. FairVFL: a fair vertical federated learning framework with contrastive adversarial learning. In Adv. Neural Information Processing Systems (2022).
Chen, Y., Raab, R., Wang, J. & Liu, Y. Fairness transferability subject to bounded distribution shift. In Adv. Neural Information Processing Systems (2022).
An, B., Che, Z., Ding, M. & Huang, F. Transferring fairness under distribution shifts via fair consistency regularization. In Adv. Neural Information Processing Systems (2022).
Giguere, S. et al. Fairness guarantees under demographic shift. In Int. Conf. Learning Representations (2022).
Schrouff, J. et al. Diagnosing failures of fairness transfer across distribution shift in real-world medical settings. In Adv. Neural Information Processing Systems (2022).
Lipkova, J. et al. Personalized radiotherapy design for glioblastoma: integrating mathematical tumor models, multimodal scans, and Bayesian inference. IEEE Trans. Med. Imaging 38, 1875–1884 (2019).
Cen, L. P. et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat. Commun. 12, 4828 (2021).
Lézoray, O., Revenu, M. & Desvignes, M. Graph-based skin lesion segmentation of multispectral dermoscopic images. In IEEE Int. Conf. Image Processing 897–901 (2014).
Manica, A., Prugnolle, F. & Balloux, F. Geography is a better determinant of human genetic differentiation than ethnicity. Hum. Genet. 118, 366–371 (2005).
Hadad, N., Wolf, L. & Shahar, M. A two-step disentanglement method. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 772–780 (2018).
Achille, A. & Soatto, S. Emergence of invariance and disentanglement in deep representations. J. Mach. Learn. Res. 19, 1947–1980 (2018).
Chen, R. T., Li, X., Grosse, R. & Duvenaud, D. Isolating sources of disentanglement in variational autoencoders. In Adv. Neural Information Processing Systems (2018).
Kim, H. & Mnih, A. Disentangling by factorising. In Int. Conf. Machine Learning 2649–2658 (PMLR, 2018).
Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In Int. Conf. Learning Representations (2017).
Sarhan, M. H., Eslami, A., Navab, N. & Albarqouni, S. Learning interpretable disentangled representations using adversarial VAEs. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data 37–44 (Springer, 2019).
Gyawali, P. K. et al. Learning to disentangle inter-subject anatomical variations in electrocardiographic data. In IEEE Trans. Biomedical Engineering (IEEE, 2021).
Bing, S., Fortuin, V. & Rätsch, G. On disentanglement in Gaussian process variational autoencoders. In 4th Symp. Adv. Approximate Bayesian Inference (2021).
Xu, Y., He, H., Shen, T. & Jaakkola, T. S. Controlling directions orthogonal to a classifier. In Int. Conf. Learning Representations (2022).
Cisse, M. & Koyejo, S. Fairness and representation learning. In NeurIPS Invited Talk 2019; https://cs.stanford.edu/~sanmi/documents/Representation_Learning_Fairness_NeurIPS19_Tutorial.pdf (2019).
Creager, E. et al. Flexibly fair representation learning by disentanglement. In Int. Conf. Machine Learning 1436–1445 (PMLR, 2019).
Locatello, F. et al. On the fairness of disentangled representations. In Adv. Neural Information Processing Systems (2019).
Lee, J., Kim, E., Lee, J., Lee, J. & Choo, J. Learning debiased representation via disentangled feature augmentation. In Adv. Neural Information Processing Systems 34, 25123–25133 (2021).
Zhang, Y. K., Wang, Q. W., Zhan, D. C. & Ye, H. J. Learning debiased representations via conditional attribute interpolation. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 7599–7608 (2023).
Tartaglione, E., Barbano, C. A. & Grangetto, M. EnD: entangling and disentangling deep representations for bias correction. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 13508–13517 (2021).
Bercea, C. I., Wiestler, B., Rueckert, D. & Albarqouni, S. FedDis: disentangled federated learning for unsupervised brain pathology segmentation. Preprint at https://doi.org/10.48550/arXiv.2103.03705 (2021).
Ke, J., Shen, Y. & Lu, Y. Style normalization in histology with federated learning. In 2021 IEEE 18th Int. Symp. Biomedical Imaging 953–956 (IEEE, 2021).
Pfohl, S. R., Dai, A. M. & Heller, K. Federated and differentially private learning for electronic health records. In Machine Learning for Health (ML4H) Workshop at NeurIPS (2019).
Xin, B. et al. Private FL-GAN: differential privacy synthetic data generation based on federated learning. In 2020 IEEE Int. Conf. Acoustics, Speech and Signal Processing 2927–2931 (IEEE, 2020).
Rajotte, J.-F. et al. Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary. In Proc. Conf. Information Technology for Social Good 79–84 (2021).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Int. Conf. Machine Learning 1597–1607 (PMLR, 2020).
Shad, R., Cunningham, J. P., Ashley, E. A., Langlotz, C. P. & Hiesinger, W. Designing clinically translatable artificial intelligence systems for high-dimensional medical imaging. Nat. Mach. Intell. 3, 929–935 (2021).
Jacovi, A., Marasovic, A., Miller, T. & Goldberg, Y. Formalizing trust in artificial intelligence: prerequisites, causes and goals of human trust in AI. In Proc. 2021 ACM Conf. Fairness, Accountability, and Transparency 624–635 (2021).
Floridi, L. Establishing the rules for building trustworthy AI. Nat. Mach. Intell. 1, 261–262 (2019).
High-Level Expert Group on Artificial Intelligence. Ethics Guidelines for Trustworthy AI (European Commission, 2019).
Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. In Workshop at Int. Conf. Learning Representations (2014).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE Int. Conf. Computer Vision 618–626 (2017).
Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Proc. AAAI Conf. Artificial Intelligence 33, 590–597 (2019).
Sayres, R. et al. Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology 126, 552–564 (2019).
Patro, B. N., Lunayach, M., Patel, S. & Namboodiri, V. P. U-CAM: visual explanation using uncertainty based class activation maps. In Proc. IEEE/CVF Int. Conf. Computer Vision 7444–7453 (2019).
Grewal, M., Srivastava, M. M., Kumar, P. & Varadarajan, S. RADNET: radiologist level accuracy using deep learning for hemorrhage detection in CT scans. In 2018 IEEE 15th Int. Symp. Biomedical Imaging 281–284 (IEEE, 2018).
Arun, N. T. et al. Assessing the validity of saliency maps for abnormality localization in medical imaging. In Medical Imaging with Deep Learning (2020).
Schlemper, J. et al. Attention-gated networks for improving ultrasound scan plane detection. In Medical Imaging with Deep Learning (2018).
Schlemper, J. et al. Attention gated networks: learning to leverage salient regions in medical images. Med. Image Anal. 53, 197–207 (2019).
Mittelstadt, B., Russell, C. & Wachter, S. Explaining explanations in AI. In Proc. Conf. Fairness, Accountability, and Transparency 279–288 (2019).
Kindermans, P.-J. et al. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 267–280 (Springer, 2019).
Kaur, H. et al. Interpreting interpretability: understanding data scientists’ use of interpretability tools for machine learning. In Proc. 2020 CHI Conf. Human Factors in Computing Systems (2020).
Adebayo, J. et al. Sanity checks for saliency maps. In Adv. Neural Information Processing Systems (2018).
Saporta, A. et al. Benchmarking saliency methods for chest X-ray interpretation. Nat. Mach. Intell. 4, 867–878 (2022).
Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).
DeGrave, A. J., Janizek, J. D. & Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619 (2021).
Adebayo, J., Muelly, M., Liccardi, I. & Kim, B. Debugging tests for model explanations. In Adv. Neural Information Processing Systems 33, 700–712 (2020).
Lee, M. K. & Rich, K. Who is included in human perceptions of AI?: Trust and perceived fairness around healthcare AI and cultural mistrust. In Proc. 2021 CHI Conf. Human Factors in Computing Systems (2021).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Int. Conf. Machine Learning 3319–3328 (PMLR, 2017).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st Int. Conf. Neural Information Processing Systems 4768–4777 (2017).
Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021).
Kim, G. B., Gao, Y., Palsson, B. O. & Lee, S. Y. DeepTFactor: a deep learning-based tool for the prediction of transcription factors. Proc. Natl Acad. Sci. USA 118, e2021171118 (2021).
Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).
Qiu, W. et al. Interpretable machine learning prediction of all-cause mortality. Commun. Med. 2, 125 (2022).
Janizek, J. D. et al. Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023-01034-0 (2023).
Wexler, J., Pushkarna, M., Robinson, S., Bolukbasi, T. & Zaldivar, A. Probing ML models for fairness with the What-If tool and SHAP: hands-on tutorial. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 705 (2020).
Lundberg, S. M. Explaining quantitative measures of fairness. In Fair & Responsible AI Workshop @ CHI2020; https://scottlundberg.com/files/fairness_explanations.pdf (2020).
Cesaro, J. & Cozman, F. G. Measuring unfairness through game-theoretic interpretability. In Machine Learning and Knowledge Discovery in Databases: International Workshops of ECML PKDD (2019).
Meng, C., Trinh, L., Xu, N. & Liu, Y. Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Sci. Rep. 12, 7166 (2022).
Panigutti, C., Perotti, A., Panisson, A., Bajardi, P. & Pedreschi, D. FairLens: auditing black-box clinical decision support systems. Inf. Process. Manag. 58, 102657 (2021).
Röösli, E., Bozkurt, S. & Hernandez-Boussard, T. Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model. Sci. Data 9, 24 (2022).
Pan, W., Cui, S., Bian, J., Zhang, C. & Wang, F. Explaining algorithmic fairness through fairness-aware causal path decomposition. In Proc. 27th ACM SIGKDD Conf. Knowledge Discovery and Data Mining 1287–1297 (2021).
Agarwal, C. et al. OpenXAI: towards a transparent evaluation of model explanations. In Adv. Neural Information Processing Systems 35, 15784–15799 (2022).
Zhang, H., Singh, H., Ghassemi, M. & Joshi, S. “Why did the model fail?”: attributing model performance changes to distribution shifts. In Int. Conf. Machine Learning (2023).
Ghorbani, A. & Zou, J. Data Shapley: equitable valuation of data for machine learning. In Int. Conf. Machine Learning 97, 2242–2251 (2019).
Pandl, K. D., Feiland, F., Thiebes, S. & Sunyaev, A. Trustworthy machine learning for health care: scalable data valuation with the Shapley value. In Proc. Conf. Health, Inference, and Learning 47–57 (2021).
Prakash, E. I., Shrikumar, A. & Kundaje, A. Towards more realistic simulated datasets for benchmarking deep learning models in regulatory genomics. In Machine Learning in Computational Biology 58–77 (2022).
Oktay, O. et al. Attention U-Net: learning where to look for the pancreas. In Medical Imaging with Deep Learning (2018).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In Int. Conf. Learning Representations (2020).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Article PubMed PubMed Central Google Scholar
Cui, Y. et al. Bayes-MIL: a new probabilistic perspective on attention-based multiple instance learning for whole slide images. In Int. Conf. Learning Representations (2023).
Van Gansbeke, W., Vandenhende, S., Georgoulis, S. & Van Gool, L. Unsupervised semantic segmentation by contrasting object mask proposals. In Proc. IEEE/CVF Int. Conf. Computer Vision 10052–10062 (2021).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Int. Conf. Machine Learning 8748–8763 (2021).
Wei, J. et al. Chain of thought prompting elicits reasoning in large language models. In Adv. Neural Information Processing Systems (2022).
Javed, S. A., Juyal, D., Padigela, H., Taylor-Weiner, A. & Yu, L. Additive MIL: intrinsically interpretable multiple instance learning for pathology. In Adv. Neural Information Processing Systems (2022).
Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).
Bhargava, H. K. et al. Computationally derived image signature of stromal morphology is prognostic of prostate cancer recurrence following prostatectomy in African American patients. Clin. Cancer Res. 26, 1915–1923 (2020).
Curtis, J. R. et al. Population-based fracture risk assessment and osteoporosis treatment disparities by race and gender. J. Gen. Intern. Med. 24, 956–962 (2009).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416 (2018).
Foley, R. N., Wang, C. & Collins, A. J. Cardiovascular risk factor profiles and kidney function stage in the US general population: the NHANES III study. In Mayo Clinic Proc. 80, 1270–1277 (Elsevier, 2005).
Nevitt, M., Felson, D. & Lester, G. The osteoarthritis initiative. Protocol for the cohort study 1; https://nda.nih.gov/static/docs/StudyDesignProtocolAndAppendices.pdf (2006).
Vaughn, I. A., Terry, E. L., Bartley, E. J., Schaefer, N. & Fillingim, R. B. Racial-ethnic differences in osteoarthritis pain and disability: a meta-analysis. J. Pain 20, 629–644 (2019).
Rotemberg, V. et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Sci. Data 8, 34 (2021).
Kinyanjui, N. M. et al. Estimating skin tone and effects on classification performance in dermatology datasets. Preprint at https://doi.org/10.48550/arXiv.1910.13268 (2019).
Kinyanjui, N. M. et al. Fairness of classifiers across skin tones in dermatology. In Int. Conf. Medical Image Computing and Computer-Assisted Intervention 320–329 (2020).
Chew, E. Y. et al. The Age-Related Eye Disease Study 2 (AREDS2): study design and baseline characteristics (AREDS2 report number 1). Ophthalmology 119, 2282–2289 (2012).
Joshi, N. & Burlina, P. AI fairness via domain adaptation. Preprint at https://doi.org/10.48550/arXiv.2104.01109 (2021).
Zhou, Y. et al. RadFusion: benchmarking performance and fairness for multi-modal pulmonary embolism detection from CT and EMR. Preprint at https://doi.org/10.48550/arXiv.2111.11665 (2021).
Edwards, N. J. et al. The CPTAC data portal: a resource for cancer proteomics research. J. Proteome Res. 14, 2707–2713 (2015).
Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).
Boag, W., Suresh, H., Celi, L. A., Szolovits, P. & Ghassemi, M. Racial disparities and mistrust in end-of-life care. In Machine Learning for Healthcare Conf. 587–602 (PMLR, 2018).
Prosper, A. E. et al. Association of inclusion of more black individuals in lung cancer screening with reduced mortality. JAMA Netw. Open 4, e2119629 (2021).
National Lung Screening Trial Research Team. et al. The National Lung Screening Trial: overview and study design. Radiology 258, 243–253 (2011).
Colak, E. et al. The RSNA pulmonary embolism CT dataset. Radiol. Artif. Intell. 3, e200254 (2021).
Gertych, A., Zhang, A., Sayre, J., Pospiech-Kurkowska, S. & Huang, H. Bone age assessment of children using a digital hand atlas. Comput. Med. Imaging Graph. 31, 322–331 (2007).
Jeong, J. J. et al. The EMory BrEast imaging Dataset (EMBED): a racially diverse, granular dataset of 3.4 million screening and diagnostic mammographic images. Radiol. Artif. Intell. 5, e220047 (2023).
Pollard, T. J. et al. The eICU collaborative research database, a freely available multi-center database for critical care research. Sci. Data 5, 1–13 (2018).
Sheikhalishahi, S., Balaraman, V. & Osmani, V. Benchmarking machine learning models on multicentre eICU critical care dataset. PLoS ONE 15, e0235424 (2020).
El Emam, K. et al. De-identification methods for open health data: the case of the Heritage Health Prize claims dataset. J. Med. Internet Res. 14, e33 (2012).
Madras, D., Pitassi, T. & Zemel, R. Predict responsibly: improving fairness and accuracy by learning to defer. In Adv. Neural Information Processing Systems (2018).
Louizos, C., Swersky, K., Li, Y., Welling, M. & Zemel, R. The variational fair autoencoder. In Int. Conf. Learning Representations (2016).
Raff, E. & Sylvester, J. Gradient reversal against discrimination. Preprint at https://doi.org/10.48550/arXiv.1807.00392 (2018).
Smith, J. W., Everhart, J., Dickson, W., Knowler, W. & Johannes, R. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. Symp. Computer Applications in Medical Care 261–265 (1988).
Sharma, S. et al. Data augmentation for discrimination prevention and bias disambiguation. In Proc. AAAI/ACM Conf. AI, Ethics, and Society 358–364 (Association for Computing Machinery, 2020).
International Warfarin Pharmacogenetics Consortium. et al. Estimation of the warfarin dose with clinical and pharmacogenetic data. N. Engl. J. Med. 360, 753–764 (2009).
Kallus, N., Mao, X. & Zhou, A. Assessing algorithmic fairness with unobserved protected class using data combination. In Proc. 2020 Conf. Fairness, Accountability, and Transparency 110 (Association for Computing Machinery, 2020).
Gross, R. T. Infant Health and Development Program (IHDP): Enhancing the Outcomes of Low Birth Weight, Premature Infants in the United States, 1985-1988 (Inter-university Consortium for Political and Social Research, 1993); https://www.icpsr.umich.edu/web/HMCA/studies/9795
Madras, D., Creager, E., Pitassi, T. & Zemel, R. Fairness through causal awareness: learning causal latent-variable models for biased data. In Proc. Conf. Fairness, Accountability, and Transparency 30, 349–358 (Association for Computing Machinery, 2019).
Weeks, M. R., Clair, S., Borgatti, S. P., Radda, K. & Schensul, J. J. Social networks of drug users in high-risk sites: finding the connections. AIDS Behav. 6, 193–206 (2002).
Kleindessner, M., Samadi, S., Awasthi, P. & Morgenstern, J. Guarantees for spectral clustering with fairness constraints. In Int. Conf. Machine Learning 3458–3467 (2019).
Daneshjou, R. et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci. Adv. 8, eabq6147 (2022).
Garg, S., Balakrishnan, S. & Lipton, Z. C. Domain adaptation under open set label shift. In Adv. Neural Information Processing Systems (2022).
Pham, T. H., Zhang, X. & Zhang, P. Fairness and accuracy under domain generalization. In Int. Conf. Learning Representations (2023).
Barocas, S., Hardt, M. & Narayanan, A. Fairness in machine learning. NIPS Tutorial (2017).
Liu, L. T., Simchowitz, M. & Hardt, M. The implicit fairness criterion of unconstrained learning. In Int. Conf. Machine Learning 4051–4060 (PMLR, 2019).

Acknowledgements

All authors are supported in part by the Brigham and Women’s Hospital (BWH) President’s Fund, Mass General Hospital (MGH) Pathology and by National Institute of General Medical Sciences (NIGMS) R35GM138216 (to F.M.). R.J.C. and S.S. were also supported by the National Science Foundation (NSF) Graduate Fellowship. M.Y.L. was also supported by the Siebel Scholars program. T.Y.C. was also supported by the National Institutes of Health National Cancer Institute (NIH-NCI) Ruth L. Kirschstein National Service Award T32CA251062. The content is solely the responsibility of the authors and does not reflect the official views of the NIH, NIGMS, NCI or NSF.

Author information

Authors and Affiliations

Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Richard J. Chen, Judy J. Wang, Drew F. K. Williamson, Tiffany Y. Chen, Jana Lipkova, Ming Y. Lu, Sharifa Sahai & Faisal Mahmood
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Richard J. Chen, Jana Lipkova & Sharifa Sahai
Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
Richard J. Chen, Drew F. K. Williamson, Tiffany Y. Chen, Jana Lipkova, Ming Y. Lu, Sharifa Sahai & Faisal Mahmood
Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Richard J. Chen, Ming Y. Lu & Faisal Mahmood
Boston University School of Medicine, Boston, MA, USA
Judy J. Wang
Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Ming Y. Lu
Department of Systems Biology, Harvard Medical School, Boston, MA, USA
Sharifa Sahai
Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Faisal Mahmood
Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
Faisal Mahmood

Authors

Richard J. Chen
Judy J. Wang
Drew F. K. Williamson
Tiffany Y. Chen
Jana Lipkova
Ming Y. Lu
Sharifa Sahai
Faisal Mahmood

Contributions

R.J.C. and F.M. drafted the manuscript. All authors contributed to literature review, conceptualization and editing of the manuscript.

Corresponding author

Correspondence to Faisal Mahmood.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary table.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, R.J., Wang, J.J., Williamson, D.F.K. et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. 7, 719–742 (2023). https://doi.org/10.1038/s41551-023-01056-8

Received: 01 October 2021
Accepted: 13 April 2023
Published: 28 June 2023
Issue Date: June 2023
DOI: https://doi.org/10.1038/s41551-023-01056-8