Fairer AI in ophthalmology via implicit fairness learning for mitigating sexism and ageism

The transformative role of artificial intelligence (AI) in various fields highlights the need for it to be both accurate and fair. Biased medical AI systems pose significant risks to achieving fair and equitable healthcare. Here, we present an implicit fairness learning approach to build a fairer ophthalmology AI (called FairerOPTH) that mitigates sex (biological attribute) and age biases in AI diagnosis of eye diseases. Specifically, FairerOPTH incorporates the causal relationship between fundus features and eye diseases, which is relatively independent of sensitive attributes such as race, sex, and age. On a large and diverse dataset that we collected, we demonstrate that FairerOPTH significantly outperforms several state-of-the-art approaches in both diagnostic accuracy and fairness for 38 eye diseases in ultra-widefield imaging and 16 eye diseases in narrow-angle imaging. This work demonstrates the significant potential of implicit fairness learning in promoting equitable treatment for patients regardless of their sex or age.


FPR and FNR of each disease within each group
The FPR and FNR for diseases with imbalanced data are reported in Supplementary Table 2. Cataract, PM and RP have higher incidence rates as age increases, while Coats and FEVR have higher incidence rates at younger ages. There is an obvious data imbalance for these diseases.
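The per-group FPR and FNR reported in Supplementary Table 2 follow the standard confusion-matrix definitions; a minimal sketch (the counts below are illustrative, not values from the dataset):

```python
# FPR and FNR from confusion-matrix counts for one disease within one group.
def fpr_fnr(tp, fp, tn, fn):
    fpr = fp / (fp + tn)   # false positive rate: healthy eyes flagged as diseased
    fnr = fn / (fn + tp)   # false negative rate: diseased eyes missed by the model
    return fpr, fnr

# Illustrative counts for one disease within one age group (not real data)
example_fpr, example_fnr = fpr_fnr(tp=80, fp=5, tn=900, fn=20)
```

Comparing these two rates across age or sex groups is what reveals whether errors fall disproportionately on one group.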

The adaptability of FairerOPTH to text modality
Other commonly used data types, such as text, can be easily adapted to FairerOPTH. To validate this, additional experiments using text as input to the pathology classification branch were conducted on the OculoScope dataset, denoted as "FairerOPTH (text)". We used a pre-trained BERT (a widely used language model) to extract features from the input text. The experimental results are shown in Supplementary Table 3. Both "FairerOPTH (text)" and "FairerOPTH (image)" achieve better performance than the baseline. In addition, "FairerOPTH (text)" outperforms "FairerOPTH (image)" in screening accuracy and fairness most of the time. These results demonstrate that FairerOPTH can adapt to the text modality and even achieve better performance.

Classification performance for the retinal pathological features
The classification performance for the retinal pathological features on the OculoScope and MixNAF datasets is reported in Supplementary Table 7. Our FairerOPTH achieves 0.951 and 0.948 AUC (Area Under the Curve) on the OculoScope and MixNAF datasets, respectively.
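For reference, the AUC metric used here can be computed as the Mann-Whitney U statistic; a minimal sketch (the score lists are made-up examples, not values from the OculoScope or MixNAF experiments):

```python
# AUC as the probability that a randomly chosen positive sample is scored
# above a randomly chosen negative one (ties count half).
def auc(scores_pos, scores_neg):
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative predicted scores for samples with / without a pathological feature
example_auc = auc([0.9, 0.7, 0.8], [0.2, 0.4, 0.6])
```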

Theoretic analysis
To theoretically show that the additional introduction of fundus features related to fundus diseases can improve the performance of the model, we argue from the perspective of information theory. Shannon entropy [4] is employed to quantify the level of randomness associated with a discrete random variable Y with possible outcomes (y_1, y_2, \cdots, y_C). The disease identification model processes a single fundus image and produces predicted probabilities for the various ophthalmic diseases, denoted as (p_1, p_2, \cdots, p_C), corresponding to the 38 categories represented in the OculoScope dataset and the 16 categories represented in the MixNAF dataset. The entropy H(Y) can be formulated as

H(Y) = -\sum_{i=1}^{C} p(y_i) \log p(y_i),    (1)

where C denotes the total number of categories. Eq. (1) quantifies the overall confidence of the model across all classes. Considering a disease identification model whose input features are denoted by X (fundus images) and whose output class labels are denoted by Y, we introduce additional information F (fundus features), which is closely related to the fundus disease identification task.
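The entropy in Eq. (1) can be computed directly; a minimal sketch in bits (base-2 logarithm), with illustrative probability vectors rather than actual model outputs:

```python
import math

# Shannon entropy H(Y) = -sum_i p(y_i) * log2(p(y_i)); zero-probability
# terms are skipped since p * log p -> 0 as p -> 0.
def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = [0.9, 0.05, 0.03, 0.02]   # low-uncertainty prediction
uniform = [0.25, 0.25, 0.25, 0.25]    # maximal uncertainty over 4 classes
h_confident = shannon_entropy(confident)
h_uniform = shannon_entropy(uniform)  # log2(4) = 2 bits, the maximum for 4 classes
```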
In information theory, the mutual information measures the mutual dependence between two random variables. For X and Y, the mutual information I(X; Y) is defined as

I(X; Y) = H(Y) - H(Y|X),

where H(Y) is the entropy of Y, and H(Y|X) is the conditional entropy of Y given X.
Similarly, the mutual information between X, F, and Y is given by

I(Y; X, F) = H(Y) - H(Y|X, F),

where H(Y|X, F) is the conditional entropy of Y given both X and F.
Then, we aim to prove that, when adding the additional information F closely related to the classification task, the conditional entropy H(Y|X) is no smaller than the conditional entropy H(Y|X, F), i.e., H(Y|X) \geq H(Y|X, F). Specifically, we start by considering the mutual information I(Y; X, F). The definition of conditional entropy is used as follows:

H(Y|X, F) = -\sum_{x, f, y} p(x, f, y) \log p(y|x, f),

where p(y|x, f) represents the probability of Y taking the value y given both X and F.
Next, the definition of conditional entropy is used again:

H(Y|X) = -\sum_{x, y} p(x, y) \log p(y|x),

where p(y|x) represents the probability of Y taking the value y given X. Finally, we compare the difference between I(Y; X, F) and I(Y; X):

I(Y; X, F) - I(Y; X) = H(Y|X) - H(Y|X, F) = I(Y; F|X) \geq 0,

where the final inequality holds because conditional mutual information is always nonnegative. Notably, the term H(Y) is cancelled out in this comparison.
Hence, we have proven, from an information-theoretic perspective, that when incorporating additional information F closely related to the disease identification task, the conditional entropy H(Y|X) is no smaller than the conditional entropy H(Y|X, F), i.e., H(Y|X) \geq H(Y|X, F). This indicates that, given X, the introduction of F reduces the uncertainty in predicting Y, enhancing the model's predictive capacity and ultimately leading to improved performance of the disease identification model (see below).
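The inequality H(Y|X) \geq H(Y|X, F) can be checked numerically on a toy joint distribution. The sketch below uses an XOR construction as an illustrative extreme case (not the paper's data), where F supplies exactly the information about Y that X alone is missing:

```python
import math
from itertools import product

# X and F are fair independent bits and Y = X XOR F, so each (x, f) pair
# determines y uniquely while x alone leaves y fully uncertain.
P = {(x, f, x ^ f): 0.25 for x, f in product((0, 1), repeat=2)}

def H_Y_given_X():
    """H(Y|X) = -sum_{x,y} p(x, y) * log2(p(y|x)), in bits."""
    h = 0.0
    for x in (0, 1):
        px = sum(p for (xi, _, _), p in P.items() if xi == x)
        for y in (0, 1):
            pxy = sum(p for (xi, _, yi), p in P.items() if xi == x and yi == y)
            if pxy > 0:
                h -= pxy * math.log2(pxy / px)
    return h

def H_Y_given_XF():
    """H(Y|X,F) = -sum_{x,f,y} p(x, f, y) * log2(p(y|x,f)), in bits."""
    h = 0.0
    for (x, f, _), p in P.items():
        pxf = sum(q for (xi, fi, _), q in P.items() if xi == x and fi == f)
        if p > 0:
            h -= p * math.log2(p / pxf)
    return h
```

Here H(Y|X) is 1 bit while H(Y|X, F) is 0 bits; in general the gap between the two equals the conditional mutual information I(Y; F|X).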

Generalization analysis
The question of why reductions in uncertainty can lead to improvements in model performance drives us to examine the generalization bound of FairerOPTH, particularly in how it compares to conventional fundus disease identification models. The generalization bound characterizes the model's ability to transfer knowledge from the training data to unseen testing data. In this context, R(h) represents the generalization error (i.e., testing loss), and \hat{R}(h) signifies the empirical error (i.e., training loss) over m independent and identically distributed samples. While the Vapnik-Chervonenkis (VC) dimension [5] traditionally measures a model's capacity for binary classification algorithms, here we adopt the pseudo-dimension d_psd [5] for the disease identification task, as it aptly describes the generalization bound of infinite hypothesis sets [6], offering an appropriate measure of the model's capacity.
Consider a family of functions H associated with the network that incorporates the fundus features, and let L represent the loss function for disease identification. We have a family of bounded loss functions F = {(x, y) \mapsto L(h(x), y) : h \in H} with values in the range [0, K] [6]. Assuming that the pseudo-dimension of F is d_FairerOPTH, for any \delta > 0, with probability at least 1 - \delta, the following inequality holds for all h \in H:

R(h) \leq \hat{R}(h) + K \sqrt{\frac{2 d_{FairerOPTH} \log \frac{e m}{d_{FairerOPTH}}}{m}} + K \sqrt{\frac{\log \frac{1}{\delta}}{2m}},

where e denotes the natural constant. This bound demonstrates that a larger sample size m and a lower pseudo-dimension d_psd ensure better generalization. As discussed above, the input entropy of FairerOPTH is lower than that of the conventional identification model, resulting in a reduced number of possible states or choices within the function set H. Consequently, the smaller function set also contributes to a lower pseudo-dimension [5]. Denoting the pseudo-dimension of the conventional identification model by d_convention, it follows that d_FairerOPTH \leq d_convention. As a result, FairerOPTH achieves a lower generalization bound, indicating its stronger generalizability, which is further corroborated by the experimental results.
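As a rough numerical illustration of how the complexity term in such a pseudo-dimension bound behaves, the sketch below evaluates it for two hypothetical pseudo-dimensions; K, delta, m, and both d_psd values are illustrative assumptions, not quantities from the paper:

```python
import math

# Complexity term of the bound:
#   K * sqrt(2 * d * log(e*m/d) / m) + K * sqrt(log(1/delta) / (2*m))
def complexity_term(d_psd, m, K=1.0, delta=0.05):
    return (K * math.sqrt(2 * d_psd * math.log(math.e * m / d_psd) / m)
            + K * math.sqrt(math.log(1 / delta) / (2 * m)))

m = 10_000
gap_low = complexity_term(d_psd=50, m=m)    # smaller function set (FairerOPTH-like)
gap_high = complexity_term(d_psd=500, m=m)  # larger function set (conventional model)
# A lower pseudo-dimension, or a larger sample size, tightens the bound.
```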

Supplementary Figure 1 |
The overall architecture of the proposed FairerOPTH model. a, FairerOPTH consists of two branches: a pathology classification branch and a disease classification branch. The interaction between these two branches is implemented by a pathology-aware attention module that enhances the fundus feature representations to inform the disease classification branch. Four loss terms together compose the overall loss used to optimize the entire network. b, Detailed network structure of FairerOPTH. We use ResNet-101 as the encoder to extract fundus features. The pathology-aware attention module is implemented with two fully connected layers and one sigmoid function. The implementation of FairerOPTH is simple yet highly effective in mitigating unfairness and improving screening accuracy.
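A minimal sketch of the two-fully-connected-layers-plus-sigmoid attention design described in b, producing per-channel gates that reweight the fundus feature vector for the disease branch. The ReLU between the two layers, the omitted biases, and the toy dimension d = 8 are illustrative assumptions (ResNet-101 features would be 2048-dimensional):

```python
import math
import random

def pathology_aware_attention(features, W1, W2):
    # FC layer 1 (+ assumed ReLU nonlinearity)
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in W1]
    # FC layer 2 + sigmoid -> per-channel gates in (0, 1)
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in W2]
    # channel-wise reweighting of the fundus features
    return [x * g for x, g in zip(features, gates)]

random.seed(0)
d = 8  # toy feature dimension
features = [random.gauss(0, 1) for _ in range(d)]
W1 = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]
W2 = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]
attended = pathology_aware_attention(features, W1, W2)
```

Because the gates lie in (0, 1), each attended channel has magnitude no larger than the original feature channel; the module can suppress channels but never amplify them.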

Supplementary Table 1 | Comparison of our dataset with public fundus image datasets.
The proposed MixNAF and OculoScope datasets have unique advantages in terms of the labelling of fundus features.They contain 20 and 67 types of fundus features, respectively.

Supplementary Table 2 | False positive rate (FPR) and false negative rate (FNR) of FairerOPTH/Baseline for diseases with imbalanced data.
Cataract, PM and RP have higher incidence rates as age increases, while Coats and FEVR have higher incidence rates at younger ages. There is an obvious data imbalance for these diseases.