Ensuring generalized fairness in batch classification

In this paper, we consider the problem of batch classification and propose a novel framework for achieving fairness in such settings. Batch classification involves the selection of a set of individuals, as encountered in real-world scenarios such as job recruitment and college admissions. This is in contrast to a typical classification problem, where each candidate in the test set is considered separately and independently. In such scenarios, achieving the same acceptance rate (i.e., probability of the classifier assigning the positive class) for each group (membership determined by the value of sensitive attributes such as gender or race) is often not desirable, and the regulatory body specifies a different acceptance rate for each group. Existing fairness-enhancing methods do not allow for such specifications and are hence unsuited to these scenarios. In this paper, we define a configuration model whereby the acceptance rate of each group can be regulated, and further introduce a novel batch-wise fairness post-processing framework that uses the classifier's confidence scores. We deploy our framework across four real-world datasets and two popular notions of fairness, namely demographic parity and equalized odds. In addition to consistent performance improvements over the competing baselines, the proposed framework allows flexibility and a significant speed-up. It can also seamlessly incorporate multiple overlapping sensitive attributes. To further demonstrate the generalizability of our framework, we deploy it on the problem of fair gerrymandering, where it achieves a better fairness-accuracy trade-off than the existing baseline method.
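To make the batch-wise post-processing idea concrete, the following is a minimal illustrative sketch (not the paper's actual LPCA formulation): for each group, it accepts the highest-confidence candidates until that group's regulator-specified acceptance rate is met. The function name `batch_select` and the rounding rule for the per-group quota are our own assumptions for illustration.

```python
import numpy as np

def batch_select(scores, groups, rates):
    """Illustrative batch-wise post-processing (hypothetical helper):
    within each group, accept the highest-confidence candidates until
    the group's specified acceptance rate is reached.

    scores : classifier confidence scores, P(y = 1), one per individual
    groups : group label of each individual (sensitive attribute value)
    rates  : dict mapping group label -> target acceptance rate
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    accept = np.zeros(len(scores), dtype=bool)
    for g, r in rates.items():
        idx = np.flatnonzero(groups == g)          # members of group g
        k = int(round(r * len(idx)))               # quota for group g
        top = idx[np.argsort(-scores[idx])[:k]]    # k highest-scoring members
        accept[top] = True
    return accept
```

Setting the same rate for every group recovers demographic parity on the batch; the configuration model above corresponds to choosing a (possibly different) rate per group.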

In this section, we provide the details of Tables 3, 4 and 7 presented in the main paper. In addition, we provide Table 20, which describes the applicability of the baselines to various sensitive attributes.

Details of Table 3
In Table 3, we presented the differences in accuracy, precision and recall. In the following Table 9, we present the detailed results of all the baselines on the chosen configurations for the three datasets. More specifically, for each dataset we present the average weighted precision (of the higher-acceptance class), weighted recall (of the lower-acceptance class) and overall accuracy over the 4 configurations of all the baselines. Thus, by combining the results of Table 9 with those of Table 3 in the main paper, the values obtained by our algorithm LPCA on these configurations can be retrieved. Next, we present the configurations of the different baselines that were used to obtain the results of Table 3 in the main paper. For a concise and tidy presentation, we only report the configurations of the baselines of Zafar et al. and Agarwal et al. for the three datasets and omit the details of the other algorithms. These configurations are described in Tables 12, 13, 14 and 15. The accuracy, precision and recall of these configurations are reported in Table 10 and Table 11 for the four datasets.

Details of Table 4
Here we first present, in Table 16, the average DEO_M and accuracy of the various baselines over the chosen configurations as used in Table 4. Then, analogous to Tables 10 and 11, we present the DEO_M and accuracy of each individual configuration of Zafar and Agarwal in Tables 17 and 18 for the four datasets.

Details of Table 7
In Table 19, we present the DDP_M, DEO_M and accuracy of the individual configurations whose averages are presented in Table 7.
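For reference, the fairness gaps reported throughout these tables can be computed as follows. This is a sketch under our own assumptions about the metric definitions: we take DDP_M to be the maximum pairwise gap in acceptance rate across groups, and DEO_M the maximum pairwise gap in true-positive or false-positive rate across groups; the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def ddp_max(y_pred, groups):
    """Maximum pairwise difference in acceptance rate across groups
    (assumed form of the demographic-parity gap DDP_M)."""
    rates = [np.mean(y_pred[groups == g]) for g in np.unique(groups)]
    return max(abs(a - b) for a, b in combinations(rates, 2))

def deo_max(y_true, y_pred, groups):
    """Maximum pairwise gap in TPR or FPR across groups
    (assumed form of the equalized-odds gap DEO_M)."""
    gaps = []
    for y in (1, 0):  # y = 1 compares TPRs; y = 0 compares FPRs
        rs = [np.mean(y_pred[(groups == g) & (y_true == y)])
              for g in np.unique(groups)]
        gaps += [abs(a - b) for a, b in combinations(rs, 2)]
    return max(gaps)
```

Under these definitions, a trivial classifier (all 1's or all 0's) yields identical rates in every group, which is why such runs appear with DEO_M = 0 in Table 16.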

Table 10.
The accuracy, precision and recall of the individual configurations whose averages are presented in Table 9, for the algorithms of Zafar and Agarwal on the Adult (first table) and Bank (second table) datasets.

Table 11.
The accuracy, precision and recall of the individual configurations whose averages are presented in Table 9, for the algorithms of Zafar and Agarwal on the German (first table) and COMPAS (second table) datasets.

Table 12.
The configurations for the Adult dataset generated from Zafar et al. and Agarwal et al. for comparison with LPCA. The DDP_M of the configurations is in descending order from top to bottom for each algorithm.

Table 13.
The configurations for the Bank dataset generated from Zafar et al. and Agarwal et al. for comparison with LPCA. The DDP_M of the configurations is in descending order from top to bottom for each algorithm.

Table 14.
The configurations for the COMPAS dataset generated from Zafar et al. and Agarwal et al. for comparison with LPCA. The DDP_M of the configurations is in descending order from top to bottom for each algorithm.

Table 15.
The configurations for the German dataset generated from Zafar et al. and Agarwal et al. for comparison with LPCA. The DDP_M of the configurations is in descending order from top to bottom for each algorithm.

Table 16.
Average DEO_M and accuracy of the various baselines over the chosen configurations. NA entries refer to scenarios in which the baseline outputs a trivial classification (all 1's or all 0's), which results in DEO_M = 0. To be read together with Table 4.

Table 17.
The accuracy and DEO_M of the individual configurations whose averages are presented in Table 16, for the algorithms of Zafar and Agarwal on the Adult (first table) and Bank (second table) datasets.

Table 18.
The accuracy and DEO_M of the individual configurations whose averages are presented in Table 16, for the algorithms of Zafar and Agarwal on the German (first table) and COMPAS (second table) datasets.

Table 19.
The DDP_M, DEO_M and accuracy of the individual configurations whose averages are presented in Table 7, for the algorithms of Yang and LPCEO on the Adult (first table) and Bank (second table) datasets. Since the same configurations are used to compute the DDP_M and accuracy of both algorithms, the DDP_M values are the same for both for the corresponding configuration.