# Table 2 Three radiologists’ relabeling results for the 100 least valuable, 100 most valuable and 100 randomly sampled chest X-ray images in the training set.

| | 100 least valuable images | 100 most valuable images | 100 random images | *p* value<sup>a</sup> |
|---|---|---|---|---|
| No. originally labeled as pneumonia | 13 | 100 | 5 | 2.95e-51 |
| No. of mislabels<sup>b</sup> | 65 | 22 | 20 | 5.84e-13 |
| No. mislabeled as pneumonia | 13 | 22 | 4 | 0.00078 |
| No. mislabeled as no pneumonia | 52 | 0 | 16 | 2.66e-18 |
We used the majority vote to obtain the final label of each image. Images on which the radiologists disagreed were excluded from further analyses. The 100 least valuable images contained far more mislabeled examples (65) than the 100 most valuable images (22; pairwise *p* = 8.61e-10) or the 100 random images (20; pairwise *p* = 1.22e-10), suggesting that a low Shapley value effectively flags mislabeled images in the dataset.

<sup>a</sup> *p* values were computed using the $\chi^{2}$ test.

<sup>b</sup> Because our training set has a higher percentage of pneumonia labels than the full ChestX-ray14 dataset, these mislabel rates may not be representative of the entire dataset.
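The footnote on *p* values names the $\chi^{2}$ test but not the exact contingency tables. The sketch below assumes (our assumption, not the authors' stated procedure) that each table row was tested as a 2×3 table of "counted" versus "not counted" images with 100 images per group, i.e. that no images were dropped for disagreement. It also assumes that the pairwise comparisons quoted below the table are 2×2 tests without Yates continuity correction. Only the standard library is needed, because for 1 and 2 degrees of freedom the $\chi^{2}$ upper tail has closed forms, $\operatorname{erfc}(\sqrt{x/2})$ and $e^{-x/2}$. The names `chi2_independence`, `chi2_sf` and `p_value` are hypothetical helpers. Under these assumptions the printed values should be close to those reported.

```python
import math


def chi2_independence(table):
    """Pearson chi-squared statistic (no continuity correction) and degrees
    of freedom for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf(stat, dof):
    """Upper-tail probability of the chi-squared distribution.
    Closed forms exist for dof 1 (squared standard normal) and dof 2."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2))
    if dof == 2:
        return math.exp(-stat / 2)
    raise ValueError("closed form implemented only for dof 1 or 2")


def p_value(counts, group_size=100):
    """Hypothetical helper: test whether the share of counted images differs
    across groups, ASSUMING every group holds exactly group_size images."""
    table = [list(counts), [group_size - c for c in counts]]
    stat, dof = chi2_independence(table)
    return chi2_sf(stat, dof)


if __name__ == "__main__":
    # Counts per group: (least valuable, most valuable, random).
    rows = {
        "Originally labeled as pneumonia": (13, 100, 5),
        "Mislabels": (65, 22, 20),
        "Mislabeled as pneumonia": (13, 22, 4),
        "Mislabeled as no pneumonia": (52, 0, 16),
    }
    for name, counts in rows.items():
        print(f"{name}: p = {p_value(counts):.3g}")
    # Pairwise 2x2 comparisons of mislabel counts.
    print(f"least vs most valuable: p = {p_value((65, 22)):.3g}")
    print(f"least valuable vs random: p = {p_value((65, 20)):.3g}")
```

If the 100 images per group assumption does not hold, `group_size` must be replaced by the actual number of images retained in each group after exclusions.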