Table 1 Mean absolute error (mean ± s.d.) when estimating population uniqueness (100 trials per population)

From: Estimating the success of re-identifications in incomplete datasets using generative models

                  MERNIS          USA             ADULT           HDV             MIDUS
Corpus
  n               8,820,049       3,061,692       32,561          8,403           7,108
  c               10              40              50              50              60
  [min Ξ, max Ξ]  [0.087, 0.844]  [0.000, 0.961]  [0.000, 0.794]  [0.002, 0.941]  [0.052, 0.944]
Sampling fraction
  100%            0.029 ± 0.019   0.028 ± 0.026   0.018 ± 0.016   0.006 ± 0.009   0.018 ± 0.014
  10%             0.030 ± 0.019   0.028 ± 0.016   0.022 ± 0.020   0.011 ± 0.009   0.035 ± 0.044
  5%              0.029 ± 0.019   0.027 ± 0.016   0.027 ± 0.023   0.015 ± 0.012   0.037 ± 0.055
  1%              0.029 ± 0.019   0.029 ± 0.015   0.027 ± 0.014   0.045 ± 0.050   0.055 ± 0.079
  0.5%            0.028 ± 0.019   0.029 ± 0.015   0.048 ± 0.039
  0.1%            0.026 ± 0.017   0.058 ± 0.037
Our model correctly estimates population uniqueness even when only a small to very small fraction of the population is available. n denotes the population size and c the corpus size (the total number of populations considered per corpus). We do not estimate population uniqueness when the sampled dataset contains fewer than 50 records (blank cells).
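The 50-record rule in the note can be checked against the population sizes in the table. The sketch below is illustrative only: it assumes the sample size is the population size multiplied by the sampling fraction and rounded down, which the table does not state, and the function names are hypothetical. Under that assumption, the skipped combinations match the blank cells.

```python
# Population sizes n, taken from Table 1.
POPULATION_SIZES = {
    "MERNIS": 8_820_049,
    "USA": 3_061_692,
    "ADULT": 32_561,
    "HDV": 8_403,
    "MIDUS": 7_108,
}

# Sampling fractions listed in Table 1 (100% down to 0.1%).
SAMPLING_FRACTIONS = [1.0, 0.10, 0.05, 0.01, 0.005, 0.001]

# Threshold stated in the table note.
MIN_RECORDS = 50


def expected_sample_size(n: int, fraction: float) -> int:
    """Assumed sample size: n * fraction, rounded down (an assumption)."""
    return int(n * fraction)


def is_estimated(n: int, fraction: float) -> bool:
    """True if the sample holds at least MIN_RECORDS records."""
    return expected_sample_size(n, fraction) >= MIN_RECORDS


def skipped_cells() -> list[tuple[str, float]]:
    """(dataset, fraction) pairs that fall below the threshold."""
    return [
        (name, fraction)
        for fraction in SAMPLING_FRACTIONS
        for name, n in POPULATION_SIZES.items()
        if not is_estimated(n, fraction)
    ]


if __name__ == "__main__":
    for name, fraction in skipped_cells():
        size = expected_sample_size(POPULATION_SIZES[name], fraction)
        print(f"{name} at {fraction:.1%}: about {size} records, not estimated")
```

For example, HDV at 0.5% gives about 42 records and ADULT at 0.1% about 32, both below the threshold, while ADULT at 0.5% gives about 162 and is reported.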