Fig. 2 | Nature Communications

Fig. 2

From: Estimating the success of re-identifications in incomplete datasets using generative models

Fig. 2

The model predicts correct re-identifications with high confidence. a Receiver operating characteristic (ROC) curves for USA populations (light ROC curve for each population and a solid line for the average ROC curve). Our method accurately predicts the (binary) individual uniqueness. (Inset) False-discovery rate (FDR) for individual records classified with ξ > 0.9, ξ > 0.95, and ξ > 0.99. For re-identifications that the model predicts are likely to be correct \(( {\widehat {\xi _{\boldsymbol{x}}} \,> \, 0.95})\), only 5.26% of them are incorrect (FDR). b Our model outperforms by 39% the best theoretically achievable prediction using population uniqueness across every corpus. A red point shows the Brier Score obtained by our model, when trained on a 1% sample. The solid line represents the lowest Brier Score achievable when using the exact population uniqueness while the dashed line represents the Brier Score of a random guess prediction (BS = 1/3)

Back to article page