Fig. 3 | Nature Communications

Fig. 3

From: Estimating the success of re-identifications in incomplete datasets using generative models

Fig. 3

Average individual uniqueness increases fast with the number of collected demographic attributes. a Distribution of predicted individual uniqueness knowing ZIP code, date of birth, and gender (resp. ZIP code, date of birth, gender, and number of children) in blue (resp. orange). The dotted blue line at \(\widehat {\xi _{\boldsymbol{x}}} = 0.580\) (resp. dashed orange line at \(\widehat {\xi _{\boldsymbol{x}}} = 0.997\)) illustrates the predicted individual uniqueness of Gov. Weld knowing the same combination of attributes. (Inset) The correctness κx is solely determined by uniqueness ξx and population size n (here for Massachusetts). We show individual uniqueness and correctness for William Weld with three (in blue) and four (in orange) attributes. b The boxplots (25, 50, and 75th percentiles, 1.5 IQR) show the average uniqueness 〈ξx〉 knowing k demographic attributes, grouped by number of attributes. The individual uniqueness scores ξx are estimated on the complete population in Massachusetts, based on the 5% Public Use Microdata Sample files. While few attributes might not be sufficient for a re-identification to be correct, collecting a few more attributes will quickly render the re-identification very likely to be successful. For instance, 15 demographic attributes would render 99.98% of people in Massachusetts unique. c Uniqueness varies with the specific value of attributes. For instance, a 33-year-old is less unique than a 58-year-old person. We here either (i) randomly replace the value of one baseline attribute (ZIP code, date of birth, or gender) or (ii) add one extra attribute, both by sampling from its marginal distribution, to the uniqueness of a 58-year-old male from Cambridge, MA. The dashed baseline shows his original uniqueness \(\widehat {\xi _{\boldsymbol{x}}} = 0.580\) and the boxplots the distribution of individual uniqueness obtained after randomly replacing or adding one attribute. A complete description of the attributes and method is available in Supplementary Methods

Back to article page