Temporal and cultural limits of privacy in smartphone app usage

Large-scale collection of human behavioural data by companies raises serious privacy concerns. We show that behaviour captured in the form of application usage data collected from smartphones is highly unique, even in large datasets encompassing millions of individuals. This makes behaviour-based re-identification of users across datasets possible. We study 12 months of data from 3.5 million people from 33 countries and show that, although four apps are enough to uniquely re-identify 91.2% of individuals using a simple strategy based on public information, there are considerable seasonal and cultural variations in re-identification rates. We find that people have more unique app-fingerprints during summer months, making it easier to re-identify them. Further, we find significant variations in uniqueness across countries, and reveal that American users are the easiest to re-identify, while Finns have the least unique app-fingerprints. We show that differences across countries can largely be explained by two characteristics of the country-specific app-ecosystems: the popularity distribution and the size of app-fingerprints. Our work highlights problems with current policies intended to protect user privacy and emphasizes that policies cannot directly be ported between countries. We anticipate this will nuance the discussion around re-identifiability in digital datasets and improve digital privacy.

Figure: Fraction of apps per category. Apps are divided into popular Google Play categories, and the figure shows the fraction of apps that belong to each category over time.

Figure: Rescaled uniqueness curves. The curves in Figure 3 (main text) are rescaled. For the random attack scheme we rescale according to ũ(t) = u(t) · |A|_t/|A|_{t=0}, where u(t) is the uniqueness at month t and |A|_t is the number of apps at time t, with t = 0 denoting the first month of the dataset, February 2016. For the popularity scheme, the curves in the main figure are rescaled according to the probability of picking an app with 100 users or fewer, P_t: ũ(t) = u(t) · P_t/P_{t=0}.

Figure S6: Differences between months are statistically significant. We compared the distribution of uniqueness between any pair of months across 20 bootstrapping samples using a Welch's t-test with Bonferroni correction for multiple comparisons. The hypothesis that the value of uniqueness is the same for two different months is rejected, with p < 0.05, for 78% of the month pairs under the random attack scheme and for 89% of the cases under the popularity attack scheme. Results are shown for n = 5 apps, for the random (left) and popularity (right) attack schemes.
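The rescaling of the seasonal curves can be sketched as follows. All monthly values below are illustrative placeholders (not the paper's measurements), and t = 0 corresponds to February 2016:

```python
import numpy as np

# Illustrative monthly values (t = 0 is February 2016); NOT the paper's data.
u = np.array([0.32, 0.33, 0.35, 0.36, 0.38, 0.37])            # uniqueness u(t)
n_apps_pool = np.array([1.00, 1.02, 1.05, 1.07, 1.10, 1.12])  # |A|_t, app-pool size (millions)
p_rare = np.array([0.21, 0.22, 0.24, 0.25, 0.27, 0.26])       # P_t, prob. of an app with <= 100 users

# Random scheme: rescale by the app-pool size relative to the first month.
u_random_rescaled = u * n_apps_pool / n_apps_pool[0]
# Popularity scheme: rescale by the rare-app probability relative to the first month.
u_pop_rescaled = u * p_rare / p_rare[0]
```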

S2 Extrapolation to larger population

Subsampling the dataset
To quantify the relation between sample size and uniqueness, we subsample the dataset by selecting a fraction of the original dataset. For each sample s_i we estimate uniqueness using the methodology described above. To account for selection bias we estimate uniqueness as the average over multiple realizations at each sample size. We use 20 realizations for sample sizes between 100,000 and 500,000, 10 realizations for sample sizes between 600,000 and 900,000, and 5 realizations for sample sizes above 1,000,000 individuals.
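A minimal sketch of this subsampling procedure, assuming the data is available as a mapping from user to installed apps (the function names and data layout are hypothetical; the paper's unicity computation may differ in detail):

```python
import random
from collections import Counter

def estimate_uniqueness(user_apps, n_apps=5, seed=0):
    """Fraction of users whose random n-app fingerprint is unique in the sample."""
    rng = random.Random(seed)
    fingerprints = []
    for apps in user_apps.values():
        if len(apps) < n_apps:
            continue  # users with too few apps cannot supply a fingerprint
        fingerprints.append(frozenset(rng.sample(sorted(apps), n_apps)))
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    return sum(1 for fp in fingerprints if counts[fp] == 1) / len(fingerprints)

def subsample_uniqueness(user_apps, sample_size, n_apps=5, realizations=20, seed=0):
    """Average uniqueness over several random subsamples of a given size."""
    rng = random.Random(seed)
    users = list(user_apps)
    estimates = [
        estimate_uniqueness({u: user_apps[u] for u in rng.sample(users, sample_size)},
                            n_apps, seed=r)
        for r in range(realizations)
    ]
    return sum(estimates) / len(estimates)
```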

Hiding in the crowd
Our dataset is limited to 3.5 million users, similar in size to a small country, but how will uniqueness change as more users are added (increased sample size)? Will it become possible to hide in the crowd? More precisely, how does the population size affect the extent to which a specific app-fingerprint remains unique? That is, as more and more users are added to our sample, does the likelihood of observing multiple individuals with identical fingerprints also increase? This corresponds to an inverse k-anonymity problem, where one needs to estimate the number of users that should be added in order to increase the overall anonymity of the dataset (bearing in mind that overall anonymity is not a good measure for the sensitivity of individual traces). To understand the effect of sample size on unicity, we first slice our dataset into smaller subsamples and use these to estimate uniqueness for sample sizes ranging from 100,000 to 3.5 million individuals. Figure 4A reveals that sample size has a large effect on the re-identification rate when selecting apps using the random heuristic. Considering n_apps = 5, the average re-identification rate decreases from 45.89% for a sample size of 1 million individuals to 37.33% for 2 million individuals and 32.09% for the full sample of 3.5 million people. The popularity attack scheme is considerably less affected (Figure S9). For n_apps = 5 we find re-identification rates of 96.60%, 94.23%, and 92.72% for sample sizes of 1, 2, and 3.5 million individuals, respectively. As such, increasing the sample size by 250% (from 1 to 3.5 million individuals) only reduces uniqueness by approximately 4 percentage points. To estimate uniqueness for sample sizes larger than the study population, we extrapolate the results of Figure S7 for n_apps = 5.
We express the uniqueness of fingerprints using multiple functional forms, including power laws (∼ x^γ), exponentials (∼ exp(γx)), stretched exponentials (∼ exp(x^γ)), and linear functions (∼ x), where x denotes the sample size and γ is a scaling factor. The stretched exponential and the power law show the highest agreement with the data (Figure S9) and roughly suggest that 5 apps are enough to re-identify 75%-80% of individuals in samples 10 times larger (35 million individuals). Although the extrapolations carry high uncertainty, they illustrate that increasing the population size does not allow individuals to hide in the crowd (that is, uniqueness is not a characteristic of small sample sizes).
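The fitting and extrapolation step can be sketched with `scipy.optimize.curve_fit`. The uniqueness values and sample sizes below are illustrative placeholders, not the paper's measurements, and the stretched exponential is written in the generalized form a·exp(b·x^γ):

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative uniqueness estimates for n_apps = 5 (popularity scheme);
# sample sizes are in millions of users. These are NOT the paper's values.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
u = np.array([0.975, 0.966, 0.955, 0.942, 0.936, 0.931, 0.927])

# Candidate functional forms and starting parameters for the optimizer.
forms = {
    "power law":   (lambda x, a, g: a * x**g,                [1.0, -0.02]),
    "exponential": (lambda x, a, g: a * np.exp(g * x),       [1.0, -0.02]),
    "stretched":   (lambda x, a, b, g: a * np.exp(b * x**g), [1.0, -0.05, 0.5]),
    "linear":      (lambda x, a, b: a + b * x,               [1.0, -0.02]),
}

fits = {}
for name, (f, p0) in forms.items():
    params, _ = curve_fit(f, x, u, p0=p0, maxfev=10000)
    sse = float(np.sum((u - f(x, *params)) ** 2))  # sum of squared residuals
    fits[name] = (f, params, sse)

# Extrapolate each fitted form to a 10x larger sample (35 million users).
for name, (f, params, sse) in sorted(fits.items(), key=lambda kv: kv[1][2]):
    print(f"{name}: SSE={sse:.2e}, u(35M) ~ {f(35.0, *params):.3f}")
```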
Our data sample is not necessarily representative of the general population: not everybody owns a smartphone, and not everybody downloads custom apps which collect and sell people's data to third parties. More research is needed on this topic; however, our results show that up to 93% (an upper bound) of people from a population of 30 million might be re-identifiable from 5 apps.

Figure S8: Extrapolated uniqueness. Fit of the different functional forms (see Table S1) to the uniqueness curve for n_apps = 5 when selecting apps using the popularity heuristic. The closest agreement with the data is achieved by the stretched exponential and power-law forms.

S3 Country differences
The results presented in the main text on differences between countries are robust with respect to the month of the year and the sample size chosen. We find that the number of unique individuals, based on a fingerprint of n = 5 random apps, is substantially stable across 2016 (see Fig. S9). The fit coefficient β characterizing the power-law distribution of app popularity, P(p) ∼ p^{-β}, is also stable over time (see Fig. S14). We find that the differences between countries are substantially unchanged for sample sizes of 10,000 (see Fig. S10), 50,000 (see Fig. S10), and 20,000 (see main text Fig. 4).

Figure S11: Differences between countries are statistically significant. We compared the distribution of uniqueness between any pair of countries across 60 bootstrapping samples of 20,000 users using a Welch's t-test with Bonferroni correction for multiple comparisons. The hypothesis that two countries have the same uniqueness is rejected, with p < 0.05, for 94% of the pairs under the random attack scheme and for 92% of the cases under the popularity attack scheme. Results are shown for the random attack scheme (left, n = 5 apps) and the popularity attack scheme (right, n = 2 apps).

Figure S12: Country differences when sampling apps using the popularity strategy. Countries in the legend are sorted by the unicity computed for n = 1.
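The significance test used for the pairwise comparisons (Welch's t-tests with a Bonferroni correction) can be sketched as follows; the grouping of bootstrap uniqueness estimates by country or month is assumed:

```python
from itertools import combinations

from scipy.stats import ttest_ind

def pairwise_welch_bonferroni(samples, alpha=0.05):
    """Pairwise Welch's t-tests with Bonferroni correction.

    samples maps a group label (a month or a country) to a list of bootstrap
    uniqueness estimates. Returns the fraction of group pairs for which the
    equal-means hypothesis is rejected at the corrected significance level.
    """
    pairs = list(combinations(sorted(samples), 2))
    corrected_alpha = alpha / len(pairs)  # Bonferroni: divide alpha by number of tests
    rejected = 0
    for a, b in pairs:
        # equal_var=False selects Welch's t-test (unequal variances allowed)
        _, p_value = ttest_ind(samples[a], samples[b], equal_var=False)
        if p_value < corrected_alpha:
            rejected += 1
    return rejected / len(pairs)
```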

S3.1 Model of cultural differences
To model dependencies between country unicity and app-ecosystem and control variables we use a linear model. To explore monotonic dependencies which are not necessarily linear, we rank-transform all variables, such that rank 1 is the highest possible value of each variable. Initially we considered a larger set of variables relating to the app ecosystem, such as the number of apps, the number of users, the ratio of apps to users, the median number of apps, the slope of the app-popularity distribution, and the fraction of users in the population. As control variables we included population size, GDP per capita (PPP), internet penetration, and the Gini index of the wealth distribution. However, due to high collinearity between variables we reduced the variable set using a variance inflation factor (VIF) analysis. Fig. S15 shows the correlation plot for the variables in the final model; variance inflation factors are all below 10. The model is defined as U = β_0 + βX + ε, where X is a matrix of the rank-transformed variables, β_0 is the intercept, and ε denotes the residual. We build one model per attack strategy. Fig. S16 shows the normal Q-Q plot and the histogram of the residuals estimated for the random strategy, and Table S2 shows the coefficient estimates. For the popularity-scheme model, Fig. S17 and Table S3 show the residuals and the coefficient estimates. Fig. S18 shows model fits for other values of n_apps.

Figure: Negative correlation between the fit coefficient and uniqueness. The number of unique individuals, considering a fingerprint of n = 1 (A), n = 3 (B), n = 5 (C), and n = 7 (D) apps, versus the fit coefficient β characterizing the power-law distribution P(p) ∼ p^{-β} of app popularity, p. Each dot is a different country.
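A minimal sketch of the rank transformation, the VIF screening, and the OLS fit described above, using plain NumPy (variable names are illustrative; ties in the rank transform are not handled, which suffices when country-level values are distinct):

```python
import numpy as np

def rank_transform(X):
    """Rank each column so that rank 1 is the highest value (no tie handling)."""
    order = np.argsort(-X, axis=0)
    ranks = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        ranks[order[:, j], j] = np.arange(1, X.shape[0] + 1)
    return ranks

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2) from
    regressing that column (with intercept) on all remaining columns."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - ((y - A @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def fit_linear_model(X, u):
    """OLS fit of U = beta_0 + beta * X + eps on rank-transformed predictors."""
    A = np.column_stack([np.ones(len(u)), rank_transform(X)])
    beta, *_ = np.linalg.lstsq(A, u, rcond=None)
    return beta  # [intercept, coefficient per predictor]
```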