Replying to: F. Houssiau et al. Nature Communications https://doi.org/10.1038/s41467-021-27566-0 (2021)
In Bassolas et al. (ref. 1), we studied the structure of cities and their impact on city livability using a highly aggregated mobility dataset. To protect privacy, random noise was added using an automated Laplace mechanism that provides (ε, δ)-differential privacy, with ε = 0.66 and δ = 2.1 × 10⁻²⁹, where ε sets the noise intensity and δ stands for the deviation from pure ε-privacy.
To illustrate the protection provided by a layer of (ε, δ)-differential privacy, with ε = 0.66 and δ = 2.1 × 10⁻²⁹, we note that an attacker can improve their certainty about an individual’s presence or absence in the dataset by at most 16%. This observation holds even if the attacker knows every individual’s data, including that of the target, via some side channel. An attack model like this is known as membership inference with perfect knowledge.
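One common way to arrive at a figure of this size is sketched below. It assumes the attacker starts from a 50% prior belief and that δ is small enough to ignore, so the guarantee caps the posterior odds at e^ε times the prior odds. The 50% prior and the helper name max_posterior are illustrative assumptions, not details stated in the text.

```python
import math

def max_posterior(prior: float, epsilon: float) -> float:
    """Upper bound on an attacker's posterior belief under pure epsilon-DP.

    Differential privacy bounds the likelihood ratio by exp(epsilon), so the
    posterior odds are at most exp(epsilon) times the prior odds.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = math.exp(epsilon) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

EPSILON = 0.66   # value quoted in the text
PRIOR = 0.5      # assumption: attacker is initially undecided

posterior = max_posterior(PRIOR, EPSILON)
gain = posterior - PRIOR  # increase in certainty, in probability points
print(f"posterior <= {posterior:.3f}, gain <= {gain:.3f}")
```

Under these assumptions the bound on the gain rounds to 16 percentage points. A different prior would give a different, smaller absolute gain.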
In their analysis, Houssiau et al. assume that the dataset referred to in the statistic is the entire dataset of trips. However, we specify the layer of (ε, δ)-differential privacy per metric, i.e., the number of trips from location A to location B per week W. In other words, the unit of privacy that is protected with the promised differential privacy guarantees is not an individual’s contribution to the entire dataset, but rather whether the individual made a trip from A to B during week W. We agree with Houssiau et al. that it is important to communicate privacy protection precisely, and we should have been more specific to avoid confusion.
It is worth pointing out that although Houssiau et al. correctly hypothesize that the 16% statistic does not hold when applied to the entire dataset, there are some discrepancies between their analysis and the privacy mechanisms we apply, resulting in stronger privacy protection in practice. In particular, we bound an individual’s contribution to a particular aggregation partition, i.e., trips from A to B within a week W, to 1. Moreover, the geographical areas we consider are grid cells of size ~1.3 km² rather than exact locations, as Houssiau et al. assume. Thus, Houssiau et al.’s analysis of a single user (one of the authors), who reported 39 trips in total, likely translates to fewer contributions to the entire dataset and consequently also results in less privacy loss when evaluated over the entire dataset. Finally, we want to emphasize that membership inference with perfect knowledge of the entire dataset is a very strong attack model that is unrealistic in practice. We therefore stand by our claim that the dataset is highly aggregated and anonymous for all practical purposes.
Below we provide a clarified description of our data aggregation:
The automated Laplace mechanism adds random noise drawn from a zero-mean Laplace distribution and yields an (ε, δ)-differential privacy guarantee with ε = 0.66 and δ = 2.1 × 10⁻²⁹ per metric. Specifically, for each week W and each location pair (A, B), we compute the number of unique users who took a trip from location A to location B during week W. To each of these metrics, we add Laplace noise from a zero-mean distribution of scale 1/0.66. We then remove all metrics for which the noisy number of users is lower than 100, following the process described in ref. 2, and publish those remaining. Each published metric therefore satisfies (ε, δ)-differential privacy with the values defined above.
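The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production pipeline: the input format (tuples of user, origin, destination, week), the function names and the toy data are all hypothetical, and the noise is sampled with the standard library rather than a hardened differential privacy library.

```python
import random
from collections import defaultdict

EPSILON = 0.66    # privacy parameter quoted in the text
THRESHOLD = 100   # minimum noisy user count for publication, per the text

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a zero-mean Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def aggregate(trips, rng: random.Random) -> dict:
    """Noisy unique-user counts per (origin, destination, week).

    `trips` is a hypothetical iterable of (user_id, origin, destination, week).
    """
    users = defaultdict(set)
    for user_id, origin, destination, week in trips:
        # A set keeps each user at most once per partition,
        # however many times they made the same trip that week.
        users[(origin, destination, week)].add(user_id)

    published = {}
    for key, members in users.items():
        noisy = len(members) + laplace_noise(1.0 / EPSILON, rng)
        if noisy >= THRESHOLD:  # drop metrics whose noisy count is below 100
            published[key] = noisy
    return published

# Toy data: 500 users each travel A -> B three times in week 1,
# while only 5 users travel C -> D.
rng = random.Random(0)
trips = [(u, "A", "B", 1) for u in range(500) for _ in range(3)]
trips += [(u, "C", "D", 1) for u in range(5)]
published = aggregate(trips, rng)
print(published)
```

In this toy run only the A to B flow should survive thresholding, with a noisy count close to 500 rather than 1500, because repeated trips by the same user count once.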
The parameter ε controls the noise intensity in terms of its variance, while δ represents the deviation from pure ε-privacy. The closer they are to zero, the stronger the privacy guarantees. For example, with these parameter values, an attacker with perfect knowledge of all users except user U would increase their level of certainty as to whether U went from geographical area A to area B during a given week by no more than 16%. Each user contributes at most one increment to each partition: if they go from a region A to another region B multiple times in the same week, they contribute only once to the aggregation count. No individual user data was ever manually inspected; only heavily aggregated flows of large populations were handled.
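As a rough consistency check on the quoted δ, one can compute the probability that a partition containing a single user is pushed up to the publication threshold by the noise alone. Treating this tail event as the source of δ is an illustrative reading and an assumption here; the text does not spell out the derivation.

```python
import math

EPSILON = 0.66    # Laplace scale is 1 / EPSILON, as in the text
THRESHOLD = 100   # publication threshold, as in the text

# Assumed worst case: a partition with exactly one user. It is published only
# if the noise is at least THRESHOLD - 1, and for Laplace noise of scale b
# the tail probability is P(noise >= t) = 0.5 * exp(-t / b).
delta = 0.5 * math.exp(-EPSILON * (THRESHOLD - 1))
print(f"delta ~ {delta:.2e}")
```

Under this assumption the result is on the order of 2.1 × 10⁻²⁹, in line with the value quoted for δ.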
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Code availability
Code sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
References
1. Bassolas, A. et al. Hierarchical organization of urban mobility and its connection with city livability. Nat. Commun. 10, 4817 (2019).
2. Wilson, R. J. et al. Differentially private SQL with bounded user contribution. Proc. Priv. Enhancing Technol. 2020, 230–250 (2020).
Acknowledgements
A.B. is funded by the Conselleria d’Educacio, Cultura i Universitats of the Government of the Balearic Islands and the European Social Fund. A.B. and J.J.R. also acknowledge partial funding from the Spanish Ministry of Science and Innovation, the National Agency for Research Funding AEI MCIN/AEI/10.13039/501100011033/ and FEDER (EU) under the grant PACSS (RTI2018-093732-B-C22) and the Maria de Maeztu program for Units of Excellence in R&D (MDM-2017-0711). G.G. and S.H. acknowledge funding from the Department of Economic Development (DED), New York through the NYS Center of Excellence in Data Science at the University of Rochester (C160189). G.G. and H.B. also acknowledge support in part by the U.S. Army Research Office (ARO) under grant number W911NF-18-1-0421. Any opinions, findings, conclusions or recommendations expressed are those of the author(s) and do not necessarily reflect the views of the DED or the ARO.
Author information
Contributions
A.B., H.B., B.D., R.G., G.G., S.A.H., A.S., and J.J.R. contributed to the work methodology. A.B., R.G., G.G., H.K., A.S., and J.J.R. wrote the paper. G.G., H.K., A.S., and J.J.R. coordinated the study. All authors read, edited, and approved the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bassolas, A., Barbosa-Filho, H., Dickinson, B. et al. Reply to: On the difficulty of achieving differential privacy in practice: user-level guarantees in aggregate location data. Nat Commun 13, 30 (2022). https://doi.org/10.1038/s41467-021-27567-z
DOI: https://doi.org/10.1038/s41467-021-27567-z