Abstract
Geographic information adds a powerful component to environmental epidemiology studies but can compromise subject confidentiality. Although locations are often masked by perturbing spatial coordinates, existing masks do not ensure that the perturbation area contains enough valid surrogates to prevent disclosure, nor are they designed to minimize perturbation while maintaining a specified level of privacy. I introduce a new approach to geoprivacy in which real property parcel data with land-use information are used to develop a pool of verified neighbors. Geographic information system (GIS) processing optionally restricts the pool to residences whose environmental variable values are similar to those of the subject parcel. A surrogate is then randomly selected from the k members of the pool closest to the subject, with k chosen to achieve the desired level of spatial privacy protection. The method guarantees the specified level of privacy even where population density is uneven, while minimizing spatial distortion and changes to the values of environmental variables assigned to subjects. In an illustrative example, the method was more effective than random perturbation-based methods at both protecting privacy and preserving spatial fidelity to the original locations.
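The selection step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the `Parcel` record, the `"residential"` land-use label, the single numeric `env_value`, the `env_tolerance` similarity rule, and the exclusion of the subject's own parcel from the pool are all hypothetical choices made for the example. Coordinates are assumed to be in a projected (planar) system so that straight-line distance is meaningful.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Parcel:
    parcel_id: str
    x: float          # projected coordinate (assumed planar, e.g. metres)
    y: float
    land_use: str     # hypothetical land-use label from parcel data
    env_value: float  # hypothetical environmental variable for the parcel


def verified_pool(subject, parcels, env_tolerance=None):
    """Build the pool of verified neighbors: residential parcels other than
    the subject, optionally restricted to similar environmental values."""
    pool = [
        p for p in parcels
        if p.land_use == "residential" and p.parcel_id != subject.parcel_id
    ]
    if env_tolerance is not None:
        # Assumed similarity rule: absolute difference within a tolerance.
        pool = [
            p for p in pool
            if abs(p.env_value - subject.env_value) <= env_tolerance
        ]
    return pool


def select_surrogate(subject, parcels, k, env_tolerance=None, rng=None):
    """Pick a surrogate uniformly at random from the k pool members
    closest to the subject."""
    if rng is None:
        rng = random.Random()
    pool = verified_pool(subject, parcels, env_tolerance)
    if len(pool) < k:
        raise ValueError("fewer than k verified neighbors available")
    # Sort by straight-line distance to the subject and keep the k nearest.
    pool.sort(key=lambda p: math.hypot(p.x - subject.x, p.y - subject.y))
    return rng.choice(pool[:k])
```

Because the candidate set is defined by a count (k) rather than by a fixed radius, the search distance in this sketch grows automatically in sparsely populated areas and shrinks in dense ones, which is how the privacy level can hold where population density is uneven. Raising an error when fewer than k candidates exist is a design choice of the sketch, not something the abstract specifies.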
Acknowledgements
I thank Jay Nuckols for suggesting the work that led to this paper, Erin Bell for the inspiration to write the paper, Celine Barakat for her assistance with initial development of the method, and Jonathan Riehl for programming help. Tom Hart provided valuable review and comments. Early work on this paper was supported in part by grant U50/CCU223284-01 from the United States Centers for Disease Control and Prevention to the New York State Department of Health under their Environmental and Health Effect Tracking initiative. I appreciate comments from reviewers that led to improvements in the paper.
Ethics declarations
Competing interests
The author declares no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the Journal of Exposure Science and Environmental Epidemiology website
About this article
Cite this article
Richter, W. The verified neighbor approach to geoprivacy: An improved method for geographic masking. J Expo Sci Environ Epidemiol 28, 109–118 (2018). https://doi.org/10.1038/jes.2017.17
DOI: https://doi.org/10.1038/jes.2017.17
This article is cited by
- Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics. International Journal of Health Geographics (2021)
- Street masking: a network-based geographic mask for easily protecting geoprivacy. International Journal of Health Geographics (2020)
- Addressing the data guardian and geospatial scientist collaborator dilemma: how to share health records for spatial analysis while maintaining patient confidentiality. International Journal of Health Geographics (2019)