Abstract
Recent studies have demonstrated that statistical methods can be used to detect the presence of a single individual within a study group based on summary data reported from genome-wide association studies (GWAS). We present an analytical and empirical study of the statistical power of such methods. We thereby aim to provide quantitative guidelines for researchers wishing to make a limited number of SNPs available publicly without compromising subjects' privacy.
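To make the detection setting concrete, the following is a minimal sketch of one such test: a log-likelihood ratio comparing "the individual is in the pool" against "the individual was drawn from the reference population". It is an illustration under simplifying assumptions, not a reproduction of the analysis in the paper. It assumes haploid genotypes (one allele per SNP), independent SNPs, and reference allele frequencies drawn uniformly from 0.2 to 0.8. The names `lr_statistic` and `simulate` are hypothetical.

```python
import math
import random


def lr_statistic(genotype, pool_freq, ref_freq, eps=1e-6):
    """Log-likelihood ratio of 'in the pool' versus 'drawn from the reference population'.

    genotype  : list of 0/1 alleles, one per SNP (haploid simplification)
    pool_freq : allele frequencies reported for the study pool
    ref_freq  : allele frequencies in a reference population
    """
    total = 0.0
    for x, p_pool, p_ref in zip(genotype, pool_freq, ref_freq):
        # Clip the pool frequency so that log(0) cannot occur.
        p_pool = min(max(p_pool, eps), 1.0 - eps)
        if x == 1:
            total += math.log(p_pool / p_ref)
        else:
            total += math.log((1.0 - p_pool) / (1.0 - p_ref))
    return total


def simulate(num_snps=10000, pool_size=50, seed=0):
    """Simulate one pool; return the statistic for a pool member and for an outsider."""
    rng = random.Random(seed)
    # Assumed reference frequencies, kept away from 0 and 1.
    ref_freq = [rng.uniform(0.2, 0.8) for _ in range(num_snps)]

    def draw_individual():
        return [1 if rng.random() < p else 0 for p in ref_freq]

    pool = [draw_individual() for _ in range(pool_size)]
    # Summary data: per-SNP allele frequency within the pool.
    pool_freq = [sum(column) / pool_size for column in zip(*pool)]

    member = pool[0]
    outsider = draw_individual()
    return (
        lr_statistic(member, pool_freq, ref_freq),
        lr_statistic(outsider, pool_freq, ref_freq),
    )


member_stat, outsider_stat = simulate()
print("pool member:", member_stat)
print("outsider:   ", outsider_stat)
```

In this toy setting the statistic tends to be positive for a pool member and negative for an outsider, and the separation shrinks as fewer SNPs are released or as the pool grows. That trade-off is what a limit on the number of published SNPs is meant to exploit.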
Acknowledgements
E.H. was supported by US National Science Foundation grant IIS-0713254. E.H. is a faculty fellow of the Edmond J. Safra Bioinformatics program at Tel-Aviv University. M.I.J., S.S. and G.O. were supported by NIH/NIGMS R01 grant GM071749. This study makes use of data generated by the Wellcome Trust Case Control Consortium; a full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113.
Author information
Contributions
S.S. and G.O. contributed to the design of the experiments and implemented the experiments; they also developed the theoretical analysis and contributed to writing the paper. M.I.J. contributed to the design of the experiments, the theoretical analysis and the writing of the paper as well as to the funding of the project. E.H. initiated the project and proposed the framework, and he contributed to the design of the experiments, the theoretical analysis, the writing of the paper and the funding of the project.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7, Supplementary Methods and Supplementary Note (PDF 764 kb)
About this article
Cite this article
Sankararaman, S., Obozinski, G., Jordan, M. et al. Genomic privacy and limits of individual detection in a pool. Nat Genet 41, 965–967 (2009). https://doi.org/10.1038/ng.436
DOI: https://doi.org/10.1038/ng.436