Routes for breaching and protecting genetic privacy

Journal name: Nature Reviews Genetics
Volume: 15
Pages: 409–421
DOI: 10.1038/nrg3723

Abstract

We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.

Figures

  1. Figure 1: An integrative map of genetic privacy breaching techniques.

    The map contrasts different scenarios, such as identifying de-identified genetic data sets, revealing an attribute from genetic data and unmasking of data. It also shows the interdependencies between the techniques and suggests potential routes to exploit further information after the completion of one attack. There are several simplifying assumptions (black circles). In certain scenarios (such as insurance decisions), uncertainty about the target's identity within a small group of people could still be considered a success (assumption 1). For certain privacy harms (such as surveillance), identity tracing can be considered a success and the end point of the process (assumption 2). The complete DNA sequence is not always necessary (assumption 3).

  2. Figure 2: A possible route for identity tracing.

    The route combines both metadata and surname inference to triangulate the identity of an unknown genome of a person in the United States (represented by the black silhouette). Without any information, there are ~300 million individuals that could match the genome, which is equivalent to 28 bits of entropy. Inferring the sex by inspecting the sex chromosomes reduces the entropy by 1 bit. The adversary then uses the metadata to find the state of residence and the age, which reduces the entropy to 16 bits. Successful surname recovery (for example, using Ysearch) leaves only ~3 bits of entropy. At this point, the adversary uses public record search engines such as PeopleFinders to generate a list of potential individuals; he or she can use social engineering or pedigree structure to triangulate the person (represented by the red silhouette).
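The entropy bookkeeping in the Figure 2 caption can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes every candidate is equally likely, takes the ~300 million population and the 16-bit and ~3-bit checkpoints directly from the caption, and uses hypothetical helper names (`entropy_bits`, `candidates_left`) that do not come from the article.

```python
import math


def entropy_bits(candidates: float) -> float:
    """Bits needed to single out one person among equally likely candidates."""
    return math.log2(candidates)


def candidates_left(bits: float) -> float:
    """Size of the candidate pool that corresponds to a given entropy."""
    return 2.0 ** bits


US_POPULATION = 300_000_000  # approximate figure quoted in the caption

start = entropy_bits(US_POPULATION)  # about 28 bits
after_sex = start - 1.0              # sex splits the pool roughly in half
after_metadata = 16.0                # caption: state of residence plus age
after_surname = 3.0                  # caption: successful surname recovery

for label, bits in [
    ("no information", start),
    ("sex inferred", after_sex),
    ("state and age known", after_metadata),
    ("surname recovered", after_surname),
]:
    print(f"{label:>20}: {bits:5.1f} bits, ~{candidates_left(bits):,.0f} candidates")
```

Under these assumptions, 16 bits corresponds to roughly 65,000 candidates and 3 bits to about 8, which is small enough for a public record search to produce a short list of named individuals.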

Change history

Corrected online 17 June 2014
In this article, an incorrect citation was given in reference 107. The citation should have been: Ayday, E., Raisaro, J. L., McLaren, P. J., Fellay, J. & Hubaux, J.-P. Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data. Proc. USENIX Security Workshop Health Inf. Technol. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.309.1513 (2013). This has now been corrected online. The editors apologize for this error.

References

  1. Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
  2. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  3. Roberts, J. P. Million veterans sequenced. Nature Biotech. 31, 470 (2013).
  4. Drmanac, R. Medicine. The ultimate genetic test. Science 336, 1110–1112 (2012).
  5. Burn, J. Should we sequence everyone's genome? Yes. BMJ 346, f3133 (2013).
  6. Kaye, J., Heeney, C., Hawkins, N., de Vries, J. & Boddington, P. Data sharing in genomics — re-shaping scientific practice. Nature Rev. Genet. 10, 331–335 (2009).
  7. Park, J. H. et al. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nature Genet. 42, 570–575 (2010).
  8. Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature Genet. 45, 400–405 (2013).
  9. Friend, S. H. & Norman, T. C. Metcalfe's law and the biology information commons. Nature Biotech. 31, 297–303 (2013).
  10. Rodriguez, L. L., Brooks, L. D., Greenberg, J. H. & Green, E. D. The complexities of genomic identifiability. Science 339, 275–276 (2013).
  11. Institute of Medicine (US) Roundtable on Value & Science-Driven Health Care. Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good: Workshop Summary (National Academies Press (US), 2010).
  12. McGuire, A. L. et al. To share or not to share: a randomized trial of consent for data sharing in genome research. Genet. Med. 13, 948–955 (2011).
  13. Oliver, J. M. et al. Balancing the risks and benefits of genomic data sharing: genome research participants' perspectives. Publ. Health Genom. 15, 106–114 (2012).
  14. Careless.data. Nature 507, 7 (2014).
  15. Schwartz, P. M. & Solove, D. J. Reconciling personal information in the United States and European Union. 102 California Law Rev. http://dx.doi.org/10.2139/ssrn.2271442 (2013).
  16. El Emam, K. Heuristics for de-identifying health data. IEEE Secur. Priv. 6, 58–61 (2008).
  17. Lunshof, J. E., Chadwick, R., Vorhaus, D. B. & Church, G. M. From genetic privacy to open consent. Nature Rev. Genet. 9, 406–411 (2008).
  18. Brenner, S. E. Be prepared for the big genome leak. Nature 498, 139 (2013).
  19. McClure, S., Scambray, J. & Kurtz, G. Hacking Exposed 7: Network Security Secrets and Solutions (McGraw Hill, 2012).
  20. Solove, D. J. A taxonomy of privacy. Univ. Pennsylvania Law Rev. 154, 477 (2006).
    This work organizes various concepts of privacy violations from a legal perspective.
  21. Ohm, P. Broken promises of privacy: responding to the surprising failure of anonymization. UCLA Law Rev. 57, 1701 (2010).
  22. Golle, P. Revisiting the uniqueness of simple demographics in the US population. Proc. 5th ACM Workshop Privacy in Electron. Soc. 77–80 (2006).
  23. Sweeney, L. A. Simple Demographics Often Identify People Uniquely. Carnegie Mellon Univ. Data Privacy Working Paper 3 (2000).
  24. Sweeney, L. Testimony of Latanya Sweeney before the Privacy and Integrity Advisory Committee of the Department of Homeland Security. US Homeland Security [online], (2005).
  25. Sweeney, L. A., Abu, A. & Winn, J. Identifying participants in the personal genome project by name. Data Privacy Lab [online], (2013).
    This study shows identity tracing of PGP participants using metadata and side-channel techniques.
  26. Code of Federal Regulations Title 45 Section 164.514 (US Federal Register, 2002).
  27. Benitez, K. & Malin, B. Evaluating re-identification risks with respect to the HIPAA Privacy Rule. J. Am. Med. Informat. Associ. 17, 169–177 (2010).
  28. Kwok, P., Davern, M., Hair, E. & Lafky, D. Harder Than You Think: a Case Study of Re-identification Risk of HIPAA-Compliant Records. NORC at The University of Chicago Abstract 302255 (2011).
  29. Bennett, R. L. et al. Recommendations for standardized human pedigree nomenclature. Pedigree standardization task force of the national society of genetic counselors. Am. J. Hum. Genet. 56, 745–752 (1995).
  30. Malin, B. Re-identification of familial database records. AMIA Annu. Symp. Proc. 2006, 524–528 (2006).
  31. Israel v. N. Bilik and others 24441-05-12 [online], (in Hebrew) (2013).
  32. Khan, R. & Mittelman, D. Rumors of the death of consumer genomics are greatly exaggerated. Genome Biol. 14, 139 (2013).
  33. Gitschier, J. Inferential genotyping of Y chromosomes in Latter-Day Saints founders and comparison to Utah samples in the HapMap project. Am. J. Hum. Genet. 84, 251–258 (2009).
  34. Gymrek, M., McGuire, A. L., Golan, D., Halperin, E. & Erlich, Y. Identifying personal genomes by surname inference. Science 339, 321–324 (2013).
    This paper reports end-to-end identity tracing of anonymous research participants from DNA information and Internet searches, and a risk assessment for the US population.
  35. King, T. E. & Jobling, M. A. What's in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends Genet. 25, 351–360 (2009).
  36. King, T. E. & Jobling, M. A. Founders, drift, and infidelity: the relationship between Y chromosome diversity and patrilineal surnames. Mol. Biol. Evol. 26, 1093–1102 (2009).
  37. Motluk, A. Anonymous sperm donor traced on internet. New Scientist 2 (3 Nov 2005).
    This article discusses the first public case of identity tracing using genealogical triangulation.
  38. Stein, R. Found on the web, with DNA: a boy's father. Washington Post A09 (13 Nov 2005).
  39. Naik, G. Family secrets: an adopted man's 26-year quest for his father. The Wall Street Journal (2 May 2009).
  40. Lehmann-Haupt, R. Are sperm donors really anonymous anymore? Slate (1 Mar 2010).
  41. Gymrek, M., Golan, D., Rosset, S. & Erlich, Y. lobSTR: a short tandem repeat profiler for personal genomes. Genome Res. 22, 1154–1162 (2012).
  42. China News Network. Ministry of Public Security statistics: “King” into the most common surname in China has 9288 million. Eastday [online], (in Chinese) (2007).
  43. Huff, C. D. et al. Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome Res. 21, 768–774 (2011).
  44. Henn, B. M. et al. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7, e34267 (2012).
  45. Lowrance, W. W. & Collins, F. S. Identifiability in genomic research. Science 317, 600–602 (2007).
  46. Kayser, M. & de Knijff, P. Improving human forensics through advances in genetics, genomics and molecular biology. Nature Rev. Genet. 12, 179–192 (2011).
    This is a comprehensive review of methods to predict phenotypes from DNA information.
  47. Silventoinen, K. et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res. 6, 399–408 (2003).
  48. Kohn, L. A. P. The role of genetics in craniofacial morphology and growth. Annu. Rev. Anthropol. 20, 261–278 (1991).
  49. Zubakov, D. et al. Estimating human age from T-cell DNA rearrangements. Curr. Biol. 20, R970–R971 (2010).
  50. Ou, X. L. et al. Predicting human age with bloodstains by sjTREC quantification. PLoS ONE 7, e42412 (2012).
  51. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
  52. Manning, A. K. et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nature Genet. 44, 659–669 (2012).
  53. Liu, F. et al. A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet. 8, e1002932 (2012).
  54. Walsh, S. et al. IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information. Forens. Sci. Int. Genet. 5, 170–180 (2011).
  55. Byers, S. Information leakage caused by hidden data in published documents. IEEE Secur. Priv. 2, 23–27 (2004).
  56. Kaufman, S., Rosset, S. & Perlich, C. Leakage in data mining: formulation, detection, and avoidance. Proc. 17th ACM SIGKDD Int. Conf. Knowledge Discov. Data Mining 556–563 (2011).
  57. Acquisti, A. & Gross, R. Predicting Social Security numbers from public data. Proc. Natl Acad. Sci. USA 106, 10975–10980 (2009).
  58. Noumeir, R., Lemay, A. & Lina, J. M. Pseudonymization of radiology data for research purposes. J. Digital Imag. 20, 284–295 (2007).
  59. Pakstis, A. J. et al. SNPs for a universal individual identification panel. Hum. Genet. 127, 315–324 (2010).
  60. Lin, Z., Owen, A. B. & Altman, R. B. Genomic research and human subject privacy. Science 305, 183 (2004).
  61. Mailman, M. D. et al. The NCBI dbGaP database of genotypes and phenotypes. Nature Genet. 39, 1181–1186 (2007).
  62. Homer, N. et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4, e1000167 (2008).
    This is the first study to show an ADAD from summary statistic data.
  63. Jacobs, K. B. et al. A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies. Nature Genet. 41, 1253–1257 (2009).
  64. Visscher, P. M. & Hill, W. G. The limits of individual identification from sample allele frequencies: theory and statistical analysis. PLoS Genet. 5, e1000628 (2009).
  65. Sankararaman, S., Obozinski, G., Jordan, M. I. & Halperin, E. Genomic privacy and limits of individual detection in a pool. Nature Genet. 41, 965–967 (2009).
    References 64–65 provide excellent mathematical analyses of ADAD using allele frequency data.
  66. Wang, R., Li, Y. F., Wang, X., Haixu, T. & Zhou, X. Learning your identity and disease from research papers: information leaks in genome wide association study. Proc. 16th ACM Conf. Comput. Commun. Security 534–544 (2009).
  67. Im, H. K., Gamazon, E. R., Nicolae, D. L. & Cox, N. J. On sharing quantitative trait, GWAS results in an era of multiple-omics data and the limits of genomic privacy. Am. J. Hum. Genet. 90, 591–598 (2012).
  68. Lumley, T. Potential for revealing individual-level information in genome-wide association studies. JAMA 303, 659 (2010).
  69. Zerhouni, E. A. & Nabel, E. G. Protecting aggregate genomic data. Science 322, 44 (2008).
  70. Johnson, A. D., Leslie, R. & O'Donnell, C. J. Temporal trends in results availability from genome-wide association studies. PLoS Genet. 7, e1002269 (2011).
  71. Gilbert, N. Researchers criticize genetic data restrictions. Nature http://dx.doi.org/10.1038/news.2008.1083 (2008).
  72. Malin, B., Karp, D. & Scheuermann, R. H. Technical and policy approaches to balancing patient privacy and data sharing in clinical and translational research. J. Investig. Med. 58, 11–18 (2010).
  73. Clayton, D. On inferring presence of an individual in a mixture: a Bayesian approach. Biostatistics 11, 661–673 (2010).
  74. Report on the workshop on establishing a central resource of data from genome sequencing projects. National Genome Research Institute [online], (2012).
  75. Schadt, E. E., Woo, S. & Hao, K. Bayesian method to predict individual SNP genotypes from gene expression data. Nature Genet. 44, 603–608 (2012).
  76. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010).
  77. Nyholt, D. R., Yu, C. E. & Visscher, P. M. On Jim Watson's APOE status: genetic information is hard to hide. Eur. J. Hum. Genet. 17, 147–149 (2009).
    This study clearly shows the limited use of masking sensitive DNA areas.
  78. Humbert, M., Ayday, E., Hubaux, J.-P. & Telenti, A. Addressing the concerns of the Lacks family: quantification of kin genomic privacy. Proc. 2013 ACM SIGSAC Conf. Comput. Commun. Secur. 1141–1152 (2013).
  79. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nature Genet. 40, 1068–1075 (2008).
  80. Kaiser, J. Agency nixes deCODE's new data-mining plan. Science 340, 1388–1389 (2013).
  81. Bambauer, J. R. Tragedy of the data commons. Harvard J. Law Technol. http://dx.doi.org/10.2139/ssrn.1789749 (2011).
  82. Hartzog, W. & Stutzman, F. The case for online obscurity. Calif. Law Rev. 101, 1 (2013).
  83. Taleb, N. N. The Black Swan: the Impact of the Highly Improbable (Random House, 2007).
  84. Shannon, C. Communication theory of secrecy systems. Bell System Techn. J. 28, 656–715 (1949).
  85. Cavoukian, A. Privacy by design. Information and Privacy Commissioner, Ontario, Canada [online], (2009).
  86. Tryka, K. A. et al. NCBI's database of genotypes and phenotypes: dbGaP. Nucleic Acids Res. 42, D975–D979 (2014).
  87. Ramos, E. M. et al. A mechanism for controlled access to GWAS data: experience of the GAIN Data Access Committee. Am. J. Hum. Genet. 92, 479–488 (2013).
  88. Church, G. et al. Public access to genome-wide data: five views on balancing research with privacy and protection. PLoS Genet. 5, e1000665 (2009).
  89. Agrawal, R., Kiernan, J., Srikant, R. & Xu, Y. Hippocratic databases. Proc. 28th Int. Conf. Very Large Databases 143–154 (2002).
  90. Agrawal, R. et al. Auditing compliance with a hippocratic database. Proc. 30th Int. Conf. Very Large Databases 516–527 (2004).
  91. Venter, H. S., Olivier, M. S. & Eloff, J. H. PIDS: a privacy intrusion detection system. Internet Res. 14, 360–365 (2004).
  92. Creating a global alliance to enable responsible sharing of genomic and clinical data. [online], (2013).
  93. Bafna, V. et al. Abstractions for genomics. Commun. ACM 56, 83–93 (2013).
  94. Terry, S. F. & Terry, P. F. Power to the people: participant ownership of clinical trial data. Sci. Transl Med. 3, 69cm3 (2011).
  95. Kaye, J. et al. From patients to partners: participant-centric initiatives in biomedical research. Nature Rev. Genet. 13, 371–376 (2012).
  96. Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzz. 10, 557–570 (2002).
  97. El Emam, K. & Dankar, F. K. Protecting privacy using k-anonymity. J. Am. Med. Informat. Associ. 15, 627–637 (2008).
  98. Malin, B. A. Protecting genomic sequence anonymity with generalization lattices. Methods Inform. Med. 44, 687–692 (2005).
  99. Machanavajjhala, A., Kifer, D., Gehrke, J. & Venkitasubramaniam, M. L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 3 (2007).
  100. Li, N., Li, T. & Venkatasubramanian, S. t-closeness: privacy beyond k-anonymity and L-diversity. IEEE 23rd Int. Conf. Data Eng. 106–115 (2007).
  101. Dwork, C. Differential privacy. Automata, Languages and Programming 1–12 (Springer Verlag, 2006).
  102. Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J. & Vilhuber, L. Privacy: theory meets practice on the map. IEEE 24th Int. Conf. Data Eng. 277–286 (2008).
  103. Uhler, C., Slavkovic, A. B. & Fienberg, S. E. Privacy-preserving data sharing for genome-wide association studies. arXiv 1205.0739 (2012).
  104. Yu, F., Fienberg, S. E., Slavkovic, A. & Uhler, C. Scalable privacy-preserving data sharing methodology for genome-wide association studies. arXiv 1401.5193 (2014).
  105. Johnson, A. & Shmatikov, V. Privacy-preserving data exploration in genome-wide association studies. Proc. 19th ACM SIGKDD Int. Conf. Knowledge Discov. Data Mining 1079–1087 (2013).
  106. Ayday, E., Raisaro, J. L. & Hubaux, J. P. Privacy-enhancing technologies for medical tests using genomic data. Ecole Polytechnique Federale de Lausanne [online], (2013).
  107. Ayday, E., Raisaro, J. L., McLaren, P. J., Fellay, J. & Hubaux, J.-P. Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data. Proc. USENIX Security Workshop Health Inf. Technol. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.309.1513 (2013).
    This pioneering work shows the use of homomorphic encryption for privacy-preserving genetic risk predictions.
  108. Atallah, M. J., Kerschbaum, F. & Du, W. Secure and private sequence comparisons. Proc. 2003 ACM Workshop Privacy in Electron. Soc. 39–44 (2003).
  109. Jha, S., Kruger, L. & Shmatikov, V. Towards practical privacy for genomic computation. IEEE Symp. Security and Privacy 216–230 (2008).
  110. Chen, Y., Peng, B., Wang, X. & Tang, H. Large-scale privacy-preserving mapping of human genomic sequences on hybrid clouds. Proc. 19th Annu. Netw. Distributed Syst. Security Symp. (2013).
    The paper presents an interesting concept of privacy-preserving alignment of high-throughput sequencing data that allows the use of untrusted cloud providers.
  111. Yao, A. C.-C. Protocols for secure computations. 23rd Annu. Symp. Found. Comput. Sci. 160–164 (1982).
  112. Li, H. & Homer, N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform. 11, 473–483 (2010).
  113. Bohannon, P., Jakobsson, M. & Srikwan, S. in Public Key Cryptography (eds Imai, H. & Zheng, Y.) 373–390 (Springer, 2000).
  114. Fons, B., Stefan, K., Klaus, K. & Pim, T. Privacy-preserving matching of DNA profiles. Cryptology ePrint Archive 2008, 203 (2008).
  115. Baldi, P., Baronio, R., Cristofaro, E. D., Gasti, P. & Tsudik, G. Countering GATTACA: efficient and secure testing of fully-sequenced human genomes. Proc. 18th ACM Conf. Comput. Commun. Security 691–702 (2011).
  116. De Cristofaro, E., Faber, S., Gasti, P. & Tsudik, G. Genodroid: are privacy-preserving genomic tests ready for prime time? Proc. 2012 ACM Workshop Privacy in Electron. Soc. 97–108 (2012).
  117. He, D. et al. Identifying genetic relatives without compromising privacy. Genome Res. 24, 664–672 (2014).
  118. Kantarcioglu, M., Jiang, W., Liu, Y. & Malin, B. A cryptographic approach to securely share and query genomic sequences. IEEE Trans. Inf. Technol. Biomed. 12, 606–617 (2008).
  119. Kamm, L., Bogdanov, D., Laur, S. & Vilo, J. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29, 886–893 (2013).
  120. Canim, M., Kantarcioglu, M. & Malin, B. Secure management of biomedical data with cryptographic hardware. IEEE Trans. Inf. Technol. Biomed. 16, 166–175 (2012).
  121. Narayanan, A. What happened to the crypto dream? IEEE Secur. Priv. 11, 75–76 (2013).
  122. Ayday, E., De Cristofaro, E., Hubaux, J.-P. & Tsudik, G. The chills and thrills of whole genome sequencing. Computer http://doi.ieeecomputersociety.org/10.1109/MC.2013.333 (2013).
    This is a good overview of cryptographic work for protecting genetic data and of open questions in the area.
  123. Presidential Commission for the Study of Bioethical Issues. Privacy and Progress in Whole Genome Sequencing (2012).
  124. Craig, D. W. et al. Assessing and managing risk when sharing aggregate genetic variant data. Nature Rev. Genet. 12, 730–736 (2011).
  125. Braun, R., Rowe, W., Schaefer, C., Zhang, J. & Buetow, K. Needles in the haystack: identifying individuals present in pooled genomic data. PLoS Genet. 5, e1000668 (2009).
    This is a critical assessment of the performance of ADAD with allele frequency data.
  126. Kendler, K. S., Gallagher, T. J., Abelson, J. M. & Kessler, R. C. Lifetime prevalence, demographic risk factors, and diagnostic validity of nonaffective psychosis as assessed in a US community sample: the National Comorbidity Survey. Arch. Gen. Psychiatry 53, 1022–1031 (1996).
  127. Lee, J. & Clifton, C. in Information Security 325–340 (Springer, 2011).
  128. Hsu, J. et al. Differential privacy: an economic method for choosing epsilon. arXiv 1402.3329 (2014).
  129. Dwork, C., McSherry, F., Nissim, K. & Smith, A. in Theory of Cryptography 265–284 (Springer, 2006).
  130. Paillier, P. in Advances in Cryptology — EUROCRYPT '99 (ed. Stern, J.) 223–238 (Springer, 1999).
  131. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).
  132. Gentry, C. Fully homomorphic encryption using ideal lattices. Proc. 41st Annu. ACM Symp. Theory of Comput. 169–178 (2009).
  133. Wang, R., Li, Y. F., Wang, X. F., Tang, H. & Zhou, X. Learning your identity and disease from research papers: information leaks in genome wide association study. Proc. 16th ACM Conf. Comput. Commun. Security 534–544 (2009).

Author information

Affiliations

  1. Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, Massachusetts 02142, USA.

    • Yaniv Erlich
  2. Department of Computer Science, Princeton University, 35 Olden Street, Princeton, New Jersey 08540, USA.

    • Arvind Narayanan

Competing interests statement

The authors declare no competing interests.

Author details

  • Yaniv Erlich

    Yaniv Erlich is a fellow at the Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA. He received his Ph.D. from Cold Spring Harbor Laboratory, New York, USA, in 2010 and his B.Sc. from Tel-Aviv University, Israel, in 2006. Before that, he worked in computer security and was responsible for conducting penetration tests on financial institutions and commercial companies. His research involves developing new algorithms for computational human genetics.

  • Arvind Narayanan

    Arvind Narayanan is an assistant professor in the Department of Computer Science and the Center for Information Technology and Policy at Princeton University, New Jersey, USA. He studies information privacy and security. His research has shown that data anonymization is broken in fundamental ways, for which he jointly received the 2008 Privacy Enhancing Technologies Award. His current research interests include building a platform for privacy-preserving data sharing.

Supplementary information

PDF files

  1. Supplementary information S1 (figure) (1.2 MB)

    Differential privacy statistic of an association study.
