  • Review Article

Routes for breaching and protecting genetic privacy

An Erratum to this article was published on 17 June 2014

This article has been updated

Key Points

  • Privacy-breaching techniques can work by cross-referencing two or more pieces of information to gain new, potentially harmful, knowledge about individuals or their families. Broadly speaking, the main routes to breach privacy are identity tracing, attribute disclosure attacks using DNA (ADAD) and completion of sensitive DNA information.

  • Identity tracing exploits quasi-identifiers in the DNA data or metadata to uncover the identity of an unknown genetic data set. ADAD links the identity of a known person to a sensitive phenotype using DNA-derived data. Completion techniques also work on known DNA data and aim to uncover sensitive genomic areas that were masked to protect the participant.

  • In the past few years, the range of techniques and tools to carry out privacy breaching attacks has expanded. Although most of these techniques are currently beyond the reach of the general public, they can be implemented by trained persons with varying degrees of effort and success.

  • There is considerable debate regarding risk management. Some support a pragmatic, ad-hoc approach of privacy by obscurity, whereas others support a systematic, mathematical approach of privacy by design. Privacy-by-design algorithms include access control, differential privacy and cryptographic techniques.

  • So far, data custodians of genetic databases have primarily adopted access control as a mitigation strategy. New developments in cryptographic methods may usher in additional 'security-by-design' techniques.
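
The cross-referencing idea in the first two key points can be sketched with a toy linkage on quasi-identifiers. This is a minimal illustration, not a method taken from the article: the records, the field names (`zip`, `birth_year`, `sex`) and the function `link_records` are all hypothetical, and a real attack would work with far larger and noisier data.

```python
# Hypothetical toy data; every name and value here is invented for illustration.
genetic_records = [
    {"sample_id": "S1", "zip": "02139", "birth_year": 1970, "sex": "M"},
    {"sample_id": "S2", "zip": "10001", "birth_year": 1985, "sex": "F"},
    {"sample_id": "S3", "zip": "94105", "birth_year": 1990, "sex": "M"},
]
public_records = [
    {"name": "Person One", "zip": "02139", "birth_year": 1970, "sex": "M"},
    {"name": "Person Two", "zip": "10001", "birth_year": 1985, "sex": "F"},
    {"name": "Person Three", "zip": "10001", "birth_year": 1985, "sex": "F"},
    {"name": "Person Four", "zip": "60601", "birth_year": 1962, "sex": "F"},
]

# Assumed quasi-identifiers shared by both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(anonymous, identified, keys=QUASI_IDENTIFIERS):
    """Return {sample_id: name} for anonymous records that match exactly
    one identified record on all quasi-identifiers."""
    index = {}
    for rec in identified:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)

    matches = {}
    for rec in anonymous:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        # Only a unique candidate counts as a re-identification here;
        # ambiguous or missing matches are left unresolved.
        if len(candidates) == 1:
            matches[rec["sample_id"]] = candidates[0]["name"]
    return matches


traced = link_records(genetic_records, public_records)
```

In this toy setting only `S1` is traced: `S2` matches two public records and `S3` matches none, which hints at why coarser or suppressed quasi-identifiers reduce the risk.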

Abstract

We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.

Figure 1: An integrative map of genetic privacy breaching techniques.
Figure 2: A possible route for identity tracing.

Change history

  • 17 June 2014

    In this article, an incorrect citation was given in reference 107. The citation should have been: Ayday, E., Raisaro, J. L., McLaren, P. J., Fellay, J. & Hubaux, J.-P. Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data. Proc. USENIX Security Workshop Health Inf. Technol. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.309.1513 (2013). This has now been corrected online. The editors apologize for this error.

References

  1. Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).

  2. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

  3. Roberts, J. P. Million veterans sequenced. Nature Biotech. 31, 470–470 (2013).

  4. Drmanac, R. Medicine. The ultimate genetic test. Science 336, 1110–1112 (2012).

  5. Burn, J. Should we sequence everyone's genome? Yes. BMJ 346, f3133 (2013).

  6. Kaye, J., Heeney, C., Hawkins, N., de Vries, J. & Boddington, P. Data sharing in genomics — re-shaping scientific practice. Nature Rev. Genet. 10, 331–335 (2009).

  7. Park, J. H. et al. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nature Genet. 42, 570–575 (2010).

  8. Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature Genet. 45, 400–405 (2013).

  9. Friend, S. H. & Norman, T. C. Metcalfe's law and the biology information commons. Nature Biotech. 31, 297–303 (2013).

  10. Rodriguez, L. L., Brooks, L. D., Greenberg, J. H. & Green, E. D. The complexities of genomic identifiability. Science 339, 275–276 (2013).

  11. Institute of Medicine (US) Roundtable on Value & Science-Driven Health Care. Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good: Workshop Summary (National Academies Press (US), 2010).

  12. McGuire, A. L. et al. To share or not to share: a randomized trial of consent for data sharing in genome research. Genet. Med. 13, 948–955 (2011).

  13. Oliver, J. M. et al. Balancing the risks and benefits of genomic data sharing: genome research participants' perspectives. Publ. Health Genom. 15, 106–114 (2012).

  14. Careless.data. Nature 507, 7 (2014).

  15. Schwartz, P. M. & Solove, D. J. Reconciling personal information in the United States and European Union. 102 California Law Rev. http://dx.doi.org/10.2139/ssrn.2271442 (2013).

  16. El Emam, K. Heuristics for de-identifying health data. IEEE Secur. Priv. 6, 58–61 (2008).

  17. Lunshof, J. E., Chadwick, R., Vorhaus, D. B. & Church, G. M. From genetic privacy to open consent. Nature Rev. Genet. 9, 406–411 (2008).

  18. Brenner, S. E. Be prepared for the big genome leak. Nature 498, 139 (2013).

  19. McClure, S., Scambray, J. & Kurtz, G. Hacking Exposed 7: Network Security Secrets and Solutions (McGraw Hill, 2012).

  20. Solove, D. J. A taxonomy of privacy. Univ. Pennsylvania Law Rev. 154, 477 (2006). This work organizes various concepts of privacy violations from a legal perspective.

  21. Ohm, P. Broken promises of privacy: responding to the surprising failure of anonymization. UCLA Law Rev. 57, 1701 (2010).

  22. Golle, P. Revisiting the uniqueness of simple demographics in the US population. Proc. 5th ACM Workshop Privacy in Electron. Soc. 77–80 (2006).

  23. Sweeney, L. A. Simple Demographics Often Identify People Uniquely. Carnegie Mellon Univ. Data Privacy Working Paper 3 (2000).

  24. Sweeney, L. Testimony of Latanya Sweeney before the Privacy and Integrity Advisory Committee of the Department of Homeland Security. US Homeland Security [online], (2005).

  25. Sweeney, L. A., Abu, A. & Winn, J. Identifying participants in the personal genome project by name. Data Privacy Lab [online], (2013). This study shows identity tracing of PGP participants using metadata and side-channel techniques.

  26. Code of Federal Regulations Title 45 Section 164.514 (US Federal Register, 2002).

  27. Benitez, K. & Malin, B. Evaluating re-identification risks with respect to the HIPAA Privacy Rule. J. Am. Med. Informat. Associ. 17, 169–177 (2010).

  28. Kwok, P., Davern, M., Hair, E. & Lafky, D. Harder Than You Think: a Case Study of Re-identification Risk of HIPAA-Compliant Records. NORC at The University of Chicago Abstract 302255 (2011).

  29. Bennett, R. L. et al. Recommendations for standardized human pedigree nomenclature. Pedigree standardization task force of the national society of genetic counselors. Am. J. Hum. Genet. 56, 745–752 (1995).

  30. Malin, B. Re-identification of familial database records. AMIA Annu. Symp. Proc. 2006, 524–528 (2006).

  31. Israel v. N. Bilik and others 24441-05-12 [online], (in Hebrew) (2013).

  32. Khan, R. & Mittelman, D. Rumors of the death of consumer genomics are greatly exaggerated. Genome Biol. 14, 139 (2013).

  33. Gitschier, J. Inferential genotyping of Y chromosomes in Latter-Day Saints founders and comparison to Utah samples in the HapMap project. Am. J. Hum. Genet. 84, 251–258 (2009).

  34. Gymrek, M., McGuire, A. L., Golan, D., Halperin, E. & Erlich, Y. Identifying personal genomes by surname inference. Science 339, 321–324 (2013). This paper reports end-to-end identity tracing of anonymous research participants from DNA information and Internet searches, and a risk assessment for the US population.

  35. King, T. E. & Jobling, M. A. What's in a name? Y chromosomes, surnames and the genetic genealogy revolution. Trends Genet. 25, 351–360 (2009).

  36. King, T. E. & Jobling, M. A. Founders, drift, and infidelity: the relationship between Y chromosome diversity and patrilineal surnames. Mol. Biol. Evol. 26, 1093–1102 (2009).

  37. Motluk, A. Anonymous sperm donor traced on internet. New Scientist 2 (3 Nov 2005). This article discusses the first public case of identity tracing using genealogical triangulation.

  38. Stein, R. Found on the web, with DNA: a boy's father. Washington Post A09 (13 Nov 2005).

  39. Naik, G. Family secrets: an adopted man's 26-year quest for his father. The Wall Street Journal (2 May 2009).

  40. Lehmann-Haupt, R. Are sperm donors really anonymous anymore? Slate (1 Mar 2010).

  41. Gymrek, M., Golan, D., Rosset, S. & Erlich, Y. lobSTR: a short tandem repeat profiler for personal genomes. Genome Res. 22, 1154–1162 (2012).

  42. China News Network. Ministry of Public Security statistics: “King” into the most common surname in China has 9288 million. Eastday [online], (in Chinese) (2007).

  43. Huff, C. D. et al. Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome Res. 21, 768–774 (2011).

  44. Henn, B. M. et al. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7, e34267 (2012).

  45. Lowrance, W. W. & Collins, F. S. Identifiability in genomic research. Science 317, 600–602 (2007).

  46. Kayser, M. & de Knijff, P. Improving human forensics through advances in genetics, genomics and molecular biology. Nature Rev. Genet. 12, 179–192 (2011). This is a comprehensive review of methods to predict phenotypes from DNA information.

  47. Silventoinen, K. et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res. 6, 399–408 (2003).

  48. Kohn, L. A. P. The role of genetics in craniofacial morphology and growth. Annu. Rev. Anthropol. 20, 261–278 (1991).

  49. Zubakov, D. et al. Estimating human age from T-cell DNA rearrangements. Curr. Biol. 20, R970–R971 (2010).

  50. Ou, X. L. et al. Predicting human age with bloodstains by sjTREC quantification. PLoS ONE 7, e42412 (2012).

  51. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).

  52. Manning, A. K. et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nature Genet. 44, 659–669 (2012).

  53. Liu, F. et al. A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet. 8, e1002932 (2012).

  54. Walsh, S. et al. IrisPlex: a sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information. Forens. Sci. Int. Genet. 5, 170–180 (2011).

  55. Byers, S. Information leakage caused by hidden data in published documents. IEEE Secur. Priv. 2, 23–27 (2004).

  56. Kaufman, S., Rosset, S. & Perlich, C. Leakage in data mining: formulation, detection, and avoidance. Proc. 17th ACM SIGKDD Int. Conf. Knowledge Discov. Data Mining 556–563 (2011).

  57. Acquisti, A. & Gross, R. Predicting Social Security numbers from public data. Proc. Natl Acad. Sci. USA 106, 10975–10980 (2009).

  58. Noumeir, R., Lemay, A. & Lina, J. M. Pseudonymization of radiology data for research purposes. J. Digital Imag. 20, 284–295 (2007).

  59. Pakstis, A. J. et al. SNPs for a universal individual identification panel. Hum. Genet. 127, 315–324 (2010).

  60. Lin, Z., Owen, A. B. & Altman, R. B. Genomic research and human subject privacy. Science 305, 183 (2004).

  61. Mailman, M. D. et al. The NCBI dbGaP database of genotypes and phenotypes. Nature Genet. 39, 1181–1186 (2007).

  62. Homer, N. et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4, e1000167 (2008). This is the first study to show an ADAD from summary statistic data.

  63. Jacobs, K. B. et al. A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies. Nature Genet. 41, 1253–1257 (2009).

  64. Visscher, P. M. & Hill, W. G. The limits of individual identification from sample allele frequencies: theory and statistical analysis. PLoS Genet. 5, e1000628 (2009).

  65. Sankararaman, S., Obozinski, G., Jordan, M. I. & Halperin, E. Genomic privacy and limits of individual detection in a pool. Nature Genet. 41, 965–967 (2009). References 64–65 provide excellent mathematical analyses of ADAD using allele frequency data.

  66. Wang, R., Li, Y. F., Wang, X., Tang, H. & Zhou, X. Learning your identity and disease from research papers: information leaks in genome wide association study. Proc. 16th ACM Conf. Comput. Commun. Security 534–544 (2009).

  67. Im, H. K., Gamazon, E. R., Nicolae, D. L. & Cox, N. J. On sharing quantitative trait, GWAS results in an era of multiple-omics data and the limits of genomic privacy. Am. J. Hum. Genet. 90, 591–598 (2012).

  68. Lumley, T. Potential for revealing individual-level information in genome-wide association studies. JAMA 303, 659 (2010).

  69. Zerhouni, E. A. & Nabel, E. G. Protecting aggregate genomic data. Science 322, 44 (2008).

  70. Johnson, A. D., Leslie, R. & O'Donnell, C. J. Temporal trends in results availability from genome-wide association studies. PLoS Genet. 7, e1002269 (2011).

  71. Gilbert, N. Researchers criticize genetic data restrictions. Nature http://dx.doi.org/10.1038/news.2008.1083 (2008).

  72. Malin, B., Karp, D. & Scheuermann, R. H. Technical and policy approaches to balancing patient privacy and data sharing in clinical and translational research. J. Investig. Med. 58, 11–18 (2010).

  73. Clayton, D. On inferring presence of an individual in a mixture: a Bayesian approach. Biostatistics 11, 661–673 (2010).

  74. Report on the workshop on establishing a central resource of data from genome sequencing projects. National Genome Research Institute [online], (2012).

  75. Schadt, E. E., Woo, S. & Hao, K. Bayesian method to predict individual SNP genotypes from gene expression data. Nature Genet. 44, 603–608 (2012).

  76. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010).

  77. Nyholt, D. R., Yu, C. E. & Visscher, P. M. On Jim Watson's APOE status: genetic information is hard to hide. Eur. J. Hum. Genet. 17, 147–149 (2009). This study clearly shows the limited use of masking sensitive DNA areas.

  78. Humbert, M., Ayday, E., Hubaux, J.-P. & Telenti, A. Addressing the concerns of the Lacks family: quantification of kin genomic privacy. Proc. 2013 ACM SIGSAC Conf. Comput. Commun. Secur. 1141–1152 (2013).

  79. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nature Genet. 40, 1068–1075 (2008).

  80. Kaiser, J. Agency nixes deCODE's new data-mining plan. Science 340, 1388–1389 (2013).

  81. Bambauer, J. R. Tragedy of the data commons. Harvard J. Law Technol. http://dx.doi.org/10.2139/ssrn.1789749 (2011).

  82. Hartzog, W. & Stutzman, F. The case for online obscurity. Calif. Law Rev. 101, 1 (2013).

  83. Taleb, N. N. The Black Swan: the Impact of the Highly Improbable (Random House, 2007).

  84. Shannon, C. Communication theory of secrecy systems. Bell System Techn. J. 28, 656–715 (1949).

  85. Cavoukian, A. Privacy by design. Information and Privacy Commissioner, Ontario, Canada [online], (2009).

  86. Tryka, K. A. et al. NCBI's database of genotypes and phenotypes: dbGaP. Nucleic Acids Res. 42, D975–D979 (2014).

  87. Ramos, E. M. et al. A mechanism for controlled access to GWAS data: experience of the GAIN Data Access Committee. Am. J. Hum. Genet. 92, 479–488 (2013).

  88. Church, G. et al. Public access to genome-wide data: five views on balancing research with privacy and protection. PLoS Genet. 5, e1000665 (2009).

  89. Agrawal, R., Kiernan, J., Srikant, R. & Xu, Y. Hippocratic databases. Proc. 28th Int. Conf. Very Large Databases 143–154 (2002).

  90. Agrawal, R. et al. Auditing compliance with a hippocratic database. Proc. 30th Int. Conf. Very Large Databases 516–527 (2004).

  91. Venter, H. S., Olivier, M. S. & Eloff, J. H. PIDS: a privacy intrusion detection system. Internet Res. 14, 360–365 (2004).

  92. Creating a global alliance to enable responsible sharing of genomic and clinical data. [online], (2013).

  93. Bafna, V. et al. Abstractions for genomics. Commun. ACM 56, 83–93 (2013).

  94. Terry, S. F. & Terry, P. F. Power to the people: participant ownership of clinical trial data. Sci. Transl Med. 3, 69cm3 (2011).

  95. Kaye, J. et al. From patients to partners: participant-centric initiatives in biomedical research. Nature Rev. Genet. 13, 371–376 (2012).

  96. Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzz. 10, 557–570 (2002).

  97. El Emam, K. & Dankar, F. K. Protecting privacy using k-anonymity. J. Am. Med. Informat. Associ. 15, 627–637 (2008).

  98. Malin, B. A. Protecting genomic sequence anonymity with generalization lattices. Methods Inform. Med. 44, 687–692 (2005).

  99. Machanavajjhala, A., Kifer, D., Gehrke, J. & Venkitasubramaniam, M. L-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 3 (2007).

  100. Li, N., Li, T. & Venkatasubramanian, S. t-closeness: privacy beyond k-anonymity and L-diversity. IEEE 23rd Int. Conf. Data Eng. 106–115 (2007).

  101. Dwork, C. Differential privacy. Automata, Languages and Programming 1–12 (Springer Verlag, 2006).

  102. Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J. & Vilhuber, L. Privacy: theory meets practice on the map. IEEE 24th Int. Conf. Data Eng. 277–286 (2008).

  103. Uhler, C., Slavkovic, A. B. & Fienberg, S. E. Privacy-preserving data sharing for genome-wide association studies. arXiv 1205.0739 (2012).

  104. Yu, F., Fienberg, S. E., Slavkovic, A. & Uhler, C. Scalable privacy-preserving data sharing methodology for genome-wide association studies. arXiv 1401.5193 (2014).

  105. Johnson, A. & Shmatikov, V. Privacy-preserving data exploration in genome-wide association studies. Proc. 19th ACM SIGKDD Int. Conf. Knowledge Discov. Data Mining 1079–1087 (2013).

  106. Ayday, E., Raisaro, J. L. & Hubaux, J. P. Privacy-enhancing technologies for medical tests using genomic data. Ecole Polytechnique Federale de Lausanne [online], (2013).

  107. Ayday, E., Raisaro, J. L., McLaren, P. J., Fellay, J. & Hubaux, J.-P. Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data. Proc. USENIX Security Workshop Health Inf. Technol. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.309.1513 (2013). This pioneering work shows the use of homomorphic encryption for privacy-preserving genetic risk predictions.

  108. Atallah, M. J., Kerschbaum, F. & Du, W. Secure and private sequence comparisons. Proc. 2003 ACM Workshop Privacy in Electron. Soc. 39–44 (2003).

  109. Jha, S., Kruger, L. & Shmatikov, V. Towards practical privacy for genomic computation. IEEE Symp. Security and Privacy 216–230 (2008).

  110. Chen, Y., Peng, B., Wang, X. & Tang, H. Large-scale privacy-preserving mapping of human genomic sequences on hybrid clouds. Proc. 19th Annu. Netw. Distributed Syst. Security Symp. (2013). The paper presents an interesting concept of privacy-preserving alignment of high-throughput sequencing data that allows the use of untrusted cloud providers.

  111. Yao, A. C.-C. Protocols for secure computations. 23rd Annu. Symp. Found. Comput. Sci. 160–164 (1982).

  112. Li, H. & Homer, N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform. 11, 473–483 (2010).

  113. Bohannon, P., Jakobsson, M. & Srikwan, S. in Public Key Cryptography (eds Imai, H. & Zheng, Y.) 373–390 (Springer, 2000).

  114. Fons, B., Stefan, K., Klaus, K. & Pim, T. Privacy-preserving matching of DNA profiles. Cryptology ePrint Archive 2008, 203 (2008).

  115. Baldi, P., Baronio, R., Cristofaro, E. D., Gasti, P. & Tsudik, G. Countering GATTACA: efficient and secure testing of fully-sequenced human genomes. Proc. 18th ACM Conf. Comput. Commun. Security 691–702 (2011).

  116. De Cristofaro, E., Faber, S., Gasti, P. & Tsudik, G. Genodroid: are privacy-preserving genomic tests ready for prime time? Proc. 2012 ACM Workshop Privacy in Electron. Soc. 97–108 (2012).

  117. He, D. et al. Identifying genetic relatives without compromising privacy. Genome Res. 24, 664–672 (2014).

  118. Kantarcioglu, M., Jiang, W., Liu, Y. & Malin, B. A cryptographic approach to securely share and query genomic sequences. IEEE Trans. Inf. Technol. Biomed. 12, 606–617 (2008).

  119. Kamm, L., Bogdanov, D., Laur, S. & Vilo, J. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29, 886–893 (2013).

  120. Canim, M., Kantarcioglu, M. & Malin, B. Secure management of biomedical data with cryptographic hardware. IEEE Trans. Inf. Technol. Biomed. 16, 166–175 (2012).

  121. Narayanan, A. What happened to the crypto dream? IEEE Secur. Priv. 11, 75–76 (2013).

  122. Ayday, E., De Cristofaro, E., Hubaux, J.-P. & Tsudik, G. The chills and thrills of whole genome sequencing. Computer http://doi.ieeecomputersociety.org/10.1109/MC.2013.333 (2013). This is a good overview of cryptographic work for protecting genetic data and of open questions in the area.

  123. Presidential Commission for the Study of Bioethical Issues. Privacy and Progress in Whole Genome Sequencing (2012).

  124. Craig, D. W. et al. Assessing and managing risk when sharing aggregate genetic variant data. Nature Rev. Genet. 12, 730–736 (2011).

  125. Braun, R., Rowe, W., Schaefer, C., Zhang, J. & Buetow, K. Needles in the haystack: identifying individuals present in pooled genomic data. PLoS Genet. 5, e1000668 (2009). This is a critical assessment of the performance of ADAD with allele frequency data.

  126. Kendler, K. S., Gallagher, T. J., Abelson, J. M. & Kessler, R. C. Lifetime prevalence, demographic risk factors, and diagnostic validity of nonaffective psychosis as assessed in a US community sample: the National Comorbidity Survey. Arch. Gen. Psychiatry 53, 1022–1031 (1996).

  127. Lee, J. & Clifton, C. in Information Security 325–340 (Springer, 2011).

  128. Hsu, J. et al. Differential privacy: an economic method for choosing epsilon. arXiv 1402.3329 (2014).

  129. Dwork, C., McSherry, F., Nissim, K. & Smith, A. in Theory of Cryptography 265–284 (Springer, 2006).

  130. Paillier, P. in Advances in Cryptology — EUROCRYPT '99 (ed. Stern, J.) 223–238 (Springer, 1999).

  131. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).

  132. Gentry, C. Fully homomorphic encryption using ideal lattices. Proc. 41st Annu. ACM Symp. Theory of Comput. 169–178 (2009).

  133. Wang, R., Li, Y. F., Wang, X. F., Tang, H. & Zhou, X. Learning your identity and disease from research papers: information leaks in genome wide association study. Proc. 16th ACM Conf. Comput. Commun. Security 534–544 (2009).

Acknowledgements

Y.E. is an Andria and Paul Heafy Family Fellow and holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. This study was supported in part by a US National Human Genome Research Institute grant R21HG006167, and by a gift from C. Stone and J. Stone. The authors thank D. Zielinski and M. Gymrek for comments.

Author information

Corresponding author

Correspondence to Yaniv Erlich.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (figure)

Differential privacy statistic of an association study. (PDF 1266 kb)

Glossary

Safe Harbor

A standard in the US Health Insurance Portability and Accountability Act (HIPAA) rule for de-identification of protected health information by removing 18 types of quasi-identifiers.

Haplotypes

Sets of alleles along the same chromosome.

Cryptographic hashing

A procedure that yields a fixed-length output from an input of any size, in a way that makes it hard to determine the input from the output.

Dictionary attacks

Approaches to reverse cryptographic hashing by scanning only highly probable inputs.
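
The two definitions above can be illustrated together. The sketch below is only an assumption-laden toy: it uses SHA-256 from Python's standard `hashlib` as one example of a cryptographic hash, and the candidate list, the hashed value and the helper names `sha256_hex` and `dictionary_attack` are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the fixed-length hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_digest: str, candidates):
    """Hash only the highly probable inputs and return the first one whose
    digest equals the target, or None if no candidate matches."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None


# Hypothetical scenario: an identifier was hashed before release, but it
# comes from a small set of plausible values that an attacker can enumerate.
likely_surnames = ["Smith", "Johnson", "Williams", "Brown"]
hashed_identifier = sha256_hex("Williams")

recovered = dictionary_attack(hashed_identifier, likely_surnames)
```

The digest always has the same length regardless of input size, yet the hash offers little protection when the space of likely inputs is small enough to scan.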

Alice

A common generic name in computer security to denote party A.

Bob

A common generic name in computer security to denote party B.

Type 1 error

The probability of obtaining a positive answer for a negative item.

Linkage equilibrium

Absence of correlation between the alleles at two loci.

Power

The probability of obtaining a positive answer for a positive item.

Specificity

The probability of obtaining a negative answer for a negative item.

Linkage disequilibrium

(LD). The correlation between alleles at two loci.

Effect sizes

The contributions of alleles to the values of particular traits.

Positive predictive value

The probability that a positive answer belongs to a true positive.
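
The glossary entries for type 1 error, power, specificity and positive predictive value can all be computed from the same four counts. The following is a minimal sketch under assumed numbers: the function name `detection_metrics` and the example counts are hypothetical and are not taken from the article.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the four glossary quantities from counts of true positives,
    false positives, true negatives and false negatives."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each denominator must be non-zero")
    return {
        # P(positive answer | negative item)
        "type1_error": fp / (fp + tn),
        # P(positive answer | positive item)
        "power": tp / (tp + fn),
        # P(negative answer | negative item) = 1 - type 1 error
        "specificity": tn / (tn + fp),
        # P(true positive | positive answer)
        "ppv": tp / (tp + fp),
    }


# Hypothetical outcome of testing 1,000 individuals for membership in a cohort.
metrics = detection_metrics(tp=90, fp=5, tn=895, fn=10)
```

With these assumed counts the power is 0.9 and the positive predictive value is 90/95; the latter depends on how many true members exist among those tested, not only on the test itself.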

Expression quantitative trait locus

(eQTL). A genetic variant associated with variability in gene expression.

Genotype imputation

A class of statistical techniques to predict a genotype from information on surrounding genotypes.

Application programming interface

(API). A set of commands that specify the interface with a data set or software applications.

χ2-statistic

A measure of association in case–control genome-wide association studies.
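
As one concrete reading of this entry, the sketch below computes a Pearson χ2 statistic on a 2×2 table of allele counts in cases and controls. This particular allelic form is an assumption chosen for illustration, since the glossary does not say which variant of the test is meant; the function name `allelic_chi_square` and the counts are hypothetical.

```python
def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int) -> float:
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for a 2x2 table of allele counts:

                   allele A   allele B
        cases       case_a     case_b
        controls    control_a  control_b
    """
    n = case_a + case_b + control_a + control_b
    row_cases = case_a + case_b
    row_controls = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    denominator = row_cases * row_controls * col_a * col_b
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    # Closed form for a 2x2 table: N * (ad - bc)^2 / (product of margins).
    cross = case_a * control_b - case_b * control_a
    return n * cross ** 2 / denominator


# Hypothetical allele counts at a single SNP.
chi2 = allelic_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
```

For these assumed counts the statistic is 8.0; identical allele frequencies in cases and controls give 0.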

Read mapping

A computationally intensive step in the analysis of high-throughput sequencing to find the location of a short DNA sequence (string) in the genome.

Edit distance

The minimum total number of insertions, deletions and substitutions needed to transform one string into the other.
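
A minimal sketch of how edit distance can be computed, using the standard dynamic-programming recurrence with unit cost for each insertion, deletion and substitution. The function name `edit_distance` and the example strings are chosen here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


distance = edit_distance("ACGT", "AGT")  # one deletion
```

Keeping only two rows of the table uses memory proportional to the length of `b`, which is sufficient when only the distance, and not the alignment itself, is needed.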

About this article

Cite this article

Erlich, Y., Narayanan, A. Routes for breaching and protecting genetic privacy. Nat Rev Genet 15, 409–421 (2014). https://doi.org/10.1038/nrg3723

  • DOI: https://doi.org/10.1038/nrg3723
