Abstract
The sharing of genomic data holds great promise for advancing precision medicine and providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns, and data misuse could lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, understanding the current genome privacy landscape and identifying the challenges in developing effective privacy-protecting solutions are imperative. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We additionally present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations in current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.
Acknowledgements
This work was supported by National Human Genome Research Institute grant K99HG010493 to L.B., and by National Institute of General Medical Sciences grant R01GM118609 and National Heart, Lung, and Blood Institute grant R01HL136835 to L.O.-M.
Author information
Contributions
L.B. conducted the literature review, drafted the organization of the article and contributed most of the writing. Y.H. contributed to the sections on data sharing in DTC genetic testing and provided helpful comments on the presentation. L.O.-M. provided the motivation for this work and provided detailed edits and critical suggestions on the organization and structure of the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Table 1
About this article
Cite this article
Bonomi, L., Huang, Y. & Ohno-Machado, L. Privacy challenges and research opportunities for genomic data sharing. Nat Genet 52, 646–654 (2020). https://doi.org/10.1038/s41588-020-0651-0
DOI: https://doi.org/10.1038/s41588-020-0651-0
This article is cited by
- Sharing GWAS summary statistics results in more citations. Communications Biology (2023)
- GRANDPA: GeneRAtive network sampling using degree and property augmentation applied to the analysis of partially confidential healthcare networks. Applied Network Science (2023)
- Parallel and private generalized suffix tree construction and query on genomic data. BMC Genomic Data (2022)
- Sociotechnical safeguards for genomic data privacy. Nature Reviews Genetics (2022)
- Federated learning and Indigenous genomic data sovereignty. Nature Machine Intelligence (2022)