  • Perspective

Functional genomics data: privacy risk assessment and technological mitigation

An Author Correction to this article was published on 22 November 2021

This article has been updated

Abstract

The generation of functional genomics data by next-generation sequencing has increased greatly in the past decade. Broad sharing of these data is essential for research advancement but poses notable privacy challenges, some of which are analogous to those that occur when sharing genetic variant data. However, there are also unique privacy challenges that arise from cryptic information leakage during the processing and summarization of functional genomics data from raw reads to derived quantities, such as gene expression values. Here, we review these challenges and present potential solutions for mitigating privacy risks while allowing broad data dissemination and analysis.
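The abstract's central concern is that functional genomics reads can leak genotypes. A toy linking attack makes this concrete. The sketch below is a hypothetical illustration, not the authors' method. It rests on several assumptions:

- Reference and alternate read counts are available at known SNPs.
- Genotypes are coded as 0, 1 or 2 alternate alleles and called with naive allele-fraction cutoffs. The 0.2/0.8 thresholds and `min_depth=5` are arbitrary choices.
- Individuals in a made-up genotype panel are ranked by the fraction of SNP calls that match.

The names `call_genotype`, `call_all`, `link`, `donor_A` and `donor_B` are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

# Genotype at one SNP, coded as the number of alternate alleles (0, 1 or 2).
# This coding, the thresholds and all names below are illustrative assumptions.
Genotypes = Dict[str, int]


def call_genotype(ref_reads: int, alt_reads: int, min_depth: int = 5) -> Optional[int]:
    """Naive genotype call from reference/alternate read counts at one SNP."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None  # too little coverage to make a call
    alt_fraction = alt_reads / depth
    if alt_fraction < 0.2:
        return 0  # homozygous reference
    if alt_fraction > 0.8:
        return 2  # homozygous alternate
    return 1  # heterozygous


def call_all(pileup: Dict[str, Tuple[int, int]]) -> Genotypes:
    """Call genotypes at every SNP with enough coverage in a read pileup."""
    calls: Genotypes = {}
    for snp, (ref_reads, alt_reads) in pileup.items():
        genotype = call_genotype(ref_reads, alt_reads)
        if genotype is not None:
            calls[snp] = genotype
    return calls


def link(query: Genotypes, panel: Dict[str, Genotypes]) -> List[Tuple[str, float]]:
    """Rank panel individuals by the fraction of shared SNPs that match the query."""
    scores: List[Tuple[str, float]] = []
    for name, genotypes in panel.items():
        shared = [snp for snp in query if snp in genotypes]
        if not shared:
            continue  # no overlapping SNPs to compare for this individual
        matches = sum(query[snp] == genotypes[snp] for snp in shared)
        scores.append((name, matches / len(shared)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy pileup: (reference reads, alternate reads) per SNP from a hypothetical sample.
    pileup = {"rs1": (10, 0), "rs2": (6, 5), "rs3": (0, 12), "rs4": (2, 1)}
    # Toy genotype panel standing in for an identified genotype database.
    panel = {
        "donor_A": {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1},
        "donor_B": {"rs1": 1, "rs2": 1, "rs3": 0},
    }
    print(link(call_all(pileup), panel))
```

With this toy data, `rs4` is skipped for low coverage. `donor_A` ranks first because all three called SNPs match. A realistic analysis would likely use probabilistic genotype likelihoods rather than hard cutoffs. It would also need to account for allele-specific expression, which can skew allele fractions in RNA-seq. Finally, it would have to assess how many matching SNPs are needed before a match is meaningful.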

Fig. 1: Private information leakage in functional genomics data.
Fig. 2: Genetic characterization of an individual through functional genomics.
Fig. 3: Cryptic private information in functional genomics data.
Fig. 4: Challenges in accessing private functional genomics data and potential solutions from data sanitization.
Fig. 5: Cryptographic techniques to perform confidential eQTL mapping.
Fig. 6: Future perspectives for the privacy of functional genomics data.


Acknowledgements

This study was supported by grants from the US National Institutes of Health (R01 HG010749 to M.B.G. and K99 HG010909 to G.G.). This work is also supported by the A.L. Williams Professorship Fund.

Author information

Contributions

All authors researched the literature. G.G., T.L., S.L., E.N. and M.B.G. provided substantial contributions to discussions of the content. G.G., T.L., E.N., C.M.B. and M.B.G. wrote the article. G.G., C.M.B. and M.B.G. reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Mark B. Gerstein.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Genetics thanks Yann Joly, A. Ercüment Çiçek and Xinghua Mindy Shi for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gürsoy, G., Li, T., Liu, S. et al. Functional genomics data: privacy risk assessment and technological mitigation. Nat Rev Genet 23, 245–258 (2022). https://doi.org/10.1038/s41576-021-00428-7

  • DOI: https://doi.org/10.1038/s41576-021-00428-7
