  • Brief Communication
Citizen-centered, auditable and privacy-preserving population genomics

A preprint version of the article is available at bioRxiv.

Abstract

The growing number of health-data breaches, the use of genomic databases for law enforcement purposes and the lack of transparency of personal genomics companies are raising unprecedented privacy concerns. To enable a secure exploration of genomic datasets with controlled and transparent data access, we propose a citizen-centric approach that combines cryptographic privacy-preserving technologies, such as homomorphic encryption and secure multi-party computation, with the auditability of blockchains. Our open-source implementation supports queries on the encrypted genomic data of hundreds of thousands of individuals, with minimal overhead. We show that real-world adoption of our system alleviates widespread privacy concerns and encourages data access sharing with researchers.
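To make the idea of computing on protected data concrete, here is a minimal sketch of additive secret sharing, one common building block of secure multi-party computation. It is an illustration only and not the protocol implemented in the paper: the modulus, the three-party setup and the function names (`share`, `reconstruct`, `secure_count`) are all assumptions chosen for this example.

```python
import secrets

# Assumed modulus for the illustration; any modulus larger than the
# biggest possible count would work.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % MODULUS


def secure_count(indicators, n_parties=3):
    """Count how many individuals carry a variant without any single
    party seeing an individual's 0/1 indicator in the clear."""
    party_totals = [0] * n_parties
    for bit in indicators:
        # Each individual sends one share to each party.
        for i, s in enumerate(share(bit, n_parties)):
            party_totals[i] = (party_totals[i] + s) % MODULUS
    # Only the per-party totals are combined, revealing just the aggregate.
    return reconstruct(party_totals)


if __name__ == "__main__":
    # Hypothetical indicators: 1 means the individual carries the variant.
    print(secure_count([1, 0, 1, 1, 0]))
```

Each party only ever sees uniformly random-looking shares, so the individual indicators stay hidden unless all parties collude; only the aggregate count is revealed.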

Fig. 1: Results of a survey of 442 individuals interested in genetic testing and data sharing.
Fig. 2: Overview of the system.
Fig. 3: Performance of data discovery with a dataset of 150,000 individuals.

Data availability

To evaluate the performance of our system, we used open-access genomic and clinical data from The Cancer Genome Atlas (TCGA). The initial dataset contained one million genetic variants distributed across 8,000 individuals. We obtained it by downloading and merging genomic data for the 32 studies of the TCGA PanCancer Atlas from https://www.cbioportal.org/, and it is available at https://github.com/ldsec/projects-data/tree/master/medco/datasets/genomic/tcga_cbio. We synthetically augmented this initial dataset to obtain the final test dataset of 28 billion genetic variants distributed across 150,000 individuals (each individual has between 15,000 and 200,000 genetic variants). This dataset is very large (on the order of tens of terabytes), so we cannot host it online. Instructions for reproducing the dataset augmentation are provided in the GitHub repository. The repository also includes a small demo dataset (https://github.com/ldsec/ccgd-platform) that is sufficient to test the code. To request access to the large dataset used for the performance benchmarks, contact juan.troncoso-pastoriza@epfl.ch.
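As a quick sanity check on the figures quoted above, the following sketch computes the average number of variants per individual in the final dataset and the scale-up relative to the initial dataset. The constants are taken from the statement above; the variable and function names are assumptions introduced for this example.

```python
# Figures quoted in the Data availability statement.
INITIAL_VARIANTS = 1_000_000
INITIAL_INDIVIDUALS = 8_000
FINAL_VARIANTS = 28_000_000_000
FINAL_INDIVIDUALS = 150_000
PER_INDIVIDUAL_RANGE = (15_000, 200_000)  # stated min and max per individual


def mean_variants_per_individual(variants, individuals):
    """Average number of variants carried by one individual."""
    return variants / individuals


def scale_factors():
    """Growth of the final dataset over the initial one (variants, individuals)."""
    return (
        FINAL_VARIANTS / INITIAL_VARIANTS,
        FINAL_INDIVIDUALS / INITIAL_INDIVIDUALS,
    )


mean_final = mean_variants_per_individual(FINAL_VARIANTS, FINAL_INDIVIDUALS)
within_range = PER_INDIVIDUAL_RANGE[0] <= mean_final <= PER_INDIVIDUAL_RANGE[1]

if __name__ == "__main__":
    print(f"mean variants per individual: {mean_final:,.0f}")
    print(f"mean lies within stated range: {within_range}")
    print(f"scale factors (variants, individuals): {scale_factors()}")
```

The average of roughly 187,000 variants per individual falls inside the stated 15,000 to 200,000 range, so the quoted totals are mutually consistent.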

Code availability

The code is available on GitHub at https://github.com/ldsec/ccgd-platform and is archived on Zenodo at https://doi.org/10.5281/zenodo.4551165 (ref. 13).

References

  1. Regalado, A. More than 26 million people have taken an at-home ancestry test. MIT Technology Review (11 February 2019).

  2. Farr, C. 23andMe lays off 100 people as DNA test sales decline, CEO says she was ‘surprised’ to see market turn. CNBC https://www.cnbc.com/2020/01/23/23andme-lays-off-100-people-ceo-anne-wojcicki-explains-why.html (2020).

  3. Farr, C. Ancestry to lay off 6% of workforce because of a slowdown in the consumer DNA-testing market. CNBC https://www.cnbc.com/2020/02/05/ancestry-layoffs-of-6percent-100-people-amid-dna-test-slowdown.html (2020).

  4. Huang, Z., Ayday, E., Fellay, J., Hubaux, J. & Juels, A. GenoGuard: protecting genomic data against brute-force attacks. In Proc. 2015 IEEE Symposium on Security and Privacy 447–462 (IEEE, 2015).

  5. Cho, H., Wu, D. J. & Berger, B. Secure genome-wide association analysis using multiparty computation. Nat. Biotechnol. 36, 547–551 (2018).

  6. Jagadeesh, K. A., Wu, D. J., Birgmeier, J. A., Boneh, D. & Bejerano, G. Deriving genomic diagnoses without revealing patient genomes. Science 357, 692–695 (2017).

  7. Bogdanov, D., Laur, S. & Willemson, J. Sharemind: a framework for fast privacy-preserving computations. In Proc. Computer Security ESORICS 2008 192–206 (Springer, 2008).

  8. Boura, C. et al. in Financial Cryptography and Data Security 183–202 (Springer, 2018).

  9. Raisaro, J. L. et al. MedCo: enabling secure and privacy-preserving exploration of distributed clinical and genomic data. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 1328–1341 (2018).

  10. Grishin, D., Obbad, K. & Church, G. M. Data privacy in the age of personal genomics. Nat. Biotechnol. 37, 1115–1117 (2019).

  11. Froelicher, D. et al. UnLynx: a decentralized system for privacy-conscious data sharing. Proc. Priv. Enhancing Technol. 2017, 232–250 (2017).

  12. Luterbacher, C. EPFL software to enable secure data-sharing for hospitals. EPFL (2 April 2019).

  13. de Sa Sousa, J. A. G., Misbach, M., Quinn, K., Pastoriza, J. R. T. & Grishin, D. ldsec/ccgd-platform: Citizen-Centered Genomic Discovery Platform (Zenodo, 2021); https://doi.org/10.5281/zenodo.4551165

Acknowledgements

We are grateful to the Bitfury Exonum engineering team for running part of the performance measurements described in this paper. We also thank C. Redin from Lausanne University Hospital, J. Aach from Harvard Medical School and A. Pyrgelis from EPFL for their useful feedback and constructive comments on the manuscript. This work was supported in part by grant no. 2017-201 (DPPH) of the Swiss strategic focus area Personalized Health and Related Technologies (PHRT) and by grant no. 2018-522 (MedCo) of the PHRT and the Swiss Personalized Health Network (SPHN). The work was also supported by Nebula Genomics.

Author information

Contributions

D.G., J.L.R. and J.T.P. conceived the study. D.G., J.L.R., J.T.P. and J.P.H. wrote the manuscript. D.G., J.L.R., J.T.P., K.O., J.G., M.M., K.Q. and J.S. implemented the algorithms described in the Methods and performed the benchmark experiments. All authors discussed and reviewed the manuscript at all stages. J.F., G.M.C. and J.P.H. supervised the work.

Corresponding authors

Correspondence to Dennis Grishin or Jean-Pierre Hubaux.

Ethics declarations

Competing interests

D.G., K.O., K.Q. and G.M.C. are affiliated with Nebula Genomics. J.L.R., J.T.P., M.M., J.S., J.F. and J.P.H. declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks Fida Dankar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Ananya Rastogi was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Security and privacy background, example of secure distributed aggregate-data analysis, Supplementary Results (Figs. 1–9) and Supplementary Methods (Figs. 10–18).

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Source data

Source Data Fig. 1

Survey responses.

Source Data Fig. 3

Code benchmarks.

About this article

Cite this article

Grishin, D., Raisaro, J.L., Troncoso-Pastoriza, J.R. et al. Citizen-centered, auditable and privacy-preserving population genomics. Nat Comput Sci 1, 192–198 (2021). https://doi.org/10.1038/s43588-021-00044-9

  • DOI: https://doi.org/10.1038/s43588-021-00044-9
