Letter

Secure genome-wide association analysis using multiparty computation

Nature Biotechnology volume 36, pages 547–551 (2018)

Abstract

Most sequenced genomes are currently stored in strict access-controlled repositories [1–3]. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and aid the discovery of new drug targets [4,5]. However, concerns over genetic data privacy [6–9] may deter individuals from contributing their genomes to scientific studies [10] and could prevent researchers from sharing data with the scientific community [11]. Although cryptographic techniques for secure data analysis exist [12–14], none scales to computationally intensive analyses, such as GWAS. Here we describe a protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable secure genome crowdsourcing, allowing individuals to contribute their genomes to a study without compromising their privacy.
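
To illustrate the kind of computation that multiparty techniques make possible, the following is a minimal sketch of additive secret sharing, one standard building block of secure multiparty computation. It is not the protocol described in this Letter: the modulus `PRIME`, the helper names `share`, `reconstruct`, and `secure_allele_count`, the three-party setup, and the toy genotypes are all assumptions chosen for illustration. Each individual's minor-allele count (0, 1, or 2) is split into random shares, the parties add the shares they hold, and only the aggregate count is reconstructed.

```python
import secrets

# Illustrative modulus (an assumption): the Mersenne prime 2**61 - 1.
PRIME = 2**61 - 1


def share(value, n_parties=3):
    """Split `value` into n_parties additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to `value` mod PRIME.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_allele_count(genotypes, n_parties=3):
    """Sum genotypes so that no single party sees any individual value."""
    # Each individual secret-shares their genotype across the parties.
    per_individual = [share(g, n_parties) for g in genotypes]
    # Each party locally adds the shares it holds (one column per party).
    party_totals = [sum(col) % PRIME for col in zip(*per_individual)]
    # Only the combined total is revealed.
    return reconstruct(party_totals)


# Toy minor-allele counts (0, 1, or 2) for five hypothetical individuals.
genotypes = [0, 1, 2, 1, 0]
total = secure_allele_count(genotypes)
print(total)  # 4
```

Sums are the easy case: shares can be added locally with no communication, whereas multiplying two shared values requires interaction between the parties.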

References

  1. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  2. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).

  3. et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int. J. Epidemiol. 40, 1652–1666 (2011).

  4. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am. J. Epidemiol. 164, 609–614 (2006).

  5. Required sample size and nonreplicability thresholds for heterogeneous genetic associations. Proc. Natl. Acad. Sci. USA 105, 617–622 (2008).

  6. Be prepared for the big genome leak. Nature 498, 139 (2013).

  7. Identifying personal genomes by surname inference. Science 339, 321–324 (2013).

  8. Privacy risks from genomic data-sharing beacons. Am. J. Hum. Genet. 97, 631–646 (2015).

  9. Quantification of private information leakage from phenotype-genotype data: linking attacks. Nat. Methods 13, 251–256 (2016).

  10. et al. Motivations, concerns and preferences of personal genome sequencing research participants: baseline findings from the HealthSeq project. Eur. J. Hum. Genet. 24, 14–20 (2016).

  11. Beyond our borders? Public resistance to global genomic data sharing. PLoS Biol. 14, e2000206 (2016).

  12. Secure Multiparty Computation (Cambridge University Press, 2015).

  13. Fully homomorphic encryption using ideal lattices. STOC '09 Proceedings of the Forty-First Annual ACM symposium on Theory of Computing 169–178 (2009).

  14. Protocols for secure computations. IEEE Annual Symposium on Foundations of Computer Science 160–164 (1982).

  15. et al. A community assessment of privacy preserving techniques for human genomes. BMC Med. Inform. Decis. Mak. 14 (Suppl. 1), S1 (2014).

  16. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29, 886–893 (2013).

  17. Efficient secure outsourcing of genome-wide association studies. IEEE Security and Privacy Workshops 3–6, doi:10.1109/SPW.2015.11 (2015).

  18. et al. HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics 32, 211–218 (2016).

  19. Privacy-preserving GWAS analysis on federated genomic datasets. BMC Med. Inform. Decis. Mak. 15 (Suppl. 5), S2 (2015).

  20. Implementation and evaluation of an algorithm for cryptographically private principal component analysis on genomic data. 3rd International Workshop on Genome Privacy and Security (2016).

  21. et al. Privacy-preserving genome-wide association study is practical. Cryptology ePrint Archive (2017).

  22. Deriving genomic diagnoses without revealing patient genomes. Science 357, 692–695 (2017).

  23. et al. Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36, 388–393 (2004).

  24. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  25. Completeness Theorems for Non-Cryptographic Fault-Tolerant Distributed Computation. STOC '88 Proceedings of the Twentieth Annual ACM symposium on Theory of Computing 1–10 (1988).

  26. Sharemind: a framework for fast privacy-preserving computations. ESORICS 5283, 192–206 (2008).

  27. Multiparty computation from somewhat homomorphic encryption. CRYPTO 2012, 643–662 (2012).

  28. MASCOT: faster malicious arithmetic secure computation with oblivious transfer. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security 830–842 (2016).

  29. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 217–288 (2011).

  30. et al. Fast principal-component analysis reveals convergent evolution of ADH1B in Europe and East Asia. Am. J. Hum. Genet. 98, 456–472 (2016).

  31. et al. Interactions between household air pollution and GWAS-identified lung cancer susceptibility markers in the Female Lung Cancer Consortium in Asia (FLCCA). Hum. Genet. 134, 333–341 (2015).

  32. et al. Association of granulomatosis with polyangiitis (Wegener's) with HLA-DPB1*04 and SEMA6A gene variants: evidence from genome-wide analysis. Arthritis Rheum. 65, 2457–2468 (2013).

  33. Urinary bladder cancer in Wegener's granulomatosis: risks and relation to cyclophosphamide. Ann. Rheum. Dis. 63, 1307–1311 (2004).

  34. et al. Inferring fine-grained control flow inside SGX enclaves with branch shadowing. Proceedings of the 26th USENIX Security Symposium 557–574 (USENIX Association, 2017).

  35. Controlled-channel attacks: deterministic side channels for untrusted operating systems. Proceedings of the 2015 IEEE Symposium on Security and Privacy 640–656 (2015).

  36. Enabling privacy-preserving GWASs in heterogeneous human populations. Cell Syst. 3, 54–61 (2016).

  37. Realizing privacy preserving genome-wide association studies. Bioinformatics 32, 1293–1300 (2016).

  38. et al. Genome-wide association analysis identifies new lung cancer susceptibility loci in never-smoking women in Asia. Nat. Genet. 44, 1330–1335 (2012).

  39. et al. Genome-wide association study identifies multiple loci associated with bladder cancer risk. Hum. Mol. Genet. 23, 1387–1398 (2014).

  40. et al. A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat. Genet. 48, 134–143 (2016).

  41. et al. NCBI's Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res. 42, D975–D979 (2014).

Acknowledgements

H.C. and B.B. are partially supported by US National Institutes of Health grant GM108348 (to B.B.). H.C. is also partially supported by the Kwanjeong Educational Foundation. D.J.W. is supported by fellowships from the Simons Foundation and the National Science Foundation.

Author information

Affiliations

  1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Hyunghoon Cho
    • Bonnie Berger
  2. Department of Computer Science, Stanford University, Stanford, California, USA.

    • David J Wu
  3. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Bonnie Berger

Authors

  1. Hyunghoon Cho

  2. David J Wu

  3. Bonnie Berger

Contributions

H.C., D.J.W., and B.B. developed the methods. H.C. implemented the software and performed experiments with assistance from D.J.W. and B.B. B.B. supervised the project. All authors wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Bonnie Berger.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–2

  2. Life Sciences Reporting Summary

  3. Supplementary Tables

    Supplementary Tables 1–3

  4. Supplementary Notes

    Supplementary Notes 1–12

Zip files

  1. Supplementary Code

    An implementation of our secure GWAS protocol in C++.

About this article

DOI

https://doi.org/10.1038/nbt.4108