Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses

Barton, Alison R.; Sherman, Maxwell A.; Mukamel, Ronen E.; Loh, Po-Ru

doi:10.1038/s41588-021-00892-1

Analysis
Published: 05 July 2021

Nature Genetics volume 53, pages 1260–1269 (2021)

Abstract

Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R² > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10⁻⁸) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct ‘likely-causal’ variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.
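The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the authors' scripts: the function names `expected_minor_allele_copies` and `dosage_r2` are hypothetical, and treating imputation accuracy R² as the squared Pearson correlation between sequenced genotypes and imputed dosages is an assumption about the metric, not a statement of the exact estimator used in the paper.

```python
from statistics import mean


def expected_minor_allele_copies(maf: float, n_individuals: int) -> float:
    """Expected number of minor-allele copies among n diploid individuals (2 * n * MAF)."""
    return 2 * n_individuals * maf


def dosage_r2(true_genotypes, imputed_dosages) -> float:
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages.

    Assumption: this is one common way to summarize imputation accuracy;
    it may differ from the estimator used in the paper.
    """
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector has no variation
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    maf = 0.00005  # lowest frequency at which the abstract reports R² > 0.5
    print(expected_minor_allele_copies(maf, 49_960))   # exome-sequenced reference panel
    print(expected_minor_allele_copies(maf, 500_000))  # approximate full cohort
    print(round(100 * 578 / 1189))                     # share of associations in multi-hit genes, in percent
```

Under these assumptions, a variant at MAF 0.00005 is expected to appear in roughly 5 allele copies among the 49,960 sequenced participants and roughly 50 copies across the full cohort, which illustrates why imputing into the remaining participants adds carriers for association testing.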

**Fig. 1: Whole-exome imputation, association and fine mapping identify rare coding variants likely to causally associate with 54 quantitative traits.**

**Fig. 2: Association analyses of the subsequent n = 200,643 UKB exome release demonstrate robustness of likely-causal variant–trait associations ascertained using genotypes imputed from n = 49,960 exomes.**

**Fig. 3: Likely-causal coding variants are rare and enriched for deleteriousness.**

**Fig. 4: Many genes contain long allelic series of rare coding variants with consistent effect directions.**

Data availability

Access to the UKB Resource is available by application (http://www.ukbiobank.ac.uk/). Exome-wide summary association statistics for the 54 quantitative traits we analyzed are available at https://data.broadinstitute.org/lohlab/UKB_exomeWAS/ and data files containing allelic series for all gene–trait associations with multiple likely-causal variants are also available at this website.

Code availability

The following publicly available software packages were used to perform analyses: Eagle2 (v.2.3.5), https://data.broadinstitute.org/alkesgroup/Eagle/; Minimac4 (v.1.0.1), https://genome.sph.umich.edu/wiki/Minimac4; BOLT-LMM (v.2.3.4), https://data.broadinstitute.org/alkesgroup/BOLT-LMM/; FINEMAP (v.1.3.1), http://www.christianbenner.com/; plink (v.1.9 and v.2.0), https://www.cog-genomics.org/plink2/; and tsinfer (v.0.1.4), https://tsinfer.readthedocs.io/en/latest/. Information from the following databases was also used: VEP (v.95 on GRCh37 with GENCODE 19), https://www.ensembl.org/vep; CADD (v.1.5), https://cadd.gs.washington.edu/download; SpliceAI (v.1.2.1), https://github.com/Illumina/SpliceAI; NHGRI-EBI GWAS Catalog (v.1.0), https://www.ebi.ac.uk/gwas/home; TOPMed (v.r2, 97,256 TOPMed samples), https://imputation.biodatacatalyst.nhlbi.nih.gov/#!pages/about; Protein Data Bank, https://www.rcsb.org/; SWISS-MODEL, https://swissmodel.expasy.org/; and PANTHER (v.15.0), http://www.pantherdb.org/. Scripts used to perform the downstream analyses described above are available at https://data.broadinstitute.org/lohlab/UKB_exomeWAS/ (https://doi.org/10.5281/zenodo.4771214).

Acknowledgements

We thank A. Gusev, M. Hujoel, P. Palamara, A. Price and S. Sunyaev for helpful discussions. This research was conducted using the UKB Resource under application no. 10438. A.R.B. was supported by US NIH grant T32 HG229516 and fellowship F31 HL154537. M.A.S. was supported by the MIT John W. Jarve (1978) Seed Fund for Science Innovation and US NIH Fellowship F31 MH124393. R.E.M. was supported by US NIH grant K25 HL150334 and NSF grant DMS-1939015. P.-R.L. was supported by US NIH grant DP2 ES030554, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces, the Next Generation Fund at the Broad Institute of MIT and Harvard, and a Sloan Research Fellowship. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Computational analyses were performed on the O2 High Performance Compute Cluster, supported by the Research Computing Group, at Harvard Medical School (http://rc.hms.harvard.edu).

Author information

Authors and Affiliations

Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Alison R. Barton, Maxwell A. Sherman, Ronen E. Mukamel & Po-Ru Loh
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Alison R. Barton, Maxwell A. Sherman, Ronen E. Mukamel & Po-Ru Loh
Bioinformatics and Integrative Genomics Program, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Alison R. Barton
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Maxwell A. Sherman

Contributions

A.R.B. and P.-R.L. performed statistical analyses and wrote the manuscript. M.A.S. and R.E.M. provided substantial input on all analyses and on the manuscript.

Corresponding authors

Correspondence to Alison R. Barton or Po-Ru Loh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Genetics thanks S. Petrovski and S. Carmi for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–5 and Figs. 1–11

Reporting Summary

Peer Review Information

Supplementary Tables

Supplementary Tables 1–15

About this article

Cite this article

Barton, A.R., Sherman, M.A., Mukamel, R.E. et al. Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses. Nat Genet 53, 1260–1269 (2021). https://doi.org/10.1038/s41588-021-00892-1

Received: 21 August 2020
Accepted: 28 May 2021
Published: 05 July 2021
Issue Date: August 2021
DOI: https://doi.org/10.1038/s41588-021-00892-1
