Abstract
Both common and rare genetic variants influence complex traits and common diseases. Genome-wide association studies have identified thousands of common-variant associations, and more recently, large-scale exome sequencing studies have identified rare-variant associations in hundreds of genes1,2,3. However, rare-variant genetic architecture is not well characterized, and the relationship between common-variant and rare-variant architecture is unclear4. Here we quantify the heritability explained by the gene-wise burden of rare coding variants across 22 common traits and diseases in 394,783 UK Biobank exomes5. Rare coding variants (allele frequency < 1 × 10−3) explain 1.3% (s.e. = 0.03%) of phenotypic variance on average—much less than common variants—and most burden heritability is explained by ultrarare loss-of-function variants (allele frequency < 1 × 10−5). Common and rare variants implicate the same cell types, with similar enrichments, and they have pleiotropic effects on the same pairs of traits, with similar genetic correlations. They partially colocalize at individual genes and loci, but not to the same extent: burden heritability is strongly concentrated in significant genes, while common-variant heritability is more polygenic, and burden heritability is also more strongly concentrated in constrained genes. Finally, we find that burden heritability for schizophrenia and bipolar disorder6,7 is approximately 2%. Our results indicate that rare coding variants will implicate a tractable number of large-effect genes, that common and rare associations are mechanistically convergent, and that rare coding variants will contribute only modestly to missing heritability and population risk stratification.
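To make the quantity being estimated more concrete, here is a toy calculation of burden heritability. This is a minimal sketch under several assumptions: the phenotype has unit variance, variants are independent and in Hardy–Weinberg equilibrium, and all qualifying rare variants in gene g share one per-allele effect γ_g. Under those assumptions, a variant with allele frequency p contributes 2p(1 − p)γ_g² to phenotypic variance. The gene's burden contribution is therefore γ_g² multiplied by what this sketch calls its burden score, Σ_j 2p_j(1 − p_j).

The helper `burden_heritability`, the gene names and all numbers are hypothetical. The function computes the target quantity from known effects. It is not the BHR estimator, which works from association summary statistics.

```python
from typing import Dict, List

def burden_heritability(gene_effects: Dict[str, float],
                        gene_freqs: Dict[str, List[float]]) -> float:
    """Phenotypic variance explained by gene-wise burden (hypothetical helper).

    Assumes a unit-variance phenotype, independent variants in Hardy-Weinberg
    equilibrium and one shared per-allele effect gamma_g within each gene.
    """
    total = 0.0
    for gene, gamma in gene_effects.items():
        # Burden score: summed genotype variance 2p(1-p) over the gene's qualifying variants.
        burden_score = sum(2 * p * (1 - p) for p in gene_freqs[gene])
        total += gamma ** 2 * burden_score
    return total

# Toy input: two genes carrying only ultrarare variants (allele frequency < 1e-5).
freqs = {"GENE_A": [5e-6] * 200, "GENE_B": [2e-6] * 50}
effects = {"GENE_A": 1.0, "GENE_B": 0.5}
print(burden_heritability(effects, freqs))
```

Ultrarare variants contribute very little genotype variance each. Even a one-standard-deviation per-allele effect in GENE_A therefore explains only about 0.2% of phenotypic variance in this toy example.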
Data availability
All data used in this manuscript are publicly available and documented in the Supplementary Tables. All results are available in the Supplementary Tables. Neale Lab UKB GWAS summary statistics are available at http://www.nealelab.is/uk-biobank/. Genebass summary statistics are available at https://app.genebass.org. SCHEMA is available at https://schema.broadinstitute.org. BipEx is available at https://bipex.broadinstitute.org. Differentially expressed gene sets are available at https://alkesgroup.broadinstitute.org. Gene-level constraint data are available at https://gnomad.broadinstitute.org. COSMIC cancer gene sets are available at https://cancer.sanger.ac.uk/census.
Code availability
BHR (v.0.1.0) is implemented in R, and its source code is publicly available at GitHub (https://github.com/ajaynadig/bhr) and Zenodo (https://doi.org/10.5281/zenodo.7382799). We have also published scripts enabling the results of the manuscript to be reproduced using publicly available data (Data availability); these are implemented in R, Python, Hail and MATLAB. We also used AMM (https://github.com/danjweiner/AMM21), LDSC (v.1.0.1; https://github.com/bulik/ldsc), HESS (v.0.5.3; https://huwenboshi.github.io/hess/), Genomic SEM (v.0.0.5c; https://github.com/GenomicSEM/GenomicSEM) and GCTA (v.1.94.1; https://yanglab.westlake.edu.cn/software/gcta/#GREMLanalysis).
References
Sun, B. B. et al. Genetic associations of protein-coding variants in human disease. Nature 603, 95–102 (2022).
Wang, Q. et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).
Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genomics 2, 100168 (2022).
Singh, T. et al. Rare coding variants in ten genes confer substantial risk for schizophrenia. Nature 604, 509–516 (2022).
Palmer, D. S. et al. Exome sequencing in bipolar disorder identifies AKAP11 as a risk gene shared with schizophrenia. Nat. Genet. 54, 541–547 (2022).
International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Brainstorm Consortium. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51, 1339–1348 (2019).
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022).
O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018).
Liu, D. J. & Leal, S. M. Estimating genetic effects and quantifying missing heritability explained by identified rare-variant associations. Am. J. Hum. Genet. 91, 585–596 (2012).
Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95, 5–23 (2014).
Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
Jang, S.-K. et al. Rare genetic variants explain missing heritability in smoking. Nat. Hum. Behav. 6, 1577–1586 (2022).
Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).
Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99, 139–153 (2016).
Palmer, C. & Pe’er, I. Statistical correction of the winner’s curse explains replication variability in quantitative trait genome-wide association studies. PLoS Genet. 13, e1006916 (2017).
Weiner, D. J., Gazal, S., Robinson, E. B. & O’Connor, L. J. Partitioning gene-mediated disease heritability without eQTLs. Am. J. Hum. Genet. 109, 405–416 (2022).
Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery. Preprint at bioRxiv https://doi.org/10.1101/2022.05.07.491045 (2022).
Gardner, E. J. et al. Reduced reproductive success is associated with selective constraint on human genes. Nature 603, 858–863 (2022).
Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007).
Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012).
Purcell, S. M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–190 (2014).
Simons, Y. B., Bullaughey, K., Hudson, R. R. & Sella, G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985 (2018).
Fu, J. M. et al. Rare coding variation provides insight into the genetic architecture and phenotypic context of autism. Nat. Genet. 54, 1320–1331 (2022).
Border, R. et al. Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science 378, 754–761 (2022).
Genovese, G. et al. Increased burden of ultra-rare protein-altering variants among 4,877 individuals with schizophrenia. Nat. Neurosci. 19, 1433–1441 (2016).
Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
Baselmans, B. M. L., Yengo, L., van Rheenen, W. & Wray, N. R. Risk in relatives, heritability, SNP-based heritability, and genetic correlations in psychiatric disorders: a review. Biol. Psychiatry 89, 11–19 (2021).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at bioRxiv https://doi.org/10.1101/148353 (2017).
Lefebvre, S. et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell 80, 155–165 (1995).
Mendell, J. R. et al. Single-dose gene-replacement therapy for spinal muscular atrophy. N. Engl. J. Med. 377, 1713–1722 (2017).
Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).
Kim, S. S. et al. Genes with high network connectivity are enriched for disease heritability. Am. J. Hum. Genet. 104, 896–913 (2019).
Forgetta, V. et al. An effector index to predict target genes at GWAS loci. Hum. Genet. 141, 1431–1447 (2022).
Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034 (2019).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 11, 3635 (2020).
Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596 (2019).
Biddinger, K. J. et al. Rare and common genetic variation underlying the risk of hypertrophic cardiomyopathy in a national biobank. JAMA Cardiol. 7, 715–722 (2022).
Bishop, S. L., Thurm, A., Robinson, E. & Sanders, S. J. Prevalence of returnable genetic results based on recognizable phenotypes among children with autism spectrum disorder. Preprint at bioRxiv https://doi.org/10.1101/2021.05.28.21257736 (2021).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
Zhou, W. et al. Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts. Nat. Genet. 52, 634–639 (2020).
Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).
Acknowledgements
We thank S. Gazal, D. King, A. Price and K. Samocha for analytic assistance and comments on this manuscript, and J. Duan for identifying an issue in the first draft of our manuscript. We acknowledge support from the National Institute of Mental Health (F30MH129009 to D.J.W.), National Library of Medicine (T15LM007092 to D.J.W.), National Institute of General Medical Sciences (T32GM007753 to A.N.), Simons Foundation Autism Research Initiative (704413 to E.B.R. and L.J.O.) and the Broad Institute.
Author information
Contributions
D.J.W., A.N. and L.J.O. conceived and designed experiments. K.A.J. and K.K.D. suggested analyses. B.M.N., E.B.R., K.J.K. and L.J.O. supervised the project. D.J.W., A.N. and L.J.O. performed analyses. D.J.W., A.N. and L.J.O. wrote the manuscript.
Ethics declarations
Competing interests
K.J.K. is a consultant for Vor Biopharma and AlloDx. B.M.N. is a member of the scientific advisory board at Deep Genomics and Neumora, consultant of the scientific advisory board for Camp4 Therapeutics and consultant for Merck. The other authors declare no competing interests.
Peer review
Peer review information
Nature thanks Doug Speed and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Performance of BHR in exome-scale simulations with no individual-level data.
We performed an extended set of simulations to assess the performance of BHR. The MAF groups are < 1e-5 (group 1), 1e-5 - 1e-4 (group 2), 1e-4 - 1e-3 (group 3) and 1e-3 - 1e-2 (group 4). The grey and red boxplots indicate the distribution of estimates in null and non-null simulations (true burden h2 = 0% and 0.5%, respectively). One minor difference in how BHR was applied to simulated versus real data concerns significant genes. In simulated data, significant genes were identified without any attempt to correct for population stratification, whereas in our real-trait analyses they were identified using SAIGE-GENE89. We started with a realistic set of parameters (see Methods) and varied one simulation parameter in each simulation. (A) We increased the sample size from 5e5 to 2e6. This increase amplifies the uncorrected population stratification, causing false-positive significant genes and upward bias in BHR (no bias is observed in estimates without significant genes). (B) We added overdispersion effects with the same distribution of effect sizes as the burden effects, i.e. with per-allele effect size variance drawn from a discrete mixture distribution (see Methods). This distribution differs from the BHR model, which assumes that overdispersion effects have a constant per-s.d. effect size variance, but this form of misspecification does not lead to bias. (C) We performed simulations with realistic parameters, including stratification and selection (see Methods and Fig. 1c). (D) We decreased the sample size from 5e5 to 1e5. (E) We increased the strength of population stratification (including the minor-allele-biased stratification) by a factor of 10, from a per-s.d. effect size mean of 1e-7 and a variance of 1e-5 to a mean of 1e-6 and a variance of 1e-4. (F) We increased the strength of selection from mean Ns = 1 to mean Ns = 10. There were extremely few variants with allele frequency greater than 1e-3, so MAF group 4 estimates are not shown.
Numerical results are contained in Supplementary Table 4. Boxplots denote median, quartiles and range of distribution (excluding outliers).
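The four MAF groups in this legend can be assigned with a sorted list of bin boundaries. This is a minimal sketch under two assumptions. First, the bins are half-open (for example, group 2 is [1e-5, 1e-4)). Second, the upper limit of group 4 is 1e-2. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable, Optional

# Lower bounds of MAF groups 2, 3 and 4; group 1 is everything below 1e-5.
GROUP_LOWER_BOUNDS = [1e-5, 1e-4, 1e-3]
# Assumed upper limit of group 4.
MAX_MAF = 1e-2

def maf_group(maf: float) -> Optional[int]:
    """Return the MAF group (1-4) of a variant, or None if it falls outside all groups.

    Bins are half-open, e.g. group 2 is [1e-5, 1e-4).
    """
    if not 0.0 < maf < MAX_MAF:
        return None
    return bisect_right(GROUP_LOWER_BOUNDS, maf) + 1

def count_groups(mafs: Iterable[float]) -> Counter:
    """Count variants per MAF group, skipping variants outside all groups."""
    return Counter(g for g in map(maf_group, mafs) if g is not None)

print(count_groups([3e-6, 8e-6, 2e-5, 5e-4, 2e-3, 0.05]))
```

`bisect_right` places a variant lying exactly on a boundary into the higher group. A different boundary convention would require `bisect_left` instead.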
Extended Data Fig. 2 Comparison of BHR and GCTA in null simulations with individual-level genotypes and phenotypes, and different patterns of population stratification.
There are four demographic models: no stratification; north-south stratification; north-south stratification with smaller population size in the northern deme; and local stratification with very small population size in one deme (see Methods). Under each model, we performed simulations with and without selection, mimicking pLoF and synonymous variants respectively. (a) BHR burden heritability estimates with no correction for minor allele-biased stratification. (b) GCTA heritability estimates with no correction for ancestry. (c) BHR burden heritability estimates, correcting for minor allele-biased stratification. (d) GCTA heritability estimates, correcting for ancestry by providing the deme from which each individual was sampled as a covariate. Boxplots denote median, quartiles and range of distribution (excluding outliers).
Extended Data Fig. 3 Genome-wide mean minor allele effect sizes.
We define the “mean effect” as the effect size of the genome-wide burden, summing all minor alleles across genes within a category, on the phenotype. For synonymous variants, a nonzero mean effect is interpreted as evidence of minor-allele biased population stratification, and this type of stratification produces upward bias in BHR heritability estimates (see Methods). (a–c) Mean effect of synonymous variants vs. mean effect of missense benign, missense other, and pLoF variants respectively. The lack of correlation in (c) suggests that for pLoFs, the nonzero mean effect is mostly biological. (d) Mean effect of synonymous variants vs. the resulting bias in heritability estimates, for synonymous variants (left y axis) or for pLoFs (right y axis). These differ by a constant factor due to the larger number of synonymous variants than pLoFs. (e) Mean effect of pLoF variants vs. the contribution of these effects to burden heritability. These estimates are a small fraction of the total pLoF burden heritability. Error bars represent standard errors, which are computed by assuming independence across genes.
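A simple variance decomposition shows two things: why a nonzero mean effect inflates heritability, and why the inflation grows with the number of variants in a category. This sketch assumes independent variants in Hardy–Weinberg equilibrium with known per-allele effects β_j.

Under those assumptions, regress the phenotype on the summed minor-allele count of a category. The resulting coefficient equals the burden-score-weighted mean μ = Σ_j w_j β_j / Σ_j w_j, where w_j = 2p_j(1 − p_j). The total Σ_j w_j β_j² then splits exactly into μ² Σ_j w_j plus a residual term. The same μ therefore adds more apparent heritability to a category with a larger total burden score, such as synonymous variants compared with pLoFs.

The function name is hypothetical. This illustrates the logic only and is not the correction applied in the paper.

```python
from typing import List, Tuple

def split_mean_component(effects: List[float],
                         freqs: List[float]) -> Tuple[float, float, float]:
    """Split sum_j w_j * beta_j**2, with w_j = 2p_j(1-p_j), into mean and residual parts.

    Returns (mu, mean_part, residual_part): mu is the burden-score-weighted mean
    effect, mean_part = mu**2 * sum_j w_j and
    residual_part = sum_j w_j * (beta_j - mu)**2.
    """
    weights = [2 * p * (1 - p) for p in freqs]
    total_weight = sum(weights)
    mu = sum(w * b for w, b in zip(weights, effects)) / total_weight
    mean_part = mu ** 2 * total_weight
    residual_part = sum(w * (b - mu) ** 2 for w, b in zip(weights, effects))
    return mu, mean_part, residual_part

# Same small mean effect (1e-3 per allele); the second category has ten times as many variants.
_, small, _ = split_mean_component([1e-3] * 1_000, [1e-4] * 1_000)
_, large, _ = split_mean_component([1e-3] * 10_000, [1e-4] * 10_000)
print(large / small)  # the mean-driven contribution scales with the total burden score
```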
Extended Data Fig. 4 Burden heritability estimates with effect-allele-permuted burden statistics.
We assessed the potential for confounding in our results by repeating our analyses with ultra-rare pLoF burden statistics whose effect alleles were randomly permuted. This permutation is expected to eliminate the burden heritability while not affecting any form of confounding that is symmetrical with respect to the minor vs. major allele. Boxplots indicate the distribution of burden heritability estimates before and after the permutation (non-null and null, respectively), with median, quartiles and range (excluding outliers).
Extended Data Fig. 5 Proportion of common variant heritability explained by LD-independent blocks with significant heritability.
For each trait, we used HESS to identify which of the 1,651 LD-independent blocks from Berisa and Pickrell87 have Bonferroni-significant heritability, and then computed the proportion of the overall HESS heritability mediated by each block. These blocks aggregate over many variants in many genes. Even so, the proportion of heritability explained by individual significant blocks is still less than the proportion of burden heritability explained by individual significant genes in BHR (Extended Data Fig. 4).
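This legend describes two steps: apply a Bonferroni threshold across blocks, then compute each significant block's share of the total heritability. The sketch below implements both, assuming per-block heritability estimates and standard errors are already available from HESS. Three choices are assumptions made for illustration: the one-sided z-test, the use of the sum of block estimates as the total, and the helper name. HESS's own output format and test are not reproduced here.

```python
import math
from typing import List, Tuple

N_BLOCKS = 1651  # number of LD-independent blocks used in this legend

def significant_block_shares(h2: List[float], se: List[float],
                             alpha: float = 0.05) -> List[Tuple[int, float]]:
    """Return (block index, share of total heritability) for Bonferroni-significant blocks.

    Assumes a one-sided z-test of h2 > 0 per block and takes total heritability
    as the sum of the per-block estimates.
    """
    threshold = alpha / len(h2)  # Bonferroni correction over all blocks
    total = sum(h2)
    shares = []
    for i, (h, s) in enumerate(zip(h2, se)):
        p = 0.5 * math.erfc((h / s) / math.sqrt(2))  # one-sided P(Z > h/s)
        if p < threshold:
            shares.append((i, h / total))
    return shares

# Toy trait: 1,650 weak blocks plus one strong block (illustrative numbers).
h2 = [1e-4] * (N_BLOCKS - 1) + [0.02]
se = [1e-4] * (N_BLOCKS - 1) + [0.003]
print(significant_block_shares(h2, se))
```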
Extended Data Fig. 6 Comparison of burden versus common variant heritability explained by exome-wide significant genes.
Each point represents a trait-gene significant burden association from the Genebass dataset. X axis values are the fraction of common variant heritability (estimated with HESS) explained by the LD-independent block containing that gene. Y axis values are the fraction of burden heritability (estimated with BHR) explained by the significant gene.
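Placing a gene on this plot requires matching it to the LD block that contains it. This is a minimal sketch under two assumptions. First, each gene is represented by a single coordinate, such as its midpoint. Second, blocks are sorted, non-overlapping, half-open intervals on the gene's chromosome. Both function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

def block_containing(pos: int, blocks: List[Tuple[int, int]]) -> Optional[int]:
    """Index of the half-open block [start, end) that contains pos, or None.

    Blocks must lie on one chromosome, be sorted by start and not overlap.
    """
    starts = [start for start, _ in blocks]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < blocks[i][1]:
        return i
    return None

def plot_point(gene_pos: int, gene_burden_h2: float, total_burden_h2: float,
               blocks: List[Tuple[int, int]], block_h2: List[float],
               total_common_h2: float) -> Optional[Tuple[float, float]]:
    """(x, y) for one significant gene.

    x: share of common-variant heritability in the gene's LD block.
    y: share of burden heritability explained by the gene.
    """
    i = block_containing(gene_pos, blocks)
    if i is None:
        return None
    return block_h2[i] / total_common_h2, gene_burden_h2 / total_burden_h2

# Toy chromosome with three blocks; all numbers are illustrative.
blocks = [(0, 1_000_000), (1_000_000, 2_500_000), (2_500_000, 4_000_000)]
block_h2 = [0.001, 0.004, 0.002]
print(plot_point(1_200_000, 0.003, 0.012, blocks, block_h2, total_common_h2=0.2))
```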
Extended Data Fig. 8 Genetic correlation estimates across 37 traits, for common variants (upper triangle) and rare coding variants (lower).
Asterisks indicate nominally significant genetic correlation estimates (two-tailed p < 0.05). Grey boxes not on the diagonal indicate cross-trait LDSC point estimates that are outside of [−1.25, 1.25], which cross-trait LDSC does not report by default. For numerical results, see Supplementary Table 19.
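The two display rules in this legend can be expressed as a small helper. This is a minimal sketch that assumes the p-value comes from a Wald z-test on each estimate and its standard error; the legend does not say how p was computed. The function name and the returned labels are hypothetical.

```python
import math

def rg_cell(rg: float, se: float, cross_trait_ldsc: bool = True) -> str:
    """Label one cell of the genetic-correlation heat map.

    Returns "grey" for cross-trait LDSC estimates outside [-1.25, 1.25];
    otherwise returns the estimate, with an asterisk if two-tailed p < 0.05.
    """
    if cross_trait_ldsc and abs(rg) > 1.25:
        return "grey"
    p = math.erfc(abs(rg / se) / math.sqrt(2))  # two-tailed normal p-value
    return f"{rg:.2f}*" if p < 0.05 else f"{rg:.2f}"

print(rg_cell(0.5, 0.1), rg_cell(0.1, 0.1), rg_cell(1.3, 0.2))
```

The greying rule applies only to the cross-trait LDSC (upper-triangle) estimates, so the sketch exposes it as a flag.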
Extended Data Fig. 9 Comparison of common coding vs. common whole-genome genetic correlations.
(a) We evaluated whether common coding variants, similar to rare coding variants, have stronger genetic correlations than common variants overall. The fit line indicates the Deming regression slope, which allows for uncertainty in both the X and Y axis values. (b) We also assessed the stability of the Deming regression slope for the burden genetic correlation vs. the common-variant genetic correlation on chromosomes 1–8 and chromosomes 9–22.
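Deming regression has a closed-form slope once δ is fixed. Here δ is the ratio of the error variance in Y to the error variance in X, and δ = 1 gives orthogonal regression. The sketch below uses the standard closed form. The legend does not state how δ was chosen for these comparisons, so the default of 1, the function name and the toy points are all assumptions.

```python
import math
from typing import Sequence, Tuple

def deming_fit(x: Sequence[float], y: Sequence[float],
               delta: float = 1.0) -> Tuple[float, float]:
    """Return (slope, intercept) of a Deming regression of y on x.

    delta is the ratio of the error variance in y to the error variance in x;
    delta = 1.0 gives orthogonal regression.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("slope is undefined when x and y have zero covariance")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

# Toy points (x = common-variant rg, y = burden rg); values are illustrative only.
common_rg = [0.1, 0.3, 0.5, 0.7]
burden_rg = [0.2, 0.5, 0.8, 1.0]
print(deming_fit(common_rg, burden_rg))
```

Ordinary least squares treats X as error-free, which biases the slope toward zero when X is noisy. Deming regression avoids that by modelling error on both axes, as the legend describes.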
Extended Data Fig. 10 Burden heritability enrichments of drug target gene sets.
We used BHR to estimate the ultra-rare loss-of-function burden heritability enrichment in sets of manually curated drug target genes from a previous publication6. For all panels, error bars are standard errors, and bars are shaded in blue if the enrichment is significantly greater than 1. (A) Burden heritability enrichment in n = 14 blood pressure drug target genes (union of diastolic and systolic blood pressure gene sets from reference publication). (B) Burden heritability enrichment in n = 8 bone mineral density drug target genes. (C) Burden heritability enrichment in n = 6 calcium drug target genes. (D) Burden heritability enrichment in n = 10 lipid drug target genes (union of LDL and triglyceride gene sets from reference publication). (E) Burden heritability enrichment in n = 6 red blood cell drug target genes. (F) Burden heritability enrichment in n = 7 type 2 diabetes drug target genes.
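The shading rule in these panels (blue when the enrichment is significantly greater than 1) can be written as a one-sided test. This is a minimal sketch that assumes the enrichment estimate is approximately normal and uses a 0.05 level. The legend states neither the test nor the threshold, so both are assumptions. The function name and the example values are hypothetical.

```python
import math

def enrichment_greater_than_one(enrichment: float, se: float,
                                alpha: float = 0.05) -> bool:
    """One-sided z-test of H0: enrichment = 1 against H1: enrichment > 1.

    Assumes the enrichment estimate is approximately normal with standard error se.
    """
    z = (enrichment - 1.0) / se
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) under the null
    return p < alpha

# Hypothetical panels as (enrichment, standard error); not values from the figure.
for est, err in [(3.0, 0.5), (1.5, 0.5)]:
    print(est, err, "blue" if enrichment_greater_than_one(est, err) else "unshaded")
```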
Supplementary information
Supplementary Information
Legends for Supplementary Figs. 1–8, legends for Supplementary Tables 1–22 and additional references.
Supplementary Figs. 1–8
Supplementary Tables 1–22
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Weiner, D.J., Nadig, A., Jagadeesh, K.A. et al. Polygenic architecture of rare coding variation across 394,783 exomes. Nature 614, 492–499 (2023). https://doi.org/10.1038/s41586-022-05684-z
DOI: https://doi.org/10.1038/s41586-022-05684-z