Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations

Abstract

Common variant heritability has been widely reported to be concentrated in variants within cell-type-specific non-coding functional annotations, but little is known about low-frequency variant functional architectures. We partitioned the heritability of both low-frequency (0.5% ≤ minor allele frequency < 5%) and common (minor allele frequency ≥ 5%) variants in 40 UK Biobank traits across a broad set of functional annotations. We determined that non-synonymous coding variants explain 17 ± 1% of low-frequency variant heritability (\(h_{\mathrm{lf}}^2\)) versus 2.1 ± 0.2% of common variant heritability (\(h_{\mathrm{c}}^2\)). Cell-type-specific non-coding annotations that were significantly enriched for \(h_{\mathrm{c}}^2\) of corresponding traits were similarly enriched for \(h_{\mathrm{lf}}^2\) for most traits, but more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain dorsolateral prefrontal cortex explain 57 ± 12% of \(h_{\mathrm{lf}}^2\) versus 12 ± 2% of \(h_{\mathrm{c}}^2\) for neuroticism. Forward simulations confirmed that low-frequency variant enrichment depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict effect size variance of causal rare variants (minor allele frequency < 0.5%).
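To make the two contrasts quoted above concrete, the following minimal Python sketch computes how many times larger the low-frequency share of heritability is than the common-variant share. It is an illustration only, not the estimator used in the study. It assumes that each "±" value is a standard error and that the two estimates are independent, neither of which is stated in the abstract; the names `Share` and `ratio_with_se` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Tuple


@dataclass(frozen=True)
class Share:
    """Percentage of heritability explained by an annotation.

    `se` is ASSUMED to be the standard error behind the "±" in the abstract.
    """
    pct: float
    se: float


def ratio_with_se(lf: Share, common: Share) -> Tuple[float, float]:
    """Return lf/common and a first-order (delta-method) standard error.

    ASSUMPTION: the two estimates are treated as independent, so their
    squared relative errors simply add.
    """
    ratio = lf.pct / common.pct
    rel_var = (lf.se / lf.pct) ** 2 + (common.se / common.pct) ** 2
    return ratio, ratio * sqrt(rel_var)


# Values quoted in the abstract.
non_synonymous = ratio_with_se(Share(17.0, 1.0), Share(2.1, 0.2))
brain_h3k4me3_neuroticism = ratio_with_se(Share(57.0, 12.0), Share(12.0, 2.0))

if __name__ == "__main__":
    for label, (ratio, se) in [
        ("non-synonymous coding", non_synonymous),
        ("brain DLPFC H3K4me3, neuroticism", brain_h3k4me3_neuroticism),
    ]:
        print(f"{label}: ratio = {ratio:.2f} (approx. s.e. {se:.2f})")
```

Note that this ratio compares shares of heritability only. An enrichment in the usual sense would also divide each share by the fraction of variants that fall in the annotation, a quantity the abstract does not report.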

Fig. 1: Simulations to assess LFVE estimates.
Fig. 2: Common variant heritability (\(h_{\mathrm{c}}^2\)) and low-frequency variant heritability (\(h_{\mathrm{lf}}^2\)) estimates for 40 UK Biobank traits.
Fig. 3: Functional low-frequency and common variant architectures across 27 independent UK Biobank traits.
Fig. 4: Low-frequency and common variant architectures of CTS annotations.
Fig. 5: Low-frequency and common variant enrichments for non-synonymous variants vary with the strength of selection on the underlying genes.
Fig. 6: Forward simulations enable inferences about negative selection and rare variant architectures.

Data availability

Baseline-LF annotations are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF.tar.gz. BOLT-LMM association statistics computed in this study are available at https://data.broadinstitute.org/alkesgroup/UKBB/UKBB_409K.

Acknowledgements

We thank A. Gusev, C. Marquez-Luna, M. Hujoel, Y. Reshef, F. Hormozdiari, O. Weissbrod, B. Neale, A. Siepel, and S. M. Gazal for helpful discussions. This research has been conducted using the UK Biobank Resource (application number 16549). This research was funded by NIH grants U01 HG009379, R01 MH101244, R01 MH107649, R01 MH109978 and U01 HG009088. P.R.L. was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interfaces and the Next Generation Fund at the Broad Institute of MIT and Harvard.

Author information

Contributions

S.G. and A.L.P. designed experiments. S.G. performed experiments. S.G., P.R.L., H.K.F., A.G., and A.S. analyzed data. S.G. and A.L.P. wrote the manuscript with assistance from P.R.L., H.K.F., A.G., A.S., and S.S.

Corresponding authors

Correspondence to Steven Gazal or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–21, Supplementary Table 14 and Supplementary Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–13 and 15–19

About this article

Cite this article

Gazal, S., Loh, PR., Finucane, H.K. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat Genet 50, 1600–1607 (2018). https://doi.org/10.1038/s41588-018-0231-8
