Analysis

Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations

Nature Genetics (2018)

Abstract

Common variant heritability has been widely reported to be concentrated in variants within cell-type-specific non-coding functional annotations, but little is known about low-frequency variant functional architectures. We partitioned the heritability of both low-frequency (0.5% ≤ minor allele frequency < 5%) and common (minor allele frequency ≥ 5%) variants in 40 UK Biobank traits across a broad set of functional annotations. We determined that non-synonymous coding variants explain 17 ± 1% of low-frequency variant heritability (\(h_{\mathrm{lf}}^2\)) versus 2.1 ± 0.2% of common variant heritability (\(h_{\mathrm{c}}^2\)). Cell-type-specific non-coding annotations that were significantly enriched for \(h_{\mathrm{c}}^2\) of corresponding traits were similarly enriched for \(h_{\mathrm{lf}}^2\) for most traits, but were more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain dorsolateral prefrontal cortex explain 57 ± 12% of \(h_{\mathrm{lf}}^2\) versus 12 ± 2% of \(h_{\mathrm{c}}^2\) for neuroticism. Forward simulations confirmed that low-frequency variant enrichment depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict effect size variance of causal rare variants (minor allele frequency < 0.5%).
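The three frequency classes used in the abstract can be written as a small lookup. The following is a minimal sketch, not the authors' pipeline: the function name `maf_class` and the constant names are hypothetical, and only the thresholds (0.5% and 5%) and the two reported proportions (17% and 2.1%) come from the text above.

```python
# Thresholds taken from the abstract; the names are illustrative assumptions.
LOW_FREQ_MIN = 0.005  # 0.5%
COMMON_MIN = 0.05     # 5%


def maf_class(maf: float) -> str:
    """Assign a minor allele frequency to the abstract's frequency classes."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if maf >= COMMON_MIN:
        return "common"
    if maf >= LOW_FREQ_MIN:
        return "low-frequency"
    return "rare"


# Ratio of the two reported point estimates for non-synonymous coding variants.
# This compares proportions of heritability only; it is not the enrichment
# statistic, which would also require the proportion of variants annotated.
nonsyn_ratio = 17.0 / 2.1
```

Under these assumptions, a variant with minor allele frequency 0.02 falls in the low-frequency class, and the share of low-frequency heritability explained by non-synonymous coding variants is roughly eight times the corresponding share of common variant heritability (ignoring the reported standard errors).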

Data availability

Baseline-LF annotations are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF.tar.gz. BOLT-LMM association statistics computed in this study are available at https://data.broadinstitute.org/alkesgroup/UKBB/UKBB_409K.
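As a rough illustration of how the annotation archive might be fetched and inspected, the sketch below uses only the Python standard library. The helper names `download` and `list_members` are hypothetical, the local file name is an assumption, and the internal layout of the archive is not described in this article, so the code only lists member names rather than assuming any particular files.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

# URL as given in the data availability statement.
BASELINE_LF_URL = (
    "https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF.tar.gz"
)


def download(url: str, dest: Path) -> Path:
    """Fetch url to dest, skipping the transfer if dest already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def list_members(archive: Path) -> List[str]:
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(str(archive), "r:gz") as tar:
        return tar.getnames()


# Example usage (performs a network request, so it is left commented out):
# archive = download(BASELINE_LF_URL, Path("baselineLF.tar.gz"))
# print(list_members(archive)[:10])
```

Listing members before extracting is a cautious default for an archive whose contents have not been inspected.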

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

We thank A. Gusev, C. Marquez-Luna, M. Hujoel, Y. Reshef, F. Hormozdiari, O. Weissbrod, B. Neale, A. Siepel, and S. M. Gazal for helpful discussions. This research has been conducted using the UK Biobank Resource (application number 16549). This research was funded by NIH grants U01 HG009379, R01 MH101244, R01 MH107649, R01 MH109978 and U01 HG009088. P.R.L. was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interfaces and the Next Generation Fund at the Broad Institute of MIT and Harvard.

Author information

Affiliations

  1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA

    • Steven Gazal, Armin Schoech & Alkes L. Price
  2. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Steven Gazal, Po-Ru Loh, Hilary K. Finucane, Andrea Ganna, Armin Schoech, Shamil Sunyaev & Alkes L. Price
  3. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

    • Po-Ru Loh & Shamil Sunyaev
  4. Schmidt Fellows Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Hilary K. Finucane
  5. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Andrea Ganna
  6. Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA

    • Andrea Ganna
  7. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

    • Armin Schoech & Alkes L. Price
  8. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

    • Shamil Sunyaev

Authors

  1. Steven Gazal

  2. Po-Ru Loh

  3. Hilary K. Finucane

  4. Andrea Ganna

  5. Armin Schoech

  6. Shamil Sunyaev

  7. Alkes L. Price

Contributions

S.G. and A.L.P. designed experiments. S.G. performed experiments. S.G., P.R.L., H.K.F., A.G., and A.S. analyzed data. S.G. and A.L.P. wrote the manuscript with assistance from P.R.L., H.K.F., A.G., A.S., and S.S.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Steven Gazal or Alkes L. Price.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–21, Supplementary Table 14 and Supplementary Note

  2. Reporting Summary

  3. Supplementary Tables

    Supplementary Tables 1–13 and 15–19

About this article

DOI

https://doi.org/10.1038/s41588-018-0231-8