Ultrarare variants drive substantial cis heritability of human gene expression

Abstract

The vast majority of human mutations have minor allele frequencies under 1%, with the plurality observed only once (that is, ‘singletons’). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes is largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole-genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute approximately 25% of cis heritability across genes (dwarfing the contributions of other frequencies). The majority (approximately 76%) of singleton heritability derives from ultrarare variants absent from thousands of additional samples. We develop an inference procedure to demonstrate that our results are consistent with pervasive purifying selection shaping the regulatory architecture of most human genes.

Fig. 1: Simulation results.
Fig. 2: Partitioning heritability.
Fig. 3: Pervasive purifying selection drives the genetic architecture of gene expression.

Data availability

RNA-seq gene expression data were downloaded from http://www.internationalgenome.org/data-portal/data-collection/geuvadis. This dataset contains 375 individuals of European descent from 4 locations. Each of these individuals is included in the 1000 Genomes Project (1KGP), and genome sequence data were downloaded from www.1000genomes.org (ref. 24).

Code availability

Three open-source software tools are available on GitHub as part of this study: (1) HEh2.R, R code that performs all of the Haseman–Elston analyses and simulations discussed in this paper and also implements the average information algorithm for parameter inference in linear mixed models, available from https://github.com/hernrya/HEh2; (2) SingHer, an R package discussed in the Supplementary Note together with its performance statistics, available from https://github.com/andywdahl/SingHer; and (3) rejection sampling scripts demonstrating how we used rejection sampling to infer parameters of the phenotype model, available from https://github.com/uricchio/HE_scripts.
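For readers unfamiliar with Haseman–Elston regression, the following is a minimal illustrative sketch in Python. It is not the HEh2.R implementation and makes no claim to match it. The function names (`standardize`, `grm`, `he_regression`, `simulate`), the inclusion of an intercept and the simulation settings are all assumptions made for illustration. The idea is to regress products of standardized phenotypes for each pair of individuals on the corresponding off-diagonal entries of one or more genetic relationship matrices; each slope is then an estimate of the heritability attributable to that matrix.

```python
import numpy as np

def standardize(genotypes):
    """Center and scale each variant column; drop monomorphic variants."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]
    return (g - g.mean(axis=0)) / g.std(axis=0)

def grm(genotypes):
    """Genetic relationship matrix from an (individuals x variants) matrix."""
    z = standardize(genotypes)
    return z @ z.T / z.shape[1]

def he_regression(phenotype, grms):
    """Haseman-Elston regression with one slope per relationship matrix.

    Regresses y_i * y_j (i < j) on the off-diagonal entries of each matrix.
    An intercept is included here; that is an illustrative choice.
    """
    y = np.asarray(phenotype, dtype=float)
    y = (y - y.mean()) / y.std()
    upper = np.triu_indices(len(y), k=1)           # all pairs with i < j
    products = np.outer(y, y)[upper]
    columns = [np.ones(products.size)] + [k[upper] for k in grms]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, products, rcond=None)
    return coef[1:]                                # drop the intercept

def simulate(n_individuals, n_variants, h2, seed):
    """Toy additive phenotype with heritability h2 (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=n_variants)
    geno = rng.binomial(2, freqs, size=(n_individuals, n_variants))
    genetic = standardize(geno) @ rng.normal(size=n_variants)
    genetic = genetic / genetic.std() * np.sqrt(h2)
    noise = rng.normal(scale=np.sqrt(1.0 - h2), size=n_individuals)
    return geno, genetic + noise

if __name__ == "__main__":
    geno, y = simulate(n_individuals=1000, n_variants=200, h2=0.5, seed=0)
    print(he_regression(y, [grm(geno)]))
```

Passing several matrices, for example one built from the variants in each allele frequency bin, yields one slope per bin, which is the general shape of a partitioned analysis. How the bins are defined and how singletons are handled in the study itself are described in the paper and its Supplementary Note, not in this sketch.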

References

  1. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).

  2. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).

  3. Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).

  4. Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M. & Dermitzakis, E. T. Rare and common regulatory variation in population-scale sequenced human genomes. PLoS Genet. 7, e1002144 (2011).

  5. Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).

  6. Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).

  7. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).

  8. Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).

  9. Eyre-Walker, A. Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl Acad. Sci. USA 107, 1752–1756 (2010).

  10. Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).

  11. Simons, Y. B., Turchin, M. C., Pritchard, J. K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 46, 220–224 (2014).

  12. Lohmueller, K. E. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 10, e1004379 (2014).

  13. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).

  14. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).

  15. Mancuso, N. et al. The contribution of rare variation to prostate cancer heritability. Nat. Genet. 48, 30–35 (2016).

  16. Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).

  17. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).

  18. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

  19. Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).

  20. Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet. 9, e1003993 (2013).

  21. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  22. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

  23. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  24. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  25. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

  26. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).

  27. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).

  28. Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).

  29. Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).

  30. Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).

  31. Simons, Y. B., Bullaughey, K., Hudson, R. R. & Sella, G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985 (2018).

  32. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).

  33. Dickson, S. P., Wang, K., Krantz, I., Hakonarson, H. & Goldstein, D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).

  34. McCoy, R. C., Wakefield, J. & Akey, J. M. Impacts of Neanderthal-introgressed sequences on the landscape of human gene expression. Cell 168, 916–927.e12 (2017).

  35. Chen, L., Liu, P., Evans, T. C. Jr & Ettwiller, L. M. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017).

  36. Uricchio, L. H., Torres, R., Witte, J. S. & Hernandez, R. D. Population genetic simulations of complex phenotypes with implications for rare variant association tests. Genet. Epidemiol. 39, 35–44 (2015).

  37. Hernandez, R. D. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24, 2786–2787 (2008).

  38. Tavaré, S., Balding, D. J., Griffiths, R. C. & Donnelly, P. Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997).

  39. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).

  40. Torgerson, D. G. et al. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. 5, e1000592 (2009).

  41. Boyko, A. R. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008).

  42. Katzman, S. et al. Human genome ultraconserved elements are ultraselected. Science 317, 915 (2007).

  43. Andrés, A. M. et al. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6, e1001157 (2010).

  44. Price, A. L. et al. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 7, e1001317 (2011).

  45. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).

  46. Powell, J. E. et al. Genetic control of gene expression in whole blood and lymphoblastoid cell lines is largely independent. Genome Res. 22, 456–466 (2012).

  47. Glassberg, E. C., Gao, Z., Harpak, A., Lan, X. & Pritchard, J. K. Measurement of selective constraint on human gene expression. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/345801v1 (2018).

  48. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  49. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/563866v1 (2019).

  50. Schweiger, R. et al. Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests. Nat. Commun. 9, 4919 (2018).

  51. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).

  52. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  53. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

Acknowledgements

We thank H. M. Kang, A. Auton and S. Gusev for discussions about possible confounders that improved our analysis; members of the Pritchard laboratory for comments on rejection sampling; J. Barrett and K. Karczewski for peer-reviewing our preprint; R. Torres for assistance with data analysis; J. Wall for assistance with the Neanderthal-introgressed alleles; and A. Hernandez for discussions on figure colors. Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health (award no. R01HG007644 to R.D.H. and award no. K25HL121295). L.H.U. was supported by an Institutional Research and Academic Career Development Award (National Institute of General Medical Sciences, grant no. K12GM088033). K.H. was supported by a Gilliam Fellowship for Advanced Study; A.D. was supported by NIH (award nos. U01HG009080 and R01HG006399).

Author information

Contributions

R.D.H. and N.Z. conceived and designed the study. L.H.U. and A.D. developed methods. R.D.H., L.H.U., K.H., C.Y., A.D. and N.Z. contributed to data analysis or simulations. R.D.H. and N.Z. wrote the manuscript. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Ryan D. Hernandez or Noah Zaitlen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Note, Tables 1 and 2, and Figs. 1–27

Cite this article

Hernandez, R.D., Uricchio, L.H., Hartman, K. et al. Ultrarare variants drive substantial cis heritability of human gene expression. Nat Genet 51, 1349–1355 (2019). https://doi.org/10.1038/s41588-019-0487-7
