Ultrarare variants drive substantial cis heritability of human gene expression


The vast majority of human mutations have minor allele frequencies under 1%, with the plurality observed only once (that is, ‘singletons’). While Mendelian diseases are predominantly caused by rare alleles, the cumulative contribution of rare alleles to complex phenotypes is largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole-genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute approximately 25% of cis heritability across genes (dwarfing the contributions of variants at other frequencies). The majority (approximately 76%) of singleton heritability derives from ultrarare variants absent from thousands of additional samples. We develop an inference procedure and use it to show that our results are consistent with pervasive purifying selection shaping the regulatory architecture of most human genes.
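Jointly estimating the heritability contributed by different allele-frequency classes can be illustrated with multi-component Haseman–Elston (HE) regression, the family of analyses that the HEh2.R tool performs. For a standardized phenotype and one genetic relationship matrix (GRM) A_k per variant class, E[y_i y_j] = Σ_k h²_k A_k[i, j] for i ≠ j, so regressing the pairwise products on the off-diagonal GRM entries yields one heritability estimate per class. The sketch below is a minimal illustration, not the authors' implementation. Several choices are assumptions: in-sample standardization of genotypes, regression without an intercept, simulated independent variants, and the hypothetical helper `simulate_genotypes`. The demo also uses a generic "rare" class rather than true singletons so that it behaves stably at this small sample size.

```python
import numpy as np


def standardize_genotypes(G):
    """Center and scale a 0/1/2 genotype matrix (individuals x variants).

    Allele frequencies are estimated in-sample; monomorphic columns are
    dropped because they cannot be standardized.
    """
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    p = p[keep]
    return (G[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


def grm(Z):
    """Genetic relationship matrix A = Z Z^T / m for standardized genotypes Z."""
    return Z @ Z.T / Z.shape[1]


def he_regression(y, grms):
    """Multi-component Haseman-Elston regression without an intercept.

    Regresses y_i * y_j (standardized phenotype, i < j) on the off-diagonal
    entries A_k[i, j] of each GRM; slope k estimates the heritability
    attributable to variant class k.
    """
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    X = np.column_stack([A[i, j] for A in grms])
    z = y[i] * y[j]
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return coef


def simulate_genotypes(n, m, p_low, p_high, rng):
    """Hypothetical simulator: independent variants under Hardy-Weinberg
    proportions with frequencies drawn uniformly from [p_low, p_high)."""
    p = rng.uniform(p_low, p_high, size=m)
    return rng.binomial(2, p, size=(n, m)).astype(float)


rng = np.random.default_rng(1)
n = 1500
Z_common = standardize_genotypes(simulate_genotypes(n, 300, 0.05, 0.5, rng))
Z_rare = standardize_genotypes(simulate_genotypes(n, 300, 0.005, 0.02, rng))

true_h2 = (0.3, 0.2)
# Effects per variant are drawn so that class k explains about true_h2[k].
g = sum(Z @ rng.normal(0.0, np.sqrt(h / Z.shape[1]), size=Z.shape[1])
        for Z, h in zip((Z_common, Z_rare), true_h2))
y = g + rng.normal(0.0, np.sqrt(1.0 - sum(true_h2)), size=n)

h2_hat = he_regression(y, [grm(Z_common), grm(Z_rare)])
print(h2_hat)  # noisy per-class estimates; compare with true_h2
```

The estimates are noisy at this scale. Standard errors shrink roughly in proportion to the number of individuals, which is one reason frequency-partitioned estimates need careful validation at the sample sizes described above.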


Fig. 1: Simulation results.
Fig. 2: Partitioning heritability.
Fig. 3: Pervasive purifying selection drives the genetic architecture of gene expression.

Data availability

RNA-seq gene expression data were downloaded from http://www.internationalgenome.org/data-portal/data-collection/geuvadis. This dataset contains 375 individuals of European descent from four locations. All of these individuals are included in the 1000 Genomes Project (1KGP), and their genome sequence data were downloaded from www.1000genomes.org (ref. 24).

Code availability

Three open-source software tools are being made available as part of this study, all on GitHub: (1) HEh2.R, R code that performs all the Haseman–Elston analyses and simulations discussed in this paper and also implements the average information (AI) algorithm for parameter inference in linear mixed models, available from https://github.com/hernrya/HEh2; (2) the SingHer R package, discussed together with its performance statistics in the Supplementary Note, available from https://github.com/andywdahl/SingHer; and (3) rejection-sampling scripts demonstrating how we used rejection sampling to infer the parameters of the phenotype model, available from https://github.com/uricchio/HE_scripts.
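Rejection sampling, as used for parameter inference in item (3), follows the general logic of rejection-based approximate Bayesian computation. Parameters are drawn from a prior, data are simulated under each draw, and only draws whose simulated summary statistic lands within a tolerance of the observed one are kept. The sketch below is generic and hypothetical. The toy normal-mean model, the tolerance, and the function names `abc_rejection`, `simulate_mean` and `sample_uniform_prior` are illustrative stand-ins, not the phenotype model or the code in the HE_scripts repository.

```python
import numpy as np


def abc_rejection(observed, simulate, sample_prior, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary statistic
    lies within `tolerance` of the observed summary statistic."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)


# Toy example (hypothetical): infer the mean of a normal model from the
# sample mean of 50 observations, under a uniform prior on [-5, 5].
rng = np.random.default_rng(0)
observed = rng.normal(loc=1.5, scale=1.0, size=50).mean()


def simulate_mean(theta, rng):
    return rng.normal(loc=theta, scale=1.0, size=50).mean()


def sample_uniform_prior(rng):
    return rng.uniform(-5.0, 5.0)


posterior = abc_rejection(observed, simulate_mean, sample_uniform_prior,
                          n_draws=20000, tolerance=0.05, rng=rng)
print(len(posterior), posterior.mean(), posterior.std())
```

Shrinking the tolerance brings the accepted draws closer to the true posterior but accepts fewer of them. Real applications usually compare several summary statistics through a distance function, which this one-dimensional sketch reduces to an absolute difference.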


References

1. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
2. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
3. Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).
4. Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M. & Dermitzakis, E. T. Rare and common regulatory variation in population-scale sequenced human genomes. PLoS Genet. 7, e1002144 (2011).
5. Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
6. Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
7. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).
8. Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
9. Eyre-Walker, A. Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl Acad. Sci. USA 107, 1752–1756 (2010).
10. Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).
11. Simons, Y. B., Turchin, M. C., Pritchard, J. K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 46, 220–224 (2014).
12. Lohmueller, K. E. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 10, e1004379 (2014).
13. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
14. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
15. Mancuso, N. et al. The contribution of rare variation to prostate cancer heritability. Nat. Genet. 48, 30–35 (2016).
16. Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
17. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
18. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
19. Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).
20. Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet. 9, e1003993 (2013).
21. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
22. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
23. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
24. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
25. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
26. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).
27. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).
28. Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
29. Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).
30. Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).
31. Simons, Y. B., Bullaughey, K., Hudson, R. R. & Sella, G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985 (2018).
32. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
33. Dickson, S. P., Wang, K., Krantz, I., Hakonarson, H. & Goldstein, D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).
34. McCoy, R. C., Wakefield, J. & Akey, J. M. Impacts of Neanderthal-introgressed sequences on the landscape of human gene expression. Cell 168, 916–927.e12 (2017).
35. Chen, L., Liu, P., Evans, T. C. Jr. & Ettwiller, L. M. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017).
36. Uricchio, L. H., Torres, R., Witte, J. S. & Hernandez, R. D. Population genetic simulations of complex phenotypes with implications for rare variant association tests. Genet. Epidemiol. 39, 35–44 (2015).
37. Hernandez, R. D. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24, 2786–2787 (2008).
38. Tavaré, S., Balding, D. J., Griffiths, R. C. & Donnelly, P. Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997).
39. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
40. Torgerson, D. G. et al. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. 5, e1000592 (2009).
41. Boyko, A. R. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008).
42. Katzman, S. et al. Human genome ultraconserved elements are ultraselected. Science 317, 915 (2007).
43. Andrés, A. M. et al. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6, e1001157 (2010).
44. Price, A. L. et al. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 7, e1001317 (2011).
45. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
46. Powell, J. E. et al. Genetic control of gene expression in whole blood and lymphoblastoid cell lines is largely independent. Genome Res. 22, 456–466 (2012).
47. Glassberg, E. C., Gao, Z., Harpak, A., Lan, X. & Pritchard, J. K. Measurement of selective constraint on human gene expression. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/345801v1 (2018).
48. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
49. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/563866v1 (2019).
50. Schweiger, R. et al. Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests. Nat. Commun. 9, 4919 (2018).
51. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
52. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
53. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).



Acknowledgements

We thank H. M. Kang, A. Auton and S. Gusev for discussions about possible confounders that improved our analysis; members of the Pritchard laboratory for comments on rejection sampling; J. Barrett and K. Karczewski for peer-reviewing our preprint; R. Torres for assistance with data analysis; J. Wall for assistance with the Neanderthal-introgressed alleles; and A. Hernandez for discussions on figure colors. Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health (award no. R01HG007644 to R.D.H. and award no. K25HL121295). L.H.U. was supported by an Institutional Research and Academic Career Development Award (National Institute of General Medical Sciences, grant no. K12GM088033). K.H. was supported by a Gilliam Fellowship for Advanced Study; A.D. was supported by NIH (award nos. U01HG009080 and R01HG006399).

Author information

R.D.H. and N.Z. conceived and designed the study. L.H.U. and A.D. developed methods. R.D.H., L.H.U., K.H., C.Y., A.D. and N.Z. contributed to data analysis or simulations. R.D.H. and N.Z. wrote the manuscript. All authors read and approved the manuscript.

Correspondence to Ryan D. Hernandez or Noah Zaitlen.

Ethics declarations

Competing interests

The authors declare no competing interests.


Supplementary information

Supplementary Information

Supplementary Note, Tables 1 and 2, and Figs. 1–27

Reporting Summary


About this article


Cite this article

Hernandez, R.D., Uricchio, L.H., Hartman, K. et al. Ultrarare variants drive substantial cis heritability of human gene expression. Nat Genet 51, 1349–1355 (2019). https://doi.org/10.1038/s41588-019-0487-7
