Abstract
The vast majority of human mutations have minor allele frequencies under 1%, with the plurality observed only once (that is, ‘singletons’). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes is largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole-genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute approximately 25% of cis heritability across genes (dwarfing the contributions of other frequencies). The majority (approximately 76%) of singleton heritability derives from ultrarare variants absent from thousands of additional samples. We develop an inference procedure to demonstrate that our results are consistent with pervasive purifying selection shaping the regulatory architecture of most human genes.
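The Code availability statement says these estimates come from Haseman–Elston analyses. The Python below is a minimal sketch of that general technique and is not the authors' HEh2.R. It assumes genotypes coded 0/1/2 (rows are individuals, columns are variants) and a phenotype already standardized to mean 0 and variance 1. It splits variants into singletons and non-singletons by minor allele count and builds one genetic relationship matrix (GRM) per partition. It then regresses pairwise phenotype products on the GRM entries, so each coefficient estimates the heritability attributable to one partition. The two-way split, the no-intercept fit and all function names are illustrative assumptions; the study's own frequency bins and model choices may differ.

```python
"""Hypothetical sketch: Haseman-Elston (HE) regression with frequency-partitioned GRMs.

Not the authors' HEh2.R. Assumes genotypes coded 0/1/2 (rows = individuals,
columns = variants) and a phenotype y already standardized to mean 0, variance 1.
"""
from math import sqrt


def split_by_count(genotypes):
    """Return (singleton_columns, other_columns); monomorphic columns are dropped."""
    n = len(genotypes)
    singletons, others = [], []
    for j in range(len(genotypes[0])):
        count = sum(row[j] for row in genotypes)
        minor = min(count, 2 * n - count)  # minor allele count out of 2n haplotypes
        if minor == 1:
            singletons.append(j)
        elif minor > 1:
            others.append(j)
    return singletons, others


def grm(genotypes, columns):
    """Genetic relationship matrix K = Z Z^T / M over the given polymorphic columns."""
    n, m = len(genotypes), len(columns)
    z = [[0.0] * m for _ in range(n)]
    for c, j in enumerate(columns):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = sqrt(sum((g - mean) ** 2 for g in col) / n)
        for i in range(n):
            z[i][c] = (col[i] - mean) / sd  # standardized genotype
    return [[sum(z[i][c] * z[k][c] for c in range(m)) / m for k in range(n)]
            for i in range(n)]


def solve(a, b):
    """Solve the small system a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][m] / aug[r][r] for r in range(m)]


def he_regression(y, grms):
    """Least-squares fit of y_i*y_j on (K_1[i][j], ..., K_c[i][j]) over pairs i < j, no intercept.

    With standardized y, coefficient k estimates the heritability attributed to GRM k.
    """
    n, c = len(y), len(grms)
    xtx = [[0.0] * c for _ in range(c)]
    xty = [0.0] * c
    for i in range(n):
        for j in range(i + 1, n):
            x = [k[i][j] for k in grms]  # GRM entries for this pair
            prod = y[i] * y[j]           # phenotype cross-product
            for a in range(c):
                xty[a] += x[a] * prod
                for b in range(c):
                    xtx[a][b] += x[a] * x[b]
    return solve(xtx, xty)
```

For a genotype matrix `G` and standardized phenotype `y`, `s, o = split_by_count(G)` followed by `he_regression(y, [grm(G, s), grm(G, o)])` returns a singleton and a non-singleton estimate. The pure-Python loops are quadratic in sample size and meant only to show the logic; real analyses would use matrix libraries.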
Data availability
RNA-seq gene expression data were downloaded from http://www.internationalgenome.org/data-portal/data-collection/geuvadis. This dataset contains 375 individuals of European descent from four locations. All of these individuals are also part of the 1000 Genomes Project (1KGP), and their genome sequence data were downloaded from www.1000genomes.org (ref. 24).
Code availability
Three open-source software tools are made available as part of this study, all on GitHub: (1) HEh2.R, R code that performs all the Haseman–Elston analyses and simulations discussed in this paper and also implements the average information (AI) algorithm for parameter inference in linear mixed models, available from https://github.com/hernrya/HEh2; (2) SingHer, the R package discussed in the Supplementary Note, together with performance statistics, available from https://github.com/andywdahl/SingHer; and (3) rejection-sampling scripts demonstrating how we used rejection sampling to infer parameters of the phenotype model, available from https://github.com/uricchio/HE_scripts.
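Item (3) refers to rejection sampling, a simple form of approximate Bayesian computation. Parameters are drawn from a prior, data are simulated under each draw, and only draws whose simulated summary statistic lands close to the observed one are kept. The Python below is a generic sketch of that idea, not the authors' HE_scripts code. `toy_simulator`, the Uniform(0, 1) prior, the tolerance and the observed value are hypothetical placeholders standing in for the study's simulations of the phenotype model.

```python
"""Hypothetical sketch of rejection sampling (approximate Bayesian computation).

Not the authors' HE_scripts code. `toy_simulator` is a placeholder standing in
for a population-genetic simulation of the phenotype model.
"""
import random


def toy_simulator(theta, rng):
    """Hypothetical model: the summary statistic is theta plus Gaussian noise."""
    return theta + rng.gauss(0.0, 0.05)


def rejection_sample(observed, simulator, n_draws, tolerance, seed=1):
    """Draw theta ~ Uniform(0, 1), simulate, and keep draws within tolerance of observed."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 1.0)           # draw a parameter from the prior
        stat = simulator(theta, rng)            # simulate a summary statistic
        if abs(stat - observed) <= tolerance:   # accept if close to the observed value
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    posterior = rejection_sample(observed=0.3, simulator=toy_simulator,
                                 n_draws=20000, tolerance=0.02)
    print(len(posterior), sum(posterior) / len(posterior))
```

The accepted draws approximate the posterior distribution of the parameter. Shrinking the tolerance tightens the approximation at the cost of accepting fewer draws, so in practice the tolerance and number of draws are traded off against simulation cost.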
References
Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12, 745–755 (2011).
Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).
Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M. & Dermitzakis, E. T. Rare and common regulatory variation in population-scale sequenced human genomes. PLoS Genet. 7, e1002144 (2011).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).
Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
Eyre-Walker, A. Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl Acad. Sci. USA 107, 1752–1756 (2010).
Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).
Simons, Y. B., Turchin, M. C., Pritchard, J. K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 46, 220–224 (2014).
Lohmueller, K. E. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 10, e1004379 (2014).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Mancuso, N. et al. The contribution of rare variation to prostate cancer heritability. Nat. Genet. 48, 30–35 (2016).
Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).
Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet. 9, e1003993 (2013).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).
Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).
Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).
Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).
Simons, Y. B., Bullaughey, K., Hudson, R. R. & Sella, G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985 (2018).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Dickson, S. P., Wang, K., Krantz, I., Hakonarson, H. & Goldstein, D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).
McCoy, R. C., Wakefield, J. & Akey, J. M. Impacts of Neanderthal-introgressed sequences on the landscape of human gene expression. Cell 168, 916–927.e12 (2017).
Chen, L., Liu, P., Evans, T. C. Jr. & Ettwiller, L. M. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017).
Uricchio, L. H., Torres, R., Witte, J. S. & Hernandez, R. D. Population genetic simulations of complex phenotypes with implications for rare variant association tests. Genet. Epidemiol. 39, 35–44 (2015).
Hernandez, R. D. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24, 2786–2787 (2008).
Tavaré, S., Balding, D. J., Griffiths, R. C. & Donnelly, P. Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997).
Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
Torgerson, D. G. et al. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. 5, e1000592 (2009).
Boyko, A. R. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008).
Katzman, S. et al. Human genome ultraconserved elements are ultraselected. Science 317, 915 (2007).
Andrés, A. M. et al. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6, e1001157 (2010).
Price, A. L. et al. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 7, e1001317 (2011).
Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
Powell, J. E. et al. Genetic control of gene expression in whole blood and lymphoblastoid cell lines is largely independent. Genome Res. 22, 456–466 (2012).
Glassberg, E. C., Gao, Z., Harpak, A., Lan, X. & Pritchard, J. K. Measurement of selective constraint on human gene expression. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/345801v1 (2018).
Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/563866v1 (2019).
Schweiger, R. et al. Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests. Nat. Commun. 9, 4919 (2018).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Acknowledgements
We thank H. M. Kang, A. Auton and S. Gusev for discussions about possible confounders that improved our analysis; members of the Pritchard laboratory for comments on rejection sampling; J. Barrett and K. Karczewski for peer-reviewing our preprint; R. Torres for assistance with data analysis; J. Wall for assistance with the Neanderthal-introgressed alleles; and A. Hernandez for discussions on figure colors. Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health (award no. R01HG007644 to R.D.H. and award no. K25HL121295). L.H.U. was supported by an Institutional Research and Academic Career Development Award (National Institute of General Medical Sciences, grant no. K12GM088033). K.H. was supported by a Gilliam Fellowship for Advanced Study; A.D. was supported by NIH (award nos. U01HG009080 and R01HG006399).
Author information
Contributions
R.D.H. and N.Z. conceived and designed the study. L.H.U. and A.D. developed methods. R.D.H., L.H.U., K.H., C.Y., A.D. and N.Z. contributed to data analysis or simulations. R.D.H. and N.Z. wrote the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Note, Tables 1 and 2, and Figs. 1–27
About this article
Cite this article
Hernandez, R.D., Uricchio, L.H., Hartman, K. et al. Ultrarare variants drive substantial cis heritability of human gene expression. Nat Genet 51, 1349–1355 (2019). https://doi.org/10.1038/s41588-019-0487-7
DOI: https://doi.org/10.1038/s41588-019-0487-7
This article is cited by
- Challenge accepted: uncovering the role of rare genetic variants in Alzheimer’s disease. Molecular Neurodegeneration (2022)
- Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes. Nature Communications (2022)
- Transposable elements maintain genome-wide heterozygosity in inbred populations. Nature Communications (2022)
- Using the optimal method—explained variance weighted genetic risk score to predict the efficacy of folic acid therapy to hyperhomocysteinemia. European Journal of Clinical Nutrition (2022)
- Genome-wide superior alleles, haplotypes and candidate genes associated with tolerance on sodic-dispersive soils in wheat (Triticum aestivum L.). Theoretical and Applied Genetics (2022)