An unexpectedly large number of human autosomal genes are subject to monoallelic expression (MAE). Our analysis of 4,227 such genes uncovers surprisingly high genetic variation across human populations. This increased diversity is unlikely to reflect relaxed purifying selection. Remarkably, MAE genes exhibit an elevated recombination rate and an increased density of hypermutable sequence contexts. However, these factors do not fully account for the increased diversity. We find that the elevated nucleotide diversity of MAE genes is also associated with greater allelic age: variants in these genes tend to be older and are enriched in polymorphisms shared by Neanderthals and chimpanzees. Both synonymous and nonsynonymous alleles of MAE genes have elevated average population frequencies. We also observed strong enrichment of the MAE signature among genes reported to evolve under balancing selection. We propose that an important biological function of widespread MAE might be the generation of cell-to-cell heterogeneity; the increased genetic variation contributes to this heterogeneity.
Savova, V., Vigneau, S. & Gimelbrant, A.A. Autosomal monoallelic expression: genetics of epigenetic diversity? Curr. Opin. Genet. Dev. 23, 642–648 (2013).
Chess, A., Simon, I., Cedar, H. & Axel, R. Allelic inactivation regulates olfactory receptor gene expression. Cell 78, 823–834 (1994).
Gimelbrant, A., Hutchinson, J.N., Thompson, B.R. & Chess, A. Widespread monoallelic expression on human autosomes. Science 318, 1136–1140 (2007).
Zwemer, L.M. et al. Autosomal monoallelic expression in the mouse. Genome Biol. 13, R10 (2012).
Nag, A. et al. Chromatin signature of widespread monoallelic expression. eLife 2, e01256 (2013).
Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
Jeffries, A.R. et al. Stochastic choice of allelic expression in human neural stem cells. Stem Cells 30, 1938–1947 (2012).
Gendrel, A.V. et al. Developmental dynamics and disease potential of random monoallelic gene expression. Dev. Cell 28, 366–380 (2014).
Eckersley-Maslin, M.A. et al. Random monoallelic gene expression increases upon embryonic stem cell differentiation. Dev. Cell 28, 351–365 (2014).
Li, S.M. et al. Transcriptome-wide survey of mouse CNS-derived cells reveals monoallelic expression within novel gene families. PLoS One 7, e31751 (2012).
Pereira, J.P., Girard, R., Chaby, R., Cumano, A. & Vieira, P. Monoallelic expression of the murine gene encoding Toll-like receptor 4. Nat. Immunol. 4, 464–470 (2003).
Spencer, H.G. Population genetics and evolution of genomic imprinting. Annu. Rev. Genet. 34, 457–477 (2000).
Wilkins, J.F. & Haig, D. What good is genomic imprinting: the function of parent-specific gene expression. Nat. Rev. Genet. 4, 359–368 (2003).
Wu, C.T. & Dunlap, J.C. Homology effects: the difference between 1 and 2. Adv. Genet. 46, xvii–xxiii (2002).
Hoehe, M.R. et al. Multiple haplotype–resolved genomes reveal population patterns of gene and protein diplotypes. Nat. Commun. 5, 5569 (2014).
Chess, A. Mechanisms and consequences of widespread random monoallelic expression. Nat. Rev. Genet. 13, 421–428 (2012).
1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010).
Francioli, L.C. et al. Genome-wide patterns and properties of de novo mutations in humans. Nat. Genet. 47, 822–826 (2015).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Nag, A., Vigneau, S., Savova, V., Zwemer, L.M. & Gimelbrant, A.A. Chromatin signature identifies monoallelic gene expression across mammalian cell types. G3 (Bethesda) 5, 1713–1720 (2015).
Nei, M. & Li, W.H. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76, 5269–5273 (1979).
Kimura, M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267, 275–276 (1977).
Chamary, J.V. & Hurst, L.D. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 6, R75 (2005).
Samocha, K.E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
Walser, J.C. & Furano, A.V. The mutational spectrum of non-CpG DNA varies with CpG content. Genome Res. 20, 875–882 (2010).
Li, W.-H. Molecular Evolution (Sinauer Associates, 1997).
Begun, D.J. & Aquadro, C.F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519–520 (1992).
Charlesworth, B., Morgan, M.T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
Smith, J.M. & Haigh, J. The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23–35 (1974).
Hellmann, I. et al. Why do human diversity levels vary at a megabase scale? Genome Res. 15, 1222–1231 (2005).
Necsulea, A., Sémon, M., Duret, L. & Hurst, L.D. Monoallelic expression and tissue specificity are associated with high crossover rates. Trends Genet. 25, 519–522 (2009).
Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010).
Kiezun, A. et al. Deleterious alleles in the human genome are on average younger than neutral alleles of the same frequency. PLoS Genet. 9, e1003301 (2013).
Rasmussen, M.D., Hubisz, M.J., Gronau, I. & Siepel, A. Genome-wide inference of ancestral recombination graphs. PLoS Genet. 10, e1004342 (2014).
Andrés, A.M. et al. Targets of balancing selection in the human genome. Mol. Biol. Evol. 26, 2755–2764 (2009).
Leffler, E.M. et al. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science 339, 1578–1582 (2013).
Veyrieras, J.B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 4, e1000214 (2008).
Sellis, D., Callahan, B.J., Petrov, D.A. & Messer, P.W. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc. Natl. Acad. Sci. USA 108, 20666–20671 (2011).
DeGiorgio, M., Lohmueller, K.E. & Nielsen, R. A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLoS Genet. 10, e1004561 (2014).
Yang, S. et al. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523, 463–467 (2015).
Eisenberg, E. & Levanon, E.Y. Human housekeeping genes, revisited. Trends Genet. 29, 569–574 (2013).
Borel, C. et al. Biased allelic expression in human primary fibroblast single cells. Am. J. Hum. Genet. 96, 70–80 (2015).
Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
Cai, J.J., Macpherson, J.M., Sella, G. & Petrov, D.A. Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genet. 5, e1000336 (2009).
Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
Bustamante, C.D. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005).
Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005).
We thank D. Balick for useful discussions and I. Adzhubei for help with PolyPhen analysis. This work was supported in part by the following US National Institutes of Health (NIH) awards: R01 GM114864 to A.A.G. and R01 GM078598, GM105857 and MH101244 to S.R.S. A.A.G. was supported in part by the Pew scholar award; T.L.L. was supported by a fellowship from the German Research Foundation (DFG; LE 2593/1-1 and LE 2593/2-1); L.G. was a summer scholar in the Harvard/Massachusetts Institute of Technology BIG program (supported by US NIH award U54 LM008748); and R.B.M. and C.W. were supported by grants from the US NIH/National Institute of General Medical Sciences (R01 GM61936 and 5DP1 GM106412) and Harvard Medical School.
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Monoallelic expression can lead to coexistence of epigenetically distinct cell subpopulations.
A cell (circle) is located along the horizontal axis according to relative expression of paternal (Pat) and maternal (Mat) alleles. Cells form a uniform population when assessed with a focus on genes with both alleles transcriptionally active (biallelically expressed; BAE). By contrast, mitotically stable monoallelic expression leads to the formation of distinct cell subpopulations, depending on which allele is active (gray) and which allele is downregulated (x). Note that MAE genes might be expressed biallelically in some cells; thus, it is important to distinguish between biallelic expression in a given cell, which might occur both in MAE and BAE genes, and the capability of a gene to show monoallelic expression–based heterogeneity. Briefly, the properties of monoallelic expression can be summed up as follows (reviewed in ref. 1). Extreme allelic bias (tenfold or more) is common for monoallelic expression, although RNA-seq–based approaches also show examples of more attenuated biases. Monoallelic expression is highly mitotically stable over multiple cell divisions. Multiple genes are simultaneously subjected to monoallelic expression in a given cell and its clonal progeny, and allelic choice at each MAE locus is apparently independent. When the proteins encoded by the two alleles are functionally distinct, monoallelic expression can lead to dramatic functional differences between otherwise similar cells of the same type. This can result in astronomical combinatorial diversity in overall allelic expression patterns between clonal lineages of otherwise similar cells.
π is calculated for coding regions (CDS), including all sites. Error bars, 95% confidence intervals calculated by bootstrapping. Colors are as in the main figures. Groups: ESP-all, total for all populations; ESP-AA, African-American; ESP-EA, American of European descent.
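The per-set π values and bootstrap intervals described in this caption can be illustrated with a minimal sketch. It assumes biallelic sites, the standard unbiased per-site heterozygosity estimator, and resampling of whole genes with replacement; the caption does not state what unit was resampled, so that choice, the function names, and the toy data are all hypothetical.

```python
import random

def site_pi(alt_count, n_chrom):
    """Unbiased heterozygosity at one biallelic site: 2pq * n / (n - 1)."""
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)

def gene_set_pi(genes):
    """Pooled pi for a gene set: summed site heterozygosity / summed CDS length.

    Each gene is (cds_length, [alt allele counts at variant sites], n_chrom).
    Monomorphic sites add nothing to the numerator but count in the length.
    """
    total = sum(site_pi(c, n) for _, counts, n in genes for c in counts)
    length = sum(cds_len for cds_len, _, _ in genes)
    return total / length

def bootstrap_ci(genes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling genes (an assumption)."""
    rng = random.Random(seed)
    stats = sorted(
        gene_set_pi([rng.choice(genes) for _ in genes]) for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical toy gene set: (CDS length, alt allele counts, chromosomes sampled).
toy_genes = [
    (1200, [1, 5], 10),
    (900, [2], 10),
    (1500, [], 10),
    (600, [3, 1, 4], 10),
]
print(gene_set_pi(toy_genes), bootstrap_ci(toy_genes))
```

Pooling the numerator and denominator across genes weights each gene by its length; averaging per-gene π instead would be an equally defensible variant.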
Supplementary Figure 3 Average nucleotide diversity (π) for MAE and BAE genes in the global 1000 Genomes data set.
π is calculated for the coding regions (CDS), including all sites. Error bars, 95% confidence intervals calculated by bootstrapping. Orange, BAE genes; blue, MAE genes. (a) π per cell line for genes classified as MAE and BAE in that cell line. (b) π for genes classified as MAE with RPKM >1 in only one (MAE 1), two, three or four cell lines, as compared to genes classified as BAE. (c) π for genes experimentally determined to be MAE (217) or BAE (2,412) on the basis of SNP array assays of five clones from the GM13130 cell line, as reported in ref. 3. (d) π for MAE and BAE genes by expression level for low, intermediate and high expression, as determined in e. (e) Definition of the low, intermediate and high expression categories for genes in the genome-wide data set. RPKM is the highest RPKM observed with each gene’s assigned status in the six cell lines; the boundaries of the categories are shown in hashed lines. (f) Mutation rate–corrected, non-CpG-prone π values for MAE and BAE genes by expression level. 95% confidence intervals were estimated by bootstrapping. Colors are as in a. (g) Mutation rate–corrected π for MAE and BAE genes by expression level. 95% confidence intervals were estimated by bootstrapping. Colors are as in a.

Note for f and g: Nucleotide diversity (π) in expression level bins. BAE genes are shifted toward much higher mRNA expression levels as compared to MAE genes (blue, MAE genes; orange, BAE genes). Although previous studies have suggested that highly expressed genes are subjected to higher selective pressure than weakly expressed genes (Proc. Natl. Acad. Sci. USA 102, 14338–14343 (2005) and Trends Genet. 24, 114–123 (2008)), in our gene sets, we do not find strong evidence for a negative correlation between π and expression levels.

Specifically, we stratified MAE and BAE genes into eight equally sized bins by expression levels in six cell types (log₁₀(RPKM); see the Online Methods for our definition of expression levels) and examined the linear relationship between π and expression level (f, non-CpG π; g, overall π). The difference in mutation rate was corrected with a divergence-based mutation rate map. Expression level is not significantly correlated with non-CpG π (P = 0.52 for MAE and 0.61 for BAE) or overall π (P = 0.10 for MAE and 0.07 for BAE). Note that even the marginal correlation between expression level and overall π for BAE genes is almost entirely driven by the genes in the highest expression level bin (log₁₀(RPKM) >2.0), without which the trend becomes flat (P = 0.89; solid black line). This most highly expressed group of genes can explain only 9% of the difference in overall π (Δπ) between MAE and BAE genes.

As an extremely conservative assumption, one can suppose that the nonsignificant trend of overall π for BAE genes is uniform and extends to the lowest expression levels (dashed black line). Even in that case, the potential contribution of expression bias is estimated to explain only 36% of Δπ. To estimate this, we extrapolated the π values of hypothetical BAE genes that follow the distribution of gene length and expression level of MAE genes by

$$\pi_{\mathrm{BAE}}^{\mathrm{extrap}} = \frac{\sum_i n_i \,(\hat{a} + \hat{b}\, x_i)}{\sum_i n_i}$$

where $\hat{a}$ and $\hat{b}$ are the estimated intercept and slope of the BAE trend over expression level (7.8 × 10⁻⁴ and −6.2 × 10⁻⁵, respectively; from dashed black line) and $n_i$ and $x_i$ are the number of fourfold-degenerate sites and the expression level of MAE gene i.
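One way to read the extrapolation described above is as a site-weighted average of the BAE trend line evaluated at each MAE gene's expression level. The sketch below follows that reading. The intercept and slope are the values quoted in the caption; the weighting by fourfold-degenerate site count, the use of log10(RPKM) as the expression scale, the function name, and the toy genes are assumptions made for illustration.

```python
BAE_INTERCEPT = 7.8e-4   # quoted in the caption (dashed black line)
BAE_SLOPE = -6.2e-5      # quoted in the caption (dashed black line)

def extrapolated_bae_pi(mae_genes, intercept=BAE_INTERCEPT, slope=BAE_SLOPE):
    """Site-weighted mean of the BAE trend evaluated at each MAE gene.

    mae_genes: iterable of (n_sites, expression_level) pairs, where
    expression_level is assumed to be log10(RPKM).
    """
    mae_genes = list(mae_genes)
    weighted = sum(n * (intercept + slope * x) for n, x in mae_genes)
    sites = sum(n for n, _ in mae_genes)
    return weighted / sites

# Hypothetical MAE genes: (fourfold-degenerate sites, log10 RPKM).
toy_mae = [(100, 0.0), (300, 1.0)]
print(extrapolated_bae_pi(toy_mae))
```

Under this reading, the share of Δπ attributable to expression bias would be the gap between this extrapolated value and the observed BAE π, divided by the observed MAE–BAE gap.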
(a) Genes encoding extracellular matrix molecules (ECM, Gene Ontology category GO:0031012) were strongly enriched for MAE genes (Fisher’s exact test, odds ratio = 8.09, P = 7.5 × 10⁻³³). The total number of genes (Genome) and the number of genes associated with ECM are given in parentheses. (b) Genes showing signatures of a trans-species polymorphic haplotype (TSP) are significantly enriched for MAE genes. This enrichment remained even when excluding genes encoding extracellular matrix molecules (ECM, Gene Ontology category GO:0031012), a functional category that has previously been associated with balancing selection (ref. 2). The total number of genes in each category is given in parentheses. Odds ratios and their significance level are reported (*P < 0.05, **P < 0.01).
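The enrichment test named in this caption can be sketched for a 2 × 2 table of gene counts as follows. This is an illustrative standard-library implementation, not the authors' code: it reports the sample odds ratio (the caption does not say which odds-ratio estimator was used) and a two-sided P value that sums the hypergeometric probabilities of all tables no more likely than the observed one. The counts in the example are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (sample odds ratio, two-sided P) for the table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    k_min = max(0, col1 - (n - row1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Two-sided P: total probability of tables no more probable than observed.
    p_value = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts: rows = category member / non-member, columns = MAE / BAE.
odds, p = fisher_exact_2x2(30, 70, 50, 850)
print(odds, p)
```

The small relative tolerance in the comparison guards against floating-point ties between equally probable tables.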
Supplementary Figure 5 Proportions in the genome-wide group are similar to proportions in the group that excludes the main assessed gene list.
(a) Percentages of MAE (blue) and BAE (orange) genes among genes known to cause Mendelian diseases extracted from the OMIM database (OMIM MorbidMap), among genes not known to cause Mendelian diseases (Other) and in the genome-wide data set as a whole. Shown on the colored portions of each bar are the actual numbers of genes in each category. (b) Proportion of MAE (blue) and BAE (orange) genes among genes thought to be under balancing selection (listed in Supplementary Table 5) as compared to the remaining fraction (Other) and the genome-wide data set.
The distribution of per-gene ratios of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) (obtained from ref. 8) is presented for 2,006 MAE (blue) and 3,209 BAE (orange) genes within the range of 0 ≤ dN/dS < 2.
(a) Site frequency spectra for variants in BAE and MAE genes by PolyPhen-2 classification for the African subpopulation. Top, benign variants; middle, fourfold-degenerate synonymous variants; bottom, damaging variants (defined as the union of possibly damaging, probably damaging and nonsense variants). Insets zoom in on derived allele frequencies between 20% and 90%. The x axis represents derived allele frequencies in bins of 10%, and the y axis shows the fraction of variants. (b) The low-frequency tail of site frequency spectra showing the fractions of singleton and doubleton sites in the sequences in the 1000 Genomes Project, African population. (c) Low-frequency tail of site frequency spectra for the sequences in the 1000 Genomes Project, global population. The allele frequency bins on the x axis correspond to derived allele counts of 1 to 10.
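The two views of the site frequency spectrum described in this caption (10% frequency bins, and a low-frequency tail by derived allele count) can be sketched as below. The function names and toy inputs are hypothetical, and the bin convention (left-closed bins, with a frequency of exactly 1.0 placed in the last bin) is an assumption.

```python
def binned_sfs(derived_freqs, n_bins=10):
    """Fraction of variants in equal-width derived-allele-frequency bins."""
    counts = [0] * n_bins
    for f in derived_freqs:
        if not 0.0 < f <= 1.0:
            raise ValueError("derived allele frequency must be in (0, 1]")
        # Left-closed bins; f == 1.0 is folded into the last bin.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(derived_freqs)
    return [c / total for c in counts]

def low_frequency_tail(derived_counts, max_count=10):
    """Fraction of variants at each derived allele count 1..max_count."""
    total = len(derived_counts)
    return [
        sum(1 for c in derived_counts if c == k) / total
        for k in range(1, max_count + 1)
    ]

# Hypothetical variants: derived allele frequencies and derived allele counts.
print(binned_sfs([0.05, 0.05, 0.15, 0.95]))
print(low_frequency_tail([1, 1, 2, 5, 30]))
```

Both functions normalize by the total number of variants supplied, so spectra for gene sets of different sizes are directly comparable.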
Red lines represent the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. The local recombination rate is defined as the average over a 410-kb window centered at each gene, based on the deCODE sex-averaged genetic map.
For each gene, mean TMRCA over the gene length was calculated from genome-wide TMRCA estimates generated by running ARGWeaver on Complete Genomics data (ref. 9).
Supplementary Figure 10 Residuals for MAE and BAE genes in a multivariate regression model of TMRCA values as a function of individual variables.
For each gene (blue, MAE; orange, BAE), mean TMRCA over the gene length was calculated from genome-wide TMRCA estimates generated by running ARGWeaver on Complete Genomics data (ref. 9). The TMRCA estimates were log transformed for the regression analysis. All confounders except one (plotted on the x axis) were residualized for each panel. The intercept and slope of the MAE and BAE trend lines (solid line, MAE; dashed line, BAE) are from the multivariate regression model. The gap between trend lines represents a 1.06-fold increase in TMRCA for MAE as compared to BAE genes (P = 7.5 × 10⁻⁸). See the Online Methods for details on model covariates.
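The relationship between the regression described in this caption and the reported 1.06-fold gap can be written out as a short worked example. The notation is illustrative rather than taken from the paper: $M_g$ is an indicator for MAE status, $x_{gk}$ are the covariates, and a natural-log transform is assumed (the caption does not state the base).

```latex
% Multivariate model on log-transformed TMRCA for gene g
\log \mathrm{TMRCA}_g = \beta_0 + \beta_{\mathrm{MAE}} M_g + \sum_k \beta_k x_{gk} + \varepsilon_g

% Partial residual plotted against covariate j (all other covariates removed)
r_g^{(j)} = \log \mathrm{TMRCA}_g - \sum_{k \neq j} \hat{\beta}_k x_{gk}

% The vertical gap between the MAE and BAE trend lines is \hat{\beta}_{\mathrm{MAE}};
% on the original scale it is a fold change
\frac{\mathrm{TMRCA}_{\mathrm{MAE}}}{\mathrm{TMRCA}_{\mathrm{BAE}}} = e^{\hat{\beta}_{\mathrm{MAE}}} = 1.06
\quad\Longrightarrow\quad
\hat{\beta}_{\mathrm{MAE}} = \ln 1.06 \approx 0.058
```

With a base-10 transform, the same fold change would correspond to a coefficient of $\log_{10} 1.06 \approx 0.025$.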
Supplementary Figure 11 Identification of genes as MAE on the basis of their chromatin signature is consistent between unrelated individuals.
Comparison of the gene sets identified as MAE or BAE on the basis of ChIP-seq data in different individuals. Chromatin signature analysis was performed using ChIP-seq data from HapMap lymphoblastoid cell lines. Left bar, comparison of analyses performed on GM12878 (CEU) and GM19239 (YRI). Center bar, comparison of MAE and BAE calls obtained using ChIP-seq data from GM12878 (CEU) cells in ENCODE (ref. 4; same data set used in ref. 5) and in a biological replicate using the same cell line in another laboratory (ref. 6; “rep”). Right bar, the same gene set as in the center but additionally limited only to genes that show an expression level of RPKM >1 in GM12878 cells. BAE genes in both samples are shown in orange; MAE genes in both samples are shown in blue; genes that are MAE in the first listed sample but BAE in the other are shown in red; and genes that are BAE in the first listed sample but MAE in the other are shown in green. Only genes with calls made for both samples are used in this analysis. The numbers of genes in the major categories are shown.
Supplementary Figure 12 High allelic diversity in the context of monoallelic expression leads to an increase in functional cell-to-cell heterogeneity.
Supplementary Figures 1–13 and Supplementary Tables 2–4, 8 and 9. (PDF 2064 kb)
MAE and BAE calls for genes used in the study. (XLSX 348 kb)
De novo mutation rate in MAE and BAE genes. (XLSX 17 kb)
Nucleotide diversity in recombination rate bins. (XLSX 12 kb)
Nucleotide diversity in recombination rate bins with strict read depth mask and divergence-based correction for CpG mutation bias. (XLSX 12 kb)
Genes in the study for which balancing selection has been reported. (XLSX 24 kb)
Analysis of human-chimpanzee trans-species polymorphisms. (XLSX 14 kb)
Analysis of derived alleles predating the human-Neanderthal split. (XLSX 17 kb)
Savova, V., Chun, S., Sohail, M. et al. Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 48, 231–237 (2016). https://doi.org/10.1038/ng.3493