Genes with monoallelic expression contribute disproportionately to genetic diversity in humans

Abstract

An unexpectedly large number of human autosomal genes are subject to monoallelic expression (MAE). Our analysis of 4,227 such genes uncovers surprisingly high genetic variation across human populations. This increased diversity is unlikely to reflect relaxed purifying selection. Remarkably, MAE genes exhibit an elevated recombination rate and an increased density of hypermutable sequence contexts. However, these factors do not fully account for the increased diversity. We find that the elevated nucleotide diversity of MAE genes is also associated with greater allelic age: variants in these genes tend to be older and are enriched in polymorphisms shared by Neanderthals and chimpanzees. Both synonymous and nonsynonymous alleles of MAE genes have elevated average population frequencies. We also observed strong enrichment of the MAE signature among genes reported to evolve under balancing selection. We propose that an important biological function of widespread MAE might be the generation of cell-to-cell heterogeneity; the increased genetic variation contributes to this heterogeneity.

Figure 1: Nucleotide diversity is higher in MAE genes.
Figure 2: Purifying selection, mutation rate and recombination as potential sources of genetic diversity in MAE genes.
Figure 3: Older variants and genes under balancing selection are enriched among MAE genes.
Figure 4: Trans-species polymorphisms are enriched among MAE genes.

References

  1. Savova, V., Vigneau, S. & Gimelbrant, A.A. Autosomal monoallelic expression: genetics of epigenetic diversity? Curr. Opin. Genet. Dev. 23, 642–648 (2013).
  2. Chess, A., Simon, I., Cedar, H. & Axel, R. Allelic inactivation regulates olfactory receptor gene expression. Cell 78, 823–834 (1994).
  3. Gimelbrant, A., Hutchinson, J.N., Thompson, B.R. & Chess, A. Widespread monoallelic expression on human autosomes. Science 318, 1136–1140 (2007).
  4. Zwemer, L.M. et al. Autosomal monoallelic expression in the mouse. Genome Biol. 13, R10 (2012).
  5. Nag, A. et al. Chromatin signature of widespread monoallelic expression. eLife 2, e01256 (2013).
  6. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
  7. Jeffries, A.R. et al. Stochastic choice of allelic expression in human neural stem cells. Stem Cells 30, 1938–1947 (2012).
  8. Gendrel, A.V. et al. Developmental dynamics and disease potential of random monoallelic gene expression. Dev. Cell 28, 366–380 (2014).
  9. Eckersley-Maslin, M.A. et al. Random monoallelic gene expression increases upon embryonic stem cell differentiation. Dev. Cell 28, 351–365 (2014).
  10. Li, S.M. et al. Transcriptome-wide survey of mouse CNS-derived cells reveals monoallelic expression within novel gene families. PLoS One 7, e31751 (2012).
  11. Pereira, J.P., Girard, R., Chaby, R., Cumano, A. & Vieira, P. Monoallelic expression of the murine gene encoding Toll-like receptor 4. Nat. Immunol. 4, 464–470 (2003).
  12. Spencer, H.G. Population genetics and evolution of genomic imprinting. Annu. Rev. Genet. 34, 457–477 (2000).
  13. Wilkins, J.F. & Haig, D. What good is genomic imprinting: the function of parent-specific gene expression. Nat. Rev. Genet. 4, 359–368 (2003).
  14. Wu, C.T. & Dunlap, J.C. Homology effects: the difference between 1 and 2. Adv. Genet. 46, xvii–xxiii (2002).
  15. Hoehe, M.R. et al. Multiple haplotype–resolved genomes reveal population patterns of gene and protein diplotypes. Nat. Commun. 5, 5569 (2014).
  16. Chess, A. Mechanisms and consequences of widespread random monoallelic expression. Nat. Rev. Genet. 13, 421–428 (2012).
  17. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  18. Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
  19. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010).
  20. Francioli, L.C. et al. Genome-wide patterns and properties of de novo mutations in humans. Nat. Genet. 47, 822–826 (2015).
  21. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  22. Nag, A., Vigneau, S., Savova, V., Zwemer, L.M. & Gimelbrant, A.A. Chromatin signature identifies monoallelic gene expression across mammalian cell types. G3 (Bethesda) 5, 1713–1720 (2015).
  23. Nei, M. & Li, W.H. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76, 5269–5273 (1979).
  24. Kimura, M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267, 275–276 (1977).
  25. Chamary, J.V. & Hurst, L.D. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 6, R75 (2005).
  26. Samocha, K.E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
  27. Walser, J.C. & Furano, A.V. The mutational spectrum of non-CpG DNA varies with CpG content. Genome Res. 20, 875–882 (2010).
  28. Li, W.-H. Molecular Evolution (Sinauer Associates, 1997).
  29. Begun, D.J. & Aquadro, C.F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519–520 (1992).
  30. Charlesworth, B., Morgan, M.T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
  31. Smith, J.M. & Haigh, J. The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23–35 (1974).
  32. Hellmann, I. et al. Why do human diversity levels vary at a megabase scale? Genome Res. 15, 1222–1231 (2005).
  33. Necsulea, A., Sémon, M., Duret, L. & Hurst, L.D. Monoallelic expression and tissue specificity are associated with high crossover rates. Trends Genet. 25, 519–522 (2009).
  34. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010).
  35. Kiezun, A. et al. Deleterious alleles in the human genome are on average younger than neutral alleles of the same frequency. PLoS Genet. 9, e1003301 (2013).
  36. Rasmussen, M.D., Hubisz, M.J., Gronau, I. & Siepel, A. Genome-wide inference of ancestral recombination graphs. PLoS Genet. 10, e1004342 (2014).
  37. Andrés, A.M. et al. Targets of balancing selection in the human genome. Mol. Biol. Evol. 26, 2755–2764 (2009).
  38. Leffler, E.M. et al. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science 339, 1578–1582 (2013).
  39. Veyrieras, J.B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 4, e1000214 (2008).
  40. Sellis, D., Callahan, B.J., Petrov, D.A. & Messer, P.W. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc. Natl. Acad. Sci. USA 108, 20666–20671 (2011).
  41. DeGiorgio, M., Lohmueller, K.E. & Nielsen, R. A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLoS Genet. 10, e1004561 (2014).
  42. Yang, S. et al. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523, 463–467 (2015).
  43. Eisenberg, E. & Levanon, E.Y. Human housekeeping genes, revisited. Trends Genet. 29, 569–574 (2013).
  44. Borel, C. et al. Biased allelic expression in human primary fibroblast single cells. Am. J. Hum. Genet. 96, 70–80 (2015).
  45. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
  46. Cai, J.J., Macpherson, J.M., Sella, G. & Petrov, D.A. Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genet. 5, e1000336 (2009).
  47. Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
  48. Bustamante, C.D. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005).
  49. Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005).

Acknowledgements

We thank D. Balick for useful discussions and I. Adzhubei for help with PolyPhen analysis. This work was supported in part by the following US National Institutes of Health (NIH) awards: R01 GM114864 to A.A.G. and R01 GM078598, GM105857 and MH101244 to S.R.S. A.A.G. was supported in part by the Pew scholar award; T.L.L. was supported by a fellowship from the German Research Foundation (DFG; LE 2593/1-1 and LE 2593/2-1); L.G. was a summer scholar in the Harvard/Massachusetts Institute of Technology BIG program (supported by US NIH award U54 LM008748); and R.B.M. and C.W. were supported by grants from the US NIH/National Institute of General Medical Sciences (R01 GM61936 and 5DP1 GM106412) and Harvard Medical School.

Author information

Contributions

A.A.G. and S.R.S. conceived the study. All authors contributed to data analysis. A.A.G., S.R.S. and V.S. wrote the manuscript with input from S.C., M.S. and R.B.M.

Corresponding authors

Correspondence to Shamil R Sunyaev or Alexander A Gimelbrant.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Monoallelic expression can lead to coexistence of epigenetically distinct cell subpopulations.

A cell (circle) is located along the horizontal axis according to relative expression of paternal (Pat) and maternal (Mat) alleles. Cells form a uniform population when assessed with a focus on genes with both alleles transcriptionally active (biallelically expressed; BAE). By contrast, mitotically stable monoallelic expression leads to the formation of distinct cell subpopulations, depending on which allele is active (gray) and which allele is downregulated (x). Note that MAE genes might be expressed biallelically in some cells; thus, it is important to distinguish between biallelic expression in a given cell, which might occur both in MAE and BAE genes, and the capability of a gene to show monoallelic expression–based heterogeneity. Briefly, the properties of monoallelic expression can be summed up as follows (reviewed in ref. 1). Extreme allelic bias (tenfold or more) is common for monoallelic expression, although RNA-seq–based approaches also show examples of more attenuated biases. Monoallelic expression is highly mitotically stable over multiple cell divisions. Multiple genes are simultaneously subjected to monoallelic expression in a given cell and its clonal progeny, and allelic choice at each MAE locus is apparently independent. When the proteins encoded by the two alleles are functionally distinct, monoallelic expression can lead to dramatic functional differences between otherwise similar cells of the same type. This can result in astronomical combinatorial diversity in overall allelic expression patterns between clonal lineages of otherwise similar cells.
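The combinatorial argument at the end of this caption can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each heterozygous MAE locus independently takes one of a fixed number of allelic states (two for maternal-only or paternal-only expression, three if biallelic expression is also allowed); the function name and the locus counts are hypothetical and are not taken from the paper.

```python
def allelic_patterns(n_loci: int, states_per_locus: int = 2) -> int:
    """Count distinct allelic-expression patterns across independent loci.

    Assumption (illustrative only): every locus chooses its state
    independently, so the counts multiply.
    """
    if n_loci < 0 or states_per_locus < 1:
        raise ValueError("n_loci must be >= 0 and states_per_locus >= 1")
    return states_per_locus ** n_loci


if __name__ == "__main__":
    # Hypothetical numbers of heterozygous MAE loci in one clonal lineage.
    for k in (10, 100):
        two_state = allelic_patterns(k)        # maternal-only / paternal-only
        three_state = allelic_patterns(k, 3)   # also allowing biallelic expression
        print(f"{k} loci: {two_state:.3e} (2 states), {three_state:.3e} (3 states)")
```

Even a modest number of independent loci yields far more patterns than there are cells in a tissue, which is the sense in which the diversity can be called astronomical.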

Supplementary Figure 2 Nucleotide diversity in the Exome Sequencing Project subpopulations.

π is calculated for coding regions (CDS), including all sites. Error bars, 95% confidence intervals calculated by bootstrapping. Colors are as in the main figures. Groups: ESP-all, total for all populations; ESP-AA, African-American; ESP-EA, American of European descent.
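As a rough illustration of how a pooled π and a bootstrap confidence interval of this kind might be computed, the sketch below uses the standard per-site estimator 2p(1 − p)·n/(n − 1) and resamples genes with replacement. The per-gene input format, the percentile interval and all function names are assumptions made for this example; the paper's actual pipeline may differ.

```python
import random


def site_heterozygosity(alt_count: int, n_chrom: int) -> float:
    """Unbiased expected pairwise difference at one biallelic site."""
    p = alt_count / n_chrom
    return 2 * p * (1 - p) * n_chrom / (n_chrom - 1)


def pooled_pi(genes):
    """genes: list of (summed site heterozygosity, number of sites incl. invariant)."""
    total_het = sum(het for het, _ in genes)
    total_sites = sum(sites for _, sites in genes)
    return total_het / total_sites


def bootstrap_ci(genes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling whole genes with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        pooled_pi(rng.choices(genes, k=len(genes))) for _ in range(n_boot)
    )
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical genes: (sum of per-site heterozygosity, CDS length in sites).
    toy_genes = [(1.0, 1000), (3.0, 1000), (0.5, 1500), (2.0, 800)]
    print(pooled_pi(toy_genes), bootstrap_ci(toy_genes))
```

Resampling genes rather than sites keeps linked sites together, which is one common way to avoid understating the uncertainty.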

Supplementary Figure 3 Average nucleotide diversity (π) for MAE and BAE genes in the global 1000 Genomes data set.

π is calculated for the coding regions (CDS), including all sites. Error bars, 95% confidence intervals calculated by bootstrapping. Orange, BAE genes; blue, MAE genes. (a) π per cell line for genes classified as MAE and BAE in that cell line. (b) π for genes classified as MAE with RPKM >1 in only one (MAE 1), two, three or four cell lines, as compared to genes classified as BAE. (c) π for genes experimentally determined to be MAE (217) or BAE (2,412) on the basis of SNP array assays of five clones from the GM13130 cell line, as reported in ref. 3. (d) π for MAE and BAE genes by expression level for low, intermediate and high expression, as determined in e. (e) Definition of the low, intermediate and high expression categories for genes in the genome-wide data set. RPKM is the highest RPKM observed with each gene’s assigned status in the six cell lines; the boundaries of the categories are shown in hashed lines. (f) Mutation rate–corrected, non-CpG-prone π values for MAE and BAE genes by expression level. 95% confidence intervals were estimated by bootstrapping. Colors are as in a. (g) Mutation rate–corrected π for MAE and BAE genes by expression level. 95% confidence intervals were estimated by bootstrapping. Colors are as in a.

Note for f and g: Nucleotide diversity (π) in expression level bins. BAE genes are shifted toward much higher mRNA expression levels as compared to MAE genes (blue, MAE genes; orange, BAE genes). Although previous studies have suggested that highly expressed genes are subjected to higher selective pressure than weakly expressed genes (Proc. Natl. Acad. Sci. USA 102, 14338–14343 (2005) and Trends Genet. 24, 114–123 (2008)), in our gene sets, we do not find strong evidence for negative correlation between π and expression levels. Specifically, we stratified MAE and BAE genes into eight equally sized bins by expression levels in six cell types (log10(RPKM); see the Online Methods for our definition of expression levels) and examined the linear relationship between π and expression level (f, non-CpG π; g, overall π). The difference in mutation rate was corrected with a divergence-based mutation rate map. Expression level is not significantly correlated with non-CpG π (P = 0.52 for MAE and 0.61 for BAE) or overall π (P = 0.10 for MAE and 0.07 for BAE). Note that even the marginal correlation between expression level and overall π for BAE genes is almost entirely driven by the genes in the highest expression level bin (log10(RPKM) >2.0), without which the trend becomes flat (P = 0.89; solid black line). This most highly expressed group of genes can explain only 9% of the difference in overall π (Δπ) between MAE and BAE genes.

To make an extremely conservative assumption, one can argue that the insignificant trend of overall π for BAE is uniform and holds over the lowest expression levels (dashed black line). Even in that case, the potential contribution of expression bias is estimated to explain only 36% of Δπ. To estimate this, we extrapolated the π values of hypothetical BAE genes that follow the MAE distribution of gene length and expression level, using the estimated intercept and slope of the BAE trend over expression level (7.8 × 10^−4 and −6.2 × 10^−5, respectively; from dashed black line) together with the number of fourfold-degenerate sites and the expression level of each MAE gene i.
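One plausible reading of the extrapolation in this note is a site-weighted average of the BAE trend line evaluated at each MAE gene's expression level. The sketch below implements that reading only; the weighting by fourfold-degenerate sites, the use of log10(RPKM) as the expression scale, the function name and the toy genes are all assumptions, and only the intercept and slope are taken from the caption.

```python
# Intercept and slope of the BAE trend (dashed black line), as quoted in the caption.
BAE_INTERCEPT = 7.8e-4
BAE_SLOPE = -6.2e-5


def expected_pi_under_bae_trend(mae_genes, intercept=BAE_INTERCEPT, slope=BAE_SLOPE):
    """Site-weighted mean of the BAE trend evaluated at MAE expression levels.

    mae_genes: iterable of (n_fourfold_sites, expression_level) pairs, where
    expression_level is assumed to be on the log10(RPKM) scale.
    """
    weighted_sum = 0.0
    total_sites = 0
    for n_sites, expression in mae_genes:
        weighted_sum += n_sites * (intercept + slope * expression)
        total_sites += n_sites
    return weighted_sum / total_sites


if __name__ == "__main__":
    # Hypothetical MAE genes: (fourfold-degenerate sites, log10(RPKM)).
    toy_mae_genes = [(100, 0.0), (300, 1.0)]
    print(expected_pi_under_bae_trend(toy_mae_genes))
```

Comparing such an extrapolated value with the observed π for BAE and MAE genes would give the share of Δπ attributable to the expression-level shift under the conservative assumption.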

Supplementary Figure 4 MAE genes and extracellular matrix molecules.

(a) Genes encoding extracellular matrix molecules (ECM, Gene Ontology category GO:0031012) were strongly enriched for MAE genes (Fisher’s exact test, odds ratio = 8.09, P = 7.5 × 10^−33). The total number of genes (Genome) and the number of genes associated with ECM are given in parentheses. (b) Genes showing signatures of a trans-species polymorphic haplotype (TSP) are significantly enriched for MAE genes. This enrichment remained even when excluding genes encoding extracellular matrix molecules (ECM, Gene Ontology category GO:0031012), a functional category that has previously been associated with balancing selection (ref. 2). The total number of genes in each category is given in parentheses. Odds ratios and their significance level are reported (*P < 0.05, **P < 0.01).
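For readers unfamiliar with the enrichment statistic quoted here, the following sketch computes a sample odds ratio and a two-sided Fisher's exact P value for a 2 × 2 table using only the standard library. The table layout (MAE versus BAE by in-category versus not) and the counts in the usage example are hypothetical; the paper's own counts are not reproduced, and its software may use a different odds-ratio estimator.

```python
from math import comb


def fisher_exact(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, P value). The P value sums the hypergeometric
    probabilities of all tables no more likely than the observed one.
    """
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return odds_ratio, min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: rows = MAE / BAE, columns = in category / not in category.
    print(fisher_exact(3, 1, 1, 3))
```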

Supplementary Figure 5 Proportions in the genome-wide group are similar to proportions in the group that excludes the main assessed gene list.

(a) Percentages of MAE (blue) and BAE (orange) genes among genes known to cause Mendelian diseases extracted from the OMIM database (OMIM MorbidMap), genes not known to cause Mendelian diseases (Other) and in the genome-wide data set as a whole. Shown on the colored portions of each bar are the actual numbers of genes in each category. (b) Proportion of MAE (blue) and BAE (orange) genes among genes thought to be under balancing selection (listed in Supplementary Table 5) as compared to the remaining fraction (other) and the genome-wide data set.

Supplementary Figure 6 Distribution of dN/dS per gene.

The distribution of per-gene ratios of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) (obtained from ref. 8) is presented for 2,006 MAE (blue) and 3,209 BAE (orange) genes within the range of 0 ≤ dN/dS < 2.
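The ratio plotted here can be illustrated with a minimal calculation. The sketch below uses raw substitution counts per site and applies the plotting range stated in the caption; it deliberately omits the multiple-hit corrections that dedicated dN/dS estimators apply, and the function names and input numbers are hypothetical.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Naive dN/dS from counts (no multiple-hit correction).

    Returns None when dS is zero, because the ratio is undefined.
    """
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        return None
    return dn / ds


def in_plot_range(ratio, low=0.0, high=2.0):
    """Keep only genes with low <= dN/dS < high, as in the caption."""
    return ratio is not None and low <= ratio < high


if __name__ == "__main__":
    # Hypothetical gene: 5 nonsynonymous changes over 1,000 sites,
    # 10 synonymous changes over 500 sites.
    ratio = dn_ds(5, 1000, 10, 500)
    print(ratio, in_plot_range(ratio))
```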

Supplementary Figure 7 Allele frequencies are higher in MAE genes than in BAE genes.

(a) Site frequency spectra for variants in BAE and MAE genes by PolyPhen-2 classification for the African subpopulation. Top, benign variants; middle, fourfold-degenerate synonymous variants; bottom, damaging variants (defined as the union of possibly damaging, probably damaging and nonsense variants). Plot insets zoom into derived allele frequencies between 20% and 90%. The x axis represents derived allele frequencies in bins of 10%, and the y axis shows the fraction of variants. (b) The low-frequency tail of site frequency spectra showing the fractions of singleton and doubleton sites in the sequences in the 1000 Genomes Project, African population. (c) Low-frequency tail of site frequency spectra for the sequences in the 1000 Genomes Project, global population. The allele frequency bins on the x axis correspond to derived allele counts of 1 to 10.
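The two views of the site frequency spectrum described in this caption (10% frequency bins, and the low-count tail) could be tabulated along the following lines. This is a sketch under simple assumptions: variants are given as derived allele frequencies or counts, a frequency of exactly 1.0 is placed in the last bin, and the function names and toy values are hypothetical.

```python
from collections import Counter


def binned_sfs(derived_freqs, n_bins=10):
    """Fraction of variants per derived-allele-frequency bin (10% bins by default)."""
    if not derived_freqs:
        return [0.0] * n_bins
    counts = [0] * n_bins
    for freq in derived_freqs:
        index = min(int(freq * n_bins), n_bins - 1)  # freq == 1.0 goes in the top bin
        counts[index] += 1
    return [count / len(derived_freqs) for count in counts]


def low_count_fractions(derived_counts, max_count=10):
    """Fraction of variants with derived allele count 1..max_count (singletons first)."""
    tally = Counter(derived_counts)
    total = len(derived_counts)
    return {k: (tally.get(k, 0) / total if total else 0.0) for k in range(1, max_count + 1)}


if __name__ == "__main__":
    # Hypothetical variants.
    print(binned_sfs([0.05, 0.05, 0.15, 1.0]))
    print(low_count_fractions([1, 1, 2, 5], max_count=2))
```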

Supplementary Figure 8 Comparison of local recombination rates for MAE and BAE genes.

Red lines represent the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. The local recombination rate is defined as the average over a 410-kb window centered at each gene, based on the deCODE sex-averaged genetic map.
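The window average defined in this caption might be computed as below. The sketch assumes that the window is centered on the gene midpoint and that the genetic map is available as non-overlapping intervals with a rate in cM/Mb; both the interval format and the function name are assumptions for illustration, not the deCODE file format.

```python
WINDOW_BP = 410_000  # window size stated in the caption


def local_recombination_rate(gene_start, gene_end, map_intervals, window=WINDOW_BP):
    """Overlap-weighted mean rate in a window centered on the gene midpoint.

    map_intervals: iterable of (start_bp, end_bp, rate_cM_per_Mb) tuples.
    Returns None if the window overlaps no mapped interval.
    """
    center = (gene_start + gene_end) // 2
    window_start = center - window // 2
    window_end = center + window // 2
    covered_bp = 0
    weighted_rate = 0.0
    for start, end, rate in map_intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            covered_bp += overlap
            weighted_rate += overlap * rate
    return weighted_rate / covered_bp if covered_bp else None


if __name__ == "__main__":
    # Hypothetical map: 1 cM/Mb up to 1 Mb, then 3 cM/Mb.
    toy_map = [(0, 1_000_000, 1.0), (1_000_000, 2_000_000, 3.0)]
    print(local_recombination_rate(1_000_000, 1_010_000, toy_map))
```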

Supplementary Figure 9 Distribution of TMRCA for MAE and BAE genes.

For each gene, mean TMRCA over the gene length was calculated from genome-wide TMRCA estimates generated by running ARGWeaver on Complete Genomics data (ref. 9).

Supplementary Figure 10 Residuals for MAE and BAE genes in a multivariate regression model of TMRCA values as a function of individual variables.

For each gene (blue, MAE; orange, BAE), mean TMRCA over the gene length was calculated from genome-wide TMRCA estimates generated by running ARGWeaver on Complete Genomics data (ref. 9). The TMRCA estimates were log transformed for the regression analysis. All confounders except one (plotted on the x axis) were residualized for each panel. The intercept and slope of the MAE and BAE trend lines (solid line, MAE; dashed line, BAE) are from the multivariate regression model. The gap between trend lines represents a 1.06-fold increase in TMRCA for MAE as compared to BAE genes (P = 7.5 × 10^−8). See the Online Methods for details on model covariates.
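To show how a gap between trend lines on a log scale translates into a fold change, the sketch below fits log(TMRCA) on an MAE indicator plus a single covariate by ordinary least squares and exponentiates the indicator coefficient. It is a toy version under stated assumptions: natural logarithms, one covariate instead of the paper's full covariate set, a hand-written solver, and simulated noise-free data constructed to carry a 1.06-fold effect.

```python
import math


def ols(design, response):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * y for row, y in zip(design, response))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def mae_fold_change(tmrca, is_mae, covariate):
    """exp(coefficient of the MAE indicator) in a regression of log(TMRCA)."""
    design = [[1.0, float(m), c] for m, c in zip(is_mae, covariate)]
    response = [math.log(t) for t in tmrca]
    return math.exp(ols(design, response)[1])


if __name__ == "__main__":
    # Simulated genes with an exact 1.06-fold MAE effect and one covariate.
    covariate = [0.0, 1.0, 2.0, 3.0] * 2
    is_mae = [0, 0, 0, 0, 1, 1, 1, 1]
    tmrca = [math.exp(1.0 + math.log(1.06) * m + 0.5 * c) for m, c in zip(is_mae, covariate)]
    print(mae_fold_change(tmrca, is_mae, covariate))
```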

Supplementary Figure 11 Identification of genes as MAE on the basis of their chromatin signature is consistent between unrelated individuals.

Comparison of the gene sets identified as MAE or BAE on the basis of ChIP-seq data in different individuals. Chromatin signature analysis was performed using ChIP-seq data from HapMap lymphoblastoid cell lines. Left bar, comparison of analyses performed on GM12878 (CEU) and GM19239 (YRI). Center bar, comparison of MAE and BAE calls obtained using ChIP-seq data from GM12878 (CEU) cells in ENCODE (ref. 4; same data set used in ref. 5) and in a biological replicate using the same cell line in another laboratory (ref. 6; “rep”). Right bar, the same gene set as in the center but additionally limited only to genes that show an expression level of RPKM >1 in GM12878 cells. BAE genes in both samples are shown in orange, MAE genes in both samples are shown in blue, genes that are MAE in the first listed sample but BAE in the other are shown in red, and genes that are BAE in the first listed sample but MAE in the other are shown in green. Only genes with calls made for both samples are used in this analysis. The numbers of genes in the major categories are shown.
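The four categories in this comparison can be tallied with a few lines of code. The sketch below assumes each sample's calls are held in a dictionary mapping a gene identifier to "MAE" or "BAE", and it restricts the comparison to genes called in both samples, as the caption specifies; the gene identifiers and the function name are hypothetical.

```python
from collections import Counter

CATEGORY_LABELS = {
    ("BAE", "BAE"): "BAE in both",
    ("MAE", "MAE"): "MAE in both",
    ("MAE", "BAE"): "MAE in first, BAE in second",
    ("BAE", "MAE"): "BAE in first, MAE in second",
}


def compare_calls(first_sample, second_sample):
    """Count concordant and discordant calls for genes called in both samples."""
    shared_genes = first_sample.keys() & second_sample.keys()
    return Counter(
        CATEGORY_LABELS[(first_sample[gene], second_sample[gene])]
        for gene in shared_genes
    )


if __name__ == "__main__":
    # Hypothetical calls for two samples.
    sample_one = {"geneA": "MAE", "geneB": "BAE", "geneC": "MAE", "geneD": "BAE"}
    sample_two = {"geneA": "MAE", "geneB": "BAE", "geneC": "BAE", "geneE": "MAE"}
    print(compare_calls(sample_one, sample_two))
```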

Supplementary Figure 12 High allelic diversity in the context of monoallelic expression leads to an increase in functional cell-to-cell heterogeneity.

Supplementary Figure 13 Analysis of selective constraint in MAE and BAE genes.

Distribution of signed z scores (ref. 7) of BAE (orange) and MAE (blue) genes. z scores are binned into intervals of 0.5 units for 3,609 MAE and 5,191 BAE genes (P value = 0.003457, Wilcoxon rank-sum test). See also Supplementary Table 4.
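A comparison of this kind could be sketched as follows: z scores are assigned to 0.5-unit bins for plotting, and the two groups are compared with a Wilcoxon rank-sum test. The implementation below uses midranks, a tie-corrected normal approximation and no continuity correction, which is an assumption about details the caption does not state; function names and toy values are hypothetical, and the approximation is only appropriate for reasonably large samples.

```python
import math


def bin_floor(z, width=0.5):
    """Lower edge of the bin of the given width that contains z."""
    return math.floor(z / width) * width


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation with tie correction).

    Returns (U statistic for x, P value).
    """
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_stat = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_stat - mean_u) / math.sqrt(var_u)
    return u_stat, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical constraint z scores for two small gene groups.
    print(rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(bin_floor(1.2), bin_floor(-0.3))
```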

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 and Supplementary Tables 2–4, 8 and 9. (PDF 2064 kb)

Supplementary Table 1

MAE and BAE calls for genes used in the study. (XLSX 348 kb)

Supplementary Table 5

De novo mutation rate in MAE and BAE genes. (XLSX 17 kb)

Supplementary Table 6

Nucleotide diversity in recombination rate bins. (XLSX 12 kb)

Supplementary Table 7

Nucleotide diversity in recombination rate bins with strict read depth mask and divergence-based correction for CpG mutation bias. (XLSX 12 kb)

Supplementary Table 10

Genes in the study for which balancing selection has been reported. (XLSX 24 kb)

Supplementary Table 11

Analysis of human-chimpanzee trans-species polymorphisms. (XLSX 14 kb)

Supplementary Table 12

Analysis of derived alleles predating the human-Neanderthal split. (XLSX 17 kb)

Cite this article

Savova, V., Chun, S., Sohail, M. et al. Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 48, 231–237 (2016). https://doi.org/10.1038/ng.3493
