Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
King, M.C. & Wilson, A.C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
Gan, X. et al. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 477, 419–423 (2011).
Keane, T.M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Hindorff, L.A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
Schaub, M.A., Boyle, A.P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
Nicolae, D.L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
Wang, X. et al. Transcriptome-wide identification of novel imprinted genes in neonatal mouse brain. PLoS ONE 3, e3839 (2008).
Gregg, C. et al. High-resolution analysis of parent-of-origin allelic expression in the mouse brain. Science 329, 643–648 (2010).
Gregg, C., Zhang, J., Butler, J.E., Haig, D. & Dulac, C. Sex-specific parent-of-origin allelic expression in the mouse brain. Science 329, 682–685 (2010).
DeVeale, B., van der Kooy, D. & Babak, T. Critical evaluation of imprinted gene expression by RNA-Seq: a new perspective. PLoS Genet. 8, e1002600 (2012).
Wang, X., Soloway, P.D. & Clark, A.G. A survey for novel imprinted genes in the mouse placenta by mRNA-seq. Genetics 189, 109–122 (2011).
Okae, H. et al. Re-investigation and RNA sequencing-based identification of genes with placenta-specific imprinted expression. Hum. Mol. Genet. 21, 548–558 (2012).
Goncalves, A. et al. Extensive compensatory cis-trans regulation in the evolution of mouse gene expression. Genome Res. 22, 2376–2384 (2012).
Babak, T. et al. Global survey of genomic imprinting by transcriptome sequencing. Curr. Biol. 18, 1735–1741 (2008).
Hayden, E.C. RNA studies under fire. Nature 484, 428 (2012).
Barlow, D.P. Gametic imprinting in mammals. Science 270, 1610–1613 (1995).
Skarnes, W.C. et al. A conditional knockout resource for the genome-wide study of mouse gene function. Nature 474, 337–342 (2011).
Collaborative Cross Consortium. The genome architecture of the Collaborative Cross mouse genetic reference population. Genetics 190, 389–401 (2012).
Churchill, G.A., Gatti, D.M., Munger, S.C. & Svenson, K.L. The Diversity Outbred mouse population. Mamm. Genome 23, 713–718 (2012).
Huang, S., Holt, J., Kao, C.Y., McMillan, L. & Wang, W. A novel multi-alignment pipeline for high-throughput sequencing data. Database (Oxford) 2014, bau057 (2014).
Zhang, Z. et al. GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment. Bioinformatics 29, 291–299 (2013).
Zou, F. et al. A novel statistical approach for jointly analyzing RNA-Seq data from F1 reciprocal crosses and inbred lines. Genetics 197, 389–399 (2014).
Wright, F.A. et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 46, 430–437 (2014).
Kim, Y. et al. A meta-analysis of gene expression quantitative trait loci in brain. Transl. Psychiatry 4, e459 (2014).
Yang, H. et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat. Genet. 43, 648–655 (2011).
Xie, W. et al. Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell 148, 816–831 (2012).
Ohno, S., Kaplan, W.D. & Kinosita, R. Formation of the sex chromatin by a single X-chromosome in liver cells of Rattus norvegicus. Exp. Cell Res. 18, 415–418 (1959).
Lyon, M.F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190, 372–373 (1961).
Ohno, S. Sex Chromosomes and Sex Linked Genes (Springer Verlag, 1967).
Cattanach, B.M. Controlling elements in the mouse X-chromosome. 3. Influence upon both parts of an X divided by rearrangement. Genet. Res. 16, 293–301 (1970).
Calaway, J.D. et al. Genetic architecture of skewed X inactivation in the laboratory mouse. PLoS Genet. 9, e1003853 (2013).
Aylor, D.L. et al. Genetic analysis of complex traits in the emerging Collaborative Cross. Genome Res. 21, 1213–1222 (2011).
Cui, X., Affourtit, J., Shockley, K.R., Woo, Y. & Churchill, G.A. Inheritance patterns of transcript levels in F1 hybrid mice. Genetics 174, 627–637 (2006).
Price, A.L. et al. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 7, e1001317 (2011).
Schadt, E.E. et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302 (2003).
Kong, A. et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature 488, 471–475 (2012).
Drost, J.B. & Lee, W.R. Biological basis of germline mutation: comparisons of spontaneous germline mutation rates among Drosophila, mouse, and human. Environ. Mol. Mutagen. 25 (suppl. 26), 48–64 (1995).
Lin, H. et al. Dosage compensation in the mouse balances up-regulation and silencing of X-linked genes. PLoS Biol. 5, e326 (2007).
Johnston, C.M. et al. Large-scale population study of human cell lines indicates that dosage compensation is virtually complete. PLoS Genet. 4, e9 (2008).
Yang, F., Babak, T., Shendure, J. & Disteche, C.M. Global survey of escape from X inactivation by RNA-sequencing in mouse. Genome Res. 20, 614–622 (2010).
Li, N. & Carrel, L. Escape from X chromosome inactivation is an intrinsic property of the Jarid1c locus. Proc. Natl. Acad. Sci. USA 105, 17055–17060 (2008).
Lopes, A.M. et al. Transcriptional changes in response to X chromosome dosage in the mouse: implications for X inactivation and the molecular basis of Turner Syndrome. BMC Genomics 11, 82 (2010).
Carrel, L. & Willard, H.F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434, 400–404 (2005).
Berletch, J.B., Yang, F. & Disteche, C.M. Escape from X inactivation in mice and humans. Genome Biol. 11, 213 (2010).
Nguyen, D.K. & Disteche, C.M. Dosage compensation of the active X chromosome in mammals. Nat. Genet. 38, 47–53 (2006).
Gupta, V. et al. Global analysis of X-chromosome dosage compensation. J. Biol. 5, 3 (2006).
Xiong, Y. et al. RNA sequencing shows no dosage compensation of the active X-chromosome. Nat. Genet. 42, 1043–1047 (2010).
Deng, X. et al. Evidence for compensatory upregulation of expressed X-linked genes in mammals, Caenorhabditis elegans and Drosophila melanogaster. Nat. Genet. 43, 1179–1185 (2011).
Kharchenko, P.V., Xi, R. & Park, P.J. Evidence for dosage compensation between the X chromosome and autosomes in mammals. Nat. Genet. 43, 1167–1169 author reply 1171–1172 (2011).
Lin, H. et al. Relative overexpression of X-linked genes in mouse embryonic stem cells is consistent with Ohno's hypothesis. Nat. Genet. 43, 1169–1170 author reply 1171–1172 (2011).
Yildirim, E., Sadreyev, R.I., Pinter, S.F. & Lee, J.T. X-chromosome hyperactivation in mammals via nonlinear relationships between chromatin states and transcription. Nat. Struct. Mol. Biol. 19, 56–61 (2012).
He, X. et al. He et al. reply. Nat. Genet. 43, 1171–1172 (2011).
Lin, F., Xing, K., Zhang, J. & He, X. Expression reduction in mammalian X chromosome evolution refutes Ohno's hypothesis of dosage compensation. Proc. Natl. Acad. Sci. USA 109, 11752–11757 (2012).
Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343–348 (2011).
Disteche, C.M. Dosage compensation of the sex chromosomes. Annu. Rev. Genet. 46, 537–560 (2012).
Jue, N.K. et al. Determination of dosage compensation of the mammalian X chromosome by RNA-seq is dependent on analytical approach. BMC Genomics 14, 150 (2013).
We thank P. Mieczkowski, A. Brandt, E. Malc, M. Vernon, J. Brennan and M. Calabrese for helpful discussions. Major funding was provided by National Institute of Mental Health/National Human Genome Research Institute Center of Excellence for Genome Sciences grants (P50MH090338 and P50HG006582, co-principal investigators F.P.-M.d.V. and P.F.S.). This work was also supported by grants R01GM074175 (principal investigator F.Z.) from the National Institute of General Medical Sciences and K01MH094406 (principal investigator J.J.C.) from the National Institute of Mental Health.
The authors declare no competing financial interests.
Integrated supplementary information
Allele-specific expression of a gene requires the presence of a genetic or epigenetic variant in cis. This is demonstrated above, where trans-acting factors have equal opportunity to affect both alleles.
For this study, we developed a likelihood-based method to jointly analyze both total read counts (TReC) and allele-specific read counts (ASReC) from inbred and F1 mice. (a) A negative binomial distribution was used for TReC, and a beta-binomial distribution was used for ASReC. (b) Simulation and real data support the increased power of this joint model (simulation and real data analyses use the same sample size: 24 = 6(AA) + 6(BB) + 6(AB) + 6(BA)).
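The idea of combining the two count types in one likelihood can be illustrated with a minimal sketch. This is not the authors' implementation (see Zou et al., Genetics 197, 389–399, 2014, for the actual method). It assumes a purely cis, additive toy model in which a single log ratio between alleles B and A drives both the total counts and the allelic proportion in F1 animals. The function names, the overdispersion parameterizations (`phi`, `theta`) and the example counts are all hypothetical.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu
    and variance mu + phi * mu**2 (assumed parameterization)."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_logpmf(k, n, pi, theta):
    """Log-probability that k of n allele-specific reads come from allele B
    under a beta-binomial with mean proportion pi and overdispersion theta."""
    a, b = pi / theta, (1.0 - pi) / theta
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

def joint_loglik(samples, mu_a, cis_log_ratio, phi, theta):
    """Joint log-likelihood of TReC (all animals) and ASReC (F1 animals only).

    samples: tuples (genotype, total_reads, allele_specific_reads, reads_from_B).
    Toy assumption: expression differences between strains are purely cis,
    so the same ratio sets inbred means, F1 means and the F1 allelic proportion.
    """
    ratio = math.exp(cis_log_ratio)
    f1_mean = mu_a * (1.0 + ratio) / 2.0
    mean_by_genotype = {"AA": mu_a, "BB": mu_a * ratio, "AB": f1_mean, "BA": f1_mean}
    pi_b = ratio / (1.0 + ratio)
    total = 0.0
    for genotype, trec, n_as, n_b in samples:
        total += nb_logpmf(trec, mean_by_genotype[genotype], phi)
        if genotype in ("AB", "BA"):  # allele-specific reads are informative only in F1s
            total += bb_logpmf(n_b, n_as, pi_b, theta)
    return total

# Hypothetical counts consistent with allele B being expressed twice as strongly as A.
samples = [("AA", 100, 0, 0), ("BB", 200, 0, 0), ("AB", 150, 60, 40), ("BA", 150, 60, 40)]
with_cis = joint_loglik(samples, 100.0, math.log(2.0), 0.05, 0.05)
no_cis = joint_loglik(samples, 100.0, 0.0, 0.05, 0.05)
print("log-likelihood gain from the cis effect:", with_cis - no_cis)
```

Because both the negative binomial and the beta-binomial terms depend on the same cis parameter, each count type contributes evidence for it, which is one intuition for why a joint model can have more power than either data type alone.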
Supplementary Figure 3 Clustering of expression microarray data and shared strain effects across tissues.
(a) All 384 microarrays run across 4 different tissues were entered into a single cluster analysis. Microarray samples cluster first by tissue, followed by cross, parent of origin and sex. (b) Venn diagram showing the number of genes with significant strain effects within at least one cross and the degree to which these strain effects are shared across multiple tissues.
Mad1l1 (mitotic arrest deficient 1–like 1) is one of several thousand genes identified with statistically significant allelic imbalance. (a) For each animal, two points are plotted to reflect the proportion of allele-specific Mad1l1 reads aligned to each parent (colored by genotype). The F1 allele-specific data indicate that, at the cis level, the PWK/PhJ Mad1l1 allele is stronger than the WSB/EiJ allele, which is stronger than the CAST/EiJ allele. (b) For each animal, one point is plotted to reflect the total number of reads aligned to Mad1l1, regardless of whether they are allele specific or not. The points for F1 mice are colored with the maternal strain on the outside and the paternal strain on the inside. This pattern indicates that the cis effect identified in the top panel is consistent with total expression in the parental strains. Furthermore, the expression levels in the F1 samples are intermediate to those of their parents, indicating an additive effect.
Fos (FBJ osteosarcoma oncogene) overdominance effect. (a) The F1 allele-specific data indicate that, at the cis level, the PWK/PhJ allele is stronger than the WSB/EiJ allele, which is stronger than the CAST/EiJ allele. (b) This cis effect is consistent with total expression in the parental strains, but the expression levels in the F1 animals show a dominance effect. (c) The dominance effect replicated in the brain microarray data.
Balanced contribution of different subspecies to the identification of genes with additive strain effects using microarray data. For each tissue, a Venn diagram shows the degree to which additive strain effects are shared by multiple crosses. For example, in brain, 74% of all expressed genes (9,701/13,162) showed a strain effect in at least one cross (FDR < 0.05), with the majority identified in at least 2 crosses. Also shown for each tissue is a distribution of the effect size (a positive value indicates higher expression from the alphabetically second strain for each cross) for all genes with a strain effect, separated by cross.
Sequence divergence is correlated with the number and magnitude of differentially expressed genes. Plotted for each reciprocal cross is the relationship between the number of SNPs in a 10-kb window (x axis), the proportion of expressed genes that are differentially expressed (y axis) and the P value associated with differential expression (z axis). The widths of the colored and shadow lines are proportional to the number of expressed genes in a given SNP density bin. The y and z axes are loess-smoothed curves, and this plot excludes genomic regions in which strains share subspecific origin (∼13% of the genome).
Estimation of the proportion of SNPs creating cis regulatory effects. As shown at the top of this figure, WSB/EiJ is of M. m. domesticus origin and CAST/EiJ is of M. m. castaneus origin for 91% of the genome. For such regions, we found 5,045 genes with a cis eQTL between WSB/EiJ and CAST/EiJ out of a total of 10,011 possible genes (50%), defined as having at least one strain-informative expressed SNP. These 10,011 genes are known to possess a total of 6,167,501 SNPs between WSB/EiJ and CAST/EiJ, considering the entire gene body with 10 kb upstream and downstream. Because each cis eQTL must result from at least one regulatory variant, we can estimate the minimum proportion of SNPs creating a cis regulatory effect as 0.082% of all new variants (5,045 cis eQTLs/6,167,501 SNPs). The same logic follows for every other cross and genomic regions with different phylogenies. Overall, the mean percentage of SNPs creating a cis eQTL is 0.10% (±0.02%, 95% confidence interval).
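The lower-bound arithmetic in this legend can be reproduced directly. The sketch below uses only the three counts quoted above for the WSB/EiJ versus CAST/EiJ comparison; the function and variable names are illustrative.

```python
def min_cis_snp_fraction(n_cis_eqtl_genes, n_snps):
    """Lower bound on the fraction of SNPs with a cis regulatory effect,
    assuming each cis eQTL is caused by at least one SNP."""
    return n_cis_eqtl_genes / n_snps

# Counts quoted in the legend (WSB/EiJ vs CAST/EiJ, regions of distinct subspecific origin).
genes_with_cis_eqtl = 5045
genes_testable = 10011
snps_in_testable_genes = 6167501

frac_genes = genes_with_cis_eqtl / genes_testable
frac_snps = min_cis_snp_fraction(genes_with_cis_eqtl, snps_in_testable_genes)

print(f"{frac_genes:.0%} of testable genes have a cis eQTL")
print(f"at least {frac_snps:.3%} of SNPs create a cis regulatory effect")
print(f"that is roughly 1 SNP in {1 / frac_snps:.0f}")
```

The estimate is a minimum because a gene with several regulatory SNPs is still counted once.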
Distribution of genes imprinted in mouse brain. Plotted in red across each chromosome is the parent-of-origin effect P value for a combined analysis involving all three crosses. The positions of 128 known imprinted genes, and the clusters they form, are shown in green, and novel imprinted gene locations are shown in orange.
Genes with consistent overexpression from the paternal allele are closer to CpG islands that are preferentially methylated on the maternal allele. To create this plot, CpG islands were first divided into three categories on the basis of data from Xie et al. (Cell 148, 816–831, 2012): those preferentially methylated on the paternal allele (red) or maternal allele (blue) or with no preference (black). The y axis indicates the log of the ratio between two distances: (1) the distance between the TSS of consistently paternally expressed genes and the nearest CpG island and (2) the same measure for inconsistently expressed genes. The distribution of this ratio is plotted for all three categories of CpG islands. The blue line (CpG islands that are preferentially maternally methylated) shows a greater enrichment for negative values (greater area under the curve) than the red line, indicating that genes with consistent paternal expression are generally closer to maternally methylated CpG islands than paternally methylated CpG islands. The sharp downward spikes in each curve are due to the nature of the distance distributions, as many genes have a distance of zero (TSS overlaps a CpG island).
Dosage compensation in mouse. (a) Mean expression values for each gene on the X chromosome are plotted for males (n = 39) versus females (n = 51). The 1:1 linear relationship indicates that, as expected, inactivation of one X chromosome in females equalizes expression levels between the sexes, with the exception of Xist. (b) For each of 90 animals (51 female, 39 male), a distribution of gene expression levels was generated for autosomal and X-chromosome genes separately. These distributions were then plotted against each other, with each line representing a mouse. The result was a roughly 1:1 relationship in the levels of expression from the autosomes and X chromosome.
Representative examples of skewed X inactivation for two female animals. Plotted is the proportion of allele-specific reads assigned to each parent over the entire length of the X chromosome. The animal plotted on the top, a (CAST/EiJ × WSB/EiJ)F1 female, showed on average 76% of allele-specific reads derived from the strain with the stronger Xce allele, CAST/EiJ. The animal on the bottom, a (WSB/EiJ × CAST/EiJ)F1, showed on average 55% of allele-specific reads derived from CAST/EiJ. These examples underscore the importance of calculating individual-level null hypotheses that factor in stochastic and genetic contribution to X-inactivation skewing.
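To see why an individual-level null matters, the following minimal sketch tests one X-linked gene against a female's own chromosome-wide skew rather than against 50:50. It is an illustration under stated assumptions, not the procedure used in the study: the null proportion is taken to be the pooled fraction of allele-specific reads across her X-linked genes, the test is a plain exact binomial test that ignores overdispersion, and all names and counts are hypothetical.

```python
import math

def binom_logpmf(k, n, p):
    """Log-probability of k successes in n trials with success probability p (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def binom_two_sided_p(k, n, p):
    """Exact two-sided P value: total probability of outcomes no more likely than k."""
    observed = binom_logpmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        lp = binom_logpmf(i, n, p)
        if lp <= observed + 1e-9:  # small tolerance for floating-point ties
            total += math.exp(lp)
    return min(1.0, total)

def individual_null(x_gene_counts):
    """Pooled proportion of allele-specific reads from one strain across X-linked genes.

    x_gene_counts: list of (reads_from_strain, total_allele_specific_reads) per gene.
    """
    strain_reads = sum(k for k, _ in x_gene_counts)
    all_reads = sum(n for _, n in x_gene_counts)
    return strain_reads / all_reads

# Hypothetical female whose X-linked reads are 76% from one strain overall.
x_counts = [(76, 100), (152, 200), (38, 50)]
null_p = individual_null(x_counts)

gene_k, gene_n = 76, 100  # one gene from the same animal
p_naive = binom_two_sided_p(gene_k, gene_n, 0.5)          # tested against 50:50
p_individual = binom_two_sided_p(gene_k, gene_n, null_p)  # tested against her own skew
print(null_p, p_naive, p_individual)
```

A gene that looks strongly imbalanced against a 50:50 expectation can be entirely unremarkable once the animal's own X-inactivation skew is used as the baseline.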
Unusual clustering of allele-specific reads from Vti1b led to identification of a ∼250-kb duplication. (a) Allele-specific and (b) total read counts for Vti1b. For two crosses (PWK/PhJ × CAST/EiJ and WSB/EiJ × CAST/EiJ), allele-specific read counts for biological replicates formed two clusters (red circles) for several consecutive genes on chromosome 12. The pattern was consistent across genes, with an overrepresentation of CAST/EiJ alleles in certain animals, coinciding with an overall higher level of total gene expression. This highly unusual pattern suggested a CAST/EiJ duplication in this region. (c) Further analysis of RNA-seq data suggested a CAST/EiJ duplication affecting at least five genes (colored in red). A total of 12 DNA samples (n = 6 presumed three copy, n = 6 presumed two copy) were then examined with a high-density SNP array. Probe intensity data were used to identify a duplication spanning at least 239 kb and encompassing the entire coding region of the five genes with unusual expression data. The flanking genes Pigh and Plekhh1 are expressed in the brain but do not show increased expression in mice with the duplication, suggesting that the duplication did not include regulatory sequences necessary for expression of these genes.
About this article
Cite this article
Crowley, J., Zhabotynsky, V., Sun, W. et al. Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nat Genet 47, 353–360 (2015). https://doi.org/10.1038/ng.3222