Evidence for dosage compensation between the X chromosome and autosomes in mammals

To the Editor:

It has been hypothesized that, in addition to the inactivation of one of the female X chromosomes, X-linked expression in mammals is regulated through dosage compensation that involves a twofold upregulation of expression from the active X chromosome. This idea was initially based on evolutionary arguments and has subsequently been supported by the analysis of microarray expression data, which suggested that the median transcriptional magnitude of genes on the single active X chromosome is similar to that of genes on the two-copy autosomes (X:AA ratio of 1)1,2. However, in a recent Nature Genetics article, Xiong et al. state, on the basis of their examination of multiple human and mouse RNA sequencing (RNA-seq) data sets, that global X-chromosome upregulation is absent, thus necessitating a major revision of the current model3. The authors argue that the increased accuracy of the RNA-seq data reveals the true X:AA ratio to be close to 0.5, about half of the value obtained by examining microarray data. Here we contend that the low estimate of the X:AA ratio by Xiong et al. stems from the disproportionate contribution of transcriptionally inactive genes, which are not relevant for the evaluation of dosage compensation mechanisms, to the X chromosome average. We show that when only active genes are considered, the RNA-seq data give X:AA ratios closer to 1, and the observed minor deviation of the X:AA ratio from 1 is within the range expected when taking into account chromosome-to-chromosome variability.

Whereas upregulation of the X chromosome was originally demonstrated in Drosophila melanogaster by direct comparison between the sexes4, X-chromosome inactivation that equilibrates the effective dosage in diploid adult cells in mammals precludes such an approach. Investigations have thus focused on comparing the transcriptional output of the single active X chromosome to that of two-copy autosomes2,3. The fundamental challenge in this approach, as we show below, is that it involves comparison of different groups of genes and therefore must account for interchromosomal differences not directly related to dosage compensation.

A critical question, then, in computing the X:AA ratio is which criteria should be used to select genes for subsequent analysis. Dosage compensation is expected to equilibrate the amounts of available gene products between the sexes while preserving the set of functionally active genes in a given cell type. We assert that the effect of a mechanism that regulates transcriptional dosage compensation pertains only to the expression magnitude of transcriptionally active genes5,6. Analysis of this mechanism, therefore, should include only transcriptionally active genes. Some background signal inherent to the assay used may be observed for inactive genes, but this signal is independent of the occurrence of dosage-compensation mechanisms. Thus, it is imperative to remove the contribution of inactive genes from the calculation of the average expression magnitudes of chromosomes.

In RNA-seq measurements, the abundance of a given transcript is assessed on the basis of the number of sequenced cDNA fragments that are associated with its exonic regions (that is, the number of associated reads per kilobase of exonic sequence per million of total reads sequenced (RPKM), adjusted for read mappability). At a given sequencing depth, the genes lacking any associated fragments are below the detection level and cannot be considered active. Re-examining the RNA-seq data from human and mouse tissues7,8,9,10 used by Xiong et al. and an additional mouse data set11, we find that the fraction of such undetected (RPKM = 0) genes is substantially higher on the X chromosome than on autosomes (1.6–2.2 times higher, depending on the sample), accounting for as much as 40% of all the X-linked genes (Supplementary Fig. 1a). Therefore, inclusion of all genes or the same fraction of genes from all chromosomes disproportionately reduces the average transcription magnitude estimates for the X chromosome (the analysis in Xiong et al. excluded an equal percentage of the most and least expressed genes from each chromosome3, thus only influencing the error bars from bootstrap analysis without altering the X:AA median ratio). When inactive genes are excluded based on the lack of observed reads (RPKM = 0), the X:AA ratio is substantially higher than the value of 0.5 expected from a single uncompensated X chromosome (Supplementary Fig. 1b).
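As a minimal sketch of the quantities described above, the following Python computes an RPKM value and the fraction of undetected (RPKM = 0) genes in a gene set. The function names and the toy values are hypothetical, and the read-mappability adjustment mentioned in the text is omitted for brevity.

```python
def rpkm(read_count, exonic_length_bp, total_mapped_reads):
    """Reads per kilobase of exonic sequence per million total reads.

    Simplification: no adjustment for read mappability is applied here.
    """
    if exonic_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exonic length and total reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (exonic_length_bp * total_mapped_reads)


def undetected_fraction(rpkm_values):
    """Fraction of genes with no associated reads (RPKM = 0)."""
    if not rpkm_values:
        raise ValueError("need at least one gene")
    return sum(1 for v in rpkm_values if v == 0) / len(rpkm_values)


# Toy values invented for illustration; they are not taken from any data set.
x_linked = [0.0, 0.0, 3.1, 7.4, 0.0]
autosomal = [0.0, 2.2, 5.0, 9.8, 1.3]

# How much more common undetected genes are in the X-linked toy set.
enrichment = undetected_fraction(x_linked) / undetected_fraction(autosomal)
print(f"RPKM for 30 reads, 2 kb exons, 10M reads: {rpkm(30, 2000, 10_000_000)}")
print(f"undetected-gene enrichment on X (toy data): {enrichment:.2f}")
```

A higher undetected fraction on one chromosome pulls its median toward zero when all genes are included, which is the effect discussed above.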

The detection of at least one associated read, however, is not a robust criterion for identifying transcriptionally active genes. As the number of total sequenced reads increases, so does the probability that sequencing or alignment errors will lead to the erroneous association of at least one read with a given transcript. Greater sequencing depth also allows for detection of increasingly rare transcripts. We therefore chose a low relative abundance threshold (RPKM ≥ 1 with at least 3 reads, which corresponds to approximately 0.3 mRNA copies per cell9) to identify a more robust set of transcriptionally active genes. Using this threshold, we found the average X:AA ratio to be 0.93 ± 0.17 across human tissues (Fig. 1a). Although the minimum amount of transcript for functional relevance cannot be easily determined for different transcripts, we note that the X:AA ratio does not substantially change for thresholds above RPKM = 1.
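The thresholding step can be illustrated with a short Python sketch that computes the ratio of medians for X-linked and autosomal genes passing the activity filter (RPKM ≥ 1 with at least 3 reads). The record layout (`chrom`, `rpkm`, `reads`), the function names and the toy values are assumptions made for this example, as is the choice to exclude chromosome Y and mitochondrial genes from the autosomal set.

```python
from statistics import median


def is_active(gene, min_rpkm=1.0, min_reads=3):
    """Activity filter from the text: RPKM >= 1 with at least 3 reads."""
    return gene["rpkm"] >= min_rpkm and gene["reads"] >= min_reads


def x_to_aa_ratio(genes, min_rpkm=1.0, min_reads=3):
    """Median X-linked expression divided by median autosomal expression."""
    non_autosomal = {"X", "Y", "MT"}  # assumption for this sketch
    x_vals = [g["rpkm"] for g in genes
              if g["chrom"] == "X" and is_active(g, min_rpkm, min_reads)]
    a_vals = [g["rpkm"] for g in genes
              if g["chrom"] not in non_autosomal
              and is_active(g, min_rpkm, min_reads)]
    if not x_vals or not a_vals:
        raise ValueError("no genes pass the filter on X or on the autosomes")
    return median(x_vals) / median(a_vals)


# Toy records invented for illustration only.
genes = [
    {"chrom": "X", "rpkm": 0.0, "reads": 0},
    {"chrom": "X", "rpkm": 0.0, "reads": 0},
    {"chrom": "X", "rpkm": 4.0, "reads": 40},
    {"chrom": "X", "rpkm": 6.0, "reads": 55},
    {"chrom": "1", "rpkm": 0.0, "reads": 0},
    {"chrom": "1", "rpkm": 5.0, "reads": 60},
    {"chrom": "2", "rpkm": 5.0, "reads": 48},
    {"chrom": "2", "rpkm": 0.5, "reads": 2},
    {"chrom": "2", "rpkm": 5.0, "reads": 51},
]

ratio_active = x_to_aa_ratio(genes)                         # active genes only
ratio_all = x_to_aa_ratio(genes, min_rpkm=0, min_reads=0)   # every gene
print(ratio_active, ratio_all)
```

In this toy set the X-linked genes carry a larger share of undetected entries, so including every gene lowers the ratio relative to the active-gene estimate.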

Figure 1: Transcriptional upregulation of genes on the active X chromosome.

(a) The ratio of the median transcription magnitudes of X-linked and autosomal genes. The X:AA ratio estimates are shown based on the set of genes with minimal transcription (RPKM ≥ 1 and at least 3 associated reads). Black error bars show the 95% confidence interval (CI) based on bootstrap estimates that incorrectly assume independence of expression levels for neighboring genes (plotted here for reference; not used to make inferences). Red bars show the range around 1 into which the X:AA ratio is expected to fall (95% CI) in the presence of twofold upregulation of the X chromosome, taking into account interchromosomal variation (sampling of contiguous blocks of X-chromosome size from the autosomal portion of the genome). The observed X:AA values (black dots) in all tissues fall within this range, indicating that the observed transcriptional magnitude of X-linked genes is compatible with the presence of twofold upregulation. The blue bars show the range around 0.5 into which the X:AA ratio is expected to fall in the absence of X-chromosome upregulation (50% of the autosomal expression level). The X:AA estimates for the first five samples fall outside of this range, indicating that the X-linked expression magnitude is significantly higher than that expected in the absence of dosage compensation. The X:AA values for other samples are within both the red and blue ranges, indicating that the two hypotheses (X:AA = 1 and X:AA = 0.5) cannot be clearly distinguished based on these individual data sets. (b) Ratios of median expression magnitudes estimated for human chromosomes 10 and 11 with 95% CI computed by the simple gene resampling method. The chr. 10:A and chr. 11:A ratios deviate from the expected value of 1, illustrating chromosome-to-chromosome variability. (c) Mouse RNA-seq data presented as in a. None of the mouse data sets are compatible with a lack of dosage compensation. (d) Dependence of the X:AA estimates on the RPKM threshold. The tissue-averaged X:AA estimates are shown (black) as a function of the minimal RPKM threshold, from 0 (all genes, including those with undetected expression) to RPKM ≥ 2. The error bars correspond to the s.e.m. between different tissues. The largest change in the ratio is observed after exclusion of genes with undetected expression (RPKM > 0). As the RPKM thresholds increase, the X:AA ratio largely stabilizes above RPKM = 1. The application of an RPKM threshold increases the median expression level and can artificially shift the X:AA ratio closer to 1. The shaded gray region shows the 95% confidence envelope for the hypothetical X chromosome that is expressed at 50% of the autosomal level (see Supplementary Methods). For non-zero RPKM thresholds, the observed X:AA ratios lie outside of this 95% confidence interval, showing that the observed ratios are higher than expected from the application of an RPKM threshold alone.

Because the X:AA ratio compares the median expression levels of disparate genes located on different chromosomes, the significance of X:AA deviation from unity should be evaluated relative to the natural variability in the median expression levels among chromosomes. Such inherent variability is illustrated by analogous ratios estimated for individual autosomes, which are present in an equal number of copies. For instance, we find that chr. 10:A = 0.83 ± 0.09 and chr. 11:A = 1.25 ± 0.18 (Fig. 1b). Of note, many such deviations are statistically significant if the confidence intervals are estimated using bootstrap resampling or similar approaches that only control for gene-to-gene variability2,3. These classical uncertainty measures assume that the expression magnitudes of any two genes from the same chromosome are independent, but this is known not to be the case12.
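For reference, a simple gene-resampling bootstrap of the kind discussed above might look like the following Python sketch. It is an illustration under stated assumptions, not the procedure used for Fig. 1b: the function name, the percentile-interval construction and the toy values are all hypothetical. Note that it resamples genes independently, which is precisely the assumption questioned in the text.

```python
import random
from statistics import median


def bootstrap_ratio_ci(chrom_values, other_values, n_boot=2000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap CI for a ratio of medians.

    Genes are resampled with replacement and independently of one another,
    so any correlation between neighboring genes is ignored.
    Inputs are assumed to be positive (for example, active genes only).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        c = rng.choices(chrom_values, k=len(chrom_values))
        o = rng.choices(other_values, k=len(other_values))
        ratios.append(median(c) / median(o))
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_boot)]
    hi = ratios[int((1 - alpha / 2) * n_boot) - 1]
    point = median(chrom_values) / median(other_values)
    return point, lo, hi


# Toy expression values invented for illustration only.
chrom = [1.2, 2.5, 3.1, 4.8, 7.9, 12.0, 1.9, 3.3]
others = [1.5, 2.0, 3.6, 5.2, 6.4, 9.1, 2.8, 4.4, 15.0, 1.1]

point, lo, hi = bootstrap_ratio_ci(chrom, others)
print(f"ratio = {point:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Because each draw treats genes as exchangeable, intervals of this kind reflect gene-to-gene variability only and tend to be too narrow when expression is spatially correlated along the chromosome.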

To assess the inherent spatial variability of median expression levels within the genome, we sampled contiguous blocks of active autosomal genes of size similar to the X chromosome (see Supplementary Methods) and estimated the statistical range within which the X:AA ratio is expected to fall under competing hypotheses: (i) the true median expression level on the X chromosome is equal to that of the autosomes (compensated, Fig. 1a,c, red bars) or (ii) the true median expression level on the X chromosome is equal to half that of the autosomes (uncompensated, Fig. 1a,c, blue bars). We find that in all examined human and mouse data sets (except for the mouse muscle sample), the observed X:AA ratios are within the range expected for twofold upregulation of the X chromosome. In contrast, all of the mouse and 5 of the 11 human data sets show ratios higher than the statistical range expected without dosage compensation. For the other 6 human data sets, the inherent variability within the genome was too high to distinguish between the two hypotheses. Combining the results from all examined tissues, we find that the overall likelihood is strongly in favor of X-linked genes being expressed at the autosomal level (likelihood ratio of 1 × 10^22 for human and 1 × 10^27 for mouse tissues, see Supplementary Table 1).
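The block-sampling idea can be sketched in Python as follows. This is a simplified illustration, not the procedure of the Supplementary Methods: it assumes the active autosomal genes are supplied as a single list in genomic order, lets blocks cross chromosome boundaries, and obtains the uncompensated range by halving the compensated one (the median scales linearly, so halving every value in a block halves its median). All names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import median


def block_ratio_ranges(ordered_rpkm, block_size, n_blocks=1000,
                       alpha=0.05, seed=0):
    """Expected range of block:genome median ratios from contiguous blocks."""
    if not 0 < block_size <= len(ordered_rpkm):
        raise ValueError("block_size must be between 1 and the gene count")
    rng = random.Random(seed)
    genome_median = median(ordered_rpkm)
    ratios = []
    for _ in range(n_blocks):
        start = rng.randrange(len(ordered_rpkm) - block_size + 1)
        block = ordered_rpkm[start:start + block_size]  # neighbors kept together
        ratios.append(median(block) / genome_median)
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_blocks)]
    hi = ratios[int((1 - alpha / 2) * n_blocks) - 1]
    return {"compensated": (lo, hi), "uncompensated": (lo / 2, hi / 2)}


def compatible_hypotheses(observed_ratio, ranges):
    """Report which hypothesis ranges contain the observed ratio."""
    return {name: lo <= observed_ratio <= hi
            for name, (lo, hi) in ranges.items()}


# Synthetic, spatially correlated values (not real expression data): a slow
# sinusoidal regional trend multiplied by log-normal gene-level noise.
rng = random.Random(1)
toy_autosomes = [
    (1.5 + math.sin(i / 40.0)) * rng.lognormvariate(1.0, 0.8) + 1.0
    for i in range(3000)
]

ranges = block_ratio_ranges(toy_autosomes, block_size=300)
calls = compatible_hypotheses(0.9, ranges)  # 0.9 is a hypothetical observation
print(ranges, calls)
```

Keeping neighboring genes together preserves regional correlation in expression, so the resulting ranges are typically wider than those from independent gene resampling.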

Although applying a single threshold to the X chromosome and autosomes can artificially shift the X:AA ratio toward 1 even if the X chromosome is uncompensated, we note that the observed ratios are significantly above those expected in the absence of dosage compensation (Fig. 1d and Supplementary Fig. 2). Additional statistical corrections for this effect (see Supplementary Methods) increased the uncertainty for each data set, but the overall likelihood remained strongly in favor of the occurrence of dosage compensation (likelihood ratio of 1 × 10^10 for human and 1 × 10^13 for mouse tissues, see Supplementary Table 1).
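The threshold-induced shift can be seen in a small Python sketch. Here a hypothetical uncompensated chromosome is modeled by halving autosomal values, and the same RPKM threshold is then applied to both sets. This is only an illustration of the artifact; it is not the statistical correction described in the Supplementary Methods, and the function name and toy values are invented for the example.

```python
from statistics import median


def uncompensated_ratio_after_threshold(autosomal_rpkm, threshold):
    """Ratio of medians for a halved (uncompensated) copy of the autosomes
    versus the autosomes themselves, after one shared RPKM threshold."""
    halved = [v / 2 for v in autosomal_rpkm]  # hypothetical uncompensated X
    a_kept = [v for v in autosomal_rpkm if v >= threshold]
    x_kept = [v for v in halved if v >= threshold]
    if not a_kept or not x_kept:
        raise ValueError("threshold removes all genes from one set")
    return median(x_kept) / median(a_kept)


# Toy expression values invented for illustration only.
toy_autosomal = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0]

no_threshold = uncompensated_ratio_after_threshold(toy_autosomal, 0.0)
with_threshold = uncompensated_ratio_after_threshold(toy_autosomal, 1.0)
print(no_threshold, with_threshold)
```

Without a threshold the ratio is exactly 0.5, whereas the shared threshold removes more low-expressed genes from the halved set and raises the ratio above 0.5. An observed X:AA ratio therefore has to be compared with this shifted expectation rather than with 0.5 itself.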

Our results indicate that the controlled analysis of RNA-seq data supports earlier conclusions that the active X chromosome is upregulated in mammals and that adjustment for the different percentage of transcriptionally active genes on the X chromosome and autosomes is necessary for this analysis. In addition, we show the importance of controlling for spatial dependence in the analysis of genome-wide expression data. We note that our conclusion here is also supported by additional data and analysis presented by Deng et al.13 in this issue.

Accession numbers. RNA-seq data have been deposited in the GEO database (human: GSE12946, GSE13652; mouse: GSE21860, GSE22131) and in the SRA database (mouse: SRA001030).

Author contributions

P.J.P. conceived of the study, P.V.K., R.X. and P.J.P. devised analysis, P.V.K. and R.X. performed the calculations, and P.V.K. and P.J.P. wrote the manuscript.

References

  1. Adler, D.A. et al. Proc. Natl. Acad. Sci. USA 94, 9244–9248 (1997).
  2. Nguyen, D.K. & Disteche, C.M. Nat. Genet. 38, 47–53 (2006).
  3. Xiong, Y. et al. Nat. Genet. 42, 1043–1047 (2010).
  4. Mukherjee, A.S. & Beermann, W. Nature 207, 785–786 (1965).
  5. Alekseyenko, A.A., Larschan, E., Lai, W.R., Park, P.J. & Kuroda, M.I. Genes Dev. 20, 848–857 (2006).
  6. Gilfillan, G.D. et al. Genes Dev. 20, 858–870 (2006).
  7. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Nat. Genet. 40, 1413–1415 (2008).
  8. Wang, E.T. et al. Nature 456, 470–476 (2008).
  9. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Nat. Methods 5, 621–628 (2008).
  10. Kim, H. et al. Nat. Med. 16, 804–808 (2010).
  11. Gregg, C. et al. Science 329, 643–648 (2010).
  12. Lercher, M.J., Urrutia, A.O. & Hurst, L.D. Nat. Genet. 31, 180–183 (2002).
  13. Deng, X. et al. Nat. Genet. 43, 1179–1185 (2011).

Acknowledgements

We thank M. Kuroda, E. Larschan, M. Gelart, C. Wang and J. Lee for helpful discussions and a critical reading of the manuscript.

Author information

Corresponding author

Correspondence to Peter J Park.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2, Supplementary Table 1 and Supplementary Methods (PDF 417 kb)

About this article

Cite this article

Kharchenko, P., Xi, R. & Park, P. Evidence for dosage compensation between the X chromosome and autosomes in mammals. Nat Genet 43, 1167–1169 (2011). https://doi.org/10.1038/ng.991
