To the Editor:
It has been hypothesized that, in addition to the inactivation of one of the female X chromosomes, X-linked expression in mammals is regulated through dosage compensation that involves a twofold upregulation of expression from the active X chromosome. This idea was initially based on evolutionary arguments and has subsequently been supported by the analysis of microarray expression data, which suggested that the median transcriptional magnitude of genes on the single active X chromosome is similar to that of genes on the two-copy autosomes (X:AA ratio of ∼1) [1,2]. However, in a recent Nature Genetics article, Xiong et al. state, on the basis of their examination of multiple human and mouse RNA sequencing (RNA-seq) data sets, that global X-chromosome upregulation is absent, thus necessitating a major revision of the current model [3]. The authors argue that the increased accuracy of the RNA-seq data reveals the true X:AA ratio to be close to 0.5, about half of the value obtained by examining microarray data. Here we contend that the low estimate of the X:AA ratio by Xiong et al. stems from the disproportionate contribution of transcriptionally inactive genes, which are not relevant for the evaluation of dosage compensation mechanisms, to the X chromosome average. We show that when only active genes are considered, the RNA-seq data give X:AA ratios closer to 1, and the observed minor deviation of the X:AA ratio from 1 is within the range expected when taking into account chromosome-to-chromosome variability.
Whereas upregulation of the X chromosome was originally demonstrated in Drosophila melanogaster by direct comparison between the sexes [4], X-chromosome inactivation that equilibrates the effective dosage in diploid adult cells in mammals precludes such an approach. Investigations have thus focused on comparing the transcriptional output of the single active X chromosome to that of two-copy autosomes [2,3]. The fundamental challenge in this approach, as we show below, is that it involves comparison of different groups of genes and therefore must account for interchromosomal differences not directly related to dosage compensation.
A critical question, then, in computing the X:AA ratio is which criteria should be used to select genes for subsequent analysis. Dosage compensation is expected to equilibrate the amounts of available gene products between the sexes while preserving the set of functionally active genes in a given cell type. We assert that the effect of a mechanism that regulates transcriptional dosage compensation pertains only to the expression magnitude of transcriptionally active genes [5,6]. Analysis of this mechanism, therefore, should include only transcriptionally active genes. Some background signal inherent to the assay used may be observed for inactive genes, but this signal is independent of the occurrence of dosage-compensation mechanisms. Thus, it is imperative to remove the contribution of inactive genes from the calculation of the average expression magnitudes of chromosomes.
In RNA-seq measurements, the abundance of a given transcript is assessed on the basis of the number of sequenced cDNA fragments associated with its exonic regions, expressed as reads per kilobase of exonic sequence per million total reads sequenced (RPKM) and adjusted for read mappability. At a given sequencing depth, genes lacking any associated fragments are below the detection level and cannot be considered active. Re-examining the RNA-seq data from human and mouse tissues [7–10] used by Xiong et al. and an additional mouse data set [11], we find that the fraction of such undetected (RPKM = 0) genes is substantially higher on the X chromosome than on autosomes (1.6–2.2 times higher, depending on the sample), accounting for as much as 40% of all the X-linked genes (Supplementary Fig. 1a). Therefore, inclusion of all genes, or of the same fraction of genes from all chromosomes, disproportionately reduces the average transcription magnitude estimates for the X chromosome. (The analysis in Xiong et al. excluded an equal percentage of the most and least expressed genes from each chromosome [3]; this influences only the error bars from bootstrap analysis and leaves the X:AA median ratio unaltered.) When inactive genes are excluded based on the lack of observed reads (RPKM = 0), the X:AA ratio is substantially higher than the value of 0.5 expected from a single uncompensated X chromosome (Supplementary Fig. 1b).
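The RPKM definition and the undetected-gene fraction described above can be illustrated with a minimal Python sketch. The function names and the toy values are hypothetical and are not taken from the data sets discussed here; the mappability adjustment is omitted for brevity.

```python
def rpkm(read_count, exonic_length_bp, total_reads):
    """Reads per kilobase of exonic sequence per million total reads.

    Simplification: no adjustment for read mappability is applied.
    """
    if exonic_length_bp <= 0 or total_reads <= 0:
        raise ValueError("exonic length and total reads must be positive")
    kilobases = exonic_length_bp / 1_000
    millions_of_reads = total_reads / 1_000_000
    return read_count / kilobases / millions_of_reads


def undetected_fraction(rpkm_values):
    """Fraction of genes with no associated reads (RPKM == 0)."""
    values = list(rpkm_values)
    if not values:
        raise ValueError("need at least one gene")
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical toy values, chosen only to illustrate the calculation.
toy_x_rpkm = [0.0, 0.0, 2.5, 7.1, 0.0]
toy_autosomal_rpkm = [0.0, 3.2, 4.8, 9.9, 1.4]

# 500 reads on 2 kb of exonic sequence in a library of 10 million reads.
example_rpkm = rpkm(read_count=500, exonic_length_bp=2_000, total_reads=10_000_000)
x_undetected = undetected_fraction(toy_x_rpkm)
autosomal_undetected = undetected_fraction(toy_autosomal_rpkm)
```

In this toy case the undetected fraction is three times higher on the X chromosome than on the autosomes, so any summary statistic computed over all genes is pulled down more strongly for the X chromosome.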
The detection of at least one associated read, however, is not a robust criterion for identifying transcriptionally active genes. As the number of total sequenced reads increases, so does the probability that sequencing or alignment errors will lead to the erroneous association of at least one read with a given transcript. Greater sequencing depth also allows for detection of increasingly rare transcripts. We therefore chose a low relative abundance threshold (RPKM ≥ 1 with at least 3 reads, which corresponds to approximately 0.3 mRNA copies per cell [9]) to identify a more robust set of transcriptionally active genes. Using this threshold, we found the average X:AA ratio to be 0.93 ± 0.17 across human tissues (Fig. 1a). Although the minimum amount of transcript for functional relevance cannot be easily determined for different transcripts, we note that the X:AA ratio does not substantially change for thresholds above RPKM = 1.
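The effect of the activity threshold on the X:AA median ratio can be sketched as follows. This is an illustrative Python example under stated assumptions: the gene records, the field names (chrom, rpkm, reads) and the toy values are hypothetical, and the "all genes" comparison is a plain median rather than the trimmed procedure used by Xiong et al.

```python
from statistics import median


def is_active(gene, min_rpkm=1.0, min_reads=3):
    """Activity criterion from the text: RPKM >= 1 with at least 3 reads."""
    return gene["rpkm"] >= min_rpkm and gene["reads"] >= min_reads


def x_to_aa_ratio(genes, min_rpkm=1.0, min_reads=3):
    """Median RPKM of retained X-linked genes over that of retained autosomal genes."""
    x_values = [g["rpkm"] for g in genes
                if g["chrom"] == "X" and is_active(g, min_rpkm, min_reads)]
    autosomal_values = [g["rpkm"] for g in genes
                        if g["chrom"] not in ("X", "Y")
                        and is_active(g, min_rpkm, min_reads)]
    if not x_values or not autosomal_values:
        raise ValueError("no genes pass the threshold on one side of the ratio")
    return median(x_values) / median(autosomal_values)


def make_gene(chrom, rpkm_value, reads):
    return {"chrom": chrom, "rpkm": rpkm_value, "reads": reads}


# Hypothetical toy genes: (chromosome, RPKM, read count).
toy_genes = [
    make_gene("X", 0.0, 0), make_gene("X", 0.0, 0), make_gene("X", 3.0, 30),
    make_gene("X", 5.0, 50), make_gene("X", 7.0, 70),
    make_gene("1", 0.0, 0), make_gene("1", 2.0, 20), make_gene("1", 4.0, 40),
    make_gene("1", 5.0, 50), make_gene("2", 6.0, 60), make_gene("2", 9.0, 90),
    make_gene("2", 1.5, 2),  # RPKM above 1 but fewer than 3 reads: not active
]

ratio_all_genes = x_to_aa_ratio(toy_genes, min_rpkm=0.0, min_reads=0)
ratio_active_genes = x_to_aa_ratio(toy_genes)
```

With these toy values the ratio rises from 0.75 when every gene is included to 1.0 when only active genes are retained, which mirrors the direction of the effect described in the text but says nothing about its size in real data.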
Because the X:AA ratio compares the median expression levels of disparate genes located on different chromosomes, the significance of X:AA deviation from unity should be evaluated relative to the natural variability in the median expression levels among chromosomes. Such inherent variability is illustrated by analogous ratios estimated for individual autosomes, which are present in an equal number of copies. For instance, we find that chr. 10:A = 0.83 ± 0.09 and chr. 11:A = 1.25 ± 0.18 (Fig. 1b). Of note, many such deviations are statistically significant if the confidence intervals are estimated using bootstrap resampling or similar approaches that only control for gene-to-gene variability [2,3]. These classical uncertainty measures assume that the expression magnitudes of any two genes from the same chromosome are independent, but this is known not to be the case [12].
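For concreteness, a gene-level bootstrap of the kind criticized above might look like the following Python sketch. It is an assumption-laden illustration, not the procedure used in any of the cited studies: the function name, the percentile interval and the log-normal toy data are all hypothetical. Note that it resamples genes independently, which is exactly the independence assumption the text identifies as problematic.

```python
import random
from statistics import median


def bootstrap_ratio_ci(chrom_values, other_values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for median(chrom) / median(other).

    Genes are resampled with replacement and treated as independent,
    so only gene-to-gene variability is captured.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        chrom_sample = rng.choices(chrom_values, k=len(chrom_values))
        other_sample = rng.choices(other_values, k=len(other_values))
        ratios.append(median(chrom_sample) / median(other_sample))
    ratios.sort()
    low = ratios[int((alpha / 2) * n_boot)]
    high = ratios[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = median(chrom_values) / median(other_values)
    return point, low, high


def make_toy_values(n, mu, seed):
    """Hypothetical strictly positive expression values (log-normal)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, 1.0) for _ in range(n)]


toy_chrom = make_toy_values(200, 2.0, seed=1)    # one autosome, active genes only
toy_other = make_toy_values(2000, 2.2, seed=2)   # remaining autosomes
point, ci_low, ci_high = bootstrap_ratio_ci(toy_chrom, toy_other)
```

Because neighboring genes are resampled as if unrelated, an interval of this kind can be too narrow when expression is spatially correlated along the chromosome.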
To assess the inherent spatial variability of median expression levels within the genome, we sampled contiguous blocks of active autosomal genes of size similar to the X chromosome (see Supplementary Methods) and estimated the statistical range within which the X:AA ratio is expected to fall under competing hypotheses: (i) the true median expression level on the X chromosome is equal to that of the autosomes (compensated, Fig. 1a,c, red bars) or (ii) the true median expression level on the X chromosome is equal to half that of the autosomes (uncompensated, Fig. 1a,c, blue bars). We find that in all examined human and mouse data sets (except for the mouse muscle sample), the observed X:AA ratios are within the range expected for twofold upregulation of the X chromosome. In contrast, all of the mouse and 5 of the 11 human data sets show ratios higher than the statistical range expected without dosage compensation. For the other 6 human data sets, the inherent variability within the genome was too high to distinguish between the two hypotheses. Combining the results from all examined tissues, we find that the overall likelihood is strongly in favor of X-linked genes being expressed at the autosomal level (likelihood ratio of 1 × 10^22 for human and 1 × 10^27 for mouse tissues, see Supplementary Table 1).
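The block-sampling idea can be sketched in Python as below. The details of the actual procedure are in the Supplementary Methods and are not reproduced here, so everything in this example is an assumption made for illustration: the function names, the restriction of each block to a single chromosome, the use of the genome-wide autosomal median as the denominator, the percentile range and the synthetic data.

```python
import random
from statistics import median


def block_ratio_range(autosomes, block_size, n_samples=1000, alpha=0.05, seed=0):
    """Range of (block median / autosomal median) over random contiguous blocks.

    autosomes maps a chromosome name to RPKM values of its active genes,
    listed in genomic order. Blocks never span two chromosomes (assumption).
    """
    rng = random.Random(seed)
    genome_median = median(v for values in autosomes.values() for v in values)
    eligible = [c for c, values in autosomes.items() if len(values) >= block_size]
    if not eligible:
        raise ValueError("no chromosome is long enough for the requested block")
    # Weight chromosomes by their number of valid start positions so that
    # every possible block is equally likely.
    weights = [len(autosomes[c]) - block_size + 1 for c in eligible]
    ratios = []
    for _ in range(n_samples):
        chrom = rng.choices(eligible, weights=weights, k=1)[0]
        values = autosomes[chrom]
        start = rng.randrange(len(values) - block_size + 1)
        ratios.append(median(values[start:start + block_size]) / genome_median)
    ratios.sort()
    low = ratios[int((alpha / 2) * n_samples)]
    high = ratios[min(n_samples - 1, int((1 - alpha / 2) * n_samples))]
    return low, high


def make_toy_autosomes(seed=1):
    """Hypothetical autosomes with slightly different expression levels."""
    rng = random.Random(seed)
    toy = {}
    for i in range(1, 6):
        mu = 2.0 + 0.1 * (i % 3)
        toy["chr%d" % i] = [rng.lognormvariate(mu, 1.0) for _ in range(300)]
    return toy


toy_autosomes = make_toy_autosomes()
compensated_range = block_ratio_range(toy_autosomes, block_size=100)
# Under the uncompensated hypothesis the expected ratios are simply halved.
uncompensated_range = tuple(0.5 * bound for bound in compensated_range)
```

An observed X:AA ratio would then be compared against the two ranges: inside compensated_range is consistent with twofold upregulation, while a value above uncompensated_range argues against the absence of compensation.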
Although applying a single threshold to the X chromosome and autosomes can artificially shift the X:AA ratio toward 1 even if the X chromosome is uncompensated, we note that the observed ratios are significantly above those expected in the absence of dosage compensation (Fig. 1d and Supplementary Fig. 2). Additional statistical corrections for this effect (see Supplementary Methods) increased the uncertainty for each data set, but the overall likelihood remained strongly in favor of the occurrence of dosage compensation (likelihood ratio of 1 × 10^10 for human and 1 × 10^13 for mouse tissues, see Supplementary Table 1).
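The threshold-induced shift mentioned above can be reproduced in a small simulation. The following Python sketch assumes, purely for illustration, that autosomal expression is log-normally distributed and that an uncompensated X chromosome has the same distribution with every value halved; the function name and all parameter values are hypothetical.

```python
import random
from statistics import median


def simulate_threshold_shift(n_genes=20_000, threshold=1.0, mu=0.0, sigma=2.0, seed=0):
    """Return (ratio without threshold, ratio with threshold) for an uncompensated X."""
    rng = random.Random(seed)
    autosomal = [rng.lognormvariate(mu, sigma) for _ in range(n_genes)]
    # Uncompensated X: same underlying distribution, expression halved.
    x_linked = [0.5 * rng.lognormvariate(mu, sigma) for _ in range(n_genes)]
    raw_ratio = median(x_linked) / median(autosomal)
    # The same cutoff removes a larger share of the halved X-linked values,
    # which raises the median of the retained X-linked genes.
    x_active = [v for v in x_linked if v >= threshold]
    autosomal_active = [v for v in autosomal if v >= threshold]
    thresholded_ratio = median(x_active) / median(autosomal_active)
    return raw_ratio, thresholded_ratio


raw_ratio, thresholded_ratio = simulate_threshold_shift()
```

Under these assumed parameters the ratio without a threshold stays near 0.5, whereas the thresholded ratio lies clearly above 0.5 although no compensation was simulated. This is why the expected range under the uncompensated hypothesis has to be corrected for the threshold before it is compared with the observed ratios.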
Our results indicate that the controlled analysis of RNA-seq data supports earlier conclusions that the active X chromosome is upregulated in mammals and that adjustment for the different percentage of transcriptionally active genes on the X chromosome and autosomes is necessary for this analysis. In addition, we show the importance of controlling for spatial dependence in the analysis of genome-wide expression data. We note that our conclusion here is also supported by additional data and analysis presented by Deng et al. [13] in this issue.
P.J.P. conceived the study; P.V.K., R.X. and P.J.P. devised the analysis; P.V.K. and R.X. performed the calculations; and P.V.K. and P.J.P. wrote the manuscript.
1. Adler, D.A. et al. Proc. Natl. Acad. Sci. USA 94, 9244–9248 (1997).
2. Nguyen, D.K. & Disteche, C.M. Nat. Genet. 38, 47–53 (2006).
3. Xiong, Y. et al. Nat. Genet. 42, 1043–1047 (2010).
4. Mukherjee, A.S. & Beermann, W. Nature 207, 785–786 (1965).
5. Alekseyenko, A.A., Larschan, E., Lai, W.R., Park, P.J. & Kuroda, M.I. Genes Dev. 20, 848–857 (2006).
6. Gilfillan, G.D. et al. Genes Dev. 20, 858–870 (2006).
7. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Nat. Genet. 40, 1413–1415 (2008).
8. Wang, E.T. et al. Nature 456, 470–476 (2008).
9. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Nat. Methods 5, 621–628 (2008).
10. Kim, H. et al. Nat. Med. 16, 804–808 (2010).
11. Gregg, C. et al. Science 329, 643–648 (2010).
12. Lercher, M.J., Urrutia, A.O. & Hurst, L.D. Nat. Genet. 31, 180–183 (2002).
13. Deng, X. et al. Nat. Genet. 43, 1179–1185 (2011).
We thank M. Kuroda, E. Larschan, M. Gelart, C. Wang and J. Lee for helpful discussions and a critical reading of the manuscript.
The authors declare no competing financial interests.
Kharchenko, P., Xi, R. & Park, P. Evidence for dosage compensation between the X chromosome and autosomes in mammals. Nat Genet 43, 1167–1169 (2011). https://doi.org/10.1038/ng.991