X chromosomes evolve differently from autosomes, but general governing principles have not emerged1. For example, genes with male-biased expression are under-represented on the X chromosome of D. melanogaster2, but are randomly distributed in the genome of Anopheles gambiae3. In direct global profiling experiments using species-specific microarrays, we find a nearly identical paucity of genes with male-biased expression on D. melanogaster, D. simulans, D. yakuba, D. ananassae, D. virilis and D. mojavensis X chromosomes. We observe the same under-representation on the neo-X of D. pseudoobscura. It has been suggested that precocious meiotic silencing of the X chromosome accounts for reduced X chromosome male-biased expression in nematodes4, mammals5 and Drosophila6. We show that X chromosome genes with male-biased expression are under-represented in somatic cells and in mitotic male germ cells. These data are incompatible with simple X chromosome inactivation models. Using expression profiling and comparative sequence analysis, we show that selective gene extinction on the X chromosome, creation of new genes on autosomes and changed genomic location of existing genes contribute to the unusual X chromosome gene content.
Several models have been advanced to explain the peculiar gene content of X chromosomes. They can be divided into those driven by gene-by-gene selective pressures and those driven by chromosome-wide selective pressures1,7. Antagonistic selection is a popular gene-by-gene model. Females and males are under different selective pressures and deploy the genome differently, such that genes and expression states advantageous to one sex can be disadvantageous to the other. This is expected to have a profound influence on X chromosomes. Hemizygosity, and hence immediate selection of recessive alleles in males, should be masculinizing. The increased residency time of X chromosomes in females (with an equal sex ratio, two-thirds of X chromosomes are carried by females) and immediate selection of dominant alleles should be a counteracting force. A popular chromosome-wide model suggests that X chromosome inactivation during spermatogenesis is responsible for the reduced number of X chromosome genes with male-biased expression. The X chromosome is precociously condensed, and thus silenced, in preparation for male meiosis, owing to the absence of a homologous pairing partner.
Before addressing particular models, we asked whether X chromosome sex-biased expression patterns were consistent across the genus. We determined female:male expression ratios on species-specific microarrays8, normalized sex bias across species, and parsed the expression data by chromosome arm (Fig. 1a and Supplementary Fig. 1). Homologous linkage groups in the Drosophila genus are referred to as ‘Muller's elements’ to standardize discordant species-specific chromosome nomenclature9. Muller A is part of the X chromosome in all the species examined. Muller A genes with male-biased expression were under-represented relative to autosomes in each of the seven species (30–43% fewer than expected, P < 10⁻⁴ by chi-squared test; Supplementary Table 1). No other chromosome arm showed a genus-wide significant departure from a random distribution, although we did observe a modest over-representation of genes with male-biased expression on Muller B (P < 10⁻² by chi-squared test) in D. melanogaster, D. simulans, D. ananassae and D. mojavensis. Therefore, a dearth of genes with male-biased expression on the X chromosome is characteristic of the genus.
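The per-arm comparison above asks whether the fraction of male-biased genes on one arm differs from that on the autosomes. A minimal sketch of such a comparison is shown below as a 2×2 Pearson chi-squared test (1 degree of freedom, no continuity correction), together with the "percent fewer than expected" summary. The function names and the gene counts are hypothetical illustrations, not values from this study. Taking the expectation from the autosomal proportion, rather than from the whole genome, is also an assumption.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) on the table
    [[a, b], [c, d]]: rows = X-linked, autosomal genes;
    columns = male-biased, not male-biased. Returns (statistic, p_value)."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-squared distribution with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def percent_deficit(x_biased, x_total, auto_biased, auto_total):
    """Percent shortfall of X-linked male-biased genes relative to the number
    expected if the X carried the autosomal proportion of male-biased genes."""
    expected = x_total * auto_biased / auto_total
    return 100 * (expected - x_biased) / expected

# Hypothetical counts: 300 of 2,000 X-linked genes and 2,500 of 10,000
# autosomal genes are male-biased.
stat, p = chi2_2x2(300, 1700, 2500, 7500)
deficit = percent_deficit(300, 2000, 2500, 10000)
print(f"chi2 = {stat:.1f}, P = {p:.1e}, deficit = {deficit:.0f}%")
```

With these made-up counts the X carries 40% fewer male-biased genes than the autosomal rate predicts, and the test rejects a random distribution well beyond P < 10⁻⁴.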
Formally, reduced male-biased expression could be due to a historical accident involving the Muller A element rather than to a property of the X chromosome itself. This is directly testable owing to the chromosome arrangement in D. pseudoobscura, in which the ancestral autosome Muller D fused to the X chromosome ∼8–12 million years ago10. The neo-X chromosome (Fig. 1a) also showed a strong under-representation of male-biased expression (37% of the expected random distribution, P < 10⁻⁴ by chi-squared test). The consistent under-representation of genes with male-biased expression on the D. pseudoobscura neo-X and on the ancestral X chromosomes suggests that the effect of linkage on male-biased expression is a property of X chromosomes, and that it takes less than 12 million years to reach a stereotypical depleted status.
Important predictions of the X chromosome inactivation model for reduced male-biased expression are readily testable: expression of all X chromosome genes should be reduced, and X chromosome deficits in male-biased expression should be restricted to late spermatocytes. There was no global reduction in X chromosome gene expression in males (Fig. 1b), suggesting that global inactivation in spermatocytes does not disrupt the overall balance of gene expression at the organismal level. These data are also consistent with genus-wide, high-fidelity X chromosome dosage compensation in both the male soma and germline, as observed in D. melanogaster11. The paucity of X chromosome genes with male-biased expression seems to be due to reduced numbers of genes with overt male-biased expression, not to a chromosome-wide effect. However, although these data do not support the X chromosome inactivation model, they are based on whole-animal expression profiles, which greatly dilute late spermatocyte expression.
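The distinction drawn above is between a chromosome-wide reduction, which would shift the whole distribution of male:female expression ratios for X-linked genes downwards, and a gene-by-gene deficit, which would not. A minimal sketch of that contrast compares median log2(male/female) ratios between gene sets. The input format, function names and intensities are hypothetical, and a real analysis would also test whether any shift is statistically significant.

```python
import math
from statistics import median

def median_log2_ratio(pairs):
    """pairs: iterable of (male_intensity, female_intensity) per gene.
    Returns the median log2(male/female) expression ratio."""
    return median(math.log2(m / f) for m, f in pairs)

def global_shift(x_pairs, autosome_pairs):
    """Difference in median log2(M/F) between X-linked and autosomal genes.
    A value near 0 argues against a chromosome-wide reduction on the X;
    a value near -1 would correspond to a twofold global reduction in males."""
    return median_log2_ratio(x_pairs) - median_log2_ratio(autosome_pairs)

# Hypothetical intensities.
autosomes = [(100, 100), (400, 100), (25, 100)]
x_balanced = [(100, 100), (200, 100), (50, 100)]  # no global change
x_halved = [(50, 100), (100, 100), (25, 100)]     # every X gene halved in males
print(global_shift(x_balanced, autosomes), global_shift(x_halved, autosomes))
```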
If male-biased expression from X chromosomes is reduced owing to a germline-specific event, then there should be no similar reduction in the soma. Most orthologues show sex-biased expression in all tested species8, so D. melanogaster testis expression can serve as a proxy for germline expression in the other species. We therefore estimated somatic sex-biased expression by computationally removing orthologues of the 1,569 genes with testis-biased expression in D. melanogaster. We observed significant under-representation (P < 10⁻⁴ by chi-squared test) of X chromosome genes with putative somatic male-biased expression in five of the seven species (Fig. 1c). This included the D. pseudoobscura neo-X chromosome. We also re-analysed published non-gonadal D. melanogaster expression data2 (Fig. 2a). Although far fewer genes show male-biased expression in the non-gonadal soma, we observed clear under-representation of X chromosome genes with male-biased expression (P < 10⁻²), as previously observed2.
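The somatic estimate described above amounts to removing a fixed set of germline-expressed orthologues before recounting male-biased genes per chromosome arm. The resulting counts can then be tested for under-representation as before. A minimal sketch follows, assuming each gene has already been assigned an arm and a sex-bias class. The dictionary layout, gene identifiers and class labels are hypothetical.

```python
from collections import Counter

def male_biased_by_arm(genes, exclude=frozenset()):
    """genes: dict gene_id -> (arm, bias), where bias is 'male', 'female' or
    'unbiased'. Genes whose IDs are in `exclude` (for example, orthologues of
    D. melanogaster testis-biased genes) are skipped.
    Returns a Counter mapping arm -> number of male-biased genes."""
    return Counter(
        arm
        for gene_id, (arm, bias) in genes.items()
        if bias == "male" and gene_id not in exclude
    )

# Hypothetical example.
genes = {
    "g1": ("X", "male"), "g2": ("X", "male"), "g3": ("2L", "male"),
    "g4": ("2L", "male"), "g5": ("3R", "female"), "g6": ("3R", "male"),
}
testis_biased = {"g1", "g3"}
all_tissues = male_biased_by_arm(genes)
somatic = male_biased_by_arm(genes, exclude=testis_biased)
print(all_tissues, somatic)
```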
To test the X chromosome inactivation model more directly, we used testis expression data from D. melanogaster mutants blocked during primary spermatocyte amplification12. Testes from these mutants accumulate vast numbers of mitotic germ cells. We used independently profiled ovary samples as a reference13. X chromosome genes with testis-biased expression were under-represented in these mutant profiles (P < 10⁻⁴ by chi-squared test; Fig. 2b, c). Additionally, there was no evidence for a global reduction in the expression of X chromosome genes in either wild-type or mutant testes (Fig. 2d–f). These data suggest that the X chromosome is a poor location for germline male-biased expression even before X chromosome inactivation, which occurs in late, post-mitotic spermatocytes6. X chromosome inactivation may occur too late to affect most X chromosome transcript levels. We suggest that the paucity of X chromosome genes with male-biased expression in Drosophila is due to selection at the gene level, not to global chromatin status.
Regardless of the underlying model, only a limited number of physical mechanisms can produce the observed under-representation of X chromosome genes with male-biased expression2,14. Using the D. pseudoobscura neo-X chromosome as a well-controlled natural experiment, we explored whether X chromosome genes with male-biased expression switch expression class, preferentially move to autosomes, are preferentially lost, or fail to arise on X chromosomes. We inferred the expression pattern of the ancestral Muller D of D. pseudoobscura, before its translocation to the X chromosome, using expression data from the six phylogenetically flanking species. The extant expression pattern was determined directly. In the other species, Muller D and Muller E are either arms of the same chromosome or separate chromosomes. Muller E provided a reference for the analysis (Fig. 3a, Supplementary Tables 3 and 4, and Supplementary Fig. 2).
Of the 242 Muller D genes with ancestral or extant male-biased expression (Fig. 3b) that could be unambiguously assigned linkage and expression patterns, 216 (89%) remain in place on the neo-X chromosome. Only one gene on the neo-X chromosome switched from male-biased to female-biased expression. This particular gene, monkey king (mkg-p), has undergone an interesting functional radiation and sex-biased expression switches in the melanogaster subgroup15. Even so, the frequency of sex-bias change on the neo-X chromosome is not significantly different from that on Muller E (no switches). Thus, there is no overt evidence that expression changes are responsible for the under-representation of genes with male-biased expression on X chromosomes.
We found that 5% of genes with male-biased expression on the ancestral autosome had been lost from the genome of D. pseudoobscura, whereas only 1% of ancestral genes with male-biased expression were lost from Muller E (P < 0.05 by chi-squared test). Only 2% of genes with male-biased expression on the neo-X chromosome were unique to D. pseudoobscura, whereas 15% of the Muller E genes with male-biased expression in D. pseudoobscura were unique. The relative excess of new genes with male-biased expression on the autosome is highly significant (P < 10⁻⁴ by chi-squared test). Determining whether a gene is absent or new, rather than highly diverged, is difficult. However, exploration of genus-wide gene content using relaxed alignment instead of the consensus gene models indicates that many of these genes are likely to be completely absent rather than highly diverged (Supplementary Fig. 3). We also found clear evidence that the ratio of non-synonymous (KA) to synonymous (KS) substitution rates is higher for X chromosome genes with male-biased expression than for autosomal genes with male-biased expression (Supplementary Fig. 4). These data suggest a net loss of genes with male-biased expression from the X chromosome (as well as more subtle changes to sequence). They also suggest that functions lost from the X chromosome may be replaced by new gene formation on autosomes.
Analysis of gene transposition showed a net movement of genes with male-biased expression off the neo-X chromosome. Of the ancestral genes with male-biased expression, 3% moved from the neo-X to autosomes, whereas none moved from Muller E (P < 10⁻⁴ by chi-squared test). Male-biased expression was always maintained in the new location. There was no significant difference between the neo-X and Muller E with respect to newly arriving genes: 1% of ancestral genes with male-biased expression moved to the neo-X chromosome and 2% moved to Muller E. These data clearly support the idea that movement of genes with male-biased expression to autosomes promotes the long-term survival of those genes. Gene loss, gain and movement unambiguously accounted for 13% of the loss of male-biased expression from the neo-X chromosome and 73% of the gain of male-biased expression on Muller E (Supplementary Table 5). If these unambiguous cases are representative of the changes that we are unable to trace, then gene loss, gain and movement are the dominant mechanisms for depleting male-biased expression on the X chromosome relative to autosomes.
Our data strongly support the idea that the X chromosome has an unusual distribution of genes with male-biased expression4,5,16,17,18. Although there is compelling evidence that precocious X chromosome inactivation occurs in at least some species, and that this may contribute to the reduced density of genes with male-biased expression on X chromosomes4,5, our data indicate that this is not a major contributor to the pattern in the Drosophila genus. We suggest that the dominant mechanisms for achieving this under-representation are preferential extinction of X chromosome genes and formation of new autosomal genes with male-biased expression2, along with movement of genes with male-biased expression from the X chromosome14. Somewhat surprisingly, altered gene expression does not seem to be a major contributor.
Array data sources
Expression data for sex-sorted whole adults of the seven Drosophila species8 were obtained from GEO19 (GSE6640). Data for D. melanogaster gonadectomized male and female carcasses on the FlyGEM platform were published previously2 (GEO accession GSE442). Affymetrix testis expression data for bgcn- and UAS-os, bgcn-/nos-Gal4–VP16, bgcn- mutants12 were obtained from GEO (GSE4188, GDS2228), and wild-type ovary data were obtained from FlyMine20 release v.8.0.
D. melanogaster annotation v.4.3 (ref. 21) and the Comparative Analysis Freeze 1 (CAF1) assemblies22 were used throughout. Orthology relationships among the seven species (orthologue, paralogue and no_homologue) were assigned to consensus gene predictions22. Gene content changes were also determined by comparing D. melanogaster amino acid sequences against the six-frame-translated genomic DNA of each species with BLAST tblastn23.
For gene loss and translocation, the ancestral state of each lost or moved gene was inferred from its consensus expression class and chromosome linkage. For the D. pseudoobscura neo-X chromosome (Muller D) and autosomal Muller E, gene gain, loss and movement were counted manually, gene by gene. The ancestral gene content of these arms was inferred by phylogeny, using genes present in species that diverged before (D. virilis and D. mojavensis) and after (D. ananassae and the melanogaster subgroup) the melanogaster/obscura group split. The ancestral state refers to the chromosome linkage and expression class of genes at the origin of this node in a rooted phylogenetic tree.
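The text does not spell out how a "consensus" expression class is derived. One plausible reading is a majority vote across the flanking species in which the gene was measured, with ties treated as unresolved, in keeping with the exclusion of ambiguous cases. It is sketched below purely as an assumption. The species keys, class labels and function name are hypothetical.

```python
from collections import Counter

def consensus_class(classes):
    """classes: dict species -> expression class ('male', 'female',
    'unbiased'), or None if the gene is absent or unmeasured in that species.
    Returns the majority class, or None if there are no data or a tie
    (treated here as an ambiguous ancestral state)."""
    counts = Counter(c for c in classes.values() if c is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

# Hypothetical flanking-species calls (D. pseudoobscura itself excluded).
flanking = {"dmel": "male", "dsim": "male", "dyak": "unbiased",
            "dana": "male", "dvir": "male", "dmoj": None}
print(consensus_class(flanking))
```

The same function could be applied to chromosome linkage by passing Muller element letters instead of expression classes.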
Expression data for whole adults of the seven Drosophila species were generated with custom-designed oligonucleotide arrays (NimbleGen Systems). Array design, sample preparation and labelling, and data handling were as described8. Data for D. melanogaster gonadectomized male and female carcasses2 were re-analysed from raw intensities. These data were re-normalized by within-slide print-tip loess normalization, followed by between-slide quantile normalization, using Bioconductor24. Differential expression was then called in the same manner as for the custom NimbleGen arrays, using Mann–Whitney tests with false-discovery-rate correction. For Affymetrix data, we used ‘present’ and ‘absent’ calls to determine sex bias. Intensity data were used directly from GEO without additional normalization and were averaged over replicates. DrosGenome2 identifiers were remapped to DrosGenome1 identifiers using the mapping provided by Affymetrix.
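Two of the steps above are easy to state precisely: between-slide quantile normalization and false-discovery-rate correction of per-gene P values. The pure-Python sketch below illustrates both under simplifying assumptions. It is not the Bioconductor code that was used, and it omits the within-slide print-tip loess step. It also breaks ties in quantile normalization by input order. For the FDR correction it uses the Benjamini–Hochberg procedure, which the text does not name explicitly, and the P values are assumed to come from per-gene Mann–Whitney tests computed elsewhere.

```python
def quantile_normalize(columns):
    """columns: list of equal-length lists, one per slide (array).
    Each value is replaced by the mean, across slides, of the values with the
    same rank, so every slide ends up with the same intensity distribution.
    Ties are broken by input order (a simplification)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all slides must have the same number of features")
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical data: two slides with three features each, and four P values.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```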
‘Gained genes’ were defined as genes present in only one species (Supplementary Fig. 2), and ‘lost genes’ as genes absent from only one species. ‘Moved genes’ were those present in all seven species but with different chromosome linkage in at least one species. For gene loss and translocation, the ancestral state of each lost or moved gene was inferred from its consensus expression class and chromosome linkage. Genes whose arm linkage differed because of a large pericentric inversion in the D. yakuba lineage21 were not counted as translocations.
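The gained, lost and moved definitions above can be written as a small classifier over per-species presence and Muller-element linkage. In this minimal sketch, the species keys and return labels are hypothetical. The exclusion of D. yakuba pericentric-inversion cases is not modelled and would have to be applied by the caller.

```python
# Hypothetical keys for the seven species analysed.
SPECIES = ("dmel", "dsim", "dyak", "dana", "dpse", "dvir", "dmoj")

def classify_gene(linkage):
    """linkage: dict species -> Muller element ('A' to 'F'), or None if the
    gene is absent from that species.
    Returns 'gained' (present in only one species), 'lost' (absent from only
    one species), 'moved' (present in all seven with differing linkage),
    'unchanged', or 'unclassified' (present in two to five species)."""
    present = [s for s in SPECIES if linkage.get(s) is not None]
    if len(present) == 1:
        return "gained"
    if len(present) == len(SPECIES) - 1:
        return "lost"
    if len(present) == len(SPECIES):
        elements = {linkage[s] for s in SPECIES}
        return "moved" if len(elements) > 1 else "unchanged"
    return "unclassified"

all_d = {s: "D" for s in SPECIES}
print(classify_gene({"dpse": "E"}), classify_gene({**all_d, "dpse": None}),
      classify_gene({**all_d, "dpse": "B"}), classify_gene(all_d))
```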
We also estimated gene content changes independently of the gene predictions in the newly sequenced species. We compared D. melanogaster amino acid sequences against the six-frame-translated genomic DNA of each species using BLAST tblastn23. BLAST results were parsed and hits tiled together using BioPerl25. A D. melanogaster gene was called ‘absent’ in another species if it had no hit below the E-value cut-off (results for cut-offs of 10⁻³ and 10⁻¹⁷ are shown in Supplementary Fig. 3).
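The absence call described above reduces to asking whether any hit for a query beats the E-value cut-off. A minimal sketch follows, assuming the tblastn results are in BLAST's 12-column tabular format (BLAST+ `-outfmt 6`, where the E-value is the 11th column). The HSP tiling done with BioPerl is not reproduced, and the function name, identifiers and hit lines are hypothetical.

```python
def absent_genes(query_ids, tabular_lines, evalue_cutoff):
    """Return query IDs (in input order) with no hit whose E-value is below
    evalue_cutoff. tabular_lines: lines of 12-column BLAST tabular output
    (qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore); blank and comment lines are ignored."""
    supported = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) < evalue_cutoff:
            supported.add(fields[0])
    return [q for q in query_ids if q not in supported]

# Hypothetical hits for two of three D. melanogaster queries.
hits = [
    "q1\tscaffold_1\t45.0\t120\t60\t2\t1\t120\t1000\t1360\t1e-20\t150",
    "q2\tscaffold_2\t30.0\t80\t50\t3\t5\t85\t500\t740\t1e-5\t45",
]
print(absent_genes(["q1", "q2", "q3"], hits, 1e-3))
print(absent_genes(["q1", "q2", "q3"], hits, 1e-17))
```

A stricter cut-off calls more genes absent, which is why results are reported at more than one threshold.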
For the D. pseudoobscura neo-X chromosome (Muller D) and autosomal Muller E, gene gain, loss and movement were counted manually, gene by gene. The ancestral gene content of these arms was inferred by phylogeny, using genes present in species that diverged before (D. virilis and D. mojavensis) and after (D. ananassae and the melanogaster subgroup) the melanogaster/obscura group split. The ancestral state refers to the chromosome linkage and expression class of genes at the origin of this node in a rooted phylogenetic tree21,22. For both Muller D and E, we first selected genes linked to these arms in any species. These were then filtered to retain genes that are Muller D- or Muller E-linked in at least one species downstream and one species upstream of the split, or in D. pseudoobscura itself. Finally, we retained genes with male-biased expression in at least one species. Each remaining gene was then manually assigned a putative ancestral expression class and arm location, and a post-neo-X fate that considered the entire Drosophila lineage. Some cases involving species on the ‘edge’ of the tree could not be resolved by this approach. For example, a gene that is Muller B-linked in D. mojavensis/D. virilis (on the edge of the rooted tree) but Muller D-linked in every other species would not be counted, because the direction of translocation cannot be determined. Our approach includes only unambiguous cases (Supplementary Tables 2–4).
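The three automatic filters described above can be chained as in the minimal sketch below. The record layout, species keys and function name are hypothetical. The subsequent manual assignment of ancestral state and fate, including the exclusion of unresolvable ‘edge’ cases, is not modelled.

```python
# Hypothetical species keys, grouped relative to the melanogaster/obscura split.
UPSTREAM = ("dvir", "dmoj")                   # diverged before the split
DOWNSTREAM = ("dana", "dmel", "dsim", "dyak")  # D. ananassae and melanogaster subgroup
FOCAL = "dpse"                                 # D. pseudoobscura

def candidate_genes(genes, element):
    """genes: dict gene_id -> {"linkage": {species: Muller element or None},
                               "male_biased_in": set of species}.
    element: "D" or "E". Returns sorted IDs passing the three filters."""
    selected = []
    for gene_id, rec in genes.items():
        linkage = rec["linkage"]
        # 1. Linked to the element in any species.
        if element not in linkage.values():
            continue
        # 2. Linked to it both upstream and downstream, or in D. pseudoobscura.
        up = any(linkage.get(s) == element for s in UPSTREAM)
        down = any(linkage.get(s) == element for s in DOWNSTREAM)
        if not ((up and down) or linkage.get(FOCAL) == element):
            continue
        # 3. Male-biased expression in at least one species.
        if not rec["male_biased_in"]:
            continue
        selected.append(gene_id)
    return sorted(selected)

genes = {
    "a": {"linkage": {"dvir": "D", "dmel": "D", "dpse": "D"}, "male_biased_in": {"dmel"}},
    "b": {"linkage": {"dvir": "B", "dmoj": "B", "dmel": "D", "dpse": None},
          "male_biased_in": {"dmel"}},
    "c": {"linkage": {"dvir": "D", "dmel": "D", "dpse": "D"}, "male_biased_in": set()},
}
print(candidate_genes(genes, "D"))
```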
Multiple sequence alignments of orthologues were imported using the seqinR package26. KA/KS estimates adjusted for differences in transition and transversion rates were calculated from these alignments27.
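Transition/transversion-adjusted KA/KS estimators, such as Li's 1993 method, apply corrections of the Kimura two-parameter form to classes of sites defined by codon degeneracy. The minimal sketch below computes only that nucleotide-level building block for two aligned sequences. It is not a KA/KS calculation and not seqinR's implementation. Assigning sites to degeneracy classes, which KA and KS require, is omitted, and the function name is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def kimura_2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences:
    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the
    proportions of transitional and transversional differences.
    Gapped or ambiguous positions are skipped."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    diffs = [(a, b) for a, b in pairs if a != b]
    transitions = sum(1 for a, b in diffs
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    P = transitions / n
    Q = (len(diffs) - transitions) / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

print(kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
```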
All computation was performed in the R/Bioconductor environment24.
We thank the Drosophila 12 Genomes Consortium for access to the assembly, alignment and annotation of the 12 sequenced Drosophila genomes, S. Davis for valuable technical advice, and C. Vinson, A. Clark and members of the B. Oliver laboratory for helpful discussion and comments on the manuscript. We are supported by the Intramural Research Program of the NIH, NIDDK.
The file contains Supplementary Tables S3-S4.