Introduction

Dosage compensation in mammalian females is achieved by the inactivation of one X chromosome early in development leading to the equality of X-linked gene products between male and female cells.1 Not all X-linked genes are inactivated. Most loci in the pseudoautosomal regions on the tips of the long and short arms are active; there are, in addition, genes interspersed throughout the remainder of the chromosome that are expressed from both the active and, at varying levels, the inactive chromosomes.2, 3, 4 Some estimates indicate that 10–20% of X-linked genes escape inactivation, with the majority located on Xp. Some of these have Y homologues, but many do not.5, 6 Furthermore, there may be other genes on the X chromosome which are differentially imprinted, that is, selectively silenced, depending on their parent of origin.7 Loci escaping inactivation appear to be found both in clusters and in isolation.8, 9 The patterns of X-linked gene expression in different tissues and, in particular, different regions of the brain are of great significance for interpreting their impact on behavioural sex differences. Due to the paucity of current data, determination of the inactivation status of additional genes remains critical for the formulation and resolution of rival theories concerning the nature and spread of the inactivation process.

An efficient means of determining the activity of an X-linked gene is via the analysis of mouse–human somatic cell hybrids retaining only the inactive human X chromosome in a rodent background.5 As inactivation is effected at the level of transcription, detection of human-specific transcripts by RT-PCR is suggestive of an escape from inactivation. Inactivation status can be confirmed by quantitation of the relevant transcripts from XO, XX, XXX or XXXX human cell lines.9, 10 Other strategies are possible, including observations on tissues, or cell lines, from females with skewed inactivation. We have previously presented an independent RT-PCR-based approach examining transcribed polymorphisms using cDNA from cloned human cell lines.11

Although much useful data have been generated by the above techniques, they are labour intensive and/or generally applicable only to a restricted number of loci. The majority of available data reflects inactivation status in established cell lines or somatic cell hybrids, and thus may not reflect the situation in vivo. Recourse to microarray ‘chip technology’ enables the screening of all X-linked genes for which sufficient sequence data are available. We have therefore investigated the potential of this system employing commercially available microarrays to assess quantitatively the comparative expression of X-linked genes in female lymphocytes to that in males. Sudbrak et al12 were the first to demonstrate the value of the microarray approach in such studies, employing X-linked EST clone inserts to interrogate transcripts from lymphoblastoid cell lines.

Materials and methods

Lymphocytes, RNA extraction and cDNA synthesis

Six male and six female volunteers provided 20 ml blood samples, and lymphocytes were isolated using Lymphoprep (Axis-Shield-PoC AS, Oslo). Total RNA was subsequently isolated with RNA-Bee (AMS Biotechnology). RNA was treated with DNase I (Qiagen, Crawley, UK) to avoid DNA contamination. The quality and purity of total RNA were assayed in a 2% agarose gel and the recovery was calculated after measuring absorbance with a spectrophotometer at 260 nm.

Biotin-labelled cRNA was prepared from 5 μg of each total RNA, employing the manufacturer's reagents and protocols, with the experiments being conducted in Affymetrix Laboratories, High Wycombe, England. cRNA products were subjected to quality control steps as follows: after spectrophotometric quantification they were hybridised to ‘Test 3 arrays’ providing information on 3′ compared to 5′ signal ratios for the human housekeeping genes. Each sample was hybridised to a single HG-U133A chip and processed using the standard protocol EukGE-WS2v4. The 12 chips were all scanned with the same scanner. Quality indicators from these hybridisations are summarised in Table 1. On this basis, it was decided to eliminate female sample 3 and, to balance the numbers of male and female samples, male 7 was also excluded. In the remaining 10 samples, the range of scale factors was 0.85–2.1. The intensities were analysed with the Affymetrix Microarray Suite (MAS) version 5.0 ‘statistical algorithm’, and the resulting signal values were scaled to give a trimmed mean of 100 for each chip. These values were then imported into the Affymetrix Data Mining Tool and also the R statistical environment. The HG-U133A microarray displays sequences corresponding to approximately 18 400 different transcripts and 14 500 known genes, of which 772 are known X chromosome probe sets. Male and female samples were tested for difference using the Mann–Whitney test.
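The scaling of each chip to a trimmed mean of 100 can be illustrated with a minimal Python sketch. It is not the MAS 5.0 implementation: the function names are hypothetical, and the 2% trim at each end is an assumption, as the trim fraction is not stated here.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after discarding the lowest and highest `trim` fraction of values.

    The 2% default is an assumption; the text does not give the fraction.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim)          # number of values dropped at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_chip(signals, target=100.0, trim=0.02):
    """Return (scale_factor, scaled_signals) so the trimmed mean equals `target`."""
    factor = target / trimmed_mean(signals, trim)
    return factor, [s * factor for s in signals]


# Hypothetical raw signals for one chip (illustrative values only).
raw = [float(v) for v in range(1, 101)]
factor, scaled = scale_chip(raw)
print(round(factor, 3), round(trimmed_mean(scaled), 3))
```

Under this reading, a chip with a low overall signal receives a scale factor above 1 and a bright chip a factor below 1, which is why a narrow range of scale factors across chips is a useful quality indicator.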

Table 1 Quality control data for the 12 cRNA samples

Results

Employing the Mann–Whitney test provided with the Affymetrix Data Mining Tool, a total of 36 probes with a female to male ratio above 1 (P<0.05) were identified. Although this two-tailed nonparametric test is not corrected for multiple testing, it is otherwise extremely conservative. The overall female/male ratio for all sequences represented, based on five microarrays for each sex, was 0.91. As an additional check on the validity of the identification process, means and standard deviations were calculated for the standardised signal outputs for the five female and five male samples for loci identified as giving significantly higher female:male ratios. These data together with information on chromosomal location (in Mb from Xpter) and probe derivation (Unigene accession details) are provided in Table 2. All probe sets represented on the microarray are listed and those with significant values are in bold.
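The per-probe-set comparison described above (a two-tailed rank test on five female and five male signals, with means, standard deviations and a female:male ratio) can be sketched in Python as follows. This is an illustration only: the Affymetrix Data Mining Tool's internal method is not described here, so the exact permutation form of the rank-sum test, the function names and the signal values are all assumptions.

```python
from itertools import combinations
from statistics import mean, stdev


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1             # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_rank_sum_p(female, male):
    """Two-sided exact P-value: enumerate every way of labelling n_f samples female."""
    pooled = list(female) + list(male)
    ranks = midranks(pooled)
    n_f = len(female)
    expected = n_f * (len(pooled) + 1) / 2
    observed_dev = abs(sum(ranks[:n_f]) - expected)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_f):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-9:
            hits += 1
    return hits / total


def summarise(female, male):
    """Means, sample SDs, female:male ratio and P-value for one probe set."""
    return {
        "female_mean": mean(female), "female_sd": stdev(female),
        "male_mean": mean(male), "male_sd": stdev(male),
        "ratio": mean(female) / mean(male),
        "p_value": exact_rank_sum_p(female, male),
    }


# Hypothetical scaled signals for one probe set (not data from this study).
female = [120.0, 130.0, 125.0, 140.0, 135.0]
male = [100.0, 105.0, 98.0, 110.0, 102.0]
result = summarise(female, male)
print(result)
```

With five samples per sex there are only 252 possible group assignments, so an exact two-sided test of this form cannot return a P-value below 2/252 (about 0.008); software that uses a normal approximation or a one-tailed convention will report different values.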

Table 2 Comparison of male/female ratios of expression

Examination of those loci having evidence of escape from inactivation revealed four classes. The first is represented by genes with well-established records as escapees, the second by those for which previous weak or preliminary evidence for their escape is available, the third by loci for which previous evidence suggested normal inactivation and the fourth by candidates with no previous information available concerning their inactivation status.

Loci with substantial pre-existing data supporting their escape from inactivation

X-linked genes that have active Y homologues

These are discussed sequentially in reported cytogenetic (Xpter–Xqter) order below.

Zinc-finger protein, X-linked, ZFX, (Xp22.2–p21.3) has a similar gene organisation and transcription profile to that of ZFY and escapes X inactivation.9, 12, 13, 14, 15

DEAD/H Box 3, X-linked, DDX3, (Xp11.3–p11.23) is one of a group of five X-linked loci with Y homologues in the nonpseudoautosomal region, which escape inactivation.16, 17

Ubiquitously transcribed tetratricopeptide repeat gene on X chromosome, UTX, (Xp11.2) is homologous to the murine X-linked locus Utx. Both are reported to escape inactivation.18, 19 Carrel et al5 observed the expression of UTX from inactive X chromosomes in all six hybrid cell lines tested (see http://genetics.gene.cwru.edu/willard/data/xin/genesurvey-all.xls for information on hybrid analysis).

X-linked genes that have inactive Y homologues

Arylsulphatase C, ARSC2, comprises two microsomal isoenzyme forms, s and f, with distinct transcripts. The coding sequences for both are localised at Xpter–p22.32, close to PAR1 and escape partially from inactivation.20 The extent to which the locus escapes inactivation may vary in different tissues with an overall ratio of female to male activity of 1.6, and ratios for peripheral blood white cells being particularly low.21 In the current examination, we obtained weak evidence for its escape from inactivation with one probe set indicating a female to male ratio of 1.33 with a P-value of 0.075.

X-linked genes with existing evidence for escape from inactivation and that have no Y homologues

X-linked loci escaping inactivation for which there are no apparent Y-linked homologues are represented in Table 2 by the eight loci detailed below. The eukaryotic translation initiation factor 2, gamma, EIF2S3, (Xp22.2–p22.1) unlike the mouse locus, apparently does not have a Y homologue; nevertheless, it appears to escape inactivation.5, 22, 23 Sedlin, SEDL, (Xp22.2–p22.1), which is mutated in spondyloepiphyseal dysplasia tarda, also escapes inactivation along with its closest flanking partners.24 Carrel et al5 reported its expression in all nine inactive-X-only hybrids tested. A similar result was reported for the gene encoding the cofactor required for SP1 transcriptional activation, subunit 2, CRSP2, (Xp11.4–p11.2), also referred to as TRAP170.25, 26 Ubiquilin 2, UBQLN2, was localised to Xp11.23–p11.1 by Kaye et al27 and reported by Carrel et al5 to escape from inactivation in two of nine hybrid cell lines. Another gene, structural maintenance of chromosomes 1-like, SMC1L1, is localised to a similar region (Xp11.22–p11.21) and reported to escape inactivation.5 On the long arm, the locus for armadillo repeat protein, ALEX2, (Xq21.33–q22.2) was found to escape inactivation in two of nine somatic cell hybrids tested.5 Also located on the long arm (Xq22) is the locus for collagen, type IV, alpha-6, COL4A6, which is mutated in some patients with Alport's syndrome.28, 29 Higher female expression for COL4A6 was detected in two probe sets in the current analysis and it was found to escape inactivation in three of nine somatic hybrids examined by Carrel et al.5 The same approach suggests that the Homo sapiens hypothetical protein, FLJ21174, localised to Xq22.1, escapes inactivation in some (two of nine) hybrid backgrounds.

Loci with existing weak, conflicting or circumstantial evidence for escapee status

Montini et al30 identified a human cDNA, subsequently attributed to the locus, Sex comb on midleg, Drosophila, homologue-like 2, SCML2. Although Carrel et al5 failed to detect significant expression in the hybrid system, it is a possible escapee candidate by virtue of its chromosomal localisation at Xp22 in a region containing the MIC2 gene family and the sulphatase gene cluster. A similar argument applies to the gene encoding transducin-beta-like 1, X-linked, TBL1X, located in Xp22.3.31 The Chloride channel 4, CLCN4, locus is also located at Xp22.3.32 Interestingly, in Mus spretus the locus is X-linked, but autosomal (chromosome 7) in C57BL/6J, suggesting a recent evolutionary X rearrangement close to the pseudoautosomal boundary consistent with the gene's possible escapee status. The Rho guanine nucleotide exchange factor 6, ARHGEF6, has been identified at the X breakpoint in a mother and son having an apparently balanced X:21 translocation (tXq26:21p11). Notably, wild-type mRNA for the locus was detected in the mother, consistent with her normal phenotype,33, 34 which is contrary to the normal expectation that genes on the intact X in translocation carrier females are inactive. Somatic cell hybrid studies, nevertheless, suggest that ARHGEF6 is normally inactivated. In our analysis, the observed female/male ratio (1.11) suggested only very partial escape, but with a highly significant P-value (0.004). SRY-related HMG-box gene 3, SOX3, (Xq26–q27) is homologous to the SRY locus on the Y chromosome and its conservation in marsupials suggests that SOX3 represents the ancestral gene from which SRY was derived; it may therefore have an unusual inactivation profile.35, 36

Loci for which previous evidence suggested normal inactivation

Earlier reports concerning the following loci detected in our current screen failed to provide evidence of significant escape from inactivation: chronic granulomatous disease, CGD, (Xp21.1);37, 38 alpha thalassemia/mental retardation syndrome, X-linked, ATRX, (Xq13), which results from mutations in the helicase 2/RAD54 gene;5, 39, 40 and septin 6, SEPT6.5 Ste20-like kinase, MST4, (Xq26) also appeared to be regularly inactivated in studies on X-only-somatic cell hybrids.41

Established loci lacking evidence either for, or against, inactivation

These include U2 small nuclear ribonucleoprotein auxiliary factor, small subunit 2, U2AF1RS2, (Xp22.1), a human homologue of the imprinted mouse gene U2af1rs1;42 synaptophysin, SYP, (Xp11.23–p11.22); heterogeneous ribonuclear protein H2, HNRPH2, (Xq22); cylicin 1, CYCL1, (Xq21.1); the melanoma antigen – family C, MAGEC1, (Xq26); and the serine/threonine kinase 23, STK23, (Xq28).

Finally, there are several poorly characterised X-linked loci, which have been identified in this screen as potential escapees. These are: the loci encoding the Homo sapiens hypothetical proteins, PRO0386, (Xp22.31); STRAIT 11499, (Xp11.4); G antigen 5, GAGE5/7, (Xp11.4–p11.2); FLJ20105, (Xq12); the putative purinergic receptor, P2Y10, (Xq21.1); FLJ2969, (Xq22.1–q22.3); the brain my048 protein, locus also at Xq22.1–q22.3; the Homo sapiens hypothetical proteins, LOC5726, at (Xq25) and FLJ12649, (Xq26.3) and the Homo sapiens protein, KIAA1232, for which no subregional localisation has been reported.

(Unigene accession numbers for these and other loci without literature references are provided in Table 2).

Discussion

Two of the most common previously employed strategies to assess inactivation status are to analyse methylation profiles at, or adjacent to, the locus of interest and to examine the expression of transcripts from the inactive X chromosome isolated in somatic cell hybrids employing RT-PCR. The former is based on the assumption that methylation is associated with inactive status, but does not assay expression directly and cannot provide information on the extent to which loci might escape the inactivation process. The latter is highly adaptable; however, it can only relate to the inactivation status in a cultured somatic cell hybrid background.2, 5, 6 Various attempts to overcome both of these limitations have been made. The recent availability of reliable microarray technology provides a powerful approach to examine transcription profiles and advances in both the preparation of arrays and in the software available for the analysis of signal intensity, have brought the realistic possibility of detecting fractional rather than fold differences in expression. This in turn provides an opportunity to examine subtle sex differences in gene expression and, in this investigation, we have analysed the relative expression of X-linked loci to evaluate the potential of the approach to investigate inactivation status.

Lymphocytes were selected as the tissue of choice for ease of access. In lymphoblastoid cell lines derived from them, about 50% of the 5184 genes represented on the microarrays used in one study were expressed at sufficient levels for analysis.43 Although such expression studies indicated some natural variation in human gene expression, with some familial aggregation – suggesting the existence of high and low expressing alleles – our investigations reported only results for which there were consistent patterns of expression differentials between the five male and female replicates. Although possible cross-hybridisation to Y-related sequences may be difficult formally to eliminate in some cases (and would in any case tend to reduce female/male ratios), the detection of loci for which there was previous evidence of their escapee status and for which no Y homologues have been reported supports the general potential of this approach. Indeed, given the reproducibility of signals observed and that in several examples the same loci have been identified by more than one probe set, we have not attempted replication by other techniques, which are not directly comparable and are unlikely to detect subtle differences in expression levels; rather, we have provided an overall evaluation of the method's potential by comparisons with the extensive data already available in the literature.

Sudbrak et al12 employed cDNAs bound to microarrays to interrogate transcripts from lymphoblastoid lines. Their overall pattern of observation was similar to that reported here, with about 4% escapees – equally distributed between short and long arms, and with many of the same loci being identified. Clearly, the microarray approach will enable an efficient strategy to obtain a better understanding of the tissue and individual differences in inactivation profiles as more studies are completed.

The results obtained in this investigation have been extremely encouraging in that, of the 20 well-studied loci which we identified as having significantly higher female expression levels, 11 had previous supporting evidence for their escape from inactivation and five had provisional, or circumstantial, evidence. Only four characterised loci had previous contrary evidence to the conclusions reached in our study. The remaining loci, the majority of which encoded hypothetical proteins, had no associated information concerning their inactivation status.

We also examined the data for some genes that were not detected in the screen for X loci escaping inactivation, but for which previous data supported their inactivation status as either escapees, or as normally inactivated. Unfortunately, two well-established escapee loci, MIC2 (cell surface antigen MIC2) and RPS4X (ribosomal protein S4) were not represented on the microarray. PCTAIRE-1, cdc2-related protein kinase (PCTK1) and ubiquitin-activating enzyme (UBE1) were represented and have previously been reported as escaping inactivation,5, 6 although for the latter there have been conflicting reports concerning its inactivation status.4, 44 Our data indicate that UBE1 is expressed at significant levels in lymphocytes and that there appears to be an excess of female transcripts (see Table 3), which may reflect partial escape from inactivation. PCTK1 also shows a slight excess of female transcripts; however, as for UBE1, the standard deviation intervals overlap and the interpretation concerning their inactivation status remains tentative. In contrast, pyruvate dehydrogenase subunit A1, PDHA1, and the Duchenne muscular dystrophy locus, DMD, are known to be regularly inactivated and the ratios we observed for both were consistent with a normal inactivation pattern.
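The overlap criterion applied to UBE1 and PCTK1 can be made explicit with a short Python sketch. It assumes that a "standard deviation interval" means mean ± one sample standard deviation (n − 1 denominator); the function names and signal values are hypothetical.

```python
from statistics import mean, stdev


def sd_interval(values):
    """Return (mean - SD, mean + SD) using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return m - s, m + s


def intervals_overlap(female, male):
    """True if the mean ± SD intervals of the two sexes share any value."""
    f_lo, f_hi = sd_interval(female)
    m_lo, m_hi = sd_interval(male)
    return f_lo <= m_hi and m_lo <= f_hi


# Hypothetical signals: a slight female excess whose intervals still overlap.
female = [110.0, 120.0, 130.0, 140.0, 150.0]
male = [100.0, 110.0, 120.0, 130.0, 140.0]
print(mean(female) / mean(male), intervals_overlap(female, male))
```

A ratio above 1 accompanied by overlapping intervals, as in this illustrative case, is the situation in which an interpretation of partial escape remains tentative.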

Table 3 Expression level data for recognised X-linked genes believed to escape inactivation, but which were not detected by microarray analysis

Several loci detected in the screen had female:male ratios of >2; however, these (ZFX, SYP, SOX3 and CLCN4) all had low absolute detection levels (with values c. <10), and the data for them should be regarded as preliminary, even though the reproducibility of their expression levels in the two sexes was robust. Clearly, the number of loci for which we have provisional information for their escape from inactivation based on the microarray approach is considerably below the 10–20% figure predicted from some other studies.5, 6 Several factors may contribute to this discrepancy. The reliability of any method depends both on absolute expression levels and on the extent to which escape from inactivation occurs. Many loci represented on the microarrays are expressed only at low levels in lymphocytes and their inactivation profiles may not be representative of other tissues. In addition, there may be differences in patterns of inactivation between lymphocytes and cell lines and somatic cell hybrids; the latter, in particular, may not reflect the situation in vivo.

Naturally, there are also sex differences in expression for non-X-linked genes. Four probe sets gave female to male signal ratios of <0.1: 206700_s_at, 205000_at, 201909_at, and 204409_s_at. These have uncorrected t-test P-values ranging from 0.0002 to 0.02 and they are all unique Y chromosome transcripts. At the other extreme, one probe set presents with a ratio of >10 and corresponds to the HLA class II DRB4 gene with a common null allele, for which all but two of the female subjects are homozygous. These data indicate the general robustness of the approach. Overall, employing the rudimentary criterion of an uncorrected t-test P-value threshold of 0.1, X chromosome transcripts show a 3.2-fold excess for female to male high expression, compared with a 2.4-fold excess for the genome as a whole.
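One possible reading of the fold-excess figure is the number of probe sets with higher female expression divided by the number with higher male expression, counting only those below the P-value threshold. The Python sketch below illustrates that reading; the interpretation, the function name and the input values are assumptions, not the calculation actually performed.

```python
def directional_excess(results, alpha=0.1):
    """Ratio of female-higher to male-higher probe sets with P below `alpha`.

    `results` is an iterable of (female_to_male_ratio, p_value) pairs.
    """
    female_higher = sum(1 for ratio, p in results if p < alpha and ratio > 1)
    male_higher = sum(1 for ratio, p in results if p < alpha and ratio < 1)
    if male_higher == 0:
        raise ValueError("no male-higher probe sets below the threshold")
    return female_higher / male_higher


# Hypothetical (ratio, P-value) pairs for six probe sets.
probe_sets = [(1.3, 0.01), (1.2, 0.05), (1.5, 0.09),
              (0.8, 0.02), (1.1, 0.50), (0.9, 0.30)]
excess = directional_excess(probe_sets)
print(excess)
```

Applying the same count separately to the X chromosome probe sets and to all probe sets on the array would yield the two figures being compared.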

Overall, therefore, we believe that the approach described here will have an important role in the investigation of inactivation patterns in a variety of tissues including the brain and may lead to insights into possible sex differences in behaviour that may result from such dosage differentials.