Introduction

X-chromosome inactivation (XCI) is a process whereby one of two X-chromosomes is inactivated early in mammalian female embryogenesis, to ensure that XX and XY individuals have a balanced expression of genes located on the X-chromosome.1, 2, 3 Once the maternal or paternal X-chromosome has been transcriptionally silenced in a given somatic cell, the same X is inactivated in all progeny cells creating mosaic adult individuals. Despite the fact that XCI is essential and fundamental to normal female development, the mechanism is still largely unknown. It is well established that XIST, a non-coding RNA expressed exclusively from the inactive X, is responsible for initiating and maintaining silencing of that chromosome through CpG hypermethylation and chromatin remodelling.4 However, the processes that differentiate an XX from XY cell and select one X for inactivation have yet to be elucidated.

The choice of which X is silenced has historically been considered a random process, with the expectation that most females would have a ratio of active paternal X to active maternal X of 1:1 (balanced X inactivation pattern, XIP=0.5) and that significant variance from this pattern (skewed XIP) would be rare. There is growing evidence, however, that the choice step in XCI, and therefore the XIP, is genetically influenced. In mice, a polymorphic ‘X-controlling element’ (XCE) has been shown to influence the probability of XCI in cis, in a heritable fashion.5, 6 Although the XCE has yet to be isolated and characterised, it has been localised to the X-chromosome and four alleles with differing propensities for XCI have been identified. The XIPs in heterozygous mice are predictably skewed away from the balanced pattern that is observed in homozygous mice. Evidence in support of a human XCE is seen in reports of skewed XIPs clustering in families.7, 8 Naumova et al.9 used classic linkage analysis to show that markers DXS425 (Xq25) and DXS294 (Xq26) are likely linked to the human XCE. Cau et al. used a similar approach to identify a candidate region of 4.2Mb cM ranging from DXS8067 (Xq24) to DXS8057 (Xq25), which partially overlaps the region identified by Naumova et al.7

In a previous report,10 we described a family with three males and three females with clinical symptoms of haemophilia A, a bleeding disorder caused by low factor VIII activity. We showed that the three affected females had XCI ratios skewed toward activation of the mutated X-chromosome and that the degree of skewing correlated with FVIII activity and the severity of disease in all carriers in the family. Further, we found that more females showed a higher degree of skewing than would be predicted by a model of random XCI choice. Our data are consistent with a model of genetically influenced XCI choice and support the possibility of a human XCE, similar to that described in mice.

Here we report further analyses of this family that have identified a critical region and SA2 as a candidate gene for the human XCE.

Materials and methods

Ethics approval

This project was approved by the IWK Health Centre and the Capital District Health Authority Research Ethics Boards, Halifax, Nova Scotia, Canada (Project no. 2949). Informed consent/authorisation was obtained for all participants in this study.

Participants

The family studied here (Figure 1) is as we have previously described,10 with the addition of seven new family members: II.4, II.10, III.4, III.6, III.7, IV.1 and IV.2.

Figure 1

Atlantic Canadian family with unexpected haemophilia A expression in females. A four-generation family with haemophilia A (HA) expression in multiple females was ascertained when the proband (arrow) was first diagnosed with severe haemophilia A.

Factor VIII analyses

F8 intron 22 inversion analysis and functional factor VIII activity measurements were carried out as previously described.10 In accordance with the IWK Health Sciences Centre guidelines on genetic testing of minors, the F8 genotypes of young females, who were not obligate carriers, were not assessed. Factor VIII activity measurements were taken.

DNA extraction

DNA from peripheral blood samples was prepared as previously described.10 Intraoral buccal mucosa cells were collected with cytology brushes (Orifice Medical AB, Ystad, Sweden). DNA was extracted using a QIAamp DNA Mini Kit (Qiagen, Germantown, MD, USA) according to the manufacturer's instructions.

XCI patterns in peripheral blood

XCI patterns (percentage of cells with Xap/Xim) in blood were determined by methylation sensitive endonuclease digestion (HhaI) followed by radioactive PCR amplification of the HUMARA microsatellite as described elsewhere.10 At this locus, the inactive allele is hypermethylated,11 therefore the peak with the greatest area is the least active. In some cases, the technique was adapted for analysis by fluorescent column electrophoresis (Supplementary Materials and methods). For participants who were informative at the HUMARA locus, but whose alleles differed by only three base pairs in size, only the fluorescent column electrophoresis strategy could be used. In these instances, stutter peaks from the larger allele co-migrated with the smaller allele, falsely increasing the quantification of the latter. An analysis of allele stutter patterns (area of allele peak: area of stutter peak) in individuals whose alleles were well resolved revealed that stutter patterns were consistent in an allele-specific manner. Quantification of the smaller allele could therefore be calculated by subtracting the presumed area of the stutter peak from the total area quantified.
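
The stutter correction described above can be sketched as follows. This is an illustration only, not the authors' procedure in code: the function name, argument names and example peak areas are hypothetical, and the sketch assumes that the allele-specific ratio (area of allele peak : area of stutter peak) has already been measured in individuals whose alleles are well resolved.

```python
def correct_smaller_allele(total_small_area, larger_allele_area, allele_to_stutter_ratio):
    """Estimate the true peak area of the smaller allele.

    total_small_area: area quantified at the smaller allele's position,
        which includes the co-migrating stutter peak of the larger allele.
    larger_allele_area: area of the larger allele's main peak.
    allele_to_stutter_ratio: allele peak area divided by stutter peak area,
        measured for that allele in well-resolved individuals (assumed input).
    """
    if allele_to_stutter_ratio <= 0:
        raise ValueError("ratio must be positive")
    # Presumed stutter area contributed by the larger allele.
    presumed_stutter = larger_allele_area / allele_to_stutter_ratio
    # Subtract it from the total; clamp at zero to avoid negative areas.
    return max(total_small_area - presumed_stutter, 0.0)


# Hypothetical peak areas (arbitrary units), not data from this study.
corrected = correct_smaller_allele(1200.0, 1000.0, 5.0)
```

With these invented inputs, the presumed stutter area is 200 units, so the smaller allele is quantified at 1000 units rather than 1200.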

For individuals homozygous at the HUMARA locus, the FMR-1 microsatellite was analysed12 by the fluorescent CE strategy described (Supplementary Materials and methods) with the following modifications. After incubation with or without restriction endonuclease, the FMR-1 microsatellite was amplified by PCR from aliquots of each solution (1.5 μl) using an FXS kit (Abbott Laboratories, Saint-Laurent Quebec, ON, Canada) according to the manufacturer's directions. For the visualisation of the PCR amplicons, 3.0 μl MM 1000 ROX Size Standard (BioVentures, Inc., Murfreesboro, TN, USA) was used. In contrast to the HUMARA locus, methylation of the FMR-1 locus is associated with the active X-chromosome, therefore the peak with the greatest area corresponds to the most active allele. An established threshold9 defines dramatically skewed XIPs as those >0.80 or <0.20. We considered an XIP somewhat skewed if it fell between 0.20–0.40 or 0.60–0.80, and balanced if it fell between 0.41–0.59. Values were calculated as an average of at least two independent determinations.
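
The classification thresholds above can be expressed as a short sketch. The function name and example values are hypothetical. The text does not assign values falling between 0.40 and 0.41 or between 0.59 and 0.60; the sketch treats them as somewhat skewed, which is an assumption.

```python
from statistics import mean


def classify_xip(determinations):
    """Average independent XIP determinations and assign a category."""
    if len(determinations) < 2:
        raise ValueError("at least two independent determinations are required")
    xip = mean(determinations)
    if xip > 0.80 or xip < 0.20:
        label = "dramatically skewed"
    elif 0.41 <= xip <= 0.59:
        label = "balanced"
    else:
        # Covers 0.20-0.40 and 0.60-0.80; by assumption also the small
        # unassigned gaps between 0.40-0.41 and 0.59-0.60.
        label = "somewhat skewed"
    return xip, label


# Hypothetical replicate values, not measurements from this study.
xip, label = classify_xip([0.91, 0.93])
```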

XIPs in buccal mucosa

XIPs in buccal mucosa were determined using a scaled-down version of the radioactive HUMARA protocol we have described previously,10 where 123 ng DNA were digested in a total reaction volume of 3 μl. XIPs in buccal mucosa were an average of at least two determinations. When DNA was not limiting, three or more determinations were made.

Statistical calculations

The correlation between XIPs of blood and buccal mucosa was determined by linear regression analysis using Microsoft Excel (2003).
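
For readers without access to the spreadsheet, the same kind of ordinary least-squares fit and coefficient of determination can be sketched in a few lines. The function name is hypothetical and the paired values below are invented placeholders, not measurements from this family.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit of y on x; returns slope, intercept and R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder buccal and blood XIPs for illustration only.
buccal = [0.50, 0.60, 0.70, 0.80]
blood = [0.52, 0.63, 0.74, 0.88]
slope, intercept, r_squared = linear_regression(buccal, blood)
```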

Candidate gene identification

Microsatellite analyses. Peripheral blood DNA was analysed at 48 X-linked microsatellite markers by 5-cM X-chromosome scan (Australia Genome Research Facility, Parkville, VIC, Australia, http://www.agrf.org.au) (Supplementary Table 1). Twenty additional X-linked markers were also analysed (Supplementary Materials and methods). The cytogenetic and physical locations of all markers were taken from the Ensembl Genome database (version 35) (http://www.ensembl.org), and the UCSC Golden Path database (http://genome.ucsc.edu/), respectively, and the primer sequences for in-house microsatellite analyses were taken from the Genome Database (http://www.gdb.org) (Supplementary Table 1).

Linkage analyses. To identify candidate XCE genes, a tailored linkage approach was taken. This was necessary because of both the continuous nature of the ‘skewed XIP phenotype,’ and the unknown genotype-phenotype correlation. Linkage analyses were first performed using data only from sister-pairs with clearly discordant XIP phenotypes. These were presumed to have differing XCE genotypes. For one sibship (generation II, Figure 1), it was unclear which sisters were likely to have the same genotypes. Assuming single-locus, X-linked inheritance, this sibship can have a maximum of two putative XCE genotypes. Sisters can therefore be sorted into a maximum of two genotype groups. For four sisters, there are a total of eight possible genotype ‘sorting schemes’ (Table 1). All sorting schemes were assessed for linkage to the X-chromosome. To facilitate these linkage analyses, we created a computer program, which sorts marker data by chromosomal location, phases haplotypes when possible, and identifies candidate regions fulfilling the genotype sorting scheme under assessment. The program was written in C++ using GNU Emacs (v. 21.3.1) and compiled with GCC (v. 4.3.2) or GCC MinGW (v. 3.4.5).
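
The count of eight sorting schemes for four sisters can be reproduced with a short enumeration. This sketch is not the authors' C++ program: the function name is hypothetical, and the order in which schemes are generated is not meant to correspond to the numbering in Table 1. Fixing the first sister in one group avoids counting mirror-image groupings twice, which gives 2^3 = 8 schemes, including the scheme in which all sisters share a genotype.

```python
from itertools import product


def sorting_schemes(sisters):
    """Enumerate ways to split a sibship into at most two genotype groups."""
    first, rest = sisters[0], sisters[1:]
    schemes = []
    # Each remaining sister either shares the first sister's group or not.
    for assignment in product((True, False), repeat=len(rest)):
        group_a = [first] + [s for s, same in zip(rest, assignment) if same]
        group_b = [s for s, same in zip(rest, assignment) if not same]
        schemes.append((tuple(group_a), tuple(group_b)))
    return schemes


schemes = sorting_schemes(["II.2", "II.3", "II.5", "II.6"])
for number, (group_a, group_b) in enumerate(schemes, start=1):
    print(number, group_a, group_b)
```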

Table 1 Genotype sorting schemes for linkage analyses

Candidate gene identification. Candidate regions identified through our linkage analyses were compared with XCE candidate regions described in the literature.7, 9 Overlapping regions were recognised and candidate genes within the regions of overlap were identified using the Ensembl Genome Database. The Ensembl GNF development stage filter identified candidate genes expressed in the embryo.
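
Recognising the overlap between two candidate regions amounts to intersecting two intervals on the same map. A minimal sketch, assuming each region is given as a (start, end) pair in a shared coordinate system; the function name and the positions used are hypothetical and are not the coordinates of any region in this study.

```python
def region_overlap(region_a, region_b):
    """Return the shared interval of two (start, end) regions, or None."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return (start, end) if start <= end else None


# Invented map positions for illustration only.
shared = region_overlap((100.0, 116.7), (116.0, 120.0))
```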

SA2 sequencing

Exonic region identification. All SA2 (ENSG00000101972) transcript variants were identified in the Ensembl Genome Database. Regions of sequence coding for exons in any processed transcript variant (exonic regions) were identified and enumerated (Supplementary Table 2). To facilitate this process, we created a computer program, written in C++ and Perl (v. 5.10.0) using GNU Emacs (v. 21.3.1) and compiled with GCC (v. 4.3.2) or GCC MinGW (v. 3.4.5). Primers were designed to amplify and sequence each region using Primer3.13 Amplifications were carried out in-house (Supplementary Materials and methods). Most amplicons were spin-column purified using the illustra GFX PCR DNA and Gel Band Purification kit (GE Healthcare, Baie d'Urfe, Quebec, Canada) according to the manufacturer's directions, and sequenced at the Core Molecular Biology Facility, York University, Ontario, Canada (http://www.yorku.ca/biocore/) (indicated as ‘York’ sequencing protocol in Supplementary Table 2). Additional amplicons were sequenced in-house (Supplementary Materials and methods). All sequences were analysed with Mutation Surveyor v. 3.2.1 (SoftGenetics, Philadelphia, PA, USA).

Results

Pedigree expansion and mutation analysis

A total of 15 females and seven males across four generations were included in this study (Figure 1). Factor VIII activity and F8 genotype analyses were performed where appropriate. Heterozygosity for the intron 22 inversion (type II) was confirmed in seven females (Table 2). Two additional females (IV.1 and IV.2) are considered carriers throughout the study due to their low factor VIII activities. Mutation analyses were omitted in both cases in accordance with the IWK Health Centre's policy on the genetic testing of minors.

Table 2 Summary of molecular results for carrier and non-carrier females

Correlation of XIP in blood vs buccal indicates that selective pressures are unlikely to account for skewed XIP

XIPs can be skewed due to a bias at the onset of XCI (primary skewing) or can become skewed secondary to selective pressures (secondary skewing). Previously, we showed that the inheritance of the F8 mutation does not affect XIPs in this family.10 To examine whether some unknown factor could be imparting a selective pressure leading to skewed XIPs, we assessed whether XIPs were tissue-specific, a hallmark of secondary skewing. XIPs in buccal mucosa were determined for 11 females, and ranged from balanced (III.7: 0.53 SD 0.04) to dramatically skewed (III.9: 0.84 SD 0.02) (Table 2). It was not possible to determine the XIPs in buccal mucosa from three females (I.4, IV.1, IV.2) due to poor yield and/or quality of extracted DNA, or from two other females (II.10 and III.3) due to homozygosity at the HUMARA locus. An overall comparison of average blood and buccal XIPs reveals a good correlation (R²=0.8; Figure 2). It has been reported that XIPs in blood become artificially skewed due to the progressive demethylation of the HUMARA locus in that tissue.14 We do see that in 7/10 cases the average XIPs in blood were more extreme (further from XIP=0.5) than buccal XIPs; however, this is not a significantly high proportion of cases (P>0.2). Further, in 8/10 cases, any discrepancy between buccal and blood XIPs did not exceed the SD calculated from technical replicates. In III.9 and II.3, unusually small SDs (0.02 and 0, respectively) may account for the slight lack of fit between blood and buccal XIPs.
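
The text does not name the test behind the reported P value for the 7/10 proportion. One way to examine such a proportion, assuming an exact two-sided binomial test against an equal chance of blood or buccal XIPs being the more extreme, is sketched below; the function name is hypothetical and the choice of test is an assumption, not the authors' stated method.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial P value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null proportion p.
    """
    probabilities = [
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probabilities[successes]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in probabilities if q <= observed * (1 + 1e-9))
    return min(1.0, total)


p_value = binomial_two_sided_p(7, 10)
```

Under this assumed test, 7 of 10 gives a P value of about 0.34, which is in line with the reported P>0.2.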

Figure 2

A good correlation between the XIPs in blood and buccal mucosa suggests that secondary skewing is not a major contributor to the XIPs observed. All females for whom we had both buccal and blood XIP data were included. Phase was randomly assigned to female I.2. The dotted grey line represents a perfect correlation between buccal and blood XIPs. The proband is marked by a grey arrow.

As the buccal mucosa and blood are derived from embryologically distinct germ layers, the correlation between XIPs in these tissues suggests that, for the most part, XIP skewing is occurring early in development in this family. This is consistent with a hypothesis of biased primary XCI, possibly at the level of XCI choice. Alternatively, there could be a selective pressure that is affecting all tissues. This is unlikely, however, as mutations dramatically affecting cell survival or proliferation probably would have been embryonically lethal if present in hemizygous males such as II.8 or I.3. Or, if the mutation were passed through maternal lineages only, we would expect a preponderance of paternally biased XIPs in females inheriting the mutation, where the deleterious maternal allele is preferentially inactive. A statistical analysis of the prevalence of paternally biased XIPs vs maternally biased XIPs among the female relatives of III.9 on the paternal side of her family finds that there is no significant preponderance of paternally biased XIPs (P>0.2, data not shown).

X-chromosome-wide linkage analysis identifies an XCE candidate gene: SA2

Having ruled out common causes of genetically influenced skewed XIPs in a previous report,10 we considered here whether skewed XCI in this family might be caused by a mechanism analogous to that described in the mouse XCE hypothesis.5, 15 Neither the nature nor the precise location of the putative human XCE is known, though Xq25 appears to be linked to familial skewed XCI.7, 9 To determine whether the skewed XCI in the family studied here might also be linked to the X-chromosome, an X-chromosome-wide linkage analysis was undertaken.

As a skewed XIP phenotype is not precisely defined, and the XIP is a continuous trait, it is not possible to categorically assign phenotypes to each female. Therefore, we developed the following linkage approach. First, only sibships with clearly divergent phenotypes are considered (III.6-III.7, III.8-III.9 and IV.1-IV.2). Accordingly, in each case, sisters are assumed to have different genotypes. Our first linkage analysis considered only III.8 and III.9. We considered all markers to be a ‘match’ where III.8 and III.9 had inherited different alleles from their mother (Supplementary Table 3, step 1). All markers for which their mother was uninformative (homozygous or missing data) were also included. Next, we reduced the number of matching markers by considering IV.1 and IV.2. All markers at which their mother, III.6, was heterozygous but at which IV.1 and IV.2 had inherited the same maternal allele were eliminated. Similarly, we considered III.6 and III.7. Markers were eliminated if their mother, II.5, was heterozygous but sisters III.6 and III.7 had identical genotypes.
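
The marker filtering applied to the discordant sister pairs can be sketched as follows. This is an illustration of the stated rules, not the authors' program: the function names, marker names and allele labels are hypothetical, and treating a missing daughter allele as non-excluding is an assumption that the text does not address.

```python
def marker_is_retained(mother_alleles, maternal_allele_sister1, maternal_allele_sister2):
    """Apply the match rule for one discordant sister pair at one marker."""
    # Uninformative mother (missing data or homozygous): marker is kept.
    if mother_alleles is None or None in mother_alleles:
        return True
    if mother_alleles[0] == mother_alleles[1]:
        return True
    # Assumption: missing daughter data cannot exclude a marker.
    if maternal_allele_sister1 is None or maternal_allele_sister2 is None:
        return True
    # Informative mother: sisters must have inherited different alleles.
    return maternal_allele_sister1 != maternal_allele_sister2


def retained_markers(genotypes):
    """Keep markers that pass the rule for every discordant pair.

    genotypes maps a marker name to a list of tuples, one per sister pair:
    (mother_alleles, maternal_allele_sister1, maternal_allele_sister2).
    """
    return [
        marker
        for marker, pairs in genotypes.items()
        if all(marker_is_retained(*pair) for pair in pairs)
    ]


# Invented genotypes for illustration only.
example = {
    "M1": [((1, 2), 1, 2)],        # sisters differ: retained
    "M2": [((1, 2), 1, 1)],        # sisters identical: eliminated
    "M3": [((3, 3), 3, 3)],        # homozygous mother: retained
    "M4": [(None, None, None)],    # missing data: retained
}
kept = retained_markers(example)
```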

We then considered II.2, II.3, II.5, and II.6. Their intermediate XIPs precluded any obvious phenotype groupings. Assuming X-linked inheritance, a maximum of two XCE genotypes could be present in the sibship if their mother is heterozygous. This defines a total of eight genotype sorting schemes (Table 1). We tested each for linkage to the X-chromosome. Four sorting schemes (1, 3, 5, 6) were consistent with linkage to the X-chromosome for at least two consecutive markers (Supplementary Table 3). Three sorting schemes (3, 5, 6) reveal regions consistent with linkage for at least three consecutive markers (Table 3). The largest region identified, defined by sorting scheme 3, consists of 16.7 cM of Xq25-Xq27 (match region #1, Supplementary Figure 1). This overlaps by 0.7 cM with a region previously described in the literature as being linked to familial skewed XCI in humans7, 9 (match region 1.1, Supplementary Figure 1). Region 1.1, defined by markers DXS8098 and DXS8057, contains seven novel non-coding RNA genes and four protein coding genes: XIAP, SA2, SH2D1A and ODZ1. Only SA2 (STAG2/Scc3 homologue), a component of the cohesin complex, is known to be expressed in the embryo.
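
Identifying regions supported by a minimum number of consecutive matching markers can be sketched as a single pass over the markers in chromosomal order. The function name and marker names are hypothetical, and this is not the authors' program.

```python
def consecutive_match_regions(ordered_markers, matches, min_length=3):
    """Return runs of at least min_length consecutive matching markers.

    ordered_markers: marker names sorted by chromosomal location.
    matches: set of marker names consistent with the scheme under test.
    """
    regions, run = [], []
    for marker in ordered_markers:
        if marker in matches:
            run.append(marker)
        else:
            if len(run) >= min_length:
                regions.append(run)
            run = []
    # Close a run that extends to the last marker.
    if len(run) >= min_length:
        regions.append(run)
    return regions


# Invented marker order and match set for illustration only.
order = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
matching = {"M1", "M2", "M4", "M5", "M6", "M8"}
regions = consecutive_match_regions(order, matching, min_length=3)
```

Lowering min_length to 2 in this invented example would also report the run M1 to M2, mirroring the difference between the two-marker and three-marker criteria used above.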

Table 3 Tailored linkage approach reveals four linkage regions

The linkage approach used to identify these candidates considered each sibship in isolation. Haplotype analysis of the markers flanking region 1.1, which considers the entire pedigree, is also consistent with linkage to region 1.1. This analysis assumes a co-dominant model analogous to the mouse XCE model, where homozygous females, regardless of genotype, tend toward balanced XIPs, and heterozygosity results in skewed XCI. Figure 3 shows how the inheritance of three theoretical SA2 alleles, designated A, B and C, could explain the XIP phenotypes in the family. In this model, A and B are strongly associated with the active X, and C is weakly associated with the active X, as described by the relationship: A>B>>C. This model describes the degree of skewing for all females in the family, and the directionality of all but one, II.5. This discrepancy could indicate that we have not identified the correct gene. Alternatively, SA2 may indeed affect the degree of skewing but another element may affect the direction of skewing. It is also possible that the effect of SA2 on XCI choice may be influenced by chance, or that some unexplained secondary skewing is affecting the XIP of II.5 in her blood, buccal mucosa and, presumably, her liver as well.

Figure 3

SA2 and Y-RNA genes segregate with theoretical XCE alleles. Theoretical SA2 alleles are shown with the allelic designation of several flanking markers. X-chromosome inactivation (XCI) ratios are indicated below the pedigree number of each individual.

Sequencing of SA2 reveals unusual intron-exon boundaries

Working under the assumption that III.8 and III.9 have discordant XCE genotypes, we sought any difference in their SA2 gene sequences. SA2 transcript analysis identified 45 unique exonic regions among 25 splice variants (Supplementary Table 2). No mutations were confirmed; however, seven poly-N tracts were present whose precise lengths could not be determined in III.8, III.9 or their parents (Supplementary Table 4). Six of these are intronic, lying adjacent to splice sites, and one lies within the 3′ UTR of nine SA2 transcript variants.

Discussion

Evidence for primary skewing

Adult secondary X-inactivation patterns may differ from the primary XIP due to stochastic and selective events. We previously ruled out chromosomal abnormalities and selection against the factor VIII mutation10 as possible causes of secondary skewing in this family and proposed that the skewing could principally be due to a primary bias in XCI choice. We have presented here further evidence that the skewing is of a primary nature. The good correlation between XIPs in blood and buccal mucosa argues against tissue-specific secondary skewing due to selection, and the proportion of viable male descendants of I.2 (male:female=5:11, P>0.1) makes selection against multiple or all tissues unlikely. Thus, primary skewing is most likely.

The cohesin component SA2 may be influencing XIPs in this family

Three mechanisms have been described in mammals, which affect primary XIPs by biasing XCI choice at the onset of XCI: (1) XIC mutations, (2) imprinting and (3) heterozygosity for a putative X-controlling element, XCE. Only heterozygosity for an XCE has not been ruled out by our previous investigations.10 There is evidence to support the existence of a human XCE,7, 9 although no gene has yet been identified. The XCE hypothesis suggests that the XCE phenotype depends on both the maternal and paternal alleles in a co-dominant fashion. This would explain how sib-pairs such as III.8 and III.9 could be discordant for the skewing phenotype (0.53 SD 0.01 and 0.92 SD 0.02, respectively), and how a child and her grandmother, but not her mother, could be dramatically skewed, as in the case of III.9, I.4, and II.9, respectively.

Using the XCE model, we designed a tailored linkage approach to identify loci influencing XIPs in this family. Our approach assumes that there is a single X-linked locus responsible for the XCE effect and that there is a good XCE genotype/XIP phenotype correlation. Autosomal factors are likely involved in XCI;16, 17, 18 however, as the mouse XCE is X-linked,19 and most of the key elements involved in XCI of humans and mice are X-linked, we reasonably began our search for the human XCE on the X-chromosome. It may be that several loci interact to determine XIPs. There are studies critical of the concept of a single locus having a large effect on XIPs;20 however, these are predicated on a purely recessive or dominant mode of inheritance, and their data can be consistent with the XCE hypothesis if re-evaluated considering the XIP as a co-dominant trait.21 The strength of the genotype/phenotype correlation is unknown. Though our evidence suggests a low likelihood of significant secondary skewing effects, our data cannot rule out selective influences altogether, nor can they illuminate the importance of stochastic events.

Given these assumptions, our linkage analysis reveals several match regions, one of which overlaps with XCE candidate regions described elsewhere.8, 9 Assuming a common aetiology, the XCE candidate gene region can be reduced to the interval ChrX: 122914524–123576797.

The putative XCE interval contains an embryologically expressed gene, SA2, whose known biological functions make it a strong candidate for XCE function. SA2 is a core component of the ring-like cohesin complex.22, 23, 24 Cohesin is best known for its role in sister chromatid cohesion before and during mitosis;23, 24 however, other functions have recently been attributed to cohesin,25 including nuclear re-organisation,26 S-phase check point activation, DNA repair27, 28 and gene regulation.29, 30, 31 SA2 itself has been shown to have transcriptional co-activator function.32 Cohesin is recruited to chromatin through its interaction with DNA-binding proteins, including the CCCTC-binding factor (CTCF).33 It co-occupies up to 90% of CTCF binding sites,33, 34 and some functions initially attributed to CTCF may depend on cohesin.34 The cohesin-CTCF complex, which assembles in a cell-cycle-dependent35 and developmentally-regulated36 manner, can recruit RNA polymerase II to directly promote transcription,37 and form DNA loop structures to mediate transcriptional insulation31, 34 or connect enhancers with core promoters.25 Complexed with various other transcription factors, cohesin mediates tissue-specific30, 33 and hormone-responsive transcription,33 as well as the coordinated expression of genes with interrelated functions38 and multiple genes in clusters.39 The gene expression control functions of cohesin are molecularly separate from its role in chromatin cohesion such that mutations affecting gene expression do not necessarily affect genome integrity.40

Several lines of evidence suggest a potential role for cohesin in XCI. CTCF binding sites located within the XIC41, 42, 43 likely recruit cohesin to the locus. CTCF binding is sensitive to methylation,44 and XCI choice may be affected by differential methylation of CTCF binding sites within the XIC.45 Mutations of the CTCF binding site within the XIST promoter result in skewed XCI choice.46, 47 XCI is initiated by transient colocalisation of the X-chromosomes,48 and CTCF is required for such pairing.49 The enhancer-blocking function of CTCF has been implicated in XCI.41 As a boundary element, cohesin-CTCF could establish the unique chromatin structure of the XIST gene, which is in a euchromatic state on the principally heterochromatic Xi.50

Our linkage data provide direct evidence linking the cohesin component SA2 to XCI. The specific mechanism by which SA2 could affect XCI choice is unknown. One possibility is that cohesin is involved in binding the two X-chromosomes together during their colocalisation before XCI initiation. As there is only one SA2 molecule in each cohesin ring complex, the orientation of the ring and location of the SA2 molecule could favour transcriptional activation of one XIC versus the other. Biased XCI choice could thus result from genetic or epigenetic variation affecting the stability, level or orientation of cohesin binding. In the family studied here, no variations in the SA2 protein coding sequence were found; however, effects from different splice variants or protein expression levels remain reasonable possibilities. Variations in protein concentration of cohesin loading factors can affect the stability of cohesin binding and alter expression of cohesin-regulated genes.51 RNAi-mediated knockdown of Scc3/SA protein levels can also result in altered transcription of cohesin-regulated genes, perhaps by altering cohesin's insulator function.52 Similarly to the finding that protein levels of X-linked genes can be critical for the regulation of XCI,53 precise SA2 expression levels could be important for XCI choice.

The work detailed here supports the hypothesis that a human XCE influences XCI choice. It identifies a region of the X-chromosome that is linked to familial skewed XCI. In conjunction with information presented by others,7, 9 a small region of linkage is determined, which contains a gene, SA2, with an inheritance pattern and known functions that make it a strong candidate for a role in XCI. This is the first time that the cohesin component SA2 or cohesin has been implicated in XCI.