Introduction

FRAXF is a folate sensitive fragile site in Xq28. It was originally identified in 1993 in a family with developmental delay.1 Since then, it has been seen in only a small number of additional families tested cytogenetically because of mental retardation or developmental delay.1,2,3,4 In addition to FRAXF, there are two other folate sensitive fragile sites in Xq27-28, FRAXA and FRAXE. The molecular basis of all three Xq27-28 fragile sites is an expanded, unstable, and hypermethylated CGG repeat.4,5,6,7,8,9 While FRAXA and FRAXE are associated with mental retardation, FRAXF was considered to be benign. A study by Holden et al10 suggested that the high incidence of 12 and 14 repeat alleles, with only 1/501 alleles having 13 repeats, indicated a functional significance for the FRAXF repeat region. FRAXF has been investigated as a causative factor in diseases such as mental retardation with multiple congenital anomalies,11 autism and pervasive developmental disorder,12 and PPM-X (Psychosis, Pyramidal signs, Macroorchidism);13 however, association with any of these phenotypes is yet to be confirmed. Based on this and previous work on FRAXF,3,4 it has been assumed that the FRAXF fragile site does not disturb expression of any gene, at least not at any phenotypically discernible level.

We report on the identification of a novel gene associated with the FRAXF fragile site, FAM11A. The FAM11A gene is characterised in terms of its full length cDNA sequence, genomic structure, conservation, and transcriptional silencing in the FRAXF full mutation.

Materials and methods

Family information

We present data from a male who expresses the FRAXF fragile site in Xq28. This individual (II:3) was initially described by Romain et al,2 in a family without mental retardation. Parrish et al3 later showed that this individual (II:3, family 2) exhibited a methylated FRAXF allele of 1.6 kb. An EBV-transformed lymphoblast cell line from this individual was established and cultured.

Database searches and gene structure determination

Genbank sequence searches were carried out with the BLAST® search algorithm (Basic Local Alignment Search Tool)14 at the NCBI website (http://www.ncbi.nlm.nih.gov). An EST cluster, mapping to the FRAXF region, was identified from the Unigene database at NCBI. Comparison between the human Unigene cluster Hs.11522 and FRAXF genomic sequence (U71148) was used to determine the genomic structure of the FAM11A gene. DNA sequences, ESTs and Unigene clusters were downloaded and further manipulated using the Lasergene software package (DNA Star).

Expression analysis of FRAXF full mutation

Lymphoblast cells were harvested and RNA isolated from approximately 5×10⁶ cells using TRIZOL reagent (GIBCO BRL). Approximately 1 μg of purified total RNA was random primed for first-strand cDNA synthesis, using SuperScript II RNAse H reverse transcriptase (GIBCO BRL). Reverse transcription was carried out for 1 h at 42°C in a thermal cycler (Hybaid). Negative controls containing no reverse transcriptase were run at the same time. PCR was carried out using primers designed to amplify the region from exon 2 to 5 of FAM11A (E2F1: 5′-GGA TGG CAT CAT ACA GTG GAG-3′, E5R1: 5′-CAA GAC GGA CCA CAC AAT GTA G-3′). EsteraseD (ESD) specific primers were used as a control for RT–PCR efficiency.15 Standard conditions for PCR were 30 s at 94°C, 30 s at 60°C, and 30 s at 72°C for 35 cycles in a final 50 μl volume containing 100 mM Tris-HCl, pH 8.3; 50 mM KCl; 1.5 mM MgCl₂; 0.2 mM of each dNTP; 50 pmol of each primer; and 2 U of Taq DNA polymerase (Roche). PCR products were analysed on a 1.5% agarose gel with 0.5 mg ml⁻¹ ethidium bromide in 1×TBE.
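As a quick sanity check on the primer pair listed above, the following minimal Python sketch reports the length, GC content and a rough melting temperature for each oligonucleotide. The Wallace rule (2°C per A/T, 4°C per G/C) is used here only as a rule-of-thumb assumption; it is not the method used by the authors to choose the 60°C annealing step, and the helper names are hypothetical.

```python
# Primer sequences as given in the text (spaces are only for readability).
PRIMERS = {
    "E2F1": "GGA TGG CAT CAT ACA GTG GAG",
    "E5R1": "CAA GAC GGA CCA CAC AAT GTA G",
}


def clean(seq: str) -> str:
    """Remove spaces and upper-case a primer sequence."""
    return seq.replace(" ", "").upper()


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    s = clean(seq)
    return s.count("G") + s.count("C")


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is only a crude estimate for short oligonucleotides.
    """
    s = clean(seq)
    gc = gc_count(s)
    at = len(s) - gc
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    table = str.maketrans("ACGT", "TGCA")
    return clean(seq).translate(table)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        s = clean(seq)
        gc_percent = 100.0 * gc_count(s) / len(s)
        print(f"{name}: {len(s)} nt, GC {gc_percent:.1f}%, "
              f"Wallace Tm ~{wallace_tm(s)} C, revcomp {reverse_complement(s)}")
```

The reverse complement of E5R1 is the sequence expected on the sense strand of the cDNA, which is what one would search for when locating the reverse primer in exon 5.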

Reactivation of FAM11A transcriptional silencing with 5-azadeoxycytidine

For treatment with 5-azadeoxycytidine, patient and control lymphoblast cell lines were grown in a 50 ml flask with RPMI1640 culture medium with or without 5-azadeoxycytidine (1 μM) for 7 days as previously described.16 Primers and RT–PCR conditions were as described above.

Alternative splicing

Lymphoblast cDNA was used for PCR to determine the extent of alternative splicing. PCR product was extracted from the agarose gel using the Concert Nucleic Acid Purification Kit (Invitrogen) and sequenced using BigDye Terminator ready reaction kit, according to the manufacturer's specifications (Applied Biosystems).

RT–PCR and Northern blot hybridisation

In order to confirm that the FRAXF CGG repeat was part of the FAM11A transcript, the following flanking oligonucleotide primers were designed (Figure 1): FCGG 5′-CAG GGG GCG GTG GCT CAG GTT TC-3′, and RCGG 5′-CTC AGG TTC ATG GCG GAG AAC-3′ (ATG translation start codon is underlined). These two primers amplify a 210 bp product from exon 1 of FAM11A. The 3′ UTR of FAM11A was amplified from lymphoblast cDNA using the primers F: 5′-CAC CCC GCA GAC CAG AAC CAG-3′ and R: 5′-TCA GCA TGC CCA AAG AGA AAC TAC-3′. This 500 bp product was then used as a probe for Northern blot. A human 12 lane Multi-Tissue Northern blot (Clontech), containing 1 μg of polyA+ RNA isolated from various human adult tissues, was hybridised with this probe using ExpressHyb solution (Clontech) as per manufacturer's instructions.

Figure 1

Genomic sequence around the putative promoter region of the FAM11A gene. The region was analysed in silico for putative promoter and transcription start. A candidate transcription start at position −372 bp (with respect to the translation initiation codon) has been predicted (indicated by a circle). The FRAXF (CGG)7 repeat is boxed. There is little similarity between human (U71148) and mouse (AL672026) genomic sequence in the promoter region. The only 100% conserved stretch of sequence is that of 20 bp highlighted (in bold and underlined). Two primers FCGG and RCGG used for PCR and RT–PCR across the FRAXF CGG repeat are shown as arrows.

Results

Identification of a novel gene near FRAXF

Database searches using genomic sequence encompassing the FRAXF fragile site (U71148) were used to identify transcribed sequences in the region around the FRAXF CGG repeat. Two Unigene clusters (Hs.352168 and Hs.11522) were identified which span the region distal to the triplet repeat. These two clusters corresponded to an as yet uncharacterised gene, which was assigned the symbol FAM11A (Family with sequence similarity 11, member A, Genbank accession no. AF530473). Northern blot analysis indicates that FAM11A is expressed nearly ubiquitously (also supported by analysis of ESTs) with predominant expression in heart, skeletal muscle, kidney and placenta (Figure 2a). It is transcribed as a 2554 bp message, composed of at least seven exons, encompassing about 35 kb of genomic sequence distal to the FRAXF CpG island. All intron/exon boundaries conform to the standard splice acceptor/splice donor consensus sequences17 (Table 1). The open reading frame (ORF) is 1050 bp long and encodes a protein of 350 amino acids. Analysis of the cDNA sequence has identified a translation start codon within exon 1 (Figure 1). A putative transcription start site was located at position −372 with respect to the ATG start codon (Figure 1). RT–PCR analysis with primers flanking the FRAXF CGG repeat showed that the CGG repeat is transcribed as part of the FAM11A transcript (results not shown). This is also supported by the analysis of available ESTs, one of which contains the CGG repeat in its sequence (BI910833), while two others (BM011526 and BM469272) terminate just below the CGG repeat.

Figure 2

Analysis of the expression of the FAM11A gene. (a) Northern blot analysis of the human FAM11A gene. Human adult 12-lane multiple tissue Northern blot was hybridised with a FAM11A gene 3′ UTR probe. Two bands at 2.6 and 2.1 kb (indicated with arrowheads) of varying intensity were detected across all tissues tested; heart, skeletal muscle, kidney and placenta were among the tissues with the highest FAM11A abundance. (b) A diagram showing alternative splicing events in FAM11A. Exons 2, 3, 4 and 7 have been found alternatively spliced. Exons (3a, 4a and intron 7) involved in alternative splicing are shaded light grey. The putative ORF is shaded black. I shows the full length FAM11A transcript, while II indicates the short FAM11A transcript truncated after exon 4.

Table 1 Splice sites of the FAM11A gene

Alternative splicing of FAM11A

Two major FAM11A isoforms were detected on Northern blot, one of 2.6 kb and a smaller one of 2.1 kb. The origin of these two isoforms is not obvious. Originally we speculated that the smaller isoform corresponds to the transcript of FAM11B, the transcribed processed retropseudogene on chromosome 2 (see below). However, based on sequence alignment, the probe used for the Northern blot would not cross-hybridise with the FAM11B transcript (results not shown). As a result, the most likely explanation for the two major isoforms detected by Northern blot hybridisation is alternative splicing. This was studied on human lymphoblast mRNA. Alternative splicing events were detected for exons 2, 3, 4 and 7. An isoform missing exon 2 was detected at a low level, and an isoform missing both exons 2 and 3 was present in a much smaller proportion of transcripts, as determined by RT–PCR. We have observed that in a small proportion of ESTs exons 3 and 4 show alternative splicing. For exon 3 an additional 49 bp is observed at the 3′ end (5′ donor splice site of intron 3), and for exon 4 an additional 4 bp at the 5′ end (3′ acceptor splice site of intron 3; see Table 1, annotated 3a and 4a). An interesting alternative splicing event at the 3′ end of the gene occurs within exon 7. Seventy-eight bp within exon 7, which we call intron 7, is excluded in 50% of ESTs. RT–PCR analysis has revealed that the intron is in fact present in a higher proportion of transcripts (data not shown), so the intron-containing form is the predominant one. Exclusion of intron 7 splits exon 7, so we have annotated the two parts as exons 7a and 7b (see Figure 2b).

Additionally, comparison of the two FAM11A associated Unigene clusters Hs.352168 and Hs.11522 revealed that ESTs from cluster Hs.352168 have a different 3′ end, starting from the end of exon 4. In five of these ESTs (BM458996, BG721461, BF734532, AV720618 and BG565086) the donor splice site of exon 4 is ignored. This results in the inclusion of an extra 600 bp of the FAM11A intron 3 sequence in the FAM11A transcript and as a consequence premature termination of transcription (see Figure 2).
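The lengths reported for the exon 3a, exon 4a and intron 7 events can be checked for their potential effect on the reading frame with a short Python sketch. The text does not state whether each event lies within the coding region, so the frame consequence printed below applies only under the assumption that it does; the function and dictionary names are illustrative.

```python
# Lengths (bp) of the alternative splicing events as reported in the text.
EVENTS = {
    "exon 3a (extra sequence at 3' end of exon 3)": 49,
    "exon 4a (extra sequence at 5' end of exon 4)": 4,
    "intron 7 (excluded segment within exon 7)": 78,
}


def preserves_frame(length_bp: int) -> bool:
    """True if adding or removing this many bases keeps the reading frame."""
    return length_bp % 3 == 0


def frame_offset(length_bp: int) -> int:
    """Remainder modulo 3, i.e. the size of the frame shift (0, 1 or 2)."""
    return length_bp % 3


if __name__ == "__main__":
    for name, length in EVENTS.items():
        if preserves_frame(length):
            note = "multiple of 3, frame preserved if coding"
        else:
            note = f"frame offset {frame_offset(length)} if coding"
        print(f"{name}: {length} bp, {note}")
```

Only the 78 bp segment is a multiple of three, so it is the only one of the three events that could be included or excluded within a coding region without shifting the downstream frame.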

FAM11A is transcriptionally silenced by FRAXF full mutation

To investigate whether FAM11A transcription is silenced by the FRAXF full mutation, lymphoblast mRNA from an individual expressing the fragile site was analysed by RT–PCR (Figure 3a). The result of this experiment shows that expansion of the FRAXF CGG repeat and consequent methylation of the CpG island results in the transcriptional silencing of the FAM11A gene.

Figure 3

(a) RT–PCR analysis of the FAM11A gene in lymphoblast cells of a FRAXF normal and a FRAXF full mutation individual. FAM11A expression was not detected in the individual with the FRAXF full mutation. (b) Reactivation of FAM11A expression using 5-azadeoxycytidine in a FRAXF normal and full mutation individual. Full mutation cell lines were either treated (T) with 5-azadeoxycytidine or not treated (U) before RT–PCR; the normal cell line (N) was not treated. FAM11A transcription is reactivated in the treated cell line. The 1 kb+ molecular marker (GIBCO BRL) is indicated. The ESD primer set was used as a control in both experiments.

To confirm that methylation of the FRAXF CpG island was directly involved in the transcriptional silencing of the FAM11A gene, we treated the patient lymphoblastoid cell line with the demethylating agent 5-azadeoxycytidine. Reactivation of FAM11A gene transcription, as detected by RT–PCR, occurred upon treatment with 5-azadeoxycytidine for 7 days (Figure 3b). This demonstrated that methylation of the FRAXF CpG island as a consequence of the CGG repeat expansion has a crucial role in FAM11A gene silencing.

Identification of a chromosome 2 transcribed, processed retropseudogene

dbEST database searches using FRAXF genomic sequence revealed additional ESTs (Unigene cluster Hs.44680), which correspond to a gene on chromosome 2q21.2 that is highly similar to FAM11A. Comparison of FAM11A and the chromosome 2 gene, FAM11B (Family with sequence similarity 11, member B, Genbank accession no. AF530474), revealed that the chromosome 2 gene is a transcribed retropseudogene. Genes duplicated by retroposition (whereby RNA is reverse transcribed into a DNA copy and inserted into the genome) which remain active, often with amino acid changes or altered tissue expression patterns, are termed ‘retropseudogenes’ or ‘retroxaptonuons’.18 FAM11B does not contain any of the FAM11A introns. The open reading frame (ORF) of FAM11B is also 1050 bp long and encodes a protein of 350 amino acids, 88% identical to FAM11A (Figure 4). The two genes, FAM11A and FAM11B, otherwise differ markedly in their 5′ and 3′ untranslated regions (results not shown).

Figure 4

ClustalW multiple protein alignment of the human and mouse FAM11A and FAM11B proteins. Amino acids shaded in black are those that differ from the consensus. Exon/intron boundaries are indicated by arrowheads. All exon/exon boundaries were 100% conserved between the human and mouse genes.

Conservation of FAM11A in M. musculus, D. melanogaster and C. elegans

Database searches of mouse ESTs revealed the presence of two clusters, Mm.56701 (87 ESTs) and Mm.24569 (55 ESTs). Comparison with the FAM11A gene and FAM11B retropseudogene sequences showed that the mouse Unigene cluster Mm.24569 corresponds to the mouse orthologue of FAM11A and Mm.56701 to the mouse retropseudogene. This is further supported by matches against the mouse genomic sequences, AL672026 and AC100135 (X-chromosome genomic sequence) and AC100024 (mouse chromosome 1, unplaced), respectively. Interestingly, in mouse (as in man), both the gene Fam11a and the retropseudogene Fam11b are transcribed. ClustalW alignment of the putative human and mouse gene and retropseudogene proteins is shown in Figure 4. While the human and mouse FAM11A proteins differ in only one amino acid, the retropseudogene-encoded proteins are 100% identical. This high conservation of both gene and retropseudogene encoded proteins between man and mouse provides an indication of an important, conserved function. FAM11A appears to be highly conserved, with orthologues in other species including D. melanogaster (AAF48931, 36% similarity) and C. elegans (CAB55145, 26% similarity).

Discussion

Expansion of FRAXF was initially described in a family with developmental delay.1 Since then it has been reported in only a small number of additional cases.3 In contrast to the other two more proximal Xq27.3-q28 folate sensitive fragile sites, FRAXA and FRAXE, FRAXF was considered benign. This was primarily because of the absence of any FRAXF associated phenotype, but also because of a failure to identify any FRAXF associated gene(s).3,4

We report here on the identification of a novel gene, FAM11A, originating from the FRAXF CpG island and containing the FRAXF CGG repeat in its 5′ untranslated region. Similar to the other folate sensitive fragile site associated genes, ie FMR1/FRAXA,19 FMR2/FRAXE8,9 and FMR3/FRAXE,20 the FRAXF CGG repeat is transcribed (but not translated). We have also demonstrated that the paradigm of expansion, methylation and transcriptional silencing seen for the other folate sensitive fragile site associated genes also holds for the FAM11A gene at FRAXF. However, as only one FRAXF full mutation was available for testing, this paradigm needs to be further confirmed in additional cases. It also remains to be tested whether FRAXF CGG expansion and transcriptional silencing of FAM11A can be associated with developmental delay as seen in some FRAXF families studied.

At least for FRAXE and FMR2, there are now several families known where the full mutation does not cause developmental delay (IQ<70) (for review see Gécz21). When this model is applied to FRAXF, we may speculate that absence of FAM11A does not necessarily lead to developmental delay, but might do so in combination with other, as yet unknown factors. Otherwise, the FAM11A protein is either dispensable (redundant) or its function can be complemented by the highly similar chromosome 2 retropseudogene product, FAM11B. Comparison of FAM11A and FAM11B proteins shows 88% identity. More interestingly, when the respective gene and retropseudogene proteins are compared between man and mouse, 99.7% (one amino acid difference) and 100% conservation is detected. There are numerous retropseudogenes in the human genome, some of which are transcribed.22,23 Their function is not known, but it has been suggested that they may be acquired during evolution as novel genes.18,23,24 Their protein products may even be functionally indistinguishable from the original gene.
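The identity figures quoted above follow from simple arithmetic on the 350 amino acid proteins; for example, one difference in 350 residues gives 349/350, or about 99.7%. The Python sketch below shows one common way to compute percent identity from two aligned sequences. It is an illustration only: the choice to skip gapped columns is an assumption (alignment tools differ in how they treat gaps), the function names are hypothetical, and the short peptides in the usage example are toy strings, not FAM11A or FAM11B sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gap columns ('-') are ignored rather than counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_from_differences(length: int, differences: int) -> float:
    """Percent identity given a protein length and a number of differing residues."""
    return 100.0 * (length - differences) / length


if __name__ == "__main__":
    # One amino acid difference over 350 residues, as reported for human vs mouse FAM11A.
    print(round(identity_from_differences(350, 1), 1))
    # Toy aligned peptides (not real FAM11A/FAM11B sequence): 7 of 8 positions match.
    print(percent_identity("MKTAYIAK", "MKTAYIAR"))
```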

In summary we have identified the gene FAM11A associated with the FRAXF fragile site. Its transcription is extinguished by the FRAXF full mutation. In the absence of any FRAXF associated phenotype we can only speculate about the normal role of this gene.