Introduction

Mental retardation (MR), defined by an intelligence quotient below 70, is characterized by a global deficiency in cognitive abilities. It represents the most frequent phenotypic manifestation of abnormal development in the central nervous system, affecting about 2% of the population in industrialized countries.1 X-linked MR affects approximately 1 out of 1000 males, two-thirds of whom show nonsyndromic X-linked MR (MRX). MRX is a genetically and clinically heterogeneous disorder with MR as the only clinically consistent feature. To date, more than 20 genes playing a role in MRX have been identified.2 However, the mutation frequency in each gene is low, which implies that the majority of genetic defects involved in MRX remain to be identified.

Analysis of published as well as unpublished linkage data from numerous families has shown that mutations involved in MRX seem to cluster in three different regions of the human X-chromosome.3 Two regions, one at Xq28 and the other at Xp22.1–p21.3, contain known MR genes. The presence of a third broad peak on proximal Xp suggested the existence of novel MR genes in this region. Therefore, we chose a 7.4 Mb interval in Xp11, flanked by the genes ELK1 and ALAS2, and systematically screened the coding regions and splice sites of 47 candidate genes within it for mutations in up to 22 unrelated XLMR families with overlapping linkage intervals. Based on this study, four MR genes have been identified and published, but many other sequence variants of as yet unknown functional relevance were identified as well.

The purpose of this article is to provide a comprehensive overview of the outcome of this work. It describes all sequence variants found in a total of 548 exons and their flanking sequences.

Materials/subjects and methods

Patients and controls

Families with linkage intervals overlapping the region between ELK1 and ALAS2 on proximal Xp were selected. These included patients from five different families with syndromic forms of X-linked MR, one patient with Sutherland–Haan syndrome,4 and one patient with Wieacker–Wolf syndrome.5 Furthermore, 20 unrelated MRX patients were chosen, including eight patients from previously described MRX families: MRX15,6 MRX18,7 MRX26,8 MRX44, MRX45, MRX52,9 MRX5510 and MRX65.11 A survey of the investigated families, their individual linkage intervals and additional clinical features is given in Table 1. A different set consisting of 180 individuals from small families with presumed MRX was obtained through the Euro-MRX Consortium. These families show two to five affected brothers but no obligate female carriers. As controls, 168 unrelated male individuals, either students or healthy blood donors, were used. For FLJ14103, HDAC6, SLC38A5, TRO and ZNF41, additional controls, including females, were used, and for PLP2, DNA samples from patients with other disorders but without MR were included as well.

Table 1 Linkage intervals and additional features of 27 XLMR families

DNA extraction

Blood samples and lymphoblastoid cell lines (LCL) from patients and controls were obtained through the Euro-MRX Consortium (http://www.euromrx.com/). Genomic DNA was extracted using standard methods.

Selection of cDNA sequences for molecular screen

At the start of the project, 65 RefSeq genes and many more mRNA and EST sequences encoding hypothetical proteins (UCSC Genome Browser; April 2002 Freeze) were known in the candidate region at Xp11.23–11.21 flanked by the genes ELK1 and ALAS2. Candidate genes (Table 2) were selected using the following criteria: firstly, gene expression in nervous tissue (indicated by ESTs present in the Unigene database), secondly, no involvement in other disorders without MR and thirdly, presence of an open-reading frame. The 47 genes fulfilling these criteria are listed in Table 2.

Table 2 Alphabetical list of 47 screened candidate genes from the Xp11 interval ELK1-ALAS2

Primer design

Primers for amplification of the entire coding regions and exon–intron boundaries of all genes listed in Table 2 were designed using either the ‘Primer3’15 or the ‘Pride’ software.16 Exons longer than 200 bp were divided into overlapping fragments suitable for mutation detection by denaturing HPLC (see below). The primer sequences are available upon request.
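
The splitting of long exons into overlapping fragments can be illustrated with a minimal sketch. The function name tile_exon, the 40 bp overlap and the use of 200 bp as the maximum fragment length are assumptions made for illustration only; the text states only that exons longer than 200 bp were divided into overlapping fragments, and the actual amplicons also include flanking intronic sequence and primer sites.

```python
def tile_exon(start, end, max_len=200, overlap=40):
    """Split the closed interval [start, end] into overlapping fragments.

    max_len and overlap are illustrative assumptions, not values taken
    from the primer design used in the study.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Short exons are amplified as a single fragment.
    if end - start + 1 <= max_len:
        return [(start, end)]
    step = max_len - overlap
    fragments = []
    pos = start
    while True:
        frag_end = min(pos + max_len - 1, end)
        fragments.append((pos, frag_end))
        if frag_end == end:
            break
        pos += step
    return fragments


# A hypothetical 500 bp exon is covered by three fragments that
# overlap their neighbours by 40 bp.
print(tile_exon(1, 500))
```

Consecutive fragments share sequence so that a variant lying near the end of one fragment is also covered by the next one.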

PCR

In general, PCR amplifications were carried out in 50 μl reaction volumes containing 100 ng genomic DNA, 1 × supplied reaction buffer, 10 pmol of each primer, 200 μM dNTPs and 1 U Taq polymerase (Promega, Mannheim, Germany or Qiagen, Hilden, Germany). A touchdown PCR profile was used. Step 1: 96°C for 3 min followed by 20 cycles (95°C for 30 s, 65°C for 30 s) with a decrement of 0.5°C per cycle. Step 2: 30 cycles (95°C for 30 s, 55°C for 30 s and 72°C for 30 s). The PCR was concluded by a 5 min extension at 72°C. Alternatively, a PCR profile consisting of an initial denaturation step at 96°C for 3 min followed by 30–40 cycles at 95°C for 30 s, primer sequence-dependent annealing temperature for 45 s and 72°C for 30 s, with a 5 min final extension period (72°C), was used. The specificity and the amount of the amplified products were checked by agarose gel electrophoresis before further analysis.
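
The annealing temperatures implied by the touchdown profile can be worked out with a short sketch. It assumes that the first of the 20 touchdown cycles anneals at 65°C and that the 0.5°C decrement is applied from the second cycle onward; the function name is hypothetical.

```python
def touchdown_annealing_temps(start=65.0, decrement=0.5, cycles=20):
    """Return the annealing temperature (in degrees C) of each touchdown cycle.

    Assumes the first cycle anneals at `start` and every later cycle is
    `decrement` degrees cooler than the one before it.
    """
    return [start - decrement * i for i in range(cycles)]


temps = touchdown_annealing_temps()
print(temps[0], temps[-1])  # first and last touchdown annealing temperature
print(len(temps) + 30)      # touchdown cycles plus the 30 cycles of step 2
```

Under these assumptions the last touchdown cycle anneals at 55.5°C, so the 30 cycles of step 2 at 55°C continue the series without a jump in temperature.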

Denaturing high-performance liquid chromatography analysis

PCR-amplified fragments were subjected to denaturing high-performance liquid chromatography (DHPLC) analysis (WAVE Nucleic Acid Fragment Analysis System, Transgenomic Inc., San Jose, CA, USA). For DHPLC, PCR products were pooled pairwise and denatured at 95°C for 10 min, followed by gradual re-annealing to room temperature over 20 min to enhance heteroduplex formation. Eight microliters of pooled PCR product were then injected into the autosampler, separated through a DNASep HT Cartridge (Transgenomic Inc., San Jose, CA, USA), eluted using a linear acetonitrile gradient at a flow rate of 0.9 ml/min and detected by UV analysis at 260 nm. Optimal conditions for each injection (temperature, elution time and buffer composition) were determined using the WAVE Maker software (version 4.1.40, Transgenomic). Samples were analyzed at 2–4 different temperatures in order to detect sequence variants in different melting domains of the fragments. DHPLC conditions for individual fragments are available upon request.

Sequence analysis

Sequencing reactions were carried out for patient DNAs that showed abnormal elution profiles in the DHPLC analysis. The original PCR products were either purified using the Qiaquick PCR Purification Kit (Qiagen, Hilden, Germany) before sequencing or sequenced directly, in both directions, using a 3100 Genetic Analyzer and Big Dye terminator chemistry (Applied Biosystems, Foster City, CA, USA). Sequence data were assembled and analyzed using the GAP4 Contig Editor.17

RNA isolation and Northern blot analysis

Total RNA was isolated from patient LCL by use of Trizol (Invitrogen, Carlsbad, CA, USA), according to the manufacturer's recommendations. Fetal brain RNA was purchased from BD Biosciences (Palo Alto, CA, USA). Poly-A+ RNAs, obtained from 100 μg total RNA by using Dynabeads oligo-dT25 (Dynal Biotech, Oslo, Norway), were separated on a formaldehyde-containing gel in 1 × MOPS, transferred to a Hybond N+ membrane and crosslinked by UV using the auto-crosslink program of a Stratalinker (Stratagene, La Jolla, CA, USA). The gene-specific probes, with an average size between 300 and 600 bp, were PCR amplified from genomic DNA. All probes were designed to hybridize to at least 100 bases of the respective RefSeq cDNA. The specificity of the probes was checked by BLAST alignment. The sequences of primers used for probe generation are available upon request. Probes were labeled with [32P]dCTP using Klenow enzyme and random hexamer primers. The labeled fragments were purified, hybridized to membranes in UltraHyb buffer (Ambion, Austin, TX, USA) and washed according to the instructions of the manufacturer. Subsequently, Northern blots were exposed to Fuji Medical X-Ray films at −80°C for 6 h up to 8 days or were analyzed using a Storm 820 imaging system (APBiotech, Piscataway, NJ, USA). To control for RNA loading, blots were re-probed with a β-actin probe (BioChain, Hayward, CA, USA).

Results

Spectrum of sequence variants

Within the framework of our molecular screen, we have studied a total of 47 genes in up to 22 different XLMR families. PQBP1, FTSJ1 and JARID1C were screened in all 22 families, whereas 15 other genes were screened only in families where no PQBP1 mutations had been found (indicated by ‘a’ in Table 2). For the remaining genes, fewer families have been analyzed, because in six patients, only limited amounts of DNA were available (genes denoted with ‘b’ in Table 2). With regard to the 22 MRX families analyzed, 17 did not carry mutations in PQBP1 and 11 of these have been screened for all 46 remaining genes. Altogether, 705 different PCR fragments (covering 548 exons) that represent about 90 000 bp of coding and about 60 000 bp of noncoding DNA sequence have been screened for the majority of patients.

The distribution of the 57 variants is shown in Figure 1. The sequence variants are not evenly distributed among the 47 investigated genes: in 27 genes, one or several variants were found, whereas no variants were found for the remaining 20 genes.

Figure 1

Distribution of sequence variants among the 47 investigated genes. Only 27 genes were affected by one or more changes, and the proportions of variants in coding and noncoding regions are nearly equal.

Of the 57 variants, 50 are nucleotide substitutions (40 transitions and 10 transversions, see Table 4), 21 of which have not been reported in dbSNP (marked in bold in Table 3).

Table 4 Summary of 57 sequence variants found in 47 genes
Table 3 Summary of sequence alterations identified in screening 47 candidate genes in Xp11 in a panel of 22 patients with mental retardation
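
The distinction between transitions and transversions used in this summary can be made explicit with a minimal sketch. The function name is hypothetical, and the example substitutions are taken from the variants quoted in the text (c.442C>T, c.3483G>A, c.252A>G, c.624C>T and c.1162G>C).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-nucleotide substitution.

    A transition exchanges a purine for a purine or a pyrimidine for a
    pyrimidine; a transversion exchanges a purine for a pyrimidine or
    vice versa.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a nucleotide substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


examples = ["C>T", "G>A", "A>G", "C>T", "G>C"]
counts = {"transition": 0, "transversion": 0}
for change in examples:
    ref, alt = change.split(">")
    counts[classify_substitution(ref, alt)] += 1
print(counts)
```

The counts printed here refer only to the five example substitutions, not to the full set of 50 substitutions summarized in Table 4.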

Of the 30 sequence variants found in the coding region (Figure 1, Table 3), 10 are putatively pathogenic alterations affecting the following genes: PQBP1, FTSJ1, PHF8, JARID1C, PLP2 and FLJ14103. In FLJ14103, a single variant (c.70G>T, S24P) was found, which was not present in 95 controls (with higher educational background) and 180 unrelated MR patients. The variant in PLP2 (c.434A>C, T145N) was not found in >600 controls. The FLJ14103 variant does not segregate with the disorder, whereas the PLP2 variant cosegregates with MR in all families where it has been observed. The eight mutations we found in the four remaining genes have already been reported elsewhere.18, 19, 20, 21 Interestingly, the variant c.442C>T (R148W) in FLJ21687 was not detected in the control panel, but the analysis of 180 patients from small MR families revealed the same variation in two patients, one of them being from family P048 where a nonsense mutation in FTSJ1 has been described.18

Five further missense variants (detected in the 22 MRX families) were found in at least two out of 250 control X-chromosomes. These variants include FLJ14103 (K128R), HDAC6 (R832H), SLC38A5 (M451T), TRO (P439L) and ZNF41 (I125R), of which the variants detected in FLJ14103 and SLC38A5 were both present in dbSNP.

Thirteen different silent sequence variants were encountered in nine different genes. Seven silent variants were found only once in our patient cohort, and four of these have not been described in dbSNP so far. All other silent variants were found more than once in our patient panel and were present in dbSNP. Other variants that have not been published or could not be found in controls or in dbSNP were not observed in the panel of 180 small MR families.

The most common silent variant (found in 12 out of 16 patients in our patient cohort) is the transition G>A at nucleotide 3483 (E1161E) in the gene KIAA1202. This transition is part of a haplotype, which also contains a 12 bp deletion c.3350_3361del (Q1124_Q1127del) and a deletion of 3 bp (c.3424_3426del; E1142del). Both haplotypes were found in controls and are described by Hagens et al.22
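
The relation between the cDNA positions and the codon numbers quoted for these variants can be checked with a short sketch. It assumes that cDNA numbering starts at the first nucleotide of the start codon; both function names are hypothetical, and the sketch does not attempt to derive protein-level names for deletions.

```python
import re


def codon_number(cdna_pos):
    """Return the codon containing a coding-DNA position.

    Assumes position 1 is the first nucleotide of the start codon.
    """
    if cdna_pos < 1:
        raise ValueError("cDNA positions start at 1")
    return (cdna_pos + 2) // 3


def deletion_length(description):
    """Return the number of deleted bases for a 'c.<first>_<last>del' name."""
    match = re.fullmatch(r"c\.(\d+)_(\d+)del", description)
    if match is None:
        raise ValueError(f"unsupported description: {description}")
    first, last = int(match.group(1)), int(match.group(2))
    return last - first + 1


print(codon_number(3483))                 # codon of the G>A transition
print(deletion_length("c.3350_3361del"))  # size of the larger deletion
print(deletion_length("c.3424_3426del"))  # size of the smaller deletion
print(codon_number(3424), codon_number(3426))
```

With this numbering, position 3483 is the third base of codon 1161, and c.3424_3426del removes exactly the three bases of codon 1142.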

In MAGED2, another haplotype block was identified as five patients carry the transition c.252A>G in combination with c.624C>T. The other analyzed patients do not carry any of these variants.

In noncoding regions, we have identified 27 different sequence variants (Figure 1) including one 2 bp deletion and 26 SNPs. The 2 bp deletion was found in the trophinin gene, 68 bp downstream of the donor splice site of exon 6, in the patient of family N017 (also carrying a missense mutation c.1162G>C, A388P in JARID1C). The majority of the 26 SNPs were present in dbSNP except for eight variants, four of which were found only once in the patients tested.

Expression analysis of candidate genes by Northern blot hybridization

Clinically relevant sequence variants are not necessarily confined to coding regions: changes in regulatory sequences can alter gene expression levels, and intronic mutations may result in altered splicing patterns. To detect such variants, we carried out Northern blot analysis in 27 XLMR patients (the 22 patients analyzed for sequence variation plus five more recently collected patients). Of the 47 candidate genes investigated, 26 yielded specific signals corresponding to the expected transcript size (shown in bold in Table 2). mRNA expression of the 26 genes was analyzed in five separate Northern blot experiments using membranes carrying the same amount of RNA from each patient. Results were assessed by visual comparison of signal intensity patterns in patients and controls. Low signal intensities were consistently observed for patients N009, N001, N017, T003, N045, L038 and L045. Hybridization signals from lymphoblastoid RNA and fetal brain RNA showed different intensities, but banding patterns were similar in both tissues. Northern blot hybridization of PQBP1 confirmed our previous results: patients from families N009, N040, N045, MRX55 and SHS showed an almost complete loss of PQBP1 expression,20 and abnormal splicing of FTSJ1 was observed for the patient of family MRX44, where a G>A transition had been found.18 Abnormal splicing was also observed for the patient of family N042, where a 12 bp deletion covering the donor splice site of PHF8 exon 8 (c.1050delACAGgtcttccc) had been identified (Figure 2), which confirms the results recently published by Laumonnier et al.21 Some genes, including APE2, HADH2, PLP2 and TIMP1, displayed relatively strong variation, both in patients and in controls, but we did not observe consistent mRNA expression changes pointing to effects of silent or intronic sequence variation.

Figure 2

Northern blot analysis of poly-A+ RNA from 27 patient and five control LCLs. RNA from fetal brain has been included as control. The blot was sequentially hybridized with a cDNA probe corresponding to the 3′UTR of PHF8 and a β-actin cDNA probe. (*) Note the increased transcript size in the patient from family N042.

Discussion

MR can result from selective impairment of brain development or physiology but also from fundamental cellular defects that are present in many tissues but predominantly affect the brain, for example, because of its higher sensitivity.23 This explains why some MR genes are expressed specifically in the brain, whereas others are ubiquitously expressed. The prevalence of MR is lower in females than in males, which is partially due to mutations on the X-chromosome (for a detailed discussion on the subject, see Ropers and Hamel2). Based on the distribution of linkage intervals in families affected with MRX, we have previously shown that approximately 30% of the causative mutations localize to the proximal Xp and the pericentromeric region.3 Within this area, a 7.4 Mb region flanked by the genes ELK1 and ALAS2 contains the highest number of defects that give rise to MRX.3

Therefore, we have selected 47 genes within this interval that are expressed in nervous tissue (but not necessarily exclusively so) and analyzed them for mutations in up to 22 MR families with linkage to this area. This led to the identification of four genes18, 19, 20, 21 involved in MRX, a comparatively high number, which underscores the heterogeneity of this disorder and at the same time confirms the ELK1-ALAS2 region as a hotspot for MRX candidate genes.

In the remaining 43 genes analyzed in this screen, we found eight missense variants, three of which are not present in controls or in dbSNP, R148W in FLJ21687, T145N in PLP2 and S24P in FLJ14103. The FLJ21687 variant was not found in controls, but a small MR family, P048, carries this variant as well as a nonsense mutation in FTSJ1. Under the likely assumption that only one mutation is responsible for the disease in each family, R148W in FLJ21687 is unlikely to be causative. For the patient in whom the variant in PLP2 (T145N) was found, no other DNA changes have been reported. Therefore, this variant could be involved in MRX, but at present there is no functional evidence to support this.

Intronic variants can affect splicing and mRNA stability and increasing evidence suggests that this is also the case for silent variants (for a detailed discussion on the subject, see Chamary et al24). To study the possible presence of such variants, Northern blot analysis was carried out for 26 of the 47 investigated genes (the remaining 21 genes were not expressed at detectable levels in LCL). As expected, the frameshift mutations in PQBP1 and the splice site mutations in FTSJ1 and PHF8 could be shown to alter mRNA expression or splicing in LCLs. As no other detectable effects on mRNA expression were observed for the known MRX genes PQBP1, FTSJ1, PHF8, JARID1C and ZNF41, this suggests that silent or non-coding variation in these genes is not a common cause of MRX.

In order to address the question whether brain tissues express specific splice variants of MR candidate genes that might not be detectable in LCLs, we included fetal brain RNA in our Northern blot analysis. As we could not observe differential splice patterns between LCLs and fetal brain for the 26 genes where Northern blotting was successful, we conclude that none of the major transcripts escaped our LCL-based analysis. Still, expression of 21 genes in LCLs was too low for detection using Northern blot hybridization, and in seven of these genes (FLJ14103, FLJ21687, KIAA1202, PCSK1N, RBM3, SMC1L1 and TRO) we found nine sequence variants that were not present in dbSNP. The possibility that (some of) these variants have an effect on mRNA expression or splicing cannot be excluded.

Our findings also comprise a number of small (<13 bp) deletions. Owing to the limitations of our approach (PCR and sequencing), larger genomic rearrangements (except deletions) could not be detected in this study. Duplications containing MECP225 have been described as a frequent cause of severe MR in males, and it is conceivable that similar rearrangements occur elsewhere on the X, too. However, low copy repeats, which often mediate duplications or inversions, are comparatively rare in the Xp11 region,26 suggesting that in this region, large genomic rearrangements are not common. This also fits with the general absence of aberrant mRNA expression patterns, except for changes that could be ascribed to abnormal mRNA processing due to known mutations.

Taken together, in this study, we have found mutations within four genes, in eight out of 22 families with linkage to the ELK1-ALAS2 region. The majority of mutations in the 22 families do not affect the five MRX genes PQBP1, FTSJ1, PHF8, JARID1C and ZNF41.18, 19, 20, 21, 27 Therefore, it is very likely that mutation analysis in other families with linkage intervals overlapping this region will show a similarly low proportion of mutations affecting the above-mentioned genes, which implies that the majority of mutations are harbored by other genes.

As our study has excluded MRX causative mutations in 43 of these other genes in nine families with linkage information, their potential of bearing relevant changes for X-linked MR has been demonstrated to be low. This information will prove valuable when prioritizing candidate genes in the search for mutations in further MR families.