Introduction

X-linked spondyloepiphyseal dysplasia tarda (SEDT, MIM313400) can be caused by mutations in the SEDL gene (MIM300202). SEDL is a highly conserved gene with orthologs identified in yeast, flies and vertebrates, and it is widely expressed in tissues, including fibroblasts, lymphoblasts and fetal cartilage.1 It contains six exons2 and spans a genomic region of approximately 23 kb on Xp22 in the human genome; it also escapes X-inactivation.3 SEDL is a small gene with alternative splicing involving exon 2,4 and its ubiquitously expressed mRNA is about 2.8 kb in size. Exons 1 and 2 are noncoding, and the remaining 420 bp coding region is split among exons 3–6. The corresponding introns vary in size from 339 to 14 061 bp. The splice-site sequences of four of the five introns conform to the general canonical GT–AG splicing rule5 using the U1–U2 snRNPs,6 whereas the AT–AC ends of the intron between exons 4 and 5 implicate a rare noncanonical splicing mechanism3 that often uses the minor U11–U12 snRNPs.6 In fact, there are different views regarding the precise splice sites of exon 4/intron 4/exon 5. The region was originally reported as an AT–AT splice site.1, 2 However, after a splice database search on the basis of the high-throughput genome sequencing project,7 Mumm and colleagues argued that rather than an AT–AT site as proposed previously, the splice site is more likely to be a rare noncanonical AT–AC sequence.3 As the ‘catat’ sequences at the boundaries of both exon 4/intron 4 and intron 4/exon 5 are the same, it had not been possible to identify the exact splicing site by simply comparing the genomic DNA with cDNA. Here, we report an interesting novel mutation, located exactly at the exon 4/intron 4 boundary, that helps to define the actual exon 4/exon 5 splice junction.

In this study, involving members of a large Chinese family with X-linked spondyloepiphyseal dysplasia tarda, we provide substantial evidence that the mutation at the splicing donor site of intron 4 results in aberrant splicing of SEDL in the affected males and carrier females. This is a fairly unusual mutation occurring exactly at the AT splicing donor end, leading to a GT donor site and resulting in the activation of certain cryptic splice sites. A series of erroneous splices was generated owing to the change from a rare noncanonical AT–AC splice junction to an ordinary canonical GT–AG pattern or an even rarer noncanonical GT–AT splice site.

Materials and methods

Patients and clinical diagnosis

After giving informed written consent, a total of 24 individuals descended directly from one antecedent were examined clinically and radiographically. The pedigree was consistent with an X-linked recessive pattern of inheritance for SEDT (see family pedigree, Supplementary Figure 1). All the adult patients had suffered back and hip pain beginning in their 20s, and the pain had gradually worsened. Radiographs of the lumbar spine showed platyspondylia, a posterior hump, and narrowing of the disc space accompanied by end plate sclerosis. The patients were of short stature, in the range of 115.5–133.5 cm, with upper body measurements in the 55.5–65 cm range. Arm spans exceeded height measurements and were in the 132–159 cm range.

PCR and DNA sequencing

DNA was obtained from the 24 family members, comprising 13 affected males, 5 obligate carrier females and 6 unaffected family members. Given the small number of SEDL gene exons and the relative ease of their PCR amplification, we made direct sequencing the primary choice for the detection of the SEDL mutation.8 DNA extraction was performed using the salting-out method.9 Exons 3, 4, 5 and 6 (containing the coding sequence) of SEDL and adjacent splice sites were amplified by PCR and sequenced in both directions, as described by Gedeon et al.1 Sequencing reactions were performed with Big-Dye terminators and an ABI 3730 automated sequencer. All of the sequences were compared against the normal sequences of unrelated controls and the UCSC and NCBI databases.

cDNA cloning, resequencing and database searches

To further assess the consequence of the defect on SEDL gene transcription and RNA splicing, RT-PCR experiments were performed. Total RNA from blood lymphocytes was extracted using Trizol reagent (Invitrogen). RNA integrity was confirmed by direct visualization of 18S and 28S rRNA bands after agarose gel electrophoresis. RNA samples were incubated with 10 U of DNase I (Novagen) at 37°C for 20 min to remove residual DNA, followed by inactivation at 65°C for 10 min. RNA samples were further purified using a HiBind spin column (Omega) according to the manufacturer's instructions.10 The purified RNA samples (0.5 μg) were then reverse-transcribed using the SuperScript first-strand synthesis system (Invitrogen) and oligo-dT18. Subsequently, PCR was performed using primers E1F2 (5′-CTTCCGCGGAAACTGACATTGC-3′) and 5Rx (5′-GTATACACCATTGTGGTGACATC-3′), as described by Gedeon et al.11 RT-PCR products were sequenced in both directions. In parallel, to characterize the effect of the mutation more precisely, the unpurified PCR products were directly cloned into the pGEM-T vector (Promega) and resequenced in both directions using SP6 and T7 primers.

A database of canonical and noncanonical mammalian splice sites (SpliceDB) was used to search for cDNA and genomic DNA of SEDL-related splice variants.7 The FSPLICE 1.0 and SPL platforms (http://www.softberry.com/) were used to predict potential splice sites in genomic DNA. SPLM was used in place of SPL only for noncanonical splice sites. BLAST (NCBI) was used to search databases for SEDL-related sequences.

Real-time quantitative PCR and expression estimates

To determine the expression levels of SEDL among the affected subjects, carriers and normal controls, extra primers spanning exons 3–4 (E3F (5′-CCAGCTGGGAAGGCAGAAT-3′) and E4R (5′-TCGAGAGCAGCATGAGCTATG-3′), amplicon of 71 bp) were designed. Real-time quantitative PCR was performed using the ABI 7900 system (Applied Biosystems). Reactions were performed in a 10 μl volume including diluted cDNA samples, primers and SYBR Green I Mastermix (Applied Biosystems). Diluted cDNA samples produced from 10 ng total RNA were added to each well. Real-time PCR data were collected using SDS software (version 2.1, Applied Biosystems). Expression of SEDL E3-4 was normalized in relation to glyceraldehyde-3-phosphate dehydrogenase (GAPDH) levels (denoted as ΔCt).10 Primer pairs for GAPDH were GAPDHF: 5′-ATCACCATCTTCCAGGAGCGAG-3′ and GAPDHR: 5′-GTTGTCATGGATGACCTTGGCC-3′. Both GAPDH and SEDL were tested three times in a reaction for each sample.
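The GAPDH normalization described above can be illustrated with a minimal Python sketch. It assumes the widely used 2^(−ΔCt) transformation with perfect doubling per cycle, which the text does not state explicitly; the function names and the Ct readings are hypothetical and are not data from this study.

```python
from statistics import mean

def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)

def relative_expression(dct):
    """Target amount relative to the reference, assuming doubling per cycle."""
    return 2.0 ** (-dct)

# Hypothetical triplicate Ct readings for one sample (illustration only).
sedl_cts = [25.0, 25.1, 24.9]
gapdh_cts = [20.0, 20.1, 19.9]

dct = delta_ct(sedl_cts, gapdh_cts)
print(f"dCt = {dct:.2f}, relative expression = {relative_expression(dct):.4f}")
```

Averaging the three replicate wells before subtracting keeps the sketch close to the triplicate design described in the text; other ways of combining replicates are equally possible.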

We were aware that seven SEDL pseudogenes have been detected in the human genome. Of these, SEDLP1, a transcribed retropseudogene (or retro-xaptonuon), is located on chromosome 19q13.4 and shows only six nucleotide differences in the ORF as compared with that of the X-linked SEDL gene. To avoid the interference of SEDLP, both primer combinations for RT-PCR and for real-time quantitative PCR were designed to amplify only the X-linked SEDL gene and not the SEDLP1 (chromosome 19) pseudogene sequence or any other pseudogenes.

Results

Genomic DNA sequencing revealed that a single nucleotide transition of A+3431G (position given relative to the ATG start codon) occurred at the exon 4/intron 4 boundary (Figure 1a). All affected male subjects showed a single nucleotide substitution of ‘G’ for ‘A,’ and all obligate carriers showed a heterozygous ‘A/G’ substitution, in contrast to a homozygous ‘A/A’ at the corresponding locus in the unaffected male family members (Figure 1b), unrelated controls and public databases.

Figure 1

Sequence chromatograms showing the mutation in SEDL of affected males and its consequences on splicing junctions. (a) gDNA sequence denoting the IVS4+1A>G mutation. The arrow indicates where the mutation occurred. Exonic sequence is shown in capital letters, intronic sequence in lower case; (b) gDNA sequence of normal control; (c) cDNA sequence of RT-PCR product in the reverse-sequencing reaction of patients; (d) cDNA sequence of normal control (GGGGC–ATATGAGGTTT splicing junction); (e) del ATATGAG splice pattern in either patients or carriers; (f) del AT splice pattern in either patients or carriers.

To determine whether the mutation caused a missense change or erroneous splicing, it was necessary to explore the precise boundaries of exon 4/intron 4/exon 5 using a direct biological approach, as opposed to the previously used bioinformatic methods.3 We sequenced RT-PCR products in both directions. In both patients and obligate carriers, sequence chromatograms showed distinct peaks ahead of the mutation point and ambiguous peaks thereafter (Figure 1c), which suggested deletion or insertion events around the mutation site. No such changes were present in control chromosomes (Figure 1d). After resolving the sequences with ambiguous peaks, we found that the mutation was an intronic defect that disrupted splicing rather than an exonic missense change. The mutation at the donor site (IVS4+1A>G or A+3431G) resulted in a dramatic increase in the percentage of consensus strength from 29.0 to 49.1, as calculated by the method of Shapiro and Senapathy.12 In practice, RNA splicing was altered, with an extra deletion of either the seven-nucleotide ‘ATATGAG’ or the ‘AT’ dinucleotide (marked as ‘del ATATGAG’ and ‘del AT,’ respectively; Table 1, Figure 1e and f).

Table 1 Splicing patterns, frequencies and weight matrices of different transcript isoforms in patients and carriers

Sequencing of cDNA clones confirmed that the two abnormal splice patterns described above were present in clones from patients’ and carriers’ cDNA (Table 1 and Figure 1), whereas 33 clones from three normal controls showed only the ordinary sequence. The normal splicing junction, with the sequence ‘atatcctt…ttatttcac’ spliced out, had a noncanonical AT–AC splice pair rather than the AT–AT pair. In contrast, the splicing errors caused by the mutation showed the sequence ‘…GCGGGGC’ of exon 4 directly joined to ‘GTTTATT…’ of exon 5, without the 5′-flanking seven ‘ATATGAG’ nucleotides of exon 5 being spliced into the transcript (Figure 2a). This pattern appeared to restore the canonical ‘GT–AG’ splice rule, with the sequence ‘gtatcctt…ttatttcacATATGAG’ being spliced out. The other splice pattern seemed to involve a noncanonical GT–AT splice pair, with ‘gtatcctt…ttatttcacAT’ being spliced out. Both of the splice junctions caused by the single nucleotide mutation predicted a frameshift in the ORF, and the resulting premature termination would truncate the putative protein.
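The splice pairs named in this paragraph can be read directly from the first and last two nucleotides of each excised sequence. The Python sketch below is an illustration only: it uses the sequence ends exactly as quoted in the text (the elided interior is not needed), and the helper name `splice_pair` is hypothetical.

```python
# Ends of each excised sequence as quoted in the text; the interior is elided there.
EXCISED = {
    "wild type": ("atatcctt", "ttatttcac"),
    "del ATATGAG": ("gtatcctt", "ttatttcacATATGAG"),
    "del AT": ("gtatcctt", "ttatttcacAT"),
}

def splice_pair(five_prime_end, three_prime_end):
    """Return the donor-acceptor dinucleotide pair of an excised sequence."""
    donor = five_prime_end[:2].upper()
    acceptor = three_prime_end[-2:].upper()
    return f"{donor}-{acceptor}"

for name, (start, end) in EXCISED.items():
    pair = splice_pair(start, end)
    kind = "canonical" if pair == "GT-AG" else "noncanonical"
    print(f"{name}: {pair} ({kind})")
```

Read this way, the wild-type excision ends in AT and AC, whereas the two mutant patterns end in GT and AG or in GT and AT, matching the pairs described above.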

Figure 2

Schematic figures showing alternative splicing events caused by the IVS4+1A>G mutation. Wt denotes the wild-type SEDL gene with alternatively spliced exon 2. The coding region is shaded black. Hatched regions represent the 5′- and 3′-untranslated regions (UTRs). White squares stand for the erroneously deleted parts of coding regions. Gray squares indicate the extra sequences in the mature transcript. (a) Erroneous splice patterns of ‘del ATATGAG’, ‘del AT’ and ‘del ATAT’; (b) erroneous splice pattern of ‘E1+E3+E4+partial intron 4+E5+E6’; (c) ‘E1+E4+broken intron 4+E5+E6’; (d) ‘E1+partial intron 2+E3, 4+partial intron 4+E5, 6.’

In addition, we found that a series of variable splicing isoforms occurred infrequently in the patients’ and carriers’ cDNA clones. A different RNA splice junction with an extra deletion of the four-nucleotide ‘ATAT’ was identified in the transcripts, which indicated the presence of another noncanonical GT–AT splice pair with ‘gtatcctt…ttatttcacATAT’ being spliced out (marked as del ATAT; Table 1, Figure 2a and Supplementary Figure 2a). The resulting frameshift led to a premature stop codon and hence a truncated SEDL protein.
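The frameshift reasoning for the three short deletions follows from simple arithmetic: a deletion within the coding region shifts the reading frame whenever its length is not a multiple of three. A minimal Python sketch, using only the deleted sequences named in the text (the function name `shifts_frame` is hypothetical):

```python
# Exon 5 nucleotides missing from each aberrant transcript, as named in the text.
DELETIONS = {
    "del ATATGAG": "ATATGAG",
    "del AT": "AT",
    "del ATAT": "ATAT",
}

def shifts_frame(deleted_sequence):
    """A coding deletion shifts the frame unless its length is divisible by 3."""
    return len(deleted_sequence) % 3 != 0

for name, seq in DELETIONS.items():
    status = "frameshift" if shifts_frame(seq) else "in frame"
    print(f"{name}: {len(seq)} nt deleted, {status}")
```

Deletions of 7, 2 and 4 nucleotides all leave a remainder when divided by three, which is consistent with the frameshifts and premature stop codons reported for these isoforms.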

Furthermore, we discovered several other variable splicing isoforms related to the removal of large portions of coding sequence. Generally, the splicing implied that the mutation created stronger cryptic splice pairs that may compete with the constitutively used splice site pairs, thus impairing exon definition, as described by Dietz.13 Occasionally, however, some of the splice sites actually used were not so ‘strong,’ which may be because the mutation created or disrupted exonic or intronic splicing enhancer/silencer sites.6

One isoform was generated by the splicing pattern that is schematically described as ‘E1+E3+E4+partial intron 4+E5+E6’ (Table 1, Figure 2b and Supplementary Figure 2b). It was quite likely that the mutation prevented the primary RNA splicing process, and that an extra partial intron 4 sequence from IVS4+1 to +23 (positions +3431 to +3453) was retained in the transcript, producing a new noncanonical ‘GT–AT’ splice junction, with ‘gtaactaac…tttcacAT’ spliced out. The terminal ‘AT’ is the first dinucleotide of exon 5. The altered exon 4 was referred to as exon 4a.2, 14 BLAST results predicted that the splicing pattern would cause an insertion of eight amino-acid residues between the 82nd and 83rd amino acids (reference: EAW98833) of sedlin.

Another complicated splice pattern was denoted as ‘E1+E4+broken intron 4+E5+E6’ (Figure 2c). ‘E2+intron 2+E3’ was spliced out of the transcript, and an extra sequence from intron 4 was detected in the transcript. It is likely that two cryptic splice sites conforming to the canonical ‘GT–AG’ splicing mechanism were generated in intron 4. Both were located downstream of the normal donor of exon 4 and upstream of the normal acceptor of exon 5 (Supplementary Figure 3).

A much more complicated splicing process was schematically presented as ‘E1+partial intron 2+E3, 4+partial intron 4+E5, 6’ (Figure 2d). The transcript was probably created by the activation of cryptic splice sites within both intron 2 and intron 4. A redundant fragment of 70 nucleotides (position −535 to −464, ‘gtaagtgac…ggcaag’) from intron 2 was spliced into the transcript, which implied that two potential canonical ‘GT–AG’ splice pairs were activated. Intron 2 was therefore broken into two fragments, of 13 546 nucleotides (position −14561 to −535, gtagggaatg…tcctttctag) and 445 nucleotides (position −464 to approximately −20, gtaattact…acttaattag), both of which were spliced out (Supplementary Figure 4). The retained intron 4 sequence was the same as in the splicing pattern ‘E1+E3+E4+partial intron 4+E5+E6,’ with the extra residues ‘gtatccttaccttcttagtaaag.’

Finally, we also found an erroneous transcript containing the 70-nucleotide partial intron 2 described for ‘E1+partial intron 2+E3, 4+partial intron 4+E5, 6’ but without the partial intron 4, denoted schematically as ‘E1+partial intron 2+E3, 4, 5, 6’ (Table 1).

The predicted scores of SPL(M) and matrix weights of FSPLICE are also shown in Table 1. Our study suggests that some of the practical splice sites are not detectable by SPL, FSPLICE or even SPLM (marked as NA). It is most likely that those programs primarily predict GT–AG splice sites and are of limited value in detecting noncanonical splice sites.

Relative to the expression of the GAPDH housekeeping gene, expression values for patients, carriers and controls were 25.30, 147.59 and 491.84, respectively (Figure 3). Expression levels were decreased in both patients and carriers. Expression in patients was approximately 1/20 of that in controls, whereas that in carriers was approximately 1/6 to 1/3 of that in controls.
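The fold reductions quoted here can be checked from the three group values. The Python sketch below assumes the values are listed in the order patients, carriers, controls, which is what the stated ratio of about 1/20 for patients suggests; the function name is hypothetical.

```python
# Group-level expression values from the text; the group order is an assumption.
EXPRESSION = {"patients": 25.30, "carriers": 147.59, "controls": 491.84}

def fraction_of_control(group):
    """Expression of a group as a fraction of the control value."""
    return EXPRESSION[group] / EXPRESSION["controls"]

for group in ("patients", "carriers"):
    frac = fraction_of_control(group)
    print(f"{group}: {frac:.3f} of control (about 1/{1 / frac:.1f})")
```

Under that assumption, the patient value is about 1/19 of the control value and the carrier group value is about 1/3, the upper end of the range reported for carriers.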

Figure 3

Relative expression levels of SEDL genes in controls, carriers and patients.

Discussion

SEDL gene mutations are spread along the entire length of the four coding SEDL exons (3, 4, 5 and 6) and their flanking introns. Examples include point mutations, splice alterations, an insertion, many deletions and several complex deletions/insertions. All characterized mutations within the SEDL gene led to loss of function of the SEDL protein.8

In our study, the novel IVS4+1A>G mutation caused seven erroneous transcript isoforms in total. Some patterns occurred more frequently than others and, in particular, the frequencies of the more complex splice patterns were very low. A possible reason is that the complex splice patterns require several erroneous splicing events, whereas the simple ones involve only one. The mutation was of particular interest, because it not only occurred exactly at the noncanonical splicing point, but also restored the canonical splice end with GT. In mammalian genomes, 99.24% of splice site pairs conform to the GT–AG rule, 0.69% to GC–AG and 0.05% to AT–AC, with only 0.02% consisting of other types of noncanonical splice sites.5 The AT–AC introns are very rare in the human genome, with only four introns showing this splicing pattern.5 Intron 4 of SEDL would be the fifth to be identified. The specific mutation causes the splice site either to be recognized as a canonical donor with a ‘GT’ end or not to be recognized as a splice site at all. These splicing mechanisms are quite different from the one resulting from IVS4+4T>C.14 The IVS4+4T>C mutation caused the splice site to be ignored, yielding exon 4a. In contrast, when the ‘AT’ donor changed into ‘GT’ as reported in our study, it was identified as a canonical splice donor site in most clones and was less likely to be ignored. Comparing the different transcripts caused by IVS4+1A>G with those caused by IVS4+4T>C would provide greater insight into the splicing mechanism of the consensus sequence.

In genomes, conserved splice sequences are an essential component of the exon splicing process, and they provide a specific molecular signal for the RNA splicing machinery to identify the precise splice points. GT–AG introns generally use the U1–U2 snRNPs, whereas the AT–AC introns often use the U11–U12 snRNPs.6 As a consequence, the new GT donor site is linked to an AG acceptor and not to the ‘original’ AC acceptor site. So many cryptic products may have formed because the U11–U12 splicing machinery was disrupted and the U1–U2 splicing machinery was activated.

All of our transcripts resulting from variable splicing yielded altered SEDL proteins, most of which were truncated, leaving only about half of the NH2-terminus. The putative Golgi-targeting domains within the COOH-terminus were also affected. This may be why the patients from this family are shorter (in the range of 115.5–133.5 cm) than those described elsewhere. However, more evidence is needed on this point.

To our knowledge, this is the only instance in which a single nucleotide mutation activated so many cryptic splice sites, and it represents the only natural mutation known to convert a rare noncanonical splice site into a canonical one. It is also the only RNA splicing mechanism to expose a new pattern of ‘GT–AT.’

We were able to confirm the splicing effect of this mutation on the SEDL gene using a variety of software, including SPL, SPLM and FSPLICE. However, these programs were less effective in identifying the cryptic splice acceptor sites, particularly those with noncanonical ends. Some studies have shown that different components of the splicing machinery are involved in recognizing splice sites with infrequently occurring bases, particularly in the case of some AT–AC introns.6 The splice mechanisms at noncanonical splice junctions should be quite different from those at canonical junctions.12 Our present knowledge about how the cell specifies splice sites is not sufficient for accurate and comprehensive computational identification of splice junctions in genomic sequences. Characterization of the splice sites with alternative splicing variants would expand our knowledge of splice junctions and help us to improve the quality of gene structure prediction programs.7

SEDL expression levels in patients and carriers were decreased to approximately 1/20 and 1/6 to 1/3, respectively, of those in normal controls. These data can help us to understand and quantify why patients affected by SEDT experience reduced endochondral bone growth at the epiphyses, particularly in the vertebral bodies, and why some adult female carriers experience problems with precocious osteoarthritis.

Taken together, identification of the IVS4+1A>G mutation in this SEDT group not only enables carrier detection and presymptomatic/prenatal diagnosis, but also reveals a significant mutation in the splice consensus. The disruption of the AT donor site in a rare AT–AC intron led to a GT donor site and resulted in a multitude of aberrant transcripts. This is an interesting mechanism, and these splice patterns represent a fruitful area for further study of splicing mechanisms.