Introduction

Nebulin is a giant (600–900 kDa) muscle protein associated predominantly with the thin filaments of striated muscle. Nebulin has a highly repetitive structure: up to 97% of the polypeptide consists of modules 30–35 amino-acid residues long, arranged into simple repeats or super repeats. The repeat modules contain conserved SDXXYK actin-binding motifs. Both the 8 kDa N-terminal and the 20 kDa C-terminal ends contain unique protein domains. The C-terminus is anchored in the Z disc of the muscle sarcomere and contains a conserved Src homology 3 (SH3) domain.1, 2 A novel sarcomeric protein, myopalladin, links nebulin to α-actinin in the Z discs.3 There is also evidence that the nebulin SH3 domain may bind titin.4 The peripheral Z-disc region of nebulin has recently been shown to bind desmin.5 Z discs vary in structure and width between muscle tissues and fibre types, and several nebulin isoforms differing in their C-terminal regions have been described.6 Alternatively spliced exons in the 3′ end of the gene, as well as in the central region, account for the broad isoform diversity of nebulin.1, 6 Alternative splicing is a common mechanism for creating muscle proteins specific to different muscle types and developmental stages.7, 8 Protein domains in the N-terminus of nebulin interact with tropomodulin at the pointed end of the thin filaments.9 The protein structure of nebulin is shown in Figure 1.

Figure 1

The protein architecture of nebulin and its protein-binding partners.1, 2, 3, 4, 5, 7

We have previously described the exon–intron organization of the last 42 exons in the 3′ end of NEB, encoding the last two super repeats, simple repeats 163–185, and the unique serine-rich and SH3 domains.10, 11 Mutation analysis of the 3′ end exons revealed several different disease-causing mutations in DNA samples from patients with clinically different subtypes of nemaline myopathy.10, 11, 12

Nemaline myopathy is a clinically and genetically heterogeneous muscle disorder. A feature shared by nemaline myopathy patients is pathological disorganization of the muscle Z discs and accumulation of nemaline bodies, that is, aggregates of Z-disc and thin-filament proteins such as α-actinin, actin and actin-associated proteins.

Nebulin is present in the muscles of nemaline myopathy patients, but its staining pattern may be abnormal and some epitopes are missing.10, 13 Taken together, mutational and immunohistochemical data suggest that the mutations may cause internal protein truncation, possibly associated with loss of some nebulin isoforms. Such a loss of fibre-type-specific isoforms may be relevant to disease pathogenesis.

In addition to NEB, mutations in the skeletal muscle α-actin gene (ACTA1), the α-tropomyosin gene (TPM3), the β-tropomyosin gene (TPM2) and the troponin T1 gene (TNNT1) have been shown to cause nemaline myopathy.14, 15, 16, 17, 18

The genomic sequence of the mouse nebulin gene has recently been published.19 The mouse gene was reported to have 165 exons spanning a genomic region of 202 kb. Interestingly, Kazmierski et al19 also reported low levels of nebulin expression in heart muscle, which was previously thought to express only the related protein nebulette.6 Here we report that the human gene spans 249 kb and contains 183 exons. We describe a range of transcripts arising from four distinct regions involved in alternative splicing, some of which were also found in heart muscle.

Materials and methods

Analysis of genomic sequences

BLAST homology searches20, 21 were performed with the nebulin cDNA sequence GenBank X83957 against the High Throughput Genomic (HTG) Sequences division (htgs database) and the nonredundant sequence division (nr database) of GenBank (http://www.ncbi.nlm.nih.gov/BLAST/). Draft human genomic sequences homologous to NEB were further analysed and placed in order with the local BLAST program in the BioEdit sequence alignment editor version 5.0.9 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Genomic sequence not showing homology to the full-length NEB mRNA sequence (GenBank X83957) was compared with the EST division of GenBank, allowing identification of exons not present in the X83957 sequence. RepeatMasker22 (http://searchlauncher.bcm.tmc.edu/) was used to identify repetitive elements in the introns. Complete sequence annotation was performed with the NIX program (http://www.hgmp.mrc.ac.uk). The protein kinase C phosphorylation site in exon 144 was predicted with the SMART protein tool (http://smart.embl-heidelberg.de/).

RT-PCR and sequencing

Total RNA was isolated with the RNeasy Fibrous Tissue Mini Kit (Qiagen) from human tibialis anterior, gastrocnemius and rectus femoris muscles, and from adult mouse hind leg, foetal mouse hind leg and mouse heart muscles. cDNA was synthesized with Moloney murine leukaemia virus reverse transcriptase (Promega) using random hexamer primers. Adult human cardiac RNA and human and mouse foetal skeletal muscle cDNAs were purchased from Clontech (BD Biosciences Clontech). PCR of the regions spanning exons 63–66, 82–105, 143–144 and 166–177 was performed on the cDNA with the primers listed in Table 1, at annealing temperatures of 62–68°C. DynaZyme EXT DNA polymerase, DynaZyme II polymerase (Finnzymes, Espoo, Finland) and AmpliTaq Gold polymerase (Applied Biosystems) were used for amplification. The PCR products were cloned with the TOPO TA cloning system (Invitrogen), and the isolated clones were sequenced on an ABI 3100 sequencer (Applied Biosystems).

Table 1 PCR primers for amplification of alternatively spliced regions in NEB

Results

The search through GenBank revealed two BAC clones homologous to NEB mRNA (GenBank X83957), one extending from the 5′ end to the middle of the super-repeat region (super repeat 15) of NEB (AC107052) and the other from super-repeat 15 to the 3′ UTR of NEB (AC009497). Altogether, 183 NEB exons were identified in the genomic sequence. The exons and the exon–intron boundaries were verified using the NIX programme. Exons 1–84 are in AC107052 and exons 85–183 in AC009497. The chromosome 2 genomic contig NT_005151 contains the whole nebulin gene. The translation initiation codon is in exon 3 and the stop codon and the 3′ UTR are in exon 183. The total length of the genomic sequence spanning NEB is 249 kb (Figure 2). The exons and the corresponding mRNA and protein domains are listed in Table 2.

Figure 2

The genomic organization of the human nebulin gene. The exons are shown as boxes, and the introns as lines. The boxed regions in the middle of the gene correspond to duplicated segments. The protein domains are shown below the exons as thick lines. Protein-domain nomenclature according to Pfuhl et al.2

Table 2 The nebulin gene

The smallest NEB exon is 42 bp long (exon 4) and the largest 596 bp long (exon 183); most exons are between 93 and 312 bp. The largest intron is 9232 bp long (intron 13) and the smallest 83 bp (intron 59). The splice junctions of all introns follow the GT-AG rule. The central region of the gene harbours an 8.2 kb region spanning eight exons, which appears to have been duplicated twice (exons 82–89, 90–97 and 98–105). The duplicated segments are 99% identical. LINE-2 repetitive elements were identified in introns 89, 97 and 105, flanking the duplicated segments. The duplicated exons are not represented in the full-length NEB mRNA (X83957), but are found in foetal NEB mRNA (U35637), in two EST sequences (BF576570, BE185376) and in the highly homologous mouse NEB mRNA (U58109). Exons 82–105 encode a total of 1458 amino acids, corresponding to 42 simple-repeat modules or six super repeats. We were able to amplify exons 80–106 from cDNA reverse-transcribed from RNA extracted from adult human tibialis anterior, gastrocnemius and rectus femoris muscles, but not from adult cardiac muscle. Sequencing of the RT-PCR products identified isoforms lacking exons 82–105, that is, with exon 81 spliced to exon 106, and isoforms expressing exons 82–89.
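A minimal sketch of how the GT-AG check on the splice junctions could be automated, assuming each intron is available as a genomic DNA string in the sense orientation. The sequences below are hypothetical toy introns, not NEB sequence, and the helper names are illustrative only; the AT-AC intron is included purely for contrast.

```python
# Hypothetical toy introns (NOT nebulin sequence); real introns would be cut
# from the genomic clones using the annotated exon-intron boundaries.
def follows_gt_ag(intron: str) -> bool:
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def check_introns(introns: dict[str, str]) -> list[str]:
    """Return the names of introns that violate the GT-AG rule."""
    return [name for name, seq in introns.items() if not follows_gt_ag(seq)]


toy_introns = {
    "intron_a": "GTAAGTCTTTCTCTTTCCAG",
    "intron_b": "gtgagtattttcctttgcag",  # lower case is accepted
    "intron_c": "ATATCCTTTTTTTTTAC",     # AT-AC type, shown for contrast
}
print(check_introns(toy_introns))  # ['intron_c']
```

An empty list for a real gene would correspond to the statement that all splice junctions follow the GT-AG rule.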

Exons 63–66, which precede the central duplicated region, are also alternatively spliced. They encode seven additional protein repeat modules highly homologous to super repeat 11. We identified transcripts lacking exons 63–66, that is, with exon 62 spliced to exon 67, in cDNA reverse-transcribed from adult human tibialis anterior muscle RNA. Human foetal muscle expressed only transcripts including all of exons 63–66.

The 3′ end of NEB harbours two regions with alternatively spliced exons: exons 143–144 and exons 166–177. Exons 143 and 144 give rise to two different transcripts, one lacking exon 143 and the other lacking exon 144 (Figure 3). Human foetal muscle appears to express only transcripts lacking exon 144. Both transcripts are found in adult human tibialis anterior muscle, in adult mouse hind leg muscle and in 11-day-old mouse embryos. Adult human gastrocnemius, rectus femoris and heart muscles appear to express only transcripts lacking exon 143. We did not identify any transcripts containing both exon 143 and exon 144 in any of the muscle samples studied.

Figure 3

Exons 143 and 144 give rise to two different transcripts. Their use varies between muscle types and between muscles at different developmental stages.

Exons 166–177, encoding the Z-disc region simple repeats M176–M182, undergo extensive alternative splicing, giving rise to a multitude of splice variants (Figures 4 and 5). Multiple transcripts were observed in both human and mouse muscles at different developmental stages. We identified 20 different transcripts in the human tibialis anterior muscle alone (Figure 5).

Figure 4

RT-PCR of the region including the differentially expressed exons 166–177 on human skeletal muscle cDNA obtained from tibialis anterior, gastrocnemius and rectus femoris, as well as human foetal skeletal muscle cDNA (Clontech), and on mouse skeletal muscle cDNA from hind leg muscles. S.m.=size marker.

Figure 5

20 different nebulin transcripts from adult human tibialis anterior muscle.

Discussion

We have elucidated the exon–intron structure of the human nebulin gene by analysing genomic sequences available through public databases. Human NEB has 183 exons in a 249 kb genomic region. The human gene is larger than the mouse nebulin gene (Neb), which was recently reported to have 165 exons in a 202 kb region.19 We compared the genomic structure of human NEB with that of mouse Neb (AY189120). According to our calculations, the mouse nebulin gene has 166 exons: we identified a mouse exon homologous to human exon 143, located between mouse exons 126 and 127, that was not reported by Kazmierski et al.19 We confirmed expression of this exon by RT-PCR and sequencing of cDNA from adult and foetal mouse hind leg muscle.

The organization of the mouse and human nebulin genes, that is, the exon sizes and the protein domains encoded by the exons, is similar up to exon 89. The copies of human exons 82–89, that is, human exons 90–105, are not present in the mouse gene; mouse exon 90 is homologous to human exon 106. The LINE-2 elements, which we believe are responsible for the duplication event in human NEB, are not present in the corresponding mouse introns. Furthermore, the mouse gene lacks two of the alternative exons in the 3′ end of the gene, corresponding to human exons 173 and 174. Altogether, human NEB has 18 exons not present in mouse Neb. Conversely, mouse Neb has one exon not present in human NEB, exon 130, an unusual exon showing homology to the human tight junction protein ZO-1.19
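The exon counts for the two species can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text and assumes a one-to-one correspondence for every exon not listed as species-specific: the human-specific exons are taken to be the 16 duplicated copies (exons 90–105) plus exons 173 and 174, and the mouse-specific exon is exon 130.

```python
# Exon numbers and counts as given in the text; nothing here is new data.
HUMAN_EXONS = 183
MOUSE_EXONS = 166  # count given in this paper; Kazmierski et al reported 165

# Human exons without a mouse counterpart: duplicated copies 90-105
# plus the two 3' alternative exons 173 and 174.
human_only = list(range(90, 106)) + [173, 174]
# Mouse exon without a human counterpart: the ZO-1-like exon 130.
mouse_only = [130]

# Assumption: every remaining exon has exactly one counterpart in the other species.
shared_from_human = HUMAN_EXONS - len(human_only)
shared_from_mouse = MOUSE_EXONS - len(mouse_only)
assert shared_from_human == shared_from_mouse, "exon counts do not reconcile"
print(len(human_only), shared_from_human)  # 18 165
```

Under that assumption both genes share 165 orthologous exons, which reconciles 183 human exons with 166 mouse exons.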

We have also compared the human NEB sequence with the partial genomic sequence of rat Neb (NW_043632). The total number of exons in rat Neb could not be determined at this point due to incomplete sequence data from the central region of the gene. However, exons corresponding to the duplicated human exons 82–89 are present in the rat genome, and a LINE-2 element was detected in the flanking intron, suggesting the possibility of a duplication event. Unfortunately, there is a gap of unknown length between exons 84 and 85 in the genomic contig. Similarly to the mouse gene, the rat gene lacks exons corresponding to human exons 173 and 174. However, rat Neb has two exons homologous to human exon 169. Rat Neb also lacks an exon homologous to mouse exon 130.

The exon structure of NEB does not match the protein repeat boundaries previously defined on the basis of NEB cDNA sequences.1, 2 The phase is shifted by half a repeat, so that the last two codons of each exon encode serine and (usually) aspartate, and the first four codons of each exon encode two variable amino acids followed by the conserved tyrosine and lysine of the SDXXYK actin-binding motif. Each actin-binding motif is therefore split across an exon junction. This structure is conserved in the mouse and rat nebulin genes, as well as in the related nebulette gene.6 This observation may be important when choosing the boundaries of expression constructs for biochemical analysis.
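The phasing described above can be made concrete with a small check: locate each SDXXYK motif in a joined translation and test whether an exon junction falls immediately after its SD dipeptide. The exon translations below are hypothetical toy sequences constructed to follow the described pattern; they are not nebulin sequence, and the function name is illustrative only.

```python
import re
from itertools import accumulate

# Hypothetical toy exon translations (NOT nebulin sequence), constructed so that
# each exon ends in "SD" and the next starts with two variable residues + "YK".
toy_exons = ["MAKSD", "AEYKQLLSD", "TRYKAA"]


def motifs_split_after_sd(exons: list[str]) -> list[tuple[str, bool]]:
    """Find SDXXYK motifs in the joined translation and report, for each one,
    whether an exon junction falls directly after its SD dipeptide."""
    protein = "".join(exons)
    # Cumulative exon lengths give the residue offsets of the exon junctions.
    junctions = set(accumulate(len(e) for e in exons))
    return [(m.group(), m.start() + 2 in junctions)
            for m in re.finditer(r"SD..YK", protein)]


print(motifs_split_after_sd(toy_exons))
# [('SDAEYK', True), ('SDTRYK', True)]
```

On this reading, a construct whose ends coincide with exon junctions would separate the SD dipeptide from the XXYK residues of the same motif, whereas moving each boundary two codons upstream would keep the motifs intact.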

Human NEB exons 166–175 encode simple repeat domains M176–M180, which extend into the Z discs (Figure 1).6 Four alternative exons, 168–171, previously named 177A–177D,11, 12 encode simple repeat domain M177/M178, and three alternative exons, 172–174, previously named 177E, 178A and 178B, encode simple repeat M178/M179. The rat Neb gene has five exons encoding simple repeat M177/M178 and only one encoding M178/M179. The mouse Neb gene has four exons encoding M177/M178 and one encoding M178/M179. The amino-acid sequence encoded by these exons is highly conserved between human, rat and mouse.

The alternatively spliced exons 63–66 are located in the central super-repeat region of nebulin. They encode protein repeat modules highly homologous to super repeat 11,2 in the following order: S11R5 (second half)–S11R6–S11R7–S11R1–S11R2–S11R3–S11R4–S11R5 (first half). In adult human tibialis anterior muscle, we found only a transcript lacking exons 63–66, corresponding to the full-length transcript described by Labeit and Kolmerer.1 Interestingly, only longer transcripts including exons 63–66 were observed in human foetal muscle. Foetal nebulin thus has one additional seven-module super repeat in the central region.

The two alternatively spliced exons 143 and 144 are differentially expressed depending on the developmental stage and type of muscle (Figure 3). They encode amino-acid sequences of different character (exon 143 translation: KKYRADYEQRKDKYHLVVDEPRHLLAKTAGDQISQ; exon 144 translation: RKYKSSAKMFLQHGCNEILRPDMLTALYNSHMWSQ). The two sequences differ in both charge and hydrophobicity, notably in one region that is predicted to contain a protein kinase C phosphorylation site (underlined in the sequence above) only in the isoform encoded by exon 144. This is the transcript that is not expressed in human foetal muscle. Both transcripts were, however, detected by semiquantitative PCR in 11-day-old mouse embryos. No nebulin expression was detected in 7-day-old mouse embryos, in accordance with the onset of muscle protein expression at gestation day 9 in the mouse.23 Interestingly, the amino-acid sequence encoded by exon 143 is identical in mouse, rat and human, suggesting a central, conserved role for this region of the nebulin protein.
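To illustrate the difference in character between the two translations quoted above, the sketch below computes, over each whole 35-residue translation, the number of charged residues (K, R, D, E), a crude net charge that ignores histidine and the termini, the mean Kyte-Doolittle hydropathy (GRAVY), and matches to the PROSITE protein kinase C consensus [ST]-x-[RK] (PS00005). This is a simplified stand-in, not a reproduction of the SMART prediction described in Methods, and it scans each exon in isolation, so motifs spanning exon junctions are missed.

```python
import re

# Exon translations as given in the text (35 residues each).
EXON_143 = "KKYRADYEQRKDKYHLVVDEPRHLLAKTAGDQISQ"
EXON_144 = "RKYKSSAKMFLQHGCNEILRPDMLTALYNSHMWSQ"

# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charged(seq: str) -> int:
    """Number of K, R, D and E residues."""
    return sum(seq.count(a) for a in "KRDE")


def net_charge(seq: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E; histidine and termini ignored."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy; more negative means more hydrophilic."""
    return sum(KD[a] for a in seq) / len(seq)


def pkc_sites(seq: str) -> list[int]:
    """0-based starts of PROSITE PS00005 ([ST]-x-[RK]) matches, overlaps allowed."""
    return [m.start() for m in re.finditer(r"(?=[ST].[RK])", seq)]


for name, seq in (("exon 143", EXON_143), ("exon 144", EXON_144)):
    print(name, charged(seq), net_charge(seq), round(gravy(seq), 2), pkc_sites(seq))
# exon 143 14 2 -1.39 []
# exon 144 7 3 -0.66 [5]
```

On this crude whole-exon measure, the exon 143 translation carries twice as many charged residues and is markedly more hydrophilic, and only the exon 144 translation contains a [ST]-x-[RK] match (SAK, starting at residue 6).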

The extensive alternative splicing of exons in the 3′ end of NEB (human exons 166–177, mouse exons 150–160) is seen in both adult and foetal muscle in mouse and human (Figure 4). These exons encode the C-terminal part of the nebulin protein, which is characterized by a highly conserved SSVLYKEN motif. We could not verify the extensive alternative splicing of these exons in heart muscle described by Kazmierski et al,19 probably owing to the low expression levels of nebulin in heart muscle. Z discs vary in structure and width between tissues and fibre types, and during development.24 The alternative use of these exons might contribute to the molecular diversity of Z discs in different muscle types. However, since the C-terminus of nebulin, including these alternatively spliced exons, is located only in the periphery of the Z line, it seems unlikely that this diversity could be generated by nebulin alone.

Exon 169 (previously named 177B) and exon 171 (previously 177D) were not present in any of the transcripts sequenced from adult human tibialis anterior muscle (Figure 5). However, we cannot exclude that longer transcripts, not detected by the current method, include these exons in this muscle type. We have previously reported three different frameshift mutations in exon 171 (177D) in four unrelated families with nemaline myopathy; these mutations were associated with typical or mild forms of the disease.11 This is consistent with the observation that exon 171 appears to be expressed in rare isoforms.

Exon 177 (previously 181) was present in 16 of the 20 different transcripts examined (Figure 5). We have reported two brothers who were compound heterozygous for a 2-bp deletion in exon 177 (181) and another 2-bp deletion in the constitutive exon 163 (previously 172).10 Both patients had a severe form of nemaline myopathy,12 and some of their muscle fibres were negative for the nebulin SH3-domain antibody,13 whereas antibodies to other domains showed the presence of nebulin.13 The severity of the disease and the abnormal nebulin expression may reflect the fact that exon 163 is constitutively expressed and exon 177 is present in the majority of nebulin isoforms.

The duplicated segment in the central region of NEB, exons 82–105, encodes six super repeats in addition to the 22 previously reported.1 The extremely high homology between the duplicated exons complicates sequencing of the region. We were able to amplify and sequence exons 82, 83 and 105 from adult tibialis anterior muscle RNA. Longer products were also observed, indicating expression of several of exons 82–105. We were not able to detect expression of exons 80–106 in human cardiac muscle, which is consistent with the expression of shorter nebulin isoforms in cardiac muscle than in skeletal muscle.19
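The repeat arithmetic for the duplicated segment can be checked directly from figures given in the Results: 24 exons in three 8-exon copies, 1458 residues, 42 simple-repeat modules and seven modules per super repeat. The sketch only restates those numbers; the total of 28 super repeats is simple addition to the 22 previously reported, not a measurement of any particular isoform.

```python
# All inputs are figures quoted in the text; nothing here is new data.
FIRST_EXON, LAST_EXON = 82, 105   # duplicated segment
EXONS_PER_COPY = 8                # exons 82-89, 90-97, 98-105
RESIDUES = 1458                   # amino acids encoded by exons 82-105
MODULES = 42                      # simple-repeat modules
MODULES_PER_SUPER_REPEAT = 7
PREVIOUS_SUPER_REPEATS = 22       # previously reported super repeats (ref. 1)

n_exons = LAST_EXON - FIRST_EXON + 1
copies = n_exons // EXONS_PER_COPY
residues_per_module = RESIDUES / MODULES
super_repeats = MODULES // MODULES_PER_SUPER_REPEAT

print(n_exons, copies)                                        # 24 3
print(round(residues_per_module, 1))                          # 34.7
print(super_repeats, PREVIOUS_SUPER_REPEATS + super_repeats)  # 6 28
```

The average of about 34.7 residues per module falls within the 30–35-residue module length given in the Introduction.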

Knowledge of the complete genomic sequence of the human nebulin gene is a prerequisite for mutation analysis in patients with nemaline myopathy. Our mutation analyses indicate that there are no mutational hotspots and that patients, who are usually compound heterozygous, carry their own unique mutations. This makes routine mutation analysis of the complete gene a practical challenge. The identification of all NEB exons has also made it possible to characterize alternatively spliced transcripts. Our results provide a basis for understanding the pathogenesis of nemaline myopathy caused by mutations in NEB.