Introduction

Nemaline (rod) myopathy (NM; MIM IDs: NEM1 #609284, NEM2 #256030, NEM3 #161800, NEM4 #609285, NEM5 #605355, NEM6 #609273, NEM7 #610687, NEM8 #615348, NEM9 #615731, NEM10 #616165) is one of the most common congenital myopathies. Characteristically, NM presents with congenital proximal muscle weakness, but it is a very heterogeneous disorder comprising six clinical categories, in which the severity and onset of the disease vary widely.1, 2 Typical histological findings include the presence of nemaline bodies (rods) in the muscle fibres. Nemaline bodies are aggregates of Z-disc and thin-filament proteins of the muscle sarcomere.3

Hitherto, 10 different genes have been shown to cause NM (ACTA1, NEB, TPM3, TPM2, TNNT1, CFL2, KBTBD13, KLHL40, KLHL41 and LMOD3). Variants in the nebulin gene (NEB) are the most common cause of recessively inherited NM (NEM2 #256030). NEB is one of the largest genes in the human genome; it is located in chromosomal region 2q23.3 (152 341 850–152 591 001; GRCh37/hg19) and spans 249 kb of genomic sequence. NEB contains 183 exons, but only one region, the donor splice site of intron 32, may be considered a true mutational hotspot. The vast majority of the families included in our recently published mutational update, 83% (132/159), were compound heterozygous for two different pathogenic variants.2 The great variety of NEB variants makes mutation analysis cumbersome, and the complexity of NEB makes it even more so. The gene includes regions with highly similar exons and intronic repeat elements such as Alu and LINE repeats. It also includes homologous sequences, such as the so-called triplicate region (TRI) in the middle of the gene. TRI consists of eight exons that are repeated three times (exons 82–89, 90–97 and 98–105), and the three ~10 kb repeats are highly similar (99%) to each other.4 Such homologous DNA segments are difficult to study, and variants within them may have been overlooked for that reason. For example, the NEB TRI is not included in typical exome sequencing capture kits because of the high similarity of the three copies. Therefore, alternative methods are essential for studying the region.

Nebulin is a giant protein of the muscle sarcomere that acts as a molecular ruler regulating thin (actin) filament length, actin–myosin interactions and force generation.5, 6, 7 This highly repetitive protein contains around two hundred simple repeats of 30–35 amino acids, each containing one actin-binding site. Moreover, the simple repeats are arranged into 22–30 super repeats, each containing a putative tropomyosin-binding site. Nebulin has several binding partners and is an essential component of the properly functioning sarcomere in striated muscle.7, 8, 9, 10

Currently, evidence is accumulating that large copy number variations (CNVs) in NEB are more common than previously thought. In addition to the 2.5-kb Ashkenazi Jewish deletion of NEB exon 55,11, 12 other large deletions have recently been identified, including variations of different sizes, from part of one exon to more than half of the gene.2, 13

The normal copy number of the NEB TRI region has been estimated to be three copies in each allele.4 Therefore, the normal total copy number of the NEB TRI is six instead of the typical two. Mouse Neb contains only one ~7.5 kb copy of the TRI region segments, implying that the triplication occurred later in evolution. It has been suggested that the TRI region is the result of two tandem duplications through Alu-mediated homologous recombination in an ancestor of human and chimpanzee.14

One of the most common mechanisms thought to generate CNVs is non-allelic homologous recombination (NAHR), caused by misalignment and crossing over of non-allelic homologous DNA segments, such as low copy repeats. CNV breakpoints have also been shown to frequently reside in regions containing repeat elements. These include SINEs (such as Alu repeats), LINEs, DNA repeat elements (such as MERs) and long-terminal repeats.15, 16 Repeat elements are known to cause CNVs in the genome in general, but their role in the formation of TRI CNVs requires further investigation.

In the current study, we describe the TRI CNVs that we identified in our study cohort of 196 families and 60 controls using our custom NM-CGH microarray, and discuss their pathogenicity and the possible mechanisms behind this recurrent variation.

Materials and methods

Samples

The NM-CGH study included 266 DNA samples from 196 families with patients diagnosed with or suspected to have NM or a related myopathy, in whom one or both pathogenic variants had remained unidentified. In addition, 60 normal control samples, 22 from Finland and 38 from CEPH (Centre d’Étude du Polymorphisme Humain), were studied. The samples were received either as isolated DNA or as blood, cell lines, or muscle or skin biopsies, from which DNA was extracted using appropriate methods.

Microarray design, protocol and data analysis

The NM-CGH 8x60k microarray (Oxford Gene Technology IP Limited, Oxford, UK) was designed (Human reference sequence, GRCh37/Hg19) as described in Kiiski et al.13 The seven genes known to cause NM at the time (NEB, ACTA1, TPM3, TPM2, TNNT1, CFL2, KBTBD13) were densely covered using a tiling approach (one probe pair starting at every 10-bp interval), avoiding the most repetitive regions of these genes. In addition, this study includes samples that were run with the updated NM-CGH microarray versions v2, which also includes the KBTBD5, KBTBD13 and LMOD3 genes, and v3, which also includes one unpublished NM-associated gene. In v2 and v3, the control gene TTN was removed, and the probe interval in the intronic regions was increased from 10 to 20 bp (that is, probe density was halved) for every gene except NEB. No other significant modifications were made in the NM-CGH array updates.
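The tiling design can be illustrated with a short sketch. It assumes a hypothetical 60-nt probe length, 0-based half-open coordinates and a simple list of repeat-masked intervals; it does not model probe pairs or the actual array design rules, and only shows how the step size and repeat masking determine where probes land and how dense they are.

```python
def tile_probes(start, end, step, probe_len=60, masked=()):
    """Return start positions of probes tiled across [start, end).

    A probe is started every `step` bp; any probe overlapping a masked
    (repetitive) interval is skipped. Coordinates are 0-based, half-open.
    probe_len=60 is an illustrative assumption, not the array's design value.
    """
    positions = []
    for pos in range(start, end - probe_len + 1, step):
        probe_end = pos + probe_len
        # Half-open interval overlap test against each masked region
        overlaps_repeat = any(pos < m_end and m_start < probe_end
                              for m_start, m_end in masked)
        if not overlaps_repeat:
            positions.append(pos)
    return positions


# Toy 1-kb region with one masked repeat at 400-700 bp (hypothetical)
repeat = [(400, 700)]
dense = tile_probes(0, 1000, step=10, masked=repeat)
sparse = tile_probes(0, 1000, step=20, masked=repeat)
print(len(dense), len(sparse))  # prints: 60 31
```

Doubling the step from 10 to 20 bp roughly halves the number of probes, which is the trade-off made for the intronic regions in v2 and v3.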

The labelling, hybridization, scanning and data analysis were done according to the manufacturer’s protocol (Oxford Gene Technology IP Ltd) as previously described in Kiiski et al.13 The CytoSure Interpret Software v.4.2.5-4.6.85 (Hg19) (Oxford Gene Technology Ltd) was used for graphic analysis of the data. The circular binary segmentation (CBS) algorithm was used, with specific thresholds set for aberration calling. The ‘multiple mappings’ setting was used both in the analysis and in the graphic view; that is, each probe is shown at every genomic location to which it maps. Therefore, most of the NEB TRI probes are shown three times, once for each homologue. The TRI variations were analyzed manually based on the log2 ratios of the NM-CGH microarray results (Table 1). It was possible to design ~180 unique probe pairs for this region, mostly based on small sequence differences in the TRI introns.
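As a rough guide to reading such profiles, the idealized log2 ratio for a total TRI copy number n against the normal six copies is log2(n/6). The sketch below assumes a perfectly linear probe response; measured array ratios are usually compressed, so these are not the calculated values of Table 1, only an illustration of why the TRI signals are small.

```python
import math

NORMAL_TRI_COPIES = 6  # three TRI copies on each of two alleles


def expected_log2_ratio(copies, reference=NORMAL_TRI_COPIES):
    """Idealized test/reference log2 ratio for a total copy number."""
    return math.log2(copies / reference)


for n in range(5, 11):
    print(f"{n}/6 copies: {expected_log2_ratio(n):+.3f}")

# The gap between neighbouring states shrinks as the copy number rises
gap_9_10 = expected_log2_ratio(10) - expected_log2_ratio(9)
# A one-copy gain at an ordinary two-copy locus, for comparison
single_locus_gain = expected_log2_ratio(3, reference=2)
print(f"9 vs 10 copies differ by {gap_9_10:.3f}; 3/2 copies gives {single_locus_gain:+.3f}")
```

Under these idealized assumptions, a three-copy TRI gain (9/6) gives the same ratio as a single-copy gain at a normal two-copy locus, while adjacent high copy numbers differ by only about 0.15 log2 units.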

Table 1 The calculated NEB TRI variations on the log2 scale of the NM-CGH array and the corresponding NEB TRI copy number

Exome sequencing and other variant analysis methods

Variants other than NEB TRI CNVs were identified using denaturing high-performance liquid chromatography (dHPLC) and Sanger sequencing as previously described.13, 17 In addition, exome sequencing was used. Exome capture and sequencing were done by Oxford Gene Technology using the Agilent SureSelectXT All Exon 50 Mb target enrichment kit (protocol v1.2; Agilent Technologies, Santa Clara, CA, USA) on an Illumina HiSeq2000 platform using TruSeq v3 chemistry (Illumina Inc, San Diego, CA, USA), and the data were analyzed using the Oxford Gene Technology exome sequencing pipeline. A putative splice-altering variant identified in family F2 was tested using a NEB minigene construct as described previously.17

Variants in the LOVD database

The results have been submitted to the LOVD database (http://www.LOVD.nl/NEB) with the submission ID numbers: NEB_00253-NEB_00261. The NEB cDNA reference sequence NM_001271208.1 was used and the exon numbering is according to Donner et al.4

Statistical analyses

Fisher’s exact test was used to determine the statistical significance of the results.
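For readers who want to reproduce this kind of comparison, a two-sided Fisher’s exact test on a 2×2 table can be computed directly from hypergeometric probabilities. The sketch below uses only the Python standard library. The example counts (8 of 196 NM families versus 0 of 60 controls with gains of two or more TRI copies) are taken from the Results, but the exact comparisons tested for Table 2 are not specified here, so treat the example as illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. NM families, controls); columns are
    with / without the CNV. The p-value sums the probabilities of all
    tables with the same margins that are no more likely than the
    observed one. Integer weights keep the comparison exact.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalized hypergeometric weight of the table with top-left cell x
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / denom


# Gains of >= 2 TRI copies: NM families vs controls (counts from the Results)
p = fisher_exact_two_sided(8, 196 - 8, 0, 60)
print(f"two-sided p = {p:.3f}")
```

Summing all tables no more probable than the observed one is the common two-sided convention; other two-sided definitions exist and can give slightly different values.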

Bioinformatics methods

The NEB TRI region is highly repetitive, and many of the tiling array probes match multiple locations. To obtain even coverage, the measurements of the ambiguous probes were randomly assigned to one of their possible locations. The data were then analyzed using the GLAD package18 for the R environment,19 and the breakpoints and copy numbers were inferred using the daglad function. Because of the high probe density and the variable signal level across the target region, the default parameters of the function produced an unrealistically high number of breakpoints. After testing different parameter values, we used the function with options λ=40 and d=10. The resulting breakpoints, together with the original signal intensities and the genome annotations for the target region, were visualized using the GenomeGraphs20 and rtracklayer21 packages.
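The random placement of multi-mapping probes can be sketched as follows. The probe IDs, coordinates and signals are hypothetical, and this only illustrates the assignment step in plain Python; the segmentation itself was done with GLAD in R and is not reproduced here.

```python
import random


def assign_ambiguous_probes(probe_hits, seed=None):
    """Place every probe at exactly one genomic position.

    probe_hits maps probe ID -> (candidate positions, log2 signal).
    Unique probes keep their only position; a multi-mapping probe is
    placed at one candidate chosen uniformly at random, which spreads
    ambiguous measurements evenly over the homologous TRI copies.
    Returns (position, probe ID, signal) tuples sorted by position.
    """
    rng = random.Random(seed)
    placed = []
    for probe_id, (positions, signal) in probe_hits.items():
        position = positions[0] if len(positions) == 1 else rng.choice(positions)
        placed.append((position, probe_id, signal))
    placed.sort()
    return placed


# Hypothetical probes: p1 is unique, p2 and p3 match all three TRI copies
hits = {
    "p1": ([1_000], 0.02),
    "p2": ([12_000, 22_000, 32_000], 0.40),
    "p3": ([12_500, 22_500, 32_500], 0.38),
}
for position, probe_id, signal in assign_ambiguous_probes(hits, seed=1):
    print(position, probe_id, signal)
```

Fixing the seed makes a given assignment reproducible; over many probes, the random choice avoids stacking all ambiguous signal onto the first homologue.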

Results

Using the NM-CGH array, we identified frequent CNVs of the NEB TRI, some of which we interpret as benign and others as pathogenic. NEB TRI CNVs were identified in 13% of the studied families (26/196) with NM or an NM-related myopathy (hereafter referred to as NM families). Specifically, 5% (9/196) showed a loss and 8% (16/196) a gain of the NEB TRI (Table 2). In addition, one family (F268) included members with a gain, a loss or the normal copy number of the NEB TRI (Figure 1 and Table 4). NEB TRI variation was also identified in 10% of the control samples (6/60; Table 2). Based on this study, we suggest that one-copy losses or gains are benign and that gains of 2–4 copies may be pathogenic.

Table 2 Summary of NEB TRI variations identified with the NM-CGH microarray in the NM family and control cohorts
Figure 1

(a–g) NM-CGH profiles of NEB in family 268. The index case (c; typical NM) has inherited a NEB TRI gain from the mother (b) and a NEB frameshift variant (c.2784delT) from the father (a; the frameshift variant is not detectable with the NM-CGH array). The father and two siblings (e and f) have a TRI deletion, and one brother (d) has the normal NEB TRI copy number. The index patient has the typical form of NM and all other family members are unaffected. The family pedigree is shown in g.

Table 4 NM families carrying possibly pathogenic variation of the NEB TRI

All the NEB TRI deletions identified in the NM families and controls were losses of one TRI copy, that is, 5/6 copies in total. One-copy losses were identified in five control samples (5/60) and in nine NM families (9/196). In four of the NM families, the disease-causing variants, either in NEB or in another NM-causing gene, had been published previously. Novel disease-causing variants are reported here for families F2 and F411. In family F2, the patients have inherited a splice site variant, c.508-7T>A, which was shown at the RNA level to activate a cryptic acceptor splice site in intron 7, causing an insertion of five nucleotides and a frameshift (p.Val170fs). In family F411, the patient is compound heterozygous for two different frameshift variants, c.24372_24375del in exon 172 and c.13134del in exon 86. In three families, one or both disease-causing variants remain unknown (Table 3).
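Whether an indel shifts the reading frame follows from simple arithmetic: a net change in coding length that is not a multiple of three is a frameshift. The minimal sketch below handles only the simple `del`/`dup` forms used in this paper (it is not a general HGVS parser), and the cryptic-splice effect in family F2 is entered directly as its reported +5-nt change.

```python
import re

# Simple coding-DNA deletions/duplications, e.g. c.2784delT or c.24475_24479dupCACAA
SIMPLE_INDEL = re.compile(r"^c\.(\d+)(?:_(\d+))?(del|dup)([ACGT]*)$")


def length_change(hgvs):
    """Net change in coding-sequence length (nt) for a simple del/dup."""
    match = SIMPLE_INDEL.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs}")
    start = int(match.group(1))
    end = int(match.group(2) or start)
    span = end - start + 1
    bases = match.group(4)
    # If the affected bases are written out, they must fit the interval
    if bases and len(bases) != span:
        raise ValueError(f"stated bases do not match the interval: {hgvs}")
    return -span if match.group(3) == "del" else span


def is_frameshift(net_change):
    """True when the net length change is not a multiple of three."""
    return net_change % 3 != 0


variants = ["c.2784delT", "c.20446delC", "c.12048_12049delGA",
            "c.24372_24375del", "c.13134del", "c.24475_24479dupCACAA"]
for v in variants:
    change = length_change(v)
    print(v, change, "frameshift" if is_frameshift(change) else "in-frame")

# Family F2: use of the cryptic acceptor inserts 5 nt into the transcript
print("c.508-7T>A effect", +5, is_frameshift(5))
```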

Table 3 NM families carrying benign variation of the NEB TRI

In the control cohort, a gain of one additional copy, that is, 7/6 copies, was identified in one sample (1/60; Table 2). Among the NM families, however, there was a wider spectrum of copy number gains (Tables 2, 3 and 4). Almost half (8/17) of the gains identified in the NM samples comprised two to four additional copies, that is, 8–10/6 copies in total (Tables 2 and 4). Gains of this size were not identified in the control cohort. The patients in nine NM families carried one-copy gains. In one of these families, the disease-causing variants had previously been identified in LMOD3 (Index Patient 3353, Table 3). In another family (F407), exome sequencing revealed a NEB frameshift variant, c.24475_24479dupCACAA in exon 173, segregating on the same allele as the TRI gain (Index Patient 4073, Table 3). The healthy consanguineous parents each carry one allele with both the one-copy TRI gain and the NEB frameshift variant, and the patient is homozygous for the frameshift variant and the TRI gain. In a third family (F321), the NM-CGH array revealed, in addition to the NEB TRI one-copy gain, a novel complex deletion–insertion rearrangement involving NEB exons 69–71 and intron 65 (Index Patient 3213, Table 3). The second variant remained unidentified in this family. In seven of the families with a one-copy TRI gain, one or both disease-causing variants remain unknown.

TRI copy gains of two or more copies were identified in patients of eight NM families. In five of these families, another heterozygous pathogenic NEB variant had been identified previously or is published here (Table 4). We had samples from all family members in three of these families and these are described in more detail below.

Family 268

The index case 2683 (typical NM) has inherited a NEB TRI four-copy gain from the mother and a NEB exon 28 frameshift variant (c.2784delT) from the father. The unaffected father carries the frameshift variant (c.2784delT) on one allele and a TRI deletion (one-copy loss) on the other, which suggests that one-copy TRI deletions are not pathogenic. We suggest that the four-copy TRI gain carried by the healthy mother and inherited by the index patient might be disease-causing (Figure 1). The family pedigree (Figure 1g) shows that the NEB TRI gain segregates with the disorder: only the combination of the NEB TRI gain and the frameshift variant results in the NM phenotype.

Family 7

The index case 573 (mild NM) has inherited a NEB TRI three-copy gain from the father and a frameshift variant in NEB exon 134 (c.20446delC) from the mother. We suggest that the three additional TRI copies might be pathogenic. The parents and sibling are unaffected mutation carriers.

Family 272

The index case 2723 (other form of NM) has inherited a NEB TRI four-copy gain from the father and a frameshift variant in NEB exon 81 (c.12048_12049delGA) from the mother. We suggest that the four additional TRI copies might be pathogenic. The parents are unaffected mutation carriers.

Exome sequencing of samples from the index patients in all three families (F268, F7 and F272) did not reveal any variants expected to seriously affect gene function in any of the other currently known NM-causing genes.

In summary, copy numbers in the control samples deviated from normal by at most one copy, whereas larger deviations were found only in NM families. However, the difference is not statistically significant (Fisher’s exact test; Table 2). Based on this study, NEB TRI CNVs are the most common large recurrent NEB variation characterized to date.

Breakpoint analysis

Breakpoint analyses were done for all samples that showed TRI CNVs. The NM-CGH array results indicate that the breakpoint regions lie at the same locations for both gains and losses (Figure 2). All variants start in intron 105, but the exact breakpoint cannot be determined because this intron contains a probe gap: no unique probes could be designed across its repeat elements. The end region seems to differ slightly between samples but always lies within intron 81. Repeat elements reside in the corresponding introns of each TRI copy: LINEs (CR1 and L2) in introns 89, 97 and 105, long-terminal repeats (ERVL) in introns 82, 90 and 98, and DNA transposon fossils (MER113A) in introns 88, 96 and 104.

Figure 2

(a and b) Breakpoint analysis of NEB TRI copy number loss and gain. Sample 526 from Family 2 shows a one-copy loss (a) and sample 2721 from Family 272 shows a four-copy gain (b) of the NEB TRI. Breakpoint analysis shows repeat elements in the breakpoint regions. In the upper part of the Figure, the inferred breakpoints are shown with vertical lines and the inferred copy number of the corresponding fragment is indicated with horizontal lines. The tracks in the lower part of the Figure show the different repeat elements, the NEB triplicate region and the gene structure. DNA, DNA repeat elements; LTR, long-terminal repeat; Low Compl, low complexity DNA sequences; SSR, simple sequence repeats (microsatellite DNA).

Discussion

In the current study, 13% (26/196) of the studied NM families showed a NEB TRI CNV. The identified losses were deletions of one copy, which did not seem to segregate with the disorder and were found in the control population (8%) more frequently than in the NM families (5%). We estimate that the loss of one copy would cause a 1458 bp shortening of various transcripts and a 486-amino-acid shortening of the translated protein. One TRI copy encodes 486 amino acids corresponding to two nebulin super repeats.4 The two remaining TRI copies would be enough for the allele to produce a functional protein.

NEB transcripts of many different lengths are known to be produced in normal muscle, and therefore this alteration might not have a drastic effect on the thin actin filament structure of the sarcomere.22 In family 268, the father showed a one-copy loss of TRI and a frameshift variant on the other allele, and he is healthy. In six families, the TRI deletion has also been found together with two identified pathogenic variants (Table 3). In addition, it is frequent in controls (8%). Thus, one-copy deletions of NEB TRI are unlikely to have a serious impact on protein function.

The identified gains have been categorized into two groups: gain of one TRI copy, which we interpret to be non-pathogenic, and gains of two to four TRI copies, which we interpret to be pathogenic. In one family (Index Patient 3353, Table 3) with a one-copy gain, two other causative variants in another NM-causing gene, LMOD3, had previously been identified, and in another family (Index Patient 4073, Table 3), we found a frameshift variant in NEB exon 173 segregating with the one-copy gain. Also, one control sample was identified with one TRI copy gain (1/60). Thus, it seems that deviations of one TRI copy in an allele, be it loss or gain, are tolerated.
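The interpretation proposed in this study can be written as a simple rule on the total TRI copy number. This is a sketch of the study’s tentative categories, not a validated clinical classifier; totals outside the range observed here are deliberately left uncategorized, and a normal total can still hide a loss on one allele and a gain on the other.

```python
NORMAL_TRI_COPIES = 6  # three TRI copies per allele


def interpret_tri_total(total_copies, normal=NORMAL_TRI_COPIES):
    """Map a total NEB TRI copy number to the categories suggested in this study."""
    delta = total_copies - normal
    if delta == 0:
        return "normal copy number"
    if abs(delta) == 1:
        return "likely benign (one-copy loss or gain)"
    if 2 <= delta <= 4:
        return "possibly pathogenic (gain of 2-4 copies)"
    # Losses of two or more copies and gains above four were not seen here
    return "not observed in this study"


for total in range(4, 12):
    print(total, interpret_tri_total(total))
```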

However, 4% (8/196) of the NM families had patients with gains of more than one TRI copy (8–10/6 copies in total). In none of these families had both disease-causing NEB variants been identified previously. In the families with parental samples available for analysis, the TRI CNV segregated with the disorder. Furthermore, this type of aberration was not encountered among the control samples. We speculate that gains of more than one copy may disrupt the stability or secondary structure of the mRNA, or the translation process. If so, no nebulin protein would be produced from this allele.

If, on the other hand, RNA processing and translation were to proceed, the TRI gain would increase the length of the transcript and, in turn, of the nebulin protein. Each additional TRI copy would add 1458 nucleotides to the mRNA and 486 amino acids to the protein. Therefore, a gain of four copies would add 5.8 kb to the mRNA and 1944 amino acids, or eight super repeats, to the protein.
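The length arithmetic for TRI losses and gains can be checked in a few lines. The constants are taken from the text (1458 coding nucleotides, 486 amino acids and two super repeats per TRI copy); the sketch assumes that added or removed copies stay in frame and are otherwise processed normally.

```python
TRI_COPY_NT = 1458          # coding nucleotides per TRI copy
SUPER_REPEATS_PER_COPY = 2  # nebulin super repeats encoded by one TRI copy

# 1458 is a multiple of 3, so whole-copy changes keep the reading frame
assert TRI_COPY_NT % 3 == 0


def tri_length_change(copy_delta):
    """Change in mRNA length (nt), protein length (aa) and super repeats."""
    nt = copy_delta * TRI_COPY_NT
    return nt, nt // 3, copy_delta * SUPER_REPEATS_PER_COPY


for delta in (-1, 1, 4):
    nt, aa, super_repeats = tri_length_change(delta)
    print(f"{delta:+d} copies: {nt:+d} nt, {aa:+d} aa, {super_repeats:+d} super repeats")
```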

The breakpoints of the TRI variations are difficult to characterize further. Normally, the entire TRI region is ~32 kb in size; each one-copy gain would add ~10 kb to the gene, whereas a one-copy loss would reduce the region to ~20 kb. The homology within the TRI region has proven very challenging, especially for PCR-based studies. The last introns of each TRI repeat (introns 89, 97 and 105) contain repeat elements, such as Alu and LINE repeats. Such transposable elements can mediate NAHR, which is thought to be the most common mechanism underlying recurrent CNVs.15, 16 Given the limited data on these TRI variations, the exact causative mechanism is difficult to establish, but NAHR seems plausible. The human NEB TRI is thought to have emerged from two duplication events. This is supported by the fact that the mouse Neb gene contains only one copy of this region (exons 82–89) and lacks the LINE-L2 elements as well as the exons corresponding to human NEB exons 90–106.4 These repeat elements might explain the susceptibility of the NEB TRI region to the recurrent copy number changes discovered in this study.
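To make the proposed NAHR mechanism concrete, unequal crossing over between misaligned TRI copies can be modelled as a toy list operation. This abstracts away the actual repeat elements and breakpoint positions; it illustrates the general mechanism and is not evidence for how the TRI CNVs in this study arose.

```python
def unequal_crossover(allele1, allele2, i, j):
    """Reciprocal products of a crossover between copy i of allele1 and copy j of allele2.

    Alleles are lists of TRI copy labels. When i != j the copies are
    misaligned, so one product gains copies and the other loses them.
    Each product carries one hybrid copy at the crossover site.
    """
    product1 = allele1[:i] + [f"{allele1[i]}|{allele2[j]}"] + allele2[j + 1:]
    product2 = allele2[:j] + [f"{allele2[j]}|{allele1[i]}"] + allele1[i + 1:]
    return product1, product2


allele_a = ["A1", "A2", "A3"]  # three TRI copies per normal allele
allele_b = ["B1", "B2", "B3"]
for i in range(3):
    for j in range(3):
        p1, p2 = unequal_crossover(allele_a, allele_b, i, j)
        print(f"copy {i + 1} x copy {j + 1}: products with {len(p1)} and {len(p2)} copies")
```

In this toy model, a single event between two three-copy alleles yields reciprocal products whose copy numbers always sum to six, ranging from one to five copies per allele; it says nothing about how the larger gains observed here arose.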

The NM-CGH microarray is an effective and easy method for revealing NEB TRI CNVs. However, if one allele carries a one-copy deletion and the other a one-copy duplication, the total copy number appears normal and such cases remain unidentified. Because the normal copy number of the NEB TRI is six instead of two, it can be difficult to distinguish between subtle differences in size, for example, between gains of three and four copies, especially if the DNA quality is poor. In addition, the NM-CGH array cannot determine which allele harbours the CNV; parental samples are therefore needed to establish the origin of an identified variant. For duplications, determining the location or orientation of the duplicated region requires alternative methods.
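The allele-level ambiguity follows from the fact that the array measures only the diploid total. The sketch below lists the per-allele combinations consistent with a given total; the bounds of one to ten copies per allele are arbitrary assumptions for illustration.

```python
def allele_combinations(total, min_copies=1, max_copies=10):
    """Unordered per-allele TRI copy numbers (x, y), x <= y, with x + y == total."""
    combos = []
    for x in range(min_copies, total // 2 + 1):
        y = total - x
        if min_copies <= y <= max_copies:
            combos.append((x, y))
    return combos


for total in (6, 9, 10):
    print(total, allele_combinations(total))
```

A normal total of six is compatible with (2, 4), a one-copy loss on one allele and a one-copy gain on the other, which is why such cases would be missed and why parental samples are needed to assign a CNV to an allele.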

Conclusions and future prospects

The NM-CGH microarray has revealed a novel, recurrent CNV of the NEB triplicate region, apparently the most common large recurrent variant in NEB. The array is an easy and effective method for detecting these CNVs. We will continue to use the NM-CGH array as a first-tier diagnostic method for new NM families and to analyze samples from families and patients whose causative variants have remained unidentified.