Introduction

Adult-onset neuronal ceroid lipofuscinoses (ANCL) constitute a group of rare genetic diseases characterized clinically by the progressive deterioration of mental and motor functions and histopathologically by the intracellular and ultrastructurally distinct accumulation of autofluorescent lipopigment—ceroid—in the brain and other tissues. Age of onset, spectrum of neurological phenotypes, and disease progression can vary even within families. Clinical heterogeneity of ANCLs is in line with diverse inheritance patterns, increasing number of identified causal genes (e.g., DNAJC5 [1], CLN6 [2], CTSF [3], GRN [4], CLN1 [5], CLN5 [6], ATP13A2 [7]), and various types of causative variants and their combinations (NCL Resource—A Gateway for Batten Disease: http://www.ucl.ac.uk/ncl/newnomenclature.shtml).

Diagnosis of ANCLs is challenging from the clinical, histopathological, and diagnostic perspectives. Even with recent technological advances [8, 9], the causative genetic variant(s) in some ANCL families have not been identified. To improve genetic diagnosis in these families, we recently established The Adult NCL Gene Discovery Consortium. Within the Consortium, we reviewed clinical and histopathological data and classified recruited cases as definite, probable, or possible ANCL, or as not meeting the diagnostic criteria for ANCL [10]. ANCL cases were then subjected to candidate gene and whole-exome sequencing (WES) [11, 12].

Here we report and characterize a new variant, a 30 base pair in-frame duplication in DNAJC5, that we identified in one of the investigated ANCL families. The identification of this variant was particularly challenging. It was initially missed by both Sanger sequencing of DNAJC5 and WES, and was identified only later by reanalysis of the original WES data that were shared within the Consortium. Our work thus also provides a cautionary tale about the challenges in identifying even relatively short insertions and duplications by standard genetic methods.

Materials and methods

Subjects

The study protocol was approved by the local Institutional Review Boards and signed informed consent was obtained from all subjects.

The Canadian family was ascertained at the Montreal Neurological Institute, McGill University, Canada, based on clinical observation of three affected individuals: a mother and two sons. The mother was diagnosed with Kufs disease at the age of 42 and died at the age of 56 (no clinical details are available). The affected sons presented with seizures, memory loss, and disability (wheelchair bound) at the ages of 31 and 34. No biopsy material was available at the time of investigation to examine the tissues of affected individuals for the presence of typical lipopigment.

DNA sequencing and variant analysis

Genomic DNA of the two brothers was extracted from whole blood samples by a standard protocol. Coding regions of DNAJC5 (NG_029805.2) were amplified by PCR from genomic DNA of the two brothers and sequenced by direct Sanger sequencing using the version 3.1 Dye Terminator cycle sequencing kit (ThermoFisher Scientific) with electrophoresis on an ABI 3500XL Avant Genetic Analyzer (ThermoFisher Scientific). Data were analyzed using Sequencing Analysis software (ThermoFisher Scientific).

Exome sequencing

Exome sequencing was performed using genomic DNA from the two affected brothers (Fig. 1). For DNA enrichment, the Sure Select Human All Exon V4 capture kit (Agilent Technologies, Santa Clara, CA) was used according to the manufacturer’s protocol. The captured, barcoded DNA library was sequenced on an Illumina HiSeq 2000 system as a paired-end library with a read length of 100 bp. The resulting FASTQ files were aligned to the human reference genome (hg19) with BWA-MEM [13]. After genome alignment, conversion from SAM to BAM format and duplicate removal were performed using Picard Tools (1.129). The Genome Analysis Toolkit, GATK (3.2.2) [14, 15, 16], was used for local realignment around indels, base recalibration, variant recalibration, and variant calling (HaplotypeCaller). Variant annotation was performed with SnpEff 3.6 [17] and GEMINI 0.18.2 [18].
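Aligners such as BWA-MEM record read ends that do not align to the reference as soft-clipped (S) operations in the SAM CIGAR string. The following minimal Python sketch shows how soft-clipped bases could be tallied per read when screening a region for insertions or duplications. It is an illustration only: the CIGAR strings are hypothetical, the function names and the 10-base cut-off are assumptions, and it is not part of the pipeline described above.

```python
import re

# One CIGAR operation is a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar: str) -> int:
    """Return the total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op == "S")

def clipped_fraction(cigars, min_clip: int = 10) -> float:
    """Fraction of reads carrying at least `min_clip` soft-clipped bases.

    The default cut-off of 10 bases is an arbitrary, illustrative choice.
    """
    if not cigars:
        return 0.0
    flagged = sum(1 for c in cigars if soft_clipped_bases(c) >= min_clip)
    return flagged / len(cigars)

# Hypothetical 100 bp reads: two fully aligned, two with clipped ends.
example_cigars = ["100M", "70M30S", "25S75M", "100M"]
print([soft_clipped_bases(c) for c in example_cigars])  # [0, 30, 25, 0]
print(clipped_fraction(example_cigars))                 # 0.5
```

A locally elevated fraction of clipped reads is only a hint that a structural change may be present; the reads still need to be inspected and the finding confirmed by an independent method.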

Fig. 1: Sanger sequencing and read alignment of DNAJC5 bearing the 30 bp duplication.

a Pedigree of the Canadian family suggesting autosomal dominant inheritance. b Chromatograms of DNAJC5 genomic DNA sequences showing normal DNAJC5 sequence in the proband with the original protocol (Proband_sample 1) and a heterozygous duplication in the same DNA sample with the modified PCR protocol (Proband_sample 2). The lower panel shows the chromatogram from control DNA. c The 30 bp duplication in DNAJC5 in the Integrative Genomics Viewer (IGV 2.3) before (upper panel) and after (lower panel) visualization of soft-clipped bases. d In silico analysis of the cysteine-string domain showing that, compared with the wild-type sequence (blue line), the duplication (red line) alters the palmitoylation potential (left panel) and hydrophobicity profile (right panel), critical parameters for the post-translational modification and intracellular localization of CSPα.

In silico analysis of the cysteine-string domain

Hydrophobicity and palmitoylation potential of the wild-type (wt) cysteine-string domain (CSD) and the CSD carrying the DNAJC5 variant were analyzed with a Kyte–Doolittle algorithm and CSS-Palm 2.0, respectively, as described previously [1].
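A Kyte–Doolittle profile is a sliding-window average of per-residue hydropathy indices. The sketch below illustrates the calculation in Python. The window size of 5 and the two toy peptides are assumptions made for illustration; they are not the parameters or the CSD sequences used in this study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 5):
    """Mean Kyte-Doolittle hydropathy over a sliding window along `seq`."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

# Hypothetical toy peptides, NOT the CSPα cysteine-string domain.
toy_wt = "GSCCCLG"
toy_dup = "GSCCCCCCLG"  # toy_wt with the CCC stretch repeated once

print(max(hydropathy_profile(toy_wt)))
print(max(hydropathy_profile(toy_dup)))
```

In this toy case, repeating the cysteine stretch raises the peak of the profile, which is the kind of shift a duplication of a cysteine-rich motif would be expected to produce.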

CSPα-expression vectors

DNAJC5/CSPα cDNA was amplified by RT-PCR from the affected individuals’ leukocytes with specific primers. Resulting PCR products were first cloned into PCR2.1 TOPO TA-cloning vector (Invitrogen) and, after sequencing verification, the cDNA region containing the 30 bp duplication was subcloned into a pEGFP-C1/DNAJC5 wt vector using BstXI and BsmBI restriction sites. The pEGFP-C1/DNAJC5_wt, pEGFP-C1/DNAJC5_Leu115Arg, and pEGFP-C1/DNAJC5_Leu116del vectors were generated as described previously [1].

Transient expression of EGFP–CSPα

cDNA constructs were transfected into CAD5 cells derived from Cath.a-differentiated (CAD) cells (provided by Sukhvir Mahal, The Scripps Research Institute, Jupiter, FL, USA). Four to seven days before transfection, 1 × 10⁴ cells/cm² were seeded in Opti-MEM medium (Invitrogen) containing 10% FBS (HyClone, Logan, UT) and penicillin/streptomycin (90 units/ml). Cells were transfected with either 0.8 or 4 μg of plasmid construct using Lipofectamine 2000 (Invitrogen) in serum- and antibiotic-free Opti-MEM medium according to the manufacturer’s protocol. Transfection experiments were performed in more than three replicates.

Immunofluorescence analysis

Cells were fixed 24 h after transfection with chilled methanol for 10 min, washed, blocked with 5% bovine serum albumin (BSA), and incubated for 1 h at 37 °C with anti-protein disulfide isomerase mouse monoclonal IgG1 (Stressgen, San Diego, CA) for endoplasmic reticulum (ER) localization, anti-GS28 mouse IgG1 (Stressgen, San Diego, CA) for Golgi localization, and anti-GFP rabbit polyclonal IgG (Abcam) for EGFP-CSPα detection. For fluorescence detection, the corresponding species-specific secondary antibodies conjugated to Alexa Fluor 488 and Alexa Fluor 555 (Molecular Probes, Invitrogen, Paisley, UK) were used. Slides were mounted in ProLong® Gold Antifade fluorescence mounting medium with 4ʹ,6-diamidino-2-phenylindole for nuclear staining (Life Technologies, Foster City, USA) and analyzed by confocal microscopy.

Image acquisition and analysis

XYZ images were sampled according to the Nyquist criterion using a Leica SP8X laser scanning confocal microscope with an HC PL APO objective (63×, N.A. 1.40) and the 405, 488, and 543 nm laser lines. Images were restored using the classic maximum likelihood restoration algorithm in the Huygens Professional Software (SVI, Hilversum, The Netherlands) [19]. Colocalization maps based on single-pixel overlap coefficient values ranging from 0 to 1 were created in the Huygens Professional Software [20]. The resulting overlap coefficient values are presented in pseudo color as denoted in the corresponding lookup tables.
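To illustrate what an overlap coefficient on a 0 to 1 scale measures, the sketch below computes the standard global Manders overlap coefficient for two intensity channels in Python. This is an assumption-laden illustration: the per-pixel maps in this study were produced in the Huygens Professional Software, whose exact calculation is not reproduced here, and the pixel values are hypothetical.

```python
import math

def overlap_coefficient(ch1, ch2) -> float:
    """Manders' overlap coefficient for two equally sized intensity sequences.

    r = sum(a*b) / sqrt(sum(a^2) * sum(b^2)); for non-negative intensities
    the value lies between 0 (no overlap) and 1 (proportional signals).
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    numerator = sum(a * b for a, b in zip(ch1, ch2))
    denominator = math.sqrt(sum(a * a for a in ch1) * sum(b * b for b in ch2))
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical flattened pixel intensities for a green and a red channel.
green = [0, 10, 50, 80, 5]
red_same = [0, 20, 100, 160, 10]   # proportional to green
red_apart = [90, 0, 0, 0, 0]       # signal only where green is absent

print(overlap_coefficient(green, red_same))   # close to 1
print(overlap_coefficient(green, red_apart))  # 0.0
```

Because the coefficient is insensitive to a uniform scaling of either channel, it reflects the spatial coincidence of the two signals rather than their absolute brightness.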

Immunoblot analysis

Transfected CAD5 cells were harvested in PBS, centrifuged at 610 × g for 5 min, and resuspended either in 50 mM Tris pH 6.8, 50 mM DTT, 2% sodium dodecyl sulfate (SDS), and Complete Protease Inhibitor Cocktail (Roche) or in PBS with 0.1 or 0.5% Triton X-100 and Complete Protease Inhibitor Cocktail (Roche). Cells were homogenized by sonication using the Covaris S2 Ultrasonicator, followed by denaturation at 100 °C for 10 min. The protein content in the supernatant was determined using a Direct Detect infrared spectrometer (Millipore) according to the manufacturer’s protocol. Protein lysates equivalent to 15 or 20 μg of protein were incubated with or without 6 M hydroxylamine for CSPα depalmitoylation for 24 h at room temperature and reduced at 100 °C for 5 min in sample buffer with or without 1% beta-mercaptoethanol (βME) before SDS-PAGE. After protein transfer to a polyvinylidene fluoride membrane, membranes were blocked with 5% milk and 0.1% Tween 20 in PBS overnight at 4 °C. CSPα or EGFP-CSPα protein was visualized by incubation with rabbit CSP antibody (Stressgen) at 1:5000 in 0.1% BSA and 0.1% Tween 20 in PBS for 60 min or with rabbit green fluorescent protein (GFP) antibody (Abcam) at 1:3000 in 0.1% BSA and 0.1% Tween 20 in PBS for 60 min, followed by incubation with goat anti-rabbit HRP (Pierce) at 1:10,000 in 0.1% Tween 20 in PBS for 60 min and detection with Clarity Western ECL Substrate (Bio-Rad).

Results

Identification of a 30 base pair duplication in DNAJC5 by a combination of exome sequencing and Sanger sequencing

To identify the genetic lesion in affected family members, we initially Sanger sequenced and excluded DNAJC5, the prevalent gene for autosomal dominant ANCL (AD-ANCL) (Fig. 1b). Next, we sequenced all coding exons and the 5ʹ and 3ʹ untranslated regions (UTRs) of their corresponding mRNAs (Sure Select Human All Exon V4 capture kit, Illumina HiSeq 2000) in both affected brothers. Assuming an autosomal dominant model of inheritance, we searched for variants that met the standard read count threshold of ≥10, were present in the heterozygous state in both affected individuals, and had a minor allele frequency <0.5% in the Exome Aggregation Consortium database [21]. These parameters did not yield any functionally relevant candidate variant. After lowering the read count threshold to ≥5, we found a 30 bp in-frame duplication in DNAJC5. The variant was, however, not visible in the default view of the sequence alignments in IGV. Essential for its detection was the visualization of so-called soft-clipped bases, which are the portions of reads that do not align to the reference sequence. With soft-clipped bases displayed, we revealed the 30 bp in-frame duplication chr20:g.62562252_62562281dup (hg19); NM_025219.2:c.370_399dup (p.(Cys124_Cys133dup)) in exon 4 of DNAJC5 (Fig. 1c). We then modified our original PCR protocol and confirmed the presence of the duplication by standard Sanger sequencing (Fig. 1b, Proband_sample 2). The variant was submitted to the ClinVar database under accession code VCV000689476 and to the Mutation and Patient Database for Human NCL genes [22].
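The filtering criteria described above (read count threshold, heterozygosity in both affected brothers, and minor allele frequency <0.5%) can be expressed as a short Python sketch. Everything in it is hypothetical: the record layout, the variant labels, the read counts, and the interpretation of "read count" as reads supporting the variant are assumptions for illustration, not the annotation schema or the data of this study.

```python
from dataclasses import dataclass

@dataclass
class Call:
    variant: str     # hypothetical variant label
    reads: dict      # sample -> reads supporting the variant (assumed meaning)
    genotype: dict   # sample -> "het", "hom" or "ref"
    exac_maf: float  # minor allele frequency in ExAC (0.0 if absent)

def dominant_candidates(calls, samples, min_reads=10, max_maf=0.005):
    """Keep rare variants that are heterozygous, with enough reads, in every sample."""
    kept = []
    for call in calls:
        if call.exac_maf >= max_maf:
            continue  # too common for a rare dominant disease
        if all(
            call.genotype.get(s) == "het" and call.reads.get(s, 0) >= min_reads
            for s in samples
        ):
            kept.append(call.variant)
    return kept

# Hypothetical calls for two affected brothers.
samples = ["brother_1", "brother_2"]
example_calls = [
    Call("var_A", {"brother_1": 40, "brother_2": 35},
         {"brother_1": "het", "brother_2": "het"}, 0.20),
    Call("var_B", {"brother_1": 22, "brother_2": 0},
         {"brother_1": "het", "brother_2": "ref"}, 0.0),
    Call("var_C", {"brother_1": 6, "brother_2": 7},
         {"brother_1": "het", "brother_2": "het"}, 0.0),
]

print(dominant_candidates(example_calls, samples, min_reads=10))  # []
print(dominant_candidates(example_calls, samples, min_reads=5))   # ['var_C']
```

In the toy data, a shared rare heterozygous call supported by only a handful of reads passes the filter only once the threshold is relaxed, at the cost of admitting more low-confidence calls that then require manual review.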

In silico analysis of the novel CSPα c.370_399dup (p.(Cys124_Cys133dup)) variant

The variant NM_025219.2:c.370_399dup (p.(Cys124_Cys133dup)) results in a duplication of the central core motif of the CSD of CSPα. The CSD is implicated in palmitoylation and membrane trafficking of CSPα. In silico analysis suggested that the duplication increases the hydrophobicity of the CSD (Fig. 1d, right panel) and that the presence of the additional seven cysteine residues changes the palmitoylation potential (Fig. 1d, left panel). Changes in these parameters can make the protein carrying the p.(Cys124_Cys133dup) variant prone to aggregation [23].
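The reported genomic, coding, and protein coordinates can be checked against each other with simple arithmetic. The Python sketch below does so using only the positions quoted in the variant description; the helper names are illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Number of positions in an inclusive, 1-based interval."""
    return end - start + 1

def codon_of(cds_pos: int) -> int:
    """1-based codon number containing a 1-based coding (c.) position."""
    return (cds_pos - 1) // 3 + 1

genomic_len = span_length(62562252, 62562281)  # chr20:g.62562252_62562281dup
coding_len = span_length(370, 399)             # c.370_399dup
first_codon, last_codon = codon_of(370), codon_of(399)

print(genomic_len, coding_len)       # 30 30
print(coding_len % 3 == 0)           # True: the duplication is in frame
print(first_codon, last_codon)       # 124 133
print(last_codon - first_codon + 1)  # 10 duplicated residues
```

Both intervals span 30 bases, the length is a multiple of three, and the coding interval starts at the first base of codon 124 and ends at the last base of codon 133, consistent with p.(Cys124_Cys133dup).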

The functional effect of the GFP-tagged CSPα p.(Cys124_Cys133dup) variant in the neuronal CAD5 cell model

To assess the effect of the identified duplication on CSPα expression and intracellular localization, we transiently expressed the following N-terminally GFP-tagged proteins in CAD5 cells: CSPα with the identified duplication p.(Cys124_Cys133dup), wt CSPα (GFP_CSPα_wt), and GFP_CSPα with the previously identified variants NM_025219.2:c.344T>G (p.(Leu115Arg)) and NM_025219.2:c.343_345del (p.(Leu116del)), hereafter p.(Leu115Arg) and p.(Leu116del). Both of these variants are located in exon 4 of DNAJC5 (exons are numbered from 1 to 5). Immunofluorescence analysis and colocalization studies showed that GFP_CSPα_wt and the endogenous CSPα are localized predominantly along the plasma membrane in finely granular cytoplasmic structures. All three GFP_CSPα proteins with the variants p.(Cys124_Cys133dup), p.(Leu115Arg), and p.(Leu116del) showed reduced presence at the plasma membrane. They were present mostly in the cytoplasm, either in a diffuse form or as coarsely granular inclusions that colocalized to a certain extent with markers of the ER and Golgi apparatus (Fig. 2, Supplementary Fig. 1).

Fig. 2: Immunofluorescence analysis of transiently expressed GFP-tagged CSPα wt and variant proteins in CAD5 cells.

All three variant proteins (a–c, m–o) are present in finely or coarsely granular structures. Co-staining with (e–g) protein disulfide isomerase (PDI), a marker of the endoplasmic reticulum (ER), and (q–s) Golgi SNAP receptor complex member 1 (GS28) demonstrates the abnormal presence of mutated proteins in the ER (i–k) and Golgi (u–w). Wild-type protein (d, h, l, p, t, x) is present exclusively on the plasma membrane. The degree of colocalization of GFP_CSPα with the selected markers is demonstrated by fluorescent signal overlap coefficient values ranging from 0 to 1. The resulting overlap coefficient values are presented in pseudo color, whose scale is shown in the corresponding lookup table.

Immunoblot analysis of GFP-CSPα transiently produced in CAD5 cells

To assess the effect of the identified duplication, we performed western blot analysis of lysates from transiently transfected CAD5 cells before and after chemical depalmitoylation under different denaturing conditions (Fig. 3). We found that the GFP_CSPα protein carrying p.(Cys124_Cys133dup) is present exclusively in the nonpalmitoylated form, whereas the GFP_CSPα_Leu115Arg and GFP_CSPα_Leu116del proteins are present in both the nonpalmitoylated and the palmitoylated form, with the former more abundant. All three proteins, GFP_CSPα_Leu115Arg, GFP_CSPα_Leu116del, and GFP_CSPα_Cys124_Cys133dup, formed high molecular weight aggregates that were resistant to SDS and reducing agents (DTT, βME). The aggregates were solubilized by these treatments only after initial chemical depalmitoylation with hydroxylamine (Fig. 3).

Fig. 3: Immunoblot analysis of transiently expressed GFP-tagged CSPα proteins in CAD5 cells.

Immunodetection using the CSP antibody reveals that the GFP_CSPα_p.(Cys124_Cys133dup) protein is present exclusively in the nonpalmitoylated form (CSPα nonpalm); the asterisk (*) marks the palmitoylated form (CSPα palm). All three GFP_CSPα proteins with the variants p.(Leu115Arg), p.(Leu116del), and p.(Cys124_Cys133dup) form high molecular weight aggregates (**) resistant to sodium dodecyl sulfate (SDS) and reducing agents (DTT, βME). The aggregates are soluble only after initial chemical depalmitoylation with hydroxylamine (NH2OH).

Discussion

In this work we identified a 30 bp duplication in DNAJC5, which encodes CSPα, in one family ascertained by The Adult NCL Gene Discovery Consortium. The variant leads to a duplication of the central core motif of the CSD and affects palmitoylation-dependent CSPα sorting in cultured neuronal cells in a manner similar to the two previously described CSPα variants p.(Leu115Arg) and p.(Leu116del). CSPα acts as a co-chaperone in the formation of presynaptic SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) complexes [24]. The SNAREs are essential for the docking of synaptic vesicles and for their fusion and recycling. There is accumulating evidence that disruption of the SNARE machinery leads to neurodegeneration [25].

This family remained genetically undefined for decades. Initially, the variant could not be detected by standard Sanger sequencing of DNAJC5, probably because of preferential PCR amplification of the shorter wt allele and the resulting allelic dropout of the mutated DNAJC5 allele. It was also missed by the subsequent analysis of WES data. Its identification was made possible by reanalysis of the original WES data shared within the Consortium and by modification of the PCR and Sanger sequencing protocols.

Independently occurring variants in the genomic sequence of DNAJC5 encoding the CSD of CSPα [1] suggest that this region may be more prone to DNA replication errors and that insertions or duplications within this domain should be considered in as yet unsolved ANCL cases.

Our work demonstrates the limitations of Sanger sequencing and WES in the detection of even relatively small insertions and duplications and shows that the analysis of next-generation sequencing data still requires an individualized approach and case-specific interpretation. Continued reanalysis of the data by a team of experienced scientists may identify previously missed variants. Approximately 75% of patients with neurodegeneration subjected to WES remain without a genetic diagnosis [26]. It remains unclear how many similar variants could be identified by continued reanalysis of the kind demonstrated in this paper.