Intronic NEFH variant is associated with reduced risk for sporadic ALS and later age of disease onset

Neurofilament heavy (NEFH) is one of the critical proteins required for the formation of the neuronal cytoskeleton, and polymorphisms in NEFH are reported as a rare cause of sporadic ALS (sALS). In the current study, a candidate tetranucleotide (TTTA) repeat variant in NEFH was selected using an in-silico short structural variant (SSV) evaluation algorithm and investigated in two cohorts of North American sALS patients, both separately and combined (Duke cohort n = 138, Coriell cohort n = 333; combined cohort n = 471), and compared with a group of healthy controls from the Coriell Institute biobank (n = 496). Stratification by site of disease onset revealed that the 9 TTTA allele was associated with reduced disease risk, specifically in spinal-onset sALS patients in the Duke cohort (p = 0.001). Furthermore, carriage of the 10 TTTA allele was associated with a 2.7 year later age of disease onset in the larger combined sALS cohort (p = 0.02). These results suggest that the 9 and 10 TTTA motif lengths may confer a protective advantage, potentially lowering the risk of sALS and delaying the age of disease onset; however, these results need to be replicated in larger multicenter and multi-ethnic cohorts.

Statistical analysis. Differences in the distributions of allele frequencies were assessed using independent-samples Mann-Whitney U or chi-squared tests, as appropriate. Case-control genotype/allele associations were assessed using binary logistic regression, controlling for patient sex in the covariate-adjusted model, with data coded as either the absence (0) or presence (1) of each NEFH allele or genotype. Analyses were stratified by cohort and site of onset, and were also performed on the combined cohort. Significant associations withstanding Bonferroni correction for multiple comparisons are indicated. Joint significance of genotype/allele associations was assessed via case-control logistic regression. General linear models, accounting for patient sex, were used to assess the association of age of onset with genotype/allele. Kaplan-Meier curves for survival duration were estimated taking into account site of disease onset. Genotypes with low frequency (n < 3) were excluded from statistical analyses. A p-value below 0.05 was considered statistically significant. Analyses were carried out in IBM SPSS Statistics version 25.0 (IBM Co., Armonk, NY, USA).
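The presence/absence coding, the low-frequency genotype exclusion and the Bonferroni threshold described above can be illustrated with a minimal Python sketch. The analyses themselves were run in SPSS; the representation of a genotype as a pair of TTTA repeat counts and the helper names (`carriage_indicators`, `genotype_indicator`, `frequent_genotypes`, `bonferroni_threshold`) are hypothetical and only show the logic of the coding.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each genotype is a pair of TTTA repeat counts.
Genotype = Tuple[int, int]


def carriage_indicators(genotypes: List[Genotype], alleles: List[int]) -> Dict[int, List[int]]:
    """Code each sample 1 if it carries the allele on either chromosome, else 0."""
    return {a: [1 if a in g else 0 for g in genotypes] for a in alleles}


def genotype_indicator(genotypes: List[Genotype], target: Genotype) -> List[int]:
    """Code each sample 1 if it has the given unordered genotype (e.g. 9,10), else 0."""
    key = tuple(sorted(target))
    return [1 if tuple(sorted(g)) == key else 0 for g in genotypes]


def frequent_genotypes(genotypes: List[Genotype], min_count: int = 3) -> Set[Genotype]:
    """Return unordered genotypes seen at least min_count times; rarer ones would be excluded."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return {g for g, n in counts.items() if n >= min_count}


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction for n_tests comparisons."""
    return alpha / n_tests
```

For example, a 9,10 heterozygote is coded 1 for both the 9 and the 10 TTTA allele indicators, and the 9,10 and 10,9 orderings are treated as the same genotype.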
Ethics approval and consent to participate. Samples from the Duke ALS Clinic were collected in accordance with the Health Insurance Portability and Accountability Act (Pro00040665/323682). Retrospective genotyping of DNA samples was approved by the Human Research Ethics Committee of the University of Western Australia (RA/4/20/5308).

Results
Identification of polymorphic TTTA short structural variant in NEFH. An SSV evaluation algorithm was used to identify potential polymorphic variants within NEFH 31, with candidate variants scored according to 24 different properties, as previously described 34. The selected TTTA variant was subsequently investigated in the public genomic databases NCBI and the Ensembl genome browser (NC_000022.11 and 22:29483828-29483863, respectively). The entries recorded for this region show multiple rs numbers logged for this variant, indicating that the locus is likely polymorphic and warranting further investigation. The candidate TTTA variant resides 255 bp downstream of exon 2 in the NEFH primary transcript, which encodes a protein of 1020 amino acids (Fig. 1a). Polyacrylamide gel electrophoresis of NEFH PCR products revealed polymorphic alleles with varying numbers of TTTA repeat motifs, which was confirmed by Sanger sequencing (Fig. 1b). Capillary fragment separation further confirmed variable-length TTTA repeat genotypes, with alleles ranging from 6 to 15 TTTA repeats (Fig. 1c): a single fluorescent signal peak indicates a homozygous sample and two peaks indicate a heterozygous sample. The further apart the two peaks, the greater the base pair difference between the two NEFH alleles.
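Because each TTTA motif adds 4 bp, the spacing between two capillary peaks divided by 4 gives the difference in repeat number, and a fragment size can be converted to a repeat count once the non-repeat (flanking) length of the amplicon is known. The sketch below assumes a simple model, size = flank + 4 × repeats; the flanking length used in the usage example (150 bp) is hypothetical, and in practice sized fragments would typically be calibrated against Sanger-sequenced reference samples rather than taken at face value.

```python
from typing import List, Tuple

MOTIF_LEN_BP = 4  # TTTA is a tetranucleotide motif, so each repeat adds 4 bp


def repeats_from_size(fragment_bp: float, flank_bp: float, tol: float = 0.25) -> int:
    """Convert a sized fragment to a repeat count, assuming size = flank + 4 * repeats.

    Raises ValueError if the size falls too far from a whole-repeat position.
    """
    n = (fragment_bp - flank_bp) / MOTIF_LEN_BP
    nearest = round(n)
    if abs(n - nearest) > tol:
        raise ValueError(f"{fragment_bp} bp does not fit a whole number of repeats")
    return nearest


def repeat_difference(peak_a_bp: float, peak_b_bp: float) -> float:
    """Difference in repeat number implied by the spacing between two peaks."""
    return abs(peak_a_bp - peak_b_bp) / MOTIF_LEN_BP


def call_genotype(peaks_bp: List[float], flank_bp: float) -> Tuple[int, int]:
    """One peak -> homozygous; two peaks -> heterozygous (repeat counts sorted)."""
    if len(peaks_bp) not in (1, 2):
        raise ValueError("expected one or two fluorescent peaks")
    alleles = sorted(repeats_from_size(p, flank_bp) for p in peaks_bp)
    if len(alleles) == 1:
        return (alleles[0], alleles[0])
    return (alleles[0], alleles[1])
```

With the hypothetical 150 bp flank, a single peak at 186 bp would be called 9,9, and peaks at 186.1 and 190.2 bp would be called 9,10.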
Distribution of TTTA alleles. We next investigated the distribution of the TTTA variant in the two sALS cohorts (n = 138 and n = 333) and in 496 healthy controls (cohort demographics are shown in Table 1). Allele lengths ranged from 6 to 15 TTTA repeats in both sALS cohorts and in the controls. No significant differences were found in the allele frequency distributions between the Duke and Coriell sALS samples (Fig. 2a). Therefore, the two sALS cohorts were analyzed both individually and as a combined cohort (combined allele distribution, Fig. 2b). In addition to the self-reported ethnicity and country-of-origin data for each participant, each NEFH allele distribution was compared with the WebSTR database and was similar to both the GTEx (predominantly European self-reported ancestry) and 1000 Genomes European allele distributions, providing
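A comparison of allele length distributions between two cohorts, such as the Duke versus Coriell comparison above, can be sketched with the Mann-Whitney U test named in the Statistical analysis section. This minimal Python version assumes each participant contributes two repeat lengths to a pooled list per cohort, uses average ranks for ties, and applies the tie-corrected normal approximation without continuity correction; it is an illustration of the test, not a reproduction of the SPSS output, and the input lists in the checks are invented.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) from the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: many tied values are expected, since repeat lengths are integers.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Tie handling matters here because allele lengths take only ten integer values (6 to 15 repeats), so most observations are tied.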

TTTA variant and sALS disease risk.
To explore the association between NEFH TTTA repeat length and risk of sALS, a case-control association study was conducted. Binary logistic regression models were used to analyze carriage of each allele compared with all other alleles in the Duke and Coriell sALS cohorts, both separately and combined (Table 2). Comparison of the 9 TTTA allele between Duke sALS cases and controls, split by site of disease onset, revealed reduced risk specifically for spinal-onset sALS (p = 0.001), whilst no effect was seen for bulbar-onset sALS (p = 0.60). This effect was not replicated in the Coriell (p = 0.38) or combined (p = 0.03) cohorts. To determine whether specific genotypes were contributing to this allelic association, we followed up with a genotype-level analysis (Table 3). The 9,10 genotype was associated with a reduction in sALS risk by a factor of 0.5 in the Duke cohort, in both the naïve model and the model adjusted for patient sex. However, this association was not statistically significant in the Coriell or combined sALS cohorts. Because both the 9 TTTA allele and the 9,10 genotype were statistically significant in the Duke cohort, we investigated this association further. Within the spinal-onset cases of the Duke cohort, when carriage of the 9 TTTA allele, the 10 TTTA allele and the 9,10 TTTA genotype were considered jointly, carriage of the 9 TTTA allele remained significant (corrected p = 0.001) and was not abrogated by carriage of the 10 TTTA allele or the 9,10 TTTA genotype, both of which were non-significant in the joint model.
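An odds ratio below 1 is what "reduced risk" means in a case-control logistic regression. For a single binary carriage predictor, the unadjusted (naïve) logistic regression odds ratio equals the crude odds ratio of the 2×2 carriage table, so a minimal sketch can compute it directly with a Woolf (log-scale) 95% confidence interval. The counts in the checks are hypothetical, not study data, and the sex-adjusted model reported above would require a full regression fit, which this sketch does not attempt.

```python
import math
from typing import Tuple


def carriage_odds_ratio(case_carriers: int, n_cases: int,
                        control_carriers: int, n_controls: int,
                        z: float = 1.959964) -> Tuple[float, float, float]:
    """Crude odds ratio for allele carriage with a Woolf (log-scale) 95% CI."""
    a = case_carriers                      # cases carrying the allele
    b = n_cases - case_carriers            # cases not carrying it
    c = control_carriers                   # controls carrying the allele
    d = n_controls - control_carriers      # controls not carrying it
    if min(a, b, c, d) <= 0:
        raise ValueError("empty cell: Woolf interval undefined")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With hypothetical counts of 20 carriers among 100 cases and 40 carriers among 120 controls, the odds ratio is 0.5 and the whole interval lies below 1, which is the pattern a halving of risk would produce.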

TTTA variant and age of disease onset.
To examine the effect of TTTA repeat length on age of disease onset, a general linear model was used. Because of the variability in age of onset data between the Duke and Coriell cohorts, this was investigated in the larger combined sALS cohort to reduce the risk of detecting spurious associations. Within the combined cohort, males had on average a 3.7 year earlier age of onset than females (p = 0.02). Compared with patients with spinal onset, those with bulbar-onset ALS had a 4.3 year later age of disease onset (p = 0.001). In the combined sALS cohort, when analyzed allelically (Table 4), carriage of the 10 TTTA allele was associated with a 2.5 year later age of disease onset in the naïve model (p = 0.03). This effect persisted after adjustment for the significant covariates, patient sex and site of disease onset, with an estimated mean difference of 2.7 years later age of disease onset following Bonferroni correction (p = 0.02). No NEFH genotype passed the statistical significance threshold for association with age of disease onset (Table 4).
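In a general linear model of this kind, the coefficient on allele carriage is the covariate-adjusted mean difference in age at onset (in years) between carriers and non-carriers. The minimal sketch below fits ordinary least squares via the normal equations, assuming a hypothetical design of intercept, 10 TTTA allele carriage, male sex and bulbar onset. The checks use noise-free synthetic data with arbitrary coefficients (58, 2, -3, 4) chosen only so that recovery can be verified; they are not study data, and no p-values or Bonferroni correction are computed.

```python
from typing import List, Sequence


def ols_coefficients(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def design_row(carrier_10: int, male: int, bulbar: int) -> List[float]:
    """Hypothetical coding: intercept, 10 TTTA allele carriage, male sex, bulbar onset."""
    return [1.0, float(carrier_10), float(male), float(bulbar)]
```

Read this way, a carriage coefficient of +2.7 would mean carriers have a mean age at onset 2.7 years later than non-carriers of the same sex and site of onset, while a negative sex coefficient corresponds to the earlier onset reported in males.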

TTTA repeat alleles and survival duration in the Duke cohort.
End-point survival data were available only for the 138 sALS cases from the Duke cohort. Within this cohort, survival time decreased significantly with increasing age at disease onset (p = 0.009). Survival duration was analyzed according to TTTA allele carriage, with site of disease onset taken into consideration. There was no significant difference in end-point survival when stratified by initial site of disease onset (Fig. 3a), nor was there any significant difference in end-point survival for carriage of the 9 or 10 TTTA alleles, or the other alleles investigated (Fig. 3b-f).
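The Kaplan-Meier curves referred to above are product-limit estimates: at each time an end-point occurs, survival is multiplied by (1 - deaths / number at risk), and censored patients leave the risk set without causing a drop. A minimal Python sketch, assuming survival times in arbitrary units and an event flag of 1 for an observed end-point and 0 for censoring, is shown below. Separate curves for, say, 9 TTTA carriers and non-carriers within each site of onset would come from separate calls; formally comparing curves would require an additional test (for example, the log-rank test), which this sketch does not implement.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate.

    events[i] is 1 if the end-point was observed at times[i], 0 if censored.
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; censored ones still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

For five hypothetical patients with times 2, 3, 3, 5 and 8 and event flags 1, 1, 0, 1, 0, the estimated survival steps down to 0.8, 0.6 and 0.3 at times 2, 3 and 5.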

Discussion
This study investigated a multi-allelic intronic TTTA structural variant in NEFH as a candidate risk factor for sALS and as a modifier of age and site of disease onset and of survival duration. Within the Duke cohort, carriage of the 9 TTTA allele was significantly enriched in bulbar onset cases. Interestingly, in the Duke cohort the 9 TTTA allele was associated with a substantial reduction in sALS risk, by more than one half, but only in spinal onset patients. Within the spinal onset cases, the reduction in risk remained significant after considering carriage of the 10 TTTA allele or the 9,10 genotype, indicating that this effect was driven by the 9 TTTA allele. Although the 9 TTTA association did not replicate, it is important to note that the trend for reduced risk in spinal onset patients was consistent in the Coriell and combined cohorts. In contrast, none of the NEFH TTTA alleles or genotypes was associated with increased disease risk in the Duke, Coriell or combined sALS cohorts. Following the analysis of disease risk, NEFH alleles and genotypes were investigated as potential modifiers of age at sALS onset using general linear modelling. This was done in the combined sALS cohort to minimize the effect of variability between the separate sALS cohorts.

Table 4. Naïve generalized linear model evaluating the association between NEFH TTTA alleles/genotypes and age of disease onset in the combined sALS cohort. The following genotypes were excluded from analysis due to low frequency: 6,7; 8,12; 9,15; 10,13; 11,11. Values in bold with * denote statistical significance (p < 0.05).

www.nature.com/scientificreports/
Carriage of the 10 TTTA allele was associated with a 2.7 year later mean age at disease onset when accounting for allele carriage, sex and site of disease onset, further supporting the notion that the 9 and 10 TTTA alleles are associated with protective traits in sALS. Lastly, NEFH alleles were investigated as potential modifiers of survival in the Duke cohort. Because of the limited number of patients with end-point survival data, we were only able to analyze the effect of NEFH alleles and not genotypes, which we acknowledge as a limitation of this study. We analyzed carriage of NEFH alleles taking into account the site of disease onset. Although the survival trajectories of spinal and bulbar onset patients did not differ within this cohort, it is important to include this covariate because short structural variants have previously been shown to stratify sALS patient sub-phenotypes 33-35. In the present study, no significant associations were detected between NEFH alleles and survival.

NEFH TTTA repeat length
A number of possible explanations were considered for the differences in the 9 and 10 TTTA allele associations found in the two sALS cohorts. Firstly, given the smaller size of the Duke cohort, the possibility of a spurious association was considered; however, given the magnitude of the associations and the follow-up analyses accounting for carriage of the 9 TTTA allele, the 10 TTTA allele or the 9,10 TTTA genotype, this is considered unlikely. Differences in ethnicity were excluded on the basis of a comparison of self-reported ethnicity and country-of-origin data from the two cohorts. However, differences in genetic diversity between the two cohorts, arising from the sources of patient samples and geographic patterns of patient recruitment, still need to be considered. In this regard, it is pertinent that the Duke cohort was recruited from a single location (Duke ALS Clinic, Durham, North Carolina), whereas cases from the Coriell biobank are multicenter in origin and were recruited from geographically more diverse locations.
Currently, both light and heavy neurofilaments (pNFL and pNFH) are considered among the most promising biomarkers for ALS diagnosis and progression, and have been extensively reviewed 5,36,37. Typically, pNFL is considered the more sensitive prognostic marker, with a significant correlation between pNFL levels in the CSF and in serum 38. In contrast, CSF pNFH does not correlate well with serum pNFH levels, although this may be due to masking of epitopes or post-translational modifications influencing antibody detection 38,39. Both CSF and serum levels of pNFL and pNFH have been shown to distinguish patients with ALS from controls and from patients with other neurodegenerative diseases 38,40. Various studies have also reported pNFL and pNFH to be correlated with pheno-conversion, clinical measures of disease progression, or the clinical subtype of ALS 39-42. With this in mind, it would be interesting to investigate whether carriage of the NEFH 9 TTTA allele correlates with lower levels of pNFH in CSF and/or serum, which would suggest a slower rate of degeneration. In particular, because carriage of the 9 TTTA allele was associated with reduced risk of sALS only in spinal onset patients, it would be interesting to know whether this reflects a selective vulnerability of different neuronal populations (upper vs lower motor neurons). It is noteworthy that a recent study reported significantly higher serum pNFH concentrations in pyramidal, bulbar and classic ALS phenotypes than in flail arm ALS and progressive muscular atrophy subtypes, in which lower motor neuron involvement is predominant 43. The study concluded that the positive correlation between pNFH and disease progression suggests that a faster rate of neuronal degeneration may be a determinant of higher serum pNFH levels 43.
With evidence of NEFH playing a role both at the genetic level and as a prognostic marker for ALS, the potential impact of previously overlooked short structural variants within NEFH in explaining clinical variability, particularly in sALS, should be considered. A recent report in 100 sALS patients found that 21% carried a pathogenic or likely pathogenic variant in an ALS-associated gene, with 13% carrying more than one genetic variant (including variants of unknown significance) 44. Of note, patients carrying two variants developed disease at a significantly earlier age, and variants in known ALS genes were of potential clinical importance in as many as 42% of sALS patients 44. This suggests that genetic variation, including variants of unknown significance, may have a cumulative effect, contributing to disease risk and phenotypic variability between patients. Given the clear link between NEFH and ALS disease and progression, future work should explore the impact of short structural variants in NEFH and other ALS-associated genes as additional markers of disease risk and prognosis.

Limitations
Importantly, there are several limitations of the present study that should be noted. Firstly, whilst ~10% of sALS patients are known to carry a pathogenic variant in a major ALS-associated gene 45,46, the DNA samples used in the present study were not screened for such variants. The most common sALS-linked variant is the repeat expansion in C9orf72 47,48, present in 7% of cases. Although there is phenotypic heterogeneity across C9orf72 carriers, the expansion is typically linked with an earlier age of onset and with behavioral and cognitive changes, reflective of upper motor neuron involvement 49. In the present study, the NEFH TTTA variant appears to have a protective effect (i.e. reduced disease risk in spinal onset cases only, and later age of disease onset), so it is unlikely that the presence of the C9orf72 repeat expansion influenced the current findings. Once these findings are validated, it would be interesting to know whether the protective effects of the NEFH TTTA variant could reduce the influence of pathogenic variants such as C9orf72, and this should be considered in future studies. Similarly, the DNA samples were not screened for previously reported NEFH exonic variants, which are considered a rare cause of sALS, occurring in ~1% of sporadic cases 21-27. Secondly, end-point survival data were only available for the smaller Duke sALS cohort. Because of the limited survival data, survival could only be analyzed allelically in the Duke cohort and could not be investigated in the Coriell or larger combined sALS cohort. Finally, the phenotypic data available for the two cohorts were limited; therefore, progression measures such as the ALSFRS or cognitive status could not be considered in the present study.

Conclusion
Neurofilament heavy plays a critical role in maintaining the structural integrity of the neuronal cytoskeleton and, when axons are damaged, leaks out into the CSF. Previous genetic studies of NEFH have primarily focused on exonic variants, which are reported as a rare cause of sALS. The aim of this study was to investigate possible risk and disease-modifying effects of a candidate intronic TTTA variant in NEFH within two independent sALS cohorts. Our findings in the Duke cohort point to a substantial reduction in disease risk in carriers of the 9 TTTA allele, specifically in spinal-onset sALS. Collectively, the data suggest that the 9 and 10 TTTA motif lengths may have a protective advantage, potentially lowering the risk of sALS and promoting a 2.7 year later age at disease onset, as seen in the combined sALS cohort. These results need to be replicated in larger multicenter cohorts, with future studies considering more ethnically diverse populations to determine whether this risk/disease-modifying locus is specific to Caucasian patients, and to validate this variant as a genetic marker for sALS.

Data availability
Data can be made available upon reasonable request. Please contact Professor Anthony Akkari (Anthony.akkari@perron.uwa.edu.au).