The TOMM40 ‘523’ polymorphism in disease risk and age of symptom onset in two independent cohorts of Parkinson’s disease

Abnormal mitochondrial function is a key process in the pathogenesis of Parkinson’s disease (PD). The central pore-forming protein TOM40 of the mitochondria is encoded by the translocase of outer mitochondrial membrane 40 homologue gene (TOMM40). The highly variant ‘523’ poly-T repeat is associated with age-related cognitive decline and age of onset in Alzheimer’s disease, but whether it plays a role in modifying the risk or clinical course of PD is yet to be elucidated. The TOMM40 ‘523’ allele length was determined in 634 people with PD and 422 healthy controls from an Australian cohort and the Parkinson’s Progression Markers Initiative (PPMI) cohort, using polymerase chain reaction or whole genome sequencing analysis. Genotype and allele frequencies of TOMM40 ‘523’ and APOE ε did not differ significantly between the cohorts. Analyses revealed TOMM40 ‘523’ allele groups were not associated with disease risk, while considering APOE ε genotype. Regression analyses revealed the TOMM40 S/S genotype was associated with a significantly later age of symptom onset in the PPMI PD cohort, but not after correction for covariates, or in the Australian cohort. Whilst variation in the TOMM40 ‘523’ polymorphism was not associated with PD risk, the possibility that it may be a modifying factor for age of symptom onset warrants further investigation in other PD populations.

Parkinson's disease (PD) is increasingly known as a multifaceted neurodegenerative disorder with a heterogeneous and burdensome symptom presentation and progression. Abnormal or deficient mitochondrial functioning is widely implicated as a key process in the selective neuronal death and pathogenesis of PD. Mitochondrial dysfunction, resulting in a loss of electron transport chain (ETC) efficiency or decline in ATP-synthesising capacity, appears to elicit dopaminergic cell death via a number of mechanisms, including reactive oxygen species (ROS) generation, impaired ATP production and disrupted calcium homeostasis 1,2 . The pivotal role of mitochondrial dysfunction in PD is supported by several parkinsonism-causing toxins, genetic mutations and retrotransposon insertions 3,4 , which specifically impair mitochondrial function. Insights from these toxins and mutations imply that mitochondrial dysfunction in the pathogenesis of PD can arise from a wide array of biological processes, such as bioenergetic disturbances, nuclear and mitochondrial DNA mutations, impaired fusion and fission, defective mitophagy, and abnormal morphology and size 5 . For example, a significant cause of mitochondrial dysfunction in PD is the inhibition of mitochondrial complex I, an ETC defect that leads to severe oxidative stress and ROS- and caspase-mediated dopaminergic cell death 5 . Notably, the latter impairment is a major pathological feature of PD induced by familial PTEN-induced kinase 1 (PINK1), alpha-synuclein (SNCA) and Daisuke-Junko-1 (DJ-1) gene mutations or the toxin 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) 3 . Thus, mitochondrial dysfunction is thought to cause neuronal stress and degeneration, eventuating in this neurodegenerative disease, PD.

Methods
Cohorts. The first cohort comprised 235 home-based people with PD (PwP) and 231 healthy controls of European heritage from the Perron Institute for Neurological and Translational Science PD Database, as previously reported 29 . Clinical and demographic data, including the age of symptom onset, were recorded in the database. All PwP were examined by a movement disorder neurologist prior to inclusion in the study for verification of the diagnosis in accordance with the UK Brain Bank criteria for idiopathic PD and reported no family history of PD 30 , while healthy controls were confirmed to have no history of any neurological disorders. This study was approved by the Sir Charles Gairdner Hospital Human Research and Ethics Committee (Approval number 2006/073). Written informed consent was obtained from all participants, in accordance with the Australian National Health and Medical Research Council research guidelines.
The second cohort was derived from the international Parkinson's Progression Markers Initiative (PPMI) database (available at http://www.ppmi-info.org/data). This cohort comprised 368 PwP and 172 healthy controls, and only individuals of European heritage were included in order to reflect the composition of the Australian cohort.
DNA extraction. In the Australian cohort, DNA was extracted from either blood samples or buccal swabs. Participant buccal samples were collected by a trained researcher using Isohelix DNA/RNA Buccal Swabs (Cell Projects Ltd, Kent, U.K.) and stored until DNA extraction. Alternatively, blood was collected from the medial cubital vein. DNA was extracted and purified from these samples using QIAamp DNA mini kits (Qiagen Pty LTD., Victoria, Australia), according to the manufacturer's protocol. DNA concentration was determined using absorbance readings calculated by a NanoDrop One Microvolume UV-Vis spectrophotometer (Thermo Fisher Scientific Australia Pty LTD., Victoria, Australia).
Genotyping of TOMM40 '523' using PCR and fragment analysis. PCR-amplification of the '523' variant in the Australian cohort was completed using fluorescently labelled primers, as previously described 19 . The forward primer sequence was 5′-/6-FAM/-TGC TGA CCT CAA GCT GTC CTC-3′ and the reverse primer was 5′-GAG GCT GAG AAG GGA GGA TT-3′, synthesized by Integrated DNA Technologies Pty Ltd (IDT, Iowa, USA). Endpoint PCR was performed using 5 μL of 5 × MyFi Reaction Buffer (including 1 mM dNTPs and 3 mM MgCl2; Bioline, NSW, Australia), 1 μL of MyFi DNA Polymerase (Bioline), 0.2 μL of each forward and reverse primer (20 μM 19 . Briefly, the highest intensity peak(s) in each peak cluster between 160-190 bp were identified and sized, and 150 bp (accounting for flanking regions and primers) was subtracted from the peak sizes to determine poly-T allele lengths. Alleles were grouped using the convention established by Roses et al. 9 : Short (S, T ≤ 19), Long (L, 20 ≤ T ≤ 29) and Very Long (VL, T ≥ 30).
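The sizing and binning rule described above can be illustrated with a short sketch. It assumes that each allele is represented by a single sized peak in base pairs; the function names and the example peak sizes are hypothetical and are not taken from the study's own analysis pipeline.

```python
# Illustrative sketch only: names and example values are hypothetical.

FLANK_BP = 150  # flanking regions and primers, as stated in the text


def poly_t_length(peak_size_bp: int) -> int:
    """Convert a sized fragment peak (bp) into a poly-T repeat length."""
    return peak_size_bp - FLANK_BP


def allele_group(t_length: int) -> str:
    """Bin a poly-T length: S (T <= 19), L (20 <= T <= 29), VL (T >= 30)."""
    if t_length <= 19:
        return "S"
    if t_length <= 29:
        return "L"
    return "VL"


def genotype(peak_a_bp: int, peak_b_bp: int) -> str:
    """Return a binned genotype such as 'S/VL' from two allele peaks."""
    order = {"S": 0, "L": 1, "VL": 2}
    groups = [allele_group(poly_t_length(p)) for p in (peak_a_bp, peak_b_bp)]
    return "/".join(sorted(groups, key=order.get))


if __name__ == "__main__":
    # Hypothetical peaks at 166 bp and 185 bp give T = 16 and T = 35.
    print(genotype(185, 166))
```

Sorting the two allele groups in a fixed order means that the same genotype label is produced regardless of which peak is listed first.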
Genotyping of TOMM40 '523' using Whole Genome Sequencing. Whole genome sequencing (WGS) data were obtained from the PPMI database (available at http://www.ppmi-info.org/data) in binary alignment map (BAM) format that had been aligned to the human reference genome GRCh38 using the Burrows-Wheeler transform alignment algorithm 31 . Resultant BAM files were analysed using the Integrative Genomics Viewer 32 in order to calculate the length of the poly-T repeat, as previously demonstrated 33 . Alleles were grouped using the convention established by Roses et al. 9 : Short (S, T ≤ 19), Long (L, 20 ≤ T ≤ 29) and Very Long (VL, T ≥ 30).

APOE ε genotyping. APOE ε genotypes were determined using single nucleotide polymorphism (SNP) and PCR-restriction fragment length polymorphism (PCR-RFLP) analyses. APOE ε2/ε3/ε4 genotypes were determined by sequencing two SNPs (rs429358 and rs7412) using the MassARRAY system (Agena Bioscience) at the Australian Genome Research Facility (AGRF; Queensland, Australia) 34 . For PCR-RFLP analyses, endpoint PCR reactions were prepared to a final volume of 10 µl using primer sequences previously described 35 .
Statistical methods. The Australian and PPMI cohorts were analysed separately, and together, using IBM-SPSS software (version 26, IBM Corporation). A nominal significance threshold of p < 0.05 was employed for all statistical tests. Variables were described using mean and standard deviation (in brackets, SD), or frequency and percent (in brackets, %), as appropriate. Normality was assessed and clinical characteristics were subsequently analysed using the Independent Samples T-Test, Mann-Whitney U test, or Chi-square test, as appropriate. For cross-sectional analysis, Chi-square tests, stratified Mantel-Haenszel tests and binary logistic regression models were used to evaluate the association between TOMM40 '523' genotypes and risk of PD in the Australian and PPMI cohorts. Binary logistic regression models were run both with and without correction for APOE ε4 status (grouped as zero, one or two ε4 allele(s)) and patient sex. Analysis was also run considering all combinations of '523' length category and APOE ε genotype, to examine for interactive effects without a priori assumptions. The aforementioned analyses were performed separately in the PPMI and Australian cohorts, and after combination of the cohorts. Following this, binary logistic models were carried out in populations of APOE ε3/ε3 carriers only, as previously examined in this fashion and stated as a requirement for replication studies 13 .
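As an illustration of how the two SNPs named above map onto APOE ε genotypes and onto the ε4 covariate coding (zero, one or two ε4 alleles), the following is a minimal sketch. It relies on the standard haplotype definitions (ε2 = T at rs429358 and T at rs7412; ε3 = T and C; ε4 = C and C) and assumes forward-strand, unphased genotype strings such as "TC". The rare ε1 haplotype is deliberately not handled, and the function names are hypothetical rather than part of the study's workflow.

```python
# Illustrative sketch only: function names are hypothetical.
# Assumes forward-strand alleles and ignores the rare ε1 haplotype (C, T).


def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Call an APOE ε genotype from two unphased SNP genotypes, e.g. 'TC', 'CC'."""
    g1, g2 = rs429358.upper(), rs7412.upper()
    if len(g1) != 2 or len(g2) != 2 or set(g1 + g2) - {"C", "T"}:
        raise ValueError("genotypes must be two-letter strings of C and T")
    n_e4 = g1.count("C")  # each C at rs429358 marks one ε4 allele
    n_e2 = g2.count("T")  # each T at rs7412 marks one ε2 allele
    if n_e4 + n_e2 > 2:
        raise ValueError("combination implies the rare ε1 haplotype")
    n_e3 = 2 - n_e4 - n_e2
    return "/".join(["ε2"] * n_e2 + ["ε3"] * n_e3 + ["ε4"] * n_e4)


def e4_load(genotype: str) -> int:
    """Number of ε4 alleles (0, 1 or 2), usable as a regression covariate."""
    return genotype.split("/").count("ε4")


if __name__ == "__main__":
    call = apoe_genotype("TC", "CC")
    print(call, e4_load(call))
```

A double heterozygote (one C at rs429358 and one T at rs7412) is called ε2/ε4 here, which is the usual convention when ε1 is assumed to be absent.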
Generalised linear models (GLMs) were also constructed in order to study the effect of TOMM40 '523' on age of disease onset, correcting for APOE ε allele status and sex. Again, GLMs were also carried out in populations of APOE ε3/ε3 carriers only. Residual plots were examined for all models and no violations were noted. Correction for multiple comparisons was conducted using Bonferroni pairwise comparisons, where appropriate.
Subsequently, TOMM40 '523' S allele carrier status (S/S genotype, carriage of one S allele, or non-carriage of the S allele) and APOE ε4 status (ε4/ε4 genotype, carriage of one ε4 allele, or non-carriage of the ε4 allele) were combined into a load variable comprising 5 groups. Mean comparisons were then analysed using the Kruskal-Wallis one-way analysis of variance and univariate GLMs. This analysis was repeated for the combinations of TOMM40 '523' S allele carrier status and APOE ε2 status, and TOMM40 '523' VL allele carrier status and APOE ε4 status.
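The text does not spell out how the two three-level carrier variables yield 5 groups. One reading that is consistent with the later reference to the copy number of these alleles is that groups are defined by the total number of copies of the two alleles (0 to 4). The sketch below illustrates that assumed interpretation only; the function names are hypothetical.

```python
# Illustrative sketch of an ASSUMED interpretation: groups are defined by the
# summed copy number of the two alleles of interest (0 to 4, i.e. 5 groups).


def allele_count(genotype: str, allele: str) -> int:
    """Count copies of an allele in a genotype string such as 'S/VL'."""
    return genotype.split("/").count(allele)


def load_group(tomm40_genotype: str, apoe_genotype: str,
               tomm40_allele: str = "S", apoe_allele: str = "ε4") -> int:
    """Return the combined copy number of the two chosen alleles."""
    return (allele_count(tomm40_genotype, tomm40_allele)
            + allele_count(apoe_genotype, apoe_allele))


if __name__ == "__main__":
    print(load_group("S/VL", "ε3/ε4"))                 # S with ε4
    print(load_group("S/S", "ε2/ε3", "S", "ε2"))       # S with ε2
    print(load_group("VL/VL", "ε3/ε4", "VL", "ε4"))    # VL with ε4
```

The optional arguments allow the same function to cover the S with ε2 and VL with ε4 combinations mentioned in the text.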
Finally, Kaplan-Meier curves for age at PD symptom onset were estimated, stratified by TOMM40 '523' genotype as well as both the TOMM40 '523' and APOE genotype. To compare the survival curves, the log rank test was applied, placing weight on longer survival periods 36,37 . Allelic stratification by TOMM40 '523' was also run using Kaplan-Meier analysis. Additionally, all distributions of ages at onset adjusting for sex were compared via Cox proportional hazard regression models.

Results
TOMM40 '523' and APOE ε genotyping and distribution. In the Australian cohort, TOMM40 '523' genotyping was obtained following optimisation of a PCR-based assay, with representative electropherograms presented in Supplementary Fig. 1. Genotyping of the TOMM40 '523' variant in the PPMI cohort was completed using a WGS method, previously reported as an alternative method to fragment analysis and Sanger sequencing 33 . The distribution of the '523' alleles in PwP compared to healthy controls for both cohorts is presented in Fig. 1. Following prior literature, allele sizes were used to bin samples into Short (S, T ≤ 19), Long (L, 20 ≤ T ≤ 29) and Very Long (VL, T ≥ 30) groups 9 (Supplementary Table 1). APOE ε genotypes of the Australian cohort were further validated using PCR-RFLP, with representative samples ε2/ε3, ε2/ε4, ε3/ε3 and ε3/ε4 presented in Supplementary Fig. 2. APOE ε genotype and allele groupings for both cohorts exhibited similar distributions; however, there was an absence of individuals with the ε2/ε2 genotype in the Australian cohort (see Supplementary Table 2). Both groups consisted of individuals of self-reported European heritage, though country of origin information was not available for all cases. To determine whether cohorts could be appropriately combined, Levene's Test for Equality of Variances was used to compare the '523' distribution and no significant difference was observable (F = 1.772, p = 0.183).
Each distribution was also compared with distributions in the WebSTR database; similarities were seen between the current study cohorts and both the GTEx (predominantly European self-reported ancestry) and 1000 Genomes European distributions. Thus, the current cohorts were considered reflective of a population of European descent, and the two cohorts could be combined to increase the sample size and the power to detect a '523' effect.
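For readers unfamiliar with the variance comparison used to justify pooling the cohorts, the Levene statistic can be sketched as follows. This is a minimal illustration of the mean-centred form of the test, which is assumed here; it computes only the test statistic, since the p-value requires the F distribution with (k − 1, N − k) degrees of freedom, which is not available in the Python standard library. The example values are hypothetical and are not study data.

```python
# Illustrative sketch only: mean-centred Levene statistic, hypothetical data.
from statistics import mean


def levene_w(*groups):
    """Return Levene's W for two or more groups of numeric observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((x - zm) ** 2 for zi, zm in zip(z, z_means) for x in zi)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    # Hypothetical poly-T lengths for two small groups.
    print(levene_w([16, 16, 34, 35, 28], [15, 16, 33, 36, 29]))
```

A value of W close to zero indicates that the spread of values is similar across groups, which is the condition under which pooling is reasonable.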
TOMM40 '523' not associated with PD risk. As there were no significant differences in TOMM40 allele distributions between cohorts, case-control analyses were completed in both individual cohorts and in a combined cohort. When examined genotypically, the TOMM40 '523' variant was not significantly associated with the risk of PD in either the Australian cohort (χ2 = 1.455, p = 0.918), or the PPMI cohort (χ2 = 1.806, p = 0.875), or when cohorts were combined (χ2 = 3.471, p = 0.628 genotypes and APOE ε genotypes also failed to show any significant differences between PD and control groups. As with prior findings, analysis utilising stratified Mantel-Haenszel tests did not find significant differences. Binary logistic regression models were analysed comparing each genotype or allele to all other genotypes or alleles (Table 2). Genotypically, naïve models exhibited a lack of significant effect on PD risk, both in each individual cohort and when combined. Furthermore, models were constructed taking APOE ε4 status and patient sex into account, which again exhibited no significant association between TOMM40 '523' genotype and PD risk. Models considering all combinations of '523' and APOE ε genotypes also returned non-significant results. Binary logistic regression models were used subsequently to assess this association allelically. Neither naïve nor covariate corrected models, in either cohort or the combined cohort, exhibited a significant allelic association between TOMM40 '523' and disease risk (Table 2). Finally, all models were repeated in the sub-cohort of APOE ε3/ε3 carriers (Supplementary Table 3), and again failed to show any significant associations between TOMM40 '523' alleles or genotypes and risk of PD.

TOMM40 variant and age of disease onset.
To investigate the influence of TOMM40 '523' on age of disease onset, GLMs were constructed, with data from naïve and corrected models reported in Table 3. Within the Australian cohort, no significant association between TOMM40 '523' alleles or genotypes and age of PD symptom onset was observed. However, in the PPMI cohort, carriage of the TOMM40 '523' S/S genotype was associated with a 2.58-year later age of symptom onset (p = 0.040). This difference was not significant after controlling for APOE ε4 load and sex in multivariable GLMs, although a trend for later age of onset was still observable (1.970 years, p = 0.157). It is important to note, however, that there was a lack of co-carriage of the S/S genotype with the APOE ε4/ε4 genotype in this cohort (see Supplementary Table 4) and that APOE ε4/ε4 homozygotes had a mean age of symptom onset almost 4 years earlier than other genotypes, although this was not statistically significant (p = 0.314). Thus, significance may have been lost due to the opposing effects of these two genotypes on age of symptom onset. Carriage of the S/S genotype was also associated with a 2.69-year later age of PD symptom onset in the PPMI cohort (p = 0.032) after correcting for APOE ε2 load and sex. However, when adjusting for multiple comparisons, model-derived estimated means revealed the S/S genotype did not result in a significantly different age of PD symptom onset when compared to individual genotypes (p = 0.326). Subsequent analysis of the PPMI cohort exhibited no other statistically significant associations (at the 0.05 level of significance), either allelically or genotypically.
Table 1. Summary of the independent and combined cohort demographic information. Data presented as Mean (SD) or n (%). PPMI, Parkinson's Progression Markers Initiative; PD, Parkinson's disease.
Table 2. Corrected regression models evaluating the association between '523' genotype and disease risk in the two PD cohorts. + Data taken from Binary Logistic Regression models without correction for covariates. # Data taken from Binary Logistic Regression models with correction for APOE ε4 load and patient sex.
Furthermore, when the cohorts were analysed as a combined cohort, no significant associations were exhibited between TOMM40 '523' and age of PD symptom onset (Table 3). Following this, all models were also repeated in the large sub-cohort consisting of only APOE ε3/ε3 homozygotes (Supplementary Table 5), none of which revealed a significant association between TOMM40 '523' and age of symptom onset. Finally, when considering the load combination of the TOMM40 '523' S allele and APOE ε4 or APOE ε2, no significant differences were found in the mean age of symptom onset between groups in either cohort or in the combined cohort according to the copy number of these alleles. Kaplan-Meier curves for the age at which symptoms of PD were first reported were stratified by TOMM40 '523' genotype in the Australian cohort (Fig. 1A), the PPMI cohort (Fig. 1B) and the combined cohort (Fig. 1C). No statistically significant or notable stratification was noted (p = 0.782, p = 0.321 and p = 0.399, respectively). Subsequently, Kaplan-Meier curves for the age at which symptoms of PD began were analysed, stratified by ... 13 . Again, neither the Australian cohort (p = 0.550, Fig. 1D), nor the PPMI cohort (p = 0.867, Fig. 1E), nor the combined cohort (p = 0.660, Fig. 1F) exhibited statistically significant stratification of age of PD symptom onset. Kaplan-Meier analysis based on TOMM40 '523' allelic stratification also failed to produce significant results. Finally, all analyses completed using the Kaplan-Meier method were re-run using Cox proportional hazard regression models, which adjusted for sex, and no significant results were noted.
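To make the Kaplan-Meier approach concrete, the following is a minimal sketch of the product-limit estimator applied to age at symptom onset, where the "event" is onset and the curve gives the proportion of individuals still symptom-free at each age. The function name and the example ages are hypothetical; in a cohort of diagnosed PwP every individual has an observed onset, so the censoring flag is included only for generality.

```python
# Illustrative sketch only: product-limit estimator with hypothetical data.


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  -- age at onset (or at censoring) for each individual
    events -- 1 if onset was observed at that age, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather all individuals sharing this time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


if __name__ == "__main__":
    # Hypothetical ages at onset for one genotype stratum, all observed.
    for age, s in kaplan_meier([50, 60, 60, 70], [1, 1, 1, 1]):
        print(age, round(s, 3))
```

Computing one such curve per genotype stratum gives the curves that a log rank test would then compare.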

Discussion
The current study aimed to investigate the TOMM40 '523' structural variant as a potential PD risk factor and modifier of the age at symptom onset. The distribution of '523' poly-T alleles did not vary between PwP and healthy controls, or between the two independent cohorts examined. The similarity in '523' allelic distribution between the PPMI and Australian cohorts is worth noting as this study utilised two different approaches in the calling of the TOMM40 '523' variant, establishing a PCR-based and a WGS-based assay for use in the Australian and PPMI cohorts, respectively. Assay development for the TOMM40 '523' variant is generally considered to be difficult, as poly-T variants are challenging to sequence 13,19 . Despite optimisation within this study, PCR stuttering was observed in the PCR-based assay, similar to previous reports 13,19 ; this is a standard complication when amplifying repetitive genomic sequences, and highlights the need for further optimisation of the assay. By comparison, calling from WGS files revealed a distribution similar to that reported previously 33 . While it has been presented as having fewer challenges in optimisation and sequencing, calling of '523' variants was an arduous process and the aforementioned study has stated that the correlation between '523' calling by WGS and PCR-based methods decreases with increasing size of the '523' allele 33 . The present study then examined whether the TOMM40 '523' variant is implicated in PD risk and found no association between carriage of allelic variants and the risk of developing PD. When binned, genotypic and allelic frequencies appeared to be similar between controls and PwP in both independent cohorts, and when the cohorts were combined. This agrees with previous work in a Polish PD cohort which showed no association of '523' alleles, genotypes or haplotypes with the risk of PD or age of symptom onset 27 .
In contrast, in a Swedish population, a higher frequency of the L/VL genotype was observed in PD patients compared to controls 28 , though these findings are yet to be published in full. Given that these studies were conducted in different ethnic groups, and that '523' allele frequencies are ethnic-specific 18 , the contradictory findings in the studies to date may indicate that '523' is a risk factor in some populations, but not in others. As this is only the third study to investigate the role of TOMM40 '523' in PD risk, further studies should be performed in other ethnically diverse populations.
As mitochondrial involvement in PD is thought to be a key contributor to neuronal dysfunction and degeneration, it is plausible that a genetic variant that modulates mitochondrial function could modify age of symptom onset. Furthermore, AD and PD share a number of clinical, pathological and molecular features including toxic protein accumulation and mitochondrial dysfunction in the form of respiratory chain defects, oxidative stress, mitochondrial DNA damage and morphological abnormalities 5,38,39 . Previous studies have implicated TOMM40 '523' length in the age of onset of AD, particularly in carriers of the APOE ε3 allele 9,22 , but mixed findings exist. For instance, although Roses et al. 9 reported that longer '523' allele lengths were associated with risk of AD and earlier age of onset, the initial findings were not replicated in other populations 22,40 . Such varied findings may be a result of varied methodology and varied consideration of the influence of the APOE ε locus 13 , which is well-established as the strongest genetic predictor of AD. Most studies to date report no association of APOE ε variation with susceptibility to PD 41,42 , as was also the case in the current study. While the present findings did not show any positive evidence of interactive effects between TOMM40 '523' and APOE ε genotypes in relation to PD risk, they do suggest that the two loci may have small independent and opposing effects on determining the age of onset of the symptoms. While initial regression analyses in the PPMI cohort showed that age of onset was significantly delayed by carriage of the S/S '523' genotype, this was not replicated after correction for covariates, or in the Australian cohort. However, it is possible that this difference is caused by a country- or geography-specific effect, as seen in other genetic studies 43 .
It is worth noting that only one prior study has examined this association 27 , though it is not clear whether this study completed rigorous analysis of the interactive effects of TOMM40 and APOE, as was conducted in the current study. As such, further in-depth analysis in larger PD cohorts is required to determine the significance of the present findings.
Currently, the functional effects of variation in '523' allele length are poorly understood due to a scarcity of research in this area. While several studies have suggested that the VL allele increases TOMM40 mRNA expression and the S allele represses expression 12,19,20,44 , others have found no significant differences in mRNA levels between S and VL variants 45 . A recent study demonstrated that overexpression of TOMM40 in vitro was associated with greater mitochondrial membrane potentials, respiratory rates, spare respiratory capacities, ATP levels, amyloid-beta resistance, and protein uptake 16 . On the other hand, another study observed a correlation between TOM40 protein deficits and enhanced oxidative stress, reduced ATP production and abnormal complex I protein concentrations in the brains of PD patients and in alpha-synuclein overexpressing murine models 21 . As the literature currently stands, further elucidation of the biological consequences of up- or down-regulated TOMM40 expression is required to give insight into its potential role in PD risk and disease modification.

Conclusion
Overall, this study aimed to investigate the risk and disease-modifying role of the TOMM40 '523' variant in two independent PD cohorts. TOMM40 plays an essential role in mitochondrial import and mitophagy, and the '523' polymorphism has been associated with age of onset of AD and with age-related cognitive decline. While this study clarifies that TOMM40 '523' is not in itself a predictor of PD risk, it raises the possibility that, as in AD, it may be a genetic marker for the age of symptom onset in PD. Whilst not conclusive, our findings in the large international PPMI cohort suggest that carriage of the S/S '523' genotype may be protective in terms of delaying the age of symptom onset, and may warrant further investigation in other populations. Importantly, the effect of co-carriage of the APOE ε4 allele, which appears to have an opposing effect on age of onset, must be considered in future studies. Though not significant in modulating risk of PD, future studies should consider the possible role of TOMM40 '523' as a determinant of age of symptom onset, and symptom trajectory. Given a phase 3 clinical trial of a therapeutic for the prevention and delay of onset of AD was recently conducted involving participants stratified by TOMM40 '523' 46 , the findings reported herein are noteworthy, and may allow the design of symptom-focused studies in PD for much-needed improvements in patient outcomes and care.