Introduction

Parkinson’s disease (PD) is a complex neurodegenerative disorder with a large effect on individuals and society1. The disorder is mainly characterized by death of dopaminergic neurons in the pars compacta of the substantia nigra1,2 as a result of aberrant α-synuclein accumulation in Lewy bodies3,4, dysfunction of mitochondria, lysosomes or vesicle transport, and synaptic transport issues1.

PD develops from a complicated interplay between genetics and environment1. Genome-wide association studies (GWAS) have associated over 90 common variants with PD5. In GWAS, associations with rare variants often go undetected because of the low number of carriers. However, high-impact variants in coding regions often undergo negative selection and thus tend to be rare. Unlike common variants, rare variants are often located within coding regions and may have substantial effects. The use of large-scale whole-genome sequence (WGS) data enables a more comprehensive search for rare variants. Combining variants predicted to cause a loss of gene function increases the statistical power6,7 and may allow us to pinpoint genes involved in the pathogenesis of PD. Previously, a meta-analysis found that the burden of rare variants in nine genes was associated with PD, using 7184 PD cases, 6701 proxy cases, and 51,650 controls of European ancestry8.

In a gene-based burden test, we collapsed rare variants (MAF < 0.1%) predicted to cause loss of function (LOF) by the Loss-Of-Function Transcript Effect Estimator (LOFTEE)9 using WGS data. We tested for association between PD and burden of LOF in 11,976 genes (significance threshold P < 0.05/11,976 = 4.2 × 10−6) in Iceland and the UK (Supplementary Data 1 and Supplementary Fig. 1). We combined the results in a meta-analysis using 8647 PD cases and 777,693 controls (Fig. 1, Supplementary Data 2 for demographics) of European descent, of whom 62% were whole-genome sequenced and the rest chip-genotyped and imputed. For the top finding, we included the Accelerating Medicines Partnership Parkinson’s Disease dataset (AMP-PD) in the meta-analysis, adding 3538 PD cases and 2365 controls, for a total of 12,185 PD cases and 780,058 controls.
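The threshold and sample totals quoted above can be reproduced with a few lines of arithmetic. This is only an illustrative check of the numbers in the text; the function and variable names are our own and not part of any analysis pipeline.

```python
# Illustrative check of the figures quoted in the text (names are hypothetical).
N_GENES = 11_976  # genes tested, from the text
ALPHA = 0.05      # family-wise error rate, from the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_GENES)

# Iceland + UK meta-analysis, then AMP-PD added for the top finding.
cases_total = 8_647 + 3_538
controls_total = 777_693 + 2_365

print(f"threshold = {threshold:.1e}")
print(f"cases = {cases_total:,}, controls = {controls_total:,}")
```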

Fig. 1 Overview of study design and summary of results.

Rare LOF variants in ITSN1 (Cumulative Variant Frequency [CVF] = 0.012% in Iceland, CVF = 0.013% in the UK, and CVF = 0.10% in AMP-PD) were associated with a large effect on the risk of PD (OR = 7.3, 95% Confidence Interval [CI] = 3.5–15.2, P = 1.5 × 10−7, Table 1). The effects in the datasets were similar (P-het = 0.86). In Iceland, we found five LOF variants in ITSN1, and the association was mainly driven by the two most common ones, a splice acceptor variant (chr21:33781560-A, c.1597-1G>A, MAF = 0.006%, P = 0.028, OR = 6.1) and a stop-gain variant (rs1569244177-A, p.Trp944Ter, MAF = 0.005%, P = 0.034, OR = 10.0, Supplementary Data 3). In the UK, we found 64 LOF variants in ITSN1 (Supplementary Data 3). Among the 154 carriers, 13 (8.4%) were diagnosed with PD, compared with 1.1% of all individuals in the two population-based datasets. In the AMP-PD dataset (N = 5903 WGS), we found six carriers of six rare LOF variants (MAF < 0.1%, Supplementary Data 3), five of whom had PD.

Table 1 Association results for burden of LOF in ITSN1

We find support for the association (P = 8.9 × 10−6) in a comparable LOFTEE rare variant (MAF < 0.1%) burden test including over 750,000 individuals from the UK dataset (N ~ 450 K), All of Us (N ~ 250 K), and Massachusetts General Brigham Biobank (N ~ 53 K); these results are shown in the Neurodegenerative Diseases Knowledge Portal10.

The youngest ITSN1 carrier with PD, diagnosed at 27 years of age, did not have variants in other genes described as pathogenic by OMIM11, and neither did any of the other carriers diagnosed with PD. The average age at onset was 8 years earlier among ITSN1 carriers than among non-carriers (62 years, SD = 12.2 vs 70 years, SD = 9.8, respectively; P = 0.022; Supplementary Data 2 for each dataset).

We note that ITSN1 is intolerant of LOF variants (probability of LOF intolerance [pLI] = 1), according to the gnomAD database12, and both heterozygous LOF and missense variants in the gene have been associated with neurodevelopmental disorders13.

ITSN1 encodes Intersectin-1, a cytoplasmic membrane-associated protein involved in two main pathways: (1) actin cytoskeleton rearrangements through the Rho GTPase cycle14,15 and (2) clathrin-mediated endo- and exocytosis, including the synaptic vesicle cycle14,16.

One of the pathological hallmarks of PD is the early death of dopaminergic neurons in the pars compacta of the substantia nigra1. The growth and degeneration of dopaminergic neurons depend on the actin cytoskeleton, which is regulated by small GTPases of the Rho family15,17, encompassing the Rho, Rac, and CDC42 subfamilies. ITSN1 acts as a guanine nucleotide exchange factor that specifically activates Cell Division Control Protein 42 (CDC42)18 and can thus modulate regulation of the actin cytoskeleton. Several findings suggest the involvement of CDC42 in PD. Downregulation of CDC42 has been observed in various brain regions, including the substantia nigra, in postmortem brain tissues from PD patients19. In PD mouse models, Cdc42 signaling is decreased in the caudate and putamen, accompanied by impairments in motor coordination and cognitive function20, and Cdc42 knockout mice display a phenotype similar to that of parkinsonian mice20.

Aggregation of α-synuclein in Lewy bodies is another pathological hallmark of PD1. Numerous proteins have been shown to interact with α-synuclein, including CDC4221. Activating Rho GTPases is followed by substantial reduction in α-synuclein expression in dopaminergic neurons17 and neurite extension22.

In conclusion, we report a novel gene associated with PD and suggest that the loss of ITSN1 function may be involved in the pathogenesis of PD in at least one of three ways: (1) inactive CDC42 and its downstream pathway result in degeneration of dopaminergic neurons, (2) inactive CDC42 cannot regulate the vesicle exocytosis of α-synuclein, and (3) synaptic vesicle transport via clathrin-mediated endo- and exocytosis is disrupted, a known feature in the pathogenesis of PD1. Thus, direct modulation of CDC42 or its upstream regulator, ITSN1, could be exploited as a therapeutic avenue for PD.

The results for all genes tested are available (Supplementary Data 1) and will hopefully become a useful resource in future PD studies.

Methods

Study sample and ethics statement

Icelandic PD cases were identified by ICD-10 code G20 and ICD-9 code 332.0 from medical records, filed from 1985 to 2023, through collaboration with physicians at Landspitali – National University Hospital in Reykjavik, the Registry of Primary Health Care Contacts, and the Registry of Contacts with Medical Specialists in Private Practice. The use of the data in this study was approved by the National Bioethics Committee (NBC, VSN-17-142; VSNb2017060004/03.01) following review by the Icelandic Data Protection Authority. All genotyped participants signed a broad informed consent allowing the use of their samples and data in projects at deCODE genetics approved by the NBC. Personal identifiers in the participants’ data were encrypted in accordance with the regulations of the Icelandic Data Protection Authority.

The UKB resource consists of extensive phenotype and genotype data from ~500,000 participants, who enrolled in the study between 2006 and 2010 throughout the UK and were aged 40 to 69 years at recruitment23. PD cases were identified by ICD-10 code G20 in General Practice clinical event records (Field ID 42040), ICD-10 code G20 and ICD-9 code 332.0 in UK hospital inpatient data (Field ID 41270 and 41271), and self-report in non-cancer self-reported illness records (Field ID 20002). This study was conducted under application number 42256. All participants provided informed consent for the use of their genotype data and the link to electronic health records. The North West Research Ethics Committee reviewed and approved the UKB protocol (ref. 06/MRE08/65).

Genotyping

The genomes of 58,346 Icelanders were sequenced24,25 using GAIIx, HiSeq, HiSeqX, and NovaSeq Illumina technology to a mean depth of at least 38×. Joint variant calling was performed with GraphTyper (v2.7.1)26,27. Roughly 155,000 Icelanders, including all whole-genome-sequenced individuals, were genotyped using various Illumina SNP arrays24,25. The genotypes were long-range phased28, allowing for improved genotype calls using haplotype sharing information. Subsequently, familial imputation of genotypes in first- and second-degree relatives was used to increase the sample size29.

In the UKB, 431,079 white British/Irish individuals (identified by principal component analysis)23 were whole-genome sequenced using Illumina NovaSeq sequencing machines, 214,548 at deCODE and 216,531 at the Wellcome Trust Sanger Institute. The average genome-wide sequencing coverage was 32.4×, and joint variant calling was performed using GraphTyper (v2.7.5)26,27. The protocol for a preliminary release of the WGS of the UK Biobank dataset has been described in detail30.

Variant quality control and annotation

Only high-quality sequence variants were considered for selection. To estimate the quality of the sequence variants, we regressed the alternative allele counts on the depth, conditioned on the genotypes reported by GraphTyper26,27. For a well-behaved sequence variant, the mean alternative allele count should be 0 for a homozygous reference genotype, depth/2 for a heterozygous genotype, and depth for a homozygous alternative genotype. Assuming no sequencing or genotyping errors, the observed alternative allele count should therefore equal the count expected from the depth and the genotype, which corresponds to an identity line (slope 1, intercept 0). Deviations from this line suggest a spurious or somatic sequence variant. We excluded variants with a slope less than 0.5. Additionally, GraphTyper assigns each variant a score (AAscore) that predicts the probability that it is a true positive. We only included variants with AAscore > 0.8.
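The quality filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes an ordinary least-squares fit of observed alternative allele reads on the expected count (depth × genotype / 2), and the `Call` record, function names, and toy read counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Call:
    genotype: int   # 0, 1 or 2 copies of the alternative allele
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternative allele


def alt_count_slope(calls):
    """OLS slope of observed alt reads on expected alt reads (depth * genotype / 2)."""
    xs = [c.depth * c.genotype / 2 for c in calls]
    ys = [float(c.alt_reads) for c in calls]
    n = len(calls)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("no variation in expected alternative allele counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def passes_qc(calls, aa_score, min_slope=0.5, min_aa_score=0.8):
    """Keep variants with slope >= 0.5 and AAscore > 0.8 (thresholds from the text)."""
    return alt_count_slope(calls) >= min_slope and aa_score > min_aa_score


# Toy data: a well-behaved variant lies on the identity line ...
good = [Call(0, 30, 0), Call(1, 30, 15), Call(1, 40, 20), Call(2, 30, 30)]
# ... while a spurious one shows far fewer alt reads than the genotypes imply.
bad = [Call(0, 30, 0), Call(1, 30, 3), Call(1, 40, 4), Call(2, 30, 6)]

print(passes_qc(good, aa_score=0.95), passes_qc(bad, aa_score=0.95))
```

A slope near 1 indicates that read support matches the called genotypes; the AAscore check is applied independently, so a variant must satisfy both conditions to be retained.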

Variant Effect Predictor (VEP)31 was used to attribute predicted consequences to the variants sequenced in the two datasets. LOF variants were classified as those predicted as start-lost, stop-gain, stop-lost, splice donor, splice acceptor, or frameshift. LOF variants with MAF < 0.1% were further evaluated by LOFTEE9, which determined which high-confidence LOF variants were used in this study.
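The variant selection described above amounts to a three-step filter, sketched below under stated assumptions. The consequence names follow the Sequence Ontology terms that VEP reports, and "HC" is the label LOFTEE uses for high-confidence calls; the dictionary layout, key names, and example records are hypothetical and do not reflect the authors' data format.

```python
# Consequence classes treated as LOF in the text, as Sequence Ontology terms.
LOF_CONSEQUENCES = frozenset({
    "start_lost",
    "stop_gained",
    "stop_lost",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "frameshift_variant",
})
MAF_THRESHOLD = 0.001  # MAF < 0.1%, from the text


def is_candidate_lof(variant):
    """Rare variant with a predicted LOF consequence."""
    return (variant["consequence"] in LOF_CONSEQUENCES
            and variant["maf"] < MAF_THRESHOLD)


def select_burden_variants(variants):
    """Keep rare LOF candidates that are flagged high-confidence ('HC')."""
    return [v for v in variants
            if is_candidate_lof(v) and v.get("loftee") == "HC"]


# Hypothetical records for illustration only.
variants = [
    {"id": "v1", "consequence": "stop_gained", "maf": 0.00005, "loftee": "HC"},
    {"id": "v2", "consequence": "stop_gained", "maf": 0.00005, "loftee": "LC"},
    {"id": "v3", "consequence": "missense_variant", "maf": 0.00005, "loftee": None},
    {"id": "v4", "consequence": "frameshift_variant", "maf": 0.02, "loftee": "HC"},
]

print([v["id"] for v in select_burden_variants(variants)])
```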

Association and meta-analysis

We applied logistic regression under an additive model to test for association between gene-based burden of LOF variants and PD, where the disease status was the dependent variable and genotype counts were the independent variable. To compute two-sided P values, we used a likelihood ratio test. Individuals were coded 1 if they carried any LOF variant in the autosomal gene being tested and 0 otherwise. In the Icelandic dataset, we adjusted for sex, current age or age at death, county of origin (equivalent to principal components), blood sample availability, and an indicator function for the overlap of the lifetime of the individual with the time span of phenotype collection. In the UK dataset, we adjusted for sex, age, the first 20 principal components, and three variables indicating sequencing batches, to remove batch effects. We used LD score regression intercepts32, 1.14 for Iceland and 1.00 for the UK, estimated using a set of 1.1 million variants, to adjust the χ2 statistics and avoid inflation due to cryptic relatedness and stratification. P values were calculated from the adjusted χ2 results.
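The carrier coding, likelihood ratio test, and intercept adjustment can be sketched as below. This is a simplified illustration rather than the analysis performed: it omits all covariates (for a single binary predictor the logistic regression likelihood then has a closed form), it assumes the adjustment divides the χ2 statistic by the LD score regression intercept, and the counts and function names are hypothetical.

```python
import math


def carrier_status(lof_allele_counts):
    """Code an individual 1 if they carry any qualifying LOF allele, else 0."""
    return int(any(count > 0 for count in lof_allele_counts))


def _binom_loglik(events, total):
    """Bernoulli log-likelihood at the MLE p = events / total (0 * log 0 = 0)."""
    ll = 0.0
    for k in (events, total - events):
        if k > 0:
            ll += k * math.log(k / total)
    return ll


def lrt_chi2(case_carriers, case_total, control_carriers, control_total):
    """LRT statistic: disease status ~ carrier status, no covariates (assumption)."""
    carriers = case_carriers + control_carriers
    n = case_total + control_total
    non_carriers = n - carriers
    case_non_carriers = case_total - case_carriers
    ll_full = (_binom_loglik(case_carriers, carriers)
               + _binom_loglik(case_non_carriers, non_carriers))
    ll_null = _binom_loglik(case_total, n)
    return 2.0 * (ll_full - ll_null)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def adjusted_p_value(chi2, ldsc_intercept):
    """Divide chi-square by the LD score intercept before computing P (assumption)."""
    return chi2_sf_1df(chi2 / ldsc_intercept)


# Hypothetical counts: 10 carriers among 1,000 cases, 20 among 100,000 controls.
chi2 = lrt_chi2(10, 1_000, 20, 100_000)
print(adjusted_p_value(chi2, 1.14), adjusted_p_value(chi2, 1.00))
```

With an intercept of 1.00 the statistic is unchanged, whereas an intercept of 1.14 shrinks it and yields a more conservative P value.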

Meta-analysis was performed on the summary results from the two datasets when available, using a fixed-effects inverse variance weighted method33, in which the datasets were allowed to have different population frequencies for alleles and genotypes but were assumed to have a common odds ratio (OR), and were weighted with the inverse of the variance of the effect estimate derived from the logistic regression. The significance threshold was based on the number of genes tested (Bonferroni correction). In a random-effects method, a likelihood ratio test was performed to test the heterogeneity of the effect estimate in the datasets; the null hypothesis is that the effects are the same in the two datasets and the alternative hypothesis is that the effects differ between datasets.
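A fixed-effects inverse variance weighted combination of per-dataset log odds ratios can be sketched as follows. The input effect sizes and standard errors are hypothetical, and the P value shown is a Wald approximation rather than the likelihood ratio test used in the study.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine log odds ratios with weights 1 / SE^2 (common-effect assumption)."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


def summarize(beta, se, z_crit=1.959963984540054):
    """Odds ratio, 95% CI and two-sided Wald P value from a log OR and its SE."""
    z = beta / se
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
    }


# Hypothetical per-dataset estimates (log OR, SE); not taken from the study.
betas = [math.log(6.0), math.log(8.0)]
ses = [0.6, 0.5]

beta, se = ivw_fixed_effects(betas, ses)
result = summarize(beta, se)
print(result)
```

Because the weights are inverse variances, the dataset with the smaller standard error dominates the combined estimate, and the combined standard error is smaller than either input.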

Additional datasets

The Neurodegenerative Disease Knowledge Portal (NDKP) framework is an open platform developed by a team of scientists and software engineers at the Broad Institute of MIT and Harvard. The NDKP includes results generated via consortia-based science focusing on neurodegenerative diseases such as ALS, PD, and Alzheimer’s disease. More information is available at https://ndkp.hugeamp.org/.

The Accelerating Medicines Partnership® (AMP®) program is a public-private partnership between the National Institutes of Health (NIH), multiple biopharmaceutical and life sciences companies, and non-profit organizations. Release 3.0 cohorts include the Michael J. Fox Foundation for Parkinson’s Research (MJFF) Parkinson’s Progression Marker Initiative (PPMI), The National Institute of Neurological Disorders and Stroke (NINDS) BioFIND study, Harvard Biomarkers Study (HBS), the NINDS Parkinson’s Disease Biomarkers Program (PDBP), the MJFF LRRK2 Cohort Consortium (LCC), the NINDS Study of Isradipine as a Disease Modifying Agent in Subjects With Early Parkinson Disease, Phase 3 (STEADY-PD3), the MJFF and NINDS Study of Urate Elevation in Parkinson’s Disease, Phase 3 (SURE-PD3), and the Global Parkinson’s Genetics Program (GP2). WGS was performed by Macrogen and the Uniformed Services University of Health Sciences using the Illumina HiSeq XTen sequencer with samples coming from whole blood. The data were processed using GATK Best Practices guidelines set by the Broad Institute’s joint discovery pipeline and elaborated on elsewhere34. Variant annotations were generated on the joint genotyped variants with VEP. All individuals were of European ancestry as confirmed by principal component analysis using HapMap3 European ancestry populations. The P value in Table 1 was generated with a two-sided Fisher’s Exact Test in R.
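For readers who wish to see how a two-sided Fisher's exact test on a 2 × 2 carrier-by-status table works, a minimal sketch follows. It sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is the convention used by R's fisher.test. The table below is assembled from the AMP-PD counts given in the text (six carriers, five with PD, among 3538 cases and 2365 controls); whether it matches the exact input behind Table 1 is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b          # e.g. carriers
    col1 = a + c          # e.g. cases
    n = a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k cases among the row1 carriers.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: carriers / non-carriers; columns: PD cases / controls.
p_amp = fisher_exact_two_sided(5, 1, 3538 - 5, 2365 - 1)
print(p_amp)
```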