Main

Parkinson’s disease (PD) is a neurodegenerative disease pathologically defined by Lewy body inclusions in the brain and the death of dopaminergic neurons in the midbrain. The identification of genetic risk factors is imperative for mitigating the global burden of PD, one of the fastest growing age-related neurodegenerative diseases. A large PD genome-wide association study (GWAS) meta-analysis uncovered 90 independent genetic risk variants in individuals of European ancestry1. Similarly, large-scale PD GWAS meta-analyses of East Asian2 and a single GWAS of Latin American3 individuals have each identified two risk loci that were not previously identified in Europeans. For PD, there are now large-scale efforts to sequence and analyze genomic data in underrepresented populations with the goals of identifying novel associated loci, fine-mapping known loci and addressing the inequality that exists in current precision medicine efforts4,5. Here we performed a large-scale multi-ancestry meta-analysis (MAMA) of PD GWASs by including individuals from four ancestral populations: European, East Asian, Latin American and African. This effort can serve as a guide for future genetic analyses to increase ancestral representation.

Meta-analyses identify 66 known and 12 novel loci

In addition to results from previously described European1, East Asian2 and Latin American3 studies, we also used FinnGen and additional datasets for East Asian, Latin American and African cohorts from 23andMe, Inc (Table 1, Fig. 1 and Supplementary Table 1). In total, we included 49,049 PD cases, 18,618 proxy cases (first-degree relative with PD) and 2,458,063 neurologically-healthy controls. Genetic covariance intercepts from linkage disequilibrium (LD) score regression6 within ancestries were close to zero or near the 95% confidence interval, implying that there is no sample overlap between the cohorts (Supplementary Table 1). After the data were harmonized and mapped to genome build hg19, MAMAs were conducted using a random-effects model and meta-regression of multi-ethnic genetic association (MR-MEGA)7. The random-effects model had greater power to detect homogenous allelic effects7. MR-MEGA uses axes of genetic variation as covariates in its meta-regression analysis and had greater power to detect heterogeneous effects across the different cohorts. MR-MEGA also distinguishes ancestral heterogeneity (differences in effect estimates due to ancestry-level genetic variation) from residual heterogeneity using axes of genetic variation generated from the allele frequencies across the different cohorts.

Table 1 Cohort descriptions
Fig. 1: MAMA study design.

Top panel: four ancestry groups used in the meta-analysis. Middle panel: MAMA and the two methods used. The random-effects model (top) is better suited for risk variants with homogeneous effect direction across different ancestries, whereas MR-MEGA (bottom) can identify risk variants with heterogeneous effects due to population stratification introduced by ancestry differences. The densely dashed lines indicate the Bonferroni-adjusted suggestive threshold of two-sided P < 1 × 10−6, and the loosely dashed lines indicate the Bonferroni-adjusted significant threshold of two-sided P < 5 × 10−9. Bottom panel: downstream analyses and their examples. Created with Biorender.com.

Combining results from the random-effects model and MR-MEGA, we found 12 novel PD risk loci and 66 hits in known risk loci from single-ancestry GWAS (Table 2, Fig. 2 and Supplementary Tables 2–5) that met the Bonferroni-corrected alpha of 5 × 10−9, a more stringent threshold chosen to account for the larger number of haplotypes resulting from the ancestrally diverse datasets8. Of the 12 novel loci identified, 9 were significant in the random-effects model, whereas 3 were only significant in MR-MEGA. Eight of the novel loci found by the random-effects model showed homogeneous effects across the four different ancestries. An additional novel locus (FASN) identified by the random-effects model showed homogeneous effects in all available populations, although this variant failed quality control in both East Asian datasets. The other three loci, identified exclusively in MR-MEGA, showed ancestrally heterogeneous effects. All three loci (IRS2, MYLK2 and USP25) showed evidence of significant ancestral heterogeneity (PANC-HET < 0.05) but no significant residual heterogeneity (PRES-HET > 0.148), supporting the idea that the signals are due to population structural differences rather than other confounding factors (Fig. 3). For the IRS2 locus (lead SNP rs1078514, PANC-HET = 5.3 × 10−3), the Finnish cohort has an opposite effect direction compared to the meta-analysis effect estimate (Supplementary Fig. 4). Similarly, at the MYLK2 locus the African effect estimate differs most from the meta-analysis effect estimate (lead SNP rs6060983, PANC-HET = 0.035), suggesting different effects between populations. Although this is a novel single-trait GWAS locus, its lead SNP was previously discovered as a potential pleiotropic locus in a multi-trait conditional/conjunctional false discovery rate (FDR) study between schizophrenia and PD9. Lastly, the USP25 locus had the most significant ancestral heterogeneity (lead SNP rs1736020, PANC-HET = 4.74 × 10−5) and its effects were specific to European and African cohorts, albeit in different directions. When looking at the nearest protein coding gene to each novel lead SNP and their probability of being loss-of-function intolerant (pLI) score, we found that 7 out of 12 genes had a pLI score of 0.99 or 1. Genes with low pLI scores were found both in loci with (MYLK2) and without (SYBU, PIGL and PPP6R2) significant ancestry heterogeneity.

Table 2 Meta-analysis results of lead SNPs in the novel loci
Fig. 2: Manhattan plots of the meta-analysis results across 2,525,730 participants.

a, Random-effects model test. b, MR-MEGA meta-regression test (chi-squared test with df = 4). The x axis shows chromosome and base pair positions of each variant tested in the meta-analyses. The y axis shows the two-sided P value with no multiple-test correction in the −log10 scale. Orange horizontal dashed line indicates the Bonferroni-adjusted significant threshold of P < 5 × 10−9. Gray horizontal dashed line indicates the truncation line, where all −log10 P values greater than 40 were truncated to 40 for visual clarity. Novel loci are highlighted in red and annotated with the nearest protein coding gene.

Fig. 3: Heterogeneity upset plots.

a, Top variants per novel locus. b, Top variants per MR-MEGA identified locus with moderate to high heterogeneity (I2 > 30%). The top bar plot illustrates heterogeneity with dark blue indicating ancestry heterogeneity proportion and light blue indicating other residual heterogeneity proportion. The bottom plot shows the subcohort level beta values with blue indicating positive and red indicating negative effect directions. Three variants with greater than 30% I2 total heterogeneity were only identified in the MR-MEGA meta-analysis method, whereas little to no heterogeneity is observed in loci identified in the random-effects model.

PESCA v0.3 (ref. 10) was run for the main European and East Asian meta-analyses and all loci identified in the main analysis were explored (Supplementary Table 6). PESCA uses ancestry-matched LD estimates to infer whether the causal variants are population-specific or shared between two populations. Variants identified as shared between the populations may be more likely to be causal. In addition, we expect higher posterior probability (PP) for shared causal variants in the loci identified by MAMA, even if they have not previously been identified in the single-ancestry study. The lead SNP in the RIMS1 locus (rs12528068) had a high PP for being a shared causal variant (PP = 0.972) despite being significant in the European study1 but not in the East Asian study2. We also observed that the novel lead variants for MTF2 (rs35940311), PIK3CA (rs11918587), EP300 (rs4820434) and PPP6R2 (rs60708277) had higher PP estimates for being shared causal variants across both populations (PPshared = 0.757, 0.214, 0.769, 0.946) than for being causal variants in a single population (PPEUR <0.080, PPEAS < 0.001). However, it is important to note that the sample size discrepancy between the European and East Asian data impacts our power to detect population-specific causal variants at any of these loci.

We found 17 suggestive loci that failed to meet our stringent significance threshold but had P < 5 × 10−8 in a fixed-effects meta-analysis and P < 1 × 10−6 in the random-effects meta-analysis (Supplementary Table 4). Fourteen of these regions were novel loci. Two loci near JAK1 and HS1BP3 were exclusively found in the 23andMe Latin American and African cohorts. The lead SNPs (rs578139575 and rs73919910) for these loci are non-coding and very rare in European populations but are more common in Africans and Latin Americans (gnomAD v3.1.2 minor allele frequencies in EUR: 0.02%, 0.23%; AFR: 1.64%, 8.84%; AMR: 0.41%, 1.91%). If confirmed, these loci would confer a strong effect on PD risk (beta: −1.3, −0.54). These loci merit further studies in the African and Latin American populations.

Fine-mapping identifies six credible sets with single variants

Fine-mapping was also performed using MR-MEGA, which uses ancestry heterogeneity to increase fine-mapping resolution. We identified 23 loci that had fewer than 5 variants within the 95% credible set. Of these, MR-MEGA nominated a single putative causal variant with >95% PP in 6 loci: TMEM163, TMEM175, SNCA, CAMK2D, HIP1R and LSM7 (Table 3 and Supplementary Tables 7 and 8). Our results affirmed previous results showing the TMEM175 p.M393T coding variant as the likely causal variant11. The putative variant in HIP1R has strong evidence for regulome binding (RegulomeDB rank ≤ 2). In particular, the HIP1R variant rs10847864 is located in a transcription start site that is active in substantia nigra tissue (chromatin state window: chr12:123326200–123327200) and in astrocytes in the spinal cord and the brain (chromatin state window: chr12:123326400–123326600). Outside of the credible sets containing a single variant, we identified missense variants in two genes: FCGR2A (p.H167R, PP = 0.145) and SLC18B1 (p.S30P, PP = 0.780).

Table 3 MR-MEGA fine-mapping results for loci with a single SNP within the 95% credible set

Gene set analysis finds enrichment in brain tissues

We used the Functional Mapping and Annotation (FUMA) software12,13 to functionally annotate the random-effects results. We generated a custom 1000 Genomes reference panel that reflected the ancestry proportions of our dataset and ran multi-marker analysis of genomic annotation (MAGMA)14 for gene ontology, tissue level and single-cell expression data. We tested 16,992 gene ontology sets in MSigDB v7.0 (ref. 15) and used conditional analysis to discard redundant terms or identify gene sets that must be interpreted together. We found that 40 gene sets were significantly enriched, with conditional analysis identifying 13 gene sets that share their signals with at least one other gene set (Supplementary Table 9). This is a substantial increase from the 10 gene sets previously reported in the European meta-analysis performed by Nalls and colleagues1. Only two gene ontology terms that were significant in the Nalls et al. meta-analysis were also significant in the multi-ancestry results after multiple test correction: ‘curated geneset: Ikeda MIR30 Targets Up’ (PFDR = 0.018) and ‘cellular component: vacuolar membrane’ (PFDR = 0.047). In addition, ontology terms in immune system pathways (microglial cell proliferation, macrophage proliferation, natural killer T cell differentiation: PFDR < 0.04), mitochondria (response to mitochondrial depolarization: PFDR = 0.028), vesicles (vesicle uncoating, phagolysosome assembly, regulation of autophagosome maturation: PFDR < 0.03) and tau protein (tau protein kinase activity: PFDR = 0.034) were significant. At the tissue level, the genes of interest were enriched in all brain cell types, as well as pituitary tissue (Supplementary Fig. 9), consistent with the results from Nalls et al.1.

When analyzing single-cell RNA-sequencing data, there was no expression enrichment across 88 brain cell types in mouse brain data when cross-referenced with DropViz16 (Supplementary Fig. 10). There was also no enrichment of any specific cell types in the substantia nigra tissue in DropViz (Supplementary Fig. 10). However, in human midbrain data17, dopaminergic (DA1) and GABAergic (GABA) neurons were enriched (Supplementary Fig. 10).

eQTLs and SMR nominate 25 putative genes near novel loci

We also searched the GTEx v8 (ref. 18) brain tissue eQTLs and multi-ancestry eQTL meta-analysis of the brain19 to correlate novel loci with gene expression data (Supplementary Tables 10 and 11). To correlate potential putative genes with PD risk, we searched the significant-eQTL genes and genes near the loci with previously completed summary-based Mendelian randomization (SMR)20 results in European-only data. When comparing the SNPs in novel loci with multi-ancestry brain eQTLs19, 28 genes were significant (Supplementary Fig. 8 and Supplementary Tables 10 and 11). SMR found 25 genes in four novel loci associated with PD risk (Table 2 and Supplementary Table 12). Interestingly, PPP6R2 and CENPV expression changes in substantia nigra were associated with PD risk. PPP6R2 encodes protein phosphatase 6 regulatory subunit 2, a regulatory protein for protein phosphatase 6 catalytic subunit (PPP6C), which is involved in the vesicle-mediated transport pathway. Centromere protein V (CENPV) is involved in centromere formation and cell division.

Discussion

This study is a large-scale GWAS meta-analysis of PD that incorporates multiple diverse ancestry populations. From the joint cohort analysis, we identified 66 independent risk loci near previously known PD risk regions and 12 potentially novel risk loci. Of the putative novel loci, nine had homogeneous effects and three had heterogeneous effects across the different cohorts. We found 17 additional suggestive loci using fixed-effects meta-analysis threshold at P < 5 × 10−8 and random-effects meta-analysis threshold at P < 1 × 10−6. We fine-mapped 23 loci by leveraging the diverse ancestry populations. We highlighted tissues and cell types associated with PD risk, which were consistent with previous findings1. Finally we used SMR to nominate 25 putative genes near our novel loci.

Novel loci contained genes in pathways previously implicated in PD. The MTF2 and PPP6R2 loci contain the genes TMED5 and PPP6R2. Protein TMED5 localizes to Golgi body21 and PPP6C, regulated by PPP6R2, is part of the vesicular transport pathways (https://reactome.org/content/detail/R-HSA-199977)22, both of which are implicated in PD pathogenesis23,24,25,26,27,28. eQTL and SMR analysis showed association between expression changes for PPP6R2 and CENPV in substantia nigra and PD risk. Because substantia nigra deterioration is a hallmark pathogenic feature of PD, PPP6R2 and CENPV merit additional investigation. Within a known locus, a new independent signal was found in RILPL2 (rs28659953). Protein RILPL2 interacts with LRRK2-phosphorylated Rab10 to block primary cilia generation29. Genes JAK1 and HS1BP3 are in two suggestive loci that were found only in Latin American and African populations. JAK1 is one of the proteins in the Janus kinase family, which is a critical part of the JAK-STAT pathway and is implicated in cytokine and inflammatory signaling30. JAK1 variants have been implicated in autoimmune diseases such as juvenile idiopathic arthritis and multiple sclerosis31. HS1BP3, also known as essential tremor 2 (ETM2), has been implicated in essential tremor32,33,34. Based on its sequence, ETM2 may modulate interleukin-2 signaling35. If these loci are confirmed, they would further support the growing appreciation for the role of inflammation in PD36. All of the potentially novel PD loci identified in this analysis will require additional replication and functional validation to elucidate their role in PD pathogenesis. Previous findings in European populations found that polygenic risk scores explained 16–36% of PD heritability1. Although we did not perform similar tests incorporating our novel loci, they may explain additional heritable PD risk.

We found that 26 of the 66 detected known PD loci had nominally significant ancestral heterogeneity (PANC-HET < 0.05) and 10 remained significant after Bonferroni correction (PANC-HET < 0.05/62 MR-MEGA loci) (Fig. 3 and Supplementary Table 3). This heterogeneity may be caused by differences in effect sizes and allele frequencies between the different populations, and these loci should thus be studied as loci with potentially ancestrally divergent risk. Eighteen of the previous 92 known loci from single-ancestry GWASs did not overlap with any genome-wide significant loci in the multi-ancestry results at the significance threshold of 5 × 10−9 (Supplementary Table 13). However, our results do not necessarily invalidate these previous results. First, several of the cohorts have small sample sizes, which may increase the influence of sampling variation. Second, the stringent genome-wide significance threshold of 5 × 10−9 may contribute. Although this is a large PD GWAS meta-analysis, the more stringent significance threshold further raises the sample size needed to achieve equivalent statistical power. Of the 17 European loci identified, 3 were significant at the 5 × 10−8 threshold, and all 17 loci were at least nominally significant with the MR-MEGA method (PMR-MEGA < 5 × 10−6). Lastly, variants may be more specific to the population in which they were first identified. Five of the 17 variants had nominal ancestral heterogeneity (PANC-HET < 0.05). It is worth noting that there are large differences in statistical power across ancestries. Additional population-specific loci will likely reach significance when larger sample sizes are available for non-European datasets.

Our fine-mapping isolated several putative causal variants in previously discovered loci. TMEM175-rs34311866 has been previously identified as functionally relevant to PD risk37, which is consistent with our fine-mapping results. Fine-mapped variants in TMEM163, HIP1R and CAMK2D were also found to be parts of active or strong transcription sites in substantia nigra tissues. Among the fine-mapped variants were two missense variants in FCGR2A and SLC18B1, albeit with a lower PP than the six singular putative variants that we highlighted in Table 3. FCGR2A is present in multiple immune-related ontology gene sets, further highlighting the potential role of the immune system in PD pathology. However, the function of SLC18B1 is still unknown. Although the fine-mapping results provided by MR-MEGA are sufficient to identify putative causal variants for loci driven by one independent signal, multiple variants in a locus can contribute to complex traits. The additive and epistatic effects of multiple causal variants in a locus can be difficult to interpret when the effects associated with each independent signal are small.

The gene ontology analysis found multiple pathways that may be relevant to PD pathology (Supplementary Table 9), including those related to mitochondria (response to mitochondrial depolarization), vesicles (vesicle uncoating, phagolysosome assembly, regulation of autophagosome maturation), tau protein (tau protein kinase activity) and immune cells (microglial cell/macrophage proliferation and natural killer T cell differentiation)36. Neither mitochondrial nor immune cell pathways were significant in the previous European-only meta-analysis. Novel signals from the multi-ancestry approach may have given enough power to highlight these ontology terms. Out of 10 ontology terms that were significant in the previous European-only meta-analysis1, 4 terms were not tested due to version differences in MSigDB and only 2 of the remaining terms were significant. However, the other 4 terms were still nominally significant at P < 0.05. This may be due to genome-wide signals that were less significant due to their heterogeneity across the different populations.

Although this is a large multi-ancestry PD meta-analysis GWAS, the European population is still overrepresented. Around 80% of full PD cases are of European descent. Individuals of African descent were particularly underrepresented at just 0.5% of the effective PD cases. The discoveries in our study warrant future efforts to expand studies in more diverse populations. The Global Parkinson’s Genetics Program (GP2) is partnering with institutions that care for underrepresented populations to generate data for these underserved communities all over the world5, and we will continue the ongoing analysis as more participants are genotyped. The first PD GWASs also failed to identify significant signals38,39, and we are confident that future diverse ancestry GWASs will likewise produce impactful association results as sample sizes increase. Further efforts in multi-ancestry and non-European GWAS will identify loci that are more relevant to the global population and will continue to facilitate fine-mapping efforts to identify the genetic variants that drive these associations.

Methods

Study design and cohort descriptions

We used a single joint meta-analysis study design to maximize statistical power40. We used datasets representing four different ancestry groups: European, East Asian, Latin American and African. The meta-analysis included 49,049 PD cases, 18,618 PD proxy cases (participants with a parent with PD) and 2,458,063 neurologically normal controls (Table 1 and Supplementary Table 1). GWAS results of European1, East Asian2 and Latin American3 populations were previously reported. The African dataset as well as the additional Latin American and East Asian PD GWAS summary statistics were provided by 23andMe. The Finnish PD GWAS summary statistics were acquired from FinnGen Release 4. For the FinnGen data, we chose the endpoint ‘Parkinson’s disease (more controls excluded)’ (G6_PARKINSON_EXMORE), which excludes control participants with psychiatric diseases or neurological diseases. Although some FinnGen GWAS results also include UK Biobank participants, our FinnGen data did not include any UK Biobank participants.

23andMe diverse ancestry data

All self-reported PD cases and controls from 23andMe provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited institutional review board (IRB), Ethical & Independent Review Services (E&I Review). Participants were included in the analysis on the basis of consent status as checked at the time data analyses were initiated. The name of the IRB at the time of the approval was Ethical & Independent Review Services. Ethical & Independent Review Services was recently acquired, and its new name as of July 2022 is Salus IRB (https://www.versiticlinicaltrials.org/salusirb). Samples were genotyped on one of five genotyping platforms: V1 and V2, which are variants of Illumina HumanHap550+ BeadChip; V3, Illumina OmniExpress+ BeadChip; V4, Illumina custom array that includes SNPs overlapping V2 and V3 chips; or V5, Illumina Infinium Global Screening Array. For inclusion, samples needed a minimal call rate of 98.5%. Genotyped samples were then phased using either Finch or Eagle2 (ref. 41) (RRID:SCR_015991) and imputed using Minimac3 (RRID:SCR_009292) and a reference panel of 1000 Genomes Phase III42 (GRCh38) and UK10K data43. For this study, samples were classified as African, East Asian or Latino using a genotype-based pipeline44 consisting of a support vector machine and a hidden Markov model, followed by a logistic classifier to differentiate Latinos from African Americans. Unrelated individuals were included in the analysis, as determined via identity-by-descent (IBD). Variants were tested for association with PD status using logistic regression, adjusting for age, sex, the first five principal components and genotyping platform. Reported P values were from a likelihood ratio test.

MAMA

We performed MAMA of GWAS results using MR-MEGA v0.2 (ref. 7) and PLINK 1.9 (RRID:SCR_001757). MR-MEGA performs a meta-regression by generating axes of genetic variation for each cohort, which are then used as covariates in the meta-analysis to account for differences in population structure. Although MR-MEGA was able to generate four principal components as axes of genetic variation, three principal components visibly separated the super population ancestries and explained 98% of the population variance (Supplementary Fig. 7). Therefore, we used three principal components to minimize overfitting. MR-MEGA has reduced power to detect associations for variants with homogeneous effects across populations. It is therefore recommended to run MR-MEGA alongside another meta-analysis method. PLINK 1.9 was used to perform random-effect meta-analysis to detect homogenous allelic effects.
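To make the random-effects step concrete, the following is a minimal Python sketch of a random-effects meta-analysis for a single variant. It assumes per-cohort effect estimates and standard errors as inputs and uses the DerSimonian-Laird estimator of between-cohort variance; the function name and the choice of estimator are assumptions made for illustration and may not match the exact PLINK 1.9 implementation.

```python
import math


def random_effects_meta(betas, ses):
    """DerSimonian-Laird random-effects meta-analysis for one variant.

    betas: per-cohort effect estimates (log odds ratios)
    ses:   per-cohort standard errors
    Returns (beta_re, se_re, p_two_sided, tau2, i2_percent).
    """
    # Inverse-variance (fixed-effects) weights
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw

    # Cochran's Q statistic and its degrees of freedom
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1

    # Between-cohort variance (tau^2), truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each cohort variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))

    # Two-sided P value from the normal approximation
    z = beta_re / se_re
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # I^2: share of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta_re, se_re, p, tau2, i2
```

When the cohort estimates agree, tau2 is zero and the result coincides with the fixed-effects estimate, which is why this model suits variants with homogeneous allelic effects.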

Before the analysis, all datasets were harmonized to genome build hg19 using CrossMap45 (RRID:SCR_001173) and Python 3.7. All variants were filtered by imputation score (r2 > 0.3) and minor allele frequency ≥0.001. Only autosomal variants were kept in the final results as sex-chromosome data were not available for all ancestries. In total, 20,590,839 variants met the inclusion criteria. However, MR-MEGA has a cohort-number requirement that varies based on the number of axes of variation. Therefore, 5,662,641 SNPs present in at least 6 of the 7 cohorts were analyzed in the MR-MEGA analysis. Bonferroni-adjusted alpha was set to a more stringent 5 × 10−9 for all MAMAs to account for the larger number of haplotypes resulting from the ancestrally diverse datasets8. Genomic inflations were measured for all cohorts and the meta-analysis. Inflation for cohorts with large discrepancy between the case and control numbers was normalized to 1,000 cases and 1,000 controls. All inflation was nominal and below 1.02 (Supplementary Figs. 1–3 and Supplementary Table 1). No genomic control was applied prior to meta-analysis.
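The normalization of genomic inflation to 1,000 cases and 1,000 controls can be sketched as follows. The text does not state the formula used, so this example assumes the commonly used rescaling often written as λ1000; the function name is hypothetical.

```python
def lambda_1000(lambda_gc, n_cases, n_controls):
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls.

    Assumed formula (the widely used lambda_1000 rescaling):
    1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lambda_gc - 1.0) * scale
```

Under this rescaling, a cohort with 10,000 cases and 10,000 controls and a raw inflation of 1.2 would be reported as approximately 1.02, because inflation driven by polygenic signal grows with sample size.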

We identified genomic risk loci within our meta-analysis results using Functional Mapping and Annotation (FUMA) v1.3.8 (refs. 12,13). In brief, FUMA first identifies independent significant SNPs in the GWAS results by clumping all significant variants with the r2 threshold <0.6, and then a locus is defined by merging LD blocks of all independent significant SNPs within 250 kb of each other. The start and end of a locus are defined by identifying SNPs in LD with the independent significant SNPs (r2 ≥ 0.6) and defining a region that encompasses all SNPs within the locus. Lead SNPs within a locus are determined by further clumping the independent significant variants within the genomic locus (r2 ≥ 0.1). The 1000 Genomes reference panel with all ancestries was used to calculate the r2.
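The locus-merging step described above can be illustrated with a short Python sketch. It covers only the merging of LD blocks that lie within 250 kb of each other and assumes the block boundaries have already been derived from LD; it is not FUMA's implementation, and the function name and tuple layout are hypothetical.

```python
def merge_ld_blocks(blocks, max_gap=250_000):
    """Merge LD blocks on the same chromosome that lie within max_gap bp.

    blocks: iterable of (chrom, start, end) tuples, one per independent
            significant SNP, with boundaries already derived from LD.
    Returns a list of merged (chrom, start, end) loci.
    """
    merged = []
    for chrom, start, end in sorted(blocks):
        if merged:
            prev_chrom, prev_start, prev_end = merged[-1]
            # Extend the previous locus when the gap is small enough
            if chrom == prev_chrom and start - prev_end <= max_gap:
                merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
                continue
        merged.append((chrom, start, end))
    return merged
```

Sorting first means each block only needs to be compared with the most recently built locus.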

To determine if any associated loci in the meta-analysis were not previously identified, all significant SNPs were compared to the 92 known PD risk variants found in the previous two major meta-analyses1,2. Two variants identified in the Latin American admixture population3 could not be replicated, as the variants and their proxies were removed during quality control. If a genomic risk locus contained a significant hit in either population within 250 kb, then the locus was considered a known hit. Otherwise, the locus was considered a novel hit. Forest plots and QQ plots were generated using Python 3.7 with seaborn v0.11.2 and matplotlib v3.5.1. Manhattan plots were generated using gwaslab v3.3.11.
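The known-versus-novel rule can be written as a small Python check. This sketch assumes that "within 250 kb" is measured from the locus boundaries, which is one possible reading of the rule; the function name and input layout are hypothetical.

```python
def is_known_locus(locus, known_variants, window=250_000):
    """Return True if a known risk variant lies within `window` bp of a locus.

    locus:          (chrom, start, end) of a genomic risk locus
    known_variants: iterable of (chrom, position) for known PD risk variants
    Assumption: distance is measured from the locus boundaries.
    """
    chrom, start, end = locus
    return any(
        k_chrom == chrom and start - window <= k_pos <= end + window
        for k_chrom, k_pos in known_variants
    )
```

A locus for which this check returns False would be labeled novel under the stated rule.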

Fine-mapping

Fine-mapping was performed using MR-MEGA7, which approximates a single-SNP Bayes factor in favor of association. This is reported as the natural log of Bayes factor (lnBF) per SNP in the MR-MEGA meta-analysis summary statistics. SNPs were selected at meta-GWAS significance level (P < 5 × 10−9). PPs of driving the association signal at each locus were calculated from the Bayes factor as follows:

$$\pi_{j}=\frac{\varLambda_{j}}{\sum_{k=1}^{n}\varLambda_{k}},$$

where Λj is the Bayes factor of the jth SNP within a locus with n number of SNPs. Credible sets of fewer than 5 SNPs with sum PP (πj) greater than 0.95 were accepted as putative causal variants. We excluded results located in the major histocompatibility complex region and the MAPT locus due to their complex LD structure.
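The PP calculation and credible-set construction can be sketched in Python as follows. The sketch assumes the per-SNP lnBF values for one locus have already been read from the summary statistics; the function names and SNP identifiers are hypothetical, and the subtraction of the maximum lnBF is a numerical-stability step added here, not something stated in the text.

```python
import math


def posterior_probabilities(ln_bfs):
    """Convert per-SNP ln(Bayes factor) values in one locus into PPs."""
    # Subtract the maximum before exponentiating to avoid overflow;
    # the constant cancels in the ratio, so the PPs are unchanged.
    m = max(ln_bfs)
    weights = [math.exp(x - m) for x in ln_bfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(snp_ids, ln_bfs, coverage=0.95):
    """Smallest set of SNPs whose cumulative PP reaches `coverage`."""
    pps = posterior_probabilities(ln_bfs)
    ranked = sorted(zip(snp_ids, pps), key=lambda t: t[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, pp in ranked:
        chosen.append((snp, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```

Under the criterion described above, a locus would be retained when the returned set has fewer than 5 SNPs, and a set of size one corresponds to a single putative causal variant with PP above 0.95.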

Estimation of population-specific or shared causal variants at associated loci

Proportion of population-specific and shared causal variants (PESCA v0.3)10 was used to estimate whether causal variants at the loci identified in the meta-analysis were population-specific or shared between two populations. In brief, genome-wide heritability was estimated for the European and East Asian GWAS summary statistics using LD score regression6,46. Summary statistics of both populations were intersected with common variants with the 1000 Genomes reference panels provided by PESCA, which have already been LD pruned (R2 > 0.95) and had low-frequency SNPs removed (minor allele frequency < 0.05). The intersected variants were further split according to independent LD regions from the European and East Asian populations. The genome-wide prior probabilities of population-specific and shared causal variants were calculated using default parameters or as otherwise recommended by PESCA; then the results were used to calculate the PP for each variant. When the lead SNP was unavailable in the results, proxy variants (R2 > 0.8) were used to approximate the PP for each variant for East Asian and European ancestry using R 4.2.0 and LDlinkR v1.1.2 (ref. 47). Other cohorts were not included due to sample size constraints for this method.

Functional annotation and GSEA

Functional annotation of the discovery results utilizing publicly available annotation data was done using FUMA v1.3.8 (refs. 12,13). The summary statistics were annotated by ANNOVAR48 (RRID:SCR_012821) through the FUMA platform. Our meta-analysis results were analyzed using MAGMA14 (RRID:SCR_001757) to check for enrichment in gene ontology terms and gene expression data from tissues in GTEx v8 (ref. 18). We tested 16,992 gene sets and gene ontology terms from MSigDB v7 (ref. 15) as well as single-cell RNA-sequencing expression data from mouse brain samples in DropViz16 and human ventral midbrain samples17. Test parameters were set to default. MAGMA gene analysis was run with a custom 1000 Genomes reference panel that had a similar proportion of European, East Asian, Latin American and African participants as our main analysis. In short, we added all European participants and randomly selected participants from the East Asian, Latin American and African populations until the ancestry proportions of the reference panel matched the effective sample size proportions of our study. The MAGMA gene analysis results were then analyzed using gene set analysis for ontology terms and gene-property analysis for tissue specificity. Results were adjusted for multiple tests using Benjamini–Hochberg FDR correction with the alpha of 0.05. The significant ontology terms were analyzed again in conditional analyses to identify and filter terms that share the same signals. Conditional analyses rerun the analyses with significant ontology terms as additional covariates. This can identify terms that lose significance when ‘conditioned’ on another, which may mean the terms share an underlying signal. When a term lost significance while the paired term retained nominal significance, the term that was no longer significant was discarded. When both terms lost significance, both were retained but highlighted with the comment that the pairs need to be interpreted together. Tissue level enrichment analysis was done using the pre-processed GTEx gene expression dataset provided by FUMA investigators. Single-cell expression enrichment analyses were performed by uploading the MAGMA gene analysis results to the FUMA cell-type analysis tool, which runs the MAGMA gene-property analysis with the chosen RNA-sequencing data. Additional pathway analyses of genes mapped by FUMA SNP2GENE were performed through GENE2FUNC with default parameters.
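The Benjamini–Hochberg adjustment applied to the gene set results can be illustrated with a short, dependency-free Python sketch. It is a generic implementation of the step-up procedure, not the code used by FUMA or MAGMA, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene set would then be called significant when its adjusted value falls below the chosen alpha of 0.05.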

SNPs in the novel loci were searched in multi-ancestry brain eQTL meta-analysis results19 (under Synapse ID syn23204884). We used a P-value cutoff of 10−6 as previously described19. eQTL and GWAS comparison plots were generated using LocusCompareR49. Multi-SNP SMR was used to test if DNA methylation and/or RNA expression of genes near the novel loci were associated with PD risk20. The nearest genes from the lead SNPs, significant genes in MAMA brain eQTL results and significant genes in GTEx v8 brain tissue were chosen for SMR. In total, 44 genes near the novel loci were searched in a list of previously completed PD SMR results from European-only GWAS meta-analysis (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/news/nightingale-health-and-uk-biobank-announces-major-initiative-to-analyse-half-a-million-blood-samples-to-facilitate-global-medical-research)18,20,50,51,52,53,54,55,56. Only tissues in the central nervous system, digestive system and blood were used due to their relevance to PD pathology. Methylation probes were annotated using the Bioconductor R package IlluminaHumanMethylation450kanno.ilmn12.hg19 v0.6.0 (https://bioconductor.org/packages/release/data/annotation/html/IlluminaHumanMethylation450kanno.ilmn12.hg19.html). The association signals were adjusted using FDR correction with the alpha of 0.05 and all signals with PHEIDI < 0.05 were removed due to heterogeneity.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.