Introduction

Parkinson’s Disease (PD), known for bradykinesia, resting tremors, and increased muscle rigidity, significantly contributes to global disease burden due to its prevalence and disability rate1. Epidemiological studies have identified risk factors such as smoking, alcohol intake, and physical activity for PD prevention and prediction2. However, many potential risk factors may remain undiscovered due to the hypothesis-driven nature of current research.

PD is a complex neurodegenerative disorder influenced by genetic and environmental factors. While only about 3–5% of PD cases are due to clear genetic causes (monogenic PD), genetic risk variants account for 16% to 36% of PD heritability3,4. This estimate was obtained from an LD score regression heritability analysis, which includes common variants within the loci where monogenic PD causes are located. Genome-Wide Association Studies (GWAS) have identified several PD-related risk factors, showcasing the effectiveness of genetic association in uncovering PD risks5. Nonetheless, no large-scale phenome-wide analysis of biobank-level data using polygenic risk scores (PRS) as an input for PheWAS has been conducted to date.

With the progression of large cohort GWAS, the use of PRS has significantly increased. Although each single genetic variant contributes minimally to disease susceptibility, PRS, which aggregates the additive effects of common genetic variants across the genome, can explain a substantial proportion of phenotypic variance. A Phenome-Wide Association Study (PheWAS) is a hypothesis-free analysis that aims to identify multiple phenotypes associated with a single genetic risk score or genotype, exploring a wide range of phenotypes genetically linked to various diseases. PheWAS are less constrained by prior assumptions than studies focusing on the association between a single trait and genetic risk scores, an important feature when our understanding of disease mechanisms is incomplete. Genotype-based PheWAS methods also offer significant advantages because they rely on genetic variation that is fixed from birth, making them less susceptible to confounding and reverse causality6. Using PRS to model PD risk allows for a systematic evaluation of its associations with various phenotypes. Earlier studies underutilized this methodology due to limited phenotype data and genomic resources. However, the availability of extensive biobanks such as the UK Biobank (UKB) now affords an unprecedented opportunity for the application of this approach7. Compared to traditional observational studies that require follow-up periods and substantial sample sizes, this approach enhances statistical power by leveraging both genetic and phenotypic information, thus uncovering the connections between PD and a broad range of phenotypes.

Mendelian Randomization (MR) is a method that employs genetic variation as an instrumental variable for assessing causal relationships8. It primarily utilizes genetic variants that exhibit robust associations with the exposure of interest, which are subsequently treated as instrumental variables (IV)9. MR is less vulnerable to confounding and reverse causality than traditional studies, as genetic variations, set at conception, do not change with disease status10. Specifically, MR is used to investigate the causality of factors associated with PD, providing insights into how these factors might contribute to the development or progression of the disease. The MR methodology can be integrated with an exploratory, hypothesis-agnostic PheWAS framework11, enabling the systematic exploration of associations across a broad spectrum of disease outcomes or traits12.

In this study, we employed summary statistics derived from the most recent GWAS meta-analysis of PD, which includes data predominantly from individuals of European ancestry4, alongside genomic and phenomic datasets obtained from the UKB, to conduct a Polygenic Risk Score-based Phenome-Wide Association Study (PRS-based PheWAS) for PD. This investigation covered an extensive array of phenotypes, spanning domains such as physical and mental health, biochemical parameters, and socio-demographic factors. To further investigate the nature of novel associations identified, we conducted a supplementary two-sample MR analysis using an independent population. While the PRS-based PheWAS identifies potential associations between genetic risk scores and phenotypes, the MR analysis is employed to assess the potential causal relationships between these phenotypes and genetic susceptibility to PD. By identifying these phenotypes and exploring their potential causal relationships with PD, we can gain insights into the biological pathways and mechanisms that underlie the genetic risk for PD. This research enhances our understanding of the PD phenotype and its underlying genetic architecture. Furthermore, it lays a foundation for future investigations aimed at exploring potential causal relationships between the phenotypes identified in association with genetic risk for PD.

Results

Study sample overview

The UK Biobank (UKB) study utilized data from 502,364 British participants recruited between 2006 and 2010, aged 37 to 73 years. Following stringent quality control measures and genetic analysis, including the exclusion of individuals based on SNP call rates, minor allele frequency, non-white British ancestry, and familial relatedness, the final cohort consisted of 407,917 participants (Supplementary Fig. 1).

PheWAS identifies 267 factors significantly associated with PD-PRS

Our PheWAS analyzed 1851 variables across 11 categories: cognitive function, early-life risk factors, employment, health conditions, lifestyle and environment, medications and operations, mental health, neuroimaging, physical measures, sex-specific factors, and sociodemographic measures. These categories were reorganized from the UK Biobank’s original six categories; detailed classifications are given in Fig. 1 and Supplementary Table 1.

Fig. 1: Pie chart illustrating the distribution of the 1851 factors.

The PRS was calculated using 8,804,535 SNPs. The prevalence of PD increased with rising PRS (Supplementary Fig. 2), indicating a robust predictive capacity of the PRS used in this study for PD. In our PheWAS analysis, we identified significant associations between PD-PRS and 267 phenotypes (comprising one cognitive function phenotype, three early-life risk factors, ten health conditions phenotypes, 35 lifestyle and environment phenotypes, one medication phenotype, ten mental health phenotypes, 120 neuroimaging phenotypes, 75 physical measures phenotypes, three sex-specific factors, and nine sociodemographic measures) out of the 1851 phenotypes examined (Fig. 2, Supplementary Fig. 3, and Supplementary Tables 2–4). These associations retained statistical significance at a minimum of four PRS p-value thresholds following FDR correction for multiple comparisons (β values ranged from −0.092 to 0.339, where β denotes standardized regression coefficients, and pFDR for linear regression ranged from 0.049 to 5.80 × 10−41). For each of the 267 phenotypes, all significant associations showed an identical effect direction (Supplementary Table 4). The proportions of significant findings by category were 37.5% (3 out of 8) for early-life risk factors, 11.4% (35 out of 308) for lifestyle and environmental phenotypes, 13.5% (10 out of 74) for mental health phenotypes, 64.5% (120 out of 186) for neuroimaging phenotypes, 24.7% (75 out of 304) for physical measures phenotypes, and 12.9% (9 out of 70) for sociodemographic measures. In contrast, cognitive function phenotypes had a lower proportion of 3.23% (1 out of 31), and health conditions phenotypes only 1.32% (10 out of 755). Sex-specific factors and medications and operations showed proportions of 9.38% (3 out of 32) and 1.75% (1 out of 57), respectively.
No significant associations were observed in the employment phenotypes. Among these associations, 107 phenotypes remained significant after Bonferroni correction (threshold p < 3.38 × 10−6). This subset comprises one early-life risk factor, six lifestyle and environmental phenotypes, two mental health phenotypes, 61 neuroimaging phenotypes, 35 physical measures phenotypes, one sex-specific factor, and one sociodemographic phenotype (Supplementary Table 5). Bonferroni correction, although stringent, may be overly conservative because of the inherent correlations among the tested phenotypes. More details of the PheWAS analysis are provided in the Supplementary Results.
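The Bonferroni threshold reported here can be reproduced from the test counts given in this paper. The short Python sketch below is illustrative only: it assumes a family-wise significance level of 0.05 (not stated explicitly in the text), takes the per-category phenotype counts from the proportions above, and takes the 26 employment phenotypes and the eight PRS p-value thresholds from the Methods.

```python
# Number of phenotypes tested per category, as reported in the text.
phenotypes_per_category = {
    "cognitive function": 31,
    "early-life risk factors": 8,
    "employment": 26,
    "health conditions": 755,
    "lifestyle and environment": 308,
    "medications and operations": 57,
    "mental health": 74,
    "neuroimaging": 186,
    "physical measures": 304,
    "sex-specific factors": 32,
    "sociodemographic measures": 70,
}

n_phenotypes = sum(phenotypes_per_category.values())
n_prs_thresholds = 8                      # eight PD-PRS p-value thresholds
n_tests = n_phenotypes * n_prs_thresholds

alpha = 0.05                              # assumed family-wise error rate
bonferroni_threshold = alpha / n_tests

print(n_phenotypes, n_tests, f"{bonferroni_threshold:.2e}")
```

With these inputs the script gives 1851 phenotypes, 14,808 tests, and a threshold of about 3.38 × 10−6, matching the values in the text.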

Fig. 2: PheWAS Manhattan plot showing associations of phenotypes with PD PRS, grouped by categories.

The x-axis represents phenotypes, and the y-axis represents the −log10 of uncorrected p values from two-sided tests for linear regression between each phenotype and the PD-PRS (the most significantly correlated one of the 8 different threshold PD-PRSs, see Supplementary Table 3 for detailed results). Each dot represents one phenotype, and the colours indicate the corresponding categories. The size of the dots corresponds to the magnitude (absolute value) of the effect between the phenotype and PD-PRS. The solid dots represent phenotypes that exhibited significant correlations with the PD-PRS at a minimum of four PRS variant p-value thresholds. The dashed lines indicate the threshold to survive FDR correction.

Two-sample Mendelian randomization of UK Biobank phenotypes on PD

Of the 267 potential causal effects identified in the PheWAS, 194 had a relevant GWAS in MR-Base and were hence eligible for follow-up (Supplementary Table 6). Potentially causal effects on PD were found for 35 of the 194 factors in the IVW MR analyses, with the same effect direction as in the PheWAS (Supplementary Table 7). These associations were all significant at p < 0.05 (IVW method) without directional pleiotropy (Supplementary Table 7). To ensure the validity of our MR assumptions, we verified the relevance assumption by confirming that the instrumental variables (SNPs) used were strongly associated with the exposure variables, as indicated by their respective GWAS significance levels (Supplementary Table 7). For the independence assumption, we performed MR-Egger intercept tests, which showed no significant intercepts, indicating no horizontal pleiotropy (Supplementary Figs. 4–36). The Steiger directionality test, which checks the direction of causality by testing whether the genetic instruments explain more variance in the exposure than in the outcome, did not identify any SNPs that explained more variance in PD risk than in the factors for any analysis (Supplementary Table 7). For guilty feelings and the area of the isthmus cingulate (left hemisphere), funnel plots and scatter plots could not be generated because of insufficient instrumental variables. The exclusion restriction assumption was supported by funnel plots for the remaining 33 phenotypes, which exhibited little evidence of departure from symmetry, indicating the absence of directional pleiotropy (Supplementary Figs. 4–36). Where the heterogeneity test detected substantial heterogeneity, we employed the random effects model to estimate the MR effect sizes directly. All outcomes consistently support the presence of a causal relationship (Supplementary Table 8). 
Combining single-SNP analysis and leave-one-out analysis to examine the robustness of the above results, we confirmed reliable conclusions regarding the potential causal effects of 27 factors on PD (Fig. 3 and Supplementary Tables 9 and 10). These factors include one factor in cognitive function (fluid intelligence score [ORIVW = 1.156, pIVW = 2.86 × 10−3]), one in early life factors (maternal smoking around birth [ORIVW = 0.052, pIVW = 4.19 × 10−3]), one in health conditions (overall health rating [ORIVW = 0.590, pIVW = 9.74 × 10−3]), eight in lifestyle and environment (age first had sexual intercourse, cereal intake, dried fruit intake, and past tobacco smoking [ORIVW = 1.255 – 1.762, pIVW = 9.65 × 10−3 – 0.022]; exposure to tobacco smoke outside home, plays computer games, salt added to food, and time spent watching television (TV) [ORIVW = 0.004 – 0.658, pIVW = 3.31 × 10−3 – 0.019]), 14 in physical measures (arm fat mass (left), arm fat mass (right), arm fat percentage (left), arm fat percentage (right), body fat percentage, body mass index (BMI, Field ID = 21001), body mass index (BMI, Field ID = 23104), leg fat mass (left), leg fat mass (right), leg fat percentage (left), leg fat percentage (right), trunk fat mass, trunk fat percentage, whole body fat mass [ORIVW = 0.718 – 0.863, pIVW = 1.99 × 10−3 – 0.025]), and two in sociodemographics (average total household income before tax and qualifications: College or University degree [ORIVW = 2.010 – 3.819, pIVW = 6.70 × 10−5 – 8.72 × 10−3]). Notably, the average total household income before tax was significantly associated with PD: MR analysis produced an ORIVW of 2.010 (pIVW = 6.70 × 10−5), with an FDR-corrected p-value of 0.011, indicating a robust association even after correcting for multiple comparisons.

Fig. 3: MR analysis of factors related to PD Risk.

Estimates were obtained from the inverse-variance weighted method. CI Confidence interval, OR Odds ratio. Further details are available in Supplementary Figs. 4–36 and Supplementary Tables 6–10.

Discussion

We conducted a PRS-based PheWAS analysis to understand and identify associations between genetic liability for PD and 1851 phenotypes available in the UK Biobank dataset. Among these PRS-outcome associations, 267 met our criteria for potential causal effects, spanning across categories such as cognitive function, early life factors, health conditions, lifestyle and environment, mental health, neuroimaging, physical measures, and sociodemographic factors. Of these, 194 were eligible for follow-up studies using two-sample MR. Strong evidence was found for 27 factors covering cognitive function, early life factors, health conditions, lifestyle and environment, physical measures, and sociodemographics. Key findings include fluid intelligence score, age at first sexual intercourse, cereal and dried fruit intake, and average total household income before tax as new risk factors for PD. Conversely, maternal smoking around birth, playing computer games, adding salt to food, and TV watching time emerged as protective factors.

We found that fluid intelligence scores constitute a novel risk factor for PD. Previous research has emphasized a positive correlation between fluid intelligence and working memory capacity, a finding of particular significance in understanding the cognitive performance of individuals with PD13. It can be speculated that the decline in fluid intelligence may serve as an early indicator of cognitive deterioration in PD, particularly when considering the functional impairments experienced by these patients. Furthermore, given the associations between fluid intelligence and various neurobiological factors, future research should consider the interplay of these factors and how they collectively influence the development of PD.

Maternal smoking around birth is a potential novel protective factor for PD, possibly linked to specific effects of certain chemicals in tobacco on the nervous system. Prior research has shown a significant association between maternal smoking during pregnancy and low birth weight (LBW) in infants14. Although the direct relationship between this finding and PD remains unclear, it suggests that maternal smoking might impact fetal neurodevelopment, potentially indirectly influencing neurological health in children and later in adulthood. However, the mechanisms underlying this protective effect remain unknown and must be assessed in the context of other well-established adverse health effects of smoking.

We observed a negative association between overall health rating and the risk of developing PD, indicating that a higher overall health rating may serve as a protective factor against PD. This finding is consistent with the research conducted by Lai et al., which explored the relationship between quality of life (QOL) and health status in PD patients15. They found that non-motor symptoms (such as daily functioning and emotional/behavioural issues) significantly impact the quality of life in PD patients. This suggests that individuals with higher overall health scores may perform better in these non-motor symptom areas, thereby reducing the risk of PD.

We found that regular computer gaming and TV-watching time inversely correlate with PD risk. Prior research has underscored the potential role of electronic games in PD rehabilitation, particularly in terms of motivating players and sustaining long-term engagement16. These games might provide cognitive benefits, possibly slowing PD’s cognitive decline. Our research also indicates a surprising positive link between consuming cereals and dried fruits and PD risk, possibly due to adverse effects from components like refined carbohydrates or added sugars. Such a diet, high in sugar, is tied to inflammation and oxidative stress, both PD risk factors. In addition, our findings challenge conventional health advice by suggesting a protective role for added salt in food against PD development. We also observed a complex relationship between tobacco smoke exposure and PD risk. External household exposure to tobacco smoke seems to lower PD risk, while personal smoking history increases it17. This may be due to the influence of smoking on gut microbiota, with indirect exposure offering some smoking benefits without the health risks of direct smoking. Furthermore, Sieurin et al.’s study supports this, showing smoking initiation’s protective effect against PD18. Moreover, our research identifies early sexual activity as a new risk factor for PD19. Sex hormones, especially estrogen and testosterone, are known for neuroprotection and may impact neurodegenerative disease development, suggesting that early hormonal changes could influence PD risk.

Our study’s finding of a higher BMI correlating with lower PD risk remains debated. A comprehensive meta-analysis covering 10 cohort studies found no direct link between BMI and PD risk [RR = 1.00 for each 5 kg/m² increase, 95% CI = 0.89–1.12], consistent across gender-specific subgroups20. Another study noted that while a higher BMI doesn’t increase PD risk, being underweight is associated with a higher risk21. Similarly, studies on body shape metrics like waist circumference showed mixed results22,23,24. Recent large cohort studies, enhanced by genome-wide association study methodologies, are clarifying BMI’s relationship with various diseases25. Notably, increased BMI has been observed to reduce the risk of AD(18) and other non-cardiovascular diseases26, suggesting a potential protective effect of BMI on non-vascular neurological and other diseases. A recent large cohort study found obese women (BMI ≥ 30 kg/m²) had a significantly lower PD risk (HR = 0.76, 95% CI = 0.59–0.98, P = 0.032), with similar correlations observed for higher waist circumference and waist-to-height ratio27. This aligns closely with our findings and is further supported by multiple MR studies28,29. Furthermore, an increased BMI also reduced the risk of depression in PD patients30. In our study, metrics related to body shape and body fat exhibited consistent effects with BMI, such as body fat percentage, arm fat mass, leg fat mass, whole body fat mass, and trunk fat mass.

Furthermore, we observed a positive correlation between higher pre-tax household income and PD risk. Engaging in physical activities like household chores and commuting might reduce PD risk, suggesting lower-income families involved in more physical labor could have a lower PD risk31. Additionally, research indicates that PD typically leads to unemployment within less than 10 years of onset32. This could imply that higher household incomes might be linked to earlier diagnosis and treatment of PD, while lower incomes might delay diagnosis and treatment due to limited access to medical resources. We also found that within the sociodemographic category, having a college or university degree positively correlates with the risk of PD, consistent with Frigerio et al.’s findings on increased PD risk among highly educated individuals33. This could be attributed to less physical activity among the higher-educated. Concurrently, Keener et al.’s study found a link between education level and PD-related cognitive impairment, suggesting an influence on early diagnosis and cognitive function in PD34.

This study also has several limitations that warrant consideration. Firstly, our PheWAS was constrained by the available variables in the UKB database, excluding some potential factors that might have associations. Moreover, since PheWAS is based on PRS for association analyses, our study might fail to identify risk factors that have no or weak genetic ties to the disease in question. Lastly, the strict filtering criteria in the current PheWAS may mask some association outcomes.

Utilizing phenotypic and genomic data from over 500,000 individuals from the UKB, this study employed PheWAS and MR methods to systematically screen for and rigorously identify 27 PD risk factors. Among these, fluid intelligence score, age first had sexual intercourse, cereal intake, dried fruit intake, and average total household income before tax emerged as newly recognized risk factors for PD. Maternal smoking around birth, playing computer games, salt added to food, and time spent watching television have been determined as new protective factors against PD. These findings offer valuable insights and references for the prevention of PD. Our research findings require validation in a broader population and further investigation to explore how these factors specifically impact the pathogenesis of PD.

Methods

Study population

We utilized prospective cohort study data from the UKB, which recruited 502,364 British participants aged between 37 and 73 years between 2006 and 2010. The UKB has received organizational repository approval from the North West Multi-Centre Research Ethics Committee (https://www.example.com about-us/ethics) and oversaw this study. Genetic and phenotypic data were obtained for all participants at baseline; clinical outcomes such as PD diagnosis were ascertained during the follow-up period from 2007 to 2023 through hospital inpatient records, death certificates, primary care records, and self-reports. Data collection and analysis in this study were conducted under UKB application No. 104811. PD-PRS calculation, PheWAS, and MR analysis were restricted to individuals of European ancestry to minimize confounding due to population stratification in genetic data analyses.

PD-PRS generation

In the UKB, genotypic data were available for 488,127 participants. Detailed genotyping and quality control procedures can be found in previous publications35. We excluded single nucleotide polymorphisms with call rates below 95% or a minor allele frequency below 0.1%. Subjects were chosen based on an estimation of recent British ancestry via self-report information and principal component analyses of the genotypes. Additionally, we excluded 161 individuals with ten or more presumptive third-degree relatives. After quality control, the final sample comprised 407,917 participants (Supplementary Fig. 1). PRSice2 was employed to calculate individual PRS36. PRS calculation leveraged GWAS summary data across multiple ethnicities, including European4, East Asian37, and Latin American38. This meta-analysis provided a comprehensive training dataset of 2,525,897 individuals, encompassing 49,049 cases, 18,785 proxy cases, and 2,458,063 controls. For further details, please refer to https://drive.google.com/file/d/1TmDZNFgyQvsOZ0xu-aZmBpVCpeUUa0UX/. We employed a p-value-informed clumping method, using a cutoff of r2 = 0.1 in a 250 kb window39. P thresholds for scoring were set at p < 0.0005, p < 0.001, p < 0.005, p < 0.01, p < 0.05, p < 0.1, p < 0.5, and p < 1 (refs. 6,40).
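The scoring step can be illustrated with a minimal Python sketch of p-value thresholding. This is not the PRSice2 implementation: it assumes the SNPs have already been clumped, uses a plain weighted sum of effect-allele dosages (tools differ on whether they sum or average), and all SNP identifiers, effect sizes, and dosages are made-up toy values.

```python
from typing import Dict, List, Tuple

# The eight p-value thresholds listed in the text.
P_THRESHOLDS: List[float] = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]


def polygenic_score(
    dosages: Dict[str, float],
    gwas: Dict[str, Tuple[float, float]],
    p_threshold: float,
) -> float:
    """Sum dosage * effect size over SNPs whose GWAS p-value is below the threshold.

    dosages: effect-allele count (0..2) per SNP for one individual.
    gwas:    per-SNP (effect size, p-value) from the discovery GWAS.
    """
    score = 0.0
    for snp, (beta, p_value) in gwas.items():
        # Strict inequality follows the "p <" wording of the text (an assumption).
        if p_value < p_threshold and snp in dosages:
            score += dosages[snp] * beta
    return score


# Hypothetical clumped SNPs: identifier -> (effect size, p-value).
toy_gwas = {"snp_a": (0.30, 1e-8), "snp_b": (-0.10, 0.003), "snp_c": (0.05, 0.2)}
# Hypothetical genotype dosages for one individual.
toy_dosages = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 0.0}

scores = {t: polygenic_score(toy_dosages, toy_gwas, t) for t in P_THRESHOLDS}
for threshold, score in scores.items():
    print(f"p < {threshold}: PRS = {score:.3f}")
```

Each individual thus receives one score per threshold, which is why every phenotype is later tested against eight PD-PRS variants.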

Risk Factors

The PheWAS incorporated 11 primary categories of factors (comprising a total of 1851 variables), which are: 1) Cognitive function, 2) Early-life risk factors, 3) Employment, 4) Health conditions, 5) Lifestyle and environment, 6) Medications and operations, 7) Mental health, 8) Neuroimaging, 9) Physical measures, 10) Sex-specific factors, and 11) Sociodemographic measures. These variables originate from six categories demonstrated in the UKB, namely population characteristics, additional exposures, assessment centres, online follow-up, health-related outcomes, and biological samples. This study re-categorized them accordingly. For a detailed breakdown, please refer to Fig. 1 and Supplementary Table 1.

Phenome-wide Association Study (PheWAS)

For our PheWAS, we utilized the PHESANT package in R to assess associations41. The decision rules in PHESANT are based on variable types, and each variable falls into one of four data categories: continuous, ordinal categorical, nominal categorical, or binary. Before conducting tests, continuous data underwent normalization through inverse normal rank transformation. In this study, the PD-PRS was employed as the independent variable, while the analysis encompassed 1851 integrated factors as dependent variables. We performed linear regression for continuous outcomes, logistic regression for binary outcomes, and ordered logistic regression for ordinal outcomes. Covariates consistently included in all association tests were sex, age, genotyping array42, the first ten genetic principal components, and the assessment centre. In total, 1851 phenotypes (31 cognitive function phenotypes + 8 early-life risk factors + 26 employment phenotypes + 755 health conditions phenotypes + 308 lifestyle and environment phenotypes + 57 medications and operations phenotypes + 74 mental health phenotypes + 186 neuroimaging phenotypes + 304 physical measures phenotypes + 32 sex-specific factors + 70 sociodemographic measures) × eight PD-PRS (under 8 p thresholds) = 14,808 tests were performed. All tests across phenotypes and PD-PRS p thresholds were corrected together by FDR correction using the p.adjust function in R43 (q < 0.05). For clarity, we have additionally reported the number of associations identified through Bonferroni correction as a supplementary approach (p < 3.38 × 10−6). We acknowledge that the phenotypes are likely to be correlated, and therefore Bonferroni correction is considered excessively conservative. We opted to pursue subsequent MR analysis on phenotypes that exhibited significant associations with the PD-PRS across a minimum of four PRS variant p-value thresholds.
Our rationale for combining FDR correction and the four-threshold criterion in the initial step was twofold: to control Type I errors (achieved through FDR correction) and to carry forward the most robust and consistent findings, characterized by significance at no fewer than half of all PRS thresholds, into the subsequent MR analysis. All analyses were conducted using two-tailed statistical tests.
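The selection rule described above (joint FDR correction over all phenotype-by-threshold tests, then retention of phenotypes significant at four or more thresholds) can be sketched in Python as follows. This is an illustration rather than the study code: it assumes the Benjamini-Hochberg procedure (the text names the p.adjust function but not the method), and the phenotype names and p-values are invented.

```python
from typing import Dict, List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_for_followup(
    pvals_by_phenotype: Dict[str, List[float]],
    q_cutoff: float = 0.05,
    min_thresholds: int = 4,
) -> List[str]:
    """Keep phenotypes with q < q_cutoff at min_thresholds or more PRS thresholds."""
    names = list(pvals_by_phenotype)
    # Correct all phenotype-by-threshold tests together, as in the text.
    flat = [p for name in names for p in pvals_by_phenotype[name]]
    qvalues = bh_adjust(flat)
    selected, pos = [], 0
    for name in names:
        k = len(pvals_by_phenotype[name])
        n_significant = sum(1 for q in qvalues[pos:pos + k] if q < q_cutoff)
        pos += k
        if n_significant >= min_thresholds:
            selected.append(name)
    return selected


# Hypothetical p-values: one per PRS threshold (eight per phenotype).
toy_pvalues = {
    "phenotype_a": [1e-6, 2e-6, 1e-5, 3e-5, 1e-4, 2e-4, 0.3, 0.6],
    "phenotype_b": [0.001, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
print(select_for_followup(toy_pvalues))
```

In this toy input, the first phenotype is significant at six thresholds and is carried forward, whereas the second is significant at only one and is dropped.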

Follow-up using MR analysis

The TwoSampleMR package in R was used to conduct two-sample MR analyses to estimate the effects of risk factors on PD. The GWAS summary statistics of factors and PD were acquired from the MRC IEU OpenGWAS database (https://gwas.mrcieu.ac.uk/). The inverse variance weighted (IVW) method was the primary method for conducting MR. MR assumptions, including relevance, independence, and exclusion restriction, were stringently tested. Instrumental variables were confirmed to be strongly associated with exposures. MR-Egger intercept tests and the Steiger directionality test were conducted to check for horizontal pleiotropy and confirm the causal direction. Funnel plots were generated to assess symmetry and detect any directional pleiotropy. Consistency in the direction of effects across both the PheWAS and MR analyses would suggest that these are risk factors associated with PD. See reverse MR and more details in Supplementary Methods.
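To make the primary estimator concrete, the following Python sketch computes a fixed-effect IVW estimate from per-SNP summary statistics and converts it to an odds ratio with a 95% confidence interval. It is a simplified illustration, not the TwoSampleMR implementation (which, for example, can also rescale standard errors under heterogeneity), and the SNP effect sizes are invented toy values.

```python
import math
from typing import List, Tuple


def ivw_estimate(
    beta_exposure: List[float],
    beta_outcome: List[float],
    se_outcome: List[float],
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome effects on SNP-exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se**2
        numerator += weight * bx * by
        denominator += weight * bx**2
    beta_ivw = numerator / denominator
    se_ivw = math.sqrt(1.0 / denominator)
    return beta_ivw, se_ivw


def odds_ratio_with_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Convert a log-odds estimate to an odds ratio with a 95% confidence interval."""
    z = 1.96
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical instruments: SNP-exposure effects, SNP-outcome (log-odds) effects,
# and standard errors of the SNP-outcome effects.
toy_beta_exposure = [0.1, 0.2]
toy_beta_outcome = [0.05, 0.10]
toy_se_outcome = [0.01, 0.02]

beta_ivw, se_ivw = ivw_estimate(toy_beta_exposure, toy_beta_outcome, toy_se_outcome)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta_ivw, se_ivw)
print(f"beta = {beta_ivw:.3f}, OR = {odds_ratio:.3f} ({ci_low:.3f}, {ci_high:.3f})")
```

An odds ratio above 1 from such an analysis corresponds to a risk-increasing factor and an odds ratio below 1 to a protective one, which is how the ORIVW values in the Results are read.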