Introduction

The diagnosis of Parkinson’s disease (PD) is typically based on clinical symptoms, such as tremors, rigidity, and bradykinesia, following standard guidelines for previously validated clinical assessments. While clinical diagnosis has improved continuously over the past decades and includes indicators for differential and early diagnosis, many of the covered symptoms are nonspecific. These may overlap with other neurological disorders or only show up in later stages of the disease, often resulting in a misdiagnosis or delayed diagnosis1. One of the main challenges in the clinical diagnosis is a long pre-motor phase of the disease, during which many common symptoms may not yet be present or may be subclinical. Differentiating PD from other neurological disorders, such as atypical or secondary parkinsonism, during this pre-motor phase can be challenging2. In addition, PD phenotypes can vary substantially, highlighting a need for objective biological biomarker signatures rather than diagnosis based solely on a subjective judgment.

Molecular signatures have the potential to provide more specific, accurate, and cost-effective indicators of a complex disorder such as PD. While clinical indicators mostly rely on assessing broad categories of symptomatic and disease-associated changes, molecular markers may reveal more granular pathological changes occurring already in the pre-symptomatic disease stages. By facilitating an earlier, more reliable, and specific diagnosis, molecular signatures may enable more patient-tailored and effective treatments.

In recent years, omics technologies have significantly contributed to discovering PD biomarkers. Numerous genetic, protein, and metabolic changes associated with PD have been identified through these approaches, and new insights into the disease pathogenesis were gained3. For instance, several genetic variants have been linked to an increased risk of developing PD4. Additionally, proteomic studies have revealed altered protein levels in PD brain tissue, including an increased abundance of alpha-synuclein, which plays a central role in Lewy body and Lewy neurites formation as a hallmark pathological feature of PD5. Finally, prior metabolomic studies in PD have identified changes in disease-relevant cellular pathways, particularly in those related to energy metabolism6.

Despite these advances, omics-based biomarker discovery for PD is still hampered by several limitations and challenges. While high cross-validated accuracies for PD diagnosis have been reported for some of the published omics signatures7,8, the training and test set sizes used to evaluate the corresponding machine learning (ML) models are often small. In addition, the signatures for the most predictive models have often been derived from tissues or body fluids with limited practical accessibility, e.g., cerebrospinal fluid (CSF), which requires a lumbar puncture for sample collection. A further common limitation of the multidimensional patterns in PD molecular biomarker signatures is that they can emerge as black-box models that are not fully intuitive to interpret9. Considering this and the limited sample sizes in many prior studies, more research is needed to find robust and interpretable PD-specific molecular signatures.

To contribute to the ongoing research efforts in this field, we have conducted a cohort-wide blood plasma metabolic profiling of 549 PD patients and 590 controls in the Luxembourg Parkinson’s Study (LuxPARK10), combined with subsequent statistical and bioinformatics pathway and network analyses. In contrast to other studies that focus on PD patients who have already received dopaminergic medication, we included biospecimens from all 56 untreated de novo patients available in the cohort. This subset of samples was used to distinguish between treatment-associated and treatment-independent metabolite changes.

To mechanistically interpret PD-associated alterations in the context of cellular networks and exploit prior information from complementary omics data, we have mapped the metabolomics statistics onto a dedicated genome-scale enzyme-metabolite network together with transcriptomics data from an independent PD case/control study. Through the integrated analysis of these omics data, we identified coordinated sub-network alterations, particularly in xanthine metabolism, which displayed mutually consistent changes in metabolite abundances and in the expression of enzyme-encoding genes. These consistent sub-network changes may help to pave the way towards more robust blood-based biomarker signatures and provide new insights into coordinated, disease-associated cellular process alterations in PD.

Results

When studying metabolite abundance changes in de novo PD patients compared to controls and in all PD patients (including subjects who had received dopaminergic treatments) vs. controls, we identified several metabolites with a statistically significant alteration (adjusted p value <= 0.05). Figure 1 presents a volcano plot for the de novo PD vs. control comparison, highlighting the metabolites with both high statistical significance and pronounced effect sizes. Table 1 shows the top 25 most significant metabolites in de novo PD vs. controls, and Table 2 shows the metabolites with shared significance in de novo PD vs. controls and all PD vs. controls (complete ranking tables of all significant metabolites for the individual comparisons are provided in Supplementary Tables 5 and 6; rankings for treated patients only are provided in Supplementary Table 7). The metabolites with shared significance also display the same direction of the change, i.e., the signs of the log fold-changes are identical. We grouped the significantly altered metabolites by shared functional categories to discuss them in the context of the prior literature on molecular mechanisms in PD.
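The criterion for "shared significance" described above can be sketched in a few lines. This is a minimal illustration, assuming that adjusted p values and log fold-changes are already available per metabolite for both comparisons; the class `DiffResult`, the function `shared_significant`, and all metabolite names and values are hypothetical and not taken from the study's tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffResult:
    """Differential abundance statistics for one metabolite in one comparison."""
    log_fc: float  # log fold-change (PD vs. control)
    adj_p: float   # p value after multiple-testing adjustment


def shared_significant(de_novo, all_pd, alpha=0.05):
    """Return metabolites significant in both comparisons.

    The value for each metabolite is True if the signs of the two
    log fold-changes agree, and False otherwise.
    """
    shared = {}
    for name in sorted(de_novo.keys() & all_pd.keys()):
        a, b = de_novo[name], all_pd[name]
        if a.adj_p <= alpha and b.adj_p <= alpha:
            shared[name] = (a.log_fc > 0) == (b.log_fc > 0)
    return shared


# Hypothetical values for illustration only.
de_novo = {
    "metabolite_A": DiffResult(log_fc=0.42, adj_p=0.010),
    "metabolite_B": DiffResult(log_fc=-0.35, adj_p=0.030),
    "metabolite_C": DiffResult(log_fc=0.20, adj_p=0.400),
}
all_pd = {
    "metabolite_A": DiffResult(log_fc=0.18, adj_p=0.020),
    "metabolite_B": DiffResult(log_fc=-0.50, adj_p=0.001),
    "metabolite_C": DiffResult(log_fc=0.25, adj_p=0.040),
}

shared = shared_significant(de_novo, all_pd)
```

In this toy input, `metabolite_C` is excluded because it is significant in only one of the two comparisons.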

Fig. 1: Volcano plot for the differentially abundant metabolites.

Volcano plot for the differentially abundant metabolites when comparing de novo PD vs. control blood plasma samples. Metabolites displaying abundance changes with high effect size (absolute log. fold-change effect size (abs(logFC)) > 0.3) and high significance (adjusted p value <= 0.05) are highlighted in green, metabolites with only a high effect size are shown in orange, and metabolites with only a high significance in red.
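The highlighting rule in this caption can be written as a small classifier. The sketch below only encodes the two thresholds stated in the caption (abs(logFC) > 0.3 and adjusted p value <= 0.05); the function name `volcano_category` and the label for unhighlighted points are illustrative assumptions.

```python
def volcano_category(log_fc, adj_p, fc_cutoff=0.3, alpha=0.05):
    """Assign the highlight color used in the volcano plot caption."""
    high_effect = abs(log_fc) > fc_cutoff   # strict inequality, as in the caption
    significant = adj_p <= alpha            # inclusive threshold, as in the caption
    if high_effect and significant:
        return "green"
    if high_effect:
        return "orange"
    if significant:
        return "red"
    return "unhighlighted"  # hypothetical label for all remaining metabolites
```

For example, a metabolite with logFC = -0.1 and an adjusted p value of 0.01 would be shown in red, because it passes only the significance threshold.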

Table 1 Top 25 most significantly differentially abundant blood plasma metabolites
Table 2 Shared significant blood plasma metabolites

Xanthine metabolites (inosine, xanthosine, xanthine, hypoxanthine)

Among the significant abundance alterations, xanthine metabolites stand out with a shared increased abundance in de novo PD vs. control for four representatives of this group: inosine, xanthosine, xanthine, and hypoxanthine (highlighted by the star symbol in the first column of Table 1). While these changes are highly significant in de novo patients, they are not observed when comparing only treated PD patients vs. controls after multiple hypothesis testing adjustments (see Supplementary Table 7). Since the effect of dopaminergic medication on blood metabolite levels in the group of treated patients cannot be removed entirely by the conducted filtering and statistical adjustments, a higher measurement variation in this group as compared to the de novo PD subgroup is in line with our expectations. However, the difference between treatment-naïve and treated patients may also reflect the later disease stage of the latter group (see Table 3).

Table 3 Tabular overview of baseline subject characteristics

Interestingly, xanthine metabolites have already been implicated in PD through multiple mechanisms. One of the proposed functional links to molecular hallmarks of PD is the generation of reactive oxygen species by xanthine oxidase (also known as xanthine dehydrogenase or XDH). When XDH catalyzes the conversion of hypoxanthine to xanthine, the reactive oxygen species (ROS) superoxide (O2−) and hydrogen peroxide (H2O2) are generated11. ROS can induce oxidative stress, which may damage cellular components such as mitochondria and, in turn, dopaminergic neurons. In PD, this can contribute to the loss of dopaminergic neurons in the substantia nigra as one of the main pathological characteristics of the disease. Indeed, increased levels of XDH in the blood of PD patients have previously been reported12, matching our observed metabolite changes in this pathway. Changes in oxidative stress signaling in PD are further evidenced by the significant PD-associated decreased abundance of gamma-glutamylalanine (see Table 2), which is involved in the metabolism of glutathione, an antioxidant that is essential for cellular protection against oxidative damage13.

In addition to this association with ROS generation, xanthine and some of its derivatives, including paraxanthine, theophylline, and caffeine, have been linked to neuroprotective mechanisms in PD. In an MPTP (1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine) mouse model of PD, Xu et al. demonstrated that the administration of paraxanthine, theophylline, and caffeine significantly attenuated MPTP-induced dopamine depletion14. For caffeine in particular, several epidemiological studies have also consistently shown a significant negative correlation between its consumption and a PD diagnosis15, although this may be explained by reverse causation (e.g., reduced caffeine consumption due to its effects as a central nervous system stimulant, potentially worsening PD symptoms). Furthermore, the substituted xanthine molecule has been used as a scaffold to synthesize new drug-like compounds as a non-dopaminergic strategy for neuroprotection16. The main proposed mechanisms for the protective actions of xanthines and caffeine in this context include the antagonism of the adenosine A2A receptor (ADORA2A) and inhibition of monoamine oxidase type B (MAO-B). Indeed, the A2A receptor is targeted by the drug istradefylline, a xanthine derivative with a particularly long half-life, which has been approved by the Food and Drug Administration (FDA) as an add-on treatment to levodopa (L-DOPA) for PD patients with motor fluctuations17. For the second proposed xanthine target, MAO-B, pharmacological inhibitors were among the first drugs developed for treating PD18.

In summary, multiple different pathways have been proposed to link alterations in xanthine metabolism with pathological or protective mechanisms in PD. To better understand the specific processes involved in xanthine abundance alterations in PD and identify the potential enzymes involved, we have further investigated xanthine metabolism as part of a joint network analysis of the metabolomics data and independent PD case/control transcriptomics data (see section on “Integrative network analysis of metabolomic and transcriptomic changes in PD”).

Catecholamine metabolism (retinal, ALDH1A1)

Besides the xanthines, the carotenoid retinal was the only other metabolite exhibiting a significantly higher abundance in both de novo PD vs. controls and all PD patients vs. controls. Retinal is the oxidized form of retinol and is best known as a constituent of visual pigments. Interestingly, the enzyme aldehyde dehydrogenase 1A1 (ALDH1A1), responsible for converting retinal into retinoic acid, has already been implicated in the catecholaldehyde hypothesis of PD19,20. This hypothesis proposes that long-term increased build-up of DOPAL (3,4-dihydroxyphenylacetaldehyde), a toxic catecholaldehyde metabolite of dopamine which is converted by ALDH1A1 into its non-toxic form DOPAC (3,4-dihydroxyphenylacetic acid), plays a pathogenic role in the development of the disease. It suggests that DOPAL contributes to the damage to neurons in the substantia nigra pars compacta, resulting in the typical motor symptoms associated with PD. Consistent with the increased levels of retinal in PD, we detected a matching PD-associated decrease in the gene expression of ALDH1A1 in the studied PD case/control transcriptomics data (adjusted p value = 5.65E-04), which could result in reduced conversion of retinal to retinoic acid.

Metabolites associated with nutrition and the microbiome (oxalate, tartronate, 4-hydroxyphenylacetate, 3-CMPFP, 3-methylcytidine)

The identified metabolites with significant PD-associations include multiple compounds previously linked to microbiome composition or diet. Among these compounds, oxalate is a naturally occurring substance present in large quantities in many plants but only found in very low concentrations in animal tissues21. It may therefore serve as an indicator for a predominantly plant-based diet. It displayed a significantly decreased abundance in de novo PD vs. controls (adj. p value = 0.021), whereas in the all-PD patients vs. control comparison, the decrease was only close to the significance threshold (adj. p value = 0.051). As previous studies suggested that adherence to plant-based diets, which are typically high in antioxidants and anti-inflammatory compounds, may be associated with a reduced risk of developing PD22 or improved motor performance23 and slower disease progression24, oxalate may warrant further study as a marker for a plant-based diet and potential associated beneficial effects.

A further metabolite with known dietary associations is tartronate, which displayed a significantly reduced abundance in both all PD patients vs. controls (adj. p value = 7.65E-03) and de novo PD vs. controls (adj. p value = 0.049). Tartronate is a monosaccharide that has been detected in several natural foods, including sourdough and ground cherries, among others25, and linked with the presence of various bacterial species in the human microbiome26. While potential mechanisms linking tartronate to PD are still unknown, significantly decreased serological abundances in early-stage PD have already been reported for an independent cohort for both tartronate and oxalate27, matching the findings of our study.

Among the diet-associated metabolites with increased abundance, 4-hydroxyphenylacetate (4-HP) displayed significance specific only to the de novo PD vs. control comparison (adj. p value = 0.032). 4-HP can be found naturally in human tissues and biofluids and in several natural foods, and it is also produced by multiple microbial species25. However, since 4-HP does not provide a sufficiently specific marker for individual food items or bacterial species, further research is needed to link the observed increase in de novo PD to specific mechanisms.

Next, the metabolite 3-carboxy-4-methyl-5-pentyl-2-furanpropionate (3-CMPFP), which showed a significantly decreased abundance in all PD patients compared to controls (adj. p value = 1.34E-04) and a nominally significant decrease in de novo PD vs. control (p value = 9.41E-04; adj. p value = 5.86E-02), is a furan fatty acid previously suggested as a marker for fish oil intake28, which could indicate a diet with high levels of omega-3 fatty acids. The metabolite was also reported to act as a protein-bound uremic toxin and interact with reactive oxygen species (ROS), resulting in cellular damage28. Thus, the decrease of 3-CMPFP in PD may be linked to disease-relevant processes through indirect dietary associations or direct mechanistic pathways.

Finally, the metabolite 3-methylcytidine (m3C), which was significantly decreased in de novo PD vs. controls (adj. p value = 0.016; with a similar trend in the all PD vs. control comparison, but no statistical significance, adj. p value = 0.18), is a pyrimidine nucleoside previously proposed as a urinary biomarker of whole grain intake29. In tRNA molecules, m3C is a frequently observed epigenetic modification30 and a lack of m3C32 modifications in tRNAs has been shown to impair cytoplasmic and mitochondrial translation31. Furthermore, significant differences in m3C modifications have been reported in prefrontal lobe cortex samples of Alzheimer’s disease patients32, and further studies suggested functional links between defects in tRNA modifications and neurological disease33.

Metabolites involved in fatty acid metabolism and β-oxidation (benzoylcarnitine, butyrate, 2-butenoylglycine)

Among the significant metabolites, we identified multiple compounds involved in fatty acid metabolism, particularly in the β-oxidation pathway. One of the most pronounced changes was observed for an acylcarnitine, benzoylcarnitine, showing a shared decreased abundance in de novo PD vs. controls and all PD patients vs. controls (see Table 2). Alterations in acylcarnitine metabolism have been associated with a dysregulation of mitochondrial β-oxidation in PD34. β-oxidation breaks down fatty acids into acetyl CoA (coenzyme A), which can then enter the TCA (tricarboxylic acid) cycle to generate ATP, fulfilling cellular energy needs. Acylcarnitines, such as benzoylcarnitine, are used as carriers to transport activated long-chain fatty acids into the mitochondria for β-oxidation. In PD, increased long-chain fatty acids and decreased long-chain acylcarnitines have been observed, suggesting an impairment of mitochondrial β-oxidation34. Interestingly, the drug zonisamide, which has been used to treat resting tremor and motor fluctuations in PD, was previously shown to increase the abundance of multiple long-chain acylcarnitines associated with improved fatty acid β-oxidation35.

A further metabolite associated with fatty acid β-oxidation is butyrate, a short-chain fatty acid formed by bacterial fermentation of carbohydrates, e.g., from dietary fiber, in the intestine25. Butyrate is found ubiquitously in plant oils and animal fat and is contained in many dairy food products25. Here, a significant increase in butyrate, or, respectively, its isomer isobutyrate, was observed in de novo PD vs. control but not when comparing treated PD to controls. Differences in butyrate levels may result from changes in its production or utilization. For example, differences in diet or medicine intake can influence gut bacterial butyrate production, and the gut microbiome in PD patients has been reported to have a reduced fraction of butyrate-producing bacteria in multiple studies36,37,38,39,40,41,42. Similar to the acylcarnitines, butyrate is involved in fatty acid β-oxidation, where it serves as an intermediate metabolite. Its increased abundance may therefore indicate dysfunctional β-oxidation in PD. However, butyrate has also been proposed to influence PD symptoms through a variety of other mechanisms. For instance, sodium butyrate intake was reported to reduce PD-related motor symptoms via mechanisms associated with gut microbial dysbiosis regulation43, intestinal barrier protection through the activation of G-protein-coupled receptor 109A (GPR109A)44, and stimulation of glucagon-like peptide-145. In contrast to these proposed protective effects, a study in a mouse model of neurodegeneration using the toxin MPTP reported worsening effects of sodium butyrate administration on motor function, associated with upregulation of pro-inflammatory cytokine expression and increased colonic inflammation46. Overall, the potential effects of butyrate on gut dysfunction and inflammation warrant further investigation.

Finally, as a further change associated with fatty acid β-oxidation, 2-butenoylglycine (also known as crotonylglycine) was significantly increased in de novo PD vs. controls. This metabolite belongs to the class of acylglycines, which can be produced through glycine conjugation of acyl-CoA esters. Glycine conjugation in mammals is often used as a detoxification method to promote the excretion of carboxylic acids, and the increase of crotonylglycine may therefore reflect the response to a pathological accumulation of crotonyl-CoA. Similar to butyrate, crotonyl-CoA is an intermediate in fatty acid β-oxidation, and the alterations in crotonylglycine match previously reported changes in fatty acid β-oxidation in PD47. Moreover, in anaerobic bacteria, crotonyl-CoA serves as an intermediate for butyrate production48, and the shared increase in crotonylglycine and butyrate may therefore reflect the same pathway alteration.

In summary, multiple observed metabolite abundance changes point to PD-associated alterations in fatty acid metabolism, specifically in fatty acid β-oxidation, where a dominant increase of intermediate metabolites matches previous independent reports of incomplete β-oxidation in PD34,47.

Metabolites associated with non-dopaminergic medication

While the comparison of de novo patients vs. controls reveals PD-related metabolite changes that are independent of dopaminergic treatments, de novo patients may still take other non-dopaminergic medications that can be detected in the metabolomics profile. We therefore investigated the metabolite changes in de novo patients vs. controls for potential effects of non-dopaminergic drugs and identified two significant drug-related metabolites: oxazepam and 4-hydroxycoumarin (see Table 1). Oxazepam is the active ingredient in many sedatives and is used to treat anxiety and depression, which commonly occur in PD49. While sufficiently detailed medication data for participants in the cohort were not available to confirm the intake of corresponding drugs, treatment with sedatives is the most plausible explanation for the significantly increased oxazepam abundance observed in de novo PD. By contrast, 4-hydroxycoumarin was significantly reduced in de novo PD. This compound serves as an anticoagulant for conditions caused by a blood clot50. However, coumarins also have anti-inflammatory, antioxidant and neuroprotective actions51 with potential relevance for PD. In particular, coumarins can inhibit monoamine oxidase (MAO) enzymes52, which are well-established PD drug targets (see the section above on the actions of xanthines on MAO-B). Thus, potential protective actions of 4-hydroxycoumarin resulting in a preferential detection of this compound in controls may merit further study.

Altered metabolites with unknown identity

Among the top 25 most significant metabolites in the de novo PD vs. controls comparison, the chemical identity could not be resolved for one metabolite with an increased abundance in PD (metabolite ID: X-24494) and two metabolites with a decreased abundance (metabolite IDs: X-15674, X-12812; the latter also showed a significant decrease in all PD patients vs. controls). Possible reasons for this include the lack of a reference standard for these molecules or the chemical interference of other metabolites. As the libraries and annotations by the metabolomics service provider Metabolon and the reference databases are continuously updated, we will complement the current annotations with every significant update in the future and publish relevant new findings on the GitLab repository associated with this study (see section “Data availability”).

Pathway enrichment analysis of metabolite changes in PD

While the analysis of individual metabolites already revealed multiple significant PD-associated changes in metabolites with similar functions, we performed further complementary pathway analyses to identify and interpret coordinated alterations in the data. For this purpose, we tested the over-representation of differentially abundant metabolites in de novo PD vs. controls in pathways from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database53 (RRID:SCR_012773) and metabolite sets representing chemical structure classes in the software MetaboAnalyst54 (RRID:SCR_015539), using the entire set of named metabolites as background reference (see Supplementary Table 8) and focusing on the metabolite sets with at least 5 metabolites.

For both databases, only nominally significant pathways were identified, including “Fatty Acids and Conjugates” (p value = 1.90E-03, adj. p value = 0.25) and “Fatty Acyls” (p value = 4.96E-02, adj. p value = 1) for the chemical structure classes (see Fig. 2 and Supplementary Table 9), and “Ubiquinone and other terpenoid-quinone biosynthesis” (p value = 2.26E-02, adj. p value = 1), “Retinol metabolism” (p value = 2.26E-02, adj. p value = 1), and “Tyrosine metabolism” (p value = 4.96E-02, adj. p value = 1) for the KEGG database (see Fig. 3 and Supplementary Table 10).
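To illustrate how a pathway can be nominally significant yet lose significance after adjustment, the following sketch implements an over-representation test. It assumes a one-sided hypergeometric test and a Benjamini-Hochberg adjustment, both common choices for this type of analysis; the exact test and adjustment method used in the study are not stated in this section. The enrichment ratio is assumed to be the observed number of hits divided by the number expected by chance. All function names and counts are hypothetical.

```python
from math import comb


def ora_p_value(n_background, n_in_set, n_hits, n_hits_in_set):
    """Upper-tail hypergeometric probability P(X >= n_hits_in_set).

    n_background:  number of named metabolites used as background reference
    n_in_set:      number of background metabolites in the pathway/set
    n_hits:        number of differentially abundant metabolites
    n_hits_in_set: number of differentially abundant metabolites in the set
    """
    total = comb(n_background, n_hits)
    upper = min(n_in_set, n_hits)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_hits - k)
        for k in range(n_hits_in_set, upper + 1)
    )
    return tail / total


def enrichment_ratio(n_background, n_in_set, n_hits, n_hits_in_set):
    """Observed hits in the set divided by the hits expected by chance."""
    expected = n_hits * n_in_set / n_background
    return n_hits_in_set / expected


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 100 background metabolites, 10 of them differentially
# abundant, and a metabolite set with 5 members, 2 of which are hits.
p_nominal = ora_p_value(100, 5, 10, 2)
ratio = enrichment_ratio(100, 5, 10, 2)

# Hypothetical nominal p values for four metabolite sets.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

With many metabolite sets tested in parallel, the adjustment multiplies small nominal p values by a large factor, which is one way in which a nominal p value below 0.05 can end up with an adjusted p value close to 1.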

Fig. 2: Metabolite set enrichment analysis results.

Metabolite set enrichment analysis results for the de novo PD vs. control comparison using chemical structure classes (main set) in the software MetaboAnalyst. The horizontal axis shows the negative decadic logarithm of the p value, and the vertical axis shows the pathways, sorted by decreasing significance from the top. The color gradient from red to yellow reflects increasing p values, and the size of the dots reflects the effect size for each metabolite set (enrichment ratio; see the legend on the right).

Fig. 3: Dot plot visualization of pathway enrichment analysis results.

Dot plot visualization of pathway enrichment analysis results for the de novo PD vs. control comparison using the KEGG database. The horizontal axis shows the negative decadic logarithm of the p value, and the vertical axis shows the pathways, sorted by decreasing significance from the top. The color gradient from red to yellow reflects increasing p values, and the size of the dots reflects the effect size for each pathway (enrichment ratio; see the legend on the right).

Changes in “Fatty Acids and Conjugates” and “Fatty Acyls” match the statistically significant alterations in individual metabolites in fatty acid metabolism already discussed (see section “Metabolites involved in fatty acid metabolism and β-oxidation”). A suppression of mitochondrial fatty acid β-oxidation in an early stage of PD (Hoehn and Yahr stage I), characterized by decreased levels of long-chain acylcarnitine, has been described before and proposed as a potential diagnostic biomarker for PD34. While the mechanism behind this dysregulation is still unclear, an overexpression of a mutant form of the gene encoding alpha-synuclein (A53T-SNCA) linked to familial cases of PD has been shown to increase triacylglycerol levels and has been associated with increased activity of acyl-CoA synthetase, which catalyzes fatty acyl-CoA formation as a substrate for β-oxidation47.

The identified nominally significant alteration in “Retinol metabolism” matches with the observed change for the carotenoid retinal (see section “Catecholamine metabolism (retinal, ALDH1A1)”), while the changes for “Ubiquinone and other terpenoid-quinone biosynthesis” and “Tyrosine metabolism” may both reflect alterations in the network around the amino acid L-tyrosine, which is involved in both of these KEGG pathways and displays a nominally significant change in de novo PD vs. controls (p = 0.013).

Overall, the pathway enrichment analysis results, while only providing nominally significant findings, match with the relevance of the functional annotations for individually significant metabolites involved in fatty acid β-oxidation and indicate further putative changes in the network around L-tyrosine for follow-up study.

Comparison with previously reported metabolomic pathway alterations in PD

To assess the consistency of our findings with previously reported metabolomics cellular process alterations in PD, we have compared the main coordinated changes in the PD blood metabolomics biomarker identification study by Hatano et al.55 with our results. The experimental set-up used by Hatano et al. differed from our approach in multiple relevant aspects: Serum samples were analyzed instead of plasma samples, the study focused on patients who had already received antiparkinsonian treatment (35 subjects) and age-matched healthy controls (15 subjects), and it used measurements from ultrahigh-performance liquid chromatography/tandem mass spectrometry (UPLC/MS/MS) optimized for basic species, UPLC/MS/MS optimized for acidic species, and gas chromatography/MS (GC/MS). Despite these methodological differences, when comparing the main pathway alterations in this prior study with our liquid chromatography–mass spectrometry (LC-MS) metabolomics data for all PD patients vs. controls, we observe largely consistent qualitative results.

In particular, as a key finding, Hatano et al. report that the levels of caffeine and its main metabolites were consistently lower in PD than in controls, which matches our observations (see Fig. 4, which seeks to reproduce the visualization of changes in caffeine metabolism presented in Fig. 2 in the study by Hatano et al.55).

Fig. 4: Box plot visualization of alterations in caffeine metabolism.

Box plots showing alterations in caffeine metabolism in PD vs. controls, reproducing the results reported in the study by Hatano et al. (see Fig. 2 in Ref. 55, which was used as a model). Vertical axes represent log-scale normalized abundances, and the horizontal axes show the two conditions: control (Ctrl, in blue) and PD (in pink). Arrows indicate enzymatic reactions which relate the source metabolites to their conversion products (the source of the arrows represents the educts, and the arrow targets the products).

Although the corresponding effect sizes are small and only nominal significance is observed in most cases, our metabolomics data confirms a consistent pattern of average decreased caffeine metabolite abundances in PD. However, the nominally significant decrease in caffeine levels in treated patients vs. controls (p = 0.033) is not observed in the de novo patients vs. controls (p = 0.7). These findings match with prior knowledge on complex associations between caffeine intake and PD. On the one hand, caffeine has been reported to have neuroprotective effects and has been associated with a lower risk of developing PD14. On the other hand, as a central nervous system stimulant, it could potentially worsen PD symptoms, such as tremors, possibly resulting in reduced caffeine intake among patients. Furthermore, interactions between dopaminergic medication and caffeine may occur56, which could explain why the nominally significant decrease in caffeine levels in all PD patients vs. controls is not seen in de novo PD vs. controls. Further research will be essential to understand these variation patterns, considering factors such as dietary caffeine intake, symptom profiles, and specific dopaminergic treatments.

Interestingly, the same pattern of nominally significant alterations specific to treated patients is also observed for caffeine metabolites, including multiple xanthine alkaloids such as paraxanthine (treated PD vs. controls: p = 0.015; de novo PD vs. controls: p = 0.47), 1-methylxanthine (treated PD vs. controls: p = 0.02; de novo PD vs. controls: p = 0.78), and 7-methylxanthine (treated PD vs. controls: p = 0.029; de novo PD vs. controls: p = 0.33, see Supplementary Tables 5 and 7). Thus, while the xanthines involved in purine metabolism (inosine, xanthosine, xanthine, and hypoxanthine) display a significantly increased abundance in de novo PD vs. controls (see Table 1 and discussion above), xanthine alkaloids involved in caffeine metabolism only display nominally significant changes in treated patients, with a trend of decreased abundances consistent with their precursor metabolite caffeine. This matches prior data indicating that interactions between caffeine metabolism and dopaminergic treatment effects need to be considered, as caffeine intake has been reported to shorten the time to reach the maximal plasma concentration of L-DOPA56.

As a second main pathway alteration, Hatano et al. reported changes in tryptophan metabolism, with a pronounced reduction in tryptophan levels and slight reductions in some of its downstream conversion products. Our study observed a similar trend in directional change, but the effect sizes for individual metabolites were not large enough to reach statistical significance (see Fig. 5, which qualitatively reproduces the directional changes from Fig. 1B in the study by Hatano et al.).

Fig. 5: Box plot visualization of alterations in tryptophan metabolism.
Box plots showing alterations in tryptophan metabolism in PD vs. controls, reproducing the results reported in the study by Hatano et al. (see Fig. 1B in Ref. 55, which was used as a model). Vertical axes represent log-scale normalized abundances, and the horizontal axes show the two conditions: control (Ctrl, in blue) and PD (in pink). Arrows indicate enzymatic reactions which relate the source metabolites to their conversion products (the source of the arrows represents the educts, and the arrow targets the products).

In line with previous reports of changed tryptophan levels in L-DOPA treated rats57, alterations in tryptophan metabolism may mainly represent an L-DOPA treatment effect and not a disease-specific change. However, we note that tryptophan displayed at least a nominally significant decrease (p = 0.003) for the comparison of de novo PD vs. controls. Independent of the cause, the observation of decreased tryptophan levels in PD may be relevant for the choice of adjuvant treatments, as reduced plasma and serum levels of tryptophan have been linked with depression by multiple studies58,59.

Overall, we observe qualitatively similar results for treated patients in this study compared to the main pathway alterations reported by Hatano et al., indicating coordinated changes in caffeine and tryptophan metabolism. Additional studies will be required to assess the physiological implications of these changes and their precise relationship with PD medication.

Metabolite associations with MDS-UPDRS motor scores

To examine if blood metabolite levels are linked to the severity of movement impairments in Parkinson’s Disease (PD), as measured by the MDS-UPDRS-III (Movement Disorder Society-Unified Parkinson’s Disease Rating Scale Part III), we fitted linear models, adjusting for sex, age, and L-DOPA medication effects.
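As an illustration of the covariate-adjusted model described above, the following Python sketch fits an ordinary least squares model of the form score ~ metabolite + sex + age + treatment covariate for a single metabolite. The function names, the toy data, and the use of normal equations are assumptions made for illustration only; they do not reproduce the analysis code or software used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_adjusted_model(score, metabolite, sex, age, treatment):
    """Return OLS coefficients [intercept, metabolite, sex, age, treatment]."""
    rows = [[1.0, m, s, a, t]
            for m, s, a, t in zip(metabolite, sex, age, treatment)]
    p = 5
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, score)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: scores are generated without noise from known
# coefficients so that the fit can be checked against them.
metabolite = [0.5, 1.2, -0.3, 0.8, -1.1, 0.1, 1.9, -0.7]
sex = [0, 1, 0, 1, 1, 0, 0, 1]
age = [61, 70, 58, 66, 73, 64, 69, 55]
treatment = [0.2, 1.5, 0.9, 0.4, 2.1, 1.1, 0.3, 1.8]
score = [30 - 2.0 * m + 1.0 * s + 0.1 * a + 0.5 * t
         for m, s, a, t in zip(metabolite, sex, age, treatment)]

coefficients = fit_adjusted_model(score, metabolite, sex, age, treatment)
```

In this sketch, the second coefficient is the covariate-adjusted association between the metabolite and the motor score; a negative value corresponds to lower metabolite levels at higher scores.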

The results showed that eight metabolites were significantly associated with the MDS-UPDRS-III after adjusting p values for multiple hypothesis testing (see Table 4). Except for one unidentified metabolite, all displayed negative correlations with MDS-UPDRS-III scores with small effect sizes. Five of the seven significant metabolites with known identity are caffeine-related metabolites, including caffeine itself and its conversion products theophylline, paraxanthine, 1,3,7-trimethylurate, and 1,7-dimethylurate. Considering this result together with our observation of consistently lower levels of caffeine metabolites in PD vs. controls discussed above and the qualitatively similar results reported by Hatano et al.55, the inverse association of these metabolites with both the presence of PD and motor severity is consistent with a protective role, as suggested in epidemiological studies and experimental models (see section on “Xanthine metabolites”). However, it is important to also consider the possibility of reverse causation, e.g., severe motor symptoms might lead to behavioral changes including reduced caffeine consumption. Additionally, a potential influence of other PD-related molecular factors on caffeine metabolites cannot be excluded. Interestingly, a previous study that monitored both total caffeine intake and measured serum levels of caffeine and nine of its downstream metabolites in 108 PD patients and 31 age-matched healthy controls concluded that the observed significantly decreased levels of these metabolites were unrelated to total caffeine intake and may serve as a potential diagnostic biomarker signature60.
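To illustrate what adjusting p values for multiple hypothesis testing involves, the sketch below implements the Benjamini-Hochberg procedure in Python. The choice of this procedure is an assumption, since the correction method is not named in this section, and the example p values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five metabolites.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted_p = benjamini_hochberg(raw_p)
```

With these invented inputs, the third and fourth metabolites are nominally significant at 0.05 before adjustment but not after it, which mirrors the distinction between nominal and adjusted significance drawn in the text.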

Table 4 Ranking of metabolites with significant associations with MDS-UPDRS-III scores in PD patients

Besides caffeine metabolites, only two other metabolites, 2S,3R-dihydroxybutyrate and behenoyl sphingomyelin, also showed significant negative associations with the MDS-UPDRS-III score. 2S,3R-dihydroxybutyrate (also known as 4-deoxythreonic acid) is a secondary metabolite and sugar acid. It has been described as an L-threonine metabolite which correlates negatively with age in adults61, but has to our knowledge not been linked directly to PD. The last metabolite with a negative association, behenoyl sphingomyelin, is a sphingolipid contained in animal cell membranes, in particular in the myelin sheath surrounding certain nerve cell axons25. Sphingomyelins have been implicated in several cellular processes with potential relevance in PD, including nerve impulse transmission, presynaptic plasticity, and the localization of neurotransmitter receptors62. Furthermore, intralysosomal accumulation of sphingomyelins is a pathological mechanism in lysosomal storage disorders, such as Gaucher disease and Niemann-Pick disease, which are both associated with an increased risk of developing PD62. A proposed mechanism linking these disorders to PD is that sphingolipid changes promote a pathological conversion of alpha-synuclein into a proteinase K-resistant conformation and induce its oligomerization63. However, while saturated sphingomyelin species are depleted in the putamen of post-mortem tissue samples from PD patients64, the processes involved remain under investigation.

Overall, MDS-UPDRS-III associations were identified for eight metabolites, which are predominantly involved in caffeine metabolism (no further pathway enrichment analysis was therefore conducted in this case). The results match with the findings on altered caffeine metabolism in the PD vs. control comparison.

Integrative network analysis of metabolomic and transcriptomic changes in PD

To better understand the molecular mechanisms behind the observed metabolite alterations, we performed an integrated molecular network analysis of the metabolomics data with complementary PD case/control transcriptomics data. In line with the statistical findings for individual metabolites, this analysis identified a coordinated sub-network alteration associated with xanthine metabolism. Specifically, it highlighted a regulatorily consistent sub-network change with increased abundances in three xanthine metabolites (xanthine, hypoxanthine, and inosine) and decreased gene expression for the associated enzyme hypoxanthine phosphoribosyltransferase 1 (HPRT1), which catalyzes the conversion of hypoxanthine and phosphoribose diphosphate into inosine monophosphate (IMP) and pyrophosphate (see Fig. 6).

Fig. 6: Molecular sub-network visualization of alterations in xanthine metabolism.
Molecular sub-network visualization, highlighting shared alterations in xanthine metabolism identified by a joint network analysis of PD vs. controls transcriptomics and metabolomics data (increased abundances are highlighted by red circles, decreased abundances by blue circles). The network analysis suggests decreased expression of the gene HPRT1 as the main cause for an accumulation of the metabolite hypoxanthine and further related xanthines in PD.

Given the role of HPRT1 in hypoxanthine conversion, the decreased HPRT1 expression in PD may contribute to the increased abundances of xanthines by inhibiting the processing of hypoxanthine and its precursor inosine through this branch of the conversion pathway (see Fig. 6, right side), and thereby indirectly increase the levels of the alternative hypoxanthine conversion product xanthine (see Fig. 6, left side). Importantly, the relevant xanthines (inosine, hypoxanthine, and xanthine) all pass the blood-brain barrier65,66,67, and in patients with an HPRT1 deficiency, both plasma and CSF levels of xanthine and hypoxanthine are elevated compared to controls68,69, indicating that changes in plasma xanthine abundances may indeed be linked to HPRT1 expression changes in the brain. Interestingly, the most severe human form of HPRT1 deficiency, known as Lesch-Nyhan syndrome and caused by HPRT1 mutations, is a neurological disorder characterized by hyperkinetic movements and by loss of dopamine in the basal ganglia. This suggests that the PD-associated changes we identified in HPRT1 and in the associated xanthines may reflect a mechanism relevant to basal ganglia dysfunction in PD.

While xanthines have been linked to multiple molecular mechanisms in PD, including the generation of reactive oxygen species by xanthine oxidase and the inhibition of the adenosine A2A receptor and Monoamine Oxidase type B (see section on “Xanthine metabolites”), the observed sub-network alteration involving decreased HPRT1 expression suggests an additional mechanism by which xanthines may influence PD. Specifically, the hypoxanthine conversion product inosine monophosphate (IMP), resulting from the reaction catalyzed by HPRT1, is a precursor for adenosine monophosphate (AMP), which in turn is required for the synthesis of adenosine triphosphate (ATP, see Fig. 6, right side). Thus, diminished HPRT1 expression in Parkinson’s Disease (PD) and the resultant lower conversion of hypoxanthine to inosine monophosphate (IMP) may contribute to a deficiency of cellular ATP in PD. Indeed, rare genetic mutations in the HPRT1 gene, linked to Lesch-Nyhan syndrome, are known to lead to elevated xanthine levels and a concurrent reduction in cellular ATP70. Defects in cellular ATP production found in several regions of PD brains have been associated with mitochondrial dysfunction71, and previous studies showing that HPRT1 deficiency inhibits mitochondrial protein complex I-dependent respiration72 indicate that HPRT1 may be involved in energy metabolism dysregulations in PD. Additionally, a further independent study reports that HPRT1 exhibits decreased expression in PD and in other neurodegenerative disorders involving mitochondrial dysfunction, including Alzheimer’s and Huntington’s disease73.

Interestingly, in an open-label, single-arm trial, 26 PD patients received a combined treatment with inosine and an inhibitor of the enzyme XDH that converts hypoxanthine to xanthine (see Fig. 6, left side), and this treatment increased blood hypoxanthine and ATP levels, and lowered the patients’ UPDRS III motor impairment score74. Considering the mechanisms linking hypoxanthine to ATP metabolism as outlined in Fig. 6, HPRT1 may therefore warrant further study as a potential alternative target for pharmacological induction to achieve similar treatment effects as the combination of inosine and XDH inhibition.

Finally, HPRT1 has also been linked to PD through its role in activating Wnt/β-catenin signaling, a pathway known to protect dopaminergic neurons in the 6-hydroxydopamine mouse model75. Reduced HPRT1 gene and protein expression was observed in the substantia nigra of these mice, in line with our observations in human PD, and lentiviral over-expression of HPRT1 in this model inhibited neuron loss. Interestingly, the study also suggested a possible upstream mechanism for HPRT1 under-expression in PD by showing that LncRNA H19, which is also under-expressed in PD, normally elevates HPRT1 expression by inhibiting miR-301b-3p. Over-expressing H19 not only raised HPRT1 levels but also activated Wnt/β-catenin signaling, reducing neuron loss. Thus, multiple lines of evidence suggest protective effects of rescuing HPRT1 under-expression in PD.

In summary, the observed under-expression of HPRT1 in PD and associated increases in the levels of xanthines may influence disease pathways via multiple independent mechanisms. Follow-up studies in independent human biospecimens and complementary disease models will need to further confirm and characterize the clinical relevance of these mechanisms and associated rescue strategies.

Machine learning analysis

Classification and regression analyses were conducted to identify potential metabolite markers helping to distinguish de novo PD and control subjects (see Table 5) and to predict motor symptom severity in treated patients, as indicated by the MDS-UPDRS III total score (see Table 6).

Table 5 Ranking table of the top 10 known metabolite features using supervised sample classification

The top three identified metabolites for distinguishing de novo PD from controls were xanthine, 2-ketocaprylate, and glutarylcarnitine, as indicated by their higher average area under the Receiver Operating Characteristic Curve (AUC) values across both linear and radial support vector machine (SVM) classifiers in training sets (5-fold cross-validation) and test sets (see Table 5; a complete ranking table, including the AUC values for unidentified metabolites, is provided in Supplementary Table 11). Xanthine, in particular, shows consistent performance in both classifiers, with its test set AUC scores (0.71 for the linear SVM, 0.72 for the radial SVM) slightly exceeding the cross-validated training scores (0.68 for the linear SVM, 0.63 for the radial SVM). This result is consistent with the changes in xanthine metabolism observed in the statistical and network analyses, although we note that it should not be considered as validation, as the same input data was used for these analyses. By contrast, 2-ketocaprylate and glutarylcarnitine were not statistically significant in the de novo PD vs. controls differential analysis, showing that the machine learning-based feature ranking offers distinct information from the statistical analyses. Predictive changes in these two metabolites may reflect disturbances in branched-chain amino acid metabolism and mitochondrial function, respectively, as 2-ketocaprylate is a branched-chain alpha-keto acid that serves as an intermediate of branched-chain amino acids25, and glutarylcarnitine is an acylcarnitine and an intermediate in lysine and tryptophan metabolism76, processes that rely heavily on mitochondrial efficiency.
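To make the AUC-based feature ranking more concrete, the following Python sketch computes the area under the ROC curve for a single metabolite feature directly from its values, using the equivalence between the AUC and the probability that a randomly chosen case has a higher value than a randomly chosen control. This is a deliberate simplification: the linear and radial SVM classifiers and the 5-fold cross-validation used in the study are not reproduced here, and the abundance values are invented.

```python
def roc_auc(case_values, control_values):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical normalized abundances of one metabolite.
de_novo_pd = [1.4, 0.9, 1.8, 1.1]
controls = [1.0, 0.7, 1.2, 0.8]
auc = roc_auc(de_novo_pd, controls)
```

An AUC of 0.5 corresponds to no separation between the groups, which gives a reference point for interpreting the values of around 0.7 reported for xanthine.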

In the regression analysis results (see Table 6), dopamine metabolites, such as homovanillate (HVA) and various dopamine sulfates, rank highest among the metabolite features for predicting MDS-UPDRS III total motor scores, as indicated by the lowest sum-of-ranks across the performance metrics for both linear and radial SVM regression models (a complete ranking table, including scores for unidentified metabolites, is provided in Supplementary Table 12). The presence and levels of these dopamine-related metabolites are likely influenced by dopaminergic medication and may reflect increases in medication intake for patients impacted by more severe motor impairments. While an accurate prediction model for UPDRS III total motor scores would still have practical value, the coefficient of determination (R²) is generally low even for the top-ranked features, suggesting limited predictive power. However, the occurrence of sphingomyelin among the top-ranked features is consistent with the significant UPDRS III total score association observed for behenoyl sphingomyelin in the statistical analyses and with previous studies on functional implications of sphingomyelins in PD (see section on “Metabolite associations with MDS-UPDRS motor scores”). Thus, potential links between sphingomyelin metabolism and PD motor impairment may warrant further study.
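The sum-of-ranks criterion mentioned above can be sketched in Python as follows. Each feature is ranked separately for every performance metric (rank 1 being best), and the ranks are then added up, so that the feature with the lowest total is ranked first. The metric names, feature names, and values are hypothetical, and the exact set of metrics used in the study is not stated in this section.

```python
def rank_positions(values, higher_is_better):
    """Map each feature to its rank (1 = best) for one metric.

    Ties are broken by input order in this simplified sketch.
    """
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def sum_of_ranks(metric_tables):
    """metric_tables: list of (values_by_feature, higher_is_better) pairs."""
    totals = {}
    for values, higher_is_better in metric_tables:
        ranks = rank_positions(values, higher_is_better)
        for name, rank in ranks.items():
            totals[name] = totals.get(name, 0) + rank
    return totals


# Hypothetical performance values for three features.
r_squared = {"feature_a": 0.10, "feature_b": 0.04, "feature_c": 0.07}
mean_abs_error = {"feature_a": 9.1, "feature_b": 9.0, "feature_c": 9.8}

totals = sum_of_ranks([(r_squared, True), (mean_abs_error, False)])
best_feature = min(totals, key=totals.get)
```

Summing ranks rather than raw metric values avoids having to put metrics with different scales and directions on a common footing.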

Overall, the metabolites highlighted as most predictive by the classification and regression analyses match the prior associations identified for xanthines and sphingomyelins in the statistical analyses. However, metabolites only provided limited predictive information when used individually, and further analyses are needed to assess whether a panel of multiple metabolites used in combination could enhance predictive power. Follow-up machine learning analyses for multivariable predictive modeling, integration of metabolomics data with complementary data types (clinical, omics, digital biomarkers, polygenic risk scores, among others), and prediction of further disease outcomes (motor and non-motor scores, comorbidities), which would extend beyond the scope of the current manuscript, are currently in preparation.

Study limitations

This study recognizes limitations inherent in its design and methodology.

Assessment of diagnostic status

As an observational study, the Luxembourg Parkinson’s Study relies on imaging performed during routine diagnostic examinations by patients’ treating physicians. The inclusion of DaTscan (Dopamine Transporter Scan) and other structural imaging in our dataset supports the diagnostic assessments made by the study physicians. In addition, the annual longitudinal follow-up of patients enhances the diagnostic confidence by monitoring for sustained dopaminergic response and the absence of warning signs that might lead to reconsideration of the diagnosis and possible reclassification to atypical or secondary parkinsonism. Despite these measures, a small proportion of misclassifications, particularly in the early stages of PD and in de novo patients, cannot be completely excluded. Such misclassifications are not expected to significantly affect the qualitative results of our study, which consistently includes more than 50 subjects per group.

Noise, bias, and confounding

Regarding possible sources of noise, bias, and confounding in the data, we note that we did not record the time between the last meal and blood sampling, and we did not sample after overnight fasting. Blood levels of glucose, lipids, and amino acids, among other metabolites, can vary significantly with dietary intake, and in this study, we did not have the means to ensure that patients and controls received the same diet or participated in the fasting state. Therefore, potential systematic differences in diet between the study groups could also lead to differences in the metabolite profiles. For this reason, we have discussed potential dietary influences and gut microbiome influences in our interpretation of the individual metabolite changes (see section on “Significant metabolite changes in de novo Parkinson’s disease and treated patients”). In general, these factors may contribute to greater variability in the data, underscoring the need for a cautious interpretation of the results.

It is important to note the time lag between blood collection and processing: Samples collected in the morning (between 9 am and 12 pm) were delivered to the biobank by ~1 pm and processed within 1–1.5 hours, and derivatives were immediately frozen at −80 °C (pre-centrifugation delay between 1 and 4 h); samples collected in the afternoon (between 12 pm and 5 pm) were delivered to the biobank by ~9 am the next day and processed within 1–1.5 hours, and derivatives were immediately frozen at −80 °C (pre-centrifugation delay between 16 and 21 h). There were no systematic differences in the handling of samples from different study groups, and accordingly, no systematic differences in postprandial status or circadian cycle are expected between the groups. However, stochastic variation in these parameters and the time lags between sample collection and further processing are limitations that may increase measurement variance. All samples underwent two freeze-thaw cycles prior to analysis, and any potential bias associated with this should be the same for all samples. To identify blood samples and metabolomics measurements of insufficient quality for further processing and analysis, multiple quality control analyses were performed on both blood and metabolomics samples, including quality control samples such as a solvent blank and ultrapure water as a process blank, and multiple quality control standards (described and justified in detail in the Methods sections “Biospecimen collection, quality control and sample accessioning” and “Metabolomics sample processing”).
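The pre-centrifugation delay ranges stated above follow from the collection windows and delivery times, as the short Python calculation below shows. It measures the delay from collection to delivery at the biobank, which reproduces the stated ranges of 1 to 4 h and 16 to 21 h; the calendar date is an arbitrary placeholder, and the delivery times are treated as exact although the text gives them as approximate.

```python
from datetime import datetime, timedelta


def delay_hours(collected, delivered):
    """Hours elapsed between blood collection and delivery to the biobank."""
    return (delivered - collected).total_seconds() / 3600.0


day = datetime(2020, 1, 1)  # arbitrary placeholder date

# Morning samples: collected between 9 am and 12 pm, delivered by ~1 pm.
morning_delivery = day + timedelta(hours=13)
morning_delays = [delay_hours(day + timedelta(hours=h), morning_delivery)
                  for h in (9, 12)]

# Afternoon samples: collected between 12 pm and 5 pm,
# delivered by ~9 am on the following day.
afternoon_delivery = day + timedelta(days=1, hours=9)
afternoon_delays = [delay_hours(day + timedelta(hours=h), afternoon_delivery)
                    for h in (12, 17)]
```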

Finally, we note that as the samples were randomly assigned across the measurement batches in combination with samples from another independent study, balanced sample group representations for the present study could not always be ensured across the batches, which may have led to increased data variance.

Missing value imputation and normalization

Missing values were imputed using the minimum detected value, following Metabolon’s routine approach. Since missing values in this type of data are generally the result of data falling below the limit of detection, the data in these cases are left-censored with informative missingness. In total, 21.9% of the values were missing and imputed, but as they reflect data falling below the detection limit, the percentage of missing values is not an indicator of data quality in this setting. There were no significant differences in the percentage of imputed values between the study groups (before imputation, controls had 21.8% missing values, PD samples had 22.0% missing values, and de novo PD samples had 21.4% missing values). Because the imputation filled in missing values using the minimum observed values, and because the proportion of missingness was very similar across the batches, there was no significant change in the individual median values and in the differences between the medians, and no renormalization was required. The possible alternative approach of normalizing with quality control samples was not chosen, because it relies on fewer samples, which could potentially reduce the robustness of the normalization.
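As a minimal sketch of the imputation step described above, the following Python code replaces missing values with the minimum detected value and reports the fraction of missing entries. Applying the minimum separately to each metabolite, representing missing values as None, and the toy values themselves are assumptions made for illustration; the platform's actual implementation is not shown in this section.

```python
def impute_with_minimum(values):
    """Replace None entries with the minimum observed value of the metabolite."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing was detected; leave the column as is
    floor = min(observed)
    return [floor if v is None else v for v in values]


def missing_fraction(table):
    """Fraction of missing entries across all metabolites and samples."""
    cells = [v for column in table.values() for v in column]
    return sum(v is None for v in cells) / len(cells)


# Hypothetical abundance table: one list of sample values per metabolite.
abundances = {
    "metabolite_a": [2.0, None, 3.5, 1.2],
    "metabolite_b": [None, 0.8, None, 0.9],
}

fraction_missing = missing_fraction(abundances)
imputed = {name: impute_with_minimum(column)
           for name, column in abundances.items()}
```

Because the imputed value never exceeds any detected value, this approach is consistent with treating missing entries as left-censored measurements below the detection limit.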

Adjustment for treatment effects

For the study of PD patients receiving dopaminergic treatments, it is important to note that using 3-O-Methyldopa (3-OMD, 3-methoxytyrosine) as a covariate to adjust for treatment effects has limitations. 3-OMD is an imperfect indicator of the many different types of dopaminergic medications taken by PD patients and can be modulated by both catechol-O-methyltransferase (COMT) inhibitors and the prolonged drug effects of carbidopa. L-DOPA itself was not a covered metabolite in this LC-MS profiling study and therefore could not be used instead of 3-OMD. Therefore, adjustment for 3-OMD in the data for treated patients is not expected to fully correct for the confounding effects of dopaminergic medications, and the separate analysis of de novo patients is an essential component of this profiling study.

Group differences in cognitive impairment

Significant group differences in cognitive decline, as assessed by the MoCA (Montreal Cognitive Assessment) score, were observed in PD patients compared with controls (see Table 3). Although there is evidence that PD can cause cognitive decline, e.g., through a dysfunction of fronto-striatal pathways77, to our knowledge there is no clear evidence suggesting that early cognitive decline increases the risk of developing PD. A variable that is affected by the outcome does not meet the definition of a confounder, and by adjusting for a covariate that predominantly represents a downstream effect of the disease, we would risk removing the primary effects of the disease itself. However, we note that some of the observed metabolite changes may not be directly associated with PD, but rather reflect cognition-related changes. Nevertheless, differences in MoCA scores between cases and controls cannot affect our association analysis of motor scores (MDS-UPDRS III), which is focused exclusively on PD patients.

Machine learning analyses

Given the limited sample size available for de novo PD patients (56 subjects), the impact of dopaminergic medications on blood metabolite levels in treated patients, and the focus on a single cohort, the machine learning and cross-validation results herein should be considered as preliminary estimates of the achievable predictive power for case-control classification and MDS-UPDRS III total score prediction. Furthermore, we focused on single metabolites as predictors and the potential of multi-variable signatures to improve the predictive performance will require further investigation. In general, for both the optimization and robust validation of metabolite-derived machine learning models, independent analyses with larger sample sizes of de novo patients across multiple distinct cohorts are needed.

Discussion

The study of blood plasma metabolomics for the Luxembourg Parkinson’s Study revealed several statistically significant disease-associated changes. In statistical and machine learning analyses of de novo PD patients compared to controls, xanthine metabolites stood out among the most significant and predictive metabolites. The xanthine alteration patterns showed consistency and coordination both in terms of the direction of changes, with a common increased abundance for inosine, xanthosine, xanthine, and hypoxanthine in de novo PD vs. controls, and in terms of the mechanistic matching of changes with complementary PD vs. control transcriptomics data in an enzyme-metabolite regulatory network analysis. As a main finding, the integrated network analysis also highlighted a potential key enzyme, HPRT1, which catalyzes the conversion of hypoxanthine to IMP, and whose decreased abundance in PD may explain the increased metabolite abundances observed in the alternative hypoxanthine-to-xanthine pathway (see Fig. 6).

While the case-control statistical comparison of metabolomics data cannot distinguish causal from non-causal relationships, the coordinated regulatory network changes identified in xanthine metabolism point to potential clinically relevant disease mechanisms, paving the way for targeted follow-up validation studies. Furthermore, the detected coordinated sub-network changes may have relevant applications in biomarker signature modeling for PD due to their consistency and robustness across multiple biomolecules in two different omics data types. In this context, we note that in a previous study of CSF biomarkers in an independent PD cohort, the metabolites xanthine, homovanillic acid, and their ratio were proposed as biomarkers of PD status and severity78, suggesting that xanthine is altered in multiple body fluids in PD and may indeed have diagnostic applications.

In addition to biomarker applications, xanthines and their derivatives, including naturally occurring xanthine alkaloids such as caffeine, paraxanthine, and theophylline, have previously been proposed as potential neuroprotective agents for PD. Multiple studies using in vitro and in vivo models of PD have confirmed protective effects of these compounds14,16,79, e.g., for paraxanthine, caffeine, and theophylline in an MPTP mouse model14, and for inosine in a cellular PD model80. While the main proposed protective mechanism for inosine and other xanthines links them to the downstream metabolite uric acid as a mediator of neuroprotective effects via induction of the Nrf2 signaling pathway81, other studies suggest that the protection by inosine is independent of uric acid80. Further alternative candidate protective mechanisms for xanthines and their derivatives include inhibition of the adenosine A2A receptor and the enzyme MAO-B, both of which are well-established PD drug targets with associated clinically approved drugs (e.g., the xanthine derivative drug Istradefylline, which targets A2A82, and the MAO inhibitor selegiline83).

In addition to these known actions of xanthine derivatives, the integrated metabolomic and transcriptomic network analysis performed here suggests another possible pathway by which xanthine alterations may affect PD-related processes. The observed decreased expression of the enzyme HPRT1, which converts hypoxanthine to IMP, provides both a mechanistic explanation for the observed increased abundance of xanthines in PD and is associated with a shortage of cellular ATP as a further pathological downstream effect (see Fig. 6). This potential disease mechanism is supported by a previous single-arm, open-label trial in PD patients, targeting the same pathway by a combined treatment with inosine and an inhibitor of the enzyme XDH to block the conversion of hypoxanthine to xanthine75. This inhibition favors the conversion of hypoxanthine to IMP by HPRT1 and has been shown in the same study to increase downstream ATP production and improve PD motor symptoms.

Besides the main finding of coordinated changes in xanthines, the metabolite analyses also revealed significant alterations in other PD-related cellular processes. In particular, pronounced changes in fatty acid β-oxidation were observed, consistent with previous studies reporting impaired mitochondrial β-oxidation in PD34. In addition, the combined analysis of metabolomics and transcriptomics data revealed coordinated changes in catecholamine metabolism, with significantly increased levels of the metabolite retinal coinciding with a decreased expression of the enzyme aldehyde dehydrogenase 1A1 (ALDH1A1), which converts retinal to retinoic acid. These findings are in line with the catecholaldehyde hypothesis of PD, which links reduced ALDH1A1 levels to toxic accumulation of the dopamine metabolite DOPAL19,20.

The machine learning analyses for de novo PD vs. control classification and for prediction of MDS-UPDRS III total motor score provided results that matched with the associations for xanthines and sphingomyelins observed in the statistical analyses. However, the predictive capacity of individual metabolites was low to moderate, and further analysis is needed to evaluate potential improvements of multi-metabolite signatures and confirm the results across different cohorts.

Overall, the investigations revealed several significant metabolite changes in de novo PD and treated patients. These findings highlight coordinated and mechanistically congruent alterations in specific cellular processes and sub-networks, laying the ground for follow-up mechanistic intervention and validation studies in PD model systems.

Methods

Study cohort

The data used in this study were obtained from participants recruited from the nationwide, monocentric, observational, longitudinal Luxembourg Parkinson’s Study10 in the frame of the National Centre of Excellence in Research on Parkinson’s disease (NCER-PD). All subjects provided written informed consent, and the study was approved by the National Research Ethics Committee (CNER Ref: 201407/13) and complied with all relevant ethical regulations. To characterize the cross-sectional metabolite alterations in PD, blood plasma samples were obtained from 549 PD patients and 590 controls from the Luxembourg Parkinson’s Study and submitted for metabolomics profiling (study approved by the University of Luxembourg Ethics Review Panel, ref. ERP 18-042). An overview of relevant baseline subject characteristics for PD patients and controls from this dataset is shown in Table 3. The clinical diagnosis of PD adhered to the United Kingdom Parkinson’s Disease Society Brain Bank (UKPDSBB) diagnostic criteria84. Criteria for the inclusion of controls in the Luxembourg Parkinson’s Study were: (i) no evidence of neurodegenerative disorders according to all clinical assessments and imaging information available to the assessing study physician; (ii) age above 18 years; (iii) absence of pregnancy or active cancer. De novo PD was defined as dopaminergic treatment-naïve patients within one year of PD diagnosis.

Table 6 Ranking table of the top 10 known metabolite features for regression analysis of motor scores

Biospecimen collection, quality control and sample accessioning

Blood plasma samples were collected for all study participants from the Luxembourg Parkinson’s Study and were submitted for liquid chromatography–mass spectrometry (LC-MS) metabolomics profiling using the Metabolon untargeted global metabolomics screening platform (www.metabolon.com). Plasma was recovered from ethylenediaminetetraacetic acid (EDTA) coated blood collection tubes (Becton Dickinson Ref. 367525) by centrifugation at 2000 g for 10 minutes at room temperature. The entire plasma supernatant was transferred to a 15 ml polypropylene conical mixing tube (VWR Ref. 525-0401), taking care not to aspirate from the buffy coat layer between the plasma and the erythrocytes, either by automated script on a TECAN liquid handling platform (Freedom EVO, TECAN) or manually with sterile serological pipettes. The collected supernatant was homogenized by repeated aspiration and dispense cycles and then pipetted as 220 µl aliquots into barcode-labeled screw-cap cryovials of approximately 700 µl capacity (Thermo MATRIX 0.5 ml, Thermo-Fisher ref. 3744 or FluidX 0.7 ml, Azenta ref. 68-0703-10) and frozen and stored at −80 °C in 96-position SBS-format lockable racks. Up to 12 aliquots of 220 µl could be obtained from each 10 ml EDTA blood collection tube.

To detect and filter blood samples of low quality, the Integrated Biobank of Luxembourg (IBBL), which processed the study samples, performed the following assays as part of its routine quality control: 1) Complete blood cell counting using an ABX Micros 60 Hematology Analyzer on a small aliquot of the whole blood before centrifugation, providing the following 6 parameters: white blood cell count (WBC), red blood cell count (RBC), hemoglobin (HGB), mean corpuscular volume (MCV), hematocrit (HCT), and C-reactive protein (CRP); 2) HIL (hemolysis, icterus and lipemia) indices, tested using a COBAS Integra SI4 assay on the recovered pooled and mixed plasma prior to aliquoting. In addition, IBBL conducts an annual program of continuous quality control monitoring of all its processing methods, which includes assays performed on derivatives of EDTA blood samples collected and processed under the same conditions throughout the year specifically for IBBL’s annual quality control (QC) program. Assays performed on the plasma derivatives include the following: 1) Platelet count; 2) Interleukin-16 (IL-16) concentration; 3) HIL (hemolysis, icterus and lipemia) indices, tested using a COBAS Integra SI4 assay on the recovered pooled and mixed plasma prior to aliquoting; 4) Consistency of aliquot volumes (automated process). Hemolyzed blood samples and samples highlighted as abnormal or problematic by instrument flags or measurements for the above readouts are not included in any further analyses, such as the metabolomics profiling analyses conducted for our study. Further quality control analyses were performed as part of the metabolomics profiling (see sub-section on “Quality analysis and quality controls (QC)” in the section on “Metabolomics sample processing”).

Following receipt of the samples at Metabolon on dry ice, the samples were inventoried and immediately stored at −80 °C. Each sample received was accessioned into the Metabolon Laboratory Information Management System (LIMS) and assigned a unique identifier associated with the source identifier only. This identifier was used to track all sample handling, tasks, and results. All samples were maintained at −80 °C until processed.

Metabolomics sample processing

The processing of the metabolomics samples, including sample preparation, profiling, quality analyses and quality controls (QC), followed the standard procedure by Metabolon as described below (extracted from the documentation provided by Metabolon along with the experimental data).

Sample preparation for metabolomics

Samples were prepared using the automated MicroLab STAR® system from Hamilton Company. Several recovery standards were added prior to the first step in the extraction process for QC purposes. To remove protein, dissociate small molecules bound to protein or trapped in the precipitated protein matrix, and recover chemically diverse metabolites, proteins were precipitated with methanol under vigorous shaking for 2 min (GenoGrinder 2000 by the supplier Glen Mills) followed by centrifugation. The resulting extract was divided into five fractions: two for analysis by two separate reverse phase Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectrometry (RP/UPLC-MS/MS) methods with positive ion mode electrospray ionization (ESI), one for analysis by RP/UPLC-MS/MS with negative ion mode ESI, one for analysis by hydrophilic interaction liquid chromatography (HILIC)/UPLC-MS/MS with negative ion mode ESI, and one reserved for backup. Samples were placed briefly on a TurboVap® (Zymark) to remove the organic solvent. The sample extracts were desolvated under nitrogen and stored at −80 °C before preparation for analysis. In total, the samples underwent two freeze-thaw cycles during the entire course of the study, including one for the experimental processing at Metabolon. Aliquots of all samples were always processed in parallel, i.e., all samples experienced consistent handling and underwent the same number of freeze-thaw cycles.

Metabolomics profiling

The metabolomics measurements for all samples were conducted using Ultra-Performance Liquid Chromatography-Tandem Mass Spectrometry (UPLC-MS/MS). All methods used a Waters ACQUITY ultra-performance liquid chromatography (UPLC) system and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer operated at 35,000 mass resolution. The sample extract was dried, then reconstituted in solvents compatible with each of the four methods. Each reconstitution solvent contained a series of standards at fixed concentrations to ensure injection and chromatographic consistency. One aliquot was analyzed using acidic positive ion conditions, chromatographically optimized for more hydrophilic compounds. In this method, the extract was gradient eluted from a C18 column (Waters UPLC BEH C18-2.1 × 100 mm, 1.7 µm) using water and methanol, containing 0.05% perfluoropentanoic acid (PFPA) and 0.1% formic acid (FA). Another aliquot was also analyzed using acidic positive ion conditions, but chromatographically optimized for more hydrophobic compounds. In this method, the extract was gradient eluted from the same aforementioned C18 column using methanol, acetonitrile, water, 0.05% PFPA and 0.01% FA, and the method was operated at an overall higher organic content. A third aliquot was analyzed using basic negative ion optimized conditions using a separate dedicated C18 column. The basic extracts were gradient eluted from the column using methanol and water with 6.5 mM Ammonium Bicarbonate at pH 8. The fourth aliquot was analyzed via negative ionization following elution from a HILIC column (Waters UPLC BEH Amide 2.1 × 150 mm, 1.7 µm) using a gradient consisting of water and acetonitrile with 10 mM Ammonium Formate, pH 10.8. The MS analysis alternated between MS and data-dependent MSn scans using dynamic exclusion. The scan range varied between methods but covered 70–1000 m/z. Raw data files were archived, extracted and further processed as described below.

Quality analysis and quality controls (QC)

Multiple types of controls were analyzed in concert with the experimental samples: a pooled matrix sample, generated by taking a small volume of each experimental sample, served as a technical replicate throughout the data set; extracted water samples served as process blanks; aliquots of solvents used in extraction served as solvent blanks; and QC standards, carefully chosen not to interfere with the measurement of endogenous compounds, were spiked into every analyzed sample, allowing instrument performance monitoring and aiding chromatographic alignment (Supplementary Tables 1 and 2, and Supplementary Fig. 1 describe these Metabolon QC samples and standards).

The use of ultrapure water as a blank sample is part of the standard operating procedures by Metabolon to assess the contribution to compound signals from the analytical procedures. This same process is used for all sample types processed at Metabolon including those not involving blood/plasma. Water is used instead of phosphate-buffered saline (PBS), because while PBS has comparable osmolarity and buffering capacity to plasma, it also contains salts and other components that could potentially interfere with the detection and quantification of metabolites. The use of water ensures that the blank does not introduce any extraneous peaks or signals into the mass spectrometry data, allowing for clearer interpretation of the results from the experimental samples. In addition, the use of water provides a “clean” background against which the performance of quality control (QC) standards can be evaluated. As the blank sample is also used to assess any contamination or carry-over effects during the sample preparation and analysis process, the use of water helps to identify any such issues more clearly than using a more complex matrix such as PBS.

Instrument variability was determined by calculating the median relative standard deviation (RSD) for the standards that were added to each sample prior to injection into the mass spectrometers. Overall process variability was checked by calculating the median RSD for all endogenous metabolites (i.e., non-instrument standards) present in 100% of the pooled matrix samples (see the median RSD values for instrument variability and process variability in Supplementary Table 3). Experimental samples were randomized across the batches, with QC samples spaced evenly among the injections.
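The two variability measures described above can be illustrated with a short Python sketch. It is not Metabolon's implementation: the function names, the use of the sample standard deviation, the use of None to mark a missing value, and the example peak areas are all assumptions made for illustration.

```python
from statistics import mean, median, stdev

def rsd_percent(values):
    """Relative standard deviation (sample SD divided by the mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

def median_rsd(measurements):
    """Median RSD over compounds; `measurements` maps compound name -> peak areas."""
    return median(rsd_percent(v) for v in measurements.values())

def present_in_all(measurements):
    """Keep only compounds detected in 100% of the samples (None marks a missing value)."""
    return {name: v for name, v in measurements.items()
            if all(x is not None for x in v)}

# Hypothetical peak areas of two internal standards across four injections.
standards = {
    "standard_A": [100.0, 102.0, 98.0, 100.0],
    "standard_B": [50.0, 55.0, 45.0, 50.0],
}
instrument_variability = median_rsd(standards)

# Hypothetical endogenous metabolites measured in four pooled matrix samples.
pooled = {
    "metabolite_1": [10.0, 11.0, 9.0, 10.0],
    "metabolite_2": [5.0, None, 6.0, 5.5],  # not present in all pooled samples
}
process_variability = median_rsd(present_in_all(pooled))
```

In this sketch the process variability is computed only from metabolite_1, because metabolite_2 is missing in one pooled sample and is therefore excluded.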

Data quality control, filtering, and pre-processing for the metabolomics data

The quality control (QC) procedures encompassed the analysis of multiple controls alongside the experimental samples, including process blanks, solvent blanks, technical replicates, and carefully selected QC standards to monitor instrument performance and assist in chromatographic alignment (Supplementary Tables 1 and 2, and Supplementary Fig. 1 describe these Metabolon QC samples and standards; details on the quality analyses are described in the Supplementary Material section “Metabolomics sample processing - Quality analysis and quality controls”). Additionally, median relative standard deviations (RSD) were calculated for both instrument and process variability to ensure accurate and consistent results (see Supplementary Table 3 and Supplementary Fig. 2).

The raw metabolomics data was pre-processed to obtain metabolite abundances in the form of log-transformed, batch-normalized and imputed peak-area data (i.e., total ion counts, which represent the integrated area-under-the-curve). Experimental samples were randomized across the batches, with QC samples spaced evenly among the injections. For the batch normalization, the raw values of each metabolite were divided by the median of that metabolite in the respective instrument batch, so that each metabolite has a median of one in every batch. For every metabolite, the minimum value across the batches for the median-scaled data was used to impute the missing values (limitations associated with missing values and the rationale for the imputation approach are discussed in the section on “Study limitations”). The batch-normalized and imputed data was transformed using the natural logarithm. This was motivated by a comparison of average density estimation plots of the peak-area data before and after log transformation, suggesting that the log-transformed data better follows a normal distribution (see Supplementary Fig. 3). The final metabolomics dataset comprised a total of 1490 biochemicals, including 1207 compounds of known identity and 283 compounds of unknown structural identity. A complete list of these compounds, including information on their public database IDs, chemical properties, and associated biochemical pathways, is provided in Supplementary Table 4.
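As an illustration of the pre-processing steps described above, the following Python sketch applies median scaling per batch, minimum-value imputation and the natural logarithm to a single metabolite. The function name, the use of None for missing values and the example numbers are hypothetical. The sketch also assumes that every batch contains at least one observed value for the metabolite.

```python
import math
from statistics import median

def preprocess_metabolite(values, batches):
    """Median-scale one metabolite within each batch, impute missing values with
    the minimum scaled value across all batches, then take the natural logarithm.

    values  -- raw peak areas, with None marking a missing value
    batches -- instrument batch label of each sample (same length as values)
    """
    # Median of the observed raw values within each instrument batch.
    batch_medians = {}
    for batch in set(batches):
        observed = [v for v, b in zip(values, batches) if b == batch and v is not None]
        batch_medians[batch] = median(observed)  # assumes at least one observed value

    # Divide each observed value by its batch median (batch median becomes one).
    scaled = [None if v is None else v / batch_medians[b]
              for v, b in zip(values, batches)]

    # Impute missing values with the minimum of the median-scaled data.
    floor = min(s for s in scaled if s is not None)
    return [math.log(floor if s is None else s) for s in scaled]

# Hypothetical raw peak areas for one metabolite in two batches of three samples.
raw = [10.0, 20.0, 40.0, None, 6.0, 3.0]
batch_labels = ["b1", "b1", "b1", "b2", "b2", "b2"]
log_abundance = preprocess_metabolite(raw, batch_labels)
```

Here the missing value in batch b2 is replaced by 0.5, the smallest median-scaled value observed for this metabolite, before the log transformation.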

Transcriptomics data

For the PD case-control brain transcriptomics data analyzed in this study85, pre-processed data was obtained from the database Gene Expression Omnibus (GEO, ID: GSE8397). The samples used are from the lateral substantia nigra midbrain region, covering 9 PD patients and 5 controls, and were obtained in the GSE Series Matrix file format and analyzed at the log scale.

Statistical analyses of the metabolomics data

A detailed reporting form, providing standardized information on the metabolomics data by integrating relevant recommendations from the “Core Information for Metabolomics Reporting (CIMR)” by the Metabolomics Standards Initiative86 and the Co-ordination of Standards in Metabolomics87, is provided as Supplementary Material. To prevent treatment effects of standard PD drug therapy with medications containing the active compound levodopa (L-DOPA) from affecting the metabolomics data analysis, we first focused on the subset of de novo PD patients. A differential abundance analysis comparing these 56 de novo patients against all 590 controls was performed using the empirical Bayes moderated t-statistic88 as implemented in the R software package limma (v3.52.2, RRID:SCR_010943)89, adjusting for age and sex as confounders. The resulting p values were corrected for multiple hypothesis testing according to the Benjamini and Hochberg method90. Next, the differential abundance analysis was repeated for the entire cohort of PD patients, i.e., comparing 549 patients against 590 controls, using the same statistical approach but including the abundance measurements for the L-DOPA metabolite 3-O-Methyldopa (3-OMD, 3-methoxytyrosine) as an additional covariate to adjust for dopaminergic treatment effects (L-DOPA itself was not covered among the measured metabolites, see section on “Study limitations”). Prior to all differential abundance analyses, metabolite features with zero variance across the considered samples were removed. Since L-DOPA medication has a pronounced effect on blood metabolite measurements, we additionally filtered out all metabolites with an absolute Spearman correlation of at least 0.2 to the 3-OMD abundances prior to the differential analysis. We note that weak indirect treatment effects may persist in the data after these filtering and adjustment steps. Therefore, we mainly rely on the prior de novo patient vs. control comparison to assess treatment-independent effects.
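Two of the steps described above, the correlation-based filter against 3-OMD and the Benjamini and Hochberg adjustment, can be sketched in plain Python as follows. The analyses in this study were run in R, so this is only an illustrative re-implementation: all function names and example values are hypothetical, and the moderated t-statistic from limma is not reproduced here. The filter assumes that zero-variance features have already been removed, as stated in the text.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; requires non-zero variance in both inputs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def filter_treatment_correlated(abundances, omd, threshold=0.2):
    """Drop metabolites whose absolute Spearman correlation to 3-OMD is >= threshold."""
    return {name: v for name, v in abundances.items()
            if abs(spearman(v, omd)) < threshold}

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank of this p value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical abundances for five samples.
omd = [1.0, 2.0, 3.0, 4.0, 5.0]
abundances = {"m1": [2.0, 4.0, 6.0, 8.0, 11.0], "m2": [3.0, 1.0, 2.0, 1.5, 2.5]}
kept = filter_treatment_correlated(abundances, omd)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In this toy example m1 is perfectly rank-correlated with 3-OMD and is removed, whereas m2 (Spearman correlation of −0.1) is retained.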

By conducting the differential analysis for de novo patients and for the entire cohort of PD patients, with the described additional filtering and covariate adjustments, we identified the set of shared significant differential metabolites with the same direction of the change in these two analyses as the set of high-confidence PD-associated metabolites, whose alterations are both independent of treatment effects (as confirmed by the de novo differential analysis) and robust across a large sample size (as confirmed by the analysis of the entire PD dataset). While the analysis of the entire PD cohort (including de novo patients and treated patients) in addition to the de novo patient-specific analysis allowed us to maximize statistical power to detect PD-associated changes, we also performed a dedicated analysis for treated patients only, using the same filtering and adjustment steps as for the entire PD dataset, to better distinguish changes in treated from treatment-naïve patients.
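The selection of high-confidence metabolites described above amounts to intersecting two result tables under a significance and a direction constraint. A minimal Python sketch follows. The data layout (metabolite name mapped to a log fold change and an adjusted p value), the function name, the example values and the significance threshold of 0.05 are assumptions for illustration and are not taken from the analysis code of this study.

```python
def shared_same_direction(de_novo, full_cohort, alpha=0.05):
    """Metabolites significant in both analyses with the same sign of change.

    Both inputs map metabolite name -> (log fold change, adjusted p value).
    The threshold alpha is an assumed value for this sketch.
    """
    shared = []
    for name, (lfc_a, padj_a) in de_novo.items():
        if name not in full_cohort:
            continue  # e.g., removed by the 3-OMD correlation filter
        lfc_b, padj_b = full_cohort[name]
        same_direction = lfc_a * lfc_b > 0
        if padj_a <= alpha and padj_b <= alpha and same_direction:
            shared.append(name)
    return sorted(shared)

# Hypothetical differential abundance results.
de_novo_results = {"m1": (0.40, 0.01), "m2": (-0.30, 0.02), "m3": (0.25, 0.20)}
full_cohort_results = {"m1": (0.35, 0.001), "m2": (0.10, 0.03), "m3": (0.30, 0.001)}
high_confidence = shared_same_direction(de_novo_results, full_cohort_results)
```

Only m1 passes in this example: m2 changes direction between the two analyses, and m3 is not significant in the de novo comparison.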

In addition, we used available clinical measurements of motor impairment severity, quantified by the Movement Disorder Society‐Sponsored Revision of the Unified Parkinson’s Disease Rating Scale (MDS‐UPDRS) Part III Motor Scores91, to build linear models for testing associations between metabolite profiles and the severity of motor symptoms. This analysis was conducted using the data from all patients, adjusting for sex, age, and L-DOPA medication, and correcting the significance scores for multiple hypothesis testing in the same manner as for the differential abundance analysis.

Statistical analyses of transcriptomics data

The gene expression data was analyzed by comparing PD vs. control samples from the lateral substantia nigra brain region using the same implementation of the empirical Bayes moderated t-statistic as for the metabolomics data. We again adjusted the analysis for the available confounding factor variables age and sex and performed multiple testing corrections for the p values using the Benjamini and Hochberg method. MDS‐UPDRS motor scores and data on dopaminergic treatment were not available for the subjects covered for the transcriptomics profiling, and therefore the comparison of metabolomics and transcriptomics data focused on the case-control analyses and shared network alterations, confirming the treatment-independence using the metabolomics data (see section on “Pathway and network analyses” below).

All statistical analyses and associated volcano plot and dot plot visualizations were implemented in the R statistical programming software (version 4.2.0)92. The results were computed on a physical machine (CentOS 7.9.2009, Kernel: 3.10.0-1160.25.1.el7.x86_64).

Pathway and network analyses

Pathway enrichment analyses for the metabolomics data were conducted using the MetaboAnalyst software54. As annotation data resources, we used cellular pathway definitions from the database KEGG53 and metabolite sets representing chemical structure classes directly from MetaboAnalyst54. The complete set of identifiable, experimentally profiled metabolites was used as a reference metabolome for the pathway analyses, and from the pathway annotation databases only metabolite sets covering at least 5 metabolites were considered. To obtain a comprehensive coverage of pathways enriched in putative PD-associated changes, we tested the over-representation of both metabolites with false-discovery rate (FDR)-adjusted and nominal significance (p <= 0.05) in pathways and metabolite sets from these databases.
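The over-representation testing described above can be illustrated with a generic hypergeometric test in Python. This is a sketch of the general technique and not the MetaboAnalyst implementation. The function names and example sets are hypothetical. Applying the minimum set size of 5 after restricting each set to the reference metabolome is also an assumption.

```python
from math import comb

def ora_pvalue(n_reference, n_in_set, n_hits, n_hits_in_set):
    """Upper-tail hypergeometric p value: probability of observing at least
    n_hits_in_set members of a metabolite set among n_hits selected metabolites."""
    upper = min(n_in_set, n_hits)
    total = comb(n_reference, n_hits)
    favorable = sum(comb(n_in_set, k) * comb(n_reference - n_in_set, n_hits - k)
                    for k in range(n_hits_in_set, upper + 1))
    return favorable / total

def enrich(reference, hits, metabolite_sets, min_size=5):
    """Test every sufficiently large metabolite set against the reference metabolome.

    `hits` is assumed to be a subset of `reference`.
    """
    results = {}
    for name, members in metabolite_sets.items():
        measured = members & reference  # restrict the set to profiled metabolites
        if len(measured) < min_size:
            continue
        results[name] = ora_pvalue(len(reference), len(measured),
                                   len(hits), len(hits & measured))
    return results

# Hypothetical reference metabolome of 20 profiled metabolites.
reference = {f"m{i}" for i in range(20)}
hits = {"m0", "m1", "m2", "m10"}
metabolite_sets = {
    "pathway_A": {"m0", "m1", "m2", "m3", "m4"},
    "pathway_B": {"m5", "m6"},  # fewer than 5 members, skipped
}
enrichment = enrich(reference, hits, metabolite_sets)
```

The resulting p values would still need to be adjusted for multiple testing across all tested pathways and metabolite sets.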

Next, to investigate the network relationships between metabolites and enzymes undergoing coordinated changes in PD, integrated network analyses of the differential metabolomics and transcriptomics data statistics were implemented using the “Build Network” analysis workflow in the GeneGo MetaCore™ software with a focus on human molecular interaction data (filtered to species Homo sapiens). All resulting p values for the pathway and network analyses were adjusted according to Benjamini and Hochberg90.

Machine learning analyses

To obtain a first estimate of the utility of the metabolomics data for predictive modeling of disease outcomes, we performed machine learning (ML) analyses for PD vs. control diagnostic discrimination (classification analysis) and UPDRS III total motor score prediction (regression analysis). To avoid the strong confounding effects of dopaminergic medication in the data from treated patients, only data from de novo patients and controls were used for the PD vs. control classification analysis. By contrast, the regression analysis of the UPDRS III total motor score as a measure of disease severity was performed only for treated patients, excluding controls and de novo patients, because monitoring of motor performance is arguably most relevant for the majority of treated patients who are no longer in the initial stages of the disease. The aim of this specific analysis was to investigate whether metabolomics data could serve as a surrogate marker for UPDRS III motor assessments, potentially providing a means to replace or complement some of the routine assessments in the clinic with molecular measurements that may be less time consuming and burdensome for patients. Importantly, although regression models may be influenced strongly by the impact of dopaminergic medication on the metabolomics data, they may still be of practical use if they can accurately predict UPDRS III motor performance, e.g., to monitor the effects of medication.

For the ML analyses, the data was divided into training and testing sets in a 66:34 ratio, using a predetermined random seed to ensure reproducibility. Support Vector Machines (SVMs) were employed for the model building considering both a linear kernel and a radial basis function (RBF) kernel to detect both linear and non-linear predictive patterns. The R software package e1071 was used to train and apply the SVM models (https://cran.r-project.org/package=e1071, version 1.7-11). To optimize hyperparameters, a grid search was conducted on the training data within a 5-fold cross-validation framework. A key aspect of our methodology was the selection of the least complex model (with the lowest value of the regularization parameter C in the SVM) that was within one standard deviation of the best-performing model (in terms of cross-validated area under the Receiver Operating Characteristic Curve (AUC) for the classification analysis, and in terms of the root mean squared error (RMSE) for the regression analysis). This approach is grounded in the principle of preferring simpler models with similar predictive ability to avoid overfitting and increase generalizability. Once the most suitable models in terms of this performance/complexity trade-off criterion were identified, they were retrained with the selected hyperparameters on the entire training set. The retrained models were then applied to the test set to further evaluate their performance on independent samples. To assess the discriminative power of individual features, we conducted the above ML and cross-validation analyses for each metabolite feature in isolation. This involved training SVM models for each feature and assessing the predictive performance for both the classification and the regression analysis. Feature rankings were then consolidated by computing the average AUC for the classification analysis and the sum of feature ranks for the regression analysis, in each case across both linear kernel and RBF kernel SVMs and across the training set cross-validation and the independent test set evaluation. This composite ranking provided a summarized view of feature importance to distinguish the most informative features for both classification and regression tasks.
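The data split and the model selection rule described above can be sketched in Python as follows. The SVM training itself (performed with the R package e1071 in this study) is not reproduced. The sketch starts from cross-validation summaries that are assumed to be already available as (C, mean score, standard deviation) tuples. The function names, the seed value, the use of the best model's standard deviation as the tolerance, and all example numbers are hypothetical.

```python
import random

def split_train_test(sample_ids, train_fraction=0.66, seed=1):
    """Reproducible random split into training and test sets (seed is arbitrary here)."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def select_simplest_within_one_sd(cv_results, higher_is_better=True):
    """Return the smallest C whose mean cross-validated score lies within one
    standard deviation of the best score.

    cv_results -- list of (C, mean_score, sd_score) tuples from the grid search
    """
    if higher_is_better:  # e.g., AUC for classification
        best = max(cv_results, key=lambda r: r[1])
        eligible = [r for r in cv_results if r[1] >= best[1] - best[2]]
    else:  # e.g., RMSE for regression
        best = min(cv_results, key=lambda r: r[1])
        eligible = [r for r in cv_results if r[1] <= best[1] + best[2]]
    return min(eligible, key=lambda r: r[0])[0]

train_ids, test_ids = split_train_test(range(100))

# Hypothetical 5-fold cross-validation summaries for a grid of C values.
cv_auc = [(0.01, 0.70, 0.04), (0.1, 0.78, 0.03), (1.0, 0.80, 0.03), (10.0, 0.79, 0.05)]
cv_rmse = [(0.1, 12.0, 1.0), (1.0, 10.5, 0.8), (10.0, 10.0, 0.7)]
c_classification = select_simplest_within_one_sd(cv_auc, higher_is_better=True)
c_regression = select_simplest_within_one_sd(cv_rmse, higher_is_better=False)
```

In the classification example, C = 1.0 gives the best AUC (0.80), but C = 0.1 is selected because its AUC of 0.78 lies within one standard deviation (0.03) of the best value. For an RBF kernel, the kernel width would need to be handled as well, which this sketch does not cover.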

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.