Multiomics implicate gut microbiota in altered lipid and energy metabolism in Parkinson’s disease

We aimed to investigate the link between serum metabolites, gut bacterial community composition, and clinical variables in Parkinson’s disease (PD) and healthy control subjects (HC). A total of 124 subjects were part of the study (63 PD patients and 61 HC subjects). 139 metabolite features were found to be predictive between the PD and Control groups. No associations were found between metabolite features and within-PD clinical variables. The results suggest alterations in serum metabolite profiles in PD, and the results of correlation analysis between metabolite features and microbiota suggest that several bacterial taxa are associated with altered lipid and energy metabolism in PD.


INTRODUCTION
Parkinson' disease (PD) is the second most common neurodegenerative disorder and is associated with prominent gastrointestinal pathophysiological changes and symptoms 1 . Compositional alterations of gut microbiota in PD have been robustly demonstrated across multiple cohorts 2,3 . However, the functional implications of these changes regarding microbiota-host interactions and PD pathology and progression are poorly understood. Alterations of fecal and serum metabolites and inflammatory markers have been described in relation to gut microbiota 4-6 , but findings have been inconsistent except for reproducible findings of reduced fecal short-chain fatty acid levels in PD 7 . The Helsinki cohort has so far been analyzed for microbiome correlations with clinical features 2,8 and disease progression 9 . We have also studied the oral and nasal microbe communities 10 . Recently, the immune response and fecal SFCA levels were studied among the same individuals 6 . In the current study we aimed to characterize links between untargeted serum metabolomics, microbiota composition, and clinical symptoms in PD as compared to healthy control subjects.

Metabolome analysis
Data-driven metabolomics profiling of serum samples was undertaken, identifying a total of 7585 metabolite features. Support Vector Machines (SVM) using RBF kernel showed 81% classification accuracy using GC-MS metabolic profiles, and 77% and 72% classification accuracy was achieved using LC-MS positive and negative mode ionisation data, respectively (Table 1).

Metabolome-microbiome correlation analysis
Correlation analysis between the selected 139 metabolite features and bacterial taxa at genus, family, and phylum levels was performed separately for the PD and Control groups to facilitate their contrast (Supplementary Tables 1-6). All results' tables have been curated to obtain the best possible putative identification of the metabolite peak IDs at MSI level 3 identification level. Tables 2 and 5, as well as the two genus-level supplementary tables (Supplementary Tables 1 and 2) contain a metabolite "class" identification column (see Methods section for details). All supplementary tables contain all identified taxa/metabolite correlation pairs (using the selected 139 metabolite features and all bacterial abundance data) that show posterior mean correlation values at or above 0.3 and at a 95% "confidence level" (see Methods section for more details).
The within-PD analysis at genus level identified a total of 176 correlation pairs, while within-Controls analysis produced 202 pairs (Supplementary Tables 1 and 2, respectively). As can be seen, there is some overlap in the taxa and metabolites represented in the two groups, but overall there are substantial differences in the bacterial taxon-metabolite pair associations. To aid in the identification of possible links between metabolite classes and bacteria, we produced network figures for both within-PD and within-Controls results at genus level, using metabolite classes. (Figures 1 and 2). At family level, the within-PD analysis identified a total of 67 correlation pairs, while the within-Controls analysis identified a total of 85 pairs (Supplementary Tables 3 and 4, respectively). Finally, the within-PD analysis at phylum level identified a total of 17 correlation pairs and the within-Controls analysis identified a total of 6 pairs (Supplementary Tables 5 and 6, respectively).

Figure 1:
Network of within-PD correlations between bacterial genera and metabolite classes. Taxon groups unclassified at genus level may contain more than one genus (or higher . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. taxon) in the same node, but were kept in the network for visualization purposes and to match the raw results from Supplementary table 1. Edge thickness represents the strength of the correlation. Blue edges represent positive correlations, and Orange edges represent negative correlations. Green nodes represent metabolite classes, and Blue nodes represent bacterial taxa.

Figure 2:
Network of within-Controls' correlations between bacterial genera and metabolite classes. Taxon groups unclassified at genus level may contain more than one genus (or higher taxon) in the same node, but were kept in the network for visualization purposes and to match the raw results from Supplementary table 2. Edge thickness represents the strength of the correlation. Blue edges represent positive correlations, and Orange edges represent negative correlations. Green nodes represent metabolite classes, and Blue nodes represent bacterial taxa.
To further focus the study, we trimmed the correlation pairs down to only those containing bacterial taxa that were i) not unclassified at the target taxonomic level, ii) differentially abundant between PD and Control groups at one or both of the two time points of sample collection in a previous study using the same subject data 9 , and iii) taxa that were systematically reported previously in the PD microbiome literature as being differentially abundant between PD and Control groups 3,9 ( Table 5). The bacterial abundance data used in the present article corresponds to the second time point of sample collection in Aho et al.
(2019) 9 . To aid in visualizing the relationships, a third within-PD network figure was produced, using genus-level data and metabolite classes as before, but limited to the trimmed correlation pairs (Figure 3).
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021.  (Table 5). Edge thickness represents the strength of the correlation. Blue edges represent positive correlations, and Orange edges represent negative correlations. Green nodes represent metabolite classes, and Blue nodes represent bacterial taxa. Table 5   Table 5: Selected results based on taxa previously reported in the literature at genus, family, and phylum levels. Red numbers represent negative correlations, and Blue positive correlations. The last column presents a consensus on direction of effect based on previous reports: before the slash (/), we report the result obtained in Aho et al. (2019) 9 using the same bacterial data as in the present study; after the slash, we report the consensus reported by Boertien et al. (2020) 3 (see that study for details). NA means that no result for that taxon is available in that study 9 . is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; In the Helsinki cohort, six bacterial genera were previously reported as being differentially abundant (selected for the present article at an alpha threshold of statistical significance of 0.05 from the original 0.1), using one or more statistical methods 9 , between Control and PD groups at one or both time points, namely: Bifidobacterium, Roseburia, Prevotella, Blautia, Lactobacillus, and Clostridium XIVa. All these genera, except for Clostridium XIVa, have also been reported as being differentially abundant in previous publications contrasting a control group to PD patients 9 . Of these, Prevotella, Bifidobacterium, Roseburia, and Lactobacillus are also found to be correlated with one or more metabolite features in our genus-level analysis (Table 5).
Other differentially abundant bacterial genera reported previously in the PD microbiome literature 3 besides those referred to in Aho et al. (2019) 9 have also been found covarying with metabolite features in our dataset, and we also used that information for the purpose of focusing our study's results. At genus level, Akkermansia, Bifidobacterium, Faecalibacterium, Prevotella, Lactobacillus, and Roseburia were reported multiple times in the literature, with only Akkermansia and Faecalibacterium not being reported in the Aho et al. (2019) 9 study as being differentially abundant (Table 5).
In the Aho et al. (2019) 9 study, seven bacterial families were reported as being differentially abundant between PD and Control groups at one or both time points, namely: Bifidobacteriaceae, Prevotellaceae, Rikenellaceae, Lachnospiraceae, Pasteurellaceae, Lactobacillaceae, and Puniceicoccaceae. All these families, except for Puniceicoccaceae, have also been reported as being differentially abundant in previous publications contrasting a control group to PD patients 9 . Three of them showed correlations with one or more metabolite features in our family-level analysis (Table 5). Boertien et al. (2019) 3 also reported on the bacterial families most commonly found to be differentially abundant, namely Bifidobacteriaceae, Prevotellaceae, Lachnospiraceae, Lactobacillaceae, Verrucomicrobiaceae, Enterobacteriaceae, Erysipelotrichaceae, and Ruminococcaceae, with the last four families not being found to be differentially abundant at any time point in our cohort 9 (Table 5).
For this study, we have also analysed our data at phylum level, unlike in Aho et al. (2019) 9 , and found correlations between various metabolites and the phyla Lentisphaerae, Verrucomicrobia, Synergistetes, and Tenericutes (Supplementary Tables 5 and 6). Some phyla recurrently found to be differentially abundant between PD and Control groups are Verrucomicrobia, Firmicutes, and Bacteroidetes 3 , although of those reported only Verrucomicrobia yielded correlations with metabolite features in our analysis (Table 5).

Serum Metabolome
Using mummichog approach, we have shown in this study that the identified serum metabolome differences in PD have functional significance on over 20 pathways, including carnitine shuttle, vitamin E metabolism, glycerophospholipids, sphingolipids, fatty acids, and aminoacyl-tRNA biosynthesis. In a separate study, we have demonstrated that carnitine shuttle, sphingolipids, and fatty acids pathways change in Parkinson's sebum -these are within the 20 pathways enlisted above detected in serum in this study 14 . Energy metabolism . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; is highly regulated by facilitation of long chain fatty acid β-oxidation. Also in frail 15 elderly participants without Parkinson's, dysregulation of carnitine shuttle and vitamin E metabolism was observed when compared to similarly aged resilient individuals 16 . Thus, perturbation of carnitine shuttle and vitamin E, along with fatty acids in serum metabolome may indicate a significant change in energy metabolism during PD. This can further be supported by changes observed in sphingolipid metabolism, a key pathway in cell signalling and regulation. Dysregulation of sphingolipids is known to be associated with α-synucleinopathy 17,18 , changes in lysosomal metabolism, and in mitochondrial metabolism observed in PD 19,20 .
Decreased long-chain acylcarnitines due to insufficient β-oxidation has been shown to carry potential for early diagnosis of Parkinson's 21 , especially 12-14 long chain acylcarnitines. In recent work studying the gut microbiome, Rosario et al. (2021) 22 have shown the role of bacterial folate and homocysteine metabolism in PD. Higher numbers of bacterial mucin and host degradation enzymes were linked to the manifestation of PD. The contribution of bacterial folate metabolism to human metabolic regulation is not entirely clear. Folate is an essential vitamin B, that maintains methylation reactions. The liver, via many methylation reactions in post-translational modifications, regulates the synthesis of hormones, creatine, carnitine, and phosphatidylcholine 23 . If methylation capacity is compromised due to an alteration in folate metabolism, there may be impaired phosphatidylcholine synthesis along with shunted or disrupted carnitine shuttle observed in our results. Altered carnitine metabolism, fatty acids, and steroid metabolism was also observed in a metabolomics profiling study recently reported 24 . Several studies have reported decreased levels of carnitine and acylcarnitines in plasma from PD patients 21,25,26 ; however, according to Jiménez-Jiménez et al. (1997) 27 no changes were observed in acylcarnitine levels in plasma or cerebrospinal fluids of PD participants. Thus, there is no clear evidence of direction in which carnitines are expressed but there is much research evidence that suggests a link between perturbations in carnitine shuttle owing to protective mechanism of acylcarnitines leading to changes in other fatty acids and eventually the lipid make-up in PD.

Correlations between bacterial taxa and metabolites features
Regarding the correlations between bacterial taxa and metabolite features, and given that metabolite MSI 3 ID is a putative identification, we will mostly focus the following discussion at the level of metabolite classes. When mentioning specific bacterial taxa in terms of correlation results, we will report between brackets if the taxon is always (or usually) reported in the literature as being over-or underrepresented in PD (see the last column in the Table 5 for details), as well as the signal of the correlation detected in our study.
At all bacterial taxonomic levels investigated in this study, the most commonly detected correlations were found with putative metabolites in the glycerophospholipid class and with lipids in general. This is the case both in our non-trimmed results (Supplementary Tables 1-6) and in the trimmed results focusing only on bacterial taxa found in the previous literature as being differentially abundant between Controls and PD cases (Table 5). Specifically, in the within-PD analysis and focusing the discussion on the trimmed results, we find correlations between various glycerophospholipids and Roseburia (decreased in PD; positive correlation), Lactobacillus (increased in PD; positive correlation), Akkermansia (increased in PD; one positive and one negative correlation), Bifidobacteriaceae (increased in PD; positive correlation), Pasteurellaceae (decreased in PD; positive correlation), Lactobacillaceae (increased in PD; positive correlation), Verrucomicrobiaceae (increased in PD; one positive . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; and one negative correlation), and Verrucomicrobia (increased in PD; one positive and one negative correlation) (Table 5). Akkermansia, Verrucomicrobiaceae, and Verrucomicrobia share the same positive and negative correlations with two metabolite features. Lactobacillus and Lactobacillaceae correlate positively with the same metabolite feature.
With the exception of Roseburia and Pasteurellaceae, all these taxa are usually found to be overrepresented in PD, and are mostly positively correlated with various glycerophospholipids, which are also found in our analysis to be mostly overrepresented in PD (Table 2). This is not the case in the within-controls analysis, in which the only detected correlations with glycerophospholipids are with Lactobacillaceae (positive correlation with a different metabolite) and Enterobacteriaceae (also a positive correlation with a different metabolite) ( Table 5). The genus-level network figures for PD and Controls (Figures 1 and 2) also indicate the existence of possible alterations in bacterial metabolic dynamics in PD. Overall, these results suggest that these bacterial taxa, which have been found to be overabundant in PD in several studies, may be associated for the most part with an increase in glycerophospholipid abundance in PD.
All of these glycerophospholipids have an endogenous (human host) origin and are linked to cell signalling, lipid peroxidation, and lipid metabolism. These phospholipids are the main component of cell membranes in all known living systems, and play roles in various biological processes, including signal induction and acting as transporters. Interestingly, there are several genetic factors directly or indirectly related to glycerophospholipid metabolism, such as PLA2G6 17 , that are associated with PD risk (PLA2G6 is the cause of early-onset PARK14-linked dystonia-parkinsonism 28,29 ). Alpha-synuclein, characteristically found in aggregates within Lewy bodies in the brains of PD patients, directly binds to negatively charged phospholipids in the cells' lipid membranes, and exhibits preferential binding to small lipid vesicles 30 . The binding of alpha-synuclein to lipid membranes can also lead to alterations in their bilayer structure that can induce the formation of those lipid vesicles 31 . Very importantly in the PD context, these interactions between lipid membranes and alphasynuclein affect its rate of aggregation, and can lead to disruption of membrane integrity both in vitro and in vivo 32 . It has also been shown that the association of soluble alpha-synuclein with planar lipid bilayers results in the formation of aggregates and small fibrils 33 . Exposure to docosahexaenoic acid (DHA), which accounts for 60% of glycerophospholipid esterified fatty acids in the plasma membrane, gradually assembles alpha-synuclein into amyloid-like fibrils, with the notable feature that DHA itself becomes part of the aggregate 34 . Notably, alpha-synuclein gene expression is increased with elevated DHA intake, and the resulting oligomers are toxic to cells 35,36 . Alpha-synuclein also binds with specific phospholipids in mitochondrial membranes, modulating the efficiency of mitochondrial energy production 37 , with various mitochondrial phospholipids appearing to have an effect on alpha-synuclein toxicity 17 . Thus, the interaction between alpha-synuclein and various phospholipids and their metabolism may play an important role in PD pathogenesis, and gut microbiota may be implicated in these interactions.
We also detected correlations with other lipids in the within-PD analysis. Roseburia (decreased in PD) negatively correlates with a sterol lipid, probably of dietary origin. Roseburia is also positively correlated with a metabolite in the fatty acyl class, possibly also associated with diet. Lactobacillus (increased in PD) is negatively correlated with a furanoic fatty acid, which is associated with cell signalling, lipid peroxidation, lipid metabolism, and lipid transport metabolism, with dietary, human, and/or bacterial origin. Half of the metabolites from the fatty acyl class were found to be overrepresented in PD in the selected . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
In the within-controls analysis we detect a positive correlation between Enterobacteriaceae (increased in PD) and a sphingolipid, which is associated with membrane stabilization, lipid peroxidation, and lipid metabolism, of endogenous origin. In our study, this metabolite feature is slightly decreased in PD (Table 2) and no correlation is found between it and PDlinked bacterial taxa in the within-PD analysis.
Although the interpretation of these results regarding lipids in general is not suggestive of a particular pattern as in the case of glycerophospholipids, it is nevertheless interesting that virtually all correlations, positive or negative, with lipids are detected in the PD group, with most lipids detected in our study being overrepresented in PD relative to the control group ( Table 2). Also interesting is that the majority of the identified correlations between metabolite features and the trimmed bacterial taxa list are with lipids, with relatively few other metabolite groups represented in the results ( Table 5). The importance of this link between lipid metabolism and PD can't be overstated: as mentioned earlier in the context of phospholipids, recent research shows that alpha-synuclein binds preferentially to specific lipid families and molecules, and that the latter promote alpha-synuclein interaction with synaptic membranes and affect alpha-synuclein oligomerization and aggregation. These same lipid-protein complexes also affect lipid metabolism by interfering with the catalytic activity of lipid enzymes in the cytoplasm and lipases in lysosomes. Lipid compositional alterations in PD have also been reported in brain and plasma, as well as linked to oxidative stress, inflammation, and progressive neurodegeneration through pro-inflammatory lipid mediators (see Alecu et al. (2019) 17 for a full review on the role of lipids in PD). The link between lipids and bacteria in PD, if any, would probably consist of the bacterial modulation of lipid intake through diet and its differential effect on the bioavailability of those lipids in the host. The results of our study, by establishing associations between bacterial taxa found to be differentially abundant between Controls and the PD group and lipid metabolites present in serum that are themselves differentially abundant between both groups, suggests that such a scenario could have a role in PD pathology and development.
Further correlations with putative metabolites in other classes have also been detected in our study, in particular in the hexoses class and carboxylic acid or derivatives class. In the hexoses case, two negative correlations for the same metabolite feature were found for Bifidobacterium and Bifidobacteriaceae (both taxon levels increased in PD; Table 5). These two correlations are only found in the within-controls analysis. The metabolite is probably endogenous in origin and is involved in sugar metabolism shunts, diverting a proportion of glucose from the main glycolytic path and returning metabolites at the level of triose phosphate and fructose 6-phosphate.
Finally, a positive correlation in the within-PD analysis is found between a metabolite feature in the carboxylic acid or derivatives class and the bacterial family Erysipelotrichaceae, which is mostly found in previous studies to be overrepresented in PD. This metabolite feature is tentatively identified as proline and may have a microbiome or endogenous source. Four different metabolite features found to be differentially abundant in our data are tentatively identified as proline (Table 2), and in all cases show a slight decrease in mean abundance in PD. Interestingly, L-proline can act as a weak agonist for glycine and glutamate receptors 38 , like NMDA, AMPA, and kainite. Both glutamate and glycine are neurotransmitters. Although that is not the case in our data, proline has been reported previously as being . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. overrepresented in PD 24 and is known to be linked to protein metabolism and structure, cell differentiation, conceptus growth and development 39 , and gut microbiota community reequilibration in cases of dysbiosis, with L-proline dietary supplementation being known to affect gut microbial composition and gut concentrations of several bacterial metabolites 40 . One of the detected correlations with Prevotella also involves (putative) proline. Prevotella and Prevotellaceae, when detected in PD studies, are usually found decreased in PD. In this study it correlates negatively with proline, which is found to be slightly underrepresented in the PD group.

Study subjects, clinical data, and sampling
The present study uses bacterial abundance data that was used in a previously published study by Aho et al. (2019) 9 . The study subjects and associated clinical data in the present study is similar to the data referred to previously in that study as "follow-up" timepoint, with minor changes specific to the present study: of the original 128 subjects, 61 control subjects and 63 PD patients were used in the present study, i.e. 3 control subjects and 1 PD subject less than in the original cohort (C75, C82, C123, and P119). This difference in sample numbers was due to insufficient metabolite data available to perform the study.
For DNA sequencing, the stool samples were collected and stored as described earlier 9 . For serum samples, blood was drawn at the study visit and after processing immediately transferred to -20°C and subsequently to -80°C. Samples were shipped overnight on dry ice from Helsinki to Manchester for analysis.

Sample preparation and metabolomics methods
Metabolomics sample preparation: Untargeted metabolite profiling was performed on serum samples that were collected from participants stored at −80 °C prior to analysis. Complementary coverage of metabolites was obtained using Ultra-High Performance Liquid Chromatography Mass Spectrometry (UHPLC-MS) and gas chromatography mass spectrometry (GC-MS). The procedures were adapted from the Dunn 41 and Begley 42 protocols as summarized here: Metabolites were extracted from the serum samples by individually adding 400 µL of cold methanol to 200 µL of serum. This was followed vortexing and centrifugation (17,500 × g) to yield a metabolite rich supernatant that was split into two aliquots and lyophilised for 12 h. Resultant metabolite pellet was stored at −80 °C until analysis. A pooled QC standard was also generated by combining 20 µL aliquots of each sample into a pooled vial with subsequent 200 µL aliquots from the pool, being extracted identical to each sample.

LC-MS method parameters:
Processed metabolite pellets were defrosted at 4 °C and subsequently reconstituted in 100 µL of 95:5 H2O:MeOH (v/v). UHPLC-MS analysis was performed using an Accela UHPLC with cooled auto sampler system coupled to an electrospray LTQ-Orbitrap XL hybrid mass spectrometer (ThermoFisher, Bremen, Germany). Analysis was carried out in positive and negative ESI modes while samples were . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; completely randomised to negate for any bias. The mobile phases and gradient elution profile were as tabulated below: From each sample vial, 10 µL of the extract was injected onto a Hypersil GOLD UHPLC C18 column (length 100 mm, diameter 2.1 mm, particle size 1.9 µm, Thermo-Fisher Ltd. Hemel Hempsted, UK) held at a constant temperature of 50 °C with a solvent flow rate of 400 µL min −1 .
Prior to analysis, LTQ-Orbitrap XL was calibrated according to manufacturer's instructions using caffeine (20 µg mL −1 ), the tetrapeptide MRFA (1 µg ml −1 ) and Ultramark 1621 (0.001%) in an aqueous solution of acetonitrile (50%), methanol (25%) and acetic acid (1%). The data acquisition was performed in centroid mode with 30K mass resolution and scan rate of 400ms per scan. The masses were measured between 100-1200 m/z range with source gases set at sheath gas = 40 arb units, aux gas = 0 arb units, sweep gas = 5 arb units. The ESI source voltage was set to 3.5 V, and capillary ion transfer tube temperature set at 275 °C.
LC-MS data processing: Xcaliber software (v.3.0; Thermo-Fisher Ltd. Hemel Hempsted, U.K.) was used as the operating software for the Thermo LTQ-Orbitrap XL mass spectrometer. Data processing was initiated by the conversion of the standard UHPLC .raw files into the .mzML format using Proteowizard 43 . Subsequently, peak picking was carried out in RStudio 44 using the XCMS 45 package for data deconvolution (http://masspec.scripps.edu/xcms/xcms.php). The output data was a matrix of mass spectral features with accurate m/z and retention time pairs. Any missing values after deconvolution were replaced using k-nearest neighbours algorithm. Peaks that with relative standard deviation of more than 20% within pooled QCs were removed. The remaining data was normalised with total ion count to account for injection to injection signal variations, log10 transformed, and pareto scaled prior to statistical analysis.

GC-MS method parameters:
Analysis of serum samples was also carried out on a Agilent 7250 GC-Time-of-Flight mass spectrometer coupled to a Gerstel-MPS autosampler. Two step derivatization of metabolite pellets thawed at 4 °C was carried out as described in the Begley . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; protocol 42 . The source temperature was set to 230°C and quad temperature was at 150°C. The total run time was 25 minutes for 10 µL sample injected each time. The sample was injected in split mode with 20:1 split ratio and split flow of 20 mL per minute. Agilent CP8944 VF-5ms column was used for separation (30m x 250µm x 0.25µm). With a 5 minute solvent delay at the start of run, gradient elution method was used to elute and separate analytes from serum. The oven temperature was ramped from 70°C to 300°C with an increase of 14°C per minute. At 300°C the temperature was held for 4 minutes before dropping back to starting conditions.

GC-MS data processing:
The raw data files were in Agilent .D format that were converted to .mzML format using Proteowizard 43 . Peak picking was carried out in RStudio 44 using an inhouse script for the eRah 46 package for GC-MS peak picking and deconvolution. The peaks were annotated using eRah's MassBank library. Any missing values after deconvolution were replaced using k-nearest neighbours algorithm. Peaks that with relative standard deviation of more than 20% within pooled QCs were removed. The remaining data was normalised with total ion count to account for injection to injection signal variations, log10 transformed, and pareto scaled prior to statistical analysis.
All metabolites successfully annotated within both the LC-MS and GC-MS analysis were assessed and scored at MSI level 3 putative identification according to rules set out by the Chemical Analysis Working Group of the Metabolite Standards Initiative 13 .

Sample preparation and DNA sequencing
The stool sample preparation and subsequent sequencing of the bacterial 16S rRNA genes has been described earlier 9 . All details, as well as the results of the full analysis of the 16S rRNA gene amplicon sequence data can be obtained from that publication.

Bioinformatics and Statistical Data Analysis
Statistical analysis of metabolomics data: In this untargeted profiling study, we performed (i) SVM models Using Python via Orange user interface, SVM models were generated for this analysis. The data were pre-treated as described earlier. The data were then split into train (60% data) and test (40% data), and resampling was repeated 100 times. The SVM model was generated with linear-kernel, cost (C) was set to 1.5, and regression loss epsilon was set at 0.10.
(ii) feature selection To select metabolite features (i.e. features differentially abundant between the Control and PD groups) contributing to the SVM models, the mSVM-RFE algorithm was used 47 . The iterative algorithm worked backwards from an initial set of features consisting of all the variables (i.e. metabolite features) in the dataset. At each iterative round, firstly a simple linear SVM was fitted, then features were ranked based on their weights in the SVM solution and lastly, the algorithm eliminated the feature with the lowest weight. In order to stabilize these feature ranking, at each iteration cross validation resampling was used. By using k-fold cross validation (k=10) multiple SVM-RFE iterations were carried out. From the resultant ranked feature list, the top 10% of the features were selected for further interpretation as they contributed the most towards SVM models.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; (iii) variable selection In order to investigate the effect of 84 clinical and technical variables, these were used along with the metabolomics abundance data to create PLS-DA models. The effect of additional metadata variables was marginal, indicating no significant contributions by these variables in classification or prediction of phenotypes (Table 4). Correct classification rate as an average of 250 bootstraps was 84% compared to 82% obtained using only metabolomics data.  9 , while adjusting for confounders. All RBDSQ and Progression categories were adjusted for age at sampling and time since motor onset. SCS-PD, SDQ, UPDRS-II, UPDRS-III and Hoehn & Yahr were also adjusted for LED. UPDRS-III was additionally adjusted for beta blockers. Wexner score and Rome III IBS status and Rome III constipation scores were also adjusted for anticholinergic medication, constipation medication, opioids, and tricyclic medications, as well as dietary fiber intake. SCS-PD was additionally adjusted for anticholinergic and tricyclics. GDS-15 was additionally adjusted for SSRI medications. MMSE was additionally adjusted for anticholinergic and tricyclic medications.
The features that had continuous scaled values were subjected to linear regression with elastic net penalty (50:50, L1:L2) and partial least squares regression with 20 PCs, for 1000 iterations. The features that had categorical values were subjected to support vector machines with RBF kernels (cost was set to 1.5 and regression loss epsilon was set at 0.10), random forest (10 trees), and logistic regression with ridge penalization with cost value of 1.
(v) pathway analysis In this data driven approach, we have interrogated data generated from an untargeted profiling study. It is often impractical to identify each peak in a metabolomics profiling study as it could contain upwards to 5000 features in a single sample. To identify them accurately, the only option is to purchase commercial standards and perform MS/MS analysis in both samples and standards and then match fragmentation spectra. This could be relevant when performing a targeted analysis with a defined set of metabolites. Computationally predicted m/z based identification alone is not adequate for pathway analysis due to multiple metabolite matches to single m/z. Thus, we have employed mummichog analysis that does not depend on identification of metabolites and then mapping on pathways. Instead mummichog leverages the collective power in the organisation of metabolic networks. If a list of m/z values truly reflect a biological activity, the true metabolites that are represented by these m/z values should show enrichment on a local structure of a metabolic network. If the measured m/z matches to a falsely represented metabolite, the distribution will be observed randomly. The overall significance of mapping and pathway enrichment is estimated by ranking the p-values from the real data among the p-values from permutation data to adjust for type I error, along with penalisation. Thus, a robust functional metabolic network gives us insight into our data more than identifying a handful of features.
Mummichog 56 (v.1.0.9) pathway analysis was used to predict network activity from preprocessed UHPLC-MS metabolomics data. The full metabolite data set consisting of 5897 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2021. ; and 1258 features from LC-MS positive and negative mode, respectively, was used as an input. Pathway enrichment analysis was performed on annotated 429 GC-MS features using MetaboAnalystR 57 .
Data pre-processing for 16S rRNA gene sequence data: The bioinformatics pipeline for the 16S rRNA gene amplicon sequence data used in the present study was the same as described in Aho et al. (2019) 9 given that the same data was used for the present study minus the above mentioned 4 samples that were excluded from the present analysis during quality control of the metabolomics data. Please refer to the above study for details.
Metabolome-microbiome correlation analysis: Statistical data analysis specific to the 16S rRNA gene sequence data was performed and reported in full earlier 9 . Putative confounders (technical or otherwise) identified in that analysis were taken into account in the correlation modelling performed in the present paper (see below for details).
For correlation analysis between metabolites and bacterial taxa at genus, family, and phylum levels, we used the fido 58 package (v.0.1.13; the package was formerly known as stray) for the R Statistical Programming Software 59 (v.3.6.0). fido provides a framework for inferring multinomial logistic-normal models which can account for zeros and compositional constraints, as well as sampling and technical variation present in sequence count data 58 . For the present study we used the function orthus from fido which enables joint modelling of multivariate count data (e.g. 16S rRNA gene amplicon sequence data) and multivariate Gaussian data (e.g. metabolomic data on the log-scale).
For samples jÎ{1, ...,N} we denote by Yj the observed D1-dimensional vector of sequence counts, Zj the standardized (i.e. Z-transformed) and log10-transformed P -dimensional vector of observed metabolite abundances, and Xj a Q -dimensional vector of covariates. Using this notation, the orthus likelihood model is given by with priors Λ ∼ (Θ, Σ, Γ) and Σ ∼ Inverse Wishart (Ξ, ) and with "# denoting the inverse additive log-ratio (ALR) transform with respect to the D -th taxa 60 . Of note, the ultimate inference is invariant to the chosen ALR transform. This represents a joint linear model over the latent relative abundances of microbial taxa and metabolite abundances. For computational scalability this model was inferred using the multinomial-Dirichlet Bootstrap approximation to the marginal posterior density (π | Y) that is available in fido. The multinomial-Dirichlet bootstrap approximates the true marginal posterior density using the posterior of a Bayesian multinomial-Dirichlet model centered at the maximum a posteriori (map) estimate of π. In brief, this is accomplished as follows: for each sample j, the marginal posterior distribution (πj | Y) is approximated as the posterior of a Bayesian multinomial Dirichlet model ( 9 | ; ) where ; = ! %&' ∑ ( )*# )! . Here, the Dirichlet parameters αj were . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 1, 2021. all taken to be 0.5; this can be thought of as a probabilistic equivalent of using a pseudo-count of 0.5 yet also produced quantified uncertainty due to multivariate counting. The prior parameters were chosen as Θ = 0 ([("#-.]×1) , Γ = 3 , and = D + P + 9. Finally, we set the prior Ξ = ( − − ) BlockDiagonal ( $ , . ) where G is the (D -1) × D ALR contrast matrix given by = [ ("# − 1]. This choice of Θ, Γ, Ξ, and represents the weak prior belief that the correlation between the absolute abundance of taxa is, on average, small. This prior is closely related to the sparse penalization used by SparCC 61 . Using this model, priors, and inference, we sampled 2000 independent samples from the posterior distribution (Λ, Σ |Y, Z, X).
Variable selection for the models was performed as described above in section (iii) and in Aho et al. (2019) 9 : for the three within-Parkinson's covariance models (i.e. using only PD subjects at three taxonomic levels), the models were adjusted for COMT inhibitor medication use. For the three within-Controls covariance models we used intercept-only models. The matrix Σ represents a ([D -1] + Q) × ([D -1] + Q) covariance matrix encoding all possible covariances between ALR coordinates and metabolites. For model interpretation and inference, each posterior sample of the upper (D -1) × Q submatrix was transformed to a D × Q matrix representing the covariance between microbial composition (now represented with respect to centered log-ratio coordinates) and metabolite abundances. Covariances were transformed to correlations using the function cov2cor in the R programming language. For the purposes of this study, we considered only those correlations that had a posterior mean equal to or larger than 0.3 and that had a 95% credible region not including zero. Conditioned on our chosen priors, this decision boundary can be thought of as limiting our results to correlations which we believe (with at least 95% certainty) are non-zero.
Correlation modelling was performed for bacterial genus, family, and phylum levels, using only metabolite abundance data at "Peak ID" level. This means that, although several of their corresponding MSI level 3 putative identifications were nominally the same, these were not merged before analysis, because they have different retention times and there is nonnegligible uncertainty in their identification. After the correlations were calculated, we broadly assigned metabolite class information to these metabolites for Table 5 and Table 2 to aid in interpretation. These class assignments were then used to produce the Cytoscape 62 network visualisations (v.3.8.0), both because classes simplify the networks and because they are more plausible than the putative MSI level 3 identifications. These classes were assigned by searching each putative identification of a metabolite feature against the Human Metabolome Database 63 (HMDB) and the Kyoto Encyclopedia of Genes and Genomes 64 (KEGG) entry.

DATA AVAILABILITY
The 16S rRNA gene sequence abundance raw data is available from Aho et al. (2019) 9 . The metabolomics data will be hosted on MetaboLights (https://www.ebi.ac.uk/metabolights/index) and also on The University of Manchester servers. It will also be made available upon reasonable request. The clinical data is also available upon request due to European subject confidentiality laws.

CODE AVAILABILITY
Code for the bacterial taxa-metabolite correlation analyses is available as R scripts' files (supplementary files "Metabolomics.SERVER.SCRIPT.PD_ONLY.Selected.CORR.final" and "Metabolomics.SERVER.SCRIPT.CONTROLS_ONLY.Selected.CORR.final"). For metabolomics-only data analysis, deconvolution R scripts using XCMS and eRah, Matlab code for Partial Least Squares Discriminant Analysis (PLS-DA), R code for metabolite correlations, and Python code for Support Vector Machine (SVM) are available in the GitHub repository at github.com/drupadt/