Short chain fatty acids-producing and mucin-degrading intestinal bacteria predict the progression of early Parkinson’s disease

To elucidate the relevance of gut dysbiosis in Parkinson’s disease (PD) in disease progression, we made random forest models to predict the progression of PD in two years by gut microbiota in 165 PD patients. The area under the receiver operating characteristic curves (AUROCs) of gut microbiota-based models for Hoehn & Yahr (HY) stages 1 and 2 were 0.799 and 0.705, respectively. Similarly, gut microbiota predicted the progression of Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) III scores in an early stage of PD with AUROC = 0.728. Decreases of short-chain fatty acid-producing genera, Fusicatenibacter, Faecalibacterium, and Blautia, as well as an increase of mucin-degrading genus Akkermansia, predicted accelerated disease progression. The four genera remained unchanged in two years in PD, indicating that the taxonomic changes were not the consequences of disease progression. PD patients with marked gut dysbiosis may thus be destined to progress faster than those without gut dysbiosis.


INTRODUCTION
Parkinson's disease (PD) is a long-term neurodegenerative disease that exhibits not only motor symptoms but also non-motor symptoms 1 . PD is attributed to the loss of dopaminergic neurons in the substantia nigra. The loss is caused by abnormal aggregation of α-synuclein fibrils (Lewy bodies) in the neuronal cells. Lewy bodies also exist in the lower brainstem and the cerebral cortex 2 , the olfactory bulb 3 , the salivary glands 4 , the skin 5 , the autonomic nervous system 6 , and the intestine 4,7,8 . The Braak's breakthrough paper and the following studies indicated that abnormal aggregation of α-synuclein fibrils starts in the intestinal nerve plexus and gradually moves up to the substantia nigra 2,3,9,10 . Constipation, idiopathic rapid eye movement sleep behavior disorder (iRBD), and depression can be frequently observed about 20, 10, and 5 years before the development of motor symptoms in PD 1 , which is consistent with the Braak's hypothesis. In rodent models, gastrointestinal injection of pathogenic α-synuclein causes propagation of α-synuclein aggregates to brain via the vagus nerve [11][12][13][14] and neurodegeneration of the substantia nigra 14 . In common marmoset, pathologic αsynuclein transmits within the brain and can be neurotoxic 15 . Similarly, in baboon monkey, pathogenic α-synuclein transmits bidirectionally between the enteric and the central nervous systems even in the absence of α-synuclein pathology in the vagus nerve 16 .
Epidemiological studies indicate that older age, male, cognitive impairment, and postural instability/gait-dominant type of PD are predictive of rapid progression of PD [17][18][19][20][21] . A single machinelearning model to predict the progression of PD has been reported, and will be addressed in detail in the discussion 22 .
As far as we know, 19 studies in PD [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41] and one study in iRBD 42 have been reported on gut microbiota. Another study analyzed gut microbiota in both PD and iRBD 43 . We showed by metaanalysis of gut microbiota in five countries that genus Akkermansia was increased and genera Roseburia and Faecalibacterium were decreased in PD 38 . In contrast, in a meta-analysis of iRBD in Germany and Japan, genus Akkermansia was increased, whereas genera Roseburia and Faecalibacterium were not decreased 42 . Akkermansia degrades the intestinal mucin layer, 44,45 and is predicted to increase the intestinal permeability, which has been reported in PD by us 23 and others 46 . Genera Roseburia and Faecalibacterium produce short chain fatty acids (SCFAs). Decreased SCFAs are potentially linked to activated neuroinflammations in PD 47,48 . In Finland, genus Prevotella was decreased in PD, and PD patients with lower relative abundance of genus Prevotella tended to progress faster in two years 24,41 . In our metaanalysis of PD in five countries, however, significant decrease of genus Prevotella was observed in Finland, but not in United States, Russia, Germany, or Japan 38 . Similarly, genus Prevotella was not decreased in iRBD in Germany or Japan 42,43 . We here examined whether gut microbiota could predict the progression of PD in two years.
We also collated clinical features (Table 1), as well as clinical features in the stable and deteriorated groups (Supplementary  Table S2), for each HY stage at year 0. Fifteen out of 35 features were significantly different between HY stages 1, 2, and 3. The significant difference was observed only in reasonable features like age, disease duration, and MDS-UPDRS scores.
Construction of random forest models to predict whether HY stages are advanced in two years or not We divided PD patients at HY stages 1-3 (n = 165), 1 (n = 24), 2 (n = 85), and 3 (n = 56) at year 0 into the deteriorated and stable groups. The deteriorated group had an increased HY stage, whereas the stable group had an unchanged or decreased HY stage. We made random forest models to differentiate the deteriorated and stable groups for HY stages 1-3, 1, 2, and 3 at year 0 using bacterial features, and compared the models with those using clinical features. Generations of random forest models by nested cross-validation and crossvalidation are indicated in detail in Methods, and are illustrated in Fig. 1. We first examined the validity of our modeling strategy by nested cross-validation with recursive feature elimination (RFE), which should have no leakage (type 1 circularity 49 ) between the training and test datasets.
The nested cross-validation for HY stages 1-3 at year 0 yielded AUROCs of 0.548 (95% confidence interval: 0.456-0.641) in microbiota-based models (red line in Fig. 2a) and 0.559 (0.464-0.654) in clinical feature-based models (blue line in Fig.  2a). Thus, both bacterial and clinical features failed to make decent models. We thus constructed models for each HY stage.
For HY stage 1 at year 0, the nested cross-validation yielded AUROCs of 0.799 (95% confidence interval: 0.615-0.982) in microbiota-based models (red line in Fig. 2d) and 0.549 (0.307-0.791) in clinical feature-based models (blue line in Fig. 2d). In the nested cross-validation, the microbiota-based model yielded 75% sensitivity and 83% specificity, whereas the clinical featurebased model failed to make a decent model (Fig. 2d). Seven statistical measures including sensitivities and specificities to indicate the performance of each model were all better with the microbiotabased models than with the clinical feature-based models ( Table 2). As the nested cross-validation did not provide us with essential features that determined the progression of PD, we made models by recursively eliminating features and evaluated them by crossvalidation ( Fig. 1). Recursive feature elimination with crossvalidation showed that maximum AUROCs were 0.868 (0.719-1.000) at two genera (Fusicatenibacter and Faecalibacterium) in microbiota-based models (red line in Fig. 2e, f) and 0.639 (0.402-0.876) at two features (BMI and age) in clinical feature-based models (blue line in Fig. 2e, f). AUROC was as high as 0.861 (0.707-1.000) even when a model was generated using genus Fusicatenibacter alone (green dot in Fig. 2e and green line in Fig. 2f). The microbiota-based model with Fusicatenibacter and Faecalibacterium yielded 83% sensitivity and 83% specificity, and Fusicatenibacterbased model yielded 92% sensitivity and 75% specificity (Fig. 2f). In contrast, the clinical feature-based model with BMI and age yielded 75% sensitivity and 67% specificity (Fig. 2f).
For HY stage 2 at year 0, the nested cross-validation yielded AUROCs of 0.705 (0.592-0.818) in microbiota-based models (red line in Fig. 2g) and 0.719 (0.602-0.835) in clinical feature-based models (blue line in Fig. 2g). Seven statistical measures to indicate the performance of each model were slightly better with the clinical feature-based models than with the microbiota-based models except for specificity (Table 2). Recursive feature elimination with cross-validation showed that maximum AUROCs were 0.793 (0.692-0.895) at seven genera (Lactobacillus, Blautia, Fusicatenibacter, Anaerostipes, Ruminococcus gnavus group, Akkermansia, Bifidobacterium) in microbiota-based models (red line in Fig. 2h, i) and 0.783 (0.682-0.883) at ten features in clinical feature-based models (blue line in Fig. 2h, i).
For HY stage 3 at year 0, the nested cross-validation yielded AUROCs of as low as 0.509 (0.301-0.719) in microbiota-based models (red line in Fig. 2j) and 0.772 (0.619-0.925) in clinical feature-based models (blue line in Fig. 2j). Thus, microbiota-based models were dependable for the early stage of PD, but clinical feature-based models became reliable with the advancement of PD.
We also made random forest models using both bacterial and clinical features to examine whether some clinical features were as essential as bacterial features for HY stage 1 ( Supplementary Fig.  S1, Supplementary Table S3).
Step-wise feature elimination for HY stage 1 revealed that the combined feature-based models and the gut microbiota-based models (black solid line and red dotted line in Supplementary Fig. S1b, respectively) became identical, when the number of features became six or less. Thus, all clinical features were eliminated in the combined feature-based models, when the number of features became six or less. This indicates that none of the 31 clinical features were as predictive as the remaining six bacterial features. Two bacterial features made the maximum AUROC (an arrow in Supplementary Fig. S1b), and are indicated in Supplementary Table S3.
In contrast, clinical features constituted two out of nine essential features for HY stage 2, and three out of six essential features for HY stage 3 (Supplementary Table S3). This indicates that some clinical and some bacterial features were similarly essential to predict the progression of PD for HY stages 2 and 3. However, nested cross-validation showed that the combined models were not as good as either microbiota-based models or clinical feature-based models for each HY stage, which was likely due to the inclusion of a large number of non-informative features. In contrast to nested cross-validation ( Supplementary Fig.  S1d, g), cross-validation showed that the combined models outperformed both microbiota-based models and clinical feature-based models ( Supplementary Fig. S1e, f, h, i) for HY stages 2 and 3, which indicates the requirement of nested crossvalidation to examine the feasibility of modeling strategies.
We next asked whether gut microbiota is able to predict changes of MDS-UPDRS III, representing objective motor symptoms, in two years. As gut microbiota was able to predict the progression of PD at the early stage of PD, we divided PD patients in half using MDS-UPDRS III to make cohorts of the early and advanced stages of PD patients. We then sorted the rates of change of MDS-UPDRS III in two years in ascending order. The top and bottom halves of patients constituted the stable and deteriorated groups, respectively. Nested cross-validation of microbiota-based models for the early and advanced PD patients yielded AUROCs of 0.728 (95% confidence interval: 0.601-0.854) and 0.586 (95% confidence interval: 0.449-0.723), respectively. Cross-validation of microbiota-based models identified four essential genera. Genera Faecalibacterium, Dorea, Ruminococcus gnavus group were decreased, while genus Bacteroides was increased, in the deteriorated group. In HY-based models, genera Faecalibacteirum and Ruminococcus gnavus group constituted essential features in HY stage 1 and 2, respectively. Genera Faecalibacterium and Dorea are SCFA-producing bacteria. Identification of the early stage of PD and evaluation of the disease progression both by MDS-UPDRS III similarly showed that gut microbiota predicted the progression of PD in two years for the early stage of PD.
As we evaluated all PD patients in outpatient clinics, MDS-UPDRS III scores were obtained in the "on" state. We estimated MDS-UPDRS III scores in the "off" state by adding 7.3, 8.5 and 6.1 for patients taking levodopa only, levodopa and any other medications, and dopamine agonists without levodopa according to a report by Bordelon and colleagues 50 . Nested cross validation with the adjusted MDS-UPDRS III scores in the "off" state yielded AUROC of 0.717 (0.550-0.883) and 0.571 (0.444-0.697) for the early stage and the advanced stages of PD, respectively, which were similar to those obtained with the "on" state. We examined whether combinations of medications, as well as levodopaequivalent daily doses (LEDDs), were different between the stable and deteriorated groups at years 0 and 2. We found that combinations of medications were not statistically different between the stable and deteriorated groups at years 0 and 2 (Supplementary Table S4). Although the rate of change of LEDDs in two years was marginally higher in the stable group than the deteriorated group, no statistical significance was observed ( Supplementary Fig. S2).

Differences of taxonomic relative abundances between the deteriorated and stable groups
We collated the feature importance and p-value of bacterial and clinical features that attained a maximum AUROC for HY stages 1, 2, and 3 in Supplementary Tables S5 and S6, respectively. Again, microbiota-based models for HY stages 1 and 2, but not HY stage 3, were dependable. Two and seven essential genera made dependable models for HY stages 1 and 2, respectively, while genus Fusicatenibacter was shared between HY stages 1 and 2. To ask why essential genera were different between HY stages 1 and 2, we plotted relative abundances of the eight genera in the deteriorated and stable groups for each of HY stages 1, 2, and 3 ( Fig. 3). We found that these genera were changed in the same direction in the stable and deteriorated groups for HY stages 1, 2, and 3, except for genus Bifidobacterium in HY stage 1. The same genera were likely to have the same effects on the progression of PD independent of HY stages, but the effect size of each genus was different from HY stage to HY stage. It was also interesting to note that Fusicatenibacter, Faecalibacterium, Blautia, and Akkermansia (green letters in Fig. 3) were significantly different between controls and PD in our data in our previous analysis 38 .

Change of relative abundances of four genera in two years
Plots of the four genera, Fusicatenibacter, Faecalibacterium, Blautia, and Akkermansia, at years 0 and 2 in the deteriorated group showed that all the four genera remained unchanged in two years (Fig. 4a, c, e, g). Additionally, plots of these four genera in the course of progression of α-synucleinopathy showed that Akkermansia becomes higher with disease progression, whereas Fusicatenibacter, Faecalibacterium and Blautia become lower with disease progression, with statistically significant monotonous trends for all the four genera (Fig. 4b, d, f, h). No change of the four genera even in the deteriorated group and the change of the four genera with disease progression suggest that patients harboring dysbiosis of the four genera progress more rapidly than those without dysbiosis. Year 0 Year 2 *Clinical features were obtained at year 2 from 17 additional PD patients at HY stages 4 and 5, but were excluded from modeling. † Fecal samples were obtained at year 2 from 9 additional PD patients at HY stages 4 and 5, but were excluded from analysis of temporal profiles.  Fig. 4. Construction of prediction models was constituted of two steps: (1) nested cross-validation has no leakage between the training and test datasets, and is for evaluation of the modeling strategy, and (2) cross-validation has marginal leakage between the training and test datasets, and is for determination of essential features to predict the progression of PD. Recursive feature elimination (FRE) was employed in both steps. AUROCs were calculated in both steps, but AUROC of the nested cross-validation should be dependable because of lack of potential leakage.

DISCUSSION
We examined whether gut microbiota was able to predict the progression of PD in two years, and asked whether gut microbiota predicted the progression of PD more precisely than clinical features. We first scrutinized whether our modeling approach was appropriate or not by nested cross-validation, in which there should be no leakage between the training and test datasets (Fig.  1). Nested cross-validation showed that bacterial features predicted the progression of PD with AUROC = 0.799 for HY stage 1, and the efficiency decreased for HY stages 2 and 3 ( Fig. 2 Table 2). Bacterial features similarly predicted the progression of PD evaluated by MDS-UPDRS III with AUROC = 0.728 for the early stage of PD. Similarly, bacterial features predicted the progression of PD evaluated by MDS-UPDRS III in the estimated "off" state with AUROC = 0.717 for the early stage of PD. However, lack of the MDS-UPDRS III in the "off" state in our patients was a limitation of our analysis. In contrast to gut microbiota-based models, nested cross-validation showed that clinical features predicted the progression of PD with AUROC = 0.772 for HY stage 3, and the efficiency decreased for HY stages 2 and 1 ( Fig. 2 and Table 2). Thus, bacterial and clinical features predicted the progression of PD at the early and medium stages of PD, respectively.
Two and seven genera were essential to predict the disease progression for HY stages 1 and 2, respectively, and only genus Fusicatenibacter was shared between HY stages 1 and 2 ( Fig. 3 and Supplementary Table S5). Plots of the relative abundances of the eight genera showed that seven out of the eight genera had the same effects on the progression of PD for HY stages 1, 2, and 3 (Fig. 3). The difference in the effect sizes of each genus for different HY stages was likely to have given rise to different essential genera for each HY stage.
Clinical features were better than bacterial features in predicting the progression of PD for HY stage 3, which was likely because at HY stage 1 all PD patients exhibited minimal and similar clinical phenotypes, and clinical features were not informative enough to predict the prognosis. In contrast to HY stage 1, PD patients at HY stage 3 exhibited widely variable phenotypes from infrequent episodes of toppling down to marked difficulty in walking. Among a total of 12 essential clinical features to predict the prognosis of PD for HY stages 1, 2, and 3 (Supplementary Table S6), six features (age, MMSE, tremor, postural instability, walking and balance, and gait) were previously reported to be associated with the progression and the mortality rate of PD in two original articles 18,21 and a review article summarizing 27 original articles 17 . Two features (disease duration and levodopa dosage) were also predictive of the progression and the mortality rate of PD in some but not in all reports 17,18,21 . The remaining four features (BMI, COMT inhibitor, kinetic tremor of hands, and stool frequency) have not been analyzed in association with the progression and the mortality rate of PD to the best of our knowledge. Both previous studies 17-21 and our current study point to the notion that tremordominant type of PD progresses more slowly than postural instability/gait-dominant type of PD.
A machine-learning model was recently reported to predict the prognosis of PD using clinical features, which was aimed at predicting the progression of total MDS-UPDRS in two or four years 22 . AUROC of their model was 0.70 22 . The important features were autonomic dysfunction, mood impairment, anxiety, iRBD, cognitive decline, and memory impairment 22 . MMSE in our model for HY stage 2, and cognitive decline and memory impairment in a model for total MDS-UPDRS 22 , were the only important features shared between the two models. Lack of shared features may be accounted for by the differences in the number of features: 601 features in their model vs 31 features in our model.
After assuring the validity of our modeling strategy by nested cross-validation, we determined essential bacterial features by crossvalidation with RFE (Fig. 1). For HY stage 1, the best model to predict the progression of PD was constructed by two genera (Fusicatenibacter and Faecalibacterium) (Supplementary Table S5). Similarly, for HY stage 2, the best model was constructed by seven genera including Fusicatenibacter and Blautia (Supplementary Table S5). These three genera (Fusicatenibacter, Faecalibacterium, and Blautia) are SCFA-producing bacteria. Faecalibacterium was decreased in PD across countries in meta-analyses by us 38 and others 51 . Similarly, Fusicatenibacter and Blautia were decreased in PD except for Germany in meta-analyzes by us 38 and others 51 . Reduced abundances of these genera were thus hallmarks of rapid progression in the early stage of PD, as well as hallmarks of PD. As stated in the introduction, decreased SCFAs are potentially associated with abnormal activation of neuroinflammations in the brain 47,48 . In addition, increased genus Akkermansia also predicted the progression of PD for HY stage 2 in our model (Supplementary Table S5). In contrast, Akkermansia is protective against ALS 52 and epilepsy 53 in mouse models, as well as diabetes mellitus [54][55][56] and obesity 57-60 in humans. In a mouse model of ALS, nicotinamide produced by Akkermansia improves motor symptoms 52 . In a mouse model of epilepsy, ketogenic-diet increases Akkermansia, which inhibits seizure by decreasing gamma-glutamylated amino acids in the colon lumen 53 . As epidemiological studies indicate that diabetes mellitus increases a risk of PD 1.85-folds 61 , Akkermansia should decrease a risk of PD by normalizing glucose metabolisms [54][55][56] . We, however, showed that Akkermansia was rather associated with the development and progression of PD. Akkermansia-mediated improvement in glucose metabolisms may have no effect on the prevention of the development of PD by unknown mechanisms. Akkermansia thickens the mucin layer of mice when fed with a high fat diet 45 . On the other hand, Akkermansia degrades the mucin layer in mice when fed with fiber-free diet 44 . Similarly, Akkermansia induces intestinal inflammation and increases intestinal permeability by possibly generating hydrogen sulfide 62 . The intestinal environment of PD patients is likely to be similar to the latter situation since expression of the tight junction protein, occludin, is decreased in PD 63 , and intestinal permeability is increased in PD 23,46 . The association of pesticides and herbicides with PD has been repeatedly reported in 440 original epidemiological studies and 69 review articles 64 . The increased intestinal permeability might have led to exposure of the intestinal nerve plexus to pesticides/herbicides and other toxins. Alternatively, increased intestinal permeability may enhance the intestinal oxidative stress, as observed in increased intestinal staining for nitrotyrosine in PD patients 46 , which may potentiate the formation of α-synuclein fibrils. Increased Akkermansia is thus likely to have substantial effects on the development and progression of PD, but about 42% of PD patients had no intestinal Akkermansia in our cohort 38 . PD in patients without Akkermansia is likely to be mediated by SCFA-producing or other unrecognized bacteria, or not by dysbiosis of gut microbiota.
Although we obtained stool samples in 50.4% of PD patients at year 2, we unexpectedly observed that relative abundances of four genera (Fusicatenibacter, Faecalibacterium, Blautia, and Akkermansia) remained unchanged in two years even in the deteriorated group (Fig. 4a, c, e, g). Nevertheless, we observed that genus Akkermansia was increased with the progression of α-synucleinopathy, whereas genera Fusicatenibacter, Faecalibacterium, and Blautia were decreased with the progression (Fig. 4b, d, f, h). Thus, increased Akkermansia, Fig. 2 Validation of modeling strategy by nested cross-validation and determination of essential features by cross-validation. a, d, g, and j ROC curves of nested cross-validation of random forest models for HY stages 1-3 (a), 1 (d), 2 (g), and 3 (j) at year 0. Red and blue solid lines represent models constructed by bacterial and clinical features, respectively. The optimal point by Youden index is indicated by a dot with the specificity and sensitivity in parentheses. b, e, h, and k AUROCs by leave-one-out cross-validation of random forest models for HY stages 1-3 (b), 1 (e), 2 (h), and 3 (k) at year 0, while features were recursively eliminated. An arrow points to the maximum AUROC with the number of features. c, f, i, and l ROC curves of leave-one-out cross-validation of random forest models at the maximum AUROC for HY stages 1-3 (c), 1 (f), 2 (i), and 3 (l) at year 0. The optimal point by Youden index is indicated by a dot with the specificity and sensitivity in parentheses. Green ROC curve in f represents a model predicted by Fusicatenibacter alone, and its AUROC is plotted in green in e.   The number of PD patients at HY stages 1, 2, and 3 at year 0 were 24, 85, and 56, respectively. The seven following statistical measures were calculated when AUROC was >0.6. and decreased Fusicatenibacter, Faecalibacterium, and Blautia were not due to the progression of PD, but were likely to have driven the progression of PD. In other words, PD patients with these taxonomic changes were likely to be destined to progress rapidly. These observations are also in accordance with the assumption that intestinal dysbiosis of these genera determines the progression of αsynucleinopathy. We thus may be able to retard the progression of PD in the early stages by therapeutic intervention with pre-, pro-, and post-biotics to normalize gut dysbiosis or to compensate for defective gut metabolisms. Our cohort did not include PD patients with other chronic diseases including diabetes mellitus, heart failure, liver cirrhosis, malignancy, hematological diseases, and autoimmune diseases. Similarly, our cohort did not include PD patients who claimed to have taken antibiotics in the past one month. At year 2 (November 2018-May 2021), fecal samples were obtained from 113 PD patients, and clinical features were obtained from 182 PD patients. We obtained clinical features at year 2 in 182 out of 224 PD patients (81.3%). We similarly obtained fecal samples at year 2 in 113 PD patients (50.4%). Loss of a substantial number of patients in 2 years was partly because participants complained that it was too much burden to take and send fecal samples to us. Additionally, some participants moved out to other hospitals. We also included 137 healthy controls whose fecal

DNA isolation and 16S rRNA V3-V4 gene amplicon sequencing
The samples were transported from the participant's home to Nagoya University below 4˚C, freeze-dried 66 , and subjected to DNA isolation and sequencing of the 16S rRNA V3-V4 region using a pair of primers (341F, 5'-CCTACGGGNGGCWGCAG-3' and 805R, 5'-GACTACHVGGGTATCTAATCC-3') 38,42 . Paired-end sequencing of 300-nucleotide fragments was performed using the MiSeq reagent kit V3 on a MiSeq System (Illumina). The 16S rRNA gene amplicon sequencing data were analyzed by QIIME2 67 with DADA2 using the SILVA taxonomy database release 138 68,69 . Deteriorated and stable groups for combined HY stages 1-3 and each of HY stages 1, 2, and 3 For HY stages 1-3 (165 patients), 1 (24 patients), 2 (85 patients), and 3 (56 patients), we divided PD patients into the deteriorated and stable groups. The deteriorated group was comprised of PD patients with an advanced HY stage at year 2 compared to year 0. In contrast, in the stable group, the HY stage remained unchanged or was decreased at year 2. We excluded 14 patients at HY stage 4 at year 0, because only two of them were advanced to HY stage 5 in two years. The unbalanced dataset should give rise to a biased model that would be in favor of predicting no progression. We also excluded three patients with HY stage 5 at year 0, because this was the final stage of PD. We obtained 113 pairs of stool samples at years 0 and 2. To analyze taxonomic changes in two years, we excluded nine pairs of stool samples with HY stages 4 and 5 at year 0. A total of 104 pairs of stool samples were thus used to analyze taxonomic changes in two years (Fig. 1).

Bacterial and clinical features for random forest modeling
We filtered intestinal genera under the following conditions. For each dataset, we selected genera with >0.5% relative abundance on average. The numbers of genera that satisfied this criterion were 44, 41, 43, and 42 for HY stages 1-3, 1, 2, and 3, respectively. These genera were used as features to predict whether HY stages were advanced or not in two years for each dataset. We similarly limited the number of clinical and demographic features to 31 to prevent overfitting of our models and also to match the number of bacterial features. The clinical and demographic features included age, sex, body mass index (BMI), disease duration, stool frequency per week, and HY stage at year 0. The clinical features also included the use of proton pump inhibitor, H 2 blocker, antihyperlipidemic drug, angiotensin II receptor blocker, calcium channel blocker, COMT inhibitor, anticholinergic agent, dopamine agonist, MAO-B inhibitor, and amantadine, as well as levodopa/carbidopa dosage. We also used MDS-UPDRS to differentiate dominance in tremor and postural instability with gait difficulty 70  Construction of random forest models to predict whether HY stages are advanced in two years or not We constructed random forest models with sklearn.ensemble.RandomFor-estRegressor function on Python 3.8.2 to differentiate the deteriorated and stable groups for HY stages 1-3, 1, 2, and 3 at year 0 using bacterial and clinical features. We followed the AUC-RF method to determine the bacterial and clinical features 71 , which was previously adopted to make random forest models using bacterial features to differentiate adenoma and colon cancer 72 . The outline of this analysis was illustrated in Fig. 1. We first examined the performance of our modeling strategy by nested crossvalidation with recursive feature elimination using sklearn.feature_selection.RFECV function on Python 3.8.2. In the nested cross-validation, the outer loop was comprised of leave-one-out cross validation (LOOCV), whereas the inner loop was comprised of 10-to-20-fold cross validation depending on the number of samples. In the inner loop, features were recursively eliminated one by one to obtain the best combination of features that gave rise to the highest AUROC. The best model in the inner loop was generated, and was applied to predict the prognosis of a patient that was left out by LOOCV. The nested cross-validation should have no leakage between the training and test datasets (type 1 circularity 49 ), but could not provide us with bacterial and clinical features to make clinically applicable models. We thus determined essential bacterial and clinical features by recursive feature elimination using the sklearn.feature_selection.RFE function on Python 3.8.2. For the determined essential genera, we compared the relative abundances in the stable and deteriorated groups at year 0 by the Wilcoxon rank sum test. Python code to perform nested cross-validation and cross-validation are available upon request.

Seven statistical measures to represent the model performance
We evaluated the performance of random forest models by the following seven statistical measures. TP, FP, FN, and TN indicate true positive, false positive, false negative, and true negative, respectively.
Rate to predict true positives and true negatives in the whole dataset Rate of true positives in predicted positives Rate of true positives in actual positives Rate of true negatives in actual negatives Harmonic mean of precision and recall. Higher precision and higher recall increase F1 score, but discrepancy between precision and recall lowers F1 score.
A correlation coefficient between the actual and predicted binary conditions while the numbers of each condition are balanced. Unlike the other parameters, MCC balances the ratio between actual positives and actual negatives.

Statistical analysis
Relative abundances of intestinal bacteria were analyzed by the Wilcoxon signed-rank test for matched pairs with the wilcoxon functionality of scipy. stat, and by the Wilcoxon rank sum test for unmatched pairs with the mannwhitneyu functionality of scipy.stat, both on Python 3.8.2. Jonckheere-Terpstra trend test to examine whether intestinal bacteria increased or decreased monotonically was performed with jonckheere.test of library PMCMR on R version 4.1.0. The area under the receiver operating characteristic curve (AUROC) was calculated with the roc_curve functionality of sklearn.metrics on Python 3.8.2. P-values < 0.05 were considered to be significantly different.

DATA AVAILABILITY
FASTQ files of our dataset are available at the DNA Data Bank of Japan (DDBJ) under the accession numbers of "DRA009229" and "DRA012438" for year 0, and "DRA012445" for year 2.