Systematic identification of the role of gut microbiota in mental disorders: a TwinsUK cohort study

Mental disorders are complex disorders influenced by multiple genetic, environmental, and biological factors. Specific microbiota imbalances seem to affect mental health status. However, the mechanisms by which microbiota disturbances impact the presence of depression, stress, anxiety, and eating disorders remain poorly understood. Currently, there are no robust biomarkers identified. We proposed a novel pyramid-layer design to accurately identify microbial/metabolomic signatures underlying mental disorders in the TwinsUK registry. Monozygotic and dizygotic twins discordant for mental disorders were screened, in a pairwise manner, for differentially abundant bacterial genera and circulating metabolites. In addition, multivariate analyses were performed, accounting for individual-level confounders. Our pyramid-layer study design allowed us to overcome the limitations of cross-sectional study designs with significant confounder effects and resulted in an association of the abundance of genus Parabacteroides with the diagnosis of mental disorders. Future research should explore the potential role of Parabacteroides as a mediator of mental health status. Our results indicate the potential role of the microbiome as a modifier in mental disorders that might contribute to the development of novel methodologies to assess personal risk and intervention strategies.

increased risk of developing MD.This layer specifically examines the co-twins of individuals who have been diagnosed with MD (discordant twin pairs) in relation to any twin who is diagnosed with MD (from both concordant and discordant pairs).Layer (e) involves a broader comparison between all healthy twins and all MD twins (both from discordant and concordant twin pairs).Thus, layer (e) allows for a more specific investigation of the unique characteristics associated with being a healthy co-twin and the potential factors that contribute to their increased risk of developing MD, whereas layer (e) encompasses the larger, broad population.This study is the first exploratory analysis to discover potential microbiome-host interactions in the role of MD from genetically identical or similar individuals hence aims to minimize the role of genetics in gut microbiome-neurological phenotype associations.We only analyzed individuals with matched and complete microbiome and metabolome datasets, comprising a total of 438 individuals (219 twin pairs).Due to the small sample size of our study, we report associations with a false discovery rate (FDR) cut-off below 0.2 and report our findings in the context of earlier studies.There are two main layers: (i) top layer: pairwise analyses in which we control for early life events, genetic background, and shared environment/household in which the twins grow up (ii) bottom layer: linear mixed regression analyses in which individual-level confounders were controlled for (antibiotics, diet differences, BMI, age).The top layer is divided into three parts (a) analysis of MZ twins, discordant in MD for a total of 28 twin pairs (b) analysis of DZ twins, discordant in MD for a total of 46 pairs, (c) analysis of twins, discordant for MD for a total of 74 pairs.The bottom layer studies (d) the healthy co-twins (who are assumed to be at increased risk of developing MD) with MD twins (a comparison of 74-144 twins) and (e) all healthy twins compared to all twins with MD diagnosis.The arrow indicates a higher interindividual heterogeneity and an increase in sample size the more you go toward the bottom of the pyramid.The study aims to find alterations in the microbiome and metabolome landscape in MD.

General description cohort
A total of 438 twins (219 twin pairs) were included in the study, amongst which 424 were female and 14 male twins.97 twin pairs were MZ, 122 twin pairs were DZ, and all twin pairs were same-sex pairs.Descriptive cohort characteristics are found in Table 1.
In this study, there is a prevalence of affected twins in the MZ population of 37.1%, while 29.5% in the DZ population.We found proband-wise concordance rates (probability that healthy co-twin will also have a history of MD) being 61% among MZ twins and 36% in DZ twins.No significant relationship was found between zygosity and the presence of MD (p = 0.114), nor between zygosity and phenotype concordance (p = 0.06562).However, when considering all MD-diagnosed twins, a significant relationship was found between concordance of the phenotype and zygosity (p = 0.004591).A significant relationship was also found between MD diagnosis and zygosity when considering all concordant twins (p = 0.00513).These results suggest that genetic factors may partially explain the MD phenotype that runs in twins.Given these results, we hypothesize that the healthy cotwin is at an increased risk of developing an MD which raises questions on how the microbiome might act as a potential modifier of this risk.Why is the healthy co-twin still healthy?Can the microbiome composition along with its functional effects contribute to his/her health status?

Microbiome and metabolome composition are significantly more similar in MZ twins compared to DZ twins
A comparison of Bray-Curtis dissimilarity metrics revealed a statistically significant greater dissimilarity in DZ twin pairs compared to MZ twin pairs (p = 0.028, Student's t-test).Additionally, twins in the same twin pair demonstrated a significantly lower dissimilarity in their microbiomes, compared to twins from other pairs (p = 3.4e−11, Student's t-test).Similar findings apply to the plasma metabolome composition, for which the Euclidean distance was used to compare metabolic profiles in MZ and DZ twin pairs.MZ twin pairs revealed a significantly lower Euclidean distance compared to DZ twin pairs (p = 0.036, Student's t-test), as well as individuals in the same twin pair showed a significantly lower Euclidean distance compared to twins from other pairs (p < 2.2 e−16, Student's t-test) (Fig. 2).
The same pairwise analysis was performed in the population of DZ twins discordant in MD (DZ population, Fig. 1b).Out of the 18 significant genera identified, two genera overlapped with the findings in the MZ twin pairs, i.e., Ruminococcaceae UCG 013 and Family XIII AD3011 group (Supplementary Fig. S2).Moreover,  60 metabolites were identified to be significantly different between MD and healthy twins (p < 0.05), but none of the ten metabolites identified in the MZ twin pair analysis were re-identified in the DZ twin pair analysis (Supplementary Fig. S3).The predicted metabolic pathway, folate biosynthesis, was identified as significantly differentially abundant between MD and healthy co-twins (p < 0.05).Other metabolic pathways involving vitamin metabolism were identified at a significance threshold of 0.1: ascorbate and aldarate metabolism (p = 0.09), and retinol metabolism (p = 0.09) (Supplementary Fig. S4).When analyzing all discordant twins (combined MZ-DZ population, Fig. 1c), four of the genera identified in the MZ twin pair analysis, and 17 of the genera identified in the DZ twin pair analysis remained significantly abundant (p < 0.05).Regarding the metabolomics analyses, ten metabolites that were identified amongst the MZ twins could not be re-identified, whereas nine metabolites of the DZ twin pair analysis were significantly re-identified.Two metabolic pathways were identified in common between the pairwise analyses in DZ and all discordant twins: folate biosynthesis and homologous recombination.A summary of the overlapping findings is shown in Fig. 4.

A distinct set of bacterial genera were found to be significantly associated with the diagnosis of mental disorder in linear mixed regression analyses
A distinct set of associated bacterial genera and metabolites, resulting from previous analyses (Fig. 4), were regressed against individual-level confounders by using linear mixed models (LMMs) in two setups: healthy cotwins versus all MD-twins (risk-bearing, healthy co-twins vs MD twins, Fig. 1d), and all healthy twins versus all MD-twins (all healthy vs all MD, Fig. 1e).Confounders such as age, gender, BMI, antibiotics usage of the prior month, and vegetable and fruit intake were included.The MD diagnosis was used as a fixed effect, and random slopes were adopted to capture the twin's zygosity.Out of the 41 bacterial genera, 14 genera were significantly associated with the MD diagnosis in a comparison of healthy co-twins (74) with MD-twins (144) (FDR < 0.2) (Fig. 5).Comparing their distributions, shifts can be seen in the more abundant taxa (left-skewed distributions with fewer negative CLR-transformed values), including genera Parabacteroides, Ruminococcaceae UCG 014, Ruminococcaceae UCG 013, Family XIII AD3011 group and unclassified Christensenellaceae.The existing imbalance in the microbial ecosystem coincides with shifts of more abundant taxa, along with multiple shifts in lower abundant taxa.This was not the case for the comparative setup of all healthy twins (294) with MD twins (144).Microbiome and metabolome (dis)similarities between twins.Top: Bray-Curtis dissimilarity in DZ and MZ twins.within-twin pair refers to twin-twin dissimilarity in microbiome composition within a twin; outside-twin pair refers to each of the twins' microbiome composition in comparison to each of the other twins (except his/her own co-twin).Bottom: Euclidean distances in DZ and MZ twins; and within-twin pair and between-twin analyses.Each dot represents a pair of twins (within-pair) or a pair of twins (independent of each other).Differences in dissimilarity were reported (Student's t-test, significance level 0.05).
Predicted metabolic pathways (KEGG Level 3) from gut microbiome resulting from previous pairwise analyses were regressed against the same set of confounders and the fixed effect used for MD-diagnosis in LMMs.Metabolic functions of the MD gut microbiome were predicted to have roles in ascorbate and aldarate metabolism,  S1).Regarding the metabolites significantly identified in previous analyses (Fig. 1a-c), none were significantly associated with MD diagnosis in either of the setups.
Metabolites XL HDL TG, total FA, and saturated FA were identified to be significantly associated with MD diagnosis (p-value < 0.05) but remained no longer significant after FDR correction (FDR < 0.20) (Supplementary Table S3).Regarding analyses from layer (e), genus Parabacteroides, and Lachnospiraceae UCG 008 popped up to be significantly associated with MD diagnosis (P-value < 0.05), after accounting for confounders, though no longer significant after FDR-correction (FDR < 0.2) (Supplementary Table S4).None of the previously identified metabolites were found to be significant (Supplementary Table S5).Results of the microbiome analyses across the pyramid (Fig. 1) indicated that genus Parabacteroides was strongly associated with MD diagnosis (lowest p-value, highest estimate in LMM), though a stronger effect was seen in the comparison of healthy twins, at risk, in comparison with MD-twins.All bacterial genera that were significantly identified across the multiple layers of the pyramid are visualized in Fig. 6.Of the previously identified metabolic pathways, ascorbate and aldarate metabolism (increased), carbapenem biosynthesis (decreased), retinol metabolism (increased), drug metabolism by cytochrome P450 (increased), metabolism of xenobiotics by cytochrome P450 (increased) and homologous recombination (decreased) were re-identified in the setup of Fig. 1d (Supplementary Table S6).None of them were re-identified in the setup of Fig. 1e (Supplementary Table S7).A correlation analysis using Spearman's rank correlation coefficients revealed various significant correlations between previously identified differential abundant microbial genera and differentially abundant predicted metabolic pathways (Tax4Fun analyses).The correlation analysis was performed in the population of healthy co-twins versus MD-twins (Fig. 1d).A total of 24 correlations were identified (coefficient >|0.2|,FDR < 0.05).

Discussion
Gut microbiome dysbiosis has been suggested to be a hallmark of mental disorders 45 .In our study, we profiled the gut microbiome and plasma metabolome in both MZ and DZ twins with MD-diagnosed and unaffected twins.Since MDs share genetic underpinnings, healthy co-twins were considered to be at an increased risk of developing an MD.We observed significantly higher similarity in gut microbiome and plasma metabolome composition in MZ twins, compared to DZ twins.This suggests that both microbiome and metabolome composition is shaped by experiencing similar environments, yet hinting at a host genetic effect.These findings support the co-twin control design, in which we control for both genetic diversity and early-life factors/environment that contribute to within-twin microbiome similarity.
Studying the gut microbiome in MD-discordant and concordant twin pairs aims to elucidate the contribution of the gut microbiota in the (risk of) MD development, because of the shared (childhood) environment, i.e., maternal sites and shared household as well as genetic factors between individuals from the same twin pair.We adopted a pyramid-layer design for our analyses.First, twins with the same zygosity were compared for MD in a pairwise manner.Second, healthy co-twins (healthy twins at risk) were compared with all MD twins, as well as all healthy twins against all MD twins.The latter, due to its higher sample size, increased statistical power and allowed us to validate the robustness of previous findings.These results allowed us to find candidate alterations in the gut microbiome or plasma metabolome landscape, assuming a shared genetic background and (childhood) environment.However, since these candidate alterations might as well be the result of individual-level confounders such as antibiotics usage, dietary influences, BMI, and age differences, they were validated in a multivariate design.There, we made a separation between two layers: (1) healthy cotwins in comparison to all MD twins and (2) all healthy versus all MD twins.This separation helps to identify specific alterations that may be associated with the risk of developing MD on the one hand (layer d) and on the other hand, layer (e) aims to provide a broader, comprehensive assessment of healthy vs MD twins across the larger population.The bottom Figure 7. Correlation networks showings the relationships between differential microbial genera and the differential microbial KEGG level 3 metabolic pathways.The node sizes and colors (purple: differential microbial genera; orange: differential microbial metabolic pathways) are proportional to the number of significant associations across the layers of the pyramid.The width and color of the lines (red: positive, blue: negative) are proportional to the correlation strength.Only significant correlations (FDR-adjusted p-values with Benjamini-Hochberg procedure, < 0.05) are displayed, with a correlation strength >|0.2|.Permission has been obtained from Kanehisa laboratories for using KEGG pathway database 80 .layer aids in confirming the significance and association of microbial genera/metabolites with the MD diagnosis.In this way, significant candidate alterations in both microbiome and plasma metabolome were considered potential biomarkers when their presence was consistently noticed in all layers of the pyramid.As a result, genus Parabacteroides was identified in all layers, as most strongly associated with the MD diagnosis rather than with other individual-level confounders.
The increased abundance of genus Parabacteroides (family Tenerellaceae) is in concordance with recently published reports in which the genus was found to be elevated in depressed subjects compared to healthy controls [46][47][48] .Since Parabacteroides has been considered to have health-promoting effects, i.e., the production of SCFAs, the elevation of the genus in depressed subjects has been suggested to be a compensatory mechanism 48 .Parabacteroides species have also been linked to the production of metabolites γ-aminobutyric acid (GABA) 49 and indole 50 of which dysfunctions in their signaling pathways are related to depression and anxiety 51,52 .Parabacteroides enrichment was shown to alter gene expression in pathways associated with metabolic function, neurodegenerative diseases, and dopaminergic signaling 53 .Hence, the Parabacteroides genus has been associated in previous reports with both protective and pathogenic effects on human health 54,55 .Nevertheless, the exact role played by this bacteria in depressive, stress, and anxiety-like behaviors remains to be investigated.
We found a positive correlation between the abundance of Parabacteroides and the glucagon signaling pathway (Fig. 7).The role of Parabacteroides species in the context of metabolic dysfunctions has been discussed in several earlier reports, though with inconsistent findings.
As previously suggested, we would like to argue that functionally related microbial communities rather than single microbes have an overall impact.First, genus Parabacteroides has been confirmed in a recent meta-analysis to be enriched in major depressive disorder in more than 20% of the included studies (24 in total).The authors highlighted Parabacteroides' involvement in greater utilization of glutamate and increased synthesis of GABA 56 .In our research, we noticed overlapping distributions of this genus' (and other genera) abundance in healthy co-twins compared with MD-twins, though with a slightly larger abundance in MD-twins (Fig. 5).However, due to the resolution of the TwinsUK dataset, the higher abundance of this specific genus cannot be brought back to specific species.Second, diagnoses of mental disorders in our research were not limited to depressive disorders, but also include anxiety, stress, and eating disorders that might explain the less strong association compared to earlier studies.For instance, a high abundance of Parabacteroides could be an indication of MD symptoms.Other genera that were identified through multiple layers from our pyramid design include decreased abundances of genera Ruminococcaceae UCG 013, Lachnospiraceae UCG 008, Lachnospiraceae UCG 006, and increased abundances of Ruminococcaceae UCG 014, Family XIII AD3011 group, and unclassified Christensenellaceae. Genera of families Lachnospiraceae and Ruminococcaceae were previously associated with the maintenance of gut health 57 .Members of Lachnospiraceae are among the main producers of SCFA, though different taxa of this family have been associated with intra-and extraintestinal diseases 58 .Since Lachnospiraceae are strongly influenced by diet specifics, such as non-starch polysaccharides and high-fat feeding 58 , the associations in both directions were not unexpected.In contrast to other reports, only a few genera were identified in significant association with MD diagnosis.Gut microbiome composition may be different in patients with an MD in partial or full remission compared to patients with a current illness phase.Since we could not account for illness states, state-related changes in the gut microbiome composition could not be established in our study design 59 .We suggest combining multiple omics approaches (meta-transcriptomics, proteomics, metabolomics) to reveal a systems-level insight into functional differences rather than to retain a compositional focus.
In this study, we observed that metabolic functions involving vitamin metabolism, including retinol and ascorbate and aldarate metabolism were more abundant in the gut microbiome of patients with MDs compared to non-MD controls.It has been hypothesized that vitamin A levels are inversely associated with depression 60,61 .In addition, previous work has established the role of gut microbiota in vitamin A (retinol) metabolism 62,63 .By regulating the expression of gene Rdh7, gut bacteria can curb the production of vitamin A metabolite retinoic acid (RA).The increase in ascorbate metabolism, associated with MD is, in contrast, to what has been found earlier 63 .However, the pathophysiological association of ascorbate with depression is currently not clear 64 .
In the setup of healthy co-twins, assumed to be at risk of developing MD, in comparison with MD-twins (Fig. 1d), more genera were found to be significantly associated with MD than when considering the setup of all healthy twins compared with the MD-twins (discordant and concordant twins included).In addition, the size of the estimate of Parabacteroides in association with MD diagnosis was stronger in the former setup.Thus, the inclusion of healthy concordant twins (layer e) (no identifiable risk of MD) weakened the association of Parabacteroides with MD diagnosis.The number of genera identified and consequently the metabolic pathways differentially identified were different between the two setups (Supplementary Tables S2, S4, S6-7).Further studies with increased study sample sizes are expected to unravel the contributions of microbiota in individuals at increased genetic risk.Unlike other omics data, the microbiome's dynamic nature does not only suggest a role as a modifier to disease risk, but, not unimportantly, appears very suitable for real-life applications in terms of privacy and security reasons 65 .
Compared to healthy controls, none of the metabolomic alterations identified in the pairwise analyses (both MZ and DZ) remained significant after correcting for individual-level confounders.This stands in contrast to several reports that showed plasma metabolomic markers to discriminate healthy controls from individuals with depression and anxiety [66][67][68][69] .Plasma metabolites identified in pairwise analyses in the MZ group (Fig. 2a) varied greatly from those identified in the DZ group (Supplementary Fig. S3).First, the NMR platform mainly captures lipid metabolites rather than the full metabolome profile.Second, the variation might be explained by the BMI differences in the MZ and DZ population, between the MD-diagnosed and control group (Supplementary Fig. S5).The control group in the DZ population demonstrates a higher BMI, compared with the MZ population, along with higher levels of VLDL metabolites in the DZ control group and higher levels of HDL in the MZ control group.This is in line with previous reports on dyslipidemia and obesity 70,71 72 .
Our study has several limitations.First, while highly relevant covariates were included (age, antibiotics, gender, vegetable, and fruit intake), other environmental triggers might not have been comprehensibly tracked.Seen the major contribution of dietary influences on the host's microbiome, more detailed dietary information on micro-and macronutrients and different food groups must be considered to yield better insights into the microbiome-metabolome-brain crosstalk.This will allow us to gain insight into what dietary patterns are protective and contribute to better mental health.Second, all individuals with diagnoses of MD were included, i.e., diagnoses that were ongoing at the moment of sample collection, as well as individuals previously diagnosed with an MD.When the patient is under treatment or has overcome his/her mental illness, the microbiome composition is expected to change again to restore gut dysbiosis 73 .Third, a more balanced population sampling exploiting additional cohorts is necessary to obtain more reliable results.The TwinsUK cohort consists of mainly female individuals (96%).Therefore, the research results are mainly representative of the female population.A genderspecific version of performed analyses might be considered to see the existing sex differences in behavior and risk for neuropsychiatric disorders 74 .Fourth, several diagnoses of MDs were categorized under the same heading to identify common pathways emerging in individuals with MDs, including depression, anxiety and stress, and eating disorders.Although the included diagnoses of disorders differed in their pathophysiology, common pathways are emerging in psychiatric diseases with immune dysregulation being present in a broad spectrum of neuropsychiatric disorders, increasing mental vulnerability and risk of psychiatric symptoms 75 .Furthermore, these diagnoses have shown associations with increased gut permeability, impaired gut-brain communication, and disrupted neurotransmitter production [20][21][22][23][76][77][78] . The cobination of these factors highlights the intricate interplay between the gut microbiota, the immune system, and neurological processes in the context of MDs.Fifth, although the identified changes in the intestinal microbiota have been previously identified along with its mechanisms of altered glutamate and GABA metabolism and butyrate production 56 , we would like to emphasize the need for more established protocols in microbiome studies to reveal the relationships between mental disorders and the gut microbiome (Supplementary Information, Appendix 1).Importantly, the utilization of the 16S rRNA gene sequencing method, which is inherent to the TwinsUK study design, imposes certain limitations on the extent of the microbiome analysis.The method is unable to provide species-level resolution or detailed information on the functional capacity of the microbiota.However, we applied the method with the overall idea to identify differences in the structure and composition of microbial communities.Moreover, we agree that longitudinal cohort studies, the inclusion of confounding covariates to disentangle cause/consequence as well as the exploitation of multi-omics approaches will aid to uncover the activity of our microbiomes beyond their composition.

Conclusions
In this study, we robustly identified that individuals diagnosed with MD showed a greater abundance of the genus Parabacteroides.This genus might offer a novel opportunity for primary prevention.However, the exact role and effect of this genus in MD initiation and progression remain to be investigated in further studies as the effects are likely species-specific.In general, we can conclude that our presented pyramid-layering design in a population of twins offers a robust manner to identify differentially abundant features in diseases with both genetic and environmental components underlying.We propose microbiome-mediated mechanisms as likely gut biomarkers and/or mediators in the context of MD as a rationale for future research.

Study population
The TwinsUK cohort is a longitudinal dataset and the largest research cohort of adult twins from the United Kingdom (http:// www.twins uk.ac.uk/).The cohort comprises over 14,000 volunteers followed over more than two decades.Participants were predominantly female (> 80%) and aged 18-103.This cohort has been extensively studied for a wide range of clinical and behavioral outcomes.All work involving human subjects was approved by the Cornell University IRB (Protocol ID 1108002388) and all methods were performed following relevant guidelines and regulations.Informed consent was obtained from all participants.
In our study, 438 twins (219 pairs) were included, who have both 16S rRNA gut microbiome data and concurrent plasma metabolomics data at the same time point.Amongst the 438 twins (219 twin pairs), 122 were DZ, and 97 were MZ twin pairs (Fig. 8).Regarding the phenotypic variable on MD, different groups were specified according to the Hospital Anxiety and Depression (HADS) questionnaire as follows.For participants that have one of the questions related to Depression (see Appendix 1) specified on "1" (yes), "Anxiety/Stress disorder" and "eating disorder", the variable "MD" was specified on "1" (yes).Individuals who had none of the MD-related variables specified on "1", were defined as "healthy control", in the assumption that "healthy" means no diagnosed MDs.The final dataset consisted of 294 controls and 144 cases with one or more MDs diagnosed.Detailed description of the implementation of the Hospital Anxiety and Depression (HADS) questionnaire in TwinsUK cohort is available 79 .
Microbiome sequences from 3335 samples, including metabolic abundances from 4830 samples, were collected from the TwinsUK cohort.First, for all samples in the microbiome dataset, twin pairs were screened based on zygosity, age, year of birth, the time point at which the microbiome sample was taken, and type of descendant.Only pairs with microbiome samples matched with their corresponding time point were considered, resulting in 1254 pairs with 2508 samples.Next, from these microbiome samples, only 524 individuals were matched with their complementary metabolome samples at the same time point.

Data preprocessing
The microbiome data from the TwinsUK cohort was delivered in a preprocessed format, i.e. amplicon sequence variants (ASVs) were generated using the DADA2 pipeline.Samples achieving a sequence depth of less than 10,000 were excluded, and sequences that were unassigned at Kingdom/Phylum level were removed.Taxonomy was assigned to the ASV sequences with the SILVA reference database 80 .In the next step, the ASVs were imported, together with their associated metadata (see Metadata processing) using the phyloseq package (v1.40.0) 81 .The dataset was filtered using the following criteria: (1) taxa observed only once in the whole dataset were removed, (2) taxa must appear in more than one sample, (3) ASVs with no counts were removed (4) empty samples were removed (none in the dataset).Finally, data were agglomerated to genus level and transformed by using the centered log-ratio transformation (CLR), i.e. each of the taxon counts is divided by the geometric mean of all taxa in a sample, and subsequently logarithmically transformed.The CLR transformation was implemented in the microbiome R package (v1.18.0).Metabolic measurements were available for 228 metabolic traits: 147 metabolite concentrations (comprising groups: Cholesterol, Lipoprotein subclasses, Glycerides and Phospholipids, Apolipoproteins, Total Fatty acids, Amino Acids, Ketone bodies, Fluid balance, and Inflammation), 77 lipid ratios, three lipoprotein particle sizes and a semi-quantitative measure of albumin.Metabolic profiling was conducted by Nightingale Health Ltd. (Helsinki, Finland) using a targeted NMR spectroscopy platform.These metabolic measurements were logtransformed; for all measurements before transformation, a pseudo-count of 1 was added.Next, measurements were shifted to zero mean and scaled to a standard deviation (SD) of 1 82 .Metabolites that were assigned missing values in more than 25% of the individuals were removed.Metabolites that had missing values, in less than 25% of the individuals, were replaced by their median value.For the confounder-related questionnaire data, selected features were included that are known to confound gut microbiome composition: age, gender, BMI, antibiotics, vegetable and fruit intake 83 .Data was taken at the time point at which the microbiome/metabolome sample was collected; if not available, data were collected from the closest available time point.Missing values were replaced by the median value of the respective variable.

Statistical analyses
The analyses were performed using the R statistical software version 4.1.2(2021-11-01, https:// www.rstud io.com/ produ cts/ rstud io/ downl oad/), and different libraries included in Bioconductor (v3.13) were used, including microbiome, vegan (v2.6.2), and phyloseq.Microbiota beta-diversity was calculated by the Bray-Curtis dissimilarity metric, which quantifies the compositional dissimilarity between two samples or groups, considering both overall abundance per sample as well as the abundance of each taxon 84 .The Euclidean distance was used for comparisons of metabolic profiles.Tests for paired data were used to address the between-group comparisons (MD vs healthy) of continuous variables.The Shapiro-Wilk's method was used to test for normality of the distribution and accordingly, the student's t-test (p-value > 0.05) or Wilcoxon's test was used.The level of significance was set at the two-tailed p-value of 0.05.For the multivariate analyses, association analyses were performed for (1) healthy co-twins vs. MD-twins and (2) healthy twins vs. twins with MD.For this, a modified version of the model proposed by New et al. 85 is proposed as follows: where Y i is the microbe/metabolite abundance; α Z j[i] is the random effect where accounted for either MZ or DZ twins, i.e. α MZ,j and α DZ,j respectively; X are the predictor variable(s), β are the slope estimates explaining the contribution of the predictor variable to the dependent variable, and σ 2 is the variance.In the model, each twin was assumed to have a random slope, specifically random slopes for MZ twins and for DZ twins coming from two different distributions.The model estimates variance (relatedness) between MZ twins separately from the variance between DZ twins (relatedness) between DZ twins.CLR-transformed bacterial genera abundances and scaled, log-transformed metabolite abundances were regressed against main confounders: age, gender, BMI, antibiotics usage prior month, vegetable and fruit intake.The regression was done by using a LMM, using the nlme R package (v3.1.157).The MD categorical variable was included as fixed effect.We corrected for multiple testing by using the Benjamini-Hochberg approach.For multivariate analyses, FDR-thresholds were relaxed to 0.25 since we are interested in an exploration and screening of potential features (bacteria/metabolites) of interest and thus less strict on the inflated type-I error.
Differentially abundant bacterial genera along with differentially abundant KEGG pathways, that were significantly associated with MD-diagnosis in the multivariate analyses, were used in the correlation analysis based on Spearman's rank correlation coefficients.The cutoffs for the correlation coefficients and adjusted p-values (FDR correction with Benjamini-Hochberg procedure) were determined to equal 0.2 and 0.05, respectively.Networks were built using Cytoscape (https:// cytos cape.org/; v3.9.1) to represent the correlation network of bacterial genera and metabolic pathways.

Predicted functional potential
The Tax4Fun2 open-source R package (v1.1.5)was used to predict the functional capabilities of the microbial communities based on 16S rRNA datasets, applicable to SILVA-labeled microbial abundances 86 .To predict the community function, the table with sequences of ASVs and the table with sequence abundances were required.Both tables were read by Tax4Fun2 through an external file; sequences were saved in FASTA format through the write.fasta()function from the seqinr package.In the first step, the sequences of the samples were aligned with the package reference database via BLAST, using the runRefBlast() function with database_mode option "Ref100NR" for greater precision in alignment.Next, makeFunctionalPrediction() function was used to obtain the prediction of the abundance of functions in the community, with a similarity cut-off set to 97%.As a result, two files were generated.The first one contained KEGG ortholog profiles, i.e., predictions of the relative abundances of each enzyme indicated for each sample, according to their composition (functional predictions).The second one contained predictions on the abundances of the metabolic pathways of each sample according to its composition.The latter was organized into three hierarchical levels, with different categories as corresponding to the metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) 87 .Tax4Fun2 output for the level 3 KEGG entries were analyzed.

Figure 1 .
Figure 1.Visualization of the study design.The pyramid layers represent the different setups of this study.There are two main layers: (i) top layer: pairwise analyses in which we control for early life events, genetic background, and shared environment/household in which the twins grow up (ii) bottom layer: linear mixed regression analyses in which individual-level confounders were controlled for (antibiotics, diet differences, BMI, age).The top layer is divided into three parts (a) analysis of MZ twins, discordant in MD for a total of 28 twin pairs (b) analysis of DZ twins, discordant in MD for a total of 46 pairs, (c) analysis of twins, discordant for MD for a total of 74 pairs.The bottom layer studies (d) the healthy co-twins (who are assumed to be at increased risk of developing MD) with MD twins (a comparison of 74-144 twins) and (e) all healthy twins compared to all twins with MD diagnosis.The arrow indicates a higher interindividual heterogeneity and an increase in sample size the more you go toward the bottom of the pyramid.The study aims to find alterations in the microbiome and metabolome landscape in MD.

Figure 2 .
Figure 2. Microbiome and metabolome (dis)similarities between twins.Top: Bray-Curtis dissimilarity in DZ and MZ twins.within-twin pair refers to twin-twin dissimilarity in microbiome composition within a twin; outside-twin pair refers to each of the twins' microbiome composition in comparison to each of the other twins (except his/her own co-twin).Bottom: Euclidean distances in DZ and MZ twins; and within-twin pair and between-twin analyses.Each dot represents a pair of twins (within-pair) or a pair of twins (independent of each other).Differences in dissimilarity were reported (Student's t-test, significance level 0.05).

Figure 6 .
Figure 6.Pyramid layers complemented with significantly associated bacterial genera.Ticks (✔) indicate whether the bacterial genus was significantly identified in a particular layer of the pyramid.Genera are categorized per family represented by distinct colors (except for unclassified subgroup 5, which is identified up to the class level).MZ, monozygotic; DZ, dizygotic; MD, mental disorder; *: genera in these layers were significantly identified at a p-value threshold of 0.05, though no longer significantly identified after FDR correction.