Spontaneous preterm birth (sPTB) is a leading cause of maternal and neonatal morbidity and mortality, yet options for its prevention and early risk stratification remain limited. Previous investigations have suggested that vaginal microbes and metabolites may be implicated in sPTB. Here we performed untargeted metabolomics on 232 second-trimester vaginal samples, 80 from pregnancies ending preterm. We find multiple associations between vaginal metabolites and subsequent preterm birth, and propose that several of these metabolites, including diethanolamine and ethyl glucoside, are exogenous. We observe associations between the metabolome and microbiome profiles previously obtained using 16S ribosomal RNA amplicon sequencing, including correlations between bacteria considered suboptimal, such as Gardnerella vaginalis, and metabolites enriched in term pregnancies, such as tyramine. We investigate these associations using metabolic models. We use machine learning models to predict sPTB risk from metabolite levels, weeks to months before birth, with good accuracy (area under receiver operating characteristic curve of 0.78). These models, which we validate using two external cohorts, are more accurate than models based on the microbiome or on maternal covariates (area under receiver operating characteristic curve of 0.55–0.59). Our results demonstrate the potential of vaginal metabolites as early biomarkers of sPTB and highlight exogenous exposures as potential risk factors for prematurity.
Preterm birth (PTB), childbirth before 37 weeks of gestation, is the leading cause of neonatal death, and may lead to a variety of lifelong morbidities1,2. PTB also shows a marked racial disparity, with a substantially higher PTB rate among Black women3. This disparity is driven by various factors, such as the persistent stress of systemic and environmental racism and a lack of access to maternal care4. Spontaneous preterm birth (sPTB), that is, PTB that is not medically induced, accounts for two-thirds of all PTBs1. Despite extensive efforts, methods for early prediction, prevention or treatment of PTB are lacking1,5,6, and its prevalence remains high1.
The human microbiome is a strong biomarker of many complex diseases7,8,9,10,11. The vaginal microbiome, specifically, has been repeatedly associated with sPTB and other adverse pregnancy outcomes12,13,14,15,16,17. However, a clear consensus on the relationship between the vaginal microbiome and sPTB has yet to emerge18, and our knowledge of specific mechanisms underlying potential host–microbiome interactions in sPTB is lacking.
Metabolites produced or modified by the microbiome have emerged as a prominent factor with potential local and systemic effects on the host19,20,21,22. Their study has been facilitated by metabolomics, which enables the measurement of thousands of small molecules present in an ecosystem, and paired microbiome–metabolome studies have yielded potential mechanistic insights into host–microbiome interactions in various pathologies23,24. A few studies of the vaginal metabolome have described associations with the microbiome, inflammation and PTB25,26. However, studies that focus on demographic groups at high risk for sPTB, measure a broad set of metabolites and generate robust prediction models for sPTB are still needed to advance our understanding of the role of the vaginal ecosystem in prematurity and other pregnancy outcomes.
Here, we measured the second-trimester vaginal metabolome of 232 pregnant women, whose vaginal microbiota was previously characterized using 16S ribosomal RNA gene amplicon sequencing14. We show that the vaginal metabolome partially corresponds to community state types (CSTs), reveal associations between metabolites measured in the middle of pregnancy and subsequent sPTB, and propose that some of these metabolites have an exogenous source. Finally, we develop machine learning models that use the vaginal metabolome to predict subsequent sPTB an average of 3 months before delivery, and validate them on two external cohorts. Our results demonstrate a promising approach for studying potential causes of prematurity as well as for early risk stratification, and highlight the need to study environmental exposures as a risk factor for sPTB.
Vaginal microbiota and metabolome from a pregnancy cohort
We used mass spectrometry to profile 232 vaginal samples collected between 20 and 24 weeks of gestation from women with singleton pregnancies, for which the microbiota was previously characterized from the same timepoint14 (Supplementary Table 1 and Methods). All women with subsequent sPTB and available samples (N = 80), as well as similar term birth controls (TB; N = 152) were included (Table 1). As expected, PTB history was associated with sPTB (Fisher’s exact P = 3 × 10−4).
We quantified 635 identified metabolites, as well as 110 unnamed spectral features (Methods). Metabolites belonged to diverse biochemical classes, including amino acids, lipids, nucleotides, carbohydrates and xenobiotics. Most metabolites (549) were measured in over 50% of the cohort, and 108 metabolites were present in all samples (Extended Data Fig. 1; for discussion of batch processing of the samples, see Supplementary Note 1 and Extended Data Fig. 2). We have previously shown that similar measurements are in excellent agreement with measurements by an independent certified medical laboratory27.
The vaginal metabolome partially preserves CST structure
The vaginal microbiome clusters into well-defined CSTs28. We previously demonstrated the same for this cohort14 (permutational multivariate analysis of variance (PERMANOVA) P < 0.001; Fig. 1a), and here investigated whether the vaginal metabolome recapitulates this structure. The metabolome was separated by CSTs (P < 0.001; Fig. 1b), and was generally associated with the microbiome (Mantel P < 0.001), as previously described29. However, not all CSTs were well separated. While the metabolomes of women with CST-I (dominated by Lactobacillus crispatus) and CST-IV (enriched with diverse anaerobes) microbiomes were well separated from the rest of the cohort (PERMANOVA P < 0.001 for both), the metabolomes of women with CSTs IV-A and IV-B were not well separated from one another (P = 0.158), nor were those of women with CST-II (dominated by Lactobacillus gasseri) and CST-III (dominated by Lactobacillus iners) (P = 0.155). Overall, these results demonstrate a strong but imperfect correspondence between the vaginal microbiome and metabolome.
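PERMANOVA asks whether samples sharing a label (here, a CST) are closer to one another than to samples with other labels in a given distance space, and assesses significance by shuffling the labels. The authors used scikit-bio 0.5.6 (Statistics and reproducibility); the pure-Python sketch below only illustrates the pseudo-F statistic and its permutation P value, assuming a precomputed symmetric distance matrix. The distance metric used for the metabolome is not specified in this section, and the toy data are illustrative.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F statistic from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared pairwise distances scaled by 1/N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for label in labels:
        idx = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation P value) obtained by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return f_obs, (hits + 1) / (permutations + 1)


# Toy example: two well-separated groups on a line (illustrative data only).
points = list(range(10)) + list(range(20, 30))
dist = [[abs(a - b) for b in points] for a in points]
print(permanova(dist, ["A"] * 10 + ["B"] * 10, permutations=199))
```

The smallest attainable P value is 1/(permutations + 1), which is why permutation-based results such as those above are reported as P < 0.001 rather than as exact values.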
Metabolite clusters associate with sPTB
Next, we performed de novo k-medoids clustering of the metabolome, revealing six ‘metabolite clusters’ (MCs A–F; Methods, Fig. 1c, Extended Data Fig. 3 and Supplementary Table 2), which are less well separated than the CSTs of the vaginal microbiome. The metabolite subpathways most enriched within MCs A–F were, respectively, polyamine metabolism, dipeptides, dicarboxylated fatty acids, glutamate metabolism, the tricarboxylic acid cycle and dipeptides (Fisher’s exact P < 0.05 for all). Amino-acid-related metabolites were similarly enriched in MCs A, B and D (P < 0.01, q < 0.1 for all), and xenobiotics in MC-C (Fisher’s exact P = 0.005, q < 0.1). While MCs A–D consist mostly of samples with Lactobacillus-dominated CSTs (54–93%), MC-F is composed entirely of CST-IV samples, and MC-E is evenly split (50% CST-IV; Fig. 1d and Extended Data Fig. 4a). Conversely, several CSTs were enriched in specific MCs (Extended Data Fig. 4b).
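k-medoids partitions samples around representative samples (medoids) using only pairwise distances, which makes it convenient for non-Euclidean dissimilarities between metabolomes. The sketch below uses the simple alternating (Voronoi-iteration) variant rather than full PAM; the distance metric, the choice of six clusters and the exact algorithm used in the study are described in its Methods and are not reproduced here. The one-dimensional data are illustrative.

```python
import random


def k_medoids(dist, k, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix.

    Returns (medoid indices, cluster label per sample).
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)

    def assign(meds):
        # Each sample joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Illustrative data: three loose groups on a line.
points = [0.1, 0.3, 0.2, 5.0, 5.4, 4.8, 9.9, 10.2]
dist = [[abs(a - b) for b in points] for a in points]
print(k_medoids(dist, k=3))
```

Because every medoid is an actual sample, each cluster can be summarized by a real metabolome profile, and the result depends on initialization, so in practice several seeds are typically compared.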
Similar to the strong association between the global microbiome signature and self-identified race in this cohort (PERMANOVA P < 0.001; Extended Data Fig. 4c), we saw a significant difference between the metabolome of Black and White women (P < 0.001; Extended Data Fig. 4d). However, we found only mild differences between these subgroups in their assignments to MCs (Fig. 1e). Interestingly, while CSTs are only weakly associated with sPTB and only in White women (Fisher’s exact P = 0.047, q = 0.21; Fig. 1f and Extended Data Fig. 4e; similar to a previous analysis14), we found that several MCs are significantly associated with sPTB in Black women (P = 0.047, P = 0.025 and P = 0.006, respectively, for MCs A, B and D; q < 0.1 for all; Fig. 1g and Extended Data Fig. 4f). However, we observed no significant associations with early PTB (<32 weeks; q > 0.1 for all, Extended Data Fig. 4g). Taken together, our results demonstrate that the metabolome structure in this cohort better captures associations with prematurity in Black women than the microbiome structure.
Multiple metabolites associate with sPTB
We next investigated associations between sPTB and specific metabolites. We found four metabolites that are significantly associated with sPTB (Mann–Whitney P < 0.05, q < 0.1; Fig. 2a and Extended Data Fig. 5a). Three of these, all higher in sPTB, appear to be of exogenous origin30,31,32,33,34,35,36: ethyl β-glucopyranoside (ethyl glucoside; P = 1.9 × 10−4, q = 0.065), tartrate (P = 4.8 × 10−4, q = 0.078) and diethanolamine (DEA; P < 10−10, q = 5 × 10−8). We corroborated this using AMON37 (Methods), a tool that predicts metabolite origins, which predicted that DEA and tartrate are of xenobiotic origin (no prediction could be made for ethyl glucoside; Supplementary Table 3). Of note, DEA is also associated with MC-A (P = 0.006, q = 0.014) and MC-D (P = 0.04, q = 0.07), two of the MCs we found to be enriched with sPTB (Fig. 1g). Despite their likely exogenous origin, these metabolites were detected in >95% of this cohort (Extended Data Fig. 5b).
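The per-metabolite comparisons use two-sided Mann–Whitney U tests (SciPy 1.5.2, per Statistics and reproducibility). As a dependency-free illustration, the sketch below computes U with average ranks for ties and a normal-approximation P value with tie and continuity corrections. SciPy may use an exact distribution for small, tie-free samples, so its P values can differ slightly; the metabolite levels in the example are made up.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie and continuity corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all values identical: no evidence of a shift
        return u1, 1.0
    z = (abs(u1 - mu) - 0.5) / sigma
    return u1, min(1.0, erfc(z / sqrt(2)))


# Made-up relative levels of one metabolite in sPTB versus term samples.
print(mann_whitney_u([2.1, 3.4, 2.8, 3.9, 3.0], [1.2, 1.9, 2.2, 1.4, 1.7, 2.0]))
```

Being rank-based, the test is insensitive to the scale of the relative metabolomic measurements, which suits an assay that does not report absolute concentrations.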
We further found lower levels of choline in women with subsequent sPTB (P = 5.5 × 10−4, q = 0.078; Fig. 2a,b). Choline is an essential nutrient38, and lower choline levels were previously found in cord blood from premature infants39. Choline is also a precursor of betaine40, an osmoregulator that was also negatively associated with sPTB (P = 0.007, q = 0.29; Fig. 2b). DEA is known to disrupt choline metabolism41, and its dermal administration in mice depleted hepatic choline42,43. We therefore propose that the higher levels of DEA in sPTB may also be linked to lower choline and betaine levels (Fig. 2b,c). DEA was further shown to be carcinogenic44 and teratogenic42 in mice. However, the relative nature of our metabolomic assay precludes quantitative comparison with levels measured in previous studies. Taken together, these results highlight a potential role of several metabolites in prematurity, some of which may arise exogenously from environmental exposures.
Metabolite associations interact with race and sPTB timing
As the metabolome differed between Black and White women, we performed the same association analysis while stratifying by race. Interestingly, we detected five additional metabolites negatively associated with sPTB (Mann–Whitney P < 0.05, q < 0.1; Fig. 2a and Extended Data Fig. 5a). In Black women, these included glycerophosphoserine (P = 3 × 10−5, q = 0.014), previously reported to be altered in pre-eclampsia45; spermine (P = 3.5 × 10−4, q = 0.07), previously shown to be increased in the blood of preterm infants46; hydroxybutyl carnitine (P = 2.6 × 10−4, q = 0.065), a ketocarnitine shown to be depleted in the blood of low-birth-weight full-term neonates47; and glutamate γ-methyl ester (P = 4.9 × 10−4, q = 0.078). Tyramine, a biogenic amine, was significantly lower in samples from White women who delivered preterm (P = 2.8 × 10−4, q = 0.065; Fig. 2a). Tyramine was shown to co-localize with synaptic vesicles in the mouse uterine plexus, highlighting a possible role in uterine contractions48. Altogether, these results highlight the potential connection among vaginal metabolites, metabolite levels in other organs and sPTB.
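The q values reported here control the false discovery rate across the metabolites tested. This section does not spell out the procedure, so the sketch below assumes the widely used Benjamini–Hochberg method and shows how it could be applied separately within each race stratum; the P values are illustrative.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping q values monotone in P.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical per-metabolite P values, corrected separately within each stratum.
p_by_stratum = {
    "Black": [0.01, 0.04, 0.03, 0.005],
    "White": [0.2, 0.0003, 0.5],
}
q_by_stratum = {s: bh_qvalues(p) for s, p in p_by_stratum.items()}
print(q_by_stratum)
```

Correcting within each stratum means each stratum has its own number of tests m, so a given P value can yield a different q value in the stratified analysis than in the pooled one.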
As several participants in this cohort (N = 13, of whom 11 were Black) were treated with intravaginal progesterone before or close to sample collection (at weeks 18–23 of gestation), we repeated the analysis in women not treated with vaginal progesterone only. One association, between glutamate γ-methyl ester and TB in Black women (Fig. 2a), no longer passed correction for multiple hypothesis testing (P = 0.002, q = 0.12; Extended Data Fig. 5c). However, we found seven additional metabolites to be associated with TB in Black women (all P < 0.05, q < 0.1; Extended Data Fig. 5c). These included proline (P = 6 × 10−4, q = 0.082), which accounts for about a quarter of the amino acid residues of collagen49 and is integral to the extracellular matrix; spermine, a polyamine important for placental angiogenesis50, which was lower in Black women with subsequent sPTB (P = 4 × 10−4, q = 0.08); and betaine (P = 9 × 10−4, q = 0.091). N-acetylarginine (P = 0.0015, q = 0.102), which is produced from proline and is necessary for the synthesis of polyamines such as spermine, was also lower in Black women with subsequent sPTB. Both disordered placental angiogenesis and extracellular matrix remodelling have been associated with sPTB51.
Earlier preterm deliveries are associated with worse outcomes1. Therefore, we next investigated associations between vaginal metabolites and subsequent very and extremely preterm deliveries (gestational age at birth <32 and <28 weeks, respectively). We limited this analysis to Black women, due to their high proportion among such deliveries (21 of 26 and 14 of 15, respectively). We identified 13 metabolites that were associated only with these earlier sPTBs (P < 0.05, q < 0.1; Fig. 2d). The phospholipids palmitoyl sphingomyelin and palmitoyl dihydro sphingomyelin were both negatively associated with extremely PTB (P = 8.7 × 10−4, q = 0.061 and P = 0.0011, q = 0.069, respectively). Citraconate was likewise negatively associated with extremely PTB (P = 0.0014, q = 0.075), and was previously found to have lower concentrations in placental mitochondria of women with severe pre-eclampsia52. We also found several sugar or sugar alcohol metabolites to be higher in early PTB, including mannose (P = 4 × 10−4, q = 0.052), previously associated with uropathogens such as Escherichia coli53; arabinose (P = 9 × 10−4, q = 0.061), previously associated with bacterial vaginosis (BV)54; and mannitol/sorbitol (P = 1.7 × 10−4, q = 0.022), previously associated with PTB55. Ethylenediaminetetraacetic acid (EDTA), an additional xenobiotic whose likely exogenous source56,57,58 was also supported by AMON (Methods and Supplementary Table 3), was increased in extremely and very PTB (P = 8 × 10−4, q = 0.061 and P = 1.6 × 10−4, q = 0.044, respectively). EDTA was shown to be cytotoxic in vaginal epithelial cells59, and is teratogenic in rats at non-maternotoxic doses57,60. EDTA was detected in 100% of women in this cohort (Extended Data Fig. 5b), as expected given its presence in the sample collection buffer; however, this is unlikely to explain these associations.
Overall, we found that metabolite associations with sPTB interact with both race and sPTB timing, and detected an additional sPTB-associated xenobiotic.
Functional metabolite sets enriched for sPTB associations
We next checked whether functional groups of metabolites (for example, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways61; Supplementary Table 4) are enriched for associations with sPTB, even if changes to any specific metabolite are small (Methods). We found significant enrichment in proline and arginine metabolism (P = 0.0018, q = 0.058; Extended Data Fig. 5d), consistent with our findings regarding proline and N-acetylarginine (Extended Data Fig. 5c). Additionally, and again consistent with the association between tyramine and TB among White women (Fig. 2a), we found an enrichment in metabolites related to the endocrine system among White women (P = 0.0045, q = 0.077; Extended Data Fig. 5d). We further identified lipid-metabolism-related metabolites to be enriched for associations with early sPTB among Black women (P = 0.0019, q = 0.032 and P = 0.0047, q = 0.038 for very and extremely PTB, respectively; Extended Data Fig. 5d), potentially related to other lipid metabolism alterations reported in PTB62. Notably, we identified a global enrichment of xenobiotics associated with sPTB among Black women (P = 0.006, q = 0.054; Extended Data Fig. 5d), consistent with our finding regarding specific metabolites (Fig. 2).
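Set enrichment can detect a coordinated shift across a pathway even when no single metabolite passes correction. The study’s own metabolite set enrichment analysis is defined in its Methods and may differ (for example, in how direction of effect is handled); the sketch below is a generic permutation test that compares a set’s mean per-metabolite score (for example, −log10 P) with random sets of the same size. Metabolite names and scores are hypothetical.

```python
import random


def set_enrichment_pvalue(scores, metabolite_set, permutations=999, seed=0):
    """Permutation test: is a set's mean score higher than that of random sets?

    scores maps metabolite name -> association score (e.g. -log10 P).
    """
    members = [m for m in metabolite_set if m in scores]
    if not members:
        raise ValueError("no set members have scores")
    k = len(members)
    observed = sum(scores[m] for m in members) / k
    names = list(scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        # Null: a random set of the same size drawn from all measured metabolites.
        sample = rng.sample(names, k)
        if sum(scores[m] for m in sample) / k >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical scores for seven metabolites and a three-member set.
scores = {"m1": 3.2, "m2": 2.8, "m3": 0.4, "m4": 0.1, "m5": 0.3, "m6": 0.2, "m7": 0.5}
print(set_enrichment_pvalue(scores, {"m1", "m2", "m3"}))
```

Drawing random sets from all measured metabolites keeps the null tied to the assay’s coverage, which matters because pathway annotations cover measured compounds unevenly.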
A network of microbe–metabolite associations in sPTB
We next investigated the correlations between the estimated absolute abundances of microbial species and sPTB-associated metabolites (Methods). In contrast to metabolite associations with sPTB, microbe–metabolite associations interacted only weakly with race and sPTB timing (Supplementary Note 2). Our results replicate multiple known associations, such as between Dialister species or Enterococcus faecalis and tyramine63,64 (Spearman ρ > 0.54, P < 10−10, q < 0.1 for all; Fig. 3a and Extended Data Fig. 6a), as well as associations consistent with choline metabolism by G. vaginalis65 and Corynebacterium aurimucosum66 (ρ = 0.34, P < 10−6, q = 1.7 × 10−5 and ρ = 0.40, P = 4 × 10−4, q = 0.006, respectively). Additionally, higher tyramine concentrations were previously found in BV67, supporting the associations we found with BV-associated microbes (Fig. 3a).
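Spearman’s ρ is the Pearson correlation of ranks, so it captures monotone but non-linear relationships and is robust to the heavy-tailed abundance distributions typical of microbiome data. The sketch below computes ρ with average ranks for ties, taking abundance estimates as given; P values would need an additional step, such as `scipy.stats.spearmanr`. The data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:  # a constant variable has no defined correlation
        return float("nan")
    return cov / (sx * sy)


# Hypothetical abundances of one taxon and levels of one metabolite across samples.
abundance = [0.0, 1.5e3, 2.0e4, 8.0e4, 3.1e5, 9.0e5]
metabolite = [0.4, 0.9, 0.8, 1.6, 2.2, 3.0]
print(spearman_rho(abundance, metabolite))
```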
We note that xenobiotics positively associated with sPTB have significantly weaker correlations with vaginal microbes than those observed for the rest of the metabolites (Mann–Whitney P = 0.024). DEA, for example, shows only weak correlations with all vaginal microbes (ρ < 0.23, q > 0.1 for all microbes). This observation provides further support for an exogenous source for these metabolites.
We found the strongest and most numerous correlations for tyramine (35 associations, Spearman 0.27 < ρ < 0.73; Fig. 3a), which was higher in TB among White women (Fig. 2a). Eight out of the 35 tyramine-correlated microbes are also correlated with choline, which was enriched in TB across all women (Fig. 2a). Interestingly, many of the species positively correlated with TB-associated metabolites, including Atopobium vaginae, G. vaginalis, several Prevotella species, BV-associated bacteria (BVAB) and many others, were previously reported to be associated with negative outcomes, such as BV68, PTB13,14,15,17 and other adverse pregnancy69 and neonatal70 outcomes. We found a similarly paradoxical negative correlation between Staphylococcus epidermidis, previously associated with BV71 and late-onset sepsis in preterm neonates72, and both tartrate and ethyl glucoside (ρ = −0.28, P = 6.9 × 10−4, q = 0.009 and ρ = −0.26, P = 0.0015, q = 0.016, respectively; Fig. 3a), which were positively associated with sPTB. Therefore, although many of these associations were previously known, our results also suggest complex interactions among suboptimal vaginal microbes, sPTB-associated metabolites and health outcomes.
Metabolic models support microbiome production of tyramine
To gain some mechanistic insight into the correlations we found, we used community-level metabolic models73, which integrate genetic and biochemical knowledge to predict the metabolic output of each microbiome sample (community net maximal production capacity73 (NMPC); Methods). Our models show accurate predictions for several metabolites known to be produced by the vaginal microbiome63,74, such as putrescine and histamine (Spearman ρ = 0.64 between NMPCs and metabolomic measurements, N = 214, P < 10−10 and ρ = 0.54, N = 167, P < 10−10, respectively; Extended Data Fig. 7a,b).
Two sPTB-associated metabolites, tyramine and choline, were represented in our models. As our models predicted that choline was not affected by the vaginal microbiome (NMPCs of 0 for all women), we focused on tyramine, which previous studies suggest is produced by vaginal microbes63,74. Following genomic curation (Methods), the predictions of our models were highly accurate (Spearman ρ = 0.62, N = 229, P < 10−10; Extended Data Fig. 7c). Interestingly, we found that, among White women, while the measured levels of tyramine were enriched in TB (Mann–Whitney P = 2.8 × 10−4; Fig. 3b), its predicted microbiome output was not, and was even somewhat higher in sPTB (P = 0.26; Fig. 3c). This stems from lower accuracy in tyramine predictions in White women who delivered preterm (Spearman ρ = 0.19 versus ρ = 0.65, P = 0.02 for the difference between correlations; Fig. 3d).
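The Statistics and reproducibility section states that correlations measured within subgroups were compared with the Fisher R-to-z transform. A minimal sketch for two independent groups follows; the subgroup sizes are not given in this section, so the sizes in the usage line are hypothetical, and the 1/(n − 3) variance is the Pearson-based approximation, commonly also applied to Spearman’s ρ.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test for a difference between correlations from independent groups."""
    z1, z2 = atanh(r1), atanh(r2)  # Fisher r-to-z transform
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, erfc(abs(z) / sqrt(2))


# rho values as reported for tyramine predictions; group sizes are hypothetical.
print(compare_correlations(0.19, 30, 0.65, 40))
```

The transform makes the sampling distribution of a correlation approximately normal with variance depending only on n, which is what allows a simple z test between subgroups.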
This difference in accuracy could not be explained by the representation of microbes in the metabolic models, which was in fact lower in Black women (Mann–Whitney P = 0.05, Extended Data Fig. 7d), probably due to the generally higher vaginal microbial diversity in this population75. Furthermore, tyramine prediction accuracy was not sensitive to constraints on metabolite uptakes or to the representation of low-abundance taxa (Methods, Supplementary Table 5 and Extended Data Fig. 7e). As these analyses suggest that lower tyramine prediction accuracy in White women with sPTB is not the result of a modelling artefact, the difference in accuracy could stem from a difference in strains, functional capacity or a non-microbial effect. Each of these possibilities could also explain the aforementioned paradoxical microbial associations with tyramine (Fig. 3a). The possibility of a microbial difference or a host effect is also supported by AMON37, which predicts that tyramine is either microbial or host derived (Supplementary Table 3). Overall, our results demonstrate the utility of metabolic models in studying microbiome–metabolome interactions, and raise intriguing hypotheses for further investigation.
Early prediction of sPTB risk using the vaginal metabolome
Early diagnosis of pregnancies with high risk for prematurity is crucial for the development of prevention and intervention strategies. We therefore explored whether we can use clinical, microbiome or metabolome data, collected ~3 months before delivery (mean ± s.d. of 14.5 ± 4.2 weeks), to predict subsequent sPTB. We used boosted decision trees, which were superior to alternative models (Extended Data Fig. 8a). For microbiome- and metabolome-based models, we trained composite predictors, such that a separate model was used for White and Black women. Despite the smaller effective sample size for each model, this resulted in better performance (Extended Data Fig. 8b). We evaluated all models on held-out samples using nested cross-validation without test data leakage (Methods).
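Nested cross-validation keeps every tuning decision inside the training data: an outer loop holds out test samples, and an inner loop, run only on the outer training samples, selects hyperparameters. The sketch below generates such index splits with plain shuffled k-folds; fold counts and seeds are hypothetical, and the study’s actual scheme (for example, any stratification by outcome) is described in its Methods.

```python
import random


def kfold_indices(indices, k, seed):
    """Yield (train, test) index lists for a shuffled k-fold split of `indices`."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    Inner splits are drawn only from outer_train, so hyperparameter tuning
    never sees the outer test samples.
    """
    for outer_train, outer_test in kfold_indices(range(n_samples), outer_k, seed):
        inner = list(kfold_indices(outer_train, inner_k, seed + 1))
        yield outer_train, outer_test, inner


for outer_train, outer_test, inner in nested_cv_splits(12, outer_k=3, inner_k=2):
    print(len(outer_train), len(outer_test), [len(v) for _, v in inner])
```

In use, each outer fold would tune on `inner`, refit on all of `outer_train` and predict `outer_test` once; pooling these outer predictions gives the held-out performance estimate. Any preprocessing, such as feature filtering or imputation, would likewise be fit on training indices only. For the composite predictors described above, the same splitting could be run within each subgroup.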
Our models using clinical (age, body mass index (BMI), race, PTB history and nulliparity) and microbial abundance data obtained limited accuracy (area under the receiver operating characteristic curve (auROC) of 0.59, area under the precision-recall curve (auPR) of 0.46 for clinical data; auROC = 0.55, auPR = 0.41 for microbiome data; P = 0.12 for difference between the models; Methods and Fig. 4a,b). Notably, using metabolomics data, we were able to generate a model with superior accuracy (auROC = 0.78, auPR = 0.61, P < 10−10 for comparison with either clinical or microbiome models; Methods and Fig. 4a,b). Lastly, a model combining clinical, microbiome and metabolomics data obtained similar accuracy to the metabolome-based model (auROC = 0.76, auPR = 0.62, P = 0.44 versus metabolome-based model; Extended Data Fig. 8c,d), with metabolites as the most prominent contributors to the model (Extended Data Fig. 8e). This suggests that, with respect to sPTB, metabolite measurements capture the predictive information contained in all three data types.
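Both metrics summarize how well held-out risk scores rank sPTB cases above term controls. The study used scikit-learn 0.24.2 for evaluation; the dependency-free sketch below computes auROC as the probability that a random case outscores a random control (ties count one half) and auPR as average precision, the step-wise summary used by scikit-learn’s `average_precision_score`. Whether the authors used this or a trapezoidal auPR is not stated here. For context, a non-informative predictor has auROC 0.5 and an expected auPR near the case fraction, here 80/232 ≈ 0.34.

```python
def roc_auc(labels, scores):
    """auROC as P(score of a random case > score of a random control); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """auPR as average precision: sum over thresholds of recall gain x precision."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores are admitted together at one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


# Hypothetical held-out labels (1 = sPTB) and predicted risk scores.
y = [1, 0, 1, 0, 0, 1, 0, 0]
s = [0.81, 0.40, 0.65, 0.52, 0.10, 0.33, 0.28, 0.60]
print(roc_auc(y, s), average_precision(y, s))
```

Because auPR depends on the case fraction while auROC does not, auPR values are not directly comparable across cohorts with different sPTB enrichment.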
Our metabolome-based model is superior or similar in accuracy to several previously published models, such as those using amniotic fluid metabolomics (auROC 0.65–0.70, N = 24) (ref. 76), maternal serum metabolome and clinical data (auROC 0.73, N = 164) (ref. 77), maternal urine and plasma metabolome (auROC 0.69–0.79, N = 146) (ref. 78), blood cell-free RNA measurements (auROC 0.81, N = 38) (ref. 79) or vaginal protein biomarkers (auROC 0.86, N = 150, sPTB N = 11) (ref. 80), many of which have small sample sizes, lack demographic diversity or focus on high-risk cohorts. Overall, our results demonstrate the promising utility of vaginal metabolites as early and accurate biomarkers of sPTB.
We next evaluated the same models, without retraining, for predicting extremely and very PTB in Black women from the same held-out data (that is, only the ground-truth classification of outcome changed). Interestingly, while the metabolome-based model showed a slight decrease in accuracy (auROC of 0.69 and 0.73 for extremely and very PTB, respectively, compared with auROC of 0.77 for sPTB in Black women; P = 4.3 × 10−4 and P = 0.001, respectively; Extended Data Fig. 8f), our microbiome-based model showed increasing accuracy (auROC of 0.69 and 0.62, respectively, compared with auROC of 0.55; P = 0.031 and P = 0.49, respectively; Extended Data Fig. 8g). These results may reflect the potentially increased involvement of the vaginal microbiome in earlier sPTBs1.
Metabolome-based predictor replicates in external cohorts
To test the generalizability of our metabolome-based model, we validated its accuracy in two independent cohorts (Methods): a case–control study of 20 women (10 PTB), mostly (75%) White, at high risk for PTB, with samples collected at 24–28 weeks of gestation (‘Ghartey, 2015’) (ref. 81); and a case–control study of 50 women (20 PTB), mostly (88%) Black, presenting with symptoms of preterm labour and no PTB history, with samples collected at 22–34 weeks of gestation (‘Ghartey, 2017’) (ref. 55).
This validation was extremely challenging for several reasons: differences in inclusion criteria and population structure; substantial batch effects in metabolomics measurements across studies82; and, as the data were generated 4–6 years earlier, only a small fraction of the metabolites used by our predictor were measured (34% and 39%). To illustrate this, only one and two (for Ghartey 2015 and 2017, respectively) of the ten associations we detected between vaginal metabolites and sPTB (Fig. 2a) could be examined in these cohorts (Methods), and none of these were significant (Mann–Whitney P > 0.05). As these sPTB-associated metabolites are probably important features for prediction, generalization across these cohorts is difficult. Despite this challenging setting, our metabolome-based predictor, trained only on the 232 samples profiled here, without any retraining or adaptation, provided relatively accurate predictions in both external cohorts (auROC = 0.65, auPR = 0.67 and auROC = 0.66, auPR = 0.58 for Ghartey 2015 and 2017, respectively; Fig. 4c and Extended Data Fig. 8h,i). These results demonstrate the robustness of the vaginal metabolome and of our predictive approach to study-specific biases.
Model interpretation reveals other contributing features
To obtain insights into the features used by the models, we assessed the contribution of each feature towards the prediction for each sample using SHapley Additive exPlanations (SHAP)83 (Supplementary Table 6). As expected, six of the ten most predictive metabolites, namely DEA, tyramine, arabinose, glutamate γ-methyl ester, mannitol/sorbitol and mannose, were also identified in our association analysis, with a similar direction of association (Figs. 2 and 4d). We additionally found that high pipecolate levels and low levels of lactosyl-N-palmitoyl-sphingosine and orotidine contribute to sPTB predictions. Of these, pipecolate was shown to be elevated in women with BV84.
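SHAP attributes each prediction to its input features using Shapley values from cooperative game theory. For tree ensembles, the `shap` package uses the TreeSHAP algorithm; the sketch below instead enumerates all feature subsets exactly, with a single baseline sample standing in for “absent” features (an interventional simplification), which is only feasible for a handful of features. The toy model and feature values are hypothetical and are not the trained predictor.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    m = len(x)

    def value(subset):
        # Features in `subset` take their values from x; the rest from the baseline.
        return model([x[j] if j in subset else baseline[j] for j in range(m)])

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy risk score over three standardized features.
def toy_model(z):
    return 0.8 * z[0] - 0.5 * z[1] + 0.3 * z[0] * z[2]


x, baseline = [1.2, -0.4, 0.9], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))  # last two agree up to rounding
```

The attributions always sum to the difference between the prediction and the baseline prediction, which is what allows per-sample contributions, such as those of DEA or tyramine, to be read as additive pushes towards or away from an sPTB prediction.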
A similar analysis of our microbiome-based predictor also captured previously detected associations between vaginal microbes and sPTB, including those of Mobiluncus mulieris14 and Finegoldia magna85, and of Lactobacillus14 and Dialister species15 (Extended Data Fig. 8j). These results highlight the interpretability of our models and their ability to model complex non-linear interactions, enabling us to expose associations not detected by univariate analyses.
In this study, we measured the second-trimester vaginal metabolome of 232 pregnant women. We show that it is associated with the vaginal microbiome, and that metabolite signatures are enriched for sPTB among Black women. We identify multiple metabolites that are associated with sPTB, across the cohort and separately for Black and White women. Our results highlight exogenous metabolites with strong associations with sPTB, which we suggest constitute important risk factors. We further uncover intriguing interactions between TB-associated metabolites and potentially suboptimal microbes, and propose a difference in the vaginal metabolism of tyramine in White women who delivered preterm. Finally, we demonstrate that metabolome-based models can predict subsequent sPTB weeks to months in advance, potentially paving the way for early diagnostics.
We detected several sPTB-associated xenobiotics, namely DEA, ethyl glucoside, tartrate and EDTA, which prior literature and a functional analysis37 suggest are of exogenous origin. DEA is a chemical with no known natural source86 that is commonly used in drilling and metalworking fluids35 and to which reproductive-aged women are highly exposed87; ethyl glucoside is present in alcohol-containing products31. Both are precursors or ingredients in hygienic and cosmetic products30,33. Tartrate and EDTA are used as food additives32,58 and are also common in hygienic and cosmetic products32,57. While we have not identified the sources of these metabolites, the fact that all of them are documented in hygienic and cosmetic products raises concern that some of these products may increase the risk of sPTB. Our results coincide with recent studies raising concerns regarding environmental exposures in pregnancy88,89, and identify these chemicals in the reproductive tract. Further study is warranted to identify the sources of these metabolites and to disentangle their effects on the host, microbiome and pregnancy outcomes, so that policy recommendations can be made regarding their use in various products and during pregnancy.
The cohort we analysed included a majority of Black women, offering an opportunity to study PTB in women who are disproportionately burdened by PTB and other adverse pregnancy outcomes, yet underrepresented in many studies. However, we urge caution in drawing conclusions from differences in associations between Black and White women, as maternal self-identified race represents a complex array of pre-existing differences, disparities and clinical covariates at the time of sampling. Nevertheless, we note that the enrichment of sPTB associations among the xenobiotic metabolite set in Black women may potentially reflect disparities in environmental and exogenous exposures90,91, consistent with reports that Black women have greater exposures to endocrine disrupting chemicals through personal care products92,93 and with studies that identified exogenous chemicals as possible drivers of PTB94,95. Metabolomic exposure patterns could contribute to the association between racial disparities in prematurity rates and racial differences in the vaginal microbiome96.
We used community-scale metabolic models to investigate microbial tyramine metabolism. These models have important limitations. Model curation is an ongoing effort, and thus models may not be tailored to each sample or may lack representation of niche-specific metabolic capabilities. Another limitation stems from the resolution of 16S rRNA amplicon sequencing, which identifies taxa at the species or genus level, precluding strain-specific modelling. Despite these limitations, our models accurately predicted several metabolites, and offered insights regarding potential sources of tyramine.
Our predictive modelling approach has several noteworthy limitations: (1) our use of a case–control cohort enriched for sPTB limits our ability to assess population-level predictive value, and further validation is required in prospective studies. (2) As this cohort was focused on sPTB, we are unable to assess if our models are specific to sPTB or are detecting a general risk for adverse pregnancy outcomes. (3) The use of race in our models, while common throughout medicine97, is controversial and creates issues in implementation98. This was driven by differences in both sample size and the vaginal metabolome itself between Black and White women in this cohort, and resulted in an overall increased accuracy. (4) Finally, there is additional unexplored potential in using even earlier samples for prediction. A larger sample size, and combination with other sources of data, such as maternal urine or serum metabolomics, vaginal metagenomics or cell-free RNA measurements, could further improve prediction accuracy.
Our results demonstrate the utility of vaginal metabolites as early biomarkers of PTB, and identify xenobiotic metabolites as potentially modifiable sPTB risk factors, which may also disproportionately affect Black women. The strong associations we observe motivate the investigation of the vaginal microbiome and metabolome in the context of other adverse pregnancy outcomes such as pre-eclampsia, indicated PTB and BV.
Study design and cohort description
We analysed banked samples from the previously collected and described Motherhood and Microbiome cohort (NCT02030106) (ref. 14). This cohort was approved by the institutional review boards at the University of Pennsylvania (IRB 818914) and the University of Maryland School of Medicine (HP-00045398), and all participants provided written informed consent. The Motherhood and Microbiome cohort recruited 2,000 women with a singleton pregnancy before 20 weeks of gestation. Women were followed to delivery, and sPTB was defined as delivery before 37 weeks of gestation with a presentation of cervical dilation and/or premature rupture of membranes. Of these women, 503 previously had their vaginal microbiota characterized via 16S rRNA gene amplicon sequencing (V3–V4 region) of vaginal swabs collected between 20 and 24 weeks of gestation, and total bacterial load was assessed using the TaqMan BactQuant assay14. For this study, among women with available microbiome data, we selected all available samples from women who delivered preterm (N = 80), as well as samples from 152 controls who delivered at term. The selected cervicovaginal samples were replicates of those used for 16S rRNA gene sequencing, collected using a double-shaft Dacron swab. Cervicovaginal swabs were either self-collected or collected by a research coordinator during a study visit14.
Statistics and reproducibility
No data were excluded from the analysis. As the study was observational, there was no allocation or randomization. The study included all available samples from women who delivered preterm (N = 80); no statistical methods were used to predetermine sample sizes, but our sample size is similar to those reported in previous publications25,26. Samples were randomly distributed across metabolomics batches, and metabolomics analysis was performed by Metabolon, who were blinded to the outcome of each sample. Two-sided Mann–Whitney U tests (SciPy 1.5.2) and logistic regression (Statsmodels 0.12.1) were used to identify associations between metabolite levels and sPTB. Two-sided Fisher’s exact tests (R stats 3.6.1) were used to identify associations among MCs, CSTs, race and sPTB. PERMANOVA tests (scikit-bio 0.5.6) were used to identify associations among the microbiome, metabolome, CST, race and metabolomics batches. Metabolite set enrichment analysis (Methods) was used to identify associations between metabolite sets and sPTB. Spearman correlations were used to measure agreement between metabolite levels and NMPCs and between metabolite levels and microbial abundances. The Fisher R-to-z transform was used to compare correlations measured within subgroups. Machine learning models were evaluated using scikit-learn 0.24.2, and pandas 1.1.5 and NumPy 1.18.5 were used for data processing. Nested cross-validation was used to obtain robust estimates of the generalization error of predictive models.
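The Fisher R-to-z comparison of a correlation measured in two independent subgroups (for example, Black and White women) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the classic approximation that atanh(r) is normal with variance 1/(n - 3), and the function name `compare_correlations` is hypothetical.

```python
from __future__ import annotations

import math


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided test of H0: rho1 == rho2 for correlations from two independent subgroups.

    r1, r2: correlation coefficients; n1, n2: number of paired samples in each subgroup.
    Returns (z statistic, two-sided P value).
    """
    # Fisher R-to-z: atanh(r) is approximately normal with variance 1 / (n - 3)
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # two-sided P from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For instance, `compare_correlations(0.6, 53, 0.1, 53)` gives z ≈ 2.96 and P ≈ 0.003, whereas identical coefficients give z = 0 and P = 1.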
Metabolomics profiling and preprocessing
Metabolite levels were measured from vaginal swabs by Metabolon, using an untargeted liquid chromatography–tandem mass spectrometry (LC-MS/MS) platform99. For a discussion of batch processing of the samples, see Supplementary Note 1 and Extended Data Fig. 2. We note that swab lot numbers, sterile swabs for blank processing and the identity of the sample collector (coordinator or self-collection) are not available. While this limits analysis of potential batch effects, we consider batch confounding (for example, swab lot associated with sPTB) unlikely, as samples were collected before delivery and outcome determination.
Following a methanol-based small-molecule extraction, samples were divided into 5 µl aliquots, and each aliquot was resuspended in an appropriate extraction solvent and separated via one of four chromatography techniques. Each chromatographic method was optimized for hydrophobic, basic or polar compounds. The chromatographic method used for the quantification of each metabolite is provided in Supplementary Table 4. Isotopically labelled or halogenated standards were added to all aliquots at fixed concentrations before extraction to serve as retention time markers. Following chromatographic separation, compounds were subjected to electrospray ionization and measured via tandem mass spectrometry on a Q-Exactive Hybrid Quadrupole-Orbitrap high-resolution mass spectrometer. Data-dependent acquisition mode was used to generate fragmentation spectra of high-intensity m/z peaks detected during the first round of mass spectrometry. Metabolon identified and annotated m/z peaks using proprietary software and comparisons with their database of retention indices and fragment ion spectra. The areas under annotated m/z peaks were taken as metabolite measurements. A comprehensive overview of all chromatographic and mass spectrometry parameters is available in Supplementary Table 7. Process blanks (negative controls) were run with each metabolomic plate, and metabolites were considered present only if they were detected at levels at least three times higher than in these controls. Detected levels of the xenobiotics highlighted in this study, in vaginal samples and negative controls, are shown in Extended Data Fig. 5e, demonstrating that these xenobiotics also meet this criterion. The mass errors of these xenobiotics, shown in Extended Data Fig. 5f, indicate high identification quality relative to other, non-xenobiotic metabolites.
While the majority of named metabolites (N = 556) were tier 1 identified by Metabolon via fragmentation spectra matches to experimentally measured library standards, only tier 2 assignments are available for independent identification owing to the proprietary nature of the Metabolon platform. Metabolite measurements were volume normalized to the volume of buffer used, which may not account for differences in the amount of original tissue sampled. This was followed by robust standardization27 of the log10-transformed values (subtracting the median and dividing by the standard deviation, calculated after clipping the top and bottom 5% of values as outliers). Shapiro–Wilk tests indicated that log10-transformed values deviated from normality for the majority of metabolites (389 of 635 named metabolites); non-parametric tests were therefore used in subsequent metabolomic analyses.
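The robust standardization step can be sketched for a single metabolite as below. This is an illustration under assumptions: "clipping" is interpreted here as winsorizing at the 5th and 95th percentiles before computing the standard deviation, undetected values are encoded as NaN and left as NaN, and `robust_standardize` is a hypothetical name.

```python
import numpy as np


def robust_standardize(values: np.ndarray, clip_pct: float = 5.0) -> np.ndarray:
    """Log10-transform and robustly standardize one metabolite across samples.

    values: 1-D array of positive peak areas; NaN marks "not detected" and stays NaN.
    """
    logged = np.log10(values)
    center = np.nanmedian(logged)
    # clip (winsorize) the top and bottom clip_pct% before computing the SD (assumption)
    lo, hi = np.nanpercentile(logged, [clip_pct, 100.0 - clip_pct])
    scale = np.nanstd(np.clip(logged, lo, hi))
    return (logged - center) / scale
```

Applied column by column to a samples × metabolites matrix (for example with `np.apply_along_axis(robust_standardize, 0, X)`), this centres every metabolite on its median, so a single extreme sample barely changes the scale used for all other samples.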
Microbiome data processing
All microbiome-based analyses used data previously processed with DADA2 (ref. 100) and SpeciateIT14, available from Supplementary Data 2 of ref. 14. The single exception is the predictive models, which were trained on operational taxonomic units (OTUs) clustered at 97% identity using the USEARCH pipeline101. We obtained raw sequences from the database of Genotypes and Phenotypes (dbGaP) under study accession phs001739.v1.p1. Primers were aligned to reads and trimmed, followed by paired-end merging and quality filtering (-fastq_maxee 1.0). The filtered reads were then pooled, dereplicated, clustered at a 97% threshold and chimera-filtered with the UPARSE algorithm to produce the OTU count matrix.
Global microbiome and metabolome structure
PERMANOVA analysis was performed using the Bray–Curtis distance for microbiome data and the Canberra distance for metabolomics data, which is robust to outliers and sensitive to differences in common features. De novo clustering of metabolite vectors was performed using the k-medoids algorithm (scikit-learn-extra 0.2.0), also with the Canberra distance. We determined the optimal number of clusters by comparing the within-cluster sum of squared errors and the gap statistic for clustering solutions with k between 1 and 15 (Extended Data Fig. 3a,b). To check the robustness and consistency of these clusters, we performed 100 random selections of 209 (90%) of the 232 samples, recreating clusters de novo with the same procedure for each random subset. Of the resulting subsets, 36 had over 95% of samples assigned to the same metabolite cluster as in the original assignment (Supplementary Table 2), and the average assignment accuracy across all random subsets was 86% (Extended Data Fig. 3g), demonstrating that our metabolite clusters are consistent. Uniform manifold approximation and projection (UMAP)102 was performed using the Python umap-learn package, with n_neighbors of 15 and min_dist of 0.05 for microbiome data and n_neighbors of 15 and min_dist of 0.25 for metabolomics data. To further describe each metabolomics cluster, Fisher’s exact test was used to identify metabolite super- and subpathways enriched among metabolites associated with each cluster (P < 0.05).
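The subsampling check of cluster consistency can be sketched as follows. To stay self-contained, this sketch swaps scikit-learn-extra's `KMedoids` for a plain alternating k-medoids on a precomputed Canberra distance matrix with a deterministic farthest-first initialization (an assumption; not necessarily what the library does), and matches cluster labels between runs by brute-force permutation, which is only practical for small k. All function names are hypothetical.

```python
import itertools

import numpy as np


def canberra_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise Canberra distances between rows of X (0/0 terms count as 0)."""
    num = np.abs(X[:, None, :] - X[None, :, :])
    den = np.abs(X[:, None, :]) + np.abs(X[None, :, :])
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(den > 0, num / den, 0.0)
    return terms.sum(axis=2)


def k_medoids(D: np.ndarray, k: int, max_iter: int = 100) -> np.ndarray:
    """Alternating k-medoids on a precomputed distance matrix; returns cluster labels."""
    # deterministic farthest-first initialization
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # new medoid: the member with the smallest total distance to its cluster
                updated[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return np.argmin(D[:, medoids], axis=1)


def assignment_accuracy(reference: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples in the same cluster after the best relabelling (small k only)."""
    return max(float(np.mean(np.asarray(perm)[labels] == reference))
               for perm in itertools.permutations(range(k)))


def subsample_stability(X, k, n_rounds=100, frac=0.9, seed=0):
    """Re-cluster random subsets of samples and compare with the full-data assignment."""
    rng = np.random.default_rng(seed)
    D = canberra_matrix(X)
    reference = k_medoids(D, k)
    n_sub = int(round(frac * X.shape[0]))  # 209 of 232 samples in the study
    accuracies = []
    for _ in range(n_rounds):
        idx = rng.choice(X.shape[0], size=n_sub, replace=False)
        labels = k_medoids(D[np.ix_(idx, idx)], k)
        accuracies.append(assignment_accuracy(reference[idx], labels, k))
    return np.array(accuracies)
```

The dense distance computation holds an n × n × p array in memory, which is acceptable for a few hundred samples but would need chunking for much larger matrices.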
Differential abundance testing and metabolite set enrichment analysis
Differential abundance of metabolites between sPTB and TB was tested using the two-sided Mann–Whitney U test for metabolites that were present in at least half of the cases. All associations with early PTB were calculated using only samples from Black women, owing to their high proportion among these deliveries (21 of 26 childbirths at <32 weeks of gestation and 14 of 15 at <28 weeks). To identify functional sets of metabolites that were perturbed between sPTB and TB, we compared, for each set, the Mann–Whitney P values for differential abundance between sPTB and TB of metabolites within the set with the same P values for metabolites outside the set, using an additional Mann–Whitney U test. We calculated significance by comparing the P value of the latter test with 10,000 similar P values calculated on random permutations of the sPTB and TB labels. For functional sets, we used the definitions of super- and subpathways provided by Metabolon, as well as KEGG61 pathways. False discovery rate (FDR) correction was performed separately for each metabolite set type.
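The two-level permutation scheme can be sketched as below. Several details are assumptions not stated in the text: the metabolite matrix has no missing values, the second-level Mann–Whitney test is one-sided (in-set P values smaller than out-of-set P values), and the permutation P value uses an add-one correction. The vectorized `axis` argument of `scipy.stats.mannwhitneyu` requires SciPy 1.7 or later; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def per_metabolite_pvalues(X: np.ndarray, is_sptb: np.ndarray) -> np.ndarray:
    """Two-sided Mann-Whitney P value (sPTB vs TB) for every column of X."""
    return mannwhitneyu(X[is_sptb], X[~is_sptb], alternative="two-sided", axis=0).pvalue


def set_statistic(pvals: np.ndarray, in_set: np.ndarray) -> float:
    """Second-level test: are in-set P values smaller than out-of-set P values?"""
    return mannwhitneyu(pvals[in_set], pvals[~in_set], alternative="less").pvalue


def set_enrichment(X, is_sptb, sets, n_perm=10_000, seed=0):
    """Permutation P value per metabolite set; sets maps a name to a boolean column mask."""
    rng = np.random.default_rng(seed)
    observed_p = per_metabolite_pvalues(X, is_sptb)
    observed = {name: set_statistic(observed_p, mask) for name, mask in sets.items()}
    as_extreme = dict.fromkeys(sets, 0)
    for _ in range(n_perm):
        # shuffle sPTB/TB labels and recompute all metabolite-level P values
        perm_p = per_metabolite_pvalues(X, rng.permutation(is_sptb))
        for name, mask in sets.items():
            if set_statistic(perm_p, mask) <= observed[name]:
                as_extreme[name] += 1
    # add-one correction keeps permutation P values strictly above zero
    return {name: (1 + as_extreme[name]) / (1 + n_perm) for name in sets}
```

Permuting sample labels, rather than metabolite memberships, preserves the correlation structure among metabolites, which is why the null distribution is built this way. Benjamini–Hochberg correction could then be applied to the returned values within each set type.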
Prediction of metabolite origins using AMON
AMON37 uses functional annotations from the KEGG database61 to predict the origins of all metabolites that can be matched to a KEGG entry (N = 334 of 635 named metabolites). We used PICRUSt2 (ref. 103) to generate functional profiles for each sample, and then applied AMON37 to predict whether metabolites with matching entries in the KEGG database are products of human or microbial metabolism. When a metabolite was predicted to be neither a human nor a microbial product, we interpreted it as a xenobiotic.
To identify associations between microbes and metabolites, we estimated microbial absolute abundance by multiplying the relative abundance of each taxon by the total 16S rRNA gene copy number of the sample, obtained using the TaqMan quantitative polymerase chain reaction (qPCR)-based panel14,104,105, and calculated Spearman correlations with the levels of metabolites we found to be associated with sPTB. Across all correlation network analyses (Fig. 3a and Extended Data Figs. 6a,c–e), we included only correlations supported by paired measurements in at least 22% of samples, corresponding to 50 of 232 samples for Fig. 3a. All correlations were computed on available data without imputation, and correction for multiple testing was performed via the Benjamini–Hochberg FDR method. To determine whether edges in our network were influenced by race (Extended Data Fig. 6b) or by the severity of sPTB (Extended Data Fig. 6f), we used a two-sided test based on the Fisher R-to-z transform to compare these correlations in Black women with the same correlations in White women, and to compare these correlations in Black women who delivered before 32 weeks with the same correlations in all other Black women.
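The absolute-abundance scaling, the pairwise-complete Spearman correlations and the Benjamini–Hochberg step can be sketched together as follows. Assumptions: missing measurements are encoded as NaN, the minimum number of paired samples is passed directly (50, as for Fig. 3a), and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def absolute_abundance(rel_ab: np.ndarray, total_16s: np.ndarray) -> np.ndarray:
    """Scale per-sample relative abundances (samples x taxa) by the qPCR total 16S load."""
    return rel_ab * total_16s[:, None]


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg FDR-adjusted P values (q values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, starting from the largest P value
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(monotone, 1.0)
    return q


def microbe_metabolite_correlations(abund, metab, min_pairs=50):
    """Spearman correlations for every taxon/metabolite pair with >= min_pairs paired values.

    NaN marks a missing measurement; no imputation is performed.
    Returns a list of (taxon index, metabolite index, rho, P, q).
    """
    rows = []
    for i in range(abund.shape[1]):
        for j in range(metab.shape[1]):
            paired = ~np.isnan(abund[:, i]) & ~np.isnan(metab[:, j])
            if paired.sum() < min_pairs:
                continue
            rho, p = spearmanr(abund[paired, i], metab[paired, j])
            if not np.isnan(rho):  # skip pairs where one input is constant
                rows.append((i, j, float(rho), float(p)))
    q = bh_adjust([r[3] for r in rows]) if rows else np.array([])
    return [(i, j, rho, p, float(qv)) for (i, j, rho, p), qv in zip(rows, q)]
```

Because Spearman correlation is rank-based, scaling by total load changes the result only through differences in load between samples, which is precisely the information absolute abundances add over relative ones.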
Creating and interrogating vaginal microbiome models
Microbiome metabolic modelling was performed using the Microbiome Modeling Toolbox (COBRA toolbox commit: 71c117305231f77a0292856e292b95ab32040711) (refs. 73,106), with models from AGORA2 (ref. 107). All computations were performed in MATLAB version 2019a (MathWorks) using the IBM CPLEX 12.10.0 solver.
For each sample, tailored microbiome models were created through the compartmentalization technique108: metabolic reconstructions of the species present in the sample are merged into a shared compartment, and input and output compartments are added. The shared compartment enables microbes to exchange metabolites, while the input and output compartments enable compound uptake and secretion. Coupling constraints are added as in refs. 109,110 to link each species’ network fluxes to its relative abundance. Finally, a sample-specific microbiome biomass objective function, composed of the sum of each microbial biomass weighted by the corresponding relative abundance, is added to each microbiome model.
To interrogate the secretion potential of each sample-specific microbiome model, we computed net maximal production capacities (NMPCs) using the mgPipe.m pipeline of the Microbiome Modeling Toolbox73 (Supplementary Table 8). NMPC calculation accounts for maximal microbiome compound production and uptake rates, and aims to predict the overall contribution of each microbiome to the metabolism of specific compounds73. To later assess prediction accuracy, we computed Spearman correlations between NMPCs and the corresponding metabolite measurements without imputation.
To support and improve the accuracy of our tyramine predictions, we validated the presence of the TDC gene, which encodes tyrosine decarboxylase. For each species represented in our metabolic models (N = 95), we used Prodigal111 to predict open reading frames in up to 200 randomly selected RefSeq112 assemblies, and searched them for evidence of TDC using the hmmsearch function of HMMER 3.3.2 (ref. 113) and a profile HMM for TDC114 (NCBI HMM accession TIGR03811.1). We then curated our metabolic models, ensuring that the tyrosine decarboxylase reaction was present in the model of every species for which at least one assembly contained the gene.
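The species-level decision rule (add the reaction if any assembly carries TDC) can be sketched by parsing hmmsearch's `--tblout` per-target table, a whitespace-delimited file with `#` comment lines whose fifth column is the full-sequence E-value. The E-value cutoff of 1e-10 is a hypothetical choice, since the text does not state one (HMMER's `--cut_tc` option, which applies the profile's trusted cutoff, is an alternative); the function names are also hypothetical.

```python
from collections import defaultdict


def tdc_hits(tblout_text: str, max_evalue: float = 1e-10) -> bool:
    """True if any target in an hmmsearch --tblout table passes the (hypothetical) E-value cutoff."""
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        full_seq_evalue = float(fields[4])  # 5th column: full-sequence E-value
        if full_seq_evalue <= max_evalue:
            return True
    return False


def species_with_tdc(assembly_results) -> set:
    """assembly_results: iterable of (species, tblout_text), one entry per searched assembly.

    A species is flagged if at least one of its assemblies has a TDC hit, mirroring the
    rule used to add the tyrosine decarboxylase reaction to a species model.
    """
    has_tdc = defaultdict(bool)
    for species, tblout_text in assembly_results:
        has_tdc[species] |= tdc_hits(tblout_text)
    return {species for species, flag in has_tdc.items() if flag}
```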
To compile the metabolic models, we matched the species detected in the microbiome samples with those present in AGORA2 (ref. 107) (Supplementary Table 9). To increase the representativeness of our models, we added three representatives for abundant vaginal species that lacked a corresponding AGORA2 model and were present at >5% relative abundance in at least 20 samples (listed in Supplementary Table 9). The only species passing this threshold that was not included in our models was Candidatus Lachnocurva vaginae (BVAB1), for which no suitable AGORA model was available. To generate species-level models, we combined metabolic models of available strains using the createPanModels.m function of the Microbiome Modeling Toolbox73. Altogether, our microbiome metabolic models included 95 different species, with an average of 20 species per sample. As the vaginal microbiome has a highly skewed distribution28, our models represented a median (interquartile range (IQR)) of 96.7% (88.4–98.8%) of the total abundance across samples (Extended Data Fig. 7d).
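A coverage summary of this kind, the median (IQR) share of each sample's abundance that belongs to modelled species, can be computed as sketched below. The function name and the taxon labels in any usage are hypothetical; the sketch only assumes a samples × taxa relative abundance matrix with labelled columns.

```python
import numpy as np


def modelled_abundance_summary(rel_ab, taxa, modelled):
    """Median and IQR (%) of the per-sample relative abundance covered by modelled taxa.

    rel_ab: samples x taxa matrix; taxa: column labels; modelled: labels with a metabolic model.
    """
    covered_cols = np.isin(np.asarray(taxa), list(modelled))
    covered = 100.0 * rel_ab[:, covered_cols].sum(axis=1) / rel_ab.sum(axis=1)
    q1, median, q3 = np.percentile(covered, [25, 50, 75])
    return median, (q1, q3)
```

Dividing by each row's total keeps the summary correct even if relative abundances do not sum exactly to one after filtering.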
To test the sensitivity of our models to the lack of representation of low-abundance microbes, we performed simulations in which we iteratively removed the ten least abundant species from consideration by our models, and evaluated the accuracy of our models in predicting the well-modelled metabolites tyramine, putrescine and histamine. As expected, because our models account for the abundance of each microbe and the vaginal microbiome has a skewed distribution, our models were insensitive to the representation of low-abundance microbes (Extended Data Fig. 7e), even when 70 of the 95 species models were removed.
Metabolic modelling requires specifying environmental conditions such as media and carbon source availability115. We therefore formulated a ‘general vaginal media’ (Supplementary Table 10) as the union of all metabolites present in at least 50 samples for which a corresponding metabolite was identified in AGORA, assuming them to be present at unlimited (that is, very high) concentrations. This vaginal media was applied to the input compartment of each microbiome model as constraints on metabolite uptake reactions, constraining the uptake of compounds absent from the environment to zero. The uptake of specific gut-related dietary compounds, which mgPipe performs automatically, was disabled to reflect the different metabolic environment of the vagina. Essential metabolites required for achieving microbiome growth, together with their respective flux values, were identified and added to the vaginal media using the fastFVA and findMIIS functions of the COBRA toolbox106. We compared the ‘general’ media with subgroup-specific media (defined as metabolites present in 75% of samples from Black and White women separately, with uptake fluxes constrained to the mean value across the subgroup) and with person-specific media (in which uptake fluxes were constrained for each sample separately); all showed similar accuracy for tyramine predictions (Supplementary Table 5).
Training, testing and validation of sPTB classifiers
We constructed predictive models separately using clinical (age, race, parity status, history of sPTB and BMI), microbiome and metabolomics data, as well as a combination model using all of these data types. As race had very strong interactions with the microbiome and metabolomics data, we trained composite predictors for the microbiome, metabolomics and combination models, in which separate models were trained for Black women and for all other women. Despite the smaller sample size available to each model, this empirically improved prediction performance (Extended Data Fig. 8b). Microbiome-based models used absolute abundances, calculated from USEARCH-processed OTUs as described above. Where the qPCR-based total load was not available (N = 14), it was imputed as the mean total load of the training samples only.
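A composite predictor of this kind can be sketched as a thin wrapper that fits one model per group and routes each sample to its group's model, together with train-only imputation of the total bacterial load. This is an illustration under stated assumptions: scikit-learn's `LogisticRegression` stands in for the LightGBM models used in the study, and the class name, function name and group labels are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


class GroupCompositeClassifier:
    """Trains one model per group (e.g. Black women vs all other women) and routes samples."""

    def __init__(self, base_estimator=None):
        # LogisticRegression is a stand-in for the LightGBM models used in the study
        if base_estimator is None:
            base_estimator = LogisticRegression(max_iter=1000)
        self.base_estimator = base_estimator

    def fit(self, X, y, groups):
        self.models_ = {}
        for g in np.unique(groups):
            idx = groups == g
            self.models_[g] = clone(self.base_estimator).fit(X[idx], y[idx])
        return self

    def predict_proba(self, X, groups):
        """Probability of sPTB for each sample, from the model of that sample's group."""
        unseen = set(np.unique(groups)) - set(self.models_)
        if unseen:
            raise ValueError(f"no model trained for groups: {sorted(unseen)}")
        proba = np.empty(len(X))
        for g, model in self.models_.items():
            idx = groups == g
            if idx.any():
                proba[idx] = model.predict_proba(X[idx])[:, 1]
        return proba


def impute_total_load(train_load, test_load):
    """Fill missing qPCR total loads with the mean of the training samples only."""
    fill = np.nanmean(train_load)
    return (np.where(np.isnan(train_load), fill, train_load),
            np.where(np.isnan(test_load), fill, test_load))
```

Computing the fill value from training samples alone keeps information from held-out samples out of the fitted pipeline.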
Samples were split into training and test sets using 10-fold cross-validation (‘outer folds’), block-stratified for deciles of gestational age at birth (GAB) and, for the microbiome, metabolomics and combined models, also stratified for race. To account for stochasticity in the division into ten folds, we repeated this process five times. Strict separation between training and test data was maintained throughout. To tune the optimal set of hyperparameters (including parameters for feature engineering and selection), and to obtain a robust estimate of the generalization error, we used nested cross-validation. In this extension of the training–test–validation framework, each training set was further split into five folds (‘inner folds’), on which we evaluated 1,000 randomly sampled hyperparameter sets (Supplementary Table 11); again, to account for stochasticity, we repeated this process five times. Based on performance on the inner folds, we identified the five hyperparameter sets with the highest average R2 for sPTB classification and selected, among these, the set with the highest average auROC. We then used these hyperparameters to train a model on the entire training data of the outer fold, and evaluated it on the held-out test data. Thus, in this framework, hyperparameters are selected using strictly the training data of each outer cross-validation fold, and are evaluated only once on the test set. Our prediction pipeline included standardization and imputation (for metabolomics data), optional principal component analysis (PCA) transformation, and feature selection using sparsity, SHAP83 feature importance, information gain and/or Spearman correlation, followed by prediction using LightGBM116, with all steps fitted strictly on training data. The selected models were then evaluated, without retraining, on classification of extremely (GAB <28 weeks) or very (GAB <32 weeks) PTB on the outer folds. Benchmark analyses (Extended Data Fig. 8a,b) were performed using 10-fold cross-validation, repeated five times. We assessed the significance of the difference in auROC between two models by computing z-scores based on the normal approximation to the distribution of auROCs117.
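The nested cross-validation structure can be sketched with scikit-learn as follows. This is a simplified stand-in, not the study pipeline: a median-imputation, scaling and logistic regression pipeline replaces the LightGBM model with PCA and feature selection, a small hypothetical grid of regularization strengths replaces the 1,000 random hyperparameter draws, and the inner selection uses auROC alone rather than the R2-then-auROC rule described above. The outer folds are stratified on a combined GAB-decile × race label; scikit-learn may warn when some of these strata have fewer members than the number of folds.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_auroc(X, y, gab_weeks, race, seed=0, n_outer=10, n_inner=5, n_iter=4):
    """Outer-fold auROCs from nested cross-validation (one repeat; vary seed to repeat)."""
    # outer strata: decile of gestational age at birth combined with race
    edges = np.quantile(gab_weeks, np.linspace(0.1, 0.9, 9))
    deciles = np.digitize(gab_weeks, edges)
    strata = np.array([f"{d}_{r}" for d, r in zip(deciles, race)])
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(),
                         LogisticRegression(max_iter=1000))
    search_space = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}  # hypothetical grid
    aucs = []
    for train, test in outer.split(X, strata):
        inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
        search = RandomizedSearchCV(pipe, search_space, n_iter=n_iter, scoring="roc_auc",
                                    cv=inner, random_state=seed)
        search.fit(X[train], y[train])  # hyperparameters chosen on training data only
        # the refitted best model is scored exactly once on the held-out outer fold
        scores = search.predict_proba(X[test])[:, 1]
        aucs.append(roc_auc_score(y[test], scores))
    return np.array(aucs)
```

Calling the function with five different `seed` values mirrors the five repeats of the outer split. Stratifying on GAB deciles also ensures that each outer test fold contains both preterm and term deliveries, so an auROC can be computed for every fold.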
To obtain a final model for interpretation and validation, we trained new composite models on the entire cohort (N = 232), using the hyperparameters selected for each of the outer folds (50 models), and picked the model with the best auROC on the same cohort (training fit). The final parameter set for each model is listed in Supplementary Table 12. For validation on external vaginal metabolome datasets, we note that information on maternal race at the subject level was not available to us. We therefore applied the metabolomics model used for non-Black women, without retraining or adaptation, to metabolomics data from the Ghartey 2015 (ref. 81) cohort, as this cohort contained mostly White women; and similarly applied the metabolomics model used for Black women to metabolomics data from the Ghartey 2017 (ref. 55) cohort. For validation of associations of metabolites with sPTB (Fig. 2a) in these cohorts, we note that, of the ten metabolites in Fig. 2a, only the six that apply to all and White women can be validated in the Ghartey 2015 cohort, of which only one was measured; and only the nine that apply to all and Black women can be validated in the Ghartey 2017 cohort, of which only two were measured.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
The 16S rRNA gene amplicon sequencing data and the associated samples and subjects’ metadata analysed in this study are publicly available in the database of Genotypes and Phenotypes (dbGaP) under accession number phs001739.v1.p1 as well as in Supplementary Data 2 of ref. 14. Raw metabolomics data are available in Supplementary Table 1. Mass spectral data are available from MetaboLights under accession number MTBLS702 (https://www.ebi.ac.uk/metabolights/MTBLS702). Additional information regarding xenobiotics is provided in Supplementary Table 13. The KEGG Database is available at https://www.genome.jp/kegg/, and the AGORA models are available at https://www.vmh.life/.
Goldenberg, R. L., Culhane, J. F., Iams, J. D. & Romero, R. Epidemiology and causes of preterm birth. Lancet 371, 75–84 (2008).
Howson, C. P., Kinney, M. V., McDougall, L. & Lawn, J. E. Born too soon: preterm birth matters. Reprod. Health 10, 1–9 (2013).
Martin, J. A., Hamilton, B. E. & Osterman, M. J. K. Births in the United States, 2018 (NCHS Data Brief Hyattsville MD Natl Cent. Health Stat. 1–8, 2019).
Braveman, P. et al. Explaining the Black–White disparity in preterm birth: a consensus statement from a multi-disciplinary scientific work group convened by the march of dimes. Front. Reprod. Health 3, 684207 (2021).
Meertens, L. J. et al. Prediction models for the risk of spontaneous preterm birth based on maternal characteristics: a systematic review and independent external validation. Acta Obstet. Gynecol. Scand. 97, 907–920 (2018).
Conde‐Agudelo, A., Papageorghiou, A. T., Kennedy, S. H. & Villar, J. Novel biomarkers for the prediction of the spontaneous preterm birth phenotype: a systematic review and meta‐analysis. BJOG 118, 1042–1054 (2011).
Zeevi, D. et al. Personalized nutrition by prediction of glycemic responses. Cell 163, 1079–1094 (2015).
Qin, N. et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59–64 (2014).
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. 25, 679–689 (2019).
Thomas, A. M. et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat. Med. 25, 667–678 (2019).
Brown, R. G. et al. Vaginal dysbiosis increases risk of preterm fetal membrane rupture, neonatal sepsis and is exacerbated by erythromycin. BMC Med. 16, 9 (2018).
Callahan, B. J. et al. Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women. Proc. Natl Acad. Sci. USA 114, 9966–9971 (2017).
Elovitz, M. A. et al. Cervicovaginal microbiota and local immune response modulate the risk of spontaneous preterm delivery. Nat. Commun. 10, 1305 (2019).
Fettweis, J. M. et al. The vaginal microbiome and preterm birth. Nat. Med. 25, 1012–1021 (2019).
DiGiulio, D. B. et al. Temporal and spatial variation of the human microbiota during pregnancy. Proc. Natl Acad. Sci. USA 112, 11060–11065 (2015).
Romero, R. et al. The vaginal microbiota of pregnant women who subsequently have spontaneous preterm labor and delivery and those with a normal delivery at term. Microbiome 2, 18 (2014).
Bayar, E., Bennett, P. R., Chan, D., Sykes, L. & MacIntyre, D. A. The pregnancy microbiome and preterm birth. Semin. Immunopathol. 42, 487–499 (2020).
Thaiss, C. A. et al. Microbiota diurnal rhythmicity programs host transcriptome oscillations. Cell 167, 1495–1510.e12 (2016).
Yoshimoto, S. et al. Obesity-induced gut microbial metabolite promotes liver cancer through senescence secretome. Nature 499, 97–101 (2013).
Koeth, R. A. et al. Intestinal microbiota metabolism of L-carnitine, a nutrient in red meat, promotes atherosclerosis. Nat. Med. 19, 576–585 (2013).
Levy, M. et al. Microbiota-modulated metabolites shape the intestinal microenvironment by regulating NLRP6 inflammasome signaling. Cell 163, 1428–1443 (2015).
Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat. Med. 25, 968–976 (2019).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Flaviani, F. et al. Cervicovaginal microbiota and metabolome predict preterm birth risk in an ethnically diverse cohort. JCI Insight 6, e149257 (2021).
Pruski, P. et al. Direct on-swab metabolic profiling of vaginal microbiome host interactions during pregnancy and preterm birth. Nat. Commun. 12, 5967 (2021).
Bar, N. et al. A reference map of potential determinants for the human serum metabolome. Nature 588, 135–140 (2020).
Ravel, J. et al. Vaginal microbiome of reproductive-age women. Proc. Natl Acad. Sci. USA 108, 4680–4687 (2011).
Stafford, G. P. et al. Spontaneous preterm birth is associated with differential expression of vaginal metabolites by Lactobacilli-dominated microflora. Front. Physiol. 8, 615 (2017).
Fiume, M. M. et al. Safety assessment of decyl glucoside and other alkyl glucosides as used in cosmetics. Int. J. Toxicol. 32, 22S–48S (2013).
Waters, B. et al. A validated method for the separation of ethyl glucoside isomers by gas chromatography-tandem mass spectrometry and quantitation in human whole blood and urine. J. Chromatogr. B 1188, 123074 (2021).
Kassaian, J.-M. Ullmann’s Encyclopedia of Industrial Chemistry pp. 671–677 (American Cancer Society, 2000).
Fiume, M. M. et al. Safety assessment of diethanolamine and its salts as used in cosmetics. Int. J. Toxicol. 36, 89S–110S (2017).
Final report on the safety assessment of cocamide DEA, lauramide DEA, linoleamide DEA, and oleamide DEA. J. Am. Coll. Toxicol. 5, 415–454 (1986).
Mirer, F. Updated epidemiology of workers exposed to metalworking fluids provides sufficient evidence for carcinogenicity. Appl. Occup. Environ. Hyg. 18, 902–912 (2003).
Shariq, L. et al. Irrigation of wheat with select hydraulic fracturing chemicals: evaluating plant uptake and growth impacts. Environ. Pollut. 273, 116402 (2020).
Shaffer, M. et al. AMON: annotation of metabolite origins via networks to integrate microbiome and metabolome data. BMC Bioinf. 20, 614 (2019).
Zeisel, S. H. & da Costa, K.-A. Choline: an essential nutrient for public health. Nutr. Rev. 67, 615–623 (2009).
Bernhard, W. et al. Choline concentrations are lower in postnatal plasma of preterm infants than in cord plasma. Eur. J. Nutr. 54, 733–741 (2015).
Ueland, P. M. Choline and betaine in health and disease. J. Inherit. Metab. Dis. 34, 3–15 (2011).
Kirman, C. R., Hughes, B., Becker, R. A. & Hays, S. M. Derivation of a no-significant-risk-level (NSRL) for dermal exposures to diethanolamine. Regul. Toxicol. Pharmacol. 76, 137–151 (2016).
Craciunescu, C. N., Wu, R. & Zeisel, S. H. Diethanolamine alters neurogenesis and induces apoptosis in fetal mouse hippocampus. FASEB J. 20, 1635–1640 (2006).
Lehman-McKeeman, L. D. et al. Diethanolamine induces hepatic choline deficiency in mice. Toxicol. Sci. 67, 38–45 (2002).
National Toxicology Program. NTP toxicology and carcinogenesis studies of diethanolamine (CAS no. 111-42-2) in F344/N rats and B6C3F1 mice (dermal studies). Natl Toxicol. Program Tech. Rep. Ser. 478, 1–212 (1999).
Korkes, H. A. et al. Lipidomic assessment of plasma and placenta of women with early-onset preeclampsia. PLoS ONE 9, e110747 (2014).
Casti, A. et al. Pattern of human blood spermidine and spermine in prematurity. Clin. Chim. Acta 147, 223–232 (1985).
Vidarsdottir, H. et al. Does metabolomic profile differ with regard to birth weight? Pediatr. Res. 89, 1144–1151 (2021).
Obayomi, S. B. & Baluch, D. P. Tyramine localization closely corelates to circular vesicles within the mouse uterine horn using correlational fluorescence and scanning electron microscopy. Microsc. Microanal. 26, 1348–1349 (2020).
Albaugh, V. L., Mukherjee, K. & Barbul, A. Proline precursors and collagen synthesis: biochemical challenges of nutrient supplementation and wound healing. J. Nutr. 147, 2011–2017 (2017).
Wu, G., Bazer, F. W., Cudd, T. A., Meininger, C. J. & Spencer, T. E. Maternal nutrition and fetal development. J. Nutr. 134, 2169–2172 (2004).
Strauss, J. F. Extracellular matrix dynamics and fetal membrane rupture. Reprod. Sci. 20, 140–153 (2013).
Zhou, X. et al. Impaired mitochondrial fusion, autophagy, biogenesis and dysregulated lipid metabolism is associated with preeclampsia. Exp. Cell. Res. 359, 195–204 (2017).
Sauer, M. M. et al. Binding of the bacterial adhesin FimH to its natural, multivalent high-mannose type glycan targets. J. Am. Chem. Soc. 141, 936–944 (2019).
Benito, R., Vazquez, J. A., Berron, S., Fenoll, A. & Saez-Nieto, J. A. Y. A modified scheme for biotyping Gardnerella vaginalis. J. Med. Microbiol. 21, 357–359 (1986).
Ghartey, J., Anglim, L., Romero, J., Brown, A. & Elovitz, M. A. Women with symptomatic preterm birth have a distinct cervicovaginal metabolome. Am. J. Perinatol. 34, 1078–1083 (2017).
Fashemi, B., Delaney, M. L., Onderdonk, A. B. & Fichorova, R. N. Effects of feminine hygiene products on the vaginal mucosal biome. Microb. Ecol. Health Dis. https://doi.org/10.3402/mehd.v24i0.19703 (2013).
Lanigan, R. S. & Yamarik, T. A. Final report on the safety assessment of EDTA, calcium disodium EDTA, diammonium EDTA, dipotassium EDTA, disodium EDTA, TEA-EDTA, tetrasodium EDTA, tripotassium EDTA, trisodium EDTA, HEDTA, and trisodium HEDTA. Int. J. Toxicol. 21, 95–142 (2002).
Evstatiev, R. et al. The food additive EDTA aggravates colitis and colon carcinogenesis in mouse models. Sci. Rep. 11, 5188 (2021).
Youn, H., Hong, K., Yoo, J.-W. & Lee, C. H. ICAM-1 expression in vaginal cells as a potential biomarker for inflammatory response. Biomarkers 13, 257–269 (2008).
Brownie, C. F. et al. Teratogenic effect of calcium edetate (CaEDTA) in rats and the protective effect of zinc. Toxicol. Appl. Pharmacol. 82, 426–443 (1986).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Catov, J. M. et al. Early pregnancy lipid concentrations and spontaneous preterm birth. Am. J. Obstet. Gynecol. 197, 610.e1–610.e7 (2007).
Nelson, T. M. et al. Vaginal biogenic amines: biomarkers of bacterial vaginosis or precursors to vaginal dysbiosis? Front. Physiol. 6, 253 (2015).
Bargossi, E. et al. The capability of tyramine production and correlation between phenotypic and genetic characteristics of Enterococcus faecium and Enterococcus faecalis strains. Front. Microbiol. 6, 1371 (2015).
Cornejo, O. E., Hickey, R. J., Suzuki, H. & Forney, L. J. Focusing the diversity of Gardnerella vaginalis through the lens of ecotypes. Evol. Appl. 11, 312–324 (2018).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
Wolrath, H., Forsum, U., Larsson, P.-G. & Borén, H. Analysis of bacterial vaginosis-related amines in vaginal fluid by gas chromatography and mass spectrometry. J. Clin. Microbiol. 39, 4026–4031 (2001).
Ravel, J. et al. Daily temporal dynamics of vaginal microbiota before, during and after episodes of bacterial vaginosis. Microbiome 1, 29 (2013).
Al-Memar, M. et al. The association between vaginal bacterial composition and miscarriage: a nested case–control study. BJOG 127, 264–274 (2020).
Mann, C., Dertinger, S., Hartmann, G., Schurz, R. & Simma, B. Actinomyces neuii and neonatal sepsis. Infection 30, 178–180 (2002).
Holst, E., Wathne, B., Hovelius, B. & Mårdh, P. A. Bacterial vaginosis: microbiological and clinical findings. Eur. J. Clin. Microbiol. 6, 536–541 (1987).
Moles, L. et al. Staphylococcus epidermidis in feedings and feces of preterm neonates. PLoS ONE 15, e0227823 (2020).
Baldini, F. et al. The Microbiome Modeling Toolbox: from microbial interactions to personalized microbial communities. Bioinformatics 35, 2332–2334 (2019).
Chen, K. C., Forsyth, P. S., Buchanan, T. M. & Holmes, K. K. Amine content of vaginal fluid from untreated and treated patients with nonspecific vaginitis. J. Clin. Invest. 63, 828–835 (1979).
Serrano, M. G. et al. Racioethnic diversity in the dynamics of the vaginal microbiome during pregnancy. Nat. Med. 25, 1001–1011 (2019).
Baraldi, E. et al. Untargeted metabolomic analysis of amniotic fluid in the prediction of preterm delivery and bronchopulmonary dysplasia. PLoS ONE 11, e0164211 (2016).
Souza, R. T. et al. Trace biomarkers associated with spontaneous preterm birth from the maternal serum metabolome of asymptomatic nulliparous women—parallel case–control studies from the SCOPE cohort. Sci. Rep. 9, 13701 (2019).
Aung, M. T. et al. Prediction and associations of preterm birth and its subtypes with eicosanoid enzymatic pathways and inflammatory markers. Sci. Rep. 9, 17049 (2019).
Ngo, T. T. M. et al. Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science 360, 1133–1136 (2018).
Leow, S. M. et al. Preterm birth prediction in asymptomatic women at mid-gestation using a panel of novel protein biomarkers: the Prediction of PreTerm Labor (PPeTaL) study. Am. J. Obstet. Gynecol. MFM 2, 100084 (2020).
Ghartey, J., Bastek, J. A., Brown, A. G., Anglim, L. & Elovitz, M. A. Women with preterm birth have a distinct cervicovaginal metabolome. Am. J. Obstet. Gynecol. 212, 776.e1–12 (2015).
Brunius, C., Shi, L. & Landberg, R. Large-scale untargeted LC-MS metabolomics data correction using between-batch feature alignment and cluster-based within-batch signal intensity drift correction. Metabolomics 12, 173 (2016).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. in Proceedings of the 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates, 2017).
Srinivasan, S. et al. Metabolic signatures of bacterial vaginosis. mBio 6, e00204–e00215 (2015).
Freitas, A. C., Bocking, A., Hill, J. E. & Money, D. M., VOGUE Research Group. Increased richness and diversity of the vaginal microbiota and spontaneous preterm birth. Microbiome 6, 117 (2018).
Howard, P. H. Handbook of Environmental Fate and Exposure Data For Organic Chemicals (CRC Press, 1990).
Wambaugh, J. F. et al. High throughput heuristics for prioritizing human exposure to environmental chemicals. Environ. Sci. Technol. 48, 12760–12767 (2014).
Wang, A. et al. Suspect screening, prioritization, and confirmation of environmental chemicals in maternal-newborn pairs from San Francisco. Environ. Sci. Technol. 55, 5037–5049 (2021).
Woodruff, T. J., Zota, A. R. & Schwartz, J. M. Environmental chemicals in pregnant women in the United States: NHANES 2003–2004. Environ. Health Perspect. 119, 878–885 (2011).
Bullard, R. D. Race and environmental justice in the United States. Yale J. Int. Law 18, 319 (1993).
Morello-Frosch, R. & Lopez, R. The riskscape and the color line: examining the role of segregation in environmental health disparities. Environ. Res. 102, 181–196 (2006).
Helm, J. S., Nishioka, M., Brody, J. G., Rudel, R. A. & Dodson, R. E. Measurement of endocrine disrupting and asthma-associated chemicals in hair products used by Black women. Environ. Res. 165, 448–458 (2018).
James-Todd, T., Senie, R. & Terry, M. B. Racial/ethnic differences in hormonally-active hair product use: a plausible risk factor for health disparities. J. Immigr. Minor. Health 14, 506–511 (2012).
Longnecker, M. P., Klebanoff, M. A., Zhou, H. & Brock, J. W. Association between maternal serum concentration of the DDT metabolite DDE and preterm and small-for-gestational-age babies at birth. Lancet 358, 110–114 (2001).
Ferguson, K. K. et al. Environmental phthalate exposure and preterm birth in the PROTECT birth cohort. Environ. Int. 132, 105099 (2019).
Fettweis, J. M. et al. Differences in vaginal microbiome in African American women versus women of European ancestry. Microbiology 160, 2272–2282 (2014).
Vyas, D. A., Eisenstein, L. G. & Jones, D. S. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N. Engl. J. Med. 383, 874–882 (2020).
Cooper, R. S., Kaufman, J. S. & Ward, R. Race and genomics. N. Engl. J. Med. 348, 1166–1170 (2003).
Ford, L. et al. Precision of a clinical metabolomics profiling platform for use in the identification of inborn errors of metabolism. J. Appl. Lab. Med. 5, 342–356 (2020).
Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.03426 (2020).
Douglas, G. M. et al. PICRUSt2 for prediction of metagenome functions. Nat. Biotechnol. 38, 685–688 (2020).
Liu, C. M. et al. BactQuant: an enhanced broad-coverage bacterial quantitative real-time PCR assay. BMC Microbiol. 12, 56 (2012).
Jian, C., Luukkonen, P., Yki-Järvinen, H., Salonen, A. & Korpela, K. Quantitative PCR provides a simple and accessible method for quantitative microbiota profiling. PLoS ONE 15, e0227285 (2020).
Heirendt, L. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v. 3.0. Nat. Protoc. 14, 639–702 (2019).
Heinken, A. et al. AGORA2: large scale reconstruction of the microbiome highlights wide-spread drug-metabolising capacities. Preprint at bioRxiv https://doi.org/10.1101/2020.11.09.375451 (2020).
Klitgord, N. & Segrè, D. Environments that induce synthetic microbial ecosystems. PLoS Comput. Biol. 6, e1001002 (2010).
Heinken, A., Sahoo, S., Fleming, R. M. T. & Thiele, I. Systems-level characterization of a host–microbe metabolic symbiosis in the mammalian gut. Gut Microbes 4, 28–40 (2013).
Baldini, F. et al. Parkinson’s disease-associated alterations of the gut microbiome predict disease-relevant changes in metabolic functions. BMC Biol. 18, 62 (2020).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Connil, N. et al. Identification of the Enterococcus faecalis tyrosine decarboxylase operon involved in tyramine production. Appl. Environ. Microbiol. 68, 3537–3544 (2002).
Orth, J. D., Thiele, I. & Palsson, B. Ø. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. in Proceedings of the 31st International Conference on Neural Information Processing Systems 3149–3157 (Curran Associates, 2017).
Hanley, J. A. & McNeil, B. J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148, 839–843 (1983).
We thank M. A. Elovitz, J. Ravel, K. D. Gerson, P. Gajer and L. Anton for initiating, collecting and sharing samples, for assistance in funding acquisition and for useful discussions. We thank the members of the Korem group, L. Shenhav, D. Zeevi, N. Bar and R. Wapner for useful discussions. The Motherhood and Microbiome cohort was funded by the National Institute of Nursing Research (NINR; R01NR014784). One of the datasets used was obtained from the database of Genotypes and Phenotypes (dbGaP) through dbGaP accession number phs001739.v1.p1. The current study was supported by NINR (R01NR014784), the Center for Precision Medicine at the University of Pennsylvania, the Vagelos Award provided by Columbia University Precision Medicine Initiative, the Program for Mathematical Genomics at Columbia University and the CIFAR Azrieli Global Scholarship in the Humans & the Microbiome Program. W.F.K. was supported by NIH T32GM007367 and F30HD108886. I.T. and A.H. were supported by grants from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 757922) awarded to I.T.
M.L. and T.K. are inventors on a provisional patent application related to this work. The other authors declare no competing interests.
Peer review information
Nature Microbiology thanks Rodman Turpin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
a, Distribution of metabolite super pathways among assayed metabolites. Metabolite super pathway assignments were provided by Metabolon. b, Distribution of metabolite prevalences across samples. The gray distribution shows the prevalences of all metabolites (N = 745); the blue distribution shows only named metabolites (N = 635). Dashed lines mark the 80% and 20% prevalence thresholds: 352 metabolites were detected in more than 80% of samples and 694 in more than 20% of samples.
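The prevalence counts in b can be illustrated with a minimal Python sketch. It assumes that a metabolite counts as detected in a sample when its raw area count is present and positive; the toy table and metabolite names are hypothetical, and this is not Metabolon's data format or the authors' code.

```python
from typing import Dict, List, Optional


def prevalence(values: List[Optional[float]]) -> float:
    """Fraction of samples in which a metabolite was detected.

    Assumption: None (or a non-positive value) means 'not detected'.
    """
    detected = sum(1 for v in values if v is not None and v > 0)
    return detected / len(values)


def count_prevalent(table: Dict[str, List[Optional[float]]], threshold: float) -> int:
    """Number of metabolites detected in MORE than `threshold` of samples."""
    return sum(1 for values in table.values() if prevalence(values) > threshold)


# Hypothetical toy table: 3 metabolites measured in 5 samples (raw area counts).
toy = {
    "metabolite_A": [1.2, 3.4, 2.2, 0.9, 5.1],      # detected in 5/5 samples
    "metabolite_B": [None, 2.0, None, 1.1, 4.0],    # detected in 3/5 samples
    "metabolite_C": [None, None, None, None, 7.5],  # detected in 1/5 samples
}
print(count_prevalent(toy, 0.8), count_prevalent(toy, 0.2))  # prints: 1 2
```

Because the thresholds are strict ("more than"), a metabolite detected in exactly 20% of samples is not counted at the 20% line.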
a, b, UMAP ordination of metabolomics data (N = 232), as in Fig. 1b, colored by the Pos Early, Pos Late and Polar platform batches (a; 2 batches) and by the Neg platform batches (b; 3 batches). See Supplementary Table 4 for the metabolites measured by each platform. The batch effect is limited and is statistically significant only for the 3 Neg platform batches (PERMANOVA P = 0.09 and P = 0.023 for 2 and 3 batches, respectively). c, Fraction of samples from each batch (y-axis; top, Pos Early, Pos Late and Polar platform batches; bottom, Neg platform batches) whose metabolite profiles were assigned to each metabolite cluster (MC; x-axis), shown for each MC separately. No significant batch effect was detected in MC assignments (two-sided Fisher’s exact P > 0.05 for all, without FDR correction). d, Heatmap showing the odds ratio for sPTB (color bar) for each metabolite from Fig. 2a (x-axis), from a logistic regression model adjusting for batch (according to the platform used for the metabolite, Supplementary Table 4), stratified by maternal race (y-axis). The exact odds ratio and confidence interval are written in the cell for all statistically significant associations (FDR < 0.1). e, sPTB classification accuracy (auROC, x-axis) for a prediction model similar to those used for the entire cohort (Fig. 4, Methods), that is: trained and evaluated in cross-validation on batch 1 (N = 114; orange; auROC = 0.66; one-sided permutation P = 0.44 for lower accuracy than models trained on random draws); trained on batch 1 (N = 114) and evaluated on batch 2 (N = 118; violet; auROC = 0.66; P = 0.46); trained and evaluated in cross-validation on batch 2 (N = 118; magenta; auROC = 0.66; P = 0.44); and trained on batch 2 (N = 118) and evaluated on batch 1 (N = 114; brown; auROC = 0.69; P = 0.66). The gray histogram (black line, KDE) shows the accuracy of models evaluated in cross-validation on random subsets of 116 samples from this cohort (mean auROC = 0.67). 
This analysis demonstrates that a prediction model trained on one batch generalizes well to the other, and that the accuracies obtained are in line with those expected for this limited sample size.
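The auROC and one-sided permutation P in e can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: auROC is computed as the probability that a randomly chosen preterm sample outscores a randomly chosen term sample, and the P value is taken as the share of random-draw auROCs at or below the observed one, using the (k + 1)/(n + 1) convention, which is an assumption.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (label 1) outscores a random
    negative (label 0); ties count as 1/2. Equals Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_sided_p_lower(observed: float, null_aurocs: Sequence[float]) -> float:
    """Empirical one-sided P for 'observed accuracy is lower than the null',
    i.e. the share of null auROCs at or below the observed value.
    The +1 in numerator and denominator is an assumed convention."""
    k = sum(1 for v in null_aurocs if v <= observed)
    return (k + 1) / (len(null_aurocs) + 1)


# Hypothetical example: 2 preterm (1) and 2 term (0) samples.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # prints: 0.75
```

In the setting of e, `null_aurocs` would hold the cross-validated auROCs of models trained on random subsets of 116 samples.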
a, b, Within-cluster sum of squared distances (a) and gap statistic (b) for k-medoids clustering using Canberra distances with k from 1 to 15. A shoulder (a) and a peak (b) are visible at k = 6. c, Heatmap showing metabolite levels for each subject (rows) and metabolite (columns). Subjects are sorted by their assigned metabolite cluster (MC), and metabolites are clustered hierarchically using Canberra distance and Ward linkage. The color above each column reflects metabolite annotations (legend to the right). d–f, Same as Fig. 1c, using PCA (d), Canberra distance-based PCoA (e) and t-SNE (f). g, Histogram of the consistency of MC assignment, defined as the fraction of samples assigned to the same MC (x-axis), across 100 iterations in which we randomly selected 90% of the cohort (209 women) and generated 6 metabolite clusters de novo. In 36 of the 100 iterations (36%), consistency exceeded 95%, and the mean consistency was 86%.
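The clustering in a, b can be illustrated with a minimal k-medoids sketch on Canberra distances. It is a simplified alternating (Voronoi-iteration) variant with a deterministic farthest-point initialisation, no random restarts and no PAM swap phase; these are assumptions for illustration and not necessarily the implementation used in the study. The toy points are hypothetical.

```python
from typing import List, Sequence, Tuple


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Canberra distance: sum of |x_i - y_i| / (|x_i| + |y_i|); 0/0 terms are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def k_medoids(data: List[Sequence[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids on a precomputed Canberra distance matrix."""
    n = len(data)
    dist = [[canberra(data[i], data[j]) for j in range(n)] for i in range(n)]
    medoids = [0]
    while len(medoids) < k:  # next medoid: the point farthest from existing medoids
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assign every sample to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member minimising total within-cluster distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return labels, medoids


def within_cluster_ss(data: List[Sequence[float]], labels: List[int], medoids: List[int]) -> float:
    """Sum of squared Canberra distances from each sample to its cluster medoid."""
    return sum(canberra(data[i], data[medoids[c]]) ** 2 for i, c in enumerate(labels))


# Hypothetical toy profiles forming two obvious groups.
pts = [[1, 1], [1.1, 1], [1, 1.2], [10, 10], [10, 11], [11, 10]]
labels, medoids = k_medoids(pts, 2)
print(labels, medoids)  # prints: [0, 0, 0, 1, 1, 1] [0, 3]
```

Plotting `within_cluster_ss` for k = 1 to 15 gives the kind of elbow curve shown in a; the gap statistic in b additionally compares this quantity against reference datasets and is not sketched here.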
a, Distribution of CSTs within each metabolite cluster, for all (top; N = 232), White (middle; N = 51) and Black (bottom; N = 173) women. Each group of bars corresponds to a single metabolite cluster, and the bars within a group sum to 100%. b, Same as Fig. 1d, stratified by race. P, two-sided Fisher’s exact P; q < 0.1. c, d, Same as Fig. 1b, c, colored by maternal race. P, PERMANOVA. e, f, Same as Fig. 1f, g, performed for all women combined. g, Same as Fig. 1g, for association with early sPTB (gestational age at birth < 32 weeks).
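The two-sided Fisher’s exact P and the q values in b can be sketched in pure Python as below. The legend does not name the FDR procedure, so Benjamini–Hochberg is assumed here; the 2 × 2 example is the classic textbook table, not data from this cohort.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:  # P(top-left cell = x | fixed margins)
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg q values (step-up FDR adjustment), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Classic 2x2 example: the two-sided P equals 34/70.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # prints: 0.4857
```

Each CST-by-cluster comparison would yield one P value; passing the full list to `benjamini_hochberg` gives the q values compared against the 0.1 threshold.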
a, Box and swarm plots (line, median; box, IQR; whiskers, 1.5 × IQR) of the levels of metabolites associated with sPTB, comparing preterm and term deliveries, stratified by maternal self-identified race. P, two-sided Mann–Whitney U test. b, Distribution (kernel density estimation) of four xenobiotics associated with sPTB or early sPTB across this cohort. Samples in which the metabolite was not detected are excluded. c, Same as Fig. 2a, for women not treated with progesterone. d, Heatmap showing metabolite sets altered in sPTB in various subsets of this cohort. Colors correspond to the two-sided P value of metabolite set enrichment analysis (Methods). Only associations with FDR < 0.1 are shown. e, Raw intensity levels measured across samples for the same four xenobiotics as in b, compared with measurements from plate negative process controls. Box mid-line, median; box, IQR; whiskers, 1.5 × IQR; vertical line, min–max range; dot, mean; N.D., not detected. N = 232 for diethanolamine; N = 230 for ethyl glucoside; N = 221 for tartrate; N = 232 for EDTA. f, Mass error for spectral matching (y-axis) for the same xenobiotics, compared with the mean mass error for all non-xenobiotic, tier 1 metabolites, showing that the four xenobiotics were identified with high confidence.
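The two-sided Mann–Whitney U test in a can be sketched as follows. This version uses average ranks for ties and the tie-corrected normal approximation without continuity correction; statistical packages may instead use exact P values for small samples or a continuity correction, so the numbers can differ slightly. The input values are hypothetical.

```python
from math import erfc, sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided P from the tie-corrected normal
    approximation (no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical metabolite levels in term vs preterm samples.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(p, 3))
```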
a, Same as Fig. 3a, but with each microbial taxon represented as an individual node. b, Volcano plot in which every point represents a microbe–metabolite association. The x-axis displays the difference between Spearman ρ values calculated separately among Black and White women. The y-axis displays the significance of the difference, using the two-sided Fisher r-to-z transformation. The horizontal maroon line designates P = 0.05. Gold points indicate associations for which the correlations among Black and White women differ in sign. c, d, Same as a, for associations only among Black (c) and White (d) women. e, Same as a, for metabolites associated with extremely or very preterm birth among Black women. f, Same as b, for the difference in associations between Black women who delivered extremely or very preterm and the rest of the Black women in the cohort.
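The comparison in b can be sketched as below: Spearman ρ is computed within each group, and the difference is tested with Fisher’s r-to-z transformation. The sketch assumes the standard variance 1/(n − 3) for each transformed correlation; some authors use 1.06/(n − 3) for Spearman ρ, and the legend does not specify which was used. It also requires |ρ| < 1 in both groups.

```python
from math import atanh, erfc, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho = Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Two-sided test for a difference between two independent correlations
    via Fisher's r-to-z transformation (assumed variance 1/(n - 3))."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return z, erfc(abs(z) / sqrt(2))


def sign_differs(r1: float, r2: float) -> bool:
    """True when the two group-wise correlations have opposite signs (gold points)."""
    return r1 * r2 < 0
```

For one microbe–metabolite pair, `spearman` would be applied separately to the samples from each group, and the two ρ values and group sizes passed to `compare_correlations`.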
Extended Data Fig. 7 Metabolic models provide accurate predictions of putrescine, histamine, and tyramine.
a–c, Putrescine (a), histamine (b) and tyramine (c) predictions derived from microbiome metabolic models (NMPC; Methods; y-axis) plotted against measured metabolite levels (x-axis), showing good accuracy for all three (Spearman ρ = 0.64, 0.54 and 0.62, respectively; P < 10⁻¹⁰ for all). d, Model coverage (y-axis; line, median; box, IQR; whiskers, 1.5 × IQR), defined as the fraction of total sample abundance represented by metabolic models, for each subgroup separately. Samples from White women had higher model coverage than samples from Black women, despite the lower accuracy of tyramine prediction in the former group. N = 173 for Black women; N = 21 for White women with sPTB; N = 30 for White women with term birth (TB). e, Spearman ρ between metabolic model predictions (NMPCs) and metabolite measurements (y-axis) for models restricted to at most the N most abundant species (x-axis). Because our metabolic models account for the abundance of each microbe, and the vaginal microbiome has a skewed abundance distribution, the models are robust to the lack of representation of low-abundance microbes.
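Model coverage in d and the top-N truncation in e can be illustrated with the sketch below. It assumes coverage is the summed relative abundance of taxa that have a metabolic model divided by the sample's total abundance; the sample, its abundances and the placeholder `Taxon_X` are hypothetical, and the sketch does not reproduce the NMPC computation itself.

```python
from typing import Dict, Set


def model_coverage(abundances: Dict[str, float], modeled: Set[str]) -> float:
    """Fraction of a sample's total abundance contributed by taxa that have a
    metabolic model."""
    total = sum(abundances.values())
    covered = sum(a for taxon, a in abundances.items() if taxon in modeled)
    return covered / total


def top_n_species(abundances: Dict[str, float], n: int) -> Dict[str, float]:
    """Keep only the n most abundant taxa, mirroring the truncation in panel e."""
    ranked = sorted(abundances.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


# Hypothetical sample dominated by one taxon (relative abundances);
# "Taxon_X" is a placeholder for a taxon without a metabolic model.
sample = {"Lactobacillus crispatus": 0.85, "Gardnerella vaginalis": 0.10, "Taxon_X": 0.05}
modeled = {"Lactobacillus crispatus", "Gardnerella vaginalis"}
print(round(model_coverage(sample, modeled), 2))  # prints: 0.95
print(round(sum(top_n_species(sample, 1).values()) / sum(sample.values()), 2))  # prints: 0.85
```

In a skewed community such as this one, keeping only the single most abundant taxon already retains most of the total abundance, which is the intuition behind the robustness shown in e.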
a, Receiver operating characteristic (ROC) curves comparing the performance of different sPTB prediction algorithms on metabolomics data. LightGBM (auROC = 0.81) outperforms logistic regression (auROC = 0.78, P = 0.017 for auROC comparison against LightGBM), support vector classification (auROC = 0.76, P = 2.9 × 10⁻⁴) and elastic net (auROC = 0.72, P = 0.004). b, ROC curves comparing the performance of a composite model stratified by race against a model trained on all samples. When evaluated in 10-fold cross-validation on sPTB prediction for Black women, a model trained on samples from all women achieves the same accuracy as a model trained only on samples from Black women (auROC of 0.83 and 0.82, respectively). However, when evaluated in 10-fold cross-validation on women who do not identify as Black, a model trained on samples from all women significantly underperforms a model trained only on samples from this subgroup (auROC of 0.64 vs 0.80, P = 4 × 10⁻⁷ for auROC comparison). Models trained separately on each subgroup do not generalize as well to the other subgroup (auROC of 0.64 and 0.65), indicating that a different model is learned for each subgroup. c, d, ROC (c) and precision-recall (PR; d) curves, evaluated in nested cross-validation, comparing sPTB prediction accuracy for models based on metabolomics data alone (auROC = 0.78, auPR = 0.61) and on metabolomics data combined with microbiome and clinical data (‘combination’; auROC = 0.76, auPR = 0.62; P = 0.44). e, SHAP83-based effect on total prediction (x-axis) for the top 10 features used in our combination models, sorted by descending importance. Each dot represents a sample, with the color corresponding to the metabolite level in the sample compared to all samples. f, g, ROC curves for the same metabolome-based (f) and microbiome-based (g) models as in Fig. 4a,b, when prediction is evaluated for extremely (<28 weeks of gestation) and very (<32 weeks) PTB. 
The microbiome-based models show higher accuracy for predicting extremely and very PTB (auROC of 0.69 and 0.62, respectively, compared with an auROC of 0.55 for all sPTB; P = 0.03 and P = 0.49, respectively). h, i, PR curves for sPTB prediction on two external cohorts, obtained using our metabolome-based predictor without retraining or adaptation. j, Same as e, for the microbiome-based model. Shaded lines in a–d, f, g show results from five independent 10-fold cross-validation draws (Methods). P values for comparisons between ROC curves are based on the two-sided test described in ref. 117.
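The auPR values and the ROC comparisons in this figure can be illustrated with the sketch below. Average precision is one common step-wise estimator of auPR and is assumed here, since the legend does not say which estimator was used. Ref. 117 appears to correspond to Hanley & McNeil (1983) in the reference list above; the sketch implements that paper's z statistic with the 1982 standard-error approximation, but the correlation `r` between the two auROCs, which Hanley & McNeil read from a lookup table, is left as a caller-supplied input rather than reproduced.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve: mean precision at the rank of each
    positive. Tied scores keep input order (assumption)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / k
    return ap / n_pos


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an auROC (Hanley & McNeil 1982 approximation)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return sqrt(var)


def compare_correlated_aucs(auc1: float, auc2: float, n_pos: int, n_neg: int,
                            r: float) -> Tuple[float, float]:
    """Two-sided z test for two auROCs computed on the same cases.
    `r` (correlation between the two auROCs) must be supplied by the caller."""
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = hanley_mcneil_se(auc2, n_pos, n_neg)
    z = (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return z, erfc(abs(z) / sqrt(2))


print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))  # prints: 0.8333
```

A positive `r` shrinks the denominator, so the same auROC difference yields a smaller P value when the two models rank the same cases similarly; this is why comparisons on shared samples can reach significance for modest differences.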
Supplementary Notes 1–2.
Supplementary Table 1 Raw metabolite measurements. Sample IDs refer to Supplementary Data 2 of ref. 14. Values are raw area counts.
Supplementary Table 2 Assignments of samples to metabolite clusters (MCs). Sample IDs refer to Supplementary Data 2 of ref. 14.
Supplementary Table 3 Metabolite origin predictions by AMON.
Supplementary Table 4 Metabolite annotations and extraction platforms.
Supplementary Table 5 Tyramine prediction accuracy with metabolic models using different media definitions. Values are Spearman ρ between NMPC values and tyramine measurements.
Supplementary Table 6 Shapley values of prediction models.
Supplementary Table 7 Chromatography and mass spectrometry parameters. Listed are all technical parameters for each of Metabolon’s LC–MS/MS platforms.
Supplementary Table 8 Tyramine, putrescine and histamine predicted NMPCs. Sample IDs refer to Supplementary Data 2 of ref. 14.
Supplementary Table 9 Assignments of SpeciateIT species to AGORA models. SpeciateIT species are the columns of Supplementary Data 2 of ref. 14.
Supplementary Table 10 Metabolites included in the vaginal media used in metabolic models. Listed are the metabolites included, along with their AGORA identifiers.
Supplementary Table 11 Hyperparameter sets used to optimize prediction models.
Supplementary Table 12 Parameters of final prediction models.
Supplementary Table 13 Measurement characteristics for highlighted xenobiotics.
About this article
Cite this article
Kindschuh, W.F., Baldini, F., Liu, M.C. et al. Preterm birth is associated with xenobiotics and predicted by the vaginal metabolome. Nat Microbiol 8, 246–259 (2023). https://doi.org/10.1038/s41564-022-01293-8