Untargeted analysis of first trimester serum to reveal biomarkers of pregnancy complications: a case–control discovery phase study

Understanding of the causal biology of hypertensive disorders of pregnancy (HDP) and preterm birth (PTB) is limited, and predictive biomarkers are lacking. First-trimester serum specimens from 51 cases of HDP, including 18 cases of pre-eclampsia (PE) and 33 cases of gestational hypertension (GH); 53 cases of PTB; and 109 controls were obtained from the Global Alliance to Prevent Prematurity and Stillbirth repository. Metabotyping was conducted using liquid chromatography high resolution mass spectrometry and nuclear magnetic resonance spectroscopy. Multivariable logistic regression was used to identify signals that differed between groups after controlling for confounders. Signals important to predicting HDP and PTB were matched to an in-house physical standards library and public databases. Pathway analysis was conducted using GeneGo MetaCore. Over 400 signals for endogenous and exogenous metabolites that differentiated cases and controls were identified or annotated, and models that included these signals produced substantial improvements in predictive power beyond models that only included known risk factors. Perturbations of the aminoacyl-tRNA biosynthesis, l-threonine, and renal secretion of organic electrolytes pathways were associated with both HDP and PTB, while pathways related to cholesterol transport and metabolism were associated with HDP. This untargeted metabolomics analysis identified signals and common pathways associated with pregnancy complications.

Statistical analysis. The Caret R package (version 6.0-84) and RStudio 3.6.1 were used for the UPLC-HR-MS model selection procedure based on cross validation, and SAS software 9.4 was used for the remaining data analysis. All demographic, behavioral, medical, and lifestyle factors available or harmonizable across questionnaires and potentially associated with exposure and the outcomes were examined as possible confounders. Covariates that were distributed differently in cases and controls (p value < 0.2), with the exception of previous history of complications (because causes of previous events might also cause events in the current pregnancy 23), were included in the initial stepwise models. Due to the sample selection criteria, "gravidity" was included in each of the stepwise models regardless of significance level.
Each case group was modeled separately using univariate and multivariable logistic regression models. All signals meeting the selection criterion, regardless of being identified/annotated, were considered in the analysis. The first set of multivariable regression models utilized all 3,122 signals, and due to the high dimensionality of the data we utilized a multi-step approach based on a fivefold cross validation (supplementary materials) 24. In addition, the 12 exogenous metabolites that were identified or annotated based on the untargeted LC-MS in-house physical standards library and differed by group were modeled using stepwise multivariable regression with p < 0.05 for retention; the covariates identified above were included regardless of the p value. The broad spectrum NMR data (195 bins) were modeled using stepwise multivariable regression. Stepwise regression was used for the exogenous signals and the NMR data due to the lower dimensionality of the data.
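The fivefold cross validation mentioned above can be illustrated with a minimal sketch of how samples might be partitioned. This is not the authors' procedure (that is described in their supplementary materials and was run with the caret package in R); the stratification by case status, the function names, and the seed are assumptions made for illustration only.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, spreading each class evenly.

    Stratifying by case/control status is an assumption of this sketch;
    the source states only that fivefold cross validation was used.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test


# Group sizes taken from the study: 51 HDP cases (1) and 109 controls (0).
labels = [1] * 51 + [0] * 109
splits = list(train_test_splits(labels))
for train, test in splits:
    n_cases = sum(labels[i] for i in test)
    print(f"train={len(train)} test={len(test)} cases in test={n_cases}")
```

In a setup like this, a candidate signal would be evaluated on each held-out fold after fitting on the other four, so that selection does not rely on a single split of a small sample.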
The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the prediction models. The strength and precision of the associations with individual metabolites were compared based on the odds ratios and the widths of the confidence intervals, respectively.
Pathway enrichment analysis. GeneGo MetaCore (Clarivate Analytics, PA) was used to assess the enrichment of perturbed metabolic pathways. For this analysis, metabolites were included if they had an ontology level (OL) of OL1 (RT, Mass, and MS/MS) or OL2a (RT and Mass), or were determined by NMR (Tables S2-S11). MetaCore uses the hypergeometric test, which represents the enrichment of certain metabolites in a pathway, together with the false discovery rate (FDR). A p value < 0.01 is considered indicative of significant enrichment in pathways.
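The hypergeometric test used for pathway enrichment can be sketched as an upper-tail probability. MetaCore's internal implementation is not described in the text, so the function below is a generic textbook version, and the counts in the usage example are hypothetical rather than values from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p value, P(X >= n_overlap).

    n_universe: metabolites in the background set
    n_pathway:  background metabolites that belong to the pathway
    n_selected: perturbed metabolites submitted for enrichment
    n_overlap:  perturbed metabolites that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to show the call.
p = hypergeom_enrichment_p(n_universe=1000, n_pathway=20, n_selected=50, n_overlap=6)
print(f"p = {p:.2e}, enriched at p < 0.01: {p < 0.01}")
```

The 0.01 cut-off in the usage line mirrors the threshold stated above; the FDR adjustment is omitted because the text does not say which procedure MetaCore applies.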
This secondary analysis of de-identified data and samples was ruled not human subjects research by the Tulane Institutional Review Board. All participants provided informed consent to recruitment into the GAPPS repository 25 .

Results
The largest proportion of included samples were from Yakima Valley Memorial Hospital, with 14.1% of participants from the University of Washington Medical Center and 14.1% from Swedish Medical Center; there was no statistically significant association between pregnancy complications and center where participants were enrolled. The mean participant age was between 29 and 31 for all case groups and for the controls (Table 1). A large majority had been pregnant before (78% of controls, 72-79% depending on case group). The majority of the participants were white (72% of controls, 69% of HDP cases, 53% of PTB cases). Early-pregnancy BMI of those with hypertensive disorders (mean for HDP cases, 34.7, SD 9.6) was higher than controls (mean 29.3, SD 7.8). Cases of overall PTB (29% ever smokers) and GH (39% ever smokers) were more likely to have smoked than controls (17%). Cases were more likely to have used street drugs prior to pregnancy (overall HDP 17%, overall PTB 15%) than controls (6%). Other variables that differed from controls for at least one case group are listed in Table 1. Besides gravidity, for the stepwise modeling, BMI and illegal drug use were included in models of HDP, GH, and PE; obesity and illegal drug use were selected for the PTB model; and no covariates remained in the sPTB model.
When examined one at a time, 337 signals were associated with HDP (p < 0.1) with 173 metabolites being identified or annotated (Table S2). When GH and PE were examined individually, 344 signals (with 173 being identified or annotated) were associated with GH (p < 0.1, Table S3), while 446 (with 189 being identified or annotated) were associated with PE (p < 0.1, Table S4).
www.nature.com/scientificreports/
Models including signals/metabolites determined by UPLC-HRMS (Table 2) showed significant improvements in the AUC over models constructed using only covariates. Among the signals/metabolites retained in the HDP models, the most precise associations were with an unknown signal with a neutral mass of 746.6045 Da and retention time at 0.59 min (0.59_746.6045n, reduced odds), and a signal annotated as pilocarpine (PDc, increased odds). The strongest effect sizes were for unidentified signals at 8.66_762.1452 m/z and 12.74_412.2842 m/z, both of which were associated with reduced odds. A signal annotated as 2,6-Di-tert-butyl-4-hydroxymethylphenol (BHT-OH) through matching with a public database by exact mass and MS/MS spectra (PDa) was strongly associated with GH. For the PE model, 4 signals were included; among them, an unidentified signal at 6.30_477.7721 m/z was most precise, while cerasinone (PDb) had the strongest effect size and bolasterone the most definite annotation (PDa) (Table 2). The signals/metabolites included in the overall HDP model were not the same as those included for models of each type of HDP, but signals/metabolites included in the final HDP model were associated with either GH or PE, and usually both, when examined individually (Tables S2-S4).
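The AUC comparison between covariate-only models and models that add metabolite signals can be illustrated with a small sketch. The AUC is computed here from its pairwise definition (the probability that a randomly chosen case is scored above a randomly chosen control, with ties counted as one half). The predicted probabilities are invented for illustration and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for three cases (1) and three controls (0).
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.60, 0.40, 0.50, 0.50, 0.30, 0.45]  # covariates only
extended_scores = [0.90, 0.70, 0.80, 0.60, 0.20, 0.30]  # covariates plus signals

baseline_auc = auc(baseline_scores, labels)
extended_auc = auc(extended_scores, labels)
print(f"baseline AUC = {baseline_auc:.3f}, extended AUC = {extended_auc:.3f}")
```

The quadratic loop is adequate for a few hundred samples, as in this study; a rank-based formula would be the usual choice for larger data sets.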
Over 246 signals were individually associated with PTB (p < 0.1), with 189 metabolites identified or annotated (Table S5); 298 signals were individually associated with sPTB (p < 0.1), with 135 metabolites identified or annotated (Table S6). In multiple logistic regression analysis, 5 signals were included in the PTB model, while 6 signals were included in the sPTB model (Table 2), all with similar precision (variance) and effect size (odds ratio). A common signal, with an RT of 15.66 min and an exact neutral mass of 770.4609 Da, was retained in both models. All of these metabolites were annotated with an evidence basis of PDc or below.
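Unidentified signals in these results are labeled by retention time and mass, for example 0.59_746.6045n (retention time 0.59 min, neutral mass 746.6045 Da) and 8.66_762.1452 m/z. A small parser for that labeling convention might look like the sketch below. The `Feature` class, its field names, and the regular expression are illustrative assumptions; under the same convention the shared PTB/sPTB signal would presumably be written 15.66_770.4609n, although that exact label does not appear in the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    rt_min: float   # retention time in minutes
    mass: float     # neutral mass (Da) or mass-to-charge ratio
    mass_type: str  # "neutral" or "m/z"


# <retention time>_<mass> followed by "n" (neutral mass) or "m/z".
_LABEL = re.compile(r"^(\d+(?:\.\d+)?)_(\d+(?:\.\d+)?)\s*(n|m/z)$")


def parse_feature(label):
    """Split a label such as '0.59_746.6045n' into its numeric parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized feature label: {label!r}")
    rt, mass, suffix = match.groups()
    return Feature(float(rt), float(mass), "neutral" if suffix == "n" else "m/z")


for text in ["0.59_746.6045n", "8.66_762.1452 m/z", "6.30_477.7721 m/z"]:
    print(parse_feature(text))
```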
In the NMR analysis (Table 3; unadjusted results in Tables S6-S10), bins containing signals that could be derived from asparagine/albumin were associated with HDP (OR 0.17, 95% CI 0.04-0.74), and bins containing signals that could be derived from asparagine/N,N-dimethylglycine/trimethylamine were associated with PE (OR 0.16, 95% CI 0.05-0.52). Threonine and urea were associated with reduced risk of PTB and sPTB, respectively, but did not add significantly to the predictive value of the model.
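Because precision was judged from confidence-interval width, the two NMR associations above can be compared on the log-odds scale, where intervals for odds ratios are roughly symmetric. The sketch below assumes Wald-type 95% intervals (an assumption; the text does not state how the intervals were computed) and recovers an approximate standard error from the reported limits.

```python
from math import log, sqrt

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def log_ci_width(lower, upper):
    """Width of an odds-ratio confidence interval on the log scale."""
    return log(upper) - log(lower)


def approx_se(lower, upper):
    """Approximate standard error of the log odds ratio (Wald assumption)."""
    return log_ci_width(lower, upper) / (2 * Z_95)


# Intervals reported in the text for the NMR bins.
hdp_se = approx_se(0.04, 0.74)  # asparagine/albumin bin, HDP
pe_se = approx_se(0.05, 0.52)   # asparagine/N,N-dimethylglycine/trimethylamine bin, PE

# Under the Wald assumption the point estimate is the geometric mean of the limits.
hdp_mid = sqrt(0.04 * 0.74)
pe_mid = sqrt(0.05 * 0.52)

print(f"HDP: SE ~ {hdp_se:.2f}, implied OR ~ {hdp_mid:.2f}")
print(f"PE:  SE ~ {pe_se:.2f}, implied OR ~ {pe_mid:.2f}")
```

The implied midpoints round to the reported odds ratios (0.17 and 0.16), which is consistent with the symmetric-on-the-log-scale assumption, and the narrower log-scale interval for PE indicates the more precise of the two estimates.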
An additional aim of our study was to evaluate the correlation between environmental exposures and pregnancy complications. Over 20 metabolites derived from exogenous compounds were identified or annotated (OL1, OL2a, and OL2b), and over a dozen metabolites derived from exogenous exposures differentiated case-control status (univariable logistic regression analysis, p < 0.1). These included metabolites of bisphenols, parabens, phthalates, polyphenols, and medications (Tables S2-S6). Monohexyl phthalate was associated with HDP and GH (Table 4), while salicylamide was associated with PE. (R,S)-N-Acetyl-S-(2-hydroxy-3-buten-1-yl)-l-cysteine was associated with reduced odds of sPTB.
For pathway analysis, metabolites that were perturbed between cases and controls with an evidence basis of OL1 (RT, MS, MS/MS) or OL2a (RT, MS) included individual steroid hormones, acetylcarnitines, nucleosides, hydroxyl short-chain fatty acids, and exogenous metabolites. Thirty pathways were found to be associated with HDP, with 24 associated with GH and 37 associated with PE, while 15 pathways were associated with PTB and 9 with sPTB (Fig. 1 and Table 5). Five perturbed pathways were associated with all the investigated complications: aminoacyl-tRNA biosynthesis, l-threonine, renal secretion of organic electrolytes, and urea cycle. HDP, GH and PE were also highly overlapping in pathways related to cortisol biosynthesis, cholesterol and sphingolipid transport, lipoprotein metabolism, and metabolic syndrome/type 2 diabetes. Pathways associated with PTB and/or sPTB were related to cortisol production activation in depression, renal secretion of drugs, transcription role of Vitamin D receptor in regulation of genes involved in osteoporosis, immune responses, and tyrosine metabolism.

Discussion
In this untargeted metabolomic analysis of first trimester serum samples, we identified and annotated several endogenous and exogenous metabolites associated with complications of pregnancy, and showed that metabolites significantly improved the predictive value of models over known risk factors. The number of features differentiating cases and controls and the number of identified/annotated features were lower for PTB than for HDP; this may indicate that PTB is a more heterogeneous condition. The investigation used a discovery-based (i.e., untargeted) approach, which could lead to biomarker(s) useful in clinical practice. Unlike analyses that focused mainly on a few signals with identification/annotation 13,16,26, we created models using all signals for a more comprehensive analysis. Some signals used in the modelling approach could be identified through retention time, mass, and fragmentation, while others were annotated through public databases or remained unknown. The identifications and annotations in our study provide evidence-based ontology levels, which is important for data comparison and harmonization in future collaborations.
Table 3. NMR metabolites associated with HDP and PTB in cross-validated multiple logistic regression model (NMR). Adjustment factors: HDP: BMI at first prenatal care visit, illegal drug use in the year before pregnancy, gravidity; GH: BMI at first prenatal care visit, illegal drug use in the year before pregnancy, gravidity; PE: illegal drug use in the year before pregnancy, gravidity; PTB: obesity, illegal drug use in the year before pregnancy, gravidity; sPTB: gravidity. HDP, any hypertensive disorder of pregnancy; GH, gestational hypertension; PE, pre-eclampsia; PTB, preterm birth; sPTB, spontaneous preterm birth; GDM, gestational diabetes mellitus. a Baseline model includes only adjustment variables.
Exogenous metabolites: Monohexyl phthalate was correlated with HDP and GH, and phthalate metabolites were weakly associated with decreased blood pressure in the second trimester in one previous study 27. The correlation between salicylamide and PE may be due to the usage of aspirin-like medication (such as Labetalol, 2-hydroxy-5-[1-hydroxy-2-[(1-methyl-3-phenylpropyl)amino]ethyl]benzamide monohydrochloride) in hypertensive women 28. (In our study, salicylamide levels were higher for the 5 women in the study, 4 cases and 1 control, who had chronic hypertension.) (R,S)-N-Acetyl-S-(2-hydroxy-3-buten-1-yl)-l-cysteine (MHB2) is a metabolite generated in vivo after exposure to 1,3-butadiene via smoking or air pollution 29; the link we found between MHB2 and sPTB is consistent with previous studies finding associations with these toxicants 30,31.
Individual metabolites, HDP: Our study identified multiple signals with strong predictive value for HDP. We attempted to match signals to our in-house library of standards run under identical conditions to the study samples, as well as to public databases. These signals could not be identified using evidence of retention time and/or MS/MS spectra pattern. Therefore, we provided the tentative annotation and chromatographic/spectra information for those important signals, which might be helpful for identification/annotation using other data mining technologies in the future 22,32. We found a large number of metabolic profiles that were significantly perturbed (p < 0.1) between cases and controls (Tables S2-S11 in supplementary materials). Although none of these identified/annotated metabolites was predictive enough to be used as a clinical biomarker, most of our findings in metabolic profiles (Tables S2-S11) are highly consistent with the New Zealand SCOPE cohort 33, as well as other discovery-phase studies 34,35. One of the signals with predictive value for PE matched to an androgen steroid hormone, and the PE-associated perturbation of steroid hormones was also reported in the SCOPE study 33. Increased androgens are correlated with vascular dysfunction in HDP, interrupting oxygen and nutrient transport from the maternal blood supply 36. In the GH model, 2,6-Di-tert-butyl-4-hydroxymethylphenol (BHT-OH, PDa) was predictive. This compound is a metabolite of 2,6-Di-tert-butyl-4-methylphenol (BHT), a synthetic phenolic antioxidant used widely in foods, polymers, and cosmetics to slow oxidation. Some BHT metabolites have been found to induce cellular DNA damage and the chemical was placed on the European Union watch list in 2015 37.
Table 4. Association between exposure to exogenous chemicals and pregnancy complications by stepwise modeling. HDP, any hypertensive disorder of pregnancy; GH, gestational hypertension; PE, pre-eclampsia; PTB, preterm birth; sPTB, spontaneous preterm birth; GDM, gestational diabetes mellitus. a The exogenous metabolites include the exposed parent chemicals and their metabolites or conjugates that are formed in vivo after the exposures. The exogenous metabolite was identified or annotated via matching to an In-house Experimental Standards Library that contains over 300 exogenous metabolite standards, which were prioritized for data acquisition based on previous findings in human exposure studies. b p < 0.10 after adjustment for covariates. Single metabolite models include only one metabolite as well as covariates; stepwise models include all predictive metabolites simultaneously and retain those with p < 0.05. Adjustment factors: HDP: BMI at first prenatal care visit, illegal drug use in the year before pregnancy, gravidity; GH: BMI at first prenatal care visit, illegal drug use in the year before pregnancy, gravidity; PE: illegal drug use in the year before pregnancy, gravidity; PTB: obesity, illegal drug use in the year before pregnancy, gravidity; sPTB: gravidity. c Ontology levels: Identification or annotation of exogenous metabolites is supported by evidence from chromatography, e.g., retention time (RT), and/or mass spectrometry, e.g., exact mass (MS) and/or tandem mass spectra (MS/MS). OL1, highly confident identification based on matching with the In-house physical standard library (IPSL) via retention time (RT, with absolute RT error ≤ 0.5), exact mass (MS, with mass error < 5 ppm), and tandem mass similarity (MS/MS, with similarity ≥ 30); OL2a, confident identification based on matching with IPSL via MS and RT; OL2b, annotation for the isomer or derivatives of the compound listed, based on matching with IPSL via MS and MS/MS. d No variables p < 0.05.
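The ontology-level rules given in footnote c of Table 4 can be written as a short decision function. This is a sketch, not the authors' software: the numeric thresholds are those quoted for OL1, and reusing the same thresholds for OL2a and OL2b is an assumption, as are the function and argument names. The unit of the RT error is not stated in the footnote, so it is left unitless here.

```python
def ppm_error(observed_mass, reference_mass):
    """Signed mass error in parts per million relative to the reference."""
    return (observed_mass - reference_mass) / reference_mass * 1e6


def ontology_level(rt_error=None, mass_error_ppm=None, msms_similarity=None):
    """Return 'OL1', 'OL2a', 'OL2b', or None for a candidate library match.

    Pass None for any type of evidence that is unavailable.
    """
    rt_ok = rt_error is not None and abs(rt_error) <= 0.5
    ms_ok = mass_error_ppm is not None and abs(mass_error_ppm) < 5
    msms_ok = msms_similarity is not None and msms_similarity >= 30

    if rt_ok and ms_ok and msms_ok:
        return "OL1"   # RT, exact mass, and MS/MS all match
    if rt_ok and ms_ok:
        return "OL2a"  # RT and exact mass match
    if ms_ok and msms_ok:
        return "OL2b"  # exact mass and MS/MS match (isomer or derivative)
    return None


# Hypothetical candidate matches against a standards library.
print(ontology_level(rt_error=0.1, mass_error_ppm=2.0, msms_similarity=45))
print(ontology_level(rt_error=0.1, mass_error_ppm=2.0))
print(ontology_level(mass_error_ppm=2.0, msms_similarity=45))
print(ontology_level(rt_error=0.1, mass_error_ppm=8.0, msms_similarity=45))
```

For scale, a measured mass of 746.6082 against a reference of 746.6045 gives `ppm_error(746.6082, 746.6045)` of just under 5 ppm, so at this mass the tolerance corresponds to a few thousandths of a dalton.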
Only elevated acylcarnitine and decreased taurine levels have repeatedly been found to relate to PE in previous metabolomic studies 6. Neither was included in our final model, but butenylcarnitine and 3-hydroxyhexanoyl carnitine were associated with higher odds of HDP in univariate models (Table S2); no association was found with taurine.
Individual metabolites, PTB: Most of the signals retained in the final models for PTB and sPTB were identified with public database matching. Of the metabolites we found that were associated with PTB in this analysis, only threonine had been previously associated with PTB, with a negative association 20. Our previous review of metabolomics and PTB found little consistency across studies, with only myoinositol, creatinine, histidine, and 5-oxoproline associated across multiple studies 20. Among these, in our analysis, only histidine was weakly associated with PTB, and it was not retained in final models.
Common pathways: Pathways involved in protein synthesis (aminoacyl-tRNA biosynthesis), threonine metabolism, urea cycle, and renal secretion of organic electrolytes were perturbed in both HDP and PTB. Protein synthesis and amino acid metabolism play important roles in maternal and fetal health. Pregnant women who have inherited metabolic disorders in protein and amino acid metabolism are more likely to develop pregnancy complications, indicating burdens in urea nitrogen clearance 38 . A previous study of late-onset pre-eclampsia also found associations with aminoacyl-tRNA synthesis (though they were not statistically robust) 39 . Perturbation of the renal secretion of organic electrolytes pathway may indicate changes in the kidney proximal tubule related to xenobiotic metabolism 40 .
Pathways and individual complications: Multiple pathways were perturbed in the early part of pregnancies that later developed HDP. Several lipid-related pathways were associated with HDP, consistent with the disruptions of lipid metabolism that have been demonstrated in HDP 41,42. The leucine, valine, and isoleucine metabolism pathway, related to both HDP and PTB in these data, was previously associated with late-onset preeclampsia 39. 4-hydroxyglutamate, identified as a strong predictor of PE in a previous study 16, was not associated in our analysis. However, it is involved in the arginine-proline metabolism pathway, one of the pathways identified for HDP, and is a substrate that produces 4-hydroxy-2-oxoglutarate, an intermediate on several pathways identified in this analysis. Pathways related to oxidative stress, nitrous oxide signaling, and inflammatory signaling were associated only with PE, suggesting that the oxidative stress and inflammation leading to severe damage in endothelial function might contribute to the more severe pathology of PE. Fewer pathways were associated with PTB and the associations were less strong, but some were intriguing. For instance, the pathways related to activation of cortisol pathways in major depressive disorder were perturbed, and cortisol and depression have both been previously related to PTB 43.
Figure 1. Venn diagram of metabolic pathways perturbed between cases (e.g., GH, HDP, PE, PTB, and sPTB) and controls. Pathway enrichment was conducted by GeneGo MetaCore using Enrichment by Pathway Maps, and the cut-off for pathway enrichment is p < 0.01. Each section of the diagram is labeled with capital letters (A, B, C, D, E) and the number of pathways that were specific to a certain phenotype (in the region with a single capital letter) or overlapping between different phenotypes (in the region with a combination of letters). The lists of pathways corresponding to each section are shown in Table 5.
Strengths of the study include the first-trimester sampling and strong QC for both the sample collection and the spectroscopic analyses. Limitations include the small sample size, lack of detailed information on subtypes of PE and PTB, lack of a replication sample, the single-timepoint sample, and the limited number of African-American participants.
This study contributes to the growing literature on metabolites associated with pregnancy complications and suggests that perturbations of several common pathways are associated with both HDP and PTB. The metabolomic field needs to report the evidence basis for identifications and annotations in order to increase the usability of reported findings.
Table 5. Enriched metabolic pathways perturbed between different case groups and control group (corresponding to Fig. 1). Metabolic pathways between cases (e.g., GH, HDP, PE, PTB, and sPTB) and controls were enriched by GeneGo MetaCore using Enrichment by Pathway Maps. a Capital letters (A, B, C, D, E, and their combinations) and numbers correspond to the Venn diagram in Fig. 1. Pathways in a region with a single capital letter are specific to a certain phenotype, while pathways in a region with a combination of letters indicate the common pathways being impacted in different phenotypes. A, gestational hypertension (GH); B, hypertensive disorder of pregnancy (HDP); C, pre-eclampsia (PE); D, preterm birth (PTB); E, spontaneous preterm birth (sPTB). b The p value that was generated from the hypergeometric test in MetaCore indicates the significance of enrichment of metabolites in pathway mapping. A lower p value indicates a higher significance in the pathway enrichment. c FDR = false discovery rate.