Trace biomarkers associated with spontaneous preterm birth from the maternal serum metabolome of asymptomatic nulliparous women - parallel case-control studies from the SCOPE cohort.

Prediction of spontaneous preterm birth (sPTB) in asymptomatic women remains a great challenge; accurate and reproducible screening tools are still not available in clinical practice. We aimed to investigate whether the maternal serum metabolome together with clinical factors could be used to identify asymptomatic women at risk of sPTB. We conducted two case-control studies using gas chromatography-mass spectrometry to analyse maternal serum samples collected at 15 and 20 weeks' gestation from 157 nulliparous women from Cork, and 164 from Auckland. Smoking and vaginal bleeding before 15 weeks were the only significant clinical predictors of sPTB for Auckland and Cork subsets, respectively. Decane, undecane, and dodecane were significantly associated with sPTB (FDR < 0.05) in the Cork subset. An odds ratio of 1.9 was associated with a one standard deviation increase in log(undecane) in a multiple logistic regression which also included vaginal bleeding as a predictor. In summary, elevated serum levels of the alkanes decane, undecane, and dodecane were associated with sPTB in asymptomatic nulliparous women from Cork, but not in the Auckland cohort. The association is not strong enough to be a useful clinical predictor, but suggests that further investigation of the association between oxidative stress processes and sPTB risk is warranted.


Methods
Study design and participants. Two case-control studies were conducted, using a subset of women from the Auckland, New Zealand and Cork, Ireland centres of the Screening for Pregnancy Endpoints (SCOPE) study 41.
The SCOPE study recruited 5,690 nulliparous low-risk women with a singleton pregnancy from New Zealand, Australia, Ireland and the United Kingdom between November 2004 and August 2008. Ethical approval was obtained from the local ethics committees in both Ireland and New Zealand (Cork: study number ECM5 (10) 05/02/08, Clinical Research Ethics Committee of the Cork Teaching Hospitals; New Zealand: study number AKX/02/00/364, Northern X Regional Ethics Committee) and all participants gave their written informed consent. Details on enrolment and inclusion criteria for the SCOPE study have been published elsewhere [38][39][40]. Collection of data and biological samples complied with standardized procedures in all participating centres and was conducted in accordance with the principles of the Declaration of Helsinki. Cases of sPTB were defined as those women who delivered before 37 weeks' gestation, and controls as those who delivered at or after 37 weeks' gestation.
(a) Cork, Ireland. The subset of SCOPE participants included in this study from the Cork, Ireland centre consisted of 55 cases of sPTB and 102 controls, matched to cases according to maternal age (±3 years) and maternal body mass index (BMI; ±5 kg/m²)*.
(b) Auckland, New Zealand. The subset of SCOPE participants included in this study from the Auckland, New Zealand centre consisted of 55 cases of sPTB and 109 controls, comprising 56 controls matched to cases according to maternal age (±3 years), and 53 controls matched to cases according to both maternal age (±3 years) and BMI (±5 kg/m²)*.
(*Our intention was to match each case to two controls; however, in Cork eight control subjects were excluded due to misclassification or lack of data; in Auckland one maternal age and BMI control was reclassified to a maternal age control and another was excluded due to technical difficulties.)
Outcome. Spontaneous preterm birth was defined as any birth that occurred before 37 weeks due to spontaneous onset of labour or premature rupture of membranes. Gestational age (GA) was estimated by last menstrual period (LMP) and confirmed by an ultrasound dating before 16 weeks of gestation. A discordance of seven or more days between LMP and ultrasound dating, or an unsure LMP, led to the estimation of GA exclusively by early ultrasound parameters. Term birth for the control group was defined as delivery at or after 37 weeks of gestation. Early sPTB before 34 weeks was considered for subgroup analysis.
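The dating and outcome rules described above can be expressed compactly. The following is a minimal sketch rather than the study's own code: the function names and the use of days as the unit are assumptions made for illustration, and the classification assumes the birth has already been judged spontaneous.

```python
PRETERM_DAYS = 37 * 7        # births before 37 weeks are preterm
EARLY_PRETERM_DAYS = 34 * 7  # births before 34 weeks form the early sPTB subgroup


def dated_gestational_age(lmp_days, ultrasound_days, lmp_certain=True):
    """Return the gestational age (in days) to use at the dating scan.

    LMP dating is kept unless the LMP is unsure or differs from the early
    ultrasound estimate by seven or more days.
    """
    if not lmp_certain or lmp_days is None:
        return ultrasound_days
    if abs(lmp_days - ultrasound_days) >= 7:
        return ultrasound_days
    return lmp_days


def classify_spontaneous_birth(ga_at_birth_days):
    """Label a spontaneous birth by gestational age at delivery (in days)."""
    if ga_at_birth_days < EARLY_PRETERM_DAYS:
        return "early sPTB (<34 weeks)"
    if ga_at_birth_days < PRETERM_DAYS:
        return "sPTB (<37 weeks)"
    return "term (>=37 weeks)"
```

Under these assumptions, an LMP estimate of 100 days against an ultrasound estimate of 107 days would be replaced by the ultrasound value, whereas a 4-day discordance would leave the LMP estimate in place.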
Sample collection and storage. Maternal blood samples were collected in two 6 mL vacutainers by venepuncture at 15 and 20 weeks (±1 week). Following clot formation, the vacutainers were centrifuged at 3,000 rpm for 10 min at 4 °C, followed by a second centrifugation at 4,000 rpm for 10 min at 4 °C. After centrifugation, the resulting serum (supernatant) was pipetted into a sterile tube and aliquots of 250 µL were dispensed into cryotubes. All samples were stored at −80 °C. The study followed standard best practice procedures for repositories in all steps of sample collection and storage, registering all SPREC (Standard PREanalytical Coding) data accordingly in an online database 42 .
For this study, two 250 µL maternal serum samples were obtained from the biobank for each participant, one from each time point (15 and 20 weeks' gestation, ±1 week).
Sample preparation and extraction. The serum samples underwent extraction and derivatization procedures based on the 2010 protocol of Smart et al. 43. Briefly, samples were thawed on ice at 4 °C and transferred from cryotubes to 1.5 mL microcentrifuge tubes. An internal standard (IS; 20 µL of 10 mM L-Alanine-2,3,3,3-d4) was added to all samples and the sample-IS mix was vortexed for 1 min. Samples were dried for 4 h at 0.8 hPa in a centrifugal vacuum concentrator with a −104 °C refrigerated vapour trap (Thermo Fisher Scientific Savant SC250EXP SpeedVac Concentrator with Savant SP5121P Refrigerated Vapour trap). Metabolites were extracted using 50% and 80% cold methanol-water solution (−20 °C, v/v). Specifically, 500 µL of 50% cold methanol-water solution was added to all samples, followed by vortexing for 1 min and centrifugation at 3,000 rpm for 5 min at −4 °C. After centrifugation, the supernatant was transferred to a fresh chilled microcentrifuge tube (kept on dry ice). Then, 500 µL of 80% methanol-water solution was added to the pellet and this was centrifuged at 3,000 rpm for 5 min at −4 °C. The supernatants obtained from both extraction steps were combined and dried in the centrifugal vacuum concentrator for 4 h at 0.8 hPa, with a −104 °C refrigerated vapour trap. Dried extracted samples were stored at −80 °C prior to derivatisation.
Negative controls were produced by subjecting an empty microcentrifuge tube to the same processing as the samples. Pooled quality control samples (QC) were produced by pooling a small amount from every sample, mixing, and then making aliquots of the same volume as the samples.
Derivatization was carried out using methyl chloroformate (MCF). Samples were derivatised in batches of 18-24, on the same day that they were analysed on the GC-MS. Samples were re-suspended with 400 µL of 1 M sodium hydroxide and were transferred to silanised glass tubes, followed by addition of 334 µL of methanol and 68 µL of pyridine. The sample was placed on a vortexer for the remainder of the derivatization process at ~1,500 rpm. The rate limiting step began from the addition of 40 µL of MCF. A second addition of 40 µL of MCF was made, 30 sec later. After another 30 sec, 400 µL of chloroform was added to extract the alkylated derivatives from the reaction mixture. After 10 sec, 400 µL of sodium bicarbonate (50 mM) was added. Centrifugation was used to separate the aqueous layer from the chloroform layer. After centrifugation, the aqueous layer was removed and the remaining chloroform extract was dehydrated by the addition of sodium sulphate (~0.3 g). The remaining liquid was then transferred to an amber glass GC-MS vial with a glass 33 µL insert.
Gas chromatography-mass spectrometry (GC-MS) analysis. The GC-MS instrument parameters were based on Smart et al. 43, with modifications. One microliter of sample was injected for analysis. The injector was set to 290 °C in splitless mode. The column flow was maintained at 1.0 mL min⁻¹ in constant flow mode. The column was a fused silica ZB-1701, 30 m long, with a 0.25 mm inner diameter and a 0.15 µm stationary phase consisting of 86% dimethylpolysiloxane and 14% cyanopropylphenyl (Phenomenex). Instrument grade helium (>99.99%, BOC) was used as the carrier gas for the analysis. The detector was run in positive-ion, electron-impact ionisation mode, at 70 eV electron energy. Identification of compounds was carried out using mass spectra acquired in scan mode from 38 to 550 atomic mass units. The Cork samples were analysed on an Agilent 7890A gas chromatograph coupled to a 5975C inert mass spectrometer. The Auckland samples were analysed on a Thermo Scientific Trace GC Ultra gas chromatograph coupled to an ISQ mass spectrometer.
Data extraction and compound identification. Data processing was semi-automated. The raw files obtained from the GC-MS were converted into common data form (CDF) files for analysis. Peaks were deconvoluted and identified using the Automated Mass Spectral Deconvolution and Identification System (AMDIS; http://www.amdis.net/) 44 against an in-house mass spectral library for MCF-derivatised metabolites (~210 compounds) developed by Silas Villas-Boas. The library contained mass spectra predominantly from certified reference standards. In addition to the in-house library, the National Institute of Standards and Technology (NIST) mass spectral library (NIST14, 163,198 compounds) was also used to identify peaks in the raw chromatograms. Since AMDIS is not able to batch deconvolute with the entire NIST library, a NIST subset library was constructed employing a method developed by Elizabeth McKenzie using pooled quality controls.
The NIST subset was produced using the top five results for each feature from the Agilent Chemstation PBM (Probability Based Matching) deconvolution program. MassOmics in-house software (version 2.3) was used to create the subset library. MassOmics is an R script based on the XCMS R package 45, with a Windows graphical user interface (GUI) developed by Ting-Li Han. MassOmics used XCMS and the AMDIS report to integrate the peak areas for each of the identified metabolites. The summary report obtained from running the MassOmics script was then checked, and peaks with low ID hits or with large retention time shifts, as well as laboratory contaminants, were removed.
Data was checked against negative controls to identify and remove background contaminants. Peaks that were not extracted correctly with XCMS were integrated separately using the Ion Extractor feature in MassOmics. Co-eluting peaks underwent manual integration. The relative abundance obtained for each metabolite was normalised to the internal standard (alanine-d4). After internal standard normalisation, the remaining technical variation was corrected for by analytical batch median centering using the samples. An analytical batch was defined as ~25 injections, comprising ~18 samples, four QCs, one alkane series, one negative control, and one standard mixture.
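The two normalisation steps can be illustrated with a short sketch. This is not the MassOmics implementation: the function names and peak areas are hypothetical, and the text does not state whether median centering was applied on the raw or the log scale, so the log2 scale used here is an assumption.

```python
import math
from statistics import median


def normalise_to_internal_standard(peak_area, internal_standard_area):
    """Relative abundance: metabolite peak area divided by the alanine-d4 peak area."""
    return peak_area / internal_standard_area


def batch_median_centre(log_values, batches):
    """Subtract each analytical batch's median from the log values in that batch."""
    by_batch = {}
    for value, batch in zip(log_values, batches):
        by_batch.setdefault(batch, []).append(value)
    batch_medians = {b: median(v) for b, v in by_batch.items()}
    return [value - batch_medians[batch] for value, batch in zip(log_values, batches)]


# Hypothetical peak areas for one metabolite measured in two analytical batches.
batches = ["A", "A", "A", "B", "B", "B"]
metabolite_areas = [200.0, 400.0, 800.0, 100.0, 200.0, 400.0]
internal_standard_areas = [100.0] * 6

relative = [normalise_to_internal_standard(a, s)
            for a, s in zip(metabolite_areas, internal_standard_areas)]
log_relative = [math.log2(r) for r in relative]  # log scale is an assumption
centred = batch_median_centre(log_relative, batches)
print(centred)
```

In this toy data, batch A reads twice as high as batch B throughout; after centering, both batches share the same values, which is the kind of batch-level technical offset the correction is meant to remove.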
Statistical analysis. Statistical analysis of the normalised mass spectral data was performed using R 3.4.3 (https://www.R-project.org) 46 . Data was analysed separately for each site. Clinical predictors were analysed for univariate associations with preterm birth, and a multivariate logistic regression model predicting preterm birth with these predictors was selected using stepwise logistic regression with AIC, starting from an intercept only model, followed by backward elimination of variables with multivariate p > 0.05. Mann-Whitney tests were used to analyse the difference in metabolite levels between cases and controls. Ratios between the 20 and 15-week levels of each metabolite were also assessed in this way, to identify situations where changes from baseline rather than absolute metabolite levels were associated with case status. To adjust for multiple comparison testing, false discovery rates (FDR) were calculated for each comparison using the Benjamini-Hochberg procedure 47 . Metabolites with an estimated FDR < 0.05 are reported.
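As an illustration of the multiple-testing adjustment, the sketch below computes Benjamini-Hochberg adjusted p-values in plain Python. The study itself used R; the function name and the p-values here are hypothetical and serve only to show how the FDR < 0.05 reporting threshold is applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical Mann-Whitney p-values for five metabolites.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw_p)
reported = [q < 0.05 for q in fdr]
print(fdr, reported)
```

Note that the third and fourth hypothetical metabolites have raw p < 0.05 but would not be reported, because their adjusted values exceed the 0.05 threshold.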
Where potential metabolite predictors were identified, we then built logistic regression and random forest models using the selected clinical factors and metabolites. For logistic regression, we assessed whether one or more metabolites improved the area under the receiver operating characteristic curve compared to the model based on clinical predictors alone 48,49. The average case-control difference in the linear predictor taken from the selected multivariate logistic regression model was also compared to average linear predictor differences from permuted data, to form a permutation test of model utility. Specifically, the model was refit to data where the response category was permuted to destroy any true association with the predictors. One thousand permutations were performed; the model was considered significantly better than random if the average linear predictor difference for the real data was better than 95% of the permuted replicates (p < 0.05). The random forest was assessed via a permutation test based on the out-of-bag error rate.
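The permutation test of model utility can be sketched as follows. This is an illustrative simplification, not the authors' R code: it uses a single predictor fitted by Newton-Raphson (the study's model also included vaginal bleeding), the data are synthetic, and the "+1" form of the p-value is an assumption.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, n_iter=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:  # guard against a singular information matrix
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def lp_difference(x, y):
    """Mean linear predictor in cases minus mean linear predictor in controls."""
    b0, b1 = fit_logistic(x, y)
    lp = [b0 + b1 * xi for xi in x]
    cases = [v for v, yi in zip(lp, y) if yi == 1]
    controls = [v for v, yi in zip(lp, y) if yi == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Refit the model on label-permuted data and compare with the observed statistic."""
    rng = random.Random(seed)
    observed = lp_difference(x, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if lp_difference(x, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic, overlapping predictor values: 20 controls and 20 cases.
x = [i / 10 for i in range(20)] + [1.0 + i / 10 for i in range(20)]
y = [0] * 20 + [1] * 20
p_value = permutation_p_value(x, y, n_perm=200)
print(p_value)
```

Permuting the response labels while leaving the predictors untouched preserves the distribution of the predictors but breaks any link to case status, so the permuted statistics describe what the fitted model would achieve by chance alone.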
To assess whether expanding the set of predictors might improve accuracy, the sparse partial least squares discriminant analysis (PLS-DA) method from the mixOmics package was employed 50 . The clinical variables, log transformed 15 and 20-week metabolite intensities, and 20 to 15-week ratios were used as candidate predictors. Up to three components with five predictors each were used. The number of components was selected based on the Mahalanobis distance balanced error rate using 10-fold cross validation, averaged over 10 repeats. This error rate was also used to assess the significance of this model via a permutation test.
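Because controls outnumber cases roughly two to one, a balanced error rate is more informative than overall accuracy. The sketch below shows only how a balanced error rate is computed from true and predicted labels; it does not reproduce the mixOmics sparse PLS-DA model or its Mahalanobis-distance prediction rule, and the labels are hypothetical.

```python
def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class misclassification rates."""
    rates = []
    for label in sorted(set(y_true)):
        predictions = [p for t, p in zip(y_true, y_pred) if t == label]
        errors = sum(1 for p in predictions if p != label)
        rates.append(errors / len(predictions))
    return sum(rates) / len(rates)


def overall_error_rate(y_true, y_pred):
    """Fraction of all samples that are misclassified."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


# Hypothetical predictions: 4 cases (1) and 8 controls (0).
y_true = [1, 1, 1, 1] + [0] * 8
y_pred = [1, 1, 0, 0] + [0] * 6 + [1, 1]

ber = balanced_error_rate(y_true, y_pred)       # (2/4 + 2/8) / 2
overall = overall_error_rate(y_true, y_pred)    # 4/12
print(ber, overall)
```

Here half of the cases are misclassified, which the balanced error rate (0.375) reflects more strongly than the overall error rate (about 0.33), since each class contributes equally regardless of its size.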
The above procedures were repeated for comparing preterm birth <34 weeks to the entire control group.

Metabolites identified.
In the Cork samples, 176 compounds were detected. Of these, 77 were identified using an in-house library of reference standards and the remaining compounds were identified using mass spectrum alone (NIST 2014 mass spectral library). Of the in-house library matches, 61 were putatively identified (80-100% match to a reference standard) and 15 were tentatively identified (60-79% mass spectral match). Of the NIST14 library identifications, 25 were putatively identified (80-100% mass spectral match), 58 were tentatively identified (60-79% mass spectral match), and three were unknown (<60% mass spectral match).
In the Auckland samples, 142 compounds were detected. Of these, 50 were identified using an in-house library of reference standards and the remaining compounds were identified using mass spectrum alone (NIST 2014). Of the in-house library matches, 41 were putatively identified, eight were tentatively identified, and one was unknown. Of the NIST14 library identifications, 51 were putatively identified, 34 were tentatively identified, and seven were unknown.

Metabolites and performance of predictive models.
Table 3 shows the metabolites associated with preterm birth <37 weeks and preterm birth <34 weeks, at each location and at each gestational time point (15 weeks or 20 weeks); ratios between the 15 and 20 week values were also assessed. Three metabolites detected at 20 weeks of gestation in the Cork subset were found to be significantly associated with sPTB (FDR < 0.05) when compared to term births: undecane, dodecane and decane. All three were found to have higher abundance in sPTB cases. Adding the natural log intensity of undecane to the clinical predictors in a logistic regression model estimates an odds ratio of 1.9 for a 1 standard deviation increase in log(undecane). This is significantly different from 1 (p = 0.0006), and the model as a whole was significant via permutation test (p = 0.001). The other metabolites were correlated with undecane (Pearson's correlation for log intensities r = 0.87 decane, r = 0.89 dodecane) (Suppl. Fig. a). Consequently, results were similar for the other metabolites added individually to the model, but adding multiple metabolites did not improve the model further. Figure 1 shows the receiver operating characteristic (ROC) curves for the clinical model only (area under the curve 0.60), and the clinical model with the addition of undecane (area under the curve 0.73) predicting sPTB <37 weeks. Supplementary Fig. b shows the ROC curve for the models predicting sPTB <34 weeks.
Similarly, the random forest model using log(undecane) and vaginal bleeding was significantly better than random (p = 0.002) although the classification success was modest (70%). However, models also including log(decane) or log(dodecane), or both, had error rates similar to those based on permuted data (p = 0.29, 0.08 and 0.08, respectively).
Sparse PLS-DA also produced a 1-component model for the Cork preterm birth data with an error rate that was significantly better than error rates produced for random permutations (p = 0.01). This model again included undecane, dodecane, decane, and vaginal bleeding. A fifth predictor, stearic acid measured at 15 weeks' gestation, was present; however, its loading was small, and it did not have an odds ratio significantly different from 1 (p = 0.07) when incorporated into the logistic regression with vaginal bleeding and undecane.
No metabolites or 20-15 week ratios met the false discovery rate threshold for the Cork <34 weeks data, for either preterm <37 weeks or <34 weeks in the Auckland data, nor were the sparse PLS-DA models significant (p = 0.06 preterm birth <34 weeks Cork; p = 0.07, preterm birth <37 weeks Auckland; p = 0.48 preterm birth <34 weeks Auckland).
Log-transformed undecane did not meet the FDR threshold for association with preterm birth <34 weeks in the Cork data (compared to birth ≥ 37 weeks), but was associated with preterm birth <37 weeks. We therefore examined the sensitivity of our model to the threshold for preterm birth. The odds ratios associated with a 1 SD change in log(undecane), in a model also including vaginal bleeding, when the threshold between preterm and term birth was reduced to 36, 35 and 34 weeks, were 2.1 (p = 0.0009), 2.2 (p = 0.0014), and 1.7 (p = 0.0571), respectively.
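For readers less familiar with odds ratios expressed per standard deviation, the reported value of 1.9 for preterm birth <37 weeks can be read as follows. The symbols $z$, $\mu$, $\sigma$, $\beta_0$, $\beta_1$ and $\beta_2$ are notation introduced here for illustration, and the model form is inferred from the description in the Methods rather than taken from the authors' code.

```latex
\begin{align}
z &= \frac{\log(\text{undecane}) - \mu}{\sigma}, \\
\operatorname{logit}\,\Pr(\text{sPTB}) &= \beta_0 + \beta_1 z + \beta_2\,\text{bleeding}, \\
\text{OR per 1 SD} &= e^{\beta_1} = 1.9 \;\Rightarrow\; \beta_1 = \ln 1.9 \approx 0.64, \\
\text{OR per 2 SD} &= e^{2\beta_1} = 1.9^{2} \approx 3.6 .
\end{align}
```

Under this reading, the odds multiply with each additional standard deviation of log(undecane), holding vaginal bleeding status fixed.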

Discussion
We have analysed potential predictors of sPTB using serum samples and clinical data from Cork and Auckland participants of the SCOPE study, an international cohort of low-risk nulliparous women. An untargeted metabolomics approach was applied to serum samples collected at 15 and 20 weeks of gestation. More than one hundred metabolites were identified in each subset (Cork and Auckland). As expected for the metabolomics method employed, the most common classes of metabolites were fatty acids, followed by amino acids. Only three metabolites from the 20-week serum of Cork participants were found to be significantly associated with sPTB. Vaginal bleeding before 15 weeks, and smoking during pregnancy were the only clinical factors associated with sPTB, in Cork and Auckland subsets, respectively. In the Cork cohort, adding undecane to a multivariate logistic regression model for predicting sPTB improved its performance over a model with vaginal bleeding alone. This improvement is robust to the definition of preterm birth, persisting when the preterm threshold was decreased to 36 or 35 weeks; at 34 weeks the improvement is no longer significant. We note that, as depicted in Fig. 1, there are clear average differences in undecane levels between case and control groups, but also substantial overlap, limiting the utility of the metabolite measurements in clinical practice.
There is a biologically plausible explanation for observing elevated alkanes (decane, undecane, dodecane) in the serum of mothers who had sPTB. Oxidative stress may lead to degradation of cell membranes by lipid peroxidation, followed by conversion of polyunsaturated fatty acids to volatile alkanes. Associations between several oxidative stress-associated processes and sPTB have been previously reported [51][52][53], and oxidative stress has been associated with elevated alkane levels in gastroenteric disease, lung disease, and other chronic diseases of metabolism [54][55][56]. Glutathione, an important intracellular antioxidant, has been found to be decreased in maternal and umbilical cord blood of very low preterm neonates and their mothers 57. Preterm birth seems to be associated with depletion of glutathione, reinforcing a possible increased oxidative status and lower antioxidant capacity. However, failure to demonstrate elevated alkanes among sPTB cases in the Auckland cohort limits our confidence, and there was no evidence these alkanes were associated with early sPTB (<34 weeks) at either study site. In addition, hypotheses describing the role of reactive oxygen species generation, metabolic and inflammatory imbalance and many other downstream mechanisms (telomerase reduction, cell apoptosis and senescence, etc.) caused by oxidative stress activation 51 do not clarify whether those mechanisms are trigger factors or consequences of underlying conditions/alterations resulting in preterm PROM and/or spontaneous onset of preterm labour.
The differences across study sites in the associations between sPTB and clinical factors such as smoking and vaginal bleeding in this case-control analysis also suggest that there are differences in the Cork and Auckland populations. The alkanes elevated in the Cork preterm birth group are present in outdoor and indoor air contaminants, which could potentially differ across study sites, providing an explanation for this observation. Associations of environmental exposures with maternal and perinatal health have been reported for many years but are not yet well established 58,59. Further investigation of individual pollutant exposure would be necessary to confirm whether elevated alkanes are associated with environmental exposure. It is also possible that technical rather than biological variability accounts for some of the differences observed across sites. Samples from the different sites, while analysed using the same protocols, were run on different platforms by different technical personnel. Thirty-four fewer metabolite species were detected in the Auckland samples, suggesting reduced sensitivity. In particular, two of the three metabolites elevated in cases in Cork (decane and dodecane) were not identified in the Auckland samples.
We also examined prediction of very preterm birth (<34 weeks). No clinical predictors were significant in the Auckland cohort, but vaginal bleeding before 15 weeks was associated with very preterm birth, as well as preterm birth, in the Cork cohort.
To the best of our knowledge, there are few studies in the literature applying metabolomics techniques to understand sPTB in asymptomatic pregnant women 33,[60][61][62] . Previous studies have used between 20 and 70 samples of a variety of biofluids (amniotic and cervicovaginal fluid, as well as serum), equally divided between cases and controls. There is also a diversity of pregnancy stages and analytical techniques, so it is not surprising that there is little overlap in the specific compounds identified. However, Virgiliou et al. also suggested that the changes in amino acids and lipids they observed could be related to oxidative stress 62 .
Our study has strengths and limitations. Cases and controls were selected from a large cohort comprising low-risk nulliparous women enrolled in early pregnancy and containing a high standard biobank. Several procedures were employed to assure data, sample, and analysis compliance and reliability according to Standard Operating Protocols. Shortcomings of our study include lack of data regarding cervical length, a previously reported risk factor, and assaying of the metabolome of the two cohorts at different times on different equipment. Genetic factors may also be contributors to PTB risk; however, these were not assessed in this study. In addition, we have not investigated predictive metabolites for the different preterm birth subtypes (spontaneous onset of preterm labour or preterm premature rupture of membranes), focusing instead on an early sPTB (<34 weeks) subgroup, due to the increased morbidity in this specific group. Using data and samples from women of different populations (Cork, Ireland; Auckland, New Zealand) enabled us to compare reproducibility of our technique and also to discuss possible local drivers for sPTB (alkane pollutants in Cork, Ireland, for instance). Cork and Auckland samples were analysed by different laboratory experts and instruments and, as such, the reproducibility and sensitivity differed across the sites.
While our finding of an association between elevated alkanes and sPTB at the Cork site is preliminary, it raises several interesting questions to pursue in the future. What differences are typical or expected across geographically distant study sites? What role might exposure to exogenous pollution sources, including alkanes, play in preterm birth? And finally, how might oxidative stress trigger, or be triggered by, processes leading to spontaneous preterm birth?