First-trimester vaginal microbiome diversity: A potential indicator of preterm delivery risk

Preterm birth is a leading cause of neonatal mortality worldwide. Hospitalization costs associated with preterm deliveries present a huge economic burden. Existing physical/biochemical markers for predicting preterm birth risk are mostly suited for application at mid/late pregnancy stages, thereby leaving very little time (between diagnosis and delivery) for adopting appropriate intervention strategies. Recent studies indicating correlations between pre/full-term delivery and the composition of vaginal microbiota in pregnant women have opened new diagnostic possibilities. In this study, we performed a thorough meta-analysis of vaginal microbiome datasets to evaluate the utility of popular diversity and inequality measures for predicting, at an early stage, the risk of preterm delivery. Results indicate significant differences (in diversity measures) between ‘first-trimester’ vaginal microbiomes obtained from women with term and preterm outcomes, indicating the potential diagnostic utility of these measures. In this context, we introduce a novel diversity metric that has significantly better diagnostic ability than established diversity measures. The metric enables ‘early’ and highly accurate prediction of preterm delivery outcomes, and can potentially be deployed in clinical settings for preterm birth risk-assessment. Our findings have potentially far-reaching implications in the fight against neonatal deaths due to preterm birth.

The clinical predictive values of the stated factors are however quite limited. Pathologic manifestations such as bacterial vaginosis or urinary tract infections during pregnancy, unless properly addressed, are also known to adversely impact pregnancy outcome in most cases [15-18]. Several studies have also investigated associations between 'physical markers' (e.g. cervical length, uterine artery pulsatility index, etc.) and the eventual pregnancy outcome (term or preterm) [19,20]. Most studies indicate that cervical length shortening in the second trimester of pregnancy is associated with a higher risk of spontaneous preterm birth [19,21-23]. Cervical length appears to be more useful for its good negative predictive value than for indicating an impending preterm birth. CerviLenz is a commercially available device for measuring cervical length [24].
A few biochemical markers identified from cervico-vaginal secretions, amniotic fluid, urine, saliva, serum, and plasma also find utility in assessment of premature delivery risk (Fig. 1). These include inflammation markers like cervical IL-6 and serum C-reactive protein (CRP), and other proteins like fetal fibronectin, β-hCG, placental α-microglobulin, etc. [10,25-30]. Amongst these, fetal fibronectin (fFN) has been reported to be the most effective marker, having more than 60% sensitivity in predicting spontaneous preterm births, based on sampling done during ~22-24 weeks of gestation [27,28]. Inflammation markers like IL-6, tested at a similar time-period during pregnancy, can also predict an impending preterm delivery, albeit with lower sensitivity [10,31]. On the other hand, hormonal markers like β-hCG have been reported to predict spontaneous preterm birth outcomes with high sensitivity, but are applicable only during late stages of pregnancy (~34 weeks of gestation) [32,33]. Some recently developed in vitro diagnostic tests rely on comprehensive proteomic and metabolomic analyses of biological samples (blood, amniotic fluid, etc.) and combine multiple risk predictors in order to increase sensitivity of prediction. For example, while a test offered by Sera Prognostics provides risk assessment based on the detected levels of two blood proteins (viz. SHBG & IBP4) [34,35], another test offered by Metabolon relies on a metabolomic analysis of the amniotic fluid [36]. It is likely that the high costs and limited accuracy (~60-80%) of the methods depicted in Fig. 1 deter gynaecologists and public health organisations from recommending routine (wide-spread) clinical usage. The false positive prediction rates of the depicted methods also remain high [10,11]. Even a single false-positive prediction subjects a pregnant woman to unnecessary mental turmoil, whilst also incurring personal/state-funded diagnostic and monitoring costs.
In a significant shift from physical and/or biochemical diagnostic markers, a few recent studies have indicated the potential of employing characteristics of vaginal microbial communities (in pregnant women) as a diagnostic marker for predicting pre/full term outcomes [23,37-44]. Observations indicate that taxonomic profiles derived from vaginal microbiomes of preterm subjects tend to cluster as a somewhat distinct group (typically referred to as a CST, viz. community state type). The consistent presence of species belonging to known pathogenic genera such as Gardnerella, Atopobium, Ureaplasma, etc. (with certain abundance) in samples grouping into a preterm delivery associated CST has fuelled a lot of research focus in this direction [37]. Reports from these efforts also indicate subtle differences in alpha-diversity metrics (particularly with respect to species diversity and evenness measures) between taxonomic profiles obtained from vaginal microbiome samples taken from preterm and full-term subjects [38,40].
In this study, we performed a systematic analysis of taxonomic diversity profiles corresponding to 1621 publicly available vaginal microbiomes (sampled from 303 pregnant women) pooled from four recent studies [37,38,44,45]. Table 1 provides details of the four studies. The aim was to investigate temporal differences, if any, between the community structures of vaginal microbiomes sampled from pregnant women during various stages of their pregnancy. Further, suitable experiments were designed for evaluating and comparing the efficiency of various diversity measures in differentiating between vaginal microbiomes sampled from pregnant women with reported "term" or "preterm" delivery outcomes. The overall objective of this study was to address the following question: are there potential (temporal) signatures in the microbial community structure of vaginal samples (in pregnant women) that can indicate predisposition to preterm birth?

Results and Discussion
Changes in vaginal microbiome diversity across various stages of pregnancy were first evaluated using Shannon diversity [40,46] as a metric. In order to obtain a clear picture of microbial community transition at various time points in pregnancy, and to minimize the effects of outliers, taxonomic profiles corresponding to 1621 samples pooled from 4 studies (Table 1) were divided into 15 overlapping week-wise groups. The Shannon diversity value of each sample was computed from its taxonomic abundance profile. Figure 2 depicts the Shannon diversity trends across the 15 temporally overlapping groups for the evaluated 'term' and 'preterm' samples. Results indicate that women with preterm delivery outcomes tend to have lower diversity in their vaginal microbiome during their first 15-20 weeks of pregnancy as compared to women with term delivery outcomes. After approximately 20 weeks of pregnancy, the vaginal microbiome diversity values for term and preterm outcomes appear to converge and remain more or less stable in the remaining weeks of pregnancy.
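As a minimal sketch of the per-sample computation described above, the Shannon index of a single taxonomic abundance profile can be written as follows. The function name and the two example profiles are hypothetical and are not taken from the pooled datasets; the natural logarithm is assumed, which is the default base of the vegan package used in this study.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("profile must contain at least one assigned read")
    # Taxa with zero counts contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical profiles (read counts per taxon), for illustration only.
dominated_profile = [980, 10, 5, 5]    # one taxon overwhelmingly dominant
even_profile = [250, 250, 250, 250]    # all taxa equally abundant

h_dominated = shannon_diversity(dominated_profile)
h_even = shannon_diversity(even_profile)
```

A profile dominated by a single taxon yields a value close to zero, whereas a perfectly even profile of k taxa yields ln(k), the maximum possible for that number of taxa.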
The observed temporal differences (with respect to Shannon diversity) between microbial communities in vaginal samples taken from subjects with term or preterm delivery outcomes indicate the possibility of employing a suitable diversity metric that can effectively capture the microbial community structure (in early weeks of pregnancy) and can therefore be potentially employed as a screening/diagnostic marker for predicting preterm delivery risk. Given this context, the following experiments were performed with the objective of evaluating and comparing the capability of various existing diversity measures (including Shannon diversity) in predicting pregnancy delivery outcomes (term or preterm) from a taxonomic profile corresponding to a sampled vaginal microbiome.
Table 1. Details of microbiome studies considered in the present analysis. Only those microbiome samples were considered that had at least 500 taxonomically assigned sequences and were collected from pregnant women within 40 weeks of gestation.

Figure 2. Shannon diversity trends across the 15 temporally overlapping groups for the evaluated 'term' and 'preterm' samples. Results indicate that women with preterm delivery outcomes tend to have lower diversity in their vaginal microbiome during their first 15-20 weeks of pregnancy as compared to women with term delivery outcomes.

Scientific Reports | 7: 16145 | DOI:10.1038/s41598-017-16352-y

All available microbiomes were segregated into 33 week-wise groups. Group 'Week N' comprised all vaginal microbiomes that were sampled at any time-point on or before the Nth week of pregnancy (N ranging between 8 and 40). Microbiomes in each group were labelled into classes 'Term' or 'PTD', denoting (reported) 'full-term' or 'preterm' delivery outcome respectively. Besides Shannon diversity, other widely used/reported alpha diversity indices, viz., Chao1 and Simpson, denoting species richness and evenness respectively [47-49], were computed for taxonomic profiles corresponding to microbiomes in all groups. Given that vaginal microbiomes have an overwhelming predominance of microbial species belonging to Lactobacillus, with other taxa playing the minority role (as the 'tail'), the microbial abundance distribution (upon ordering) resembles a highly skewed Lorenz curve (analogous to the inequitable distribution of wealth in human populations). Keeping this in mind, statistical measures of inequality, viz., Gini, Atkinson, Theil, Decile ratios, and Ricci-Schutz indices were also computed for individual taxonomic profiles of all vaginal microbiomes [50-55]. For each group, the diagnostic value (ability) of individual indices to differentiate between term and PTD outcomes was estimated in terms of Matthews Correlation Coefficient (MCC), a measure that captures both specificity and sensitivity (details in Methods section), at a threshold value that best separates the compared classes [56]. 'Extent of Segregation' (ES), an additional feature (Supplementary Figure 1) that quantifies the differentiating capability of a metric, was also computed in order to evaluate metrics achieving a perfect MCC value of 1 (i.e. complete separation between case and control samples). In addition to the mentioned diversity and inequality metrics, we also computed 'Taxonomic Composition Skew' (TCS), a novel metric, for all samples. Supplementary Information provides a brief background that places the TCS metric in the context of other existing diversity and inequality metrics, and the methodology adopted for computing TCS (along with a worked-out example).

Results depicted in Table 2 indicate that all evaluated diversity and inequality measures obtain positive MCC values. This clearly indicates significant differences (in taxonomic diversity) between vaginal microbiome samples obtained from women with term or PTD outcomes. Interestingly, these differences are observed to be more pronounced in (approximately) the first trimester of pregnancy (<15 gestation weeks). With increasing gestational age, the MCC values for various indices progressively decrease, but still remain above a value of 0.25. Although this indicates that diversity and inequality measures (in the late second and third trimesters of pregnancy) exhibit contrasting trends between term and PTD cases at a population level, their utility in subject-specific risk assessment appears to be limited. Amongst the compared indices, TCS is observed to outperform others, with the difference appearing more prominent until 15-20 weeks of gestation (Table 2).
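To illustrate how an economic inequality measure applies to a taxonomic profile, the sketch below computes the Gini index from taxon abundances ordered from least to most abundant, which is the ordering that underlies the Lorenz curve. It uses the standard uncorrected rank-weighted formula; the function name and example profiles are hypothetical, and the novel TCS metric is not reproduced here because its definition is given only in the Supplementary Information.

```python
def gini_index(abundances):
    """Uncorrected Gini index of a taxonomic abundance profile.

    Taxa are sorted in ascending order of abundance (the Lorenz ordering),
    then G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n.
    """
    ordered = sorted(abundances)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("profile must contain at least one non-zero abundance")
    rank_weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * rank_weighted / (n * total) - (n + 1) / n

# Hypothetical profiles, for illustration only.
even_profile = [5, 5, 5, 5]     # no inequality among taxa
skewed_profile = [0, 0, 0, 1]   # a single taxon holds all the abundance

g_even = gini_index(even_profile)
g_skewed = gini_index(skewed_profile)
```

An even profile gives a value of 0, and a profile in which one of n taxa holds all the abundance gives (n - 1)/n, so a Lactobacillus-dominated profile sits near the upper end of the scale.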
A statistical comparison (employing the Wilcoxon signed rank test) indicates that MCC values obtained using TCS (until the 20th week of pregnancy) are significantly higher (Benjamini-Hochberg corrected p-values < 0.008) than those obtained using other diversity and inequality measures. Furthermore, TCS is observed to obtain perfect MCC values of 1 (until Week-10), thereby indicating that a vaginal microbiome sample (obtained from a subject at any time point in pregnancy on or before the 10th gestation week) can be employed for accurately diagnosing/predicting the risk of a preterm delivery outcome while avoiding false alarms. Even when samples obtained on or before the 20th gestation week were considered, TCS could provide an MCC value of 0.823, indicating sufficiently high sensitivity and specificity of prediction. A comparison of 'Extent of Segregation' (ES) values obtained for various metrics also indicates that the ES provided by TCS is significantly higher (Wilcoxon signed rank test, Benjamini-Hochberg corrected p-values < 0.005) than that obtained with other metrics. Higher ES values indicate increased confidence with respect to the differentiating capability of the metric.
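Because TCS is compared against several other metrics, the resulting p-values need a multiple-testing correction. The following is a minimal sketch of the Benjamini-Hochberg adjustment, assuming the raw p-values from the pairwise tests are already available; the function name and the example p-values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is
    # never larger than the adjusted value of any larger raw p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, one per compared metric.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```

Each adjusted value is then compared against the chosen significance level in place of the raw p-value.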
Results obtained using microbiome datasets cumulated from all 4 studies indicate existence of differences in vaginal microbial community structures (during the first 15-20 weeks of pregnancy) in pregnant subjects with term and preterm delivery outcomes. However, with increasing gestation age, these (diversity) differences are observed to become less pronounced. In order to assess whether these differences in term and preterm vaginal microbial communities (in the "early" stages of pregnancy) remain consistent between individual studies and across datasets sampled from women with diverse ethnicities, suitable internal and external cross-validation experiments were designed and performed in the following manner.
Internal cross-validation experiments were performed using vaginal microbiome taxonomic profiles corresponding to two sets of samples belonging to Study 1, which was the largest amongst the four studies in terms of number of samples (Table 1). The two sets comprised samples that were obtained during the first 15 weeks (set 1) and the first 20 weeks (set 2) of pregnancy, respectively. In each set, two-thirds of the samples were randomly selected and used as a 'training corpus' for determining an optimal threshold value of the TCS metric that would provide maximal separation (quantified in terms of MCC) between the microbiome samples corresponding to preterm (case) delivery and term (control) delivery outcomes. The efficiency of the determined "threshold" value (in predicting pregnancy outcome) was subsequently evaluated using the corresponding 'test corpus', which comprised data pertaining to the remaining one-third of the microbiome samples. In each set, the above cross-validation procedure was iterated 1000 times. During each of these iterations, the ability of the determined "threshold" TCS value to differentiate between microbiome samples (of the respective test corpus) corresponding to cases (preterm) and controls (term) was evaluated in terms of six parameters, namely, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Matthews correlation coefficient (MCC). For the purpose of comparison, all other diversity and inequality metrics were also subjected to the same internal cross-validation exercise.
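The procedure above can be sketched as a repeated random hold-out: pick the MCC-maximizing threshold on two-thirds of the samples, score it on the remaining third, and repeat. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that lower metric values indicate preterm delivery (as observed for Shannon diversity in early pregnancy; the direction for TCS is not stated here), uses unstratified random splits, takes candidate thresholds from the observed training values, and sets MCC to 0 when its denominator is zero. All names and the toy data are hypothetical.

```python
import math
import random

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 if the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def mcc_at(threshold, samples):
    """samples: list of (metric_value, is_preterm).
    Assumption: a value <= threshold is predicted as preterm."""
    tp = sum(1 for v, y in samples if v <= threshold and y)
    fp = sum(1 for v, y in samples if v <= threshold and not y)
    fn = sum(1 for v, y in samples if v > threshold and y)
    tn = sum(1 for v, y in samples if v > threshold and not y)
    return mcc(tp, tn, fp, fn)

def best_threshold(samples):
    """Observed value that maximizes MCC on the given samples."""
    candidates = sorted({v for v, _ in samples})
    return max(candidates, key=lambda t: mcc_at(t, samples))

def repeated_holdout(samples, iterations=1000, train_fraction=2 / 3, seed=0):
    """Return (mean training threshold, mean test MCC) over random splits."""
    rng = random.Random(seed)
    n_train = round(len(samples) * train_fraction)
    thresholds, test_scores = [], []
    for _ in range(iterations):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        threshold = best_threshold(train)
        thresholds.append(threshold)
        test_scores.append(mcc_at(threshold, test))
    return sum(thresholds) / iterations, sum(test_scores) / iterations

# Hypothetical toy data: (metric value, preterm outcome?).
toy_samples = [(0.1, True), (0.2, True), (0.3, True)] + [
    (v, False) for v in (0.7, 0.8, 0.9, 1.0, 1.1, 1.2)
]
mean_threshold, mean_test_mcc = repeated_holdout(toy_samples)
```

With few preterm samples, some random splits leave the training or test corpus with a single class, which is why the zero-denominator convention matters; a stratified split would be one way to avoid this.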
Results, in terms of the six evaluation parameters mentioned above, obtained across 1000 iterations of testing are summarized in Tables 3 and 4. Mean values of the evaluation parameters (along with respective standard deviations) generated from set 1 and set 2 are given in Tables 3 and 4, respectively. Results in these tables concur with the earlier observations depicted in Table 2. Diversity of microbial communities indeed appears to be a good indicator of pregnancy outcome. Amongst the compared metrics, TCS is observed to have better prediction efficiency and a good balance between specificity and sensitivity of prediction (which is also reflected in the MCC values). A statistical comparison of the evaluation parameter values obtained (across 1000 iterations) using the TCS metric with those obtained using other metrics, for both subsets of (testing corpus) samples, is also provided in Tables 3 and 4, respectively. As evident from the data shown in these tables, in the majority of cases, TCS obtains significantly higher values of accuracy, sensitivity, specificity, PPV, NPV, and MCC, thereby clearly indicating its potential utility in accurately diagnosing/predicting the risk of preterm delivery from vaginal microbiome samples in the early stages (i.e. first 15-20 weeks) of pregnancy.
To further evaluate the robustness of diversity metrics (with respect to their differentiating capability between term and preterm pregnancy outcomes) across studies/ethnicities, the following external validation experiment was performed. Mean thresholds of various metrics (including TCS) obtained during the internal cross-validation experiments (performed using data from Study 1) were employed to check their applicability with microbiome data collected from the other three studies, viz. Studies 2, 3, and 4 (considering them as external validation data). In the current scenario, external validation data should ideally comprise vaginal microbiome samples taken from pregnant women from a different geography or ethnicity. Although Studies 2 and 3 comprise samples obtained from American women (similar to Study 1), data from these studies differ with respect to racial distribution of the subject cohort, experimental protocols, and the 16S variable regions sequenced [37,44]. In contrast, the external validation data also include data from Study 4, wherein samples were obtained from Chinese subjects. Results of external validation are provided in Tables 5 and 6. Similar to the internal cross-validation experiments, external validation was also performed for two subsets of samples using data cumulated till 15 weeks and 20 weeks, respectively. Results, with respect to all six evaluation parameters (viz. accuracy, sensitivity, specificity, PPV, NPV, and MCC), lend support to the hypothesis regarding distinct differences in vaginal microbial community structure (in early stages of pregnancy) between women with term and preterm delivery outcomes. Moreover, this hypothesis appears to hold true irrespective of ethnicity/geography.
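External validation of this kind amounts to applying a fixed, previously determined threshold to unseen samples and tabulating the confusion matrix. The sketch below computes the six evaluation parameters from such a fixed threshold. It assumes, for illustration only, that metric values at or below the threshold are predicted as preterm; the function name, threshold, and data are hypothetical, and ratios with a zero denominator are returned as NaN.

```python
import math

def evaluate_fixed_threshold(values, is_preterm, threshold):
    """Apply a fixed threshold to external samples and return six parameters.

    Assumption: value <= threshold is predicted as preterm (the positive class).
    """
    tp = fp = tn = fn = 0
    for value, actual_preterm in zip(values, is_preterm):
        predicted_preterm = value <= threshold
        if predicted_preterm and actual_preterm:
            tp += 1
        elif predicted_preterm and not actual_preterm:
            fp += 1
        elif actual_preterm:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }

# Hypothetical external samples and a hypothetical mean threshold.
external_values = [0.1, 0.2, 0.6, 0.4, 0.7, 0.8, 0.9, 0.3]
external_preterm = [True, True, True, False, False, False, False, False]
external_results = evaluate_fixed_threshold(external_values, external_preterm, 0.45)
```

Because the threshold is not re-fitted on the external data, any drop in these parameters relative to the internal cross-validation reflects how well the threshold transfers across cohorts and protocols.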
Results also confirm the relatively better efficiency of the TCS metric (compared to other diversity metrics) in capturing the community level differences prevalent in vaginal microbiomes (in the first 15-20 weeks of pregnancy). It is pertinent to mention here that the number of microbiome samples corresponding to preterm delivery outcome that could be collated for this study was limited. Availability of data from other ethnicities and geographies (besides US and China) would be essential for a more robust statistical validation of the above findings. It is expected that more data will become available in the coming years as interest in this field grows, allowing for a more comprehensive and statistically revealing meta-analysis. Furthermore, the concept of microbiome based diagnostics is still in its infancy, and translating the findings of this study to a clinical setting will have its own set of challenges.

Conclusion
The capability to 'accurately' predict a preterm delivery outcome, right in the first trimester of pregnancy, enables the following. Primarily, early prediction allows application/administration of available physical and/or pharmacological interventions (either prophylactic or therapeutic) to the concerned subject, with an aim of reducing or completely obviating the impending risk. Moreover, it helps in initiating monitoring/surveillance of the concerned subject and suitably enhancing levels of antenatal care. On a different note, early and (more importantly) 'accurate' prediction also finds application in identifying/recruiting a cohort of high-risk subjects willing to participate in clinical trials of novel intervention techniques that reduce the risk of preterm delivery outcomes and associated complications [57].
In summary, this work explores and validates the utility of vaginal microbiome diversity in enabling 'early' prediction of preterm delivery outcomes. This work also introduces a novel diversity metric (TCS) that can accurately predict a preterm delivery outcome as early as the first trimester of pregnancy. Validation results indicate the potential utility of employing the TCS metric in a clinical diagnostic setting (for accurate preterm birth risk assessment). We anticipate that the presented findings have far-reaching implications in the fight against neonatal mortality resulting from preterm births.

Methods
OTU level taxonomic profiles (Greengenes OTUs version 13.5, clustered at 97% identity) corresponding to vaginal microbiome samples from four studies [37,38,44,45] were obtained. Alpha diversity (Shannon, Simpson, Chao1) and inequality measures (Gini coefficient, Ricci-Schutz, Atkinson, Theil, and Decile ratio) corresponding to the taxonomic profiles were calculated using R packages vegan (v2.0-10) and ineq (v0.2-13), respectively. In addition, the skew in abundances of different microbial groups in the taxonomic profiles was computed using the novel metric TCS (details in Supplementary Information). Evaluation parameters, viz. MCC, AUC, specificity, sensitivity, accuracy, PPV and NPV were calculated using R packages ROCR, crossval (v1.0.3), and pROC. Matthews Correlation Coefficient (MCC), a measure that captures both specificity and sensitivity of prediction/classification using a selected threshold value of the index under consideration, was computed using the equation below.

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (1)$$

Wherein, TP, TN, FP, and FN represent the number of true-positive predictions, true-negative predictions, false-positive predictions, and false-negative predictions, respectively. A perfect MCC value of +1 indicates complete separation between the microbiome samples corresponding to preterm delivery and term delivery.
Other evaluation parameters were calculated from the generated confusion matrices using the following formulae (equations 2-6):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP}$$

$$\mathrm{NPV} = \frac{TN}{TN + FN}$$

The extent of segregation (ES), an additional feature that quantifies the differentiating capability of various metrics evaluated in the present study, was computed using the following equation.

Wherein,
D_TD → set of values calculated for a given diversity/inequality metric for all samples corresponding to term delivery outcomes.
D_PTD → set of values calculated for a given diversity/inequality metric for all samples corresponding to preterm delivery outcomes.
δ(max D_TD, min D_PTD) → absolute difference between the maximum value of the set D_TD and the minimum value of the set D_PTD.
δ(max D_PTD, min D_TD) → absolute difference between the maximum value of the set D_PTD and the minimum value of the set D_TD.
A higher ES value indicates a better separation between the two groups of microbiome samples. Figure 3 diagrammatically depicts an example of calculation of extent of segregation (ES), and indicates how ES captures the discriminating ability of an ecological-diversity or an economic-inequality metric.