## Introduction

Technological advances in medical diagnostics and therapeutics over the last decade have greatly reduced the burden of several life-threatening diseases affecting young children1. The Millennium Development Goals (MDG) report released by the UN in 2015 indicates a significant reduction in the incidence rates of malaria, tuberculosis, measles, and AIDS over the preceding 15 years2. However, amidst these encouraging signs of improvement, the report highlights that a majority of deaths in children (under 5) occur within the first 28 days of life (i.e. the neonatal period). Complications arising from preterm births are indicated as the single largest contributor to neonatal deaths3,4. Surprisingly, in contrast to the decrease in incidence rates of other diseases across the globe, the rate of preterm births has remained more or less constant, irrespective of a country’s economic status. For instance, in 2014, the rate of preterm births in the US stood at 9.6%, which is comparable to the rate (~12%) recorded in developing countries such as Brazil, India, and Nigeria5. Overall, these trends highlight the urgent need for greater international attention on developing improved methods for the diagnosis, prevention, and management of preterm births.

The entire cascade of patho-physiologic events that cause a preterm delivery (PTD) is not completely understood to date. Genetic predisposition, maternal risk factors (e.g., age, smoking, alcohol intake, reproductive history), urinary tract infections, intrauterine infections, etc., are typically associated with increased PTD risk. Given the variety of predisposing factors that contribute to an increased PTD risk6,7, there exist intervention approaches that can potentially promote a healthy full-term gestation outcome8,9. However, the success of these interventions depends on identifying high-risk subjects as early as possible during pregnancy3. In this context, diagnostic markers (physical and/or biochemical) that can accurately indicate, at an early stage of pregnancy, the possibility of progression towards a preterm delivery outcome are of considerable significance10,11. Figure 1 provides an overview of approaches (and a few commercial in vitro diagnostic offerings) used for determining PTD risk. A summary of the methods depicted in Fig. 1 is given below.

Briefly, general factors considered for ‘risk scoring’ include ethnicity, socio-economic status, periodontal health, blood pressure, weight, diabetes, smoking, alcohol consumption, inter-pregnancy duration, etc.10,12,13,14. The clinical predictive values of the stated factors are however quite limited. Pathologic manifestations such as bacterial vaginosis or urinary tract infections during pregnancy, unless properly addressed, are also known to adversely impact pregnancy outcome in most cases15,16,17,18. Several studies have also investigated associations between ‘physical markers’ (e.g. cervical length, uterine artery pulsatility index, etc.) and the eventual pregnancy outcome (term or preterm)19,20. Most studies indicate cervical length shortening in the second trimester of pregnancy to be associated with higher risk of spontaneous preterm birth19,21,22,23. More than its utility in indicating an impending preterm birth outcome, cervical length appears to have a good negative predictive value. CerviLenz is a commercially available device that finds utility in measurement of cervical length24.

A few biochemical markers identified from cervico-vaginal secretions, amniotic fluid, urine, saliva, serum, and plasma also find utility in the assessment of premature delivery risk (Fig. 1). These include inflammation markers like cervical IL-6 and serum C-reactive protein (CRP), and other proteins like fetal fibronectin, β-hCG, placental α-microglobulin, etc.10,25,26,27,28,29,30. Amongst these, fetal fibronectin (fFN) has been reported to be the most effective marker, having more than 60% sensitivity in predicting spontaneous preterm births, based on sampling done at ~22–24 weeks of gestation27,28. Inflammation markers like IL-6, tested at a similar period during pregnancy, can also predict an impending preterm delivery, albeit with lower sensitivity10,31. On the other hand, hormonal markers like β-hCG have been reported to predict spontaneous preterm birth outcomes with high sensitivity, but are applicable only during late stages of pregnancy (~34 weeks of gestation)32,33. Some recently developed in vitro diagnostic tests rely on comprehensive proteomic and metabolomic analyses of biological samples (blood, amniotic fluid, etc.) and combine multiple risk predictors in order to increase the sensitivity of prediction. For example, a test offered by Sera Prognostics provides risk assessment based on the detected levels of two blood proteins (viz. SHBG and IBP4)34,35, whereas another test offered by Metabolon relies on a metabolomic analysis of the amniotic fluid36. It is likely that the high costs and limited accuracy (~60–80%) of the methods depicted in Fig. 1 deter gynaecologists and public health organisations from recommending routine (widespread) clinical usage. The false-positive prediction rates of the depicted methods also remain high10,11. Even a single false prediction needlessly subjects a pregnant woman to mental turmoil, whilst also incurring personal/state-funded diagnostic and monitoring costs.

In a significant shift from physical and/or biochemical diagnostic markers, a few recent studies have indicated the potential of employing characteristics of vaginal microbial communities (in pregnant women) as a diagnostic marker for predicting preterm or full-term outcomes23,37,38,39,40,41,42,43,44. Observations indicate that taxonomic profiles derived from vaginal microbiomes of preterm subjects tend to cluster as a somewhat distinct group (typically referred to as a community state type, or CST). The consistent presence of species belonging to known pathogenic genera such as Gardnerella, Atopobium, Ureaplasma, etc. (at certain abundances) in samples grouping into a preterm delivery associated CST has fuelled considerable research in this direction37. Reports from these efforts also indicate subtle differences in alpha-diversity metrics (particularly with respect to species diversity and evenness measures) between taxonomic profiles obtained from vaginal microbiome samples taken from preterm and full-term subjects38,40.

In this study, we performed a systematic analysis of taxonomic diversity profiles corresponding to 1621 publicly available vaginal microbiomes (sampled from 303 pregnant women) pooled from four recent studies37,38,44,45. Table 1 provides details of the four studies. The aim was to understand and investigate temporal differences, if any, between the community structures of vaginal microbiomes sampled from pregnant women during various stages of their pregnancy. Further, suitable experiments were designed for evaluating and comparing the efficiency of various diversity measures in differentiating between vaginal microbiomes sampled from pregnant women with reported “term” or “preterm” delivery outcomes. The overall objective of this study was to obtain a possible answer for the following question. Are there potential (temporal) signatures in the microbial community structure of vaginal samples (in pregnant women) that can indicate predisposition to preterm birth?

## Results and Discussion

Changes in vaginal microbiome diversity across various stages of pregnancy were first evaluated using Shannon diversity40,46 as a metric. In order to obtain a clear picture of microbial community transition at various time points in pregnancy, and to minimize the effects of outliers, taxonomic profiles corresponding to 1621 samples pooled from 4 studies (Table 1) were divided into 15 overlapping week-wise groups. The Shannon diversity value of each sample was computed from its taxonomic abundance profile. Figure 2 depicts the Shannon diversity trends across the 15 temporally overlapping groups for the evaluated ‘term’ and ‘preterm’ samples. Results indicate that women with preterm delivery outcomes tend to have lower diversity in their vaginal microbiome during the first 15–20 weeks of pregnancy as compared to women with term delivery outcomes. After approximately 20 weeks of pregnancy, the vaginal microbiome diversity for term and preterm outcomes appears to converge and remains more or less stable for the remaining weeks of pregnancy.
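
As an illustration of the per-sample computation described above, the following minimal sketch computes the Shannon index from a vector of taxon counts. It assumes natural logarithms, and the function name and the two example profiles are hypothetical; they are not data from the pooled studies.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("profile must contain at least one non-zero count")
    h = 0.0
    for count in abundances:
        if count > 0:
            p = count / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


# Hypothetical OTU counts: one profile dominated by a single taxon, one perfectly even.
dominated = [970, 10, 10, 5, 5]
even = [200, 200, 200, 200, 200]
print(shannon_diversity(dominated), shannon_diversity(even))
```

A profile dominated by one taxon yields a lower value than an even profile with the same number of taxa, which is the direction of difference reported for the early-pregnancy preterm samples.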

The observed temporal differences (with respect to Shannon diversity) between microbial communities in vaginal samples taken from subjects with term or preterm delivery outcomes indicate the possibility of employing a suitable diversity metric that can effectively capture the microbial community structure (in early weeks of pregnancy) and can therefore be potentially employed as a screening/diagnostic marker for predicting preterm delivery risk. Given this context, the following experiments were performed with the objective of evaluating and comparing the capability of various existing diversity measures (including Shannon diversity) in predicting pregnancy delivery outcomes (term or preterm) from a taxonomic profile corresponding to a sampled vaginal microbiome.

All available microbiomes were segregated into 33 week-wise groups. Group ‘WeekN’ comprised all vaginal microbiomes that were sampled at any time point on or before the Nth week of pregnancy (N ranging from 8 to 40). Microbiomes in each group were labelled as ‘Term’ or ‘PTD’, denoting a (reported) ‘full-term’ or ‘preterm’ delivery outcome, respectively. Besides Shannon diversity, other widely used alpha diversity indices, viz., Chao1 and Simpson, denoting species richness and evenness respectively47,48,49, were computed for taxonomic profiles corresponding to microbiomes in all groups. Given that vaginal microbiomes have an overwhelming predominance of microbial species belonging to Lactobacillus, with other taxa playing a minority role (as the ‘tail’), the microbial abundance distribution (upon ordering) resembles a highly skewed Lorenz curve (analogous to an inequitable distribution of wealth in human populations). Keeping this in mind, statistical measures of inequality, viz., Gini, Atkinson, Theil, Decile ratio, and Ricci-Schutz indices, were also computed for individual taxonomic profiles of all vaginal microbiomes50,51,52,53,54,55. For each group, the diagnostic value of individual indices in differentiating between term and PTD outcomes was estimated in terms of the Matthews Correlation Coefficient (MCC), a measure that captures both specificity and sensitivity (details in the Methods section), at the threshold value that best separates the compared classes56. ‘Extent of Segregation’ (ES), an additional feature (Supplementary Figure 1) that quantifies the differentiating capability of a metric, was also computed in order to evaluate metrics achieving a perfect MCC value of 1 (i.e. complete separation between case and control samples). In addition to the mentioned diversity and inequality metrics, we also computed ‘Taxonomic Composition Skew’ (TCS), a novel metric, for all samples. 
Supplementary Information provides a brief background that places TCS metric in the context of other existing diversity and inequality metrics, and the methodology adopted for computing TCS (along with a worked-out example).
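
The cumulative ‘WeekN’ grouping and the inequality view of an abundance profile can be sketched as follows. This is an illustrative sketch only: the Gini coefficient is written in its standard rank-weighted form without small-sample correction, zero-abundance taxa are kept in the profile (an assumption), and the sample records, field names, and function names are hypothetical. TCS is not reproduced here because its definition is given only in the Supplementary Information.

```python
def gini(abundances):
    """Gini coefficient of an abundance profile (0 means perfectly even)."""
    values = sorted(abundances)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("profile must contain positive abundances")
    # Rank-weighted form: G = sum((2i - n - 1) * x_i) / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


def cumulative_week_groups(samples, first_week=8, last_week=40):
    """Group 'WeekN' holds every sample taken on or before gestation week N."""
    return {
        f"Week{n}": [s for s in samples if s["week"] <= n]
        for n in range(first_week, last_week + 1)
    }


# Hypothetical samples: week of sampling, reported outcome, OTU counts.
samples = [
    {"week": 7, "outcome": "PTD", "counts": [990, 5, 3, 2]},
    {"week": 12, "outcome": "Term", "counts": [600, 250, 100, 50]},
    {"week": 30, "outcome": "Term", "counts": [500, 300, 150, 50]},
]
groups = cumulative_week_groups(samples)
print(len(groups), len(groups["Week8"]), len(groups["Week40"]))
print([round(gini(s["counts"]), 3) for s in samples])
```

Because the groups are cumulative, a sample taken in week 7 appears in every group from Week8 to Week40, so later groups always contain the earlier ones.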

Results depicted in Table 2 indicate that all evaluated diversity and inequality measures obtain positive MCC values. This indicates differences (in taxonomic diversity) between vaginal microbiome samples obtained from women with term or PTD outcomes. Interestingly, these differences are observed to be more pronounced in (approximately) the first trimester of pregnancy (<15 gestation weeks). With increasing gestational age, the MCC values for various indices progressively decrease, but still remain above a value of 0.25. Although this indicates that diversity and inequality measures (in the late second and third trimesters of pregnancy) exhibit contrasting trends between term and PTD cases at a population level, their utility in subject-specific risk assessment appears to be limited. Amongst the compared indices, TCS is observed to outperform the others, with the difference appearing more prominent until 15–20 weeks of gestation (Table 2). A statistical comparison (employing the Wilcoxon signed rank test) indicates that MCC values obtained using TCS (until the 20th week of pregnancy) are significantly higher (Benjamini-Hochberg corrected p-values < 0.008) than those obtained using other diversity and inequality measures. Furthermore, TCS is observed to obtain perfect MCC values of 1 (until Week-10), thereby indicating that a vaginal microbiome sample (obtained from a subject at any time point in pregnancy on or before the 10th gestation week) can be employed for accurately diagnosing/predicting the risk of a preterm delivery outcome while avoiding false alarms. Even when samples obtained on or before the 20th gestation week were considered, TCS could provide an MCC value of 0.823, indicating sufficiently high sensitivity and specificity of prediction. 
A comparison of ‘Extent of Segregation’ (ES) values obtained for various metrics also indicates that ES provided by TCS is significantly higher (Wilcoxon signed rank test, Benjamini-Hochberg corrected p-values < 0.005) than that obtained with other metrics. Higher ES values indicate increased confidence with respect to the differentiating capability of the metric.
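
The Benjamini-Hochberg step mentioned above can be illustrated with a short sketch. It does not reimplement the Wilcoxon signed rank test; it only adjusts a list of raw p-values, and both the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, e.g. one per pairwise comparison of TCS with another metric.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is at least as large as the corresponding raw p-value, so a comparison that passes a cut-off after correction also passes it before correction.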

Results obtained using microbiome datasets cumulated from all 4 studies indicate existence of differences in vaginal microbial community structures (during the first 15–20 weeks of pregnancy) in pregnant subjects with term and preterm delivery outcomes. However, with increasing gestation age, these (diversity) differences are observed to become less pronounced. In order to assess whether these differences in term and preterm vaginal microbial communities (in the “early” stages of pregnancy) remain consistent between individual studies and across datasets sampled from women with diverse ethnicities, suitable internal and external cross-validation experiments were designed and performed in the following manner.

Internal cross-validation experiments were performed using vaginal microbiome taxonomic profiles corresponding to two sets of samples belonging to Study 1, which was the largest amongst the four studies in terms of number of samples (Table 1). The two sets comprised samples that were obtained during the first 15 weeks (set 1) and the first 20 weeks (set 2) of pregnancy, respectively. In each set, two-thirds of the samples were randomly selected and used as a ‘training corpus’ for determining an optimal threshold value of the TCS metric that would provide maximal separation (quantified in terms of MCC) between the microbiome samples corresponding to preterm (case) and term (control) delivery outcomes. The efficiency of the determined “threshold” value (in predicting pregnancy outcome) was subsequently evaluated using the corresponding ‘test corpus’, which comprised the remaining one-third of the microbiome samples. In each set, the above cross-validation procedure was iterated 1000 times. During each of these iterations, the ability of the determined “threshold” TCS value to differentiate between microbiome samples (of the respective test corpus) corresponding to cases (preterm) and controls (term) was evaluated in terms of six parameters, namely, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Matthews correlation coefficient (MCC). For the purpose of comparison, all other diversity and inequality metrics were also subjected to the same internal cross-validation exercise.
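
The train/test procedure described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes that larger metric values point to a preterm outcome, scans only observed values as candidate thresholds, reports only MCC on the test corpus, and uses made-up metric values. All function names are hypothetical.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def confusion(values, labels, threshold):
    """Count TP, TN, FP, FN, calling a sample preterm if value >= threshold.

    Assumption: larger metric values point to preterm delivery.
    """
    tp = tn = fp = fn = 0
    for value, is_preterm in zip(values, labels):
        predicted = value >= threshold
        if predicted and is_preterm:
            tp += 1
        elif predicted:
            fp += 1
        elif is_preterm:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def best_threshold(values, labels):
    """Scan observed values as cut-offs and keep the one with the highest MCC."""
    best_t, best_score = None, -2.0
    for t in sorted(set(values)):
        score = mcc(*confusion(values, labels, t))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def cross_validate(values, labels, iterations=1000, seed=0):
    """Repeat random 2/3 train, 1/3 test splits; return mean threshold and mean test MCC."""
    rng = random.Random(seed)
    indices = list(range(len(values)))
    n_train = (2 * len(indices)) // 3
    thresholds, scores = [], []
    for _ in range(iterations):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        t, _train_mcc = best_threshold([values[i] for i in train],
                                       [labels[i] for i in train])
        counts = confusion([values[i] for i in test], [labels[i] for i in test], t)
        thresholds.append(t)
        scores.append(mcc(*counts))
    return sum(thresholds) / iterations, sum(scores) / iterations


# Hypothetical metric values (not data from the study); True marks a preterm outcome.
values = [0.12, 0.18, 0.22, 0.25, 0.31, 0.35, 0.40, 0.44, 0.71, 0.78, 0.83, 0.90]
labels = [False] * 8 + [True] * 4
mean_threshold, mean_mcc = cross_validate(values, labels, iterations=1000)
print(mean_threshold, mean_mcc)
```

The mean threshold returned by such a loop is the kind of quantity that can later be applied unchanged to samples from other cohorts.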

Results, in terms of the six evaluation parameters mentioned above, obtained across 1000 iterations of testing are summarized in Tables 3 and 4. Mean values of the evaluation parameters (along with respective standard deviations) generated from set 1 and set 2 are depicted in Tables 3 and 4, respectively. Results in these tables are observed to concur with earlier observations depicted in Table 2. Diversity of microbial communities indeed appears to be a good indicator of pregnancy outcome. Amongst the compared metrics, TCS is observed to have better prediction efficiency and a good balance between specificity and sensitivity of prediction (which is also reflected in the MCC values). A statistical comparison of evaluation parameter values obtained (across 1000 iterations) using the TCS metric with those obtained using other metrics, for both subsets of (test corpus) samples, is also provided in Tables 3 and 4, respectively. As evident from the data shown in these tables, in the majority of cases, TCS obtains significantly higher values of accuracy, sensitivity, specificity, PPV, NPV, and MCC, thereby indicating its potential utility in accurately diagnosing/predicting the risk of preterm delivery from vaginal microbiome samples in the early stages (i.e. first 15–20 weeks) of pregnancy.

To further evaluate the robustness of diversity metrics (with respect to their differentiating capability between term and preterm pregnancy outcomes) across studies/ethnicities, the following external validation experiment was performed. Mean thresholds of various metrics (including TCS) obtained during the internal cross-validation experiments (performed using data from Study 1) were applied to microbiome data collected from the other three studies, viz. Studies 2, 3, and 4 (considering them as external validation data). In the current scenario, external validation data should ideally comprise vaginal microbiome samples taken from pregnant women from a different geography or ethnicity. Although Studies 2 and 3 comprise samples obtained from American women (similar to Study 1), data from these studies differ with respect to the racial distribution of the subject cohort, experimental protocols, and the 16S variable regions sequenced37,44. In contrast, the external validation data also include data from Study 4, wherein samples were obtained from Chinese subjects. Results of external validation are provided in Tables 5 and 6. Similar to the internal cross-validation experiments, external validation was also performed for two subsets of samples using data cumulated till 15 weeks and 20 weeks, respectively. Results, with respect to all six evaluation parameters (viz. accuracy, sensitivity, specificity, PPV, NPV, and MCC), lend support to the hypothesis of distinct differences in vaginal microbial community structure (in early stages of pregnancy) between women with term and preterm delivery outcomes. Moreover, this hypothesis appears to hold true irrespective of ethnicity/geography. 
Results also confirm the relatively better efficiency of TCS metric (compared to other diversity metrics) in capturing the community level differences prevalent in vaginal microbiomes (in the first 15–20 weeks of pregnancy).

It is pertinent to mention here that the number of microbiome samples corresponding to preterm delivery outcomes that could be collated for this study was limited. Availability of data from other ethnicities and geographies (besides the US and China) would be essential for a more robust statistical validation of the above findings. It is expected that more data will become available in the coming years as interest in this field grows, allowing for a more comprehensive and statistically revealing meta-analysis. Furthermore, the concept of microbiome-based diagnostics is still in its infancy, and translating the findings of this study to a clinical setting will have its own set of challenges.

## Conclusion

The capability to ‘accurately’ predict a preterm delivery outcome, right in the first trimester of pregnancy, enables the following. Primarily, early prediction allows application/administration of available physical and/or pharmacological interventions (either prophylactic or therapeutic) to the concerned subject, with an aim of reducing/completely obviating the impending risk. Moreover, it helps in initiating monitoring/surveillance of the concerned subject and suitably enhancing levels of antenatal care. On a different note, early and (more importantly) ‘accurate’ prediction also finds application in identifying/recruiting a cohort of high-risk subjects willing to participate in clinical trials of novel intervention techniques that reduce the risk of preterm delivery outcomes and associated complications57.

In summary, this work explores and validates the utility of vaginal microbiome diversity in enabling ‘early’ prediction of preterm delivery outcomes. This work also introduces a novel diversity metric (TCS) that can accurately predict a preterm delivery outcome as early as the first trimester of pregnancy. Validation results indicate the potential utility of employing the TCS metric in a clinical diagnostic setting (for accurate preterm birth risk assessment). We anticipate that the presented findings have far-reaching implications in the fight against neonatal mortality resulting from preterm births.

## Methods

OTU-level taxonomic profiles (Greengenes OTUs version 13.5, clustered at 97% identity) corresponding to vaginal microbiome samples from four studies37,38,44,45 were obtained. Alpha diversity (Shannon, Simpson, Chao1) and inequality measures (Gini coefficient, Ricci-Schutz, Atkinson, Theil, and Decile ratio) corresponding to the taxonomic profiles were calculated using the R packages vegan (v2.0-10) and ineq (v0.2-13), respectively. In addition, the skew in abundances of different microbial groups in the taxonomic profiles was computed using the novel metric, TCS (details in Supplementary Information). Evaluation parameters, viz. MCC, AUC, specificity, sensitivity, accuracy, PPV, and NPV, were calculated using the R packages ROCR, crossval (v1.0.3), and pROC. Matthews Correlation Coefficient (MCC), a measure that captures both specificity and sensitivity of prediction/classification using a selected threshold value of the index under consideration, was computed using the equation below.

$${\rm{MCC}}=\frac{{\rm{TP}}\times {\rm{TN}}-{\rm{FP}}\times {\rm{FN}}}{\sqrt{({\rm{TP}}+{\rm{FP}})({\rm{TP}}+{\rm{FN}})({\rm{TN}}+{\rm{FP}})({\rm{TN}}+{\rm{FN}})}}$$
(1)

Here, TP, TN, FP, and FN represent the number of true-positive, true-negative, false-positive, and false-negative predictions, respectively. A perfect MCC value of +1 indicates complete separation between the microbiome samples corresponding to preterm delivery and those corresponding to term delivery. Other evaluation parameters were calculated from the generated confusion matrices using the following formulae (equations 2–6):

$${\rm{Accuracy}}=({\rm{TP}}+{\rm{TN}})/({\rm{TP}}+{\rm{TN}}+{\rm{FP}}+\text{FN})$$
(2)
$${\rm{Sensitivity}}={\rm{TP}}/({\rm{TP}}+{\rm{FN}})$$
(3)
$${\rm{Specificity}}={\rm{TN}}/({\rm{TN}}+{\rm{FP}})$$
(4)
$${\rm{PPV}}={\rm{TP}}/({\rm{TP}}+{\rm{FP}})$$
(5)
$${\rm{NPV}}={\rm{TN}}/({\rm{TN}}+{\rm{FN}})$$
(6)
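
Equations 1 to 6 translate directly into code. The sketch below computes all six parameters from the four confusion-matrix counts; the function name and the example counts are hypothetical, and returning NaN for an undefined ratio (zero denominator) is a choice made here, not something specified in the text.

```python
import math


def evaluation_metrics(tp, tn, fp, fn):
    """Compute MCC, accuracy, sensitivity, specificity, PPV and NPV (equations 1-6)."""

    def ratio(numerator, denominator):
        # Undefined ratios are reported as NaN (an assumption of this sketch).
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
        "Accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "Sensitivity": ratio(tp, tp + fn),
        "Specificity": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Hypothetical confusion matrix: 10 preterm cases and 90 term controls.
metrics = evaluation_metrics(tp=8, tn=85, fp=5, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With few preterm cases relative to term controls, accuracy can stay high even when PPV is modest, which is one reason MCC is reported alongside it.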

The extent of segregation (ES), an additional feature that quantifies the differentiating capability of the various metrics evaluated in the present study, was computed using the following equation.

$${\rm{Extent}}\,{\rm{of}}\,{\rm{Segregation}}\,({\rm{ES}})=\frac{\min \,[\delta (\max \,{\rm{D}}_{\rm{TD}},\,\min \,{\rm{D}}_{\rm{PTD}}),\,\delta (\max \,{\rm{D}}_{\rm{PTD}},\,\min \,{\rm{D}}_{\rm{TD}})]}{\max \,[\delta (\max \,{\rm{D}}_{\rm{TD}},\,\min \,{\rm{D}}_{\rm{PTD}}),\,\delta (\max \,{\rm{D}}_{\rm{PTD}},\,\min \,{\rm{D}}_{\rm{TD}})]}\,\times \,100$$
(7)

Wherein,

$D_{\rm{TD}}$ → set of values calculated for a given diversity/inequality metric for all samples corresponding to term delivery outcomes.

$D_{\rm{PTD}}$ → set of values calculated for a given diversity/inequality metric for all samples corresponding to preterm delivery outcomes.

$\delta(\max D_{\rm{TD}}, \min D_{\rm{PTD}})$ → absolute difference between the maximum value of the set $D_{\rm{TD}}$ and the minimum value of the set $D_{\rm{PTD}}$.

$\delta(\max D_{\rm{PTD}}, \min D_{\rm{TD}})$ → absolute difference between the maximum value of the set $D_{\rm{PTD}}$ and the minimum value of the set $D_{\rm{TD}}$.

A higher ES value indicates a better separation between the two groups of microbiome samples. Figure 3 diagrammatically depicts an example of calculation of extent of segregation (ES), and indicates how ES captures the discriminating ability of an ecological-diversity or an economic-inequality metric.
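
Equation 7 can be written out as a small function. This is a minimal sketch that follows the definitions above literally; the function name and the example values are hypothetical, and raising an error when both differences are zero is a choice made here rather than part of the described method.

```python
def extent_of_segregation(term_values, ptd_values):
    """Extent of segregation (equation 7), expressed as a percentage."""
    # delta(max D_TD, min D_PTD) and delta(max D_PTD, min D_TD) are absolute differences.
    delta_1 = abs(max(term_values) - min(ptd_values))
    delta_2 = abs(max(ptd_values) - min(term_values))
    larger = max(delta_1, delta_2)
    if larger == 0:
        raise ValueError("all values are identical; ES is undefined")
    return min(delta_1, delta_2) / larger * 100


# Hypothetical metric values for completely separated term and preterm samples.
term_values = [1, 2, 3]
ptd_values = [6, 7, 9]
print(extent_of_segregation(term_values, ptd_values))  # 3 / 8 * 100 = 37.5
```

In this example the gap between the two groups (3) is compared with the full spread of all values (8), so a wider gap relative to the spread gives a higher ES. In the study, ES is used to compare metrics that already achieve a perfect MCC value of 1.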